Acquire Image and Skeletal Data Using Kinect V1

In Detect the Kinect V1 Devices, you see that the two sensors on the Kinect® for Windows® device are represented by two device IDs, one for the color sensor and one for the depth sensor. In that example, Device 1 is the color sensor and Device 2 is the depth sensor. This example shows how to create a videoinput object for the color sensor to acquire RGB images and then one for the depth sensor to acquire skeletal data.

  1. Create the videoinput object for the color sensor. DeviceID 1 is used for the color sensor.

    vid = videoinput('kinect',1,'RGB_640x480');
  2. Look at the device-specific properties on the source device, which is the color sensor on the Kinect camera.

    src = getselectedsource(vid);
    
    src
    
    Display Summary for Video Source Object:
     
          General Settings:
            Parent = [1x1 videoinput]
            Selected = on
            SourceName = ColorSource
            Tag = 
            Type = videosource
     
          Device Specific Properties:
            Accelerometer = [0.0 -1.0 0.0]
            AutoExposure = on
            AutoWhiteBalance = on
            BacklightCompensation = AverageBrightness
            Brightness = 0.2156
            CameraElevationAngle = 3
            Contrast = 1
            ExposureTime = 1.0
            FrameInterval = 0
            FrameRate = 30
            Gain = 0
            Gamma = 2.2
            Hue = 0
            PowerLineFrequency = Disabled
            Saturation = 1
            Sharpness = 0.5
            WhiteBalance = 2700

    As you can see in the output, the color sensor has a set of device-specific properties.

    Device-Specific Property (Color Sensor): Description

    Accelerometer: Returns a 3-D vector of acceleration data for both the color and depth sensors. The data is updated while the device is running or previewing.

    This 1 x 3 double represents the x, y, and z values of acceleration in gravity units g (9.81 m/s^2). For example,

    [0.06 -1.00 -0.09]

    represents values of x as 0.06 g, y as -1.00 g, and z as -0.09 g.

    AutoExposure: Sets the exposure automatically. This property controls whether other related properties can be set. Values are on (default) and off.

    on means that exposure is set automatically. The FrameInterval, ExposureTime, and Gain properties cannot be set, and trying to set them produces a warning.

    off means that the PowerLineFrequency, BacklightCompensation, and Brightness properties cannot be set, and trying to set them produces a warning.

    AutoWhiteBalance: Enables or disables the automatic white balance setting.

    on (default) means that white balance is configured automatically and the WhiteBalance property cannot be set.

    off means that the WhiteBalance property can be set.

    BacklightCompensation: Configures the backlight compensation mode, which adjusts how the camera captures images depending on environmental conditions.

    Note that this property is only valid if AutoExposure is set to on. The default is AverageBrightness.

    Values are:

    AverageBrightness favors an average brightness level

    CenterPriority favors the center of the scene

    LowLightsPriority favors a low light level

    CenterOnly favors the center only

    Brightness: Indicates the brightness level. The value range is 0.0 to 1.0, and the default value is 0.2156.

    Note that this property is only valid if AutoExposure is set to on.

    CameraElevationAngle: Controls the angle of the sensor lens, which is the camera angle relative to the ground. The value must be an integer in the range -27 to 27 degrees. This is a sticky setting, so the default value is the last value set. Set it only if you want to change the angle of the camera. This property is shared with the depth sensor.

    Contrast: Indicates the contrast level. Values must be in the range 0.5 to 2, with a default value of 1.

    ExposureTime: Indicates the exposure time in increments of 1/10,000 of a second. The value range is 0 to 4000, and the default is 0.

    Note that this property is only valid if AutoExposure is set to off.

    FrameInterval: Indicates the frame interval in units of 1/10,000 of a second. The value range is 0 to 4000, and the default is 0.

    Note that this property is only valid if AutoExposure is set to off.

    FrameRate: Frames per second for the acquisition. This property is read-only, and the possible values for the color sensor are 12, 15, and 30 (default). It reflects the actual frame rate when running.

    Gain: Indicates a multiplier for the RGB color values. The value range is 1.0 to 16.0, and the default is 1.0.

    Note that this property is only valid if AutoExposure is set to off.

    Gamma: Indicates the gamma measurement. Values must be in the range 1 to 2.8, with a default value of 2.2.

    Hue: Indicates the hue setting. Values must be in the range -22 to 22, with a default value of 0.

    PowerLineFrequency: Option for reducing flicker caused by the frequency of a power line. Values are Disabled, FiftyHertz, and SixtyHertz. The default is Disabled.

    Note that this property is only valid if AutoExposure is set to on.

    Saturation: Indicates the saturation level. Values must be in the range 0 to 2, with a default value of 1.

    Sharpness: Indicates the sharpness level. Values must be in the range 0 to 1, with a default value of 0.5.

    WhiteBalance: Indicates the color temperature in kelvin. The value range is 2700 to 6500, and the default is 2700.

    Note that this property is only valid if AutoWhiteBalance is set to off.
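
    As a hedged illustration of the Accelerometer reading, the sketch below converts the sample vector from the table to m/s^2 and estimates how far the sensor is tilted away from level. It assumes that a level sensor reads [0 -1 0], as in the display above, so that gravity lies along the negative y-axis. The variable names are illustrative and not part of the adaptor.

    ```matlab
    % Sample Accelerometer reading in units of g (taken from the table above).
    accel = [0.06 -1.00 -0.09];

    % Convert to m/s^2 using 1 g = 9.81 m/s^2.
    accelMetric = accel * 9.81;

    % Magnitude in g; close to 1 when the sensor is stationary.
    accelMagnitude = norm(accel);

    % Angle in degrees between the measured vector and the assumed
    % "level" direction [0 -1 0] (an assumption, see the text above).
    levelDirection = [0 -1 0];
    tiltDegrees = acosd(dot(accel, levelDirection) / accelMagnitude);
    ```

    For this sample the magnitude is slightly above 1 g and the tilt is a few degrees, which is consistent with a nearly level, stationary sensor.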

  3. Optionally, set some of the properties shown in the previous step. For example, if you are acquiring images in low light, you can adjust the acquisition by setting the BacklightCompensation property to LowLightsPriority, which favors a low light level.

    src.BacklightCompensation = 'LowLightsPriority';
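
    According to the property table in step 2, ExposureTime and Gain can be set only when AutoExposure is off, whereas BacklightCompensation applies only while AutoExposure is on. As an alternative to the setting above, a minimal sketch of switching to manual exposure might look like this. It assumes src is the color source object from step 2, and the specific values are illustrative choices within the documented ranges, not recommended settings.

    ```matlab
    % Turn off automatic exposure so that the manual exposure properties
    % can be set (assumes src is the color source from step 2).
    src.AutoExposure = 'off';

    % ExposureTime is in units of 1/10,000 s; 400 corresponds to 0.04 s.
    src.ExposureTime = 400;

    % Gain multiplies the RGB values; the documented range is 1.0 to 16.0.
    src.Gain = 2.0;
    ```
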
  4. Preview the color stream by calling preview on the color sensor object you created.

    preview(vid);

    When you are done previewing, close the preview window.

    closepreview(vid);
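
    The preview shows the live stream but does not return image data to the workspace. One way to grab a single RGB frame is getsnapshot, sketched here under the assumption that vid is the color videoinput object from step 1. The variable name colorImage is illustrative.

    ```matlab
    % Acquire one frame from the color sensor (assumes vid from step 1).
    colorImage = getsnapshot(vid);

    % Display the frame with base MATLAB graphics.
    figure;
    image(colorImage);
    axis image;
    axis off;
    ```
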
  5. Create the videoinput object for the depth sensor. Note that a second object is created (vid2), and DeviceID 2 is used for the depth sensor.

    vid2 = videoinput('kinect',2,'Depth_640x480');
  6. Look at the device-specific properties on the source device, which is the depth sensor on the Kinect.

    src = getselectedsource(vid2);
    
    src
    
    Display Summary for Video Source Object:
     
          General Settings:
            Parent = [1x1 videoinput]
            Selected = on
            SourceName = DepthSource
            Tag = 
            Type = videosource
     
          Device Specific Properties:
            Accelerometer = [0.0 -1.0 0.0]
            BodyPosture = Standing
            CameraElevationAngle = 4
            DepthMode = Default
            FrameRate = 30
            IREmitter = on        
            SkeletonsToTrack = [1x0 double]
            TrackingMode = off

    As you can see in the output, the depth sensor has a set of device-specific properties associated with skeletal tracking. These properties are specific to the depth sensor.

    Device-Specific Property (Depth Sensor): Description

    Accelerometer: Returns a 3-D vector of acceleration data for both the color and depth sensors. The data is updated while the device is running or previewing.

    This 1 x 3 double represents the x, y, and z values of acceleration in gravity units g (9.81 m/s^2). For example,

    [0.06 -1.00 -0.09]

    represents values of x as 0.06 g, y as -1.00 g, and z as -0.09 g.

    BodyPosture: Indicates whether the tracked skeletons are standing or sitting. Values are Standing (gives 20-point skeleton data) and Seated (gives 10-point skeleton data, using joint indices 2 - 11). Standing is the default.

    Note that if BodyPosture is set to Seated and TrackingMode is set to Position, no position is returned, because Position is the location of the hip joint and the hip joint is not tracked in Seated mode.

    See the subsection “BodyPosture Joint Indices” at the end of this example for the list of indices of the 20 skeletal joints.

    CameraElevationAngle: Controls the angle of the sensor lens, which is the camera angle relative to the ground. The value must be an integer in the range -27 to 27 degrees. This is a sticky setting, so the default value is the last value set. Set it only if you want to change the angle of the camera. This property is shared with the color sensor.

    DepthMode: Indicates the range of depth in the depth map. Values are Default (range of 50 to 400 cm) and Near (range of 40 to 300 cm).

    FrameRate: Frames per second for the acquisition. This property is read-only and is fixed at 30 for the depth sensor for all formats. It reflects the actual frame rate when running.

    IREmitter: Controls whether the IR emitter is on or off. Values are on and off. The initial default is on. However, this is a sticky property, so the default value is the last value set. If you set it to off, it remains off in future uses until you change the setting.

    Turning the emitter off is useful when you use multiple Kinect devices, because it avoids interference between them.

    SkeletonsToTrack: Indicates the skeleton tracking IDs returned as part of the metadata. Values are:

    [] Default tracking

    [TrackingID1] Track 1 skeleton with Tracking ID = TrackingID1

    [TrackingID1 TrackingID2] Track 2 skeletons with Tracking IDs = TrackingID1 and TrackingID2

    TrackingMode: Indicates the tracking state. Values are:

    Skeleton tracks full skeleton with joints

    Position tracks hip joint position only

    off disables skeleton position tracking (default)

    Note that if BodyPosture is set to Seated and TrackingMode is set to Position, no position is returned, because Position is the location of the hip joint and the hip joint is not tracked in Seated mode.
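
    The display in this step shows TrackingMode = off, in which case skeleton tracking is disabled. A sketch of enabling full skeleton tracking before starting the acquisition follows. It assumes src is the depth source object returned by getselectedsource(vid2) in this step.

    ```matlab
    % Enable full skeleton tracking on the depth source
    % (assumes src = getselectedsource(vid2), as in this step).
    src.TrackingMode = 'Skeleton';

    % Standing posture reports all 20 joints; 'Seated' reports 10.
    src.BodyPosture = 'Standing';
    ```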

  7. Start the second videoinput object (the depth stream).

    start(vid2);
  8. Skeletal data is accessed as metadata on the depth stream using getdata.

    % Get the data on the object.
    [frame, ts, metaData] = getdata(vid2);
    
    % Look at the metadata to see the parameters in the skeletal data.
    metaData
    
    metaData = 
     
    10x1 struct array with fields:
        AbsTime: [1x1 double]
        FrameNumber: [1x1 double]
        IsPositionTracked: [1x6 logical]
        IsSkeletonTracked: [1x6 logical] 
        JointDepthIndices: [20x2x6 double]
        JointImageIndices: [20x2x6 double]
        JointTrackingState: [20x6 double]
        JointWorldCoordinates: [20x3x6 double]
        PositionDepthIndices: [2x6 double]
        PositionImageIndices: [2x6 double]
        PositionWorldCoordinates: [3x6 double]
        RelativeFrame: [1x1 double]
        SegmentationData: [640x480 double]
        SkeletonTrackingID: [1x6 double]
        TriggerIndex: [1x1 double]

    These metadata fields are related to tracking the skeletons.

    Metadata Field: Description

    AbsTime: A 1 x 1 double that represents the full timestamp, including date and time, in MATLAB® clock format.

    FrameNumber: A 1 x 1 double that represents the frame number.

    IsPositionTracked: A 1 x 6 logical array of true/false values for the tracking of the position of each of the six skeletons. A 1 indicates the position is tracked and a 0 indicates it is not.

    IsSkeletonTracked: A 1 x 6 logical array of true/false values for the tracked state of each of the six skeletons. A 1 indicates it is tracked and a 0 indicates it is not.

    JointDepthIndices: If the BodyPosture property is set to Standing, this is a 20 x 2 x 6 double matrix of x- and y-coordinates for 20 joints in pixels relative to the depth image, for the six possible skeletons. If BodyPosture is set to Seated, this would be a 10 x 2 x 6 double for 10 joints.

    JointImageIndices: If the BodyPosture property is set to Standing, this is a 20 x 2 x 6 double matrix of x- and y-coordinates for 20 joints in pixels relative to the color image, for the six possible skeletons. If BodyPosture is set to Seated, this would be a 10 x 2 x 6 double for 10 joints.

    JointTrackingState: This 20 x 6 integer matrix contains enumerated values for the tracking accuracy of each joint for all six skeletons. Values include:

    0 not tracked

    1 position inferred

    2 position tracked

    JointWorldCoordinates: If the BodyPosture property is set to Standing, this is a 20 x 3 x 6 double matrix of x-, y-, and z-coordinates for 20 joints, in meters from the sensor, for the six possible skeletons. If BodyPosture is set to Seated, this would be a 10 x 3 x 6 double for 10 joints.

    See step 10 for the syntax to view this data.

    PositionDepthIndices: A 2 x 6 double matrix of x- and y-coordinates of each skeleton in pixels relative to the depth image.

    PositionImageIndices: A 2 x 6 double matrix of x- and y-coordinates of each skeleton in pixels relative to the color image.

    PositionWorldCoordinates: A 3 x 6 double matrix of the x-, y-, and z-coordinates of each skeleton in meters relative to the sensor.

    RelativeFrame: This 1 x 1 double represents the frame number relative to the execution of a trigger, if triggering is used.

    SegmentationData: An image-sized double array with each pixel mapped to a tracked or detected skeleton, represented by the numbers 1 to 6. This segmentation map is a bitmap with pixel values corresponding to the index of the person in the field of view who is closest to the camera at that pixel position. A value of 0 means there is no tracked skeleton.

    SkeletonTrackingID: This 1 x 6 integer matrix contains the tracking IDs of all six skeletons. Use these IDs with the SkeletonsToTrack property, described in step 6, to track specific skeletons.

    Tracking IDs are generated by the Kinect and change from acquisition to acquisition.

    TriggerIndex: A 1 x 1 double that represents the trigger that the event is associated with, if triggering is used.
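
    To illustrate how these fields fit together, the sketch below builds a small synthetic frame with the same field shapes as listed above and picks out the reliably tracked joints of each tracked skeleton. The struct frame and its values are made up for illustration. With a real acquisition you would use one element of metaData instead.

    ```matlab
    % Synthetic single-frame metadata with the documented field shapes
    % (values are invented for illustration).
    frame.IsSkeletonTracked = logical([1 0 0 0 0 0]);
    frame.SkeletonTrackingID = [17 0 0 0 0 0];
    frame.JointTrackingState = zeros(20, 6);
    frame.JointTrackingState(:, 1) = 2;        % all joints of skeleton 1 tracked
    frame.JointTrackingState([16 20], 1) = 1;  % both feet only inferred

    % Slots (1 to 6) that currently hold a tracked skeleton.
    trackedSlots = find(frame.IsSkeletonTracked);

    for slot = trackedSlots
        % Joints whose state is 2 (position tracked) rather than 1 (inferred).
        trackedJoints = find(frame.JointTrackingState(:, slot) == 2);
        fprintf('Skeleton in slot %d (ID %d): %d of 20 joints tracked\n', ...
            slot, frame.SkeletonTrackingID(slot), numel(trackedJoints));
    end
    ```
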
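  9. Look at any individual field by drilling into the metadata. Because metaData is a 10x1 struct array, with one element per acquired frame, index a frame first. For example, look at the IsSkeletonTracked field of the first frame.

    metaData(1).IsSkeletonTracked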
    
    ans = 
     
         1     0     0     0     0     0

    In this case the data shows that of the six possible skeletons, there is one skeleton being tracked and it is in the first position. If you have multiple skeletons, this property is useful to confirm which ones are being tracked.
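
    Because the metadata contains one element per frame, it can be useful to summarize tracking across the whole acquisition. The sketch below does this on a synthetic 10x1 struct array standing in for metaData. The variable names and values are illustrative.

    ```matlab
    % Synthetic stand-in for metaData: 10 frames, skeleton slot 1 tracked
    % from frame 4 onward (values are invented for illustration).
    numFrames = 10;
    meta = repmat(struct('IsSkeletonTracked', false(1, 6)), numFrames, 1);
    for k = 4:numFrames
        meta(k).IsSkeletonTracked(1) = true;
    end

    % Stack the 1x6 flags of every frame into a 10x6 logical matrix.
    trackedPerFrame = vertcat(meta.IsSkeletonTracked);

    % Slots tracked in at least one frame, and the number of frames
    % in which each slot was tracked.
    slotsEverTracked = find(any(trackedPerFrame, 1));
    framesTracked = sum(trackedPerFrame, 1);
    ```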

  10. Get the joint locations for the first person in world coordinates using the JointWorldCoordinates field of the first frame. Since this is the person in position 1, the third index is 1.

    metaData(1).JointWorldCoordinates(:,:,1)
    
    ans =
     
       -0.1408   -0.3257    2.1674
       -0.1408   -0.2257    2.1674
       -0.1368   -0.0098    2.2594
       -0.1324    0.1963    2.3447
       -0.3024   -0.0058    2.2574
       -0.3622   -0.3361    2.1641
       -0.3843   -0.6279    1.9877
       -0.4043   -0.6779    1.9877
        0.0301   -0.0125    2.2603
        0.2364    0.2775    2.2117
        0.3775    0.5872    2.2022
        0.4075    0.6372    2.2022
       -0.2532   -0.4392    2.0742
       -0.1869   -0.8425    1.8432
       -0.1869   -1.2941    1.8432
       -0.1969   -1.3541    1.8432
       -0.0360   -0.4436    2.0771
        0.0382   -0.8350    1.8286
        0.1096   -1.2114    1.5896
        0.1196   -1.2514    1.5896

    The columns represent the X, Y, and Z coordinates in meters of the 20 points on skeleton 1.
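
    As a worked example using the numbers above, the straight-line distance between two joints follows from the Euclidean norm of the difference of their rows. The sketch below uses the hip center (row 1) and head (row 4) values copied from the output. The row numbers follow the joint order listed in “BodyPosture Joint Indices”, and the variable names are illustrative.

    ```matlab
    % World coordinates in meters, copied from rows 1 and 4 of the output above.
    hipCenter = [-0.1408 -0.3257 2.1674];
    head      = [-0.1324  0.1963 2.3447];

    % Euclidean distance between the two joints, in meters (about 0.55 m here).
    hipToHead = norm(head - hipCenter);

    % Distance of the hip center from the sensor origin, in meters.
    hipFromSensor = norm(hipCenter);
    ```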

  11. Optionally view the segmentation data as an image.

    % View the segmentation data as an image.
    imagesc(metaData(1).SegmentationData);
    % Set the color map to jet to color code the people detected.
    colormap(jet);
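
    Beyond viewing the map, you can count how many pixels belong to each person, since pixel values 1 to 6 identify people and 0 means no tracked skeleton. The sketch below runs on a small synthetic segmentation map. The array seg, its size, and its contents are made up for illustration.

    ```matlab
    % Synthetic segmentation map (values invented for illustration):
    % 0 = no person, 1..6 = index of the person at that pixel.
    seg = zeros(60, 80);
    seg(10:29, 20:29) = 1;   % person 1 covers 20 x 10 = 200 pixels
    seg(15:24, 40:44) = 2;   % person 2 covers 10 x 5 = 50 pixels

    % Count the pixels labeled with each person index 1..6.
    labels = seg(seg > 0);
    pixelCounts = accumarray(labels, 1, [6 1]);

    % Indices of the people that appear in the map.
    peoplePresent = find(pixelCounts > 0)';
    ```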
    

BodyPosture Joint Indices

The BodyPosture property, described in step 6, indicates whether the tracked skeletons are standing or sitting. Values are Standing (gives 20-point skeleton data) and Seated (gives 10-point skeleton data, using joint indices 2 - 11).

This is the order of the joints returned by the Kinect adaptor:

   Hip_Center = 1;
   Spine = 2;
   Shoulder_Center = 3;
   Head = 4;
   Shoulder_Left = 5;
   Elbow_Left = 6;
   Wrist_Left = 7;
   Hand_Left = 8;
   Shoulder_Right = 9;
   Elbow_Right = 10;
   Wrist_Right = 11;
   Hand_Right = 12;
   Hip_Left = 13;
   Knee_Left = 14;
   Ankle_Left = 15;
   Foot_Left = 16; 
   Hip_Right = 17;
   Knee_Right = 18;
   Ankle_Right = 19;
   Foot_Right = 20;

When BodyPosture is set to Standing, all 20 indices are returned, as shown above. When BodyPosture is set to Seated, numbers 2 through 11 are returned, since this represents the upper body of the skeleton.
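
For convenience, the joint order can be captured in a cell array so that rows of the joint matrices can be labeled. This is only a sketch: the cell array jointNames is an illustrative helper, not something the adaptor provides, and the seated range follows the indices 2 through 11 stated above.

```matlab
% Joint names in the order listed above (illustrative helper).
jointNames = {'Hip_Center', 'Spine', 'Shoulder_Center', 'Head', ...
    'Shoulder_Left', 'Elbow_Left', 'Wrist_Left', 'Hand_Left', ...
    'Shoulder_Right', 'Elbow_Right', 'Wrist_Right', 'Hand_Right', ...
    'Hip_Left', 'Knee_Left', 'Ankle_Left', 'Foot_Left', ...
    'Hip_Right', 'Knee_Right', 'Ankle_Right', 'Foot_Right'};

% Look up the row index of a joint by name.
headIndex = find(strcmp(jointNames, 'Head'));

% Names of the joints reported in Seated mode (indices 2 through 11).
seatedJointNames = jointNames(2:11);
```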