Actor metadata#

Overview#

The ground truth exposed in this modality is the identity, expression, accessories, position, and orientation of the actor in your rendered scene.

This modality consists of the following file:

Relevant file         Location
actor_metadata.json   Datapoint folder

actor_metadata.json#

This file contains the parameters that were used to generate the human subject at the center of the scene. It has the following format:

{
    "version": "1.0.0",
    "identity_label": {
        "age": "young",
        "gender": "female",
        "ethnicity": "southeast_asian"
    },
    "identity_id": "56fb1462-c05a-4f87-8913-7754ebbd0fd9",
    "facial_hair_included": false,
    "face_expression": {
        "name": "happiness",
        "intensity_level": 2
    },
    "head_metadata": {
        "head_root_location": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0
        },
        "head_rotation": {
            "pitch": 0.0,
            "yaw": 0.0,
            "roll": 0.0
        },
        "head_six_dof": {
             "location": {
                 "x": -0.0020751950796693563,
                 "y": -0.06417405605316162,
                 "z": 0.14418649673461914
             },
             "look_at_vector": {
                 "x": 8.74227803627227e-08,
                 "y": -0.9999999999999962,
                 "z": 9.553428304548457e-16
             }
        }
    },
    "accessories": [
        // If the actor is wearing an accessory, accessory information appears here.
    ]
}

Note

There are additional fields in actor_metadata.json that are part of the Eye keypoints modality.

Objects and fields:

  • version: String. Version tracking for this file. Whenever you access this file in a datapoint, check that the version matches the one you expect; otherwise, the file's format and fields may differ from what your code anticipates.
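    If you parse these files programmatically, it can help to fail fast on a version you haven't tested against. A minimal sketch in Python; the expected version string is an assumption you should pin to whatever your pipeline was written for:

    import json
    from pathlib import Path

    EXPECTED_VERSION = "1.0.0"  # assumed; pin to the version your code was written against

    def load_actor_metadata(datapoint_dir):
        """Load actor_metadata.json from a datapoint folder, failing fast on a version mismatch."""
        path = Path(datapoint_dir) / "actor_metadata.json"
        with open(path) as f:
            metadata = json.load(f)
        if metadata.get("version") != EXPECTED_VERSION:
            raise ValueError(
                f"Unexpected actor_metadata version {metadata.get('version')!r}; "
                f"expected {EXPECTED_VERSION}"
            )
        return metadata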

  • identity_label: Object. This object contains fields that describe the identity of the subject:
    • age: String. The age range of the actor in the scene. Valid values are young for ages 18-30; adult for ages 31-50; and older for ages 51+.

    • gender: String. The gender of the actor in the scene. Valid values are male and female.

    • ethnicity: String. The ethnicity of the actor in the scene. Valid values are african, hispanic, north_european, mediterranean, south_asian, and southeast_asian.

  • identity_id: String. The unique ID of the subject in this scene, identifying which of Datagen’s identities was used to generate the facial shape and texture (before age, gender, and ethnicity are applied).
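    Because the same identity can appear in many datapoints, a common precaution when training recognition models is to split train/validation sets by identity_id rather than by datapoint, so a face seen in training never appears in evaluation. A minimal sketch, assuming a collection of datapoint folders that each contain actor_metadata.json:

    import json
    from collections import defaultdict
    from pathlib import Path

    def split_by_identity(datapoint_dirs, val_fraction=0.2):
        """Group datapoints by identity_id, then split identities (not datapoints) into train/val."""
        by_identity = defaultdict(list)
        for d in datapoint_dirs:
            meta = json.loads((Path(d) / "actor_metadata.json").read_text())
            by_identity[meta["identity_id"]].append(d)
        identities = sorted(by_identity)  # sorted for determinism; shuffle with a seeded RNG if preferred
        n_val = max(1, int(len(identities) * val_fraction))
        val_ids = set(identities[:n_val])
        train = [d for i, dirs in by_identity.items() if i not in val_ids for d in dirs]
        val = [d for i, dirs in by_identity.items() if i in val_ids for d in dirs]
        return train, val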

  • facial_hair_included: Boolean. This value is true if the subject is male and you decided to include male facial hair when generating your dataset; otherwise it is false.

  • face_expression: Object. This object contains two fields that determine the expression on the subject’s face:
    • name: String. Describes the expression on the subject’s face. Valid values are happiness, sadness, surprise, anger, fear, contempt, disgust, mouth_open, and none. When you generated your dataset, you selected which expressions to include and the probability of each one.

    • intensity_level: Int. A measure of the strength of the above expression displayed on the subject’s face, from 1 (mild) to 5 (intense). If the subject’s expression is none, this value is always 1.

  • head_metadata: Object. This object contains a series of objects and fields that describe the position and orientation of the actor’s head.

    • head_root_location: Object. This object contains three Floats named x, y, and z, giving the 3D coordinates of the root of the head, defined as a point at the front of the neck. See About our coordinate systems for details.

      When you created your dataset, you either defined the head’s position explicitly or gave the Datagen platform a range of valid positions. In the latter case, our system uses a uniform distribution to select a random position in that range and places the actor’s head at that point.

    • head_rotation: Object. This object contains three Floats named pitch, yaw, and roll, measured in degrees; see About our coordinate systems for details. When you created your dataset, you either defined the head’s orientation explicitly or gave the Datagen platform a range of valid orientations. In the latter case, our system uses a uniform distribution to select a random orientation in that range. The head is rotated about the neck within realistic human physiological limits; the neck stretches realistically to follow those head movements but is not otherwise moved.
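      If you need the head's facing direction from these angles, you can rotate the default look direction by pitch and then yaw. The sketch below is an illustration only: it assumes yaw rotates about the z (up) axis, pitch rotates about the x (left-right) axis, positive pitch tilts the head up, and the default look direction is -y, matching the default values shown above. Verify the actual conventions in About our coordinate systems:

      import math

      def look_at_from_rotation(pitch_deg, yaw_deg):
          """Rotate the assumed default look direction (0, -1, 0) by pitch, then yaw."""
          p, y = math.radians(pitch_deg), math.radians(yaw_deg)
          # Pitch about the x axis tilts the look direction up or down (changes z).
          vx, vy, vz = 0.0, -math.cos(p), math.sin(p)
          # Yaw about the z axis turns the head left or right (mixes x and y).
          return (vx * math.cos(y) - vy * math.sin(y),
                  vx * math.sin(y) + vy * math.cos(y),
                  vz)

      At pitch = yaw = 0 this returns (0.0, -1.0, 0.0), which matches the default look_at_vector shown above.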

    • head_six_dof: Object. This object provides location and rotation data for the subject based on iBUG keypoint 28, which is the keypoint directly between the eyes (see Facial keypoints (iBUG)).

      • location: Object. This object contains three Floats named x, y, and z, giving the coordinates of keypoint 28 in global coordinates. See About our coordinate systems for details.

      • look_at_vector: Object. This object contains the normalized x, y, and z values of the vector that defines the direction that the head is pointed.

        Note

        Important: The look-at vector defines the orientation of the head, NOT the direction of the subject’s gaze. See the section below.

        • x: Float. At the head’s default position and orientation, the x axis runs from left to right. Lowering the value of x in the look-at vector turns the head to its right (the camera’s left); raising the value of x in the look-at vector turns the head to its left (the camera’s right). In the default position, the value of x will be very close to 0, because the head is looking straight at the camera, perpendicular to the x axis.

        • y: Float. At the head’s default position and orientation, the y axis runs from front to back. Lowering the value of y in the look-at vector turns the head towards the camera, while raising the value of y in the look-at vector turns the head away from the camera. In the default position, the value of y will be very close to -1, because the head is looking straight at the camera in the -y direction.

        • z: Float. At the head’s default position and orientation, the z axis runs from bottom to top. Lowering the value of z in the look-at vector turns the head downwards, while raising the value of z in the look-at vector turns the head upwards. In the default position, the value of z will be very close to 0, because the head is looking straight at the camera, perpendicular to the z axis.
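        Going the other way, approximate yaw and pitch angles can be recovered from the look-at vector. A minimal sketch under the same assumed conventions as the sketch above (default look direction -y, z up); verify against About our coordinate systems:

        import math

        def angles_from_look_at(v):
            """Recover (pitch, yaw) in degrees from a normalized look-at vector dict."""
            x, y, z = v["x"], v["y"], v["z"]
            yaw = math.degrees(math.atan2(x, -y))  # ~0 when looking straight at the camera
            pitch = math.degrees(math.asin(max(-1.0, min(1.0, z))))  # clamped for numeric safety
            return pitch, yaw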

  • accessories: List. A list of the accessories that the actor is wearing. When there are no accessories, the list is empty:

    "accessories": [],
    

    When there are accessories, each entry is an object whose format depends on its type:

    • glasses: A glasses accessory has the following format:

      {
         "type": "glasses",
         "style": "aviator",
         "location": "on_nose",
         "lens_color": "apple_green",
         "reflectivity": 7.0,
         "transparency": 5.0
      }
      

      It contains the following fields:

      • type (String): The type of accessory - in this case, glasses.

      • style (String): The style of the glasses. Valid values include aviator, oval, oversized, rimless, geometric, browline, full_frame, round, and cat_eye.

      • location (String): The location of the accessory on the face. For glasses, the only valid value is currently on_nose.

      • lens_color (String): The color of the glasses lenses. Valid values include light_yellow, apple_green, light_red, light_blue, black, yellow, blue, green, and red.

      • reflectivity (Float): The reflectivity of the glasses lenses, on a scale from 0 to 10. The higher the reflectivity, the more you can see the actor’s surroundings in the lenses. The value in this field interacts closely with the transparency field below.

      • transparency (Float): The transparency of the glasses lenses, from 0 to 10. The higher the transparency, the more you can see the actor’s eyes behind the lenses. The value in this field interacts closely with the reflectivity field above.

    • mask: A mask accessory has the following format:

      {
         "type": "mask",
         "style": "cloth",
         "location": "on_nose",
         "mask_color": "light_blue",
         "mask_texture": "woven",
         "roughness": 1.0
      }
      

      It contains the following fields:

      • type (String): The type of accessory - in this case, mask.

      • style (String): The style of the mask. Currently, the only valid value is cloth.

      • location (String): The location of the accessory on the face. Valid values include on_nose, on_mouth, and on_chin.

      • mask_color (String): The color of the mask. Valid values include red, silver, light_yellow, light_red, pink, light_blue, wheat, orange, rose_gold, brown, dark_green, and green.

      • mask_texture (String): The texture of the mask. Valid values include woven, cloth, and diamond_pattern.

      • roughness (Float): The roughness of the mask material, on a scale from 0 to 1. The higher the roughness, the less reflective the mask’s surface.
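    Because each accessory entry carries its own type field, a parser can dispatch on it and fall back gracefully on types it does not recognize. A minimal sketch; the summary strings are illustrative:

    def describe_accessories(metadata):
        """Summarize each accessory entry by dispatching on its "type" field."""
        summaries = []
        for acc in metadata.get("accessories", []):
            if acc["type"] == "glasses":
                summaries.append(f"{acc['style']} glasses with {acc['lens_color']} lenses")
            elif acc["type"] == "mask":
                summaries.append(f"{acc['style']} mask, {acc['mask_color']}, worn {acc['location']}")
            else:
                summaries.append(f"unrecognized accessory type: {acc['type']}")
        return summaries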

Using this ground truth, you can train your model to recognize faces, ages, ethnicities, genders, glasses, masks, and head position and orientation.

environment.json#

On this platform, the data in this file is not relevant to this modality.