Actor metadata#
Overview#
The ground truth exposed in this modality is the identity, behavior, position, and orientation of the face of the synthetic human subject at the (metaphorical) center of the rendered scene.
This modality consists of the following file:
Relevant file | Location |
---|---|
actor_metadata.json | Scene folder |
actor_metadata.json#
This file contains the parameters that were used to generate the human subject at the center of the scene. It uses the following format:
{
  "version": "1.0.0",
  "identity_label": {
    "age": "young",
    "gender": "female",
    "ethnicity": "southeast_asian"
  },
  "identity_id": "56fb1462-c05a-4f87-8913-7754ebbd0fd9",
  "facial_hair_included": false,
  "face_expression": {
    "name": "happiness",
    "intensity_level": 2
  },
  "head_metadata": {
    "head_root_location": {
      "x": 0.0,
      "y": 0.0,
      "z": 0.0
    },
    "head_rotation": {
      "pitch": 0.0,
      "yaw": 0.0,
      "roll": 0.0
    },
    "head_six_dof": {
      "location": {
        "x": -0.0020751950796693563,
        "y": -0.06417405605316162,
        "z": 0.14418649673461914
      },
      "look_at_vector": {
        "x": 8.74227803627227e-08,
        "y": -0.9999999999999962,
        "z": 9.553428304548457e-16
      }
    }
  }
}
Note
There are additional fields in actor_metadata.json that are part of the Eye keypoints modality.
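As an illustration, a minimal Python sketch for reading this file from a scene folder might look like the following. The folder path and the expected version value are placeholder assumptions, not part of the platform's API.

import json
from pathlib import Path

EXPECTED_VERSION = "1.0.0"  # assumption: the version this code was written against

def load_actor_metadata(scene_dir):
    """Load actor_metadata.json from a scene folder and verify its version."""
    path = Path(scene_dir) / "actor_metadata.json"
    with open(path, "r", encoding="utf-8") as f:
        metadata = json.load(f)

    # Fail loudly if the file format is newer or older than this code expects.
    if metadata.get("version") != EXPECTED_VERSION:
        raise ValueError(
            f"Unexpected actor_metadata version {metadata.get('version')!r} "
            f"(expected {EXPECTED_VERSION!r}) in {path}"
        )
    return metadata

# Example usage (hypothetical scene folder name):
# metadata = load_actor_metadata("dataset/scene_0001")
# print(metadata["identity_label"]["ethnicity"], metadata["face_expression"]["name"])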
Objects and fields:#
- version: String. Version tracking for this file. Whenever you access this file in a datapoint, check that the version matches what you expect; otherwise its format and fields may not be recognized.
- identity_label: Object. This object contains three fields that describe the identity of the subject:
  - age: String. The age range of the actor in the scene. Valid values are “young” for ages 18-30, “adult” for ages 31-50, and “older” for ages 51+.
  - gender: String. The gender of the actor in the scene. Valid values are “male” and “female”.
  - ethnicity: String. The ethnicity of the actor in the scene. Valid values are “african”, “hispanic”, “north_european”, “mediterranean”, “south_asian”, and “southeast_asian”.
- identity_id: String. The unique ID of the subject in this scene, which identifies which of Datagen’s identities was used to generate the facial shape and texture (before age, gender, and ethnicity are applied).
- facial_hair_included: Boolean. This value is true if the subject is male and you chose to include male facial hair when generating your dataset; otherwise it is false.
- face_expression: Object. This object contains two fields that determine the expression on the subject’s face:
  - name: String. Describes the expression on the subject’s face. Valid values are “happiness”, “sadness”, “surprise”, “anger”, “fear”, “contempt”, “disgust”, “mouth_open”, and “none”. When you generated your dataset, you selected which expressions to include and the probability of each one.
  - intensity_level: Int. A measure of the strength of the above expression displayed on the subject’s face, from 1 (mild) to 5 (intense). If the expression name is “none”, this value is always 1.
- head_metadata: Object. This object contains a series of objects and fields that describe the head’s position and orientation in space:
  - head_root_location: Object. This object contains three Floats named “x”, “y”, and “z”, giving the 3D coordinates of the head; see About our coordinate systems for details. When you created your dataset, you gave the Datagen platform a range of valid positions for the head. Our system selected a random position within that range, using a uniform distribution, when it generated this human subject.
  - head_rotation: Object. This object contains three Floats named “pitch”, “yaw”, and “roll”, measured in degrees; see About our coordinate systems for details. When you created your dataset, you gave the Datagen platform a range of valid orientations for the head. Our platform selected a random orientation within that range, using a uniform distribution, when it generated this human subject. For these orientations, the head is rotated about the neck within realistic human physiological limits, and the neck is realistically stretched by those head movements but not otherwise moved.
  - head_six_dof: Object. This object provides location and rotation data for the subject based on iBUG keypoint 28, which is the keypoint directly between the eyes (see Facial keypoints (iBUG)).
    - location: Object. This object contains three Floats named “x”, “y”, and “z”, giving the coordinates of keypoint 28 in global coordinates. See About our coordinate systems for details.
    - look_at_vector: Object. This object contains the normalized x, y, and z values of the vector that defines the direction the face is pointed.
Note
Important: The look-at vector defines the orientation of the face – NOT the direction of the subject’s gaze. See the section below.
  - x: Float. At the face’s default position and orientation, the x axis runs from left to right. Lowering the value of x in the look-at vector turns the face to its right (the camera’s left); raising the value of x turns the face to its left (the camera’s right). In the default position, the value of x will be very close to 0, because the face is looking straight at the camera, perpendicular to the x axis.
  - y: Float. At the face’s default position and orientation, the y axis runs from front to back. Lowering the value of y in the look-at vector turns the face towards the camera, while raising the value of y turns the face away from the camera. In the default position, the value of y will be very close to -1, because the face is looking straight at the camera in the -y direction.
  - z: Float. At the face’s default position and orientation, the z axis runs from bottom to top. Lowering the value of z in the look-at vector turns the face downwards, while raising the value of z turns the face upwards. In the default position, the value of z will be very close to 0, because the face is looking straight at the camera, perpendicular to the z axis.
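If you want approximate yaw and pitch angles for the face, they can be derived from the look-at vector. The sketch below assumes the conventions described above (default look-at direction roughly (0, -1, 0), x turning the face left/right, z turning it up/down); the exact sign conventions are assumptions here and should be checked against About our coordinate systems.

import math

def face_yaw_pitch_degrees(actor_metadata):
    """Approximate face yaw/pitch from the head_six_dof look-at vector.

    Assumes the axis conventions described above; sign conventions are
    assumptions and should be verified against the coordinate-system docs.
    """
    v = actor_metadata["head_metadata"]["head_six_dof"]["look_at_vector"]
    x, y, z = v["x"], v["y"], v["z"]

    # Horizontal angle away from the default -y viewing direction.
    yaw = math.degrees(math.atan2(x, -y))
    # Vertical angle above/below the horizontal plane (vector is normalized).
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return yaw, pitch

# yaw, pitch = face_yaw_pitch_degrees(metadata)  # metadata loaded as shown earlier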
Using this ground truth, you can train your model to recognize faces, ages, ethnicities, genders, and facial position and orientation, as well as verify the network's predictions against the true values.
Because the position of the subject’s face in the scene remains the same regardless of where the cameras are, only one actor_metadata.json file per scene is needed.
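For example, a minimal sketch of turning the categorical fields into integer training targets might look like the following; the class orderings are arbitrary choices made for illustration, not part of the file format.

# Arbitrary label orderings chosen for illustration; not part of the file format.
AGE_CLASSES = ["young", "adult", "older"]
GENDER_CLASSES = ["female", "male"]
ETHNICITY_CLASSES = [
    "african", "hispanic", "north_european",
    "mediterranean", "south_asian", "southeast_asian",
]

def identity_targets(actor_metadata):
    """Map identity_label fields to integer class indices for training."""
    label = actor_metadata["identity_label"]
    return {
        "age": AGE_CLASSES.index(label["age"]),
        "gender": GENDER_CLASSES.index(label["gender"]),
        "ethnicity": ETHNICITY_CLASSES.index(label["ethnicity"]),
    }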