File structure
Note
When you download a dataset from our platform, it is delivered as a compressed .tar.gz archive that must be extracted using a tool such as tar, gzip, or 7-Zip.
If you download your dataset directly through our SDK, the .tar.gz archive is extracted for you automatically after download.
This page guides you through the structure of a downloaded dataset. It lists each file, briefly describes the data it contains, and identifies the modalities that data belongs to. (Some files contain data belonging to more than one modality, and some modalities are split across multiple files.)
The structure of a dataset differs slightly depending on the generator you used: datasets from the Faces platform are organized into datapoints, while datasets from the Humans in Context (HIC) platform are organized into scenes.
Datapoints
In datasets generated by the Faces platform, each datapoint is given its own folder. A datapoint consists of a single RGB image along with all of its metadata.
The folders are named datapoint_00001, datapoint_00002, datapoint_00003, and so on:
dataset
├---->datapoint_00001
├---->datapoint_00002
├---->datapoint_00003
etc.
Each of these datapoints describes a unique combination of scene, camera, and lighting scenario. If you asked the platform to render the same scene under several lighting scenarios and from several camera angles, each combination appears as its own datapoint. For example, if you ordered 5 scenes, each with 4 cameras and 3 lighting scenarios, you will receive a total of 5 × 4 × 3 = 60 datapoints.
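As a minimal sketch (the dataset path below is hypothetical), you can enumerate the datapoint folders with standard Python and confirm that the count matches scenes × cameras × lighting scenarios:

from pathlib import Path

# Hypothetical path to an extracted dataset; substitute your own.
dataset_dir = Path("dataset")

# Each datapoint has its own folder: datapoint_00001, datapoint_00002, and so on.
datapoints = sorted(dataset_dir.glob("datapoint_*"))

# For an order of 5 scenes x 4 cameras x 3 lighting scenarios,
# we expect 5 * 4 * 3 = 60 datapoint folders.
print(f"Found {len(datapoints)} datapoints")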
The datapoint folder contains files that are not dependent on the camera’s location and orientation within the scene:
actor_metadata.json: The parameters that define the actor in the scene. This file contains information on the identity and facial expression of the actor as well as the location and rotation of the actor’s head. It also includes data regarding the actor’s eyes. See the Actor metadata and Eye keypoints modalities.
datapoint_request.json: A file that contains the full set of instructions necessary to reproduce this datapoint. If you want, you can load and edit the request in our SDK using the datagen.api.load() function, and then upload it to our platform to generate it (see the sketch after this list).
lights_metadata.json: Settings that are used for special lighting conditions such as near-infrared lighting. This file exists only if special lighting was used in this datapoint. See the Rendered image modality.
semantic_segmentation_metadata.json: A reference file for the colors that are used in semantic segmentation images. See the Semantic segmentation modality.
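For example, a minimal sketch of reloading a datapoint request through the SDK might look like the following. The exact signature of datagen.api.load() and the structure of the object it returns are assumptions here; consult the SDK reference for details.

import datagen

# Load the saved request for one datapoint (path is an example).
# Assumption: datagen.api.load() accepts a path to a datapoint_request.json file.
request = datagen.api.load("dataset/datapoint_00001/datapoint_request.json")

# Inspect or edit the request here, then upload it to the platform
# to generate the datapoint again.
print(request)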
Cameras
Most of the modalities are highly dependent on how you view the scene - in other words, the camera’s location and settings. The camera location and settings can affect the pixel coordinates of facial and body keypoints, the distances between the camera lens and parts of the actor’s head and body, and so on.
Camera-level information is stored in a subfolder of the top-level datapoint folder, as follows:
dataset
├---->datapoint_00001
      ├---->camera
├---->datapoint_00002
      ├---->camera
├---->datapoint_00003
      ├---->camera
Each camera folder contains the following camera-level information:
camera_metadata.json: The camera’s intrinsic and extrinsic parameters, such as location, orientation, resolution, and aspect ratio. See the Camera metadata modality.
depth.exr: A depth map of the rendered image, giving the distance between the camera lens and each pixel that depicts part of the actor in the scene. See the Depth map modality.
environment.json: Information regarding the lighting scenario that was used when generating this datapoint. See the Rendered image modality.
face_bounding_box.json: The coordinates of the corners of the bounding box surrounding the subject’s face in the image. See the Bounding box modality.
hdri_map.exr: A flattened image of the full 360° HDRI background in the scene, if any. See the HDRI map modality.
infrared_spectrum.png: A rendered image of the actor in near-infrared lighting. Only present if this datapoint used the NIR lighting scenario. See the Rendered image modality.
normal_maps.exr: An image of the actor in which each pixel is recolored based on the direction of the normal vector coming out of the surface at that location. See the Normal map modality.
semantic_segmentation.png: An image of the actor in which each pixel is recolored based on the object or body part that it belongs to. See the Semantic segmentation modality.
visible_spectrum.png: A rendered image of the subject, using the lighting scenario you selected. Only present if you defined your camera to use the visible spectrum. See the Rendered image modality.
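As a rough sketch of reading these per-camera files: the JSON files parse with the standard json module, and the EXR maps can be read with an EXR-capable library such as OpenCV (the paths below are examples; recent OpenCV builds also require an environment variable to enable EXR support).

import json
import os

# Recent OpenCV builds disable EXR reading unless this is set before import.
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"
import cv2

camera_dir = "dataset/datapoint_00001/camera"  # example path

# Camera intrinsics/extrinsics and the face bounding box are plain JSON.
with open(os.path.join(camera_dir, "camera_metadata.json")) as f:
    camera_metadata = json.load(f)
with open(os.path.join(camera_dir, "face_bounding_box.json")) as f:
    face_bounding_box = json.load(f)

# The depth map is a floating-point EXR; the rendered image is a PNG.
depth = cv2.imread(os.path.join(camera_dir, "depth.exr"), cv2.IMREAD_UNCHANGED)
rgb = cv2.imread(os.path.join(camera_dir, "visible_spectrum.png"))

print(depth.shape if depth is not None else "depth.exr not found")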
Keypoints
The next layer contains a set of JSON files that list the locations of keypoints on the actor’s head and body, in both 2D and 3D coordinates:
dataset
├---->datapoint_00001
      ├---->camera
            ├---->key_points
├---->datapoint_00002
      ├---->camera
            ├---->key_points
├---->datapoint_00003
      ├---->camera
            ├---->key_points
body_key_points.json: The 2D and 3D locations of 27 keypoints that describe the location of points on the actor’s body. See the Body keypoints modality.
ears_key_points.json: The 2D and 3D locations of the 55 keypoints in the iBUG ear keypoint standard. See the Ear keypoints modality.
face_dense_key_points.json: The 2D and 3D locations of the 468 keypoints in Google’s MediaPipe facial keypoint standard. See the Facial keypoints (MediaPipe) modality.
face_standard_key_points.json: The 2D and 3D locations of the 68 keypoints in the iBUG facial keypoint standard. See the Facial keypoints (iBUG) modality.
feet_key_points.json: The 2D and 3D locations of the 6 keypoints in the OpenPose foot keypoint standard. See the Foot keypoints modality.
hands_key_points.json: The 2D and 3D locations of the 42 keypoints (21 for each hand) in Google’s MediaPipe hand keypoint standard. See the Hand keypoints modality.
head_key_points.json: The 2D and 3D locations of the 81 keypoints in a standard developed by Datagen that describes the structure of the human head. See the Head keypoints modality.
all_key_points.json: A collection of all of the above sets of keypoints in a single file for convenience.
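These keypoint files are plain JSON, so a minimal, schema-agnostic sketch is simply to parse them and inspect the structure (the exact field layout is described on each modality page and is not assumed here):

import json
from pathlib import Path

key_points_dir = Path("dataset/datapoint_00001/camera/key_points")  # example path

# all_key_points.json bundles every keypoint set listed above.
with open(key_points_dir / "all_key_points.json") as f:
    all_key_points = json.load(f)

# Print the top-level structure rather than assuming a specific schema.
if isinstance(all_key_points, dict):
    print(list(all_key_points.keys()))
else:
    print(type(all_key_points), len(all_key_points))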
Scenes
In datasets generated by the Humans in Context (HIC) platform, each animated sequence is referred to as a “scene”. Each scene is given its own folder, which is named scene_00001, scene_00002, scene_00003, and so on:
dataset
├---->scene_00001
├---->scene_00002
├---->scene_00003
etc.
The same scene might be rendered under multiple lighting scenarios and/or from multiple camera angles (you can request this on the Datagen platform), but these renders are all considered part of the same scene and are included in the same scene folder.
At the level of the scene folder, you can find data that is independent both of time and of the location/orientation of the camera in the scene:
semantic_segmentation_metadata.json: A reference file for the colors that are used in semantic segmentation images. See the Semantic segmentation modality.
Frames
The second and third levels divide the animation sequences into frames. The number of frames depends on the FPS you selected when you generated the sequence; since each sequence is 10 seconds long, the number of frame folders is 10*FPS:
dataset
├---->scene_00001
      ├---->frames
            ├---->001
            ├---->002
            ├---->003
            etc.
├---->scene_00002
      ├---->frames
            ├---->001
            ├---->002
            ├---->003
            etc.
├---->scene_00003
      ├---->frames
            ├---->001
            ├---->002
            ├---->003
            etc.
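A minimal sketch of checking the frame count for one scene against the 10 × FPS rule (the scene path and FPS value below are placeholders):

from pathlib import Path

scene_dir = Path("dataset/scene_00001")  # example path
fps = 30  # placeholder: the FPS you selected for this sequence

frame_dirs = [d for d in (scene_dir / "frames").iterdir() if d.is_dir()]
expected = 10 * fps  # each sequence is 10 seconds long

print(f"{len(frame_dirs)} frame folders, expected {expected}")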
Inside each frame folder (001, 002, 003, etc.) are files that contain data that is not affected by camera location:
lights_metadata.json: Settings that are used for special lighting conditions such as near-infrared lighting. This file exists only if special lighting was used in this scene. See the Rendered image modality.
Cameras
The next level is a subfolder that gives the name of the camera or cameras that are viewing this sequence:
dataset
├---->scene_00001
      ├---->frames
            ├---->001
                  ├---->media_dashboard_camera_cabin_view
            ├---->002
                  ├---->media_dashboard_camera_cabin_view
            ├---->003
                  ├---->media_dashboard_camera_cabin_view
├---->scene_00002
      ├---->frames
            ├---->001
                  ├---->top_center_wheel_camera
            ├---->002
                  ├---->top_center_wheel_camera
            ├---->003
                  ├---->top_center_wheel_camera
            etc.
├---->scene_00003
      ├---->frames
            ├---->001
                  ├---->ceiling_center
                  ├---->laptop_camera_close
                  ├---->ne_corner
                  etc.
            ├---->002
                  ├---->ceiling_center
                  ├---->laptop_camera_close
                  ├---->ne_corner
                  etc.
            ├---->003
                  ├---->ceiling_center
                  ├---->laptop_camera_close
                  ├---->ne_corner
                  etc.
This folder contains most of the modalities, since they depend on the camera's location and settings as well as on which frame in the sequence you are viewing. The pixel coordinates of facial and body keypoints, the distances between the camera lens and parts of the actor's head and body, and so on all change depending on the camera's location and on the frame being viewed.
Each named camera folder contains the following files:
actor_metadata.json: The parameters that define the actor in the scene. This file contains information on the identity and facial expression of the actor as well as the location and rotation of the actor’s head. It also includes data regarding the actor’s eyes. See the Actor metadata and Eye keypoints modalities.
camera_metadata.json: The camera’s intrinsic and extrinsic parameters, such as location, orientation, resolution, and aspect ratio. See the Camera metadata modality.
center_of_geometry.json: The center of mass of important objects in the scene, in both 2D and 3D coordinates. See the Center of geometry modality.
depth.exr: A depth map of the rendered image, giving the distance between the camera lens and each pixel that depicts an object in the scene. See the Depth map modality.
environment.json: Information about the environment of the motion sequence: the specific animation, lighting scenario, and any clutter in the scene. See the Rendered image modality.
infrared_spectrum.png: A rendered image of the scene in near-infrared lighting. Only present if this scene used the NIR lighting scenario. See the Rendered image modality.
normal_maps.exr: An image of the actor in which each pixel is recolored based on the direction of the normal vector coming out of the surface at that location. See the Normal map modality.
semantic_segmentation.png: An image of the scene in which each pixel is recolored based on the object or body part that it belongs to. See the Semantic segmentation modality.
visible_spectrum_day.png, visible_spectrum_evening.png, visible_spectrum_night.png: Rendered images of the scene, one per lighting scenario you selected. These are only present if this scene used the visible spectrum. See the Rendered image modality.
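Because these per-camera files repeat once per frame, a common task is to walk one camera's folder across every frame of a scene. The sketch below collects the daytime renders for a single camera; the scene path and the camera name top_center_wheel_camera are examples taken from the trees above.

from pathlib import Path

scene_dir = Path("dataset/scene_00002")    # example scene
camera_name = "top_center_wheel_camera"    # example camera name

# One rendered image per frame for this camera and lighting scenario.
day_renders = [
    frame / camera_name / "visible_spectrum_day.png"
    for frame in sorted((scene_dir / "frames").iterdir())
    if (frame / camera_name / "visible_spectrum_day.png").exists()
]

print(f"Collected {len(day_renders)} day renders for {camera_name}")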
Keypoints
The next layer contains a set of JSON files that list the locations of keypoints on the actor’s head and body, in both 2D and 3D coordinates:
dataset
├---->scene_00001
      ├---->frames
            ├---->001
                  ├---->media_dashboard_camera_cabin_view
                        ├---->key_points
            ├---->002
                  ├---->media_dashboard_camera_cabin_view
                        ├---->key_points
            ├---->003
                  ├---->media_dashboard_camera_cabin_view
                        ├---->key_points
├---->scene_00002
      ├---->frames
            ├---->001
                  ├---->top_center_wheel_camera
                        ├---->key_points
            ├---->002
                  ├---->top_center_wheel_camera
                        ├---->key_points
            ├---->003
                  ├---->top_center_wheel_camera
                        ├---->key_points
├---->scene_00003
      ├---->frames
            ├---->001
                  ├---->ceiling_center
                        ├---->key_points
                  ├---->laptop_camera_close
                        ├---->key_points
                  ├---->ne_corner
                        ├---->key_points
            ├---->002
                  ├---->ceiling_center
                        ├---->key_points
                  ├---->laptop_camera_close
                        ├---->key_points
                  ├---->ne_corner
                        ├---->key_points
            ├---->003
                  ├---->ceiling_center
                        ├---->key_points
                  ├---->laptop_camera_close
                        ├---->key_points
                  ├---->ne_corner
                        ├---->key_points
body_key_points.json: The 2D and 3D locations of 27 keypoints that describe the location of points on the actor’s body. See the Body keypoints modality.
eyes_key_points.json: The 2D and 3D locations of 58 keypoints that define locations in and on the surface of the eye. See the Eye keypoints modality.
face_dense_key_points.json: The 2D and 3D locations of the 468 keypoints in Google’s MediaPipe facial keypoint standard. See the Facial keypoints (MediaPipe) modality.
face_standard_key_points.json: The 2D and 3D locations of the 68 keypoints in the iBUG facial keypoint standard. See the Facial keypoints (iBUG) modality.
feet_key_points.json: The 2D and 3D locations of the 6 keypoints in the OpenPose foot keypoint standard. See the Foot keypoints modality.
hands_key_points.json: The 2D and 3D locations of the 42 keypoints (21 for each hand) in Google’s MediaPipe hand keypoint standard. See the Hand keypoints modality.
all_key_points.json: A collection of all of the above sets of keypoints in a single file for convenience.
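Similarly, a per-frame keypoint sequence for one camera can be assembled by loading the same JSON file from every frame folder. The sketch below keeps each file's parsed JSON as-is rather than assuming a particular schema; the scene path and camera name are examples.

import json
from pathlib import Path

scene_dir = Path("dataset/scene_00001")              # example scene
camera_name = "media_dashboard_camera_cabin_view"    # example camera name

# One entry per frame, in frame order; each entry is the parsed JSON content.
body_key_points_per_frame = []
for frame in sorted((scene_dir / "frames").iterdir()):
    kp_file = frame / camera_name / "key_points" / "body_key_points.json"
    if kp_file.exists():
        with open(kp_file) as f:
            body_key_points_per_frame.append(json.load(f))

print(f"Loaded body keypoints for {len(body_key_points_per_frame)} frames")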