Introduction to modalities#
Datagen’s datapoints are rendered images of synthetic human beings. Each image is accompanied by annotation files that highlight different aspects of the underlying ground truth. Because the images are computer-generated, we are able to provide pixel-perfect ground truth data.
The annotations are designed to bring different parts of the ground truth to the forefront: the locations of facial landmarks; a normal map that reconstructs the contours of the face; the direction of eye gaze; and so on. Each of these ground truths is called a modality.
This section of the documentation details the structure and format of each modality, so you can process the data properly and integrate it into your training set.
Table of Contents#
This section of the documentation provides two complementary ways to navigate the modalities:
Per file: The File structure page goes folder-by-folder, file-by-file through a downloaded dataset and tells you what data each file contains.
Per modality: Each of the following pages describes a single modality, and gives a list of files that contain data relevant to that modality:
Visual modalities#
These modalities are primarily made up of image files. They include the original rendered image itself as well as various recolored versions of that image, each serving a different purpose:
Rendered image: The original rendered image and information about the lighting environment
Depth map: A recolored image that shows the distance from each pixel in the image to the camera lens
Normal map: A recolored image that shows the direction of normal vectors coming out of each surface in the image
HDRI map: A low-resolution copy of the background image in the scene
Semantic segmentation: A recolored image that identifies semantic objects and parts of objects in the scene
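Because the depth map is stored as an image, its pixel values must be mapped back to metric distances before use. The sketch below assumes a linear encoding from 16-bit pixel values to a known near/far range; the actual file format, bit depth, and scale are defined by the dataset, so check the Depth map page before adapting this.

```python
import numpy as np

# Illustrative assumption: a depth map stored as a 16-bit grayscale image,
# where raw values [0, 65535] map linearly onto metric depth [near, far].
# This is NOT necessarily Datagen's actual encoding.
def decode_depth(raw: np.ndarray, near: float = 0.1, far: float = 10.0) -> np.ndarray:
    """Convert raw 16-bit pixel values to metric depth in meters."""
    return near + (raw.astype(np.float64) / 65535.0) * (far - near)

# Simulate a tiny 2x2 raw depth image instead of reading one from disk.
raw = np.array([[0, 65535],
                [32768, 16384]], dtype=np.uint16)
depth = decode_depth(raw)  # per-pixel distance to the camera, in meters
```

In practice you would load `raw` from the depth image file with an image library and take `near`/`far` from the dataset's documented range rather than hard-coding them.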
Keypoint modalities#
These modalities primarily consist of JSON files that list the 2D and 3D coordinates of landmarks in the scene:
About our coordinate systems: An explanation of the 3D and 2D coordinate systems that we use in our keypoint files
Facial keypoints (iBUG): The 68 landmarks that make up the iBUG facial landmark standard
Facial keypoints (MediaPipe): The 468 landmarks that make up the MediaPipe facial landmark standard
Body keypoints: Keypoints that identify body landmarks according to a standard developed by Datagen
Ear keypoints: The 55 landmarks for each ear that make up the iBUG ear landmark standard
Eye keypoints: Keypoints that identify eye landmarks according to a standard developed by Datagen
Foot keypoints: The 3 landmarks for each foot that make up the CMU Perceptual Computing Lab foot landmark standard
Hand keypoints: The 21 landmarks for each hand that make up the MediaPipe hand landmark standard
Head keypoints: Keypoints that identify non-facial head landmarks according to a standard developed by Datagen
Bounding box: The coordinates of a bounding box identifying which parts of the image contain a human face
Center of geometry: The coordinates of the center of geometry for important objects in the scene
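Since the keypoint modalities are plain JSON, they can be read with any standard JSON parser. The snippet below is a minimal sketch; the field names (`keypoints`, `name`, `pixel_2d`, `global_3d`) are illustrative assumptions, not Datagen's actual schema, so consult the individual modality pages for the real structure.

```python
import json

# Hypothetical keypoints file contents; real field names and layout are
# documented on each modality's page and may differ from this sketch.
sample = """
{
  "keypoints": [
    {"name": "nose_tip", "pixel_2d": [512.4, 300.1], "global_3d": [0.01, 0.12, 1.45]},
    {"name": "chin",     "pixel_2d": [510.0, 420.7], "global_3d": [0.00, 0.02, 1.47]}
  ]
}
"""

data = json.loads(sample)

# Index the 2D pixel coordinates by landmark name for easy lookup.
points_2d = {kp["name"]: tuple(kp["pixel_2d"]) for kp in data["keypoints"]}
```

The same pattern extends to the 3D coordinates; see "About our coordinate systems" for how the 2D and 3D values relate.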
Other modalities#
These modalities contain miscellaneous data about the actors and cameras:
Actor metadata: Information about the identity and behavior of the actor(s) in the scene
Camera metadata: Intrinsic and extrinsic parameters for the camera in the scene
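A common use of the camera metadata is projecting 3D keypoints into pixel coordinates with a standard pinhole model. The intrinsic values below are placeholders chosen for illustration; in practice you would read them from the camera metadata file.

```python
import numpy as np

# Assumed intrinsics for illustration only -- take the real focal lengths
# and principal point from the camera metadata file.
fx, fy = 1000.0, 1000.0   # focal lengths in pixels
cx, cy = 512.0, 512.0     # principal point in pixels
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# A 3D point already expressed in camera coordinates (meters). For a point
# in world coordinates, apply the extrinsic rotation and translation first.
point_cam = np.array([0.2, -0.1, 2.0])

# Pinhole projection: multiply by K, then divide by depth.
uvw = K @ point_cam
u, v = uvw[:2] / uvw[2]
```

This is the standard projection used throughout computer vision; the only dataset-specific parts are where the intrinsics and extrinsics live in the metadata file, which the Camera metadata page describes.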