Facial keypoints (MediaPipe)

Overview

The ground truth exposed in this modality is a set of facial landmarks that conforms to the MediaPipe standard.

This modality consists of the following file:

Relevant file                 Location
face_dense_keypoints.json     key_points folder

dense_keypoints.json

This file contains the location of each of the 468 landmarks on the face of the generated human subject in the scene. These keypoints conform to the MediaPipe facial landmark standard.

The locations (left) and numbering (right) of the 468 facial landmarks developed by MediaPipe. Sources: https://arxiv.org/abs/1907.06724 and https://github.com/google/mediapipe/blob/a908d668c730da128dfa8d9f6bd25d519d006692/mediapipe/modules/face_geometry/data/canonical_face_model_uv_visualization.png

The file uses the following format:

{
    "version": "2.0.0",
    "face": {
        "standard": {
            // 468 keypoints defining the contours of the face
        }
    }
}
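A minimal sketch of reading this layout with Python's standard json module, assuming the structure shown above. It checks the "version" field before using the data and confirms that the "standard" object holds keypoints named "0" through "467". The names extract_standard_keypoints, load_standard_keypoints, EXPECTED_VERSION and NUM_KEYPOINTS are hypothetical helpers for illustration, not part of any Datagen library.

```python
import json
from typing import Any, Dict

# Hypothetical constants: set EXPECTED_VERSION to the file version your
# code was written against.
EXPECTED_VERSION = "2.0.0"
NUM_KEYPOINTS = 468


def extract_standard_keypoints(data: Dict[str, Any],
                               expected_version: str = EXPECTED_VERSION) -> Dict[str, Any]:
    """Validate the top-level layout and return the "standard" keypoint map."""
    version = data.get("version")
    if version != expected_version:
        # A different version may use fields this code does not recognize.
        raise ValueError(f"unexpected file version {version!r}, expected {expected_version!r}")
    standard = data["face"]["standard"]
    expected_names = {str(i) for i in range(NUM_KEYPOINTS)}
    if set(standard) != expected_names:
        raise ValueError("keypoints are not named '0' through '467'")
    return standard


def load_standard_keypoints(path: str) -> Dict[str, Any]:
    """Read a keypoints JSON file from disk and validate it."""
    with open(path, "r", encoding="utf-8") as f:
        return extract_standard_keypoints(json.load(f))
```

Pass load_standard_keypoints the path of the keypoints file in a camera's key_points folder; the returned dictionary maps each keypoint name to its entry.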

Objects and fields:

  • version: String. The version of this file's format. Whenever you access this file in a datapoint, check that the version matches the one you expect; otherwise its format and fields may not be recognized.

  • standard: Object. This object contains 468 objects, each identifying a specific keypoint on the face. The keypoint objects are named “0” through “467”, and each one has the same format:

    "66": {
       "global_3d": {
          "x": 0.009309491142630577,
          "y": -0.043830640614032745,
          "z": 0.07293474674224854
       },
       "pixel_2d": {
          "x": 692,
          "y": 548
       },
       "is_visible": "true"
    }

    • global_3d: Object. Contains a set of three Floats giving the location of the keypoint in global coordinates: “x”, “y”, and “z”. See About our coordinate systems for details.

    • pixel_2d: Object. Contains a set of two Ints giving the x and y coordinates of the keypoint in the images produced by this camera. See About our coordinate systems for details.

    • is_visible: Boolean. Indicates whether the keypoint is visible in the images produced by this camera. The value is false if the keypoint is outside the frame, is on the side of the face turned away from the camera, or is blocked from the camera's view by another object. Otherwise the value is true.
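The per-keypoint fields described above map naturally onto a small typed record. The sketch below converts one entry, or a whole "standard" object, into such records. The sample entry stores is_visible as the string "true" even though the field is documented as a Boolean, so this sketch accepts either form; that tolerance is an assumption, not documented behavior. FaceKeypoint, parse_keypoint and parse_all are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union


@dataclass(frozen=True)
class FaceKeypoint:
    index: int                             # MediaPipe landmark number, 0..467
    global_3d: Tuple[float, float, float]  # x, y, z in global coordinates
    pixel_2d: Tuple[int, int]              # x, y in this camera's image
    is_visible: bool


def _as_bool(value: Union[bool, str]) -> bool:
    # Assumption: accept a JSON boolean or the strings "true"/"false".
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    raise ValueError(f"cannot interpret is_visible value {value!r}")


def parse_keypoint(name: str, entry: Dict[str, Any]) -> FaceKeypoint:
    """Convert one keypoint object (e.g. the "66" entry) into a FaceKeypoint."""
    g = entry["global_3d"]
    p = entry["pixel_2d"]
    return FaceKeypoint(
        index=int(name),
        global_3d=(float(g["x"]), float(g["y"]), float(g["z"])),
        pixel_2d=(int(p["x"]), int(p["y"])),
        is_visible=_as_bool(entry["is_visible"]),
    )


def parse_all(standard: Dict[str, Dict[str, Any]]) -> List[FaceKeypoint]:
    # Sort by numeric name; as strings, "10" would sort before "2".
    items = sorted(standard.items(), key=lambda kv: int(kv[0]))
    return [parse_keypoint(name, entry) for name, entry in items]
```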

Using this ground truth, you can train your model to recognize individual facial landmarks, trace the outline of important parts of the face, and measure how closely your network's predictions match the true landmark positions. See https://github.com/DatagenTech/dgutils/blob/master/Notebooks/keypoints_bbox.ipynb for examples of how to identify and display facial landmarks.
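One simple way to outline the face, or a chosen group of landmarks, is to take the bounding box of the visible pixel_2d coordinates. This is a hedged sketch, not the method used in the linked notebook; visible_keypoint_bbox and its indices parameter are hypothetical, and accepting is_visible as either a boolean or a "true"/"false" string is an assumption based on the sample entry above.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def visible_keypoint_bbox(standard: Dict[str, Dict[str, Any]],
                          indices: Optional[Iterable[int]] = None
                          ) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) over visible keypoints, or None.

    If indices is given, only those landmark numbers are considered,
    e.g. a subset of landmarks you pick for one facial feature.
    """
    wanted = None if indices is None else set(indices)
    xs, ys = [], []
    for name, entry in standard.items():
        if wanted is not None and int(name) not in wanted:
            continue
        visible = entry["is_visible"]
        # Assumption: the flag may be a JSON boolean or a "true"/"false" string.
        if isinstance(visible, str):
            visible = visible.lower() == "true"
        if visible:
            xs.append(entry["pixel_2d"]["x"])
            ys.append(entry["pixel_2d"]["y"])
    if not xs:
        return None  # nothing visible to this camera
    return min(xs), min(ys), max(xs), max(ys)
```

Skipping invisible keypoints keeps landmarks that are off-frame or occluded from stretching the box.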

Because the subject's coordinates in the scene do not depend on lighting conditions or background imagery, each camera needs only one MediaPipe keypoints file, regardless of the number of lighting scenarios. If the scene has more than one camera, each camera folder has its own MediaPipe keypoints file. The 3D coordinates are the same in every one of these files, but different keypoints may be visible to different cameras, and the 2D coordinates (where the landmarks appear in the rendered image) will differ.
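To illustrate the multi-camera behavior described above, the sketch below compares the "standard" objects from two cameras in the same scene: it reports keypoints whose 3D coordinates disagree (expected to be none) and keypoints visible to only one camera. compare_cameras and its tol parameter are hypothetical; the floating-point tolerance is an assumption, since the documentation only states that the 3D coordinates are the same.

```python
import math
from typing import Any, Dict, List


def compare_cameras(standard_a: Dict[str, Dict[str, Any]],
                    standard_b: Dict[str, Dict[str, Any]],
                    tol: float = 1e-6) -> Dict[str, List[int]]:
    """Compare the keypoint maps of two cameras viewing the same scene."""

    def visible(entry: Dict[str, Any]) -> bool:
        # Assumption: accept a JSON boolean or a "true"/"false" string.
        v = entry["is_visible"]
        return v.lower() == "true" if isinstance(v, str) else bool(v)

    mismatched_3d: List[int] = []
    only_a: List[int] = []
    only_b: List[int] = []
    for name in sorted(set(standard_a) & set(standard_b), key=int):
        a, b = standard_a[name], standard_b[name]
        # Global 3D coordinates should agree across cameras.
        if any(not math.isclose(a["global_3d"][k], b["global_3d"][k], abs_tol=tol)
               for k in ("x", "y", "z")):
            mismatched_3d.append(int(name))
        # Visibility is per camera and may legitimately differ.
        va, vb = visible(a), visible(b)
        if va and not vb:
            only_a.append(int(name))
        elif vb and not va:
            only_b.append(int(name))
    return {"mismatched_3d": mismatched_3d,
            "visible_only_in_a": only_a,
            "visible_only_in_b": only_b}
```

The 2D coordinates are deliberately not compared, because each camera projects the landmarks into its own image.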