Bounding box


The ground truth exposed in this modality is the bounding box that defines the extent of the face in each datapoint.

This modality consists of the following file:

Relevant file
Location: Camera folder

This file contains the minimum and maximum x and y values of a bounding box that precisely surrounds the subject’s face. It uses the following format:

    {
        "min_x": 535,
        "min_y": 340,
        "max_x": 964,
        "max_y": 684
    }

These four integers give you the coordinates of the four corners of the bounding box. In the example above, the bounding box has vertices at (535, 340), (535, 684), (964, 684), and (964, 340).
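The corner arithmetic can be checked with a small helper. This is a minimal sketch that assumes the file has already been parsed into a dictionary with the four keys shown in the sample; the function name `bbox_corners` is hypothetical and not part of any dataset tooling.

```python
import json


def bbox_corners(bbox: dict) -> list[tuple[int, int]]:
    """Return the four corners of an axis-aligned bounding box.

    Corner order: (min_x, min_y), (min_x, max_y), (max_x, max_y), (max_x, min_y),
    the same order in which the vertices are listed in the text.
    """
    return [
        (bbox["min_x"], bbox["min_y"]),
        (bbox["min_x"], bbox["max_y"]),
        (bbox["max_x"], bbox["max_y"]),
        (bbox["max_x"], bbox["min_y"]),
    ]


# The sample values from the file format description.
example = json.loads('{"min_x": 535, "min_y": 340, "max_x": 964, "max_y": 684}')
print(bbox_corners(example))
# [(535, 340), (535, 684), (964, 684), (964, 340)]
```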

The minimum and maximum values are taken from among the 468 MediaPipe keypoints under the keypoints_2d_coordinates object in dense_keypoints.json.
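Deriving the box from the keypoints amounts to taking the minimum and maximum x and y over all points. The sketch below assumes `keypoints_2d_coordinates` is a list of `[x, y]` pairs in pixel coordinates; the exact layout of `dense_keypoints.json` and the rounding rule used by the dataset are not stated here, so this version floors the minima and ceils the maxima so the integer box still contains every keypoint. The function names are hypothetical.

```python
import json
import math


def load_keypoints(path):
    """Read the 2D keypoints from a dense_keypoints.json file.

    Assumption: keypoints_2d_coordinates holds a list of [x, y] pairs.
    """
    with open(path) as f:
        data = json.load(f)
    return data["keypoints_2d_coordinates"]


def bbox_from_keypoints(points):
    """Compute the tight bounding box of a sequence of (x, y) keypoints.

    Minima are floored and maxima ceiled so that the integer box encloses
    every point; the dataset's own rounding may differ.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "min_x": math.floor(min(xs)),
        "min_y": math.floor(min(ys)),
        "max_x": math.ceil(max(xs)),
        "max_y": math.ceil(max(ys)),
    }


# Three stand-in keypoints (a real file would contain 468).
sample = [(535.2, 400.0), (964.0, 340.7), (700.5, 683.1)]
print(bbox_from_keypoints(sample))
# {'min_x': 535, 'min_y': 340, 'max_x': 964, 'max_y': 684}
```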

Using this ground truth, you can train your model to predict precise face bounding boxes and evaluate the network's predictions against the true boxes.
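One common way to score a predicted box against the ground truth is intersection over union (IoU). The text does not prescribe a metric, so IoU here is an illustrative choice, not the dataset's evaluation method. This sketch treats coordinates as continuous (area is width times height); some benchmarks add one pixel to each dimension instead.

```python
def iou(a: dict, b: dict) -> float:
    """Intersection over union of two boxes given as min_x/min_y/max_x/max_y."""
    # Overlap along each axis, clamped at zero when the boxes do not meet.
    ix = max(0, min(a["max_x"], b["max_x"]) - max(a["min_x"], b["min_x"]))
    iy = max(0, min(a["max_y"], b["max_y"]) - max(a["min_y"], b["min_y"]))
    inter = ix * iy
    area_a = (a["max_x"] - a["min_x"]) * (a["max_y"] - a["min_y"])
    area_b = (b["max_x"] - b["min_x"]) * (b["max_y"] - b["min_y"])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = {"min_x": 535, "min_y": 340, "max_x": 964, "max_y": 684}
# A hypothetical prediction shifted 10 pixels to the right.
prediction = {"min_x": 545, "min_y": 340, "max_x": 974, "max_y": 684}
print(round(iou(ground_truth, prediction), 4))
```

An IoU of 1.0 means the prediction matches the ground truth exactly; a threshold such as 0.5 is often used to count a detection as correct.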

Because the position of the subject’s face in the image does not change with lighting conditions or background imagery, each camera needs only one bounding box file, regardless of the number of lighting scenarios. If the scene has more than one camera, each camera folder has its own bounding box file.
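Since every lighting variant from a camera shares that camera's box, images can be mapped to their box file by camera folder alone. This sketch assumes each image sits directly inside its camera folder next to the bounding box file; `BBOX_FILENAME` is a hypothetical placeholder, not the dataset's actual file name, and the folder names in the example are made up.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical placeholder: substitute the dataset's real bounding box file name.
BBOX_FILENAME = "bounding_box.json"


def bbox_path_for_image(image_path) -> Path:
    """Return the bounding box file for an image, assuming it sits in the camera folder."""
    return Path(image_path).parent / BBOX_FILENAME


def group_images_by_bbox(image_paths):
    """Group image paths by the bounding box file they share (one per camera)."""
    groups = defaultdict(list)
    for p in image_paths:
        groups[bbox_path_for_image(p)].append(Path(p))
    return dict(groups)


images = [
    "scene/cam_0/light_a.png",
    "scene/cam_0/light_b.png",  # different lighting, same camera, same box
    "scene/cam_1/light_a.png",
]
for bbox_file, imgs in group_images_by_bbox(images).items():
    print(bbox_file, len(imgs))
```

Loading each box file once per camera and reusing it for every lighting scenario avoids redundant reads.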