Bounding box


The ground truth exposed in this modality is the bounding box that defines the extent of a human face in an image.

This modality consists of the following file:

Relevant file



Camera folder


This file contains the minimum and maximum x and y values of a bounding box that precisely surrounds an actor’s face. It has the following format:

    {
      "version": "1.0.0",
      "min_x": 535,
      "min_y": 340,
      "max_x": 964,
      "max_y": 684
    }
  • version: String. Version tracking for this file. Whenever you access this file in a datapoint, check that the version matches the one your code expects; otherwise your code may not recognize the file's format and fields.

  • min_x, min_y, max_x, max_y: Ints. Together, these values give the coordinates of the four corners of the bounding box. In the example above, the bounding box has vertices at (535, 340), (535, 684), (964, 684), and (964, 340).

    The minimum and maximum values are taken from among the 468 MediaPipe keypoints under the keypoints_2d_coordinates object in face_dense_key_points.json.
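
The fields described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names, the EXPECTED_VERSION constant, and the choice to raise an error on a version mismatch are all assumptions made for illustration. The box_from_points helper only illustrates the min/max rule and takes plain (x, y) pairs, because the exact layout of keypoints_2d_coordinates is not described here.

```python
import json

# Assumption: the version this parsing code was written against.
EXPECTED_VERSION = "1.0.0"


def load_bounding_box(text):
    """Parse bounding-box JSON text and return (min_x, min_y, max_x, max_y)."""
    data = json.loads(text)
    # Check the version before trusting the rest of the fields.
    if data.get("version") != EXPECTED_VERSION:
        raise ValueError(f"unexpected bounding box version: {data.get('version')!r}")
    return data["min_x"], data["min_y"], data["max_x"], data["max_y"]


def corners(box):
    """Return the four vertices of the box, in the order used in the text."""
    min_x, min_y, max_x, max_y = box
    return [(min_x, min_y), (min_x, max_y), (max_x, max_y), (max_x, min_y)]


def box_from_points(points):
    """Take the min and max x and y over an iterable of (x, y) pairs.

    Hypothetical helper: it illustrates how a box follows from a set of
    2D keypoints, without assuming how those keypoints are stored on disk.
    """
    pts = list(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)


example = '{"version": "1.0.0", "min_x": 535, "min_y": 340, "max_x": 964, "max_y": 684}'
box = load_bounding_box(example)
print(corners(box))  # [(535, 340), (535, 684), (964, 684), (964, 340)]
```

In practice you would pass the contents of the bounding box file to load_bounding_box instead of the inline example string.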

Using this ground truth, you can train your model to draw precise bounding boxes around the human face.
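
One common way to compare a model's predicted box against this ground truth is intersection over union (IoU). The sketch below is only an illustration of that general metric, not something prescribed by this modality. The function name iou is hypothetical, and the code assumes that boxes are (min_x, min_y, max_x, max_y) tuples whose values are treated as continuous edge coordinates.

```python
def iou(box_a, box_b):
    """Intersection over union of two (min_x, min_y, max_x, max_y) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0


ground_truth = (535, 340, 964, 684)
predicted = (545, 350, 964, 684)  # made-up prediction for illustration
score = iou(ground_truth, predicted)
print(f"IoU: {score:.3f}")
```

A score of 1.0 means the predicted box matches the ground truth exactly, and 0.0 means the two boxes do not overlap.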