Depth map


The ground truth exposed in this modality is the distance between the camera lens and each surface visible in the image.

This modality consists of a single file, stored in the camera folder.


This file contains a depth map of the same view as the visual spectrum image. It is stored in the OpenEXR file format using 32-bit floating-point values.


A depth map of a human face (left) and the corresponding visual spectrum image (right)

To create the depth map, the color value of each pixel is replaced with a number representing the 3D distance between the camera lens and the surface shown in that pixel. When the map is displayed as an image, the differences between these values are too small to be seen by the naked eye, but most image processing tools can extract and compare them.


The same depth map, with the differences emphasized programmatically
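The documentation does not say how the emphasized view was produced. One common approach is min-max normalization: stretch the range of valid depths across the full 0 to 255 range of an 8-bit image. The following is a minimal sketch of that approach, assuming the depth map is already a 2D NumPy float array; the function name `depth_to_preview` is hypothetical, and rendering non-finite pixels as black is an arbitrary choice.

```python
import numpy as np

def depth_to_preview(depth_map, invert=True):
    """Hypothetical helper: stretch valid depths to 0..255 for display.

    With invert=True, nearer surfaces are drawn brighter.
    Non-finite pixels (NaN/inf) are left at 0 (black).
    """
    depth = np.asarray(depth_map, dtype=np.float64)
    valid = np.isfinite(depth)
    preview = np.zeros(depth.shape, dtype=np.uint8)
    if not valid.any():
        return preview
    lo, hi = depth[valid].min(), depth[valid].max()
    span = hi - lo if hi > lo else 1.0  # avoid division by zero on a flat map
    scaled = (depth[valid] - lo) / span  # 0.0 = nearest, 1.0 = farthest
    if invert:
        scaled = 1.0 - scaled
    preview[valid] = np.round(scaled * 255).astype(np.uint8)
    return preview
```

The resulting array is an ordinary 8-bit grayscale image, so it can be saved with any standard image writer, such as `cv2.imwrite`.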

Storing the values as 32-bit floating-point numbers preserves high-precision measurements in the pixel data. Using this ground truth, you can train your model to learn the shape of the human face and the distance from the camera to each part of the face, and verify the network's accuracy against the true values.
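To verify a network against this ground truth, one option is to compare its predicted depth map with the ground-truth depth map pixel by pixel. The following is a minimal sketch, assuming the prediction has the same shape and units as the ground truth. The metrics (mean absolute error, root-mean-square error, and mean absolute relative error) are common choices for depth evaluation, not ones prescribed by this documentation, and `depth_errors` is a hypothetical name.

```python
import numpy as np

def depth_errors(predicted, ground_truth):
    """Hypothetical helper: compare a predicted depth map with the ground truth."""
    pred = np.asarray(predicted, dtype=np.float64)
    gt = np.asarray(ground_truth, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    # Score only pixels where both maps hold a finite value and the
    # ground-truth distance is positive (needed for the relative error)
    valid = np.isfinite(pred) & np.isfinite(gt) & (gt > 0)
    if not valid.any():
        raise ValueError("no valid pixels to compare")
    diff = pred[valid] - gt[valid]
    return {
        "mae": float(np.mean(np.abs(diff))),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "abs_rel": float(np.mean(np.abs(diff) / gt[valid])),
    }
```

Because the errors are expressed in the same units as the stored distances, they can be read directly as how far off the network's estimates are.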

Because the depth map of the subject does not change with lighting conditions or background imagery, each camera needs only one depth map, no matter how many lighting scenarios there are. If the scene has more than one camera, however, each camera folder has its own depth map, depicting distances from that camera's point of view.
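Since every camera folder holds its own depth map, a loader for a multi-camera scene can key the depth maps by camera. The sketch below assumes a hypothetical layout in which each camera folder is a direct subdirectory of a scene folder and contains a file named `depth.exr` (the file name used in the loading example in this section); the actual folder names and layout may differ. It only collects paths, so each map can then be loaded once and reused for every lighting scenario of that camera.

```python
from pathlib import Path

def find_depth_maps(scene_dir, filename="depth.exr"):
    """Hypothetical helper: map each camera folder name to its depth map path.

    Assumes the layout scene_dir/<camera folder>/<filename>.
    Folders without a depth map, and loose files, are skipped.
    """
    depth_paths = {}
    for camera_dir in sorted(Path(scene_dir).iterdir()):
        candidate = camera_dir / filename
        if camera_dir.is_dir() and candidate.is_file():
            depth_paths[camera_dir.name] = candidate
    return depth_paths
```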

To process a depth map, we recommend using code along the following lines:

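import os
# Recent OpenCV releases only decode EXR files if this is set before cv2 is imported
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"
import cv2

def load_depth(path):
    # IMREAD_UNCHANGED keeps the 32-bit float values instead of converting to 8-bit
    depth_map = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # shape = (H, W, 3), dtype float32
    if depth_map is None:
        raise IOError(f"Could not read depth map: {path}")
    # Keep the first channel, which holds the depth value
    return depth_map[..., 0] if depth_map.ndim == 3 else depth_map  # shape = (H, W)

path = "depth.exr"
depth_map = load_depth(path)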
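Once loaded, the depth map is a 2D NumPy array of distances, so extracting and comparing values is ordinary array arithmetic. The following minimal sketch finds the nearest and farthest pixels. The helper name `closest_and_farthest` is hypothetical, and it skips non-finite values on the assumption that some pixels (such as empty background) might not hold a valid distance, which this documentation does not confirm.

```python
import numpy as np

def closest_and_farthest(depth_map):
    """Hypothetical helper: return ((row, col), depth) for the nearest and
    farthest pixels, ignoring NaN/inf values."""
    depth = np.asarray(depth_map, dtype=np.float64)
    finite = np.isfinite(depth)
    if not finite.any():
        raise ValueError("depth map contains no finite values")
    # Substitute invalid pixels so they can never win the min/max search
    nearest_idx = np.argmin(np.where(finite, depth, np.inf))
    farthest_idx = np.argmax(np.where(finite, depth, -np.inf))
    nearest = tuple(int(i) for i in np.unravel_index(nearest_idx, depth.shape))
    farthest = tuple(int(i) for i in np.unravel_index(farthest_idx, depth.shape))
    return (nearest, float(depth[nearest])), (farthest, float(depth[farthest]))

# Usage with a small synthetic array standing in for a loaded depth map
demo = np.array([[1.20, 1.21], [np.inf, 1.19]], dtype=np.float32)
(near_px, near_d), (far_px, far_d) = closest_and_farthest(demo)
print(near_px, far_px, far_d - near_d)
```

Working on the full float32 values, rather than a rendered image, keeps the small depth differences that are invisible on screen.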