The ground truth exposed in this modality is the distance between the camera lens and each surface visible in the image.
This modality consists of the following file:
This file contains a depth map corresponding to the visual spectrum image. It is stored in the OpenEXR file format with 32-bit floating-point channels.
A depth map of a human face (left) and the corresponding visual spectrum image (right)
To create the depth map, we have replaced the color value of each pixel with a number representing the 3D distance between the camera lens and the surface represented in that pixel. The differences between neighboring values are often too small to be seen by the naked eye, but they can be extracted and compared by most image processing tools.
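One way to make those small differences visible is to stretch the depth values across the full 8-bit range before viewing them. The following is a minimal sketch using NumPy only. The function name `depth_to_uint8` and the tiny synthetic array are illustrative assumptions, not part of the dataset or its tooling.

```python
import numpy as np


def depth_to_uint8(depth):
    """Stretch a float depth map to 0..255 so small differences become visible."""
    depth = np.asarray(depth, dtype=np.float32)
    finite = np.isfinite(depth)  # ignore inf/NaN pixels, if any are present
    if not finite.any():
        return np.zeros(depth.shape, dtype=np.uint8)
    lo = depth[finite].min()
    hi = depth[finite].max()
    if hi == lo:
        # A perfectly flat map has no contrast to stretch
        return np.zeros(depth.shape, dtype=np.uint8)
    norm = np.zeros(depth.shape, dtype=np.float32)
    norm[finite] = (depth[finite] - lo) / (hi - lo)
    return np.round(norm * 255).astype(np.uint8)


# Synthetic stand-in for a real depth map (values are made up for illustration)
demo = np.array([[1.00, 1.01], [1.02, 1.04]], dtype=np.float32)
preview = depth_to_uint8(demo)
```

The result is for visual inspection only. The stretch discards the absolute distances, so any measurement or training should use the original floating-point values.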
The use of 32-bit floating-point color provides room to store extremely accurate measurements in the color data. Using this ground truth, you can train your model to learn the shapes and contours of human faces, human bodies, or other objects in the scene, to recognize the distance between the camera and each part of the face, and to perform 3D reconstruction.
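As a rough illustration of the 3D reconstruction use case, the sketch below turns a distance map into camera-space 3D points. It assumes that each value is the straight-line distance from the camera to the surface along the pixel's viewing ray, as described above, and that the camera is an ideal pinhole with no lens distortion. The function name `distance_to_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the axis convention (x right, y down, z forward) are hypothetical choices made for this example; take the real values from your own camera metadata.

```python
import numpy as np


def distance_to_points(dist, fx, fy, cx, cy):
    """Convert per-pixel ray distance (H, W) to camera-space points (H, W, 3)."""
    h, w = dist.shape
    # u is the column index, v is the row index of every pixel
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Direction of the viewing ray through each pixel for a pinhole camera
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones((h, w))], axis=-1)
    # Normalize to unit length so that scaling by distance lands on the surface
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    return rays * dist[..., None]


# Synthetic example: every surface is 2 units away from the camera
dist = np.full((3, 3), 2.0)
points = distance_to_points(dist, fx=100.0, fy=100.0, cx=1.0, cy=1.0)
```

Normalizing the rays matters because the stored value is a distance along the ray rather than a z-coordinate. If your data stores planar z-depth instead, skip the normalization step and multiply the unnormalized rays by the depth.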
To process a depth map, we recommend using code along the following lines:
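import os

# Some OpenCV builds disable the OpenEXR codec unless this is set before cv2 is imported
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"

import cv2


def load_depth(path):
    depth_map = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # shape = (N, N, 3)
    if depth_map is None:
        raise FileNotFoundError(f"Could not read depth map: {path}")
    return depth_map[..., 0]  # keep the first channel; shape = (N, N)


path = "depth.exr"
depth_map = load_depth(path)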