Camera metadata

Overview

The ground truth exposed in this modality is the set of camera parameters: extrinsic parameters, such as the camera’s position and orientation in the scene, and intrinsic parameters, such as its resolution and pixel aspect ratio.

This modality consists of the following file:

Relevant file           Location
camera_metadata.json    Camera folder

camera_metadata.json

This file contains the values that define the camera’s internal properties and its position and orientation in the world. It has the following format:

{
   "version": "1.1.0",
   "camera_name": "camera_1",
   "camera_type": "PERSP",
   "focal_length": 49.999996185302734,
   "location": {
      "x": 0.0,
      "y": -2.5799999237060547,
      "z": 0.20000004768371582
   },
   "orientation": {
      "look_at_vector": {
         "x": 0.0,
         "y": 1.0,
         "z": 1.4901162614933128e-07
      },
      "up_vector": {
         "x": 0.0,
         "y": -1.4901162614933128e-07,
         "z": 1.0
      }
   },
   "aspect_px": {
      "x": 1.0,
      "y": 1.0
   },
   "resolution_px": {
      "x": 1024.0,
      "y": 1024.0
   },
   "fov": {
      "horizontal": 7.999999938029066,
      "vertical": 7.999999938029066
   },
   "sensor": {
      "sensor_width": 36.0,
      "sensor_height": 36.0
   },
   "intrinsic_matrix": [
      [
         7321.940972222222,
         0.0,
         512.0
      ],
      [
         0.0,
         7321.940972222222,
         512.0
      ],
      [
         0.0,
         0.0,
         1.0
      ]
   ],
   "extrinsic_matrix": [
      [
         1.0000000596046377,
         0.0,
         0.0,
         0.0
      ],
      [
         -0.0,
         -0.0,
         -1.0000001192092896,
         0.20000007152557941
      ],
      [
         0.0,
         0.9999999403953552,
         1.4901161193847656e-07,
         2.579999740123746
      ]
   ]
}
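A minimal way to read this file in Python, using only the standard library, is sketched below. The names `parse_camera_metadata`, `load_camera_metadata`, and `SUPPORTED_VERSIONS` are hypothetical, and the sketch assumes that "1.1.0" (the version in the example above) is the one your code was written against; it applies the version check recommended for the version field below.

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical: the camera_metadata.json versions this code was written against.
SUPPORTED_VERSIONS = {"1.1.0"}


def parse_camera_metadata(text: str) -> dict:
    """Parse the contents of camera_metadata.json and check its version."""
    metadata = json.loads(text)
    version = metadata.get("version")
    if version not in SUPPORTED_VERSIONS:
        # An unexpected version may use a different format or set of fields.
        raise ValueError(f"unsupported camera_metadata.json version: {version!r}")
    return metadata


def load_camera_metadata(path: str | Path) -> dict:
    """Read camera_metadata.json from disk and parse it."""
    return parse_camera_metadata(Path(path).read_text(encoding="utf-8"))
```

For instance, `load_camera_metadata(camera_folder / "camera_metadata.json")`, where `camera_folder` is a hypothetical `pathlib.Path` pointing at a datapoint’s camera folder. Failing loudly on an unknown version is safer than silently reading fields that may have moved or changed meaning.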

Objects and fields:

  • version: String. The version of this file’s format. Whenever you read this file from a datapoint, check that the version matches the one you expect; otherwise your code may not recognize its format and fields.

  • camera_name: String. A label for this camera. If you used the SDK to generate your datapoints, you were able to set the camera name yourself. If you used the Faces platform UI, camera names tend to be of the form “camera_1”, “camera_2”, etc.; if you used the Humans in Context platform UI, camera names are defined based on their role in the scene.

  • camera_type: String. Currently all cameras have the value “PERSP” in this field, indicating they are perspective cameras. In the future we will be adding alternative types of cameras such as fisheye lenses.

  • location: Object. This object contains three Floats named “x”, “y”, and “z”, which give the coordinates of the camera as you defined them when you created your dataset. See About our coordinate systems for details.

  • orientation: Object. This object contains two vectors that describe how the camera is oriented in the 3D space.

    • look_at_vector: Object. This object contains the normalized x, y, and z components of the vector that defines the direction that the camera is pointed. You defined these values through the camera yaw, pitch, and roll controls when you created your dataset.

      • x: Float. At the camera’s default position and orientation, the x axis runs from left to right from the camera’s point of view. By default, the x component of the camera’s look-at vector is 0, which means the camera is pointing neither to the left nor to the right.

        ../_images/CameraDefault.png

        The camera’s view of the subject at default orientation

        ../_images/camera1%C2%B0-x.png

        The camera’s view of the subject with a 1° rotation in the -x direction

        ../_images/camera1%C2%B0%2Bx.png

        The camera’s view of the subject with a 1° rotation in the +x direction

      • y: Float. At the camera’s default position and orientation, the y axis points forward, away from the camera and toward the subject. By default, the y component of the camera’s look-at vector is 1, which means it is pointing in the positive y direction, towards the actor, whose head is located at the origin.

        ../_images/CameraDefault.png

        In this default image, the camera is facing in the +y direction, and the subject is facing in the -y direction.

        Modalities/images/camera-y.png

        In this image, the camera has first been moved to +1.6 on the y axis, then rotated 180 degrees so that it is facing in the -y direction. The actor’s head also points in the -y direction by default, so we are looking at the back of the actor’s head. (If we had not moved the camera to y=+1.6 before rotating it, we would not be able to see the actor at all.)

      • z: Float. At the camera’s default position and orientation, the z axis runs from bottom to top. By default, the z component of the camera’s look-at vector is 0, which means the camera is tilted neither downward nor upward.

        ../_images/CameraDefault.png

        The camera’s view of the subject at default orientation

        ../_images/camera1%C2%B0-z.png

        The camera’s view of the subject with a 1° rotation in the -z direction

        ../_images/camera1%C2%B0%2Bz.png

        The camera’s view of the subject with a 1° rotation in the +z direction

    • up_vector: Object. This object contains the normalized x, y, and z components of the vector that defines the “up” direction in the camera space, providing you with the camera’s orientation.

      • x: Float. The x axis runs from left to right from the default camera’s point of view. The default value of the x component of the camera’s up vector is 0, which means the vector is tilted neither to the left nor to the right. Lowering the value of x in the up vector rotates the camera counterclockwise, while raising it rotates the camera clockwise. (When done in isolation, neither of these operations has any effect on the camera’s look-at vector above, but they do affect the orientation of the images that the camera produces.)

        ../_images/cameraup1%C2%B0-x.png

        The camera’s view of the subject with the up vector tilted 1° in the -x direction

        ../_images/CameraDefault.png

        The camera’s view of the subject at default orientation

        ../_images/cameraup1%C2%B0%2Bx.png

        The camera’s view of the subject with the up vector tilted 1° in the +x direction

      • y: Float. From the default camera’s point of view, the y axis runs from back to front, away from the camera and toward the subject. The default value of y in the camera’s up vector is 0, which means the vector is tilted neither to the front nor to the back. Lowering the value of y in the camera’s up vector tilts the lens away from the subject, while raising it tilts the lens towards the subject.

        ../_images/cameraup1%C2%B0-y.png

        The camera’s view of the subject with the up vector tilted 1° in the -y direction from the default position

        ../_images/CameraDefault.png

        The camera’s view of the subject at default orientation

        ../_images/cameraup1%C2%B0%2By.png

        The camera’s view of the subject with the up vector tilted 1° in the +y direction from the default position

      • z: Float. The z axis runs from bottom to top from the default camera’s point of view. The default value of z in the camera’s up vector is 1, which means the vector is pointing straight up. Lowering the value of z in this vector tilts the camera downward, while raising the value of z tilts it upward again.

        ../_images/CameraDefault.png

        The camera’s view of the subject at default orientation, with z=1

        ../_images/cameraupz%3D-1.png

        The camera’s view of the subject when the z component of its up vector is -1 and the camera’s look-at vector is unchanged.

  • intrinsic_matrix: Two-dimensional array. This 3x3 array holds the values of the intrinsic camera matrix K, which contains the camera’s focal lengths in pixels (f_x and f_y), the principal point offset (x_0, y_0), and the axis skew s:

    \[K = \begin{bmatrix} f_{x} & s & x_{0} \\ 0 & f_{y} & y_{0} \\ 0 & 0 & 1 \end{bmatrix}\]
  • extrinsic_matrix: Two-dimensional array. This 3x4 array holds the values of the extrinsic camera matrix [R | t], which describes the transformation from global (world) to camera coordinates. The matrix is made up of the 3x3 rotation matrix R and the 3x1 translation vector t:

    \[ [R \mid t] = \begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} & t_{1} \\ r_{2,1} & r_{2,2} & r_{2,3} & t_{2} \\ r_{3,1} & r_{3,2} & r_{3,3} & t_{3} \end{bmatrix}\]

    The extrinsic_matrix array lists the matrix row by row: each inner array holds one row’s values from left to right. For example, this array:

    "extrinsic_matrix": [
      [
         1.0000000596046377,
         0.0,
         0.0,
         0.0
      ],
      [
         -0.0,
         -0.0,
         -1.0000001192092896,
         0.20000007152557941
      ],
      [
         0.0,
         0.9999999403953552,
         1.4901161193847656e-07,
         2.579999740123746
      ]
    ]
    

    describes this matrix (values rounded):

    \[ [R \mid t] \approx \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0.2 \\ 0 & 1 & 0 & 2.58 \end{bmatrix}\]
  • aspect_px: Object. This object contains two Floats, “x” and “y”, that define the pixel aspect ratio of the camera. Currently the platform only supports a pixel aspect ratio of 1:1, so the values of “x” and “y” will both be 1.0. Note that the pixel aspect ratio is not the same as the image aspect ratio; see resolution_px below.

  • resolution_px: Object. This object contains two Float objects that give the number of pixels in each rendered image: “x” is the number of pixels in the image from left to right, and “y” is the number of pixels in the image from top to bottom. You defined these values when you created your dataset; by default they are both equal to 1024.

  • fov: Object. This object contains two Float objects that give the camera’s field of view in degrees in the “horizontal” and “vertical” directions. You defined these values when you created your dataset; by default they are both equal to 8.

  • sensor: Object. This object contains two Float objects, “sensor_width” and “sensor_height”, that give the size of the camera sensor in millimeters. Both are equal to 36.0.

  • focal_length: Float. The distance, in mm, between the camera sensor and the camera lens. It is derived from the field of view and sensor size.
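Under the standard pinhole model, the intrinsic_matrix entries follow from resolution_px and fov: f_x = (W/2) / tan(θ_h/2) and f_y = (H/2) / tan(θ_v/2), with the principal point at the image center (W/2, H/2). The sketch below assumes zero skew and a centered principal point, as in the example file; `intrinsics_from_fov` is a hypothetical helper, not part of the platform’s SDK. With the example’s 1024 × 1024 resolution and 8° field of view, it reproduces f_x = f_y ≈ 7321.94 and a principal point of (512, 512).

```python
from __future__ import annotations

import math


def intrinsics_from_fov(resolution_px: dict, fov_deg: dict) -> list[list[float]]:
    """Build a 3x3 pinhole intrinsic matrix K from resolution and field of view.

    Assumes zero skew and a principal point at the image center.
    """
    width, height = resolution_px["x"], resolution_px["y"]
    # Focal lengths in pixels: half the image size divided by tan(half the FOV).
    fx = (width / 2) / math.tan(math.radians(fov_deg["horizontal"]) / 2)
    fy = (height / 2) / math.tan(math.radians(fov_deg["vertical"]) / 2)
    # Principal point at the image center.
    x0, y0 = width / 2, height / 2
    return [
        [fx, 0.0, x0],
        [0.0, fy, y0],
        [0.0, 0.0, 1.0],
    ]


# Values taken from the example camera_metadata.json above.
K = intrinsics_from_fov(
    {"x": 1024.0, "y": 1024.0},
    {"horizontal": 7.999999938029066, "vertical": 7.999999938029066},
)
```

Comparing this result against intrinsic_matrix is a quick consistency check when you load a datapoint.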
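The focal length relates to the field of view and sensor size in a similar way: f = (sensor_width / 2) / tan(θ_h / 2), and a focal length in millimeters converts to the pixel focal length f_x as f × W / sensor_width. This sketch assumes the sensor dimensions are in millimeters and that the sensor width spans the full horizontal field of view; the function names are hypothetical. For the 8° field of view and 36 mm sensor width in the example, it gives about 257.41 mm, which converts to about 7321.94 pixels, the f_x value in intrinsic_matrix.

```python
import math


def focal_length_mm(sensor_width_mm: float, fov_deg: float) -> float:
    """Focal length in mm for a given sensor width and horizontal field of view."""
    return (sensor_width_mm / 2) / math.tan(math.radians(fov_deg) / 2)


def fov_deg_from_focal_length(sensor_width_mm: float, focal_mm: float) -> float:
    """Inverse relation: horizontal field of view in degrees."""
    return math.degrees(2 * math.atan((sensor_width_mm / 2) / focal_mm))


def focal_length_px(focal_mm: float, sensor_width_mm: float, width_px: float) -> float:
    """Convert a focal length in mm to pixels (the f_x entry of K)."""
    return focal_mm * width_px / sensor_width_mm


# Example: 36 mm sensor width, 8 degree horizontal field of view, 1024 px wide.
f_mm = focal_length_mm(36.0, 8.0)
f_px = focal_length_px(f_mm, 36.0, 1024.0)
```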
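The extrinsic matrix can also be rebuilt from location and orientation. In the example file, the third row of R equals look_at_vector, the second row is the negated up_vector, and the first row is their cross product (look_at × up). This matches a camera frame whose z axis points along the viewing direction, whose x axis points to the image right, and whose y axis points to the image bottom, with translation t = -R·C, where C is the camera location. That convention is inferred from the example values rather than documented, and `extrinsic_from_pose` is a hypothetical helper.

```python
import math


def _normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def extrinsic_from_pose(location, look_at, up):
    """Return the 3x4 [R | t] matrix for a camera at `location`.

    Assumed convention (inferred from the example file): camera z axis along
    the look-at vector, camera x axis toward the image right, camera y axis
    toward the image bottom.
    """
    c = [location["x"], location["y"], location["z"]]
    z_c = _normalize([look_at["x"], look_at["y"], look_at["z"]])
    up_v = [up["x"], up["y"], up["z"]]
    x_c = _normalize(_cross(z_c, up_v))  # image right
    y_c = _cross(z_c, x_c)  # image down; unit length since z_c and x_c are orthonormal
    rotation = [x_c, y_c, z_c]
    # t = -R C, so the camera location maps to the camera-space origin.
    t = [-sum(r[i] * c[i] for i in range(3)) for r in rotation]
    return [row + [t_i] for row, t_i in zip(rotation, t)]


# Location and orientation from the example camera_metadata.json above.
example = extrinsic_from_pose(
    {"x": 0.0, "y": -2.5799999237060547, "z": 0.20000004768371582},
    {"x": 0.0, "y": 1.0, "z": 1.4901162614933128e-07},
    {"x": 0.0, "y": -1.4901162614933128e-07, "z": 1.0},
)
```

Re-orthogonalizing through cross products keeps R a proper rotation even if the stored look-at and up vectors are not exactly perpendicular.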
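Combined, the two matrices project a world-space point to pixel coordinates: apply [R | t] to the homogeneous point to get camera coordinates (X_c, Y_c, Z_c), apply K, then divide by the third component. A minimal sketch follows, with `project_point` as a hypothetical helper and pixel coordinates assumed to grow rightward (u) and downward (v), consistent with the example matrices. With the example values, the point (0, 0, 0.2), directly in front of the camera, lands on the principal point (512, 512), while points displaced toward +x land to its right.

```python
def project_point(intrinsic, extrinsic, point):
    """Project a world-space point (x, y, z) to pixel coordinates (u, v).

    Returns None if the point is behind the camera.
    """
    homogeneous = [point[0], point[1], point[2], 1.0]
    # Camera coordinates: [R | t] applied to the homogeneous world point.
    cam = [sum(row[i] * homogeneous[i] for i in range(4)) for row in extrinsic]
    if cam[2] <= 0:
        return None  # behind (or on) the camera plane
    # Apply K, then the perspective divide.
    img = [sum(row[i] * cam[i] for i in range(3)) for row in intrinsic]
    return img[0] / img[2], img[1] / img[2]


# Matrices copied from the example camera_metadata.json above.
INTRINSIC = [
    [7321.940972222222, 0.0, 512.0],
    [0.0, 7321.940972222222, 512.0],
    [0.0, 0.0, 1.0],
]
EXTRINSIC = [
    [1.0000000596046377, 0.0, 0.0, 0.0],
    [-0.0, -0.0, -1.0000001192092896, 0.20000007152557941],
    [0.0, 0.9999999403953552, 1.4901161193847656e-07, 2.579999740123746],
]

u, v = project_point(INTRINSIC, EXTRINSIC, (0.0, 0.0, 0.2))
```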

Using this ground truth, you can give your model the camera information it needs to process your datapoints. These values are more accurate than the camera parameters that can be estimated for cameras in the real world.