## Overview

The ground truth exposed in this modality is the set of camera parameters: external parameters such as its position and orientation in the scene, and internal parameters such as its resolution and pixel aspect ratio.

This modality consists of the following file:

| Relevant file | Location |
| --- | --- |
| camera_metadata.json | Camera folder |

This file contains values that define the camera’s inner workings and relationship to the world. It uses the following format:

    {
        "camera_name": "camera_1",
        "camera_type": "PERSP",
        "location": {
            "x": 0.0,
            "y": -2.5799999237060547,
            "z": 0.20000004768371582
        },
        "orientation": {
            "look_at_vector": {
                "x": 0.0,
                "y": 1.0,
                "z": 1.4901162614933128e-07
            },
            "up_vector": {
                "x": 0.0,
                "y": -1.4901162614933128e-07,
                "z": 1.0
            }
        },
        "aspect_px": {
            "x": 1.0,
            "y": 1.0
        },
        "resolution_px": {
            "x": 1024.0,
            "y": 1024.0
        },
        "fov": {
            "horizontal": 7.999999938029066,
            "vertical": 7.999999938029066
        },
        "sensor": {
            "sensor_width": 36.0,
            "sensor_height": 36.0
        },
        "intrinsic_matrix": [
            [7321.940972222222, 0.0, 512.0],
            [0.0, 7321.940972222222, 512.0],
            [0.0, 0.0, 1.0]
        ],
        "extrinsic_matrix": [
            [1.0000000596046377, 0.0, 0.0, 0.0],
            [-0.0, -0.0, -1.0000001192092896, 0.20000007152557941],
            [0.0, 0.9999999403953552, 1.4901161193847656e-07, 2.579999740123746]
        ]
    }
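
As a minimal sketch of how you might read this file, the following Python snippet parses the JSON and flattens it into a dictionary that is easier to compute with. The function names `parse_camera_metadata` and `load_camera_metadata` and the output keys are hypothetical and chosen for illustration; only the JSON field names come from the format shown above.

```python
import json


def _xyz(obj):
    """Convert an {"x", "y", "z"} object into a plain (x, y, z) tuple."""
    return (obj["x"], obj["y"], obj["z"])


def parse_camera_metadata(meta):
    """Flatten the parsed JSON (a dict) into a dict that is convenient to use.

    The output key names are illustrative, not part of the file format.
    """
    return {
        "name": meta["camera_name"],
        "type": meta["camera_type"],
        "location": _xyz(meta["location"]),
        "look_at": _xyz(meta["orientation"]["look_at_vector"]),
        "up": _xyz(meta["orientation"]["up_vector"]),
        # resolution_px is stored as floats; pixel counts are handier as ints
        "resolution": (
            int(meta["resolution_px"]["x"]),
            int(meta["resolution_px"]["y"]),
        ),
        "fov_deg": (meta["fov"]["horizontal"], meta["fov"]["vertical"]),
        "K": meta["intrinsic_matrix"],   # 3 rows of 3 values
        "Rt": meta["extrinsic_matrix"],  # 3 rows of 4 values
    }


def load_camera_metadata(path):
    """Read one camera metadata file; the path is supplied by the caller."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_camera_metadata(json.load(f))
```

For example, `load_camera_metadata("camera_metadata.json")` would return the flattened dictionary for a file in the current directory; the actual path depends on where your camera folder is located.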


### Objects and fields:

• camera_name: String. The internal name for this camera. Cameras are generally named “camera_1”, “camera_2”, and so on, up to the number of cameras you placed in the scene.

• camera_type: String. Currently all cameras have the value “PERSP” in this field, indicating they are perspective cameras. In the future we will be adding alternative types of cameras such as fisheye lenses.

• location: Object. This object contains three Floats named “x”, “y”, and “z”, which give the coordinates of the camera as you defined them when you created your dataset. See About our coordinate systems for details.

• orientation: Object. This object contains two vectors that describe how the camera is oriented in the 3D space.

• look_at_vector: Object. This object contains the normalized x, y, and z components of the vector that defines the direction that the camera is pointed. You defined these values through the camera yaw, pitch, and roll controls when you created your dataset.

• x: Float. At the camera’s default position and orientation, the x axis runs from right to left. By default, the x component of the camera’s look-at vector is 0, which means it is pointing neither to the left nor to the right.

• y: Float. At the camera’s default position and orientation, the y axis runs from forward to back. By default, the y component of the camera’s look-at vector is 1, which means it is pointing in the positive y direction - towards the subject’s face, which is located at the origin.

• z: Float. At the camera’s default position and orientation, the z axis runs from bottom to top. By default, the z component of the camera’s look-at vector is 0, which means it is tilted neither downward nor upward.

• up_vector: Object. This object contains the normalized x, y, and z components of the vector that defines the “up” direction in the camera space, providing you with the camera’s orientation.

• x: Float. The x axis runs from left to right from the default camera’s point of view. The default value of the x component of the camera’s up vector is 0, which means the vector is tilted neither to the left nor to the right. Lowering the value of x in the up vector rotates the camera counterclockwise, while raising the value of x in the up vector rotates the camera clockwise. (When done in isolation, neither of these operations has any effect on the camera’s look-at vector above, but they do affect the orientation of the images that the camera produces.)

• y: Float. The y axis runs from back to front from the default camera’s point of view. The default value of y in the camera’s up vector is 0, which means the vector is tilted neither forward nor backward. Lowering the value of y in the camera’s up vector tilts the lens away from the subject, while raising the value of y in the camera’s up vector tilts the lens towards the subject.

• z: Float. The z axis runs from bottom to top from the default camera’s point of view. The default value of z in the camera’s up vector is 1, which means the vector is pointing straight up. Lowering the value of z in this vector tilts the camera downward, while raising the value of z tilts it upward again.

• intrinsic_matrix: Two-dimensional array. This 3x3 array holds the values of the intrinsic camera matrix K, which contains the camera’s focal length, principal point offset, and axis skew.

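$\begin{split}K = \begin{bmatrix} f_{x} & s & x_{0} \\ 0 & f_{y} & y_{0} \\ 0 & 0 & 1 \end{bmatrix}\end{split}$

Here $f_{x}$ and $f_{y}$ are the focal lengths in pixels, $(x_{0}, y_{0})$ is the principal point, and $s$ is the axis skew.
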
• extrinsic_matrix: Two-dimensional array. This 3x4 array holds the values of the extrinsic camera matrix R | t, which describes the transformation from global to camera coordinates. The matrix is made up of the 3x3 rotation matrix R and the 3x1 translation vector t:

$\begin{split}\left[ R \mid t \right] = \begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} & t_{1} \\ r_{2,1} & r_{2,2} & r_{2,3} & t_{2} \\ r_{3,1} & r_{3,2} & r_{3,3} & t_{3} \end{bmatrix}\end{split}$

The extrinsic_matrix array lists the rows of this matrix from top to bottom, with the values in each row given from left to right. For example, this array:

    "extrinsic_matrix": [
        [1.0000000596046377, 0.0, 0.0, 0.0],
        [-0.0, -0.0, -1.0000001192092896, 0.20000007152557941],
        [0.0, 0.9999999403953552, 1.4901161193847656e-07, 2.579999740123746]
    ]


describes this matrix (values rounded for readability):

$\begin{split}\left[ R \mid t \right] \approx \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0.2 \\ 0 & 1 & 0 & 2.58 \end{bmatrix}\end{split}$
• aspect_px: Object. This object contains two Float objects: the x and y values that define the pixel aspect ratio of the camera. Currently the platform only supports a pixel ratio of 1:1, so the values of “x” and “y” will both be 1.0. Note that the pixel aspect ratio is not the same as the image aspect ratio; see resolution_px below.

• resolution_px: Object. This object contains two Float objects that give the number of pixels in each rendered image: “x” is the number of pixels in the image from left to right, and “y” is the number of pixels in the image from top to bottom. You defined these values when you created your dataset; by default they are both equal to 1024.

• fov: Object. This object contains two Float objects that give the camera’s field of view in degrees in the “horizontal” and “vertical” directions. You defined these values when you created your dataset; by default they are both equal to 8.

• sensor: Object. This object contains two Float objects that give the size of the camera sensor. These two objects, “sensor_width” and “sensor_height”, are each equal to 36.0.
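
Several of the fields above describe the same camera in different ways, so you can cross-check them against each other. The sketch below assumes a standard pinhole model, which this page does not state explicitly: the focal length in pixels is derived from the resolution and field of view, and the camera position is recovered from the extrinsic matrix as C = -Rᵀt. The function names are hypothetical, and the numeric values are copied from the sample file above.

```python
import math


def focal_length_px(size_px, fov_deg):
    """Pinhole relation f = (size / 2) / tan(fov / 2), with fov in degrees."""
    return (size_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def camera_center(Rt):
    """Recover the camera position C = -R^T t from a 3x4 [R | t] matrix."""
    return tuple(
        -sum(Rt[row][col] * Rt[row][3] for row in range(3))
        for col in range(3)
    )


# Values copied from the sample file shown earlier on this page.
SAMPLE_RT = [
    [1.0000000596046377, 0.0, 0.0, 0.0],
    [-0.0, -0.0, -1.0000001192092896, 0.20000007152557941],
    [0.0, 0.9999999403953552, 1.4901161193847656e-07, 2.579999740123746],
]

fx = focal_length_px(1024.0, 7.999999938029066)  # compare with intrinsic_matrix[0][0]
center = camera_center(SAMPLE_RT)                # compare with "location"
print(fx, center)
```

With the sample values, the computed focal length agrees with the first diagonal entry of intrinsic_matrix to within floating-point rounding, and the recovered camera center agrees with the location field. In the sample file, the third row of R also matches look_at_vector and the negated second row matches up_vector; treat this as an observation about the sample rather than a documented guarantee.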

Using this ground truth, you can give your model all of the background information it needs to process your datapoints. These values provide a higher degree of accuracy than the values that can be derived for cameras in the real world.
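
One common use of these values is projecting a 3D point in the scene onto the rendered image. The following sketch assumes the usual pinhole projection x ~ K [R | t] X and pixel coordinates whose origin is the top-left corner of the image; both are assumptions, not statements from this page. The function name `project_point` is hypothetical, and the matrices are copied from the sample file above.

```python
def project_point(K, Rt, point_world):
    """Project a 3D world point to pixel coordinates using x ~ K [R | t] X."""
    X = (point_world[0], point_world[1], point_world[2], 1.0)  # homogeneous
    # World coordinates -> camera coordinates
    cam = [sum(Rt[r][c] * X[c] for c in range(4)) for r in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Camera coordinates -> homogeneous image coordinates
    img = [sum(K[r][c] * cam[c] for c in range(3)) for r in range(3)]
    # Divide by the third component to get pixel coordinates
    return (img[0] / img[2], img[1] / img[2])


# Matrices copied from the sample file shown earlier on this page.
K = [
    [7321.940972222222, 0.0, 512.0],
    [0.0, 7321.940972222222, 512.0],
    [0.0, 0.0, 1.0],
]
RT = [
    [1.0000000596046377, 0.0, 0.0, 0.0],
    [-0.0, -0.0, -1.0000001192092896, 0.20000007152557941],
    [0.0, 0.9999999403953552, 1.4901161193847656e-07, 2.579999740123746],
]

# A point straight ahead of the sample camera, at the camera's own height.
u, v = project_point(K, RT, (0.0, 0.0, 0.2))
print(u, v)
```

For the sample camera, a point on the optical axis such as (0, 0, 0.2) lands very close to the principal point (512, 512), which is a quick sanity check that the two matrices are being combined in the right order.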

Because the values in this file define the camera’s position, behavior, and relationship to the scene, they are independent of lighting conditions and background imagery. Therefore each camera will have one and only one camera_metadata.json file.