About our coordinate systems#

Overview#

Every keypoint identified in Datagen’s modalities is provided in both 2D and 3D coordinates. This page describes the coordinate systems that our modalities use.

3D Coordinates#

3D coordinates describe a location in objective space. They are therefore camera-independent: You can have fifteen cameras photographing the same landmark from different angles and with different settings, but the landmark’s 3D coordinates will remain the same for all of them.

3D coordinates describe the distance between the landmark and the global origin, in a global coordinate system measured in meters. In this coordinate system, x is to the left and right; y is forward and back; and z is up and down. These directions are from the point of view of the default positions of the camera and the actor, where the camera is located at (0, -1.6, 0.12) and the actor - specifically, a point at the front of the actor’s neck - is located at the origin (0, 0, 0).

../_images/coordinate_system.png

Every time a set of 3D coordinates appears in Datagen’s metadata, it uses the same format: an object called “global_3d” containing three Floats that describe the x, y, and z coordinates of a landmark or keypoint:

"global_3d": {
   "x": 0.1370361637738016,
   "y": -0.7877883480654823,
   "z": 0.022076432851867542
},
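As a minimal sketch of reading this format, the snippet below parses a "global_3d" object and measures its distance, in meters, from the global origin and from the default camera position given above. The helper name to_xyz and the constant DEFAULT_CAMERA are illustrative and are not part of Datagen's tooling.

```python
import json
import math

# Default camera position stated on this page, in meters (x, y, z).
DEFAULT_CAMERA = (0.0, -1.6, 0.12)

def to_xyz(global_3d):
    """Convert a "global_3d" object into an (x, y, z) tuple of floats."""
    return (float(global_3d["x"]), float(global_3d["y"]), float(global_3d["z"]))

# The sample object shown above, wrapped in braces so that it parses as JSON.
sample = json.loads("""
{
  "global_3d": {
    "x": 0.1370361637738016,
    "y": -0.7877883480654823,
    "z": 0.022076432851867542
  }
}
""")

point = to_xyz(sample["global_3d"])
origin_distance = math.dist(point, (0.0, 0.0, 0.0))
camera_distance = math.dist(point, DEFAULT_CAMERA)
print(f"distance from origin: {origin_distance:.3f} m")
print(f"distance from default camera: {camera_distance:.3f} m")
```

Because 3D coordinates are camera-independent, the first distance stays the same wherever the camera is placed; only the second depends on the camera position.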

Actor location and head rotation in the 3D space#

Default position#

Because a human is not a point-sized object, it is important to specify which part of the actor is located at the origin. The actor’s coordinates define the location of the suprasternal notch (the small pocket at the center of the V at the front of the neck). If you assign the actor to a different position (or use our platform to randomly select from a range of positions), you are defining the location of the suprasternal notch.

../_images/Suprasternal_notch.png

The suprasternal notch, which always faces in the negative y direction and by default is located at the origin of the world space. Image by Prince of Sarras at the English-language Wikipedia, CC BY-SA 3.0, via Wikimedia Commons#

As an example of how the rest of the body is oriented around the suprasternal notch, let us take keypoint 28, which is the keypoint directly between the eyes in the iBUG facial keypoint standard. Because it is particularly useful for defining the location and orientation of the head (in contrast to the orientation of the rest of the body), this keypoint is separately included in actor_metadata.json.

../_images/Landmark28.png

At default settings, when the actor has not been moved, the head has not been rotated, and the suprasternal notch is at the origin, the coordinates of keypoint 28 are as follows:

  • x will be very close to 0, because the keypoint is in the center of the face and therefore almost directly above the origin on the left-right axis.

  • y will be very close to 0, because the keypoint is almost directly above the suprasternal notch - usually slightly positive, but occasionally slightly negative.

  • z will be positive, because the keypoint is above the suprasternal notch.

Moving the actor#

  • x: Float. Lower values of x move the actor to their right; higher values of x move the actor to their left.

    ../_images/subjectx%3D-0.05.png

    The subject at x=-0.05, with the camera at the default position#

    ../_images/subjectDefault.png

    The subject at x=0 (default), with the camera at the default position#

    ../_images/subjectx%3D0.05.png

    The subject at x=0.05, with the camera at the default position#

  • y: Float. Lower values of y move the actor forwards; higher values of y move the actor backwards. If y is lower than -1.6, the actor has been moved so far forward that they are behind the default position of the camera (see the camera section below), and they will not be visible in the rendered images unless the camera itself is moved or rotated.

    ../_images/subjecty%3D-0.2.png

    The subject at y=-0.2, with the camera at the default position#

    ../_images/subjectDefault.png

    The subject at y=0 (default), with the camera at the default position#

    ../_images/subjecty%3D0.2.png

    The subject at y=0.2, with the camera at the default position#

  • z: Float. Lower values of z move the actor down; higher values of z move the actor up.

    ../_images/subjectz%3D-0.05.png

    The subject at z=-0.05, with the camera at the default position#

    ../_images/subjectDefault.png

    The subject at z=0 (default), with the camera at the default position#

    ../_images/subjectz%3D0.05.png

    The subject at z=0.05, with the camera at the default position#

Rotating the head#

The orientation of the head defines how it is rotated about the actor’s neck, as the actor’s body (including the neck) remains fixed in place. The platform limits the rotation of the head based on the realistic limits of human physiology.

Note

You always stretch your neck in some way when you rotate your head. As a result, the rotation controls may pull the suprasternal notch slightly off the coordinates that you defined for it. This does not, however, change the origin point of the head, which will always be exactly where you defined it.

The head is rotated first by pitch, then by yaw, then by roll, using the values you entered into the pitch, yaw, and roll controls in the platform.

  • yaw: Float. Lower values of yaw turn the head to its right. Higher values of yaw turn the head to its left.

    ../_images/subject-10yaw.png

    An actor’s head with -10° of yaw#

    ../_images/subject%2B10yaw.png

    An actor’s head with 10° of yaw#

  • pitch: Float. Lower values of pitch turn the head upward. Higher values of pitch turn the head downward.

    ../_images/subject-10pitch.png

    An actor’s head with -10° of pitch#

    ../_images/subject%2B10pitch.png

    An actor’s head with 10° of pitch#

  • roll: Float. Lower values of roll turn the head counterclockwise. Higher values of roll turn the head clockwise.

    ../_images/subject-10roll.png

    An actor’s head with -10° of roll#

    ../_images/subject%2B10roll.png

    An actor’s head with 10° of roll#
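Because rotations do not commute, the order of pitch, yaw, and roll stated above affects the final orientation. The sketch below composes three elementary rotations in that order and compares the result with the opposite order. The axis assignments (pitch about x, yaw about z, roll about y), the sign conventions, and the use of fixed world axes are assumptions made for illustration. This page does not specify them, so treat the code as a demonstration of ordering rather than a reproduction of the platform's head pose.

```python
import math

def rot_x(deg):
    """Rotation about x (left-right axis); assumed here to model pitch."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(deg):
    """Rotation about y (forward-back axis); assumed here to model roll."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(deg):
    """Rotation about z (up-down axis); assumed here to model yaw."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def head_rotation(pitch, yaw, roll):
    """Compose pitch first, then yaw, then roll, about fixed world axes.

    For column vectors, the rotation applied first is the rightmost factor.
    """
    return matmul(rot_y(roll), matmul(rot_z(yaw), rot_x(pitch)))

def max_abs_diff(a, b):
    """Largest element-wise difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

pitch_then_yaw = head_rotation(pitch=10, yaw=10, roll=0)
yaw_then_pitch = matmul(rot_x(10), rot_z(10))  # yaw applied before pitch

# A non-negligible difference shows that the two orders are not equivalent.
print(max_abs_diff(pitch_then_yaw, yaw_then_pitch) > 1e-6)
```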

Camera location and rotation in the 3D space#

Default position#

By default, our platform places the camera in approximately the same horizontal plane as the actor’s head, but rotated 180 degrees around the z axis and 1.6 meters away in the negative y direction.

This places the camera directly opposite the actor’s face, such that the two are looking directly at each other.

Our datapoints use an idealized point-sized camera. Therefore, when you input the camera’s position in the platform you are defining that position exactly.

../_images/subjectDefault.png

An actor’s head at (0, 0, 0), from the point of view of a camera at the default coordinates (0, -1.6, 0.12)#

Moving the camera#

  • x: Float. Lower values in the x component move the camera to its left; higher values of x move the camera to its right. The camera’s default x component is 0, which centers the camera on the left-right axis and points it directly at the default position of the subject.

    ../_images/camerax%3D-0.05.png

    The camera’s view of the subject when x=-0.05#

    ../_images/camerax%3D0.png

    The camera’s view of the subject when x=0 (default)#

    ../_images/camerax%3D0.05.png

    The camera’s view of the subject when x=0.05#

  • y: Float. Lower values in the y component move the camera backwards; higher values of y move the camera forwards. The camera’s default y component is -1.600, which places it far enough away from the default position of the actor’s head to see it in its entirety. If the y component is greater than 0, then the camera has moved past the subject, and you will need to rotate the camera or move the subject if you want the subject to appear in the images.

    ../_images/cameray%3D-1.65.png

    The camera’s view of the subject when y=-1.65#

    ../_images/cameray%3D-1.6.png

    The camera’s view of the subject when y=-1.6 (default)#

    ../_images/cameray%3D-1.55.png

    The camera’s view of the subject when y=-1.55#

  • z: Float. Lower values in the z component move the camera down; higher values move the camera up. The camera’s default z component is 0.12, which places it slightly higher than the default position of the subject. Since the subject’s coordinates actually define the location of the suprasternal notch, raising the camera slightly in this way puts it at approximately the same height as the subject’s face.

    ../_images/cameraz%3D0.07.png

    The camera’s view of the subject when z=0.07#

    ../_images/cameraz%3D0.12.png

    The camera’s view of the subject when z=0.12 (default)#

    ../_images/cameraz%3D0.17.png

    The camera’s view of the subject when z=0.17#
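The two visibility warnings on this page (an actor moved below y=-1.6, and a camera moved above y=0) are both instances of one comparison along the y axis. A minimal sketch, assuming the camera keeps its default orientation facing the positive y direction; the function name is hypothetical, and the check ignores the field of view and any x or z offsets:

```python
DEFAULT_CAMERA_Y = -1.6  # meters, default camera y from this page
DEFAULT_ACTOR_Y = 0.0    # meters, default actor y (suprasternal notch)

def actor_in_front_of_camera(actor_y=DEFAULT_ACTOR_Y, camera_y=DEFAULT_CAMERA_Y):
    """Return True if the actor lies ahead of an unrotated camera on the y axis."""
    return actor_y > camera_y

default_scene = actor_in_front_of_camera()
actor_moved_behind = actor_in_front_of_camera(actor_y=-1.7)
camera_moved_past = actor_in_front_of_camera(camera_y=0.05)
print(default_scene, actor_moved_behind, camera_moved_past)
```

A result of False only means that the actor cannot appear without rotating the camera or moving one of the two; a result of True does not guarantee that the actor falls inside the frame.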

Rotating the camera#

The camera is rotated by applying first yaw, then pitch, then roll, according to the values you entered in the yaw, pitch, and roll controls on the platform.

  • yaw: Float. Lower values of yaw turn the camera to its left. Higher values of yaw turn the camera to its right.

    ../_images/camera-1yaw.png

    The camera’s view with -1° of yaw#

    ../_images/subjectDefault.png

    The camera’s view with 0° of yaw (default)#

    ../_images/camera%2B1yaw.png

    The camera’s view with 1° of yaw#

  • pitch: Float. Lower values of pitch turn the camera downward. Higher values of pitch turn the camera upward.

    ../_images/camera-1pitch.png

    The camera’s view with -1° of pitch#

    ../_images/subjectDefault.png

    The camera’s view with 0° of pitch (default)#

    ../_images/camera%2B1pitch.png

    The camera’s view with 1° of pitch#

  • roll: Float. Lower values of roll turn the camera clockwise. Higher values of roll turn the camera counterclockwise.

    ../_images/camera-1roll.png

    The camera’s view with -1° of roll#

    ../_images/subjectDefault.png

    The camera’s view with 0° of roll (default)#

    ../_images/camera%2B1roll.png

    The camera’s view with 1° of roll#

2D Coordinates#

2D coordinates describe the location of a landmark in the photographs taken by a specific camera with specific settings. They are therefore highly camera-dependent. The landmark will have the same coordinates in every single photograph taken by the same camera, no matter how much the lighting conditions or backgrounds change. But in the set of photographs taken by a different camera, in a different place in the scene, with different settings, the landmark will likely have completely different coordinates.

2D coordinates describe the distance between the landmark and the upper left corner of the image. In this coordinate system, x is the distance from the top edge of the image, and y is the distance from the left side of the image.

../_images/single_point_example.png

The point labeled in green is at x=438 pixels from the top edge of the image and y=592 pixels from the left edge of the image#

Every time a set of 2D coordinates appears in Datagen’s metadata, it uses the same format: an object called “pixel_2d” that contains two Ints describing the x and y coordinates of a specific pixel in a 2D image:

"pixel_2d": {
   "x": 3724,
   "y": -143
}

Note that there is no requirement that the landmark be visible in the image. In the example above, the landmark is 143 pixels to the left of the left side of the image, which means it is not visible at all; furthermore, the value of x is so high that the landmark is probably below the bottom of the image.
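A minimal sketch of testing whether a "pixel_2d" value falls inside the image, following the convention described on this page (x measured from the top edge, y measured from the left edge). The image size of 1024 by 1024 pixels and the function name are assumptions made for illustration only.

```python
def is_inside_image(pixel_2d, height, width):
    """Return True if the pixel lies within an image of the given size.

    Follows this page's convention: "x" counts pixels down from the top
    edge and "y" counts pixels right from the left edge.
    """
    row, col = pixel_2d["x"], pixel_2d["y"]
    return 0 <= row < height and 0 <= col < width

# The image size is a made-up value for this example.
HEIGHT, WIDTH = 1024, 1024

off_image = {"x": 3724, "y": -143}  # the metadata example above
on_image = {"x": 438, "y": 592}     # the point labeled in the figure

print(is_inside_image(off_image, HEIGHT, WIDTH))
print(is_inside_image(on_image, HEIGHT, WIDTH))
```

Under this convention, an in-bounds pixel addresses a row-major image array as image[x][y], which is the reverse of the (column, row) order that many drawing APIs expect.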