Semantic segmentation

Overview

The ground truth exposed in this modality is the parent body part of each pixel visible in the corresponding visual spectrum image.

This modality consists of the following files:

| Relevant file | Location |
| --- | --- |
| semantic_segmentation.png | Camera folder |
| semantic_segmentation_metadata.json | Scene folder |

semantic_segmentation.png

This file contains a version of the visual spectrum image that has been converted into a semantic segmentation map.


A semantic segmentation map of a human face (left) and the corresponding visual spectrum image (right)

In a semantic segmentation map, each pixel's original color value has been replaced with a color value that indicates the class the pixel belongs to. The color key is found in semantic_segmentation_metadata.json, in the Scene folder.

Using this ground truth, you can train your model to recognize individual parts of the face and verify the network's predictions against reality. See https://github.com/DatagenTech/dgutils/blob/master/Notebooks/segmentation.ipynb for examples of how to use semantic segmentation to isolate objects in the datapoint.
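As an illustrative sketch of that workflow (not the notebook's code), the following isolates one class from a segmentation map using the color key. It uses tiny in-memory stand-ins for the image and the key; in practice you would load semantic_segmentation.png (e.g. with PIL) and the JSON file with json.load. The "R"/"G"/"B" field names inside each leaf are assumptions about the file's layout.

```python
import numpy as np

# Hypothetical fragment of the color key, mirroring the nesting described
# in this document. The "R"/"G"/"B" field names are assumptions.
color_key = {
    "human": {"head": {"eyebrow": {"left": {"R": 25890, "G": 62289, "B": 62077}}}},
    "background": {"R": 0, "G": 0, "B": 0},
}

# A tiny 2x2 stand-in for semantic_segmentation.png (16-bit channels).
seg = np.array(
    [
        [[25890, 62289, 62077], [0, 0, 0]],
        [[0, 0, 0], [25890, 62289, 62077]],
    ],
    dtype=np.uint16,
)

# Build a boolean mask of every pixel that belongs to the left eyebrow.
leaf = color_key["human"]["head"]["eyebrow"]["left"]
target = np.array([leaf["R"], leaf["G"], leaf["B"]], dtype=np.uint16)
mask = np.all(seg == target, axis=-1)
print(mask.sum())  # number of left-eyebrow pixels in this toy image: 2
```

The resulting mask can then be used to crop, count, or weight pixels for that class when evaluating a model.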

Because the semantic segmentation of the subject in the scene remains the same regardless of lighting conditions and background imagery, only one semantic segmentation map per camera is needed regardless of the number of lighting scenarios. If you have more than one camera in the scene, each camera folder has its own semantic segmentation map, showing the body parts from that camera’s point of view.

semantic_segmentation_metadata.json

This file contains the lookup table of RGB values that were assigned to each semantic object in semantic_segmentation.png. All datasets use the same lookup table, though it is subject to change as we introduce new features into the platform.

At the top of this file is a single field called version, a string that provides version tracking for this file. Whenever you access this file in a datapoint, check that the version matches the one you expect; otherwise, its format and fields may not be what your code assumes. As of this writing, the most recent version is “1.0.0”.
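A minimal version check might look like the sketch below. The inline JSON string is a stand-in for the real file (which you would read from the Scene folder with json.load), and EXPECTED_VERSION is a name introduced here for illustration.

```python
import json

EXPECTED_VERSION = "1.0.0"

# Stand-in for semantic_segmentation_metadata.json; in practice, read the
# file from the Scene folder instead.
metadata = json.loads('{"version": "1.0.0", "background": {"R": 0, "G": 0, "B": 0}}')

# Fail fast if the color key's format may not match what this code expects.
if metadata.get("version") != EXPECTED_VERSION:
    raise ValueError(
        f"Unsupported color-key version {metadata.get('version')!r}; "
        f"expected {EXPECTED_VERSION}"
    )
```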

This table shows the RGB values in version 1.0.0 of the file. The Segment column gives the dotted path through the nested JSON objects; only the lowest node in each branch (the leaf) is assigned a color value. Note that the channel values are 16-bit (0 to 65535) rather than 8-bit.

Semantic segmentation color values

| Segment | R | G | B |
| --- | --- | --- | --- |
| human.head.eyebrow.left | 25890 | 62289 | 62077 |
| human.head.eyebrow.right | 4241 | 48855 | 61452 |
| human.head.hair | 4384 | 1699 | 55003 |
| human.head.eye.left.eyeball | 65036 | 35253 | 58649 |
| human.head.eye.left.tear_duct | 34327 | 19882 | 46768 |
| human.head.eye.left.eyelid | 52594 | 303 | 33189 |
| human.head.eye.right.eyeball | 9488 | 53408 | 21751 |
| human.head.eye.right.tear_duct | 4195 | 21866 | 38595 |
| human.head.eye.right.eyelid | 61585 | 56473 | 45546 |
| human.head.mouth.teeth.top | 29528 | 21170 | 7322 |
| human.head.mouth.teeth.bottom | 56124 | 29399 | 39367 |
| human.head.mouth.lips.top.left | 58222 | 23071 | 11009 |
| human.head.mouth.lips.top.right | 35538 | 21858 | 35678 |
| human.head.mouth.lips.bottom.left | 60167 | 35896 | 35807 |
| human.head.mouth.lips.bottom.right | 7346 | 20603 | 45223 |
| human.head.mouth.interior | 40260 | 6791 | 10197 |
| human.head.mouth.gums.top | 30375 | 54288 | 35256 |
| human.head.mouth.gums.bottom | 49584 | 25641 | 61743 |
| human.head.mouth.tongue | 45705 | 61178 | 36197 |
| human.head.neck.left | 38693 | 46071 | 10081 |
| human.head.neck.right | 18268 | 6851 | 56303 |
| human.head.skin.left | 9114 | 4994 | 13827 |
| human.head.skin.right | 44166 | 8762 | 45533 |
| human.head.nose.left | 25984 | 22567 | 16578 |
| human.head.nose.right | 54652 | 54478 | 51619 |
| human.head.ear.left | 25594 | 22703 | 49163 |
| human.head.ear.right | 4027 | 30928 | 11794 |
| human.head.beard | 46673 | 34005 | 57112 |
| background | 0 | 0 | 0 |
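To make the branch-versus-leaf structure concrete, the sketch below walks a hypothetical fragment of the color key and flattens each branch into a dotted path, keeping only the leaf nodes that carry color values. The "R"/"G"/"B" field names and the flatten helper are assumptions introduced here, not part of the file specification.

```python
# Hypothetical fragment of the version 1.0.0 color key. Leaf nodes hold the
# R/G/B fields; interior nodes are plain JSON objects (field names assumed).
color_key = {
    "human": {
        "head": {
            "eyebrow": {
                "left": {"R": 25890, "G": 62289, "B": 62077},
                "right": {"R": 4241, "G": 48855, "B": 61452},
            },
            "hair": {"R": 4384, "G": 1699, "B": 55003},
        }
    },
    "background": {"R": 0, "G": 0, "B": 0},
}

def flatten(node, path=()):
    """Yield (dotted path, (R, G, B)) for every leaf in the color key."""
    if {"R", "G", "B"} <= node.keys():
        # Leaf node: it carries the color value for this branch.
        yield ".".join(path), (node["R"], node["G"], node["B"])
    else:
        # Interior node: recurse into each child object.
        for name, child in node.items():
            yield from flatten(child, path + (name,))

table = dict(flatten(color_key))
print(table["human.head.hair"])  # (4384, 1699, 55003)
```

Because the per-datapoint file lists only items present in the scene, flattening it this way also tells you which classes can actually appear in that datapoint's segmentation maps.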

Because the lookup table remains the same across all cameras and lighting scenarios, this file is saved in the Scene folder. Each datapoint has its own copy of the file, which lists only the items present in the scene. For example, if the subject of the scene does not have a beard, the “beard” item will not appear in the file.