Skip to main content

Color Formats

To represent color values, a format is agreed upon. Color formats are made up of three things, the color model--which includes the order of the components and sometimes chroma subsampling-- the bit depth, and whether it is a packed or a planar format. In some cases, endianness may be important.

Color Models​

A color model is a method of representing colors in a video or image using data. Different color models store color and brightness information in different ways. There are many different color models, but this section will cover the models most commonly used for images and video.

RGB​

RGB is probably the most well-known color model, and is primarily used in image encoding. RGB consists of three color channels, Red, Green, and Blue, which are then combined to determine the final color of each pixel. Typically, RGB is the final model that a monitor or TV will use to display images, because the pixels on a screen are made up of red, green, and blue LEDs, although it is not commonly used for video encoding because other models can provide better compression.

YUV​

YUV, also known as YCbCr, is the most widely used color model for video encoding. It consists of three components: Y aka Luma, which represents luminance or brightness, and two chroma planes, which represent color. Generally a video player will have to convert a YUV video into RGB before it can be rendered, but there are significant compression benefits to using YUV over RGB for video.

The most notable reason to use YCbCr is an optimization called chroma subsampling. This means that the chroma components can be encoded at a lower resolution than the luma components, which results in a smaller output file. You can read more about chroma subsampling further below.

Component order​

The order in which the components in a color model are arranged is simply represented by writing them out. For example, RGB for red first, then green, then blue, or BGR for blue, green, red.

Bit depth​

A bit depth is how many bits are available to store the sample value. There are two main ways to specify the bit depth in a :

  • bits per component. Here, RGB888 reads as RGB color model, with 8 bits for the red component, 8 bits for the green component, and 8 bits for the blue component and RGB565 reads as RGB color model, with 5 bits for the red component, 6 bits for the green component, and 5 bits for the blue component.
  • bits per sample. Here, RGB24 reads as RGB color model, with 24 bits in total for the red, green, and blue components. This is ambiguous, because one does not know exactly how many bits are allocated to each component. RGB565, RGB556, and RGB655 (even though the latter ones do not make much sense as the eye is most sensitive to green light) all become RGB16.

Packed vs planar​

Components can be stored either packed, where all components are interleaved (here, RGB):

Sample number:   1   2   3   4   5
Data: RGB RGB RGB RGB RGB

or stored separately for each component:

Sample number: 1 2 3 4 5
Data: R R R R R...
Data: G G G G G...
Data: B B B B B...

In planar formats, many operations can be easier to implement, as it is possible to implement the algorithm once and then operate on all planes. On the other hand, packed formats are simpler and often used in hardware.1

Endianness​

Different computer architectures store numbers differently. For more information, visit the Wikipedia article on endianness. There are two main ways to store numbers with more than 8 bits (1 is the least significant byte and 4 is the most significant byte, here 4 bytes):

  • Most significant byte first, little endian, 4321. This is what x86-family processors use.
  • Least significant byte first, big endian, 1234. This is what PowerPC-family processors use.

This can be important for color formats, as some computers might store it in their native endianness. VapourSynth doesn't seem to care about endianness, but FFmpeg does.

For example, RGB565 might store its two bytes in 12 or 21 order, and if they are read in the wrong order, it will produce garbage.

Chroma subsampling​

In Y'CbCr signals, there are three widely used variants of chroma subsampling:

  • 4:2:0 which has half the vertical and horizontal chroma resolution
  • 4:2:2 which has half the horizontal chroma resolution but full vertical resolution
  • 4:4:4 which has full chroma resolution (no subsampling)

4:2:2 is not particularly useful over the other options, so this guide will focus on 4:2:0 and 4:4:4.

4:2:0 is the most commonly used format for videos. Nearly every DVD, blu-ray, camera recording, etc. uses 4:2:0 subsampling. This is because, in the majority of cases, human eyes do not notice the reduction in chroma resolution. There is very little benefit to using 4:4:4 in the average case.

However, there are some exceptions. The most notable is screen recordings. Things like text overlays, video game UI overlays, etc. have very fine, color-dependent detail that can be destroyed by chroma subsampling and result in an aliased look to the video. Therefore, it is recommended to use 4:4:4 subsampling when recording your screen, and 4:2:0 subsampling in most other cases.

Common formats​

VS nameFFmpeg nameMeaning
GRAY8gray8Brightness only, 8 bits, packed
GRAY16gray16le, gray16be (the suffix specifies the endianness)Brightness only, 16 bits
RGB888rgb24red, green, blue, 8 bits per component
YUV420P8yuv420pluma, chroma blue, chroma red, 8 bits per component, planar, 4:2:0 subsampling
YUV422P8yuv422pluma, chroma blue, chroma red, 8 bits per component, planar, 4:2:2 subsampling
YUV444P8yuv444pluma, chroma blue, chroma red, 8 bits per component, planar, no subsampling
YUV420P10yuv420p10le, yuv420p10leluma, chroma blue, chroma red, 10 bits per component, planar, 4:2:0 subsampling
YUV422P10yuv422p10le, yuv422p10leluma, chroma blue, chroma red, 10 bits per component, planar, 4:2:2 subsampling
YUV444P10yuv444p10le, yuv444p10leluma, chroma blue, chroma red, 10 bits per component, planar, no subsampling

References​

Footnotes​

  1. YUV - VideoLAN Wiki ↩