HEVC: An introduction to high efficiency coding

1. Summary

High Efficiency Video Coding (HEVC) is a new standard for video compression that has the potential to deliver better compression performance than earlier standards such as H.264/AVC.

Source video, consisting of a sequence of video frames, is encoded or compressed by an HEVC video encoder to create a compressed video bitstream. The compressed bitstream is stored or transmitted. A video decoder decompresses the bitstream to create a sequence of decoded frames.

HEVC has the same basic structure as previous standards such as MPEG-2 Video and H.264/AVC. However, HEVC contains many incremental improvements such as:

  • More flexible partitioning, from large to small partition sizes

  • Greater flexibility in prediction modes and transform block sizes

  • More sophisticated interpolation and deblocking filters

  • More sophisticated prediction and signaling of modes and motion vectors

  • Features to support efficient parallel processing.

The result is a video coding standard that can enable better compression, at the cost of potentially higher processing requirements.

2. What is HEVC?

  1. An international standard for video compression. Developed by the Joint Collaborative Team on Video Coding (JCT-VC), a working group of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T VCEG (Video Coding Experts Group), HEVC is jointly published as ISO/IEC 23008-2 and ITU-T Recommendation H.265. It is published as a document (the standard itself) together with a reference software implementation (the test model, HM).

  2. A format for compressed video. The HEVC standard specifies a format for compressed or encoded video sequences, together with a method for decoding this format. An HEVC-compatible video sequence should (a) meet the specification of the compressed video format and (b) be correctly decodable using the method described in the standard. HEVC video sequences can be stored in media files, streamed over the internet, transmitted by broadcast, etc.

  3. A set of tools or methods for video compression. HEVC specifies a number of methods or tools that may be used by a video compression encoder. It’s up to the designer of the encoder which tools are actually used, and how they are applied.

  4. Better video compression. Depending on how the tools are used, HEVC has the potential to offer significantly higher compression than earlier standards such as H.264/AVC. Achieving the best possible compression is likely to require significant computational resources.

3. Why do we need it?

HEVC aims to provide a step change improvement in video compression compared with earlier standards. HEVC’s predecessor, the H.264/AVC standard, was first published in 2003. Since then, digital video has become increasingly ubiquitous. High Definition is now the norm for many devices and applications. HEVC was developed to address the following trends:

  • Widespread use of digital video, at increasingly high resolutions, which puts a significant strain on network capacity.

  • Increasing use of video resolutions beyond HD, which will increase the burden on networks and storage even further.

  • Continuing improvements in processing capacity. In 2013, a mobile handset or tablet is likely to have more computing power than a desktop computer from 2003.

With these issues in mind, a new video compression standard that makes use of higher computational capacities to enable more efficient handling of high resolution video is an attractive proposition. With HEVC, it should be possible to store or transmit video more efficiently than with earlier technologies such as H.264. This means:

  • At the same picture size and quality, an HEVC video sequence should occupy less storage or transmission capacity than the equivalent H.264 video sequence.

  • At the same storage or transmission bandwidth, the quality and/or resolution of an HEVC video sequence should be higher than the corresponding H.264 video sequence.

4. How does HEVC work?

HEVC is based on the same general structure as previous standards. Source video, consisting of a sequence of video frames, is encoded or compressed by a video encoder to create a compressed video bitstream. The compressed bitstream is stored or transmitted. A video decoder decompresses the bitstream to create a sequence of decoded frames.

The steps carried out by a video encoder (Figure 2) include:

  • Partitioning each picture into multiple units

  • Predicting each unit using inter or intra prediction, and subtracting the prediction from the unit

  • Transforming and quantizing the residual (the difference between the original picture unit and the prediction)

  • Entropy encoding the transform output, prediction information, mode information and headers.

A video decoder reverses the steps:

  • Entropy decoding and extracting the elements of the coded sequence

  • Rescaling and inverting the transform stage

  • Predicting each unit and adding the prediction to the output of the inverse transform

  • Reconstructing a decoded video image.
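The encoder and decoder steps above can be sketched for a single 4x4 block. This is a minimal illustration under stated assumptions, not the HEVC algorithm: it uses a floating-point DCT instead of the integer transforms defined in the standard, a single uniform quantizer step (`qstep`, a hypothetical parameter), and it omits entropy coding altogether. The function names are illustrative only.

```python
import math

N = 4  # toy block size (assumption; HEVC supports several transform sizes)


def dct_matrix(n):
    """Orthonormal DCT-II matrix; each row is one basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = dct_matrix(N)
CT = [list(col) for col in zip(*C)]  # transpose of C


def encode_block(block, prediction, qstep):
    """Subtract the prediction, transform the residual, then quantize."""
    residual = [[block[y][x] - prediction[y][x] for x in range(N)]
                for y in range(N)]
    coeffs = matmul(matmul(C, residual), CT)  # forward 2-D transform
    # These integer levels are what an entropy coder would receive.
    return [[round(c / qstep) for c in row] for row in coeffs]


def decode_block(levels, prediction, qstep):
    """Rescale, inverse transform, add the prediction, clip to 8 bits."""
    coeffs = [[level * qstep for level in row] for row in levels]
    residual = matmul(matmul(CT, coeffs), C)  # inverse 2-D transform
    return [[min(255, max(0, int(round(prediction[y][x] + residual[y][x]))))
             for x in range(N)] for y in range(N)]
```

A larger `qstep` discards more of the residual, which lowers the amount of data to be coded but increases the difference between the decoded block and the original.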

The HEVC standard defines (i) the syntax or format of a compressed video sequence and (ii) a method of decoding a compressed sequence. The actual design of the encoder is not standardised.

4.1. Partitioning

HEVC supports highly flexible partitioning of a video sequence. Each frame of the sequence is split up into rectangular or square regions (Units or Blocks), each of which is predicted from previously coded data. After prediction, any residual information is transformed and entropy encoded.

Each coded video frame, or picture, is partitioned into Tiles and/or Slices, which are further partitioned into Coding Tree Units (CTUs). The CTU is the basic unit of coding, analogous to the Macroblock in earlier standards, and can be up to 64x64 pixels in size.

A Coding Tree Unit can be subdivided into square regions known as Coding Units (CUs) using a quadtree structure (Figure 3). Each CU is predicted using Inter or Intra prediction and transformed using one or more Transform Units (see below).

Figure 4 shows a video frame partitioned into slices, with one slice highlighted in blue. The highlighted slice contains six 64x64 CTUs.

Figure 5 shows a close-up of the CTU highlighted in Figure 4. The 64x64 CTU is split into four 32x32 regions, with the top-left 32x32 CU highlighted. In the other three quarters, the 32x32 region is split further, to 16x16 or 8x8 CUs.
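The quadtree subdivision of a CTU into CUs can be sketched as a short recursive function. This is an illustrative sketch only: the split decision is left to the encoder designer, so `should_split` is a hypothetical callback, and `example_decision` is an invented pattern that loosely resembles the description above (top-left 32x32 kept whole, the remaining 32x32 regions split further). A minimum CU size of 8x8 is assumed.

```python
def split_ctu(x, y, size, should_split, min_size=8):
    """Return the leaf CUs of one CTU as a list of (x, y, size) tuples."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_ctu(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]


def example_decision(x, y, size):
    """Hypothetical split rule, standing in for a real encoder's mode decision."""
    if size == 64:
        return True                # always split the 64x64 root
    if size == 32:
        return (x, y) != (0, 0)    # keep the top-left 32x32 CU whole
    if size == 16:
        return (x, y) == (32, 32)  # split a single 16x16 region down to 8x8
    return False


cus = split_ctu(0, 0, 64, example_decision)
```

Whatever decision rule is used, the resulting CUs tile the CTU exactly: their areas always sum to the area of the CTU.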

4.2. Prediction

Frames of video are coded using Intra or Inter prediction. Figure 6 shows a sequence of coded video frames or coded pictures. The first picture (0) is coded using Intra prediction only, using spatial prediction from other regions of the same picture. Subsequent pictures are predicted from one, two or more reference pictures, using Inter and/or Intra prediction for each Prediction Unit (PU). The prediction sources for each picture are indicated by arrows.

Each Coding Unit (CU) is partitioned into one or more Prediction Units (PUs), each of which is predicted using Intra or Inter prediction.
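As a rough illustration, the ways a square CU can be divided into PUs can be listed in a few lines. The mode names follow the labels commonly used for HEVC partition shapes (symmetric 2Nx2N, 2NxN, Nx2N, NxN and the asymmetric shapes that split at one quarter of the CU size). The function name is hypothetical, and the sketch ignores the restrictions the standard places on which shapes are allowed for a given CU size and prediction type.

```python
def pu_partitions(cu_size):
    """Map each partition mode to its list of PU (width, height) pairs."""
    s = cu_size        # full CU side, "2N"
    n = cu_size // 2   # half side, "N"
    q = cu_size // 4   # quarter side, used by the asymmetric shapes
    return {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, n), (s, n)],
        "Nx2N": [(n, s), (n, s)],
        "NxN": [(n, n)] * 4,
        "2NxnU": [(s, q), (s, s - q)],
        "2NxnD": [(s, s - q), (s, q)],
        "nLx2N": [(q, s), (s - q, s)],
        "nRx2N": [(s - q, s), (q, s)],
    }
```

For example, `pu_partitions(16)["Nx2N"]` gives two PUs that are 8 samples wide and 16 samples high.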

Intra prediction: Each PU is predicted from neighbouring image data in the same picture, using DC prediction (an average value for the PU), planar prediction (fitting a plane surface to the PU) or directional prediction (extrapolating from neighbouring data).
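The three kinds of intra prediction can be sketched as follows. This is a simplified illustration: `top` and `left` are the rows of already-decoded neighbouring samples above and to the left of a square PU, and `top_right` and `bottom_left` are single neighbouring samples used by the planar mode (all hypothetical parameter names). Only the pure vertical and horizontal directions are shown as directional examples, the planar weighting is modelled on the bilinear form used in HEVC, and the reference-sample smoothing and edge filtering applied by the standard are omitted.

```python
def predict_dc(top, left):
    """DC prediction: fill the PU with the rounded mean of its neighbours."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


def predict_vertical(top):
    """Directional (vertical): copy the row above down every column."""
    return [list(top) for _ in range(len(top))]


def predict_horizontal(left):
    """Directional (horizontal): copy the left column across every row."""
    n = len(left)
    return [[left[y]] * n for y in range(n)]


def predict_planar(top, left, top_right, bottom_left):
    """Planar: blend a horizontal and a vertical linear interpolation."""
    n = len(top)                 # PU side, assumed to be a power of two
    shift = n.bit_length()       # equals log2(n) + 1 for a power of two
    pred = []
    for y in range(n):
        row = []
        for x in range(n):
            horizontal = (n - 1 - x) * left[y] + (x + 1) * top_right
            vertical = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            row.append((horizontal + vertical + n) >> shift)
        pred.append(row)
    return pred
```

An encoder would typically try several modes for each PU and keep the one that gives the best trade-off between residual size and signalling cost.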

Inter prediction: Each PU is predicted from image data in one or two reference pictures (before or after the current picture in display order), using motion compensated prediction. Motion vectors have up to quarter-sample resolution (luma component).
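Motion compensated prediction with quarter-sample motion vectors can be sketched as below. This is a simplified illustration under stated assumptions: the motion vector (`mv_x`, `mv_y`) is given in quarter-sample units, fractional positions are interpolated bilinearly (HEVC itself uses longer separable interpolation filters), and samples outside the reference picture are obtained by repeating the edge samples. The function and parameter names are hypothetical.

```python
def motion_compensate(ref, x0, y0, w, h, mv_x, mv_y):
    """Predict a w x h PU at (x0, y0) from reference picture `ref`."""
    height, width = len(ref), len(ref[0])

    def sample(x, y):
        # Clamp coordinates so that off-picture reads repeat the edge sample.
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return ref[y][x]

    # Split each vector component into integer and quarter-sample parts.
    ix, fx = mv_x >> 2, mv_x & 3
    iy, fy = mv_y >> 2, mv_y & 3

    pred = []
    for y in range(h):
        row = []
        for x in range(w):
            rx, ry = x0 + x + ix, y0 + y + iy
            a, b = sample(rx, ry), sample(rx + 1, ry)
            c, d = sample(rx, ry + 1), sample(rx + 1, ry + 1)
            # Bilinear weights in units of 1/16, with rounding.
            value = ((4 - fx) * (4 - fy) * a + fx * (4 - fy) * b
                     + (4 - fx) * fy * c + fx * fy * d + 8) >> 4
            row.append(value)
        pred.append(row)
    return pred


def bi_predict(pred0, pred1):
    """Average two predictions, as when two reference pictures are used."""
    return [[(p + q + 1) >> 1 for p, q in zip(r0, r1)]
            for r0, r1 in zip(pred0, pred1)]
```

A vector of (4, -4) therefore means one whole sample to the right and one whole sample up, while (2, 0) points halfway between two horizontally adjacent samples.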

Figure 7 shows two examples of Prediction Units. The CTU in the centre of the Figure is predicted using a single 64x64 PU. All the samples in this PU are predicted using the same motion compensated inter prediction from one or two reference frames. Shown on the right is an 8x16 PU, which is part of the prediction structure for a 16x16 CU.

Further reading

Iain E. Richardson, “Coding Video: A Practical Guide to HEVC and Beyond”, John Wiley & Sons, 2024.

About the author

Vcodex is led by Professor Iain Richardson, an internationally known expert on the MPEG and H.264 video compression standards. Based in Delft, The Netherlands, he frequently travels to the US and Europe.

Iain Richardson is an internationally recognised expert on video compression and digital video communications. He is the author of four other books about video coding which include two widely-cited books on the H.264 Advanced Video Coding standard. For over thirty years, he has carried out research in the field of video compression and video communications, as a Professor at the Robert Gordon University in Aberdeen, Scotland and as an independent consultant with his own company, Vcodex. He advises companies on video compression technology and is sought after as an expert witness in litigation cases involving video coding.