What is AV1?

2024-07-18

What is AV1?

Video consumption keeps growing, and high-resolution (4K, 8K) content is becoming more widespread, which increases the need for more efficient video compression. AV1 (AOMedia Video 1) is an open, royalty-free video codec developed by the Alliance for Open Media (AOMedia) to address this need.

Why Did We Need AV1?

Previous codecs like H.264/AVC and H.265/HEVC were either insufficient to meet the increasing demands for video quality and resolution, or their licensing fees posed a significant cost, especially for companies engaged in large-scale video distribution. AV1 was developed to both offer higher compression ratios and provide an open, royalty-free alternative.

How Does AV1 Work?

AV1 is fundamentally a block-based and hybrid video codec. The encoding and decoding processes involve the following steps:

Partitioning
Prediction
Transform and Quantization
Entropy Coding

Partitioning

AV1 uses a highly flexible partitioning structure:

Superblocks: 128x128 or 64x64 pixels in size
Coding Blocks: Ranging from 4x4 to 128x128 in size
Recursive structure: Superblocks can be adaptively divided into smaller blocks based on content

*Figure 1: AV1 Partitioning Structure*
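
The recursive splitting described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it splits each block into four equal quadrants (AV1 also allows rectangular and other split shapes), and the variance threshold is a hypothetical stand-in for the rate-distortion decision a real encoder makes.

```python
def variance(block):
    """Population variance of a 2D list of pixel values."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def partition(frame, x, y, size, min_size=4, threshold=100.0):
    """Recursively split a square block into quadrants while it is 'busy'.

    Returns a list of (x, y, size) leaf blocks. The threshold is a
    hypothetical simplification, not how an AV1 encoder decides.
    """
    block = [row[x:x + size] for row in frame[y:y + size]]
    if size <= min_size or variance(block) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition(frame, x + dx, y + dy, half, min_size, threshold)
    return leaves


# A 16x16 frame that is flat except for a textured top-left 8x8 corner.
frame = [[0] * 16 for _ in range(16)]
for yy in range(8):
    for xx in range(8):
        frame[yy][xx] = 255 if (xx + yy) % 2 else 0

leaves = partition(frame, 0, 0, 16)
```

In this toy frame the textured corner ends up as four 4x4 blocks, while the three flat quadrants stay as 8x8 blocks.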

Prediction

Prediction mechanisms, one of the most important components of the AV1 codec, greatly affect video compression efficiency. AV1 offers more advanced and diverse prediction methods compared to previous codecs. These prediction methods are divided into two main categories: Intra Prediction and Inter Prediction. Additionally, AV1 uses special prediction techniques for chroma and luma components.

1. Intra Prediction

Intra prediction uses pixel values within the current frame to make predictions. AV1 offers a wide range of modes for intra prediction.

1.1. Angular Modes

AV1 offers a total of 56 angular intra prediction modes. This is a significant increase compared to H.265/HEVC’s 33 modes.

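The modes are built from eight nominal directions (45, 67, 90, 113, 135, 157, 180, and 203 degrees), each of which can be refined by an offset of -9 to +9 degrees in 3-degree steps, giving 8 x 7 = 56 angles.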
These fine angle intervals allow for more accurate predictions, especially in images with complex textures.

*Figure 2: AV1's 56 angular intra prediction modes*
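
To see where the count of 56 comes from, here is a small sketch that enumerates the angles, assuming eight nominal directions that are each refined by seven offsets in 3-degree steps. The constant and function names are my own, chosen for illustration.

```python
NOMINAL_ANGLES = [45, 67, 90, 113, 135, 157, 180, 203]  # degrees
ANGLE_STEP = 3       # degrees per offset unit
MAX_ANGLE_DELTA = 3  # offsets run from -3 to +3 units


def directional_angles():
    """List every prediction angle: nominal direction plus a small offset."""
    return [base + delta * ANGLE_STEP
            for base in NOMINAL_ANGLES
            for delta in range(-MAX_ANGLE_DELTA, MAX_ANGLE_DELTA + 1)]


angles = directional_angles()
print(len(angles), min(angles), max(angles))
```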

1.2. Planar Mode

The planar mode assumes that pixels within the block form a smooth transition. This mode is effective particularly in areas with gradual color changes.

1.3. DC Mode

The DC mode predicts by taking the average of pixels around the block. This is useful in flat and single-color areas.
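
A minimal sketch of the idea, assuming the neighbors are the row of pixels above the block and the column to its left. Real codecs handle missing neighbors and non-square blocks with extra rules that are left out here.

```python
def dc_predict(above, left, size):
    """Fill a size x size block with the rounded mean of its neighbors."""
    neighbors = list(above) + list(left)
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return [[dc] * size for _ in range(size)]


pred = dc_predict(above=[100, 102, 101, 99], left=[98, 100, 103, 101], size=4)
```

Every pixel of the predicted block gets the same value, so only the (usually small) residual has to be coded.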

1.4. Palette Mode

AV1 offers a special palette mode for content with a limited color palette. This mode:

Identifies unique colors within the block.
Saves these colors to a palette.
Encodes the palette index for each pixel.

The palette mode is particularly effective for screen sharing and certain types of animation.
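
The three steps above can be sketched as follows. The limit of 8 colors per palette matches my understanding of AV1, but the rest (sorted palette order, returning None when there are too many colors) is a simplification for illustration.

```python
def palette_encode(block, max_colors=8):
    """Return (palette, index_map), or None if the block has too many colors."""
    palette = sorted({p for row in block for p in row})
    if len(palette) > max_colors:
        return None
    lookup = {color: i for i, color in enumerate(palette)}
    index_map = [[lookup[p] for p in row] for row in block]
    return palette, index_map


def palette_decode(palette, index_map):
    """Rebuild the block by looking each index up in the palette."""
    return [[palette[i] for i in row] for row in index_map]


# A screen-content style block: two colors only.
block = [[255, 255, 0, 0],
         [255, 0, 0, 255],
         [0, 0, 255, 255],
         [0, 255, 255, 0]]
palette, index_map = palette_encode(block)
```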

1.5. Smooth Prediction Modes

AV1 offers smooth prediction modes specially designed for smooth transitions:

Smooth Vertical
Smooth Horizontal
Smooth

These modes are effective in areas with gradual color changes.

2. Inter Prediction

Inter prediction uses information from previous or future frames to predict the current frame. AV1 also offers various innovations in this area.

2.1. Advanced Motion Vector Prediction

AV1 uses advanced algorithms to predict motion vectors:

Analyzes motion vectors of neighboring blocks.
Examines motion patterns in previous frames.
Uses this information to predict the most likely motion vector for the current block.
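
As a rough sketch of this idea, the code below ranks candidate motion vectors by how often they appear among spatial and temporal neighbors, then codes only the difference to the best candidate. The weights (2 for spatial, 1 for temporal) and the function names are hypothetical; AV1's actual candidate list construction is considerably more involved.

```python
from collections import Counter


def rank_mv_candidates(spatial_mvs, temporal_mvs, max_candidates=2):
    """Rank candidate motion vectors by a simple weighted vote."""
    weights = Counter()
    for mv in spatial_mvs:
        weights[mv] += 2  # hypothetical weight for neighbors in this frame
    for mv in temporal_mvs:
        weights[mv] += 1  # hypothetical weight for co-located past motion
    ranked = sorted(weights, key=lambda mv: (-weights[mv], mv))
    return ranked[:max_candidates]


def mv_residual(actual, predicted):
    """Only this difference needs to be written to the bitstream."""
    return (actual[0] - predicted[0], actual[1] - predicted[1])


candidates = rank_mv_candidates([(4, 0), (4, 0), (3, 1)], [(4, 0), (0, 0)])
residual = mv_residual((5, 0), candidates[0])
```

When the prediction is good, the residual is small and cheap to code.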

2.2. Compound Prediction

AV1 can use multiple reference frames for a single block:

Combines predictions from two different reference frames.
Uses weighted average, distance-weighted compound, and other complex combination methods.
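
Two of these combination methods can be sketched with integer arithmetic. This is an illustration only: the distance-weighted version here uses weights directly proportional to temporal distance, whereas AV1 picks from a small set of quantized weights.

```python
def compound_average(pred_a, pred_b):
    """Equal-weight blend of two predictions, with rounding."""
    return [[(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pred_a, pred_b)]


def compound_distance_weighted(pred_a, pred_b, dist_a, dist_b):
    """Blend so that the temporally closer reference gets the larger weight."""
    total = dist_a + dist_b
    return [[(dist_b * a + dist_a * b + total // 2) // total
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pred_a, pred_b)]


pred_a = [[100, 100], [100, 100]]  # from a reference 1 frame away
pred_b = [[120, 120], [120, 120]]  # from a reference 3 frames away
avg = compound_average(pred_a, pred_b)
weighted = compound_distance_weighted(pred_a, pred_b, dist_a=1, dist_b=3)
```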

2.3. Warped Motion Compensation

This feature is used to model complex movements such as camera movements or object rotations:

Predicts the motion model using affine transformations.
Supports global and local motion models.
Particularly effective in scenes involving pan, zoom, and rotation.
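
The following sketch shows why an affine model can describe zoom and rotation where a single motion vector cannot: each pixel position is mapped individually. The six-parameter layout and the helper names are assumptions made for this example, not AV1's bitstream format.

```python
import math


def affine_map(x, y, model):
    """Map a pixel position in the current frame to the reference frame.

    model = (a, b, c, d, tx, ty), a hypothetical layout where
    x' = a*x + b*y + tx and y' = c*x + d*y + ty.
    """
    a, b, c, d, tx, ty = model
    return (a * x + b * y + tx, c * x + d * y + ty)


def rotation_zoom_model(angle_deg, zoom, tx=0.0, ty=0.0):
    """Build an affine model for a rotation combined with a uniform zoom."""
    rad = math.radians(angle_deg)
    return (zoom * math.cos(rad), -zoom * math.sin(rad),
            zoom * math.sin(rad), zoom * math.cos(rad), tx, ty)


corners = [(0, 0), (7, 0), (0, 7), (7, 7)]
# A pan moves every pixel by the same amount.
pan = (1.0, 0.0, 0.0, 1.0, 3.0, -2.0)
panned = [affine_map(x, y, pan) for x, y in corners]
# A zoom moves pixels farther the farther they are from the origin.
zoom = rotation_zoom_model(angle_deg=0, zoom=2.0)
zoomed = [affine_map(x, y, zoom) for x, y in corners]
```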

2.4. Overlapped Block Motion Compensation (OBMC)

OBMC is used to smooth sharp transitions at block boundaries:

Takes into account motion vectors of neighboring blocks.
Provides smoother transitions at block boundaries.
Reduces block artifacts.
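
A simplified sketch of the blending step, for the top edge of a block only. It assumes we already have two predictions for the same pixels: one made with the block's own motion vector and one made with the motion vector of the block above. The linear weight ramp is hypothetical; AV1 uses its own predefined blending masks.

```python
def obmc_blend_top(own_pred, neighbor_pred, overlap_rows=2):
    """Blend the top rows of a block with the above neighbor's prediction.

    The neighbor's influence is strongest at the shared boundary and fades
    with distance from it (hypothetical linear ramp).
    """
    out = [row[:] for row in own_pred]
    for r in range(overlap_rows):
        w_neighbor = (overlap_rows - r) / (2 * overlap_rows)
        for c in range(len(out[r])):
            blended = w_neighbor * neighbor_pred[r][c] + (1 - w_neighbor) * own_pred[r][c]
            out[r][c] = round(blended)
    return out


own = [[100] * 4 for _ in range(4)]
neighbor = [[120] * 4 for _ in range(4)]
blended = obmc_blend_top(own, neighbor)
```

Rows near the boundary move toward the neighbor's prediction, while rows deeper inside the block are left unchanged.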

2.5. Multiple Reference Frames

AV1 lets each frame draw on up to seven reference frames, so a block is not limited to the immediately preceding frame:

Can use distant past frames as references.
This is particularly useful in scenes with periodic movements.

3. Adaptive Prediction Mode Selection

AV1 uses a sophisticated algorithm to select the most suitable prediction mode for each block:

Uses the Rate-Distortion Optimization (RDO) technique.
Tries different prediction modes and selects the mode that provides the best bit-rate/quality balance.
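
Rate-distortion optimization is usually expressed as minimizing a cost J = D + lambda * R, where D is the distortion, R the number of bits, and lambda the trade-off between them. Here is a minimal sketch; the mode names and the distortion and rate numbers are made up for illustration.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian cost: lower is better."""
    return distortion + lam * rate_bits


def choose_mode(candidates, lam):
    """Pick the mode with the lowest rate-distortion cost.

    candidates maps a mode name to a (distortion, rate_bits) pair.
    """
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Made-up measurements for one block.
candidates = {
    "DC": (900.0, 10),       # cheap to signal, poor match
    "SMOOTH": (600.0, 25),
    "ANGULAR": (400.0, 60),  # best match, expensive to signal
}
high_quality = choose_mode(candidates, lam=1)
balanced = choose_mode(candidates, lam=10)
low_bitrate = choose_mode(candidates, lam=50)
```

A small lambda favors quality and picks the accurate but expensive mode; a large lambda favors bit savings and picks the cheap one.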

4. Chroma and Luma Prediction

AV1 provides more effective prediction by handling chroma (color) and luma (brightness) components separately.

4.1. Luma Prediction

Luma prediction uses most of the intra and inter prediction methods mentioned above:

All 56 angular modes, planar mode, DC mode, and smooth modes can be used for luma prediction.
Luma prediction is typically done in block sizes ranging from 4x4 to 128x128.

4.2. Chroma Prediction

Chroma prediction shows some differences compared to luma prediction:

It’s typically done in smaller block sizes (e.g., 4x4 to 64x64).
The number of intra prediction modes used is generally fewer than for luma.

4.3. Chroma from Luma (CfL) Prediction

CfL prediction, one of AV1’s most innovative features, predicts chroma values using luma values:

The correlation between luma and chroma values is analyzed.
A linear model is created based on this correlation.
Chroma values are predicted by applying luma values to this model.

*Figure 3: AV1's Chroma from Luma (CfL) prediction method*
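
The linear model can be sketched as chroma = DC + alpha * (luma - mean(luma)). In the example below the encoder side fits alpha with a least-squares estimate and the prediction applies it. This is a simplified illustration: it works on flat lists of co-located samples and ignores chroma subsampling and the quantization of alpha.

```python
def cfl_fit_alpha(luma, chroma):
    """Least-squares slope relating zero-mean luma to zero-mean chroma."""
    n = len(luma)
    luma_mean = sum(luma) / n
    chroma_mean = sum(chroma) / n
    luma_ac = [l - luma_mean for l in luma]
    num = sum(la * (c - chroma_mean) for la, c in zip(luma_ac, chroma))
    den = sum(la * la for la in luma_ac)
    return num / den if den else 0.0


def cfl_predict(luma, chroma_dc, alpha):
    """Predict chroma as its DC value plus a scaled copy of luma detail."""
    luma_mean = sum(luma) / len(luma)
    return [chroma_dc + alpha * (l - luma_mean) for l in luma]


luma = [10, 20, 30, 40]
chroma = [105, 110, 115, 120]
alpha = cfl_fit_alpha(luma, chroma)
predicted = cfl_predict(luma, chroma_dc=sum(chroma) / len(chroma), alpha=alpha)
```

In this toy case chroma is an exact linear function of luma, so the prediction reproduces it perfectly.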

4.4. Separate Chroma Prediction Modes

AV1 also offers separate prediction modes for chroma:

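A separately signaled intra prediction mode for chroma
Chroma inter prediction that reuses the luma block’s reference frames
Chroma motion vectors derived from the luma motion vectors and scaled for chroma subsampling, rather than signaled separately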

4.5. Joint Chroma Prediction

AV1 can perform joint prediction for two chroma components (Cb and Cr):

Utilizes the correlation between two chroma components.
Reduces the total number of bits required for chroma.

5. Adaptive Chroma and Luma Prediction Selection

The AV1 encoder uses a complex decision mechanism to select the most suitable chroma and luma prediction method for each block:

Evaluates different prediction modes using the Rate-Distortion Optimization (RDO) technique.
Selects the best prediction mode for luma and chroma separately or together.
Chooses between CfL, separate chroma prediction, or joint chroma prediction depending on the content.

AV1’s advanced prediction mechanisms play a critical role in achieving high compression ratios. These improvements in intra and inter prediction methods provide significant quality increases, especially in complex content and at low bit rates. Special prediction techniques for chroma and luma enable more effective encoding of color and brightness information. These features make AV1 stand out as the video codec of the future.

Transform and Quantization

AV1 uses multiple transform functions:

DCT (Discrete Cosine Transform)
ADST (Asymmetric Discrete Sine Transform)
FlipADST (a flipped version of ADST)
Identity transform

Transform sizes can range from 4x4 to 64x64 and can be rectangular.
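
To make the transform and quantization step concrete, here is a floating-point sketch of a 1D DCT (type II) followed by uniform quantization. Real AV1 transforms are 2D, use integer approximations, and use more elaborate quantizers, so treat this purely as an illustration of the principle.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, s in enumerate(samples)))
    return out


def idct_1d(coeffs):
    """Inverse of dct_1d."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def quantize(coeffs, step):
    """Uniform quantization: this is where information is discarded."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    return [level * step for level in levels]


residual = [10, 10, 10, 10]
coeffs = dct_1d(residual)            # energy compacts into the first coefficient
levels = quantize(coeffs, step=4)
restored = idct_1d(dequantize(levels, step=4))
```

For a flat residual, all the energy lands in the first coefficient and the remaining ones quantize to zero, which is what makes the block cheap to code.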

Entropy Coding

AV1 takes a different approach to entropy coding: it uses multi-symbol arithmetic coding instead of the binary CABAC used in H.265/HEVC. This system is generally called the AV1 Entropy Coder or just “EC”. Its main features are:

Multi-Symbol Coding: AV1 uses a multi-symbol arithmetic coder, not a binary one. Each coded symbol can take one of up to 16 values, so a single coding step can convey what a binary coder would split into several binary decisions.
Non-Binary Arithmetic Coding: Instead of binary arithmetic coding, it uses a system that can handle a wider range of symbol probabilities.
Context Modeling: AV1 uses customized context models for different types of data, but this is not as complex as in CABAC.
Simpler Probability Update: It uses a simpler probability update mechanism compared to CABAC.
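
The probability update can be sketched with a cumulative distribution function (CDF) that is nudged toward each symbol as it is coded. The layout below (cdf[i] holds the scaled probability of a symbol being at most i) and the fixed adaptation rate are simplifying assumptions; AV1's actual coder stores and adapts its CDFs somewhat differently.

```python
PROB_TOTAL = 1 << 15  # probabilities are scaled to 15 bits


def uniform_cdf(num_symbols):
    """Start with every symbol equally likely."""
    return [PROB_TOTAL * (i + 1) // num_symbols for i in range(num_symbols)]


def update_cdf(cdf, symbol, rate=5):
    """Shift probability mass toward the symbol that was just coded."""
    for i in range(len(cdf) - 1):  # the last entry always stays at PROB_TOTAL
        if i >= symbol:
            cdf[i] += (PROB_TOTAL - cdf[i]) >> rate
        else:
            cdf[i] -= cdf[i] >> rate


def symbol_probability(cdf, symbol):
    low = cdf[symbol - 1] if symbol > 0 else 0
    return (cdf[symbol] - low) / PROB_TOTAL


cdf = uniform_cdf(4)
before = symbol_probability(cdf, 2)
for _ in range(20):
    update_cdf(cdf, symbol=2)
after = symbol_probability(cdf, 2)
```

The update needs only shifts and additions per entry, which is part of why it is cheap to implement.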

This approach provides faster encoding and decoding compared to CABAC, while still offering good compression performance. This design choice of AV1 aims to strike a balance between low latency and high efficiency.

Other Features of AV1

Film Grain Synthesis

Film grain synthesis aims to:

Preserve the natural film grain appearance in film or high-quality video content
Recreate the fine texture that compression would otherwise smooth away

*Figure 4: AV1's Film grain diagram*

Working Principle

On the encoding side:
- Film grain in the original video is analyzed
- Film grain is parameterized
- Parameters are transmitted along with the video stream
On the decoding side:
- Synthetic film grain is recreated using the parameters
- The created grain is applied to the video
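
The decoding side can be sketched as adding pseudo-random noise whose strength depends on brightness. This is a simplified illustration: the Gaussian noise, the strength function, and the names are my own assumptions, whereas AV1 generates grain from an autoregressive model and a signaled scaling function. Using a transmitted seed is what lets the result be reproducible.

```python
import random


def example_strength(luma):
    """Hypothetical scaling: strongest grain in mid-tones, none at black."""
    return 4.0 * (1.0 - abs(luma - 128) / 128)


def synthesize_grain(decoded, strength_for_luma, seed):
    """Add reproducible pseudo-random grain to a decoded 8-bit luma plane."""
    rng = random.Random(seed)  # same seed gives the same grain pattern
    out = []
    for row in decoded:
        out_row = []
        for p in row:
            noise = rng.gauss(0.0, 1.0) * strength_for_luma(p)
            out_row.append(max(0, min(255, round(p + noise))))
        out.append(out_row)
    return out


decoded = [[0, 128], [128, 0]]
grainy = synthesize_grain(decoded, example_strength, seed=7)
grainy_again = synthesize_grain(decoded, example_strength, seed=7)
```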

Advantages

Reduces bit rate
Preserves the visual effect of film grain even at low bit rates
Grain intensity can be adjusted for different devices or displays

Parameters

Grain intensity
Grain size
Grain distribution according to color channels
Grain distribution according to brightness levels

Application Process

Analysis
Encoding
Decoding

Compatibility and Control

Old devices or non-supporting players can ignore this feature
Content creators can control film grain synthesis with encoder settings

Limitations

It can be difficult to fully capture complex or dynamic film grain patterns
Manual adjustments may be needed for high-quality, professional content

Loop Restoration Filter

The Loop Restoration Filter is one of AV1’s most important tools: an advanced filtering mechanism designed to improve video quality and reduce coding artifacts. It is applied as the last of the in-loop filters and significantly improves image quality, especially at low bit rates.

Purpose of the Loop Restoration Filter

The main purposes of the Loop Restoration Filter are:

Reduce coding artifacts
Remove noise while preserving edge details
Improve overall image quality
Provide high-quality video streaming even at low bit rates

Working Principle of the Loop Restoration Filter

AV1’s Loop Restoration Filter consists of two main components:

Wiener Filter
Self-guided Filter

Wiener Filter

The Wiener filter aims to reduce noise in the image using a statistical approach. This filter works as follows:

Analyzes local statistics in the image.
Calculates optimal filter coefficients based on these statistics.
Filters the image using the calculated coefficients.

The Wiener filter is particularly effective in reducing noise in smooth areas.

Self-guided Filter

The self-guided filter performs filtering using the structure of the image itself. The working principle of this filter is as follows:

Uses the image itself as a guide.
Smooths out flat areas while preserving edge structures.
Adjusts the filtering intensity by analyzing pixel values in local regions.

The self-guided filter is particularly effective in reducing artifacts while preserving edge details.

Application of the Loop Restoration Filter

The Loop Restoration Filter is applied in the final stages of the coding process:

The video frame is encoded and decoded.
The deblocking filter is applied.
CDEF (Constrained Directional Enhancement Filter) is applied.
Finally, the Loop Restoration Filter is applied.

The filter divides the frame into restoration units (64x64, 128x128, or 256x256 pixels) and selects the most suitable filtering method for each unit:

Wiener filter
Self-guided filter
No filtering (if filtering doesn’t provide benefit)
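
The per-unit choice can be sketched as an encoder-side search: try each filter on the decoded samples, compare against the original, and keep whichever gives the lowest error. The 3-tap smoothing filter below is a hypothetical stand-in for the Wiener and self-guided filters, and a real encoder would also account for the bits needed to signal the choice.

```python
def sse(a, b):
    """Sum of squared errors between two sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smooth3(samples):
    """Hypothetical stand-in filter: 3-tap average with edge clamping."""
    n = len(samples)
    return [(samples[max(i - 1, 0)] + samples[i] + samples[min(i + 1, n - 1)]) / 3
            for i in range(n)]


def choose_restoration(source, decoded, filters):
    """Return the name of the filter that best restores this unit."""
    best_name, best_err = "none", sse(source, decoded)
    for name, apply_filter in filters.items():
        err = sse(source, apply_filter(decoded))
        if err < best_err:
            best_name, best_err = name, err
    return best_name


filters = {"smooth": smooth3}
noisy_choice = choose_restoration([100] * 6, [98, 102, 98, 102, 98, 102], filters)
clean_choice = choose_restoration([0, 0, 0, 255, 255, 255],
                                  [0, 0, 0, 255, 255, 255], filters)
```

The noisy flat unit benefits from filtering, while the unit with a clean sharp edge is best left alone.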

Advantages of the Loop Restoration Filter

Adaptive Structure: Can use different filtering techniques in different regions of the image.
High Quality: Significantly improves image quality even at low bit rates.
Edge Preservation: Reduces noise and artifacts while preserving edge details.
Efficiency: Increases coding efficiency, providing high quality at lower bit rates.

Loop Restoration Filter vs. Other Codecs

AV1’s Loop Restoration Filter serves a similar function to H.265/HEVC’s Sample Adaptive Offset (SAO) filter, but is more advanced and adaptive. VP9 does not have a similar mechanism.

Performance Impact

The application of the Loop Restoration Filter provides a significant quality increase, especially at low bit rates. Typically:

0.5 - 1.5 dB PSNR increase
Noticeable improvement in subjective image quality
5-15% increase in coding efficiency
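
For readers unfamiliar with the metric, PSNR is computed from the mean squared error between the original and the decoded frame. Here is a minimal sketch for 8-bit samples:

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two sample lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak * peak / mse)


reference = [100, 120, 140, 160]
decoded = [101, 119, 141, 159]  # every sample is off by one, so MSE = 1
quality_db = psnr(reference, decoded)
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.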

AV1’s Loop Restoration Filter is an important feature that demonstrates the complexity and effectiveness of modern video codecs. This filter provides a great advantage, especially for platforms that distribute video over the internet, by providing high-quality video streaming even at low bit rates. Similar and more advanced filtering techniques can be expected to be used in future codec developments.

CDEF (Constrained Directional Enhancement Filter)

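CDEF is an in-loop filter applied after deblocking. It removes ringing artifacts by filtering along the dominant edge direction it detects in each 8x8 block, which reduces noise while preserving edge details.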

Parallel Processing

AV1 is designed to work efficiently on multi-core processors:

Tile-based parallelism
Row-based multi-threading
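
Tile-based parallelism can be sketched as splitting the frame into independent rectangles and handing each one to a worker. The encode_tile function below is a placeholder that only reports the tile size, and the even split ignores the fact that real tile boundaries are aligned to superblocks.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(width, height, tile_cols, tile_rows):
    """Return (x, y, w, h) rectangles that cover the frame exactly once."""
    tiles = []
    for r in range(tile_rows):
        y0, y1 = height * r // tile_rows, height * (r + 1) // tile_rows
        for c in range(tile_cols):
            x0, x1 = width * c // tile_cols, width * (c + 1) // tile_cols
            tiles.append((x0, y0, x1 - x0, y1 - y0))
    return tiles


def encode_tile(tile):
    """Placeholder for real per-tile encoding work."""
    x, y, w, h = tile
    return {"origin": (x, y), "pixels": w * h}


def encode_frame_parallel(width, height, tile_cols=2, tile_rows=2):
    """Process all tiles concurrently and return results in tile order."""
    tiles = split_into_tiles(width, height, tile_cols, tile_rows)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(encode_tile, tiles))


results = encode_frame_parallel(1920, 1080)
```

Since tiles do not depend on each other, the same structure works for decoding. In Python, threads mainly illustrate the structure; a production encoder would use native threads.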

Performance of AV1

Reported gains vary with content, encoder, and settings. Commonly cited figures put AV1 at roughly 30% lower bit rate than H.265/HEVC and VP9 at comparable quality, and around 50% lower than H.264/AVC.

Conclusion

AV1 stands out as a strong candidate to meet future video coding needs with its high compression efficiency, open-source structure, and royalty-free license. It offers significant advantages especially for streaming platforms and large-scale content distributors.

I hope this article has provided you with useful information about AV1. In my next article, I will develop and share an AV1 converter program.