Better late than never: SVT-AV1 v2.2.x Deep Dive

November 14, 2024 · 160 min read

Encoder

SVT-AV1 v2.2.0 was released in late August, and a minor version, v2.2.1, followed suit to address some bugs. This blog post will focus on comparing this new encoder version to the last, on the basis of benchmarks and visual comparisons. We will quantify the new trade-offs between compression efficiency and encoding speed, so you can choose the right balance for your projects. Our metrics of choice today will be SSIMULACRA2 and XPSNR, used in conjunction with a revised methodology.

Feedback

The biggest missed opportunity of the previous SVT-AV1 deep dives was the absence of visual comparisons. Indeed, metrics may be convenient for easily quantifying differences between encoder versions or encoding parameters, but they fail to convey how much these differences matter to your eyes. However, making properly useful visual comparisons isn't an easy task. Comparing two encodes of differing bitrates will bias the result in favor of one or the other, which is not desirable. Figuring out the best way to present these comparisons and writing the appropriate scripts took me weeks. Gathering all the necessary data, crafting the comparisons and doing a double-checking pass took me another few weeks. These reasons explain why this blog post took so long to release, but I hope it will have been worth the wait! On that note though, I have been uploading thousands of PNG screenshots to slow.pics and I ended up rate-limited. This has prevented me from uploading the visual comparisons for presets 11 and 12, and some of those for preset 10. I'm actively trying to fix this situation, so please bear with me.

Another reason for this taking so long was my decision to increase the number of video samples while also increasing the number of CRF values tested. A grand total of 3682 encodes were done for this blog post alone, in the span of around two weeks, during which my PC was exclusively encoding 24/7. Send help.

Also, the graphs will now be using the harmonic mean instead of the arithmetic mean. Indeed, arithmetic mean scores fail to account for deviations and outliers. Using the harmonic mean gives low-scoring frames more weight in the final score, which adds a consistency component to the picture. That's not all though! Consistency is crucial to an enjoyable viewing experience. As such, each graph now has an SSIMULACRA2 (Harmonic) version, an XPSNR (Harmonic) version and an SSIMULACRA2 (Standard Deviation) version, in order to closely monitor variations in consistency between presets.
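To make the difference between the two means concrete, here is a minimal sketch with made-up per-frame scores. It is not the aggregation script used for this post: the choice of population standard deviation and the rejection of non-positive scores are assumptions made for the example (the harmonic mean is only defined for strictly positive values).

```python
import math


def arithmetic_mean(scores):
    """Plain average of the per-frame scores."""
    return sum(scores) / len(scores)


def harmonic_mean(scores):
    """Harmonic mean; only defined here for strictly positive scores."""
    if any(s <= 0 for s in scores):
        raise ValueError("harmonic mean needs strictly positive scores")
    return len(scores) / sum(1.0 / s for s in scores)


def standard_deviation(scores):
    """Population standard deviation of the per-frame scores (assumed variant)."""
    mean = arithmetic_mean(scores)
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))


# Two hypothetical clips with the same arithmetic mean (80).
steady = [80.0, 80.0, 80.0, 80.0]
uneven = [95.0, 95.0, 95.0, 35.0]  # one badly encoded frame

for name, scores in (("steady", steady), ("uneven", uneven)):
    print(
        name,
        round(arithmetic_mean(scores), 2),
        round(harmonic_mean(scores), 2),
        round(standard_deviation(scores), 2),
    )
```

Both clips average 80 arithmetically, but the single bad frame pulls the harmonic mean of the uneven clip down to 66.5 and shows up as a large standard deviation.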

Lastly, the biggest complaint I have received is about me exclusively using anime clips. The reason for that is pretty simple: I mostly encode anime content in my free time and have very little interest in other types of media myself. However, these blog posts have grown in exposure and I understand that a majority of people are more concerned about the performance of encoders on live action content or gaming clips. Thus, this new blog post comprises 3 live action clips, 2 gaming clips and 2 anime clips! See the sacrifices I'm making for y'all?

Methodology

The resources provided will include both graphs and image comparisons. Graphs offer a straightforward, objective look at efficiency across encoder parameters, using metrics as benchmarks for performance. In contrast, image comparisons display actual samples from encoded files, allowing you to assess quality firsthand. This adds a subjective dimension to the comparisons, giving you a more nuanced understanding of each preset's impact on visual quality.

The testing methodology involves using relatively short video samples covering a wide range of content types, decoded to the uncompressed y4m file format for ease of use. These lossless files are fed directly to SvtAv1EncApp, so what's being measured here is the performance of a single encoder instance. A more serious AV1 encoding pipeline should probably leverage a chunked encoding approach, especially on higher core count systems. Once an encode is done, SSIMULACRA2 scores are calculated using the Zig implementation, while XPSNR scores are calculated using an ffmpeg filter. The data is then aggregated into a final Harmonic or Standard Deviation score to create the graphs for this benchmark. The Constant Rate Factor (CRF) is plotted against encoding time, and the metric scores against encode size (bitrate). The former represents efficiency as the speed achieved at a certain quality target. For the latter, Bits Per Pixel (BPP) values are calculated so that the Metric / BPP graphs represent compression efficiency, normalized by resolution.
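As a rough illustration of the BPP normalization, here is a small sketch. It assumes BPP is defined as the encoded size in bits divided by width × height × frame count; whether the actual scripts also subtract container overhead or count chroma samples is not something this example claims to know.

```python
def bits_per_pixel(size_bytes, width, height, frame_count):
    """Encoded size in bits divided by the total number of pixels in the clip."""
    total_bits = size_bytes * 8
    total_pixels = width * height * frame_count
    return total_bits / total_pixels


# Hypothetical example: the same 2 MB file size on a 1080p clip and a 2160p clip,
# both 192 frames long. The 2160p clip has 4x the pixels, so a quarter of the BPP.
bpp_1080p = bits_per_pixel(2_000_000, 1920, 1080, 192)
bpp_2160p = bits_per_pixel(2_000_000, 3840, 2160, 192)
print(bpp_1080p, bpp_2160p, bpp_1080p / bpp_2160p)
```

This is why clips of different resolutions can share the same x-axis on the Metric / BPP graphs.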

How to read the graphs? For the compression efficiency ones, the closer to the top left the better. For the encoding speed ones, the closer to the left the faster. For the standard deviation ones, the closer to the bottom left the better.

The clips used in this test were acquired legally. The Codec Wiki and its contributors do not endorse media piracy.

SvtAv1EncApp was compiled directly from the v2.1.2 and v2.2.1 source code using Clang 18.1.8 and the provided Build/linux/build.sh script with the following command: build.sh cc=clang cxx=clang++ jobs=$(nproc) enable-lto static native release. The testing machine is comprised of an i3 12100 in its stock configuration, with 2x8GB of 3200MHz CL14 DDR4 RAM, in Arch Linux with kernel 6.9.12 and the performance governor enabled. All encodes have been made in the same session without rebooting.

I want to give a disclaimer concerning encoding speeds. Contrary to the efficiency results, which should be reproducible independently of the machine, measuring speed is a pretty difficult endeavor, with an increased risk of errors. The performance numbers I mention may differ for you depending on the hardware configuration at hand.
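If you want to collect comparable numbers on your own machine, a wall-clock measurement around the encoder process is the simplest starting point. The sketch below is only an assumption of how such a measurement can be done, not the timing code behind these graphs; the workload is a stand-in Python one-liner so the example runs anywhere, and you would swap in your own encoder command.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def frames_per_second(frame_count, seconds):
    """Convert a duration into an average encoding speed."""
    return frame_count / seconds


# Stand-in workload; replace with the encoder command line you want to time.
elapsed = time_command([sys.executable, "-c", "sum(range(100000))"])
print(f"{elapsed:.3f} s")
```

Background load, thermal throttling and the CPU governor all affect the result, so repeating each measurement and keeping the machine otherwise idle is worth the effort.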

Samples & Encoding Settings

The samples are as follows:

17s Avatar The Way Of Water (trailer 3) clip sourced from thedigitaltheater.com (1916x804p, 23.976fps).
6s Ducks Take Off clip sourced from xiph.org (1280x720p, 50fps).
3s Fallout 4 clip sourced from another encoder fellow (1920x1080p, 60fps).
8s Minecraft clip sourced from xiph.org (1920x1080p, 60fps).
8s Sol Levante HDR clip sourced from opencontent.netflix.com (3840x2160p, 24fps). This one is pretty educative as SVT-AV1's behavior isn't influenced by the existence (or lack thereof) of HDR metadata in a source.
21s Suzume (trailer 2) clip sourced from thedigitaltheater.com (1920x808p, 23.976fps).
13s The Mandalorian (trailer 2) clip sourced from thedigitaltheater.com (1920x800p, 23.976fps).

All clips have been encoded in a wide quality range, from --crf 10 to --crf 50, by increments of 2, with the exception of preset -1 that uses increments of 4.

--preset X --hierarchical-levels 4 are the only parameters used here, in conjunction with the CRF values. I have been asked to use --hierarchical-levels 4 by fellow SVT-AV1-PSY developers to force smaller mini GOPs, more appropriate for testing.

Otherwise, the SVT-AV1 defaults were used. The ones worth mentioning are:

--tune 1: tune PSNR
--aq-mode 2: variance deltaq
--enable-qm 0: quantisation matrices disabled
--irefresh-type 2: closed GOP
--enable-tf 1: temporal filtering enabled

And more, like CDEF and restoration enabled, overlays and film-grain disabled...
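The encode matrix described above can be sketched as follows. The clip file stems and output names are hypothetical, and the command line is only an assumption of what an invocation could look like (using SvtAv1EncApp's -i input and -b output options); it is not the exact script used here. Under these assumptions the job count comes out to the 3682 encodes mentioned earlier.

```python
from itertools import product

# Hypothetical file stems for the seven samples.
CLIPS = ["avatar", "ducks", "fallout4", "minecraft", "sol_levante", "suzume", "mandalorian"]
VERSIONS = ["v2.1.2", "v2.2.1"]
# Presets 6 and 13 are remapped to 7 and 12, so they are not tested separately.
PRESETS = [-1, 0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12]


def crf_values(preset):
    """CRF 10 to 50 in steps of 2, or steps of 4 for preset -1."""
    step = 4 if preset == -1 else 2
    return list(range(10, 51, step))


def encode_jobs():
    """Yield every (version, clip, preset, crf) combination."""
    for version, clip, preset in product(VERSIONS, CLIPS, PRESETS):
        for crf in crf_values(preset):
            yield version, clip, preset, crf


def command_for(clip, preset, crf):
    """Build an illustrative encoder command line (output naming is made up)."""
    return [
        "SvtAv1EncApp",
        "-i", f"{clip}.y4m",
        "--preset", str(preset),
        "--crf", str(crf),
        "--hierarchical-levels", "4",
        "-b", f"{clip}_p{preset}_crf{crf}.ivf",
    ]


jobs = list(encode_jobs())
print(len(jobs))  # 2 versions x 7 clips x (12 presets x 21 CRFs + 1 preset x 11 CRFs)
print(" ".join(command_for("ducks", 4, 30)))
```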

Visual comparisons


Throughout this blog post, you’ll find slow.pics links that provide various visual comparisons between presets.

The “full” links offer comparisons across the entire quality range for each source.
The HQ (High Quality), MQ (Medium Quality), and LQ (Low Quality) links showcase more targeted comparisons. These have been carefully handcrafted to be as size-normalized as possible, given the available encodes. We want to be focusing on encodes with minimal bitrate deviation for a fair comparison.

Feel free to double-check the bitrate of each frame or scene to make a more informed observation, keeping the size difference in mind when comparing the encodes.
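The comparisons in this post were matched by hand, but the idea of size normalization can be sketched in a few lines. In this example the file sizes are made up, and the 5% tolerance is an arbitrary assumption: for each encode of one version, it picks the encode of the other version with the closest size and drops the pair if the gap is too large.

```python
def relative_size_gap(size_a, size_b):
    """Size difference as a fraction of the larger of the two files."""
    return abs(size_a - size_b) / max(size_a, size_b)


def closest_size_pairs(encodes_a, encodes_b, max_gap=0.05):
    """Pair each (crf, size) in A with the closest-sized encode in B.

    Returns (crf_a, crf_b, gap) tuples, keeping only pairs within max_gap.
    """
    pairs = []
    for crf_a, size_a in encodes_a:
        crf_b, size_b = min(encodes_b, key=lambda e: relative_size_gap(size_a, e[1]))
        gap = relative_size_gap(size_a, size_b)
        if gap <= max_gap:
            pairs.append((crf_a, crf_b, gap))
    return pairs


# Hypothetical (crf, size in bytes) lists for one clip and one preset.
old_version = [(20, 1_000_000), (30, 500_000), (40, 250_000)]
new_version = [(20, 1_100_000), (22, 980_000), (30, 520_000), (42, 200_000)]

for crf_old, crf_new, gap in closest_size_pairs(old_version, new_version):
    print(f"old CRF {crf_old} vs new CRF {crf_new}: {gap:.1%} size gap")
```

Note that the matched pair does not have to share the same CRF: what matters for a fair visual comparison is that both encodes spend a similar number of bits.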

Use the arrow keys and numpad to navigate between screenshots. Alternatively, you can click on "Slider comparison" and select two sources if you prefer comparing this way.

Without further ado, let's start with the first comparisons!

Presets comparisons (-1 -> 12)

In the following graphs, you may find comparisons between all SVT-AV1 presets, ranging from the slowest --preset -1 to the fastest --preset 12.

Just like in v2.1.x, presets 6 and 13 do not exist in v2.2.x and are instead mapped to presets 7 and 12 respectively.

Efficiency

First of all, the complete efficiency graphs:

[Graphs: SSIMU2, XPSNR, STD DEV]

You may notice something odd going on with the Avatar results using XPSNR. I have tried to understand the cause, without success. For the remainder of this blog post, the Avatar XPSNR results will be omitted. I will continue investigating and aim to have a workaround in place for next time.

Anyway, this graph may be impressive, but difficult to read. So let's analyse different quality targets.

The same graphs but focusing on the "high quality" range (CRF10 -> 22):

[Graphs: SSIMU2, XPSNR, STD DEV]

Same, but now focusing on the "medium quality" range (CRF24 -> 36):

[Graphs: SSIMU2, XPSNR, STD DEV]

And lastly, focusing on the "low quality" range (CRF38 -> 50):

[Graphs: SSIMU2, XPSNR, STD DEV]

If we now focus on presets 4 and below, where it's more difficult to discern the differences between presets, we get this at "high quality":

[Graphs: SSIMU2, XPSNR, STD DEV]

This at "medium quality":

[Graphs: SSIMU2, XPSNR, STD DEV]

And the following at "low quality":

[Graphs: SSIMU2, XPSNR, STD DEV]

Speed

Let's now compare the speed of all presets:

[Graph: Speed]

Unusable, right?

Then, here is what it looks like with a logarithmic scale:

[Graph: Speed]

Interpretation

It appears that, once again, presets 2 through 4 remain the most balanced presets all-around in an efficient encoding scenario, with preset 3 not offering much improvement over preset 4 in average scores but nicely improving on consistency instead, and preset 2 offering a nice efficiency and consistency uplift on top.

In this release again, the quality gap between preset 2 and preset 1 is pretty narrow, and the speed penalty from preset 1 onward keeps increasing, ending up close to 2x. In comparison, the penalty of going from preset 3 to preset 2 is closer to 1.5x. As such, using preset 1 is entering placebo territory, and it is usually not recommended to waste precious encoding resources on preset 0 and preset -1. This especially applies at medium to high quality, though at extremely low quality targets, like the CRF40-50 range, we can still see appreciable gains from these placebo presets in some clips.

As for the faster presets, presets 5 to 10 are usually grouped together, both on the graphs focusing on average scores and on the ones focusing on consistency. They tend to stand apart from their slower counterparts by just a bit, though preset 10 can be worryingly close to preset 11 on some occasions. They are all viable for your real-time needs. The rule is the same as usual: go the slowest you can bear that still achieves your goal!

Presets 11 and 12 are especially inefficient and inconsistent, and to be avoided at all costs. If possible, forget they even exist, as it's probably better to use a comparably fast (or faster) competing codec. They could still be of use in a convex-hull scenario, but in the case of realtime transcoding, you will be better off with some hardware solution like the ones found in RTX 4000 or Arc GPUs.

TLDR

The same conclusions as in the previous blog posts can be drawn: clear quality gains can be observed as we decrease presets, down to preset 2. However, the benefit of dropping presets becomes noticeably smaller as quality increases.

SVT-AV1 v2.1.x vs v2.2.x presets comparisons:

In this section, we’ll examine the efficiency and speed differences across presets when upgrading from SVT-AV1 2.1.x to 2.2.x. This comparison should bring a new level of nuance to our results, highlighting both incremental improvements and any notable shifts in performance.

SVT-AV1 v2.1.x brought some nice improvements over v2.0.0, but does v2.2.x bring appreciable improvements in the presets trade-offs this time around as well? Let's find out!

`preset -1`: v2.1.x vs v2.2.x

Let's start things off with the battle of the placebos, with the compression efficiency & consistency at "high to medium-ish quality" (CRF10 -> 30):

[Graphs: SSIMU2, XPSNR, STD DEV]

Along with the compression efficiency & consistency at "medium-ish to low quality" (CRF34 -> 50):

[Graphs: SSIMU2, XPSNR, STD DEV]

Basically no changes at all, except a slight regression on Minecraft.

What about their speeds though?

[Graph: Speed]

Well preset -1 basically became 15 to 25% faster, not bad at all!

Preset -1 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (LQ)

`preset 0`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Efficiency wise, this new preset 0 is close to unchanged from the old preset 0, but its consistency improved slightly in a few clips at high quality and decreased in one clip at low quality.

Speed graphs:

[Graph: Speed]

Preset 0's speed sees an improvement of about 20% at best. Overall, preset 0 got a proper upgrade!

Preset 0 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)

`preset 1`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Preset 1 is mostly unchanged but sees another slight regression in Minecraft at high quality.

Speed graphs:

[Graph: Speed]

Depending on the clip, speed is mostly unchanged or ever so slightly improved. Preset 1 is a bit stagnant this release.

Preset 1 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)

`preset 2`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Preset 2's efficiency has regressed at high quality on some clips, improved in some and stayed the same in others. Except in one clip, consistency seems to have improved all around. At low to medium quality targets, efficiency is mostly unchanged, same for consistency. Not exactly noteworthy.

Speed graphs:

[Graph: Speed]

Speed was improved by about 10-20%. Not a bad showcase, for sure.

Preset 2 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)

`preset 3`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Practically, it's a wash.

Speed graphs:

[Graph: Speed]

Still, preset 3 got slightly faster, so I'm happy to report a speedup!

Preset 3 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)

`preset 4`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Preset 4 sees a consistent though small improvement in average scores and standard deviation across the entire quality range on basically all clips.

Speed graphs:

[Graph: Speed]

Unfortunately, it got slower as a result. If you remember, v2.1.0 did the exact opposite over v2.0.0. I wonder if preset 4 simply took back the place it previously had...

Preset 4 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)

`preset 5`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Both metrics say preset 5 regressed slightly to moderately, though surprisingly its consistency is basically unchanged.

Speed graphs:

[Graph: Speed]

The result of this regression is an impressive speedup of up to 25%.

Preset 5 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)

`preset 6`: v2.1.x vs v2.2.x

Preset 6 is mapped to preset 7 in v2.2.x.

`preset 7`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Preset 7 is close to unchanged in v2.2.x.

Speed graphs:

[Graph: Speed]

It still got some slight to moderate speedups though, which can be appreciated.

Preset 7 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)

`preset 8`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

The efficiency and consistency of preset 8 have improved at high quality.

Speed graphs:

[Graph: Speed]

And we can observe a speed increase of around 10%. Some crazy speed deviations can be noticed in Sol Levante.

Preset 8 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)

`preset 9`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Efficiency and consistency stayed mostly the same.

Speed graphs:

[Graph: Speed]

Speed improved by a few percent at most. Preset 9 has stagnated over v2.1.x.

Preset 9 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)

`preset 10`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Preset 10 received the most efficiency and consistency improvements out of all presets in this release. What will be the cost of such drastic change though?

Speed graphs:

[Graph: Speed]

Well, not that much all things considered! Preset 10's speed did decrease by anywhere from barely anything to 20% in the most extreme situation, but its improvements far outshine its speed regression. Overall, preset 10 went from borderline unusable to becoming an interesting new fast real-time preset. This is pretty huge in my opinion, as it offers a new kind of trade-off that no other AV1 encoder or prior SVT-AV1 version did.

Preset 10 visual comparisons:

Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)

Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)

Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)

Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)

Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)

Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)

The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)

`preset 11`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

The new preset 11 places itself between the old preset 11 and the old preset 12 efficiency and consistency wise.

Speed graphs:

[Graph: Speed]

Unsurprisingly, its speed is also in-between the old preset 11 and the old preset 12. I'm unsure this new trade-off helps with anything.

`preset 12`: v2.1.x vs v2.2.x

Compression efficiency & consistency graphs, high quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, medium quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Compression efficiency & consistency graphs, low quality range:

[Graphs: SSIMU2, XPSNR, STD DEV]

Preset 12 is almost unchanged from the previous release.

Speed graphs:

[Graph: Speed]

Same speed wise. So, no improvements at all for the two fastest presets. They remain all around very bad performers. In hindsight, it doesn't matter that I couldn't upload the preset 11 & 12 visual comparisons, because there is literally nothing to see.

`preset 13`: v2.1.x vs v2.2.x

Preset 13 is mapped to preset 12 in v2.2.x.

TLDR

With v2.2.x, we observed new efficiency/speed trade-offs for a good number of presets. Some presets, like -1 and 0, received significant speed improvements at no efficiency cost. Presets 3 and 7 received more modest speedups. Presets 2, 8 and 11 have seen new trade-offs that are mostly beneficial. Preset 10 was deeply revamped and replaces preset 9 in my book as the fastest, still viable, real-time preset. Preset 4 seems to have returned to the state it was in with v2.0.0. On the other hand, preset 5 seems to have regressed slightly, and presets 1, 9 and 12 are basically unchanged from v2.1.x.

Conclusion

The release of SVT-AV1 v2.2.x brings some welcome speed improvements. Presets 2 through 4 continue to lead in efficiency for AV1 encoding, delivering top-tier quality and compression. Meanwhile, presets 5 through 10 offer solid alternatives for those who find presets 2 through 4 too slow, balancing quality with noticeably faster encoding times.

Hopefully, this comprehensive third deep dive has given you a helpful starting point for choosing settings when encoding with ~~the latest~~ SVT-AV1(-PSY) v2.2.x.

Future

Once more, this testing focused on establishing the new preset dynamics; however, I haven't revisited the different SVT-AV1 parameters since v1.8.0. A few meaningful features have been added since, like variance boost, and with this overhauled methodology, the conclusions reached in that blog post may be different now. I think it will be worth revisiting this in the future, maybe in the next blog post for v2.3.0? Yes, I'm fully aware I'm late because v2.3.0 has already been out for two whole weeks. Even if the frontend of this blog post doesn't seem to have radically changed, my entire workflow has tremendously evolved since last time. It may have taken me since mid-August to complete this blog post, but my efforts should allow me to produce a follow-up faster.

I am conscious of this blog post's limitations. First of all, I observed some odd behaviors from XPSNR on certain clips which I haven't been able to pinpoint yet. It would also give me nonsensical standard deviation results, which is the reason why only the standard deviation of SSIMU2 scores was given. Second of all, SVT-AV1's own behavior starts to get messy when you approach a SSIMULACRA2 score of 0, rendering all the data in that region pretty much useless. Plus, aggregating these metric scores takes forever, so I'm looking into ways to accelerate the process, for instance by offloading the work to my GPU using turbo-metrics. Also, I'm stuck between wanting to increase the number of data points in each graph to get more detailed results and having to keep everything readable. This time around I had to separate each graph into three quality levels, but that's already too much for my liking. I will look into improving on the data presentation front. I feel like I have to streamline this formula to make it more digestible for everyone.

Please, I'm open to your remarks and suggestions to improve on this blog post formula.

That said, here are my plans for future blog posts:

a follow-up v2.3.0 article that also revisits the useful SVT-AV1 parameters. I'm planning for this to release before 2025.
an article focused on giving you encoding tips and explaining common AV1 encoding knowledge is still planned for someday.
a future article focused on observing the evolution of all software AV1 encoders since the beginning of their development, as well as comparisons with vpxenc, AVM (development ground for AV2) and VVenC.
and many more...

Thanks for reading!

Support me by making a donation on my Ko-Fi page, as a reward for my efforts and to compensate for the electricity bills of two whole weeks of non-stop encoding.
