Better late than never: SVT-AV1 v2.2.x Deep Dive
SVT-AV1 v2.2.0 was released in late August, and a minor version, v2.2.1, followed to address some bugs. This blog post compares this new encoder version to the previous one through benchmarks and visual comparisons. We will quantify the new trade-offs between compression efficiency and encoding speed, so you can choose the right balance for your projects. Our metrics of choice today will be SSIMULACRA2 and XPSNR, used in conjunction with a revised methodology.
Feedback
The biggest missed opportunity of the previous SVT-AV1 deep dives was the absence of visual comparisons. Metrics are convenient for quantifying differences between encoder versions or encoding parameters, but they fail to tell you how much these differences matter to your eyes. However, making genuinely useful visual comparisons isn't an easy task. Comparing two encodes of different bitrates will bias the result against one or the other, which is not desirable. Figuring out the best way to present these comparisons and writing the appropriate scripts took me weeks. Gathering all the necessary data, crafting the comparisons and doing a double-checking pass took me another few weeks. This is why this blog post took so long to release, but I hope it will have been worth the wait! On that note, I have been uploading thousands of PNG screenshots to slow.pics and ended up rate-limited. This has prevented me from uploading the visual comparisons for presets 11 and 12, as well as some of preset 10. I'm actively trying to fix this situation, so please bear with me.
Another reason this took so long was my decision to increase both the number of video samples and the number of CRF values tested. A grand total of 3682 encodes were done for this blog post alone, over roughly two weeks during which my PC was doing nothing but encoding, 24/7. Send help.
Also, the graphs now use the harmonic mean instead of the arithmetic mean. The arithmetic mean hides deviations and outliers, whereas the harmonic mean gives low-scoring frames more weight in the final score, which adds a consistency component to the picture. That's not all, though! Consistency is key to an enjoyable viewing experience. As such, each graph now comes in an SSIMULACRA2 (Harmonic) version, an XPSNR (Harmonic) version and an SSIMULACRA2 (Standard Deviation) version, in order to closely monitor variations in consistency between presets.
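To make the difference between the two means concrete, here is a minimal Python sketch (not the actual script used for this post) that aggregates a list of per-frame scores into the three summaries above. It assumes every score is strictly positive, since the harmonic mean is undefined for negative values and SSIMULACRA2 can dip below zero on heavily damaged frames. It also assumes the population standard deviation, which may not be exactly what the real graphs use.

```python
import statistics


def summarize(scores: list[float]) -> dict[str, float]:
    """Aggregate per-frame metric scores into arithmetic mean, harmonic mean and std dev.

    Assumption: all scores are strictly positive (harmonic mean requirement).
    """
    if any(s <= 0 for s in scores):
        raise ValueError("harmonic mean requires strictly positive scores")
    return {
        "arithmetic": statistics.fmean(scores),
        # Each frame contributes 1/score, so low-scoring frames weigh more.
        "harmonic": statistics.harmonic_mean(scores),
        # Population standard deviation, used here as the consistency measure.
        "stdev": statistics.pstdev(scores),
    }


# Hypothetical per-frame SSIMULACRA2 scores: four good frames and one bad one.
frames = [80.0, 80.0, 80.0, 80.0, 20.0]
print(summarize(frames))
```

With four frames at 80 and one at 20, the arithmetic mean is 68 while the harmonic mean drops to 50: the single bad frame drags the score down much harder, which is the consistency component described above.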
Lastly, the biggest complaint I have received is my exclusive use of anime clips. The reason is pretty simple: I mostly encode anime in my free time and am personally very little interested in other types of media. However, these blog posts have grown in exposure, and I understand that most people are more concerned about the performance of encoders on live-action content or gaming clips. Thus, this new blog post includes 3 live-action clips, 2 gaming clips and 2 anime clips! See the sacrifices I'm making for y'all?
Methodology
The resources provided will include both graphs and image comparisons. Graphs offer a straightforward, objective look at efficiency across encoder parameters, using metrics as benchmarks for performance. In contrast, image comparisons display actual samples from encoded files, allowing you to assess quality firsthand. This adds a subjective dimension to the comparisons, giving you a more nuanced understanding of each preset's impact on visual quality.
The testing methodology involves relatively short video samples covering a wide range of content types, converted to the uncompressed y4m file format for ease of use. These lossless files are fed directly to SvtAv1EncApp, which means the performance of a single encoder instance is what's being measured here. A more serious AV1 encoding pipeline should probably leverage a chunked encoding approach, especially on higher core count systems. Once an encode is done, SSIMULACRA2 scores are calculated using the Zig implementation and XPSNR scores using an FFmpeg filter; the data is then aggregated into a final Harmonic or Standard Deviation score to create the graphs for this benchmark. The Constant Rate Factor (CRF) is plotted against encoding time, and metric scores against encode size (bitrate). The former represents efficiency in terms of the speed achieved at a given quality target. For the latter, Bits Per Pixel (BPP) values are calculated so that the Metric / BPP graphs represent compression efficiency, normalized by resolution.
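As a rough illustration of how BPP normalizes for resolution, the sketch below assumes the common definition (total encoded bits divided by width × height × frame count) and reads width, height and frame rate from a y4m stream header. The helper names and the example numbers are hypothetical; the frame count is passed in explicitly rather than counted from the file.

```python
from fractions import Fraction


def parse_y4m_header(line: str) -> tuple[int, int, Fraction]:
    """Return (width, height, fps) from a YUV4MPEG2 stream header line."""
    tokens = line.split()
    if not tokens or tokens[0] != "YUV4MPEG2":
        raise ValueError("not a y4m header")
    width = height = fps = None
    for tok in tokens[1:]:
        if tok[0] == "W":
            width = int(tok[1:])
        elif tok[0] == "H":
            height = int(tok[1:])
        elif tok[0] == "F":  # frame rate stored as a ratio, e.g. F24000:1001
            num, den = tok[1:].split(":")
            fps = Fraction(int(num), int(den))
    if width is None or height is None or fps is None:
        raise ValueError("header is missing W, H or F")
    return width, height, fps


def bits_per_pixel(size_bytes: int, width: int, height: int, frames: int) -> float:
    """Average number of compressed bits spent on each pixel of each frame."""
    return size_bytes * 8 / (width * height * frames)


# Hypothetical example: a 1280x720 50fps clip of 300 frames, encoded to 2,764,800 bytes.
w, h, fps = parse_y4m_header("YUV4MPEG2 W1280 H720 F50:1 Ip A1:1 C420jpeg")
bpp = bits_per_pixel(2_764_800, w, h, frames=300)
```

Because the denominator scales with width × height, a 4K clip and a 720p clip at the same BPP spend the same number of bits per pixel, which is what lets clips of different resolutions share one graph.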
How to read the graphs: for the compression efficiency graphs, the closer to the top left, the better. For the encoding speed graphs, the closer to the left, the faster. For the standard deviation graphs, the closer to the bottom left, the better.
The clips used in this test were acquired legally. The Codec Wiki and its contributors do not endorse media piracy.
SvtAv1EncApp was compiled directly from the v2.1.2 and v2.2.1 source code using Clang 18.1.8 and the provided Build/linux/build.sh script with the following command: build.sh cc=clang cxx=clang++ jobs=$(nproc) enable-lto static native release. The testing machine consists of an i3 12100 in its stock configuration, with 2x8GB of 3200MHz CL14 DDR4 RAM, running Arch Linux with kernel 6.9.12 and the performance governor enabled. All encodes were made in the same session without rebooting.
I want to give a disclaimer concerning encoding speeds. Unlike the efficiency results, which should be reproducible regardless of the machine, measuring speed is a difficult endeavor with a greater risk of error. The performance numbers I mention may differ for you depending on your hardware configuration.
This testing was conducted within the AV1 Weeb Edition Discord server, which is focused on encoding animated content in AV1.
Samples & Encoding Settings
The samples are as follows:
- 17s Avatar The Way Of Water (trailer 3) clip sourced from thedigitaltheater.com (1916x804p, 23.976fps).
- 6s Ducks Take Off clip sourced from xiph.org (1280x720p, 50fps).
- 3s Fallout 4 clip sourced from another encoder fellow (1920x1080p, 60fps).
- 8s Minecraft clip sourced from xiph.org (1920x1080p, 60fps).
- 8s Sol Levante HDR clip sourced from opencontent.netflix.com (3840x2160p, 24fps). This one is pretty instructive, as SVT-AV1's behavior isn't influenced by the presence (or absence) of HDR metadata in a source.
- 21s Suzume (trailer 2) clip sourced from thedigitaltheater.com (1920x808p, 23.976fps).
- 13s The Mandalorian (trailer 2) clip sourced from thedigitaltheater.com (1920x800p, 23.976fps).
All clips have been encoded across a wide quality range, from --crf 10 to --crf 50, in increments of 2, with the exception of preset -1, which uses increments of 4. --preset X --hierarchical-levels 4 are the only parameters used here, in addition to the CRF values. Fellow SVT-AV1-PSY developers asked me to use --hierarchical-levels 4 to force smaller mini-GOPs, which are more appropriate for testing.
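The sweep described above can be sketched as a small Python generator of SvtAv1EncApp command lines. It only builds the argument lists without running anything (each one could be handed to subprocess.run). The binary names, clip stems and output file names are hypothetical placeholders, and presets 6 and 13 are skipped on the assumption that their mapping to presets 7 and 12 would only produce duplicates. With 13 presets, 21 CRF values each (11 for preset -1), 7 clips and 2 encoder versions, the count comes out to 3682, matching the total mentioned earlier.

```python
# Presets actually encoded; 6 and 13 are omitted because they map to 7 and 12.
PRESETS = [-1, 0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12]
CLIPS = ["avatar", "ducks", "fallout4", "minecraft", "sol_levante", "suzume", "mandalorian"]  # hypothetical file stems
VERSIONS = ["v2.1.2", "v2.2.1"]


def crf_values(preset: int) -> list[int]:
    """CRF 10 to 50 in steps of 2, or steps of 4 for the very slow preset -1."""
    step = 4 if preset == -1 else 2
    return list(range(10, 51, step))


def encode_commands(clip: str, version: str) -> list[list[str]]:
    """Build one SvtAv1EncApp argument list per (preset, CRF) pair for a clip."""
    cmds = []
    for preset in PRESETS:
        for crf in crf_values(preset):
            cmds.append([
                f"SvtAv1EncApp-{version}",  # hypothetical per-version binary name
                "-i", f"{clip}.y4m",
                "-b", f"{clip}_{version}_p{preset}_crf{crf}.ivf",
                "--preset", str(preset),
                "--crf", str(crf),
                "--hierarchical-levels", "4",
            ])
    return cmds


total = sum(len(encode_commands(c, v)) for c in CLIPS for v in VERSIONS)
```

Everything not passed on the command line is left at the SVT-AV1 defaults, which is why the argument lists stay this short.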
Otherwise, the SVT-AV1 defaults were used. The ones worth mentioning are:
- --tune 1: tune PSNR
- --aq-mode 2: deltaq pred efficiency
- --enable-qm 0: quantisation matrices disabled
- --irefresh-type 2: closed GOP
- --enable-tf 1: temporal filtering enabled
And more, like CDEF and restoration enabled, overlays and film grain disabled...
Visual comparisons
Throughout this blog post, you’ll find slow.pics links that provide various visual comparisons between presets.
- The “full” links offer comparisons across the entire quality range for each source.
- The HQ (High Quality), MQ (Medium Quality) and LQ (Low Quality) links showcase more targeted comparisons. These have been carefully handcrafted to be as size-normalized as possible, given the available encodes, since a fair comparison requires encodes with minimal bitrate deviation.
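The HQ, MQ and LQ picks were handcrafted, but the underlying idea, choosing for each preset the CRF whose encode size is closest to a reference encode, can be sketched as follows. This is a hypothetical helper with made-up file sizes, not the process actually used for the comparisons.

```python
def closest_encodes(
    sizes: dict[int, dict[int, int]], reference_preset: int, reference_crf: int
) -> dict[int, tuple[int, float]]:
    """For every preset, pick the CRF whose file size is closest to a reference encode.

    sizes maps preset -> {crf: size_in_bytes}.
    Returns preset -> (chosen crf, relative size deviation from the reference).
    """
    target = sizes[reference_preset][reference_crf]
    picks = {}
    for preset, by_crf in sizes.items():
        crf, size = min(by_crf.items(), key=lambda item: abs(item[1] - target))
        picks[preset] = (crf, (size - target) / target)
    return picks


# Hypothetical sizes (bytes) for two presets of the same clip.
sizes = {
    4: {20: 1_000_000, 22: 900_000, 24: 810_000},
    8: {20: 1_150_000, 22: 1_020_000, 24: 930_000},
}
picks = closest_encodes(sizes, reference_preset=4, reference_crf=22)
```

Here preset 8 is matched at CRF 24 rather than CRF 22, since the faster preset needs a higher CRF to land near the same size; the leftover deviation (about 3.3%) is the kind of gap worth keeping in mind when comparing screenshots.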
Feel free to double-check the bitrate of each frame or scene to make a more informed observation, keeping the size difference in mind when comparing the encodes.
Use the arrow keys and numpad to navigate between screenshots. Alternatively, you can click on "Slider comparison" and select two sources if you prefer comparing this way.
Without further ado, let's start with the first comparisons!
Preset comparisons (-1 -> 12)
In the following graphs, you will find comparisons between all SVT-AV1 presets, ranging from the slowest, --preset -1, to the fastest, --preset 12.
Just like in v2.1.x, presets 6 and 13 do not exist in v2.2.x and are instead mapped to presets 7 and 12, respectively.
Efficiency
- First of all, the complete efficiency graphs:
You may notice something odd going on with the Avatar results using XPSNR. I have tried to understand the cause, without success. For the remainder of this blog post, the Avatar XPSNR results will be omitted. I will continue investigating and aim to have a workaround in place for next time.
Anyway, these graphs may be impressive, but they are difficult to read. So let's analyse different quality targets.
- The same graphs but focusing on the "high quality" range (CRF10 -> 22):
- Same, but now focusing on the "medium quality" range (CRF24 -> 36):
- And lastly, focusing on the "low quality" range (CRF38 -> 50):
- If we now focus on presets 4 and below, where it's more difficult to discern the differences between presets, we get this at "high quality":
- This at "medium quality":
- And the following at "low quality":
Speed
- Let's now compare the speed of all presets:
Unusable, right?
- Then, here is what it looks like with a logarithmic scale:
Interpretation
Once again, presets 2 through 4 appear to remain the most balanced presets all-around in an efficient encoding scenario: preset 3 offers little improvement over preset 4 in average scores but nicely improves consistency instead, and preset 2 offers a nice efficiency and consistency uplift on top.
In this release again, the quality gap between preset 2 and preset 1 is pretty narrow, while the speed penalty per preset step keeps increasing from preset 1 onward, ending up close to ~2x. In comparison, the penalty of going from preset 3 to preset 2 is closer to ~1.5x. As such, preset 1 is entering placebo territory, and spending precious encoding resources on preset 0 and preset -1 is usually not recommended. This especially applies at medium to high quality; at extremely low quality targets, like the CRF40-50 range, we can still see appreciable gains from these placebo presets in some clips.
As for the faster presets, presets 5 to 10 usually cluster together, both on the graphs focusing on average scores and on those focusing on consistency. They stand apart from their slower counterparts, but only by a bit, and preset 10 can come worryingly close to preset 11 on some occasions. They are all viable for your real-time needs. The rule is the same as usual: go as slow as you can bear while still achieving your goal!
Presets 11 and 12 are especially inefficient and inconsistent, and should be avoided at all costs. If possible, forget they even exist, as it's probably better to use a comparably fast (or faster) competing codec. They could still be of use in a convex-hull scenario, but for real-time transcoding, you will be better off with a hardware solution like the encoders found in RTX 4000-series or Arc GPUs.
TLDR
The same conclusions as in the previous blog posts can be drawn: clear quality gains can be observed as we lower the preset, down to preset 2; however, the benefit of dropping presets becomes noticeably smaller as the quality target increases.
SVT-AV1 v2.1.x vs v2.2.x preset comparisons:
In this section, we’ll examine the efficiency and speed differences across presets when upgrading from SVT-AV1 2.1.x to 2.2.x. This comparison should bring a new level of nuance to our results, highlighting both incremental improvements and any notable shifts in performance.
SVT-AV1 v2.1.x brought some nice improvements over v2.0.0, but does v2.2.x bring appreciable improvements to the preset trade-offs this time around as well? Let's find out!
preset -1: v2.1.x vs v2.2.x
- Let's start things off with the battle of the placebos, with compression efficiency & consistency at "high to medium-ish quality" (CRF10 -> 30):
- Along with compression efficiency & consistency at "medium-ish to low quality" (CRF34 -> 50):
Basically no changes at all, except a slight regression on Minecraft.
- What about their speed, though?
Well, preset -1 became roughly 15 to 25% faster. Not bad at all!
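A figure like "15 to 25% faster" can be read two ways: as a throughput gain (old time / new time - 1) or as a reduction in encoding time (1 - new time / old time). The post does not state which convention it uses, so the sketch below, with hypothetical timings, simply shows how far apart the two readings are.

```python
def speed_change(old_seconds: float, new_seconds: float) -> tuple[float, float]:
    """Return (throughput gain, time saved), both as fractions of 1."""
    throughput_gain = old_seconds / new_seconds - 1  # how many more encodes per unit of time
    time_saved = 1 - new_seconds / old_seconds  # how much shorter each encode is
    return throughput_gain, time_saved


# Hypothetical timings: an encode dropping from 100 s to 80 s.
gain, saved = speed_change(100.0, 80.0)
```

The same encode is 25% faster by the first reading but only takes 20% less time by the second, so the gap between the two conventions grows with the size of the speedup.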
- Preset -1 visual comparisons:
Avatar (full), Avatar (HQ), Avatar (LQ)
Ducks (full), Ducks (HQ), Ducks (LQ)
Fallout (full), Fallout (HQ), Fallout (LQ)
Minecraft (full), Minecraft (HQ), Minecraft (LQ)
Sol Levante (full), Sol Levante (HQ), Sol Levante (LQ)
Suzume (full), Suzume (HQ), Suzume (LQ)
The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (LQ)
preset 0: v2.1.x vs v2.2.x
- Compression efficiency & consistency graphs, high quality range:
- Compression efficiency & consistency graphs, medium quality range:
- Compression efficiency & consistency graphs, low quality range:
Efficiency-wise, the new preset 0 is close to unchanged from the old one, but its consistency improved slightly in a few clips at high quality and decreased in one clip at low quality.
- Speed graphs:
Preset 0's speed sees an improvement of about 20% at best. Overall, preset 0 got a proper upgrade!
- Preset 0 visual comparisons:
Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)
Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)
Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)
Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)
Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)
Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)
The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)
preset 1: v2.1.x vs v2.2.x
- Compression efficiency & consistency graphs, high quality range:
- Compression efficiency & consistency graphs, medium quality range:
- Compression efficiency & consistency graphs, low quality range:
Preset 1 is mostly unchanged but sees another slight regression in Minecraft at high quality.
- Speed graphs:
Depending on the clip, speed is mostly unchanged or ever so slightly improved. Preset 1 is a bit stagnant this release.
- Preset 1 visual comparisons:
Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)
Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)
Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)
Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)
Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)
Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)
The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)
preset 2: v2.1.x vs v2.2.x
- Compression efficiency & consistency graphs, high quality range:
- Compression efficiency & consistency graphs, medium quality range:
- Compression efficiency & consistency graphs, low quality range:
At high quality, preset 2's efficiency regressed on some clips, improved on others and stayed the same on the rest. Except in one clip, consistency seems to have improved all around. At low to medium quality targets, efficiency is mostly unchanged, and so is consistency. Not exactly noteworthy.
- Speed graphs:
Speed was improved by about 10-20%. Not a bad showcase, for sure.
- Preset 2 visual comparisons:
Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)
Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)
Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)
Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)
Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)
Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)
The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)
preset 3: v2.1.x vs v2.2.x
- Compression efficiency & consistency graphs, high quality range:
- Compression efficiency & consistency graphs, medium quality range:
- Compression efficiency & consistency graphs, low quality range:
Practically, it's a wash.
- Speed graphs:
Still, I'm happy to report that preset 3 got slightly faster!
- Preset 3 visual comparisons:
Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)
Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)
Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)
Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)
Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)
Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)
The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)
preset 4: v2.1.x vs v2.2.x
- Compression efficiency & consistency graphs, high quality range:
- Compression efficiency & consistency graphs, medium quality range:
- Compression efficiency & consistency graphs, low quality range:
Preset 4 sees a consistent though small improvement in average scores and standard deviation across the entire quality range on basically all clips.
- Speed graphs:
Unfortunately, it got slower as a result. If you remember, v2.1.0 did the exact opposite over v2.0.0, so I wonder if preset 4 simply took back the place it previously had...
- Preset 4 visual comparisons:
Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)
Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)
Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)
Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)
Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)
Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)
The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)
preset 5: v2.1.x vs v2.2.x
- Compression efficiency & consistency graphs, high quality range:
- Compression efficiency & consistency graphs, medium quality range:
- Compression efficiency & consistency graphs, low quality range:
Both metrics say preset 5 regressed slightly to moderately, though surprisingly its consistency is basically unchanged.
- Speed graphs:
In exchange for this regression, preset 5 got an impressive speedup of up to 25%.
- Preset 5 visual comparisons:
Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)
Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)
Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)
Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)
Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)
Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)
The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)
preset 6: v2.1.x vs v2.2.x
Preset 6 is mapped to preset 7 in v2.2.x.
preset 7: v2.1.x vs v2.2.x
- Compression efficiency & consistency graphs, high quality range:
- Compression efficiency & consistency graphs, medium quality range:
- Compression efficiency & consistency graphs, low quality range:
Preset 7 is close to unchanged in v2.2.x.
- Speed graphs:
It still got some slight to moderate speedups, though, which is appreciated.
- Preset 7 visual comparisons:
Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)
Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)
Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)
Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)
Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)
Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)
The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)
preset 8: v2.1.x vs v2.2.x
- Compression efficiency & consistency graphs, high quality range:
- Compression efficiency & consistency graphs, medium quality range:
- Compression efficiency & consistency graphs, low quality range:
The efficiency and consistency of preset 8 have improved at high quality.
- Speed graphs:
We can also observe a speed increase of around 10%, with some wild speed deviations in Sol Levante.
- Preset 8 visual comparisons:
Avatar (full), Avatar (HQ), Avatar (MQ), Avatar (LQ)
Ducks (full), Ducks (HQ), Ducks (MQ), Ducks (LQ)
Fallout (full), Fallout (HQ), Fallout (MQ), Fallout (LQ)
Minecraft (full), Minecraft (HQ), Minecraft (MQ), Minecraft (LQ)
Sol Levante (full), Sol Levante (HQ), Sol Levante (MQ), Sol Levante (LQ)
Suzume (full), Suzume (HQ), Suzume (MQ), Suzume (LQ)
The Mandalorian (full), The Mandalorian (HQ), The Mandalorian (MQ), The Mandalorian (LQ)
preset 9: v2.1.x vs v2.2.x
- Compression efficiency & consistency graphs, high quality range:
- Compression efficiency & consistency graphs, medium quality range: