Deep Dive into SVT-AV1's Evolution (Part 1): Presets Analysis from v2.0 to v3.0
It's been almost a year since SVT-AV1 v2.0.0 dropped in March 2024, and we finally got v3.0.0 in late February of this year. Patch releases v3.0.1 and v3.0.2 came along afterward with some bug fixes and ARM SIMD improvements, but they didn't meaningfully alter encoding results.
So what's actually different between these versions? I've been wanting to run tests across all the major releases from v2.0.0 to v3.0.x to see how the speed vs quality trade-offs have evolved this past year. Using SSIMULACRA2, Butteraugli, XPSNR, and VMAF (plus some methodology tweaks I'll get into), I'll break down what each version brought to the table. In a second part, we'll also deep dive into a few specific options that have appeared in the encoder since my first blog post, so you can figure out which ones you may want to use for your projects. That includes variance boost, fast decode, temporal filtering strength and a few others...
Feedback
Although the reception was warm, I got less feedback than usual this time around, but honestly, that hasn't stopped me from wanting more. And better! I'm keeping what seemed well-received: the diverse test clips, the visual comparisons, and my ongoing attempt to stay as objective as possible.
In my last post's conclusion, I mentioned some frustrations: XPSNR was acting up, metrics were taking forever to calculate, and there was just too much data to make sense of. Good news is I've tackled all of these. I fixed the XPSNR issue (it expected mod-8 inputs), started using the new Vship to speed up metrics calculation, and found a cleaner way to present all the data points at just two quality levels, like I used to do.
Also, it needs to be pointed out that I've been way too optimistic about how often I can get these posts out, especially when I revamp the methodology each time. I'm sorry about that, again. My ambition may also be at fault: this article grew so much that I eventually decided to split it into two parts for convenience's sake.
Methodology
You'll find both graphs and visual comparisons in this analysis. The graphs give you objective data on encoder efficiency across different settings, using various metrics. The image comparisons show the actual encoded samples so you can judge quality for yourself, adding that subjective element that numbers alone can't capture.
This time, I'm using a new tool called Metrics from the self-proclaimed Psychovisual Experts group, which provides scripts for measuring and comparing video codecs. I've heavily modified these scripts for my specific needs, but if you want to run your own tests locally, definitely check out the original Metrics toolkit!
Here's how the testing works: I use relatively short video samples covering a wide range of content types, all converted to uncompressed y4m format (if they weren't already!) for consistency. These lossless files go straight into SvtAv1EncApp, so we're measuring single-instance encoder performance here. Keep in mind that serious AV1 encoding pipelines should probably use chunked encoding (with a tool like Av1an), especially on higher core count systems.
Once encoding is done, we run multiple full-reference metrics comparing the encodes against the original source. Using several different metrics helps compensate for each one's weaknesses and gives a more complete picture of the actual visual differences between encodes. I try to steer away from the industry-standard metrics, which tend to have poor correlation with Mean Opinion Scores (MOS), and instead focus on psychovisual metrics that better represent actual visual quality. I calculate SSIMULACRA2 and Butteraugli scores using Vship, an accurate GPU-accelerated port of the Zig implementation I used before, and a much faster one at that. XPSNR and VMAF scores come from their respective ffmpeg filters, but with a twist!
If you remember, last time I started making use of Harmonic Mean for SSIMU2 to better account for inconsistent scoring behaviors within a video. We'll be doing it again today.
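To illustrate why the harmonic mean is useful here, a single bad frame drags it down far more than the arithmetic mean. A minimal sketch of my own (not the actual Metrics/Vship code; note that SSIMU2 can go negative, where the harmonic mean is undefined, so this sketch clamps scores as an assumption):

```python
def harmonic_mean(scores, floor=0.01):
    """Harmonic mean of per-frame scores.

    One low-scoring frame pulls this down far more than it would the
    arithmetic mean, which is what exposes consistency issues.
    SSIMU2 can be <= 0, where the harmonic mean breaks down, so we
    clamp to a small positive floor (an assumption of this sketch).
    """
    clamped = [max(s, floor) for s in scores]
    return len(clamped) / sum(1.0 / s for s in clamped)

# A clip with one bad frame: the arithmetic mean of these scores is
# 62.5, while the harmonic mean lands far lower.
frames = [80.0, 85.0, 75.0, 10.0]
```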
This time sees the addition of the (in)famous VMAF metric, but in a radically different form than you've probably ever seen until now. VMAF is well known to be rather unreliable, but with a few modifications to the scoring method, we can try to make it better. I'm computing scores across all three color planes (instead of just luma, as VMAF isn't chroma aware), then weighting them with this formula: `((4.0 * vmaf_y) + vmaf_u + vmaf_v) / 6.0`. This approach is inspired by the better-vmaf mod. I also chose to use VMAF's neg model and to disable the motion component (`motion.motion_force_zero=true`), since it's notorious in our niche encoding communities for inflating scores during motion and producing nonsensical results (bad-looking frames getting near-perfect scores...). I'll refer to this metric as W-VMAF in the rest of this post.
XPSNR gets similar treatment: motion component disabled (by commenting out this line) and the same weighting formula: `((4.0 * xpsnr_mse_y) + xpsnr_mse_u + xpsnr_mse_v) / 6.0`. As you may glimpse from said formula, XPSNR's dB values are first converted back to linear MSE; we then calculate the weighted average and convert back to dB. I'll refer to this metric as W-XPSNR in the rest of this post.
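Put together, the two weighted metrics can be sketched like this (my own illustration of the formulas above, not code from the actual measurement scripts; the MAX² constant of the PSNR formula cancels out through the round-trip, so it is omitted):

```python
import math

def w_vmaf(vmaf_y, vmaf_u, vmaf_v):
    """4:1:1 luma/chroma weighting, as in the better-vmaf mod."""
    return (4.0 * vmaf_y + vmaf_u + vmaf_v) / 6.0

def w_xpsnr(xpsnr_y_db, xpsnr_u_db, xpsnr_v_db):
    """Convert each plane's dB value back to relative linear MSE,
    apply the same 4:1:1 weighting, then convert back to dB."""
    mse = [10.0 ** (-db / 10.0)
           for db in (xpsnr_y_db, xpsnr_u_db, xpsnr_v_db)]
    weighted = (4.0 * mse[0] + mse[1] + mse[2]) / 6.0
    return -10.0 * math.log10(weighted)
```

Note how, on the dB side, a weak chroma plane pulls the weighted score down more than a plain average of the dB values would, since the averaging happens in linear MSE space.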
Butteraugli runs in its stock configuration, except for the intensity multiplier, which is set to 203 nits instead of the current default of 80 (based on an industry reference value, used by MPV for instance).
I believe these modifications produce more meaningful results than stock metrics, but proving that is beyond this post's scope. You'll have to trust the methodology or test it yourself. All this data gets aggregated to create the benchmark graphs.
The "speed graphs" plot Constant Rate Factor (CRF) against encoding time to show speed efficiency at different quality targets. For compression efficiency, I plot metric scores against output file size. Since Metrics doesn't use bits per pixel (BPP) and I had received feedback a few blog posts back that this metric could be confusing, I decided to drop it this time around.
To achieve more accurate efficiency curves with fewer probes, it is more effective to prioritize probes in the higher-quality (lower CRF) range of the quality spectrum, since that is where bitrate increases fastest. I used the following formula, courtesy of better-vmaf's author, to determine ten CRF values from 10 to 50 for the testing: `crfs = [min_q + (max_q - min_q) * ((step / (q_steps - 1)) ** scaling_factor) for step in range(q_steps)]`. The results are then split evenly into High Quality and Low Quality graphs and visual comparisons.
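For concreteness, here is that formula as a runnable snippet. The `scaling_factor` value is my assumption for illustration (better-vmaf's actual default may differ); any value above 1 biases the probes toward the low-CRF, high-quality end:

```python
min_q, max_q = 10, 50
q_steps = 10
scaling_factor = 1.5  # assumed for illustration; >1 densifies the low-CRF end

crfs = [min_q + (max_q - min_q) * ((step / (q_steps - 1)) ** scaling_factor)
        for step in range(q_steps)]

# The gaps between the first values are smaller than between the last
# ones, i.e. more probes land where bitrate changes fastest.
```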
How to read the graphs? For the compression efficiency graphs, higher and further left is better, except for Butteraugli, which is a distance metric where lower scores mean better quality, so you want bottom-left instead of top-left. For the encoding speed graphs, further left means faster.
One important caveat about speed measurements: while the efficiency results should be reproducible regardless of your hardware, measuring encoding speed is trickier. The performance numbers you see here will likely differ on your setup depending on your specific hardware configuration. Please take them with a grain of salt.
The clips used in this test were acquired legally. The Codec Wiki and its contributors do not endorse media piracy.
As the testing started a good while ago (around when v3.0.1 released, in fact) and many encodes were already completed by the time v3.0.2 was out, I used the following encoder versions for this test: v2.0.0, v2.1.2, v2.2.1, v2.3.0 and v3.0.1. The different SvtAv1EncApp binaries were compiled directly from their respective source code found in the release section of the SVT-AV1 GitLab repository, using Clang 19.1.7 and the provided `Build/linux/build.sh` script with the following command: `build.sh cc=clang cxx=clang++ jobs=$(nproc) enable-lto static native release`. The testing machine now comprises an i7-12700F, whose E-cores have been disabled to avoid scheduler-related issues, with 2x8GB of TOTL 3200MHz CL14 DDR4 RAM, running Arch Linux with kernel 6.12.17 and the performance governor enabled. I tried my best to run most encodes in the same session without rebooting, but a few issues forced me to re-run some in a new session. However, this does not concern this part of the article, so we'll revisit it in the next one.
There is an exception to one of my above statements. As you may know, one feature has significantly impacted SVT-AV1's competitiveness: variance boost! It can provide good efficiency improvements by increasing quality in low-contrast areas of frames, at little to no performance cost when properly bitrate normalized. The feature was in the works during the v1.7.0 days, but it only got merged to git shortly after v2.0.0 released. As I wanted to enable variance boost by default for this entire test (as I had expressed in the conclusion of the last blog post), I decided to manually patch the v2.0.0 source code as provided on the release page with the following two commits: "Introduce the variance boost feature" and "Do not adjust picture QP/qindex value with variance boost on". Of course, the feature has slightly evolved since then, but this little modification still allowed me to track the evolution of the encoder's performance with this key feature on.
We're almost ready to deep dive into an ocean of metrics, graphs, and revelations! (Okay, maybe not revelations... but hopefully a few surprises.)
Samples
The samples are the same as last time:
- 17s **Avatar The Way Of Water (trailer 3)** clip sourced from thedigitaltheater.com (1920x808p with 4 columns and 4 rows of pure black borders to fix XPSNR, 23.976fps).
- 6s **Ducks Take Off** clip sourced from xiph.org (1280x720p, 50fps).
- 3s **Fallout 4** clip sourced from another encoder fellow (1920x1080p, 60fps).
- 8s **Minecraft** clip sourced from xiph.org (1920x1080p, 60fps).
- 8s **Sol Levante** HDR clip sourced from opencontent.netflix.com (3840x2160p, 24fps). This one is pretty instructive, as SVT-AV1's behavior isn't influenced by the presence (or absence) of HDR metadata in a source.
- 21s **Suzume (trailer 2)** clip sourced from thedigitaltheater.com (seems to have been deleted since) (1920x808p, 23.976fps).
- 13s **The Mandalorian (trailer 2)** clip sourced from thedigitaltheater.com (1920x800p, 23.976fps).
Visual comparisons
Throughout this blog post, you’ll find slow.pics links that provide various visual comparisons between presets.
The "High Quality" (noted HQ) and "Low Quality" (noted LQ) links showcase comparisons at two different quality targets. These have been carefully handcrafted to be as size-normalized as possible given the available encodes, since we want to focus on encodes with minimal bitrate deviation for a fair comparison.
Feel free to double-check the bitrate of each frame or scene (as written in the top-left corner of each screenshot) to make a more informed observation, keeping the size difference in mind when comparing the encodes.
Use the arrow keys and numpad to navigate between screenshots. Alternatively, you can click on "Slider comparison" and select two sources if you prefer comparing this way.
Quality Target & Encoding Settings
All clips have been encoded across a wide quality range, from `--crf 10` to `--crf 50`, with values determined using the previously described formula. `--preset X --enable-variance-boost 1` are the only parameters used here, in conjunction with the CRF values. We will compare various variance boost parameter combinations in Part 2 of the article, but for now, the defaults are used. Obviously, it wouldn't be fun without me realizing, as I'm writing these words, that I forgot to use `--hierarchical-levels 4` like last time, but well, it is what it is. Nothing that significant.
Otherwise, the SVT-AV1 defaults were used. The ones worth mentioning are:

- `--tune 1`: tune PSNR
- `--aq-mode 2`: variance deltaq
- `--enable-qm 0`: quantisation matrices disabled
- `--enable-tf 1`: temporal filtering enabled
- `--tf-strength 3`: default temporal filtering strength
- `--sharpness 0`: default deblock and rate-distortion mode decision
- `--fast-decode 0`: decode optimizations disabled

And more, like CDEF and restoration enabled, overlays and film grain disabled...
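To make the setup concrete, each encode invocation looks roughly like this. A sketch only: the file names are placeholders of mine, and actually running it requires an SvtAv1EncApp binary on your PATH.

```python
# Sketch of the per-encode command line used in this test.
# Input/output names are hypothetical; everything not listed
# stays at the SVT-AV1 defaults described above.
def build_cmd(preset: int, crf: int,
              src: str = "sample.y4m", out: str = "sample.ivf") -> list:
    return [
        "SvtAv1EncApp",
        "-i", src,
        "-b", out,
        "--preset", str(preset),
        "--crf", str(crf),
        "--enable-variance-boost", "1",
    ]

# e.g. subprocess.run(build_cmd(4, 23), check=True)
```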
Anyway, time to kick things off with the presets comparisons!
SVT-AV1 v3.0.x Presets Comparisons (-1 -> 10)
If all you care about is what preset to use in the latest available version, this is the section for you!
In the following graphs, you may find comparisons between all SVT-AV1 v3.0.x presets, ranging from the slowest `--preset -1` to the current fastest `--preset 10`.
Since v2.3.0 (and unlike in v2.1.x and v2.2.x), preset 6 has made its return as its own separate preset: it is no longer mapped to preset 7. However, the maximum preset is now 10, and anything above it is effectively mapped to 10.
Efficiency
- Here are the full efficiency graphs:
Quite a lot of data, eh? So much so that readability suffers, which is why we'll also focus on two quality targets to better understand exactly what's going on.
- Thus, let's look at the same data but zooming in on the "high quality" range (defined here as CRF10 through 23):
- And now, let's zoom in on the "low quality" range (defined here as CRF28 through 50):
As we can see, presets 9 and 10 can be quite unpredictable depending on the clip. Their efficiency curves are also clearly not monotonic, with unexpected efficiency regressions happening at very low CRFs (high bitrates), especially visible on SSIMU2. This behavior is likely amplified by the harmonic scoring, but it is also observable on Butteraugli and W-VMAF to a lesser extent. Looking at preset 10 in the low quality range, the SSIMU2 curves are so chaotic that we can safely conclude there's a quality consistency issue at play; that is, certain frames score significantly lower than others.
- Therefore, let's remove presets 9 and 10 from the equation and take a better look at the "usable" presets, 8 and below:
- Same, but focusing on the "high quality" range (CRF10 -> 23):
- And the "low quality" range (CRF28 -> 50):
Efficiency wise, we instantly notice that presets 2 and below are grouped together very tightly, while every other preset appears more evenly spaced. Consistency, and thus SSIMU2 harmonic scores in particular, tends to get really bad beyond CRF 38 or 44 depending on the clip, which at least teaches us that SVT-AV1 begins to struggle around these quality targets whatever the chosen preset (in its almost-stock configuration).
Something interesting seems to be happening with preset 8 where it scores better than preset 7 on certain clips, even going neck and neck with preset 6 at times according to W-VMAF. That is something worth investigating visually, in the following visual comparisons between the three presets:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
Do you agree with W-VMAF's numbers here? I think I do, to an extent.
- Let's take a closer look at presets 4 and below to see if we can better observe the differences between the slower modes:
- At "high quality" (CRF10 -> 23):
- And at "low quality" (CRF28 -> 50):
That's better, but it's still clear the differences are relatively small. We can, however, also put BD-rate numbers to these differences thanks to psy-ex's Metrics!
The following table shows the BD-rate regressions of presets 0 through 4, with preset -1 as the reference, averaged across all four metrics and all seven clips:
Preset | BD-rate regression (vs P-1) |
---|---|
0 | 0.97% |
1 | 3.79% |
2 | 7.82% |
3 | 17.08% |
4 | 22.41% |
Presets 0 through 2 are pretty close to the reference preset -1, and we notice a huge jump from preset 2 to 3, while preset 4 scores relatively close to preset 3.
We will put these numbers into perspective after having looked at the performance of every preset.
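For readers unfamiliar with BD-rate: it estimates the average bitrate difference between two encoder configurations at equal quality, by interpolating each rate-quality curve over the overlapping quality range and comparing average log-bitrates. A simplified piecewise-linear sketch of my own (real tools such as Metrics typically use higher-order interpolation):

```python
import math

def bd_rate(ref, test, samples=100):
    """Approximate Bjontegaard delta rate (%) of `test` vs `ref`.

    ref/test: lists of (bitrate, quality) points, quality ascending.
    Interpolates log10(bitrate) as a piecewise-linear function of
    quality; proper BD-rate implementations use cubic/Akima fits.
    """
    def interp(points, q):
        # linear interpolation of log10(rate) at quality q
        for (r0, q0), (r1, q1) in zip(points, points[1:]):
            if q0 <= q <= q1:
                t = (q - q0) / (q1 - q0)
                return (1 - t) * math.log10(r0) + t * math.log10(r1)
        raise ValueError("q outside curve")

    lo = max(ref[0][1], test[0][1])    # overlapping quality range
    hi = min(ref[-1][1], test[-1][1])
    diffs = [interp(test, lo + (hi - lo) * i / samples)
             - interp(ref, lo + (hi - lo) * i / samples)
             for i in range(samples + 1)]
    avg = sum(diffs) / len(diffs)
    return (10.0 ** avg - 1.0) * 100.0
```

If every test bitrate is 10% above the reference at the same quality, this returns roughly 10%, i.e. a 10% BD-rate regression.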
Speed
Efficiency numbers are great, but they only tell half the story. The relative speed differences between presets can paint a drastically different picture of the situation and change our entire interpretation of the results so far.
In the following graphs, you may find speed comparisons of the different presets, with either a linear or a logarithmic scale. The latter is useful to better visualize the large speed variations between presets, as it compresses the scale and makes smaller differences more apparent.
I must reiterate my earlier disclaimer that these numbers should be taken with a grain of salt. They only represent the speed of the given presets at each CRF value over a single run on a given machine. Multiple runs could help eliminate small, undesired variations, but they'd come at the cost of time and electricity, which isn't reasonable considering the scale of this blog post; sacrifices must unfortunately be made. To help mitigate this issue, the processor temperature is closely monitored and the fans actively adjust to prevent overheating. I also make sure no background tasks are running and that only a single encoder instance runs at any time. This won't prevent some outliers from slipping through, but they are usually easy to detect.
- That said, let's begin by comparing the performance of all presets, from -1 to 10:
First things first: SVT-AV1 scales exceptionally well, from well above realtime speeds down to painfully placebo speeds. Looking at the graphs with logarithmic scales, we can see that encoding times tend to increase exponentially as we move toward the slower presets. Unfortunately for us, efficiency doesn't usually follow that trend! We do notice a substantial gap between presets 4 and 5, however.
- Let's also look at the performance of just presets -1 through 4, considered the slower, non-realtime presets:
On the logarithmic scale, the presets are impressively evenly spaced, except for preset -1 which is somewhat closer to preset 0.
Now then, we need to combine what we learned about efficiency and speed to interpret the results in an informative way!
Interpretation (TLDR)
Despite the repeated presets shifting, some things never change.
For good modes, presets 2 and 4 still offer the best bang for your buck in balancing efficiency and speed. Preset 2 is usually slightly more than 2x faster than preset 1 and about 2x slower than preset 3, while offering close to preset 1 efficiency and largely better efficiency than preset 3. Preset 4 happens to be the "slowest" good preset that's still very competitive, with little efficiency difference against preset 3, making it a valuable choice for anyone wanting a good balance of quality and speed for non-realtime use cases.
For realtime modes, any preset from 5 up will do, as long as it reaches realtime on your system! However, consistency takes a huge hit at presets 9 and 10, rendering those two unappealing compared to hardware solutions, which are likely to be quite competitive for this use case. Preset 8, which strikes an excellent efficiency-to-speed ratio for what it is, is the last preset I would deem truly usable in SVT-AV1 v3.0.x.
SVT-AV1 v2.0.0 -> v3.0.x Initial Presets Comparisons (-1 -> 10)
In this section, we will first examine the presets -1 to 10 range, independently of the presets shifting happening between the tested encoder versions. In a following section, we'll make more targeted comparisons taking into account the shifts.
`preset -1`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
Efficiency wise, preset -1 has not improved from v2.0.0 to v3.0.x; rather, it has stagnated. At worst, it regressed by exactly 2.0% BD-rate on one metric in one given clip; on average, the regression is closer to 0.5%. The good news, though, is that the preset became up to 65% faster vs v2.0.0! That's the best-case scenario: on average across the studied quality range, it has gotten about 37.5% faster, which is still an amazing trade-off!
It is fascinating to see how differently each version's speed behaves with CRF: at CRF 50, the preset is barely faster in v3.0.x than it was in v2.0.0.
Now, feel free to look at the following comparisons to see if you can spot much of a difference between the different versions at preset -1!
- Preset -1 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 0`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
Here, we have an interesting case where preset 0's efficiency got slightly better since v2.0.0 (by about 1.75%), though it remained largely unchanged after v2.1.x. There's one funny exception: according to all metrics, it peaked at HQ on the Minecraft clip during v2.1.x and has regressed ever since. Speed wise, the preset took a hit in v2.1.x, improved a lot in v2.2.x (almost closing the gap with v2.0.0), and regressed again in v2.3.0 and especially in v3.0.x. The regressions are not a good look, even though, by and large, they're rather insignificant.
Can you spot the differences in the comparisons below though?
- Preset 0 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 1`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
We start seeing more diverse results with preset 1. Versions after v2.0.0 tend to fare better efficiency wise, but each version presents different strengths and weaknesses depending on the clip and metric. The speed graphs are quite chaotic, though the preset seems to have consistently gotten slightly slower since the v2.0.0 days. Taking all of this into account, the overall interpretation is still that the preset's behavior is mostly unchanged.
Let's play at "spot the differences"! After looking at the following comps, tell me what you think of the new trade-offs!
- Preset 1 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 2`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
On Minecraft, we notice (especially at HQ) a trend of preset 2 progressively becoming worse after v2.1.x: v3.0.x is in fact about 2.5% behind. On other clips, it has either stagnated or slightly improved, except in v3.0.x, where it seems to score consistently last or second to last. Relatively, the differences are small, so broadly speaking, the preset is again very similar to what it used to be. Well, except for its consistent speed improvement of 10-40% since v2.0.0, making the new trade-off considerably more appealing.
Do the following comparisons help quantify the speed improvements in the new version? Well, no... they can't, but it's nice to have them anyway! Check them out!
- Preset 2 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 3`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
At last, efficiency changes that can be considered significant! The presets shifting is at play here, and it will only become more evident as we progress towards the faster presets. With v3.0.x having fewer presets than some earlier versions, the SVT-AV1 team spread them out more to fill the gaps, completing the shifting initiated in v2.3.0.
Preset 3 in v3.0.x almost consistently scores last, though that is less pronounced at Low Quality. On average, it suffers a 2.7% BD-rate regression in v3.0.x vs v2.0.0. However, partly thanks to that, it received a hefty speed boost of about 24%!
Take a look at the comparisons below; does this trade-off seem worth it to you?
- Preset 3 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 4`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
Same goes for preset 4! Our slow preset of choice has become 29% faster on average, at the cost of a 3.0% efficiency loss.
With that in mind and the comps available right below, has your opinion of preset 4 changed?
- Preset 4 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 5`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
Unsurprisingly, the spreading continues! The gaming clips are the most affected here, showing efficiency decreases of 6.1% and 14.3% respectively. Then again, the average across all seven clips is quite a bit lower, at a 4.5% efficiency loss. Speed is improved by 44%, so the trade-off sounds reasonable.
I'm starting to lose inspiration to introduce the comparisons, so simply have a look!
- Preset 5 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 6`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
More of the same! Though, the shift seems to have been initiated in v2.1.x, so preset 6 in v3.0.x is quite different from its older sibling from v2.0.0. Compared to v2.0.0, preset 6 is now 47% faster at an 8.8% efficiency loss, while compared to v2.1.2, it is roughly 27% faster at a 2.6% efficiency loss. The gaming clips are the most affected again, showcasing that complex content suffers more as you move to higher presets.
Find the visual comparisons right after, if you're interested.
- Preset 6 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 7`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
With preset 7, it is clear the downward shift in efficiency was progressively set in motion in v2.3.0. Compared to v2.0.0, v3.0.x's preset 7 takes an 11.6% efficiency hit in exchange for a 68% speed improvement, which is not too bad. Versus v2.3.0, however, the new preset 7 is 5% less efficient for only 14% better speed. That trade-off is arguably the least appealing of all the presets so far...
How did the preset evolve visually though? The answer is in the following comparisons!
- Preset 7 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 8`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
Preset 8's trade-offs are significantly altered in v3.0.x. The efficiency loss is relatively proportionate to the gain in speed, but we may have expected better: it is 15.8% less efficient and 48% faster than in v2.0.0. To be fair, as the presets are starting to stray too far from each other, we should probably compare it to a faster v2.0.0 preset like 9.
Look up the visual comparisons while keeping that in mind.
- Preset 8 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 9`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
Similar to preset 8, it is becoming really hard to make a fair comparison between very disparate presets, especially when one starts to break down at extreme CRF values. It is clear v3.0.x's preset 9 is built differently than v2.0.0's; it plays in a completely different league. I will not elaborate further at this point, as we will revisit this preset in the next section.
There are still visual comparisons, if you're curious:
- Preset 9 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
`preset 10`: v2.0.0 -> v3.0.x
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
Unlike the previous two presets, the new preset 10 can somewhat be compared to its previous iterations. After disregarding v2.2.1's, which is the one that behaves distinctly here, v3.0.x's preset 10 shows similar characteristics across the quality range. For what it's worth, the preset is 38.5% less efficient but 39.1% faster compared to v2.0.0. It should be noted that, as we go faster, the gap between fast presets is more likely to shrink on shorter clips due to the low encode times (on the order of a few seconds). Since the time the encoder takes to initialize is no longer quite negligible, using longer clips would likely increase the lead of v3.0.x's preset 10 over v2.0.0's, as encoding speeds would have had enough time to stabilize. This is obviously out of the scope of this blog post, but the analysis in the next section will still help shed light on how the fastest preset in v3.0.x compares to the fastest one in v2.0.0.
I invite you to take a look at the visual comparisons if you want to see how preset 10 evolved from v2.0.0 to v3.0.x.
- Preset 10 Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
SVT-AV1 v2.0.0 vs v3.0.x Selective Presets Comparisons
Some presets cannot be directly compared anymore, as the reduction in presets since v2.0.0 means the gap had to be filled, and the fastest presets were the most affected by this move. I decided to focus on SVT-AV1 v3.0.x's presets 8, 9 and 10 here, then we'll proceed to do something new for this blog post series...
v3.0.x's preset 10
vs v2.0.0's preset 13
How does v3.0.x's fastest preset fare against v2.0.0's own?
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
Well, the answer is: quite nicely! Depending on the clip and metric, v3.0.x is overall either slightly ahead or slightly behind, while staying equally fast! The trade-offs are mildly different, as v3.0.x is capable of being even slightly faster, at the cost of some efficiency. It's fair to say the dev team managed to preserve the performance of the fastest mode despite said reduction in presets. Preset 13 was always considered an experimental mode reserved for convex-hull purposes, so my assumption is that the dev team had no real intention of making the fastest preset even faster if quality was impacted further than it already is.
Here are some visual comparisons to visualize the situation! Did you miss them?
- Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
v3.0.x's preset 9
vs v2.0.0's presets 11 & 12
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
In a nutshell, preset 9 in v3.0.x falls between v2.0.0's presets 11 and 12 in efficiency, though usually closer to preset 11 overall, while usually being closer in speed to preset 12 (with a notable exception on the gaming clips)! That's a free efficiency or speed boost for any non-gamer upgrading!
Screenshots... screenshots everywhere! Find them right below:
- Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
v3.0.x's preset 8
vs v2.0.0's presets 9 & 10
- Compression efficiency graphs, full quality range:
- Compression efficiency graphs, "high quality" range (CRF10 -> 23):
- Compression efficiency graphs, "low quality" range (CRF28 -> 50):
- Speed graphs:
- Interpretation:
As with preset 9 previously, preset 8 in v3.0.x usually competes closer to v2.0.0's preset 9 in efficiency, while performing one tier higher in speed, comparable to v2.0.0's preset 10. The results are more chaotic on the gaming clips, but one clear advantage of preset 8 is its robust consistency, unlike v2.0.0's preset 10 (and v3.0.x's two faster presets).
Here's our last set of comps for the day:
- Visual Comparisons:
HQ | LQ |
---|---|
Avatar (HQ) | Avatar (LQ) |
Ducks (HQ) | Ducks (LQ) |
Fallout (HQ) | Fallout (LQ) |
Minecraft (HQ) | Minecraft (LQ) |
Sol Levante (HQ) | Sol Levante (LQ) |
Suzume (HQ) | Suzume (LQ) |
The Mandalorian (HQ) | The Mandalorian (LQ) |
SVT-AV1 v2.0.0 -> v3.0.x General BD-rate Evolution
Before we wrap up on the presets analysis, I'm going to attempt an exercise widely used for comparing video encoder performance in both academic research and industry benchmarks. BD-rate (Bjøntegaard Delta rate) calculates the bitrate savings between two encoders at equivalent quality levels, giving you a single percentage that represents compression efficiency gains. You've already seen these numbers throughout the post. While BD-rate has its limitations (it assumes rate-distortion curves follow specific mathematical models and can struggle with very different encoder behaviors), it remains the most widely accepted metric for encoder comparisons. Psy-ex's metrics tooling conveniently outputs BD-rate numbers when running its benchmarking scripts, so this served as the basis for what we're about to do.
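To give you an idea of what's happening under the hood, here is a minimal sketch of the classic Bjøntegaard Delta rate computation. This is an illustrative approximation using a cubic polynomial fit in the log-rate domain, as in the original BD formulation; real benchmarking tools (including the one used here) typically use piecewise-cubic interpolation and more careful overlap handling, so treat this as a conceptual sketch rather than a reference implementation.

```python
import numpy as np

def bd_rate(rate_anchor, dist_anchor, rate_test, dist_test):
    """Approximate Bjøntegaard Delta rate: average bitrate difference (%)
    between two rate-distortion curves at equal quality.
    Rates in kbps, distortion as a quality score (e.g. a metric value)."""
    # Work in the log-rate domain, as in the original BD formulation
    lr_anchor = np.log(np.asarray(rate_anchor, dtype=float))
    lr_test = np.log(np.asarray(rate_test, dtype=float))
    # Fit a cubic polynomial of log-rate as a function of quality
    p_anchor = np.polyfit(dist_anchor, lr_anchor, 3)
    p_test = np.polyfit(dist_test, lr_test, 3)
    # Integrate both fits over the overlapping quality interval only
    lo = max(min(dist_anchor), min(dist_test))
    hi = min(max(dist_anchor), max(dist_test))
    int_anchor = np.polyval(np.polyint(p_anchor), hi) - np.polyval(np.polyint(p_anchor), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    # Average log-rate difference, converted back to a percentage
    avg_diff = (int_test - int_anchor) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100
```

A negative result means the test encoder needs less bitrate than the anchor for the same quality; for example, a curve shifted to 90% of the anchor's bitrate at every quality point yields roughly -10%.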
The graphs in this section plot BD-rate (%) against encoding time (ms), showing you the classic speed-vs-efficiency trade-off that defines encoder development.
The first batch of graphs uses SVT-AV1 v2.0.0's preset -1 as the reference:
We can instantly see the limitations of this approach. Using accurate results is vital for these graphs to make any sense, and having such bumpy curves is proof that despite my efforts to ensure the measured encoding times were correct, this methodology is too sensitive to even the smallest of deviations. That being said, I'm not helped by the very behavior of the encoder I'm benchmarking: as we've seen in earlier speed graphs, SVT-AV1's performance doesn't scale linearly with CRF values. Linear scaling is usually the expected behavior of encoders, and prior SVT-AV1 versions used to behave like this too (cf. the first deep-dive article). This means that when averaging BD-rate across the entire quality range, the shape of the speed curve can seriously throw off the results.
Anyway, let's try to interpret these results a bit. The faster presets can be found at the top left of the graph, and the slower ones in the bottom right. We can still generally notice the brighter colors tend to be above the darker ones (representing the newer versions), meaning the trade-offs did in fact improve overall. If we take preset -1 as an easy-to-analyze example, we can confirm our previous findings that it did in fact get faster and faster with versions at little to no efficiency impact.
What happens if we take a radically different preset as the reference though? Well, I got you covered with the following graphs, using SVT-AV1 v2.0.0's preset 10 (the default if unspecified) as the reference:
In this case, it tends to straighten the curves and make any sort of analysis harder. It doesn't help the situation, and there isn't much more to say about these graphs. At least, they give a different perspective on the results.
All in all, let's just say this was a fine experiment and a good learning opportunity. I'll think of ways I can do this better by next time.
SVT-AV1 v2.0.0 -> v3.0.x Conclusion
So what do you think of SVT-AV1's evolution from v2.0.0 to v3.0.x? Do you find it underwhelming? Well, that's almost expected. The SVT-AV1 dev team has been hard at work reducing the number of presets. As that's been a focus for quite some time, I'm not sure the initial reasons for this are still valid today.
Anyway, what we're mostly seeing are small but free speed improvements from smart trade-off decisions. It's worth noting that versions 2.0.0 and 3.0.0 weren't actually major feature milestones. The major version numbers jumped because of API changes that aren't backward compatible with previous releases.
Beyond the incremental performance gains, analysis of the different changelogs reveals a clear development strategy focused on specific areas. The dev team has invested heavily in ARM optimizations and memory requirements reduction, worked on the fast-decode feature to further reduce decoding cycles, and streamlined the architecture by removing the decoder component entirely. While many of these changes might not influence quality metrics, they represent important steps toward broader SVT-AV1 adoption by facilitating integration for actors in the industry and ensuring cross-platform consistency.
The biggest changes since v2.0.0 probably lie in the parameters originating from SVT-AV1-PSY that were recently introduced.
Presets Analysis TLDR
So SVT-AV1 v3.0.0 delivers some nice speed gains across the board. I will refrain from putting numbers on which exact presets possess the best efficiency-to-speed trade-offs due to my concerns about speed measurement accuracy, but I can confidently say presets 2 and 4 remain the efficiency champions, giving you excellent quality without completely destroying your encode times. Presets 5-8 strike a good balance, trading a bit of efficiency for significantly quicker encodes.
This should give you a good foundation for picking your go-to preset(s) in v3.0.x.
SVT-AV1 v3.0.x Parameters Revisited
This blog post is already pretty long... Although we won't be revisiting every encoder parameter like we did back in the first deep dive on v1.8.0, we will concentrate on a few important ones, some of them coming straight from the SVT-AV1-PSY project!
This section will be developed further in Part 2 of the article. I will leave you with the headlines to give you a taste of what's to come...
Variance Boost
Coming soon™
--tune
Coming soon™
--luminance-qp-bias
Coming soon™
--sharpness
Coming soon™
--tf-strength
Coming soon™
--lossless
Coming soon™
--fast-decode
Coming soon™
Tiles
Coming soon™
Closing Thoughts
Today, we tested five SVT-AV1 versions, ranging from v2.0.0 to v3.0.x, to quantify their relative efficiency and speed. New metrics were introduced and the format from the last blog post entry was iterated upon to improve the overall quality and flow of this present article. We used this opportunity to experiment and learn encoding knowledge along the way; I hope it was valuable to you too! This is only the first part of this benchmarking session, as we'll deep dive into variance boost and a few other exciting features in the next one. Revisiting certain key parameters in such a context has been on my mind for quite a while, so I hope you will look forward to Part 2!
Your feedback and suggestions are always welcome as I work to improve this blog format. Do you have a request for me? Something you'd like to see fixed or added? Let me know what you think on socials or in the different communities I'm active in!
Thanks for reading and see you soon!
I want to extend my thanks to the people who contributed, directly or indirectly, to the making of this article, including Gianni Rosato (gb82), Line (Lumen), Soda, Emre, Bolu, Julio Barba, the people behind slow.pics for hosting thousands of screenshots each time I make these articles, the SVT-AV1 dev team for the work on this amazing encoder, and probably others I'm forgetting...
Consider supporting me by making a donation on my Ko-Fi page, to reward my efforts and to compensate for the electricity bills of weeks of non-stop encoding.