
Deep Dive into SVT-AV1's Evolution (Part 2): Encoder Parameters Revisited

Trix · 250 min read · Encoder

Welcome to the second part of my SVT-AV1 testing analysis!

I received lots of kind words, including constructive feedback on my methodology; I sincerely appreciate it! While I couldn't incorporate those suggestions into this follow-up, stay tuned, because you'll likely hear more from me before summer ends.

Before we continue, if you haven't seen Part 1 yet, I recommend giving it a quick look! It covers the test methodology, sample clips, and base encoder settings in detail. The only change for this part is my decision to remove the full graphs in an attempt to cut down on bloat. If you really want to, you can still access them on the repository where we upload pictures for the blog posts here.

Now, without further ado, let’s dive right back in where we left off!

SVT-AV1 v3.0.x Parameters Revisited

Today, we are looking at 8 encoder features present in SVT-AV1 v3.0.x! Although we won't be revisiting every encoder parameter like we did back in the first deep dive on v1.8.0, we are going to concentrate on a few important ones, some of them coming straight from the SVT-AV1-PSY project!

Let's start with the feature you are likely most excited about: varboost!

Variance Boost

The author of variance boost wrote a highly visual explanation of the feature in the official SVT-AV1 documentation, which you can find here. That overview will probably do a better job than I can of describing what it does, but in a few words, varboost allocates more bits to low-contrast areas in a frame.

You can enable varboost via the --enable-variance-boost parameter and control its behavior by changing its --variance-boost-strength or its --variance-octile setting. Basically, the strength controls how strongly qualifying areas are boosted, while the octile controls how much of an area needs to be deemed low-contrast before it gets boosted.
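
To make that concrete, here is a sketch of how these flags fit together in an SvtAv1EncApp invocation. The input and output paths are hypothetical placeholders; only the flag names come from the text above:

```python
# Sketch: build an SvtAv1EncApp argument list with variance boost enabled.
# "input.y4m" and "output.ivf" are placeholder paths, not from the article.
def varboost_args(strength: int = 2, octile: int = 6) -> list[str]:
    if not 1 <= strength <= 4:
        raise ValueError("--variance-boost-strength accepts 1-4")
    if not 1 <= octile <= 8:
        raise ValueError("--variance-octile accepts 1-8")
    return [
        "SvtAv1EncApp", "-i", "input.y4m", "-b", "output.ivf",
        "--enable-variance-boost", "1",
        "--variance-boost-strength", str(strength),
        "--variance-octile", str(octile),
    ]

# The defaults discussed below: strength 2, octile 6.
print(" ".join(varboost_args()))
```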


Varboost Strength

We're going to test all 4 available strength values, with the octile as the variable...

...starting with --variance-boost-strength 1!

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

At HQ, on almost all metrics (except VMAF, which is more inconsistent here), every octile value is more efficient than leaving varboost disabled, with one notable exception: octile 8 often scores the same as or worse than no varboost, depending on the clip. The lower octiles score increasingly better here.

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

At LQ, the results tend to be the same; however, there are more outliers and the choice of octile value matters less overall. No varboost often scores better than many octile values on VMAF, which is an interesting behavior to observe.

---> Speed graphs:

In terms of speed, we won't be surprised to learn that the lower the octile, the slower the encoding instance will be, as the output filesize is bigger and that tends to slow down SVT-AV1. With bitrate normalization, all options would perform much closer to one another.

---> Interpretation:

1 is the most conservative strength value, so we expect reasonable results from varboost here. All octile values consistently provide efficiency improvements except for --variance-octile 8. It is counter-intuitive to see lower octiles score better in efficiency so far, but we'll see if that changes as we increase the strength.

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)

Let's continue with --variance-boost-strength 2, the default strength value in every iteration of SVT-AV1:

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

The picture is consistent with what we've previously seen, so I won't repeat myself. However, we must keep in mind that the higher the strength and the lower the octile, the more the curve is shifted to the top right, which may not paint the fairest picture of the situation and might skew our interpretation in favor of one end of the graph or the other. That being said, except for VMAF, which behaves a bit differently with varboost, we see the default of --variance-boost-strength 2 --variance-octile 6 always increases efficiency over no varboost across basically all clips and metrics, so we can confidently say it lives up to its promise.

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)

Now for --variance-boost-strength 3:

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

It's more of the same: varboost at strength 3 tends to perform better in efficiency at higher quality targets, though there can still be gains at lower qualities depending on the content at hand.

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)

And lastly, --variance-boost-strength 4, which prints a warning that it may be too aggressive for some use cases:

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

So, what does this unusual varboost strength have in store for us? Well, nothing as dangerous as we're led to believe. In fact, by and large, we observe the same things as with the other three strengths. You may have noticed this yourself: as we've increased the strength, Butteraugli has tended to highlight lower octiles more and more at high quality, and no varboost more and more at low quality. Still, enabling varboost often provides great gains, and the optimal octile depends a lot on the sample. The default octile 6 is a fine, safe default.

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)


Varboost Octile

Now, let's do the opposite and test all 8 octile values while solely adjusting the strength...

...starting with --variance-octile 1!

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

For an octile value that supposedly boosts too many blocks in frames, 1 scores pretty consistently better than no varboost, especially at high quality. What's more, even though it can be argued all varboost curves are extremely close to one another, higher strengths perform ever so slightly better than lower ones at high quality, while the opposite is more often observed at low quality, though not always! Fallout seems to benefit less from varboost; heck, it can be harmful at low quality according to some metrics. You will be unsurprised to hear that the higher the strength, the slower encoding tends to be. Again, speed is affected by the resulting output filesize.

---> Visual Comparisons:

Coming soon™

Let's look at --variance-octile 2:

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

Octile 2's results are similar to 1's: close curves between all strengths, a tendency for higher strengths to be better at high quality and for lower strengths to be at times preferable at low quality, and varboost being overall beneficial over leaving it disabled, except according to VMAF.

---> Visual Comparisons:

Coming soon™

Let's test --variance-octile 3:

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

It can be summarized the same as octiles 1 and 2.

---> Visual Comparisons:

Coming soon™

And --variance-octile 4:

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

I swear it's not just me getting lazy; there's not much more to say than has already been said!

---> Visual Comparisons:

Coming soon™

And --variance-octile 5!

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

Cf. previous interpretations.

---> Visual Comparisons:

Coming soon™

What about --variance-octile 6, the default octile value in every iteration of SVT-AV1?

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

The default octile value tends, again, to perform better with higher strengths at high quality, and sometimes lower strengths at low quality. We notice it is getting easier to distinguish between strengths as we've increased the octile, which can be good depending on how you see things. Something we haven't discussed until now is how higher strengths tend to be more inconsistent according to SSIMU2 as you get closer to the CRF40-50 range, which could be problematic depending on your use case, and is thus something to keep in mind.

---> Visual Comparisons:

Coming soon™

Following with --variance-octile 7 tests:

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

Cf. previous interpretations.

---> Visual Comparisons:

Coming soon™

And lastly --variance-octile 8 testing:

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

Well, here we are: octile 8 barely provides any gains and can be (quite) harmful depending on the clip and metric, especially at low qualities, so it can hardly ever be recommended.

---> Visual Comparisons:

Coming soon™


Varboost Curve

We will be comparing all 3 available varboost curves...

...with the default settings first:

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

We observe that all three curves tend to be blurred together, with the notable exception of curve 2 scoring non-negligibly better on a few clips, consistently across all metrics. Speed-wise, I would argue it's a tie as all three curves are sometimes situationally slightly slower or faster.

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)

I also conducted the tests with a different combination of varboost settings I have used on occasion:

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

The winner is less clear here; this time we can say it's a tie. Such results are why testing with a different combination of settings can be instructive.

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)


Varboost Conclusion

We have seen that, in SVT-AV1 v3.0.x, enabling varboost more often than not provides consistent improvements whatever the combination of settings, though we see larger gains at high qualities and/or with higher strengths and lower octiles, and smaller gains at low qualities and/or with lower strengths and higher octiles.

It is best to stay on the default --variance-boost-strength 2 --variance-octile 6 unless you want to hyper-tune for your content, or you want to encode at lower than CRF20 where I guess a combination like --variance-boost-strength 3 --variance-octile 3 can provide more consistent efficiency gains.

--variance-boost-strength 4 is still clearly better than no varboost and sometimes lower strengths too, but it often performs worse than 3. --variance-octile 1 & 2 often came out on top, but they may be too aggressive or inflate filesizes too much, so a more conservative value is advised.

My last recommendation would be to remember to adjust CRF accordingly, as enabling varboost, increasing the strength, and decreasing the octile can drastically boost filesizes. If you understand how these graphs work, then you know you aren't compromising anything by increasing CRF to compensate for the filesize increase.


--tune

The original SVT-AV1 implementation offers three tunes to choose from:

  • --tune 0 (VQ - Visual Quality): Favors sharper decisions, reducing blur but potentially introducing artifacts.
  • --tune 1 (PSNR - Peak Signal-to-Noise Ratio): The default in mainline SVT-AV1.
  • --tune 2 (SSIM - Structural Similarity Index Measure): The previous efficiency champion.

Have the dynamics between the tunes changed since we last tested them on SSIMULACRA2 almost a year and a half ago? Let's see!

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

In most cases, on most metrics, tune 1 comes out on top in efficiency, with tune 2 sometimes trading blows and even winning. Tune 0 is usually quite a bit behind, except on XPSNR. In terms of speed, all tunes perform close to one another, without a clear winner.

For most users, sticking with the default (tune 1) is recommended, as it provides the best balance of everything. However, tune 0 can be worth experimenting with if you prefer sharper outputs. Just be aware of the potential trade-offs in artifacting.

Specialized Tunes in SVT-AV1 Forks

The SVT-AV1-PSY based forks introduced new tunes to cater to new use cases. We won't be testing these forks today, but I'm mentioning them for reference purposes. Note that the following two --tune 3 modes are entirely different depending on the encoder variant!

  • --tune 3 (SVT-AV1-PSY(EX) exclusive): A psychovisual enhancement of --tune 2, borrowing some features from --tune 0 and other tweaks. A general-purpose psychovisual tune for a wide range of content.
  • --tune 3 (SVT-AV1-HDR): Acts as a grain-optimized mode, disabling CDEF, restoration, and temporal filtering while applying aggressive psychovisual adjustments. Best suited for noisy live-action content.

We may want to confirm our findings with a visual analysis on the encoded samples:

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)

Lastly, my own opinion regarding the tunes is that they probably do not matter as much as you think. Again, unless you're willing to hyper-tune, which implies testing parameters each time you encode a new source, no one can guess in advance what's going to be best for that content, because it depends on an almost infinite number of variables. The thing with psychovisual approaches is that the metrics may be saying something and your eyes something else. Plus, it is wildly subjective: my eyes may disagree with yours, or anyone else's. Don't fret too much over the tune.


Let's continue with the newly introduced parameters of v3.0.0, starting out with an important feature in the context of AV1: luma bias!

--luminance-qp-bias

Anyone familiar with AV1 for long enough is aware that its encoders have struggled with dark scenes forever. As encoder implementations mature, performance in such scenes naturally improves; however, in many cases they persist in allocating insufficient bits to these darker scenes. This setting changes (almost) everything! It effectively applies a simple QP offset to frames of lower overall brightness. The higher the value, the stronger the effect. This implementation has one advantage and one weakness: it gives the user control over the bitrate balance between bright and dark frames; however, if only parts of the frame are dark and the rest is fairly bright, it may not fix cases of localized detail loss or blurring.
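
As an illustrative model only (the numbers and curve here are my own assumptions, not SVT-AV1's actual implementation), the core idea can be sketched as a per-frame QP offset derived from average luma:

```python
# Toy model of a luminance-based QP bias. The real encoder's mapping
# differs; this only illustrates "darker frame -> lower QP -> more bits".
def luma_qp_offset(avg_luma: float, bias: int, max_offset: int = 8) -> int:
    """avg_luma is the frame's mean luma on a 0-255 scale; bias mirrors
    --luminance-qp-bias (0-100). Returns a negative-or-zero QP offset."""
    darkness = max(0.0, 1.0 - avg_luma / 128.0)  # 0 for mid/bright frames
    return -round(darkness * (bias / 100.0) * max_offset)

# A dark frame gets a QP reduction; a bright one is left alone:
print(luma_qp_offset(32, 50), luma_qp_offset(200, 50))  # -> -3 0
```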

So let's see the effect it can have on efficiency and visuals!

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

On the HQ graphs, we can see that no luma-bias tends to score last, or at least lower than conservative luma-bias values. On the LQ graphs, we usually observe the opposite, that is to say, no luma-bias tends to just barely come out on top, with increasing luma-bias values decreasing efficiency ever so slightly.

If we look at performance, the impact of luma-bias is negligible at lower CRF values but gradually increases at higher CRF values, for the simple reason that the frames whose QP is reduced take a bit more time to encode, as is the expected behavior of any encoder. The effect on QP, and thus performance, is going to be more pronounced the higher the base CRF you start from.

Let's look at some BD-rate numbers directly, to try and clarify the graphs:

| Luma-Bias Effect | LB0 | LB10 | LB20 | LB30 | LB40 | LB50 | LB60 | LB70 | LB80 | LB90 | LB100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Avatar | 0% | +0.18% | +0.59% | +0.74% | +0.89% | +1.11% | +1.28% | +1.45% | +1.63% | +1.73% | +1.94% |
| Ducks | 0% | +1.33% | +2.34% | +3.04% | +3.76% | +4.43% | +5.02% | +5.74% | +6.29% | +6.82% | +7.43% |
| Fallout | 0% | -0.31% | +0.79% | +1.29% | +1.84% | +2.56% | +2.72% | +3.04% | +3.40% | +3.73% | +4.06% |
| Minecraft | 0% | +0.69% | +1.43% | +1.99% | +2.64% | +3.10% | +3.47% | +3.87% | +4.26% | +4.59% | +5.06% |
| Sol Levante | 0% | -0.09% | -0.12% | -0.02% | +0.21% | +0.29% | +0.44% | +0.42% | +0.53% | +0.60% | +0.75% |
| Suzume | 0% | -0.45% | -0.43% | -0.34% | -0.31% | -0.30% | -0.21% | -0.11% | -0.07% | +0.02% | +0.08% |
| Mandalorian | 0% | +0.57% | +0.91% | +1.29% | +1.56% | +1.80% | +2.24% | +2.30% | +2.53% | +2.78% | +3.03% |

Unfortunately, I couldn't easily separate the BD-rate numbers at low quality from the ones at high quality, so we can't exactly reproduce what I interpreted from the graphs earlier. Still, this gives insightful data, which showcases that the BD-rate across the entire quality range improves the most on the anime clips and tends to be harmful on the live action and gaming clips, again across the entire quality range, not when isolating a smaller range where we could make luma-bias look way more appealing.

Luma-bias is no magic; it simply offsets a frame's QP depending on its average luminance. So unless rate control is badly tuned, the feature is not expected, in theory, to bring any efficiency improvements.

My advice for choosing a luma-bias value is to simply isolate a test sample from your source and test out a few values (like 10, 30, 50...), look at the bitrate balance between dark and bright frames by checking the impact on visuals and filesizes, and finally decide what you prefer. On that note, you cannot go wrong with a conservative value below 50, in my opinion.
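
One way to run such a sweep is sketched below. "sample.y4m" and the CRF are placeholders, and the actual encode call is left commented so this stays a dry run:

```python
# Dry-run sketch of a --luminance-qp-bias sweep on an isolated sample.
# "sample.y4m" and --crf 25 are hypothetical, not the article's settings.
def luma_bias_cmd(bias: int) -> list[str]:
    return ["SvtAv1EncApp", "-i", "sample.y4m", "-b", f"lb{bias}.ivf",
            "--crf", "25", "--luminance-qp-bias", str(bias)]

for bias in (0, 10, 30, 50):
    print(" ".join(luma_bias_cmd(bias)))
    # subprocess.run(luma_bias_cmd(bias), check=True)  # real encode
    # ...then compare the lb*.ivf filesizes and eyeball the dark scenes.
```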

Beware that the author of the feature warned me it isn't suited for the PQ transfer used in most UHD Blu-rays! Use luma-bias exclusively on SDR and HDR HLG videos.

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)

--sharpness

Sharpness is a straightforward parameter, though it may not do exactly what you'd assume. It does not affect the encode's clarity but rather the deblocking filter's sharpness, which can lead to increased fidelity.

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

Like luma-bias, this feature can only be useful if the deblocking filter isn't properly tuned for all use cases to begin with. On the surface, it looks like there is not much room for improvement. It does depend on the metric and clip, but in many of them, all sharpness values perform extremely close to one another, apart from 3 (a bit lower or higher) and 4+ (usually noticeably lower).

Speed-wise, negative sharpness values perform close to the default of 0, while increasing positive values become slower and slower; but as we can see on the x axis, the difference is rather negligible.

Again, there are so many data points that a BD-rate table will help visualize things differently:

| Sharpness Effect | -7 | -6 | -5 | -4 | -3 | -2 | -1 | 0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Avatar | 0% | -0.04% | -0.05% | -0.04% | -0.16% | -0.04% | -0.03% | -0.02% | -0.03% | -0.01% | +1.07% | +3.44% | +6.05% | +8.28% | +9.65% |
| Ducks | 0% | -0.05% | -0.09% | -0.10% | -0.14% | -0.10% | -0.09% | -0.06% | +0.04% | +0.06% | +2.14% | +8.94% | +14.24% | +18.19% | +20.74% |
| Fallout | 0% | -0.05% | -0.07% | -0.13% | -0.16% | -0.15% | -0.13% | -0.13% | +0.01% | +0.01% | +0.99% | +5.62% | +10.23% | +13.68% | +16.00% |
| Minecraft | 0% | -0.04% | -0.08% | -0.12% | -0.17% | -0.11% | -0.08% | -0.01% | +0.21% | +0.26% | -1.14% | +0.88% | +3.88% | +5.60% | +6.83% |
| Sol Levante | 0% | -0.03% | -0.07% | -0.10% | -0.02% | -0.01% | -0.17% | -0.06% | -0.05% | +0.04% | -3.61% | -3.61% | -2.65% | -1.89% | -1.53% |
| Suzume | 0% | -0.03% | -0.07% | -0.15% | -0.13% | -0.17% | -0.20% | -0.13% | -0.75% | -0.77% | -0.46% | +2.73% | +5.87% | +7.89% | +8.91% |
| Mandalorian | 0% | -0.02% | -0.04% | -0.01% | -0.17% | -0.13% | -0.06% | -0.20% | -0.95% | -1.07% | -1.86% | -0.49% | +1.09% | +2.16% | +2.65% |

The same disclaimer applies: BD-rate numbers across such a wide quality range are bound not to be representative of smaller ranges, and averaging across all 4 metrics can dampen outliers and make the situation look better than it actually is. This is why this information is important to pair with the individual graphs and the visual comparisons.
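
For readers wondering how such percentages are produced, here is a simplified sketch of a BD-rate computation. The standard Bjøntegaard calculation fits a polynomial or spline through the rate-quality points, so real tools will give slightly different values; this piecewise-linear version only conveys the idea:

```python
import math

def bd_rate(anchor, test):
    """Simplified BD-rate (%): average log-bitrate difference between two
    rate-quality curves over their overlapping quality range, using
    piecewise-linear interpolation instead of the usual polynomial fit.
    Each curve is a list of (bitrate, quality) points. A negative result
    means `test` needs fewer bits than `anchor` for the same quality."""
    def log_rate(curve, q):
        pts = sorted(curve, key=lambda p: p[1])
        for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
            if q0 <= q <= q1:
                t = (q - q0) / (q1 - q0)
                return math.log(r0) + t * (math.log(r1) - math.log(r0))
        raise ValueError("quality outside curve range")
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    n = 100  # sampling points across the overlapping quality range
    qs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    avg = sum(log_rate(test, q) - log_rate(anchor, q) for q in qs) / len(qs)
    return (math.exp(avg) - 1) * 100
```

With synthetic curves where the test encoder spends 10% fewer bits at every quality point, this sketch reports roughly -10%.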

Anyway, on average we can see in the table above that negative sharpness values (-7 to -1) generally show minor BD-rate improvements or neutral effects, while positive sharpness values show neutral effects or decent, consistent BD-rate improvements with +1 and +2, and either bigger gains or significant degradations with +3 and above.

We notice that "Sol Levante" loves sharpness (any value really), whereas the gaming clips and "ducks take off" rather dislike it.

So, what effect does sharpness have on visuals?

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)

--tf-strength

Temporal filtering in SVT-AV1 combines information from multiple nearby video frames to create cleaner reference pictures with reduced noise, which helps improve compression quality especially for noisy source material.

The feature was long considered too strong, often creating unavoidable blocking on keyframes, so we historically disabled temporal filtering. Thankfully, a strength parameter has since been introduced, which allows us to tame its effects.

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

On this corpus of clips, we instantly notice that tf-strength 4 performs very poorly and should probably never be used. Strengths lower than the default 3 tend to score increasingly better, though we seem to hit a ceiling below 2. No temporal filtering is competitive with low tf-strengths, though it still gets beaten slightly at times. There is an interesting outlier in Minecraft, where low strengths, and especially no tf, perform noticeably worse for some reason. The difference in speed between all options is overall negligible.

From this, it is recommended at a minimum to reduce --tf-strength from its default 3 to 1, or below, to at least completely eliminate the tf blocking issue.

Additional Parameter in SVT-AV1 Forks

The SVT-AV1-PSY based forks include an additional --kf-tf-strength parameter, which decouples the tf strength on keyframes and allows the user to fix the blocking issue while still using a stronger tf strength on all other frames if they want. In mainline SVT-AV1, tf-strength is the same between keyframes and other frames, unless you use tune 0, in which case the tf strength on keyframes will be 1 value lower than on other frames (so, for example, 1 on keyframes if --tune 0 --tf-strength 2 is set), with the strength capped at a minimum of 0 in any case.
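
The mainline rule just described can be summarized in a couple of lines (a behavioral summary of the text above, not actual encoder source):

```python
# Mainline SVT-AV1 behavior as described above: keyframes share the frame
# tf strength, except with tune 0 where it is one step lower, floored at 0.
def keyframe_tf_strength(tune: int, tf_strength: int) -> int:
    if tune == 0:
        return max(tf_strength - 1, 0)
    return tf_strength

print(keyframe_tf_strength(0, 2))  # the article's example: -> 1
```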

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)

--lossless

SVT-AV1 finally added a lossless mode in v3.0.0! Until then, aomenc was the only AV1 software encoder capable of lossless encoding, as neither rav1e nor SVT-AV1 would allow you to set Q0/CRF0. We probably shouldn't expect better lossless compression out of SVT-AV1; rather, the encoder aims to achieve feature parity with aomenc. But can it do it fast?

I initially intended this section to compare aomenc to SVT-AV1, but complications quickly arose from my testing, as you'll soon see. Obviously, there is no question of efficiency graphs here, as the metrics are expected to always be maxed out with lossless enabled.

In fact, SVT-AV1's lossless mode isn't actually mathematically lossless. Rather, the feature was designed to reach a PSNR of 100 more often than not (not inf!), as we can see in the Merge Request that introduced the feature.

Running --preset 4 --lossless 1 proves it on my test samples:

"Lossless" TestPSNR (average)PSNR (min)PSNR (max)
Avatar96.61600686.680008inf
Ducksinfinfinf
Fallout104.37840292.930594inf
Minecraft96.26305287.993916inf
Sol Levante119.890723106.868980inf
Suzume116.226350102.759654inf
Mandalorian102.54689391.381363inf

XPSNR and W-VMAF are not included here, as the numbers were nonsensical due to the chroma weighting at play. As for Butteraugli's different intensity multiplier and SSIMULACRA2's harmonic scoring, they are not expected to affect the score of a lossless frame; however, the GPU implementation itself isn't 100% score-accurate to the originals, so they can't be considered reliable for this specific use case. Therefore, regular ffmpeg's PSNR filter was used here.
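
For the curious, this kind of measurement can be reproduced with ffmpeg's psnr filter along these lines (a sketch; the file names are placeholders and the author's exact command may have differed):

```python
# Sketch: compare a "lossless" encode against its source with ffmpeg's
# psnr filter. ffmpeg prints the average/min/max PSNR stats to stderr.
def psnr_cmd(encode: str, source: str) -> list[str]:
    return ["ffmpeg", "-hide_banner", "-i", encode, "-i", source,
            "-lavfi", "psnr", "-f", "null", "-"]

print(" ".join(psnr_cmd("lossless.ivf", "source.y4m")))
```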

Sure, the scores are high, but not lossless-like! Funnily enough, the anime clips average a bit higher. That makes SVT-AV1 more lossless on anime than on live action! Is this confirmation that SVT-AV1 is a weeb encoder? You tell me. Anyway, for some reason, "ducks take off" is the only clip that comes out of SVT-AV1 properly lossless here.

What's even more concerning is that even though the outputs are not mathematically lossless, the filesizes are often bigger than libx264 -preset veryfast -qp 0, the latter of which is properly lossless:

"Lossless" Testx264 FilesizeSVT-AV1 Filesize
Avatar282,198,409 o280,441,784 o (-0.6%)
Ducks244,077,992 o234,884,891 o (-3.8%)
Fallout272,838,887 o278,457,794 o (+2.1%)
Minecraft382,935,638 o473,884,851 o (+23.8%)
Sol Levante438,207,769 o448,937,567 o (+2.4%)
Suzume372,844,589 o384,223,861 o (+3.1%)
Mandalorian182,526,455 o185,680,771 o (+1.7%)

Needless to say the AV1 encodes took at least 10x longer to encode as well!

Considering that "Ducks" is mathematically lossless and a few percents smaller, it shows that the format is in fact capable of compression gains in this department. Simply, the current iteration is either broken or misleading in its true intentions.

While the encodes are not mathematically lossless, I doubt anyone can see any difference with the naked eye, but I still made comps so you can see for yourself.

---> Visual Comparisons:


--fast-decode

SVT-AV1 ships with its own built-in method for reducing decoding bottlenecks by smartly tuning down or disabling specific internal tools that trade off some efficiency for decoding performance.

The encoder offers two --fast-decode levels, with 2 being more aggressive. The default is 0.

Three presets have been tested with the feature to quantify what the effects would be for different usecases. Due to some hardware-related issues, all the encodes in this section were re-run, so the speed numbers of disabled fast-decode encodes are different from before, however this ensured there would be no impact on my subsequent analysis.

Preset 2

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

The efficiency impact of fast-decode is clearly visible, be it at low or high quality targets. Beware of the very different scales on the y-axis between the HQ and LQ graphs, which could mislead you into thinking the effect at low quality is smaller, even though that's not the case.

We observe that fast-decode can non-negligibly influence the speed of your encoding instances, with 1 appearing slower than 0 and 2 being faster than both. There's an exception in "Sol Levante" where 0 and 1 perform the same.

Let's look at the effects on decoding speed with data I nicely aggregated into tables!

Decoding speeds were collected using ffmpeg 7.1.0 from Arch's official repository using: ffmpeg -hide_banner -benchmark -i "" -f null - > /dev/null. The tests were repeated 5 times and the performance numbers you'll see are the average of these 5 runs. I can say with confidence the speed deviation was largely negligible, but better safe than sorry!
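
A sketch of that harness is below (assumptions on my part: ffmpeg on PATH, a known frame count per clip, and simplified timing; the author's exact script may differ), together with the percentage-gain arithmetic used in the tables:

```python
import statistics
import subprocess
import time

def decode_fps(path: str, frames: int, runs: int = 5) -> float:
    """Decode `path` `runs` times with ffmpeg, discarding output, and
    return the average decode speed in frames per second."""
    speeds = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(["ffmpeg", "-hide_banner", "-benchmark",
                        "-i", path, "-f", "null", "-"],
                       check=True, capture_output=True)
        speeds.append(frames / (time.perf_counter() - start))
    return statistics.mean(speeds)

def pct_gain(baseline_fps: float, fps: float) -> float:
    """Speed-up relative to fast-decode 0, as shown in the tables."""
    return (fps / baseline_fps - 1) * 100

# e.g. Avatar at preset 2 / CRF12: 197 fps (FD0) vs 204 fps (FD1)
print(round(pct_gain(197, 204), 1))  # -> 3.6
```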

| Preset 2 Decoding Test (CRF12) | FD0 | FD1 | FD2 |
| --- | --- | --- | --- |
| Avatar | 197 fps | 204 fps (+3.6%) | 212 fps (+7.6%) |
| Ducks | 118 fps | 126 fps (+6.8%) | 125 fps (+5.9%) |
| Fallout | 65 fps | 67 fps (+3.1%) | 68 fps (+4.6%) |
| Minecraft | 82 fps | 85 fps (+3.7%) | 91 fps (+11.0%) |
| Sol Levante | 98 fps | 105 fps (+7.1%) | 107 fps (+9.2%) |
| Suzume | 360 fps | 401 fps (+11.4%) | 397 fps (+10.3%) |
| Mandalorian | 326 fps | 348 fps (+6.7%) | 340 fps (+4.3%) |

In our high quality target, --fast-decode 1 provides a +6% decoding performance increase on average, while --fast-decode 2 provides +7.5%.

| Preset 2 Decoding Test (CRF33) | FD0 | FD1 | FD2 |
| --- | --- | --- | --- |
| Avatar | 429 fps | 465 fps (+8.4%) | 482 fps (+12.4%) |
| Ducks | 341 fps | 366 fps (+7.3%) | 342 fps (+0.3%) |
| Fallout | 131 fps | 140 fps (+6.9%) | 140 fps (+6.9%) |
| Minecraft | 182 fps | 211 fps (+15.9%) | 217 fps (+19.2%) |
| Sol Levante | 194 fps | 211 fps (+8.8%) | 210 fps (+8.2%) |
| Suzume | 684 fps | 766 fps (+12.0%) | 770 fps (+12.6%) |
| Mandalorian | 885 fps | 912 fps (+3.1%) | 946 fps (+6.9%) |

At CRF33, --fast-decode 1's decoding speeds were faster by about +8.9%, and --fast-decode 2's by about +9.5%.

Sure, --fast-decode 2 doesn't provide substantial decoding benefits over 1, heck even over 0 at times; however, it also acts as a "fast-encode" parameter, so it can be argued it has its purpose.

Before we continue with the visual comparisons, I will add that we are not on a low-powered device, so the importance of such gains may not be immediately apparent, nor perfectly representative, to be honest.

---> Visual Comparisons:

(Image comparisons for Avatar, Ducks, Fallout, Minecraft, Sol Levante, Suzume, and Mandalorian, each at HQ and LQ.)

Preset 4

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

It can be observed that the gaming clips' efficiency is less affected by the fast-decode feature, if at all. Even then, --fast-decode 2 continues to provide encoding performance benefits.

As we move to faster presets, the resulting streams are expected to be less complex and thus easier to decode. Is that assumption correct? And if so, does it imply that fast-decode will have a smaller impact when using preset 4?

| Preset 4 Decoding Test (CRF12) | FD0 | FD1 | FD2 |
| --- | --- | --- | --- |
| Avatar | 185fps | 202fps (+9.2%) | 203fps (+9.7%) |
| Ducks | 117fps | 126fps (+7.7%) | 121fps (+3.4%) |
| Fallout | 67fps | 71fps (+6.0%) | 70fps (+4.5%) |
| Minecraft | 87fps | 95fps (+9.2%) | 94fps (+8.0%) |
| Sol Levante | 91fps | 100fps (+9.9%) | 99fps (+8.8%) |
| Suzume | 324fps | 372fps (+14.8%) | 360fps (+11.1%) |
| Mandalorian | 312fps | 336fps (+7.7%) | 326fps (+4.5%) |

Comparing the FD0 columns of the tables above and below with their preset 2 counterparts, my assumption barely holds, and only consistently at low quality levels at that. We'll see how it goes for preset 6 in the next sub-section.

Anyway, the average decoding speed impact at HQ is as follows:

  • +9.2% for --fast-decode 1,
  • +7.1% for --fast-decode 2.

| Preset 4 Decoding Test (CRF33) | FD0 | FD1 | FD2 |
| --- | --- | --- | --- |
| Avatar | 432fps | 494fps (+14.4%) | 499fps (+15.5%) |
| Ducks | 354fps | 394fps (+11.3%) | 375fps (+5.9%) |
| Fallout | 143fps | 156fps (+9.1%) | 146fps (+2.1%) |
| Minecraft | 197fps | 258fps (+31.0%) | 254fps (+28.9%) |
| Sol Levante | 200fps | 224fps (+12.0%) | 217fps (+8.5%) |
| Suzume | 696fps | 811fps (+16.5%) | 763fps (+9.6%) |
| Mandalorian | 902fps | 981fps (+8.8%) | 999fps (+10.8%) |

--fast-decode 1 gives a hefty +14.7% increase in decoding performance at lower quality targets, as certain clips like "Minecraft" greatly benefit from it. --fast-decode 2 brings a more modest +11.6% improvement.

The trade-offs are quite a bit different from those at preset 2, which is quite fascinating; overall, --preset 4 seems to draw more benefit from fast-decode!

---> Visual Comparisons:

| HQ | LQ |
| --- | --- |
| Avatar (HQ) | Avatar (LQ) |
| Ducks (HQ) | Ducks (LQ) |
| Fallout (HQ) | Fallout (LQ) |
| Minecraft (HQ) | Minecraft (LQ) |
| Sol Levante (HQ) | Sol Levante (LQ) |
| Suzume (HQ) | Suzume (LQ) |
| Mandalorian (HQ) | Mandalorian (LQ) |

Preset 6

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

The efficiency situation here is the same as on preset 4; however, we notice --fast-decode 1 is no longer slower to encode than leaving the feature disabled. Instead, it places itself in between 0 and 2 in terms of performance, offering yet another kind of trade-off!

What influence on decoding speeds can we expect at --preset 6?

| Preset 6 Decoding Test (CRF12) | FD0 | FD1 | FD2 |
| --- | --- | --- | --- |
| Avatar | 218fps | 216fps (-0.9%) | 228fps (+4.6%) |
| Ducks | 128fps | 127fps (-0.8%) | 130fps (+1.6%) |
| Fallout | 76fps | 74fps (-2.6%) | 77fps (+1.3%) |
| Minecraft | 110fps | 106fps (-3.6%) | 112fps (+1.8%) |
| Sol Levante | 100fps | 104fps (+4.0%) | 107fps (+7.0%) |
| Suzume | 382fps | 381fps (-0.3%) | 394fps (+3.1%) |
| Mandalorian | 343fps | 343fps (0.0%) | 357fps (+4.1%) |

This time, be it at high or low quality, the baseline decoding speed without fast-decode is almost always higher than at the slower presets.

At CRF12, --fast-decode 1 and --fast-decode 2 respectively deliver a -0.6% decoding speed regression and a +3.4% speed improvement.

| Preset 6 Decoding Test (CRF33) | FD0 | FD1 | FD2 |
| --- | --- | --- | --- |
| Avatar | 468fps | 480fps (+2.6%) | 508fps (+8.5%) |
| Ducks | 358fps | 364fps (+1.7%) | 382fps (+6.7%) |
| Fallout | 169fps | 168fps (-0.6%) | 177fps (+4.7%) |
| Minecraft | 278fps | 303fps (+9.0%) | 335fps (+20.5%) |
| Sol Levante | 209fps | 215fps (+2.9%) | 221fps (+5.7%) |
| Suzume | 667fps | 694fps (+4.0%) | 715fps (+7.2%) |
| Mandalorian | 878fps | 912fps (+3.9%) | 950fps (+8.2%) |

The gains at low quality targets are more appealing, at +3.4% for --fast-decode 1 and +8.8% for --fast-decode 2.

--preset 6 benefits less from fast-decode, especially mode 1, probably due to a shift in the decoding bottlenecks past this point.

---> Visual Comparisons:

| HQ | LQ |
| --- | --- |
| Avatar (HQ) | Avatar (LQ) |
| Ducks (HQ) | Ducks (LQ) |
| Fallout (HQ) | Fallout (LQ) |
| Minecraft (HQ) | Minecraft (LQ) |
| Sol Levante (HQ) | Sol Levante (LQ) |
| Suzume (HQ) | Suzume (LQ) |
| Mandalorian (HQ) | Mandalorian (LQ) |

Fast Decode Conclusion

In performance-constrained scenarios, like low-power ARM devices, fast-decode could come in handy to help smooth out your playback experience, at the possible cost of some efficiency.

Keep in mind it has been observed that the output of fast-decode modes can be more prone to macro-blocking depending on source characteristics, so proceed with caution.


Tiles

AV1 tiles are a straightforward method of splitting the video frame into independently decodable tiles of equal size, hopefully increasing encoding and decoding threadability. In SVT-AV1, tiles don't increase encoding speeds, but they can help devices (especially low-powered ones) software-decode AV1 more easily. We are going to challenge these claims.

One can combine fast-decode and tiles to decrease decoding complexity further.
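For instance, here is a hypothetical way to combine the two through ffmpeg's libsvtav1 wrapper (file names, preset, and CRF are placeholder choices; the command is only echoed here as a dry run):

```shell
# Hypothetical tiles + fast-decode combo; input.mkv/output.mkv are placeholders.
SVT_PARAMS="tile-columns=1:tile-rows=0:fast-decode=1"
CMD="ffmpeg -i input.mkv -c:v libsvtav1 -preset 4 -crf 30 -svtav1-params $SVT_PARAMS output.mkv"
# Dry run: print the command instead of encoding; copy-paste it to actually run.
echo "$CMD"
```

The keys passed through -svtav1-params mirror the encoder's own --tile-columns, --tile-rows, and --fast-decode options.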

---> Compression efficiency graphs, "high quality" range (CRF10 -> 23):

---> Compression efficiency graphs, "low quality" range (CRF28 -> 50):

---> Speed graphs:

---> Interpretation:

Except on "Suzume", and to some extent on "Mandalorian" at low qualities, where we see the higher number of tiles start to impact efficiency, the effect of tiles on efficiency is by and large negligible. It does seem like encoding speeds aren't particularly impacted by tiles in this encoder.

| Tiles Decoding Test (CRF12) | c0r0 | c1r0 | c1r1 | c2r0 | c2r1 |
| --- | --- | --- | --- | --- | --- |
| Avatar | 185fps | 335fps (+81%) | 446fps (+141%) | 477fps (+158%) | 487fps (+163%) |
| Ducks | 117fps | 224fps (+91%) | 330fps (+182%) | 348fps (+197%) | 410fps (+250%) |
| Fallout | 67fps | 125fps (+87%) | 186fps (+178%) | 202fps (+201%) | 223fps (+233%) |
| Minecraft | 87fps | 164fps (+89%) | 223fps (+156%) | 256fps (+194%) | 267fps (+207%) |
| Sol Levante | 91fps | 162fps (+78%) | 205fps (+125%) | 219fps (+141%) | 217fps (+138%) |
| Suzume | 324fps | 544fps (+68%) | 679fps (+110%) | 732fps (+126%) | 729fps (+125%) |
| Mandalorian | 312fps | 520fps (+67%) | 631fps (+102%) | 681fps (+118%) | 688fps (+121%) |

| Tiles Decoding Test (CRF33) | c0r0 | c1r0 | c1r1 | c2r0 | c2r1 |
| --- | --- | --- | --- | --- | --- |
| Avatar | 432fps | 698fps (+62%) | 780fps (+81%) | 816fps (+89%) | 758fps (+75%) |
| Ducks | 354fps | 650fps (+84%) | 845fps (+139%) | 860fps (+143%) | 920fps (+160%) |
| Fallout | 143fps | 259fps (+81%) | 355fps (+148%) | 375fps (+162%) | 386fps (+170%) |
| Minecraft | 197fps | 346fps (+76%) | 432fps (+119%) | 489fps (+148%) | 481fps (+144%) |
| Sol Levante | 200fps | 295fps (+48%) | 285fps (+43%) | 312fps (+56%) | 292fps (+46%) |
| Suzume | 696fps | 991fps (+42%) | 1020fps (+47%) | 1082fps (+55%) | 1008fps (+45%) |
| Mandalorian | 902fps | 1183fps (+31%) | 1164fps (+29%) | 1207fps (+34%) | 1128fps (+25%) |

No, you aren't dreaming! We immediately realize that the impact of tiles is significantly greater than either of the fast-decode modes, an especially appealing outcome given their relatively minor effect on compression efficiency. Decoding speeds improve even further at higher quality settings, with the smallest gain reaching +67% at CRF12. That kind of difference can turn an unplayable file into one that runs smoothly.

Tile-rows offer less benefit than tile-columns, and in some cases even cause slight regressions at lower qualities. Still, seeing up to +250% improvements in decoding speed is nothing to scoff at. Dav1d’s performance is genuinely impressive, now exceeding 1000fps on certain clips!

---> Visual Comparisons:

| HQ | LQ |
| --- | --- |
| Avatar (HQ) | Avatar (LQ) |
| Ducks (HQ) | Ducks (LQ) |
| Fallout (HQ) | Fallout (LQ) |
| Minecraft (HQ) | Minecraft (LQ) |
| Sol Levante (HQ) | Sol Levante (LQ) |
| Suzume (HQ) | Suzume (LQ) |
| Mandalorian (HQ) | Mandalorian (LQ) |

If I had to give recommendations based on what we saw, for a good balance between efficiency and decoding performance, consider the following tile settings:

  • --tile-columns 1 --tile-rows 0: for 1080p and above
  • --tile-columns 2 --tile-rows 0: for 4K and above

Of course, if decoding speed isn’t a concern at all, you can stick with the default --tile-columns 0 --tile-rows 0. But even then, I believe enabling tiles is worth considering for future-proofing purposes.
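To make the recommendations above concrete, a hypothetical standalone SvtAv1EncApp invocation for the 1080p case could look like this (file names, preset, and CRF are placeholders, and the command is echoed as a dry run rather than executed):

```shell
# Placeholder 1080p command using the suggested tile layout; dry run via echo.
CMD="SvtAv1EncApp -i input.y4m -b output.ivf --preset 4 --crf 30 --tile-columns 1 --tile-rows 0"
echo "$CMD"
```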

Statistics

Who doesn't love a bunch of random and useless stats? Well, if you recognize yourself in this, I've got you covered. I compiled a few, though I only included things that ended up being used in these two parts; test encodes, test graphs, and test comparison figures are not counted here.

Here's this deep dive in numbers:

  • 10577 total AV1 encodes...
  • ...which account for a total size of 294GB...
  • ...which makes for an average of about 28MB per file
  • 3928 total graphs
  • 2030 total decoding runs
  • 427 slow.pics comps...
  • ...for a grand total of about 12000 distinct screenshots uploaded!

Even I was flabbergasted when I discovered the actual scope of this benchmarking session!

Conclusion

In this second part, we took a deep dive into several key SVT-AV1 encoding parameters, re-evaluating them in today's context. The goal was mainly to explore how these settings impact efficiency and encoding speeds, but at times also decoding performance or practical usability. While I could have revisited even more parameters, the sheer time investment required means I had to draw the line somewhere. I made sure to go really in-depth with the ones that mattered most in my opinion.

As always, your mileage may vary regarding any kind of speed figures. The key is to test things yourself, with your own clips, workflows, and goals in mind. What works for one setup might not suit another, and that’s what makes this kind of testing both challenging... and somewhat rewarding too.

This wraps up Part 2, but there’s more to come. I'm always thinking about what to explore next, and your feedback helps shape that direction. So if you have suggestions, requests, or thoughts to share, I’d love to hear them, in the usual places.

Thanks again for reading, and I hope you found this deep dive insightful!

Future

First of all, I do not consider this testing complete. I have been told some comps had issues, and while re-generating them, I ended up rate-limited by slow.pics again, though fortunately only temporarily. As you can imagine, even with scripts, preparing and uploading these already takes quite a while, but I'm taking even more precautions to avoid getting banned again. I decided not to delay this Part further, so you wouldn't have to wait more, but I'll ask you to be patient a bit longer for me to add the fixed comps for Part 1 and the varboost octiles comps for this Part. I expect to be able to update both articles before the end of the week.

As I alluded to in the last section, I have ideas for future blog posts. First of all, SVT-AV1 v3.1.0 is right around the corner, and it is touted to be quite an update, especially for VBR and realtime usecases! It could be the occasion for me to do the long awaited target bitrate tests and compare efficiency with CRF on our test samples...

Second, most of the ideas I proposed in past articles remain valid possibilities and I still want to test other AV1 encoders in this format. AV2 is approaching too so it would be interesting to make some early comparisons of AV1 with AVM, if there's enough time for that.

Third, there's still so much that could be improved in my methodology, to make it more robust and precise, so I will continue experimenting as usual to find a better overall formula!

Lastly, I’ve also been working on a few AV1-related projects behind the scenes, including one that involves SVT-AV1 directly... I’m looking forward to sharing more about it in the future, so stay tuned!

I want to extend my thanks a second time to the people who contributed, directly or indirectly, to the making of this article, including Gianni Rosato (gb82), Line (Lumen), Soda, Emre, Bolu, Julio Barba, the people behind slow.pics for hosting thousands of screenshots each time I make these articles, the SVT-AV1 dev team for the work on this amazing encoder, and probably others I'm forgetting...


Consider supporting me by making a donation on my Ko-Fi page.

Deep Dive into SVT-AV1's Evolution (Part 1): Presets Analysis from v2.0 to v3.0

· 242 min read
Trix
Encoder

It's been almost a year since SVT-AV1 v2.0.0 dropped in March 2024, and we finally got v3.0.0 in late February of this year. Minor versions v3.0.1 and v3.0.2 came along afterward with some bug fixes and ARM SIMD improvements, but they didn't meaningfully alter encoding results.

So what's actually different between these versions? I've been wanting to run tests across all the major releases from v2.0.0 to v3.0.x to see how the speed vs quality trade-offs have evolved this past year. Using SSIMULACRA2, Butteraugli, XPSNR, and VMAF (plus some methodology tweaks I'll get into), I'll break down what each version brought to the table and in a second part, we'll also deep dive a few specific options that appeared in the encoder since my first blog post release so you can figure out what you may want to use for your projects. That includes variance boost, fast decode, temporal filtering strength and a few others...

Better late than never: SVT-AV1 v2.2.x Deep Dive

· 160 min read
Trix
Encoder

SVT-AV1 v2.2.0 was released in late August and a minor version v2.2.1 followed suit to address some bugs. This blog post will focus on comparing this new encoder version to the last, on the basis of benchmarks and visual comparisons. We will quantify the new trade-offs between compression efficiency and encoding speed, so you can choose the right balance for your projects. Our metrics of choice today will be SSIMULACRA2 and XPSNR, used in conjunction with a revised methodology.

AV1 for Dummies

· 18 min read
Gianni Rosato
Maintainer
Simulping
Maintainer / Encoder

AV1 for Dummies is a comprehensive, legible guide on how to get started with AV1 at any experience level. Whether you're on Windows using your first video encoding program, or a seasoned Linux user looking to optimize your encoding pipeline, this guide has you covered.

Encoding Animation with SVT-AV1: A Deep Dive

· 35 min read
Trix
Encoder

This blog post is based on a series of visual quality benchmarks with SSIMULACRA2 and speed benchmarks of SVT-AV1 1.8.0 on a corpus of animated clips.

The resources available will range from graphs to image comparisons (WIP). The former has the advantage of being easily understandable, showcasing pure efficiency comparisons between encoder parameters using metrics as the reference, while the latter are image samples from the encoded files during the tests that enable you to check quality for yourself and add another layer of subjective interpretation to these comparisons.

Reducing Image Load Online

· 11 min read
Gianni Rosato
Maintainer

A big part of understanding any multimedia codec technology is knowing the application for such technology. For images, a big use case is web delivery. Compared to other multimedia, images are incredibly popular on the Web & knowing how to serve them properly can be a massive boon to your website's traffic as well as less of a headache for users on slower connections or who are under bandwidth constraints. The most disappointing part is that images are often poorly done on the web; all too frequently will you run into a site serving massive photographic PNGs for no reason, or photography sites serving photographs fresh out of the editing software with no thought put into their final delivery. A little effort, patience, & knowledge will go a long way toward improving the user experience for individuals using your site, & this article will illustrate some of the basics.