
Observing SVT-AV1 v2.1.0's improvements: A New Deep Dive

Trix
Encoder

SVT-AV1, the most scalable AV1 encoder, has received a new update, and one may wonder whether the old preset recommendations still hold today. This blog post digs into that question through a series of speed and visual quality benchmarks of SVT-AV1 2.1.0, measured with SSIMULACRA2 and XPSNR on a corpus of varied animated clips.

Feedback

I unfortunately never got to update the previous blog post with the image comparisons, and some people expressed concerns that this testing may not be representative of live action content. For the former, I will have to ask for your patience again: this blog post won't initially contain image comparisons either, but this time they are being actively worked on, along with a magnificent comparisons component, and this page will be updated once that is done. As for the latter, please be reassured that this testing is perfectly representative of the modern content people typically encode: Japanese animation is richly diverse, and the content specifically chosen for this benchmark is relatively complex. From 3DCG to extremely noisy clips, we are far from the easy-to-compress static scenes of some slice-of-life show.

I have also decided to complement this benchmark with another psychovisually-driven metric (XPSNR) so that double-checking is made easier. Thus, each graph comes in an SSIMULACRA2 version and an XPSNR version. Don't hesitate to switch between them!

Methodology

The resources available will range from graphs to image comparisons (WIP, for real this time). The former have the advantage of being easily understandable, showcasing pure efficiency comparisons between encoder parameters using metrics as the reference, while the latter are image samples from the files encoded during the tests that let you check quality for yourself, adding a layer of subjective interpretation to these comparisons.

The testing methodology involves using relatively short video samples from a wide range of modern anime genres, which have been either losslessly encoded with x264 --qp 0 for ease of use or losslessly cut from their source. These lossless files are then piped into SvtAv1EncApp directly, meaning we are measuring the performance of a single encoder instance and not leveraging chunked encoding like any actual final AV1 encoding pipeline should. Once an encode is done, SSIMULACRA2 scores are calculated using the Zig implementation, while XPSNR scores are calculated using an FFmpeg filter. Lots of useful data is then aggregated to make the graphs for this benchmark, including encoding time, encode size (bitrate), and metric scores. Bits per pixel (BPP) values are calculated so that the Metric / BPP graphs represent the closest thing we have to real efficiency.
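The BPP figure mentioned above can be sketched as follows. This is a minimal illustration that assumes the usual definition (total encoded bits divided by the total number of pixels in the clip); the function name and its parameters are hypothetical and not taken from the actual benchmark scripts.

```python
def bits_per_pixel(size_bytes: int, width: int, height: int, frame_count: int) -> float:
    """Average number of bits spent per pixel over a whole clip.

    Assumes the common definition: encoded size in bits divided by
    (width * height * number of frames).
    """
    if width <= 0 or height <= 0 or frame_count <= 0:
        raise ValueError("width, height and frame_count must be positive")
    if size_bytes < 0:
        raise ValueError("size_bytes cannot be negative")
    return (size_bytes * 8) / (width * height * frame_count)
```

Normalizing by pixel count is what lets a 1920x804 clip and a 1080p clip share the same horizontal axis, since raw file size or bitrate would favor the smaller frame.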

The clips used in this test were acquired legally. The Codec Wiki and its contributors do not endorse media piracy.

SvtAv1EncApp was compiled directly from the v2.0.0 and v2.1.0 source code using the provided Build/linux/build.sh script, Clang 16.0.6, and Profile-Guided Optimization (PGO). The testing machine is an i3 12100 with 16GB of 3200MHz CL14 DDR4 RAM, running Arch Linux with kernel 6.7.7 and the performance governor enabled. All encodes were made in the same session without rebooting.

This testing was conducted within the AV1 Weeb Edition Discord server, which is focused on encoding animated content in AV1.

Samples

The samples are as follows:

  • 11s Blame! clip which sports 3DCG action with lots of grain, effects and high-contrast elements.
  • 13s Blue Lock clip which sports rapid camera movements, complex geometry and high-contrast elements.
  • 5s Spy x Family first ending sequence with an extremely high amount of dynamic noise. This is the new most complex source of the set.
  • 12s Jigokuraku (Hell's Paradise) flashback clip with huge static grain in a very dark scenery and some action.
  • 5s The Garden of Sinners clean but fast-paced 3DCG scene with explosions.

The resolution of every clip is 1080p, except for the first one which is 1920x804.

All clips have been encoded across a wide quality range, from --crf 6 to --crf 46, in increments of 4.
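As a rough sketch of what this test grid looks like in practice, the snippet below enumerates the CRF values and builds one encoder invocation per value. The helper names are hypothetical, and decoding the lossless source to Y4M with ffmpeg is an assumption about how the piping could be done, not a description of the actual benchmark scripts.

```python
import subprocess
import time

# --crf 6 to --crf 46 in increments of 4 (11 values in total)
CRF_VALUES = list(range(6, 47, 4))


def encoder_command(preset: int, crf: int, output_path: str) -> list[str]:
    """Build a SvtAv1EncApp command line that reads Y4M from stdin."""
    return [
        "SvtAv1EncApp", "-i", "stdin",
        "--preset", str(preset),
        "--crf", str(crf),
        "-b", output_path,
    ]


def timed_encode(source: str, preset: int, crf: int, output_path: str) -> float:
    """Pipe a lossless source into a single encoder instance and time it.

    Assumption: ffmpeg decodes the source to Y4M on stdout. The measured
    wall-clock time therefore also includes the (cheap) decoding step.
    """
    decoder = subprocess.Popen(
        ["ffmpeg", "-loglevel", "error", "-i", source,
         "-f", "yuv4mpegpipe", "-strict", "-1", "-"],
        stdout=subprocess.PIPE,
    )
    start = time.perf_counter()
    subprocess.run(encoder_command(preset, crf, output_path),
                   stdin=decoder.stdout, check=True)
    elapsed = time.perf_counter() - start
    decoder.stdout.close()
    decoder.wait()
    return elapsed
```

Looping `timed_encode` over every clip, preset and value in `CRF_VALUES` would yield the encoding times and output files from which sizes and metric scores can then be collected.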

Without further ado, let's start with the first comparisons!

Presets comparisons (-1 -> 12)

In the following graphs, you may find comparisons between all SVT-AV1 presets, ranging from the slowest --preset -1 to the fastest --preset 12.

Yes, you heard that right. Presets 7 and 13 are no more in v2.1.0. This new update, like the previous one, mostly consisted of optimizing the preset trade-offs. The devs have made the choice to map preset 7 to preset 6 and preset 13 to preset 12 due to the lack of spacing between the new presets. We will discuss the implications of this further ahead.

--preset X is the only parameter used here, in conjunction with the CRF values. That means everything else is default. The defaults worth mentioning are:

  • --tune 1: tune PSNR
  • --aq-mode 2: variance deltaq
  • --enable-qm 0: quantisation matrices disabled
  • --irefresh-type 2: closed GOP
  • --enable-tf 1: temporal filtering enabled

And more, like CDEF and restoration enabled, overlays and film-grain disabled...
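To make the baseline explicit, the defaults listed above can be spelled out as command-line arguments. This is only a sketch: the flag names and values are the ones quoted in the list, the helper name is hypothetical, and I have not verified that passing them explicitly produces bit-identical output to omitting them.

```python
# Defaults quoted in the list above, as flag -> value pairs.
DOCUMENTED_DEFAULTS = {
    "--tune": 1,           # tune PSNR
    "--aq-mode": 2,        # variance deltaq
    "--enable-qm": 0,      # quantisation matrices disabled
    "--irefresh-type": 2,  # closed GOP
    "--enable-tf": 1,      # temporal filtering enabled
}


def explicit_default_args(defaults: dict[str, int] = DOCUMENTED_DEFAULTS) -> list[str]:
    """Flatten the flag/value pairs into an argument list for the encoder."""
    args: list[str] = []
    for flag, value in defaults.items():
        args += [flag, str(value)]
    return args
```

Writing the defaults down this way is mostly useful as documentation: if a later release changes one of them, appending these arguments to the command keeps a rerun comparable with the settings described here.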

Efficiency

  • First of all, here are the full efficiency graphs:

This is all very cool, but visually bloated.

  • Now the same graphs but focusing on the "high quality" range (CRF6 -> 22):
  • Same, but now focusing on the "low quality" range (CRF26 -> 46):
  • If we now focus on presets 4 and below, where it's more difficult to discern the differences between presets, we get this at "high quality":
  • And the following at "low quality":

Speed

  • Let's now see speed comparisons between all presets:

Unsurprisingly, preset -1 is so abysmally slow that it makes the graph unusable.

  • Same, but without the placebo preset -1:
  • Lastly, here is what it looks like with a logarithmic scale:

Interpretation

As for interpreting the results, it would seem that preset 2 and preset 4 remain very balanced all-around presets, with preset 3 sitting in a nice in-between spot, an improvement over v2.0.0's preset 3 in a way. The reasons for this will become clearer when we compare the new version to the previous one a bit further down.

The quality gap between preset 2 and preset 1 is usually pretty narrow, yet the speed penalty of going to preset 1 is ~2x, whereas the penalty of going from preset 3 to preset 2 is closer to ~1.5x. As such, preset 1 enters placebo territory. Considering how little there is to gain from going any lower, compared to the huge performance loss of even lower presets, I advise you not to waste encoding resources on preset 0 and preset -1. This applies especially at medium to high quality; at extremely low quality, like the CRF40 range, we can still see some small gains from these placebo presets.
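The speed penalties quoted here are simply ratios of encoding times between neighbouring presets. A small sketch of that arithmetic follows; the function name is hypothetical, and the timings in the usage note are made-up placeholders chosen only to mirror the ~1.5x and ~2x figures, not measurements from this benchmark.

```python
def preset_step_slowdowns(encode_seconds: dict[int, float]) -> dict[tuple[int, int], float]:
    """Slowdown factor for each step from a preset to the next slower one.

    encode_seconds maps a preset number to its encoding time in seconds.
    Keys of the result are (faster_preset, slower_preset) pairs.
    """
    presets = sorted(encode_seconds, reverse=True)  # fastest (highest number) first
    slowdowns = {}
    for faster, slower in zip(presets, presets[1:]):
        if encode_seconds[faster] <= 0:
            raise ValueError("encoding times must be positive")
        slowdowns[(faster, slower)] = encode_seconds[slower] / encode_seconds[faster]
    return slowdowns
```

For instance, with placeholder timings of 100 s, 150 s and 300 s for presets 3, 2 and 1, `preset_step_slowdowns({3: 100.0, 2: 150.0, 1: 300.0})` gives a factor of 1.5 for the 3 -> 2 step and 2.0 for the 2 -> 1 step. Comparing such factors against the quality gained at each step is what motivates calling a preset "placebo".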

When we start talking about faster presets though, things are pretty different from previous versions: presets 5 to 9 behave similarly on the graphs and seem to stand apart from their slower counterparts by just a bit. If you can bear the speed of preset 4, you should definitely go for it. However, if fast encoding is a necessity, for example in the case of realtime transcoding or streaming, presets 5 through 9 will serve you well, with great efficiency/speed trade-offs between one another. No preset in that range particularly stands out from the others, so simply pick one depending on your performance needs.

Presets 10 to 12 are pretty inefficient, and to be avoided if possible. They can still be of use in a convex-hull scenario, but in the case of realtime transcoding, you may be better off with a hardware encoder like the ones found in RTX 4000 or Arc GPUs, especially since SVT-AV1's target bitrate mode is even less efficient than CRF mode.

TLDR

The same conclusions as in the previous blog post can be drawn: clear quality gains can be observed as we decrease presets, down to preset 2; however, the benefit of dropping presets shrinks noticeably as quality increases.

In the next part, we will evaluate the differences in efficiency and speed of every preset when updating from SVT-AV1 2.0.0 to 2.1.0, which should add nuance to the previous results.

SVT-AV1 v2.0.0 vs v2.1.0 presets comparisons:

Two months ago, I conducted a similar test to compare how the presets evolved between versions 1.8.0 and 2.0.0. The results were pretty unsatisfying: I noticed that presets -1 to 8 in v2.0.0 performed like the old presets 0 to 9 did in v1.8.0. We basically saw an efficiency regression at a given preset, and the speedups did not follow suit as well as we would have anticipated. All in all, it wasn't all that bad: suffice it to say you could simply drop one preset lower than before and you were good to go again. What was more concerning, however, is that the release notes claimed important speedups that did not impact efficiency, and my testing showed otherwise.

My theory is that, because the dev team's testing methodology mostly consists of pretty low resolution clips and non-psychovisual metrics like PSNR and SSIM, or bad psychovisual metrics like VMAF, they may very well have been tricked into thinking they had introduced improvements as they tweaked the presets, when in reality the metrics simply didn't notice the quality degradation. Such an issue is one more reason why the industry should adopt more competent metrics, ones that better correlate with human vision, to improve encoders in more impactful ways and better avoid pointless regressions.

So the question for today's testing is: have the SVT-AV1 devs redeemed themselves and actually improved the presets trade-offs this time around? Let's find out!
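Reading the version-to-version graphs boils down to asking which curve reaches the higher metric score at the same BPP. Since two encodes rarely land on exactly the same BPP, one simple way to reason about it is linear interpolation between measured points. The sketch below is only an illustration of that idea with hypothetical function names; it is not how the graphs in this post were produced.

```python
from typing import Optional


def score_at_bpp(curve: list[tuple[float, float]], bpp: float) -> Optional[float]:
    """Linearly interpolate a metric score at a given BPP.

    curve is a list of (bpp, score) points for one preset and version.
    Returns None when bpp falls outside the measured range (no extrapolation).
    """
    points = sorted(curve)
    if not points or bpp < points[0][0] or bpp > points[-1][0]:
        return None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= bpp <= x1:
            if x1 == x0:
                return max(y0, y1)
            return y0 + (y1 - y0) * (bpp - x0) / (x1 - x0)
    return points[0][1]  # single-point curve queried exactly at that point


def score_gain(new_curve, old_curve, bpp: float) -> Optional[float]:
    """Score difference (new minus old) at equal BPP, or None if not comparable."""
    new_score = score_at_bpp(new_curve, bpp)
    old_score = score_at_bpp(old_curve, bpp)
    if new_score is None or old_score is None:
        return None
    return new_score - old_score
```

A positive `score_gain` at a given BPP means the new version is more efficient there; repeating the check across the high and low quality ranges gives the kind of "in between old preset X and Y" statements made in the sections that follow.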

preset -1: v2.0.0 vs v2.1.0

  • Let's start off with a battle of the placebos, with the efficiency at "high quality":
  • And the efficiency at "low quality":

Yes, this is a bit underwhelming, but you can't improve the best an encoder has to offer with just some tweaking, right?

  • Now, let's compare their respective speeds:

Let's be grateful it became ever so slightly faster, I guess.

preset 0: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

Overall, efficiency-wise, this new preset 0 places itself in between old presets -1 and 0.

  • Speed graphs:

Interestingly enough, its speed is much closer to the old preset 0 than to the old preset -1. This means preset 0 was genuinely improved over v2.0.0!

preset 1: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

In efficiency, this new preset 1 is often equal to old preset 0, and otherwise sits in between old presets 0 and 1.

  • Speed graphs:

We observe that the new preset is a bit closer to old preset 1 speeds than it is to old preset 0 speeds. Good news!

preset 2: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

Oh well, that's awkward.

  • Speed graphs:

Speed was left untouched too, meaning preset 2 is unchanged in v2.1.0.

preset 3: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

The new preset 3's efficiency is the same as the old one.

  • Speed graphs:

However, the preset got slightly faster, so this is a speedup!

preset 4: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

We can observe that preset 4 got slightly to moderately worse efficiency-wise.

  • Speed graphs:

Fortunately, the consequence of that slight efficiency decrease is a big performance improvement!

preset 5: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

Preset 5 seems to have gotten ever so slightly worse efficiency-wise.

  • Speed graphs:

Yet it became slightly faster, so this is overall a good trade-off.

preset 6: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

The new preset 6 has a huge responsibility: compensating for the absence of its preset 7 sibling. It seems to perform in between old presets 6 and 7, usually closer to old 7.

  • Speed graphs:

Preset 6 is now ever so slightly slower than old 7. This is an interesting trade-off, and overall a win over old 7.

preset 7: v2.0.0 vs v2.1.0

Again, there is no preset 7. Actually, it's preset 6 that disappeared, but I'm not remaking the graphs just for fun. If you select preset 6, you will be greeted with the following message: Svt[warn]: Preset M6 is mapped to M7.


preset 8: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

In efficiency, this new preset 8 is sometimes equal to or slightly worse than the old 8, and sometimes equal to or slightly worse than the old 7...

  • Speed graphs:

Overall, the speed is pretty much unchanged from old 8. It looks like a slight regression, that's pretty disappointing.

preset 9: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

The new preset 9 is the same as ever: ever so slightly better in some scenarios, but nothing groundbreaking.

  • Speed graphs:

Its speed remains the same, sometimes ever so slightly slower. Basically, the preset is pretty much unchanged, which may well be a relief, given that it is the last usable preset of the encoder.

preset 10: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

Preset 10 is slightly to moderately worse efficiency-wise.

  • Speed graphs:

Its speed is mostly the same, sometimes ever so slightly faster. It's a wash, avoid this preset at all costs!

preset 11: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

Preset 11's efficiency is untouched.

  • Speed graphs:

Preset 11's speed is unchanged as well.

preset 12: v2.0.0 vs v2.1.0

  • Efficiency graphs, high quality:
  • Efficiency graphs, low quality:

Just like preset 6, preset 12 is now mapped to 13, and unsurprisingly, its efficiency is equal to old 13.

  • Speed graphs:

The speeds seem to be in between old 12 and 13, so potentially a slight speedup. Still, nothing to get excited about.

TLDR

From these extensive comparisons, it appears that some presets have received genuine improvements in their respective efficiency/speed trade-off. Presets -1, 0, 1 and 3 received the most significant improvements, followed by 4, 5 and 6 with overall beneficial new trade-offs. Preset 12 got slightly faster too. On the other hand, presets 8 and 10 seem to have regressed slightly, and presets 2, 9 and 11 are essentially unchanged from v2.0.0.

Conclusion

SVT-AV1 2.1.0 introduced some welcome improvements. Presets 2 through 4 remain the kings of optimal AV1 encoding, while presets 5 through 9 stand as good options for people who find 2-4 too slow for their liking.

Let's be honest for a second: not much has changed in SVT-AV1 since the first blog post. There was no need to redo all the parameter testing, for the simple reason that their behavior remained the same, as did the conclusions drawn from them. I hope this article wasn't disappointing in that sense... Still, rest assured, this was just an appetizer; there will be more in the near future!

By the way, did you know that the SVT-AV1-PSY project was initiated a few months ago? Its defaults were tailored according to the testing done in the last blog post, allowing a free efficiency boost for anyone not keen to tweak their encoders. Furthermore, SVT-AV1-PSY introduced a sharpness parameter to control distortion, a quarter-step quantizer for more CRF precision, a new subjective SSIM tune, Dolby Vision support, frame luma bias, and some other knobs to improve the appeal and consistency of your encodes. It is actively maintained by a group of talented people, including the main dev of the aom-av1-lavish fork of aomenc. Some of the changes are being backported to mainline SVT-AV1 due to the increased interest of the mainline devs. Please check it out!

Hopefully, this comprehensive second deep dive gives you a helpful new starting point for choosing settings when encoding with the latest SVT-AV1(-PSY) 2.1.0.

Future

My plans for the future regarding the blog post include:

  • polishing this blog post and augmenting it with image comparisons.
  • a follow-up article in the relatively near future giving you encoding tips and explaining common AV1 encoding knowledge, for instance showcasing why film grain synthesis is a game-changer or why chunked encoding can prove beneficial to your encoding pipeline.
  • an article focused on observing the evolution of SVT-AV1 since the beginning of its development, as well as comparisons with current aomenc, rav1e and SVT-AV1-PSY, including a quick look at the current state of AVM (development ground for AV2) in comparison to VVC's state.

Thanks for reading!