[FFmpeg-devel] [PATCH] avcodec/proresenc_anatoliy: change quantization scaling to floating point to utilize vectorization

Rostislav Pehlivanov atomnuker at gmail.com
Wed Feb 28 01:19:43 EET 2018


On 27 February 2018 at 21:22, David Murmann <david.murmann at btf.de> wrote:

>
> On 2/27/2018 9:58 PM, Hendrik Leppkes wrote:
> > On Tue, Feb 27, 2018 at 9:35 PM, David Murmann <david.murmann at btf.de> wrote:
> >> Quantization scaling seems to be a slight bottleneck,
> >> this change allows the compiler to more easily vectorize
> >> the loop. This improves total encoding performance in my
> >> tests by about 10-20%.
> >>
> >> Signed-off-by: David Murmann <david at btf.de>
> >> ---
> >>   libavcodec/proresenc_anatoliy.c | 12 ++++++++----
> >>   1 file changed, 8 insertions(+), 4 deletions(-)
> >>
> [...]
> >> +    for (j = 0; j < blocks_per_slice; j++) {
> >> +        for (i = 0; i < 64; i++) {
> >> +            block[i] = (float)in[(j << 6) + i] / (float)qmat[i];
> >> +        }
> >> +
> >> +        for (i = 1; i < 64; i++) {
> >> +            int val = block[progressive_scan[i]];
> >>               if (val) {
> >>                   encode_codeword(pb, run, run_to_cb[FFMIN(prev_run, 15)]);
> >
> > Usually, using float is best avoided. Did you test re-factoring the
> > loop structure without changing it to float?
>
> Yes, the vector instructions don't have integer division, AFAIK, and the
> compiler just generates a loop with idivs. This is quite a bit slower
> than converting to float, dividing and converting back, if the compiler
> uses vector instructions. In the general case this wouldn't be exact,
> but since the input values are int16 they should losslessly fit into
> float32. On platforms where this auto-vectorization fails this might
> actually be quite a bit slower, but I have not seen that in my tests
> (though I have only tested on x86_64).
>
> --
> David Murmann

No, you're going about it the wrong way. Floats should most definitely be
avoided in encoders/decoders. Non-deterministic output across platforms is
a smaller issue than how badly they can obliterate performance if the
compiler emits an actual div instruction.

Instead, here's what you can do to make it even faster: replace the
division with a multiply + a shift. Keeps the output identical too. I've
just sent an old patch of mine (for a different but similar codec) you can
work off of - just take the last bit of code there, run it at init to
generate the LUTs for all quantizers and then just multiply and shift by
looking into the tables you generate. Here's the link:
http://ffmpeg.org/pipermail/ffmpeg-devel/2018-February/225867.html
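
To make the multiply + shift idea concrete, here is a minimal sketch. It is
not the code from the linked patch; the names recip_init, div_by_recip and
quantize_block are hypothetical, and it assumes int16 coefficients and
quantizer entries between 1 and 65535. Under those assumptions the result
matches C integer division (truncation toward zero) exactly.

```c
#include <stdint.h>

#define RECIP_SHIFT 32

/* Hypothetical init-time helper: fixed-point reciprocal of d.
 * Assumes 1 <= d <= 65535. Computes floor(2^32 / d) + 1, so the
 * product below never underestimates the true quotient. */
static uint64_t recip_init(int d)
{
    return (UINT64_C(1) << RECIP_SHIFT) / (uint64_t)d + 1;
}

/* Replaces x / d with a multiply and a shift. Works on the magnitude
 * and restores the sign, which reproduces truncation toward zero.
 * Assumes |x| <= 32768, i.e. x came from an int16. */
static int div_by_recip(int x, uint64_t recip)
{
    uint64_t mag = (uint64_t)(x < 0 ? -x : x);
    int q = (int)((mag * recip) >> RECIP_SHIFT);
    return x < 0 ? -q : q;
}

/* Quantize one 8x8 block using a table of 64 reciprocals that was
 * built once at init from the quantization matrix. */
static void quantize_block(int16_t *out, const int16_t *in,
                           const uint64_t *recip_tab)
{
    for (int i = 0; i < 64; i++)
        out[i] = (int16_t)div_by_recip(in[i], recip_tab[i]);
}
```

The table would be filled once per quantizer at init (recip_tab[i] =
recip_init(qmat[i])), so the per-coefficient loop contains no division at
all. A 64-bit product is used here for simplicity; a narrower multiplier and
shift could be chosen if the actual range of the quantizers allows it.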

