[FFmpeg-devel] [PATCH] JPEG2000: SSE optimisation of DWT decoding

Clément Bœsch u at pkh.me
Fri Aug 11 19:23:27 EEST 2017

On Fri, Aug 11, 2017 at 06:32:37PM +0300, Ivan Kalvachev wrote:
> On 8/10/17, maxime taisant <maximetaisant at hotmail.fr> wrote:
> >> From: Ivan Kalvachev <ikalvachev at gmail.com>
> >> On 8/8/17, maxime taisant <maximetaisant at hotmail.fr> wrote:
> >> > From: Maxime Taisant <maximetaisant at hotmail.fr>
> >> >
> >> > Hi,
> >> >
> >> > Here are some SSE optimisations for the DWT function used to decode JPEG2000.
> >> > I tested this code by using the time command while reading a JPEG2000
> >> > encoded video with ffmpeg and, on average, I observed a 4.05% general
> >> > improvement, and a 12.67% improvement on the dwt decoding part alone.
> BTW, I forgot to tell you that FFmpeg has its own benchmarking macros
> that count the CPU cycles it takes to execute a block of C code.
> Use them (in .c files) like this:
> #include "libavutil/timer.h"
>     {
>     START_TIMER
>     function();
>     STOP_TIMER("function")
>     }
> The macros print their results to stderr,
> labelled with the name you pass to them
> (so you can benchmark more than one thing at a time).
> Make sure the function(s) run many times per benchmark run.
> The macros print results each time the run count reaches a power of two.
> Do several (3 or 5) separate benchmark runs, since
> the final results always differ slightly. (The function could
> be interrupted; if that is detected, the measurement
> is discarded/skipped.)
> Also, try the macros with no function inside,
> so you have the "NULL" function. STOP_TIMER
> has an instruction fence opcode that blocks until
> all prior instructions have executed, and this could
> take a while. Your benchmarks can never be
> faster than the NULL function.
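
The advice above (many calls per run, plus an empty "NULL" baseline) can be sketched in standalone C. This is only an illustration, not the libavutil/timer.h implementation: `now_ns`, `bench`, `work` and `nothing` are hypothetical names, and it measures wall-clock nanoseconds via C11 `timespec_get` rather than CPU cycles.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not FFmpeg API): current time in nanoseconds.
 * TIME_UTC is not monotonic, which is good enough for a sketch. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Placeholder workload standing in for the function under test. */
static volatile unsigned sink;
static void work(void)
{
    unsigned acc = 0;
    for (unsigned i = 0; i < 1000; i++)
        acc += i * i;
    sink = acc;
}

/* Empty "NULL" function: measures the overhead of the timing itself. */
static void nothing(void)
{
}

/* Average time per call in nanoseconds over `runs` timed calls. */
static double bench(void (*fn)(void), int runs)
{
    uint64_t total = 0;

    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        fn();
        uint64_t t1 = now_ns();
        if (t1 >= t0)            /* ignore samples where the clock stepped back */
            total += t1 - t0;
    }
    return (double)total / runs;
}
```

Calling `bench(nothing, 1 << 16)` and `bench(work, 1 << 16)` and comparing the two averages gives a rough idea of how much of the measured time is timer overhead; repeating the whole comparison a few times shows how much the numbers move between runs.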

j2k is already in checkasm; I'd suggest integrating test(s) for the
functions there (tests/checkasm/jpeg2000dsp.c).

  make checkasm && tests/checkasm/checkasm --test=jpeg2000dsp --bench

This will make use of the {START,STOP}_TIMER code and provide clean
performance comparisons which you can share here.
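
The idea behind such a test is to check that the optimised routine produces the same output as the C reference before timing either one. A minimal standalone sketch of that comparison follows; it does not use the checkasm API, and `lift_ref`, `lift_opt`, `outputs_match` and the lifting step itself are hypothetical stand-ins, not the jpeg2000 DWT code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 64

/* Reference version: each odd sample gets half the sum of its even
 * neighbours added to it (a simplified integer lifting step). */
static void lift_ref(int32_t *x, int n)
{
    for (int i = 1; i + 1 < n; i += 2)
        x[i] += (x[i - 1] + x[i + 1]) >> 1;
}

/* Stand-in for an optimised version: same step, unrolled by two.
 * Only odd samples are written and only even ones are read, so the
 * order of the updates does not affect the result. */
static void lift_opt(int32_t *x, int n)
{
    int i = 1;

    for (; i + 3 < n; i += 4) {
        x[i]     += (x[i - 1] + x[i + 1]) >> 1;
        x[i + 2] += (x[i + 1] + x[i + 3]) >> 1;
    }
    for (; i + 1 < n; i += 2)   /* leftover odd sample, if any */
        x[i] += (x[i - 1] + x[i + 1]) >> 1;
}

/* Returns 1 when both versions give identical output for the same input. */
static int outputs_match(unsigned seed)
{
    int32_t a[N], b[N];

    for (int i = 0; i < N; i++) {
        seed = seed * 1103515245u + 12345u;   /* small LCG, repeatable data */
        a[i] = b[i] = (int32_t)((seed >> 16) & 0xff);
    }
    lift_ref(a, N);
    lift_opt(b, N);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

A real checkasm test would do the equivalent comparison on the actual DSP function pointers and leave the timing to the --bench option.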

Clément B.
