[FFmpeg-devel] [PATCH] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

Michael Niedermayer michael at niedermayer.cc
Tue Mar 26 23:24:11 EET 2019


On Tue, Mar 26, 2019 at 06:07:21PM -0400, Shaofei Wang wrote:
> It enables concurrency across MULTIPLE SIMPLE filter graphs, which brings about a
> 4%~20% improvement in some 1:N scenarios with CPU or GPU acceleration
> 
> Below are some test cases and comparison as reference.
> (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
> (Software: Intel iHD driver - 16.9.00100, CentOS 7)
> 
> For 1:N transcode by GPU acceleration with vaapi:
> ./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
>     -hwaccel_output_format vaapi \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
>     -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null
> 
>     test results:
>                 2 encoders  5 encoders  10 encoders
>     Improved    6.1%        6.9%        5.5%
> 
> For 1:N transcode by GPU acceleration with QSV:
> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null \
>     -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null /dev/null
> 
>     test results:
>                 2 encoders  5 encoders  10 encoders
>     Improved    6%          4%          15%
> 
> For Intel GPU acceleration case, 1 decode to N scaling, by QSV:
> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f null /dev/null \
>     -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f null /dev/null
> 
>     test results:
>                 2 scale  5 scale  10 scale
>     Improved    12%      21%      21%
> 
> For CPU only 1 decode to N scaling:
> ./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
>     -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null
> 
>     test results:
>                 2 scale  5 scale  10 scale
>     Improved    25%      107%     148%
> 
> Signed-off-by: Wang, Shaofei <shaofei.wang at intel.com>
> ---
> The patch only affects pipelines with multiple SIMPLE filter graphs.
> It passes FATE and addresses the possible data race.
> AFL tested, without introducing extra crashes/hangs
> 
>  fftools/ffmpeg.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++------
>  fftools/ffmpeg.h |  13 +++++
>  2 files changed, 169 insertions(+), 16 deletions(-)

this still fails with some (fuzzed) samples
valgrind does not seem to produce much useful output though
...
Error while decoding stream #0:0: Invalid data found when processing input
[rv40 @ 0x25829f40] marking unfished frame as finished
[rv40 @ 0x25829f40] concealing 27 DC, 27 AC, 27 MV errors in B frame
[rv40 @ 0x2584f000] Slice indicates MB offset 142, got 140
[rv40 @ 0x2584f000] Dquant for P-frame
[rv40 @ 0x2584f000] concealing 2 DC, 2 AC, 2 MV errors in P frame
Error while decoding stream #0:0: Invalid data found when processing input
[rv40 @ 0x256dfb40] Slice indicates MB offset 140, got 135
[rv40 @ 0x256dfb40] concealing 5 DC, 5 AC, 5 MV errors in B frame
[rv40 @ 0x25701940] concealing 112 DC, 112 AC, 112 MV errors in P frame
Error while decoding stream #0:0: Invalid data found when processing input
    Last message repeated 1 times
[rv40 @ 0x25804e80] First slice header is incorrect
[rv40 @ 0x256dfb40] Slice indicates MB offset 187, got 140
[rv40 @ 0x256dfb40] Dquant for P-frame
[rv40 @ 0x256dfb40] concealing 47 DC, 47 AC, 47 MV errors in P frame
[rv40 @ 0x25726a00] Dquant for B-frame
[rv40 @ 0x25770b80] concealing 115 DC, 115 AC, 115 MV errors in P frame
Error while decoding stream #0:0: Invalid data found when processing input
[rv40 @ 0x25804e80] Dquant for P-frame
[rv40 @ 0x257dfdc0] Dquant for P-frame
[rv40 @ 0x25804e80] Dquant for B-frame
Too many packets buffered for output stream 0:0.
pthread_join failed with error: Resource deadlock avoided

gdb output:
pthread_join failed with error: Resource deadlock avoided

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe4def700 (LWP 13350)]
0x00007fffefeaec37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
Python Exception <type 'exceptions.ImportError'> No module named gdb.frames: 
#0  0x00007fffefeaec37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffefeb2028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x000000000042bea1 in strict_pthread_join (thread=140737033205504, value_ptr=0x0) at ./libavutil/thread.h:55
#3  0x000000000042d342 in ffmpeg_cleanup (ret=1) at fftools/ffmpeg.c:526
#4  0x00000000004250d8 in exit_program (ret=1) at fftools/cmdutils.c:139
#5  0x000000000042dfd4 in write_packet (of=0x231b780, pkt=0x7fffe4dee680, ost=0x237f680, unqueue=0) at fftools/ffmpeg.c:738
#6  0x000000000042eb39 in output_packet (of=0x231b780, pkt=0x7fffe4dee680, ost=0x237f680, eof=0) at fftools/ffmpeg.c:903
#7  0x0000000000430c30 in do_video_out (of=0x231b780, ost=0x237f680, next_picture=0x7fffa4005200, sync_ipts=98940.800010681152) at fftools/ffmpeg.c:1337
#8  0x0000000000431986 in reap_filters (flush=1, ifilter=0x22d2c00) at fftools/ffmpeg.c:1533
#9  0x0000000000434dc8 in filter_pipeline (arg=0x22d2c00) at fftools/ffmpeg.c:2318
#10 0x00007ffff0249184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007fffeff7603d in clone () from /lib/x86_64-linux-gnu/libc.so.6
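
One possible reading of the backtrace: the thread argument in frame #2
(140737033205504) is 0x7fffe4def700 in hex, which is the thread gdb reports
as current. So the filter_pipeline thread appears to reach ffmpeg_cleanup()
through exit_program() and then tries to join itself. glibc rejects a
self-join with EDEADLK ("Resource deadlock avoided"), and strict_pthread_join
aborts on that error. The sketch below is not code from the patch; it is a
minimal standalone reproduction of the self-join error, assuming glibc
(POSIX only says pthread_join *may* fail with EDEADLK here). The names
join_self and self_join_error are made up for illustration.

```c
#include <errno.h>
#include <pthread.h>

/* Thread body: try to join the calling thread itself and store the
 * error code that pthread_join() returns. */
static void *join_self(void *arg)
{
    int *err = arg;
    *err = pthread_join(pthread_self(), NULL);
    return NULL;
}

/* Hypothetical helper: returns the error code produced by a self-join
 * (EDEADLK on glibc), or -1 if the demo thread could not be set up. */
int self_join_error(void)
{
    pthread_t t;
    int err = 0;

    if (pthread_create(&t, NULL, join_self, &err))
        return -1;
    /* Joining from a different thread is the valid case and succeeds. */
    if (pthread_join(t, NULL))
        return -1;
    return err;
}
```

Calling self_join_error() from main() and passing the result to strerror()
should give the same message as in the log above on a glibc system. If this
reading is right, any exit_program() call made from inside a filter thread
would hit the same abort during cleanup.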


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who are too smart to engage in politics are punished by being
governed by those who are dumber. -- Plato 