[FFmpeg-devel] [PATCH v3] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

Michael Niedermayer michael at niedermayer.cc
Thu Jan 17 14:30:05 EET 2019


On Wed, Jan 16, 2019 at 04:17:07PM -0500, Shaofei Wang wrote:
> With the new option "-abr_pipeline",
> multiple filter graphs can run concurrently, which brings about a
> 4%~20% improvement in some 1:N scenarios with CPU or GPU acceleration.
> 
> Below are some test cases and comparison as reference.
> (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
> (Software: Intel iHD driver - 16.9.00100, CentOS 7)
> 
> For 1:N transcode by GPU acceleration with vaapi:
> ./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
>     -hwaccel_output_format vaapi \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
>     -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null \
>     -abr_pipeline
> 
>     test results:
>                 2 encoders  5 encoders  10 encoders
>     Improved    6.1%        6.9%        5.5%
> 
> For 1:N transcode by GPU acceleration with QSV:
> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null \
>     -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null /dev/null
> 
>     test results:
>                 2 encoders  5 encoders  10 encoders
>     Improved    6%          4%          15%
> 
> For Intel GPU acceleration case, 1 decode to N scaling, by QSV:
> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f null /dev/null \
>     -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f null /dev/null
> 
>     test results:
>                 2 scale   5 scale   10 scale
>     Improved    12%       21%       21%
> 
> For CPU only 1 decode to N scaling:
> ./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
>     -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null \
>     -abr_pipeline
> 
>     test results:
>                 2 scale   5 scale   10 scale
>     Improved    25%       107%      148%
> 
> Signed-off-by: Wang, Shaofei <shaofei.wang at intel.com>
> Reviewed-by: Zhao, Jun <jun.zhao at intel.com>
> ---
>  fftools/ffmpeg.c        | 228 ++++++++++++++++++++++++++++++++++++++++++++----
>  fftools/ffmpeg.h        |  15 ++++
>  fftools/ffmpeg_filter.c |   4 +
>  fftools/ffmpeg_opt.c    |   6 +-
>  4 files changed, 237 insertions(+), 16 deletions(-)

Looking at this, I see a lot of duplicated code and a lot of ifdefs.

If I look at one of the duplicated functions, I see:

@@ -1,10 +1,11 @@
-static int reap_filters(int flush)
+static int pipeline_reap_filters(int flush, InputFilter * ifilter)
 {
     AVFrame *filtered_frame = NULL;
     int i;
 
-    /* Reap all buffers present in the buffer sinks */
     for (i = 0; i < nb_output_streams; i++) {
+        if (ifilter == output_streams[i]->filter->graph->inputs[0]) break;
+    }
         OutputStream *ost = output_streams[i];
         OutputFile    *of = output_files[ost->file_index];
         AVFilterContext *filter;
@@ -12,7 +13,7 @@
         int ret = 0;
 
         if (!ost->filter || !ost->filter->graph->graph)
-            continue;
+        return 0;
         filter = ost->filter->filter;
 
         if (!ost->initialized) {
@@ -25,9 +26,8 @@
             }
         }
 
-        if (!ost->filtered_frame && !(ost->filtered_frame = av_frame_alloc())) {
+    if (!ost->filtered_frame && !(ost->filtered_frame = av_frame_alloc()))
             return AVERROR(ENOMEM);
-        }
         filtered_frame = ost->filtered_frame;
 
         while (1) {
@@ -97,7 +97,6 @@
 
             av_frame_unref(filtered_frame);
         }
-    }
 
     return 0;
 }
\ No newline at end of file


This is basically the same function, copied and pasted with 2 lines
changed, one unrelated cosmetic change, and the code calling it, outside
the function, under an ifdef.

This is not OK.

Also, IIRC Nicolas knows this part of the codebase best, so it probably
makes sense for him to comment. But as far as my opinion goes, I would
prefer to avoid duplicate code paths or ifdefs.
They have a lot of disadvantages: they make maintenance harder, and
they make testing harder, as only one of several alternative paths
would be tested in each individual test, ...

So what I would really like to see is this being done in a cleaner
way: preferably one code path when possible, and best results by
default, with no need to manually enable the fast path.
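
To illustrate the single-code-path idea, here is a minimal sketch. It is
not code from the patch or from ffmpeg.c: the types are hypothetical
stand-ins, and reap_filters_sketch() and reap_one() are made-up names.
The assumption is that the per-stream work can live in one helper, and
that an optional input-filter argument selects which streams to drain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ffmpeg.c types; not the real definitions. */
typedef struct InputFilter { int id; } InputFilter;

typedef struct OutputStream {
    InputFilter *input; /* first input of the graph feeding this stream */
    int has_graph;      /* 0 stands in for a missing or unconfigured graph */
    int pending;        /* frames waiting in the buffer sink */
    int reaped;         /* frames drained so far */
} OutputStream;

/* Shared per-stream body: drain everything pending on one stream. */
static int reap_one(OutputStream *ost)
{
    if (!ost->has_graph)
        return 0;               /* nothing to do, not an error */
    ost->reaped += ost->pending;
    ost->pending = 0;
    return 0;
}

/*
 * One code path for both uses:
 *   only == NULL  -> reap every stream (the existing serial behaviour)
 *   only != NULL  -> reap just the streams fed by that input filter
 */
static int reap_filters_sketch(OutputStream *streams, int nb,
                               const InputFilter *only)
{
    int i, ret;

    assert(streams || nb == 0);
    for (i = 0; i < nb; i++) {
        if (only && streams[i].input != only)
            continue;           /* stream belongs to another graph */
        ret = reap_one(&streams[i]);
        if (ret < 0)
            return ret;
    }
    return 0;
}
```

With this shape, the serial caller passes NULL and a per-graph worker
passes its own input filter, so every test exercises the same loop and
no ifdef is needed around the call sites.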

Also, the question of scalability should be considered. I am not saying
that this requires any change, but some thought should be given to what
happens if there are 1000 outputs or just 1, and whether the change
makes sense for such cases too.

thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Dictatorship naturally arises out of democracy, and the most aggravated
form of tyranny and slavery out of the most extreme liberty. -- Plato