[FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize yuv2plane1_8

Carl Eugen Hoyos ceffmpeg at gmail.com
Tue Nov 27 02:11:03 EET 2018


2018-11-27 0:17 GMT+01:00, Carl Eugen Hoyos <ceffmpeg at gmail.com>:
> 2018-11-17 9:12 GMT+01:00, Lauri Kasanen <cand at gmx.com>:
>> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
>> -f null -vframes 100 -v error -nostats -
>>
>> 1158 UNITS in planar1,   65528 runs,      8 skips
>>
>> -cpuflags 0
>>
>> 19082 UNITS in planar1,   65533 runs,      3 skips
>>
>> 16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
>> takes as many cycles as the x86 SSE2 version, yikes it's fast.
>>
>> Note that this function uses VSX instructions, but is not marked so.
>> This is because several existing functions also make that mistake.
>> I'll submit a patch moving them once this is reviewed.
>>
>> v2: Remove !BE check
>> Signed-off-by: Lauri Kasanen <cand at gmx.com>
>> ---
>>  libswscale/ppc/swscale_altivec.c | 53 ++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 53 insertions(+)
>>
>> diff --git a/libswscale/ppc/swscale_altivec.c b/libswscale/ppc/swscale_altivec.c
>> index 2fb2337..8c6056d 100644
>> --- a/libswscale/ppc/swscale_altivec.c
>> +++ b/libswscale/ppc/swscale_altivec.c
>> @@ -324,6 +324,53 @@ static void hScale_altivec_real(SwsContext *c, int16_t *dst, int dstW,
>>              }
>>          }
>>  }
>> +
>> +static void yuv2plane1_8_u(const int16_t *src, uint8_t *dest, int dstW,
>> +                           const uint8_t *dither, int offset, int start)
>> +{
>> +    int i;
>> +    for (i = start; i < dstW; i++) {
>> +        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
>> +        dest[i] = av_clip_uint8(val);
>> +    }
>> +}
>> +
>> +static void yuv2plane1_8_altivec(const int16_t *src, uint8_t *dest, int dstW,
>> +                           const uint8_t *dither, int offset)
>> +{
>> +    const int dst_u = -(uintptr_t)dest & 15;
>> +    int i, j;
>> +    LOCAL_ALIGNED(16, int16_t, val, [16]);
>
>> +    const vector uint16_t shifts = (vector uint16_t) {7, 7, 7, 7, 7, 7, 7, 7};
>
> The patch breaks compilation with xlc, sorry for not testing earlier:
> libswscale/ppc/swscale_altivec.c:344:11: error: unknown type name 'vector'
>     const vector uint16_t shifts = (vector uint16_t) {7, 7, 7, 7, 7, 7, 7, 7};
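
One plausible reading of the message, offered as an assumption and not confirmed anywhere in this thread: a clang-based front end treats `vector` as a context-sensitive keyword and only recognizes it in front of a built-in type keyword, so `vector uint16_t` (a typedef name) is parsed as an unknown type. A minimal sketch of a spelling that avoids the typedef follows. `shift_right_7` is a hypothetical helper, using `vec_sra` is a guess at what the shift vector is for, and the scalar branch is there only so the file also builds on non-PPC hosts.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Hypothetical helper: shift eight 16-bit samples right by 7. */
static void shift_right_7(const int16_t in[8], int16_t out[8])
{
#if defined(__ALTIVEC__)
    /* Built-in element type after `vector`, no typedef name involved. */
    const vector unsigned short shifts = vec_splat_u16(7);
    /* The union keeps the data 16-byte aligned without vec_ld/vec_st. */
    union { vector signed short v; int16_t s[8]; } u;

    memcpy(u.s, in, sizeof(u.s));
    u.v = vec_sra(u.v, shifts);   /* arithmetic shift right, per element */
    memcpy(out, u.s, sizeof(u.s));
#else
    /* Scalar fallback so the sketch compiles without AltiVec. */
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(in[i] >> 7);
#endif
}
```

Whether this is the spelling the patch should end up with is for the author to decide; the sketch only shows a form that does not depend on `vector` being followed by a typedef.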

In case this error does not make much sense to you, don't worry too
much: the following change was necessary to make xlc pass rv20-1239:
diff --git a/fftools/ffmpeg_filter.c b/fftools/ffmpeg_filter.c
index 6518d50..fb749c5 100644
--- a/fftools/ffmpeg_filter.c
+++ b/fftools/ffmpeg_filter.c
@@ -744,6 +744,7 @@ static int configure_input_video_filter
     InputFile     *f = input_files[ist->file_index];
     AVRational tb = ist->framerate.num ? av_inv_q(ist->framerate) :
                                          ist->st->time_base;
+if(!ist->framerate.num)tb = ist->st->time_base;
     AVRational fr = ist->framerate;
     AVRational sar;
     AVBPrint args;

;-)

(As expected, other tests also fail.)
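
For anyone who wants to compare a reworked vector loop against the C path outside the tree, here is a standalone sketch of the scalar tail function from the quoted patch. `clip_uint8` is a local stand-in for `av_clip_uint8`, and the `_ref` name is made up for this example; the arithmetic is copied from the quoted `yuv2plane1_8_u`.

```c
#include <stdint.h>

/* Stand-in for FFmpeg's av_clip_uint8(): clamp an int to 0..255. */
static uint8_t clip_uint8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Same per-pixel arithmetic as yuv2plane1_8_u in the quoted patch:
 * add the 8-entry dither value, drop 7 fractional bits, clamp to 8 bits.
 * Pixels before `start` are left untouched. */
static void yuv2plane1_8_ref(const int16_t *src, uint8_t *dest, int dstW,
                             const uint8_t *dither, int offset, int start)
{
    for (int i = start; i < dstW; i++) {
        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
        dest[i] = clip_uint8(val);
    }
}
```

Feeding the same `src` and `dither` buffers to this and to a vector version, then comparing `dest` byte for byte, is one simple way to check a rewrite.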

Carl Eugen

