[FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX

Michael Niedermayer michael at niedermayer.cc
Sun Jan 13 00:48:34 EET 2019


On Sat, Jan 12, 2019 at 06:25:57PM +0200, Lauri Kasanen wrote:
> On Sat, 12 Jan 2019 14:52:07 +0100
> Michael Niedermayer <michael at niedermayer.cc> wrote:
> 
> > On Sat, Jan 12, 2019 at 10:47:50AM +0200, Lauri Kasanen wrote:
> > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
> > > -s 1920x1728 -f null -vframes 100 -v error -nostats -
> > > 
> > > 9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
> > > Fate passes, each format tested with an image to video conversion.
> > > 
> > > Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
> > > of the 16-bit function. This includes the vec_mulo/mule functions too,
> > > not just vmuluwm.
> > > 
> > > yuv420p9le
> > >   12341 UNITS in planarX,  130976 runs,     96 skips
> > >   73752 UNITS in planarX,  131066 runs,      6 skips
> > > yuv420p9be
> > >   12364 UNITS in planarX,  131025 runs,     47 skips
> > >   73001 UNITS in planarX,  131055 runs,     17 skips
> > > yuv420p10le
> > >   12386 UNITS in planarX,  131042 runs,     30 skips
> > >   72735 UNITS in planarX,  131062 runs,     10 skips
> > > yuv420p10be
> > >   12337 UNITS in planarX,  131045 runs,     27 skips
> > >   72734 UNITS in planarX,  131057 runs,     15 skips
> > > yuv420p12le
> > >   12236 UNITS in planarX,  131058 runs,     14 skips
> > >   73029 UNITS in planarX,  131062 runs,     10 skips
> > > yuv420p12be
> > >   12218 UNITS in planarX,  130973 runs,     99 skips
> > >   72402 UNITS in planarX,  131069 runs,      3 skips
> > > yuv420p14le
> > >   12168 UNITS in planarX,  131067 runs,      5 skips
> > >   72480 UNITS in planarX,  131069 runs,      3 skips
> > > yuv420p14be
> > >   12358 UNITS in planarX,  130948 runs,    124 skips
> > >   73772 UNITS in planarX,  131063 runs,      9 skips
> > > yuv420p16le
> > >   10439 UNITS in planarX,  130911 runs,    161 skips
> > >  157923 UNITS in planarX,  131068 runs,      4 skips
> > > yuv420p16be
> > >   10463 UNITS in planarX,  130874 runs,    198 skips
> > >  154405 UNITS in planarX,  131061 runs,     11 skips
> > 
> > The number of skips in the benchmark is much larger on one
> > side. That way the numbers become hard to compare as
> > more cases are skipped on one side.
> > 
> > Please adjust the parameters so the skip counts are comparable
> > or redo the tests until the numbers are more similar
> > thanks
> 
> How do I do that? It's a VM, so there are going to be pauses no matter
> what whenever other VMs run. Or should I take the largest run count with
> about the same number of skips?

I would try to adjust TIMER_REPORT so that VM switches are either
reliably skipped on both sides of the test or never skipped. The idea
is to treat both sides the same, so there's no asymmetry from skips
that succeed on one side but not on the other.

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data