[FFmpeg-devel] [PATCH 1/2] swresample: Refactor resample asm and port it to yasm

Michael Niedermayer michaelni at gmx.at
Thu Mar 20 04:04:02 CET 2014


On Wed, Mar 19, 2014 at 10:16:17PM -0300, James Almer wrote:
> On 19/03/14 9:08 PM, Michael Niedermayer wrote:
> > On Wed, Mar 19, 2014 at 06:45:03PM -0300, James Almer wrote:
> >> This reduces code duplication and makes it easier to implement new asm 
> >> functions in the future
> >>
> >> Signed-off-by: James Almer <jamrial at gmail.com>
> >> ---
> >>  libswresample/resample.c            | 96 ++++++++++---------------------------
> >>  libswresample/resample_template.c   | 49 +++++++------------
> >>  libswresample/swresample_internal.h | 24 ++++++++++
> >>  libswresample/x86/Makefile          |  1 +
> >>  libswresample/x86/resample.asm      | 64 +++++++++++++++++++++++++
> >>  libswresample/x86/resample_mmx.h    | 74 ----------------------------
> >>  libswresample/x86/swresample_x86.c  | 16 +++++++
> >>  7 files changed, 148 insertions(+), 176 deletions(-)
> >>  create mode 100644 libswresample/x86/resample.asm
> >>  delete mode 100644 libswresample/x86/resample_mmx.h
> > 
> > benchmark:
> > 
> > before: 253482 decicycles in resample, 1024 runs, 0 skips
> > after:  356545 decicycles in resample, 1024 runs, 0 skips
> > 
> > tested using ffplay HAYLEY\ WESTENRA-WHISPERS\ IN\ A\ DREAM.webm -af aformat=s32,aresample=48000,aformat=s32
> > 
> > 
> 
> Where did you put the timer.h macros? I put them at the beginning and end of 
> the swri_resample_<sampleformat> function/macro in resample_template.c

I had them in the loop in multiple_resample().
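
To illustrate why the two placements are not directly comparable, here is a
minimal sketch. Everything in it is hypothetical: the Bench struct and the
BENCH_START/BENCH_STOP macros merely stand in for the timer.h macros (they use
clock() rather than a cycle counter), and resample_channel()/resample_all()
are toy stand-ins for the template function and multiple_resample(). A timer
in the caller's loop also measures the call and dispatch overhead, while a
timer inside the function does not.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the timer.h macros: accumulates elapsed
 * clock() ticks and counts how many times the timed region ran. */
typedef struct Bench { clock_t total; long runs; } Bench;

#define BENCH_START(b) clock_t b##_t0 = clock()
#define BENCH_STOP(b)  do { (b).total += clock() - b##_t0; (b).runs++; } while (0)

static Bench inner_bench, outer_bench;

/* Placement A: timer at the beginning and end of the per-channel function.
 * Only the body of the function is measured. */
static int resample_channel(const int *src, int *dst, int n)
{
    assert(src && dst && n >= 0);
    BENCH_START(inner_bench);
    for (int i = 0; i < n; i++)
        dst[i] = src[i] / 2;   /* toy workload, not a real resampler */
    BENCH_STOP(inner_bench);
    return n;
}

/* Placement B: timer in the caller's loop. Each run also includes the
 * call itself and any dispatch done on the way to the kernel. */
static void resample_all(const int *src, int *dst, int n, int channels)
{
    for (int ch = 0; ch < channels; ch++) {
        BENCH_START(outer_bench);
        resample_channel(src + ch * n, dst + ch * n, n);
        BENCH_STOP(outer_bench);
    }
}
```

Both timers report the same number of runs here, but outer_bench covers a
superset of the work covered by inner_bench, so numbers taken at the two
placements should not be compared against each other.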


> And what about 16-bit 44100 Hz to 16-bit 22050 Hz (using the sse2 code), which 
> is the one I tried and where I noticed a boost?

I didn't try that one.


> 
> Testing a 16-bit 44100 Hz file and using the command you mention above (but with 
> ffmpeg) I get
> 
> before: 2606446 decicycles in resample, 65522 runs, 14 skips
> after:  2642538 decicycles in resample, 65497 runs, 39 skips

Interesting.


> 
> Which is indeed slower but not nearly as bad as in your test. Though without 
> testing the same files I doubt we could get a proper picture.
> 
> Nonetheless, we can drop this patch if it really affects performance that much 
> in some scenarios. I mainly wrote it to reduce the considerable code duplication 
> that exists and that will increase with each asm version added, and to remove 
> arch-specific code that was outside the respective folders.
> 
> I can port the float sse version to inline in that case.
> 
> >> +%if mmsize == 8
> >> +    emms
> >> +%endif
> > 
> > this is not ok
> > emms is slow and does not belong in the inner loop
> 
> This is a problem. Not sure how to make sure to run emms_c() from outside the 
> loop only when an mmx version of scalarproduct is used.

Well, it was outside the loop before the patch.
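
One possible way to keep it outside, sketched under assumptions: the init code
that selects the scalarproduct kernel could also record whether that kernel
leaves MMX state behind, and the caller could then reset the state once per
block. All names below (ResampleDSP, needs_emms, dsp_init, filter_block) are
hypothetical and not taken from the patch, and emms_stub() only counts calls
in place of a real emms_c() so that the sketch builds on any architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the state reset runs; stands in for emms_c(). */
static int emms_calls;
static void emms_stub(void) { emms_calls++; }

/* Hypothetical DSP context: the kernel pointer plus a flag set by the
 * same init code that picked the kernel. */
typedef struct ResampleDSP {
    int (*dot)(const short *a, const short *b, int len);
    int needs_emms;   /* nonzero if the chosen kernel uses MMX registers */
} ResampleDSP;

/* Plain C scalar product, used here for every configuration. */
static int dot_c(const short *a, const short *b, int len)
{
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += a[i] * b[i];
    return sum;
}

static void dsp_init(ResampleDSP *dsp, int have_mmx)
{
    dsp->dot        = dot_c;     /* a real build would select an asm kernel here */
    dsp->needs_emms = have_mmx;  /* remembered alongside the kernel choice */
}

/* Produces count outputs; the kernel contains no emms of its own. */
static void filter_block(const ResampleDSP *dsp, const short *src,
                         const short *coeffs, int taps, int *dst, int count)
{
    assert(dsp != NULL && dsp->dot != NULL);
    for (int i = 0; i < count; i++)
        dst[i] = dsp->dot(src + i, coeffs, taps);
    if (dsp->needs_emms)
        emms_stub();             /* once per block, not once per output sample */
}
```

With this shape the per-sample kernel stays free of emms, and the non-MMX
versions never pay for the reset at all.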


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The real ebay dictionary, page 2
"100% positive feedback" - "All either got their money back or didnt complain"
"Best seller ever, very honest" - "Seller refunded buyer after failed scam"