[FFmpeg-devel] [PATCH] FLAC parser

Fri Oct 22 01:15:23 CEST 2010

On Thu, Oct 21, 2010 at 06:39:45PM +0200, Michael Chinen wrote:
> On Thu, Oct 21, 2010 at 1:05 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Wed, Oct 20, 2010 at 03:38:14PM +0200, Michael Chinen wrote:
> >> Hi,
> >>
> >> On Tue, Oct 19, 2010 at 2:48 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> >[...]
> >> >> I did profiling again and it turns out I missed one exit point for the
> >> >> function the last time. The non-flat wrap buffer version is about
> >> >> 2-4% faster overall. I've squashed it into the 0003.
> >> >
> >> >what is the speed difference between current svn and after this patch ?
> >>
> >> I used the -benchmark flag for 'ffmpeg -i fourminsong.flac a.wav' and
> >> five runs and got
> >> without patch: utime = 2.044-2.042s
> >> with patch:    utime = 2.363-2.379s
> >>
> >> So flac demuxing with the parser is slower.
> >
> > it's not a problem when the parser is needed, like for -acodec copy, but when
> > it is not needed then a 15% slowdown is a problem. That said, it of course
> > would be nicer if it was faster than that even when needed
> >
> >
> >
> > [...]
> >> >
> >> >
> >> > [...]
> >> >> +static int find_headers_search(FLACParseContext *fpc, uint8_t *buf, int buf_size,
> >> >> +                               int search_start)
> >> >> +
> >> >> +{
> >> >> +    FLACFrameInfo fi;
> >> >> +    int size = 0, i;
> >> >> +    uint8_t *header_buf;
> >> >> +
> >> >> +    for (i = 0; i < buf_size - 1; i++) {
> >> >> +        if ((AV_RB16(buf + i) & 0xFFFE) == 0xFFF8) {
> >> >
> >> > something based on testing several positions at once is likely faster
> >> > like
> >> > x= AV_RB32()
> >> > (x & ~(x+0x01010101))&0x80808080
> >> > that will detect 0xFF bytes and only after that testing the 4 positions for
> >> > FFF8
> >>
> >> Hmm. Since in both cases (header there/header not there) this will
> >> require more masks on a 2 byte int how will it be faster?
> >> Also since it is 15 bits that we are looking for is the 32 bit
> >> handling a mistake?
> >
> > the code is executed 4 times less often than your 2 byte masking
> > see ff_avc_find_startcode_internal() for something quite similar
> 
> Thanks I now see the light - I didn't see at first you meant to
> process in 4 byte chunks.
> It is about 2-3x faster with the multiple byte processing:
> fastest without multibyte processing:
> 357748 dezicycles in with more pos testing, 16337 runs, 47 skips
> slowest with:
> 127551 dezicycles in with more pos testing, 15364 runs, 1020 skips
> 
> (of course there are more skips so it is harder to profile)
> 
> The -benchmark utime dropped down to a range of
> utime = 2.232-2.236
> vs the prepatch:
> utime = 2.049-2.058
> 
> so now it is a slowdown of about 10%.
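
For readers following along, the 4-bytes-at-a-time search suggested earlier
in the thread can be sketched roughly as below. This is a minimal standalone
sketch, not the code from the patch: read_be32(), is_sync() and find_sync()
are hypothetical names, with read_be32() standing in for AV_RB32. The mask
test flags every word that contains a 0xFF byte. It can also fire on a 0xFE
byte when a carry arrives from the byte after it, so each flagged word is
still checked position by position.

```c
#include <stddef.h>
#include <stdint.h>

/* Read 4 bytes as a big-endian 32-bit word (stand-in for AV_RB32). */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Same 15-bit test as in the quoted patch, applied to the 2 bytes at p. */
static int is_sync(const uint8_t *p)
{
    return (((p[0] << 8) | p[1]) & 0xFFFE) == 0xFFF8;
}

/*
 * Hypothetical helper: offset of the first sync candidate in
 * buf[0..buf_size), or -1 if there is none.
 */
static int find_sync(const uint8_t *buf, int buf_size)
{
    int i = 0, j;

    /* Bytes i..i+4 must be readable: one 4-byte word plus the byte
     * that follows the last of the 4 candidate positions. */
    while (i + 4 < buf_size) {
        uint32_t x = read_be32(buf + i);

        /* Cheap pre-filter: nonzero whenever the word holds a 0xFF byte. */
        if ((x & ~(x + 0x01010101U)) & 0x80808080U) {
            for (j = 0; j < 4; j++)
                if (is_sync(buf + i + j))
                    return i + j;
        }
        i += 4;
    }

    /* Remaining tail, scanned two bytes at a time as in the original loop. */
    for (; i < buf_size - 1; i++)
        if (is_sync(buf + i))
            return i;

    return -1;
}
```

Most words in typical audio data contain no 0xFF byte, so the per-position
test only runs on the few words that pass the pre-filter.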

what amount of that is the startcode search and what amount is the crc16 check?
note START/STOP_TIMER can easily be used to test this
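
One way to split the cost between the two stages, sketched under
assumptions: inside the FFmpeg tree each stage would be wrapped in
START_TIMER / STOP_TIMER("label") from libavutil/timer.h, which prints the
"dezicycles ... runs ... skips" lines quoted above. The standalone stand-in
below uses clock() instead. RegionTimer, region_begin(), region_end() and
region_report() are hypothetical names, not FFmpeg API.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical per-region accumulator: total CPU time and call count. */
typedef struct {
    const char *name;
    clock_t     total;
    unsigned    runs;
} RegionTimer;

/* Take a timestamp at the start of the region being measured. */
static clock_t region_begin(void)
{
    return clock();
}

/* Add the time elapsed since 'start' to the accumulator. */
static void region_end(RegionTimer *t, clock_t start)
{
    t->total += clock() - start;
    t->runs++;
}

/* Print the accumulated time for one region. */
static void region_report(const RegionTimer *t)
{
    printf("%s: %.3f ms over %u runs\n", t->name,
           1000.0 * (double)t->total / CLOCKS_PER_SEC, t->runs);
}
```

Keeping one RegionTimer for the header search and another for the CRC check,
and reporting both at the end of the run, shows how the extra time is divided
between them. clock() is coarse, so this only gives a rough picture over many
calls.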

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In fact, the RIAA has been known to suggest that students drop out
of college or go to community college in order to be able to afford
settlements. -- The RIAA