[FFmpeg-devel] [PATCH] Mimic decoder

Ramiro Polla ramiro
Sat Mar 15 22:47:03 CET 2008


Hello,

[...]
>>>> /*
>>>>  * vlc_decode_block
>>>>  *
>>>>  * De-serialize (reconstruct) a variable length coded 8x8 block.
>>>>  */
>>>> static int vlc_decode_block(MimicContext *ctx, DCTELEM *block, int 
>>>> num_coeffs)
>>>> {
>>>>     unsigned int pos;
>>>>
>>>>     memset(block, 0, 64 * sizeof(DCTELEM));
>>> clear_blocks() in dsp
>> clear_blocks() clears 6*64*sizeof(DCTELEM). On mimic it's only 
>> 64*sizeof(DCTELEM). Is there another function that will clear only what I 
>> need?
> 
> Hmm, not that i remember, anyway you can move the memset to after the idct
> that way it will be in the cache when its cleared.

Unless I misunderstood you, the blocks are on the stack, so there's no 
reason to clear them after the IDCT. If it's ok I'll leave that memset 
there.

[...]
>>>>         value = get_bits(&ctx->gb, num_bits);
>>>>
>>>>         block[ctx->scantable.permutated[pos]] = 
>>>> ctx->vlcdec_lookup[num_bits][value];
>>> you are missing a check for pos being < 64 here
>> Ok. It slowed down a bit though.
>> MSN Messenger sets coefs to 28. The max ammount of pos+= is 14, so pos will 
>> never be higher than 41 for all MSN Messenger created videos. Is there 
>> something that can be done to make the check know it will most likely 
>> always be < 64? (besides profiling)
> 
> If the check really annoys you that much then theres a way to get rid of it.
> Simply make block 256 entries large, so the uint8_t scantable cannot cause
> out of array writes. But this must be carefully documented so noone
> simplifies it away thinking the size being unneeded.
> 
> or, make the End of block code vlc=256 instead of =0
> and use
> 
> pos += vlc & (256+15);
> 
> if(pos >= 64)
>     return vlc==256;
> 
> (yes this also avoids the -1 check)

4.7% slower.

>>> and
>>> value= 1 - (1<<num_bits);
>>> if(num_bits>1)
>>>     value+= get_bits(num_bits-1);
>>> if(get_bits1())
>>>     value -= value;
>>> block[ctx->scantable.permutated[pos]] = value;
>>> Or if you think the table matters speedwise then make it static dont
>>> duplicate it in every decoder instance.
>>           ref   calc  static
>> mimic.o : 51868 51304 52192
>> avg time: 1.692 1.712 1.728
>>
>> That's very odd... Shouldn't the static table be at least as fast as the
>> current code? 
> 
> Yes
> If you want to debug it the asm output from gcc would be a start but i
> doubt its worth the time spend.

With a bigger sample, I got
           ref   calc  static
avg time: 6.688 6.932 6.744

>> Attached are the 2 patches I used to test. (lut_calc.diff and 
>> lut_static.diff). Anyways, most FFmpeg stuff introduced a considerable 
>> speedup over the original code, so I'm tempted to apply one of these two. 
> 
>> Could I use the hardcoded tables ifdef there?
> 
> I dont see much sense in it. I mean the calc code doesnt need the table at
> all and needs around 100 byte while the table needs 1024.

If it's ok I'd prefer to keep the current code with vlcdec_lookup in 
MimicContext.

> [...]
>>         if(vlc == -1)
>>             return 0;
>>         if(!vlc) /* end-of-block code */
>>             return 1;
> 
> maybe
> if(vlc<=0)
>     return !vlc;
> 
> would be faster (no iam not suggesting this if its not faster)

It's also slower (1.7%), so I'll just leave it as is.

[...]
>> static int decode(MimicContext *ctx, int quality, int num_coeffs,
>>                   int is_pframe)
[...]
>>     for(plane = 0; plane < 3; plane++) {
[...]
>>         for(y = 0 ; y < ctx->num_vblocks[plane] ; y++) {
[...]
>>             for(x = 0; x < ctx->num_hblocks[plane]; x++) {
[...]
>>                 if(!is_pframe || get_bits1(&ctx->gb) == is_chroma) {
[...]
>>                     if(is_chroma || !is_pframe || !get_bits1(&ctx->gb)) {
[...]
>>                     } else {
>>                         unsigned int backref;
>>                         AVPicture back_frame;
>>                         /* Yes: read the backreference (4 bits) and copy. */
>>                         backref = get_bits(&ctx->gb, 4);
> 
>>                         prepare_avpic(ctx, &back_frame, (AVPicture*)
>>                               &ctx->buf_ptrs[(ctx->ptr_index + backref) & 15]);
> 
> It would be faster if this wouldnt be done per block

Now the backreference frames are also kept flipped in MimicContext.

> [...]
>> /*
>>  * initialize_vlcdec_lookup
>>  *
>>  * Internal helper-function used to initialize
>>  * the lookup-table used by the VLC-decoder.
>>  */
> 
> not doxygen compatible

Fixed.

> [...]
> 
>>     if(!(is_pframe && !ctx->prev_frame.data[0])) {
>>         int swap_buf_size = buf_size - MIMIC_HEADER_SIZE;
>>         ctx->swap_buf = av_fast_realloc(ctx->swap_buf, &ctx->swap_buf_size,
>>                                  swap_buf_size + FF_INPUT_BUFFER_PADDING_SIZE);
>>         if(!ctx->swap_buf)
>>             return AVERROR_NOMEM;
>>
>>         ctx->dsp.bswap_buf((uint32_t*)ctx->swap_buf,
>>                            (const uint32_t*) (buf + MIMIC_HEADER_SIZE),
>>                            swap_buf_size>>2);
>>         init_get_bits(&ctx->gb, ctx->swap_buf, swap_buf_size << 3);
>>
>>         if(!decode(ctx, quality, num_coeffs, is_pframe))
>>             goto fail;
>>     } else {
>>         av_log(avctx, AV_LOG_ERROR, "decoding must start with keyframe\n");
>>         goto fail;
>>     }
> 
> Maybe get_buffer() could be called for the previous frame if its missing?
> Not sure if this makes much sense, that is if frames can be lost ...

This only fails if the stream doesn't start decoding at the beginning, 
so there are still no backreferences.

Ramiro Polla
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mimic.c
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080315/61973701/attachment.asc>



More information about the ffmpeg-devel mailing list