[FFmpeg-devel] [PATCH] Mimic decoder

Ramiro Polla ramiro
Sun Mar 16 20:15:41 CET 2008


Hello,

Michael Niedermayer wrote:
> On Sat, Mar 15, 2008 at 06:47:03PM -0300, Ramiro Polla wrote:
>> Hello,
>>
>> [...]
>>>>>> /*
>>>>>>  * vlc_decode_block
>>>>>>  *
>>>>>>  * De-serialize (reconstruct) a variable length coded 8x8 block.
>>>>>>  */
>>>>>> static int vlc_decode_block(MimicContext *ctx, DCTELEM *block, int 
>>>>>> num_coeffs)
>>>>>> {
>>>>>>     unsigned int pos;
>>>>>>
>>>>>>     memset(block, 0, 64 * sizeof(DCTELEM));
>>>>> clear_blocks() in dsp
>>>> clear_blocks() clears 6*64*sizeof(DCTELEM). On mimic it's only 
>>>> 64*sizeof(DCTELEM). Is there another function that will clear only what I 
>>>> need?
>>> Hmm, not that i remember, anyway you can move the memset to after the idct
>>> that way it will be in the cache when its cleared.
>> Unless I misunderstood you, the blocks are on the stack, so there's no 
>> reason to clear them after the IDCT. If it's ok I'll leave that memset 
>> there.
> 
> Stacks and alignment and gcc ...

Moved to MimicContext to avoid gcc randomness.

> [...]
>>>>> and
>>>>> value= 1 - (1<<num_bits);
>>>>> if(num_bits>1)
>>>>>     value+= get_bits(num_bits-1);
>>>>> if(get_bits1())
>>>>>     value -= value;
>>>>> block[ctx->scantable.permutated[pos]] = value;
>>>>> Or if you think the table matters speedwise then make it static dont
>>>>> duplicate it in every decoder instance.
>>>>           ref   calc  static
>>>> mimic.o : 51868 51304 52192
>>>> avg time: 1.692 1.712 1.728
>>>>
>>>> That's very odd... Shouldn't the static table be at least as fast as the
>>>> current code? 
>>> Yes
>>> If you want to debug it the asm output from gcc would be a start but i
>>> doubt its worth the time spend.
>> With a bigger sample, I got
>>           ref   calc  static
>> avg time: 6.688 6.932 6.744
> 
> And the standard deviations? Mean is only one side of the story ...

The more tests I run, the more innacurate they become.

           ref      calc     static
avg time: 6.819993 6.873992 6.823993
std dev : 0.017436 0.022716 0.022627

What's the best way to run benchmarks?
With no X, I run 20 runs of
./ffmpeg_g -i ../data/y.ml20 -vcodec rawvideo -f rawvideo -r 5 -y /dev/null

Is there a way to just benchmark demuxing+decoding without anything 
else? (besides writing my own simple program).

The input video has less than 5 frames per second, so I use -r 5 to 
avoid outputting 1000 fps.

>>>> Attached are the 2 patches I used to test. (lut_calc.diff and 
>>>> lut_static.diff). Anyways, most FFmpeg stuff introduced a considerable 
>>>> speedup over the original code, so I'm tempted to apply one of these two.
>>>> Could I use the hardcoded tables ifdef there?
>>> I dont see much sense in it. I mean the calc code doesnt need the table at
>>> all and needs around 100 byte while the table needs 1024.
>> If it's ok I'd prefer to keep the current code with vlcdec_lookup in 
>> MimicContext.
> 
> You can keep it if you merge the *qscale into the table as its not
> const then anymore ...

I don't understand how. qscale depends on quality, which can change per 
frame. qscale also depends on pos...

>>> [...]
>>>>         if(vlc == -1)
>>>>             return 0;
>>>>         if(!vlc) /* end-of-block code */
>>>>             return 1;
>>> maybe
>>> if(vlc<=0)
>>>     return !vlc;
>>> would be faster (no iam not suggesting this if its not faster)
>> It's also slower (1.7%), so I'll just leave it as is.
> 
> hmm the end-of-block check should be before the invalid check at least
> because it will be more often true.
> also as you already tried various, what about
> 
> if(vlc<=0)
>     return vlc;
> 
> with the others adjusted accordingly
> 
> or
> 
> pos += vlc & (256+15);
> if(pos >= 64)
>     return 0;
> 
> (that is leaving the end of block check but merging the vlc==-1 check with
> the pos >=64 check)
> This one should be faster in theory of course gcc is unpredictable

Just switched eob check to before -1 as most benchmarks oscillated 
around the same avg+-stddev.

> [...]
>>         /* FFmpeg's IDCT behaves somewhat different from the original code, so
>>          * a factor of 4 was added to the input */
>>
>>         coeff = ctx->vlcdec_lookup[num_bits][value];
>>         if(pos<3)
>>             coeff <<= 4;
>>         else /* TODO Use >> 10 instead of / 1001 */
>>             coeff = (coeff * qscale) / 1001;
> 
> Can you elaborate a little about these 1024 vs. 1001 and 
> "different from the original" ?

 >>10 was suggested by Mans. He said it'd be better to avoid a division 
for each coeff and use shift instead. But then qscale would have to be 
somewhat different so that >>10 would yield the same result as / 1001. I 
didn't get the output to be bit-exact, so I decided to just add that TODO.

"different from the original" is relative to the libmimic IDCT. It 
returns values 4 times greater than FFmpeg's IDCT. So because of that 
the input coeffs were >>2.

> [...]
>> /**
>>  * Internal helper-function used to initialize
>>  * the lookup-table used by the VLC-decoder.
>>  */
>> static void initialize_vlcdec_lookup(int8_t lookup_tbl[8][128])
>> {
>>     int i, j;
>>
>>     for(i = 1 ; i < 8 ; i++) {
>>         int first = (1<<i)-1;
>>         int last = 1<<(i-1);
>>         int cur = first;
> 
> cur - first is redundant remove one please

Removed first.

> [...]
>>     ctx->picture.data[0] = NULL;
>>     if(avctx->get_buffer(avctx, &ctx->picture)) {
>>         av_log(avctx, AV_LOG_ERROR, "get_buffer() failed\n");
>>         return -1;
>>     }
> 
> reference=1 is missing before get_buffer()

Added.

> [...]
>> AVCodec mimic_decoder = {
>>     "mimic",
>>     CODEC_TYPE_VIDEO,
>>     CODEC_ID_MIMIC,
>>     sizeof(MimicContext),
>>     mimic_decode_init,
>>     NULL,
>>     mimic_decode_end,
>>     mimic_decode_frame
>> };
> 
> CODEC_CAP_DR1 is missing

Added.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mimic.c
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080316/18fd79d6/attachment.asc>



More information about the ffmpeg-devel mailing list