[FFmpeg-devel] [PATCH] Mimic decoder

Sat Mar 15 04:29:23 CET 2008

On Fri, Mar 14, 2008 at 06:24:40PM -0300, Ramiro Polla wrote:
> Hello Michael,
>
> Michael Niedermayer wrote:
>> On Tue, Mar 11, 2008 at 06:51:49PM -0300, Ramiro Polla wrote:
>>> Hello,
>>>
>>> I have taken libmimic [0] and used it to implement a native Mimic [1] 
>>> decoder for FFmpeg.
>>>
>>> The Mimic codec is used by MSN Messenger to transmit Webcam conversations 
>>> (not Video Conversations). It is also supported by aMSN and Mercury 
>>> Messenger. IIRC, those two programs allow to record the streamed data. 
>>> There's also MSN Webcam Recorder [2] that can record those streams.
>>>
>>> I have the whole history in an SVN server [3] that goes all the way back 
>>> to the first integration that had only minor changes from the original 
>>> code (like glib's ints to C99).
>>>
>>> There's one big issue that we discussed some on IRC and I decided to take 
>>> to the list. The original code does lots of deblocking as 
>>> post-processing. Some people on IRC said libavcodec should not do pure 
>>> post-processing (it's not in-loop), and that I should drop that code. But 
>>> pretty much every user that will use this expects the output to be the 
>>> same as in MSN Messenger (which does the deblocking). So, is there any 
>>> chance the deblocking can be accepted in the codec? If so, I'd need to 
>>> clean it up some more.
>> no
>> Either the deblock is worse than what we have in which case ->/dev/null
>> or its better in which case please move it to libpostproc or libavfilter
>> But most likely MS copy and pasted it from the mpeg4 reference, which
>> would make it equal quality but very significantly slower to libpp.
>
> Deblocking removed. I'll try to bring it back as a vfilter some other 
> time...

you dont give up ;)
I strongly suggest that you compare it against libpostproc first. And
only implement a filter if it is really sufficiently different from it.

[...]
>> [...]
>>> /*
>>>  * vlc_decode_block
>>>  *
>>>  * De-serialize (reconstruct) a variable length coded 8x8 block.
>>>  */
>>> static int vlc_decode_block(MimicContext *ctx, DCTELEM *block, int 
>>> num_coeffs)
>>> {
>>>     unsigned int pos;
>>>
>>>     memset(block, 0, 64 * sizeof(DCTELEM));
>> clear_blocks() in dsp
>
> clear_blocks() clears 6*64*sizeof(DCTELEM). On mimic it's only 
> 64*sizeof(DCTELEM). Is there another function that will clear only what I 
> need?

Hmm, not that i remember, anyway you can move the memset to after the idct
that way it will be in the cache when its cleared.

[...]
>>>         value = get_bits(&ctx->gb, num_bits);
>>>
>>>         block[ctx->scantable.permutated[pos]] = 
>>> ctx->vlcdec_lookup[num_bits][value];
>> you are missing a check for pos being < 64 here
>
> Ok. It slowed down a bit though.
> MSN Messenger sets coefs to 28. The max ammount of pos+= is 14, so pos will 
> never be higher than 41 for all MSN Messenger created videos. Is there 
> something that can be done to make the check know it will most likely 
> always be < 64? (besides profiling)

If the check really annoys you that much then theres a way to get rid of it.
Simply make block 256 entries large, so the uint8_t scantable cannot cause
out of array writes. But this must be carefully documented so noone
simplifies it away thinking the size being unneeded.

or, make the End of block code vlc=256 instead of =0
and use

pos += vlc & (256+15);

if(pos >= 64)
    return vlc==256;

(yes this also avoids the -1 check)

>
>> and
>> value= 1 - (1<<num_bits);
>> if(num_bits>1)
>>     value+= get_bits(num_bits-1);
>> if(get_bits1())
>>     value -= value;
>> block[ctx->scantable.permutated[pos]] = value;
>> Or if you think the table matters speedwise then make it static dont
>> duplicate it in every decoder instance.
>
>           ref   calc  static
> mimic.o : 51868 51304 52192
> avg time: 1.692 1.712 1.728
>
> That's very odd... Shouldn't the static table be at least as fast as the
> current code? 

Yes
If you want to debug it the asm output from gcc would be a start but i
doubt its worth the time spend.

> Attached are the 2 patches I used to test. (lut_calc.diff and 
> lut_static.diff). Anyways, most FFmpeg stuff introduced a considerable 
> speedup over the original code, so I'm tempted to apply one of these two. 

> Could I use the hardcoded tables ifdef there?

I dont see much sense in it. I mean the calc code doesnt need the table at
all and needs around 100 byte while the table needs 1024.

[...]
>         if(vlc == -1)
>             return 0;
>         if(!vlc) /* end-of-block code */
>             return 1;

maybe
if(vlc<=0)
    return !vlc;

would be faster (no iam not suggesting this if its not faster)

[...]
> static int decode(MimicContext *ctx, int quality, int num_coeffs,
>                   int is_pframe)
> {
>     int y, x, plane, offset;
>     DECLARE_ALIGNED_16(DCTELEM, dct_block[64]);
>     uint8_t *src, *dst, *p;
>     AVPicture cur_frame;
>     AVPicture prev_frame;
>     uint8_t *base_src;
>     uint8_t *base_dst;
> 
>     prepare_avpic(ctx, &cur_frame , (AVPicture*) &ctx->picture   );
>     prepare_avpic(ctx, &prev_frame, (AVPicture*) &ctx->prev_frame);
> 
>     for(plane = 0; plane < 3; plane++) {
>         const int is_chroma = !!plane;
>         const int qscale = is_chroma ? av_clip(10000-quality, 1000, 10000)<<2 :
>                                        av_clip(10000-quality, 2000, 10000)<<2 ;
>         const int stride = cur_frame.linesize[plane];
>         base_src = prev_frame.data[plane];
>         base_dst = cur_frame. data[plane];
>         offset = 0;
>         for(y = 0 ; y < ctx->num_vblocks[plane] ; y++) {
>             unsigned int num_rows = 8;
>             /* The last row of blocks in chrominance for 160x120 resolution
>              * is half the normal height and must be accounted for. */
>             if(is_chroma &&
>                y + 1 == ctx->num_vblocks[plane] && ctx->avctx->height & 15)
>                 num_rows = 4;
>             src = base_src + offset;
>             dst = base_dst + offset;
>             for(x = 0; x < ctx->num_hblocks[plane]; x++) {
>                 /* Check for a change condition in the current block.
>                  * !pframes always change.
>                  * Luma plane checks for get_bits1 == 0, and chroma planes
>                  * check for get_bits1 ==1. */
>                 if(!is_pframe || get_bits1(&ctx->gb) == is_chroma) {
>                     /* Luma only: Is the new content the same as it was in one
>                      * of the 15 last frames preceding the previous?
>                      * Chroma planes don't use backreferences. */
>                     if(is_chroma || !is_pframe || !get_bits1(&ctx->gb)) {
>                         if(!vlc_decode_block(ctx, dct_block, num_coeffs, qscale))
>                             return 0;
>                         ctx->dsp.idct_put(dst, stride, dct_block);
>                     } else {
>                         unsigned int backref;
>                         AVPicture back_frame;
>                         /* Yes: read the backreference (4 bits) and copy. */
>                         backref = get_bits(&ctx->gb, 4);

>                         prepare_avpic(ctx, &back_frame, (AVPicture*)
>                               &ctx->buf_ptrs[(ctx->ptr_index + backref) & 15]);

It would be faster if this wouldnt be done per block

[...]
> /*
>  * initialize_vlcdec_lookup
>  *
>  * Internal helper-function used to initialize
>  * the lookup-table used by the VLC-decoder.
>  */

not doxygen compatible

[...]

>     if(!(is_pframe && !ctx->prev_frame.data[0])) {
>         int swap_buf_size = buf_size - MIMIC_HEADER_SIZE;
>         ctx->swap_buf = av_fast_realloc(ctx->swap_buf, &ctx->swap_buf_size,
>                                  swap_buf_size + FF_INPUT_BUFFER_PADDING_SIZE);
>         if(!ctx->swap_buf)
>             return AVERROR_NOMEM;
> 
>         ctx->dsp.bswap_buf((uint32_t*)ctx->swap_buf,
>                            (const uint32_t*) (buf + MIMIC_HEADER_SIZE),
>                            swap_buf_size>>2);
>         init_get_bits(&ctx->gb, ctx->swap_buf, swap_buf_size << 3);
> 
>         if(!decode(ctx, quality, num_coeffs, is_pframe))
>             goto fail;
>     } else {
>         av_log(avctx, AV_LOG_ERROR, "decoding must start with keyframe\n");
>         goto fail;
>     }

Maybe get_buffer() could be called for the previous frame if its missing?
Not sure if this makes much sense, that is if frames can be lost ...

[..]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored awnser#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080315/37699220/attachment.pgp>