[FFmpeg-devel] [PATCH] Mimic decoder

Ramiro Polla ramiro
Fri Mar 14 22:24:40 CET 2008


Hello Michael,

Michael Niedermayer wrote:
> On Tue, Mar 11, 2008 at 06:51:49PM -0300, Ramiro Polla wrote:
>> Hello,
>>
>> I have taken libmimic [0] and used it to implement a native Mimic [1] 
>> decoder for FFmpeg.
>>
>> The Mimic codec is used by MSN Messenger to transmit Webcam conversations 
>> (not Video Conversations). It is also supported by aMSN and Mercury 
>> Messenger. IIRC, those two programs allow to record the streamed data. 
>> There's also MSN Webcam Recorder [2] that can record those streams.
>>
>> I have the whole history in an SVN server [3] that goes all the way back to 
>> the first integration that had only minor changes from the original code 
>> (like glib's ints to C99).
>>
>> There's one big issue that we discussed some on IRC and I decided to take 
>> to the list. The original code does lots of deblocking as post-processing. 
>> Some people on IRC said libavcodec should not do pure post-processing (it's 
>> not in-loop), and that I should drop that code. But pretty much every user 
>> that will use this expects the output to be the same as in MSN Messenger 
>> (which does the deblocking). So, is there any chance the deblocking can be 
>> accepted in the codec? If so, I'd need to clean it up some more.
> 
> no
> Either the deblock is worse than what we have in which case ->/dev/null
> or its better in which case please move it to libpostproc or libavfilter
> But most likely MS copy and pasted it from the mpeg4 reference, which
> would make it equal quality but very significantly slower to libpp.

Deblocking removed. I'll try to bring it back as a vfilter some other 
time...

> [...]
>>     int width;
>>     int height;
>>     enum PixelFormat pix_fmt;
> 
> Duplicate of the respective things in AVCodecContext

Added a pointer to AVCodecContext instead.

>>     int f[2];
> 
> !
> 
> (which is a one letter awnser for no in case you didnt guess it)
> that is please change this to qscale, qp or whatever ...

Renamed to qscale =)

> [...]
>> /*
>>  * vlc_decode_block
>>  *
>>  * De-serialize (reconstruct) a variable length coded 8x8 block.
>>  */
>> static int vlc_decode_block(MimicContext *ctx, DCTELEM *block, int num_coeffs)
>> {
>>     unsigned int pos;
>>
>>     memset(block, 0, 64 * sizeof(DCTELEM));
> 
> clear_blocks() in dsp

clear_blocks() clears 6*64*sizeof(DCTELEM). On mimic it's only 
64*sizeof(DCTELEM). Is there another function that will clear only what 
I need?

>>     /* The DC-value is read in as is. */
>>     block[0] = get_bits(&ctx->gb, 8);
>>
>>     for(pos = 1; pos < num_coeffs; pos++) {
>>         uint32_t vlc, num_bits;
>>         int value;
>>
>>         vlc = get_vlc2(&ctx->gb, ctx->vlc1.table, ctx->vlc1.bits, 4);
>>         if(vlc == -1)
>>             return 0;
>>         if(vlc == 3) /* end-of-block code */
>>             return 1;
>>
>>         pos +=     magic_values[vlc][0];
>>         num_bits = magic_values[vlc][1];
> 
> the 2 magic_values together fit in 7 bit
> if you reshuffle the entries in huffcodes/huffbits then something like:
> pos     += vlc&15;
> num_bits = vlc>>4;
> would do the same and you would not need the magic_values table. 

           ref   >>3&7 >>4&15
mimic.o : 52124 51824 51824
avg time: 1.708 1.688 1.684

Changed to what you suggested. The table in the >>4 &15 case has 5 more 
zeroes than >>3 &7, but is a tiny little bit faster.

>>         value = get_bits(&ctx->gb, num_bits);
>>
>>         block[ctx->scantable.permutated[pos]] = ctx->vlcdec_lookup[num_bits][value];
> 
> you are missing a check for pos being < 64 here

Ok. It slowed down a bit though.
MSN Messenger sets coefs to 28. The max ammount of pos+= is 14, so pos 
will never be higher than 41 for all MSN Messenger created videos. Is 
there something that can be done to make the check know it will most 
likely always be < 64? (besides profiling)

> and
> 
> value= 1 - (1<<num_bits);
> if(num_bits>1)
>     value+= get_bits(num_bits-1);
> if(get_bits1())
>     value -= value;
> 
> block[ctx->scantable.permutated[pos]] = value;
> 
> Or if you think the table matters speedwise then make it static dont
> duplicate it in every decoder instance.

           ref   calc  static
mimic.o : 51868 51304 52192
avg time: 1.692 1.712 1.728

That's very odd... Shouldn't the static table be at least as fast as the
current code? Attached are the 2 patches I used to test. (lut_calc.diff 
and lut_static.diff). Anyways, most FFmpeg stuff introduced a 
considerable speedup over the original code, so I'm tempted to apply one 
of these two. Could I use the hardcoded tables ifdef there?

>>     }
>>
>>     return 1;
>> }
>>
> 
>> static void dequant_block(MimicContext *ctx, DCTELEM *block, int is_chrom)
>> {
>>     int i, f = ctx->f[is_chrom];
>>
>>     /* FFmpeg's IDCT behaves somewhat different from the original code, so
>>      * a factor of 4 was added to the input */
>>
>>     /* De-quantize. */
>>     block[0] <<= 3;
>>     block[1] <<= 4;
>>     block[8] <<= 4;
>>
>>     for(i = 2; i < 64; i++) {
>>         if(i == 8)
>>             continue;
>>
>>         /* TODO Use >> 10 instead of / 1001 */
>>         block[i] = (block[i] * f) / 1001;
>>     }
>> }
> 
> Do this during block decode, it will be much faster as >90% of all coeffs
> are 0

4.2% speedup =)

>> /*
>>  * decode_main
>>  *
>>  * Main decoding loop.
>>  */
>> static int decode(MimicContext *ctx, int quality, int num_coeffs, int is_pframe)
>> {
>>     int y, x, i, chrom_ch, offset;
>>     DECLARE_ALIGNED_16(DCTELEM, dct_block[64]);
>>     uint8_t *src, *dst, *p;
>>     uint8_t *base_src, *base_dst;
>>     uint32_t bit;
>>     int y_stride = ctx->picture.linesize[0];
>>     int crcb_stride = ctx->picture.linesize[1];
> 
>>     int crcb_size = ctx->picture.linesize[1] * (ctx->height>>1);
>>
>>     /* Clear Cr and Cb planes. */
>>     memset(ctx->picture.data[1], 128, crcb_size);
>>     memset(ctx->picture.data[2], 128, crcb_size);
> 
> remove this insanity please

Removed. They will always be written to anyways.

>>     ctx->f[0] = av_clip(10000-quality, 1000, 10000)<<2;
>>     ctx->f[1] = av_clip(10000-quality, 2000, 10000)<<2;
>>
> 
>>     /* Decode Y plane. */
>>     for(y = ctx->num_vblocks_y - 1; y >= 0 ; y--) {
> 
> use luma instead of "y" its confusing when mixed wth x/y

With the merged loops there's no more need to call anything y or crcb.

>>         offset = (y_stride << 3) * y;
>>         src = ctx->prev_frame.data[0] + offset;
>>         dst = ctx->picture.data[0]  + offset;
>>         for(x = 0; x < ctx->num_hblocks_y; x++) {
>>             /* Check for a change condition in the current block. */
>>             bit = is_pframe ? get_bits1(&ctx->gb) : 0;
>>             if(!bit) {
>>                 /* Yes: Is the new content the same as it was in one of
>>                  * the 15 last frames preceding the previous? */
>>                 if(is_pframe)
>>                     bit = get_bits1(&ctx->gb);
>>                 if(!bit) {
>>                     /* No: decode it. */
>>                     if(!vlc_decode_block(ctx, dct_block, num_coeffs)) {
>>                         /* Corruped frame, return. */
>>                         return 0;
>>                     }
>>                     dequant_block(ctx, dct_block, 0);
>>                     ctx->dsp.idct_put(dst+7*y_stride, -y_stride, dct_block);
> 
> Id change linesize to *-1 and make data[*] point to the last line,
> that way all this would look like a normal top to bottom decoder.

Done.

>>                 } else {
>>                     uint32_t backref;
>>                     /* Yes: read the backreference (4 bits) and copy. */
>>                     backref = get_bits(&ctx->gb, 4);
>>                     p = ctx->buf_ptrs[(ctx->ptr_index + backref) % 16].data[0];
> 
> &15

Done.

>>                     p += offset + (x << 3);
>>                     for(i = 7; i >= 0; i--) {
>>                         int new_offset = y_stride * i;
>>                         memcpy(dst + new_offset, p + new_offset, 8);
>>                     }
>>                 }
>>             } else {
> 
>>                 /* No change no worries: just copy from the previous frame. */
>>                 for(i = 7; i >= 0; i--) {
>>                     int new_offset = y_stride * i;
>>                     memcpy(dst + new_offset, src + new_offset, 8);
>>                 }
> 
> see put_pixels_tab in dsp

Done. It takes quite some time to find it, though. It's hidden in many 
macros inside dsputil.c (thanks to Mans for finding it =).

>>             }
>>             src += 8;
>>             dst += 8;
>>         }
>>     }
>>
>>     /* Decode Cr and Cb planes. */
>>     for(chrom_ch = 0; chrom_ch < 2; chrom_ch++) {
>>         base_src = (chrom_ch ? ctx->prev_frame.data[1] : ctx->prev_frame.data[2]);
>>         base_dst = (chrom_ch ? ctx->picture.data[1] : ctx->picture.data[2]);
> 
> base_src= ctx->prev_frame.data[2-chrom_ch];
> base_dst= ctx->picture   .data[2-chrom_ch];
> 
> and merge the chroma and luma loops if possible

With the merged loops and prepare_avpic switching chroma planes it's 
become simpler.

> [...]
>>     /*
>>      * Make a copy of the current frame and store in
>>      * the circular pointer list of 16 entries.
>>      */
>>     ctx->prev_frame = ctx->buf_ptrs[ctx->ptr_index];
>>
>>     av_picture_copy((AVPicture*)&ctx->prev_frame, (AVPicture*)&ctx->picture,
>>                     ctx->pix_fmt, ctx->width, ctx->height);
> 
> no, remove all copiing its unneeded.
> 
> 
> 
> [...]
> 
>>         /* TODO find out why dsp.bswap_buf crashes */
> 
> size in 32bit vs size in 8bit ? Iam just guessing of course ...

Well, you guessed right =)

Ramiro Polla
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lut_calc.diff
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080314/69ec9204/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lut_static.diff
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080314/69ec9204/attachment.asc>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mimic.c
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080314/69ec9204/attachment-0001.txt>



More information about the ffmpeg-devel mailing list