[Ffmpeg-devel] Re: [PATCH] Tiertex .SEQ files support

Gregory Montoir cyx
Mon Sep 25 13:44:39 CEST 2006


Hi,

Michael Niedermayer wrote:
> Hi
> 
> On Sun, Sep 24, 2006 at 10:59:10PM +0200, Gregory Montoir wrote:
> [...]
>>> [...]
>>>> +static unsigned char *seq_decode_op2(SeqVideoContext *seq, unsigned char 
>>>> *src, unsigned char *dst)
>>>> +{
>>>> +    int b, i, len;
>>>> +    GetBitContext gb;
>>>> +
>>>> +    len = *src++;
>>>> +    if (len & 0x80) {
>>>> +        switch (len & 15) {
>>>> +        case 1:
>>>> +            src = seq_unpack_rle_frame(src, seq->unpack_buffer, 
>>>> sizeof(seq->unpack_buffer));
>>>> +            for (b = 0; b < 8; ++b) {
>>>> +                memcpy(dst, &seq->unpack_buffer[b * 8], 8);
>>>> +                dst += seq->frame.linesize[0];
>>>> +            }
>>>> +            break;
>>>> +        case 2:
>>>> +            src = seq_unpack_rle_frame(src, seq->unpack_buffer, 
>>>> sizeof(seq->unpack_buffer));
>>>> +            for (i = 0; i < 8; ++i) {
>>>> +                for (b = 0; b < 8; ++b)
>>>> +                    dst[b * seq->frame.linesize[0]] = 
>>>> seq->unpack_buffer[i * 8 + b];
>>>> +                ++dst;
>>>> +            }
>>>> +            break;
>>>> +        }
>>> i would move the call to seq_unpack_rle_frame before the switch so that 
>>> there
>>> just a single call, that way the cost for inlinng the function would be
>>> smaller for the compiler (if it does inline it ...)
>> If you move that call before the switch, you should also handle the case
>> where (len & 3) == 0 (this probably never happens, I need to check the
>> sample files, but I would prefer being "compatible" with the original
>> playback code).
> 
> if((len&15) == 1 || (len&15) == 2)
>     src = seq_unpack_rle_frame(src, seq->unpack_buffer, sizeof(seq->unpack_buffer));

So, what you suggest is to replace code like this :

   switch (len & 3) {
   case 1:
     some_common_func();
     // do_1_stuff
   case 2:
     some_common_func();
     // do_2_stuff
   }

by :

   if (len & 3)
     some_common_func();
   switch (len & 3) {
   case 1:
     // do_1_stuff
   case 2:
     // do_2_stuff
   }

This introduces a (redundant) condition test and doesn't make the code
more readable, IMO. But anyway...


> 
>>
>>>> +    } else {
>>>> +        unsigned char *color_table = src;
>>>> +        src += len;
>>>> +        if (len > 8) {
>>>> +            init_get_bits(&gb, src, 32 * 8); src += 32;
>>>> +            for (b = 0; b < 8; ++b) {
>>>> +                for (i = 0; i < 8; ++i)
>>>> +                    dst[i] = color_table[get_bits(&gb, 4)];
>>>> +                dst += seq->frame.linesize[0];
>>>> +            }
>>>> +        } else if (len > 4) {
>>>> +            init_get_bits(&gb, src, 24 * 8); src += 24;
>>>> +            for (b = 0; b < 8; ++b) {
>>>> +                for (i = 0; i < 8; ++i)
>>>> +                    dst[i] = color_table[get_bits(&gb, 3)];
>>>> +                dst += seq->frame.linesize[0];
>>>> +            }
>>>> +        } else if (len > 2) {
>>>> +            init_get_bits(&gb, src, 16 * 8); src += 16;
>>>> +            for (b = 0; b < 8; ++b) {
>>>> +                for (i = 0; i < 8; ++i)
>>>> +                    dst[i] = color_table[get_bits(&gb, 2)];
>>>> +                dst += seq->frame.linesize[0];
>>>> +            }
>>>> +        } else {
>>>> +            init_get_bits(&gb, src, 8 * 8); src += 8;
>>>> +            for (b = 0; b < 8; ++b) {
>>>> +                for (i = 0; i < 8; ++i)
>>>> +                    dst[i] = color_table[get_bits(&gb, 1)];
>>>> +                dst += seq->frame.linesize[0];
>>>> +            }
>>>> +        }
>>>> +    }
>>> why not: (assuming its not slow)
>>>
>>> if(len > 8) bits= 4;
>>> else        bits= foobar_table[len]; (or some av_log2 based stuff)
>>> (init_get_bits(&gb, src, 8*8*bits); src += 8*bits)
>>> for (b = 0; b < 8; ++b) {
>>>    for (i = 0; i < 8; ++i)
>>>        dst[i] = color_table[get_bits(&gb, bits)];
>>>    dst += seq->frame.linesize[0];
>>> }
>> Looks nice, I rewrote that part using a custom log2 table (I haven't
>> been able to re-use the existing av_log2 table, as I need to "round up"
>> the result rather that rounding down).
> 
> what about av_log2(a-1)+1

Yes, that works. Would be nice to have this as a macro in some common
header file, though.


> [...]
>>>> +        seq_fill_buffer(seq, seq->current_frame_data[5],
>>>> +          seq->current_frame_data + offset,
>>>> +          end_offset - offset);
>>>> +    }
>>>> +    offset = LE_16(&seq->current_frame_data[10]);
>>>> +    if (offset != 0) {
>>>> +        end_offset = LE_16(&seq->current_frame_data[12]);
>>>> +        if (end_offset == 0)
>>>> +            end_offset = LE_16(&seq->current_frame_data[14]);
>>>> +        seq_fill_buffer(seq, seq->current_frame_data[6],
>>>> +          seq->current_frame_data + offset,
>>>> +          end_offset - offset);
>>>> +    }
>>>> +    offset = LE_16(&seq->current_frame_data[12]);
>>>> +    if (offset != 0) {
>>>> +        end_offset = LE_16(&seq->current_frame_data[14]);
>>>> +        seq_fill_buffer(seq, seq->current_frame_data[7],
>>>> +          seq->current_frame_data + offset,
>>>> +          end_offset - offset);
>>>> +    }
>>> for(i=0; i<3; i++)
>>>    buf_num[i]= get_byte();
>>> for(i=0; i<4; i++)
>>>    offset[i]= get_le16();
>>> end_offset[3]= 0;
>>> for(i=2; i>=0; i--)
>>>    end_offset[i]= offset[i+1] ? offset[i+1] : end_offset[i+1]
>>>
>>> for(i=0; i<3; i++){
>>>    if(offset[i])
>>>        seq_fill_buffer(seq, buf_num[i],
>>>            seq->current_frame_data + offset[i],
>>>            end_offset[i] - offset[i]);
>>> }
>>>
>>> dont you think that something like the above code is more readable? also 
>>> there are no url_fseek() needed ...
>> Indeed, the for loop makes the code more friendly.
>>
>> But still, I don't see how you can use the get_* functions on
>> ByteIOContext *and not* using url_fseek later on... Looking at your
>> pseudo code, seq_fill_buffer takes an offset as argument. So if you want
>> to copy data from a given offset, without having read the whole 6kb
>> buffer before, I will have to "seek". Same goes for the sound and
>> palette data.
>>
>> Am I missing something ?
> 
> the few url_fseek() will likely be faster then the memcpy(), benchmarks
> are of course welcome ...

Well, there is no real need to do benchmarks. If you don't read the
whole 6kb frame buffer at once, you'll need to do at most 6 reads by
frame (1 for palette data, 1 for audio and 4 for video) and do the
appropriate seeking before each read. Thus, number of I/O will be much
more important.

Anyway, I did the change. This file format is so simple and the frame
data sizes so small ; on nowadays machines, this won't make much a
difference.

Regards,
Gregory

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ffmpeg-tiertexseq-20060925.diff
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20060925/0838d529/attachment.asc>



More information about the ffmpeg-devel mailing list