[FFmpeg-devel] [PATCH] DVCPRO HD - revised two-part patch

Michael Niedermayer michaelni
Wed Feb 13 03:13:57 CET 2008


On Fri, Jan 04, 2008 at 03:05:32PM +0800, Dan Maas wrote:
> Thanks for your comments. After taking another look at the find_macroblock 
> functions, I've decided to withdraw that part of the patch. The lookup 
> tables are not as big as I thought (a few tens of KB) and it was not hard 
> to produce them for DV100. We can always switch to functions later if it 
> clearly improves performance.
>
> (I kept the 16-bit lookup tables for DV25/50 and added 32-bit tables for 
> DV100. If this is too ugly we could make the DV25/50 tables into 32-bits or 
> find another approach, let me know.)
>
> The new DVCPRO HD patch is in two parts. The first patch (attached) makes 
> all of the code changes necessary to support DVCPRO HD, and the second 
> patch (URL below) adds the macroblock lookup tables.
> http://largefiles.maasdigital.com/largefiles/maas-dv100-2B-tables.diff.gz

If i understand it correctly your patch adds a decoder and encoder. These must
be split in seperate patches.


>
> Changes since my original 3-part post:
> - macroblock lookup tables kept as described above
> - tab and spacing issues fixed
> - the few // comments switched to /* */
>

> You guys have better eyes for micro-optimizations than I do; if there are 
> more examples like Michael pointed out in the find_macroblock functions, 
> please let me know.

Well its the patch submitters "job" to cleanup and optimize his code if
he wants to see it in svn. I surely will help but i wont approve the patch
as long as i find a single inefficient line in it. Maybe roman will, maybe
he will clean it up, if so fine, he is the dv maintainer and its his
decission ...

The patch is big so
if you wait for me to point to every one it will take a very long time.
Its your code and you know it better and could find and improve suboptimal
parts faster ...


>
> One more thing I wanted to ask about - I have a compile-time #define for 
> changing a quality-speed tradeoff in the DV100 encoder. It would be nicer 
> to do this as a run-time switch, but I'm not sure how best to get a 
> quality-speed parameter down at the codec level. Any help would be 
> appreciated.

[...]
> diff -ur ffmpeg-dv100-0/libavcodec/dv.c ffmpeg-dv100-2A-code/libavcodec/dv.c
> --- ffmpeg-dv100-0/libavcodec/dv.c	2008-01-04 03:46:55.000000000 +0800
> +++ ffmpeg-dv100-2A-code/libavcodec/dv.c	2008-01-04 14:41:10.000000000 +0800
[...]
> @@ -59,8 +60,8 @@
>  
>  /* MultiThreading - dv_anchor applies to entire DV codec, not just the avcontext */
>  /* one element is needed for each video segment in a DV frame */
> -/* at most there are 2 DIF channels * 12 DIF sequences * 27 video segments (PAL 50Mbps) */
> -#define DV_ANCHOR_SIZE (2*12*27)
> +/* at most there are 4 DIF channels * 12 DIF sequences * 27 video segments (1080i50) */
> +#define DV_ANCHOR_SIZE (4*12*27)
>  
>  static void* dv_anchor[DV_ANCHOR_SIZE];
>  

> @@ -255,14 +256,9 @@
>      uint8_t partial_bit_count;
>      uint16_t partial_bit_buffer;
>      int shift_offset;
> +    int qstep; /* DV100 only */
>  } BlockInfo;

not doxygen compatible



>  
> -/* block size in bits */
> -static const uint16_t block_sizes[6] = {
> -    112, 112, 112, 112, 80, 80
> -};
> -/* bit budget for AC only in 5 MBs */
> -static const int vs_total_ac_bits = (100 * 4 + 68*2) * 5;
>  /* see dv_88_areas and dv_248_areas for details */
>  static const int mb_area_start[5] = { 1, 6, 21, 43, 64 };
>  


> @@ -282,7 +278,7 @@
>  }
>  
>  /* decode ac coefs */
> -static void dv_decode_ac(GetBitContext *gb, BlockInfo *mb, DCTELEM *block)
> +static void dv_decode_ac(DVVideoContext *s, GetBitContext *gb, BlockInfo *mb, DCTELEM *block)
>  {
>      int last_index = get_bits_size(gb);
>      const uint8_t *scan_table = mb->scan_table;
> @@ -336,10 +332,15 @@
>              break;
>  
>          pos1 = scan_table[pos];
> -        level <<= shift_table[pos1];
>  
>          /* unweigh, round, and shift down */
> -        level = (level*iweight_table[pos] + (1 << (dv_iweight_bits-1))) >> dv_iweight_bits;
> +        if (DV_PROFILE_IS_HD(s->sys)) {
> +            level *= mb->qstep;
> +            level = (level*iweight_table[pos]) >> 5;
> +        } else {
> +            level <<= shift_table[pos1];
> +            level = (level*iweight_table[pos] + (1 << (dv_iweight_bits-1))) >> dv_iweight_bits;
> +        }
>  
>          block[pos1] = level;
>  

As the only thing used from DVVideoContext is DV_PROFILE_IS_HD(s->sys), its
sufficient to pass just a flag.
And the >>5 looks odd, the rounding is highly asymetric like that

and level*=iweight_table[pos] can be factored out of the if/else

btw is the spec for this DV variant available freely somewhere?



> @@ -361,13 +362,79 @@
>      }
>  }
>  
> +/* interleave scanlines from even- and odd-numbered blocks */
> +static void dv100_interleave(DVVideoContext *s, int mb_x, int mb_y, uint8_t *y_ptr, int c_offset)

doxygen and the description is not sufficient to understand what the
function does.


> +{
> +    int i;
> +    for (i = 0; i < 4; i++) {
> +        uint8_t *p0, *p1; /* pointer to first scanline of the two adjacent blocks */
> +        int stride; /* offset between successive scanlines */
> +
> +        /* duplicated code :( */

avoid the duplication !


> +        if (i < 2) {  /* DCT0/DCT2, DCT1/DCT3 */
> +            if (s->sys->height == 1080 && mb_y == 134) {
> +            /* bottom edge: horizontal row of 4 blocks */
> +                p0 = y_ptr + ((i+0)<<3);
> +                p1 = y_ptr + ((i+2)<<3);
> +            } else {
> +                p0 = y_ptr + ((((i+0) & 1) + ((i+0) >> 1) * s->picture.linesize[0])<<3);
> +                p1 = y_ptr + ((((i+2) & 1) + ((i+2) >> 1) * s->picture.linesize[0])<<3);
> +            }
> +            stride = s->picture.linesize[0];
> +        } else { /* DCT4/DCT5, DCT6/DCT7 */
> +            int j = i<<1;
> +            int chan = 2 - ((j-4)>>1);
> +            if (s->sys->height == 1080 && mb_y == 134) {
> +                /* bottom edge: horizontal row of 2 blocks */
> +                p0 = s->picture.data[chan] + c_offset + (((j+0)&1)<<3);
> +                p1 = s->picture.data[chan] + c_offset + (((j+1)&1)<<3);
> +            } else {
> +                p0 = s->picture.data[chan] + c_offset + ((((j+0) & 1) * s->picture.linesize[chan])<<3);
> +                p1 = s->picture.data[chan] + c_offset + ((((j+1) & 1) * s->picture.linesize[chan])<<3);
> +            }
> +            stride = s->picture.linesize[chan];
> +        }
> +
> +    /* GCC turns these small fixed-size memcpy()s into efficient mov sequences */
> +#define SET(a,b) memcpy(a, b, 8);
> +        {
> +            uint8_t temp[7][8];
> +
> +            SET(temp[0],     p0+1*stride);
> +            SET(temp[1],     p0+2*stride);
> +            SET(temp[2],     p0+3*stride);
> +            SET(temp[3],     p0+4*stride);
> +            SET(temp[4],     p0+5*stride);
> +            SET(temp[5],     p0+6*stride);
> +            SET(temp[6],     p0+7*stride);
> +
> +            SET(p0+1*stride, p1+0*stride);
> +            SET(p0+2*stride, temp[0]);
> +            SET(p0+3*stride, p1+1*stride);
> +            SET(p0+4*stride, temp[1]);
> +            SET(p0+5*stride, p1+2*stride);
> +            SET(p0+6*stride, temp[2]);
> +            SET(p0+7*stride, p1+3*stride);
> +
> +            SET(p1+0*stride, temp[3]);
> +            SET(p1+1*stride, p1+4*stride);
> +            SET(p1+2*stride, temp[4]);
> +            SET(p1+3*stride, p1+5*stride);
> +            SET(p1+4*stride, temp[5]);
> +            SET(p1+5*stride, p1+6*stride);
> +            SET(p1+6*stride, temp[6]);

4 SET() can be avoided here


[...]
> -    BlockInfo mb_data[5 * 6], *mb, *mb1;
> -    DECLARE_ALIGNED_8(DCTELEM, sblock[5*6][64]);
> +    BlockInfo mb_data[5 * DV_MAX_BPM], *mb, *mb1;
> +    DECLARE_ALIGNED_8(DCTELEM, sblock[5*DV_MAX_BPM][64]);

The replacement of a literal by a symbolic should be in a seperate patch!


[...]
> @@ -391,26 +459,49 @@
>      block1 = &sblock[0][0];
>      mb1 = mb_data;
>      init_put_bits(&vs_pb, vs_bit_buffer, 5 * 80);
> -    for(mb_index = 0; mb_index < 5; mb_index++, mb1 += 6, block1 += 6 * 64) {
> +    for(mb_index = 0; mb_index < 5; mb_index++, mb1 += s->sys->bpm, block1 += s->sys->bpm * 64) {    

same as above, such literal -> variable change should be in a seperate patch
These are easy to review and will very quickly be approved.


>          /* skip header */
>          quant = buf_ptr[3] & 0x0f;
>          buf_ptr += 4;
>          init_put_bits(&pb, mb_bit_buffer, 80);
>          mb = mb1;
>          block = block1;
> -        for(j = 0;j < 6; j++) {
> -            last_index = block_sizes[j];
> +        for(j = 0;j < s->sys->bpm; j++) {
> +            last_index = s->sys->block_sizes[j];
>              init_get_bits(&gb, buf_ptr, last_index);
>  

>              /* get the dc */
>              dc = get_sbits(&gb, 9);
>              dct_mode = get_bits1(&gb);
> -            mb->dct_mode = dct_mode;
> -            mb->scan_table = s->dv_zigzag[dct_mode];
> -            mb->iweight_table = dct_mode ? dv_iweight_248 : dv_iweight_88;
>              class1 = get_bits(&gb, 2);
> -            mb->shift_table = s->dv_idct_shift[class1 == 3][dct_mode]
> -                [quant + dv_quant_offset[class1]];
> +
> +            if (DV_PROFILE_IS_HD(s->sys)) {
> +                if (j == 0) {
> +                    /* macroblock Y location */
> +                    int mb_y = mb_pos_ptr[2*mb_index+1];
> +                    /* force non-field mode for 4x1 "bottom macro blocks" */
> +                    if(mb_y == 134) {
> +                        dv100_dct_mode[mb_index] = 0;
> +                    } else {
> +                        dv100_dct_mode[mb_index] = dct_mode;
> +                    }

You read a bit and then ignore it? This looks strange


> +                }
> +                /* DV100 does not use the 2-4-8 DCT */
> +                mb->dct_mode = 0;
> +                if (s->sys->height == 1080) {
> +                    mb->iweight_table = (j < 4) ? dv_iweight_1080_y : dv_iweight_1080_c;
> +                } else { /* 720p */
> +                    mb->iweight_table = (j < 4) ? dv_iweight_720_y : dv_iweight_720_c;
> +                }
> +                mb->qstep = dv100_qstep[quant] << class1;
> +            } else {

> +                mb->dct_mode = dct_mode;
> +                mb->iweight_table = dct_mode ? dv_iweight_248 : dv_iweight_88;
> +                mb->shift_table = s->dv_idct_shift[class1 == 3][dct_mode]
> +                    [quant + dv_quant_offset[class1]];

reindentions should be in a seperate patch (makes review easier)


[...]
>              else /* 4:2:0 */
>                  c_offset = (((mb_y >> 1) * s->picture.linesize[1] + (mb_x >> 1))<<log2_blocksize);
>          }
> -        for(j = 0;j < 6; j++) {
> +        for(j = 0;j < s->sys->bpm; j++) {
>              idct_put = s->idct_put[mb->dct_mode && log2_blocksize==3];
> -            if (s->sys->pix_fmt == PIX_FMT_YUV422P) { /* 4:2:2 */
> +            if (DV_PROFILE_IS_HD(s->sys)) { /* HD 4:2:2 */
> +                if (j < 4) {  /* Four Y blocks */
> +                    /* NOTE: at bottom of image in 1080i50/60, the macroblock is handled as 411 */
> +                    if (s->sys->height == 1080 && mb_y == 134) {
> +                        idct_put(y_ptr + (j<<log2_blocksize), s->picture.linesize[0], block);
> +                    } else {
> +                        idct_put(y_ptr + (((j & 1) + (j >> 1) * s->picture.linesize[0])<<log2_blocksize),
> +                                 s->picture.linesize[0], block);
> +                    }
> +                } else { /* Cr and Cb blocks */
> +                    int chan = 2 - ((j-4)>>1);
> +                    if (s->sys->height == 1080 && mb_y == 134) {
> +                        /* bottom edge: horizontal row of 2 blocks */
> +                        idct_put(s->picture.data[chan] + c_offset + ((j&1)<<log2_blocksize),
> +                                 s->picture.linesize[chan], block);
> +                    } else {
> +                        idct_put(s->picture.data[chan] + c_offset + (((j & 1) * s->picture.linesize[chan])<<log2_blocksize),
> +                                 s->picture.linesize[chan], block);
> +                    }
> +                }
> +            } else if (s->sys->pix_fmt == PIX_FMT_YUV422P) { /* SD 4:2:2 */
>                  if (j == 0 || j == 2) {
>                      /* Y0 Y1 */
>                      idct_put(y_ptr + ((j >> 1)<<log2_blocksize),

This as well as the similar existing code is a mess, it would be nice if that
would be cleaned up eventually before too many copy and pasted variants of
it accumulate. That doesnt mean iam not ok with this as it is jst that
iam pretty sure the whole can be done alot cleaner than having a large special
case for every variant in the loop.


> @@ -508,7 +618,7 @@
>                      idct_put(s->picture.data[6 - j] + c_offset,
>                               s->picture.linesize[6 - j], block);
>                  }
> -                /* note: j=1 and j=3 are "dummy" blocks in 4:2:2 */
> +                /* note: j=1 and j=3 are "dummy" blocks in SD 4:2:2 */
>              } else { /* 4:1:1 or 4:2:0 */
>                  if (j < 4) {
>                      if (s->sys->pix_fmt == PIX_FMT_YUV411P && mb_x < (704 / 8)) {

cosmetic-> seperate patch


> @@ -548,6 +658,9 @@
>              block += 64;
>              mb++;
>          }
> +        if (DV_PROFILE_IS_HD(s->sys) && dv100_dct_mode[mb_index]) {
> +            dv100_interleave(s, mb_x, mb_y, y_ptr, c_offset);
> +        }

while ive not completely reverse engeneered the reordering here, its definitly
not ok to put a block into the image, then copy it back out just to place it
back in there interleaved, theres a whole completely unneeded copy pass


[...]
> +    /* (i=0 is the DC component; we only include it to make the
> +       number of loop iterations even, for future possible SIMD optimization) */
> +    for (i = 0; i < 64; i += 2) {
> +        int level0, level1;
> +
> +        /* get the AC component (in zig-zag order) */
> +        level0 = blk[zigzag_scan[i+0]];
> +        level1 = blk[zigzag_scan[i+1]];
> +
> +        /* extract sign and make it the lowest bit */
> +        bi->sign[i+0] = (level0>>31)&1;
> +        bi->sign[i+1] = (level1>>31)&1;
> +
> +        /* take absolute value of the level */
> +        level0 = FFABS(level0);
> +        level1 = FFABS(level1);
> +
> +        /* weigh it */
> +        level0 = (level0*weight[i+0] + 4096 + (1<<17)) >> 18;
> +        level1 = (level1*weight[i+1] + 4096 + (1<<17)) >> 18;
> +
> +        /* save unquantized value */
> +        bi->save[i+0] = level0;
> +        bi->save[i+1] = level1;
> +    }

the forced unrolling is ugly


[...]
>      uint8_t aspect = 0;
> -    if((int)(av_q2d(c->avctx->sample_aspect_ratio) * c->avctx->width / c->avctx->height * 10) == 17) /* 16:9 */
> +    if(((int)(av_q2d(c->avctx->sample_aspect_ratio) * c->avctx->width / c->avctx->height * 10) == 17)
> +       || DV_PROFILE_IS_HD(c->sys)) /* 16:9 */
>          aspect = 0x02;

superflous ()


[...]
> +/* 1/qstep, shifted up by 16 bits */
> +static const int dv100_qstep_bits = 16;

maybe #define
and doxygen ...

[...]
>  static inline const DVprofile* dv_frame_profile(uint8_t* frame)
>  {
> -    if ((frame[3] & 0x80) == 0) {      /* DSF flag */
> -        /* it's an NTSC format */
> -        if ((frame[80*5 + 48 + 3] & 0x4) && (frame[80*5 + 48] == dv_video_source)) { /* 4:2:2 sampling */
> -            return &dv_profiles[3]; /* NTSC 50Mbps */
> -        } else { /* 4:1:1 sampling */
> -            return &dv_profiles[0]; /* NTSC 25Mbps */
> -        }
> -    } else {
> -        /* it's a PAL format */
> -        if ((frame[80*5 + 48 + 3] & 0x4) && (frame[80*5 + 48] == dv_video_source)) { /* 4:2:2 sampling */
> -            return &dv_profiles[4]; /* PAL 50Mbps */
> -        } else if ((frame[5] & 0x07) == 0) { /* APT flag */
> -            return &dv_profiles[1]; /* PAL 25Mbps 4:2:0 */
> -        } else
> -            return &dv_profiles[2]; /* PAL 25Mbps 4:1:1 */
> +    int i;
> +

> +    /* DSF flag */
> +    int dsf = (frame[3] & 0x80) >> 7;
> +
> +    /* VAUX stype */
> +    int stype = frame[80*5 + 48 + 3] & 0x1f;

Please give variables sane names instead if putting the sane name before
each in a comment.


> +
> +    /* 576i50 25Mbps 4:1:1 is a special case */
> +    if (dsf == 1 && stype == 0 && ((frame[5] & 0x07) != 0)) {

 != 0 is superflous


> +        return &dv_profiles[2];
>      }
> +
> +    for (i=0; i<sizeof(dv_profiles)/sizeof(DVprofile); i++)
> +        if (dsf == dv_profiles[i].dsf && stype == dv_profiles[i].video_stype)
> +            return &dv_profiles[i];
> +
> +    return NULL;
>  }

This change together with the changes in dv_profiles looks independant of the rest
and should be in a seperate patch


[...]
> @@ -2698,10 +2994,14 @@
>  static inline int dv_write_dif_id(enum dv_section_type t, uint8_t chan_num, uint8_t seq_num,
>                                    uint8_t dif_num, uint8_t* buf)
>  {
> +    int fsc = chan_num & 1;
> +    int fsp = 1 - (chan_num >> 1);
> +
>      buf[0] = (uint8_t)t;    /* Section type */
>      buf[1] = (seq_num<<4) | /* DIF seq number 0-9 for 525/60; 0-11 for 625/50 */
> -             (chan_num << 3) | /* FSC: for 50Mb/s 0 - first channel; 1 - second */
> -             7;             /* reserved -- always 1 */
> +             (fsc << 3) |   /* FSC: for 50 and 100Mb/s 0 - first channel; 1 - second */
> +             (fsp << 2) |   /* FSP: for 100Mb/s 1 - channels 0-1; 0 - channels 2-3 */
> +             3;             /* reserved -- always 1 */
>      buf[2] = dif_num;       /* DIF block number Video: 0-134, Audio: 0-8 */
>      return 3;
>  }

this also looks like a canditate for a seperate patch

[...]

> -static int dv_extract_audio(uint8_t* frame, uint8_t* pcm, uint8_t* pcm2,
> +static int dv_extract_audio(uint8_t* frame, uint8_t* ppcm[4],
>                              const DVprofile *sys)
>  {

this with the related parts also could be in a seperate patch


[...]
> @@ -192,11 +196,17 @@
>  
>      smpls = as_pack[1] & 0x3f; /* samples in this frame - min. samples */
>      freq = (as_pack[4] >> 3) & 0x07; /* 0 - 48KHz, 1 - 44,1kHz, 2 - 32 kHz */
> -    stype = (as_pack[3] & 0x1f); /* 0 - 2CH, 2 - 4CH */
> +    stype = (as_pack[3] & 0x1f); /* 0 - 2CH, 2 - 4CH, 3 - 8CH */
>      quant = as_pack[4] & 0x07; /* 0 - 16bit linear, 1 - 12bit nonlinear */

cosmetic -> seperate patch (one with only cosmetics and no functional changes)


[...]

> -    c->ast[0] = c->ast[1] = NULL;
> +    c->ast[0] = c->ast[1] = c->ast[2] = c->ast[3] = NULL;

memset() maybe


>      c->ach = 0;
>      c->frames = 0;
>      c->abytes = 0;
> @@ -310,6 +320,7 @@
>                        uint8_t* buf, int buf_size)
>  {
>      int size, i;
> +    uint8_t *ppcm[4];
>  
>      if (buf_size < DV_PROFILE_BYTES ||
>          !(c->sys = dv_frame_profile(buf)) ||
> @@ -320,11 +331,13 @@
>      /* Queueing audio packet */
>      /* FIXME: in case of no audio/bad audio we have to do something */
>      size = dv_extract_audio_info(c, buf);
> +    ppcm[0] = ppcm[1] = ppcm[2] = ppcm[3] = NULL;

uint8_t *ppcm[4]= {0};
at the top

[...]
> @@ -391,6 +405,11 @@
>          return AVERROR(EIO);
>  
>      c->dv_demux->sys = dv_frame_profile(c->buf);
> +    if (!c->dv_demux->sys) {
> +        av_log(s, AV_LOG_ERROR, "Can't determine profile of DV input stream.\n");
> +        return -1;
> +    }
> +
>      s->bit_rate = av_rescale(c->dv_demux->sys->frame_size * 8,
>                               c->dv_demux->sys->frame_rate,
>                               c->dv_demux->sys->frame_rate_base);

looks unrelated if so -> sperate patch

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored awnser#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080213/d246ecef/attachment.pgp>



More information about the ffmpeg-devel mailing list