[FFmpeg-soc] RV40 review

Tue Aug 21 08:58:42 CEST 2007

On Tue, Aug 21, 2007 at 05:04:03AM +0200, Michael Niedermayer wrote:
> Hi
> 
> another review, (again note files are a day or 2 old ...)
> yes these reviewes take more time than one thinks ...
> 
> h264pred.h:
> [...]
> > /**
> >  * @file h264pred.h
> >  * H.264 / AVC / MPEG4 prediction functions.
> >  * @author Michael Niedermayer <michaelni at gmx.at>
> >  */
[...]
> this looks familiar,

Note that it is not my name there, I just reordered it.

> i assume you plan to remove these from
> h264data.h in svn because otherwise they would be duplicated
> (same with h264pred.c)

Yes, and use it for H.264, SVQ3 and RV40.

> [...]
> 
> rv40data.h:
> [...]
> > /**
> >  * Maximum number of macroblocks for each of the possible slice offset sizes
> >  */
> > static const uint16_t rv40_mb_max_sizes[6] = { 0x2F, 0x68, 0x18B, 0x62F, 0x18BF, 0x23FF };
> > /**
> >  * Bits needed to code slice offset for the given size
> >  */
> > static const uint8_t rv40_mb_bits_sizes[6] = { 6, 7, 9, 11, 13, 14 };
> 
> these 2 tables might be identical to ff_mba_max/length()
> the code which uses them might be identical too (ive not checked)
> maybe something can be shared here, but maybe its too tricky file
> dependancy wise and not worth for 18 byte ...
> though a note could be added pointing at the similarity

Note added 

> [...]
> 
> rv40.c:
> [...]
> > /**
> >  * Real Video 4.0 inverse transform
> >  * Code is almost the same as in SVQ3, only scaling is different
> >  */
> > static void rv40_intra_inv_transform(DCTELEM *block, const int offset){
> 
> yes and the scaling can be done before the transform IMHO
> so the actual transform is the same and could be shared

Well, that requires separating transform and inverse quantization in SVQ3
and modifying quantizers for RV40 as SVQ3 also has larger scales (>>20).
I also think it will be a good candidate for DSPContext.
Hopefully I will send a patch soon.  

> [...]
> >         const int z0= 13*(temp[4*0+i] +    temp[4*2+i]);
> >         const int z1= 13*(temp[4*0+i] -    temp[4*2+i]);
> >         const int z2=  7* temp[4*1+i] - 17*temp[4*3+i];
> >         const int z3= 17* temp[4*1+i] +  7*temp[4*3+i];
> > 
> >         block[offset+i*8+0]= ((z0 + z3) + 0x200)>>10;
> >         block[offset+i*8+1]= ((z1 + z2) + 0x200)>>10;
> >         block[offset+i*8+2]= ((z1 - z2) + 0x200)>>10;
> >         block[offset+i*8+3]= ((z0 - z3) + 0x200)>>10;
> 
> the  +0x200 can be added to z0/z1 avoiding 2 +

done

> [...]
> > /**
> >  * Decode coded block pattern
> >  */
> > static int rv40_decode_cbp(GetBitContext *gb, RV40VLC *vlc, int table)
> > {
> >     int pattern, code, cbp=0;
> >     int table2;
> >     static const int cbp_masks[4] = {0x000000, 0x100000, 0x010000, 0x110000};
> 
> cbp_masks[0] is unused

adjusted

[ coeffs[]=... quadruplification ]

moved to decode_subblock() functions

[ dequantizing ]
> if you do dequant during decode then you dont have to dequant the
> common 0 coeffs

Unfortunately, it takes two dequantizers per block and they may be used in different
positions (either single top left coefficient or three coefficients in the top left
corner). I will think on it.

> [...]
> > /**
> >  * Get stored dimension from bitstream
> >  *
> >  * If the width/height is the standard one then it's coded as 3-bit index.
> >  * Otherwise it is coded as escaped 8-bit portions.
> >  */
> > static inline int get_dimension(GetBitContext *gb, const int *dim1, const int *dim2)
> 
> i dont think this should be inline

dropped inline modifier 

> [...]
> > /**
> >  * Get encoded picture size - usually this is called from rv40_parse_slice_header
> >  */
> > static void rv40_parse_picture_size(GetBitContext *gb, int *w, int *h)
> > {
> >     *w = get_dimension(gb, rv40_standard_widths, NULL);
> >     *h = get_dimension(gb, rv40_standard_heights, rv40_standard_heights2);
> 
> avcodec_check_dimensions() should be used here to ensure that the dimensions
> wont produce overflows

It modifies only coded_{width, height} but I will add this check when I finish with slices.

[ unused variable bits in get_omega() ]

dropped 

[ duplicate block type selection code ]

merged common part

> [...]
> > /** Macroblock partition width in 8x8 blocks */
> > const int part_sizes_w[RV40_MB_TYPES] = { 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2 };
> > 
> > /** Macroblock partition height in 8x8 blocks */
> > const int part_sizes_h[RV40_MB_TYPES] = { 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2 };
> 
> static const uint8_t ?

made it so 

> > 
> > /**
> >  * Motion vectors prediction
> >  *
> >  * Motion prediction performed for the block by using median prediction of
> >  * motion vector from the left, top and right top blocks but in corener cases
> >  * some other vectors may be used instead
> >  */
> > static void rv40_pred_mv(RV40DecContext *r, int block_type, int subblock_no)
> > {
> >     MpegEncContext *s = &r->s;
> >     int mv_pos = s->mb_x * 2 + s->mb_y * 2 * s->b8_stride;
> >     int A[2], B[2], C[2];
> >     int no_A = 1, no_B = 1, no_C = 1;
> >     int i, j;
> >     int mx, my;
> > 
> >     memset(A, 0, sizeof(A));
> >     memset(B, 0, sizeof(B));
> >     memset(C, 0, sizeof(C));
> 
> >     no_A = s->mb_x < 1 || (s->first_slice_line && s->mb_x == s->resync_mb_x);
> >     no_B = s->first_slice_line;
> >     no_C = s->first_slice_line || (s->mb_x + 1) == s->mb_width;
> 
> i think there are a few special cases, which arent considered here
> like maybe: (numbers to distingish slices)
> 0011
> 1.     (here C is available though B is not so no first_slice_line check alone
> will work unless rv40 does somthing simpler than other video codecs)

Thought about it too but looks like that case will be ignored (but in B-frame
MV prediction this case is handled).

> [...]
> >         if(no_B){
> >             C[0] = A[0];
> >             C[1] = A[1];
> >         }else{
> >             if(!no_A){
> >                 C[0] = s->current_picture_ptr->motion_val[0][mv_pos-s->b8_stride-1][0];
> >                 C[1] = s->current_picture_ptr->motion_val[0][mv_pos-s->b8_stride-1][1];
> >             }else{
> >                 C[0] = A[0];
> >                 C[1] = A[1];
> >             }
> >         }
> 
> if(no_B || no_A){
>     C[0] = A[0];
>     C[1] = A[1];
> }else{
>     C[0] = s->current_picture_ptr->motion_val[0][mv_pos-s->b8_stride-1][0];
>     C[1] = s->current_picture_ptr->motion_val[0][mv_pos-s->b8_stride-1][1];
> }
> 
> also if A/B/C where pointers then 
> C=A;
> else
> C= s->current_picture_ptr->motion_val[0][mv_pos-s->b8_stride-1]
> 
> would be enough ...

simplified

> [...]
> >         for(j = 0; j < 2; j++){
> >             for(i = 0; i < 2; i++){
> >                 s->current_picture_ptr->motion_val[0][s->mb_x * 2 + s->mb_y * 2 * s->b8_stride + i + j*s->b8_stride][0] = 0;
> >                 s->current_picture_ptr->motion_val[0][s->mb_x * 2 + s->mb_y * 2 * s->b8_stride + i + j*s->b8_stride][1] = 0;
> >             }
> >         }
> 
> fill_rectangle() (yes it should be moved into some header ...)
> 
> 
> [...]
> >     if(no_up && no_left)
> >         itype = DC_128_PRED;
> >     else if(no_up){
> >         if(itype == VERT_PRED) itype = HOR_PRED;
> >         if(itype == DC_PRED)   itype = LEFT_DC_PRED;
> >     }else if(no_left){
> >         if(itype == HOR_PRED)  itype = VERT_PRED;
> >         if(itype == DC_PRED)   itype = TOP_DC_PRED;
> >         if(itype == DIAG_DOWN_LEFT_PRED) itype = DIAG_DOWN_LEFT_PRED_RV40_NODOWN;
> >     }
> 
> 2 tables like in check_intra4x4_pred_mode() might be simpler
> 
> 
> [...]
> 
> >             for(YY = Y, i = 0; i < 4; i++, cbp >>= 1, no_left = 0, YY += 4){
> 
> thiscodeputstoomuchonthesamelineicanreaditbutitneedsmoretimethanitshould

slightly offloaded this line 

> >                 no_topright = no_up || (i==3 && j) || (i==3 && !j && (s->mb_x-1) == s->mb_width);
> 
> maybe some solution similar to h264.c *_cache would be a good idea, that is
> that there simply is a small array of fixed size which contains the
> availabilty/type of surrounding blocks, that way you just initalize that
> at one point per macroblock with code like:
> if(first_slice_line)
>     mark all blocks above as unavailable
> if(!s->mb_x || (s->mb_x == s->resync_mb_x && s->first_slice_line))
>     mark all blcoks to the left as unavailable
> 
> and then just do blah_cache[current_block-1] for the left blocks avaiabilty
> and blah_cache[current_block-8] for the top blocks avaiabilty
> and blah_cache[current_block-8+block_size] for the top right
> 
> or maybe something like top_samples_available that is a bit per
> each of the 16 4x4 blocks which indicates if the top block is
> available could be used ...

Seems interesting, I will implement it during other optimizations. 

> [...]
> >         no_up = s->first_slice_line;
> >         for(j = 0; j < 2; j++){
> >             no_left = !s->mb_x || (s->mb_x == s->resync_mb_x && s->first_slice_line);
> >             for(i = 0; i < 2; i++, cbp >>= 1, no_left = 0){
> >                 no_topright = no_up || (i && j) || (i && !j && (s->mb_x-1) == s->mb_width);
> >                 rv40_pred_4x4_block(r, U + i*4 + j*4*s->uvlinesize, s->uvlinesize, ittrans[intra_types[i*2+j*2*r->intra_types_stride]], no_up, no_left, i || j, no_topright);
> >                 if(!(cbp & 1)) continue;
> >                 rv40_add_4x4_block(U + i*4 + j*4*s->uvlinesize, s->uvlinesize, s->block[4], i*4+j*32);
> >             }
> >             no_up = 0;
> >         }
> >         no_up = s->first_slice_line;
> >         for(j = 0; j < 2; j++){
> >             no_left = !s->mb_x || (s->mb_x == s->resync_mb_x && s->first_slice_line);
> >             for(i = 0; i < 2; i++, cbp >>= 1, no_left = 0){
> >                 no_topright = no_up || (i && j) || (i && !j && (s->mb_x-1) == s->mb_width);
> >                 rv40_pred_4x4_block(r, V + i*4 + j*4*s->uvlinesize, s->uvlinesize, ittrans[intra_types[i*2+j*2*r->intra_types_stride]], no_up, no_left, i || j, no_topright);
> >                 if(!(cbp & 1)) continue;
> >                 rv40_add_4x4_block(V + i*4 + j*4*s->uvlinesize, s->uvlinesize, s->block[5], i*4+j*32);
> >             }
> >             no_up = 0;
> >         }
> 
> code duplication

merged 

> [...]
> > static int rv40_decode_frame(AVCodecContext *avctx,
> >                             void *data, int *data_size,
> >                             uint8_t *buf, int buf_size)
> > {
> >     RV40DecContext *r = avctx->priv_data;
> >     MpegEncContext *s = &r->s;
> >     AVFrame *pict = data;
> >     SliceInfo si;
> >     int i;
> > 
> >     /* no supplementary picture */
> >     if (buf_size == 0) {
> >         return 0;
> >     }
> 
> this is wrong as rv40 supports b frames, the not yet output frame should
> be returned here

fixed 

> [...]
[ slice decoding in rv40_decode_frame() ]
> 
> please explain what this has_slice slice_data and stuff is good for
> it looks like you workaround demuxer bugs or the lack of a parser in the
> decoder if so thats unacceptable for ffmpeg svn ...

I followed the scheme from rv10.c but since in RV40 slice transmits only
its start position and not mb_count, I found it convenient to store slice
and decode it only when the next slice arrives (with certain drawbacks
which had become a showstopper). I will write a parser to collect these
slices so only if(avctx->slice_count) will remain.  

> [...]
> 
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB