[FFmpeg-devel] [PATCH] DNxHD/VC-3 encoder

Baptiste Coudurier baptiste.coudurier
Wed Oct 3 22:45:04 CEST 2007


Hi Michael, thanks for the review,

Michael Niedermayer wrote:
> On Mon, Sep 24, 2007 at 01:00:12PM +0200, Baptiste Coudurier wrote:
> 
>>Hi Michael,
>>
>>Michael Niedermayer wrote:
>>
>>>Hi
>>>
>>>On Fri, Sep 07, 2007 at 08:05:28PM +0200, Michael Niedermayer wrote:
>>>
>>>>Hi
>>>>
>>>>On Fri, Sep 07, 2007 at 09:17:49AM -0600, Loren Merritt wrote:
>>>>
>>>>>On Fri, 7 Sep 2007, Michael Niedermayer wrote:
>>>>>
>>>>>
>>>>>>when people start using qsort you know theres something wrong in their
>>>>>>encoder (or at least not optimal)
>>>>>>
>>>>>>so, lets first check if i understand the constraints
>>>>>>* its possible to change qscale per macroblock and it doesnt cost anything
>>>>>>* theres a fixed number of bits available for each frame (pre mpeg1 design)
>>>>>>
>>>>>>in that case the optimal solution (though not fastest) is to do a
>>>>>>binary search over lambda (using the number of bits and the ones available
>>>>>>to find the best lambda which results in a frame which is not too large)
>>>>>>
>>>>>>then for each such lambda, find the best qscale for each MB, again by
>>>>>>binary search minimizing SSD + bits*lambda (that way we know the best
>>>>>>qscales and how many bits the frame would need)
>>>>>
>>>>>Your algorithm produces the same decisions as the one with qsort. I don't 
>>>>>know which is faster.
>>>>>
>>>>>Consider the list of possible block encodings sorted by 
>>>>>delta_ssd/delta_bits. Optimizing for qps at any given lambda will produce 
>>>>>some prefix of that list. i.e. for any lambda, there is some 
>>>>>position in the list such that all preceding entries have ssd/bits 
>>>>>better than lambda, and all following entries have ssd/bits worse 
>>>>>than lambda. So instead of evaluating bits(lambda) as a black box and 
>>>>>searching for a specified values of bits, construct the function 
>>>>>lambda(bits) and evaluate it at the specified value of bits.
>>>>
>>>>yes, but as far as i can see thats not what the current code does, it just
>>>>tries qscale and qscale+1 if it would try all qscale then yes they would
>>>>be identical
>>>
>>>just to clarify my original review ...
>>>i dont mind if the code uses qsort, what i was complaining about was that
>>>it isnt finding the optimal (in the RD sense) solution
>>>
>>>if its changed to find the optimal one or its demonstrated that the
>>>PSNR/bitrate difference is negligible then iam happy with that part
>>>
>>
>>I tried lagrange. I keep the qsort rc because it is useable (> real time
>>on quad core cpu with variance, -5 fps for ssd), rdo one is encoding at
>>4 fps :/
>>
>>Im wondering if I did screwed up the implementation because I have an
>>IMHO negligeable gain with rdo.
>>
>>I reused existing quant functions.
>>
>>Threaded functions must return int else you get a warning.
>>
>>About encode_dc, it will be complicated to reuse existing ones because:
>>mpeg12 use some static tables and generation code.
>>mjpeg use uint16_t for dc_codes.
>>
>>About mb_var_thread, existing function use current_picture and set
>>variables I don't need, mb_mean, and use me field too ...
>>
>>Attached are some psnr results with one HD uncompressed sample I have.
>>
>>Any other optimization because what Loren suggested earlier ?
> 
> 
> [...]
> 
>>+static int dnxhd_init_qmat(DNXHDEncContext *ctx, int lbias, int cbias)
>>+{
>>+    int qscale, i;
>>+    int (*qmat_luma  )[64] = av_mallocz(ctx->avctx->qmax * 64 * sizeof(*qmat_luma));
>>+    int (*qmat_chroma)[64] = av_mallocz(ctx->avctx->qmax * 64 * sizeof(*qmat_chroma));
>>+    uint16_t (*qmat16_luma  )[2][64] = av_mallocz(ctx->avctx->qmax * 64 * 2 * sizeof(*qmat16_luma));
>>+    uint16_t (*qmat16_chroma)[2][64] = av_mallocz(ctx->avctx->qmax * 64 * 2 * sizeof(*qmat16_chroma));
>>+
>>+    for (qscale = 1; qscale < ctx->avctx->qmax; qscale++) {
>>+        for (i = 1; i < 64; i++) {
>>+            int j = ff_zigzag_direct[i];
>>+            qmat_luma  [qscale][j] = (int)((UINT64_C(4) << QMAT_SHIFT) / (qscale * ctx->cid_table->  luma_weigth[i]));
>>+            qmat_chroma[qscale][j] = (int)((UINT64_C(4) << QMAT_SHIFT) / (qscale * ctx->cid_table->chroma_weigth[i]));
>>+
>>+            qmat16_luma  [qscale][0][j] = (4 << QMAT_SHIFT_MMX) / (qscale * ctx->cid_table->  luma_weigth[i]);
>>+            qmat16_chroma[qscale][0][j] = (4 << QMAT_SHIFT_MMX) / (qscale * ctx->cid_table->chroma_weigth[i]);
>>+
>>+            if (qmat16_luma  [qscale][0][j]==0 || qmat16_luma  [qscale][0][j]==128*256) qmat16_luma  [qscale][0][j]=128*256-1;
>>+            if (qmat16_chroma[qscale][0][j]==0 || qmat16_chroma[qscale][0][j]==128*256) qmat16_chroma[qscale][0][j]=128*256-1;
>>+
>>+            qmat16_luma  [qscale][1][j] = ROUNDED_DIV(lbias<<(16-QUANT_BIAS_SHIFT), qmat16_luma  [qscale][0][j]);
>>+            qmat16_chroma[qscale][1][j] = ROUNDED_DIV(cbias<<(16-QUANT_BIAS_SHIFT), qmat16_chroma[qscale][0][j]);
>>+        }
>>+    }
>>+
>>+    ctx->q_scale_matrixl   = qmat_luma;
>>+    ctx->q_scale_matrixc   = qmat_chroma;
>>+    ctx->q_scale_matrixl16 = qmat16_luma;
>>+    ctx->q_scale_matrixc16 = qmat16_chroma;
>>+    return 0;
>>+}
> 
> 
> duplicate of convert_matrix()

Yes, there are some tiny differences, and I need more code to readjust
tables, and its seems bigger than duplicating convert_matrix. What
should I do ?

> [...]
> 
>>+    dsputil_init(&ctx->m.dsp, avctx);
>>+    ff_init_scantable(ctx->m.dsp.idct_permutation, &ctx->m.intra_scantable, ff_zigzag_direct);
>>+
>>+    ctx->m.avctx = avctx;
>>+    ctx->m.dct_quantize = dct_quantize_c;
> 
> 
> does this disable the optimized quantize funcions?

Not really, since MPV_common_init_mmx will reassign optimized function
later. I changed it nonetheless.

> what about quantize_trelis? (no its no reason to delay the patch, making
> trellis work could very well be in a later patch, iam just asking)

Well I'd like the encoder to be commited so I have real history on the
work. I find easier to work when code is in svn (diffs).

I had a quick look at trellis but I don't understand it completely yet
(ac_esc_length not set by mpeg2 encoder for example, and levels taken
into account seems different than dnxhd levels, once I get it I could as
well do it for mjpeg)

> [...]
> 
> 
>>+            if (ctx->slice_size[mb_y]&31)
>>+                ctx->slice_size[mb_y] += 32-(ctx->slice_size[mb_y]&31);
> 
> 
> +31)&~31;
> 

Done.

> [...]
> 
>>+static int dnxhd_mb_var_thread(AVCodecContext *avctx, void *arg)
>>+{
>>+    DNXHDEncContext *ctx = arg;
>>+    int mb_y, mb_x;
>>+    for (mb_y = ctx->m.start_mb_y; mb_y < ctx->m.end_mb_y; mb_y++) {
>>+        for (mb_x = 0; mb_x < ctx->m.mb_width; mb_x++) {
>>+            unsigned mb  = mb_y * ctx->m.mb_width + mb_x;
>>+            uint8_t *pix = ctx->thread[0]->src_y + ((mb_y<<4) * ctx->m.linesize) + (mb_x<<4);
>>+            int sum      = ctx->m.dsp.pix_sum(pix, ctx->m.linesize);
>>+            int varc     = (ctx->m.dsp.pix_norm1(pix, ctx->m.linesize) - (((unsigned)(sum*sum))>>8)+500+128)>>8;
> 
> 
> could you explain what that +500 does?

I think not, I assumed it was some black magic, I tried without
and found no differences. I removed it then.

> [...]
> 
>>+            if (bits&31) //padding
>>+                bits += 32-(bits&31);
> 
> 
> +31)&~31;
> 

Done. Another one was changed too.

> [...]
> 
>>+    if (ctx->interlaced && ctx->cur_field) {
>>+        ctx->src_y = ctx->picture->data[0] + ctx->picture->linesize[0];
>>+        ctx->src_u = ctx->picture->data[1] + ctx->picture->linesize[1];
>>+        ctx->src_v = ctx->picture->data[2] + ctx->picture->linesize[2];
>>+    } else {
>>+        ctx->src_y = ctx->picture->data[0];
>>+        ctx->src_u = ctx->picture->data[1];
>>+        ctx->src_v = ctx->picture->data[2];
>>+    }
> 
> 
> id write
> 
> for(i=0; i<3; i++){
>     ctx->src[i]= ctx->picture->data[i];
>     if(ctx->interlaced && ctx->cur_field)
>         ctx->src[i] +=ctx->picture->linesize[i];
> }
> 

Ok, changed.

How does it look like ?

-- 
Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
SMARTJOG S.A.                                    http://www.smartjog.com
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
Phone: +33 1 49966312
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dnxhd_encoder6.patch
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071003/7167764c/attachment.asc>



More information about the ffmpeg-devel mailing list