[FFmpeg-devel] [PATCH] DNxHD/VC-3 encoder

Michael Niedermayer michaelni
Fri Sep 7 16:00:08 CEST 2007


Hi

On Fri, Sep 07, 2007 at 02:03:16PM +0200, Baptiste Coudurier wrote:
> Baptiste Coudurier wrote:
> > Hi guys,
> > 
> > Here is the encoder.
> > 
> > Optimizations ideas very welcome. MMX code is adapted from mpegvideo, I
> > can reuse a dummy MpegVideoEncContext if wanted. Many thanks to people
> > who already helped me.
> > 
> 
> Diego's comments addressed, new patch attached.

[...]

> +static int dnxhd_init_rc(DNXHDEncContext *ctx)
> +{
> +    ctx->frame_bits = (ctx->cid_table->coding_unit_size - 640 - 4) * 8;
> +    ctx->qmax = MAX_QSCALE-1;
> +    ctx->qmin = 1;

shouldnt these be take from AVCodecContext ?


> +    ctx->qscale = 1;
> +    return 0;
> +}
> +

> +// XXX do not use bias since quality is not good at all (-1db) ?
> +static int dnxhd_dct_quantize_c(DNXHDEncContext *ctx, DCTELEM *block, int n, int qscale)
> +{
> +    const int *qmat;
> +    int last_non_zero = 1;
> +    int i, j;
> +    int level;
> +
> +    //ctx->dsp.fdct(block);
> +
> +    qmat = n&2 ? ctx->q_scale_matrixc[qscale] : ctx->q_scale_matrixl[qscale];
> +
> +    // DC is not quantized
> +    block[0] = (block[0] + 4) >> 3;
> +
> +    for (i = 63; i; i--) {
> +        j = ff_zigzag_direct[i];
> +        if (block[j]) {
> +            last_non_zero = i;
> +            break;
> +        } else {
> +            block[j] = 0;
> +        }
> +    }
> +    for (i = 1; i <= last_non_zero; i++) {
> +        j = ff_zigzag_direct[i];
> +        level = block[j] * qmat[j];
> +        if (level > 0) {
> +            block[j] = level >> QMAT_SHIFT;
> +        } else {
> +            block[j] = -(-level >> QMAT_SHIFT);
> +        }
> +    }
> +    return last_non_zero;
> +}

why dont you use the existing quantization code? the same question applies to
the mmx code you duplicate
as far as i can tell from a quick look there is no sigificant difference
between them
also you would get trellis quant for free if you would use the existing
code


[...]
> +static av_always_inline int dnxhd_write_header(AVCodecContext *avctx, uint8_t *buf)

this doesnt need to be always inline i think it doesnt look speed critical


> +{
> +    DNXHDEncContext *ctx = avctx->priv_data;
> +    const uint8_t header_prefix[5] = { 0x00,0x00,0x02,0x80,0x01 };
> +
> +    //memset(buf, 0, 640);

forgotten debug code?


[...]
> +static av_always_inline void dnxhd_encode_dc(DNXHDEncContext *ctx, int diff)
> +{
> +    if (diff) {
> +        int nbits;
> +        if (diff < 0) {
> +            nbits = av_log2_16bit(-2*diff);
> +            diff--;
> +        } else {
> +            nbits = av_log2_16bit(2*diff);
> +        }
> +        put_bits(&ctx->pb, ctx->cid_table->dc_bits[nbits] + nbits,
> +                 (ctx->cid_table->dc_codes[nbits]<<nbits) + (diff & ((1 << nbits) - 1)));
> +    } else {
> +        put_bits(&ctx->pb, ctx->cid_table->dc_bits[0], ctx->cid_table->dc_codes[0]);
> +    }

this is a duplicate of encode_dc() from mpeg12

why the special case for diff=0 ?


> +}
> +

> +static av_always_inline int dnxhd_encode_mb(DNXHDEncContext *ctx, DCTELEM *block, int i)

this is encoding a block not a mb

> +{
> +    int last_non_zero = 0;
> +    int offset = 0;
> +    int n = i&2 ? 1 + (i&1) : 0;
> +    int j;
> +
> +    dnxhd_encode_dc(ctx, block[0] - ctx->last_dc[n]);
> +    ctx->last_dc[n] = block[0];
> +
> +    for (j = 1; j < ctx->last_index[i]; j++) {
> +        int slevel = block[ff_zigzag_direct[j]];
> +        if (slevel) {
> +            int run_level = j - last_non_zero - 1;
> +            int sign, level;
> +            MASK_ABS(sign, slevel);
> +            level = slevel;

the level variable is useless


> +            if (level > 64) {
> +                offset = (level-1) >> 6;
> +                level = 256 | (level & 63); // level 64 is treated as 0
> +            }
> +            if (run_level)
> +                level |= 128;
> +            put_bits(&ctx->pb, ctx->table_vlc_bits[level]+1, (ctx->table_vlc_codes[level]<<1)|(sign&1));
> +            if (offset) {
> +                put_bits(&ctx->pb, 4, offset);
> +                offset = 0;
> +            }
> +            if (run_level)
> +                put_bits(&ctx->pb, ctx->table_run_bits[run_level], ctx->table_run_codes[run_level]);
> +            last_non_zero = j;
> +        }
> +    }
> +    put_bits(&ctx->pb, ctx->table_vlc_bits[0], ctx->table_vlc_codes[0]); // EOB
> +    return 0;

the function always returns 0 so it could as well be void
this also applies to more functions

[...]
> +                level = -level;
> +                level = (2*level+1) * qscale * weigth_matrix[i];

level = (1-2*level) * qscale * weigth_matrix[i];


[...]
> +static int dnxhd_calc_slice_bits_thread(AVCodecContext *avctx, void *arg)
> +{
> +    DNXHDEncContext *ctx = arg;
> +    int mb_y, mb_x;
> +    ctx->thread_size = 0;
> +    for (mb_y = ctx->start_mb_y; mb_y < ctx->end_mb_y; mb_y++) {
> +        ctx->slice_size[mb_y] = ctx->mb_width*(12+8*ctx->table_vlc_bits[0]); //qscale+eob
> +        for (mb_x = 0; mb_x < ctx->mb_width; mb_x++) {
> +            unsigned mb = mb_y * ctx->mb_width + mb_x;
> +            ctx->slice_size[mb_y] += ctx->mb_ac_bits[mb][0]+ctx->mb_dc_bits[mb];
> +        }
> +        if (ctx->slice_size[mb_y]&31)
> +            ctx->slice_size[mb_y] += 32-(ctx->slice_size[mb_y]&31);
> +        ctx->slice_size[mb_y] >>= 3;
> +        ctx->thread_size += ctx->slice_size[mb_y];
> +    }
> +    return 0;
> +}

id guess the thread overhead will be bigger than the time needed to sum these
values up


> +

> +static int dnxhd_mb_var_thread(AVCodecContext *avctx, void *arg)
> +{
> +    DNXHDEncContext *ctx = arg;
> +    int mb_y, mb_x;
> +    for (mb_y = ctx->start_mb_y; mb_y < ctx->end_mb_y; mb_y++) {
> +        for (mb_x = 0; mb_x < ctx->mb_width; mb_x++) {
> +            unsigned mb  = mb_y * ctx->mb_width + mb_x;
> +            uint8_t *pix = ctx->thread[0]->src_y + ((mb_y<<4) * ctx->linesize) + (mb_x<<4);
> +            int sum      = ctx->dsp.pix_sum(pix, ctx->linesize);
> +            int varc     = (ctx->dsp.pix_norm1(pix, ctx->linesize) - (((unsigned)(sum*sum))>>8)+500+128)>>8;
> +            ctx->mb_rc[mb].delta_rd = varc;
> +        }
> +    }
> +    return 0;
> +}

duplicate of mb_var_thread()


> +
> +static int dnxhd_encode_picture(AVCodecContext *avctx, unsigned char *buf, int buf_size, void *data)
> +{
> +    DNXHDEncContext *ctx = avctx->priv_data;
> +    AVFrame *input_frame = data;
> +    int min_bits, max_bits;
> +    int first_field = 1;
> +    int offset, i;
> +
> +    if (buf_size < ctx->cid_table->frame_size) {
> +        av_log(avctx, AV_LOG_ERROR, "output buffer is too small to compress picture\n");
> +        return -1;
> +    }
> +
> +    av_picture_copy((AVPicture *)ctx->picture, data, PIX_FMT_YUV422P, avctx->width, avctx->height);
> +    ctx->picture->interlaced_frame = input_frame->interlaced_frame;
> +    ctx->cur_field = input_frame->interlaced_frame && !input_frame->top_field_first && first_field;
> +
> + encode_coding_unit:
> +    dnxhd_write_header(avctx, buf);
> +
> +    if (ctx->interlaced && ctx->cur_field) {
> +        ctx->src_y = ctx->picture->data[0] + ctx->picture->linesize[0];
> +        ctx->src_u = ctx->picture->data[1] + ctx->picture->linesize[1];
> +        ctx->src_v = ctx->picture->data[2] + ctx->picture->linesize[2];
> +    } else {
> +        ctx->src_y = ctx->picture->data[0];
> +        ctx->src_u = ctx->picture->data[1];
> +        ctx->src_v = ctx->picture->data[2];
> +    }
> +
> + calculate_bits:
> +    avctx->execute(avctx, dnxhd_calc_bits_thread, (void**)&ctx->thread[0], NULL, avctx->thread_count);
> +    dnxhd_calc_frame_bits(ctx, &min_bits, &max_bits);
> +    if (max_bits < ctx->frame_bits && ctx->qscale > 1) {
> +        ctx->qscale--;
> +        goto calculate_bits;
> +    } else if (min_bits > ctx->frame_bits) {
> +        ctx->qscale++;
> +        if (ctx->qscale == ctx->qmax) {
> +            av_log(avctx, AV_LOG_ERROR, "picture could not fit rc constraints\n");
> +            return -1;
> +        }
> +        goto calculate_bits;
> +    } else {
> +        if (avctx->debug & FF_DEBUG_RC)
> +            av_log(avctx, AV_LOG_DEBUG, "frame %d, qscale %d\n", avctx->frame_number, ctx->qscale);
> +        if (avctx->mb_decision != FF_MB_DECISION_RD) // use variance
> +            avctx->execute(avctx, dnxhd_mb_var_thread, (void**)&ctx->thread[0], NULL, avctx->thread_count);
> +        qsort(ctx->mb_rc, ctx->mb_num, sizeof(RCEntry), dnxhd_cmp_delta_rd);
> +        for (i = 0; i < ctx->mb_num && max_bits > ctx->frame_bits; i++) {
> +            max_bits -= ctx->mb_rc[i].delta_bits;
> +            ctx->mb_ac_bits[ctx->mb_rc[i].mb][0] = ctx->mb_ac_bits[ctx->mb_rc[i].mb][1];
> +            ctx->mb_rc[ctx->mb_rc[i].mb].qscale++;
> +        }

when people start using qsort you know theres something wrong in their
encoder (or at least not optimal)

so, lets first check if i understand the constraints
* its possible to change qscale per macroblock and it doesnt cost anything
* theres a fixed number of bits available for each frame (pre mpeg1 design)

in that case the optimal solution (though not fastest) is to do a
binary search over lambda (using the number of bits and the ones available
to find the best lambda which results in a frame which is not too large)

then for each such lambda, find the best qscale for each MB, again by
binary search minimizing SSD + bits*lambda (that way we know the best
qscales and how many bits the frame would need)
--------
now to make this run at a useable speed
1. cache the bits and rate distortion values for every qscale and MB
2. dont use dumb binary search that is dont do
 try(100), try(50) try(25) try(12) try(6) try(9) try(7) try(8)
but instead use the previous lambda as start point
try(10) try(9) try(7) try(8)
or anoter smarter variant
3. dont overdo it with the precission of lambda and maybe try  a logarithic
scale for the binary search over lambda ...


also independant of that try trellis quant it should give a nice PSNR gain


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070907/41cad0b5/attachment.pgp>



More information about the ffmpeg-devel mailing list