[FFmpeg-devel] [PATCH] DNxHD/VC-3 encoder

Baptiste Coudurier baptiste.coudurier
Fri Oct 5 11:53:30 CEST 2007


Hi

Michael Niedermayer wrote:
> Hi
> 
> On Wed, Oct 03, 2007 at 10:45:04PM +0200, Baptiste Coudurier wrote:
>> Hi Michael, thanks for the review,
>>
>> Michael Niedermayer wrote:
>>> On Mon, Sep 24, 2007 at 01:00:12PM +0200, Baptiste Coudurier wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> Michael Niedermayer wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> On Fri, Sep 07, 2007 at 08:05:28PM +0200, Michael Niedermayer wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> On Fri, Sep 07, 2007 at 09:17:49AM -0600, Loren Merritt wrote:
>>>>>>
>>>>>>> On Fri, 7 Sep 2007, Michael Niedermayer wrote:
>>>>>>>
>>>>>>>
>>>>>>>> when people start using qsort you know theres something wrong in their
>>>>>>>> encoder (or at least not optimal)
>>>>>>>>
>>>>>>>> so, lets first check if i understand the constraints
>>>>>>>> * its possible to change qscale per macroblock and it doesnt cost anything
>>>>>>>> * theres a fixed number of bits available for each frame (pre mpeg1 design)
>>>>>>>>
>>>>>>>> in that case the optimal solution (though not fastest) is to do a
>>>>>>>> binary search over lambda (using the number of bits and the ones available
>>>>>>>> to find the best lambda which results in a frame which is not too large)
>>>>>>>>
>>>>>>>> then for each such lambda, find the best qscale for each MB, again by
>>>>>>>> binary search minimizing SSD + bits*lambda (that way we know the best
>>>>>>>> qscales and how many bits the frame would need)
>>>>>>> Your algorithm produces the same decisions as the one with qsort. I don't 
>>>>>>> know which is faster.
>>>>>>>
>>>>>>> Consider the list of possible block encodings sorted by 
>>>>>>> delta_ssd/delta_bits. Optimizing for qps at any given lambda will produce 
>>>>>>> some prefix of that list. i.e. for any lambda, there is some 
>>>>>>> position in the list such that all preceding entries have ssd/bits 
>>>>>>> better than lambda, and all following entries have ssd/bits worse 
>>>>>>> than lambda. So instead of evaluating bits(lambda) as a black box and 
>>>>>>> searching for a specified values of bits, construct the function 
>>>>>>> lambda(bits) and evaluate it at the specified value of bits.
>>>>>> yes, but as far as i can see thats not what the current code does, it just
>>>>>> tries qscale and qscale+1 if it would try all qscale then yes they would
>>>>>> be identical
>>>>> just to clarify my original review ...
>>>>> i dont mind if the code uses qsort, what i was complaining about was that
>>>>> it isnt finding the optimal (in the RD sense) solution
>>>>>
>>>>> if its changed to find the optimal one or its demonstrated that the
>>>>> PSNR/bitrate difference is negligible then iam happy with that part
>>>>>
>>>> I tried lagrange. I keep the qsort rc because it is useable (> real time
>>>> on quad core cpu with variance, -5 fps for ssd), rdo one is encoding at
>>>> 4 fps :/
>>>>
>>>> Im wondering if I did screwed up the implementation because I have an
>>>> IMHO negligeable gain with rdo.
>>>>
>>>> I reused existing quant functions.
>>>>
>>>> Threaded functions must return int else you get a warning.
>>>>
>>>> About encode_dc, it will be complicated to reuse existing ones because:
>>>> mpeg12 use some static tables and generation code.
>>>> mjpeg use uint16_t for dc_codes.
>>>>
>>>> About mb_var_thread, existing function use current_picture and set
>>>> variables I don't need, mb_mean, and use me field too ...
>>>>
>>>> Attached are some psnr results with one HD uncompressed sample I have.
>>>>
>>>> Any other optimization because what Loren suggested earlier ?
>>>
>>> [...]
>>>
>>>> +static int dnxhd_init_qmat(DNXHDEncContext *ctx, int lbias, int cbias)
>>>> +{
>>>> +    int qscale, i;
>>>> +    int (*qmat_luma  )[64] = av_mallocz(ctx->avctx->qmax * 64 * sizeof(*qmat_luma));
>>>> +    int (*qmat_chroma)[64] = av_mallocz(ctx->avctx->qmax * 64 * sizeof(*qmat_chroma));
>>>> +    uint16_t (*qmat16_luma  )[2][64] = av_mallocz(ctx->avctx->qmax * 64 * 2 * sizeof(*qmat16_luma));
>>>> +    uint16_t (*qmat16_chroma)[2][64] = av_mallocz(ctx->avctx->qmax * 64 * 2 * sizeof(*qmat16_chroma));
>>>> +
>>>> +    for (qscale = 1; qscale < ctx->avctx->qmax; qscale++) {
>>>> +        for (i = 1; i < 64; i++) {
>>>> +            int j = ff_zigzag_direct[i];
>>>> +            qmat_luma  [qscale][j] = (int)((UINT64_C(4) << QMAT_SHIFT) / (qscale * ctx->cid_table->  luma_weigth[i]));
>>>> +            qmat_chroma[qscale][j] = (int)((UINT64_C(4) << QMAT_SHIFT) / (qscale * ctx->cid_table->chroma_weigth[i]));
>>>> +
>>>> +            qmat16_luma  [qscale][0][j] = (4 << QMAT_SHIFT_MMX) / (qscale * ctx->cid_table->  luma_weigth[i]);
>>>> +            qmat16_chroma[qscale][0][j] = (4 << QMAT_SHIFT_MMX) / (qscale * ctx->cid_table->chroma_weigth[i]);
>>>> +
>>>> +            if (qmat16_luma  [qscale][0][j]==0 || qmat16_luma  [qscale][0][j]==128*256) qmat16_luma  [qscale][0][j]=128*256-1;
>>>> +            if (qmat16_chroma[qscale][0][j]==0 || qmat16_chroma[qscale][0][j]==128*256) qmat16_chroma[qscale][0][j]=128*256-1;
>>>> +
>>>> +            qmat16_luma  [qscale][1][j] = ROUNDED_DIV(lbias<<(16-QUANT_BIAS_SHIFT), qmat16_luma  [qscale][0][j]);
>>>> +            qmat16_chroma[qscale][1][j] = ROUNDED_DIV(cbias<<(16-QUANT_BIAS_SHIFT), qmat16_chroma[qscale][0][j]);
>>>> +        }
>>>> +    }
>>>> +
>>>> +    ctx->q_scale_matrixl   = qmat_luma;
>>>> +    ctx->q_scale_matrixc   = qmat_chroma;
>>>> +    ctx->q_scale_matrixl16 = qmat16_luma;
>>>> +    ctx->q_scale_matrixc16 = qmat16_chroma;
>>>> +    return 0;
>>>> +}
>>>
>>> duplicate of convert_matrix()
>> Yes, there are some tiny differences, and I need more code to readjust
>> tables, and its seems bigger than duplicating convert_matrix. What
>> should I do ?
> 
> does it work with all DCTs? the faan dct, the slow and fast jpeg dcts?
> ive not checked but i had the feeling it wont work with all
> if it works i of course prefer the smaller and simpler solution

Ok.

>>> [...]
>>>
>>>> +    dsputil_init(&ctx->m.dsp, avctx);
>>>> +    ff_init_scantable(ctx->m.dsp.idct_permutation, &ctx->m.intra_scantable, ff_zigzag_direct);
>>>> +
>>>> +    ctx->m.avctx = avctx;
>>>> +    ctx->m.dct_quantize = dct_quantize_c;
>>>
>>> does this disable the optimized quantize funcions?
>> Not really, since MPV_common_init_mmx will reassign optimized function
>> later. I changed it nonetheless.
>>
>>> what about quantize_trelis? (no its no reason to delay the patch, making
>>> trellis work could very well be in a later patch, iam just asking)
>> Well I'd like the encoder to be commited so I have real history on the
>> work. I find easier to work when code is in svn (diffs).
>>
> 
>> I had a quick look at trellis but I don't understand it completely yet
>> (ac_esc_length not set by mpeg2 encoder for example, and levels taken
>> into account seems different than dnxhd levels, once I get it I could as
>> well do it for mjpeg)
> 
> trellis for mjpeg would be cool :)
> maybe we could beat jpeg2000 that way ;)

Or snow ;)
Seriously, is it wanted or not ? Is there some documentation of the current
function in mpegvideo_enc.c somewhere ?

> [...]
>> How does it look like ?
> 
> [...]
> 
>> Index: libavcodec/dnxhdenc.c
>> ===================================================================
>> --- libavcodec/dnxhdenc.c	(revision 0)
>> +++ libavcodec/dnxhdenc.c	(revision 0)
> [...]
>> +//#define DEBUG
>> +#define RC_VARIANCE 1 // use variance or ssd for fast rc
>> +
>> +#include "avcodec.h"
>> +#include "dsputil.h"
>> +#include "mpegvideo.h"
>> +#include "dnxhddata.h"
>> +
>> +int dct_quantize_c(MpegEncContext *s, DCTELEM *block, int n, int qscale, int *overflow);
> 
> totally unrelated to your patch but this needs a ff_ prefix

Yes.

> [...]
>> +typedef struct DNXHDEncContext {
>> +    MpegEncContext m; ///< Used for quantization dsp functions
>> +
>> +    AVCodecContext *avctx;
>> +    AVFrame *picture;
>> +    int cid;
>> +    const CIDEntry *cid_table;
> 
>> +    uint8_t *msips;
> 
> whats a msips ? this could benefit from a comment i think

Better renamed to msip and added comment.

> [...]
>> +    ctx->m.avctx = avctx;
>> +    ctx->m.mb_intra = 1;
>> +    ctx->m.h263_aic = 1;
>> +
>> +    MPV_common_init_mmx(&ctx->m);
>> +    if (!ctx->m.dct_quantize)
>> +        ctx->m.dct_quantize = dct_quantize_c;
> 
> iam an idiot, i was staring at this for maybe 2 minutes, somehow i knew
> it was wrong i dont understand why i didnt realize it immedeatly ...
> 
> _mmx()
> 
> you dont even know if the CPU is a x86 ...

Added ifdef

> [...]
>> +                if (i&2) {
>> +                    n = 1 + (i&1);
>> +                    ctx->m.q_intra_matrix16 = ctx->qmatrix_c16;
>> +                    ctx->m.q_intra_matrix   = ctx->qmatrix_c;
>> +                } else {
>> +                    n = 0;
>> +                    ctx->m.q_intra_matrix16 = ctx->qmatrix_l16;
>> +                    ctx->m.q_intra_matrix   = ctx->qmatrix_l;
>> +                }
>> +
> [...]
> 
>> +                if (i&2) {
>> +                    n = 1 + (i&1);
>> +                    ctx->m.q_intra_matrix16 = ctx->qmatrix_c16;
>> +                    ctx->m.q_intra_matrix   = ctx->qmatrix_c;
>> +                } else {
>> +                    n = 0;
>> +                    ctx->m.q_intra_matrix16 = ctx->qmatrix_l16;
>> +                    ctx->m.q_intra_matrix   = ctx->qmatrix_l;
>> +                }
> 
> duplicated

Added dnxhd_switch_matrix

> [...]
>> +static int dnxhd_find_qscale(DNXHDEncContext *ctx)
>> +{
>> +    int bits = 0;
>> +    int up_step = 1;
>> +    int down_step = 1;
>> +    int last_higher = 0;
>> +    int last_lower = INT_MAX;
>> +    int qscale;
>> +    int x, y;
>> +
>> +    qscale = ctx->qscale;
>> +    for (;;) {
>> +        bits = 0;
>> +        ctx->qscale = qscale;
>> +        // XXX avoid recalculating bits
>> +        ctx->avctx->execute(ctx->avctx, dnxhd_calc_bits_thread, (void**)&ctx->thread[0], NULL, ctx->avctx->thread_count);
>> +        for (y = 0; y < ctx->m.mb_height; y++) {
>> +            for (x = 0; x < ctx->m.mb_width; x++)
>> +                bits += ctx->mb_rc[qscale][y*ctx->m.mb_width+x].bits;
>> +            bits = (bits+31)&~31; // padding
>> +            if (bits > ctx->frame_bits)
>> +                break;
>> +        }
>> +        //dprintf(ctx->avctx, "%d, qscale %d, bits %d, frame %d\n", ctx->avctx->frame_number, qscale, bits, ctx->frame_bits);
>> +        if (bits < ctx->frame_bits) {
>> +            if (qscale == 1)
>> +                break;
>> +            if (last_higher == qscale - 1) {
>> +                qscale = last_higher;
>> +                break;
>> +            }
>> +            if (qscale < last_lower)
>> +                last_lower = qscale;
>> +            qscale -= down_step++;
> 
> this looks like it can make qscale negative if bits stays small enough

Added qscale < 1 test.

>> +            up_step = 1;
>> +        } else {
>> +            if (last_lower == qscale + 1)
>> +                break;
>> +            if (qscale > last_higher)
>> +                last_higher = qscale;
>> +            qscale += up_step++;
>> +            down_step = 1;
>> +            if (qscale >= ctx->avctx->qmax)
>> +                return -1;
>> +        }
> 
> this is not optimal (speed wise) IMHO
> if you have a lower and upper qscale then simple bisection should be
> faster

Done.

> [...]
>> +static int dnxhd_encode_picture(AVCodecContext *avctx, unsigned char *buf, int buf_size, void *data)
>> +{
>> +    DNXHDEncContext *ctx = avctx->priv_data;
>> +    AVFrame *input_frame = data;
>> +    int first_field = 1;
>> +    int offset, i, ret;
>> +
>> +    if (buf_size < ctx->cid_table->frame_size) {
>> +        av_log(avctx, AV_LOG_ERROR, "output buffer is too small to compress picture\n");
>> +        return -1;
>> +    }
>> +
>> +    av_picture_copy((AVPicture *)ctx->picture, data, PIX_FMT_YUV422P, avctx->width, avctx->height);
> 
> why do you copy the input image?

If picture is not allocated as 1920x1088 I need to copy it.
Added dnxhd_load_picture and do not copy it anymore if not needed.

-- 
Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
SMARTJOG S.A.                                    http://www.smartjog.com
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
Phone: +33 1 49966312
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dnxhd_encoder7.patch
Type: text/x-diff
Size: 34762 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071005/9f99d701/attachment.patch>



More information about the ffmpeg-devel mailing list