[FFmpeg-devel] [PATCH] Common ACELP code & G.729 [4/7] - G.729 core

Vladimir Voroshilov voroshil
Fri Oct 17 10:14:27 CEST 2008


2008/10/17 Michael Niedermayer <michaelni at gmx.at>:

>> diff --git ffmpeg-r15432.orig/libavcodec/g729dec.c ffmpeg-r15432.mod/libavcodec/g729dec.c
>> index 6bd814f..74b0665 100644
>> --- ffmpeg-r15432.orig/libavcodec/g729dec.c
>> +++ ffmpeg-r15432.mod/libavcodec/g729dec.c
>
>> @@ -30,6 +30,9 @@
>>  #include "libavutil/avutil.h"
>>  #include "bitstream.h"
>>
>> +#define FRAC_BITS 15
>> +#include "mathops.h"
>> +
>>  #include "g729.h"
>>  #include "lsp.h"
>>  #include "acelp_math.h"
>
> this is inflexible and confusing, the shift used should not be file-global
>
> [...]
>> +static const G729_format_description formats[] =
>> +{
>> +  {8000, 10, 160, 13, 1018156},
>
>> +// Note: may not work
>> +  {4400, 11, 176, 17, 1021547},
>> +};
>
> elaborate on "may" please
> if it does not work, it probably should be removed from the patch
> or fixed so it does work.
> if it does work what does the comment mean?
> and if it has not been tested, test it.

The 4400 mode is (I assume) a Chinese extension of G.729; it is not covered
by the official specification. And since I have no samples, I can't test it.
That is why I put that comment there.

Given the above, the best option is probably just to drop it completely
(but I would prefer to keep the format structure as is, because it will
be needed when further G.729 modes are added).

I am also not against dropping the BFI code, since I don't feel knowledgeable
enough to compare it with other frame erasure concealment algorithms.
(With BFI, speech remains understandable at 50% packet loss, AFAIR.
I can redo that simple test, if you wish.)

>
> [...]
>> @@ -138,6 +531,108 @@ static inline int g729_get_parity(uint8_t value)
>>                  14,
>>                  ctx->subframe_size);
>>
>> +        memcpy(synth, ctx->syn_filter_data, 10 * sizeof(int16_t));
>
> there are 2 memcpies copying this around, i dont see why it would need 2.

I'll try to describe this as well as I understand it.

Well... the synthesis filter can be written as
in[i] = sum{a[k]*out[i-k]}, k=0..filter_order, i=0..subframe_size-1
where a[k] are the filter coefficients (a[0] == 1), in[i] is the input and
out[i] the output signal.

As an optimization, a[0] is eliminated and the filter is implemented in the
code as:
buf[i] = buf[i] - sum{a[k]*buf[i-1-k]}, k=0..filter_order-1

Thus I need a buffer of length subframe_size + filter_order, and the first
filter_order samples of it must contain the last filter_order output samples
of the previous filter run. In other words (with filter_order == 10):

buffer after the Nth call:
(0),..,(9),(10),..,[(subframe_size),..,(subframe_size+9)]

buffer before the (N+1)th call:
[(0),..,(9)], (the rest is not filled in yet)

The samples in brackets must be the same.

If the synthesis filter were the only filter working on this buffer, a single
memmove would be enough. Unfortunately the high-pass filter and the postfilter
also modify the buffer, so I have to save the synthesis filter output before
passing it to those filters and restore the saved samples before the next call
to the synthesis filter.

The same applies to the high-pass filter and the postfilter.

Does that make the need for the double memcpy clear? I can also put the above
into a comment in the code if someone helps me make it easily
readable/understandable. A commented sketch of the current code follows.
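
To make this concrete, here is the relevant part of the current patch again,
with the buffer layout and the save/restore spelled out in comments. The
recursion is only sketched conceptually in a comment; the real work, including
the fixed-point scaling, is done by ff_acelp_lp_synthesis_filter, and the
overflow/rescale path is left out:

    /* synth buffer layout:
     *   synth[0..9]                    last 10 output samples of the
     *                                  previous subframe (filter history)
     *   synth[10..10+subframe_size-1]  output of the current subframe
     */

    /* Restore the 10-sample history saved after the previous subframe;
     * the high-pass filter and the postfilter have modified synth since. */
    memcpy(synth, ctx->syn_filter_data, 10 * sizeof(int16_t));

    /* Conceptually, with a = &lp[i][1] (coefficients without a[0]):
     *   for (n = 0; n < subframe_size; n++) {
     *       sum = exc[n];
     *       for (k = 0; k < 10; k++)
     *           sum -= a[k] * synth[10 + n - 1 - k];  // reaches into history
     *       synth[10 + n] = sum;
     *   }
     */
    ff_acelp_lp_synthesis_filter(synth + 10, &lp[i][1],
                                 ctx->exc + i * ctx->subframe_size,
                                 ctx->subframe_size, 10, 1, 0x800);

    /* Save the last 10 output samples *before* the high-pass filter and the
     * postfilter overwrite them, so they can be restored above next time. */
    memcpy(ctx->syn_filter_data, synth + ctx->subframe_size,
           10 * sizeof(int16_t));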

>> +        if(ff_acelp_lp_synthesis_filter(
>> +            synth+10,
>> +            &lp[i][1],
>> +            ctx->exc  + i * ctx->subframe_size,
>> +            ctx->subframe_size,
>> +            10,
>> +            1,
>> +            0x800))
>> +        {
>> +            /* Overflow occured, downscale excitation signal... */
>> +            for(j=0; j<2*MAX_SUBFRAME_SIZE+PITCH_DELAY_MAX+INTERPOL_LEN; j++)
>> +                ctx->exc_base[j] >>= 2;
>
> is this correct for both subframes?

Hm. It should affect only the excitation signal, not the reconstructed speech
of the previous subframe.
It looks like the "for" loop can be reduced to
MAX_SUBFRAME_SIZE+PITCH_DELAY_MAX+INTERPOL_LEN iterations with a proper offset
(see the sketch below). I'll test this soon.
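
Roughly what I have in mind (an untested sketch; the start offset,
i * ctx->subframe_size here, is exactly the part that still needs checking):

    /* Rescale only the PITCH_DELAY_MAX+INTERPOL_LEN history samples plus the
     * part of the buffer used by the current subframe, instead of the whole
     * exc_base buffer. */
    for (j = 0; j < MAX_SUBFRAME_SIZE + PITCH_DELAY_MAX + INTERPOL_LEN; j++)
        ctx->exc_base[j + i * ctx->subframe_size] >>= 2;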

>
>
>> +
>> +            ff_acelp_lp_synthesis_filter(
>> +                    synth+10,
>> +                    &lp[i][1],
>> +                    ctx->exc  + i * ctx->subframe_size,
>> +                    ctx->subframe_size,
>> +                    10,
>> +                    0,
>> +                    0x800);
>> +        }
>
>> +        /* Save data (without postfilter) for use in next subframe. */
>> +        memcpy(ctx->syn_filter_data, synth+ctx->subframe_size, 10 * sizeof(int16_t));
> [...]
>> +    /* Save signal for use in next frame. */
>
> "signal" and "data" are not acceptable descriptions, these 2 words are just
> random.
>
>
>> +    memmove(ctx->exc_base, ctx->exc_base + 2 * ctx->subframe_size, (PITCH_DELAY_MAX+INTERPOL_LEN)*sizeof(int16_t));
>> +}
>> +
>
>> +/**
>> + * \brief Decodes one G.729 frame (10 bytes long) into parameter vector.
>> + * \param format used format (8k/4.4k/etc)
>> + * \param buf 10 bytes of decoder parameters
>> + * \param buf_size size of input buffer
>> + * \param parm [out] decoded codec parameters
>> + *
>> + * \return 1 if frame erasure detected, 0 - otherwise
>> + */
>> +static int g729_bytes2parm(int format, const uint8_t *buf, int buf_size, G729_parameters *parm)
>> +{
>
> You mix in this description the word parameter and frame completely
> randomly, which considering this is said to convert a frame to parameters
> just cause the reader to waste time as he will try to make sense of it until
> he realizes that there is no sense in this and that reading the source is
> needed.

In the quoted comment I meant:
"G.729 frame (10 bytes long)" == the packed frame
"parameter vector" == a structure containing the coefficients unpacked from it
"frame" == two subframes (perhaps more; I haven't read all the Annexes yet)
== the unpacked (decoded) frame
Each frame covers 10 ms of speech (two 5 ms subframes) in the 8 kbit/s mode.

I didn't find parameter/frame wording mixed up in the other routines.


> besides format being what? the comment gives no hint on what to set it to
> then buf is said to be 10 bytes and buf_size is ?

"format" here is "index in formats structure"
buf_size==10 bytes for 8kHz case, but differs for 6.4kHz.

I'll rewrite this routine by putting init_get_bits to caller.
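
Just a rough sketch of the interface change I mean (the function name, the
use of ctx->format and the local parm variable are placeholders, not final):

    /* The caller sets up the bitreader; this routine only unpacks the
     * parameters of one packed frame into *parm.
     * Returns 1 if a frame erasure is detected, 0 otherwise. */
    static int g729_bits2parm(GetBitContext *gb, int format,
                              G729_parameters *parm);

    /* in the caller: */
    GetBitContext gb;
    G729_parameters parm;

    init_get_bits(&gb, buf, buf_size * 8); /* buf_size = packed frame size */
    if (g729_bits2parm(&gb, ctx->format, &parm)) {
        /* frame erasure handling goes here */
    }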


> I really dont know what i should do with your g729 patches, they need
> VERY heavy cleanup and the code that is already in svn probably still needs
> work.
> I dont really want to let these patches lay around with no review but as they
> are they need to be reviewed and cleaned up by the author or someone who knows
> g729 well enough to spot all bugs and suboptimal code without having to cross
> check the spec & reference implementation.

I can't say that I understand G.729 *well*; I'm still learning it.
Unfortunately all the available documentation is in English and contains a lot
of terms and algorithms that are unknown to me.

> You are not doing that as it seems, instead you wait until i painstakingly
> reverse engineer each loop and function, and list all contradictions, bugs
> and obviously suboptimal code one by one.
> Which is made even harder by having most of the comments being plain wrong
>
> I mean seriously i speak no russian but the comments could be in russian it
> wouldnt make a difference ;)
>
> So if you could read through your own code and try to clean it up, and
> fix comments that are wrong or unneeded or unclear, this would tremendously
> help! And it also would speed up review, because the cleaner the code, the
> fewer things are there i will complain about and the cleaner it is the
> better it can be understood which means faster review cycles.

I'll try to remove the 4400 mode and the BFI code first, and recheck the
comments in the remaining code in the next round.

-- 
Regards,
Vladimir Voroshilov     mailto:voroshil at gmail.com
JID: voroshil at gmail.com, voroshil at jabber.ru
ICQ: 95587719



