[FFmpeg-devel] [PATCH] Frame erasure case of pitch delay decoding

Vladimir Voroshilov voroshil
Mon Jun 29 00:41:48 CEST 2009


2009/6/29 Michael Niedermayer <michaelni at gmx.at>:
> On Sun, Jun 28, 2009 at 03:11:31PM +0700, Vladimir Voroshilov wrote:
>> 2009/6/28 Vladimir Voroshilov <voroshil at gmail.com>:
>> > 2009/6/28 Michael Niedermayer <michaelni at gmx.at>:
>> >> On Sat, Jun 27, 2009 at 09:54:06AM +0700, Vladimir Voroshilov wrote:
>> >>> 2009/6/27 Michael Niedermayer <michaelni at gmx.at>:
>> >>> > On Sat, Jun 27, 2009 at 01:49:49AM +0700, Vladimir Voroshilov wrote:
>> >>> >> Updated patch.
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Regards,
>> >>> >> Vladimir Voroshilov ? ? mailto:voroshil at gmail.com
>> >>> >> JID: voroshil at gmail.com, voroshil at jabber.ru
>> >>> >> ICQ: 95587719
>> >>> >
>> >>> >> ?g729dec.c | ? ?7 ++++++-
>> >>> >> ?1 file changed, 6 insertions(+), 1 deletion(-)
>> >>> >> c66e04bc9abc50854d2bd35ac5299ae0e4b28637 ?0005-Frame-erasure-support-for-pitch-delay-decoding.176.patch
>> >>> >> From aa4b23873bc3c5e103a5756ade017558bda12961 Mon Sep 17 00:00:00 2001
>> >>> >> From: Vladimir Voroshilov <voroshil at gmail.com>
>> >>> >> Date: Thu, 11 Jun 2009 13:51:28 +0700
>> >>> >> Subject: [PATCH 05/25] Frame erasure support for pitch delay decoding
>> >>> >>
>> >>> >>
>> >>> >> diff --git ffmpeg-r19281/libavcodec/g729dec.c ffmpeg-r19281_v176/libavcodec/g729dec.c
>> >>> >> index b937235..f193123 100644
>> >>> >> --- ffmpeg-r19281/libavcodec/g729dec.c
>> >>> >> +++ ffmpeg-r19281_v176/libavcodec/g729dec.c
>> >>> >> @@ -306,7 +306,9 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *data_size,
>> >>> >> ? ? ? ? ?gc_1st_index ?= get_bits(&gb, format.gc_1st_index_bits);
>> >>> >> ? ? ? ? ?gc_2nd_index ?= get_bits(&gb, format.gc_2nd_index_bits);
>> >>> >>
>> >>> >> - ? ? ? ?if(!i) {
>> >>> >> + ? ? ? ?if (frame_erasure)
>> >>> >> + ? ? ? ? ? ?pitch_delay_3x ? = 3 * ctx->pitch_delay_int_prev;
>> >>> >> + ? ? ? ?else if(!i) {
>> >>> >> ? ? ? ? ? ? ?if (bad_pitch)
>> >>> >> ? ? ? ? ? ? ? ? ?pitch_delay_3x ? = 3 * ctx->pitch_delay_int_prev;
>> >>> >
>> >>> > if(!i || frame_erasure)
>> >>> >
>> >>> > also, i think bad_pitch and frame_erasure are redundant that is the same
>> >>> > either your frame is damaged or not
>> >>> > If you disagree, id like to hear which real life system uses g729 and
>> >>> > can suffer from single bit errors not passing through a FEC code
>> >>> > (that stuff might exist but my gut feeling says it does not, so unless
>> >>> > ?you have an example i prefer the simpler code)
>> >>>
>> >>> Afaik, bad_pitch is rsult of checking most significant part of input
>> >>> parameters: vector of quantizer
>> >>> along with switched MA predictor. Getting garbage here will cause
>> >>> large quality degradaion.
>> >>> On the other side remaining parts will bring far less noise.
>> >>> I can guess that having something data in frame (even with bad vector
>> >>> of quantizer) from the standpoint of G.279 authors is still better
>> >>> than entire frame erasure.
>> >>
>> >> you keep implicitly assuming that single bit errors happen, so let me
>> >> ask again, do you know of such a case? maybe its a common thing in some
>> >> scenario but if not a parity error in any part of the frame means that
>> >> the rest of the frame will likely be random as well
>> >
>> > Well spec says (4.1.6 Computation of the parity bit):
>> > "If this bit is not identical to transmitted parity bit, it is more
>> > likely that bit
>> > _errorS_ occured during transmission." And then spec describes how to
>> > compute pitch delay of the first subframe
>> > in this case (it is also clearly says that pitch ?delay for second
>> > subframe is calculated as in common case).
>> > This is the _only_ place when parity bit usage is described. Thus
>> > bad_pitch should affect only pitch_delay.
>> >
>> > On the other side spec does not say what to do with pitch delay if
>> > frame erasure is detected.
>> > Reference code uses the same computation (as for bad parity bit) for
>> > both first and second subframes.
>> > This is the only way to avoid usage of adaptive-codebook index in the
>> > second subframe.
>> >
>> >> did you run any tests of burst error, damaged sectors/packets in terms
>> >> of subjective quality for the different variants? has ITU performed
>> >> such tests? and if so are they documented somewhere?
>> >
>> > I did only tests with random frame_erasure flag. I didn't remember
>> > exact results.
>> > If i'm not wrong, most sample remains understandable with 50% frames erased
>> > and even more for some samples.
>> >
>> > I didn't ?test bad_pitch, except ?in "pitch" ITU test, which
>> > definitely tests only specific parts
>> > of code and nothing more.
>> >
>> > And i don't know any ITU test results you saying about.
>
> summary, you didnt test it, and you dont know if ITU did nor what the
> reasoning is behind doing it like that. The spec contains it but we
> dont know if its normative or just a suggestion on how to handle errors

I tested it loooooong time before, about one year ago, when you asked me, but
didn't save them.

> sorry thats too little, i cannot approve an implementation of this part
> unless its better understood.

What is "this part" ? All frame erasure code or just this patch ?

> The understanding is needed to decide which way it should be handled.
> Also i feel its a little annying to skip 2 min testing and rather try to
> beat your view through basically repeating that you dont know at all
> why it would be better.

I don't know what test are you expecting.

This is results of several hours testing.
I've checked three modes:
1. erasure of given % of frames.
2. inverting give % of bits in each packet (code could invert the same
bits several times).
3. (3) but with parity_check disabled.

Tests were done on SPEECH ITU's test vector and results were compared
with "good" code via
tiny_psnr original.sw test.sw 2 0 0

1. % of erased frames
05%: stddev:  772.97 PSNR: 38.56 bytes:   600000/   600000
10%: stddev: 1339.89 PSNR: 33.78 bytes:   600000/   600000
30%: stddev: 1877.97 PSNR: 30.85 bytes:   600000/   600000
50%: stddev: 2105.37 PSNR: 29.85 bytes:   600000/   600000

2. % of inverted bits in each frame
05%: stddev: 2024.41 PSNR: 30.19 bytes:   600000/   600000
10%: stddev: 2450.80 PSNR: 28.53 bytes:   600000/   600000
30%: stddev: 2609.33 PSNR: 27.99 bytes:   600000/   600000
50%: stddev: 2611.21 PSNR: 27.98 bytes:   600000/   600000


3. % of inverted bits in each frame without parity check
05%: stddev: 2202.27 PSNR: 29.46 bytes:   600000/   600000
10%: stddev: 2512.19 PSNR: 28.32 bytes:   600000/   600000
30%: stddev: 2599.65 PSNR: 28.02 bytes:   600000/   600000
50%: stddev: 2663.32 PSNR: 27.81 bytes:   600000/   600000


Subjective results:
1. 50% - sound too quiet, hard to listen. In all other speech can be
easily recognised 30% of (1) is, perhaps, even better than 05% of (2).
2. 05% - contains foreign sounds, but quality is enough  to listen.
30% and 50% - hard to understand speech.
3. same as for (2)

I think, frame erasure test gives good results because "bad frames"
interleaves with "good" frame, which "repairs" speech,
while in others tests all frames are damaged. More over, i think that
05% (= 4 bits per each frame) is significant stream corruption.

Somewhere in G.729 spec i saw sentence that "fixed-codebook  was
trained on stream with 0.1% of bit errors"

If i'm not wrong g.729 is more or less stable with small number of
random bit errors.
And note, that parity bit protects about 17% of  g.729 frame (this is
the size of ac_index)

On the other side frame erasure is good against huge (when most data
of single frame is corrupted)
but rare frame corruption.

Unfortunately, spec does not cover frame erasure detection.

If you need any other tests, help me to make them.

>> >>> If you can see in case of frame erasure pitch delay is computed
>> >>> differently for both subframes, while bad_pitch
>> >>> keeps computing of pitch_delay for second subframe the same.
>> >>>
>> >>> And here we again comes to the question about bitexactness.
>> >>> If it is possible i prefer creating decoder which passes all ITU
>> >>> tests, and ONLY after that
>> >>> break it (decoder) by introducing desired optimizations.
>> >>
>> >> iam not stoping you from creating such decoder
>> >> but for ffmpeg i like to have an independant, optimal and conformant
>> >> implementation of the spec, if that includes bitexactness depends on
>> >> the spec requireing it and there being any advantage in breaking such
>> >> bitexactness to one of many implementations
>> >
>> > Hmm. About test vectors:
>> > ====
>> > This directory contains testvectors to validate the correct execution
>> >
>> > of the G.729 ANSI-C software (version 3.3). NOTE that these vectors
>> >
>> > are not part of a validation procedure. It is very difficult to design
>> >
>> > an exhaustive set of test vectors. Hence passing these vectors should
>> >
>> > be viewed as a minimum requirement, and is not a guarantee that the
>> >
>> > implementation is correct for every possible input signal.
>> >
>> >
>> > [snip]
>> >
>> > The testvectors were designed to provide as much coverage as possible
>> >
>> > in terms of parameters and algorithm. Below we indicate what parts of
>> >
>> > the algorithms are excercised. Note that none of these sequences
>> >
>> > provides an exhaustive coverage.
>> >
>> > ===
>> >
>> > "not part of a validation procedure", "should be viewed as minimum requirement"
>> > I'm in doubt again, whether passing test vector required or not for independent
>> > implementations.
>>
>> OMG. Just find another thing. In Annex D (6.4k mode) D.7 paragraph also says:
>> The ANSI C code
>> represents the normative specification of this annex. The algorithmic
>> description given by the C
>> code shall take precedence over the texts contained in the main body
>> of G.729 and in Annex D.
>>
>> Thus, when writing 6.4k decoder i should follow C code rather than specification
>> when they are saying different things, should i ?
>
> Id by happy if you followed any algorithmic description instead of following
> the implementation statement by statement while making a huge effort not to
> think about the algorithm behind.
>
> [...]
> --
> Michael ? ? GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> I have often repented speaking, but never of holding my tongue.
> -- Xenocrates
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iD8DBQFKR9IdYR7HhwQLD6sRAkxSAJ977WAENTIX+wK/ERdmPSTMX+QSoACcCqRn
> ELj+t/k4sD0AdCwQbSON9MY=
> =v/jw
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at mplayerhq.hu
> https://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-devel
>



-- 
Regards,
Vladimir Voroshilov     mailto:voroshil at gmail.com
JID: voroshil at gmail.com, voroshil at jabber.ru
ICQ: 95587719



More information about the ffmpeg-devel mailing list