[FFmpeg-devel] [PATCH] avcodec: Vorbis decode: don't use a flag to determine if frames have been output

Mon Oct 17 11:15:22 EEST 2022

On Thu, Sep 8, 2022 at 10:26 AM <jyrkive at nekonyansoft.com> wrote:
>
> From: Jyrki Vesterinen <jyrkive at nekonyansoft.com>
>
> If a developer using FFmpeg libraries seeks into an earlier position and calls
> avcodec_flush_buffers() afterwards as recommended, the Vorbis decoder will drop
> the next frame, since buffer flushing clears the first_frame flag. As a result,
> the audio samples the calling code receives may be ahead of the requested seek
> position, which is unacceptable in some use cases such as playing a looping
> sound effect.
>
> This commit removes the first_frame flag entirely and instead uses the
> presentation timestamp to determine if it's the first frame.
> ---
>  libavcodec/vorbisdec.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/libavcodec/vorbisdec.c b/libavcodec/vorbisdec.c
> index 4d03947c49..d4b030d7b9 100644
> --- a/libavcodec/vorbisdec.c
> +++ b/libavcodec/vorbisdec.c
> @@ -130,7 +130,6 @@ typedef struct vorbis_context_s {
>      AVFloatDSPContext *fdsp;
>
>      FFTContext mdct[2];
> -    uint8_t       first_frame;
>      uint32_t      version;
>      uint8_t       audio_channels;
>      uint32_t      audio_samplerate;
> @@ -1845,8 +1844,7 @@ static int vorbis_decode_frame(AVCodecContext *avctx, AVFrame *frame,
>      if ((len = vorbis_parse_audio_packet(vc, channel_ptrs)) <= 0)
>          return len;
>
> -    if (!vc->first_frame) {
> -        vc->first_frame = 1;
> +    if (frame->pts < 0) {
>          *got_frame_ptr = 0;
>          av_frame_unref(frame);
>          return buf_size;
> @@ -1881,7 +1879,6 @@ static av_cold void vorbis_decode_flush(AVCodecContext *avctx)
>                               sizeof(*vc->saved));
>      }
>      vc->previous_window = -1;
> -    vc->first_frame = 0;
>  }
>
>  const FFCodec ff_vorbis_decoder = {
> --
> 2.37.2.windows.2
>

This change seems to be rather fragile and faulty, causing vorbis
decoding to fail in various scenarios for a bunch of downstream
projects.

- A user may not set pts at all, resulting in all frames being dropped
(pure audio files don't necessarily need timestamps)
- A seek could happen before any frame is ever decoded, resulting in
wrong drops, potentially in the middle of playback if the user seeks
backwards after opening in the middle.

In general, using timestamps to control decoder behavior is often just
wrong, as timestamps are not reliable, and most importantly, not tied
to the bitstream at all.

Can we revert this and re-think the approach?

- Hendrik