[Libav-user] Advice on video decoding order madness

Thu Jun 26 21:35:00 CEST 2014

On Thu, 26 Jun 2014 21:15:45 +0200
Info || Non-Lethal Applications <info at non-lethal-applications.com>
wrote:

> 
> On 26 Jun 2014, at 20:35, wm4 <nfxjfg at googlemail.com> wrote:
> 
> > On Thu, 26 Jun 2014 17:29:29 +0200
> > Info || Non-Lethal Applications <info at non-lethal-applications.com>
> > wrote:
> > 
> >> Hi there guys,
> >> 
> >> I wrote my own audio/video player based on FFmpeg/libav.
> >> It basically works, but I’d like your advice on some decoding issues I ran into.
> >> 
> >> Being a video-coding newbie, I first thought that it would be wise to just have one function returning decoded video frames at given indices and another giving me audio samples at given indices.
> >> However, I came to realize that randomly accessing the movie file does not work that well and that the file stream should be read in order.
> >> I have to mention that I’m not 100% sure if I did everything the right way though.
> >> 
> >> I then changed my strategy to have one function returning video frames.
> >> The function reads the video frames and also tries to decode the audio packets that “belong” to that video frame.
> >> So that when a given frame is to be presented, the audio can also be played back in sync with the video.
> >> 
> >> This worked pretty well for a couple of files. I found out that these files had the audio info before the video info it belongs to.
> >> So I always read the audio automatically when I read the next video frame.
> >> 
> >> I then opened a DV file with two audio tracks and there was silence … I noticed that there were 29 video frames before the first piece of audio.
> >> After a couple of thousand audio samples for the first track, there were a couple of thousand audio samples for the second track.
> >> 
> >> Can anyone with more experience in this field please tell me what’s the best way to read audio and video in this case so that it can be played back in sync properly?
> >> I could of course cache 29 video frames, but I don’t think that’s the best solution. And what if in the next file, there’s 50 frames?
> >> Is it better to just seek and work on a random access basis or does this totally confuse the decoder?
> >> 
> >> Any advice is highly appreciated!!
> > 
> > You can decode the audio and video independently from each other. You
> > should synchronize the _presentation_ of them using the PTS (PTS =
> > presentation timestamp, after all). So audio of a given PTS is played
> > at the same time as a video frame with the same PTS is displayed.
> > 
> > The simplest way to keep them synchronized is to play the audio as it
> > is, and then adjust the video timestamps to the currently playing audio.
> 
> Thanks for your answer. 
> Playing back is not the issue. I already have a working solution for that.
> 
> Could you elaborate a bit more on your first part please? 
> “You can decode the audio and video independently from each other.”

Well, you can just decode for whatever you get packets for. Or you
could decode them in separate threads (you just need something to
distribute the packets read by libavformat to the correct decoder).
Basically, there is no "order" in which they should be decoded.
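
To make the "distribute the packets" part concrete, here is a minimal sketch in Python rather than against the real libav API. The `Packet` tuple, the `dispatch` function and the lambda "decoders" are hypothetical stand-ins for an AVPacket, your read loop and your real decoders; the only point is that each packet goes to the decoder for its own stream, in whatever order the file delivers it.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a demuxed packet; a real player would get
# AVPacket structs from libavformat instead.
Packet = namedtuple("Packet", "stream_index pts payload")

def dispatch(packets, decoders):
    """Hand each packet to the decoder registered for its stream.

    `decoders` maps stream index -> callable. Packets for streams without
    a decoder are dropped. Returns the decoded output grouped per stream.
    """
    decoded = defaultdict(list)
    for pkt in packets:
        decode = decoders.get(pkt.stream_index)
        if decode is None:
            continue  # stream not selected for playback
        decoded[pkt.stream_index].append(decode(pkt))
    return decoded

# Interleaving as it might appear in a file: video first, audio later.
packets = [
    Packet(0, 0, "v0"), Packet(0, 40, "v1"),
    Packet(1, 0, "a0"), Packet(0, 80, "v2"), Packet(2, 0, "sub"),
]
decoders = {
    0: lambda p: ("video", p.pts),  # placeholder for a video decoder
    1: lambda p: ("audio", p.pts),  # placeholder for an audio decoder
}
out = dispatch(packets, decoders)
```

Neither decoder cares how the other stream is interleaved; the timestamps carried along with the output are what you would use later to line up presentation.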

> How would I do that? My main problem is the decoding order.
> As written above, I have a movie file with 29 video frames before the first audio frames.
> To read the audio information without throwing anything away, I’d need to cache 29 video frames.
> Is this really the way to go?

Oh I see. To avoid memory bloat, you could queue undecoded packets.
Then you'd read packets until you have your first video frame and your
first audio frame. If you have a video frame, but no audio yet, you can
keep reading packets, and put video packets into a queue (so that they
can be decoded later). How you synchronize playback start is up to you.
If the first video PTS is different from the first audio PTS, you have
to decide what you prefer. Throw away video, throw away the audio,
insert silence/blackness, or start them both at the same time
and let normal A/V sync figure it out; whatever.

With badly interleaved files, this approach can end up buffering a
large number of packets. In that case I'd just set a maximum queue size
and, once it is reached, start decoding with whatever has arrived,
without waiting for the missing video or audio.
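
A rough sketch of that queueing idea, again in plain Python and not tied to the libav API. `Packet`, `prebuffer` and the `max_queued` default of 64 are made up for illustration; pick a cap that suits your memory budget.

```python
from collections import deque, namedtuple

# Hypothetical stand-in for an undecoded AVPacket.
Packet = namedtuple("Packet", "stream pts")

def prebuffer(packets, wanted=("video", "audio"), max_queued=64):
    """Queue undecoded packets until every wanted stream has one, or the cap is hit.

    Returns (queues, ready). `queues` maps stream name -> deque of packets in
    read order. `ready` is False if the cap was reached or the input ended
    before every wanted stream delivered a packet.
    """
    queues = {name: deque() for name in wanted}
    queued = 0
    for pkt in packets:
        if pkt.stream not in queues:
            continue  # ignore streams we are not going to play
        queues[pkt.stream].append(pkt)
        queued += 1
        if all(queues.values()):  # an empty deque is falsy
            return queues, True
        if queued >= max_queued:
            break  # give up waiting; start with what we have
    return queues, False

# A file like the DV one described: 29 video packets before the first audio.
dv_like = [Packet("video", i * 40) for i in range(29)] + [Packet("audio", 0)]
queues, ready = prebuffer(dv_like)
capped_queues, capped_ready = prebuffer(dv_like, max_queued=10)
```

With the default cap, the 29 video packets sit in the queue undecoded until the first audio packet shows up. With `max_queued=10`, the function returns early with `ready` set to False, and playback would start with video only.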

> Should I read a video frame, seek to the audio frame, read it, seek back to the next video frame … ?

No, that won't work. Seeking usually jumps to keyframes, so you cannot
rely on landing exactly on the packet you were aiming for.
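
To illustrate why, here is a small Python model of a keyframe-based seek. It is only an approximation of what a backward seek (for example av_seek_frame with AVSEEK_FLAG_BACKWARD) tends to do; the exact behaviour depends on the demuxer. The keyframe positions, the 40 ms frame duration and the `seek_landing` helper are invented for the example.

```python
from bisect import bisect_right

def seek_landing(keyframe_pts, target_pts):
    """Return the PTS where a backward keyframe seek would land.

    Assumes `keyframe_pts` is sorted. Picks the last keyframe at or before
    the target, falling back to the first keyframe if the target is earlier.
    """
    i = bisect_right(keyframe_pts, target_pts)
    return keyframe_pts[max(i - 1, 0)]

# Hypothetical file: 25 fps (40 ms per frame), one keyframe every 12 frames.
keyframes = [0, 480, 960]
landing = seek_landing(keyframes, 720)
frames_to_redecode = (720 - landing) // 40
```

In this model, asking for 720 ms lands on the keyframe at 480 ms, and six frames have to be decoded again before the requested one is reached. Doing that once per frame, alternating between audio and video positions, would repeat this work every time.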