[FFmpeg-devel] How to support audio data placed within video data

Sat Nov 30 10:50:00 EET 2024

Quoting Manuel Lauss (2024-11-28 21:58:09)
> On Thu, Nov 28, 2024 at 3:19 PM Anton Khirnov <anton at khirnov.net> wrote:
> >
> > Quoting Manuel Lauss (2024-11-26 15:25:30)
> > > Hello,
> > >
> > > I'd like to add some audio support for the old libavformat/smush
> > > formats (mainly the "ANIM" variants; the "SANM" variant already has
> > > audio decoding support).
> > >
> > > The audio data (16bit stereo PCM) however is placed at (more or less)
> > > random places within all the video data, also with no relation to the
> > > actual video frame it is embedded into (i.e. most files place a few
> > > hundred ms of audio in the first video frame, while the rest are
> > > roughly the length of a video frame).
> > >
> > > What is the best way to support this scenario?
> >
> > Meaning you have to parse the coded bytestream to get the audio? Is
> > there at least some signalling that audio is present at all?
> 
> No, audio and video are distinct chunks which are again contained
> in a super chunk. This super chunk (a "FRME" since it encodes exactly
> one video frame) is passed by smush.c as a video packet to sanm.c.
> It makes some sense, since besides pure video/audio data it also
> contains (delta-)palette data, instructions to store/restore a frame,
> subtitles for this video frame, ... and also a few kB of audio which,
> although it sits in this super chunk, is not tied (on the timeline) to
> this video frame. Nor is it a full standalone audio packet: it may depend
> on data from the previous FRME and also provide a few bytes of data
> for the following one.
> (I think this was designed for streaming from slow CD-ROMs.)
> 
> The end of one super chunk then signals to display the video/audio data
> worked on so far.

Then it seems preferable to have the demuxer extract the audio.
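A minimal sketch of that demuxer-side split in plain C, assuming each FRME payload is a sequence of IFF-style subchunks (4-byte big-endian tag, 4-byte big-endian size, data padded to an even length). The audio tag `PSAD` and the helper `split_frme()` are illustrative assumptions, not taken from smush.c; a real demuxer would read through FFmpeg's AVIOContext and packet helpers rather than raw buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack four characters into a big-endian 32-bit chunk tag. */
#define TAG4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8)  |  (uint32_t)(d))

/* ASSUMPTION: tag of the embedded audio subchunk (illustrative only). */
#define AUDIO_TAG TAG4('P', 'S', 'A', 'D')

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: walk the subchunks of one FRME payload (the bytes
 * after the FRME header) and copy the payload of every audio subchunk into
 * `audio`. Returns the number of subchunks visited, or -1 if a subchunk is
 * truncated or the audio does not fit into `audio_cap` bytes. */
static int split_frme(const uint8_t *buf, size_t size,
                      uint8_t *audio, size_t audio_cap, size_t *audio_len)
{
    size_t pos = 0;
    int n = 0;

    *audio_len = 0;
    while (size - pos >= 8) {
        uint32_t tag = read_be32(buf + pos);
        uint32_t len = read_be32(buf + pos + 4);

        pos += 8;
        if (len > size - pos)
            return -1;                      /* subchunk runs past the FRME */
        if (tag == AUDIO_TAG) {
            if (len > audio_cap - *audio_len)
                return -1;                  /* caller's audio buffer too small */
            memcpy(audio + *audio_len, buf + pos, len);
            *audio_len += len;
        }
        /* Non-audio subchunks (palette, video, subtitles, ...) are skipped
         * here; they would stay in the video packet. */
        pos += len;
        if ((len & 1) && pos < size)
            pos++;                          /* ASSUMPTION: IFF-style even padding */
        assert(pos <= size);
        n++;
    }
    return n;
}
```

Everything that is not audio would remain in the video packet, so each video packet still carries exactly one frame.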

> > The options I can think of are:
> > * parse the bytestream in the demuxer
> 
> So pass the individual chunks of the super chunk off to the
> video/audio codec as packets?  Can I invoke the codecs
> decode function multiple times per frame?

In principle yes, but we generally prefer (when feasible) for a packet
to contain exactly one frame. Though I don't quite see why that should
be needed - you're saying above that the super chunk contains exactly
one video frame.
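Since the audio is plain 16-bit stereo PCM (4 bytes per sample frame), the demuxer is free to choose the audio packet boundaries itself. The sketch below uses the hypothetical names `AudioAccum` and `accum_push()` and assumes the cross-FRME dependency is simply a byte stream split at arbitrary offsets. It emits only whole sample frames, carries any remainder over to the next FRME, and derives timestamps by counting samples, so with a time base of 1/sample_rate the pts is just the running sample count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BYTES_PER_SAMPLE_FRAME 4   /* 16-bit samples x 2 channels */

/* Hypothetical accumulator for PCM bytes gathered from successive FRMEs.
 * ASSUMPTION: audio spanning FRMEs is a plain byte stream split anywhere. */
typedef struct AudioAccum {
    uint8_t carry[BYTES_PER_SAMPLE_FRAME]; /* partial sample frame left over */
    size_t  carry_len;
    int64_t next_pts;                      /* in samples (time base 1/sample_rate) */
} AudioAccum;

/* Append the audio bytes extracted from one FRME. `out` must hold at least
 * carry_len + len bytes. Only whole sample frames are reported as output;
 * the remainder is kept for the next call. *pts receives the timestamp of
 * the first emitted sample. Returns the number of valid bytes in `out`. */
static size_t accum_push(AudioAccum *acc, const uint8_t *data, size_t len,
                         uint8_t *out, int64_t *pts)
{
    size_t total = acc->carry_len + len;
    size_t whole = total - total % BYTES_PER_SAMPLE_FRAME;

    /* Leftover bytes from the previous FRME come first. */
    memcpy(out, acc->carry, acc->carry_len);
    memcpy(out + acc->carry_len, data, len);

    /* Keep the trailing partial sample frame for the next FRME. */
    acc->carry_len = total - whole;
    memcpy(acc->carry, out + whole, acc->carry_len);
    assert(acc->carry_len < BYTES_PER_SAMPLE_FRAME);

    *pts = acc->next_pts;
    acc->next_pts += (int64_t)(whole / BYTES_PER_SAMPLE_FRAME);
    return whole;
}
```

Counting samples instead of tying audio pts to the FRME index keeps the audio timeline independent of where the bytes happened to be stored, which matches the description of audio that is not aligned with its enclosing video frame.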

-- 
Anton Khirnov