[FFmpeg-devel] [PATCH] WIP: subtitles in AVFrame

wm4 nfxjfg at googlemail.com
Mon Nov 21 13:48:53 EET 2016


On Sun, 20 Nov 2016 13:52:02 +0100
Clément Bœsch <u at pkh.me> wrote:

> On Fri, Nov 11, 2016 at 04:46:51PM +0100, wm4 wrote:
> [...]
> > > +
> > > +            frame->sub_start_display = sub.start_display_time ? av_rescale_q(sub.start_display_time, av_make_q(1, 1000), AV_TIME_BASE_Q)
> > > +                                                              : AV_NOPTS_VALUE;
> > > +            frame->sub_end_display   = sub.end_display_time   ? av_rescale_q(sub.end_display_time,   av_make_q(1, 1000), AV_TIME_BASE_Q)
> > > +                                                              : AV_NOPTS_VALUE;  
> > 
> > end_display_time can be UINT32_MAX for "infinite". I think we should
> > clean this up here, and differentiate between "0", "unknown", and
> > "until next event".
> >   
> 
> I don't even know what's the convention today, so this part is currently
> mostly ignored from this first prototype.

dvdsubdec.c leaves end_display_time simply at 0 in some cases (which
means show until next event), while pgssubdec.c always sets it to
UINT32_MAX (which apparently means the same). I don't know why it
was done this way (see 0c911c8fbc).

> What would be the difference between "infinite" and "until next
> (potentially clear) event"?

Probably none.

The important part for non-infinite end time values is whether the
subtitles get strictly replaced by other (later) subtitles, or whether
they overlap.

With the existing API, you have to guess by the codec ID.

> 
> [...]
> > > +static int get_subtitle_buffer(AVFrame *frame)
> > > +{
> > > +    int i, ret;
> > > +
> > > +    ret = get_data_buffer(frame, frame->sub_nb_rects, sizeof(AVFrameSubtitleRectangle));
> > > +    if (ret < 0)
> > > +        return ret;
> > > +    for (i = 0; i < frame->sub_nb_rects; i++)
> > > +        memset(frame->extended_data[i], 0, sizeof(AVFrameSubtitleRectangle));
> > > +    return 0;  
> > 
> > I'm noticing it doesn't allocate the actual subtitle bitmap or text
> > data, which is a difference from audio/video.
> >   
> 
> Yes, exactly. This is tricky. It's really tricky because contrary to audio
> and video, we can not know the size required for every (not yet allocated)
> rectangle from the raw format information alone.
> 
> For video, we have the pixel format, and dimensions. For audio we have the
> sample format and number of samples. But for subtitles, we can at most
> have the number of rectangles.
> 
> To workaround this problem, I just made the get_buffer callback allocate
> the rectangles holder. The decoder will then allocate every rectangle data
> (of different sizes) manually, be it text or actual bitmap rectangle.
> 
> But indeed this means that every ref counting will mean a new allocation
> and copy of the rectangles. As subtitles are meant to be small and sparse,
> I'm assuming it's not exactly a problem.
> 
> Do you have a better suggestion?
> 
> Maybe we could have the user (decoders) allocate the rectangles holder and
> fill all the sizes instead, and then call the get buffer, but this might
> be annoying as it will require for most of them to do it in two-pass
> instead of one as of today.
> 

Seems to me that av_frame_get_buffer() would just have confusing
semantics here, and it'd be better to have dedicated allocation
functions.

But the important part is that the subtitle data is actually properly
refcounted/freed, and that things like copy-on-write actually work.

> > >  }
> > >  
> > >  int av_frame_get_buffer(AVFrame *frame, int align)
> > >  {
> > > +    if (frame->sub_nb_rects)
> > > +        return get_subtitle_buffer(frame);  
> > 
> > No format set is valid for subtitles?
> >   
> 
> Yes, no format set means text. If the format is set, it defines the format
> used by the rectangles.

Seems slightly questionable.

> [...]
> > > +    /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */
> > > +    uint8_t *data[AV_NUM_DATA_POINTERS];
> > > +    int linesize[AV_NUM_DATA_POINTERS];  
> > 
> > Does it define the format somewhere?
> >   
> 
> in AVFrame.format

Right. 

> > > +} AVFrameSubtitleRectangle;  
> > 
> > The struct should probably be extendable (so its size is not part of
> > the ABI).
> >   
> 
> In the current design, there is no need to ever allocate them: you set the
> number of rectangles in your AVFrame, then call av_frame_get_buffer()
> which will allocate them.

You still need to make this explicit. Also, allocation isn't the only
operation you can do on a struct.

> > > @@ -224,9 +242,12 @@ typedef struct AVFrame {
> > >       * For packed audio, there is just one data pointer, and linesize[0]
> > >       * contains the total size of the buffer for all channels.
> > >       *
> > > +     * For subtitles, each pointer corresponds to an AVFrameSubtitleRectangle.
> > > +     *  
> > 
> > It should also mention how many data pointers there are.
> > 
> > Btw. I'm very not much of a fan to have to deal with extended_data. And
> > the duplication with data. (Does it even document what data[] contains
> > for subtitles?)
> >   
> 
> It's exactly like audio: data[] points on the firsts extended_data[]

Well, for audio and video, I have my own structs instead of using
AVFrame (because AVFrame is very restrictive, not extensible, and hard
to use). And wiring up these extended* things is terrible. Actually,
the worse offenders are buf/extended_buf because of their weird
semantics.


More information about the ffmpeg-devel mailing list