[FFmpeg-devel] [PATCH v5 00/12] Subtitle Filtering

Thu Sep 16 13:20:17 EEST 2021

Hendrik Leppkes (12021-09-15):
> Or perhaps some people have a day job, a life and other obligations
> that prevent them from spending time on FFmpeg every day, especially
> outside the weekend.
> But no, that can't be it, surely we are all just evil. /s

> Nothing here is being invented. You are trying to push a major API
> change to the core functionality of our libraries, these rules on how
> to do API changes on that level (or any level, really) have been in
> place and followed by everyone for years and years.
> If you don't believe that, feel free to check how any other major API
> change was handled in the past. Because the overall pattern is always
> the same.
> 
> Do you think I'm the only one thinking that way, only because I spoke
> up? The set certainly wouldn't have been applied if I hadn't said
> anything, it would've just sat there. Maybe someone else might've come
> forward eventually.

Thank you for having the patience and taking the time to explain this.

> As for the actual subject:
> - Part1, add subtitles to AVFrame in avutil. You can move
> enums/structs to avutil as needed (eg AVSubtitleRect or whatever), as
> long as they are still available the same way (eg. avcodec.h should
> include any new avutil header with them, so user code works without
> changes), but moving functions is a more tricky business, so we
> usually make new functions and cleanup their naming convention in the
> process, but I don't think any functions are involved here right now.
> 
> To dive a bit deeper into this, redundant fields should be removed,
> actual subtitle data should be refcounted (with AVBuffers in
> AVFrame->buf), all frame functionality need to properly account for
> the new data type.

There is another point to consider when designing subtitles in AVFrame:
since we intend it not only as a general API cleanup but as a prelude to
extending the API with filtering and such, we must not only think about
what is needed now, we must think about what may be needed later.

A few examples I have in mind, without excluding questions I have not
thought of:

- Right now, we have text subtitles and bitmap subtitles. But do we want
  to be able to handle mixed text and bitmap subtitles?

  To this, I would say: probably no. But we should give it a thought.

- Number of rectangles. Currently, the decoders usually output a very
  limited number of rectangles. OTOH, libass may output three alpha maps
  per glyph, which potentially makes hundreds of rectangles.

  Will the data structure be able to handle this efficiently?

  Consider also the issue of rectangles overlapping.

- Speaking of overlapping, we need a system to signal whether a new
  subtitle frame should replace the current one (like dvdsub, srt, etc.)
  or overlap with it (like ASS).

- Colorspace. All current bitmap subtitle formats are paletted. But
  paletted pixel formats are bad for many kinds of processing. Inside a
  filter chain, it would probably make sense to have them in a true
  color format.

- Global styles. ASS subtitles, in particular, contain references to
  globally-defined styles. How do we handle that in libavfilter? I have
  no idea.

- Sparseness. Subtitle streams have gaps, and synchronization with
  other streams requires a next frame, which can be minutes away or
  never come. This needs to be solved in a way compatible with
  processing.
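
To make a few of the questions above concrete, here is a minimal sketch
in C of what a subtitle frame would have to carry for the rectangle
count, the overlap test and the replace/overlap signal. Every name in it
is hypothetical and made up for illustration; none of it is existing
FFmpeg API, and it is not a proposal for the final layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none of them exist in FFmpeg. */

enum SketchSubType { SKETCH_SUB_TEXT, SKETCH_SUB_BITMAP };

/* Does a new frame replace what is displayed (dvdsub, srt) or add to it (ASS)? */
enum SketchSubMode { SKETCH_SUB_REPLACE, SKETCH_SUB_OVERLAP };

typedef struct SketchSubRect {
    int x, y, w, h;        /* position and size on the canvas */
    const uint8_t *data;   /* payload; interpretation depends on the frame type */
} SketchSubRect;

typedef struct SketchSubFrame {
    enum SketchSubType type;
    enum SketchSubMode mode;
    int64_t start_ms, end_ms;  /* display interval */
    size_t nb_rects;           /* may be hundreds with per-glyph output */
    SketchSubRect *rects;
} SketchSubFrame;

/* Returns nonzero if the two rectangles share at least one pixel. */
int sketch_rects_overlap(const SketchSubRect *a, const SketchSubRect *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

/* Number of subtitle frames on display once `next` has arrived. */
size_t sketch_active_after(size_t active, const SketchSubFrame *next)
{
    assert(next != NULL);
    return next->mode == SKETCH_SUB_REPLACE ? 1 : active + 1;
}
```

The sketch deliberately leaves out colorspace, global styles and
sparseness: those are exactly the points where I do not know yet what
the structure should look like.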

> - Part3, avfilter support for subtitles in AVFrames. At this point we
> have a defined structure to store subtitles in AVFrames, and actual
> code that can generate or consume them. When approaching this, the
> same rules apply as before, existing subtitle functionality, as crude
> as it may be, has to remain functional as exposed to the user.
> Internal under-the-hood changes are fine, as long as the user does not
> notice.
> 
> I'm not involved in internal avfilter design, so at this point you'll
> have to get Nicolas on board.

The most salient point is negotiation. I have already said it and I say
it again, because the format negotiation in libavfilter is a very
important aspect of what makes it work well, and it is also a very
tricky piece of code.

We need to decide which aspects of the subtitle formats are negotiated.

At least, obviously, the text or bitmap aspect will be, with a
conversion filter inserted automatically where needed.

But depending on the answers to the questions in part 1, we may need to
negotiate the pixel format and colorspace too.
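
As an illustration of what negotiating the text or bitmap aspect means,
here is a toy sketch in C: each end of a link advertises the set of
formats it supports, the sets are intersected, and an empty intersection
means a conversion filter has to be inserted. The names and the priority
rule are assumptions made up for this example; this is not the
libavfilter negotiation code, which handles far more than this.

```c
#include <assert.h>

/* Hypothetical flags for illustration; not libavfilter API. */
enum { SKETCH_FMT_TEXT = 1 << 0, SKETCH_FMT_BITMAP = 1 << 1 };

typedef struct SketchLinkResult {
    unsigned format;       /* the single format chosen for the link, 0 if none */
    int needs_conversion;  /* nonzero if a conversion filter must be inserted */
} SketchLinkResult;

/* Intersect what the source can output with what the destination accepts. */
SketchLinkResult sketch_negotiate(unsigned out_formats, unsigned in_formats)
{
    SketchLinkResult r = { 0, 0 };
    unsigned common = out_formats & in_formats;

    assert(out_formats != 0 && in_formats != 0);
    if (!common) {
        /* Nothing in common: the link only works through a converter. */
        r.needs_conversion = 1;
        return r;
    }
    /* Arbitrary priority for the sketch: keep text when both sides allow it. */
    r.format = (common & SKETCH_FMT_TEXT) ? SKETCH_FMT_TEXT : SKETCH_FMT_BITMAP;
    return r;
}
```

Even in this toy, the last step is a priority decision, and that is the
kind of logic that needs test coverage before more gets added.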

Unfortunately, the current negotiation code is messy and fragile. We
cannot afford to pile new code on top of it. However good the new code
may be, adding it on top of messy code would only make it harder to
clean up and maintain later. I absolutely oppose that.

Therefore, if anybody wants to add something to libavfilter that
involves the negotiation, they should consider helping to clean it up.
It would make their work easier afterwards, so they should do it if only
for that reason.

The first task, as I have mentioned, would probably be to add FATE tests
that cover the priority logic implemented in swap_sample_fmts(),
swap_samplerates(), swap_channel_layouts() and pick_format(). Basically,
we need to make sure that if any piece of this logic is just commented
out, specific tests will detect it by failing. Right now, some random
tests fail in some cases but other cases are completely uncovered.

Ah, and since the vacations have ended, I have much less free time to
work on this myself now.

Yes, this is a lot of work before getting anything useful, before
getting any sexy new feature. But for a major API change to the core
functionality of our libraries, we accept nothing less.

Regards,

-- 
  Nicolas George