[FFmpeg-devel] [PATCH v6 1/4] doc: Explain what "context" means

Thu Jun 13 17:20:38 EEST 2024

On Wed, Jun 12, 2024 at 10:52:00PM +0200, Stefano Sabatini wrote:
> On date Tuesday 2024-06-04 15:47:21 +0100, Andrew Sayers wrote:
[...]
> My impression is that this is growing out of scope for a
> reference. The doxy is a reference, therefore it should be clean and
> terse, and we should avoid adding too much information, enough
> information should be right enough. In fact, a reference is different
> from a tutorial, and much different from a C tutorial. Also this is
> not a treatise comparing different languages and frameworks, as this
> would confuse beginners and would annoy experienced developers.
> 
> I propose to cut this patch to provide the minimal information you can
> expect in a reference, but not more than that. Additions can be made
> later, but I think we should try to avoid any unnecessary content, in
> the spirit of keeping this a reference. More extensive discussions
> might be done in a separate place (the wiki, a blog post etc.), but in
> the spirit of keeping this a reference they should not be put here.

I would agree if we had a tradition of linking to the wiki or regular blog
posts, but even proposing internal links has generated pushback in this thread,
so that feels like making the perfect the enemy of the good.  Let's get this
committed, see how people react, then look for improvements.

In fact, once this is available in the trunk version of the website,
we should ask for feedback from the libav-user ML and #ffmpeg IRC channel.
Then we can expand/move/remove stuff based on feedback.

> 
> > +
> > +@section Context_general “Context” as a general concept
[...]
> I'd skip all this part, as we assume the reader is already familiar
> with C language and with data encapsulation through struct, if he is
> not this is not the right place where to teach about C language
> fundamentals.

I disagree, for a reason I've been looking for an excuse to mention :)

Let's assume 90% of people who use FFmpeg already know something in the doc.
You could say that part of the doc is useless to 90% of the audience.
Or you could say that 90% of FFmpeg users are not our audience.

Looking at it the second way means you need to spend more time on "routing" -
linking to the document in ways that (only) attract your target audience,
making a table of contents with headings that aid skip-readers, etc.
But once you've routed people around the bits they don't care about,
it's fine to have documentation that's only needed by a minority.

Also, less interesting but equally important - context is not a C language
fundamental, it's more like an emergent property of large C projects.  A
developer who came here without knowing e.g. what a struct is could read
any of the online tutorials that explain the concept better than we could.
I'd be happy to link to a good tutorial about contexts if we found one,
but we have to meet people where they are, and this is the best solution
I've been able to find.

> 
> > +
> > +When reading code that *is* explicitly described in terms of contexts,
> > +remember that the term's meaning is guaranteed by *the project's community*,
> > +not *the language it's written in*.  That means guarantees may be more flexible
> > +and change more over time.  For example, programming languages that use
> > +[encapsulation](https://en.wikipedia.org/wiki/Encapsulation_(computer_programming))
> > +will simply refuse to compile code that violates its rules about access,
> > +while communities can put up with special cases if they improve code quality.
> > +
> 
> This looks a bit vague so I'd rather drop this.

This probably looks vague to you because you're part of the 90% of people this
paragraph isn't for.  All programming languages provide some guarantees, and
leave others up to the community to enforce (or not).  Over time, people stop
seeing the language guarantees at all, and assume the only alternative is
anarchy.  For example, if you got involved in a large JavaScript project,
you might be horrified to see almost all structs are the same type ("Object"),
and are implemented as dictionaries that are expected to have certain keys.
But in practice, this stuff gets enforced at the community level well enough.
Similarly, a JS programmer might be horrified to learn FFmpeg needs a whole
major version bump just to add a key to a struct.  This paragraph is there to
nudge people who have stopped seeing things we need them to look out for.

If you'd like to maintain an official FFmpeg blog, I'd be happy to expand the
paragraph above into a medium-sized post, then just link it from the doc.
But that post would be too subjective to be a wiki page - JavaScript is
evolving in a more strongly-typed direction, so it would only make sense to
future readers if they could say "oh yeah this was written in 2024, JS was
still like that back then".  This paragraph is an achievable compromise -
it covers enough ground to give people a way to think about the code, is short
enough for people who don't care to skip over, and is objective enough to
belong in documentation.  We can always change it if we find a better solution.

[...]
> > +Some functions fit awkwardly within FFmpeg's context idiom, so they send mixed
> > +signals.  For example, av_ambient_viewing_environment_create_side_data() creates
> > +an AVAmbientViewingEnvironment context, then adds it to the side-data of an
> > +AVFrame context.  So its name hints at one context, its parameter hints at
> > +another, and its documentation is silent on the issue.  You might prefer to
> > +think of such functions as not having a context, or as “receiving” one context
> > +and “producing” another.
> 
> I'd skip this paragraph. In fact, I think that API makes perfect
> sense, OOP languages adopt such constructs all the time, for example
> this could be a static module/class constructor. In other words, we
> are not telling anywhere that all functions should take a "context" as
> their first argument, and the documentation specifies exactly how this
> works, if you feel this is not clear or silent probably this is a sign
> that that function documentation should be extended.

That would be fine if it were just this function, but FFmpeg is littered
with special cases that don't quite fit.  Another example might be
swr_alloc_set_opts2(), which can take an SwrContext in a way that resembles
a context, or can take NULL and allocate a new SwrContext.  And yes,
we could document that edge case, and the next one, and the one after that.
But even if we documented every little special case that existed today,
there's no rule, so new bits of API will just reintroduce the problem again.
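
To make that shape concrete, here's a toy sketch of a function that accepts
either an existing context or NULL.  Every name in it (ToyResampler,
toy_alloc_set_opts(), toy_free(), toy_example()) is hypothetical and made up
for this mail.  It is not FFmpeg code, and it only imitates the
"reuse or allocate" behaviour described above, not anything else
swr_alloc_set_opts2() does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical toy context, standing in for something like SwrContext. */
typedef struct ToyResampler {
    int in_rate;
    int out_rate;
} ToyResampler;

/*
 * Hypothetical function with the "context or NULL" shape:
 * if *ps is NULL a new context is allocated, otherwise *ps is reused.
 * Returns 0 on success or a negative errno-style code on failure.
 */
static int toy_alloc_set_opts(ToyResampler **ps, int in_rate, int out_rate)
{
    ToyResampler *s;

    if (!ps || in_rate <= 0 || out_rate <= 0)
        return -EINVAL;

    s = *ps;
    if (!s) {
        /* No context was passed in, so behave like an allocator. */
        s = calloc(1, sizeof(*s));
        if (!s)
            return -ENOMEM;
    }

    /* Either way, behave like a setter on the context. */
    s->in_rate  = in_rate;
    s->out_rate = out_rate;
    *ps = s;
    return 0;
}

/* Free the context and reset the caller's pointer to NULL. */
static void toy_free(ToyResampler **ps)
{
    if (ps) {
        free(*ps);
        *ps = NULL;
    }
}

/* Exercises both paths; returns 0 if everything behaved as expected. */
static int toy_example(void)
{
    ToyResampler *s = NULL;
    ToyResampler *first;

    if (toy_alloc_set_opts(&s, 44100, 48000) < 0)   /* allocates */
        return -1;
    first = s;

    if (toy_alloc_set_opts(&s, 48000, 44100) < 0) { /* reuses */
        toy_free(&s);
        return -1;
    }
    assert(s == first);
    assert(s->in_rate == 48000 && s->out_rate == 44100);

    toy_free(&s);
    assert(s == NULL);
    return 0;
}
```

The same call acts as a constructor in one case and as a method on an existing
context in the other, which is exactly why it's hard to file under one rule.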

There's a deeper issue here - as an expert, when you don't know something,
your default assumption is that it's undefined, and could evolve in future.
When a newbie doesn't know something, their default assumption is that
everybody else knows and they're just stupid.  That assumption drives
newbies away from projects, so it's important to fill in as many blanks as
possible, even if it has to be with simple answers that they eventually
evolve beyond (and feel smart for doing so).

> > +@subsection Context_lifetime Manage lifetime: creation, use, destruction
[...]
> About this I have mixed feelings, but to me it sounds like a-posteriori
> rationalization.
> 
> I don't think there is a general rule with the allocation/closing/free
> rule for the various FFmpeg APIs, and giving the impression that this
> is the case might be misleading. In practice the user needs to master
> only a single API at a time (encoding/decoding, muxing/demuxing,
> etc.)  each one with possibly slight differences in how the terms
> close/allocation/free are used. This is probably not optimal, but in
> practice it works as the user does not really need to know all the
> possible uses of the API (she will work through what she is interested
> in for the job at hand).

Note: I'm assuming "this" means "this section", not "this paragraph".
Apologies if it was intended as a specific nitpick about closing functions.

TBH, a lot of this document is about inventing memorable rules of thumb.
The alternative is to say "FFmpeg devs can't agree on an answer, so they just
left you to memorise 3,000+ special cases".

Let's assume learning the whole of FFmpeg means understanding 3,000 tokens
(I'm not sure of the exact count in 7.0, but it's about that number if you don't
include individual members of structs, arguments to functions etc.).  Let's
also assume it takes an average of ten minutes to learn each token (obviously
that varies - AV_LOG_PANIC will take less, AVCodecContext will take more).
That's 30,000 minutes, or 500 hours, so you'd have to spend 8 hours a day
every day for over two months (roughly 62 days) to learn FFmpeg.  Obviously
there are usable subsets, but they mostly cut out the simple things, so don't
save nearly as much time as you'd think.  If you want people to pick up FFmpeg,
they need to learn a useful subset in about 8 hours, which requires a
drastically simplified explanation.

(the above is closely related to an argument from a recent post[1],
but the numbers might help explain the scale of the challenge)

There may not be an explicit rule for context lifetimes, but I've looked at the
code carefully enough to have a nuanced opinion about the number of tokens,
and the edgiest case I've found so far is swr_alloc_set_opts2() (see above).
I'm open to counterexamples, but the model discussed in this section feels
pretty reliable.
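
For anyone skimming, the creation/use/destruction model can be pictured with
a toy sketch like the one below.  All the names (ToyContext, toy_ctx_alloc(),
toy_ctx_open(), toy_ctx_process(), toy_ctx_free()) are hypothetical and exist
only for this mail.  The pointer-to-pointer free is an assumption modelled
loosely on functions such as avcodec_free_context(), and real APIs differ in
the details.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context: a struct holding state for one "object". */
typedef struct ToyContext {
    int is_open;      /* set once the context has been initialised */
    int frames_done;  /* counts successful toy_ctx_process() calls */
} ToyContext;

/* Creation: allocate a zeroed context (NULL on allocation failure). */
static ToyContext *toy_ctx_alloc(void)
{
    return calloc(1, sizeof(ToyContext));
}

/* Initialisation: must be called exactly once before use. */
static int toy_ctx_open(ToyContext *ctx)
{
    if (!ctx || ctx->is_open)
        return -1;
    ctx->is_open = 1;
    return 0;
}

/* Use: only valid on an opened context. */
static int toy_ctx_process(ToyContext *ctx)
{
    if (!ctx || !ctx->is_open)
        return -1;
    ctx->frames_done++;
    return 0;
}

/* Destruction: free the context and NULL the caller's pointer. */
static void toy_ctx_free(ToyContext **pctx)
{
    if (!pctx)
        return;
    free(*pctx);
    *pctx = NULL;
}

/* Walks through the whole lifetime; returns 0 on success. */
static int toy_lifetime_example(void)
{
    ToyContext *ctx = toy_ctx_alloc();
    if (!ctx)
        return -1;

    assert(toy_ctx_process(ctx) < 0);   /* using before opening fails */
    assert(toy_ctx_open(ctx) == 0);
    assert(toy_ctx_process(ctx) == 0);
    assert(ctx->frames_done == 1);

    toy_ctx_free(&ctx);
    assert(ctx == NULL);                /* no dangling pointer left behind */
    return 0;
}
```

A newbie who has internalised this much can treat everything else as a
variation on it, and swr_alloc_set_opts2() is the sort of variation they'd
have to learn separately.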

> 
> > +
> > +@subsection Context_avoptions Configuration options: AVOptions-enabled structs
> > +
> 
> > +The @ref avoptions "AVOptions API" is a framework to configure user-facing
> > +options, e.g. on the command-line or in GUI configuration forms.
> 
> This looks wrong. AVOptions is not at all about CLI or GUI options, is
> just some way to set/get fields (that is "options") defined in a
> struct (a context) using a high level API including: setting multiple
> options at once (through a textual encoding or a dictionary),
> input/range validation, setting more fields based on a single option
> (e.g. the size) etc.
> 
> Then you can query the options in a given struct and create
> corresponding options in a UI, but this is not the point of AVOptions.

There's a problem here I haven't been communicating clearly enough.
I think that's because I've understated the problem in the past,
so I'll try overstating instead:

"Option" is a meaningless noise word.  A new developer might ask "is this like a
command-line option, or a CMake option?  Is it like those Python functions with
a million keyword arguments, or a config file with sensible defaults?".
Answering "it can be any of those if you need it to be" might help an advanced
user (not our audience), but is bewildering to a newbie who needs a rough guide
for how they're likely to use it.  The only solution that's useful to a newbie
is to provide a frame of reference, preferably in the form of something they
already know how to use.

Having said all that, yes, this particular answer is wrong.  Could you apply [2]
so I can start thinking about what to replace it with?
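
As one possible frame of reference, the description quoted above (set/get
fields of a struct by name, with range validation) could be pictured with a
toy sketch like this.  Everything here (ToyEncoder, ToyOption,
toy_opt_set_int(), toy_opt_get_int(), the option names and ranges) is
hypothetical.  It is not the AVOptions API, which does far more, and it is
only meant to show the general "table of named fields" idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical context whose fields are exposed as named options. */
typedef struct ToyEncoder {
    int bitrate;
    int threads;
} ToyEncoder;

/* One table entry: option name, where the field lives, allowed range. */
typedef struct ToyOption {
    const char *name;
    size_t      offset;
    int         min, max;
} ToyOption;

static const ToyOption toy_options[] = {
    { "bitrate", offsetof(ToyEncoder, bitrate), 1000, 50000000 },
    { "threads", offsetof(ToyEncoder, threads), 1,    64       },
};

/* Look an option up by name; returns NULL if it does not exist. */
static const ToyOption *toy_opt_find(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof(toy_options) / sizeof(toy_options[0]); i++)
        if (strcmp(toy_options[i].name, name) == 0)
            return &toy_options[i];
    return NULL;
}

/* Set a field by name: -1 for bad arguments/unknown name, -2 if out of range. */
static int toy_opt_set_int(ToyEncoder *enc, const char *name, int value)
{
    const ToyOption *o;
    if (!enc || !name || !(o = toy_opt_find(name)))
        return -1;
    if (value < o->min || value > o->max)
        return -2;
    *(int *)((char *)enc + o->offset) = value;
    return 0;
}

/* Get a field by name: -1 for bad arguments or an unknown name. */
static int toy_opt_get_int(const ToyEncoder *enc, const char *name, int *out)
{
    const ToyOption *o;
    if (!enc || !name || !out || !(o = toy_opt_find(name)))
        return -1;
    *out = *(const int *)((const char *)enc + o->offset);
    return 0;
}

/* Exercises the table; returns 0 if everything behaved as expected. */
static int toy_options_example(void)
{
    ToyEncoder enc = { 0, 0 };
    int v = 0;

    assert(toy_opt_set_int(&enc, "threads", 4) == 0);
    assert(toy_opt_get_int(&enc, "threads", &v) == 0 && v == 4);
    assert(toy_opt_set_int(&enc, "threads", 1000) == -2);  /* out of range */
    assert(toy_opt_set_int(&enc, "no_such_opt", 1) == -1); /* unknown name */
    assert(enc.threads == 4);                              /* unchanged */
    return 0;
}
```

Whether a sketch like this belongs in the doc is a separate question, but it's
the kind of concrete anchor a newbie can hang the word "option" on.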

[...]

[1] https://ffmpeg.org/pipermail/ffmpeg-devel/2024-June/328970.html
[2] https://ffmpeg.org/pipermail/ffmpeg-devel/2024-June/329068.html