[FFmpeg-devel] Ideas to replace the options system

Nicolas George george at nsup.org
Fri Dec 4 15:33:39 CET 2015


This is a rather long explanation on ideas I have to replace the options
system with something better. I will not work on it before I have made
significant progress on de-recursiving lavfi, but I can still think of it
and mature its design while walking in the street or waiting for the subway.

And of course, if someone else wants to start working on something similar,
so much the better.

TL;DR: I have put small summaries marked by this tag at the end of most long
sections.

Why do we need a new options system?

  Most importantly: escaping hell

    What is escaping hell?

      Read this as "escaping" the noun, not the verb: the task of adding a
      backslash in front of every special character in a string. It becomes
      escaping hell when you have to add backslashes in front of the
      backslashes that protect the backslashes that protect the special
      characters. See this bit in the documentation:

      -vf "drawtext=text=this is a \\\\\\'string\\\\\\'\\\\: may contain one\\, or more\\, special characters"

      And it does not even use the % expansion in drawtext. Using single
      quotes can help a little, but it works only once.

    Why do we have escaping hell?

      Because we are using strings all over the place. This is somewhat of a
      paradox since multimedia processing rarely uses strings at all. There
      are reasons to use strings, and they drove the current options system.
      A lot of our user interface is based on strings. There is a bad reason
      for that and a good one. But the real problem is that it does not stop
      there.
      Strings for the user interface

        The bad reason we use strings everywhere for the user interface is
        that most of the user interface is designed with the command-line
        tool ffmpeg in mind rather than high-level applications using the
        API. I am not very fond of Microsoft's products, but I think that if
        ffmpeg had been designed with PowerShell in mind from the start, the
        user interface would use far fewer strings (there would be a whole
        lot of other problems, though).

        The good reason is that strings are good. Reading and writing is a
        universal skill amongst users who need options. So whenever there
        is no specific way of showing or entering a value, a string will do:
        the users can read it and understand it and write it.

        For example, imagine a GUI application that shows all the options of
        a codec: numbers are spin buttons or sliders, enumerated values are
        drop-down menus, flags are check buttons, etc. But sometimes we add
        new types; applications must be ready for unknown types. For
        example, if we add a DATE type, the next version of the application
        would probably use a calendar widget, but the current version can
        not. For these fallback cases, free-form strings are the solution.
        Two side notes on that example: first, the application has no way
        of validating the syntax of a free-form string (think of Wireshark
        showing invalid filters in red) without actually setting the option;
        second, before Clément added type BOOL, boolean options were shown
        as 0-1 spin buttons: not optimal but good enough; after BOOL was
        introduced, they became strings as a fallback until the application
        implements BOOL: still functional but less good.

        TL;DR: strings for the user interface are good.

      Strings as intermediate storage for internal structures

        Since most of the user interface is based on strings, we have a lot
        of internal APIs to handle strings: string -> string maps
        (AVDictionary), key-values parsers, etc. These APIs are robust: they
        handle escaping, and thus allow users to type any string at any
        level. But they are at the same time too generic and too low level.

        For example, the key-value parser reads the value until the
        delimiter, handling escaping, and returns a string. Then the string
        is usually parsed according to the type of the option. This is
        simple but clumsy: consider "w=max(iw,1024),h=max(ih,768)": even
        though there is no ambiguity, the parser requires escaping the inner
        commas.

        The most prevalent case of this is the use of AV_OPT_TYPE_STRING:
        the string field is almost always re-parsed and dispatched into
        other fields. Sometimes, someone gets fed up with it and implements
        a new AV_OPT_TYPE: IMAGE_SIZE, COLOR, CHANNEL_LAYOUT, etc. But that
        can only be done with types that are generic enough to be present in
        lavu; it can never happen for types that are only used in one codec
        or filter for example.

        (Note: some people suggested having the token parser check the
        balancing of asymmetric delimiters in order to fix the example
        above. I think this would be a very bad idea: it only fixes a few
        cases and sacrifices uniformity (think: text="2) a) Prove that x is
        in [0,5[."). The ideas I will propose below take care of this case
        and more.)

        TL;DR: we use AV_OPT_TYPE_STRING and parse later instead of using
        parsers that are aware of the type being parsed.

  Clumsy syntaxes

    Remember that before Anton moved filter contexts to AVOption (which was
    a good move), some filters used a specific parser, with a syntax similar to
    the usual key=value:key=value but fine-tuned to the filter. When the
    move was done, it had to be dropped somehow, usually using a single
    string option and a different delimiter. For example, you could write
    pan=stereo:L=L+FC:R=R+FC, now you have to use | to separate channels.

    Similar cases can arise with filters with a variable number of inputs or
    outputs: we would want to be able to write in0=...:in1=..., but it is
    not possible to have an unlimited number of options like that.

  Inextensible options sets

    When a component wraps an external library, each option of the library
    must have a corresponding AVOption. If there are many, that takes a lot
    of work, and if the library frequently gains new options, the wrapper
    will always lag behind. Many libraries like that have an introspection
    system. If they used AVOption itself, we could declare their objects as
    child objects; that is what the scale filter does with the options for
    libsws. But we cannot easily wrap a different introspection system.
    Instead, we use a string that we re-parse as key=value pairs, like

  Inextensible API

    The current API uses arrays of AVOption, making sizeof(AVOption) part of
    the ABI.

A new options system

  Types

    The AVOption system uses the AV_OPT_TYPE enum to describe the type of
    options. Parsing and printing are done using big switch statements in
    opt.c. That makes it impossible to define new types and parsers from
    codecs or filters to handle specific types.

    Instead, I suggest using (pointers to) an AVType structure (all names
    are of course just proposals) that holds pointers to functions that do
    the parsing and printing, and also the initializing and freeing.

    Of course, lavu must provide AVType structures for all the basic types:
    integers, floats, etc., anything that already has an AV_OPT_TYPE. But
    lavc/lavf/lavfi can define their own types, and any codec/muxer/filter
    can do too.

    TL;DR: a structure with function pointers to parse and print a
    particular type.
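
    A minimal sketch of such a structure, assuming zero-terminated strings
    for brevity (the substring-parsing section below argues for
    pointer+length). The member names and signatures, and the type_int
    instance, are hypothetical illustrations, not an existing lavu API:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AVType: the functions that know how to handle one type,
 * instead of one case per type in a big switch statement. */
typedef struct AVType {
    const char *name;
    size_t size;                                  /* size of the storage */
    void (*init)(void *dst);                      /* set the neutral value */
    void (*uninit)(void *dst);                    /* free owned memory */
    int  (*parse)(void *dst, const char *str);    /* string -> value */
    int  (*print)(const void *src, char *buf, size_t buf_size); /* value -> string */
} AVType;

static void int_init(void *dst)   { *(int *)dst = 0; }
static void int_uninit(void *dst) { (void)dst; /* nothing to free */ }

static int int_parse(void *dst, const char *str)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(str, &end, 10);
    if (end == str || *end || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;                  /* leave *dst untouched on error */
    *(int *)dst = (int)v;
    return 0;
}

static int int_print(const void *src, char *buf, size_t buf_size)
{
    int n = snprintf(buf, buf_size, "%d", *(const int *)src);
    return n >= 0 && (size_t)n < buf_size ? 0 : -1;
}

/* A codec or filter could define its own AVType the same way. */
static const AVType type_int = {
    "int", sizeof(int), int_init, int_uninit, int_parse, int_print,
};

/* Generic code only goes through the function pointers. */
static int type_set_from_string(const AVType *type, void *dst, const char *str)
{
    return type->parse(dst, str);
}
```

    Generic code (what opt.c does today with its switch statements) would
    only call through the pointers, so adding a type would not require
    touching lavu.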

  Traits

    Pointers cannot be used as case labels in switch statements. But
    switching on the type is a bad idea anyway. Remember the example with
    boolean options: they used to be 0-1 integers, but now that Clément
    introduced AV_OPT_TYPE_BOOL, until the applications are updated they
    fall back to the default case of the switch statement. This is a
    waste: boolean options could still be treated as 0-1 integers.

    Instead, I propose a system similar to Rust's trait system. For those
    more familiar with Java, a Rust trait is similar to a Java interface
    without the object-oriented sales-pitch.

    AVTypeTrait is a structure whose main purpose is to let the linker and
    libc create globally unique identifiers: its address.

    AVTypeTraitImpl is a structure with a pointer to AVTypeTrait as the
    only field.

    AVTypeTraitImplSomething (Something = Int / Float / ...) is the same as
    AVTypeTraitImpl, but with extra fields, mostly (only?) function
    pointers. C guarantees (C99 section 6.7.2.1, paragraph 13) that a
    pointer to AVTypeTraitImplSomething can be cast to a pointer to
    AVTypeTraitImpl. The first field of AVTypeTraitImplSomething must point
    to the unique instance of AVTypeTrait identifying Something.

    The way to use it is like this: av_type_trait_something_check(ti)
    checks the first field and returns true if it points to the correct
    AVTypeTrait, false otherwise. If it returns true, then I know that ti
    is actually a pointer to AVTypeTraitImplSomething, and I can access its
    fields.

    An AVType holds a (short) list of pointers to AVTypeTraitImpl that
    provide the functions to handle an option of this type in a particular
    way.
    For example, the AVType for boolean options will have (at least) an
    AVTypeTraitImplBoolean and an AVTypeTraitImplInteger.
    AVTypeTraitImplInteger contains fields for set, get, get_range, etc.

    This was quite complex to explain, but it is actually rather simple to
    implement, and even easier to use from an application point of view:
    test that an option / object behaves like an integer with
    av_obj_is_int(), then use integer functions on it:
    av_obj_int_get_range() for example. And always test the more specific
    trait first: if boolean, create a check box, else if number, create a
    spin button, else create a text entry box.

    TL;DR: a structure with function pointers to handle a particular type
    with a generic API, and pointer magic to make the API optional.
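
    The trait mechanism could look roughly like the following sketch. The
    struct names follow the proposal above; the member functions, the
    type_get_trait() lookup and the widget-picking logic are hypothetical.
    Each implementation embeds AVTypeTraitImpl as its first member, which
    is what makes the downcast after a successful trait check valid C:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A trait: only its address matters, as a globally unique identifier. */
typedef struct AVTypeTrait { const char *name; } AVTypeTrait;

/* Common header of every trait implementation. */
typedef struct AVTypeTraitImpl { const AVTypeTrait *trait; } AVTypeTraitImpl;

static const AVTypeTrait trait_integer = { "integer" };
static const AVTypeTrait trait_boolean = { "boolean" };

typedef struct AVTypeTraitImplInteger {
    AVTypeTraitImpl base;                    /* must be the first member */
    long long (*get)(const void *val);
    void (*set)(void *val, long long v);
    void (*get_range)(const void *val, long long *min, long long *max);
} AVTypeTraitImplInteger;

typedef struct AVTypeTraitImplBoolean {
    AVTypeTraitImpl base;                    /* must be the first member */
    int  (*get)(const void *val);
    void (*set)(void *val, int v);
} AVTypeTraitImplBoolean;

typedef struct AVType {
    const char *name;
    const AVTypeTraitImpl *const *impls;     /* NULL-terminated list */
} AVType;

/* Return the implementation of `trait` for `type`, or NULL. */
static const AVTypeTraitImpl *type_get_trait(const AVType *type,
                                             const AVTypeTrait *trait)
{
    for (const AVTypeTraitImpl *const *p = type->impls; *p; p++)
        if ((*p)->trait == trait)
            return *p;
    return NULL;
}

/* The cast is valid because base is the first member. */
static const AVTypeTraitImplInteger *type_as_integer(const AVType *type)
{
    return (const AVTypeTraitImplInteger *)type_get_trait(type, &trait_integer);
}

/* A boolean stored as an int, implementing both traits. */
static int  bool_get(const void *val)            { return *(const int *)val != 0; }
static void bool_set(void *val, int v)           { *(int *)val = v != 0; }
static long long bool_get_int(const void *val)   { return *(const int *)val != 0; }
static void bool_set_int(void *val, long long v) { *(int *)val = v != 0; }
static void bool_range(const void *val, long long *min, long long *max)
{
    (void)val;
    *min = 0;
    *max = 1;
}

static const AVTypeTraitImplBoolean bool_as_bool =
    { { &trait_boolean }, bool_get, bool_set };
static const AVTypeTraitImplInteger bool_as_int =
    { { &trait_integer }, bool_get_int, bool_set_int, bool_range };
static const AVTypeTraitImpl *const bool_impls[] =
    { &bool_as_bool.base, &bool_as_int.base, NULL };
static const AVType type_bool = { "bool", bool_impls };

/* Application side: test the most specific trait first. */
static const char *pick_widget(const AVType *type)
{
    if (type_get_trait(type, &trait_boolean))
        return "check box";
    if (type_get_trait(type, &trait_integer))
        return "spin button";
    return "text entry";
}
```

    An application written before the boolean trait existed would not test
    for trait_boolean, but it would still find the integer trait and show
    a 0-1 spin button instead of falling back to a text entry.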

  Type instances

    AVTypeInstance is a structure holding an AVType pointer plus a few extra
    fields that give information specific to an instance of the type, for
    example initial value or range, plus opaque fields that are specific to
    the type.
  Giving types to context options

    AVClass gets a new field: AVTypeInstance *get_type(void *obj). It
    returns the AVTypeInstance of the corresponding context as a whole.

    (Note: it can not be AVType* directly, as it may cause problems for
    static initializers, especially with shared libraries.)

    It allows initializing the context as a whole from a single string,
    which we do not currently do, but that is not the point.

    Most types used for contexts would implement the Fields trait. That
    means there is an API to query them for named fields, each with its
    corresponding type instance. This is very similar to what we have now,
    except the type system allows more possibilities for the types and

    In particular, if the context's AVClass does not have a get_type()
    callback but has an AVOption array, then the fallback function is a
    wrapper around the AVOption system. Components that have not yet been
    upgraded still work exactly as before.
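
    As a rough sketch, assuming a type instance reduced to a type name, a
    default and a range, a "Fields" trait could report descriptors like the
    ones below (all names hypothetical). It is deliberately close to
    today's AVOption arrays, in line with the remark that this is very
    similar to what we have now:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical AVTypeInstance: a type plus information specific to one
 * use of it. The type is reduced to a name to keep the sketch short. */
typedef struct AVTypeInstance {
    const char *type_name;
    double default_value;
    double min, max;
} AVTypeInstance;

/* One named field, as a "Fields" trait could report it. */
typedef struct FieldDesc {
    const char *name;
    size_t offset;                 /* position of the value in the context */
    AVTypeInstance instance;
} FieldDesc;

typedef struct ExampleContext {    /* hypothetical filter context */
    double w, h;
} ExampleContext;

static const FieldDesc example_fields[] = {
    { "w", offsetof(ExampleContext, w), { "double", 640, 0, 16384 } },
    { "h", offsetof(ExampleContext, h), { "double", 480, 0, 16384 } },
    { NULL, 0, { NULL, 0, 0, 0 } },
};

static const FieldDesc *fields_find(const FieldDesc *f, const char *name)
{
    for (; f->name; f++)
        if (!strcmp(f->name, name))
            return f;
    return NULL;
}

/* Apply each field's default from its type instance. */
static void fields_set_defaults(void *ctx, const FieldDesc *f)
{
    for (; f->name; f++)
        *(double *)((char *)ctx + f->offset) = f->instance.default_value;
}

/* Set a field by name, validated against its instance's range. */
static int fields_set_double(void *ctx, const FieldDesc *fields,
                             const char *name, double v)
{
    const FieldDesc *f = fields_find(fields, name);

    if (!f)
        return -1;                 /* no such field */
    if (v < f->instance.min || v > f->instance.max)
        return -2;                 /* out of range */
    *(double *)((char *)ctx + f->offset) = v;
    return 0;
}
```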

  Context-aware parsing

    How does it solve escaping hell? Of course, it requires a bit of syntax
    change. Getting rid of escaping hell is in itself a big syntax change,
    hopefully so much for the better that people will not complain. It will
    probably be possible to keep compatibility with the current syntax when
    no escaping is used, i.e. most of the time.

    Also, the parsers for base types need to be adapted to be able to work
    nicely together.

    Substring parsers

      Parsers need to be able to operate on a substring, and stop when they
      reach the delimiters for the surrounding syntax. This is, in fact,
      rather easy to achieve.

      Think how strtol() works: consume the string while there are digits
      and return a pointer to the end of it. Then the surrounding parser can
      continue parsing at this point. Actually, all parsers should behave
      that way anyway, regardless of escaping hell, because it is more

      And while we are at it, we should change them to accept strings as
      pointer+length or pointer+end instead of zero-terminated C strings.

      TL;DR: all parsers must be designed to work in the middle of strings.
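
      A minimal sketch of a substring parser in the strtol() style, taking
      pointer+end and returning a pointer just past what it consumed. The
      function name is hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Parse a decimal integer from [p, end) without requiring a terminating
 * NUL. Returns a pointer just past the digits, or NULL if there are none
 * (or on overflow), so the caller can continue with its own syntax. */
static const char *parse_int_span(const char *p, const char *end, long *out)
{
    int neg = 0;
    long v = 0;
    const char *digits;

    if (p < end && (*p == '-' || *p == '+'))
        neg = *p++ == '-';
    digits = p;
    while (p < end && *p >= '0' && *p <= '9') {
        int d = *p - '0';
        if (v > (LONG_MAX - d) / 10)
            return NULL;           /* overflow */
        v = v * 10 + d;
        p++;
    }
    if (p == digits)
        return NULL;               /* no digits at all */
    *out = neg ? -v : v;
    return p;
}
```

      The caller checks the returned character against its own delimiter
      (',' in "1024,768") and resumes parsing from there.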

    Self-delimiting syntax

      The syntax must allow parsers to know when their span of text is
      finished without relying on the next character(s) in the string. That
      way, the next character can be the delimiter for the surrounding
      syntax without requiring escaping.

      For example, consider a list of subexpressions separated by '+', where
      two of the subexpressions happen to be math expressions: how do we
      know whether 1+2+3 means 1+2 and 3, or 1 and 2+3, or anything else?

      The syntax can be anything, and can be fine-tuned for the particular
      type at hand. For key=value lists, it can be a double delimiter at the
      end, or surrounding braces. Actually, surrounding markers should
      probably be preferred because of the next point.

      Note that we probably do not need a self-delimiting syntax against
      alphanumeric delimiters: nobody will have the stupid idea of making a
      '4'-separated list of numbers, and if they do, they deserve a taste of
      escaping hell. Therefore, numbers and symbol names are already
      self-delimiting. For times, since we use commas as delimiters all over
      the place, we should allow "5h42m22" (with or without a final "s") on
      top of "5:42:22".

      Also, the delimiters do not have to be mandatory. For example, braces
      around a key=value list can be completely optional. And for times,
      5:42:22 is still accepted.

      TL;DR: individual syntaxes must be tuned to avoid ambiguity.
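
      For times, a hypothetical parser accepting both forms could look like
      this sketch. It assumes that a bare number after the last unit letter
      means seconds (so "5h42" is read as 5h 42s) and performs no overflow
      checks; a real design may choose differently:

```c
#include <assert.h>
#include <stddef.h>

/* Read decimal digits from [p, end); NULL if there are none.
 * No overflow checks in this sketch. */
static const char *read_digits(const char *p, const char *end, long *v)
{
    const char *start = p;

    *v = 0;
    while (p < end && *p >= '0' && *p <= '9')
        *v = *v * 10 + (*p++ - '0');
    return p == start ? NULL : p;
}

/* Hypothetical time parser accepting both "5:42:22" and "5h42m22[s]".
 * Returns a pointer past the consumed text, or NULL; *seconds gets the
 * value. A bare number after the last unit letter counts as seconds. */
static const char *parse_time(const char *p, const char *end, long *seconds)
{
    static const char units[] = "hms";
    static const long mult[] = { 3600, 60, 1 };
    long total = 0, n;
    int next = 0;                  /* smallest unit index still allowed */
    const char *q = read_digits(p, end, &n);

    if (!q)
        return NULL;
    if (q < end && *q == ':') {    /* colon syntax: [[h:]m:]s */
        int fields = 1;

        total = n;
        while (q < end && *q == ':' && fields < 3) {
            q = read_digits(q + 1, end, &n);
            if (!q)
                return NULL;
            total = total * 60 + n;
            fields++;
        }
        *seconds = total;
        return q;
    }
    for (;;) {                     /* unit syntax, units in decreasing order */
        int u = next;

        while (u < 3 && !(q < end && *q == units[u]))
            u++;
        if (u == 3) {              /* no unit letter: seconds */
            total += n;
            break;
        }
        total += n * mult[u];
        q++;
        next = u + 1;
        if (next == 3 || q == end || *q < '0' || *q > '9')
            break;
        q = read_digits(q, end, &n);   /* cannot fail: a digit follows */
    }
    *seconds = total;
    return q;
}
```

      Neither form contains a comma, so the parser stops cleanly before a
      ',' belonging to the surrounding syntax.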

    List of forbidden characters

      Parsers for AVType accept a list of forbidden characters, typically
      delimiters for the surrounding syntax. If they encounter one of these
      characters, they should stop parsing, just as if they encountered the
      end of the string.

      Except if they have good reason not to.

      The obvious good reason is that the character is prefixed with a
      backslash. That is escaping, and escaping is really unavoidable. But
      that is not escaping hell, since there is only one level.

      Another good reason is that the character appears inside balanced
      delimiters: parentheses, braces. This is valid because the parsing
      would otherwise fail. Consider the example I used before, slightly
      extended: "w=8+max(iw,1024),...". If the parser stopped at the inner
      comma, it would return the successful parsing of the expression "8";
      the surrounding parsers would then see a '+' instead of their expected
      delimiter and fail, all the way to the top.

      Note that it is up to parsers to decide what constitutes a good
      reason, and in particular balanced delimiters. An XML parser (unfortunately, we
      will need one at some point for some web formats) shall consider <...>
      as balanced delimiters, and thus require no escaping for ':' in
      namespaced attributes, but not parentheses. And conversely, the
      expression parser shall consider balanced parentheses, but certainly
      not comparison operators (hopefully, at some point we will be able to
      write "x<42?40:50" instead of "if(lt(x,42),40,50)").

      TL;DR: parsers take an argument telling them what the delimiters are
      for the surrounding syntax.
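
      The rule could be sketched as follows for an expression-like parser.
      The helper is hypothetical: it receives the forbidden characters from
      the surrounding syntax, skips backslash-escaped characters, and
      ignores forbidden characters nested inside parentheses. It only
      tracks parentheses, since each parser is meant to choose its own
      balanced delimiters:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the end of an expression-like value in
 * [p, end). It stops at the first character listed in `forbidden` (the
 * delimiters of the surrounding syntax), unless that character is
 * backslash-escaped or nested inside parentheses. Returns NULL on
 * unbalanced parentheses or a dangling backslash. */
static const char *expr_span(const char *p, const char *end,
                             const char *forbidden)
{
    int depth = 0;

    for (; p < end; p++) {
        if (*p == '\\') {
            if (++p == end)
                return NULL;       /* dangling backslash */
            continue;              /* the escaped character is kept */
        }
        if (*p == '(')
            depth++;
        else if (*p == ')' && depth > 0)
            depth--;
        else if (!depth && *p && strchr(forbidden, *p))
            break;                 /* delimiter of the surrounding syntax */
    }
    return depth ? NULL : p;
}
```

      On the extended example it stops at the comma after "max(iw,1024)",
      so the surrounding key=value parser needs no escaping at all.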

  Backward compatibility with AVOption

    If a context is designed to use the new system, it will appear to have
    no AVOption of its own. This already happens and is not an API break;
    we should be careful removing existing AVOption arrays, though. On the
    other hand, av_opt_set() will work, just setting the string (possibly
    with a parser in back-to-escaping-hell mode, for maximum compatibility)
    through the new system; the other av_opt_set_xxx() functions will work
    if the field implements the corresponding trait.

  Extra features

    This system allows a few interesting new features, some of them simply
    because we no longer have to worry about sizeof(AVOption).

    Special syntaxes

      Any component can define its own syntax. It should not be abused,
      since consistency is also good, but it will be useful sometimes.

    Multiple types

      A field can accept different types, both at API level and for the
      user. For example, a video size can be set either as a whole, to
      accept size names ("hd720"), or through the individual numbers w and h.
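
      A sketch of a video size with two entry points writing the same
      storage: a whole value (a name or "WxH") and individual fields. All
      names here are hypothetical, and the name table is cut down to two
      entries; libavutil's real av_parse_video_size() already handles the
      whole-value part, but the sketch is standalone:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct VideoSize { int w, h; } VideoSize;

static const struct { const char *name; int w, h; } size_names[] = {
    { "hd720",  1280,  720 },
    { "hd1080", 1920, 1080 },
};

/* Whole-value entry point: a size name or "WxH". */
static int video_size_set_whole(VideoSize *vs, const char *str)
{
    int w, h;
    char extra;

    for (size_t i = 0; i < sizeof(size_names) / sizeof(size_names[0]); i++) {
        if (!strcmp(str, size_names[i].name)) {
            vs->w = size_names[i].w;
            vs->h = size_names[i].h;
            return 0;
        }
    }
    /* exactly two numbers separated by 'x', nothing after them */
    if (sscanf(str, "%dx%d%c", &w, &h, &extra) != 2 || w <= 0 || h <= 0)
        return -1;
    vs->w = w;
    vs->h = h;
    return 0;
}

/* Per-field entry point: "w" or "h" as individual numbers. */
static int video_size_set_field(VideoSize *vs, const char *field, int value)
{
    if (value <= 0)
        return -1;
    if (!strcmp(field, "w"))
        vs->w = value;
    else if (!strcmp(field, "h"))
        vs->h = value;
    else
        return -1;
    return 0;
}
```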

    Namespaced sub-structures

      If the field "f" is itself a structure made of fields, including "a"
      and "b", then several syntaxes can be allowed to set it:
      "f.a=5:f.b=3", "f={a=5:b=3}", and optionally just "a=5:b=3" if there
      are no fields "a" and "b" in the parent structure.
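
      The name resolution part could be sketched like this, with a
      hypothetical field tree. The rule that a bare name is rejected when
      several sub-structures define it is an assumption added here, not
      part of the proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field tree: a field either holds a value or has sub-fields. */
typedef struct Field {
    const char *name;
    const struct Field *sub;   /* NULL-terminated array, or NULL for a leaf */
} Field;

static const Field f_sub[] = { { "a", NULL }, { "b", NULL }, { NULL, NULL } };
static const Field top[]   = { { "f", f_sub }, { "c", NULL }, { NULL, NULL } };

static const Field *find_child(const Field *fields, const char *name, size_t len)
{
    for (; fields && fields->name; fields++)
        if (strlen(fields->name) == len && !memcmp(fields->name, name, len))
            return fields;
    return NULL;
}

/* Resolve "f.a" as a path; resolve a bare "a" directly, or else inside
 * the sub-structures, but only if exactly one of them has it. */
static const Field *resolve(const Field *fields, const char *key)
{
    const char *dot = strchr(key, '.');
    const Field *f, *found = NULL;

    if (dot) {
        f = find_child(fields, key, (size_t)(dot - key));
        return f && f->sub ? resolve(f->sub, dot + 1) : NULL;
    }
    if ((f = find_child(fields, key, strlen(key))))
        return f;                  /* the parent's own field wins */
    for (f = fields; f->name; f++) {
        const Field *c = f->sub ? find_child(f->sub, key, strlen(key)) : NULL;
        if (c) {
            if (found)
                return NULL;       /* ambiguous: in several sub-structures */
            found = c;
        }
    }
    return found;
}
```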

    Wrapper types

      Fields can have a type that wraps their real type to perform extra
      actions, for example setting another field to indicate whether the
      option was set by the user or left at its default.

    Varying options

      New options can appear or disappear according to previously set
      options, like the number of inputs for a filter. For example, a codec
      context could accept "codec=libx264:crf=20" (but not

    Embedded documentation

      Types and fields can contain documentation, more than the simple
      string currently in AVOption. An API should be available to build a
      single documentation page for a given set of elements, pulling the
      necessary dependencies (description for the syntax of fields) only
      once, and at various detail levels: short summary for a tooltip or
      full text with examples for the web page.

    Syntax validation and autocompletion

      Parsers should have a dry-run mode where they read the string but do
      not set values, to allow applications to check fields early. They
      could even return suggested completions or corrections. (This is
      somewhat incompatible with varying options; we can live with that.)
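
      A sketch of a dry-run mode for a hypothetical "WxH" parser: passing
      NULL as the destination only validates, and on error the parser
      reports the position of the offending character so that an
      application could highlight it. The names and the NULL convention
      are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Size { int w, h; } Size;

/* Read up to six decimal digits; returns 0 if there is no digit. */
static int read_dim(const char **pp, int *out)
{
    const char *p = *pp;
    int v = 0, n = 0;

    while (*p >= '0' && *p <= '9' && n < 6) {
        v = v * 10 + (*p++ - '0');
        n++;
    }
    if (!n)
        return 0;
    *pp = p;
    *out = v;
    return 1;
}

/* Hypothetical "WxH" parser with a dry-run mode: when dst is NULL it only
 * validates. On error, *err_pos points at the offending character. */
static int size_parse(const char *s, Size *dst, const char **err_pos)
{
    const char *p = s;
    int w = 0, h = 0;

    if (!read_dim(&p, &w) || *p != 'x')
        goto fail;
    p++;
    if (!read_dim(&p, &h) || *p)
        goto fail;
    if (dst) {                     /* not a dry run: store the value */
        dst->w = w;
        dst->h = h;
    }
    return 0;
fail:
    if (err_pos)
        *err_pos = p;
    return -1;
}
```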

Conclusion

  This has been a very lengthy exposition. Actually, I believe
  implementation would not be that long. Well, longer than text, of course,
  but not as gigantic as the explanation suggests. And a lot of steps can be
  made incrementally.

  IMHO, the result would be both a better design and an enhanced user
  experience.
Personal note: if you skimmed through the whole thing and did not find it
completely uninteresting, I would appreciate even short quick feedback,
even "looks interesting, will read more carefully later".


  Nicolas George