[FFmpeg-devel] [PATCHv4] lavf: add libopenmpt demuxer

Jörn Heusipp osmanx at problemloesungsmaschine.de
Sat Jul 9 14:39:16 EEST 2016


On 07/08/2016 10:17 PM, Nicolas George wrote:
> Le primidi 21 messidor, an CCXXIV, Jörn Heusipp a écrit :

>> Regarding AVProbeData:
>> Looking at AVProbeData, I can see no (optional) field describing the file
>> size:
>> typedef struct AVProbeData {
>>      const char *filename;
>>      unsigned char *buf; /**< Buffer must have AVPROBE_PADDING_SIZE of extra
>> allocated bytes filled with zero. */
>>      int buf_size;       /**< Size of buf except extra allocated bytes */
>>      const char *mime_type; /**< mime_type, when known. */
>> } AVProbeData;
>> Sadly, that makes it rather useless for probing module formats: There are
>> module formats which have no magic bytes at all, or very bad ones which
>> require verifying other simple parts of the header in order to determine any
>> meaningful probing result, some even require seeking through the file and
>> verifying other later parts, some even have a footer that may need to be
>> verified.
>> Because of this situation, the libopenmpt I/O layer absolutely needs to know
>> the file size in order to do anything useful. In our adaption layer for
>> streams (like stdin or HTTP without length information), we lazily pre-cache
>> the whole file until hitting EOF as soon as the size is required or the code
>> wants to seek to the end. As any kind of streaming does not apply to module
>> formats, this was (and is) a sane design choice for libopenmpt.
>
> The probing infrastructure can not provide the file size, for all we know it
> could be an infinite stream coming from a live capture device.

Well, I think it could optionally provide it if it knows, as an 
additional hint to probe functions. I do understand however, that 
supporting probing for such non-streamable formats is probably not a 
primary design goal of ffmpeg. I'm not arguing for changing ffmpeg 
internals here.


> Naively, I would suggest to assume that AVProbeData contains the whole file
> and try probing that. That would be like trying to play a file that was
> truncated on disk for some reason. If the probing fails, libavformat will
> read some more and try again until it reaches the probe size limit.

I was not aware of that strategy used by ffmpeg. Skimming through the 
ffmpeg code, you seem to be referring to av_probe_input_buffer2().
I think your suggestion can work.


> For an obscure format (I mean it without any disrespect, only as a fact with
> regard to FFMpeg's usual use), false negatives on the probing are not that
> big a problem: it will fall back to using the extension, and if even that
> fails the user can still specify "-f libopenmpt" or the equivalent in their
> application.

These formats are obscure, but they exist, are old, and are not going to 
change, thus we are stuck with this mess (no disrespect taken ;).
If false negatives are acceptable for ffmpeg, your suggestion of 
simply pretending that AVProbeData contains the whole file will work in 
almost all cases. I do not think that we (libopenmpt) can guarantee that 
this will not also introduce false positives, though they should be 
really rare (I cannot think of such a case right now, but I also do not 
think that we want to make the promise made by our API more specific for 
the truncated data case).
I also was not aware of the fallback to file extension, which totally 
makes sense.
Specifying libopenmpt explicitly would of course be easy for users of 
the ffmpeg program itself, but maybe not so much for users who use 
libavformat through some other library/framework/player.

I did a really quick, non-exhaustive check on some files, and libopenmpt 
currently gives pretty reliable positive probing results for files 
truncated to 4096 bytes. 1024 is not sufficient as standard ProTracker 
MOD files have the magic bytes "M.K." at offset 1080.
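
To illustrate why 1024 bytes cannot be enough here, a minimal sketch 
(not libopenmpt code; the function name mod_mk_check and its three-way 
return value are made up for this example, only the signature and its 
offset are taken from the paragraph above):

```c
#include <stddef.h>
#include <string.h>

/* Offset and size of the ProTracker signature, as stated above. */
#define MOD_MAGIC_OFFSET 1080
#define MOD_MAGIC_SIZE   4

/*
 * Hypothetical helper: returns 1 if the buffer carries "M.K." at
 * offset 1080, 0 if it does not, and -1 if the buffer is too short
 * to decide either way.
 */
static int mod_mk_check(const unsigned char *buf, size_t size)
{
    if (size < MOD_MAGIC_OFFSET + MOD_MAGIC_SIZE)
        return -1; /* undecidable: the signature lies beyond the data */
    return memcmp(buf + MOD_MAGIC_OFFSET, "M.K.", MOD_MAGIC_SIZE) == 0;
}
```

With a 1024 byte probe buffer this can only ever answer "undecidable"; 
the signature becomes visible once 1084 bytes are available.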

Calling openmpt_could_open_propability() only once more than 4096 bytes 
are available would be a trade-off. On the one hand, it would improve 
performance (libopenmpt is not called at all if other ffmpeg demuxers 
were able to probe successfully with less data) and maybe reduce 
false positives. On the other hand, it would increase false negatives: 
there are perfectly valid and usable module files smaller than 4096 
bytes, and for some formats even smaller than 1024 bytes.


>> Regarding probing performance:
>
> A thing to consider when discussing probing performance: the most important
> is the speed of the obviously negative answers. A Matroska file or a MP3
> file does not look at all like a tracker file, and libopenmpt should be able
> to figure it out very quickly. Only when the file actually looks like a
> tracker file should it make extra checks to be sure.
>
> If that condition is met, then probing performance is probably not an issue:
> users playing non-tracker file will not suffer from it.

I wish we could have this kind of early-reject in our probing function, 
but module formats without any file magic bytes at all prohibit such an 
implementation.

For formats which have magic bytes, we of course check these first and 
continue with the next format if they do not match. We still have to 
check the formats which have no or only bad magic numbers afterwards though.
The format used by the original Ultimate Soundtracker (M15, 4 channel, 
15 samples MOD format without any file magic bytes at all) is the most 
problematic here (the following explanation may require rough prior 
knowledge about the fundamental concepts of module files):
It starts with 20 bytes of supposedly ASCII characters which represent 
the song name, followed by 15 sample headers, each containing again 22 
bytes of name, a length, volume, tuning, and loop information. That is 
followed by the pattern order list.
That is followed by the raw pattern data, followed by the raw sample data.
For probing, libopenmpt looks at the song name, sample headers and order 
list and checks everything for plausibility using heuristics. We can 
reject some cases where values have limited allowed range, but 
ultimately, it ends up being just guessing. See 
https://source.openmpt.org/browse/openmpt/trunk/OpenMPT/soundlib/Load_mod.cpp 
CSoundFile::ReadM15.
Whatever smart heuristics we come up with, this prohibits any kind of 
early-reject, and is basically guaranteed to trigger false positives in 
some cases.
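
As a rough sketch of what such a plausibility check looks like (this is 
NOT the code in CSoundFile::ReadM15, and the exact field sizes are my 
assumptions for this example: 30 bytes per sample header with the volume 
byte at offset 25, one byte of order count plus one further byte before 
128 order entries, volume at most 64, pattern numbers at most 63):

```c
#include <stddef.h>

enum {
    M15_TITLE_LEN   = 20,  /* song name, from the description above */
    M15_SAMPLES     = 15,
    M15_SAMPLE_NAME = 22,
    M15_SAMPLE_SIZE = 30,  /* assumed: 22 name + 2 length + 1 tuning
                              + 1 volume + 4 loop information */
    M15_VOLUME_OFF  = 25,  /* assumed offset inside a sample header */
    M15_ORDERS_OFF  = M15_TITLE_LEN + M15_SAMPLES * M15_SAMPLE_SIZE,
    M15_ORDER_SLOTS = 128, /* assumed size of the order list */
    M15_HEADER_SIZE = M15_ORDERS_OFF + 2 + M15_ORDER_SLOTS
};

/* Names are "supposedly ASCII": accept NUL and printable ASCII only. */
static int text_is_plausible(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] != 0 && (s[i] < 0x20 || s[i] > 0x7e))
            return 0;
    return 1;
}

/* Returns 1 if nothing in the header contradicts an M15 file. */
static int m15_header_is_plausible(const unsigned char *buf, size_t size)
{
    if (size < M15_HEADER_SIZE)
        return 0;
    if (!text_is_plausible(buf, M15_TITLE_LEN))
        return 0;
    for (int i = 0; i < M15_SAMPLES; i++) {
        const unsigned char *s = buf + M15_TITLE_LEN + i * M15_SAMPLE_SIZE;
        if (!text_is_plausible(s, M15_SAMPLE_NAME))
            return 0;
        if (s[M15_VOLUME_OFF] > 64)      /* limited allowed range */
            return 0;
    }
    const unsigned char *orders = buf + M15_ORDERS_OFF;
    if (orders[0] == 0 || orders[0] > M15_ORDER_SLOTS)
        return 0;                        /* implausible song length */
    for (int i = 0; i < M15_ORDER_SLOTS; i++)
        if (orders[2 + i] > 63)          /* implausible pattern number */
            return 0;
    return 1;
}
```

Note that 600 zero bytes with a single 1 in the order count already 
pass every check, and every byte has to be looked at before the 
function can say no. That is the false-positive and early-reject 
problem in a nutshell.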

As we are (for historic reasons) currently using the same code paths for 
probing as we are using to actually load a module file and just return 
early after the initial checks, we still instantiate all kinds of 
internal structures which involves multiple memory allocations before 
even looking at the file data at all.
In theory, we could improve this by duplicating the probing code into 
some explicit probing functionality, but this is probably not something 
we will do in the short term, and may not even do at all.


Another possibility might be to explicitly check for some magic bytes 
that turn out to be too often wrongly detected as module files directly 
in the ffmpeg libopenmpt demuxer, before even calling 
openmpt_could_open_propability() at all, in case the libopenmpt probing 
turns out to be too slow to be acceptable for ffmpeg.
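
A minimal sketch of such a pre-filter, assuming it would sit in front of 
the libopenmpt call in the demuxer's probe function. The function name 
and the choice of signatures are only for illustration; which signatures 
would actually be worth listing would have to be determined by testing:

```c
#include <stddef.h>
#include <string.h>

struct known_magic {
    const char *bytes;
    size_t      len;
};

/* Leading signatures of a few common non-module formats. */
static const struct known_magic foreign_magics[] = {
    { "\x1a\x45\xdf\xa3", 4 }, /* EBML header (Matroska, WebM) */
    { "ID3",  3 },             /* ID3v2 tag, typically MP3 */
    { "OggS", 4 },             /* Ogg page */
    { "RIFF", 4 },             /* RIFF container (WAV, AVI) */
};

/*
 * Hypothetical helper: returns 1 if the buffer starts with one of the
 * signatures above, so that the expensive module probe can be skipped.
 */
static int looks_like_foreign_format(const unsigned char *buf, size_t size)
{
    size_t count = sizeof(foreign_magics) / sizeof(foreign_magics[0]);

    for (size_t i = 0; i < count; i++) {
        const struct known_magic *m = &foreign_magics[i];
        if (size >= m->len && memcmp(buf, m->bytes, m->len) == 0)
            return 1;
    }
    return 0;
}
```

The catch is that for formats without magic bytes the file starts with 
the song name, so a module whose title happens to begin with "RIFF" or 
"ID3" would be rejected by such a filter.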


In case libopenmpt tends to detect too many false-positives, this could 
be handled by tuning the relative probing scores of libopenmpt and 
possibly other affected demuxers.
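
For example, assuming the probability reported by libopenmpt lies 
between 0.0 and 1.0, the demuxer could scale it into a probe score below 
a tunable cap. This is only a sketch: probability_to_score and score_cap 
are hypothetical names, and SKETCH_SCORE_MAX merely mirrors the value of 
AVPROBE_SCORE_MAX so that the example compiles without libavformat:

```c
/* Mirrors AVPROBE_SCORE_MAX from libavformat (100). */
#define SKETCH_SCORE_MAX 100

/*
 * Hypothetical mapping from a probability to a probe score.
 * score_cap is the tuning knob: lowering it lets other demuxers win
 * whenever they are reasonably sure about a file.
 */
static int probability_to_score(double probability, int score_cap)
{
    if (!(probability > 0.0))
        return 0;                 /* also catches NaN and negatives */
    if (probability > 1.0)
        probability = 1.0;
    if (score_cap > SKETCH_SCORE_MAX)
        score_cap = SKETCH_SCORE_MAX;
    if (score_cap < 0)
        score_cap = 0;
    return (int)(probability * score_cap);
}
```

With a cap of 60, even a probability of 1.0 yields a score of 60, which 
leaves room for a demuxer that matched strong magic bytes to outrank 
the heuristic module detection.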


> By the way, any thought on getting openmpt into Debian? libmodplug has the
> very significant advantage of only being an apt-get install away.

It's on the roadmap, and we are in contact with a fellow Debian 
developer, but he has been rather busy with other things.


Regards,
Jörn

