[FFmpeg-devel] RFC: new packed pixel formats (machine vision)
martin schitter
ms+git at mur.at
Tue Oct 22 14:11:04 EEST 2024
On 22.10.24 08:50, Diederick C. Niehorster wrote:
>> I want to pick up a discussion i started last week
>> (https://ffmpeg.org/pipermail/ffmpeg-devel/2024-October/334585.html)
>> in a new thread, with the relevant information nicely organized. This
>> is about adding pixel formats common in machine vision to ffmpeg
>> (though i understand some formats may also be used by cinema cameras),
>> and supporting them as input formats in swscale so that it becomes
>> easy to use ffmpeg for machine vision purposes (I already have such
>> software, it will be open-sourced in good time, but right now there is
>> a proprietary conversion layer from Basler i need to replace (e.g. by
>> this proposal)).
Most of your points do not look specific to machine learning or computer
vision, but more like typical/traditional video tech peculiarities.
More ML-related obstacles come into play if you have to support
optimized calculations with uncommonly small bit sizes, etc. But most of
the issues you describe should be easily solvable with features already
available in ffmpeg, if I'm not wrong.
>> Example formats are 10 and 12 bit Bayer formats, where the 10 bit
>> cannot be represented in AVPixFmtDescriptors as currently as effective
>> bit depth for the red and blue channels is 2.5 bits, but component
>> depths should be integers.
As bits will always be distinct entities, you don't need more than
simple natural numbers to describe their placement and amount precisely.
ffmpeg already supports the AV_PIX_FMT_FLAG_BITSTREAM flag to switch some
description fields from byte to bit values. That's enough to describe
the layout of most pixel formats, even packed ones which are not
aligned to byte or 32-bit borders. You just have to use bit size values
for the step and offset struct members.
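To illustrate the idea, here is a minimal sketch of addressing a component purely by bit positions. The BitComp struct and read_comp() are hypothetical names, not FFmpeg API, and the MSB-first bit order is an assumption made for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical descriptor: all values are in bits, mirroring how
 * step/offset are interpreted when AV_PIX_FMT_FLAG_BITSTREAM is set. */
typedef struct BitComp {
    int offset_bits; /* position of the component inside one pixel group */
    int step_bits;   /* distance between two consecutive pixels */
    int depth;       /* number of bits in the component */
} BitComp;

/* Read component c of pixel x from an MSB-first bitstream (assumed order). */
static unsigned read_comp(const uint8_t *buf, const BitComp *c, int x)
{
    size_t pos = (size_t)c->offset_bits + (size_t)x * (size_t)c->step_bits;
    unsigned val = 0;

    for (int i = 0; i < c->depth; i++, pos++) {
        unsigned bit = (buf[pos >> 3] >> (7 - (pos & 7))) & 1u;
        val = (val << 1) | bit;
    }
    return val;
}
```

With offset_bits = 0, step_bits = 10 and depth = 10 this walks four 10-bit gray samples stored in 5 bytes, although no sample after the first starts on a byte border.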
But there is another common case which is indeed not describable with
ffmpeg's current struct: color components can be composed of separate
MSb and LSb parts at different places in the component sequence,
similar to BayerRG12g40 and BayerRG12g24 in your
linked examples. These examples are, however, a little more
complex, because they may describe arrangements which differ between
even and odd lines. The bit packing for 10- and 12-bit data in
DNxUncompressed entails a similar issue, by packing all LSb information
as one block at the end of every scan line.
The simple case of just separated MSb and LSb locations within an
otherwise simply repeating group of pixel bits could be solved by
extending the description in a similar way to the RGBALayout
description sequence of MXF. See G.2.40/p174 of
https://pub.smpte.org/latest/st377-1/st377-1-2019.pdf
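A rough sketch of such a sequence-style description, loosely modeled on the idea of (code, depth) pairs. The LayoutEntry type, the helper names and the convention that a lower-case code marks the LSb part of the upper-case component are assumptions for illustration only; the normative code table is in the standard.

```c
#include <ctype.h>
#include <stddef.h>

/* One entry of a hypothetical layout sequence: a component code and its
 * bit depth. Upper case marks the MSb part, lower case the LSb part
 * (illustrative convention, not taken from FFmpeg). */
typedef struct LayoutEntry {
    char code;  /* e.g. 'R', 'G', 'B', 'r', 'g', 'b', 'F' for fill */
    int  depth; /* number of bits occupied by this entry */
} LayoutEntry;

/* Total depth of one component, summing its MSb and LSb parts. */
static int component_depth(const LayoutEntry *l, size_t n, char comp)
{
    int bits = 0;

    for (size_t i = 0; i < n; i++)
        if (toupper((unsigned char)l[i].code) == toupper((unsigned char)comp))
            bits += l[i].depth;
    return bits;
}

/* Total size of one pixel group in bits, fill included. */
static int group_bits(const LayoutEntry *l, size_t n)
{
    int bits = 0;

    for (size_t i = 0; i < n; i++)
        bits += l[i].depth;
    return bits;
}
```

For a sequence like R8 G8 B8 r2 g2 b2 F2, component_depth() reports 10 bits for red and group_bits() reports a 32-bit group, so the split MSb/LSb parts stay describable with plain integers.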
More complex arrangements should IMHO simply be converted to more common
formats by application-specific handling, rather than getting an overly
complex ffmpeg pixel description.
>> Other example formats are 10bit gray
>> formats where multiple values are packed without padding over multiple
>> bytes (e.g. 4 10-bit pixels packed into 5 bytes, so not aligned to 16
>> or 32 bits).
That's no problem, as already explained.
The unpacking of this kind of data to more sparse 16-bit aligned
structures can be handled very efficiently by using the PDEP intrinsics of
modern CPUs, as long as the order of components fits. Component order
swapping is unfortunately a slightly less efficient operation in the case
of packed image data, while it can be solved much more easily in the case of
planar data arrangements by pointer swaps.
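A minimal sketch of the PDEP approach for four 10-bit samples in 5 bytes. To stay portable it uses a software stand-in, pdep64_sw(), a hypothetical helper; on x86-64 CPUs with BMI2 the same call could be the _pdep_u64() intrinsic from <immintrin.h>. The LSB-first packing order is an assumption and has to match the actual source format.

```c
#include <stdint.h>

/* Portable stand-in for the BMI2 PDEP instruction: deposit the low bits
 * of src, in order, into the positions of the set bits of mask. */
static uint64_t pdep64_sw(uint64_t src, uint64_t mask)
{
    uint64_t out = 0;

    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* lowest set bit of mask */
        if (src & bit)
            out |= lowest;
        mask &= mask - 1;                     /* clear that bit */
    }
    return out;
}

/* Unpack four 10-bit samples from 5 bytes into four 16-bit values.
 * Assumes LSB-first packing: sample 0 sits in bits 0..9 of the
 * little-endian 40-bit word. */
static void unpack_4x10(const uint8_t in[5], uint16_t out[4])
{
    uint64_t word = 0;

    for (int i = 0; i < 5; i++)
        word |= (uint64_t)in[i] << (8 * i);

    /* With BMI2 this call could be replaced by _pdep_u64(word, mask). */
    uint64_t spread = pdep64_sw(word, 0x03FF03FF03FF03FFULL);

    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)(spread >> (16 * i));
}
```

The mask selects the low 10 bits of each 16-bit lane, so a single deposit spreads all four samples at once. A different component order would need an extra shuffle step afterwards.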
>> Here a proposal for how these new formats could be encoded into
>> AVPixFmtDescriptor, so that these can then be used in ffmpeg/swscale.
I think swscale and the internal processing of ffmpeg should not
support an endless number of arbitrary pixel formats, but be focused on
a really useful minimal set of required base formats.
I would look at Vulkan's pixel format list as a modern example of a more
systematic list of elementary pixel data storage variants.
(https://docs.vulkan.org/spec/latest/chapters/formats.html)
>> - AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED which indicates formats that are
>> bit-wise packed in a way that is not aligned on 1, 2 or 4 bytes (e.g.
>> 4 10-bit values in 5 bytes). This flag is needed because
>> AV_PIX_FMT_FLAG_BITSTREAM
>> formats are aligned to 8 or 32 bits, ...
Is this really the case?
But in general you should better describe byte/32-bit aligned bit-packed
formats by using explicit "fill" (X, etc.) pseudo components. Then you
can simply distinguish aligned and unaligned groups by the actual sum of
defined bits, resp. the remainder of its division by the alignment size
in bits.
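As a small sketch of that check: alignment_remainder() is a hypothetical helper, not FFmpeg API, and the depth lists are made-up examples.

```c
#include <stddef.h>

/* Sum the bit widths of all components of one pixel group, fill
 * pseudo components included, and return the remainder modulo the
 * alignment size. Zero means the group ends on an alignment border. */
static int alignment_remainder(const int *depths, size_t n, int align_bits)
{
    int total = 0;

    for (size_t i = 0; i < n; i++)
        total += depths[i];
    return total % align_bits;
}
```

Three 10-bit components plus a 2-bit fill sum to 32 bits, so the remainder for a 32-bit alignment is 0. Four 10-bit values without fill sum to 40 bits, which leaves a remainder of 8 for 32-bit alignment but 0 for byte alignment.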
I hope, that's at least inspiring food for thought... ;)
Martin