[FFmpeg-user] How are ffmpeg internal frames structured?

Mark Filipak (ffmpeg) markfilipak at bog.us
Sun Feb 7 04:24:06 EET 2021


On 02/06/2021 08:51 PM, Carl Eugen Hoyos wrote:
> Am So., 7. Feb. 2021 um 01:34 Uhr schrieb Mark Filipak (ffmpeg)
> <markfilipak at bog.us>:
>>
>> [decoder] --> [filters] --> [codecs] --> [encoder]
> 
> This looks wrong / misleading.
> There is a transcoding pipeline:
> demuxer -> decoder -> filter -> encoder -> muxer
> And there are formats, codecs, filters, devices (and more) as
> part of the FFmpeg project.
> 
>> My question is:
>> How are decoder.output and encoder.input structured?
>>
>> Yes, I know that filters can reformat the video, deinterlace, stack fields, etc.
> 
>> I have been assuming that the structure is frames of pixels like this:
>> pixel[0,0] pixel[1,0] ... pixel[in_w-1,0]  pixel[0,1] pixel[1,1] ... pixel[in_w-1,in_h-1]
> 
> (The usual style is that rows come first, then columns.)

Thanks (not x,y, eh?). Okay, then this:

pixel[0,0] pixel[0,1] ... pixel[0,in_w-1]  pixel[1,0] pixel[1,1] ... pixel[in_h-1,in_w-1]
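
If I have that right, then in C terms (just a sketch of the ordering, not FFmpeg's actual API) a 
tightly packed single-plane frame would be indexed like this:

    #include <stdint.h>

    /* Rough sketch of the row-major ordering above: rows abutted
     * end-to-end with no padding, row y, column x. */
    static uint8_t pixel_at(const uint8_t *buf, int in_w, int y, int x)
    {
        return buf[y * in_w + x];
    }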

>> In other words, raw video frames with lines abutted end-to-end
> 
> In abstract terms, this is of course correct, but note that planar and
> packed pix_fmts exist, see libavutil/pixfmt.h, lines and frames are
> often padded for performance reasons.

Okay, thanks.
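
For reference, here is how I now picture a decoded frame being addressed -- a sketch assuming 
AV_PIX_FMT_YUV420P and the AVFrame fields from libavutil/frame.h, where each plane has its own 
data pointer and stride (linesize), which is where the padding comes in:

    #include <libavutil/frame.h>

    /* Sketch only: addressing a planar 8-bit 4:2:0 frame (AV_PIX_FMT_YUV420P).
     * linesize[i] is the stride in bytes and may exceed the visible width
     * because of alignment padding, so rows are not simply abutted. */
    static uint8_t luma_at(const AVFrame *f, int y, int x)
    {
        return f->data[0][y * f->linesize[0] + x];              /* Y plane */
    }

    static uint8_t chroma_u_at(const AVFrame *f, int y, int x)
    {
        return f->data[1][(y / 2) * f->linesize[1] + (x / 2)];  /* U plane, 2x2 subsampled */
    }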

>> (i.e. not deinterlaced).
> 
> I may misunderstand but I believe this part makes no sense
> in above sentence.
> 
> Carl Eugen

"not deinterlaced" == not this:

pixel[0,0] pixel[0,1] ... pixel[0,in_w-1]  pixel[2,0] pixel[2,1] ... pixel[in_h-2,in_w-1] ... 
pixel[1,0] pixel[1,1] ... pixel[1,in_w-1]  pixel[3,0] pixel[3,1] ... pixel[in_h-1,in_w-1]

and, I should add, not organized as macroblocks within slices.
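
If I've understood correctly, an interlaced frame still comes out of the decoder as one full frame 
with the two fields woven line-by-line, and the interlacing is only signalled by flags on the frame. 
A sketch using the AVFrame fields interlaced_frame and top_field_first:

    #include <stdio.h>
    #include <libavutil/frame.h>

    /* Sketch: the decoded picture itself is a full woven frame
     * (row 0 = top field, row 1 = bottom field, row 2 = top field, ...);
     * only these flags record how it was scanned. */
    static void describe_scan(const AVFrame *f)
    {
        if (f->interlaced_frame)
            printf("interlaced, %s field first\n",
                   f->top_field_first ? "top" : "bottom");
        else
            printf("progressive\n");
    }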

The reason I ask is that some of the documentation seems to assume that readers already know this.

