[FFmpeg-user] bwdif filter question

Edward Park kumowoon1025 at gmail.com
Tue Sep 22 11:20:52 EEST 2020


Hello,

>> I'm not entirely aware of what is being discussed, but progressive_frame = !interlaced_frame kind of sent me back a bit. I do remember the discrepancy you noted in some telecined material, so I'll just quickly paraphrase what we looked into before; hopefully it'll be relevant.
>> The AVFrame interlaced_frame flag isn't completely unrelated to mpeg progressive_frame, but it's not a simple inverse either; it's very context-dependent. With mpeg video, it seems a frame is an interlaced_frame if it is not progressive_frame ...
> 
> Not so, Ted. The following two definitions are from the glossary I'm preparing (and which cites H.262).

Ah okay, I thought that was a bit weird. I assumed it was a typo, but when I saw H.242 I thought two different kinds of "frames" were being mixed up. Before saying anything else: if the side project you mentioned is a layman’s-glossary type of reference material, I think you should base it on the definitions section instead of the bitstream definitions, just my $.02. I read over what I wrote and I don't think it helped at all, so let me try again. I am saying that there are "frames" in the context of a container, and a different kind of video "frame" that has width and height dimensions. (When I wrote "picture frames" I meant physical wooden frames for photo prints, but with terms like "frame pictures" in play, that wasn't very effective in hindsight.)

> Since you capitalize "AVFrames", I assume that you cite a standard of some sort. I'd very much like to see it. Do you have a link?

This was the main bit of info I was trying to add: it's not a standard of any kind, quite the opposite, actually, since technically its declaration could be changed in a single commit (though I don't think that is a common occurrence). AVFrame is the struct FFmpeg uses to abstract/implement frames across the many different formats it handles; the documentation even notes that its size may change as fields are added to the struct.

There's documentation generated for it here: https://www.ffmpeg.org/doxygen/trunk/structAVFrame.html
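
To make it concrete, here is a minimal sketch of where that flag lives once a decoder is running. This assumes you've already opened an AVCodecContext for the video stream; error handling is mostly omitted:

  #include <stdio.h>
  #include <libavcodec/avcodec.h>

  /* Feed one packet to an already-opened video decoder and print the
   * interlacing flags on whatever frames come out. interlaced_frame
   * and top_field_first are plain int fields on the AVFrame struct. */
  static void report_flags(AVCodecContext *dec, AVPacket *pkt)
  {
      AVFrame *frame = av_frame_alloc();
      if (!frame)
          return;
      if (avcodec_send_packet(dec, pkt) >= 0) {
          while (avcodec_receive_frame(dec, frame) >= 0) {
              printf("interlaced_frame=%d top_field_first=%d\n",
                     frame->interlaced_frame, frame->top_field_first);
              av_frame_unref(frame);
          }
      }
      av_frame_free(&frame);
  }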

> H.262 refers to "frame pictures" and "field pictures" without clearly delineating them. I am calling them "pictures" and "halfpictures".

I thought ISO/IEC 13818-2 was basically the identical standard, and it gives pretty clear definitions, IMO; here are some excerpts. (Wall of text coming up… standards are very wordy by necessity.)

> 6.1.1. Video sequence
> 
> The highest syntactic structure of the coded video bitstream is the video sequence.
> 
> A video sequence commences with a sequence header which may optionally be followed by a group of pictures header and then by one or more coded frames. The order of the coded frames in the coded bitstream is the order in which the decoder processes them, but not necessarily in the correct order for display. The video sequence is terminated by a sequence_end_code. At various points in the video sequence a particular coded frame may be preceded by either a repeat sequence header or a group of pictures header or both. (In the case that both a repeat sequence header and a group of pictures header immediately precede a particular picture, the group of pictures header shall follow the repeat sequence header.)
> 
> 6.1.1.1. Progressive and interlaced sequences
> This specification deals with coding of both progressive and interlaced sequences.
> 
> The output of the decoding process, for interlaced sequences, consists of a series of reconstructed fields that are separated in time by a field period. The two fields of a frame may be coded separately (field-pictures). Alternatively the two fields may be coded together as a frame (frame-pictures). Both frame pictures and field pictures may be used in a single video sequence.
> 
> In progressive sequences each picture in the sequence shall be a frame picture. The sequence, at the output of the decoding process, consists of a series of reconstructed frames that are separated in time by a frame period.
> 
> 6.1.1.2. Frame
> 
> A frame consists of three rectangular matrices of integers; a luminance matrix (Y), and two chrominance matrices (Cb and Cr).
> 
> The relationship between these Y, Cb and Cr components and the primary (analogue) Red, Green and Blue Signals (E’R, E’G and E’B), the chromaticity of these primaries and the transfer characteristics of the source frame may be specified in the bitstream (or specified by some other means). This information does not affect the decoding process.
> 
> 6.1.1.3. Field
> 
> A field consists of every other line of samples in the three rectangular matrices of integers representing a frame.
> 
> A frame is the union of a top field and a bottom field. The top field is the field that contains the top-most line of each of the three matrices. The bottom field is the other one.
> 
> 6.1.1.4. Picture
> 
> A reconstructed picture is obtained by decoding a coded picture, i.e. a picture header, the optional extensions immediately following it, and the picture data. A coded picture may be a frame picture or a field picture. A reconstructed picture is either a reconstructed frame (when decoding a frame picture), or one field of a reconstructed frame (when decoding a field picture).
> 
> 6.1.1.4.1. Field pictures
> 
> If field pictures are used then they shall occur in pairs (one top field followed by one bottom field, or one bottom field followed by one top field) and together constitute a coded frame. The two field pictures that comprise a coded frame shall be encoded in the bitstream in the order in which they shall occur at the output of the decoding process.
> 
> When the first picture of the coded frame is a P-field picture, then the second picture of the coded frame shall also be a P-field picture. Similarly when the first picture of the coded frame is a B-field picture the second picture of the coded frame shall also be a B-field picture.
> 
> When the first picture of the coded frame is an I-field picture, then the second picture of the frame shall be either an I-field picture or a P-field picture. If the second picture is a P-field picture then certain restrictions apply, see 7.6.3.5.
> 
> 6.1.1.4.2. Frame pictures
> 
> When coding interlaced sequences using frame pictures, the two fields of the frame shall be interleaved with one another and then the entire frame is coded as a single frame-picture.

So field pictures decode to fields, and frame pictures decode to frames? I'm not sure I understand 100%, but I think it’s pretty clear that “two field pictures comprise a coded frame.” IIRC, field pictures aren’t decoded into separate output fields, because two frames in one packet makes something explode within FFmpeg.
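
If you want to see what actually comes out of the decoder for field-picture material, ffprobe can dump the per-frame flags. If I'm right, two field pictures in the bitstream should show up as a single frame entry here (input.mpg is just a placeholder for whatever sample you have):

  ffprobe -select_streams v:0 -show_entries frame=interlaced_frame,top_field_first,repeat_pict input.mpg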

>>> But that's a visual projection of the decoded and rendered video, or if you're encoding, it's what you want to see when you decode and render your encoding. I think the term itself has a very abstract(?) nuance. The picture seen at a certain presentation timestamp either has been decoded, or can be encoded as frame pictures or field pictures.
> 
> You see. You are using the H.262 nomenclature. That's fine, and I'm considering using it also even though it appears to be excessively wordy. Basically, I prefer "pictures" for interlaced content and "halfpictures" for deinterlaced content unweaved from a picture.
> 
>> Both are stored in "frames", a red herring in the terminology imo ...
> 
> Actually, it is frames that exist. Fields don't exist as discrete, unitary structures in macroblocks in streams.
> 
>> ... The AVFrame that ffmpeg deals with isn't necessarily a "frame" as in a rectangular picture frame with width and height, but closer to how the data is temporally "framed," e.g. in packets with header data, where one AVFrame has one video frame (picture). Image data could be scanned by macroblock, unless you are playing actual videotape.
> 
> You're singing a sweet song, Ted. Frames actually do exist in streams and are denoted by metadata. The data inside slices inside macroblocks I am calling framesets. I firmly believe that every structure should have a unique name.
> 
>> So when interlace-scanned fields are stored in frames, it's more that fields and frames are both generalized into a single structure called "frames" for both types of pictures. AVFrames, as the prefix might suggest, can also be audio frames. And though it's not a very good analogy to field-based video, multiple channels of sound can be interleaved.
> 
> Interleave is not necessarily interlaced. For example, a TFF YCbCr420 frameset has 7 levels of interleave: YCbCr sample-quads, odd & even Y blocks (#s 1,2,3,4), odd & even Y halfmacroblocks, TFF Y macroblock, TFF Cb420 block (#5), TFF Cr420 block (#6), and macroblock, but only 3 interlacings: TFF Y macroblock, TFF Cb420 block, and TFF Cr420 block.

Okay, horrible analogy then :). This is kind of what I was referring to: frames definitely do exist, but there are many different “frames,” including some carrying audio and some with no media data at all, just signaling metadata. And then there are frames as in encoded video frames. I do think the progressive_frame bit in MPEG and interlaced_frame in AVFrame both refer to this type of video frame, but they are flags set on structures that represent such significantly different constructs that they shouldn’t be compared directly.
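
For what it's worth, the bridge between the two inside FFmpeg seems to be basically a one-line hand-off when the mpeg2 decoder finishes reconstructing a frame. Paraphrasing from memory (not the literal source), it amounts to something like:

  /* roughly what the mpeg2 decoder does per reconstructed frame;
   * paraphrased from memory, not the literal source */
  frame->interlaced_frame = !progressive_frame;

One flag is read out of the picture coding extension in the bitstream, the other is set on the output AVFrame, which is exactly why I don't think they can be compared directly.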

For example, imagine you picked out a random AVPacket from an MPEG stream: what do you think the chances are of that random packet yielding an AVFrame that stores an image frame?

(Actually, probably very, very high, but sometimes you will pick one with no video data at all.)
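
Just to illustrate the packet-to-frame mismatch, here's a rough sketch of counting it (assuming the usual open/find-stream boilerplate has already run; fmt_ctx, dec and video_stream_index are placeholders for whatever you set up there):

  #include <stdio.h>
  #include <libavformat/avformat.h>
  #include <libavcodec/avcodec.h>

  /* Count how many demuxed packets actually yield a decoded picture.
   * Packets and pictures are not 1:1: some packets carry audio or
   * data streams, and the decoder may buffer before emitting. */
  static void count_pictures(AVFormatContext *fmt_ctx, AVCodecContext *dec,
                             int video_stream_index)
  {
      AVPacket *pkt   = av_packet_alloc();
      AVFrame  *frame = av_frame_alloc();
      long packets = 0, video_packets = 0, pictures = 0;

      while (av_read_frame(fmt_ctx, pkt) >= 0) {
          packets++;
          if (pkt->stream_index == video_stream_index) {
              video_packets++;
              if (avcodec_send_packet(dec, pkt) >= 0)
                  while (avcodec_receive_frame(dec, frame) >= 0) {
                      pictures++;
                      av_frame_unref(frame);
                  }
          }
          av_packet_unref(pkt);
      }
      printf("%ld packets, %ld video packets, %ld pictures\n",
             packets, video_packets, pictures);
      av_frame_free(&frame);
      av_packet_free(&pkt);
  }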

A more one-to-one association could be found in the MPEG encoder/decoder's internal structures, but I don't know how I could make use of that to extract flags from pictures that may already have been reconstructed.

https://www.ffmpeg.org/doxygen/trunk/structPicture.html

Regards,
Ted Park

P.S. Also, "fields" is another possibly confusing term: bitfields vs. top/bottom fields.


