[FFmpeg-user] repeat a frame

Mark Filipak (ffmpeg) markfilipak at bog.us
Thu Mar 4 10:57:38 EET 2021


On 2021-03-04 01:05, Jim DeLaHunt wrote:
> On 2021-03-03 14:57, Mark Filipak (ffmpeg) wrote:
>> With TB = 1/(720000 ticks/s), for a 23.976fps output,
>> deltaPTS = (1001/24000 s/frame)/(1/(720000 ticks/s)) = 30030 ticks/frame
>>
>> If working time_base (from the AVRational) has an effective resolution of int32 (i.e. ±2147483647 
>> ticks), then frames past 0:49:42 will be dropped.
> 
> 
> I see what you are getting at, but you are using the wrong terminology for this software product, so 
> your statements sound garbled.
> 
> Remember that, in FFmpeg, the time_base is the time difference between frames, in seconds...

I can't agree. I think you refer to (PTS ticks)x(TB s/tick).

I think that time_base (TB) is the tick rate of the STC (system time clock, 27 MHz for mpeg2video) 
divided by a divider (300 for mpeg2video). For example, 1/(90000 ticks/s) = 11.[1..] µs/tick is the 
time_base for mpeg2video.

What I'm using is TB = 1/(720000 ticks/s) = 1.3[8..] µs/tick (which has 8x higher resolution than 
mpeg2video).
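
As a rough check of those tick durations, here is a small sketch. Python is only my choice for illustration; it is not how ffmpeg computes anything, and the names STC_HZ, DIVIDER and tick_us are made up for the sketch.

```python
from fractions import Fraction

STC_HZ = 27_000_000   # mpeg2video system time clock, 27 MHz (figure from the text)
DIVIDER = 300         # mpeg2video divider (figure from the text)

tb_mpeg2 = Fraction(DIVIDER, STC_HZ)   # reduces to 1/90000 s per tick
tb_fine = Fraction(1, 720_000)         # the time_base discussed here

def tick_us(tb):
    """Duration of one tick in microseconds, as a float for display only."""
    return float(tb * 1_000_000)

print("mpeg2video:", tb_mpeg2, "s/tick =", tick_us(tb_mpeg2), "us/tick")
print("720000    :", tb_fine, "s/tick =", tick_us(tb_fine), "us/tick")
print("ratio of tick durations:", tb_mpeg2 / tb_fine)
```

The ratio is kept as an exact fraction, so the "8x" figure does not depend on float rounding.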

>... It is an 
> attribute of the stream, so its value does not change regardless of the length of the stream (unless 
> something changes the time_base, creating a second stream derived from the first). Time_base is type 
> AVRational, which is a rational number, not an integer, not a float.

Hmmm... You are implying that time_base is not being converted to a float, eh? AVRational 
mathematics, eh? Strange, but I guess it's not impossible (or unreasonable).

> Instead of "working time_base", I think you mean "time offset". This is the number of seconds since 
> the zero time. FFmpeg can get a lot done without calculating the time offset.

By "working time_base" I mean the time_base used in the filter pipeline (as opposed to the decoder 
or encoder time_base). I gave it a name because it otherwise has no name to differentiate it.

> Time offset = time_base * Presentation Timestamp (PTS).  Thus, PTS = time offset / time_base.
> 
> FFmpeg uses PTS values, related to the constant time_base, a lot.
> 
> 
>> If working time_base has an effective resolution of uint32 (i.e. 4294967295 ticks), then frames 
>> past 1:39:25 will be dropped
> 
> When it comes to integers, "resolution" is not the right term to use. "Maximum value" and "minimum 
> value" are the most common. "range" or "capacity" might also be used.  The number of bits in the 
> integer is the "size".

No, I mean resolution, temporal resolution in this case. A 1.3[8..] µs/tick time_base has 8 times 
the resolution of a 11.[1..] µs/tick time_base.

The thing that's attractive about a 1.3[8..] µs/tick time_base is that it produces exact integer 
'PTS's for 23.976fps, 24fps, 25fps, 29.970fps, 30fps, 47.952fps, 48fps, 50fps, 59.940fps, 60fps, 
100fps, 119.880fps, and 120fps. Lower resolution 'time_base's do not produce exact integers for such 
a broad range of frame rates.
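
The claim can be checked with exact rational arithmetic. This is a minimal sketch, not anything taken from ffmpeg. It ASSUMES that the fractional rates in the list are the usual N*1000/1001 rates (23.976 = 24000/1001 and so on); the names RATES, delta_pts and inexact_rates are made up for the sketch.

```python
from fractions import Fraction

# Frame rates from the list above. ASSUMPTION: the fractional ones are N*1000/1001.
RATES = {
    "23.976": Fraction(24000, 1001), "24": Fraction(24), "25": Fraction(25),
    "29.970": Fraction(30000, 1001), "30": Fraction(30),
    "47.952": Fraction(48000, 1001), "48": Fraction(48), "50": Fraction(50),
    "59.940": Fraction(60000, 1001), "60": Fraction(60), "100": Fraction(100),
    "119.880": Fraction(120000, 1001), "120": Fraction(120),
}

def delta_pts(rate, ticks_per_second):
    """Ticks per frame, kept as an exact rational (no float rounding)."""
    return Fraction(ticks_per_second) / rate

def inexact_rates(ticks_per_second):
    """Names of the frame rates whose ticks-per-frame is not a whole number."""
    return [name for name, rate in RATES.items()
            if delta_pts(rate, ticks_per_second).denominator != 1]

for tps in (90_000, 720_000):
    print(tps, "ticks/s, non-integer deltaPTS for:", inexact_rates(tps))
```

An empty list for a given tick rate means every frame rate in the table lands on whole ticks.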

> The AVRational value is stored as an integer numerator and an integer denominator. The ranges of 
> those integers are sufficient to store 1 and 720,000. Beyond that, for this discussion it doesn't 
> matter what their maximum values are.

Well, per rational.h, 'num' & 'den' are both integers.

Now, I don't know how '720000' is stored. Is it stored as an int64? I don't know, but I do know that 
it can't be stored as an 8-bit integer or a 16-bit integer or anything else that has 8-bit or 16-bit 
resolution. That includes AVRational. 720000 can't be stored as AVRational. AVRational just doesn't 
have enough resolution.

Please prove me wrong! I hope you can, because in the proving you will expose what really happens 
when ffmpeg computes 'PTS's, and that's something I very much want to know.
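
For what it's worth, here is one way to frame the storage question as a sketch. rational.h declares 'num' and 'den' as plain C 'int'. The sketch ASSUMES 'int' is 32 bits wide (usual on desktop platforms, but an assumption here) and simply compares 720000 against the signed range of each width. It says nothing about what ffmpeg does with the value afterward. The helper name fits_signed is made up.

```python
def fits_signed(value, bits):
    """True if value fits in a two's-complement signed integer of the given width."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi

DEN = 720_000  # denominator of the time_base under discussion

for bits in (8, 16, 32, 64):
    # ASSUMPTION: the 32-bit row is the one that corresponds to a C 'int'.
    print(f"{bits:2d}-bit signed: {'fits' if fits_signed(DEN, bits) else 'does not fit'}")
```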

>> I think that the successful transcode of a 2:21:19 video confirms that the working time_base is 
>> sufficient. I suspect it's a float but of course I don't know that and I don't know its resolution.
> 
> As we have discussed, PTS is stored as an int64_t, a signed integer with a size of 64 bits. The 
> maximum value of an int64_t is (2^63)-1, about 9 billion billion (9.2 * 10^18). FFmpeg may reserve a 
> few of the maximum and minimum values to indicate special conditions, but 9 billion billion will do.

Again, my issue is with the resolution of TB, not the extent (range) of either TB or PTS.

> The time offset is calculated from an int64_t * (integer / integer). FFmpeg code can choose to store 
> the result exactly as a rational number (assuming a numerator with a high enough maximum value), or 
> approximately as an integer or a high-precision float, as the circumstances demand.
> 
> With a time_base of 1/720,000 secs, a near-maximum PTS of 9 billion billion indicates a near-maximum 
> time offset of a little over 396,000 years.

I'm sure you know that the running time produced by 'PTS's depends on frame rate *and* time_base. 
For example, for 23.976fps and TB = 1/(720000 ticks/s) = 1.3[8..] µs/tick, a PTS of 
+9223372036854775800 indicates a 307138595965860 maximum frame number. That frame number is reached 
by a video of about 12810238940076 seconds (roughly 406,000 years). Such an improbable 
running time is a consequence of PTS being an int64. It has no bearing on temporal resolution. The 
temporal resolution is 1.3[8..] µs/tick (or, for 23.976fps CFR, deltaPTS = 30030).
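
To keep extent and resolution apart, here is a sketch that recomputes the extent side for three counter widths while the tick size stays fixed. It is only arithmetic under the stated assumptions (TB = 1/720000 s, 23.976fps taken as 24000/1001); the names last_frame and running_time_seconds are made up, and nothing here is claimed about ffmpeg's internals.

```python
from fractions import Fraction

TB = Fraction(1, 720_000)            # seconds per tick (the resolution)
FRAME_TIME = Fraction(1001, 24_000)  # seconds per frame at 23.976fps
DELTA_PTS = FRAME_TIME / TB          # ticks per frame

def last_frame(max_pts):
    """Highest whole frame number whose PTS still fits below max_pts."""
    return int(max_pts // DELTA_PTS)

def running_time_seconds(max_pts):
    """Time offset of the largest PTS, as an exact rational number of seconds."""
    return max_pts * TB

print("deltaPTS =", DELTA_PTS, "ticks/frame")
for label, max_pts in (("int32", 2**31 - 1), ("uint32", 2**32 - 1), ("int64", 2**63 - 1)):
    secs = running_time_seconds(max_pts)
    print(label, "last frame", last_frame(max_pts), "at about", float(secs), "s")
```

Changing the counter width changes only how far the timeline reaches; the tick duration, and so deltaPTS, is the same in every row.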

> Most of your films will likely be shorter than that.

Do you agree with my figures? Do you see the difference between time extents and time resolution?

