[FFmpeg-user] Is ffmpeg's "Output Stream" framerate... wrong?

roninpawn roninpawn at gmail.com
Tue Mar 16 03:51:44 EET 2021


I develop a Python application used to conduct official timing for
speedrunning leaderboards based on automated video analysis. And I've
caught a little oversight of my own that leads me to wonder if there isn't
an oversight at the core of ffmpeg in the 'fps' reported in the output
stream.

This relates to Variable Frame Rate video, so please try to hold your
shouts of "*JUST CONVERT IT TO CFR*" until after.

---

So, I'm opening a rawvideo pipe to ffmpeg in Python (using ffmpeg-python
0.2.0 by Karl Kroening as a command-line wrapper) to receive a bitstream of
the frames in a video for analysis. It's blazing fast, btw! I used to do
this with OpenCV and it was a slog. (Not to mention that OpenCV couldn't
seek to the ACTUAL frame or time requested; it would just land on a
near-ish keyframe instead.)
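
For reference, the pipe setup is roughly this (a simplified sketch; the
446x344 analysis resolution and the file path are hardcoded here just for
illustration):

    import ffmpeg  # ffmpeg-python 0.2.0

    path = 'I:/Downloads/Medieval 111 IGT.mp4'
    width, height = 446, 344  # scaled-down analysis resolution

    # Spawn ffmpeg and read raw BGR frames off stdout: every decoded
    # frame, in order, with no landing-on-a-keyframe surprises.
    process = (
        ffmpeg
        .input(path)
        .output('pipe:', format='rawvideo', pix_fmt='bgr24',
                s='{}x{}'.format(width, height))
        .run_async(pipe_stdout=True)
    )

    frame_size = width * height * 3  # bgr24 = 3 bytes per pixel
    frame_count = 0
    while True:
        raw = process.stdout.read(frame_size)
        if len(raw) < frame_size:
            break
        frame_count += 1  # (the real app analyzes the frame here)
    process.wait()
    print(frame_count, 'frames read')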

Before accessing the file, I call ffprobe to get 'r_frame_rate,' and use
those values to identify the frames per second of the footage. And
apparently, what 'r_frame_rate' returns is the "Output" stream fps, which
is NOT the "Input" frames per second on VFR video.
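
In code, that lookup is just (again simplified, continuing from the
snippet above):

    probe = ffmpeg.probe(path)  # wraps ffprobe's JSON output
    video = next(s for s in probe['streams']
                 if s['codec_type'] == 'video')
    num, den = (int(x) for x in video['r_frame_rate'].split('/'))
    fps = num / den  # for this clip: '30/1' -> 30.0, not 30.01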

Have a look at the Input data in my console for this VFR footage,
specifically the fps under Stream#0:0:


> *Input #0*, mov,mp4,m4a,3gp,3g2,mj2, from 'I:/Downloads/Medieval 111
> IGT.mp4':
>   Metadata:
>     major_brand     : isom
>     minor_version   : 512
>     compatible_brands: isomiso2avc1mp41
>     encoder         : Lavf58.29.100
>   Duration: 00:01:23.94, start: 0.000000, bitrate: 2270 kb/s
>     *Stream #0:0*(und): Video: h264 (Main) (avc1 / 0x31637661),
> yuv420p(tv, bt709), 1280x720 [SAR 1:1 DAR 16:9], 2137 kb/s, *30.01 fps*,
> 30 tbr, 90k tbn, 60 tbc (default)


So the input is 30.01 fps. Cool. Now look at the Output stream reported.

> *Output #0*, rawvideo, to 'pipe:':
>   Metadata:
>     major_brand     : isom
>     minor_version   : 512
>     compatible_brands: isomiso2avc1mp41
>     encoder         : Lavf58.45.100
>     *Stream #0:0*: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 446x344
> [SAR 1:1 DAR 223:172], q=2-31, 110465 kb/s, *30 fps*, 30 tbn, 30 tbc
> (default)


It's just straight-up 30fps. Okay! Great. So, ffmpeg is automatically
converting this VFR footage to CFR and handing it back to me at 30.00fps.
...is what I had erroneously assumed.

(a lot of "buts" coming)

But my application, which times events detected in the frames of the
footage, counted 2301 frames between two events. And then told me that
2301 / 30fps is 1m 16s 700ms. Which is correct! AT 30FPS ANYWAY.

But when the footage is converted to CFR, or simply measured mathematically
at 30.0fps, there are NOT 2301 frames between the two events. There are 2300
frames between those events. Because if you account for the .01 of 30.01fps,
a frame must be dropped: 2301 frames / 30.01fps = 1m 16s 674ms, and a span
of 76.674s at a flat integer 30fps only holds about 2300 frames. So it's
not 2301, but 2300 frames / 30fps that gets you a 667ms time. As a matter of
approximation, a frame must be discarded to stay as accurate as you can
while squeezing the footage.
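
Spelled out as arithmetic:

    frames = 2301
    print(frames / 30.01)  # 76.674s -> 1m 16s 674ms: the true span
    print(frames / 30)     # 76.700s -> 1m 16s 700ms: what my app reported
    print(76.674 * 30)     # ~2300.2: that span holds 2300 frames at 30fps
    print(2300 / 30)       # 76.667s -> 1m 16s 667ms: the correct CFR answer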

But all this is just backstory, because I now realize that ffmpeg is NOT
altering the VFR footage in any way, which makes the MOST sense. It's just
handing me back a bitstream of every frame in the video. Perfectly
sensible!

But while it's handing me back every frame of the 30.01fps media in the
output stream, it's also identifying that output as a flat 30fps. Which it
IS NOT. As ffmpeg clearly reports in the input stream's console log. And
yet more confusing, it seems that ffprobe's 'r_frame_rate' is returning the
output frame rate of 30 / 1, instead of whatever values ffmpeg used to come
up with the 30.01 it declares for the input at runtime.

To deepen the quagmire, I have previously pushed VFR video through this
application that averaged 59.96fps. And ffprobe / ffmpeg reported back to
me a frame rate of 59.94fps instead, snapping into the common NTSC frame
base of US-broadcast 60fps TV.
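
(That 59.94, I assume, is the NTSC rational 60000/1001:

    print(60000 / 1001)  # 59.94005994...: the "60fps" US-broadcast base

...which my 59.96 average was apparently close enough to get snapped onto.)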

So I think, for the first time, I understand the information ffmpeg is
giving me. That's good! But I find myself wondering why this would possibly
be the intended behavior. The "output" frame rate and the response from
ffprobe are automatically snapped to the nearest entry in an internally
stored list of industry-standard framebases, and reported as the fps when
they are definitely not. There would be no need to "snap" to these values
if they were correct. And because the output tends to report round integer
values of tidy framerates, it led me to the conclusion that ffmpeg was
doing some automatic magic on VFR to present it back to me as CFR. Which it
is not doing.

AND AND, because the frame rate ffprobe returns is this willfully incorrect
industry-standard framebase data, any calculations done USING that value
will be decidedly more wrong than they would be when using the VFR frame
rate echoed in the INPUT stream. So my question is:

Why?

What is the virtue or benefit of this mis-reporting of the output frame
rate? If it's a convenience feature to report the nearest
industry standard frame rate without another developer having to maintain
their own list, that's a good idea! But I can't see why it would be
reported as the output frames per second on console, when the output
stream's frames are being delivered at a different frame rate. The console
representation led me, as a developer, to draw the most obvious conclusion
from an application I trust: That if ffmpeg is telling me this output
stream is 30fps, the output frames of this stream must be at 30fps. I
assumed there was some automatic transcoding applied to VFR, or internal
frame-drop / add happening to conform the input to the figure presented on
the console. And I scratched my head for a long time wondering why I never
got a float value back from 'r_frame_rate' when handing it VFR.

So, I'm asking for confirmation that I'm understanding this right. And it's
not a rhetorical question when I ask: why does ffmpeg intentionally
mis-report the best-known frame rate figure in the output stream? Is there
a reason?
Does it make more sense when you aren't pulling a raw bytestream or
something? Also, I presume I'll find in the documentation a value other
than 'r_frame_rate' where I can poll the actual frame rate - the INPUT
frame rate - instead of this snapped and conformed one. Feel free to save
me the search if you know, though!
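
(My best guess so far, from poking at ffprobe's stream fields, is
'avg_frame_rate', which I understand to be total frames over total
duration, as opposed to 'r_frame_rate's timebase-derived figure.
Continuing from the snippets above:

    video = next(s for s in ffmpeg.probe(path)['streams']
                 if s['codec_type'] == 'video')
    print(video['r_frame_rate'])    # '30/1': the snapped figure
    print(video['avg_frame_rate'])  # should land near 30.01 for this clip,
                                    # if I understand the field right

Corrections welcome if I've got that wrong.)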

Thanks in advance for replies.
-Roninpawn

