[FFmpeg-user] resolution of the waterfall diagram of typical mp3 file

Florin Andrei florin at andrei.myip.org
Sun Aug 7 23:17:17 EEST 2016

Consider an mp3 file, mono (single channel), 44.1 kHz, encoded at 128 
kb/s constant bitrate (to keep things simple) with your encoder of 
choice using average settings (let's say whatever ffmpeg uses as 
defaults for this case).

Think of the full 3D representation of the spectrum of the whole file, 
with time being one dimension, frequency another dimension, and relative 
amplitude the 3rd dimension. Or the waterfall diagram - again time is 
one dimension, frequency the other, and the relative amplitude is 
shown as color or brightness.

For that particular file, the resolution of the time dimension is pretty 
clear: it's 44100 samples per second. What's less clear to me are the 
resolutions of the other two dimensions. If I were to build the full 3D 
representation, what resolutions should I choose on the other two 
dimensions to achieve, overall, a similar amount of information as that 
contained in the original mp3 file?

For the frequency dimension, what are the limits? Is it 20 Hz and 20 
kHz? And how many frequency "buckets" do I need to keep things 
comparable to the original mp3 file?
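
To make the "buckets" question concrete, here is a rough sketch of the 
arithmetic. It assumes the MPEG-1 Layer III layout of 576 frequency 
lines per granule (long blocks), which is not stated anywhere above, and 
it assumes the lines are spread evenly from 0 Hz up to the Nyquist 
frequency. The names are made up for illustration. Treat the result as 
an estimate of the codec's internal grid, not of what a given encoder 
actually keeps after its own low-pass filtering.

```python
# Rough estimate of the frequency grid of an MP3 stream.
# Assumption (not from the original post): MPEG-1 Layer III long blocks,
# i.e. 576 frequency lines per granule spanning 0 Hz .. Nyquist.

SAMPLE_RATE_HZ = 44_100
LINES_PER_GRANULE = 576  # assumed Layer III long-block line count


def bucket_layout(sample_rate_hz=SAMPLE_RATE_HZ, lines=LINES_PER_GRANULE):
    """Return (upper frequency limit in Hz, width of one bucket in Hz)."""
    nyquist_hz = sample_rate_hz / 2
    bucket_width_hz = nyquist_hz / lines
    return nyquist_hz, bucket_width_hz


nyquist_hz, bucket_width_hz = bucket_layout()
print(f"upper limit : {nyquist_hz:.0f} Hz")
print(f"bucket width: {bucket_width_hz:.2f} Hz")
```

Under these assumptions the representable range is 0 Hz to 22050 Hz 
rather than 20 Hz to 20 kHz, and each bucket is a little over 38 Hz wide.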

For the relative amplitude, how many bits do I need to capture more or 
less the same amount of info as the original mp3 file? 8 bit? 16 bit? 
Keep in mind this is the completely rolled out waterfall representation, 
not the encoded mp3 stream.

I think all these questions are ultimately tied into the total amount of 
information contained in the mp3 file. And I'm only looking for 
reasonable estimates for these parameters.
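
As a back-of-the-envelope version of that information budget, the sketch 
below spreads the 128 kb/s evenly over a time/frequency grid. The frame 
size of 1152 samples and the 576 lines per granule are assumptions taken 
from the MPEG-1 Layer III layout, not from anything measured on a real 
file, and the helper name bits_per_cell is hypothetical. The average also 
ignores that part of each frame is spent on headers and side information, 
and that a real encoder distributes bits very unevenly across frequencies.

```python
# Average information budget of a 128 kb/s, 44.1 kHz mono MP3 stream.
# Assumptions (not from the original post): MPEG-1 Layer III framing with
# 1152 samples per frame and 576 frequency lines per granule.

BITRATE_BPS = 128_000
SAMPLE_RATE_HZ = 44_100
SAMPLES_PER_FRAME = 1152   # assumed Layer III frame length
LINES_PER_GRANULE = 576    # assumed Layer III long-block line count


def bits_per_cell(bitrate_bps, slices_per_second, buckets):
    """Average bits available per (time slice, frequency bucket) cell."""
    return bitrate_bps / (slices_per_second * buckets)


# Bits available per original PCM sample.
bits_per_pcm_sample = BITRATE_BPS / SAMPLE_RATE_HZ

# Average size of one frame in bytes (before padding to a whole byte).
bytes_per_frame = BITRATE_BPS / 8 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ

# One granule yields 576 lines from 576 new samples, so the number of
# grid cells per second equals the sample rate.
granules_per_second = SAMPLE_RATE_HZ / LINES_PER_GRANULE
avg_bits_per_line = bits_per_cell(
    BITRATE_BPS, granules_per_second, LINES_PER_GRANULE
)

print(f"bits per PCM sample   : {bits_per_pcm_sample:.2f}")
print(f"bytes per frame       : {bytes_per_frame:.1f}")
print(f"granules per second   : {granules_per_second:.2f}")
print(f"avg bits per grid cell: {avg_bits_per_line:.2f}")
```

With these numbers the budget comes out at roughly 2.9 bits per grid 
cell on average, which suggests that an 8-bit or 16-bit amplitude at 
every cell would hold far more raw data than the mp3 stream itself.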

Florin Andrei
