[FFmpeg-user] resolution of the waterfall diagram of typical mp3 file

Nicolas George george at nsup.org
Mon Aug 8 00:28:52 EEST 2016

On primidi 21 Thermidor, year CCXXIV, Florin Andrei wrote:
> For that particular file, the resolution of the time dimension is pretty
> clear: it's 44100 samples per second.

You are making a wrong assumption here, and that is tainting the rest of
your reasoning.

You cannot compute the spectrum of a single sample; that does not make
sense mathematically. The spectrum needs to be computed on the whole stream
or, if you want to observe how it evolves over time, over a window that is
large enough. You will never be able to see a component whose period is
larger than the window. Therefore, if you want to distinguish an A4 (440 Hz)
at 44100 samples per second, you will need a window of at least about 100
samples (44100 / 440), preferably several times that.
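
As a quick check of that figure, here is a minimal Python sketch. The
function name samples_per_period is made up for this illustration; the
44100 Hz rate and the 440 Hz A4 are the values discussed in this thread.

```python
def samples_per_period(sample_rate_hz, frequency_hz):
    """Number of samples spanned by one full period of a tone."""
    return sample_rate_hz / frequency_hz


# One period of A4 (440 Hz) in a 44100 Hz stream.
a4_period = samples_per_period(44100, 440.0)
print(f"one period of A4 spans about {a4_period:.1f} samples")
```

A window shorter than that cannot contain even one full cycle of the note,
which is why a practical window is several times longer.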

You can shift the window by a single sample, so that n-1 samples overlap, but
nobody does that because consecutive spectra would be almost identical and
quite uninteresting. Plus, it would be insanely expensive to compute. Usually,
the window is shifted by half its length or by its full length, I think.
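
To give an idea of the cost, the following sketch counts how many spectra
one second of audio produces for each shift. The helper frame_starts and
the window size of 2048 samples are assumptions chosen for the example,
not something taken from FFmpeg.

```python
def frame_starts(total_samples, window, hop):
    """Start index of every full window when advancing by `hop` samples."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    return list(range(0, total_samples - window + 1, hop))


one_second = 44100   # samples in one second at 44100 Hz
window = 2048        # assumed window size, for illustration only

for hop in (1, window // 2, window):
    count = len(frame_starts(one_second, window, hop))
    print(f"hop={hop:5d}: {count} spectra per second of audio")
```

Shifting by one sample yields tens of thousands of nearly identical
spectra per second, against a few dozen for a half-window shift.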

> What's less clear to me is the
> resolutions of the other two dimensions. If I were to build the full 3D
> representation, what resolutions should I choose on the other two dimensions
> to achieve, overall, a similar amount of information as that contained in
> the original mp3 file?
> For the frequency dimension, what are the limits? Is it 20 Hz and 20 kHz?

If you use something that looks like an FFT for your spectrum, with input
sample frequency F and window size N, you get the spectrum from 0 Hz to F/2
in steps of F/N. Information theory tells us that any other method would
yield roughly the same precision.

In short, the frequency resolution of the spectrum is given by the inverse
of the duration of the window, and the maximum frequency is half the
sample frequency.
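
The frequency axis described above can be listed explicitly. This is only
a sketch: bin_frequencies is a hypothetical helper, and N = 2048 is an
arbitrary window size picked for the example.

```python
def bin_frequencies(sample_rate_hz, window_size):
    """Frequencies of the spectrum bins, from 0 Hz up to sample_rate / 2."""
    step = sample_rate_hz / window_size          # F / N
    return [k * step for k in range(window_size // 2 + 1)]


freqs = bin_frequencies(44100, 2048)
print(f"{len(freqs)} bins, step {freqs[1]:.2f} Hz, "
      f"from {freqs[0]:.0f} Hz to {freqs[-1]:.0f} Hz")
```

Doubling N halves the step between bins, but each spectrum then covers
twice as much time.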

For example, if you want to distinguish A4 from the next note (440 Hz versus
466 Hz) at 44100 samples per second, you need a window of at least about
1696 samples (44100 / 26).
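
The same arithmetic as a small sketch, assuming the rule of thumb that the
bin step F/N must not exceed the gap between the two notes. The name
min_window_samples is invented for this example.

```python
def min_window_samples(sample_rate_hz, f1_hz, f2_hz):
    """Smallest window (in samples) whose bin step F/N is no wider
    than the gap between the two frequencies."""
    gap_hz = abs(f2_hz - f1_hz)
    if gap_hz == 0:
        raise ValueError("the two frequencies must differ")
    return sample_rate_hz / gap_hz


n = min_window_samples(44100, 440.0, 466.0)
print(f"about {n:.0f} samples, i.e. {1000.0 * n / 44100:.1f} ms of audio")
```

In practice the window would be rounded up, often to a power of two such
as 2048, which leaves some margin.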

> And how many frequency "buckets" do I need to keep things comparable to the
> original mp3 file?

Remember that the perception of frequency is logarithmic. Also, when several
frequency bins are merged into one bucket, their contributions add
quadratically: it is the squared magnitudes that add up.
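
One possible reading of these two remarks, as a sketch: space the bucket
edges by a constant ratio, and combine the bins falling into a bucket by
summing their squared magnitudes. Both helpers and the choice of 12
buckets per octave are assumptions made for illustration.

```python
import math


def log_bucket_edges(f_low_hz, f_high_hz, buckets_per_octave):
    """Bucket edges spaced by a constant ratio, covering f_low..f_high."""
    edges = []
    k = 0
    while True:
        edge = f_low_hz * 2.0 ** (k / buckets_per_octave)
        edges.append(edge)
        if edge >= f_high_hz:
            return edges
        k += 1


def bucket_magnitudes(bin_freqs_hz, bin_magnitudes, edges_hz):
    """Merge linear bins into buckets: sum the power, then take the root."""
    power = [0.0] * (len(edges_hz) - 1)
    for freq, mag in zip(bin_freqs_hz, bin_magnitudes):
        for i in range(len(power)):
            if edges_hz[i] <= freq < edges_hz[i + 1]:
                power[i] += mag * mag   # quadratic addition
                break
    return [math.sqrt(p) for p in power]


edges = log_bucket_edges(20.0, 20000.0, 12)
print(f"{len(edges) - 1} buckets between 20 Hz and 20 kHz")
```

With such edges every bucket spans the same musical interval, so the low
buckets are only a few Hz wide while the high ones gather many FFT bins.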


  Nicolas George