[FFmpeg-devel] soundtouch filter?
pkoshevoy at gmail.com
Tue May 15 08:30:32 CEST 2012
On 4/29/12 2:56 PM, Pavel Koshevoy wrote:
> On 04/29/2012 09:50 AM, Reimar Döffinger wrote:
>> Hello Pavel,
>> it seems I can't send anything to the list at the moment, so direct
>> answer instead...
>> On Sun, Apr 29, 2012 at 08:36:02AM -0600, Pavel Koshevoy wrote:
>>> SoundTouch is LGPL. However, I am fine with GPL. What existing
>>> avfilter code is a good reference for me to use for scaletempo port?
>> I don't really know, but af_aresample does the kind of pts fiddling you
>> asked about and is not that large since most of the "heavy lifting"
>> code is in
>> So that might be a good starting point.
> OK, I had a quick look at scaletempo; it implements WSOLA. I had a
> look at SoundTouch; it appears to implement SOLA. I looked at the WSOLA
> paper, but it's not terribly enlightening. This article
> http://www.surina.net/article/time-and-pitch-scaling.html by the
> SoundTouch author is much easier to understand. I am thinking I may
> not need to port this filter from mplayer; I'll try to implement it
> from scratch first, maybe using ffmpeg's rdft functions for the
> cross-correlation calculation.
> It also occurs to me that this filter needs to modify timestamps not
> only for audio but also for video and subtitles; otherwise they'll go
> out of sync. Is this going to be a problem?
> Thank you,
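The timestamp concern above comes down to one mapping applied to every stream: after a tempo change by a factor `tempo`, the time elapsed since the change point shrinks (tempo > 1) or grows (tempo < 1) by that factor. A minimal C++ sketch, assuming a single constant tempo that takes effect at a known `origin` timestamp; `scale_pts` is a hypothetical helper, not an ffmpeg API:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: maps a timestamp (in any stream's time-base ticks)
// onto the output timeline after a constant tempo change that starts at
// `origin`. tempo > 1 plays faster, so elapsed time shrinks; tempo < 1
// plays slower, so it grows. Timestamps at or before `origin` pass through.
std::int64_t scale_pts(std::int64_t pts, std::int64_t origin, double tempo)
{
    assert(tempo > 0.0);
    if (pts <= origin)
        return pts;
    const double elapsed = static_cast<double>(pts - origin) / tempo;
    return origin + static_cast<std::int64_t>(std::llround(elapsed));
}
```

Applying the same mapping to video and subtitle timestamps keeps them aligned with the stretched audio. Because the ratio is unitless, each stream can keep its own time base.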
OK, I now have my own implementation of a WSOLA filter. It doesn't use
cross-correlation for audio fragment alignment; instead, for performance
reasons, I've used a multi-resolution pyramid registration approach,
which is O(N).
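Coarse-to-fine alignment of this kind can be sketched as follows. This is one possible way to do pyramid registration of a reference fragment against a search region, not the implementation described here: each level halves the resolution by averaging sample pairs, the coarsest level is searched exhaustively, and each finer level re-checks only a few offsets (`radius`) around twice the coarser result. Matching uses a sum of squared differences; `pyramid_align`, `min_len` and `radius` are hypothetical names. Like any coarse-to-fine search, it can miss the global optimum if the coarse levels are misleading.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Signal = std::vector<float>;

// Halve the resolution by averaging adjacent sample pairs (a crude
// low-pass filter followed by decimation by 2).
static Signal downsample(const Signal& s)
{
    Signal out(s.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = 0.5f * (s[2 * i] + s[2 * i + 1]);
    return out;
}

// Sum of squared differences between ref and search[offset, offset + ref.size()).
static double ssd(const Signal& ref, const Signal& search, std::size_t offset)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = static_cast<double>(ref[i]) - search[offset + i];
        sum += d * d;
    }
    return sum;
}

// Exhaustively test offsets in [lo, hi], clamped to the valid range.
static std::size_t best_in_range(const Signal& ref, const Signal& search,
                                 std::size_t lo, std::size_t hi)
{
    const std::size_t max_off = search.size() - ref.size();
    hi = std::min(hi, max_off);
    lo = std::min(lo, hi);
    std::size_t best = lo;
    double best_cost = std::numeric_limits<double>::infinity();
    for (std::size_t k = lo; k <= hi; ++k) {
        const double cost = ssd(ref, search, k);
        if (cost < best_cost) {
            best_cost = cost;
            best = k;
        }
    }
    return best;
}

// Coarse-to-fine search for the offset of `ref` inside `search`.
std::size_t pyramid_align(const Signal& ref, const Signal& search,
                          std::size_t min_len = 16, std::size_t radius = 2)
{
    assert(min_len > 0 && !ref.empty() && search.size() >= ref.size());
    const std::size_t max_off = search.size() - ref.size();
    if (ref.size() < 2 * min_len)  // coarsest level: exhaustive search
        return best_in_range(ref, search, 0, max_off);
    // Solve at half resolution, then refine around the up-scaled estimate.
    const std::size_t coarse =
        pyramid_align(downsample(ref), downsample(search), min_len, radius);
    const std::size_t center = 2 * coarse;
    const std::size_t lo = center > radius ? center - radius : 0;
    return best_in_range(ref, search, lo, center + radius);
}
```

Only the coarsest level is searched exhaustively; every finer level costs about (2 * radius + 1) fragment comparisons, which is where the savings over a full search come from.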
My implementation is just a couple of C++ template classes parameterized
by the sample type (unsigned char, short int, int, float). The filter
supports multi-channel audio. I've already integrated it into my
ffmpeg-based video player, but not as an avfilter (because I don't want that
The filter files are here:
So, is it worth trying to wrap it as an avfilter and add it to ffmpeg?
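A C++ template parameterized by sample type maps naturally onto multi-channel, interleaved audio. As an illustration only (not the filter files mentioned above), here is a hypothetical `crossfade` helper that overlap-adds two equally long interleaved fragments with a linear fade; WSOLA implementations typically use a smoother window such as Hann. Integer results are rounded, and linear interpolation is also correct for offset-binary `unsigned char` PCM because the 128 offset cancels out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical sketch: cross-fade the tail of `prev` into the head of
// `next` for interleaved audio with `channels` channels. T may be
// unsigned char, short, int or float. Both inputs hold frames * channels
// samples; the output has the same layout.
template <typename T>
std::vector<T> crossfade(const std::vector<T>& prev, const std::vector<T>& next,
                         std::size_t channels)
{
    assert(channels > 0 && prev.size() == next.size() &&
           prev.size() % channels == 0);
    const std::size_t frames = prev.size() / channels;
    std::vector<T> out(prev.size());
    for (std::size_t f = 0; f < frames; ++f) {
        // Weight ramps linearly from 0 (all prev) to 1 (all next).
        const double w =
            frames > 1 ? static_cast<double>(f) / (frames - 1) : 1.0;
        for (std::size_t c = 0; c < channels; ++c) {
            const std::size_t i = f * channels + c;
            const double v = (1.0 - w) * prev[i] + w * next[i];
            if (std::is_integral<T>::value)
                out[i] = static_cast<T>(std::lround(v));  // round integer PCM
            else
                out[i] = static_cast<T>(v);
        }
    }
    return out;
}
```

The same template instantiates for every sample format, so the alignment and overlap logic only has to be written once.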
BTW, my filter doesn't sound the same as mplayer's scaletempo. At a tempo
of 0.5 I think mine sounds better (that's subjective), but at 0.9
scaletempo sounds better, so that's something else to consider. The
difference may be due to the choice of segment alignment algorithm. I may
try to implement cross-correlation via FFT sometime later to make it a
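For comparison with the pyramid approach, cross-correlation via FFT can be sketched as follows. To stay self-contained it uses a textbook recursive radix-2 complex FFT rather than ffmpeg's rdft functions, whose interface is not shown here; `best_lag_fft` is a hypothetical name. The correlation at every lag is obtained at once from IFFT(conj(FFT(ref)) * FFT(search)), and each lag is normalized by the energy of its search window so that loud segments are not favored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

using cplx = std::complex<double>;

// Recursive radix-2 FFT; a.size() must be a power of two.
// invert == true computes the inverse transform without the 1/n scaling.
static void fft(std::vector<cplx>& a, bool invert)
{
    const std::size_t n = a.size();
    if (n == 1)
        return;
    std::vector<cplx> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, invert);
    fft(odd, invert);
    const double pi = std::acos(-1.0);
    const double sign = invert ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cplx t = std::polar(1.0, sign * 2.0 * pi * k / n) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Lag k in [0, search.size() - ref.size()] maximizing the normalized
// cross-correlation between ref and search[k, k + ref.size()).
std::size_t best_lag_fft(const std::vector<double>& ref,
                         const std::vector<double>& search)
{
    assert(!ref.empty() && search.size() >= ref.size());
    const std::size_t m = ref.size();
    // Zero-pad so the circular correlation has no wrap-around for valid lags.
    std::size_t n = 1;
    while (n < m + search.size())
        n <<= 1;
    std::vector<cplx> A(n), B(n);
    for (std::size_t i = 0; i < m; ++i) A[i] = ref[i];
    for (std::size_t i = 0; i < search.size(); ++i) B[i] = search[i];
    fft(A, false);
    fft(B, false);
    for (std::size_t i = 0; i < n; ++i)
        A[i] = std::conj(A[i]) * B[i];
    fft(A, true);  // now A[k].real() / n == sum_i ref[i] * search[i + k]

    const std::size_t max_lag = search.size() - m;
    double energy = 0.0;  // energy of the current search window
    for (std::size_t i = 0; i < m; ++i)
        energy += search[i] * search[i];
    std::size_t best = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t k = 0; k <= max_lag; ++k) {
        if (k > 0)  // slide the window one sample to the right
            energy += search[k + m - 1] * search[k + m - 1] -
                      search[k - 1] * search[k - 1];
        const double corr = A[k].real() / static_cast<double>(n);
        const double score = corr / std::sqrt(std::max(energy, 0.0) + 1e-12);
        if (score > best_score) {
            best_score = score;
            best = k;
        }
    }
    return best;
}
```

Unlike a coarse-to-fine search, this examines every lag, so it cannot be misled by coarse levels; the price is O(N log N) work per alignment instead of O(N).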