[FFmpeg-devel] [PATCH v3 3/3] aadec: fix seeking in mp3 content

Karsten Otto ottoka at posteo.de
Wed Jul 4 10:32:32 EEST 2018


> Am 04.07.2018 um 03:26 schrieb Michael Niedermayer <michael at niedermayer.cc>:
> 
> Signierter PGP-Teil
> On Tue, Jul 03, 2018 at 10:25:36PM +0200, Karsten Otto wrote:
>> TL;DR: I will drop patch 3/3, may rather spend some time investigating why
>> "ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
>> be used for frame or chapter detection, unfortunately.
>> 
>> More details inline below.
>> 
>>> Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael at niedermayer.cc>:
>>> 
>>> Signierter PGP-Teil
>>> On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
>>>> 
>>>>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael at niedermayer.cc>:
>>>>> 
>>>>> Signierter PGP-Teil
>>>>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
>>>>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
>>>>>> scan for the next valid frame header. Then truncate the packet, and
>>>>>> also adjust timestamp information accordingly.
>>>>>> ---
>>>>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
>>>>>> 1 file changed, 28 insertions(+), 5 deletions(-)
>>>>> 
>>>>> Please see AVSTREAM_PARSE_TIMESTAMPS
>>>>> 
>>>>> This codec specific code in demuxers should not be needed
>>>>> 
>>>> I tried that before, and you are right that it takes care of timestamp adjustments.
>>>> 
>>>> However, after a seek the parsed packet still contains a partial frame before the
>>>> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
>>>> situation and discard the fragment, but unfortunately it does not. Instead it passes
>>>> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
>>>> painful while wearing headphones!
>>> 
>>> I think you mis-diagnose this at least somewhat
>>> your code searches for a specific mp3 header, the parser and decoder would
>>> accept a wider range of mp3 variants.
>>> But both can choose points that are not mp3 frame starts. (if that is the
>>> problem you are seeing, iam not completely sure it is)
>>> 
>> It took a closer look at what happens when I hear a BLEEP: The packet begins
>> with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
>> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
>> By looking for the more restricted header, my patch finds the real next frame at
>> offset 78.
>> 
>> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
>> version ID?
>> 
>>> Also is the more restricted header you search for always used or could
>>> this be failing with some files ?
>>> 
>> Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
>> Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.
>> 
>>> Either way, looking at the demuxer a bit deeper, theres a TOC list in the
>>> main header which points to chunks. The one file i found has 12 such chunks
>>> the first represents the whole file i suspect, the next largest the audio
>>> data, another one the metadata.
>>> I guess the remaining 2 large ones could be a cover image and an index.
>> Correct, seems like all aa files have the TOC, but its entries can be in a different
>> order in each file. I guess thats why the original aadec.c implementation just
>> looks for the largest chunk to play.
>> 
>>> I didnt really look at it, but theres a table in there with pairs of 32bit
>>> values. the first in the file i have goes from 0 to 3 the second starts
>>> multiple times from 0 and seems monotonly increasing and staying within
>>> the filesize.
>>> The sample i have does not store mp3 but it looks like this is a index
>>> maybe offsets for packets in each of the 3 chapters.
>>> 
>>> Please look at the data, if it can be used. It would be much better than
>>> scaning the file linearly and searching for some byte sequence to find
>>> packet starts.
>>> 
>> Short answer: Sorry, it is not possible to derive frame offsets nor chapter
>> offsets from the index.
>> 
> 
>> Long answer:
>> All offsets in the index are the same, and matching the "codec_second_size"
>> = crypto chunk size, roughly one second of audio:
>> - 1045 for format 2 (SIPR 8kbps)
>> - 2000 for format 3 (SIPR 16kbps)
>> - 3982 for format 4 (MP3 32kbps)
>> This is different from the respective frame size, which is 19, 20, and 104/105
> 
> The first 2 are exact multiples of the frame size.
> 
> I dont have a sample for the mp3 case so i cannot check but are you sure
> the actually writen numbers dont match up multiples of mp3 frames ?
> 
Yes, I have tested both with some old aa books I still had on disk, and some I
bought more recently. Using -fdebug ts, I can clearly see a packet size of
104 alternating with 105, as I would expect from mp3. Furthermore, using my
specialized header detection on every chunk, I see that the actual mp3 frame
header indeed starts at offset 0 in chunk 0. But then it changes for every
subsequent chunk, I have seen a maximum of 103, as to be expected.

> If not the question that remains is how does the official code seek in this?
> It seems a bit odd that they would get the design of the index wrong but then
> implement demuxer side searching for a mp3 header not noticing that this is
> not working fully reliable.
> [This would imply that while they implemented the demuxer they would have
> been aware that their index doesnt work, why would they not fix it at this
> point ...]
> Its hard to imagine they would not notice this bug with the index at all,
> not that this is impossible or anything
> 

Agreed. Here is how I think this is actually supposed to work: Every encrypted
chunk contains roughly one second of audio. So, if you want to seek to a
specific second, you can just look up an index entry. This tells you the chapter
number and the offset in the chapter. Now, if you already know the chapter
offsets, you can jump right to the correct position in the audio section. (I assume
the chapter offsets are somewhere else in the aa format, possibly encrypted.)
Regarding the odd mp3 frame offsets, a more restricted header detection will
take care of that, as I do in patch 3/3.

Of course, given each chunk is the same size, you don't really need an index,
but you can calculate the offset instead. This is what I am doing in patch 2/3.
The index only makes sense if the offsets were not linear, but shuffled around.
Maybe they did that in the original aa format 1 to scramble the audio, and kept
the index for backward compatibility. This is pure speculation though, as I have
no information at all about format 1.

So, the whole thing boils down to knowing the chapter offsets. From the
current aadec.c implementation I know that this information exist inline in the
audio section, so that's what I am using in patch 2/3. As I said, the chapter
information might also exist somewhere else in the format, but finding it would
require more reverse engineering, which I am not comfortable with.

Yes, my current patch requires a couple dozen seek operations on the file,
which is not optimal. But I don't think this is too much of a performance hit, as
nowadays everybody has an OS with a sturdy disk block cache, or read directly
from RAM storage anyway. It works at least, so I can use my old aa books on
my Linux phone now :-)

Cheers, Karsten




More information about the ffmpeg-devel mailing list