[FFmpeg-devel] [PATCH v3 3/3] aadec: fix seeking in mp3 content

Michael Niedermayer michael at niedermayer.cc
Wed Jul 4 04:26:33 EEST 2018


On Tue, Jul 03, 2018 at 10:25:36PM +0200, Karsten Otto wrote:
> TL;DR: I will drop patch 3/3, may rather spend some time investigating why
> "ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
> be used for frame or chapter detection, unfortunately.
> 
> More details inline below.
>  
> > Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael at niedermayer.cc>:
> > 
> > Signierter PGP-Teil
> > On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
> >> 
> >>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael at niedermayer.cc>:
> >>> 
> >>> Signierter PGP-Teil
> >>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
> >>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
> >>>> scan for the next valid frame header. Then truncate the packet, and
> >>>> also adjust timestamp information accordingly.
> >>>> ---
> >>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
> >>>> 1 file changed, 28 insertions(+), 5 deletions(-)
> >>> 
> >>> Please see AVSTREAM_PARSE_TIMESTAMPS
> >>> 
> >>> This codec specific code in demuxers should not be needed
> >>> 
> >> I tried that before, and you are right that it takes care of timestamp adjustments.
> >> 
> >> However, after a seek the parsed packet still contains a partial frame before the
> >> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
> >> situation and discard the fragment, but unfortunately it does not. Instead it passes
> >> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
> >> painful while wearing headphones!
> > 
> > I think you mis-diagnose this at least somewhat
> > your code searches for a specific mp3 header, the parser and decoder would
> > accept a wider range of mp3 variants.
> > But both can choose points that are not mp3 frame starts. (if that is the
> > problem you are seeing, iam not completely sure it is)
> > 
> It took a closer look at what happens when I hear a BLEEP: The packet begins
> with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
> By looking for the more restricted header, my patch finds the real next frame at
> offset 78.
> 
> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
> version ID?
> 
> > Also is the more restricted header you search for always used or could
> > this be failing with some files ?
> > 
> Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
> Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.
> 
> > Either way, looking at the demuxer a bit deeper, theres a TOC list in the
> > main header which points to chunks. The one file i found has 12 such chunks
> > the first represents the whole file i suspect, the next largest the audio
> > data, another one the metadata.
> > I guess the remaining 2 large ones could be a cover image and an index.
> Correct, seems like all aa files have the TOC, but its entries can be in a different
> order in each file. I guess thats why the original aadec.c implementation just
> looks for the largest chunk to play.
> 
> > I didnt really look at it, but theres a table in there with pairs of 32bit
> > values. the first in the file i have goes from 0 to 3 the second starts
> > multiple times from 0 and seems monotonly increasing and staying within
> > the filesize.
> > The sample i have does not store mp3 but it looks like this is a index
> > maybe offsets for packets in each of the 3 chapters.
> > 
> > Please look at the data, if it can be used. It would be much better than
> > scaning the file linearly and searching for some byte sequence to find
> > packet starts.
> > 
> Short answer: Sorry, it is not possible to derive frame offsets nor chapter
> offsets from the index.
> 

> Long answer:
> All offsets in the index are the same, and matching the "codec_second_size"
> = crypto chunk size, roughly one second of audio:
> - 1045 for format 2 (SIPR 8kbps)
> - 2000 for format 3 (SIPR 16kbps)
> - 3982 for format 4 (MP3 32kbps)
> This is different from the respective frame size, which is 19, 20, and 104/105

The first 2 are exact multiples of the frame size.

I dont have a sample for the mp3 case so i cannot check but are you sure
the actually writen numbers dont match up multiples of mp3 frames ?

If not the question that remains is how does the official code seek in this?
It seems a bit odd that they would get the design of the index wrong but then
implement demuxer side searching for a mp3 header not noticing that this is
not working fully reliable.
[This would imply that while they implemented the demuxer they would have
 been aware that their index doesnt work, why would they not fix it at this
 point ...]
Its hard to imagine they would not notice this bug with the index at all,
not that this is impossible or anything


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The real ebay dictionary, page 2
"100% positive feedback" - "All either got their money back or didnt complain"
"Best seller ever, very honest" - "Seller refunded buyer after failed scam"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20180704/29a9dcc6/attachment.sig>


More information about the ffmpeg-devel mailing list