[FFmpeg-devel] update data_offset field in format context

Michael Niedermayer michaelni
Thu Nov 6 18:42:33 CET 2008

On Thu, Nov 06, 2008 at 05:50:46PM +0200, Yoav Steinberg wrote:
> Michael Niedermayer wrote:
> > On Wed, Nov 05, 2008 at 05:03:28PM +0200, Yoav Steinberg wrote:
> >> Hi,
> >> I've come across some instances where the data_offset field of 
> >> AVFormatContext isn't updated after opening a file for input 
> >> (av_open_input_file). From the comment in the header it seems that the 
> >> data_offset field should represent the position in the input where the 
> >> header ends and the data begins. In some cases the header parsing done 
> >> during file input seems to run to the end of the input and isn't restored 
> >> to the position where the data begins, yielding an invalid data_offset 
> >> value equal to the file size (specifically, this reproduces when calling 
> >> av_open_input_file on a mov file).
> >> I've added some code which attempts to provide a more accurate data_offset 
> >> value for such files based on the index_entries table (if one is 
> >> available). This seems to work for me. It'll be cool if this is added to 
> >> the trunk or if someone can explain why not to add this.
> >> (My code is attached).
> >>
> > 
> > This is not a proper solution to the problem, it also adds an obscure and
> > more importantly completely undocumented behavior to index entries.
> > 
> > For a proper solution (aka anything that might be accepted into svn)
> > the first step is a full explanation of what is wrong, basically, if
> > it cannot be reproduced exactly it's not a full explanation.
> > Second would be the question of whether it's easier to fix the affected demuxers
> > or to change the core to guess the offset. Either way all demuxers must
> > be looked at, in the first case to find&fix them in the second to ensure
> > the core change works with all.
> > 
> > [...]
> > 
> In my specific application (using libavformat) I'm interested in using 
> the data_offset field to figure out how much of the file is used for 
> data and how much is used for "headers". This is for some general file 
> "rating" system which isn't relevant to our discussion. I found the 
> data_offset field useful since it's documented as the "offset of the 
> first packet". Problem was that some demuxers leave pb at the end of the 
> input after calling read_header. Since I wasn't sure if changing 
> this behavior in each rogue demuxer is a good idea I found another 
> solution which should work (and actually does work for my tested cases) 
> independently of whether the demuxer seeks back or not after read_header.
> Just as a note this solution was required for "mov" demuxer since its 
> read_header reads the file to the end (if possible).
> Question is whether the data_offset is something I should theoretically 
> be able to count on, or whether it's just a helper utility for any 
> demuxer that wants some place to save the data offset (without adding a 
> private field).
> Currently the:
>          if (pb && !ic->data_offset)
>              ic->data_offset = url_ftell(ic->pb);
> in the core attempts to use the current position if it wasn't set by the 
> demuxer, indicating a "best guess" policy. I was attempting in the patch 
> to improve the guessing by employing the index entries table when available.
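
The attached patch is not reproduced in this message, so the following is only a minimal sketch of what an index-based guess could look like. It assumes the guess is "the smallest byte position among the known index entries, falling back to the current stream position when no entries exist". SketchIndexEntry and guess_data_offset are hypothetical names for illustration, not libavformat API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one index entry; only the byte position matters here. */
typedef struct {
    int64_t pos; /* byte offset of the packet in the file */
} SketchIndexEntry;

/*
 * Return the smallest packet position found in the index, or `fallback`
 * (e.g. the current stream position) when the index is empty.
 * Assumption: every entry position is a valid absolute file offset.
 */
static int64_t guess_data_offset(const SketchIndexEntry *entries, size_t n,
                                 int64_t fallback)
{
    assert(n == 0 || entries != NULL);
    if (n == 0)
        return fallback;

    int64_t min_pos = entries[0].pos;
    for (size_t i = 1; i < n; i++)
        if (entries[i].pos < min_pos)
            min_pos = entries[i].pos;
    return min_pos;
}
```

Note that such a guess only yields the position of the earliest indexed packet; it says nothing about whether header data also follows the packets.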

Well, I didn't write these 2 lines IIRC, so I can't say for sure, but not every
piece of common code is "best guess" code.
One could very well see it the other way around: that it's code factored out
of the demuxers and only executed when it is exactly correct.

> I'd be willing to add a data_offset setting in the "mov" demuxer if lack 
> of valid data_offset after reading the mov header is considered a bug. 
> But I guess that if a valid data_offset is required only if the packet 
> reading depends on it then having crap in the data_offset after reading 
> the header isn't a bug. And in that case I can't complain...
> What do you think?

I think we should let Baptiste, who is the mov maintainer, comment, but AFAICS
data_offset does not have much meaning for mov. Headers can at least be at the
beginning or the end, and possibly even in the middle.
Also, a file with data chunks randomly shuffled, with the first packet at
the end and the last one at the beginning, should be valid ...
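
To illustrate the layout point, here is a rough sketch (not code from the mov demuxer) that walks the top-level atoms of a mov-style buffer and records where 'moov' (header) and 'mdat' (data) start. It relies on the usual atom layout of a 32-bit big-endian size followed by a 4-byte type, with size 1 meaning a 64-bit size follows and size 0 meaning "to end of file". AtomLayout and scan_top_level_atoms are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int64_t moov_pos; /* offset of the first 'moov' atom, -1 if not found */
    int64_t mdat_pos; /* offset of the first 'mdat' atom, -1 if not found */
} AtomLayout;

/* Read a 32-bit big-endian value. */
static uint32_t rb32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Walk top-level atoms in buf[0..len) and note where moov and mdat begin. */
static AtomLayout scan_top_level_atoms(const uint8_t *buf, size_t len)
{
    AtomLayout l = { -1, -1 };
    size_t pos = 0;

    assert(len == 0 || buf != NULL);
    while (len - pos >= 8) {
        uint64_t size = rb32(buf + pos);
        const uint8_t *type = buf + pos + 4;

        if (size == 1) {            /* 64-bit extended size follows the type */
            if (len - pos < 16)
                break;
            size = ((uint64_t)rb32(buf + pos + 8) << 32) | rb32(buf + pos + 12);
            if (size < 16)
                break;              /* malformed, stop scanning */
        } else if (size == 0) {     /* atom extends to the end of the input */
            size = len - pos;
        } else if (size < 8) {
            break;                  /* malformed, stop scanning */
        }

        if (l.moov_pos < 0 && !memcmp(type, "moov", 4))
            l.moov_pos = (int64_t)pos;
        if (l.mdat_pos < 0 && !memcmp(type, "mdat", 4))
            l.mdat_pos = (int64_t)pos;

        if (size > len - pos)
            break;                  /* truncated atom */
        pos += (size_t)size;
    }
    return l;
}
```

With a buffer holding an 'mdat' atom followed by a 'moov' atom, mdat_pos comes out smaller than moov_pos, i.e. the header sits after the data, so "the position where the header ends" does not identify where the packets begin.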

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I count him braver who overcomes his desires than him who conquers his
enemies for the hardest victory is over self. -- Aristotle