[FFmpeg-devel] update data_offset field in format context

Yoav Steinberg yoav
Thu Nov 6 16:50:46 CET 2008



Michael Niedermayer wrote:
> On Wed, Nov 05, 2008 at 05:03:28PM +0200, Yoav Steinberg wrote:
>> Hi,
>> I've come across some instances where the data_offset field of 
>> AVFormatContext isn't updated after opening a file for input 
>> (av_open_input_file). From the comment in the header it seems that the 
>> data_offset field should represent the position in the input where the 
>> header ends and the data begins. In some cases the header parsing done 
>> during file input seems to run to the end of the input and isn't restored 
>> to the position where the data begins, yielding an invalid data_offset 
>> value equal to the file size (specifically this recreates when calling 
>> av_open_input_file on a mov file).
>> I've added some code which attempts to provide a more accurate data_offset 
>> value for such files based on the index_entries table (if one is 
>> available). This seems to work for me. It'll be cool if this is added to 
>> the trunk or if someone can explain why not to add this.
>> (My code is attached).
>>
> 
> This is not a proper solution to the problem, it also adds an obscure and
> more importantly completely undocumented behavior to index entries.
> 
> For a proper solution (aka anything that might be accepted into svn)
> the first step is a full explanation of what is wrong, basically, if
> it cannot be reproduced exactly it's not a full explanation.
> second would be the question if its easier to fix the affected demuxers
> or to change the core to guess the offset. Either way all demuxers must
> be looked at, in the first case to find&fix them in the second to ensure
> the core change works with all.
> 
> [...]
> 

In my specific application (using libavformat) I'm interested in using 
the data_offset field to figure out how much of the file is used for 
data and how much is used for "headers". This is for a general file 
"rating" system which isn't relevant to our discussion. I found the 
data_offset field useful since it's documented as the "offset of the 
first packet". The problem is that some demuxers leave pb at the end of 
the input after read_header is called. Since I wasn't sure whether 
changing this behavior in each rogue demuxer is a good idea, I found 
another solution which should work (and actually does work for my tested 
cases) independently of whether the demuxer seeks back after read_header.
Just as a note, this solution was required for the "mov" demuxer since 
its read_header reads the file to the end (if possible).
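
To show why the bad value matters for my use, here is a minimal 
standalone sketch of the kind of calculation I mean. It does not use 
libavformat; header_percent is a made-up helper, and treating everything 
before data_offset as "headers" is my own simplification.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Percentage of the file that precedes the first packet.
 * Hypothetical helper: data_offset and file_size are plain byte counts
 * supplied by the caller. Returns -1 if the inputs are unusable.
 */
static int header_percent(int64_t data_offset, int64_t file_size)
{
    assert(sizeof(int64_t) == 8);

    if (file_size <= 0 || data_offset < 0 || data_offset > file_size)
        return -1;
    return (int)(data_offset * 100 / file_size);
}
```

When data_offset ends up equal to the file size, as in the mov case, 
this reports the whole file as headers, which is clearly not useful.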

The question is whether data_offset is something I should theoretically 
be able to count on, or whether it's just a helper for any demuxer that 
wants some place to save the data offset (without adding a private 
field).

Currently, this code:
         if (pb && !ic->data_offset)
             ic->data_offset = url_ftell(ic->pb);

in the core attempts to use the current position if it wasn't set by the 
demuxer, indicating a "best guess" policy. I was attempting in the patch 
to improve the guessing by employing the index entries table when available.
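
To make the idea concrete, here is a minimal standalone sketch of that 
guessing policy. It is not the attached patch and does not touch 
libavformat: IndexEntry and guess_data_offset are made-up names, and 
taking the smallest entry position as the data offset is an assumption 
for illustration (with several streams one would take the minimum over 
all of them).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an index entry; only the byte position is used. */
typedef struct {
    int64_t pos; /* byte offset of the packet in the input */
} IndexEntry;

/*
 * Guess where packet data starts.
 *   demuxer_offset: value set by the demuxer, 0 if it set nothing
 *   entries, n:     index entries gathered while reading the header
 *   cur_pos:        stream position after the header was read
 */
static int64_t guess_data_offset(int64_t demuxer_offset,
                                 const IndexEntry *entries, size_t n,
                                 int64_t cur_pos)
{
    assert(entries != NULL || n == 0);

    if (demuxer_offset)      /* the demuxer knows best */
        return demuxer_offset;
    if (n == 0)              /* no index: fall back to the current position */
        return cur_pos;

    /* Otherwise use the earliest packet position found in the index. */
    int64_t min_pos = entries[0].pos;
    for (size_t i = 1; i < n; i++)
        if (entries[i].pos < min_pos)
            min_pos = entries[i].pos;
    return min_pos;
}
```

With no index entries this behaves exactly like the current fallback; 
with entries it no longer depends on where read_header left pb.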

I'd be willing to set data_offset in the "mov" demuxer if the lack of a 
valid data_offset after reading the mov header is considered a bug. 
But I guess that if a valid data_offset is required only when packet 
reading depends on it, then having crap in data_offset after reading 
the header isn't a bug. And in that case I can't complain...

What do you think?





