[FFmpeg-devel] [PATCH] RFC: Set reasonable subtitle dimensions for timed-text in mov/mp4.

Mon Mar 11 21:09:14 CET 2013

On 2013-03-11 12:44, Nicolas George wrote:
> Let me get it straight. The extradata in the header can declare "this
> subtitle is meant to be displayed in a 576×72 rectangle at (72, 468)",
> and this will look good if displayed on a 720×576 video, but completely
> out of place on a 1920×1080 video. Is that it?

That's right. The spec assumes a close coupling between the text box
and the subtitle stream display area, and in turn a close coupling
between that display area and the video stream.

>
> Is it possible to specify some more in the extradata: "this subtitle
> is meant to be displayed in a 576×72 rectangle at (72, 468) on top of
> a 720×576 video"?

No. The extradata can't express that - the relationship to the video
is carried by the stream headers the muxer writes, not by the
extradata itself.

>> Note that, as far as I know, only the Apple QT Player on OSX or
>> Windows respects this sizing information.
>
> What happens if the sizing information specifies 0×0?

Then you don't see anything - which is what triggered the original bug
report. The player renders the subtitles in a 0×0 box in the top-left
corner :-/

>> -    if (track->enc->extradata_size)
>> +    if (track->enc->extradata_size) {
>> +        if (track->enc->extradata_size >= 18) {
>> +            // Rewrite text box dimensions to match video stream.
>
> IMHO, doing so unconditionally is not acceptable.
>
>> +            uint8_t *ed = track->enc->extradata;
>
>> +            uint16_t width = track->video_width;
>> +            uint16_t height = track->video_height;
>> +            height /= 10;
>
> Why do you need to copy this into local variables?
>
>> +            ed[14] = height >> 8;
>> +            ed[15] = height & 0xFF;
>> +            ed[16] = width >> 8;
>> +            ed[17] = width & 0xFF;
>
> We have macros to store integers in a buffer with a specified
> endianness.
>

This is all RFC - I'm not suggesting this is how I'd actually do it.
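
For what it's worth, a cleaned-up version of that fragment might look
roughly like the sketch below. It is only an illustration, not what I'd
commit: set_text_box_size() and put_be16() are hypothetical names, and
put_be16() is a local stand-in for the big-endian store that AV_WB16
from libavutil/intreadwrite.h provides, so that the snippet compiles on
its own. The offsets (14 and 16) and the height/10 heuristic are taken
straight from the patch quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for a big-endian 16-bit store (the job AV_WB16 does). */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/*
 * Hypothetical helper: rewrite the two 16-bit fields at offsets 14 and 16,
 * the same bytes the quoted patch touches. Returns 0 on success, or -1 if
 * the extradata is too short to contain them.
 */
static int set_text_box_size(uint8_t *ed, size_t ed_size,
                             uint16_t video_width, uint16_t video_height)
{
    assert(ed != NULL);
    if (ed_size < 18)
        return -1;
    put_be16(ed + 14, video_height / 10); /* box height: a tenth of the video */
    put_be16(ed + 16, video_width);       /* box width: full video width */
    return 0;
}
```

For a 720×576 video this stores a height of 57 and a width of 720, with
no need for the extra local copies.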

>> +        }
>>          avio_write(pb, track->enc->extradata, track->enc->extradata_size);
>> +    }
>>
>>      return update_size(pb, pos);
>>  }
>> @@ -1633,7 +1645,9 @@ static int mov_write_tkhd_tag(AVIOContext *pb, MOVTrack *track, AVStream *st)
>>          AVDictionaryEntry *rot = av_dict_get(st->metadata, "rotate", NULL, 0);
>>          rotation = (rot && rot->value) ? atoi(rot->value) : 0;
>>      }
>> -    if (rotation == 90) {
>> +    if (track->enc->codec_type == AVMEDIA_TYPE_SUBTITLE) {
>> +        write_matrix(pb,  1,  0,  0,  1, 0, (track->video_height * 9) / 10);
>> +    } else if (rotation == 90) {
>
> Looks unrelated.

No - as I said, you have to use the transformation matrix to position
the subtitle display area on screen. You could make the subtitle area
equal to the video stream dimensions and then explicitly position the
text box - which ought to yield the same net results, but it seems the
Apple player puts a tinted background on the subtitle area, so this
would end up covering the entire screen - not so hot.

> I must say, I would be much more at ease with a solution requiring
> human input: just ask the users to add "-s 576x72" to their encoding
> options.

Well, it means requiring a bunch more information. My proposal gives
you reasonable results in the vast majority of cases. I agree that
additional encoding options are probably necessary to give total
control, but I'd like something that doesn't require it. It's very
tedious if you're trying to do a simple remux to have to provide a
bunch of options that are easily guessable in the normal scenario.

--phil