[FFmpeg-devel] [PATCH] Use tkhd matrix for proper display in mov
John Schmiederer
jschmiederer
Wed Jul 16 17:48:20 CEST 2008
>-----Original Message-----
>From: ffmpeg-devel-bounces at mplayerhq.hu [mailto:ffmpeg-devel-
>bounces at mplayerhq.hu] On Behalf Of Baptiste Coudurier
>Sent: Tuesday, July 15, 2008 11:11 PM
>To: FFmpeg development discussions and patches
>Subject: Re: [FFmpeg-devel] [PATCH] Use tkhd matrix for proper display
>in mov
>
>Hi,
>
>Michael Niedermayer wrote:
>> On Thu, May 29, 2008 at 10:49:22AM -0400, John Schmiederer wrote:
>>>>>>>>>> On Tue, May 27, 2008 at 02:49:24PM -0400, John Schmiederer
>>>>>> wrote:
>>>>>>>>>>>>> Attached is a patch to account for the transformation
>>>>>>>>>>>>> matrix
>>>>>>>>>> contained in the tkhd atom for proper display width/height.
>>>>> [...]
>>>>>
>>>>>>>>> + //transform the display width/height according to the
>>>> matrix
>>>>>>>>> + if (width && height) {
>>>>>>>>> + for (i = 0; i < 2; i++)
>>>>>>>>> + disp_transform[i] =
>>>>>>>>> + (int64_t) width * display_matrix[0][i] +
>>>>>>>>> + (int64_t) height * display_matrix[1][i] +
>>>>>>>>> + display_matrix[2][i];
>>>>>>>> This is not what the original patch did.
>>>>>>> It may look a little different, but the functionality has not
>>>>>> changed.
>>>>>>
>>>>>> You removed the scaling by 1<<16 and 1<<30 the code is no longer
>>>>>> doing the same. The relative scaling of w*matrix[0] and matrix[2]
>>>>>> has changed They are added together so the result is more than
>just
>>>>>> wrong by a constant scale factor.
>>>>> Argh! You're right, the math is off.
>>>>> Although I maintain that the functionality didn't change from the
>>>>> first patch - that was wrong, too. =) I forgot that width and
>height
>>>>> were 16.16 fixed, so instead of multiplying by [width height 1] it
>>>>> should have always been [width height 1<<16]
>>>>>
>>>>> The 1<<16 vs 1<<30 scaling is no longer an issue though, as all the
>>>> 1<<30 scaled terms were in that last column of the display matrix
>that
>>>> I no longer read in or use (display_matrix[*][2], not
>>>> display_matrix[2][*]).
>>>>> So in this updated patch all the multiplied terms are in the same
>>>> scale.
>>>>
>>>> [...]
>>>>
>>>>> + //transform the display width/height according to the matrix
>>>>> + // to keep the same scale, use [width height 1<<16]
>>>>> + if (width && height) {
>>>>> + for (i = 0; i < 2; i++)
>>>>> + disp_transform[i] =
>>>>> + (int64_t) width * display_matrix[0][i] +
>>>>> + (int64_t) height * display_matrix[1][i] +
>>>>> + (int64_t) (display_matrix[2][i] << 16);
>>>> with that order of operations display_matrix[2][i] << 16 can
>overflow
>>> Oops - I misapplied those parentheses to fix a "parentheses around +
>or - inside shift" warning.
>>> Attached patch does it right this time.
>>>
>>>> also things can be vertically aligned like:
>>>> (int64_t) width * display_matrix[0][i] +
>>>> (int64_t) height * display_matrix[1][i] +
>>>>
>>>> looks more pretty ...
>>> Agreed.
>>>
>>>> anyway iam mostly ok with the patch after this, maybe baptiste has
>some
>>>> further comments though ...
>>> Thanks again for all the help.
>>
>> patch looks ok if someone tested it with a few mov files
>>
>
>Is it possible to have the exact quote from the specifications
>describing the correct interpretation of these values ?
>
>Is this valid in mov or/and isom ?
The tkhd matrix is defined in both the QuickTime (mov) documentation and the ISO standard, so I believe it is valid for both:
http://developer.apple.com/documentation/QuickTime/QTFF/QTFFChap2/chapter_3_section_3.html#//apple_ref/doc/uid/TP40000939-CH204-25550
In the ISO/IEC 14496-12 specification (Part 12: ISO base media file format), where the Track Header Box is defined (Section 8.5.1), the matrix is described as:
"matrix provides a transformation matrix for the video; (u,v,w) are restricted here to (0,0,1), hex
(0,0,0x40000000)."
It is later described how to use the matrix as:
"Matrix values which occur in the headers specify a transformation of video images for presentation. Not all
derived specifications use matrices; if they are not used, they shall be set to the identity matrix. If a matrix is used, the point (p,q) is transformed into (p', q') using the matrix as follows:
(p q 1) * | a b u | = (m n z)
          | c d v |
          | x y w |
m = ap + cq + x; n = bp + dq + y; z = up + vq + w;
p' = m/z; q' = n/z
The coordinates {p,q} are on the decompressed frame, and {p', q'} are at the rendering output. Therefore, for
example, the matrix {2,0,0, 0,2,0, 0,0,1} exactly doubles the pixel dimension of an image. The co-ordinates
transformed by the matrix are not normalized in any way, and represent actual sample locations. Therefore
{x,y} can, for example, be considered a translation vector for the image."
...
"All the values in a matrix are stored as 16.16 fixed-point values, except for u, v and w, which are stored as
2.30 fixed-point values.
The values in the matrix are stored in the order {a,b,u, c,d,v, x,y,w}."
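
To make the quoted rules concrete, here is a minimal sketch in C of how I read them. It is not the code from the patch. The helper names (fixed_16_16, fixed_2_30, tkhd_transform_point) are hypothetical, and it uses doubles for readability, whereas the patch stays in integer fixed-point arithmetic.

```c
#include <stdint.h>

/* Stored order per the quoted text: {a,b,u, c,d,v, x,y,w}.
 * a, b, c, d, x, y are 16.16 fixed point; u, v, w are 2.30 fixed point. */
static double fixed_16_16(int32_t v) { return v / 65536.0; }      /* 1 << 16 */
static double fixed_2_30(int32_t v)  { return v / 1073741824.0; } /* 1 << 30 */

/* Hypothetical helper: map the point (p,q) on the decompressed frame to
 * (p',q') at the rendering output. Returns 0 on success, -1 if z is 0. */
static int tkhd_transform_point(const int32_t stored[9],
                                double p, double q,
                                double *p_out, double *q_out)
{
    double a = fixed_16_16(stored[0]);
    double b = fixed_16_16(stored[1]);
    double u = fixed_2_30(stored[2]);
    double c = fixed_16_16(stored[3]);
    double d = fixed_16_16(stored[4]);
    double v = fixed_2_30(stored[5]);
    double x = fixed_16_16(stored[6]);
    double y = fixed_16_16(stored[7]);
    double w = fixed_2_30(stored[8]);

    /* m = ap + cq + x; n = bp + dq + y; z = up + vq + w */
    double m = a * p + c * q + x;
    double n = b * p + d * q + y;
    double z = u * p + v * q + w;

    if (z == 0.0)
        return -1; /* degenerate matrix, cannot divide */

    *p_out = m / z;
    *q_out = n / z;
    return 0;
}
```

With the spec's example matrix {2,0,0, 0,2,0, 0,0,1}, which is stored as {2<<16, 0, 0, 0, 2<<16, 0, 0, 0, 1<<30}, the point (320, 240) maps to (640, 480). Since (u,v,w) are restricted to (0,0,1) in the tkhd, z is always 1 there and the division does nothing.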
-John