[FFmpeg-devel] [PATCH] add signature filter for MPEG7 video signature

Mon Mar 21 15:00:52 CET 2016

On 21.03.16 at 14:15, Gerion Entrup wrote:
> On Monday, 21 March 2016 11:53:27 CET Thilo Borgmann wrote:
>> On 21.03.16 at 00:14, Gerion Entrup wrote:
>>> On Sunday, 20 March 2016 17:01:17 CET Thilo Borgmann wrote:
>>>>> On Sun, Mar 20, 2016 at 12:00:13PM +0100, Gerion Entrup wrote:
>>>> [...]
>>>>
>>>>> This filter does not implement all features of MPEG7. Missing features:
>>>>>
>>>>> - binary output
>>>>> - compression of signature files
>>>>
>>>> I assume these features are optional?
>>>
>>> Compression is optional (it can be set as a flag in the binary
>>> representation). I have not found out whether binary output is optional.
>>>
>>> It is definitely possible to work with the XML files only.
>>
>> Of course, but having an unspecified XML output is almost useless if binary
>> output is not optional. So I think it is crucial to know what the spec says
>> about output.
> The spec defines the XML output that my filter produces at the moment and
> additionally specifies a binary output.
>>
>>>>> - work only on (cropped) parts of the video
>>>>
>>>> How useful is this then? Does a fingerprint computed only on (cropped)
>>>> parts of the video have any value outside of FFmpeg itself? Does it
>>>> comply with the spec so that it can be compared with the output of any
>>>> other software generating it?
>>>
>>> To clarify, the filter does not crop anything. The standard defines an
>>> optional cropping to, I guess, concentrate on specific video parts (this
>>> is not implemented). Assuming someone is recording a monitor, then e.g.
>>> the unrelated part of the video could be cropped out. Besides that, the
>>> signature itself is invariant to cropping up to a certain limit.
>>>
>>> The cropping values (upper left and bottom right position) are specified
>>> in the XML, so other software could either crop the same way or compare
>>> only with the cropped input.
>>> (The fact that FFmpeg has a cropping filter would make such a feature
>>> somewhat redundant.)
>>
>> If I understand it correctly, the filter should not crop the image but only
>> use the pixel information within the specified area or the whole image.
>> Making it a filter option is useful, because the fingerprint of a part of
>> the image can be used in a filter chain continuing with the entire image
>> (no actual crop is required).
> Of course it would be great if the filter supported it. It needs some
> modification to the summed area table. Once the binary output is ready, I will
> try to do it.
> 
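For illustration, the reason a summed area table suits region-restricted sums can be sketched as follows. This is a minimal sketch, not the filter's actual code: the function names `summed_area_table` and `region_sum` are hypothetical, and the image is assumed to be a plain list of rows of integer pixel values.

```python
def summed_area_table(img):
    """Build a table where sat[y][x] is the sum of img over rows < y and columns < x."""
    h = len(img)
    w = len(img[0]) if h else 0
    # One extra row and column of zeros avoids special cases at the border.
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def region_sum(sat, x0, y0, x1, y1):
    """Sum of the pixels with x0 <= x < x1 and y0 <= y < y1, using four lookups."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
```

With such a table, restricting the computation to a sub-rectangle only changes the coordinates passed to `region_sum`; the image itself does not have to be cropped.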
>>
>>> The XML is standard compliant.
>>
>> So the XML output is compliant with the spec? Or is the XML itself just valid XML?
> The XML output is compliant with the spec. The whole format is specified there.

>>> The signature is not bitexact. 3-4 (ternary)
>>> values in the frame signature differ from the signature of the sample
>>> files, but the conformance tests [1] allow up to 15 ternary errors.
>>
>> Bitexact compared to what?
> The institute where I am writing the filter owns the sample files mentioned in
> the doc, together with the corresponding binary and XML signatures (so I could
> compare them).
> 
>> Does it allow up to 15 ternary errors when deciding that two inputs are
>> similar enough to be the same image, or does it state that the fingerprint
>> itself may differ by up to 15 ternary errors for the very same image?
> The 15 ternary errors are allowed between the sample signature and the
> signature of the reimplementation for the same sample.
> 
> But ok, you seem to be right, the binary representation seems to be necessary. 
> I quote (out of the linked doc):

> "The number of dimensions whose ternary values differ between the test and the 
> reference video signatures shall be less than or equal to 15 out of 380, if 
> the FrameConfidence values of both the test and the reference video signatures 
> are greater than or equal to 4. The ternary values of the frame signature 
> shall be decoded from the binary representation according to Table E.1."

This part of the spec seems to be dealing with comparing a reference image with
a test image. These two images differ, and as I read it, the spec says they are
identified as the same image if (FrameConfidence >= 4 && ndiff(sig_ref,
sig_test) <= 15).

That is about identifying images.
That is not about calculation of the signature.
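
To make the quoted criterion concrete, here is a minimal sketch. It assumes a frame signature is available as a sequence of 380 ternary values already decoded from the binary representation; the function names `ternary_distance` and `within_tolerance` are made up for illustration and do not come from the spec or the patch.

```python
def ternary_distance(sig_ref, sig_test):
    """Count the dimensions whose ternary values differ."""
    if len(sig_ref) != len(sig_test):
        raise ValueError("signatures must have the same number of dimensions")
    return sum(1 for a, b in zip(sig_ref, sig_test) if a != b)


def within_tolerance(sig_ref, conf_ref, sig_test, conf_test,
                     max_diff=15, min_conf=4):
    """Apply the quoted rule.

    Returns None when either FrameConfidence is below min_conf, because the
    quoted sentence only states a requirement for the case where both are
    greater than or equal to 4.
    """
    if conf_ref < min_conf or conf_test < min_conf:
        return None
    return ternary_distance(sig_ref, sig_test) <= max_diff
```

Note that a check like this only says whether two signatures are within the tolerance; it does not say whether a signature was calculated correctly.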

So again: Does your fingerprint filter produce the exact same fingerprint for
_the exact same image_ as the reference software?
(Input reference image -> filter -> exact same fingerprint as in the reference XML?)

If not, the difference has to be understood and your filter has to be updated to
match the reference fingerprint.
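
One possible starting point for understanding the difference is to list which dimensions diverge for a given frame. The helper below is only a sketch with a hypothetical name (`differing_dimensions`), again assuming both signatures are sequences of ternary values of equal length.

```python
def differing_dimensions(sig_ref, sig_test):
    """Return the indices of the dimensions where the two signatures disagree."""
    if len(sig_ref) != len(sig_test):
        raise ValueError("signatures must have the same number of dimensions")
    return [i for i, (a, b) in enumerate(zip(sig_ref, sig_test)) if a != b]


def is_bitexact(sig_ref, sig_test):
    """True only if every dimension matches."""
    return not differing_dimensions(sig_ref, sig_test)
```

If the same indices show up across many frames, that may point to a systematic deviation in how those particular dimensions are computed rather than to rounding noise.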

-Thilo