[FFmpeg-devel] [PATCH v4] lavf/h264: add support for h264 video from Arecont camera, fixes ticket #5154

Sat May 11 10:43:38 EEST 2019

On 11-05-2019 04:27, Reimar Döffinger wrote:

> On Thu, May 09, 2019 at 11:45:59PM +0530, Shivam Goyal wrote: 
> 
>> @@ -117,4 +120,128 @@ static int h264_probe(const AVProbeData *p)
>> return 0;
>> }
>> 
>> +static const uint8_t arecont_sign[] = {0x2D, 0x2D, 0x66, 0x62, 0x64, 0x72, 0x0D, 0x0A};
> 
> I admit I was more thinking of either
> static const uint8_t arecont_sign[] = "--fbdr\r\n";
> which ends up 1 byte too long, or
> static const uint8_t arecont_sign[8] = "--fbdr\r\n";
> 
> Though there is also the option to go for
> static const uint64_t arecont_sign = AV_RL64("--fbdr\r\n");
> or similar.

   Okay, I will change it. 

>> +static int arecont_find_sign(unsigned char *data, int size)
> 
> You should be consistent with the types even if they are essentially
> the same.
> i.e. uint8_t instead of unsigned char *
> Also const, since this function does/should not modify "data".

   Will change it too, 

>> +    int sign_size = sizeof(arecont_sign) / sizeof(arecont_sign[0]);
> 
> First, this is the expression of number of elements, so the
> division part is semantically wrong.
> Also it's pointless because it will be 1 always.

    Will use strlen() as i am changing the above. 

>> +    j = memchr(data, arecont_sign[0], size);
>> +    while (j != NULL && size - sign_size >= (j - data)) {
>> +        if (!memcmp(arecont_sign, j, sign_size))
>> +            return (j - data);
>> +        if (size - sign_size == (j - data))
>> +            return -1;
>> +        j = memchr(data + (j - data) + 1, arecont_sign[0], size - (j - data));
>> +    }
> 
> I know I brought this memchr up, but did you do any benchmarks?
> Unless you have actual evidence of a speed problem AND can
> show that your solution actually makes it faster, I'd
> suggest to go with the simplest possible solution.

 Yeah, I checked the both approaches and perform test to compare time.
and found the previous one is good for shorter 
 strings (length < 300). and the newer version is good for strings
larger than this. The newer version is 4 microseconds faster 
 in our case(length ~ 1100). As the diff is small, so i think the
simpler version is better. 

>> +    data = av_malloc(size);
>> +    ret = avio_read_partial(s->pb, data, size);
>> +    if (ret < 0) {
>> +        av_free(data);
>> +        av_packet_unref(pkt);
>> +        return ret;
>> +    }
>> +    if (pos <= ARECONT_H264_MIME_SIZE) {
>> +        avio_seek(s->pb, 0, SEEK_SET);
>> +        start = pos;
>> +    } else {
>> +        avio_seek(s->pb, pos - ARECONT_H264_MIME_SIZE, SEEK_SET);
>> +        start = ARECONT_H264_MIME_SIZE;
>> +    }
>> +    ret = avio_read_partial(s->pb, data, size);
>> +    if (ret < 0 || start >= ret) {
>> +        av_free(data);
>> +        av_packet_unref(pkt);
>> +        return ret;
>> +    }
> 
> Unfortunately I still have no idea what that code is meant to do.
> First there is no point in allocating "data" when you have
> pkt->data already (yes, it would mean using memmove instead
> of memcpy later on, but that seems to be about it).
> Then the code reads TWICE into the same buffer for some reason,
> the first read seems completely pointless?
> Also the only point of the _partial variant of the read function
> is reducing latency, however the avio_seek is likely to be
> quite a bad hit for latency that this really seems like
> premature optimization.
> Also the avio_seek means this demuxer might not work at all
> if the stream is not seekable (might since I don't know
> if we maybe do enough buffering nowadays - but if the code
> relies on that there should be a check).

    What i did in the above code is that as the packet can contain
partial http header at the start or end of the pkt, which needs to be
removed, i seeked backward 100 bytes( or at the start of the file if
necessory) and the i did the same to remove the partial http header at
the end, i seeked forward 100 bytes extra. 

                  start                      end 
stream..............^.........................^.................... 
             |_____________|             |_________| 
                    |                         | 
              http header                http header 

I first filled the "data" and then searched for the http header, copied
the bytes before position of http header from "data" to pkt->data. And
then skipped the http header and again searched for another http header
in "data" and did the same thing. Also, the code is reading twice
because the first is just if any errors come( like file ended). because
after that the avio_seek() seek stream backwards and if it would do,
then error can't be detected. 

Yeah, i realized that the avio_seek() is not a good idea. One another
approach i thought is that to store 100 bytes (or 70) of the current
iteration for the next iteration. In this way we don't have to use
avio_seek.also there would not be any need of reading twice. Please
suggest if it is right. (Sorry, if i am asking too much doubts). 

>> +        if ((j = arecont_find_sign(data + i, ret - i)) >= 0) {
> 
> Very personal dislike but:
> please just don't put assignments into the if.

   Okay, would change this. 

>> +            k = 0;
>> +            for (w = j + sign_size; w + 1 < ret; w++) {
> 
> So what exactly happens when you actually end up hitting
> that loop end condition?
> It seems to me you then just leave the whole thing in,
> even though you should have removed it?

 When the http header is not found on the "data" then the loop ends, or
if it is found then the bytes before the position of http-header are
copied to pkt->data and then the http-header is skipped and loop again
runs. 

>> +        } else
>> +            break;
> 
> Inverting the condition and putting the 1-line case in the
> "if" instead of the "else" is much better for readability.
> In this case it also saves you 1 indentation level.

  Okay, would change this too. 

> On the general approach:
> you are scanning a whole, potentially many, many GB large video
> file for a 8-character string.
> A false positive has a REALLY high likelihood.
> I think this needs to be changed into a more clever approach,
> that actually knows where these strings can appear and removes
> them in a more targeted way...

Yeah this would be slow for large files, I also tried finding out if
there is any relation between positions of the http-header but didn't
found.

  Thanks for the review 
  Shivam Goyal