[FFmpeg-devel] v4l2: bug #1570 and possible solution

Wed Feb 13 18:05:52 CET 2013

2013/2/13 Michael Niedermayer <michaelni at gmx.at>:
> On Wed, Feb 13, 2013 at 04:53:43PM +0100, Michael Niedermayer wrote:
>> On Wed, Feb 13, 2013 at 03:19:00PM +0100, Giorgio Vazzana wrote:
>> > 2013/2/13 Michael Niedermayer <michaelni at gmx.at>:
>> > > On Tue, Feb 12, 2013 at 07:53:29PM +0100, Giorgio Vazzana wrote:
>> > >> 2013/2/12 Michael Niedermayer <michaelni at gmx.at>:
>> > >> > On Tue, Feb 12, 2013 at 06:01:10PM +0100, Giorgio Vazzana wrote:
>> > >> >> 2013/2/12 Michael Niedermayer <michaelni at gmx.at>:
>> > >> >> > On Tue, Feb 12, 2013 at 12:40:21PM +0100, Giorgio Vazzana wrote:
>> > >> >> >> 2013/2/12 Michael Niedermayer <michaelni at gmx.at>:
>> > >> >> >> +static void mmap_release_buffer(AVPacket *pkt)
>> > >> >> >> +{
>> > >> >> >> +    struct buff_data *buf_descriptor = pkt->priv;
>> > >> >> >> +
>> > >> >> >> +    if (pkt->data == NULL)
>> > >> >> >> +        return;
>> > >> >> >> +
>> > >> >> >> +    if (buf_descriptor->buffer_copied) {
>> > >> >> >> +        av_free(pkt->data);
>> > >> >> >> +    } else {
>> > >> >> >> +        if (!enqueue_buffer(buf_descriptor->fd, buf_descriptor->index))
>> > >> >> >
>> > >> >> >> +            (*buf_descriptor->buffers_dequeued)--;
>> > >> >> >
>> > >> >> > the deallocation of packets could happen from different thread(s)
>> > >> >> > so i think this needs either a mutex, an atomic decrement or some
>> > >> >> > lockless algorithm to achive the same
>> > >> >>
>> > >> >> Ok, I've tried to implement the solution using the mutex. Please comment.
>> > >> >>
>> > >> >> We can use this approach, or the easiest (and slightly slower?)
>> > >> >> solution would be copying every frame/buffer to the memory allocated
>> > >> >> for the corresponding packet.
>> > >> >
>> > >> > if you want to avoid the dependancy on pthreads, one way that should
>> > >> > work is to use an array which size matches the maximum number of
>> > >> > buffers, inited to 0 each time a buffer is allocated its entry is set
>> > >> > to 1, when its freed its set to 0. to find out how many buffers are
>> > >> > left the whole array would need to be scanned and remaining 0 elements
>> > >> > counted.
>> > >>
>> > >> I will have to think about this in the following days, but for now I
>> > >> do not see how this can work: suppose we consider the following
>> > >> example:
>> > >>
>> > >> Initialization:
>> > >> // let's suppose the number of buffers allocated by the v4l2
>> > >> // driver is 10. This is also the lenght of the incoming and
>> > >> // outgoing queues, which are organized as FIFO
>> > >> buffers_obtained = 10;
>> > >>
>> > >> // allocate and clear array
>> > >> int *array = calloc(buffers_obtained, sizeof(*array));
>> > >>
>> > >> // enqueue all the buffers to the incoming FIFO and start capturing
>> > >> start_streaming();
>> > >>
>> > >>
>> > >
>> > >> Thread0 will do this:
>> > >> // when we want to read a frame, we dequeue a buffer from the
>> > >> // outgoing FIFO
>> > >> dequeue_buffer(&buf);
>> > >> // signal that a particular buffer has been dequeued
>> > >> array[buf.index] = 1;
>> > >>
>> > >>
>> > >> Thread1 will do this:
>> > >> // count the numbers of buffers still in the incoming FIFO
>> > >> int i, count = 0;
>> > >> for (i = 0; i < buffers_obtained; i++)
>> > >>     if (array[i] == 0)
>> > >>         count++;
>> > >
>> > > This is not possible
>> > > you imply that v4l2s read packet is itself called from 2 threads at
>> > > the same time
>> > > The only thing that can allocate packets/dequeue buffers is v4l2s
>> > > read packet
>> > > The only thing that has to count the available buffers is v4l2s read
>> > > packet (that is for choosing where to allocate the next packet)
>> >
>> > Ok, I was thinking at the more general case. Let's suppose only one
>> > thread can call v4l2_read_packet().
>> >
>> > It is possible I have not understood correctly the technique you
>> > propose, so I'm sorry if I'm asking what might be a simple question.
>> > Please look at this example:
>> >
>> > Only one thread, Thread0, can do this:
>> > // to read a frame, we dequeue a buffer from the outgoing FIFO
>> > dequeue_buffer(&buf);
>> > // signal that a particular buffer has been dequeued
>> > array[buf.index] = 1;
>> >
>> > // count the numbers of buffers still in the incoming FIFO
>> > count = 0;
>> > for (i = 0; i < buffers_obtained; i++)
>> >     if (array[i] == 0)
>> >         count++;
>> >
>> > if (count == 0) {
>> >     copy_bufferdata_to_packet();
>> >     enqueue_buffer(buf.index);
>> >     array[buf.index] = 0;
>> > } else {
>> >     copy_only_pointer_to_bufferdata_to_packet();
>> > }
>> >
>> >
>> > Another thread, or set of threads, will call the packet destructor:
>> > if (bufferdata_was_copied_to_packet) {
>> >     free(pkt->data);
>> > } else {
>> >     // pkt->data points to the buffer memory region, we need to
>> >     // enqueue this buffer
>> >     enqueue_buffer(pkt->priv->index);
>> >     // signal that the buffer has been enqueued
>> >     array[index] = 0;
>> > }
>> >
>> > I still see at least two threads writing on array, so to avoid a race
>> > condition we still need a semaphore/mutex. If my reasoning is wrong,
>> > could you propose an example that shows how this can be implemented
>> > without a mutex, or point me to a place when I can read about it?
>> > Thanks, your help is much appreciated.
>>
>> consider the buffer at index i
>>
>> Initially its "owned" by v4l2, no other thread can interfere here
>> because nothing else has this buffer and any other buffers use a
>> different index than i
>>
>> now v4l2 sets up a AVPacket with this buffer and sets the array[i] = 1
>> still nothing else knows about this packet yet so no interference
>> possible.
>> the AVPacket is then returned from read_packet
>>
>> now from the point of view of the read packet function as long as
>> array[i] == 1 the buffer is in use and wont be (re)used in any way
>> the only thing that can set it to 0 is the packets destructor which
>> is only called once and only once the buffer is no longer in use.
>> so if array[i] == 0 in read_packet then this implies that the
>> destructor has been called and the buffer is free to be reused
>> it also implies there is nothing else left that could temper with it
>> anymore except read_packet itself

Okay, thanks for the explanation, it's a clever technique but I think
I understand how it works now, I can try to implement it in the
following days.

> one thing i forgot, if you try to implement this you need to be
> carefull with deallocating the array as the destruct callbacks if any
> is left can still access it.

This should not be an issue, if we deallocate the array in v4l2_read_close().
Note that the current code has this problem too: the user/application
needs to call av_free_packet() (which will in turn call the packet
destructor) on all packets before calling v4l2_read_close(), otherwise
pkt->data will be invalid and a call to the packet destructor will use
a closed fd.
To solve this issue we could set fd=-1 on close, add a check for fd>0
in the destruct callback.