[FFmpeg-devel] [PATCH][VAAPI][2/6] Add common data structures and helpers (take 3)

Gwenole Beauchesne gbeauchesne
Mon Mar 9 11:56:18 CET 2009


On Sun, 8 Mar 2009, Michael Niedermayer wrote:

>>> realloc() returning NULL means the original is still there just that
>>> it failed re allocating it
>>
>> 7.20.3.4 indeed confirms what you say but I couldn't find a single
>> code in lavc operating that way:
>> new_buffer = av_*_realloc(buffer, new_size);
>> if (!new_buffer) {
>>      av_freep(&buffer);
>>      // do whatever else/return -1
>> }
>> buffer = new_buffer;
>>
>> would this be what you had in mind?
>
> yes
>
> also it might make sense to add a
> av_realloc_and_free()
> that does free the original in case of fail and replace all code that
> expects these semantics rather ...

Why not make that (freeing the original on failure) the default behaviour
of av_realloc(), though a user could override that function? I mean, it's
generally used as buffer = av_realloc(buffer, new_size);
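
A minimal sketch of the "free the original on failure" semantics discussed here, written against plain realloc()/free() rather than the av_* wrappers. The name realloc_or_free() and its pointer-to-pointer signature are assumptions made for illustration, not an existing libavutil function.

```c
#include <stdlib.h>

/* Hypothetical helper: resize *ptr to new_size bytes.
 * On failure the original block is freed and *ptr is set to NULL, so the
 * caller can neither leak it nor keep using a stale pointer.
 * A new_size of 0 is treated as a request to free the block.
 * Returns 0 on success, -1 on allocation failure. */
int realloc_or_free(void **ptr, size_t new_size)
{
    void *new_ptr;

    if (new_size == 0) {
        /* Avoid realloc(p, 0), whose behaviour is implementation-defined. */
        free(*ptr);
        *ptr = NULL;
        return 0;
    }

    new_ptr = realloc(*ptr, new_size);
    if (!new_ptr) {
        /* realloc() failed: the old block is still valid, so release it. */
        free(*ptr);
        *ptr = NULL;
        return -1;
    }

    *ptr = new_ptr;
    return 0;
}
```

With such a helper, the usual pattern would become "if (realloc_or_free(&buffer, new_size) < 0) return -1;", with no separate temporary needed at each call site.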

>>> also glibc memcpy() is shit, even more so for copying ito non system
>>> memory
>>> you should maybe look at mplayer which has some memcpy written for
>>> that.
>>
>> Hmmm, I think this statement no longer holds for some years now. ;-)
>> Even Agner's doesn't bring that much, if any performance gain.
>> Besides, system libcs generally provide the best memcpy() tuned for
>> the underlying processor and memory hierarchy (caches geometry et
>> al.). This is true for Apple's (commpage provided functions) and even
>> glibc, though depending on several factors (distributor, architecture).
>
> glibc and "best" in the same paragraph makes me want to puke
>
> anyway, actual numbers: (done 3x to show that they are stable)
>
> k7 : cpu clocks=170059006 = 98361us  (1016.663fps)  1525.0MB/s
> mmx: cpu clocks=293085663 = 169516us  (589.915fps)  884.9MB/s
> sse: cpu clocks=170377116 = 98544us  (1014.775fps)  1522.2MB/s
> c: cpu clocks=195054405 = 112817us  (886.391fps)  1329.6MB/s

Good numbers, but:
1) Those are for an aligned case.
2) They are for a large buffer (1.5 MB), it seems.
3) Slices are generally around 20 KB for 720 H.264 and around 60 KB for
1080 H.264 in the sample streams I have around, and the source buffer is
aligned on 2-byte boundaries or not at all.
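
To make the three points above concrete, here is a rough sketch of how one might time a copy routine on slice-sized buffers with a deliberately misaligned source. It is not the harness used for the numbers in this mail; bench_copy(), its parameters, and the use of clock() are assumptions chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical helper: copy `size` bytes `iterations` times with `fn`,
 * reading from a source placed `src_offset` bytes past a 16-byte-aligned
 * address. Returns throughput in MB/s, 0.0 if the run was too short for
 * clock() to resolve, or -1.0 on allocation failure or a wrong copy. */
double bench_copy(copy_fn fn, size_t size, size_t src_offset,
                  unsigned iterations)
{
    /* volatile keeps the compiler from inlining or eliding the calls */
    copy_fn volatile copy = fn;
    unsigned char *src_base = malloc(size + src_offset + 15);
    unsigned char *dst = malloc(size);
    unsigned char *src;
    clock_t start, end;
    double secs, result;
    size_t i;
    unsigned n;

    if (!src_base || !dst) {
        free(src_base);
        free(dst);
        return -1.0;
    }

    /* Round up to a 16-byte boundary, then apply the requested offset. */
    src = (unsigned char *)(((uintptr_t)src_base + 15) & ~(uintptr_t)15);
    src += src_offset;
    for (i = 0; i < size; i++)
        src[i] = (unsigned char)(i * 31u);

    start = clock();
    for (n = 0; n < iterations; n++)
        copy(dst, src, size);
    end = clock();

    secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (memcmp(dst, src, size) != 0)
        result = -1.0;                 /* the routine copied incorrectly */
    else if (secs <= 0.0)
        result = 0.0;                  /* too fast to measure */
    else
        result = (double)size * iterations / secs / 1e6;

    free(src_base);
    free(dst);
    return result;
}
```

For example, bench_copy(memcpy, 20 * 1024, 2, 100000) would approximate the 20 KB slice with a 2-byte-aligned source, and an offset of 1 the fully unaligned case. clock() measures CPU time at coarse resolution, so the iteration count needs to be large for the result to mean anything.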

For the case where source and destination are mutually aligned on 8-byte
boundaries, we have (K8 in 32-bit mode):

(fast_memcpy/sse2)
16384           0.09            10204.49        0.95
16384           0.09            10210.21        1.00
24576           0.09            10539.97        0.97
24576           0.09            10660.10        0.99
32768           0.09            10705.95        1.00
32768           0.09            10687.72        1.00
49152           0.09            10482.79        1.02
49152           0.09            10566.90        0.99
65536           0.09            10539.16        1.00
65536           0.09            10546.25        1.00
98304           0.18            5264.49         2.00
98304           0.18            5266.05         1.00
131072          0.18            5281.90         1.00
131072          0.18            5282.21         1.00

(memcpy/mmx)
16384           0.04            21997.10        1.00
16384           0.04            21997.37        1.00
24576           0.04            21870.70        1.01
24576           0.04            21870.79        1.00
32768           0.04            21374.74        1.02
32768           0.04            21380.27        1.00
49152           0.09            10498.95        2.04
49152           0.09            10499.11        1.00
65536           0.11            8829.70         1.19
65536           0.11            8829.66         1.00
98304           0.11            8818.30         1.00
98304           0.11            8818.50         1.00
131072          0.11            8811.01         1.00
131072          0.11            8811.13         1.00

(memcpy/libc)
16384           0.05            19301.84        1.01
16384           0.05            19301.02        1.00
24576           0.05            19479.85        0.99
24576           0.05            19479.76        1.00
32768           0.05            19214.58        1.01
32768           0.05            19215.24        1.00
49152           0.11            8563.23         2.24
49152           0.11            8563.03         1.00
65536           0.13            7217.21         1.19
65536           0.13            7217.20         1.00
98304           0.13            7201.32         1.00
98304           0.13            7203.47         1.00
131072          0.13            7197.95         1.00
131072          0.13            7197.79         1.00

(memcpy/agner)
16384           0.04            22421.78        1.00
16384           0.04            22422.20        1.00
24576           0.04            22422.01        1.00
24576           0.04            22421.31        1.00
32768           0.04            22418.09        1.00
32768           0.04            22418.23        1.00
49152           0.08            12430.12        1.80
49152           0.08            12430.02        1.00
65536           0.09            10604.25        1.17
65536           0.09            10601.27        1.00
98304           0.09            10619.99        1.00
98304           0.09            10620.01        1.00
131072          0.09            10627.03        1.00
131072          0.09            10627.05        1.00

Agner's is indeed the best, then fast_memcpy/mmx (the very old one), then 
libc, then fast_memcpy/sse2.



