[FFmpeg-devel] [PATCH] BGR24 Huffyuv and drive-by bug fixes

Loren Merritt lorenm
Sat Oct 17 19:12:23 CEST 2009


On Sat, 17 Oct 2009, Alexander Strange wrote:
> On Oct 15, 2009, at 5:53 PM, Michael Niedermayer wrote:
>
>> it might be as fast to handle 3 byte groups in C but it's not clear how SIMD
>> would behave with that
>
> I don't think it applies here.
>
> The decoder profile looks like:
>
> 	73.5%	73.5%	ffmpeg_g	decode_bgr_bitstream
> 	8.1%	8.1%	ffmpeg_g	add_hfyu_left_prediction_bgr24_c
> 	1.1%	1.1%	ffmpeg_g	bswap_buf
> 	1.1%	1.1%	ffmpeg_g	add_bytes_mmx
>
> so it's entirely VLC-lookup bound (on angels_480-huffyuvcompress.avi/x86-64).
> The first two functions already can't be easily SIMDed, and the second two 
> work just as well in either case.

add_hfyu_left_prediction_bgr32 can be SIMDed. Just use the low half of an 
MMX register to add 3 samples at a time.
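To make the bgr32 idea concrete, here is a portable C sketch, not FFmpeg's code. It assumes left prediction means dst[i] = dst[i-1] + src[i] per channel, mod 256. The names left_pred_bgr32_swar and add_bytes_u32 are hypothetical. add_bytes_u32 emulates paddb on a 32-bit lane (four byte adds, no carry between bytes), so one add updates B, G and R together, as the low half of an MMX register would. The pad byte just rides along.

```c
#include <stdint.h>
#include <string.h>

/* Emulates MMX paddb on 32 bits: four independent mod-256 byte adds.
 * The low 7 bits of each byte are added normally (they cannot overflow
 * the byte); the top bit of each byte is then fixed up with XOR. */
static uint32_t add_bytes_u32(uint32_t a, uint32_t b)
{
    const uint32_t H = 0x80808080u; /* top bit of each byte */
    const uint32_t L = 0x7f7f7f7fu; /* low 7 bits of each byte */
    return ((a & L) + (b & L)) ^ ((a ^ b) & H);
}

/* Hypothetical BGR32 left prediction. Each 4-byte pixel is one 32-bit
 * word, so a single packed add updates all channels at once.
 * *prev holds the last reconstructed pixel and is updated on return. */
static void left_pred_bgr32_swar(uint8_t *dst, const uint8_t *src, int w,
                                 uint32_t *prev)
{
    uint32_t acc = *prev;
    for (int i = 0; i < w; i++) {
        uint32_t px;
        memcpy(&px, src + 4 * i, 4);   /* load one whole pixel */
        acc = add_bytes_u32(acc, px);  /* add residuals to running sums */
        memcpy(dst + 4 * i, &acc, 4);  /* store reconstructed pixel */
    }
    *prev = acc;
}
```

Because the loads and stores go through memcpy and the add never carries across byte lanes, the result does not depend on host endianness.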
The same works for bgr24, except then the data is unaligned, so the 
load/stores are slower. Shuffles may work better, and are still 
conceptually simple, but are more annoying to write.
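For bgr24 the only change is the stride. Pixel i starts at byte 3*i, so most pixels are not 4-byte aligned. The sketch below makes that explicit with 3-byte memcpy loads and stores into a 32-bit accumulator. It is a hypothetical illustration (left_pred_bgr24_swar is not an FFmpeg function) and does not attempt the shuffle-based variant.

```c
#include <stdint.h>
#include <string.h>

/* Same paddb emulation as for bgr32: four independent mod-256 byte adds. */
static uint32_t add_bytes_u32(uint32_t a, uint32_t b)
{
    const uint32_t H = 0x80808080u;
    const uint32_t L = 0x7f7f7f7fu;
    return ((a & L) + (b & L)) ^ ((a ^ b) & H);
}

/* Hypothetical BGR24 left prediction on packed 3-byte pixels.
 * Only 3 bytes are loaded/stored per pixel (an unaligned access in
 * general); the fourth byte lane of px is always 0, so it never
 * disturbs the three channel sums. *prev is updated on return. */
static void left_pred_bgr24_swar(uint8_t *dst, const uint8_t *src, int w,
                                 uint32_t *prev)
{
    uint32_t acc = *prev;
    for (int i = 0; i < w; i++) {
        uint32_t px = 0;
        memcpy(&px, src + 3 * i, 3);   /* unaligned 3-byte load */
        acc = add_bytes_u32(acc, px);
        memcpy(dst + 3 * i, &acc, 3);  /* unaligned 3-byte store */
    }
    *prev = acc;
}
```

In real MMX/SSE code the per-pixel 3-byte accesses become wider unaligned loads (or aligned loads plus shuffles), which is where the extra cost and the extra annoyance come from.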

Hmm, even the YUV version can be done with a log-depth addition tree. 
Dunno if that's faster than C.
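A minimal sketch of the log-depth tree for a single plane, assuming the same dst[i] = dst[i-1] + src[i] recurrence. Eight samples share one 64-bit word. Three shift-plus-bytewise-add steps (the MMX analogue would be psllq followed by paddb) leave each lane holding the sum of itself and every lane below it. The running value from the previous block is then broadcast and added to all lanes. left_pred_tree is a hypothetical name. Packing and unpacking here are scalar loops for portability, so this version says nothing about whether the SIMD form beats plain C.

```c
#include <stdint.h>

/* Byte-wise add of two 64-bit words: eight independent mod-256 adds. */
static uint64_t add_bytes_u64(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8080808080808080ull;
    const uint64_t L = 0x7f7f7f7f7f7f7f7full;
    return ((a & L) + (b & L)) ^ ((a ^ b) & H);
}

/* Hypothetical single-plane left prediction using a log-depth prefix sum
 * over 8 samples per step. Returns the last reconstructed sample. */
static uint8_t left_pred_tree(uint8_t *dst, const uint8_t *src, int w,
                              uint8_t acc)
{
    int i = 0;
    for (; i + 8 <= w; i += 8) {
        uint64_t x = 0;
        for (int k = 0; k < 8; k++)          /* lane k = sample i+k */
            x |= (uint64_t)src[i + k] << (8 * k);
        x = add_bytes_u64(x, x << 8);        /* lane k: sum of 2 samples */
        x = add_bytes_u64(x, x << 16);       /* lane k: sum of 4 samples */
        x = add_bytes_u64(x, x << 32);       /* lane k: sum of 8 samples */
        /* add the carried-in value to every lane */
        x = add_bytes_u64(x, (uint64_t)acc * 0x0101010101010101ull);
        for (int k = 0; k < 8; k++)
            dst[i + k] = (uint8_t)(x >> (8 * k));
        acc = dst[i + 7];
    }
    for (; i < w; i++)                       /* scalar tail */
        dst[i] = acc = (uint8_t)(acc + src[i]);
    return acc;
}
```

The dependency chain per 8 samples is three adds plus the carry-in instead of eight serial adds; whether that wins in practice depends on the shift/add throughput versus the scalar loop.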

--Loren Merritt


