[FFmpeg-devel] [RFC] DXVA2 decoding and FFmpeg

Stefano Sabatini stefasab at gmail.com
Thu May 14 13:01:51 CEST 2015

On date Tuesday 2015-05-12 15:54:17 +0200, Hendrik Leppkes encoded:
> On Tue, May 12, 2015 at 3:33 PM, Stefano Sabatini <stefasab at gmail.com> wrote:
> > Are there some cases where DXVA2 (or HW decoding in general) can be
> > used effectively in ffmpeg? Can you tell if there is something which
> > could be improved in the current ffmpeg_dxva2.c implementation? (My
> > guess is that this code is somehow based on the VLC code).
> It's not based on the VLC code; it's roughly based on code from my own
> project that uses ffmpeg for DXVA2, but really, the workflow is going
> to be pretty similar in any implementation either way, since the MS
> API dictates that, more or less.
> DXVA2 decoding can be faster than software decoding, depending on your hardware.
> If you use a low-end Intel CPU, say a Pentium or i3 (Ivy or Haswell),
> or use a recent NVIDIA GPU (Kepler or Maxwell), then DXVA2 decoding on
> the GPU can potentially give you ~400 fps for 1080p, while the CPU
> will likely not manage that.
> On a high-end CPU, the software decoder can potentially exceed that, however.
> One limitation is, as the manual says, that the decoded data needs to be
> copied from the GPU to system memory. ffmpeg_dxva2.c does not implement an
> optimized copy function for this; it uses plain old memcpy.
> Intel introduced a new instruction for this in SSE4.1, MOVNTDQA, which
> is optimized for copying from USWC memory (Uncacheable Speculative
> Write Combining) to system memory. Using this may help speed up the
> process significantly, and VLC probably uses it.
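
For illustration, a copy routine built around MOVNTDQA might look roughly
like the sketch below. This is not the code from ffmpeg_dxva2.c or VLC: the
names copy_stream_load and copy_from_uswc are hypothetical, and the sketch
assumes GCC or Clang. It falls back to memcpy when SSE4.1 is unavailable or
when either pointer is not 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h>  /* SSE4.1: _mm_stream_load_si128 maps to MOVNTDQA */
#define HAVE_STREAM_LOAD 1

/* Hypothetical helper: copies size bytes using streaming loads.
 * Both pointers must be 16-byte aligned. */
__attribute__((target("sse4.1")))
static void copy_stream_load(uint8_t *dst, const uint8_t *src, size_t size)
{
    size_t i = 0;

    assert(((uintptr_t)src & 15) == 0 && ((uintptr_t)dst & 15) == 0);

    /* Read one 64-byte cache line per iteration as four 16-byte loads. */
    for (; i + 64 <= size; i += 64) {
        __m128i a = _mm_stream_load_si128((__m128i *)(src + i));
        __m128i b = _mm_stream_load_si128((__m128i *)(src + i + 16));
        __m128i c = _mm_stream_load_si128((__m128i *)(src + i + 32));
        __m128i d = _mm_stream_load_si128((__m128i *)(src + i + 48));
        _mm_store_si128((__m128i *)(dst + i),      a);
        _mm_store_si128((__m128i *)(dst + i + 16), b);
        _mm_store_si128((__m128i *)(dst + i + 32), c);
        _mm_store_si128((__m128i *)(dst + i + 48), d);
    }

    /* Copy the remaining tail (fewer than 64 bytes) the ordinary way. */
    if (i < size)
        memcpy(dst + i, src + i, size - i);
}
#endif

/* Hypothetical entry point: uses streaming loads when possible,
 * plain memcpy otherwise. */
void copy_from_uswc(uint8_t *dst, const uint8_t *src, size_t size)
{
#ifdef HAVE_STREAM_LOAD
    if (__builtin_cpu_supports("sse4.1") &&
        ((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0) {
        copy_stream_load(dst, src, size);
        return;
    }
#endif
    memcpy(dst, src, size);
}
```

On ordinary cacheable memory the streaming load behaves like a normal load,
so any gain would only show up when src points at a locked USWC surface.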

Now the question is, how would it be possible to optimize the GPU-to-CPU
copy to get an overall performance gain? At least VLC seems able to get
better performance when using HW decoding, but I'm not sure it is
copying decoded data back to the CPU (indeed it may perform direct
> The original primary goal of this code was, however, to make it much easier
> to test and debug the hwaccels, and not directly to provide a
> playback/transcoding feature, so such optimizations were not performed
> for brevity.

FFmpeg = Fanciful & Faithless Merciless Powerful EntanGlement
