[FFmpeg-user] FFmpeg single threaded bottleneck

Dennis Mungai dmngaie at gmail.com
Thu May 14 14:01:36 EEST 2020

On Thu, May 14, 2020, 12:33 Gabriel Balaich <roderrooder at gmail.com> wrote:

> Thanks for the feedback.
> On Thu, 14 May 2020 at 01:52, Edward Park <kumowoon1025 at gmail.com> wrote:
> > Hi,
> >
> > Some values don't look right, try getting rid of them.
> > -thread_queue_size 9999 seems arbitrary, it is queue length, not bytes
> I was getting the error "Thread message queue blocking; consider raising
> the thread_queue_size option" when I left -thread_queue_size at default,
> the reason I set it at 9999 is that that is the max it will let me set it
> before the command just errors out. When I remove "-thread_queue_size 9999"
> the errors come back and I drop a massive amount of frames even when doing
> a single 4k60 input / output.
> > -indexmem 9999 seems arbitrary, pretty sure default value is bigger
> >
> -indexmem is one of the magical options I never really understood, I added
> it at some point (over 2 years ago at least) hoping it would solve this
> issue. I can't seem to find any information on what the default is, and
> when I remove it from the command it doesn't change the results. That being
> said any single option's relevancy in regards to my commands, at least as
> far as I can tell, is pretty low considering that everything works just
> fine with every option I have when I'm running each input(s) / output in
> its own instance of FFmpeg yet simultaneously.
> > -rtbufsize 2147.48M is kind of abusive, especially for the audio inputs
> >
> > I don't think you should be trying to buffer more, if the buffer keeps
> > growing then it won't last.
> >
> I couldn't try more buffer if I wanted to, 2147.48M (max INT) is the
> maximum buffer size allowed. But even then it only overfills if the
> hardware can't keep up, which is only shown to be the case when transcoding
> over 9K60 worth of video in a single FFmpeg instance.
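
To put a number on the point above about the buffer not lasting: a rough sketch of how quickly a 2147.48M real-time buffer fills once the pipeline stops keeping up. The capture pixel formats are not stated in this thread, so the 2 bytes/pixel (YUY2-style 4:2:2) and 1.5 bytes/pixel (NV12-style 4:2:0) figures are assumptions, and the helper names are made up for illustration.

```python
# -rtbufsize 2147.48M: FFmpeg's "M" suffix means 10**6, so this is just under INT_MAX.
RTBUF_BYTES = 2_147_480_000


def raw_rate_bytes_per_sec(width, height, fps, bytes_per_pixel):
    """Uncompressed video data rate for one capture input."""
    return width * height * bytes_per_pixel * fps


def seconds_until_full(buffer_bytes, rate_bytes_per_sec, backlog_fraction=1.0):
    """Time until the buffer fills if `backlog_fraction` of the input is not consumed."""
    return buffer_bytes / (rate_bytes_per_sec * backlog_fraction)


if __name__ == "__main__":
    # Assumed pixel formats; the thread does not say what the capture cards deliver.
    for label, bpp in (("2 bytes/pixel", 2), ("1.5 bytes/pixel", 1.5)):
        rate = raw_rate_bytes_per_sec(3840, 2160, 60, bpp)
        print(label, f"{rate / 1e6:.0f} MB/s", f"{seconds_until_full(RTBUF_BYTES, rate):.2f} s")
```

Under those assumptions a single fully stalled 4K60 input fills the whole buffer in roughly two to three seconds, so a larger buffer only delays the overflow rather than preventing it.
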
> > I can't really tell what the dshow input mapping looks like, but I think
> > this is about the limit of your system.
> > With a 6800K, assuming the GPU is full sized, are there enough lanes left
> > for 3 additional capture cards?
> >
> As seen in the screenshots my 6800k is only being overly taxed if I'm
> running all the inputs / outputs in one command / one instance of FFmpeg,
> *and only on one thread with plenty of headroom left on all other threads*
> (see task manager screenshots). When I separate them into multiple commands
> running in different processes, but still at the same time with all the
> same options, the 6800k isn't even at 35% total usage with plenty of
> headroom per thread. So it seems pretty clear to me that the 6800k is not
> the bottleneck. Even so, I'm replacing it with a Threadripper (1950x, 2.5
> times as powerful as my 6800k) as described in the original message so I
> can have headroom to run FFmpeg and OBS at the same time.
> > Using the hardware encoder for so many streams at once might also have to
> > do with it, you could try saving
> > the raw input to fast enough scratch disk to check for that quickly.
> I'm using a GTX 1080, which has dual NVENC processing chips (see NVIDIA
> encode matrix:
> https://developer.nvidia.com/video-encode-decode-gpu-support-matrix). As
> can be seen in my screenshots, the encoder is only at 40% usage. While
> Nvidia typically only allows you to do 2-3 encodes at once, that is a
> pseudo-limitation enforced by software, which can be bypassed with a patch:
> https://github.com/keylase/nvidia-patch
> Just to further show that the hardware is not yet an issue in itself, I can
> run 4 separate 4k60 transcodes simultaneously in real-time using just the
> 6800k and the GTX 1080 with 30% headroom left on the CPU, 20% headroom on
> the GPU's encoding chips, multiple gigabytes of VRAM still available on the
> GPU, over 12 GB of available system memory, and below 30% SSD usage. The one
> caveat is that each input / output has to be running in *separate
> instances of FFmpeg*. As soon as I try to transcode more than 9K60 in a
> *single FFmpeg command / instance*, a single thread on my 6800K reaches
> 100% despite the rest of the chip having 70% headroom, and the command then
> falls behind, filling the buffer until there is no memory left.
> Just to make it clear, from my extensive testing the issue only presents
> itself when running massive commands with 3x or more 4K60 transcodes in
> *one instance* of *FFmpeg*, *when I run them separately but still at the
> same time I have zero issues*... Other than the fact that I have to run them in
> separate instances which is what I'm trying to avoid due to synchronization
> issues, among others. What I'm really trying to determine here is what part
> of a single FFmpeg instance is being limited to 1 thread when transcoding
> 3+ 4k60 streams.

Disable Game Mode in Windows 10 and retest.
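
For the retest, a minimal sketch of one way to start the per-input commands at the same moment, so the separate-instance case is measured the same way before and after changing the setting. `run_together` is a hypothetical helper, not part of FFmpeg, and the commands shown are placeholders rather than the commands from this thread.

```python
import subprocess
import sys
import time


def run_together(commands):
    """Start every command at once, wait for all of them, return (exit codes, seconds)."""
    start = time.monotonic()
    # Popen returns immediately, so all processes are running before the first wait().
    procs = [subprocess.Popen(cmd) for cmd in commands]
    codes = [p.wait() for p in procs]
    return codes, time.monotonic() - start


if __name__ == "__main__":
    # Placeholder commands: substitute the real per-input ffmpeg argument lists.
    placeholder = [sys.executable, "-c", "pass"]
    codes, elapsed = run_together([placeholder, placeholder])
    print(codes, f"{elapsed:.2f}s")
```

Each entry in `commands` is an argument list such as `["ffmpeg", ...]`. A non-zero exit code in the result points at the instance that failed.
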
