[FFmpeg-devel] [PATCH] avcodec/internal: improve min_size documentation for ff_alloc_packet2()

Tue Aug 4 14:06:03 CEST 2015

On Tue, 4 Aug 2015 01:46:24 +0200
Michael Niedermayer <michael at niedermayer.cc> wrote:

> On Tue, Aug 04, 2015 at 12:57:38AM +0200, wm4 wrote:
> > On Tue, 4 Aug 2015 00:51:17 +0200
> > Michael Niedermayer <michael at niedermayer.cc> wrote:
> > 
> > > On Tue, Aug 04, 2015 at 12:12:24AM +0200, wm4 wrote:
> > 
> > > > > As well as benchmark results in some commits like
> > > > > 4302a92835313000d4bf8030baae32c2b130d8a1
> > > > 
> > > >     avcodec/pcm: Better min_size for ff_alloc_packet2()
> > > >     
> > > >     33318 -> 30601 decicycles
> > > > 
> > > > That doesn't look convincing at all. So you save ~3000 decicycles for
> > > > a task that isn't CPU bound, but which might blocks entire milliseconds
> > > > on I/O.
> > > 
> > > There was a bug report of a overall (aka time ffmpeg) speedloss
> > > in the % values with high res rawvideo
> > 
> > Was it fixed?
> 
> for rawideo and for several others too yes

I see no proof though. The numbers you posted so far are not really
convincing. They're micro-benchmarks too, and might not even matter in
real world work loads.

> iam pretty sure there are other encoders that suffer from speedloss
> due to this too though

What is "this"?

> 
> > 
> > > > > i do intend to fine tune this for more codecs but i spend more time
> > > > > in such disussions than working on the code and having people tell me
> > > > > to do it differently each time i take a step kind of slows it down
> > > > > and turns it into a mess
> > > > 
> > > > We also must take care of not turning the codebase into a mess.
> > > 
> > > I want to fix the mess
> > 
> > I don't mind, but...
> > 
> > > and i would suggest that we first test if any / which encoders benefit
> > > from the static buffer.
> > > It where quite a few long ago but it seems from a few quick tests that
> > > this is not so anymore.
> > > Can someone confirm ?
> > > if it has become useless then I could drop quite a bit of code
> > 
> > Do you have to do your experiments on the live codebase? If you just
> > want to try something, you could do that in a branch. If it gives good
> > results, merge, if not, it's a failed experiment.
> 
> noone did "experiments on the live codebase"

Then what's that "FIXME" in the heuristic, and the new API parameter
that by now uses only 2 possible values, which make this "heuristic"
completely pointless? What bugs me is that you insist on this (IMHO)
bogus min_size parameter, and all complaints are supposed to be
silenced with the holy "speedloss" justification. Except that so far it
seems to be rather minor, and I bet shows up only in non-realistic
benchmarks. But it serves as justification anyway, because if it's just
a slightly bit faster, it wins, no discussion.

Now I don't claim I can do better (actually I do, so feel free to call
me an idiot), but here are my own (untested) conclusions.

From what I see there are 2 cases:
- Fixed allocations, usually the case with raw encoders. Currently,
  the packet is always a full malloc (probably ends up as a mmap
  syscall if the size is big), but an additional copy of the packet at
  a later point is avoided when ref-ing or dup-ing the packet.
- Variable allocations, where the encoder has a (usually rather high)
  worst case estimation and wants a buffer of this size. Later the
  buffer is trimmed to avoid wasting memory. Currently, this is done
  by encoding to a "static" buffer, and then copying it to a new
  correctly sized buffer when the packet is referenced or dup-ed.

I suppose the min_size parameter is supposed to select between these
cases dynamically and use the "best" method. I suggest doing the
following instead:

- have a buffer pool in the codec context
- if a packet is allocated with a size larger then the current pool
  size, realloc the buffer pool to that larger size
- likewise, if the requested packet is much smaller than the current
  pool size, you could either resize the pool, or just allocate the
  packet normally (apply clever heuristic 1 here)
- the encoder gets a refcounted packet from the buffer pool
- after encoding, check how much memory is "wasted"
- if too much is wasted (apply clever heuristic 2 here), create a new
  copy (this could even use another buffer pool which might help if the
  encoded packet sizes are usually similar, but maybe it's not worth
  the trouble)

This actually has some potential to be faster (like when the encoder
does fixed allocations, e.g. raw video or audio, no large allocations
are done after a while), and is simpler at least from the encoders
point of view. There can be a single ff_alloc_packet function which
takes a size only.

(But I'd just avoid the need for too many heuristics, and have a
separate function for raw encoders. It would allocate from a buffer
pool.)

Also, why is this all not applied to demuxing? In fact, even when
encoding, a lot of data is copied in avpacket.c, and these were packets
from the demuxer. Doesn't matter?

> also i strongly suggest that this discussion ends now and that we
> return to do usefull work.

I don't agree that it's better to avoid the search for better solutions.