[FFmpeg-devel] [PATCH] avcodec/cbrt_tablegen: avoid pow and speed up cbrt_tableinit

Ganesh Ajjanagadde gajjanag at mit.edu
Thu Nov 26 14:39:49 CET 2015


On Thu, Nov 26, 2015 at 8:27 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Wed, Nov 25, 2015 at 10:46 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
> wrote:
>>
>> On Wed, Nov 25, 2015 at 10:13 PM, Ronald S. Bultje <rsbultje at gmail.com>
>> wrote:
>> > Hi,
>> >
>> > On Wed, Nov 25, 2015 at 8:48 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
>> > wrote:
>> >>
>> >> On Wed, Nov 25, 2015 at 8:29 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
>> >> wrote:
>> >> > On Wed, Nov 25, 2015 at 8:19 PM, Ronald S. Bultje
>> >> > <rsbultje at gmail.com>
>> >> > wrote:
>> >> >> Hi,
>> >> >>
>> >> >> On Wed, Nov 25, 2015 at 7:36 PM, Ganesh Ajjanagadde
>> >> >> <gajjanag at mit.edu>
>> >> >> wrote:
>> >> >>
>> >> >>> On Wed, Nov 25, 2015 at 6:49 PM, James Almer <jamrial at gmail.com>
>> >> >>> wrote:
>> >> >>> > On 11/25/2015 8:32 PM, Ganesh Ajjanagadde wrote:
>> >> >>> >> On Wed, Nov 25, 2015 at 6:19 PM, Ronald S. Bultje
>> >> >>> >> <rsbultje at gmail.com>
>> >> >>> wrote:
>> >> >>> >>> Hi,
>> >> >>> >>>
>> >> >>> >>> On Wed, Nov 25, 2015 at 5:17 PM, Ganesh Ajjanagadde <
>> >> >>> gajjanagadde at gmail.com>
>> >> >>> >>> wrote:
>> >> >>> >>>>
>> >> >>> >>>> On systems having cbrt, there is no reason to use the slow pow
>> >> >>> function.
>> >> >>> >>>>
>> >> >>> >>>> Sample benchmark (x86-64, Haswell, GNU/Linux):
>> >> >>> >>>> new:
>> >> >>> >>>> 5124920 decicycles in cbrt_tableinit,       1 runs,      0
>> >> >>> >>>> skips
>> >> >>> >>>>
>> >> >>> >>>> old:
>> >> >>> >>>> 12321680 decicycles in cbrt_tableinit,       1 runs,      0
>> >> >>> >>>> skips
>> >> >>> >>>>
>> >> >>> >>>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>>
>> >> >>>
>> >> >>> -------------------------------------------------------------------------------
>> >> >>> >>>> What I wonder about is why --enable-hardcoded-tables is not
>> >> >>> >>>> the
>> >> >>> default
>> >> >>> >>>> for
>> >> >>> >>>> FFmpeg. Unless I am missing something, static storage is
>> >> >>> >>>> anyway
>> >> >>> allocated
>> >> >>> >>>> even
>> >> >>> >>>> if hardcoded tables are not used, and the cost is deferred to
>> >> >>> >>>> runtime
>> >> >>> >>>> instead of
>> >> >>> >>>> build time. Thus binary size should not be affected, but users
>> >> >>> >>>> burn
>> >> >>> cycles
>> >> >>> >>>> unnecessarily for every codec having these kinds of tables. I
>> >> >>> >>>> have
>> >> >>> these
>> >> >>> >>>> patches,
>> >> >>> >>>> simply because at the moment users are paying a price for the
>> >> >>> >>>> typical
>> >> >>> >>>> default.
>> >> >>> >>>> ---
>> >> >>> >>>>  libavcodec/cbrt_tablegen.h | 6 +++---
>> >> >>> >>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>> >> >>> >>>
>> >> >>> >>>
>> >> >>> >>> This has been discussed extensively in the past...
>> >> >>> >>
>> >> >>> >> Can you please give a link and/or timeframe to search for?
>> >> >>> >>
>> >> >>> >>>
>> >> >>> >>> As for the patch, don't forget that tablegen runs on the host
>> >> >>> >>> (build),
>> >> >>> not
>> >> >>> >>> target (runtime), whereas libm.h is for target (runtime) and
>> >> >>> >>> may
>> >> >>> >>> not be
>> >> >>> >>> compatible. I believe that's why we don't use libm.h in
>> >> >>> >>> tablegen
>> >> >>> >>> files.
>> >> >>> >>
>> >> >>> >> I don't understand this, it seems to me like any other code (at
>> >> >>> >> least
>> >> >>> >> in the default configure), it gets called, and like all other
>> >> >>> >> such
>> >> >>> >> things, we use libavutil/libm for hackery. This host/target
>> >> >>> >> business
>> >> >>> >> affects other things as well. What is the issue?
>> >> >>> >
>> >> >>> > libavutil/libm.h uses defines from config.h, which are based on
>> >> >>> > the
>> >> >>> tests run
>> >> >>> > by configure for the target, and not the host where compilation
>> >> >>> > takes
>> >> >>> place.
>> >> >>> > The tablegen applications all run at compile time. What is
>> >> >>> > available
>> >> >>> > on
>> >> >>> the
>> >> >>> > target may not be on the host.
>> >> >>>
>> >> >>> Ok. So I would like an answer to two simple questions that are
>> >> >>> outside
>> >> >>> my knowledge or interest.
>> >> >>>
>> >> >>> Is it possible with some hackery to get this change through, or
>> >> >>> not?
>> >> >>> If so, what is it?
>> >> >>
>> >> >>
>> >> >> You need to understand the issue before you can evaluate hacks.
>> >> >>
>> >> >> The issue is:
>> >> >> - I'm using a linux x86-64 machine using gcc as a compiler, with
>> >> >> libc=glibc
>> >> >> 2.18 (A);
>> >> >> - to build a binary that will run on a Windows Mobile ARMv7 machine,
>> >> >> with
>> >> >> libC=something-from-Microsoft (B).
>> >> >>
>> >> >> tablegen runs on A, but ffmpeg.exe runs on B. libavutil/libm.h only
>> >> >> works
>> >> >> for B. If you want a version of libm.h on A, you need to generate a
>> >> >> version
>> >> >> of libm.h that works on A. There is no relationship between A and B,
>> >> >> and
>> >> >> thus there can not possibly ever be any relationship between A's
>> >> >> libm.h
>> >> >> and
>> >> >> B's libavutil/libm.h.
>> >> >>
>> >> >> It's probably possible to generate a version of libm.h for A, but
>> >> >> that's
>> >> >> not so much a coding issue, as it is an issue of understanding the
>> >> >> build
>> >> >> system and including detection for stuff on machine A, as opposed to
>> >> >> machine B (which is what most of configure does).
>> >> >
>> >> > Thanks a lot for the detail. So how about using a local
>> >> > #ifndef cbrt
>> >> > #define cbrt(x) pow(x, 1 / 3.0)
>> >> > code...
>> >> > #undef cbrt // at the very end of the file
>> >> > #endif
>> >>
>> >> Not that simple, something more like
>> >> #ifndef cbrt
>> >> #define ff_cbrt(x) pow(x, 1/3.0)
>> >> #else
>> >> #define ff_cbrt(x) cbrt(x)
>> >> code...
>> >> #undef ff_cbrt // at the very end of the file
>> >> #endif
>> >>
>> >> - this will resolve a glitch with the above in not undef'ing an
>> >> important symbol (all this is of course without including
>> >> libavutil/libm.h).
>> >
>> >
>> > I don't think cbrt is a macro? But anyway, I don't think any of this is
>> > meaningful, you're basically creating a crappy per-file libm.h for
>> > tablegen
>> > files which will be duplicated all over the place. How about we do this
>> > correctly, if we do it at all?
>>
>> I lack the knowledge for doing this in configure, all this avutil/libm
>> and compat business is black magic to me, and worse still I can hardly
>> test any of it since I run exclusively a GNU/Linux setup, as evidenced
>> by some recent patches. I can put in an effort if someone is willing
>> to review and if it is reasonably simple in configure.
>>
>> >
>> > (Don't forget these table inits run once per process, so the cost of
>> > increased code messiness and code complexity has a much higher weight
>> > than
>> > it does in normal situations. Runtime isn't that important because it
>> > only
>> > runs once.)
>>
>> I doubt the patch itself qualifies as complex/messy as is. It is all
>> this cross compiling business and associated hackery that is.
>>
>> An extra 0.1 ms of CPU burning across 1000's of clients across the
>> globe is significant, especially when it is trivially avoided, going
>> back to the above point.
>
>
> You're getting in this argument/defensive mode again, don't do that.
> Cross-compiling is not a hack. Your patch is a hack. Our code as it is right
> now works perfectly fine when cross-compiling, and lots of people use it
> like that - including me.

Call this "mode" whatever you want. I am as interested as you in
finding a solution, and have posted a patch for configure+libm.

To explain why I do this, there have been absolutely zero replies
addressing a point I made regarding the burning of cycles to the order
of milliseconds across codecs, apart from a casual dismissal. Even
now, there are misconceptions like it "only runs on the host".

What is the fundamental issue with testing out the configure+libm
patch, diff is pretty small?

>
> I understand that the configure code is convoluted, and I don't fully
> understand it either. However, to correctly solve this problem, you need to
> do it right.

I have made an effort to do so.

>
> Ronald


More information about the ffmpeg-devel mailing list