[Ffmpeg-devel-irc] ffmpeg-devel.log.20180501

Wed May 2 03:05:03 EEST 2018

[00:08:18 CEST] <durandal_1707> i will remove 4 arguments, and let it filter only mmsize multiple of stuff and return consumed bytes, because writing normal asm is pain
[00:11:09 CEST] <kierank> just make the function work on mod 8 or 16 data and finish the rest in c
[00:11:41 CEST] <Gramner> writing scalar asm tends to not be very useful compared to compiler-generated code in general
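The split suggested above (a kernel that only handles whole vector-sized blocks and reports how many bytes it consumed, with the remainder finished in C) might look roughly like the sketch below. Everything here is an assumption for illustration: the names `add_bias_blocks` and `add_bias`, the 16-byte `BLOCK` width, and the "add a bias" operation are hypothetical, and the block kernel is plain C standing in for hand-written asm.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16 /* assumed vector width in bytes (one xmm register) */

/* Hypothetical stand-in for the asm kernel: it processes only whole
 * BLOCK-sized chunks and returns the number of bytes it consumed. */
static size_t add_bias_blocks(uint8_t *dst, const uint8_t *src,
                              size_t len, uint8_t bias)
{
    size_t done = len & ~(size_t)(BLOCK - 1); /* round down to a multiple of BLOCK */
    for (size_t i = 0; i < done; i += BLOCK)
        for (size_t j = 0; j < BLOCK; j++)
            dst[i + j] = (uint8_t)(src[i + j] + bias);
    return done;
}

/* Wrapper: vector-sized bulk first, then scalar C for the leftover tail. */
static void add_bias(uint8_t *dst, const uint8_t *src,
                     size_t len, uint8_t bias)
{
    size_t i = add_bias_blocks(dst, src, len, bias);
    for (; i < len; i++)
        dst[i] = (uint8_t)(src[i] + bias);
}
```

With this shape the asm never needs scalar tail handling; the compiler generates the tail loop, which matches the point about compiler-generated scalar code.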
[01:31:33 CEST] <cone-189> ffmpeg James Almer master:65d36473c930: avcodec/cbs_mpeg2: create a reference to the existing buffer when decomposing slice units
[01:31:34 CEST] <cone-189> ffmpeg James Almer master:0807a7716009: avcodec/cbs_h2645: create a reference to the existing buffer when decomposing slice units
[01:32:46 CEST] <jkqxz> jamrial:  I did some more AV1, but the minor changes everywhere are quite annoying.
[01:32:54 CEST] <jkqxz> It doesn't really invite doing much more until it's actually frozen (so I don't have to read through all of the syntax tables carefully yet again).
[01:33:07 CEST] <JEEB> yea :/
[01:35:05 CEST] <jamrial> i don't think there have been any changes to the header obus in a while, but yeah, better wait until the thing is officially frozen
[01:36:03 CEST] <jkqxz> The uncompressed header was totally rearranged since ~three weeks ago.
[01:37:13 CEST] <jamrial> yeah, they added this reduced still image stuff and turned it upside down, heh
[01:38:09 CEST] <wm4> didn't they say it's frozen a month ago? lol
[01:38:29 CEST] <jamrial> that was a pr move
[02:07:11 CEST] <BBB> https://aomedia.googlesource.com/aom/+/98ec4594116b20e98c785f6e15dd463b7225e857%5E%21/#F0
[02:07:13 CEST] <BBB> that was 4 days ago
[02:07:16 CEST] <BBB> so no, its not frozen
[02:07:37 CEST] <BBB> "I believe that most changes now are about fix bug, not add feature"
[02:21:23 CEST] <jamrial> BBB: and https://aomedia.googlesource.com/aom/+/598d11fdceedaf362033cc24323d602117540ee7%5E%21/ was today, so yeah
[04:35:08 CEST] <rcombs> jkqxz: the folks at NAB seemed to be under the impression that it was frozen, but I think what they internally mean is that silicon design work is theoretically unblocked
[11:03:53 CEST] <nevcairiel> rcombs: so their silly marketing stunt with their fake announcement apparently worked on the NAB people, huh
[13:17:54 CEST] <cone-851> ffmpeg Paul B Mahol master:273edb2fe45a: avfilter/vf_neighbor: rewrite without using temp memory
[15:10:15 CEST] <cone-851> ffmpeg Paul B Mahol master:ddf844d17c40: avfilter/vf_neighbor: simplify code little
[15:10:16 CEST] <cone-851> ffmpeg Paul B Mahol master:5bfc433a6ed9: avfilter/vf_neighbor: add slice threading
[16:05:13 CEST] <cone-851> ffmpeg Paul B Mahol master:2308a3c7e37d: avfilter/af_biquads: change clipping detection from global to channel
[16:05:14 CEST] <cone-851> ffmpeg Paul B Mahol master:d176497cec95: avfilter/af_biquads: add slice threading
[16:26:53 CEST] <durandal_1707> JEEB: somebody claims overlay is slower with framesync2 ?
[16:28:36 CEST] <JEEB> it seems to have been my sub2video fix after all quite likely, something regarding buffering or something that makes realtime use cases start failing after N amount of time. so without my fix you lose overlay because of End-of-Stream at filter chain reinit, and with my fix there is something wrong somewhere - I haven't debugged what exactly yet
[16:28:58 CEST] <JEEB> I think I am just going to call this "I'm going to slap whomever came up with using the same filter chain for video and audio"
[16:29:19 CEST] <JEEB> if you want a sample and replication command I can give that if you want to look into it
[16:57:23 CEST] <Danil_Iaschenko>  Hi everybody! My name is Danil Iashchenko, I'm GSoC student and I will be working on project "OpenCL support for libavfilter". (https://summerofcode.withgoogle.com/projects/#4925416792391680) My mentor is Mark Thompson. 
[17:05:47 CEST] <durandal_1707> wm4: somehow mpv thinks pts are broken, and is trying to fix it, that's why backstepping is so slow
[17:31:52 CEST] <jkqxz> Danil_Iaschenko:  Welcome!
[17:36:19 CEST] <jkqxz> Danil_Iaschenko:  If you have any general questions about ffmpeg development then here is a good place to ask them, and some people have useful knowledge of hardware or libavfilter (though not necessarily OpenCL).
[17:39:25 CEST] <jkqxz> (Though as you can see there isn't much excitement going on right now - traffic is quite variable, and it's fine to lurk all the time.)
[17:58:36 CEST] <jamrial> jkqxz: labour/worker day :p
[17:59:00 CEST] <nevcairiel> except in murica, they for some reason have that in fall
[18:01:34 CEST] <jkqxz> Ohright, yeah.  The holiday for it in this country is defined as the first Monday in May, so it's actually next week.
[18:01:59 CEST] <nevcairiel> first of may for us
[18:03:07 CEST] <atomnuker> jkqxz: that's a bank holiday
[18:04:15 CEST] <atomnuker> nothing to glorify the low-middle class workers
[18:06:57 CEST] <jkqxz> Well, May Day was a thing before socialism was invented.
[18:08:05 CEST] <nevcairiel> those two concepts kind of combined around here
[18:11:52 CEST] <durandal_1707> who has expertise in IVTC and deinterlacing?
[18:14:57 CEST] <atomnuker> JEEB maybe?
[18:15:08 CEST] <jkqxz> As long as the peasants stay down they can have as many maypoles and as much morris dancing as they like, I guess.
[18:17:30 CEST] <atomnuker> and as long as these pagans don't do that in front of churches
[19:30:51 CEST] <kierank> durandal_1707: me
[19:30:54 CEST] <kierank> what do you want
[19:31:17 CEST] <JEEB> oh, so you found something wonky with it too?
[19:33:41 CEST] <atomnuker> you can't use detelecine in mpv as it complains the framerate is variable (its uninitialized, 0/1 or something)
[19:33:55 CEST] <atomnuker> you can use the fieldmatch filter
[19:34:02 CEST] <durandal_1707> detelecine is carl's pet
[19:34:38 CEST] <atomnuker> its useless, and then there's the dejitter filter which is meant to fix the output of detelecine
[19:35:10 CEST] <durandal_1707> dejudder
[19:35:33 CEST] <atomnuker> I say remove detelecine and that, move fieldmatch to detelecine and make detelecine optionally decimate the output
[19:35:50 CEST] <atomnuker> the decimate filter has the same error if used in mpv too btw
[19:37:32 CEST] <durandal_1707> atomnuker: use them inside -vf "lavfi=[]"
[21:19:00 CEST] <durandal_1707> can i get explicit approval for overlay patch?
[21:22:00 CEST] <BBB> atomnuker: \o/
[21:23:48 CEST] <durandal_1707> but, but we are old code museum, not code removal please!
[21:33:19 CEST] <Gramner> need more mmx code for everything
[21:34:08 CEST] <Gramner> think about all those people who want to decode 4K AV1 on their pentium pros
[21:35:50 CEST] <atomnuker> what happened with musl libc btw? do they still do floating point math inside malloc() and don't bother with the FPU state?
[21:36:24 CEST] <Gramner> iirc they changed that
[21:37:36 CEST] <jamrial> atomnuker: add the deprecated opt flag while at it
[21:37:47 CEST] <iive> musl should be using lookup table now.
[21:38:36 CEST] <durandal_1707> hello, please explicitly approve my overlay patch!
[21:39:17 CEST] <Gramner> durandal_1707: now you're just making me want to nitpick it!
[21:40:01 CEST] <cone-785> ffmpeg Matt Oliver release/4.0:29328d96b90f: ffplay: Fix realloc_texture when input texture is NULL.
[21:40:01 CEST] <cone-785> ffmpeg Marton Balint release/4.0:da6c519f6e53: avformat/qtpalette: parse color table according to the QuickTime file format specs
[21:40:01 CEST] <cone-785> ffmpeg Marton Balint release/4.0:70a01aa4901a: avcodec/anm: fix palette alpha
[21:40:01 CEST] <cone-785> ffmpeg Marton Balint release/4.0:0a22e31fbbba: avcodec/hnm4video: fix palette alpha
[21:40:01 CEST] <cone-785> ffmpeg Marton Balint release/4.0:d89eea345586: avdevice/decklink_dec: unref packets on avpacket_queue_put error
[21:40:40 CEST] <durandal_1707> nooooooo!
[21:41:56 CEST] <durandal_1707> because you are all evil, i sent you more spam to your mailboxes!
[21:42:28 CEST] <atomnuker> jamrial: oh, right, forgot about it, added locally as replied on the ml
[21:47:06 CEST] <durandal_1707> atomnuker: i posted spam because you complained multiple times, now I expect explicit approvals
[21:53:28 CEST] <durandal_1707> jamrial: i not gonna add x86_32 support for overlay asm, there are others that can do it
[21:54:00 CEST] <jamrial> it's a one line change for two functions, and a two line change for the last function
[21:55:12 CEST] <jamrial> don't be lazy. it's not like the others i adapted that required ifdeffery
[21:58:31 CEST] <durandal_1707> i'm not gonna do it, its hard, and i can not test it, dont be lazy and write it instead
[21:58:53 CEST] <JEEB> I mean, someone already ran FATE for you, right?
[21:59:30 CEST] <JEEB> or are you expecting someone to rage at you and eff you off and send a patch?
[22:00:51 CEST] <atomnuker> durandal_1707: you just need to recompile with a different flag to test it (unless you're from the future and they've dropped 32 bit in x86 cpus)
[22:01:20 CEST] <durandal_1707> i'm from future
[22:04:57 CEST] <atomnuker> have they dropped 16-bit mode too?
[22:08:40 CEST] <atomnuker> durandal_1707: oh wow, you reposted the patchset, thanks
[22:08:54 CEST] <atomnuker> that's with the 2 bugs fixed from last time, right?
[22:12:16 CEST] <durandal_1707> i do not know, was long ago
[22:13:47 CEST] <Gramner> 16-bit x86 with memory segmentation is the one true µarch. it will be in your cpu forever eating transistors
[22:13:53 CEST] <atomnuker> I think so, the ffplay limited range only patch addressed IIRC one of them
[22:14:25 CEST] <Gramner> along with x87
[22:14:37 CEST] <atomnuker> nah, AMD will drop it soon, probably
[22:14:46 CEST] <atomnuker> (as in, not flag it, they'll drop it later)
[22:15:02 CEST] <atomnuker> like what happened with FMA4
[22:15:31 CEST] <Gramner> yeah, AMD actually drops stuff unlike intel which still support undocumented instructions from the 8086 days
[22:16:04 CEST] <atomnuker> lol, really? which ones
[22:16:27 CEST] <Gramner> it's just too bad that they didn't bring the axe to a lot more stuff when they created amd64
[22:19:07 CEST] <jamrial> amd drops xop, which has useful instructions, but keeps shipping cpus with sse4a, which nothing and nobody uses
[22:19:28 CEST] <Gramner> xop was good but never got traction because intel refused to support it
[22:20:54 CEST] <Gramner> atomnuker: 0xD6 is one undocumented instruction for example
[22:21:07 CEST] <Gramner> equivalent to sbb al, al except it doesn't set flags
[22:21:16 CEST] <Gramner> which is obviously incredibly useful /s
[22:21:25 CEST] <Gramner> and totally warrants a one-byte opcode
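As a rough illustration of the behaviour described above, the sketch below models `sbb al, al` and the 0xD6 opcode (commonly known as SALC) side by side in C. The function names and the small result struct are hypothetical, and this is only a software model of the described semantics, not CPU documentation.

```c
#include <stdint.h>

/* Hypothetical model of the architectural state touched by these instructions. */
struct al_cf {
    uint8_t al; /* resulting AL */
    int     cf; /* resulting carry flag */
};

/* sbb al, al: AL = AL - AL - CF, i.e. 0xFF if carry was set, else 0x00.
 * The borrow out of the subtraction means CF keeps its previous value,
 * but the instruction does rewrite the arithmetic flags. */
static struct al_cf sbb_al_al_model(uint8_t al, int cf)
{
    struct al_cf r;
    (void)al; /* the old AL value cancels out */
    r.al = cf ? 0xFF : 0x00;
    r.cf = cf ? 1 : 0;
    return r;
}

/* 0xD6 as described in the chat: same AL result, flags left untouched,
 * so only AL is returned here. */
static uint8_t salc_model(int cf)
{
    return cf ? 0xFF : 0x00;
}
```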
[22:23:16 CEST] <jamrial> intel adopted amd64 and nothing else from amd because legacy code must run no matter what, so ia64 was a no go
[22:23:19 CEST] <jamrial> it's no wonder then why they keep instructions like that one in place
[22:23:34 CEST] <Gramner> so much useless garbage occupying one-byte opcodes, so any new instructions become insanely long
[22:25:15 CEST] <Gramner> had amd been smart they would've axed all the crap from 64-bit mode when they created amd64 while still supporting it in 32-bit mode. so new stuff could be made 64-bit only with reasonable opcodes. e.g. you could have had avx being 64-bit exclusive with really short opcodes for example
[22:25:27 CEST] <Gramner> which would improve performance because instruction decoding sucks
[22:26:08 CEST] <Gramner> some avx-512 instructions are like 12 bytes long, it's insane
[22:28:52 CEST] <durandal_1707> jamrial: i prefer to explicitly state no stack is used
[22:31:33 CEST] <jamrial> ok
[22:34:45 CEST] <iive> Gramner, amd didn't invent avx
[22:35:04 CEST] <Gramner> no, but they invented amd64 (hence the name)
[22:35:35 CEST] <Gramner> and they would reasonably have assumed that at some point in the future you'd want to add more instructions
[22:35:47 CEST] <iive> my point is, intel still could have made something that doesn't use 4-5 byte prefixes for each instruction.
[22:35:50 CEST] <durandal_1707> jamrial: how to use al straight from the memory?
[22:36:05 CEST] <nevcairiel> iive: not really though, there is only so many values in a byte
[22:36:33 CEST] <Gramner> how? prefixes are required to not cause collisions with existing instructions so not using them would break stuff
[22:37:36 CEST] <jamrial> durandal_1707: almp
[22:37:45 CEST] <nevcairiel> the way they do the vex prefix is already extremely hacky because there just wasnt any bytes left
[22:37:46 CEST] <iive> well, not that it is a good solution, but it is a solution. you can use a switchable mode
[22:38:22 CEST] <iive> so you can say, all these mmx codes are actually avx. and use the prefix only to invert the mode.
[22:38:34 CEST] <iive> i think there is already something similar for 32 vs 16 bit.
[22:41:59 CEST] <iive> i have explained before my idea how to make extensible SIMD, so the mode could actually be setting the register size
[22:42:22 CEST] <iive> the benefit, you won't need new opcodes for avx-1024 
[22:43:07 CEST] <nevcairiel> Even if that might work, you are limited to exactly the same instructions you always had, no changing
[22:43:14 CEST] <nevcairiel> thats not extensible, thats just variable-size
[22:46:08 CEST] <Gramner> I'm guessing changing the meaning of opcodes back and forth on the fly probably introduces a lot of issues with µop caching, interrupts, etc.
[22:46:55 CEST] <Gramner> and you would have to change it a lot since calling external library functions might expect things to be in the "old state" and break otherwise
[22:47:08 CEST] <iive> not really
[22:48:15 CEST] <iive> let's say that we have a register named mmsize, you set the size of register in it and the instructions work with that size, avx style.
[22:49:39 CEST] <Gramner> now every instruction have an additional register dependency which is problematic
[22:49:46 CEST] <iive> that is, if you set mmsize=32, you can use 2 size16 uOPs that execute in parallel. like the amd does.
[22:50:14 CEST] <iive> that register only affects uOP generation.
[22:52:01 CEST] <iive> you can make it so that modification of that register clears all uOP cache. it's huge penalty that we can afford
[22:52:27 CEST] <iive> because we can use a prefix to change it for single operation, 
[22:53:27 CEST] <iive> still better than mixing sse/avx code.
[22:53:47 CEST] <nevcairiel> that could still end up terrible, if you interleave functions one being sse and one avx, you not only would have to write these functions, but also know the full context of how they are called to avoid crazy expensive switches
[22:55:10 CEST] <iive> that's the beauty of it. you don't need to write sse/avx functions anymore
[22:55:24 CEST] <iive> you write the function once and you can use it with any register size
[22:55:37 CEST] <iive> even insanely huge ones.
[22:56:26 CEST] <nevcairiel> register still change size and you load different amounts of data, sometimes alignment or block size concerns dont let you use the full size, so you use a smaller one
[22:56:40 CEST] <nevcairiel> makes no sense to use avx512 on 128-bit data
[22:57:17 CEST] <Gramner> variable-sized vector registers is not a new idea, it just turns out that a lot of real-world code is too complex to just work with arbitrary sized registers. an ymm implementation can differ drastically from a xmm one, not just "do half as many loop iterations"
[22:58:00 CEST] <atomnuker> yeah, I agree, besides it would probably not be as fast as hardcoding the size to instructions
[23:00:26 CEST] <iive> atomnuker, but that's the problem. instruction with big sizes use long redundant prefixes
[23:03:21 CEST] <atomnuker> they wouldn't have to if the ISA was cleaned up a bit
[23:04:10 CEST] <iive> even if it was
[23:04:22 CEST] <iive> it would just mean more prefixes could be used.
[23:10:36 CEST] <iive> let's say we remove half of the x86 instruction. how would you use the freed codes? map them all to avx512?
[23:13:47 CEST] <iive> Gramner, avx literally executes 2 sse registers. A variable length SIMD would actually need an effective system for collapsing the registers.
[23:14:01 CEST] <atomnuker> nah, I'd just use them to make all instructions fit in 2 bytes
[23:14:46 CEST] <iive> atomnuker, no problem doing this now. You just need 1 free byte for a prefix, and then use a number to pick the set.
[23:15:02 CEST] <iive> but this is not what intel have done, they've been piling prefix after prefix...
[23:15:30 CEST] <nevcairiel> because no more bytes are free
[23:15:47 CEST] <iive> there are few that are not used yet.
[23:15:51 CEST] <Gramner> being able to make most simd instructions 1-2 bytes shorter would help, both for decoding performance and power efficiency as well as code cache utilization
[23:16:25 CEST] <nevcairiel> they even only found a half-byte for the vex prefix, it encodes a normal instruction in an invalid way, which the cpu then recognizes
[23:19:09 CEST] <atomnuker> and to think intel was pursuing the mobile market with this ISA
[23:19:38 CEST] <iive> there are 2 bytes that are currently noop prefixes. 0x2e and 0x3e
[23:19:45 CEST] <iive> use them.
[23:21:01 CEST] <Gramner> that would break legacy code that uses them as branch prediction prefixes
[23:21:20 CEST] <Gramner> and they are adamant to preserve compatibility
[23:21:51 CEST] <iive> that's not problem, they can still be noops when used before branch jump
[23:23:22 CEST] <Gramner> then it's longer than one byte when used as a prefix since you'd have to make sure to not collide with valid following bytes
[23:24:14 CEST] <iive> not really, 
[23:25:57 CEST] <iive> but even if it was 2 bytes. you can have 200 different sets using them.
[23:29:33 CEST] <rcombs> pull an ARM and have an instruction to switch into a smaller encoding
[23:30:14 CEST] <iive> that's what i proposed initially
[23:30:16 CEST] <durandal_1707> huh, why m32 binaries are slower than m64 ones?
[23:31:21 CEST] <rcombs> save a tiny amount on memory bandwidth for pointers, and better code size due to lack of prefixes?
[23:31:59 CEST] <iive> i have even better idea. put more arm cores in the cpu
[23:32:02 CEST] <rcombs> (which you might lose in spills but hey depends on your workload)
[23:32:13 CEST] <rcombs> iive: and also take out all the x86 ones?
[23:32:28 CEST] <atomnuker> durandal_1707: less registers
[23:32:35 CEST] <iive> rcombs, you can leave one. for the XT games :P
[23:33:56 CEST] <Gramner> also worse calling conventions, e.g. 32-bit x86 passes everything on the stack whereas x86-64 uses registers in most cases
[23:34:55 CEST] <nevcairiel> why is it that they never wanted to use registers back then
[23:35:11 CEST] <iive> nevcairiel, not enough registers
[23:35:28 CEST] <Gramner> probably easier to debug when you can read the process memory and see what arguments were passed
[23:35:44 CEST] <nevcairiel> i reckon most of the time you may push something onto the stack before a call and then just pop it out right in the function
[23:36:49 CEST] <nevcairiel> the fastcall calling conventions did exist which used 2 registers, but it's probably rarely used
[23:37:42 CEST] <Gramner> doesn't fastcall also have the weird thing where the callee must clean up the stack space allocated by the caller?
[23:38:03 CEST] <rcombs> oh I read what durandal_1707 said backwards
[23:38:34 CEST] <rcombs> durandal_1707: it's also because there's some asm that's only available on x86_64 because nobody wants to deal with spills when hand-optimizing
[23:39:15 CEST] <TD-Linux> also just simpler to implement, these calling conventions were made in the 80s
[23:39:16 CEST] <nevcairiel> fastcall is like pascal call or stdcall yeah, callee clean-up
[23:41:37 CEST] <Gramner> it's starting to become time to let 32-bit x86 though
[23:41:44 CEST] <Gramner> die*
[23:41:54 CEST] <Gramner> deprecate it and move on
[23:42:27 CEST] <durandal_1707> lies, we still have 3D Now asm and libpostproc
[23:43:26 CEST] <rcombs> microsoft's apparently got a new windows variant that fits a 64-bit install on 16GB poop tablets
[23:43:39 CEST] <nevcairiel> yeah they stripped it down quite a bit
[23:43:40 CEST] <JEEB> hah
[23:43:47 CEST] <rcombs> I'm wondering when they'll ship a variant that drops 32-bit compat to save space
[23:43:48 CEST] <JEEB> so that means my whiskey might get 64bit
[23:43:51 CEST] <JEEB> not that I want one
[23:44:04 CEST] <durandal_1707> 16GB? I want max 4GB
[23:44:17 CEST] <rcombs> inb4 never because they never gave people any incentive to move to it for non-performance-critical apps
[23:44:17 CEST] <JEEB> I had stopped using it before because I had forgotten to back up the drivers for it
[23:44:27 CEST] <JEEB> yea
[23:44:29 CEST] <JEEB> quite likely
[23:44:41 CEST] <rcombs> and having a 32-bit build has always been crucial since they never stopped shipping 32-bit kernels
[23:44:57 CEST] <Gramner> make 64-bit a hard requirement on x86, flag every xmm simd function as avx and rm -rf all remaining mmx and sse code. then enjoy the complaints
[23:44:59 CEST] <rcombs> meanwhile apple's dropping 32-bit userspace compat in the next macOS
[23:45:34 CEST] <rcombs> eh sse's useful enough
[23:45:36 CEST] <nevcairiel> apple is not concerned about compatibility in the slightest
[23:45:43 CEST] <rcombs> not everything benefits from avx
[23:45:51 CEST] <nevcairiel> they dont care about devs or users in that regard
[23:46:20 CEST] <Gramner> code maintenance benefits from not having to have an sse2, an ssse3, and an avx implementation of a bunch of code
[23:46:31 CEST] <rcombs> ¯\_(ツ)_/¯ I'll take it
[23:46:34 CEST] <Gramner> although yes, that is a bit extreme I guess
[23:46:50 CEST] <rcombs> Gramner: I mean not everything needs an avx imp at all
[23:46:51 CEST] <nevcairiel> wonder if people might actually wise up to their nonsense if they actually drop x86 entirely and go ARM on desktop or some shit, and suddenly nothing works anymore (or very slowly with translation)
[23:47:55 CEST] <Gramner> rcombs: indeed, but it makes everything simpler to have a higher baseline. just flag it avx and forget about if it's ssse3 or sse4.1
[23:48:07 CEST] <rcombs> ¯\_(ツ)_/¯ I guess
[23:48:55 CEST] <rcombs> wonder how many of my users would actually regress if I made avx a requirement for all simd
[23:49:20 CEST] <Gramner> it's been around since 2011
[23:49:23 CEST] <rcombs> I remember a few people complained when I shipped something that used AVX and checked the AVX-present cpuid flag but not the AVX-OS-supported
[23:49:25 CEST] <nevcairiel> i had people complain when i forced sse2
[23:49:27 CEST] <nevcairiel> so yeah
[23:49:44 CEST] <rcombs> (which then proceeded to crash)
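The crash described above is the classic result of checking only the CPU feature bit. A minimal sketch of the fuller check follows, assuming the commonly documented bit positions: CPUID leaf 1 ECX bit 28 (AVX), ECX bit 27 (OSXSAVE), and XCR0 bits 1 and 2 (SSE and AVX state enabled by the OS). The function takes the raw register values as parameters so the logic stays portable; how they are obtained (for example via a `cpuid` intrinsic and `xgetbv`) is left out, and the names are hypothetical.

```c
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28) /* CPU implements AVX */
#define XCR0_SSE_AVX_STATE 0x6u       /* XCR0 bits 1 (SSE) and 2 (AVX) */

/* Returns 1 only if AVX is both implemented by the CPU and enabled by
 * the OS. xcr0 is only meaningful when OSXSAVE is set, so it is
 * consulted last. */
static int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_AVX))
        return 0; /* no AVX in hardware */
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return 0; /* OS never enabled XSAVE, so ymm state is not saved */
    return (xcr0 & XCR0_SSE_AVX_STATE) == XCR0_SSE_AVX_STATE;
}
```

Code that tests only the AVX bit corresponds to returning after the first check, which is the case that then faults on systems where the OS does not manage ymm state.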
[23:49:49 CEST] <nevcairiel> (2 years ago or so, but i dont expect those weirdos to have changed)
[23:49:50 CEST] <jamrial> i know some people really like their sandy bridges and stick to them, but i don't think such thing happens with nehalem or older
[23:50:05 CEST] <nevcairiel> sandy at least has avx
[23:50:09 CEST] <Gramner> sandy bridge was a super solid µarch
[23:50:15 CEST] <jamrial> i know
[23:50:17 CEST] <Gramner> still works fine in modern games and whatnot
[23:50:45 CEST] <nevcairiel> most people sticking to something much older is either something crazy like Athlon XP or something, or maybe Core2 arch stuff
[23:50:51 CEST] <jamrial> so avx minimum would piss off <=nehalem and <= phenom users
[23:51:48 CEST] <jamrial> i think phenom 6 core was popular even during the piledriver days
[23:52:53 CEST] <atomnuker> I do wish all ISAs had x86 addressing, that would make them suck less
[23:52:54 CEST] <nevcairiel> the good ol amd fake cores, where you got 2 with the performance of one :)
[23:53:14 CEST] <Gramner> https://store.steampowered.com/hwsurvey/ (which is probably one of the better up-to-date surveys) says ~90% avx support
[23:53:31 CEST] <atomnuker> except maybe with more bits for shifts of the second register and the offset could also be a register, that would rock
[23:53:47 CEST] <atomnuker> then all your loads would pretty much just be one instruction
[23:54:10 CEST] <nevcairiel> I'm just happy we finally managed to roll out 64-bit at work last year, although with people running external third-party plugins and stuff we would never dream to cancel the 32-bit version anytime soon
[23:54:32 CEST] <Gramner> evex compressed displacement for constant offsets is also great, should be a thing everywhere
[23:54:50 CEST] <rcombs> I'm still mad at MS for ever shipping 32-bit win10
[23:54:50 CEST] <Gramner> e.g. it encodes imm8 as a multiple of the element size, not absolute bytes
[23:55:47 CEST] <atomnuker> dunno about that, I'd prefer bytes
[23:56:02 CEST] <jamrial> Oculus Rift - 0.20%
[23:56:03 CEST] <jamrial> lol vr
[23:56:03 CEST] <Gramner> bytes are still supported, just uses 4 bytes instead of one
[23:56:42 CEST] <Gramner> just makes more efficient use of the 1-byte offset encoding
[23:57:37 CEST] <Gramner> otherwise with zmm registers you'd only have -2, -1, and +1 * mmsize fit in an imm8
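The arithmetic behind that last remark can be sketched as follows. The helper below is hypothetical: it asks whether a byte offset fits in a signed 8-bit displacement that is implicitly multiplied by a scale `n`, where `n = 1` models the classic unscaled encoding and `n = 64` is assumed here for a full zmm-sized access (the actual scale depends on the instruction).

```c
#include <stdint.h>

/* Returns 1 if `offset` (in bytes) can be encoded as a signed 8-bit
 * displacement scaled by `n` bytes, 0 otherwise. */
static int fits_scaled_disp8(int32_t offset, int32_t n)
{
    if (n <= 0 || offset % n != 0)
        return 0;                    /* must be an exact multiple of n */
    int32_t scaled = offset / n;
    return scaled >= -128 && scaled <= 127;
}
```

With `n = 1` and 64-byte steps, only -128, -64 and +64 (plus zero) fit, which is the "-2, -1, and +1 * mmsize" case; with `n = 64` the reachable range grows to -128 through +127 whole vectors, and anything that is not a multiple of the scale falls back to the 4-byte displacement.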
[23:58:33 CEST] <exastiken_> Hello! I have been using callgrind with ffmpeg encoding of a YUV to mp4
[23:58:48 CEST] <durandal_1707> and?
[23:59:27 CEST] <TD-Linux> the steam survey is biased towards better PCs as well. compare it to the firefox version https://hardware.metrics.mozilla.com/
[23:59:32 CEST] <exastiken_> And I noticed that x265::MotionEstimate::motionEstimate
[23:59:43 CEST] <exastiken_> is used heavily during encoding
[23:59:48 CEST] <exastiken_> but not by do_encode
[23:59:53 CEST] <TD-Linux> (unfortunately the firefox one doesn't collect instruction set data)
[00:00:00 CEST] --- Wed May  2 2018