[Ffmpeg-devel-irc] ffmpeg-devel.log.20190110

burek burek021 at gmail.com
Fri Jan 11 03:05:04 EET 2019


[14:51:59 CET] <cone-518> ffmpeg Linjie Fu master:e92ce340e630: lavc/qsvenc: add VDENC support for H264
[14:52:00 CET] <cone-518> ffmpeg Zhong Li master:0aaaca25e044: lavc/qsvenc: set pict_type to be I for IDR frames.
[16:22:47 CET] <durandal_1707> is my SIMD patch OK?
[16:36:14 CET] <atomnuker> durandal_1707: last I looked at it it had brackets on one line if statements
[16:40:48 CET] <durandal_1707> atomnuker: why are you so anal about brackets
[16:40:50 CET] <atomnuker> also if its sse only you can remove the xm* prefix from registers, unless you plan to avx it and you require 128bit regs
[16:41:34 CET] <atomnuker> they waste a line, aren't in the coding style, and look ugly
[16:42:12 CET] <atomnuker> its acceptable in x86/whatever_init.c if you plan on adding more versions of the function in the future as it'll reduce the diff by 2 lines
[16:42:28 CET] <durandal_1707> ok
[19:59:14 CET] <cone-518> ffmpeg Carl Eugen Hoyos master:34ca5adf9a77: configure: Fix hymt decoder standalone compilation.
[20:02:35 CET] <jamrial> michaelni: does fate-checkasm-af_afir fail in any of your x86 machines?
[20:03:00 CET] <jamrial> if so, can you check if changing FLT_EPSILON into something more lax fixes it?
[20:03:02 CET] <cone-518> ffmpeg Carl Eugen Hoyos master:e52140ba3780: lavfi/Makefile: Fix bwdif filter standalone compilation.
[20:12:03 CET] <cone-518> ffmpeg Carl Eugen Hoyos master:e51811d21598: lavfi/f_select: Fix aselect filter standalone compilation.
[20:15:46 CET] <cone-518> ffmpeg Carl Eugen Hoyos master:02b6d1dd6306: lavfi/f_select: Cosmetics, move a function.
[20:32:26 CET] <durandal_1707> if there are no comments i will apply x86 SIMD patch soon
[20:36:46 CET] <iive> got link to the patch?
[20:37:01 CET] <durandal_1707> iive: not for you
[20:37:56 CET] <durandal_1707> iive: it is on patchwork
[20:38:31 CET] <iive> huh? I thought all patches sent to the mail list get on patchwork
[20:39:19 CET] <jamrial> he said it is there
[20:39:42 CET] <iive> oh, so he can give an url
[20:41:43 CET] <durandal_1707> iive: url cost 5$
[20:42:06 CET] <michaelni> jamrial, "if (!float_near_abs_eps(cdst[i], odst[i], 1000*FLT_EPSILON)) {" seems to work on mingw32; 400* fails on the first try
[20:42:57 CET] <durandal_1707> you should use 1/32768 and +1/-1 when converted to int16
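The two tolerance ideas in this exchange (an absolute epsilon scaled from FLT_EPSILON, and a comparison after conversion to int16 that allows a difference of one step) can be sketched in C as follows. This is a minimal illustration, not the checkasm code: the helper names near_abs_eps, to_int16 and near_int16 are hypothetical, and the int16 conversion (scale by 32768, round to nearest, clip) is an assumption about what "converted to int16" means here.

```c
#include <float.h>
#include <stdint.h>

/* Hypothetical helper: absolute-tolerance comparison of two floats. */
static int near_abs_eps(float a, float b, float eps)
{
    float diff = a > b ? a - b : b - a;
    return diff < eps;
}

/* Hypothetical helper: scale a sample in [-1, 1] to int16 with clipping.
 * Assumed conversion: multiply by 32768 and round to nearest. */
static int16_t to_int16(float v)
{
    float s = v * 32768.0f;
    if (s >  32767.0f) s =  32767.0f;
    if (s < -32768.0f) s = -32768.0f;
    return (int16_t)(s + (s >= 0.0f ? 0.5f : -0.5f));
}

/* Hypothetical helper: accept outputs that differ by at most one
 * int16 step, i.e. roughly 1/32768 in the float domain. */
static int near_int16(float a, float b)
{
    int d = to_int16(a) - to_int16(b);
    return d >= -1 && d <= 1;
}
```

With these helpers a check could read near_abs_eps(cdst[i], odst[i], 1000*FLT_EPSILON), or near_int16(cdst[i], odst[i]) for the int16 variant. The second form ties the tolerance to what is audible after 16-bit output rather than to float precision.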
[20:45:27 CET] <iive> durandal_1707, why are you using unaligned movups, when you have code to align them at the start?
[20:45:47 CET] <durandal_1707> iive: i do not have aligned code
[20:46:35 CET] <durandal_1707> i mean all data is accessed unaligned
[20:47:03 CET] <durandal_1707> except for compute_cache but that one is already SIMDed by clang and I'm lazy
[20:47:12 CET] <iive> what are the valid values of len ?
[20:47:37 CET] <durandal_1707> iive: any number, even 0
[20:47:54 CET] <durandal_1707> actually for S it is >= 1
[20:48:08 CET] <durandal_1707> but that one is not useful much
[20:48:29 CET] <durandal_1707> so number can not be and is not aligned to 16
[20:50:09 CET] <iive> yes, I see you have a *ss code at the end.
[20:51:35 CET] <iive> is it possible to align the input buffers (f1/f2) or they could/should be from any position?
[20:52:08 CET] <durandal_1707> no point in aligning, you would waste extra instructions for little gain
[20:52:30 CET] <durandal_1707> this is moving windows of samples
[20:56:03 CET] <iive> that's not the case for most sse processors.
[20:56:29 CET] <iive> avx2 processors are fast with movu... i think.
[20:56:42 CET] <durandal_1707> dead processors are dead
[20:57:31 CET] <iive> it might also explain why your avx code was slower ;)
[21:00:19 CET] <durandal_1707> iive: that was another code, with also aligned data
[21:02:29 CET] <iive> are you sure this code works?
[21:02:55 CET] <durandal_1707> why do you ask? is there something obviously wrong?
[21:05:01 CET] <iive> the modification of f1/f2 pointers looks strange. you actually move them 1/4 length before the values given to the function.
[21:07:28 CET] <durandal_1707> iive: they are decreased by len*4 so instead of -S to +S it goes 0 to 2*S+1
[21:08:24 CET] <iive> you mean len/4
[21:08:30 CET] <durandal_1707> no
[21:09:00 CET] <iive> ok
[21:09:06 CET] <iive> my bad.
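The pointer adjustment discussed above can be illustrated in C. This is only a sketch under assumptions: the function names and the squared-difference body are hypothetical and not taken from the patch. It shows that moving both pointers back by S floats (S*4 bytes) lets the loop index run from 0 to 2*S inclusive (2*S+1 iterations) instead of from -S to +S, while touching the same samples.

```c
#include <stddef.h>

/* Hypothetical: sum of squared differences over a window centered
 * on f1/f2, indexed from -S to +S. */
static float ssd_centered(const float *f1, const float *f2, ptrdiff_t S)
{
    float sum = 0.0f;
    for (ptrdiff_t k = -S; k <= S; k++) {
        float d = f1[k] - f2[k];
        sum += d * d;
    }
    return sum;
}

/* Same window, but the pointers are moved back by S floats first,
 * so the index runs over 0 .. 2*S (2*S+1 samples). */
static float ssd_rebased(const float *f1, const float *f2, ptrdiff_t S)
{
    float sum = 0.0f;
    f1 -= S; /* S floats == S*4 bytes */
    f2 -= S;
    for (ptrdiff_t k = 0; k < 2 * S + 1; k++) {
        float d = f1[k] - f2[k];
        sum += d * d;
    }
    return sum;
}
```

A non-negative index that counts upward is usually easier to drive from a single register in hand-written SIMD, which is one plausible reason for the rebasing.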
[21:10:50 CET] <iive> have you tried decreasing loop0 ?
[21:11:08 CET] <durandal_1707> what?
[21:11:19 CET] <iive> starting from lenq and going to zero? 
[21:11:33 CET] <iive> this way you combine add+cmp with a sub
[21:11:43 CET] <iive> ... into a sub
[21:11:50 CET] <durandal_1707> overall net change for this function is very low, so no point in losing too much time on it
[21:14:06 CET] <iive> what does that mean? that your sse variant is not much faster than the pure C one?
[21:14:42 CET] <durandal_1707> it is faster than C and what clang autovectorization does
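The countdown loop iive suggests can be sketched in C. The function names are hypothetical and the loop body is a plain sum rather than the filter's real computation. In assembly the point is that a sub or dec on the remaining count already sets the flags the loop branch tests, so no separate cmp is needed. In C a compiler may perform this transformation on its own, so the two functions below only illustrate the shape.

```c
#include <stddef.h>

/* Counting up: loop control is an increment plus a compare with len. */
static float sum_count_up(const float *v, ptrdiff_t len)
{
    float sum = 0.0f;
    for (ptrdiff_t i = 0; i < len; i++)
        sum += v[i];
    return sum;
}

/* Counting down: the remaining count is decremented toward zero,
 * so the exit test is a comparison with zero that the decrement
 * itself can provide via the flags. */
static float sum_count_down(const float *v, ptrdiff_t len)
{
    float sum = 0.0f;
    for (ptrdiff_t n = len; n > 0; n--)
        sum += *v++;
    return sum;
}
```

Both versions visit the samples in the same order, so the floating-point result is identical; only the loop bookkeeping differs.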
[21:18:21 CET] <iive> why are you using lend, when all the other uses are lenq ?
[21:18:29 CET] <iive> aka 32 bit vs 64 bit.
[21:20:52 CET] <durandal_1707> no special reason
[21:21:28 CET] <iive> if you want to handle buffers bigger than 4GB, then use only lenq
[21:21:45 CET] <durandal_1707> no reason
[21:21:55 CET] <iive> if you do not even think of that, then it is better to always use 32 bit registers.
[21:22:20 CET] <iive> and since that code shifts, it is 1GB
[21:31:41 CET] <jamrial> iive: using lend clears the upper 32 bits of the reg. it's useful when the argument is passed from stack in x86_64, but in this case it will always be passed directly in the reg, so it's mostly cosmetic
[21:32:33 CET] <iive> jamrial, that's the problem, it would clip it
[21:33:23 CET] <jamrial> it will not, it's an int
[21:33:41 CET] <iive> ptrdiff
[21:33:44 CET] <jamrial> oh, he made it ptrdiff_t. even less of an issue then
[21:34:02 CET] <iive> isn't ptrdiff the size of a pointer?
[21:34:07 CET] <jamrial> yeah, 100% cosmetic with no real effect
[21:34:33 CET] <iive> explain
[21:35:08 CET] <iive> len is 64 bit, lend would clip it.
[21:35:56 CET] <iive> you should either go all 32 bit, or all 64 bit.
[21:35:57 CET] <durandal_1707> such high len are not supported anyway
[21:35:58 CET] <jamrial> len is an int, durandal_1707 changed it to ptrdiff_t since that simplifies simd
[21:37:01 CET] <jamrial> len will never have a value bigger than 32bits wide. it can't since ptrdiff_t on x86_32 is 32bits wide
[21:37:23 CET] <iive> and on 64 bit...
[21:37:35 CET] <jamrial> i said it will never be bigger than 32bits wide
[21:37:45 CET] <jamrial> because len is an int
[21:37:58 CET] <jamrial> and this dsp function simply treats it as ptrdiff_t
[21:38:25 CET] <iive> even so, it is shifted left, so *4
[21:38:33 CET] <jamrial> he can change it to lenq. it doesn't really matter
[21:38:56 CET] <iive> once again, if len is never going to need 64 bits, then it should be lend everywhere
[21:39:05 CET] <iive> mixing them is wrong.
[21:39:21 CET] <jamrial> no, it isn't
[21:39:30 CET] <jamrial> please, look at dozen other dsp functions
[21:39:39 CET] <durandal_1707> omg, next time i will just shut up and apply patch
[21:39:54 CET] <jamrial> it's a trick to clear the higher 32 bits of the regs when the argument is passed from stack on x86_64 windows builds
[21:40:13 CET] <jamrial> in this case it does nothing
[21:40:23 CET] <iive> jamrial, yes, but here it is not used as trick to clear the higher 32 bits
[21:40:36 CET] <jamrial> no, it isn't. it's doing nothing
[21:40:36 CET] <iive> jamrial, did you look at the code?
[21:40:46 CET] <jamrial> yes
[21:40:50 CET] <iive> it's just inconsistent.
[21:41:14 CET] <jamrial> no, it isn't
[21:41:19 CET] <iive> i've been told to always use 32 bit, unless I do need it to be 64
[21:41:24 CET] <jamrial> look at any other dsp function. pick one
[21:43:04 CET] <durandal_1707> changed lend to lenq locally
[21:43:17 CET] <iive> :)
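The lend versus lenq disagreement above comes down to two facts about x86_64: writing a 32-bit register zeroes the upper 32 bits of the full 64-bit register, and a 32-bit count shifted left by 2 wraps once it reaches 2^30. A C model of both, with hypothetical function names and plain integer casts standing in for register writes:

```c
#include <stdint.h>

/* Models writing a value through the 32-bit view of a 64-bit register:
 * the low 32 bits are kept and the upper 32 bits become zero. */
static uint64_t write_d_view(uint64_t reg)
{
    return (uint32_t)reg;
}

/* Byte offset for len floats computed in 32 bits: wraps at len >= 2^30. */
static uint32_t byte_offset_32(uint32_t len)
{
    return len << 2;
}

/* The same offset computed in 64 bits does not wrap for such values. */
static uint64_t byte_offset_64(uint64_t len)
{
    return len << 2;
}
```

For a len that fits in an int and arrives in a register with clean upper bits, write_d_view returns the value unchanged, which matches jamrial's point that the d form has no effect here. Stale upper bits, as can happen when the argument is loaded from the stack, are discarded, and a value that genuinely needs more than 32 bits is clipped, which is iive's concern.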
[21:53:57 CET] <cone-518> ffmpeg Paul B Mahol master:dcae5ba322fc: avfilter: add anlmdn filter x86 SIMD optimizations
[21:53:58 CET] <cone-518> ffmpeg Paul B Mahol master:395e8a53fa02: avfilter/af_anlmdn: use lut table to calculate weights
[22:03:33 CET] <iive> one final thing, there was something where cpu's combine cmp with jxx , but it had some requirement for the conditional jump...
[22:03:47 CET] <iive> basically saved one uop.
[22:12:28 CET] <iive> (((x<<1)+1)<<2) should be the same as (x<<3+4)
[22:15:20 CET] <iive> or rather ((x<<3)+4)
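The identity iive states, and the reason for the correction, can be checked with a few lines of C. The function names are hypothetical. In C, + binds tighter than <<, so x<<3+4 parses as x<<(3+4), which is why the parenthesized form is the one that matches.

```c
#include <stdint.h>

/* Two shifts and an add: ((x << 1) + 1) << 2 == 8*x + 4. */
static uint32_t two_step(uint32_t x)
{
    return ((x << 1) + 1) << 2;
}

/* One shift and an add: (x << 3) + 4 == 8*x + 4. */
static uint32_t one_step(uint32_t x)
{
    return (x << 3) + 4;
}

/* What the unparenthesized x<<3+4 means in C: a shift by 7. */
static uint32_t precedence_trap(uint32_t x)
{
    return x << (3 + 4);
}
```

Because unsigned arithmetic wraps modulo 2^32, two_step and one_step agree for every 32-bit x, not only for small values.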
[00:00:00 CET] --- Fri Jan 11 2019

