[Ffmpeg-devel-irc] ffmpeg-devel.log.20170625

Mon Jun 26 03:05:03 EEST 2017

[00:07:50 CEST] <iive> atomnuker: i think your clang compiler might produce slower code :P
[00:09:11 CEST] <kierank> durandal_170: ??
[00:10:08 CEST] <kierank> Oh more review...
[00:15:44 CEST] <atomnuker> iive: sure it's crap but that's not what I'm seeing
[00:15:50 CEST] <atomnuker> it's as fast as C
[00:15:55 CEST] <atomnuker> *gcc
[00:16:31 CEST] <JEEB> well with FFmpeg we're disabling vectorization optimizations etc. anyway
[00:18:38 CEST] <jamrial> iive: that patch is really hard to read...
[00:18:41 CEST] <jamrial> so much disabled code
[00:19:56 CEST] <iive> jamrial: sorry
[00:22:19 CEST] <iive> jamrial: I've left in too many alternative methods, as I want to get some benchmark results from them on CPUs other than mine.
[00:24:48 CEST] <atomnuker> well, so far avx2 is as slow as sse2 on mine
[00:24:58 CEST] <atomnuker> avx and sse42 are the same speed
[00:25:17 CEST] <atomnuker> almost done with results though
[00:25:30 CEST] <iive> atomnuker: be sure to run the benchmarks a few times
[00:26:15 CEST] <atomnuker> 5 times enough?
[00:26:22 CEST] <iive> yes
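
Editor's note: the log does not show how the five runs were combined. A minimal sketch of the general idea, assuming the common convention of keeping the fastest run to reduce scheduling noise; `best_of` and `workload` are hypothetical names, and the workload is only a stand-in for the routine being measured.

```python
import timeit


def best_of(func, runs=5, calls_per_run=1000):
    """Time func over several runs; return (best, per_run) in seconds per call."""
    # timeit.repeat returns one total time per run, each covering calls_per_run calls.
    totals = timeit.repeat(func, repeat=runs, number=calls_per_run)
    per_run = [t / calls_per_run for t in totals]
    return min(per_run), per_run


def workload():
    # Hypothetical stand-in for the function under test.
    return sum(i * i for i in range(200))


best, runs = best_of(workload)
print(f"best {best * 1e6:.2f} us/call over {len(runs)} runs")
```

Reporting the spread of `runs` alongside the minimum makes it easier to tell a real difference between variants from noise.
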
[00:41:21 CEST] <atomnuker> iive: you've got issues
[00:41:34 CEST] <iive> what?
[00:41:35 CEST] <atomnuker> sometimes the sum doesn't match K
[00:41:49 CEST] <atomnuker> 5 times in an hour long album
[00:42:02 CEST] <iive> hum, with approx #1 ?
[00:42:30 CEST] <atomnuker> iive: https://pars.ee/temp/pvq_v2_benchmark.txt
[00:42:43 CEST] <atomnuker> I'll try disabling approximations
[00:44:02 CEST] <iive> approx #2 should never have the issue, as it handles it specially
[00:44:13 CEST] <iive> you do test the v2 patch, don't you?
[00:44:15 CEST] <atomnuker> approx #0 fixes it
[00:44:19 CEST] <atomnuker> yes, that's v2
[00:44:30 CEST] <atomnuker> I just forgot to do approx 2, wait for me to do it
[00:45:13 CEST] <atomnuker> ok, approx 2 doesn't error, now lets see its speed
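
Editor's note: the patch under discussion is x86 assembly and is not shown in the log. As a rough illustration of the invariant being checked ("the sum matches K"), here is a scalar sketch of the textbook greedy PVQ search, assuming an exact division in the presearch step. The function names are hypothetical and this is not the code from the patch.

```python
def pvq_search(x, k):
    """Greedy PVQ search: return integers y with sum(abs(y_i)) == k."""
    n = len(x)
    ax = [abs(v) for v in x]
    total = sum(ax)
    if total == 0.0:
        # Degenerate input: put every pulse in the first position.
        return [k] + [0] * (n - 1)

    # Presearch: project onto the pyramid with an exact division and
    # truncate, so the initial guess never places more than k pulses.
    scale = k / total
    y = [int(a * scale) for a in ax]

    sxy = sum(a * p for a, p in zip(ax, y))  # correlation with |x|
    syy = sum(p * p for p in y)              # energy of y

    # Place the remaining pulses one at a time, each where it
    # maximizes sxy^2 / syy after the update.
    for _ in range(k - sum(y)):
        best_i, best_num, best_den = 0, -1.0, 1.0
        for i in range(n):
            num = (sxy + ax[i]) ** 2
            den = syy + 2 * y[i] + 1
            # Cross-multiplied comparison of num/den against the best so far.
            if num * best_den > best_num * den:
                best_i, best_num, best_den = i, num, den
        sxy += ax[best_i]
        syy += 2 * y[best_i] + 1
        y[best_i] += 1

    # Restore the signs of the input.
    return [p if v >= 0 else -p for p, v in zip(y, x)]


def pulses_match(y, k):
    """The invariant from the discussion: the pulse magnitudes must sum to k."""
    return sum(abs(p) for p in y) == k
```

Because the presearch truncates, it can only undershoot K here, and the greedy loop fills the gap. An approximate reciprocal in place of the division would need its own guard against overshooting, which is the kind of failure a `pulses_match` check would catch.
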
[00:49:43 CEST] <atomnuker> iive: updated file with approx 2
[00:50:48 CEST] <atomnuker> I think you should drop everything except sse42, and drop every approximation except approx 0
[00:51:11 CEST] <atomnuker> (and assume blends are fast and horizontal stuff is slow)
[00:51:34 CEST] <iive> approx 0 uses division
[00:51:47 CEST] <iive> it is quite slow on older cpu's
[00:52:04 CEST] <iive> aka, no approximation
[00:52:16 CEST] <iive> I really hoped that the improved precision on #1 is enough.
[00:53:26 CEST] <iive> btw, could you repeat the test of "all_float_presearch" ?
[00:53:55 CEST] <iive> just to make sure you didn't leave "presearch_rounding" at 0
[00:54:57 CEST] <atomnuker> iive: APPROX 0: 0.000000, APPROX 1: 2.866339, APPROX 2: 7.077843
[00:55:09 CEST] <atomnuker> total distortion over 1 minute
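
Editor's note: the log does not say how these figures were computed. APPROX 0 scoring exactly zero suggests, though nothing here confirms it, that each variant is compared against the exact variant's output. A minimal sketch under that assumption, with hypothetical function names:

```python
import math


def normalized(v):
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else list(v)


def accumulated_distortion(reference_frames, variant_frames):
    """Sum of squared differences between unit-normalized outputs, over all frames."""
    total = 0.0
    for ref, var in zip(reference_frames, variant_frames):
        r, t = normalized(ref), normalized(var)
        total += sum((a - b) ** 2 for a, b in zip(r, t))
    return total
```

Under this definition the reference variant always scores zero against itself, and any variant that picks a different pulse vector for some frame accumulates a positive total.
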
[00:55:48 CEST] <atomnuker> iive: presearch rounding was _off_
[00:55:55 CEST] <atomnuker> I mean on
[00:56:02 CEST] <atomnuker> I did it right and I'm not doing it again
[00:57:17 CEST] <iive> ok. well... that's quite surprising, since this code avoids two cvt instructions and replaces them with the special "roundps" instruction
[00:58:11 CEST] <atomnuker> iive: tested it again, there is _no_ difference, its still slow
[00:58:27 CEST] <atomnuker> (and made very very sure presearch rounding is on)
[00:59:30 CEST] <iive> maybe i've swapped the rounding parameters
[00:59:46 CEST] <iive> would you test with presearch rounding set 0 and all_floats 1
[01:00:35 CEST] <atomnuker> iive: 1931 decicycles in pvq_search
[01:00:42 CEST] <atomnuker> it's still slower
[01:01:03 CEST] <iive> yeah, I've made a mistake.
[01:01:03 CEST] <atomnuker> repeated twice, 1926 decicycles now, it's slow
[01:01:47 CEST] <iive> i kept the allfloat_presearch, because it allows avx1 with 256 regs...
[01:02:33 CEST] <atomnuker> 256 bit regs don't help
[01:02:50 CEST] <atomnuker> you can always add a patch to modify it later to add support for 256bit regs though
[01:04:13 CEST] <iive> avx1 is more of a hack
[01:04:40 CEST] <iive> that is, using float registers to hold integer values.... we don't need it.
[01:05:10 CEST] <iive> i mean, also using float ops on registers that hold integers.
[01:14:14 CEST] <cone-014> ffmpeg Michael Niedermayer master:63e7bfe78e6d: avcodec/hevc_ps: Fix max_dec_buffer check
[01:14:15 CEST] <cone-014> ffmpeg Michael Niedermayer master:73ea2a028e12: avcodec/wavpack: Fix integer overflow in wv_unpack_stereo()
[01:22:37 CEST] <iive> atomnuker: how do you get the "total distortion over 1 minute"?
[01:24:33 CEST] <atomnuker> iive: really, the approximations are unacceptable
[01:24:45 CEST] Action: kierank wonders when atomnuker will learn to put iive on ignore
[01:25:05 CEST] <atomnuker> sacrificing accuracy for performance on stone-age CPUs isn't a good tradeoff
[01:25:33 CEST] <atomnuker> kierank: I never ignore anyone, ever
[01:31:55 CEST] <iive> atomnuker: well, you are aware that neither opus, nor your improved C version are providing the optimal distortion.
[01:45:43 CEST] <iive> atomnuker: I don't see benchmarks with stall_write_forwarding and short_syy_update 
[01:55:36 CEST] <atomnuker> iive: https://pars.ee/temp/pvq_bench_diff_accuracy.diff
[01:56:01 CEST] <atomnuker> iive: nevertheless, you don't see libopus decreasing their accuracy because they can get more speed out of a pentium 2
[01:58:56 CEST] <atomnuker> since there's nothing to gain by reducing accuracy on anything newer than 12 years old AND it's still faster than C on anything older than that, there's no point in approximations
[02:02:24 CEST] <iive> actually they do use approximation #2 in their intrinsics code.
[02:03:20 CEST] <iive> but I'm not sure how they handle the padding... i'll have to look.
[02:04:54 CEST] <atomnuker> so I guess their code needlessly uses the approximation
[02:05:48 CEST] <atomnuker> its not better, its worse
[03:02:50 CEST] <iive> actually, when you told me that approx #1 is not precise enough and still writes in the pad, I was thinking that maybe there are specially crafted values for X and Y for which the computations are always off.
[03:03:13 CEST] <iive> seems like opus intrinsic does use just that.
[03:06:39 CEST] <jamrial> michaelni: 933aa91e31 broke fate-hevc-conformance-ENTP_C_Qualcomm_1 with THREADS=7 THREAD_TYPE=slice
[03:07:28 CEST] <jamrial> the checksum changes between runs
[03:17:22 CEST] <iive> i'll be going off.
[03:17:25 CEST] <iive> n8 ppl
[03:32:04 CEST] <kierank> wm4: was ffmpeg-security worth it in the end?
[04:52:21 CEST] <cone-893> ffmpeg Michael Niedermayer master:89f8bff7983f: avcodec/hevcdec: Do not check the first ff_init_cabac_decoder() call in hls_decode_entry_wpp() for failure
[12:37:31 CEST] <cone-121> ffmpeg Paul B Mahol master:f269a1e0b881: avfilter/vf_overlay: separate functions with main alpha
[12:43:51 CEST] <wm4> <kierank> wm4: was ffmpeg-security worth it in the end? <- I "caught" a certain someone pushing a patch to git without going through review, and since something similar was posted to ffmpeg-security, that was probably the justification
[12:44:24 CEST] <wm4> at least that's what I thought at the time
[13:16:44 CEST] <cone-121> ffmpeg Paul B Mahol master:8a14374ab374: avfilter/vf_waveform: allow alpha output for >8 depth planar rgb inputs
[16:46:42 CEST] <cone-461> ffmpeg Paul B Mahol master:22a03c29006e: avfilter/vf_blend: add extremity blend mode
[17:44:30 CEST] <durandal_1707> atomnuker: are you near finishing noisereduce?
[19:59:20 CEST] <atomnuker> durandal_1707: I promise to finish it soon
[19:59:53 CEST] <atomnuker> so are you planning on dropping prores encoders other than prores_kostya?
[20:00:24 CEST] <durandal_1707> yes
[20:01:30 CEST] <durandal_1707> the other one is very minimal in features
[20:02:35 CEST] <atomnuker> I know what happened
[20:03:04 CEST] <atomnuker> "duuude im writing a prores encoder this new format by apple its gonna be huge it goes inside mp4 and quicktime opens it"
[20:03:45 CEST] <atomnuker> "oh man finished writing intra frame support, this stuff was bscly jpeg, so easy, i bet inter is just like h264"
[20:04:20 CEST] <atomnuker> "wait, there's no inter... screw this, im working on some game codecs next time"
[21:08:46 CEST] <durandal_1707> michaelni: the encoders that try to use strings in options crash frame thread encoder
[21:20:33 CEST] <michaelni> durandal_1707,  do you have a testcase / command line to reproduce this ?
[21:21:52 CEST] <durandal_1707> michaelni: see my old patch for prores_ks frame threading encoder
[21:22:36 CEST] <durandal_1707> michaelni: basically add a non-null string option to a frame encoder like utvideoenc
[21:23:00 CEST] <durandal_1707> and then encode with frame encoding
[21:23:56 CEST] <durandal_1707> at eof it will crash
[21:28:29 CEST] <durandal_1707> michaelni: have you reproduced it?
[23:27:56 CEST] <philipl_> jkqxz: Thanks for the feedback. I've posted an updated patch set.
[23:47:36 CEST] <iive> https://lists.debian.org/debian-devel/2017/06/msg00308.html small microcode bug in skylake & kabylake cpu's
[23:58:46 CEST] <philipl> awesome
[23:59:13 CEST] <nevcairiel> better don't read the list of errata for any given processor, they are very long =p
[00:00:00 CEST] --- Mon Jun 26 2017