[Ffmpeg-devel-irc] ffmpeg-devel.log.20170404

Wed Apr 5 03:05:03 EEST 2017

[00:10:30 CEST] <atomnuker> BBB: did chromium start using ffvp9 too?
[00:10:42 CEST] <BBB> don't think so?
[00:10:56 CEST] <BBB> but I don't keep close track
[00:10:59 CEST] <BBB> why?
[00:11:36 CEST] <atomnuker> https://chromium.googlesource.com/chromium/third_party/ffmpeg/+/ffvp9-staging
[00:11:58 CEST] <atomnuker> oh, 2015, I guess it was just for testing
[00:12:21 CEST] <atomnuker> was their excuse that it was too big?
[00:12:24 CEST] <BBB> yes
[00:12:33 CEST] <nevcairiel> dont they use ffmpeg for other things already
[00:12:37 CEST] <nevcairiel> would it really be that bad
[00:12:41 CEST] <BBB> I believe I was given an arbitrary limit of about 200kb
[00:12:53 CEST] <BBB> which is insane, even ffvp8 is bigger than that
[00:12:55 CEST] <nevcairiel> vpx is smaller than that?
[00:13:10 CEST] <BBB> no, libvpx decoder is about the same size as ffvp9
[00:13:15 CEST] <BBB> but it's already there
[00:13:18 CEST] <nevcairiel> so its entirely insane? :)
[00:13:20 CEST] <BBB> so it wasn't exposed to that limit
[00:13:33 CEST] <BBB> they implemented that limit at some random point to prevent chrome from becoming bloatware
[00:13:34 CEST] <kierank> to play devils advocate don't they need libvpx for webrtc
[00:13:36 CEST] <BBB> which is probably a good thing
[00:13:38 CEST] <nevcairiel> well requiring a (better) replacement to also be significantly smaller is just stupid
[00:13:51 CEST] <atomnuker> kierank: they don't? I thought it mandated vp8
[00:13:51 CEST] <BBB> kierank: yes indeed
[00:13:56 CEST] <BBB> it uses both
[00:14:02 CEST] <kierank> atomnuker: sure but chrome can negotiate vp9
[00:14:16 CEST] <atomnuker> so why wouldn't they need it?
[00:14:34 CEST] <kierank> webrtc is coupled into libvpx somehow iirc
[00:15:49 CEST] <kierank> but yeah it should be vp9 in an ideal world
[00:15:52 CEST] <kierank> ffvp9
[00:16:08 CEST] <kierank> anyone else got spammed by bitmovin?
[00:16:21 CEST] <nevcairiel> some  time ago, but not recently
[00:22:38 CEST] <TD-Linux> they would still need the libvpx encoder of course
[00:23:01 CEST] <TD-Linux> dunno what the incremental size of just enabling the decoder is
[00:25:07 CEST] <kierank> ah that was it, the encoder
[00:25:12 CEST] <kierank> forgot about that
[01:41:24 CEST] <zipwax> I suppose this is the wrong forum to ask a "build" issue?
[01:44:37 CEST] <jamrial> zipwax: yes, ask in #ffmpeg
[03:46:05 CEST] <cone-878> ffmpeg 03Michael Niedermayer 07master:39ee3ddff87a: avformat/mov: Check creation_time for overflow
[03:46:05 CEST] <cone-878> ffmpeg 03Thomas Turner 07master:dc1a1b8bd795: tests/fate/filter-video: add owdenoise test
[06:27:07 CEST] <jamrial> 76d8c77430e9e0110623705bfb54d922cc2ac3ea broke all msvc versions
[06:28:14 CEST] <jamrial> apparently, including stdatomic.h (for these compilers, that means the win32 wrapper in the compat folder) that early fucked up the include order of windows specific headers
[06:36:13 CEST] <jamrial> nevcairiel: ^
[06:36:25 CEST] <jamrial> since you're the only one who can test a fix :p
[06:41:32 CEST] <wm4> I really wouldn't have expected that to break...
[08:28:31 CEST] <cone-441> ffmpeg 03wm4 07master:2a88ebd096f3: ffprobe: port to new decode API
[09:16:21 CEST] <cone-441> ffmpeg 03Nicolas George 07master:0c20f9fcab41: doc/muxers: fix default value for image2 option start_number.
[09:34:11 CEST] <mateo`> ubitux: can i push ? (are you in the middle of a merge ?)
[09:36:37 CEST] <cone-441> ffmpeg 03Hendrik Leppkes 07master:9ac1e8843653: stdatomic/win32: only include the lean windows headers to avoid conflicts
[09:39:08 CEST] <wm4> nevcairiel: won't that break even worse
[09:39:22 CEST] <wm4> if something needs full windows headers, it'd had to include them before stdatomic
[09:39:28 CEST] <nevcairiel> nah, the extra headers that it otherwise includes aren't typically used by anything
[09:39:50 CEST] <nevcairiel> and they can always include them explicitly even after
[09:40:04 CEST] <nevcairiel> the solution for the winsock issue that broke ffmpeg.c is one of two things:
[09:40:17 CEST] <wm4> (why did it break anyway)
[09:40:39 CEST] <nevcairiel> 1) include winsock2.h before windows.h (ie. before stdatomic in this case), or 2) set LEAN_AND_MEAN and include winsock2.h explicitly after windows.h - which is what we now do
[09:41:01 CEST] <nevcairiel> we already set LEAN_AND_MEAN in the pthreads wrapper
[09:41:10 CEST] <nevcairiel> should possibly set it everywhere just for consistency
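The second option described above can be sketched as the include fragment below. The macro's full name in the Windows SDK is WIN32_LEAN_AND_MEAN; the `#ifdef _WIN32` guard is an assumption added here so the fragment is harmless on other platforms, and this is not a copy of what the stdatomic wrapper commit actually contains.

```c
/* Sketch of "set LEAN_AND_MEAN and include winsock2.h explicitly after
 * windows.h". With WIN32_LEAN_AND_MEAN defined, windows.h does not pull in
 * the older winsock.h, so winsock2.h can follow it without conflicts. */
#ifdef _WIN32
#  ifndef WIN32_LEAN_AND_MEAN
#    define WIN32_LEAN_AND_MEAN
#  endif
#  include <windows.h>
#  include <winsock2.h>   /* included explicitly, after windows.h */
#endif
```

The first option would instead place the winsock2.h include before anything that includes windows.h.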
[09:42:12 CEST] <nevcairiel> i have actually never worked on any project  where one couldnt use lean and mean
[09:42:30 CEST] <nevcairiel> windows.h includes so much shit if you let it
[09:42:36 CEST] <ubitux> mateo`: you can push
[09:45:04 CEST] <wm4> nevcairiel: sounds sane then
[09:51:06 CEST] <cone-441> ffmpeg 03Matthieu Bouron 07master:6ffaf90b32e4: lavc/mediacodecdec: switch to AV_CODEC_CAP_DELAY
[09:51:07 CEST] <cone-441> ffmpeg 03Matthieu Bouron 07master:3fce174d4f08: lavc/mediacodecdec: set AV_CODEC_CAP_AVOID_PROBING capability
[09:57:15 CEST] <cone-441> ffmpeg 03Clément Bœsch 07master:8c5c6871ba67: lavc: add AV_ prefix to CODEC_CAP_DELAY in doxy
[11:16:11 CEST] <cone-441> ffmpeg 03Clément Bœsch 07master:34ec327f693a: examples/decode_audio: reduce the scope of 2 variables
[11:27:05 CEST] <cone-441> ffmpeg 03Anton Khirnov 07master:0946c754d99c: examples/decode_audio: use a parser for splitting the input
[11:27:06 CEST] <cone-441> ffmpeg 03Clément Bœsch 07master:87e16e2b4458: Merge commit '0946c754d99c05413e813ee515039adcf0f9232a'
[11:29:13 CEST] <atana> michaelni, ping
[11:31:36 CEST] <michaelni> atana, pong
[11:31:38 CEST] <cone-441> ffmpeg 03Anton Khirnov 07master:3d66717f7cb5: examples/decode_audio: use the new audio decoding API
[11:31:39 CEST] <cone-441> ffmpeg 03Clément Bœsch 07master:3d12d106775a: Merge commit '3d66717f7cb5555257244be8f5bce172ed3af7ac'
[11:32:17 CEST] <atana> michaelni, how to add lookup and add option? I have updated the code but it has some problem
[11:34:00 CEST] <atana> should I define FLAGS ?
[11:36:06 CEST] <michaelni> atana, the line with OFFSET(mode) must stay otherwise the avoption code would not know where to put the value
[11:36:16 CEST] <cone-441> ffmpeg 03Anton Khirnov 07master:45a1ce2ff768: examples/decode_audio: handle planar audio now produced by the MP2 decoder
[11:36:17 CEST] <cone-441> ffmpeg 03Clément Bœsch 07master:28fd79c9db71: Merge commit '45a1ce2ff7688656aacd53c27de5815a7ec13afe'
[11:36:33 CEST] <atana> michaelni, I see
[11:37:51 CEST] <michaelni> atana, yes FLAGS should be there too
[11:38:45 CEST] <atana> michaelni, what should be FLAGS token ?
[11:38:45 CEST] <cone-441> ffmpeg 03Anton Khirnov 07master:9a38184a143a: examples/decode_audio: allocate the packet dynamically
[11:38:46 CEST] <cone-441> ffmpeg 03Clément Bœsch 07master:d1105e8f436e: Merge commit '9a38184a143a1560814b084aebe628f8df46e666'
[11:39:22 CEST] <michaelni> atana, #define FLAGS AV_OPT_FLAG_AUDIO_PARAM|AV_OPT_FLAG_FILTERING_PARAM
[11:39:47 CEST] <atana> michaelni, what do these flags tell?
[11:40:09 CEST] <cone-441> ffmpeg 03Anton Khirnov 07master:fee0f1de2c6a: examples/decode_audio: flush the decoder
[11:40:10 CEST] <cone-441> ffmpeg 03Clément Bœsch 07master:dd0113da3cff: Merge commit 'fee0f1de2c6a9924acb74013436dbea8f2bd1ecb'
[11:40:52 CEST] <michaelni> atana, that the option is for audio filtering. Filters could have also options that are for video or subtitles. In peakpoints its all for audio
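Why the OFFSET(mode) entry has to stay can be illustrated with a small self-contained model of an option table. Everything here (DemoContext, DemoOption, DEMO_FLAG_*, demo_set_int, the default of 32) is hypothetical and only mimics the idea; it is not the libavutil AVOption API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real AV_OPT_FLAG_* bits. */
#define DEMO_FLAG_AUDIO_PARAM     (1 << 0)
#define DEMO_FLAG_FILTERING_PARAM (1 << 1)

typedef struct DemoContext {
    int mode;
    int peaks_per_window;
} DemoContext;

typedef struct DemoOption {
    const char *name;
    size_t      offset;      /* where the value lives inside DemoContext */
    int         default_val;
    int         flags;       /* what kind of option this is */
} DemoOption;

#define OFFSET(x) offsetof(DemoContext, x)
#define FLAGS (DEMO_FLAG_AUDIO_PARAM | DEMO_FLAG_FILTERING_PARAM)

static const DemoOption demo_options[] = {
    { "mode",             OFFSET(mode),             0,  FLAGS },
    { "peaks_per_window", OFFSET(peaks_per_window), 32, FLAGS },
    { NULL, 0, 0, 0 },
};

/* Store val into the field named by key. Returns 0 on success, -1 if the
 * key is not in the table. The table offset is the only way the generic
 * code knows which struct member to write. */
static int demo_set_int(DemoContext *ctx, const char *key, int val)
{
    assert(ctx && key);
    for (const DemoOption *o = demo_options; o->name; o++) {
        if (!strcmp(o->name, key)) {
            memcpy((char *)ctx + o->offset, &val, sizeof(val));
            return 0;
        }
    }
    return -1;
}
```

Without the table row carrying the offset, a generic setter like demo_set_int has no way to reach the field, which is the point made about the real option code.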
[11:44:22 CEST] <cone-441> ffmpeg 03Anton Khirnov 07master:5f102a955909: examples/encode_video: switch to the new encoding API
[11:44:23 CEST] <cone-441> ffmpeg 03Clément Bœsch 07master:4ea942f2ceaa: Merge commit '5f102a9559099429826e84758b8b5182244c52db'
[11:47:37 CEST] <cone-441> ffmpeg 03Anton Khirnov 07master:59ab9e8ba1df: examples/encode_video: allocate the packet dynamically
[11:47:38 CEST] <cone-441> ffmpeg 03Clément Bœsch 07master:6db36a022765: Merge commit '59ab9e8ba1df7e3347a4cd2bd56c32e74aede802'
[12:04:41 CEST] <atana> michaelni, repo updated
[12:10:08 CEST] <michaelni> atana, ok, good, next i would suggest to make the number of peaks searched and looked up per window a parameter too
[12:10:16 CEST] <michaelni> that value that was 8 and is 32 now
[12:10:49 CEST] <cone-441> ffmpeg 03Matthieu Bouron 07master:82116bd8a45c: doc/examples/extract_mvs: switch to new decoding API
[12:10:50 CEST] <cone-441> ffmpeg 03Matthieu Bouron 07master:1cf93196fc69: doc/examples/extract_mvs: make pkt local to the main function
[12:10:51 CEST] <cone-441> ffmpeg 03Matthieu Bouron 07master:400378b7b3a4: doc/examples/extract_mvs: re-indent after previous commit
[12:11:51 CEST] <atana> michaelni, ok. btw did you send the final feedback to marina?
[12:14:37 CEST] <michaelni> atana, not yet, marina didnt ask me yet
[12:14:41 CEST] <michaelni> maybe she forgot
[12:17:10 CEST] <atana> according to her last mail last date was supposed to be 24th March. Maybe you should talk to her.
[12:18:35 CEST] <atana> michaelni,  any name suggestion  for number of peaks? 
[12:23:30 CEST] <michaelni> atana, hmm, peaks_per_window maybe
[12:23:47 CEST] <michaelni> atana, yes, ill talk with her
[13:53:24 CEST] <atana> michaelni, added option for peak points and updated repo. now it always says match song id as 0.
[13:58:31 CEST] <michaelni> atana, sizeof(buff) is the size of a pointer not the array if its dynamically allocated as with *malloc*
[13:59:01 CEST] <michaelni> atana, same with sizeof(entry)
[14:15:38 CEST] <atana> michaelni, I have used sizeof(*buff)
[14:20:05 CEST] <michaelni> atana, you need to use peaks_per_window
[14:20:33 CEST] <michaelni> or something calculated from it
[14:21:02 CEST] <atana> michaelni,  no dynamic allocation you mean?
[14:21:32 CEST] <michaelni> dynamic allocation is ok but you cannot get the size of dynamic allocation with sizeof
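The sizeof point can be sketched as follows: for a malloc'ed buffer, sizeof on the pointer gives the pointer size and sizeof on the dereferenced pointer gives one element, so the element count has to be stored separately. PeakBuffer and its helpers are hypothetical names for illustration, not code from atana's repository.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical buffer type: the element count travels with the pointer. */
typedef struct PeakBuffer {
    int32_t *data;
    size_t   nb_elems;
} PeakBuffer;

/* Allocate room for peaks_per_window entries. Returns 0 on success. */
static int peak_buffer_alloc(PeakBuffer *b, size_t peaks_per_window)
{
    assert(b && peaks_per_window > 0);
    b->data = calloc(peaks_per_window, sizeof(*b->data));
    if (!b->data) {
        b->nb_elems = 0;
        return -1;
    }
    b->nb_elems = peaks_per_window;
    return 0;
}

/* Size of the allocation in bytes, computed from the stored count.
 * sizeof(b->data) would only be the size of the pointer itself, and
 * sizeof(*b->data) only the size of a single element. */
static size_t peak_buffer_bytes(const PeakBuffer *b)
{
    return b->nb_elems * sizeof(*b->data);
}

static void peak_buffer_free(PeakBuffer *b)
{
    free(b->data);
    b->data = NULL;
    b->nb_elems = 0;
}
```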
[14:34:26 CEST] <J_Darnley> Are there any docs which describe swscale?  Something that describes data layout, where the functions are?  I've been tasked with adding some AVX2 (but not a specific area yet).
[14:34:40 CEST] <J_Darnley> Should I look at older commits, like ones that added AVX?
[14:53:09 CEST] <nevcairiel> swscale is neither documented nor is the code very readable, so .. good luck? :)
[14:54:04 CEST] <nevcairiel> although taking some mmx/sse2 functions and avx2'ing them should not be that terrible
[15:06:12 CEST] <wm4> <J_Darnley> Are there any docs which describe swscale?  <- yes, it's called "ask michael"
[15:07:09 CEST] <BBB> J_Darnley: you can ask me
[15:07:14 CEST] <BBB> J_Darnley: I'm fairly familiar with sws
[15:07:39 CEST] <BBB> J_Darnley: the danger with sws is that data is not guaranteed to be aligned, or even padded
[15:07:56 CEST] <BBB> J_Darnley: that makes any simd - avx2 as much as any other - difficult because you need to deal with corner cases
[15:08:04 CEST] <BBB> J_Darnley: other than that, most simd should be trivial
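The corner cases mentioned for unaligned, unpadded rows usually come down to a main loop over whole blocks plus a scalar tail. The sketch below models that shape in plain C, with a 16-byte block standing in for one SIMD register; avg_row and BLOCK are hypothetical and not taken from swscale.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16  /* stand-in for one SIMD register's worth of bytes */

/* dst[i] = (a[i] + b[i] + 1) >> 1 for w pixels.
 * No alignment and no padding past w is assumed for any pointer. */
static void avg_row(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w)
{
    int x = 0;

    assert(w >= 0);
    /* Main loop: whole blocks only, staged through local arrays so that
     * no aligned load or store on the caller's buffers is required. */
    for (; x + BLOCK <= w; x += BLOCK) {
        uint8_t va[BLOCK], vb[BLOCK];
        memcpy(va, a + x, BLOCK);
        memcpy(vb, b + x, BLOCK);
        for (int i = 0; i < BLOCK; i++)
            va[i] = (uint8_t)((va[i] + vb[i] + 1) >> 1);
        memcpy(dst + x, va, BLOCK);
    }
    /* Tail: fewer than BLOCK pixels remain, so handle them one by one
     * and never read or write past index w - 1. */
    for (; x < w; x++)
        dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
}
```

A real SIMD version would replace the inner block loop with vector instructions; the tail handling is the part that stays awkward.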
[15:08:12 CEST] <BBB> who's familiar with dnxhr?
[15:10:02 CEST] <BBB> also, if you've ever wanted to see a total disgusting hack
[15:10:06 CEST] <BBB> check our ML right now 8)
[15:10:24 CEST] <BBB> I'm very happy to take total credit for that one
[15:11:09 CEST] <nevcairiel> these pointers should really be removed
[15:11:29 CEST] <BBB> I agree
[15:11:35 CEST] <BBB> but I don't want to burn my fingers on that
[15:11:58 CEST] <BBB> there's some performance implications related to very very very old simd code for very very very strange archs
[15:12:09 CEST] <BBB> and I just totally don't want to deal with that
[15:14:56 CEST] <atomnuker> BBB: the get_buf dnxhd patch looks okay, not sure about the atomics patch though
[15:15:17 CEST] <BBB> hehehe
[15:15:21 CEST] <BBB> I said it's a total hack
[15:15:26 CEST] <atomnuker> wouldn't it be better to not have those pointers as global but as part of a context?
[15:15:26 CEST] <BBB> I even included a 8) smiley
[15:15:43 CEST] <BBB> I don't think we should use function pointers like that at all
[15:15:54 CEST] <BBB> the use is fortunately fairly limited
[15:16:03 CEST] <BBB> but as an example, we use it in simple_idct x86
[15:16:35 CEST] <BBB> afaics, it allows using a sse2 put_pixels/add_pixels together with an otherwise plain mmx/mmxext idct
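The alternative raised above, keeping such pointers in a context rather than in globals, can be sketched like this. DemoIDCTDSP and demo_idctdsp_init are hypothetical names; the real libavcodec structures and init functions differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical DSP context: the function pointer lives per context
 * instead of in a global variable shared by all threads. */
typedef struct DemoIDCTDSP {
    void (*put_pixels_clamped)(const int16_t *block, uint8_t *pixels,
                               int stride);
} DemoIDCTDSP;

/* Plain C reference: clamp an 8x8 block of coefficients to 0..255. */
static void put_pixels_clamped_c(const int16_t *block, uint8_t *pixels,
                                 int stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int v = block[y * 8 + x];
            pixels[x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
        pixels += stride;
    }
}

static void demo_idctdsp_init(DemoIDCTDSP *c)
{
    assert(c);
    c->put_pixels_clamped = put_pixels_clamped_c;
    /* A real init would overwrite the pointer with an optimized version
     * here after checking CPU flags. */
}
```

Because each context owns its pointer, initializing one context never writes to memory another thread may be reading, which is the kind of write a thread sanitizer flags on globals.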
[15:16:58 CEST] <BBB> now, those of us that have a brain would immediately object "but Ronald, why don't we use a sse2 idct?"
[15:17:02 CEST] <nevcairiel> its some micro-optimization for some obscure idct
[15:17:08 CEST] <BBB> and I would give you a medal for trying!
[15:17:27 CEST] <BBB> but someone needs to actually write it and I don't feel like optimizing stuff that is of the pre-h264 era
[15:17:41 CEST] <BBB> I mean, h264 is already old, and this cruft is from generations before that
[15:17:45 CEST] <BBB> I really can't be bothered to touch it
[15:18:09 CEST] <BBB> J_Darnley: hey, want to write a sse2 simple_idct? ;-)
[15:18:16 CEST] <BBB> </evil>
[15:18:23 CEST] <BBB> maybe we can bother gramner to do it
[15:19:30 CEST] <atomnuker> be a man, make a difference, write some good PVQ search SIMD
[15:20:36 CEST] <atomnuker> or you could try to beat bofh_ in writing a very uncomfortably simdable 15 point fft
[15:21:18 CEST] <BBB> I made a vp9 encoder, is that ok?
[15:22:01 CEST] <atomnuker> scalar quantization? blargh, a thing of the past, a fossil in the history of video encoding
[15:22:39 CEST] <iive> atomnuker: do you have a nice explanation of what PVQ is?
[15:22:44 CEST] <atomnuker> and sadly something we might be forced to continue living with
[15:22:52 CEST] <BBB> it totally has vector quantization
[15:22:57 CEST] <BBB> the vector just has a very small size
[15:23:40 CEST] <atomnuker> PVQ vectors can be hundreds of coefficients large, and the quantizers can be arbitrarily small or large
[15:24:42 CEST] <BBB> that sounds wonderful
[15:24:55 CEST] <nevcairiel> the magic of opus
[15:25:07 CEST] <BBB> I hope the search space does not exponentially increase with number of coefficients
[15:25:13 CEST] <BBB> that would be scary
[15:25:36 CEST] <atomnuker> iive: it exploits l2 normalized vectors (which sum up to 1) by making an integer vector which sums up to the quantizer (K) and which when normalized approximates your input
[15:26:33 CEST] <atomnuker> BBB: linear for the pre-search, after that depending on how good your pre-approximation is its exponential
[15:26:55 CEST] <atomnuker> (well, N^2)
[15:27:04 CEST] <BBB> quadratic
[15:27:26 CEST] <atomnuker> though its usually fast enough to ignore and you can SIMD both parts
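The two stages described above (a linear pre-search followed by a refinement search) can be sketched as below. An L2-normalized vector has unit Euclidean norm, while the integer codeword has absolute values summing to K. This is a simplified greedy search written for illustration under those assumptions, not the Opus or Daala implementation; pvq_search and absf are hypothetical names.

```c
#include <assert.h>

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Quantize x[0..n-1] to integers y with sum(|y[i]|) == k.
 * Returns the squared L2 norm of y, which a decoder would use to
 * rescale y back to unit norm. */
static int pvq_search(const float *x, int *y, int n, int k)
{
    float l1 = 0.0f, xy = 0.0f;   /* xy: correlation of |x| with y */
    int pulses = 0, yy = 0;       /* yy: energy of y */

    assert(x && y && n > 0 && k > 0);
    for (int i = 0; i < n; i++)
        l1 += absf(x[i]);

    /* Pre-search: scale |x| so it sums to k and round down. */
    for (int i = 0; i < n; i++) {
        int q = l1 > 0.0f ? (int)(k * absf(x[i]) / l1) : 0;
        if (q > k - pulses)       /* guard against float rounding */
            q = k - pulses;
        y[i] = q;
        pulses += q;
        xy += absf(x[i]) * q;
        yy += q * q;
    }

    /* Refinement: place each remaining pulse where it maximizes
     * (xy + |x_i|)^2 / (yy + 2*y_i + 1), compared by cross-multiplying. */
    while (pulses < k) {
        int best = 0;
        float best_num = -1.0f, best_den = 1.0f;
        for (int i = 0; i < n; i++) {
            float num = (xy + absf(x[i])) * (xy + absf(x[i]));
            float den = (float)(yy + 2 * y[i] + 1);
            if (num * best_den > best_num * den) {
                best = i;
                best_num = num;
                best_den = den;
            }
        }
        xy += absf(x[best]);
        yy += 2 * y[best] + 1;
        y[best]++;
        pulses++;
    }

    /* Copy the signs of the input onto the codeword. */
    for (int i = 0; i < n; i++)
        if (x[i] < 0.0f)
            y[i] = -y[i];
    return yy;
}
```

The better the pre-search, the fewer pulses the inner loop has to place, which matches the remark that the cost of the second stage depends on the quality of the pre-approximation.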
[15:27:52 CEST] <iive> atomnuker: that's too much unknown terminology there. that's why I want article.  l2? is that in spacial or frequency domain? vectors in what?
[15:29:45 CEST] <atomnuker> iive: it can be either, and the output integer distribution will follow the same distribution as your input so you can entropy code it efficiently
[15:30:59 CEST] <atomnuker> though its only really been used in the frequency domain since you can have built in AQ
[15:31:04 CEST] <J_Darnley> wow, talking
[15:31:40 CEST] <atomnuker> and you can also feed in arbitrary inputs as predictors
[15:32:01 CEST] <iive> so predictors are something like samples
[15:32:10 CEST] <atomnuker> no, transforms of samples
[15:32:53 CEST] <iive> i mean, sample as in music sampling. a portion of some other melody
[15:32:53 CEST] <J_Darnley> BBB: thanks.  When I get some specifics I will ask you and/or michael.
[15:33:00 CEST] <atomnuker> (though searching those is difficult so daala had most of that capability removed)
[15:33:15 CEST] <atomnuker> iive: yeah, basically
[15:35:47 CEST] <J_Darnley> BBB: regarding simple_idct: maybe, in my spare time.  Otherwise ask kierank if he wants to pay for it.
[15:36:18 CEST] <BBB> actually...
[15:36:43 CEST] <BBB> kierank: ../libavcodec/mpegvideo.c:    ff_idctdsp_init(&s->idsp, s->avctx);
[15:36:47 CEST] <BBB> kierank: WDYT?
[15:37:03 CEST] <kierank> BBB: ???
[15:37:15 CEST] <J_Darnley> > [Tue 04 15:18] <@BBB> J_Darnley: hey, want to write a sse2 simple_idct? ;-)
[15:37:20 CEST] <BBB> kierank: is it useful for you to have a sse2 simple_idct?
[15:37:26 CEST] <BBB> it affects mpegvideo decoding
[15:37:27 CEST] <kierank> what uses simple idct?
[15:37:28 CEST] <kierank> mpeg2?
[15:37:30 CEST] <BBB> yeah
[15:37:36 CEST] <kierank> yes but not urgently
[15:37:41 CEST] <BBB> awh crap
[15:37:44 CEST] <kierank> swscale I need fast in the next two weeks, mpeg2 can wait until after
[15:37:57 CEST] <kierank> maybe something J_Darnley can work on whilst we are in vegas though
[15:38:05 CEST] <kierank> so yes
[15:38:11 CEST] <BBB> hm, vegas
[15:38:18 CEST] <BBB> I wish I had a reason to be there, then I would go also
[15:38:30 CEST] <kierank> have meetings
[15:38:31 CEST] <kierank> do synergy
[15:38:35 CEST] <kierank> thought leadership
[15:38:38 CEST] <BBB> meetings with who
[15:38:43 CEST] <J_Darnley> shift paradigms
[15:38:50 CEST] <BBB> disrupt, innovate
[15:39:01 CEST] <mateo`> jpeg also uses simple_idct
[15:39:03 CEST] <kierank> you should see the number of people that want to meet with us for no reason at all
[15:39:07 CEST] <kierank> just for the sake of having meetings
[15:39:17 CEST] <BBB> kierank: I know that, I have these type of meetings every day
[15:39:23 CEST] <BBB> I don't like that
[15:39:28 CEST] <BBB> I'd rather code
[15:39:29 CEST] <nevcairiel> its a way to feel important
[15:39:31 CEST] <nevcairiel> :)
[15:39:52 CEST] <kierank> one person once had a meeting to plan another meeting
[15:44:22 CEST] <BBB>   Previous write of size 4 at 0x7d500000437c by thread T2:
[15:44:23 CEST] <BBB>     [failed to restore the stack]
[15:44:24 CEST] <BBB> ...
[15:44:29 CEST] <BBB> I'm running into this problem a lot lately
[15:44:34 CEST] <BBB> I wonder what's going on
[15:45:32 CEST] <iive> atomnuker: does PVQ stand for Pyramid Vector Quantization?
[15:46:53 CEST] <atomnuker> yes
[15:52:35 CEST] <BBB> ahaaaaaaa
[15:53:20 CEST] <BBB> so dnxhd is actually a tsan artifact of not being able to deal with goto
[15:53:25 CEST] <BBB> how interesting
[15:53:32 CEST] <BBB> I wonder if I should file a bug
[15:53:39 CEST] <nevcairiel> ew goto
[15:53:40 CEST] <nevcairiel> :D
[15:53:46 CEST] <BBB> it's valid C
[16:00:12 CEST] <Gramner> BBB: sse2? nah, I only write avx-512 nowadays :P
[16:00:30 CEST] <BBB> I don't think avx512 helps much w/ mpeg2
[16:00:39 CEST] <nevcairiel> (or anything at this point :P)
[16:01:57 CEST] <atomnuker> Gramner: how?
[16:02:21 CEST] <atomnuker> those incredibly expensive xeon phi cards?
[16:02:32 CEST] <Gramner> nah, those kinda suck for video stuff
[16:02:47 CEST] <nevcairiel> are the Purley Xeons available yet?
[16:03:02 CEST] <Gramner> in retail? no.
[16:03:21 CEST] <Gramner> but since when has that ever stopped me before ;)
[16:03:56 CEST] <nevcairiel> its funny that there is no single CPU that supports all AVX512 subsets though
[16:04:07 CEST] <atomnuker> cheap trips to the chinese black market?
[16:04:08 CEST] <nevcairiel> guess ER/PF are only really useful for  Phi
[16:05:03 CEST] <Gramner> fast high-precision reciprocals would be useful in other stuff, it's probably just very expensive to incorporate such large lookup tables in silicon to be worth it for normal cpus with current lithography
[16:06:12 CEST] <Gramner> they did bump up the precision of the normal reciprocal and sqrt approximation instructions a bit though. from 11(?) to 14 bits
[16:07:09 CEST] <wm4> why does avx512 even exist
[16:07:14 CEST] <wm4> what is Intel doing
[16:07:22 CEST] <nevcairiel> why shouldnt it
[16:07:29 CEST] <wm4> who is helped with this situation
[16:07:30 CEST] <BBB> I like larger registers
[16:07:33 CEST] <BBB> also, more registers is nice
[16:07:36 CEST] <Gramner> gotta make new features since increasing IPC is really bloody hard
[16:07:44 CEST] <nevcairiel> 32 registers sounds nice
[16:08:12 CEST] <BBB> plus it gets rid of some laning restrictions right?
[16:09:47 CEST] <Gramner> more regs is very useful, yes. even if you're not using more than a few. no legacy xmm registers usage so no more vzeroupper requirement. also sane calling conventions on windows so no more saving every vector register on win64 (as long as you don't actually need that many regs of course)
[16:11:03 CEST] <Gramner> new dual-input shuffle instructions. e.g. arbitrary shuffles from 2 input regs to 1 output reg (just words and above initially though for those, byte versions are in CNL i think)
[16:11:39 CEST] <nevcairiel> CNL is where this gets interesting anyway
[16:12:00 CEST] <BBB> do we get vector dereference support?
[16:12:11 CEST] <Gramner> dereference?
[16:12:48 CEST] <atomnuker> BBB: why would you want that?
[16:13:02 CEST] <BBB> int something[256]; uint8_t pix[w*h]; uint64_t sum = 0; for (y=0;y<h;y++) { for (x = 0; x < w; x++) { sum += something[pix[y*w+x]]; } }
[16:13:16 CEST] <BBB> think of the possibilities
[16:13:17 CEST] <atomnuker> ah, LUTs
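The one-line loop above, written out as a compilable function, looks roughly like this. The name lut_sum and the signature are assumptions; non-negative table entries are assumed so that the conversion to uint64_t is value-preserving.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum lut[pix[i]] over a w*h image. Each load address depends on pixel
 * data, which is what makes this a table lookup (gather) pattern rather
 * than a plain contiguous vector load. */
static uint64_t lut_sum(const int lut[256], const uint8_t *pix, int w, int h)
{
    uint64_t sum = 0;

    assert(lut && pix && w >= 0 && h >= 0);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sum += (uint64_t)lut[pix[(size_t)y * w + x]];
    return sum;
}
```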
[16:13:26 CEST] <Gramner> vpternlogd is also pretty cool. it can do every possible bitwise operation from 3 input regs in one instruction
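The idea of a single instruction covering every three-input bitwise operation can be modeled in scalar C: an 8-bit immediate acts as a truth table indexed by one bit from each input. The function ternlog and the index convention (a << 2) | (b << 1) | c are this sketch's assumptions; it is a model of the concept, not an intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a three-input bitwise operation selected by an 8-bit
 * truth table. For every bit position, the bits of a, b and c form a
 * 3-bit index into imm8, and the indexed bit becomes the result bit.
 * Only the low `width` bits (1..64) are processed. */
static uint64_t ternlog(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8,
                        int width)
{
    uint64_t r = 0;

    assert(width >= 1 && width <= 64);
    for (int bit = 0; bit < width; bit++) {
        unsigned idx = (unsigned)((((a >> bit) & 1) << 2) |
                                  (((b >> bit) & 1) << 1) |
                                   ((c >> bit) & 1));
        r |= (uint64_t)((imm8 >> idx) & 1) << bit;
    }
    return r;
}
```

With this convention, 0x96 is a three-way XOR, 0xCA selects b where a is set and c elsewhere, and 0xE8 is a bitwise majority vote.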
[16:13:39 CEST] <Gramner> there are vector scatters (and conflict detection)
[16:13:51 CEST] <BBB> is there a way to get gramner to write simd?
[16:14:12 CEST] <nevcairiel> give him a cpu with a fancy new arch? :)
[16:14:19 CEST] <J_Darnley> You didn't hear about the vgramnerwrite instruction?
[16:14:29 CEST] <BBB> J_Darnley: but what does it write into?
[16:14:44 CEST] <BBB> into a register? that's probably a gmm then, not a zmm/ymm/xmm
[16:14:51 CEST] <BBB> and I don't know how to read from them
[16:15:38 CEST] <Gramner> I wonder what they will name the inevitable 1024-bit register though. gonna go with our fancy swedish letters and call it åmm
[16:15:43 CEST] <BBB> like opengl has glReadPixels, maybe we need a grReadSimd()
[16:16:42 CEST] <J_Darnley> > call it åmm <-- I hope so
[16:16:51 CEST] <J_Darnley> I want unicode support in my assembler
[16:16:59 CEST] <nevcairiel> break all the assemblers by requiring them to support unicode =p
[16:17:23 CEST] <nevcairiel> also screw everyone everywhere (except in sweden) for not having such keys
[16:17:25 CEST] <nevcairiel> :D
[16:18:08 CEST] <BBB> I HATE TSAN AAAAAAAAA
[16:18:16 CEST] <BBB> it refuses to reproduce the bug altogether now
[16:18:26 CEST] <nevcairiel> isnt that good
[16:18:29 CEST] <nevcairiel> its fixed!
[16:18:44 CEST] <BBB> ok fine
[17:29:21 CEST] <kurosu> so, about libav's bitstream reading API and ffmpeg's
[17:29:38 CEST] <kurosu> I'm not going to contribute to the discussion more than here (and not long anyway)
[17:29:50 CEST] <kurosu> libav's and ffmpeg's are very different beasts
[17:30:11 CEST] <kurosu> ffmpeg's has no benefit to be extended to 64b (tested that on those high rate codecs)
[17:30:41 CEST] <kurosu> libav's will likely break if you try restricting it to 32b (also tried)
[17:31:12 CEST] <kurosu> libav's works better because there's really a local cache and the cpu branch prediction nops a lot of work
[17:31:21 CEST] <kurosu> ffmpeg's - not really
[17:31:45 CEST] <kurosu> libav's current version on x86_32 is really slower
[17:32:04 CEST] <kurosu> mitigated when converted to use 32b regs
[17:32:13 CEST] <nevcairiel> the ffmpeg reader has a cache in those weird-to-use macros, but due to the macros being used only within one function in the typical use-case, the cache is not useful
[17:32:25 CEST] <kurosu> but bad for corner case where more than 24b are read
[17:32:31 CEST] <kurosu> exactly
[17:32:59 CEST] <kurosu> so in essence, not possible to use one or the other simply in my opinion depending on arch
[17:33:11 CEST] <nevcairiel> i was wondering if you couldn't just move the cache into the struct and actually persist it
[17:33:28 CEST] <kurosu> and each can't easily be adapted to work efficiently on some archs
[17:33:28 CEST] <nevcairiel> but its probably harder than it sounds
[17:33:47 CEST] <kurosu> ffmpeg's keeps track of the bit position
[17:33:59 CEST] <kurosu> libav reads aligned chunks of 32b
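The two reader designs being contrasted can be sketched side by side. Both structs and all function names below are hypothetical and heavily simplified: neither is the actual get_bits.h code nor the libav bitstream reader, and the refill here reads byte by byte rather than as an aligned 32-bit load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Style 1: only a bit position is stored; every read reloads the bytes
 * around that position. Valid for 1..25 bits per call. */
typedef struct IndexReader {
    const uint8_t *buf;
    size_t size_bits;
    size_t index;            /* current position in bits */
} IndexReader;

static uint32_t index_read(IndexReader *r, int n)
{
    uint32_t v = 0;
    size_t byte = r->index >> 3;
    size_t size_bytes = (r->size_bits + 7) / 8;

    assert(n >= 1 && n <= 25 && r->index + (size_t)n <= r->size_bits);
    for (size_t i = 0; i < 4; i++)   /* big-endian load, zeros past the end */
        v = (v << 8) | (byte + i < size_bytes ? r->buf[byte + i] : 0u);
    v <<= r->index & 7;              /* drop bits that were already consumed */
    r->index += (size_t)n;
    return v >> (32 - n);
}

/* Style 2: unread bits persist in a 64-bit cache inside the struct and
 * are topped up 32 bits at a time. Valid for 1..32 bits per call. */
typedef struct CacheReader {
    const uint8_t *ptr, *end;
    uint64_t cache;          /* unread bits, left-aligned */
    int bits_left;           /* number of valid bits in cache */
} CacheReader;

static void cache_refill32(CacheReader *r)
{
    uint32_t v = 0;

    for (int i = 0; i < 4; i++)
        v = (v << 8) | (r->ptr < r->end ? *r->ptr++ : 0u);
    r->cache |= (uint64_t)v << (32 - r->bits_left);
    r->bits_left += 32;
}

static uint32_t cache_read(CacheReader *r, int n)
{
    uint32_t v;

    assert(n >= 1 && n <= 32);
    if (r->bits_left < n)    /* the only branch, usually well predicted */
        cache_refill32(r);
    v = (uint32_t)(r->cache >> (64 - n));
    r->cache <<= n;
    r->bits_left -= n;
    return v;
}
```

The first style touches memory on every read but needs no refill branch; the second touches memory only on refill, which is where the 25-bit versus 32-bit read limits and the dependence on 64-bit registers come from.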
[17:34:41 CEST] <kurosu> however, libav's is easily 20% faster when it matters
[17:34:55 CEST] <kurosu> 20% as in overall codec speed, not microbenchmark
[17:35:15 CEST] <kurosu> (haswell-e btw, running mt)
[17:35:44 CEST] <kurosu> ok timeout, no longer active
[17:35:47 CEST] <kurosu> bye
[17:38:37 CEST] <jamrial> wm4: https://pastebin.com/i4KaBswM does that means unsupported? the decoder does exist
[17:40:15 CEST] <wm4> jamrial: that seems strange...
[17:40:25 CEST] <jamrial> h264 works
[17:40:32 CEST] <wm4> I think hevc should also work
[17:40:45 CEST] <jamrial> i know my gpu doesn't support hevc hwaccel, but this is not hwaccel
[17:41:11 CEST] <jamrial> happens with both the hevc main and main10 samples i tried
[17:41:12 CEST] <wm4> yes, this is a pure software decoder (it can do "hwaccel" similar to libavcodec, but higher level)
[17:41:30 CEST] <wm4> (it = MF, not the wrapper)
[17:41:54 CEST] <wm4> someone else who tried this wrapper for fun reported that hevc decoding works
[17:42:57 CEST] <RiCON> trying now again, works here
[17:43:32 CEST] <jamrial> i get the same even with the fate suite samples
[17:43:47 CEST] <RiCON> msys2 + mingw64 git x86_64 + gcc 6.1: https://i.fsbn.eu/Farj.txt
[17:44:02 CEST] <RiCON> gcc 6.3*
[17:44:04 CEST] <wm4> could depend on the demuxer
[17:44:20 CEST] <wm4> ("input type" is MF-speak for AVCodecParameters)
[17:45:30 CEST] <RiCON> mkv works too https://i.fsbn.eu/8NGo.txt
[17:46:44 CEST] <nevcairiel> maybe it wants mp4 format
[17:46:50 CEST] <nevcairiel> no likey annexb
[17:46:52 CEST] <nevcairiel> or something
[17:49:13 CEST] <jamrial> i tried mkv and raw hevc
[17:54:52 CEST] <jamrial> h264 encoding seems to work with acceleration. gpu usage jumps to ~51% during encoding at least
[17:57:00 CEST] <wm4> that actually uses a vendor-specific MFT
[18:16:46 CEST] <RiCON> jamrial: just -c:v h264_mf?
[18:17:14 CEST] <jamrial> RiCON: actually i was wrong, forgot to add -hw_encoding 1
[18:17:24 CEST] <jamrial> when i add that, it fails :p
[18:18:41 CEST] <jamrial> it fails to negotiate output format
[18:18:45 CEST] <RiCON> same, with 'nvidia {h264,hevc} encoder mft'
[18:20:03 CEST] <jamrial> amd here, only tried h264 since my card has no hevc support
[18:20:55 CEST] <RiCON> with testsrc=s=hd1080,format=nv12 it works
[18:21:12 CEST] <jamrial> both hevc and h264 encoders with in software mode, so yay
[18:21:23 CEST] <jamrial> wonder why hevc decoder fails in software, though
[18:21:44 CEST] <jamrial> work in
[18:21:59 CEST] <RiCON> hevc encoder fails here, garbage output
[18:22:34 CEST] <RiCON> the software one, hardware works :V
[18:24:36 CEST] <jamrial> ah, you're right, didn't bother to look at the output. hevc encoder in software doesn't fail but outputs garbage
[18:25:45 CEST] <jamrial> h264 encoder output in software is correct
[18:29:16 CEST] <RiCON> h264/hevc software encoders accept IYUV,YV12,NV12,YUY2, but nvidia's only accept 420O/NV12
[18:29:25 CEST] <RiCON> not sure what that 420O is
[18:30:55 CEST] <nevcairiel> msdn doesnt exactly elaborate on that format
[18:31:04 CEST] <nevcairiel> probably safer to ignore it :p
[18:32:38 CEST] <BBB> ok so I have a patch that converts all uses of ff_put/add_pixels_clamped to their respective optimized counterparts
[18:32:52 CEST] <BBB> it excludes simple_idct because I'm not touching that stuff
[18:32:56 CEST] <BBB> but I can workaround that
[18:33:02 CEST] <BBB> that's probably the best solution right?
[18:48:31 CEST] <BBB> there we go
[18:48:47 CEST] <BBB> massive patch series, but only negative impact on performance should be on mips
[18:48:59 CEST] <BBB> and other than that it gets rid of the global function pointers entirely
[18:54:31 CEST] <jamrial> BBB: 6 patches is massive? tell that to libav and their 100+ sets :p
[18:58:47 CEST] <BBB> massive for something I really don't give a crap about
[19:13:20 CEST] <kierank> BBB: how much work do you think sse2 simple_idct is for J_Darnley. hopefully enough to coincide with vegas and my week vacation
[19:13:57 CEST] <BBB> maybe good to get michaelni's input on this also
[19:14:01 CEST] <BBB> since he wrote the 8bit assembly for it
[19:14:12 CEST] <BBB> but the thing is that the 8bit version is still inline asm
[19:14:19 CEST] <BBB> so I would like to convert it to yasm at the same time
[19:14:20 CEST] <kierank> yeah and mmx
[19:14:36 CEST] <BBB> so convert-to-yasm and add-sse2 sort of go together
[19:14:45 CEST] <BBB> (see also my cavs patch just now)
[19:15:29 CEST] <BBB> converting inline asm to yasm is utterly boring
[19:44:16 CEST] <durandal_1707> is anybody gonna improve speed of our get bits?
[19:44:43 CEST] <ubitux> that would be nice, or we'll have to merge the bitstream api from libav
[19:48:29 CEST] <kierank> durandal_1707: see what kurosu wrote
[19:50:08 CEST] <cone-167> ffmpeg 03Anton Khirnov 07master:f78d360bba6d: examples/decode_video: use a parser for splitting the input
[19:50:09 CEST] <cone-167> ffmpeg 03Anton Khirnov 07master:728ea23cce07: examples/decode_video: switch to the new decoding API
[19:50:10 CEST] <cone-167> ffmpeg 03Anton Khirnov 07master:c7ab0eb3050a: examples/decode_video: allocate the packet dynamically
[19:50:11 CEST] <cone-167> ffmpeg 03James Almer 07master:fddd6af45cfd: Merge commit 'f78d360bba6dcfb585847a49a84e89c25950fbdb'
[19:50:12 CEST] <cone-167> ffmpeg 03James Almer 07master:52bce9a13ddc: Merge commit '728ea23cce07467b732f538c87c13da13dd6dcf3'
[19:50:13 CEST] <cone-167> ffmpeg 03James Almer 07master:81cc33adc68e: Merge commit 'c7ab0eb3050acdd3b8cab2c55fc9c1b2e8610a65'
[19:53:30 CEST] <cone-167> ffmpeg 03James Almer 07master:aa498c318323: avpacket: fix leak on realloc in av_packet_add_side_data()
[19:53:33 CEST] <cone-167> ffmpeg 03James Almer 07master:b20bf5584f7a: Merge commit 'aa498c3183236a93206b4a0e8225b9db0660b50d'
[20:05:39 CEST] <cone-167> ffmpeg 03Martin Storsjö 07master:286ab878bd39: fate.sh: Allow setting other make flags for running tests
[20:05:40 CEST] <cone-167> ffmpeg 03James Almer 07master:22164971b0b1: Merge commit '286ab878bd39b56008035638227b3ecb8ec5bbb7'
[20:09:55 CEST] <atomnuker> durandal_1707: I can fix 'em and kurosu's wrong
[20:15:10 CEST] <durandal_1707> did someone invite him?
[20:22:56 CEST] <cone-167> ffmpeg James Almer master:79ff9935ae4a: utils: Add av_stream_add_side_data()
[20:22:57 CEST] <cone-167> ffmpeg James Almer master:1893495e1d02: mov: Use av_stream_add_side_data() for displaymatrix side data
[20:22:58 CEST] <cone-167> ffmpeg James Almer master:12ab667e219e: matroska: use av_stream_add_side_data() for stereo3d side data
[20:22:59 CEST] <cone-167> ffmpeg James Almer master:caf3c5b27f03: Merge commit '12ab667e219e7fbf8e9aef3731039b75c822df25'
[20:32:02 CEST] <cone-167> ffmpeg Martin Storsjö master:effc1430b2fe: Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately"
[20:32:03 CEST] <cone-167> ffmpeg Ronald S. Bultje master:06fec74cacbb: checkasm: vp9dsp: benchmark all sub-IDCTs (but not WHT or ADST).
[20:32:04 CEST] <cone-167> ffmpeg James Almer master:6747fc436e05: Merge commit 'effc1430b2fe5997d9d55bf28dc507c27125eb27'
[20:32:05 CEST] <cone-167> ffmpeg James Almer master:e386a2f2fe14: Merge commit '06fec74cacbb0ef7f3e5ea0e6c9ced1b6fd7565d'
[20:39:04 CEST] <cone-167> ffmpeg Martin Storsjö master:721bc37522c5: arm/aarch64: vp9itxfm: Fix indentation of macro arguments
[20:39:05 CEST] <cone-167> ffmpeg James Almer master:7283725a08c9: Merge commit '721bc37522c5c1d6a8c3cea5e9c3fcde8d256c05'
[20:43:28 CEST] <cone-167> ffmpeg Martin Storsjö master:79566ec8c779: arm: vp9itxfm: Rename a macro parameter to fit better
[20:43:29 CEST] <cone-167> ffmpeg James Almer master:52febc687b23: Merge commit '79566ec8c77969d5f9be533de04b1349834cca62'
[20:52:38 CEST] <cone-167> ffmpeg Alexandra Hájková master:7d8075cf4714: ra144: Convert to the new bitstream reader
[20:52:39 CEST] <cone-167> ffmpeg Alexandra Hájková master:c60cda7cb476: ra288: Convert to the new bitstream reader
[20:52:40 CEST] <cone-167> ffmpeg Alexandra Hájková master:f26cbb555b95: rtjpeg: Convert to the new bitstream reader
[20:52:41 CEST] <cone-167> ffmpeg Alexandra Hájková master:087bc8d70415: sipr: Convert to the new bitstream reader
[20:52:42 CEST] <cone-167> ffmpeg Alexandra Hájková master:6efbc88a5cf4: smacker: Convert to the new bitstream reader
[20:52:43 CEST] <cone-167> ffmpeg Alexandra Hájková master:9f78e3a46d15: svq1dec: Convert to the new bitstream reader
[20:52:44 CEST] <cone-167> ffmpeg Alexandra Hájková master:9ab1a3e28371: truemotion2: Convert to the new bitstream reader
[20:52:45 CEST] <cone-167> ffmpeg Alexandra Hájková master:0ac07d0b8d75: tiertex: Convert to the new bitstream reader
[20:52:46 CEST] <cone-167> ffmpeg Alexandra Hájková master:8e4cadea5d20: truespeech: Convert to the new bitstream reader
[20:52:47 CEST] <cone-167> ffmpeg Alexandra Hájková master:0bea79afa6cc: tscc2: Convert to the new bitstream reader
[20:52:48 CEST] <cone-167> ffmpeg Alexandra Hájková master:85f760fedd49: twinvq: Convert to the new bitstream reader
[20:52:49 CEST] <cone-167> ffmpeg Alexandra Hájková master:104a4289f925: utvideodec: Convert to the new bitstream reader
[20:52:50 CEST] <cone-167> ffmpeg Alexandra Hájková master:e5bdfc679004: vble: Convert to the new bitstream reader
[20:52:51 CEST] <cone-167> ffmpeg Alexandra Hájková master:0536e7d78248: vima: Convert to the new bitstream reader
[20:52:52 CEST] <cone-167> ffmpeg Alexandra Hájková master:f9c59f26c852: wnv1: Convert to the new bitstream reader
[20:52:53 CEST] <cone-167> ffmpeg Alexandra Hájková master:be35ef92a418: xan: Convert to the new bitstream reader
[20:52:54 CEST] <cone-167> ffmpeg Alexandra Hájková master:178b4ea5f9a4: xsubdec: Convert to the new bitstream reader
[20:52:55 CEST] <cone-167> ffmpeg Alexandra Hájková master:8d1997add622: mpegts: Convert to the new bitstream reader
[20:52:56 CEST] <cone-167> ffmpeg Alexandra Hájková master:2cef81a87cd8: ogg: Convert to the new bitstream reader
[20:52:57 CEST] <cone-167> ffmpeg Alexandra Hájková master:2dbe2aa2c2d4: rdt: Convert to the new bitstream reader
[20:52:58 CEST] <cone-167> ffmpeg James Almer master:7c5adf6c88dd: Merge commit '2dbe2aa2c2d4f02d2669feae45dee4fc45414813'
[20:54:45 CEST] <cone-167> ffmpeg Martin Storsjö master:2f99117f6ff2: aarch64: vp9itxfm: Don't repeatedly set x9 when nothing overwrites it
[20:54:46 CEST] <cone-167> ffmpeg James Almer master:7d88bc10d57b: Merge commit '2f99117f6ff24ce5be2abb9e014cb8b86c2aa0e0'
[20:57:05 CEST] <cone-167> ffmpeg Martin Storsjö master:537b5b773b31: rtmpdh: Do global initialization before running the test
[20:57:06 CEST] <cone-167> ffmpeg James Almer master:30518a68a74b: Merge commit '537b5b773b317af79d3a5b576ee9683e15ed84f6'
[20:58:44 CEST] <cone-167> ffmpeg Martin Storsjö master:233d50b275dd: qt-faststart: Do not try to use fancy 64-bit seeking functions on mingw32ce
[20:58:45 CEST] <cone-167> ffmpeg James Almer master:2c40adf218c8: Merge commit '233d50b275dd7cf6cc0656851e670e1b2dfba56f'
[21:00:25 CEST] <cone-167> ffmpeg Diego Biurrun master:bd9cd04626a9: w32pthreads: Fix function pointer casts
[21:00:26 CEST] <cone-167> ffmpeg James Almer master:b30cd14b5732: Merge commit 'bd9cd04626a98a752c5771d057a6b86779359904'
[21:03:33 CEST] <cone-167> ffmpeg Diego Biurrun master:5bcc6f76f180: configure: Disable warning C4703 with MSVC
[21:03:34 CEST] <cone-167> ffmpeg James Almer master:b9886e569a9c: Merge commit '5bcc6f76f180d0f88269018727c92fc562fb8abb'
[21:05:00 CEST] <cone-167> ffmpeg Janne Grunau master:6a1ea4ec932f: arm: warn/error on movrelx usage problematic with PIC on ELF
[21:05:01 CEST] <cone-167> ffmpeg James Almer master:d1ee6fb72945: Merge commit '6a1ea4ec932f4fc9fdc00ec51ee070b298ddb35f'
[21:06:21 CEST] <cone-167> ffmpeg Diego Biurrun master:30f0d1b997f1: configure: Remove old avisynth support leftover
[21:06:22 CEST] <cone-167> ffmpeg James Almer master:743eae3cd6f9: Merge commit '30f0d1b997f15d667c05feab0b54f0b2814ba7a9'
[21:09:04 CEST] <cone-167> ffmpeg Diego Biurrun master:04698d528cac: configure: Use correct variable name in libsnappy test
[21:09:05 CEST] <cone-167> ffmpeg James Almer master:b26b4d62be4e: Merge commit '04698d528cac334b6b5cabd3384f01406a766285'
[21:15:39 CEST] <cone-167> ffmpeg Diego Biurrun master:ce6f780bc665: configure: Add missing asyncts filter, movie filter, and output example deps
[21:15:40 CEST] <cone-167> ffmpeg James Almer master:655418014c88: Merge commit 'ce6f780bc6656ad3895f81a988b239ad3c8af4b8'
[21:18:22 CEST] <cone-167> ffmpeg Diego Biurrun master:bf2f748fc74f: configure: Use correct libm linker flag during math function checks
[21:18:23 CEST] <cone-167> ffmpeg James Almer master:12290077d1ce: Merge commit 'bf2f748fc74fff5272075e1fe1c07b4152421526'
[22:02:20 CEST] <BBB> ubitux: did you notice that vp31 started failing in the tsan builds?
[22:02:27 CEST] <BBB> ubitux: I wonder if there's an explanation for that
[22:02:43 CEST] <ubitux> tsan tests fluctuate
[22:02:54 CEST] <ubitux> i suppose it only detects races, it doesn't track all the possible races
[22:06:35 CEST] <BBB> hm... maybe
[22:06:50 CEST] <BBB> I hope it's not a real race, the code it's pointing out I went through pretty closely and I thought I had fixed it
[22:18:37 CEST] <atomnuker> Gramner: does ryzen's avx2 really not give any performance gains at all?
[22:19:13 CEST] <Gramner> apparently not, but I don't have any such chip to test stuff on
[22:19:58 CEST] <Gramner> not sure why they would even add support for instruction sets that will make code that uses it slower
[22:20:22 CEST] <nevcairiel> tickboxes on a feature sheet?
[22:20:29 CEST] <atomnuker> that's horrid
[22:22:24 CEST] <BBB> that's pretty normal
[22:22:30 CEST] <BBB> sse2 in first incarnation was slower than mmx
[22:23:22 CEST] <Gramner> but doubles without x87!
[22:27:03 CEST] <Gramner> AVX2 doesn't really add much aside from wider vectors either. I can understand adding AVX (and AVX-512 eventually) due to features aside from vector width being useful, but adding AVX2 just for the sake of having it seems weird
[22:28:05 CEST] <jamrial> eh, you can always use the couple new instructions it adds on xmm
[22:28:09 CEST] <jamrial> things like byte broadcast
[22:32:25 CEST] <jamrial> in any case, if first gen zen doesn't have a 256bit unit, what will? second/third gen? their next arch five years from now?
[22:33:04 CEST] <jamrial> by then avx512 will be on every intel non server chip already
[22:36:14 CEST] <BtbN> Doesn't Naples already have it?
[22:36:57 CEST] <BtbN> hm, doesn't seem like it
[22:42:42 CEST] <RiCON> cannonlake will have it, iirc
[22:51:38 CEST] <nevcairiel> no released cpu does so far
[22:51:59 CEST] <nevcairiel> other than maybe some Phi
[22:52:59 CEST] <Gramner> the avx-512 subset on KNL is kind of lame though
[22:54:26 CEST] <kierank> KNL is kind of lame
[22:54:34 CEST] <kierank> from the benchmarks i've seen
[22:54:58 CEST] <kierank> probably good for fp32
[22:55:01 CEST] <kierank> and sciency stuff
[22:55:23 CEST] <Gramner> you probably need to have a workload that happens to be very well suited for that arch, yes
[22:55:26 CEST] <BtbN> Is it another Skylake-Remix?
[22:55:31 CEST] <BtbN> Or what is it?
[22:55:40 CEST] <Gramner> knight's landing. xeon phi
[23:47:12 CEST] <J_Darnley> WTF IS GOING ON HERE?!  THERE IS NO SSE2 FUNCTION FOR THAT!
[23:47:34 CEST] <J_Darnley> tig
[23:49:15 CEST] <BBB> lol
[23:55:59 CEST] <kierank> J_Darnley: for what
[23:56:54 CEST] <J_Darnley> my idct functions.  my benchmark results are weird
[23:57:24 CEST] <iive> :)
[00:00:00 CEST] --- Wed Apr  5 2017