[Ffmpeg-devel-irc] ffmpeg-devel.log.20180905

burek burek021 at gmail.com
Thu Sep 6 03:05:03 EEST 2018


[03:06:00 CEST] <cone-677> ffmpeg 03Shiyou Yin 07master:61eeb40a62d0: avcodec: [loongson] fix bug of mss2-wmv failed in fate test.
[03:06:00 CEST] <cone-677> ffmpeg 03Shiyou Yin 07master:17c635e605af: avcodec/mips: [loongson] simplify the usage of intermediate variable addr.
[08:23:11 CEST] <haasn> pnm.c:ff_pnm_decode_header should set avctx->color_range = AVCOL_RANGE_JPEG
[08:23:26 CEST] <haasn> (compare to e.g. pngdec.c)
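
A minimal sketch of the one-line change haasn is suggesting; the exact placement inside ff_pnm_decode_header() is an assumption, the point is simply to advertise full-range samples the way pngdec.c does:

    /* libavcodec/pnm.c, somewhere in ff_pnm_decode_header() after the pixel
     * format has been chosen (placement is a guess): PNM stores full-range
     * samples, so report that like the PNG decoder does. */
    avctx->color_range = AVCOL_RANGE_JPEG;
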
[08:36:27 CEST] <cone-897> ffmpeg 03Steven Liu 07master:fbd8746efabe: avformat/hlsenc: rename option from use_localtime to strftime
[08:36:27 CEST] <cone-897> ffmpeg 03Steven Liu 07master:f499679e17cc: avformat/dashdec: refine mpd element of attribute name availabilityEndTime
[08:36:27 CEST] <cone-897> ffmpeg 03Steven Liu 07master:e134c20374ee: avformat/dashdec: refine adaptionset attribute members
[08:36:27 CEST] <cone-897> ffmpeg 03Steven Liu 07master:28578e61431b: avformat/dashdec: remove redundant code
[08:36:27 CEST] <cone-897> ffmpeg 03Steven Liu 07master:a222798ef34f: avformat/dashdec: reindent code for previous commit
[08:36:27 CEST] <cone-897> ffmpeg 03Steven Liu 07master:e35e91546551: avformat/dashdec: add trace message for get the logic output message
[08:36:27 CEST] <cone-897> ffmpeg 03Steven Liu 07master:8eac027cd14e: avformat/dashdec: add min_buffer_time process logic
[08:36:28 CEST] <cone-897> ffmpeg 03Steven Liu 07master:d0be0de06557: avformat/dashdec: reindent code for previous commit
[08:36:29 CEST] <cone-897> ffmpeg 03Steven Liu 07master:ad9b4ecc26be: avformat/dashdec: refine compute current fragment for presentation_timeoffset mode
[10:06:37 CEST] <January> michaelni: ok so I don't really understand why you error out on more than a single IDR slice NALU? Also in response to my patch, you say idr(h) is called repeatedly; isn't this what should happen with slices?
[12:08:31 CEST] <michaelni> January, this code, with one frame of input at a time, does not error out on multiple idr slices. Also, with multiple idr slices, code that only needs to run once per frame should not be called more than once. The part that is maybe buggy is when this code is used in chunked mode
[12:09:57 CEST] <January> michaelni: what do you think is the best way to support chunked mode?
[12:12:55 CEST] <michaelni> 1. find out what fails and why, fix it, then goto 1. For the idr case here maybe idr_cleared can be moved to the context; not sure if I'm missing some interaction here
[12:15:39 CEST] <JEEB> talking of H.264 I will need to see if I can find some time for this https://trac.ffmpeg.org/ticket/7374
[12:15:49 CEST] <JEEB> (and would love input from anyone else)
[12:19:20 CEST] <JEEB> tried looking at some things and it seemed like the H.264 decoder didn't reset some variables properly, so even though the stream became valid it would still keep dropping refs
[12:19:27 CEST] <JEEB> -> artifacts
[12:30:54 CEST] <January> michaelni: why's it not valid to call idr multiple times
[12:33:56 CEST] <michaelni> January, it's wasting time. not sure if it does other bad things
[12:52:37 CEST] <LigH> Hi
[12:53:52 CEST] <LigH> Recently we discussed an issue with lensfun; there is another now. ffmpeg's configure runs several tests to check the availability, and if I try to link lensfun in ffmpeg, configure tries to add "-lgnulib" to the gcc call. But there is no package "gnulib" in MSYS2 / MinGW.
[12:55:13 CEST] <LigH> Instead, it requires glib2, and that is used correctly now.
[12:58:15 CEST] <January> michaelni: maybe this would be better then? https://0x0.st/sv_9.patch
[13:01:16 CEST] <BtbN> LigH, all ffmpeg does for lensfun is add the parameters lensfun itself specifies in its pkg-config file. So that is broken, not ffmpeg.
[13:02:01 CEST] <JEEB> yea my first comment would have been if those things pop up from the configure specifically, or from the pc file
[13:02:04 CEST] <JEEB> :P
[13:02:31 CEST] <LigH> lensfun.pc does not contain "gnulib".
[13:03:11 CEST] <LigH> Want a log files archive?
[13:03:19 CEST] <JEEB> what does pkg-config --libs lensfun (or whatever the name is) output?
[13:03:41 CEST] <LigH> https://0x0.st/sv_e.zip
[13:04:03 CEST] <JEEB> just look at the output of the pkg-config command :P
[13:04:14 CEST] <JEEB> and if you're forcing extra pkg-config params like static, add that into the mix as well
[13:04:17 CEST] <JEEB> :P
[13:04:36 CEST] <JEEB> because the FFmpeg repo most deffo doesn't contain the string "gnulib"
[13:04:51 CEST] <LigH> pkg-config.exe --static --libs lensfun
[13:04:58 CEST] <LigH> "static" is important
[13:05:10 CEST] <LigH> It contains "-lgnulib"
[13:05:16 CEST] <JEEB> well there you have it
[13:05:21 CEST] <JEEB> not related to FFmpeg then :P
[13:05:21 CEST] <LigH> I wonder where that comes from...
[13:08:59 CEST] <LigH> From which sources does pkg-config collect these libraries?
[13:09:47 CEST] <January> LigH: the .pc files
[13:10:18 CEST] <LigH> But there are 0 *.pc files containing the text "gnulib".
[13:10:56 CEST] <LigH> Wait...
[13:11:17 CEST] <LigH> glib-2.0.pc contains it.
[13:11:21 CEST] <LigH> o?
[13:11:49 CEST] <LigH> It's part of MinGW.
[13:13:37 CEST] <LigH> So who shall I talk to, now? AlexPux (MSYS2 packager)?
[13:14:23 CEST] <January> msys2 people probably, yes
[13:14:59 CEST] <January> LigH: I would say try installing msys2 again in a separate directory temporarily and check if you still have the issue on a plain install
[13:15:32 CEST] <LigH> The media-autobuild_suite updates MSYS2 regularly before updating ffmpeg and all related libraries to be linked into.
[13:15:33 CEST] <JEEB> LigH: whomever maintains that package, yes
[13:16:37 CEST] <LigH> I usually only run the suite, hence my limited insight into this subsystem; I am rather a Windows person.
[13:17:02 CEST] <January> LigH: also you may want to try WSL & linux mingw
[13:17:40 CEST] <JEEB> LigH: and we honestly don't care about your issues that don't have anything to do with FFmpeg. figure out who maintains the pacman package in msys2 that contains that pc file, and go poke that person
[13:20:17 CEST] <LigH> JEEB, I was just trying to figure out if it is an FFmpeg issue at all, or who else is responsible. Without asking here, how should I do that? My intention was not to blame you, but to discover.
[13:21:20 CEST] <JEEB> which is why we helped you
[13:21:42 CEST] <JEEB> now don't be "I'm just a user and thus please someone fix this in msys2 for me" (or I misread your comments)
[13:22:14 CEST] <JEEB> in any case, your priority is to figure out who in msys2 maintains that package that contains that pc file that is seemingly incorrect (?)
[13:22:39 CEST] <LigH> No. I didn't ask you to fix it, just to tell me who you believe should do that. Now ... over to OFTC #msys2. Thank you...
[13:22:46 CEST] <LigH> Bye
[13:36:45 CEST] <michaelni> January, that patch would remove the check that checks for idr and non idr slices being mixed with multi threading. That would be ok if the code can handle this. I'm not sure if the code can handle this
[13:37:19 CEST] <michaelni> do you need to support invalid streams ?
[13:38:17 CEST] <kierank> michaelni: no we need to support chunks
[13:40:33 CEST] <J_Darnley> I have a deep question for the assembly people.
[13:41:02 CEST] <J_Darnley> We are having large performance problems with page faults and one of our assembly functions.
[13:41:54 CEST] <kierank> we don't have page faults any more
[13:41:59 CEST] <kierank> we just have performance problems randomly
[13:42:08 CEST] <J_Darnley> ah, sorry I guess I wasn't keeping up.
[13:42:20 CEST] <J_Darnley> The function de-interleaves a line of 422 into planar.
[13:42:48 CEST] <January> the check doesn't even properly check for mixed IDR/non-IDR, you could pass it an IDR->PPS->IDR and it would fail (which would still be a valid stream afaik)
[13:43:09 CEST] <J_Darnley> While the code we use it in also interleaves separate fields.
[13:43:52 CEST] <J_Darnley> kierank and I would like to know if we should be using moves other than the usual mova/movu provided by the x264asm layer.
[13:44:27 CEST] <J_Darnley> non-temporal moves
[13:48:09 CEST] <J_Darnley> Who to ping?  Gramner, jam... James Almer not here. Martin Vignali are you on irc? durandal_1707 atomnuker
[13:49:54 CEST] <durandal_1707> looked at cache misses?
[13:50:35 CEST] <Gramner> you can use non-temporal prefetches for the input and non-temporal stores for the output
[13:51:19 CEST] <J_Darnley> I haven't, but kierank, have you looked at that?  Cache misses?
[13:51:57 CEST] Action: J_Darnley goes to find his intel reference
[13:52:06 CEST] <kierank> Gramner: we have some bizarre issue where cpu load for the function will randomly go up by a factor of 5
[13:52:31 CEST] <kierank> and instead of taking 10ms it takes 50ms
[13:52:36 CEST] <kierank> and thus go behind realtime briefly
[13:52:47 CEST] <kierank> for a very simple function on a machine not under load
[13:52:59 CEST] <kierank> huge pages got rid of most of perf's complaints but it still happens
[13:54:30 CEST] <Gramner> "cpu load" in those cases basically means "waiting for memory loads to resolve"
[13:54:58 CEST] <Gramner> perhaps there are multiple threads hammering memory at the same time etc.? can be hard to figure out exactly
[13:55:21 CEST] <kierank> we had lots of page faults (because it's 10-bit) as it interleaved the fields
[13:55:26 CEST] <kierank> but huge pages got rid of those
[13:55:32 CEST] <kierank> but still locks up and uses 100% cpu from time to time
[13:56:14 CEST] <Gramner> cpu usage 100% doesn't mean the cpu is actually doing anything. waiting for memory is considered load
[13:56:49 CEST] <kierank> I doubt multiple threads are trying to read
[13:56:57 CEST] <kierank> unless something went badly wrong
[13:57:29 CEST] <Gramner> is this on a NUMA system?
[13:57:34 CEST] <kierank> yes
[13:57:40 CEST] <kierank> thread was pinned to a core
[13:57:51 CEST] <kierank> at one point, same issue
[13:58:10 CEST] <kierank> trying pre-spectre kernel right now
[14:01:09 CEST] <J_Darnley> While I read up on prefetching I think we should try the stores as non-temporal.  We don't need that data in cache.  The reference gets passed off elsewhere.
[14:01:21 CEST] <J_Darnley> kierank: When you're done with the kernel I can make those changes.
[14:02:44 CEST] <kierank> ok
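
A rough sketch in C intrinsics of what Gramner suggests above (the real routine is hand-written asm and also does the UYVY-to-planar shuffle, which is left out here): prefetch the input with the non-temporal hint and write the output with streaming stores so neither side trashes the cache. The function name, the aligned-destination requirement and the prefetch distance are assumptions for illustration.

    #include <emmintrin.h>   /* SSE2: _mm_loadu_si128, _mm_stream_si128 */
    #include <xmmintrin.h>   /* SSE:  _mm_prefetch, _mm_sfence */
    #include <stddef.h>
    #include <stdint.h>

    /* Copy one row with non-temporal stores; dst is assumed 16-byte aligned
     * and bytes a multiple of 16. */
    static void copy_row_nt(uint8_t *dst, const uint8_t *src, size_t bytes)
    {
        for (size_t i = 0; i < bytes; i += 16) {
            _mm_prefetch((const char *)src + i + 256, _MM_HINT_NTA);
            _mm_stream_si128((__m128i *)(dst + i),                 /* MOVNTDQ */
                             _mm_loadu_si128((const __m128i *)(src + i)));
        }
        /* an SFENCE is still needed before another core reads dst; see the
         * sfence discussion later in the log */
    }
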
[14:07:21 CEST] <michaelni> January, about idr pps idr, if you pass this as a single frame, the error check won't execute after the first idr so that can't really help. also this idr pps idr case is a bit odd, normally the decoder is fed with an access unit at a time or, in chunked mode (which may be buggy), a NAL unit. to get 2 IDR frames you need something like mov/mp4 with interlaced AUs and these would not have PPS in them.
[14:10:48 CEST] <michaelni> if there is a problem with 2 AUs in a single input packet and they are passed to the decoder unsplit (with or without pps) and if that's buggy, a fix is very welcome. If that is one of the goals of the patch the commit message should document it. I thought this was about chunked mode and slices where the decoder would not be fed with idr pps idr at once
[14:16:37 CEST] <January> regardless, removing the check still doesn't result in the decoder actually using threads (though it does decode) so I will have to look further at what the issue is
[14:34:04 CEST] <jya> updating our code to the ffvp9 decoder, I want to continue using the vp9 filter in place of the new vp9_superframe_split. What do you need to set on the AVCodecContext so filters are ignored? If I simply disable the vp9_superframe_split_bsf, I get an error from avcodec_send_packet: https://github.com/FFmpeg/FFmpeg/blame/master/libavcodec/decode.c#L799
[14:36:17 CEST] <nevcairiel> frames need to be split before being sent to the decoder, this is what the bsf does. If you don't want it, you need to ensure this properly happens regardless. But you really should just use the bsf, it happens fully transparently
[14:37:14 CEST] <jya> nevcairiel: we do that using the vp9 filter
[14:37:32 CEST] <nevcairiel> whats a "vp9 filter"?
[14:38:22 CEST] <jya> sorry, vp9 parser
[14:39:17 CEST] <jya> we feed data through the vp9 parser, that splits superframe already, then we feed that to the decoder
[14:39:23 CEST] <nevcairiel> the parser does not do splitting any longer
[14:39:32 CEST] <nevcairiel> that task is left to the bsf now
[14:39:37 CEST] <jya> doesn't it? what does it do then?
[14:39:56 CEST] <jya> BBB only mentioned to me that the future aim was to have that work done in the bsf only.
[14:39:57 CEST] <nevcairiel> extract frame type/keyframe flags
[14:40:07 CEST] <nevcairiel> the future is now :)
[14:40:10 CEST] <nevcairiel> well, months ago
[14:40:31 CEST] <jya> ok, so I no longer need the vp9 parser then, considering our demuxer takes care of that already
[14:40:47 CEST] <jya> i'm assuming the vp9 parser is now only used by the demuxer?
[14:41:04 CEST] <jya> nevcairiel: that includes the 4.0 branch?
[14:41:39 CEST] <nevcairiel> not sure what exactly that question means, all parsers are used in demuxing only in avformat
[14:42:08 CEST] <jya> damn, so I have to now import the bsf infrastructure (wasn't needed before)
[14:42:55 CEST] <jya> our demuxer (like for webm) extracts the "blob", we feed that into the parser, then into the decoder. up to now the parser was splitting superframes
[14:43:19 CEST] <jya> in any case, thanks for taking the time answering my question... now I know what I need to do
[14:43:32 CEST] <nevcairiel> 3.4 still had the vp9 parser split, 4.0 no longer
[14:48:49 CEST] <January> looks like it actually checks for mixed IDR/non-IDR already https://github.com/ffmpeg/ffmpeg/blob/master/libavcodec/h264_slice.c#L1901
[15:01:06 CEST] <michaelni> January, I think neither of the checks is a superset of the other. one of these was moved to where it is now by a merge, the other is still where it was before the merge. The 2 checks possibly should have been left together where they were, or moved together. the way it was done made the code harder to understand, I think
[15:06:59 CEST] <jya> nevcairiel: what's the difference between libavcodec/vp9_superframe_bsf.c and libavcodec/vp9_superframe_split_bsf.c
[15:10:11 CEST] <jamrial> jya: one splits frames, the other merges already split frames
[15:10:25 CEST] <jamrial> the split is the one you want
[15:10:39 CEST] <jya> jamrial: thanks.. figured as much, wanted to be sure
[15:11:05 CEST] <jamrial> the other was used to merge the frames back when the parser did the splitting, to prevent said frames making it to the muxer the wrong way
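
A hedged sketch of swapping the parser for the vp9_superframe_split bitstream filter using the public av_bsf_* API, as suggested above; error paths are trimmed and the helper names are made up, only the libavcodec calls are the actual API.

    #include <libavcodec/avcodec.h>

    static int setup_vp9_split(AVBSFContext **bsf, const AVCodecParameters *par)
    {
        const AVBitStreamFilter *f = av_bsf_get_by_name("vp9_superframe_split");
        int ret;
        if (!f)
            return AVERROR_BSF_NOT_FOUND;
        if ((ret = av_bsf_alloc(f, bsf)) < 0)
            return ret;
        if ((ret = avcodec_parameters_copy((*bsf)->par_in, par)) < 0)
            return ret;
        return av_bsf_init(*bsf);
    }

    /* For each demuxed packet: push it into the bsf, then feed every split
     * frame it hands back to the decoder. */
    static int filter_and_decode(AVBSFContext *bsf, AVCodecContext *dec, AVPacket *pkt)
    {
        int ret = av_bsf_send_packet(bsf, pkt);
        if (ret < 0)
            return ret;
        while ((ret = av_bsf_receive_packet(bsf, pkt)) == 0) {
            ret = avcodec_send_packet(dec, pkt);
            av_packet_unref(pkt);
            if (ret < 0)
                return ret;
        }
        return ret == AVERROR(EAGAIN) ? 0 : ret;
    }
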
[15:31:10 CEST] <January> michaelni: where are slices actually passed to the slice threads which are created
[15:32:21 CEST] <funman> January: ff_h264_execute_decode_slices ?
[15:33:24 CEST] <January> oh right. For some reason it seems to be spawning threads but there is only load on a single cpu
[15:34:52 CEST] <jya> jamrial: with the split of superframe now occurring within the decoder, does that mean you now need to call avcodec_decode_video2 in a loop until all bytes have been parsed?
[15:35:15 CEST] <funman> January: maybe slices are fed one by one that makes sense
[15:42:25 CEST] <jamrial> jya: I think so. nevcairiel probably knows best, but since the split is internal it shouldn't have an effect on how you use the decode api
[15:43:13 CEST] <nevcairiel> not sure how exactly it works with the old api
[15:43:16 CEST] <jamrial> you should probably switch to the send/receive api, though. sounds more like what you're looking for
[15:43:33 CEST] <nevcairiel> the new api has a much clearer defined semantic for this case
[15:43:54 CEST] <January> funman: still not seeing where it's actually passed to the other threads though
[15:44:22 CEST] <funman> January: avpriv_slicethread_execute ?
[15:44:40 CEST] <nevcairiel> the actual thread handling function is in avctx->execute
[15:44:47 CEST] <nevcairiel> which should be called somewhere
[15:44:48 CEST] <jya> jamrial: we would still have to support the old api (we load the system installed libavcodec, check the version and adapt)
[15:45:21 CEST] <jya> so for now, I use the lowest common denominator
[15:46:32 CEST] <jamrial> what's the oldest version you support? the new decode api is available since ffmpeg 3.1
[15:48:14 CEST] <jya> libavcodec.53
[15:48:33 CEST] <jya> including *both* ffmpeg and libav for 53,54 and 55
[15:48:39 CEST] <jya> real pain
[15:49:56 CEST] <jya> regarding the split, a single AVPacket now can in theory return two frames
[15:52:33 CEST] <jya> if more than two frames are returned, how are their pts/duration calculated?
[15:53:03 CEST] <nevcairiel> in the vp9 case those extra frames in the super frame are not actually returned
[15:53:21 CEST] <nevcairiel> it doesn't have re-ordering or anything like that
[15:53:28 CEST] <nevcairiel> those are invisible frames used for reference
[15:53:32 CEST] <nevcairiel> but never returned directly
[15:54:17 CEST] <nevcairiel> there may be a future P frame with only skip blocks that may look exactly like one of those invisible frames
[15:54:24 CEST] <nevcairiel> but there would be a separate avpacket for it
[15:55:00 CEST] <jya> ok... good to know, so no need for looping
[15:55:04 CEST] <nevcairiel> the entire super-frame structure exists exactly because they are not being returned
[15:55:15 CEST] <nevcairiel> so that every packet has exactly one visible frame
[15:55:21 CEST] <nevcairiel> which helps with timestamps etc
[15:55:27 CEST] <jya> indeed
[15:56:19 CEST] <jya> awesome, compiled on all platforms after only my 5th attempt :)
[15:57:20 CEST] <nevcairiel> and looping should not be required, the old decode APIs do that internally if needed
[15:59:14 CEST] <nevcairiel> and the new apis have a clear semantic that requires looping anyway
[15:59:28 CEST] <nevcairiel> but the old ones never did - at least for video, so no changes there
[16:00:00 CEST] <jya> nevcairiel: excellent, thanks again for your help
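
For reference, a small sketch of the send/receive semantics nevcairiel describes above; the function name and surrounding loop are assumptions, only the avcodec_* calls are the actual API.

    #include <libavcodec/avcodec.h>

    /* One packet in, then drain every frame that becomes available.
     * Passing pkt == NULL starts draining the decoder. */
    static int decode_packet(AVCodecContext *dec, const AVPacket *pkt, AVFrame *frame)
    {
        int ret = avcodec_send_packet(dec, pkt);
        if (ret < 0)
            return ret;
        while ((ret = avcodec_receive_frame(dec, frame)) == 0) {
            /* ...consume frame... */
            av_frame_unref(frame);
        }
        /* EAGAIN: send the next packet; EOF: draining finished */
        return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
    }
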
[16:00:47 CEST] <January> funman: h->nb_slice_ctx_queued never goes more than 1 it looks like
[16:01:34 CEST] <funman> January: how many slices at a time are you giving to avcodec_decode_video2 ?
[16:02:22 CEST] <January> funman: using new API
[16:02:43 CEST] <funman> January: how many slices at a time are you giving to something() ?
[16:04:43 CEST] <January> I think only one. I thought ffmpeg would begin decoding a slice and return immediately, and decode another slice in a separate thread if it wasn't finished with the first slice when the second is passed, but it looks like you actually have to pass it all the slices you want to decode in parallel at once
[16:05:07 CEST] <funman> that's what I understand as well
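
A small sketch of the conclusion January and funman reach here, using the public API (the helper name is made up): slice threading only pays off when each packet carries all of a frame's slice NALUs, otherwise the slices end up decoded serially.

    #include <libavcodec/avcodec.h>

    static AVCodecContext *open_h264_slice_threaded(void)
    {
        AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
        AVCodecContext *dec = avcodec_alloc_context3(codec);
        if (!dec)
            return NULL;
        dec->thread_type  = FF_THREAD_SLICE;  /* slice threads, not frame threads */
        dec->thread_count = 0;                /* 0 = auto-detect */
        if (avcodec_open2(dec, codec, NULL) < 0) {
            avcodec_free_context(&dec);
            return NULL;
        }
        /* each packet sent to the decoder should then contain a full access
         * unit (all slices of one frame) for the slice threads to be used */
        return dec;
    }
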
[16:11:32 CEST] <jya> anyone attempted to compile ffmpeg on windows amd64_arm ? (aarch64)
[16:11:53 CEST] <nevcairiel> i believe wbs does such things
[16:14:14 CEST] <January> jya: https://github.com/mstorsjo/llvm-mingw
[16:16:37 CEST] <jya> I should be using clang-cl, then I wouldn't have to worry about the toolchain
[16:24:05 CEST] <wbs> jya: I haven't tested the clang-cl frontend, only clang -target foo-win32-msvc. see https://fate.libav.org/aarch64-win32-clang-6.0/20180904195558
[16:24:55 CEST] <jya> wbs: have you tried with default msvc toolchain?
[16:25:20 CEST] <nevcairiel> msvc comes with both clang and clang-cl
[16:25:27 CEST] <wbs> jya: if you have a working setup with clang-cl for other arches, this shouldn't be different (although I don't remember how you request the target arch via clang-cl)
[16:25:44 CEST] <jya> nevcairiel: I'm specifically referring to cl compiler
[16:25:47 CEST] <wbs> jya: https://fate.libav.org/arm64-msvc-15/20180904202629
[16:26:36 CEST] <nevcairiel> yeah could just use cl.exe directly without clang in recent versions as well
[16:26:50 CEST] <jya> wbs: damn, that fate.libav.org should include the config.h and config.asm for all the platforms it works on, that would save me hours
[16:27:12 CEST] <nevcairiel> those files are just generated and the commands are listed? :)
[16:27:42 CEST] <wbs> jya: I can get them for you if you want, but I'm not at a computer atm
[16:29:25 CEST] <jya> wbs: win32, win64, mac 32, mac 64, android32, android64, win-arm64, freebsd , that would cover it I think... 
[16:29:43 CEST] <jya> oh and linux32 linux64 of course
[16:29:57 CEST] <wbs> ah, you mean in general
[16:30:07 CEST] <jya> yeah :)
[16:30:20 CEST] <jya> though I only care about the vp8, vp9 and flac decoder 
[16:30:31 CEST] <jya> everything else to be disabled
[16:30:53 CEST] <jamrial> even opus?
[16:30:57 CEST] <jya> every few months, we extract the code, regenerate the needed config.h 
[16:31:06 CEST] <jya> jamrial: we use libopus for that
[16:31:14 CEST] <jamrial> ah, i see
[16:31:32 CEST] <jya> can't ship the whole ffmpeg unfortunately. that would make my life so much easier
[16:32:03 CEST] <nevcairiel> configure syntax doesn't change that often; if you were to just keep a collection of scripts to generate this stuff, that shouldn't be so time intensive
[16:32:39 CEST] <jya> nevcairiel: trust me, it does ! 
[16:33:17 CEST] <jya> and now I have to craft those codec_list.c etc. that are generated
[16:33:21 CEST] <jya> by hand
[16:33:57 CEST] <jya> could always attempt to run the configure script somehow
[16:33:57 CEST] <nevcairiel> if you tell configure which codecs you want, shouldn't the generated list be just how you want it
[16:34:07 CEST] <jya> but then people would scream that our compile time has doubled
[16:37:39 CEST] <jamrial> if that's a problem, I can look into backporting the recent configure optimizations to the 4.0 branch
[16:40:08 CEST] <jya> jamrial: configure on my box takes a significant amount of time, even master branch
[16:40:11 CEST] <jya> let me measure
[16:41:19 CEST] <nevcairiel> very recent master should've gotten quite a bit better
[16:41:31 CEST] <nevcairiel> like, last 2 weeks or so
[16:49:58 CEST] <jya> nevcairiel: master branch 3m44.655s
[16:50:20 CEST] <nevcairiel> that sounds ok
[16:51:17 CEST] <jya> that sounds pretty long to me, it takes me 10 minutes to compile firefox on this box 
[16:51:37 CEST] <jya> re-doing that on 4.0 branch
[16:51:56 CEST] <BtbN> hm? Configure, on windows in cygwin, barely takes a minute for me.
[16:52:19 CEST] <jya> I'm using msys.bat (as described on the wiki)
[16:52:57 CEST] <jya> $ time ./configure --disable-everything --disable-protocols --disable-demuxers --disable-muxers --disable-filters --disable-programs --disable-doc --disable-parsers --enable-parser=vp8 --enable-parser=vp9 --enable-decoder=vp8 --enable-decoder=vp9 --disable-static --enable-shared --disable-debug --disable-sdl2 --disable-ibxcb --disable-securetransport --disable-iconv --disable-swresample --disable-swscale --disable-avdevice --disable-avfilter --disable-avformat --disable-d3d11va --disable-dxva2 --disable-vaapi --disable-vdpau --disable-videotoolbox --enable-decoder=flac --enable-asm --disable-cuda --disable-cuvid --toolchain=msvc --enable-x86asm
[16:52:59 CEST] <nevcairiel> it's still significantly faster if you use dash instead of bash
[16:57:41 CEST] <jya> 4m6.045s on release/4.0
[16:58:23 CEST] <BtbN> Only master has the speedup.
[16:58:51 CEST] <jya> 19.46s to compile 
[17:00:39 CEST] <jya> BtbN: it's only a 10% decrease between master and release
[17:01:06 CEST] <jya> still, time difference between a compile and configure is big
[17:01:19 CEST] <nevcairiel> it's probably not that bad for you in the first place because you have very few components enabled
[17:01:20 CEST] <BtbN> You're doing something wrong. It cut configure time from like 10 minutes down to one for me.
[17:01:42 CEST] <BtbN> So 4 minutes with such a small set of stuff sounds very wrong
[17:01:42 CEST] <nevcairiel> so it does far fewer dependency checks
[17:03:20 CEST] <jya> BtbN: I follow *exactly* the steps described in the wiki
[17:03:40 CEST] <BtbN> I never looked at the wiki. Try it with dash
[17:07:37 CEST] <nevcairiel> a full configure run with msvc and everything enabled takes 2m for me, with bash
[17:07:56 CEST] <nevcairiel> it used to be like 15
[17:08:55 CEST] <jamrial> jya: master branch as of today should be a lot faster than 4m
[17:09:17 CEST] <jamrial> especially if you disable pretty much everything except those three decoders, as you said
[17:09:29 CEST] <jya> start a cmd prompt, call "c:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
[17:09:39 CEST] <jya> start msys.bat, run configure that's it
[17:10:01 CEST] <nevcairiel> the difference between bash and dash is actually not very big anymore after the changes
[17:10:03 CEST] <nevcairiel> only like 10 seconds
[17:10:57 CEST] <nevcairiel> let's see if gcc vs msvc makes a huge time difference
[17:11:01 CEST] <nevcairiel> I don't expect so, but who knows
[17:12:31 CEST] <nevcairiel> nah very similar
[17:12:44 CEST] <nevcairiel> but 2m full configure on windows is really quite nice compared to the 10-15m before :d
[17:25:35 CEST] <jya> nevcairiel: indeed, my solution to this is getting a quicker machine :)
[17:30:42 CEST] <jya> hmmmm, on mac, the new vp8 decoder returns 1 fewer frame than expected in our test...
[17:30:57 CEST] <jya> did the vp8 parser get moved to a bsf too or something like that?
[18:07:42 CEST] <jya> nevcairiel: okay, found it, it's to do with draining... as we pass all packets (even the empty one) through a parser, the parser returns a dangling pointer out, but with a size of 0. so here: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/decode.c#L692
[18:07:50 CEST] <jya> the test is true, and it returns an error
[18:08:07 CEST] <jya> why treat size == 0 with non-null data as an error?
[18:08:32 CEST] <nevcairiel> because the api docs say so, probably
[18:08:53 CEST] <jya> to drain we call avcodec_decode_video2 with a size of 0
[18:09:18 CEST] <nevcairiel> size of 0 is fine, just make data NULL as well
[18:10:13 CEST] <jya> should be up to the parser IMHO, returning a dangling pointer is never good
[18:10:28 CEST] <jya> that's jamrial code (in decode.c)
[18:12:23 CEST] <jya> the use of an internal buffer declared on the stack, which will then be returned, is there: https://github.com/FFmpeg/FFmpeg/blame/master/libavcodec/parser.c#L143
[18:12:50 CEST] <nevcairiel> parsing and decoding are usually not quite as close together, because parsing is done in the demuxer
[18:13:01 CEST] <nevcairiel> so something in between would filter out empty packets anyway
[18:14:03 CEST] <jya> probably yes... obviously I'll write a workaround, but would be nice to get that fixed, it's tidier... I don't see the need to check for the data to be non null if size is 0 though
[18:14:12 CEST] <nevcairiel> a parser may return a zero-sized packet in the middle of decoding as well
[18:14:18 CEST] <nevcairiel> in which case you definitely don't want to flush the decoder
[18:14:41 CEST] <nevcairiel> the parser may just need more data
[18:15:04 CEST] <nevcairiel> so checking the parser output size for > 0 should generally be required for proper operation
[18:16:41 CEST] <nevcairiel> (it's likely not a case for vp8/9 though, since those always come in full frames)
[18:38:09 CEST] <jya> nevcairiel: if the parser returns a zero-sized packet in the middle of decoding, then it will cause a decoding error
[18:38:37 CEST] <nevcairiel> because you should not send that data to the decoder
[18:39:19 CEST] <jya> where does it say that the data returned from the parser shouldn't be fed to the decoder under some circumstances?
[18:40:41 CEST] <nevcairiel> where does it say that you should send an empty packet to the decoder? :)
[18:40:43 CEST] <jya> hmm, the av_parser_parse2 example does test that size != 0
[18:40:50 CEST] <jya> nevcairiel: to drain
[18:40:58 CEST] <nevcairiel> draining is independent of that entire process
[18:41:11 CEST] <jya> sure, but you need to send an empty packet to the decoder still
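
A sketch of the workaround being discussed, not the actual client code: only forward parser output to avcodec_decode_video2() when it is non-empty, and drain the decoder separately with a NULL data pointer rather than whatever stale pointer the parser left behind. The function name and loop structure are assumptions; the libavcodec calls are the real (old) API.

    #include <libavcodec/avcodec.h>

    static int parse_and_decode(AVCodecParserContext *parser, AVCodecContext *dec,
                                AVFrame *frame, const uint8_t *data, int size)
    {
        /* data == NULL with size == 0 flushes the parser at end of stream */
        do {
            uint8_t *out = NULL;
            int out_size = 0;
            int got      = 0;
            int used = av_parser_parse2(parser, dec, &out, &out_size, data, size,
                                        AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
            if (used < 0)
                return used;
            data += used;
            size -= used;

            if (out_size == 0)
                continue;            /* parser buffered the data; nothing to decode */

            AVPacket pkt;
            av_init_packet(&pkt);
            pkt.data = out;
            pkt.size = out_size;
            int ret = avcodec_decode_video2(dec, frame, &got, &pkt);
            if (ret < 0)
                return ret;
            /* decoder draining (pkt.data = NULL, pkt.size = 0) is done as a
             * separate final step, not shown here */
        } while (size > 0);
        return 0;
    }
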
[19:34:31 CEST] <durandal_1707> why is ff_reget_buffer() so slow: 431 vs 745
[19:35:08 CEST] <atomnuker> why do you need to use it?
[19:35:19 CEST] <durandal_1707> i just ask
[20:17:13 CEST] <kierank> Gramner: do we need to use sfence for nontemporal writes?
[20:17:20 CEST] <Gramner> yes
[20:19:23 CEST] <thardin> https://www.ioccc.org/2018/  bellard submission
[20:23:58 CEST] <cone-401> ffmpeg 03Gyan Doshi 07master:1a4a8df24942: ffplay: add option to allow custom seek interval
[20:25:34 CEST] <kierank> Gramner: movntdq seems to make a massive difference, the function is like twice as fast
[20:25:59 CEST] <Gramner> makes sense, you only need to do half as many memory loads
[20:26:01 CEST] <kierank> but then the program screws up probably because of lack of sfence
[20:26:19 CEST] <Gramner> because normal stores first need to fetch the entire old cache line and then update parts of it
[20:27:10 CEST] <Gramner> also consider using non-temporal prefetches for the input to avoid clobbering the entire cache
[20:27:43 CEST] <Gramner> (or just non-temporal loads if the source is guaranteed to be aligned)
[20:28:30 CEST] <J_Darnley> Yeah, I was reading on prefetches but I didn't get around to writing any of it yet.
[20:29:21 CEST] <nevcairiel> where does it store the prefetched stuff if it's non-temporal? does it have extra storage for that somewhere?
[20:29:33 CEST] <Gramner> yeah, it's an additional buffer
[20:29:39 CEST] <Gramner> I think
[20:30:03 CEST] <J_Darnley> How "expensive" can sfence be?
[20:30:26 CEST] <Gramner> you're only calling it once per frame right? shouldn't be a big deal
[20:30:28 CEST] <J_Darnley> Is it like emms where we should avoid calling it often?
[20:30:41 CEST] <J_Darnley> No, I just made it once per line
[20:30:54 CEST] <Gramner> oh, that could be bad
[20:31:17 CEST] <Gramner> never really tried using it often, no idea what happens
[20:31:32 CEST] <J_Darnley> okay, then I guess we need some inline functions like what ffmpeg does with emms
[20:31:57 CEST] Action: J_Darnley looks for an intrinsic
[20:32:46 CEST] <Gramner> just place sfence at the end of the function that does the non-temporal frame copying. or am I misunderstanding something?
[20:33:22 CEST] <J_Darnley> I put it in an assembly function that is called once per line.  It only does 1 line.
[20:34:25 CEST] <Gramner> is there any particular reason for doing it per-row?
[20:34:37 CEST] <J_Darnley> the sfence?  no
[20:34:53 CEST] <J_Darnley> if we wanted to sfence when done we either need to redesign that function or call sfence specifically
[20:36:33 CEST] <Gramner> more generally, calling a function once per row; e.g. if you're processing an entire frame, doing that sounds like a lot of overhead that could be avoided
[20:37:34 CEST] <kierank> Gramner: it's in part because the function does a uyvy422 to planar yuv conversion at the same time as interleaving fields
[20:39:38 CEST] <kierank> and the rest of the frame has blanking data like audio
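
Continuing the row-copy sketch from earlier, this is roughly what Gramner's "place sfence at the end of the function that does the non-temporal frame copying" looks like: keep the per-row helper fence-free and issue a single _mm_sfence() per frame. The field layout, strides and function name are assumptions for illustration.

    /* Builds on copy_row_nt() from the earlier sketch: interleave two fields
     * with streaming stores, then make them globally visible with one SFENCE
     * per frame rather than one per row. */
    static void interleave_fields_nt(uint8_t *dst, ptrdiff_t dst_stride,
                                     const uint8_t *top, const uint8_t *bot,
                                     ptrdiff_t src_stride, size_t row_bytes, int rows)
    {
        for (int y = 0; y < rows; y++) {
            copy_row_nt(dst + (2 * y    ) * dst_stride, top + y * src_stride, row_bytes);
            copy_row_nt(dst + (2 * y + 1) * dst_stride, bot + y * src_stride, row_bytes);
        }
        _mm_sfence();
    }
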
[20:40:31 CEST] <kurosu> btw, are the various prefetch insn (still?) worth it?
[20:41:25 CEST] <Gramner> non-temporal ones are useful, normal prefetches generally less so due to hardware prefetching being pretty good nowadays
[20:41:26 CEST] <atomnuker> kierank: is the frame on hardware or on ram?
[20:41:27 CEST] <kurosu> iirc, ffh264 uses some in MC (with an address expected to be fetched later on), but I really wonder how much it brings
[20:41:34 CEST] <kierank> atomnuker: in this case ram
[20:41:49 CEST] <kierank> atomnuker: it's the field interleave in sdi_dec
[20:42:49 CEST] <Gramner> hardware prefetching has tons of logic to predict what you are going to access in the future and speculatively prefetch ahead of time
[21:15:10 CEST] <jamrial> jkqxz: https://pastebin.com/raw/SadsjtUZ fix for global motion stuff based on your latest cbs patch
[21:15:22 CEST] <jamrial> you were using but never setting prev_gm_params
[21:17:18 CEST] <jamrial> alternatively, since gm_params is all derived values that have no relevancy unless you're writing a decoder, you can just remove it all with something like https://pastebin.com/raw/NnWYsc5D
[21:33:05 CEST] <atomnuker> yeah, I was wondering why you're parsing the values at all
[21:33:58 CEST] <atomnuker> are there any values of interest beyond those in the header?
[21:34:20 CEST] Action: durandal_1707 still no review for SCPR v3, will push it ASAP
[21:35:36 CEST] <jkqxz> Mostly a desire to get everything in uncompressed_header().  Reference and tile stuff before that is useful, but yeah, global motion and film grain stuff isn't really.
[21:36:37 CEST] <jkqxz> jamrial:  Not doing it at all is even easier than that, because you can eliminate pretty much all of the funny variables in the subexp stuff.  I would prefer to get the numbers, though.
[21:37:49 CEST] <atomnuker> durandal_1707: I'm looking at it now
[21:38:22 CEST] <atomnuker> I think film grain would be interesting though
[21:38:46 CEST] <atomnuker> unlike global motion it doesn't really change anything in the codec afaik, so you could change them without breaking decoding
[21:38:48 CEST] <jamrial> jkqxz: ah, I removed the funny variables that seemed related only to gm_params, but left what seemed necessary to parse the AV1RawSubexp fields
[21:39:02 CEST] <jkqxz> I want to rewrite global_motion_param in a more sensible way.  The current structure is just because the standard does it like that.
[21:39:34 CEST] <jkqxz> atomnuker:  Huh, yeah.  That would actually be quite a cute bsf (add/remove film grain).
[21:46:54 CEST] <cone-401> ffmpeg 03Shiyou Yin 07master:776909e42e2a: avcodec/mips: [loongson] reoptimize put and add pixels clamped functions.
[21:55:31 CEST] <durandal_1707> Compn: ping
[22:07:42 CEST] <durandal_1707> atomnuker: it compares bytes
[22:09:07 CEST] <durandal_1707> for sorting symbols
[22:09:57 CEST] <durandal_1707> why and how to put stuff into separate file?
[22:10:54 CEST] <durandal_1707> if you insist, s/unsigned/uint32_t can be done in another commit
[22:10:57 CEST] <atomnuker> look at how opus_entropy is done
[22:12:07 CEST] <durandal_1707> atomnuker: can't find that one
[22:14:32 CEST] <atomnuker> right, forgot it was called opus_rc
[22:24:20 CEST] <durandal_1707> atomnuker: I can't put the stuff into a separate file, the code is shared
[22:26:38 CEST] <atomnuker> all the more reason to put it elsewhere?
[22:27:11 CEST] <durandal_1707> why? it is fine there
[22:27:27 CEST] <atomnuker> I didn't see it being entangled with the main decoding context, so it should be just a matter of making a .h file and putting all the stuff there
[22:28:59 CEST] <durandal_1707> why not into .c ?
[22:30:32 CEST] <durandal_1707> and you've never seen real decompiled code
[22:38:28 CEST] <atomnuker> well it has been touched somewhat, but no programmer would use variables in alphabetical order if they knew what they were writing
[22:38:41 CEST] <atomnuker> unless they were writing fortran or something
[22:41:51 CEST] <durandal_1707> feel free to rename them later
[22:42:05 CEST] <cone-401> ffmpeg 03Marton Balint 07master:6aaf1b504c6a: avformat/mxfdec: do not use sound essence descriptor quantization bits for bits_per_coded_sample
[00:00:00 CEST] --- Thu Sep  6 2018

