[Ffmpeg-devel-irc] ffmpeg-devel.log.20170116

burek burek021 at gmail.com
Tue Jan 17 03:05:03 EET 2017


[00:11:00 CET] <AstralStorm> hello, I've found and am debugging an issue related to HLS playlists and seeks with a long playlist
[00:11:14 CET] <AstralStorm> try: ffplay -loglevel debug -ss 38:39:40 'http://vod045-ttvnw.akamaized.net/v1/AUTH_system/vods_c5a3/gamesdonequick_24225507808_582764740/chunked/index-dvr.m3u8'
[00:11:31 CET] <AstralStorm> note it takes a million years (read: never) to seek to the right position
[00:11:41 CET] <AstralStorm> while it has a whole chunked playlist so it shouldn't take this long
[00:12:07 CET] <AstralStorm> logs suggest it's seeking linearly for every url that is in the playlist instead of, say, bisecting
[00:12:34 CET] <AstralStorm> the issue also affects mpv and mplayer which use this backend
[00:13:00 CET] <AstralStorm> any idea on where I should look to patch it in the source?
[00:13:19 CET] <JEEB> libavformat/hls*.c
[00:13:26 CET] <JEEB> see how seeking is implemented there
[00:13:52 CET] <AstralStorm> thanks, I'll look, if I don't manage to patch it quickly I'll just report it fully in the bug tracker
[00:13:55 CET] <BtbN> I'd suspect it's not implemented at all and it just fast-forwards until the given time has passed.
[00:14:05 CET] <BtbN> Which involves downloading all segments up until that point.
[00:14:13 CET] <AstralStorm> that would suck.
[00:15:16 CET] <AstralStorm> interesting thing is that twitch html5 player handles it fine, but I suspect there ffmpeg is only used on already downloaded buffer
[00:15:49 CET] <AstralStorm> (in firefox)
[00:16:26 CET] <AstralStorm> livestreamer doesn't handle this either
[00:17:53 CET] <BtbN> Their JavaScript blob handles it.
[00:18:19 CET] <JEEB> yeah, I wouldn't be surprised if seeking wasn't properly implemented yet
[00:18:25 CET] <JEEB> as sad as it is
[00:18:31 CET] <JEEB> "string parsing ahoy"
[00:19:28 CET] <AstralStorm> ah well, I'll get on it
[00:19:53 CET] <AstralStorm> the other sad thing is I cannot just take and edit the m3u8 playlist as it doesn't have the URL prefix
[00:20:05 CET] <AstralStorm> unless there is a way to do that?
[00:20:17 CET] <AstralStorm> (it has names only as per HLS spec)
[00:20:58 CET] <AstralStorm> this is also probably why livestreamer doesn't just handle it (it could just essentially sed the playlist to have full URLs which are just as compliant)
[00:23:25 CET] <AstralStorm> yeah just mangling the playlist works fine
[00:23:53 CET] <AstralStorm> but it's not handled then by hls so no preload of further chunks, wtf
[02:08:15 CET] <faLUCE> is there a simple mux example for libavformat somewhere? I'm seeing that muxing.c inside doc/examples is a total mess, and it has changed in several releases of ffmpeg
[02:32:53 CET] <cone-010> ffmpeg Daniil Cherednik master:a6191d098a03: dcaenc: Reverse data layout to prevent data copies during Huffman encoding introduction
[02:32:53 CET] <cone-010> ffmpeg Daniil Cherednik master:c2500d62c68a: dcaenc: Implementation of Huffman codes for DCA encoder
[02:54:58 CET] <cone-010> ffmpeg Andreas Cadhalpun master:367cac782787: libopenmpt: add missing avio_read return value check
[02:54:59 CET] <cone-010> ffmpeg Steve Lhomme master:77742c75c550: dxva2: use a single macro to test if the DXVA context is valid
[02:55:00 CET] <cone-010> ffmpeg Steve Lhomme master:153b36fc6284: dxva2: get the slice number directly from the surface in D3D11VA
[02:55:01 CET] <cone-010> ffmpeg Steve Lhomme master:8fb48659018f: dxva2: allow an empty array of ID3D11VideoDecoderOutputView
[02:57:18 CET] <Chloe> faLUCE: the example changes because the api changes 
[07:09:34 CET] <Zeranoe> Are there any objections to delay loading dlls for things like CUVID, OpenCL, OpenGL? If not I can work on the patches.
[08:04:00 CET] <nevcairiel> cuvid is loaded on-demand with LoadLibrary anyway, there is nothing to change =p
[08:16:37 CET] <wm4> opengl is usually delay loaded (you generally must use a loader even for core functions on windows), but then again ffmpeg's opengl support is 100% worthless
[08:17:55 CET] <Zeranoe> Isn't that the same story for OpenCL?
[08:19:25 CET] <atomnuker> it works for the deshake filter
[08:20:19 CET] <Zeranoe> I thought it was unsharp?
[08:20:48 CET] <atomnuker> they both support it
[08:21:11 CET] <wm4> well at least opencl support has potential
[08:21:27 CET] <wm4> (and one could consider opengl filters too, maybe)
[08:24:53 CET] <rcombs> I once considered porting milkdrop into lavfi
[08:24:59 CET] <rcombs> then I read some milkdrop code
[08:25:30 CET] <Zeranoe> So if CUVID is already using LoadLibrary, I'll work on a similar solution for OpenCL
[08:29:44 CET] <Zeranoe> What is the difference between cuda, cuvid, and nvenc...
[08:47:07 CET] <wm4> cuda is nvidia's opencl (incompatible, but fulfills the same role)
[08:47:19 CET] <wm4> cuvid is the cuda video decode API
[08:47:32 CET] <wm4> and nvenc is nvidia's encode API, which can interop with cuda I guess
[09:12:22 CET] <Zeranoe> Would it really be as easy as adding 'opencl_deps_any="dlopen LoadLibrary"' to configure...
[09:14:47 CET] <wm4> no
[09:14:55 CET] <wm4> looks like opencl API is called directly
[10:08:19 CET] <Chloe> 'ffmpeg's opengl support is 100% worthless' agreed
[10:08:42 CET] <Chloe> wtf do we need an ogl output for
[10:09:22 CET] <wm4> that wouldn't be bad, but the bad part is that it's implemented as muxer API
[10:09:40 CET] <wm4> which is completely batshit nonsense insane
[10:10:47 CET] <Chloe> how would it work in a non-insane context? I have no idea where it would go
[10:15:52 CET] <wm4> Chloe: creating an API that's actually suitable for video rendering
[10:25:36 CET] <cone-254> ffmpeg Paul B Mahol master:40cf94371400: avcodec: add SIPR parser
[10:25:36 CET] <cone-254> ffmpeg Paul B Mahol master:e0665d385ee0: avformat/aadec: stop ignoring file metadata
[10:25:36 CET] <cone-254> ffmpeg Paul B Mahol master:591be9e38443: avformat/aadec: use avio_get_str()
[10:43:22 CET] <cone-254> ffmpeg Clément Bœsch master:a91c265f393b: lavc/pthread_frame: protect read state access in setup finish function
[10:43:23 CET] <cone-254> ffmpeg Clément Bœsch master:bd520e856901: lavc/h264_slice: drop redundant current_slice reset
[10:43:24 CET] <cone-254> ffmpeg Clément Bœsch master:9561de418378: lavc/h264dec: reconstruct and debug flush frames as well
[12:04:13 CET] <cone-254> ffmpeg Carl Eugen Hoyos master:e66473027143: configure: Fix standalone compilation of aiff and caf muxers.
[12:06:39 CET] <mateo`> michaelni: Hello, ubitux and I have updated the next merge commit here https://github.com/mbouron/FFmpeg/tree/libav-merge, can you give it a try?
[12:08:13 CET] <mateo`> we removed the AVFrame *output_frame that was introduced originally and kept the H264Picture *next_output_pic (so we can keep field reconstruction / motion vector display)
[12:47:23 CET] <wm4> atomnuker: https://lists.libav.org/pipermail/libav-devel/2017-January/081803.html
[12:47:30 CET] <wm4> no idea what the numbers mean though, lol
[13:02:26 CET] <michaelni> mateo`, tickets/2254/ttvHD_vlc_sample.ts displays a lot fewer frames
[13:03:18 CET] <michaelni> sample should be here: http://samples.ffmpeg.org/ffmpeg-bugs/trac/ticket2254/
[13:13:07 CET] <kierank> oh god that awful sample
[13:14:45 CET] <wm4> awful in what way?
[13:15:45 CET] <kierank> it does some horrible hacks to reduce the bitrate iirc
[13:18:09 CET] <kierank> it doesn't play correctly in my stream analyser
[13:18:14 CET] <kierank> but they send single field 1920x540
[13:18:48 CET] <kierank> and presumably the TV resizes downwards
[13:55:41 CET] <mateo`> michaelni: thx, i'll fix the regression
[14:38:42 CET] <ubitux> http://fate.ffmpeg.org/report.cgi?time=20170116075835&slot=x86_64-archlinux-gcc-valgrindundef
[14:38:48 CET] <ubitux> so are we going to fix this?
[14:39:06 CET] <ubitux> michaelni: ffv1enc is involved
[14:39:13 CET] <ubitux> not sure who is the svq1 maintainer though
[14:44:26 CET] <jamrial> ubitux: are those legit? they only showed up after you upgraded to gcc 6
[14:44:40 CET] <ubitux> i didn't only upgrade gcc 6
[14:44:49 CET] <ubitux> valgrind was in as well
[14:46:44 CET] <iive> ubitux: were you the one that had fate server where hdd flipped bits?
[14:46:51 CET] <iive> what happened with it?
[14:46:51 CET] <ubitux> yes
[14:47:03 CET] <ubitux> nothing, i'm still waiting for bitflip with the 2nd disk
[14:47:27 CET] <ubitux> if it doesn't happen in the next month or so i'll assume it's a firmware issue with the other disk
[14:47:42 CET] <iive> or bad ram on the hdd itself :D
[14:48:06 CET] <wm4> was it really a bit flip, or more like block frying?
[14:48:14 CET] <ubitux> really random bit flip
[14:48:18 CET] <ubitux> on already written data
[14:49:25 CET] <wm4> neat
[14:50:57 CET] <ubitux> jamrial: so valgrind 3.11 → 3.12 in the upgrade as well
[14:51:13 CET] <jamrial> yeah, i see that now
[15:43:03 CET] <jamrial_> ubitux: zero initializing both entries[6] on svq1dec.c "fixes" the svq1 failures
[16:03:53 CET] <atomnuker> jamrial: you wrote most of the float_dsp SIMD, right?
[16:04:44 CET] <atomnuker> I'd like to use it with my opus encoder but the thing is it uses frame sizes which aren't a multiple of 16 or 32
[16:05:52 CET] <atomnuker> do you think it's okay-ish to overallocate the buffer for e.g. the whole overlap+current samples window
[16:06:44 CET] <atomnuker> and shift the starting point such that it'll be aligned
[16:07:03 CET] <jamrial> atomnuker: no, i didn't, i just ported one from inline asm and added avx/fma3 to others
[16:07:57 CET] <atomnuker> e.g. float *dst = s->tmpbuf[4], s->dsp->vector_fmul(dst, overlap, <not a multiple of 16 but both buffers are big enough so align to 16>)
[16:09:03 CET] <atomnuker> for s->dsp->vector_fmul_reverse I also have to append 8 zero samples at the start of the window slope since opus uses 120 frames
[16:09:40 CET] <atomnuker> I don't like it because the factor by which I need to shift dst w.r.t. s->tmpbuf varies with frame size
[16:10:19 CET] <atomnuker> but I guess I have no choice if I want to use already written SIMD
[17:48:20 CET] <ischemm> I posted here the other day about rtmpdump. Here's my attempt at combining a few different forks: https://github.com/IsaacSchemm/rtmpdump
[19:09:49 CET] <wm4> philipl: I'll wait with that patch until Libav approves it
[19:10:35 CET] <philipl> ok.
[19:12:11 CET] <wm4> philipl: I've also observed some strange cuda hwframes impl. behavior
[19:12:31 CET] <wm4> mpv didn't initialize the hwframe for the decoder
[19:12:52 CET] <wm4> and didn't set any hwframe parameters (like width/height etc.)
[19:13:21 CET] <wm4> just calling init broke it; adding the missing fields + init still worked
[19:13:27 CET] <wm4> that's all kind of weird
[19:20:40 CET] <philipl> Yeah. I didn't really understand what was and wasn't necessary. I got it to the point that it worked and moved on.
[19:20:46 CET] <philipl> Need btbn to comment I think.
[19:21:17 CET] <philipl> I think it's probably due to the weird way he's allocating memory for the frames.
[19:21:23 CET] <BtbN> hm?
[19:22:12 CET] <BtbN> The CUVID decoder shouldn't depend on any external thing being set on the hwframes context
[19:22:48 CET] <wm4> this is for cuda frames output (with a specific device context)
[19:23:15 CET] <philipl> Sorry, I meant that, and it's not btbn's fault.
[19:23:15 CET] <BtbN> what exactly is broken with them?
[19:23:34 CET] <philipl> I don't think it's broken as such, but it's weird, compared to, say, vaapi.
[19:24:12 CET] <wm4> ffmpeg_cuvid.c also "forgets" to init the hwframes context
[19:24:22 CET] <philipl> (that's why I didn't)
[19:24:35 CET] <BtbN> wm4, that's intentional and there's a comment explaining it.
[19:25:10 CET] <philipl> haha. Oh yeah.
[19:25:15 CET] <philipl> wm4: you broke 10bit doing that
[19:25:24 CET] <wm4> don't really understand the comment
[19:25:40 CET] <wm4> well the hwframes API doesn't really work this way
[19:25:46 CET] <BtbN> Basically, it doesn't know the frame width/height, which is needed to allocate the context.
[19:26:13 CET] <BtbN> Never checked if after the ffmpeg.c restructuring it's now called late enough for that to be available.
[19:26:28 CET] <philipl> Right, so there's special code that inits it later, right?
[19:26:33 CET] <philipl> after we actually know what we're dealing with
[19:26:46 CET] <BtbN> cuvid.c just checks if the context is uninitialized, and if so fills in the gaps
[19:27:17 CET] <BtbN> https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/cuvid.c#L258
[19:27:42 CET] <philipl> wm4: so in the 10bit case, we don't know sw_format until cuvid has a chance to look at the file and fill in the gaps.
[19:28:04 CET] <wm4> that's not really ok... what you want is probably not a hwframes context then
[19:28:09 CET] <wm4> but a hw device ctx
[19:28:16 CET] <wm4> (for which we would have to add a field)
[19:28:35 CET] <wm4> "  Mastering Display Color Volume SEI luminance is [0.0200, 1200.0000]"
[19:28:39 CET] <wm4> wtf prints this
[19:28:42 CET] <BtbN> There is a hwdevice ctx
[19:28:51 CET] <BtbN> that's one layer below the hwframes ctx
[19:29:20 CET] <BtbN> The hwdev is fully initialized by ffmpeg_cuvid.c
[19:30:43 CET] <wm4> BtbN: yes but half-initialized frames contexts in AVCodecContext.hwframes_ctx are not (or should not) really be a thing
[19:31:03 CET] <wm4> oh wow the driver prints this WTF
[19:31:10 CET] <wm4> "  Mastering Display Color Volume SEI luminance is [0.0200, 1200.0000]" <- that and others
[19:31:20 CET] <philipl> nvidia?
[19:31:20 CET] <BtbN> Yes, it's a hack, which can actually be removed now, as the transcode init is done early enough after the last set of ffmpeg.c merges.
[19:31:23 CET] <wm4> philipl: yes
[19:31:39 CET] <wm4> BtbN: you still need to be able to control the device
[19:31:57 CET] <BtbN> hm?
[19:32:02 CET] <BtbN> Don't see an issue there.
[19:32:54 CET] <wm4> there's no API for that, and you abuse the hwframes_ctx
[19:33:29 CET] <BtbN> Well, there always is a hwdevice_ctx in a hwframe_ctx?
[19:33:53 CET] <BtbN> That's how all the ffmpeg.c hwaccels work
[19:34:12 CET] <BtbN> Seems perfectly fine to me.
[19:34:21 CET] <wm4> BtbN: no, hwaccels work by fully initializing the hwframe_ctx
[19:34:28 CET] <wm4> not leaving it somehow half-initialized
[19:34:50 CET] <BtbN> That's left over from before the ffmpeg.c merges, where vital information(namely width/height) wasn't available at that time.
[19:34:55 CET] <wm4> also the decoder is supposed to set sw_pix_format so the callee can init the frames
[19:35:20 CET] <BtbN> Cuvid can't set sw_pix_format before decoding the first frame
[19:36:30 CET] <wm4> that would mean hwframes_ctx is conceptually incompatible with cuvid.c
[19:36:36 CET] <wm4> philipl: hm there is some leak with mpv and cuvid
[19:36:41 CET] <BtbN> Not sure what you're getting at. If you mean that it's bad that the decoder needs to be fed an hwframes_ctx in the first place, and it would be better if it just got a hwdevice_ctx and created an appropriate frames ctx itself, yes, i agree.
[19:36:55 CET] <wm4> philipl: playing two 4K hevc files one after another makes the second fail with an out-of-memory cuda error
[19:37:18 CET] <wm4> BtbN: yes that's what I mean
[19:37:50 CET] <BtbN> Well, I didn't make that API, just made stuff work with it.
[19:38:04 CET] <BtbN> It would indeed be a lot cleaner if the hwframes_ctx was internal to cuvid.c
[19:38:11 CET] <BtbN> And just the device ctx is passed around
[19:38:32 CET] <wm4> so how does cuvid allocate a frame?
[19:38:41 CET] <wm4> does it do that internally in the cuda API?
[19:38:52 CET] <BtbN> It just asks the hwframes_ctx to give it a frame.
[19:39:01 CET] <wm4> what you said implies decoding and frame allocation are done in one step
[19:39:14 CET] <BtbN> They are
[19:39:20 CET] <wm4> I mean how does it know it needs a 10 bit frame?
[19:39:38 CET] <BtbN> From the sw_pix_fmt set on the hwframes_ctx
[19:40:06 CET] <wm4> and if the user didn't set any?
[19:40:26 CET] <BtbN> That's why it late-initializes the hwframes_ctx itself, to set that field itself.
[19:40:47 CET] <wm4> and why can't it call get_format at this point to let the user do that properly?
[19:40:52 CET] <BtbN> If the user set it to a mismatching format and pre-initialized the context, decoding will fail
[19:41:04 CET] <BtbN> get_format cannot be called outside of init.
[19:41:12 CET] <wm4> that's wrong
[19:41:32 CET] <BtbN> I discussed exactly that before in this channel, and the consensus was that calling get_format outside of init was a bad idea.
[19:41:38 CET] <wm4> although currently it's true that it causes problem with ffmpeg.c because ffmpeg.c is garbage (the same would work in avconv.c)
[19:42:01 CET] <wm4> even traditional hwaccels call get_format outside of init
[19:42:10 CET] <wm4> they actually _must_ do that
[19:43:44 CET] <BtbN> cuvid.c actually calls ff_get_format twice. Once on init, to make ffmpeg.c happy, and then again on the sequence callback when the first frame was decoded.
[19:44:38 CET] <wm4> oh let me check something, maybe my funny "caching" breaks that then
[19:45:11 CET] <BtbN> https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/cuvid.c#L672
[19:46:38 CET] <wm4> ok there's indeed a second call, and the second call has the right pixfmt... but decoding still doesn't work
[19:51:50 CET] <wm4> BtbN: ah, it's because cuvid.c doesn't read AVCodecContext.hw_frames_ctx after the get_format
[19:51:56 CET] <wm4> instead it uses some internal thing
[19:52:03 CET] <BtbN> after the second one?
[19:52:06 CET] <wm4> yes
[19:52:17 CET] <philipl> who is the caller in each case?
[19:52:38 CET] <wm4> ff_get_format probably unrefs the AVCodecContext.hw_frames_ctx (because that's the API)
[19:52:52 CET] <wm4> philipl: not sure what you mean
[19:53:17 CET] <BtbN> ff_get_format messes with the hwframes_ctx?
[19:53:38 CET] <philipl> ff_get_format is insane, so yes it does
[19:54:00 CET] <wm4> it has a "av_buffer_unref(&avctx->hw_frames_ctx);"
[19:54:05 CET] <wm4> it makes sense for real hwaccels
[19:54:54 CET] <wm4> basically ff_get_format = pick a format, allocate the required frame pool
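In pseudocode, the behavior wm4 is describing (simplified from this discussion, not the actual lavc source; ordering and details are approximate):

```
ff_get_format(avctx, fmt_list):
    av_buffer_unref(&avctx->hw_frames_ctx)    # the unref that bites cuvid here
    fmt = avctx->get_format(avctx, fmt_list)  # user callback picks a format
    if fmt is a hwaccel format:
        allocate/validate the required frame pool for fmt
    return fmt
```

so any hw_frames_ctx a decoder stashed in avctx before the call is gone afterwards.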
[19:55:26 CET] <BtbN> hm, wonder if a second get_format would be adequate for that case
[19:55:36 CET] <BtbN> Or just calling avctx->get_format manually
[19:56:24 CET] <wm4> better not, that would cause even more confusion
[19:56:41 CET] <wm4> why a second get_format?
[19:56:42 CET] <BtbN> Cause for cuvid, this basically means that avctx->hw_frames_ctx is something useless after that call, and it's lucky to have it copied to an internal ref of the original
[19:56:47 CET] <wm4> (that'd be a third right?)
[19:57:09 CET] <wm4> internal ref? doesn't it start decoding?
[19:57:54 CET] <philipl> I added one of the calls to ff_get_format to resolve the 8 vs 10 bit thing
[19:58:13 CET] <BtbN> It holds a buffer_ref to the avctx->hw_frames_ctx it gets on init
[19:58:16 CET] <BtbN> And uses that
[19:58:42 CET] <philipl> So if it refreshed it after the second ff_get_format call, it might work out?
[19:59:03 CET] <wm4> BtbN: well it shouldn't do that
[19:59:06 CET] <BtbN> No, it would break, because the original ctx was unreffed and deleted
[19:59:17 CET] <wm4> instead it should fetch it again from AVCodecContext
[19:59:36 CET] <BtbN> And just expect that the users get_format set it to something sane?
[19:59:41 CET] <wm4> but I suspect it'd be better to change the API not to use the hw_frames_ctx and introduce a hw_device_ctx?
[19:59:48 CET] <BtbN> Or where is it supposed to come from?
[20:00:06 CET] <wm4> BtbN: maybe keep a device ref, and then compare whether the new frames ctx has the same device ref
[20:00:43 CET] <BtbN> Yes, the new API for just a device_ctx seems way better. This hwframes_ctx is a huge mess.
[20:00:59 CET] <BtbN> And let cuvid.c allocate the frames ctx internally, and just hand out frames from it.
[20:01:16 CET] <wm4> yeah
[20:01:18 CET] <wm4> I agree
[20:01:31 CET] <BtbN> The passing around and seemingly randomly unrefing it at an unexpected point is like a minefield
[20:01:35 CET] <philipl> Yeah. we only have to allocate the hwframe_ctx externally today because the hwdevice has to be allocated externally
[20:01:41 CET] <wm4> the user doesn't really gain anything from allocating the frames ctx himself right?
[20:01:45 CET] <philipl> no
[20:01:49 CET] <philipl> It's a nuisance.
[20:02:09 CET] <wm4> for vaapi for example it's "necessary" because it's the only way to control the surface pool size
[20:02:09 CET] <philipl> Well, unless you go back to my crazy idea for the OpenGL surface pool passed by the caller...
[20:02:13 CET] <BtbN> Checking the hwframes_ctx for an available device can be kept as backwards compat
[20:02:19 CET] <wm4> philipl: aha
[20:02:27 CET] <jkqxz> Where does the hwframe context which needs to get passed into the filter graph come from in that case?
[20:02:48 CET] <wm4> jkqxz: allocated by the decoder internally
[20:02:55 CET] <BtbN> ffmpeg_cuvid.c would still create it as a dummy
[20:03:04 CET] <BtbN> wm4, no, ffmpeg.c needs the hwframes_ctx for... something
[20:03:12 CET] <BtbN> to build filter graphs and stuff
[20:03:14 CET] <philipl> I fear that gets passed into the filter graph before the decoder can create one and then cannot change
[20:03:15 CET] <wm4> well it shouldn't need that
[20:03:25 CET] <wm4> ffmpeg.c is currently quite unclean and messy about this
[20:03:28 CET] <philipl> this is part of what blows up 10bit handling in ffmpeg.c
[20:03:30 CET] <jkqxz> So set as AVCodecContext.hw_frames_ctx for the user to use?
[20:03:33 CET] <wm4> the corresponding avconv.c cleanup was skipped
[20:04:02 CET] <wm4> (because ffmpeg.c was too messy to just merge it)
[20:04:54 CET] <philipl> wm4: So, as long as a caller could optionally provide the hwframes ctx, that scenario could still be supported, but otherwise you'd deal with it internally.
[20:05:19 CET] <philipl> or just fuck it because it's too messy for other reasons to ever actually do.
[20:05:20 CET] <jkqxz> I think that then needs to be readable in get_format(), so kind of the other way around to normal hwaccel, but I think that works.
[20:05:39 CET] <wm4> I guess I'll look what the heck ffmpeg.c needs that frames ctx for (before a frame is returned) - but tomorrow
[20:05:56 CET] <BtbN> wm4, for the sw_pix_fmt
[20:06:00 CET] <BtbN> But no idea where
[20:06:17 CET] <wm4> philipl: well if the user does that, then he'll have to return a frames context for the same device
[20:06:28 CET] <BtbN> That's also the reason why stuff with ffmpeg_cuvid.c explodes with 10bit, as ffmpeg_cuvid.c has no idea about the format, and just hardcodes it to NV12
[20:06:29 CET] <wm4> and of course get_format is still needed to select the format anyway
[20:06:57 CET] <wm4> BtbN: AFAIK avconv.c makes it so that it decodes a first frame before initializing filters
[20:07:13 CET] <BtbN> That would simplify a lot of things
[20:07:23 CET] <BtbN> could just set those fields from the get_format callback then
[20:07:55 CET] <wm4> currently ffmpeg.c outputs yuv420p by default and changes to a hwaccel format by default even for "normal" hwaccels
[20:08:07 CET] <wm4> which causes all kinds of mess and requires hacks all over the place
[20:08:31 CET] <wm4> I ended up with -vf format=nv12|vaapi,hwupload when I tried with vaapi today
[20:08:57 CET] <wm4> even though I intended to do pix_fmt_vaapi -> pix_fmt_vaapi transcoding
[20:09:06 CET] <jkqxz> If hwupload worked for cuda then that would be a usable workaround for cuvid as well, I think.
[20:09:20 CET] <wm4> well it does with the patch I sent today
[20:09:31 CET] <wm4> there's also hwupload_cuda because... I don't know why
[20:09:52 CET] <philipl> but does hwupload actually no-op correctly?
[20:09:58 CET] <philipl> in this kludge?
[20:10:01 CET] <jkqxz> It existed before generic hwupload.  It should just be removed.
[20:10:09 CET] <jkqxz> Yes, it was deliberately written to do so.
[20:10:29 CET] <philipl> Good to know.
[20:10:50 CET] <jkqxz> (Because the other tine had the same hack for a short time before the improvements to avconv which rendered it unnecessary.)
[20:15:00 CET] <wm4> philipl: so, is there anything like zero-copy possible with cuvid+gl? (just asking theoretically)
[20:15:40 CET] <BtbN> zero CPU copy, yes. Zero GPU copy, no.
[20:16:01 CET] <BtbN> you need one more cuMemcpy2D
[20:16:30 CET] <BtbN> Which isn't bad performance wise
[20:16:46 CET] <wm4> so the theoretical minimum is 2 cuMemcpys?
[20:16:53 CET] <philipl> minimum is 1.
[20:16:58 CET] <philipl> With current code, 2.
[20:17:11 CET] <philipl> This is my whole crazy crusade from last year
[20:17:14 CET] <BtbN> Well, the theoretical minimum is 0, but pulling that off would be... one huge mess
[20:17:31 CET] <philipl> BtbN: I didn't see how to make cuvid decode directly to an external buffer
[20:17:38 CET] <BtbN> You can't.
[20:17:40 CET] <philipl> it always wants its own pool that you need to copy out of
[20:17:45 CET] <wm4> I see
[20:17:48 CET] <BtbN> But you can map the cuvid buffer to GL while the cuvid frame is still mapped.
[20:17:57 CET] <philipl> Ah, right. yes, if it's all synchronous
[20:18:19 CET] <philipl> I'm pretty sure you could never build a player that way
[20:18:21 CET] <BtbN> Not even necessarily
[20:18:35 CET] <philipl> Well, huge buffer pool and very close coupling
[20:18:53 CET] <BtbN> if there was a CUVID pix fmt, where the actual cuvid frame was in there instead of a CUDA device pointer, ...
[20:19:01 CET] <BtbN> And the consumer of the frame is the one that maps it
[20:19:25 CET] <philipl> Sure, but the consumer has to be done with the frame before cuvid runs out of frames in the pool and needs it back
[20:19:39 CET] <BtbN> Yes, you need a bit of buffer in cuvid
[20:19:49 CET] <BtbN> but increasing that buffer by a couple frames isn't that bad.
[20:20:06 CET] <BtbN> And usually the delay is just one or two frames until it gets freed
[20:20:16 CET] <jkqxz> You can map a cuvid frame to opengl (or maybe something else), just not cuda?
[20:20:25 CET] <philipl> Well, I could try. I know how to add extra cuda pix fmts :-)
[20:21:15 CET] <philipl> jkqxz: So, today what happens is the decoder has a pool and immediately copies the cuvid owned frame to the pool
[20:21:18 CET] <BtbN> I don't agree with that approach btw ;)
[20:21:29 CET] <BtbN> A pure GPU memcpy is essentially free
[20:21:49 CET] <wm4> and on the renderer side, it's GPU-memcpy'd to the opengl texture again
[20:21:55 CET] <philipl> right
[20:21:59 CET] <philipl> So two copies.
[20:22:29 CET] <philipl> My proposal was to have the consumer pass a pool of opengl textures (as cuda arrays) to replace the decoder's pool.
[20:22:44 CET] <wm4> sounds complex
[20:22:47 CET] <philipl> That cuts you down to one copy but requires a new pix fmt (cuda arrays are not flat memory) and doesn't work for other reasons
[20:23:38 CET] <wm4> yeah, sounds not worth the trouble I guess
[20:23:41 CET] <philipl> BtbN outlined an alternative, to try and remove the separate pool and pass the cuvid owned frames out directly.
[20:23:45 CET] <philipl> That's actually still one copy
[20:23:55 CET] <philipl> As it's still not an opengl texture
[20:24:10 CET] <BtbN> Can't you map a cuvid frame straight to a texture?
[20:24:14 CET] <philipl> No
[20:24:21 CET] <philipl> It has to be a cuda array
[20:24:26 CET] <philipl> and cuvid can't use arrays
[20:24:36 CET] <wm4> I wonder how this compares to vaapi
[20:24:51 CET] <wm4> where we seemingly can map a surface as texture without any copies
[20:24:51 CET] <BtbN> I think cuvid has its own OpenGL interop
[20:25:14 CET] <philipl> wm4: vaapi's native surfaces with the zero-copy mapping are the moral equivalent of the cuda arrays
[20:25:19 CET] <wm4> or vdpau, where surfaces are mapped as 2 fields (as wtf as it sounds)
[20:25:22 CET] <philipl> BtbN: never saw it
[20:25:44 CET] <philipl> the cuda-gl interop maps textures as cuda arrays
[20:25:59 CET] <philipl> it's also only one way. You can't turn a cuda allocated anything into a GL anything
[20:26:03 CET] <philipl> Off for lunch.
[20:26:08 CET] <philipl> Have fun...
[20:29:31 CET] <philipl> and no, there is no cuvid specific interop.
[23:49:26 CET] <haasn> philipl: have you looked at cuvid's vulkan interop (if it exists) at all?
[00:00:00 CET] --- Tue Jan 17 2017

