[Ffmpeg-devel-irc] ffmpeg-devel.log.20180509

burek burek021 at gmail.com
Thu May 10 03:05:03 EEST 2018


[00:00:00 CEST] <cone-738> ffmpeg Michael Niedermayer master:293a6e83325a: avformat/mxfenc: Write transfer characteristic
[05:36:38 CEST] <philipl> BtbN: I'm trying out your patches with mpv and I'm seeing glitched playback
[05:36:54 CEST] <philipl> I tried forcing up the pool size but that didn't change anything so not immediately clear what the issue is.
[05:37:21 CEST] <philipl> I'm seeing incorrect macroblocks at various points - like you'd expect if information was missing.
[05:47:03 CEST] <philipl> BtbN: Mind you, I'm only seeing this with the infamous Sony 4k Camp video. I've got other 4k hevc stuff here that looks fine.
[05:57:42 CEST] <philipl> BtbN: ok - scratch that. It was always broken. I guess I never tried it before.
[05:58:05 CEST] <philipl> The sample plays correctly with cuvid, so I guess we're doing something wrong populating all the pic params. So unrelated issue.
[05:58:11 CEST] <philipl> Other samples all played fine.
[08:14:55 CEST] <lrusak> jkqxz, I've looked over your comments on my patch
[08:15:41 CEST] <lrusak> I've no idea how AVCodecContext.get_format works. I assume if rkmpp outputted anything other than AV_PIX_FMT_DRM_PRIME that it would have to use that too?
[08:17:48 CEST] <lrusak> we check avbuf->num_planes > 1 because well, that's what the existing code did and also because we will need special handling for NV12M
[08:19:41 CEST] <lrusak> plane_info[n].bytesperline set for n > 0 <<< no, as we currently only work with single plane
[08:20:07 CEST] <lrusak> no 10bit handling (I'm not sure if any driver supports 10bit output)
[08:20:27 CEST] <lrusak> I've split out the cosmetic changes for the next round
[08:21:05 CEST] <lrusak> I'll have to look into the hw_frames_ctx reference
[08:22:08 CEST] <lrusak> and no hwdownload doesn't work and I have no idea what that is for
[10:50:30 CEST] <BtbN> philipl, interesting, so even without the series applied, it's broken with nvdec?
[11:00:51 CEST] <durandal_1707> ubitux: so you wanna implement bm3d too?
[11:03:42 CEST] <ubitux> nah
[12:43:10 CEST] <BtbN> philipl, do you see any increase in performance/decrease in CPU/system load when commenting out this line and doing a transcode? http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/nvenc.c;h=d61955c03ac3d4f3ade6653b7f2056ac083c66a4;hb=HEAD#l1868
[12:44:04 CEST] <BtbN> Someone is reporting that his system load "went from 15 to about 55" after the patch introducing it, and reverting it fixes it.
[12:44:10 CEST] <BtbN> 32bc4e77f61a5483c83a360b9ccbfc2840daba1e
[12:51:51 CEST] <durandal_1707> bm3d can use dct, we already have that it appears
[13:02:11 CEST] <durandal_1707> anyone willing to comment on fftdnoiz filter?
[13:18:37 CEST] <durandal_1707> gagandeep_: are you active?
[13:19:19 CEST] <gagandeep_> yeah, i am looking in p frames
[13:19:40 CEST] <gagandeep_> i just downloaded the samples in ffmpeg directory but 2 of them are intra frames
[13:19:48 CEST] <gagandeep_> and 2 have buggy containers
[13:21:17 CEST] <gagandeep_> initially i was trying to stop threaded tasks in Testcfhd
[13:22:27 CEST] <gagandeep_> but bayer.c is used by decoder.c in such a way that turning off the THREADED macro in cineform causes compilation errors
[13:26:10 CEST] <gagandeep_> durandal_1707: for the last 2 days i have been looking in the code for p-frames, and since they aren't exactly consistent with their naming scheme it takes time to find connections in code
[13:27:09 CEST] <BtbN> Anyone have a better idea how to do this version checking? https://github.com/BtbN/FFmpeg/commit/39fca52122bbfda7a9b6a966b37b6296dba73afe#diff-e2d5a00791bce9a01f99bc6fd613a39dR5890
[13:33:17 CEST] <kierank> gagandeep_: should be temporal transform files in FFmpeg cfhd samples directory
[13:34:58 CEST] <gagandeep_> kierank: i tried opening 2 files of samples even in after effects on my friend's pc, but those 2 were not working, also not in TestCFHD, and the rest of files (2) are using intra frame function in TestCFHD
[13:35:38 CEST] <gagandeep_> cineform differentiates the type of function it needs based on the sample type
[13:37:23 CEST] <gagandeep_> i think it uses group of frames (or gop) for p-frames
[13:37:27 CEST] <kierank> I need to find the name of the file, but I had some picture of a mountain
[13:37:30 CEST] <kierank> It doesn't use gop
[13:37:48 CEST] <kierank> It has repeat frame flags but then it uses different transform type
[13:37:54 CEST] <kierank> And it does a 3d transform
[13:38:57 CEST] <gagandeep_> i don't mean gop as in mpeg
[13:39:18 CEST] <gagandeep_> i think here they use gop as their own naming scheme
[13:40:42 CEST] <kierank> Link?
[13:43:49 CEST] <gagandeep_> https://github.com/gopro/cineform-sdk/blob/master/Codec/codec.h#L964
[13:44:07 CEST] <gagandeep_> sample struct only contains iframe and group
[13:44:28 CEST] <gagandeep_> they only use group naming all over codec.h
[13:45:19 CEST] <gagandeep_> only in 2 or 3 places in codec.h they use p-frame or inter frame
[13:48:54 CEST] <kierank> gagandeep_: https://samples.ffmpeg.org/V-codecs/CFHD/MT_BeartoothHighway_1min_Cineform.avi
[13:49:24 CEST] <gagandeep_> yeah i have downloaded it, it gives error with testcfhd but opened in after effects
[13:49:48 CEST] <kierank> I would try fixing testcfhd then
[13:49:54 CEST] <kierank> otherwise you have no way of testing the clip
[13:49:55 CEST] <gagandeep_> you sure this is p-frame
[13:49:57 CEST] <kierank> yes
[13:50:03 CEST] <kierank> it doesn't play in ffmpeg iirc
[13:50:04 CEST] <gagandeep_> i will then focus on this file
[13:51:12 CEST] <gagandeep_> kierank: i need to go afk, anything else you want to ask
[13:51:21 CEST] <kierank> nothing
[15:16:59 CEST] <jkqxz> lrusak:  Yes, it would (rkmpp avoids needing get_format by only supporting the DRM object output).  I'm not sure what the best reference for how to do this is, maybe the non-hwaccel hardware decoders in cuvid or qsv?  Both have weird quirks, though.
[15:17:39 CEST] <jkqxz> lrusak:  So V4L2 NV12 /must/ have the chroma plane immediately following the luma plane in memory, and the pitch of the two planes /must/ be identical?  If that's true then sure.
[15:18:36 CEST] <jkqxz> I think newer things should support 10-bit output (even on mobile; Rockchip certainly does).  If this isn't supported in V4L2 yet then I guess it can be ignored, though.
[15:21:16 CEST] <jkqxz> hwdownload is a filter which takes a hardware frame and downloads it into a software frame, using the common hwcontext stuff.  You should make it work, because that and mapping are clearly useful here once you go beyond trivial playback cases.
[16:15:35 CEST] <cone-942> ffmpeg James Almer master:c6a63e11092c: avcodec/cbs_h2645: use AVBufferRef to store list of active parameter sets
[16:20:58 CEST] <philipl> BtbN: So yeah - the Sony Camp video was broken before your change. I did not yet try to bisect to see if any of the recent hevc pic param changes are involved.
[16:21:48 CEST] <philipl> As for register vs map. For OpenGL Interop, you are supposed to register/map/unregister at the beginning and then you unmap when you're finished with the surface. This was badly documented and I had bad CPU usage too until I changed it.
[16:21:58 CEST] <philipl> I would guess it's the same here, given the terminology.
[16:22:20 CEST] <philipl> ie: You only need to be registered when you are mapping.
[17:02:35 CEST] <jdarnley> https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/vc2enc_dwt.c#L78
[17:03:14 CEST] <jdarnley> atomnuker: are the factors right in that ^ line or am I missing something?
[17:03:33 CEST] <jdarnley> The taps(?) of the filter are -1 9 9 -1
[17:04:04 CEST] <jdarnley> And the samples are mirrored at the edges.
[17:05:09 CEST] <jdarnley> So shouldn't the mirroring for synthl[-2] be synthl[2]?
[17:05:16 CEST] <atomnuker> not sure, those transforms came from the gsoc of 10 years ago
[17:05:23 CEST] <jdarnley> Oh
[17:06:32 CEST] <jdarnley> To finish my question completely...
[17:07:09 CEST] <jdarnley> Shouldn't it be this?  9*synthl[0] + 8*synthl[2]
[17:08:29 CEST] <jdarnley> I need to test against a reference
[17:11:55 CEST] <philipl> BtbN: I commented it out. No change in fps but CPU usage went down 1 or 2%. fps is maxed anyway I think. nvidia-settings reports 100% utilisation.
[17:12:15 CEST] <philipl> But yeah, I think you can unregister right after mapping.
[17:12:40 CEST] <BtbN> I have to
[17:12:51 CEST] <BtbN> the frame is unrefed right after, so not unregistering _should_ make it explode
[17:12:58 CEST] <BtbN> i don't really understand why it doesn't
[17:13:38 CEST] <philipl> I mean, you should unregister immediately after you *map*
[17:13:58 CEST] <BtbN> hm? That seems a bit early
[17:14:01 CEST] <philipl> If this works the same way as opengl interop, it only needs to be registered to map it. Nothing else.
[17:14:34 CEST] <BtbN> Given that _nothing_ happens if you never unmap anything, I suspect it actually does nothing
[17:17:34 CEST] <philipl> Maybe not. It's confusing.
[17:17:44 CEST] <philipl> Could always ask the nvidia guys. Might even get an answer...
[17:21:52 CEST] <philipl> BtbN: certainly the docs say you should register/map/use/unmap/unregister
[17:22:13 CEST] <BtbN> Well, it's certainly required by the docs
[17:22:22 CEST] <BtbN> but just not doing it doesn't break anything
[17:22:34 CEST] <BtbN> which makes me suspect it's currently a noop
[17:22:35 CEST] <philipl> I suppose it's leaking something.
[17:22:45 CEST] <philipl> There's an entry somewhere that's going to stick around forever.
[17:23:07 CEST] <BtbN> Well, given that in transcode-mode, there will only ever be the same ~10 frames re-used at all times...
[17:23:21 CEST] <BtbN> I can see why not unregistering them makes it more efficient
[17:23:56 CEST] <BtbN> hm, the only thing stopping me from pushing the series is the weird configure version thing
[17:24:02 CEST] <BtbN> But I really don't think it can be done nicer
[17:31:00 CEST] <philipl> BtbN: I don't see how either.
[17:32:10 CEST] <philipl> BtbN: So, remind me. The frames here are being reused, ok. So you really just want to register them up front (or at least when you first see them) and then unregister when closing - that's perfectly fine and doesn't contradict anything, right?
[17:43:07 CEST] <BtbN> nvenc has no idea the frames are going to be reused
[17:43:38 CEST] <BtbN> for all it knows, after unref the pointers are invalid
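
A minimal sketch of the "register when first seen, unregister when closing" idea from the exchange above. `register_resource` and `unregister_resource` are hypothetical stand-ins that only count calls; they are not the real nvenc API, and `MAX_REGISTERED` is an assumed pool size. The cache keys on pointer identity, which is exactly the assumption BtbN questions: it is only safe if the pointers stay valid after unref.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGISTERED 16  /* assumed small pool, as in the "~10 frames" case */

/* Hypothetical stand-ins for the driver's register/unregister calls;
 * they only count invocations so the effect of caching is visible. */
static int register_calls, unregister_calls;
static int  register_resource(const void *ptr) { (void)ptr; return ++register_calls; }
static void unregister_resource(int handle)    { (void)handle; unregister_calls++; }

typedef struct RegCache {
    const void *ptr[MAX_REGISTERED];
    int         handle[MAX_REGISTERED];
    int         count;
} RegCache;

/* Return the handle for ptr, registering it only the first time it is
 * seen. Returns -1 if the table is full. */
static int cache_get(RegCache *c, const void *ptr)
{
    assert(c && ptr);
    for (int i = 0; i < c->count; i++)
        if (c->ptr[i] == ptr)
            return c->handle[i];
    if (c->count == MAX_REGISTERED)
        return -1;
    c->ptr[c->count]    = ptr;
    c->handle[c->count] = register_resource(ptr);
    return c->handle[c->count++];
}

/* Unregister everything when the encoder is closed. */
static void cache_close(RegCache *c)
{
    assert(c);
    for (int i = 0; i < c->count; i++)
        unregister_resource(c->handle[i]);
    c->count = 0;
}
```

With a pool of reused frames, each buffer is registered once rather than once per submitted frame, and nothing is left registered after `cache_close`.
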
[17:54:40 CEST] <jdarnley> atomnuker: sorry about the noise.  I was mistaken, they don't get mirrored, they get clipped.
[18:10:05 CEST] <atomnuker> jdarnley: you're still going to verify them, right?
[18:16:15 CEST] <jdarnley> yes
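
The arithmetic behind jdarnley's question can be sketched as follows. This is only an illustration of the two edge policies for the -1 9 9 -1 taps, with hypothetical function names; it is not the code from vc2enc_dwt.c and leaves out any rounding or shifting.

```c
#include <assert.h>

/* Weighted sum for the (-1, 9, 9, -1) taps over four neighbouring samples. */
static int predict(int a, int b, int c, int d)
{
    return -a + 9 * b + 9 * c - d;
}

/* Left edge: the neighbours sit at indices -2, 0, 2, 4 of s[], and
 * index -2 lies outside the array.
 * Mirrored: s[-2] is replaced by s[2]  ->  9*s[0] + 8*s[2] - s[4]
 * Clamped:  s[-2] is replaced by s[0]  ->  8*s[0] + 9*s[2] - s[4] */
static int edge_mirrored(const int *s)
{
    assert(s);
    return predict(s[2], s[0], s[2], s[4]);
}

static int edge_clamped(const int *s)
{
    assert(s);
    return predict(s[0], s[0], s[2], s[4]);
}
```

Mirroring gives the 9 and 8 factors jdarnley first expected; clamping the out-of-range sample swaps them to 8 and 9.
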
[18:38:57 CEST] <jamrial> jkqxz: would https://pastebin.com/AufYmR2q work around the remaining case of potential dangling references?
[19:21:57 CEST] <durandal_1707> can i apply dct of non power of 2 with padding do something with it and do inverse without affecting quality?
[19:23:41 CEST] <jkqxz> jamrial:  Seems reasonable?  Though it also loses the copy property, so the user can unexpectedly change the stored SPS again.  (That was also true before the previous change, so probably doesn't matter.)
[19:26:04 CEST] <atomnuker> durandal_1707: no
[19:26:47 CEST] <atomnuker> what are you trying to do?
[19:28:35 CEST] <durandal_1707> atomnuker: 3d dct for bm3d
[19:28:55 CEST] <jamrial> jkqxz: that's why i set the read_only flag, even if it's not foolproof
[19:30:43 CEST] <durandal_1707> atomnuker: what are minimal and maximal bits for av_dct ?
[19:34:50 CEST] <atomnuker> must be between 4 and 15
[19:35:10 CEST] <atomnuker> it's limited by the rdft
[19:36:45 CEST] <durandal_1707> atomnuker: why not 3?
[19:44:22 CEST] <atomnuker> because of how rdft splits the transform into 2x ffts
[19:44:55 CEST] <atomnuker> though looking at the code I think it would be safe to go down to 2 bits
[19:50:25 CEST] <durandal_1707> atomnuker: please add it
[19:55:12 CEST] <atomnuker> I'll need to test it first
[19:56:07 CEST] <atomnuker> if you can go ahead, remove the limit and test it though, I'm working on detangling the FFT from lavc
[19:57:26 CEST] <durandal_1707> it just needs new tables?
[20:02:40 CEST] <ubitux> it seems i have temporal denoising working in nlmeans.
[20:09:12 CEST] <ubitux> i used the "activate" model, but i'm not sure i'm doing it exactly right
[20:12:18 CEST] <ubitux> durandal_1707: https://github.com/ubitux/FFmpeg/commits/nlmeans-temporal
[20:12:21 CEST] <ubitux> if you want to test
[20:12:59 CEST] <durandal_1707> ubitux: why you use activate? its only for >1 inputs
[20:13:54 CEST] <ubitux> dunno, the buffering logic i needed felt a bit tricky
[20:14:37 CEST] <ubitux> you think i should use request_frame callback instead? 
[20:16:43 CEST] <durandal_1707> ubitux: if you could peek frames from internal lavfi queue keep it
[20:18:50 CEST] <durandal_1707> see ff_framequeue_peek
[20:20:19 CEST] <ubitux> i guess i'll try to switch to the "legacy" model
[20:20:53 CEST] <philipl> BtbN: isp fell over. Miss anything?
[20:21:19 CEST] <BtbN> <BtbN> nvenc has no idea the frames are going to be reused
[20:21:19 CEST] <BtbN> <BtbN> for all it knows, after unref the pointers are invalid
[20:22:37 CEST] <durandal_1707> ubitux: no!
[20:23:03 CEST] <ubitux> no?
[20:23:14 CEST] <ubitux> legacy = filter_frame+request_frame
[20:23:19 CEST] <ubitux> you want me to keep activate?
[20:23:38 CEST] <durandal_1707> ubitux: i see that it's a nicer solution than legacy or keeping frames in filter private struct
[20:24:13 CEST] <durandal_1707> ubitux: yes, you do not need to store frame in private filter place, just use lavfi's fifos
[20:24:49 CEST] <durandal_1707> use #define FF_INTERNAL_FIELDS 1
[20:24:52 CEST] <ubitux> do i have the same control on the fifo?
[20:25:06 CEST] <ubitux> like, i need a delay of `n` (user specified)
[20:25:14 CEST] <ubitux> and that's basically it
[20:25:23 CEST] <ubitux> (+ making sure the flush works)
[20:25:39 CEST] <durandal_1707> you can peek any frame in fifo, and just request for more until eof
[20:25:51 CEST] <ubitux> well, i don't exactly need absolute peek
[20:26:00 CEST] <ubitux> it's linear, the cycling i'm using is just to avoid memmoves
[20:26:34 CEST] <ubitux> it's not much code
[20:26:36 CEST] <durandal_1707> i'm just saying that lavfi already provides fifo with .activate model
[20:26:51 CEST] <durandal_1707> it can be rewritten later
[20:27:10 CEST] <durandal_1707> you can keep .activate ....
[20:27:23 CEST] <ubitux> mmh
[20:29:58 CEST] <ubitux> is there any filter currently making use of it? 
[20:33:06 CEST] <durandal_1707> ubitux: making use of what?
[20:33:11 CEST] <ubitux> the fifo
[20:33:14 CEST] <JEEB> framesync2 I guess?
[20:34:00 CEST] <durandal_1707> ubitux: amerge
[20:34:42 CEST] <ubitux> mmh
[20:34:47 CEST] <ubitux> well, i'll check later
[20:34:54 CEST] <ubitux> not sure all of this is worth the hassle
[20:35:12 CEST] <ubitux> anyway, if i understand https://github.com/Khanattila/KNLMeansCL/wiki/Filter-description correctly, KNLMeansCL(clip, 6, 32, 3, 4.0) translates to nlmeans=n=13:r=33:p=3:s=4
[20:35:18 CEST] <ubitux> it's taking a while :p
[20:36:14 CEST] <ubitux> it's really blurry, so i guess it's not exactly this
[20:36:19 CEST] <ubitux> maybe s=0.4...
[20:36:35 CEST] <ubitux> which isn't supported
[20:36:37 CEST] <ubitux> oh well.
[20:37:32 CEST] <BtbN> hm, is FFALIGN(x, 256)*2 the same as FFALIGN(x*2, 256)?
[20:38:32 CEST] <BtbN> for all cases i tested it, it was. But I'm not sure if it's universally true
[20:43:04 CEST] <jdarnley> BtbN: I think the first is FFALIGN(x*2, 512)
[20:43:18 CEST] <ubitux> with x=3, FFALIGN(x, 256)*2=512 and FFALIGN(x*2, 256)=256
[20:43:40 CEST] <ubitux> it's basically false from x=1 to x=128
[20:43:53 CEST] <BtbN> ok, so the hwcontext_cuda implementation is wrong.
[20:45:28 CEST] <BtbN> wrong in a not trivially fixable fashion, hm
[20:48:37 CEST] <BtbN> It calculates the buffer size using "FFALIGN(width, 256) * element_size", while the code to fill the data and linesize pointers uses FFALIGN(width * element_size, 256)
[20:48:47 CEST] <BtbN> the latter seems more correct to me
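
A small sketch of the two pitch calculations being compared. The `FFALIGN` definition matches the macro in libavutil; the two helper names are made up for this illustration and do not appear in hwcontext_cuda.

```c
#include <assert.h>

/* Round x up to a multiple of a (a must be a power of two); same
 * definition as FFmpeg's FFALIGN macro in libavutil. */
#define FFALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Pitch in bytes when the width in pixels is aligned first. */
static int pitch_align_then_scale(int width, int element_size)
{
    assert(width > 0 && element_size > 0);
    return FFALIGN(width, 256) * element_size;
}

/* Pitch in bytes when the width in bytes is aligned. */
static int pitch_scale_then_align(int width, int element_size)
{
    assert(width > 0 && element_size > 0);
    return FFALIGN(width * element_size, 256);
}
```

For an element size of 2, the two results differ whenever `width % 256` falls in 1..128, so not only for widths 1 to 128 (a width of 257 gives 1024 versus 768). The align-then-scale form is never the smaller of the two, and, as jdarnley notes, it equals `FFALIGN(x*2, 512)`.
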
[20:54:31 CEST] <durandal_1707> ubitux: amerg4
[20:54:44 CEST] <durandal_1707> please ignore
[20:55:26 CEST] <philipl> BtbN: you found a case where you get different results?
[20:55:39 CEST] <BtbN> well, it's unlikely, but possible
[20:55:58 CEST] <BtbN> it's also silly to have the logic duplicated
[20:57:19 CEST] <philipl> certainly
[21:00:06 CEST] <BtbN> hm, pixdesc isn't overly helpful for this either, as it behaves differently for RGB
[21:02:36 CEST] <philipl> You have a use-case for RGB or just doing it for fun?
[21:02:49 CEST] <BtbN> well, if you want to pass OpenGL frames to nvdec
[21:02:54 CEST] <BtbN> You need to copy from that Array
[21:03:07 CEST] <BtbN> so having a frames_ctx actually just give you the RGB frames is handy
[21:03:25 CEST] <BtbN> and it's like 4 lines of code
[21:05:06 CEST] <philipl> BtbN: to nvenc you mean
[21:05:19 CEST] <BtbN> yeah
[21:27:07 CEST] <cone-942> ffmpeg Aman Gupta master:dd77cca1c4b4: avcodec/videotoolbox: cleanups
[21:27:08 CEST] <cone-942> ffmpeg Aman Gupta master:07d175d0b0b2: avcodec/videotoolbox: split h264/hevc callbacks
[21:27:09 CEST] <cone-942> ffmpeg Aman Gupta master:a19bac8fc8b6: avcodec/hevc: remove videotoolbox hack
[21:30:32 CEST] <cone-942> ffmpeg Aman Gupta master:12ceaf0fbacb: ffprobe: fix SEGV when new streams are added
[21:34:26 CEST] <tmm1> JEEB: found an edge case bug in mpegts section parser where it ignores multiple tables in a single packet
[21:34:37 CEST] <JEEB> \o/
[21:36:13 CEST] <BtbN> philipl, this logic in hwcontext_cuda is completely messed up wtf
[21:36:28 CEST] <BtbN> it gets to the right result, but it's really convoluted
[21:36:43 CEST] <philipl> I take no credit for it, fortunately.
[21:36:52 CEST] <philipl> or is this my 10bit handling?
[21:37:09 CEST] <BtbN> 10 bit is just 16 bit for this purpose
[21:37:15 CEST] <BtbN> this only cares about bytes
[21:37:43 CEST] <BtbN> calling it aligned_width is just incredibly confusing as well
[21:38:16 CEST] <BtbN> https://github.com/BtbN/FFmpeg/commit/6a25397a86b636c98bac3dd0c4432ea916a103e7
[21:38:57 CEST] <philipl> I think the CUDA alignment part was your doing :-)
[21:39:11 CEST] <BtbN> I feel like all of this can also easily be generalized using pixdesc. All the info should be in there.
[21:39:19 CEST] <philipl> Agreed on that.
[21:39:20 CEST] <BtbN> It's just not straight forward because of the alignment
[21:39:40 CEST] <BtbN> if not for the alignment, this would just be av_get_padded_bits_per_pixel() * height
[21:40:02 CEST] <philipl> Did we prove that cuda alignment made a difference?
[21:40:23 CEST] <BtbN> cuMemcpy2D will straight up error on you if you don't align.
[21:40:28 CEST] <cone-942> ffmpeg Aman Gupta master:2c500f50972c: avformat/mpegts: skip non-PMT tids earlier
[21:40:48 CEST] <BtbN> unless you use cuMemcpy2DUnaligned, which will be dirt slow in cases where the normal function would error
[21:41:31 CEST] <BtbN> If not for YUV420P, this could all just use cuMemAllocPitch.
[21:41:40 CEST] <BtbN> But cuMemAllocPitch does not support different line-sizes per component
[21:42:25 CEST] <BtbN> I don't think we have ever tested hwframe input with YUV420P in it to nvenc though
[21:42:30 CEST] <BtbN> it might be plain broken
[21:42:38 CEST] <philipl> It's hard to produce that scenario
[21:42:43 CEST] <JEEB> tmm1: btw that PMT thing, doesn't parse_section_header already move the pointer?
[21:42:51 CEST] <JEEB> or do we put another copy of the pointer into pmt_cb
[21:43:01 CEST] <JEEB> so if we early return from it, we can then just fallback
[21:43:11 CEST] <BtbN> nvenc clearly only wants one pitch parameter, over all planes
[21:43:16 CEST] <philipl> BtbN: anyway, change looks reasonable.
[21:43:27 CEST] <BtbN> but I think it's documented that for YUV420 it will halve the pitch for plane 1 and 2
[21:44:36 CEST] <BtbN> it's also still funny that the data planes 1 and 2 are swapped, just to make ffmpeg yuv420 and nvenc yuv420 match up
[21:45:14 CEST] <philipl> Yeah. More evidence of disuse...
[21:45:15 CEST] <tmm1> JEEB: the issue was skip_identical changes tssf->last_ver
[21:45:27 CEST] <JEEB> yea
[21:45:46 CEST] <tmm1> the stream pointer doesn't matter, although i think pmt_cb has its own pointer into the buffer
[21:45:52 CEST] <BtbN> philipl, hm, actually, it's pretty easy to test yuv420p hw input into nvenc
[21:45:54 CEST] <JEEB> ah, ok
[21:46:01 CEST] <BtbN> just putting a hwupload_cuda into any sw chain will achieve it
[21:46:02 CEST] <JEEB> yea, that's the only thing that kept me from going LGTM
[21:46:03 CEST] <BtbN> and it works
[21:46:08 CEST] <JEEB> since I didn't have the time to double-check it
[21:46:30 CEST] <tmm1> yea for sections the entire thing is copied into a temp buffer anyway
[21:46:31 CEST] <JEEB> but yup, then that was LGTM :)
[21:46:37 CEST] <JEEB> ah, ok
[21:46:44 CEST] <JEEB> I've really just begun hacking on mpegts.c
[21:46:44 CEST] <tmm1> and then it only parses the first table in the buffer heh
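
What "multiple tables in the buffer" means can be sketched with a loop over the generic PSI section header layout. This is an illustrative standalone function with a made-up name, not the mpegts.c section parser, and it only counts sections instead of dispatching them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the PSI sections packed back to back in buf[0..size).
 * Each section starts with a 3-byte header: table_id, then a 12-bit
 * section_length in the low bits of the next two bytes, counting the
 * bytes that follow the header. A table_id of 0xFF marks stuffing. */
static int count_sections(const uint8_t *buf, size_t size)
{
    size_t pos = 0;
    int n = 0;

    assert(buf || size == 0);
    while (pos + 3 <= size && buf[pos] != 0xFF) {
        size_t len = 3 + (size_t)(((buf[pos + 1] & 0x0F) << 8) | buf[pos + 2]);
        if (pos + len > size)
            break;          /* truncated section: stop */
        n++;                /* a real parser would dispatch on buf[pos] here */
        pos += len;
    }
    return n;
}
```

A parser that handles only the first header and then returns would report one section for a buffer where this loop finds two or more.
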
[21:46:46 CEST] <philipl> BtbN: The get_buffer stuff is confusing.
[21:47:11 CEST] <BtbN> you mean the offset calculations in cuda_get_buffer?
[21:47:21 CEST] <philipl> Why is data[2] taken as an offset from data[0] and data[1] an offset from data[2]
[21:47:26 CEST] <JEEB> like https://github.com/jeeb/ffmpeg/commits/mpegts_arib_stuff
[21:47:26 CEST] <philipl> What is data[1] originally?
[21:47:38 CEST] <cone-942> ffmpeg Aman Gupta master:1a14e3914581: avformat/mpegts: use MAX_SECTION_SIZE instead of hardcoded value
[21:47:39 CEST] <cone-942> ffmpeg Aman Gupta master:07d9c31055e6: avformat/mpegts: clean up whitespace
[21:47:57 CEST] <BtbN> philipl, just because data[2] and data[1] are swapped
[21:48:14 CEST] <BtbN> that's exactly the yuv420p special case
[21:48:18 CEST] <philipl> Oh, ok. Because it starts off contiguous and then it has to be calculated relative to the previous plane.
[21:48:21 CEST] <philipl> yeah, yeah.
[21:54:00 CEST] <BtbN> calculating the size of a plane is not easy in a generic way
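
A rough sketch of why the per-plane size is awkward once alignment is involved. `PlanarDesc` is a hypothetical, simplified stand-in for a pixel format descriptor (planar formats only, whole bytes per sample); it is not the pixdesc API, and `ALIGN_UP` assumes a power-of-two alignment.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical, simplified format description: bytes per sample and the
 * log2 chroma subsampling factors. Planar formats only. */
typedef struct PlanarDesc {
    int bytes_per_sample;
    int log2_chroma_w;
    int log2_chroma_h;
} PlanarDesc;

/* Size in bytes of one plane (0 = luma, 1 or 2 = chroma) with each row
 * padded to a multiple of align bytes. */
static size_t plane_size(const PlanarDesc *d, int plane,
                         int width, int height, int align)
{
    assert(d && width > 0 && height > 0 && align > 0);
    int sw = plane ? d->log2_chroma_w : 0;
    int sh = plane ? d->log2_chroma_h : 0;
    int w  = (width  + (1 << sw) - 1) >> sw;   /* round up for odd sizes */
    int h  = (height + (1 << sh) - 1) >> sh;

    return (size_t)ALIGN_UP(w * d->bytes_per_sample, align) * h;
}
```

Aligning each plane on its own does not always produce a chroma pitch of exactly half the luma pitch: for 8-bit 4:2:0 at width 100 and 256-byte alignment, both pitches come out as 256. That matters for a consumer that derives the chroma pitch from a single luma pitch.
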
[22:25:48 CEST] <BtbN> philipl, am I missing something, or should this just do it? https://github.com/BtbN/FFmpeg/commit/8eac960e89de6ce6fb614a9a401a99a043ed354e
[22:30:52 CEST] <philipl> BtbN: Sure - I didn't know about av_image.
[22:31:04 CEST] <BtbN> Why was this not done like this in the first place?
[22:31:11 CEST] <philipl> For that reason.
[22:31:15 CEST] <philipl> None of us knew about it.
[22:31:41 CEST] <BtbN> I knew those functions existed. I just had no idea they could be used.
[22:32:07 CEST] <BtbN> the supported pix fmt list is now kinda silly
[22:32:10 CEST] <BtbN> it supports all of them
[22:32:56 CEST] <philipl> Is there a way to express that?
[22:33:05 CEST] <BtbN> not really
[22:33:36 CEST] <BtbN> since it fills a valid_sw_formats list
[22:33:38 CEST] <philipl> So. shouldn't the YUV420P swap happen inside nvenc?
[22:33:44 CEST] <BtbN> can't
[22:33:54 CEST] <BtbN> hwinput only wants the main pointer
[22:33:56 CEST] <philipl> if you view this as generic frame storage, it shouldn't silently flip it
[22:34:00 CEST] <philipl> ugly
[22:34:15 CEST] <BtbN> if you don't use it for nvenc, but only for yourself, the swapping has no meaning for you
[22:34:29 CEST] <BtbN> you grab the pointers from data[x], and don't even notice the swapping
[22:34:42 CEST] <BtbN> it's only nvenc which registers only the data[0] pointer and goes from there that cares
[22:35:34 CEST] <philipl> That's confusing.
[22:35:45 CEST] <philipl> That implies the buffer is already flipped
[22:35:58 CEST] <BtbN> nvenc just doesn't expect YUV420P, but I420
[22:36:05 CEST] <BtbN> which is identical, except that V comes first
[22:36:26 CEST] <philipl> Yes, but I mean you say we pass data[0] to nvenc and it does everything internally.
[22:36:29 CEST] <BtbN> to avoid entirely pointless and costly pix_fmt conversions, ffmpeg pretends it's YUV420P
[22:36:32 CEST] <philipl> So it must already be flipped.
[22:36:38 CEST] <philipl> or how can it work
[22:36:43 CEST] <BtbN> no, nvenc just uses a different pix_fmt
[22:36:46 CEST] <philipl> if nvenc isn't looking at the flipped data[1] and data[2]
[22:36:47 CEST] <BtbN> internally
[22:36:57 CEST] <BtbN> nvenc expects the planes in order Y, V and U
[22:37:01 CEST] <BtbN> ffmpeg in Y, U and V
[22:37:06 CEST] <philipl> yes...
[22:37:29 CEST] <BtbN> and as nvenc only looks at the Y pointer, and goes from there, you can just swap the U and V pointers in ffmpeg, and you're good to go
[22:37:46 CEST] <philipl> This is the part that still confuses me.
[22:37:51 CEST] <philipl> When is the data actually rearranged?
[22:38:03 CEST] <BtbN> never
[22:38:11 CEST] <BtbN> it's always written to the place nvenc expects
[22:38:17 CEST] <BtbN> ffmpeg uses 3 separate pointers, one per plane
[22:38:23 CEST] <BtbN> nvenc uses one for all 3
[22:38:30 CEST] <BtbN> which is why you can do that in the first place
[22:38:46 CEST] <philipl> Ok. Ok.
[22:38:55 CEST] <philipl> So, get_buffer allocates a single contiguous region.
[22:39:05 CEST] <philipl> We transfer the data plane by plane
[22:39:22 CEST] <philipl> so at transfer time, they are written in I420 order but the pointers are assigned as YUV420
[22:39:28 CEST] <philipl> and nvenc uses the contiguous buffer
[22:39:31 CEST] <philipl> Ok. I get that.
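
The pointer arrangement philipl summarises can be sketched like this. `Frame` is a hypothetical minimal struct, not AVFrame, and the Y, V, U memory order is taken from the discussion above rather than checked against the nvenc documentation. The chroma pitch is assumed to be half the luma pitch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal frame: three plane pointers and pitches. */
typedef struct Frame {
    uint8_t *data[3];
    int      linesize[3];
} Frame;

/* Point a YUV 4:2:0 frame at one contiguous buffer whose memory order is
 * Y, V, U. data[1] (U) and data[2] (V) keep their usual meaning for the
 * caller; only their positions in memory are exchanged. pitch is the
 * luma pitch in bytes; pitch and height are assumed even. */
static void fill_pointers_yvu(Frame *f, uint8_t *buf, int pitch, int height)
{
    assert(f && buf && pitch % 2 == 0 && height % 2 == 0);
    f->data[0]     = buf;                                              /* Y first */
    f->data[2]     = f->data[0] + (size_t)pitch * height;              /* then V  */
    f->data[1]     = f->data[2] + (size_t)(pitch / 2) * (height / 2);  /* then U  */
    f->linesize[0] = pitch;
    f->linesize[1] = f->linesize[2] = pitch / 2;
}
```

Code that writes each plane through `data[n]` never notices the exchange, while a consumer that starts at `data[0]` and walks the buffer sees V before U. No data is rearranged at any point.
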
[22:40:05 CEST] <BtbN> You know what would be fun? replacing the cuMemAlloc with cuMemAllocManaged 
[22:40:23 CEST] <BtbN> hwupload/download would then be a matter of doing nothing
[22:40:55 CEST] <BtbN> or rather, in-place converting a frame
[22:42:57 CEST] <philipl> wee
[22:43:00 CEST] <philipl> yeah
[22:44:13 CEST] <philipl> Probably limits applicable hardware
[22:45:38 CEST] <philipl> Assuming we start with a cuda allocated frame, it would be transparent download and upload. We can't adopt system memory, though.
[22:47:04 CEST] <philipl> Actually - don't you mean cuMemAllocHost?
[22:47:13 CEST] <philipl> cuMemAllocManaged is not transparent to regular CPU code.
[22:50:13 CEST] <BtbN> no, cuMemAllocHost always allocates Host-Memory, that is then made accessible to the device, but keeps residing in host memory
[22:50:14 CEST] <BtbN> -> slow
[22:50:34 CEST] <philipl> How do you ensure cuMemAllocManaged memory is migrated to the host for sw access?
[22:50:42 CEST] <BtbN> it is automatically
[22:50:50 CEST] <philipl> It gets page-faulted in?
[22:51:02 CEST] <BtbN> the driver migrates it to the device on access there, and the other way around when accessed on the system
[22:51:21 CEST] <BtbN> you can also give it a bunch of hints how to do the migration
[22:51:32 CEST] <BtbN> if it should just leave it on the device and do DMA, or actually copy it over
[22:51:43 CEST] <philipl> Ok. I didn't see a clear statement in the doc that it was directly usable as a regular memory pointer
[22:51:55 CEST] <BtbN> that's the whole point with the unified memory stuff
[22:52:06 CEST] <BtbN> https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html#group__CUDA__UNIFIED
[22:52:43 CEST] <philipl> I'll take your word for it.
[22:53:20 CEST] <philipl> Docs don't link managed with unified
[22:53:30 CEST] <philipl> but I can easily believe you, and obviously if it works, it works.
[22:55:07 CEST] <BtbN> it's kind of the entire point of that function
[22:57:46 CEST] <philipl> I buy it.
[00:00:00 CEST] --- Thu May 10 2018

