[Ffmpeg-devel-irc] ffmpeg-devel.log.20161002

burek burek021 at gmail.com
Mon Oct 3 03:05:03 EEST 2016


[00:09:46 CEST] <Chloe> Anyone got comments on my latest developer guidelines cleanup/restructure patch? I'd like to get some input from other devs as well, as it's meant to be for us anyway
[00:26:47 CEST] <Chloe> jamrial: yes >.< I did notice that just after I asked for comments
[01:10:49 CEST] <Chloe> hopefully this version's changes are more legible
[01:25:12 CEST] <philipl> BtbN: which hardware are you testing your alignment change with?
[01:25:22 CEST] <BtbN> GTX 1060
[01:25:27 CEST] <philipl> With my 1080 I'm not seeing a difference
[01:25:34 CEST] <philipl> I'm getting ~500 either way
[01:26:10 CEST] <BtbN> hm, there should not be any difference between those cards
[01:26:16 CEST] <philipl> might just be raw horsepower
[01:26:28 CEST] <philipl> memcpy could be faster.
[01:26:30 CEST] <BtbN> It's the exact same video engine silicon on both
[01:26:40 CEST] <philipl> Yes, but it's the memcpy that you're speeding up, right?
[01:27:02 CEST] <BtbN> Probably, and encoding.
[01:27:17 CEST] <BtbN> As it operates on 256-byte blocks of data
[01:27:36 CEST] <BtbN> so it needs at least a 256 bytes alignment for optimal performance
[01:27:59 CEST] <BtbN> The memcpy should also benefit from it
[01:27:59 CEST] <philipl> Makes sense.
[01:28:24 CEST] <philipl> Well, I get 500fps and 100% video utilization (according to nvidia-settings) either way.
[01:28:29 CEST] <philipl> For whatever it's worth.
[01:28:44 CEST] <BtbN> Are you running the card on native PCIe 3.0?
[01:28:47 CEST] <philipl> yes
[01:28:49 CEST] <BtbN> Cause my board only supports 2.0
[01:29:00 CEST] <BtbN> so the interface is only half as fast
[01:29:17 CEST] <philipl> Yeah, but that's only for pushing the frames to cuvid. After that it should stay on the card in both cases.
[01:29:24 CEST] <BtbN> And a 1080 has _way_ faster RAM, GDDR5X instead of just GDDR5
[01:29:55 CEST] <philipl> Right. So I suspect the memory operations are so fast that it is encoder limited without optimal alignment.
[01:29:56 CEST] <BtbN> And I can see that making a difference for copying the frame around in VRAM
[01:30:24 CEST] <philipl> Anyway, no harm. Just curious as to the difference between our configurations.
[01:30:43 CEST] <BtbN> Aligning it does no harm, except for a tiny bit of wasted VRAM
[01:31:52 CEST] <philipl> Right.
[01:32:11 CEST] <philipl> If you want to see VRAM consumption - cuvid + 8k video in mpv is 2+ GB
[01:34:18 CEST] <BtbN> Yeah, 8K frames are huge
[02:10:25 CEST] <Chloe> Timothy_Gu: would .container work on the website as well? I found myself having to set margin: 0 auto; to get it to center on ffmpeg.org
[03:29:44 CEST] <cone-090> ffmpeg Carl Eugen Hoyos master:635a89b0bb36: Changelog: Compress slightly to improve readability.
[05:23:07 CEST] <Timothy_Gu> Chloe: IIRC the website doesn't use the FFmpeg build system
[05:23:16 CEST] <Timothy_Gu> i.e. it has its own init files, etc.
[05:23:45 CEST] <Chloe> right ok, I'll look into it in ffmpeg-web as well then
[05:24:05 CEST] <Timothy_Gu> Chloe: the actual build scripts are actually not public
[05:24:30 CEST] <Chloe> heresy
[05:24:48 CEST] <Timothy_Gu> It was more for security reasons (which sound dubious)
[05:25:13 CEST] <Chloe> can I PM you quick (unrelated)?
[05:25:18 CEST] <Timothy_Gu> sure
[07:18:09 CEST] <cone-075> ffmpeg James Almer master:42111e8543b1: avcodec: fix arguments on xmm/neon clobber test wrappers
[11:49:15 CEST] <ubitux> Chloe: you have something that eats your trailing space after '--' in your mail signature
[12:57:01 CEST] <wm4> how is multimedia so fucking broken that even timestamps have to be considered as noisy, unreliable data that needs to be filtered and smoothed before it's useful
[14:06:37 CEST] <Chloe> ubitux: in patches, or just normal replies? because I type my signature every time for normal replies. (didn't know I needed a space after '--')
[14:07:31 CEST] <ubitux> normal replies
[14:07:39 CEST] <ubitux> yes you need a space or it's not recognized as a signature
[14:09:33 CEST] <Chloe> I'll try to set it up as an actual signature, I couldn't get it working before but I'll give it another shot
[14:20:50 CEST] <michaelni> BBB, should 0f88b3f82fafd536979993aeaafcb11a22266dbd be backported to any releases like 2.8 ?
[14:23:58 CEST] <BBB> your choice. I don't think it would hurt
[14:24:12 CEST] <BBB> but it's a relatively crazy case since it only affects very tiny images
[14:24:34 CEST] <BBB> and one-byte overreads are not generally very interesting for taking over a computer because they only give you one byte to take over
[14:24:40 CEST] <BBB> which is typically not enough
[14:25:05 CEST] <BBB> but since it doesn't hurt I guess yes if you want to backport, I would support that
[14:26:38 CEST] <michaelni> ok
[14:30:41 CEST] <michaelni> locally backported to releases 2.8 .. 2.2 (does not apply to 2.1)
[14:53:01 CEST] <Chloe> >Since you are not an idiot, you will use it.
[15:01:50 CEST] <wm4> who wants to accuse him of CoC violation, since if you do not follow his "advice" he implicitly called you an idiot?
[15:40:47 CEST] <Compn> wm4 : vendor lock in
[15:41:44 CEST] <wm4> not the correct reply
[15:45:46 CEST] <Chloe> So I give him an *example* and he's like 'no this example isn't real', I'm just going to ignore advice from him (does this sound reasonable?).
[15:46:06 CEST] <wm4> yes
[15:47:14 CEST] <ubitux> ffmpeg devs nitpicking about web development practice
[15:47:21 CEST] <ubitux> what an era to be in
[15:49:47 CEST] <wm4> considering the era, it's a wonder there's no official ffmpeg js port yet
[15:50:57 CEST] <BtbN> I think Twitch uses an emscripten compiled ffmpeg to make browsers capable of HLS
[15:56:18 CEST] <Chloe> Using the container class works nicely (and also uses pixels); Timothy_Gu, thanks for the suggestion.
[15:56:54 CEST] <cone-067> ffmpeg Marton Balint master:d14b240ecfc5: ffplay: use decoder avctx for decoded subtitle width/height
[15:56:54 CEST] <cone-067> ffmpeg Marton Balint master:4fdcd2f1889a: ffplay: remove unused viddec_width/viddec_height
[16:55:14 CEST] <bencoh> BtbN: hm, really? but do they actually decode in software using the resulting emscripten-compiled avcodec, or do they somehow pass it to the "native" browser decoder?
[16:57:17 CEST] <BtbN> they remux from mpeg-ts to segmented mp4 with it.
[16:57:28 CEST] <bencoh> and how do they pass the result?
[16:57:32 CEST] <BtbN> MSE
[16:57:38 CEST] <bencoh> nice
[16:57:41 CEST] <BtbN> Browsers do support mp4, but not ts
[16:57:45 CEST] <bencoh> yeah
[16:57:56 CEST] <wm4> lol
[16:57:58 CEST] <BtbN> So instead of making their backend support DASH, they translate from HLS to DASH in JavaScript
[16:59:02 CEST] <BtbN> Because of that stupidity Microsoft Edge is now the best Browser to watch Twitch in.
[16:59:21 CEST] <BtbN> It's, except for Safari, the only Browser which natively plays HLS, so they don't need the JS remuxer.
[16:59:41 CEST] <bencoh> I actually thought about something like that back in early-MSE age, but for live mpeg-ts .... but never had the time to implem anything
[16:59:46 CEST] <ritsuka> btw apple now supports fragmented mp4 in hls, so they could just use mp4
[16:59:47 CEST] <bencoh> (and I kinda hate web dev)
[17:00:08 CEST] <BtbN> bencoh, Twitch is live?
[17:00:43 CEST] <Chloe> BtbN: it's a livestreaming site
[17:00:44 CEST] <bencoh> no I mean, real live TS stream, not some hls/hds/dash/progressive-like thing
[17:00:48 CEST] <BtbN> ritsuka, apparently they don't want to touch their CDN and stuff to make it use segmented mp4/DASH instead of HLS.
[17:01:22 CEST] <BtbN> HLS is a live ts stream. Just downloaded in fragments via http.
[17:01:42 CEST] <bencoh> except you'd have to cut it right, or ....
[17:02:04 CEST] <BtbN> No need for that. You don't even have to cut at keyframes
[17:02:10 CEST] <kierank> 3:59 PM <bencoh> (and I kinda hate web dev)
[17:02:11 CEST] <kierank> amen
[17:02:23 CEST] <BtbN> Starting the stream might take a few segments to pass by, but it generally works
[17:02:25 CEST] <bencoh> BtbN: you'd need to if you want to properly remux it as mp4
[17:02:37 CEST] <bencoh> at least I'd suppose so, I never went down that road
[17:02:49 CEST] <BtbN> The clean solution for Twitch would have been to migrate from HLS to DASH
[17:02:57 CEST] <BtbN> Every browser supports that.
[17:03:07 CEST] <BtbN> But they seem to be afraid to touch their backend
[17:03:14 CEST] <wm4> no dash demux impl. in ffmpeg yet?
[17:03:34 CEST] <bencoh> kierank: :-)
[17:32:34 CEST] <cone-067> ffmpeg Timo Rothenpieler master:b7bd5b979422: configure: define posix source on cygwin
[17:39:52 CEST] <BtbN> https://github.com/BtbN/FFmpeg/commit/e68383408e591c5bd006def60a06509d145349b1 does someone have a better algorithm/idea for this?
[17:43:03 CEST] <wm4> is ++a+a even defined?
[17:43:42 CEST] <BtbN> hm, good question. I'm expecting a++; a += (a == 0);
[17:46:07 CEST] <wm4> also it should be defined what happens with av_next_pow2(0) or av_next_pow2(higher than 2^63)
[17:46:27 CEST] <BtbN> that's what the + (a == 0) is for.
[17:46:39 CEST] <BtbN> It makes it return the next power of two even for 0. Which is 1.
[17:50:37 CEST] <BtbN> Also works, kind of, as expected for UINT64_MAX - x
[17:50:39 CEST] <BtbN> returning 1.
[17:51:20 CEST] <wm4> should be documented then
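The behaviour BtbN describes here (0 maps to 1, and inputs above 2^63 also come out as 1) can be sketched roughly as follows. This is not the code from the linked commit, which is not reproduced in this log; the name next_pow2_u64 is hypothetical. The increment and the zero fix-up are kept as two separate statements, which sidesteps wm4's question: an expression that increments a and reads it again in the same expression, such as ++a + (a == 0), is undefined in C.

```c
#include <stdint.h>

/* Hypothetical helper: smallest power of two >= a.
 * Returns 1 for a == 0 and also for a > 2^63, where the result wraps. */
static uint64_t next_pow2_u64(uint64_t a)
{
    a--;                 /* so exact powers of two map to themselves */
    a |= a >> 1;         /* smear the highest set bit into all lower bits */
    a |= a >> 2;
    a |= a >> 4;
    a |= a >> 8;
    a |= a >> 16;
    a |= a >> 32;
    a++;                 /* (all ones below the top bit) + 1 = next power of two */
    a += (a == 0);       /* wrapped to 0 (input 0 or > 2^63): return 1 instead */
    return a;
}
```

As wm4 suggests, the two edge cases are the part worth documenting, since a caller might reasonably expect an error rather than 1 for inputs above 2^63.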
[17:51:45 CEST] <wm4> also I guess you could reuse the code for log2, and then simply check if it's a power of 2 or not
[17:52:57 CEST] <BtbN> To return immediately if it is a power of two?
[17:54:35 CEST] <kierank> isn't it a case of doing something like clz
[17:54:52 CEST] <kierank> and then flipping all the other bits
[17:54:55 CEST] Action: kierank thinks out loud
[17:56:09 CEST] <BtbN> hmm
[17:56:21 CEST] <BtbN> that would be limited to 32bit I think, but enough for my use
[17:56:45 CEST] <wm4> for 64 bit you do it just twice
[17:56:45 CEST] <kierank> iirc there is a 64-bit clz
[17:57:22 CEST] <kierank> i wrote one but can't remember if it was for x264 or ffmpeg
[17:58:21 CEST] <BtbN> well, I'm using it for aligning a resolution
[17:58:30 CEST] <BtbN> those exceeding 32bit is quite unlikely
[17:59:52 CEST] <jamrial> kierank: yeah, ff_ctz and ff_ctzll
[18:00:23 CEST] <jamrial> wait no, my bad, that's ctz
[18:00:49 CEST] <jamrial> there's ff_clz but no 64bit one it seems
[18:01:25 CEST] <jamrial> pretty trivial to add if needed i guess
[18:04:12 CEST] <BtbN> ff_clz returns strange value for 0.
[18:04:17 CEST] <BtbN> Apparently that's not well defined.
[18:04:27 CEST] <BtbN> ff_clz(0) -> 4195756
[18:07:25 CEST] <nevcairiel> clz with zero has no really defined meaning, no
[18:08:12 CEST] <BtbN> a <= 1 ? 1 : 1 << (32 - ff_clz(--a))
[18:08:15 CEST] <BtbN> that seems to do it.
[18:08:22 CEST] <jamrial> BtbN: you're using gcc? that's the output of __builtin_clz(0) then. try ff_clz_c
[18:09:08 CEST] <BtbN> a resolution of 0 rarely ever happens. So I'll not care about that special case.
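The clz-based variant kierank and BtbN arrive at might look like the sketch below. It assumes GCC or Clang, whose __builtin_clz is undefined for an argument of 0 (the garbage value BtbN saw), so the a <= 1 branch has to come first. The name next_pow2_u32 is hypothetical and this is not FFmpeg's ff_clz wrapper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest power of two >= a, for 32-bit inputs.
 * 0 and 1 both map to 1; inputs above 2^31 have no representable answer. */
static uint32_t next_pow2_u32(uint32_t a)
{
    assert(a <= UINT32_C(0x80000000));
    if (a <= 1)
        return 1;        /* also keeps 0 away from clz, where it is undefined */
    /* (a - 1) has its top bit at position 31 - clz, so the result is one bit higher */
    return UINT32_C(1) << (32 - __builtin_clz(a - 1));
}
```

Subtracting 1 before counting leading zeros is what makes exact powers of two map to themselves instead of doubling.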
[18:12:43 CEST] <kierank> bencoh: there is FFALIGN
[18:12:49 CEST] <kierank> BtbN: 
[18:13:08 CEST] <BtbN> kierank, that doesn't align to the next power of two though, which is where CUDA is fastest
[18:13:27 CEST] <kierank> oh i see
[18:18:30 CEST] <BtbN> kierank, actually, i need both...
[18:19:05 CEST] <BtbN> also needs to be 256 byte aligned. Which is a given for large resolutions with next_pow2, but not for small ones
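Needing both properties at once can be illustrated with a small sketch: round up to a power of two, then round that up to a multiple of 256 in the style of FFALIGN. All three function names are hypothetical, the power-of-two step uses a plain loop for brevity, and unsigned is assumed to be at least 32 bits wide.

```c
#include <assert.h>

/* Round x up to a multiple of a; a must be a power of two (same idea as FFALIGN). */
static unsigned align_up(unsigned x, unsigned a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Smallest power of two >= v, simple loop version. */
static unsigned pow2_ceil(unsigned v)
{
    unsigned p = 1;
    assert(v <= 1u << 31);   /* larger inputs would overflow the shift below */
    while (p < v)
        p <<= 1;
    return p;
}

/* Hypothetical pitch rule: a power of two that is also 256-byte aligned. */
static unsigned aligned_pitch(unsigned width_bytes)
{
    return align_up(pow2_ceil(width_bytes), 256);
}
```

For a width of 100 bytes the power-of-two step alone gives 128, which is why small sizes need the extra 256 alignment; from 256 upwards every power of two is already a multiple of 256 and the second step changes nothing.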
[18:48:27 CEST] <BtbN> wm4, what do you mean, "No decoder should ever have set the pts field before"?
[18:48:51 CEST] <BtbN> I think most of them do
[18:49:12 CEST] <BtbN> oh wait, that's purely about audio.
[18:50:36 CEST] <nevcairiel> no, decoders dont typically set that field
[18:50:39 CEST] <nevcairiel> not even video
[18:51:40 CEST] <nevcairiel> i tried to find occurrences that set it, and it was only like one or two external library ones that did it weirdly
[18:51:40 CEST] <BtbN> hm, so cuvid and openh264 are the exceptions?
[18:52:32 CEST] <wm4> if they do, they should be fixed
[18:52:50 CEST] <wm4> also, does cuvid or something similar use the new decode API?
[18:52:51 CEST] <BtbN> cuvid kind of has to set the pts itself, because of the opaque reordering
[18:52:59 CEST] <BtbN> wm4, yes, cuvid uses it.
[18:53:05 CEST] <BtbN> So far the only decoder from what i know.
[18:53:07 CEST] <wm4> because the new decode API doesn't set best_effort_timestamp from pkt_pts
[18:53:15 CEST] <wm4> so ffmpeg.c behavior might be confusing
[18:53:25 CEST] <wm4> I'll probably post a patch to change this
[18:53:32 CEST] <BtbN> cuvid sets both pts and pkt_pts
[18:53:46 CEST] <wm4> yeah but ffmpeg.c only reads best_effort_timestamp
[18:54:00 CEST] <wm4> and only the "old" decode API sets best_effort_timestamp automatically
[18:54:18 CEST] <BtbN> So that's just another field I should fill the pts in?
[18:56:57 CEST] <wm4> BtbN: well, patch sent
[18:59:30 CEST] <BtbN> so for cuvid to behave correctly, I should ignore that frame->pts exists, set only pkt_pts, and copy that value to best_effort_timestamp?
[19:01:50 CEST] <philipl> It's great to have so many wrong ways to do things.
[19:02:17 CEST] <BtbN> Why does the pts field even exist if nothing uses it?
[19:02:42 CEST] <nevcairiel> its used for encoding
[19:02:46 CEST] <wm4> and libavfilter
[19:02:59 CEST] <wm4> Libav changed it so that decoding also sets this field
[19:03:18 CEST] <BtbN> so if one is to use libavcodec for transcoding, he has to manually transfer pkt_pts to pts in between?
[19:03:30 CEST] <nevcairiel> you also have to rescale the timebase and whatnot
[19:03:35 CEST] <nevcairiel> so manual steps are required anyway
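The time base rescaling nevcairiel mentions boils down to multiplying the timestamp by the ratio of the two time bases. The sketch below is a simplified, self-contained stand-in with no overflow protection; the Rational type and rescale_ts are hypothetical names, and real code would use libavutil's av_rescale_q() on AVRational values instead.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int num, den; } Rational;   /* hypothetical stand-in for AVRational */

/* Convert ts from time base `from` to time base `to`, rounding to nearest. */
static int64_t rescale_ts(int64_t ts, Rational from, Rational to)
{
    assert(from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);
    int64_t num = (int64_t)from.num * to.den;
    int64_t den = (int64_t)from.den * to.num;
    int64_t scaled = ts * num;               /* may overflow for huge ts: sketch only */
    return scaled >= 0 ?  (scaled + den / 2) / den
                       : -((-scaled + den / 2) / den);
}
```

For example, a decoded timestamp of 3003 in a 1/90000 time base becomes 1001 in a 1/30000 encoder time base; a transcoder would apply such a conversion when copying the decoder's timestamp into the pts field handed to the encoder.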
[19:06:11 CEST] <Chloe> we allow random cosmetic patches, right?
[19:06:21 CEST] <Chloe> (provided they are separate, and look like they have some use)
[19:30:30 CEST] <Chloe> ubitux: regression as of 2.0.4 then?
[19:30:41 CEST] <Chloe> SDL2 ffplay 'f' key thing
[19:30:44 CEST] <ubitux> seems so
[19:31:15 CEST] <ubitux> or misusage revealed by it
[19:31:18 CEST] <ubitux> dunno
[19:34:10 CEST] <jkqxz> BtbN:  That cuda align-to-power-of-two patch feels dubious to me in the large cases (going up to 2048 or 4096?).  Explicitly forcing that alignment has nasty effects on associative caches because nearby pixels all end up using the same cache index, and so evict each other.
[19:34:44 CEST] <jkqxz> Does performance really continue to increase above 256-byte alignment?
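jkqxz's concern can be made concrete with a toy model of a set-associative cache. The geometry used in the note below (64-byte lines, 64 sets) is purely an assumption for illustration; the actual cache layout of the GPUs discussed is not stated anywhere in this log, and cache_set_index is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Set index of a byte address in a set-associative cache with
 * line_size-byte lines and num_sets sets. */
static uint32_t cache_set_index(uint64_t addr, uint32_t line_size, uint32_t num_sets)
{
    assert(line_size != 0 && num_sets != 0);
    return (uint32_t)((addr / line_size) % num_sets);
}
```

With those assumed numbers and a pitch of 4096 bytes, the pixel at byte offset 128 of row 0 and the one directly below it in row 1 (address 4096 + 128) both land in set 2, so vertically adjacent pixels compete for the same set. With a pitch of 3840 the second address is 3968, which lands in set 62 instead.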
[19:36:18 CEST] <nevcairiel> 256 uses 2048 or 4096 for popular sizes anyway (1920 / 3840)
[19:37:06 CEST] <nevcairiel> or not, 3840 is aligned
[19:37:26 CEST] <BtbN> For 1280x720, where it has probably by far the largest impact, yes.
[19:38:27 CEST] <BtbN> It's the difference between 80% and 100% Video Engine load.
[19:38:54 CEST] <BtbN> Keep in mind that this aligns GPU Memory, not normal CPU memory
[19:52:13 CEST] <philipl> BtbN: We seem to be beating the performance numbers in the nvidia docs. That's always nice.
[19:56:24 CEST] <BtbN> sure they just weren't ever updated for pascal?
[19:59:32 CEST] <philipl> they have a pascal column
[20:39:28 CEST] <BtbN> Watching Twitch in Edge and using cuvid + nvenc at the same time is not something the driver likes.
[20:39:30 CEST] <BtbN> It crashed.
[21:04:12 CEST] <philipl> heh.
[21:07:23 CEST] <BtbN> philipl, there's a new undocumented symbol in the nvcuvid.dll
[21:07:27 CEST] <BtbN> cuvidResetDecoder
[21:44:41 CEST] <philipl> BtbN: intriguing.
[21:50:13 CEST] <philipl> BtbN: Well, it's not a trivial function. I guess it takes some flags to indicate how much to reset it.
[21:50:46 CEST] <BtbN> Google has literally nothing about it
[21:51:30 CEST] <philipl> I guess it's not a surprise. The implementation is distributed with the driver and so can get updated before there's an SDK release to go with it.
[21:51:40 CEST] <philipl> Hopefully the next one documents it and flush can become less heavyweight
[22:08:05 CEST] <BtbN> Hm, using Cuda from cygwin is... complicated
[22:08:16 CEST] <BtbN> The cuda header thinks it's on linux, and thus uses cdecl instead of stdcall
[22:08:48 CEST] <rcombs> >i386 calling conventions
[22:10:01 CEST] <BtbN> fixed it by just editing cuda.h to hardcode it to stdcall.
[22:11:00 CEST] <philipl> send a patch...
[22:11:27 CEST] <BtbN> patch for what?
[22:11:29 CEST] <BtbN> The NVidia?
[22:11:32 CEST] <philipl> a joke.
[22:11:37 CEST] <nevcairiel> screw cygwin and its "I'm linux" and "I'm windows" at the same time =p
[22:11:56 CEST] <nevcairiel> it breaks so many things that actually work on windows natively otherwise
[22:12:04 CEST] <BtbN> I wonder if ffmpeg would build in cygwin if it's put in the mode where it defines _WIN32
[22:12:17 CEST] <BtbN> Probably not
[22:18:40 CEST] <BtbN> "cuvidCreateDecoder(&cudec, &cuinfo) failed -> CUDA_ERROR_NOT_SUPPORTED: operation not supported"
[22:18:42 CEST] <BtbN> nope
[22:18:47 CEST] <BtbN> something's still not right.
[22:18:56 CEST] <BtbN> And I don't feel like finding out what
[22:19:19 CEST] <Timothy_Gu> BtbN: what are you trying to do?
[22:19:27 CEST] <BtbN> Use cuvid from cygwin
[22:20:00 CEST] <BtbN> Getting it to build needs a lot of hackery already
[22:20:05 CEST] <Timothy_Gu> lol
[22:20:06 CEST] <BtbN> But then it doesn't even work with said error
[22:21:04 CEST] <BtbN> Had to generate new cuda implibs, because the ones coming with the CUDA sdk don't work with gcc. Not even mingw
[22:21:36 CEST] <Timothy_Gu> sounds tedious
[22:23:44 CEST] <BtbN> they generate a strange linker error with gcc, "relocation truncated to fit R_X86_64_PC32 against symbol cuInit"
[22:24:52 CEST] <BtbN> And the usual way of generating one by generating a def from the dll with dlltool doesn't work either
[22:25:03 CEST] <BtbN> because according to dlltool, nvcuda.dll does not have any symbols.
[22:25:43 CEST] <BtbN> dependency walker can generate a list of imports though, so I made a .def from that, implib via dlltool, and it works.
[22:27:18 CEST] <RiCON> BtbN: the .lib files work with mingw here
[22:27:19 CEST] <RiCON> weird
[22:27:29 CEST] <BtbN> From CUDA SDK 8?
[22:27:42 CEST] <RiCON> pretty sure
[22:27:47 CEST] <RiCON> i'm using msys2 though, not cygwin
[22:28:01 CEST] <BtbN> I have the same issue with my mingw crosscompiler on native linux
[22:28:12 CEST] <BtbN> So I guess it's a general issue with gcc and those libs
[22:28:26 CEST] <RiCON> aren't you trying to use 32-bit libs in 64-bit or something?
[22:28:30 CEST] <BtbN> no
[22:28:56 CEST] <RiCON> i'll try again, just --enable-{cuda,cuvid}, right?
[22:29:27 CEST] <BtbN> just --enable-cuda --enable-nonfree
[22:29:34 CEST] <BtbN> and extra-cflags/ldflags for it to find stuff
[22:32:04 CEST] <philipl> BtbN: You inspired me to run strings on libnvcuvid. There are references to P016 and NV24 conversions internally.
[22:32:07 CEST] <philipl> Someday.
[22:32:43 CEST] <BtbN> They are probably there to convert them to NV12 internally
[22:32:49 CEST] <philipl> Yep. That's what it says :-)
[22:32:50 CEST] <BtbN> Which it obviously has to be doing
[22:38:53 CEST] <RiCON> BtbN: configure and make worked fine
[22:39:00 CEST] <BtbN> strange
[22:39:03 CEST] <BtbN> which gcc you got there?
[22:39:17 CEST] <RiCON> 6.2.1
[22:39:18 CEST] <BtbN> x86_64-w64-mingw32-gcc (GCC) 5.4.0
[22:40:24 CEST] <RiCON> https://i.fsbn.eu/MDLZ.txt
[22:42:36 CEST] <BtbN> I'll just stick to msvc for now.
[22:43:13 CEST] <BtbN> colors and keyboard interaction, and cygwin paths, would just have been nice to have
[22:43:37 CEST] <RiCON> that's what winpty is for
[22:47:03 CEST] <BtbN> Do I just put those in my cygwin bin dir, and stuff works if prefixed with winpty?
[22:48:19 CEST] <RiCON> yes
[22:49:23 CEST] <RiCON> it should even convert unix paths to windows in arguments
[22:50:14 CEST] <BtbN> cygwin should package that.
[22:51:50 CEST] <BtbN> It doesn't convert the paths
[22:51:58 CEST] <BtbN> but colors and keyboard interaction work
[22:52:54 CEST] <RiCON> maybe it's just msys2's that does that then
[22:53:23 CEST] <RiCON> until a few months ago it even tried to convert URLs so it was kinda useless
[22:54:47 CEST] <Chloe> michaelni: did you see the split version of the developer guidelines patch?
[23:00:53 CEST] <BtbN> hm, I'm indeed unable to find any difference in speed/engine load compared to a plain FFALIGN 256 now. Probably something else was messing up the comparison
[00:00:00 CEST] --- Mon Oct  3 2016

