[Ffmpeg-devel-irc] ffmpeg-devel.log.20170613

burek burek021 at gmail.com
Wed Jun 14 03:05:04 EEST 2017


[04:18:41 CEST] <cone-100> ffmpeg 03Michael Niedermayer 07master:db93fd74e4be: avcodec/golomb: Assert that the input is not too large in set_ue_golomb()
[04:18:41 CEST] <cone-100> ffmpeg 03Michael Niedermayer 07master:4f9e958b04c6: avcodec/put_bits: Implement put_bits32() in a single pass instead of 2 passes writing 16bits each
[12:50:53 CEST] <atomnuker> do we still need libtwolame support?
[12:51:10 CEST] <atomnuker> we've had our own encoder for a very long time and it's, from what I know, quite alright
[12:53:27 CEST] <nevcairiel> twolame is higher quality than ffmp2
[12:54:33 CEST] <atomnuker> I guess when you're encoding mpeg2 you need all the quality you can get
[13:03:19 CEST] <kierank> twolame is better iirc
[13:12:35 CEST] <JEEB> was there any OSS bit stream analyzer that would let me check if what I think was set in VUI was indeed set?
[13:12:38 CEST] <JEEB> (H.264)
[13:13:15 CEST] <kierank> h264parse
[13:13:32 CEST] <kierank> or just stick some printfs in ffmpeg
[13:14:24 CEST] <JEEB> hmm, there also seems to be something called https://github.com/shi-yan/H264Naked
[13:14:39 CEST] <JEEB> which is a Qt-based interface on top of libh264stream
[13:14:45 CEST] <jkqxz> <https://lists.libav.org/pipermail/libav-devel/2017-May/083758.html>
[13:15:35 CEST] <JEEB> funky
[13:24:00 CEST] <jkqxz> wm4:  Does this rockchip thing have discussion somewhere else?
[13:25:04 CEST] <wm4> I think I did most of this in private communication with longchair
[13:26:02 CEST] <wm4> jkqxz: should we make him join here?
[13:26:38 CEST] <J_Darnley> WTF!?  I can apparently fix the problem by removing the perm_type assignment.
[13:27:26 CEST] <jkqxz> wm4:  Sure, if that's possible.
[13:28:12 CEST] <wm4> LongChair: hi, jkqxz is markt
[13:28:26 CEST] <LongChair> Hi everyone ;)
[13:28:34 CEST] <J_Darnley> WTF?!  It isn't even running my code
[13:28:41 CEST] <LongChair> i think i talked to him already, but that was a while ago :)
[13:29:33 CEST] <LongChair> jkqxz: available to answer any specific questions if you have any :)
[13:29:39 CEST] <jkqxz> Hi.  Yeah, I recognised the name, but I couldn't remember exactly what that was about.
[13:30:07 CEST] <LongChair> we talked about the framecontext iirc
[13:30:16 CEST] <LongChair> but that is out of topic :)
[13:30:29 CEST] <jkqxz> My main concern is that adding a new format like this is tying us to a specific format, and will propagate into everything else as people want to add more features.
[13:30:49 CEST] <jkqxz> Well, I'd prefer if hwcontext were not out of topic!
[13:31:36 CEST] <jkqxz> Because, for example, someone will come along and say "I want to render subtitles on top of this thing", and then add some sort of crazy support in libavfilter.
[13:31:45 CEST] <LongChair> Well, the main thing is that outside rockchip-specific work, there are some other perspectives that got me to try to get something standard
[13:31:46 CEST] <jkqxz> With no commonality with anything else.
[13:32:47 CEST] <jkqxz> So I'd kindof prefer a hwcontext form, because then you have things like mapping and transfer in a common way which will avoid specific support for this DRM setup in random places.
[13:32:58 CEST] <LongChair> drmprime is mostly intended to render to a drm layer which is an overlay. it's the only way to achieve proper performance on embedded devices
[13:33:49 CEST] <LongChair> also when developing this i found out that both drm and egl / dmabuf rendering had a lot of things in common
[13:33:49 CEST] <jkqxz> I admit the hwcontext implementation is kindof silly, though, because there isn't a device and the frames context doesn't really contain much.
[13:33:54 CEST] <J_Darnley> FUCK!  If I assign NULL to those function pointers it then segfaults
[13:34:10 CEST] <jkqxz> This will also get used for encode on those devices, though?
[13:34:37 CEST] <LongChair> jkqxz: yeah i didn't find any existing pixfmt that would allow having that type of frame information, that is one that allows a renderer to use zerocopy
[13:34:53 CEST] <wm4> jkqxz: well I added one for osx anyway
[13:34:57 CEST] <jkqxz> (Which probably doesn't support hardware overlays, so you have to actually blend.)
[13:35:03 CEST] <wm4> it does almost nothing (dummy device etc.)
[13:35:12 CEST] <LongChair> i don't think drm is used for any encoding process
[13:35:28 CEST] <jkqxz> It would still be a new DRM pixfmt.
[13:35:57 CEST] <LongChair> just that that set of info (fds, strides, offsets) is used for drm, egl dmabuf, and also V4L2 in some modes
[13:36:35 CEST] <LongChair> so it would make sense to have some "standard" pixfmt that can help renderers to handle a single format
[13:36:50 CEST] <LongChair> rather than reinventing the wheel for every platform :)
[13:37:22 CEST] <jkqxz> Yes, there would still be a new DRM pixfmt.  I'm just wanting to set up the metadata so that we don't reinvent other wheels.
[13:38:56 CEST] <LongChair> Sorry, but I'm not sure I understand what you mean by "metadata" :)
[13:39:34 CEST] <wm4> the DRM fourcc, strides, offsets, I think
[13:39:46 CEST] <jkqxz> Whatever is sufficient to let the existing av_hwframe_map() and related functions work.
[13:40:05 CEST] <LongChair> dmabuf can be mmaped
[13:40:25 CEST] <LongChair> at least the fds
[13:40:47 CEST] <jkqxz> Indeed.  So the hwcontext implementation will do that in av_hwframe_map() and then vf_hwmap will work magically for everyone.
[13:41:16 CEST] <wm4> jkqxz: do you want to do mpp->drm as mapping between different pixfmts?
[13:41:25 CEST] <J_Darnley> I hate this DSP crap.  Why is permutation an option?!  Why is it not binary?  Why can some functions be NULL and not others?
[13:42:26 CEST] <jkqxz> Not sure?  If the MppFrame really is useless then having the decoder output the DRM pixfmt directly is reasonable, but if it has some other use then maybe.
[13:43:09 CEST] <wm4> jkqxz: I don't think exporting mpp has any use at this point
[13:43:16 CEST] <wm4> but I don't know details
[13:43:44 CEST] <LongChair> MPPFrame is just what comes out of MPP, the av_drmprime forwards the interesting information
[13:44:21 CEST] <wm4> do the rockchip guys have any APIs for postproc yet? like deint
[13:44:44 CEST] <LongChair> mmaping the av_drmprime fds which are coming from the MPPFrame would allow to get a planar surface that matches the FOURCC format
[13:45:20 CEST] <wm4> (just that the 10 bit format is useless)
[13:45:35 CEST] <LongChair> true
[13:46:47 CEST] <LongChair> their 10 bits format cannot really be processed as it's packed 10 bits
[13:47:01 CEST] <wm4> (grrr)
[13:47:34 CEST] <atomnuker> we have a decoder for packed 10 bits now, though it only does 4:2:2
[13:48:14 CEST] <LongChair> their decoder will basically output standard 8 bits NV12 or that crappy format which is non standard. the reason for this format is that it's memory efficient and allows significantly reducing the frames' footprint in memory and reduces the data transfer. especially given 10 bits is mostly used with 4K HEVC :)
[13:49:05 CEST] <LongChair> but it's non standard. that one can be processed by the hardware which is behind the drm implementation. but useless in software
[13:49:50 CEST] <wm4> I wonder what they'll do with 12 bit data
[13:50:15 CEST] <LongChair> packed 12 bits ? :)
[13:50:58 CEST] <LongChair> well they could probably move to NV12 with 16bits stuff in later chips, because later chips could use AFBC (Arm FrameBuffer Compression)
[13:51:35 CEST] <LongChair> that would release the pressure on memory by getting compressed frames ... but then we'll have the same issue with reading AFBC buffers :)
[13:54:03 CEST] <LongChair> anyways NV12 could probably be used easily by mmaping those fds, NV12 (10bits packed) would imho be useless because of all the processing it would involve
[13:55:11 CEST] <BtbN> You mean P010? NV12 is 8bit
[13:55:52 CEST] <wm4> no, mpp uses an actually packed 10 bit format
[13:55:53 CEST] <LongChair> isn't P010 NV12 with 16bits per component ? 
[13:56:02 CEST] <wm4> similar to P010, but packed per plane
[13:56:50 CEST] <BtbN> that sounds kinda insane
[13:57:03 CEST] <wm4> because it is
[13:57:20 CEST] <wm4> and there's some danger theyr're going to use P010 as fourcc for it in the kernel
[13:58:18 CEST] <LongChair> it was formerly called NV12_10 in their kernel, not sure how they will upstream that 
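As background for the memory argument above, a rough back-of-the-envelope comparison (my own arithmetic, not anything from mpp or the kernel) of the per-row footprint of a 4K-wide luma plane in P010-style 16-bit containers versus a fully packed 10-bit layout:

    #include <stdio.h>

    int main(void)
    {
        int width = 3840;                       /* 4K luma row                 */
        int p010_row   = width * 2;             /* 16-bit container per sample */
        int packed_row = (width * 10 + 7) / 8;  /* samples packed back to back */
        printf("P010 row: %d bytes, packed 10-bit row: %d bytes\n",
               p010_row, packed_row);           /* 7680 vs 4800 bytes          */
        return 0;
    }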
[13:59:55 CEST] <jkqxz> So something like <http://sprunge.us/XIai>.
[14:00:44 CEST] <LongChair> that paste doesn't work here :)
[14:00:57 CEST] <jkqxz> The DRM pixfmt would then be a single pointer to an AVDRMFrameDescriptor, which is also a buffer reference.
[14:01:23 CEST] <jkqxz> The nonempty device and frames contexts are unnecessary for that case, but would let hwupload work.
[14:01:28 CEST] <jkqxz> *for the decoder case
[14:02:13 CEST] <jkqxz> The decoder can create a trivial frames context and use that, it doesn't really matter (there is no preallocation at all, so you can't map to fixed-pool things anyway).
[14:02:48 CEST] <jkqxz> It doesn't work?  What would work?
[14:03:27 CEST] <jkqxz> <http://ixia.jkqxz.net/~mrt/hwcontext_drm.h>
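For readers who can't open either link, here is a hypothetical illustration of the general shape being discussed (one buffer-reference-backed descriptor carrying a DRM fourcc plus per-plane fd/stride/offset). The names below are made up for illustration and are not the actual hwcontext_drm.h contents:

    #include <stddef.h>
    #include <stdint.h>

    #define MY_DRM_MAX_PLANES 4

    /* One decoded frame exported as dma-buf objects. */
    typedef struct MyDRMFrameDescriptor {
        uint32_t drm_format;       /* DRM_FORMAT_* fourcc, e.g. NV12            */
        int      nb_planes;
        struct {
            int       fd;          /* dma-buf fd backing this plane             */
            size_t    size;        /* total size of the object behind the fd    */
            ptrdiff_t pitch;       /* bytes per row (stride)                    */
            size_t    offset;      /* start of the plane's data within the fd   */
        } planes[MY_DRM_MAX_PLANES];
    } MyDRMFrameDescriptor;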
[14:05:51 CEST] <jkqxz> The hwcontext implementation would want to implement at least transfer and map-to-memory, which would give you av_hwframe_transfer() and av_hwframe_map(), allowing a lot of use in lavfi.
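As a sketch of what that buys generic consumers, assuming the decoder attaches a proper frames context: the existing helpers are av_hwframe_map() and av_hwframe_transfer_data(), and something along these lines would then work the same way for any hwcontext-backed format:

    #include <libavutil/frame.h>
    #include <libavutil/hwcontext.h>

    /* Read back a hardware frame: try a zero-copy mapping first (e.g. mmap of
     * the dma-buf fds) and fall back to an explicit copy into system memory. */
    static int read_back(const AVFrame *hw_frame, AVFrame *sw_frame)
    {
        int ret = av_hwframe_map(sw_frame, hw_frame, AV_HWFRAME_MAP_READ);
        if (ret >= 0)
            return 0;

        av_frame_unref(sw_frame);
        return av_hwframe_transfer_data(sw_frame, hw_frame, 0);
    }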
[14:07:12 CEST] <wm4> in theory
[14:07:25 CEST] <wm4> but in practice it's too slow, and the 10 bit format shits the bed
[14:08:17 CEST] <jkqxz> Even just writing subtitles on part of an existing image is too slow?
[14:08:42 CEST] <jkqxz> (Is there a cheap SBC with this thing on I could just buy to try it?)
[14:10:02 CEST] <LongChair> ASUS Tinkerboard
[14:14:56 CEST] <LongChair> jkqxz: well not sure if it's relevant, but we wouldn't use the same image for subtitles on such devices.
[14:15:23 CEST] <LongChair> drm has layers that can be combined usually in hardware
[14:15:49 CEST] <LongChair> typically in mpv we'd use the primary layer which is used through opengl to render subtitles
[14:15:58 CEST] <LongChair> the video layer would be in the back 
[14:16:42 CEST] <LongChair> also on such devices, rendering in high resolution (ie 4K) is usually a bottleneck for performance
[14:17:01 CEST] <wm4> yeah, the hw will also do forced rgb conversion and such
[14:17:14 CEST] <wm4> which is not nice, but typical primitive hwdec API
[14:17:50 CEST] <LongChair> we would usually just have a video layer that is purely rendered by hardware (in rockchip case this doesn't even go thru GPU, but directly from memory to another IP which is used for video scaling)
[14:18:15 CEST] <LongChair> GPU on such devices can usually not handle OpenGL rendering at a proper framerate in 4K
[14:19:11 CEST] <LongChair> so we render video to a layer via drm in 4K, for the rest we render GUI / subtitles to the primary drm layer which is sized to a lower res (ie 1080) and upscaled to 4K on top via drm as well
[14:19:52 CEST] <LongChair> so GPU will only handle the opengl stuff in 1080p, rest is done by scaler / compositor thru drm api 
[14:20:43 CEST] <jkqxz> 50GBP on Amazon.  Ok, I've ordered one.
[14:20:51 CEST] <wm4> jkqxz is (at least currently) mostly concerned with hw transcoding (decoding + encoding), so he's thinking about a way to render subs directly to the surfaces
[14:21:00 CEST] <LongChair> in the same area, other devices would use compressed buffers on embedded devices (AFBC is the most common), so adding stuff on top  of the buffer would mean decompress / add / recompress
[14:21:25 CEST] <LongChair> not mentioning that AFBC compression might be non-public stuff (not sure)
[14:22:43 CEST] <LongChair> I understand what transcoding requires. Although i see a global performance issue there :)
[14:24:02 CEST] <jkqxz> When you're transcoding 4K it's probably because you needed to downscale for some weaker device anyway.
[14:25:50 CEST] <jkqxz> And I'm still not sure what the performance issue with writing on a small part of the surface is going to be anyway.
[14:27:28 CEST] <jkqxz> (Maybe if it turns out to be some sort of crazy uncached memory, but I imagine mobile stuff all using the same memory will try to act sensibly there.)
[14:30:40 CEST] <jkqxz> So, is there any reason not to have the hwcontext formulation?  For your uses, it shouldn't be any overhead at all (you can just ignore it if you only want the DRM objects from the decoder).
[14:35:08 CEST] <wm4> so your point is mostly that we should drop the new libavcodec public header, and move it to hwcontext
[14:35:25 CEST] <wm4> but the struct would essentially survive?
[14:35:54 CEST] <jkqxz> Yes.
[14:36:12 CEST] <cone-228> ffmpeg 03Matthieu Bouron 07master:3839580b7134: lavc/mediacodecdec: switch to the new generic filtering mechanism
[14:37:05 CEST] <jkqxz> As I said above, my concern isn't that there is any problem with this method, it's that when people want to extend it it will be horrible if we have separate API everywhere.
[14:37:32 CEST] <wm4> yeah
[14:40:38 CEST] <wm4> LongChair: can you make a change based on that?
[15:07:31 CEST] <JEEB> mateo`: nice, I will have to test out the bsf'ification
[15:09:01 CEST] <mateo`> JEEB: I tested on my end and it works as expected (hope i haven't missed anything)
[15:09:24 CEST] <mateo`> JEEB: did you find out if ndk r15 was the cause of your issues with mediacodec the other day ?
[15:09:59 CEST] <mateo`> I did a quick test and ffmpeg was still working with r15 for me
[15:10:29 CEST] <wm4> mateo`: btw. I got a report that mediacodec fails with h264 if avformat_find_stream_info() is skipped on HLS (and presumably .ts) - any real reason for that?
[15:10:33 CEST] <kierank> michaelni: can you help J_Darnley if you have time
[15:11:20 CEST] <cone-228> ffmpeg 03Paul B Mahol 07master:f85cad799b52: avfilter: properly set SAR for A->V filters
[15:11:57 CEST] <mateo`> wm4: because avctx->{width, height, extradata} are mandatory probably
[15:12:14 CEST] <wm4> extradata? for annexb?
[15:12:20 CEST] <wm4> that would be batshit insane
[15:12:28 CEST] <wm4> and I don't understand the need for width/height either
[15:12:53 CEST] <michaelni> J_Darnley, can i help somehow ?
[15:13:16 CEST] <mateo`> I thought at least width/height were mandatory when I started writing the wrapper
[15:13:28 CEST] <wm4> I mean the SPS is going to contain that, and a decoder will have to be able to react to SPS and resolution changes too
[15:13:50 CEST] <wm4> and if it's really required, why the fuck did Google make it annex b and not avcc
[15:13:57 CEST] <kierank> michaelni: see ml
[15:15:00 CEST] <JEEB> mateo`: huh, funky. I do see that el goog failed in other things with r15 looking at the bug reports
[15:15:01 CEST] <mateo`> it's not required by MediaCodec, it's required by the wrapper to help us detect at init time if the underlying decoder supports the stream
[15:15:21 CEST] <JEEB> but if r15 works for you then that's rather funk'ologic
[15:15:28 CEST] <wm4> ah
[15:15:37 CEST] <mateo`> JEEB: i was not able to compile the whole stack i use at work due to header changes, but ffmpeg did
[15:15:40 CEST] <wm4> well you could just fail at runtime
[15:16:21 CEST] <mateo`> indeed, I guess it was easier to implement this way at the time
[15:17:18 CEST] <mateo`> well recently i had to patch the decoder to remove the requirement of width/height but i had to deduce them from extradata at init time
[15:17:40 CEST] <mateo`> because some decoders were asserting when doing their init
[15:18:18 CEST] <wm4> they assert if the size is not supported? just how garbage can android get
[15:19:30 CEST] <mateo`> because i was setting width=0 height=0 and they were doing something like assert(width*height>=sps_width*sps_height) internally
[15:21:37 CEST] <mateo`> i can try to remove the dependency on extradata, width, height if that helps you
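To make the current requirement concrete, here is a rough sketch of what a caller apparently has to provide before avcodec_open2() as things stand; this is illustrative only, the exact requirements are taken from the discussion above and the helper name is made up:

    #include <string.h>
    #include <libavcodec/avcodec.h>
    #include <libavutil/mem.h>

    static int open_h264_mediacodec(int width, int height,
                                    const uint8_t *extradata, int extradata_size,
                                    AVCodecContext **out)
    {
        AVCodec *codec = avcodec_find_decoder_by_name("h264_mediacodec");
        AVCodecContext *avctx;

        if (!codec)
            return AVERROR_DECODER_NOT_FOUND;
        avctx = avcodec_alloc_context3(codec);
        if (!avctx)
            return AVERROR(ENOMEM);

        /* The wrapper currently wants these up front rather than parsing
         * them from the first SPS it sees. */
        avctx->width  = width;
        avctx->height = height;
        avctx->extradata = av_mallocz(extradata_size + AV_INPUT_BUFFER_PADDING_SIZE);
        if (!avctx->extradata) {
            avcodec_free_context(&avctx);
            return AVERROR(ENOMEM);
        }
        memcpy(avctx->extradata, extradata, extradata_size);
        avctx->extradata_size = extradata_size;

        *out = avctx;
        return avcodec_open2(avctx, codec, NULL);
    }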
[15:36:37 CEST] <JEEB> btw, what sort of buffer sizes in HRD are utilized in broadcast usually? I'd guess something along the lines of 0.5-1 seconds?
[15:36:58 CEST] <JEEB> (and then CBR filling for a CBR mux)
[15:37:50 CEST] <jdarnley> Ah ha.  Found something possibly useful.  "libavcodec/tests/dct -i 2" shows an obvious difference.
[15:41:32 CEST] <DHE> JEEB: I can probably find out...
[15:43:09 CEST] <DHE> oh dear, most of what I see around here is mpeg2. still want it?
[15:45:04 CEST] <mateo`> mmm, idct8x8-0 now fails on aarch64
[15:45:28 CEST] Action: jdarnley wonders if he managed to cause that too
[15:46:11 CEST] <DHE> JEEB: what I extracted from my broadcast sources (not OTA, but close enough) was mpeg2 1080i. video bitrate estimated at 17.3 megabit CBR, buffer size estimated at 3000 kbit
[15:48:15 CEST] <kierank> jdarnley: yes 0.5 to 1 second
[15:48:18 CEST] <kierank> JEEB: 
[15:48:19 CEST] <kierank> i mean
[15:48:32 CEST] <kierank> sometimes 2
[15:48:59 CEST] <JEEB> DHE, kierank - cheers :)
[16:14:53 CEST] <jdarnley> Is there an archive of old coverage reports on the website?
[16:16:37 CEST] <mateo`> jdarnley: regarding the aarch64 failure, it's probably due to edf686f089d68092c3b17a23cc48667665b5a069 
[16:17:15 CEST] <mateo`> and because the aarch64 simple idct neon code is not accurate enough, maybe
[16:17:25 CEST] <mateo`> i will investigate
[16:38:13 CEST] <jdarnley> mateo`: I was joking when I wondered whether I broke it.  I haven't altered arm or the tests.
[16:44:10 CEST] <LongChair> @wm4 : sorry was sidetracked, reading back 
[16:45:38 CEST] <mateo`> jdarnley: I didn't get the joke (but didn't think the breakage was your fault either)
[16:49:06 CEST] <jdarnley> :)
[16:54:06 CEST] <LongChair> jkqxz: is there any chance that you could give me a few pointers on where to look for that hwcontext thing. I saw the sample you posted, but i'd need to see how those structs are used and what for, maybe in other decoders :)
[17:04:50 CEST] <atomnuker> grr, I need to finish adding support for post-skip in ogg/opus files
[17:08:11 CEST] <mateo`> jdarnley: i broke it :/
[17:41:06 CEST] <cone-228> ffmpeg 03Michael Niedermayer 07master:d549f026d8b6: avcodec/sbrdsp_fixed: Return an error from sbr_hf_apply_noise() if operations are impossible
[17:41:07 CEST] <cone-228> ffmpeg 03Michael Niedermayer 07master:d1992448d37f: avcodec/aacsbr_fixed: Check shift in sbr_hf_assemble()
[17:41:08 CEST] <cone-228> ffmpeg 03Michael Niedermayer 07master:4cc2a357f5dc: avcodec/aacsbr_fixed: Fix signed integer overflow in sbr_hf_inverse_filter()
[17:47:26 CEST] <cone-228> ffmpeg 03Matthieu Bouron 07master:8aa60606fb64: lavc/aarch64/simple_idct: fix idct_col4_top coefficient
[17:48:49 CEST] <cone-228> ffmpeg 03Matthieu Bouron 07release/3.3:20f5e2c17785: lavc/aarch64/simple_idct: fix idct_col4_top coefficient
[18:49:22 CEST] <JEEB> mateo`: btw which API level are you using?
[18:49:25 CEST] <JEEB> 21 for me
[18:50:20 CEST] <mateo`> 19 for me
[19:30:27 CEST] <durandal_1707> https://youtube.com/watch?v=nKR44fDM_uc
[19:33:01 CEST] <durandal_1707> there is one guy who only fixes security issues, says in podcast
[19:53:29 CEST] <cone-228> ffmpeg 03Timo Rothenpieler 07master:21583e936a06: avfilter/unsharp: fix uninitialized pointer read
[19:53:30 CEST] <cone-228> ffmpeg 03Timo Rothenpieler 07master:0fbc9bbbbb39: avfilter/vf_scale_npp: fix out-of-bounds reads
[19:53:31 CEST] <cone-228> ffmpeg 03Timo Rothenpieler 07master:4b2a2969f3e0: avformat/librtmp: check return value of setsockopt
[19:53:32 CEST] <cone-228> ffmpeg 03Timo Rothenpieler 07master:a5b5ce2658bf: avformat/pcmdec: fix memory leak
[19:53:33 CEST] <cone-228> ffmpeg 03Timo Rothenpieler 07master:feb13aed794a: avfilter/vf_signature: fix memory leaks in error cases
[19:55:36 CEST] <philipl> BtbN: have you looked at how to do generic hwaccel for ffmpeg.c for cuda? Is any work even necessary?
[19:55:51 CEST] <BtbN> i'd assume it'd just work?
[19:59:36 CEST] <philipl> just rm ffmpeg_cuda.c?
[20:02:08 CEST] <BtbN> Is the thing even merged yet?
[20:11:33 CEST] <philipl> No. but presumably will be soonish
[20:23:11 CEST] <wm4> did anyone ever look at supporting PolarSSL/mbed in ffmpeg?
[20:23:26 CEST] <wm4> (I have no idea whether it's license compatible, or technically feasible)
[20:25:02 CEST] <wbs> wm4: I had a 2 minute look back when I made the original backends, but back then it was gplv2 which didn't really feel like it was worth the effort
[20:25:12 CEST] <wbs> wm4: with apache2 now it might be more useful
[20:28:41 CEST] <wm4> yeah, potentially
[20:42:56 CEST] <rcombs> how far along is OpenSSL's apache relicense, anyway
[20:53:35 CEST] <wm4> apparently it started only in march this year
[20:53:46 CEST] <wm4> when they also got bad press for badly worded relicensing mails
[21:14:13 CEST] <rcombs> their CLA process is awful
[21:14:24 CEST] <rcombs> requires signing a (not easily-located) PDF and emailing it in
[21:14:40 CEST] <rcombs> I've got no problem with CLAs for these things but fuck signing PDFs
[21:14:52 CEST] <rcombs> that's an excellent way to get people not to bother
[21:24:36 CEST] Action: JEEB wonders how much random magic people put in their ARM FFmpeg configure lines
[21:26:38 CEST] <JEEB> stuff like extra-cflags="-mfpu=neon"
[21:27:52 CEST] <wbs> JEEB: many environments' baseline is armv7 but without neon, so the compiler isn't free to use neon anywhere in normal generated code
[21:28:20 CEST] <wbs> JEEB: back before 2012 or something such, you would have needed this in order to build with neon simd, but since then we build the neon code regardless and only enable it at runtime if found
[21:28:30 CEST] <JEEB> yea
[21:29:02 CEST] <JEEB> I'm just wondering how much extra optimizations you get by specifically enabling mfpu neon etc :V
[21:29:15 CEST] <JEEB> I just noticed one of my build scripts had extra-cflags="-mcpu=cortex-a8 -mfpu=neon"
[21:29:20 CEST] <JEEB> and I wonder if I should just drop it
[21:29:53 CEST] <JEEB> with x86 I just know that the gains you get from any additional flags is so small it doesn't matter
[21:29:57 CEST] <wbs> well, it allows the compiler to use neon instructions for some e.g. 16 byte loads/stores and such
[21:30:14 CEST] <wbs> especially as long as autovectorization isn't enabled, it probably doesn't gain you anything worthwhile
[21:30:25 CEST] <wbs> in general I'd recommend removing any extra custom flags you don't really need
[21:30:39 CEST] <JEEB> yea, the autovectorization stuff's been disabled for a long time
[21:30:52 CEST] <JEEB> (there was a brief moment of it being enabled IIRC)
[21:31:15 CEST] <JEEB> but yea, I guess no voodoo
[21:31:20 CEST] <JEEB> also for some weird reason I had enabled thumb
[21:31:37 CEST] Action: JEEB stares at his past self
[21:32:21 CEST] <wbs> well thumb is pretty commonly on by default in armv7-baseline toolchains, since iirc they all have thumb2 (and thumb2 is not all too bad contrary to thumb1)
[21:41:51 CEST] <rcombs> I mentioned a while ago but here's a fun fact
[21:42:19 CEST] <rcombs> autoconf inserts -O2 (I think, one of the -O args) in CFLAGS _only if the env variable is empty at configure-time_
[21:42:35 CEST] <rcombs> otherwise it uses whatever you passed and doesn't add an optimization flag
[21:43:01 CEST] <rcombs> (found this out when people complained about libopusenc being slow on ARM)
[21:43:33 CEST] <wbs> yeah, that's a rather annoying thing. I sometimes inject extra cflags via the CC variable when dealing with autotools
[21:44:10 CEST] <rcombs> I sometimes do that when dealing with libtool, since it just _discards compiler/linker flags it doesn't recognize_
[21:44:48 CEST] <JEEB> hah :D
[21:58:50 CEST] <BBB> jdarnley: do you need any more help with the idct patch?
[22:00:02 CEST] <jdarnley> Yes.
[22:00:20 CEST] <jdarnley> Have you seen my two rant-filled emails?
[22:01:06 CEST] <jdarnley> Otherwise I can give you a summary of where I am
[22:12:57 CEST] <jamrial> ubitux: anything stopping your aacps patches? both the checkasm test and the aarch64 functions
[22:14:31 CEST] <jamrial> can't say what to do with that test you found that's failing on arm
[22:18:46 CEST] <BBB> jdarnley: well I did see the rants
[22:19:06 CEST] <BBB> jdarnley: Im not quite sure what the problem was, maybe you can explain it in less ranty and more technical terms to me?
[22:22:32 CEST] <jdarnley> Sure
[22:22:57 CEST] <ubitux> jamrial: yeah i guess i need to investigate that bug
[22:23:08 CEST] <ubitux> otherwise nothing special is preventing the push
[22:23:30 CEST] <jdarnley> The first problem michaelni highlighted (using 3 out of 6 patches) was because the perm_type I set was used for idct add/put functions with different permutation
[22:23:42 CEST] <ubitux> jdarnley: then the aarch64 functions need to be optimized, one is still slower
[22:23:50 CEST] <ubitux> that's what i'm doing tomorrow
[22:24:04 CEST] <jdarnley> I think you mean jamrial ^
[22:24:16 CEST] <jamrial> ubitux: you can look at what i did for deint/ileave (or the arm version for the former) if you want to also port them to aarch64
[22:24:24 CEST] <BBB> jdarnley: uh... wait
[22:24:32 CEST] <BBB> jdarnley: you used different perm_type for idct_add/put than for idct??
[22:24:36 CEST] <BBB> how is that even possible
[22:24:47 CEST] <ubitux> jamrial: ah yeah i need to write that one, maybe in a next iteration
[22:24:52 CEST] <BBB> or you mean it was a typo and you forgot to set it to the correct one?
[22:25:47 CEST] <jdarnley> No, letting the old MMX code set the 3 pointers and perm_type correctly, then I overwrote the idct func. pointer and the perm_type.
[22:26:04 CEST] <jdarnley> Basically the 3 new functions must be applied together
[22:26:14 CEST] <jdarnley> (I might squash the patches together)
[22:26:16 CEST] <BBB> oh right I see what you mean
[22:26:19 CEST] <BBB> yes yes indeed
[22:26:27 CEST] <BBB> I understand the problem now
[22:26:45 CEST] <BBB> so michaelni is it fixed when you apply all patches (including 4/5/6) from the series?
[22:26:51 CEST] <jdarnley> No
[22:26:59 CEST] <jdarnley> It is still different in that case
[22:27:17 CEST] <BBB> do you know what specifically is different?
[22:27:25 CEST] <BBB> is it idct, add, put, all 3?
[22:27:46 CEST] <jdarnley> Probably all 3 because the idct is wrong.
[22:27:59 CEST] <BBB> slightly wrong or totally garbage?
[22:28:03 CEST] <jdarnley> I discovered the alternative tests of libavcodec/tests/dct
[22:28:08 CEST] <jdarnley> Only slightly
[22:28:24 CEST] <jdarnley> try running: libavcodec/tests/dct -i 2
[22:28:27 CEST] <BBB> got a tree I can pull for testing?
[22:30:45 CEST] <jdarnley> The previous link I gave to Gitlab should be the same
[22:31:13 CEST] <jdarnley> Yeah, it's just missing a rebase
[22:31:40 CEST] <jdarnley> https://gitlab.com/J_Darnley/ffmpeg/commits/mpeg2_asm2
[22:32:46 CEST] <BBB> is that the remote I use for git also?
[22:33:23 CEST] <jdarnley> No
[22:34:09 CEST] <jdarnley> https://J_Darnley@gitlab.com/J_Darnley/ffmpeg.git
[22:34:14 CEST] <jdarnley> mpeg2_asm2 branch
[22:35:05 CEST] <jdarnley> What?  Where's that @ come from?
[22:35:07 CEST] <jdarnley> https://gitlab.com/J_Darnley/ffmpeg.git
[22:35:26 CEST] <jdarnley> sorry
[22:35:51 CEST] <BBB> its ok it worked
[22:35:52 CEST] <BBB> :-p
[22:35:57 CEST] <BBB> Im now signed on as you
[22:36:02 CEST] <BBB> <evil>
[22:36:08 CEST] <BBB> what do I look for?
[22:36:10 CEST] <jdarnley> Oh.  I thought that might need login credentials
[22:36:19 CEST] <jdarnley> try running: libavcodec/tests/dct -i 2
[22:36:37 CEST] <jdarnley> You should see that the new functions don't produce the same output as MMX
[22:37:06 CEST] <BBB> IDCT SIMPLE8-SSE2 vs. IDCT SIMPLE-MMX ?
[22:37:11 CEST] <jdarnley> yes
[22:37:48 CEST] <BBB> what is -i 2?
[22:38:02 CEST] <jdarnley> -i <-- run IDCT tests
[22:38:22 CEST] <jdarnley> 2 <-- run "test 2"
[22:38:34 CEST] <jdarnley> I think a comment said it was from an mpeg spec
[22:39:34 CEST] <jdarnley> Ah no, it's from the help output of that
[22:39:46 CEST] <jdarnley> -h
[22:40:03 CEST] <jdarnley> brb
[22:42:57 CEST] <jdarnley> back
[22:59:14 CEST] <BBB> I got a test case, working on it...
[22:59:30 CEST] <BBB> cant say this is mega-exciting but Ill see what I can do ;)
[23:04:11 CEST] <jdarnley> Ha.  I typed up most of an email this afternoon that was supposed to be a reply to michaelni but I never sent it.
[23:20:39 CEST] <wm4> so the new bit reader code actually blocks a feature now? (vp9 vaapi encode support or something)
[23:22:43 CEST] <jkqxz> Not really, it's very easy to convert.  I just didn't really want to bother.
[23:25:17 CEST] <jkqxz> philipl:  It should just work by removing the ffmpeg_cuvid file and changing the hwaccel table entry.  That on its own would change the default behaviour of the hwaccel, though - without -hwaccel_output_format it will download by default.
[23:26:11 CEST] <jkqxz> Probably that isn't what you want?  Really a consequence of the clumsy fake-hwaccel stuff, though, so I'm not sure how it would want to be fixed.
[23:26:19 CEST] <jkqxz> Detect fake hwaccels and default to using the hardware format with them, maybe?
[23:28:29 CEST] <BtbN> don't those fake hwaccels only exist because ffmpeg.c is bad in the first place?
[23:28:30 CEST] <wm4> might be a problem for "compatibility" maybe
[23:32:08 CEST] <TMM> hi all, I've finished the reverse engineering of these extra video formats for ipmovie, I'm trying to figure out how to implement this in avformat/avcodec now
[23:32:47 CEST] <TMM> currently ipmovie.c / interplayvideo.c assumes only one type of video format, what would be the best way of adding support for two more distinct frame formats to it?
[23:33:03 CEST] <TMM> well, frame encodings I suppose
[23:33:44 CEST] <TMM> I can just add a flag to the base structure for the format and handle that in interplayvideo.c, but that's not how it works for, say, avi. Should I rework ipvideo to be a container format somehow?
[23:34:17 CEST] <TMM> also; would an encoder be a mandatory part of a patch to ffmpeg?
[23:34:25 CEST] <JEEB> no
[23:34:31 CEST] <JEEB> decoder-only patches are completely OK
[23:34:49 CEST] <JEEB> we have various formats where making an encoder (as in creating /more/ files in a format) just doesn't make sense
[23:34:56 CEST] <wm4> either new codec ID, or maybe just switch on codec_tag or make up extradata for parameters
[23:35:14 CEST] <TMM> wm4, there are only 3 possibilities
[23:35:43 CEST] <wm4> not sure if codec_tag is appropriate for putting in random parameters, but I don't think anyone would mind
[23:36:51 CEST] <TMM> ipmovie.c already has several extra fields that aren't standard, it seems, and I need at least one more (for an extra data stream)
[23:37:04 CEST] <TMM> IPMVEContext
[23:37:11 CEST] <TMM> I could stick it in there 
[23:37:42 CEST] <durandal_1707> if stuff is not signalled in bitstream use extradata
[23:38:12 CEST] <nevcairiel> you can do whatever you want internally in the demuxer, the question is what information you need to communicate to the video decoder, if any
[23:38:49 CEST] <TMM> i need to communicate the frame format, the 'decoding map' (already used by the current frame format), and a 'skip map'
[23:38:50 CEST] <durandal_1707> eg. demuxer signals via extradata to decoder some info
[23:39:00 CEST] <TMM> is that this IPMVEContext struct?
[23:39:53 CEST] <wm4> demuxer and decoder are strictly separate
[23:40:03 CEST] <wm4> you can't access the IP specific structs from each other
[23:40:24 CEST] <durandal_1707> TMM: no, thats private to demuxer/decoder
[23:40:37 CEST] <BBB> wtf
[23:40:42 CEST] <BBB> the dc in pass 1 is 1 off
[23:40:48 CEST] <BBB> how on earth is that possible
[23:40:52 CEST] <TMM> ok, I'll have to figure out how this data is handed to interplayvideo.c now then
[23:40:56 CEST] <durandal_1707> you can signal frame from demuxer to decoder
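A minimal sketch of the extradata route being described here; the single byte and its meaning are hypothetical, and per-frame data such as the decoding/skip maps would still travel inside the packets themselves. The decoder would read the byte back from avctx->extradata in its init function.

    #include <libavformat/avformat.h>
    #include <libavutil/mem.h>

    /* In the demuxer: tell the decoder, once per stream, which of the
     * frame encodings this file uses. */
    static int export_frame_format(AVStream *st, uint8_t frame_format)
    {
        AVCodecParameters *par = st->codecpar;

        par->extradata = av_mallocz(1 + AV_INPUT_BUFFER_PADDING_SIZE);
        if (!par->extradata)
            return AVERROR(ENOMEM);
        par->extradata[0]   = frame_format;
        par->extradata_size = 1;
        return 0;
    }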
[23:41:30 CEST] <BBB> jdarnley: -13087 in sse2 vs. -13088 in mmx/c
[23:41:35 CEST] <BBB> dc at end of pass=1
[23:41:40 CEST] <TMM> also; if one of you wouldn't mind having a look at this fine loop: https://notabug.org/hp/rekingdom/src/master/video.c#L405 <--- this is my implementation. I'm calling this a 'skip map' but perhaps someone recognizes this structure as something that has a real name
[23:44:31 CEST] <BBB> oh curse it, it's the special dc-only behaviour which is broken
[23:44:39 CEST] <BBB> I dont think we want to fix that
[23:45:17 CEST] <durandal_1707> TMM: skip map is fine
[23:45:39 CEST] <jdarnley> BBB: I missed that?
[23:46:03 CEST] <jdarnley> oh..
[23:46:51 CEST] <BBB> remember how W4 changed from 16384 to 16383
[23:46:56 CEST] <jdarnley> yes
[23:46:58 CEST] <BBB> that should really be 16384
[23:47:09 CEST] <BBB> I dont know why it is 16383 in the mmx code
[23:47:28 CEST] <BBB> I believe that slightly alters the dc coefficient outputs in the non-dc-only case of the C code
[23:47:38 CEST] <jdarnley> Because the 8-bit C code has 16383
[23:48:12 CEST] <BBB> I dont understand why though
[23:48:22 CEST] <BBB> why wouldnt we make it 16384?
[23:48:27 CEST] <BBB> theres no overflow at 14bit
[23:49:01 CEST] <BBB> anyway
[23:49:07 CEST] <BBB> so thats causing the differences between sse2/mmx
[23:49:10 CEST] <BBB> or sse2/c
[23:49:30 CEST] <BBB> the c code says that if dc-only, do a special case where you just shift it
[23:49:40 CEST] <BBB> which is identical to coef which is power-of-two
[23:49:42 CEST] <BBB> such as 16384
[23:49:47 CEST] <BBB> then in the non-dconly case, use 16383
[23:49:50 CEST] <BBB> the mmx code does that too
[23:49:58 CEST] <BBB> the sse2 code has no special case, it always does a full idct
[23:50:06 CEST] <BBB> so in both cases it uses either 16384 or 16383
[23:50:09 CEST] <BBB> that causes the bug
[23:50:22 CEST] <TMM> hum, I'm looking at libavformat/ipmovie.c and I don't quite understand how the chunk map gets sent to libavcodec/interplayvideo.c, it seems to appear as avctx->priv_data.
[23:50:30 CEST] <jdarnley> Well that is a comprehensive explanation.
[23:50:36 CEST] <TMM> it's a different structure, but I can't figure out where it's being filled
[23:51:20 CEST] <nevcairiel> maybe the entire chunk is just part of the video track and decoded
[23:51:42 CEST] <TMM> some of it seems to come from avpkt->data
[23:52:01 CEST] <TMM> I'd need two of those though? grmbl
[23:53:31 CEST] <jdarnley> Okay... new branch, coeff change, see what happens
[23:53:39 CEST] <jdarnley> BBB:  What about the W3 change?  19266 vs 19265
[23:53:47 CEST] <BBB> not sure yet
[23:53:53 CEST] <jdarnley> okay
[23:54:02 CEST] <BBB> but if we change the C code and MMX code to use 16384, it should be fine
[23:54:06 CEST] <BBB> that may break other things, not sure
[23:54:10 CEST] <BBB> this mpeg stuff is creepy :D
[23:54:21 CEST] <TMM> wm4, also, sorry, I didn't realize that IpvideoContext and IPMVEContext were completely different... I should listen better
[23:56:50 CEST] <durandal_1707> TMM: demuxer prepares data for decoder
[23:57:09 CEST] <BBB> yes, I get 0 maxErr after that for mmx, sse2 as well as C
[23:57:13 CEST] <BBB> so they match exactly
[23:57:29 CEST] <BBB> I cant really comment on why the value 16383 was there, even the comments say 16384.000 is what it should be
[23:57:44 CEST] <jdarnley> A many year old typo?
[23:57:52 CEST] <BBB> ...
[23:57:58 CEST] <BBB> I wish I knew
[23:58:09 CEST] <BBB> Im really quite quite quite sure the value should be 16384
[23:58:22 CEST] <BBB> michaelni: any idea?
[23:58:24 CEST] Action: jdarnley goes to his server for this testing
[23:58:37 CEST] <BBB> sorry for having to go through this
[23:58:42 CEST] <BBB> this is probably not the most fun thing to work on
[23:59:29 CEST] <BBB> 19266 vs. 19265, the 19266 may be more correct
[23:59:33 CEST] <BBB> the rounded value is 19265.54
[23:59:40 CEST] <BBB> so correct rounding would make that 19266
[23:59:53 CEST] <BBB> I dont know why we made it 19265 for the 10bit case
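A quick standalone check of this, my own arithmetic assuming the scaling convention implied by the values quoted above, W_i = round(2^14 * sqrt(2) * cos(i*pi/16)); it reproduces both constants under discussion:

    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    int main(void)
    {
        for (int i = 1; i <= 7; i++) {
            double w = 16384.0 * sqrt(2.0) * cos(i * M_PI / 16.0);
            printf("W%d = %10.3f -> %d\n", i, w, (int)lround(w));
        }
        /* W4 comes out as exactly 16384 (sqrt(2) * cos(pi/4) == 1) and
         * W3 as 19265.545 -> 19266, matching the values quoted above. */
        return 0;
    }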
[00:00:00 CEST] --- Wed Jun 14 2017

