[Ffmpeg-devel-irc] ffmpeg-devel.log.20170616

Sat Jun 17 03:05:02 EEST 2017

[00:24:05 CEST] <cone-959> ffmpeg 03Michael Niedermayer 07master:611b35627488: avcodec/dnxhd_parser: Do not return invalid value from dnxhd_find_frame_end() on error
[00:24:06 CEST] <cone-959> ffmpeg 03Michael Niedermayer 07master:e3fadc57c5c1: avcodec/jpeg2000: Fixes integer overflow in ff_jpeg2000_ceildivpow2()
[00:24:07 CEST] <cone-959> ffmpeg 03Michael Niedermayer 07master:3c716682a8b6: avcodec/truemotion2: Move skip computation after checks
[01:13:17 CEST] <Zeranoe> Has there been any work done on a non-local means algorithm that uses the GPU?
[01:18:04 CEST] <Zeranoe> This http://www.joig.org/uploadfile/2015/0911/20150911035858832.pdf claims a 25x increase in speed
[02:25:32 CEST] <atomnuker> considering vulkan is required to and by default doesn't need anything to present to and can just run whatever you want on whatever they sky's the limit
[02:32:50 CEST] <cone-959> ffmpeg 03Michael Niedermayer 07master:c0607d88ee1f: avcodec/parser: assert that there is a past buffer if theres a reference into the past
[02:44:28 CEST] <Zeranoe> Has anything based on Vulkan been merged?
[03:06:37 CEST] <atomnuker> nope
[03:37:12 CEST] <TD-Linux> darktable has a noise reduction nlmeans filter implemented with opencl
[04:22:03 CEST] <cone-959> ffmpeg 03James Almer 07master:b3446862bfdb: x86/vorbisdsp: optimize ff_vorbis_inverse_coupling_sse
[04:51:19 CEST] <cone-959> ffmpeg 03James Almer 07master:623d217ed1ba: avcodec/aacps: move checks for valid length outside the stereo_interpolate dsp function
[04:54:56 CEST] <FishPencil> Do most FFmpeg devs hold CS degrees and work in the field? Is FFmpeg a fulltime job for anyone?
[11:08:28 CEST] <kierank> FishPencil: everything from no degree to phd i think
[11:08:33 CEST] <kierank> i.e doesn't matter
[13:42:38 CEST] <kierank> michaelni: did you have a broken build perhaps
[13:50:23 CEST] <BBB> did he test the earlier patch which failed to allocate stack space?
[13:54:19 CEST] <J_Darnley> He was replying to the right one
[13:55:45 CEST] <atomnuker> BBB: btw managed to compile git master on the board, same, no difference in speed since a month
[13:56:04 CEST] <atomnuker> all the top 10 functions in perf were from the loopfilter
[13:56:25 CEST] <atomnuker> (couldn't even see anything dct related on the first page)
[13:56:40 CEST] <BBB> hm...
[13:56:50 CEST] <BBB> J_Darnley: lets ask him :)
[13:56:56 CEST] <atomnuker> ubitux: why did I manage to compile ffmpeg without gas-preprocessor.pl anywhere on the system?
[13:57:14 CEST] <nevcairiel> wasnt that fixed
[13:57:28 CEST] <atomnuker> the code in configure even still says it'll reject aarch64 builds without gas-preprocessor
[13:58:24 CEST] <atomnuker> yet it still compiled with neon, and I made very sure I'm not tripping and deleted the only gas-preprocessor.pl I had anywhere
[13:59:38 CEST] <michaelni> J_Darnley, ill retest but i remeber revertig the od patch and applying the new so it should have been the right one
[14:01:37 CEST] <BBB> atomnuker: is the loopfilter in C or in assembly?
[14:01:46 CEST] <BBB> (for aarch64)
[14:01:47 CEST] <atomnuker> C
[14:01:51 CEST] <BBB> wut?
[14:01:56 CEST] <BBB> wbs: whathappened?!?
[14:02:04 CEST] <BBB> did that get overlooked?
[14:02:10 CEST] <J_Darnley> michaelni: Thank you and sorry to be a pain in the backside
[14:02:34 CEST] <BBB> aarch64/vp9lpf_neon.o ?
[14:02:35 CEST] <ubitux> atomnuker: aarch64 doesn't require gas pp anymore afaik
[14:02:39 CEST] <BBB> that doesnt sound right
[14:02:59 CEST] <ubitux> atomnuker: actually, even arm should be fine, that might be just for the apple stack
[14:03:30 CEST] <ubitux> and even the apple stack should be fine without gaspp for aarch64
[14:03:35 CEST] <atomnuker> ubitux: since which commit?
[14:03:49 CEST] <ubitux> since the apple stack works fine?
[14:03:58 CEST] <mateo`> https://github.com/FFmpeg/FFmpeg/commit/8aa60606fb64b8280627935b0df55d4d2aeca5d1
[14:04:00 CEST] <atomnuker> no, aarch64 doesn't require gaspp
[14:04:12 CEST] <ubitux> it may need it before the linked hash
[14:04:21 CEST] <ubitux> but on apple only
[14:04:28 CEST] <BBB> atomnuker: and vp9dsp_loopfilter_init_aarch64
[14:04:44 CEST] <BBB> atomnuker: there should be loopfilter simd for aarch64
[14:04:46 CEST] <BBB> the source is there
[14:04:58 CEST] <atomnuker> BBB: oh,    6.19%  ffmpeg_g            [.] ff_vp9_loop_filter_h_16_16_neon
[14:05:04 CEST] <BBB> \o/
[14:05:20 CEST] <BBB> wbs: nevermind :-p
[14:05:33 CEST] <ubitux> btw, after a few days we managed with mateo` to get a mainline linux on the hikey
[14:05:38 CEST] <ubitux> unfortunately, still no pmu
[14:05:42 CEST] <mateo`> yet
[14:05:45 CEST] <ubitux> yet.
[14:05:48 CEST] <atomnuker> hikey?
[14:06:12 CEST] <ubitux> one of the first aarch64 dev board
[14:06:26 CEST] <ubitux> http://www.96boards.org/product/hikey/
[14:06:41 CEST] <ubitux> it's a nightmare but we were able to flash arch on the emmc
[14:08:02 CEST] <mateo`> wbs: do you want to fix the vp9 aarch64 asm with apple clang or should i do it ?
[14:13:59 CEST] <atomnuker> BBB: https://0x0.st/lKO.txt <- perf printout if you're interested
[14:14:17 CEST] <BBB> oh yeah so the top is simply MC
[14:14:20 CEST] <BBB> thats totally normal
[14:14:31 CEST] <BBB> thats a heavy loopfilter indeed
[14:14:57 CEST] <atomnuker> what's up with that memcpy?
[14:16:27 CEST] <atomnuker> (maybe its in lavf actually)
[14:18:14 CEST] <BBB> would need to see call trees
[14:18:17 CEST] <BBB> difficult to say
[14:18:33 CEST] <BBB> I dont think theres a ton of memcpy in the decoder
[14:18:38 CEST] <BBB> but maybe Im wrong
[14:18:47 CEST] <BBB> I guess loopfilter backup is memcpy
[14:29:59 CEST] <kierank> J_Darnley: michaelni is encoding the file
[14:30:03 CEST] <kierank> whereas you are playing it
[14:30:04 CEST] <kierank> afaik
[14:30:47 CEST] <J_Darnley> I've tried his exact command now I have the same file.
[14:30:55 CEST] <J_Darnley> I get no difference.
[14:31:04 CEST] <J_Darnley> No more errors
[14:31:17 CEST] <J_Darnley> (See my last email)
[14:31:19 CEST] <kierank> then i dunno
[14:32:41 CEST] <michaelni> J_Darnley, the error occur in the ffplay command not the encode command
[14:32:47 CEST] <michaelni> errorS
[14:33:07 CEST] <michaelni> theres in fact noting resemling the original displayed
[14:34:48 CEST] <michaelni> J_Darnley, see ML btw ive repled 5min ago in case you didnt see it
[14:35:00 CEST] <J_Darnley> I did not, thanks
[14:35:16 CEST] Action: J_Darnley moves his mIRC window over
[14:41:15 CEST] <gnafu> Wow, mIRC...
[14:41:25 CEST] <gnafu> "Now there's a name I've not heard in a long time."
[14:42:36 CEST] <J_Darnley> :)
[14:43:20 CEST] <J_Darnley> michaelni: is the permutation in the else block "no permutation"?
[14:44:25 CEST] <J_Darnley> Looks like all the array indicies are the same
[14:47:02 CEST] <michaelni> its probably no perm yes
[14:47:45 CEST] <J_Darnley> Okay.  I guess I'll look at addressing that.
[15:44:11 CEST] <jkqxz> LongChair: wm4:  A quick-and-dirty attempt at DRM hwcontext <http://sprunge.us/EYRV>.
[15:45:14 CEST] <jkqxz> E.g. on Intel: "./ffmpeg_g -y -init_hw_device drm:/dev/dri/card0 -i in.mp4 -an -filter_hw_device drm0 -vf 'format=nv12,hwupload,format=drm,hwmap=derive_device=vaapi,format=vaapi' -c:v h264_vaapi out.mp4"
[15:49:04 CEST] <jkqxz> Unclear how that would need to change for anything else.  The Intel VAAPI driver wants a single object containing both planes of NV12, which is what I've hacked up there.
[15:51:11 CEST] <jkqxz> I think we might want more metadata about the size of each object.  Maybe the descriptor would be better as a list of objects with sizes and then a list of planes as object index + offset and pitch?
[15:52:45 CEST] <wm4> what's a dumb buffer?
[15:55:31 CEST] <jkqxz> The simplest buffer type the driver is able to use for scanout.  (Since the DRM drivers come from rendering for output that's the baseline support for every driver.)
[15:55:53 CEST] <wm4> also I think we might need better code structure for MxN device mappings
[15:56:15 CEST] <wm4> currently it seems if we support A->B mapping, we put for code for it into A.c, with #ifdef CONFIG_B
[15:56:36 CEST] <wm4> maybe we should have map_a_b.c or something
[15:56:59 CEST] <wm4> jkqxz: what would other types be? tiled and such?
[15:57:02 CEST] <jkqxz> Not necessarily - see OpenCL.  I put the code in the "downstream" API if that is well-defined.
[15:57:22 CEST] <jkqxz> In this case VAAPI is downstream from DRM, so it's there as map_to.
[15:57:47 CEST] <jkqxz> Though yeah, it will get a bit clumsy if there are more variations.
[15:59:15 CEST] <LongChair> @jkqxz : checking
[15:59:48 CEST] <jkqxz> Tiled is a modifier to other buffers, not a type itself.  GEM is the main other type, there are probably more possibilities.
[16:01:15 CEST] <jkqxz> (Many of the drivers have 9001 private ioctls implementing all sorts of crazy things.)
[16:01:51 CEST] <wm4> apropos ioctl, doesn't libdrm have wrappers for the standard ones?
[16:01:53 CEST] <wm4> (not sure)
[16:03:49 CEST] <jkqxz> Only some of them.  The libdrm API is very fragmented, and much of it is driver-specific.
[16:04:39 CEST] <LongChair> i dont think DRM_IOCTL_MODE_CREATE_DUMB has any but DRM_IOCTL_PRIME_HANDLE_TO_FD has some
[16:04:40 CEST] <jkqxz> The dumb buffers don't.  Looks like the fd <-> handle ones do, though - I can change that.
[16:04:47 CEST] <jkqxz> Yeah, that :)
[16:05:27 CEST] <wm4> (then why does libdrm even exist)
[16:05:35 CEST] <wm4> time to call all linux kernel developers stupid
[16:05:49 CEST] <LongChair> DRM_IOCTL_PRIME_HANDLE_TO_FD -> drmPrimeFDToHandle
[16:06:03 CEST] <LongChair> or the other way round :)
[16:08:24 CEST] <LongChair> jkqxz: if i understand correctly, that attempt creates a dumb buffer pool. in RK case, MPP will create it's own dumb buffers
[16:08:39 CEST] <jkqxz> It doesn't need to create anything.
[16:08:57 CEST] <jkqxz> You can supply a pool if you want, or just attach a frames context and never allocate in it.
[16:10:42 CEST] <jkqxz> What you want for MPP here is to start with a device context (fd can be unset or your DRM node if you have one, doesn't really matter), then whenever the format changes you make a new empty frames context with the appropriate height/width/format and attach that to all of the output frames.
[16:11:30 CEST] <J_Darnley> michaelni, BBB: I've sent two patches to address the transpose issue
[16:11:57 CEST] <wm4> LongChair: again this doesn't change functionality, it's mostly an "API formalities" thing, so we can handle HW decoding in an uniform way
[16:12:48 CEST] <BBB> J_Darnley: cool
[16:21:10 CEST] <LongChair> jkqxz: i need ot read that carefully, but is the AV_PIX_FMT_DRM supposed to replace the AV_PIX_FMT_DRMPRIME format i had in the mpp hwdec ? 
[16:24:58 CEST] <jkqxz> It's the same thing, yeah.  Don't really mind on the name of the pixfmt - it seemed wrong to use it the hwcontext name (PRIME is just the sharing mechanism) so I made it DRM throughout, maybe DRM_PRIME would be clearer for the pixfmt type though.
[16:25:30 CEST] <LongChair> yeah  AV_PIX_FMT_DRM is fine by me, just wanted to make sure :)
[16:26:11 CEST] <cone-613> ffmpeg 03Rostislav Pehlivanov 07master:18f09524f72c: configure: use -x instead of -wN ..@ to strip assembly files
[16:26:36 CEST] <atomnuker> J_Darnley: had forgotten about this, pushed
[16:26:38 CEST] <J_Darnley> I forgot about that ^
[16:26:41 CEST] <J_Darnley> .. too
[16:26:54 CEST] <jkqxz> Hmm, maybe DRM_PRIME would be better to make clear that it's always an fd rather than an internal handle or a flinked name.
[16:28:15 CEST] <jkqxz> (And if someone cares you could have DRM_FLINK as well, I guess.  Does anything new use flink nowadays?)
[16:31:28 CEST] <LongChair> jkqxz: so what's your plan, finsih up with VAAPI & me adapting MPP to it ? 
[16:33:49 CEST] <jkqxz> Dunno.  I wrote that to try to clarify what the requirements should be, so that the external API part can be set.
[16:35:08 CEST] <jkqxz> VAAPI doesn't really need it, though it makes a nice test case to see it working.
[16:38:25 CEST] <LongChair> jkqxz: up to you :) 
[17:24:23 CEST] <jkqxz> LongChair: wm4:  New version <http://sprunge.us/fVaM>.  DRM -> DRM_PRIME for the pixfmt, and I've made the descriptor a bit more verbose and hopefully clearer.
[17:24:49 CEST] <jkqxz> Also the DRM format is per-object.
[17:28:51 CEST] <wm4> jkqxz: how would rockchip's funny 10 bit format fit in there?
[17:29:32 CEST] <wm4> also why is the format per object?
[17:29:33 CEST] <jkqxz> Just make a new pixfmt for it to use as sw_format, and the DRM stuff is whatever they use.
[17:29:37 CEST] <wm4> I'm not sure how much that makes sense
[17:29:47 CEST] <wm4> I know vaapi uses different formats per plane
[17:29:55 CEST] <wm4> but the rockchip thing uses a format for the whole thing
[17:30:18 CEST] <jkqxz> The rockchip thing is also one object, though?
[17:30:26 CEST] <wm4> (rockchip also forces RGB conversion when you use these frames, so it "sort of" makes sense)
[17:30:28 CEST] <wm4> hm, not sure
[17:30:33 CEST] <wm4> LongChair: ?
[17:31:13 CEST] <jkqxz> I'm more thinking of Mesa, where you get an R8 object and an RG88 object separately.
[17:31:38 CEST] <jkqxz> So that really would be two separate objects and want this style.
[17:32:22 CEST] <BBB> J_Darnley: can we use the permutation table directly in the quantization stage?
[17:32:35 CEST] <BBB> J_Darnley: it seems strange that wed need a specific extra place to check for perm_type
[17:32:42 CEST] <wm4> jkqxz: yeah, ok, I get it
[17:32:42 CEST] <BBB> not that Im against it, and maybe this is more a question for michaelni
[17:33:00 CEST] <wm4> jkqxz: the thing is, NV12 actually still has 2 planes
[17:33:19 CEST] <wm4> each plane has an offset into the buffer, and a stride
[17:33:34 CEST] <wm4> but I suppose that's why you have separate arrays
[17:33:52 CEST] <wm4> so it's up to the final API user to know about the difference
[17:33:53 CEST] <jkqxz> Maybe there should be an AV_PIX_FMT_WTF to represent something opaque and bizarre which you shouldn't look inside, and we can use that as the sw_format for rockchip 10-bit and similar things.
[17:34:07 CEST] <jkqxz> In the absence of swscale support that's basically what it is, since you can't do anything else with it.
[17:34:21 CEST] <wm4> would it be possible to define an opaque sw_format? or the thing that I wanted from the beginning: a hw-API-specific int64 identifier
[17:34:33 CEST] <wm4> AVHWFramesContext.sw_format_native or something
[17:36:40 CEST] <jkqxz> I think the answer to the format problem depends on how the multiple-plane ones are actually represented internally.
[17:36:59 CEST] <jkqxz> Maybe it's just meaningless on the object, and should instead be attached to the plane - that would match how EGL uses it?
[17:37:32 CEST] <jkqxz> Or does EGL import on rockchip actually have a single NV12 "plane" in DRM_FORMAT_NV12?
[17:37:50 CEST] <jkqxz> Which magically does ... something ... in the driver.
[17:39:15 CEST] <jkqxz> (The dumb buffers in that patch are entirely independent of the format - I just gave it a size and made up the format myself.)
[17:40:47 CEST] <wm4> well LongChair linked the (working) EGL code for this
[17:40:52 CEST] <wm4> but I think the answer is yes?
[17:41:04 CEST] <wm4> they don't allow you to actually sample from the planes
[17:41:09 CEST] <wm4> you get a virtual RGB texture
[17:42:36 CEST] <J_Darnley> BBB: I don't know.  Where could I find that table or related code?
[17:42:53 CEST] <BBB> I dont know, I would probably defer that question to michaelni 
[17:43:53 CEST] <BBB> I would guess its idct_permutation[]?
[17:44:03 CEST] <BBB> but maybe theres something special to the one used in quantization
[17:44:36 CEST] <BBB> or maybe it needs the inverse (like T-1; that should be trivial to write but maybe there was a reason michaelni didnt want to add it in the struct, e.g. to reduce memory usage or so)
[17:44:46 CEST] <BBB> or maybe a hand-unrolled quant is faster?
[17:44:50 CEST] <BBB> not sure
[17:45:25 CEST] <michaelni> its hand unrolled for speed
[17:48:04 CEST] <BBB> that makes sense
[17:48:49 CEST] <BBB> michaelni: do you think a default: based on T-1 idct_permutation makes sense? or would you prefer it to assert out so we write the correct code in that place?
[17:49:04 CEST] <BBB> (default only if the hand-unrolled version for that perm_type is missing, obviously)
[17:49:55 CEST] <BBB> (I have no preference, Im just blurping out random ideas, Im fine with things as they are)
[17:51:19 CEST] <michaelni> i like the assert as it makes sure theres always optimal code and nothing missing
[17:52:20 CEST] <BBB> ok
[17:52:49 CEST] <BBB> J_Darnley: if you need help debugging the mismatches from michaelni, let me know
[17:53:08 CEST] <BBB> I think were slowly getting closer to perfect, even if were not exactly there yet
[17:53:14 CEST] <J_Darnley> I will.  I'm just getting the file as we speak
[17:53:36 CEST] <BBB> you will look at it? or you will need help? :-p
[17:53:50 CEST] <BBB> kierank: Im sorry for the rabbit hole
[17:55:18 CEST] <J_Darnley> I will in a short while
[18:27:32 CEST] <jkqxz> wm4:  Having looked at that a bit more, I think the set of "planes" should just be the set of things you import to EGL.  Anything else will just cause confusion.
[18:27:55 CEST] <jkqxz> So, the DRM format goes with the "plane", and they probably shouldn't be called planes.
[18:30:56 CEST] Action: J_Darnley gets back to work
[18:39:31 CEST] <wm4> jkqxz: depends how you use it with native DRM I guess, but maybe that's not much different
[18:45:04 CEST] <J_Darnley> I can reproduce michaelni's differences.
[18:46:17 CEST] <J_Darnley> BBB: did you expect the new functions to produce the same output as C or MMX?
[18:46:25 CEST] <BBB> both?
[18:46:28 CEST] <J_Darnley> I've forgotten which we were aiming for.
[18:46:42 CEST] <BBB> mmx, basically, but Im not sure they ever are differeny
[18:47:20 CEST] <BBB> I believe the idct itself should always be identical, but some things may not honor perm_type and then it would behave different from C
[18:47:28 CEST] <BBB> (but then mmx/c would differ also)
[18:54:13 CEST] Action: J_Darnley wonders whether that transpose issue was also the problem with fate
[18:59:58 CEST] <J_Darnley> no, not entirely
[19:01:18 CEST] <BBB> you mean the wmv1 failure?
[19:01:26 CEST] <BBB> (vsynth)
[19:01:27 CEST] <J_Darnley> yes
[19:01:47 CEST] <BBB> it would probably be part of it
[19:01:57 CEST] <J_Darnley> mmm part
[19:02:01 CEST] <BBB> but michaelni may be able to indicate other parts where similar code needs to be added
[19:02:24 CEST] <BBB> Im more concerned about cases where perm_type is entirely ignored, especially in semi-custom encoders like wmv2
[19:03:42 CEST] <J_Darnley> In his test I do get a different result between the mmx and the new sse2.  However I get the same result if I use C and the new sse2
[19:04:08 CEST] <BBB> so mmx/c are different?
[19:04:23 CEST] <BBB> all hacks are modeled after the C code btw
[19:04:29 CEST] <BBB> I still dont fully understand the mmx code
[19:04:40 CEST] <J_Darnley> I think they always were because the mmx is not used for -idct simple
[19:05:07 CEST] <BBB> I thought that had to do with perm_type being ignored by some code
[19:05:17 CEST] <BBB> and mmx perm_type is different than c perm_type
[19:05:30 CEST] <BBB> but maybe results are actually different for the 2
[19:05:36 CEST] <BBB> if they are, thatd be interesting
[19:05:45 CEST] <BBB> and maybe we need to model simple, not simple_mmx
[19:05:46 CEST] <BBB> ??
[19:14:38 CEST] <J_Darnley> Here's some good news: the fate failues with the sse2 are down to just wmv and "mdec"
[19:14:48 CEST] <BBB> wmv1?
[19:14:54 CEST] <J_Darnley> yes
[19:14:58 CEST] <BBB> mdec
[19:14:59 CEST] <BBB> hm
[19:15:14 CEST] <BBB> of all the things I am not interested in, wmv1 must be pretty much on top of that list
[19:15:16 CEST] <J_Darnley> fate-vsynth{1,2,3_lena}-wmv1
[19:15:23 CEST] <BBB> but I can imagine that mdec is high on that list also
[19:15:25 CEST] <BBB> :-p
[19:15:30 CEST] <BBB> lets have a look
[19:15:31 CEST] <J_Darnley> fate-mdec fate mdec-v3
[19:15:47 CEST] <BBB> mdec is decoding?
[19:16:05 CEST] <J_Darnley> I don't know, I haven't looked at what the tests do yet.
[19:16:34 CEST] <J_Darnley> I would say "probably, it sounds like some obscure format nobody wants to create"
[19:19:28 CEST] <BBB> its based on idct_put and appears to use the scantable correctly
[19:19:32 CEST] <BBB> it does have a weird line:
[19:19:49 CEST] <BBB>     if (avctx->idct_algo == FF_IDCT_AUTO)
[19:19:49 CEST] <BBB>         avctx->idct_algo = FF_IDCT_SIMPLE;
[19:20:41 CEST] <J_Darnley> Skipping the "incorrect" mmx , maybe
[19:21:01 CEST] <BBB> its after init, not before
[19:21:29 CEST] <J_Darnley> After initing the idct dsp struct?  Then I'm not sure
[19:22:09 CEST] <BBB> yes, and the scan table
[19:22:54 CEST] <BBB> maybe its insignificant
[19:22:57 CEST] <BBB> not sure
[19:23:07 CEST] <BBB> I can look at mdec, it may provide some insights
[19:23:11 CEST] <BBB> latest branch name? :)
[19:23:16 CEST] <BBB> Ill check after lunch
[19:23:26 CEST] <J_Darnley> let me push then it should be mpeg2_asm6
[19:23:43 CEST] <BBB> okay
[19:24:03 CEST] <J_Darnley> done
[19:25:50 CEST] <BBB> while its building, Ill head out for lunch
[19:25:51 CEST] <BBB> brb
[19:25:58 CEST] <J_Darnley> have a nice lunch
[19:29:06 CEST] <tdjones> atomnuker: Could you help me debug psy in vorbis when you have a chance? It is failing to return any window suggestions for the second channel of audio even when I duplicate two mono channels for the input.
[19:30:25 CEST] <tdjones> Here's the commit I'm working off of
[19:30:36 CEST] Action: tdjones posted a file: 0001-WIP-libavcodec-vorbisenc-Include-AAC-psy-model.patch (10KB) <https://matrix.org/_matrix/media/v1/download/matrix.org/UttmUwbsBuNQrUyVEYmAzLgB>
[19:35:55 CEST] <J_Darnley> BBB: look at e3e3c82555 and tell me if you think that was a good idea?  :D
[19:39:34 CEST] <atomnuker> tdjones: what does it return?
[19:40:22 CEST] <tdjones> It just returns ONLY_LONG for the second channel
[19:41:19 CEST] <tdjones> It behaves how you would expect in aac, so it must be something that I've done incorrectly
[19:42:44 CEST] <atomnuker> tdjones: what happens if you zero out the left and then the right channel?
[19:43:15 CEST] <atomnuker> and you look only at the first channel's window type
[19:50:31 CEST] <tdjones> Muting the left channel ends up returning only_long for both channels
[19:50:43 CEST] <tdjones> Muting right channel returns only_long for first channel again
[19:53:33 CEST] <J_Darnley> Maybe I should be so hard on the guy, the code didn't have the same idctdspback then.
[19:56:34 CEST] <atomnuker> tdjones: so if you mut the right channel you get something other than only_long on the second channel?
[19:56:37 CEST] <atomnuker> *mute
[19:57:03 CEST] <tdjones> I said that backwards, my bad
[19:57:12 CEST] <tdjones> It returns only_long for second channel
[19:57:59 CEST] <tdjones> What I meant is that muting the right channel seems to have no effect on the output of the psy model
[20:01:12 CEST] <atomnuker> well, drop the patch for now, the transient detector will be easily replaced by the opus one (which needs a little more work)
[20:01:47 CEST] <atomnuker> its more important to get more functionality out of the encoder
[20:02:30 CEST] <atomnuker> if you need to test whether switching between transient and non-transient frames work just add a counter and e.g. mark every 10th frame as a transient
[20:02:54 CEST] <tdjones> I have block switching and window application working correctly, but I feel that is pointless now without any psy model to provide the input
[20:03:02 CEST] <atomnuker> (and make sure to be able to handle cases where either the entire file has no transients or every single frame is a transient)
[20:03:58 CEST] <atomnuker> tdjones: I'll try to push the opus psy with just transient handling out by the end of the weekend
[20:05:26 CEST] <atomnuker> until then see if you can get a classbook search for the residual working
[20:06:00 CEST] <tdjones> atomnuker: Alright, I appreciate the help. Thank you
[20:09:51 CEST] <atomnuker> tdjones: though if you think its more useful to have some sort of temporary transient handling first go for it and try debugging what goes wrong
[20:10:21 CEST] <atomnuker> just put printfs in both the aac and the vorbis encoder and look for what's different on the psy system inputs
[20:11:07 CEST] <tdjones> atomnuker: That's what I've been doing, I'll probably try some more and put it on the sidelines for a bit if I don't make progress
[20:11:29 CEST] <atomnuker> alright
[20:20:57 CEST] <BBB> J_Darnley: dunno& it seems to me we should set the idct_type before (not after) the idctdsp init?
[20:22:54 CEST] <J_Darnley> That would be the proper thing to do but doesn't help us identify where it fails when we do use permutation.
[20:23:48 CEST] <J_Darnley> Do you know if it is right to call ff_init_scantable with ff_zigzag_direct and not something out of a DSP struct?
[20:39:09 CEST] Action: J_Darnley does to look at wmv1
[20:55:18 CEST] <J_Darnley> Ah.  It is in mpegvideo_enc
[20:56:34 CEST] <BBB> I think its ok, but Im not 100% sure
[20:56:38 CEST] <J_Darnley> ah but might really be h263
[20:59:26 CEST] <J_Darnley> but there's no idct in there
[20:59:28 CEST] <BBB> Im not 100% sure about the mdec one, it looks very trivial
[20:59:37 CEST] <BBB> Ill have to actually debug tht in detail :/
[21:00:11 CEST] <J_Darnley> fun!
[21:01:52 CEST] <BBB> so much fun
[21:02:11 CEST] <BBB> now imagine you having to explain this to an audience of people in a blog or at vdd of people that are not familiar with mpegvideo
[21:02:29 CEST] <BBB> and you cant tell them to go read it because torture is forbidden by the geneva convention
[21:06:43 CEST] <nevcairiel> that only covers prisoners, not willing participants
[21:07:42 CEST] <BBB> oh I see
[21:08:00 CEST] <BBB> is locking them up in a room with jdarnley and throwing away the key prisoner, or willing?
[21:08:05 CEST] <BBB> (they entered voluntarily)
[21:09:57 CEST] <BBB> J_Darnley: fate-mdec passes for me
[21:10:00 CEST] <BBB> J_Darnley: what fails?
[21:13:20 CEST] <BBB> or should I test without the -idct simple flag?
[21:15:46 CEST] <J_Darnley> Yes, you need to force it to use the new code.  I suggest adding to selection in avcodec/x86/idctdsp_init.c
[21:18:03 CEST] <J_Darnley> add "|| avctx->idct_algo == FF_IDCT_SIMPLE" to line 103
[21:47:47 CEST] <BBB> oh
[21:47:53 CEST] <BBB> J_Darnley: yeah theres definitely a bug
[21:48:01 CEST] <BBB> I dont know if its in the C or in the SSE2
[21:48:46 CEST] <atomnuker> what the hell is subps doing, it should only do a subtraction
[21:49:00 CEST] <iive> ?
[21:49:15 CEST] <atomnuker> yet I get y[0], y[1], y[3], y[2] instead of y0123
[21:49:45 CEST] <atomnuker> every pair of numbers is the other way around
[21:49:51 CEST] <nevcairiel> that doesnt sound like something subps would do
[21:49:56 CEST] <atomnuker> (-10.771465 72.416946 -4.435959 22.024652) vs (72.416946 -10.771465 22.024652 -4.435959)
[21:50:37 CEST] <atomnuker> I checked and all the inputs are exactly the way they should be, no swapping or anything
[21:50:51 CEST] <atomnuker> yet as soon as I subtract them I get this
[21:51:12 CEST] <iive> can we see the whole code?
[21:51:23 CEST] <iive> i've used subps, no suprises there
[21:52:24 CEST] <iive> and the above is more y1032 vs y0123
[21:53:24 CEST] <atomnuker> https://0x0.st/lNc.asm
[21:53:36 CEST] <iive> you know that in register values are y[3] [2] [1] [0] compared to memory locations.
[21:53:40 CEST] <BBB> something is clearly wrong in mdec here
[21:53:53 CEST] <BBB> I dont think the ac quantizers are initialized correctly
[21:55:00 CEST] <atomnuker> iive: m2 is completely the right way around and everything matches
[21:58:37 CEST] <atomnuker> iive: nevermind, my reference C code had them swapped
[21:59:05 CEST] <iive> yeh, i was checking the swap code :D
[21:59:46 CEST] <J_Darnley> BBB: so there might be a bug before the idct is even run
[22:00:25 CEST] <BBB> well the idct is still returning wrong results
[22:00:27 CEST] <BBB> Im testing...
[22:09:43 CEST] <cone-822> ffmpeg 03Michael Niedermayer 07master:e77ddd31a8e1: avcodec/shorten: Sanity check maxnlpc
[22:09:43 CEST] <cone-822> ffmpeg 03Michael Niedermayer 07master:16d6cc2168b6: avcodec/wavpack: Change wp_log2() to unsigned
[22:09:43 CEST] <cone-822> ffmpeg 03Michael Niedermayer 07master:dfb61ea26300: avcodec/jpeg2000dec: Check nonzerobits more completely
[22:22:34 CEST] <BBB> huh
[22:22:36 CEST] <BBB> J_Darnley: 
[22:22:37 CEST] <BBB>     pand m3, m5
[22:22:38 CEST] <BBB>     por  m3, m6
[22:22:38 CEST] <BBB> why?
[22:24:16 CEST] <J_Darnley> huh?  That's merging in the stored DC only part
[22:25:44 CEST] <BBB> but m3 has nothing in it
[22:25:49 CEST] <BBB> m3 is a temporary in the 8x8 transpose
[22:26:41 CEST] <J_Darnley> oh I guess I was overzealous with my typing
[23:18:30 CEST] <BBB> J_Darnley: https://pastebin.com/mVvA8fAR
[23:18:36 CEST] <BBB> J_Darnley: thats the output from the failing mdec block
[23:18:54 CEST] <BBB> it looks like it may still be related to the rounding in the dc-only case
[23:22:53 CEST] <BBB> J_Darnley: and https://pastebin.com/J1CiCSfA with 1d output from the C
[23:23:01 CEST] <BBB> havent gotten much further yet...
[23:23:41 CEST] <BBB> hm actually that may be wrong, ignore for now
[23:27:13 CEST] <BBB> and now it matches :-/
[23:43:19 CEST] <BBB> I dont know whats wrong yet. when I call the C function (which gives the correct output) and manually call the avx idct on the transposed block data, all block output matches (I think)
[23:43:30 CEST] <BBB> but when I remove -idct simple, data doesnt match
[23:43:39 CEST] <BBB> maybe the transpose in permutation isnt working correctly
[23:43:41 CEST] <BBB> not sure yet
[23:45:12 CEST] <BBB> I think the quant_table is not transposed
[23:45:46 CEST] Action: J_Darnley reads
[23:48:04 CEST] <BBB> ah yes thats it
[23:48:11 CEST] <BBB> quant_matrix needs to be transposed in mdec.c
[23:48:22 CEST] <BBB> if I do that manually, the fate test passes
[23:48:31 CEST] <BBB> lemme see how mpegvideo does that, we can probably just copy that
[23:49:01 CEST] <J_Darnley> mmhm... we need to transpose on demand
[23:51:16 CEST] <BBB> michaelni: why is ff_mpeg1_default_intra_matrix[] 256 if it only has 64 entries (in mpeg12data.c)?
[23:51:41 CEST] <kierank> BBB: https://lists.mplayerhq.hu/pipermail/ffmpeg-cvslog/2012-November/057056.html
[23:52:07 CEST] <BBB> da fuq
[23:52:13 CEST] <J_Darnley> That was quick
[23:52:14 CEST] <kierank> my thoughts too
[23:53:59 CEST] <BBB> hum
[23:53:59 CEST] <BBB> ok
[23:54:00 CEST] <BBB> so
[23:54:05 CEST] <BBB> after that, the sse2 function passes
[23:54:07 CEST] <BBB> however........
[23:54:10 CEST] <BBB> the mmx one still fails
[23:54:16 CEST] <BBB> I do not at all understand why
[23:54:46 CEST] <J_Darnley> will it use the correct permutation for mmx?
[23:54:57 CEST] <BBB> it should
[23:55:01 CEST] <BBB> Im basically un-permutating it
[23:55:10 CEST] <BBB> that should work for all permutations
[23:55:22 CEST] <BBB> J_Darnley: https://pastebin.com/gzs0DB54
[23:55:26 CEST] <BBB> that fixes mdec for me
[23:55:38 CEST] <BBB> what was the other? you said it was wmv1 encoding, right? or was it wmv1 decoding?
[23:55:50 CEST] <J_Darnley> encoding
[23:55:53 CEST] <BBB> (I considered that the quant fix you made in mpegvideoenc may have fixed wmv1 encoding)
[23:55:59 CEST] <J_Darnley> (maybe decoding too)
[23:56:05 CEST] <BBB> oh its still encoding, even after the quant fix?
[23:56:16 CEST] <BBB> (the quantization getting support for transpose permutation)
[23:56:28 CEST] <J_Darnley> yeah, I know what you mean
[23:56:30 CEST] <J_Darnley> anyway
[23:56:35 CEST] <BBB> but still broken after that?
[23:56:51 CEST] <J_Darnley> yes, the output is not identical between c and sse2
[23:57:00 CEST] <BBB> okiedokie
[23:57:37 CEST] <J_Darnley> That one is a little harder to trace
[23:58:13 CEST] <BBB> nope, encoding is identical for me
[23:58:27 CEST] <BBB> oh wait that was decoding
[23:58:28 CEST] <BBB> doh
[23:58:30 CEST] <BBB> hold on
[23:58:35 CEST] <BBB> ffmpeg.exe can be so hard to use
[23:59:19 CEST] <BBB> yup, reproduceable
[23:59:20 CEST] <BBB> ok
[23:59:22 CEST] <BBB> will debug
[23:59:28 CEST] <BBB> this is soooooooooo exciting
[00:00:00 CEST] --- Sat Jun 17 2017