[Ffmpeg-devel-irc] ffmpeg-devel.log.20170614

Thu Jun 15 03:05:04 EEST 2017

[00:00:01 CEST] <BBB> I'm sure there's a reason for it but... who knows
[00:00:26 CEST] <BBB> I would suggest to not get into that discussion but change it to 16384 for 8bit and be done with it ;)
[00:00:37 CEST] <TMM> durandal_1707, so for the existing decode_map it seems that the demuxer creates another av_new_packet(), which gets read by ipvideo_decode_frame, could I send multiple of these packets to the decoder?
[00:00:37 CEST] <BBB> gonna go home, will review patches tomorrow
[00:00:42 CEST] <BBB> poke me if you need more help
[00:01:09 CEST] <jdarnley> Bye and thank you again.
[00:01:27 CEST] <BBB> happy to help
[00:01:35 CEST] <BBB> I'm really excited that you're doing this
[00:01:44 CEST] <BBB> nobody ever wanted to do this, and I genuinely think it's important
[00:01:48 CEST] <BBB> just very tedious :(
[00:01:58 CEST] <BBB> later :)
[00:02:06 CEST] <TMM> is there maybe some documentation I should be reading?
[00:02:39 CEST] <durandal_1707> multiple at once?
[00:02:53 CEST] <wm4> TMM: just append what you need to the packet on the demuxer side
[00:03:15 CEST] <wm4> it's not like there's a standard IP packet format, so you can put in there whatever you want
[00:03:27 CEST] <durandal_1707> every packet should represent single decoded frame
[00:05:49 CEST] <TMM> I don't think I understand where the video data goes
[00:07:05 CEST] <TMM> so, there's no knowledge of the structure of IpvideoContext in the demuxer, right?
[00:09:14 CEST] <TMM> oh... the codec does know about IPMVEContext
[00:09:42 CEST] <TMM> IPMVEContext *ipmovie = s->priv_data;
[00:09:56 CEST] <TMM> *sigh* wrong file
[00:10:00 CEST] <TMM> nvm, sorry 
[00:17:57 CEST] <TMM> so this AVPacket->data thing is entirely up to the demuxer/codec to define?
[00:18:47 CEST] <nevcairiel> yes and no, best is to stay close to the file format
[00:21:54 CEST] <TMM> hmm, ok, but how do I send three different data streams to the video codec if not by placing it all in that field?
[00:22:25 CEST] <BBB> is this a triple video stream?
[00:22:26 CEST] <TMM> the current format that's implemented requires 2 bits of data, the pixeldata, and the 'decoding map', I need to also send this 'skip map'
[00:22:51 CEST] <BBB> sounds like a job for extradata or so
[00:22:57 CEST] <BBB> but what do I know
[00:23:00 CEST] <TMM> the 0x11 format does this by sticking the decoding map into AVPacket->data
[00:23:08 CEST] <TMM> BBB, infinitely more than me! :D
[00:23:51 CEST] <TMM> ok, I'll see about this extradata thing then, it's not currently used I think
[00:23:59 CEST] <TMM> I'll figure it out 
[00:24:24 CEST] <TMM> there's also AVPacketSideData?
[00:32:30 CEST] <TMM> BBB, so this extradata goes into AVStream, right? but these maps change every frame, would it still make sense to stick it in there?
[00:32:50 CEST] <michaelni> BBB, i remember tuning the 1638X thing to fix some issue, probably some artifacts caused by idct rounding difference aka greenish/pinkish blocks
[00:32:59 CEST] <nevcairiel> if they change every frame then they are part of the frame itself, and should be in AVPacket as part of the frame
[00:33:10 CEST] <nevcairiel> extradata is static once for the entire stream
[00:33:42 CEST] <TMM> nevcairiel, so, it should just both go in pkt->data then?
[00:34:05 CEST] <atomnuker> TMM: no, they get attached to the packet
[00:34:38 CEST] <TMM> it seems currently the decode_map gets read into pkt->data
[00:35:04 CEST] <TMM> if (avio_read(pb, pkt->data + 2, s->decode_map_chunk_size) <snip>
[00:36:19 CEST] <TMM> so, that's wrong then?
[00:38:31 CEST] <atomnuker> yep, it seems it does that
[00:38:48 CEST] <atomnuker> this was written long before extradata was a thing I guess
[00:39:09 CEST] <TMM> but I shouldn't use extradata?
[00:39:18 CEST] <TMM> it's per-frame data
[00:39:26 CEST] <wm4> (pretty sure extradata was always a thing)
[00:40:23 CEST] <wm4> yeah, interplay was added after extradata became a thing
[00:40:28 CEST] <TMM> ok, so what do I do? refactor this? The easy way out would be to stick the subcodec in the fourcc field, and just append the skipmap to pkt->data, but I guess this won't necessarily meet with much approval? :P
[00:40:42 CEST] <wm4> with that useful fact I disappear again
[00:42:27 CEST] <atomnuker> TMM: demuxing and decoding seem not well abstracted with that format
[00:42:41 CEST] <nevcairiel> if it's per-frame data it should go into the packet, it's as simple as that
[00:42:46 CEST] <atomnuker> normally a codec packet is very well defined
[00:43:31 CEST] <TMM> is there a good example for how this should be done?
[00:43:46 CEST] <atomnuker> I guess you should stick it at the end of the packet
[00:44:00 CEST] <atomnuker> in this case there doesn't seem to be a nicer way
[00:44:02 CEST] <TMM> I don't mind fixing it, I intend to port this to scummvm like we did with smacker for instance, so doing it right now may help
[00:45:50 CEST] <TMM> If I know what it should look like at least :)
[00:56:06 CEST] <atomnuker> TMM: to make things look as good as possible what you'd have to do is move the decoding map size to the extradata
[00:56:22 CEST] <atomnuker> av_packet_new_side_data with type AV_PKT_DATA_NEW_EXTRADATA
[00:56:34 CEST] <TMM> atomnuker, but... nevcairiel told me that extradata shouldn't be used for per-frame data?
[00:57:07 CEST] <nevcairiel> definitely don't send new extradata with every single frame
[00:57:09 CEST] <nevcairiel> that's just bad design
[00:57:14 CEST] <TMM> I'd have to
[00:57:17 CEST] <nevcairiel> that's just frame data then
[00:57:19 CEST] <nevcairiel> not extradata
[00:58:01 CEST] <atomnuker> I guess bad segmentation is unavoidable with these old codecs
[00:58:01 CEST] <atomnuker> in which case stick it at the end of the packet
[00:58:53 CEST] <TMM> OK
[00:59:17 CEST] <TMM> I'll prepare a PR then, thanks for your help
[01:00:51 CEST] <TMM> atomnuker, what do you mean by 'bad segmentation' btw? this is my first serious foray into reverse engineering a video codec
[01:04:31 CEST] <iive> extradata has always been a thing, it exists since avi. Sidedata is more recent.
[01:04:52 CEST] <atomnuker> well, muxers mux packets, and there's a very clear line between what parts are those of the container and what parts are for audio and video packets
[01:05:35 CEST] <atomnuker> but with those old game codecs containers were only meant to mux very specific things rather than general packets from different codec types
[01:05:36 CEST] <TMM> I'll be working on sprite support for truemotion1 support next, having a bit more vocabulary would maybe help
[01:06:38 CEST] <TMM> is SideData per frame?
[01:06:39 CEST] <nevcairiel> game formats just don't always stick to clear rules, the container is designed to hold one particular format, so it smears all the typical borders
[01:06:46 CEST] <atomnuker> so those codecs tend to treat the entire bitstream, video packets and containers as 1
[01:07:15 CEST] <nevcairiel> just shove whatever is needed to decode a single frame into the avpacket data
[01:08:02 CEST] <TMM> yeah, so there are two video frame formats that are currently unimplemented. The 0x06 stream actually sticks the 'decode_map' at the top of the framedata, whereas for 0x10 and 0x11 videos it gets loaded by a separate opcode in a different segment of the file
[01:08:05 CEST] <TMM> it's a bit of a mess
[01:08:28 CEST] <TMM> but the actual content and meaning of the decode map is identical between 0x06 and 0x10 frame formats, but different between 0x10 and 0x11 
[01:09:04 CEST] <TMM> the difference between 0x06 and 0x10 is primarily the addition of this 'skip map'
[01:09:25 CEST] <TMM> I'm thinking that for 0x06 I will just generate a static skip map and split out the decode map in the demuxer
[01:09:32 CEST] <TMM> that way the code for both frame formats could be the same
[01:10:04 CEST] <TMM> in the original implementation there are 4 almost identical implementations of this frame format, for with and without a skipmap and for 8 and 16 bit...
[01:10:25 CEST] <TMM> the only difference between the 8 and 16 bit versions being the amount of bits to copy per 8x8 block of pixels
[01:10:42 CEST] <TMM> it looks like someone at interplay was having a bad day or something
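The advice above (put everything needed to decode one frame into the packet data) could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual ipmovie demuxer or decoder: the layout (two little-endian 16-bit sizes, then decode map, skip map, and pixel data) and the names `frame_parts`, `pack_frame`, and `unpack_frame` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-frame payload; none of these names come from FFmpeg. */
typedef struct frame_parts {
    const uint8_t *decode_map; uint16_t decode_map_size;
    const uint8_t *skip_map;   uint16_t skip_map_size;
    const uint8_t *pixels;     size_t   pixels_size;
} frame_parts;

#define HEADER_SIZE 4 /* assumed layout: two little-endian 16-bit sizes */

/* Copy n bytes and return the advanced write pointer (skips empty parts). */
static uint8_t *append(uint8_t *p, const uint8_t *src, size_t n)
{
    if (n)
        memcpy(p, src, n);
    return p + n;
}

/* Demuxer side: returns the number of bytes written, or 0 if dst is too small. */
static size_t pack_frame(uint8_t *dst, size_t dst_size, const frame_parts *f)
{
    size_t maps;
    uint8_t *p = dst;

    assert(dst && f);
    maps = (size_t)f->decode_map_size + f->skip_map_size;
    if (dst_size < HEADER_SIZE + maps ||
        dst_size - HEADER_SIZE - maps < f->pixels_size)
        return 0;

    *p++ = (uint8_t)(f->decode_map_size & 0xff);
    *p++ = (uint8_t)(f->decode_map_size >> 8);
    *p++ = (uint8_t)(f->skip_map_size & 0xff);
    *p++ = (uint8_t)(f->skip_map_size >> 8);
    p = append(p, f->decode_map, f->decode_map_size);
    p = append(p, f->skip_map, f->skip_map_size);
    p = append(p, f->pixels, f->pixels_size);
    return (size_t)(p - dst);
}

/* Decoder side: returns 0 on success, -1 if the buffer is truncated. */
static int unpack_frame(const uint8_t *src, size_t src_size, frame_parts *f)
{
    size_t maps;

    assert(src && f);
    if (src_size < HEADER_SIZE)
        return -1;
    f->decode_map_size = (uint16_t)(src[0] | (src[1] << 8));
    f->skip_map_size   = (uint16_t)(src[2] | (src[3] << 8));
    maps = (size_t)f->decode_map_size + f->skip_map_size;
    if (src_size - HEADER_SIZE < maps)
        return -1;
    f->decode_map  = src + HEADER_SIZE;
    f->skip_map    = f->decode_map + f->decode_map_size;
    f->pixels      = f->skip_map + f->skip_map_size;
    f->pixels_size = src_size - HEADER_SIZE - maps;
    return 0;
}
```

With a layout like this, a 0x06 frame could be given a zero-length or synthetic skip map by the demuxer so that the decoder sees one packet shape for both frame formats, which is the unification TMM describes.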
[01:42:39 CEST] <atomnuker> BBB: any plans to improve the performance of the vp56 entropy decoder?
[01:43:03 CEST] <atomnuker> having looked at the code last summer I couldn't really find many ways to improve it
[01:43:09 CEST] <nevcairiel> is it unsimdable like most smart entropy coders?
[01:46:32 CEST] <nevcairiel> can go the cabac way and write some fancy branchless assembly
[01:47:15 CEST] <BBB> atomnuker: I think jason looked at that a while ago and what we have now is the best he could come up with
[01:47:20 CEST] <BBB> atomnuker: I have no reason to doubt him
[01:47:36 CEST] <BBB> nevcairiel: yes it's unsimdable
[01:53:44 CEST] <atomnuker> well, I guess improving its decoding performance isn't really fixing the issue
[01:54:08 CEST] <atomnuker> the issue being the rate control goes batshit and blows the bitrate up
[01:54:23 CEST] <nevcairiel> isn't it much faster than cabac anyway
[01:54:33 CEST] <BBB> it is
[01:54:39 CEST] <BBB> because cabac is adaptive and vp56rac is not
[01:54:48 CEST] <BBB> atomnuker: huh what?
[01:54:55 CEST] <BBB> atomnuker: is this a specific bug you're looking at?
[01:55:09 CEST] <atomnuker> nah, just looking at some 8k youtube videos
[01:55:17 CEST] <BBB> lol
[01:55:31 CEST] <BBB> hey look, every decoder in the world is slow for 8k content
[01:55:33 CEST] <BBB> surprise! :D
[01:56:03 CEST] <iive> well... that's why 8k is the new thing
[01:56:07 CEST] <nevcairiel> my gpu can do it in hardware! both vp9 and hevc!
[01:56:07 CEST] <nevcairiel> :D
[01:56:12 CEST] <atomnuker> what's interesting is that some 8k videos have lower overall bitrate than the equivalent 4k videos
[01:56:13 CEST] <BBB> at 8k, every 64x64 block is basically dc intra-predictable with just a dc coefficient and it would still look ok
[01:57:04 CEST] <atomnuker> most work fine here as long as the bitrate isn't above 50ish mbps
[01:57:29 CEST] <nevcairiel> i have a crazy hevc test clip with 1000mbps
[01:57:45 CEST] <nevcairiel> it breaks all sorts of things
[01:58:01 CEST] <atomnuker> however youtube's infamous rate spikes bring it up and make the player drop frames
[01:58:50 CEST] <atomnuker> what's funny is the rate control system doesn't even think it's blown its budget and doesn't bump quantizers after the spikes
[01:59:12 CEST] <nevcairiel> bad RCs don't notice momentary spikes as long as the average is fine
[01:59:59 CEST] <nevcairiel> if it thought it would blow its budget, it would probably bump the quants to avoid the spike, instead of doing it afterwards
[02:05:50 CEST] <atomnuker> (yes they even offer 8k h264 still)
[02:06:24 CEST] <iive> isn't vbv supposed to handle that?
[02:06:50 CEST] <DHE> it's supposed to, but you still have to configure it properly. set a bufsize and so on
[02:06:59 CEST] <nevcairiel> clearly their vp9 encoder doesn't have that =p
[02:09:22 CEST] <iive> is there more than 1 vp9 encoder out there?
[02:12:07 CEST] <DHE> old version? badly configured encoder?
[02:13:58 CEST] <TD-Linux> yes there is BBB's encoder as well
[02:14:04 CEST] <TD-Linux> libvpx doesn't have a "real" VBV model
[02:14:27 CEST] <TD-Linux> you can configure it pretty close, though I don't think youtube does
[02:14:49 CEST] <TD-Linux> of course, there's also the people who complain about youtube not using crf, which would be even more spiky...
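The point about rate spikes versus average bitrate can be illustrated with a toy leaky-bucket model of a decoder buffer. This is only a sketch under simplifying assumptions (constant fill rate per frame period, buffer starts full, an underflow empties the buffer); it is not libvpx's or any other encoder's actual rate control, and `count_vbv_underflows` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy VBV-style check: the buffer gains bits_per_frame_period after each
 * frame (capped at bufsize) and every frame removes its own size.
 * Returns how many frames did not fit into the bits available.
 */
static int count_vbv_underflows(const long *frame_bits, size_t n,
                                long bits_per_frame_period, long bufsize)
{
    long fullness = bufsize; /* assumption: playback starts with a full buffer */
    int underflows = 0;
    size_t i;

    assert(frame_bits && bits_per_frame_period > 0 && bufsize > 0);
    for (i = 0; i < n; i++) {
        if (frame_bits[i] > fullness) {
            underflows++;    /* the frame is larger than what has arrived */
            fullness = 0;
        } else {
            fullness -= frame_bits[i];
        }
        fullness += bits_per_frame_period;
        if (fullness > bufsize)
            fullness = bufsize;
    }
    return underflows;
}
```

Two sequences with the same average, for example {100, 100, 100, 100} and {10, 10, 370, 10} at 100 bits per frame period and a 200-bit buffer, behave differently here: the first never underflows, the second underflows once at the spike. A rate control that only tracks the average would treat both as equally fine.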
[02:31:44 CEST] <nevcairiel> hey BBB gets to fix crashes in vp9 again
[03:05:07 CEST] <BBB> nevcairiel: ah crap
[03:05:17 CEST] <BBB> but vp9.c is bug-free!
[10:09:54 CEST] <stevenliu_> Hello guys, I have a question: I want to get the pict_type in hlsenc.c, but the AVCodecContext has been deprecated. I want to check whether the stream has B-frames; how should I get that?
[10:10:35 CEST] <stevenliu_> the AVCodecParameters has no has_b_frames member.
[10:11:43 CEST] <nevcairiel> this is not information available to muxers
[10:13:40 CEST] <stevenliu_> ok, let me try another way :(
[10:20:22 CEST] <wm4> why would hlsenc need that info?
[10:27:10 CEST] <ubitux> ps_stereo_interpolate_c: 72639.9
[10:27:11 CEST] <ubitux> ps_stereo_interpolate_neon: 72637.1
[10:27:13 CEST] <ubitux> ps_stereo_interpolate_ipdopd_c: 117688.2
[10:27:15 CEST] <ubitux> ps_stereo_interpolate_ipdopd_neon: 113633.4
[10:27:17 CEST] <ubitux> wouhou i'm faster T_T
[10:37:56 CEST] <nevcairiel> by a hair :p
[10:39:20 CEST] <mateo`> I suspect the memory accesses to be the bottleneck
[10:39:22 CEST] <wm4> every speedup is good, every speedup is sacred
[10:47:33 CEST] <stevenliu_> no, hlsenc does not need it
[10:48:05 CEST] <stevenliu_> I have a problem: when remuxing from flv to hls, the hls->duration is longer than the ts file duration
[10:49:19 CEST] <stevenliu_> the cause is that the fps was changed while the stream was being recorded, so the avcC is rewritten in the middle of the flv file.
[10:49:35 CEST] <stevenliu_> and the AVStream timebase is not updated
[10:50:06 CEST] <wm4> the solution is not to remux to flv
[10:50:08 CEST] <stevenliu_> so the computation is wrong, and the hls->duration is wrong.
[10:50:20 CEST] <stevenliu_> remux from flv to hls
[10:52:10 CEST] <wm4> ah
[10:52:32 CEST] <stevenliu_> the flv file looks like,   cat a.flv b.flv > input.flv     a.flv video fps is 30   b.flv video fps is 25
[10:52:53 CEST] <BtbN> I don't think that results in a valid file
[10:52:55 CEST] <stevenliu_> or cat a.flv b.flv > input.flv     a.flv video fps is 25   b.flv video fps is 30
[10:53:13 CEST] <thebombzen> flv doesn't support concatenation, IIRC
[10:53:25 CEST] <thebombzen> concatenating flv files doesn't work the way you want it to
[10:53:40 CEST] <stevenliu_> No, not concat
[10:53:52 CEST] <thebombzen> that is literally what 'cat' does
[10:54:39 CEST] <thebombzen> also should this be in #ffmpeg? not #ffmpeg-devel?
[10:55:37 CEST] <stevenliu_> for example, an rtmp server records a stream to flv: when a user publishes rtmp to the server, the server records flv; if the user changes the FPS dynamically during the live stream, the recorder keeps writing data to the same flv file, so the file looks like a concat, but it is not a concat
[10:56:29 CEST] <stevenliu_> BtbN, you mean that is not ffmpeg problem?
[10:57:00 CEST] <thebombzen> stevenliu_: if you concatenate the files together (which is what 'cat' does) then the resulting file will not be a valid FLV file
[10:57:02 CEST] <stevenliu_> I want fix it, but i have no idea now
[11:05:34 CEST] <stevenliu_> #thebombzen Thanks :)
[12:09:00 CEST] Action: J_Darnley sighs
[12:09:51 CEST] <J_Darnley> Despite BBB's hard work of tracking down the differences between these different functions it still doesn't work correctly.
[12:18:58 CEST] <J_Darnley> I must ask him how he does that.
[12:21:43 CEST] <J_Darnley> Speak of the devil
[12:22:06 CEST] <J_Darnley> BBB: good morning
[12:24:45 CEST] <BBB> hi
[12:24:48 CEST] <BBB> what did I do
[12:25:38 CEST] <J_Darnley> Nothing, you just arrived right after I said I need to ask you how you tracked down those IDCT differences
[12:25:48 CEST] <BBB> I gotta make breakfast for kids, I'll explain after that, ok?
[12:25:53 CEST] <J_Darnley> Sure
[13:19:41 CEST] Action: J_Darnley is afk
[13:55:26 CEST] Action: J_Darnley is back
[13:56:15 CEST] <BBB> J_Darnley: ok, so
[13:56:35 CEST] <BBB> J_Darnley: what I did is that I logged the sumErr per transform block for each idct implementation (C, MMX, SSE2, etc.)
[13:56:49 CEST] <BBB> J_Darnley: while running dct-test
[13:56:59 CEST] <BBB> J_Darnley: so that gives me a 20k line file per implementation
[13:57:07 CEST] <BBB> J_Darnley: I diff the C vs. SSE2 to find which iters give a difference
[13:57:21 CEST] <BBB> J_Darnley: then for these iters I log the input and output to see how they are different
[13:57:37 CEST] <BBB> J_Darnley: then I wrote a test application that takes one of the difference-generating inputs and directly compare C vs. SSE2
[13:57:52 CEST] <BBB> J_Darnley: then in the C function I log intermediates (e.g. after 1d, or within the 1d) and same in SSE2
[13:58:04 CEST] <BBB> J_Darnley: it was pretty obvious from that that the problem was in the DC
[13:58:10 CEST] <BBB> J_Darnley: rest is just code logic and stuff
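The procedure described above (run both implementations over the same inputs, find the first iteration where they disagree, then log that input and both outputs) can be sketched as a small differential test. Everything here is hypothetical: `toy_ref` and `toy_opt` are stand-ins with a deliberately planted rounding difference, not the real IDCT functions, and the input generator is a trivial counter rather than what dct-test uses.

```c
#include <assert.h>
#include <stdio.h>

#define BLOCK_LEN 8

typedef void (*transform_fn)(int block[BLOCK_LEN]);

/* Stand-in "reference": divide by 8 with rounding to nearest. */
static void toy_ref(int block[BLOCK_LEN])
{
    int i;
    for (i = 0; i < BLOCK_LEN; i++)
        block[i] = (block[i] + 4) >> 3;
}

/* Stand-in "optimized" version with a planted bug: it truncates instead. */
static void toy_opt(int block[BLOCK_LEN])
{
    int i;
    for (i = 0; i < BLOCK_LEN; i++)
        block[i] >>= 3;
}

/*
 * Feed identical non-negative inputs to both functions and return the first
 * iteration whose outputs differ, or -1 if they agree for all iterations.
 * On a mismatch, the input value and both outputs are logged to stderr.
 */
static int first_mismatch(transform_fn ref, transform_fn opt, int iters)
{
    int it, i;

    assert(ref && opt && iters >= 0);
    for (it = 0; it < iters; it++) {
        int a[BLOCK_LEN], b[BLOCK_LEN];
        for (i = 0; i < BLOCK_LEN; i++)
            a[i] = b[i] = it; /* toy input: every lane holds the iteration number */
        ref(a);
        opt(b);
        for (i = 0; i < BLOCK_LEN; i++) {
            if (a[i] != b[i]) {
                fprintf(stderr, "iter %d lane %d: in=%d ref=%d opt=%d\n",
                        it, i, it, a[i], b[i]);
                return it;
            }
        }
    }
    return -1;
}
```

Once the first differing input is known, the same pair of functions can be instrumented to log intermediates for just that input, which is the narrowing-down step BBB mentions.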
[13:58:26 CEST] Action: J_Darnley nods
[13:58:47 CEST] <BBB> like I said yesterday, it's not exciting or anything, but it gets the job done ;)
[13:58:59 CEST] <BBB> did michaelni say anything about 16384 being ok as a patch?
[13:59:13 CEST] <J_Darnley> I haven't asked yet, or submitted a patch
[13:59:30 CEST] <BBB> ok
[14:00:19 CEST] <J_Darnley> Changing that in the C did lead to a 13 thousand line change to the test reference files
[14:00:31 CEST] <BBB> that's not surprising I guess
[14:00:45 CEST] <BBB> but the sumErr in the dct-test application does go down, right?
[14:01:06 CEST] <J_Darnley> Yes I think so
[14:02:21 CEST] <michaelni> <michaelni> BBB, i remember tuning the 1638X thing to fix some issue, probably some artifacts caused by idct rounding difference aka greenish/pinkish blocks
[14:03:14 CEST] <BBB> do you remember which sample or bugid or git hash?
[14:04:36 CEST] <J_Darnley> git blame on the template file should tell you which commit it was changed in.
[14:04:58 CEST] <BBB> can you guys try to figure that out? :)
[14:05:08 CEST] <BBB> once I have a sample I can look into it and see if there's other things we can do
[14:07:26 CEST] <J_Darnley> Wow.  Revision 433 from 2002
[14:07:44 CEST] <J_Darnley> git hash ccf589a8fe
[14:08:57 CEST] <J_Darnley> And the revision immediately before it
[14:09:14 CEST] <J_Darnley> Unfortunately there is not much in the commit messages
[14:09:25 CEST] <BBB> hm...
[14:12:03 CEST] <BBB> this is all very complicated, as is usually the case when old code is touched :-p
[14:13:28 CEST] <BBB> I personally think we need access to blocks that lead to that change, otherwise it's hard to figure out what was going on there and we have to assume it didn't happen
[14:13:45 CEST] <BBB> but to each his own opinion ;)
[14:32:00 CEST] <BBB> J_Darnley: my suggestion would be to send a patch that reverts ccf5 and the patch before it and open it up for discussion
[14:32:33 CEST] <BBB> J_Darnley: alternatively, we can mangle the dc-only patch to also use 16383 instead of rightshift by 14 (which is identical to *=16384)
[14:32:42 CEST] <BBB> J_Darnley: but right now results are inconsistent and I don't like it at all
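Why the two constants give inconsistent results can be shown with plain integer arithmetic. The sketch below assumes a general row pass of the form (W4 * dc + rounding) >> ROW_SHIFT with ROW_SHIFT = 11, and a DC-only shortcut of dc << 3; these shapes and constants are assumptions for illustration and are not quoted from the log or from simple_idct itself. Only non-negative dc values are used so that the right shift is well defined in C.

```c
#include <assert.h>

#define ROW_SHIFT 11 /* assumed row-pass shift */
#define DC_SHIFT   3 /* assumed DC-only shortcut shift */

/* General row path applied to a row whose only non-zero coefficient is dc. */
static int row_dc_general(int dc, int w4)
{
    return (w4 * dc + (1 << (ROW_SHIFT - 1))) >> ROW_SHIFT;
}

/* DC-only shortcut: equivalent to the general path with w4 == 16384. */
static int row_dc_shortcut(int dc)
{
    return dc * (1 << DC_SHIFT);
}

/* Count dc values in [0, max_dc] for which the two paths disagree. */
static int count_dc_mismatches(int w4, int max_dc)
{
    int dc, n = 0;

    assert(w4 > 0 && w4 <= 16384 && max_dc >= 0 && max_dc <= 4095);
    for (dc = 0; dc <= max_dc; dc++)
        if (row_dc_general(dc, w4) != row_dc_shortcut(dc))
            n++;
    return n;
}
```

Under these assumptions, 16384 makes the general path and the shortcut agree for every dc, while 16383 makes the general path come out one lower as soon as dc exceeds 1024 (for example 8199 instead of 8200 at dc = 1025). That is the kind of path-dependent difference that shows up when one implementation takes the shortcut and another does not.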
[14:33:17 CEST] <J_Darnley> Mangling sounds easy in the C, less so in the MMX (if we care about that)
[14:33:52 CEST] <J_Darnley> also a clarification: the sumerr goes down in test 2 of dct but neither of the others.
[14:35:14 CEST] <BBB> test 2 is a particular type of stress test
[14:35:34 CEST] <BBB> I'm also concerned that the C was adjusted to give worse results just to match the MMX, even though the overflow was in the MMX alone
[14:35:51 CEST] <BBB> (worse results as measured by maxSumErr in dct-test)
[14:38:38 CEST] <J_Darnley> I don't understand that.  I thought the MMX doesn't match the C in master so it isn't used for fate
[14:38:51 CEST] Action: J_Darnley is going around in circles
[14:42:57 CEST] <michaelni> C was likely adjusted because it produced artifacts but it's 15 years ago
[14:44:23 CEST] <kierank> eugh
[14:44:41 CEST] <sigdrak> if you change the idct now, and thus the reconstruction, won't that cause error accumulation and then drifts ?
[14:44:51 CEST] <michaelni> sigdrak, yes
[14:45:27 CEST] Action: kierank wonders if he opened a big can of worms
[14:45:28 CEST] <sigdrak> https://guru.multimedia.cx/the-mpeg124-and-h26123-idct/
[14:45:53 CEST] <kierank> J_Darnley: we definitely need to blog about this now
[14:47:51 CEST] <BBB> we are merely one implementation
[14:47:57 CEST] <BBB> so if we drifted, we need to fix it on our end
[14:48:06 CEST] <BBB> some people use ffmpeg to decode and/or encode ffmpeg
[14:48:16 CEST] <BBB> but we are by no means the predominant implementation
[14:48:22 CEST] <BBB> let alone reference
[14:48:58 CEST] <BBB> michaelni: the C code was adjusted to match the C, as can be seen in the commit order; the overflow was in the MMX, not in the C
[14:49:15 CEST] <BBB> C code was adjusted to match the MMX, I mean
[14:50:21 CEST] <michaelni> c and mmx were likely adjusted because of artifacts in ffmpeg decoder output from non ffmpeg generated files
[14:50:42 CEST] <michaelni> i don't think there's an overflow but it's long ago
[14:51:56 CEST] <BBB> hm... this is all very difficult if we don't have samples to work with
[14:52:50 CEST] <BBB> and the professor concluded: And this, my students, is why we have informational commit messages. end of class, please hand in your homework assignment no later than next week friday
[14:53:52 CEST] <BBB> gh0st__: https://trac.ffmpeg.org/ticket/6459 is a bug in your code, see http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212400.html
[14:57:14 CEST] <michaelni> samples would give more information and allow reproducing but it would just confirm that the value we have had for 15 years works while the value before had an issue
[14:57:48 CEST] <BBB> it allows investigating alternate solutions
[14:57:58 CEST] <BBB> we don't even know whether the issue is a drift or an overflow at this point
[14:58:17 CEST] <BBB> I also still believe we should fundamentally try to be close to the reference
[14:58:29 CEST] <BBB> not to some particularly buggy implementation that used 16383 for who knows what kind of reason
[14:58:34 CEST] <BBB> that's called a hack
[14:58:46 CEST] <BBB> we could have a special idct that uses 16383 for files that are particularly buggy there
[14:58:56 CEST] <BBB> but if that makes us further away from the reference, that shouldn't be the default by any means
[14:58:57 CEST] <BBB> right?
[14:59:09 CEST] <nevcairiel> that C and MMX don't match is already quite suspicious for hacky things
[14:59:09 CEST] <iive> reference doesn't matter
[14:59:17 CEST] <J_Darnley> The problem is I bet we can't detect the problem files and then do the Right Thing.
[14:59:31 CEST] <iive> it's the error toward the existing implementations that is more important.
[14:59:44 CEST] <michaelni> BBB the reference for the decoder IDCT is the encoder IDCT
[14:59:44 CEST] <iive> and i guess this includes xvid and divx
[15:00:07 CEST] <BBB> michaelni: in the implementation
[15:00:11 CEST] <BBB> michaelni: but in the standard?
[15:00:20 CEST] <BBB> I know the standard doesn't define an exact idct
[15:00:20 CEST] <iive> and the fact that xvid used mpeg2 idct at the start doesn't make things easier.
[15:00:21 CEST] <BBB> but it has a target
[15:00:47 CEST] <BBB> J_Darnley: we can tell users to use a special option for files that show visual artifacts
[15:00:48 CEST] <iive> the standard defines an error range
[15:00:59 CEST] <michaelni> BBB if you change the default IDCT you will break a lot of things and help no one
[15:01:16 CEST] <BBB> I didn't change the default idct from the reference, you did, in 2002 :-p
[15:01:21 CEST] <michaelni> BBB no
[15:01:25 CEST] <BBB> I'm just suggesting putting it back to what it was before that
[15:01:28 CEST] <michaelni> no
[15:01:50 CEST] <BBB> commit ccf589a8fed254f7395b05beac2ef84e7dd89e6f
[15:01:51 CEST] <BBB> Author: Michael Niedermayer <michaelni at gmx.at>
[15:01:51 CEST] <BBB> no?
[15:01:57 CEST] <michaelni> " idct from the reference, you did, in 2002" <-- wrong
[15:02:30 CEST] <BBB> you say that patch fixes a particular file
[15:02:43 CEST] <BBB> and a particular artifact
[15:02:45 CEST] <BBB> right?
[15:03:00 CEST] <J_Darnley> kierank: what have we done?
[15:03:04 CEST] <BBB> lol
[15:03:05 CEST] <michaelni> it probably fixed many files and made the idct work better overall
[15:03:12 CEST] <BBB> probably is not definitive
[15:03:26 CEST] <BBB> compared to the floating point (which I call reference) idct, it adds error
[15:03:32 CEST] <kierank> J_Darnley: i wonder if i have unleashed a beast
[15:03:39 CEST] <BBB> maxSumError for dct-test in -i2 is 0 with 16384, but is >0 with 16383
[15:03:56 CEST] <BBB> that suggests to me that 16384 is a better value for reduced round-trip error against independently written idcts than 16383
[15:03:57 CEST] <BBB> right?
[15:04:20 CEST] <iive> BBB: why don't you read the standard
[15:04:28 CEST] <iive> BBB: it defines an error RANGE
[15:06:56 CEST] <michaelni> if we change the idct not only will some files from other encoders show artifacts, ffmpeg generated files will show artifacts from IDCT mismatch and files generated with the new IDCT will show artifacts with other decoders and old ffmpeg
[15:07:27 CEST] <BBB> indeed
[15:07:39 CEST] <atomnuker> so what's the issue here then?
[15:07:55 CEST] <BBB> files generated from still other encoders will also stop showing artifacts since we are getting closer to the reference (floating point) idct
[15:08:01 CEST] <BBB> it's all very complicated
[15:08:14 CEST] <iive> I thought that the issue is that C and MMX differ... but it seems it has drifted somewhere else.
[15:08:36 CEST] <iive> BBB: other encoders also use flawed IDCT
[15:08:38 CEST] <BBB> J_Darnley: the alternate solution is to re-do the masking that mmx and C do - for 8 bits only - and use 16384 only in those cases
[15:09:36 CEST] <J_Darnley> Hm.  I should try that.  It might make it faster too
[15:09:37 CEST] <michaelni> BBB, i doubt any inter frame encoder with large gop distance uses a 8x8 float idct
[15:09:54 CEST] <BBB> J_Darnley: no, it'll make it slower :(
[15:10:05 CEST] <BBB> J_Darnley: you have to do full idct and dc-only idct for both dimensions
[15:10:14 CEST] <BBB> J_Darnley: and then mask them based on each row being dc-only or not
[15:10:21 CEST] <BBB> J_Darnley: it's hideous if I may say so
[15:10:26 CEST] <BBB> but that's what the C code does
[15:10:36 CEST] <BBB> so it makes it probably about 20% slower or so
[15:10:45 CEST] <BBB> (rough guesstimate)
[15:12:58 CEST] Action: J_Darnley grumbles
[15:13:22 CEST] <BBB> let me see if I can write an example for you
[15:13:28 CEST] <BBB> and yes, again, this is totally hideous
[15:13:33 CEST] <BBB> welcome to working with 15-year old code :D
[15:13:52 CEST] <BBB> grep x264_build in h264*.c for more fun
[15:13:56 CEST] <BBB> and that's not 15 years old yet
[15:14:01 CEST] <nevcairiel> you can "thank" BBB later for pulling you into this =p
[15:14:12 CEST] <BBB> yikes
[15:17:56 CEST] <iive> michaelni: btw, do you know if ffmpeg4 detects old xvid bitstream and switches to a special idct, or simple_idct is used for them too?
[15:18:29 CEST] <iive> i might have at least one xvid sample that shows green tilt 
[15:19:52 CEST] <BBB> J_Darnley: the basic idea is in the idct_1d, you'd por all ac coefficients together, then pcmpeqw with a zero register to see if all acs were zero, let's call that res (so res=0xff or 0 for each row/col), then use the dc to calculate dc <<= 3 and mask (pand) the res with the dc and reverse-mask (pandn) with the normal result, por the two together and that's your output
[15:20:06 CEST] <BBB> J_Darnley: it's a little complicated but that'll give the result you want and matches the current c code
[15:20:20 CEST] <BBB> want is subjective obviously
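The mask-and-blend idea described above can be written out in scalar C, one 16-bit lane at a time. This is a hedged sketch, not the code from the pastebin: the function name `blend_dc_or_full` and the four-lane layout are hypothetical, and each step is annotated with the SIMD instruction it is meant to mirror.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* assumption: four 16-bit lanes, as in one 64-bit MMX register */

/*
 * For each lane, pick the DC-only result when the OR of that row's AC
 * coefficients is zero, otherwise pick the full 1-D IDCT result.
 * Both candidate results must already have been computed, with no branches.
 */
static void blend_dc_or_full(uint16_t out[LANES],
                             const uint16_t ac_or[LANES],
                             const uint16_t dc_result[LANES],
                             const uint16_t full_result[LANES])
{
    int i;

    assert(out && ac_or && dc_result && full_result);
    for (i = 0; i < LANES; i++) {
        /* pcmpeqw against zero: all ones if the row had no AC energy */
        uint16_t mask = (ac_or[i] == 0) ? 0xFFFFu : 0u;
        /* pand: keep the DC-only value where the mask is set */
        uint16_t keep_dc = (uint16_t)(mask & dc_result[i]);
        /* pandn: keep the full result where the mask is clear */
        uint16_t keep_full = (uint16_t)(~mask & full_result[i]);
        /* por: merge the two, exactly one of which is non-masked */
        out[i] = (uint16_t)(keep_dc | keep_full);
    }
}
```

The cost of this design is that both paths are always executed, which is why it comes out slower than a version that can skip work, but it avoids a per-row branch inside SIMD code.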
[15:22:33 CEST] <michaelni> iive, we switch to xvid idct if xvid is detected
[15:23:22 CEST] Action: J_Darnley needs water
[15:27:51 CEST] <iive> BBB: you want to do full 1D IDCT and DC detect calculations and then pblendv them?
[15:28:17 CEST] <iive> isn't the DC detect used to quickly skip the 1D IDCT
[15:32:13 CEST] <ubitux> i fixed the very large precision problem in aacpsdsp neon arm code
[15:32:19 CEST] <ubitux> but it's not sexy
[15:32:43 CEST] <ubitux> http://sprunge.us/XKjJ?diff
[15:32:46 CEST] <ubitux> any suggestion?
[15:33:11 CEST] <ubitux> basically the offset values are *2 and that initial accuracy drift over time pretty quickly
[15:33:33 CEST] <ubitux> main problem is that it adds 2 instruction in the inner loop
[15:35:10 CEST] <iive> ubitux: are the original 1* values used somewhere in the functions?
[15:35:26 CEST] <iive> you can just double the increments before the loop.
[15:36:00 CEST] <ubitux> that's exactly what i'm removing
[15:36:13 CEST] <ubitux> the first add on top of the diff is the double of the increment
[15:36:28 CEST] <ubitux> but it creates an inaccuracy that increases progressively in the inner loop
[15:36:52 CEST] <ubitux> after ~1024 iteration, the inaccuracy reaches a 0.01 error
[15:36:57 CEST] <ubitux> which is pretty large
[15:37:55 CEST] <iive> strange, the mantissa shouldn't change in *2
[15:38:27 CEST] <ubitux> maybe we should actually fmul by 2
[15:38:32 CEST] <ubitux> instead of adding itself
[15:39:06 CEST] <iive> you can try
[15:44:53 CEST] <ubitux> doesn't help
[15:47:30 CEST] <iive> can i see the whole function? is it in ffmpeg git?
[15:49:00 CEST] <ubitux> yes it's the arm code
[15:49:05 CEST] <ubitux> libavcodec/arm/aacpsdsp_neon.S
[15:51:38 CEST] <kierank> J_Darnley: bloody boiling today
[16:59:42 CEST] <ubitux> jamrial: i found the cause of the inaccuracy in the arm
[16:59:52 CEST] <ubitux> (arm, not aarch64)
[17:00:08 CEST] <jamrial> ubitux: cool, what was it?
[17:00:23 CEST] <ubitux> http://sprunge.us/aIKE
[17:01:10 CEST] <ubitux> and i'm also "faster" with all the aarch64 code now
[17:01:30 CEST] <ubitux> (http://sprunge.us/WKWY)
[17:17:20 CEST] <BBB> J_Darnley: muhahahhaha https://pastebin.com/63DQrJH8
[17:17:26 CEST] <BBB> also: don't judge me!
[17:17:33 CEST] <BBB> I was in a hurry
[17:23:47 CEST] <J_Darnley> Well you beat me to it so I won't
[17:26:18 CEST] Action: J_Darnley will read it after caffeine
[17:29:14 CEST] <iive> BBB: can't you use pandn and a zero constant? 
[17:30:09 CEST] <BBB> J_Darnley: hey at least I did one thing right, and also it's a hack so it'll need some cleanup
[17:30:29 CEST] <iive> and once again, isn't it better to use conditional jump ?
[17:37:16 CEST] <J_Darnley> As I understand it we can't jump over the idct because we need to perform it even if just one row has one non-zero coeff.
[17:38:11 CEST] <cone-177> ffmpeg 03Ronald S. Bultje 07master:d35ff98e270d: vp9: fix overwrite in ff_vp9_ipred_dr_16x16_16_avx2.
[17:39:17 CEST] <BBB> right, you need to do both dc-only as well as full 1d idct for the rows
[17:39:30 CEST] <BBB> and then and them per-row depending on whether each 1d row was dc-only or not
[17:39:35 CEST] <BBB> check c code for details
[17:40:01 CEST] <BBB> (the C code is quite well-written and well-documented for something so old)
[17:40:48 CEST] <BBB> and I don't mean that in a derogatory way, it's just nice to see well-documented code that isn't total spaghetti and be able to predict behaviour just from reading it
[17:40:52 CEST] <J_Darnley> I rather like it too.  The template works rather well
[17:41:13 CEST] <BBB> the mmx, on the other hand, is very obscure :D
[17:42:22 CEST] <J_Darnley> pcmpeqb X, X is a quick way to get all ones, right?
[17:42:30 CEST] <BBB> yes
[17:42:38 CEST] <BBB> there's probably other ways to get that also
[17:42:46 CEST] <BBB> like I Said << lazy
[17:43:25 CEST] <J_Darnley> Oh of course that is inverting the mask
[17:44:45 CEST] <BBB> right
[17:53:48 CEST] Action: J_Darnley is afk
[17:59:15 CEST] <jamrial> ubitux: would using vmla with a reg containing a 2.0 constant work here? or is it even uglier?
[17:59:53 CEST] <ubitux> i tried it, it didn't help, i get the same failure as with the fadd
[18:00:46 CEST] <ubitux> http://sprunge.us/gMIF?diff  ugly & dumb, dunno if there is a smarter way but that's how i tested it
[18:01:52 CEST] <jamrial> no, i mean, instead of replacing the two adds in the loop with four adds, replace them with two vmlas
[18:05:09 CEST] <jamrial> ubitux: instead of "vadd q0, q0, q15" do "vmla q0, q14, 2.0" or however it's written
[18:06:03 CEST] <BBB> can anyone reproduce any of the issues in 6459/6460/6461/6462?
[18:06:12 CEST] <BBB> asan and valgrind are completely clean for me on two platforms
[18:06:29 CEST] <BBB> (they didn't even detect that assembly issue I fixed)
[18:06:51 CEST] <jamrial> BBB: i can reproduce 6460 on a mingw x64 build
[18:07:11 CEST] <BBB> so why can't I?
[18:08:18 CEST] <jamrial> let me check again. i could last night at least
[18:10:54 CEST] <ubitux> jamrial: doesn't help
[18:15:16 CEST] <jamrial> BBB: can't reproduce it anymore
[18:15:24 CEST] <BBB> uhm
[18:15:25 CEST] <BBB> ok
[18:15:32 CEST] <BBB> so maybe 6459 was responsible for the others also?
[18:15:38 CEST] <BBB> can you reproduce any of 6460/1/2?
[18:16:24 CEST] <jamrial> BBB: if i revert d35ff98e27 i can reproduce the crash
[18:17:10 CEST] <jamrial> 6460 i mean. without that commit it crashes, doesn't crash with it
[18:17:25 CEST] <BBB> testing..
[18:17:47 CEST] <BBB> yes
[18:17:54 CEST] <BBB> I can reproduce 6460/1/2 with that reverted
[18:17:56 CEST] <BBB> ok, so all duplicates
[18:19:23 CEST] <BBB> tnx for helping :)
[18:19:48 CEST] <BBB> yesterday night I was semi-panicking as if ffvp9 was a total security disaster
[18:20:00 CEST] <BBB> always nice to be able to trace that back to a commit just 1 day ago :)
[18:27:12 CEST] <gh0st__> BBB: Looks like I made some trouble there, yikes.
[18:28:40 CEST] <BBB> it's ok :)
[22:15:25 CEST] <thebombzen> why is michael the only dev here without op?
[22:15:46 CEST] <Shiz> maybe he has set noautoop in nickserv?
[22:16:02 CEST] <thebombzen> perhaps? but most of the ops here are not ops in #ffmpeg so idk
[22:42:09 CEST] <BBB> J_Darnley: anything more I can do to help get the idct patch pushed?
[22:42:36 CEST] <BBB> J_Darnley: as irritating as part of this process may be, I really would like this to go in, I think it's great work and helps a lot of use cases
[22:57:51 CEST] <kierank> I also would like to see this in because I have something else for J_Darnley to work on
[22:59:45 CEST] <atomnuker> tdjones: can you add an option to the encoder like opus_delay?
[23:00:18 CEST] <atomnuker> (except make it in samples rather than seconds)
[23:00:39 CEST] <atomnuker> I'd like to see the effect of the incredibly long vorbis windows
[23:03:12 CEST] <durandal_170> kierank: on mpeg2?
[23:03:33 CEST] <kierank> durandal_170: no
[23:04:31 CEST] <durandal_170> kierank: then what?
[23:04:51 CEST] <jkqxz> How long is the random_seed test meant to take to run?
[23:04:53 CEST] <kierank> well updating our internal ffmpeg and working on some NIC stuff
[23:11:17 CEST] <jkqxz> Taking ~ten minutes on a desktop skylake surely means it will take absurdly fucklong on embedded stuff?  (Or does that mean there is something wrong with my machine?)
[23:16:23 CEST] <jamrial> jkqxz: that's definitely not right
[23:16:49 CEST] <J_Darnley> BBB, kierank: I hope not too much longer.  I confirmed it was working with dct before I went to make dinner
[23:17:21 CEST] <J_Darnley> Now I plan to check it against C and MMX using the fate tests.
[23:20:05 CEST] <jamrial> jkqxz: i remember not too long ago someone updated the random_seed test and it started to take a crap load of time on windows
[23:20:10 CEST] <jamrial> but it was soon fixed
[23:20:37 CEST] <BBB> is someone also working on integrating the patches from bugmaster to fix 444 h264 decoding lossless something or so?
[23:20:45 CEST] <BBB> or should I do that?
[23:21:10 CEST] <durandal_170> do that
[23:21:49 CEST] <cone-862> ffmpeg Matthieu Bouron master:204008354f7f: lavc/aarch64/simple_idct: fix build with Xcode 7.2
[23:22:54 CEST] <jkqxz> I shouldn't be running out of entropy, because it's using /dev/urandom.
[23:23:02 CEST] <jkqxz> Also it really did take 10 minutes of CPU time, not just real time.
[23:26:31 CEST] <nevcairiel> jamrial: it's still relatively slow, mostly because it's testing this one mode that's never going to get used in windows either way, and there is a reason we have the mode that uses windows crypto functions instead of relatively inaccurate time seeds
[23:26:46 CEST] <tdjones> atomnuker: To let it change the length of encoded windows, or is it just for lookahead in the psy model?
[23:28:11 CEST] <jkqxz> nevcairiel:  How long do you expect it to take there?
[23:30:47 CEST] <atomnuker> tdjones: the latter, make a copy of ff_opus_psy_process() for vorbis and do what it does
[23:31:33 CEST] <atomnuker> it sets the window size based on the option value and if it's more than the largest possible window keeps it as a lookahead
[23:34:55 CEST] <durandal_170> atomnuker: how is audio noise reduction filter going?
[23:35:14 CEST] <nevcairiel> jkqxz: expect? takes about 10 seconds on my relatively fast system, which for this simple test is already stupid long, if it takes minutes then it's probably still broken and tries to find some entropy somewhere
[23:35:57 CEST] <nevcairiel> (it always tests the time-based fallback mode, in addition to the platform-specific entropy source)
[23:36:02 CEST] <tdjones> atomnuker: Sure, I'll take a look at it and work it in as soon as I'm able
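One possible reading of the delay option described above, as a small C sketch. This is an assumption about the intent, not the behaviour of `ff_opus_psy_process()`: the requested delay (in samples) is used as the window size up to the largest window the codec allows, and whatever exceeds that is kept as lookahead. `PsyDelay` and `split_delay` are hypothetical names, and real window sizes would also have to be restricted to the sizes the codec permits.

```c
/* Hypothetical result type: how a requested delay is divided up. */
typedef struct PsyDelay {
    int window;     /* samples used as the analysis window */
    int lookahead;  /* samples beyond the largest window, kept as lookahead */
} PsyDelay;

/* Split a requested delay (in samples) into window and lookahead.
 * Assumes delay_samples >= 0 and max_window > 0. */
PsyDelay split_delay(int delay_samples, int max_window)
{
    PsyDelay d;
    d.window    = delay_samples < max_window ? delay_samples : max_window;
    d.lookahead = delay_samples - d.window;
    return d;
}
```

Taking the option in samples rather than seconds, as requested in the log, keeps the split independent of the sample rate.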
[23:36:29 CEST] <J_Darnley> For the love of god, what now?  Why does fate pass on my laptop but not my server? WTF!
[23:39:44 CEST] <wm4> why do we need a random seed test
[23:39:51 CEST] <wm4> that's utterly stupid
[23:39:59 CEST] <J_Darnley> Cock.  I bet its harddrive is dying.  Deleting the sample then syncing fixed it.
[23:40:09 CEST] <wm4> it's not like it's going to be used for something important
[23:41:11 CEST] <atomnuker> tdjones: once you get that running I'd like you to write a code which calculates the distortion of the residual and use it to find the best rc->classbook using a greedy search (look through all values)
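A minimal sketch of the search described above, under stated assumptions: distortion is taken to be the sum of squared errors between the residual and the approximation each classbook would produce, and those approximations are assumed to be available up front. `squared_error` and `best_classbook` are illustrative names, not functions from the vorbis encoder.

```c
#include <stddef.h>

/* Sum of squared differences between the residual and one approximation. */
double squared_error(const float *residual, const float *approx, size_t n)
{
    double err = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)residual[i] - (double)approx[i];
        err += d * d;
    }
    return err;
}

/* Look through every classbook and return the index with the lowest
 * distortion, or -1 if there are no candidates. approx[c] is the
 * (hypothetical) reconstruction of the residual using classbook c. */
int best_classbook(const float *residual, const float *const *approx,
                   int num_classbooks, size_t n)
{
    int best = -1;
    double best_err = 0.0;
    for (int c = 0; c < num_classbooks; c++) {
        double err = squared_error(residual, approx[c], n);
        if (best < 0 || err < best_err) {
            best = c;
            best_err = err;
        }
    }
    return best;
}
```

A real encoder would likely weigh the bit cost of each classbook as well as the distortion; this sketch only covers the distortion part mentioned in the log.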
[23:41:33 CEST] <cone-862> ffmpeg Mark Thompson master:f2e4fb61af4b: hwcontext_vaapi: Try to support the VDPAU wrapper
[23:41:34 CEST] <cone-862> ffmpeg Mark Thompson master:92bd08974541: vaapi_encode: Discard output buffer if picture submission fails
[23:41:35 CEST] <cone-862> ffmpeg Mark Thompson master:b22172f6f353: hwcontext: Add device derivation
[23:41:36 CEST] <cone-862> ffmpeg Mark Thompson master:0b1794a43e10: hwcontext: Make it easier to work with device types
[23:41:37 CEST] <cone-862> ffmpeg Mark Thompson master:06043cc0bc72: ffmpeg: Generic device setup
[23:41:38 CEST] <cone-862> ffmpeg Mark Thompson master:be5107335230: ffmpeg: Enable generic hwaccel support for VAAPI
[23:41:39 CEST] <cone-862> ffmpeg Mark Thompson master:e462ace84b92: ffmpeg: Enable generic hwaccel support for VDPAU
[23:41:40 CEST] <cone-862> ffmpeg Mark Thompson master:527a1e213167: ffmpeg: Document the -init_hw_device option
[23:41:41 CEST] <cone-862> ffmpeg Mark Thompson master:bff7bec1d7d0: vf_deinterlace_vaapi: Add support for field rate output
[23:41:42 CEST] <cone-862> ffmpeg Mark Thompson master:91c3b50d74ba: qsv: Add ability to create a session from a device
[23:41:43 CEST] <cone-862> ffmpeg Mark Thompson master:8aa3c2df1ae6: qsvdec: Allow use of hw_device_ctx to make the internal session
[23:41:44 CEST] <cone-862> ffmpeg Mark Thompson master:28aedeed1961: qsvenc: Allow use of hw_device_ctx to make the internal session
[23:41:45 CEST] <cone-862> ffmpeg Mark Thompson master:b658b5399e5d: vaapi_encode: Use gop_size consistently in RC parameters
[23:41:46 CEST] <cone-862> ffmpeg Mark Thompson master:49ae8a5e87f9: lavc: Add flag to allow profile mismatch with hardware decoding
[23:41:47 CEST] <cone-862> ffmpeg Mark Thompson master:38820631746f: vaapi: Add external control of allow-profile-mismatch
[23:41:48 CEST] <cone-862> ffmpeg Mark Thompson master:7ce47090ce36: ffmpeg: Support setting the hardware device to use when filtering
[23:41:49 CEST] <cone-862> ffmpeg Mark Thompson master:045ff8d30a69: hwcontext_qsv: Support derivation from child devices
[23:41:50 CEST] <cone-862> ffmpeg Mark Thompson master:ec3dbeae8139: hwcontext: Add frame context mapping for nontrivial contexts
[23:41:51 CEST] <cone-862> ffmpeg Mark Thompson master:f82ace71c0d8: hwcontext_qsv: Implement mapping frames from the child device type
[23:41:52 CEST] <cone-862> ffmpeg Mark Thompson master:a97fb14418fd: hwcontext_qsv: Implement mapping frames to the child device type
[23:41:53 CEST] <cone-862> ffmpeg Mark Thompson master:d59c6a3aebc2: hwcontext: Improve allocation in derived contexts
[23:41:54 CEST] <durandal_170> spaaaam
[23:41:54 CEST] <cone-862> ffmpeg Mark Thompson master:b2ef1f42badd: vf_hwmap: Add device derivation
[23:41:55 CEST] <cone-862> ffmpeg Mark Thompson master:d81be0a60a6d: vf_hwmap: Add reverse mapping for hardware frames
[23:41:56 CEST] <cone-862> ffmpeg Mark Thompson master:5de38188f82a: doc: Document hwupload, hwdownload and hwmap filters
[23:43:25 CEST] <jamrial> jkqxz: did you see my reply about vp9_raw_reorder? converting it to get_bits is trivial. none of the bitstream_read calls read more than 25 bits
[23:43:36 CEST] <jamrial> much easier/better than waiting for the new bitstream reader to finally be merged
[23:45:04 CEST] <jkqxz> Yeah, I'm about to do it.
[23:45:14 CEST] <jkqxz> Just getting that set out of the way first.
[23:45:20 CEST] <jamrial> ah cool
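The 25-bit figure mentioned above matters for readers that fetch a 32-bit window starting at the current byte: up to 7 bits of that window may already be consumed, so only 32 - 7 = 25 bits are guaranteed to be available in one read. A minimal sketch of such a reader follows; `BitReader` and `read_bits` are hypothetical names for illustration and are not FFmpeg's `GetBitContext` or `get_bits()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader. The buffer must have at least
 * 4 readable bytes from any position that is read (i.e. padding). */
typedef struct BitReader {
    const uint8_t *buf;
    size_t bit_pos;   /* current position, in bits */
} BitReader;

/* Read n bits, 1 <= n <= 25. The 32-bit window starts at the current
 * byte, so up to 7 of its leading bits may already have been consumed. */
uint32_t read_bits(BitReader *br, int n)
{
    assert(n >= 1 && n <= 25);
    const uint8_t *p = br->buf + (br->bit_pos >> 3);
    uint32_t window = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                      ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    window <<= br->bit_pos & 7;   /* drop the already-consumed bits */
    br->bit_pos += (size_t)n;
    return window >> (32 - n);    /* keep the top n bits */
}
```

If no call in vp9_raw_reorder reads more than 25 bits, as stated in the log, each read maps onto a single call of a reader like this with no need to split wide reads.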
[23:47:45 CEST] <jkqxz> wm4:  For videotoolbox, I think it should be able to use generic hwaccel (though I only have oranges, so can't test).  What about VDA?  That also uses the ffmpeg_videotoolbox.c file.
[23:48:42 CEST] <wm4> fuck VDA, is my reply
[23:49:02 CEST] <wm4> I can't even test VDA because newer OSX versions have it fully removed
[23:49:15 CEST] <wm4> supporting VDA is a bit like supporting windows xp
[23:49:21 CEST] <jkqxz> Oh, ha.
[23:49:25 CEST] <jkqxz> So ffmpeg will keep it forever, then?
[23:49:30 CEST] <wm4> probably
[23:50:16 CEST] <jkqxz> Move videotoolbox to generic hwaccel, rename ffmpeg_videotoolbox.c to ffmpeg_vda.c and remove the ifdefs?
[23:50:17 CEST] <wm4> but I don't care, and you shouldn't either... I'd just post a patch (that implicitly removes vda ffmpeg.c support), and see if any trolls appear
[23:50:23 CEST] <nevcairiel> my problem is that I need mac binaries that work on 10.7 and up, and IIRC VT is only 10.8 and up, but if VDA doesn't work on newer versions that leaves me with shit =p
[23:50:46 CEST] <jkqxz> (Also subtly break it to see if anyone ever uses it.)
[23:50:50 CEST] <wm4> not sure when VT support started... it was unofficial for a while
[23:51:06 CEST] <nevcairiel> docs say 10.8 anyway
[23:56:57 CEST] <jkqxz> "make fate-random_seed" seems to take a random amount of time.
[23:58:31 CEST] <kode54> "We require hvc1 codec id for playback. This was communicated during the sessions at WWDC. We have a similar requirement for H.264 playback where we require avc1 codec id."
[23:58:31 CEST] <jkqxz> I think it's much faster if other stuff is happening on the same machine.  The first run was in the middle of a nonparallel "make check" and took ten minutes, a later run took two minutes, and it's under ten seconds if other stuff is going on.
[23:58:43 CEST] <kode54> maybe make a handy stream reformatter for that?
[23:59:14 CEST] <jkqxz> So sounds like entropy might well be the problem.
[00:00:00 CEST] --- Thu Jun 15 2017