[Ffmpeg-devel-irc] ffmpeg-devel.log.20160708

burek burek021 at gmail.com
Sat Jul 9 02:05:03 CEST 2016


[08:42:22 CEST] <andrey_turkin> i see a potential issue with SCTE-35 regarding its timing. There may be zero, one or many pts values embedded in the messages, and somehow the messages should be in sync with rest of the streams despite all the massaging done to timestamps in ffmpeg
[08:44:00 CEST] <andrey_turkin> easiest cases probably would work reasonably well though (immediate cue in/out commands and no delays in fifos/buffers)
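[editor's note] The timing concern above can be sketched in a few lines. This is only an illustration, assuming the shift applied to the other streams is known as a single offset in 90 kHz ticks and that the embedded values are 33-bit PTS fields as in MPEG-TS. The names `shift_pts` and `shift_cue_times` are hypothetical and are not part of any ffmpeg API.

```python
PTS_BITS = 33                 # MPEG-TS style PTS width
PTS_MODULO = 1 << PTS_BITS    # values wrap at 2**33
TICKS_PER_SECOND = 90_000     # 90 kHz clock


def shift_pts(pts: int, offset_ticks: int) -> int:
    """Apply a timestamp offset to one PTS value, wrapping at 2**33."""
    return (pts + offset_ticks) % PTS_MODULO


def shift_cue_times(pts_values, offset_ticks):
    """Shift every PTS embedded in a message (there may be zero, one or many)."""
    return [shift_pts(p, offset_ticks) for p in pts_values]


# Hypothetical message carrying two PTS values, delayed by one second.
shifted = shift_cue_times([0, PTS_MODULO - 1], TICKS_PER_SECOND)
```

An immediate cue with no embedded PTS passes through as an empty list, which matches the "easiest cases" mentioned above. The hard part in practice is knowing the offset at all once fifos and buffers are involved.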
[09:43:23 CEST] <andrey_turkin> libav merges seem to have stalled :( I guess pending h264 regression resolution?
[09:47:31 CEST] <mateo`_> andrey_turkin: yes, the current commit being merged is problematic (and causes regressions)
[09:55:00 CEST] <andrey_turkin> that's unfortunate. I was looking forward to recent batch of qsv patches :(
[10:06:13 CEST] <nevcairiel> one other reason is that ubitux was busy the recent days
[10:31:13 CEST] <ubitux> yeah i'm pretty busy currently
[10:31:18 CEST] <ubitux> maybe i'll have time this week end
[10:31:33 CEST] <ubitux> but i'm really not satisfied with reverting most of the commit to get it work in ffmpeg
[10:31:42 CEST] <ubitux> it will only delay things
[10:50:18 CEST] <nevcairiel> did you see michaels patch on top of that?
[10:51:42 CEST] <ubitux> i only see one removing the code repositioning
[11:03:44 CEST] <nevcairiel> ubitux: http://pastebin.com/t0D0aZV7 .. it basically adds a few security checks back that were lost, and moves the function call back to its original position - but keeps the refactoring into its own function
[11:04:49 CEST] <nevcairiel> not sure if that counts as reverting most of it, the diff seems acceptably small
[11:06:11 CEST] <ubitux> yes that's what i was talking about
[11:06:21 CEST] <ubitux> not moving that code might have consequences in the future
[11:06:37 CEST] <nevcairiel> not every change coming from the other side has to be taken as flawless and issue free :p
[11:06:38 CEST] <ubitux> it means they can execute that function in that particular place
[11:06:40 CEST] <ubitux> while we can't
[11:06:57 CEST] <ubitux> i don't know
[11:07:07 CEST] <ubitux> they don't trigger the same crash we have
[11:07:30 CEST] <nevcairiel> they also dont play a variety of files we can, so maybe it just bails out earlier
[11:09:02 CEST] <ubitux> maybe, but i didn't want to make that assumption without verification
[11:09:03 CEST] <nevcairiel> there is at least one security check that was just flat out removed in the current merge, not sure if that alone helps already
[11:43:32 CEST] <ubitux> nevcairiel: if you think that's the best/least worse way to handle it, feel free to go ahead and commit 
[11:43:40 CEST] <ubitux> i'll continue the merging this week end probably
[11:46:18 CEST] <nevcairiel> i dont have a broken sample, so i cant really say with any certainty
[13:06:57 CEST] <ubitux> nevcairiel: ask michael
[15:14:20 CEST] <BBB> michaelni: you didnt respond to my request to revert the ppc64 simd patch (the first one, which you merged, iirc)
[15:14:25 CEST] <BBB> michaelni: whats your opinion?
[15:38:51 CEST] <cone-560> ffmpeg 03Hendrik Leppkes 07master:c3e9b098e12b: h2645_parse: only read avc length code at the correct position
[15:38:51 CEST] <cone-560> ffmpeg 03Hendrik Leppkes 07master:83a940e7fb96: h2645_parse: don't overread AnnexB NALs within an avc stream
[15:41:54 CEST] <nevcairiel> should really find out what kind of crappy software writes mp4s with annexb NALUs
[15:45:52 CEST] <Compn> hmm, mips optimizations http://elinux.org/CI20_MPlayer
[15:48:48 CEST] <ubitux> wrong channel
[15:49:27 CEST] <ubitux> or are they patching libav* dumped inside their mplayer repo?
[15:50:11 CEST] <nevcairiel> probably
[15:50:57 CEST] <nevcairiel> yeah there is all sorts of crazy custom code in avcodec et al
[15:51:17 CEST] <Compn> yes they have their own avcodec
[15:51:30 CEST] <Compn> not sure if those have already been backported here tho
[15:51:39 CEST] <nevcairiel> those are not backportable
[15:51:44 CEST] <nevcairiel> they hacked them in everywhere
[15:51:49 CEST] <nevcairiel> no clean integration anywhere
[15:51:50 CEST] <Compn> mips32
[15:52:03 CEST] <Compn> ah well, we should keep a list for interested parties
[15:52:09 CEST] <ubitux> no
[15:52:52 CEST] <nevcairiel> its also one huge code dump from 2014 or so, no history before that, which covers the majority of changes
[16:40:30 CEST] <michaelni> BBB, didnt reply as i dont really know whats best
[16:40:47 CEST] <BBB> ok... so how do you suggest we move forward?
[16:40:55 CEST] <BBB> want to ask third-person opinions?
[16:41:01 CEST] <BBB> do nothing seems a bad idea
[16:41:53 CEST] <j-b> BBB: what's the issue?
[16:42:09 CEST] <BBB> whether to revert the first ppc64 patch
[16:42:14 CEST] <BBB> from that guy that threatened ubitux
[16:42:21 CEST] <j-b> The code is correct?
[16:42:24 CEST] <BBB> theres little-to-no performance gains
[16:42:28 CEST] <BBB> and the code seems highly inefficient
[16:42:33 CEST] <BBB> (gains vs. c)
[16:43:01 CEST] <j-b> who benched it?
[16:43:02 CEST] <nevcairiel> it is otherwise technically correct, BBB is just worried it might hamper efforts of some future ppc64 person
[16:43:04 CEST] <BBB> so I prefer to revert it and leave the bug (#5570) open so other people can work on it without having to bother with existing code that is probably doing it fundamentally wrong and thus serves as a poor example
[16:43:19 CEST] <BBB> the guy himself benched it
[16:43:35 CEST] <BBB> he gets like 0.9-1.25x speed vs. c whereas sse2 gives ~4x
[16:43:44 CEST] <BBB> the 0.9x is especially worrying because it means it got slower
[16:43:47 CEST] <j-b> wut?
[16:43:48 CEST] <j-b> lol
[16:43:53 CEST] <BBB> so I want to revert it
[16:43:58 CEST] <michaelni> the first patch was always > 1 but not much bigger
[16:44:10 CEST] <michaelni> otherwise BBB explained the situation well
[16:44:13 CEST] <BBB> oh ok I didnt know (he didnt report it publicly in emails)
[16:44:18 CEST] <BBB> (or did he?)
[16:44:26 CEST] <nevcairiel> i think there were numbers in one mail
[16:44:29 CEST] <michaelni> i think in the mail i linked IIRC
[16:44:37 CEST] <nevcairiel> but he didnt bench using START_TIMER so its an unusual format
[16:44:44 CEST] <ubitux> j-b: he was probably motivated by the bounty
[16:44:54 CEST] <BBB> ubitux: thats not an issue
[16:44:59 CEST] <BBB> if people get paid, thats fine
[16:45:02 CEST] <BBB> but the code needs to be good
[16:45:04 CEST] <ubitux> of course
[16:45:29 CEST] <ubitux> i'm just proposing an explanation for why he may not have cared so much about the quality
[16:45:52 CEST] <BBB> fair enough
[16:45:54 CEST] <j-b> I'd argue it should not have been merged, but I don't see the reason to revert it, unless the code is really shitty
[16:46:02 CEST] <BBB> its really shitty :-p
[16:46:05 CEST] <nevcairiel> i wonder if IBM might care that they got a measly 20% speed up at best
[16:46:19 CEST] <j-b> BBB: really?
[16:46:24 CEST] <BBB> I wouldnt be surprised if the bounties are managed by non-coders
[16:46:29 CEST] <BBB> which means they dont care
[16:46:37 CEST] <BBB> they just want a checkmark on a piece of paper
[16:46:45 CEST] <BBB> [ ] ffmpeg swscale input has simd
[16:46:48 CEST] <BBB> [v]
[16:46:52 CEST] <BBB> right?
[16:46:57 CEST] <nevcairiel> i guess
[16:48:24 CEST] <jkqxz> I still think there must be something else going on.  The code doesn't appear to be that bad - it's a simple translation which should gain more than it did, even if it has significant micro issues.
[16:49:17 CEST] <BBB> my impression is that at least in the code that I reviewed, it's not confirmed that all code actually runs in simd vector instructions
[16:49:26 CEST] <jkqxz> But yes: if no one is going to work on it further, it should probably be reverted so that the slate is clean if someone else wants to have a go.
[16:49:27 CEST] <BBB> some code - I think! - runs in scalar
[16:49:40 CEST] <BBB> so there's a vector - scalar - vector transition going on in a few places
[16:49:58 CEST] <BBB> and because he does it in c (using c language, not intrinsics) it's invisible
[16:50:05 CEST] <BBB> the one I reviewed, he uses *
[16:50:12 CEST] <BBB> on straight integers
[16:50:25 CEST] <BBB> most simd sets have no int*int multiplication instruction
[16:50:33 CEST] <BBB> they have word*word->int or so
[16:50:38 CEST] <BBB> (with horiz. add)
[16:50:41 CEST] <BBB> or something like that
[16:50:52 CEST] <BBB> but that's invisible because he mixed C and intrinsics
[16:50:59 CEST] <BBB> and somehow it doesn't give compiler warnings
[16:51:22 CEST] <BBB> (that's my hunch, I tried verifying it through disassembly but that looks so horridly large that I couldn't make sense out of it - plus it's not my job)
[16:52:37 CEST] <BBB> it also doesn't explain why other functions are slow, but maybe add has the same issue (he uses += on vector int, instead of intrinsics)
[16:52:58 CEST] <BBB> again, I didn't confirm, but a 1.25x speed improvement over c for simd is not something we should be happy about if other archs get 4x
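[editor's note] The "word*word->int (with horiz. add)" operation BBB describes is the shape of x86's pmaddwd. The emulation below is a minimal sketch of that semantic in plain Python, to show what a SIMD multiply typically offers in place of a full int*int product. It is not the PPC64/VSX code under discussion, and `madd_words` is a hypothetical name.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def wrap_int32(value: int) -> int:
    """Wrap an integer into the signed 32-bit range."""
    return ((value + (1 << 31)) % (1 << 32)) - (1 << 31)


def madd_words(a, b):
    """Emulate pmaddwd: multiply signed 16-bit lanes pairwise, then add
    each adjacent pair of products into one signed 32-bit lane."""
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("inputs must have the same, even length")
    for v in list(a) + list(b):
        if not INT16_MIN <= v <= INT16_MAX:
            raise ValueError("lane value does not fit in 16 bits")
    return [wrap_int32(a[i] * b[i] + a[i + 1] * b[i + 1])
            for i in range(0, len(a), 2)]


# Four 16-bit lanes in, two 32-bit lanes out.
result = madd_words([1, 2, 3, 4], [5, 6, 7, 8])
```

If the source code multiplies plain 32-bit vector ints with `*`, a compiler has to synthesize that from narrower primitives or drop to scalar code, which is the kind of hidden cost BBB suspects.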
[16:54:40 CEST] <nevcairiel> I would honestly be surprised if compilers warned about such things
[16:54:50 CEST] <nevcairiel> it would be nice for sure
[16:55:02 CEST] <michaelni> has it been confirmed auto vectorization was disabled for C ?
[16:55:16 CEST] <andrey_turkin> there is an assembler for Power8 with simd support, right? gas should work? I'd say revert it and wait for someone to write it in assembly. This way at least there won't be any doubts about compiler efficiency
[16:56:44 CEST] <BBB> I think andrey_turkin is on to something
[16:56:50 CEST] <ubitux> all existing ppc optim in ffmpeg are in intrinsics, aren't they? it shouldn't be a reason to revert
[16:57:10 CEST] <nevcairiel> all other ppc optimizations are probably 10 years old though
[16:57:22 CEST] <ubitux> who can bench ppc code here?
[16:59:51 CEST] <BBB> michaelni: no (I mentioned that in one of my emails also)
[17:06:52 CEST] <jkqxz> IBM notionally offers free access to some developer machines, though I don't know how many hoops you have to jump through to use them.
[17:10:40 CEST] <michaelni> IIUC the patch broke build on some configuration, if that isn't fixed that would be an unambiguous reason to revert
[17:12:08 CEST] Action: andrey_turkin almost considers learning vsx 
[17:33:38 CEST] <BBB> andrey_utkin: not that I want to tell you what to do, but it may be more useful to learn something that is actually, yknow, useful :-p
[17:33:54 CEST] <BBB> michaelni: the mailinglist suggests that got fixed already
[17:35:26 CEST] <BBB> j-b: yay registration is open!
[17:36:49 CEST] <BBB> j-b: flights only $800, nice
[17:37:43 CEST] <j-b> BBB: please buy it in British Pounds, it soon will be $0
[17:38:03 CEST] <BBB> I dont think BA flies there directly :(
[17:39:02 CEST] <j-b> it was a joke :)
[17:42:39 CEST] <BBB> it was a good idea
[17:42:55 CEST] <Compn> registration time eh
[17:42:59 CEST] <BBB> if I buy fx futures, I can buy now and pay later
[17:43:09 CEST] <BBB> (when BP -> 0)
[17:43:31 CEST] <Compn> family member wants me to drive her from berlin to lviv, ukraine...
[17:44:12 CEST] <Compn> anyone been to poland? how is it ?
[17:44:15 CEST] <BBB> michaelni: so I think we still dont have a decision on the first ppc64 patch from that guy
[17:44:22 CEST] <BBB> michaelni: think about it for a bit so we can decide and move on
[17:45:25 CEST] <cone-560> ffmpeg 03Matthieu Bouron 07master:0f2654c9a3ea: lavc: add mediacodec hwaccel support
[17:47:48 CEST] <j-b> 17:42 <@BBB> if I buy fx futures, I can buy now and pay later
[17:47:52 CEST] <j-b> I'd argue you should :)
[17:48:03 CEST] <j-b> I don't see how the pound can increase :)
[17:48:19 CEST] <j-b> Compn: you must see Krakovia. Warsaw is dead boring
[17:49:04 CEST] <BBB> j-b: you guys take care of hotel right?
[17:49:11 CEST] <BBB> (booking)
[17:49:34 CEST] <j-b> yep
[17:49:39 CEST] <BBB> ty
[17:49:41 CEST] <j-b> except for you, though
[17:49:44 CEST] <j-b> /s
[17:50:01 CEST] <BBB> :'(
[17:50:24 CEST] <Compn> j-b : ok, yes thats what i was thinking warsaw was like. will put krakovia on the list. thanks :)
[17:50:56 CEST] <Compn> j-b : btw do you have idea about how much usa to berlin should cost? i havent looked at ticket prices but it says ask :P
[17:51:39 CEST] <BBB> j-b: jfk-berlin starts at $796
[17:51:42 CEST] <BBB> oops
[17:51:47 CEST] <BBB> Compn: jfk-berlin starts at $796
[17:52:05 CEST] <BBB> (direct)
[17:52:38 CEST] <BBB> Compn: what airport do you fly from?
[17:52:43 CEST] <Compn> dtw ( detroit)
[17:53:09 CEST] <iive> BBB: it would be good if you could prove your suspicions 
[17:53:17 CEST] <Compn> which is a hub, so i can get most direct flights :)
[17:53:40 CEST] <iive> BBB: assembler output should be enough to see if there is indeed mix of vector and scalar ops.
[17:53:54 CEST] <iive> BBB:  a real benchmark too.
[17:54:05 CEST] <BBB> holy shit youll pay $1038 but with 2 stops :-o
[17:54:10 CEST] <Compn> lol
[17:54:39 CEST] <BBB> one-stop is $1236
[17:54:44 CEST] <BBB> 1297 actually
[17:54:55 CEST] <BBB> with united airlines
[17:55:09 CEST] <BBB> american is same price
[17:55:17 CEST] <BBB> (arent they the same company?)
[17:56:06 CEST] <Compn> no non stops! wow.
[17:56:19 CEST] <Compn> i'm searching nwa (aka delta klm now)
[17:56:26 CEST] <Compn> dunno all of the airlines are merging
[17:56:55 CEST] <Compn> i can get non stop to japan but not berlin :V
[17:58:15 CEST] <Compn> any particular difference of BER or TXL (berlin-tegel) ?
[17:58:40 CEST] <BBB> no ide
[17:58:41 CEST] <BBB> a
[17:58:57 CEST] <Compn> my stop would either be amsterdam or cdg france
[17:59:18 CEST] <Compn> wonder what the better airport to switch would be ?
[18:01:52 CEST] <BBB> I would say ams but Im dutch so Im biased :-p
[18:02:34 CEST] <Compn> i mean, are there usual slowdowns or air worker strikes or bad weather at AMS airport ?
[18:02:43 CEST] <Compn> is more my question
[18:03:11 CEST] <ln-> Compn: BER is an airport that has not been opened yet.
[18:03:48 CEST] <ln-> https://en.wikipedia.org/wiki/Berlin_Brandenburg_Airport
[18:04:46 CEST] <BBB> Compn: oh, I think this is summer so you should be fine
[18:05:00 CEST] <BBB> Compn: strikes are by their very nature unannounced so hard to say
[18:05:19 CEST] <BBB> Compn: french typically strike more than dutch but j-b might feel offended by that
[18:05:19 CEST] <Compn> ln- : ah, flight website confused me, yes setting BER sends me to TXL anyway. ok thanks
[18:06:01 CEST] <Compn> BBB : i just wasnt sure if netherlands was a country that had strikes. i know france does
[18:06:10 CEST] <kierank> 4:37 PM <"j-b> BBB: please buy it in British Pounds, it soon will be $0 
[18:06:12 CEST] <kierank> :(
[18:06:13 CEST] <Compn> and usa of course :)
[18:06:35 CEST] <BBB> netherlands has strikes, but proportionally lower than in france from what I recall
[18:07:10 CEST] <Compn> also customs would be nice to know about
[18:07:28 CEST] <Compn> cdg customs was fine. sometimes long lines
[18:07:55 CEST] <Compn> i probably dont have to go through customs with a connecting flight but :P
[18:14:11 CEST] <j-b> kierank: haha.
[18:14:13 CEST] <j-b> sorry :)
[18:17:15 CEST] <BBB> kierank: sorry dude :( I really do feel bad for your country
[18:17:40 CEST] <j-b> (I don't. For once that we are not the stupids)
[18:17:51 CEST] <BBB> now now now
[18:19:26 CEST] <michaelni> BBB, it would be nice to know how far the code is from what's achievable before reverting. if it's a third of the speed of what a ppc can do then reverting to get a clean slate seems to make more sense than if it's 30% below what's achievable.
[18:19:49 CEST] <BBB> Im going to say 4x should be achievable
[18:20:02 CEST] <BBB> by a knowledgeable person who is willing to dig into this a little
[18:20:21 CEST] <BBB> for the function I analyzed (rgb24toy)
[18:20:32 CEST] <Compn> j-b : oh , lol. in america we just call it 'krakow' 
[18:20:33 CEST] <BBB> others probably similar, theyre straightforward vector functions
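[editor's note] michaelni's criterion can be worked through with the numbers quoted in this log. The sketch below assumes BBB's 4x estimate as the achievable speedup and the 0.9x-1.25x range he reported earlier, both relative to C. The function names are hypothetical and the one-third threshold is only one reading of michaelni's remark.

```python
def fraction_of_achievable(measured_speedup: float, achievable_speedup: float) -> float:
    """How much of the achievable speed the measured code reaches."""
    return measured_speedup / achievable_speedup


def clean_slate_preferred(fraction: float, threshold: float = 1.0 / 3.0) -> bool:
    """True when the code is at or below roughly a third of what is achievable."""
    return fraction <= threshold


achievable = 4.0           # BBB's estimate, not a measurement
measured = (0.9, 1.25)     # range BBB quoted for the patch
fractions = [fraction_of_achievable(m, achievable) for m in measured]
verdicts = [clean_slate_preferred(f) for f in fractions]
```

With these figures both ends of the range come out at roughly 0.23 and 0.31 of the estimated ceiling, under one third. The conclusion depends entirely on the 4x estimate, and michaelni notes above that the first patch was always above 1x.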
[18:20:42 CEST] <Compn> i thought krakovia was some other town D:
[18:20:44 CEST] <Compn> ehe
[18:21:33 CEST] <kierank> 5:17 PM <"j-b> (I don't. For once that we are not the stupids)
[18:21:38 CEST] <kierank> :)
[18:21:56 CEST] <BBB> didnt french rurals vote for le pen?
[18:24:38 CEST] <j-b> not yet
[18:24:47 CEST] <j-b> 2017 will be the French-stupid one
[18:24:54 CEST] <j-b> 2016 is UK-US-stupidity
[18:25:01 CEST] <j-b> please leave us one year of laughing
[18:25:07 CEST] <BBB> hey we still have a few months of sanity left
[18:25:16 CEST] <BBB> dont judge us before we do something stupid
[18:25:16 CEST] <j-b> TRUMP! MAGA!
[18:25:30 CEST] <BBB> trump is a disaster only in november
[18:25:40 CEST] <BBB> if the earth goes down, please let us have some fun on it before that?
[18:25:54 CEST] <j-b> yeah, before, please continue killing yourselves with guns and the police
[18:26:03 CEST] <BBB> manhattan has sensible gun laws
[18:26:15 CEST] <j-b> anyway, next year, French will be the stupid with LePen
[18:26:19 CEST] <BBB> what they do in other parts of the country is their problem :-p
[18:30:12 CEST] <michaelni> is there someone who could benchmark VSX PPC against non-auto-vectorized C? but either way i am happy with whatever the people prefer to do with patch 1
[18:31:53 CEST] <BBB> michaelni: I would argue thats not our job
[18:32:00 CEST] <BBB> michaelni: thats the job of the person submitting the patch
[18:32:12 CEST] <jamrial> michaelni: if nobody can bench and since it's confirmed it broke some setups, then it should be reverted, IMO
[18:32:13 CEST] <BBB> michaelni: after all, he has knowledge of ppc64 simd instructions and toolchains
[18:32:36 CEST] <jamrial> also, the person that submitted the code proved to be an asshole
[18:32:44 CEST] <jamrial> what the fuck was all that anyway?
[18:33:09 CEST] <Compn> it is better to forgive and forget than to insult and ridicule but ok :P
[18:33:11 CEST] <BBB> yeah that was shitty
[18:33:17 CEST] <jamrial> insulting and threatening devs? WTF
[18:33:44 CEST] <kierank> jamrial: yeah, terrible
[18:33:56 CEST] <kierank> he should do some work on the bugtracker, that will remedy the problem
[18:34:53 CEST] <BBB> ???
[18:35:06 CEST] <kierank> maybe move to austria as well
[18:35:30 CEST] <BBB> ...
[18:35:34 CEST] <Compn> people like to hold grudges
[18:35:35 CEST] <jamrial> the fact he accused people of "interjecting" while "being nobody" makes me think he's unaware that the review process is not a one on one process
[18:36:10 CEST] <kierank> point is the double standard in ffmpeg is truly shocking
[18:36:11 CEST] <BBB> kierank: lets stay nice for a bit please?
[18:36:21 CEST] <BBB> I agree that we have issues
[18:36:54 CEST] <BBB> and I agree theres a few people we should probably ban for violating the CoC rules
[18:37:05 CEST] <Compn> retroactively
[18:37:11 CEST] <Compn> ?
[18:37:48 CEST] <BBB> maybe. we can vote on it, right?
[18:38:04 CEST] <BBB> laws are often implemented in response to actual events anyway
[18:38:17 CEST] <BBB> were not a legal system, were a small community, we can do whatever we want
[18:38:25 CEST] <BBB> what are they gonna do, sue us? good luck on that
[18:38:34 CEST] <iive> laws are never applied retroactively, and for a good reason.
[18:39:16 CEST] <durandal_1707> we are in anarchy
[18:39:54 CEST] <jamrial> i don't think anyone ever threatened a dev before, especially not for something as stupid as criticizing a patch, so nothing from this guy should be accepted from now on
[18:40:15 CEST] <BBB> jamrial: +1 to that
[18:40:30 CEST] <Compn> or we could try to teach him to be polite ?
[18:40:40 CEST] <BBB> :(
[18:40:42 CEST] <Compn> if someone wanted
[18:43:23 CEST] <jamrial> Compn: teach him to be polite for what?
[18:43:28 CEST] <jamrial> "please, use nicer words next time you threaten someone"?
[18:45:44 CEST] <Compn> maybe no one ever taught him that was rude to threaten someone else
[18:46:29 CEST] <Compn> just playing devils advocate here jamrial
[18:46:38 CEST] <Compn> i dont know.
[18:50:09 CEST] <BBB> I dont think we want such people here
[18:50:27 CEST] <BBB> "it's not nice to threaten someone" is like saying "it's not nice to point a gun at someone" or "it's not nice to hit someone"
[18:50:34 CEST] <BBB> you teach that to babies, maybe toddlers
[18:50:42 CEST] <BBB> after that, you are supposed to know that
[18:51:43 CEST] <jamrial> no, we don't want them. but we tend to have trouble getting rid of that kind of people
[18:54:04 CEST] <BBB> I think this one went off by himself
[18:54:08 CEST] <BBB> so at least that problem is solved
[19:17:19 CEST] <BtbN> I have no idea what that omx report is talking about. Does it watermark? Does it show something on the display?
[19:18:07 CEST] <nevcairiel> BtbN: i assume they start a transcode and for some reason it ends up on screen? :D
[19:18:37 CEST] <BtbN> That would be an interesting side effect.
[19:32:50 CEST] <nevcairiel> for sure
[20:58:10 CEST] <Yuken> How would I go about compiling ffmpeg with NVENC support under Linux?
[20:58:25 CEST] <nevcairiel> grab the header, put it somewhere ffmpeg finds it, enjoy
[21:00:34 CEST] <jkqxz> Also --enable-nvenc; it isn't autodetected.
[21:28:21 CEST] <friki2015> hi, the compile option "--enable-decklink" requires an extra library. where can i get it?
[21:30:44 CEST] <Compn> friki2015 : checking
[21:32:24 CEST] <Compn> Blackmagic DeckLink SDK
[21:32:36 CEST] <Compn> http://software.blackmagicdesign.com/SDK/Blackmagic_DeckLink_SDK_10.3.1.zip
[21:32:40 CEST] <Compn> possibly.
[21:32:55 CEST] <Compn> if you are building on windows theres this, https://ffmpeg.zeranoe.com/forum/viewtopic.php?t=1823
[21:33:02 CEST] <friki2015> thanks, i'll try
[21:33:16 CEST] <Compn> maybe this helps too , https://github.com/dche/ffmpeg-decklink
[21:33:21 CEST] <friki2015> i'm trying to do it in Debian 
[21:33:36 CEST] <friki2015> i'll check the link anyway :)
[21:36:36 CEST] <friki2015> the zip link seems to stop working
[21:41:09 CEST] <Compn> must be on their site somewhere :D
[21:46:08 CEST] <friki2015> i need the serial number to download the lib (i'm working remote with the server). Is it the only way to capture video using a blackmagic card?
[21:51:36 CEST] <Compn> try asking lu_zero in #videolan
[21:51:41 CEST] <Compn> i think he knows about decklink
[21:52:31 CEST] <friki2015> thanks
[22:43:05 CEST] <BBB> there, my first avx2 patch
[22:45:57 CEST] <Gramner> you got a rig with a avx2-capable cpu now? :D
[22:47:54 CEST] <BBB> I bought a shiny new laptop
[22:48:04 CEST] <BBB> nobody wants to buy me one so I just bought me one myself
[22:50:56 CEST] <durandal_1707> FFmpeg now has money
[23:00:50 CEST] <BBB> I think people objected to me wanting a shiny high end macbook pro
[23:02:24 CEST] <ubitux> vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
[23:02:27 CEST] <ubitux> vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
[23:02:28 CEST] <ubitux> heh
[23:02:39 CEST] <BBB> I know, something is weird there
[23:02:48 CEST] <BBB> I didnt really check, since I dont really care
[23:02:54 CEST] <ubitux> doesn't happen with the other ones though
[23:03:00 CEST] <ubitux> (unless i misread)
[23:03:03 CEST] <BBB> and only for the very small sub-idcts
[23:03:38 CEST] <Gramner> random cache line alignment shenanigans or stuff like that usually
[23:03:52 CEST] <BBB> 8_2 and 8_1
[23:04:12 CEST] <BBB> Gramner: but they use the same rodata and input data and output buffers
[23:04:28 CEST] <Gramner> code, not data
[23:04:31 CEST] <Gramner> then you change some code in a different file and now the result is the opposite
[23:04:39 CEST] <Gramner> because it linked at a different position
[23:04:52 CEST] <BBB> oh I See
[23:04:55 CEST] <BBB> could be, yes
[23:05:21 CEST] <ubitux> (ah with ssse3/avx in 8_2 indeed)
[23:06:06 CEST] <ubitux> nice improvement with avx2 btw
[23:08:26 CEST] <BBB> I was quite happy with it, yes
[23:08:27 CEST] <nevcairiel> indeed, numbers look good
[23:08:31 CEST] <BBB> 32x32 should see similar improvements
[23:08:35 CEST] <BBB> (I hope?)
[23:08:43 CEST] <BBB> maybe slightly less because you still have to store
[23:08:49 CEST] <BBB> but 1.5x would be pretty good still
[23:10:31 CEST] <BBB> and then loopfilter would _probably_ also benefit from it, but thats trickier because of the way the loopfilter is laid out in vp9
[23:10:52 CEST] <BBB> I can probably figure out how to do it but I need to study that code a bit again, so maybe Ill just not bother and write 10/12bpp asm instead
[23:11:24 CEST] <BBB> oh and directional intra pred, but I feel soooo lazy about doing that
[23:12:10 CEST] <nevcairiel> how big is intra pred in overall cycle use?
[23:13:56 CEST] <Gramner> BBB: why mova + pmovzx instead of doing just the latter from memory?
[23:14:11 CEST] <BBB> is that possible?
[23:14:15 CEST] <jamrial> yeah
[23:14:15 CEST] <BBB> I didnt know that was possible
[23:14:17 CEST] <Gramner> sure
[23:14:18 CEST] <BBB> :-p
[23:14:24 CEST] <BBB> nevcairiel: very tiny
[23:14:39 CEST] <BBB> nevcairiel: and most intra pred is dc, tm is secondary, and directional is a distant last
[23:14:47 CEST] <BBB> nevcairiel: as is also the case for hevc/h264/...
[23:14:55 CEST] <BBB> (although they use planar instead of tm)
[23:16:09 CEST] <BBB> Gramner: fixed locally
[23:16:41 CEST] <BBB> if you have any ideas on how to prevent that vpermq in there, Id love to hear it
[23:16:49 CEST] <jamrial> BBB: did you get into any issues running checkasm --bench? last time i tried it complained about "failed to issue emms" with a vp9 loop_filter test
[23:16:52 CEST] <Gramner> BBB: also a minor nit; no need for the v prefix in the pmovzx instructions
[23:17:06 CEST] <BBB> changed that also
[23:17:17 CEST] <BBB> jamrial: I commented everything out, so dunno :-p
[23:17:26 CEST] <BBB> (I only ran the 16x16 idct for vp9)
[23:17:26 CEST] <jamrial> ah
[23:18:14 CEST] <durandal_1707> BBB: nobody objected
[23:18:39 CEST] <nevcairiel> i dont think he ever officially asked =p
[23:19:09 CEST] <BBB> I feel the money should be used for more productive affairs rather than my love for shiny apple crap
[23:19:40 CEST] <BBB> like, lets use it for sponsoring more devs to attend vdd, or have a dinner there
[23:19:44 CEST] <BBB> or something like that
[23:19:56 CEST] <BBB> call it a ffmpeg and libav reunification dinner And invite the libav crew also
[23:20:10 CEST] <BBB> and then Ill pay for the laptop myself
[23:20:45 CEST] <BBB> so I guess Ill do avx2 32x32 idct next?
[23:21:13 CEST] <BBB> I think last time I did the 32x32 I vowed to never touch it again
[23:21:15 CEST] <BBB> I hate that thing
[23:21:19 CEST] <BBB> its soooooo big
[23:22:11 CEST] <jamrial> probably the same reason nobody wants to touch hevc idct
[23:22:18 CEST] <BBB> Ill touch it
[23:22:22 CEST] <BBB> but I want some company to pay me :-p
[23:22:32 CEST] <jamrial> heh
[23:22:44 CEST] <nevcairiel> arent all the idcts somewhat similar?
[23:22:56 CEST] <BBB> no
[23:22:57 CEST] <TD-Linux> HEVC uses an actual matrix multiply
[23:23:00 CEST] <BBB> the fdcts are similar
[23:23:02 CEST] <BBB> (can be)
[23:23:07 CEST] <BBB> but the idct is matrix-based, yes
[23:23:11 CEST] <BBB> (for hevc)
[23:23:17 CEST] <nevcairiel> i see
[23:23:19 CEST] <BBB> no idea why they did it that way, its ridiculously slow
[23:23:29 CEST] <BBB> I guess its b/c for hardware its all the same
[23:23:51 CEST] <nevcairiel> if its all the same for hardware, why not use one that is faster :D
[23:23:52 CEST] <TD-Linux> nah it's slower for hardware too
[23:24:14 CEST] <BBB> oops again then
[23:24:33 CEST] <BBB> recall how jason was so upset that they broke the idct ordering in vp8 so you need to transpose twice in your assembly?
[23:24:39 CEST] <BBB> guess what they did in hevc...
[23:26:24 CEST] <BBB> I guess theres quite a bit of strange stuff in hevc so this is just part of a pattern
[23:27:42 CEST] <Gramner> BBB: it's possible to avoid that vpermq by using 2x vpmovwb instead of packuswb + vpermq + mova + vextracti. unfortunately that requires avx512bw though ;)
[23:28:12 CEST] <BBB> vpmovwb?!?!?
[23:28:15 CEST] <BBB> that sounds amazing
[23:28:17 CEST] <jamrial> then he can't avoid it :p
[23:28:26 CEST] <BBB> so thats like a cross-lane packuswb?
[23:28:30 CEST] <bofh_> avx512 is basically KNL SIMD, so of course it's amazing
[23:28:39 CEST] <bofh_> sadly it's available on like nothing so far
[23:29:09 CEST] <jamrial> avx512 skylake xeons should be out this year, afaik
[23:29:10 CEST] <BBB> I guess its a cross-lane packuswb to memory
[23:29:10 CEST] <Gramner> it's the opposite of pmov(s|z)x
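[editor's note] The reason a vpermq tends to follow a 256-bit packuswb is that the AVX2 form packs each 128-bit lane independently, so the output comes out interleaved by lane. The emulation below is a sketch of that behaviour on plain lists. It is not BBB's patch, whose exact instruction sequence is not shown in this log, and the helper names are hypothetical.

```python
def sat_u8(word: int) -> int:
    """Saturate a signed 16-bit value to the unsigned 8-bit range."""
    return max(0, min(255, word))


def packuswb_128(a8, b8):
    """One 128-bit lane: 8 words of a, then 8 words of b, packed to 16 bytes."""
    return [sat_u8(w) for w in a8] + [sat_u8(w) for w in b8]


def vpackuswb_256(a16, b16):
    """AVX2 form: each 128-bit lane is packed on its own."""
    return packuswb_128(a16[:8], b16[:8]) + packuswb_128(a16[8:], b16[8:])


def vpermq(bytes32, order):
    """Reorder the four 64-bit quarters of a 256-bit value."""
    quarters = [bytes32[i * 8:(i + 1) * 8] for i in range(4)]
    return [byte for idx in order for byte in quarters[idx]]


a = list(range(0, 16))        # 16 words from the first source
b = list(range(100, 116))     # 16 words from the second source
packed = vpackuswb_256(a, b)            # a[0:8], b[0:8], a[8:16], b[8:16]
in_order = vpermq(packed, (0, 2, 1, 3))  # quarter order matching imm8 0xD8
```

A narrowing move such as vpmovwb converts a whole register of words to bytes in order, so the lane fix-up is not needed. That is the saving Gramner points at.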
[23:29:13 CEST] <BBB> sweet
[23:29:13 CEST] <nevcairiel> i'm still suspicious if it will be available on the next gen of consumer cpus like some people claim
[23:29:28 CEST] <BBB> anyway, yeah, Ill use that once it exists
[23:29:41 CEST] <Gramner> it will be in cannonlake unless they do some last-minute backtracking
[23:29:50 CEST] <nevcairiel> so next-next-gen then
[23:29:54 CEST] <jamrial> so no kabylake?
[23:29:55 CEST] <BBB> will someone buy me a cannonlake macbook pro?
[23:30:28 CEST] <Gramner> isn't kaby lake just a fancier pr term for "lets release skylake again"
[23:30:55 CEST] <bofh_> even if it's out on cannonlake that still means we need 2 years for it to become popular in actual consumer hardware
[23:31:10 CEST] <BBB> is it ok to interleave xmm/ymm register use in v* instructions?
[23:31:21 CEST] <BBB> (i.e. theres no slowdown as long as I Dont use non-v instructions?)
[23:31:47 CEST] <BBB> I thought it was but want to make sure
[23:31:48 CEST] <Gramner> as long as you use vex-encoded instructions (which x86inc will do automatically in avx+ functions) there is no penalty
[23:32:00 CEST] <BBB> ok cool
[23:32:12 CEST] <BBB> yeah x86inc.asm makes this stuff pretty sweet
[23:32:30 CEST] <BBB> the 16x16 is a bit of new macros because the old idct was from memory instead of from registers
[23:32:34 CEST] <BBB> so I had to rewrite it
[23:32:42 CEST] <BBB> but the 32x32 should be fairly trivial since its from-memory
[23:32:51 CEST] <BBB> (so it can use existing infrastructure)
[23:33:03 CEST] <bofh_> x86inc makes coding x86 asm not totally painful, it's amazing.
[23:34:06 CEST] <TD-Linux> BBB, the matrix does remove the need for the transpose (obviously) and has flat scaling, which I guess is useful if you don't use a QM
[23:35:00 CEST] <BBB> the flat scaling is nice from a precision pov, yes
[23:35:19 CEST] <BBB> but then they made the coefficients 6 or 7 bit or so
[23:35:26 CEST] <BBB> what on earth were they thinking
[23:38:59 CEST] <BBB> in my tests, the vp9 roundtrip error for 16x16/32x32 is an order of magnitude better (lower) than the hevc one, which I would attribute to the low coefficient precision
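[editor's note] The link between coefficient precision and roundtrip error can be illustrated with a toy model. The sketch below uses a floating-point 1-D orthonormal DCT-II, runs the forward transform with exact coefficients and the inverse with coefficients rounded to a given number of fractional bits, and reports the largest reconstruction error. It is an assumption-laden illustration only: it is neither the VP9 nor the HEVC integer transform, and it does not reproduce BBB's measurements.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as a list of n rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def quantize(matrix, bits):
    """Round every coefficient to `bits` fractional bits."""
    step = 1 << bits
    return [[round(c * step) / step for c in row] for row in matrix]


def forward(matrix, x):
    return [sum(c * v for c, v in zip(row, x)) for row in matrix]


def inverse(matrix, y):
    n = len(y)
    return [sum(matrix[k][j] * y[k] for k in range(n)) for j in range(n)]


def roundtrip_error(n, bits, x):
    """Max abs error after exact forward and coarse-coefficient inverse."""
    exact = dct_matrix(n)
    back = inverse(quantize(exact, bits), forward(exact, x))
    return max(abs(a - b) for a, b in zip(x, back))


samples = [(i * i * 7 + 3 * i) % 256 for i in range(16)]  # arbitrary test row
err_6bit = roundtrip_error(16, 6, samples)
err_14bit = roundtrip_error(16, 14, samples)
```

In this model the error scales with the coefficient rounding step, so fewer bits give a proportionally larger roundtrip error, which is the direction of the effect BBB attributes to HEVC's low-precision coefficients.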
[23:38:59 CEST] <jamrial> BBB: wouldn't it be better to merge the sbutterflydqqq into the x86util one, and move the new transpose macro there as well?
[23:39:24 CEST] <BBB> jamrial: yes, I just didnt want to recompile all asm files every time I fixed them
[23:39:32 CEST] <BBB> jamrial: Im happy to merge the final products in x86util.asm
[23:39:45 CEST] <jamrial> alright then
[23:39:46 CEST] <BBB> jamrial: you want me to merge the sbutterflydqqq into sbutterfly?
[23:39:58 CEST] <jamrial> if possible
[23:40:01 CEST] <BBB> like %ifidn %1, dqqq .. %else ..current code.. %endif
[23:40:35 CEST] <jamrial> yeah
[23:41:22 CEST] <BBB> michaelni: fixed 32bit also
[23:41:24 CEST] <BBB> (locally)
[23:41:36 CEST] <BBB> jamrial: ok, Ill do that in a bit
[23:52:20 CEST] <BBB> jamrial: done locally (move transpose/sbutterflydqqq into x86util.asm, integrate sbutterflydqqq into sbutterfly)
[23:53:03 CEST] <jamrial> BBB: thanks :)
[23:53:22 CEST] <BBB> if I dont get comments in the next 30 min or so, Ill re-send
[00:00:00 CEST] --- Sat Jul  9 2016

