Ffmpeg-devel-irc
June 2010
[00:00:09] <Honoome> mru: we cross-patched :P
[00:12:23] <mru> Honoome: why do you want to tablegenify aes anyway?
[00:12:27] <mru> those tables are tiny
[00:12:52] <Honoome> it makes it entirely non-cow
[00:13:05] <mru> only those pages
[00:13:18] <Honoome> true
[00:13:41] <mru> it's 8k we're talking about
[00:13:44] <Honoome> but one step at a time, you know..
[00:14:07] <mru> I'd start with the multi-megabyte tables
[00:14:26] <Honoome> they are almost all due to the STATIC_VLC that I said above :/
[00:14:59] <lu_zero> STATIC_VLC and friends are a tad convoluted imho
[00:15:40] <Honoome> lu_zero: you were supposed to go to bed an hour ago!
[00:15:46] <lu_zero> I know
[00:15:55] <j0sh_> lu_zero: thanks for the quick review btw
[00:16:15] <j0sh_> lu_zero: dont we need to strip the protocol identifier in fmtp
[00:16:32] <j0sh_> (i'm just cargo culting here, havent actually read the rfcs)
[00:16:44] <lu_zero> I see
[00:17:16] <j0sh_> all the other depacketizers strip out the protocol id, so it must do something
[00:17:25] <lu_zero> well
[00:17:26] <j0sh_> otherwise it would eat up the whole fmtp line
[00:18:10] <lu_zero> once you know which tag it is you have to parse its value
[00:18:34] <lu_zero> just felt a bit excessive removing it by eating spaces/nonspaces one by one
[00:18:45] <Honoome> mru: if you want I can send tablegenification of cook and dvbsubdec :P
[00:19:59] <j0sh_> lu_zero: is there a better approach?
[00:20:23] <j0sh_> lu_zero: other depacketizers do the same thing basically, but don't look for tabs and newlines either
[00:23:11] <j0sh_> lu_zero: i see what it does now, it just strips out the 'fmtp:xx' part
[00:24:39] <j0sh_> er, no. the protocol identifier is a numeric value, right?
[00:26:00] <lu_zero> wait
[00:26:18] <Honoome> mru: hmm I think I'll propose a couple of improvements over the tableprint.h in the next few days.. after updating a couple, I think I see an usable pattern
[00:26:28] <lu_zero> http://tools.ietf.org/html/rfc2327
[00:26:34] <lu_zero> that might be useful
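A minimal sketch of the prefix stripping j0sh_ describes, assuming the attribute value has the RFC 2327 shape "fmtp:<format> <format specific parameters>", where the format token is the numeric payload type. The helper name skip_fmtp_prefix is hypothetical and this is not the FFmpeg depacketizer code.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not FFmpeg code: given an SDP attribute value such as
 * "fmtp:96 profile-level-id=1; config=...", return a pointer to the first
 * format-specific parameter, or NULL if the line is not an fmtp attribute. */
static const char *skip_fmtp_prefix(const char *line)
{
    static const char prefix[] = "fmtp:";

    assert(line != NULL);

    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return NULL;
    line += sizeof(prefix) - 1;

    /* Skip the format token (the payload type, e.g. "96"). */
    while (*line && !isspace((unsigned char)*line))
        line++;
    /* Skip the whitespace (spaces, tabs, newlines) before the parameters. */
    while (*line && isspace((unsigned char)*line))
        line++;

    return line;
}
```

The returned pointer aliases the input string, so nothing is copied. A line with no parameters yields a pointer to the terminating NUL rather than NULL.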
[00:26:51] <mru> Honoome: that's reimar's territory
[00:27:16] <Honoome> I said propose ;)
[00:30:42] <j0sh_> lu_zero: i need to send you guys a picture of all the rfcs i printed out
[00:32:14] * Honoome hugs the sony reader
[00:32:21] <drv> i speak for the trees! ;)
[00:32:43] <lu_zero> ^^;
[00:32:49] <lu_zero> I'm a bit afraid
[00:33:08] <Honoome> j0sh_: you do know that there is a docbookified version of rfc2326 around? :P
[00:33:36] <j0sh_> Honoome: no, what's docbook?
[00:33:42] * j0sh_ needs a ebook reader
[00:33:42] <Honoome> argh :P
[00:34:16] <Honoome> go for one that supports adobe de if you have an osx or win system around.. adobe has been so nice as to provide de with a drm that was cracked easily :P
[00:34:33] <j0sh_> the nook looks sweet
[00:34:37] <lu_zero> docbook is a markup I like, Honoome is pretty good in writing docs with it
[00:34:52] * Honoome speaks XML!
[00:34:58] <j0sh_> i'm not gonna get a kindle, because a keyboard on a ebook reader is just stupid
[00:37:03] <lu_zero> Honoome: that would be understood quite badly here
[00:37:14] <lu_zero> look out for BBB and his tar pot
[00:37:25] * Honoome curses Reimar a bit for the choice of function names and decides he'll work on that tomorrow
[00:37:41] <Honoome> lu_zero: as long as it's not schilly's star, it's fine :P
[02:58:28] <Dark_Shikari> mru: ping
[03:18:32] <astrange> j-b: i can't test it (well, if i had a linux cd i could), i don't want to break it though
[07:50:05] <mru> Dark_Shikari: pong
[07:51:24] <Dark_Shikari> mru: just curious what your opinion is on this idea + api for x264
[07:51:33] <Dark_Shikari> since you have more experience designing/reviewing apis and such than I do
[07:51:38] <Dark_Shikari> http://pastebin.org/359789
[07:51:55] <Dark_Shikari> purpose: allow the calling application to begin sending packets before the frame is done being encoded.
[07:52:01] <mru> callbacks are always icky
[07:52:17] <Dark_Shikari> this callback is even better: it's actually basically a closure
[07:52:24] <Dark_Shikari> i.e. we pass back a nal, and a nal_encode function
[07:52:26] <Dark_Shikari> and say "call it please"
[07:52:28] <mru> let me read the comment
[07:53:44] <mru> why does the main output break with this?
[07:54:25] <Dark_Shikari> ideally we don't want to run the nal_encode twice for no reason
[07:54:32] <mru> ok
[07:54:36] <Dark_Shikari> if we've already given output via the callback, running it again seems stupid
[07:54:40] <Dark_Shikari> I mean, it doesn't cost much speed, it's just pointless.
[07:55:20] <Dark_Shikari> hmm, I should probably modify it to give x264_t *h to the callback too, for convenience.
[07:55:33] <Dark_Shikari> (none of the backend code is written, just the api)
[07:55:38] <mru> yes, do that
[07:56:04] <mru> otherwise running multiple encoder instances becomes very hard
[07:57:03] <Dark_Shikari> The other issue I realized we had is that it's impossible to figure out what the correct order of the packets is
[07:57:15] <Dark_Shikari> and lavc (and most decoders) won't handle out of order slices
[07:57:27] <Dark_Shikari> so we'll have to add something to the x264_nal_t to tell the calling app the start and end MB of each slice
[07:57:49] <Dark_Shikari> (because with sliced threads, nalu_process will be called out of order)
[07:58:34] <Dark_Shikari> (speaking of which, we're interested in adding support for that to lavc)
[07:58:40] <Dark_Shikari> i.e. be able to throw single, arbitrary slices at it
[07:58:46] <Dark_Shikari> in any order
[07:58:46] <mru> with this callback, does the x264_nal_t passed to it remain valid after it returns?
[07:59:01] <Dark_Shikari> It remains valid until x264_encoder_encode is called again.
[07:59:14] <Dark_Shikari> note an x264_nal_t is just a very small struct containing a few data values and a pointer.
[07:59:16] <mru> ah, yes it's right there
[07:59:32] <Dark_Shikari> So it's trivial to copy.
[07:59:40] <Dark_Shikari> the pointer, before nal_encode, points to the raw, unescaped data
[07:59:43] <mru> so the callback could just kick another thread to do the actual processing and return immediately?
[07:59:45] <Dark_Shikari> the pointer, after nal_encode points to the user buffer.
[07:59:46] <Dark_Shikari> Yes.
[07:59:53] <Dark_Shikari> That might be useful for large slices.
[08:00:03] <Dark_Shikari> For a small slice, like 1500 bytes, it would cost more time in thread spawning than doing the actual work.
[08:00:15] <Dark_Shikari> since nal_encode is basically asm
[08:00:27] <mru> you wouldn't be so stupid as to _spawn_ a thread there
[08:00:36] <Dark_Shikari> ok, true, but even a context switch wouldn't be worth it
[08:00:44] <Dark_Shikari> it probably takes 400 clocks to escape a 1500 byte slice
[08:01:02] <Dark_Shikari> Basically, the reason we give that guarantee is that it costs x264 nothing to do so (that's how it just already is)
[08:01:06] <Dark_Shikari> so we might as well.
[08:02:19] <mru> I was thinking one might have an output thread running all the time taking x264_nal_t from a ring buffer or something
[08:02:26] <Dark_Shikari> Yeah, you could do that or something.
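A rough sketch of the hand-off mru suggests: the per-NAL callback only copies the small descriptor into a ring buffer and returns, and a long-running output thread would drain it. Everything here is an assumption for illustration. demo_nal_t, nal_ring_t and on_nal_ready are hypothetical names, not the x264 API (which, per the discussion, had no backend code yet). A mutex guards the ring because the callback may run on several cores at once.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the small NAL descriptor discussed above;
 * this is not the real x264_nal_t. */
typedef struct {
    const uint8_t *payload;  /* borrowed: valid only until the next encode call */
    size_t         size;
    int            first_mb; /* assumed field: first macroblock of the slice */
} demo_nal_t;

#define RING_SLOTS 64

typedef struct {
    demo_nal_t      slot[RING_SLOTS];
    size_t          head; /* total descriptors pushed */
    size_t          tail; /* total descriptors popped */
    pthread_mutex_t lock;
} nal_ring_t;

#define NAL_RING_INIT { .head = 0, .tail = 0, .lock = PTHREAD_MUTEX_INITIALIZER }

/* Copy one descriptor into the ring. Returns 0, or -1 if the ring is full. */
static int nal_ring_push(nal_ring_t *r, const demo_nal_t *nal)
{
    int ret = -1;

    assert(r != NULL && nal != NULL);
    pthread_mutex_lock(&r->lock);
    if (r->head - r->tail < RING_SLOTS) {
        r->slot[r->head % RING_SLOTS] = *nal;
        r->head++;
        ret = 0;
    }
    pthread_mutex_unlock(&r->lock);
    return ret;
}

/* Take the oldest descriptor. Returns 0, or -1 if the ring is empty. */
static int nal_ring_pop(nal_ring_t *r, demo_nal_t *out)
{
    int ret = -1;

    assert(r != NULL && out != NULL);
    pthread_mutex_lock(&r->lock);
    if (r->tail != r->head) {
        *out = r->slot[r->tail % RING_SLOTS];
        r->tail++;
        ret = 0;
    }
    pthread_mutex_unlock(&r->lock);
    return ret;
}

/* Hypothetical callback shape: do no real work here, just queue and return.
 * A full ring silently drops the descriptor in this sketch. */
static void on_nal_ready(const demo_nal_t *nal, void *opaque)
{
    (void)nal_ring_push((nal_ring_t *)opaque, nal);
}
```

An output thread would loop on nal_ring_pop and send each payload. Since only the descriptor is copied, the payload pointer must be consumed before the encoder is called again, matching the lifetime guarantee described above.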
[08:02:42] <Dark_Shikari> Also, the reason this came up is we got demoed a broadcom system that did per-slice output like this (in hardware)
[08:02:48] <Dark_Shikari> and, I mean, it's not at all a new idea
[08:02:51] <Dark_Shikari> but we figured we should do it too
[08:03:05] <mru> if it's easy to do and helps someone, why not
[08:03:11] <Dark_Shikari> broadcom was claiming 1080p @ 3ms encoding time with 300mw
[08:03:17] <Dark_Shikari> h264
[08:03:28] <Dark_Shikari> 'course, that was intra only. They admitted latency would go up with inter compression.
[08:03:32] <mru> and psnr of 3 too
[08:03:37] <Dark_Shikari> lol
[08:04:06] <Dark_Shikari> Apparently my boss sorta scared him when he said "so how does the quality compare to x264"
[08:04:33] <Dark_Shikari> "oh shit our potential customers are comparing us to something that isn't shit. whatever do we do"
[08:04:43] <j0sh_> Dark_Shikari: i have never used x264's api, so this may sound dumb
[08:05:01] <j0sh_> but why not have x264_nal_encode return the nal units, via an init option or whatever
[08:05:09] <Dark_Shikari> ?
[08:05:25] <Dark_Shikari> currently, when you call x264_encoder_encode, it gives you back completed nal units
[08:05:28] <Dark_Shikari> book it, done
[08:05:28] <j0sh_> since it sounds like its useless if you use the nalu_process callback
[08:05:31] <mru> ibc 2004 was hilarious
[08:05:31] <Dark_Shikari> at the end of each frame
[08:05:40] <mru> all the vendors were showing h264 encoders
[08:05:43] <Dark_Shikari> the new idea: make it so that you can get back the nalus the instant they are done
[08:05:47] <mru> all the encoders looked much more shit than mpeg2
[08:05:50] <Dark_Shikari> even if the whole frame isn't done yet
[08:05:55] <mru> because they _were_ mpeg2 encoders
[08:05:56] <Dark_Shikari> (If you want, not required)
[08:06:06] <Dark_Shikari> mru: 300mw is still pretty impressive.
[08:06:10] <j0sh_> right, why can't you just use the old interface for that
[08:06:13] <Dark_Shikari> Though I have a strong suspicion there is a lot of hardcoding going on
[08:06:23] <Dark_Shikari> i.e. it isn't flexible
[08:06:24] <mru> the only good demo there was from NTT
[08:06:36] <Dark_Shikari> j0sh_: how will the calling app know when the NAL units are returned?
[08:06:36] <mru> they'd spent like 3 weeks encoding a 5-minute clip
[08:06:44] <Dark_Shikari> lol
[08:06:55] <j0sh_> Dark_Shikari: same way it knows when a frame is complete?
[08:06:56] <mru> ok, I made that up
[08:07:03] <mru> but they'd done something insane
[08:07:20] <Dark_Shikari> j0sh_: when it returns?
[08:07:28] <Dark_Shikari> you mean have x264_encoder_encode encode one slice instead of one frame?
[08:07:32] <j0sh_> i suppose (never used x264 api)
[08:07:38] <mru> what bitrate does x264 need for good quality 1080p these days?
[08:07:44] <Dark_Shikari> it works the same way as ffmpeg
[08:08:45] <Dark_Shikari> i.e. you give it an input frame, it gives you an output frame
[08:08:49] <j0sh_> right
[08:09:10] <j0sh_> it just seems a bit hacky, having two interfaces to get the data, even though they're for different purposes
[08:09:26] <j0sh_> s/get the data/get the same data
[08:10:28] <Dark_Shikari> j0sh_: here's why that won't work
[08:10:34] <Dark_Shikari> x264 encodes multiple slices at once
[08:10:42] <Dark_Shikari> therefore, any return method MUST be asynchronous
[08:10:48] <Dark_Shikari> because slices can finish at any point in time, possibly simultaneously
[08:10:53] <Dark_Shikari> and can be anywhere in the frame
[08:11:08] <Dark_Shikari> and it makes sense to have two interfaces -- a high level one and a low level one
[08:11:26] <Dark_Shikari> the high level one for most apps, the low level one for apps that need a special flexibility or feature
[08:11:27] <j0sh_> is x264_nal_encoder_encode a callback also?
[08:11:35] <Dark_Shikari> x264_nal_encode you mean?
[08:11:38] <Dark_Shikari> it's just an internal function.
[08:11:49] <Dark_Shikari> it converts an internal-to-x264 nal_t into a ready-for-output nal_t
[08:11:54] <j0sh_> err, encoder_encode
[08:12:01] <Dark_Shikari> x264_encoder_encode is just a synchronous call
[08:12:05] <Dark_Shikari> takes input frame, gives you output frame
[08:12:16] <Dark_Shikari> mru: depends on the clip, your definition of good quality, and the constraints =p
[08:13:49] <mru> nature scene, virtually no blockiness, no constraints
[08:14:14] <Dark_Shikari> virtually no blockiness -> let's just blur everything
[08:14:21] <mru> ok, no blur either
[08:14:23] <Dark_Shikari> but seriously, about 3-7mbps.
[08:14:27] <mru> ok
[08:14:39] <Dark_Shikari> parkjoy, a very hard clip (1080p50) takes ~10-15mbps
[08:14:45] <Dark_Shikari> or 5-7.5mbps at 25p
[08:14:46] <j0sh_> Dark_Shikari: got it. the distinction is between being synchronous and async
[08:14:50] <Dark_Shikari> j0sh_: yes
[08:14:56] <Dark_Shikari> there are other solutions for the problem
[08:15:07] <Dark_Shikari> Another option, for example, would be to have a condition variable
[08:15:14] <Dark_Shikari> and to broadcast whenever nal units are available to pick up
[08:15:19] <mru> I don't remember what bitrate that NTT clip was, but it was in that ballpark
[08:15:22] <Dark_Shikari> Or various other threading hacks
[08:15:29] <Dark_Shikari> All of them imo are uglier than a callback
[08:15:34] <Dark_Shikari> and less crossplatform
[08:15:37] <j0sh_> Dark_Shikari: that requires the thread to poll, doesnt it? hacky
[08:15:40] <Dark_Shikari> Yes.
[08:15:41] <Dark_Shikari> Hacky.
[08:15:57] <Dark_Shikari> So basically the idea is to have a callback and let the app handle its own synchronization problems
[08:16:03] <mru> callback makes sense here
[08:16:06] <Dark_Shikari> i.e. me being able to call the callback 4 times from different cores at the same time.
[08:16:25] <j0sh_> i agree. i think it was the wording that threw me
[08:16:55] <mru> this is missing one thing
[08:16:55] <Dark_Shikari> Of course, now we need to fix lavc to allow us to send it out of order slices.
[08:17:07] <mru> there's no way for the app to determine which order those slices go
[08:17:14] <mru> without looking inside the payload
[08:17:32] <Dark_Shikari> That's what I said earlier
[08:17:39] <Dark_Shikari> we can add variables to x264_nal_t
[08:17:52] <Dark_Shikari> to mark the start and end of each slice
[08:18:04] <mru> or a slice index
[08:18:07] <Dark_Shikari> Impossible
[08:18:10] <Dark_Shikari> we don't know how many slices there will be yet
[08:18:14] <mru> oh
[08:18:28] <Dark_Shikari> e.g. in the case of slice-max-size.
[08:18:33] <mru> right
[08:18:40] <Dark_Shikari> that threw my boss for a loop
[08:18:45] <Dark_Shikari> once he realized that was the case
[08:18:51] <Dark_Shikari> he was like "oh shit, this breaks half our assumptions"
[08:19:05] <Dark_Shikari> (the assumptions being "we know how many slices there are when we start sending packets")
[08:19:14] <Dark_Shikari> and their order
[08:19:17] <Dark_Shikari> and on receiving
[08:19:32] <Dark_Shikari> (currently our header for each packet contains index and count)
[08:27:08] <j0sh_> 1 slice == 1 nalu? or 1 slice == many nalus?
[08:28:07] <Dark_Shikari> former
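A small sketch of why tagging each slice with its start and end macroblock is enough to restore order, even when the slice count is not known in advance (as with slice-max-size). The struct and function names are hypothetical. The idea is only that a receiver can sort by first macroblock and check that the spans tile the frame.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-slice tag: first and last macroblock index, inclusive. */
typedef struct {
    int first_mb;
    int last_mb;
} slice_span_t;

static int cmp_first_mb(const void *a, const void *b)
{
    const slice_span_t *x = a;
    const slice_span_t *y = b;

    return (x->first_mb > y->first_mb) - (x->first_mb < y->first_mb);
}

/* Sort slices into bitstream order. Returns 1 if they cover macroblocks
 * 0 .. mb_count-1 exactly once with no gaps or overlaps, otherwise 0
 * (for example while some slices have not arrived yet). */
static int order_slices(slice_span_t *s, size_t n, int mb_count)
{
    int next = 0;

    assert(s != NULL);
    qsort(s, n, sizeof(*s), cmp_first_mb);

    for (size_t i = 0; i < n; i++) {
        if (s[i].first_mb != next || s[i].last_mb < s[i].first_mb)
            return 0;
        next = s[i].last_mb + 1;
    }
    return next == mb_count;
}
```

Unlike an index/count header, this needs no total slice count up front: the frame is complete once the sorted spans reach the last macroblock.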
[09:36:44] <KotH> nalu? is that a new currency? ;)
[09:37:19] <lu_zero> KotH yup
[09:37:30] <lu_zero> btw you are in Geneve this week?
[09:37:36] <KotH> nope
[09:37:43] <KotH> but i can try to be there :)
[09:37:45] <lu_zero> =|
[09:38:09] <KotH> iirc rathann is still there too
[09:38:14] <KotH> when are you in geneva?
[09:38:34] <lu_zero> I'll be around there from tomorrow to tuesday
[09:38:51] <KotH> when tomorrow?
[09:38:58] <lu_zero> yet to be decided
[09:39:05] <lu_zero> I have to gather the other people
[09:39:21] <mru> say hi to my sister if you see her :-)
[09:39:35] <KotH> i can manage to be in geneva in the late afternoon/early evening tomorrow or on monday evening
[09:39:51] <KotH> mru: i dont have her phone number, yet ;)
[09:40:33] <spaam> KotH: do you think she will be cute? come.. female version of mru ;)
[09:40:53] <KotH> spaam: lol.. you dont even know mru
[09:40:59] <mru> KotH has met her
[09:41:01] <KotH> spaam: beside, i know her already :)
[09:41:11] <j0sh_> lu_zero: can i substitute my earlier series of patches for an emailed status update tonight? :)
[09:41:17] <spaam> KotH: but i have seen pictures of mru ;D
[09:41:34] <KotH> spaam: and we've all seen pictures of diegomax ;)
[09:41:35] <spaam> KotH: if you already know her.. then it cant be that hard to find her number :)
[09:41:44] <lu_zero> KotH: you are in sweden?
[09:41:48] <KotH> lol
[09:41:52] <spaam> KotH: haha yeah.. but i didnt do those pictures ;)
[09:41:57] <KotH> lu_zero: nope, in zürich
[09:42:19] <KotH> lu_zero: but i've met many people, strange and curious over all those years ;)
[09:42:46] <lu_zero> j0sh_: you dislike reports a lot ^^?
[09:43:09] <j0sh_> lu_zero: only when i feel like i have nothing to say
[09:43:17] <mru> KotH: are you calling my sister strange?
[09:43:27] <mru> 'cause if you are, you're not far off the mark
[09:43:28] <spaam> mru: he did that
[09:43:35] <lu_zero> KotH: my mental image of mru sister would be some kind of female version of him, riding a llama in some kind of traditional swedish garments
[09:43:45] <spaam> mru: you know what you have to do with KotH !
[09:43:51] <KotH> lu_zero: rotfl
[09:43:53] <mru> lu_zero: the other sister
[09:44:04] <spaam> lu_zero: hahaha
[09:44:08] <lu_zero> mru: she's your sister, the blood is the same
[09:44:12] <spaam> mru: do you have two sisters? :O
[09:44:16] <mru> yes, two
[09:44:18] <spaam> pics?
[09:44:19] <mru> one has a llama
[09:44:29] <mru> the other is trying to blow up the universe at lhc
[09:44:34] <lu_zero> ah
[09:44:51] <lu_zero> the mental image gets even more interesting
[09:44:55] <j0sh_> am i too far off the mark when i say all of our sisters probably think we're weirdo computer geeks?
[09:45:09] <j0sh_> mine does, at least
[09:45:10] <mru> I know mine do
[09:45:19] <KotH> lu_zero: if i can manage to get hold of her, i'll introduce you
[09:45:26] <lu_zero> j0sh_: I have a weird biology/medicine/whatever geek as sister...
[09:45:32] <KotH> mru: do you have me a number? :)
[09:45:39] <lu_zero> ok
[09:45:48] <lu_zero> be prepared to use japanese btw
[09:46:02] <KotH> j0sh_: sorry, no sister available on this end of the tubes
[09:46:11] <KotH> lu_zero: no prob
[09:46:28] * KotH makes a mental note to take his dictionary with him
[09:46:35] * lu_zero will pay for the fact he didn't learn enough this weekend
[09:46:42] <KotH> lol
[09:46:47] <KotH> lu_zero: btw: why are you in .ge?
[09:46:59] <lu_zero> igf meeting
[09:47:11] <KotH> igf?
[09:47:21] <KotH> imaginary girl friend?
[09:47:26] <lu_zero> UN internet governance forum
[09:47:33] <KotH> ah..
[09:47:40] <KotH> sounds interesting
[09:47:50] <KotH> lu_zero: why didnt you tell me before, i would have taken a few days off
[09:48:06] <lu_zero> KotH: I do ALWAYS forget you are there =_=
[09:48:35] <lu_zero> usually I put you somewhere between tokyo and berlin
[09:48:40] <KotH> rotfl
[09:48:49] <lu_zero> my fault
[09:49:04] * lu_zero needs latitude or such service to remember positions
[09:49:09] <KotH> lu_zero: ok, then make a note somewhere that you are talking to a japanophile turkish living _in_switzerland_ :)
[09:49:33] <lu_zero> theheh
[09:49:34] <lu_zero> btw
[09:49:54] <lu_zero> I was serious when I was thinking about an ffcon in Tokyo or Sapporo
[09:50:30] * lu_zero should get the foundation website done and start asking for sponsorships
[09:50:35] <KotH> lu_zero: 47°25'N 8°33'E
[09:50:43] <mru> lu_zero: ffcon in japan is not practical
[09:50:58] <mru> too many devs would be unable to go
[09:51:05] <mru> even if we managed to sponsor the trip
[09:51:25] <Dark_Shikari> he just wants an excuse for a junket to japan
[09:51:49] <mru> I'd go, no question
[09:51:54] <pross-au> .jp is not exactly the centre of the FFmpeg universe
[09:51:56] <mru> but I know others are less flexible
[09:52:06] <lu_zero> pross-au: well you could make it
[09:52:07] <KotH> mru: having enough sponsors, it would be possible
[09:52:12] <mru> no
[09:52:14] <KotH> mru: visa is generally not so much a problem
[09:52:23] <mru> none needed for most of us
[09:52:23] <Dark_Shikari> imo there's better things to use the money on
[09:52:29] <mru> but that's not the issue
[09:52:32] <pross-au> Japan is expensive
[09:52:35] <lu_zero> Dark_Shikari: opinions?
[09:52:41] <mru> the issue is the 12-hour travel to get there
[09:52:46] <KotH> pross-au: but japanese is one of the three fflanguages
[09:52:54] <pross-au> three?
[09:52:58] <pross-au> Oh right.
[09:53:02] <KotH> pross-au: c, swedish and japanese
[09:53:07] <Dark_Shikari> mru: 12 hours to get to europe too.
[09:53:09] <KotH> pross-au: and japan isnt that expensive
[09:53:15] <mru> Dark_Shikari: not for the euro devs
[09:53:15] <KotH> pross-au: zürich is more expensive than tokyo
[09:53:19] <mru> and those are many
[09:53:20] <Dark_Shikari> mru: well duh
[09:53:25] <Dark_Shikari> but it's not 12 hours to get to japan for the japan devs
[09:53:27] * Dark_Shikari ducks
[09:53:33] <lu_zero> given most of us are europeans I might really try to get something here (Torino)
[09:53:55] <lu_zero> Dark_Shikari: that gives me yet another idea for the website
[09:54:00] <Dark_Shikari> if we ever did have an ffcon in japan, I could try to get all the japanese x264 people to come
[09:54:00] <KotH> lu_zero: well, i have a potential location in northern germany that isnt too expensive
[09:54:15] <Dark_Shikari> but some of them are hikkikomori so it might be hard.
[09:54:15] <KotH> lu_zero: if i had the time, i would have organized something for september
[09:54:51] <lu_zero> Dark_Shikari: ouch
[09:54:52] <KotH> Dark_Shikari: there are ways to get hikkikomori out of their cave
[09:55:01] * KotH knows such people
[09:55:04] <lu_zero> KotH: name 3
[09:55:19] <KotH> lu_zero: 1) visit them at their place
[09:55:23] <KotH> lu_zero: 2) talk to them
[09:55:32] <spaam> *drusm*
[09:55:37] <KotH> lu_zero: 3) make them sense that you are not less strange than they are
[09:55:39] <KotH> lu_zero: 4)...
[09:55:43] <KotH> lu_zero: 5) profit
[09:55:48] <lu_zero> KotH: 3 is easy
[09:55:48] <Dark_Shikari> yeah, it's not impossible
[09:55:53] <Dark_Shikari> especially when they're already on irc a lot
[09:56:10] <KotH> lu_zero: besides, they are japanese. ie easy to drag around
[09:56:19] <KotH> lu_zero: then give them some alc and they open up
[09:56:22] <Dark_Shikari> lol
[09:56:37] <KotH> been there, done that, succeeded
[09:56:38] <lu_zero> KotH: you mean pick them up and move them like they are small furniture?
[09:56:46] <KotH> lu_zero: exactly!
[09:57:01] <KotH> lu_zero: though, you only have to do it the first minute or two
[09:57:14] <lu_zero> KotH: well the Japanese friends I know are among the few with an extreme alcohol resistance...
[09:57:15] <KotH> lu_zero: after they've left their cave, they will follow you like small dogs
[09:57:27] * twnqx likes northern germany
[09:57:35] <KotH> lu_zero: japanese can drink a lot....
[09:57:43] <KotH> lu_zero: more than you'd expect from the stories
[09:57:53] <lu_zero> KotH: japanese beer isn't drinking...
[09:57:57] <KotH> lu_zero: but, they are not used to european stuff, thus get pretty fast drunk here
[09:58:11] <lu_zero> I mean friends living here
[09:58:12] <KotH> lu_zero: then, is it food?
[09:58:17] <KotH> ah.. ok
[09:58:29] <lu_zero> drinking stuff that is strong for the locals...
[09:59:20] <lu_zero> j0sh_: good report =)
[10:01:57] <lu_zero> still first we should get a website
[10:04:20] <CIA-99> ffmpeg: mru * r23792 /trunk/common.mak: Fix brief make messages when CC etc are specified on command line
[10:05:21] <lu_zero> KotH: btw do you have my cellphone number?
[10:06:51] <KotH> if you have mine, then probably yes :)
[10:07:10] <mru> assurance of mutual destruction?
[10:07:20] <KotH> lu_zero: nope, i dont have it
[10:15:42] <lu_zero> mru: ?
[10:17:40] <lu_zero> btw swiss operators have pre-paid sim with good data plans?
[10:17:57] <mru> yes, good for them
[10:18:17] <lu_zero> mru: =P
[10:26:57] <Honoome> lu_zero: if you foresee asking me again for geneve, let me know about that as well :P
[10:30:31] <KotH> lu_zero: no idea
[10:30:36] <KotH> lu_zero: i never used prepaid
[10:34:14] * KotH goes off
[10:34:15] <KotH> bbl
[10:44:02] <janneg> lu_zero: for limited usage http://www2.coop.ch/coopmobile/ang_internet_datenoption.cfm?language=it is not too bad
[10:44:33] <janneg> 100M free, then 0.2 franc per MB
[10:45:39] * janneg has a sister in geneve too
[11:25:33] <lu_zero> uhmm
[11:27:04] <lu_zero> sounds interesting
[11:27:50] <lu_zero> 15chf is quite good
[11:41:19] <j-b> astrange: I have to say that linux isn't my priority care, but DxVA2 is :D
[11:42:21] * mru sends j-b off for priority adjustment
[12:15:29] * /join #ffmpeg-devel ...
[12:15:31] *** TOPIC: Welcome to the FFmpeg development channel. | Discussions about the development of FFmpeg itself are ontopic here. | Questions about using FFmpeg or developing with the libav* libraries should be asked in #ffmpeg. | FFmpeg 0.6 has been released! | This channel is now publicly logged.
[12:15:31] *** TOPICINFO: peloverde!~alex(a)cpe-173-88-148-20.neo.res.rr.com, 1276886498
[00:28:13] <peloverde> Dark_Shikari, your new patch has line ending issues
[00:28:13] <Dark_Shikari> oh, I'll fix that.
[00:28:13] <Dark_Shikari> in h264_intrapred.asm only right?
[00:28:25] <peloverde> yes only in that file
[00:28:34] <peloverde> Also "Added: svn:executable"
[01:16:46] * Terminating due to: TERM
[01:16:58] * /join #ffmpeg-devel ...
[01:19:53] <CIA-99> ffmpeg: bcoudurier * r23764 /trunk/libavformat/mov.c: Improve mov atom parsing debug message, print parent atom and size in decimal
[01:23:00] <mru> anyway, the silly float_to_int_interleave converts each channel linearly into a temp buffer, then scatters it into the output buffer
[01:25:44] <bcoudurier> kierank, nice progress on dolby-e
[01:26:14] <mru> wouldn't it be more efficient to scatter-store the samples directly?
[01:26:15] <pengvado> yes, for the odd case that I didn't bother to optimize
[01:26:52] <mru> pengvado: so you make a fucking vla with unbounded size instead?
[01:27:07] <pengvado> yes
[01:27:10] <mru> WHY????
[01:27:20] <mru> do you have a deathwish?
[01:27:47] <Dark_Shikari> because ffmpeg doesn't have h->scratch_buffer ?
[01:27:49] <kierank> bcoudurier: there are still many mysteries to figure out.
[01:28:06] <kierank> like syntax elements that seem to do nothing
[01:28:22] <bcoudurier> can you hear sound yet ?
[01:28:41] <kierank> the mdct is not implemented yet, so no
[01:28:47] <bcoudurier> ok
[01:28:54] <bcoudurier> what's your sample again ? mxf or ts ?
[01:29:01] <mru> Dark_Shikari: and why not store directly to the final destination?
[01:29:15] <kierank> and the binary decoder i haven't been able to get sound out of (i have no idea why but i'm sure it's a 10l somewhere)
[01:29:16] <pengvado> because even though I didn't spend any effort optimizing that case, I thought that "sub esp, foo"was faster than malloc.
[01:29:31] <kierank> bcoudurier: a sample i made with the encoder
[01:29:32] <mru> you don't need a temp buffer AT ALL
[01:30:05] <Dark_Shikari> "didn't spend any effort"
[01:30:10] <pengvado> do I then special-case the last sample, or accept writing beyond the end?
[01:30:20] <mru> Dark_Shikari: that's negative effort
[01:30:30] <mru> it's actively stupid
[01:30:41] <mru> pengvado: ???
[01:31:13] <Dark_Shikari> "laziness is a cardinal virtue"
[01:31:15] <pengvado> there is no 16bit store from mmx. (well there's pextrw, but that's slow)
[01:31:21] <mru> gaaaaaaaaaaaaaaaaah
[01:31:31] <Dark_Shikari> Oh yeah, and what pengvado said.
[01:31:35] <mru> see why x86 is annoying?
[01:31:40] <Dark_Shikari> mru: compare altivec
[01:31:44] <Dark_Shikari> it has no 32-bit or 64-bit store
[01:31:49] <mru> compare neon
[01:31:55] <Dark_Shikari> Not everything is like neon
[01:31:57] <lu_zero> compare spu
[01:31:59] <Dark_Shikari> in fact, most things are not like neon
[01:32:01] <pengvado> altivec has a 32bit store, but it requires 16byte alignment...
[01:32:02] <mru> has 8, 16, 32, 64, 128, 256
[01:32:15] <mru> to/from _any_ position
[01:32:16] <kierank> bcoudurier: mxf and ts will need some fixes. Probably in conjunction with latm, some sort of way of searching the packet for 337m headers or latm headers.
[01:32:35] <lu_zero> Dark_Shikari: altivec should be compared to mmx
[01:33:13] <Dark_Shikari> mmx had 32-bit stores too
[01:33:19] <Dark_Shikari> mmx had no alignment restrictions
[01:33:23] <lu_zero> so?
[01:33:30] <mru> to be fair, the guy designing neon had mmx and altivec to avoid mistakes from
[01:33:34] <Dark_Shikari> and what mru said.
[01:33:53] <lu_zero> alignment restrictions complaint are similar about compare&swap complaints...
[01:33:54] <mru> thankfully he had the insight to do so
[01:34:15] <mru> compare&swap is silly
[01:34:18] <bcoudurier> ts should use s302m no ?
[01:34:19] <pengvado> anyway, I could have not used mmx at all for the cases I didn't bother to specialize. or I could have used malloc. whatever, I don't see anything to apologize for.
[01:34:36] <mru> load-locked/store-exclusive is much easier to implement and more flexible to use
[01:34:50] <Dark_Shikari> mru: btw, the only weakness I can spot in NEON is the lack of pmovmskb and its inverse (mmx doesn't have the inverse either)
[01:34:54] <lu_zero> yet I got people complaining to me about that
[01:34:58] <mru> pengvado: you have a frikkin vla to apologise for
[01:34:59] <Dark_Shikari> or, in general, byte -> bit and vice versa.
[01:35:06] <lu_zero> since ppc was so limited because of that
[01:35:13] <kierank> bcoudurier: yes you're right, ts will have another step but you might just be able to handle that in the ts demuxer because afaik you can't have 302m in mxf or anything else
[01:35:20] <lu_zero> then we looked at behaviour and latency...
[01:36:38] <mru> anyway, enough blaming
[01:36:44] <mru> how do we fix it?
[01:37:29] <Dark_Shikari> btw, mru, how are you detecting vlas?
[01:37:32] <Dark_Shikari> or just looking at the code?
[01:37:34] <mru> -Werror=vla
[01:37:38] <Dark_Shikari> i.e. did you explicitly go looking at the asm to find it?
[01:37:46] <Dark_Shikari> er... but that won't catch yasm will it?
[01:37:49] <Honoome> mru: uh... nice one that.. /me checks on feng
[01:38:09] <mru> Dark_Shikari: no, but I have to stop somewhere
[01:38:18] <Dark_Shikari> but how did you find float interleave ?
[01:38:30] <mru> the vla is in c
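A minimal sketch of what removing a VLA usually looks like, assuming a worst-case size is known. The function `reverse_block` and the bound `MAX_BLOCK` are hypothetical and are not taken from the ffmpeg code under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BLOCK 64  /* hypothetical compile-time upper bound */

/* Reverse n floats in place through a temporary buffer.
 * A VLA version would declare `float tmp[n];`, which -Werror=vla rejects.
 * Returns 0 on success, -1 if n exceeds the fixed bound. */
int reverse_block(float *data, size_t n)
{
    float tmp[MAX_BLOCK];  /* fixed size: stack usage is known at compile time */

    assert(data != NULL);
    if (n > MAX_BLOCK)
        return -1;
    for (size_t i = 0; i < n; i++)
        tmp[i] = data[n - 1 - i];
    memcpy(data, tmp, n * sizeof(*data));
    return 0;
}
```

The trade-off is that the caller now has to handle the "too large" case explicitly instead of the stack growing silently.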
[01:39:02] <bcoudurier> mxf uses either sdti-cp S385M or aes3/wav
[01:39:02] <Dark_Shikari> ahhh k
[01:39:03] <mru> I'd hope there are no bad vlas in the yasm code
[01:39:05] <bcoudurier> should
[01:39:17] <Honoome> guuuh autoconf fails to identify a c99 compiler if you use -Werror=vla ....
[01:39:25] <mru> wtf
[01:39:34] <Dark_Shikari> mru: just grep for regex "sub[ ]*esp,"
[01:39:36] <Dark_Shikari> ;)
[01:39:47] <mru> Dark_Shikari: a vla would have to use ___chkstk on windows
[01:40:01] <bcoudurier> in any case it shouldn't be difficult to handle
[01:40:33] <Dark_Shikari> mru: true
[01:40:37] <Dark_Shikari> yeah, i doubt any yasm uses it
[01:40:40] <Dark_Shikari> definitely none in x264
[01:40:41] <Dark_Shikari> in C or yasm
[01:41:11] <kierank> bcoudurier: it will need a round of bikeshedding on ffmpeg-devel.
[01:42:04] <mru> if something is allocating a small variable amount in yasm I don't care
[01:42:14] <pengvado> pixel_ads does
[01:42:25] <bcoudurier> why, you don't want to handle it in the decoder ? :)
[01:42:44] <Dark_Shikari> pengvado: oh, you're right
[01:42:54] <mru> what does it do?
[01:43:04] <Dark_Shikari> that would make it the only one
[01:43:57] <Dark_Shikari> mru: successive elimination for exhaustive motion search. the stack space appears to be a place to output candidates before iterating over them in scalar
[01:44:08] <bcoudurier> personally I start to think that we should have a format field in AVCodecContext
[01:44:22] <bcoudurier> set to the format variant
[01:44:34] <bcoudurier> or we could just use a different codec_id
[01:44:42] <pengvado> if they're processed in the right order, we could use the output space as tmp too
[01:45:16] <kierank> bcoudurier: how will the container know whether it's plain wav or dolby e. (equally for aac whether it's plain aac or latm). The way ffmpeg's designed right now means the container sets the number of streams
[01:45:32] <Dark_Shikari> pengvado: are they?
[01:45:44] <kierank> 302m makes this more complicated because you have to parse that too
[01:47:01] <kierank> also it doesn't help that 302m allows non-byte-aligned 20-bit audio which would have to be made 24-bit or 32-bit aligned
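A rough sketch of the repacking step mentioned here: widening packed 20-bit samples to 32-bit aligned ones. The byte layout below (two samples MSB-first in 5 bytes) is an assumption made for illustration only and is not the actual SMPTE 302M bit layout; `unpack20_pair` and `sign_extend20` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend a 20-bit two's-complement value held in the low 20 bits. */
static int32_t sign_extend20(uint32_t raw)
{
    int32_t v = (int32_t)(raw & 0xFFFFF);
    return (v ^ 0x80000) - 0x80000;
}

/* Unpack two 20-bit samples from 5 bytes (assumed MSB-first packing)
 * into 32-bit samples, left-justified so full scale is preserved. */
void unpack20_pair(const uint8_t in[5], int32_t out[2])
{
    assert(in != NULL && out != NULL);
    uint32_t s0 = ((uint32_t)in[0] << 12) | ((uint32_t)in[1] << 4) | (in[2] >> 4);
    uint32_t s1 = ((uint32_t)(in[2] & 0x0F) << 16) | ((uint32_t)in[3] << 8) | in[4];

    /* Multiply instead of shifting to avoid left-shifting negative values. */
    out[0] = sign_extend20(s0) * 4096;  /* 20 -> 32 bits: scale by 2^12 */
    out[1] = sign_extend20(s1) * 4096;
}
```

For a 24-bit target the scale factor would be 16 (2^4) instead of 4096.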
[01:48:59] <bcoudurier> kierank, the ts knows if it's latm
[01:49:08] <bcoudurier> the ts knows if it's s302m too
[01:49:23] <kierank> we still have the problem then for mxf and plain wav
[01:49:25] <bcoudurier> mxf is supposed to use another codec ul
[01:49:34] <bcoudurier> but all files do not use it
[01:49:42] <bcoudurier> so in that case, you need a codec probe
[01:50:01] <kierank> you'd need a "container probe"
[01:50:27] <kierank> also afaik some of the japanese streams still flag the audio as aac
[01:50:35] <kierank> even though it's latm
[01:50:46] <bcoudurier> are you sure about that ?
[01:51:02] <bcoudurier> what stream type is it using ?
[01:52:17] <pengvado> Dark_Shikari: yes
[01:53:00] <Dark_Shikari> pengvado: patches welcome then
[01:53:10] <Dark_Shikari> preferably after you send me the comments for the patches you said you would send comments on
[01:53:26] <Dark_Shikari> hmm. would there be any objections to me just committing tons of non-bikeshed vp8 optimizations?
[01:53:30] <Dark_Shikari> i.e. C optimizations
[01:53:33] <Dark_Shikari> not asm or anything outside of vp8.c
[01:55:08] <kierank> bcoudurier: looks like mpeg-2 aac.
[01:55:21] <kierank> whatever the id is for that; i forget
[01:58:46] <kierank> there will also be a flurry of brits complaining about lack of latm support soon too, so it's something that should be dealt with if one could sort out dolby e and latm in the same swoop
[01:59:52] <Honoome> oh my latm again?
[02:05:00] <janneg> bcoudurier: there should be a sample in roundup which switches from adts to latm in stream
[02:05:32] <bcoudurier> without a new pmt ?
[02:06:23] <kierank> maybe there's a descriptor or something. however there are no good ts analyzing apps so it's difficult to tell
[02:06:43] <janneg> I don't remember
[02:07:53] <janneg> kierank: I'm currently writing latm demuxer and the idea is to integrate it then as subdemuxer into the mpeg ts demuxer
[02:08:18] <kierank> can latm go in anything other than ts?
[02:09:48] <kierank> mp4 possibly
[02:09:49] <mru> it can go insane
[02:11:32] <kierank> latm doesn't have a good reason to do such weird things. dolby e arguably does
[02:11:58] <janneg> bcoudurier: http://roundup.ffmpeg.org/issue1999
[02:12:32] <kierank> freeview hd will be doing latm with 2 programs inside I think
[02:13:40] <janneg> so we need a demuxer or at least something which can produce more than one stream
[02:14:34] <bcoudurier> janneg, why in the ts demuxer ?
[02:14:53] <kierank> a chained demuxer would be best but it would need some bikeshedding to get it agreed upon
[02:20:33] <janneg> bcoudurier: I have easier than chained demuxers
[02:20:47] <janneg> s/I have //
[02:23:11] * Honoome wonders if the people who would like to see FatELF have an idea what LSB is/was
[02:25:09] <bcoudurier> janneg, ok
[02:25:09] <Honoome> lu_zero, j0sh_, wbs: if you care, feng just merged in the netembryo 0.2.0 branch into master ;)
[02:25:20] <bcoudurier> something that works will be ok with me :>
[02:34:02] <Honoome> stupid C++ optimisations, I can't get it to emit a constructor for me
[02:34:52] <bcoudurier> yes this ts sample looks broken
[02:35:41] <kierank> worth saying coreaac handles it fine
[02:35:49] <kierank> and mplayer if you seek into the latm part
[02:41:18] <Compn> Dark_Shikari : google should hire you to work on libvp8
[02:41:22] <Compn> :P
[02:41:40] <Dark_Shikari> Well they're going to enjoy my "fuck you"
[02:41:42] <Dark_Shikari> all this code is LGPL
[02:42:10] <Dark_Shikari> aka not BSD.
[02:42:29] * Compn wonders how many times google is using ffmpeg
[02:42:55] <Compn> 1. youtube, 2. on2 mencoder flux encoder, 3. chrome h264 decoder, 4. webm/vp8, 5. android?
[02:43:15] <Dark_Shikari> though this is really more of a "fuck you" to firefox.
[02:43:26] <Dark_Shikari> Chrome is in fact totally awesome.
[02:43:31] <Compn> i guess google video is a separate project from youtube, so we'll make that #6
[02:43:42] <kierank> chrome aac probably
[02:44:14] <Compn> firefox could add a ffmpeg vp8 plugin instead of using libvp8
[02:44:14] <Dark_Shikari> hmm. if I really wanted to be a dick, I could make my code gpl.
[02:44:20] <Dark_Shikari> Compn: over blizzard's dead body
[02:44:27] <Compn> so it wouldnt be a fuck you so much as a bit more work for firefox...
[02:44:50] <Compn> i mean, firefox could compile a vp8/vorbis-only ffmpeg plugin
[02:44:52] <Dark_Shikari> firefox is rather... anti-ffmpeg
[02:44:54] <Dark_Shikari> period
[02:45:01] <Dark_Shikari> not "anti-stuff-they-don't-like-in-ffmpeg"
[02:45:05] <Compn> it seems so
[02:45:44] * Compn waits for chrome to have some advanced preferences before he uses it
[02:46:14] <Compn> also less google privacy invasions
[02:46:22] <kierank> chromium still crashes on long pages :/
[02:46:45] <Dark_Shikari> as opposed to firefox, which just crashes
[02:50:10] <Compn> Dark_Shikari : btw some benchmarks on your ssse3 patches might be of some help
[02:50:30] <Dark_Shikari> why
[02:52:22] <Compn> dunno, sounded like a good idea at the time
[02:53:46] <Dark_Shikari> I didn't bench any of it.
[02:53:47] <Dark_Shikari> Ever.
[02:54:06] <Dark_Shikari> ok, except for the idct sse4, that one was tricky
[02:54:14] <Dark_Shikari> the rest I didn't bench.
[02:54:22] <Dark_Shikari> I wonder if that will piss off michael =p
[02:54:49] <bcoudurier> gpl is fine :)
[02:54:51] <kierank> what's up with videolan and shoutcast?
[02:55:17] <Dark_Shikari> aol
[02:55:24] <bcoudurier> aol license requiring toolbar shipping along
[02:55:27] <bcoudurier> crap like that
[02:55:51] <kierank> lol
[02:56:00] <kierank> the vlc website should just troll aol
[02:56:21] <bcoudurier> kierank, yes any implementation is free to handle broken files, ffmpeg handles many of them :)
[02:56:53] <kierank> "YOU WERE AN INTERNET TITAN, AND NOW YOU'LL SETTLE FOR A TOOLBAR" AOL c2010
[02:59:01] <Compn> http://www.videolan.org/press/2010-1.html
[02:59:04] <Compn> that kinda explains things
[02:59:41] <kierank> send them some CDs
[02:59:46] <kierank> that'll shut them up
[02:59:49] <verb3k> "<Dark_Shikari> Chrome is in fact totally awesome." -->> That means the chromium development team/model isn't bad, and libvpx development might be similar.
[03:00:29] <Compn> but i guess j-b can explain further :)
[03:00:58] <Dark_Shikari> verb3k: libvpx uses the mozilla development model
[03:01:09] <Compn> chromium devel model good? ahahaha
[03:01:14] <Dark_Shikari> and no, chromium model isn't good
[03:01:31] <verb3k> but the software rocks.
[03:01:38] <Dark_Shikari> Sure, until you try to do things like fullscreen a video
[03:01:44] <Dark_Shikari> "better than firefox" doesn't mean your model is any good
[03:01:46] <Dark_Shikari> it just means it's less shit
[03:02:00] <verb3k> lol
[03:02:17] <Dark_Shikari> competing with firefox is like competing with the fat retard in class
[03:02:18] <Compn> firefox was following microsoft's lead, become slower over time and require more ram/cpu
[03:02:35] <Kovensky> @Dark_Shikari | as opposed to firefox, which just crashes <-- I tried midori today, it crashed before even the ui showed up
[03:03:02] <Kovensky> kierank | send them some CDs <-- oh you
[03:03:43] <Compn> oh now i got that finally
[03:03:45] <Compn> aol cds
[03:03:53] * Compn is slow
[03:03:59] <Compn> over 9000 hours free
[03:04:55] <kierank> 1000 free hours every month
[03:06:45] <saintdev> wasn't there a project a while ago to gather as many aol cds as possible and mail them back to aol?
[03:07:26] <Compn> yep
[03:08:53] <Kovensky> there's an ISP here, uol, that used to use the aol method
[03:13:21] <kierank> aol also did floppies
[04:24:34] <CIA-99> ffmpeg: darkshikari * r23765 /trunk/libavcodec/vp8.c: fix typo in vp8 decoder error message
[04:29:51] <CIA-99> ffmpeg: jai_menon * r23766 /trunk/libavformat/avienc.c:
[04:29:51] <CIA-99> ffmpeg: avienc : Avoid creating invalid AVI files when muxing subtitle streams
[04:29:52] <CIA-99> ffmpeg: other than XSUB.
[04:30:12] <jai> error: Unable to append to .git/logs/refs/remotes/git-svn: Permission denied
[04:30:15] <jai> mru: ^
[04:37:07] <CIA-99> ffmpeg: jai_menon * r23767 /trunk/libavutil/log.c:
[04:37:07] <CIA-99> ffmpeg: Print a space after the AVClass prefix.
[04:37:07] <CIA-99> ffmpeg: This improves readability a bit.
[06:54:51] <KotH> hi everyone
[07:03:25] <j-b> Compn: trolling AOL is fun :D
[07:19:20] <kshishkov> j-b: trolling AOL is pointless
[07:20:09] <kshishkov> j-b: and next thing you'll attempt to troll local bureaucracy
[07:42:36] <wbs> Honoome: that's good :-)
[07:57:39] <CIA-99> ffmpeg: mstorsjo * r23768 /trunk/libavformat/ (rtsp.c internal.h):
[07:57:39] <CIA-99> ffmpeg: RTSP: Remove skip_spaces in favor of strspn
[07:57:39] <CIA-99> ffmpeg: Patch by Josh Allmann, joshua dot allmann at gmail
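For readers unfamiliar with the idiom in r23768, a hand-written skip-whitespace loop can be replaced by a single `strspn` call. This is only a sketch: the helper name `skip_spaces` is reused here for illustration, and the whitespace set is an assumption, not necessarily the one used in rtsp.c.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to the first character of p that is not whitespace.
 * strspn() returns the length of the leading run made only of the
 * characters listed in its second argument. */
const char *skip_spaces(const char *p)
{
    assert(p != NULL);
    return p + strspn(p, " \t\r\n");
}
```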
[07:59:35] <CIA-99> ffmpeg: mstorsjo * r23769 /trunk/libavformat/ (rtsp.c Makefile rtpdec_mpeg4.c rtpdec.c rtpdec_mpeg4.h):
[07:59:35] <CIA-99> ffmpeg: RTSP: Decouple MPEG-4 and AAC specific parts from rtsp.c
[07:59:35] <CIA-99> ffmpeg: Patch by Josh Allmann, joshua dot allmann at gmail
[08:00:57] <CIA-99> ffmpeg: mstorsjo * r23770 /trunk/libavformat/ (rtsp.c rtpdec_mpeg4.c):
[08:00:57] <CIA-99> ffmpeg: RTSP: Move more SDP/FMTP stuff from rtsp.c to rtpdec_mpeg4.c
[08:00:57] <CIA-99> ffmpeg: Patch by Josh Allmann, joshua dot allmann at gmail
[08:02:10] <CIA-99> ffmpeg: mstorsjo * r23771 /trunk/libavformat/ (rtpdec_mpeg4.c rtpdec.c):
[08:02:10] <CIA-99> ffmpeg: rtpdec: Move AAC depacketization code in rtpdec to a proper payload handler
[08:02:10] <CIA-99> ffmpeg: Patch by Josh Allmann, joshua dot allmann at gmail
[08:03:41] <CIA-99> ffmpeg: mstorsjo * r23772 /trunk/libavformat/ (rtpdec.h rtsp.c rtpdec_mpeg4.c rtsp.h rtpdec.c):
[08:03:41] <CIA-99> ffmpeg: RTSP, rtpdec: Move RTPPayloadData into rtpdec_mpeg4 and remove all references to rtp_payload_data in rtpdec and rtsp
[08:03:41] <CIA-99> ffmpeg: Patch by Josh Allmann, joshua dot allmann at gmail
[08:04:43] <CIA-99> ffmpeg: mstorsjo * r23773 /trunk/libavformat/rtpdec_mpeg4.c:
[08:04:44] <CIA-99> ffmpeg: rtpdec_mpeg4: Rename PayloadContext to be consistently 'data'
[08:04:44] <CIA-99> ffmpeg: Patch by Josh Allmann, joshua dot allmann at gmail
[08:13:08] <lu_zero> Honoome: good =)
[08:15:54] <mru> mornings
[08:17:29] <kshishkov> good mornings, mru
[08:20:12] <CIA-99> ffmpeg: mru * r23774 /trunk/libavcodec/ (rv34vlc.h rv34.c): rv34: kill VLAs
[08:20:26] <Tjoppen> midsommar \o/
[08:21:20] <mru> is that when swedes get pissed, then raped?
[08:23:14] <mru> Dark_Shikari: why don't you start committing some vp8 simd?
[08:24:19] <Dark_Shikari> I'm waiting for bbb to fix his shit
[08:24:38] <mru> can none of it be committed before that?
[08:24:44] <Dark_Shikari> Some could, I guess.
[08:25:02] <astrange> llvm svn claims to not miscompile anymore
[08:28:31] <mru> hmm, vp8 tests are failing with armcc
[08:28:33] <mru> wonder why
[08:29:53] <astrange> of course, llvm assembler doesn't assemble us yet, and that comes first anyway
[08:30:08] <mru> ?
[08:30:25] <Dark_Shikari> mru: it outputted nothing
[08:30:30] <Dark_Shikari> looks like it crashed/failed?
[08:30:34] <mru> obviously
[08:31:14] <mru> I'll run it manually and check
[08:32:23] <mru> sigsegv
[08:34:06] <mru> calling some pred function with null dest
[08:41:53] <mru> libavcodec/vp8.c:1210: warning: 'curframe' may be used uninitialized in this function
[08:42:27] <mru> which is blatantly obvious looking at the code
[08:43:15] <Dark_Shikari> how did it work in the first place?
[08:43:19] <mru> or is there some very clever magic preventing that from happening?
[08:43:27] <mru> look at vp8_decode_frame
[08:44:18] <mru> I guess it makes sense for that for loop to always find something
[08:44:26] <roxfan> it's a shroedingbug!
[08:44:39] <mru> no, just somewhat fragile code
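The pattern behind a "may be used uninitialized" warning like this one can be sketched as follows. This is not the vp8.c code; `struct frame`, `NUM_FRAMES` and `find_free_frame` are hypothetical. The fragile version leaves the pointer unset and relies on the loop always finding a match; the version below initializes it and lets the caller check.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FRAMES 4  /* hypothetical pool size */

struct frame {
    int in_use;
};

/* Return the first free frame in the pool, or NULL if every frame is in use.
 * Initializing `cur` makes the "nothing found" case explicit instead of
 * leaving the pointer indeterminate when the loop finds no match. */
struct frame *find_free_frame(struct frame *pool)
{
    struct frame *cur = NULL;

    assert(pool != NULL);
    for (int i = 0; i < NUM_FRAMES; i++) {
        if (!pool[i].in_use) {
            cur = &pool[i];
            break;
        }
    }
    return cur;
}
```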
[08:44:42] <roxfan> it worked until you found it
[08:45:15] <mru> it works if I noinline intra_predict
[08:45:22] <mru> so it's a compiler bug for sure
[08:46:33] <Dark_Shikari> it could still be a bug in the code that's hidden by undefined behavior, but yeah, most likely a compiler issue
[08:46:51] <mru> it looks like a typical armcc bug
[08:48:03] <Dark_Shikari> "typical"?
[08:48:14] <mru> the kind of bug they usually have
[08:48:23] <mru> different compilers have different kinds of bugs
[08:49:26] <Dark_Shikari> I mean from them
[08:49:28] <Dark_Shikari> what makes it typical of their bugs
[08:53:31] <mru> it just feels familiar
[08:54:18] <Dark_Shikari> what kind of similar things have you seen?
[08:54:38] <mru> I don't know exactly what the problem is here yet
[08:55:43] <mru> but in most of them there's been a loop involved
[09:01:38] <mru> btw is vp8 intended to support emu_edge?
[09:03:02] <mru> it currently rejects it in init but checks the flag in various places
[09:03:16] <mru> that's fine if the plan is to support it eventually
[09:03:24] <mru> otherwise the checks should be removed
[09:03:42] <Dark_Shikari> it _should_ support it eventually
[09:03:54] <Dark_Shikari> right now it has the worst of both worlds
[09:04:03] <Dark_Shikari> it does emulated_edge_mc (worse than just extending the edges)
[09:04:21] <Dark_Shikari> _and_ it is inefficient due to requiring edge padding
[09:05:26] <mru> edge padding can't be all that bad
[09:05:32] <mru> it's like 16 pixels or whatever
[09:05:44] <mru> don't tell me the spec requires more
[09:05:54] <Dark_Shikari> Well you see, there was actually an issue in that regard ;)
[09:05:58] <Dark_Shikari> You need 32 anyways though
[09:06:00] <Dark_Shikari> due to 6-tap filter + alignment
[09:06:08] <Dark_Shikari> (same as in h264)
[09:06:10] <mru> 16 on either side, sure
[09:06:13] <Dark_Shikari> no
[09:06:15] <Dark_Shikari> 32 on either side
[09:06:17] <Dark_Shikari> same as in h264
[09:06:29] <Dark_Shikari> You can have a 16-wide mb fully off the frame
[09:06:34] <mru> yes
[09:06:35] <Dark_Shikari> plus 3 pixels more for the 6-tap filter
[09:06:38] <Dark_Shikari> 19 pixels
[09:06:42] <mru> ok
[09:06:43] <mru> I see
[09:06:54] <Dark_Shikari> It's a tad worse in vp8 due to the rather hilarious mv clamping bug.
[09:06:58] <Dark_Shikari> But still fits in 32.
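The arithmetic above (a 16-wide macroblock fully off the frame, plus 3 pixels for the 6-tap filter, rounded up for alignment) can be written out as a small sketch. The function name and the choice of 16 as the alignment unit are assumptions for illustration.

```c
#include <assert.h>

/* Round x up to the next multiple of align. */
static int round_up(int x, int align)
{
    return (x + align - 1) / align * align;
}

/* Edge padding per side: one full macroblock off the frame, plus half the
 * filter taps of extra reach, rounded up to the alignment unit. */
int edge_padding(int mb_size, int filter_taps, int align)
{
    assert(align > 0);
    int reach = mb_size + filter_taps / 2;  /* 16 + 3 = 19 for a 6-tap filter */
    return round_up(reach, align);
}
```

With `edge_padding(16, 6, 16)` the reach is 19 pixels, which rounds up to 32, matching the figure quoted in the discussion.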
[09:07:08] <mru> what is that bug?
[09:07:23] <mru> mvs more than just off the edge?
[09:07:43] <Dark_Shikari> So, VP8 had three mistakes. Each of them individually would have probably been minor, or in some cases, hardly an issue at all.
[09:07:46] <Dark_Shikari> But..
[09:08:01] <Dark_Shikari> "mistake" 1: MVs cannot go more than 16 pixels off the frame. no pointlessly long MVs.
[09:08:07] <Dark_Shikari> Not really a mistake, but this leads us to 2....
[09:08:34] <Dark_Shikari> mistake 2: MVs are clamped in the MV prediction loop. therefore, encoder and decoder clamping must match or else mv pred will give wrong results sometimes.
[09:08:43] <Dark_Shikari> mistake 3: lol let's totally fuck our encoder-side clamping
[09:09:19] <mru> why would you ever want an mv more than just off the edge?
[09:09:33] <Dark_Shikari> Suppose you have an MV that goes 16 pixels off the edge, i.e. the max that makes sense.
[09:09:44] <Dark_Shikari> suppose your frame is 10 macroblocks wide, and this is at macroblock 8.
[09:09:51] <Dark_Shikari> Macroblock 9 will have its MV predicted to be 32 pixels off the frame.
[09:09:54] <mru> why does being 16 pixels off the edge make sense?
[09:10:01] <mru> or did you mean it ends 16 off?
[09:10:05] <Dark_Shikari> it ends 16 off.
[09:10:10] <mru> ok
[09:10:12] <mru> makes sense
[09:10:15] <Dark_Shikari> so anyways, block 9 will have 32 off.
[09:10:24] <Dark_Shikari> if no mv delta is coded
[09:10:25] <Dark_Shikari> thus:
[09:10:36] <Dark_Shikari> 1) you allow arbitrarily far off the frame -- mv clamping is done during MC (h264)
[09:10:50] <Dark_Shikari> 2) you require a delta MV to make it valid again (still requires mv clamping if only to stop crashes)
[09:10:59] <Dark_Shikari> 3) you clamp MVs in the prediction loop
[09:11:20] <mru> MVs are predicted same as left MB + delta?
[09:11:28] <Dark_Shikari> no
[09:11:33] <Dark_Shikari> I was just giving a simple example
[09:11:35] <Dark_Shikari> of how, say, in MPEG-2
[09:11:44] <Dark_Shikari> you could have an MV that was not insane, become insane, merely through prediction to the next MB.
[09:11:52] <Dark_Shikari> i.e. by coding nothing, you get an insane MV.
[09:12:03] <mru> sure, it's just prediction
[09:12:15] <Dark_Shikari> Yes, but the point is, accordingly, you have to do one of 1), 2), or 3).
[09:12:59] <mru> 2 seems stupid
[09:13:19] <mru> 1 should encode most efficiently
[09:13:27] <Dark_Shikari> yes.
[09:13:34] <Dark_Shikari> well, actually, 3) might be better.
[09:13:41] <Dark_Shikari> it requires shorter deltas
[09:13:46] <Dark_Shikari> since it doesn't predict you into stupidly long mvs
[09:13:52] <Dark_Shikari> but, 3) is more costly
[09:14:02] <Dark_Shikari> and more importantly -- it means encoder-side clamping and decoder-side MUST MATCH.
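A simplified sketch of option 3, clamping inside the prediction loop, using the 10-macroblock example above. This is not VP8's actual clamping rule; the quarter-pel units, the constants and the name `clamp_mv_x` are assumptions. The point is that the encoder and decoder must both run exactly this function on the predicted MV, or their predictions diverge.

```c
#include <assert.h>

#define MB_SIZE   16  /* macroblock width in pixels */
#define MV_MARGIN 16  /* pixels a referenced block may extend past a frame edge */
#define SUBPEL     4  /* MV units per pixel (quarter-pel); an assumption */

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp the horizontal MV component (in SUBPEL units) of the macroblock in
 * column mb_x so the referenced block extends at most MV_MARGIN pixels
 * past the left or right edge of a frame that is mb_width macroblocks wide. */
int clamp_mv_x(int mv_x, int mb_x, int mb_width)
{
    assert(mb_x >= 0 && mb_x < mb_width);
    int px    = mb_x * MB_SIZE;      /* left edge of this macroblock */
    int width = mb_width * MB_SIZE;  /* frame width in pixels */
    int lo = (-MV_MARGIN - px) * SUBPEL;
    int hi = (width + MV_MARGIN - MB_SIZE - px) * SUBPEL;
    return clamp_int(mv_x, lo, hi);
}
```

In a 10-macroblock-wide frame, a 32-pixel MV is legal at macroblock 8 (the block ends 16 pixels off the frame), but the same value predicted into macroblock 9 is clamped down to 16 pixels.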
[09:14:07] <Dark_Shikari> And they fucked that up.
[09:14:16] <Dark_Shikari> They had multiple mistakes:
[09:14:20] <Dark_Shikari> 1) splitmv mode did clamping wrong
[09:14:31] <Dark_Shikari> 2) at one point, clamping was done to the wrong pel level (i.e. fullpel instead of qpel or similar)
[09:14:37] <Dark_Shikari> 3) chroma used luma clamps
[09:14:41] <mru> lol
[09:14:43] <Dark_Shikari> and other really stupid shit
[09:14:55] <Dark_Shikari> fortunately there was only _one_ horrible mangling of the spec (splitmv) necessary to fix it
[09:15:08] <Dark_Shikari> and it only affects backwards compatibility with old streams
[09:15:21] <Dark_Shikari> but it's still _retarded_.
[09:15:41] <mru> that's why you should have two teams doing enc and dec independently off the same spec
[09:16:00] <Dark_Shikari> spec? what spec?
[09:16:08] <Dark_Shikari> they dont have a spec
[09:16:15] <mru> I know
[09:16:16] <Dark_Shikari> they changed "spec" to be the "bitstream guide"
[09:16:19] <Dark_Shikari> since that's what it is
[09:16:23] <mru> lol
[09:16:23] <Dark_Shikari> not joking
[09:16:25] <Dark_Shikari> they renamed it
[09:16:38] <mru> that's hilarious
[09:17:18] <av500> https://review.webmproject.org/gitweb?p=bitstream-guide.git;a=summary
[09:18:28] <mru> ugh, unwrapped lines
[09:18:43] <av500> it cites 3 references: ITU BT.601, Shannon and KR!
[09:19:29] <mru> when a "spec" describes a format using C struct syntax, you know something is amiss
[09:19:38] <mru> since struct layout in C is unspecified
[09:19:42] <Dark_Shikari> lol
[10:12:10] <lu_zero> ^^;
[10:17:56] <mru> ah, latest version has fixed the bug
[10:18:01] <mru> need to get a licence for that
[10:20:07] <kshishkov> license to kill bugs?
[10:22:29] <Honoome> at the end icculus himself commented on my latest blogpost.. at least he concedes most of the points I made
[10:26:04] <kshishkov> FatXXX always reminds me of Windows PE files that actually contain DOS program as well.
[10:26:37] <lu_zero> oh
[10:26:56] <Honoome> kshishkov: there are a few things that PE is better at than ELF though
[10:27:05] <Honoome> like the use of import tables...
[10:27:18] <Honoome> [sure it's a mess, but it works better at runtime]
[10:27:33] <mru> you call that better?
[10:27:36] <Honoome> otoh it stops you from being able to do proper runtime loading, and interposing...
[10:27:42] <mru> it's the most retarded feature ever
[10:28:01] <Honoome> mru: no chances of collisions, no need to hash 400-characters long symbol names...
[10:28:24] <mru> but binding by table index is insane
[10:28:30] <mru> change the dll and all hell breaks loose
[10:28:52] <kshishkov> it does by default
[10:28:55] <av500> dll hell?
[10:29:07] <Honoome> av500: (soname hell)**2
[10:29:39] <mru> elf will rarely just blow up in your face without an explanation
[10:29:46] <Honoome> mru: I guess neither approach is flawless.. sometimes I wish we didn't have the current flat namespace though
[10:29:56] <mru> there are cases, as we've seen, where it can fail to resolve things in annoying ways
[10:30:14] <Honoome> *cough* xine's aac bug from 2006
[10:30:24] <Honoome> siretart might remember that :P
[10:30:45] <mru> but I'd rather have a few obscure link errors than random runtime failures
[10:30:56] <Honoome> it was a runtime failure in xine's case
[10:31:08] <Honoome> runtime crash to be precise
[10:31:14] <mru> I'm not familiar with that bug
[10:31:16] <mru> what was it?
[10:31:28] * av500 guesses a symbol clash
[10:31:28] <Honoome> playing aac files, xine (and any app linked to libxine) crashed
[10:31:39] <mru> yeah, but what was the cause?
[10:31:45] <mru> symbol clash?
[10:31:48] <Honoome> I circled around it until Michael Meeks provided -Bdirect patches that solved it..
[10:31:49] <kshishkov> faad?
[10:32:01] <Honoome> turned out that xine-lib was bundling libfaad 2.5, while ffmpeg linked to libfaad 2
[10:32:14] <Honoome> (or the other way around, details are fuzzy in my mind right now)
[10:32:38] <Honoome> so the symbols from the faad plugin were being resolved by the libfaad brought in by libavcodec
[10:32:57] <Honoome> yes -Bdirect would have solved that
[10:33:21] <Honoome> [at the end I solved it using elf visibility, only exporting the single symbol the plugins used to load]
[10:33:33] <mru> that's what you deserve when bundling libs
[10:33:41] <Honoome> I know
[10:34:03] <Honoome> that's actually my original reason to write ruby-elf and the collision checker script
[10:35:15] <av500> err, -Bdirect does what?
[10:35:43] <Honoome> av500: nowadays nothing, it was a test by SuSE of binding symbols between libraries at build-time
[10:36:15] <Honoome> think of it as a -Bsymbolic between shared objects as well
[10:36:30] <av500> man gcc says something like: -Bprefix ...specifies where to find the executables, libraries, include files, and data files of the compiler itself
[10:36:41] <Honoome> or prelink on steroids
[10:36:46] <av500> ic
[10:36:47] <Honoome> av500: -Bdirect from the linker.. is long dead now
[10:36:53] <av500> ah
[10:36:54] <av500> ok
[10:38:15] <av500> Honoome: these AAPL "bundles", do they resolve deps between each other? or are they all standalone, except for "system" stuff?
[10:38:50] <Honoome> av500: afaict, it's the latter, they copy in the libraries they link to
[10:41:09] <av500> mru, after vla please remove all sleep(1) calls: http://bugs.python.org/issue9075
[10:42:22] <Honoome> ahaha... but it's a bit scary, is python the only one doing that stuff?
[10:45:20] <mru> lol
[10:46:11] <lu_zero> av500: ?
[10:46:26] <lu_zero> av500: you have bundles and frameworks...
[10:46:29] <lu_zero> and installers
[10:46:41] <lu_zero> the world isn't as simple as people like to picture it
[10:47:08] <lu_zero> so you might have a bundle (wesnoth) that uses some frameworks (sdl)
[10:47:23] <lu_zero> so you have to install the framework and then you can use the bundle
[10:47:47] <av500> ok
[10:48:21] <lu_zero> and ryan could consider just letting distributors do their stuff and be happy...
[10:48:26] <av500> and who resolves that sdl might have changed between wesnoth and wesnoth_2_return_of_the_wesnoth?
[10:48:54] <Honoome> hrm, did ffmpeg lose the ability to rotate videos, I'm deluded it ever did, or am I missing something?
[10:49:38] <lu_zero> avfilter...
[10:49:43] * lu_zero meanwhile disables it
[10:57:47] <lu_zero> I couldn't track where that damn var gets overwritten =_=
[10:58:22] <Honoome> lu_zero: use gdb watchpoints
[10:59:18] <lu_zero> uhmm
[11:19:25] <lu_zero> wbs: a patch got lost am I wrong?
[11:29:51] <lu_zero> siretart: ping
[11:30:15] <siretart> lu_zero: what's up?
[11:30:29] <lu_zero> I need your ubuntu knowledge ^^;
[11:30:49] <lu_zero> I have to setup 10 laptops with the very same system
[11:31:07] <lu_zero> which is the quickest route given I have one already prepared?
[11:31:24] <iive> lu_zero: same model and harddisk size?
[11:31:56] <av500> lu_zero: fatelf of course!
[11:32:14] <lu_zero> identical
[11:32:32] <av500> fatelt_lite then :)
[11:32:32] <lu_zero> av500: pff
[11:32:41] <Honoome> lu_zero: iscsi?
[11:32:49] <lu_zero> Honoome: uh?
[11:33:24] <Honoome> run sysresccd, add iscsitarget, make it export the harddisk of the first one, then use all the others to fetch it via iscsi :D
[11:33:28] <av500> lu_zero: but imagine you had an x86, an ARM, a MIPS, an 68k, a PPC laptop and a PDP11....
[11:33:39] <lu_zero> sure
[11:33:51] <Honoome> av500: and pretend to run the same code on all of them...
[11:33:56] <av500> of course
[11:34:10] <lu_zero> and pretend that the code doesn't need even libc
[11:34:24] <siretart> lu_zero: so you want to do a 'mass deployment'?
[11:34:30] <lu_zero> yup
[11:34:37] <av500> then you need a WMD...
[11:34:46] <lu_zero> and since it's ubuntu I have no clue but I want to learn
[11:34:52] <siretart> let's take that to /query, I think that's a bit offtopic here..
[11:34:55] <Honoome> actually, I think the main issue that we should tackle somehow, fatelf or not, is api consistency among arches _in the same OS_ at least
[11:35:23] <lu_zero> Honoome: the iscsi trick sounds like a good solution I just need to bake a sysres
[11:36:09] <Honoome> lu_zero: alternatively put N nc commands listening in multicast, and then multicast the image to them :P but you'll probably need something more reliable than UDP
[13:44:47] <CIA-99> ffmpeg: mru * r23775 /trunk/libavcodec/twinvq.c: twinvq: remove VLAs
[13:46:08] <janneg> do we care about libvpx with disabled encoder or decoder? configure --enable-libvpx fails currently if not both are enabled
[13:47:58] <mru> iirc we decided not to care
[13:48:43] <mru> soon enough we'll be using it for encoding only
[13:48:46] <janneg> the libvpx encoder needs -lm. I assume it's ok to just commit it
[13:49:09] <janneg> adding it to the require2 check
[13:49:23] <mru> uh?
[13:49:30] <mru> -lm should have been added already
[13:50:17] <janneg> yes, but it's before -lvpx which doesn't lay nice with --as-needed
[13:50:24] <janneg> play
[13:50:52] <mru> let's fix that then
[13:51:28] <janneg> ok, the lame and libopencore have also redundant -lm
[13:52:45] <mru> so let's fix those too
[13:53:13] <janneg> mru: http://pastie.org/1018763
[13:53:23] <mru> wrong
[13:53:25] <mru> 5min
[13:54:24] <janneg> yes, missed all the check_mathfunc
[13:54:35] <mru> no
[13:58:42] <janneg> well, let's say another error of changing things without thinking
[13:59:02] <mru> I'm fixing it properly
[13:59:05] <mru> give me a moment
[13:59:33] <janneg> no problem
[14:02:15] <Honoome> so to _de_mangle a C++ symbol you have to use a recursive, stack-based, backreferencing parser... not bad
[14:02:31] <Honoome> and the process is not perfect: two symbols may demangle to the same string
[14:04:05] <mru> what about the reverse?
[14:04:35] <mru> demangling is never required for correct operation
[14:05:17] <lu_zero> beside debugging
[14:05:32] * lu_zero meanwhile is having his share of fun with gdb
[14:10:17] <Honoome> mru: the reverse is much easier
[14:11:34] <mru> reverse demangling...
[14:12:33] <Honoome> mangling the symbols is a perfect process, mostly...
[14:12:50] <Honoome> and requires _relatively_ less complexity
[14:14:40] <kierank> iirc google wrote a demangler
[14:19:12] <Honoome> what I'm really surprised by is that the GCC3 style avoids using special characters for the symbol names; technically I don't think that anything stops them from using some special characters for interleaving...
[14:20:25] <mru> it's best not to use special characters
[14:20:45] <mru> although allowed by elf, many assemblers have restrictions on what they accept
[14:21:32] <Honoome> that's a good point I guess
[14:22:48] <lu_zero> still having more than 50 chars to define a symbol is interesting...
[14:23:24] <Honoome> gnash is the exception, most of other symbols are generally shorter...
[14:23:24] <mru> at least the spec doesn't call for randomly inserting "Factory" in the mangled name
[14:23:43] <lu_zero> what's that?
[14:23:49] <mru> java
[14:24:23] <lu_zero> I wonder if having long/overlylong symbols has an effective impact on performance
[14:24:48] <Honoome> lu_zero: sure
[14:25:01] <mru> only startup time
[14:25:02] <Honoome> you know .gnuhash right? know why it was introduced?
[14:25:04] <mru> not runtime
[14:25:21] <Honoome> mru: startup definitely, runtime if you use lazy bindings
[14:25:57] <mru> only first time the symbol is hit
[14:26:14] <Honoome> true
[14:26:17] <Honoome> tux saves the plt
[14:27:27] <KotH> Honoome: wth is .gnuhash?
[14:27:57] <Honoome> actually I forgot a dot, it's .gnu.hash
[14:28:21] <Honoome> KotH: binutils and glibc since about 2006 or 2007 implement a replacement for the sysv .hash section of ELF files
[14:28:33] <Honoome> uses a different algorithm because the original one caused too many collisions with C++ symbols
[14:29:49] <j-b> __gb__ ?
[14:30:03] <Honoome> hm?
[14:32:38] <mru> Honoome: int (*(*foo)[4])(int (*)(int *))[2](int);
[14:32:43] <mru> something for you to mangle
[14:33:01] <Honoome> I said relatively easier ;)
[14:35:57] * KotH doesn't even get whether that code is valid or not, much less what it should do
[14:36:19] <mru> sorry, that has a mistake in it
[14:41:44] <Honoome> it would seem to be a bidimensional array of function pointers taking as parameter a further function, but yeah doesn't look entirely valid ;)
[14:42:07] <mru> there, got it int (*(*(*(*foo)[4])(int (*)(int *)))[2])(int);
[14:42:55] <mru> it's a pointer to array 4 of pointer to function (pointer to function (pointer to int) returning int) returning pointer to array 2 of pointer to function (int) returning int
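The same type can be built up one layer at a time with typedefs, which is the usual way to keep such declarations readable. The typedef names and the helper functions below are hypothetical; only the resulting type of `foo` matches the declaration above.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*int_fn)(int);               /* pointer to function (int) returning int */
typedef int_fn int_fn_arr2[2];            /* array 2 of int_fn */
typedef int (*cb_fn)(int *);              /* pointer to function (pointer to int) returning int */
typedef int_fn_arr2 *(*maker_fn)(cb_fn);  /* pointer to function (cb_fn) returning pointer to array 2 of int_fn */
typedef maker_fn maker_arr4[4];           /* array 4 of maker_fn */

static int twice(int x)  { return 2 * x; }
static int thrice(int x) { return 3 * x; }

static int_fn_arr2 table = { twice, thrice };

/* Ignores its callback and always hands back the same table. */
static int_fn_arr2 *pick_table(cb_fn cb)
{
    (void)cb;
    return &table;
}

static maker_arr4 makers = { pick_table, pick_table, pick_table, pick_table };

/* Walk the whole chain: array 4 -> function -> array 2 -> function. */
int call_through(int idx, int which, int arg)
{
    maker_arr4 *foo = &makers;  /* same type as the one-line declaration */

    assert(idx >= 0 && idx < 4 && which >= 0 && which < 2);
    return (*(*foo)[idx](NULL))[which](arg);
}
```

Here `maker_arr4 *foo` declares exactly what the single-line form does, and `call_through(0, 1, 5)` goes through every level of the type before calling `thrice(5)`.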
[14:43:19] <mru> don't _ever_ try to use anything like that
[14:43:25] <Honoome> mru: now write some (C++) code that works with that :P
[14:43:45] <mru> functions returning function pointers have weird syntax
[14:44:53] <Honoome> JJ Abrams and flares, damn them :P
[14:45:32] <lu_zero> ok
[14:45:40] <lu_zero> 4 systems more or less configured
[14:45:48] <lu_zero> 20 min left before I must go...
[14:46:01] <mru> ok, demangle this: _Z3barPA4_PFPA2_PFiiEPFiPiEE
[14:46:39] <Honoome> gha I didn't reach A which I guess is array.. :P
[14:49:01] <Honoome> ghaaaa it's a double-function pointer again :P
[14:49:09] <mru> same as before
[14:49:14] <mru> but as argument to a function
[14:49:21] <mru> so one more level of function
[14:49:38] <Honoome> it's almost more readable this way
[14:49:52] <mru> nm -C says bar(int (* (*(* (*) [4])(int (*)(int*))) [2])(int))
[14:55:55] <BBB> mmx didn't need alignment right?
[14:56:01] <BBB> what alignment do sse2/ssse3 need?
[14:56:15] <mru> size of access I presume
[14:56:24] <mru> nothing else would make sense
[14:56:26] <BBB> sse2: 16, ssse3: 32?
[14:56:45] <mru> probably not more than 16 bytes
[14:56:49] <mru> that's unheard of
[14:56:52] <BBB> ok, 16 then
[14:56:52] <BBB> thanks
[14:56:59] <mru> is there an instruction to load 32 bytes?
[14:57:01] <BBB> mmx needs none right?
[14:57:06] <mru> mmx is faster with aligned
[14:57:11] <BBB> ah
[14:57:12] <mru> aligned is always faster
[14:57:13] <BBB> ok, 8 then
[15:18:32] <lu_zero> BBB: you _want_ to have aligned vars
[15:18:56] <lu_zero> altivec is quite educative on that since you have to do the align work explicitly there
[15:19:14] <mru> unaligned altivec loads are "interesting"
[15:19:21] <lu_zero> useful
[15:19:27] <mru> kind of like armv4 unaligned loads
[15:19:37] <Honoome> sparc64 unaligned loads?
[15:19:41] <mru> they're only useful because true unaligned is missing
[15:19:51] <Honoome> has anybody ever seen a niagara box running? o_o
[15:19:59] <mru> there's one on fate
[15:20:12] <Honoome> fun, I thought they dropped out of the face of the earth
[15:20:19] <lu_zero> if you know which part you want to process you can just avoid aligning
[15:21:22] <mru> if the operands are differently aligned you have to mess with one to align the registers
[15:21:36] <mru> like in your average MC
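Editor's sketch of the alignment bookkeeping discussed above, assuming 16 bytes as the alignment that aligned SSE2 loads require. The helper names `is_aligned` and `align_up` and the `scratch` buffer are hypothetical; `_Alignas` is C11, whereas FFmpeg code of this period uses its own `DECLARE_ALIGNED` macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of `align`, which must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Rounds p up to the next multiple of `align` (power of two). */
static uint8_t *align_up(uint8_t *p, size_t align)
{
    uintptr_t mask = (uintptr_t)align - 1;

    return (uint8_t *)(((uintptr_t)p + mask) & ~mask);
}

/* A scratch buffer the compiler places on a 16-byte boundary. */
static _Alignas(16) uint8_t scratch[64];
```

When source and destination are misaligned by different amounts, only one of them can be fixed up this way; the other needs unaligned loads or a shuffle, which is the motion-compensation case mru mentions.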
[15:38:04] <j-b> who works for fluendo here?
[15:38:24] * mru grabs shotgun.... nobody
[15:38:29] <mru> not anymore
[15:39:25] <lu_zero> not me
[15:40:50] <j-b> http://www.moovida.com/download/ because I fail to see the sources
[15:41:43] <kierank> that looks exactly like spotify
[15:43:05] <Honoome> j-b: I think you won't find them
[15:43:11] <Honoome> j-b: http://www.moovida.com/faq/#will-the-diesel-engine-become-open-source
[15:43:20] <j-b> https://code.launchpad.net/moovida is vastly outdated
[15:43:53] <j-b> Honoome: hmm, interesting how they include libdvdnav, libdvdread and libdvdcss that are GPL
[15:44:16] <Honoome> buh that was the only source reference I could find prodding around
[15:44:47] <Honoome> j-b: but don't worry, me and you are still part of the enemies of free software for supporting h264...
[15:44:58] <j-b> oh? I am?
[15:45:11] <j-b> :)
[15:45:13] <Honoome> well you're in #ffmpeg-devel aren't you? :P
[15:45:20] <j-b> Honoome: I am not :D
[15:45:28] <mru> and you're not being flamed
[15:45:37] <j-b> you don't see me, just my very sophisticated bot :)
[15:45:41] <CIA-99> ffmpeg: mru * r23776 /trunk/configure: configure: remove unused function check_foo_config
[15:45:42] <CIA-99> ffmpeg: mru * r23777 /trunk/configure: configure: simplify check_lib function
[15:45:43] <CIA-99> ffmpeg: mru * r23778 /trunk/configure: configure: simplify append function
[15:45:43] <CIA-99> ffmpeg: mru * r23779 /trunk/configure: configure: remove unused flag save/restore functions
[15:45:44] <CIA-99> ffmpeg: mru * r23780 /trunk/configure:
[15:45:44] <CIA-99> ffmpeg: configure: reverse order of -l flags
[15:45:44] <CIA-99> ffmpeg: Adding new libs to the front of the list allows them to resolve
[15:45:44] <CIA-99> ffmpeg: symbols against previously added ones.
[15:45:45] <CIA-99> ffmpeg: mru * r23781 /trunk/configure: configure: remove superflous -lm flags
[15:45:58] <j-b> Honoome: still, I don't understand how they are not violating the GPL
[15:46:16] <Honoome> j-b: don't ask me, I refused the offer a long time ago
[15:47:13] <j-b> and of course, no LGPL, or GPL license but a stupid EULA
[15:47:59] <Honoome> shush, gstreamer is backed by nokia, they can't be evil...
[15:48:29] * Honoome is sarcastic, if you couldn't tell
[15:48:32] <j-b> :)
[15:48:46] <j-b> but, wtf still
[15:52:52] <j-b> an incompatible GPL and LGPL license, and all dlls from open source projects
[15:53:21] <kierank> go to fluendo irc
[16:08:30] <KotH> Honoome: if nokia back gstreamer, they have to be evil!
[16:11:59] <j-b> that has nothing to do with gstreamer, but with fluendo here.
[16:12:01] <BBB> mru: current patch better? I removed the huge VLAs
[16:12:18] <BBB> if this is OK, I'll work on adding jason's patches into it
[16:12:25] <kierank> hmm...where's ruggles and merzbt when you need them
[16:15:03] <lu_zero> uhm?
[16:15:16] <lu_zero> bilboed is collabora's
[16:15:19] <lu_zero> still gst devel
[16:15:46] <BBB> why are we talking gst in this channel
[16:15:50] <BBB> superdump is also collabora
[16:17:18] <lu_zero> uhmm
[16:17:26] <lu_zero> looks like an evolved elisa
[16:17:30] <BBB> btw
[16:17:37] <BBB> lu_zero: didn't fluendo fund you long time ago?
[16:17:43] * BBB holds tar and feathers
[16:17:55] <lu_zero> BBB: didn't do for you as well? =)
[16:18:03] <BBB> I repented
[16:18:12] <lu_zero> thehe
[16:18:21] <lu_zero> well it was even worse
[16:18:32] <BBB> I went into exile and then I went back to the right path
[16:18:40] <lu_zero> I got funds to write the vorbis in rtp rfc
[16:19:19] <lu_zero> so it was fluendo paying me to work with xiph people
[16:20:03] * BBB boils tar
[16:21:27] <lu_zero> ...
[16:22:01] <BBB> btw josh is doing great, how much longer is soc gonna last?
[16:22:14] <BBB> with a little luck he can finish both x-qt and svq3/qdm2 integration
[16:23:06] <lu_zero> other two months more or less
[16:23:21] <lu_zero> bbl
[16:38:16] <mru> BBB: no vlas, nice
[16:42:45] <j0sh_> BBB: yeah, soc isn't even 1/3 over yet :)
[16:48:41] <BBB> mru: can you reply on-list so I can pretend that I had some approval?
[16:48:54] <mru> I can't comment on the asm
[16:48:56] <BBB> if dark_shikari and michael are ok also, I'll commit
[16:49:03] <BBB> but you're a neon asm god
[16:49:18] <mru> this is mmx
[16:49:20] <peloverde> j0sh_, so do you think you will be sticking around after soc? :)
[16:49:37] <mru> peloverde: is that his choice?
[16:49:44] <mru> if we want him, we keep him
[16:49:51] <j0sh_> oh, yeah, i'd like to
[16:50:01] <mru> resistance is futile, you will be assimilated
[16:50:07] <j0sh_> heh
[16:50:13] <BBB> I think we need to make j0sh_ RE some codec next
[16:50:59] <mru> oh yeah
[16:51:00] <j0sh_> BBB: yeah, i've been watching you and Dark_Shikari work on vp8 together, that looks like fun :)
[16:51:34] <BBB> don't forget david (yuvi), he wrote much of the C code
[16:52:01] <j0sh_> cool, he maintains the xiph stuff doesn't he?
[16:52:21] <BBB> yeah
[16:53:18] <BBB> also, mru will write all of the neon asm
[16:53:22] <BBB> he just doesn't know it yet
[16:53:36] <j0sh_> heh
[16:53:57] <j0sh_> actually i have a beagleboard, i've been meaning to get some use out of it
[16:54:02] <j0sh_> i'd be willing to learn
[16:54:16] <mru> BBB: of course I know I have neon code to write
[16:54:30] <mru> unless yuvi beats me to it
[16:54:38] <j0sh_> i can do asm, but i've never done neon, and i'm not that familiar with the theory behind codecs (mc, etc)
[16:54:52] <BBB> does iphone 3G (non-S) support neon?
[16:55:01] <mru> you need only a basic understanding of the codec to write the asm
[16:55:10] <mru> 3gs has neon, not the earlier ones
[16:55:21] <BBB> hm, ok, will buy a 4g then
[16:55:30] <BBB> good excuse to learn neon
[16:55:30] <j0sh_> do some android phones have neon?
[16:55:31] <mru> the phone without a phone
[16:55:33] * j0sh_ has a droid
[16:55:39] <mru> droid should have neon
[16:55:42] <mru> is it snapdragon?
[16:55:49] <j0sh_> no, the other one
[16:55:58] <mru> do you know what cpu it has?
[16:56:08] <j0sh_> lemme look it up, its def a TI
[16:56:18] <mru> omap3?
[16:56:21] <jai> whatever happened to the resident demoscener?
[16:56:27] <mru> basty?
[16:56:31] <j0sh_> cortex a8
[16:56:37] <BBB> he's a little awol, I've complained about that a couple of times
[16:56:39] <mru> j0sh_: has neon
[16:56:43] <mru> same as beagle
[16:56:51] <BBB> I'm not very happy with basty's soc performance so far
[16:57:21] <mru> his biggest skill is boasting about his skills
[16:57:36] <janneg> droid is an omap3430
[16:57:57] <mru> same as beagle
[16:58:30] * j0sh_ can never keep track of all the TI model numbers
[16:58:48] <mru> most of them are the same thing
[16:59:07] <mru> omap3[45]xx are all the same silicon
[16:59:36] <mru> for xx < 30 some units are disabled with fuses
[16:59:59] <j0sh_> ah
[17:00:01] <mru> so 3505 has just the cortex-a8
[17:00:18] <mru> 3515 and 3525 have one each of dsp and sgx
[17:00:31] <mru> 34xx and 35xx are the same
[17:00:52] <j0sh_> mru: you are to TI codenames what DS is to intel codenames :)
[17:01:15] <j0sh_> is there any benefit of writing for the dsp, rather than arm/neon?
[17:01:28] <mru> it's 10x harder
[17:02:11] <mru> the benefit of using the dsp is that it doesn't keep the arm busy
[17:02:22] <mru> for best performance you use both
[17:02:43] <j0sh_> i imagine that phones offload some work to the dsp?
[17:02:46] <janneg> 36xx are the 45nm shrinks
[17:02:59] <mru> 36xx are more than shrinks
[17:03:21] <mru> the sgx is much faster per clock
[17:03:29] <mru> the a8 is upgraded to r3p2
[17:03:41] <mru> various peripherals are upgraded
[17:03:44] <j0sh_> is neon/mmx/etc executed in the same instruction pipeline as regular asm? or can it be done in parallel?
[17:04:05] <mru> neon is in the same instruction stream but runs in its own pipeline
[17:04:16] <mru> so there is fine-grained parallelism there
[17:04:32] <j0sh_> cool
[17:04:35] <KotH> mru: isnt it also possible to get more decoding power per W on the dsp?
[17:04:48] <KotH> mru: or is that just dsp vendor FUD?
[17:04:50] <mru> KotH: depends on what you're doing
[17:05:08] <KotH> mru: 0815 dct based codecs
[17:05:16] <mru> if your algorithm maps nicely onto the dsp instruction set, it's probably more power-efficient
[17:05:35] <mru> it's a much simpler processor
[17:05:45] <KotH> ok, 0815 dct based codecs on a 0815 dsp core
[17:05:56] <KotH> well.. not so simple
[17:06:08] <KotH> current dsps are nothing else but VILW machines
[17:06:27] <KotH> VLIW*
[17:06:43] <mru> yes, and a vliw cpu is much simpler to build than an out of order, superscalar regular cpu
[17:06:50] <mru> but much harder to write code for
[17:07:06] <mru> the C64x pipeline isn't even interlocked
[17:07:37] <KotH> hmm...
[17:07:48] <mru> so be careful with your nops
[17:07:49] <KotH> i never looked at the implementation complexity of an vliw machine...
[17:08:30] <KotH> though the itanium was a fucking huge dump of a vliw machine
[17:08:33] <mru> each very long instruction contains one instruction for each execution unit
[17:08:48] <KotH> i know the idea of vliw
[17:08:58] <KotH> but i havent had a in depth look at it
[17:09:34] <mru> the advantage is that you have complete control over instruction scheduling
[17:09:48] <mru> the disadvantage is that you must control the scheduling completely
[17:10:28] <KotH> i know
[17:10:41] <KotH> btw: what made the itanic so fucking complex?
[17:10:45] <mru> intel
[17:10:59] * KotH remembers how the first versions were fucking huge 250W heaters
[17:11:14] <KotH> when all other CPUs were around 50-70W
[17:11:46] <Honoome> mru: I thought it was HP, the fault
[17:11:54] <mru> well, both
[17:12:07] <sp_> hey guys, can you read this message? this is a test
[17:12:12] <mru> take two large companies, each with their own agenda and stack of patents
[17:12:32] <peloverde_> Is the itanium .bss issue an itanium flaw or is it just the gnu toolchain being reatrted?
[17:12:40] <mru> bit of both
[17:12:50] <mru> 22 bits is rather small
[17:12:57] <peloverde_> sp_, yes
[17:13:09] <mru> peloverde_: I was hoping nobody would answer :-)
[17:13:29] <KotH> er.. no, peloverde_: you should not have answered! ;)
[17:13:40] <KotH> peloverde_: or at least answer with a "no"
[17:13:48] <mru> or a /kick
[17:14:01] <sp_> fuck, /kick would've let me know you could see it too
[17:14:14] <mru> /fuck ?
[17:14:23] <KotH> mru: dont use that on spaam
[17:14:34] <KotH> mru: unless you want to take microchips place
[17:14:39] <mru> no thanks
[17:50:47] <BBB> Dark_Shikari: wake up! :-p
[17:51:11] <Dark_Shikari> what
[17:52:09] <KotH> Dark_Shikari: go back to sleep! :-p
[17:53:06] <mru> /kill -STOP Dark_Shikari
[17:53:40] <BBB> Dark_Shikari: can you ok my mmx patch on the mailinglist? I'm halfway adding your sse2 to it locally
[17:54:07] <Dark_Shikari> and drop the property change obviously btw
[17:54:15] <BBB> the ssse3 functions look lovely by the way
[17:54:17] <Dark_Shikari> that's cygwin sucking
[17:54:23] * BBB wishes he had a ssse3 cpu
[17:54:26] <Dark_Shikari> Don't you love how the ssse3 functions are like
[17:54:29] <Dark_Shikari> five billion times shorter?
[17:54:32] <BBB> I know
[17:54:33] <BBB> so unfair
[17:55:01] <Dark_Shikari> You still need to merge my changes to the dsputil function
[17:55:05] <Dark_Shikari> i.e. where I made your functions more generic
[17:55:09] <Dark_Shikari> _ ## INSTR instead of _mmxext
[17:55:28] <BBB> that doesn't work anymore
[17:55:32] <Dark_Shikari> why?
[17:55:35] <BBB> because I changed the actual asm prototypes
[17:55:41] <Dark_Shikari> .... and?
[17:55:50] <Dark_Shikari> You still need the #defines to be generic
[17:55:55] <Dark_Shikari> they all now have mmx hardcoded
[17:55:56] <Dark_Shikari> this is stupid
[17:56:02] <BBB> the function pointers in VP8DSPContext are different from the asm function pointers
[17:56:09] <BBB> so you can't make a 16 call 8 twice
[17:56:12] <Dark_Shikari> Why?
[17:56:16] <BBB> look at the code
[17:56:20] <Dark_Shikari> That's stupid
[17:56:23] <BBB> yeah, I know
[17:56:26] <Dark_Shikari> Um, fix that
[17:56:29] <Dark_Shikari> that's a bug.
[17:56:33] <BBB> I can't figure out how
[17:56:36] <Dark_Shikari> Huh?
[17:56:36] <Dark_Shikari> wtf
[17:56:41] <Dark_Shikari> I have no idea what you're on
[17:56:44] <Dark_Shikari> "you can't figure out how"???
[17:56:50] <BBB> I might need some coffee
[17:56:53] <Dark_Shikari> How do you make prototypes the same
[17:56:55] <Dark_Shikari> BY CHANGING THE ARGUMENTS
[17:56:56] <Dark_Shikari> hurrrrrr
[17:57:03] <BBB> hush darling
[17:57:05] <Dark_Shikari> seriously, I have no idea what you're talking about
[17:57:09] <Dark_Shikari> Also, your hv width 8 is completely wrong
[17:57:18] <Dark_Shikari> + DECLARE_ALIGNED(8, uint8_t, tmp)[8 * (8 + TAPNUMY - 1)]; \
[17:57:21] <Dark_Shikari> 8 can be height 16.
[17:57:49] <BBB> with the same change, it can't anymore
[17:57:52] <Dark_Shikari> Yes it can
[17:58:02] <BBB> the static functions never call each other anymore with my changes
[17:58:07] <Dark_Shikari> That isn't what I said.
[17:58:15] <Dark_Shikari> First of all, the static functions should call each other
[17:58:20] <Dark_Shikari> but unrelatedly, VP8 supports 8x16 partitions
[17:58:25] <Dark_Shikari> that's a width-8 partition of height 16.
[18:00:08] <BBB> width 16, height 8
[18:00:11] <BBB> the other doesn't exist
[18:00:12] <BBB> just checked it
[18:00:26] <BBB> but you're right that a width 4 height 16 exists :-p
[18:00:30] <BBB> so I'll change it
[18:00:50] <BBB> oh wait I'm misreading
[18:00:51] <BBB> darnit
[18:00:52] <BBB> you're right
[18:01:17] <BBB> there's 16x8, 8x16, 8x8 and 4x4 splits
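Editor's sketch of the buffer-size arithmetic behind the `DECLARE_ALIGNED(8, uint8_t, tmp)[8 * (8 + TAPNUMY - 1)]` complaint, assuming a separable filter where the horizontal pass must produce `taps_y - 1` extra rows for the vertical pass. The function name `hv_tmp_size` and the tap count of 6 used in the note are illustrative assumptions.

```c
#include <stddef.h>

/* Bytes of intermediate storage for an H-then-V filter over a block of the
   given width and maximum height: the V pass reads (taps_y - 1) rows beyond
   the block, so the H pass has to write that many extra rows. */
static size_t hv_tmp_size(int width, int max_height, int taps_y)
{
    return (size_t)width * (size_t)(max_height + taps_y - 1);
}
```

With 6 vertical taps, a width-8 buffer sized for height 8 holds 8 * 13 bytes, while an 8x16 partition needs 8 * 21, so a buffer sized for height 8 would overflow.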
[18:01:41] <BBB> and yes I have to change that if I ever fix the C code to do the correct thing there
[18:01:46] <Dark_Shikari> width 4 height 16 does not exist
[18:01:46] <BBB> consider it done
[18:01:48] <BBB> anything more?
[18:01:53] <BBB> yeah I know I'm smoking bad stuff
[18:01:54] <Dark_Shikari> Anyways, since you refuse to fix your bloody dsp functions
[18:02:02] <Dark_Shikari> I'm going to unilaterally break your local tree ok?
[18:02:04] <Dark_Shikari> I'm fixing the C code.
[18:02:33] <BBB> I'm doing the same thing as h264, what's wrong with that?
[18:02:36] <BBB> the dsp functions are fine
[18:03:02] <Dark_Shikari> What's wrong is src and dst stride are assumed to be the same!!!
[18:03:07] <Dark_Shikari> didn't we tell you this already
[18:03:17] <BBB> only in the dsp function pointer prototypes
[18:03:29] <BBB> and when called from vp8.c, that's indeed true
[18:03:39] <BBB> h264dsp_mmx.c does this too
[18:03:46] <Dark_Shikari> h264 is different
[18:03:51] <Dark_Shikari> it doesn't round between staged filter steps
[18:04:19] <mru> if someone tells you something is wrong, don't try to defend it by saying that something else is _also_ wrong
[18:04:41] <BBB> mru: maybe it means it's not that wrong (?)
[18:04:51] <mru> hardly
[18:04:51] <KotH> what's currently the status of vp8 playback with ffmpeg?
[18:05:02] <mru> KotH: bickering
[18:05:14] <BBB> the problem with adding another function argument is that it becomes a 7-register function, thus needs to be under HAVE_7_REGS, which I think is unnecessary
[18:05:23] <KotH> mru: and that means?
[18:05:27] <BBB> because we never use all 7
[18:05:50] <mru> isn't it pure asm?
[18:05:58] <pengvado> if it's in yasm it doesn't need HAVE_7_REGS
[18:06:15] <pengvado> because yasm doesn't allow gcc's register allocator to get in the way
[18:07:01] <Dark_Shikari> we want to avoid unnecessary wrapper functions
[18:07:05] <Dark_Shikari> esp. with x86_32 calling conventions
[18:07:15] <Dark_Shikari> if we can call asm directly instead of going through a wrapper, so much the better
[18:07:57] <Dark_Shikari> BBB: where is ff_put_vp8_pixels c?
[18:08:03] <BBB> Dark_Shikari: in dsputil
[18:08:27] <Dark_Shikari> why is it there
[18:08:35] <BBB> that's where put_pixels is
[18:09:19] <BBB> (put_pixels{4,8,16}_c)
[18:09:40] <BBB> oh these also merely take one stride argument, so we'll need to fix them too if you want me to change it
[18:10:03] <Dark_Shikari> already done
[18:10:30] <BBB> I would learn a lot more from this if you wouldn't do it for me but would leave me a little time to do it myself
[18:11:04] <Dark_Shikari> You're the one who hasn't done crap for a week
[18:11:08] <Dark_Shikari> and is bikeshedding over fixing your own code
[18:12:08] <Dark_Shikari> I want this done relatively fast, so I don't want bikeshedding getting in the wa
[18:12:12] <Dark_Shikari> *way
[18:12:22] <Dark_Shikari> so I'm using the anti-bikeshed method: just push things through
[18:12:43] <Dark_Shikari> I don't intend to be stealing your thunder -- I just want it done
[18:12:51] <BBB> ??
[18:13:06] <mru> at this rate it's more of a hum than a thunder
[18:13:12] <BBB> you can't call a 1-line argument a bikeshed, but anyway I'm about to start one
[18:13:21] <Dark_Shikari> 1-line arguments are the biggest bikesheds
[18:14:15] <BBB> bla... well whatever let's just get it done otherwise we'll still be flaming tomorrow
[18:14:19] <Dark_Shikari> Committed.
[18:14:34] <mru> \o/
[18:14:42] <Dark_Shikari> the commit message will explain it.
[18:14:56] <Dark_Shikari> yes, I said it -- "commit message" and "explain" in the same sentence.
[18:15:03] <CIA-99> ffmpeg: darkshikari * r23782 /trunk/libavcodec/ (dsputil.c vp8dsp.c vp8dsp.h vp8.c):
[18:15:03] <CIA-99> ffmpeg: Make VP8 DSP functions take two strides
[18:15:03] <CIA-99> ffmpeg: This isn't useful for the C functions, but will allow re-using H and V functions
[18:15:03] <CIA-99> ffmpeg: for HV functions without adding separate H and V wrappers.
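Editor's sketch of why separate source and destination strides let an HV function reuse the H and V functions directly, as the commit message says. The 2-tap averaging filters here are placeholders chosen for brevity and are NOT the real VP8 subpel filters; the names `put_h`, `put_v`, and `put_hv` and the stride-16 temporary are hypothetical.

```c
#include <stdint.h>

/* Horizontal pass: averages each pixel with its right neighbour. */
static void put_h(uint8_t *dst, int dst_stride,
                  const uint8_t *src, int src_stride, int w, int h)
{
    int x, y;

    for (y = 0; y < h; y++) {
        for (x = 0; x < w; x++)
            dst[x] = (uint8_t)((src[x] + src[x + 1] + 1) >> 1);
        dst += dst_stride;
        src += src_stride;
    }
}

/* Vertical pass: averages each pixel with the one below it. */
static void put_v(uint8_t *dst, int dst_stride,
                  const uint8_t *src, int src_stride, int w, int h)
{
    int x, y;

    for (y = 0; y < h; y++) {
        for (x = 0; x < w; x++)
            dst[x] = (uint8_t)((src[x] + src[x + src_stride] + 1) >> 1);
        dst += dst_stride;
        src += src_stride;
    }
}

/* HV for blocks up to 16x16: H from the frame (frame stride) into a packed
   temporary (stride 16), then V from the temporary into the frame. With a
   single shared stride argument neither call could be expressed directly. */
static void put_hv(uint8_t *dst, int dst_stride,
                   const uint8_t *src, int src_stride, int w, int h)
{
    uint8_t tmp[16 * (16 + 1)];

    put_h(tmp, 16, src, src_stride, w, h + 1);  /* one extra row for the 2-tap V pass */
    put_v(dst, dst_stride, tmp, 16, w, h);
}
```

The same signature also lets a width-16 function be expressed as two width-8 calls on `dst`/`src` and `dst + 8`/`src + 8`, which is the "16 calls 8 twice" pattern argued about earlier.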
[18:15:39] <mru> wtf were those functions doing in dsputil.c?
[18:15:48] <BBB> so if I redo this and set height=16 for all h/v8 functions it's fine and I can commit?
[18:16:05] * mru needs to find a gcc flag to stop people adding random weird stuff to dsputil.c
[18:16:11] <BBB> Dark_Shikari: you can probably move them to vp8dsp now, put_pixels() being static was the only reason why they were there iirc
[18:16:16] <Dark_Shikari> I already did.
[18:16:20] <BBB> oh
[18:16:21] <BBB> fine
[18:16:24] * BBB goes work on asm
[18:16:35] <Dark_Shikari> BBB: you should take my patch and modify it, not vice versa
[18:16:35] <mru> actually, that might be possible
[18:16:40] <Dark_Shikari> i.e. your patch should have the sse2 and ssse3 stuff too
[18:16:44] <Dark_Shikari> and the templated vp8dsp loaders
[18:16:59] <Dark_Shikari> If you can't test the ssse3 -- that's _fine_, just send me the patch when you _think_ you've fixed it
[18:17:02] <Dark_Shikari> and I'll test it for you.
[18:17:13] <Dark_Shikari> in short:
[18:17:17] <Dark_Shikari> 1) save local changes
[18:17:21] <Dark_Shikari> 2) revert local changes
[18:17:23] <Dark_Shikari> 3) apply my patch
[18:17:27] <Dark_Shikari> 4) apply your changes
[18:17:34] <Dark_Shikari> 5) try to make it work for all the functions
[18:17:44] <Dark_Shikari> 6) when you think you've done it, and it passes for everything you can test, I'll test the rest.
[18:18:21] <Dark_Shikari> also, fyi, your C code is actually pretty good. like, seriously, I did a microoptimization pass over it and found nothing obvious
[18:18:24] * BBB is about to start another bikeshed
[18:18:33] <Dark_Shikari> What now?
[18:18:37] <Dark_Shikari> Just say it, and we'll resolve it instantly
[18:18:39] <Dark_Shikari> by force if necessary.
[18:19:03] <BBB> you templated h4/6 for no good reason, that's useless
[18:19:15] <Dark_Shikari> what do you mean?
[18:19:18] <BBB> and if I just apply your changes, I don't learn anything, so what's the point then?
[18:19:26] <Dark_Shikari> Sure you learn something
[18:19:29] <Dark_Shikari> a) you wrote the mmx
[18:19:33] <Dark_Shikari> b) you learned how to modify someone else's asm
[18:19:48] <Dark_Shikari> which is often much harder than writing it on your own
[18:20:18] <Dark_Shikari> If you're going to commit it with all my changes removed, I will fix it myself and you will commit nothing.
[18:22:42] <Dark_Shikari> anyways I'll go commit the non-vp8-specific parts
[18:26:00] <Dark_Shikari> intra pred: committed
[18:26:43] <CIA-99> ffmpeg: darkshikari * r23783 /trunk/libavcodec/ (7 files in 2 dirs): 16x16 and 8x8c x86 SIMD intra pred functions for VP8 and H.264
[18:33:19] <CIA-99> ffmpeg: mru * r23784 /trunk/libavcodec/huffyuv.c: huffyuv: make VLAs fixed size
[18:34:38] <Dark_Shikari> BBB: any other issues? we now have everything committed except those core dsp functions (mc, idct)
[18:34:54] <CIA-99> ffmpeg: cehoyos * r23785 /trunk/libavcodec/x86/h264dsp_mmx.c: Cosmetics: Fix indentation.
[18:36:25] <Honoome> Dark_Shikari: how sensible is the improvement for h264? :P
[18:36:57] <Dark_Shikari> dunno, 0.1-3%
[18:37:01] <Dark_Shikari> didn't measure, so guessing
[18:37:04] <Dark_Shikari> and giving a stupidly large range
[18:37:11] <Dark_Shikari> depends on how many intra blocks.
[18:37:48] <Honoome> the improvement is for? SSSE3?
[18:37:51] <Dark_Shikari> everything
[18:38:01] <Honoome> that's nice then
[18:47:40] <Dark_Shikari> oh hurr, I broke compilation
[18:47:59] <Dark_Shikari> on gcc 2.95
[18:48:07] <mru> fix it!
[18:48:20] <Dark_Shikari> I am
[18:48:21] <mru> j-b: we need your bot here :-)
[18:49:36] <CIA-99> ffmpeg: darkshikari * r23786 /trunk/libavcodec/vp8dsp.c: Fix c99ism in r23782
[18:49:53] <Dark_Shikari> oh hurr, I fail wrt makefile hackery
[18:51:04] <Dark_Shikari> feel free to fix my mistake, I don't trust myself with makefiles
[18:51:32] <mru> what have you done?
[18:52:16] <CIA-99> ffmpeg: mru * r23787 /trunk/libavcodec/elbg.c: elbg: remove VLAs
[18:52:40] <Dark_Shikari> I broke compilation with yasm and without gpl
[18:52:48] <Dark_Shikari> because I put a non-gpl-asm file in CONFIG_GPL in the makefile
[18:52:53] <Dark_Shikari> and I don't know how to fix it properly
[18:52:57] <Dark_Shikari> because I suck at makefiles
[18:53:02] <mru> so learn
[18:53:08] <Dark_Shikari> I don't want to break it again
[18:53:14] <mru> I'm looking at it
[18:53:24] <Dark_Shikari> I don't know what CONFIG it should be under
[18:53:26] <Dark_Shikari> is there a CONFIG H264?
[18:53:49] <Dark_Shikari> should it be under MMX-OBJS-$(HAVE_YASM) ?
[18:53:54] <Dark_Shikari> should it be under YASM-OBJS-$(CONFIG_FFT) ?
[18:53:56] <mru> it's used by vp8 as well, right?
[18:53:58] <Dark_Shikari> I have no fucking clue
[18:53:58] <Dark_Shikari> yes
[18:54:15] <mru> then vp8 should _select h264dsp in configure
[18:54:42] <mru> and the relevant files should be under CONFIG_H264DSP
[18:55:10] <Dark_Shikari> I have no idea what the right thing is to do, and I have to go to work
[18:55:13] <Dark_Shikari> so please, somebody fix this.
[18:55:19] <mru> I'm on i
[18:55:20] <mru> t
[18:55:24] <Dark_Shikari> ok
[18:55:25] <Dark_Shikari> thanks
[18:57:41] <mru> deblock and idct are gpl?
[18:59:06] <Dark_Shikari> yes
[18:59:13] <mru> why?
[18:59:13] <Dark_Shikari> they're copypastad from x264
[18:59:29] <mru> who's refusing lgpl?
[18:59:31] <Dark_Shikari> the company I work for is paying to get deblock lgpl'd
[18:59:31] <Dark_Shikari> loren
[18:59:47] <mru> I'm making a mint selling lgpl code btw
[19:00:30] <mru> did you test this code at all?
[19:00:32] <mru> it doesn't build here
[19:00:40] <Dark_Shikari> I tested it with enable-gpl
[19:00:44] <Dark_Shikari> all fate tests are with enable-gpl
[19:00:58] <mru> libavcodec/vp8dsp.c:261: error: 'y' redeclared as different kind of symbol
[19:01:42] <mru> removing the int y line fixes
[19:02:28] <Dark_Shikari> oh I'm a fucking moron
[19:02:43] <mru> I don't see why that's an error though
[19:03:07] <Dark_Shikari> committed
[19:03:13] <Dark_Shikari> it's because y is an argument to the function
[19:03:15] <Dark_Shikari> and we just shadowed a variable
[19:03:20] <Dark_Shikari> fuck it I cannot think this morning
[19:03:27] <Dark_Shikari> I should not be coding, pull my hands off the keyboard =p
[19:03:28] <mru> but that shouldn't be an error normally
[19:03:37] <Dark_Shikari> Redeclaring a variable is an error.
[19:03:48] <mru> not in an inner scope
[19:03:55] <CIA-99> ffmpeg: darkshikari * r23788 /trunk/libavcodec/vp8dsp.c: Really fix r23782
[19:03:56] <Dark_Shikari> it isn't in an inner scope
[19:04:09] <Dark_Shikari> the scope of the function args is the same as the highest level scope in the function
[19:04:25] <mru> no
[19:04:31] <Honoome> mru: gcc 4.5?
[19:04:38] <Dark_Shikari> function( int x ) { int x; } is invalid
[19:04:38] <mru> when did that change?
[19:04:46] <mru> that used to be totally valid
[19:04:59] <Honoome> mru: gcc45 refuses it
[19:05:10] <Honoome> as to whether it changed in the standard, no clue
[19:06:25] <mru> or maybe it is invalid
[19:06:30] <mru> whatever
[19:06:32] <mru> it's fixed
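Editor's sketch of the scoping rule being argued about, on the understanding that in standard C a parameter shares the scope of the function body's outermost block: redeclaring it there is rejected, while redeclaring it inside a nested block is ordinary shadowing. The function names are hypothetical, and the invalid form is kept in a comment so the example still compiles.

```c
/* Valid: the inner braces open a new scope, so this y shadows the parameter. */
static int inner_shadow(int y)
{
    int sum = y;

    {
        int y = 10;   /* a distinct object, visible only inside these braces */
        sum += y;
    }
    return sum + y;   /* the parameter again */
}

/* Invalid, as in the vp8dsp.c build failure quoted above:
 *
 *     static int outer_redeclare(int y)
 *     {
 *         int y;     // error: 'y' redeclared as different kind of symbol
 *         return y;
 *     }
 */
```

Compilers may still warn about the valid form under `-Wshadow`, which is a style diagnostic and not a language error.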
[19:07:07] <Dark_Shikari> cehoyos fixed my makefile
[19:07:11] <Dark_Shikari> I guess he's not here, but either way thanks
[19:07:16] <CIA-99> ffmpeg: cehoyos * r23789 /trunk/libavcodec/x86/Makefile: Fix compilation without --enable-gpl.
[19:07:48] <mru> and he did it wrong
[19:07:51] <mru> damn
[19:07:57] <mru> I'm the makefile maintainer here
[19:08:43] <Dark_Shikari> lol
[19:08:45] * Honoome hands mru a hacksaw
[19:08:52] <Dark_Shikari> BIKESHED BIKESHED
[19:10:21] <mru> I was 10s from committing it properly
[19:10:25] <Dark_Shikari> lol
[19:10:26] <Dark_Shikari> then fix it
[19:10:31] <mru> already done
[19:10:58] <CIA-99> ffmpeg: mru * r23790 /trunk/ (libavcodec/x86/Makefile configure): Make vp8 select h264dsp and use this to pull in mmx intrapred
[19:28:58] <CIA-99> ffmpeg: mru * r23791 /trunk/libavcodec/huffyuv.c: huffyuv: remove unnecessary size argument from generate_len_table()
[20:18:46] <Honoome> mru: what was the reason why you discouraged using precalculated tables? (with the exception of storage space for embedded, that i know)
[20:19:02] <mru> huh?
[20:19:35] <Honoome> mru: you said something against using hardcoded constant tables the other day when I was talking about the huge .bss in ffmpeg
[20:20:04] <mru> hardcoded makes the lib bigger
[20:20:29] <mru> generated can't share the tables at runtime
[20:20:34] <mru> tradeoffs
[20:20:41] <mru> there's a reason we support both
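Editor's sketch of the tradeoff mru describes, using a CRC-8 table as a stand-in. The polynomial 0x07 and all names are arbitrary illustrations and are not the tables or helpers FFmpeg uses for any codec.

```c
#include <stdint.h>

/* Runtime-generated: zero-initialized, so it sits in .bss and adds nothing to
   the file on disk, but each process that runs the init function dirties its
   own private copy of these pages. */
static uint8_t crc8_table_runtime[256];

static void crc8_table_init(void)
{
    int i, bit;

    for (i = 0; i < 256; i++) {
        uint8_t crc = (uint8_t)i;

        for (bit = 0; bit < 8; bit++)
            crc = (uint8_t)((crc & 0x80) ? (crc << 1) ^ 0x07 : crc << 1);
        crc8_table_runtime[i] = crc;
    }
}

/* Hardcoded: const data goes to .rodata, which makes the library larger on
   disk, but the pages are read-only and can be shared between processes.
   Only the first four entries are written out here. */
static const uint8_t crc8_table_hardcoded[4] = { 0x00, 0x07, 0x0e, 0x09 };
```

A table generator run at build time is the usual bridge between the two: the same loop runs on the build host and prints the initializer for the `const` array.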
[20:21:05] <Honoome> so it's okay if I write and submit some more generators?
[20:22:07] <mru> of course
[20:22:45] <Honoome> k will think first if I can get that done automatically even during cross-compile ^^
[20:23:18] <mru> our tablegen framework already supports that
[20:23:50] <Honoome> oh? I missed the introduction of that then
[20:24:18] <mru> last year some time
[20:24:25] <Honoome> yeah that explains it
[20:30:03] <spaam> KotH: <3
[20:30:23] <KotH> mru: could you kick spaam for me? thanks
[20:30:58] <spaam> KotH: :(
[21:08:46] <Dark_Shikari> BBB: got your email
[21:09:00] <lu_zero> who implemented tablegen?
[21:09:16] <Dark_Shikari> BBB: wasn't there that "broken on x86_64 problem" you needed to fix?
[21:17:14] <BBB> lu_zero: reimar, I think
[21:17:15] <Honoome> mru: uh... can I assume that a host with cross-compiling has ffmpeg installed, at all? /me needs libavutil >_<
[21:17:24] <mru> no
[21:17:43] <Honoome> afraid so
[21:17:45] <Dark_Shikari> BBB: ?
[21:17:54] <mru> Honoome: what are you doing?
[21:18:12] <Honoome> wanted to start with an "easy" one (the VLCs are a bit messed up), so I picked up mlp at random
[21:18:18] <Honoome> it has three CRC tables
[21:18:55] <BBB> Dark_Shikari: looking into x264 on how to do that again ;)
[21:19:10] <Dark_Shikari> BBB: I don't actually _know_ why it crashes
[21:19:17] <Dark_Shikari> it should work
[21:19:22] <Dark_Shikari> but either way, the correct method is something like this:
[21:19:32] <Dark_Shikari> %ifdef ARCH_X86_64
[21:19:47] <Dark_Shikari> or actually, crap, win64 is different too
[21:19:53] <Dark_Shikari> Just be lazy and do 7,7
[21:19:54] <Dark_Shikari> instead of 6,6
[21:19:56] <Dark_Shikari> problem solved
[21:20:04] <Dark_Shikari> better solution imo
[21:20:12] <BBB> k
[21:20:14] <Dark_Shikari> BBB: in short, remove "mov r5, r6m"
[21:20:17] <Dark_Shikari> replace r5 with r6 in those functions
[21:20:21] <BBB> right
[21:20:23] <Dark_Shikari> etc
[21:20:32] <Dark_Shikari> Also, it passes on my machine
[21:20:35] <Dark_Shikari> your diff
[21:20:38] <Dark_Shikari> Make sure to test each level of asm, btw
[21:20:41] <Dark_Shikari> I tested ssse3 and sse4
[21:20:45] <Dark_Shikari> I assumed you tested mmx, mmxext, sse2
[21:20:50] <Dark_Shikari> to test each, comment out the loads for sse2
[21:20:53] <Dark_Shikari> then comment out for mmxext
[21:20:53] <Dark_Shikari> etc
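Editor's sketch of the testing trick described above, assuming the usual layout where an init function overwrites function pointers in increasing order of CPU capability. Every name here (`CPU_*` flags, `ExampleDSP`, `example_dsp_init`, the `mc_*` stubs) is hypothetical and does not match FFmpeg's real identifiers.

```c
enum { CPU_MMX = 1, CPU_MMXEXT = 2, CPU_SSE2 = 4, CPU_SSSE3 = 8 };

typedef int (*mc_fn)(void);

/* Stubs standing in for the C and asm implementations. */
static int mc_c(void)      { return 0; }
static int mc_mmx(void)    { return 1; }
static int mc_mmxext(void) { return 2; }
static int mc_sse2(void)   { return 3; }
static int mc_ssse3(void)  { return 4; }

typedef struct ExampleDSP {
    mc_fn put;
} ExampleDSP;

/* Later assignments win, so the best supported version ends up installed.
   Commenting out one of the "loads" below forces the next-lower version to
   run even on a CPU that supports the higher one. */
static void example_dsp_init(ExampleDSP *c, int cpu_flags)
{
    c->put = mc_c;
    if (cpu_flags & CPU_MMX)    c->put = mc_mmx;
    if (cpu_flags & CPU_MMXEXT) c->put = mc_mmxext;
    if (cpu_flags & CPU_SSE2)   c->put = mc_sse2;
    if (cpu_flags & CPU_SSSE3)  c->put = mc_ssse3;
}
```

Masking bits out of `cpu_flags` before the call has the same effect as commenting out lines and avoids editing the init function.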
[21:21:47] <BBB> I tested all of them
[21:22:06] <BBB> I had to do that to test for performance improvements anyway ;)
[21:22:07] <Dark_Shikari> ok, great
[21:22:19] <Dark_Shikari> in that case, fix the mov r6m issue and you are DONE and we can commit :)
[21:22:29] <Dark_Shikari> if you want, give me the patch one more time so I can check the ssse3 ones again
[21:22:33] <Dark_Shikari> (when you're ready)
[21:23:52] <BBB> still have to fix PIC
[21:23:57] <Dark_Shikari> Oh yeah, that.
[21:24:13] <Dark_Shikari> cabac-a.asm has the LOAD_GLOBAL macro, which you can borrow.
[21:24:19] <Dark_Shikari> oh, but that's not quite the same.
[21:24:35] <Dark_Shikari> ok, well, feel free to write up an example solution and I'll check it out for you
[21:26:17] <Honoome> mru: do you think it's possible to have a generic hardcoded version of INIT_VLC_STATIC? seems to be the cause of most of the .bss
[21:29:03] <mru> tricky
[21:29:49] <Honoome> I estimate at least 2 of the 4MB of .bss in libavcodec are caused by that
[21:30:05] <mru> rv34 and ivi*?
[21:30:14] <mru> dv is pretty awful too iirc
[21:30:24] <Honoome> ivi and msmpeg4
[21:30:30] <mru> but what's so bad about bss anyway?
[21:30:48] <Honoome> as you said, it cannot be shared :)
[21:31:05] <mru> do you run hundreds of ffmpeg processes decoding indeo at the same time?
[21:31:40] <Honoome> indeo, no... but testing feng it's not so rare to run over 200 ffmpeg instances...
[21:31:59] <mru> if you don't use indeo, that table will not use any space at all
[21:47:06] <BBB> I don't think I get it... so which part takes care of PIC?
[21:47:17] <BBB> is it just the indirect lea ?
[21:49:31] <Dark_Shikari> PIC means you cannot do [global_constant + REG]
[21:49:42] <Dark_Shikari> for example, this is one way to avoid this:
[21:49:48] <Dark_Shikari> %ifdef PIC
[21:49:56] <Dark_Shikari> lea r5, [const]
[21:50:04] <Dark_Shikari> lea r5, [r5+r2*4+8]
[21:50:07] <Dark_Shikari> %else
[21:50:14] <Dark_Shikari> lea r5, [const+r2*4+8]
[21:50:15] <Dark_Shikari> %endif
[21:50:38] <BBB> oh that's all?
[21:50:43] <Dark_Shikari> That's it.
[21:50:52] <Dark_Shikari> Oh, and on x86_64, where PIC matters, iirc r11 is free for a temp.
[21:50:57] <Dark_Shikari> i.e. you can do whatever you want with it.
[21:51:10] <Dark_Shikari> If you need it. You rarely do.
[21:51:19] <BBB> I can just use r5/r6 in each function
[21:52:10] <Dark_Shikari> to test, all you need is an x86_64 machine.
[21:52:18] <Dark_Shikari> which I strongly suggest you get ssh access to
[21:52:24] <Dark_Shikari> no matter how shitty it is
[21:52:42] <BBB> can I do lea r5, [const + r5]?
[21:54:04] <Dark_Shikari> no.
[21:54:11] <Dark_Shikari> but you can do lea r11, [const]
[21:54:13] <Dark_Shikari> add r5, r11
[21:54:36] <BBB> is pic ever defined for non-64-bit machines in yasm?
[21:54:50] <BBB> (because for 32-bit machines, r11 doesn't exist, right?)
[21:55:31] <Honoome> ebx...
[21:56:06] <Dark_Shikari> BBB: we only support x86_64 pic
[21:56:25] <Dark_Shikari> much to the chagrin of hardened gentoo users
[21:56:38] <mru> and the selinux trolls
[21:56:44] <peloverde> and google
[21:57:20] <mru> I've still not seen a single explanation of why textrels would be a security problem
[21:57:34] <Honoome> I thought it was also part of the debian policy to not allow textrel?
[21:57:40] <Honoome> mru: it's not that they are a security problem themselves
[21:57:53] <Honoome> but to enable textrel, you enable write access to the executable memory areas
[21:57:55] <mru> debian are fading
[21:57:59] <Dark_Shikari> Honoome: no you don't
[21:58:06] <Honoome> thus you cannot lock them up with nx
[21:58:11] <mru> Honoome: and how did you think the code got there in the first place?
[21:58:16] <Honoome> Dark_Shikari: that's what the current implementation does..
[21:58:16] <mru> the kernel puts it there!
[21:58:23] <mru> by WRITING it
[21:58:26] <Dark_Shikari> what mru said
[21:58:33] <Honoome> mru: erm yes, but the relocations are done in userspace by ld.so
[21:58:41] <mru> so?
[21:58:45] <Dark_Shikari> Honoome: and ld.so couldn't, like, set the nx bit back when it's done?
[21:58:54] <mru> remove the write bit, then set the exec bit
[21:59:00] <Honoome> Dark_Shikari: seems like it doesn't..
[21:59:00] <Dark_Shikari> And what mru said.
[21:59:03] <Dark_Shikari> Then that's a bug
[21:59:09] <Dark_Shikari> allowing write-execution is a bug.
[21:59:13] <Honoome> don't tell that to me..
[21:59:21] <mru> Dark_Shikari: x86 is a bug then...
[21:59:30] <Honoome> it's like gcc trampolines that require execstack
[21:59:36] <Dark_Shikari> mru: it's not the job of the arch to stop the kernel from doing stupid things
[22:00:07] <Honoome> the other problem is that you then end up treating .text like .data and causing CoW
[22:00:23] <mru> x86 didn't have an exec bit at all until the late P4s
[22:00:34] <mru> Honoome: so?
[22:00:40] <mru> is that a _security_ problem?
[22:00:44] <Honoome> no
[22:01:02] <BBB> should pw_64 also be accessed using that thing?
[22:01:10] <Honoome> that's a "I would like not to waste 100MB of FFmpeg instances" :P
[22:01:32] <Honoome> otoh my solution to all this is "x86-64"
[22:01:53] <Honoome> to the chagrin of those who insist that 32-bit code is faster
[22:02:39] <mru> when the selinux trolls see something can be forbidden, they have an instinctive knee-jerk reaction to enable that
[22:02:47] <mru> because that has to be more secure, right?
[22:03:29] <Honoome> mru: I'm not really a selinux kind of guy myself
[22:03:41] <mru> good
[22:03:53] <mru> they tend to run fedora
[22:04:00] <mru> or sometimes centos
[22:04:04] <Honoome> yeah well gentoo's selinux support is shitty, so ..
[22:04:09] <mru> if they're hardcore
[22:04:15] <Honoome> I'm actually more interested in things like ssp myself
[22:04:25] <Honoome> it's like PIE and ASLR...
[22:05:03] <mru> I think more effort should be spent on writing correct code to begin with
[22:05:22] <mru> and less on papering over the effects of bad code after it's exploded
[22:05:26] <Honoome> sure you randomize the address space... fine for binary distros.. but how can you tell what are the addresses on _my_ gentoo system?
[22:05:50] <Honoome> and if you can find them by probing at apache, aslr is not going to help me the slightest until I restart it...
[22:06:16] <Honoome> mru: I think ssp itself is a nice way to identify the broken code, but that's about it
[22:06:43] <mru> tools for finding bugs are good
[22:06:54] <mru> they should be used during development
[22:07:21] * peloverde found the logic flaw in aacenc window selection :)
[22:07:50] <Honoome> I'm still uncertain about fortify-source
[22:07:58] <Honoome> it changes the logic around too much
[22:08:14] <spaam> mru: do you use that kind of tool on ffmpeg? :)
[22:08:56] <mru> we run the full test suite under valgrind
[22:10:48] <spaam> using cppcheck or something like that also? :)
[22:13:16] <mru> cppcheck?
[22:13:56] <j-b> astrange: ping
[22:14:01] <Honoome> looks like a static analysis tool
[22:14:20] <mru> probably not very useful then
[22:14:44] <peloverde> We run CSA regularly http://tranquillity.ath.cx/clang/, it found a few bugs the first few times, now it's more or less useless
[22:14:46] <spaam> mru: static code analysis for c/c++
[22:14:52] <spaam> ;D
[22:14:56] <peloverde> However, like mru, it hates VLAs :)
[22:15:06] <mru> good ;-)
[22:15:07] <spaam> peloverde: ok :)
[22:15:08] <Honoome> just the auto-variable tests but that's very rare
[22:16:11] <mru> only a few more vlas left now
[22:17:33] <Honoome> mru: interestingly enough checking for vlas on feng brought up a stupid function that was passed a parameter, used for vla... that was a constant =_=
[22:19:37] <mru> I found several of those in ffmpeg
[22:19:51] <mru> or where the only possible values were 1 and 2
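The VLA cleanups described here (a size parameter that turns out to be a constant, or only ever 1 or 2) can be sketched roughly as follows. This is a minimal illustration, not code from FFmpeg or feng: `MAX_CHANNELS`, `sum_scaled` and its parameters are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound; in the cases described above the size
 * argument only ever took the values 1 or 2. */
enum { MAX_CHANNELS = 2 };

/* Before the cleanup the scratch buffer would have been declared as
 *     int scratch[channels];
 * i.e. a VLA sized by the parameter. With a known bound, a fixed-size
 * array plus a range check does the same job.
 * Returns 0 on success, -1 if channels is out of range. */
int sum_scaled(const int *in, int channels, int gain, int *out)
{
    int scratch[MAX_CHANNELS];
    int sum = 0;

    assert(in != NULL && out != NULL);
    if (channels < 1 || channels > MAX_CHANNELS)
        return -1;

    for (int i = 0; i < channels; i++)
        scratch[i] = in[i] * gain;
    for (int i = 0; i < channels; i++)
        sum += scratch[i];

    *out = sum;
    return 0;
}
```

The explicit range check replaces the implicit assumption that callers never pass a larger size, which is the part a VLA silently hides.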
[22:20:58] <Honoome> now my only huge problem is that the way I'm running the parser over rtsp requests it's a bunch of memmove :(
[22:21:18] <_av500_> mru: but then you can make the param a bool :)
[22:21:28] <_av500_> boo + 1
[22:21:31] <_av500_> +l
[22:21:40] <mru> boo and far?
[22:21:44] <Honoome> because I can't be sure I got the whole request at once ... unless I use sctp with my patched kernel, and FIONREAD
[22:21:59] <_av500_> my coworkers love to typedef bool
[22:22:12] <_av500_> and then to always assign it 0 or 1 and pass to int params...
[22:22:25] <_av500_> but, its a bool...
[22:22:32] <mru> tell them about stdbool.h
[22:22:45] <_av500_> no, they are on java now, they have good bool
[22:22:55] <_av500_> and = 1 is verboten
[22:24:01] <mru> it's fun to set a "boolean" variable to, say, 17 and wait for it to hit some code doing if (foo == true)
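The trap mru describes can be sketched like this, assuming a home-grown integer "boolean" (`my_bool` is a hypothetical typedef, as are the function names). Comparing against `true` only matches the value 1, while a real `bool` from stdbool.h normalizes any non-zero value to 1 on conversion.

```c
#include <assert.h>
#include <stdbool.h>

typedef int my_bool;   /* hypothetical home-grown "boolean" */

/* Fragile: true expands to 1, so a flag holding 17 compares unequal. */
int is_set_fragile(my_bool flag)
{
    return flag == true;
}

/* Robust: any non-zero value counts as set. */
int is_set_robust(my_bool flag)
{
    return flag != 0;
}

/* With a real bool, the conversion itself normalizes to 0 or 1,
 * so even the == true comparison behaves. */
int is_set_stdbool(int raw)
{
    bool flag = raw;
    assert(flag == 0 || flag == 1);
    return flag == true;
}
```

Calling the three functions with 17 gives 0, 1 and 1 respectively; only the first one misreads the flag.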
[22:24:19] <_av500_> yep
[22:24:29] <_av500_> or how they used to use char for a fn param
[22:24:38] <_av500_> because the value is just so small...
[22:24:50] * Honoome would wish that glib3 required C99, and finally dropped guint8 and company... but knows he'd be delusional
[22:25:16] <_av500_> glib3 can only need GC99
[22:25:17] <mru> but that would be removing the very _soul_ of the library
[22:25:32] <_av500_> genau!
[22:26:09] <mru> the quality of a programmer is inversely proportional to the number of standard types he typedefs
[22:27:09] <pross-au> lol
[22:27:52] <mru> typedefs are ok for function pointers
[22:28:22] <mru> otherwise you end up with int (*(*(*(*foo)[4])(int (*)(int *)))[2])(int)
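To see why typedefs help here, the declaration above can be taken apart one layer at a time. The sketch below is only an illustration: every typedef and function name in it is hypothetical, and the only thing taken from the log is the raw declaration of `foo`. Both spellings are initialized from the same object, so the compiler checks that they denote the same type.

```c
#include <assert.h>

typedef int (*leaf_fn)(int);                    /* pointer to function (int) returning int */
typedef leaf_fn leaf_pair[2];                   /* array of 2 such pointers */
typedef int (*callback_fn)(int *);              /* the argument type int (*)(int *) */
typedef leaf_pair *(*factory_fn)(callback_fn);  /* pointer to function returning pointer to leaf_pair */
typedef factory_fn factory_table[4];            /* array of 4 factory pointers */

static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }
static int deref(int *p) { return *p; }

static leaf_pair leaves = { twice, negate };

static leaf_pair *make_leaves(callback_fn cb)
{
    (void)cb;            /* the callback is not needed for this demo */
    return &leaves;
}

static factory_table table = { make_leaves, make_leaves, make_leaves, make_leaves };

/* The same pointer type, spelled both ways. */
static factory_table *foo_typedef = &table;
static int (*(*(*(*foo)[4])(int (*)(int *)))[2])(int) = &table;

/* Walk the whole chain: table entry 0, call it, take leaf 1, call that. */
int call_through_foo(int x)
{
    assert(foo == foo_typedef);
    return (*(*(*foo)[0])(deref))[1](x);
}
```

Read inside out, `foo` is a pointer to an array of 4 pointers to functions that take an `int (*)(int *)` and return a pointer to an array of 2 pointers to functions taking and returning `int`.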
[22:28:35] <Honoome> mru: did you yank that to a file to paste it back at will? :P
[22:28:48] <_av500_> its on F3
[22:28:54] * Honoome would take all the packages that check for the _size_ of integers and similar in autoconf, and KILL THE UPSTREAMS
[22:29:26] <mru> I think that's one of the worst C declarations I've ever seen
[22:29:27] <Honoome> use the bloody standard types, and if one system (anybody said openbsd?) doesn't have those, simply typedefs out of _those_
[22:29:40] <mru> openbsd has them
[22:29:44] <Honoome> mru: since when?
[22:29:55] <mru> ffmpeg builds there at least
[22:29:57] <Honoome> ah sorry maybe openbsd only had them in the wrong headers until little ago...
[22:30:03] <Honoome> inttypes.h vs stdint.h
[22:30:12] <Honoome> I remember that openbsd was one of the few that lacked stdint.h _entirely_
[22:30:23] <mru> pre-c99 there was some confusion over those headers
[22:30:30] <Honoome> but I admit I haven't tried installing openbsd in a while, even in vm.. especially since qemu kills netbsd as well
[22:30:47] <Honoome> mru: confusion okay.. but why guint8 :|
[22:31:04] <j0sh_> http://www.youtube.com/watch?v=KrfpnbGXL70
[22:31:04] <Honoome> it's not even a #define, it's a damn typedef, which means the compiler warns when you mix those in
[22:32:37] <Honoome> j-b: do you happen to wish to comment about building software as universal binaries, btw? :P
[22:32:50] <mru> Honoome: NIH
[22:33:58] <j-b> Honoome: on Mac?
[22:34:19] <Honoome> j-b: well, yeah.. unless you actually want to use fatelf already :P
[22:35:00] <j-b> Honoome: on Mac, I would like to say ***£Dsdqs€€€q#{#{é"|\^**@@@ and FUCK
[22:35:28] <Honoome> j-b: you can join the fatelf idea bashing then ;)
[22:35:42] <mru> Honoome: fatelf is dead
[22:35:47] <mru> nobody talks about it
[22:35:48] <_av500_> j0sh_: lol at the pron scene
[22:36:06] <j-b> Honoome: how do you call a binary with intel32 and intel64 wrapped up? UB? But then how do you explain to moronic users that this UB isn't a ppc UB ?
[22:36:08] <Honoome> mru: I want to make sure no idea like that comes up in the next months
[22:36:13] <mru> btw, for those who don't speak norwegian, the subtitles are accurate
[22:36:54] <mru> j-b: I can haz vlc for m68k?
[22:37:14] <j-b> yes, sure. Here is a compiler
[22:37:51] <j0sh_> _av500_: hilarious,isn't it? :)
[22:37:52] <_av500_> j-b: next vlc has to be a fatelf, you need to bundle an aol cd!
[22:37:53] <mru> for macos8
[22:38:14] <j-b> _av500_: :)
[22:38:32] <Honoome> mru: can we start designing ffexe?
[22:38:37] <j0sh_> j-b: did you ever get ffmpeg networking working?
[22:38:47] <_av500_> ffnet?
[22:38:54] <j-b> j0sh_: I never activated it on Windows builds, IIRC
[22:39:10] <mru> j0sh_: it's only about 10 years late
[22:39:19] <j0sh_> heh
[22:39:27] <Honoome> j-b: while we're all here harassing you, have you fixed live on debian/ubuntu? :P
[22:39:39] <mru> java became the modern-day cobol some 8 years ago
[22:39:42] <j-b> live, as in live55555555555 ???
[22:39:47] <Honoome> that crap yes
[22:39:51] <mru> dead666
[22:40:09] <j-b> Honoome: Sebastien has a patch for the latest live555 crappy change
[22:40:13] <j0sh_> Honoome: he doesn't have to, now that i wrote a patch integrating ffmpeg rtsp in vlc
[22:40:17] <mru> Honoome: we had the idea to write an elf demuxer for lavf
[22:40:28] <Honoome> j-b: what other crappy change?
[22:40:51] <Honoome> j0sh_: good, so soon I can drop the workarounds I added for vlc in feng :P
[22:41:15] <Honoome> mru: okay the demuxer.. but I'm actually tempted to design a new executable format _from scratch_ to work around elf limitations...
[22:41:59] <roxfan> what limitations?
[22:42:00] <_av500_> will it be fat?
[22:42:12] <Honoome> _av500_: definitely not :P
[22:42:39] <Honoome> roxfan: no encoding of abi, uncountable workarounds and vendor-extensions to provide any basic detail.. standardised versioning...
[22:42:52] <BBB> Dark_Shikari: does that compile?
[22:43:16] <roxfan> there's a movement now for using eabi-derived attributes to encode abi specifics
[22:43:41] <Honoome> where's that movement?
[22:44:08] <roxfan> well, maybe i chose a wrong word
[22:44:57] <roxfan> this stuff: http://sourceware.org/binutils/docs/as/GNU-Object-Attributes.html
[22:45:25] <roxfan> it's quite extensible
[22:45:39] <j-b> Honoome: well, I doubt live555 runs on the iPad anyway, so an alternative is welcome
[22:45:58] <Honoome> I meant library ABI, as in the parameters to functions...
[22:46:13] <Honoome> j-b: ipad? O_o
[22:46:18] <j-b> Honoome: why not?
[22:46:30] <roxfan> i'm not sure how a _file format_ is going to help there
[22:46:46] <j-b> Honoome: it could be a nice device for VoD rtsp:// client, no?
[22:47:01] <Honoome> j-b: why is there something that seriously still uses rtsp?
[22:47:11] <Honoome> I thought we were doing that just for our own enjoyment?
[22:47:32] <j0sh_> lol
[22:47:34] <j-b> Honoome: well, all French ISPs are streaming their TVs with rtsp://
[22:47:38] <mru> each architecture has its own properties that might need to be encoded in the file
[22:47:40] <roxfan> you want to implement automatic conversion of function parameters at call time or something?
[22:47:45] <j0sh_> Honoome: i think some network cameras might use rtsp too
[22:47:48] <Honoome> j-b: ah that I didn't know...
[22:47:49] <mru> it makes sense to leave defining those attributes to the vendors
[22:47:54] <_av500_> j-b: freebox ftw
[22:48:01] <j-b> for example :)
[22:48:21] <_av500_> add the fact that it messes up wifi pm
[22:48:31] <Honoome> j-b: next conference in paris I'll try to be there.. by train :P
[22:48:43] <_av500_> train is ok
[22:48:49] <_av500_> at least from FRA
[22:48:56] <j-b> Honoome: could I get then an autograph from you?
[22:48:57] <mru> Honoome: any reason you don't like planes?
[22:49:02] <Honoome> _av500_: from ita :P
[22:49:12] <Honoome> mru: I'm acrophobic.. extends to planes mostly..
[22:49:21] <Honoome> I can accept them if I'm not alone, but alone they still scare the shit out of me :/
[22:49:37] <_av500_> you are seldom alone on a plane
[22:49:43] <Honoome> j-b: sure, bring your gpg fp :P
[22:49:51] <_av500_> unless there are snakes
[22:50:10] <mru> if I found myself alone on a plane I'd be scared
[22:50:20] <Honoome> well "alone" :P
[22:50:31] <j-b> Honoome: nice.
[22:51:36] <janneg> mru: even if it's not flying?
[22:51:52] <mru> janneg: I'd expect at least a pilot
[22:51:53] <BBB> Honoome: radio streams also use rtsp sometimes
[22:52:17] <Honoome> the train from venice is not too bad, by price and time-table... leave at 23, arrives by 12 next day.. single room is acceptable
[22:52:37] <mru> or you could just fly...
[22:53:31] <Honoome> I'm not icarus :P
[22:53:42] <j-b> astrange: do you plan to have hwaccel working with ffmpeg-mt?
[22:53:48] <mru> good, _he_ couldn't fly so well
[22:54:03] <mru> j-b: pointless
[22:54:14] <mru> what is there to mt?
[22:55:02] <j-b> mru: not having 2 different versions of ffmpeg installed?
[22:55:30] <mru> I mean having hwaccel work with -mt should be simply a matter of not doing any mt when using hwaccel
[22:55:34] <janneg> hwaccel doesn't work in -mt?
[22:55:55] <j-b> so far ffmpeg-mt with t=1 doesn't seem to make hwaccel work
[22:57:08] <janneg> I would consider that a bug
[22:59:16] <j-b> one might
[22:59:35] <j-b> also, maybe our lavc module is borken
[23:10:52] <Honoome> mru: aes.c... the [1] vs [4] .. shouldn't that rather change a multiplier rather than declaring a 2-dimensional array? it's _always_ used linearly afaics
[23:11:15] <mru> try telling that to michael
[23:11:45] <Honoome> ouch..
[23:12:54] <lu_zero> uhm =)
[23:13:54] <BBB> Dark_Shikari: did it compile?
[23:14:24] <lu_zero> 00:46 < Honoome> j-b: next conference in paris I'll try to be there.. by train :P
[23:14:31] <lu_zero> you can reach Torino by train
[23:14:43] <BBB> ohwell, weekend time I guess
[23:14:44] <douglas_carmicha> I'm having problems compiling 0.6 on FreeBSD... can anyone take a look at my pastebin output?
[23:14:44] <BBB> later
[23:14:46] <mru> http://thread.gmane.org/gmane.comp.video.ffmpeg.devel/61898
[23:14:48] <lu_zero> then you get tied and brought to the airplane in a cage
[23:14:56] <mru> ^^ there he claims it's a bug in gcc
[23:15:00] <mru> which is patently false
[23:15:01] <Honoome> lu_zero: I can also take a day to look out of the train writing on the laptop
[23:15:07] <lu_zero> douglas_carmicha: sure
[23:15:13] <lu_zero> Honoome: I did
[23:15:21] <mru> douglas_carmicha: -> #ffmpeg
[23:15:25] <lu_zero> you DO not want it
[23:17:22] <douglas_carmicha> http://ffmpeg.pastebin.com/tAt5M6Cm
[23:17:31] <douglas_carmicha> The main error messages are ones like these, when linking ffprobe_g:
[23:17:37] <douglas_carmicha> '/usr/ports/multimedia/ffmpeg/work/ffmpeg-0.6/libavcodec/libavcodec.so: undefined reference to `ff_dnxhd_init_mmx''
[23:18:25] <lu_zero> fun...
[23:19:08] <lu_zero> try master/svn
[23:20:02] <douglas_carmicha> Basically, build svn head without the port?
[23:20:14] <lu_zero> please try that
[23:20:26] <mru> busted shell or make
[23:20:44] <lu_zero> mru: hopefully
[23:20:54] <mru> probably make
[23:21:03] <mru> it's not building any of the x86 files
[23:25:03] <douglas_carmicha> I got a lot of these assembler messages when I tried a basic build with -D__BSD_VISIBLE in the extra-cflags option:
[23:25:08] <douglas_carmicha> '/var/tmp//cciNA3Jg.s:55: Error: `-1(%ebx)' is not a valid 64 bit base/index expression
[23:25:16] <douglas_carmicha> "/var/tmp//cciNA3Jg.s:55: Error: `-1(%ebx)' is not a valid 64 bit base/index expression
[23:25:39] <spaam> douglas_carmicha: what fbsd version do you use? :)
[23:25:44] <douglas_carmicha> 8.0-RELEASE-p3
[23:25:53] <douglas_carmicha> And the port is only ffmpeg 0.5.2.
[23:25:54] <douglas_carmicha> not 0.6.
[23:26:00] <mru> douglas_carmicha: you're in the wrong channel, go away
[23:26:33] <mru> either ask for help in #ffmpeg or file a proper bug report
[23:26:58] <mru> if you're using the ports, you should go to the port maintainer first
[23:33:56] <peloverde> kshishkov, ping?
[23:36:14] <Honoome> so I _could_ have just converted aes to the gentables framework...
[23:51:38] <mru> does anything test the aes code?
[23:53:33] <Honoome> that's a good question
[23:53:51] <mru> it has a little test snippet at the bottom
[23:54:30] <mru> oh well, let's try a small patch
[00:44:54] <Dark_Shikari> someone was poking at me the other day that rgb48 didn't seem to work right
[00:45:04] <Dark_Shikari> I glanced at the code, and, uh.........
[00:45:05] <Dark_Shikari> #define PUTRGB48(dst,src,i) \ Y = src[2*i]; \ dst[12*i+ 0] = dst[12*i+ 1] = r[Y]; \
[00:45:10] <Dark_Shikari> dst[12*i+ 2] = dst[12*i+ 3] = g[Y]; \
[00:45:11] <Dark_Shikari> dst[12*i+ 4] = dst[12*i+ 5] = b[Y]; \
[00:45:15] <Dark_Shikari> I don't think that's quite what RGB48 means?
[00:45:42] <mru> lol
[00:45:55] <mru> who did that?
[00:45:59] <Dark_Shikari> I don't know.
[00:46:08] <mru> git knows
[00:46:10] <Dark_Shikari> But whatever it is, it's fucking funny
[00:46:33] <Dark_Shikari> Kostya.
[00:46:39] <Dark_Shikari> kshishkov: HAHAHAHAHA
[00:47:29] <mru> it's remarkable that software doesn't fail more often than it does
[00:48:04] <mru> everywhere you look, there's a buffer overflow or some other stupid bug
[00:49:37] <mru> looking at a buffer overread in svq1 now
[00:49:52] <mru> unterminated string
[00:50:00] <Kovensky> so he just doubled every byte on rgb24? lol
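Kovensky's reading of the macro is that each 8-bit table value is stored into both bytes of a 16-bit component. The log does not say what the intended output was, so the following only sketches what such byte doubling does numerically. `widen8to16` and `rgb24_to_rgb48_pixel` are hypothetical helpers written for this illustration, not swscale code.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate an 8-bit value into both bytes of a 16-bit value.
 * (v << 8) | v equals v * 257, so 0 maps to 0 and 255 maps to 65535. */
uint16_t widen8to16(uint8_t v)
{
    return (uint16_t)(((unsigned)v << 8) | v);
}

/* Expand one packed RGB24 pixel into three 16-bit components. */
void rgb24_to_rgb48_pixel(const uint8_t *rgb, uint16_t *out)
{
    assert(rgb && out);
    for (int c = 0; c < 3; c++)
        out[c] = widen8to16(rgb[c]);
}
```

Because both bytes are identical, the result is the same regardless of byte order, but it carries no more precision than the 8-bit input did.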
[00:50:02] <kierank> someone should invent a name for a bug that should completely destroy everything but somehow doesn't
[00:50:26] <mru> miracle?
[00:50:31] <kierank> miraclebug
[00:50:52] <mru> not as funny as heisenbug
[00:51:03] <Dark_Shikari> nobody used yuv -> rgb48
[00:51:04] <Dark_Shikari> so nobody ever saw it
[01:59:48] <kierank> lol michael
[02:00:35] <kierank> "The future is in perpetual motion, iam not capable to say anything
[02:00:35] <kierank> with certainity but i feel some remote danger.
[02:00:36] <kierank> "
[02:00:44] <kierank> etc
[02:03:14] <beandog> heh
[02:03:15] <beandog> just saw that
[02:03:16] <beandog> :)
[02:03:53] <roxfan> kierank: schroedinbug is somewhat close
[02:37:50] <Compn> oh wow
[02:37:52] <Compn> new mn quote
[02:38:24] <Compn> Re: [FFmpeg-devel] [PATCH 03/12] mdct: remove temporary array in ff_kbd_window_init()
[04:44:51] <kshishkov> Dark_Shikari: I just committed it, not invented. Feel free to improve
[04:54:25] <Dark_Shikari> kshishkov: "improve"?
[04:54:27] <Dark_Shikari> it's completely wrong
[04:54:36] <Dark_Shikari> it doesn't even do what it says it does
[04:54:43] <Dark_Shikari> if you didn't write it, harass the person who did
[04:54:47] <Dark_Shikari> or at least tell mru so he can
[05:47:51] <CIA-99> ffmpeg: siretart * r23747 /branches/ (0.6 0.6/libavcodec/aacsbr.c):
[05:47:51] <CIA-99> ffmpeg: 10l: aacsbr: Fix f_master[2] calculation when k2diff == -1.
[05:47:51] <CIA-99> ffmpeg: backport r23660 by alexc
[05:53:41] <KotH> moin
[05:54:03] <av500> grüezi
[05:59:20] <thresh> woah, someone paid a lot of money to start a new wave of fighting-with-criminal-militia in Russia, http://russian-untouchables.com/eng/
[06:00:27] <thresh> this will surely end up with no results, but still..
[06:47:32] <wbs> superdump: does the latest patch in the libvorbis >2 channels thread look ok?
[06:51:15] <superdump> responded
[06:51:45] <superdump> and to answer your next question - if there is no maintainer in the maintainers file, it's michael
[06:52:04] <wbs> I think yuvi agreed to be maintainer for it
[06:52:19] <wbs> but he seems to be unavailable at the moment
[06:54:31] <CIA-99> ffmpeg: mstorsjo * r23748 /trunk/configure:
[06:54:31] <CIA-99> ffmpeg: Fix dependencies for the ra_144 encoder
[06:54:31] <CIA-99> ffmpeg: Patch by Francesco Lavra, francescolavra at interfree dot it
[06:56:45] <CIA-99> ffmpeg: mstorsjo * r23749 /trunk/libavformat/smacker.c:
[06:56:45] <CIA-99> ffmpeg: Correctly return EOF for smacker videos
[06:56:45] <CIA-99> ffmpeg: Patch by Alexei Svitkine, alexei dot svitkine at gmail
[07:36:02] <Tjoppen> ooh.. reordered_opaque. too bad it isn't used when encoding
[07:36:39] <astrange> coded_frame->pts or something
[07:36:53] <Tjoppen> yes, but that doesn't "survive"
[07:37:26] <Tjoppen> as in, that's not what comes out of coded_frame->pts when you get a packet (you get reordered synthetic PTSes)
[07:44:02] <Tjoppen> I'm also a bit unsure whether demuxed packets are supposed to be able to have dts == AV_NOPTS_VALUE
[07:45:14] <Tjoppen> oh, in AVPacket it says it can be that. nm
[07:45:37] <CIA-99> ffmpeg: vitor * r23750 /trunk/libavcodec/ (4 files in 2 dirs): SSE-optimized MP3 floating point windowing functions
[08:04:23] <mru> morning
[08:07:57] <Tjoppen> morrn
[08:08:05] <Tjoppen> fika \o/
[08:09:16] <av500> mru: os/2 is coming back: http://searchdatacenter.techtarget.com/news/article/0,289142,sid80_gci15085…
[08:12:36] <spaam> Tjoppen: o/
[08:12:54] <mru> gaaaah, they talk about corba
[08:12:57] <KotH> av500: rotfl
[08:13:05] <KotH> av500: ibm doesnt own os/2 anymore
[08:13:19] <KotH> av500: they sold everything a few years ago
[08:13:34] <mru> ecomstation
[08:13:40] <spaam> KotH: you bought it ?
[08:14:15] <KotH> spaam: see mru's acausal reply to your question
[08:14:41] <spaam> KotH: :)
[08:15:26] <kshishkov> wasn't OS/2 partially owned by M$ which prevented IBM doing some thing with it?
[08:16:19] <mru> no iirc
[08:16:34] * kshishkov is happy with the though though that if OS/2 rises from the dead there will be FFmpeg on it
[08:16:41] <kshishkov> *thought
[08:17:45] <mru> ms may have owned some bits for windows compat
[08:18:27] <kshishkov> they designed it in collaboration too
[08:18:36] <mru> and they had a bizarre deal that somehow prevented os/2 from being deployed widely before it was too late
[08:18:40] <kshishkov> though OS/2 turned out to be more stable
[08:19:02] <mru> os/2 isn't bad as such
[08:19:18] <mru> apart from a terrible user interface and an absence of apps
[08:20:12] * kshishkov heard it performed quite well in ATMs
[08:21:10] <mru> yes, those problems don't matter there
[08:21:17] <mru> they put their own ui on it anyway
[08:21:22] <mru> and they only need one app
[08:22:28] <av500> mru: os/2 main problem was that it needed 8MB which at that time meant like $1000
[08:22:42] <av500> whereas win3.11 ran fine on 2-4
[08:22:56] <mru> and that it never quite seemed to be ready
[08:24:37] * kshishkov looks forward to FFmpeg 1.0
[08:25:10] <mru> I think that's comparable to the speed of light
[08:25:18] <mru> impossible to achieve
[08:26:48] <roxfan> you can get infinitely close to it but your mass will get infinitely high :/
[08:27:30] <CIA-99> ffmpeg: mru * r23751 /trunk/libavcodec/alac.c: alac: change VLAs to fixed size
[08:28:51] * kshishkov waits for FFmpeg 0.9999999999999 then
[08:36:26] <Vitor1001> mru: why?
[08:36:55] <Vitor1001> It uses 5 regular registers and the mmx ones...
[08:37:10] <mru> they're in/out
[08:37:13] <mru> count as two
[08:37:18] <mru> look at fate
[08:38:22] <Vitor1001> hmm, I'm a noob in asm, so can I make them read-only?
[08:38:37] <Vitor1001> I modify the buffers but not the pointers
[08:38:40] <mru> let me have a look
[08:38:48] <Vitor1001> ok, thanks
[08:39:28] <mru> I didn't read the asm
[08:39:44] <mru> x86 asm gives me rashes
[08:40:09] <Vitor1001> This one shouldn't. There are no hacks to work around missing instructions.
[08:40:09] <mru> anyway, I see what you mean
[08:40:28] <mru> change all the operands to "r"(foo)
[08:40:34] <mru> and add a :
[08:40:41] <mru> so :: "r"(win1a), ...
[08:40:52] <Vitor1001> ok. I don't know why, but gcc asm constraints look like black magic to me :p
[08:41:00] <mru> they actually make sense
[08:41:07] <mru> when you know where they come from
[08:41:32] <mru> it's the same syntax used in the machine description files
[08:42:00] <Vitor1001> ugh, count is modified.
[08:42:01] <mru> the asm block is simply dropped into the rtl tree as a fancy instruction
[08:42:11] <Vitor1001> So I have to invert the register numbers, no?
[08:42:20] <mru> yep
[08:42:33] <mru> this is why many of us prefer to write pure asm
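The constraint rules discussed here can be sketched in a small GCC-style extended asm block. Operands that are only read go after the second colon as `"r"` inputs, while an operand the asm modifies has to be declared as an output (`"+r"` for read-write). Outputs are numbered first, which is why the operand numbers shift once something like `count` becomes an output. This is an illustration under assumptions, not the code from the pastebin: `advance` is a hypothetical function, and `intptr_t` stands in for FFmpeg's `x86_reg` as a register-sized integer.

```c
#include <assert.h>
#include <stdint.h>

/* Add step to offset using one x86 instruction where available. */
intptr_t advance(intptr_t offset, intptr_t step)
{
    assert(sizeof(intptr_t) == sizeof(void *));
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* %0 is the read-write output, %1 the read-only input.
     * A plain "add" lets the assembler pick the operand size from the
     * register names, so the same line assembles for 32-bit and 64-bit
     * registers. */
    __asm__ ("add %1, %0"
             : "+r"(offset)   /* modified by the asm: must be an output */
             : "r"(step)      /* only read: plain input */
             : "cc");         /* add changes the flags */
#else
    offset += step;           /* portable fallback for other targets */
#endif
    return offset;
}
```

Writing `addl` instead would pin the instruction to 32-bit operands, which is the kind of mismatch that produces an "incorrect register used with `l` suffix" error when the compiler hands the asm a 64-bit register.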
[08:45:03] <Vitor1001> mru: This would work? http://ffmpeg.pastebin.org/356016
[08:45:42] <Vitor1001> I mean, in my box it works fine...
[08:45:46] <mru> it should be fine
[08:45:52] <Vitor1001> ok, will commit it.
[08:46:35] <mru> great
[08:46:42] <mru> should get some of fate back on track
[08:47:17] <Vitor1001> I hope so.
[08:47:30] <mru> but you still have errors on some systems
[08:47:40] <mru> Incorrect register `%r9' used with `l' suffix
[08:47:41] <CIA-99> ffmpeg: vitor * r23752 /trunk/libavcodec/x86/mpegaudiodec_mmx.c: Fix asm constraints in apply_window()
[08:48:25] <Vitor1001> :p
[08:48:37] <Vitor1001> And what is that supposed to mean?
[08:48:57] <mru> it means addl $16, %r9 isn't valid
[08:49:21] <Vitor1001> And how can it be valid on my box?
[08:49:29] <mru> 32-bit?
[08:49:34] <Vitor1001> Yes.
[08:49:35] <astrange> r9 isn't a 32-bit register
[08:49:48] <mru> replace addl with add
[08:50:02] <Vitor1001> Ok, will check.
[08:50:14] <astrange> (that is, it doesn't exist in 32-bit)
[08:50:49] <mru> the important thing here is that addl (note the l) only works with 32-bit registers
[08:51:03] <mru> %eax, not %rax etc
[08:51:12] <Vitor1001> ok, I see.
[08:51:27] <mru> that's why we have the x86_reg type
[08:51:47] <mru> it will use a register name that works with plain add
[08:51:47] <Vitor1001> Ok. "add" works with x86_reg...
[08:51:51] <mru> or something like that
[08:51:58] <Vitor1001> But aren't integers on x64 also 32 bits?
[08:52:13] <mru> %eax is 32-bit, %rax adds 32 high bits
[08:52:51] <mru> astrange: what are the 32-bit names for the high regs?
[08:53:08] <twice11> There is no name for just the top 32 bit of a 64 bit register.
[08:53:18] <mru> that's not what I meant
[08:53:25] <mru> I meant the low half of r8-r15
[08:54:13] <astrange> r9d
[08:54:23] <CIA-99> ffmpeg: vitor * r23753 /trunk/libavcodec/x86/mpegaudiodec_mmx.c: Fix compilation on x64.
[08:54:27] <mru> d? wtf is that supposed to mean?
[08:54:33] <twice11> double word
[08:54:52] <mru> they still believe a word is 32 bits?
[08:54:57] <twice11> no, 16 bits.
[08:55:00] <astrange> they think a word is 16 bits
[08:55:06] <mru> that's what I meant
[08:55:15] <astrange> r9l is taken for the low 8 bits
[08:55:21] <mru> gah
[08:55:28] <mru> and 16 bits?
[08:55:32] <twice11> r9w
[08:55:35] <twice11> http://www.x86-64.org/documentation/assembly.html
[08:55:40] <mru> not going there
[08:55:46] <mru> but thanks
[08:56:15] <mru> I don't see why parts of registers need names in the first place
[08:56:23] <mru> the size only matters for load/store
[08:56:43] <mru> or on x86, any memory operand
[08:56:44] <KotH> mru: old x86ism
[08:57:06] <mru> it made sense when you had %ah as well
[08:57:09] <KotH> mru: where ah+al were together ax which is in turn the lower half of eax
[08:57:10] <twice11> if you modify r9w, the top 48 bits are guaranteed to stay constant.
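The behaviour twice11 mentions can be observed from C with GCC's x86 operand modifiers (`%w0` selects the 16-bit name of an operand's register, `%k0` the 32-bit name). The sketch below is hedged: `write_low16` and `write_low32` are hypothetical helpers, the asm path is only used on x86-64 with a GCC-compatible compiler, and other targets fall back to masks that model the same result. The 32-bit case is added for contrast; on x86-64 a write to the 32-bit name clears the upper half instead of preserving it.

```c
#include <assert.h>
#include <stdint.h>

/* Overwrite the low 16 bits of r; the upper 48 bits are preserved. */
uint64_t write_low16(uint64_t r, uint16_t v)
{
    uint64_t out = r;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ ("movw %w1, %w0" : "+r"(out) : "r"(v));
#else
    out = (out & ~UINT64_C(0xFFFF)) | v;     /* model of the same effect */
#endif
    assert((out >> 16) == (r >> 16));        /* upper 48 bits unchanged */
    return out;
}

/* Overwrite the low 32 bits of r; the upper 32 bits end up zero. */
uint64_t write_low32(uint64_t r, uint32_t v)
{
    uint64_t out = r;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ ("movl %k1, %k0" : "+r"(out) : "r"(v));
#else
    out = v;                                 /* model: zero-extended write */
#endif
    assert((out >> 32) == 0);                /* upper half cleared */
    return out;
}
```

The preserved upper bits in the 16-bit case are what create a dependency on the old register value, which may be part of why mru calls the guarantee a nuisance.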
[08:57:21] <mru> but when only the low part is accessible separately it's totally pointless
[08:57:41] <mru> twice11: and that's mostly a nuisance
[08:57:44] <KotH> mru: there is an ah :)
[08:57:56] <mru> KotH: yes, but only a, b, c, d
[08:58:17] <mru> and there is no high 16-bit in 32-bit reg
[08:58:17] <KotH> mru: there were no more than 4 gp reg in x86 :)
[08:58:19] <kshishkov> mru: look at VFP/NEON registers - at least it's _sane_ lower part of register file you can access there
[08:58:37] <mru> eh?
[08:58:42] <mru> you can access everything there
[08:58:50] <kshishkov> KotH: si/di?
[08:59:15] <mru> KotH: but there is no sih
[08:59:25] <mru> only si and esi
[08:59:39] <mru> and rsi
[09:00:02] <KotH> kshishkov: these are no gp registers
[09:00:15] <mru> depends
[09:00:35] <mru> if you don't use them for their special purposes you can do whatever with them
[09:00:39] <mru> almost
[09:00:46] <mru> you probably can't multiply them
[09:00:48] <KotH> kshishkov: they are (were actually) index registers used for string operations
[09:00:57] <kshishkov> there are no gp-registers at x86 at all - try shl bl, al
[09:01:17] <KotH> mru: nope, they were not accessible in all operations until they got redefined to gp registers in 386
[09:01:46] <mru> but you could always use them to stash a value or so
[09:01:50] <KotH> juup
[09:01:59] <KotH> they were often used as cache for intermediate results
[09:02:01] <mru> and probably do some limited addition and such
[09:02:28] <mru> limited addition != saturating
[09:03:01] <twnqx> they are also perfect in "rep movsb" :P
[09:03:04] <KotH> saturation wasnt introduced until mmx
[09:03:17] <KotH> twnqx: that was their original purpose
[09:03:27] <mru> twnqx: but rep is mostly useless
[09:03:33] <CIA-99> ffmpeg: mru * r23754 /trunk/libavcodec/vp6.c: vp6: convert VLA to fixed size
[09:03:36] <KotH> twnqx: hence their name: source index, destination index
[09:03:42] <KotH> mru: useless? why?
[09:03:59] <twnqx> it's slower than manual looping in many cases
[09:04:10] <mru> it's ucoded in dirty ways
[09:04:18] <KotH> oh.. it became so slow?
[09:04:19] <mru> or rather as cheaply as possible
[09:04:24] <mru> because nobody uses it
[09:04:30] <mru> because it's slow
[09:04:41] * KotH still has an asm book or two that teach how to use rep
[09:04:42] <mru> actually, it's compilers that have trouble using such instructions
[09:04:51] <KotH> really?
[09:04:52] <KotH> why?
[09:04:57] <KotH> what makes them difficult?
[09:04:58] <mru> they suck, didn't you know?
[09:05:48] <KotH> i dont know whether they suck, they taught me asm back in the days of old, when men were real men, women were real women and furry beings from alpha centauri were furry beings from alpha centauri
[09:06:08] <av500> wasnt it small furry beings?
[09:06:22] <KotH> dunno.. been some time since i read it
[09:06:53] <KotH> though, my asm definitely needs some refreshment..
[09:07:19] * KotH didnt learn much about protected mode, never had anything to do with mmx/sse/...
[09:07:26] <mru> good for you
[09:07:39] * mru never did _any_ x86 asm coding
[09:07:41] <KotH> you mean i shall learn arm asm instead? ;)
[09:07:46] <ohsix> repne scasb!
[09:07:47] <av500> yes
[09:07:50] <mru> KotH: that would be wise
[09:07:54] <spaam> haha
[09:08:14] <spaam> any asm that is not x86? :)
[09:08:26] <mru> ohsix: what's that? a klingon curse?
[09:08:34] <av500> for practical purposes that leaves you with arm asm...
[09:08:42] <ohsix> scan something, rep not equal
[09:08:59] <mru> mr worf, scan for life-signs
[09:09:19] <astrange> there's plenty of ppc asm to be written
[09:09:37] <mru> captain, long-range sensors are picking up an unusual rep prefix heading this way
[09:09:40] <ohsix> A common use of the REPNE SCASB instruction is to find the length of a NUL-terminated string.
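The string-length idiom ohsix quotes can be modelled in C. This is only a sketch of what `repne scasb` does with AL = 0: `remaining` stands in for the count register and the pointer for the destination index. The function name `scan_strlen` and the `limit` parameter are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Model of "repne scasb" with AL = 0: compare one byte, advance the
 * pointer, decrement the counter, and stop on a match or when the
 * counter reaches zero. Returns the string length, or `limit` if no
 * NUL byte was seen within `limit` bytes. */
static size_t scan_strlen(const char *s, size_t limit)
{
    size_t remaining = limit;      /* plays the role of the count register */

    assert(s != NULL);
    while (remaining != 0) {
        remaining--;
        if (*s++ == '\0')          /* the "scasb" compare against AL = 0 */
            return limit - remaining - 1;  /* bytes scanned, minus the NUL */
    }
    return limit;                  /* counter exhausted, no terminator found */
}
```

For example, `scan_strlen("hello", 100)` yields 5, and a limit shorter than the string simply returns the limit.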
[09:10:00] <mru> astrange: ppc has become irrelevant
[09:10:13] <ohsix> xbux?
[09:14:35] <kshishkov> ohsix: why y'all forget about "rep {mov,sto}sX" for memcpy/memset?
[09:15:04] <av500> astrange: pps for what?
[09:15:07] <av500> err, ppc
[09:15:19] <mru> kshishkov: memcpy is overrated
[09:15:37] <kshishkov> mru: so is everything else
[09:16:10] <av500> overrated is overrated
[09:16:16] <ohsix> kshishkov: truthfully i forgot everything, repne scasb has been something i've said out of context since the first time i saw it
[09:16:43] <mru> rating is overrated
[09:17:03] <mru> and it still sounds like a klingon curse to me
[09:17:33] <ohsix> it might be, except for the "asb" part
[09:17:50] <ohsix> not familiar with the language but thats a weird construct in any language :O
[09:19:20] <twice11> If I got recent optimization hints correctly, nearly everything is faster by hardcoded loops using XMM than using the string instructions.
[09:20:12] <twice11> And string instructions using a lower width than the maximum width supported by the processor are extremely slow.
[09:20:53] <ohsix> string instructions were a bad idea after the 286 iirc
[09:21:02] <mru> the 286 was a bad idea
[09:21:07] <astrange> strings are a bad idea
[09:21:28] <twice11> That's why memcpy typically has a core like "mov %ecx,%ebx; shr $2,%ecx; rep movsd; mov %ebx,%ecx; and $3,%ecx; rep movsb"
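The memcpy core twice11 describes splits the byte count into whole 4-byte units plus a 0..3 byte tail. A rough C equivalent, assuming non-overlapping buffers, could look like the following. The function name is hypothetical, and the fixed-size inner `memcpy` calls are just the portable way to express an unaligned 32-bit load and store, not part of the original idiom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes as n/4 dwords followed by n%4 single bytes. */
static void *copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = n >> 2;            /* shr $2: number of whole dwords */
    size_t tail = n & 3;               /* and $3: leftover bytes */

    assert(n == 0 || (dst != NULL && src != NULL));

    while (dwords--) {                 /* corresponds to "rep movsd" */
        uint32_t w;
        memcpy(&w, s, sizeof w);       /* unaligned-safe 32-bit load */
        memcpy(d, &w, sizeof w);       /* unaligned-safe 32-bit store */
        s += sizeof w;
        d += sizeof w;
    }
    while (tail--)                     /* corresponds to "rep movsb" */
        *d++ = *s++;
    return dst;
}
```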
[09:21:31] <ohsix> they were compact in representation and that was about it
[09:22:03] <ohsix> with rep it was almost like a function call
[09:22:16] <mru> I know that
[09:22:19] <mru> it's still useless
[09:22:25] <twice11> I think rep stosd/rep movsd was the fastest way to store/copy on Pentium Pro/Pentium II with the right microcode loaded ("fast strings")
[09:22:31] <mru> optimising for something you shouldn't be doing is stupid
[09:22:55] <mru> they had alternative ucode variants?
[09:23:06] <ohsix> they got rid of SEX but added REP :>
[09:23:11] <DonDiego> mru, Dark_Shikari, say: was your mpeg4 simplification stuff helped by michael being trolled into refactoring it all some time ago by my humble trollness?
[09:23:29] <mru> he did?
[09:23:32] <twice11> Intel publishes microcode updates for all processors after the Pentium 1.
[09:23:36] <mru> oh, the h263 split
[09:23:55] <mru> twice11: I know they update the ucode from time to time
[09:24:08] <DonDiego> yes, that's what i was referring to
[09:24:12] <twice11> And I think fast strings were not in the original microcode.
[09:24:24] <mru> I hadn't heard of differently optimised alternatives
[09:24:26] <twice11> At least the BIOS somehow had to "enable" them.
[09:24:47] <mru> oh, it was something they added in some update
[09:24:52] <twice11> There are no alternatives, just "old" and "new" one, and the new one is faster.
[09:25:19] <twice11> The Pentium III added SSE which might beat string instructions again.
[09:25:48] <twice11> Optimizing for code size was important on the 8088 (which really was a bad idea, as it was starving on instruction supply)
[09:26:23] <twice11> Famous quote: "What use is a processor that can perform a 16 bit add in two clock cycles if it needs 14 cycles to fetch that instruction?"
[09:26:25] <mru> optimising memcpy is a largely futile exercise
[09:26:47] <twice11> Because the memcpy interface has no alignment requirements. Or are there further issues?
[09:27:27] <mru> because good code doesn't call memcpy
[09:27:41] <mru> or does it so infrequently that performance doesn't matter
[09:28:29] <twice11> I think any dictionary-based decompressor (Think LZ77) is just limited by memcpy speed for highly redundant data.
[09:29:36] <mru> you can't use memcpy in such code
[09:29:46] <mru> src and dst often overlap
[09:29:59] <mru> and memmove does the wrong thing
[09:30:04] <twice11> Err, right. But memmove is even slower usually.
[09:30:25] <twice11> Because that function has to test the direction of (possible) overlap on each invocation.
[09:30:34] <mru> I know that
[09:30:57] <mru> the point is standard lib functions do the wrong thing
[09:31:02] <mru> so their speed is unimportant
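The overlap problem mru raises can be shown with a minimal sketch of an LZ77-style match copy (the function name and parameters are assumptions, not taken from any particular decoder). When the match distance is shorter than the match length, the copy has to re-read bytes it has just written, so a plain forward byte loop is what gives the intended result. `memcpy` is undefined for overlapping buffers, and `memmove` preserves the original source bytes instead of replicating the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Append `len` bytes to out[] at position `pos`, copied from `dist`
 * bytes back. The caller must guarantee room for pos + len bytes.
 * With dist < len the source and destination overlap, and the forward
 * byte loop deliberately reads bytes written earlier in the same call. */
static void lz_copy_match(unsigned char *out, size_t pos,
                          size_t dist, size_t len)
{
    const unsigned char *src;
    unsigned char *dst;

    assert(out != NULL && dist >= 1 && dist <= pos);
    src = out + pos - dist;
    dst = out + pos;
    while (len--)
        *dst++ = *src++;
}
```

Starting from the two bytes "ab", a match with distance 2 and length 6 produces "abababab"; a distance of 1 turns a single byte into a run.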
[09:43:23] <CIA-99> ffmpeg: mru * r23755 /trunk/libavcodec/ (fft.h mdct.c): Remove VLA in ff_kbd_window_init, limit window size to 1024
[09:45:41] <astrange> [A5
[09:46:15] <av500> [B 6
[09:47:05] <mru> escape codes!
[09:51:16] <astrange> irssi seems to like sending them instead of switching windows recently
[09:52:07] <mru> are you on a laggy ssh?
[09:52:59] <astrange> yes
[09:53:48] <mru> I've seen that effect too
[09:54:57] <ohsix> it'll do it if you mash on pagedown if your buffer is way up, then you type some stuff in too
[09:55:13] <ohsix> ~[6~[6/sb end :>
[09:56:06] <mru> I guess it uses some heuristic to determine what's input and what's navigation
[09:56:51] <mru> if too much is bundled into a single read from the terminal, it gets confused
[09:56:57] <twice11> Navigation commands look like "ESC [ something"
[09:57:03] <ohsix> they do some buffering that detects pastes; recently improved a lot, but i suspect thats where the problem is
[09:57:31] <ohsix> new paste detection works if theres 3 chars in one read, instead of a handful of lines
[09:57:35] <twice11> if the delay between ESC and the rest is too big, ncurses decides that the ESC was stray and doesn't belong to a navigation key.
[10:56:21] <j-b> is there a ffmpeg-mt channel?
[10:56:48] <spaam> ask astrange :)
[10:57:03] <elenril> why should it exist?
[10:57:36] <peloverde> Are we any closer to merging -mt than we were 5 months ago?
[10:58:03] * mru thinks astrange should channel some effort into it
[10:58:13] * elenril thinks mru should help him
[10:58:51] <pJok> http://sphotos.ak.fbcdn.net/hphotos-ak-snc3/hs292.snc3/28268_408780123630_6… that is what i do on a daily basis...
[10:58:59] <peloverde> no, mru needs to rewrite fate
[10:59:20] <pross-au> whats wrong with fate
[10:59:40] <DonDiego> it's not free software for starters
[10:59:49] <DonDiego> also, it sucks
[11:00:04] <pross-au> and your vapourware sucks less?!
[11:00:05] <mru> and mike is single point of failure
[11:00:22] <pross-au> fair enuff
[11:00:24] <DonDiego> i asked mike for the code, he refused to give it out
[11:00:31] <av500> I think fate should also only test free codecs
[11:00:37] <pross-au> Haha
[11:00:38] <mru> pross-au: saying you'd _like_ to do something doesn't make it vapourware
[11:00:56] <pross-au> yeah somebody said it sucks
[11:01:21] <DonDiego> it does
[11:01:33] <DonDiego> i've been asking mike for years to add 'make checkheaders'
[11:01:39] <DonDiego> for some reason this is too hard
[11:01:53] <DonDiego> and i'm being prevented from adding it myself
[11:01:55] <DonDiego> bbl
[11:02:17] <mru> devs should be able to add or change tests without going through $mike
[11:02:24] <av500> +1
[11:02:38] <av500> fata should be in svn/git
[11:02:42] <av500> a->e
[11:02:42] <spaam> fata!
[11:02:43] <mru> that's the plan
[11:02:48] <mru> fatwa
[11:02:51] <pross-au> haha
[11:03:06] <av500> mru: now, after fat elf a fat wa?
[11:04:14] <pross-au> so thats it? everything else non-sucks
[11:04:30] <mru> the basic idea is sound
[11:04:38] <mru> but it needs restructuring
[11:04:55] <mru> and it needs better ways to query the history
[11:05:12] <pross-au> do you use that?
[11:05:16] * mru might enlist lu_zero or Honoome for some web frontend work
[11:05:22] * mru doesn't do web
[11:05:48] * mru has far too much work queued up
[11:05:49] <pross-au> ffmpeg is Web
[11:05:58] <mru> eh no
[11:06:06] <mru> ffmpeg isn't written in php
[11:06:13] <mru> nor does it have a mysql demuxer
[11:06:15] <pross-au> i do all my web programming in c
[11:06:15] <pJok> fate should be part of ffmpeg
[11:06:28] <mru> pJok: it will if I have things my way
[11:06:55] <mru> actually, I'd like to make it a standalone project
[11:07:06] <pross-au> yet another test framework
[11:07:08] <mru> but the test specs go in ffmpeg of course
[11:07:27] <pJok> mru, it doesn't really matter where it is, as long as it works and is changeable
[11:07:55] <mru> pJok: if the framework, as it were, is separate it could be used by others too
[11:08:05] <mru> I know DonDiego wants a good test system
[11:08:58] <pross-au> so php?
[11:09:05] <pross-au> or python
[11:09:09] <pJok> mru, just wondering why mike wont set it free
[11:09:27] <mru> he says it's too embarrassingly ugly
[11:09:35] <mru> and I can easily believe that
[11:09:56] <mru> writing maintainable code is not one of mike's stronger skills
[11:10:21] <mru> he's more of a just-in-time programmer
[11:10:25] <pross-au> Ouch
[11:10:47] <mru> he's great at RE though
[11:10:50] <pross-au> and this is our test system
[11:18:37] <mru> #define FRAME_TIME 1.04489795918367346939
[11:19:21] <kshishkov> add comment - "we are retarded, our seconds need to be longer"
[11:19:34] <mru> // FIXME: horribly broken, but directly from reference source
[11:20:01] <mru> what number is that anyway?
[11:21:08] <kshishkov> sqrt(sqrt(sqrt(sqrt(2)))) ?
[11:22:07] <mru> close, but not quite
[11:22:15] <mru> 1.04427378243
[11:22:30] <kshishkov> ln(2)*1.5
[11:22:55] <thresh> 1.04489795918367346939 * 490 = 512
[11:23:16] <thresh> well that, or 245 of course
[11:23:49] <kshishkov> and it's suspiciously close to 25/24
[11:25:55] <peloverde> 256/245
[11:26:21] <mru> 25/24 struck me as a possibility
[11:26:37] <mru> but that's not close enough
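The guesses above are easy to check numerically. The sketch below only compares the constant against a candidate ratio; the helper name and the 1e-12 tolerance are arbitrary choices, and it says nothing about where the value comes from in the reference source.

```c
#include <assert.h>

#define FRAME_TIME 1.04489795918367346939

/* Returns 1 if FRAME_TIME equals num/den to within 1e-12, else 0. */
static int frame_time_is_ratio(double num, double den)
{
    double diff;

    assert(den != 0.0);
    diff = FRAME_TIME - num / den;
    if (diff < 0)
        diff = -diff;
    return diff < 1e-12;
}
```

With this, 256/245 and 512/490 both match, while 25/24 is off by roughly 0.003.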
[11:26:59] <kierank> http://slashdot.jp/it/10/06/24/0452249.shtml
[11:29:33] <peloverde> the google translate version is almost readable
[11:32:16] <mru> that's readable without translation
[11:32:33] <mru> it says ffmpeg trunk has a vp8 decoder
[11:34:54] <Vitor1001> mru: Why not a "make fate2" for "unofficial" fate tests?
[11:35:15] <mru> Vitor1001: that's the plan
[11:35:30] <mru> and then we'll slip it in as default when nobody is watching
[11:35:39] <Vitor1001> Ah, ok.
[11:35:53] <Vitor1001> So it will still use Mike's web-based framework?
[11:36:06] <mru> no
[11:36:40] <Vitor1001> What I meant (supposed to be very easy) is to add a new target that is the same thing as current "make fate"
[11:36:45] <Vitor1001> but with different test specs
[11:36:57] <mru> that would be easy
[11:37:00] <Vitor1001> All we need then is to convince mike to add a test spec for it to fate
[11:37:07] <mru> but the aggregation of results is important
[11:37:15] <Vitor1001> Yes, that's exactly the point. It's easy!
[11:37:35] <pross-au> mru: how about a standardised output for 'make fate'
[11:37:44] <mru> what do you mean?
[11:37:45] <Vitor1001> And it is easy for Mike or whoever to split it one test number per test afterwards
[11:38:13] <pross-au> mru: so results can be extracted for presentation
[11:38:18] <pross-au> and recording into database
[11:39:01] <Vitor1001> The advantage is that in the meantime we can _vastly_ expand our test coverage.
[11:40:08] <mru> it would take me a few days full-time work to make a new system
[11:40:14] <mru> at least the foundations
[11:40:28] <mru> I hope to have that time soon
[11:40:34] <Vitor1001> And how much time to make a "make fate2"?
[11:40:40] <pross-au> mike started with a very basic system
[11:40:47] <pross-au> and evolved it
[11:40:52] <Vitor1001> You have to count at least two weeks for flames and bikesheds :p
[11:41:00] <mru> mike started with one bad requirement: python
[11:41:06] <kshishkov> it should be able to run ffmpeg -i fate-test-file -f framecrc2 mysql://user:pw@server/results
[11:41:17] <mru> mike's love of python is sometimes detrimental
[11:41:31] <mru> I wasn't planning on using mysql
[11:41:38] <pross-au> JSON
[11:41:50] <mru> how about plain old text for the comms?
[11:41:50] <kshishkov> pross-au: that's for x264 :P
[11:41:56] <pross-au> LOl
[11:42:18] * kshishkov still wants FTP support in FFmpeg
[11:42:27] <Vitor1001> mru: Does mike know that you are seriously thinking about redoing fate?
[11:42:45] <pross-au> Cya
[11:42:51] <mru> Vitor1001: don't know
[11:42:59] <elenril> kshishkov: send patches
[11:43:12] <Tjoppen> kshishkov: me too. is there a reason why not?
[11:43:13] <peloverde> what's wrong with python?
[11:43:19] <Vitor1001> Maybe this will change his mind about making the whole thing more collaborative...
[11:43:21] <thresh> everything
[11:43:36] <Tjoppen> I almost ended up writing one, but hacking around it using libcurl was easy enough
[11:44:45] <kshishkov> Tjoppen: exactly - nobody did it
[11:45:01] <Tjoppen> FTP is cool because it can actually tell you/block on partial files
[11:45:07] <benoit-> license violator winner of the day: posted a message on ffmpeg-user ML (without being registered) to advertise its software: http://www.pavtube.com/hd-video-converter/
[11:45:14] <Tjoppen> unlike POSIX files (BSD has fixed that though)
[11:45:24] <mru> benoit-: lol
[11:45:34] <mru> Tjoppen: tell you what?
[11:45:56] <Tjoppen> tell you that the file is partial
[11:46:03] <Tjoppen> in other words: being uploaded
[11:46:13] <Tjoppen> so you treat the connection as a pipe
[11:46:28] <mru> oh, you mean downloading a file from the same server it's being uploaded to
[11:46:34] <Tjoppen> yep
[11:46:50] <mru> I'd consider that nothing more than a neat gimmick
[11:47:03] <peloverde> Registered in China, what a surprise
[11:47:13] <Tjoppen> libavformat handles that now. I sent in a couple of patches a while that makes reading from stdin work (except a few demuxers)
[11:47:16] <mru> it wouldn't know if the file is being written by some other means
[11:47:21] <Tjoppen> *a while bac
[11:47:47] <Tjoppen> I'm fairly sure FTP, like HTTP, doesn't allow you to change the content in the middle of a file
[11:49:14] <Tjoppen> doing stuff like curl ftp://example.com/foo.avi | ffplay - is pretty neat
[11:51:02] <kshishkov> even better - playing random file via SSH!
[11:51:21] * kshishkov waits for libavcrypt
[11:51:38] <Tjoppen> ah yes. like a lottery - every once in a while you get rick astley instead
[11:57:08] <lu_zero> good morning
[11:57:16] <lu_zero> kshishkov: write it =P
[11:58:25] <spaam> Tjoppen: like this http://spaam.se/secret/mru_and_koth_love.jpg ? :)
[11:58:46] <spaam> and yes you will get rickrolld..
[12:03:12] <mru> spaam: not if you check the headers first
[12:03:47] <mru> hehe youtube... Server: wiseguy/0.6.2
[12:06:53] <Tjoppen> luckily flash doesn't work on my machine
[12:08:52] <kshishkov> it's no luck, it's result of your own actions
[12:09:27] <kshishkov> like Firefox crashing when trying Flash on Gdium
[12:09:41] <Tjoppen> no, it actually does work occasionally
[12:10:02] <av500> kshishkov: those 3 things together are just asking for trouble...
[12:10:04] <spaam> mru: sure :) but not everyone do that :)
[12:10:13] <Tjoppen> I had a greasemonkey script that replaced the player with totem or mplayer, but it doesn't work any more
[12:10:40] <Tjoppen> youtube-dl does though, although a bit awkward
[12:10:46] <peloverde> running FFmpeg with arbitrary network input unsandboxed scares me
[12:11:25] <kshishkov> av500: even independently
[12:11:43] <av500> thats what I meant
[12:12:05] <kshishkov> spaam: being lazy is not an excuse to parse your HTTP responses in telnet window
[12:12:52] <lu_zero> kshishkov: you don't have a tcpdump as background?
[13:42:43] <BBB> peloverde: is there a webm irc channel?
[13:43:04] <av500> #webm
[13:43:51] <BBB> I guess I needed codec :)
[13:44:12] <av500> right
[13:44:26] <mru> #ffmpeg-devel ?
[13:44:51] <BBB> wbs: reviewing j0sh' patches now
[13:44:57] <BBB> mru: good point
[13:48:18] <Dark_Shikari> BBB: #vp8
[13:48:32] <Dark_Shikari> BBB: progress on asm?
[13:49:36] <BBB> it's working, sorry, can't spend full days on this on weekdays, have work to do, but it's progressing, it'll be done this weekend
[13:50:50] <Dark_Shikari> ah ok
[13:51:29] <av500> BBB: I just joined #vp3 - #vp7 as well :)
[13:51:59] <BBB> I've had a good read through your sse2/ssse3 asm, I get most of it, probably will document it a bit before I commit it though (just so I get it)
[13:52:14] <janneg> the official website of vp7 is vp7.de
[13:52:17] <janneg> ;)
[13:52:17] <Dark_Shikari> BBB: of course
[13:52:22] <Dark_Shikari> you've got to have some questions though =p
[13:52:29] <BBB> I will :-p
[13:52:33] <av500> janneg: yep
[13:53:33] <av500> Dark_Shikari: you are sure they still have vp7 source? the files on the smb share have now been edited into vp8 src code...
[13:53:48] <Dark_Shikari> lol
[13:53:56] <Dark_Shikari> "version control, what's that?"
[13:54:47] <mru> alternatively you can use clearcase
[13:54:53] <mru> it actually stores all the old versions
[13:55:03] <mru> but in a totally useless way
[13:59:11] <KotH> worse than vss?
[13:59:29] <mru> more expensive at least
[13:59:55] <mru> I've never encountered vss
[14:00:05] <wbs> vss is .. interesting
[14:00:10] <wbs> or at least was 8 years ago
[14:00:27] <mru> isn't that the one that corrupts the entire repo once a week?
[14:00:33] <wbs> one user locks a file, preventing all other team members from modifying the same file (within their own working copies)
[14:01:16] <KotH> well.. vss is to cvs what cvs is to git
[14:01:50] <kshishkov> mru: yes, especially if more than one guy uses it
[14:02:00] <mru> fundamentally, clearcase is cvs
[14:02:13] <mru> it works on files
[14:02:17] <av500> wbs: it's lol, but at some stage our CEO asked if that was possible, ppl editing files but not having the src code :)
[14:02:50] <wbs> it's major fun when one user has locked the file and gone home for the weekend ;P
[14:02:53] <mru> but they've added layers on top making it nearly impossible to do anything useful
[14:03:12] <kshishkov> sounds like user-friendly
[14:03:18] <mru> another "nice" one is pvcs
[14:03:19] <wbs> yeah, clearcase makes git seem dead simple (which it is, IMO, too ;P)
[14:03:32] <mru> it keeps _no_ track of where the local copy came from
[14:03:42] <mru> so if you do a checkout
[14:03:51] <mru> then someone else does a checkin
[14:03:52] <av500> it comes from the smb share, where else! :)
[14:03:56] <mru> and you check in
[14:04:02] <wbs> anyone used synergy/continuus?
[14:04:06] <mru> you effectively revert the previous checkin
[14:04:23] <kshishkov> wbs: those are too generic buzzwords
[14:04:24] <mru> av500: actually, that's how they ran it where I worked
[14:04:49] <av500> see, I know the enterprise business!
[14:05:16] <mru> then they mounted an smb share in the uk from the us
[14:05:29] <mru> a full checkout took an entire day
[14:12:34] <wbs> kshishkov: yeah, but they're also names of quite an ugly scm ;P
[14:21:33] <Honoome> wbs: you're not using SOCK_SEQPACKET for your code, are you?
[14:21:51] <wbs> Honoome: umm.. not that I know of at least :-)
[14:21:59] <Honoome> (sctp)
[14:22:04] <wbs> I don't use sctp at all
[14:22:15] <mru> that's lu_zero
[14:22:16] <Honoome> ah okay.. so was Josh doing so?
[14:22:28] <Honoome> mru: yes I know lu_zero is the sctp freak :P
[14:22:33] <wbs> no, I don't think he touches it either, yet at least
[14:22:45] * mru played with sctp a bit many years ago
[14:23:07] <Honoome> it's because lu suggested using seqpacket in feng that I'm making sure that nobody is going to use that in ffmpeg, as I've just been explained that it's not what we need :P
[14:23:27] <Honoome> on the other hand, I fixed FIONREAD/SIOCINQ and hopefully it'll be merged in a future kernel, yai.
[15:10:56] <CIA-99> ffmpeg: mru * r23756 /trunk/libavformat/asfdec.c: asfdec: ensure number of streams is within bounds; remove VLA in asf_read_pts()
[15:23:23] <CIA-99> ffmpeg: benoit * r23757 /trunk/libavcodec/ffv1.c:
[15:23:23] <CIA-99> ffmpeg: Set an opaque alpha value when decoding rgba ffv1.
[15:23:23] <CIA-99> ffmpeg: Patch by Thad Ward coderjoe69?yahoo?com
[15:58:19] <wbs> Vitor1001: btw, your mp3 decoding asm breaks compilation if you use --disable-everything --enable-decoder=mp3
[15:58:40] <wbs> Vitor1001: since it tries to call ff_mpegaudiodec_init_mmx within HAVE_MMX
[16:02:26] <mru> and...
[16:02:52] <mru> what does that function depend on to be built?
[16:03:30] <wbs> and the object file containing that function is only compiled if CONFIG_MP*FLOAT_DECODER is enabled (in lavc/x86/Makefile)
[16:27:18] <peloverde> "bash: ffaacdec: command not found" *headdesk*
[16:27:35] <mru> ?
[16:27:47] <peloverde> http://lists.mplayerhq.hu/pipermail/ffmpeg-user/2010-June/025823.html
[16:27:54] <mru> oh, -user
[16:28:02] <mru> that's why we split it off
[16:28:43] <peloverde> The worst part is I can't tell if that guy is trolling or not
[16:29:16] <wbs> peloverde: probably not, don't underestimate users' cluelessness
[16:29:36] <twice11> famous quote: "Never attribute to malice what can be attributed to stupidity"
[16:54:01] <_av500_> peloverde: not all the ppl get the ff prefix thingy...
[16:54:35] <mru> they are not ffpeople
[16:55:09] <twice11> And have no ffclue :)
[16:55:38] <Dark_Shikari> fffuck them
[16:55:42] <BBB> peloverde: I thought that was awesome, you should not reply and wait for someone to figure it out ;)
[16:55:48] <BBB> that's called a "user community"
[16:55:48] <mru> Dark_Shikari: ffuck
[16:56:03] <mru> words already starting with f get only one extra
[16:56:24] <peloverde> The thread was about building ffmpeg with libfaad so presumably ffmpeg is still the correct binary
[16:57:03] <mru> the best part is most people were probably using ffaac all along
[16:57:20] <mru> it was always the default even if libfaad was enabled
[16:57:57] <twice11> The thread is about two AAC decoding modules. One relies on libfaad and the other (internal) one is called ffaacdec
[17:02:09] <votz> peloverde: how much faster would you say ffaacdec is?
[17:02:20] <mru> votz: on what cpu?
[17:02:39] <peloverde> It's about twice as fast on my core2
[17:02:41] <votz> some modern desktop proc, like an i5 or an i7
[17:02:54] <peloverde> (except for main profile which is super slow) but that's because main profile sucks
[17:03:02] <votz> peloverde: wow, awesome
[17:03:18] <votz> peloverde: does ffaacdec practice black magic to attain such speed improvements?
[17:03:25] <votz> and/or voodoo
[17:03:48] <peloverde> well if you consider inlining get_bits primitives black magic then yes
[17:03:59] <peloverde> but mru gets the credit for that one
[17:04:09] <votz> mru: what does get_bits do exactly?
[17:04:26] <peloverde> The rest of the speed is mostly from av_fft and dsputil
[17:04:29] <votz> probably gets some bits of some kind, that I get ;)
[17:05:19] <mru> ffaac is ~3.6x faster than faad on my i7
[17:06:06] <twice11> get_bits reads a certain number of bits from a bit stream (which for obvious reasons is a byte stream in computer memory)
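To make twice11's description concrete, here is a deliberately simple MSB-first bit reader. It is a sketch only: the `BitReader` type and `read_bits` function are hypothetical and do not reproduce FFmpeg's actual get_bits implementation, which is far more optimised than a bit-at-a-time loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct BitReader {
    const uint8_t *buf;    /* underlying byte stream */
    size_t size_bits;      /* total number of readable bits */
    size_t pos;            /* current position, in bits */
} BitReader;

/* Read the next n bits (0..32), most significant bit first. */
static uint32_t read_bits(BitReader *br, unsigned n)
{
    uint32_t v = 0;

    assert(br != NULL && n <= 32 && br->pos + n <= br->size_bits);
    while (n--) {
        /* select the byte, then the bit inside it, counting from the MSB */
        unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}
```

Reading 3, 5 and 8 bits from the bytes 0xA5 0x0F gives 5, 5 and 15 in turn. Making such primitives inlinable, as peloverde mentions, removes a function call from one of the hottest paths in a decoder.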
[17:06:48] <votz> peloverde: mru: that's quite an impressive improvement
[17:07:40] <mru> faad isn't exactly brilliantly coded
[17:07:43] <Honoome> I wonder, can we make use of the mirror-climbing capabilities of fatelf supporters?
[17:08:01] <mru> there are more than one?
[17:08:22] * mru could use some shotgun target practice
[17:08:44] <Honoome> mru: sure, otherwise I wouldn't even waste the time
[17:09:14] <Honoome> if it was only icculus, it'd be all fine... he's insane, full stop
[17:09:39] <mru> who else?
[17:09:56] <mru> I'd like to see someone argue for fatelf in gentoo
[17:10:11] <mru> that would be the ultimate misachievement
[17:10:17] <Honoome> mru: http://mike.trausch.us/blog/2010/06/23/on-fatelf-or-because-140-characters-…
[17:10:24] <Honoome> but luckily nobody asked gentoo for it
[17:10:57] <Honoome> [and from the other side I posted another reasoning why fatelf makes no bloody sense... if somebody wants to spread it :P http://www.reddit.com/r/linux/duplicates/ciham/a_few_more_reason_why_fatelf… ]
[17:10:58] <mru> (how does firefox manage to spin up the disk every time I touch it?)
[17:11:09] <Honoome> mru: sqlite
[17:11:31] <mru> is there a way to make it stop?
[17:11:43] <Honoome> emerge -C mozilla-firefox
[17:11:51] <mru> other than that?
[17:11:56] <Honoome> not that I know of
[17:12:09] <mru> I don't bloody care if it doesn't sync the db instantly
[17:12:20] <Honoome> but they do! :P
[17:12:23] <twice11> LD_PRELOAD something that kills fsync/fdatasync?
[17:12:41] <mru> or put ~/.mozilla in tmpfs
[17:12:50] <mru> and copy back when it exits
[17:12:52] <Honoome> that's an option
[17:12:54] <twice11> But be aware that the history and the like are stored in binary database files that can explode in your face if they get inconsistent.
[17:13:14] <mru> it's a laptop ffs
[17:13:17] <mru> it has builtin ups
[17:13:52] <mru> holy crap, .mozilla is 125M
[17:14:03] <twice11> It contains the disk cache.
[17:14:23] <twice11> that is on-disk cached file from the internet, not cached disk contents...
[17:15:01] <mru> hmm, there are files here not touched since 2005
[17:15:26] <Dark_Shikari> they were created on the init of .mozill
[17:15:27] <Dark_Shikari> *.mozilla
[17:15:50] <mru> and never deleted when they switch formats
[17:16:12] <Honoome> there are some that have older dates because they are taken from firefox extensions
[17:16:26] * Honoome hates the fact that gentoo mozilla team insists on NOT creating ebuilds for extensions
[17:16:44] <Honoome> my home is not the place for stuff ! why should I keep the dictionaries there?
[17:23:20] <_av500_> Honoome: this guy is so misguided
[17:23:36] <_av500_> what is this small business argument?
[17:29:04] <mru> btw skype are still/again looking for people
[17:29:13] <mru> don't know if permanent or contract
[17:31:00] <Dark_Shikari> well if they want contract, I can help
[17:31:25] <mru> I don't know what skills they're looking for either
[17:31:36] <mru> only that it's arm related
[17:32:27] <mru> someone at arm mentioned it
[17:32:33] <_av500_> Dark_Shikari: they can give you vp7 src code :)
[17:32:55] <Dark_Shikari> mru: well they wanted h264-related stuff a month or two back
[17:33:05] <mru> yeah, I remember that
[17:33:18] <mru> but they didn't like the idea of contracts then
[17:33:24] <mru> maybe they've changed their minds
[17:33:57] <mru> I'll let you know if I hear anything further
[17:34:14] <Dark_Shikari> k
[17:34:14] <peloverde> Looking at their site it seems like they have a bunch of openings
[17:34:30] <BBB> who on earth wants to work in Tallinn?
[17:34:38] <Dark_Shikari> estonia?
[17:34:41] <BBB> yeah
[17:34:47] <Dark_Shikari> I recall that being a pretty awesome place actually
[17:34:47] <mru> when I spoke with them the jobs were in stockholm
[17:34:50] <_av500_> BBB: estonians?
[17:35:04] <Dark_Shikari> fast, cheap internet!
[17:35:08] <BBB> I recall new york being a pretty awesome place
[17:35:15] <Dark_Shikari> new york costs 15 times more to live in
[17:35:30] <BBB> that's probably because it has 15 times more to offer
[17:35:55] <mru> than?
[17:36:02] <BBB> than Tallinn :)
[17:36:03] <_av500_> thats why you have a 15x larger apartment :)
[17:36:12] <Dark_Shikari> >large apartment
[17:36:13] <Dark_Shikari> >new york
[17:36:16] <Dark_Shikari> hahahaha
[17:36:36] <BBB> yeah, you'll get a shoebox here if you're lucky
[17:36:54] <BBB> although rental is quite cheap b/c of the crisis right now
[17:36:55] <_av500_> imagine Tallinn now, you have to live in a teacup
[17:37:38] <BBB> mru: how close are you to london?
[17:37:55] <mru> 1.5 hours
[17:38:15] <BBB> that's not bad :)
[17:38:33] <mru> it's perfectly within range of a day trip
[17:38:38] <mru> last train back is at 1am
[17:38:57] <_av500_> or 6am i guess
[17:39:06] <mru> don't know when they start
[17:39:18] <mru> no later than 6 I guess
[17:40:03] <Dark_Shikari> _av500_: yeah but the tea is good
[17:40:33] <_av500_> in Tallinn?
[17:40:43] <Dark_Shikari> 01:36 <+_av500_> imagine Tallinn now, you have to live in a teacup
[17:40:47] <mru> changing topic, we have a threading related bug or two
[17:41:07] <mru> fate is showing spurious failures on threaded encodes since I enabled pthreads
[17:41:27] <janneg> german rail claims the first train is leaving at 05:30
[17:41:43] <mru> sounds about right
[17:44:41] <BBB> mru, Dark_Shikari: regarding src_stride and dest_stride issue
[17:44:49] <mru> fix it!
[17:44:51] <BBB> mru, Dark_Shikari: shouldn't h264 suffer from the same issue?
[17:45:00] <BBB> we're using a h264 function prototype
[17:45:17] <BBB> or is this because h264 doesn't use intermediate buffers when doing h+v subpel mc?
[17:45:54] <BBB> and if that's the case, should I just shut up and not do that either, rather than adding a function argument that I won't use when implemented correctly/optimally?
[17:46:06] <mru> my neon code uses an intermediate buffer in a couple of cases
[17:46:14] <mru> but the internal functions deal with different strides
[17:46:23] <BBB> oh, you have wrapper functions?
[17:46:30] <BBB> maybe I should do that too?
[17:46:30] <mru> no
[17:46:33] <mru> no wrappers
[17:46:49] <mru> well, tiny ones
[17:47:00] <BBB> that's what I mean :-p
[17:47:03] <mru> for calling h then v
[17:47:08] <mru> no wrappers around single functions
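The arrangement mru describes (internal h and v functions that each take separate source and destination strides, plus a tiny wrapper that calls h then v through an intermediate buffer) might be sketched as follows. Everything here is illustrative: the names, the 4x4 block size and the 2-tap averaging filter are assumptions, not the real VP8 or H.264 motion-compensation code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BLOCK = 4, TMP_STRIDE = BLOCK };

/* Horizontal 2-tap average; reads w + 1 columns of each source row. */
static void filter_h(uint8_t *dst, ptrdiff_t dst_stride,
                     const uint8_t *src, ptrdiff_t src_stride, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (uint8_t)((src[x] + src[x + 1] + 1) >> 1);
        dst += dst_stride;
        src += src_stride;
    }
}

/* Vertical 2-tap average; reads h + 1 source rows. */
static void filter_v(uint8_t *dst, ptrdiff_t dst_stride,
                     const uint8_t *src, ptrdiff_t src_stride, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (uint8_t)((src[x] + src[x + src_stride] + 1) >> 1);
        dst += dst_stride;
        src += src_stride;
    }
}

/* Tiny wrapper: h into a compact temporary, then v into the picture.
 * The temporary has its own stride, unrelated to the picture strides. */
static void filter_hv(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *src, ptrdiff_t src_stride)
{
    uint8_t tmp[(BLOCK + 1) * TMP_STRIDE];

    filter_h(tmp, TMP_STRIDE, src, src_stride, BLOCK, BLOCK + 1);
    filter_v(dst, dst_stride, tmp, TMP_STRIDE, BLOCK, BLOCK);
}
```

Because the stride mismatch is confined to the internal functions, the public entry point can keep a prototype without an extra stride argument, which is the question BBB raises.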
[17:47:14] <BBB> right
[17:48:17] <BBB> I'll look at the neon code for inspiration
[17:48:28] <BBB> on my way to fix it, but will be gone for a bit to watch the dutch play
[17:48:33] <BBB> go netherlands! \o/
[17:48:37] <lu_zero> uh?
[17:49:40] <mru> these crazy dutchmen...
[17:50:10] <lu_zero> mru: what's the impact of the whole vla -> max_sized arrays?
[17:50:20] <mru> impact?
[17:50:25] <mru> no vlas of course
[17:51:07] <lu_zero> in memory/on disk footprint should change a bit
[17:51:25] <mru> probably not
[17:52:11] <mru> I just want to stop people doing stupid things
[17:52:19] <mru> like allocing insane amounts of stack space
[17:52:35] <mru> megabytes in some cases
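The two replacements used in the commits in this log (a fixed-size array with an enforced limit, and a buffer allocated once in the context) can be sketched like this. The names, the 1024 cap and the toy computation are hypothetical and chosen only to show the shape of the change. A VLA such as `int tmp[n]` would instead let an input-controlled `n` decide how much stack is consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_WINDOW 1024   /* hypothetical upper bound on the scratch size */

/* Variant 1: fixed-size array. Stack use is known at compile time and
 * oversized requests are rejected instead of growing the stack. */
static long sum_squares_fixed(const int *in, int n)
{
    int tmp[MAX_WINDOW];
    long sum = 0;

    if (n < 0 || n > MAX_WINDOW)
        return -1;
    for (int i = 0; i < n; i++)
        tmp[i] = in[i] * in[i];
    for (int i = 0; i < n; i++)
        sum += tmp[i];
    return sum;
}

/* Variant 2: scratch buffer owned by a context, allocated once with
 * malloc and released with the context, for potentially huge sizes. */
typedef struct Ctx {
    int *scratch;
    size_t scratch_len;
} Ctx;

static int ctx_init(Ctx *c, size_t n)
{
    assert(c != NULL);
    if (n == 0 || n > SIZE_MAX / sizeof *c->scratch)
        return -1;                     /* reject empty or overflowing sizes */
    c->scratch = malloc(n * sizeof *c->scratch);
    if (!c->scratch)
        return -1;                     /* allocation failure is reportable */
    c->scratch_len = n;
    return 0;
}

static void ctx_free(Ctx *c)
{
    free(c->scratch);
    c->scratch = NULL;
    c->scratch_len = 0;
}
```

Unlike a stack overflow from an oversized VLA, both variants fail in a way the caller can detect and handle.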
[17:53:57] <kierank> am I right in saying that to change the phase term in an mdct, only the postrotate needs to be modified?
[17:55:00] * lu_zero stares at ffmpeg.c
[17:56:08] <peloverde> kierank, that seems correct
[18:00:41] <mru> ffmpeg.c stares back
[18:00:59] <CIA-99> ffmpeg: lu_zero * r23758 /trunk/libavformat/options.c: Remove typo: s/ingore/ignore/
[18:18:02] <CIA-99> ffmpeg: mru * r23759 /trunk/libavcodec/tta.c: tta: replace potentially huge VLAs with malloc/free in context
[18:20:37] <Honoome> mru: out of curiosity do you happen to know if there is any tool that can tell you the size of the stack allocations of a given function statically?
[18:22:58] <mru> not if vlas are used :-)
[18:24:01] <Honoome> what if it gives you the amount or "amount + warning: vlas are used" ? :P
[18:24:55] * Honoome found some huge stacks in feng and would love to know if there are more
[18:25:10] <lu_zero> plumes?
[18:25:52] <Honoome> plumes?
[18:26:02] <mru> fumes
[18:26:05] <mru> flames
[18:26:54] <jai> can i apply a cosmetics patch like http://pastie.org/1017588
[18:27:26] <mru> I wouldn't touch that without asking michael
[18:27:36] <jai> ah, ok
[18:28:03] <Plumeseyes> lu_zero: this way?
[18:28:34] * lu_zero shivers at the idea of diego in a feathery costume
[18:28:48] <Plumeseyes> Monkey Island style?
[18:29:01] <lu_zero> exactly!
[18:36:57] <mru> Honoome: gcc -S for arm puts the frame size in a comment at the start of each function in the asm output
[18:37:29] <Honoome> hmm so it is feasible to expect that DWARF encodes that data as well?
[18:37:54] <mru> dwarf almost certainly does
[18:38:02] <mru> stack unwinding would be tricky without it
[18:38:14] * Honoome checks whether dwarves already has a tool to extract that information
[18:40:49] <Honoome> I'd rather not even start to think about parsing DWARF data myself...
[19:07:34] <Honoome> mru: please don't hate me.. you know if the C++ mangling scheme used by gcc3 is documented somewhere beside its source code?
[19:07:48] <Dark_Shikari> http://agner.org/optimize
[19:08:06] <mru> Honoome: it should be someplace more official
[19:08:32] <Honoome> Dark_Shikari: ah thanks :)
[19:08:50] <Honoome> mru: well, google's "gcc3 c++ mangling" didn't bring up any doc from the gcc website at all :|
[19:10:01] <lu_zero> Honoome: mangling?
[19:10:10] <Honoome> lu_zero: that's the name of the procedure
[19:14:47] <lu_zero> wikipedia doesn't give any better link
[19:15:00] <mru> well, it is gcc after all
[19:16:17] <Honoome> and C++! :P
[19:16:51] <lu_zero> well it states that they follow microsoft convention
[19:17:12] <lu_zero> http://www.codesourcery.com/public/cxx-abi/abi.html#mangling
[19:17:18] <lu_zero> du de dum...
[19:28:04] <lu_zero> that tells a lot about how messed up c++ is...
[19:37:31] <wbs> lu_zero: did you check j0shs patches yet?
[19:40:02] <wbs> BBB: in case you want to reply to the -user mail you pointed out a few days ago (http://lists.mplayerhq.hu/pipermail/ffmpeg-user/2010-June/025790.html) (i'm not on that list)
[19:40:06] <lu_zero> read yes, tried not yet
[19:40:38] <wbs> BBB: it's a bug in the 0.6 branch, fixed in rev 23344 on trunk. and it's not related to http at all, some generic stuff in lavc that's mostly used when streaming to ffserver
[19:51:10] <j0sh_> what should i work on next? quicktime depacketizer?
[19:51:45] <mru> feel free to review my patches :-)
[19:54:16] <wbs> j0sh_: as a follow-up on this, I'd say refactor out the common code for parsing sdp lines
[19:54:25] <wbs> j0sh_: I've added a note on the wiki list about that
[19:54:27] <j0sh_> ok
[19:54:30] <lu_zero> j0sh_: what you'd like to tackle next?
[19:54:42] <wbs> it shouldn't be big I think, then after that, more depacketizers and packetizers perhaps
[19:54:46] <lu_zero> btw please restart sending reports ^^
[19:54:54] <j0sh_> lu_zero: ah, yes, sorry
[19:55:04] <verb3k> Is libavfilter's fade patch usable atm?
[19:55:23] <j0sh_> patches at the end of the day aren't acceptable reports? :)
[19:55:28] <lu_zero> verb3k: you reminded me that I should dig ffmpeg.c
[19:56:37] <lu_zero> j0sh_: if it's a burden we could quit that, but it still helps me a bit since this week and the next I'll be less reachable
[19:56:53] <j0sh_> lu_zero: what's the rfc for mpeg-4 video? 3016 or 3640?
[19:57:36] <wbs> lu_zero: I've tested the patches a bit, seems to work fine for me, valgrind clean too
[19:57:50] * j0sh_ also tested with valgrind for once
[19:57:54] <wbs> and yes, there's both string.h and strings.h, some odd functions are in the latter
[19:57:58] <wbs> j0sh_: good, good. :-)
[19:58:12] <wbs> j0sh_: isn't the tingling sensation that your code runs without a single remark nice? :-)
[19:58:16] <j0sh_> yeah, i got a warning for something using string.h that went away with strings.h
[19:58:28] <j0sh_> wbs: feels very good indeed
[19:59:12] <lu_zero> j0sh_: I was puzzled since the code you moved didn't have any of the strings functions
[19:59:30] <j0sh_> strcasecmp
[20:00:42] <lu_zero> the patch doesn't have strcasecmp references ^^
[20:01:34] <lu_zero> _maybe_ you should split the s/string/strings lines in a patch alone
[20:02:32] <j0sh_> patch 003 is where i introduced string.h with strcasecmp
[20:02:52] <lu_zero> uhm
[20:03:02] <wbs> j0sh_: the mpeg4 video is rfc 3016
[20:03:12] <j0sh_> i think i realized i needed strings.h after i made the commit
[20:03:16] <wbs> key quote from part of that rfc, "An MPEG-4 Visual bitstream is mapped directly onto RTP packets without the addition of extra header fields or any removal of Visual syntax elements."
[20:03:30] <j0sh_> but i changed nearby lines and didn't want to deal with merge conflicts
[20:03:35] <j0sh_> ...in other words, i was lazy :)
[20:03:50] <wbs> that is, you just chop up the packet into rtp packets, so no explicit parse routine is necessary for that one
[20:04:12] <j0sh_> wbs: that explains why there's no parse_packet needed
[20:04:23] <lu_zero> ^^;
[20:05:14] <j0sh_> lu_zero: will fix it tho. coming right up
[20:07:06] <lu_zero> ok =)
[20:10:22] <j0sh_> uh ohhh... accidentally squashed one of the patches... *mutters*
[20:13:03] <wbs> j0sh_: during a rebase -i?
[20:13:25] <wbs> you may be able to recreate the previous branch state before doing the rebase, by checking through the reflog
[20:18:57] <j0sh_> wbs: alright trying that
[20:20:10] <wbs> not sure if there's any pretty interface to the reflog, but you can at least check in .git/logs/refs/heads/<branchname>, there should be a list of the hashes that the branch head has pointed to
[20:20:36] <wbs> so if you try checkout one of the last hashes before the faulty rebase, you can reset the branch to that hash and redo the rebase properly instead
[20:23:56] <j0sh_> git reflog works :)
[20:24:18] <j0sh_> and yeah, was able to checkout to a new branch based on the sha of the last good commit
[20:24:31] <j0sh_> man, git is AWESOME
[20:24:42] <wbs> :-)
[20:24:58] <lu_zero> =)
[20:25:11] <wbs> yeah, as long as you don't gc and purge the database right after doing something stupid, you very very seldom actually lose something
[20:25:15] <peloverde> indeed
[20:25:40] <peloverde> WTF http://java.dzone.com/dose/dzone-daily-dose-619 Why did they convert the FFmpeg logo to jpeg?
[20:25:56] <wbs> j0sh_: in case the reflog would have been turned off, you could also have done a git fsck and find all loose heads that aren't referenced by any branch head, and then look for the one you want :-)
[20:26:10] <j0sh_> i think i lost half of a day because of something like this a few weeks ago, when i was still getting the hang of git
[20:26:13] <mru> peloverde: because the domain starts with java
[20:26:20] <mru> peloverde: i.e. they're clueless
[20:26:34] <wbs> j0sh_: yeah, you may lose some time until you get the hang of it
[20:26:49] <wbs> j0sh_: actually understanding how it works on a lower level helps a lot in using it effectively though
[20:27:01] <j0sh_> i'm sure
[20:27:56] <wbs> and after using git for a while, you really realize how inferior svn is :-)
[20:28:12] <lu_zero> ;_;
[20:28:17] <j0sh_> i think i realized that after i learned how to wield rebase -i :)
[20:29:03] <wbs> :-)
[20:50:28] <CIA-99> ffmpeg: mru * r23760 /trunk/configure:
[20:50:28] <CIA-99> ffmpeg: configure: add 'warn' function
[20:50:28] <CIA-99> ffmpeg: The 'warn' function records a warning message for display after other
[20:50:28] <CIA-99> ffmpeg: informational messages.
[20:50:28] <CIA-99> ffmpeg: mru * r23761 /trunk/configure: configure: warn about missing yasm
[20:50:29] <CIA-99> ffmpeg: mru * r23762 /trunk/configure: configure: use warn function for unrecognised --cc and --arch settings
[21:07:46] <Dark_Shikari> mru: +100000
[21:08:09] <Dark_Shikari> actually, since we're going to rely on yasm optimizations entirely for vp8....
[21:08:16] <Dark_Shikari> why don't we just require it unless the user does --disable-mmx?
[21:12:43] <BBB> warning is sufficient
[21:12:56] <Dark_Shikari> nobody pays attention to configure-time warnings
[21:12:58] <Dark_Shikari> we had them in x264 for years
[21:13:01] <Dark_Shikari> NOBODY NOTICED EVER
[21:13:05] <Dark_Shikari> they only noticed when it went 10x slower
[21:13:13] <BBB> haha :)
[21:13:18] <Dark_Shikari> seriously
[21:13:20] <Dark_Shikari> we must make it error out
[21:13:23] <Dark_Shikari> or nobody will ever notice
[21:13:28] <Dark_Shikari> they will just think ffmpeg is slow as fuck
[21:13:30] <BBB> I'm fine with an error as long as you can disable the error specifically (like --disable-yasm)
[21:13:45] <wbs> j0sh_: would you mind rerolling the whole series? some of the later parts don't apply on top of the modified versions you posted later
[21:14:29] <Dark_Shikari> BBB: ok, yes
[21:17:58] <mru> Dark_Shikari: suggest on the ML
[21:18:12] <mru> I can't make that change without some agreement
[21:20:22] <Dark_Shikari> done
[21:23:05] <mru> hmm, wonder how many fate machines that will bring down
[21:23:21] <mru> does openbsd have yasm?
[21:23:26] * mru smells a contradiction
[21:23:38] <j0sh_> wbs: ahh ok
[21:24:02] <mru> the fate one does
[21:24:04] <Honoome> hmm my demangler is taking form.. this stuff is damn crap though, if I may say so :P
[21:24:22] <mru> Honoome: can't you just steal one somewhere?
[21:24:31] <j0sh_> wbs: lemme test to make sure they all compile cleanly, one after another
[21:24:43] <Honoome> mru: hard to do given I want to implement it in pure Ruby for consistency
[21:25:05] <Honoome> plus I'm very much curious to see how crazy C++ mangling is ;)
[21:25:12] <mru> dig your own grave if you want...
[21:25:17] <mru> I won't stop you
[21:28:37] <Dark_Shikari> mru: openbsd having yasm would be problematic for them
[21:28:43] <Dark_Shikari> that would be a way for people to compile ssse3 on bsd!
[21:28:45] <Dark_Shikari> and they can't have that.
[21:28:52] * Dark_Shikari grumbles about 7-year-old binutils
[21:29:26] <mru> it's a posix/elf system so it's rather hard to stop someone running yasm
[21:30:06] <iive> Dark_Shikari: do you mean linking would fail if there is ssse3 code in object file?
[21:30:09] <Dark_Shikari> no
[21:30:14] <Dark_Shikari> I mean their as can't handle it
[21:30:19] <Dark_Shikari> because it's 7 years old
[21:30:27] <Dark_Shikari> and they won't upgrade because something something lazy something something gplv3
[21:30:40] <iive> it would corrupt xmm registers on task-switch?
[21:30:44] <Dark_Shikari> no
[21:30:50] <Dark_Shikari> that would be if they weren't sse-aware at all
[21:31:18] <iive> i'm not familiar with ssse3, they may have doubled the registers :)
[21:31:25] <Dark_Shikari> it's just new instructions
[21:31:27] <Dark_Shikari> not modifying old instructions
[21:31:33] <Dark_Shikari> doubling the registers requires modifying instructions
[21:31:40] <mru> not old ones
[21:31:54] <mru> you could add more registers only accessible by new instructions
[21:31:59] <iive> or using prefix
[21:32:00] <Dark_Shikari> True
[21:32:06] <Dark_Shikari> oh god not more prefix notation
[21:33:47] <j0sh_> wbs: done
[21:34:35] <roxfan> isn't the new xsave buffer dynamically sized or something?
[21:34:48] <j0sh_> Honoome: a demangler?
[21:34:57] <Honoome> j0sh_: yup.. yes I'm crazy
[21:35:17] <j0sh_> what is a demangler?
[21:35:25] <mru> inverse of a mangler
[21:35:36] <j0sh_> ahhh, of course :)
[21:35:58] <Dark_Shikari> BBB: so let's stop being lazy and finish up that asm :)
[21:37:06] <j0sh_> i'm never sure if you guys are being sarcastic or not
[21:37:07] <Honoome> j0sh_: c++filt is a demangler :P
[21:37:23] <mru> j0sh_: assume we are
[21:37:33] <j0sh_> heh
[21:38:07] <Dark_Shikari> mru: I get the feeling that trying to write a yasm conversion script would be an equivalent task to writing yasm
[21:38:18] <Dark_Shikari> (i.e. to convert yasm to Something Weaker Than Yasm)
[21:38:28] <mru> write a gas backend for yasm
[21:38:36] <Dark_Shikari> Which is basically writing yasm.
[21:38:45] <mru> no, that's writing part of yasm
[21:38:46] <Dark_Shikari> Just as writing a java->c convert is basically like writing a java compiler.
[21:38:49] <Dark_Shikari> *converter
[21:38:50] <mru> the other part is already written
[21:38:52] <Dark_Shikari> ok, it's writing the frontend.
[21:38:57] <Dark_Shikari> But assemblers are easy
[21:39:12] <Dark_Shikari> I mean, converting instructions to asm is downright trivial
[21:39:22] <Dark_Shikari> parsing a complicated macro language is harder
[21:39:23] <mru> there's more than that to an assembler
[21:40:29] * j0sh_ wrote an assembly language for whitespace once...
[21:41:07] <j0sh_> (that was to test my whitespace interpreter, of course)
[21:41:09] <j0sh_> http://en.wikipedia.org/wiki/Whitespace_(programming_language)
[21:44:56] <wbs> lu_zero: ok with the latest version that j0sh_ posted?
[21:50:18] <twnqx> i'm a bit stumped at the code i just read (libavformat/mpeg.c)
[21:50:25] <twnqx> if (!((startcode >= 0x1c0 && startcode <= 0x1df) ||
[21:50:26] <twnqx> (startcode >= 0x1e0 && startcode <= 0x1ef) ||
[21:50:26] <twnqx> (startcode == 0x1bd) || (startcode == 0x1fd)))
[21:50:36] <twnqx> couldn't one aggregate the first two checks into one?
[21:53:15] <mru> I guess it's for clarity
[21:54:23] <twnqx> mh
[21:54:50] <mru> cx is audio and ex is video
[21:55:09] <twnqx> and dx?
[21:55:15] <mru> also audio
[21:55:18] <twnqx> ah
[21:55:19] <mru> same range
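The two adjacent ranges twnqx quotes (0x1c0..0x1df and 0x1e0..0x1ef) do cover one contiguous interval, so the check can be merged without changing behaviour. A minimal sketch of both forms follows; the function names are hypothetical and the surrounding code in libavformat/mpeg.c is not reproduced here. The split form keeps the audio and video ranges visible, which is the clarity argument mru gives.

```c
#include <stdbool.h>

/* Split form, mirroring the quoted condition (without the leading negation):
 * 0x1c0..0x1df is audio, 0x1e0..0x1ef is video. */
static bool startcode_wanted_split(int startcode)
{
    return (startcode >= 0x1c0 && startcode <= 0x1df) ||
           (startcode >= 0x1e0 && startcode <= 0x1ef) ||
           startcode == 0x1bd || startcode == 0x1fd;
}

/* Merged form: the two ranges are adjacent, so one comparison pair suffices. */
static bool startcode_wanted_merged(int startcode)
{
    return (startcode >= 0x1c0 && startcode <= 0x1ef) ||
           startcode == 0x1bd || startcode == 0x1fd;
}
```

A compiler is free to perform this merge itself, so the choice is mostly about readability.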
[21:57:36] <mru> anyone know where twinvq samples are?
[21:58:46] <mru> Vitor1001: ^^
[21:59:53] <Vitor1001> http://samples.mplayerhq.hu/vqf/
[22:00:23] <Vitor1001> I suggest http://samples.mplayerhq.hu/vqf/luckynight/
[22:01:02] <mru> suggest for what purpose?
[22:02:15] <Vitor1001> For any testing.
[22:02:41] <Vitor1001> The complete set of files tests all the possible modes of the decoder
[22:03:40] <Vitor1001> mru: Just out of curiosity, why are you interested in twinvq?
[22:04:30] <mru> vla killing
[22:06:42] <Honoome> and this is interesting..
[22:06:46] <BBB> Dark_Shikari: yes yes, I told you, this weekend it'll be done
[22:06:55] <BBB> Dark_Shikari: I'm a bit slow today b/c of the football match and work
[22:07:01] <Honoome> I write a function as "void function(const unsigned char *a) {}"
[22:07:12] <Honoome> c++filt reports it as "function(unsigned char const *a)"
[22:07:25] <Honoome> while my own demangler reports it as I expected it to..
[22:07:49] <mru> the position of const doesn't matter
[22:08:45] <Honoome> I know, but it's a bit silly mostly because the mangling rules mangle the const _before_ the type
[22:08:54] <Honoome> so I was expecting it to do the logical thing and emit it as prefix
[22:10:00] <mru> c++ is not logical
[22:11:10] <Dark_Shikari> BBB: ok, I'm writing intra pred now
[22:11:12] <Honoome> true
[22:14:01] <BBB> great
[22:14:07] <BBB> hey wait, that one is easy
[22:14:09] <BBB> leave some for me...
[22:14:15] <BBB> maybe I'm too slow
[22:14:21] <BBB> go do the loopfilter
[22:14:24] <BBB> that's much more difficult
[22:14:31] * BBB goes home
[22:14:43] <CIA-99> ffmpeg: mru * r23763 /trunk/tests/ (15 files in 2 dirs): fate: add vp8 tests
[22:15:03] <Dark_Shikari> BBB: I already did intra pred in x264
[22:15:06] <Dark_Shikari> so this'll be bonus easy
[22:16:01] <mru> Vitor1001: patch for your reviewing pleasure
[22:16:58] <mru> he ran away
[22:19:21] <spaam> go after him mru
[23:03:19] * Honoome finds the mangler backreferences and cries
[23:03:21] <Dark_Shikari> collateral damage of vp8 optimizations: h264 gets faster
[23:03:30] <mru> lol
[23:03:40] <Dark_Shikari> 16x16 intra pred == h264 intra pred
[23:03:42] <Dark_Shikari> so...
[23:03:57] * mru did those in neon ages ago
[23:04:00] <Dark_Shikari> yeah
[23:04:04] <Dark_Shikari> I've done them in x264
[23:04:11] <Dark_Shikari> I'm just doing them again here, where things are a tad different
[23:04:14] <Dark_Shikari> specifically, stride is variable.
[23:07:07] <Dark_Shikari> 10 16x16 intra pred functions done. now I have to figure out what the fuck TM pred does
[23:08:19] * peloverde just got an ADIF feature request, strange
[23:08:33] <Dark_Shikari> It seems to do row(N+1) = row(N) + leftpixel(N+1) - leftpixel(-1)
[23:08:47] <mru> adif... haven't heard that mentioned in a long, long time
[23:09:01] <Dark_Shikari> oh, no, it does row(N) = row(-1) + leftpixel(N) - leftpixel(-1)
[23:09:06] <Dark_Shikari> Hmm. That's actually pretty cool.
[23:09:31] <Dark_Shikari> so that's saturateub(row(-1) + SPLAT(leftpixel(N)-leftpixel(-1)))
[23:09:45] <Dark_Shikari> ah fuck. byte saturation.
[23:09:51] <Dark_Shikari> we get to add 9-bit signed bytes to AGHHHHHHHHHHHHH
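The formula Dark_Shikari arrives at, row(N) = saturate(row(-1) + leftpixel(N) - leftpixel(-1)), can be written as a plain C reference. This is only a sketch under assumptions: the function name, the buffer layout (neighbours stored directly above and to the left of the block) and the `size` parameter are illustrative, not taken from the vp8 patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* dst points at the top-left pixel of a size x size block.
 * The row above the block is at dst - stride, the left column at
 * dst[y * stride - 1], and the top-left corner at dst[-stride - 1]. */
static void tm_pred_ref(uint8_t *dst, ptrdiff_t stride, int size)
{
    const uint8_t *top = dst - stride;
    int topleft = top[-1];

    assert(size > 0);
    for (int y = 0; y < size; y++) {
        /* Per-row offset: a signed value in -255..255, hence "9-bit". */
        int delta = dst[y * stride - 1] - topleft;
        for (int x = 0; x < size; x++)
            dst[y * stride + x] = clip_uint8(top[x] + delta);
    }
}
```

The per-row `delta` is the 9-bit signed quantity being complained about: it has to be added to 8-bit pixels with saturation at both ends, which a single unsigned byte add cannot do.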
[23:10:14] <kierank> someone is going to write an mvc decoder for ffmpeg it seems
[23:10:23] <Dark_Shikari> "write an mvc decoder"?
[23:10:26] <Dark_Shikari> mvc is almost the same as avc
[23:10:32] <Dark_Shikari> the only thing that differs is ref frame handling basically
[23:10:39] <Dark_Shikari> I really hope they don't replicate all the code
[23:10:48] <kierank> ok then...adapt the current h.264 avc decoder to decode h.264 mvc streams
[23:10:59] <Dark_Shikari> No, I had a serious point there -- people love contributing new code
[23:11:01] <Dark_Shikari> even if it's total shit
[23:11:09] <Dark_Shikari> *cough* h264 encoder
[23:11:15] <kierank> I will tell the guy that then
[23:11:29] <Dark_Shikari> yes, make sure that he integrates it
[23:11:33] <Dark_Shikari> as opposed to contributing replicated code
[23:11:37] <mru> 9-bit data... designed for pdp (tm)
[23:11:54] <Dark_Shikari> mru: well it's the same as idct_dc
[23:11:59] <Dark_Shikari> you have a 9-bit signed offset to apply to 8-bit pixel data
[23:12:04] <lu_zero> Dark_Shikari: the h264 encoder is an educational project iirc
[23:12:12] <kierank> making ffmpeg handle dependent substreams is more complicated though
[23:12:18] <Dark_Shikari> yeah
[23:14:15] <mru> 14 vla-killing patches pending...
[23:14:29] <mru> this never ends...
[23:14:57] <mru> the ones in dsputil_mmx.c scare me
[23:14:57] <Dark_Shikari> lol
[23:15:04] <Dark_Shikari> isn't it sad
[23:15:59] <mru> why do people even bother with a vla when the size is guaranteed to be 1 or 2?
[23:16:36] <mru> gaaaah, uint8_t edge_buf[(h+1)*stride];
[23:16:58] <Dark_Shikari> WELCOME TO STRIDEWORLD
[23:17:00] <Dark_Shikari> ENJOY YOUR STAY
[23:17:03] <mru> lol
[23:17:08] <Dark_Shikari> SPONSORED BY BBB
[23:17:17] <spaam> \o/
[23:17:29] <mru> hey, the next one is easy
[23:17:36] <mru> ac3
[23:17:42] <mru> how many channels can it have?
[23:18:00] <mru> surely no more than 9
[23:18:30] <mru> but eeek that function is ugly
[23:18:36] <mru> ac3_downmix_sse
[23:18:48] <mru> if(in_ch == 5 && out_ch == 2 && !(matrix_cmp[0][1]|matrix_cmp[2][0]|matrix_cmp[3][1]|matrix_cmp[4][0]|(matrix_cmp[1][0]^matrix_cmp[1][1])|(matrix_cmp[0][0]^matrix_cmp[2][1])))
[23:18:53] <mru> parse that!
[23:18:54] <Dark_Shikari> ahahahahahhah
[23:20:03] <mru> #define AC3_MAX_CHANNELS 6
[23:20:06] <peloverde> Speaking of downmix, does anyone have any AAC samples with Dolby Pulse Downmix info?
[23:20:11] <mru> is that still true for eac3?
[23:20:19] <mru> and that other extension someone sent a patch for
[23:20:39] <peloverde> eac3 can do 7.1 I think, I don't know if we support it
[23:21:01] <peloverde> eac3 can do 13.1 according to wikipedia
[23:21:35] <mru> we support eac3 allegedly
[23:23:40] <peloverde> That's what I've been told
[23:25:18] <kierank> [00:20] <@peloverde> Speaking of downmix, does anyone have any AAC samples with Dolby Pulse Downmix info? --> afaik there are no deployments
[23:25:33] <kierank> Possibly freeview HD but there are no pc receivers yet
[23:26:13] <Compn> you guys want more eac3 samples, get mplayer to apply some bluray patches :P
[23:26:32] <Compn> or review/commit yourself...
[23:31:21] <mru> the downmix matrix has AC3_MAX_CHANNELS rows
[23:31:34] <mru> so it should never be used with the extended channels
[23:32:22] <kierank> [00:26] <@Compn> you guys want more eac3 samples, get mplayer to apply some bluray patches :P --> dd+ is rare on blu-ray
[23:32:26] <kierank> common on hd-dvd
[23:32:45] <mru> #undef AC3_MAX_CHANNELS
[23:32:45] <mru> #define AC3_MAX_CHANNELS 7
[23:32:56] <mru> /* override ac3.h to include coupling channel */
[23:32:59] <mru> wtf???
[23:33:13] <peloverde> DD+ on bluray only supports up to 8 channels
[23:33:32] <peloverde> Is that like a CCE?
[23:33:50] <mru> I can't answer that
[23:33:54] <mru> but that's not the point
[23:34:06] <mru> defining the same thing differently in two places is beyond evil
[23:34:45] <peloverde> weren't you aware, the ac-3 decoder is part of a nefarious C contest
[23:35:32] <Dark_Shikari> mru: whenever you try to eliminate one bit of ugliness, deep inside the bowels of ffmpeg
[23:35:37] <Dark_Shikari> it always uncovers far worse evil.
[23:36:01] <mru> the last couple of days I've been poking in some of the oldest parts
[23:36:08] <Honoome> Dark_Shikari: reminds me of feng..
[23:36:43] <mru> there was one codec written by mike for xine, ported to mplayer by arpi, and ported from there to lavc by nick kurshev
[23:36:57] <Dark_Shikari> damn, arpi, that's like before recorded history
[23:37:00] <mru> if that's not asking for trouble
[23:37:28] * Dark_Shikari sees "corrupted stack"... hmm, ok, my asm looks wrong.
[23:38:30] * peloverde misses the days when broken ogg continuations caused stack corruption in the theora decoder
[23:39:01] <kierank> what happens if you're VLA has a length of -1?
[23:39:04] <kierank> your*
[23:39:23] <twice11> I expect that to be undefined behaviour.
[23:39:26] <Dark_Shikari> probably.
[23:39:31] <twice11> Isn't the array size specified as size_t?
[23:39:49] <Dark_Shikari> you know, mru, to really get rid of libvpx
[23:39:51] <twice11> So it's not -1, but 2^32-1 or even 2^64-1
[23:39:56] <Dark_Shikari> we'll need to have ffmpeg's vp8 decoder be fast on arm...
[23:40:17] <mru> kierank: you die
[23:40:45] <mru> signed or unsigned, dreadful things happen
[23:41:07] <twice11> The total array size is bigger than representable in a size_t for objects with sizeof() > 1 -> undefined behaviour
[23:41:15] <mru> Dark_Shikari: I or yuvi will for sure do the neon
[23:41:34] <twice11> And you exceed an implementation limit (maximum object size on the stack) for objects with sizeof() == 1 -> undefined behaviour.
[23:42:05] <twice11> And finally: undefined behaviour -> you die. Mans is right.
[23:42:20] <Dark_Shikari> mru: ok
[23:42:46] <mru> or demons fly from your nose
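C99 requires a VLA size expression to evaluate to a value greater than zero, so a length of -1 is undefined behaviour however the value is later interpreted. The VLA-killing approach discussed here replaces such arrays with checked heap allocation. A minimal sketch, assuming a buffer shaped like the `edge_buf[(h+1)*stride]` example quoted earlier; the helper name is hypothetical and plain `malloc` stands in for whatever allocator the real code would use.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate (h + 1) * stride bytes on the heap instead of the stack.
 * Returns NULL for nonsensical dimensions, size overflow, or allocation
 * failure; the caller must check the result and free() it when done. */
static uint8_t *alloc_edge_buf(int h, int stride)
{
    size_t rows, cols;

    if (h < 0 || stride <= 0)
        return NULL;

    rows = (size_t)h + 1;
    cols = (size_t)stride;
    if (rows > SIZE_MAX / cols)   /* rows * cols would overflow size_t */
        return NULL;

    return malloc(rows * cols);
}
```

Unlike the VLA, every failure mode here is visible to the caller as a NULL return rather than as silent stack corruption.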
[23:43:23] <mru> what the fuuuuuuuuuuuck?
[23:43:38] <mru> mmx etc float_to_int16_interleave uses a temp buffer!!!
[23:43:43] <mru> on stack
[23:43:51] <mru> same size as output
[23:43:56] <Dark_Shikari> lol
[23:44:21] <mru> how is that even possible?
[23:44:37] * peloverde mumbles somethingsomething planar float output
[23:44:40] <Dark_Shikari> there's no security risk there
[23:45:13] <kierank> ffmpeg should have a proper "week of cleanup"
[23:45:24] <Dark_Shikari> month
[23:45:32] <kierank> gsoc
[23:45:33] <mru> kierank: I'm trying my best
[23:45:35] <Dark_Shikari> actually
[23:45:36] <kierank> google summer of cleanup
[23:45:37] <Dark_Shikari> here's an idea
[23:45:50] <Dark_Shikari> All developers are prohibited from committing new patches until they do X patches of cleanup
[23:45:54] <Dark_Shikari> Or, equally
[23:46:03] <Dark_Shikari> they must do X patches of cleanup for every Y patches that aren't
[23:46:04] <mru> my new approach is do a cleanup pass, then add -Werror=foo to stop it happening again
[23:46:15] <peloverde> Could we try to get some ghop students to do cleanup?
[23:46:26] <Dark_Shikari> I don't think making the interns do the cleanup is the best idea.
[23:46:31] <Dark_Shikari> developers should be made to clean up their own mess.
[23:46:40] <mru> the problem is that students can't do it properly
[23:46:56] <mru> it has to be done by someone who knows the ins and outs of ffmpeg
[23:47:06] <peloverde> A lot of the messes were left by the ancients
[23:47:41] <mru> yes, but they're gone
[23:48:05] <Dark_Shikari> the precursors
[23:48:52] <peloverde> Jak and Daxter I?
[23:49:09] <Dark_Shikari> No, the precursor would be 0, obviously.
[23:49:13] <Dark_Shikari> =p
[23:55:41] * j0sh_ has been cleaning up mpeg4/aac from rtsp...
[23:56:21] <Dark_Shikari> new patch for vp8 sent, with intra pred asm (for h264 as well)
[00:00:11] <Honoome> mru: http://paste.pocoo.org/show/228757/ sorted in .bss size, ascending ;)
[00:00:59] <mru> feel free to make them smaller
[00:01:44] <Honoome> I guess the only way would be to write more hardcoded tables, thus have more of them generate their hardcoded tables :P
[00:02:28] <mru> which is usually even worse
[00:02:41] <mru> except on mmu-less systems
[00:02:50] <michaedw> mru: I work for this person: http://us.experteer.com/job_catalog/job/358969
[00:03:13] <mru> ugh, a suit
[00:03:44] * mru works for himself
[00:03:46] <Honoome> mru: well it reduces the per-process resident memory, for systems with <=4GB of RAM and the amount of processes that load ffmpeg nowadays, it's not too bad
[00:03:49] <michaedw> she's a couple notches up from the typical Cisco suit, at least from my interactions with her so far
[00:04:07] <michaedw> ok, you got me there. I work for her, insofar as I work for anyone. :-)
[00:04:17] <mru> Honoome: most ffmpeg runs will use only a small number of codecs
[00:06:16] <mru> reminds me, my contract with ARM was signed by the CEO
[00:06:23] <mru> I wasn't expecting that
[00:06:51] <kierank> mechanical pen ;)
[00:07:16] <mru> two documents, not the exact same signature
[00:08:30] * mru deletes some more cruft
[00:09:11] <CIA-99> ffmpeg: mru * r23731 /trunk/ (configure libavcodec/os2thread.c libavcodec/Makefile):
[00:09:11] <CIA-99> ffmpeg: Remove OS/2 threads support
[00:09:11] <CIA-99> ffmpeg: OS/2 SMP support is rare, and a pthreads library exists.
[00:09:11] <CIA-99> ffmpeg: No need to keep this code.
[00:11:43] <Dark_Shikari> michaedw: fyi
[00:12:00] <Dark_Shikari> michael offered to do that stupid baseline feature... what was it
[00:12:01] <Dark_Shikari> FMO
[00:12:03] <Dark_Shikari> for $10k
[00:12:42] <Kovensky> that's how much he wants it done?
[00:13:26] <Dark_Shikari> no, that's how much he wants to do it
[00:13:37] <Dark_Shikari> *how much money that he wants in order to do it
[00:13:42] <Dark_Shikari> someone asked him for FMO
[00:14:03] <Kovensky> the more he asks, the more he doesn't want to do
[00:14:11] <Kovensky> :>
[00:14:39] <spaam> FMO?
[00:14:41] <Kovensky> if he actually wanted to do it he'd have done it already :)
[00:14:47] <mru> flexible macroblock ordering
[00:14:51] <spaam> ok :)
[00:15:14] <michaedw> that would also be interesting
[00:15:18] <Honoome> from the name one could guess why he might not want to do that... the "flexible" sounds like a buzzword
[00:15:26] <Kovensky> and it's a baseline-only feature
[00:15:52] <michaedw> my impression is that there's more value in GDR
[00:15:58] <mru> what's the point of having features baseline-only?
[00:16:00] <michaedw> but I haven't really applied science to it yet :-)
[00:16:18] <Dark_Shikari> it lets you have slices of arbitrary shape
[00:16:19] <Dark_Shikari> (in MBs)
[00:16:30] <Dark_Shikari> useful application (would be more useful if not baseline-only): rectangular slices
[00:16:33] <Dark_Shikari> i.e. 4 squares
[00:16:48] <Dark_Shikari> to maximize (area) / (perimeter)
[00:17:07] <mru> I can see some use for fmo as such
[00:17:10] <michaedw> it's in extended also
[00:17:14] <michaedw> just not in main
[00:17:16] <Dark_Shikari> extended doesn't exist
[00:17:17] <mru> but why baseline only?
[00:17:28] <Dark_Shikari> mru: to make main/high less stupidly complicated
[00:19:23] <michaedw> I'm far from expert in this area; I'm mostly just looking for low-hanging fruit relative to where we are today
[00:19:50] <Dark_Shikari> the h264 decoder has been optimized pretty heavily
[00:20:02] <michaedw> preferably things that can be tried, on an experimental basis, without mucking about in the hardware acceleration we've got
[00:21:34] <Dark_Shikari> You could come try to optimize x264 as well, though we've got that one locked up pretty tight optimization-wise =p
[00:21:42] <michaedw> that's where the data partitioning recoder idea came from; unpeel the bytestream we've got down to the macroblocks, split up according to the DP scheme, apply UEP
[00:23:18] <michaedw> the experiment's been done, with a crude but not wholly irrelevant metric: http://users.elis.ugent.be/~pbertels/zijspoor/phd/2006_a082_Stefaan_Mys.pdf
[00:23:34] <mru> http://article.gmane.org/gmane.comp.video.ffmpeg.devel/104238
[00:24:16] <Dark_Shikari> there's also that one I posted a while back
[00:24:35] <michaedw> that's good guidance, thank you
[00:24:59] <mru> http://thread.gmane.org/gmane.comp.video.ffmpeg.devel/79246
[00:26:40] <michaedw> mm, the golomb simplification is interesting
[00:27:01] <michaedw> especially since in DP all the golomb is in DP A
[00:27:12] <Dark_Shikari> that was already done, it's not as good as I thought
[00:27:15] <Dark_Shikari> because I was thinking encoder-side
[00:27:19] <astrange> if x264 split loop filter strength and actual filtering, port that
[00:27:29] <Dark_Shikari> ffmpeg already does that
[00:27:32] <Dark_Shikari> loop_filter_fast
[00:27:33] <Dark_Shikari> oh
[00:27:40] <astrange> skal's idea
[00:27:43] <Dark_Shikari> oh actually that change is huge, yeah, that should be #1 priority
[00:27:48] <Dark_Shikari> skal's idea is bloody brilliant
[00:28:09] <michaedw> link?
[00:28:42] <Dark_Shikari> it's simple:
[00:28:55] <Dark_Shikari> 1) calculate deblock strength while the nnz, mvs, etc are still in the cache structure
[00:29:09] <Dark_Shikari> (this requires some reloading of the cache if deblock-across-slice-edges is on, and we're on a slice edge)
[00:29:14] <Dark_Shikari> 2) store the results
[00:29:23] <Dark_Shikari> 3) once the row is done, deblock it using the stored strength values
[00:29:32] <Dark_Shikari> currently, we deblock per-row in ffmpeg, but it does the strength calculation per-row too
[00:29:37] <Dark_Shikari> instead of doing it when the values are still in the cache
[00:30:06] <Dark_Shikari> this is a bit tricky with all the stupid rules in h264.
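The three steps Dark_Shikari lists can be sketched as a toy in C. Everything here is an assumption made for illustration: the `MBCache` struct, the function names, and the strength rule, which is a heavily simplified version of the H.264 one (no reference-frame comparison, no field handling, only the left macroblock edge). It is not ffmpeg's or x264's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-macroblock decoding cache (nnz, mvs, ...). */
typedef struct MBCache {
    int intra;    /* 1 if the macroblock is intra coded */
    int nnz;      /* nonzero if it has coded residual coefficients */
    int mvx, mvy; /* motion vector in quarter-pel units */
} MBCache;

/* Simplified strength rule for the edge between two neighbouring macroblocks. */
static uint8_t edge_strength(const MBCache *left, const MBCache *cur)
{
    if (left->intra || cur->intra)
        return 4;
    if (left->nnz || cur->nnz)
        return 2;
    if (abs(left->mvx - cur->mvx) >= 4 || abs(left->mvy - cur->mvy) >= 4)
        return 1;
    return 0;
}

/* Steps 1 and 2: while each macroblock's data is still in the cache,
 * compute the strength of its left edge and store it for later. */
static void decode_row(const MBCache *decoded, int n, uint8_t *row_bs)
{
    MBCache left, cache;

    assert(n > 0);
    left = decoded[0];
    for (int x = 0; x < n; x++) {
        cache = decoded[x];  /* stands in for "decode macroblock x" */
        row_bs[x] = x ? edge_strength(&left, &cache) : 0; /* no left neighbour at x == 0 */
        left = cache;        /* the cache is about to be overwritten */
    }
}

/* Step 3: once the row is done, filter using only the stored strengths.
 * Returns the number of edges that would be filtered. */
static int deblock_row(const uint8_t *row_bs, int n)
{
    int filtered = 0;
    for (int x = 0; x < n; x++)
        if (row_bs[x])
            filtered++;      /* a real decoder would run the edge filter here */
    return filtered;
}
```

The point of the split is that `deblock_row` never has to reload nnz or motion vectors; it only needs the small array of stored strengths.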
[00:31:20] <michaedw> hmm, that's interesting; even more interesting to me is the instrumentation for measuring the effect
[00:31:32] <Dark_Shikari> rdtsc?
[00:31:37] <Dark_Shikari> instrumentation is easy
[00:31:48] <astrange> grep START_TIMER
[00:32:05] <michaedw> specifically, the cache behavior
[00:33:12] <Dark_Shikari> well, this saves some cache, but that's not the primary purpose
[00:33:12] <michaedw> we have other crap going on in the system, and it may be worth going to some effort to keep ffmpeg (if we go that route) from being evicted when it's working with that cache structure
[00:33:15] <Dark_Shikari> here "cache" does not mean "CPU cache"
[00:33:19] <michaedw> understood
[00:33:36] <Dark_Shikari> this would be a win on a CPU with 10 gigs of L1 cache
[00:34:04] <mru> can I have one of those?
[00:34:22] <Dark_Shikari> mru: easy. make a cpu with no cache, attach to 10 gigs of ram
[00:34:26] <Dark_Shikari> the ram is now the first-level cache ;)
[00:34:27] <Honoome> rotfl
[00:34:59] <mru> so I just need to disable the caches on my i7
[00:35:09] <astrange> hm, patch on the gcc list from amd today uses dct_unquantize_h263_inter_c as the testcase
[00:35:10] <Dark_Shikari> and add a few more gigs of ram
[00:35:17] <Dark_Shikari> astrange: link?
[00:35:21] <mru> astrange: link?
[00:35:23] <Dark_Shikari> Oh, THAT function
[00:35:26] <Dark_Shikari> wasn't that the one that broke AVR32?
[00:35:28] <mru> yep
[00:36:42] * Honoome is tempted to implement FIONREAD syscall for SCTP tonight
[00:37:05] <Honoome> too bad that I can't unload sctp module for some reason =_=
[00:37:28] <mru> open socket?
[00:37:31] <michaedw> We've got some folks here who are spinning up on measurement techniques on embedded Linux and need good examples to live through
[00:37:46] <astrange> http://article.gmane.org/gmane.comp.gcc.devel/115166/ can't find the patch thread on gmane yet
[00:37:58] <astrange> and they must have turned tree-vectorize back on?
[00:38:25] <mru> turning it on to work on gcc is excusable
[00:38:25] <Honoome> mru: feng is closed and I don't think anything else uses sctp on my system
[00:38:32] <mru> Honoome: netstat
[00:38:41] <Dark_Shikari> >dct kernel
[00:38:54] <Honoome> I guess I could just set up a slackware vm to work on the kernel ...
[00:38:55] <mru> that's not a dct
[00:39:08] <michaedw> be aware that netstat tells you some process that has the socket open, not necessarily the only one
[00:39:12] <Dark_Shikari> it has DCT in the name!11!
[00:39:21] <Dark_Shikari> Fuck, I'm way too used to using pmaddubsw
[00:39:24] <Dark_Shikari> I start writing an sse2 MC function
[00:39:29] <michaedw> we had fun recently with missing O_CLOEXEC flags
[00:39:29] <Dark_Shikari> and I inadvertently write an ssse3 one instead
[00:39:35] <Dark_Shikari> without even realizing it
[00:39:40] <Honoome> netstat reports no sctp sockets open.. WTH
[00:39:57] <mru> lsmod shows non-zero use count?
[00:40:04] <Honoome> yah
[00:40:16] <mru> no depending modules?
[00:40:36] <mru> hmm... intriguing
[00:40:49] <mru> on avr32 it did move the mult out of the branch
[00:40:58] <mru> that's how it triggered the hw bug
[00:42:02] <astrange> hmm it would be better to put block[i] = level in both arms of the if/else instead of afterwards
[00:42:18] <Dark_Shikari> It might be better to just remove/restore sign
[00:42:22] <astrange> decode_cabac_residual does that and it saves one branch (since they can't both fallthrough to it)
[00:42:31] <Dark_Shikari> pabsw/psignw
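A scalar sketch of the two ideas just mentioned, for readers following along. It is only loosely modelled on the kind of loop being discussed and is not FFmpeg's actual source; the function names and the `qmul`/`qadd` parameters are assumptions made for illustration. The first function stores `block[i]` inside each arm of the if/else, the second removes the sign, scales the magnitude, and restores the sign, which is the scalar analogue of the pabsw/psignw suggestion.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form: the store sits in each arm instead of after the if/else. */
void unquantize_branchy(int16_t *block, int n, int qmul, int qadd)
{
    assert(qmul > 0 && qadd >= 0);
    for (int i = 0; i < n; i++) {
        int level = block[i];
        if (!level)
            continue;                        /* zero coefficients stay zero */
        if (level < 0)
            block[i] = (int16_t)(level * qmul - qadd);
        else
            block[i] = (int16_t)(level * qmul + qadd);
    }
}

/* Sign-removal form: take |level|, scale it, then put the sign back. */
void unquantize_signless(int16_t *block, int n, int qmul, int qadd)
{
    assert(qmul > 0 && qadd >= 0);
    for (int i = 0; i < n; i++) {
        int level = block[i];
        int sign  = -(level < 0);            /* 0, or -1 (all bits set) */
        int mag   = (level ^ sign) - sign;   /* |level| */
        int nz    = -(level != 0);           /* mask that keeps zeros at zero */
        mag = (mag * qmul + qadd) & nz;
        block[i] = (int16_t)((mag ^ sign) - sign);
    }
}
```

Both functions are meant to produce the same output for the same input; the second simply has no data-dependent branch in the loop body.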
[00:43:41] <michaedw> Dark_Shikari: can x264 do constrained intra prediction?
[00:43:59] <Dark_Shikari> yes
[00:44:02] <Dark_Shikari> --constrained-intra
[00:44:03] <Dark_Shikari> read the help
[00:45:15] <michaedw> is that relevant only when doing GDR?
[00:46:03] <Dark_Shikari> ?
[00:46:19] <michaedw> sorry, I guess this is more #ffmpeg material
[00:46:32] * Honoome laughs because GDR in Italian means "RPG" :P
[00:46:34] <Dark_Shikari> I have no idea what GDR is.
[00:46:36] <Dark_Shikari> Feel free to ask here
[00:46:40] <Dark_Shikari> I'm just wondering wtf you're on about.
[00:47:28] <mru> this is a typical case of gcc optimisation fragility
[00:47:40] <mru> many of their optimisations seem to be greedy
[00:48:19] <mru> so a small optimisation can inhibit a much better one just because it happens to be applied first
[00:49:15] <michaedw> "gradual decoding refresh", and actually it's not quite what I meant; I meant to say, constrained intra is only relevant when you have frames with mixed I and P slices, right?
[00:49:24] <saintdev> mru: kind of sounds like what you were saying about MN yesterday :P
[00:50:24] <Dark_Shikari> michaedw: no, frames with mixed I and P blocks
[00:50:25] <Dark_Shikari> i.e. every frame
[00:50:31] <Dark_Shikari> saintdev: hahahahah
[00:50:36] <Dark_Shikari> michael is a greedy optimization ;)
[00:51:49] <michaedw> right, so not relevant for IDRs, which is kind of what I meant in the first place :-P
[00:53:12] <j0sh_> i've only seen gdr in video calls. makes sense that it'd be used there
[00:53:13] * Honoome hates on libvirt, virt-manager, redhat, ...
[00:57:17] <mru> Honoome: maybe you'll like this :-) http://rwmj.wordpress.com/
[00:57:37] <Honoome> yes I know Rich
[00:57:58] <michaedw> j0sh_: low-latency streaming generally
[00:58:08] <michaedw> IDR frames are pretty big lumps
[00:58:15] <j0sh_> yup
[00:58:24] <Dark_Shikari> it's called intra refresh
[00:59:25] <michaedw> Dark_Shikari: sure; I'm just looking for a mode that my hardware codec can do and I can still post-process into a DP stream
[00:59:43] <Dark_Shikari> hardware codec? why use a hardware codec and then try to postprocess it?
[00:59:48] <Dark_Shikari> why not just use a good encoder to begin with?
[01:00:04] <michaedw> Dark_Shikari: that's the point that I'm trying to prove
[01:00:19] <Dark_Shikari> ?
[01:00:38] <michaedw> first, what's achievable without re-plumbing things so that we can use a current codec
[01:00:52] <michaedw> and second, how much more is achievable without that hobble
[01:00:53] <Dark_Shikari> I don't see how data partitioning comes into this
[01:00:58] <michaedw> FEC
[01:01:11] <Dark_Shikari> The whole point of FEC is so you don't need DP.
[01:01:26] <michaedw> unequal degrees of FEC on different data partitions
[01:01:38] <mru> with dp you can apply more error correction to the important parts
[01:02:01] <michaedw> right -- and pick different overhead/latency trade-offs
[01:02:27] <Dark_Shikari> Yeah, except for that whole "intra prediction means that dct coeffs become the important part"
[01:02:41] <Dark_Shikari> er, latency? you can't have more latency on some parts of the frame than others
[01:02:44] <Dark_Shikari> that doesn't make sense
[01:02:51] <Dark_Shikari> you can't decode a frame until you have all the parts: coeffs, modes, etc
[01:03:20] <Dark_Shikari> SVC seems like a much better option if you're trying to come up with reasonable FEC strategies for 40% loss rates
[01:03:33] <michaedw> DP A is small enough that we can probably afford to spend bandwidth on FEC over small sets
[01:03:33] <Dark_Shikari> An equally good option would be taking your connection and burning it with napalm
[01:03:53] <michaedw> the problem isn't the overall loss rate, it's the burstiness
[01:04:06] <Honoome> ... oh god I've done the worst thing I could do...
[01:04:27] <Dark_Shikari> if loss is bursty, it can be resolved with interactive encoder control
[01:04:32] <mru> Honoome: that's good, now it can only get better
[01:04:43] <michaedw> partition C is the last to arrive, and the cheapest to drop on the floor
[01:04:49] <Honoome> mru: I decided to watch Apple's WWDC keynote...
[01:05:01] <mru> Honoome: where did you get such a crazy idea?
[01:05:20] <Dark_Shikari> michaedw: why not use SVC if you're trying to do that?
[01:05:20] <Honoome> there's the zynga guy... now I feel a primal instinct to kill all the people who waste their time and money to such a dolt...
[01:05:24] <Dark_Shikari> that's actually built for that
[01:05:41] <mru> Honoome: I don't watch _any_ keynotes
[01:05:44] <Honoome> it's a matter of species improvement... my instinct i mean
[01:05:47] <michaedw> turns out it's not very useful for talking heads, at least from what I've read / been told
[01:06:08] <Honoome> mru: I generally tend to, good thing to know what they try to shove down your throat and _how_ they do it...
[01:06:31] <mru> I ignore apple as much as possible
[01:06:54] <Dark_Shikari> michaedw: what isn't
[01:06:58] <michaedw> SVC
[01:07:03] <Honoome> mru: I would switch over to apple development rather than rails development
[01:07:06] <Dark_Shikari> for talking heads, I'd use interactive encoder control
[01:07:10] <Dark_Shikari> if you have a drop in frame N
[01:07:15] <mru> Honoome: I refuse to do either
[01:07:15] <Dark_Shikari> tell the encoder to invalidate all frames N and afterwards
[01:07:18] <Dark_Shikari> and use a reference frame from before N
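The reference-selection rule described here can be sketched in a few lines of C. This is an illustration under stated assumptions, not any encoder's real API: `struct ref_frame`, `invalidate_from` and `pick_reference` are hypothetical names, and the sketch assumes the receiver reports the number of the first damaged frame back to the sender.

```c
#include <assert.h>
#include <stdbool.h>

struct ref_frame {
    int  frame_num;   /* coding index of the frame */
    bool valid;       /* false once the receiver may not have it intact */
};

/* The receiver reported a loss in frame `lost`: that frame and everything
 * predicted after it may be corrupt, so none of them may be used again. */
void invalidate_from(struct ref_frame *refs, int n, int lost)
{
    assert(refs && n >= 0);
    for (int i = 0; i < n; i++)
        if (refs[i].frame_num >= lost)
            refs[i].valid = false;
}

/* Index of the newest still-valid reference, or -1 if none is left
 * (in which case the sender would have to fall back to intra coding). */
int pick_reference(const struct ref_frame *refs, int n)
{
    int best = -1;
    assert(refs && n >= 0);
    for (int i = 0; i < n; i++)
        if (refs[i].valid &&
            (best < 0 || refs[i].frame_num > refs[best].frame_num))
            best = i;
    return best;
}
```

With references 10 to 13 held and a loss reported in frame 12, the next frame would be predicted from frame 11, so the stream recovers without sending a full IDR.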
[01:07:24] <mru> if I can't do it in linux, I don't do it
[01:08:00] <michaedw> apple has been known to raise the bar for fit-and-finish, which is healthy
[01:08:10] <mru> bull
[01:08:22] <Honoome> wish I could do as much :/ not good enough to pretend though
[01:08:22] <Dark_Shikari> >apple
[01:08:24] <Dark_Shikari> >raise the bar
[01:08:27] <Dark_Shikari> *headdesk*
[01:08:44] <mru> who declared copy&paste unnecessary?
[01:08:49] <mru> .. only to backtrack later
[01:08:55] <Dark_Shikari> who declared high quality video encoding unnecessary?
[01:08:56] <mru> who declared multitasking useless?
[01:09:03] <mru> ... only to backtrack
[01:09:11] <mru> who uses 20-year old dev tools?
[01:09:16] <mru> and has yet to update
[01:09:16] <Dark_Shikari> 22!
[01:09:23] <michaedw> and the sheer longevity of their hardware is impressive; the freebie iPod mini I gave my daughter is still going strong
[01:09:28] <astrange> the new dev tools are called llvm-mc
[01:09:37] <Kovensky> <@mru> who declared copy&paste unnecessary? <-- didn't microsoft do it for winmo7 too
[01:09:40] <Dark_Shikari> the ipod mini hasn't been around long enough to have "longevity"
[01:09:41] <mru> who charges developers money to code for their platform?
[01:09:53] <Dark_Shikari> tell me about longevity when it's still around in 2035
[01:10:07] <Dark_Shikari> until then, I'll enjoy the imacs in our computer lab
[01:10:11] <Dark_Shikari> which overheat and crash almost daily
[01:10:18] <Dark_Shikari> you can burn yourself by touching their case
[01:10:26] <michaedw> 6-year-old spinning rust handled daily by a 5-year-old
[01:10:58] <Honoome> mru: technically, Microsoft as well..
[01:11:11] <michaedw> doesn't compare with the RS6000 gear I parted with recently, but this is consumer electronics
[01:11:19] <Dark_Shikari> my TI-92 lasted longer than that.
[01:11:20] <mru> but who was f1rst with the "innovation"?
[01:12:05] <michaedw> they do have some of the stupidest billboards in creation, I grant
[01:12:46] * Dark_Shikari can't keep track of left/right-shifts of entire registers in simd on little-endian
[01:12:53] <Honoome> mru: and you forgot "who declared tethering useless (and then backtrack)"?
[01:13:16] <Dark_Shikari> who declared unlimited data to be great
[01:13:18] <Dark_Shikari> and then backtrack
[01:13:38] <astrange> not apple?
[01:13:42] <mru> google did
[01:13:45] <Honoome> Dark_Shikari: half the mobile telcos on the planet?
[01:13:49] <astrange> there are other countries without at&t
[01:14:03] <mru> android <2.2 doesn't have tethering built-in
[01:14:14] <michaedw> Dark_Shikari: I like my HP-35 :-)
[01:14:14] <mru> it's always been possible with 3rd-party apps
[01:14:48] <michaedw> although I think my brother ran off with it last time he visited
[01:14:49] * mru is currently angry with vodafone for raising data roaming price 5x
[01:15:01] <michaedw> communication is overrated
[01:15:06] <Honoome> mru: android didn't have tethering because iphone didn't in the first place
[01:15:11] <mru> and despite that, they're still the cheapest of the uk operators
[01:15:25] <Honoome> and telcos thought they could fetch more money with hsdpa datacards than phones
[01:15:25] <Dark_Shikari> everyone else raised prices too?
[01:15:38] <Dark_Shikari> also, mru
[01:15:39] <Dark_Shikari> >UK
[01:15:41] <Dark_Shikari> there's your problem
[01:15:49] <mru> all eu is the same
[01:16:28] <Honoome> yeah they increased prices in italy as well
[01:16:30] <Dark_Shikari> I thought UK internet was shit?
[01:16:42] <Dark_Shikari> vs say sweden
[01:16:51] <Dark_Shikari> or for that matter, denmark
[01:16:52] <mru> I'm talking about mobile data
[01:16:57] <verb3k_> vs netherlands
[01:17:02] <Dark_Shikari> or netherlands, yeah
[01:17:11] <michaedw> I expect that android didn't have tethering initially because of the unholy mix of Qualcomm, HTC, T-Mobile, Android, and normal Google engineering staff involved in getting it off the ground
[01:17:14] <Honoome> 3ITA used to have a very good price (€80/mo, 20GB data)
[01:17:14] <mru> specifically when roaming
[01:17:18] * j0sh_ is waiting for wimax in his area...
[01:17:43] <Honoome> now they moved to same price, 2GB data
[01:17:49] <Honoome> or €150 and 20GB
[01:17:57] <mru> hsdpa works great here
[01:18:09] <mru> and I get a good flatrate within the uk
[01:18:28] <mru> as soon as I cross a border they charge £1/MB
[01:18:38] <mru> even in their own networks
[01:19:13] <kierank> yes that is a joke
[01:19:14] <Honoome> wow... at least 3 is still "same price if under another 3 network"
[01:19:17] <michaedw> and because of the odd way that T-Mobile handled the gateway/proxy pool at launch time
[01:19:18] <kierank> when you move from orange uk to orange fr
[01:19:31] <Honoome> but I get an even higher rate if I connect, say, in Switzerland
[01:19:54] <mru> they jacked up the prices now because the eu introduced a cap on call prices
[01:20:06] <mru> so they shifted it to data instead
[01:20:17] * mru curses eu for ruining everything
[01:20:21] <Honoome> on the other hand, I spent a grand total of €50 when I was at FOSDEM... and I used Google Maps extensively
[01:20:30] <Honoome> mru: that's _so_ british of you ;)
[01:20:32] <mru> it was bearable before the cap
[01:20:38] <mru> now I can't afford it
[01:21:04] * Honoome gets slack ... brr
[01:21:14] <kierank> Honoome: the relationship is love-hate
[01:21:20] <Honoome> kierank: with what?
[01:21:22] <kierank> eu
[01:21:27] <mru> why oh why didn't they cap data as well?
[01:21:32] <michaedw> I only half-like my CrackBerry, but around here Verizon has the best coverage (and the only tolerable customer service)
[01:21:50] <Dark_Shikari> that's the difference between verizon and AT&T
[01:21:53] <Dark_Shikari> AT&T is incompetent evil
[01:21:56] <Dark_Shikari> verizon is highly competent evil
[01:21:57] <mru> I trolled the vodafone shop in town the other day
[01:22:01] <Dark_Shikari> both are incredibly evil, but verizon is at least good at it
[01:22:04] <mru> they denied all knowledge of a 5x price hike
[01:22:12] <Dark_Shikari> mru: show them your bill?
[01:22:15] <kierank> Dark_Shikari: were bell labs evil?
[01:22:20] <kierank> in your book
[01:22:24] <michaedw> I've seen more competent evil than Verizon :-)
[01:22:26] <Dark_Shikari> kierank: "bell labs"?
[01:22:29] <mru> I pointed them to their own web page
[01:22:37] <Dark_Shikari> Bell was the company
[01:22:40] <michaedw> but they give value for money, at least in my experience
[01:22:45] <Dark_Shikari> And yes, back when it was "ma bell", they were rather evil
[01:22:59] <Honoome> kierank: I live in IT... EU doesn't look _too_ bad after
[01:22:59] <Dark_Shikari> Verizon is evil because they are even more lock-you-down-and-screw-you than AT&T
[01:23:09] <Dark_Shikari> It's just that they actually have good coverage.
[01:23:12] <michaedw> they're fairly evil on the regulatory front; all the baby bells are
[01:23:17] <Dark_Shikari> Whereas AT&T is similar, but with awful coverage.
[01:23:25] <Dark_Shikari> woohoo, vp8 h4 sse2 written.
[01:23:37] <michaedw> ask anyone who worked for, or supplied gear to, a CLEC
[01:24:19] <michaedw> I seem to recall that Verizon has perfectly good pay-as-you-go options too
[01:24:53] <mru> everything phone and net related seems to be total shit in the US
[01:24:58] <peloverde> I get shitty reception in bars (with beer, not units of signal strength) with verizon
[01:25:22] <peloverde> (or when I had verizon)
[01:25:30] <michaedw> Verizon coverage is good enough that I ditched my land line
[01:25:46] <michaedw> bye-bye, godawful SBC customer service
[01:25:50] <michaedw> (it was a while ago)
[01:26:03] <mru> land line is the only viable option for international calls here
[01:26:13] <mru> but you guys don't know what international means
[01:26:15] <Dark_Shikari> s/here/anywhere
[01:26:16] <michaedw> peloverde: that's probably CDMA vs. GSM
[01:26:40] <michaedw> why would you call internationally through the phone system?
[01:26:41] <Honoome> mru: I have international calls in zone 1 at local rates :P
[01:27:11] <mru> and what's zone 1? italy and sicily?
[01:27:20] <michaedw> we call in-laws in Russia that way sometimes, out of sheer laziness
[01:27:21] <Honoome> mru: europe and north america
[01:27:21] <kierank> O2 telephone is quite good
[01:27:26] <kierank> unlimited calls to europe and us
[01:27:34] <michaedw> but mostly we use Skype
[01:27:41] <mru> kierank: land line?
[01:27:45] <kierank> yes landline
[01:27:56] <michaedw> yes, evil, but functional
[01:27:58] <ohsix> eh, you shouldn't have to do anything like that, your distro already does it for you (good ones, anyways)
[01:28:27] <michaedw> sadly, Skype is still far, far more robust over crappy networks than any alternative I've used
[01:28:43] <michaedw> although I fully expect that's temporary
[01:28:44] <mru> ohsix: uh what?
[01:29:02] <michaedw> and have every intention of contributing to making it so
[01:29:04] <Honoome> call internationally I guess
[01:29:34] <kierank> skype video still sucks
[01:29:38] <michaedw> speaking of which, that DP recoder ...
[01:29:48] <mru> video calls suck, period
[01:29:59] <ohsix> misfire
[01:29:59] <michaedw> mru: why?
[01:30:14] <kierank> blocky
[01:30:17] <mru> I simply see _no_ use for them
[01:30:23] <michaedw> they seem to engage my kids much more fully than voice alone
[01:30:33] <mru> kids these days...
[01:30:37] <Dark_Shikari> what are kids
[01:30:49] <Honoome> ohsix: let me guess.. #pulseaudio?
[01:30:54] <mru> Dark_Shikari: annoying, whiny little bastards
[01:30:57] <michaedw> they enjoy the granddad-cam and the cousin-cam
[01:31:16] <Dark_Shikari> mru: no, a miserable little pile of secrets
[01:31:19] <Dark_Shikari> enough talk, have at you
[01:31:33] <michaedw> Dark_Shikari: future sources of good schadenfreude, when they have kids of their own
[01:31:33] <ohsix> Honoome: ya he's going through some ancient wiki page to mess with stuff when his drivers timing is busted
[01:31:51] <Dark_Shikari> michaedw: haha
[01:32:05] <mru> is there such a thing as an up to date wiki?
[01:32:29] <michaedw> which is why I go to some trouble to hook mine up with their granddad; he earned his schadenfreude
[01:32:41] <Honoome> 200kb/s to fetch slackware... I guess I'll kernel-hack another day
[01:32:51] <ohsix> mru: definitely not; some pages should be nuked from orbit hours after they are written :P
[01:33:15] <mru> s/after/before/
[01:33:21] <kierank> Honoome: use another mirror...
[01:33:32] <ohsix> but when you're talking "linux" "help", with adhoc nonsolutions and bodges in the first place; people gravitate to them like they contain real information
[01:33:35] <kierank> the belgian ones are pretty fast
[01:33:37] <mru> mirror, mirror on the wall...
[01:33:57] <Honoome> kierank: torrent, after trying both italian mirrors, then another one at random, and all three failed, I've decided to give the torrents a try :/
[01:34:00] <mru> ohsix: could be worse, could be webforums
[01:34:14] <Honoome> mru: oh god nooooo
[01:34:37] <ohsix> yea, heh; i have a personal beef with forums, front line wall of noise defense for just about anything you'll find them attached to
[01:34:50] <ohsix> but Live! with lots of people that pile in with no real intention to help
[01:34:57] <mru> where on page 7, someone discovers that the fix actually causes catastrophic damage in some subtle way
[01:35:07] <ohsix> huhu
[01:35:14] <ohsix> reverse topposting
[01:35:47] <Honoome> rotfl
[01:36:07] <mru> "I had that problem too. Just disable unrelated $foo and it'll magically work"
[01:36:40] <mru> "I tried that and now $foo doesn't work either plz hlp"
[01:36:56] <Dark_Shikari> "please send me teh codes"
[01:37:08] <michaedw> back to an earlier question: does the fact that intra_gb_ptr is only used in ff_h264_decode_mb_cavlc() mean that ffmpeg's h.264 decoder doesn't handle the combination of CABAC and data partitioning?
[01:37:16] <ohsix> i deleted some stuff in /lib and it worked
[01:37:17] <Dark_Shikari> michaedw: ffmpeg doesn't do data partitioning
[01:37:22] <Dark_Shikari> period
[01:37:31] <Dark_Shikari> ffmpeg doesn't do anything in baseline that isn't in main
[01:37:39] <ohsix> even with something like apt and dpkg-divert in play; people are still butchering stuff
[01:38:07] <michaedw> it handles the DP nal_unit_types
[01:38:39] <Dark_Shikari> BBB: you have vararrays in your dsp code
[01:38:43] <Dark_Shikari> oh, he's not here.
[01:38:45] <mru> nooooooooooooooo
[01:38:57] <mru> I'm going to kill them all and make it an error
[01:39:06] <Dark_Shikari> uint8_t tmp_arr[stride * (height + TAPNUMY - 1)]
[01:39:12] <Dark_Shikari> fortunately that's easy to fix
[01:39:15] <Dark_Shikari> set it to 16
[01:39:19] <mru> set it to max
[01:39:37] <mru> there's no reason to ever allocate an array on stack smaller than max
[01:39:43] <mru> since you have to cope with max anyhow
[01:40:07] <Dark_Shikari> Actually, in this case, max for width4 is 4
[01:40:09] <Dark_Shikari> max for width8 is 16
[01:40:11] <Dark_Shikari> max for width16 is 16
[01:40:51] <mru> 16 is perfectly acceptable to allocate on stack
[01:40:59] <mru> unconditionally
[01:41:21] <Dark_Shikari> w8/w16/w4 are separate functions
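A minimal sketch of the fix being proposed, assuming a worst-case block of 16x16 and a 6-tap vertical filter; `MAX_W`, `MAX_H`, `TAPS` and `scratch_sum` are illustrative names, not identifiers from the code under discussion. The scratch buffer is sized from compile-time maxima instead of from run-time values, and it is packed with its own stride (`w`) so its size never depends on the caller's stride.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_W 16   /* assumed widest block */
#define MAX_H 16   /* assumed tallest block */
#define TAPS   6   /* assumed vertical filter length */

/* Copy the w x (h + TAPS - 1) source region into a fixed-size scratch
 * buffer and return the sum of the copied bytes (a stand-in for the
 * real filtering work, so the sketch has an observable result). */
unsigned scratch_sum(const uint8_t *src, int src_stride, int w, int h)
{
    uint8_t tmp[MAX_W * (MAX_H + TAPS - 1)];   /* worst case, no VLA */
    int rows = h + TAPS - 1;
    unsigned sum = 0;

    assert(w > 0 && w <= MAX_W && h > 0 && h <= MAX_H);

    for (int y = 0; y < rows; y++)             /* scratch stride is w */
        memcpy(tmp + y * w, src + y * src_stride, (size_t)w);
    for (int i = 0; i < rows * w; i++)
        sum += tmp[i];
    return sum;
}
```

The worst case here is 16 * 21 = 336 bytes, which is cheap to reserve unconditionally. Sizing the array as `stride * (height + TAPNUMY - 1)` ties the stack usage to whatever stride the caller passes in, which has no fixed upper bound.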
[01:47:02] <michaedw> it looks like the code that manipulates intra_gb_ptr has been there a long, long time
[01:48:40] <Honoome> okay I guess I'll leave this be and read something (if anybody is looking for book suggestions, Jim Butcher's Dresden Files are quite good)
[01:49:30] <michaedw> I'm guessing that michael is the only one who has state on that code
[02:00:11] <mru> Dark_Shikari: why don't you fix those vlas?
[02:01:17] <Dark_Shikari> I did
[02:01:28] <Dark_Shikari> Oh wait
[02:01:30] <Dark_Shikari> Shit, you're right.
[02:01:32] <Dark_Shikari> >stride
[02:01:33] <Dark_Shikari> WHAT THE FUCK
[02:01:33] <Dark_Shikari> WHAT
[02:01:36] <Dark_Shikari> THE
[02:01:38] <Dark_Shikari> FUCK
[02:01:41] <Dark_Shikari> WHAAAAAAAAAAAAAT
[02:02:15] <mru> defuck it please
[02:03:19] <Dark_Shikari> I will
[02:03:20] <Dark_Shikari> one moment
[02:03:29] <mru> thanks
[02:04:13] <Dark_Shikari> actually this is a serious problem with the asm
[02:04:17] <Dark_Shikari> it assumes output stride == input stride
[02:04:26] <mru> ungood
[02:04:29] <Dark_Shikari> this will take a significant amount of defucking
[02:04:32] <mru> if it needs temp buffers
[02:04:44] <Dark_Shikari> I will make BBB do this
[02:04:48] <Dark_Shikari> because he fucked it up
[02:04:56] <mru> I'll troll him as soon as I see him
[02:05:44] <CIA-99> ffmpeg: michael * r23732 /trunk/libavformat/asfdec.c:
[02:05:44] <CIA-99> ffmpeg: Continue after guids in asf after which other guids are possible instead of skiping
[02:05:44] <CIA-99> ffmpeg: over the stored size.
[02:05:44] <CIA-99> ffmpeg: Fixes issue2029
[02:15:08] <michaedw> whom would I ask about the hardware accelerator framework?
[02:15:32] <mru> the lhc guys :-)
[02:15:53] <michaedw> mru: <g>
[02:16:06] <mru> they can accelerate a macroblock to 99.9% of lightspeed
[02:16:14] <Dark_Shikari> and then crash it into a reference frame
[02:16:50] <mru> sometimes they dump core
[02:16:52] <mru> for real
[02:16:52] <michaedw> Step 3: ? Step 4: PROFIT!!!
[02:17:23] <michaedw> sorry, that's my Silicon Valley showing }:->
[02:17:27] <kierank> they want to find the higgs field but they don't know if it's tff or bff
[02:17:44] <CIA-99> ffmpeg: mru * r23733 /trunk/configure: Enable pthreads automatically unless w32threads is requested
[02:17:46] <Dark_Shikari> mru: I _hate_ x86 simd prior to ssse3. like, holy shit
[02:17:47] <Dark_Shikari> http://pastebin.org/353192
[02:17:50] <michaedw> Higgs and the LHC guys are definitely BFF
[02:17:54] <mru> field order == spin?
[02:17:57] <Dark_Shikari> those are two loop cores from a 4-tap horizontal MC function
[02:18:01] <Dark_Shikari> THEY DO THE SAME THING
[02:18:07] <Dark_Shikari> one is with sse2, one is with ssse3
[02:18:25] <Dark_Shikari> the mere existence of an arbitrary shuffle + byte multiply eliminates 2/3 of the code
[02:18:40] <mru> nice
[02:18:48] <michaedw> very nice
[02:19:12] <mru> not that I understand a single line of it
[02:19:20] <michaedw> I wonder how much difference there is at the microcode level
[02:19:25] <Dark_Shikari> michaedw: huge
[02:19:28] <Dark_Shikari> most of these ops are one uop
[02:19:35] <Dark_Shikari> most modern x86 simd is not cisc
[02:19:44] <Dark_Shikari> where "cisc" is defined in this case as "multiple internal uops for one instruction"
[02:19:57] <Dark_Shikari> that generally is only done for emulation of harder simd ops on shitty chips, like atom.
[02:19:58] <mru> the proper term for that is "microcoded"
[02:20:08] <Dark_Shikari> true.
[02:20:29] <mru> a risc ISA _could_ be microcoded
[02:20:31] <Dark_Shikari> equally, they're all one-inverse-throughput-per-execution-unit
[02:20:34] <mru> no sane person would do that though
[02:20:43] <michaedw> I don't mean uops so much as microcode size
[02:20:45] <Dark_Shikari> i.e. each execution unit can do one of these ops per cycle
[02:21:40] <Dark_Shikari> whenever someone is designing an simd instruction set, they need to look at stuff like this
[02:21:48] <mru> but single uops aren't necessarily single-cycle
[02:21:52] <Dark_Shikari> and think "how can we not massively cripple people writing complex functions?"
[02:22:00] <Dark_Shikari> er, s/complex/common
[02:22:47] <ohsix> wont need to buy new stuff if they give you all the goods at once
[02:22:48] <michaedw> trace cache footprint
[02:23:26] <mru> Dark_Shikari: that's what they did when they designed the neon instruction set
[02:25:28] <mru> Dark_Shikari: btw, did you see that commit I just did?
[02:25:42] <Dark_Shikari> which
[02:25:47] <mru> auto-pthreads
[02:26:24] <Dark_Shikari> Wait, it's in?
[02:26:32] <mru> scroll up
[02:27:13] <Dark_Shikari> _awesome_
[02:27:17] <Dark_Shikari> wait, we still support w32threads?
[02:27:26] <mru> apparently
[02:27:38] <mru> I met some resistance trying to kill it
[02:28:32] <Dark_Shikari> who the hell cares?
[02:28:57] <mru> ramiro
[02:29:21] <Dark_Shikari> o.0 he cares?
[02:29:31] <mru> I can't imagine anyone else does
[02:31:02] <michaedw> my knowledge in this area is rather stale, but it would be interesting to see those loop internals translated into micro-ops
[02:31:24] <mru> on a modern x86, pretty much as written
[02:32:19] <michaedw> looks like it. probably similar on a P-M core, too. Probably only funky on a P4.
[02:32:33] <mru> and atom
[02:35:08] <michaedw> atom has fewer decode units, but still has micro-ops more like P-M/Core than like P4, I think
[02:35:40] <mru> at least they kept the barrel shifter
[02:35:47] * mru glares at cell
[02:38:23] <michaedw> I would have thought the mova's in the SSE2 version were pretty nearly free
[02:38:46] <mru> what does it do?
[02:39:32] <Dark_Shikari> a mov between registers uses an execution unit just like everything else
[02:40:07] <michaedw> won't they get bonded to the subsequent shift instructions?
[02:40:23] <Dark_Shikari> This is x86, not a magical genie in a cpu.
[02:40:35] <Dark_Shikari> x86 doesn't merge movs and instructions yet.
[02:40:36] <michaedw> I thought that was one of the "micro-op fusion" use cases
[02:40:39] <Dark_Shikari> no
[02:40:42] <Dark_Shikari> cmp/jump is
[02:40:45] <mru> Dark_Shikari: am I dreaming or did you say mov between xmm regs was stupidly slow on some cpu?
[02:40:46] <Dark_Shikari> don't know of any others
[02:40:49] <Dark_Shikari> mru: pentium 4
[02:40:55] <Dark_Shikari> 6 cycles for a mov between mmx/xmm registers
[02:41:00] <Dark_Shikari> 8 cycles for a load from L1
[02:41:07] <mru> lol
[02:41:09] <Dark_Shikari> I wish I was making this up.jpg
[02:41:34] <mru> uop fusion is a paper-only feature
[02:42:39] <mru> if you pay attention, you'll notice that the marketing drivel doesn't say how well anything works
[02:42:42] <mru> only that they have it
[02:43:01] <Dark_Shikari> and gcc loves to reorder in ways that make it impossible
[02:43:02] <michaedw> I think you're thinking of macro-op fusion
[02:47:13] <michaedw> I would have thought that micro-op fusion would work the same whether the micro-ops originally came from the same SSE3 instruction or from nearby SSE2 instructions
[02:47:44] <michaedw> at least that's the way that I would have designed it :-)
[02:47:49] <michaedw> for exactly this reason
[02:48:11] <mru> assuming you _could_
[02:48:23] <michaedw> so that the SSE2 version would execute, in practice, as fast on the newer core as the logically equivalent SSE3 version
[02:48:37] <michaedw> register names are just labels in the pipeline anyway
[02:49:23] <michaedw> depends on your instruction scheduling flexibility, of course
[02:49:41] <mru> if it were that simple, why bother adding sse3?
[02:49:43] <Dark_Shikari> ummmmm
[02:49:45] <Dark_Shikari> but it's not logically equivalent
[02:49:46] <Dark_Shikari> ...
[02:49:50] <Dark_Shikari> they're completely different algorithms
[02:49:54] <Dark_Shikari> to solve the same problem
[02:50:05] <Dark_Shikari> it's just that the latter, much simpler one, is made possible by having better instructions available
[02:50:47] <michaedw> sse3 is more of a guide to compiler writers than anything else
[02:50:53] <Dark_Shikari> ssse3 is not sse3
[02:51:14] <mru> will there be an sssse4?
[02:51:40] <Dark_Shikari> no, we're on to sse5 now.
[02:51:44] <Dark_Shikari> and avx
[02:51:47] <michaedw> Dark_Shikari: ah, that's quite right
[02:51:53] <Dark_Shikari> avx == sse 2
[02:51:55] <Dark_Shikari> not sse2, but SSE 2
[02:52:02] <Dark_Shikari> i.e. repeating all the same mistakes of the original SSE
[02:52:25] <mru> avx?
[02:52:25] <michaedw> saturation is helpful
[02:52:36] <Dark_Shikari> mru: 256-bit vectors
[02:53:06] <mru> could occasionally be useful
[02:53:10] <Dark_Shikari> mru: ... float only
[02:53:15] <mru> aaaaiieee
[02:53:18] <Dark_Shikari> Exactly.
[02:53:27] <Dark_Shikari> and it's three-operand.
[02:53:32] <michaedw> that's got to be for FP textures
[02:53:40] <mru> 3-operand is good
[02:53:45] <Dark_Shikari> yup it is
[02:53:50] <Dark_Shikari> they originally announced it as the logical extension of SSE
[02:53:56] <Dark_Shikari> supporting integer and float etc, all the normal sse instructions
[02:53:57] <Dark_Shikari> and then they said
[02:54:03] <Dark_Shikari> "oh, integer is only for the low 128 bits"
[02:54:07] <Dark_Shikari> *headdesk*
[02:54:36] <michaedw> that's not so unreasonable; if you have limited silicon to spend, spend it on something that hasn't already been optimized heavily
[02:54:54] <michaedw> do they have half-float load/store?
[02:55:24] <Dark_Shikari> float "hasn't already been optimized heavily"?
[02:55:37] <Dark_Shikari> "limited silicon" --> how about spend it on things which are far cheaper than float, like integer?
[02:55:46] <mru> with that argument, you should never do anything new
[02:56:02] <michaedw> very low-precision float, for high-dynamic-range texture storage
[02:56:13] <mru> I know what half-float is
[02:56:18] <mru> cortex-a9 supports it
[02:56:31] <michaedw> a la neon fp16
[02:56:37] <michaedw> yep
[02:56:47] <Dark_Shikari> half-float: all the low precision of integers, all the speed of floats
[02:57:20] <michaedw> an elegant solution to certain 3-D rendering problems
[02:57:34] <michaedw> very special-purpose. happens to be a special purpose that sells chips.
[02:57:36] <mru> an elegant way to shoot yourself in the foot
[02:57:51] <Dark_Shikari> "sells chips"
[02:58:10] <Dark_Shikari> intel loves releasing totally useless instructions that have zero practical application
[02:58:14] <Dark_Shikari> *cough* mpsdabw
[02:58:14] <Dark_Shikari> *mpsadbw
[02:58:20] <michaedw> when memory bandwidth is the only constrained resource, slimming memory bandwidth helps
[02:59:48] <michaedw> makes it possible to keep all your textures in main memory and load/store them with cache-bypassing instructions
[03:00:09] <michaedw> without eating your entire mobile DDR bandwidth
[03:00:23] <mru> so let me get this straight...
[03:00:46] <mru> in the near future, we're supposed to do graphics on the cpu and everything else on the gpu?
[03:01:05] <Dark_Shikari> lol
[03:01:23] <michaedw> it's largely irrelevant if you have separate texture memory
[03:01:40] <michaedw> aimed at mobile, mostly
[03:01:43] <saintdev> mru: lmao
[03:01:45] <michaedw> AIUI
[03:02:02] <michaedw> mru: <g>
[03:02:56] * mru waits for the CPCPU
[03:03:22] <michaedw> thanks for conversation and help. I'll go through roundup and annotate bug reports that appear to boil down to missing/non-functioning h264 parser (like I hit)
[03:03:34] <mru> I doubt there are many
[03:06:00] <saintdev> mru: CP = ?
[03:06:21] <mru> no fucking clue
[03:08:38] <Dark_Shikari> ........... fuck the vp8 interpolation filter coefficients
[03:08:40] <Dark_Shikari> fuck them
[03:08:41] <Dark_Shikari> long and hard
[03:08:55] <Dark_Shikari> I think it's impossible to do the 6-tap filter completely with pmaddubsw
[03:08:56] <mru> what's the problem?
[03:09:11] <mru> what are the coeffs?
[03:09:24] <Dark_Shikari> 2,-11,108,36,-8,1
[03:09:31] <Dark_Shikari> 3,-16,77,77,-16,3
[03:09:37] <michaedw> I suspect this, for instance, is related: http://lists.mplayerhq.hu/pipermail/ffmpeg-user/2010-April/024956.html
[03:09:38] <Dark_Shikari> 1,-8,36,108,-11,2
[03:10:03] <Dark_Shikari> the problem is 108 * X + 36 * Y can saturate a 16-bit signed word
[03:10:21] <Dark_Shikari> But the -11 * Z + -8 * W could reduce it below saturation again.
[03:10:40] <mru> unless those pixels are zero
[03:10:50] <Dark_Shikari> well we're just talking worst-case here.
[03:10:55] <mru> hmm, does vp8 use full or reduced yuv range?
[03:11:02] <Dark_Shikari> no signalling, it supports both
[03:11:10] <Dark_Shikari> you could have X and Y == 255
[03:11:15] <Dark_Shikari> for example
[03:11:22] <Dark_Shikari> or whatever values are necessary to break things
[03:11:54] <mru> why so huge coeffs?
[03:12:07] <Dark_Shikari> No idea.
[03:12:08] <Dark_Shikari> Stupidity
[03:12:12] <Dark_Shikari> I just figured out how to fix this though
[03:12:20] <mru> this for qpel?
[03:12:25] <Dark_Shikari> yes
[03:12:28] <Dark_Shikari> pmaddubsw does two coeffs at a time
[03:12:29] <Dark_Shikari> so if I do
[03:12:43] <michaedw> subtract the middle row from the other two, add them back together after collapsing
[03:12:57] <Dark_Shikari> (-8 * X1 + 36 * X2) + (108 * X3 -11 * X4)
[03:13:02] <Dark_Shikari> That avoids the saturation possibility
[03:13:08] <Dark_Shikari> i.e. different parenthesis grouping
[03:13:50] <mru> what if X1 and X4 are zero?
[03:13:58] <mru> and X2 and X3 huge
[03:14:27] <Dark_Shikari> no problem
[03:14:35] <Dark_Shikari> the point is we don't want it to saturate in one direction
[03:14:38] <Dark_Shikari> and then need to be _dragged down again_
[03:14:40] <Dark_Shikari> then it breaks
[03:14:45] <Dark_Shikari> so as long as saturation is permanent, we're good
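The grouping argument above can be checked with a scalar model of one 16-bit lane. This is only a sketch: the helper names are hypothetical, it ignores how bytes are actually laid out in registers (the real code needs shuffles to bring non-adjacent pixels together), and the final rounding of (x + 64) >> 7 is an assumption based on the taps summing to 128.

```python
# Scalar model of one 16-bit lane of pmaddubsw / paddsw (illustrative only).
INT16_MIN, INT16_MAX = -32768, 32767

def sat16(v):
    """Clamp to the signed 16-bit range, as the SIMD instructions do."""
    return max(INT16_MIN, min(INT16_MAX, v))

def maddubs(u0, s0, u1, s1):
    """One pmaddubsw lane: unsigned bytes u0, u1 times signed bytes s0, s1."""
    return sat16(u0 * s0 + u1 * s1)

def adds(a, b):
    """One paddsw lane: saturating signed 16-bit add."""
    return sat16(a + b)

COEFFS = (1, -8, 36, 108, -11, 2)  # one of the filters quoted above

def exact(p):
    """Reference result with unlimited precision."""
    return sum(x * c for x, c in zip(p, COEFFS))

def filter_adjacent(p):
    """Pair taps (0,1), (2,3), (4,5): the 36/108 pair can saturate early."""
    c = COEFFS
    acc = maddubs(p[2], c[2], p[3], c[3])
    acc = adds(acc, maddubs(p[0], c[0], p[1], c[1]))
    return adds(acc, maddubs(p[4], c[4], p[5], c[5]))

def filter_regrouped(p):
    """Pair taps (1,2), (3,4), (0,5): each large tap sits next to a negative one."""
    c = COEFFS
    acc = maddubs(p[1], c[1], p[2], c[2])
    acc = adds(acc, maddubs(p[3], c[3], p[4], c[4]))
    # The (1, 2) pair is never negative, so adding it last cannot pull a
    # saturated accumulator back down.
    return adds(acc, maddubs(p[0], c[0], p[5], c[5]))

def to_pixel(acc):
    """Assumed rounding and clamp back to 8 bits."""
    return max(0, min(255, adds(acc, 64) >> 7))

pixels = (0, 255, 255, 255, 255, 0)
print(exact(pixels), filter_adjacent(pixels), filter_regrouped(pixels))
```

With the adjacent pairing the 36/108 product saturates and the negative taps then pull the saturated value down, so the result is wrong. With the regrouped pairing no single pair can exceed the 16-bit range, and any saturation in the running sum only happens when the final pixel would clamp to 255 anyway.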
[03:14:52] <mru> ah right
[03:15:08] <Dark_Shikari> ugh, now I have to reorder all my coeffs again
[03:15:21] <mru> you're saturating immediately
[03:15:30] <mru> that's odd
[03:15:31] <Dark_Shikari> pmaddubsw saturates in its output
[03:15:39] <Dark_Shikari> I can use it to do two out of six coeffs
[03:15:41] <Dark_Shikari> and saturate the output of that
[03:15:46] <michaedw> you can use the 128-bit-wide variant to do the whole row
[03:15:58] <Dark_Shikari> Huh?
[03:16:02] <Dark_Shikari> pmaddubsw is the 128-bit-wide variant.
[03:16:12] <mru> where do you add the residual?
[03:16:17] <Dark_Shikari> this is just MC
[03:16:19] <michaedw> if you subtract the middle row from the other two
[03:16:42] <michaedw> and add the result back
[03:16:52] <mru> and I thought h264 qpel was hairy...
[03:17:27] <michaedw> then none of the intermediate results saturates
[03:17:31] <Dark_Shikari> h264 qpel is hairier.
[03:17:41] <mru> two-stage
[03:18:08] <mru> this is more like the h264 chroma mc
[03:18:31] <Dark_Shikari> yes, it's one-stage
[03:18:35] <Dark_Shikari> And i got it.
[03:18:37] <Dark_Shikari> "results identical"
[03:18:38] <Dark_Shikari> ssse3 done.
[03:18:59] <Dark_Shikari> er, h-filter done.
[03:19:04] <Dark_Shikari> v-filter will be... its own problem.
[03:19:16] <Dark_Shikari> oh, and we'll have to bug BBB to fix his stride (lol)
[03:20:02] <drv> nobody gonna break my stride
[03:23:23] <Dark_Shikari> afk. mru, hit BBB if he comes on.
[03:26:16] <michaedw> you could also do four adjacent pixels in 9 ops, accumulate using paddw
[03:26:42] <michaedw> 9 pmaddubsw ops, that is
[03:31:53] <michaedw> you'd need to rebase the coefficients to prevent overflows during paddw
[03:32:41] <michaedw> no, you wouldn't; the totals fit in unsigned words
[03:37:27] <Dark_Shikari> um, 9?
[03:37:29] <Dark_Shikari> for 4 pixels??
[03:37:56] <Dark_Shikari> I'm using 3 for 8 pixels
[03:38:03] <michaedw> actually, I think what you want is to load 8 pixels and use this operand 9 times to calculate 3 filter results
[03:38:27] <michaedw> shifting the coeffs in between
[03:39:23] <michaedw> and for that you need coeffs that won't saturate in pairs
[03:39:46] <michaedw> because the pairing is different after shifting the coeffs left one byte
[03:40:41] <michaedw> shifting coeffs instead of pixels, you don't stall
[03:42:36] <michaedw> and you can run several batches of pixels through in parallel, shifted by 3 bytes at a time
[03:43:31] <Dark_Shikari> that's way too complicated
[03:43:49] <Dark_Shikari> I can do a 6-tap filter of an 8x8 block in 18 multiplies
[03:43:53] <Dark_Shikari> er, 24 multiplies.
[03:44:23] <Dark_Shikari> one multiply per 2.7 output pixels
[03:44:25] <michaedw> with what edge conditions?
[03:44:38] <Dark_Shikari> none. emulated_edge_mc handles those.
[03:45:17] <michaedw> at what cost?
[03:45:23] <Dark_Shikari> effectively zero
[03:45:30] <Dark_Shikari> it costs nothing except for blocks with edge conditions
[03:45:41] <Dark_Shikari> which are O(sqrt(N)) of the total mbs
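A rough way to see the O(sqrt(N)) claim is to count the macroblocks that touch the frame border. This sketch assumes 16x16 macroblocks and only looks at position; which blocks really need edge emulation also depends on their motion vectors, so treat the numbers as an indication of scale rather than a measurement.

```python
def border_macroblocks(width, height, mb=16):
    """Return (border, total) macroblock counts for a frame of the given size."""
    cols = -(-width // mb)   # ceiling division
    rows = -(-height // mb)
    total = cols * rows
    interior = max(cols - 2, 0) * max(rows - 2, 0)
    return total - interior, total

for w, h in ((640, 360), (1280, 720), (1920, 1080)):
    border, total = border_macroblocks(w, h)
    print(f"{w}x{h}: {border}/{total} = {border / total:.1%}")
```

The border count grows with the perimeter while the total grows with the area, so the fraction shrinks as the frame gets larger.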
[03:46:15] <michaedw> right, but what edge conditions on the 8x8 block itself?
[03:46:23] <Dark_Shikari> huh?
[03:46:33] <Dark_Shikari> if motion compensation references pixels outside of the frame
[03:46:40] <Dark_Shikari> those pixels shall be equal to the closest pixel inside the frame.
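The padding rule just described amounts to clamping coordinates before the fetch. A minimal sketch follows; the function names are hypothetical and this is not the actual emulated_edge_mc code, which copies into a scratch buffer far more efficiently.

```python
def ref_pixel(frame, x, y):
    """Fetch frame[y][x], clamping coordinates to the nearest pixel in the frame."""
    h = len(frame)
    w = len(frame[0])
    cx = min(max(x, 0), w - 1)
    cy = min(max(y, 0), h - 1)
    return frame[cy][cx]

def emulated_edge_block(frame, x0, y0, bw, bh):
    """Copy a bw x bh block starting at (x0, y0) into a padded scratch buffer."""
    return [[ref_pixel(frame, x0 + dx, y0 + dy) for dx in range(bw)]
            for dy in range(bh)]

frame = [[1, 2, 3],
         [4, 5, 6]]
print(emulated_edge_block(frame, -1, -1, 3, 2))
```

The filter can then run on the scratch buffer exactly as it would on an interior block, which is why the fast path needs no boundary checks.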
[03:47:54] <michaedw> right, so you need to pad outside the block with edge pixels to compute the filter, right?
[03:48:03] <Dark_Shikari> yes
[03:49:39] <michaedw> I would think it would be cheapest to do that when loading the first and last batches of pixels in the row
[03:50:07] <Dark_Shikari> um... it's only done in like 1% of blocks
[03:50:11] <Dark_Shikari> it's not worth optimizing
[03:51:14] <michaedw> hmm, we're talking past one another. I'm not talking about edge blocks.
[03:52:11] <michaedw> I'm talking about the contribution of pixels past the edge of the 8x8 block to the filter outputs near the edge of the block.
[03:52:44] <Dark_Shikari> why does that matter?
[03:52:48] <Dark_Shikari> movdqu, [src-2]
[03:52:54] <Dark_Shikari> bam, now you have all the pixels you need to do your filter.
[03:53:06] <Dark_Shikari> there is no "boundary" you have to obey
[03:53:08] <Dark_Shikari> just load past it
[03:54:08] <michaedw> great. now you run your three sets of filter coeffs over it, shift them, run them over it, shift them, run them over it. right?
[03:54:43] <Dark_Shikari> shift, what?
[03:55:00] <Dark_Shikari> http://pastebin.com/ZWd1pzRK
[03:55:03] <Dark_Shikari> there's the code
[03:55:07] <Dark_Shikari> "shift, what shift?"
[03:56:37] <michaedw> the pshufb's seem expensive
[03:56:50] <Dark_Shikari> 1/0.5 (latency / inverse throughput)
[03:56:53] <Dark_Shikari> i.e. two per cycle, 1 latency to finish
[03:59:36] <michaedw> seems like your multipliers are going to spend a lot of time stalled
[04:00:06] <Dark_Shikari> why?
[04:00:13] <Dark_Shikari> they take 3 cycles and you can issue one per cycle
[04:00:23] <Dark_Shikari> as each pshufb finishes, the multiply begins
[04:00:30] <Dark_Shikari> as each multiply finishes, an add begins
[04:00:42] <michaedw> the loop is bottlenecked on m0
[04:01:02] <Dark_Shikari> Not quite.
[04:01:14] <Dark_Shikari> The CPU can start executing the top of the loop before the bottom is finished.
[04:01:18] <Dark_Shikari> Isn't x86 great?
[04:02:01] <michaedw> core2 pipeline is 14 stages deep
[04:02:13] <Dark_Shikari> the reorder buffer is something like 50+ instructions
[04:02:45] <michaedw> which would help, if m0 didn't get reused for every batch of pixels
[04:02:51] <michaedw> with an unaligned load
[04:03:35] <michaedw> when you could calculate 3 pixels per load by shifting the filters
[04:03:38] <Dark_Shikari> doesn't matter
[04:03:43] <Dark_Shikari> the cpu can track dependencies.
[04:03:58] <Dark_Shikari> it knows the m0 at the top of iteration N+1 is unrelated to the m0 at the bottom from iteration N
[04:09:39] <michaedw> Dark_Shikari: you're probably right there; thinko from RISC habits
[04:10:33] <michaedw> how expensive is the unaligned load?
[04:13:17] <michaedw> and now that I look at it, I see that the shuffle indices result in register starvation
[04:14:06] <michaedw> probably doesn't matter, the loads in the loop can be scheduled early enough
[04:14:42] <michaedw> yeah, my RISC instincts don't serve me well here at all
[04:21:01] <michaedw> oh, yeah, there is a reason to do this in parallel
[04:21:24] <Dark_Shikari> the unaligned load costs two aligned loads on any intel chip
[04:21:29] <Dark_Shikari> except when it crosses a cacheline
[04:21:34] <Dark_Shikari> in which case it costs one L1 cache miss.
[04:21:43] <Dark_Shikari> on an AMD chip, it costs two aligned loads, except on phenom, where it costs one.
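How often a 16-byte unaligned load hits the expensive cacheline-crossing case can be estimated by counting start offsets. The 64-byte line size below is an assumption, not something stated in the discussion.

```python
def crosses_cacheline(addr, size=16, line=64):
    """True if a load of `size` bytes at byte address `addr` spans two cache lines."""
    return addr // line != (addr + size - 1) // line

crossing = sum(crosses_cacheline(a) for a in range(64))
print(f"{crossing} of 64 start offsets cross a line boundary")
```

Under that assumption a bit under a quarter of uniformly distributed start addresses would pay the crossing penalty.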
[04:22:17] <michaedw> you want to do a second pass interpolating along the other axis
[04:22:37] <Dark_Shikari> that's the V filter.
[04:22:39] <Dark_Shikari> This is the H filter.
[04:22:56] <michaedw> so you want to be accumulating pixels into a rotated matrix
[04:23:04] <Dark_Shikari> Only in HV mode.
[04:23:10] <Dark_Shikari> H mode == just H interpolation
[04:23:14] <Dark_Shikari> V mode == just V interpolation
[04:23:16] <Dark_Shikari> HV == both
[04:23:16] <michaedw> and writing them word-wise
[04:23:19] <michaedw> right
[04:23:27] <Dark_Shikari> in HV mode, there's just a temp array that we write to
[04:23:31] <Dark_Shikari> the core function isn't aware of it
[04:23:32] <Dark_Shikari> a wrapper handles it
[04:23:50] <Dark_Shikari> writing as words might help but is unnecessary, we have to do intermediate clamping in betwe--- wait a minute.
[04:23:55] <Dark_Shikari> WHAT?
[04:23:57] <Dark_Shikari> WHAAAAAAAAAAAT?
[04:24:06] <Dark_Shikari> There's intermediate clamping in the MC?
[04:24:11] <Dark_Shikari> What the fuck are they on
[04:28:06] <michaedw> probably because they implemented it in assembly first, then settled for whatever saturation behavior it gave them
[04:29:06] <michaedw> the spec also shows interpolation in 1/8-pixel intervals
[04:29:55] <michaedw> which you could do with 4 sets of filter coeffs and a byte-reversed copy of the pixels
[04:30:09] <Dark_Shikari> no, that isn't what I mean
[04:30:16] <Dark_Shikari> they explicitly round off after the first pass of the interpolation
[04:30:24] <Dark_Shikari> and then unpack _again_ to get the second pass
[04:30:31] <Dark_Shikari> i.e. they intentionally drop all internal precision between passes
[04:30:31] <michaedw> yep
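The two-pass behavior being complained about can be sketched as follows: the first pass rounds and clamps to 8 bits into a temp array, and the second pass starts again from those 8-bit values. The function names, the pass order, and the (x + 64) >> 7 rounding are assumptions for illustration, not taken from the spec text.

```python
HALF = (3, -16, 77, 77, -16, 3)  # half-pel taps quoted earlier in the log

def clamp8(v):
    return max(0, min(255, v))

def tap6(samples, taps):
    """Apply six taps, then round and clamp to 8 bits (rounding is assumed)."""
    acc = sum(s * t for s, t in zip(samples, taps))
    return clamp8((acc + 64) >> 7)

def filter_h(src, taps):
    """Filter each row; the output is 5 columns narrower than the input."""
    return [[tap6(row[x:x + 6], taps) for x in range(len(row) - 5)]
            for row in src]

def filter_v(src, taps):
    """Filter each column; the output is 5 rows shorter than the input."""
    rows = len(src) - 5
    cols = len(src[0])
    return [[tap6([src[y + k][x] for k in range(6)], taps) for x in range(cols)]
            for y in range(rows)]

def filter_hv(src, taps_h, taps_v):
    """H pass into an 8-bit temp array, then V pass over that array."""
    tmp = filter_h(src, taps_h)   # precision beyond 8 bits is dropped here
    return filter_v(tmp, taps_v)

flat = [[100] * 13 for _ in range(13)]   # 13x13 input yields an 8x8 output
out = filter_hv(flat, HALF, HALF)
print(len(out), len(out[0]), out[0][0])
```

Because tmp holds plain 8-bit pixels, the wrapper can reuse the same H and V kernels unchanged, at the cost of the lost intermediate precision.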
[04:31:07] <michaedw> probably so they could byte-pack an intermediate array
[04:31:15] <michaedw> rotated :-)
[04:35:08] <michaedw> if it were me, I'd probably do something like that; work in 4-row stripes, write them rotated
[04:35:22] <Dark_Shikari> um, why would you rotate?
[04:35:25] <Dark_Shikari> there's no reason to rotate anything
[04:35:38] <michaedw> minimize store bandwidth
[04:35:55] <michaedw> on most architectures, unaligned stores are read-modify-write
[04:36:24] <Dark_Shikari> but the stores aren't unaligned
[04:36:59] <michaedw> depends how you interleave the 8 interpolated columns that you get from the H pass
[04:37:42] <Dark_Shikari> you just write a V filter and an H filter
[04:37:44] <Dark_Shikari> it's not that hard
[04:37:45] <michaedw> I'd want to store rotated, ready to be loaded and V filtered
[04:37:58] <Dark_Shikari> that's stupid
[04:38:08] <michaedw> because when you do the V filter pass, you're cold-cache
[04:38:09] <Dark_Shikari> H and V are both fast operations
[04:38:28] <michaedw> way cheaper to do the rotation while you've got the data in cache
[04:38:47] <Dark_Shikari> um....
[04:38:51] <Dark_Shikari> it will all be in L1 cache.
[04:38:57] <Dark_Shikari> unless you're on a CPU with 320 bytes of cache
[04:38:58] <Dark_Shikari> or something
[04:39:02] <Dark_Shikari> even then I think you can still fit it all in cache
[04:40:32] <michaedw> the block explodes 8-fold when you do all the interpolations
[04:41:04] <Dark_Shikari> um, no it doesn't
[04:41:13] <Dark_Shikari> you have no idea how motion compensation works
[04:41:51] <michaedw> I have a patent in this area
[04:42:08] <michaedw> an old patent, but still
[04:43:01] <michaedw> I could well be confused about this specific codec, but not because I "have no idea"
[04:47:37] <michaedw> ah, I see where our perspectives differ. I am refactoring to not stride from row to row.
[04:49:47] <michaedw> I'm thinking in terms of implementing this during the initial streaming pass over the incoming pixels.
[04:50:20] <Dark_Shikari> "initial streaming pass"?
[04:50:28] <Dark_Shikari> We're pointing a motion vector to a reference frame.
[04:50:32] <Dark_Shikari> there is no "initial streaming pass"
[04:51:05] <michaedw> If you already have the frame in memory, and you can afford the hit to your fetch address prediction, sure, stride from row to row
[04:51:57] <Dark_Shikari> in memory?
[04:51:58] <Dark_Shikari> as opposed to what
[04:52:00] <Dark_Shikari> in thin air?
[04:52:11] <Dark_Shikari> "fetch address prediction"?
[04:52:23] <michaedw> and I am of course thinking of encoding, not decoding, so I'm completely on crack
[04:52:47] <Dark_Shikari> on encoding you'd do the same thing
[04:52:59] <Dark_Shikari> Since there's no magic shortcut you can take for an unstaged filter.
[04:53:04] <Dark_Shikari> like you can for say h264.
[04:53:57] <michaedw> I'm thinking motion estimation, and producing subsampled frames for comparison to the reference frame
[04:54:36] <michaedw> for which you really do need the whole range of subpixel shifts
[04:54:59] <Dark_Shikari> no you don't
[04:55:04] <Dark_Shikari> a diamond search is usually enough
[04:55:18] <Dark_Shikari> in qpel, that would end up searching a small (albeit significant) fraction of the positions
[04:55:33] <Dark_Shikari> of course in h264 you just pre-interpolate the hard 6-tap and do linear on the fly.
[04:56:47] <michaedw> depends what your scene looks like; sometimes the local minimum is not the best estimate
[04:57:55] <michaedw> visually, after quantization, if not arithmetically
[04:58:14] <michaedw> hard edges can throw off individual estimates
[04:58:27] <Dark_Shikari> that's what psy metrics are for
[04:58:47] <michaedw> the overall shape of the well is often a better predictor
[04:59:06] <Dark_Shikari> it doesn't matter what the shape of the well is
[04:59:09] <Dark_Shikari> it matters what the result looks like
[05:00:02] <michaedw> right. and that depends on whether the detail you lose in quantization is the interesting detail or not.
[05:00:51] <Dark_Shikari> Which you can measure.
[05:01:16] <michaedw> the local minimum of the mean-square delta in luma relative to the reference frame is not always the best predictor.
[05:01:29] <Dark_Shikari> Whoever said to use mean-square?
[05:01:41] <michaedw> your choice of statistic
[05:01:43] <Dark_Shikari> SSD is a horrible motion search metric
[05:01:44] <Dark_Shikari> nobody uses it
[05:01:52] <Dark_Shikari> SAD is by far the most common.
[05:01:55] <Dark_Shikari> SATD is a lot better.
[05:02:02] <Dark_Shikari> SADCTD is marginally better but not worth it.
[05:02:04] <Dark_Shikari> RD is better.
[05:02:08] <Dark_Shikari> RD with a psy metric is even better.
[05:03:13] <Dark_Shikari> (by "motion search" I mean subpel search here. SAD is sufficient for fullpel in most cases.)
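For readers unfamiliar with the metrics being ranked, here is a minimal sketch of SAD, SSD, and a 4x4 SATD built on an unnormalized Hadamard transform. Real encoders usually apply a scaling factor to SATD and use SIMD; both are left out here.

```python
def sad(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def ssd(a, b):
    """Sum of squared differences."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def hadamard4(v):
    """Unnormalized 4-point Hadamard transform of a 4-element list."""
    a, b, c, d = v
    s0, s1, d0, d1 = a + b, c + d, a - b, c - d
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]

def satd4x4(a, b):
    """Sum of absolute Hadamard-transformed differences for 4x4 blocks."""
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    rows = [hadamard4(r) for r in diff]
    cols = [hadamard4([rows[y][x] for y in range(4)]) for x in range(4)]
    return sum(abs(v) for col in cols for v in col)

flat_a = [[10] * 4 for _ in range(4)]
flat_b = [[8] * 4 for _ in range(4)]
spike = [[14, 10, 10, 10]] + [[10] * 4 for _ in range(3)]
print(sad(flat_a, flat_b), satd4x4(flat_a, flat_b))   # uniform offset
print(sad(spike, flat_a), satd4x4(spike, flat_a))     # single-pixel spike
```

A uniform brightness offset scores the same under SAD and this SATD, while a single-pixel spike scores much higher under SATD because its energy spreads across every transform coefficient, which is closer to what it costs to code.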
[05:09:02] <michaedw> at the time that I looked closely at the problem, I found that estimating vertical and horizontal separately and then doing a local search using something close to SAD gave the best bang for the buck; but then we couldn't afford a Hadamard in real-time back then
[05:10:16] <Dark_Shikari> heh, must have been a long time ago, or on very limited hardware =p
[05:10:23] <michaedw> (what we used in the field was closer to a sum of clamped absolute differences, with the clamping done with some crude light level dependence)
[05:10:48] <michaedw> designed in 1990
[05:11:13] <Dark_Shikari> 1990? did mpeg-1 even exist then?
[05:11:15] <Dark_Shikari> or was this h261?
[05:11:18] <michaedw> pity the guy who owned the company was clueless, he could have had a stake in the MPEG pool
[05:11:32] <michaedw> industrial application
[05:12:00] <michaedw> highway survey camera, if you can believe it
[05:12:30] * Dark_Shikari has heard all kinds of great hardware encoder stories
[05:12:35] <Dark_Shikari> like Harmonic's system
[05:12:35] <michaedw> prototyped on the first IndigoVideo board ever to leave the SGI premises
[05:12:37] <Dark_Shikari> they had an MPEG-2 encoder
[05:12:42] <Dark_Shikari> when H.264 came out, they were going to bootstrap it to h264
[05:12:44] <Dark_Shikari> it was DSP-based
[05:12:53] <Dark_Shikari> the guy who ran the project decided they didn't need deblocking
[05:12:57] <Dark_Shikari> because it was "only there to fix mistakes you made"
[05:13:05] <Dark_Shikari> "and we don't make mistakes"
[05:13:09] <michaedw> and the second, and the third, and the fourth
[05:13:21] <michaedw> *serious* infant mortality :-)
[05:13:25] <Dark_Shikari> by the time they finished, they realized that they didn't even have enough power to do both intra and inter analysis per block
[05:13:35] <Dark_Shikari> they ended up pitching out the entire MB core and buying another.
[05:13:41] <michaedw> yum
[05:14:42] <michaedw> anyway, I may be clueless about VP8, but not because I never thought about motion compensation :-)
[05:14:55] <Dark_Shikari> I don't think much of anyone has a clue about vp8
[05:15:11] <Dark_Shikari> even the original devs must be high
[05:15:29] <Dark_Shikari> that's the only way they could have come up with some of this shit
[05:15:43] <michaedw> it looks to me like it's designed for feasible encoding on foreseeable mobile processors
[05:15:58] <michaedw> multiple ARM cores with NEON SIMD, for instance
[05:16:36] <Dark_Shikari> _encoding_? you crazy?
[05:16:40] <Dark_Shikari> their current encoder is slow as crap :/
[05:16:50] <michaedw> lots of raw ops, but inadequate memory bandwidth and no cache to speak of
[05:16:51] <Dark_Shikari> and none of it screams "fast encoding" to me
[05:17:03] <Dark_Shikari> no cache? they have just as much L1 cache as a modern x86
[05:17:16] <michaedw> it's the small L2 that hurts
[05:17:19] <Dark_Shikari> No, L2 is useless
[05:17:27] <Dark_Shikari> L2 is only necessary to catch what falls out of L1
[05:17:55] <Dark_Shikari> ok, obviously not useless, but it's not going to be the killer
[05:18:01] <Dark_Shikari> and VP8 did nothing to save L2
[05:18:19] <Dark_Shikari> at least not as far as I can see
[05:18:36] <michaedw> L2 is what cuts your memory bandwidth down to size, and allows read-modify-write to suck less
[05:18:49] <Dark_Shikari> but nobody does read-modify-write.
[05:18:59] <Dark_Shikari> unaligned stores simply don't exist in video encoders
[05:19:07] <Dark_Shikari> unaligned _loads_ are a huge cost
[05:19:08] <michaedw> what else is any ram access smaller than a cache line?
[05:19:35] <Dark_Shikari> I'm pretty sure the CPU doesn't have to update RAM until the cacheline is evicted.
[05:19:41] <michaedw> *exactly*
[05:21:37] <michaedw> cache-bypassing loads, combined with stores to "freshly allocated" memory (so the CPU is told not to bother loading the pre-existing contents of the cache line)
[05:21:55] <Dark_Shikari> cache-bypassing loads? that's retarded
[05:22:05] <Dark_Shikari> then you have 300 cycles of latency you have to somehow hide
[05:22:27] <michaedw> yep
[05:22:35] <michaedw> *long* pipelines
[05:22:39] <Dark_Shikari> This is stupid.
[05:23:03] <Dark_Shikari> The best way to deal with a small L1 is to keep your working set small.
[05:23:08] <michaedw> 32-byte-wide cache-bypassing loads
[05:23:09] <Dark_Shikari> Not to use tons of "cache bypassing loads"
[05:23:28] <Dark_Shikari> I give up, I'll wait till morning for mru to come back and beat sense into you
[05:23:35] <Dark_Shikari> since he knows more about ARM than every single person in this channel combined
[05:24:03] <michaedw> I'm sure he does know way more than I do
[05:24:09] <michaedw> about ARM, among other things
[05:25:53] <michaedw> but check out VLDM some time
[05:26:40] <Dark_Shikari> what's special about VLDM?
[05:26:42] <Dark_Shikari> it's a nice instruction
[05:27:43] <michaedw> especially when you use it to fetch a whole, aligned cache line
[05:29:22] <michaedw> direct access to L2 cache, bypassing L1
[05:30:12] <michaedw> (you force a preload into the L2 cache in advance, using LDR)
[05:30:56] <astrange> where's a set of vp8 files?
[05:31:06] <michaedw> in a way, it's back to the bad old days of explicit prefetch
[05:31:40] <michaedw> but it makes up for the neon<->arm latency
[05:31:43] <Dark_Shikari> astrange: http://code.google.com/p/webm/downloads/detail?name=vp8-test-vectors-r1.zip…
[05:34:18] * Dark_Shikari wonders where BBB is
[05:35:19] <michaedw> analogous tricks worked back in the StrongARM days; we got about 20fps quarter-VGA MPEG-1 decode on a 200-MHz SA-1100 (8KB data cache)
[05:37:16] <michaedw> I expect to see VP8 doing 720p30 encode on one core of a dual ARMv7, leaving enough memory bandwidth for the other to do UI and network and such
[05:37:24] <Dark_Shikari> hah.
[05:37:36] <Dark_Shikari> You haven't actually tried this have you
[05:37:46] <Dark_Shikari> I just had to implement 1024x768p30 video playback on the iPad.
[05:37:50] <Dark_Shikari> It cannot decode MPEG-1 in realtime.
[05:38:05] <Dark_Shikari> VP8, by comparison, is marginally more complex than H.264.
[05:38:19] <Dark_Shikari> the iPad has a 1ghz armv7.
[05:38:22] <Dark_Shikari> With neon.
[05:38:42] <ohsix> and unicorns
[05:38:44] <Dark_Shikari> I ended up hand-optimizing the FLV decoder (it's simpler than MPEG-1) for up to 30-40% performance gain.
[05:39:00] <Dark_Shikari> It was still too slow to consistently reach 30fps in the hardest scenes.
[05:39:20] <Dark_Shikari> Trying to play 720p (higher res) VP8 (easily 2-3x harder than mpeg-1) on such a thing would be laughable
[05:39:30] <Dark_Shikari> with lavc, it can't play h264 at 15fps.....
[05:39:31] <Dark_Shikari> with deblocking off
[05:39:32] <Dark_Shikari> subpel off
[05:39:34] <Dark_Shikari> cabac off
[05:39:36] <Dark_Shikari> bframes off
[05:39:38] <Dark_Shikari> weighted pred off
[05:45:40] <michaedw> I would expect to see the cabac equivalent on the other cpu (the one doing the network layer anyway). there are no B frames in VP8. subpel will probably be scaled down on mobile encoders to 1/4 pixel units, maybe even 1/2 pixel.
[05:46:07] <Dark_Shikari> it already is 1/4 pixel units.
[05:46:13] <Dark_Shikari> there's no hpel option in the spec.
[05:46:26] <Dark_Shikari> b-frames decrease complexity when arithmetic coding is enabled.
[05:46:43] <michaedw> for decoding, yes
[05:46:57] <peloverde> DS, but can't the beagle do 720p h.264 in software?
[05:47:00] <Dark_Shikari> no way
[05:47:02] <Dark_Shikari> no way in hell
[05:47:07] <Dark_Shikari> Unless you're using the C64x+ DSP.
[05:47:15] <Dark_Shikari> beagle is 40% slower than an ipad
[05:47:24] <Dark_Shikari> remember the videowall? that was mpeg-2 at 960x540 or whatnot.
[05:48:11] <superdump> i thought the videowall at linuxtag this year was running the native resolution of the array of monitors
[05:48:20] <superdump> for bbb
[05:48:22] <michaedw> I am not talking about currently shipping chips. I am particularly thinking of this: http://www.qualcomm.com/news/releases/2010/06/01/qualcomm-ships-first-dual-…
[05:49:02] <Dark_Shikari> superdump: yes
[05:49:04] <Dark_Shikari> which was that iirc
[05:49:10] <Dark_Shikari> remember the total was 2700xsomething
[05:49:15] <Dark_Shikari> that means each one was width ~900something
[05:49:31] <superdump> oh, of course
[05:49:32] <michaedw> the hardware 1080p is good, but power-intensive
[05:50:11] <superdump> a friend did a brief test of power usage when decoding 1080p h.264 on his macbook pro the other day
[05:50:19] <superdump> when using the cpu it was using about 10W
[05:50:38] <superdump> when using the gpu (9400M, so maybe it was a macbook, not a pro) it was using about 6W
[05:50:47] <Dark_Shikari> s/gpu/asic
[05:51:13] <michaedw> http://www.engadget.com/2010/06/21/toshibas-ac100-8-hour-smartbook-runs-and…
[05:52:24] <michaedw> that's unlikely to do 720p30 encode, though
[05:52:31] <Dark_Shikari> or decode
[05:54:12] <michaedw> decode using the hardware unit, I don't see why not
[05:54:18] <Dark_Shikari> well of course
[05:54:20] <michaedw> though again that probably eats power
[05:54:23] <Dark_Shikari> but the "hardware unit" is very specialized
[05:54:47] <michaedw> parts of it are. the rest is just DSP.
[05:55:21] <Dark_Shikari> as I said, talk to mru
[05:55:28] <Dark_Shikari> something something OMAP4 something
[05:55:37] <Dark_Shikari> they rely extremely heavily on asynchronous functional units connected by sram
[05:55:44] <Dark_Shikari> e.g. "h264 motion compensation" and "h264 idct"
[05:55:47] <Dark_Shikari> and "h264 cabac"
[05:55:50] <Dark_Shikari> with 2000-page APIs
[05:55:58] <michaedw> dedicated cabac, I expect; may or may not be flexible enough to do VP8's entropy encoding
[05:56:31] <Dark_Shikari> Having glanced at the APIs...... no.
[05:56:32] <michaedw> if it's anything like the Qualcomm equivalent, yes, it relies on some tightly-coupled memory
[05:56:56] <michaedw> and cache-bypassing access to it :-)
[05:57:03] <Dark_Shikari> it's "flexible" because they hardcode 5 different codecs into the silicon
[05:57:15] <Dark_Shikari> not because the silicon is flexible enough to do 5 different codecs
[05:58:05] <michaedw> yes, but that's not what the next generation is going to look like
[05:58:12] <Dark_Shikari> OMAP4 is the next generation.
[05:58:17] <michaedw> for TI
[05:59:16] <michaedw> the only OMAP generation I know well is the fixed-point DSP version, OMAP5912 and the like; oldish now
[06:02:58] <michaedw> what's shipping today looks more like this: http://www.radvision.com/Corporate/PressCenter/2009/25march2009_hd_engine.h…
[06:03:34] <michaedw> in terms of video coding on TI chips, that is
[06:03:46] <Dark_Shikari> also, I know that mediatek chipsets work the same way.
[06:03:49] <Dark_Shikari> i.e. tons of hardcoded shit.
[06:04:14] <michaedw> sigma designs went pretty far down that road, too
[06:04:30] <michaedw> but I think it's a dead end
[06:04:43] <Dark_Shikari> I think it's fine, because a new video format doesn't come out every 5 days
[06:04:54] <michaedw> I certainly wouldn't design a product today around a hard-function video pipeline
[06:05:07] <michaedw> any more than I would around a hard rendering pipeline
[06:05:13] <michaedw> compare OpenGL ES 1.1 and 2.0
[06:06:07] <michaedw> you know any hard encoders that do a good job of GDR?
[06:06:27] <Dark_Shikari> "hard encoders"?
[06:06:31] <Dark_Shikari> as opposed to easy encoders?
[06:06:51] <michaedw> true hardware h.264 encoding silicon
[06:07:05] <michaedw> not DSPs or FPGAs
[06:08:18] <Dark_Shikari> I don't know of any true hardware encoding silicon.
[06:08:24] <Dark_Shikari> I know it exists, but I don't know of anything in particular.
[06:08:37] <Dark_Shikari> I know of one "hardware" solution that does GDR, but it's a DSP and it's godawful buggy shit
[06:08:56] <Dark_Shikari> and call it intra refresh.
[06:09:02] <michaedw> yeah, I can't find anything I'd take a second look at, either
[06:09:31] <Dark_Shikari> I love it when companies release "hardware" solutions that can't even do their claims
[06:09:34] <Dark_Shikari> the one I mentioned, the DSP
[06:09:36] <j0sh_> michaedw: ittiam h264 uses gdr, but i'm 90% sure its dsp based
[06:09:44] <Dark_Shikari> did 720p... with no partitions, no ratecontrol, no subpel, no nothing
[06:09:49] <Dark_Shikari> the instant you added anything it went slow as crap
[06:09:54] <michaedw> I'd rather let DSPs do what they're good at (DCTs), hardware do what it's good at (bitstreams), and software do what it's good at (field updates)
[06:10:02] <Dark_Shikari> you don't need DSPs for DCTs
[06:10:09] <Dark_Shikari> DCTs are fast.
[06:10:54] <michaedw> but they also involve moving a lot of data in and out, and I'd rather not waste my CPU's memory bandwidth on that
[06:11:17] <michaedw> I don't really want to see the pixels until after they've been DCTed and quantized and packed
[06:11:31] <Dark_Shikari> "packing" is bitstream
[06:11:34] <Dark_Shikari> quantization is fast
[06:11:44] <Dark_Shikari> 8x8 h264 dct on i7: 51 cycles
[06:11:56] <michaedw> packed into residual blocks, pre-cabac
[06:12:02] <Dark_Shikari> 8x8 h264 quant on i7: 22 cycles
[06:12:18] <michaedw> sure, it's fast, once you've got it in L1
[06:12:27] <Dark_Shikari> 8x8 h264 zigzag: 23 cycles
[06:12:30] <michaedw> I have better uses for L1
[06:12:32] <Dark_Shikari> It's always in L1
[06:12:34] <Dark_Shikari> in an encoder, at least
[06:12:46] <Dark_Shikari> And if you think that 256 bytes of L1 is your big worry...
[06:12:49] <michaedw> yes, and it can be in the other core's L1, thank you :-)
[06:12:58] <Dark_Shikari> oh no, 256 bytes of L1 spent
[06:13:00] <Dark_Shikari> on something important
[06:13:01] <Dark_Shikari> whatever will I do!
[06:13:17] <michaedw> it's not the footprint, it's the bandwidth
[06:13:23] <Dark_Shikari> L1 bandwidth is practically unlimited.
[06:13:33] <michaedw> bandwidth to main memory is not
[06:13:40] <Dark_Shikari> But this never reaches main memory.
[06:13:49] <Dark_Shikari> Ever.
[06:13:56] <Dark_Shikari> Except maybe during a context switch.
[06:13:56] <michaedw> pixels gotta get in there somehow
[06:14:00] <Dark_Shikari> Pixels != dct
[06:14:41] <Dark_Shikari> Let's just say that the benchmarks have consistently shown that x264 is not bottlenecked by main memory bandwidth.
[06:14:51] <Dark_Shikari> To the point where people have added faster DDR and measured zero performance change.
[06:14:53] <michaedw> on an i7, sure
[06:15:07] <Dark_Shikari> i7s don't have a lot of memory bandwidth.
[06:15:15] <michaedw> I want all that done with negligible impact on my poor mobile DDR
[06:15:27] <Dark_Shikari> They have quite a bit less per compute power than many smaller cpus.
[06:16:10] <michaedw> on the fly, as the pixels arrive, with enough tightly coupled memory to hold one and a half macroblock heights' worth of rows
[06:16:49] <michaedw> the big problem with dsp encoders isn't the dsp, it's the ddr dedicated to it
[06:16:55] <michaedw> cost, power, footprint
[06:18:34] <pengvado> one and a half macroblock rows plus the areas of all the reference frames they use?
[06:18:52] <michaedw> no B frames = only one reference frame
[06:19:02] <Dark_Shikari> what?
[06:19:08] <Dark_Shikari> there is no equivalence there
[06:19:13] <Dark_Shikari> kumquat = harley-davidson
[06:20:11] <michaedw> let me state that the other way around; if you're doing B frames, you're coding relative to an interpolation between previous and next I/P frames
[06:20:46] <michaedw> even if you could afford the latency, the buffering kills
[06:20:53] <Dark_Shikari> ...?
[06:22:22] <michaedw> say you have two B frames between two non-B frames
[06:23:19] <michaedw> you've got to wait until you have both reference frames, then calculate the predicted (interpolated) frame, then calculate residuals, right?
[06:23:50] <michaedw> I am going on MPEG2-era knowledge, but surely if it didn't work that way in MPEG4, they wouldn't call them B frames
[06:24:05] <michaedw> (I don't know, we don't use them)
[06:24:35] <Dark_Shikari> "wait until...." to do what?
[06:25:01] <michaedw> to be able to calculate residuals for the B frames
[06:25:17] <Dark_Shikari> and this is bad because...
[06:25:59] <michaedw> so that's two frames' worth of pixels buffered, in addition to the reference frames
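The buffering cost under discussion comes from reordering: B-frames cannot be coded until the reference frame that follows them in display order is available. A small sketch of that reordering for classic MPEG-2 style B-frames (function names are hypothetical):

```python
def coded_order(display):
    """Reorder display-order frame types ('I', 'P', 'B') into coding order.

    Each run of B-frames is emitted after the reference frame that follows it.
    """
    out, pending_b = [], []
    for idx, ftype in enumerate(display):
        if ftype == "B":
            pending_b.append((idx, ftype))   # must wait for the next reference
        else:
            out.append((idx, ftype))
            out.extend(pending_b)
            pending_b = []
    # Trailing B-frames with no later reference are appended as-is; a real
    # encoder would not end a sequence this way.
    return out + pending_b

def max_held_frames(display):
    """Longest run of B-frames, i.e. how many input frames must be held back."""
    held = longest = 0
    for ftype in display:
        held = held + 1 if ftype == "B" else 0
        longest = max(longest, held)
    return longest

gop = "IBBPBBP"
print([i for i, _ in coded_order(gop)], max_held_frames(gop))
```

With two B-frames between references, two uncompressed input frames sit in memory until the next P-frame has been coded, on top of the reference frames themselves.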
[06:26:11] <Dark_Shikari> and this is bad because...
[06:26:13] <michaedw> not free on an embedded system
[06:26:34] <Dark_Shikari> If we got rid of everything that wasn't free, a lot of things would be pretty shitty.
[06:27:54] <michaedw> I'm not so much arguing for the quality per MB -- VP8 is presumably not for Blu-Ray -- but trying to understand the design choices
[06:28:06] <Dark_Shikari> B-frames weren't chosen because they were patented.
[06:28:08] <Dark_Shikari> End-of.
[06:28:14] <Dark_Shikari> VP8 uses the altref, which is just as costly as B-frames in terms of memory.
[06:28:27] <Dark_Shikari> And more costly in terms of CPU, because it's coding an entire frame that is never displayed.
[06:28:44] <Dark_Shikari> VP8 uses the golden frame, which is yet _another_ extra frame to store, increasing memory usage.
[06:30:51] <michaedw> but that I can afford in the CPU that sees the entropy-decoded bitstream
[06:31:30] <michaedw> and I can tell the thing that's tightly coupled to the display to prefetch the relevant reference block
[06:32:15] <Tjoppen> so just don't use B-frames? afaik you don't gain much by using B-frames if you can't use the next P-frame as reference. might be useful to reduce idct inaccuracies though
[06:33:56] <michaedw> I'm not saying the whole system has no memory. just that the thing that has most of the memory bolted to it doesn't want to pollute its cache with raw pixels, reference frames included.
[06:34:52] <michaedw> it's perfectly happy to proxy DMA, with enough prefetch lead time to fit the accesses in among the main CPU's actual cache-miss cycles.
[06:36:18] <michaedw> VP8 looks streaming-optimized to me, with videoconferencing and transcoding in mind
[06:36:34] <Dark_Shikari> no, it looks on2-optimized
[06:36:39] <Dark_Shikari> keep in mind that on2 are not very smart
[06:36:51] <Dark_Shikari> there are many things that are in the spec simply because nobody thought to suggest otherwise
[06:37:04] <Dark_Shikari> there are many other things that are just outright bugs and nobody even noticed
[06:37:11] <Dark_Shikari> there are other things they blatantly copied off h264
[06:37:20] <Dark_Shikari> and everything else they copied off their previous codecs
[06:37:25] <Dark_Shikari> there's practically not an ounce of originality in it
[06:37:33] <Dark_Shikari> "golden frames" are there because vp7 had golden frames
[06:37:33] <michaedw> what's so great about originality?
[06:37:37] <Dark_Shikari> golden frames were in vp7 because vp6 had them
[06:37:39] <Dark_Shikari> they were in vp6 because vp5 had them
[06:37:43] <Dark_Shikari> they were in vp5 because vp4 had them
[06:37:56] <Dark_Shikari> originality is good because it lets you be better
[06:38:00] <Dark_Shikari> you can't beat everyone else by copying them
[06:38:05] <Dark_Shikari> you can only be just as good, at best.
[06:38:14] <Dark_Shikari> To win, you must do something they didn't do.
[06:38:30] <peloverde> "On2 also admitted that it's had trouble hiring and retaining skilled, qualified employees" http://news.cnet.com/8301-1023_3-10410341-93.html
[06:38:35] <Dark_Shikari> lol
[06:39:06] <Dark_Shikari> That's how x264 won. We didn't merely do what others did, we did stuff they didn't do.
[06:39:09] <michaedw> Google does a *lot* of video transcoding. They can probably figure out how to specify a streaming-friendly format that's easy to transcode to, and feasible to encode in real-time on mobile hardware.
[06:39:18] <Dark_Shikari> Google didn't do anything
[06:39:19] <Dark_Shikari> they bought on2
[06:39:52] <michaedw> I suppose it's possible that on2 refined their codec in the void, with no target application in mind
[06:40:02] <Dark_Shikari> they refined it with their own target applications in their head in mind
[06:40:13] <Dark_Shikari> In theory.
[06:40:19] <Dark_Shikari> It's a company. Companies do what companies do.
[06:40:28] <michaedw> "getting bought by Google" is a nice target application
[06:40:28] <Dark_Shikari> they're inefficient and slow.
[06:40:38] <Dark_Shikari> Indeed, for those who still had stock options
[06:40:42] <transport> didnt google also pay for x264 devs to code their backends too ?
[06:40:44] <michaedw> high inertia, that's for sure
[06:40:53] <michaedw> I certainly would, if I were Google
[06:41:09] <michaedw> and couldn't figure out how to hire them without spoiling their effectiveness
[06:41:15] <Dark_Shikari> transport: no
[06:41:29] <Dark_Shikari> michaedw: I'm not sure they were ever effective
[06:41:40] <michaedw> I'm referring to the x264 devs :-)
[06:41:42] <Dark_Shikari> transport: google has gone out of their way to be as detached from the open source community as possible
[06:41:53] <Dark_Shikari> they refuse, by policy, to admit they use ffmpeg.
[06:41:59] <michaedw> Dark_Shikari: not really. I spent a couple of years there.
[06:42:00] <Dark_Shikari> they refuse, by policy, to contribute patches.
[06:42:09] <Dark_Shikari> they refuse, by policy, to give us sample videos, even if they're in public domain.
[06:42:34] <Dark_Shikari> michaedw: did you know Pascal Massimino?
[06:42:46] <michaedw> there are parts of "the open source community" with which they engage fairly intensively
[06:42:58] <michaedw> kernel, Ubuntu
[06:43:03] <Dark_Shikari> Certainly not us. Despite the fact that they have tens of thousands of computers running our software.
[06:43:41] <michaedw> never met Pascal
[06:43:44] <peloverde> WebM had dozens of zero day partners but they didn't presubmit their FFmpeg patches
[06:43:51] <michaedw> didn't work in anything remotely video-related
[06:43:53] <Dark_Shikari> ah
[06:43:57] <Dark_Shikari> pascal was an old xvid dev
[06:43:59] <Dark_Shikari> google hired him, he still works there
[06:44:03] <Dark_Shikari> he wrote their original h264 encoder, himself
[06:44:16] <michaedw> open source types tend to go dark on joining google
[06:44:29] <michaedw> sometimes they come out into the light again after awhile, sometimes not
[06:44:29] <Dark_Shikari> Everyone does.
[06:44:38] <Dark_Shikari> that's why I refuse to work for them
[06:44:42] <Dark_Shikari> they tried to hire me
[06:44:46] <Dark_Shikari> No fucking way.
[06:44:48] <michaedw> there's an awfully big playground inside
[06:44:58] <Dark_Shikari> Yes, a big playground with metal bars on the windows.
[06:45:02] <astrange> you're permitted to work on open source in google
[06:45:14] <astrange> but many people seem to not bother doing it in 20%
[06:45:16] <Dark_Shikari> astrange: They said I wasn't allowed to own my open source contributions.
[06:45:18] <michaedw> lots of cool stuff to work on, thousands of people they can talk about it with in a completely unrestricted manner
[06:45:21] <Dark_Shikari> I said they could go fuck themselves.
[06:45:39] <michaedw> and money, of course
[06:45:43] <Dark_Shikari> Not in those exact words obviously.
[06:46:13] <michaedw> you have to be pretty organized to make 20% time work for you that way
[06:46:34] <Dark_Shikari> I prefer to work at a company where I have my 100% time.
[06:46:45] <michaedw> organized is perhaps not the first adjective I think of when I think of people who are involved in, but not central to, open source projects
[06:47:05] <michaedw> some people at Google do
[06:47:17] <michaedw> akpm, for instance
[06:47:37] <michaedw> but that's almost more of a sponsorship than traditional employment
[06:47:52] * Dark_Shikari is somewhere in between.
[06:48:34] <michaedw> I don't think "metal bars" is fair. They just expect value for money, and a fairly high degree of commitment to team goals.
[06:49:03] <Dark_Shikari> "commitment to team goals" seems to be codephrase for giving up the community
[06:49:10] <Dark_Shikari> I have never seen a single person absorbed by google who stayed open
[06:49:17] <michaedw> it wasn't the place for me -- more to the point, very much not the right role for me -- but I don't have anything nasty to say about them
[06:49:19] <Dark_Shikari> they go dark and disappear
[06:49:38] <Dark_Shikari> I mean, I don't hate google. I just don't subscribe to the "omg they're the best place to work evar"
[06:50:30] <michaedw> I liked working there. Interesting people to rub up against. I'd have liked it more if those interesting people weren't quite so absorbed in their own projects.
[06:50:40] <KotH> good morning!
[06:50:47] <Dark_Shikari> I guess I've been there... twice now
[06:50:54] <Dark_Shikari> food was decent. I liked the free Naked Juice things.
[06:51:00] <Dark_Shikari> but facebook had those too, and facebook's food was better.
[06:51:15] <Dark_Shikari> google was a bit larger, too spread out.
[06:51:29] <michaedw> I liked the Facebook culture, too -- but there's less there there, as far as I can see
[06:51:41] <michaedw> Google is struggling with the next e-folding
[06:52:09] <michaedw> I kept telling people there, don't think you're so much better than M$ until you've hit their scale
[06:52:26] <michaedw> I'm old enough to remember when M$ were the good guys
[06:52:42] <Dark_Shikari> no, they never were
[06:52:54] <michaedw> the first Unix you could run on hardware that you could buy retail
[06:53:03] <Dark_Shikari> they were the "good guys" to developers. that's how they won.
[06:53:26] <Dark_Shikari> Well, and the whole cheating the hell out of ibm.
[06:53:30] <michaedw> and a CP/M clone that mostly worked and had documentation included
[06:53:42] <michaedw> ibm got their slice of the pie
[06:54:12] <transport> even back in the day when bill gayes write the amiga BASIC they were not good guys LOL
[06:54:16] <michaedw> I was an Apple fanboy back then, but I could give M$ their due
[06:54:20] <transport> gates
[06:55:18] <michaedw> and COM was bloody brilliant
[06:56:21] <michaedw> anyway, Larry and Sergey (and Eric) are very different people from Bill and Steve
[06:56:31] <michaedw> and choose to run their company very differently
[06:56:48] <michaedw> but they still haven't proven they can scale to 50K people and remain human
[06:58:05] <michaedw> I think Google has the same problem contributing to ffmpeg that most big companies do. not so much their own IP, but their agreements with other companies not to do things that would compromise their partners' IP
[06:58:34] <michaedw> patent pools and cross-licensing agreements and all that crud
[06:59:33] <peloverde> They don't seem to have that problem with their own homegrown opensource projects
[07:00:27] <peloverde> They also managed to sign on an army of vp8 partners but didn't submit their (messy) libvpx patches against FFmpeg until after the public announcement
[07:00:49] <michaedw> they like to make a big splash
[07:01:04] <Dark_Shikari> not really. google are the masters of failing to make a big splash
[07:01:10] <peloverde> they could have done both
[07:01:10] <Dark_Shikari> They announce new products all the time and fail to get any traction
[07:01:17] <Dark_Shikari> they're like the opposite of apple
[07:01:25] <Dark_Shikari> google announces tons of cool stuff that nobody ever uses
[07:01:28] <peloverde> enough of us were under NDA anyway
[07:01:28] <michaedw> who wants traction? they want drumbeat
[07:01:30] <Dark_Shikari> apple announces a few shitty things that everyone uses
[07:01:30] <transport> ohh Toshiba are bringing out a new netbook, dubbed the Toshiba AC100.a dual-core ARM Cortex-A9 at 1GHz http://topnews.co.uk/27150-toshiba-unveils-android-based-netbook-and-dual-s…
[07:02:13] <peloverde> The patches could have been prereviewed by select individuals
[07:02:23] <michaedw> transport: beat you by a little over an hour :-)
[07:02:30] <peloverde> They managed to partner with Sorenson and Adobe and still "make a splash"
[07:02:42] <Dark_Shikari> peloverde: ironically, they distributed the NDA'd stuff in violation of the gpl, I'm pretty sure
[07:02:46] <Dark_Shikari> my distribution didn't have ffmpeg source
[07:02:48] <Dark_Shikari> as far as I saw
[07:02:55] <michaedw> did you ask for it?
[07:03:09] <Dark_Shikari> No, but there wasn't an offer.
[07:03:12] <peloverde> their initial public release wasn't GPL compatible
[07:03:13] <michaedw> lame
[07:03:24] <Dark_Shikari> what peloverde said as well
[07:03:27] <Dark_Shikari> they didn't plan anything out properly
[07:03:33] <michaedw> "GPL compatible" is an interesting concept
[07:03:42] <peloverde> they distributed a nonfree FFmpeg in chrome for a few weeks
[07:04:11] <transport> ohh but did you also see http://www.youtube.com/watch?v=H4Xr9ZSnXxQ the Toshiba bloke quotes a price of 40,000 to 50,000 Yen which in real money is ?
[07:04:22] <Dark_Shikari> 90 yen -> dollar
[07:04:41] <michaedw> last I checked, Android's compiler didn't do anything exciting for the A9
[07:04:43] <Dark_Shikari> FUCK THIS FUCKING put_vp8_epel8_v6_ssse3 I CANT FIGURE OUT WHERE THE BUG IS AGHGHHHHHHHHHHHHHHH
[07:04:48] <Dark_Shikari> I've spent 3 hours staring at this holy shit
[07:05:06] <michaedw> planning is not Google's strong point
[07:05:30] <astrange> android's compiler is gcc and i don't think they bothered pulling any of google's (very good) gcc engineers to do optimization for it
[07:05:38] <astrange> so you're left with what CS does
[07:06:22] <michaedw> it's not a matter of not bothering; there's just way too much value in what they're already doing to waste them on something as small-time as Android
[07:06:42] <Dark_Shikari> http://pastebin.org/353799 someone give this a glance and figure out where I made my retarded error
[07:06:49] <Dark_Shikari> I get the feeling it's something blatantly obvious that I overlooked
[07:06:53] <michaedw> symptom?
[07:07:10] <Dark_Shikari> all md5s are wrong.
[07:07:12] <Dark_Shikari> output is incorrect.
[07:07:23] <Dark_Shikari> haven't gotten around to printfing. I should do that.
[07:07:38] <astrange> they mostly seem to be working on compiling google itself, which i guess is more important to them
[07:08:19] <michaedw> a lot of them are working on tuning for new platform variants
[07:09:06] <michaedw> new hardware, new kernel features
[07:09:11] <michaedw> containerization and all that
[07:10:10] <michaedw> that moves the marbles around enough to force retuning of the index serving stack
[07:10:24] <michaedw> there's no secret there
[07:11:06] <michaedw> and there's Google Go, of course
[07:11:54] <astrange> that's just the plan9 people's thing
[07:12:07] <astrange> i'm amazed that they could spend that long at google and apparently do nothing except go
[07:12:31] <michaedw> http://xkcd.com/303/
[07:12:33] <astrange> and ian taylor managed to write gccgo and gold and several other largeish gcc things in the same time frame
[07:14:56] <michaedw> r's public Google CV lists a pretty big Sawzall paper
[07:15:18] <astrange> oh, he did write sawzall
[07:15:29] <astrange> i was thinking of someone else who hadn't apparently produced anything...
[07:15:41] <astrange> kernighan
[07:15:44] <michaedw> that happens to some people at Google too
[07:16:10] <astrange> or dennis ritchie. i forget which one is at google now
[07:16:24] <michaedw> not kernighan, afaik
[07:16:57] <michaedw> Ken Thompson is co-Go
[07:22:59] <michaedw> Dark_Shikari: why the mova m3, m4?
[07:24:25] <Dark_Shikari> to prepare m3 for the next iteration
[07:26:22] <Dark_Shikari> in each iteration, m0 == row -2
[07:26:24] <Dark_Shikari> m1 == row -1
[07:26:26] <Dark_Shikari> m2 == row 0
[07:26:28] <Dark_Shikari> m3 == row 1
[07:26:30] <Dark_Shikari> m4 == row 2
[07:26:32] <Dark_Shikari> m5 == row 3
[07:26:35] <Dark_Shikari> for the 6-tap filter
[07:27:12] <michaedw> I don't quite understand the register allocation; why name the pmaddubsw targets m6, m1, m3?
[07:28:06] <Dark_Shikari> m6 is the accumulator
[07:28:10] <Dark_Shikari> m1 and m3 are to avoid moving registers
[07:28:16] <Dark_Shikari> All register choices are to keep the above mapping.
[07:28:24] <Dark_Shikari> and make sure that at the end of the loop, the registers are ready for the next iteration.
[07:28:28] <Dark_Shikari> *and to make sure
[07:28:34] <Dark_Shikari> i.e. m0 is now m1
[07:28:36] <Dark_Shikari> m1 is now m2
[07:28:37] <Dark_Shikari> m2 is now m3
[07:28:39] <Dark_Shikari> m3 is now m4
[07:28:41] <Dark_Shikari> m4 is now m5
[07:28:57] <michaedw> I think there's a mova m1, m2 missing
[07:29:22] <Dark_Shikari> fuck. you're right.
[07:29:23] <michaedw> maybe after line 31
[07:29:44] <Dark_Shikari> no, but we already lost m2
[07:30:16] <Dark_Shikari> hmm, what's the most elegant way to do this.
[07:30:58] <michaedw> not quite enough registers to work with, I fear
[07:31:16] <Dark_Shikari> we can kick out m7 and use a memory argument
[07:31:30] <michaedw> oh yeah, that's the other reason I would have rotated the intermediate result :-)
[07:31:31] <Dark_Shikari> I've already kicked out 3 regs for the filter coeffs
[07:32:39] * Dark_Shikari tries
[07:32:41] <michaedw> 6 operands pack into 6 bytes of a register better than into 6 registers, when you only have 8 to work with
[07:32:49] <Dark_Shikari> I wonder if it would be better to unroll it by a factor of 2
[07:32:50] <Dark_Shikari> to avoid the moves
[07:32:55] <Dark_Shikari> then you wouldn't have to shift between iterations
[07:32:59] <Dark_Shikari> just swap back and forth between iterations
[07:33:06] <Dark_Shikari> 'results identical' woot
[07:33:14] <Dark_Shikari> thanks for spotting that.
[07:33:19] <michaedw> no worries
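The register rotation discussed above can be sketched in plain C. This is only an illustration, not the pastebin code and not FFmpeg's actual put_vp8_epel8_v6_ssse3: the helper name sixtap_v, the row[] array standing in for m0..m5, and the rounding (add 64, shift right by 7, with taps assumed to sum to 128) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a filtered value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Vertical 6-tap filter (hypothetical helper, not FFmpeg API).
 * src points at the row aligned with the first output row; rows -2..+3
 * around every output row must be readable. taps[] is assumed to sum to 128.
 */
static void sixtap_v(uint8_t *dst, ptrdiff_t dst_stride,
                     const uint8_t *src, ptrdiff_t src_stride,
                     int width, int height, const int taps[6])
{
    const uint8_t *row[6]; /* row[0] == row -2 ... row[5] == row 3 */

    assert(width > 0 && height > 0);
    for (int i = 0; i < 6; i++)
        row[i] = src + (i - 2) * src_stride;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 64; /* rounding term for the >> 7 below */
            for (int i = 0; i < 6; i++)
                sum += taps[i] * row[i][x];
            dst[x] = sum < 0 ? 0 : clip_u8(sum >> 7);
        }
        /* Shift the window down one row: every slot takes its successor.
         * Skipping any one of these assignments (like the missing
         * "mova m1, m2") leaves a stale row in the window and corrupts
         * the following output rows. */
        for (int i = 0; i < 5; i++)
            row[i] = row[i + 1];
        row[5] += src_stride;
        dst += dst_stride;
    }
}
```

In the SIMD version each row[] slot is a register rather than a pointer, so the shift costs real move instructions; that is presumably why unrolling by two came up as a way to avoid them.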
[07:33:38] <Dark_Shikari> now to mix in the hv versions
[07:38:11] <michaedw> by the way, what I meant earlier by fetch prediction is the hardware prefetching based on automatically detected "access streams" that was characteristic of the P4
[07:38:32] <michaedw> decent description of it, and how to code to use it well, in http://www.siam.org/proceedings/alenex/2007/alx07_09pans.pdf
[07:40:11] <Dark_Shikari> well, one all-day MC marathon done
[07:40:16] <Dark_Shikari> my second ever.
[07:41:07] <michaedw> supplemented in the Core architecture by an IP-based prefetcher
[07:43:21] <michaedw> so while every access on row stride is going to cost you a full cache line hit to main memory, at least it'll be prefetched after the first few loop iterations
[07:45:40] <KotH> michaedw: is that paper worth reading?
[07:48:10] <michaedw> KotH: Kevin Dick is considered pretty good, for an undergrad at the time, and any Amazon-Google collaboration is interesting to me
[07:49:00] <michaedw> use of Numerical Recipes in C as a reference implementation is pretty suspect, though
[07:49:16] <michaedw> the Fortran original was adequate, NRC is awful by any standard
[07:51:47] <michaedw> this paper basically illustrates the difference between cache-efficient coding (which is quite difficult) and prefetcher-friendly coding (which is easier, and has fewer tunable parameters, which are easier to tune empirically without knowledge of cache sizes)
[07:52:08] <transport> as assembly coders, do you think there's any validity to this guy's claims about glibc's lack of performance and the memory routine optimisations that need doing? http://www.freevec.org/content/commentsconclusions
[07:52:39] <michaedw> certainly true on modern ARM
[07:53:17] <michaedw> but that's a surface design flaw masking a more fundamental design flaw
[07:54:47] <astrange> i haven't seen a lot of str* hotspots in profiles
[07:54:50] <michaedw> you almost never want any C library's implementations of string functions
[07:55:06] <astrange> not that i profile string programs a lot, but such things can be avoided in hot areas in algorithmic ways instead
[07:55:11] <michaedw> in hot places
[07:56:08] <michaedw> either you need real strings, with Unicode and all that jazz, or you need byte arrays
[07:56:26] <Dark_Shikari> transport: glibc is a disaster
[07:56:53] <michaedw> it does a good job on some important things
[07:57:11] <Dark_Shikari> there are two things at fault
[07:57:13] <Dark_Shikari> "ulrich"
[07:57:13] <Dark_Shikari> and
[07:57:16] <Dark_Shikari> "drepper"
[07:57:17] <michaedw> things that are tightly coupled to the kernel
[07:57:32] <michaedw> sure, but he's not the only contributor
[07:57:41] <michaedw> Ingo puts good stuff in
[07:58:04] <kshishkov> ever heard of the first law in organic chemistry?
[07:58:18] <thresh> never speak about organic chemistry?
[07:58:18] <michaedw> brown + any color = brown?
[07:58:19] <Dark_Shikari> I'm pretty sure memcpy is still faster on mac than on glibc
[07:58:21] <Dark_Shikari> which is embarrassing
[07:58:53] <michaedw> it's got to handle all the unaligned cases
[07:59:14] <Dark_Shikari> so does mac's
[07:59:15] <astrange> os x spends a lot of time on memcpy
[07:59:21] <kshishkov> michaedw: almost, "if you mix 10 kilos of jam with 1 kilo of shit you'll get 11 kilos of shit"
[07:59:23] <astrange> er, os x engineers
[07:59:29] <Dark_Shikari> os x's uses palignr to handle the unaligned cases
[07:59:54] <transport> isn't that because mac used altivec optimisations whereas linux PPC doesn't use any? is that going to be the same for linux arm/NEON too i wonder...
[08:00:05] <Dark_Shikari> glibc on anything not x86 usually sucks
[08:00:13] <michaedw> transport: true as of the last time I looked
[08:00:24] <michaedw> may change due to ChromeOS
[08:00:31] <astrange> memcpy and some other things like spinlocks are stored in the kernel specialized for each cpu/other stuff and mapped in at runtime
[08:00:38] <kshishkov> Dark_Shikari: that's why they try to replace it with something else at least on ARMs
[08:00:38] <astrange> i'm not actually sure why this is necessary
[08:01:00] <astrange> but i think it means they can unmap and then remap in a different spinlock implementation if all the secondary threads die, etc
[08:01:32] <michaedw> I think it's so the linker can specialize the call sites at load time
[08:02:11] <michaedw> ARM thread-local storage accesses work that way, IIRC
[08:03:13] <michaedw> or are you thinking of the stuff that's in whatever acronym replaced VDSO?
[08:03:40] <astrange> i was still talking about the OS X kernel
[08:03:51] <astrange> this one is called commpage
[08:05:06] <michaedw> oh; linux/glibc do something rather like that for certain instructions that require CPU-level access privileges but not privileged memory access
[08:06:35] <michaedw> they trap into the kernel because they're illegal instructions in user mode; the kernel peeks at the IP that trapped, sees that it's in the special page, and returns with the relevant privileges enabled
[08:06:52] <michaedw> trusting that the special page will shut them off again before returning to normal code
[08:07:13] <astrange> moving the ffplay timestamp reordering code to cmdutils.c is making me feel bad for some reason
[08:07:34] <michaedw> sorry, that's the mechanism that replaced VDSO; no relation to the link-time call site specialization
[08:08:01] <astrange> call site specialization can be done by defining memcpy to be a function pointer in the header
[08:08:18] <astrange> would need special knowledge in gcc though
[08:11:16] <michaedw> in userland it's usually done a la libm
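The function-pointer idea mentioned above might look roughly like this in C. It is a minimal sketch under stated assumptions: my_copy, my_copy_init, copy_tuned and the have_simd flag are hypothetical names invented for the example, and the "tuned" routine simply defers to libc rather than containing real CPU-specific code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature shared by every candidate copy routine. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback: plain byte loop. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a CPU-specific routine; here it just defers to libc. */
static void *copy_tuned(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* What a header would expose: callers go through this pointer. */
static copy_fn my_copy = copy_bytewise;

/* Pick an implementation once at startup (have_simd is a hypothetical flag). */
static void my_copy_init(int have_simd)
{
    my_copy = have_simd ? copy_tuned : copy_bytewise;
    assert(my_copy != NULL);
}
```

Every call then pays for an indirect jump, and the compiler can no longer treat the call as its builtin memcpy, which may be the kind of "special knowledge in gcc" the remark above is getting at.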
[08:15:10] <michaedw> I seem to be unable to shake the name of the special ELF section with the cpu-specific routines out of the cobwebs in my head, even with Google's help
[08:16:58] <astrange> don't see it in objdump -h
[08:17:14] <michaedw> may be ARM-specific
[08:23:34] <michaedw> no, I'm thinking of x86
[08:23:56] <michaedw> I remember seeing multi-byte NOPs
[08:24:18] <michaedw> being used to pad out the replacement code to the same length as the code it replaced
[08:25:56] <michaedw> alternative something
[08:26:37] <michaedw> Bingo: arch/x86/include/asm/alternative.h
[08:28:09] <michaedw> astrange: .altinstructions and .altinstr_replacement
[08:34:42] <michaedw> Dark_Shikari: decent explanation of use of LDR as an L2 prefetch instruction on Neon here: http://forums.arm.com/lofiversion/index.php?t12665.html
[08:38:42] <transport> <michaedw> may change due to ChromeOS , ohh, so you think they may actually write SIMD optimisations for core libs to use the NEON and can provide for up to 25% increase in application performance as claimed by the freevec guy for altivec, but will any such ARM ChromeOS patches be back ported upstream so every one can benefit , not least the ffmpeg C code calling these faster simd routines generally...
[08:40:45] <michaedw> transport: all I can say at the moment is that it would certainly be feasible to apply a certain amount of knowledge about NEON internals to the problem of speeding up memcpy on specific ARMv7 implementations
[08:41:16] <astrange> faster memcpy can be added directly to ffmpeg if it helps benchmarks
[08:41:24] <astrange> see fastmemcpy in mplayer
[08:41:40] <astrange> i complained the last time someone suggested porting that, but only to point out that we needed a benchmark for it
[08:41:53] <michaedw> and that Google and/or its hardware partners may have some economic interest in finding a way to do this without compromising relevant IP
[08:42:02] <astrange> (fastmemcpybench in mplayer is broken or at least not necessarily accurate, it very heavily favors whatever memcpy uses the most nontemporal instructions)
[08:42:36] <astrange> of course pointer swapping is preferred wherever we memcpy a lot
[08:42:49] <astrange> the fastest blitter is the one that doesn't exist
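Pointer swapping in place of a copy can be sketched as below. FramePair and frame_pair_flip are hypothetical names for illustration only and do not correspond to any FFmpeg structure.

```c
#include <assert.h>
#include <stdint.h>

/* Two equally sized buffers whose roles alternate. */
typedef struct FramePair {
    uint8_t *front; /* buffer currently being read or displayed */
    uint8_t *back;  /* buffer currently being written */
} FramePair;

/* Exchange the roles in O(1) instead of copying back into front. */
static void frame_pair_flip(FramePair *p)
{
    assert(p && p->front && p->back);
    uint8_t *tmp = p->front;
    p->front = p->back;
    p->back  = tmp;
}
```

The cost is independent of the buffer size, so no memcpy tuning is needed on that path at all.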
[08:43:05] <michaedw> the effort is probably better spent on something analogous to the kernel slub allocator
[08:43:37] <michaedw> for things like NAL units
[08:43:40] <roxfan> can you just copy apple's memcpy or it's protected somehow?
[08:44:22] <michaedw> or on using something more rope-like
[08:45:00] <michaedw> I like Vstr myself
[08:49:57] <roxfan> http://img256.imageshack.us/img256/2824/memcpyv7.png <- apple's armv7 memmove
[08:51:14] <kshishkov> looks more or less reasonable for memmove but not for memcpy
[08:51:27] <roxfan> memcpy is just a thunk to memmove
[08:51:37] <astrange> memmove and memcpy are the same function
[08:51:44] <kshishkov> ewww
[08:51:56] <astrange> one of memset(0 and bzero points to the other, i can't remember which
[08:52:23] <astrange> you can copy whatever you see at http://fxr.watson.org/ or opensource.apple.com
[08:52:27] * kshishkov remembers that wonderful change from android libc
[08:52:45] <astrange> memset(x,0 of course
[08:57:32] * _troll_ reporting for duty
[08:57:39] <KotH> kshishkov: the memset one?
[08:59:45] <roxfan> hm, bionic's memcpy has some neon
[09:00:57] <astrange> but just write your own instead of copying, it's more educational
[09:02:03] * elenril is surprised there's no flames about vp8 decoder
[09:02:18] <kshishkov> KotH: but of course
[09:02:51] <KotH> astrange: it's only educational, if someone else reviews it and tells you how to improve it :)
[09:02:51] <_troll_> elenril: wait for it...
[09:02:59] <kshishkov> elenril: why should they appear?
[09:03:04] <_troll_> and you listen
[09:03:20] <kshishkov> KotH: doing it may be an education too
[09:03:52] <elenril> kshishkov: because it's ffmpeg?
[09:04:00] <elenril> there are always flames
[09:04:26] <KotH> elenril: no flame today, flame tomorrow. there is always a flame tomorrow
[09:04:39] <kshishkov> elenril: we don't flame about having native decoders even for most crappy formats. Even for Jar-Jar Video
[09:04:55] <KotH> lol
[09:05:10] <Tjoppen> :)
[09:05:40] <transport> Description for http://freevec.org/function/memmove and http://freevec.org/function/memset with graphs
[09:07:01] <kshishkov> what purpose they serve except for cache size detection?
[09:13:20] <transport> the purpose seems to be to show that if you write or re-use his SIMD optimisations generally you get a large boost in throughput. it's working code so worth a look perhaps? don't lose anything by bringing these techniques to the table
[09:13:25] <michaedw> how interesting do you consider out-of-order data arrival, over (say) RTP?
[09:13:31] <_av500_> astrange: android gcc is not CS, it is stock gcc with google patches
[09:14:21] <michaedw> google patches and google-cherry-picked-from-gcc-mailing-list patches
[09:14:22] <astrange> CS contributes back to upstream ARM code more than google, i mean
[09:14:31] <michaedw> the latter being mostly of CS origin
[09:15:02] <michaedw> and who writes the checks to CS?
[09:15:42] <michaedw> ARM and ARM licensees, mostly, where the ARM port is concerned
[09:15:52] <michaedw> including Android hardware partners
[09:15:59] <wbs> michaedw: what about rtp and out of order arrival?
[09:16:48] <michaedw> tends to result in an advantage for vstr and similar libraries
[09:17:25] <michaedw> the jitter buffer is just a vstr
[09:18:28] <wbs> uhmm, if you say so
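For readers unfamiliar with the problem being discussed, a reorder (jitter) buffer for out-of-order RTP packets can be sketched as follows. This is neither vstr nor FFmpeg's RTP code: it is a toy fixed-size slot array keyed by the 16-bit RTP sequence number, with hypothetical names (Reorder, reorder_push, reorder_pop) and an int standing in for the packet payload.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 8 /* reorder window; a power of two so it divides 65536 */

typedef struct Slot {
    int      used;
    uint16_t seq;
    int      payload; /* stand-in for real packet data */
} Slot;

typedef struct Reorder {
    Slot     slot[SLOTS];
    uint16_t next_seq; /* sequence number the consumer expects next */
} Reorder;

static void reorder_init(Reorder *r, uint16_t first_seq)
{
    assert(r);
    memset(r, 0, sizeof *r);
    r->next_seq = first_seq;
}

/* Store a packet; returns -1 if it is too old or too far ahead. */
static int reorder_push(Reorder *r, uint16_t seq, int payload)
{
    /* Modulo-65536 distance handles sequence number wrap-around. */
    uint16_t ahead = (uint16_t)(seq - r->next_seq);
    if (ahead >= SLOTS)
        return -1;
    Slot *s = &r->slot[seq % SLOTS];
    s->used    = 1;
    s->seq     = seq;
    s->payload = payload;
    return 0;
}

/* Deliver the next in-order packet if it has arrived; returns 1 on success. */
static int reorder_pop(Reorder *r, int *payload)
{
    Slot *s = &r->slot[r->next_seq % SLOTS];
    if (!s->used || s->seq != r->next_seq)
        return 0;
    *payload = s->payload;
    s->used  = 0;
    r->next_seq++; /* uint16_t wraps from 65535 to 0 */
    return 1;
}
```

A real depacketizer would also need a timeout so that one lost packet does not stall delivery forever; that is left out here.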
[09:22:04] <_troll_> I wouldn't take his word for it
[09:22:13] <_troll_> whatever vstr means in this context
[09:22:42] <michaedw> first Google hit: http://www.and.org/vstr/
[09:23:22] <_troll_> oh that
[09:23:40] <_troll_> did anyone tell you you're not supposed to do heavy string processing in C?
[09:24:26] <michaedw> the string processing per se I could take or leave, the rope-like semantics without C++ may be appropriate for this job -- or may not
[09:24:51] <_troll_> rope is great... if you wish to hang yourself
[09:25:26] <michaedw> nah, I use a python for that; does the squeezing for you
[09:27:13] <_troll_> that I'll agree with
[09:32:46] <michaedw> does the rtp client support interleaved packetization mode?
[09:33:10] <wbs> michaedw: that's up to each depacketizer, but none of them supports it at the moment, afaik
[09:37:55] <michaedw> ah, right, it's Android's OpenCore that supports it
[09:39:17] <_av500_> oc ftw!
[09:39:32] <michaedw> the Apache-licensed portion
[09:41:21] <markuman> isn't this patch relevant? http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2009-October/076749.html
[09:43:43] <michaedw> markuman: I don't think so
[09:55:57] <markuman> michaedw: but without this patch, i get this error message with the last frame when using -vcodec libx264
[09:56:30] <michaedw> oh, sorry, thought you mean relevant to my current hobbyhorse :-)
[09:57:08] <markuman> michaedw: no, general in ffmpeg
[10:04:31] <markuman> i wonder because this patch is one year old and ffmpeg has this error to this day
[10:06:22] <wbs> markuman: well, Michael posted some follow-up questions to that patch there, that didn't seem to get responded to, so feel free to pick up the patch and continue the review of it
[10:27:54] <markuman> wbs: That's too deep for me :) i just know that the patch is working for me
[10:32:48] * KotH feeds the _troll_ with some kebab
[10:33:21] <_av500_> shish?
[10:34:29] <KotH> nope, adana
[10:35:44] <_av500_> yum
[10:58:33] <DonDiego> do we have any vp8 samples?
[10:58:41] <DonDiego> there are none in the samples collection
[10:58:44] <Tjoppen> ooh, native vp8 finally committed
[10:59:02] <DonDiego> how does it compare speed-wise?
[10:59:48] <Tjoppen> judging from the ml it's already quite a bit faster than the reference implementation
[10:59:52] <thresh> i was about to ask that very same question
[11:00:38] <Tjoppen> but apparently not up to spec since the spec fails to specify how to handle some of the stuff in the test vectors
[11:00:54] <mru> it is up to spec
[11:01:01] <mru> libvpx is buggy :-)
[11:01:13] <Tjoppen> hehe
[11:01:57] <mru> seriously
[11:02:03] <mru> libvpx does things not in the spec
[11:02:03] <kshishkov> DonDiego: feel free to put something like http://lachy.id.au/lib/media/elephantsdream/Elephants_Dream-720p-Stereo.webm
[11:02:05] <_av500_> i thought libvpx is the spec?
[11:02:28] <merbzt> no, just the ref code
[11:02:37] <DonDiego> kshishkov: how large is that sample?
[11:02:47] <mru> _av500_: JM is not the h264 spec
[11:02:57] <kshishkov> DonDiego: 148M IIRC
[11:03:14] <_av500_> mru: JM is not on2 :)
[11:03:40] <_av500_> merbzt: I know what it is supposed to be :)
[11:11:39] <janneg> DonDiego: your youtube-dl -f 45 $URL
[11:12:11] <kshishkov> and maybe some test vectors from webm wherever they are
[11:12:30] <janneg> libvpx with asm is here twice as fast as the native decoder
[11:14:07] <kshishkov> we haven't got asm for native decoder yet, have we?
[11:14:26] <_av500_> wasnt BBB writing some?
[11:14:28] <janneg> 5.3s vs. 11.1s on http://www.youtube.com/watch?v=uDJXzm4R-U8
[11:14:37] <janneg> kshishkov: nothing committed yet
[11:14:57] <janneg> BBB and Dark_Shikari are busy writing some
[11:15:17] <_av500_> bbbusy
[11:15:31] <kshishkov> I know, so have you tried it with hand-applied asm patches?
[11:15:40] <janneg> no
[11:16:24] <kshishkov> so it's rather useless benchmark
[11:19:07] <_av500_> kshishkov: we could spin it :)
[11:19:16] <_av500_> native is only 3db last fast than libvpx
[11:19:20] <_av500_> less
[11:23:27] <andoma> wbs: ?
[11:23:36] <wbs> andoma: pong?
[11:23:53] <andoma> wbs: r23706 makes avio.h no longer freestanding
[11:24:27] <andoma> AVClass is not defined
[11:24:32] <wbs> ah, crap
[11:24:37] <janneg> kshishkov: Dark_Shikari's latest patch doesn't compile
[11:24:47] <andoma> i guess you either need to include it, or add 'struct' to it
[11:24:57] <andoma> to the line, .. i guess the latter is preferred
[11:25:07] <andoma> include log.h, that is
[11:25:29] <mru> just add the required #include
[11:25:44] <andoma> wbs: you fix?
[11:25:49] <wbs> yeah, will do
[11:25:52] <andoma> sweet
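The two options weighed above (include the defining header, or refer to the type through its struct tag) differ as sketched below. The names ExampleClass and ExampleContext are hypothetical and are not the real avio.h or log.h declarations; the commit message for r23734 further down indicates the include route was the one actually taken.

```c
#include <assert.h>

/* --- what the "header" would contain -------------------------------- */
struct ExampleClass; /* forward declaration: the struct tag alone is enough */

typedef struct ExampleContext {
    const struct ExampleClass *klass; /* pointer to incomplete type is legal */
    int flags;
} ExampleContext;

/* --- full definition, normally living in another header ------------- */
struct ExampleClass {
    const char *class_name;
};

/* Code that dereferences the pointer needs the full definition in scope. */
static const char *example_class_name(const ExampleContext *ctx)
{
    assert(ctx && ctx->klass);
    return ctx->klass->class_name;
}
```

Using the struct tag keeps the header free of extra includes, but it only works when the header needs nothing more than a pointer to the type.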
[11:27:06] <wbs> peloverde: libavcodec/aacps.h fails make checkheaders btw
[11:27:21] <CIA-99> ffmpeg: mstorsjo * r23734 /trunk/libavformat/avio.h: Add all required includes to avio.h
[11:27:22] <CIA-99> ffmpeg: mstorsjo * r23735 /trunk/libavformat/avio.c: Reindent
[11:27:37] <wbs> andoma: there you go
[11:29:44] <andoma> thanks
[11:34:32] <janneg> kshishkov: down to 8.9s with jason's latest patch
[11:38:46] <_av500_> apply it once more and you should be there
[11:45:05] <CIA-99> ffmpeg: diego * r23736 /trunk/libavcodec/aacps.h: Add required #includes to pass 'make checkheaders'.
[11:51:44] <janneg> _av500_: and if I apply it 5 times I'll have the decoded frames before I started to decode?
[11:51:59] <mru> no
[11:52:01] <_av500_> check your hdd, they are already there :)
[11:52:03] <mru> it's multiplicative, not additive
[11:52:26] <mru> otherwise, if you applied it enough times, you'd have the decoded frames before you even started patching
[11:52:44] <mru> that's basically how a time machine is built
[11:52:59] <_av500_> with vp8 asm patches?
[11:53:03] <janneg> just get it from the future
[11:53:20] <mru> _av500_: no, other patches
[11:53:22] <mru> additive patches
[11:53:36] <mru> unfortunately they haven't been discovered yet
[11:53:46] <mru> the guys at lhc are hoping to find them
[11:54:17] <_av500_> uhm, this irc channel already seems to loop back in time
[11:58:02] <mru> hmm, two mpeg4thread failures after the auto-pthreads change
[12:02:37] <mru> and sparc/openbsd
[12:02:46] <mru> smells like uninitialised data somewhere
[12:03:55] <KotH> the smell could also be sopie farting in her sleep
[12:09:25] <KotH> mru: that depends on the definition of definitly
[12:34:49] <KotH> .o0(stupid people are stupid)
[13:01:14] <KotH> mru: you have some contacts at arm, dont you?
[13:01:29] <mru> I know a few people, why?
[13:01:43] <KotH> mru: could you propose to the guys there to standardize usb device interface?
[13:02:12] <mru> no
[13:02:16] <mru> not their problem
[13:02:25] <KotH> mru: every arm vendor has its own usb device system and all of them have blatant bugs
[13:02:31] <CIA-99> ffmpeg: rbultje * r23737 /trunk/Changelog: Add missing changelog entry for VP8 decoder.
[13:02:31] <CIA-99> ffmpeg: rbultje * r23738 /trunk/libavcodec/vp8data.h: Fix a typo, spotted by Diego.
[13:02:38] <mru> complain to the chip vendors
[13:02:46] <KotH> lol
[13:02:48] <KotH> i did with atmel
[13:04:25] <av500> KotH: thats not arms fault
[13:05:00] <KotH> after half a year discussing with them an easily reproducible race condition in their hw<->sw interface, i got them far enough to tell me that their silicon requires you to write to the registers in a specific way and that the documentation is "right" (although if you do it as written in the documentation you'll end up with a horrible race condition that is so easy to trigger that you can be sure that your customer will trip over it)
[13:05:48] <KotH> av500: it might not be their fault, but having one sane interface would also simplify programming for different chips
[13:05:59] <KotH> av500: and they did it for the interrupt system too
[13:06:06] <av500> that is different
[13:06:47] <iive> KotH: hum... i've heard this firm. I should check if we use something from them.
[13:07:39] <KotH> iive: it's one of the biggest uC manufacturers, especially in the lower power range
[13:08:27] <mru> most chips implement ehci actually
[13:08:32] <KotH> iive: oh.. and if you are using their example code anywhere... drop and rewrite it... it's not worth the effort to fix it
[13:08:36] <mru> but they always need some lower level stuff
[13:08:40] <mru> power management etc
[13:08:47] <mru> you'll never get that standardised
[13:08:56] <KotH> mru: ehci is the host interface. i'm talking about the device interface
[13:09:35] <mru> well, there about half the chips use the buggy mentor usb block
[13:10:09] <av500> KotH: irq system is much closer to the actual cpu core than usb block
[13:10:26] <KotH> mru: power management standardization is quite easy with usb
[13:10:36] <mru> no
[13:10:39] <KotH> mru: because the complete behavior is defined by usb already :)
[13:10:41] <mru> link power perhaps
[13:10:59] <av500> KotH: also, looking at e.g. TI, it is not the actual HW blocks that make problems, it is the interconnects that they always mess up
[13:11:22] <av500> same for other SOC vendors
[13:11:26] <mru> the usb spec doesn't say anything about cpu<->controller interface
[13:11:29] <mru> as you've noticed
[13:11:43] <KotH> av500: hmm.. the msp430s we use here work like a charm
[13:11:56] <KotH> av500: didnt hit any silicon bug until now
[13:12:04] <av500> msp430 is to the OMAP3 what ffmpeg is to the boston strangler
[13:12:09] <mru> KotH: you're not using ffmpeg enough then
[13:12:39] <KotH> mru: i doubt that ffmpeg runs on a msp430 ;)
[13:12:52] <mru> see?
[13:13:08] <av500> KotH: we threw out msp430 for atmega in our designs :)
[13:13:44] <KotH> av500: why?
[13:14:07] <av500> i guess it was 0.2c cheaper
[13:26:41] <lu_zero> uhm
[13:35:56] <mru> BBB: ping
[13:36:17] <BBB> mru: pong
[13:36:24] <BBB> if it's about dest/srcstride, I'll fix that
[13:36:32] <mru> ok
[13:36:55] <BBB> just one of those things I didn't get to and then I forgot
[13:37:09] <mru> you should never have written such code in the first place
[13:37:43] <BBB> you forget that I've never written a video decoder before
[13:37:52] <BBB> be patient while I learn ;-)
[13:38:01] <mru> VLAs are dangerous everywhere
[13:38:05] <mru> not just in video decoders
[13:38:20] <BBB> no I know, but it's easy to forget it here or there
[13:38:36] <BBB> it's like a variable declaration halfway, sometimes it's useful to just do one for debugging and then you forget about it
[13:38:36] <mru> you should never, ever write one in the first place
[13:38:46] <BBB> ok, ok
[13:38:55] <mru> I'm going to make it an error in ffmpeg
[13:38:56] <BBB> I'll remove it, really, before I apply any mmx/thing patch
[13:39:01] <BBB> sure
[13:39:19] <BBB> (gcc has an option for that?)
[13:39:23] <mru> yes
[13:39:27] <BBB> use it
[13:39:38] <mru> I need to eradicate the existing ones first
[13:39:40] <BBB> that way I have no excuses left :)
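The VLA concern raised above can be illustrated with a minimal sketch. It assumes a hypothetical upper bound `MAX_ROW_BYTES` and a hypothetical helper `copy_row_fixed`; neither is taken from FFmpeg. The gcc option alluded to is presumably `-Wvla`, which can be promoted to an error with `-Werror=vla`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical upper bound on a row width; not an FFmpeg constant. */
#define MAX_ROW_BYTES 4096

/* Copies one row through a temporary buffer.
 * A VLA version would declare `uint8_t tmp[width];` here, so the stack
 * usage would depend on a run-time value. With a fixed bound the size
 * is known at compile time and oversized requests are rejected.
 * Returns 0 on success, -1 if the row does not fit the fixed buffer. */
static int copy_row_fixed(uint8_t *dst, const uint8_t *src, int width)
{
    uint8_t tmp[MAX_ROW_BYTES];

    if (width < 0 || width > MAX_ROW_BYTES)
        return -1;
    memcpy(tmp, src, (size_t)width);
    memcpy(dst, tmp, (size_t)width);
    return 0;
}
```

Calling `copy_row_fixed(dst, src, 4)` copies four bytes; a width above `MAX_ROW_BYTES` returns -1 instead of growing the stack.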
[13:47:23] <lu_zero> yawn
[13:47:50] <Honoome> lu_zero: if you're bored I have a few things you could do
[13:48:13] <lu_zero> right now I'm collecting my strength to debug something _quite_ strange in ffmpeg
[13:48:28] <Honoome> too bad
[13:48:42] <lu_zero> the question is how multiple outputs in ffmpeg got broken?
[13:49:08] <lu_zero> Honoome: you found some new bugs?
[13:49:56] <Honoome> lu_zero: daily.. but which project are you interested in?
[13:59:08] <mru> what is the purpose of ff_lpc_compute_autocorr?
[13:59:15] <mru> it appears unused
[14:00:40] <jai> the plain c version is still useful
[14:00:53] <jai> if --disable-asm is used that is
[14:01:20] * mru needs better grepping skills
[14:01:34] <mru> what uses it?
[14:04:16] <jai> mru: flac, alac encoders right
[14:04:28] <jai> lavc/lpc.c
[14:04:34] <mru> I know where it is
[14:04:34] <jai> am i missing something?
[14:04:52] <mru> just not in the mood for tracing calls backwards
[14:05:17] <Honoome> mru: cscope helps
[14:05:19] <jai> mru: the lpc coeff calculation code
[14:05:32] <mru> jai: duh
[14:05:45] <mru> Honoome: not when the calls are done via function pointers
[14:06:14] <jai> so flacenc, alacenc and ra14.4k enc
[14:06:25] <Honoome> mru: good point
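Why tracing callers by name fails when calls go through function pointers can be sketched as follows. All names here (`LPCDemoContext`, `demo_autocorr0_c`, and so on) are hypothetical and only mimic the pattern; this is not the actual lavc/lpc.c code.

```c
#include <stddef.h>

/* Hypothetical context holding a function pointer, in the style of
 * a dsp context that can be filled with a C or an asm implementation. */
typedef struct LPCDemoContext {
    double (*compute_autocorr0)(const int *data, size_t len);
} LPCDemoContext;

/* Plain C implementation: lag-0 autocorrelation (sum of squares).
 * Its name appears only here and in the init function below. */
static double demo_autocorr0_c(const int *data, size_t len)
{
    double sum = 0.0;
    for (size_t i = 0; i < len; i++)
        sum += (double)data[i] * (double)data[i];
    return sum;
}

/* The only place that references the implementation by name;
 * an optimised version could be assigned here instead. */
static void demo_lpc_init(LPCDemoContext *c)
{
    c->compute_autocorr0 = demo_autocorr0_c;
}

/* The call site never mentions demo_autocorr0_c, so searching for
 * that name finds no callers. */
static double demo_energy(const LPCDemoContext *c,
                          const int *data, size_t len)
{
    return c->compute_autocorr0(data, len);
}
```

Searching for the pointer member name (`compute_autocorr0` in this sketch) rather than the implementation name is what finds the real call sites.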
[14:19:57] <lu_zero> mru: Looks like we should add the chained generation to regtest
[14:20:23] <mru> what?
[14:25:50] <lu_zero> ffmpeg -i foo -s size out.size.blah -s size2 out.size2.blah is broken
[14:27:29] * lu_zero writes a stupid script and let bisect do the rest...
[14:35:12] * lu_zero screams about ffmpeg being hardly bisectable thanks to libswscale...
[14:42:01] <mru> Vitor1001: ping
[14:42:23] <Vitor1001> mru: pong
[14:42:27] <KotH> lu_zero: in irc, nobody hears you scream
[14:42:28] <KotH> ;)
[14:42:47] <mru> Vitor1001: lp_order arg of ff_acelp_lp_decode()
[14:42:55] <mru> is always 10 as used now
[14:42:56] <Honoome> KotH: that's why I usually phone him to scream at him :P
[14:43:20] <lu_zero> mru: how's your situation? Still having a impending deadline?
[14:43:22] <mru> Vitor1001: is there an upper limit for what this will ever be?
[14:43:39] <Vitor1001> mru: it's unused ATM
[14:43:58] <Vitor1001> I think you can safely assume 16 as max.
[14:44:02] <mru> it's called from g729dec.c
[14:44:11] <mru> with a value of 10
[14:44:28] <lu_zero> Honoome: right now I envy you and mru manycores
[14:44:40] <mru> sipr16k has a float version of the same function with max 16
[14:44:50] <Honoome> lu_zero: and I'm in need of a manicure...
[14:45:13] <Vitor1001> mru: g729dec.c is incomplete and never compiled
[14:45:37] <mru> lu_zero: I have only 12 core2-class cores
[14:45:48] <Honoome> damn you beat me
[14:45:54] <Honoome> I knew I should have updated to Instanbuls
[14:46:22] <mru> only 8 of them currently powered up
[14:46:46] <mru> Vitor1001: #define MAX_LP_HALF_ORDER 8
[14:46:54] <lu_zero> Instanbuls?
[14:47:02] <mru> lu_zero: turkish cpu
[14:47:09] <lu_zero> tasty!
[14:47:13] <mru> you can grill kebab on them
[14:47:39] <Honoome> lu_zero: Barcelona → Quad Opteron; Instanbul → Hex Opteron
[14:47:44] <Vitor1001> mru: fine for me
[14:47:57] <mru> Vitor1001: what is fine? that define exists
[14:48:09] <mru> is half_order half of order?
[14:48:38] <Vitor1001> mru: yes.
[14:48:53] <mru> so making max order twice that number is safe?
[14:48:57] <mru> and sensible
[14:50:22] <lu_zero> Honoome: is there a laptop with it?
[14:50:45] * lu_zero is wondering how much money to waste
[14:52:24] <Honoome> lu_zero: don't think so, but you can get the same dell as I got if you don't care about the touchpad
[14:53:08] <Vitor1001> mru: Yes. The "half_order" in the name is not just some random approximation ;)
[14:53:30] <mru> I wasn't sure there wasn't some +1 or something involved
[14:53:39] <mru> or the half referring to something other than the size
[14:53:57] <lu_zero> Honoome: how much did you pay?
[14:53:59] <Vitor1001> mru: I understand, it happens more than often...
[14:54:06] <Honoome> lu_zero: 1980 :P
[14:54:46] * Honoome is just setting a new IP(v6) up on his dns rather than using /etc/hosts... crazy?
[14:56:43] <lu_zero> 2284 for a similar apple...
[14:56:57] * lu_zero isn't that convinced about the dell monitor
[14:57:06] <Honoome> the dell monitor is _gorgeous_
[14:57:27] <lu_zero> really?
[14:57:29] <Honoome> better than the macs.. and it's not just my opinion but of my typographer (or however that translates) as well
[14:57:38] <lu_zero> uhmm
[14:58:26] * lu_zero has 10 dell to be configured for esof and well... they are gouging my eyes already...
[14:58:41] <Honoome> laptops or standalone monitors?
[14:58:48] <lu_zero> laptops
[14:59:11] <Honoome> *shrug* it's your own fault for having cancelled my trip to geneve :P
[14:59:39] <lu_zero> Honoome: you are more than welcome, the room has 4 beds...
[15:00:24] * mru likes the sony laptop screen
[15:01:04] <lu_zero> I thought the 6+3h would be a bit deadly ^^;
[15:01:44] <Honoome> 6+3? o_O
[15:01:59] <Honoome> what, do you really think I would have come to Turin first, anyway? :P
[15:02:13] <lu_zero> 6 Venezia-Torino + 3 Torino-Geneve
[15:02:19] <lu_zero> Honoome: Puria wants you
[15:04:05] <Honoome> lu_zero: I have come to Turin at another time.. my idea was that already ^^ -- Venice-Geneve wouldn't be too bad itself, as I wouldn't have to switch train at least
[15:05:54] <kshishkov> mru: too bad those screens are handicapped with sony laptop
[15:06:12] <mru> kshishkov: what do you have against sony?
[15:06:25] <av500> sony makes good headphones
[15:07:24] <kshishkov> mru: nothing except their proprietary lock-in behaviour and overpricing their products
[15:07:43] <mru> the price is annoying, sure
[15:07:53] <av500> mru: what did u pay?
[15:07:55] <mru> but I see no lockin
[15:07:57] <av500> and what config?
[15:08:12] <kshishkov> mru: ask Benjamin about ATRAC, for example
[15:08:26] <av500> who cares about atrac
[15:08:56] <mru> av500: i5 M540, 8GB, 500GB, 1080p, £lots
[15:09:12] <kshishkov> av500: minidisc users
[15:09:17] <mru> the laptop plays other formats than atrac
[15:09:20] <lu_zero> how much in euro?
[15:09:26] <mru> €lots
[15:09:30] <av500> mru: no SSD?
[15:09:30] <lu_zero> ...
[15:09:39] <mru> ssd isn't worth the overprice
[15:10:04] <kshishkov> av500: for 500GB SSD it would be EUR insanelylots
[15:10:21] <benoit-> lu_zero: €(lots*1.21)
[15:10:27] <lu_zero> thehe
[15:10:54] * lu_zero wants a faster laptop AND a unified git
[15:11:05] <mru> I can give you the git
[15:11:15] <lu_zero> would be great
[15:11:23] <lu_zero> so I could leave it bisecting unattended
[15:11:24] <kshishkov> lu_zero: faster than my Gdium or faster than what you have now?
[15:11:42] <lu_zero> kshishkov: faster as in "find the bug NOW"
[15:12:13] <lu_zero> faster as in "solve it" could be too much though
[15:12:18] <av500> Total Price£ 1,799.00 inc. VA
[15:18:27] <mru> damn this code is ugly
[15:18:30] <mru> shorten.c
[15:20:11] <mru> at least it's good to see our standards have improved
[15:22:11] <kshishkov> look into something older
[15:23:31] <av500> mru: your config is 2089.00€ here
[15:24:14] <Honoome> lu_zero: feel like adding sctp support to lsof or netstat or iproute2 while your laptop looks for the bug? :D
[15:24:34] <lu_zero> uhmm
[15:24:42] <lu_zero> mans where is the unified git?
[15:24:50] <mru> I don't have one
[15:24:55] <lu_zero> iproute2 doesn't support sctp?
[15:25:03] <Honoome> lu_zero: ss doesn't
[15:25:06] <mru> but I can make one
[15:25:12] <mru> when we decide to finally switch
[15:25:29] <lu_zero> after this I think I have a good case for that -_-
[15:25:31] <mru> doing it properly is too much work to do before that
[15:26:06] <Honoome> what are we waiting for? :)
[15:26:14] <mru> godot
[15:26:25] <Honoome> that's whom we're waiting for
[15:26:28] <lu_zero> mru: he's already arrived
[15:36:23] <lu_zero> ...
[15:36:26] <lu_zero> there
[15:37:12] <lu_zero> http://ffmpeg.pastebin.com/6LCXCwsW
[15:37:25] <mru> hehe
[15:38:11] <lu_zero> avfilter ....
[15:38:45] * Honoome mutters a few bad words... why on earth did he decide to go with slackware for kernel hacking, AT ALL?!
[15:39:05] <Honoome> [answer, because it looked faster than installing Gentoo; it was the wrong answer though
[15:39:50] <lu_zero> ...
[15:39:58] <lu_zero> we should have readytogo stage4
[15:40:06] <lu_zero> _really_ ready
[15:40:08] <mru> a base gentoo install is quite fast
[15:41:23] <Honoome> mru: in kvm?
[15:41:43] <mru> I don't use kvm
[15:41:45] <Honoome> well certainly if I count download time, plus setup time, plus finding out they don't really make much sense time...
[15:41:46] <mru> I have enough real machines
[15:42:00] <Honoome> mru: I'm short of convenience hardware to hack the kernel on
[15:42:23] <lu_zero> mru: do you have time help me putting a test to trigger the problem?
[15:42:24] <mru> beagles are great for kernel hacking
[15:43:04] * lu_zero has the script but is a bit lost on the Makefile
[15:43:38] * mru should tidy up the regtest part of the makefile some
[15:47:19] <lu_zero> basically I need to run ffmpeg 3 times and compare the outputs
[15:47:51] <mru> what are you testing?
[15:54:14] <lu_zero> http://ffmpeg.pastebin.com/8G1i6EU3
[15:56:56] <mru> why so complicated?
[15:57:24] <mru> run the commands separately and generate checksums etc
[15:57:38] <mru> then compare those against the files generated by the combined command
[15:57:53] <lu_zero> uhm
[15:58:11] <mru> I mean create the checksums outside the test script
[15:58:12] <mru> manually
[15:58:33] <lu_zero> ok
[15:58:36] <lu_zero> that part is simple
[16:11:30] <xxthink> Are there some tools to output the pts of a specific mp4 file?
[16:19:24] <j0sh_> wbs: i figured it out *just before* i got your email
[16:19:28] <j0sh_> d'ohhh
[16:19:38] <j0sh_> sleep does help :)
[16:19:40] <wbs> :-)
[16:19:45] <wbs> yes, it usually does
[16:20:14] * lu_zero needs some...
[16:20:18] <Honoome> not with idiotic ruby packages, not today as well =_=
[16:20:25] <wbs> after fixing the nitpicks I mailed about, and that little issue, I think it should be quite ok, but I guess I'll read it through once more in more detail then
[16:20:42] <wbs> and wait for comments from lu_zero and BBB if they want to have a say on it
[16:20:45] <BBB> I'll review the next iteration also
[16:20:47] <j0sh_> alrighty
[16:20:50] <BBB> was watching the US game
[16:20:54] <BBB> that was fun :)
[16:21:07] <j0sh_> yeah i think i forgot to format-patch before i sent that round of patches out
[16:21:25] <lu_zero> hopefully I'll be able to review after some rest...
[16:22:07] <av500> BBB: shouldn't KotH be negotiating with them :)
[16:22:18] <BBB> ?
[16:22:34] <av500> turkish telecom...
[16:22:42] <BBB> why him? :-p
[16:23:33] <j0sh_> is there a way to review the history of a particular file?
[16:23:39] <av500> svn log
[16:23:40] <av500> svn blame
[16:24:03] <j0sh_> or in git? i know about blame, but it only gives me the most recent change
[16:24:04] <av500> svn diff -c <changeset>
[16:24:13] <Honoome> j0sh_: git log :P
[16:24:17] <Honoome> git log $filename
[16:24:24] <j0sh_> ok
[16:24:32] <lu_zero> j0sh_: gitk helps
[16:31:54] <j0sh_> lu_zero: wow. gitk is pretty cool
[16:44:09] <lu_zero> =)
[17:07:28] <sjhor_> Could anyone tell me the purpose of the two left_mb_xy values in lavc's h264 decoder?
[17:14:48] <sjhor_> Actually never mind I see what's going on
[17:24:27] <wbs> j0sh_: you could have a look at gitg, too, I prefer its graphical output to gitk
[17:33:59] <j0sh_> wbs: cool, will check that out. digging through the ffmpeg commit history is fun, i could do this all day :)
[17:34:54] <j0sh_> found the commits that added in mpeg4 and aac support also, so the (c) will be fixed in the next round of patches
[17:42:17] <BBB> Dark_Shikari: so what are the issues preventing a commit?
[17:42:36] <BBB> apart from the huge VLA mru complained about, will fix that now
[17:46:31] <Dark_Shikari> BBB: did you get my patch?
[17:46:42] <BBB> I integrated half of yours (mmx changes)
[17:46:47] <Dark_Shikari> You will have to modify all of the asm functions to take two strides
[17:46:50] <BBB> I didn't integrate the sssssse stuff because I can't test it
[17:46:56] <Dark_Shikari> Um, just locally commit it
[17:47:01] <Dark_Shikari> you don't have to be able to test the ssse3
[17:47:10] <Dark_Shikari> I tested it for you
[17:47:15] <BBB> haha :) ok
[17:47:19] <Dark_Shikari> and if you want I can give you ssh access
[17:47:22] <Dark_Shikari> to test things
[17:47:28] <Dark_Shikari> now, so first of all, the VLA
[17:47:29] <BBB> nah, I'll keep begging for a better cpu
[17:47:42] <BBB> yeah, I'll fix the vla, mru already bugged me
[17:47:43] <Dark_Shikari> You'll have to modify all the asm functions (albeit trivially, and in the same way) to fix this
[17:47:46] <Dark_Shikari> do you see why?
[17:47:48] <BBB> if you have a patch, go commit it
[17:48:02] <BBB> yeah, because they're gonna get two different strides
[17:48:10] <BBB> so you need to remove the sub r0, r1
[17:48:15] <BBB> and use two adds instead of one
[17:48:22] <janneg> Dark_Shikari: yasm complained on x86_64
[17:48:24] <BBB> add r1, src_stride; add r0, dest_stride
[17:48:25] <BBB> or so
[17:48:30] <BBB> I forgot which one is r0
[17:48:41] <BBB> and you need to change the number of registers used in each function from 5 to 6
[17:48:51] <BBB> I think that's all
[17:48:54] <Dark_Shikari> janneg: yes, PIC is broken
[17:48:55] <BBB> is there more I'd need to change?
[17:48:55] <janneg> Dark_Shikari: all the 'FIXME prevent this on X86_64'
[17:49:06] <Dark_Shikari> janneg: that's not the problem
[17:49:08] <Dark_Shikari> the problem is that I broke PIC
[17:49:28] <Dark_Shikari> oh, BBB, the other thing I noted
[17:49:31] <Dark_Shikari> the 8x8 functions are never used
[17:49:35] <Dark_Shikari> if splitmv is on, it does all 4x4 MC
[17:49:36] <janneg> Dark_Shikari: libavcodec/x86/vp8dsp.asm:537: error: invalid size for operand 1
[17:49:37] <Dark_Shikari> which is going to suck
[17:49:55] <BBB> 8x8 should be used for chroma of non-splitmv
[17:50:00] <BBB> if it's not used, there's a bug somewhere
[17:50:07] <Dark_Shikari> BBB: ok, true
[17:50:09] <Dark_Shikari> but still
[17:50:14] <Dark_Shikari> fix that because it's going to be so much faster
[17:50:26] <BBB> ?
[17:50:31] <BBB> what should I fix?
[17:50:39] <Dark_Shikari> the fact that splitmv == 16 MC calls?
[17:50:42] <Dark_Shikari> instead of 1 MC call per partition?
[17:50:49] <BBB> ooh, I see what you mean
[17:50:52] <BBB> yeah ok, will do
[17:51:14] <BBB> that's not trivial, that might take me a day or two
[17:51:18] <Dark_Shikari> >dc_add/mmx is +/- 90 cycles faster
[17:51:22] <Dark_Shikari> You mean 9
[17:51:27] <Dark_Shikari> start_timer is measured in dezicycles.
[17:51:30] <BBB> yes
[17:51:41] <BBB> dezi is german?
[17:51:47] <BBB> brits say deci
[17:51:56] <Dark_Shikari> 1/10th
[17:52:01] <Dark_Shikari> no idea
[17:52:02] <janneg> yes, one tenth
[17:52:04] <Dark_Shikari> mru: awesome VLA killing
[17:55:35] <BBB> Dark_Shikari: but that second does not prevent me from committing this patch
[17:55:46] <BBB> it just prevents it from having full effect
[17:55:47] <Dark_Shikari> no it doesn't
[17:55:53] <Dark_Shikari> I just wanted to mention it.
[17:55:56] <BBB> ok
[17:55:58] <Dark_Shikari> so basically
[17:56:07] <Dark_Shikari> a) commit my patch locally
[17:56:11] <Dark_Shikari> b) fix vlas in all functions
[17:56:17] <BBB> I don't have your patch :-p
[17:56:22] <Dark_Shikari> Um, I emailed it out...
[17:56:24] <Dark_Shikari> ....
[17:56:29] <BBB> which addy?
[17:56:35] <Dark_Shikari> ffmpeg-devel?!?!?!
[17:56:47] <BBB> which thread?
[17:56:52] <Dark_Shikari> VP8 MMX optimizations?
[17:56:53] <Dark_Shikari> durrhrhrhrh?
[17:57:00] <Dark_Shikari> did you catch the stupid bug today?
[17:57:25] <BBB> I probably didn't read every email or so
[17:57:38] <Dark_Shikari> Um, how about... the most recent one?
[17:57:41] <Dark_Shikari> -.-
[17:58:01] <Dark_Shikari> ok, so you caught the stupid bug today
[17:58:07] <BBB> probably
[17:58:13] <BBB> let me get an axe
[17:58:14] <Dark_Shikari> Things you need to fix:
[17:58:27] <Dark_Shikari> 1) Vararray and all of the functions that take only one stride
[17:58:33] <Dark_Shikari> this can mostly be done by global search/replace etc
[17:58:49] <Dark_Shikari> Make sure to disable SSE2 functions to test the MMX ones and so forth
[17:59:04] <Dark_Shikari> Fix the SSSE3 ones even if you can't test them; I'll check it for you when you're ready.
[17:59:39] <Dark_Shikari> 2) mov r4, r5m apparently broke something on x86_64 according to janneg. I have no idea what kind of crack he's on, but a simple way to handle that is to get rid of it. To do this, simply eliminate that and use r5 instead (5,5 instead of 4,5 in the cglobal).
[17:59:56] <Dark_Shikari> This is, again, a very simple single change applied to all functions.
[17:59:58] <BBB> yeah, that's what the FIXME was for
[18:00:06] <Dark_Shikari> Now, once you do these
[18:00:11] <Dark_Shikari> give me the patch, and I'll fix the following:
[18:00:18] <Dark_Shikari> 3) PIC is broken entirely
[18:00:28] <Dark_Shikari> btw, your code was _also_ broken with PIC previously
[18:00:42] <Dark_Shikari> because you assumed the x264 and ffmpeg versions of the x264asm headers matched
[18:00:45] <Dark_Shikari> They didn't.
[18:00:45] <Dark_Shikari> I fixed that.
[18:00:52] <Dark_Shikari> (Not your fault, there's no way you could have known)
[18:01:09] <janneg> Dark_Shikari: yasm 1.0.1.2326
[18:01:11] <BBB> I was about to say, you never told me about pic except that x264inc.asm did something for me related to that
[18:01:55] <Dark_Shikari> BBB: here's the rules about PIC with the latest x264asm
[18:02:05] <Dark_Shikari> 1) PIC is only supported on x86_64.
[18:02:29] <Dark_Shikari> 2) PIC is supported using "wrt rip". That is, constants are referenced using an offset from the instruction pointer.
[18:02:42] <Dark_Shikari> This is done automatically by yasm (x264asm just turns it on).
[18:02:43] <Dark_Shikari> HOWEVER
[18:03:01] <Dark_Shikari> you cannot do [globalconstant + r4*8 + r2 + 15 wrt rip]
[18:03:03] <Dark_Shikari> too complicated!
[18:03:28] <Dark_Shikari> thus, in PIC, you would have to do something like this:
[18:03:37] <Dark_Shikari> lea r11, [globalconstant]
[18:03:45] <Dark_Shikari> add r11, r2
[18:03:51] <Dark_Shikari> load from [r11 + r4*8 + 15]
[18:04:04] <Dark_Shikari> For cases like this, you can do %ifdef PIC
[18:05:07] <BBB> maybe you should make a macro for that
[18:05:49] <Dark_Shikari> See common/x86/cabac-a.asm for an example.
[18:05:57] <Dark_Shikari> The reason there isn't one is because it's rarely needed
[18:06:02] <Dark_Shikari> almost all times you load a global constant, it's just [pw_64]
[18:06:09] <Dark_Shikari> not [pw_64+r4*8+r2+...]
[18:06:25] <Dark_Shikari> Oh, and in the new x264asm, the ff_ prefix is automatically added.
[18:06:43] <Dark_Shikari> Both to function names and to constant names from ffmpeg.
[18:06:52] <Dark_Shikari> So [pw_64] will reference [ff_pw_64]
[18:07:00] <Dark_Shikari> this is set by %define program_name ff in x86inc.asm
[18:07:01] <Dark_Shikari> See my patch.
[18:07:11] <BBB> ok
[18:08:06] <Dark_Shikari> If you want, I can apply the "update x264asm" changes right now for you.
[18:08:07] <Dark_Shikari> is that ok?
[18:08:17] <BBB> of course
[18:08:18] <Dark_Shikari> this will make the patch smaller and simplify things
[18:08:20] <BBB> that makes it easier for me
[18:08:22] <Dark_Shikari> I hope I don't step on anyone's toes
[18:08:28] <Dark_Shikari> since it does slightly involve modifying other asm
[18:09:32] <Dark_Shikari> what's the regression test command again?
[18:10:04] <_av500_> make test?
[18:10:29] <Dark_Shikari> k
[18:11:13] <mru> lol
[18:11:25] * BBB goes do real work for a little
[18:11:29] <mru> add -j$bignum if you have the cores
[18:11:39] <Dark_Shikari> what does it do when it fails
[18:11:52] <mru> stops
[18:11:57] <mru> add -k to keep going
[18:14:18] <wbs> j0sh_: yeah, a good way of browsing history is a vital part of a version control system
[18:15:29] <_av500_> wbs: enterprise sw does that inline in the file, just follow the #ifdefs
[18:15:48] <wbs> _av500_: yeah, that's quite scary
[18:16:33] <mru> Dark_Shikari: btw, I found several stride-sized vlas in snow
[18:16:38] <Dark_Shikari> ouch
[18:17:11] <Dark_Shikari> BBB: so in short, you fix 1) and 2), I'll make it cross-platform and test it.
[18:17:15] <Dark_Shikari> then we can commit.
[18:18:10] <_av500_> wbs: and old versions can always be found on that coworkers smb share... ;)
[18:18:15] <Dark_Shikari> also, I'd like you to try to make an effort to understand my functions
[18:18:18] <Dark_Shikari> and ask questions if you need to
[18:18:25] <Dark_Shikari> this is a learning experience too, and I never taught you ssse3!
[18:32:56] <wbs> _av500_: oooooh, yeah. version control through smb shares.. don't make me start crying ;P
[18:34:06] <spaam> wbs: and put it on a dropbox share. can it be better? :)
[18:37:37] <mru> ftp!
[18:37:50] <lu_zero> printouts!
[18:37:51] <mru> with 0.9-time passwords
[18:39:46] <Dark_Shikari> mru: btw, did you notice with that patch the incredible stupidity of vp8 mc?
[18:39:58] <mru> I didn't look at the details
[18:39:59] <Dark_Shikari> it's "separable mc", but you can only do it one way...
[18:40:02] <Dark_Shikari> because you do the H pass first
[18:40:07] <Dark_Shikari> then you round back to 8-bit
[18:40:11] <mru> omg
[18:40:11] <Dark_Shikari> and then you do the V-pass on the rounded data
[18:40:23] <Dark_Shikari> Yes, you round _twice_.
[18:40:28] <Dark_Shikari> Gratuitous loss of precision for no reason.
[18:40:31] <mru> twice as round
[18:40:36] <Dark_Shikari> It makes the asm a bit easier to write
[18:40:38] <Dark_Shikari> but not any faster
[18:40:46] <Dark_Shikari> and loses compression for no reason
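The precision loss from rounding twice can be shown with a toy example. It assumes a 2-tap averaging filter on a 2x2 block, which is NOT VP8's real six-tap filter; it only demonstrates that rounding after the H pass and again after the V pass can differ from rounding once.

```c
#include <stdint.h>

/* Two-pass filtering: the horizontal pass is rounded back to an
 * integer, then the vertical pass runs on the rounded values. */
static int avg2x2_two_pass(const uint8_t p[2][2])
{
    int h0 = (p[0][0] + p[0][1] + 1) >> 1;  /* H pass, row 0, rounded */
    int h1 = (p[1][0] + p[1][1] + 1) >> 1;  /* H pass, row 1, rounded */
    return (h0 + h1 + 1) >> 1;              /* V pass, rounded again */
}

/* Same toy filter with the intermediate kept at full precision and
 * a single rounding at the end. */
static int avg2x2_one_round(const uint8_t p[2][2])
{
    return (p[0][0] + p[0][1] + p[1][0] + p[1][1] + 2) >> 2;
}
```

For the block `{{0, 1}, {0, 0}}` the exact average is 0.25: the single-rounding version returns 0, while the two-pass version returns 1 because the first rounding already pushed 0.5 up to 1.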
[18:41:09] <lu_zero> what happens when you remove them then ^^?
[18:41:49] <Dark_Shikari> remove what
[18:42:06] <lu_zero> the unnecessary round
[18:42:28] <Dark_Shikari> it'll be wrong obviously
[18:43:11] <BBB> Dark_Shikari: I understand what you did to v4/v6, and the dc_add looks pretty straightforward also (same as the x264 one we looked at)
[18:43:25] <lu_zero> well it's vp8.1 material
[18:43:33] <BBB> I haven't looked at the sse4 one yet, and I'll ask questions from there on (ssse3 also)
[18:44:20] <BBB> Dark_Shikari: and sse2 same thing; I actually want to learn sse2, it's useful for me
[18:44:27] <BBB> ssse3 will have to wait until I have a new cpu
[18:45:12] <Dark_Shikari> sse2 is basically the same as mmx
[18:45:16] <Dark_Shikari> just bigger
[18:45:24] <Dark_Shikari> fyi, I am _NOT_ satisfied with the sse2 mc code
[18:45:40] <Dark_Shikari> way too much overhead
[18:45:57] <Dark_Shikari> also, the "shifting down" trick for your V mc does save time, but I think that if we unrolled it by 2x
[18:46:00] <Dark_Shikari> we could eliminate most of it
[18:46:07] <Dark_Shikari> shifting down == m1 -> m0, m2 -> m1, etc
[18:46:32] <Dark_Shikari> if you unroll by 2x, you can just shift up, then shift down, repeatedly.
[18:46:40] <Dark_Shikari> i.e. alternate register patterns
[18:46:47] <Dark_Shikari> But this comes later, not intending to do it now.
[18:47:45] <Dark_Shikari> and feel free to throw ideas at me -- I definitely did not think of every possible option.
[18:49:11] <BBB> I'll get brighter ideas once I get good at this :)
[18:49:21] <BBB> just need to actually do real work now ;)
[18:49:29] <Dark_Shikari> this is real work
[18:50:05] <BBB> I don't get paid to do this :-p
[18:50:07] <Dark_Shikari> btw, all that mc code is the result of what I'd call an "MC marathon"
[18:50:10] <Dark_Shikari> when you take a day and say
[18:50:14] <Dark_Shikari> "fuck it I'm writing all the MC code"
[18:50:17] <BBB> mru: does stefan gehrer have svn access? or should I apply for him?
[18:50:18] <Dark_Shikari> I did this for h264 too a while back =p
[18:50:49] <mru> he should
[18:55:02] <Honoome> mru: see, everything comes around.. last night the farmville guy made me want to kill a huge proportion of the human race? now fatelf makes me wish to kill even more
[18:55:33] <_av500_> fatelf still exists?
[18:55:34] <mru> fatelf still exits?
[18:55:37] <mru> +s
[18:55:54] <_av500_> :)
[18:56:42] <Honoome> it does... and Ryan Gordon or someone who likes him is still going on to talk about it it seems
[18:56:52] <Honoome> there's an article on LWN about a talk about Ryan Gordon's "failures"...
[18:57:09] <Honoome> and people still think that "it has its uses"
[18:57:25] <mru> well, he was mercilessly evicted from lkml
[18:58:09] <Honoome> sure... a file that will _only_ load on an OS with modified kernel and modified loader, which is _not_ going to be smaller than the sum of the files it would replace (and that would be usable on _any_ kernel and _any_ loader... with restrictions of course)...
[19:21:39] <CIA-99> ffmpeg: darkshikari * r23739 /trunk/libavcodec/x86/ (6 files):
[19:21:39] <CIA-99> ffmpeg: Update x264asm header files to latest versions.
[19:21:39] <CIA-99> ffmpeg: Modify the asm accordingly.
[19:21:39] <CIA-99> ffmpeg: GLOBAL is now no longer necessary for PIC-compliant loads.
[19:22:12] <Dark_Shikari> BBB: ^
[19:22:53] <mru> Dark_Shikari: extra funny, one of the vla[stride] in snow was unused
[19:22:58] <Dark_Shikari> lol
[19:23:07] <mru> there's tons of unused stuff there
[19:25:08] <j0sh_> lu_zero: Honoome: what bottlenecks feng? ram or network i/o?
[19:25:27] <Honoome> j0sh_: cpu.. we're mostly single-threaded for clients' handling
[19:25:45] <Honoome> plus we have a bad I/O hit by the DESCRIBE calls as we don't cache the video-on-demand results
[19:26:13] <j0sh_> you guys probe the media after each describe, right?
[19:27:03] <Dark_Shikari> well, lets see how many fate tests I broke
[19:27:15] <Dark_Shikari> how long does fate usually take to respond?
[19:29:32] <mru> grab the samples and run make fate yourself
[19:29:36] <Honoome> j0sh_: yeah, bloody slow :)
[19:29:52] <Honoome> mru: "make your own fate" :D
[19:30:01] <mru> :-)
[19:30:52] <CIA-99> ffmpeg: alexc * r23740 /trunk/libavcodec/ (8 files): aactab: Tablegenify ff_aac_pow2sf_tab.
[19:32:55] <CIA-99> ffmpeg: alexc * r23741 /trunk/libavcodec/Makefile: Fix alphabetization of the CONFIG_HARDCODED_TABLES Makefile section.
[19:43:18] <wbs> J_Darnley: I hope you noted someone replied to the libvorbis with >2 channels thread a few days ago, I don't think there's much left stopping getting that issue fixed
[19:45:52] <dgt84> how does one write e.g. a stats file in an encoder or filter if you can't use fprintf inside of libav*?
[19:46:25] <_av500_> open, write close?
[19:47:01] <mru> wrong answer
[19:47:07] <mru> right answer is you don't
[19:47:25] <mru> check how mpeg[124] does it
[19:49:05] <Dark_Shikari> something something api incompatibility something
[19:49:27] <mru> stop trolling
[19:49:29] <mru> you know it works
[19:49:32] <Dark_Shikari> oh, I know it works
[19:49:42] <Dark_Shikari> I'm just noting how it's incompatible with the apis of some other apps that do it the other way
[19:49:57] <Dark_Shikari> actually, wait, remind me. lavc does it via writing to a pointer provided by the user, right?
[19:50:14] <mru> something like that
[19:50:16] <Dark_Shikari> how does it know the size of the memory available to write to?
[19:50:18] <mru> lavc does no file i/o
[19:58:26] <dgt84> looks like snprintf is the way to go according to libavcodec/ratecontrol.c
[19:59:34] <dgt84> then just write
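A minimal sketch of the pattern described above, in which the library only formats text into memory and the caller does the file I/O. The function name, the stats fields and the line format are all hypothetical and are not taken from libavcodec; the size argument is one possible answer to the question of how the callee knows how much memory it may write to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not an FFmpeg function: format one line of
 * per-frame stats into a caller-provided buffer of `size` bytes.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the buffer is too small or formatting failed. */
int format_stats_line(char *buf, size_t size, int frame, int qp, long bits)
{
    int n;

    assert(buf != NULL);
    if (size == 0)
        return -1;

    /* snprintf never writes more than `size` bytes, terminator included */
    n = snprintf(buf, size, "frame:%d qp:%d bits:%ld;\n", frame, qp, bits);
    if (n < 0 || (size_t)n >= size)
        return -1;   /* truncated: report it instead of keeping a partial line */
    return n;
}
```

The application would then pass the formatted buffer to its own fwrite() or write() call, so the library itself never touches a file.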
[20:12:38] <J_Darnley> wbs: I forgot that!
[20:13:05] <J_Darnley> I should have dealt with that right away before reading the rest of my mails
[20:20:08] <Honoome> "a branch has negligible overhead if not followed" ... after telling that fatelf would be useful only to custom vendors... such vendors that exist almost entirely in embedded... where a 5% overhead in kernel size is going to make the supervisor's head explode...
[20:21:48] <_av500_> embedded does not need fatelf
[20:22:29] <Honoome> _av500_: nobody needs fatelf
[20:23:02] <Dark_Shikari> mmmmm. fat elf.
[20:23:04] <Honoome> distributions don't need fatelf, they'd be at best reducing their package archives' size but paying a huge price in traffic (and that's _much_ worse as youtube demonstrates)
[20:23:56] <Honoome> hardware manufacturers and proprietary software producers don't need fatelf because they'd be forcing their customers to use _extremely_ newer systems, and that's a huge cost
[20:25:16] <Honoome> single developers won't make use of it at all
[20:25:34] <Honoome> and embedded vendors will laugh if you tell them to waste more storage space for that crap
[20:26:09] <microchip_> fat elf? fat dwarfs ftw! :p
[20:26:13] <mru> proprietary vendors force you to use rhel3 anyway
[20:26:37] <Dark_Shikari> hmm this is a good point
[20:26:38] <Dark_Shikari> if we have fat elf
[20:26:42] <Dark_Shikari> we need fat DWARF too.
[20:27:00] <Dark_Shikari> and maybe fat wizards and fat orcs.
[20:27:07] <microchip_> yep :D
[20:27:33] <Honoome> mru: they couldn't do that anymore :P
[20:28:48] <peloverde> Why are we talking about fatelf still. I thought we agreed that it doesn't solve anything
[20:29:16] <mru> peloverde: _everybody_ agreed on that
[20:29:33] <Honoome> peloverde: sorry I needed to vent, an idiot over identi.ca insists I'm being illogical at dissing it
[20:30:35] <mru> stay off such sites
[20:30:50] <peloverde> I gave up on all those social sites except linkedin and facebook
[20:31:18] <mru> I browse the topics on HN because they're fairly frequently interesting
[20:31:22] <mru> the stuff they link to
[20:31:39] <mru> and the discussions are usually a notch above reddit and the like
[20:31:53] <Honoome> mru: it links to my trouble with ruby, sometimes it's the only way I have to contact upstream :/
[20:32:15] <peloverde> have they considered a bug tracker or a mailing list?
[20:32:29] <Honoome> hahahah
[20:32:33] <mru> Honoome: and that's not a warning sign?
[20:32:41] <Honoome> half the released gems don't have a repository :/
[20:32:50] <Honoome> more than half don't have tarballs
[20:32:57] <Honoome> about a third don't have versioned tags
[20:33:16] <mru> what are you still doing with it?
[20:33:27] <Honoome> mru: it's definitely a sign that ruby and rails attracted a bunch of wannabes that shouldn't be allowed to write code for a living so much as I'm not allowed to sing, for a living
[20:33:27] * peloverde never drank the ruby cool-aid
[20:33:56] <Honoome> mru: I like the language itself :/ — most of what I do, though, is standalone, such as ruby-elf
[20:34:09] <Honoome> but lately I'm still swamped in a work project I've already been paid to complete
[20:34:37] <peloverde> ruby is nice because the reference material comes bundled with porn
[20:34:45] <lu_zero> pfff
[20:34:51] <lu_zero> you meant rails
[20:35:00] <lu_zero> ruby is bundled with chunky bacon
[20:35:03] <lu_zero> and foxes
[20:41:36] <mru> if (!isnotcompressed) ...
[20:43:21] <lu_zero> uh?
[20:44:32] <CIA-99> ffmpeg: vitor * r23742 /trunk/libavcodec/mpegaudiodec.c: Remove pointless condition in #if
[20:45:06] <j0sh_> Honoome: but you have to admit, rails makes web dev pleasant. anything else is painful now
[20:45:32] <Honoome> j0sh_: trust me that if you look "under the hood", rails isn't less painful. at all
[20:46:27] <CIA-99> ffmpeg: vitor * r23743 /trunk/libavcodec/ (mpegaudiodec.c mpegaudiodec_float.c): Move float-specific function to mpegaudiodec_float.c
[20:46:32] <j0sh_> oh, i know. but when you keep the hood closed, it works well
[20:47:24] <Honoome> j0sh_: I can't do that :P
[21:01:49] <mru> Vitor1001: will you fix the warning about compute_antialias too?
[21:02:13] <Vitor1001> mru: Well, that's not really my fault.
[21:02:21] <Vitor1001> But yes, I can give a look
[21:02:21] <mru> no, but you're working on the code
[21:02:40] <mru> ifdef and/or move to _float.c
[21:03:24] <Vitor1001> I imagine. I'll try to give a look after I finish benchmarking mp3lib dct32 :p
[21:03:34] <mru> no rush
[21:03:44] <mru> at least there are no VLAs there
[21:04:17] <mru> how many patches do I need to send before someone replies?
[21:04:37] <Vitor1001> What is so bad with VLA when they are not abused?
[21:04:44] <mru> any use is abuse
[21:04:59] <mru> they are unsafe and slow
[21:05:23] <Vitor1001> Why they need to be unsafe and slow?
[21:05:43] <mru> what happens if the size is outrageous?
[21:05:46] <mru> you die
[21:05:49] <Vitor1001> Of course.
[21:05:55] <mru> no chance of recovery
[21:06:08] <Vitor1001> But suppose that you check somewhere that size < 128, for ex.
[21:06:19] <Vitor1001> And use int buf[size] everywhere.
[21:06:21] <mru> then you might as well allocate 128 unconditionally
[21:07:06] <mru> if the maximum size is acceptable, there's no reason to not always use it
[21:07:22] <mru> since again, you can't catch an error
[21:07:31] <mru> so there's nothing to be gained from trying to save a few bytes
[21:08:44] <Vitor1001> Hmm...
[21:08:50] <Vitor1001> No memory fragmentation?
[21:08:58] <mru> it's the bloody stack
[21:09:02] <Vitor1001> int tab[10][size];
[21:09:03] <mru> it doesn't fragment
[21:09:06] <mru> that's even worse
[21:09:18] <mru> such a vla is much slower than a fixed-sized one
[21:09:39] <mru> since when indexing, you must multiply by a variable
[21:10:36] <Vitor1001> that's a good point.
[21:10:53] <mru> also, gcc can't inline a function containing a vla
[21:10:58] <mru> and you lose one register
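A small sketch of the alternative argued for above: if an upper bound is acceptable, allocate it unconditionally and turn an oversized request into an ordinary error return. The bound of 128 echoes the "size < 128" example in the discussion; the function and its purpose are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TAPS 128   /* hypothetical upper bound */

/* Hypothetical example: scale up to MAX_TAPS samples in a scratch
 * buffer and return their sum through *out.
 * Returns 0 on success, -1 if the request exceeds the fixed bound. */
int scaled_sum(const int *src, size_t size, int scale, long *out)
{
    int scratch[MAX_TAPS];   /* fixed size: stack usage is known at compile time */
    long sum = 0;
    size_t i;

    assert(src != NULL && out != NULL);
    if (size > MAX_TAPS)
        return -1;           /* an oversized request is a recoverable error */

    for (i = 0; i < size; i++)
        scratch[i] = src[i] * scale;
    for (i = 0; i < size; i++)
        sum += scratch[i];

    *out = sum;
    return 0;
}
```

With `int scratch[size]` instead, the size check would be the only thing standing between an outrageous value and a stack overflow, and the array declaration itself offers no way to report failure.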
[21:14:10] <peloverde> I wish C would relax what it considers a constant expression in regard to array sizing
[21:14:29] <mru> I used to say the same
[21:14:34] <mru> then I thought about it
[21:14:40] <mru> and realised it would be very hard to do
[21:21:15] <lu_zero> j0sh_ try turbogears2
[21:22:11] <lu_zero> if you get a growing dislike on rails automagic/implicit/unexpected tg2 usually is quite pleasant
[21:23:49] <CIA-99> ffmpeg: mru * r23744 /trunk/libavcodec/flacenc.c: flacenc: convert VLA to fixed size
[21:24:14] <mru> one down, many to go
[21:25:48] <lu_zero> post also benchmarks =P
[21:30:30] <Honoome> lu_zero: only if you can stand the fact that there is NO FRIGGING DOCUMENTATION :P
[21:30:51] <mru> Honoome: in that case, what are you doing here?
[21:31:15] <Honoome> mru: trust me, there's documentation here, compared to tg2
[21:32:55] <Honoome> lu_zero: I implemented FIONREAD/SIOCINQ for SCTP ...
[21:34:15] <pengvado> gcc can inline a function containing a vla, and the entirety of the calling function loses one register
[21:34:38] <lu_zero> \o/!
[21:35:26] <lu_zero> Honoome: tg2 has plenty of docs
[21:35:35] <lu_zero> scattered across 5-6 websites
[21:35:38] <lu_zero> but plenty
[21:35:39] <Honoome> in .py files
[21:36:00] <mru> pengvado: hmm, I've never seen it inline a vla
[21:36:11] <mru> and iirc I read somewhere it couldn't
[21:36:17] <mru> but I could have made that up
[21:36:44] <mru> empirically it clearly reduces inlining ability though
[21:36:45] <lu_zero> Honoome: tg2.1 actually has quite documented skels, jokes aside
[21:36:57] <Honoome> "finally"? :P
[21:37:54] <lu_zero> in the paster autogenerated stuff I mean
[21:38:01] <lu_zero> still
[21:38:47] <pengvado> you're right that gcc usually chooses not to inline vla functions unless you force it
[21:39:33] <mru> why did anyone ever thing they were a good idea?
[21:39:37] <mru> *think
[21:43:56] <Honoome> lu_zero: okay I sent the sctp patch to the two mailing lists, let's see if somebody responds to that
[21:44:07] <Honoome> if they do accept it, though, I'll have to work to the user code in feng :P
[21:46:19] <CIA-99> ffmpeg: stefang * r23745 /trunk/libavcodec/vp8.c: avoid conditional and division in chroma MV calculation
[23:45:38] <CIA-99> ffmpeg: mru * r23746 /trunk/libavcodec/snow.c: snow: remove unused parameter to mc_block()
[01:03:22] <Kovensky> <@Dark_Shikari> the developer who does git format-patch | sendmail. <-- git send-email
[01:07:54] <peloverde> At least by default git-send-email gives a summary and a chance to cancel
[04:09:48] <Dark_Shikari> I wonder what michael will think of my patch
[04:10:06] <Dark_Shikari> I think I'm going to scare him
[04:10:15] <Dark_Shikari> or maybe anger him, I tore apart his beloved decoder
[04:18:08] <elenril> noone on the ffmpeg team would go to such flaming if it was a real issue << lol
[04:18:53] <Dark_Shikari> <3 michael
[06:45:54] <CIA-92> ffmpeg: mstorsjo * r23692 /trunk/ffserver.c: ffserver: Use avcodec_copy_context instead of manually copying an AVCodecContext
[06:53:58] <CIA-92> ffmpeg: mstorsjo * r23693 /trunk/libavcodec/libvorbis.c:
[06:53:58] <CIA-92> ffmpeg: libvorbis: Only drop 1-byte packets at end of stream
[06:53:58] <CIA-92> ffmpeg: This fixes handling of totally silent packets during the encoding, that
[06:53:58] <CIA-92> ffmpeg: also are 1 byte in size.
[06:53:58] <CIA-92> ffmpeg: This fixes issue 2013
[07:08:46] <av500> peloverde_: ping
[07:08:51] <peloverde_> pong
[07:09:01] <av500> wrt facebook
[07:09:10] <av500> the image there shows a user "rikiji"
[07:09:21] <av500> and there are 2 hits for rikiji and ffmpeg:
[07:09:48] <av500> http://www.hwupgrade.it/forum/archive/index.php/t-1437356-p-16.html
[07:10:04] <av500> http://forum.ubuntu-it.org/index.php?PHPSESSID=31fb8a415eacbe05bb255a301fa8…
[07:10:59] <peloverde_> thanks, interesting
[07:13:25] <kshishkov> av500: have you heard the story about "Linux" trademark?
[07:13:54] <av500> my wife used Linux washing powder...
[07:14:05] <kshishkov> not that one
[07:14:09] <pJok> god morgon, kshishkov :)
[07:14:18] <kshishkov> goda morgnar, pJok
[07:14:27] <av500> kshishkov: but yes
[08:28:28] <merbzt> can someone commit the dts patches ?
[08:29:03] <wbs> merbzt: the ones from yesterday evening? sure
[08:32:52] <CIA-92> ffmpeg: mstorsjo * r23694 /trunk/libavcodec/dca.c:
[08:32:52] <CIA-92> ffmpeg: Support DTS-ES extension (XCh) in dca: move subband_samples into context structure
[08:32:52] <CIA-92> ffmpeg: Patch by Nick Brereton, nick at nbrereton dot net
[08:33:48] <CIA-92> ffmpeg: mstorsjo * r23695 /trunk/libavcodec/dca.c:
[08:33:48] <CIA-92> ffmpeg: Support DTS-ES extension (XCh) in dca: move original code around to allow reuse by DTS-ES code
[08:33:48] <CIA-92> ffmpeg: Patch by Nick Brereton, nick at nbrereton dot net
[08:34:53] <CIA-92> ffmpeg: mstorsjo * r23696 /trunk/libavcodec/dca.c:
[08:34:53] <CIA-92> ffmpeg: Support DTS-ES extension (XCh) in dca: update and add channel mapping tables for DTS-ES mappings
[08:34:53] <CIA-92> ffmpeg: Patch by Nick Brereton, nick at nbrereton dot net
[08:35:41] <CIA-92> ffmpeg: mstorsjo * r23697 /trunk/libavcodec/dca.c:
[08:35:41] <CIA-92> ffmpeg: Support DTS-ES extension (XCh) in dca: add code to handle DTS-ES extension
[08:35:41] <CIA-92> ffmpeg: Patch by Nick Brereton, nick at nbrereton dot net
[08:36:33] <CIA-92> ffmpeg: mstorsjo * r23698 /trunk/libavcodec/dca.c:
[08:36:33] <CIA-92> ffmpeg: Support DTS-ES extension (XCh) in dca: Cosmetic cleanup
[08:36:33] <CIA-92> ffmpeg: Patch by Nick Brereton, nick at nbrereton dot net
[08:37:00] <wbs> merbzt: done
[08:49:19] <CIA-92> ffmpeg: cehoyos * r23699 /trunk/libavformat/utils.c:
[08:49:19] <CIA-92> ffmpeg: Fix failure in av_read_frame on timestamp rollover.
[08:49:19] <CIA-92> ffmpeg: Patch by Stephen Dredge, sdredge A tpg com au
[08:58:55] <kshishkov> merbzt: is it wholly in? If yes you should ask Martin to mention XCh support in Changelog
[09:03:39] <wbs> sure, just tell me what to write (I haven't got a clue about that format) - I guess lavc minor should be bumped, too?
[09:07:14] <kshishkov> micro, not minor, I think. And it should be along the lines of "DTS-ES extension (XCh) decoding support"
[09:07:33] * kshishkov is ashamed not to add entries to Changelog for a long time
[09:19:14] <merbzt> me too :/
[09:19:20] <merbzt> wbs: thanks
[09:20:12] * KotH is not ashamed having not added anything to changelog ever
[09:26:16] <kshishkov> KotH: that's because you're no FFmpeg dev
[09:29:28] * pJok adds kshishkov to Changelog
[09:32:40] <KotH> kshishkov: and because i'm a shameless plug :)
[09:34:01] <kshishkov> KotH, that makes me want to say some remark about Turks
[09:34:54] <av500> mechanical ones?
[09:36:43] <merbzt> ahhh
[09:36:53] <merbzt> good day in #ffmpeg-devel
[09:37:53] <kierank> what about young ones?
[09:38:12] <av500> young mechanical turks(tm)
[09:39:08] <KotH> kshishkov: you can say whatever you want
[09:39:25] <KotH> kshishkov: but remember! someday we'll be where you live!
[09:39:26] <KotH> ;)
[09:39:48] <av500> KotH: you have to take Vienna 1st...
[09:39:52] <kshishkov> KotH: actually I prefer to stop myself before saying some things
[09:40:14] <kshishkov> av500: s/Wien/Niedermayerstadt/
[09:40:36] <kierank> av500: I read some article that was trolling Darmstadt
[09:40:46] <av500> kierank: aha
[09:40:51] <av500> what did it troll about?
[09:41:11] <kierank> the fact that they wanted to build a high speed station there even though the town is not worthy of one
[09:41:18] <av500> ah that one
[09:41:29] <av500> yes, it's stupid
[09:41:32] <kshishkov> what is high speed station?
[09:41:47] <av500> kshishkov: well, more a low speed station for a high speed train
[09:42:18] <kshishkov> av500: I've seen only zero-speed stations so far
[09:42:27] <av500> new high speed line from FRA to mannheim
[09:42:31] <av500> and DA wants a stop
[09:42:48] <KotH> av500: nah.. we skipped vienna and are now in berlin ;)
[09:42:59] <av500> KotH: right
[09:43:08] <av500> you can keep the swamp
[09:43:35] * KotH keeps the swiss chocolate
[09:43:47] <av500> kierank: so the article is mostly right...
[09:44:03] <kshishkov> av500: I have the impression that West Berlin is a myth and it's all EAST
[09:44:54] <av500> kshishkov: well, both parts of the city are broke now
[09:45:04] <av500> so much for re-unification :)
[12:37:56] <Tjoppen> is there any way to determine whether a video codec supports B-frames? except hard coding the cases of course
[12:39:38] <Tjoppen> a quick look seems to indicate that only the mpeg1/2/4 encoders support them. I might get away with a simple if(codec_id == ...) thing, although a bit ugly
[12:42:05] <CIA-99> ffmpeg: pross * r23701 /trunk/libavcodec/iff.c: IFF PBM decoder: Add a pad byte if image width is odd <aleksi dot nurmi at gmail dot com>
[12:44:44] <mru> Tjoppen: what are you trying to do?
[12:47:19] <Tjoppen> keeping encoder setup from failing
[12:47:45] <Tjoppen> I'm copying settings from the decoder and applying any user overrides to that
[12:48:52] <Tjoppen> I guess another solution could be to simply refuse to use B-frames unless explicitly told to do so
[12:49:03] <mru> what's the problem with b-frames?
[12:51:50] <Tjoppen> only the mpeg1/2/4 encoders support them
[12:52:09] <mru> yes, so only they will use them
[12:52:20] <Tjoppen> if I go and encode h263 avcodec_open() will fail
[12:52:28] <mru> really?
[12:52:30] <Tjoppen> yep
[12:52:42] <Tjoppen> see MPV_encode_init()
[12:53:08] <Tjoppen> line 398: av_log(avctx, AV_LOG_ERROR, "b frames not supported by codec\n");
[12:54:49] <mru> ok, I see
[12:55:01] <mru> this needs to be fixed
[12:55:09] <Tjoppen> in general it'd be nice to be able to detect these things. qpel is another case
[12:55:21] <mru> there are two solutions
[12:55:23] <Tjoppen> really, all checks near that line
[12:55:34] <mru> 1. make init adjust the settings to whatever is supported
[12:55:43] <mru> 2. add codec_caps for stuff
[12:56:05] <Tjoppen> I believe michael is opposed to #1
[12:56:15] <mru> that's stupid
[12:56:24] <Tjoppen> some codecs already do similar things though
[12:56:35] <Tjoppen> overriding settings that are "wrong"
[12:56:50] <mru> what if some codec allows at most, say, 4 b-frames
[12:56:53] <mru> and you request 8
[12:57:02] <mru> even codec_cap won't handle that
[12:57:31] <mru> better to have init override impossible settings, at least in some cases
[12:57:38] <Tjoppen> or maybe have an out-AVCodecContext for "adjusted" settings?
[12:57:50] <mru> of course some things should fail hard
[12:57:54] <mru> like bad sample rates for audio
[12:58:14] <mru> those params describe the input
[12:58:25] <mru> if it's out of bounds, the encoder simply can't handle it
[12:58:28] <Tjoppen> right. as opposed to desired settings
[12:58:32] <mru> other things are merely requests
[13:01:59] <Tjoppen> if modifying the struct is out, what about returning more helpful information such as which field is the offending one, along with a suggested value?
[13:02:31] <Tjoppen> that way even bad sample rates and resolutions can be handled
[13:06:31] <Tjoppen> for instance, the h263 encoder could say if(avctx->width > 352){avctx->bad_setting = AV_SETTING_WIDTH; avctx->bad_setting_suggest = 352; return AVERROR_EINVAL;}
[13:10:24] <mru> codecs already advertise some constraints in the struct AVCodec
[13:13:24] <Tjoppen> true, but not all can be put in a struct that way
[13:13:37] <mru> some are min/max
[13:13:45] <mru> some are yes/no
[13:13:53] <mru> (which is a special case of min/max)
[13:13:58] <Tjoppen> a recent example would be the aac encoder (or aac in general) where the maximum bitrate allowed depends on samplerate and the number of channels
[13:14:14] <mru> yeah, that's more complicated
[13:14:39] <mru> what do audio encoders do with invalid bitrates?
[13:14:41] <Tjoppen> or dvvideo, where only about 5-10 settings are valid
[13:15:02] <mru> dnxhd too
[13:15:10] <Tjoppen> aacenc fails, others ignore it and code at their maximum
[13:15:15] <_av500_> ffmpeg could also have a flag to allow it to choose sane values if asked by user
[13:15:23] <_av500_> or to fail
[13:15:29] <mru> that's a good idea
[13:15:39] <Tjoppen> kinda like how the error recovery works atm?
[13:15:47] <mru> no
[13:16:07] <mru> we're talking about encoder settings, right?
[13:16:32] <mru> with the fixup flag set, invalid params would be automatically changed to the nearest supported value
[13:17:25] <_av500_> vp8 would be replaced by h264 BP...
[13:17:58] <Tjoppen> indeed. have a "strict" and a "best effort" mode, and perhaps a couple in between
[13:19:31] <mru> or do it the gnome way: hardcode or auto-choose everything
[13:39:48] <merbzt> HAHAHA, flamefest on ffmpeg-dev
[13:40:06] <merbzt> if I only had that much time to write mails
[13:40:40] <_av500_> mru: just make sure it cannot print
[13:41:09] <Tjoppen> fika.
[13:41:31] <_av500_> no fika on stupid train
[13:41:39] <mru> that is a stupid train
[13:41:51] <mru> swedish trains actually have half-decent coffee
[13:41:53] <_av500_> well it has fika, but i doubt i will like it
[13:42:00] <mru> only half though
[13:42:09] <_av500_> the water is good?
[13:43:09] <KotH> merbzt: which "discussion"?
[13:43:35] <_av500_> coeff 63?
[13:43:40] <wbs> BBB: are you ok with the url_alloc/url_connect patches?
[13:44:00] <BBB> if michael is ok I'm probably ok also... as long as they fix the issue, I'm OK
[13:44:08] <BBB> make sure we can set priv_data options after alloc
[13:44:13] <BBB> that's my only requirement :)
[13:44:20] <BBB> but you would probably know that already
[13:44:34] <wbs> yes, and that's taken care of :-)
[13:47:37] <merbzt> er cvs-log
[13:59:42] <CIA-99> ffmpeg: mstorsjo * r23702 /trunk/libavformat/ (avio.c avio.h allformats.c avformat.h):
[13:59:42] <CIA-99> ffmpeg: Add an av_register_protocol2 function that takes a size parameter
[13:59:42] <CIA-99> ffmpeg: This allows extending the URLProtocol struct without breaking binary
[13:59:42] <CIA-99> ffmpeg: compatibility with code compiled with older definitions of the struct.
[14:01:01] <CIA-99> ffmpeg: mstorsjo * r23703 /trunk/doc/APIchanges: Add an APIchanges entry for av_register_protocol2
[14:01:52] <mru> btw why does vp3 idct use the same function pointer as normal idct?
[14:02:00] <mru> it's not the same function
[14:04:24] <CIA-99> ffmpeg: mstorsjo * r23704 /trunk/libavformat/ (avio.c avio.h avformat.h): Split url_open and url_open_protocol into url_alloc and url_connect
[14:04:54] <Tjoppen> looks like the http code has gone through a bit of a round trip. the chunked encoding stuff is my old compact code :)
[14:06:00] <CIA-99> ffmpeg: mstorsjo * r23705 /trunk/doc/APIchanges: Add an APIchanges entry for url_alloc() and url_connect()
[14:06:58] <merbzt> mru: it's the same but different
[14:07:23] <mru> they're not even remotely interchangeable
[14:07:52] <mru> there's no reason you'd ever want to use mpeg2 idct with vp3 or vice versa
[14:08:04] <mru> except perhaps for comic effect
[14:08:09] <mru> but for that we have bink
[14:10:06] <CIA-99> ffmpeg: mstorsjo * r23706 /trunk/libavformat/ (avio.c avio.h avformat.h):
[14:10:06] <CIA-99> ffmpeg: Add priv_data_size and priv_data_class to URLProtocol
[14:10:06] <CIA-99> ffmpeg: This allows url_alloc to allocate and initialize the priv_data.
[14:11:49] <CIA-99> ffmpeg: mstorsjo * r23707 /trunk/doc/APIchanges: Add an APIchanges entry for priv_data_size and priv_data_class
[14:13:28] <CIA-99> ffmpeg: mstorsjo * r23708 /trunk/libavformat/http.c: Allocate the HTTPContext through URLProtocol.priv_data_size
[14:14:45] <CIA-99> ffmpeg: mstorsjo * r23709 /trunk/libavformat/http.c: Add an AVClass to the HTTPContext
[14:15:53] <CIA-99> ffmpeg: mstorsjo * r23710 /trunk/libavformat/ (rtsp.c http.c):
[14:15:53] <CIA-99> ffmpeg: Make the http protocol open the connection immediately in http_open again
[14:15:53] <CIA-99> ffmpeg: Also make the RTSP protocol use url_alloc and url_connect instead of relying
[14:15:53] <CIA-99> ffmpeg: on the delay open behaviour.
[14:17:22] <wbs> BBB: there, now we should have gotten rid of the regressions and have a clean api
[14:19:17] <BBB> \o/
[14:19:50] <BBB> there's now a crash on ffmpeg-user with http streaming
[14:19:53] <BBB> can you look into that? :-p
[14:20:57] <wbs> ugh ;P
[14:21:02] <wbs> I'll give it a look later
[14:22:42] <BBB> I think I almost have a vp8 decoder patch ready to resubmit to ML
[14:23:42] <mru> BBB: add a pack4x8 macro to mathops.h or something
[14:23:52] <mru> it can be easily optimised
[14:24:39] <mru> and it's useful elsewhere
[14:24:42] <BBB> I agree
[14:26:11] <BBB> give me a second while I finish the actual vp8 patch, then I'll resubmit a patch for that
[14:32:21] <BBB> what was the verdict on the vp56mv int->int16_t conversion?
[14:32:40] <mru> probably good
[14:33:09] <mru> someone raised an obligatory bikeshed over the possibility of extra sign extension being done here and there
[14:34:57] <BBB> I'm trying not to be involved in bikesheds too much these days
[14:35:06] <BBB> I've noticed how much you can get done if you don't contribute to bikesheds
[14:39:02] <mru> whatever I want to do, there's always a bikeshed blocking my path
[14:39:12] <mru> and if there isn't one, reimar builds one
[14:40:37] * KotH paints it red
[14:41:11] * mru gets C4
[14:45:40] * KotH buys an N^2 mine on the otaku black market
[14:51:45] <BBB> how do I get the .text size?
[14:51:48] <BBB> (for mru)
[14:51:58] <BBB> just ls --size vp56.o is enough?
[14:53:41] <_av500_> mru: on2 did not get vp8 to 120mio quality by bikeshedding :)
[14:55:42] <ohsix> BBB: theres 'size', but i think it measures more than .text; you can dump and parse section headers with objdump too
[14:56:24] <ohsix> size lists text data and bss here, on 2 lines
[14:57:21] <ohsix> (really old distro, nothing newer within reach to see what it does now)
[14:58:21] <mru> BBB: readelf -S
[14:58:38] <mru> or 'size'
[15:01:05] <BBB> size did it, thanks
[15:03:08] <CIA-99> ffmpeg: benoit * r23711 /trunk/libavutil/common.h: Add missing parentheses in MKTAG and MKBETAG macros.
[15:05:22] <janneg> BBB: wrong parens added
[15:05:49] <janneg> it needs parens around the macro arguments
[15:06:04] <mru> BBB: oh, and separate that from the rest of the patch
[15:06:15] <mru> add the macro first as a separate commit
[15:06:35] <BBB> okiedokie
[15:06:41] <BBB> janneg: oh right
[15:07:09] <mru> also << has higher precedence than | so those parens aren't needed
[15:07:16] <mru> but gcc might throw a fit
[15:07:30] <mru> imo that warning should be turned off
[15:07:51] <ohsix> check out "ten most active lists today" http://gmane.org/
[15:08:09] <mru> world domination...
[15:09:42] <mru> that's an odd collection btw
[15:11:26] <BBB> mru: I do it just for the warning
[15:11:47] <ohsix> was that dithering thing the only reason there wasn't already a dc only idct?
[15:11:56] <mru> no
[15:11:59] <mru> that was laziness
[15:13:58] <BBB> the parens around are needed if you use the result directly in weird mathops btw, or that's what I was always taught
[15:14:07] <BBB> anyway, let me compile a version that uses #if BIGENDIAN
[15:15:06] <mru> parens around all of it are needed to protect it if used as an operand in an expression
[15:15:13] <mru> imagine this
[15:15:21] <mru> #define FOO(a, b) a + b
[15:15:34] <mru> FOO(x, y) * 3
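The expansion above can be checked directly. This sketch compares the unprotected macro with a fully parenthesised one, and adds a second pair showing why the macro arguments need parentheses too, as pointed out a few lines earlier. All macro and function names other than FOO's shape are made up for illustration.

```c
#include <assert.h>

#define FOO_BARE(a, b) a + b          /* the example above, unprotected */
#define FOO_SAFE(a, b) ((a) + (b))    /* parens around arguments and result */

#define TWICE_BARE(a) (a * 2)         /* result protected, argument not */
#define TWICE_SAFE(a) ((a) * 2)

/* FOO_BARE(x, y) * 3 expands to x + y * 3 */
int foo_bare_times3(int x, int y) { return FOO_BARE(x, y) * 3; }
int foo_safe_times3(int x, int y) { return FOO_SAFE(x, y) * 3; }

/* TWICE_BARE(x + 1) expands to (x + 1 * 2) */
int twice_bare_next(int x) { return TWICE_BARE(x + 1); }
int twice_safe_next(int x) { return TWICE_SAFE(x + 1); }

/* Self-check documenting the values each variant produces */
void check_macros(void)
{
    assert(foo_bare_times3(1, 2) == 7);   /* 1 + 2 * 3 */
    assert(foo_safe_times3(1, 2) == 9);   /* (1 + 2) * 3 */
    assert(twice_bare_next(3) == 5);      /* 3 + 1 * 2 */
    assert(twice_safe_next(3) == 8);      /* (3 + 1) * 2 */
}
```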
[15:15:55] <ohsix> premature optimization and all that; michael needs to get a faster computer so the code can get to a point where it can actually be massaged for what he's looking for instead of just taking it out at the knee
[15:16:39] <mru> maybe I should send him my old core2
[15:19:10] <BBB> bleh, I don't want to disassemble today
[15:19:11] <BBB> maybe later
[15:19:17] <BBB> the patch isn't required for vp8 anyway
[15:21:07] <mru> btw, some silly people don't know why parens around macro expansions are sometimes needed
[15:21:14] <mru> so they cargo-cult them everywhere
[15:21:18] <mru> #define FOO (0)
[15:21:24] <mru> I want to kill such people
[15:21:31] <mru> but my boss wouldn't let me
[15:21:52] <BBB> I know why they're needed, but don't want to waste my time writing it and your time reading it
[15:24:42] <ohsix> should have cited the irc log for the dc stuff; then he'd know what you were doing hurf
[15:25:09] <mru> he subscribes to the logs by mail
[15:25:24] <ohsix> ya but he didn't put 2&2 together
[15:25:33] <mru> he didn't want to
[15:25:49] <ohsix> maybe, can't really assume that though
[15:26:13] <mru> besides, why would I care about the value if I wasn't planning on using it?
[15:26:29] <ohsix> i wouldn't read irc logs if they were posted daily; i might like to search them, but i wouldn't have a linear timeline for topics in both places
[15:31:29] <mru> hmm... I made a list of top bikesheds of all time on ffmpeg-devel
[15:31:37] <mru> sorted by number of posts
[15:31:46] <mru> any guesses as to what's on top?
[15:33:47] <ohsix> no clue
[15:34:48] <mru> 238 [PATCH] QCELP decoder
[15:35:20] <mru> 231 Realmedia patch
[15:35:40] <mru> 178 Google Summer of Code participation
[15:35:50] <mru> 170 [PATCH] ALS decoder
[15:35:55] <mru> 155 [PATCH] Implement PAFF in H.264
[15:36:49] <BBB> is realmedia patch my rdt patch?
[15:36:56] <BBB> that one was bikeshed-supreme
[15:37:16] <mru> I've no idea what it is
[15:37:35] <mru> I just sorted and counted the subject lines in the archive
[15:37:52] <mru> hey wait
[15:37:54] <mru> 111 [PATCH] RDT/Realmedia patches #2
[15:38:03] <mru> 111 [PATCH] WMA Voice decoder
[15:38:07] <mru> you're popular
[15:40:13] <BBB> \o/
[15:40:26] <BBB> I think realmedia patch and rdt/realmedia patches are the same thread
[15:40:29] <BBB> just different subject
[15:41:36] <mru> I didn't bother writing an actual thread parser just for this
[15:41:54] <BBB> wasn't the aac decoder bikeshed supreme as well?
[15:42:02] <BBB> I thought there were like 10 threads for that
[15:42:06] <BBB> named #1, #2, #3 etc.
[15:42:33] <mru> it went up to round 10 or so
[15:43:26] <mru> but they're only 20-30 each
[15:43:33] <peloverde_> How do you know these weren't very long but very productive discussions? :p
[15:43:41] <mru> lol
[15:44:43] <peloverde_> and seriously and more importantly how can we reduce bikeshedding on future patches?
[15:45:11] <mru> when michael is involved, impossible
[15:45:13] <ohsix> keep your own trees that people can pull your work from
[15:45:54] <ohsix> getting too much into discussions with people who are just going to mill around with one concern is probably not useful either
[15:46:17] <mru> ohsix: you mean people like yourself?
[15:46:34] <ohsix> like how
[15:46:49] <mru> maybe you should have your short-term memory checked
[15:47:04] <ohsix> if i had demonstrated as much on here i'd accept your point, but i'm more talking about 63 and that thread
[15:47:22] <mru> let's talk about alsa instead
[15:47:44] <mru> or ffmpeg vs xiph
[15:47:44] <ohsix> that would be you bikeshedding :P
[15:48:05] <ohsix> and thats more of an opinion thing; and we can both have some of those
[15:48:08] <mru> last time you compared using ffmpeg to stabbing people with knives
[15:48:18] <ohsix> nope
[15:48:23] <mru> oh yes you did
[15:48:34] <_av500_> ffmpeg is to the lone woman what the vcr is to the boston strangler
[15:48:35] <ohsix> knives were mentioned but that is a willful mischaracterization of what i said
[15:49:11] <mru> what the fuck did you intend to mean then?
[16:08:22] <BBB> is there a mmx instruction to write the highest 4 bytes (rather than movd:lower 4 bytes) into a memory location (or the other way around)?
[16:48:36] <BBB> kurosu: ping, so what was the issue with the mc/mmx code exactly?
[17:04:43] <BBB> wbs: you've tested all these patches right? :-p
[17:04:50] <BBB> (I'm reading through cvslog for a bit)
[17:11:45] <wbs> BBB: yes, I've tested it
[18:15:55] <kurosu> BBB: I haven't seen the latest patch, but you were unpacking to dwords some of the computations
[18:16:22] <kurosu> because the results of some of them were not fitting in the expected word [-32768;32767]
[18:16:34] <Dark_Shikari> BBB: use psrlq or whatever to shift
[18:16:35] <Dark_Shikari> then write
[18:17:25] <kurosu> this doesn't really matter as long as the whole range of the results is 16bits, which seems the case here [-32*255;190*255] IIRC
[18:18:03] <kurosu> like in the vc1 mc code, you just need to use unsaturating maths and add bias/do saturating math/remove bias
[18:18:16] <kurosu> without unpacking results to dwords, keeping all in words
[18:20:56] <janneg> mru: experimental doesn't work like that for decoders. I had not enough energy to argue with baptiste
[18:21:21] * mru locks baptiste in a bikeshed and melts the key
[18:22:11] <Dark_Shikari> Why not just make libvpx the default?
[18:22:15] <Dark_Shikari> and change the default later?
[18:22:25] <Dark_Shikari> what's so hard about this
[18:22:26] <mru> that's exactly what the experimental flag would do
[18:22:39] <mru> if baptiste hadn't shedded it into oblivion
[18:24:55] <Dark_Shikari> er....
[18:24:59] <Dark_Shikari> you can do that without experimental
[18:25:16] <Dark_Shikari> for example, take libfaad
[18:25:18] <Dark_Shikari> ffmpeg aac is default
[18:25:20] <Dark_Shikari> libfaad is a secondary option
[18:25:31] <mru> libfaad doesn't even exist
[18:25:34] <Dark_Shikari> there's no "experimental flag" for libfaad
[18:25:38] <Dark_Shikari> You know what I'm talking about.
[18:25:41] <Dark_Shikari> When it did, this was how it worked.
[18:25:47] <mru> actually, I don't
[18:25:50] <Dark_Shikari> ....?
[18:25:53] <janneg> Dark_Shikari: that's just order of the register_decoder calls
[18:25:54] <mru> there is no concept of default codec
[18:25:58] <Dark_Shikari> Yes there is.
[18:26:02] <Dark_Shikari> The one that's called first is used first.
[18:26:04] <Dark_Shikari> What janneg said.
[18:27:06] <janneg> but allcodecs.c is sorted by type and name and not by codec id and priority
[18:27:07] <mru> and native aac was registered way before libfaad
[18:27:15] <mru> so libfaad was never "default"
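The "first registered, first found" behaviour described here can be illustrated with a toy registry. This is a minimal sketch under stated assumptions: `Decoder`, `register_decoder` and `find_decoder` are hypothetical names for illustration and are not the libavcodec API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature registry; not the libavcodec API. */
typedef struct Decoder {
    const char *name;      /* e.g. "aac" or "libfaad" */
    const char *codec_id;  /* what the decoder handles */
    struct Decoder *next;
} Decoder;

static Decoder *first_decoder = NULL;

/* Append to the end of the list, so earlier registrations stay in front. */
static void register_decoder(Decoder *d)
{
    Decoder **p = &first_decoder;
    assert(d != NULL);
    while (*p)
        p = &(*p)->next;
    d->next = NULL;
    *p = d;
}

/* Return the first registered decoder for codec_id, or NULL if none. */
static Decoder *find_decoder(const char *codec_id)
{
    for (Decoder *d = first_decoder; d; d = d->next)
        if (strcmp(d->codec_id, codec_id) == 0)
            return d;
    return NULL;
}
```

In a scheme like this there is no explicit "default" flag: whichever of two decoders for the same codec is registered first wins every lookup, which is the sense in which registration order acts as a priority.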
[18:27:37] <Dark_Shikari> it was before native aac existed
[18:27:40] <Dark_Shikari> and then
[18:27:42] <Dark_Shikari> when ours was added
[18:27:44] <Dark_Shikari> ours was made "default"
[18:27:49] <Dark_Shikari> and libfaad was "not default"
[18:27:56] <Dark_Shikari> So why, for vpx, can't it be the reverse?
[18:27:58] <Dark_Shikari> libvpx is "default"
[18:28:01] <Dark_Shikari> ours is "not default"
[18:28:03] <Dark_Shikari> and then later we can change it?
[18:28:06] <Dark_Shikari> you are bikeshedding. stop it.
[18:28:24] <ohsix> BLUE!
[18:28:52] <ohsix> seems everyone gets into the spirit of things but not everyone is calling names :D
[18:30:28] <mru> I see nothing but hand-waving from baptiste
[18:31:09] <janneg> I'll send a patch
[18:44:03] <BBB> kurosu: huh?
[18:44:16] <BBB> kurosu: no, I used dwords because pmaddbla did that for me
[18:44:21] <BBB> kurosu: the result is 9bit or so
[18:44:29] <BBB> maybe less
[18:44:30] <BBB> but anyway
[18:44:37] <BBB> oh wait it was actually 16bit
[18:44:42] <BBB> but still
[18:45:06] <BBB> kurosu: so it was because the instruction made me, not because I wanted to
[18:47:18] <BBB> mru: don't bother with the experimental
[18:47:32] <BBB> mru: if it's an issue, I'll simply add the bilin filter so the vector test suite passes
[18:47:39] <BBB> mru: I was hoping it wouldn't be an issue
[18:47:51] <Dark_Shikari> BBB: do that, then we can kill libvpx
[18:47:56] <BBB> no
[18:47:58] <BBB> libvpx encodes
[18:48:05] <mru> kill libvpx decode
[18:48:09] <BBB> k...
[18:48:27] * BBB goes to look into the cursed and undocumented bilinear filter
[18:48:44] <Dark_Shikari> well yes
[18:49:02] <peloverde_> I would say don't bother with bilin until it appears in the spec
[18:49:18] <mru> +1
[18:49:28] <mru> teach them what spec means
[18:49:40] <BBB> that was my logic
[18:49:48] <BBB> but then I can't apply it without it being experimental
[18:49:52] <BBB> which is a little silly maybe
[18:49:59] <mru> huh?
[18:50:21] <mru> if it decodes everything in the spec properly, it's good to go imo
[18:50:27] <BBB> it does
[18:50:36] <mru> we could even claim those other files are invalid
[18:50:42] <kierank> lol
[18:50:42] <peloverde_> I agree with mru
[18:50:45] <mru> and call it an encoder bug
[18:50:53] <BBB> well, the spec does say that whenever libvpx and spec disagree, libvpx is correct
[18:51:08] <BBB> it's just that this is a very extended way of saying that the spec is wrong
[18:51:14] <peloverde_> Then it's not a spec it's a textual description of libvpx
[18:51:15] <mru> it's like an mpeg2 encoder using qpel mc
[18:51:27] <BBB> it's like writing a spec that says : "empty; whatever is missing but present in libXYZ, libXYZ is right"
[18:51:50] <kurosu> BBB: whatever. I'll benchmark what I propose once your code has been accepted. But my point was: pmaddwd is not mandatory
[18:52:05] <BBB> kurosu: ok
[18:52:12] <peloverde_> It's insulting that google wants to call this a standard but their implementation has priority status
[18:52:35] <BBB> kurosu: I'm learning mmx by doing this, so I might well be wrong in many places
[18:52:43] <BBB> all I can test is that it's faster and bit-exact
[18:52:47] <BBB> it might not be fastest
[18:52:54] <kierank> peloverde_, it's known as the mozilla development model. If you don't get your paycheck from them they don't care.
[18:53:40] <Dark_Shikari> peloverde_: that's why we have to change tha
[18:53:41] <Dark_Shikari> *that
[18:53:44] <Dark_Shikari> we have to get libvpx out of ffmpeg, asap
[18:53:52] <BBB> make x264 output vp8
[18:53:57] <Dark_Shikari> everyone uses ffmpeg. once libvpx is gone, google will no longer have control
[18:54:05] <Dark_Shikari> they won't be able to arbitrarily break things
[18:54:08] <Dark_Shikari> because they won't control the decoder
[18:54:09] <kurosu> BBB: I don't recall why we did this for vc1: either the code was register-starved, or it was faster
[18:54:22] <peloverde_> Don't gst and firefox wrap libvpx?
[18:54:23] <Dark_Shikari> kurosu: pmaddwd saves 2 regs
[18:54:26] <BBB> kurosu: ok... feel free to test it, it'd be better if you teach me while doing so ;)
[18:54:27] <Dark_Shikari> peloverde_: chrome wraps ffmpeg
[18:54:31] <peloverde_> as do the dshow filters
[18:54:43] <peloverde_> Dark_Shikari, chrome does but google controls chrome and can patch libvpx back in
[18:54:47] <BBB> peloverde_: firefox is quickly becoming irrelevant
[18:54:50] <Dark_Shikari> They can, but we'll make it hard for them.
[18:55:24] <Dark_Shikari> Especially when our implementation is twice as fast.
[18:55:33] <peloverde_> BBB, firefox is losing marketshare but it still is the #2 browser
[18:55:37] <Dark_Shikari> And furthermore
[18:55:41] <Dark_Shikari> When ours is twice as fast
[18:55:43] <Dark_Shikari> what will firefox do?
[18:55:46] <mru> vlc would probably be happy to use ffmpeg by default
[18:55:48] <Dark_Shikari> They will have to work very hard to justify not using it.
[18:55:56] <BBB> ok
[18:55:57] <BBB> so now
[18:55:59] <BBB> back to basics
[18:56:01] <Dark_Shikari> thus, our goal needs to be to make it so much better
[18:56:02] <mru> firefox will never use ffmpeg
[18:56:04] <BBB> how do I make this beast experimental?
[18:56:07] <BBB> so I can commit it
[18:56:10] <BBB> my hands are itching
[18:56:11] <Dark_Shikari> mru: but we can make them suffer for not doing it
[18:56:11] <mru> it's against their charter
[18:56:22] <BBB> mru: firefox != chris blizzard
[18:56:29] <BBB> chris blizzard is a moron and the firefox people know it
[18:56:32] <mru> mozilla charter:
[18:56:42] <Dark_Shikari> stop flaming
[18:56:43] <BBB> they have asked sflc about ffmpeg, legal implications etc.
[18:56:43] <mru> 1. all mozilla products shall suck
[18:56:43] <Dark_Shikari> it doesn't _matter_
[18:56:51] <BBB> and sflc has said what they should say
[18:56:54] <Dark_Shikari> 1) we make it super fast
[18:56:57] <Dark_Shikari> 2) if they don't adopt ffmpeg, they suffer
[18:57:01] <Dark_Shikari> 3) if they do adopt ffmpeg, we win
[18:57:03] <Dark_Shikari> win-win situation
[18:57:06] <BBB> and Dark_Shikari is right, by being better we will always win
[18:57:08] <Dark_Shikari> now stop it and get back to coding
[18:57:13] <mru> it's good to have an optimist around
[18:57:23] <Dark_Shikari> there's no optimism
[18:57:25] <BBB> is it CODEC_CAP_EXPERIMENTAL?
[18:57:27] <Dark_Shikari> I don't expect them to do anything but suffer
[18:57:57] <peloverde_> BBB, the consensus (here) is no CODEC_CAP_EXPERIMENTAL
[18:58:05] <mru> a true freetard has to suffer
[18:58:14] <mru> it's part of their philosophy
[18:58:17] <BBB> I thought they just said let's mark it exp. until it passes the test vector suite
[18:58:27] <mru> seeing themselves as victims, suffering for a noble cause
[18:58:34] <Dark_Shikari> mru: it doesn't matter
[18:58:36] <Dark_Shikari> we don't care what they see
[18:58:39] <Dark_Shikari> we don't care what they do
[18:58:41] <BBB> hehe. mru is a little right
[18:58:41] <peloverde_> I tohught we said that non-spec filrs don't matter
[18:58:42] <Dark_Shikari> we care that we are better
[18:58:56] <Dark_Shikari> peloverde_: we could do that
[18:58:57] <BBB> mru, Dark_Shikari: opinions on experimental?
[18:58:58] <BBB> yes / no
[18:58:59] <peloverde_> *I thought we said that non-spec files don't matter
[18:59:03] <Dark_Shikari> and if google complains, we could say that it isn't in the spec
[18:59:09] <BBB> I already told them
[18:59:13] <BBB> they say they're working on it
[18:59:17] <BBB> that was >1 week ago
[18:59:18] <Dark_Shikari> Then we're working on it.
[18:59:20] <Dark_Shikari> Commit now.
[18:59:23] <Dark_Shikari> Now.
[18:59:38] <Dark_Shikari> Stop the bikeshed. Stop the debate. Just fucking commit it.
[18:59:41] <Dark_Shikari> We can work with it from there.
[18:59:52] <peloverde_> statim
[19:00:37] <mru> commit w/o experimental
[19:05:13] <janneg> BBB: commit, CODEC_CAP_EXPERIMENTAL does nothing now. it can be added later if someone insists
[19:06:00] <BBB> janneg: agreed
[19:07:30] <CIA-99> ffmpeg: alexc * r23712 /trunk/libavcodec/ps.c: Cosmetics whitespace.
[19:08:46] * BBB complains that test-builds before commit take ages if you change common.h or mathops.h
[19:10:05] <peloverde> That's why I try to get changes in common headers committed early if at all possible
[19:10:07] <elenril> why test-build, fate will do that for you
[19:10:08] * elenril runs
[19:10:27] <peloverde> you should subtly break x86/linux :)
[19:11:09] <BBB> forgiveness can be hard to receive :-p
[19:11:55] <mru> BBB: get a faster computer
[19:12:07] <BBB> nobody wants to buy me that pretty new mac
[19:12:22] <CIA-99> ffmpeg: rbultje * r23713 /trunk/libavutil/common.h: Add av_clip_int8(), used in the upcoming VP8 decoder.
[19:12:31] <Dark_Shikari> I don't see the commit yet
[19:12:50] <mru> BBB: why would you want a mac? :-)
[19:13:01] <BBB> Dark_Shikari: I'm slow
[19:13:05] <mru> the sony ones are much nicer
[19:13:11] <BBB> they don't run osx
[19:13:18] <mru> even better
[19:13:43] <CIA-99> ffmpeg: rbultje * r23714 /trunk/libavcodec/ (h264pred.h h264pred.c): Make "topright" argument to pred4x4() const.
[19:13:49] <mru> there simply is no macbook that fulfills my requirements
[19:13:57] <elenril> mru: http://www.igniq.com/images/joytech_ps2_monitor_170505.jpg you mean this?
[19:14:06] <Honoome> mru: you know, s/freetard/$religious_organisation/ holds true as well :P
[19:14:29] <mru> elenril: hehe
[19:14:50] <mru> Honoome: yes, and religious people are also generally unhappy
[19:15:23] <Honoome> right
[19:16:19] <CIA-99> ffmpeg: rbultje * r23715 /trunk/libavcodec/mathops.h:
[19:16:19] <CIA-99> ffmpeg: Add a macro to pack 4 bytes into native byte-order so they can be written
[19:16:19] <CIA-99> ffmpeg: at once using a single 32-bit store.
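The idea in this commit message, packing four bytes so they can be written with one 32-bit store, might look roughly like the following. It is a sketch only: `pack4_native` and `store4` are hypothetical helpers, not the macro added in r23715, and they use `memcpy` to get host byte order rather than endianness-specific shifts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the macro from r23715: build a 32-bit value
 * whose in-memory byte order on this host is a, b, c, d. */
static uint32_t pack4_native(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    const uint8_t bytes[4] = { a, b, c, d };
    uint32_t v;
    memcpy(&v, bytes, sizeof(v));  /* result depends on host endianness */
    return v;
}

/* Write four bytes with a single 32-bit store instead of four byte stores. */
static void store4(uint8_t *dst, uint32_t packed)
{
    assert(dst != NULL);
    memcpy(dst, &packed, sizeof(packed));
}
```

A typical use would be filling a 4-pixel row in an intra predictor: compute the four values once, pack them, then store the packed word to each row.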
[19:18:02] <CIA-99> ffmpeg: rbultje * r23716 /trunk/libavcodec/ (h264pred.h h264pred.c arm/h264pred_init_arm.c):
[19:18:03] <CIA-99> ffmpeg: Add intra prediction functions for VP8.
[19:18:03] <CIA-99> ffmpeg: Patch by David Conrad <lessen42 gmail com> and myself.
[19:18:03] <CIA-99> ffmpeg: rbultje * r23717 /trunk/libavcodec/ (h264pred.c arm/h264pred_init_arm.c): Reindent after r23716.
[19:18:06] <peloverde> BBB, another testbuild trick is --disable-everything --enable-decoder=vp8
[19:19:03] <Honoome> I need a coffee :/
[19:19:55] <elenril> http://tvtropes.org/pmwiki/pmwiki.php/Main/MustHaveCaffeine
[19:20:04] <CIA-99> ffmpeg: rbultje * r23718 /trunk/libavcodec/vp56.h: Change a / 256 into a >> 8.
[19:20:11] <BBB> almost there...
[19:20:56] <BBB> can somebody else help with throwing out libvpx once we agree it can be thrown out?
[19:21:18] <peloverde> mru seems to take great joy in removing wrappers?
[19:21:33] <BBB> I was hoping for his kind assistance, esp. since it gives him so much pleasure
[19:21:42] <mru> peloverde: it's not the process as such
[19:21:46] <mru> it's the cleanliness that results
[19:21:59] <mru> it's like killing roaches in the kitchen
[19:22:22] <Honoome> elenril: glad to see I'm not the only one living on that wiki :P
[19:23:37] * elenril doesn't read it all that much lately
[19:24:07] <elenril> http://tvtropes.org/pmwiki/pmwiki.php/Main/ItsPopularNowItSucks
[19:24:17] * mru suspects elenril has memorised it all
[19:24:30] <mru> which would explain why he doesn't read it much anymore
[19:24:39] <elenril> yeah, that too
[19:25:05] <CIA-99> ffmpeg: rbultje * r23719 /trunk/ (10 files in 2 dirs):
[19:25:05] <CIA-99> ffmpeg: Native VP8 decoder.
[19:25:05] <CIA-99> ffmpeg: Patch by David Conrad <lessen42 gmail com> and myself.
[19:25:15] <saintdev> \o/
[19:25:23] <Dark_Shikari> \o/
[19:25:25] <elenril> \o/
[19:25:58] <peloverde> \o/
[19:26:03] <elenril> maybe we should /. it or something
[19:26:23] <peloverde> send an announcement to the webm list
[19:26:48] <mru> wait for some asm optimisations
[19:26:57] <mru> so it's clearly faster than libvpx
[19:27:01] <BBB> it already is
[19:27:13] <mru> even without asm?
[19:27:40] <Dark_Shikari> without asm it's faster than vp8 without asm
[19:27:41] <Dark_Shikari> er, libvpx
[19:27:47] <Dark_Shikari> don't announce it yet, let's write the asm
[19:27:49] <Dark_Shikari> and then remove libvpx
[19:27:52] <Dark_Shikari> I say it'll take us a week or three.
[19:27:54] <Dark_Shikari> Then we announce.
[19:28:04] <peloverde> people will find it anyway
[19:28:13] <mru> we don't need big thunder
[19:28:33] <Dark_Shikari> mru: we announce the thunder when we're 40% faster.
[19:29:54] <CIA-99> ffmpeg: alexc * r23720 /trunk/libavcodec/ (13 files): Move Parametric Stereo related ps* files to aacps*.
[19:30:17] <BBB> mru: yes even without asm
[19:31:01] <mru> ffvp8 w/o asm is faster than libvpx with?
[19:31:04] <Dark_Shikari> no
[19:31:25] <Dark_Shikari> BBB: can you commit some asm for starters so we have .asm files for me to easily merge things into?
[19:31:36] <Dark_Shikari> easier to have low-friction commits that way
[19:31:40] <BBB> I'll submit a patch for review
[19:31:50] <BBB> if any part is OK'ed I'll apply it
[19:32:02] <Dark_Shikari> I can write ssse3 mc
[19:32:26] <BBB> coolish
[19:35:00] <peloverde> Is any of the libvpx assembly worth borrowing
[19:35:09] <peloverde> before they de-yasm-ify
[19:35:16] <Dark_Shikari> they're not deleting the git history
[19:35:30] <Dark_Shikari> I don't recall seeing anything really good in there
[19:35:36] <Dark_Shikari> i.e. nothing that wasn't the naive approach
[19:35:43] <Dark_Shikari> which means it's only useful if we're not going to write our own
[19:39:15] <Dark_Shikari> BBB: does the code currently use idct_dc_add
[19:39:16] <Dark_Shikari> ?
[19:39:26] <Dark_Shikari> oh, you never fixed the splatting
[19:39:30] <Dark_Shikari> in your asm
[19:39:41] <Dark_Shikari> and you never unrolled the idct dc and use padd/psub instead... wait what
[19:39:43] <Dark_Shikari> your patch is old
[19:39:48] <Dark_Shikari> I thought you already updated this
[19:41:47] <Honoome> if somebody has problems of self-esteem related to their code, one very good solution is to read some ruby on rails related code...
[19:41:57] <Honoome> that'll definitely make you feel like a _very_ good coder
[19:42:04] <j0sh_> lol
[19:42:19] <j0sh_> rails code uses a lot of magic, but i havent looked at it lately
[19:42:40] <Honoome> j0sh_: rails itself is bad, but not as bad as some of the code written around it
[19:42:44] <mru> is ruby still as horribly slow as it was some years ago?
[19:42:56] * Honoome is fighting with "bundler".. as the name suggests, it's a bad thing by design
[19:43:06] <j0sh_> mru: the newer ruby implementations have made a lot of progress speed-wise
[19:43:12] <Honoome> mru: depends on the task... there are some obvious things that are stupidly slow
[19:43:31] <Honoome> 0xf000000..0xfff00000.min
[19:43:37] <mru> a few years ago my toy script interpreter outran it by orders of magnitude
[19:43:38] <Honoome> this'll take a huge bloody amount of time
[19:44:03] <Honoome> because it actually _iterates_ through all the objects.. it's a frigging range (so it's ordered)
[19:44:03] <mru> of course my toy language isn't comparable to ruby at all
[19:44:19] <mru> seems like an obvious optimisation
[19:44:31] <Honoome> they fixed it in 1.9
[19:44:32] <peloverde> So when is the public release of mruscript?
[19:44:40] <Honoome> but of course... half the code does not work with 1.8
[19:44:46] <Honoome> s/8/9/
[19:45:04] <Honoome> and not all the code that is declared "works with 1.9" actually does
[19:45:15] * Honoome could repeat the story of the fcgi gem that compiled... with undefined symbols
[19:45:19] <j0sh_> 1.9 is starting to feel like perl 6
[19:45:19] <mru> peloverde: http://git.mansr.com/?p=libtc;a=blob;f=src/script.c
[19:45:52] <Honoome> j0sh_: yeah more or less.. 1.9.1 is not usable and they moved on to 1.9.2 already, with more stuff to break
[19:46:05] <Honoome> Rails 2.3.5 (and 2.3.8) are both declared to work on 1.9, but their own testsuite explodes on 1.9 :/
[19:46:12] <mru> I heard perl6 was to be released the day before duke nukem forever
[19:46:32] <Honoome> with Gentoo I'm trying to do the best to support 1.9 as it is but it's giving me headaches to fix all the packages :/
[19:46:34] <mru> Honoome: reminds me of java 1.3 times
[19:46:43] <Honoome> mru: and ruby2 the day after I guess
[19:46:58] <Honoome> and those will be the three days that RMS will be the most useful to the world...
[19:47:17] <peloverde> java 1.4 broke the class loader in strange ways
[19:47:23] <mru> well, the third day we'll all be playing DNF
[19:47:42] <mru> java 1.4 was a huge break against a lot of 1.3 things
[19:47:59] <mru> but used alone, it was more bearable
[19:48:10] <mru> I haven't really touched java since
[19:48:30] <Honoome> mru: not sure if 1.9 is like that... they did change quite a bit of things though
[19:48:35] <Honoome> including the String class (d'oh!)
[19:48:36] <mru> I hear they added enums, <templates>, @annotations, and god knows what
[19:48:47] <mru> ^^ in java...
[19:49:07] <Honoome> with 1.5 iirc
[19:49:27] <peloverde> I really don't think there was anything in 1.4 that I found particularly helpful above 1.3
[19:49:30] <Honoome> but it's not <templates> it's <generics> ... which of course the C++tards will tell you are "despisable" because they can't do all the things that templates do...
[19:49:49] <mru> looks a hell of a lot like templates to me
[19:49:49] <peloverde> 1.5 added a stdio compatible formatter
[19:49:53] <Honoome> on the other hand, generics are just the sane stuff you can get with templates, so ... :P
[19:50:07] <Honoome> mru: templates in C++ iirc are turing-complete on their own
[19:50:07] <mru> peloverde: yeah, printf was one of the things I cursed the lack of the most
[19:50:19] <mru> Honoome: no, the _error messages_ are turing complete
[19:50:36] <mru> whatever that means
[19:50:37] <Honoome> you can expand a 1MB source file into something equivalent to a 24MB pre-processed file, with templates
[19:50:51] <Honoome> mru: the error messages are a realistic turing test... inverse
[19:51:04] <peloverde> I wrote my first AAC decoder in java 1.3 (without printf) that made debugging super fun
[19:51:17] <mru> Honoome: tried feeding them to an xml parser?
[19:51:26] <Honoome> you can't be a human if you can understand C++ error messages
[19:51:38] <Honoome> mru: no, actually xml makes more sense to me than some template-heavy C++ code
[19:54:34] <Tjoppen> I wouldn't really compare java's generics with c++ templates. the similarity is mostly superficial
[19:55:02] <mru> I used templates in the general sense
[19:56:02] <peloverde> Perhaps even the most generic sense? :)
[19:56:12] <Tjoppen> generics are kinda like throw() in C++. doesn't give you any guarantees - only run-time enforcement
[19:56:29] <mru> I'm not talking about implementation specifics
[19:57:06] <BBB> Dark_Shikari: I was creating a vp8 decoder patch, I can only do one thing at a time
[19:57:14] <BBB> Dark_Shikari: so yes the patch is several days old :-p
[19:57:24] <mru> the <> in java looks like it's a way to apply the same code to different data types
[19:57:28] <mru> that's a template to me
[19:57:29] <BBB> it's still faster than C, feel free to comment on the ML and I'll resubmit in a few days
[19:59:48] <mru> Dark_Shikari: hey look, michael replied to your "patch"
[20:00:07] <Tjoppen> kinda, except a List<Dog> can contain Cats for instance. but whatever
[20:00:38] <mru> as I said, I know nothing about the new java stuff
[20:00:47] <mru> I've just seen some random fragments
[20:00:55] <mru> many of them on tdwtf
[20:01:19] <Honoome> mru: didn't I tell ya that calling those templates would bring you complaints? :P
[20:01:25] <mru> btw, does eclipse still have a class with a name >100 chars long?
[20:02:06] <Tjoppen> probably. definitely, if you count inner classes
[20:02:23] <Dark_Shikari> mru: lol
[20:02:52] <Honoome> mru: want to know what's the longest ELF symbol I can find on my tinderbox? :)
[20:03:12] <mru> Honoome: if you want to do the search, sure
[20:03:26] <mru> are you counting c++ mangled names?
[20:04:16] <Honoome> yes
[20:04:28] <Honoome> 403 characters, it's a gnash symbol
[20:04:32] <mru> lol
[20:04:37] <Honoome> _ZN5gnash13iterator_findERN5boost11multi_index21multi_index_containerINS_8PropertyENS1_10indexed_byINS1_14ordered_uniqueINS1_13const_mem_funIS3_RKNS_9ObjectURIEXadL_ZNKS3_3uriEvEEEEN4mpl_2naESC_EENS5_INS1_3tagINS_12PropertyList8OrderTagESC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_EENS6_IS3_iXadL_ZNKS3_8getOrderEvEEEESC_EESC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_SC_EESaIS3_EEEi
[20:04:48] <Honoome> sorry 402, I have a final @ that I use for versioning
[20:05:02] <mru> eeeew
[20:06:55] <Honoome> if I exclude C++ symbols, mangled with the latest g++ ABI (so starting with _Z) I get another at 384 characters, but I have no clue where it comes from... and it looks like C++ stuff anyway
[20:07:57] <Honoome> [I have a database of all the symbols exported.. well almost all of them...]
[20:08:11] <Honoome> I could tell you which packages likely bundle part of FFmpeg :P
[20:08:23] <roxfan> some people collect stamps...
[20:08:23] <mru> what do you have that db for?
[20:08:36] * mru collects beagle boards
[20:09:10] <Honoome> mru: collision detection and identification of bundled code
[20:09:49] <Honoome> the former was the original intent, the latter is happenstance, but it turns out useful to find if somebody ever bundled a vulnerable function when they come by
[20:10:29] <roxfan> mru: has fft16_neon been tested on real hw? i see it loads some data from a structure with :128 align but the structure is only 4-byte aligned...
[20:10:51] <mru> of course it's tested
[20:11:56] * Honoome wonders how one person can be so stupid about code...
[20:12:04] <roxfan> any idea why that works then? according to arm docs this should result in a fault
[20:12:14] <mru> it works because the alignment is guaranteed
[20:12:40] <Honoome> so you got a sorta-package-manager that uses a "$packagename-$version-$gitsha-$branch" scheme to install packages..
[20:13:01] <Honoome> how do you extract the branch? fullpath.split("-")[3] ...
[20:13:19] <Honoome> what happens if fullpath (which is obviously absolute) contains '-'?
[20:13:24] <mru> it breaks
[20:13:35] <mru> but I can top that
[20:13:36] <Honoome> it reports the most bogus data possible..
[20:13:49] <mru> a build system written in javascript
[20:14:04] <mru> in such a way that '.' anywhere in the absolute path is illegal
[20:14:05] <Honoome> nokia's qmake replacement?
[20:14:10] <Honoome> hahaha
[20:14:22] <mru> think version numbers
[20:14:48] <Honoome> who uses versioned tarballs anyway?
[20:14:54] <Honoome> who uses tarballs! just git fetch it!
[20:17:20] <roxfan> mru: i downloaded a random armv7 build of libavcodec.so and 'mppm' array is only 32-byte aligned
[20:17:34] <mru> try a proper build instead
[20:17:37] <roxfan> maybe kernel silently fixes the fault?
[20:18:15] <mru> my kernels have fixup disabled
[20:18:24] <mru> there's no point arguing
[20:18:38] <roxfan> i'm not arguing, i'm trying to understand
[20:19:02] <mru> 32-byte alignment is plenty btw
[20:19:06] <mru> did you mean bit?
[20:19:13] <roxfan> how do you guarantee 128-byte alignment? i only see ".align 4" in the .S file
[20:19:17] <mru> bit
[20:19:24] <mru> 16-byte
[20:19:29] <mru> 1<<4 == 16
[20:20:27] <roxfan> oh indeed 128 is in bits
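The unit mix-up resolved here is easy to check numerically: `.align 4` in the ARM assembly source means a boundary of 1 << 4 = 16 bytes, and the `:128` qualifier on the NEON load is in bits, which is also 16 bytes. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* With power-of-two semantics, as on ARM here, ".align n" requests
 * a boundary of 1 << n bytes. */
static unsigned align_directive_bytes(unsigned n)
{
    assert(n < 32);
    return 1u << n;
}

/* NEON address qualifiers such as ":128" are expressed in bits. */
static unsigned bits_to_bytes(unsigned bits)
{
    return bits / 8;
}

/* True if p sits on a boundary of `bytes`, which must be a power of two. */
static int is_aligned(const void *p, unsigned bytes)
{
    return ((uintptr_t)p & (bytes - 1u)) == 0;
}
```

So a table placed after `.align 4` satisfies a `:128` load, and a 32-byte aligned array does too, since any multiple of 32 is also a multiple of 16.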
[20:20:37] <roxfan> sorry for the false alarm :/
[20:24:09] <mru> IAddWebComponentToEnterpriseApplicationDataModelProperties
[20:24:14] <Dark_Shikari> BBB: do you have a vp8 test video I can use to test my asm?
[20:24:15] <mru> that's an actual class name
[20:24:19] <mru> from eclipse
[20:24:42] <BBB> http://code.google.com/p/webm/downloads/detail?name=vp8-test-vectors-r1.zip…
[20:25:15] <mru> counting inner classes the winner is ConfigureWorkingSetAssignementAction$WorkingSetModelAwareSelectionDialog$GrayedCheckModelElementSorter
[20:25:21] <BBB> Dark_Shikari: and then http://ffmpeg.pastebin.com/d3KKb0jj
[20:25:31] <BBB> Dark_Shikari: run it with unpatched source to get the md5s for ref.md5s
[20:26:34] <Dark_Shikari> oh, cool
[20:26:49] <Honoome> YAI! I "fixed" bundler
[20:26:56] <Honoome> or rather made it slightly less broken
[20:31:40] <peloverde> Should these test vectors be added to FATE?
[20:31:52] <mru> sure, why not?
[20:36:57] <wbs> BBB: FWIW, the crash on -user that you mentioned, it's not at all related to http. it's a bug on the 0.6 branch that has been fixed in trunk since. 23344 should be the revision fixing that
[20:37:34] <peloverde> How does one go about adding tests to fate?
[20:37:49] <wbs> BBB: I'm not subscribed to -user, so I can't give a sensible reply there, though
[20:37:51] <mru> you hope and pray that mike will eventually do it
[20:38:00] <mru> that's why we need a new system
[20:38:30] <peloverde> is mike willing to open up fate?
[20:38:44] <Honoome> I thought the code was already open
[20:38:49] <mru> diego asked him and got handwaving replies
[20:39:01] <mru> the server-side code is not available
[20:39:08] <mru> and it's probably ugly beyond belief
[20:39:19] <mru> I'd like to do it a bit differently
[20:39:38] <peloverde> I'd like to build the testspecs into FFmpeg itself
[20:39:46] <mru> they're already there actually
[20:40:18] <Honoome> mru: no doubt it's ugly :P
[20:40:47] <kierank> it's php isn't it?
[20:40:51] <mru> python
[20:40:56] <mru> and maybe some php too
[20:40:58] <mru> I don't know
[20:41:09] <mru> mike likes python for some reason
[20:41:20] <peloverde> I thought they lived in http://fate.multimedia.cx/fate-tests.sqlite.bz2
[20:41:31] <mru> they also live in tests/fate*
[20:41:50] <Honoome> damn, I still got one test failure on bundler due to _the bloody paths_ =_=
[20:43:17] <peloverde> Then what's this nonsense about fate not being versioned/not being able to run fate on branches?
[20:43:42] <mru> the test files aren't versioned for starters
[20:44:01] <peloverde> the same files should be usable
[20:44:01] <mru> and the fate runner scripts use the refs from the db, not from the repo
[20:44:40] <peloverde> that seems trivial to fix then, no?
[20:45:19] <mru> I'd like to completely restructure it
[20:45:36] <Dark_Shikari> BBB: I'm getting "all results bitexact" even when I completely break it
[20:46:01] <Dark_Shikari> er, "results identical"
[20:47:11] <peloverde> What do you want it to look like? and can it be done quickly?
[20:47:30] <peloverde> I *need* some sort of aacdec tests in fate
[20:47:44] <mru> I need time to do it
[20:48:35] <mru> I should make it high priority
[20:49:48] <peloverde> I would be very much appreciative if you did
[20:50:03] <peloverde> And I'm willing to help with it
[20:50:17] <BBB> Dark_Shikari: huh? can't be, I tested various buggy things and broke it :)
[20:50:30] <BBB> Dark_Shikari: make sure the test files actually trigger your code
[20:51:09] <Dark_Shikari> BBB: your regression.sh is broken on cygwin
[20:51:14] <Dark_Shikari> it generates no output
[20:51:19] <Dark_Shikari> because you need "-" not /dev/stdout
[20:51:48] <BBB> it was a hackscript, not supposed to be portable beyond mac and perhaps linux
[20:51:53] <Dark_Shikari> I know I know
[20:51:56] <Dark_Shikari> but it's _shorter_ =p
[20:52:50] <BBB> I didn't know - worked :-p
[20:53:01] <peloverde> Also why does that script use /bin/bash? I don't see any bashisms
[20:53:06] <Dark_Shikari> I didn't know there was a /dev/stdout
[20:53:10] <BBB> hehe :)
[20:53:16] <BBB> peloverde: bad habit, that's all
[20:53:22] <Dark_Shikari> I thought stdout was just a particular stream opened by the shell
[20:53:27] <BBB> really, the script is a complete hack and probably completely broken, I suck at shell
[20:53:29] <peloverde> There is lots of fun stuff in dev
[20:54:11] <siretart> peloverde: hi
[20:54:19] <mru> peloverde: what's the best way to test aac anyway?
[20:54:39] <mru> since it's not quite exact across machines
[20:54:45] <peloverde> snr/off-by-one
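An "off-by-one" comparison for decoders that are not bit-exact across machines could be sketched as below. The function name, the 16-bit PCM sample format and the tolerance parameter are assumptions for illustration; nothing here is taken from FATE or from the AAC decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical comparison helper: returns 1 if every decoded sample is
 * within max_diff of the reference sample, 0 otherwise. */
static int pcm_within_tolerance(const int16_t *ref, const int16_t *out,
                                size_t n, int max_diff)
{
    assert(ref != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        if (abs((int)ref[i] - (int)out[i]) > max_diff)
            return 0;
    return 1;
}
```

Calling it with `max_diff` set to 1 gives the off-by-one check; an SNR-style test would instead accumulate the squared error over all samples and compare it against a threshold.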
[20:55:29] <siretart> peloverde: I don't remember who exactly proposed that, but I'd like to hear your opinion on backporting the HE-AAC2 decoder to 0.6. do you think it's feasible?
[20:55:44] <Honoome> Dark_Shikari: /dev/stdout is a symlink to /proc/self/fd/1
[20:55:49] <Dark_Shikari> Honoome: aha.
[20:55:52] <Honoome> so it does not always work as intended, obviously :P
[20:55:53] <Dark_Shikari> is that supposed to be portable?
[20:55:55] <Dark_Shikari> i.e. is it posix?
[20:56:05] <Honoome> probably it is posix, portable, I wouldn't bet on it
[20:56:33] <Honoome> I think freebsd has it (I should check on the vm to be sure) but there it's implemented differently, obviously
[20:56:41] <Honoome> given that there is no /proc there
[20:56:50] <Honoome> [no _mandatory_ /proc that is]
[20:57:01] <mru> /dev/stdout is not required
[20:57:07] <mru> nor is /proc
[20:57:13] <Dark_Shikari> well proc is a given
[20:57:16] <Dark_Shikari> just was wondering about dev.
[20:58:00] <CIA-99> ffmpeg: rbultje * r23721 /trunk/libavcodec/ (h264pred.c mathops.h): Rename PACK4x8() to PACK4UINT8().
[20:58:00] <CIA-99> ffmpeg: rbultje * r23722 /trunk/libavcodec/h264pred.c: Reindent after r23721.
[20:58:20] <mru> http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap10.html
[20:59:25] <peloverde> siretart, I suppose it's feasible
[20:59:30] <mru> BBB: in future, please don't immediately rename stuff just because someone asks you to
[21:00:11] <peloverde> siretart, I would start with the two bugfixes for LC/HEv1 first
[21:01:26] <peloverde> What are the filenames on the 0.6 branch? did it pick up the aac.c->aacdec.c transition?
[21:02:10] <Dark_Shikari> bah, what's the option to make ffmpeg not print the progress info?
[21:02:15] <mru> -v 0
[21:02:20] <mru> or less
[21:02:23] <Dark_Shikari> will it still print start_timer stuff?
[21:02:29] <Dark_Shikari> ok, good, it does.
[21:02:44] <siretart> peloverde: no, it's still 'aac.c'
[21:04:17] <peloverde> There shouldn't be issues backporting ps if you don't mind that any third part patches against AAC will be clobbered
[21:04:51] <siretart> 'third part patches'?
[21:05:01] <mru> first part, second part, third part
[21:05:02] <peloverde> sorry "third party"
[21:05:19] <siretart> I see
[21:08:27] <Honoome> peloverde: I guess you could have "third hunky" since we talk about patches :D
[21:09:11] <mru> junky
[21:09:19] <mru> junkie
[21:09:43] <mru> which is pretty close to third party actually
[21:12:24] <siretart> peloverde: what are the revisions for the HEv1 bugfixes?
[21:12:32] <mru> http://article.gmane.org/gmane.comp.video.videolan.vlc.general/17813
[21:13:27] <peloverde> siretart: r23660, r23673
[21:13:31] <CIA-99> ffmpeg: cehoyos * r23723 /trunk/libavcodec/mpegvideo_common.h:
[21:13:31] <CIA-99> ffmpeg: Use right-shift instead of division by two.
[21:13:31] <CIA-99> ffmpeg: Patch by Dark Shikari
[21:14:16] <Dark_Shikari> <3 michael
[21:14:17] <Dark_Shikari> the shocking effectivity of the dc idct is likely due to the shocking low
[21:14:18] <Dark_Shikari> quality of flvs ;)
[21:14:20] <Dark_Shikari> hah
[21:14:40] <Honoome> the compiler is so bad that it doesn't replace /2 with >>1? o_O
[21:14:52] <mru> Honoome: it can't for signed types
[21:14:57] <Dark_Shikari> what mru said
[21:15:24] <Honoome> d'oh!
[21:15:46] <Honoome> sign-extension, right
[21:15:56] <mru> no
[21:16:10] <peloverde> consider -1/2
[21:16:12] <mru> -1/2 == 0
[21:16:18] <mru> -1 >> 1 == -1
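The point made above can be checked with a few lines of C. This is a minimal sketch, not code from FFmpeg; the function names are made up for illustration. It assumes a two's-complement target where `>>` on a negative `int` is an arithmetic shift (the C standard leaves that implementation-defined).

```c
#include <assert.h>

/* Division by two truncates toward zero (C99 and later). */
static int half_div(int x)
{
    return x / 2;
}

/* Arithmetic right shift rounds toward negative infinity on the
 * usual two's-complement targets (implementation-defined in C). */
static int half_shift(int x)
{
    return x >> 1;
}

/* What a compiler has to do if it wants to implement x / 2 with a
 * shift: bias negative inputs by one so the result rounds toward zero. */
static int half_div_via_shift(int x)
{
    int bias = (x < 0);          /* 1 for negative x, 0 otherwise */
    int r = (x + bias) >> 1;
    assert(r == x / 2);          /* must agree with true division */
    return r;
}
```

For odd negative inputs `half_div` and `half_shift` disagree by one, which is why a plain shift is only a valid replacement when the value is known to be non-negative or the different rounding is acceptable.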
[21:17:01] <Honoome> hm.. all in all it's actually making me understand why adding negative values to enum can throw gcc off
[21:17:58] <siretart> peloverde: do'h, now I see your nominations on -cvslog. too much backlog :/
[21:29:12] <BBB> mru: isn't MN maint of mathops.h?
[21:29:18] <BBB> mru: but fair enough
[21:29:25] <mru> he thinks he is
[21:29:28] <mru> in practice I am
[21:30:50] <BBB> well that's complicated
[21:30:56] <BBB> can you two fight out who maintains what?
[21:31:18] <mru> no
[21:31:23] <BBB> I'll grab popcorn and be amused at the sideline
[21:32:51] <peloverde> git blame mathops.h | wc -l = 158
[21:33:01] <peloverde> git blame -w mathops.h | grep mru | wc -l = 60
[21:33:12] <peloverde> git blame -w mathops.h | grep michael | wc -l = 3
[21:33:30] <mru> see what I mean
[21:35:42] <mru> hmm, someone broke loads of things on fate
[21:36:12] <mru> the ppc is just misconfigured
[21:36:36] <mru> on ia64 our old friend the gprel22 is back
[21:37:10] <mru> and on openbsd gcc segfaults
[21:37:49] <mru> that may have been a fluke though
[21:37:55] <mru> it's not at latest rev
[21:38:56] <peloverde> why does gcc-4.3 on AVR32 have so many failures?
[21:39:06] <michaedw> I think I have root-caused problems with ffmpeg playback of .h264 files (Annex B bytestreams)
[21:39:31] <michaedw> it appears that the file is read in 1K chunks
[21:39:40] <michaedw> and that the decode_frame function does not deal gracefully with split NAL units
[21:39:52] <mru> you need to use the parser
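To illustrate why fixed-size reads need a parser in front of the decoder: an Annex B start code (00 00 01) can straddle two chunks, so whatever splits the stream has to carry state across reads. The sketch below is an assumption-laden illustration, not FFmpeg's h264 parser; `StartCodeScanner`, `scan_chunk` and `count_start_codes` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scanner state: how many consecutive zero bytes ended
 * the data seen so far (capped at 2). */
typedef struct {
    int zeros;
} StartCodeScanner;

/* Feed one chunk. Returns the number of 00 00 01 start codes whose
 * final 0x01 byte lies in this chunk, including codes that began in
 * an earlier chunk. */
static int scan_chunk(StartCodeScanner *s, const uint8_t *buf, size_t len)
{
    int found = 0;
    assert(s && (buf || len == 0));
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0) {
            if (s->zeros < 2)
                s->zeros++;
        } else {
            if (buf[i] == 1 && s->zeros == 2)
                found++;
            s->zeros = 0;
        }
    }
    return found;
}

/* Feed a whole buffer in pieces of `chunk` bytes and count start codes. */
static int count_start_codes(const uint8_t *data, size_t len, size_t chunk)
{
    StartCodeScanner s = { 0 };
    int total = 0;
    assert(chunk > 0);
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = len - off < chunk ? len - off : chunk;
        total += scan_chunk(&s, data + off, n);
    }
    return total;
}
```

Because the zero count survives between calls, the result is the same whether the data arrives in 1-byte pieces, 1K pieces, or all at once.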
[21:40:38] <michaedw> ok; --enable-parser=h264?
[21:40:50] <michaedw> (during configure)
[21:40:56] <mru> please take this to #ffmpeg
[21:41:01] <michaedw> ok, thanks
[21:41:17] <michaedw> (Dark_Shikari suggested #ffmpeg-devel :-)
[21:41:46] <mru> whom do you trust more?
[21:42:23] <Dark_Shikari> I assumed you meant an ffmpeg bug.
[21:42:25] <Dark_Shikari> Not a bug in your code.
[21:43:38] <Dark_Shikari> BBB: ping
[21:43:55] <Dark_Shikari> oh nvm, got it, I'm stupid.
[21:44:58] <michaedw> well, there have been a number of unresolved bug reports, and the behavior of ffmpeg is less than helpful; it attempts to decode but fails with many messages like:
[21:45:00] <michaedw> [h264 @ 0x914c620]error while decoding MB 0 37, bytestream (-4)
[21:45:00] <michaedw> [h264 @ 0x914c620]concealing 2209 DC, 2209 AC, 2209 MV errors
[21:45:30] <BBB> Dark_Shikari: pong
[21:45:41] <BBB> oh, I guess n/m is for me
[21:46:03] <Dark_Shikari> yeah
[21:46:08] <michaedw> I don't know whether you would consider that a "bug" if there's a command-line flag that causes the file to be broken in saner chunks, but it's not the best user experience
[21:46:09] <mru> Dark_Shikari: correct protocol is "nick: unping"
[21:46:24] <Dark_Shikari> lol
[21:46:25] <mru> michaedw: please go to #ffmpeg
[21:46:32] <Dark_Shikari> mru: I don't think anyone will know there either =p
[21:46:37] <michaedw> ok, thx
[21:46:48] <mru> it's not an ffmpeg dev question
[21:47:28] <michaedw> I neglected to say that I intended to submit a patch to fix it, and was just looking for advice on where to insert the sanity check
[21:47:41] <michaedw> but I can ask that on #ffmpeg too
[21:47:55] <mru> you haven't even shown it to be a bug yet
[21:48:20] <mru> /redirect michaedw #ffmpeg
[21:48:29] <mru> damn, no such command
[21:48:31] <Dark_Shikari> michaedw: submit a bug report
[21:48:38] <peloverde> Speaking of bugs roundup is looking pretty gruesome these days
[21:48:39] <mru> I don't think there's a bug
[21:48:40] <Dark_Shikari> there's a guide on the site
[21:48:45] <Dark_Shikari> mru: well then it's his job to demonstrate it is
[21:49:05] <mru> he's somehow not using the parser
[21:50:28] <Honoome> peloverde: complain to lu_zero :P
[21:50:37] <peloverde> That's not what I mean
[21:51:25] <Honoome> do it anyway! :P
[21:51:25] <peloverde> We are opening new bugs far faster than we are closing old ones, we need to do some bug squashing
[21:51:38] <mru> let's do it the gcc way
[21:51:45] <mru> just close them
[21:52:06] <saintdev> oh, i thought it was ignore the bug for 3 years, then close it invalid
[21:52:39] <mru> release coming up, let's close some bugs so it doesn't look so bad
[21:52:54] <Honoome> saintdev: or keep it open after five years and a number of minor versions
[21:53:20] <saintdev> Honoome: well that doesn't cut down on the number of open bugs...
[21:53:30] <Honoome> that's still what they do
[21:53:41] <saintdev> true
[22:00:03] * Honoome is on the cc of a precompiled headers bug
[22:00:13] <Honoome> which I hit when I was developing an UO server emulator
[22:00:25] <Honoome> that was years before I joined Gentoo...
[22:00:27] * mru doesn't use c++ so has no need for pch
[22:00:44] <Honoome> mru: I was young and inexperienced
[22:00:57] * mru has been there too
[22:01:02] <mru> not c++ though
[22:01:07] <mru> but other silly things
[22:07:25] <BBB> Dark_Shikari: the filter functions are high in my list, so if you want to do something fun and a little more complex, work on sse2/ssse3/mmx filter functions
[22:07:40] <BBB> (high in my list = take a lot of cpu time)
[22:08:03] <michaedw> mru: configuring with --enable-parser=h264 solved it, with no change to the ffmpeg command line; thanks! Sorry for the noise.
[22:08:15] <Dark_Shikari> BBB: I'm doing mc right now
[22:08:21] <Dark_Shikari> I just rewrote mmx idct_dc
[22:08:23] <Dark_Shikari> and wrote an sse4 version
[22:08:29] <Dark_Shikari> you do the loopfilter
[22:11:46] <Honoome> ohhh sse4, shiny :)
[22:11:52] <Honoome> 4.1, 4.2, 4a or 4b? :P
[22:12:34] <Dark_Shikari> it's mostly just a function that would be slower on any cpu that doesn't have sse4
[22:12:39] <Dark_Shikari> regardless of the instructions actually used
[22:12:44] <Dark_Shikari> and 2 clocks faster with
[22:12:47] <Dark_Shikari> on i7
[22:12:52] <mru> is there any extension i7 940 doesn't have?
[22:13:11] <Honoome> ah, well then I can be grateful for the laptop ;) although my workstation only has 4a
[22:13:16] <michaedw> btw, --enable-gprof appears to break get_cabac_noinline() in libavcodec/cabac.h
[22:13:37] <mru> it would
[22:13:59] <mru> unable to find a register in class GENERAL_REGS?
[22:14:02] <michaedw> yes
[22:14:24] <mru> don't use that flag
[22:14:28] <mru> we should probably remove it
[22:14:32] <mru> gprof is useless
[22:16:57] <michaedw> ok; thanks. I was thinking of measuring the cabac performance on Pentium-M cores and tuning (if there's room to tune). Would that be of value?
[22:17:12] <Dark_Shikari> cabac is pretty heavily tuned
[22:17:14] <mru> gprof is useless for profiling
[22:17:17] <Dark_Shikari> and yes, gprof is utterly useless
[22:17:19] <Dark_Shikari> use oprofile
[22:17:32] <mru> gprof can give a call graph
[22:17:35] <mru> if that were ever useful
[22:17:37] <Dark_Shikari> the only tuning possible imo is probably algorithmic changes, maybe merging some of the horrible hacks x264 added recently
[22:17:45] <Dark_Shikari> s/x264 added/I added to x264
[22:17:52] <mru> there's a difference?
[22:18:01] <michaedw> :) I thought the table lookup was a good change
[22:19:07] <Dark_Shikari> which table lookup
[22:19:12] <Dark_Shikari> that's not very specific
[22:19:14] <Dark_Shikari> =p
[22:19:34] <michaedw> and I figured it was already pretty well tuned, but there may be variant tunings appropriate for Pentium M (which is still relevant in some embedded devices)
[22:20:00] <mru> my old laptop is p-m
[22:20:09] <Dark_Shikari> p-m isn't very interesting except for having an awful sse unit
[22:20:09] <twnqx> post-mortem? :X
[22:20:17] <twnqx> oh, pentiumm
[22:20:27] <mru> the wifi is half-dead
[22:20:53] <mru> the new i5 is so much nicer
[22:21:02] <twnqx> i have a wifi-card from intel that confused intel engineers... because it doesn't work
[22:21:06] <Honoome> appletv iirc is a pentium-m
[22:23:01] <BBB> Dark_Shikari: I'll apply if there's a little more review, mru just called me trigger-happy
[22:23:02] <michaedw> Dark_Shikari: your blog post on "Finite state machines and CABAC", which is of course not recent, but that's the kind of optimization opportunity I thought there might be more of
[22:23:03] <BBB> which is true :-p
[22:24:31] <BBB> Dark_Shikari: and I'll do the loop filter once MC has some decent ops in
[22:24:43] <michaedw> also, L2 cache is at a premium, -Os sometimes performs better than -O2 (at least with older gcc)
[22:25:13] <Dark_Shikari> michaedw: grep x264 log for recent cabac changes
[22:25:18] <Dark_Shikari> and I think you mean "L1I" not "L2"
[22:25:24] <Dark_Shikari> and that's true of all intel chips
[22:25:37] <mru> what is L1 at nowadays?
[22:25:46] <Dark_Shikari> 32k
[22:25:48] <Dark_Shikari> same as always
[22:25:49] <Dark_Shikari> 32k/32k
[22:25:56] <mru> why is that?
[22:26:10] <mru> even tiny embedded arm9 chips have 32k
[22:26:21] <Dark_Shikari> latency
[22:26:31] <Dark_Shikari> intel would have to increase L1 latency in their arch to increase the cache size
[22:26:34] <michaedw> true in general of Intel x86en, extra-true of P-M (from what I've been seeing on this device)
[22:26:37] <Dark_Shikari> the size of the chip doesn't matter
[22:26:53] <Dark_Shikari> the biggest chip in the world is still bottlenecked by how much L1 you can fit near the alu
[22:27:15] <astrange> cabac tuning: remove the memory load/stores from the x86 asm so subsequent cabac calls can reuse them from registers
[22:27:44] <Dark_Shikari> BBB: ping
[22:27:46] <BBB> pong
[22:27:49] <mru> cortex-a9 can be configured with 64k/64k L1
[22:27:52] <Dark_Shikari> Explain the ordering of your taps
[22:27:55] <astrange> tuning 2: delete the 3-4 "bit &= 1;" and change it to "return bit & 1;" at the end because compilers can't combine and/test/branch to and/branch across if statements
[22:27:58] <Dark_Shikari> why does doing the last tap first magically avoid overflows?
[22:28:06] <astrange> doubt there's much more than that
[22:28:24] <BBB> Dark_Shikari: first the negatives, then the positives
[22:28:28] <Dark_Shikari> But they're not in that order.
[22:28:31] <BBB> Dark_Shikari: the negatives never overflow, since they're small
[22:28:33] <michaedw> also, I am thinking of implementing a data-partitioning post-processor for H.264 streams, for use with unequal error partitioning
[22:28:39] <BBB> Dark_Shikari: ?
[22:28:44] <michaedw> er, error protection
[22:28:46] <Dark_Shikari> some of the time the negatives are last
[22:28:48] <Dark_Shikari> some of the time they're first
[22:28:50] <Dark_Shikari> depending on the position
[22:28:54] <BBB> huh?
[22:29:07] <BBB> are you looking at the correct table?
[22:29:14] <Dark_Shikari> -6,123,12,-1
[22:29:17] <BBB> the negatives are always in the same position
[22:29:24] <BBB> in fourtap, negatives are [0] and [3]
[22:29:25] <Dark_Shikari> oh, I see what you mean
[22:29:28] <Dark_Shikari> they're always on the outside
[22:29:29] <BBB> in sixtap, they're [1] and [4]
[22:29:32] <Dark_Shikari> that makes the code really fucking annoying
[22:29:35] <michaedw> astrange: thanks, those sound like good suggestions, I'll measure the impact
[22:29:35] <BBB> yes
[22:29:44] <BBB> I didn't say my code was pretty, just that it worked
[22:29:46] <Dark_Shikari> ... are they always negative?
[22:29:48] <BBB> yes
[22:29:54] <BBB> and the others are always positive
[22:30:43] <mru> of course the negatives are there
[22:30:47] <mru> it's an interpolation filter
[22:31:01] <astrange> tuning 3 (for x86-64/arm/other non-asm platforms): port the use of cmov back to the C code and hope compilers do it right
[22:31:12] <astrange> or write cabac asm for arm
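As a rough idea of what "porting cmov back to C" can look like, here is a branch-free select written with a mask. This is only a sketch with a made-up helper name (`select_u32`); it is not taken from FFmpeg's CABAC code, and whether a given compiler turns it into a conditional move is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Return a if cond is nonzero and b otherwise, with no branch in the
 * source. mask is all ones when cond is true, all zeros when false. */
static uint32_t select_u32(int cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    uint32_t r = (a & mask) | (b & ~mask);
    assert(r == (cond ? a : b));   /* same result as the branchy form */
    return r;
}
```

The only way to know whether this beats the plain `cond ? a : b` form on a particular compiler and CPU is to look at the generated asm and measure.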
[22:31:20] <Dark_Shikari> oh, I got an idea.
[22:31:20] <BBB> ah, a review
[22:31:29] <BBB> MN also wants me to remove the splats
[22:31:30] <BBB> damn
[22:31:34] <BBB> maybe tomorrow
[22:31:42] * BBB should seriously do work that he's being paid for now
[22:32:21] <Dark_Shikari> check my email
[22:34:37] <michaedw> astrange: I'm more likely to take the port-back-to-C route and see what I need to do to the C to get good asm. which compiler do you consider most relevant for ARM -- gcc 4.5? llvm-gcc? and which tunings interest you most -- cortex-a9? hard-float ABI? NEON?
[22:34:40] <Honoome> hm
[22:34:59] <Honoome> astrange just reminded me that I either fiddle with the ebuild or I make sure ffmpeg's configure understand "barcelona"
[22:35:00] <mru> float is irrelevant to cabac
[22:35:14] <mru> Honoome: send patch
[22:35:28] <Honoome> mru: yeah yeah I know, it just escaped my mind for... a bit
[22:35:34] <mru> michaedw: gcc 4.3 and 4.5 are relevant for arm
[22:35:39] <michaedw> mru: yes, but NEON may not be, and the choice of float ABI affects how parameters are passed
[22:35:58] <mru> you won't be able to use neon for cabac decode
[22:38:02] <mru> and either way, both abis should be supported
[22:38:37] <michaedw> mru: NEON bitwise operations may be worth experimenting with; CodeSourcery has been working on reducing the friction. http://old.nabble.com/-PATCH,-ARM-%3A-rewrite-NEON-bitwise-operations-witho…
[22:38:52] <mru> bwaahahaha
[22:38:55] <Dark_Shikari> >20 cycle latency
[22:38:58] <Dark_Shikari> >experiment with neon bitwise ops
[22:39:01] <Dark_Shikari> >20 cycle latency
[22:39:02] <Dark_Shikari> >20 cycle latency]
[22:39:05] <Dark_Shikari> >20 cycle latency
[22:39:18] <michaedw> to move data from integer to NEON or vice versa, yes
[22:39:29] <mru> only from neon to arm
[22:39:31] <mru> arm to neon is fast
[22:39:43] <mru> it's a result of the pipeline organisation
[22:40:11] <mru> anyway
[22:40:21] <mru> rule #1: gcc CANNOT use neon _AT ALL_
[22:40:30] <michaedw> it puts a premium on replacing conditionals with arithmetic
[22:40:31] <mru> do not even let it try
[22:41:17] <michaedw> gcc isn't yet good at vectorizing pre-existing code, true
[22:41:27] <mru> that's an understatement
[22:41:45] <mru> you're lucky if the code even does the right thing
[22:42:01] <mru> let alone faster
[22:42:04] <Dark_Shikari> +10000
[22:42:07] <mru> there's a good reason we disabled it in ffmpeg
[22:42:08] <michaedw> but it does a good job on C code that's written specifically to be vectorized
[22:42:10] <Dark_Shikari> we're committing fno-tree-vectorize to x264 in a few days
[22:42:13] <mru> bwahahahaah
[22:42:18] <Dark_Shikari> because on C code that was written "specifically to be vectorized"
[22:42:21] <Dark_Shikari> it CRASHED X264
[22:42:25] <Dark_Shikari> because it used aligned loads improperly
[22:42:26] <mru> as I said
[22:42:38] <Dark_Shikari> and all because I converted a memset to a loop
[22:42:44] <Dark_Shikari> I had the gall to write C code
[22:42:51] <Honoome> mru: oh well it seems someone else already handled that, ./configure --cpu=barcelona does not complain any longer ;)
[22:43:24] <mru> patch still needd
[22:43:26] <michaedw> sorry, should have mentioned that I cherry-picked loads of fixes from the CodeSourcery compiler
[22:43:26] <mru> +e
[22:43:31] <Honoome> mru: uhm? what for?
[22:43:39] <mru> to set various things properly
[22:43:59] <mru> what cpu is barcelona?
[22:44:09] <mru> michaedw: codesourcery is just as bad as plain gcc at vectorising
[22:44:11] <mru> if not worse
[22:44:18] <michaedw> the FSF releases aren't as stable with aggressive ARM optimizations (especially NEON) as they might be
[22:44:24] <Honoome> mru: barcelona is amdfam10
[22:44:30] <michaedw> probably worse at vectorizing general code
[22:44:43] <mru> Honoome: then add barcelona to the relevant line in configure
[22:44:53] <mru> just search for amdfam10
[22:45:20] <michaedw> but their changelog is a good guide to which changes to cherry-pick from bugs and mailing list archives
[22:45:28] <mru> michaedw: compilers are a joke
[22:45:34] <mru> gcc-based ones doubly so
[22:46:09] <michaedw> they're just tools
[22:46:18] <michaedw> not always the right tool for the job
[22:46:24] <mru> they're tools in both senses
[22:46:36] <Honoome> mru: it seems to work already o_O that line only enables cmov/fastcmov and they are both enabled already
[22:46:44] <mru> if you want anything resembling performance, you have no choice but writing asm yourself
[22:46:58] <mru> Honoome: ah right, it's amd64
[22:47:14] <Dark_Shikari> michaedw: they just aren't as stable period
[22:47:17] <Dark_Shikari> that vectorization crash?
[22:47:18] <Honoome> not even sure if gcc understands -march=barcelona in 32-bit
[22:47:19] <Dark_Shikari> that was on x86t
[22:47:20] <Dark_Shikari> *x6
[22:47:21] <Dark_Shikari> *x86
[22:47:31] <michaedw> mru: agreed. but I prefer to write the asm, then find a way to prod the compiler into generating something close to the same asm from C.
[22:47:47] <mru> Honoome: it does
[22:48:00] <Honoome> mru: then will double check and see to patch that in case
[22:48:07] <mru> michaedw: and then have the next release of the compiler fuck up all over again
[22:48:08] <michaedw> reduces the risk of writing code that is pessimal on someone else's chip
[22:48:11] <mru> no thanks
[22:48:27] <Honoome> Dark_Shikari: -ftree-vectorize is _known_ to produce bad code on x86; it has a much *much* better history on x86-64
[22:48:46] <mru> we've seen orders-of-magnitude slowdowns there too
[22:48:55] <michaedw> x86 is too register-poor to get much value out of it
[22:48:56] <peloverde> -ftree-vectorize is gcc's audio encoders?
[22:49:10] <mru> x86 has the same sse regs as -64 no?
[22:49:28] <Honoome> mru: no doubt on the slowdowns, I'm referring to positively bad code (crashing code) though
[22:49:33] <Dark_Shikari> mru: no, x86_64 has 16 of them
[22:49:46] <mru> oh right
[22:49:47] <michaedw> and generally needs to be combined with manual cache-bypassing load/store
[22:49:53] <mru> 16x128bit?
[22:49:55] <Dark_Shikari> yes
[22:50:19] <mru> so can we just agree that compilers suck?
[22:50:43] <Dark_Shikari> BBB: ugh. I have some code that's failing on only a few of the test cases (MC code)
[22:50:48] <Dark_Shikari> but I'm pretty sure I got the overflow case right
[22:50:53] <michaedw> mru: sure :-)
[22:51:05] <Dark_Shikari> .... oh damn. I didn't
[22:51:08] <Dark_Shikari> it's [1] and [4]
[22:51:10] <Dark_Shikari> not [0] and [5]
[22:51:23] <astrange> do you have the patch that made gcc produce crashing code?
[22:52:07] <Dark_Shikari> astrange: the pixel_t patch did it
[22:52:12] <Dark_Shikari> it's the pixel_memset function that crashes it
[22:52:15] <Dark_Shikari> I don't know on what compilers
[22:52:21] <Dark_Shikari> we got a bug report from s55 from handbrake
[22:52:23] <Dark_Shikari> talk to the handbrake guys
[22:52:28] <Dark_Shikari> it's on win32, iirc
[22:52:43] <Dark_Shikari> I suspect it's another case of gcc not realizing what is and isn't aligned
[22:52:57] <Honoome> mru: sent the patch
[22:53:01] <michaedw> so this data-partitioning H.264 post-processor would need to decode the incoming stream as far as the CABAC expansion, then extract the residual blocks and re-CABAC them separately from the rest of the stream
[22:53:53] <michaedw> that means 4 separate CABAC contexts, 1 for decode and 3 for encode
[22:55:21] <CIA-99> ffmpeg: stefano * r23724 /trunk/libavformat/avformat.h: Fix date specification accepted by parse_date().
[22:55:24] <CIA-99> ffmpeg: stefano * r23725 /trunk/libavformat/avformat.h: Mention how "now" is interpreted in the parse_date() doxy.
[22:55:25] <CIA-99> ffmpeg: stefano * r23726 /trunk/doc/ffmpeg-doc.texi:
[22:55:25] <CIA-99> ffmpeg: Extend documentation for the ffmpeg -timestamp option.
[22:55:25] <CIA-99> ffmpeg: '(' and ')' are used instead of '{' and '}' in the date specification
[22:55:25] <CIA-99> ffmpeg: as the latter confound the texinfo interpreter.
[22:55:26] <CIA-99> ffmpeg: stefano * r23727 /trunk/ffmpeg.c:
[22:55:26] <CIA-99> ffmpeg: Rename rec_timestamp to recording_timestamp, for consistency with
[22:55:26] <CIA-99> ffmpeg: recording_time.
[22:55:44] <BBB> Dark_Shikari: yeah, for sixtap it's [1] and [4], fourtap is same as sixtap except that [0] and [5] are zero, so [1] in sixtap is [0] in fourtap, etc.
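The index relationship described here can be written down as a scalar reference. This is a sketch under assumptions: the tap row -6, 123, 12, -1 is the one quoted earlier in the log, the rounding (+64, then >> 7) assumes taps that sum to 128, and the function names are hypothetical rather than the ones in the VP8 decoder.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Widen a four-tap row to six-tap layout: [0] and [5] become zero, so
 * four[0] lands at six[1] and the negative taps sit at [1] and [4]. */
static void fourtap_to_sixtap(const int four[4], int six[6])
{
    six[0] = 0;
    for (int i = 0; i < 4; i++)
        six[i + 1] = four[i];
    six[5] = 0;
}

/* Filter one pixel. src points at the sample aligned with taps[2];
 * two samples before it and three after it must be readable. */
static uint8_t sixtap_pixel(const uint8_t *src, const int taps[6])
{
    int sum = 64;   /* rounding term for the >> 7 below */
    assert(taps[0] + taps[1] + taps[2] + taps[3] + taps[4] + taps[5] == 128);
    for (int i = 0; i < 6; i++)
        sum += taps[i] * src[i - 2];
    return clip_u8(sum >> 7);
}
```

In plain `int` arithmetic the order of the taps does not matter. The ordering question in the discussion only arises in the SIMD versions, where intermediate sums live in 16-bit lanes.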
[22:56:05] <Dark_Shikari> k, got it working
[22:57:42] <Dark_Shikari> BBB: one protip
[22:57:45] <Dark_Shikari> packuswb X, Y
[22:57:48] <Dark_Shikari> where you don't intend to use the Y half
[22:57:53] <Dark_Shikari> you can put ANYTHING there if you intend to movd afterwards.
[22:57:56] <Dark_Shikari> including X
[22:57:58] <Dark_Shikari> e.g. packuswb X, X
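A scalar model shows why the second operand is irrelevant when only the low half is read back. This sketch models the 64-bit MMX form of packuswb (four words from each operand) and a movd of the low 32 bits; the helper names are invented for illustration and the model is not a substitute for the real instruction reference.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturation of a signed 16-bit value to 0..255. */
static uint8_t sat_u8(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Model of packuswb x, y on 64-bit registers: the four words of x
 * fill bytes 0..3 of the result, the four words of y fill bytes 4..7. */
static void packuswb_model(const int16_t x[4], const int16_t y[4], uint8_t out[8])
{
    assert(x && y && out);
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_u8(x[i]);
        out[i + 4] = sat_u8(y[i]);
    }
}

/* Model of packuswb followed by movd: only bytes 0..3 survive, and
 * those come from x alone. */
static uint32_t pack_low_dword(const int16_t x[4], const int16_t y[4])
{
    uint8_t p[8];
    packuswb_model(x, y, p);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Passing the same array for both operands gives the same low dword as passing a zeroed second operand, which is the register-saving trick described above.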
[22:58:11] <BBB> I guess that makes sense
[22:58:19] <Dark_Shikari> so if you don't have a zero reg
[22:58:21] <Dark_Shikari> you can do Whatever You Want
[22:58:38] <BBB> I remember adding a pxor for that somewhere :-p
[22:58:44] <Dark_Shikari> yup
[22:58:49] <Dark_Shikari> anyways, _almost_ done rewriting it
[22:58:56] <Dark_Shikari> just a last few tweaks and then I'll give it to you
[22:59:10] <BBB> I can just commit, then you commit on top
[22:59:16] <BBB> increases your ohloh status :-p
[22:59:28] <Dark_Shikari> ok
[22:59:32] <Dark_Shikari> however
[22:59:33] <Dark_Shikari> before you do that
[22:59:36] <Dark_Shikari> I'd like you to see what I did
[22:59:43] <BBB> ok
[22:59:53] <BBB> send it up, or reply on-list so others can see it too :-p
[22:59:56] <michaedw> has any particular profiling/tuning been done on the ffmpeg "data partitioning" code path, which has the same issue (3 CABAC contexts, in this case for decode)?
[23:00:07] <Dark_Shikari> also, can I promote an ff_pw_X to xmm_reg if I need it for xmm?
[23:00:12] <Dark_Shikari> (without breaking existing stuff)?
[23:01:06] <BBB> that you need to ask others, I have no idea ;)
[23:02:25] <Dark_Shikari> well whatever, I'll add one more instruction to avoid asking that question
[23:02:28] <Dark_Shikari> also the answer is "no"
[23:02:51] <BBB> I guess you tried and it broke? :)
[23:03:02] <Dark_Shikari> yes
[23:03:10] <Dark_Shikari> ok, finally, I think I'm ready
[23:03:47] <Dark_Shikari> oh, I'm a moron. I had ffv2 locally committed.
[23:05:54] <Dark_Shikari> BBB: see ffmpeg ml
[23:06:07] <Dark_Shikari> I'd like you to understand every change I made
[23:06:15] <Dark_Shikari> if it seems stupid say so
[23:06:17] <Dark_Shikari> because it might be.
[23:06:20] <Dark_Shikari> also I didn't bench any of my mc changes.
[23:06:27] <Dark_Shikari> also the mc code is going to suck gigantic cocks on atom
[23:07:39] <Dark_Shikari> I did bench the idct stuff though. it's faster.
[23:08:03] <BBB> I'll bench it... probably have to do that tonight, wife is calling :-p
[23:08:24] <BBB> if I have questions I'll bug you tonight or tomorrow, I'll look at it
[23:10:00] <BBB> the filter expansions make sense... I did bench it and it didn't really make a speed difference (which is why I decreased them; I had them as your _h first), but I guess it doesn't make logical sense that it's not faster like this...
[23:10:17] <Dark_Shikari> more importantly
[23:10:21] <Dark_Shikari> it lets us do what I did in the V functions
[23:10:31] <BBB> right, separating the V ones makes sense
[23:10:47] <BBB> I was going to try that, but got busy putting the VP8 C decoder in svn instead ;)
[23:11:15] <BBB> there's no changes in the rest of the h4 4x4 filter, is there?
[23:11:27] <Dark_Shikari> h is the same yes
[23:11:33] <Dark_Shikari> nothing major
[23:11:37] <Dark_Shikari> V was rewritten entirely
[23:11:44] <Dark_Shikari> well the main loop was
[23:11:50] <BBB> h6 also looks similar
[23:11:54] <BBB> let's see what you did to my v
[23:11:54] <Dark_Shikari> yes
[23:12:28] <Dark_Shikari> I suspect h can be further optimized, but I didn't spend much time on it
[23:12:31] <Dark_Shikari> I just converted the opening part
[23:12:33] <Dark_Shikari> like you said
[23:13:38] <BBB> hm, so for v4, you free up regs by using [r4] instead of loading the filter in memory, that's smart, I guess
[23:13:44] <CIA-99> ffmpeg: mru * r23728 /trunk/libavcodec/ (bfin/vp3_bfin.c h264idct.c vp3dsp.c vc1dsp.c): Improve some uses of ff_cropTbl with constant offset
[23:13:45] <BBB> oh no you still use mm5
[23:13:49] <BBB> so same number of regs
[23:14:03] <BBB> but you don't splat them, as you said earlier
[23:14:12] <BBB> interesting, let's see how different that is
[23:14:17] <Dark_Shikari> it reduced the number of moves
[23:14:18] <BBB> then v6, my v6 was pretty horrible
[23:14:35] <BBB> oh shit wife is calling
[23:14:40] <BBB> ok, v6 will come tonight :p
[23:16:08] <Dark_Shikari> night
[23:17:29] <Honoome> f-ck
[23:18:03] <mru> what's wrong?
[23:19:02] <Honoome> I've been waiting two weeks for the specs of a job, I receive them, get back to hack at it... another dev introduced the use of bundler (it's a rails app) ... and now I'm spending two weeks just to get the thing working _again_
[23:19:06] <CIA-99> ffmpeg: mru * r23729 /trunk/ (configure libavcodec/Makefile libavcodec/beosthread.c):
[23:19:06] <CIA-99> ffmpeg: Remove beosthreads support
[23:19:06] <CIA-99> ffmpeg: Relevant BeOS variants support pthreads, so there is no need to
[23:19:06] <CIA-99> ffmpeg: maintain the beos-native threads interface.
[23:19:43] <Honoome> [because I don't trust installing the gems through rubygems... what I install is what I'm _sure_ works after testing the heck out of it]
[23:19:49] <ohsix> btw to the /dev/stdout convo earlier; that _is_ a bashism (and there are other paths like /dev/tcp that are also handled in the shell), the real /dev/stdout is there to catch other shells more than anything else
[23:21:08] <Honoome> ohsix: >/dev/stdout is a bashism, /dev/stdout as a file parameter shouldn't be touched by bash (but work by the fact that there is a /dev/stdout for compatibility with bash, at that point)
[23:23:44] <mru> Honoome: guess why I refuse to work on anything web related
[23:24:04] <Honoome> mru: oh I know... I was just in a bit of a pinch :/
[23:24:16] <Honoome> now I'm just trying to finish this task to drop out of the project
[23:26:53] <CIA-99> ffmpeg: flameeyes * r23730 /trunk/configure:
[23:26:53] <CIA-99> ffmpeg: Add barcelona to the list of cmov/fast_cmov compatible CPUs.
[23:26:53] <CIA-99> ffmpeg: For GCC, barcelona is just an alias for amdfam10, so simply add it in
[23:26:53] <CIA-99> ffmpeg: there.
[23:27:27] <Dark_Shikari> can I make all the vp8 mc functions, global or not, have ff for consistency?
[23:27:31] <Dark_Shikari> it makes all the macros cleaner
[23:27:39] <Dark_Shikari> so that a macro doesn't have to know if the function it's calling is asm, or a C wrapper around more asm
[23:27:41] <mru> of course
[23:27:43] <Dark_Shikari> k
[23:27:54] <mru> there are no rules for static symbols
[23:28:02] <mru> except the usual c/posix ones
[23:28:22] <mru> and it's polite to keep names reasonably descriptive
[23:28:31] * mru glares at coreavc
[23:29:54] <Dark_Shikari> moment of truth: sse mc regression time
[23:29:59] <Dark_Shikari> *regression test
[23:30:08] <Dark_Shikari> AND IT WORKS.
[23:30:09] <Dark_Shikari> \o/
[23:30:20] <Dark_Shikari> I love the abstraction layer.
[23:30:31] * Honoome glares at ivi_common.o
[23:30:35] <Honoome> over 500k of .bss ?!
[23:33:42] <Honoome> that's a grand total of 4MB of .bss for the whole ffmpeg build... poor poor cows
[23:35:13] <mru> I keep telling kshishkov to use less .bss...
[23:35:44] <michaedw> it looks like intra_gb_ptr is only used in ff_h264_decode_mb_cavlc(). Does that mean that ffmpeg's h.264 decoder doesn't handle the combination of CABAC and data partitioning?
[23:35:57] <mru> hmm, is that maxim's file?
[23:36:25] <ohsix> Honoome: ya, only mentioned it cuz it was said no bashisms were visible
[23:36:44] <Honoome> mru: ivi_common.c? looks that way
[23:37:16] <michaedw> sorry, that should have gone to #ffmpeg
[23:37:56] <Kovensky> <Honoome> that's a grand total of 4MB of .bss for the whole ffmpeg build... poor poor cows <-- cows? o_O
[23:38:03] <mru> copy-on-write
[23:38:04] <Honoome> Copy-on-Writes
[23:38:08] <Kovensky> lol
[23:38:13] * Kovensky facepalms
[23:38:44] <mru> there's also the theory that .bss stands for bull-shit segment
[23:39:02] <mru> it kind of ties in
[23:40:00] <spaam> haha .)
[23:40:20] <ohsix> wasn't it some ancient grandfathered assembler thing
[23:43:43] <mru> Honoome: see r21977
[23:43:56] <mru> that file's .bss was even bigger before
[23:44:00] <mru> due to a silly typo
[23:44:41] <Honoome> gha
[23:46:15] <michaedw> My employer might be interested in sponsoring some work on ffmpeg to support experiments with data partitioning and unequal error protection. What would be an appropriate channel to explore that?
[23:48:22] <mru> Dark_Shikari or michael
[23:49:33] * kierank wonders why michaedw's employer would be interested in such things considering their vast purchase
[23:50:04] <mru> what did they buy?
[23:50:09] <kierank> tandberg
[23:50:15] <michaedw> interop is good
[23:50:22] <kierank> LOL
[23:51:28] <mru> michaedw: which part do you work in?
[23:51:32] <michaedw> and hey, it's a big company, and different efforts move at different speeds
[23:51:51] <michaedw> it's telepresence-related
[23:52:35] <michaedw> (telepresence. that has to be the silliest marketing coinage in this area.)
[23:53:19] <kierank> the fact that is in all recent michael bay films is even sillier
[23:54:23] <michaedw> getting the end of the dinosaur that operates the eyes to talk to the end that operates the tail can be ... slow.
[23:55:24] <michaedw> and the end that my eyes are connected to thinks that we ought to make an attempt to measure things before we go merging apples and rutabagas
[23:56:21] <kierank> get some more "human network" adverts please
[23:56:27] <kierank> with more baba o'reilly
[23:56:30] <kierank> those rocked
[23:57:05] <michaedw> so if anyone would be interested in measuring what's achievable within the existing, under-implemented corners of the H.264 spec, we might have common interests :-)
[01:04:29] <saintdev> peloverde: where should i put this sample?
[01:06:21] <CIA-92> ffmpeg: bcoudurier * r23672 /trunk/tools/qt-faststart.c: fail if input and output are the same
[01:07:30] <bcoudurier> just ported hqdn3d to avfilter
[01:08:17] <peloverde> saintd3v, incoming
[01:08:26] <saintdev> can i upload there?
[01:08:38] <peloverde> yes
[01:08:47] <peloverde> incoming is write-only
[01:09:42] <saintdev> i knew that i just didn't know if anonymous had write permission
[01:12:19] <peloverde> let me know the filename when it is done
[01:12:40] <saintdev> ok, trying to get one that does it :P
[01:24:18] <saintdev> woohoo, got one
[01:24:37] <saintdev> peloverde: incoming/chan_elem_not_alloc/
[01:28:28] <peloverde> ok
[01:29:54] <mru> wtf is the matter with michael?
[02:01:56] <Kovensky> what happen
[02:05:11] <saintdev> somebody set up us the bomb
[02:06:24] <Kovensky> o rly
[02:07:32] <peloverde> o_O ಠ_ಠ
[02:08:29] * saintdev asplodes
[02:13:23] <astrange> i see faulty dts detection in ffplay, what kind of file could have non-monotonic dts?
[02:42:39] <peloverde> saintd3v, what's wrong with that file?
[02:43:32] <astrange> http://samples.mplayerhq.hu/archive/container/mpeg/mpeg%2bmpeg2video%2bac3%… this one
[02:44:40] <peloverde> saintd3v, it spams some garbage but that's because it has a broken frame upfront, faad wont even play it
[02:45:00] <mru> why the hell does michael suddenly not want to optimise things?
[02:45:29] <peloverde> I wish he felt the same way about PS
[02:47:49] <saintdev> peloverde: ok, that's what it was. i suspected it was an incomplete frame. I just wasn't sure.
[02:53:05] <peloverde> In fact this is actually a pretty good example of the error tolerance of ffaacdec
[02:53:59] <peloverde> faad just worthlessly says "Error: Channel coupling not yet implemented"
[02:54:20] <mru> lol yet
[02:55:35] <saintdev> peloverde: how about a sample that causes a segfault?
[02:55:46] <saintdev> probably the same issue (incomplete leading frame)
[02:56:00] <peloverde> yes, segfaults are more interesting
[02:56:19] <saintdev> ok, do you want a bt, or not?
[02:56:54] <peloverde> bt?
[02:56:58] <saintdev> backtrace
[02:57:00] <mru> gdb
[02:57:10] <peloverde> ahh
[02:57:18] <peloverde> I can make my own/valgrind it
[02:57:22] <saintdev> ok
[02:58:57] <peloverde> "chan_elem_not_alloc-and-gt_1_rdb.aac" seems valgrind clean
[03:06:39] <saintdev> peloverde: incoming/ffmpeg_aacdec_segfault/
[03:06:46] <peloverde> ok
[03:49:11] <peloverde> Wow, that segfault file is behaving strangely
[03:51:08] <saintdev> should i have kept the gnomes out of my computer then?
[03:52:45] <peloverde> The first frame is an empty frame, wow!
[03:56:25] <peloverde> try_decode_frame handles the first frame fine because channel count is zero, then the second frame sets the channel count, and in spectral to sample it's called with the second channel count
[04:05:58] <CIA-92> ffmpeg: alexc * r23673 /trunk/libavcodec/aacdec.c: aacdec: Handle the first frame being empty case.
[04:06:13] <saintdev> \o/
[04:06:30] <peloverde> That should fix it
[04:07:03] <saintdev> sure does
[04:07:08] <astrange> ffmpeg -i <something with subtitles.mkv> -scodec xsub f.mkv creates a track with no codec type
[04:07:17] <astrange> at least according to mkvinfo
[04:08:06] <peloverde> saintd3v, and faad says "Error: Invalid number of channels"
[04:08:07] <CIA-92> ffmpeg: alexc * r23674 /trunk/libavcodec/aacdec.c: aacdec: Factorize if (elem_type < TYPE_DSE).
[04:09:58] <saintdev> yeah :P
[04:10:22] <saintdev> thanks again, i'll let you know if i have any more issues
[04:10:30] <peloverde> ok
[04:11:07] <CIA-92> ffmpeg: alexc * r23675 /trunk/libavcodec/aacdec.c: aacdec: cosmetics: whitespace
[04:15:08] <CIA-92> ffmpeg: alexc * r23676 /trunk/libavcodec/aacdec.c: aacdec: cosmetics: (more) whitespace
[04:16:07] <CIA-92> ffmpeg: astrange * r23677 /trunk/ffmpeg.c: ffmpeg: cosmetics: combine two variable declarations
[06:54:01] <Tjoppen> "Georgia To Become IT Tax Haven"
[06:54:59] <thresh> we should've invaded Tbilisi when we had a chance..
[06:58:58] <kshishkov> thresh: then Georgia wouldn't have its
[07:00:37] * kshishkov is very glad not to live in Ukraine now though
[07:01:17] <elenril> did they go bankrupt already?
[07:04:20] <thresh> kshishkov: you moved?
[07:04:48] <av500> he did
[07:04:51] <kshishkov> elenril: who? If you mean Ukraine then it's almost bankrupt for two decades
[07:05:01] <av500> he, like greece
[07:07:27] <elenril> kshishkov: almost
[07:07:37] <kshishkov> av500: Greece is luckier, it just borrowed a lot of money and will enjoy watching Germans paying its debts
[07:07:47] <elenril> lolgreece
[07:09:10] <elenril> fun fact: slovakia (with much worse hdi/quality of life) had to borrow tons of money so they can give them to greece
[07:10:15] <elenril> at least cz is not in eurozone
[07:11:29] <Tjoppen> ooh, BBB applied my http patch. I'll have to test it and provide some feedback
[07:13:26] <av500> kshishkov: greece already payed all money it borrowed back by buying german submarines
[07:15:35] <kshishkov> elenril: really? Do they still use krona?
[07:15:55] <elenril> yes
[07:23:02] <wbs> elenril: any progress on them changing to euro? estonia is changing next year iirc
[07:25:08] <av500> is it "progress"?
[07:25:22] <elenril> ^this
[07:25:34] <elenril> our goverment doesn't want to change
[07:26:18] * kshishkov thought that Estonia had kronor for currency as any proper country
[07:26:53] <elenril> a few weeks ago our national bank got a new governor, and his first official statement was that we won't be switching to euro anytime soon
[07:27:40] * kshishkov still has a few thousand SEK to spend on lucky occasion
[07:27:50] <Tjoppen> :)
[07:27:52] <wbs> kshishkov: they have at the moment, yes
[07:28:00] <Tjoppen> sweden is still on SEK
[07:28:19] <wbs> av500: well, at least for those travelling a lot, it simplifies things ;P
[07:28:31] <kshishkov> Tjoppen: last year Sweden also used öre
[07:28:54] <Tjoppen> öre will be dropped in about a year or two
[07:29:08] <kshishkov> I thought it was already
[07:29:28] <av500> Tjoppen: dropped into the öresund?
[07:29:31] <Tjoppen> nope. but they switched to cheaper and smaller 50 öre coins a couple of years ago
[07:30:37] <iive> Tjoppen: would you mind switching your client to utf8 encoding. you are outputting latin1.
[07:32:20] <superdump> no one gives a crap about öre it seems
[07:32:28] <Tjoppen> not until next time this server restarts
[07:32:40] <Tjoppen> restarting screen+irssi is quite a pain
[07:32:56] <superdump> it's like a half-penny
[07:32:58] <wbs> Tjoppen: you can easily change what charset it outputs to a given channel
[07:33:00] <elenril> you don't need to restart anything
[07:33:17] <superdump> well, more like 5p i guess
[07:33:22] <Tjoppen> wbs: irssi - yes. screen - no (I think)
[07:33:31] <superdump> but 5p isn't useful really, except for 5p sweets
[07:33:42] <kshishkov> superdump: yes, more like fivepence
[07:33:45] <wbs> Tjoppen: yes, and you can use latin1 for your screen, but still making irssi recode all you write into utf8 for this channel
[07:33:48] <elenril> screen isn't utf-8 by default?
[07:33:58] <wbs> elenril: depends on your locale
[07:34:04] <Tjoppen> nope. it needs -U
[07:34:28] * elenril hopes they'll purge anything non-unicode soon
[07:34:31] <Tjoppen> oddly enough, my terminal has UTF-8. somewhere along the line it changes the encoding
[07:36:29] <siretart> moroning
[07:37:09] * kshishkov moves siretart to MKAD
[07:38:04] <elenril> so what's up with switching to git? are we waiting for end of gsoc or not?
[07:38:20] <superdump> kshishkov: what amuses me is watching "who wants to be a millionaire?" on swedish tv knowing that they'll get ~11x less than one would in england
[07:40:05] <kshishkov> superdump: it should be funnier in Russia - if they use the same numbers they get ~5 times less than in Sweden
[07:40:20] <superdump> :)
[07:46:37] <KotH> moin
[07:46:45] <kshishkov> gruess dich
[07:46:52] <av500> grüezi
[07:49:48] <siretart> kshishkov: what's MKAD?
[07:55:50] <benoit-> Moscow or Minsk?
[08:18:53] <kshishkov> benoit-: stupid question, MKAD in Minsk is not a state border
[09:01:35] <twnqx> hm. how would i run gdb on a program that reads from stdin - or rather, how do i pipe so that the piped data hits the debuged program and not gdb? :X
[09:02:20] <pross-au> cat foo | gdb --args ./a.out
[09:03:16] <twnqx> no, in that case gdb reads the data...
[09:03:51] <twnqx> and i get stuff like (gdb) warning: bad breakpoint number at or near '|20100611000255|6|21|ftp|754|867'
[09:05:51] <twnqx> oh well, coredump debugging will do
[09:06:01] <Tjoppen> use a named pipe?
[09:07:16] <twnqx> i'd have to rewrite the program. also, problem already solved :)
[09:08:06] <janneg> twnqx: write it into /proc/PID/fd/0
[09:08:32] <twnqx> hm
[09:08:41] * twnqx will try to remember when the program crashes again
[09:18:12] <CIA-92> ffmpeg: diego * r23678 /branches/0.6/RELEASE: Fix two small typos.
[09:24:56] <Kovensky> <@Tjoppen> wbs: irssi - yes. screen - no (I think) <-- C-a : utf8 on
[09:31:24] <Tjoppen> räksmörgås
[09:31:39] <spaam> Nice
[09:31:42] <Tjoppen> \o/
[09:31:56] <Tjoppen> ☃
[09:32:04] <iive> Tjoppen: congratulations :)
[09:32:09] <Tjoppen> that didn't seem to work
[09:32:33] <Tjoppen> (unicode snowman)
[09:32:37] <iive> looks like ring or a bomb
[09:32:52] <Tjoppen> I can paste it, but it doesn't show up correctly here
[09:33:10] <Tjoppen> http://unicodesnowmanforyou.com/
[09:33:39] <Tjoppen> or http://☃.net
[09:34:42] <iive> it shows correctly here, but antialiasing makes horrible things with it. inverted (selected) looks much better.
[09:38:54] <spaam> Tjoppen: using windows?
[09:40:29] <Tjoppen> nope, ubuntu
[09:40:46] <Tjoppen> the machine the screen is on run who-knows-what
[09:41:01] <Tjoppen> ah, solaris 10
[09:41:20] <spaam> what terminal do you use ? :)
[09:41:39] <Tjoppen> gnome terminal
[09:41:44] <spaam> urxvt is better with unicode stuff :)
[09:42:12] <Tjoppen> bah, now all other channels encoding-fail
[09:45:20] <wbs> Tjoppen: like I said, you don't need to change it all for once, and it doesn't need to be the same as for other channels
[09:45:33] <wbs> Tjoppen: irssi has a flexible recoding framework, and you can set which charset to send to each channel
[09:46:08] <KotH> unfortunately, that framework does not allow channels with mixed encodings
[09:46:17] <Tjoppen> maybe so, but screen doesn't
[09:46:30] <Tjoppen> or.. hm
[09:46:44] <wbs> KotH: it handles latin1 and utf8 just fine at least
[09:47:01] <wbs> Tjoppen: yes, but screen should be set up to use the same as your terminal
[09:47:43] <wbs> if you've got a latin1 terminal and write åäö on a utf8-channel, irssi sends it encoded as utf8 over the line to the other users, but still displays it using latin1 on your own terminal
[09:48:04] <wbs> and likewise, when receiving things that other write, it detects which one of the charsets it uses and recodes it back to the charset your terminal uses
[09:48:11] * Tjoppen head asplode
[09:50:20] <wbs> you could just have done /recode add #ffmpeg-devel utf-8
[09:50:53] <wbs> and /set recode_out_default_charset latin1 so that it doesn't mess with your other channels
[10:36:38] <CIA-92> ffmpeg: diego * r23679 /trunk/libavutil/ (mips avr32): Ignore compiled headers.
[11:09:14] <twnqx> hm. are two-dimensional arrays implemented via pointers to pointers extremely inefficient?
[11:10:29] <Tjoppen> depends on how you iterate over it
[11:11:22] <mru> twnqx: why would you do that?
[11:11:47] <mru> the speed inefficiency obviously depends on how you access it
[11:12:09] <twnqx> http://pastebin.com/UXYheAZ5 - both function have the same absolute runtime in the profiler :X
[11:12:33] <mru> what should be looking at?
[11:12:38] <mru> *I
[11:12:46] <twnqx> the insert_line function...
[11:12:59] <twnqx> if if-clause hits about 26 times out of 120000 calls
[11:13:30] <twnqx> still it has the same absolute runtime as the parser below with lots of ascii-to-integer conversions
[11:13:43] <twnqx> 53.52 0.24 0.24 4801308 0.00 0.00 insert_line
[11:13:43] <twnqx> 42.37 0.43 0.19 4802382 0.00 0.00 parse_line
[11:13:46] <twnqx> if not worse.
[11:13:59] <mru> don't tell me you're using gprof
[11:14:04] <twnqx> i was...
[11:14:09] <mru> gprof cannot be trusted
[11:14:16] <mru> use oprofile
[11:14:27] <twnqx> meh, my kernel is not profiling enabled...
[11:14:32] <twnqx> but well.
[11:15:23] <mru> anyhow, accessing a single element from two-level pointer needs two memory loads
[11:15:27] <mru> only one with a flat array
[11:15:38] <twnqx> mh
[11:15:40] <mru> some CPUs have stupidly long load latencies
[11:16:27] <mru> if you're traversing it row-wise and cache the row pointers in local variables, it shouldn't make much difference
[11:16:28] <twnqx> the flat array would have 254*65535 never-to-be-used elements though
[11:16:37] <mru> sparse array?
[11:16:41] <twnqx> yes
[11:16:46] <mru> why didn't you say so
[11:16:52] <mru> that's a different story entirely
[11:16:52] <twnqx> 254 * 1 element, 2* 65536
[11:17:13] <twnqx> but fully random access anyway, and the memory to flatten it should be available
[11:17:23] <mru> are the rows also sparse?
[11:17:33] <twnqx> no
[11:17:46] <mru> so rows that exist have lots of things in them?
[11:17:59] <twnqx> each "element" is two longs
[11:18:11] <twnqx> (64bit long)
[11:18:32] <mru> how many in each row?
[11:18:47] <twnqx> either 1 (254 times), or 65536 (2 times)
[11:19:21] <mru> I don't get it
[11:19:35] <mru> if your 2d array were flat, how would you declare it?
[11:19:53] <twnqx> go non-sparse and waste the memory.
[11:20:05] <mru> just for sake of argument
[11:20:17] <mru> how would you declare the flat array?
[11:20:19] <mru> non-sparse
[11:20:32] <twnqx> mh. long array [256*65536*2];
[11:20:48] <mru> hmm flat was the wrong word
[11:20:58] <mru> how would you declare a simple 2D non-sparse array
[11:21:00] <mru> ?
[11:21:23] <twnqx> ugh. long array [256][65536]
[11:21:37] <mru> you said each element was two longs
[11:21:44] <twnqx> yeah.. i have a type for that...
[11:21:47] <mru> I'm guessing that struct at the top of your paste
[11:21:57] <twnqx> yes
[11:22:25] <twnqx> let me reboot quickly to use oprofile.
[11:27:11] <mru> so, you have 256 rows of 64k elements
[11:27:54] <mru> now, how are the populated elements distributed?
[11:28:58] <twnqx> in row 6 and row 17 about about 1-65535, in row 1, 41, 47 and 50 element 0
[11:29:21] <mru> I can't parse that
[11:29:45] <mru> are you saying only a few rows have any elements at all?
[11:29:52] <twnqx> it's about ip protocols (256 possible protocols), and ports for tcp and udp.
[11:30:03] <twnqx> yes, that's what i'm saying.
[11:30:47] <mru> and for the rows that exist, how many elements do they have?
[11:32:38] <twnqx> as i said, 6 and 17 will have 65536. all others will have one.
[11:32:48] <twnqx> (even if never used those will be allocated)
[11:33:31] <twnqx> for (i=0; i<256; i++) insert->volume[i] = calloc (sizeof(volume_t), (i == tcp || i == udp) ? 65536 : 1);
[11:33:35] <twnqx> this is the allocator...
[11:33:54] <mru> I'm starting to think this is the wrong approach
[11:33:55] <KotH> twnqx: if you want to store sparce matrices, there are data structures for exactly that
[11:34:02] <KotH> s/sparce/sparse/
[11:34:10] <av500> XML?
[11:34:39] <mru> av500: xml is for padding the unused cells
[11:34:42] <twnqx> let me just see if gprof was totally wrong first... and figure out how to use profile
[11:34:50] <twnqx> profile*
[11:34:53] <twnqx> oprofile*
[11:36:31] <mru> KotH: a generic sparse matrix isn't the right solution here
[11:37:15] <twnqx> well, i could remove the first level of pointers directly...
[11:38:17] <KotH> mru: hmm.. i havent read everything
[11:38:20] * KotH is lazy
[11:38:41] * KotH just wants to sound important by stating the obviously right and disapear again
[11:38:41] <mru> KotH: a generic data structure is almost never the best solution
[11:38:55] <mru> because reality is rarely generic
[11:40:09] <mru> twnqx: for tcp and udp, how many ports will actually be used?
[11:40:13] <mru> probably not all
[11:40:27] <twnqx> uh... just assume about all
[11:40:34] <mru> sure, lets pretend
[11:40:46] <twnqx> some hundred k user + P2P will cause it to be about exhaustive
[11:40:58] <mru> here's what you do
[11:41:03] <KotH> twnqx: what kind "thing" are you designing?
[11:41:13] <mru> flat array of 256+2*65k elements
[11:42:16] <mru> if (tcp) index=256+port; else if (udp) index=256+64k+port; else index=protocol;
[11:42:33] <KotH> that'll be 128MB of data
[11:42:38] <mru> no
[11:42:56] <mru> 1MB
[11:43:18] <KotH> oh.. right
[11:43:19] <mru> ~128k * sizeof(elem)
[11:43:39] <mru> if most of them are used, it's good enough
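The flat layout mru describes can be sketched in C as below. This is only an illustration under assumptions: the names `volume_index`, `volume_table_alloc`, `volume_add` and the field names `packets` and `bytes` are hypothetical (the chat only says each element is two 64-bit longs), and ports are tracked for TCP (protocol 6) and UDP (protocol 17) only, with every other protocol getting a single slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type: two 64-bit counters per slot. */
typedef struct {
    uint64_t packets;
    uint64_t bytes;
} volume_t;

enum {
    PROTO_TCP   = 6,
    PROTO_UDP   = 17,
    NUM_PROTOS  = 256,
    NUM_PORTS   = 65536,
    /* 256 per-protocol slots, then one slot per TCP port, then per UDP port. */
    TABLE_SLOTS = NUM_PROTOS + 2 * NUM_PORTS
};

/* Map (protocol, port) to a slot in one contiguous table.
 * The port is ignored for everything except TCP and UDP. */
static size_t volume_index(unsigned protocol, unsigned port)
{
    assert(protocol < NUM_PROTOS && port < NUM_PORTS);
    if (protocol == PROTO_TCP)
        return (size_t)NUM_PROTOS + port;
    if (protocol == PROTO_UDP)
        return (size_t)NUM_PROTOS + NUM_PORTS + port;
    return protocol;
}

/* One zeroed allocation instead of 256 separately allocated rows. */
static volume_t *volume_table_alloc(void)
{
    return calloc(TABLE_SLOTS, sizeof(volume_t));
}

/* Account one packet of the given size; a single load reaches the slot. */
static void volume_add(volume_t *table, unsigned protocol, unsigned port,
                       uint64_t bytes)
{
    volume_t *v = &table[volume_index(protocol, port)];
    v->packets += 1;
    v->bytes   += bytes;
}
```

With a 16-byte element this table has 131328 slots, roughly 2 MiB, and the slots at index 6 and 17 in the first 256 simply stay unused. Whether the index is computed with `?:` or `if`/`else` should matter little; the gain comes from dropping the second pointer load.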
[11:44:41] <twnqx> port statistics generator... for a somewhat larger network operator
[11:47:19] <mru> hang on, this ain't right
[11:47:24] <mru> other protocols have ports too
[11:47:27] <mru> sctp for one
[11:49:32] <mru> but maybe you don't care about ports for the more obscure protos
[11:53:35] <av500> twnqx: so, you will censor us all soon....
[11:54:11] <mru> av500: we can trick him to include a buffer overflow to unlock it
[11:55:04] <twnqx> heh
[11:55:19] <twnqx> i built saudi arabias filter though :X
[11:55:42] * mru sends twnqx a turbomb
[11:55:52] <av500> I guess that was easy, apply wire cutters to uplink
[11:59:36] <mru> http://www.theregister.co.uk/2010/06/21/amiga_x1000/
[11:59:38] <mru> whyyyyyyyyyyyyy
[12:00:22] <av500> dont show basty_CDGS
[12:01:11] <mru> http://acp.atari.org/
[12:01:59] <mru> there's something fundamentally wrong with designing old computers
[12:02:43] <mru> running an original vintage machine is one thing
[12:03:21] <KotH> friend of mine once designed a pdp-11 "emulator" on a fpga
[12:03:45] <mru> pdp-11 implementations are a special case
[12:04:10] <mru> it's the hw equivalent of running doom on as many devices as possible
[12:05:32] <KotH> mru: either these guys have nothign else to do or too much time at hand
[12:05:38] <KotH> or both
[12:05:52] <KotH> mru: and th company who's doing that is just a few km south of zh
[12:06:31] <mru> you know what you have to do
[12:06:44] <KotH> call them and ask them for a sample? ;)
[12:07:45] <av500> join them?
[12:09:18] <Tjoppen> are you supposed to only set AVPacket::pts before interleaving? av_interleaved_write_frame() nags about pts < dts otherwise. I'm guessing it infers better values for dts
[12:10:16] <Tjoppen> max_b_frames = 2 -> dts = 0, 1, 2, 3; pts = 0, 3, 1, 2
[12:13:00] <Tjoppen> works if only setting pts. I guess it does some kind of sorting maneuver
[12:13:40] <wbs> Tjoppen: yes, lavf does that internally, see lavf/utils.c, compute_pkt_fields2 or something similar
[12:13:53] <wbs> Tjoppen: it sets dts = -2, -1, 0, 1, in that case, iirc
[12:14:04] <mru> Tjoppen: that's what you get with an IBBP sequence
[12:15:44] <Tjoppen> ok. that makes sense
[12:16:20] <Tjoppen> I think I'll stick to only setting pts then. either using pts or dts, depending on which is AV_NOPTS_VALUE
[12:16:32] <Tjoppen> but wait a minute.. how the hell does that work with VFR?
[12:17:08] <Tjoppen> if I have pts = 0, 10, 1, 5..
[12:17:22] <wbs> Tjoppen: then it produces -2, -1, 0, 1, 5, 10
[12:17:37] <wbs> the source has it all
[12:17:50] <Tjoppen> right, it doesn't matter too much exactly when decoding happens, as long as it's before presentation
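The dts values wbs quotes can be reproduced with a small reorder buffer. The following C sketch is loosely modelled on that description and is not a copy of the lavf code; `DtsGuess`, `dts_guess_init`, `dts_guess_next`, `NOPTS` and `MAX_DELAY` are hypothetical names, and a fixed per-packet duration of 1 is assumed in the examples.

```c
#include <assert.h>
#include <stdint.h>

#define NOPTS     INT64_MIN   /* stand-in for AV_NOPTS_VALUE */
#define MAX_DELAY 16

typedef struct {
    int     delay;                       /* reorder depth, e.g. 2 for two B-frames */
    int64_t pts_buffer[MAX_DELAY + 1];   /* kept sorted in ascending order */
} DtsGuess;

static void dts_guess_init(DtsGuess *g, int delay)
{
    assert(delay >= 0 && delay <= MAX_DELAY);
    g->delay = delay;
    for (int i = 0; i <= MAX_DELAY; i++)
        g->pts_buffer[i] = NOPTS;
}

/* Feed the pts of the next packet (in coded order), get a dts guess back. */
static int64_t dts_guess_next(DtsGuess *g, int64_t pts, int64_t duration)
{
    int i;

    g->pts_buffer[0] = pts;

    /* On the first packet, pre-fill the empty slots with timestamps that
     * lie `delay` durations before the first pts. */
    for (i = 1; i <= g->delay && g->pts_buffer[i] == NOPTS; i++)
        g->pts_buffer[i] = pts + (i - g->delay - 1) * duration;

    /* One insertion pass moves the new pts to its sorted position. */
    for (i = 0; i < g->delay && g->pts_buffer[i] > g->pts_buffer[i + 1]; i++) {
        int64_t tmp          = g->pts_buffer[i];
        g->pts_buffer[i]     = g->pts_buffer[i + 1];
        g->pts_buffer[i + 1] = tmp;
    }

    /* The smallest buffered timestamp becomes the dts. */
    return g->pts_buffer[0];
}
```

With `delay = 2`, feeding pts 0, 3, 1, 2 yields dts -2, -1, 0, 1, and the VFR sequence 0, 10, 1, 5 yields the same first four values, which matches the start of the sequence quoted above. Every dts stays at or below its packet's pts, which is all the muxer needs.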
[12:56:12] <mru> Dark_Shikari: ping
[12:56:28] <Dark_Shikari> mru: pong
[12:56:54] <mru> do you have any statistics of block_last_index for typical videos?
[12:57:01] <Dark_Shikari> Depends heavily on the video format.
[12:57:19] <Dark_Shikari> But if you want to make a custom idct, I've seen 3 types of shortcut-idct:
[12:57:25] <Dark_Shikari> (and on the bitrate)
[12:57:29] <Dark_Shikari> 1: idct dc (easy)
[12:57:48] <mru> but not all too common in high-bitrate video
[12:57:52] <Dark_Shikari> no, still quite common
[12:57:55] <Dark_Shikari> totally worth it
[12:58:00] <Dark_Shikari> 2) idct that handles block_last_index 2 or so (coreavc does this one)
[12:58:18] <Dark_Shikari> 3) idct for the top left of the dct (i.e. 10 coeffs, due to the zigzag pattern), theora does this
[12:58:24] <Dark_Shikari> 3) is very very useful even at high bitrates
[12:58:52] <mru> yes, I suspected as much
[12:59:18] <Dark_Shikari> also, keep in mind for non-bit-exact idcts, shortcut idcts tend to be easier.
[12:59:38] <Dark_Shikari> e.g. for coreavc's 2-coeff idct, it still has to emulate a transpose
[12:59:44] <Dark_Shikari> it can't e.g. add basis functions
[13:00:20] <mru> I'm just thinking about the interfaces
[13:00:54] <mru> passing block_last_index to the idct makes sense
[13:01:15] <Dark_Shikari> absolutely
[13:01:18] <mru> so each implementation can choose whatever shortcuts it wants
[13:02:15] <mru> and call the same function even for dc-only
[13:02:28] <mru> some codecs already have a special idct_dc pointer
[13:05:06] <av500> mru: what was the name of that env var that pkgconfig does not honour?
[13:06:11] <mru> av500: pick any name
[13:06:19] <av500> lol
[13:07:03] <av500> never mind, coworker found the fix: rm -rf $(BUILD_DIR)/lib/pkgconfig
[13:08:05] * mru likes that style
[13:20:43] <mru> Dark_Shikari: do you understand michael's latest reply?
[13:22:55] <Dark_Shikari> Nope.
[13:22:57] <Dark_Shikari> lol
[13:29:31] <kshishkov> I think he hints on the fact that 63rd 1-bit element will cause noise less than rounding value for DC, so it's safe to ignore it
[13:30:06] <mru> a 1 in the 63rd coeff scatters +-1 over the block, yes
[13:30:10] <mru> but the spec mandates it
[13:30:29] <mru> otherwise we could just skip it entirely
[13:30:40] <mru> and he rejected that
[13:30:50] <mru> rightfully
[13:30:55] <mru> I hadn't checked the spec
[13:34:34] <Dark_Shikari> kshishkov: It isn't less than rounding value for DC at high quants
[13:39:51] <BBB> Dark_Shikari: http://ffmpeg.pastebin.com/83HxCxGQ ?
[13:39:58] <BBB> that's the idct_dc_add
[13:40:18] <BBB> I unrolled the loop also, made no difference
[13:40:32] <Dark_Shikari> for that you should unroll the loop, it will basically be shorter
[13:40:38] <Dark_Shikari> oh, you're doing it the slow way!
[13:40:41] <Dark_Shikari> you need to see the fast way to do it.
[13:40:46] <BBB> ?
[13:40:53] <BBB> I'm a beginner
[13:41:02] <BBB> there's a fast way?
[13:41:18] <Dark_Shikari> x264: common/x86/dct-a.asm
[13:41:22] <BBB> the problem is that the DC is 16-bit, so I do byte->word->byte
[13:41:23] <Dark_Shikari> lines 315 to 350
[13:41:36] <Dark_Shikari> Aka how to avoid the punpck.
[13:42:39] <Dark_Shikari> if you can't figure out why it works, ask questions until you do.
[13:43:49] <Dark_Shikari> Now unroll it and it'll only be 4 load, 8 add/sub, 4 store. plus init code.
[13:44:17] <BBB> there's no sub
[13:44:18] <BBB> only add
[13:44:22] <BBB> ?
[13:44:34] <BBB> I probably have to read your code first :-p
[13:45:00] <Dark_Shikari> I gave you the lines of code
[13:45:00] <Dark_Shikari> read them
[13:45:09] <Dark_Shikari> and yes there's sub.
[13:46:46] <BBB> dc = (block[0] + 4) >> 3; (16x) av_clip_uint8(dst[0] + dc)
[13:46:49] <BBB> wheres the sub?
[13:47:06] <BBB> and why do you calculate the negative dc?
[13:47:31] <BBB> you do mm0=(block+32)>>7, mm1=-mm0
[13:47:37] <Dark_Shikari> You need to add a 9-bit signed integer to a uint8_t.
[13:47:44] <Dark_Shikari> Right?
[13:48:11] <Dark_Shikari> One way to do this is to do an add with saturation, then a subtract with saturation.
[13:48:41] <Dark_Shikari> for example.
[13:48:55] <BBB> I think I get it
[13:49:00] <Dark_Shikari> suppose dc is 130
[13:49:01] <BBB> if it's negative, the add is 0
[13:49:03] <Dark_Shikari> 80 += 130
[13:49:05] <BBB> because it's saturated?
[13:49:05] <Dark_Shikari> 80 -= 130
[13:49:06] <Dark_Shikari> Yeah
[13:49:17] <BBB> if postiive, the sub is 0
[13:49:19] <Dark_Shikari> yup
[13:49:21] <BBB> because you saturated it
[13:49:41] <BBB> interesting
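The add-then-subtract trick can be checked with a scalar model in C. This is a sketch, not the x264 or ffmpeg routine: `add_sat_u8` and `sub_sat_u8` stand in for the byte-wise unsigned saturating instructions (paddusb and psubusb), and `dc_add_4x4` and `clip_u8` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an unsigned saturating byte add (what paddusb does per byte). */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255 ? 255 : (uint8_t)s;
}

/* Scalar model of an unsigned saturating byte subtract (psubusb per byte). */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Clamp an int to 0..255, like av_clip_uint8(). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add a signed DC value to every pixel of a 4x4 block without widening
 * the pixels to 16 bits: split dc into a non-negative part and a
 * non-positive part, each clamped to a byte. One of the two is always 0. */
static void dc_add_4x4(uint8_t *dst, int stride, int dc)
{
    const uint8_t pos = clip_u8(dc);    /* dc if positive, else 0 */
    const uint8_t neg = clip_u8(-dc);   /* -dc if negative, else 0 */

    assert(dst != NULL);
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            uint8_t *p = &dst[y * stride + x];
            *p = sub_sat_u8(add_sat_u8(*p, pos), neg);
        }
}
```

For a positive dc the subtract is a no-op, and for a negative dc the add is a no-op, so the result equals `clip_u8(pixel + dc)` for every pixel value. In SIMD code the two constants are broadcast once and each row then costs one load, one add, one subtract and one store.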
[13:49:48] <BBB> my blocks are only 4x4 though
[13:50:14] <BBB> let me test that
[13:50:17] <BBB> it makes a little sense
[13:50:25] <Dark_Shikari> still works on 4x4
[13:50:28] <BBB> have to make sure it's faster for a 4x4 block
[13:50:36] <BBB> right, but for a 4x4 I'm not 100% sure it's still faster
[13:50:37] <Dark_Shikari> already confirmed for h264
[13:50:37] <BBB> it might be
[13:50:40] <BBB> have to test :)
[13:50:49] <Dark_Shikari> ffmpeg already does it for h264
[13:50:54] <BBB> ok
[13:50:58] <BBB> I'll check that function
[13:51:08] <Dark_Shikari> which is 4x4.
[13:52:25] <Dark_Shikari> it's 5 extra psub to save 4 punpck and 4 packuswb
[13:52:28] <Dark_Shikari> so 3 instructions less
[13:53:03] <BBB> true
[13:53:56] <BBB> I noticed psubusb/paddusb, but wasn't sure how helpful they'd be because the DC is short, not 8-bit, I didn't know it was 9-bit (8+sign?)
[13:54:12] <BBB> but I guess it doesn't matter
[13:54:16] <BBB> because it's saturated
[13:54:17] <BBB> ...
[13:54:20] * BBB needs to think more
[13:54:30] <BBB> I guess learning asm is good ;)
[13:54:36] <BBB> we should do sse2 some day now
[13:54:56] <BBB> also, if pshufw is absent, it's a mmx-only function, right? no mmxext
[13:55:01] <twnqx> mru: how would you access that suggested array, something like #define indexof(prot, port) (prot == 6 ? 256 + port : prot == 17 ? 256 + 65536 + port : prot) ?
[13:55:05] <BBB> I tried to write it without pshufw, which is actually faster
[13:55:33] <twnqx> it actually is way faster indeed, i just wonder if there's a better way
[13:55:50] <mru> twnqx: calculate the index however you want
[13:56:00] <mru> I gave you one option
[13:56:07] <twnqx> i tried different ones
[13:56:08] <mru> you quote another above
[13:56:16] <mru> I doubt it makes much difference
[13:56:46] <twnqx> yeah, ?: or if...else shouldn't make much difference
[13:57:02] <twnqx> i just wondered if there's a way around code
[13:57:06] <twnqx> but well, so be it
[13:57:22] <twnqx> 250k lines/second now
[13:57:33] <mru> and before?
[13:57:37] <twnqx> ~170k
[13:59:23] <Dark_Shikari> BBB: there are other mmxext-only things
[13:59:34] <Dark_Shikari> always check before you list it as one or the other
[13:59:46] <BBB> I need a cheatsheet for that
[13:59:49] <BBB> intel's docs are of no use
[14:04:00] <Dark_Shikari> http://alien.dowling.edu/~rohit/nasmdocb.html
[14:04:15] <Dark_Shikari> if it's listed as "williamette, sse2" in that doc, it's mmxext
[14:04:20] <Dark_Shikari> if it's listed as "katmai, mmx", it's mmx
[14:04:33] <Dark_Shikari> er, oops
[14:04:34] <Dark_Shikari> I mean
[14:04:38] <Dark_Shikari> katmai, mmx -> mmxext
[14:04:44] <Dark_Shikari> pent, mmx -> mmx
[14:04:55] * mru hates intel codenames
[14:05:55] * BBB too
[14:08:26] <Dark_Shikari> http://x264dev.multimedia.cx/?p=472
[14:13:01] <BBB> Dark_Shikari: I don't think it'd be faster
[14:13:09] <BBB> notice how my loop is only five calls
[14:13:34] <BBB> (I didn't try yet, I just am waking up with coffee)
[14:14:24] <Dark_Shikari> "my loop is only 5 calls"?
[14:14:37] <Dark_Shikari> keep in mind also that punpck is slow
[14:14:47] <Dark_Shikari> even in mmx, where it's 1 cycle, punpck and pack share a single execution unit
[14:14:53] <Dark_Shikari> you can only execute one per cycle on any cpu prior to the i7
[14:15:02] <Dark_Shikari> by comparison you can execute three adds/subs per cycle
[14:15:37] <mru> does i5 differ from i7 in any of this?
[14:15:40] <Dark_Shikari> no
[14:15:52] <Dark_Shikari> let's get some terminology straight
[14:15:56] <Dark_Shikari> everything iWhatever is "nehalem"
[14:16:00] <Dark_Shikari> 45nm core 2 is "penryn"
[14:16:03] <Dark_Shikari> 65nm core 2 is "conroe"
[14:16:19] <Dark_Shikari> yes they use different codenames for some permutations of those, but three is enough to cover all of what matters
[14:16:33] <Dark_Shikari> i.e. all variations on the nehalem are just different numbers of cores and jazz, not actual ALU changes
[14:16:54] <mru> my i5 is allegedly an "arrandale"
[14:17:28] <Dark_Shikari> the codenames are easy to deal with if you stick with one per generation
[14:17:37] <Dark_Shikari> "conroe" is much easier than "that 65nm core 2 chip that had a slow shuffle unit"
[14:17:56] <mru> but how am I supposed to know that arrandale means nehalem?
[14:19:13] <Dark_Shikari> you don't. you just know it's an iWhatever
[14:19:15] <Dark_Shikari> so it's a nehalem.
[14:19:48] <av500> mru: you need a wiki :)
[14:21:30] <mru> Dark_Shikari: unlike you, I wasn't born with the intel codename table hardwired in my brain
[14:21:50] <av500> there was no space left beside the gcc standard yu swalloed as a kid...
[14:22:17] <Dark_Shikari> mru: you at least know about the generations
[14:22:20] <Dark_Shikari> iWhatever -> nehalem
[14:22:23] <Dark_Shikari> 45nm core 2 -> penryn
[14:22:24] <mru> no I don't
[14:22:25] <Dark_Shikari> 65nm core 2 -> conroe
[14:22:29] <Dark_Shikari> That's it.
[14:22:31] <Dark_Shikari> That's all you need to know.
[14:22:43] <mru> and how do I know what my core2 is?
[14:22:54] <Dark_Shikari> Does it have SSE4.
[14:22:55] <Dark_Shikari> ?
[14:22:57] <Dark_Shikari> If so, it's a penryn.
[14:23:00] <Dark_Shikari> If not, it's a conroe.
[14:23:02] <mru> don't remember
[14:23:07] <Dark_Shikari> cat /proc/cpuinfo
[14:23:10] <Dark_Shikari> let your linux remember for you
[14:23:10] <mru> not powered on
[14:23:13] <Dark_Shikari> lol
[14:23:27] <mru> I do have another that's running...
[14:23:36] <mru> a T7200
[14:24:12] <mru> which sss*e[0-9] is aka pni?
[14:24:50] <kshishkov> mru: use electronic microscope then
[14:24:56] <mru> the other core2 is a Q6700 iirc
[14:27:24] <janneg> T7200 should be 65nm, pni is sse3
[14:28:06] <mru> not that I really care
[14:29:38] <pJok> note to self: 4TB disks in winxp... no go...
[14:29:49] <mru> there are 4TB disks?
[14:29:53] <mru> or some fancy hw raid/
[14:29:56] <janneg> raid0?
[14:30:09] <pJok> raid0
[14:30:17] <pJok> some WD hardware raid box
[14:30:39] <mru> I take it you don't value your data much
[14:30:51] <av500> it'S just pron
[14:30:58] <mru> ah
[14:31:15] <pJok> mru, its not my data... im a sysadmin... to me data is just protocol overhead
[14:31:20] <mru> the first company I worked for had a 3TB array with mostly porn
[14:31:23] <mru> this was in 2003
[14:31:27] <mru> 3TB was a lot
[14:31:29] <av500> MPEG4 sprites would encode it nicely in a few MB...
[14:35:17] <pJok> mru, that and its really just a drive for transport
[14:35:24] <pJok> the data is multiple places already
[15:01:40] <av500> BBB: did you look into that android violator app?
[15:01:45] <BBB> not yet
[15:01:50] <BBB> busy with other things right now
[15:01:51] <BBB> but I will
[15:01:54] <av500> k
[15:01:57] <BBB> I spend like a full day on that every couple of weeks
[16:39:14] <BBB> if I want to do a 32-bit store (in C, not asm), would I normally do a #if BIG_ENDIAN ... #else ... #endif and then AV_WN32(), or would I just load the value in some order and use AV_WL/B32?
[16:39:20] <BBB> this is in performance-critical code
[16:40:11] <BBB> (or maybe not, it's a _c version of some function basically)
[16:40:23] <mru> huh?
[16:40:27] <mru> what are you trying to do?
[16:41:15] <av500> is it impolite here to ask if ppl want to earn $ with windows coding job (non-ffmpeg)?
[16:41:16] <Dark_Shikari> you mean a 32-bit copy?
[16:41:25] <mru> av500: you already did
[16:41:59] <av500> doh
[16:44:37] <BBB> Dark_Shikari: 4 8-bit copies
[16:44:44] <BBB> x[0]=bla;
[16:44:48] <BBB> x[1]=otherbla;
[16:44:49] <BBB> etc.
[16:45:02] <Dark_Shikari> *(uint32_t*)x = *(uint32_t*)y
[16:45:04] <BBB> (and that repeated at different rows, so x[stride+0]=bla
[16:45:15] <Dark_Shikari> oh you mean the values need to be packed up
[16:45:19] <BBB> yes
[16:45:23] <Dark_Shikari> is it a splat, or a pack?
[16:45:28] <BBB> pack
[16:45:30] <BBB> they're different
[16:45:44] <Dark_Shikari> what's this for
[16:45:51] <BBB> vertical prediction
[16:46:01] <Dark_Shikari> that's easy
[16:46:03] <Dark_Shikari> look at h264
[16:46:18] <Dark_Shikari> uint32_t top_pixels = *(uint32_t*)top
[16:46:30] <mru> AV_RN32A please
[16:47:11] <BBB> it's a little different
[16:47:21] <BBB> vp8 doesn't copy top
[16:47:27] <BBB> it does some math on each pixel
[16:47:33] <BBB> so I end up with 4 values, probably 8-bit
[16:47:44] <BBB> mru: ok
[16:47:52] <Dark_Shikari> er, what? what does it do then?
[16:48:07] <BBB> look at david's vp8 patch, pred4x4_vertical_vp8_c
[16:48:21] <Dark_Shikari> I do'nt have it
[16:48:24] <BBB> src[0+0/1/2/3*stride] = (lt + 2*t0 + t1 + 2) >> 2;
[16:48:42] <BBB> and then different lt/t0/t1/etc combinations for 1/2/3+...
[16:49:16] <BBB> lt=pixel topleft, t1 = pixel topright, t0 = pixel on top
[16:49:23] <av500> mru: btw, if anybody ever mentions OMX, just run...
[16:49:41] <Dark_Shikari> BBB: don't pack it, just assign it directly
[16:49:43] <Dark_Shikari> and write asm later
[16:49:51] <BBB> michael wants 32-bit stores
[16:49:55] <BBB> I want to commit the basic decoder
[16:50:02] <BBB> so I need 32-bit stores :-p
[16:50:09] <Dark_Shikari> that's stupid
[16:50:12] <BBB> yes
[16:50:17] <Dark_Shikari> then you need a pack8to32 macro
[16:50:18] <BBB> but I still need 32-bit stores
[16:51:17] <BBB> it makes some sense, I write the same 4 values in each row, so packing them and writing them as a 32-bit value (per row) is likely faster
[16:51:47] <Dark_Shikari> packing them takes a ton of ops
[16:51:51] <Dark_Shikari> you're not memory-bound here
[16:52:20] <BBB> so the pack8to32() macro is a if (BIG_ENDIAN) { ... } else { .. }?
[16:52:54] <BBB> or, well, BIG_ENDIAN ? x<<24|y<<16|etc : x|y<<8|etc;?
[16:53:04] <BBB> and then have the compiler figure it out
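A pack-then-store version of this predictor might look like the C sketch below. It rests on several assumptions: `pack8to32`, `host_is_big_endian` and `pred4x4_vertical_smoothed` are hypothetical names, the endianness is probed at run time instead of through a build-time macro, `memcpy` stands in for a native 32-bit store macro such as AV_WN32A, and the three-tap filter is assumed to slide across the top row in the same way as the column 0 formula quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Run-time endianness probe; a real build would use a compile-time macro. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Pack four bytes so that a native 32-bit store writes a, b, c, d
 * to increasing addresses regardless of host byte order. */
static uint32_t pack8to32(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    if (host_is_big_endian())
        return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | d;
    return (uint32_t)d << 24 | (uint32_t)c << 16 | (uint32_t)b << 8 | a;
}

/* Vertical 4x4 prediction with a 3-tap smoothing filter on the top row.
 * `top` points at the pixel above column 0, so top[-1] is the top-left
 * pixel and top[4] the first top-right pixel. */
static void pred4x4_vertical_smoothed(uint8_t *dst, int stride,
                                      const uint8_t *top)
{
    uint8_t v[4];
    uint32_t row;

    assert(stride >= 4);
    for (int x = 0; x < 4; x++)
        v[x] = (uint8_t)((top[x - 1] + 2 * top[x] + top[x + 1] + 2) >> 2);

    /* The same four values go into every row: pack once, store four times. */
    row = pack8to32(v[0], v[1], v[2], v[3]);
    for (int y = 0; y < 4; y++)
        memcpy(dst + y * stride, &row, sizeof(row));
}
```

The packing costs a few shifts and ORs once per block, after which each row is a single 32-bit store instead of four byte stores. Whether that beats plain byte assignments depends on the compiler and CPU, which is the point Dark_Shikari raises.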
[16:53:34] <mru> av500: yeah, I know omx is bad
[16:54:20] <mru> is it one of those "standards" that everybody does differently?
[16:54:27] <mru> or does it just suck anyway?
[16:54:44] <av500> can u imagine that the arm side OMX wrapper around an MPEG2 dsp codec is 8000 lines of C?
[16:55:54] <mru> easily
[16:56:08] <mru> and that's the app code _calling_ omx, right?
[16:56:19] <av500> no
[16:56:33] <av500> MPEG2 decoder on DSP, running under bridge
[16:56:34] <mru> only 8k lines for omx itself?
[16:56:38] <av500> yes
[16:56:42] <av500> for the OMX self wank
[16:56:58] <av500> to expose this dsp side mpeg2 decoder to an OMX app
[16:57:13] <av500> of course the code is also dead ugly
[16:57:29] <KotH> av500: i hope you're not surprised. it's commercial code after all
[16:58:53] <av500> only mildly surprised
[16:59:29] * mru doesn't get surprised anymore
[17:02:15] <CIA-92> ffmpeg: reimar * r23680 /trunk/ (4 files in 2 dirs):
[17:02:15] <CIA-92> ffmpeg: mathematics.h no longer needs config.h, so update tablegen code and
[17:02:15] <CIA-92> ffmpeg: documentation to use it where appropriate.
[17:28:04] <j0sh_> wbs: thanks for the tips on depacketizer samples
[17:33:43] <wbs> j0sh_: I hope I don't scare you to death with untangling the aac/rtp stuff - I guess you start seeing the dire need of getting that code out of rtsp.c? :-)
[17:38:07] <verb3k> wbs, will you commit the libvorbis patch for fixing the packet dropping issue?
[17:41:54] <j0sh_> wbs: i was actually afraid i missed some things. i saw that code, but wasn't sure if it was used by other formats, so just left it there
[17:44:00] <wbs> verb3k: yeah, I'll wait and see if I can get another explicit ok out of Yuvi, but if I don't see him here, I'll probably just apply it in a few days
[17:44:27] <wbs> j0sh_: yeah, that's the problem, it looks all generic and fine, litters the generic code with format specific stuff
[17:47:48] <j0sh_> alright
[17:50:58] <wbs> j0sh_: I added another task on the wiki checklist btw, that I think I've mentioned - sharing code for sdp line parsing, since many of the rtp depacketizers have a quite similar parsing loop
[17:54:32] <j0sh_> ok cool
[17:54:56] <j0sh_> is there any reason one would make multiple reads/writes from the same ffhttp handler?
[17:55:27] <wbs> multiple separate connections you mean? no
[17:55:41] <j0sh_> on the same connection
[17:56:13] <j0sh_> eg, send POST header, get reply, send POST data, get more replies, etc
[17:56:25] <wbs> ah, I see
[17:56:43] <wbs> no, for normal HTTP, you send the full POST header and the POST data, and don't get back any reply until you've sent all that
[17:56:54] <j0sh_> thats what i thought, alright
[17:57:20] <j0sh_> i think it should be safe to reuse chunksize, then
[17:57:35] <wbs> yeah, I've got a patch for that coming up
[17:57:48] <j0sh_> i think that multiple reads/replies is why i added that is_chunked, since i didnt want replies to overwrite our chunked preference
[17:59:39] <wbs> yeah... semantically, I'd rather use something like is_chunked or chunked_post or whatever for the posts, to keep it clear (and setting chunksize for posts feels unintuitive), but I'm ok with it for now
[17:59:46] <wbs> I'm preparing a patch for that right now
[18:00:01] <j0sh_> indeed
[18:02:26] <Kovensky> wbs: no pipelining?
[18:05:07] <j0sh_> isn't pipelining a matter of reusing the tcp connection, not the http connection?
[18:06:14] <wbs> Kovensky: posts and such shouldn't be pipelined anyway
[18:22:39] <BBB> wbs: almost correct ;-) your patch just reverted my fixing tjoppen's bug yesterday :-p
[18:23:01] <wbs> BBB: uhm, no?
[18:25:55] <wbs> BBB: for get, it first sets chunksize to 0 in http_open, which doesn't matter at all since it's overwritten with -1 in http_connect again, just exactly as you fixed it yesterday, I just moved the s->chunksize = -1; down to below the if (post)
[18:26:24] <wbs> BBB: for POST, it sets chunksize to 0 (default to chunked) in http_open so that it can be overwritten with -1 if we want to disable chunking, before calling http_connect
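The chunksize convention wbs describes (anything other than -1 means "send the POST body chunked", -1 means chunking is disabled) can be illustrated with a small framing helper. This is a sketch and not the code in libavformat/http.c: `frame_post_data` and its buffer-based interface are hypothetical. The only external fact used is the HTTP/1.1 chunk format, a hexadecimal length line followed by the data and a trailing CRLF.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the libavformat implementation.
 * Frames `len` bytes of POST data into `out`.
 *   chunksize == -1: chunking disabled, data is copied unchanged.
 *   otherwise:       data is wrapped as one HTTP/1.1 chunk,
 *                    "<hex length>\r\n<data>\r\n".
 * Returns the number of bytes written, or -1 if `out` is too small. */
static int frame_post_data(char *out, size_t out_size,
                           const char *data, size_t len, int chunksize)
{
    if (chunksize == -1) {
        if (len > out_size)
            return -1;
        memcpy(out, data, len);
        return (int)len;
    }

    int hdr = snprintf(out, out_size, "%zx\r\n", len);
    if (hdr < 0 || (size_t)hdr + len + 2 > out_size)
        return -1;
    memcpy(out + hdr, data, len);
    memcpy(out + hdr + len, "\r\n", 2);
    return hdr + (int)len + 2;
}
```

With `len == 0` the chunked branch emits `0\r\n\r\n`, which is the terminating chunk of a chunked body.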
[18:41:43] <CIA-92> ffmpeg: mstorsjo * r23681 /trunk/libavformat/http.c: HTTP: Clarify a comment
[19:00:22] <BBB> I see what you're doing, patches are fine then
[19:00:33] <wbs> ok, great!
[19:02:26] <CIA-92> ffmpeg: mstorsjo * r23682 /trunk/libavformat/http.c: HTTP: Get rid of the is_chunked variable, use the chunksize variable instead
[19:03:03] <CIA-92> ffmpeg: mstorsjo * r23683 /trunk/libavformat/http.c: HTTP: Compact the code for writing chunked post data
[19:03:03] <CIA-92> ffmpeg: mstorsjo * r23684 /trunk/libavformat/http.c: Reindent
[19:24:13] <mru> I can't believe we're having this stupid idct discussion
[19:24:33] <mru> for 10 fuckin years nobody has bothered to optimise based on last coeff
[19:24:49] <mru> and now it's hinging on some ridiculous "feature" of mpeg2
[19:25:47] <mru> just setting the last_coeff correctly would give no slowdown at all compared to _not bloody optimising it_
[19:26:37] <mru> and _all other cases_ would get faster
[19:30:52] <mru> Dark_Shikari: ping
[19:40:06] <Dark_Shikari> mru: pong
[19:40:24] <mru> Dark_Shikari: I'm getting tired of idct bikeshedding
[19:40:40] <Dark_Shikari> yes it's retarded
[19:41:06] <mru> forget about mpeg2
[19:41:20] <CIA-92> ffmpeg: mstorsjo * r23685 /trunk/libavformat/ (http.c http.h): HTTP: Add a method for initializing the authentication state from another connection
[19:41:27] <Dark_Shikari> it's not my job
[19:41:29] <Dark_Shikari> if you want to do it
[19:41:30] <Dark_Shikari> just do it
[19:41:58] <CIA-92> ffmpeg: mstorsjo * r23686 /trunk/libavformat/rtsp.c: RTSP: Use the same authentication for the HTTP POST session as for the GET
[19:42:04] <mru> but I can't optimise the idct with invalid last_coeff
[19:42:19] <mru> and michael stubbornly refuses to FIX A BUG
[19:42:45] <Dark_Shikari> Remember when you rolled back my commit access because I tried to fix a bug?
[19:42:51] <Dark_Shikari> Now you know how it feels.
[19:43:02] <Dark_Shikari> Michael thinks your concerns are invalid
[19:43:06] <Dark_Shikari> and is bikeshedding to avoid fixing the problem.
[19:43:28] <mru> isn't he the one who makes a huge ordeal over 0.00000001% speed difference?
[19:43:33] <Dark_Shikari> It doesn't matter who is right -- what matters is development is being held up because nobody can decide who is right.
[19:43:41] <Dark_Shikari> mru: yes, that's michael
[19:43:52] <Dark_Shikari> He will not lose 0.00001% speed, even to gain 10% speed.
[19:44:03] <Dark_Shikari> That's why the ffmpeg decoders are still so slow and unoptimized
[19:44:05] <mru> fixing this will not lose _any_ speed
[19:44:19] <mru> it's like _THREE INSTRUCTIONS_ per MB
[19:44:26] <mru> per block, sorry
[19:44:26] <Dark_Shikari> no, per block, not per mb
[19:44:41] <Dark_Shikari> The fact that I was able to trim 30-40% off the ffmpeg decoder in one or two days is proof that michael's attitude is holding up development
[19:45:13] <mru> you also threw out 90% of functionality
[19:45:23] <Dark_Shikari> I did, but it still decoded FLV correctly
[19:45:29] <Dark_Shikari> If it had been templated, you could have gotten all the speed boost
[19:45:33] <Dark_Shikari> with none of the functionality loss
[19:45:40] <Dark_Shikari> also, most of the speed boost was from functions that would have been trivial to template
[19:46:06] <Dark_Shikari> For example, the mere fact that the flv escape coeff decoding isn't inlined into the h263 block decode loses a few hundred clocks per mb
[19:49:14] * kshishkov eagerly awaits for another burst of troll-driven speedups
[19:50:32] <peloverde> jesus, that's a huge vpx changeset
[19:50:58] <Dark_Shikari> peloverde: lots of minor changes
[19:51:09] <Dark_Shikari> oh
[19:51:10] <Dark_Shikari> trailing whitespace
[19:51:26] <peloverde> sounds like somebody never learned about git-hooks
[19:51:44] <mru> someone committed trailing whitespace?
[19:52:12] <Dark_Shikari> yes
[19:52:30] <peloverde> What's with renaming *.asm to *.S?
[19:52:37] <Dark_Shikari> peloverde: retards
[19:52:50] <peloverde> @redhat.com
[19:52:51] <peloverde> , what do you expect
[19:53:23] <mru> .S is the standard suffix for c-preprocessed assembler
[19:53:40] <Dark_Shikari> of course, it isn't c-preprocessed
[19:59:34] <mru> Dark_Shikari: since there are multiple shortcut cases, do you think passing last_coeff to the idct func is better than a separate idct_dc function?
[20:00:19] <Dark_Shikari> Yes.
[20:10:43] <Dark_Shikari> mru: new candidate for "most annoying type of developer"
[20:10:49] <Dark_Shikari> the developer who does git format-patch | sendmail.
[20:10:57] <Dark_Shikari> a guy just dumped 50 patches from a git local tree onto the vp8 mailing list
[20:11:20] <Dark_Shikari> "it includes such great commits as "rename" "Forgotten .asm->.S fixups." "Finish the merge."
[20:11:23] <Dark_Shikari> "
[20:11:34] <BBB> yeah my inbox was stuffed
[20:11:38] <BBB> fortunately it was only 49
[20:11:38] <Dark_Shikari> Oh, and he converted all the asm unilaterally to GAS syntax.
[20:11:42] <Dark_Shikari> For no apparent reason.
[20:11:50] <Dark_Shikari> 04:11 < derf> My personal favorite "heavy". That's the entire commit message.
[20:12:02] <mru> what syntax was it before?
[20:12:05] <Dark_Shikari> yasm
[20:12:16] <Dark_Shikari> Yes, going from yasm _to_ gas, for x86.
[20:12:26] <mru> where's the sense in that?
[20:12:28] <twnqx> at&t lover :S
[20:12:30] <Dark_Shikari> There is none!
[20:12:32] <Dark_Shikari> it's retarded!
[20:12:35] <peloverde> wtf, how is the mediacoder guy ranked 90 on ohloh?
[20:12:44] <Dark_Shikari> peloverde: ohloh is a popularity competition
[20:12:46] <mru> can we downvote him?
[20:12:52] <Dark_Shikari> it's like facebook
[20:12:54] <Dark_Shikari> 'I like this'
[20:13:02] <mru> facebook sucks
[20:13:05] <Dark_Shikari> exactly
[20:13:08] <Dark_Shikari> ohloh is facebook for open source
[20:13:19] <mru> whoever has more imaginary friends when he dies, wins
[20:13:45] <mru> a friend IRL is worth 1000 on facebook
[20:13:59] <Dark_Shikari> I'm pretty sure 1000 facebook friends actually has negative worth
[20:14:20] <mru> I don't know anyone who has that many
[20:14:21] <BBB> I agree with that one
[20:14:37] <mru> I know some people well into the triple digits
[20:14:43] <BBB> imagine that you place a message "I'm depressed", and it gets 50 likes
[20:14:47] <BBB> what the hell do you do with that
[20:14:49] <peloverde> I dunno, facebook makes a pretty good evite replacement
[20:15:25] <mru> at least it's not myspace
[20:16:35] <peloverde> Is there a way we can hostile takeover the FFmpeg facebook page?
[20:17:03] <mru> they're using the logo without permission
[20:17:56] <peloverde> Either "Pretending to be me or someone I know" or "This violates my intellectual property"
[20:18:07] <mru> the logo is _not_ under gpl
[20:18:27] <BBB> there's a facebook ffmpeg person? :-p
[20:18:32] <BBB> that's hilarious
[20:18:34] <peloverde> http://www.facebook.com/#!/pages/FFmpeg/45535533634
[20:18:35] <BBB> I wonder who owns it
[20:18:39] <BBB> maybe it's the mediacoder guy
[20:19:09] <peloverde> Would people mind if I try to take it over under "impersonating"?
[20:19:20] <mru> how do you do that?
[20:19:27] <BBB> maybe one of the devs opened it?
[20:19:28] <peloverde> There is a report page link
[20:19:32] <BBB> ask on the ML before taking it over :)
[20:19:54] <peloverde> good call
[20:21:52] <janneg> http://www.facebook.com/photo.php?pid=1295707&id=45535533634 doesn't look like it's from a developer
[20:22:36] <mru> whoever built that ffmpeg has no clue
[20:23:31] <Dark_Shikari> "built ffmpeg" and "has no clue" are highly correlated
[20:24:03] <mru> they pile up bizarre configure flags yet always manage to omit the ones they should have used
[20:24:06] <mru> like --cpu
[20:24:19] <Dark_Shikari> --extra-cflags="-funroll-all-loops"
[20:25:42] <Honoome> Dark_Shikari: so have you heard about gcc 4.6's -Ofast ?
[20:25:51] <Honoome> [and I'm not frigging kidding you!]
[20:26:06] <mru> how is that different from -O9 -fricer?
[20:26:09] <Honoome> -O3 -ffast-math ...
[20:26:16] <mru> fast-math is dangerous
[20:26:26] <Honoome> why -O3 is not?
[20:26:36] <mru> O3 doesn't alter semantics
[20:26:37] <Honoome> [especially on x86, since it enables -ftree-vectorize ...]
[20:26:38] <mru> fast-math does
[20:26:45] <saintdev> i prefer -fzomg-fast-speed
[20:27:05] <Honoome> mru: the point of -Ofast is that it makes it "faster" even though it'll break semantics :|
[20:27:05] <mru> fast-math assumes you will have no infs or nans
[20:27:18] <Honoome> if it sounds batshit crazy it's because it is!
[20:27:30] <mru> if your code is carefully written to not have any such things, -ffast-math is safe
[20:27:38] <mru> there's something else it assumes too
[20:27:44] <peloverde> reciprocal math?
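To make mru's point about NaNs concrete: with GCC, -ffast-math includes -ffinite-math-only, so the compiler is allowed to assume a comparison such as `x != x` is never true. The sketch below tests for NaN by inspecting the bit pattern instead, which does not depend on floating-point comparison semantics. It assumes `float` is IEEE 754 binary32; both helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if x is a NaN, judged purely from its bit pattern.
 * Assumption: float is IEEE 754 binary32 (exponent all ones and a
 * non-zero mantissa means NaN; a zero mantissa means infinity). */
static int is_nan_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof(u));
    return (u & 0x7f800000u) == 0x7f800000u && (u & 0x007fffffu) != 0;
}

/* Build a float from raw bits, e.g. to construct test values. */
static float float_from_bits(uint32_t u)
{
    float x;
    memcpy(&x, &u, sizeof(x));
    return x;
}
```

This only helps code that must detect such values at its boundaries; arithmetic elsewhere in a fast-math build can still be transformed under the no-NaN, no-infinity assumption.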
[20:29:17] <Dark_Shikari> what's -Ofast?
[20:29:26] <Dark_Shikari> ffast-math makes sense
[20:29:30] <Dark_Shikari> because it does very specific things
[20:29:31] <Dark_Shikari> but -Ofast?
[20:29:33] <mru> -fbreak-my-code
[20:29:35] <Dark_Shikari> wtf does that do
[20:29:40] <Honoome> -Ofast => -O3 -ffast-math right now
[20:29:50] <Dark_Shikari> ah
[20:29:55] <Dark_Shikari> here's the problem with that
[20:30:01] <Dark_Shikari> breaking semantics is fine as long as you know what breaks
[20:30:04] <mru> a lot of code breaks with -ffast-math
[20:30:06] <Dark_Shikari> -ffast-math breaks very specific things
[20:30:10] <Dark_Shikari> and the developers can code with that in mind
[20:30:11] <Dark_Shikari> BUT
[20:30:19] <Dark_Shikari> -Ofast could, someday, be updated to break ANYTHING
[20:30:24] <Dark_Shikari> so developers cannot safely use it
[20:30:31] <Dark_Shikari> i.e. it's not well defined.
[20:30:47] <Honoome> This tells GCC to disregard strict standards compliance and to enable all speed optimizations. In particular it turns on -O3 and -ffast-math.
[20:31:17] <Honoome> this is from today's post by Nick Clifton
[20:31:21] <peloverde> In theory new things could be added to -ffast-math, you really should use the options that it currently is equivalent to
[20:32:42] <peloverde> I've heard rumours that ICC uses something like -ffast-math by default, is that true?
[20:32:55] <mru> it uses -fbreak-ffmpeg by default
[21:02:34] <mru> Dark_Shikari: who's that and why did you ban him?
[21:02:41] <Dark_Shikari> spambot
[21:02:50] <Dark_Shikari> it's rotating freenode channels
[21:02:57] <Dark_Shikari> so I did a pre-emptive ban on all channels I have op on
[21:02:58] <mru> ah, thanks
[21:13:14] <BBB> so...
[21:13:17] <BBB> now I have this patch
[21:13:33] <BBB> I have checked out and am committing to yuvi's git repo
[21:13:45] <BBB> and I've committed several things like asm optimizations to my local version
[21:13:52] <BBB> but I don't want to mess up his repo yet
[21:13:59] <BBB> now I have a patch which I do want to push
[21:14:03] <BBB> what do I do / how do I do that?
[21:14:08] <BBB> so git push only this patch
[21:14:16] <BBB> ideally using git gui or so
[21:14:29] <mru> create a new branch and put only that commit in it
[21:14:35] <mru> then push that to the remote master
[21:15:06] <mru> suppose you have some commits on your local master branch
[21:15:18] <mru> so git status tells you you're X commits ahead
[21:15:19] <janneg> git checkout -b push-branch origin/master
[21:15:25] <mru> then do this
[21:15:34] <mru> git checkout -b temp
[21:15:38] <mru> git rebase -i origin
[21:15:51] <mru> [remove the commits you don't want to push]
[21:15:59] <mru> git push origin temp:master
[21:16:02] <janneg> and git cherry-pick the commit you want to push
[21:16:26] <mru> use janneg's method if you have many commits you don't want to push
[21:16:32] <BBB> yeah it's many
[21:16:46] <BBB> I guess I sort of made a mess of my local repo
[21:16:50] <BBB> but I always did that to my svn also
[21:17:00] <mru> if you have many you do want to push, cherry-picking one by one gets old quickly
[21:17:11] <BBB> no, only one
[21:17:17] <BBB> so I commit this patch, then do what janneg said?
[21:17:21] <BBB> or I do not commit this patch?
[21:17:27] <mru> commit it
[21:17:44] <mru> then branch from origin/master and cherry-pick it
[21:17:57] <mru> push, switch back to master and git pull --rebase
[21:19:38] <BBB> how does git cherry-pick work?
[21:19:41] <BBB> is there a UI?
[21:19:45] <Dark_Shikari> git cherry-pick <hash>
[21:19:47] <Dark_Shikari> it cherry picks it
[21:19:48] <Dark_Shikari> that's it
[21:19:53] <Dark_Shikari> I don't see what you'd need a gui for
[21:20:15] <mru> gitk has cherry-pick in a popup menu if you insist
[21:20:55] <BBB> gitk
[21:21:01] * BBB has git gui and GitX
[21:21:04] <BBB> let's try gitk
[21:21:20] <BBB> no port for gitk
[21:21:27] <BBB> how do I figure out the hash?
[21:21:36] <BBB> and how do I do a diff against origin/master again?
[21:21:51] <mru> git diff origin/master
[21:22:21] <BBB> hm... not sure why I couldn't figure that out myself :-p
[21:23:19] <mru> Dark_Shikari: there are a lot of calls to idct_{put,add} ...
[21:23:23] <ohsix> y helo
[21:23:50] <janneg> git log shows the hashes
[21:24:15] <Dark_Shikari> mru: ?
[21:24:30] <BBB> I guess I should've branched against origin/vp8
[21:24:36] * BBB kicks himself
[21:24:38] <BBB> now what?
[21:24:48] <mru> Dark_Shikari: idct_put and idct_add are called from many places
[21:25:05] <Dark_Shikari> I thought it was just mpegvideo?
[21:25:14] <BBB> can I do git checkout -b push-branch origin/vp8 without breaking things?
[21:25:23] <mru> a quick grep counts 73 places
[21:25:33] <janneg> BBB: yes
[21:25:42] <Dark_Shikari> mru: holy fuck
[21:25:59] <BBB> janneg: it says "failed, branch already exists"
[21:26:16] <mru> maybe adding a new function with the extra arg is simpler
[21:26:49] <Dark_Shikari> ugh,.
[21:26:52] <Dark_Shikari> that's a bad solution too
[21:26:53] <Dark_Shikari> more bloat
[21:27:01] <Dark_Shikari> you could convert all of the current ones to ( , 63)
[21:27:03] <BBB> "fatal: git checkout: branch push-branch already exists"
[21:27:26] <janneg> BBB: of course checkout -b creates a new branch, use a different name
[21:27:31] <mru> Dark_Shikari: and benchmark it to prove no slowdown?
[21:27:43] <Dark_Shikari> lol
[21:27:44] <BBB> janneg: can't I overwrite the old one? I mean, I'm not gonna use it anymore
[21:27:46] <Dark_Shikari> <3 michael
[21:27:51] <janneg> or delete the branch first
[21:28:35] <BBB> how do I delete a branch
[21:28:44] <BBB> again, ideally without screwing up where I am
[21:28:50] <BBB> (is that master?)
[21:29:25] <janneg> you could rebase it against origin/vp8 but I'm not sure if that resets the remote pull/push branches
[21:29:36] <janneg> git checkout master
[21:29:50] <janneg> git branch -d push-branch
[21:30:47] <janneg> if it complains make sure that push-branch holds no important commits and git branch -D push-branch
[21:31:01] <BBB> libavcodec/h264pred.c: needs merge
[21:31:01] <BBB> libavcodec/vp8.c: needs merge
[21:31:02] <BBB> error: you need to resolve your current index first
[21:31:15] <BBB> why isn't there a -f switch or so?
[21:31:41] <BBB> (there is no libavcodec/vp8.c, that's how I noticed I was in the wrong origin/branch)
[21:34:20] <BBB> hm, the git howto on kernel.org helped me there
[21:35:17] <BBB> bash-3.2$ git checkout -b push-branch origin/vp8
[21:35:17] <BBB> error: Entry 'libavcodec/vp8.c' would be overwritten by merge. Cannot merge.
[21:36:05] <janneg> BBB: still in the push-branch after the cherry-pick? what was the last command
[21:38:40] <BBB> I switched back to master
[21:38:43] <BBB> all my files are fine
[21:38:49] <BBB> now I want to recreate the push-branch
[21:38:54] <BBB> same command as last time
[21:38:56] <BBB> but now it's not working
[21:39:10] <CIA-92> ffmpeg: stefano * r23687 /trunk/Makefile:
[21:39:10] <CIA-92> ffmpeg: Update documentation dependencies, make ff* tools manpages and HTML
[21:39:10] <CIA-92> ffmpeg: pages depend of fftools-common-opts.texi.
[21:39:10] <CIA-92> ffmpeg: stefano * r23688 /trunk/doc/libavfilter.texi:
[21:39:10] <CIA-92> ffmpeg: Replace multitable for the unsharp filter option table with a simple
[21:39:11] <CIA-92> ffmpeg: @table @option.
[21:39:12] <CIA-92> ffmpeg: Allow pod rendering, as texinfo multitables are not supported by
[21:39:12] <CIA-92> ffmpeg: texi2pod.pl, also improve plain texinfo file readability.
[21:39:28] <BBB> I did git checkout master
[21:39:36] <BBB> then git branch -d push-branch
[21:39:45] <BBB> then git checkout -b push-branch origin/vp8
[21:39:47] <BBB> and that's not working
[21:42:45] <janneg> BBB: check that your working copy and index are clean with git status
[21:43:01] <janneg> after make distclean
[21:47:33] <BBB> huh?
[21:47:40] <BBB> it thinks libavcodec/vp8.c is a new commit
[21:47:40] <BBB> n
[21:47:41] <BBB> ow wh
[21:47:43] <BBB> now what?
[21:47:51] <BBB> do I commit it as-is?
[21:47:57] <BBB> or will I screw up everything if I do that?
[21:49:08] <janneg> first check that you're actually in your master branch with git branch
[21:49:09] <peloverde> do you have a leftover libavcodec/vp8.c in a branch that doesn't contain it?
[21:49:17] <peloverde> what does git status say?
[21:49:46] <janneg> the active branch is marked by ^*
[21:52:01] <BBB> grrrrrrrrr
[21:52:12] <BBB> I think it committed to the remote branch push-branch instead of origin/vp8
[21:52:18] <BBB> git really needs some serious usability love
[21:52:33] <Dark_Shikari> I've never had such problems
[21:52:36] <Dark_Shikari> then again, I don't use branches
[21:52:50] <mru> branches are easy
[21:52:54] <BBB> http://github.com/yuvi/ffmpeg/tree/push-branch
[21:53:01] <BBB> I fucked up yuvi's git repo
[21:53:02] <BBB> :)
[21:53:09] <peloverde> I use branches all the time without problem, though I only have one pushable location
[21:53:10] <BBB> damn git, no wonder MN doesn't want to switch
[21:53:13] <BBB> what the fuck
[21:53:35] <mru> BBB: "git push +:push-branch" to delete it remotely
[21:54:10] <BBB> before or after switching back to master?
[21:54:13] <BBB> or does not matter?
[21:54:18] <mru> doesn't matter
[21:54:35] <BBB> ssh: Could not resolve hostname +: nodename nor servname provided, or not known
[21:54:36] <BBB> fatal: The remote end hung up unexpectedly
[21:54:43] <mru> ah, sorry
[21:55:20] <mru> git push origin +:push-branch
[21:56:05] <BBB> thanks
[21:56:06] <janneg> mru: the '+' shouldn't be needed
[21:56:18] <mru> yeah, probably not
[21:56:30] <BBB> so I suppose that pushing git push push-branch :vp8 would've committed it to the right branch?
[21:57:29] <mru> no, it would have deleted the remote vp8 branch
[21:57:47] <mru> you want git push origin push-branch:vp8
[21:58:08] <mru> it's git push $repo $local:$remote
[21:58:24] <BBB> let me try
[21:58:29] <BBB> I seriously hope I don't fuck up again :-p
[21:59:35] <BBB> that looks quite sane
[21:59:37] <BBB> thanks!
[22:06:31] <BBB> eeeeeeeeeek all my files are missing :(
[22:07:02] <Dark_Shikari> git fsck
[22:07:21] <Dark_Shikari> I have no idea why you're doing all this complex stuff
[22:07:26] <Dark_Shikari> I only use about 4 git commands
[22:07:37] <Dark_Shikari> rebase, commit, log, format-patch, push
[22:07:44] <BBB> that's 5
[22:07:48] <Dark_Shikari> 5 is about 4.
[22:08:24] <BBB> git checkout vp8, I guess I was in the wrong branch
[22:09:57] <CIA-92> ffmpeg: stefano * r23689 /trunk/ (5 files in 2 dirs):
[22:09:57] <CIA-92> ffmpeg: Make the ffmpeg and ffplay man pages show the list of lavfi filters,
[22:09:57] <CIA-92> ffmpeg: sinks and sources, and document the -vf option.
[22:16:53] <bcoudurier> [14:52] <BBB> git really needs some serious usability love < you sound like the average ffmpeg user
[22:17:19] <BBB> git is seriously complex and obscure
[22:18:17] <Dark_Shikari> it's complex, but you don't have to use the complexity...
[22:18:31] * j0sh_ agrees on the complexity
[22:18:51] <peloverde> I use "rebase apply commit log format-patch push diff pull checkout branch blame"
[22:19:37] <mru> stash send-email
[22:19:49] <mru> add
[22:19:56] <peloverde> forgot send e-mail, I don't use stash, I'll create a temp branch instead
[22:20:05] <peloverde> git-apply behaves the way I wish patch did
[22:20:15] <mru> if the branch would live for 30s or less, I use stash
[22:20:19] <peloverde> I use git-apply with svn
[22:20:57] <peloverde> "add -p" is excellent
[22:20:58] * j0sh_ has been meaning to write a blogpost about my ffmpeg/git workflow...
[22:21:24] <Dark_Shikari> peloverde: temp branch? I use diffs
[22:21:30] <Dark_Shikari> I have 500+ diffs
[22:21:34] <Dark_Shikari> I upload them online so anyone can view them
[22:22:45] <mru> I have some 40 branches of ffmpeg
[22:22:48] <mru> most of them junk
[22:23:06] <Dark_Shikari> maybe 1/3 of my diffs are stuff that was applied
[22:23:10] <Dark_Shikari> most of the rest are ideas I never finished, or failed
[22:23:15] <Dark_Shikari> but I keep them around, because sometimes I need them later
[22:24:01] <BBB> now there's a conflict in a file
[22:24:07] * BBB kicks git
[22:24:48] <peloverde> Then resolve the conflict and continue
[22:25:33] <Dark_Shikari> what's the git link for that repo?
[22:25:38] <Dark_Shikari> once you push your asm I'd like to go write some asm
[22:26:53] <mru> Dark_Shikari: care to reply to michael?
[22:27:47] <BBB> the sad thing is that after every git checkout, it rebuilds everything
[22:28:05] <mru> then you're doing it wrong
[22:28:07] <BBB> Dark_Shikari: github.com/yuvi/ffmpeg/vp8
[22:28:33] <BBB> Dark_Shikari: I'll send a new plain-C patch to ffmpeg-devel, Michael said it could be applied to continue work in SVN
[22:28:39] <Dark_Shikari> mru: ok
[22:28:40] <BBB> I'll do that, and then send patches for my initial asm
[22:30:35] <Dark_Shikari> mru: done
[22:31:38] <lu_zero> hi
[22:31:53] <mru> lo
[22:31:58] <lu_zero> damn
[22:32:06] <lu_zero> looks like I'm missing lots of fun today
[22:32:24] * lu_zero spent yesterday running from Torino to Milano
[22:32:41] * lu_zero _hates_ korean vdrs
[22:32:53] <lu_zero> and windows =_=
[22:33:16] <mru> hehe koreans...
[22:33:29] <Dark_Shikari> brb, meeting
[22:34:20] <lu_zero> security cameras using a strange vdr
[22:34:35] <lu_zero> dnat doesn't seem to work
[22:35:02] * mru doesn't use nat much
[22:36:23] <lu_zero> I have to make those thing a little more secure using a vpn
[22:36:48] <lu_zero> _but_ the stock rule for suck services isn't working =_=
[22:39:38] <j0sh_> lu_zero: was it a fun run?
[22:40:07] <mru> he was running from the cops
[22:40:46] <j0sh_> vdrs: violent death reporting system? no wonder you were running
[22:41:31] <j0sh_> (thefreedictionary.com acronyms)
[22:42:18] <Honoome> lu_zero: could be worse
[22:42:25] <Honoome> you could have been running away from ME ...
[22:43:18] <lu_zero> Honoome: ...
[22:43:50] <lu_zero> that reminds me that you still have to pick a date for that meeting
[22:43:54] <Honoome> lu_zero: what? should I not be at least slightly pissed? ¬_¬
[22:45:44] <lu_zero> uh?
[22:49:55] <CIA-92> ffmpeg: stefano * r23690 /trunk/doc/filters.texi:
[22:49:55] <CIA-92> ffmpeg: Re-add the list of parameters for the unsharp filter, I somehow lost
[22:49:55] <CIA-92> ffmpeg: it in the previous commit.
[22:51:17] <BBB> is there some way to "cherry-pick" a patch but not apply it, but rather print it to stdout?
[22:51:23] <BBB> (using git)
[22:51:56] <mru> show
[22:51:56] <saintdev> git diff?
[22:52:21] <Honoome> git show $sha
[22:55:42] <CIA-92> ffmpeg: cehoyos * r23691 /trunk/libavdevice/x11grab.c: Remove stray semicolon.
[22:57:37] <ohsix> BBB: once you get over managing the working copy its a breeze
[22:58:07] * BBB decides to go home and take a break
[22:58:14] <BBB> I'll submit a new VP8 patch tomorrow
[22:58:47] <mru> what's the state of the decoder?
[22:59:01] <ohsix> you can also create a local tree to push to that you can fuck up
[22:59:10] <ohsix> derp
[23:35:32] <peloverde> youtube is selling video downloads now?
[23:42:28] <Honoome> peloverde: has been for a while yeah
[23:42:50] <peloverde> I didn't notice until today
[23:43:14] <Honoome> I still can't understand how some people pretend to "sell" their videos..
[23:52:30] <peloverde> I guess it's time for me to start vlogging to cash in on this
[00:54:47] <Dark_Shikari> mru: nice faad patch :)
[01:14:40] <saintd3v> Dark_Shikari: yes it is :)
[01:14:57] <saintd3v> peloverde: thanks for all your work on the aac decoder
[03:08:09] <Compn> well sorry i derailed the 'removing libfaad wrapper' movement mru
[03:08:23] * Compn sleeps
[03:12:54] <saintdev> Compn, me too. i'll be glad to see it gone
[04:54:32] <saintdev> peloverde: ping
[05:26:19] <astrange> make fate would be more useful if the 4xm test didn't fail on unpatched builds
[06:10:52] <peloverde> saintd3v, pong kind of quickly if you are around otherwise tomorrow
[06:12:53] <saintdev> peloverde: i found a he-aac stream that gives some errors, would you be interested?
[06:13:00] <peloverde> yes
[06:13:17] <saintdev> peloverde: http://91.121.18.185:7260
[06:13:34] <saintdev> [aac @ 0x3166ff0]channel element 1.9 is not allocated
[06:13:58] <saintdev> hmm, not giving me the other errors it was before
[06:14:23] <peloverde> post it on roundup, i'll check it out in the AM
[06:14:41] <saintdev> eek segfault
[06:15:06] <saintdev> hmm, and now it's not giving any error
[06:15:11] <saintdev> what is going on
[06:15:33] <peloverde> odd, It may use some odd feature in a strange way on occasion
[06:16:06] <peloverde> I've only tested against the conformance files and the CT identification file(s)
[06:16:43] <saintdev> could it be because it's joining in the middle of the stream, and maybe not getting a full frame to start with?
[06:17:23] <peloverde> you need an SBR "IDR" frame + a PS "IDR" frame to get full quality
[06:18:04] <peloverde> until then SBR will run in pure upsampling mode and PS will run in channel duplication mode
[06:18:51] <peloverde> and (may?) spam errors?
[06:19:34] <saintdev> ok, maybe that's it. i'll get a reliable sample for you to look at anyway
[06:19:45] <_av500_> peloverde: kudos for PS!
[06:19:47] <saintdev> just in case ;)
[06:19:56] <saintdev> _av500_: agreed \o/
[06:20:01] <peloverde> _av500_, thanks
[06:20:11] <peloverde> saintd3v, broken samples appreciated
[06:20:30] <_av500_> peloverde: next stop ffaacenc?
[06:21:10] <peloverde> in theory that's what I'm working on, there will probably be a few weeks of minor PS fixups
[06:24:04] <peloverde> I felt more competent taking money for psdec seeing as I'm at this point more decoder competent but we will see
[09:11:14] <KotH> morge
[09:59:51] <mru> astrange: what of fate?
[10:01:19] <astrange> let me re-configure and see what the problem was... it was the wrong md5 on 4xm
[10:01:35] <astrange> i assume if you give the wrong path to --samples it will actually notice?
[10:01:57] <mru> as in tests fail, yes
[10:03:17] <mru> well, it works here
[10:03:25] <astrange> +++ tests/data/fate/4xm 2010-06-20 03:02:42.000000000 -0700 -88a53430410d1cec5ed46846652ffd51 +d41d8cd98f00b204e9800998ecf8427e
[10:03:55] <astrange> ...yeah, that's the empty file md5
[10:04:22] <mru> the only md5 I've bothered to memorise
[10:04:26] <mru> well, not all of it
[10:11:59] <astrange> no, it fails with the right path. is my rsync url out of date?
[10:12:21] <mru> what's the md5 of your 4xm test file?
[10:12:38] <mru> should be 9c16fcadaf51f93be3a51b8fc92cc119
[10:14:00] <astrange> it's... a symlink pointing to something i don't have
[10:14:01] <astrange> lrwxrwxrwx 1 astrange astrange 49B Jan 25 2009 TimeGatep01s01n01a02_2.4xm -> ../../game-formats/4xm/TimeGatep01s01n01a02_2.4xm
[10:14:13] <astrange> better try it again without symlinks
[10:14:24] <mru> tell rsync to follow links
[10:16:55] * mru is itching to delete the faad wrapper...
[10:19:19] <Compn> wow, another stereoscopic patch for mplayer
[10:19:25] <Compn> what is it with the resurgence of 3d? :P
[10:19:54] <mru> of course you need two patches for stereoscopic video
[10:20:31] <mru> they figured out a cheap way to project polarised images
[10:21:14] <elenril> mru: so what are you waiting for
[10:21:24] <elenril> everybody seems to agree
[10:21:32] <mru> I was hoping michael would comment
[10:21:35] <mru> but he probably won't
[10:24:10] <mru> ok, here it comes
[10:25:00] <CIA-92> ffmpeg: mru * r23653 /trunk/ (5 files in 3 dirs): Remove libfaad wrapper
[10:28:03] <elenril> \o/
[10:29:07] <elenril> maybe this needs a changelog entry
[10:30:08] <mru> yeah, guess it does
[10:30:14] <mru> I always forget about that
[10:32:45] <CIA-92> ffmpeg: mru * r23654 /trunk/Changelog: ChangeLog: note libfaad wrapper removal
[10:39:51] <mru> astrange: btw, pts tracking with libavcodec is a bitch
[10:40:05] <Dark_Shikari> nevermind sometimes it just doesn't work
[10:40:10] <mru> I use 3 different methods
[10:40:19] <mru> reordered_opaque when that works
[10:41:11] <mru> if that fails, I track coded_picture_number
[10:42:02] <astrange> i'm surprised that i couldn't find a file that patch regressed on
[10:42:10] <mru> if that fails, I try pts out = dts in
[10:42:31] <mru> and finally I fall back on incrementing pts by frame duration
[10:42:38] <mru> for constant frame rate
[10:42:47] <mru> so 4 methods in fact
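The four-step fallback mru describes could be sketched roughly as follows. This is an illustration of the ordering only, not libavcodec code: `DecodedFrame`, `pick_pts`, and the `pts_by_coded_number` map are hypothetical, and how the coded picture number is actually tracked is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DecodedFrame:
    # Hypothetical stand-in for a decoder output frame.
    reordered_opaque: Optional[int]      # pts passed through the decoder, if it worked
    coded_picture_number: Optional[int]  # decode-order index, if known


def pick_pts(frame: DecodedFrame,
             pts_by_coded_number: Dict[int, int],
             packet_dts: Optional[int],
             last_pts: Optional[int],
             frame_duration: int) -> int:
    """Try the four methods from the chat, in the order given."""
    # 1. reordered_opaque, when that works.
    if frame.reordered_opaque is not None:
        return frame.reordered_opaque
    # 2. Look up the pts remembered for this coded picture number
    #    (assumed bookkeeping: filled in when the packet was sent).
    n = frame.coded_picture_number
    if n is not None and n in pts_by_coded_number:
        return pts_by_coded_number[n]
    # 3. Assume pts out = dts in.
    if packet_dts is not None:
        return packet_dts
    # 4. Constant frame rate: previous pts plus one frame duration.
    if last_pts is not None:
        return last_pts + frame_duration
    return 0  # assumption: start at zero when nothing is known
```

Each step is only reached when the previous one has nothing to offer, so a codec that handles `reordered_opaque` properly never touches the guesses further down.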
[10:42:48] <astrange> for files with missing dts the pre-reordered pts can be 1 2 3 4, and then 1 4 2 3 comes out instead of 1 2 3 4
[10:43:04] <mru> which codec?
[10:43:15] <astrange> since there's no buffering it only sees the problem when it gets to 2, so it can only correct it to 1 4 5 6 and you lose two pts values
[10:43:17] <mru> I don't think all codecs support reordered_opaque properly
[10:43:30] <astrange> divx with packed b-frames. or ffmpeg streamcopy .h264 to .mp4 and play that
[10:43:50] <astrange> streamcopy from h264 is broken because the parser doesn't generate dts
[10:43:57] <mru> streamcopy h264 to mp4 will produce an invalid file
[10:44:13] <mru> mp4 requires pts and dts for all frames
[10:44:29] <Dark_Shikari> invalid? last I recall it just fails
[10:44:32] <mru> if you lie to the muxer, you get what you deserve
[10:44:43] <mru> Dark_Shikari: depends on how you lie
[10:44:48] <Dark_Shikari> I mean in ffmpeg.
[10:44:57] <mru> as long as dts is increasing, ffmpeg is happy
[10:44:58] <Dark_Shikari> and generating dts is trivial
[10:45:12] <Dark_Shikari> The problem is that ffmpeg doesn't have a universal system for doing it
[10:45:16] <mru> generating dts is usually just counting frames
[10:45:20] <Dark_Shikari> and instead forces you to repeat the same dts-generation code in every single demuxer
[10:45:21] <astrange> i think vfr + packed b-frames might regress with the patch...
[10:45:27] <astrange> hmm, i might actually have a 120fps avi that would do that
[10:45:37] <Dark_Shikari> but 120fps isn't vfr
[10:45:53] <astrange> the avi demuxer reads it as vfr
[10:45:54] <mru> avi is never vfr
[10:46:06] <Dark_Shikari> astrange: oh, by dropping null frames
[10:46:07] <astrange> it uses drop frames, not 120fps coded frames
[10:46:10] <mru> sometimes people set a stupidly high fps and pad with null frames
[10:46:18] <Dark_Shikari> well, even if avi isn't vfr, the demuxer is
[10:46:19] <mru> for a fake vfr effect
[10:49:39] <Dark_Shikari> dunno, seems perfectly "real" vfr to me
[10:49:43] <Dark_Shikari> a _crappy_ way of doing it, for sure
[10:49:59] <Dark_Shikari> inefficient, as it's O(duration * timebase denominator) in overhead instead of O(frames)
[10:50:01] <mru> fake at container level
[10:50:10] <mru> avi officially doesn't allow vfr
[10:50:23] <mru> how could it without timestamps?
[10:50:38] <mru> and null packets only get you so far
[10:50:57] <Dark_Shikari> are null packets a container level feature?
[10:51:01] <Dark_Shikari> or an abuse of the bitstream
[10:51:05] <mru> container abuse
[10:51:05] <Dark_Shikari> i.e. something done on codec level
[10:51:15] <mru> zero-sized avi packet
[10:51:20] <Dark_Shikari> ah
[10:51:37] <Dark_Shikari> I would say that's "real vfr, horribly inefficient, in a way that isn't specified by the container"
[10:52:16] <Dark_Shikari> either way, just semantics.
[10:54:35] <mru> suppose your average rate is something normal but with random jitter
[10:54:49] <mru> requiring a time base of 1us or so
[10:54:59] <mru> then most of your avi file would be null packets
[10:54:59] <Dark_Shikari> as I said, horribly inefficient =p
[10:55:08] <mru> that's beyond horrible
[10:55:11] <Dark_Shikari> Most VFR doesn't have such a crazy timebase though.
[10:55:16] <Dark_Shikari> millisecond is common enough.
[10:55:24] <Dark_Shikari> for ms precision, it wouldn't be utterly unusable, just bad.
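The overhead argument can be put into numbers. A small sketch, assuming one packet slot per time base tick and zero-sized packets in every slot without a real frame; the function name and the 60-second, 1500-frame example are illustrative only:

```python
def avi_null_packets(duration_s: int, ticks_per_second: int, real_frames: int) -> int:
    """Count the zero-sized packets needed to fake VFR in a fixed-rate stream.

    The stream has one slot per tick, so the slot count grows with
    duration * ticks_per_second, not with the number of coded frames.
    """
    slots = duration_s * ticks_per_second
    if real_frames > slots:
        raise ValueError("more frames than available slots")
    return slots - real_frames


# 60 seconds of video with 1500 coded frames (25 fps on average):
print(avi_null_packets(60, 1000, 1500))       # millisecond time base
print(avi_null_packets(60, 1_000_000, 1500))  # microsecond time base
```

With a millisecond time base the null packets outnumber real frames by about 39 to 1; with a microsecond time base almost the entire index is padding, which is the case mru calls "beyond horrible".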
[10:56:11] <astrange> i think windows programs just force the closest available time on frames
[10:57:04] <mru> for playback you obviously have to quantise at the display refresh rate anyway
[11:41:29] <CIA-92> ffmpeg: cehoyos * r23655 /trunk/libavformat/spdif.c:
[11:41:29] <CIA-92> ffmpeg: Add IEC958 data_types for DTS-HD (data burst described in IEC 61937-5),
[11:41:29] <CIA-92> ffmpeg: E-AC-3 (61937-3 Edition 2) and TrueHD (61937-9).
[12:34:23] * lu_zero is back
[12:34:51] <lu_zero> ffrtsp is taking a bit of time before starting, beside that isn't different than live555 so far
[12:51:29] <Honoome> lu_zero: already back from cinisello?
[13:01:15] <CIA-92> ffmpeg: vitor * r23656 /trunk/libavcodec/mpegaudiodec.c:
[13:01:15] <CIA-92> ffmpeg: Fix breakage in compilation with --disable-mpegaudio-hp introduced in
[13:01:15] <CIA-92> ffmpeg: r23646.
[13:32:29] <Vitor1001> mru: aliasing in C sucks
[13:32:49] <Vitor1001> no wonder there is a lot of code that is just way faster in fortran :p
[13:33:07] <mru> hmm... fortran in ffmpeg...
[13:33:27] <Vitor1001> :p
[13:33:36] <Vitor1001> It's ugly as hell
[13:33:41] <mru> I know
[13:33:56] <mru> we could write polyglots for those who don't have a fortran compiler
[13:33:59] <Vitor1001> fortran written by scientist is even better
[13:34:23] <Vitor1001> s/fortran/fortran code/
[13:34:41] <Vitor1001> I wonder why C does not have some attribute for pointer that guarantees that
[13:34:55] <Vitor1001> 1- Pointers with this attribute are either identical or do not alias
[13:35:15] <Vitor1001> 2- They are as much aligned as the most restrictive instruction of the platform
[13:35:16] <Vitor1001> ?
[13:36:32] <mru> attributes wouldn't cut it
[13:36:45] <Vitor1001> You can call it a new type...
[13:36:51] <mru> same problem
[13:37:02] <Vitor1001> You can cast it to a pointer without a warning, but not the inverse.
[13:37:05] <mru> what if you have four pointers which are pairwise identical or disjoint
[13:37:23] <mru> and crosswise aliasing is entirely impossible
[13:38:32] <mru> or any ident-or-disjoint n-tuples
[13:39:24] <Vitor1001> If the compiler really needs to know, it can always insert a if(a==b){ ... } else {...}, no?
[13:39:56] <mru> the compiler is free to insert whatever it wants
[13:40:05] <mru> as long as the code is semantically equivalent
[13:40:20] <Vitor1001> So where would such a strategy fail?
[13:40:27] <mru> you'll often see compiler check for alignment
[13:40:42] <Vitor1001> I suppose you are not talking about gcc?
[13:41:00] <mru> even gcc does that sometimes iirc
[13:41:12] <mru> armcc certainly does
[15:22:12] <nfl> merbzt: ping
[16:59:38] <CIA-92> ffmpeg: ramiro * r23657 /trunk/configure:
[16:59:38] <CIA-92> ffmpeg: Use ${strip} variable instead of just plain "strip". The former already has a
[16:59:38] <CIA-92> ffmpeg: leading cross_prefix in it. Fixes cross-compilation for darwin.
[17:16:20] <CIA-92> ffmpeg: alexc * r23658 /trunk/libavcodec/ (ps.c ps.h): Remove iid_mode from the PS context.
[17:29:41] <CIA-92> ffmpeg: alexc * r23659 /trunk/libavcodec/ps.c: Document the PS_BASELINE define.
[17:53:34] <peloverde> Does it ever seem like disable-everything behaves a little wonky sometimes?
[18:00:48] <mru> explain
[18:01:17] <mru> I assume you're enabling some stuff afterwards
[18:01:22] <peloverde> yes
[18:01:23] <mru> or are you running full debian mode?
[18:01:51] <peloverde> consider ./configure --disable-everything {--enable aac specific stuff} --enable-ffmpeg
[18:02:08] <peloverde> It doesn't build FFmpeg because it needs the buffer filter
[18:02:29] <mru> I suppose we could make it auto-select that
[18:02:36] <mru> the question is which should take precedence
[18:03:11] <peloverde> Also that same setup still seems to find and enable SDL and ffplay by default
[18:04:01] <mru> --disable-everything doesn't seem to disable quite everything
[18:05:38] <peloverde> indeed
[18:06:11] <_av500_> --disable-everything on its own should build nothing in the end, no?
[18:06:48] <mru> it only disables codecs etc
[18:07:04] <mru> not building anything at all is much simpler
[18:07:06] <mru> don't run make
[18:07:12] <peloverde> IMHO it should disable the ffprogs as well
[18:07:37] <mru> whatever we do, someone will be inconvenienced
[18:08:18] <peloverde> nobody is happy, the sign of a good compromise
[18:12:26] <peloverde> wonderful... a regression (I think) somewhere deep inside SBR
[18:14:47] <mru> it's not a regression if it never worked
[18:16:15] <peloverde> That's what I'm checking
[18:16:36] <peloverde> I really need to get this AAC stuff inside FATE
[18:17:14] <mru> we need to rewrite fate properly
[18:17:41] <mru> having mike as single point of failure is not acceptable
[18:17:51] <mru> and he's way too slow to react
[18:18:13] <peloverde> The bulk of AAC can be tested by PSNR and/or off-by-1
[18:27:06] <KotH> mru: feel free to rewrite fate ;-)
[18:27:39] <mru> no time right now
[18:28:19] <peloverde> And the answer is never worked or broke before I merged SBR
[18:28:38] <mru> time to bisect the private repo
[18:50:48] <peloverde> Woohoo it's a confirmed regression
[18:56:08] <mru> so fix it, what are you waiting for?
[18:57:08] <twnqx> christmas!
[18:57:14] <twnqx> sorry, couldn't resist.
[18:58:10] * mru wonders whether the queen has the power to move christmas
[19:05:40] * peloverde would gladly revert this commit, but some claimed in the review that it was "cleaner," so I need to figure out how to properly fix it
[19:05:56] <mru> which commit?
[19:07:25] <peloverde> http://github.com/aconverse/ffmpeg-heaac/commit/62b32cbd5ff1e6f20ae5c530a1f…
[19:12:20] <mru> sure, it's cleaner
[19:12:24] <mru> but if it's also wrong...
[19:13:30] <mru> I'd still try to split it by sign
[19:13:47] <mru> might be nicer
[19:22:07] <peloverde> found it
[19:24:58] * peloverde kicks CIA-92
[19:24:59] <CIA-92> ow
[19:25:08] <CIA-92> ffmpeg: alexc * r23660 /trunk/libavcodec/aacsbr.c: 10l: aacsbr: Fix f_master[2] calculation when k2diff == -1.
[19:28:53] <peloverde> I enjoy how the CT PS signaling testsuite uses SBR features not found in the SBR testsuite
[19:38:56] <CIA-92> ffmpeg: alexc * r23661 /trunk/libavcodec/ps.c:
[19:38:56] <CIA-92> ffmpeg: Allow PS envelope fixup when ps->num_env_old <= 1.
[19:38:56] <CIA-92> ffmpeg: It is already rejected by the "source >= 0 && source != ps->num_env" 0 envelope
[19:38:56] <CIA-92> ffmpeg: case and is perfectly legally for the suppressed final envelope case.
[19:54:02] <Dark_Shikari> is that ban on german ips or whatever still up?
[19:54:08] <Dark_Shikari> I recall there was some ip block from ffmpeg.org
[19:54:17] <mru> there are many blocks
[19:54:27] <mru> half of china is blocked
[19:54:27] <Dark_Shikari> there was some rather large one that has hit an unreasonable number of users
[19:54:31] <Dark_Shikari> iirc
[19:54:39] <Dark_Shikari> something like "blocking half of germany"
[19:54:47] <mru> arcor is blocked
[19:54:57] <Dark_Shikari> why?
[19:55:07] <mru> some idiot with dynamic ip is hammering our server
[19:55:52] <Dark_Shikari> yeah, there are a lot of users in #ffmpeg who can't download ffmpeg.
[19:55:56] <Dark_Shikari> it's kinda annoying.
[19:56:16] <mru> tell them to track down and kill the idiot
[19:56:33] <Dark_Shikari> it would be useful if they knew some bit of identifying information =p
[19:56:56] <Dark_Shikari> still, blocking millions of users because of one idiot is kinda silly
[19:57:10] <mru> we don't like it either
[19:57:17] <Dark_Shikari> then why do it?
[19:57:28] <Dark_Shikari> bandwidth is cheap
[19:57:47] <mru> no
[19:58:09] <Dark_Shikari> no what
[19:58:10] <mru> if we use too much bandwidth our sponsors will get angry
[19:58:24] <Dark_Shikari> "sponsors"?
[19:58:33] <Dark_Shikari> Fuck, you can get an unlimited-bandwidth host for $10 a month
[19:58:56] <mru> not with the capacity of ours
[19:59:25] <mru> dual xeon, 6GB ram
[19:59:29] <mru> piles of diskspace
[19:59:48] <Dark_Shikari> But you don't need that for a user-facing server
[19:59:49] <Dark_Shikari> we have no php
[20:00:20] <Dark_Shikari> you could serve our entire website off a beagleboard
[20:00:49] <mru> right now our load average is 0.7
[20:01:02] <Dark_Shikari> yes, that's because of svn, probably
[20:01:07] <mru> yes, mostly
[20:01:38] <mru> now it's 0.8
[20:03:02] <mru> we average close to 10Mbps out
[20:04:20] <Dark_Shikari> so....
[20:04:22] <mru> those "free" hosting services are only good for small blogs
[20:04:24] <Dark_Shikari> if we switched to git
[20:04:45] <Dark_Shikari> I imagine we could cut down rather a lot on that load average
[20:05:12] <osaft> Arcor is blocked?_?
[20:05:17] <Dark_Shikari> mru: our website *is* a small blog
[20:05:20] <SpeedsterF2> hi
[20:05:22] <Dark_Shikari> in terms of size and traffic
[20:05:25] <Dark_Shikari> it's the svn that's the problem
[20:05:41] <SpeedsterF2> I am the arcor user that has frequent connection problems
[20:05:58] <mru> Dark_Shikari: fine, you take over
[20:06:21] <mru> see if you can manage the availability we've had until now
[20:06:31] <Dark_Shikari> you mean totally shit availability?
[20:06:32] <SpeedsterF2> I just wonder why i can access ffmpeg.org from time to time, but on other days i fail
[20:06:37] <Dark_Shikari> where millions of users can't access the website most of the time?
[20:06:43] <Dark_Shikari> where svn takes 5 minutes for a checkout?
[20:06:50] <Dark_Shikari> the current "availability" is laughable
[20:07:04] <mru> in the last year we've had _one_ unplanned outage lasting more than an hour or so
[20:07:08] <Dark_Shikari> you can't just block half the world and then say "oh, well we're highly available to the rest of it"
[20:07:32] <CIA-92> ffmpeg: alexc * r23662 /trunk/libavcodec/ps.c: Use memcpy() where appropriate in PS stereo processing remapping.
[20:07:38] <Dark_Shikari> "we're highly available to 127.0.0.1"
[20:07:43] <mru> $ time svn co svn://svn.ffmpeg.org/ffmpeg/trunk ffmpeg.svn
[20:07:46] <mru> real 0m7.088s.
[20:08:07] <Dark_Shikari> ok, try this then
[20:08:10] <Dark_Shikari> "time svn log > /dev/null"
[20:08:18] <mru> that's svn being full of suck
[20:08:38] <Dark_Shikari> lol
[20:08:48] <peloverde> time git svn clone...
[20:08:54] <Dark_Shikari> so... git ....
[20:09:04] <mru> git svn clone obviously doesn't matter
[20:09:14] <mru> fetching every rev sequentially has to take a while
[20:09:28] <Dark_Shikari> 50.5 seconds
[20:09:47] <peloverde> When we switch to git, it will make life a lot easier
[20:09:53] <Dark_Shikari> 0.13 seconds to get the git log of x264
[20:10:00] <Dark_Shikari> Yes sure it's shorter, but not 500 times =p
[20:10:06] <Dark_Shikari> what's git-svn say for ffmpeg's log?
[20:10:14] <mru> git clone of ffmpeg takes 20s here
[20:10:51] <Dark_Shikari> what about git log of ffmpeg?
[20:11:37] <Kovensky> git log is generated locally, it doesn't count
[20:11:48] <Kovensky> the only git times that matter are clone, fetch and push
[20:12:23] <Kovensky> well, clone is "syntactic" sugar for init + remote + fetch, so yeah, fetch and push
[20:12:30] <Kovensky> (+ checkout)
[20:13:10] <Dark_Shikari> Kovensky: no, it does count
[20:13:21] <Dark_Shikari> the whole point is that it's local, so it's faster.
[20:13:50] <Kovensky> oic, I thought the discussion was about the server itself
[20:14:02] <SpeedsterF2> I just want to mention that blocking arcor keeps me from accessing URLs like http://git.ffmpeg.org/?p=ffmpeg;a=shortlog or http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/, too
[20:14:02] <CIA-92> ffmpeg: alexc * r23663 /trunk/libavcodec/ps.c: Cosmetics: whitespace.
[20:14:26] <SpeedsterF2> I use those to keep being up to date on the discussions / development process
[20:14:56] <peloverde> "time git clone git://git.ffmpeg.org/ffmpeg" real 3m1.161s
[20:15:37] <Kovensky> peloverde: what about git://repo.or.cz/FFMpeg-mirror.git (yes, with uppercase M)
[20:17:57] <mru> peloverde: what's your downstream bandwidth?
[20:19:51] <Dark_Shikari> wow, that git server is slow
[20:19:57] <Dark_Shikari> I'm only getting like 150 KBps
[20:20:02] <Dark_Shikari> and my connection is 10mbit
[20:20:55] <peloverde> Kovensky: real 5m0.676s
[20:21:45] <elenril> Dark_Shikari: no, you're just in a wrong place
[20:21:54] <elenril> Receiving objects: 100% (116165/116165), 35.97 MiB | 7.58 MiB/s, done.
[20:21:55] <peloverde> mru, I'm paying for 15mbps but I get far less than that
[20:22:50] <mru> elenril: .cz probably has poor external lines
[20:23:09] <Dark_Shikari> 2m16s
[20:23:11] <mru> peloverde: I usually get 18-20Mbps down
[20:23:38] <peloverde> I'm getting 3.74 Mbps down atm :(
[20:24:00] <mru> explains the difference in clone time
[20:24:26] <peloverde> It's better weekdays during the day
[20:24:32] <peloverde> but it still is super shitty
[20:24:48] <peloverde> sadly it's the only ISP I can get around here
[20:25:14] <mru> there's a bit of competition here
[20:25:18] <mru> though not much at the high end
[20:25:55] <peloverde> The best government money can buy gave the major telcos money to build out fiber, but they never seemed to deliver on that
[20:27:01] <mru> Dark_Shikari: anyhow, I'm looking at that idct_dc again
[20:27:16] <mru> and it's mysteriously failing in a couple of cases
[20:27:23] * SpeedsterF2 has to leave now
[20:27:35] <SpeedsterF2> thanks for your investigations
[20:28:25] <SpeedsterF2> I hope that there will be a way for me to access natsuki services via my arcor ISP in the future
[20:28:42] <Dark_Shikari> mru: "failing" as in "not matching C"?
[20:28:51] <Dark_Shikari> or "being totally utterly wrong"
[20:28:59] <SpeedsterF2> Or i have to find a (free) proxy to hide my "identity"
[20:29:02] <mru> C not matching C
[20:29:48] <Dark_Shikari> mru: o.0 0.o o.0
[20:32:54] <lu_zero> O_o?
[20:35:15] <peloverde> ಠ_ಠ
[20:37:01] <Kovensky> <@Dark_Shikari> wow, that git server is slow <@Dark_Shikari> I'm only getting like 150 KBps <-- inorite
[20:37:09] <Kovensky> <@Dark_Shikari> and my connection is 10mbit <-- wasn't it 1gbit
[20:38:17] <Dark_Shikari> at school
[20:38:21] <Dark_Shikari> now I'm at a condo in oc
[21:14:43] <BBB> Tjoppen: so I think I need more background info, what exactly isn't working with http seeking? is it just the fact that it hangs?
[21:14:53] <BBB> I mean, seek isn't supposed to work if content-range is missing
[21:19:05] <mru> Dark_Shikari: ugh, block_last_index isn't always correct
[21:19:52] <Dark_Shikari> sounds like a bug
[21:21:19] <Tjoppen> BBB: well, first of all you can't rewind
[21:21:26] <mru> Dark_Shikari: yes, probably
[21:21:44] <mru> Dark_Shikari: block_last_index says 0 even though block[63] == 1
[21:21:47] <mru> in one case
[21:21:49] <Tjoppen> also, it is entirely possible that the server accepts Range but doesn't yet know the size of the file
[21:22:20] <BBB> the way for a server to indicate that it supports range is by sending out content-range
[21:22:27] <Tjoppen> I've managed to get non-finished uploads playing this way
[21:22:28] <BBB> afaict
[21:24:34] <Tjoppen> or rather, it's a bit more serious: if for some reason the server uses chunked encoding but provides a content-range, it would still beak
[21:24:45] <Tjoppen> *break
[21:25:06] <BBB> ?
[21:25:16] <Dark_Shikari> mru: what video format
[21:25:19] <mru> mpeg2
[21:25:31] <BBB> Tjoppen: look, if you want this to work, then first it needs to work for the broken cases it's trying to fix
[21:27:02] <Dark_Shikari> mru: oh wow, so not even something with e.g. ac pred
[21:27:30] <Tjoppen> well, the only thing that is broken is that it can end up sending the headers using chunked encoding
[21:29:49] <BBB> I cannot reproduce that
[21:29:59] <BBB> it doesn't seek here, ever
[21:30:03] <BBB> because content-range is missing
[21:30:07] <BBB> and is_streamed is 1
[21:30:10] <BBB> so it doesn't ever seek
[21:30:15] <BBB> it doesn't even attempt to
[21:30:22] <BBB> it just continues playing
[21:30:27] <BBB> (try e.g. a .rm file)
[21:30:36] <BBB> that sounds like correct behaviour to me
[21:31:21] <Tjoppen> try saying content-range: 0-
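For reference, the two cases argued about here (the server reports a range and knows the file size, or reports a range while the size is still unknown) correspond to the two forms of a satisfied HTTP `Content-Range` value. A minimal parsing sketch, not the logic in http.c; `parse_content_range` is a hypothetical helper and it deliberately ignores the unsatisfied-range form:

```python
import re
from typing import Optional, Tuple

# "bytes <first>-<last>/<total>", where <total> may be "*" if unknown.
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Return (first, last, total) or None if the value is not a byte range.

    total is None when the server sent "*", i.e. it does not yet know
    the complete length (for example, an upload still in progress).
    """
    m = _CONTENT_RANGE.match(value.strip())
    if m is None:
        return None
    first, last = int(m.group(1)), int(m.group(2))
    total = None if m.group(3) == "*" else int(m.group(3))
    return first, last, total


print(parse_content_range("bytes 0-499/1234"))
print(parse_content_range("bytes 0-499/*"))
```

A client following BBB's rule would treat a `None` result as "not seekable"; whether an unknown total should still allow seeking is exactly the open question in the discussion.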
[21:33:29] <mru> found something
[21:33:39] <mru> mpeg12.c line 940
[21:34:19] <mru> what's that all about?
[21:34:53] <Dark_Shikari> I have no idea wtf.
[21:38:27] <BBB> Tjoppen: ok, so I have fixed that now by using url_write instead of http_write
[21:38:35] <BBB> I still can't reproduce it, but it's better either way
[21:38:48] <BBB> so now all that's left is that chunked_size isn't reset after a seek?
[21:39:10] <CIA-92> ffmpeg: rbultje * r23664 /trunk/libavformat/http.c:
[21:39:10] <CIA-92> ffmpeg: Use url_write(), not http_write(), for sending the HTTP headers. This prevents
[21:39:10] <CIA-92> ffmpeg: them from being sent using chunked encoding (I don't think this ever happened,
[21:39:10] <CIA-92> ffmpeg: but either way it would be wrong).
[21:41:08] <BBB> Tjoppen: ok, so that's fixed also now
[21:41:21] <BBB> Tjoppen: now, still, it doesn't actually seek, so what do you suggest I do about that?
[21:41:53] <CIA-92> ffmpeg: rbultje * r23665 /trunk/libavformat/http.c:
[21:41:53] <CIA-92> ffmpeg: Reset chunksize back to zero (= no chunked encoding) after each new open
[21:41:53] <CIA-92> ffmpeg: connection (e.g. a seek). This fixes the theoretical case where a server
[21:41:53] <CIA-92> ffmpeg: sends a file first using chunked encoding, and then using non-chunked
[21:41:53] <CIA-92> ffmpeg: encoding.
[21:41:59] <mru> Dark_Shikari: I don't understand the purpose of that mismatch variable
[21:42:13] <Dark_Shikari> mru: neither do I
[21:42:16] <Dark_Shikari> ask michael
[21:42:24] <Dark_Shikari> oh.
[21:42:29] <Dark_Shikari> dct mismatch?
[21:42:43] <Dark_Shikari> Oh god.
[21:42:45] <Dark_Shikari> I see what he's doing.
[21:42:56] <mru> please explain
[21:43:13] <Dark_Shikari> To account for dct mismatch, he's randomly adjusting the last coeff
[21:43:23] <Dark_Shikari> based on the number of nonzero coeffs
[21:44:05] <mru> it's xoring the lsb of all coeffs into the last one
[21:44:12] <Dark_Shikari> yes
[21:44:40] <mru> which of course sprinkles +-1 all over the final block
[21:45:12] <Dark_Shikari> yes
[21:45:15] <Dark_Shikari> to dither the dct mismatch
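What mru describes (xoring the LSB of every coefficient into the last one) can be sketched as below. This follows the description in the chat and the usual statement of MPEG-2 mismatch control (toggle the LSB of the last coefficient when the coefficient sum is even); it is not a copy of the code at mpeg12.c line 940, and the function name is made up:

```python
from typing import List


def apply_mismatch_control(block: List[int]) -> List[int]:
    """Return a copy of a 64-coefficient block with mismatch control applied."""
    if len(block) != 64:
        raise ValueError("expected an 8x8 block of 64 coefficients")
    mismatch = 1
    for coeff in block:
        mismatch ^= coeff          # only the lowest bit matters in the end
    out = list(block)
    # mismatch & 1 is 1 exactly when the coefficient sum is even;
    # in that case the LSB of the last coefficient is flipped.
    out[63] ^= mismatch & 1
    return out


dc_only = [2] + [0] * 63
print(apply_mismatch_control(dc_only)[63])
```

The DC-only block above ends up with a nonzero last coefficient, which matches the case mru hit earlier: `block_last_index` says 0 even though `block[63] == 1`.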
[21:45:26] <mru> urg
[21:45:39] <Dark_Shikari> and it doesn't set the last nonzero coeff properly
[21:45:46] <Dark_Shikari> stupid.
[21:45:46] <mru> the last bit I noticed
[21:45:56] <Dark_Shikari> check the svn blame on that
[21:46:01] <mru> since forever
[21:46:06] <Dark_Shikari> I wonder when
[21:46:12] <Dark_Shikari> and if the omission of adjusting the last nonzero coeff was intentional
[21:47:59] <BBB> Tjoppen: I even removed all checks
[21:48:05] <BBB> Tjoppen: the fileserver example simply doesn't seek
[21:48:12] <BBB> Tjoppen: the behaviour of http.c is correct
[21:49:00] <BBB> Tjoppen: if my commits leave an actual bug now, I'd be happy to help, but I think apart from the case where chunked and non-chunked encoding are mixed, http.c is doing the correct thing right now
[21:50:15] <mru> Dark_Shikari: it's been there from the start
[21:50:27] <Dark_Shikari> mru: WHAT
[21:50:36] <Dark_Shikari> er
[21:50:42] <Dark_Shikari> have you checked cvs history?
[21:50:45] <mru> yes
[21:50:54] <Dark_Shikari> it was added in the initial commit of the mpeg decoder?
[21:50:57] <Dark_Shikari> by michael or fabrice?
[21:51:08] <mru> initial commit
[21:51:10] <mru> fabrice
[21:51:15] <Dark_Shikari> ..... holy shit
[21:51:19] <Dark_Shikari> I think we can justify removing it then
[21:51:19] <Dark_Shikari> lol
[21:51:19] <mru> initial commit of _ffmpeg itself_
[21:51:23] <Dark_Shikari> LOL
[21:51:51] * elenril thinks this needs to go into quotes
[21:52:29] <mru> since the beginning of time, the big bang
[21:53:01] <Dark_Shikari> ok, feel free to propose removing that
[21:53:02] <Dark_Shikari> lol
[21:53:55] <BBB> Dark_Shikari: I added some macros to create simple h8/h16 variants of the mc function
[21:54:22] <BBB> Dark_Shikari: there is no easy way of macroing my way towards a v8/v16 right? I simply made a quick C function calling the v4 one twice with different src/dst offsets
[21:54:33] <BBB> (and then h+v as C functions calling both)
[21:54:39] <CIA-92> ffmpeg: alexc * r23666 /trunk/libavcodec/ps.c: Rename PS bitstream reading functions to have a read_ prefix.
[21:54:58] <mru> Dark_Shikari: you propose just removing the dithering?
[21:55:08] <mru> I'm fine with that
[21:57:59] <Dark_Shikari> mru: yes
[21:58:13] <Dark_Shikari> BBB: what did you do to macro it?
[21:58:16] <Dark_Shikari> link?
[21:59:21] <BBB> http://ffmpeg.pastebin.com/RGxXeekQ
[21:59:30] <BBB> I didn't remove the splats yet, I haven't coded this weekend
[21:59:32] <BBB> maybe monday or so
[22:00:08] <Dark_Shikari> that's wasteful
[22:00:23] <Dark_Shikari> do w8 by calling it many times, not by %rep
[22:00:26] <Dark_Shikari> or even by templating it
[22:00:33] <Dark_Shikari> templating or %rep needlessly increases code size
[22:00:39] <Dark_Shikari> there's no speed benefit really
[22:01:07] <BBB> http://ffmpeg.pastebin.com/at9WQzzR
[22:01:11] <BBB> that's what I do for v8/v16
[22:01:19] <BBB> should I do that for h8/16 as well?
[22:03:01] <Dark_Shikari> yes
[22:03:11] <Dark_Shikari> well, ideally you should do it in asm
[22:03:15] <Dark_Shikari> that would eliminate the calling overhead
[22:03:22] <Dark_Shikari> i.e. in x264 where I showed you NXN_DCT
[22:03:24] <Dark_Shikari> is a good example
[22:03:26] <Dark_Shikari> but you can do that later.
[22:03:48] <BBB> uhm, right, but then it needlessly increases codesize again, no?
[22:03:54] <Dark_Shikari> no
[22:03:56] <CIA-92> ffmpeg: alexc * r23667 /trunk/libavcodec/ps.c: psdec: Replace a division with a shift.
[22:04:01] <Dark_Shikari> just calling code
[22:04:11] <Dark_Shikari> this allows you to skip most of the calling convention
[22:04:19] <BBB> oooo
[22:04:19] <Dark_Shikari> but it's tricky, you can do that much later.
[22:04:20] <BBB> right
[22:04:28] <BBB> I see what you mean
[22:04:38] <BBB> right, because C will re-put all stuff to the stack
[22:04:46] <Dark_Shikari> in x86_32.
[22:04:50] <BBB> whereas I don't need to, I can just jmp
[22:05:09] <BBB> or, well, call x.aftersetup
[22:05:12] <BBB> then increase dst/src
[22:05:16] <BBB> and then do the same thing again
[22:05:27] <Dark_Shikari> yup
[22:05:31] <Dark_Shikari> it's skip_prologue in x264
[22:05:38] <Dark_Shikari> this also lets you load global constants into registers once at the start
[22:06:00] <BBB> I'll do that later, I figure the gain is rather small
[22:06:18] <BBB> I'll rewrite h8/16 to do this, first/for-now
[22:07:11] <Dark_Shikari> Yeah, stick with the current bit for now
[22:07:52] <BBB> so next, I was looking into the loopfilter
[22:08:05] <BBB> that's the top hit in my performance test (shark)
[22:08:24] <CIA-92> ffmpeg: cehoyos * r23668 /trunk/libavcodec/dca.c:
[22:08:25] <CIA-92> ffmpeg: Fix typo in macro name.
[22:08:25] <CIA-92> ffmpeg: Patch by Nick Brereton, nick nbrereton net
[22:08:52] <BBB> 8 functions, with 6 of them taking 7.7-10.2% each of total time in the video thread
[22:09:21] <Dark_Shikari> the real question you'll want to consider is whether the vp8 loopfilter is faster in 8-bit or 16-bit
[22:09:22] <CIA-92> ffmpeg: alexc * r23669 /trunk/libavcodec/ps.c: psdec: Simplify filter addressing by incrementing the "in" pointer.
[22:09:25] <Dark_Shikari> also, the loopfilter is generally pretty difficult to do
[22:09:36] <BBB> yeah, it looks complex
[22:10:01] <BBB> lot of clipping (both int8 as well as uint8)
[22:10:06] <BBB> and lots of conditionals
[22:10:13] <BBB> (if(...) { ... }
[22:10:38] <BBB> which looks particularly complex if you want to do multiple rows at once, you can't conditionally do one row and not the other (?)
[22:10:55] <Dark_Shikari> You should start by looking at existing loopfilter code
[22:10:57] <Dark_Shikari> e.g. VP3 and h264
[22:11:22] <Dark_Shikari> I would suggest doing idct first though
[22:11:25] <Dark_Shikari> it's only one function and it's easier
[22:11:42] <BBB> that's only 2.1% of total calling time
[22:11:47] <BBB> seems like it isn't really relevant
[22:12:01] <BBB> even ff_emulated_edge_mc() takes more time (3.1%)
[22:12:07] <mru> 2.1% is relevant
[22:12:11] <BBB> shouldn't that be optimized btw?
[22:12:20] * mru considers down to 1% certainly relevant
[22:12:26] <mru> smaller ones too if there are many
[22:13:44] <BBB> right, but you start with the stuff taking 10% right?
[22:13:45] <Dark_Shikari> BBB: it's highly relevant
[22:13:48] <BBB> especially if there's 6 of them
[22:13:50] <Dark_Shikari> it will be much more time once you optimize other things
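Dark_Shikari's point is plain arithmetic on the profile. A rough worked example, assuming the six loop filter functions add up to about 54% of the time (six at roughly 9% each, from BBB's 7.7-10.2% figures) and a hypothetical 3x speedup from optimizing them; both numbers are illustrative:

```python
def share_after_speedup(share: float, optimized_share: float, speedup: float) -> float:
    """New fraction of total time taken by an untouched function.

    share:           its current fraction of total time (0..1)
    optimized_share: fraction of total time spent in the code being optimized
    speedup:         factor by which that code gets faster
    """
    remaining_total = 1.0 - optimized_share + optimized_share / speedup
    return share / remaining_total


# idct at 2.1% today, loop filter (54%) made 3x faster:
print(round(share_after_speedup(0.021, 0.54, 3.0) * 100, 2))
```

Under these assumptions the idct's share rises from 2.1% to a bit over 3% without any change to the idct itself, and it keeps rising as more of the hot functions are optimized.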
[22:13:56] <Dark_Shikari> if ff_emulated_edge_mc takes more time, something is horribly wrong
[22:14:03] <BBB> it takes 3.1%
[22:14:04] <Dark_Shikari> edge mc shouldn't be used that much unless your test video is 160x120
[22:14:15] <BBB> that's true
[22:14:20] <BBB> maybe I should check, my conditional may be broken
[22:14:25] <Dark_Shikari> Probably
[22:14:33] <CIA-92> ffmpeg: alexc * r23670 /trunk/libavcodec/ps.c: psdec: IPD/OPD reset is no longer needed by the context initializer.
[22:16:21] <BBB> it looks alright
[22:16:42] <Dark_Shikari> Make sure it is.
[22:16:58] <Dark_Shikari> e.g. by printing mvs and verifying by hand
[22:21:55] <BBB> it looks fine, it's a little broad, I'm finetuning it
[22:25:27] * mru waits for michael to react
[22:30:23] <Dark_Shikari> it violates the spec?!?!
[22:30:28] <Dark_Shikari> WHAT?
[22:30:32] <Dark_Shikari> THE SPEC MANDATES THIS?
[22:32:40] <mru> apparently
[22:32:42] <astrange> at least it was only in one spec
[22:33:03] <mru> why weren't you here a few minutes ago?
[22:33:12] <astrange> coincidence
[22:33:24] <mru> conspiracy
[22:33:25] <astrange> can you do an idct_dc_plus_last that handles the mismatch and still have it be simple?
[22:34:21] <Dark_Shikari> yes
[22:34:32] <Dark_Shikari> it'd be complicating it just for mpeg-2 though
[22:34:39] <Dark_Shikari> and would make the rest of the code slower because you'd have to check it
[22:37:15] <bcoudurier> interesting, work on mpeg-2 coming up ?
[22:37:46] <Dark_Shikari> we want to add an idct_dc for mpeg-alike codecs
[22:38:05] <Dark_Shikari> i.e. start merging some of the changes I made in my ipad-optimized flv decoder
[22:38:58] <bcoudurier> that is cool
[22:39:00] <Dark_Shikari> bcoudurier: so why does mpeg-2 have mismatch control
[22:39:02] <Dark_Shikari> but nothing else does?
[22:39:13] <mru> someone realised it was a silly idea
[22:39:17] <bcoudurier> no idea
[22:40:18] <bcoudurier> http://lists.mpegif.org/pipermail/mp4-tech/2002-June/000815.html
[22:40:40] <bcoudurier> gary sullivan seems to speak in that thread
[22:40:54] <BBB> Dark_Shikari: maybe some company had a patent on it and wanted to get cash fast?
[22:41:37] <mru> ok, new patch
[22:41:39] <bcoudurier> Then the complexity of every
[22:41:39] <bcoudurier> implementation is reduced and no funny "mismatch control"
[22:41:39] <bcoudurier> tweaking of coefficient values is needed. And every
[22:41:39] <bcoudurier> decoder will produce exactly the same pictures.
[22:43:26] <mru> "oddification" heh
[22:47:32] <kierank> did anybody else not receive any of the emails in that mpeg-2 thread?
[22:53:00] <bcoudurier> peloverde, congrats on PS
[22:53:05] <peloverde> thanks
[22:53:29] <bcoudurier> nice work
[22:57:17] <bcoudurier> anybody know people at netflix ?
[22:58:04] <Tjoppen> BBB: sry, was doing other stuff. I can re-check to see if it still causes a problem
[22:58:13] <Dark_Shikari> I'd like to talk to them at some point, as I know they make some use of x264
[22:59:21] <Dark_Shikari> mru: just read that email. I love gary's response
[23:00:21] <bcoudurier> astrange, after the pts patch, you might want to change the condition in do_video_out to <= 0.6 instead of 1.1
[23:01:29] <bcoudurier> and I believe it would now be better to have a flag AVFMT_HAS_PTS for formats having them
[23:02:01] <bcoudurier> for avi you want to use dts and assume pts of the output picture is dts and use a fifo
[23:02:04] <BBB> Tjoppen: thanks
[23:03:31] <astrange> yes i think so
[23:03:40] <astrange> that would go better in libavformat
[23:04:16] <bcoudurier> lately I found that png in mov is a nice way to do lossless with quicktime, is there any yuv lossless supported by quicktime ?
[23:04:22] <bcoudurier> astrange, yes indeed
[23:04:26] <CIA-92> ffmpeg: alexc * r23671 /trunk/libavcodec/aacsbr.c: aacsbr: Make dk signed. There is no point in it being unsigned.
[23:05:04] <bcoudurier> ah astrange, I had a question for you, what happened to myuv and y420 fourccs in quicktime ?
[23:06:06] <astrange> i never figured out what myuv was
[23:06:19] <bcoudurier> I guess it was for the mpeg2 playback
[23:06:28] <bcoudurier> 4:2:0 yuv planar or something
[23:06:47] <astrange> yeah, it's "mpeg 4:2:0", but i don't see how it differed from y420
[23:06:57] <astrange> maybe the different chroma position
[23:07:17] <astrange> they should both work in mov files
[23:07:51] <astrange> i think the closest thing to a lossless yuv codec is prores (which is only mostly-lossless), there are some third-party ones
[23:08:06] <astrange> like sheervideo, x264encoder lossless mode + perian, some ffvhuff encoder + perian
[23:10:13] <peloverde> For all the talk of Macs being super good for multimedia, quicktime does have some strange holes
[23:12:04] <bcoudurier> prores is 4:2:2 10bit
[23:12:06] <bcoudurier> but it uses dct
[23:12:47] <astrange> FCP and co ignore the system when they have to, but still have problems (compressor uses the same awful encoder as quicktime, i think they had to ship MainConcept to get bluray support)
[23:12:59] <bcoudurier> it operates at very high bitrates so it's "visually lossless"
[23:13:19] <bcoudurier> I never got y420 working in quicktime X
[23:15:25] <kierank> prores/aic and similar are there just for market segmentation
[23:15:51] <astrange> hmm... file a bug
[23:16:53] <bcoudurier> aic is 4:2:0
[23:25:09] <bcoudurier> astrange, do you have a working y420 sample ?
[23:26:05] <astrange> nope
[23:28:26] <RTFM_FTW> heh y420 shouldn't be difficult to work with anymore
[23:28:32] <RTFM_FTW> (on Mac OS X)
[23:28:49] <RTFM_FTW> specifically with APIs like IOSurface and friends
[23:29:01] <RTFM_FTW> which are now documented :)
[23:46:05] <peloverde> Is there a good way to use a tablegen in two different codecs?
[23:46:42] <mru> whatever sintable does
[23:46:45] <mru> and costable
[23:46:57] <mru> those are used in several places
[23:56:26] <bcoudurier> what's the best denoise filter around ?
[23:57:01] <mru> memset(0) :-)
[23:57:04] <Kovensky> <@bcoudurier> peloverde, congrats on PS <-- did it get enabled by default on mplayer btw?
[00:44:21] * Honoome wishes he could have a proper tiling wm for _just one viewport_
[00:44:52] * Dark_Shikari wonders why ptest has a 3 cycle latency
[00:44:57] <Dark_Shikari> and takes two uops.
[00:48:20] <Dark_Shikari> also, I love michael responses
[00:48:21] <Dark_Shikari> "
[00:48:21] <Dark_Shikari> this heuristic is simply poor
[00:48:23] <Dark_Shikari> "
[08:41:03] <KotH> ohayou gozaimasu
[08:41:39] <ohsix> oh no, morning
[08:43:01] <wbs> morning
[08:44:50] <av500> dobro jutro
[08:45:10] <thresh> доброе утро, да
[08:45:30] <spaam> God morgon :)
[09:56:58] <CIA-98> ffmpeg: vitor * r23646 /trunk/libavcodec/ (mpegaudiodec.c mpegaudio.h):
[09:56:58] <CIA-98> ffmpeg: Factorize the mpegaudio windowing code in a function and call it by a
[09:56:58] <CIA-98> ffmpeg: function pointer. Should allow for ASM optimizations.
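The pattern this commit message describes (a C routine reached through a function pointer so that an optimized version can replace it) can be sketched as follows. All names here are hypothetical, and the window is a plain element-wise multiply rather than the real mpegaudio synthesis window:

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by the C version and any optimized replacement. */
typedef void (*window_fn)(float *dst, const float *src,
                          const float *win, size_t n);

/* Portable reference implementation. */
static void apply_window_c(float *dst, const float *src,
                           const float *win, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * win[i];
}

/* Hypothetical context holding the dispatch pointer. */
typedef struct {
    window_fn apply_window;
} WindowDSPSketch;

static void window_dsp_init(WindowDSPSketch *dsp)
{
    assert(dsp);
    dsp->apply_window = apply_window_c;
    /* A platform-specific init would overwrite the pointer here, for
     * example with an SSE or NEON routine after a CPU feature check. */
}
```

The decoder then calls `dsp.apply_window(...)` everywhere, so adding an assembly version only touches the init function.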
[10:01:10] <wbs> e/wii Yuvi
[10:01:16] <wbs> oops ;P
[10:01:24] <wbs> Yuvi: ping
[11:21:33] <mru> "Unless Steve and company start genetically engineering new cats, the Mac is in serious trouble."
[11:23:24] <mru> morning superdump
[11:23:32] <superdump> hellos
[11:23:55] <mru> how's sthlm?
[11:25:19] <Dark_Shikari> http://hakuryu.jp/bin/nicky/2008/chinacapacitorek1.jpg
[11:25:36] <mru> old but good
[11:27:07] <Dark_Shikari> also
[11:27:14] <Dark_Shikari> MPEG HRD is officially the most retarded fucking thing ever
[11:27:24] <Dark_Shikari> LETS DEFINE AN ENTIRE SYSTEM THAT RUNS ON INFINITE-PRECISION REAL MATH
[11:27:28] <Dark_Shikari> ... AND THEN ROUND OFF ALL THE BITS AT THE END
[11:27:43] <bilboed-tp> the Turing Codec
[11:27:52] <Dark_Shikari> And of course, let's make sure not to explain in the spec how you're supposed to actually implement it
[11:28:08] <mru> Dark_Shikari: H is for hypothetical
[11:28:10] <Dark_Shikari> And let's design it so that you need, in practice, probably at least 96 bits of internal precision
[11:28:15] <mru> nobody is asking you to build one
[11:28:19] <Dark_Shikari> mru: But you still have to model it
[11:28:27] <Dark_Shikari> if your precision is not infinite you will get rounding error
[11:28:37] <Dark_Shikari> Which will cause compliance errors
[11:28:46] <mru> it doesn't have to be exact
[11:28:49] <Dark_Shikari> Yes it does.
[11:28:52] <Dark_Shikari> We just confirmed it does.
[11:29:10] <mru> as long as you keep the error on the right side it doesn't matter
[11:29:16] <Dark_Shikari> there is no right side
[11:29:55] <Dark_Shikari> Manao confirmed that you absolutely need to keep full precision or you will eventually desync in hrd
[11:29:58] <mru> maybe you won't achieve the most ideal rate control or muxing
[11:29:58] <Dark_Shikari> And he wrote Ateme's HRD.
[11:30:03] <Dark_Shikari> No, this isn't about ratecontrol
[11:30:09] <mru> what then?
[11:30:26] <Dark_Shikari> This is about how you write the buffer state that results from ratecontrol
[11:30:52] <mru> as long as you're within bounds, where's the problem?
[11:31:05] <Dark_Shikari> Because if you desync from reality, you will eventually exceed bounds when you didn't realize that you did
[11:31:19] <Dark_Shikari> For example, suppose you are 0.0000001 off every frame
[11:31:24] <Dark_Shikari> after 1 million frames, you're 0.1 off
[11:31:29] <mru> 0.1 what?
[11:31:32] <Dark_Shikari> of the buffer
[11:31:34] <Dark_Shikari> i.e. 10%
[11:31:43] <Dark_Shikari> that means a buffer fullness of 0.09 is actually -0.01
[11:31:50] <Dark_Shikari> And you just underflowed.
[11:32:17] <mru> so round down when removing bits and up when adding them
[11:32:31] <Dark_Shikari> Won't work.
[11:32:35] <Dark_Shikari> That'll just slow it down
[11:32:37] <mru> no it won't...
[11:32:39] <mru> work
[11:32:54] <Dark_Shikari> There's no guarantee roundups == rounddowns there
[11:33:03] <Dark_Shikari> Or even that round ups will be proportional to round downs
[11:33:10] <Dark_Shikari> Here's the problem: "buffer_rate" (the number of bits added to the buffer each frame) is a rational number (not an integer)
[11:33:41] <mru> that's a bit extreme...
[11:33:43] <Dark_Shikari> it is equal to (video max bitrate) * (ticks per frame) / (timebase denominator)
[11:33:52] <Dark_Shikari> This is a fraction.
[11:34:03] <Dark_Shikari> If you make it into a floating point number, even a double, you will have a slightly inaccurate buffer_rate value.
[11:34:11] <Dark_Shikari> This slight inaccuracy will accumulate over time.
[11:34:23] <Dark_Shikari> The only way to fix this is to express your buffer state as (bits) * (timebase_denominator)
[11:34:27] <mru> can't you fiddle the bitrate to make it an integer?
[11:34:42] <Dark_Shikari> i.e. Actual Buffer State In Bits = Buffer State / Timebase_den
[11:34:44] <Dark_Shikari> that makes it an integer
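The scaling trick described above can be sketched in C. The struct and function names are hypothetical, the sketch assumes all products fit in 64 bits, and it leaves out capping the fullness at the buffer size. The only idea taken from the chat is storing the state as bits * timebase_den so that the per-frame increment, max_bitrate * ticks_per_frame, is an exact integer:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the scaling idea comes from the discussion. */
typedef struct {
    int64_t max_bitrate;      /* bits per second */
    int64_t ticks_per_frame;  /* timebase ticks per frame */
    int64_t timebase_den;     /* timebase denominator */
    int64_t fill_scaled;      /* buffer fullness in bits * timebase_den */
} HrdSketch;

/* One frame interval passes. buffer_rate * timebase_den equals
 * max_bitrate * ticks_per_frame, so nothing is rounded. */
static void hrd_add_frame_time(HrdSketch *h)
{
    assert(h && h->timebase_den > 0);
    h->fill_scaled += h->max_bitrate * h->ticks_per_frame;
}

/* A coded frame of frame_bits bits leaves the buffer. */
static void hrd_remove_frame(HrdSketch *h, int64_t frame_bits)
{
    assert(h && frame_bits >= 0);
    h->fill_scaled -= frame_bits * h->timebase_den;
}

/* Exact underflow test on the scaled state. */
static int hrd_underflowed(const HrdSketch *h)
{
    return h->fill_scaled < 0;
}

/* Whole bits, for reporting only. The truncation happens here and is
 * never written back into the state, so it cannot accumulate. */
static int64_t hrd_whole_bits(const HrdSketch *h)
{
    return h->fill_scaled / h->timebase_den;
}
```

With an assumed 1,000,000 bit/s stream at 1001/30000 seconds per frame, the per-frame rate is 33366.67 bits, which a double can only approximate, while the scaled increment is exactly 1,001,000,000.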
[11:34:49] <kshishkov> mru: is there a catch with "vcvt.f64.s32 dX, sY, #24" ? Looks like it does not do the right thing for me on doubles while "vcvt.f32.s32" works
[11:35:47] <mru> kshishkov: the catch is that it doesn't work with doubles
[11:36:12] <mru> wait
[11:36:44] <mru> that was vector ops
[11:36:50] <mru> plain vfp should work
[11:37:26] <mru> but not with that syntax
[11:37:33] <mru> both operands must be d regs
[11:37:34] <kshishkov> it's single op in reality - converting one 32-bit int to 64-bit double
[11:37:56] <mru> the assembler should've rejected that
[11:38:30] <mru> so you have to put your fixed-point value in an even s reg
[11:38:36] <kshishkov> it's VFP anyway - NEON does not work on them
[11:38:39] <mru> then use the corresponding d reg
[11:38:55] <mru> why are you doing double-precision vfp anyway?
[11:39:14] <kshishkov> for the reference :)
[11:40:35] <Dark_Shikari> why are you optimizing the reference
[11:40:48] <mru> maybe gcc fucks it up
[11:41:08] <kshishkov> unlikely - it's pure asm
[11:41:08] <Dark_Shikari> um, but why do we care about speed on the reference :/
[11:41:25] <mru> maybe C code isn't exact enough
[11:41:28] <kshishkov> that's for non-FFmpeg code anyway
[11:42:09] <kshishkov> but it's simple "convert fixedpoint int array to double" function
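A plain C counterpart of the "convert fixedpoint int array to double" function mentioned here might look like the sketch below. It assumes that the #24 in the vcvt instruction means 24 fractional bits; the function name and signature are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert n signed fixed-point values with frac_bits fractional bits
 * to double. Dividing by a power of two is exact in binary floating
 * point, and every int32_t is exactly representable as a double. */
static void fixed_to_double_sketch(double *dst, const int32_t *src,
                                   size_t n, int frac_bits)
{
    assert(dst && src);
    assert(frac_bits >= 0 && frac_bits <= 32);

    const double scale = (double)(INT64_C(1) << frac_bits);
    for (size_t i = 0; i < n; i++)
        dst[i] = (double)src[i] / scale;
}
```

A version like this is handy as a reference to compare the hand-written VFP routine against.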
[11:45:40] <_av500_> mru: you are not in stockholm?
[11:45:51] <mru> _av500_: no, why would I?
[11:45:57] <_av500_> gee
[11:46:16] <mru> well, it's a nice city
[11:46:21] <mru> but I'm not usually there
[11:46:39] <_av500_> my kids are watching it live now :)
[11:46:49] <mru> stockholm?
[11:47:01] <_av500_> yep
[11:47:09] <kshishkov> good for them
[11:47:52] <kshishkov> it's the second day of wedding ceremony, isn't it?
[11:48:12] <_av500_> yes, the wedding is today
[11:48:35] * mru is fortunate to be far away from that mess
[11:49:42] * kshishkov was fortunate not to get to Kiev on the day of presidential inauguration
[13:54:24] <_av500_> kshishkov: turn on zdf for swedish lessons
[13:59:29] <mru> are they as "good" as swedish german lessons?
[14:00:02] <_av500_> well, i can understand what they are saying from the context easily
[14:00:06] <_av500_> mostly "yes"
[14:00:24] <mru> what do you know about swedish german lessons?
[14:00:35] <_av500_> nuthin
[14:00:52] <Vitor1001> mru: just wondering, which CPU gains a lot from handwritten asm functions and have slow floats?
[14:01:00] <mru> anything without an fpu
[14:01:05] <mru> armv5te
[14:01:07] <Vitor1001> and with a simd, no?
[14:01:11] <mru> fixed-point dsps
[14:01:46] <mru> gcc doesn't need simd to suck
[14:01:46] <Vitor1001> I mean, if there are no simd instructions, why would we get so much gain in comparison with plain C?
[14:02:25] <_av500_> gcc?
[14:02:39] <Vitor1001> that's true, but it is often solved by a few MAC macros and etc, instead of rewriting a whole function, no?
[14:02:53] <mru> depends
[14:03:10] <mru> those mpeg audio functions are quite terrible even with the mac macros
[14:03:31] <Vitor1001> :(
[14:04:12] <Vitor1001> BTW, on a cpu supporting neon, which mp3 decoder is faster?
[14:04:39] <mru> I'd have to write the neon code to answer that
[14:04:53] <Vitor1001> I mean, with no asm optimizations
[14:05:16] <Vitor1001> arm cpus that support neons have pretty fast floating-point math, no?
[14:05:23] <Vitor1001> s/neons/neon/
[14:05:25] <mru> not the a8
[14:05:31] <mru> plain floats are atrociously slow on A8
[14:05:37] <mru> and gcc makes them even worse
[14:05:58] <mru> right now the float version is 4x slower than fixed-point
[14:06:28] <mru> it's not obvious that neonifying those functions would offset the slowdown in the non-simdable parts
[14:07:13] <Vitor1001> It's funny that an arch with fast float simd has slow plain floats...
[14:07:40] <mru> the A8 vfp unit is widely regarded as a mistake
[14:07:53] <mru> the A9 one is much faster
[14:08:09] <Vitor1001> but you can't use neon to do plain (not packed) floats?
[14:08:15] <mru> A9 float performance is actually good
[14:08:38] <mru> it's allegedly possible to make some single-precision ops use the neon pipeline
[14:08:44] <mru> but gcc certainly doesn't do it
[14:09:02] <Vitor1001> :p
[14:09:43] <Vitor1001> So simd'fying float code in neon has a huge speedup even for simd standards?
[14:09:58] <mru> neon fft is 12x faster than C on A8
[14:10:05] <Vitor1001> wow
[14:10:18] <mru> even though it only does two float ops per cycle
[14:10:48] <Vitor1001> That windowing function for mp3 is pretty simple to simdfy
[14:11:00] <mru> I know
[14:11:11] <Vitor1001> I'm pretty sure you can do it in no time by just "translating" my sse implementation
[14:12:13] <mru> no
[14:12:17] <mru> neon is very unlike sse
[14:12:18] <Vitor1001> And it is the single most time-consuming function for mp3 decoding...
[14:12:20] <mru> much easier to use
[14:12:46] <mru> although instruction scheduling is more critical
[14:13:03] <mru> float ops have a latency of 4 or 5 cycles
[14:13:22] <Vitor1001> After RE'ing a x86 fp-unit code, I find SSE so readable ;)
[14:13:42] <Vitor1001> How can it be that much simpler? Better shuffles?
[14:13:59] <mru> it's more complete
[14:14:07] <mru> random things aren't inexplicably missing
[14:14:18] <mru> and transposes are easy and fast
[14:14:53] <Vitor1001> BTW, I have a stupid question about asm in general:
[14:15:23] <Vitor1001> I often run out of registers (plain, not vector) when doing simd with a lot of pointers
[14:15:45] <CIA-98> ffmpeg: alexc * r23647 /trunk/ (12 files in 2 dirs): Add HE-AAC v2 support to the AAC decoder.
[14:16:01] <wbs> \o/
[14:16:01] <Vitor1001> Why don't the CPU manufacturers add an instruction to save the stack pointer in some temporary special register so we can use it?
[14:16:08] <iive> \o/
[14:16:15] <mru> Vitor1001: what would happen on an interrupt?
[14:16:55] <Vitor1001> Good question...
[14:17:12] <elenril> peloverde++
[14:17:19] <mru> the stack pointer has to be _somewhere_ at all times
[14:17:29] <mru> so you're really asking why isn't there one more register
[14:17:59] <Vitor1001> mru: Of course, it is just a pity we have it all the time in something that is soldered to all the circuits
[14:18:18] <Vitor1001> mru: It could be in somewhere "cheaper" in silicon than a register sometimes...
[14:18:36] <mru> then how would you access the stack?
[14:18:43] <Vitor1001> BTW, when will we get rid of FAAD?
[14:18:54] <mru> as soon as peloverde commits PS
[14:19:02] <Vitor1001> mru: You won't inside some inner loops that needs 7 REGS
[14:19:04] <mru> oh, he just did
[14:19:11] <Vitor1001> lol
[14:19:18] <mru> Vitor1001: that's why real cpus have more regs
[14:19:40] <Vitor1001> I thought adding more regs was easier said than done...
[14:19:58] <mru> building a register file is easy
[14:20:15] <mru> it's just a tiny sram
[14:20:43] <Vitor1001> Yes, but isn't the problem that you have to physically connect it to a lot of things?
[14:21:23] <mru> you need as many read/write ports as you can execute instructions in parallel
[14:21:49] <mru> and each doubling in size needs one more bit in the opcodes to address it
[14:22:11] <mru> it's mostly a tradeoff between number of registers and opcode size
[14:22:36] <mru> the sweet spot seems to be 16-32 registers
[14:23:09] <mru> you don't want opcodes wider than 32 bits
[14:23:11] <Vitor1001> Also true for vector units?
[14:23:32] <mru> and you do want 3 operands for most instructions
[14:23:52] <mru> that eats 15 bits if you have 32 registers
[14:24:03] <mru> leaving 17 bits to encode the instruction
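The arithmetic behind these numbers: addressing 32 registers takes 5 bits per operand, three operands take 15 bits, and a 32-bit opcode has 17 bits left. A small sketch with hypothetical helper names that reproduces the budget for other register counts:

```c
#include <assert.h>

/* Number of bits needed to address one of num_regs registers. */
static int reg_index_bits(unsigned num_regs)
{
    int bits = 0;

    assert(num_regs >= 1 && num_regs <= (1u << 16));
    while ((1u << bits) < num_regs)
        bits++;
    return bits;
}

/* Bits left to encode the operation itself after the register
 * operands have been paid for. */
static int opcode_bits_left(int opcode_width, unsigned num_regs,
                            int num_operands)
{
    return opcode_width - num_operands * reg_index_bits(num_regs);
}
```

With these helpers, `opcode_bits_left(32, 32, 3)` gives the 17 bits mentioned above, 16 registers leave 20 bits, and 64 registers leave only 14, which is the tradeoff mru describes.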
[14:26:42] <mru> more registers also means slower context switching
[14:34:13] <mru> compare arm and mips instruction sets
[14:34:37] <mru> arm has fewer registers but a richer instruction set
[14:35:06] <mru> conditional everything, more powerful addressing modes etc
[14:35:20] <mru> shifted operands
[14:35:55] <Vitor1001> reverse vector multiplying?
[14:36:17] <mru> unfortunately not
[14:36:28] <mru> but it has a vector reverse instruction
[14:36:44] <Vitor1001> Why not a particular case of shuffling?
[14:36:57] <mru> then you need to set up the shuffle table first
[14:37:06] <mru> takes time and eats registers
[14:37:13] <mru> it has that too
[14:37:17] <mru> but it's rarely needed
[14:37:33] <Vitor1001> Indeed pretty different from sse
[14:38:10] <mru> reverse, transpose, interleave, and deinterleave are usually sufficient
[14:38:56] <Vitor1001> for most of the cases it's true.
[14:39:07] <Vitor1001> But SSE2 pshufd is nice...
[14:39:19] <mru> I wouldn't know
[14:39:21] <Vitor1001> shufps is stupid in comparison...
[14:39:32] <mru> sse is very unintuitive
[14:39:50] <Vitor1001> I would like to learn neon one day to try
[14:39:57] <Vitor1001> shame that qemu sucks for it :(
[14:40:03] <mru> get a beagle
[14:40:27] <Vitor1001> I know it's the easier way
[14:40:50] <mru> there's no point writing code without hardware
[14:42:03] <Vitor1001> Well, when you know it will be faster and you won't micro-optimize it while benchmarking...
[21:18:56] <j0sh_> BBB: is there a reason there's so much sdp stuff in rtsp.c?
[21:19:09] <BBB> like what?
[21:19:17] <j0sh_> mostly parsing
[21:19:19] <BBB> sdp really is the basis of rtsp, so it makes sense to place it there
[21:19:26] <BBB> sdp is the stream description
[21:19:31] <saintd3v> \o/ PS is in
[21:19:48] <BBB> j0sh_: sdp.c is only sdp.c encoding/creating
[21:19:51] <j0sh_> yeah, sdp can be used other places too... im just trying to figure out where to move the mp4/aac stuff
[21:19:53] <BBB> rtsp.c is sdp "decoding"
[21:20:03] <BBB> mp4/aac?
[21:20:19] <BBB> just into a new file, most likely, just like rtp_xiph.c or rtp_whatever.c
[21:20:28] <BBB> how's the vlc work going? :)
[21:20:40] <j0sh_> it's only a few lines, if you think that justifies a new file, i can do that
[21:20:44] <BBB> and how was your sister's wedding? :)
[21:20:52] <BBB> a new file is always ok, even if it's small
[21:20:59] <j0sh_> i turned in a vlc patch last friday, havent heard from anyone on it
[21:21:08] <BBB> got a link?
[21:21:27] <j0sh_> and sister's graduation was great... digging up the link
[21:22:40] <j0sh_> BBB: http://lists.mplayerhq.hu/pipermail/ffmpeg-soc/2010-June/009265.html
[21:23:13] <j0sh_> this was sent to vlc-devel, i CC'd ffmpeg-soc
[21:23:19] <BBB> oh, graduation :-p I thought it was wedding
[21:23:20] <BBB> sorry
[21:23:26] <BBB> congratulations either way
[21:23:58] <j0sh_> she's in hawaii living with her boyfriend right now... after gsoc, i just might pack up and bum out over there, heh
[21:24:00] <saintd3v> mru: sorry about that, didn't see the commit message when i sent that first email :/
[21:25:10] <BBB> hawaii
[21:25:12] <BBB> darn
[21:25:52] <BBB> patch is simple enough, so does this make rtsp://bla work through the ffmpeg rtsp layer?
[21:25:58] <j0sh_> i only need a power outlet and wifi to keep contributing to ffmpeg, though :)
[21:26:02] <BBB> or is that still handled by something else by default?
[21:26:08] <BBB> you don't need wifi
[21:26:10] <BBB> look at me
[21:26:23] <j0sh_> are you on a modem?
[21:26:29] <BBB> I have no internet at home, I go to a public lounge with internet or chat at work
[21:27:12] <j0sh_> i like coding outside, my backyard is my de facto office
[21:27:26] <j0sh_> but yes, i distinctly remember working on a school project in a starbucks somewhere in puerto rico
[21:27:32] <BBB> hehe :)
[21:27:33] <j0sh_> for the first 3 days of spring break
[21:27:46] <j0sh_> anyway
[21:28:02] <j0sh_> the patch will make rtsp:// work by default using ffrtsp
[21:28:07] <BBB> excellent
[21:28:13] <BBB> I'm guessing it
[21:28:13] <BBB> '
[21:28:14] <BBB> oops
[21:28:24] <j0sh_> but i disabled live555, i dont know what's selected first, ffrtsp or live555
[21:28:30] <BBB> I'm guessing they'll ask for a few more features, like e.g. some more depayloaders, before they take the patch
[21:28:39] <BBB> but the patch is there, that's important
[21:28:49] <BBB> we can all work on the featureset together, during and after soc
[21:28:52] <j0sh_> svq3 and qdm2?
[21:28:56] <BBB> those too
[21:28:59] <BBB> live555 doesn't have them
[21:29:19] <BBB> I RE'ed them (and gst copied my code)
[21:29:39] <j0sh_> nice, nice, that feels good huh
[21:31:00] <j0sh_> seeing your work somewhere else
[21:31:51] <j0sh_> im looking through the live555 files real quickly, it seems like the only depacketizer we may be missing is the quicktime one
[21:32:26] <BBB> yeah
[21:32:40] <BBB> I added that to the end of the soc list, because I did some work and it's not easy
[21:32:47] <BBB> I had a basically working version (no longer applies)
[21:32:54] <j0sh_> http://people.gnome.org/~rbultje/ffmpeg-patchset/16-rtsp-x_qt.patch
[21:33:01] <BBB> but it needs to interact with the qt demuxer
[21:33:13] <BBB> and baptiste didn't really like my hack (and I understand why, it was a little hacky)
[21:33:17] <wbs> j0sh_: regarding sdp parsing; I don't mind too much if the generic parsing routines are in rtsp.c, but I don't want the mp4 specific code there
[21:33:19] <BBB> so that might need some work
[21:33:42] <BBB> j0sh_: oh yeah, that's a very old set of patches by the way
[21:34:08] <j0sh_> wbs: alright, i will move the mp4/aac stuff into a separate file or something
[21:34:15] <wbs> j0sh_: yeah
[21:34:42] <wbs> j0sh_: also, there's been some requests on getting the general parsing of a fmtp line shared between formats
[21:34:54] <wbs> j0sh_: there's a bit of code duplicated between e.g. amr, h264 etc
[21:34:58] <j0sh_> BBB: still a good start, it'll make my job easier
[21:35:13] <BBB> wbs: you can do that too :-p
[21:35:22] <j0sh_> wbs: i can look into that
[21:35:25] <BBB> ;)
[21:35:29] <wbs> BBB: yes, but j0sh_ can too :-)
[21:36:43] <Honoome> j0sh_: please don't tell me you're going to ask us to implement the server-side in feng for that :P
[21:37:07] <wbs> Honoome: nah, I guess using DSS for testing that is enough
[21:37:10] <CIA-92> ffmpeg: mstorsjo * r23648 /trunk/libavformat/rtsp.c:
[21:37:10] <CIA-92> ffmpeg: RTSP: Don't store the connection handles in local variables
[21:37:10] <CIA-92> ffmpeg: This removes some useless copying of handles, and simplifies error handling.
[21:37:10] <CIA-92> ffmpeg: Patch by Josh Allmann, joshua dot allmann at gmail
[21:37:27] <BBB> wbs: btw you're doing a lot of great work all around, keep doing that! :)
[21:37:41] <BBB> so we have all depayloaders that live555 has?
[21:37:43] <BBB> I'm surprised
[21:37:48] <BBB> what about e.g. gstreamer?
[21:37:50] <wbs> BBB: thanks :-)
[21:37:57] <BBB> any that they have and we don't?
[21:38:09] <Honoome> wbs: good, as long as my next work within feng is going to be simply rewriting the conf I'm happy :D
[21:38:16] <wbs> BBB: are you ok with the rtsp/http tunnel auth patches I sent yesterday(?)
[21:38:36] <j0sh_> Honoome: if i have no way to test, i just might :)
[21:38:42] <BBB> wbs: didn't look yet
[21:38:43] <BBB> let me check
[21:38:55] <BBB> I've been a little ... shall we say ... absent on rtsp patch review :)
[21:39:08] <wbs> BBB: yeah, but you're doing good job on VP8 instead :-)
[21:39:28] <wbs> j0sh_: DSS has a few samples with svq3 and qdm2 out of the box iirc
[21:39:38] <wbs> j0sh_: and I can generate a bunch of the other formats using quicktime broadcaster
[21:39:49] <BBB> patch #1 is of course OK
[21:39:56] <j0sh_> wbs: how do you find these leaks? with valgrind, i get a ton of leaks from sdl/x11 but none from ffrtsp (i tried testing the codepath you found)
[21:39:57] <BBB> maybe the caller should check though
[21:40:05] <BBB> but I'm ok with applying it, it's not speed-critical
[21:40:17] <wbs> j0sh_: yeah, it's quite noisy with ffplay - try using ffmpeg instead
[21:40:20] <BBB> please document that in the function doxy, that an empty string means no auth will be added, like NULL
[21:40:34] <wbs> j0sh_: ffmpeg itself shouldn't leak at all
[21:40:36] <j0sh_> wbs: ah, no ffplay. got it
[21:40:51] <BBB> 2 looks simple enough, so is ok
[21:41:45] <BBB> 3/4 doesn't look right
[21:41:52] <wbs> BBB: yeah, for #1, I prefer it that way, doing ff_url_join(..., auth[0] ? auth : NULL, ...) looks ugly
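The behaviour being agreed on for patch #1 (an empty auth string is treated like NULL, so no "user@" part and no stray at-char ends up in the URL) can be illustrated with a small stand-in. This is not ff_url_join; the function name, signature and fixed URL layout are assumptions made for the sketch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a URL join helper. Writes
 * proto://[auth@]host:port into buf and returns snprintf's result. */
static int url_join_sketch(char *buf, size_t size, const char *proto,
                           const char *auth, const char *host, int port)
{
    assert(buf && proto && host);

    /* NULL and "" both mean: no credentials. */
    int has_auth = auth != NULL && strlen(auth) > 0;

    if (has_auth)
        return snprintf(buf, size, "%s://%s@%s:%d",
                        proto, auth, host, port);
    return snprintf(buf, size, "%s://%s:%d", proto, host, port);
}
```

Doing the check inside the helper is what lets callers pass their auth buffer straight through instead of writing `auth[0] ? auth : NULL` at every call site.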
[21:42:19] <j0sh_> wbs: quicktime broadcaster requires osx... i must be the only person in the world who wiped osx on his macbook for ubuntu
[21:42:42] <wbs> j0sh_: yeah. I can generate any samples you want if you list them. :-)
[21:42:56] <j0sh_> cool, thanks
[21:42:57] <wbs> j0sh_: on the other hand, perhaps I should create the ones I want you to fix support for. ;-)
[21:43:14] <j0sh_> heh
[21:46:03] <BBB> j0sh_: I did that... at some point I got bored, ubuntu doesn't do all that much more than osx itself :-p
[21:46:19] <BBB> then the laptop broke and apple support removed ubuntu and reinstalled osx for me, unasked
[21:46:21] <BBB> then I gave up
[21:47:24] <wbs> j0sh_: also, for finding leaks - this one I found just by reading the code, it's a good trait to think about all the nitty gritty cleanup details when reading/writing it
[21:47:30] <CIA-92> ffmpeg: mstorsjo * r23649 /trunk/libavformat/rtsp.c:
[21:47:30] <CIA-92> ffmpeg: RTSP: Set the connection handles to null after closing them
[21:47:30] <CIA-92> ffmpeg: This fixes a potential issue when doing redirects.
[21:47:40] <wbs> BBB: so, what's wrong about 3/4, and how to fix it more cleanly?
[21:48:37] <j0sh_> BBB: yeah, i got frustrated with macports trying to get feng to work. wiped the whole thing rather than deal with broken things
[21:49:14] <BBB> heh :)
[21:49:21] <BBB> wbs: thinking ;)
[21:49:25] <BBB> wbs: it might be ok
[21:49:34] <BBB> wbs: but I want to think for a day or so if there's a better way
[21:49:36] <BBB> if not, it's ok
[21:49:58] <wbs> ok, I'll apply the first parts now at least, feel free to think about those then :-)
[21:50:00] <j0sh_> wbs: yup, it's good that you're lending a second pair of eyes. would prob not have caught most of the things you've been fixing
[21:50:25] <BBB> ok
[21:50:28] <wbs> BBB: the problem mainly is doing proper auth when doing http posts
[21:50:41] <wbs> BBB: I've been doing a bit of work on those areas in libcurl vs git
[21:51:19] <wbs> since a proper http server doesn't say the "403, please use auth method foo" until you've done the full http post
[21:51:20] <BBB> if they share the same auth, I'm wondering if we shouldn't just share the same object
[21:51:38] <BBB> because as you say, nc/nonce would be shared
[21:51:58] <wbs> yeah
[21:52:02] <BBB> especially if the counter is supposed to increase regardless of which channel you send it over
[21:52:09] <BBB> then it's a bad idea to make them two separate objects
[21:52:11] <BBB> (imo)
[21:52:44] <wbs> yes, but on the other hand, the only requests done on these two sessions is the initial get, that may get a 403 reply, then a re-request on that one with proper auth, and an initial post request on the other one, no more
[21:52:57] <wbs> nobody is doing seeks on these :-)
[21:55:05] <j0sh_> speaking of seeks, isn't seeking technically still broken in http?
[21:55:27] <BBB> yeah
[21:55:30] <wbs> it shouldn't be, as far as I know?
[21:55:31] <BBB> :-(
[21:55:39] <BBB> I should probably just apply tjoppen's patches
[21:55:49] <BBB> I haven't had time to finish looking at it
[21:55:58] <wbs> but is_streamed is still set to 1 for the demuxers, though :-(
[21:57:28] <BBB> get the alloc work done so we can remove that :)
[21:57:41] <CIA-92> ffmpeg: mstorsjo * r23650 /trunk/libavformat/ (utils.c internal.h): ff_url_join: Don't add any at-char if the auth is an empty string
[21:57:54] <wbs> yeah, I should start looking at that ;P
[21:58:06] <wbs> the latest outlines after the discussions with michael should actually be doable
[21:58:33] <CIA-92> ffmpeg: mstorsjo * r23651 /trunk/libavformat/rtsp.c: RTSP: Add the auth credentials to the HTTP tunnel URL, too
[22:02:04] * BBB sets up a seeking http server to test
[22:02:23] <BBB> I'll be back later with patches for that :)
[22:02:32] <BBB> wbs: you work on getting is_streamed fixed in the mean time :)
[22:02:46] <BBB> then after that I'll do some more vp8 work
[22:03:18] <wbs> I think I'll work on getting some sleep here soon, but perhaps I should give it a shot tomorrow
[23:35:55] <peloverde> Why is the MPEG refsoft giving me an ld.bfd internal linker error!?
[23:54:11] <CIA-92> ffmpeg: alexc * r23652 /trunk/libavcodec/ps.c: psdec: Factorize iid/icc/ipd/opd parameter bitstream reading.
[05:15:58] <wbs> _av500_: in the original opencore, there sure is some arm-opts for amr, but they're not necessarily enabled for compilation in libopencore-amr
[05:20:43] <wbs> Yuvi: ping, is this what you meant? http://albin.abo.fi/~mstorsjo/0001-libvorbis-Only-drop-1-byte-packets-at-en…
[05:55:15] <thresh> moroning
[05:58:15] <av500> \o/
[06:00:40] <thresh> yes, yes
[06:28:52] <av500> kshishkov: send patch to convert comments to .se?
[06:42:37] <kshishkov> jag vill gärna göra det
[06:45:00] <av500> patchar välkomna
[06:45:10] <KotH> a wonderful good morning to those living in switzerland!
[06:45:38] <av500> so, mostly to german people...
[06:46:36] * KotH doesnt care much about the big canton in the north ;)
[06:46:55] <av500> i thought you are getting overcrowded with germans, no?
[06:47:28] <KotH> i didnt say i didnt care about the people coming to .ch to eat all my chocolate
[06:48:05] * elenril thinks mornings are overrated
[06:51:53] <benoit-> moin
[06:53:04] <benoit-> KotH: A spanish friend of mine told me she was going to throw away all the swiss chocolate she had :)
[06:55:10] <KotH> benoit-: must have gone bad in the climate there ;)
[06:55:36] <elenril> how can chocolate go bad?
[06:55:44] <elenril> chocolate is eternal
[06:56:35] <KotH> it can, believe me it can
[06:57:08] <KotH> (though you need to do some pretty bad ass things to it)
[07:26:42] <twnqx> shouldn't the topic be updated in both channels to 0.6 has been released?
[07:30:47] <KotH> details!
[07:32:32] * kshishkov wonders why development channel should care about releases
[07:37:45] <av500> the topic is too long anyway
[07:37:57] <av500> i would drop the release bit from -devel
[07:38:13] * KotH takes out his katana and slices the topic into mouth sized pieces
[07:38:31] * kshishkov would prefer "FFmpeg development channel. If you want to talk about anything else, you're unwelcome"
[07:38:47] <wbs> that's short and concise at least :-)
[07:39:05] <av500> kshishkov: damn, what about european railway systems?
[07:40:06] <kshishkov> av500: Ukrainian sucks, German sucks less and has comfortable expresses, Swedish is the most comfortable if not that fast
[07:40:07] <KotH> kshishkov: or strange languages?
[07:40:24] <KotH> kshishkov: or how $othersoftware sucks?
[07:40:25] <wbs> KotH: hey, swedish isn't that strange ;P
[07:40:26] <kshishkov> KotH: like Romansh?
[07:40:42] <KotH> kshishkov: rather like lojban ;)
[07:41:57] <kshishkov> wbs: Swedish is mostly German without complications and more pleasant sounding. And tendency not to make words too long
[07:46:35] * pJok makes kshishkov speak skånska
[07:46:37] <pJok> ;)
[07:55:44] <CIA-98> ffmpeg: cehoyos * r23641 /trunk/libavformat/spdif.c:
[07:55:44] <CIA-98> ffmpeg: Add IEC958 data_types for Atrac* and WMA Pro.
[07:55:44] <CIA-98> ffmpeg: Data-burst is described in IEC 61937-7 (Atrac) and IEC 61937-8 (WMA Pro).
[07:57:24] <merbzt> http://tranquillity.ath.cx/clang/2010-06-17-1/
[07:59:42] <andoma> does anyone know of a receiver that is capable of WMA over SPDIF? or even AAC?
[07:59:57] <merbzt> pioneer
[08:00:03] <merbzt> had 3 versions
[08:00:18] <merbzt> I have a patch that was tested against one
[08:02:05] <merbzt> wmapro over spdif actually exists
[08:02:12] <merbzt> atrac not
[08:06:17] <kshishkov> are you sure?
[08:06:45] <kshishkov> maybe nobody just has that SOny equipment to verify if it exists in reality?
[08:07:12] <andoma> i'm a bit puzzled that most receivers do not support AAC ..
[08:07:38] <andoma> AFAIK it will be more and more common in HDTV broadcasts
[08:08:16] <av500> chicken/egg
[08:08:44] <merbzt> maybe cos the encoded stream is broken
[08:09:00] <merbzt> er multichannel aac is a mess
[08:10:30] <kshishkov> why? it's all nicely sorted out - many channels or channel pairs, all with the same ID if you're lucky
[08:10:47] <merbzt> :(
[08:12:16] <merbzt> it's a nice mess
[08:12:30] * kshishkov dabbled a bit in AAC
[08:12:45] <merbzt> you don't say ...
[09:37:59] <lu_zero> Fabio is here asking about the website template
[09:38:09] <lu_zero> which one you like best?
[09:38:23] <kshishkov> what, we'll get website template in addition to nice pictures?
[09:43:20] <mru> Honoome: ping
[09:43:35] <mru> or lu_zero
[09:43:51] <mru> do either of you know anything about the alsa ebuilds in gentoo?
[09:52:48] <DonDiego> bye
[09:59:17] <lu_zero> mru: hi
[09:59:25] <lu_zero> Honoome: should know a lot since he started them
[09:59:33] <lu_zero> what's up?
[09:59:51] <mru> what's the purpose of the ALSA_PCM_PLUGINS setting?
[09:59:57] <mru> if I disable anything it refuses to run
[10:00:10] <lu_zero> anything?
[10:00:13] <lu_zero> it shouldn't
[10:00:25] <mru> it spits errors about missing symbols
[10:00:56] <benoit-> /topic welcome to #gentoo-users :D
[10:01:02] <mru> I neither need nor want most of those
[10:01:38] <lu_zero> give me more details
[10:01:43] <lu_zero> I could try myself now
[10:02:31] <spaam> mru: do you have problems with gentoo? :)
[10:02:38] <mru> spaam: no, with alsa
[10:03:11] <ohsix> mru: you need to audit the configs it comes with if you're gonna be ditching some modules; a lot are used implicitly to give nice user labels for device names and functionality
[10:03:27] <mru> ohsix: stay out of this
[10:03:40] <ohsix> just saying; they'll go out of sync
[10:03:41] <lu_zero> spaam: alsa is a pain
[10:04:23] <spaam> lu_zero: it did work good for me back in 2004-2007 :)
[10:04:47] <lu_zero> spaam: once you do not start messing with hda, pulse and try to thin it down
[10:05:39] <spaam> ok. does ubuntu have this problem also with pulse? :)
[10:06:41] <mru> ALSA lib dlmisc.c:118:(snd_dlsym_verify) unable to verify version for symbol _snd_pcm_empty_open
[10:06:44] <mru> ALSA lib pcm.c:2175:(snd_pcm_open_conf) symbol _snd_pcm_empty_open is not defined inside [builtin]
[10:06:47] <mru> Playback open error: -6,No such device or address
[10:07:25] <lu_zero> spaam: ubuntu had and has problems with pulse
[10:07:29] <lu_zero> some self-inflicted
[10:07:36] <mru> speaker-test: pcm_plug.c:67: snd_pcm_plug_close: Assertion `plug->gen.slave == plug->req_slave' failed.
[10:07:40] <mru> Aborted (core dumped)
[10:07:46] <ohsix> they fixed most of the egregious things in 10.04
[10:07:48] <lu_zero> some due to lennart's ideas
[10:08:21] <ohsix> now they don't disable flat volumes and rtkit is there, and they pick patches from the stable tree like they're supposed to
[10:08:57] <mru> "pick patches ... like they're supposed to" <-- something's not right there
[10:09:12] <ohsix> was someone having a problem with pulse? i didn't follow what lu/spaam were saying
[10:09:28] <mru> ohsix: no, alsa refuses to work with useless plugins disabled
[10:09:31] <lu_zero> http://bugs.gentoo.org/show_bug.cgi?id=186365 <- mru that's related
[10:09:42] <ohsix> then why did spaam mention pulse
[10:09:43] * lu_zero is discussing the website with fabio in the meantime
[10:09:50] <mru> I only care about plain pcm playback
[10:10:08] <mru> _maybe_ with plug for the odd case that doesn't work otherwise
[10:10:17] <ohsix> also; the default configs use almost all of those useless plugins, speaker-test isn't going to work without the surround* and front, side, what have you names; those are in the config and slave pcms and stuff
[10:10:34] <mru> why is the default so fucked up then?
[10:10:54] <mru> it does in fact run with -Dhw:0 -c2
[10:11:00] <mru> apparently the hw doesn't like mono
[10:11:06] <mru> or alsa thinks it doesn't
[10:11:15] <lu_zero> sigh
[10:11:21] <lu_zero> alsa is overly complex
[10:11:26] <mru> no kidding
[10:11:30] <av500> :)
[10:11:36] <lu_zero> pulse is trying to hide those parts
[10:11:42] <mru> failing badly
[10:11:45] <lu_zero> (adding complexity and brain damage)
[10:11:50] <spaam> better to use oss? :)
[10:11:51] <av500> elephant hiding behind tree?
[10:11:56] <lu_zero> av500: no
[10:12:07] <lu_zero> hiding a forest behind an elephant
[10:12:14] <lu_zero> a pink one obviously
[10:12:17] <ohsix> pulse isn't hiding anything; just giving a uniform ui for picking devices and where streams should go, in light of devices coming and going
[10:12:26] <av500> isnt alsa doing that?
[10:12:52] <mru> that's what ohsix said yesterday
[10:13:00] <mru> but he's been known to be inconsistent before
[10:13:09] <ohsix> har har
[10:15:04] <ohsix> well i eat my own dogfood, and i don't have any problems with doing so, being able to play stuff at the same time without resorting to dmix is pretty gr8; and apps don't break with fragile fragment/buffer sizes when the runtime circumstances of the computer changes, and it minimizes latency \m/
[10:15:52] <mru> I'd like to be absolutely certain dmix is never used
[10:15:59] <mru> it's crap and I don't trust it
[10:16:11] <mru> adding ipc to random apps is never a good idea
[10:16:17] <lu_zero> mru: which device are you using?
[10:16:47] <mru> hda-intel hardware
[10:16:53] <ohsix> the configs are set up to use dmix if the device is single stream; default goes to dmix which slaves plug:hw
[10:16:54] <mru> I'd like to be able to use plughw in alsa
[10:17:02] <mru> for the rare cases when something isn't supported
[10:17:15] <av500> how do I make <random_sw> use pa?
[10:17:15] <mru> ohsix: AND I DON'T WANT THAT
[10:17:33] <ohsix> and thats fine; you'll just have to erase all the upstream configuration files heh
[10:17:39] <ohsix> i'm just telling you how it is
[10:17:57] <mru> alsa configs are worse than sendmail
[10:18:18] <kshishkov> but less flexible
[10:18:21] <ohsix> at least they don't hit m4 :]
[10:18:25] <ohsix> less?
[10:19:02] <mru> m4 is not required for sendmail
[10:19:06] <mru> I've configured it without
[10:19:21] <ohsix> i know, its just an old canard; it was a joke
[10:19:29] <spaam> mru: you dont use sendmail ? :O
[10:19:39] <mru> sendmail syntax isn't all that complex, just cryptic
[10:19:44] <mru> spaam: not anymore
[10:19:49] <mru> I did a long, long time ago
[10:20:20] <spaam> why did you change? :)
[10:20:29] <ohsix> i always figured they'd use m4 so they could insulate possibly old configs from internal changes
[10:20:31] <mru> postfix is nicer
[10:20:59] <mru> m4 is just a way to provide common templates
[10:21:16] <ohsix> mru: the default configs are quite extensive, they're in /usr/share/alsa
[10:21:25] <mru> I know where they are
[10:21:45] <mru> oh screw this
[10:21:47] <ohsix> they do things like add softvol to devices with no hw attenuators at the right places too
[10:21:49] <mru> I've better things to do
[10:21:57] <mru> I don't need softvol
[10:22:03] <lu_zero> once Honoome will appear he will be able to help better
[10:22:16] <mru> it's just a laptop anyway
[10:22:22] <KotH> mru, ohsix: are you at it again?
[10:22:30] <mru> although I'd like to configure the desktop sanely too
[10:22:34] * KotH blames spaam
[10:22:39] <ohsix> i know, just saying, there is a _ton_ of real work they put into those configs to normalize a lot of hardware
[10:22:49] <mru> ohsix: shut up please
[10:23:06] <spaam> KotH: noo. go back and sleep :O
[10:23:17] <ohsix> software is written against those labels too; so random stuff will break if you kneecap it
[10:23:31] <mru> I write my own software
[10:23:33] <ohsix> not trying to dissuade you or anything; don't get mad
[10:24:35] <siretart> SCNR: http://www.osscc.net/en/licenses.html#compatibility
[10:25:22] <ohsix> KotH: nah i have the luxury of something to do today :] (but a toothache!)
[10:25:44] <mru> ohsix: see, that's what you get when you troll too much
[10:25:59] <ohsix> nah i've had the tooth ache longer
[10:26:04] <mru> you've been trolling a long time
[10:26:22] <av500> siretart: yes, one guy at linuxtag hinted us that this is to come
[10:26:25] <ohsix> nah
[10:26:30] <mru> clarification: too much for your skill level
[10:26:45] <ohsix> i was excited to see you poking at alsa to see what you can be satisfied with
[10:27:30] <mru> I'd be satisfied if it all went away
[10:27:49] <mru> I'd be happy if it did so with a huge boom
[10:27:52] <siretart> av500: I hope that guy wasn't shily himself!
[10:27:55] <ohsix> the matter of what plugins you use isn't with which ones you build, it's with what the configuration lumps together for whatever the software was doing
[10:28:41] <mru> if all I'm trying to do is play a single pcm stream the hw can handle directly, why doesn't it just do that?
[10:28:44] <ohsix> i think you even need a plugin to play on a single channel when you can only open a device in stereo or more channels, too; dunno the name of it though (might be plughw)
[10:28:53] <ohsix> it does do that, if you open hw:
[10:29:10] <mru> not by default
[10:29:21] <mru> by default it pulls in all manner of madness
[10:29:24] <ohsix> and if you open default: like most software should, it might get ornery with dmix and whatnot, trying to expose a uniform interface to the software
[10:29:43] <mru> that's why I disabled dmix and other junk
[10:29:49] <ohsix> ya, it does; software that opens hw: is considered specialized, rarified even
[10:29:52] <av500> siretart: no, some debian guy
[10:30:01] <av500> siretart: scnr: http://imagebin.ca/view/g2DJYP.html
[10:30:01] <mru> av500: same thing, different colour
[10:30:24] <siretart> av500: LOL
[10:30:38] <ohsix> the "device" can even have plugins and stuff in it too; its kind of ugly (like with speaker-test, -Dfile,butt.wav will slave in the file pcm and write it to the parameter)
[10:31:12] <mru> the default should do the best it can with the available plugins
[10:31:29] <ohsix> if you want no bullshit though, open hw:; if software is going to work with other software at the same time and make use of some default labels (left, right, center; front, that sort of stuff) use default: or that label
[10:31:31] <mru> not dump core the moment something is missing
[10:31:56] <ohsix> well the configuration isn't conditional on plugins present; they have a small set that are essential and they write the configs to it
[10:32:11] <ohsix> missing a slave plugin _is_ akin to a null pointer deref when you try and connect to it
[10:32:16] <mru> dmix sure as hell ain't essential
[10:32:31] <mru> and dumping core is always evil
[10:32:58] <ohsix> nope; unless a user expects to play stuff at the same time, which a lot do, even though dmix is awful and should never need to exist (alsa devs concur)
[10:33:12] <elenril> what, yet another alsa flame?
[10:33:31] <mru> if dmix isn't enabled, the first to open the device should get it, later ones would fail gracefully
[10:33:45] <mru> now even the _first_ one dumps core
[10:33:48] <ohsix> i h8 dmix; you can't even pick parameters that would work without fail or without huge latency, its very not real time :[
[10:34:00] <mru> I don't know if that's due to dmix or something else
[10:34:11] <ohsix> mru: dmix isn't "disabled" by not building it though, the config files still slave it
[10:34:23] <mru> and that's the flaw
[10:34:33] <lu_zero> indeed...
[10:34:51] <ohsix> you should be able to move pcm/dmix.conf out of the way and it will work as if dmix wasn't there
[10:35:16] <ohsix> its not really a flaw; the configs go with the software, they're for integrators or not to be touched :<
[10:35:51] <mru> why do we need some mystical integrator to fix everything?
[10:36:00] <ohsix> you don't have one without the other, and you don't have all the fancy labels that software uses without them at all, you just have hw: (which you could open in software regardless of their presence)
[10:36:01] <mru> can't it just be made properly to begin with?
[10:36:10] <ohsix> they don't, and it is properly "made" from upstream
[10:36:21] <mru> and I don't need any "fancy labels"
[10:36:28] <ohsix> they do if they want the default behaviour to be different
[10:36:48] <ohsix> you may not need it but software already uses them, your software might not, but they aren't for your software
[10:37:10] <mru> I want to build the bare minimum that works with *my* software
[10:37:19] <mru> my software opens whatever I tell it to
[10:37:31] <ohsix> then just open hw:, speaker-test uses the labels
[10:38:03] <ohsix> some plughw stuff is used implicitly but it shouldn't be crashing in your own software if you're using hw: and none of the extra configuration labels
[10:38:10] <mru> but drop this, I have more important things to do
[10:38:28] <mru> speaker-test should _never_ dump core
[10:38:31] <mru> but it does
[10:38:38] <ohsix> tell the alsa developers
[10:38:45] <mru> I doubt they care
[10:39:27] <ohsix> if you remove software it depends on what do you expect? (they're slaved in asound based on the config, it could check the functors beyond just warning about them; but you'd need to contribute that, or tell the alsa developers to add it)
[10:39:56] <mru> I shouldn't need to tell them that software shouldn't dump core
[10:40:01] <ohsix> if they feel a normalized config comes with all the plugins enabled i doubt they're going to cater to each one of them possibly being not present
[10:40:04] <mru> what are they, java coders?
[10:40:31] <ohsix> thats a canard :[ it shouldn't dump core, but it also shouldn't be run in an incomplete manner; its not top trumps
[10:40:35] <mru> any configuration that can be built must run without crashing
[10:40:46] <mru> failing with a sensible error message is of course ok
[10:40:56] <mru> or even failing with a weird error message
[10:41:01] <mru> BUT NOT DUMPING CORE
[10:41:06] <ohsix> can it be built? you were setting an internal variable weren't you?
[10:41:18] <mru> of course it can be built
[10:41:23] <mru> how the fuck did you think I got it?
[10:41:40] <mru> just some --disable flags to configure
[10:42:05] <ohsix> well thats nice, you still have the matter of the config files
[10:42:22] <mru> which are flawed
[10:42:37] <ohsix> how? not working how you expect doesn't mean its inherently flawed
[10:43:10] <kshishkov> but not working at all does
[10:43:28] <ohsix> i can jam a screwdriver into my motherboard but that doesn't mean i'm not the one responsible for doing something silly
[10:44:03] <ohsix> well we're at an impasse again; you think it's dumb or wrong but won't effect any change regarding it
[10:44:08] <mru> but if changing a bios setting makes it blow up in flames, I'd call it a flaw
[10:44:34] <ohsix> flames are relative; if i change the voltage on my cpu it isn't going to be happy, but its right there in the bios
[10:44:52] <ohsix> i get your point though
[10:45:11] <mru> if a particular configuration can never work, the build system shouldn't offer it
[10:45:14] <mru> simple as that
[10:45:35] <ohsix> well what you're building does work; but it is not complete without the kernel, the configs, and the software using it
[10:45:44] <mru> it's as if ffmpeg dumped core if built without the bink decoder
[10:45:52] <mru> even if only decoding mpeg2
[10:45:57] <ohsix> not really
[10:46:18] <elenril> \o/
[10:46:27] <mru> I'll remove the ban in a few hours
[10:47:00] <elenril> what do you use for mixing then if not dmix?
[10:47:05] <mru> I don't
[10:47:08] <mru> I play one thing at a time
[10:47:19] <elenril> :/
[10:47:57] <kshishkov> and silly sound notifications from programs are better to be turned off anyway
[10:48:32] * elenril doesn't use silly notifications
[10:48:43] <elenril> but e.g. flash likes to grab the soundcard for itself
[10:48:51] <elenril> inb4 don't use flash then
[10:49:29] <kshishkov> that's obvious
[10:50:01] * thresh uses flash
[10:51:49] * kshishkov takes a pity on thresh
[10:52:08] <av500> in .ru flash uses you
[10:52:22] <mru> I thought that was everywhere
[10:52:47] <kshishkov> mru: maybe except Adobe HQ
[11:31:54] <wbs> kshishkov: I have a patch for your reviewal. :-)
[11:32:46] <kshishkov> ok
[11:40:16] <lu_zero> mru: the bink decoder is a key component of ffmpeg
[11:40:43] <lu_zero> everybody wants to play any kind of video by transcoding to bink and then playing it
[11:41:29] <av500> i would vote to make that default
[11:43:52] <kshishkov> not on my watch
[11:44:04] <lu_zero> instead of vp8?
[11:44:07] * kshishkov dislikes Bink DCT
[11:44:30] * Compn is the last person waiting for vivo support
[11:44:36] <Compn> ehe
[11:47:03] <kshishkov> Compn: troll mru to get it supported or be trolled out of that stupid idea
[11:58:16] <thresh> OT: is there a way to force youtube not to recode your HD video ?
[11:58:29] <thresh> other than buying 50%+1 of its stock
[11:58:43] <mru> bribe someone
[12:01:32] <KotH> thresh: at least it's being reencoded with ffmpeg
[12:02:03] <Compn> i cant even find a contact at youtube to ask about potential samples
[12:02:23] <Compn> my google contacts havent been having good luck communicating with them either :\
[12:03:03] * KotH isnt surprised
[12:03:53] <CIA-98> ffmpeg: mstorsjo * r23642 /trunk/libavformat/rtmpproto.c:
[12:03:53] <CIA-98> ffmpeg: RTMP: Return from rtmp_read as soon as some data is available
[12:03:53] <CIA-98> ffmpeg: Earlier, the function only returned when enough data to fill the
[12:03:53] <CIA-98> ffmpeg: requested buffer was available. This led to high latency when receiving
[12:03:53] <CIA-98> ffmpeg: low-bandwidth streams.
[12:04:50] <thresh> someone should inject a bytecode sequence that will trigger ffmpeg to do vcodec/acodec copy, and then produce videos with that sequence
[12:08:15] <av500> \\\ooo///
[12:10:39] <av500> thresh: otoh, not recoding means to output potentially malicious stream 1:1 to millions of users...
[12:10:56] <mru> yay!
[12:11:01] <thresh> av500: so, win-win.
[12:11:08] <mru> ffbotnet
[12:11:11] <thresh> ffmpeg world domination task accomplished.
[12:11:45] <av500> mru: for next LT we should put tiny parts of BBB rendering into each ffmpeg run...
[12:11:59] <mru> hehe
[12:48:32] <av500> gee, Koleszar found a new use case for the "invisible" bit....
[12:48:53] <av500> MKV edit lists
[12:49:04] <av500> mark all frames outside of range as invisible...
[12:54:29] <lu_zero> uhmA?
[12:54:47] * lu_zero wonders what's the exact problem there
[12:55:12] <lu_zero> still I like the ARF name
[13:43:50] <Tjoppen> whoa. ffplay seeks to a percentage of where along the width of the window you press? never noticed that before
[13:46:44] <kshishkov> it does so since the beginning, I think
[13:54:20] <av500> Tjoppen: yeah, I found that out only recently too
[13:54:28] <Tjoppen> ok. I always wondered what the logic behind its seeking was
[14:00:17] <mru> you guys don't read the source?
[14:00:34] <mru> shocking
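The seek behaviour Tjoppen describes can be sketched in a few lines of C. This is only an illustration of the idea (click position as a fraction of the window width, scaled to the stream duration), not a copy of ffplay's source; the function name `click_to_timestamp` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a mouse click at horizontal position x, in a
 * window `width` pixels wide, to a timestamp in [0, duration].
 * `duration` is in whatever time unit the caller uses (e.g. microseconds). */
static int64_t click_to_timestamp(int x, int width, int64_t duration)
{
    assert(width > 0 && duration >= 0);
    if (x < 0)
        x = 0;          /* clamp clicks reported left of the window */
    if (x > width)
        x = width;      /* clamp clicks reported right of the window */
    return duration * x / width;
}
```

A click in the middle of a 640-pixel-wide window on a 60-second file would thus map to the 30-second mark.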
[14:16:05] <Tjoppen> hehe :)
[14:16:16] <Tjoppen> hemma \o/
[14:43:58] <BBB> Dark_Shikari: is pshufw particularly slow?
[14:48:08] <BBB> hm, I guess I mismeasured
[14:48:18] <BBB> I have another 10% speedup by doing crazy pshufw magic
[14:48:28] <BBB> also useful for the sixtap
[14:57:26] <lu_zero> wonderful
[14:57:36] <lu_zero> somebody wrote me in chinese about feng
[14:57:53] <lu_zero> I just managed to take the text and thunderbird ate the email...
[15:03:12] <wbs> BBB: time to give some opinion on the rtsp/http tunnel auth thread? mainly, is it ok for ff_url_join() to behave the same if auth is NULL and auth is ""?
[15:04:57] <lu_zero> wbs: uh?
[15:05:04] <lu_zero> why that?
[15:05:31] <wbs> otherwise we'd have to do ff_url_join(... auth[0] ? auth : NULL, ...) in the rtsp code
[15:05:54] <wbs> skipping it perhaps would be ok, too, but then we'd pass a http://@server/ url to the http protocol, which looks funny
[15:06:35] <lu_zero> looks like I'm missing something
[15:07:20] <lu_zero> are you sure it makes sense to put the auth in the http tunnel?
[15:07:46] <wbs> yes, I've been testing it with the private urls that stas oskin provided me with
[15:08:01] <wbs> the http protocol doesn't do anything with the auth unless the server actually responds with 403
[15:08:35] <lu_zero> so we have to move the auth stuff back and forth rtsp and http
[15:08:43] <wbs> umm, no
[15:09:00] <wbs> if the user specified auth for the rtsp url, we're not sure if it will be needed at the rtsp level or on the http tunnel level
[15:09:05] <wbs> so we just add it to the http tunnel urls
[15:09:29] <lu_zero> ok
[15:09:37] <wbs> then _if_ the http tunnel requests get a 403, the http protocol handler will retry using the auth credentials found in the url
[15:10:05] <wbs> likewise, if any of the rtsp requests (tunneled or not) get a 403, we retry using the auth that we were provided. if not, we never send the auth credentials out
[15:10:48] <lu_zero> ok
[15:11:19] <wbs> ..., so, if no rtsp auth was provided, the auth[42] buffer in the rtsp code will be just an empty string
[15:12:52] <lu_zero> hi BBB
[15:13:04] <wbs> so when creating the http tunnel url, we'd pass this auth buffer to ff_url_join(), but either add code to ff_url_join() to omit the auth part if the string is a non-null, but empty string. or make the ff_url_join() call contain ..., auth[0] ? auth : NULL, ...
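The first option wbs mentions (treating a NULL auth and an empty auth string the same way) might look roughly like the sketch below. `join_url` is a hypothetical stand-in written for illustration; its signature and behaviour are assumptions and it is not FFmpeg's actual `ff_url_join()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical URL builder: omits the "auth@" part when auth is NULL
 * or an empty string. Returns 0 on success, -1 if buf is too small. */
static int join_url(char *buf, size_t size, const char *proto,
                    const char *auth, const char *host, int port,
                    const char *path)
{
    int n;

    assert(buf && proto && host && path);
    if (auth && auth[0])    /* NULL and "" both mean "no credentials" */
        n = snprintf(buf, size, "%s://%s@%s:%d%s",
                     proto, auth, host, port, path);
    else
        n = snprintf(buf, size, "%s://%s:%d%s", proto, host, port, path);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With this approach the caller can pass its auth buffer unconditionally, and an unset buffer never produces a URL of the form `http://@server/`.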
[15:13:06] <BBB> hello
[15:24:16] <BBB> my sixtap is bitexact for half of my samples, but not for the other half
[15:24:20] <BBB> that's a little frustrating
[15:33:21] <BBB> ah, found it
[15:34:25] <BBB> Dark_Shikari: what do you think of http://ffmpeg.pastebin.com/eGkAPF8R as 6tap filter? it's 4x as fast as C and about 5 less instructions (plus 3 instead of 6 memory accesses inside the loop) compared to the one from libvpx
[15:35:33] <BBB> I'm still wondering if I can prevent the memory access to [ff_pw_64] by saving a reg somehow... ideas welcome :)
[16:01:46] <lu_zero> ff_pw_64?
[16:05:30] <lu_zero> libavcodec/x86/dsputil_mmx.c:DECLARE_ALIGNED(16, const xmm_reg, ff_pw_64 ) = {0x0040004000400040ULL, 0x0040004000400040ULL};
[16:05:33] <lu_zero> ok
[16:05:33] <lu_zero> uhm
[16:05:50] <lu_zero> I guess you have just 8 regs
[16:05:54] <BBB> I'm doing crazy stuff and I know little about it ;)
[16:06:22] <BBB> I want 1 zero reg, 3 filter constant regs and one reg for the ff_pw_64, so I have 3 regs to calculate, I don't think that's enough
[16:06:35] <BBB> so I free one by accessing ff_pw_64 directly, but that's likely slower
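For context, the scalar operation that a six-tap filter of this kind computes can be sketched in plain C. The constant 64 (the value held in each 16-bit lane of `ff_pw_64`, 0x0040) acts as the rounding term before a right shift by 7, which assumes the filter taps sum to 128. The function and parameter names here are hypothetical; the pastebin code itself is not reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an integer to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference for a horizontal six-tap filter.
 * For each output pixel it reads src[x-2] .. src[x+3], so the caller
 * must provide 2 valid pixels before and 3 after the filtered span.
 * Assumes the six taps sum to 128 (7-bit fixed point). */
static void sixtap_h(uint8_t *dst, const uint8_t *src, int width,
                     const int taps[6])
{
    assert(width >= 0);
    for (int x = 0; x < width; x++) {
        int sum = 64;                       /* rounding term, cf. ff_pw_64 */
        for (int k = 0; k < 6; k++)
            sum += taps[k] * src[x + k - 2];
        /* avoid right-shifting a negative value; it clips to 0 anyway */
        dst[x] = clip_u8(sum < 0 ? 0 : sum >> 7);
    }
}
```

An MMX version processes four such pixels per iteration in 16-bit lanes, which is why a register holding four copies of 64 is useful.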
[16:07:14] <lu_zero> xmm0 isn't 0 by default in the operations you need a 0 ?
[16:07:22] * lu_zero wonders since he doesn't know
[16:07:35] * lu_zero points that altivec has some more regs available...
[16:07:36] <lu_zero> hmm
[16:07:40] <BBB> I use mm%d regs, not xmm%d
[16:07:48] <BBB> it's mmx only, for now
[16:07:52] <lu_zero> mm0 then
[16:07:52] <BBB> I learn baby-steps
[16:08:01] <BBB> I use mm0 for calculations :-p
[16:08:38] <Honoome> lu_zero: yeah ppc has more regs, but its code is encrypted by default
[16:09:04] <lu_zero> Honoome: pff
[16:09:37] <lu_zero> http://ffmpeg.pastebin.com/eGkAPF8R <- that wouldn't be any harder to read translated in vmx+ppc asm
[16:10:21] <Honoome> lu_zero: depends.. I prefer "one symbol one meaning" to "a combination of two to four symbols half a meaning"
[16:10:28] <Vitor1001> BBB: I'm a really asm noob, but is mm7 always 0?
[16:10:46] <BBB> Vitor1001: I pxor it in the beginning, and then use it as my zero constant
[16:10:56] <BBB> I need that for unsigned byte->word conversions by abusing punpck
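The zero-register trick BBB mentions can be illustrated by emulating `punpcklbw` on 64-bit values in C: the instruction interleaves the low four bytes of the destination with the low four bytes of the source, so interleaving with an all-zero register turns four unsigned bytes into four zero-extended 16-bit words. The C function below is only a model of the instruction's behaviour for illustration.

```c
#include <stdint.h>

/* Model of MMX "punpcklbw dst, src": interleave the low 4 bytes of dst
 * with the low 4 bytes of src. Result byte 0 comes from dst, byte 1
 * from src, byte 2 from dst, and so on. */
static uint64_t punpcklbw(uint64_t dst, uint64_t src)
{
    uint64_t out = 0;

    for (int i = 0; i < 4; i++) {
        uint64_t d = (dst >> (8 * i)) & 0xff;
        uint64_t s = (src >> (8 * i)) & 0xff;
        out |= d << (16 * i);       /* even byte of each word: from dst */
        out |= s << (16 * i + 8);   /* odd byte of each word: from src  */
    }
    return out;
}
```

Calling `punpcklbw(pixels, 0)` therefore yields the low four pixels widened to words, ready for 16-bit multiplies and adds.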
[16:11:18] <BBB> and I'm probably more asm n00b than you :-p
[16:11:21] <Vitor1001> I understand, but maybe you can trade someway a extra register by an extra xor.
[16:11:37] <Vitor1001> I mean, you use mm7 as a temp reg and clear it afterwards
[16:11:45] <BBB> got it, that might work
[16:14:49] <BBB> would that be faster?
[16:15:04] <BBB> one extra pxor, one memory access replaced by a register access
[16:15:25] <BBB> my measurements have a lot of noise so it's hard to show convincingly
[16:15:57] <lu_zero> BBB: uhm
[16:16:07] <lu_zero> check which is the average latency of a load
[16:16:31] <lu_zero> and check how many arith ops you issue at the same time
[16:16:31] <BBB> I don't even know what that means
[16:16:52] <lu_zero> if you have a load it will take X cycles before it is ready
[16:17:12] <lu_zero> but you can do something while it is loading
[16:17:16] <lu_zero> e.g
[16:17:28] <lu_zero> load r1 memory
[16:17:38] <lu_zero> add r2 r3 r4
[16:17:56] <lu_zero> mul r4 r5 r6
[16:18:41] <lu_zero> add r7 r1 2
[16:19:01] <lu_zero> if the load takes less than the time of your add and the mul
[16:19:01] <Vitor1001> BBB: Even with START_TIMER / STOP_TIMER() you have a lot of noise?
[16:19:14] <BBB> a little, like 5% or so between runs
[16:19:15] <BBB> ues
[16:19:16] <BBB> yes
[16:19:25] <BBB> maybe it's because I'm watching worldcup matches at the same time :-p
[16:19:29] <Vitor1001> ;)
[16:19:36] <lu_zero> your load won't cost as much as having the cpu waiting for the r1 value
[16:19:56] <lu_zero> obviously you have to do the same for every operation
[16:20:17] <BBB> lu_zero: I'll play with instruction order, I'm hoping to get rid of all loads though
[16:20:51] <lu_zero> there are profilers that help spot stalls
[16:27:52] <BBB> Vitor1001: I tried, it works, but it's really about the same speed...
[16:29:24] <BBB> I moved a load and now it's a lot faster (from ~2850 to ~2750 cycles for the whole thing)
[16:29:29] <BBB> not bad
[16:30:44] <BBB> that's still 4x faster than the C code :-)
[16:31:09] <lu_zero> you are crunching 4x the number of bytes
[16:31:22] <lu_zero> so it's pretty much what you'd expect
[16:32:15] <BBB> http://ffmpeg.pastebin.com/3ZkJBGqX
[16:32:30] <BBB> had to rearrange a few variables for the register-save to work
[16:33:13] <lu_zero> # paddd mm0, mm3 ; add to 2nd 2px cache
[16:33:13] <lu_zero> # pxor mm3, mm3
[16:33:13] <lu_zero> # punpcklbw mm2, mm3 ; byte->word FGHI
[16:33:25] <lu_zero> doesn't look nice
[16:33:29] <BBB> ?
[16:33:42] <BBB> mm3 is my zero variable, I need to clear it before reusing it as such
[16:35:35] <lu_zero> probably if you use a different register and move the pxor on instruction far from punpcklbw
[16:35:44] <lu_zero> it might get faster
[16:35:50] <BBB> yeah, I see what you mean, but that's hard
[16:35:54] <BBB> because I'm reg-starved
[16:36:10] <BBB> 4/5/6/7 cannot be touched
[16:36:14] <BBB> 2 is taken for the load
[16:36:19] <BBB> 1 is the final result
[16:36:26] <BBB> 1/3 need to be added as intermediate products
[16:36:34] <BBB> so I need to use 3 and need it as a zero right after
[16:36:51] <BBB> 1/3 = 0/3 of course
[16:37:46] <BBB> if you see an obvious way to do it I'd of course try :)
[16:41:31] <Vitor1001> BBB: How many times does the loop run?
[16:41:36] <BBB> 4
[16:42:01] <Vitor1001> Ok, because I think you can get the dec r3 out of the main loop...
[16:42:15] <BBB> ?
[16:42:46] <Vitor1001> you could start with a negative value of r1
[16:42:53] <Vitor1001> and increase it until it reaches 0
[16:43:11] <Vitor1001> Ow, scrap that, that's the pointer ;)
[16:43:20] <BBB> :-p
[16:43:55] <BBB> r0 is dest-src, r1=src
[16:44:59] <BBB> I could change r4 into stride*(h-1)
[16:45:10] <BBB> and then loop backwards instead of forward
[16:45:16] <BBB> but I doubt that'd be faster
[16:45:23] <Vitor1001> I see...
[16:45:25] <twnqx> a loop of length 4 sounds almost like "unroll me" if it saves a register
[16:45:35] <BBB> (and then access r0+r4 and r1+r4 instead of the current way)
[16:46:10] <lu_zero> BBB: try unrolling
[16:46:19] <BBB> Dark_Shikari told me not to :-p
[16:46:27] <lu_zero> unroll+remap should let you avoid stalls
[16:46:37] <lu_zero> uhmm
[16:46:38] <BBB> I can try the macro way I guess
[16:46:54] <lu_zero> cache boundaries?
[16:46:57] <BBB> doesn't it quadruple codesize?
[16:47:03] <twnqx> yes.
[16:47:39] <lu_zero> so if you have a cache miss it might get way slower
[16:47:51] <kierank> BBB: you can join #x264dev and ask holger if you want more asm help if Dark_Shikari's not around
[16:47:58] <lu_zero> worth trying just for educational purpose
[16:48:11] <BBB> ok
[16:48:16] <BBB> kierank: who's he?
[16:48:24] <BBB> isn't pengvado also an asm god?
[16:48:41] <kierank> holger wrote some of x264's asm magic
[16:48:54] <lu_zero> siretart: ffmpeg 0.6 is already ubuntu 10.04 ?
[16:49:03] <kierank> with ridiculous speedups
[16:49:25] * lu_zero is baking a dummy-proof box
[16:50:41] <BBB> nah, same speed
[16:51:23] <BBB> thanks for the idea though :)
[16:51:39] <lu_zero> let me see the code
[16:52:03] <BBB> http://ffmpeg.pastebin.com/3ZkJBGqX
[16:52:35] <pengvado> pxor is latency 1. you don't really need to hoist it.
[16:53:02] <lu_zero> # movq mm1, mm2 ; byte ABCD..
[16:53:04] <lu_zero> uhm?
[16:53:11] <BBB> I need mm2 later
[16:53:15] <BBB> so I need a copy
[16:53:32] <BBB> (for CDEF/EFGH)
[16:53:40] <BBB> for mm1, I only care about ABCD
[16:53:44] <BBB> maybe I can make it a movd
[16:53:50] <BBB> is that faster?
[16:54:00] <lu_zero> so you have mov+mov
[16:54:11] <lu_zero> pengvado: which is the load latency?
[16:54:55] <BBB> compiler complains that I can't use mov/movd on two mm registers
[16:55:02] <BBB> unfortunate :-(
[16:56:52] <BBB> let me try the vertical 4-tap function
[16:56:57] <BBB> that should be easy also
[16:57:13] <pengvado> load latency is 3 cycles from L1, 15 from L2, or 300 from main memory.
[16:57:19] <BBB> Yuvi: can I commit this or would you like to merge the plain vp8 decoder without asm first?
[16:58:32] <lu_zero> ah
[16:59:21] <lu_zero> what about paddd ?
[16:59:41] <pengvado> 1/.5
[17:20:13] <enkidu> hello
[17:20:57] <enkidu> Dark_Shikari: what did you do in ffmpeg 0.6 x264 decoder, that it is able to play smoothly 720p on Atom processors?
[17:21:49] <BBB> there is no x264 decoder
[17:21:59] <enkidu> h264
[17:22:01] <elenril> x264 decoder? in my ffmpeg?
[17:22:03] <enkidu> sorry
[17:22:31] * enkidu is after 12 hours of real life...
[17:22:32] <elenril> and it wasn't D_S who did it
[17:22:49] <enkidu> so who did this opts?
[17:22:53] <janneg> it was Michael
[17:23:02] <kierank> atom sucks
[17:23:16] <kierank> use the hardware acceleration that's probably present
[17:24:00] <enkidu> VO: [xv] 1280x720 => 1280x720 Planar YV12 [zoom]
[17:24:20] <av500> enkidu: 1080p plays fine on atom
[17:24:31] <av500> if you use the HW decoder in the chipset... :)
[17:24:36] <enkidu> yeah...
[17:24:46] <mru> av500: that's not playing on atom
[17:25:04] <enkidu> but probably my is not featured with hw decoder
[17:25:15] <kierank> go and find out
[17:25:15] <av500> enkidu: but it was cheap! :)
[17:25:27] * mru doesn't believe in netbooks
[17:26:35] <Dark_Shikari> BBB: unrolling doesn't help speed unless it lets you save instructions
[17:26:42] <Dark_Shikari> Not on an OOE arch, that is.
[17:27:18] <BBB> I reduced the number of calls by another 2 or 3 on the function we did yesterday by doing pshufw, and limited to 1 load per loop iteration
[17:27:28] <BBB> I'll let you see in a bit
[17:28:10] <Dark_Shikari> keep in mind pshufw is mmxext, so mark the function as _mmxext instead of mmx
[17:28:38] <BBB> oh :-(
[17:28:48] <BBB> does it matter?
[17:28:54] <BBB> should I make a plain mmx version also?
[17:29:03] <Dark_Shikari> Not unless you care about pentium 2
[17:29:07] <Dark_Shikari> or amd k6
[17:29:11] <BBB> see, this is so silly if the manual doesn't tell me what instruction set a function belongs to
[17:29:50] <Dark_Shikari> we don't care about it in x264.
[17:31:26] <BBB> ok
[17:33:15] <BBB> Dark_Shikari: http://ffmpeg.pastebin.com/raSzU6vh <- my current versions
[17:33:36] <BBB> pshufw is amazing by the way
[17:33:59] <Dark_Shikari> Is it faster?
[17:34:36] <BBB> it was 10% faster than yesterday's version with 1 load, which was 10% faster than the one we looked at (with 4 loads)
[17:34:53] <BBB> the sixtap one is 4x as fast as the C version
[17:35:00] <Dark_Shikari> You should pipeline things a bit more.
[17:35:03] <Dark_Shikari> i.e.
[17:35:05] <BBB> and has less instructions and less memloads (3 vs 6) compared to libvpx
[17:35:07] <Dark_Shikari> pshufw/pshufw/punpck/punpck
[17:35:12] <Dark_Shikari> you have a bit too much linear dependency there
[17:35:15] <Dark_Shikari> won't hurt on OOE but it's just ugly
[17:35:29] <Dark_Shikari> pshufw mm0/punpck mm0/pshufw mm3/punpck mm3
[17:35:31] <Dark_Shikari> should be
[17:35:37] <Dark_Shikari> pshufw mm0/pshufw mm3/punpck mm0/punpck mm3
[17:35:50] <Dark_Shikari> and yes pshufw is amazing.
[17:36:23] <Dark_Shikari> You should be consistent with your syntax
[17:36:27] <Dark_Shikari> 0x94 and 9 as pshufw arguments?
[17:36:29] <Dark_Shikari> use 0x for both
[17:36:35] <BBB> oh right
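To make the `pshufw` immediates easier to read, here is a small C model of the instruction: each 2-bit field of the immediate selects which source word lands in the corresponding destination word. What the pastebin code uses 0x94 and 0x09 for is not shown in the log, so the comments below only describe the resulting word order.

```c
#include <stdint.h>

/* Model of MMXEXT "pshufw dst, src, imm8": destination word i is the
 * source word selected by bits [2*i+1 : 2*i] of the immediate. */
static uint64_t pshufw(uint64_t src, uint8_t imm)
{
    uint64_t out = 0;

    for (int i = 0; i < 4; i++) {
        int sel = (imm >> (2 * i)) & 3;              /* 2-bit selector */
        uint64_t w = (src >> (16 * sel)) & 0xffff;   /* chosen source word */
        out |= w << (16 * i);
    }
    return out;
}

/* With source words w0..w3 (w0 lowest):
 *   imm 0xE4 -> w0 w1 w2 w3 (identity)
 *   imm 0x94 -> w0 w1 w1 w2
 *   imm 0x09 -> w1 w2 w0 w0
 */
```

Writing both immediates in hex, as Dark_Shikari suggests, makes the four 2-bit selectors easier to compare at a glance.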
[17:36:41] <Dark_Shikari> Other things to note
[17:36:43] <Dark_Shikari> movq mm1, mm2
[17:36:45] <Dark_Shikari> punpcklbw mm1, mm6
[17:36:54] <Dark_Shikari> Either:
[17:36:59] <Dark_Shikari> a) move the thing that uses mm2 to right after movq
[17:37:08] <Dark_Shikari> b) swap mm1 and mm2 for all instructions after movq
[17:37:20] <Dark_Shikari> this decreases the instruction chain length by 1
[17:37:26] <BBB> ?
[17:37:28] <Dark_Shikari> i.e. if you do a mov from a to b, use _a_ immediately after
[17:37:29] <Dark_Shikari> not b
[17:37:40] <Dark_Shikari> because b hasn't been written yet
[17:37:43] <BBB> really?
[17:37:46] <BBB> ok
[17:37:50] <Dark_Shikari> well obviously, the mov takes 1 cycle
[17:37:59] <BBB> I'll just invert the calls before that, less effort
[17:37:59] <Dark_Shikari> so a will be available one cycle before b.
[17:38:10] <Dark_Shikari> Note: this isn't meaningful on fancy CPUs.
[17:38:16] <Dark_Shikari> But, say, an atom might care a lot.
[17:38:43] * BBB wonders if he cares
[17:38:48] <Dark_Shikari> It's good form
[17:38:50] <BBB> I probably should :-p
[17:38:58] <Dark_Shikari> it doesn't take any extra code, doesn't make things uglier
[17:39:15] <enkidu> anyways
[17:40:02] <enkidu> do you remember first infos about h264? "to decode H264 10ghz processor will be needed"
[17:40:11] <BBB> mov access order changed as suggested
[17:40:18] <BBB> let me look at the pipelining a bit more
[17:40:20] <enkidu> as in one of articles from 2005
[17:40:22] <mru> enkidu: said who?
[17:40:28] <mru> certainly nobody with a clue
[17:40:32] <Dark_Shikari> Probably divx
[17:40:40] <Dark_Shikari> they were convinced that h264 was too complicated to ever implement in hardware
[17:40:43] <mru> h264 was designed to be usable
[17:40:48] <Dark_Shikari> There's some particular irony in that
[17:41:03] <mru> h264 was specifically designed to be implemented in hardware
[17:41:07] <Dark_Shikari> exactly
[17:41:20] <enkidu> mru: dunno, it was old article. most ppl were using first p4 then
[17:41:20] <mru> that's where the big money is
[17:41:45] <mru> and yet a 300MHz hw decoder does it without breaking a sweat
[17:42:14] <Dark_Shikari> and even a 3ghz p4 can do 720p h264
[17:42:59] <enkidu> the article was from the age of beating frequency boundaries
[17:43:14] <enkidu> when Intel was on increasing-clock line
[17:44:35] <mru> this i5 laptop plays it nicely at the lowest speed setting
[17:44:59] <BBB> ok, changed the order for pipelining also
[17:45:01] <Dark_Shikari> this i7 laptop plays 4K fine =p
[17:45:12] <BBB> now I need to work on the vertical one
[17:45:19] <av500> mru: new laptop?
[17:45:23] <mru> yeah
[17:45:30] <BBB> or maybe I should do the 8x8/16x16 horizontal-only ones
[17:45:33] <Dark_Shikari> BBB: fyi, for vertical, it's a bit different
[17:45:35] <av500> mru: the sony?
[17:45:38] <mru> yep
[17:45:38] <BBB> Dark_Shikari: I noticed
[17:45:45] <BBB> Dark_Shikari: I was going to look at how x264 does it ;)
[17:45:52] <Dark_Shikari> BBB: the relevant code is the hpel code
[17:45:56] <Dark_Shikari> that's the most similar thing in x264
[17:46:18] <av500> mru: model?
[17:46:25] <av500> (i forgot)
[17:46:26] <mru> z something
[17:46:32] <Dark_Shikari> BBB: mc-a2.asm, lines 144-162
[17:46:41] <Dark_Shikari> Yes, that's an entire row done in 6 multiplies ;)
[17:46:43] <mru> i5, 8GB
[17:46:45] <Dark_Shikari> Of course you're doing mmx.
[17:47:15] <mru> runs linux nicely
[17:47:16] <Dark_Shikari> BBB: x264 actually uses shift/add for non-ssse3 v filter
[17:47:18] <Dark_Shikari> so you can't do that.
[17:47:28] <Dark_Shikari> You'll probably find it most efficient to repeat your original H algorithm
[17:47:31] <Dark_Shikari> Since you won't need the pshufws.
[17:47:53] <Dark_Shikari> I still suggest you glance at the ssse3 one to see how awesome pmaddubsw is.
[17:48:04] <BBB> hehehe :)
[17:48:24] <Dark_Shikari> SBUTTERFLY, fyi, is ABCDEFGH, IJKLMNOP -> AIBJCKDL and EMFNGOHP.
[17:48:37] <Dark_Shikari> aka interleave bottom halves, interleave top halves
[17:48:59] <BBB> let me guess, ssse3 has some awesome instruction for that
[17:49:30] <Dark_Shikari> no
[17:49:35] <Dark_Shikari> it's just mova, punpcklbw, punpckhbw
[17:49:44] <BBB> oh, ok, that's what I do too
[17:49:49] <BBB> just without the macro
[17:49:52] <Dark_Shikari> SBUTTERFLY just does the swap for you
[17:49:57] <Dark_Shikari> so you don't have to track the registers
[17:50:04] <Dark_Shikari> i.e. it outputs to its inputs
[17:50:17] <BBB> but it needs a temp reg right?
[17:50:17] <Dark_Shikari> this gets very important in say a transpose
[17:50:25] <Dark_Shikari> which ends up with a dozen or two dozen butterflies
[17:50:31] <Dark_Shikari> yes, the third register argument is the temp reg
[17:50:35] <BBB> ah, of course
[17:50:36] <Dark_Shikari> SBUTTERFLY bw, 1, 4, 7
[17:50:36] <Dark_Shikari> SBUTTERFLY bw, 2, 5, 7
[17:50:36] <Dark_Shikari> SBUTTERFLY bw, 3, 6, 7
[17:50:40] <Dark_Shikari> "bw" is the size.
[17:50:49] <BBB> got it
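A minimal sketch of what SBUTTERFLY does as described above (copy, interleave low halves, interleave high halves, then swap register names so the results land in the two inputs). It is a Python model and not the macro itself; the helper names `punpckl`, `punpckh` and `sbutterfly` and the dict-of-lists register file are assumptions made for illustration.

```python
def punpckl(a, b):
    """Interleave the low halves of a and b (models punpckl*)."""
    half = len(a) // 2
    out = []
    for x, y in zip(a[:half], b[:half]):
        out += [x, y]
    return out


def punpckh(a, b):
    """Interleave the high halves of a and b (models punpckh*)."""
    half = len(a) // 2
    out = []
    for x, y in zip(a[half:], b[half:]):
        out += [x, y]
    return out


def sbutterfly(regs, a, b, tmp):
    """Model of the mova / punpckl / punpckh sequence plus the register swap.

    regs is a hypothetical register file: a dict mapping register number
    to a list of elements, lowest element first.
    """
    regs[tmp] = list(regs[a])                 # mova     tmp, a
    regs[a] = punpckl(regs[a], regs[b])       # punpckl  a, b
    regs[tmp] = punpckh(regs[tmp], regs[b])   # punpckh  tmp, b
    # The swap: the high result takes over b's name, so the caller keeps
    # using the same two register numbers it passed in.
    regs[b], regs[tmp] = regs[tmp], regs[b]


regs = {1: list("ABCDEFGH"), 4: list("IJKLMNOP"), 7: []}
sbutterfly(regs, 1, 4, 7)
print("".join(regs[1]), "".join(regs[4]))
```

After the call, register 7 holds the old contents of register 4, which is why the same temp can be reused by the next butterfly.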
[17:51:00] <BBB> I'm not really using macros yet
[17:51:15] <BBB> yesterday I rewrote my function (after decreasing loads to 1) to a macro for first 2px and second 2px
[17:51:16] <Dark_Shikari> x264 has an x86util.asm file
[17:51:23] <Dark_Shikari> which contains macros you can use
[17:51:29] <BBB> then I moved to using pshufw
[17:51:31] <Dark_Shikari> this is in ffmpeg too iirc
[17:51:34] <BBB> it is
[17:51:38] <BBB> SBUTTERFLY is there?
[17:51:41] <Dark_Shikari> Yes, I think so.
[17:51:45] <Dark_Shikari> it's used for the transposes.
[17:52:34] <Dark_Shikari> see lines 68-79 of x86util for why sbutterfly is kinda important
[17:52:48] <Dark_Shikari> in x264, at least. the ffmpeg one might be a bit older.
[17:53:36] <BBB> I see it, hard to keep track
[17:53:54] <BBB> I don't really have a clear butterfly "pattern" here, so I won't use it for now, but I'll keep it in mind
[17:55:11] <BBB> btw that mc-a2.asm thing, pmaddubsw is awesome but I don't have it ;)
[17:55:38] <Dark_Shikari> This is why ssse3 is so great for MC
[17:55:46] <CIA-98> ffmpeg: mstorsjo * r23643 /trunk/libavformat/rtsp.c:
[17:55:46] <CIA-98> ffmpeg: RTSP: Clean up rtsp_hd on failure
[17:55:46] <CIA-98> ffmpeg: Since rtsp_hd isn't assigned to rt->rtsp_hd until after the setup phase,
[17:55:46] <CIA-98> ffmpeg: the initialized URLContext could be leaked on failures.
[17:55:51] <BBB> you'll have to buy me a new cpu for that
[17:56:44] <Dark_Shikari> We can do that.
[17:56:50] <Dark_Shikari> Or just give you an SSH connection to someone who has one.
[17:57:58] <BBB> it has to scratch my itch
[17:58:04] <BBB> just doing it for someone else isn't very useful
[17:58:05] <BBB> :-p
[17:58:19] <lu_zero> eh eh
[17:58:33] <BBB> http://store.apple.com/us/browse/home/shop_mac/family/macbook_pro?mco=MTAyN… <- does that one have ssse3?
[17:58:43] <BBB> if so, please buy me one with some extra features
[17:58:48] <av500> it has itunes!
[17:59:10] <Honoome> I think the c2d has ssse3 yeah
[17:59:31] <lu_zero> pfff
[17:59:33] * Honoome found that out when trying to run lu_zero's mplayer static binary
[17:59:45] <lu_zero> sorry...
[17:59:46] <Honoome> on a system that lacks ssse3 that is
[18:00:04] <mru> maybe I should set up the old c2q as ffmpeg dev system
[18:00:09] <mru> along with the g4
[18:00:12] <Honoome> lu_zero: don't worry, I'll give you an sse4.2-compiled feng next time ;)
[18:00:36] <lu_zero> not a problem if qemu or valgrind could run it
[18:00:49] * lu_zero is thinking about updating his laptop anyway
[18:00:58] <Honoome> valgrind has trouble with sse4.1
[18:01:10] <lu_zero> sigh
[18:01:15] <BBB> I like the luxury 15" one, it's only $2200
[18:01:15] <mru> lu_zero: the sony z is nice
[18:01:15] <Honoome> I would hope I'll never have to use valgrind on _this_ system
[18:01:20] <BBB> with some extra features probably $2500
[18:01:22] <BBB> not too bad
[18:01:28] <mru> 13" 1920x1080
[18:01:35] <Dark_Shikari> c2ds are cheap
[18:01:37] <Dark_Shikari> they're like $100
[18:01:55] * Honoome feels quite at home with the dell e6510
[18:02:17] <Honoome> beside the touchpad, and the fact I forgot _again_ to drop the governor to powersave when running battery
[18:02:26] <Honoome> I'm a lousy laptop user
[18:03:35] <BBB> hmm....
[18:03:38] <BBB> $2800
[18:03:39] <BBB> that's a lot
[18:03:46] <BBB> but it's ok, I figured it'd be up to $3k
[18:04:05] <BBB> anyone wanna buy me one?
[18:04:11] <siretart> lu_zero: source yes, but it hasn't built yet
[18:04:28] <Dark_Shikari> BBB: how about a $100 core 2
[18:04:38] <BBB> what do I do with it?
[18:04:39] <Dark_Shikari> as opposed to a $3k one
[18:04:44] <Dark_Shikari> you put it in a $50 board
[18:04:49] <BBB> I don't have a $50 board
[18:04:55] <Dark_Shikari> you buy one
[18:05:02] <BBB> and do what with it? :-p
[18:05:06] <mru> Dark_Shikari: I think he wants a laptop
[18:05:09] <Dark_Shikari> plug it into your power supply
[18:05:09] <BBB> you don't get it, I don't have a desktop
[18:05:12] <Dark_Shikari> mru: I want a pony
[18:05:15] <Dark_Shikari> BBB: you buy a $30 power supply
[18:05:16] <Dark_Shikari> a $20 case
[18:05:31] <mru> Dark_Shikari: then you curse the thing for as long as it runs
[18:05:33] <BBB> where do I put it? I live in a friggin' manhattan-style shoebox-size apartment
[18:05:37] <BBB> I have no space for a desktop
[18:05:40] <BBB> my wife will kill me
[18:05:49] <Dark_Shikari> get a mini-atx
[18:05:54] <BBB> plus I can't carry it around with me
[18:05:54] <lu_zero> siretart: updating from 9. to 10.4 is sloooow...
[18:06:18] <enkidu> BBB: you can try barebone with lcd
[18:06:20] <Dark_Shikari> mru: and who cares?
[18:06:27] <Dark_Shikari> one day of BBB's time is worth more than a core 2 box
[18:06:31] <Honoome> lu_zero: please tell me we're not going to use ubuntu next week...
[18:06:31] <mru> I do, if I'm the one cursing
[18:08:24] <BBB> they're about the same
[18:08:34] <BBB> if you work 8 hrs for $250/hr, that's $2k/day
[18:08:43] <BBB> isn't that what a desktop costs nowadays?
[18:08:49] <enkidu> it is
[18:08:50] * BBB has no clue about desktop prices
[18:09:02] <BBB> don't forget taxes on income
[18:09:12] <enkidu> I bought my netbook for $150
[18:09:15] <Dark_Shikari> you're not worth $250/hr
[18:09:27] <BBB> probably not
[18:09:47] <BBB> but for the few things I do, I get close to that
[18:10:05] <Honoome> more like €30/hr :|
[18:10:35] <BBB> actually that's not true, you're right, I get less
[18:10:43] <BBB> anyway
[18:10:49] <Vitor1001> BBB: I saw sixtap_filter is symmetric. Can you replace a load by a shuffle?
[18:11:21] <BBB> Vitor1001: unfortunately no
[18:11:37] <BBB> Vitor1001: it's symmetric across "mx boundary", not within
[18:11:51] <BBB> mx is constant within one function call
[18:12:09] <siretart> lu_zero: depends on the hardware, but in general, yes
[18:12:22] <Vitor1001> Ok, I see.
[18:15:05] * lu_zero murmurs something about having gentoo on the same hw taking the same time...
[18:15:45] <lu_zero> hopefully the 10.10 will be leaner
[18:18:52] <mru> no, but by then computers will be even faster
[18:30:11] <lu_zero> uff
[18:30:22] <lu_zero> is _still_ updating...
[18:31:12] <mru> what are you doing?
[18:57:58] <Dark_Shikari> pengvado: crazy idea for a lossless format.
[18:58:07] <Dark_Shikari> Every single code is 1 byte.
[18:58:14] <Dark_Shikari> Each byte code maps to a variable number of _pixels_
[18:58:33] <Dark_Shikari> This number of pixels is <= WORD_SIZE, so they can be branchlessly written in one unaligned store.
[18:59:03] <Dark_Shikari> so you write a 32-bit code to the bitstream, and increment the pointer by a variable amount from your code table. No branches.
[18:59:40] <Dark_Shikari> Escape codes are simple: a table lookup results in a zero number of bytes written and the next byte is a raw pixel.
[18:59:47] <Dark_Shikari> (next byte from the bitstream)
[19:00:00] <Dark_Shikari> 2-byte codes would allow eliminating escapes, but it'd exceed L1 cache.
[19:00:16] <Dark_Shikari> This system would allow 100% branchless decoding with the exception of escapes.
[19:00:41] <Dark_Shikari> my question is how you optimize such a code table.
[19:00:57] <mru> sounds like some kind of VQ
[19:01:04] <Dark_Shikari> normally you choose code lengths to match probabilities -- but here you want to choose pixel lengths to make probabilities equal
[19:01:19] <Dark_Shikari> huffyuv already does VQ -- two pixels per code iirc
[19:01:29] <Dark_Shikari> this is constant-code-length VQ
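A minimal sketch of the decoder idea described above: every code is one byte, each code expands to between 1 and WORD_SIZE pixels, a full word is always stored and the output pointer advances by the table's count, and a count of zero marks an escape followed by one raw pixel. The table contents, `WORD_SIZE = 4`, and the names `build_table` and `decode` are assumptions for illustration; how to choose a good table is exactly the open question in the discussion.

```python
WORD_SIZE = 4  # assumed: maximum pixels per code, written in one store


def build_table(entries):
    """Build a 256-entry table of (count, padded word).

    entries maps a code byte to a tuple of 1..WORD_SIZE pixel values.
    Codes missing from entries become escapes (count 0).
    """
    table = [(0, bytes(WORD_SIZE))] * 256
    for code, pixels in entries.items():
        assert 1 <= len(pixels) <= WORD_SIZE
        padded = bytes(pixels) + bytes(WORD_SIZE - len(pixels))
        table[code] = (len(pixels), padded)
    return table


def decode(stream, table, n_pixels):
    """Decode n_pixels pixels from stream using the fixed-length code table."""
    out = bytearray(n_pixels + WORD_SIZE)  # slack so a full-word store fits
    src = dst = 0
    while dst < n_pixels:
        count, word = table[stream[src]]
        src += 1
        out[dst:dst + WORD_SIZE] = word    # always store a whole word
        dst += count                       # advance by a variable amount
        if count == 0:                     # escape: next byte is a raw pixel
            out[dst] = stream[src]
            src += 1
            dst += 1
    return bytes(out[:n_pixels])


# Hypothetical table: code 0 = four zeros, code 1 = one zero, code 2 = (1, 255).
table = build_table({0: (0, 0, 0, 0), 1: (0,), 2: (1, 255)})
print(list(decode(bytes([0, 2, 9, 77, 1]), table, 8)))
```

In the example stream, code 9 is not in the table, so it acts as an escape and the following byte 77 is copied through as a raw pixel.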
[19:06:44] <BBB> is there like a "word-splat" instruction for mmx/mmxext? to take one (or just the lowest) word of a mm register and splat it over a target register?
[19:06:55] <astrange> huffyuv uses one vlc for every pixel channel
[19:07:12] <astrange> the ffmpeg decoder just uses a joint vlc table so it can read more than one at once
[19:10:34] <Dark_Shikari> BBB: pshufw
[19:10:39] <BBB> oh of course
[19:10:41] <BBB> duh
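A minimal sketch of how the pshufw immediate selects words, which is why it covers the "word-splat" case: each 2-bit field of the immediate picks the source word for one destination word. The function name `pshufw` and the list representation (index 0 is the lowest word) are modelling assumptions.

```python
def pshufw(src, imm):
    """Model of pshufw: src is four 16-bit words, index 0 = lowest word.

    Destination word i is src[(imm >> 2*i) & 3].
    """
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]


words = [10, 11, 12, 13]
print(pshufw(words, 0x00))  # splat of the lowest word
print(pshufw(words, 0xE4))  # identity shuffle
print(pshufw(words, 0x94))  # one of the immediates mentioned earlier
```

An immediate of 0x00 selects word 0 for all four positions, 0x55 would select word 1, and so on.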
[19:30:25] <BBB> Dark_Shikari: http://ffmpeg.pastebin.com/7dxa6qM2 is that any good?
[19:30:49] <BBB> actually the 4tap v4 was easy
[19:31:37] <Dark_Shikari> yeah, it's supposed to be
[19:31:47] <Dark_Shikari> Oh, you do that trick to avoid repeating row loads
[19:31:48] <Dark_Shikari> good idea
[19:32:05] <Dark_Shikari> wait what's with the splatting of coefs
[19:32:23] <Dark_Shikari> you're repeating that in every row
[19:32:24] <Dark_Shikari> what a waste
[19:32:44] <Dark_Shikari> you would be better pmullw'ing with memory
[19:33:12] <mru> can sse multiply by one element from a vector?
[19:33:17] <Dark_Shikari> no
[19:33:37] <mru> I find that useful
[19:33:51] <Dark_Shikari> BBB: keep in mind in x86, memory loads are FREE as long as the memory unit is not saturated and there's no risk of a cache miss
[19:33:56] <mru> keeping coeffs in a single reg
[19:34:09] <BBB> removing the loads makes it faster though
[19:34:11] <BBB> so it's not free
[19:34:13] <Dark_Shikari> thus, it's better to pmullw against memory than to add actual new ops
[19:34:16] <BBB> (I tested that for the h4, not the v4)
[19:34:17] <Dark_Shikari> Which loads?
[19:34:20] <mru> Dark_Shikari: same is true on cortex-a8
[19:34:22] <Dark_Shikari> The pixel loads?
[19:34:25] <BBB> yes
[19:34:27] <mru> but usually you end up being memory-bound
[19:34:29] <BBB> or you mean the coeff load?
[19:34:31] <Dark_Shikari> Yes, that's because those can be cache misses
[19:34:34] <Dark_Shikari> Because those can cross a cacheline
[19:34:46] <Dark_Shikari> An aligned load off a global constant will never cross a cache line
[19:35:01] <Dark_Shikari> it is better to pmullw against memory than to pshufw to create the multiplication factors
[19:35:09] <mru> no aligned load will cross a cache line
[19:35:11] <mru> for obvious reasons
[19:35:12] <Dark_Shikari> exactly
[19:35:20] <Dark_Shikari> Also, I strongly suggest you pipeline things
[19:35:25] <Dark_Shikari> that is, place all the pmullws next to each other
[19:35:27] <Dark_Shikari> this is better for OOE
[19:35:42] <mru> better than mixing with load/store?
[19:35:43] <Dark_Shikari> and readability
[19:35:44] <BBB> I will pipeline after the function itself is satisfactory ;)
[19:36:03] <Dark_Shikari> mru: load/store doesn't use arithmetic units, so it doesn't matter
[19:36:14] <Dark_Shikari> what matters is getting the cpu to use p1 for multiply first
[19:36:22] <Dark_Shikari> modern intel chips have three alus, p0 p1 and p5
[19:36:33] <Dark_Shikari> adds can use all three, shuffles can use p0 (p0 and p5 on nehalem)
[19:36:35] <mru> separate address gen unit?
[19:36:36] <Dark_Shikari> multiply can only use p1
[19:36:38] <Dark_Shikari> yes
[19:36:47] <Dark_Shikari> p1 also does float stuff
[19:36:54] <Dark_Shikari> p1 is generally _horribly_ underused in most integer code
[19:37:02] <Dark_Shikari> because it can only be used by moves, shifts, and multiplies, iirc
[19:37:15] <Dark_Shikari> But, when OOE is selecting which execution unit to use for, say, an add
[19:37:17] <Dark_Shikari> it isn't smart
[19:37:20] <Dark_Shikari> it will just pick the first available one
[19:37:23] <pengvado> Dark_Shikari: int8 codes for int32 pixel-blocks puts an upper bound of 4 on the compression ratio
[19:37:26] <pengvado> this is very bad
[19:37:28] <Dark_Shikari> So you want to get p1 used up by multiply as soon as possible
[19:37:32] <Dark_Shikari> pengvado: I meant for something huffyuv-like.
[19:37:38] <Dark_Shikari> Not for, say, ffv2.
[19:37:54] <pengvado> in huffyuv, more than 1/2 of all samples are 0
[19:38:06] <Dark_Shikari> huffyuv rarely gets more than 2-2.5x compression
[19:38:33] <Dark_Shikari> even on easy stuff like anime
[19:38:42] <pengvado> that doesn't mean it doesn't suffer when you double the bitrate of the low residual sections
[19:38:49] <BBB> Dark_Shikari: so how do I multiply by a mem constant? you think I should create a RODATA with the 4x repeated coeffs?
[19:38:54] <BBB> that sounds wasteful
[19:38:56] <Dark_Shikari> pengvado: what's the cap on huffyuv?
[19:39:01] <Dark_Shikari> BBB: yes
[19:39:06] <pengvado> 8x. 1 bit per sample.
[19:39:18] <Dark_Shikari> pengvado: so if we used WORD_SIZE=4, that would be the same limit as huffyuv
[19:39:21] <Dark_Shikari> er, =8
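The bounds quoted above follow from simple arithmetic: the best case is the largest number of samples a single code can stand for, divided by the size of that code. A small worked sketch, assuming 8-bit samples; the function name `max_ratio` is hypothetical.

```python
def max_ratio(max_samples_per_code, code_bits, sample_bits=8):
    """Upper bound on compression ratio for a code of fixed minimum size."""
    return max_samples_per_code * sample_bits / code_bits


print(max_ratio(4, 8))  # 1-byte code, at most 4 pixels per code
print(max_ratio(8, 8))  # 1-byte code, at most 8 pixels per code
print(max_ratio(1, 1))  # huffyuv-style: shortest code is 1 bit per sample
```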
[19:40:07] <mru> Dark_Shikari: your constant-code-length vq should be very fast to decode
[19:40:12] <Dark_Shikari> mru: that's the idea
[19:40:20] <mru> did you intend it to be lossless or lossy?
[19:40:23] <Dark_Shikari> lossless
[19:40:30] <mru> why not make a lossy version?
[19:40:36] <Dark_Shikari> that would be interesting
[19:40:48] <pengvado> lossy version is called CYUV
[19:40:52] <Dark_Shikari> CYUV?
[19:41:01] <pengvado> though that's not VQ
[19:41:08] <pengvado> just ADPCM for video
[19:41:18] <Dark_Shikari> ah lol
[19:41:22] <Dark_Shikari> Actually -- if it was lossy, you could completely eliminate the escape codes.
[19:41:34] <Dark_Shikari> you could allocate, say, half the table for common combinations of pixels
[19:41:38] <Dark_Shikari> and the other half for _quantized_ single pixels
[19:41:51] <Dark_Shikari> or whatever combination is RD-wise the best
[19:41:58] <Dark_Shikari> then you could have the entire decoder 100% branchless like cyuv
[19:42:09] <pengvado> thing is, I suspect it would be slower to decode than JPEG since it would require much higher bitrate per quality
[19:42:25] <Dark_Shikari> But decoding would be vastly simpler
[19:42:52] <Dark_Shikari> But hmm. Might be right on that.
[19:42:54] <Dark_Shikari> though idct is slow
[19:43:00] <pengvado> so use hadamard instead
[19:43:17] <Dark_Shikari> hadamard works for real compression?
[19:43:29] <pengvado> or ihct
[19:43:33] <Dark_Shikari> yeah.
[19:43:43] <mru> whatever h264 uses is fast
[19:43:47] <mru> the transform
[19:44:01] <Dark_Shikari> hct
[19:44:04] <Dark_Shikari> h264 cosine transform
[19:44:10] <Dark_Shikari> anyways I think this would be more interesting for lossless
[19:45:02] <Dark_Shikari> I just don't know how to optimize such a table, that's the problem
[19:45:15] <Dark_Shikari> it seems like it shouldn't be too bad, it's the inverse of huffman
[19:46:06] <pengvado> CABGT is the inverse of huffman
[19:46:56] <Dark_Shikari> why's that?
[19:47:23] <Dark_Shikari> I would think the opposite of variable-length codes containing constant amounts of information is constant-length codes containing variable amounts of information
[19:47:57] <pengvado> CABGT literally uses a reverse huffman coder. i.e. a vlc reader in the encoder and a vlc writer in the decoder.
[19:48:07] <Dark_Shikari> lol
[19:48:15] <Dark_Shikari> well so that's another way of having a "reverse"
[19:49:11] <Dark_Shikari> BBB: fyi, probably the best way to do the hv positions is to generate h data and v-filter it
[19:49:24] <pengvado> problem is that the optimal fixed length code containing a variable amount of information must assign one and only one token to the prefix of any data stream
[19:49:30] <BBB> you mean for h&&v subpel?
[19:49:32] <Dark_Shikari> yes
[19:49:46] <Dark_Shikari> pengvado: explain?
[19:49:47] <pengvado> (which corresponds to the constraint that huffman must uniquely decode any bitstream)
[19:49:51] <BBB> I think I was just going to write a quick wrapper that calls my hxvy functions ;)
[19:50:03] <Dark_Shikari> BBB: ?
[19:50:43] <BBB> just place a temp buffer of 9x4 pixels (?) on the stack and use it by calling the v-only and h-only functions
[19:50:58] <BBB> is that bad?
[19:51:16] <Dark_Shikari> oh you'll call H with a height of whatever
[19:51:18] <Dark_Shikari> and then V-filter it
[19:51:22] <Dark_Shikari> ok, that works.
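A minimal sketch of the wrapper approach just agreed on: run the horizontal filter over height + taps - 1 rows into a temp buffer, then run the vertical filter over that buffer. For a 4x4 block and a six-tap filter that gives the 9 rows of 4 pixels mentioned above. The kernel below is a placeholder (the well-known H.264 half-pel taps), not the coefficients from the pastebin, and all function names are assumptions.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # placeholder six-tap kernel, sums to 32
SHIFT = 5                      # log2 of the kernel sum
ROUND = 1 << (SHIFT - 1)


def clip8(x):
    """Clamp to the 8-bit pixel range."""
    return max(0, min(255, x))


def filter_h(src, x0, y0, width, height):
    """Horizontally filter `height` rows of `width` pixels starting at (x0, y0)."""
    out = []
    for y in range(y0, y0 + height):
        row = []
        for x in range(x0, x0 + width):
            acc = sum(t * src[y][x - 2 + k] for k, t in enumerate(TAPS))
            row.append(clip8((acc + ROUND) >> SHIFT))
        out.append(row)
    return out


def filter_v(tmp, width, height):
    """Vertically filter tmp, which must hold height + len(TAPS) - 1 rows."""
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = sum(t * tmp[y + k][x] for k, t in enumerate(TAPS))
            row.append(clip8((acc + ROUND) >> SHIFT))
        out.append(row)
    return out


def filter_hv(src, x0, y0, width=4, height=4):
    """H pass into a temp buffer with extra rows, then V pass over it."""
    extra = len(TAPS) - 1
    tmp = filter_h(src, x0, y0 - 2, width, height + extra)
    return filter_v(tmp, width, height), len(tmp)


flat = [[100] * 16 for _ in range(16)]
block, tmp_rows = filter_hv(flat, 6, 6)
print(tmp_rows, block[0])
```

The intermediate is rounded and clipped after the horizontal pass here; whether the real functions do that is not stated in the discussion, so treat it as one possible choice.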
[19:51:33] <Dark_Shikari> when I did this for h264 I wrote it all in asm.
[19:51:44] <BBB> hmm... yes but you are hardcore
[19:51:54] <Dark_Shikari> It also sucked my soul out.
[19:51:58] <pengvado> if you have 256 codes, and some of them are multiple pixels, you can't handle all possible pairs of pixels (let alone larger tuples). and most ways of handling subsets of possible pixel pairs leaves redundancy in the bitstream unless you have a DFA switching between lots of tables.
[19:52:04] * mru points at neon qpel code
[19:52:11] <Dark_Shikari> yeah, mru did it too
[19:52:16] <Dark_Shikari> and pengvado
[19:52:30] <BBB> mru: you may write the neon function to be totally awesome
[19:52:39] <BBB> in fact, maybe you can teach me neon and I'll test it on my iphone
[19:52:40] <mru> that's the most monstrous piece of asm I've ever written
[19:52:44] <Dark_Shikari> mru: same here
[19:52:51] <mru> ~1000 lines of intertwined functions
[19:52:52] <Dark_Shikari> the x86 version was the most monstrous for me
[19:52:58] <Dark_Shikari> of anything
[19:53:04] <Dark_Shikari> pengvado: wait, explain why it can't be optimal?
[19:53:18] <Dark_Shikari> Oh, you mean the fact that
[19:53:21] <Dark_Shikari> suppose I have "0 100"
[19:53:26] <Dark_Shikari> I won't have a code, so I need a code for "0"
[19:53:32] <Dark_Shikari> But I'll also have a code for "0 0 0"
[19:53:35] <BBB> yeah see, I'm not looking forward to writing 1000 lines of asm code just for fun while I just wrote my first asm like yesterday
[19:53:54] <Dark_Shikari> BBB: this is a reasonable approach
[19:53:58] <Dark_Shikari> anyways, after this, do 8x8
[19:54:00] <Dark_Shikari> or sse
[19:54:01] <BBB> it'll most likely not work and I'll pull my hair out figuring out why the h#ll ;)
[19:54:09] <Dark_Shikari> imo let's do 4x4hv first (your wrapper)
[19:54:13] <pengvado> right, so after coding "0", you either switch to another table that doesn't support anything starting with "0", or you waste bits.
[19:54:15] <Dark_Shikari> then do 4x4 sse (so you can get the hang of that)
[19:54:19] <BBB> I'll finish 4x4 first
[19:54:30] <BBB> didn't you say 8x8 was just a wrapper around 4x 4x4?
[19:54:32] <Dark_Shikari> pengvado: ouch
[19:54:40] <Dark_Shikari> BBB: not optimally
[19:54:49] <mchinen> wow you guys are hardcore
[19:54:51] <BBB> suboptimally :-p
[19:54:57] <mchinen> does everyone here write demuxers in asm?
[19:55:00] <Dark_Shikari> BBB: I would say the optimal way to do it is
[19:55:06] <mru> mchinen: no, we don't do demuxers in asm
[19:55:06] <BBB> mchinen: no, that's a waste of time
[19:55:10] <Dark_Shikari> 1) mmx is width 4. w8 and w16 call it.
[19:55:14] <Dark_Shikari> 2) sse is width 8. w16 calls it.
[19:55:21] <Dark_Shikari> 3) ssse3 is width 8 and width 16. no wrappers.
[19:55:26] <mru> demux doesn't even show up on profile charts
[19:55:38] <Dark_Shikari> unless it's ogg?
[19:55:42] <mru> not even that
[19:55:57] <BBB> mchinen: did you talk to baptiste already?
[19:56:06] <mchinen> BBB: no, not yet
[19:56:15] <BBB> hmm...
[19:56:19] <BBB> did you ping him?
[19:56:34] <Dark_Shikari> BBB: do you get, now that you've written it, why width8 would probably not be worth writing in mmx?
[19:57:03] <mchinen> BBB: no, i just mailed him
[19:57:07] <BBB> yeah, it would just be a double-version of what I just had, because there's probably not a very much more optimal way to write it
[19:57:16] <Dark_Shikari> Yeah
[19:57:25] <BBB> although I guess I could write the final 8 bytes/row all at once
[19:57:28] <BBB> but that would save one call
[19:57:48] <BBB> but I'd lose a register holding the first 4bytes
[19:57:52] <BBB> so it would suck anyway
[19:57:53] <Dark_Shikari> yeah
[19:58:04] <BBB> hmk
[19:58:34] <BBB> I'll finish the 4x4 v modes, look briefly into making hv mix functions and then I'll go for sse1/2/whatever in 4x4 and 8x8
[19:58:45] <BBB> did I mention pshufw is awesome?
[19:58:51] <Dark_Shikari> Wait until you get to play with pshufb.
[19:59:06] <Dark_Shikari> It's almost unfun
[19:59:07] <Dark_Shikari> easy mode
[19:59:16] <BBB> pshufb is... sse2? or ssse3?
[19:59:25] <Dark_Shikari> ssse3
[19:59:35] <BBB> yeah not gonna happen, my crappy cpu doesn't love me
[19:59:46] <BBB> I'm getting my new laptop in a couple of months
[20:00:07] <Dark_Shikari> when what happens? it's not like there's new tech coming out in 3 months
[20:01:00] <BBB> present :-p
[20:02:04] <BBB> I've got a 1/3rd gift cert, my work is paying 1/3rd and the last 1/3rd I'll get from my parents as a graduation gift once my PhD is done, = shiny new laptop that I just pointed out
[20:02:50] <Dark_Shikari> why not use ssh in the meantime
[20:07:52] <lu_zero> mru: preparing a foolproof setup
[20:08:34] <CIA-98> ffmpeg: fenrir * r23644 /trunk/ (4 files in 2 dirs):
[20:08:34] <CIA-98> ffmpeg: MPEG-2 DXVA2 implementation
[20:08:34] <CIA-98> ffmpeg: It allows VLD MPEG-2 decoding using DXVA2 (GPU assisted decoding API under
[20:08:34] <CIA-98> ffmpeg: VISTA and Windows 7).
[20:08:34] <CIA-98> ffmpeg: It is implemented by using AVHWAccel API.
[20:08:35] <lu_zero> BBB: yet another section for the foundation site
[20:09:59] <lu_zero> "feed us with hw"
[20:10:23] * _av500_ feeds lu_zero with obsolete TI EVMs
[20:13:06] <BBB> lu_zero: mplayerhq has that :-p
[20:13:21] <j-b> \o/ DxVA2 mpeg2
[20:13:28] <BBB> Dark_Shikari: to be able to more loudly make the point that I need hw :-p
[20:17:37] <mru> lu_zero: for every foolproof setup there is a new and improved fool
[20:20:22] <Dark_Shikari> GAH
[20:20:28] <Dark_Shikari> I hate it when I write an asm function that's 10 instructions shorter
[20:20:30] <Dark_Shikari> and is somehow not any faster
[20:21:22] <hyc> lol... that's OOE chips for you
[20:21:50] <hyc> and/or, you've hit a memory bandwidth limit
[20:23:06] <Dark_Shikari> or pinsrd just sucks
[20:28:15] <mru> or you suck :-)
[20:28:35] <Dark_Shikari> I think it's that pinsrd just sucks.
[20:36:53] <lu_zero> mru: I know
[20:37:21] <lu_zero> pinsrd?
[20:38:31] <Dark_Shikari> insert doubleword
[20:38:40] <Dark_Shikari> aka load 4 bytes into one of the four positions in a register
[20:38:57] <lu_zero> vector you mean
[20:39:04] <mru> same thing
[20:39:08] <Dark_Shikari> vector register :)
[20:39:17] <lu_zero> that =)
[20:39:21] <lu_zero> uhmm
[20:39:30] <Dark_Shikari> e.g. this kind of code that keeps showing up
[20:39:31] <Dark_Shikari> movd xmm4, [r1+FDEC_STRIDE*0-4]
[20:39:31] <Dark_Shikari> pinsrd xmm4, [r1+FDEC_STRIDE*1-4], 1
[20:39:31] <Dark_Shikari> pinsrd xmm4, [r1+FDEC_STRIDE*2-4], 2
[20:39:32] <Dark_Shikari> pinsrd xmm4, [r1+FDEC_STRIDE*3-4], 3
[20:39:50] <Dark_Shikari> aka "load 4 rows of 4 bytes each from a strided array into this 16-byte register"
[20:39:55] <Dark_Shikari> aka "where is my scatter-gather load!!!"
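A minimal sketch of what the movd plus three pinsrd instructions above compute: four 4-byte loads from a strided buffer, each placed in one 32-bit lane of a 16-byte register. `FDEC_STRIDE = 32` follows the stride mentioned a little later in the discussion; the function name `gather_4x4` and the flat `bytes` buffer are assumptions for illustration.

```python
FDEC_STRIDE = 32  # stride of the array, as stated in the discussion


def gather_4x4(buf, base, stride=FDEC_STRIDE):
    """Model of movd + 3x pinsrd: load 4 bytes from each of 4 rows.

    Lane i of the 16-byte result holds buf[base + stride*i - 4 : ... + 4],
    matching the [r1+FDEC_STRIDE*i-4] addressing in the asm.
    """
    reg = bytearray(16)
    for lane in range(4):
        off = base + stride * lane - 4
        reg[4 * lane:4 * lane + 4] = buf[off:off + 4]
    return bytes(reg)


buf = bytes(range(256))
print(list(gather_4x4(buf, 36)))
```

Each lane needs its own load, which is the cost being complained about: four load operations to fill one register.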
[20:40:06] <lu_zero> ugh
[20:40:35] <mru> I already told you why scatter-load is hard
[20:40:44] <Dark_Shikari> I know -- you need more L1 load units
[20:40:46] <lu_zero> that could be done using load+permute
[20:40:51] <Dark_Shikari> lu_zero: stride of the array is 32
[20:40:56] <mru> you could also end up with multiple tlb misses
[20:41:02] <lu_zero> meh
[20:41:04] <Dark_Shikari> mru: that's not the problem
[20:41:11] <Dark_Shikari> TLB misses are fine--all you have to do is serialize it whenever one occurs.
[20:41:17] <lu_zero> spu load+permute
[20:41:22] <mru> Dark_Shikari: requires more hardware
[20:41:27] <Dark_Shikari> Any "hard problem" caused by having scatter/gather load can be solved by serializing if the hard problem occurs
[20:41:34] <Dark_Shikari> it doesn't require more hardware to not do something.
[20:41:54] <mru> it requires hardware to detect it and issue the sequence of ops
[20:41:55] <ohsix> does in a cpu
[20:42:15] <Dark_Shikari> well of course, but the point is I think we could probably get some special-cased improved L1 bandwidth in at least some cases.
[20:42:20] <Dark_Shikari> even if "special-cased" means
[20:42:26] <lu_zero> meh...
[20:42:30] <Dark_Shikari> "only if it is in L1, no TLB miss, doesn't cross cachelines"
[20:42:36] <Dark_Shikari> "and is aligned"
[20:42:56] <Dark_Shikari> the current one-load-per-cycle kinda sucks
[20:43:20] <Dark_Shikari> btw, what's with stuff like DSPs that do have scatter/gather load?
[20:43:27] <mru> they don't
[20:43:30] <mru> never heard of one
[20:43:38] <Dark_Shikari> then what has it?
[20:43:41] <mru> nothing
[20:43:44] <Dark_Shikari> or is it just a theoretical capability that doesn't exist?
[20:43:57] <mru> it's something everybody wants but nobody has
[20:43:57] <ohsix> dsps have loads with strides and offsets for packing stuff up, don't they?
[20:44:18] <mru> although a dsp typically has builtin L1 sram
[20:44:20] <mru> non-cache
[20:44:29] <Dark_Shikari> Wikipedia says many DMA engines have it
[20:44:33] <Dark_Shikari> e.g. for Cell SPUs
[20:44:34] <mru> so part of the problem does go away there
[20:44:42] <lu_zero> Dark_Shikari: uhm
[20:44:43] <Dark_Shikari> but that's a bit different.
[20:44:50] <mru> dma engines operate sequentially
[20:44:55] <lu_zero> cell spu has explicit manipulation
[20:45:03] <lu_zero> but isn't the same thing
[20:45:28] <Dark_Shikari> mru: oh wow
[20:45:29] <Dark_Shikari> http://www.patents.com/Microprocessor-high-speed-memory-integrated-loadstor…
[20:45:34] <Dark_Shikari> issued just a few weeks ago
[20:45:35] <Dark_Shikari> lol
[20:45:59] <Dark_Shikari> Broadcom
[20:46:08] <lu_zero> next mips for your pleasure
[20:46:22] <mru> not mips
[20:46:24] <Dark_Shikari> So it seems _someone_ does care.
[20:46:29] <mru> some other part of a bcm chip
[20:46:32] <Dark_Shikari> Cares enough to patent it.
[20:47:14] <lu_zero> Dark_Shikari: btw what did you want to achieve there?
[20:47:44] <lu_zero> you just need that part or the further ones would be needed as well?
[20:49:01] <lu_zero> I wonder if scatter-gather ops aren't considered much just because it's easier to add more registers or enlarge them
[20:49:54] <Dark_Shikari> adding more registers doesn't solve the problem of slow loads
[20:50:12] <Dark_Shikari> Hmm. This would make for an interesting CISC machine
[20:50:22] <Dark_Shikari> an instruction that does strided gather loading -- but internally, maps to a normal load unit.
[20:50:43] <Dark_Shikari> to decrease code size
[20:52:07] <lu_zero> uhmm
[20:52:12] <iive> Dark_Shikari: in that example above... are you sure the bottleneck is not that you are using same register?
[20:52:30] <Dark_Shikari> iive: I interleaved two of them
[20:52:45] <Dark_Shikari> I omitted the second for brevity
[20:52:50] <Dark_Shikari> And, here's some irony for you
[20:52:51] <lu_zero> you are still doing loads
[20:52:52] <Dark_Shikari> I just deinterleaved them
[20:52:54] <Dark_Shikari> and it got faster
[20:53:24] * lu_zero wonders why
[20:53:46] <Dark_Shikari> because of ordering
[20:53:56] <Dark_Shikari> I'm guessing core i7 tracks dependencies internally so there's no cost to deinterleaving
[20:53:57] <lu_zero> those internally _must_ be load+mask or load+perm
[20:54:05] <Dark_Shikari> so deinterleaving allowed us to reorder the loads
[20:54:08] <Dark_Shikari> and get one of the registers finished faster
[20:54:11] <Dark_Shikari> and start arith ops faster
[20:54:15] <Dark_Shikari> because I was loading two registers, xmm1 and xmm4
[20:54:21] <Dark_Shikari> one of which was being used immediately (xmm4)
[20:54:26] <Dark_Shikari> the other of which wasn't used until about 10 instructions later
[20:54:34] <Dark_Shikari> so by letting it postpone the latter loads, it could start doing work faster.
[20:55:00] <lu_zero> basically the i7 is doing even more work behind our backs
[20:55:22] <lu_zero> that's what I hate about x86
[20:55:57] <lu_zero> ops should be a _bit_ more predictable, even dumber
[20:56:32] <Dark_Shikari> well imo if they can make the cpu smart without increasing cycle time
[20:56:34] <Dark_Shikari> they should feel free.
[20:57:07] <lu_zero> Dark_Shikari: and that makes your code slower since some assumptions get broken
[20:57:19] <Dark_Shikari> not really.
[20:57:45] <lu_zero> that's probably fine with exotic stuff like this load+mask/perm hybrid
[20:58:04] <lu_zero> but for plain loads would be quite depressing
[21:16:12] <BBB> mchinen: if he doesn't respond by tonight, ping the email, I'll see if I can get to him this weekend
[21:17:12] * BBB is confused because his vertical sixtap filter takes less cycles than his fourtap filter
[21:18:03] <Dark_Shikari> BBB: that's because it requires less munging
[21:18:07] <Dark_Shikari> it's normal for vert to be faster
[21:18:15] <BBB> ehm
[21:18:16] <BBB> no
[21:18:20] <BBB> both are vertical
[21:18:23] <Dark_Shikari> oh
[21:18:25] <Dark_Shikari> sixtap vs WHAT?
[21:18:29] <BBB> the vertical sixtap is faster than the vertical fourtap
[21:18:32] <Dark_Shikari> er... how are you timing it?
[21:18:39] <BBB> START/STOP_TIMER
[21:18:44] <Dark_Shikari> What if the 4-tap is used on chroma only
[21:18:47] <Dark_Shikari> which is more likely to have cache misses?
[21:18:53] <Dark_Shikari> You have to time them doing the same thing
[21:18:58] <BBB> hm...
[21:18:58] <Dark_Shikari> i.e. make the 4-tap call the 6-tap instead
[21:18:59] <BBB> good point
[21:19:02] <BBB> ok
[21:19:06] <Dark_Shikari> Of course, that's not really an issue
[21:19:07] <BBB> will do that after I make it bitexact
[21:19:08] <Dark_Shikari> you don't have to do that
[21:19:10] <BBB> it doesn't work yet ;)
[21:19:12] <Dark_Shikari> you _know_ the 4-tap is faster than 6-tap
[21:19:25] <Dark_Shikari> so you don't need to compare two different functions
[21:19:26] <BBB> well yeah it has less instructions and less mem accesses
[21:19:30] <Dark_Shikari> you compare different versions of the same function
[21:25:30] <Dark_Shikari> http://www.linuxfordevices.com/c/a/News/Avalue-EPIQM57/?kc=rss hmm this is rather cool
[21:25:39] <Dark_Shikari> 18-watt TDP for a whole core i7 system
[21:26:03] <kierank> not bad at all
[21:27:59] <iive> 18W is just the cpu
[21:28:12] <Dark_Shikari> oh, true. it seems they aren't counting the board
[21:28:14] <Dark_Shikari> but it's a small board.
[21:28:24] <Dark_Shikari> still, 18 watt tdp is really low
[21:29:40] <iive> they also don't say if it is idle or under load... typical is way too broad term.
[21:30:12] <mru> which i7 is that?
[21:30:22] <Dark_Shikari> mru: one of the low power ones
[21:30:24] <iive> some mobiles.
[21:30:25] <Dark_Shikari> dual core i7
[21:30:38] <mru> the 9xx are >100W TDP...
[21:30:44] <Dark_Shikari> runs at 1ghz or so with turbo boost to 2ghz
[21:30:50] <iive> 620UE
[21:30:51] <Dark_Shikari> i.e. 2 cores at 1ghz or 1 core at 2ghz
[21:31:04] <Dark_Shikari> or something like that
[21:31:24] <BBB> vertical sixtap, almost 5x faster
[21:31:25] <BBB> \o/
[21:31:36] <Dark_Shikari> :)
[21:31:38] <Dark_Shikari> and it works?
[21:31:41] <BBB> yeah
[21:31:52] <Dark_Shikari> pastebin?
[21:32:22] <BBB> http://ffmpeg.pastebin.com/LYhUKGUi
[21:32:31] <BBB> the start is a little ugly
[21:33:15] <BBB> but saving 5 pixels + 1 cache + 1 for the coeffs leaves little arith space
[21:33:55] <BBB> hm, the comment for the last tap is wrong
[21:34:01] <Dark_Shikari> I thought you would get rid of the splats
[21:34:23] <BBB> oh yeah I didn't do that yet :-p
[21:34:45] <Dark_Shikari> and the redundant pxor -- that should be a globally kept zero
[21:34:54] <Dark_Shikari> imul r4,3 --> no, use an lea
[21:35:02] <Dark_Shikari> lea r4, [r4*3]
[21:35:16] <Dark_Shikari> I don't see the point of line 5.
[21:35:44] <Dark_Shikari> that can all go into the addressing.
[21:37:06] <iive> r4*2+r4 ?
[21:37:15] <Dark_Shikari> r4*3 is fine in yasm syntax
[21:37:22] <iive> or the macro takes care of that?
[21:37:32] <Dark_Shikari> yasm takes care of it
[21:37:39] <iive> oh, yasm itself.
[21:39:47] <BBB> Dark_Shikari: it's r4*6
[21:39:52] <BBB> Dark_Shikari: yasm didn't eat it when I tried
[21:40:03] <BBB> I think I tried r4*12 though
[21:40:12] <BBB> I thought you could only do 1, 2, 4 or 8
[21:40:51] <iive> BBB: yasm turns r4*3 into op containing r4*2+r4, that's what i asked above.
[21:41:19] <BBB> libavcodec/x86/vp8dsp.asm:206: error: invalid effective address
[21:41:21] <BBB> for r4*6
[21:41:30] <iive> of course...
[21:41:53] <iive> you can do 5, with (r4*4+r4)
[21:42:08] <Dark_Shikari> BBB: I said r4*3
[21:42:10] <Dark_Shikari> not r4*6
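The scale discussion above can be checked with a small model. This is only a sketch, assuming the usual x86 addressing form of base + index*scale with scale limited to 1, 2, 4 or 8; the helper name `lea_decomposition` is made up for illustration and is not part of yasm.

```python
# Hypothetical helper: which "reg*N" multipliers fit in one effective address?
# Assumption: an address is base + index*scale, scale in {1, 2, 4, 8}, and the
# same register may serve as both base and index, giving reg*(scale + 1).
VALID_SCALES = (1, 2, 4, 8)


def lea_decomposition(multiplier):
    """Return (scale, uses_base) for reg*multiplier, or None if not encodable."""
    if multiplier in VALID_SCALES:
        return (multiplier, False)       # [reg*scale]
    if multiplier - 1 in VALID_SCALES:
        return (multiplier - 1, True)    # [reg + reg*scale]
    return None
```

Under this model r4*3 becomes r4 + r4*2 and r4*5 becomes r4 + r4*4, while r4*6 and r4*12 have no single-address form, which matches the "invalid effective address" error quoted above.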
[21:42:11] <BBB> Dark_Shikari: I'm out of registers for the global zero, or do you mean a regular r%d register?
[21:42:20] <Dark_Shikari> BBB: no, you have one extra, because you saved mm7
[21:42:28] <BBB> mm7 is the coeffs
[21:42:32] <Dark_Shikari> Which you're saving.
[21:42:36] <Dark_Shikari> Because you're turning it into memory.
[21:42:37] <BBB> I'll try :-p
[21:42:42] <Dark_Shikari> to eliminate pshufw
[21:42:44] <BBB> yes sir!
[21:43:05] <Dark_Shikari> remember it can do one load per cycle
[21:43:16] <Dark_Shikari> so it won't cost anything.
[21:43:21] <BBB> movd = one load?
[21:43:22] <Dark_Shikari> well, or at least less than the pshufw.
[21:43:25] <BBB> or a byte is one load?
[21:43:30] <Dark_Shikari> movd is one load
[21:43:31] <Dark_Shikari> mov is one load
[21:43:33] <Dark_Shikari> movq is one loa
[21:43:34] <Dark_Shikari> *load
[21:43:37] <Dark_Shikari> movdqa is one load
[21:43:41] <Dark_Shikari> blah X, [mem] is one load
[21:43:58] <BBB> ok, when I tested it was slower, but I'll test again
[21:44:48] * BBB goes home for now
[21:44:50] <BBB> this is fun
[21:44:52] <Dark_Shikari> pastebin it when you test it
[21:44:56] <Dark_Shikari> so I know you're doing it right
[21:45:05] <BBB> in the weekend, I will
[21:45:11] <BBB> and then we'll do sse/sse2
[21:45:22] <BBB> I don't need to care about sse right? just sse2?
[21:45:25] <BBB> or are there sse-only cpus?
[21:45:38] <Compn> a ton of athlons are sse (no sse2)
[21:45:53] <BBB> amd isn't paying me, so screw them
[21:45:54] <Dark_Shikari> sse1 is float only
[21:45:57] <Dark_Shikari> you don't care about sse1
[21:45:59] <BBB> ok
[21:46:01] <Dark_Shikari> pentium 3 is sse-only
[21:46:04] <BBB> sse2 it is then
[21:46:06] <Compn> like athlon 600mhz - 1.5 ghz or so, i think
[21:46:12] <Compn> oh yeah pentiums
[21:46:34] <iive> of course pentiums, they practically invented it :P
[21:47:28] <iive> and I don't think there is an athlon 600MHz that has sse1, they have mmx-ext (mmx2) but athlon XP was the first to have sse, and that was way above 1ghz
[21:50:25] <Compn> ah
[23:18:11] <CIA-98> ffmpeg: michael * r23645 /trunk/libavformat/raw.c:
[23:18:11] <CIA-98> ffmpeg: Improve h263_probe()
[23:18:11] <CIA-98> ffmpeg: Fixes issue2015
[00:28:49] <BBB> and we're back!
[00:28:54] <BBB> Dark_Shikari: ping :-p
[00:32:51] <Dark_Shikari> BBB: ok
[00:33:12] <Dark_Shikari> we've demonstrated how macros can allow us to template a function
[00:33:19] <Dark_Shikari> now we will demonstrate how macros allow us to simplify a function
[00:33:26] <BBB> ok
[00:33:42] <Dark_Shikari> line 144, dct-a.asm
[00:33:58] <BBB> add4x4_idct
[00:34:10] <Dark_Shikari> Isn't that function simple?
[00:34:15] <Dark_Shikari> make a zero
[00:34:17] <Dark_Shikari> load our dct coeffs
[00:34:20] <Dark_Shikari> IDCT_1D
[00:34:21] <Dark_Shikari> transpose
[00:34:25] <Dark_Shikari> add rounding factor
[00:34:26] <Dark_Shikari> IDCT_1D
[00:34:28] <Dark_Shikari> STORE_DIFF
[00:34:58] <BBB> there's an unused label skip_prologue
[00:35:10] <BBB> I'm sure these macros do a lot of weird stuff :)
[00:35:19] <Dark_Shikari> skip_prologue is used elsewhere
[00:35:32] <Dark_Shikari> it lets you call that function without the init part
[00:35:35] <Dark_Shikari> this is used in all the idcts
[00:35:41] <Dark_Shikari> so suppose you have an 8x8 idct that does 4 4x4 idcts
[00:35:42] <BBB> oh ok
[00:35:49] <Dark_Shikari> you call "add4x4_idct_mmx.skip_prologue"
[00:36:05] <Dark_Shikari> thus you skip the initialization
[00:36:10] <Dark_Shikari> whether it be push push push, xor, or whatever
[00:36:18] <Dark_Shikari> and so you call that 4 times.
[00:36:19] <BBB> hmm...
[00:36:20] <BBB> interesting
[00:36:31] <Dark_Shikari> so as you can see here
[00:36:35] <Dark_Shikari> we've wrapped up the complexity in these macros
[00:36:38] <Dark_Shikari> some of them internally do SWAPs
[00:36:40] <Dark_Shikari> we don't care
[00:36:41] <Dark_Shikari> it handles it for us
[00:36:49] <Dark_Shikari> if we had to track the results of the swaps mentally, it would be hell
[00:36:59] <Dark_Shikari> and that's what it is for everyone else writing asm and not using x264asm.
[00:37:37] <Dark_Shikari> now, for the hardest and last bit of what I'll show you.
[00:37:45] <Dark_Shikari> line 263, sad-a.asm
[00:38:03] <BBB> call
[00:39:41] <BBB> is this what breaks up a NxN into 4 N/2xN/2 IDCTs?
[00:39:55] <Dark_Shikari> no
[00:40:04] <Dark_Shikari> er, are you in sad-a.asm?
[00:40:13] <BBB> oops, no
[00:40:14] <BBB> sorry
[00:40:17] <Dark_Shikari> But yes you're right
[00:40:18] <Dark_Shikari> That's what that does.
[00:40:24] <Dark_Shikari> in dct-a.asm :)
[00:40:30] <BBB> yeah, wrong file
[00:40:39] <BBB> intra_sad_x3_4x4
[00:40:45] <Dark_Shikari> so, you know about the 4x4 DC prediction function.
[00:40:47] <BBB> 3 function args, 3 registers
[00:40:47] <Dark_Shikari> We just did that earlier.
[00:40:50] <Dark_Shikari> Right?
[00:40:51] <BBB> yes
[00:41:02] <Dark_Shikari> well there are two other "simple" modes, H and V
[00:41:08] <Dark_Shikari> N A B C D
[00:41:10] <Dark_Shikari> E E E E E
[00:41:11] <Dark_Shikari> F F F F F
[00:41:13] <Dark_Shikari> G G G G G
[00:41:14] <Dark_Shikari> H H H H H
[00:41:17] <Dark_Shikari> that's V prediction
[00:41:21] <Dark_Shikari> set the Xs equal to the left side.
[00:41:27] <BBB> ok
[00:41:28] <Dark_Shikari> er, oops, that's H prediction, obviously
[00:41:30] <Dark_Shikari> horizontal
[00:41:33] <Dark_Shikari> V prediction is:
[00:41:35] <Dark_Shikari> N A B C D
[00:41:39] <Dark_Shikari> E A B C D
[00:41:41] <Dark_Shikari> F A B C D
[00:41:43] <Dark_Shikari> G A B C D
[00:41:45] <Dark_Shikari> H A B C D
[00:41:53] <Dark_Shikari> As you can see, both are very simple.
[00:42:02] <BBB> yeah, this is libavcodec's h264pred.c
[00:42:06] <Dark_Shikari> x264 has this function to perform a "merged SAD" on the three modes.
[00:42:13] <Dark_Shikari> That is, predict each one, SAD against source pixels
[00:42:16] <Dark_Shikari> and return the three SADs.
[00:42:24] <Dark_Shikari> Of course, it doesn't need to actually store the prediction, which is part of the gain here.
[00:42:39] <Dark_Shikari> So this function will calculate three SADs.
[00:42:53] <Dark_Shikari> The purpose of going through this function will be to get you to understand shuffles.
[00:43:06] <Dark_Shikari> a "shuffle" is any operation which does no arithmetic and only serves to reorder bytes.
[00:43:22] <Dark_Shikari> This may include arbitrary shuffles, interleaves, etc.
[00:43:39] <BBB> ok
[00:43:47] <Dark_Shikari> the first one to consider is punpck
[00:43:55] <Dark_Shikari> punpck(l|h) (bw|wd|dq|qdq)
[00:44:04] <BBB> (potential example: audio channel interleaving)
[00:44:06] <Dark_Shikari> this takes the (low|high) half of each of the input registers and interleaves by them
[00:44:12] <Dark_Shikari> s/by//
[00:44:16] <Dark_Shikari> it interleaves by:
[00:44:22] <Dark_Shikari> bw: bytes
[00:44:24] <Dark_Shikari> wd: words
[00:44:27] <Dark_Shikari> dq: doublewords
[00:44:30] <Dark_Shikari> qdq: quadwords
[00:44:35] <Dark_Shikari> bw == "bytes to words"
[00:44:36] <Dark_Shikari> i.e.
[00:44:50] <Dark_Shikari> punpcklbw ABCDEFGH, IJKLMNOP = AIBJCKDL
[00:44:53] <Dark_Shikari> got it?
[00:45:42] <BBB> I think so
[00:46:01] <Dark_Shikari> on those two inputs
[00:46:04] <Dark_Shikari> what does punpckhwd do?
[00:46:47] <BBB> EFMNGHOP?
[00:47:00] <Dark_Shikari> correct
[00:47:04] <Dark_Shikari> note qdq only applies for xmmregs
[00:47:10] <Dark_Shikari> it wouldn't make sense with 64-bit regs.
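The punpck family described above can be modelled in a few lines. This is a reference sketch, not real SIMD: registers are plain Python sequences written lowest byte first, as in the examples in the chat, and the generic `punpck` helper is an illustrative name.

```python
def punpck(dst, src, elem_size, high):
    """Interleave elem_size-byte elements from the low or high half of dst and src."""
    assert len(dst) == len(src) and len(dst) % (2 * elem_size) == 0
    half = len(dst) // 2
    start = half if high else 0
    out = []
    for off in range(start, start + half, elem_size):
        out.extend(dst[off:off + elem_size])   # element from the first operand
        out.extend(src[off:off + elem_size])   # then the matching one from the second
    return out


def punpcklbw(dst, src):
    """Low halves, interleaved byte by byte."""
    return punpck(dst, src, 1, high=False)


def punpckhwd(dst, src):
    """High halves, interleaved word by word (2 bytes at a time)."""
    return punpck(dst, src, 2, high=True)
```

With the inputs ABCDEFGH and IJKLMNOP, `punpcklbw` gives AIBJCKDL and `punpckhwd` gives EFMNGHOP, the two answers worked out above.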
[00:47:19] <BBB> ok
[00:47:27] <BBB> so it's 2 mm or 2 xmm regs
[00:47:30] <BBB> it's never 1 mm and 1 xmm
[00:48:18] <Dark_Shikari> there are only two ops that work on mm and xmm
[00:48:23] <Dark_Shikari> movq2dq
[00:48:26] <Dark_Shikari> and movdq2q
[00:48:28] <Dark_Shikari> you can guess what those do ;)
[00:49:00] <BBB> :)
[00:49:47] <Dark_Shikari> so, the other shuffles:
[00:49:51] <Dark_Shikari> mmx shuffles:
[00:49:56] <Dark_Shikari> pshufw dst, src, mask
[00:50:01] <Dark_Shikari> the "mask" determines which words to put where
[00:50:16] <Dark_Shikari> each 2 bit chunk is the index from the source to use for that word in the destination
[00:50:29] <Dark_Shikari> e.g. "2" means to use src[2]
[00:50:38] <Dark_Shikari> sse2 shuffles:
[00:50:46] <Dark_Shikari> pshufd dst, src, mask (same as mmx, but for 32-bit)
[00:51:00] <Dark_Shikari> pshuflw dst, src, mask (only shuffles low half, copies top half)
[00:51:02] <Dark_Shikari> pshufhw (you can guess)
[00:51:09] <Dark_Shikari> ssse3 shuffles:
[00:51:17] <Dark_Shikari> pshufb src, mask
[00:51:23] <Dark_Shikari> where mask is a 128-bit reg, and each byte contains an index
[00:51:29] <Dark_Shikari> i.e. completely arbitrary shuffle of the whole reg.
[00:51:35] <Dark_Shikari> You can see why this is awesome.
[00:51:45] <BBB> any byte can go anywhere in the dest
[00:52:04] <Dark_Shikari> Yup
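As a rough illustration of the mask encodings just described, here is a scalar model of pshufw and the 128-bit pshufb. It is a sketch only: registers are Python lists, lowest element first. The zeroing behaviour for mask bytes with the top bit set is not mentioned in the chat; it is included because the instruction is documented to work that way.

```python
def pshufw(src, mask):
    """src: 4 words (lowest first). mask: 8-bit immediate, 2 bits per destination word."""
    return [src[(mask >> (2 * i)) & 3] for i in range(4)]


def pshufb(src, mask):
    """src: 16 bytes. mask: 16 index bytes; a set top bit zeroes that destination byte."""
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]
```

A mask of 0 in `pshufw` copies word 0 everywhere (a splat), 0xE4 leaves the register unchanged, and 0x1B reverses the four words.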
[00:52:27] <Dark_Shikari> now that you get the shuffles, let's go do this function.
[00:52:34] <Dark_Shikari> in this function, we have a few goals.
[00:52:40] <Dark_Shikari> 1) Get the source pixels into two mmx registers
[00:52:50] <Dark_Shikari> we have 16 source pixels, so we want to get them into two mmx registers (8 bytes each)
[00:52:57] <Dark_Shikari> with this, it takes only two SADs to calculate the total SAD.
[00:53:15] <Dark_Shikari> 2) calculate the V prediction, put it into two mmx registers accordingly, and SAD.
[00:53:21] <Dark_Shikari> 3) calculate the H prediction, put it into two mmx registers, and SAD
[00:53:30] <Dark_Shikari> 4) calculate the DC prediction, splat it across an mmx register, SAD.
[00:53:32] <Dark_Shikari> 5) store the results.
[00:53:33] <Dark_Shikari> got it?
[00:53:41] <Dark_Shikari> this means we will need a total of 6 SADs.
[00:54:03] <BBB> I got it
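Before reading the asm, it may help to have a plain reference for what the five goals compute. The sketch below is a scalar model, not x264's code: the function signature is invented for illustration, and the result order V, H, DC is an assumption taken from BBB's reading of `res` a few lines further down.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def intra_sad_x3_4x4(src, top, left):
    """src: 16 source pixels in raster order; top: [A, B, C, D]; left: [E, F, G, H].

    Returns [sad_v, sad_h, sad_dc] without storing any prediction in a frame.
    """
    pred_v = list(top) * 4                           # every row repeats A B C D
    pred_h = [p for p in left for _ in range(4)]     # EEEE FFFF GGGG HHHH
    dc = (sum(top) + sum(left) + 4) >> 3             # rounded mean of the 8 edge pixels
    pred_dc = [dc] * 16
    return [sad(src, pred_v), sad(src, pred_h), sad(src, pred_dc)]
```

In the MMX version each 16-pixel SAD is split across two 8-byte registers, which is where the total of 6 SADs comes from.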
[00:54:49] <Dark_Shikari> k, so, let's go through the function.
[00:54:53] <Dark_Shikari> First we zero mm7. We'll use this later.
[00:55:12] <BBB> in the function prototype, what is fenc and what is fdec?
[00:55:21] <Dark_Shikari> FENC = source pixels
[00:55:22] <BBB> res is storage of results of V,H,DC prediction SAD
[00:55:27] <Dark_Shikari> FDEC = reconstructed pixels
[00:55:36] <Dark_Shikari> thus, fdec contains the edge pixels for prediction
[00:55:42] <Dark_Shikari> fenc contains the source pixels we're going to compare against
[00:55:43] <Dark_Shikari> the SAD
[00:55:53] <BBB> ok
[00:55:57] <Dark_Shikari> so, by line 270, the following is the case:
[00:56:15] <Dark_Shikari> if our source pixels are numbered 0 to 15 in raster order
[00:56:21] <Dark_Shikari> mm1 contains 0...7
[00:56:23] <Dark_Shikari> mm2 contains 8...15
[00:56:32] <Dark_Shikari> mm0 contains ABCDABCD (from the chart before)
[00:56:38] <Dark_Shikari> Do you see why?
[00:56:54] <Dark_Shikari> btw, if at any point you don't know why a particular decision was made (even if it works), ask.
[00:58:17] <BBB> movd is a dword move, right?
[00:58:22] <BBB> so why does mm1 contain 8 bytes?
[00:58:38] <BBB> oh, the punpckldq
[00:58:39] <BBB> I see
[00:58:40] <Dark_Shikari> punpckldq mm1, [r0+FENC_STRIDE*1]
[00:58:45] <BBB> why don't you move 8 bytes at once?
[00:58:53] <Dark_Shikari> You can't.
[00:58:55] <BBB> movqu or so?
[00:58:56] <Dark_Shikari> the source is an array of stride 32
[00:59:03] <Dark_Shikari> er, actually, stride FENC_STRIDE
[00:59:06] <Dark_Shikari> and of width 4
[00:59:13] <Dark_Shikari> we're loading a 4x4 block of pixels, that is
[00:59:18] <Dark_Shikari> you can't move 8 at once if they're not adjacent.
[00:59:21] <BBB> ah, of course, stride!=width
[00:59:22] <BBB> ok
[00:59:31] <BBB> got it then
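The point about stride can be sketched as follows. This is a model with an arbitrary `stride` parameter standing in for FENC_STRIDE (whose actual value is not assumed here): each row of the 4x4 block is a separate 4-byte load, and two rows are packed into one 8-byte register.

```python
def load_4x4_packed(buf, offset, stride):
    """Return two 8-byte lists holding rows 0-1 and rows 2-3 of a 4x4 block."""
    def row(r):
        start = offset + r * stride
        return list(buf[start:start + 4])    # one movd-sized load: 4 adjacent pixels

    lo = row(0) + row(1)    # movd row 0, then punpckldq with row 1
    hi = row(2) + row(3)    # same for rows 2 and 3
    return lo, hi
```

Because consecutive rows are `stride` bytes apart rather than 4, a single 8-byte load would pick up 4 wanted pixels and 4 unrelated ones.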
[01:01:01] <Dark_Shikari> so, next
[01:01:18] <Dark_Shikari> we back up mm0, the vertical prediction pixels, in mm6
[01:01:22] <Dark_Shikari> Because we're going to need these later.
[01:01:37] <Dark_Shikari> Then, because we don't want to overwrite the source pixels (we need those later too), we movq mm3, mm1
[01:01:43] <Dark_Shikari> then we do our two SADs for the vertical prediction
[01:01:45] <Dark_Shikari> add the results
[01:01:49] <Dark_Shikari> and move it out to [r2]
[01:01:55] <Dark_Shikari> And we're 1/3 done!
[01:01:57] <Dark_Shikari> got it?
[01:02:01] <BBB> yes
[01:03:17] <Dark_Shikari> ok, now the next two parts are interleaved
[01:03:21] <Dark_Shikari> so it may be slightly harder to follow
[01:03:31] <Dark_Shikari> now, we need EFGH
[01:03:33] <Dark_Shikari> But we have a problem.
[01:03:44] <Dark_Shikari> Each one is on a separate line.
[01:03:48] <Dark_Shikari> We can only load one byte at a time! This sucks.
[01:04:10] <Dark_Shikari> Furthermore, in addition to EFGH, we need EEEEFFFFGGGGHHHH
[01:04:14] <Dark_Shikari> this is the H prediction we want to SAD against.
[01:04:23] <BBB> right
[01:04:30] <Dark_Shikari> So, now comes the swarm of punpck.
[01:05:01] <Dark_Shikari> Note... there is no SIMD load smaller than movd.
[01:05:07] <Dark_Shikari> so, in order to avoid crossing cacheline needlessly (this doesn't increase the number of unpacks necessary to get what we want), we load [src-4]
[01:05:10] <Dark_Shikari> not [src-1]
[01:05:26] <Dark_Shikari> so mm3 = _ _ _ E
[01:05:30] <Dark_Shikari> mm0 = _ _ _ F
[01:05:32] <BBB> so this loads BCDE, NNNF, NNNG etc
[01:05:44] <Dark_Shikari> no, NNNE
[01:05:52] <Dark_Shikari> after punpcklbw, we have:
[01:05:55] <Dark_Shikari> _ _ _ _ _ _ E F
[01:06:01] <Dark_Shikari> and _ _ _ _ _ _ G H
[01:06:21] <BBB> yes
[01:06:44] <Dark_Shikari> then we do it again with
[01:06:48] <Dark_Shikari> punpckhwd mm5, mm4
[01:06:54] <Dark_Shikari> giving us _ _ _ _ E F G H
[01:07:08] <Dark_Shikari> And -- oh wait -- that mm6 we saved comes back
[01:07:14] <Dark_Shikari> E F G H A B C D
[01:07:20] <Dark_Shikari> and that mm7 we zeroed comes back
[01:07:31] <Dark_Shikari> psadbw with mm7.... bam. A+B+C+D+E+F+G+H.
[01:08:04] <Dark_Shikari> got it so far?
[01:08:09] <BBB> yes
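The gather of the left column can be traced with a small model. This is a sketch under assumptions: registers are 8-element lists (lowest byte first), the helper names are invented, and the final merge with the saved ABCD register is modelled as a punpckhdq, which is a guess at the step the chat only describes as mm6 "coming back".

```python
def interleave(dst, src, elem, high):
    """Generic punpck: interleave elem-byte units from the low or high halves."""
    half = len(dst) // 2
    start = half if high else 0
    out = []
    for off in range(start, start + half, elem):
        out += dst[off:off + elem] + src[off:off + elem]
    return out


def psadbw(a, b):
    """Sum of absolute byte differences; against zero it is a horizontal sum."""
    return sum(abs(x - y) for x, y in zip(a, b))


def movd(buf, pos):
    """4-byte load into an 8-byte register; the upper half is zero."""
    return list(buf[pos:pos + 4]) + [0, 0, 0, 0]


def edge_sum(buf, pos, stride):
    """pos indexes the top-left pixel of the block. Returns (EFGHABCD, sum)."""
    top = list(buf[pos - stride:pos - stride + 4])
    abcd = top + top                                    # A B C D A B C D
    rows = [movd(buf, pos + r * stride - 4) for r in range(4)]   # _ _ _ E, _ _ _ F, ...
    ef = interleave(rows[0], rows[1], 1, high=False)    # punpcklbw: _ _ _ _ _ _ E F
    gh = interleave(rows[2], rows[3], 1, high=False)    # punpcklbw: _ _ _ _ _ _ G H
    efgh = interleave(ef, gh, 2, high=True)             # punpckhwd: _ _ _ _ E F G H
    reg = interleave(efgh, abcd, 4, high=True)          # assumed punpckhdq: E F G H A B C D
    return reg, psadbw(reg, [0] * 8)
```

Loading from `pos - 4` instead of `pos - 1` puts the wanted pixel in the top byte of the 4-byte load, and the unpacks keep promoting the top elements, so the junk bytes never need masking.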
[01:08:24] <Dark_Shikari> by line 290, we have EEEEFFFFGGGGHHHH
[01:08:29] <Dark_Shikari> for the H prediction.
[01:08:33] <Dark_Shikari> see how that works?
[01:08:38] <Dark_Shikari> follow the punpcks.
[01:08:52] <BBB> yes
[01:09:10] <BBB> because you always do hxtoy, with x one bigger than in the previous call
[01:09:10] <Dark_Shikari> now, we need to do + 4 and >> 3
[01:09:14] <Dark_Shikari> Yeah
[01:09:22] <Dark_Shikari> But now, we want to do +4
[01:09:27] <Dark_Shikari> but crap, you can't have immediate constants in simd.
[01:09:34] <Dark_Shikari> And we'd rather not do a load from a memory constant.
[01:09:40] <Dark_Shikari> But we have a zero reg lying around.
[01:09:51] <Dark_Shikari> So there's another trick
[01:10:00] <Dark_Shikari> (A+4)>>3 is the same as ((A>>2)+1)>>1
[01:10:09] <Dark_Shikari> so we do psraw, then pavgw
[01:10:42] <BBB> ah, smart
[01:10:48] <Dark_Shikari> same number of instructions
[01:10:52] <Dark_Shikari> as add+shift
[01:10:55] <Dark_Shikari> but no constant needed.
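The rounding identity is easy to verify exhaustively. The sketch below models pavgw as (a + b + 1) >> 1 on unsigned words and checks the identity over every 16-bit value; Python integers do not wrap, so this checks the arithmetic rather than register overflow behaviour.

```python
def pavgw(a, b):
    """Rounded average of two unsigned 16-bit words, as pavgw computes it."""
    return (a + b + 1) >> 1


def round_shift3_no_constant(x):
    """(x + 4) >> 3 using a shift by 2 followed by an average with zero."""
    return pavgw(x >> 2, 0)


def equivalent_for_all_words():
    """True if the trick matches (x + 4) >> 3 for every x in 0..65535."""
    return all(round_shift3_no_constant(x) == (x + 4) >> 3 for x in range(1 << 16))
```

The identity holds because the two bits dropped by the first shift can add at most 3 to x, which is never enough to change the result of adding 4 and dividing by 8.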
[01:11:03] <Dark_Shikari> Now, with a quick punpck and pshufw with 0 (a splat)
[01:11:08] <Dark_Shikari> we have DC DC DC DC DC DC DC DC
[01:11:26] <Dark_Shikari> With 4 quick SADs, we have both our DC and H scores
[01:11:35] <Dark_Shikari> we add those up and store them
[01:11:42] <Dark_Shikari> and then we return.
[01:12:08] <Dark_Shikari> And we're done with the veritable storm of punpcks.
[01:12:48] <BBB> scary stuff to read through
[01:12:56] <BBB> but when you explain it makes some sense :)
[01:13:15] <Dark_Shikari> The first functions you write will be something like pixel_avg2
[01:13:15] * BBB foresees a lot of reading asm
[01:13:37] <Dark_Shikari> By the way, for an example of a function very similar to what you will be writing
[01:13:54] <Dark_Shikari> The hpel interpolation in mc-a2.asm is much like what you will be doing
[01:14:07] <Dark_Shikari> e.g. x[-2]*coeff1 + x[-1]*coeff2 + ... x[3]*coeff6
[01:14:11] <Dark_Shikari> + round >> shift
[01:14:23] <BBB> right
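A scalar version of that formula might look like the following. It is a sketch with assumptions spelled out: the taps are the H.264 half-pel coefficients mentioned later in this conversation, the shift of 5 with a rounding term of 16 is the H.264 choice for those taps, and the clip to 0..255 is the usual step for 8-bit output rather than something stated above.

```python
H264_TAPS = (1, -5, 20, 20, -5, 1)   # assumption: H.264 half-pel taps, sum = 32


def sixtap(pixels, i, taps=H264_TAPS, shift=5):
    """Filter around position i; taps apply to pixels[i - 2] .. pixels[i + 3]."""
    acc = sum(c * pixels[i - 2 + k] for k, c in enumerate(taps))
    rounded = (acc + (1 << (shift - 1))) >> shift    # + round >> shift
    return max(0, min(255, rounded))                 # clip to the 8-bit range
```

A flat area passes through unchanged, and a sharp 0-to-255 edge yields 128 at the half-pel position between the two sides.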
[01:14:46] <Dark_Shikari> what CPU do you have?
[01:15:01] <BBB> intel core duo
[01:15:13] <BBB> I don't know exactly what extensions it has, should be quite a lot
[01:15:45] <Dark_Shikari> er, core duo?
[01:15:46] <Dark_Shikari> or core 2?
[01:15:58] <BBB> core duo
[01:16:24] <Dark_Shikari> that's bad.
[01:16:26] <Dark_Shikari> get something better, fast.
[01:16:43] <Dark_Shikari> Core Duo == nothing above SSE3, and even SSE2 is far slower than MMX
[01:16:52] <Dark_Shikari> it microcodes every single SSE2 op by converting it to MMX ops
[01:17:46] <BBB> haha :)
[01:17:51] <BBB> I'm getting a new one in 1-2 months
[01:17:57] <BBB> latest mac whatever
[01:18:01] <BBB> but takes another 1-2 months
[01:18:19] <Dark_Shikari> find a better one on ssh
[01:18:28] <BBB> I can start with mmx, no?
[01:18:39] <Dark_Shikari> true
[01:18:40] <Dark_Shikari> but still
[01:18:51] <Dark_Shikari> btw, FYI, the greatest instruction ever for motion compensation
[01:18:58] <BBB> \o/
[01:19:07] <Dark_Shikari> pmaddubsw
[01:19:08] <BBB> that's exactly what I'll start with then
[01:19:19] <Dark_Shikari> For inputs ABCD ... , EFGH ...
[01:19:22] <Dark_Shikari> output:
[01:19:40] <Dark_Shikari> (A*E + B*F), (C*G + D*H), ...
[01:19:45] <Dark_Shikari> ABCD are uint8_t
[01:19:47] <Dark_Shikari> EFGH are int8_t
[01:20:05] <Dark_Shikari> so first you interleave two sets of 8 bytes
[01:20:12] <Dark_Shikari> then you multiply by two interleaved sets of MC coefficients
[01:20:18] <Dark_Shikari> so for h264, for example, the coeffs are 1 -5 20 20 -5 1
[01:20:22] <Dark_Shikari> so you can do
[01:20:34] <Dark_Shikari> movq, [src-1]
[01:20:36] <Dark_Shikari> er
[01:20:39] <Dark_Shikari> movq xmm0, [src-1]
[01:20:48] <Dark_Shikari> movq xmm1, [src]
[01:20:51] <Dark_Shikari> punpcklbw xmm0, xmm1
[01:21:07] <Dark_Shikari> pmaddubsw xmm0, {-5,20,-5,20...|
[01:21:14] <Dark_Shikari> s/|/}/
[01:21:24] <BBB> what about alignment?
[01:21:28] <Dark_Shikari> movq requires no alignment
[01:21:34] <BBB> pmaddubsw?
[01:21:41] <Dark_Shikari> {} means "a constant"
[01:21:45] <Dark_Shikari> you can put that in a register
[01:21:49] <BBB> hmm....
[01:21:51] <BBB> ok
[01:21:57] <Dark_Shikari> of course this is an ssse3 instruction.
[01:22:02] <BBB> and save the register for reuse in each row
[01:22:05] <Dark_Shikari> yeah
[01:22:12] <BBB> I guess I don't have ssse3, do I?
[01:22:16] <Dark_Shikari> nope
[01:22:26] <Dark_Shikari> here's a way of doing that in SSE2:
[01:22:30] <Dark_Shikari> movq xmm0, [src-1]
[01:22:32] <Dark_Shikari> movq xmm1, [src]
[01:22:32] <BBB> which is ssse3? punpcklbw?
[01:22:37] <Dark_Shikari> pmaddubsw
[01:22:49] <Dark_Shikari> punpcklbw xmm0, ZERO (some zero register)
[01:22:51] <Dark_Shikari> punpcklbw xmm1, ZERO
[01:23:03] <Dark_Shikari> pmullw xmm0, pw_5
[01:23:07] <Dark_Shikari> pmullw xmm1, pw_20
[01:23:14] <Dark_Shikari> where pw_5 means a constant with repeated 5, of size word
[01:24:06] <Dark_Shikari> oh, and at the end you'd have to do psubw xmm1, xmm0
[01:24:17] <Dark_Shikari> so with sse2, it's 2 loads, 2 unpacks, 2 multiplies, one subtract
[01:24:25] <Dark_Shikari> with ssse3 it's 2 loads, 1 unpack, 1 multiply
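The two instruction sequences can be compared with a scalar model. This is only a sketch: registers are Python lists, the function names are invented, and only the -5 and 20 taps from the example are covered, not the full six-tap filter.

```python
def saturate_i16(v):
    """Clamp to the signed 16-bit range, as pmaddubsw does on its sums."""
    return max(-32768, min(32767, v))


def pmaddubsw(unsigned_bytes, signed_bytes):
    """Multiply u8 by s8 pairwise, then add adjacent products with saturation."""
    return [saturate_i16(unsigned_bytes[i] * signed_bytes[i] +
                         unsigned_bytes[i + 1] * signed_bytes[i + 1])
            for i in range(0, len(unsigned_bytes), 2)]


def two_taps_ssse3(prev, cur):
    """prev: 8 pixels loaded at src-1, cur: 8 pixels loaded at src."""
    interleaved = [b for pair in zip(prev, cur) for b in pair]   # punpcklbw
    return pmaddubsw(interleaved, [-5, 20] * 8)                  # one multiply-add


def two_taps_sse2(prev, cur):
    """Same result via unpack to words, pmullw by 5 and by 20, then psubw."""
    return [20 * c - 5 * p for p, c in zip(prev, cur)]
```

Both paths give 20*src[x] - 5*src[x-1] per output word. With 8-bit pixels the sum stays between -1275 and 5100, so the saturation in pmaddubsw never triggers for these taps.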
[01:25:31] <BBB> I guess I can do sse2, if mmx is too limiting
[01:25:48] <Dark_Shikari> of course, when I talk about mmx
[01:25:50] <Dark_Shikari> I mean "mmxext"
[01:25:53] <Dark_Shikari> aka mmx + isse
[01:26:02] <Dark_Shikari> a few instructions we've talked about are isse-only
[01:26:04] <Dark_Shikari> for example, pshufw
[01:26:08] <Dark_Shikari> psadbw
[01:26:23] <Dark_Shikari> sse2 is mmx + mmxext in 128-bit registers. no new integer instructions are in sse2.
[01:26:36] <Dark_Shikari> (other than the ones which are natural generalizations of mmx to 128-bit)
[01:26:41] <Dark_Shikari> no functionally new stuff.
[01:26:49] <Dark_Shikari> so, equally, mmxext is sse2 in 64-bit instead of 128-bit.
[01:27:25] <BBB> and then ssse3 is new stuff
[01:27:38] <BBB> which my poor mac doesn't have because it's 5 yrs old :(
[01:28:21] <Dark_Shikari> more
[01:28:23] <Dark_Shikari> core 2 came out in 2005
[01:28:42] <BBB> yeah, I bought mine in spring, summer was when core2 came out I think
[01:29:18] <BBB> ok, I'm gonna look tonight
[01:29:32] <BBB> I'll have many silly questions tomorrow
[01:30:07] <Dark_Shikari> no problem
[01:31:18] <BBB> now I'll go entertain the wife a bit ;)
[01:31:24] <BBB> thanks for the tutorial!
[01:31:32] <Dark_Shikari> welcome
[01:31:42] <BBB> you should record this on your blog, it's actually really useful, others could learn from this too
[01:31:49] <BBB> not sure how to do the interactivity :)
[01:35:15] <lu_zero> janneg: pong
[06:21:28] <KotH> moin boys
[06:22:14] * Gottaname|Mobili is lost in the ffserver
[06:22:33] * Gottaname|Mobili is replacing the .conf file of ffserver with... mysql
[06:23:12] <Gottaname|Mobili> =3
[06:33:29] * Gottaname|Mobili rolls hyc around
[07:25:14] <av500> gm
[07:48:27] <wbs> superdump: do you have time to give this guy a follow-up on the discussion on libvorbis channel mapping in encoding? http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2010-June/089909.html
[07:52:22] <superdump> wbs: i'll just check the 7 and 8 channel orders in a minute
[09:32:34] <mru> moroning
[09:33:19] <av500> that bad?
[09:34:42] <mru> morons are everywhere
[09:34:58] <mru> some of them even mormons
[09:40:32] <CIA-98> ffmpeg: mstorsjo * r23636 /trunk/libavformat/http.h: Add the necessary includes, add an extra empty line for cosmetics
[09:56:08] <lu_zero> mru: it's a wonderful day!
[09:56:39] <mru> are you sure?
[09:56:47] * mru is deep in benchmarking hell
[09:57:31] <av500> how does it perform against heaven?
[09:57:36] * elenril thinks wonderful days are a lie
[09:58:03] * av500 ignores elenril unless he backs it up with a trope
[09:59:51] <lu_zero> vlc pukes on feng streams
[10:02:20] <elenril> http://tvtropes.org/pmwiki/pmwiki.php/Main/ItGotWorse
[10:02:46] <mru> Dark_Shikari: http://www.macrumors.com/2010/06/17/onlives-gaming-on-demand-service-demoed…
[10:09:45] <ohsix> hi; media guys, what can you expect from something doing ac3 -> 5.1 -> ac3 again (then out spdif)
[10:10:02] <kshishkov> some loss of quality
[10:10:13] <ohsix> and how does it change with the source bit rate
[10:10:25] <av500> 5.1 is a codec?
[10:10:39] * mru thought 5.1 was a number
[10:10:50] <mru> he compresses all the audio into a single number
[10:10:50] <kshishkov> av500: yes, but 7.1 is a cooler codec
[10:10:54] <ohsix> are normal people going to notice if the 5,1 -> ac3 is high bitrate?
[10:11:03] <mru> ah, two numbers
[10:11:12] <ohsix> 5,1 to suggest its being decoded to discrete streams
[10:11:13] <kshishkov> mru: so it's stereo
[10:11:35] <av500> ok, so ac3 -> pcm -> ac3
[10:11:42] <ohsix> yea
[10:11:54] <mru> why?
[10:12:19] <kshishkov> maybe to edit and then output to spdif
[10:12:49] <mru> then he should say so
[10:12:54] <ohsix> yea, or if some other stream needs to mix into some channels
[10:13:08] <mru> so the source being ac3 is irrelevant
[10:13:15] <ohsix> but its not important; just asking how to characterize it
[10:13:22] <mru> you're asking how good the ac3 encoder is
[10:13:46] <mru> I can tell you it's better than the vorbis encoder
[10:13:50] <ohsix> an ac3 encoder; if we're talking ffmpeg it suffices
[10:14:02] <CIA-98> ffmpeg: michael * r23637 /trunk/libavfilter/vsrc_buffer.h: add #include so make checkheaders passes
[10:14:20] <mru> it's possible to create an arbitrarily sucky ac3 encoder
[10:14:34] * kshishkov waves in friendly way
[10:14:35] <ohsix> i'm more concerned with the transition in general, and the output bitrate can be as high as possible
[10:14:59] <av500> transition?
[10:15:07] <av500> it will be ac3 ->pcm in any case
[10:15:11] <ohsix> ac3 -> pcm -> ac3
[10:15:14] <mru> ac3 is capable of encoding very good quality
[10:15:16] <av500> unless you look at the md5sums only
[10:15:53] <ohsix> right, not looking for the same bitstream; but what it means for the audio, i'm not at all familiar with ac3
[10:16:07] <av500> ohsix: make the final encode at max bitrate
[10:16:08] <mru> ac3 is a lossy mdct-based encoder
[10:16:34] <mru> it runs up to 640kbps
[10:17:07] <ohsix> just wondering if it will be acceptable to do at all; it'll be a worst case situation, if it shits itself past a reencode then its not even a worst case :P
[10:17:28] <av500> should not
[10:17:32] <ohsix> cool
[10:17:54] <av500> and of course, just try it
[10:18:10] <kshishkov> av500: that'd spoil all the fun
[10:18:16] <ohsix> people are debating how to do ac3 decently in pulse without essentially bypassing it entirely (and not just using pasuspender)
[10:18:24] <mru> lol
[10:18:39] <mru> of course _they_ can't do it properly
[10:19:11] <ohsix> alsa can encode ac3 if it needs to; so pulse can mix in other streams if it has to, and have alsa encode it for output
[10:19:26] <mru> don't trust either of those
[10:19:38] <ohsix> heh
[10:20:03] <av500> pasuspender? lol
[10:20:09] <ohsix> theres no reason it couldn't accept ac3 and decode it; then at least it would interoperate and still act like pulse
[10:20:27] <mru> act like pulse == fail randomly
[10:20:33] <av500> ohsix: the api allows to send non-pcm to pa?
[10:20:40] <ohsix> av500: ya, it'll get it off the alsa devices as long as it runs
[10:20:49] <ohsix> not yet
[10:21:40] * av500 thinks the existence of pasuspender proves that stuff like pa is a fail
[10:21:51] <ohsix> personally i think if people want to use ac3 then they want to bypass pulse already; but people are proposing mutilating it to pass ac3 and do nothing pulse-y while it does so
[10:21:56] <mru> what av500 said
[10:22:04] <ohsix> heh?
[10:22:36] <mru> you don't see xsuspender, do you?
[10:22:44] * elenril eats popcorn
[10:22:46] <mru> that's because X works
[10:22:56] <mru> despite freedesktop and x.org doing their best to break it
[10:22:56] <ohsix> i don't suspend x to have other things use my gpu
[10:23:08] * av500 needs governmentsuspender
[10:23:28] * mru waits for the 5th of November
[10:23:31] <ohsix> and that analogy only works if you propose all existing alsa clients become pulse native clients
[10:23:44] <mru> the root of the problem is alsa
[10:23:48] <av500> +1
[10:23:56] <ohsix> eh?
[10:24:04] <mru> if alsa were decent, all the ugly hacks on top would be unnecessary
[10:24:09] <av500> yep
[10:24:09] <elenril> what is so wrong with alsa?
[10:24:14] <mru> all of it
[10:24:15] <ohsix> what ugly hacks
[10:24:17] <av500> it needs pa
[10:24:24] <mru> ohsix: pulseaudio for starters
[10:24:29] <av500> ohsix: if alsa is not wrong, why do you need pa?
[10:24:37] <ohsix> a lot of people abuse alsa & don't know anything about it, thats not alsas fault
[10:24:43] <av500> of course it is
[10:24:45] <mru> http://blogs.adobe.com/penguin.swf/linuxaudio.png
[10:24:50] <KotH> ohsix: have you ever seen a kernel api spec of alsa?
[10:25:02] <av500> ohsix: apis that are so easily abusable are wrong
[10:25:13] <mru> KotH: they _intentionally_ refuse to document it
[10:25:18] <KotH> exactly
[10:25:22] <elenril> what?
[10:25:26] <mru> I know you know, but maybe not the rest
[10:25:40] <ohsix> av500: you get every facet and toggle in the hardware, thats "wrong"
[10:25:45] <KotH> mru: i know you know i know
[10:25:47] <mru> but at least the alsa devs are reasonably friendly peoply
[10:25:47] <KotH> :)
[10:25:50] <mru> people
[10:26:01] <ohsix> you don't use the kernel api, you use asound :\
[10:26:25] * av500 uses libaudiomixer :)
[10:26:30] <KotH> ohsix: yes.. and when your kernel code doesn't match the lib you have, all hell breaks loose
[10:26:31] <ohsix> maintainers of drivers might need to; but users and client writers shouldn't need to know its even there
[10:26:53] <KotH> ohsix: and because you have no idea what changed in the kernel api, you cannot even know which lib version to use
[10:26:53] <ohsix> that sucks; but i have a distribution, thats their problem
[10:27:04] <ohsix> couldn't you bisect?
[10:27:09] <KotH> ohsix: i have a distribution too, but i need custom kernels
[10:27:25] <KotH> oh, yea.. bisecting kernel and libalsa everytime i update one of those
[10:27:29] <KotH> thanks.. but NO THANKS
[10:27:36] <mru> kernel policy is to maintain backwards compat as much as possible, except for alsa
[10:27:40] <ohsix> you can check out and build out of tree drivers that were released with your driver version
[10:27:46] <mru> it baffles me how that crap could be mainlined
[10:28:07] <KotH> mru: someone got either deceived or bribed
[10:28:12] <ohsix> kernel doesn't maintain the perf interface; they ship the tool
[10:28:17] <mru> KotH: or both
[10:28:19] <ohsix> its not just alsa
[10:28:30] <av500> most kernel apis are sane and documented
[10:28:51] <ohsix> but essentially alsa is out of tree until its checkpointed; then old drivers are in tree
[10:29:09] <av500> out of tree?
[10:29:13] <mru> kernel features that have exactly one userspace tool often change incompatibly
[10:29:20] <mru> oprofile and such
[10:29:39] <ohsix> its a red herring asking for something to be documented that you're supposed to use through a tool or other defined interface, driver maintainers are the only ones that should rightly care
[10:29:52] <mru> oprofile is still documented
[10:29:55] <ohsix> well, you can read all about oprofile
[10:30:04] <ohsix> and the kernel guys don't like it
[10:30:20] <mru> there are aspects of it I don't like either
[10:30:25] <ohsix> av500: ya, make M=
[10:30:37] <ohsix> i like perf a lot
[10:30:58] <KotH> ohsix: why is it a red herring?
[10:30:59] <av500> ohsix: I know what oot building is
[10:31:05] <KotH> ohsix: it is a kernel interface after all
[10:31:10] <av500> +1
[10:31:17] <KotH> ohsix: heck, even kernel internal interfaces are documented
[10:31:27] <ohsix> KotH: there isn't an abi rev anywhere in the interface? to the very least asound might say "kernel too old"?
[10:31:39] <av500> KotH: it just shows that the interface is not good if it needs to be changed so often
[10:31:46] <CIA-98> ffmpeg: lucabe * r23638 /trunk/libavformat/rtpenc.c:
[10:31:47] <CIA-98> ffmpeg: Simplify (no need to check for st->codec->extradata) and correct
[10:31:47] <CIA-98> ffmpeg: (extradata_size must be at least 5 bytes) the H.264 MP4 syntax check
[10:31:47] <CIA-98> ffmpeg: in rtpenc.c
[10:32:02] <KotH> ohsix: no, it fails.. somewhere...
[10:32:05] <ohsix> that isn't a universal truth when you're talking about the kernel
[10:32:13] <KotH> ohsix: beside, there is also the case that the kernel is _too_new_
[10:32:39] <ohsix> well if you ever get to digging into it again let me know
[10:32:58] <mru> and often upgrading alsalib breaks apps
[10:33:05] <KotH> ohsix: other example: why does the oss emulation of alsa work better than alsa native?
[10:33:15] <mru> that's hilarious
[10:33:34] <ohsix> does it?
[10:33:42] <KotH> yes
[10:33:42] <mru> frequently
[10:34:00] <ohsix> i'd have to look at what the alsa client is doing for a fair comparison
[10:34:04] <KotH> ohsix: on my laptop (a t42), alsa didnt work at all when i first installed it in 2004
[10:34:10] <KotH> ohsix: oss emu worked fine though
[10:34:18] <ohsix> nice
[10:34:45] <ohsix> it sounds like a ctl/amp node problem than an abi problem
[10:34:59] <KotH> ohsix: when you listen to people in #mplayer, there is at least once a month someone complaining about stuttery sound or a/v sync issues which are magically solved by using the oss emu instead of alsa
[10:35:14] <ohsix> oss can work wheb the elements for an app to use a device correctly from an alsa perspective aren't present
[10:35:29] <av500> KotH: isnt OSS emu that "sane" subset of alsa :)
[10:35:48] <ohsix> well i'm not a superstitious person; i'd find out what their problem was
[10:36:08] <mru> ohsix: we've answered that already: alsa
[10:36:21] <ohsix> a lot of apps break when they write & alsa doesn't block them, since they don't do their own sample rate timing
[10:36:22] <av500> ohsix: "wheb the elements for an app to use a device correctly from an alsa perspective aren't present" explain
[10:36:56] <ohsix> mru: that means nothing without knowing their distro and how they may have mutilated their config files
[10:37:01] <mru> it's also highly frustrating that alsalib can behave very differently depending on the driver in use
[10:37:24] <mru> ohsix: the problem of alsa in general is alsa
[10:37:30] <av500> ohsix: but how do mutilated config files work in OSS emu?
[10:37:48] <lu_zero> and then you get pulse
[10:37:54] <elenril> mru: write a ffaudioframework
[10:38:05] <ohsix> av500: internal nodes and controls in something like ac97 or hda need to be adjusted for output to actually work, if your driver doesn't have the knob that gets the input anywhere near the output then it wont
[10:38:11] <lu_zero> (currently reaching segfault in libalsa)
[10:38:35] <pross-au> like linux needs yet another audio framework
[10:38:53] <ohsix> av500: they don't; you can look what oss mode does to drivers, its incumbent on software actually being properly written with alsa
[10:38:53] <mru> pross-au: it'll keep needing them until someone creates one that works
[10:38:54] <av500> elenril: at work we had to do that because pulse would not work on an arm9 3ys ago...
[10:39:06] <lu_zero> pross-au: currently pulse is just brain damaged about the idea of "not a system wide daemon because I'm narrow minded"
[10:39:19] <pross-au> mru: oss
[10:39:20] <elenril> av500: "pulse would not work" sounds normal
[10:39:31] <mru> pross-au: oss has its drawbacks
[10:39:34] <av500> ok, so it was not us :)
[10:39:39] <ohsix> the root problem is you can open an alsa device and play samples, and not get sound, thats not alsas fault
[10:39:40] <av500> oss4 ftw!
[10:39:49] <mru> it's hard to get accurate sync with oss
[10:39:52] <lu_zero> oss4 seems fragile...
[10:40:01] <pross-au> oh did not know that
[10:40:03] <av500> mru: worked fine for us in the past...
[10:40:19] <mru> maybe for you (your customers)
[10:40:29] <ohsix> lu_zero: its not narrow minded; you can't have users loading modules if it doesn't run as them, and it handles consolekit handoffs
[10:41:10] <mru> any mention of *kit automatically reduces your credibility in my eyes
[10:41:25] <ohsix> mru has a point about oss; a broke app will be broke, and oss can't always block the app to keep it delivering samples at somewhat the right rate
[10:41:44] <ohsix> heh, all it does is apply acls when the owners change
[10:41:47] <mru> I'm not talking about blocking vs nonblocking writes
[10:41:56] <ohsix> i'm not either
[10:42:11] <mru> sure sounds like it
[10:42:14] <ohsix> talking about bad apps that happen to work
[10:42:44] <pross-au> html5 audio
[10:42:50] <mru> with a sane api it would be trivial to simply open the device and play some samples
[10:42:53] <mru> with alsa it is not
[10:42:54] <ohsix> the pcm cursors and wakeups are completely arbitrary between devices, its not an oss or alsa problem
[10:43:21] <ohsix> well alsa isn't for that :P doesn't diminish what its there to do
[10:43:43] <ohsix> you can use libao or portaudio if thats what you are looking for
[10:43:50] <mru> blegh
[10:44:02] <mru> I want _fewer_ wrappers, not _more_
[10:44:11] <elenril> wrappers all the way down!
[10:44:17] <elenril> this is the only future
[10:44:36] <ohsix> you just need one; one that is a library, a system call interface is hard to virtualize
[10:44:46] <mru> eh what?
[10:45:00] <mru> I'd be happy to mmap the control registers :-)
[10:45:22] <ohsix> but saying you want fewer wrappers to also say that something should work how you want, not how it does is neither here nor there
[10:45:49] <mru> the base interface should expose the hardware as is
[10:46:02] <ohsix> thats what alsa does
[10:46:04] <mru> no
[10:46:18] <mru> alsa does all manner of mutilation between you and the hardware
[10:46:23] <mru> resampling etc
[10:46:27] <ohsix> yea, you open hw: you get it
[10:46:37] <ohsix> no configs or plugins involved
[10:46:39] <mru> still too many layers
[10:46:59] <ohsix> if you open hw:, there are none but the link to asound
[10:47:12] <mru> still too much cruft
[10:47:54] <ohsix> the point is you can use other labels and have your integrator give you a uniform interface; like a label surround51, or spdif that do what the user expects
[10:48:18] <mru> who do you think "your integrator" is?
[10:48:37] <mru> this "someone else will fix it" attitude is most disturbing
[10:48:40] <ohsix> the person writing the software you use and put it together
[10:48:49] <ohsix> i fix my problems; thanks
[10:48:55] <mru> look dude, I'M WRITING THE SOFTWARE
[10:49:03] <ohsix> but my grandmother need not know or care
[10:49:16] <mru> I'm not talking about your grandmother
[10:49:51] <ohsix> and if i am? her integrator solved such things so she doesn't have to
[10:50:12] <mru> why not try to make the integrator's life easy?
[10:50:23] <mru> such as by making things work sanely in the first place
[10:50:37] <ohsix> because its not an easy job; and people need to know what they're doing anyways
[10:50:49] <ohsix> if they don't they shouldn't be doing it
[10:50:55] <mru> you're describing how it is, not how it should be
[10:51:04] <ohsix> it works quite sanely
[10:51:15] <kshishkov> open("/dev/dsp", O_WRONLY); ioctl(dspfd, SET_SAMPLING_RATE, &rate); ...
[10:51:28] <ohsix> and you are saying how its bad without proposing anything, whats better?
[10:52:05] <elenril> kshishkov: yeah, let's add moar ioctls
[10:52:05] <mru> even v4l2 is better in some ways
[10:52:12] <ohsix> kshishkov: opening /dev/dsp doesn't make proper audio software any more than -lasound2 does
[10:52:20] <mru> at least it lets you query the actual capabilities of the hardware
[10:52:56] <ohsix> kshishkov: but what you will find; in people using oss _and_ alsa, that they left the "proper" part out anyways
[10:53:19] <ohsix> and why not???// if it works on their machine :]
[10:53:45] <ohsix> mru: alsa does too
[10:54:06] <ohsix> open hw: and you'll be talking to the device (pcm and ctl)
[10:54:29] <ohsix> but opening hw: is bad juju
[10:54:52] <mru> ok, so how do I query the supported sampling rates?
[10:54:59] <mru> and the buffers size
[10:55:20] <ohsix> all you need care about is if the device you're talking to is reporting information you need in a reasonably correct manner; if it is hw:, default:, or pulse: or surround51
[10:55:42] <ohsix> well i'd point you at amixer
[10:56:26] <ohsix> i could find the documentation but thats more work than i care for at the moment, and amixer will display everything that can be read
[10:56:59] <ohsix> "amixer contents"
[10:57:06] <mru> amixer is _not_ the solution
[10:57:14] <mru> I want a C interface to ask for the info
[10:57:17] <mru> there isn't one
[10:57:37] <ohsix> then amixer must be magic
[10:58:13] <mru> where does amixer report all supported sampling rates for a device?
[10:58:24] <mru> and min/max buffer size
[10:58:28] <mru> interrupt rate?
[10:58:41] <mru> supported channels?
[10:58:47] <mru> sample formats?
[10:58:56] <mru> amixer controls the mixer settings
[10:59:00] <mru> totally different thing
[10:59:08] <ohsix> http://git.alsa-project.org/?p=alsa-utils.git;a=blob;f=amixer/amixer.c;h=c9…
[10:59:33] <mru> that's enumerating the mixer controls
[10:59:41] <mru> nothing to do with what I said
[10:59:42] <ohsix> oh, ok then i misunderstood; that will only show the knobs, moment
[10:59:54] <mru> I'm not talking about the goddamn mixer
[10:59:58] <mru> I don't care about the mixer
[11:00:41] <CIA-98> ffmpeg: maxim * r23639 /trunk/libavformat/ (Makefile oma.c): Add metadata support. Patch by Michael Karcher.
[11:01:13] <ohsix> before i look; is the "interrupt rate" something you really need to write proper audio software? there are 14mhz interval timers available on pc's since 2004
[11:01:26] <mru> yes
[11:01:38] <mru> what alsa calls "period"
[11:01:58] <ohsix> but you know the sample rate; you just want to set/know the watermark
[11:01:59] <mru> the sound hw generates an interrupt every N bytes in the buffer
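The "period" mentioned here is a byte count, so its length in time depends on the sample format. A minimal sketch of that arithmetic follows; the function name and the example numbers are illustrative assumptions, not values taken from the discussion or from any ALSA API.

```python
def period_duration_ms(period_bytes, rate_hz, channels, bytes_per_sample):
    """Time between two period interrupts, in milliseconds.

    Hypothetical helper: one frame is one sample for every channel, and the
    hardware is assumed to raise an interrupt each time period_bytes of the
    buffer have been consumed.
    """
    frame_bytes = channels * bytes_per_sample
    frames_per_period = period_bytes / frame_bytes
    return 1000.0 * frames_per_period / rate_hz


if __name__ == "__main__":
    # Assumed example: 4096-byte period, 48 kHz, stereo, 16-bit samples,
    # which is 1024 frames per interrupt.
    print(period_duration_ms(4096, 48000, 2, 2))
```

Halving the period size halves the interval between interrupts, which gives a finer-grained view of the playback position at the cost of more wakeups.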
[11:02:06] <ohsix> right
[11:02:08] <mru> watermark is not the same
[11:02:22] <ohsix> but those interrupts are not reliable
[11:02:28] <mru> wtf?
[11:02:31] <mru> they are ESSENTIAL
[11:02:38] <ohsix> are ... you for real
[11:02:48] <mru> more real than you could ever imagine
[11:03:23] <lu_zero> (and scary)
[11:03:29] <ohsix> you know and are tracking the output rate; you know where the cursor is better than the device can, and you can handle it a lot sooner than waiting for a wakeup
[11:03:56] <mru> you obviously have no clue about how audio hardware works
[11:04:02] <ohsix> heh
[11:04:07] <mru> that interrupt _is_ your "cursor"
[11:04:10] <ohsix> i do
[11:04:43] <mru> and setting the right interval is important
[11:04:51] <ohsix> interrupts are subject to the OS and software environment; why would you use them when your os & app can wake up on an itimer at just the right time?
[11:05:06] <mru> because the itimer isn't controlled by the sound card, silly
[11:05:40] <mru> so once again, you've obviously never had to deal with things like a/v sync
[11:05:44] <ohsix> do you derive a real sample clock in your app or assume it follows your sample rate with some low error in ppm?
[11:05:56] <mru> eh?
[11:06:14] <mru> the DAC clock is all that matters
[11:06:17] <ohsix> i've done enough real time software to know how to reify disparate and inaccurate clocks :]
[11:06:19] <mru> even if it's off by 10%
[11:06:30] <mru> there is only one clock
[11:06:47] <ohsix> if its off by 10% and the app is playing 44.1khz in _real time_, what happens on an xrun?
[11:07:00] <mru> real time is whatever the dac clock says
[11:07:07] <ohsix> ehh
[11:07:23] <mru> since pc hardware doesn't let me control that clock
[11:07:29] <ohsix> real time is wall time since my video clock is nonintegral
[11:07:49] <mru> I repeat, you don't know what you're talking about
[11:07:57] <ohsix> and my 14mhz interval timers will wake me up on happy buffer time
[11:08:12] <mru> say hello to Mr Drift
[11:08:29] <ohsix> alright; suppose you have your dac clock and its off by 10%
[11:08:44] <mru> music will sound horrible, sure
[11:09:12] <ohsix> say you have another one and its off by 10% too; and you need to output with some degree of signal jitter to both the best you can
[11:09:21] <mru> what "another one"?
[11:09:46] <ohsix> another device, another time domain (but given the situation, an audio device is fine)
[11:09:55] <mru> then you have a problem
[11:10:07] <mru> if you're dealing with multiple audio devices, you really need some way to sync their clocks
[11:10:10] <ohsix> how do you make it not a problem?
[11:10:16] <ohsix> right!
[11:10:26] <mru> which pc hardware doesn't let you
[11:10:38] <mru> I'm talking about playing audio and video in sync
[11:10:39] <ohsix> and say you have a clock thats high resolution and has low ppm error rate
[11:10:45] <ohsix> i know
[11:10:55] <ohsix> i'm talking about real time software
[11:11:04] <mru> the sound will play at whatever rate it chooses, and I can't do a thing about it
[11:11:06] <ohsix> of which multimedia is a subset
[11:11:25] <mru> to maintain sync, I must use this as the master time and display video frames accordingly
[11:11:38] <mru> regardless of wallclock time
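The audio-as-master approach described in the last few messages can be sketched as follows. This is a simplified illustration under stated assumptions: `frames_played` stands for whatever playback position the audio API reports, `video_pts` is a sorted list of presentation timestamps in seconds, and all names are hypothetical rather than taken from any player.

```python
def audio_clock_seconds(frames_played, nominal_rate_hz):
    """Media time as defined by the DAC: frames consumed / nominal rate.

    If the DAC runs fast or slow, this clock runs fast or slow with it,
    which is the point: video follows the sound, not the wall clock.
    """
    return frames_played / nominal_rate_hz


def pick_frame(video_pts, frames_played, nominal_rate_hz):
    """Index of the newest video frame whose pts has been reached, or None."""
    now = audio_clock_seconds(frames_played, nominal_rate_hz)
    due = [i for i, pts in enumerate(video_pts) if pts <= now]
    return due[-1] if due else None


if __name__ == "__main__":
    pts = [k / 24 for k in range(4)]      # assumed 24 fps stream
    print(pick_frame(pts, 4410, 44100))   # audio clock reads 0.1 s
```

Wall-clock time never appears in the calculation, so a DAC that is off by some percentage only changes how quickly `frames_played` advances.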
[11:11:43] <ohsix> well you can, unsynced sample clocks will only impact jitter to a degree that you can measure and minimize
[11:12:07] <mru> independent, freerunning clocks will drift apart sooner or later
[11:12:20] <ohsix> the output might differ but you can make it such that by your contrived all time it only amounts to jitter
[11:12:27] <ohsix> ya they will
[11:12:57] <mru> so without a way to tie them together, you have an unsolvable problem
[11:13:05] <mru> and discussing it further is pointless
[11:13:14] <ohsix> thats why you track and smooth them so you know their real rate if you need to reason about them in terms of how much they drift with relation to eachother
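One way to read "track and smooth them so you know their real rate" is a least-squares fit of playback position against a reference clock. The sketch below assumes a list of `(wall_seconds, frames_played)` observations; how those are obtained, and whether the position reports are trustworthy, is exactly what the participants disagree about. All names are hypothetical.

```python
def estimate_rate_hz(samples):
    """Least-squares slope of frames played versus reference-clock seconds.

    samples: iterable of (wall_seconds, frames_played) pairs.
    Fitting a line averages out jitter in individual position reads.
    """
    samples = list(samples)
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("observations must span a nonzero time interval")
    return num / den


def drift_ppm(measured_rate_hz, nominal_rate_hz):
    """Deviation of the measured rate from nominal, in parts per million."""
    return (measured_rate_hz / nominal_rate_hz - 1.0) * 1e6


if __name__ == "__main__":
    # Assumed synthetic data: a DAC that really runs at 44105 Hz.
    obs = [(t, 44105.0 * t) for t in range(10)]
    rate = estimate_rate_hz(obs)
    print(rate, drift_ppm(rate, 44100))
```

The estimate only describes the drift relative to the reference clock; it does not by itself decide which clock should be treated as the master.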
[11:13:38] <mru> but that's utterly irrelevant to this discussion
[11:13:48] <ohsix> heh its not an unsolvable problem; its deciding how much jitter is acceptable and factoring it out
[11:13:48] <mru> which was about writing a video player using alsa
[11:14:08] <mru> jitter is manageable, uncontrollable drift is not
[11:14:14] <ohsix> its relevant; the video is its own clock domain; as is the audio
[11:14:35] <ohsix> yes but you have a higher resolution clock to resolve the drift
[11:14:35] <lu_zero> ohsix: ...
[11:15:04] <ohsix> there will be a period where you can duplicate frames to keep them from drifting too much (like jitter)
[11:15:05] <mru> with a 60Hz vsync you'll obviously have some jitter in the video display
[11:15:37] <ohsix> as a timebase vsync isn't very useful :]
[11:15:58] <mru> if the video clock runs faster than the audio clock, a frame will be displayed one vsync interval longer once in a while
[11:16:22] <mru> if it runs slow, you'll be one vsync short from time to time
[11:16:25] <mru> unavoidable
[11:16:35] <ohsix> indeed, most overlay engines optionally sync on vsync
[11:17:04] <mru> that's irrelevant
[11:17:24] <mru> what's relevant is that the 15ms glitch isn't visible
[11:17:25] <ohsix> but you aren't reifying the 60hz time domain with the 48khz one; you're clocking the wall time rate of 23.4543whatever fps with the sample clock rate
[11:17:45] <ohsix> it was as relevant as mentioning vsync in the first place; but lets not digress
[11:18:09] <mru> digress is all you do
[11:18:53] <ohsix> sometimes your video presentation rate will be off; but you can also schedule early frame delivery if you think one is going to cross a sync interval; then you will be a partial frame ahead instead of a whole frame behind
[11:19:18] <mru> the end result is exactly the same
[11:20:04] <ohsix> not really; you can't split a frame
[11:20:09] <mru> playing 24fps video on a 60Hz display has a bit of jitter
[11:20:13] <mru> there's no way around it
[11:20:31] <ohsix> but if you do it early you can minimize the error
[11:20:36] <mru> not necessary
[11:20:46] <mru> the error is at most 15ms
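The jitter from showing 24fps video on a 60Hz display can be worked out by snapping each frame's timestamp to the next vsync. The sketch below assumes frames are shown at the first vsync at or after their pts and that both clocks are exact; under that assumption the delay is always less than one refresh interval (about 16.7 ms at 60 Hz), the same order as the figure quoted in the chat. Function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def vsync_schedule(frame_count, fps, refresh_hz):
    """For each frame k, the index of the first vsync at or after pts k/fps."""
    return [ceil(Fraction(k, fps) * refresh_hz) for k in range(frame_count)]


def hold_lengths(schedule):
    """How many refresh intervals each frame (except the last) stays on screen."""
    return [b - a for a, b in zip(schedule, schedule[1:])]


def worst_delay_ms(frame_count, fps, refresh_hz):
    """Largest gap between a frame's pts and the vsync it is shown on."""
    schedule = vsync_schedule(frame_count, fps, refresh_hz)
    delays = [Fraction(v, refresh_hz) - Fraction(k, fps)
              for k, v in enumerate(schedule)]
    return float(max(delays) * 1000)


if __name__ == "__main__":
    sched = vsync_schedule(6, 24, 60)
    print(sched)                  # vsync index per frame
    print(hold_lengths(sched))    # alternating long and short holds
    print(worst_delay_ms(6, 24, 60))
```

Exact rational arithmetic is used so that timestamps landing exactly on a vsync are not pushed to the next one by floating-point rounding.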
[11:21:31] <mru> but that's a separate problem
[11:21:37] <mru> has nothing to do with the audio clock
[11:21:43] <ohsix> but you could schedule early, why not?
[11:22:10] <ohsix> i tried to make a larger point about clock domains and real time software ... :[
[11:22:13] <mru> sure, subtract a constant from all pts, job done
[11:22:36] <ohsix> theres a reason the magic juice is lacking from a lot of stuff on linux that makes sounds
[11:22:39] <mru> the discussion was about how alsa sucks
[11:22:54] <ohsix> eh
[11:23:01] <mru> minimising video jitter has nothing to do with alsa suckage
[11:23:45] <kierank> wow, you've been going for an hour
[11:24:54] <ohsix> i thought this was the discussion; 03:39 <@mru:#ffmpeg-devel> it's hard to get accurate sync with oss
[11:25:17] <ohsix> it certainly went on how oss wasn't alone in that regard
[11:25:52] <mru> to sync something against the audio clock, you need some way to read it
[11:25:56] <mru> oss doesn't offer that
[11:25:57] <ohsix> minimizing error, not jitter; like you said, 24hz != 60hz, you'll get jitter
[11:26:39] <ohsix> but you know your sample rate is 44.1khz, why do you need to know how fast the cursor is going (assume for a moment that you can actually do that)
[11:26:50] <mru> the alsa timer interface gives a reasonably accurate notification each time a "period" is crossed
[11:27:12] <ohsix> it is still unreliable; and bodged in many drivers
[11:28:15] <mru> which is one reason alsa sucks
[11:28:18] <ohsix> the important parameters you need to know, alsa nor oss will tell you, you need to measure the sample clock (for jitter and error)
[11:28:29] <mru> untrue
[11:28:44] <ohsix> alsa is reporting what the hardware tells it
[11:28:47] <mru> the sound clock is the reference, so there's no need to measure it
[11:28:59] <KotH> are you still at it?
[11:29:16] <ohsix> if the watermark is invalid or the interrupt is delivered late (by some period you'd need to know) that is the card
[11:29:30] <mru> watermark is irrelevant
[11:29:41] <KotH> have you written anything worth knowing? or shall i just skip as a discussion between the master of trolls and a noob?
[11:29:53] <mru> KotH: nothing noteworthy
[11:29:58] <ohsix> in some respects oss is better in that regard; its not providing information you shouldn't know and is unreliable anyways
[11:30:21] <mru> it's only unreliable in certain reference frames
[11:30:38] <mru> nothing is unreliable with _itself_ as reference
[11:30:59] <ohsix> yes; if you can be sure the card is doing it correctly (though this is unspecified!) you might be able to use it as a time base
[11:31:33] <mru> I've been using the sound card as time base for many years
[11:31:36] <mru> works like a charm
[11:31:40] <ohsix> and some cards synthesize those events from kernel buffers; they work differently as well
[11:31:50] <ohsix> until it doesn't :P
[11:32:03] <mru> you could say that about anything
[11:32:41] <ohsix> clocks have a quality measure (Q) and if you're going to pick one you need to either knock one into shape and know why a clock is bad, or derive a locked loop with respect to another clock that you know the Q on
[11:32:51] <KotH> ohsix: if you want to know how high precision+accuracy synchronisation of clocks is done, join the time-nuts mailinglist
[11:32:52] <mru> and once again, with a 60Hz display, pursuing jitter less than 15ms is pointless
[11:33:08] <ohsix> but in audio and video timeframes, 14mhz is enough
[11:33:14] <mru> KotH: this is not about synchronising clocks
[11:33:21] <ohsix> we're talking about reclocking audio, not video
[11:33:25] <mru> eh, no
[11:33:41] <KotH> ohsix: there are people who sync their sound cards to Cs frequency standards for high precision measurement of audio signals :)
[11:33:55] <ohsix> minimizing visual artifacts that are unpleasing to the eye is completely different than minimizing artifacts that are unpleasing to the ear
[11:33:59] <mru> KotH: I bet that requires some rather specialised sound cards
[11:34:04] <KotH> mru: nope
[11:34:20] <KotH> mru: just one where you can desolder the crystal
[11:34:21] <mru> I've never seen a sound card with external clock input
[11:34:26] <mru> oh, soldering...
[11:34:27] <ohsix> you can measure and tune any old pci sound card
[11:34:38] <ohsix> and add external syncs and stuff
[11:34:46] <mru> I have piles of old sound cards...
[11:34:56] <KotH> ohsix: measuring and tuning wont help if you want to go better than 10^-5
[11:34:57] <mru> and some old gps receivers
[11:35:05] <av500> 4) profit
[11:35:09] <KotH> ohsix: and these guys are doing 10^-10 measurements
[11:35:49] <mru> KotH: these the guys who carry atomic clocks to mountaintops just for fun?
[11:35:52] <ohsix> but anyways; an interval timer can wake up your app to write to your pcm just in time (which you can measure due to circumstance in your app preparing sound and the system status), you can literally be writing at the exact moment you cross the water mark, down to 1 or 2 samples; you cannot do that with timing information from the sound card
[11:35:55] <KotH> mru: exactly those
[11:36:00] <mru> ohsix: DRIFT!!!!
[11:36:02] <ohsix> ya i know
[11:36:19] <mru> if you do indeed know, you're sure as hell not acting on that knowledge
[11:36:34] <ohsix> mru: drift, constant or varying? varying to what degree? drift is real, yes; but you have one clock who's frequency is so much higher than the time domain you're interested in
[11:36:51] <mru> I'm not trying to get low latency
[11:36:55] <mru> that's a different problem
[11:37:06] <KotH> ohsix: a normal crystal has 3 types of drift: temperature, age, movement
[11:37:14] <ohsix> ah well if you know how to say it like that then you already know what i'm explaining
[11:37:25] <ohsix> but i propose doing it "right" is always doing it real time
[11:37:27] <mru> KotH: the issue here is drift between the DAC clock and other clocks in the system
[11:37:35] <KotH> ohsix: and a few types of noise sources: intrinsic, circuitry, temperature, vibration,....
[11:37:38] <mru> such as the main system clock and the vsync clock
[11:38:03] <KotH> lol
[11:38:18] <ohsix> KotH: i know, but consider the domain you're working in; those are all big huge large changes that would indeed affect your output if you weren't tuning it in real time for drift as you measured it
[11:38:19] <mru> if you have a freerunning DAC and time your writes based on another clock, you'll overflow or underflow eventually
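The "overflow or underflow eventually" claim can be put into numbers. The sketch below assumes a writer that delivers exactly the nominal rate according to its own timer while the DAC consumes at a rate that is off by a fixed number of ppm; the buffer size, fill level and ppm figure in the example are made-up values.

```python
def seconds_until_xrun(buffer_frames, fill_frames, nominal_rate_hz, dac_error_ppm):
    """Time until the buffer overflows or drains, and which one happens.

    The writer adds nominal_rate_hz frames per second; the DAC removes
    nominal_rate_hz * (1 + dac_error_ppm / 1e6) frames per second.
    Returns (seconds, kind) with kind in {"overflow", "underflow", "never"}.
    """
    # Net change of the buffer fill level, in frames per second.
    surplus = -nominal_rate_hz * dac_error_ppm / 1e6
    if surplus == 0:
        return float("inf"), "never"
    if surplus > 0:
        # DAC is slow: the buffer gradually fills up.
        return (buffer_frames - fill_frames) / surplus, "overflow"
    # DAC is fast: the buffer gradually empties.
    return fill_frames / -surplus, "underflow"


if __name__ == "__main__":
    # Assumed example: 8192-frame buffer, half full, 48 kHz, DAC 100 ppm fast.
    print(seconds_until_xrun(8192, 4096, 48000, 100))
```

A small mismatch only delays the xrun; with any nonzero constant error the time returned is finite.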
[11:38:36] <KotH> mru: so it's about perfectly syncing audio to video?
[11:38:38] <mru> a clock has no drift relative to itself
[11:38:55] <ohsix> not so, you can query for cursor position and know with some reliability where it has been in the past; it is different from having the same information wake you up to deliver samples
[11:38:56] <mru> and 10ms jitter in video is not a problem
[11:39:50] * KotH just drops in the piece of information that most clocks in a pc are in the quality ball park of an R-C oscillator
[11:39:51] <mru> I never mentioned any detail about how I'd be using those interrupts
[11:40:06] <mru> KotH: that's because many of them _are_ RC oscillators
[11:40:14] <KotH> mru: actually no
[11:40:36] <KotH> mru: most of them are low quality crystals with high jitter/noise circuitry
[11:40:38] <ohsix> the problem with cursors is waking up on them or using them for timing, since all cards vary in their misfunctionality in how they do it, however when you query the position you know that it is sometime in the past, and that while there may be jitter it will always be somewhat in the past, and you can filter the jitter, so you have a stable input into your locked loop
[11:41:02] <av500> err
[11:41:19] <mru> and how the fuck would you know where you are without the interrupts?
[11:41:19] <KotH> av500: speak your mind, cause i dont dare to read that last utterance
[11:41:30] <ohsix> you read it
[11:41:55] <mru> from what?
[11:42:02] <ohsix> theres no reason for the card to wake anyone up for you to read where it is; theres a big difference between waking up on it and using it as an error input
[11:42:23] <ohsix> the pci config space most likely; i imagine some drivers shadow what they've been told in an interrupt context as well
[11:42:27] <mru> I never said anything about waking up
[11:42:45] <mru> and pci config space is not the answer
[11:42:50] <mru> do you know anything about pci?
[11:42:56] <KotH> er...
[11:43:04] <KotH> pci config space is definitely not an answer to anything
[11:43:06] <ohsix> well not the config space; you know what i mean
[11:43:14] <mru> not really
[11:43:19] <ohsix> somewhere bar+n
[11:43:25] <mru> you're so misinformed it's hard to predict what you might mean
[11:43:40] <mru> who said this was a pci device?
[11:43:51] <ohsix> must reflect poorly on you when i said we were talking about the same thing :]
[11:43:55] <ohsix> nobody
[11:44:03] <mru> and we're in userspace here
[11:44:16] <mru> so don't make that assumption
[11:44:24] <ohsix> it just so happens that any given pci device is probably good enough for discussion; making up contrived devices just to put one over on someone is antithetical
[11:44:31] * KotH thought we are in cyberspace
[11:44:40] <mru> I'm not making anything up
[11:44:47] <mru> just pick your favourite SoC
[11:45:01] <KotH> ac97 or I2S?
[11:45:16] <ohsix> are you asking me to detail how you might get said information from alsa?
[11:46:01] <ohsix> i'm not sure where the disconnect is; i mentioned that it might be in the pci space for the device, or the driver might shadow it from interrupt context, you asked where the information might be
[11:47:17] <ohsix> and overall its quite a departure from me showing you where to get the sample rate and whatnot from alsa
[11:49:17] <mru> which you still haven't done
[11:49:38] <ohsix> ya, gonna do that as soon as i'm done typing
[11:49:51] <mru> as usual you dragged off on a tangent in an attempt to avoid admitting you don't know the answer
[11:50:01] <ohsix> i asked about how you would like to know some info that was marginally useless anyways; and i don't know if alsa reports, and it kind of spun out
[11:51:41] <ohsix> i know where you can get all that information from proc; but that isn't what you're looking for :]
[11:52:20] <mru> that info is nowhere in proc
[11:54:43] <mru> it's obviously available somewhere in the drivers
[11:54:44] <ohsix> alsa is kind of incomprehensible, it would be nice if they had a block diagram and how the configuration files and labels came into play
[11:54:47] <mru> so why can't I query it?
[11:54:59] <ohsix> i duno; i'm looking
[11:55:03] <mru> "alsa is kind of incomprehensible" <--- *that* is the problem
[11:55:22] <ohsix> well i understand it enough; but people don't put in the effort when they write clients
[11:55:30] <av500> that is the issue
[11:55:37] <av500> if the api allows that, it is bad
[11:55:38] <mru> the effort is too great
[11:55:45] <ohsix> yea i get that
[11:55:53] <ohsix> but what it does is complex; but conceptually simple
[11:56:08] <ohsix> fwiw if any of you have worked with the Maya plugin api; it would be cool if alsa worked like that
[11:56:22] <mru> have you ever looked at the specs for an actual sound card?
[11:56:25] <ohsix> with binding function sets and facets/aspects of the object you're working with
[11:56:27] <mru> they're much simpler than alsa
[11:56:30] <ohsix> sure?
[11:56:59] <ohsix> well if you're tweaking registers sure; but how do you embody those registers for general use by different software and on different cards
[11:57:27] <mru> they all support the same basic settings
[11:57:30] <ohsix> you have abstractions like switches; a filter graph for ac97/hda that can be manipulated, its complex to present a uniform interface
[11:57:50] <ohsix> yea, but one of the big arguments against oss and for alsa back in the day was that people wanted to use their hardware
[11:58:04] <mru> alsa policy seems to be to make everything as complicated as the most complex operation imaginable
[11:58:07] <ohsix> providing a normalized mixer interface on top of oss is tough stuff
[11:58:20] <mru> I'm not talking about mixers
[11:58:38] <ohsix> i know; mixers are just one problem that alsa purports to solve with all the different control objects
[11:58:51] <ohsix> and a situation most users of oss would probably understand ...
[11:59:00] <ohsix> if i'm being patronizing just say so
[11:59:08] <mru> the mixer interface is horrible in both alsa and oss
[11:59:14] <mru> just in different ways
[11:59:36] <ohsix> well what you see in alsa is a best attempt at normalizing some stuff that controls a "ctl" often not even a hardware ctl
[11:59:50] <mru> but forget about the mixer
[11:59:56] <ohsix> but at best you get mute switches and source/sink switches with alsa
[12:00:05] <mru> a normal playback app has no business messing with mixers anyway
[12:00:50] <ohsix> with ac97/hda the pin complexes form a graph instead of a flat interface with knobs; and part of getting it to work in a new driver is connecting new elements up in there, the same attenuator/switch settings are nodes in that graph instead of the flat interface
[12:01:18] <av500> that is not why alsa apps struggle
[12:01:25] <av500> most would not even care
[12:01:44] <av500> i agree there might be complexity
[12:01:46] <ohsix> i'm just saying, it bears out all this complexity in a uniform, if obfuscated interface
[12:02:02] <mru> uniformly obfuscated
[12:02:10] <av500> obfuscatedly uniform
[12:02:31] <ohsix> with a normalized oss interface you'd have to do well to hide all those internal elements and not everyone would agree what the user edifice for that particular device is
[12:02:53] <mru> WE'RE NOT TALKING ABOUT MIXERS
[12:03:23] <mru> now please answer one simple question: how do I query the supported sample rates of a given hardware device?
[12:03:33] <av500> from webm: Whats the status of the 0.9.1 release? The just released ffmpeg-0.6 does not build against libvpx-0.9.0
[12:03:38] <ohsix> i was talking about the complexity, needed; but possibly presented poorly, i can't speak more plainly :[
[12:03:46] <ohsix> mru: still looking
[12:03:56] <av500> 0.6 is already outdated...
[12:04:07] <mru> ohsix: I know straight-forward speech is a challenge for you
[12:04:23] <ohsix> at least i manage without ad hominem :]
[12:04:30] <wbs> av500: isn't it the other way, ffmpeg doesn't build against the outdated 0.9.0 release of libvpx, since they changed stuff after the 0.9.0 release?
[12:04:35] <mru> that was just an observation
[12:04:38] <av500> wbs: no idea
[12:04:47] <mru> wbs: yes
[12:05:02] <ohsix> well i tend to keep observations that demean someone i'm trying to discuss something with to myself
[12:05:05] <av500> but ppl will blame 0.6
[12:05:12] <mru> ohsix: oh really...
[12:07:06] <Honoome> av500: people always blame ffmpeg… that's no news
[12:07:31] <Honoome> plus half the people out there will _never_ blame google: they are The Light… nothing else matters, no?
[12:07:44] <mru> not that simple
[12:07:56] <mru> until the vp8 release, google was Evil(tm)
[12:08:18] <Honoome> mru: no it wasn't, and of course it was FSF single-handedly that convinced Google to release it…
[12:08:37] <mru> eh, what about the streetview steals your soul stuff?
[12:08:45] <mru> and the wifi sniffing
[12:08:47] <av500> well, still might be worth to add that to 0.6 release notes....
[12:08:51] <mru> and the ad tracking
[12:09:03] <mru> and everything else they were lambasted for
[12:09:04] <av500> and keeping a copy of the WHOLE internet?
[12:09:04] <Honoome> mru: I'm being facetious here if you couldn't tell
[12:09:29] <mru> facetious is a good word
[12:09:39] <mru> it has all the vowels exactly once in alphabetical order
[12:09:56] <ohsix> mru: you can't read them directly; but you can exaustively try seeing if something is a valid hw configuration
[12:10:14] <av500> exaustively is not good
[12:10:20] <av500> it misses an "o"
[12:10:25] <ohsix> kind of like PIXELFORMATDESCRIPTOR and stuff for opengl on windows
[12:10:31] <mru> ohsix: and that is precisely the reason why it sucks
[12:10:37] <ohsix> dunno
[12:11:03] <ohsix> you should just set the pcm you think your app wants; if it doesn't match hw: you cant open the device, but if you open plughw:hw, alsa will resample for you
[12:11:24] <mru> bad, bad, bad
[12:11:34] <mru> I don't want alsa's crappy resampler
[12:11:42] <ohsix> then do it yourself and open hw:
[12:11:47] <ohsix> or don't
[12:11:59] <mru> but how can I know what to resample if I can't query the supported rates?
[12:12:09] <ohsix> even if you knew the sample rate and stuff; you wouldn't know what configuration is valid until you tried to set it
[12:12:24] <ohsix> you try rates you are willing to resample to until hw: opens
[12:12:36] <mru> that's stupid
[12:12:50] <mru> there are thousands of possible rates
[12:13:07] <mru> anything from 8k to ~200k is reasonable to expect
[12:13:17] <ohsix> yea? and you pick one heh
[12:13:32] <mru> why can't I just get a list of ranges the hw supports?
[12:13:41] <ohsix> i'd pick 44.1, 48k, 96k, 192k
[12:13:51] <mru> well, those are likely
[12:13:55] <ohsix> indeed
[12:14:02] <mru> but that's not the point
[12:14:06] <mru> it's a stupid interface
[12:14:07] <ohsix> 32khz as well
[12:14:10] <mru> and 24
[12:14:13] <mru> and 11.025
[12:14:16] <mru> and 8
[12:14:30] <mru> and 22.05
[12:14:33] <ohsix> i don't think so; i think it works well, you can't apply all the hw params over all sample rates for all devices
[12:14:33] <mru> etc, etc
[12:14:59] <ohsix> like on some card out there you'll have to lower the bit depth to do 192khz, how do you convey that
[12:15:10] <mru> ok, I'll rephrase: why is alsa hiding information from me?
[12:15:23] <ohsix> its not; you can see it in /proc/asound
[12:15:31] <mru> that's not a proper api
[12:15:44] <ohsix> what you cant see in /proc/asound, or in any library call; is which combination of parameters are valid; not all of them are
[12:16:01] <mru> yet alsa internally knows
[12:16:05] <mru> so why can't I find out?
[12:16:16] <ohsix> it often doesn't know until it tries to apply the configuration
[12:16:28] <ohsix> it knows lists of possible parameters, but not what combination of them is valid
[12:16:39] <mru> so why can't I get those lists?
[12:16:46] <mru> it would reduce the combinations for me to try
[12:17:05] <ohsix> you can, but you will still have to try and configure the pcm to know if the config is valid; and you're at no better a position than when you didn't know any of that information
[12:17:17] <mru> v4l2 can enumerate supported pixel formats and resolutions
[12:17:22] <mru> all with a simple ioctl
[12:17:46] <ohsix> v4l doesn't have to try a disparate set of configuration bits to see if it makes a valid hardware configuration, it can list them
[12:17:49] <mru> if I know the hw doesn't support some particular sample rate _at all_ there's no point trying that one
[12:17:57] <ohsix> right
[12:18:10] <mru> same for the other parameters
[12:18:39] <ohsix> but the point is your software should be able to handle a huge range of sample formats if its going to resample itself; you would pick what you prefer first
[12:19:11] <ohsix> sample formats/rates
[12:19:25] <mru> I prefer to not resample at all
[12:20:01] <ohsix> the order that i'd probably do it was to check the bit depth i want first; then the sampling rate, and i'd only have a few candidates to check; and thats if i wasn't just going to use 44.1khz and 16bit, and if i didn't mind plughw doing the connecting for me
[12:20:14] <ohsix> of course; then you'd just generate at the rate you find acceptable
[12:20:27] <ohsix> or the only rate possible for your app
[12:20:44] <ohsix> its not a significant restriction; and it easily covers invalid configurations cleanly
[12:20:55] <mru> I mind plughw, ok
[12:21:11] <ohsix> ok, people aren't opening hw: though, my grandma isn't
[12:21:27] <mru> your grandma is irrelevant
[12:21:33] <mru> she's not writing alsa apps
[12:21:35] <ohsix> i have one piece of software on my machine that might rightfully open hw: and mind plughw, and that's jack
[12:21:56] <mru> anything that cares about quality should stay well away from plughw
[12:22:07] <ohsix> quality is subjective
[12:22:21] <ohsix> resampling is objectively bad, for sure
[12:22:29] <ohsix> but the common case is there is no resampling
[12:22:40] <ohsix> and if plughw is in play its just shuffling stuff around
[12:22:59] <mru> resampling is always worse than not resampling
[12:23:13] <mru> and plughw resampling is worse than doing it yourself with a proper resampler
[12:23:27] <ohsix> yes, but whether the output is acceptable and fit for purpose is subjective
[12:23:45] <ohsix> i appreciate that you don't want to
[12:23:58] <ohsix> and thats a matter of just not opening plughw:hw, but opening hw:
[12:24:26] <ohsix> everything i use here commonly uses default: and there is no resampling
[12:26:01] <ohsix> the set-and-check api is cumbersome but it papers over well the fact that the expression space for the configuration can generate a _lot_ of invalid configurations
[12:26:17] <mru> I don't want problems papered-over
[12:26:20] <mru> I want them solved
[12:26:26] <ohsix> its not papered over
[12:26:30] <mru> you said so
[12:26:34] <ohsix> you try some parameters, you know if they are valid
[12:26:46] <ohsix> the interface simply makes it less ugly, thats all
[12:27:12] <ohsix> knowing all the knobs in the configuration space will not tell you which are valid until they are set; thats the disconnect that makes it a useful method
[12:27:36] <mru> knowing the valid values of each upfront drastically reduces the search space
[12:27:43] <ohsix> it might not even be known which configurations are valid
[12:27:45] <mru> particularly if I have additional constraints
[12:27:50] <ohsix> sure; but practically speaking, you do
[12:28:11] <ohsix> 24 bit, 16 bit, 32khz, 44.1, 48, 96, 192
[12:28:45] <ohsix> there are cards that'll do fractional sample rates too; they're represented by a range of possible sample rates instead of discrete steps
[12:28:53] <mru> yes
[12:28:57] <mru> so tell me what they are
[12:29:03] <ohsix> i think simply asking for what you want and likely receiving it is good enough
[12:29:14] <mru> I'm telling you it's not
[12:29:30] <mru> what I want depends on what I can have
[12:29:35] <ohsix> not all configurations are valid; and you don't know until they're set
[12:29:48] <mru> do you know why restaurants have a menu?
[12:29:49] <ohsix> you ask for the best that you can get first, why would you ask for any less?
[12:30:04] <mru> it's to save you time asking for dish after dish until you find one they'll cook
[12:30:07] <ohsix> restaurants don't sell a wide range of discrete numbers
[12:30:23] <ohsix> and i haven't been to one that wont cook off the menu
[12:30:27] <ohsix> heh
[12:30:31] <twnqx> Oo
[12:30:34] <twnqx> not off the menu?
[12:30:35] <mru> only with the ingredients they have
[12:30:41] <twnqx> even mcdonalds offers that
[12:31:01] <twnqx> "one hamburger without pickles please"
[12:31:06] <mru> any decent place will customise the food of course
[12:31:10] <ohsix> twnqx: i mean for like; asking for fried eggs at a place that doesn't serve breakfast
[12:31:32] <mru> if they have eggs they might do it
[12:31:38] <mru> if they don't have any eggs, tough
[12:31:43] <ohsix> they will do it, in my experience
[12:31:53] <mru> only if they have eggs
[12:32:17] <ohsix> mru: when i sit down the waitress doesn't exhaustively describe the ingredients they have available, and she doesn't say how the person cooking is willing or able to combine them
[12:32:18] <mru> (most places probably have eggs, but it serves as an example)
[12:32:23] <ohsix> the analogy is beyond strained
[12:32:45] <mru> of course she doesn't list all the ingredients available
[12:32:49] <ohsix> i ask for fried eggs regardless; she says if the cook can do it
[12:32:49] <mru> that's why you're given the menu
[12:33:16] <ohsix> the menu does describe valid combinations of food configuration; yes
[12:33:40] <ohsix> how would you express these valid combinations for hardware if you don't know they're valid until they're applied?
[12:34:15] <ohsix> and i'm not just talking about in alsa, i'm talking about the driver literally not knowing until the device gives them a go/no-go for validity
[12:35:10] <ohsix> perhaps overclocking is an appropriate analogy; you have a search space and you have no way to know if its valid for any configuration until you try it; and test it to see if it is fit for purpose
[12:35:28] <ohsix> but you do know that the default clocks will work
[12:35:47] <ohsix> 44.1 16bit is not unlike a default clock, and a good thing to try
[12:35:55] <mru> I still don't see why it's so damn hard to tell me what the supported sample rates are
[12:36:12] <ohsix> because it is not all the information you need to build a configuration
[12:36:21] <mru> no, but it's helpful
[12:36:22] <ohsix> take 192khz, say alsa told you the device could do it
[12:36:44] <ohsix> it doesn't tell you that its only valid with 16bit samples; but the device also does 24bit samples
[12:36:59] <ohsix> and indeed the driver may not know until it tries to set the configuration
[12:37:19] <mru> suppose instead the device only goes up to 96kHz
[12:37:25] <mru> then I'd know there's no point attempting 192
[12:37:47] <ohsix> sure; but software that needs 192khz is rare anyways
[12:38:01] <mru> ?????
[12:38:07] <ohsix> you might say for any given widget, trying anything other than 44.1 is silly
[12:38:16] <mru> why would I say that?
[12:38:34] <pross-au> thats just crap
[12:38:35] <ohsix> you just might; given you might not try 192khz anyways when the device only does 96khz
[12:38:58] <mru> is it just me, or is ohsix making even less sense than usual?
[12:39:10] <ohsix> but all you can do is ask if your configuration is valid; the driver may not know until it is applied, the driver has no a priori knowledge that can help you improve that process
[12:39:23] <mru> sure it does
[12:39:31] <ohsix> heh
[12:39:31] <pross-au> devices that have the capacity to list their supported sample rates, should be afforded an userland api to access those rates
[12:39:48] <mru> exactly
[12:39:50] <ohsix> pross-au: then it should tell you which configurations are possible, right?
[12:40:20] <ohsix> the problem there is the _driver may not know_, until the parameters are tried
[12:40:21] * mru is reminded of EDID
[12:40:22] <kshishkov> pross-au: there was such thing for OSS
[12:40:28] <pross-au> ohsix: yes
[12:40:43] <pross-au> ohsix: then have a flag to indicate that
[12:40:48] <mru> reducing the number of combinations to try is always a good thing
[12:40:59] <ohsix> so should the driver do the search for valid configurations, at boot time? instead of just trying to set the pcm config to what you're asking for?
[12:41:06] <ohsix> of course it is i'm not discounting that
[12:41:07] <mru> now suppose I want to maximise bitdepth at all cost
[12:41:19] <pross-au> ohsix: of course not
[12:41:23] <ohsix> but you most likely will try one config and it'll work
[12:41:24] <mru> then I'll query the supported bitdepths, choose the highest one, then search for a sample rate that supports it
[12:41:35] <mru> or similarly for any other parameter
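The strategy mru describes (use per-parameter lists to narrow the search, then still test each combination) can be sketched as follows. This is only an illustration under stated assumptions: the capability arrays, `hw_accepts` and `pick_best_config` are hypothetical stand-ins for whatever a driver might report, not ALSA calls, and the simulated card follows ohsix's example of a device that does 192 kHz only at 16 bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-parameter capability lists, sorted ascending.
 * Illustrative stand-ins only; these are not part of any real audio API. */
static const unsigned supported_bits[]  = { 16, 24 };
static const unsigned supported_rates[] = { 44100, 48000, 96000, 192000 };

/* Simulated set-and-check: this imaginary card does 192 kHz only at 16 bit,
 * so a valid depth plus a valid rate is not always a valid combination. */
static bool hw_accepts(unsigned bits, unsigned rate)
{
    if (rate == 192000 && bits != 16)
        return false;
    return true;
}

struct pcm_config { unsigned bits, rate; bool found; };

/* Maximise bit depth first, then take the highest rate that works with it.
 * Only values from the lists are tried, so the search stays small. */
static struct pcm_config pick_best_config(void)
{
    struct pcm_config best = { 0, 0, false };
    size_t nb = sizeof supported_bits / sizeof supported_bits[0];
    size_t nr = sizeof supported_rates / sizeof supported_rates[0];

    for (size_t b = nb; b-- > 0; ) {        /* highest depth first */
        for (size_t r = nr; r-- > 0; ) {    /* highest rate first */
            if (hw_accepts(supported_bits[b], supported_rates[r])) {
                best.bits  = supported_bits[b];
                best.rate  = supported_rates[r];
                best.found = true;
                return best;
            }
        }
    }
    return best;
}
```

With these made-up lists the search settles on 24 bit at 96 kHz after one rejected attempt, which is the kind of saving both sides of the argument agree the lists would give.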
[12:41:46] <ohsix> mru: right; that's what i'd do, and said as much earlier
[12:41:53] <mru> but that's impossible
[12:42:08] <ohsix> _if_ i had to; practically speaking most of the time i'm gonna want 44.1khz, 16bit
[12:42:17] <mru> since you refuse to tell me the supported rates and bitdepths
[12:42:37] <mru> what you might want most of the time has zero relevance here
[12:42:46] <ohsix> i'd pick 24, then 192, works? no, 96k, works? no, 48k, works? yes but i'd have to resample, 44.1? good! it works
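The set-and-check walk ohsix describes amounts to trying a preference-ordered list until one entry is accepted. A minimal sketch, assuming a hypothetical `probe_fn` callback that wraps whatever "try to configure the device" means in practice; `demo_probe` and `demo_prefs` are invented for illustration and do not describe a real card.

```c
#include <stdbool.h>
#include <stddef.h>

struct candidate { unsigned bits; unsigned rate; };

/* Hypothetical probe: returns true if the device accepts this setup. */
typedef bool (*probe_fn)(unsigned bits, unsigned rate);

/* Walk the caller's preference list in order and return the index of the
 * first accepted candidate, or -1 if none is accepted. */
static int first_accepted(const struct candidate *c, size_t n, probe_fn probe)
{
    for (size_t i = 0; i < n; i++)
        if (probe(c[i].bits, c[i].rate))
            return (int)i;
    return -1;
}

/* Simulated device for illustration: 16 or 24 bit, 44.1 or 48 kHz only. */
static bool demo_probe(unsigned bits, unsigned rate)
{
    return (bits == 16 || bits == 24) && (rate == 44100 || rate == 48000);
}

/* Preference order from the discussion: 24 bit, rates from high to low. */
static const struct candidate demo_prefs[] = {
    { 24, 192000 }, { 24, 96000 }, { 24, 48000 }, { 24, 44100 },
};
```

Calling `first_accepted(demo_prefs, 4, demo_probe)` returns index 2 (24 bit, 48 kHz) for this simulated device. An application playing 44.1 kHz material would presumably order its list differently so that the no-resampling candidate comes first.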
[12:42:59] <ohsix> it just speaks to common usage
[12:43:28] <ohsix> a _user_ that might want to know the search space of a device without exhaustively trying every combination will look at his chips datasheet, or /proc/asound/*
[12:43:33] <mru> now you missed both 32 and float
[12:43:43] <ohsix> i don't need 32 or float
[12:43:55] <mru> you != everybody
[12:44:07] <ohsix> and people that do need it start with it, so?
[12:44:22] <mru> should my app google for the datasheet and parse it?
[12:44:46] <ohsix> thats why "common use" is important, 44.1khz 16bit is pretty much good to go, first try; huge possible search space but they nailed it on the first try
[12:45:00] <mru> you're impossible
[12:45:05] <ohsix> heh
[12:45:20] <ohsix> its not a bad interface in light of possible bad configurations
[12:45:27] <mru> wtf
[12:45:31] <pross-au> what if 44.1khz 16bit is not supported
[12:45:46] <ohsix> if all drivers could know all possible configurations, then it would be reasonable just to iterate over all possible configurations
[12:45:47] <pross-au> are you saying EVERY userland app has to perform its own detection
[12:45:54] <pross-au> imho, that's the job of the audio subsystem
[12:46:22] <ohsix> yes; if they want to use hw:, they're using the hardware, if they don't want to know or care, they use default, which might resample or do other magical things
[12:46:46] <ohsix> the "audio subsystem" will be fine if you aren't concerned with lording over hw:, it will do what you ask in the manner that it can
[12:46:52] <mru> how can anyone so totally miss the point?
[12:47:15] <ohsix> you understand there are invalid configurations yes?
[12:47:25] <mru> of course
[12:47:29] <ohsix> you understand the driver might not know what those are until they are set on the device
[12:47:35] <pross-au> depends. i've seen devices freeze when given invalid configurations
[12:47:45] <mru> but you're refusing to give me so much as a hint on what might be valid
[12:48:33] <ohsix> sure; i'm just saying in light of invalid configurations its kind of meaningless; you know the spectrum of sample rates and bit depths your software might be considered for, and you'd simply try and use them
[12:49:02] <ohsix> most bad software just gives you a place to put in the sample rates and buffer sizes/periods; and they aren't even trying to be clever
[12:49:10] <mru> what if my software works with _any_ configuration?
[12:49:28] <twnqx> then it's well behaving software :P
[12:49:30] <pross-au> thank god they got berkeley sockets right the first time
[12:49:46] <ohsix> then certainly one configuration would be preferable over all of them? be it 192khz and 24bit; or 44.1 16bit; you'd pick the one that is preferable to the application you are writing
[12:49:47] <mru> so alsa is designed for crap software?
[12:49:59] <mru> why would one be preferred?
[12:50:19] <ohsix> why wouldn't it? if you're going to search the entire config space, you're not looking for a preferred format?
[12:50:20] <pross-au> my sb card prefers 8-bit 8000Hz
[12:50:30] <ohsix> if you're not looking for something then why are you searching
[12:50:30] <mru> for playback of files, sure whatever is in the file is the preferred setting
[12:51:06] <mru> if I can synthesise whatever I want, I'll want to choose something the hw supports
[12:51:18] <mru> probably the best it can support according to some metric
[12:51:33] <ohsix> right; so test the configuration in order of most preferred to least preferred, suitable for your application
[12:51:38] <pross-au> so ffplaying 8khz game samples, you suggest i resample them to 44.1khz, then have the audio driver re-resample back to 8khz.
[12:51:55] <ohsix> as i stated; i'd go with bit depth first, but it can be any criteria
[12:51:56] <mru> how do I test all possible configurations?
[12:52:07] <mru> there are billions
[12:52:16] <ohsix> you don't test all possible configurations, and there aren't billions
[12:52:37] <ohsix> pross-au: driver doesn't resample anything
[12:53:29] <ohsix> if you were playing a file the first format you would try would probably be without any conversion of any sort; and you'd try 8khz first in that case
[12:53:46] <mru> forget that case
[12:53:55] <mru> suppose I can generate _any_ format
[12:54:09] <mru> fine, let's limit it a bit
[12:54:13] <ohsix> you can; but hardware doesn't play any format
[12:54:19] <pross-au> pcskr cool
[12:54:27] <mru> up to 64 channels, up to 32 bits or float, up to 512kHz
[12:54:36] <mru> now where do I start?
[12:54:56] <mru> so how do I find the best format the hardware does support?
[12:55:17] <ohsix> i get your point, but i also know you don't have to ever conduct an exhaustive search of possible configurations, and if you did, by some chance; you'd sooner calculate it by reading the spec sheet rather than exercising it with alsa, something that might not even explore some dimensions of the device
[12:55:26] <merbzt> I have 11111Hz 8bit wav files, how should I do ?
[12:55:56] <mru> so you're saying my app needs to include datasheets for all sound cards?
[12:56:18] <KotH> can someone tell me what all this noise is about? or where the discussion started?
[12:56:19] <kshishkov> mru: precisely! And don't ask a driver for model name too.
[12:56:24] <ohsix> heh if a device has 64 channels alsa is going to split them up in a manner where you'd know there were 64 (maybe not 64 that went together, but 64 something)
[12:56:28] <mru> KotH: someone said alsa sucks
[12:56:36] <kshishkov> someone?
[12:56:37] <KotH> mru: true things are true
[12:57:19] <pross-au> other than linux, who else uses alsa?
[12:57:26] <ohsix> merbzt: try setting it, if its not one of the hw that can do a continuous range you use your resampler to 12khz or something that gives you a nice integer ratio of expansion for a cheap resample
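How "nice" a resampling ratio is can be checked by reducing the two rates to lowest terms. The sketch below uses hypothetical helper names (`gcd_u`, `resample_ratio`) and is only meant to illustrate the arithmetic; it says nothing about which resampler to use.

```c
/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

struct ratio { unsigned up, down; };

/* Reduce out_rate / in_rate to lowest terms: interpolate by `up`,
 * decimate by `down`. Both rates must be nonzero. */
static struct ratio resample_ratio(unsigned in_rate, unsigned out_rate)
{
    unsigned g = gcd_u(in_rate, out_rate);
    struct ratio r = { out_rate / g, in_rate / g };
    return r;
}
```

For example, `resample_ratio(11111, 22222)` reduces to 2/1, and `resample_ratio(8000, 44100)` to 441/80. By contrast, 11111 and 12000 share no common factor, so that pair stays at 12000/11111.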
[12:58:12] <merbzt> still 8 bits ?
[12:58:13] <pross-au> quit <N/C>
[12:58:21] <ohsix> ya still 8 bits
[12:58:51] <ohsix> 8 bits isn't all that unusual, if you had a weird sample rate like that you'd probably adjust sample rate first
[12:59:11] <mru> not if the hw supports it
[12:59:44] <ohsix> which would mean the first attempt at setting a device configuration would succeed
[12:59:45] <kshishkov> merbzt: dig out original SoundBlaster and enjoy
[13:00:18] <ohsix> i thought i said as much in the original reply :]
[13:01:24] <ohsix> mru: fwiw i've worked with a lot of interfaces like that; maybe it sucks, maybe it doesn't, but when you have 2 phase initialization theres not much you can do sometimes, but try a configuration and fail, generally you know within a very small window what is acceptable before outright failure is an option
[13:01:58] <merbzt> we want the pony
[13:02:10] <merbzt> not just everything, a pony also
[13:02:38] <ohsix> zomg ponies
[13:03:13] <ohsix> its like on old vga hardware, only some register settings got you a picture; or a monitor that caught fire :]
[13:03:27] <ohsix> vga hardware isn't going to tell you how to light a given monitor on fire
[13:04:14] <ohsix> but if you know the model monitor you can make an educated guess, if that was your goal (i think it was horizontal deflection, later models would just shut off though)
[13:05:15] <ohsix> the problem was the juice for the horizontal deflection would cause a secondary ringing in the flyback; that amounted to a large voltage flux that wasn't clamped in any meaningful way
[13:12:50] <mru> monitors tell you what they support
[13:12:53] <mru> it's called edid
[13:14:17] <ohsix> heh, not when you could set them on fire :] (and you can still send them invalid signals)
[13:19:59] <ohsix> morning BBB
[13:20:06] <BBB> howdy
[13:24:37] <ohsix> mru: the invalid configuration stuff is even more important with chained elements, i know you think that part shouldn't exist; but it constrains valid configurations even further, to a subset of the element and what it is slaved to
[14:39:22] <censor> hi all
[14:39:53] <censor> i just read that 0.6 was released, with RTMP support - does that include RTMP broadcast/publishing, how you need it for Akamai live streaming, and if that's not the case, is it planned to support it?
[14:39:57] <mru> we don't want censorship here
[14:40:40] <censor> me neither ;)
[14:41:00] <censor> look up the egyptian explanation of the name...
[14:41:53] <wbs> censor: yes, it should support broadcast/publishing, both using the lavf-internal rtmp protocol and using librtmp
[14:42:09] <wbs> whether it actually works with all rtmp servers is another question though
[14:42:13] <censor> great news, thanks!
[14:42:27] <censor> well, at least it's worth trying =)
[15:48:23] <BBB> how do I know in what CPU version a particular instruction is available?
[15:54:59] <kshishkov> look at the reference
[15:55:53] <kshishkov> each CPU supports whole instruction sets like SSE, not "Intel Pentium III rev 2, now with movupd"
[15:56:37] <BBB> I know that, but the intel manual doesn't say whether each instruction belongs to "mmx", "sse2", or whatever
[15:56:49] <BBB> and most useful instructions are ssse3, which my cpu doesn't have ;(
[15:57:52] * kshishkov usually can check that in NASM docs
[15:57:59] <kshishkov> appendix B
[15:58:15] <BBB> ok
[16:05:15] <pengvado> intel manual does say, but not in a concise way
[16:06:09] <pengvado> e.g. SSE instructions will say "If CPUID.01H:EDX.SSE[bit 25] = 0" in the possible exceptions table.
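The CPUID feature bits pengvado refers to can be decoded with a few bit tests. A minimal sketch: the struct and function names are hypothetical, and the caller is assumed to have obtained the ECX and EDX values from CPUID leaf 01H by some other means.

```c
#include <stdbool.h>
#include <stdint.h>

/* SIMD feature flags reported by CPUID leaf 01H. */
struct simd_features { bool mmx, sse, sse2, sse3, ssse3, sse41; };

static bool bit_set(uint32_t reg, unsigned bit)
{
    return ((reg >> bit) & 1u) != 0;
}

/* Decode the ECX/EDX values returned by CPUID with EAX = 1. */
static struct simd_features decode_cpuid1(uint32_t ecx, uint32_t edx)
{
    struct simd_features f;
    f.mmx   = bit_set(edx, 23);
    f.sse   = bit_set(edx, 25);   /* CPUID.01H:EDX.SSE[bit 25] */
    f.sse2  = bit_set(edx, 26);
    f.sse3  = bit_set(ecx, 0);
    f.ssse3 = bit_set(ecx, 9);
    f.sse41 = bit_set(ecx, 19);
    return f;
}
```

On x86 with GCC or Clang, the register values could come from `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` in `<cpuid.h>`; that call is left out here so the sketch stays portable.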
[16:08:31] <Dark_Shikari> mru: http://games.venturebeat.com/2010/06/17/gaikai-signs-ea-as-digital-distribu…
[16:15:58] <av500> Dark_Shikari: ping
[16:16:03] <BBB> I need a lookup cheatsheet where I type in what I want my favourite instruction to do, and it'll tell me what's the closest instruction that matches that description
[16:16:15] <Dark_Shikari> av500: ping
[16:16:21] <av500> Dark_Shikari: see pm
[16:16:45] <mru> BBB: such a thing exists
[16:16:49] <mru> BBB: it's called brain
[16:16:53] <mru> needs some training though
[16:16:59] <BBB> heh :) yeah thanks
[16:16:59] <av500> mru: where can I download one?
[16:17:46] <Dark_Shikari> av500: yes
[16:29:06] <BBB> dark_shikari: ok, so I'm trying something simple, a horizontal-only 4-tap subpel MC in a 4x4 block (in plain mmx, as a start, does that make sense?)... pmaddwd isn't terribly useful, is it? I'm trying to multiply all four pixels that are used to calculate the destination pixel at once, but pmaddwd gives me the result as two dwords in a mm register, and I can't figure out how to add them.. is it easier to multiply "the whole row at once" for the first tap, then again for the second, and then use paddsb to add it up and write out the whole row at once?
[16:29:32] <Dark_Shikari> you use paddw not paddsb
[16:29:40] <Dark_Shikari> then psraw when you're done
[16:29:41] <Dark_Shikari> then packuswb
[16:29:46] <BBB> don't I have to clip then?
[16:29:50] <Dark_Shikari> saturate
[16:29:52] <Dark_Shikari> that's what packuswb does
[16:30:01] <BBB> oh ok, I thought paddsb would do that
[16:30:09] <BBB> so pmaddwd is useless?
[16:30:27] <Dark_Shikari> er... but that's not how it works
[16:30:28] <Dark_Shikari> it's
[16:30:34] <Dark_Shikari> (A+B+C+D... + round)>>X
[16:30:38] <Dark_Shikari> you can't saturate until after the >> X
[16:30:52] <BBB> oh right
[16:31:16] <Dark_Shikari> pmaddwd is useful here
[16:31:37] <Dark_Shikari> suppose you need to calculate A*src[-1] + B * src[0] + C * src[1] + D * src[2] for each pixel.
[16:32:03] <Dark_Shikari> you create a global constant xmm7 = {A,B,A,B}, signed words
[16:32:12] <Dark_Shikari> you create a global constant xmm6 = {C,D,C,D}, signed words
[16:32:18] <Dark_Shikari> er, I mean, mm7/mm6
[16:32:22] <Dark_Shikari> not xmm since you're doing mmx.
[16:32:28] <Dark_Shikari> movq mm0, [src-1]
[16:32:30] <Dark_Shikari> movq mm1, [src]
[16:32:34] <Dark_Shikari> movq mm2, [src+1]
[16:32:37] <Dark_Shikari> movq mm3, [src+2]
[16:32:41] <Dark_Shikari> punpcklbw mm0, mm1
[16:32:44] <Dark_Shikari> punpcklbw mm2, mm3
[16:32:49] <Dark_Shikari> pmaddwd mm0, mm7
[16:32:54] <Dark_Shikari> pmaddwd mm2, mm6
[16:33:07] <Dark_Shikari> paddw mm0, mm2
[16:33:15] <BBB> wait wait, you shouldn't tell me the solution :-p
[16:33:17] <Dark_Shikari> paddw mm0, ROUND
[16:33:20] <Dark_Shikari> psrlw mm0, SHIFT
[16:33:23] <BBB> that way I'll never get it ;)
[16:33:26] <Dark_Shikari> etc
[16:33:28] <Dark_Shikari> =p
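A scalar reference for the filter being vectorised here, `(A*src[-1] + B*src[0] + C*src[1] + D*src[2] + round) >> X` followed by saturation to a byte, is handy for checking any SIMD version against. This is a sketch under assumptions: the function names are hypothetical and the tap values and shift are supplied by the caller, since the log does not give them.

```c
#include <stdint.h>

/* Saturate to the 0..255 byte range, as packuswb would. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Horizontal 4-tap filter, one output per input position.
 * src needs 1 readable pixel before and 2 after the `width` positions.
 * The >> on a negative sum relies on arithmetic shift, which is what
 * common compilers do but is implementation-defined in C. */
static void filter4_h_ref(uint8_t *dst, const uint8_t *src, int width,
                          const int16_t taps[4], int shift)
{
    int round = shift > 0 ? 1 << (shift - 1) : 0;

    for (int x = 0; x < width; x++) {
        int sum = taps[0] * src[x - 1] + taps[1] * src[x]
                + taps[2] * src[x + 1] + taps[3] * src[x + 2];
        /* Saturate only after rounding and shifting. */
        dst[x] = clip_u8((sum + round) >> shift);
    }
}
```

Note that the sums here are 32-bit, matching the dword results of pmaddwd; the add and shift that follow pmaddwd therefore operate on dwords as well.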
[16:34:51] <BBB> so why would you interleave mm0 and mm1? to make words, shouldn't you interleave with zero before the pmaddwd?
[16:34:59] <BBB> there's probably magic there, but what is your magic? :)
[16:35:05] <Dark_Shikari> Oh, yeah, I screwed up
[16:35:10] <Dark_Shikari> it should be both.
[16:35:33] <Dark_Shikari> 1) interleave with zero, 2) interleave with each other
[16:35:42] <BBB> why interleave with each other?
[16:35:43] <Dark_Shikari> thus it'd be movd instead of movq
[16:35:47] <Dark_Shikari> here's why
[16:35:52] <Dark_Shikari> you could just interleave with zero and use pmullw
[16:35:58] <Dark_Shikari> But that way you need _4_ registers with constants
[16:36:00] <Dark_Shikari> A, B, C, D.
[16:36:13] <Dark_Shikari> If you interleave, you just need {A,B,A,B} and {C,D,C,D}
[16:36:28] <Dark_Shikari> i.e. 2 pixels from each of two sources in each register
[16:36:36] <Dark_Shikari> You will be register-strapped here.
[16:36:54] <Dark_Shikari> You might want to just do the naive way first.
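The interleaving trick can be modelled in plain C to see why two constant registers suffice. This is a scalar model only, with hypothetical function names; it assumes the pixels have already been widened from bytes to words (the "interleave with zero" step) before being interleaved with each other.

```c
#include <stdint.h>

/* Scalar model of MMX pmaddwd: multiply four signed 16-bit lanes pairwise
 * and add adjacent products, giving two signed 32-bit results. */
static void pmaddwd_model(int32_t out[2], const int16_t a[4], const int16_t b[4])
{
    out[0] = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];
    out[1] = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}

/* Unrounded 4-tap sums for the two pixels at src[0] and src[1].
 * Needs src[-1] .. src[3] readable. */
static void filter4_two_pixels(int32_t sum[2], const uint8_t *src,
                               const int16_t taps[4])
{
    /* Widened rows [src-1] and [src], interleaved word by word. */
    const int16_t lo[4] = { src[-1], src[0], src[0], src[1] };
    /* Widened rows [src+1] and [src+2], interleaved word by word. */
    const int16_t hi[4] = { src[1], src[2], src[2], src[3] };
    const int16_t ab[4] = { taps[0], taps[1], taps[0], taps[1] }; /* {A,B,A,B} */
    const int16_t cd[4] = { taps[2], taps[3], taps[2], taps[3] }; /* {C,D,C,D} */
    int32_t p0[2], p1[2];

    pmaddwd_model(p0, lo, ab);  /* A*s[-1]+B*s[0], A*s[0]+B*s[1] */
    pmaddwd_model(p1, hi, cd);  /* C*s[1]+D*s[2],  C*s[2]+D*s[3] */
    sum[0] = p0[0] + p1[0];     /* dword add of the two halves */
    sum[1] = p0[1] + p1[1];
}
```

Each pmaddwd yields half of the tap sum for two neighbouring pixels, so one dword add completes both pixels, while the pmullw route would need a separate constant register per tap.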
[16:37:09] <BBB> oh, and then pmaddwd autoadds them halfly after mul, paddd adds them for the second half
[16:37:18] <BBB> and I did two pixels in the row at once
[16:37:21] <BBB> I think I get it
[16:37:34] <BBB> that makes sense
[16:37:37] <Dark_Shikari> It's probably easier to start with pmullw.
[16:37:40] <Dark_Shikari> But you can try both.
[16:37:46] <Dark_Shikari> My method will do 2 pixels at once
[16:37:50] <BBB> I'll try, I can always throw it out :)
[16:37:52] <Dark_Shikari> so you'd do 2 pixels, then another 2 pixels
[16:37:58] <BBB> right, and then next row
[16:38:01] <Dark_Shikari> then punpckldq
[16:38:04] <Dark_Shikari> to get 4 pixels
[16:38:06] <Dark_Shikari> then packuswb
[16:38:09] <BBB> should I code loops in asm, or just a macro-loop as you explained yesterday?
[16:38:56] <Dark_Shikari> you should generally not needlessly unroll code _unless_ it lets you save ops
[16:39:06] <Dark_Shikari> for example, you can unroll by a factor of two to get a packuswb in there
[16:39:13] <Dark_Shikari> because packuswb lets you do two things at once
[16:39:14] <Dark_Shikari> e.g.
[16:39:20] <Dark_Shikari> packuswb 0A0B, 0C0D
[16:39:21] <Dark_Shikari> gives you
[16:39:23] <Dark_Shikari> ABCD
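What packuswb does to its two operands can be modelled in scalar C. The function names below are hypothetical; the model covers the 64-bit MMX form, where each operand holds four signed words.

```c
#include <stdint.h>

/* Unsigned saturation of one signed word to a byte. */
static uint8_t sat_u8(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Scalar model of MMX packuswb: the four words of the first operand fill
 * the low four bytes of the result, the four words of the second operand
 * fill the high four bytes, each saturated to 0..255. */
static void packuswb_model(uint8_t out[8], const int16_t a[4], const int16_t b[4])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_u8(a[i]);
        out[i + 4] = sat_u8(b[i]);
    }
}
```

Because one instruction consumes two registers of words, unrolling by two lets a single packuswb both clip and narrow two batches of results.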
[16:40:29] * BBB tries the two-pixel approach
[16:40:39] <BBB> don't worry, I'll ask more silly questions
[16:40:44] <BBB> I'll get this one day
[17:32:34] <av500> hmm, these pesky ffdevs stand in the way of progress always :)
[17:33:00] <elenril> ffprogress!
[17:34:27] <mru> I'm a bit puzzled
[17:34:35] <mru> I want the patch applied, michael says no
[17:34:40] <mru> yet I'm the one blocking something
[17:34:45] <mru> I just don't get it
[17:34:51] <av500> i meant the vp8 thread :)
[17:35:09] <mru> s/michael/google/
[17:39:06] <av500> but then, gg is present in both threads...
[17:44:53] <av500> [amrnb @ 0x8b94160]dtx mode not implemented
[17:47:19] <av500> is amrnb now over? since we have libopencore-amr?
[17:47:48] <kshishkov> on the contrary
[17:47:52] <mru> your troll attempt is far too obvious
[17:48:27] <kshishkov> av500: feel free to yell at superdump to make him implement DTX
[17:48:43] * av500 yells at superdump to make him implement DTX
[17:48:55] <kshishkov> av500: though he's not related to TI it should make you feel better
[17:49:18] <av500> kshishkov: you have very wrong picture of me
[17:49:43] * mru has a big picture
[17:50:11] <kshishkov> av500: should I look in the mirror and compare?
[18:08:06] <BBB> Dark_Shikari: is there a packusdw-like instruction?
[18:08:14] <BBB> or one without clipping is fine also
[18:09:00] <Dark_Shikari> yes.... in sse4. but why do you need it?
[18:09:22] <Dark_Shikari> packssdw is fine because your 32-bit values will never even be larger than 16-bit
[18:09:56] <BBB> true
[18:10:47] <Dark_Shikari> so in short, here's how I'd do it in mmx
[18:11:04] <Dark_Shikari> 2 sets of pmaddwd stuff -> packssdw -> 4 pixels as words
[18:11:11] <Dark_Shikari> add results -> 4 final pixels as words
[18:11:20] <Dark_Shikari> only add _after_ packing to words because it's one less op (one add instead of two)
[18:11:28] <Dark_Shikari> then, repeat that step twice, so you have two sets of 4 pixels
[18:11:31] <Dark_Shikari> then packuswb
[18:11:32] <Dark_Shikari> then store
[18:11:37] <Dark_Shikari> thus, 4 sets of pmaddwd -> 8 output pixels
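The pack-then-add ordering described above can be modelled in scalar C for one batch of four pixels. This is a sketch under assumptions: the names are hypothetical, rounding, shifting and the final packuswb are left out, and the partial sums are assumed to fit in 16 bits, which depends on the tap magnitudes.

```c
#include <stdint.h>

static int16_t sat_s16(int32_t v)
{
    return v < -32768 ? -32768 : v > 32767 ? 32767 : (int16_t)v;
}

/* Model of packssdw: two dwords from a, then two from b, each saturated
 * to a signed word. */
static void packssdw_model(int16_t out[4], const int32_t a[2], const int32_t b[2])
{
    out[0] = sat_s16(a[0]);
    out[1] = sat_s16(a[1]);
    out[2] = sat_s16(b[0]);
    out[3] = sat_s16(b[1]);
}

/* One dword lane of pmaddwd: c0*p0 + c1*p1. */
static int32_t madd_pair(int16_t c0, int16_t c1, uint8_t p0, uint8_t p1)
{
    return (int32_t)c0 * p0 + (int32_t)c1 * p1;
}

/* Four unrounded filter sums as words. Needs src[-1] .. src[5] readable. */
static void filter4_four_words(int16_t out[4], const uint8_t *src,
                               const int16_t taps[4])
{
    int32_t ab01[2], ab23[2], cd01[2], cd23[2];
    int16_t ab[4], cd[4];

    for (int i = 0; i < 2; i++) {
        /* {A,B} halves for pixels i and i+2 */
        ab01[i] = madd_pair(taps[0], taps[1], src[i - 1], src[i]);
        ab23[i] = madd_pair(taps[0], taps[1], src[i + 1], src[i + 2]);
        /* {C,D} halves for pixels i and i+2 */
        cd01[i] = madd_pair(taps[2], taps[3], src[i + 1], src[i + 2]);
        cd23[i] = madd_pair(taps[2], taps[3], src[i + 3], src[i + 4]);
    }
    packssdw_model(ab, ab01, ab23);   /* four {A,B} partial sums as words */
    packssdw_model(cd, cd01, cd23);   /* four {C,D} partial sums as words */
    for (int i = 0; i < 4; i++)
        out[i] = (int16_t)(ab[i] + cd[i]);  /* one word add for four pixels */
}
```

Adding after the pack costs one word add for four pixels, where adding the dword halves first would cost two adds before the pack.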
[18:17:31] <BBB> 2 sets of pmaddwd only multiply 8 numbers, for a four-tap filter that's 2 pixels, not 4
[18:17:32] <BBB> ?
[18:17:42] <BBB> the rest I understand
[18:17:47] <BBB> I have a function that's almost there
[18:17:54] <BBB> only need to add the filter constants somewhere now
[18:17:57] <Dark_Shikari> "one set of pmaddwd" calculates 2 pixels
[18:17:58] <BBB> and test and see how often it crashes
[18:18:10] <Dark_Shikari> "one set of pmaddwd" is two pmaddwds and associated support code
[18:18:18] <BBB> ok, I got it then
[18:18:24] <BBB> I think I have the code you're thinking of
[18:18:26] <Dark_Shikari> so basically it's a branch out, then branch in
[18:18:29] <BBB> let me test it :)
[18:18:29] <Dark_Shikari> load 8 pixels from each source
[18:18:37] <Dark_Shikari> split into 4 sets of pmaddwd
[18:18:39] <Dark_Shikari> combine it back together.
[18:18:44] <Dark_Shikari> or you can do 4 pixels, then split into 2 sets.
[18:19:00] <Dark_Shikari> I would do the loading 4 pixels to start -- less register pressure
[18:19:03] <Dark_Shikari> pastebin it when you're done
[18:19:33] <BBB> ok
[18:19:39] <BBB> I'm doing 4 pixels right now
[18:19:45] <BBB> it actually fits in 6 registers, I think
[18:19:49] <Dark_Shikari> including constants?
[18:19:53] <Dark_Shikari> you'll need regs for constants
[18:19:54] <BBB> yes
[18:19:56] <Dark_Shikari> sweet.
[18:19:57] <Dark_Shikari> let me see.
[18:19:58] <BBB> only 2
[18:20:00] <BBB> in a bit
[18:20:01] <BBB> still working
[18:20:05] <Dark_Shikari> Yeah, isn't my trick nice to save regs ;)
[18:20:16] <Dark_Shikari> And save two adds, I guess.
[18:20:17] <BBB> I think pmullw would've needed 7 or 8
[18:20:20] <BBB> I didn't try
[18:20:25] <Dark_Shikari> k
[18:20:50] <BBB> punpck* reg, [mem] <- does mem need to be aligned in any way?
[18:20:56] <Dark_Shikari> for mmx: no
[18:21:02] <BBB> ok, good
[18:21:03] <Dark_Shikari> Keep in mind that it's best still to avoid crossing cachelines
[18:21:06] <Dark_Shikari> now, here's a gotcha
[18:21:10] <Dark_Shikari> a horrible, horrible gotcha
[18:21:20] <Dark_Shikari> mmx punpcklXX loads 4 bytes from memory
[18:21:23] <Dark_Shikari> using the 32-bit load unit.
[18:21:25] <Dark_Shikari> This makes sense, right?
[18:21:41] <Dark_Shikari> because it only uses the low half of each unit.
[18:21:44] <Dark_Shikari> of each reg, that is
[18:21:51] <Dark_Shikari> right?
[18:26:42] <BBB> I guess so
[18:26:50] <BBB> I'm only reading 4 bytes in every iteration anyway?
[18:27:04] <BBB> why?
[18:27:30] <Dark_Shikari> because we haven't modified it to use 8 yet
[18:30:27] <Dark_Shikari> anyways, per what I said above
[18:30:29] <Dark_Shikari> here's the gotcha
[18:30:35] <Dark_Shikari> you would think that an sse punpckl would read 64 bits, right?
[18:30:40] <Dark_Shikari> since it reads the lower half of 128-bit.
[18:30:51] <Dark_Shikari> But it doesn't. It reads the full 128 bits of memory, even if it only uses 64 bits of it.
[18:31:03] <Dark_Shikari> Which means it crashes on unaligned loads.
[18:32:27] <BBB> does punpcklbw mm0, 0 make sense to do a zero-extend byte->word?
[18:32:31] <BBB> yasm doesn't appear to like it
[18:32:39] <Dark_Shikari> of course not
[18:32:44] <Dark_Shikari> you can't have immediates for mmx or xmm
[18:32:47] <Dark_Shikari> 0 must be a register
[18:32:53] <BBB> the docs say I can give it a mm32
[18:32:56] <BBB> isn't that a constant?
[18:33:05] <Dark_Shikari> no, that's the lower half of an mmx register
[18:33:12] <BBB> oh
[18:33:22] <BBB> damnit, so I need 7 registers then, one as my zero :)
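
For readers unfamiliar with the idiom, the zero-extend that BBB is after can be modelled in scalar C as below. This is a sketch of the instruction semantics only. The register names in the comments are illustrative, and `zero_extend_lane` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* punpcklbw on 64-bit MMX registers: interleave the low four bytes of dst and src. */
void punpcklbw(uint8_t dst[8], const uint8_t src[8])
{
    uint8_t out[8];
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = dst[i];
        out[2 * i + 1] = src[i];
    }
    memcpy(dst, out, 8);
}

/* Read word lane n (0..3) of a register stored in little-endian byte order. */
uint16_t word_lane(const uint8_t reg[8], int n)
{
    assert(n >= 0 && n < 4);
    return (uint16_t)(reg[2 * n] | (reg[2 * n + 1] << 8));
}

/* Load four pixels, interleave with a zeroed register, return word lane n. */
uint16_t zero_extend_lane(const uint8_t pixels[4], int n)
{
    uint8_t mm0[8] = { pixels[0], pixels[1], pixels[2], pixels[3], 0, 0, 0, 0 }; /* movd */
    const uint8_t mm7[8] = { 0 };                                                /* pxor mm7, mm7 */
    punpcklbw(mm0, mm7);
    return word_lane(mm0, n);
}
```

Because the second operand supplies the high byte of every word, it has to be a register (or memory) holding zeros, which is why the extra register is needed.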
[18:34:32] <Dark_Shikari> btw, what are the 4 tapfilter constants?
[18:34:40] <Dark_Shikari> sometimes you can do magic if the numbers are right
[18:36:56] <BBB> they depend on one of the input arguments, similar to chroma mc in h264
[18:37:01] <BBB> mx/my are function args
[18:37:35] <BBB> 6, 123, 12, 1 | 9, 93, 50, 6 or the reverse of either of these two
[18:38:58] <BBB> http://ffmpeg.pastebin.com/ws4QEkYw
[18:39:11] <BBB> I haven't tested it yet, I still have to integrate it into the calling code
[18:39:15] <BBB> but it compiles at least
[18:39:27] <BBB> and the fourtimes_64 thing is a hack because I'm lazy
[18:39:32] <BBB> oh, it's actually 63, typo
[18:39:47] <BBB> hm, no, it's 127, double typo
[18:39:50] <BBB> anyway, needs testing
[18:40:08] <BBB> n/m
[18:42:00] <Dark_Shikari> there should be no tapfilter setup code
[18:42:04] <Dark_Shikari> prepare all the constants globally
[18:42:10] <Dark_Shikari> also you can use "times 4 dw 64"
[18:42:41] <Dark_Shikari> i.e. the punpck/punpck code above nextrow should be gone
[18:42:53] <Dark_Shikari> 31 is BCDE, not EFGH
[18:42:54] <BBB> right, that could be integrated in the table
[18:43:17] <Yuvi> ugh, the filter really does need 17 bit signed intermediates
[18:43:28] <Dark_Shikari> o.0
[18:43:36] <Dark_Shikari> oh fuck
[18:43:38] <Dark_Shikari> what the fuck
[18:43:46] <BBB> why?
[18:43:48] <Dark_Shikari> they're insane
[18:43:53] <Yuvi> (3+77+77+3)*255
[18:43:57] <Yuvi> as max
[18:43:59] <kshishkov> sounds like VP3 :)
[18:44:10] <BBB> but it's unsigned
[18:44:25] <Yuvi> -(16+16)*255
[18:45:10] <BBB> hello by the way, have you been idling and looking at this all morning? :-p
[18:45:17] <Yuvi> barely
[18:45:21] <BBB> pheew
[18:45:34] <Yuvi> I just thought the paddd was ugly
[18:45:39] <Yuvi> but it appears to be needed
[18:45:43] <BBB> I agree
[18:45:50] <BBB> but I couldn't think of anything better for now
[18:45:54] <BBB> hey, it's my first asm ;)
[18:46:30] <Dark_Shikari> you should generally group related instructions together
[18:46:31] <Dark_Shikari> e.g.
[18:46:32] <Dark_Shikari> movd/movd
[18:46:34] <Dark_Shikari> punpck/punpck
[18:46:38] <Yuvi> it's not bad ;)
[18:46:40] <Dark_Shikari> it's clearer to read
[18:46:55] <Dark_Shikari> and better on in-order
[18:46:58] <BBB> how do I integrate this in vp8dsp.c?
[18:47:12] <BBB> I'll do that, but would like to test and see if it crashes
[18:47:15] <BBB> or maybe it works
[18:47:17] <Dark_Shikari> you should use pw_64, not fourtimes_64
[18:47:20] <Dark_Shikari> as the name
[18:47:21] <Yuvi> make x86/vp8dsp.c, similar to x86/mlpdsp.c
[18:47:22] <Dark_Shikari> and put it in mm7
[18:47:36] <Dark_Shikari> If you want to save a reg, use my pavgw trick
[18:47:39] <Dark_Shikari> Oh, you made a mistake
[18:47:42] <Yuvi> and make+call ff_vp8dsp_init_mmx
[18:48:01] <Dark_Shikari> you're going to need to add twotimes_64 _before_ the pack
[18:48:04] <Dark_Shikari> with paddd
[18:48:15] <Dark_Shikari> oh wait, no, it doesn't matter does it?
[18:48:26] <Dark_Shikari> because it would get saturated...
[18:48:34] <Dark_Shikari> hmm. this is a good question
[18:48:42] <Dark_Shikari> Yuvi: do we actually need 17-bit intermediate?
[18:48:50] <Dark_Shikari> the right shift is by 7
[18:48:56] <Dark_Shikari> that means anything larger than 1 << 15 will saturate
[18:49:03] <Dark_Shikari> er, >= 1<<15
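
The argument can be tested exhaustively in C over the range Yuvi quoted. The sketch assumes a rounding constant of 64 and that only the final total is saturated; saturating partial sums is a separate question. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int32_t sat16(int32_t v) { return v > 32767 ? 32767 : v < -32768 ? -32768 : v; }
static int clip_u8(int32_t v)   { return v > 255 ? 255 : v < 0 ? 0 : (int)v; }

/* Arithmetic shift right by 7, written as a portable floor division by 128. */
static int32_t sra7(int32_t v)  { return v >= 0 ? v >> 7 : -((-v + 127) >> 7); }

/* Exact result: wide arithmetic, round, shift by 7, clip to a pixel. */
int exact_pixel(int32_t sum) { return clip_u8(sra7(sum + 64)); }

/* Same steps, but the total and the rounding add are saturated to 16 signed bits. */
int saturated_pixel(int32_t sum) { return clip_u8(sra7(sat16(sat16(sum) + 64))); }

/* Count totals in [lo, hi] for which the two versions disagree. */
int saturation_mismatches(int32_t lo, int32_t hi)
{
    int bad = 0;
    assert(lo <= hi);
    for (int32_t s = lo; s <= hi; s++)
        bad += exact_pixel(s) != saturated_pixel(s);
    return bad;
}
```

Any total at or above 1 << 15 ends up as 32767 after saturation, and 32767 >> 7 is 255, the same value the final clip would produce from the exact total.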
[18:49:27] <Yuvi> hm, maybe not
[18:49:32] <Dark_Shikari> BBB: your code doesn't use r4 and r5
[18:49:46] <BBB> it uses r4 in the beginning
[18:49:54] <Dark_Shikari> where
[18:50:05] <BBB> where it uses r2, that should be r4 :-p
[18:50:09] <BBB> r5 is indeed unused
[18:50:14] <Dark_Shikari> isn't r4 and r5 the mv?
[18:50:18] <Dark_Shikari> or are they just the low bits of each mv?
[18:50:35] <BBB> low bits
[18:50:42] <BBB> this is a H-only function
[18:50:46] <BBB> was simplest to start with
[18:50:48] <Dark_Shikari> so, BBB, you have two choices
[18:50:50] <BBB> so my is 0
[18:50:56] <Dark_Shikari> add 2z64 before packssdw
[18:50:59] <Dark_Shikari> *2x64
[18:51:02] <Dark_Shikari> or add 4x64 after
[18:51:05] <Dark_Shikari> you're using "paddd"
[18:51:08] <Dark_Shikari> that's a 32-bit add
[18:51:10] <Dark_Shikari> your source is 16-bit
[18:51:12] <Dark_Shikari> oops ?
[18:51:15] <BBB> oh, right, oops
[18:51:32] <BBB> paddw then?
[18:51:39] <BBB> is probably faster than 2x paddd
[18:51:47] <Dark_Shikari> yes
[18:52:46] <Yuvi> but you'll have to be very very careful about order if you sum the taps with 16 bit signed adds
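
A small C experiment shows why the order matters. It is a sketch with made-up inputs: four products built from the 77 and 16 tap magnitudes quoted earlier, all on white pixels, summed left to right with saturating word adds. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int32_t sat16(int32_t v) { return v > 32767 ? 32767 : v < -32768 ? -32768 : v; }
static int32_t paddsw(int32_t a, int32_t b) { return sat16(a + b); }   /* one word lane */
static int32_t sra7(int32_t v)  { return v >= 0 ? v >> 7 : -((-v + 127) >> 7); }
static int clip_u8(int32_t v)   { return v > 255 ? 255 : v < 0 ? 0 : (int)v; }

/* Sum four products left to right with saturating 16-bit adds, then round, shift, clip. */
int filter_saturating(const int32_t products[4])
{
    int32_t acc = 0;
    for (int i = 0; i < 4; i++) {
        assert(products[i] >= -32768 && products[i] <= 32767);
        acc = paddsw(acc, products[i]);
    }
    return clip_u8(sra7(paddsw(acc, 64)));
}

/* Reference: the same sum in 32-bit arithmetic. */
int filter_exact(const int32_t products[4])
{
    int32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += products[i];
    return clip_u8(sra7(sum + 64));
}

int demo_positives_first(void)
{
    const int32_t p[4] = { 77 * 255, 77 * 255, -16 * 255, -16 * 255 };
    return filter_saturating(p);
}

int demo_negatives_early(void)
{
    const int32_t p[4] = { 77 * 255, -16 * 255, -16 * 255, 77 * 255 };
    return filter_saturating(p);
}

int demo_exact(void)
{
    const int32_t p[4] = { 77 * 255, 77 * 255, -16 * 255, -16 * 255 };
    return filter_exact(p);
}
```

Adding both large positive products first saturates the accumulator before the negative products can pull it back, so that order gives 192 where the exact answer is 243. Folding the negative products in before the second positive one keeps every partial sum in range and reproduces 243.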
[18:53:00] <Dark_Shikari> well he's using 32-bit now
[18:53:09] <BBB> should ff_vp8_dsp_x86_init() be in libavcodec/vp8dsp.c or should I create another C file in x86/?
[18:53:14] <Dark_Shikari> BBB: also you could use the pavgw trick here to save a register
[18:53:14] <BBB> yeah right now it's 32 bit, should be ok
[18:53:18] <Yuvi> another C in x86
[18:53:31] <BBB> Dark_Shikari: I'll use it, let me first try this code, just to see if it works ;)
[18:53:40] <BBB> I'm still learning ;)
[19:09:46] <BBB> Undefined symbols:
[19:09:46] <BBB> "_ff_put_vp8_epel4_h4_mmx", referenced from:
[19:09:46] <BBB> _ff_put_vp8_epel4_h4_mmx$non_lazy_ptr in libavcodec.a(vp8dsp-init.o)
[19:09:56] <BBB> probably stupid question or so, but there's really no typo :)
[19:10:21] <Dark_Shikari> check in the .o what it's actually called.
[19:10:29] <kshishkov> is your function declared global?
[19:10:45] <BBB> oh, so smart, it prefixes the ff itself
[19:10:48] <BBB> I had called it ff_ already
[19:10:56] <BBB> so it was _ff_ff_put
[19:11:54] <BBB> whoa it's bitexact
[19:12:03] <Dark_Shikari> WHAT?
[19:12:04] <BBB> there were no additional typos :-p
[19:12:06] <Dark_Shikari> You got it right the first time?
[19:12:07] <Dark_Shikari> awesome.
[19:12:11] <BBB> hehe :)
[19:12:11] <Dark_Shikari> that is the best feeling in the world
[19:12:16] <Dark_Shikari> When you write a big fancy function
[19:12:18] <Dark_Shikari> and its RIGHT
[19:12:25] <BBB> this one is really quite small :)
[19:12:32] <mru> usually when that happens it's not actually being called
[19:12:36] <Dark_Shikari> mru: true
[19:12:38] <Dark_Shikari> =p
[19:12:41] <Dark_Shikari> Highly true.
[19:12:42] <BBB> how do I test that?
[19:12:45] <mru> so I insert a deliberate error just to be sure
[19:12:46] <BBB> it's very likely
[19:12:51] <Dark_Shikari> BBB: "mov esp, 0"
[19:12:56] <Dark_Shikari> ;)
[19:12:57] <BBB> ok
[19:21:12] <BBB> I guess I need a better test movie
[19:21:21] <BBB> Yuvi: didn't you have a movie that had a lot of subpel mv?
[19:21:43] <Yuvi> http://www.supergenije.com/cruncher/test.webm
[19:22:07] <Dark_Shikari> BBB: or you made a booboo
[19:22:24] <Dark_Shikari> it's rather unlikely that a particular subpel position will never get called.
[19:22:25] <BBB> the function pointer is set
[19:22:41] <Dark_Shikari> unless the clip was literally encoded with no subpel
[19:24:05] <av500> its vp8, anything is possible...
[19:25:00] <BBB> 4x4 is only used for split subpel, my guess is the movie has a lot of full blocks (setting 16x16 4tap function to NULL crashes instantly), but few split blocks
[19:25:17] <Dark_Shikari> Oh
[19:25:19] <Dark_Shikari> this is 4x4 only
[19:25:22] <BBB> right
[19:25:25] <Dark_Shikari> and you don't do 8x8 by calling 4x4 a lot
[19:25:30] <BBB> right
[19:25:31] <BBB> yet
[19:25:32] <Dark_Shikari> fyi, 16x16 should be done (except in fullpel cases) by calling 8x8
[19:25:38] <Dark_Shikari> for mmx, 8x8 should probably call 4x4
[19:25:39] <Dark_Shikari> for sse, no way
[19:25:44] <Dark_Shikari> since sse can do more pixels at once
[19:25:48] <BBB> right
[19:25:52] <BBB> I understood that much ;)
[19:26:00] <BBB> I'll probably work on a few mmx functions before I go to sse2
[19:26:02] <Dark_Shikari> Also, one of the later optimizations you'll be doing is minimizing loads
[19:26:09] <Dark_Shikari> Notice right now you reload a lot of pixels multiple times
[19:26:12] <Dark_Shikari> this hurts on pre-nehalem intel
[19:26:12] <BBB> right
[19:26:32] <BBB> I figure you can load 8 pixels at once, then shr(8) them for the next pixel?
[19:26:45] <BBB> and use the lower 4 bytes using punpcklXY
[19:27:06] <Dark_Shikari> pshufb is one way to do a ton of magic
[19:27:10] <Dark_Shikari> remember, you have to interleave them
[19:27:15] <Dark_Shikari> so if you can do an arbitrary byte-shuffle...
[19:32:16] <BBB> ok now it's used, and of course it's wrong, let me try and figure out how bad it is :-p
[19:33:22] <wbs> Yuvi: https://roundup.ffmpeg.org/issue2013 may need your attention
[19:33:43] <wbs> av500: yes, dtx is unsupported in the internal amrnb decoder, but if you pass -acodec libopencore_amrnb, it should work
[19:36:08] <av500> wbs: yes, I got that
[19:36:27] <av500> I need it on arm, so i'll have to see how to crossbuild it..
[19:36:45] <av500> but that should be fairly easy as android does that too
[19:37:07] <av500> btw, why are patch files on roundup: application/octet-stream
[19:37:32] <av500> shouldnt they be "text"?
[19:38:47] <mru> did someone just use android and easy in the same sentence?
[19:39:10] <Yuvi> wbs: patch should be ok
[19:39:26] <av500> mru: nice, eh? :)
[19:41:22] <Yuvi> wbs: actually, check ogg_packet.e_o_s
[19:44:54] <Yuvi> BBB: Dark_Shikari: okay, I'm satisfied that the filter can be done with only 16-bit saturating adds, like so: http://pastie.org/1009054
[19:45:56] <BBB> Yuvi: I still don't see why, the sum of all filter coeffs is 127, and pixel is 255 max, that's 15 bits
[19:45:59] <BBB> there's no negative coeffs
[19:46:17] <BBB> so I think it can all be done with much less trouble, no?
[19:46:25] <Yuvi> f1 and f4 are negative
[19:46:51] <Yuvi> and the positive coeffs add up to 160 in the worst case (77+77+3+3)
[19:47:12] <BBB> oh shit I didn't see that
[19:47:17] <BBB> that's probably my bug :-p
[19:47:47] <BBB> ok, so 160*255, got it
[20:12:02] <BBB> how do I print the mmx registers?
[20:13:02] <Dark_Shikari> carefully
[20:13:10] <Honoome> lol
[20:13:54] <BBB> all-registers :)
[20:23:58] <Vitor1001> pengvado: Sorry about the stupid question, but what do you mean by alternating between two schedules?
[20:24:12] <Vitor1001> You are wondering why I need to do aligned loads?
[20:29:53] <Dark_Shikari> he means why are you doing one ordering of stuff in one place
[20:29:56] <Dark_Shikari> and another ordering in another place
[20:29:56] <Dark_Shikari> iirc
[20:31:14] <wbs> Yuvi: ok, so like this then? http://albin.abo.fi/~mstorsjo/0001-libvorbis-Only-drop-1-byte-packets-at-en…
[20:31:54] <Vitor1001> Dark_Shikari: I still don't get what you mean with ordering.
[20:32:22] <Vitor1001> You mean the fact I reverse the vector or the weird 1-byte misalignment?
[20:32:34] <Dark_Shikari> instruction ordering
[20:33:26] <Vitor1001> Oh... Actually no idea. Before I started coding in asm I thought instruction ordering was important. Until I started benchmarking :p
[20:33:51] <Dark_Shikari> There's really no reason to pick a particular order in most cases
[20:33:52] <Dark_Shikari> Just be consistent.
[20:34:06] <Vitor1001> ok, good point
[20:39:52] <wbs> av500: I haven't looked into how to enable all the arm optimizations in opencore though
[20:43:35] <ZeZu> instruction ordering "can" be usefull
[20:43:45] <ZeZu> depends on how deep your getting into optimization
[20:43:54] <ZeZu> and the processor of course
[20:44:08] <mru> if you don't have out of order execution, it's _very_ important
[20:44:15] <ZeZu> for in-order execution processors that can still execute multiple instructions
[20:44:16] <ZeZu> yes
[20:44:31] <ZeZu> its absolutely required for good opts
[20:44:36] <mru> and even with it, you must make sure to not overload the reorderqueue
[20:45:48] <ZeZu> and even in the case of deep pipelining and full out of order .. it can be usefull in a variety of cases , to eliminate stalls (that shouldn't happen anyways but do ..) and to keep instructions on word boundaries so you can use faster isntructions in other places
[20:46:20] <ZeZu> damn its time for a new keyboard again
[20:52:11] <BBB> yay my function is bitexact
[20:52:26] <BBB> only 5 or 6 mistakes in dq vs wd or stuff like that :)
[20:52:43] <BBB> I suppose I need to now test if it's faster?
[20:55:51] <_av500_> wbs: there might be none for amr
[20:57:56] <Vitor1001> ZeZu, mru, are there any x86 cpu that don't support out-of-order execution?
[20:58:39] <mru> anything prior to ppro for sure
[20:58:52] <ZeZu> x86 isn't the only arch. I deal with, but I believe some of the cheap embedded line may not, amd geode .. but even that prob does
[20:59:09] <mru> atom?
[21:03:07] <BBB> Dark_Shikari: can I use ff_pw_64?
[21:03:16] <BBB> it's defined as xmm_reg_t, I don't know if I can touch that from mmx code
[21:06:54] <ZeZu> atom is out of order
[21:09:31] <ZeZu> VLIW / stream processors are a good example for optimizing instruction ordering, esp. for something like gpu
[21:10:25] <mru> vliw offers total static scheduling
[21:10:36] <mru> which is good for DSPs and such
[21:10:54] <ZeZu> shiny
[21:11:36] <ZeZu> instruction ordering is real fun for dynamic optimization, if you want to do multi-pass anyhow
[21:17:15] <BBB> Dark_Shikari: and also, other places use [ff_pw_64] directly, they don't actually move it to a register, can I do that too? or bad idea?
[21:17:48] <BBB> oh n/m that, I misread
[21:17:51] <BBB> they do the same as me
[21:22:47] <peloverde> Friendly reminder, the PS patch is looking for review
[21:27:37] <BBB> didn't I review it already?
[21:28:01] <BBB> Dark_Shikari: also, what is the typical performance boost I should see for this function?
[21:28:07] <BBB> (or anyone, for that matter)
[21:28:15] <saintdev> BBB: he posted an updated patch a few days ago
[21:29:18] <BBB> the patch is 144k :-(
[21:29:51] <saintdev> o.O
[21:31:04] <saintdev> BBB: a little later there's a copy that uses tablegen that's only 73K
[21:31:09] <saintdev> :P
[21:31:13] <peloverde> BBB, yes you did, thanks
[21:31:15] <BBB> "only"
[21:31:20] <BBB> I'll look at it again
[21:31:23] <peloverde> The current version is half the size
[21:31:25] <peloverde> due to tablegen
[21:31:43] <saintdev> that's pretty cool :)
[21:32:54] <peloverde> There is a big nasty table used by both the AAC encoder and decoder that I really want to tablegen, but I'm not 100% sure about the best way to do it.
[21:33:48] <Dark_Shikari> BBB: yes you can use pw_64
[21:34:04] <Dark_Shikari> you can use it directly, but don't repeat loads unnecessarily unless you run out of regs
[21:34:09] <Dark_Shikari> it's fine to use it directly if you're only using it once
[21:34:14] <Dark_Shikari> typical performance boosts range a lot.
[21:52:38] <BBB> I'm seeing only a 10% increase
[21:52:55] <BBB> which is rather disappointing... maybe it's because it's only a 4x4 block?
[21:54:25] <Dark_Shikari> In that function?
[21:54:28] <Dark_Shikari> or overall?
[21:54:37] <Dark_Shikari> A normal increase is like 3x, 5x, 10x
[21:55:58] <BBB> that's what I expected also
[21:56:02] <BBB> in this specific function
[21:56:17] <Dark_Shikari> well your cpu does just suck
[21:56:24] <BBB> probably
[21:56:27] <Dark_Shikari> also pastebin the function again
[21:56:58] <BBB> I probably count wrong
[21:57:02] <BBB> let me recount just to be sure
[21:57:05] <BBB> then I'll pastebin it
[21:57:11] <BBB> I think my START_TIMER is placed wrongly
[21:57:21] <Dark_Shikari> You know that start_timer isn't normalized in ffmpeg, right?
[21:57:26] <Dark_Shikari> that is, it doesn't subtract out the cost of an empty timer.
[21:57:31] <Dark_Shikari> You have to do that yourself.
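
The correction Dark_Shikari describes is plain arithmetic: time an empty region, then subtract its cost from the real measurement. A minimal C sketch follows; it takes raw readings as arrays and does not call any FFmpeg timer macro. The names `mean_u64` and `net_cost` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mean of n raw timer readings (for example cycle counts); n must be positive. */
static uint64_t mean_u64(const uint64_t *v, int n)
{
    uint64_t total = 0;
    assert(v && n > 0);
    for (int i = 0; i < n; i++)
        total += v[i];
    return total / (uint64_t)n;
}

/* Mean of the timed runs minus the mean of runs that time an empty region, floored at 0. */
uint64_t net_cost(const uint64_t *timed, int n_timed,
                  const uint64_t *empty, int n_empty)
{
    uint64_t raw      = mean_u64(timed, n_timed);
    uint64_t overhead = mean_u64(empty, n_empty);
    return raw > overhead ? raw - overhead : 0;
}
```

For a function as small as a 4x4 filter the timer overhead can be a large share of the raw reading, so ratios computed from raw numbers tend to understate the real speedup.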
[21:58:20] <BBB> that's fine
[21:58:30] <BBB> I did it for all 4x4 mx&1==1 functions
[21:58:33] <BBB> not just those with my==0
[21:58:39] <BBB> now it's about 2.5x faster
[21:58:43] <Dark_Shikari> Yes, that sounds about right.
[21:58:50] <BBB> I'll pastebin, 1 second
[22:00:49] <BBB> http://ffmpeg.pastebin.com/XUBhFPa7
[22:01:27] <BBB> I'm not using your average-function trick yet, have to look at that
[22:01:30] <Dark_Shikari> fix your constant array
[22:01:33] <BBB> ?
[22:01:37] <Dark_Shikari> i.e. to not do the punpcks on init
[22:01:38] <Dark_Shikari> oh
[22:01:39] <Dark_Shikari> wait
[22:01:40] <Dark_Shikari> you did
[22:01:41] <Dark_Shikari> wait what?
[22:01:52] <Dark_Shikari> oh, I see
[22:01:54] <Dark_Shikari> nevermind, I'm blind.
[22:02:22] <Dark_Shikari> reorder the movds and punpck like I said
[22:02:26] <Dark_Shikari> i.e. movd/movd/punpck/punpck
[22:02:28] <BBB> oh right
[22:02:29] <BBB> ok
[22:03:24] <Dark_Shikari> what's with the sub r0, r1?
[22:03:30] <Dark_Shikari> dst and src are guaranteed to have the same stride?
[22:03:35] <BBB> yes
[22:03:41] <Dark_Shikari> Since my isn't used, you only need 5,5
[22:03:45] <BBB> ok
[22:04:10] <BBB> if I don't use mx, can I somehow convince it to not store it in a register?
[22:04:22] <Dark_Shikari> explain?
[22:04:28] <Dark_Shikari> and why do you use 6,6,2? you don't use any xmm regs
[22:04:41] <BBB> I thought it was for mm%d regs
[22:04:52] <BBB> how many mm regs are there?
[22:04:54] <Dark_Shikari> the third number is for xmm
[22:04:57] <Dark_Shikari> there are 8 mm regs
[22:05:02] <BBB> oh, just right :)
[22:05:05] <Dark_Shikari> you can use those without telling it
[22:05:11] <Dark_Shikari> it should be 5,5 (with no 2)
[22:05:15] <BBB> ok, changed
[22:05:17] <Dark_Shikari> now, what's your issue with mx?
[22:05:37] <BBB> if I write the v4 variant of this
[22:05:48] <Dark_Shikari> which uses my but not mx?
[22:05:54] <BBB> yes
[22:05:56] <Dark_Shikari> Here's what you do
[22:05:58] <Dark_Shikari> 1) 4,4
[22:06:00] <Dark_Shikari> er, i mean
[22:06:02] <Dark_Shikari> 4,5
[22:06:11] <Dark_Shikari> 2) mov r4, r5m
[22:06:15] <Dark_Shikari> :)
[22:06:23] <BBB> r5m = ?
[22:06:28] <Dark_Shikari> memory location of r5 on the stack
[22:06:30] <Dark_Shikari> actually that's suboptimal
[22:06:33] <Dark_Shikari> what you _should_ do is
[22:06:46] <Dark_Shikari> %ifidn r5, r5m
[22:06:53] <Dark_Shikari> %define my r5
[22:06:55] <Dark_Shikari> %else
[22:07:01] <Dark_Shikari> mov r4, r5m
[22:07:04] <Dark_Shikari> %define my r4
[22:07:05] <Dark_Shikari> %endif
[22:07:18] <Dark_Shikari> On x86_64, r5 == r5m and there's no pushing necessary to get it
[22:07:21] <Dark_Shikari> so you don't want to do the redundant move
[22:07:29] <Dark_Shikari> %ifidn == if identical
[22:07:40] <BBB> omg you are crazy... ok :)
[22:08:02] <BBB> what about the rest of the mmx func?
[22:08:28] <Dark_Shikari> btw, that's an example of register munging being necessary to get _absolutely_ optimal code on all arches.
[22:09:06] <Dark_Shikari> By the way, why don't you pass mx + my<<2 or something?
[22:09:13] <Dark_Shikari> I guess that would end up being more ops.
[22:09:14] <Dark_Shikari> meh
[22:09:18] <BBB> yeah
[22:09:19] <Dark_Shikari> the rest of the asm looks good.
[22:09:27] <Dark_Shikari> dec r3 isn't aligned
[22:09:31] <Dark_Shikari> 158 isn't aligned
[22:09:37] <Dark_Shikari> the instructions should be aligned on commas
[22:09:48] <Dark_Shikari> 122 too
[22:09:51] <Dark_Shikari> at least that's how I do it
[22:10:34] <BBB> dec r3 is an oops :)
[22:10:44] <BBB> the rest, if that's how you do it, I'll change it
[22:11:06] <BBB> so then it's e.g. (122) sub r0_,r1?
[22:11:18] <BBB> with a space before and after the comma
[22:11:26] <BBB> or just sub r0,doublespacer1?
[22:11:34] <Dark_Shikari> no
[22:11:34] <BBB> let me check your x264 code
[22:11:37] <Dark_Shikari> sub r0, r1
[22:11:45] <Dark_Shikari> i.e. the r0 starts in a different place
[22:11:48] <BBB> oh, so you push r0 forward
[22:11:48] <BBB> ok
[22:12:02] <Dark_Shikari> paste it again when you're done
[22:14:14] <BBB> http://ffmpeg.pastebin.com/vguDqCar <- just vp8dsp.asm
[22:14:30] <BBB> also cleaned up the top of the file a bit, and removed the C table
[22:15:34] <Dark_Shikari> btw, look at vp8_filter_block1d_h6_mmx in libvpx/vp8/common/x86/subpixel_mmx.asm
[22:16:08] <BBB> well that's cheating :-p
[22:16:10] <Dark_Shikari> now think to yourself how much fucking better yours is.
[22:16:21] <Dark_Shikari> Except for the shifting bit at the start, but you can copy that.
[22:16:29] <Dark_Shikari> Yours is like half the size.
[22:16:44] <Dark_Shikari> and probably faster.
[22:17:11] <Dark_Shikari> ok, true, theirs does 8 pixels instead of 4.
[22:17:24] <Dark_Shikari> or wait... no it doesn't
[22:17:28] <Dark_Shikari> wait, theirs is totally fucked up
[22:17:28] <Dark_Shikari> WTF
[22:17:32] <Dark_Shikari> packuswb mm3, mm0 ; pack and unpack to saturate
[22:17:33] <Dark_Shikari> punpcklbw mm3, mm0 ;
[22:17:37] <Dark_Shikari> LOL
[22:17:43] <Dark_Shikari> AHAHAHAHAHAHAHAHAHAHHAHAHAHAHAHAHAHAHAHAHA
[22:17:46] <Dark_Shikari> mru: oh god
[22:19:11] <BBB> they calculate a sixtap, even if it's a fourtap
[22:19:15] <BBB> that's why it's so weird
[22:19:16] <Dark_Shikari> not just that
[22:19:19] <Dark_Shikari> no that isn't the only thing
[22:19:25] <Dark_Shikari> They have 4 pixels, in 16-bit
[22:19:27] <Dark_Shikari> they pack to 8-bit
[22:19:29] <Dark_Shikari> and then they unpack again
[22:19:30] <Dark_Shikari> and store
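
What the quoted pack/unpack pair does to each word lane can be written down in C. The sketch assumes that mm0 holds zero in the quoted fragment, which the log does not show, and it models one lane only. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* packuswb on one lane: signed word to unsigned byte with saturation. */
static uint8_t packuswb_lane(int16_t w) { return w > 255 ? 255 : w < 0 ? 0 : (uint8_t)w; }

/* punpcklbw against an all-zero register on one lane: byte widened back to a word. */
static int16_t punpcklbw_zero_lane(uint8_t b) { return (int16_t)b; }

/* Net effect of the pair on a word lane: the value clamped to 0..255, still a word. */
int16_t pack_then_unpack(int16_t w)
{
    int16_t r = punpcklbw_zero_lane(packuswb_lane(w));
    assert(r >= 0 && r <= 255);
    return r;
}
```

Under that assumption the pair is a two-instruction clamp that leaves the data as words, which is the round trip Dark_Shikari is pointing at.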
[22:19:41] <BBB> isn't that a little retarded?
[22:19:44] <Dark_Shikari> Yes.
[22:19:46] <BBB> :)
[22:19:49] <Dark_Shikari> Remember when I said "retarded monkeys"?
[22:19:52] <Dark_Shikari> That.
[22:19:58] <Yuvi> would pmullw be faster here than pmaddwd?
[22:20:08] <Dark_Shikari> Yuvi: pmaddwd is the same speed and gives you a free add
[22:20:11] <Dark_Shikari> and saves you two registers
[22:20:19] <Yuvi> but you're adding 0 aren't you?
[22:20:23] <Dark_Shikari> no
[22:20:28] <Dark_Shikari> note the interleaving
[22:20:56] <BBB> Yuvi: I'm doing several pixels at the same time, to take advantage of pmaddwd
[22:21:08] <Yuvi> hm, so punpck -> pmadd -> paddd vs. pmullw -> paddsw ?
[22:21:23] <Dark_Shikari> Yuvi: pmaddwd requires
[22:21:25] <Yuvi> BBB: you're always doing that in simd though
[22:21:42] <Dark_Shikari> 4x punpck, 4x pmadd, 2x padd
[22:21:45] <Dark_Shikari> pmullw requires
[22:21:54] <Dark_Shikari> 4x punpck, 4x pmullw, 4x padd
[22:21:56] <Dark_Shikari> and 2 more registers
[22:22:26] <Dark_Shikari> because you need 4 registers for masks instead of 2
[22:22:28] <BBB> hmm... wife is calling
[22:23:07] <BBB> Yuvi: don't worry, I won't commit yet, this is a little useless
[22:23:12] <BBB> it was just the easiest to implement
[22:23:36] <BBB> I'll work on h6, the v4/6 and the h4/6v4/6 variants also
[22:24:12] <Dark_Shikari> this is a very good start
[22:24:15] <Dark_Shikari> top-quality asm function
[22:24:15] <BBB> I can hopefully reuse a little of this code in some others
[22:24:17] <Dark_Shikari> well optimized.
[22:24:31] <Dark_Shikari> you will be able to do h6 without spilling
[22:24:45] <Dark_Shikari> since you can do pmaddwd with memory if needed, and round with memory
[22:24:46] <Dark_Shikari> to save regs
[22:25:06] <BBB> I can simply do the same I do here right?
[22:25:11] <BBB> just repeat it three times instead of two
[22:25:14] <Dark_Shikari> yeah
[22:25:18] <BBB> since I add mm1 to mm0, I simply reuse mm1
[22:25:36] <BBB> I just need one more reg for the 5th/6th filter coeffs
[22:26:03] <BBB> I'll round with memory then
[22:26:05] <BBB> simplest
[22:26:11] <BBB> ok, that's tonight/tomorrow
[22:26:13] <BBB> off for now
[22:26:18] <saintdev> what is spilling? haven't been able to figure that out from context yet.
[22:27:11] <mru> saving regs to stack
[22:28:10] <Dark_Shikari> spilling is what you have to do when you run out of registers
[22:28:13] <Dark_Shikari> BBB: or you use pavgw trick
[22:28:17] <Dark_Shikari> and then you don't need a constant at all
[22:28:25] <Yuvi> Dark_Shikari: http://pastie.org/1009306 <- like that should be the same, no?
[22:28:29] <saintdev> effectively a push/pop with a simd reg?
[22:28:48] <Yuvi> which has the disadvantage of more loads for the filter
[22:29:54] <Yuvi> or am I missing a different reason why pmaddwd is faster here?
[22:30:59] <Dark_Shikari> Yuvi: fewer ops
[22:31:00] <Dark_Shikari> period
[22:31:07] <Dark_Shikari> fewer adds, fewer memory references
[22:31:30] <Yuvi> I'm not seeing the fewer adds
[22:31:50] <Dark_Shikari> oh you're right
[22:31:53] <Dark_Shikari> it's just fewer regs
[22:31:57] <Dark_Shikari> Either way, it's certainly not worse.
[22:32:09] <Dark_Shikari> and when he moves to pmaddubsw, it'll be easier to base that code on the current code
[22:33:37] <Dark_Shikari> same with sse
[22:33:44] <Dark_Shikari> with sse, this trick will let a 4x4 block be done incredibly quickly
[22:33:49] <Dark_Shikari> with just two multiplies
[22:33:57] <Dark_Shikari> per row
[22:34:47] <CIA-98> ffmpeg: cehoyos * r23640 /trunk/libavfilter/vsrc_buffer.c:
[22:34:47] <CIA-98> ffmpeg: Use enum PixelFormat to silence one icc warning:
[22:34:47] <CIA-98> ffmpeg: warning #188: enumerated type mixed with another type
[22:34:47] <CIA-98> ffmpeg: enum PixelFormat pix_fmts[] = { c->pix_fmt, PIX_FMT_NONE };
[22:34:47] <CIA-98> ffmpeg: ^
[22:34:57] <Yuvi> true, pmaddubsw will work a lot better
[22:36:00] <Dark_Shikari> pmaddubsw will let us do an 8-pixel row in two multiplies
[22:36:17] <Dark_Shikari> one 16-byte load, two pshufb, two multiplies, one add
[22:36:37] <Dark_Shikari> compared to the current 8xpack 8xmult 4xadd or so
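
The pmaddubsw plan can be modelled in scalar C as well. This is a sketch of one possible arrangement, not code from the log. The shuffle masks, the tap signs (-6, 123, 12, -1) and (-9, 93, 50, -6), the rounding constant of 64 and the final pack are assumptions, and `epel8_h4_model` and `epel8_h4_mismatches` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

static int16_t sat16(int32_t v) { return v > 32767 ? 32767 : v < -32768 ? -32768 : (int16_t)v; }
static uint8_t satu8(int32_t v) { return v > 255 ? 255 : v < 0 ? 0 : (uint8_t)v; }
static int32_t sra7(int32_t v)  { return v >= 0 ? v >> 7 : -((-v + 127) >> 7); }

/* pshufb without the "high bit clears the lane" case: out[i] = in[mask[i]]. */
static void pshufb(const uint8_t in[16], const uint8_t mask[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = in[mask[i] & 15];
}

/* pmaddubsw: unsigned bytes times signed bytes, adjacent pairs added with signed saturation. */
static void pmaddubsw(const uint8_t a[16], const int8_t b[16], int16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = sat16(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1]);
}

/* Eight output pixels from one 16-byte load that starts at src[-1]. */
void epel8_h4_model(const uint8_t load[16], const int8_t f[4], uint8_t dst[8])
{
    static const uint8_t mask01[16] = { 0,1, 1,2, 2,3, 3,4, 4,5, 5,6, 6,7, 7,8 };
    static const uint8_t mask23[16] = { 2,3, 3,4, 4,5, 5,6, 6,7, 7,8, 8,9, 9,10 };
    int8_t c01[16], c23[16];
    uint8_t s01[16], s23[16];
    int16_t lo[8], hi[8];

    assert(load && f && dst);
    for (int i = 0; i < 8; i++) {
        c01[2 * i] = f[0]; c01[2 * i + 1] = f[1];
        c23[2 * i] = f[2]; c23[2 * i + 1] = f[3];
    }
    pshufb(load, mask01, s01);          /* pixel pairs for taps 0/1 */
    pshufb(load, mask23, s23);          /* pixel pairs for taps 2/3 */
    pmaddubsw(s01, c01, lo);            /* first multiply */
    pmaddubsw(s23, c23, hi);            /* second multiply */
    for (int x = 0; x < 8; x++)         /* one add, then round, shift, pack */
        dst[x] = satu8(sra7(sat16(sat16(lo[x] + hi[x]) + 64)));
}

/* Compare the model with a plain 32-bit reference on every black/white input pattern. */
int epel8_h4_mismatches(void)
{
    static const int8_t taps[2][4] = { { -6, 123, 12, -1 }, { -9, 93, 50, -6 } };
    int bad = 0;
    for (int t = 0; t < 2; t++) {
        for (int pattern = 0; pattern < 2048; pattern++) {
            uint8_t load[16] = { 0 }, got[8];
            for (int i = 0; i < 11; i++)
                load[i] = (pattern >> i) & 1 ? 255 : 0;
            epel8_h4_model(load, taps[t], got);
            for (int x = 0; x < 8; x++) {
                int32_t sum = 0;
                for (int i = 0; i < 4; i++)
                    sum += taps[t][i] * load[x + i];
                bad += got[x] != satu8(sra7(sum + 64));
            }
        }
    }
    return bad;
}
```

The taps fit in signed bytes and each pair product stays inside 16 bits for these two filters, so the only saturation that can occur is in the final add, where it is absorbed by the clip to 255.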