[FFmpeg-devel-irc] IRC log for 2010-05-10

Tue May 11 02:00:41 CEST 2010

[00:29:10] <CIA-7> ffmpeg: vitor * r23077 /trunk/libavfilter/defaults.c:
[00:29:10] <CIA-7> ffmpeg: Alloc 16 extra bytes in libavfilter frames. Needed for MMX-optimized swscale.
[00:29:10] <CIA-7> ffmpeg: Fix issue 1924.
[06:18:20] <benoit-> mornings
[06:19:15] <av500> gm
[06:36:37] <wbs> morning
[07:05:25] <superdump> morning
[07:06:33] <wbs> superdump: what's the policy regarding reading reference code when doing a proper lgpl reimplementation? more specifically, amr
[07:06:57] <wbs> i'm trying to get a grip on how the comfort noise/dtx/sid/whatever stuff really works, but the standard docs say like 10% of what the code actually does ;P
[07:07:21] <Dark_Shikari> imo, feel free to read whatever you want, just put a large gap between reading and writing, so that your short term memory doesn't cover any of the code.
[07:07:24] <Dark_Shikari> but I'm pretty liberal
[07:07:42] <Dark_Shikari> and diego would probably disagree
[07:08:04] <superdump> i suppose it's ideal if you've never looked at the code, then even if it resembles it, there's no way you could have known
[07:08:21] <superdump> but reference code is for referencing so...
[07:08:35] <superdump> not copy and pasting, but for understanding how stuff works
[07:08:41] <superdump> and the amr specs are shite
[07:08:49] <superdump> so you need the ref code to understand properly
[07:09:02] <kshishkov> hmm, I'd argue
[07:09:30] <kshishkov> usually you cannot make anything out of reference code
[07:09:50] <CIA-7> ffmpeg: benoit * r23078 /trunk/libavcodec/h264_mp4toannexb_bsf.c:
[07:09:50] <CIA-7> ffmpeg: Check NAL unit size to avoid reading past the buffer.
[07:09:50] <CIA-7> ffmpeg: This fixes issue1907
[07:09:50] <CIA-7> ffmpeg: Patch by Thomas Devanneaux gmail(thomdev)
[07:11:26] <superdump> well we're talking about amr, not 'usually'
[07:11:28] <superdump> :)
[07:12:04] <superdump> also, if they say that the code takes precedent over the spec in case of conflicts, you _have_ to look at the reference code
[07:12:07] <superdump> and they do
[07:12:32] <wbs> hah, great
[07:12:35] <kshishkov> well, AMR spec was even less understandable than reference code IIRC
[07:12:43] <superdump> right
[07:13:10] <superdump> i didn't try to disseminate the comfort noise parts though
[07:13:28] <kshishkov> who cares? It's the same stuff for all speech codecs
[07:16:10] <superdump> mmm
[07:18:20] <av500> mmm, is that irc comfort noise?
[07:18:28] <kshishkov> yeth
[07:18:45] <kshishkov> of of the kinds
[07:18:48] <wbs> av500: although it isn't all that comforting at times ;P
[07:18:49] <kshishkov> *one of
[07:21:20] <kshishkov> wbs: not all of us have that comforting noise of Baltic sea waves
[07:21:40] <wbs> true :-)
[07:22:54] <av500> kshishkov: theres the rhine for you if you want :)
[07:48:10] <ohsix> is there a lunixy tool to convert between subtitle types somewhere
[07:52:36] <elenril> aegisub?
[07:54:57] <ohsix> it includes commandline tools?
[07:55:56] <elenril> don't think so
[08:11:29] <KotH> ohsix: use perl, there isnt anything more unix to convert between text file formats
[08:12:04] <ohsix> they aren't all text files
[08:12:15] <kshishkov> KotH: awk!
[08:12:44] <KotH> ohsix: that's why i said perl and not sed+awk
[08:13:36] <KotH> kshishkov: awk alone is a bit awkward... you need at least sed to do anything reasonable with it
[08:13:50] <kshishkov> KotH: worng
[08:13:58] <kshishkov> yes, _that_ wrong
[08:14:24] <kshishkov> I usually do that like /text-expr/{flag=1;} //{if(flag){do smth} flag=0;}
[08:15:05] <kshishkov> that also saves a bit on grepping
[08:15:47] <kshishkov> and it can manipulate fields
[08:16:06] <kshishkov> but I use perl for a bit more complicated tasks anyway
[08:17:08] <KotH> oh.. i do these kind of things in flex usually.. by far simpler than awk ;)
[08:17:37] <kshishkov> "flex" as parser generator?
[08:18:23] <KotH> exactly, the swiss army knife :)
[08:18:47] <mru> flex is nice, but not really a substitute for awk, sed, or perl
[08:19:32] <kshishkov> KotH: in your case it's swiss army yatagan
[08:29:20] <mru> fate was so nice and green... now look what someone's gone and done
[08:30:10] <av500> isnt that the purpose of fate? to tell you somebody did something?
[08:30:22] <mru> people should still be a little careful
[08:31:02] <kshishkov> wow, http://fate.multimedia.cx/index.php?stderr=221920
[08:31:23] <mru> that's been there for ages
[08:31:30] <mru> I don't know why mike doesn't fix it
[08:32:49] <wbs> is there any history in fate to show which commit actually broke test X on machine Y?
[08:33:18] <kshishkov> I wonder how http://fate.multimedia.cx/index.php?build_record=221921 got there
[08:34:15] <mru> it's failing on most of my machines
[08:34:21] <kshishkov> yes
[09:10:18] <j-b> Diego?
[09:24:33] <av500> mru: I have no idea why I needed that #if CONFIG_BSFS in the past...
[09:33:50] <CIA-7> ffmpeg: mru * r23079 /trunk/Makefile: FATE: print friendly error for individual tests when SAMPLES unset
[09:41:10] <funman> hi
[09:41:34] <kshishkov> hI
[09:41:52] <kshishkov> got any code to contribute?
[09:42:11] <funman> not yet but i would like to work on it
[09:42:34] <funman> i want to benchmark audio decoder
[09:43:11] <funman> i see ffmpeg has a -benchmark option but i think ffmpeg is only used in conversion, not decoding-only?
[09:43:56] <kshishkov> decoding is a process of converting packed data into unpacked form
[09:44:09] <funman> developer doc mentions benchmarking but doesn't give examples
[09:44:17] <funman> hm ok
[09:44:35] <kshishkov> so try "ffmpeg -benchmark -i infile -f null -"
[09:45:03] <kshishkov> add -vn/-an if needed
[09:45:31] <funman> thanks
[09:54:43] <funman> i'm looking at rockbox wma decoder, afaik mt & saratoga are too busy to contribute it back to FFmpeg
[09:59:55] <j-b> salut funman
[10:00:09] <funman> salut
[10:11:12] <KotH> j-b: if you are looking for diego, dont look in #mplayer :)
[10:11:31] <KotH> j-b: he's only on #mplayerdev and here
[10:11:51] <av500> not in #punctuation?
[10:12:28] <j-b> lol
[10:12:34] <j-b> KotH: too bad, patch applied
[10:15:46] <KotH> :)
[10:23:31] <j-b> I hope I didn't mistake
[10:42:55] <BastyCDGS> question, does somebody know if it is really portable to assume certain array element ordering, i.e. x[1024] == x[256][4]?
[10:43:43] <mru> that's forbidden
[10:44:07] <BastyCDGS> thanks, I'm doing the lut32 tables
[10:44:08] <mru> the layout is defined, but you're not allowed to do out-of-bounds accesses
[10:44:19] <BastyCDGS> I meant the layout
[10:44:29] <mru> why do you care about the layout?
[10:44:48] <BastyCDGS> I'm rearranging it a bit
[10:44:52] <mru> so?
[10:44:55] <Kovensky> optimilization?
[10:45:01] <mru> memory layout still doesn't matter
[10:45:02] <BastyCDGS> yes same for dp32 as for dp8 ;)
[10:45:20] <av500> Kovensky: lol
[10:45:20] <mru> as long as you access the arrays as they are declared, the compiler will take care of the rest
[10:47:02] <BastyCDGS> does this still work if you mix 1D with 2D array access?
[10:47:10] <mru> that's not allowed
[10:47:20] <BastyCDGS> i.e. safe to assume [i][j] == i*256 + j?
[10:47:23] <mru> no
[10:47:26] <mru> well, it is
[10:47:31] <mru> but you're still not allowed to do that
[10:47:47] <BastyCDGS> ok, then I'll use 1D array
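The layout point above can be sketched in C. This is a minimal illustration, assuming a hypothetical `table[ROWS][COLS]` (the names `table`, `element_offset`, `set_entry` and `get_flat` are invented for this note and are not from the iff decoder): the byte offset of `table[i][j]` is indeed `(i * COLS + j) * sizeof(element)`, yet portable code still indexes through the declared type and keeps each index inside its own dimension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROWS 256
#define COLS 4

/* Hypothetical table, for illustration only. */
static uint32_t table[ROWS][COLS];

/* Byte offset of table[i][j] from the start of the object.
 * Inspecting an object through a char pointer is always permitted. */
static size_t element_offset(size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return (size_t)((const char *)&table[i][j] - (const char *)table);
}

/* Write through the declared type; both indices stay in bounds. */
static void set_entry(size_t i, size_t j, uint32_t v)
{
    assert(i < ROWS && j < COLS);
    table[i][j] = v;
}

/* Read with a flat index by splitting it into row and column,
 * instead of writing table[0][flat], which runs past the inner array. */
static uint32_t get_flat(size_t flat)
{
    assert(flat < (size_t)ROWS * COLS);
    return table[flat / COLS][flat % COLS];
}
```

With a constant `COLS` that is a power of two, the division and modulo in `get_flat` normally compile down to a shift and a mask, so the bounds-respecting form should not cost anything measurable.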
[10:47:51] <mru> why?
[10:49:03] <kshishkov> mru: s/why/what for/
[10:49:15] <mru> same question
[10:49:27] <mru> and don't say it's faster
[10:49:33] <BastyCDGS> to do sth. like this:
[10:49:33] <BastyCDGS> http://pastebin.org/217296
[10:50:08] <mru> that won't work
[10:50:22] <mru> AV_WN64A is a macro that might evaluate the args more than once
[10:50:37] <mru> it probably won't
[10:50:50] <BastyCDGS> oh yes you're right, then I will put the ++ in the next line
[10:51:11] <mru> but that's a separate issue
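The macro hazard mentioned above can be shown with a small sketch. `STORE16_LE` is a hypothetical macro written for this note; it is not FFmpeg's `AV_WN64A` and makes no claim about how that macro is implemented. It only demonstrates what happens when a macro names its value argument twice and the caller passes an expression with a side effect.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical store macro that expands its value argument twice. */
#define STORE16_LE(p, v) do {            \
        (p)[0] = (uint8_t)((v) & 0xff);  \
        (p)[1] = (uint8_t)((v) >> 8);    \
    } while (0)

/* Unsafe: "*src++" appears twice after expansion, so src advances by two
 * and the two output bytes come from two different input words.
 * src must point at an array of at least two elements. */
static size_t store_unsafe(uint8_t *dst, const uint16_t *src)
{
    const uint16_t *start = src;
    STORE16_LE(dst, *src++);
    return (size_t)(src - start);
}

/* Safe: read into a temporary and increment on a separate line. */
static size_t store_safe(uint8_t *dst, const uint16_t *src)
{
    const uint16_t *start = src;
    uint16_t v = *src;
    src++;
    STORE16_LE(dst, v);
    return (size_t)(src - start);
}
```

Both functions return how far `src` moved: the unsafe variant reports 2, the safe one reports 1.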
[10:51:26] <mru> you're using a 2d array just fine there
[10:52:24] <BastyCDGS> it's probably better to use + 1, + 2 than ++ here an there...right?
[10:53:03] <mru> uh?
[10:53:30] <BastyCDGS> isn't the compiler smart enough here?
[10:53:36] <mru> to do what?
[10:54:07] <BastyCDGS> to temporary store mask and simply increment by one for each
[10:54:23] <BastyCDGS> hmm, anyway it doesn't use the inc instruction but addl 1
[10:54:26] <mru> isn't that what your code does?
[10:54:29] <BastyCDGS> so it shouldn't matter
[10:54:46] <mru> and now you're micro-optimising again
[10:54:47] <mru> stop it
[10:54:49] <BastyCDGS> I mean when I use + 1 and + 2 instead of those ++ parts (code looks simpler then)
[10:54:57] <mru> not to me
[10:55:16] <kierank> 11:54] <@mru> and now you're micro-optimising again --> more like pico-optimising
[10:55:57] <mru> things like this are almost impossible to write optimally for all architectures anyway
[10:55:58] <BastyCDGS> what's the problem when I've fun with it?
[10:56:07] * av500 notices the fast do{...}while() pattern
[10:56:09] <mru> if speed is that critical, you should write it all in asm
[10:56:30] <mru> the problem is that you're wasting your time and ours
[10:56:39] <BastyCDGS> well, I really thought of doing some asm versions of those but that's for later ;)
[10:56:50] <mru> lol
[10:58:10] <kierank> BastyCDGS: you need to look at things in perspective
[10:59:25] <mru> it's a frickin amiga format ffs
[10:59:53] <mru> speed is hardly relevant
[11:00:01] * av500 makes note to buy even more beer
[11:00:03] <mru> at least not the last 0.00001%
[11:00:28] <BastyCDGS> http://pastebin.org/217327
[11:00:39] <KotH> av500: beer?
[11:00:42] <mru> WASTE OF TIME
[11:00:45] <KotH> av500: comming to lug-camp?
[11:00:49] <mru> the paste, not the beer
[11:00:55] <mru> beer is never a waste of time
[11:00:59] <mru> unless it's french
[11:01:00] <funman> speaking of micro-optimising, what is the point of r19669 ?
[11:01:25] <mru> funman: to not have a stupid VLA
[11:01:55] <funman> so mainly cosmetics? (nice looking code)
[11:02:07] <mru> safer, more portable code
[11:02:13] <BastyCDGS> besides this I'm doing here micro and macro opt in one time
[11:02:27] <funman> ok
[11:02:33] <mru> a vla can easily blow up your stack and there's nothing you can do about it
[11:02:53] <mru> unless the compiler calls malloc, which is almost as bad
[11:02:58] <mru> and slower
[11:03:22] <mru> with gcc you lose one register and can't inline the function
[11:04:52] <kshishkov> what, another one?
[11:04:59] <mru> it requires a frame pointer
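The VLA argument above can be sketched as follows. This is not the change made in r19669; `MAX_TAPS` and `dot_reversed` are hypothetical names chosen for this note. The idea is to replace a scratch array declared as `int tmp[n]` with a fixed-size buffer plus an explicit bound check, so that stack use is known at compile time and an oversized request fails cleanly.

```c
#include <stddef.h>

#define MAX_TAPS 32  /* hypothetical upper bound for this sketch */

/* Dot product of a[] with b[] reversed, using a fixed-size scratch buffer
 * in place of a variable-length array.
 * Returns 0 on success, -1 if n exceeds the compile-time bound. */
static int dot_reversed(const int *a, const int *b, size_t n, long *out)
{
    int tmp[MAX_TAPS];  /* fixed size: no VLA, stack use is bounded */
    long sum = 0;

    if (n > MAX_TAPS)
        return -1;

    for (size_t i = 0; i < n; i++)
        tmp[i] = b[n - 1 - i];
    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * tmp[i];

    *out = sum;
    return 0;
}
```

The trade-off is that the caller has to handle the error return, but the failure is visible instead of being a silent stack overflow.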
[11:05:26] * kshishkov waits when GCC implements decent 1-register code compilation for x86
[11:06:15] <BastyCDGS> mru, by rearranging the tables I hope to get another speedup of 200% in dp32
[11:06:24] <BastyCDGS> the thing is they're accessed now more close
[11:06:31] <mru> well, I can't deny you hope
[11:06:44] <mru> how big is the table?
[11:06:57] <BastyCDGS> 32*1024*8
[11:06:58] <pJok> kshishkov, you are also waiting for Ukraine to be part of the EU? which is most likely to happen first? ;)
[11:07:08] <mru> that's a huge table
[11:07:12] <mru> make it smaller
[11:07:16] <mru> much smaller
[11:07:26] <av500> pJok: you think it will cost more than greece?
[11:07:40] <mru> a table that size will totally blow up your L1$
[11:07:59] <kierank> i don't think the eu will allow countries to fudge the economic restrictions any more ;)
[11:08:08] <kierank> restrictions for entry that is
[11:08:08] <kshishkov> pJok: GCC stuff, of course. Ukraine is actively maintaining status quo
[11:08:10] <BastyCDGS> why, you're just accessing one of 1024*8 per plane i.e. per inner loop
[11:08:23] <BastyCDGS> i.e. 8K per plane
[11:08:36] <BastyCDGS> but of course it should be discussed if it's worth having a 256K table
[11:08:44] <mru> but every image has all planes
[11:08:50] <mru> so you need to load the entire table
[11:08:57] <mru> it's not worth it
[11:09:26] <pJok> av500, unlike greece i think that ukranians actually know how to do an honest days work
[11:09:41] <mru> that table is big enough to blow L2 on many chips
[11:09:46] <pJok> i laughed when i heard that the retirement age in greece was 50...
[11:09:52] <mru> wtf?
[11:09:57] <mru> 50, seriously?
[11:10:00] <pJok> yeah
[11:10:05] <pJok> no wonder they are going bankrupt
[11:10:06] <mru> no wonder they're in trouble
[11:10:42] <av500> its not 50
[11:11:29] <mru> BastyCDGS: can you make the table entries 32-bit instead?
[11:11:42] <mru> read one byte at a time from the input
[11:11:48] <pJok> av500, there was something about that on the news... and they said 50
[11:11:51] <mru> then shift, mask, and index twice
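The suggestion above (32-bit entries, one input byte at a time, then shift, mask, and index twice) might look roughly like this. It is a sketch under assumptions, not the code from the pastes: `nibble_lut`, `init_nibble_lut` and `merge_plane` are invented names, the output is assumed to be 8-bit chunky pixels, and `plane` is assumed to be in the range 0..7. The table has 16 entries of 4 bytes, so 64 bytes in total.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each entry expands 4 bits into 4 bytes, one byte (0 or 1) per pixel. */
static uint32_t nibble_lut[16];

static void init_nibble_lut(void)
{
    for (unsigned n = 0; n < 16; n++) {
        uint8_t px[4];
        for (unsigned k = 0; k < 4; k++)
            px[k] = (uint8_t)((n >> (3 - k)) & 1);  /* MSB is the leftmost pixel */
        /* Building the entry via memcpy keeps it independent of endianness. */
        memcpy(&nibble_lut[n], px, sizeof(px));
    }
}

/* OR one bitplane into chunky pixels: `bytes` input bytes produce
 * bytes * 8 output pixels in dst. Assumes plane < 8, so the shifted
 * bit never leaves its own byte lane. */
static void merge_plane(uint8_t *dst, const uint8_t *src,
                        size_t bytes, unsigned plane)
{
    for (size_t i = 0; i < bytes; i++) {
        uint32_t hi = nibble_lut[src[i] >> 4] << plane;  /* shift, index */
        uint32_t lo = nibble_lut[src[i] & 15] << plane;  /* mask, index  */
        uint32_t v;

        memcpy(&v, dst, 4);
        v |= hi;
        memcpy(dst, &v, 4);

        memcpy(&v, dst + 4, 4);
        v |= lo;
        memcpy(dst + 4, &v, 4);

        dst += 8;
    }
}
```

Call `init_nibble_lut()` once before the first `merge_plane()`. Whether this beats a larger table on a given machine is exactly what the rest of the discussion is about, so it would need to be measured on full images.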
[11:12:18] <mru> http://news.bbc.co.uk/1/hi/world/europe/8506142.stm
[11:12:18] <pJok> not that i trust the news these days
[11:12:26] <mru> "The socialist government said it wanted to increase the average retirement age from 61 to 63 by 2015."
[11:12:30] <av500> pJok: http://aleksandreia.wordpress.com/2010/03/08/greek-retirement-age-and-more-on-the-greek-debt-crisis/
[11:12:35] <BastyCDGS> mru!
[11:12:38] <BastyCDGS> I got it!
[11:12:47] <BastyCDGS> speedup of 230%
[11:12:53] <BastyCDGS> with Ooze
[11:12:59] <mru> I don't trust your benchmarks
[11:13:16] <BastyCDGS> from 45k to 22k
[11:13:55] <mru> you need to measure the decoding time for the entire image, not one line
[11:14:13] <pJok> av500, so noone really knows the retirement age in greece...
[11:14:26] <av500> pJok: some ppl may retire at 58 -> all ppl retire at 58 -> 58 is in the 50ies -> all greek retire at 50..,
[11:14:35] <av500> thats how news works these days
[11:14:50] <mru> av500: and 58 was entirely made up to begin with
[11:14:57] <av500> mru: yup
[11:15:11] <mru> http://dilbert.com/fast/2010-05-05/
[11:16:45] <BastyCDGS> mru the speed calculation is averaged by 8192 runs that is quite accurate
[11:16:59] <BastyCDGS> but a whole image test would be nice too
[11:17:17] <kierank> do you know how much time that function takes in decoding an image BastyCDGS?
[11:17:55] <BastyCDGS> the other stuff is just memcpy or byterun1 decoding of one line
[11:18:00] <BastyCDGS> and a memset 0 of it before
[11:20:17] <BastyCDGS> mru, I just was completely reading my mail I was comparing speed with current version not with my first optimization
[11:20:30] <BastyCDGS> but still, from my first optimization its 30k
[11:20:32] <BastyCDGS> now its 22k
[11:21:00] <BastyCDGS> but the question is right, if that is worth a 256K table
[11:22:10] <mru> do a 32-bit table
[11:22:14] <mru> it may well be faster
[11:22:42] <BastyCDGS> with 32-bit I have to >> 4 and &15 but it probably won't be much slower
[11:22:46] <BastyCDGS> I'll just try
[11:23:09] <mru> a shift and a mask is much faster than a cache miss
[11:24:06] <BastyCDGS> yes
[11:25:13] <mru> almost everything you thought you knew from the amiga is wrong on modern systems
[11:27:58] <kshishkov> you should've started with VAX instead ;)
[11:28:28] <BastyCDGS> mru, 18k with 32 bit
[11:28:40] <BastyCDGS> but didn't add the shift & and stuff yet
[11:28:47] <mru> eh?
[11:28:55] <mru> benchmarking incorrect code is pointless
[11:29:19] <BastyCDGS> I just replaced 64 bit stuff by 32 bit and duplicated AV calls
[11:29:36] <BastyCDGS> and that shows that the 32-bit approach is faster
[11:29:46] <mru> not necessarily
[11:30:05] <mru> what machine are you running this on btw?
[11:30:20] <BastyCDGS> AMD athlon XP 2100
[11:30:32] <mru> how much cache does that have?
[11:31:26] <BastyCDGS> L1: 64K, L2: 256K
[11:31:43] <KotH> that's a 5y old machine...
[11:31:53] <mru> then you should see some difference
[11:32:29] <KotH> BastyCDGS: and i thought, i had an old box at home :)
[11:32:47] <BastyCDGS> KotH, the machine is just fine for me ;)
[11:32:57] <BastyCDGS> what do you have?
[11:33:55] <KotH> Athlon 64 3700 (2.2GHz)
[11:36:49] <BastyCDGS> mru, when I add shifting and masking I get 30k
[11:36:58] <BastyCDGS> sth. 29k as well
[11:37:16] <mru> as I said, I don't trust your benchmarking method
[11:39:05] <BastyCDGS> anyway, why? just because the result doesn't fit your expectations?
[11:39:17] <mru> because you're doing it wrong
[11:40:20] <BastyCDGS> I'm benchmarking the stuff I'm changing
[11:40:33] <mru> you think you are
[11:41:26] <BastyCDGS> dp32 isn't inlined
[11:41:27] <KotH> mru: you should tune down your tone a bit
[11:41:36] <KotH> mru: you aren't helping your case talking like this
[11:41:36] <BastyCDGS> if that's what you mean
[11:41:46] <mru> that's not what I mean
[11:42:00] <mru> you need to measure the full decoding time
[11:42:25] <mru> that's especially important when dealing with cache effects
[11:43:11] <BastyCDGS> but it does measure the full decoding...that's why it's called 8100 times
[11:43:37] <mru> you're measuring individual lines
[11:45:53] <BastyCDGS> mru, do we agree that the outer loops are negligible compared to the inner loop?
[11:46:06] <mru> that's not the issue
[11:46:46] <mru> if the inner loop blows the L1$, something entirely innocent-looking might start taking significant amounts of time
[11:47:52] <BastyCDGS> 8K per plane, therefore the cache gets full after 8 planes (if it's 64K), since L1 is used for different stuff too, I calc with 7 planes
[11:48:05] <BastyCDGS> given a 24bpp image, the cache is 3-4x flushed
[11:48:39] <BastyCDGS> and yes the measure detects that, the first runs are way over 100k
[11:50:54] <mru> but you're not measuring the effects of chucking out everything that was in the cache before the loop
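The measurement being asked for here, timing the decode of a whole image and not a single line in isolation, could be sketched like this. Everything in it is an assumption made for illustration: `line_fn`, `copy_line` and `time_full_image` are hypothetical names, `copy_line` merely stands in for a real per-line decoder, and `clock()` is a coarse timer compared with a cycle counter.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-line decode callback. */
typedef void (*line_fn)(uint8_t *dst, const uint8_t *src, size_t bytes);

/* Stand-in for a real line decoder: just copies the line. */
static void copy_line(uint8_t *dst, const uint8_t *src, size_t bytes)
{
    for (size_t i = 0; i < bytes; i++)
        dst[i] = src[i];
}

/* Average CPU seconds per full image. Timing all lines together lets
 * cache effects between lines and between tables show up in the result.
 * Assumes repeats > 0. */
static double time_full_image(line_fn fn, uint8_t *dst, const uint8_t *src,
                              size_t line_bytes, size_t lines,
                              unsigned repeats)
{
    clock_t t0 = clock();
    for (unsigned r = 0; r < repeats; r++)
        for (size_t y = 0; y < lines; y++)
            fn(dst + y * line_bytes, src + y * line_bytes, line_bytes);
    clock_t t1 = clock();

    return (double)(t1 - t0) / CLOCKS_PER_SEC / repeats;
}
```

In a real test the timed region would also include whatever runs between lines (for example the memset and byterun1 steps mentioned earlier), since those compete for the same cache.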
[11:51:02] <KotH> BastyCDGS: you dont have a fully associative cache
[11:51:40] <mru> wouldn't help if he did
[11:52:03] <mru> with LRU replacement he'd still lose everything that was there
[11:52:06] <KotH> BastyCDGS: you can only guesstimate what the cache behaviour is in simple programs... anything that is bigger is nearly impossible to know in advance
[11:52:10] <kierank> this conversation is getting painful to watch
[11:52:19] <mru> and random replacement is quite common too
[11:52:21] <KotH> kierank: then dont watch
[11:54:13] <BastyCDGS> mru, what do you think about creating the 256K table in decode_init with malloc and then filling it there?
[11:54:30] <BastyCDGS> av_malloc of course ;)
[11:54:32] <mru> that will obviously not make any difference
[11:54:45] <BastyCDGS> I meant not because of speed but library size
[11:54:54] <mru> separate discussion
[11:55:32] <BastyCDGS> but maybe, filling it by hand will fill L2, too. so it might give a speed impact but hard to say which one
[11:55:45] <KotH> kierank: if you want to watch something, i suggest you watch "kamen no maid guy" :)
[11:55:47] <mru> go read about caches
[11:55:55] <KotH> mru: uhmm..
[11:56:03] <KotH> mru: there are no good texts on cache behaviour
[11:56:07] <BastyCDGS> writing values will fill the caches
[11:56:17] <funman> BastyCDGS: there is a CONFIG_SMALL define already, perhaps you could use that
[11:56:28] <mru> KotH: hmm, I don't recall reading much about them either...
[11:56:28] <KotH> mru: not even H&P covers anything more recent than a P-II
[11:56:44] <mru> nothing fundamental has changed since then
[11:56:52] <KotH> nope
[11:57:05] <BastyCDGS> the optimization manual I posted quite a time here has lots of text about cache opts
[11:57:05] <KotH> just that we do not have a two level cache, but a three level
[11:57:16] <mru> that's not a fundamental difference
[11:57:24] <KotH> the delay difference between L1 and DRAM grow by one or two magnitudes
[11:57:34] <BastyCDGS> http://www.agner.org/optimize/
[11:57:36] <mru> that's why L2 and even L3 are important now
[11:58:15] <KotH> and cache behaviour has become a major bottleneck while in P-II times the CPU contributed to a larger portion of the processing time
[11:58:38] <mru> the basic operating principles of the caches are still the same
[11:58:45] <mru> fancy prefetchers aside
[11:58:58] <KotH> IIRC H&P only mentions caches, explains how they work, but does not discuss the impact of caches on programming and optimization
[11:58:59] <BastyCDGS> funman, thank you for the advice regarding CONFIG_SMALL
[11:59:15] <mru> BastyCDGS: leave that for later
[11:59:54] <BastyCDGS> KotH the link I posted does cover impact of caches to programming, both asm and C/C++
[12:00:23] <mru> then there are three alternatives
[12:00:29] <mru> 1) you are wrong and it doesn't
[12:00:32] <mru> 2) you didn't read it
[12:00:36] <mru> 3) you didn't understand it
[12:02:06] <KotH> BastyCDGS: it hardly touches the issues around caches
[12:02:18] <funman> Would you be interested by a piece of ARM asm code from rockbox for the flac decoder?
[12:02:31] <KotH> BastyCDGS: it tells you things that were true (ok, still are) 10y ago.. today it's a lot more complex
[12:02:35] <funman> it's slower than C, but it's assembly, so it's better right?
[12:02:53] <BastyCDGS> KotH the optimization manual deals anything up to Core2Duo
[12:02:55] <KotH> funman: give it to mru, he'll make it 3 times as fast
[12:02:58] <KotH> ;)
[12:03:14] <spaam> KotH: only 3 times? :O
[12:03:32] <funman> it's armv4 so i'm not sure he's interested
[12:03:42] <KotH> BastyCDGS: i just skimmed over optimizing_cpp.pdf... it doesnt explain anything of the more complex issues
[12:03:59] <funman> http://svn.rockbox.org/viewvc.cgi/trunk/apps/codecs/libffmpegFLAC/arm.S?view=markup
[12:04:22] <mru> those docs are heavily focused on x86 stuff
[12:04:25] <KotH> BastyCDGS: cache/memory behaviour is something really hard to predict and you have to think yourself through a lot of cases to grasp what's going on
[12:04:36] <mru> it sounds like you need to understand better how caches work in general
[12:04:42] <KotH> BastyCDGS: it's not something that can be explained in a few pages of lengthy text like that
[12:04:53] <mru> things like associativity and replacement policy
[12:05:00] <KotH> BastyCDGS: at least not if it's so basic level like this one
[12:05:25] <BastyCDGS> I didn't say it's full comprehensive and will explain all, but anyway it's a good lecture to start with
[12:05:47] <funman> it replaces this loop: http://git.ffmpeg.org/?p=ffmpeg;a=blob;f=libavcodec/flacdec.c;hb=HEAD#l389
[12:05:50] <BastyCDGS> I just said that it covers cache programming, not more not less
[12:06:13] <KotH> s/cover/mention/
[12:06:21] <mru> funman: the bps > 16 one?
[12:06:38] <KotH> BastyCDGS: do you own a copy of H&P ?$
[12:06:50] <BastyCDGS> no I don't
[12:07:01] <funman> yes, although the condition in rockbox is different but i can't tell why
[12:07:21] <KotH> BastyCDGS: get one
[12:07:59] <BastyCDGS> btw, I should mention the speed of my RAM here too, not the processor only ;)
[12:08:00] <KotH> BastyCDGS: it might sound stupid, but H&P covers all the basic stuff you need to know about most topics relevant in programming of fast code
[12:08:04] <BastyCDGS> it's 333MHz DDR1
[12:08:06] <mru> funman: why the fuck do they care about bpp>16?
[12:08:23] <BastyCDGS> mru, I use flacs with bpp > 16 ;)
[12:08:32] <funman> what's wrong with that?
[12:08:33] <mru> BastyCDGS: not on a hacked ipod
[12:08:45] <KotH> BastyCDGS: sadly, H&P doesnt go much into multiprocessor systems
[12:08:46] <mru> that said, gcc does some pretty grim things with the <=16 part too
[12:09:00] <mru> but flac decoding is so blazingly fast I can't be bothered to dig into it
[12:09:19] <BastyCDGS> KotH, do you have a PDF?
[12:09:38] <kierank> lol
[12:10:10] <KotH> BastyCDGS: no
[12:10:19] <KotH> BastyCDGS: but i'm quite sure you can find one on the net
[12:10:27] <KotH> BastyCDGS: though, this is a book worth to buy
[12:10:37] <BastyCDGS> what's the full name of the book?
[12:11:02] <funman> mru: if we only supported widespread files, we wouldn't have an atrac decoder!
[12:11:23] <mru> funman: I didn't say not to support them at all
[12:11:26] <KotH> BastyCDGS: computer architecture a quantitative approach
[12:11:27] <kshishkov> funman: but it's a Sony!
[12:11:42] <KotH> BastyCDGS: or The Hennessy And Patterson for short
[12:11:47] <mru> but since the hw is incapable of reproducing >16 bits anyway, you might as well convert them before loading them on the ipod
[12:12:58] <funman> depends 'the hw'
[12:13:27] <funman> there are some rockboxed players which have a spdif output
[12:13:30] <mru> show me a device capable of >16-bit output that's too slow to run that code
[12:14:55] <funman> clearly speed isn't a problem here so i'm wondering why the asm was made here (perhaps just for the dev to have fun)
[12:15:46] <mru> I looked at that code some time ago
[12:16:06] <mru> I didn't manage to achieve enough speedup to justify the asm
[12:17:30] <BastyCDGS> funman, is CONFIG_SMALL intended for code size or memory size or both?
[12:17:48] <mru> don't worry about config_small for now
[12:18:00] <funman> hum i'm reading it wrong it replaces the 16bps loop only
[12:18:07] <BastyCDGS> I ask because I want to know if I should use no 256K table at all when CONFIG_SMALL is used or differ between malloc and static const
[12:18:17] <funman> but there's coldfire asm for the other one
[12:18:28] <BastyCDGS> coldfire? nice...modern m68k ;)
[12:18:45] <mru> I'm saying you should almost certainly not have a 256k table at all
[12:19:29] <funman> BastyCDGS: code size is in memory anyway
[12:19:36] <funman> -size
[12:23:42] <funman> BastyCDGS: so it's for memory (code+data), not storage. If the table has the same size if you generate it at runtime or not, then just make it static (storage is cheap)
[12:24:48] <BastyCDGS> mru, maybe you missed that, but in the 256k table the elements are usually accessed nearby, except when plane jumps
[12:28:27] <kierank> Why does this function overwrite the register it's meant to be preserving on the stack: http://pastebin.org/217577 ?
[12:28:47] <kierank> effectively overwrite the register that is
[12:29:26] <mru> BastyCDGS: two words: replacement policy
[12:38:42] <BastyCDGS> mru, good news for you, I dropped the 256k table
[12:38:49] <BastyCDGS> and rearranged some stuff based on the old dp32 opt
[12:38:53] <BastyCDGS> now I have 24k
[12:39:09] <BastyCDGS> I think that's good enough to sacrifice the 256k table (which just gets 22k)
[12:41:01] <BastyCDGS> http://pastebin.org/217641
[12:41:48] <mru> but you're still not doing the benchmarks properly
[12:42:57] <mru> and what's with the local array?
[12:44:12] <BastyCDGS> do you think it's better to use the lut directly?
[12:49:16] <mru> depends on why you did it like that
[12:49:34] <mru> what does the table definition look like?
[12:50:07] <BastyCDGS> const uint32_t *lut[4] = {plane32_lut[plane][0],
[12:50:07] <BastyCDGS>                               plane32_lut[plane][1],
[12:50:07] <BastyCDGS>                               plane32_lut[plane][2],
[12:50:07] <BastyCDGS>                               plane32_lut[plane][3]};
[12:50:21] <mru> that's what you pasted, yes
[12:50:30] <mru> not what I asked
[12:50:48] <BastyCDGS> damn that's great! 17K
[12:51:22] <mru> ?
[12:52:04] <funman> 17K? that's cold
[12:52:14] <BastyCDGS> http://pastebin.org/217681
[12:52:32] <BastyCDGS> changing this to this pastebin was a jump from 25k to 17k dezicycles
[12:53:08] <av500> KotH: for unknown reasons my company decided to buy that book....
[12:53:21] <mru> BastyCDGS: what does the table definition look like?
[12:53:39] <mru> you obviously changed it between those pastes
[12:53:54] <mru> it's impossible for both of those to compile
[12:53:57] <BastyCDGS> current:
[12:53:58] <BastyCDGS> static const uint32_t plane32_lut[32][16*8] = {
[12:54:30] <BastyCDGS> old:
[12:54:30] <BastyCDGS> static const uint32_t plane32_lut[24][4][16] = {
[12:54:50] <BastyCDGS> the 24 here was changed to 32 (just copied that from patch file)
[12:55:16] <mru> why did 4 turn into 8?
[12:55:47] <BastyCDGS> because the old code used get_bits(&gb,4);
[12:56:55] <BastyCDGS> new table size is 16K
[12:58:49] <BastyCDGS> lol, mru, I see my fault it's not necessary ;)
[13:03:21] <KotH> av500: hmm?
[13:03:31] <KotH> av500: H&P isnt much worth for a reference book
[13:03:38] <av500> I did not care
[13:03:40] <av500> it was cheap
[13:04:43] <av500> and I did not order any books this month :)
[13:11:16] <BastyCDGS> mru
[13:11:19] <BastyCDGS> the final code:
[13:11:19] <BastyCDGS> http://pastebin.org/217726
[13:11:21] <BastyCDGS> works
[13:11:24] <BastyCDGS> 18k dezicycles
[13:12:13] <BastyCDGS> sry this one:
[13:12:13] <BastyCDGS> http://pastebin.org/217733
[13:12:42] * mru -> airport
[13:13:16] <BastyCDGS> where are you going to fly to? to the moon to watch the back side and check if gcc works then? :D
[13:13:37] <BastyCDGS> do you see further possibilities for optimize?
[13:17:52] <Tjoppen> won't that segfault on line 42 if *buf >= 8?
[13:18:08] <Tjoppen> *43
[13:18:48] <BastyCDGS> the offset is 0-63
[13:19:23] <Tjoppen> .. ah, right
[13:19:28] <BastyCDGS> since I'm shifting << 2 it's 0 <= 0x3C at maximum
[13:19:42] <Tjoppen> I was looking at the size of the first dimension (32)
[13:19:57] <BastyCDGS> I just tried it Ooze.iff decodes fine ;)
[13:20:46] <Tjoppen> ah, and lowest two bits are always zero. fair enough
[13:26:58] <BastyCDGS> patch submitted to ML
[13:27:02] * mru @ airport
[13:27:09] <av500> mru: no ash cloud?
[13:27:10] <BastyCDGS> lol
[13:27:48] <BastyCDGS> are you going to the monks, mru? was I that terrible?
[13:28:04] <mru> FRA
[13:28:23] <BastyCDGS> FRA?
[13:28:25] <BastyCDGS> france?
[13:28:48] <wbs> försvarets radioanstalt? ;P
[13:30:06] <kierank> could be frankfurt
[13:30:21] <av500> its the right 3 letter code...
[13:30:43] <BastyCDGS> in sports 3 letter code for france is FRA, too...
[13:30:55] <av500> he flies, not runs
[13:31:01] <funman> who would want to go to france?
[13:31:36] <pJok> FRA logs everything in here
[13:34:00] <mru> through security...
[13:34:36] * kshishkov knows where mru is heading to
[13:34:47] <BastyCDGS> and where?
[13:34:55] <kshishkov> Frankfurt
[13:35:06] <BastyCDGS> why?
[13:37:33] <mru> BEEEEER!!
[13:38:54] <Tjoppen> öl \o/
[13:39:05] <BastyCDGS> hint: Erdinger Hefeweizen ;)
[13:39:37] <Tjoppen> weihenstephaner
[13:40:00] <BBB> BastyCDGS: decodeplane32() looks good
[13:40:31] <av500> BastyCDGS: erdinger??? you have no taste
[13:40:45] <av500> erdinger is the heinken of hefes
[13:41:15] <mru> ack
[13:42:29] <BBB> BastyCDGS: unsigned/signed is irrelevant, we're reading from a bitstream and believe me, bitstreams have no concept of signed/unsigned
[13:42:32] <BBB> they have concept of bits
[13:42:33] <BastyCDGS> well I know of no german which doesn't like erdinger...it's one of the favorite beers there
[13:42:38] <BBB> if the bits are invalid, they are invalid
[13:42:40] <BastyCDGS> but remember there are different erdingers
[13:42:47] <BBB> whether you print them in signed or unsigned, they're crap either way
[13:42:56] <BBB> it's like a fourcc
[13:43:00] <BBB> you can print it as a fourcc
[13:43:09] <BBB> but if it's invalid, it's probably something like ^d%$x
[13:43:12] <BBB> if you're lucky
[13:43:48] <av500> BBB: hey, thats a valid one! :)
[13:44:06] <BBB> ^d?
[13:44:08] <BBB> no it isn't
[13:44:16] <av500> yes, klingon war codec
[13:44:30] <BBB> ah, of course
[13:44:41] <BBB> I thought that was \t%$x
[13:44:48] <av500> yes, the draft one
[13:45:01] <BBB> god damn these stupid klingons and their unusual integer counting
[13:45:07] <av500> what became klingox 3.11
[13:46:38] <Tjoppen> bah. lavf and libmicrohttpd are not on speaking terms
[13:48:38] <BBB> mru: decodeplane32() optimization ok?
[13:50:33] <BastyCDGS> BBB, fixed width & height using avcodec
[13:50:37] <BastyCDGS> check_dim
[13:50:44] <BBB> good :)
[13:50:50] <BBB> what made you think it doesn't check <=0?
[13:50:59] <BastyCDGS> because somebody yesterday told me so
[13:51:12] <BBB> if it were, I'd asked you to patch avcodec_check_dims() :)
[13:51:20] <BBB> because w=0/h=0 is obviously a bug
[13:51:37] <BastyCDGS> maybe it was fixed in the meantime?
[13:51:44] <mru> no
[13:52:41] <BastyCDGS> I thought you are sitting in the plane right now? ;)
[13:52:53] <wbs> BastyCDGS: if you don't do it already, subscribe to cvslog, or refresh the list of commits at git.ffmpeg.org, you'd know that nothing such was modified between yesterday and now
[13:53:16] <BastyCDGS> wbs, I know but didn't check today right now
[13:55:49] <Tjoppen> bah, nbgit doesn't do blame
[13:56:22] <Tjoppen> otherwise quite a nice plugin (for netbeans)
[13:56:35] <mru> eeeew
[13:58:46] * BBB wonders what mru "ehw"s about
[13:58:55] <av500> the food on the plane
[13:59:05] <av500> shepherds pie
[13:59:15] <BBB> ugh
[13:59:29] <kshishkov> ploughman's lunch :)
[13:59:32] <BBB> if you fly cathay, they give you a delicious instant noodle after you wake up
[13:59:43] <BBB> it's said to be the reason why all asians fly cathay
[14:00:20] <av500> cathay is not the nicest asian carrier
[14:00:34] <kshishkov> BBB: maybe he just reacted like that when hearing certain J*v* IDE name
[14:00:35] <kierank> singapore airlines is supposedly good
[14:00:38] <av500> yup
[14:01:17] <av500> and coop noodles are cheap
[14:01:20] <av500> err cup
[14:01:29] <kshishkov> Ukrainian International Airlines are not good
[14:02:44] <kshishkov> they'll give you some snacks from local store if your flight lasts for more than two hours
[14:04:23] <Tjoppen> netbeans isn't that bad. it's better at parsing C++ than MSVC for instance
[14:05:00] <BBB> singapore airlines is good
[14:05:06] <BBB> but doesn't fly the route I want
[14:05:28] <BBB> malaysian airlines is terrible :-p
[14:07:21] <kshishkov> Tjoppen: ok, you've convinced me. I'll try surstromming next
[14:07:41] <BBB> av500: hahahahhahahhaha :)
[14:07:57] <BBB> av500: I pay a good $25-$30 for the best noodle you've ever had here in new york
[14:08:12] <BBB> they are so delicious, people pay $30 and stand in line (reservations not allowed) for >2hrs
[14:08:17] <av500> those aren't the ones you get on a cathay flight
[14:08:22] <BBB> good noodles are priceless :)
[14:08:30] <BBB> I've been told cathay's noodle is quite good
[14:08:55] <av500> good compared to nothing on a 15h flight, yes
[14:09:00] <mru> korean air is good
[14:09:33] <mru> 3 meal choices even in economy
[14:09:44] <BastyCDGS> BBB, fixed the space ;)
[14:09:47] <av500> beef, fish and surprise
[14:09:58] <BBB> hm, good, I can probably apply that patch and the minor dp8 one then
[14:09:59] <av500> beef is gone after 3 rows :)
[14:10:04] <BBB> mru: dp3 ok to apply also?
[14:10:10] <BBB> I want to work on getting the ham patch in
[14:10:16] <av500> tasty ham
[14:10:51] <mru> BBB: probably ok, haven't seen the final patch
[14:11:28] <BastyCDGS> I will continue with HAM on wednesday ok? I have to do some math stuff, the exams are in 2 days
[14:11:36] <BBB> sure
[14:11:36] <BastyCDGS> have to learn some stochastics and linear algebra
[14:11:51] <BBB> this patch is ok, will apply later today
[14:12:14] <BastyCDGS> but what I would do right now as a very last...grayscale stuff
[14:13:11] <mru> learn that stuff well, you'll be needing it
[14:14:02] <BastyCDGS> you think I'll better START_TIMER/STOP_TIMER stuff then? ;)
[14:15:04] <BastyCDGS> BBB, what about the palette underflow patch? it seems to be okay too, at least nobody has complained so far ;)
[14:15:24] <BBB> I'll look at it
[14:15:33] <BBB> have to do actual work also ;)
[14:17:58] <BastyCDGS> oh wait, I see a problem with dp32 patch
[14:17:59] <BastyCDGS> endianness
[14:19:00] <BBB> rofl :)
[14:19:14] <BBB> I hope you test this stuff on both le and be
[14:19:41] * mru points at saracen
[14:19:44] <wbs> it may be a good treat not to be so triggerhappy
[14:20:05] * mru onboard
[14:20:15] * mru out
[14:20:56] <kshishkov> trevlig resa [Swedish: have a nice trip]
[14:22:42] <Tjoppen> kshishkov: hehe
[14:23:31] * Tjoppen goes back to staring at wireshark's dumps
[14:29:27] <BastyCDGS> hmm the #define LUT32 has now a width of 82 lines
[14:30:01] <kshishkov> that's height
[14:30:23] <BastyCDGS> I changed them to endian awareness which makes the lines twice as long
[14:30:32] <BastyCDGS> AV_LE2ME32C(1 << plane), AV_LE2ME32C(1 << plane), AV_LE2ME32C(1 << plane), AV_LE2ME32C(1 << plane),   \
[14:36:21] <BastyCDGS> you're right I was looking at lines sorry
[14:36:27] <BastyCDGS> columns is over 100
[14:37:00] <BBB> make a second macro
[14:37:33] <BBB> #define LUT32LINE(a,b,c,d) \ AV_LE2ME32C(a), \ \n AV_LE2ME32C(b), \ \n [etc]
[14:37:42] <BBB> and then use LUT32LINE(0, 0, 0, 0), \
[14:38:06] <BBB> then indenting remains consistent and each line only grows 11 characters
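A rough sketch of the two-level macro BBB suggests. Several things here are assumptions: CONV32C is an identity stand-in for a compile-time conversion macro such as AV_LE2ME32C (so the block builds without libavutil), PBIT is a hypothetical shorthand, and the 16-row layout (one row per 4-bit input, most significant bit first) is a reconstruction for illustration, not copied from iff.c.

```c
#include <stdint.h>

/* Stand-in (assumption) for a compile-time conversion macro such as
 * AV_LE2ME32C; identity here so the sketch is self-contained. */
#define CONV32C(x) ((uint32_t)(x))

/* Hypothetical shorthand for the bit contributed by one bitplane. */
#define PBIT(plane) (1U << (plane))

/* One table row: wrapping each entry here keeps the rows below short. */
#define LUT32LINE(a, b, c, d) \
    CONV32C(a),               \
    CONV32C(b),               \
    CONV32C(c),               \
    CONV32C(d)

/* Assumed layout: row n covers the 4-bit input n, column j holds the
 * plane bit if bit (3 - j) of n is set, and 0 otherwise. */
#define LUT32(plane) {                                            \
    LUT32LINE(0,           0,           0,           0),           \
    LUT32LINE(0,           0,           0,           PBIT(plane)), \
    LUT32LINE(0,           0,           PBIT(plane), 0),           \
    LUT32LINE(0,           0,           PBIT(plane), PBIT(plane)), \
    LUT32LINE(0,           PBIT(plane), 0,           0),           \
    LUT32LINE(0,           PBIT(plane), 0,           PBIT(plane)), \
    LUT32LINE(0,           PBIT(plane), PBIT(plane), 0),           \
    LUT32LINE(0,           PBIT(plane), PBIT(plane), PBIT(plane)), \
    LUT32LINE(PBIT(plane), 0,           0,           0),           \
    LUT32LINE(PBIT(plane), 0,           0,           PBIT(plane)), \
    LUT32LINE(PBIT(plane), 0,           PBIT(plane), 0),           \
    LUT32LINE(PBIT(plane), 0,           PBIT(plane), PBIT(plane)), \
    LUT32LINE(PBIT(plane), PBIT(plane), 0,           0),           \
    LUT32LINE(PBIT(plane), PBIT(plane), 0,           PBIT(plane)), \
    LUT32LINE(PBIT(plane), PBIT(plane), PBIT(plane), 0),           \
    LUT32LINE(PBIT(plane), PBIT(plane), PBIT(plane), PBIT(plane))  \
}

/* Example instantiation for bitplane 5 (16 rows of 4 entries). */
static const uint32_t sketch_lut_plane5[16 * 4] = LUT32(5);
```

Swapping the conversion macro in or out then means editing only CONV32C, not every table row.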
[14:51:00] <BastyCDGS> 72 cols now BBB ;)
[14:56:27] <BastyCDGS> uhh
[14:56:37] <BastyCDGS> it's wrong with the AV_LE2ME32C on be
[14:59:08] <BastyCDGS> and correct without on be/le ;)
[14:59:18] <BastyCDGS> so the patch should be fine as submitted to ML
[15:27:38] <BBB> ok
[15:31:21] <BastyCDGS> grayscale patch is almost finished
[15:31:23] <BastyCDGS> just one minute
[15:33:49] <BastyCDGS> submitted
[16:06:51] <Tjoppen> how.. interesting
[16:07:43] <Tjoppen> instead of getting something like "GET /foo/bar" for the first header line in libmicrohttpd, when lavf connects, I get lines like "c2"
[16:08:16] <Tjoppen> no wonder the seeks fail
[16:30:18] <Tjoppen> hah, I think I figured it out
[16:31:41] <Tjoppen> looks like I need to patch the http protocol handler.. when it seeks chunksize needs to be reset
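A minimal sketch of the fix Tjoppen describes, under stated assumptions. ChunkedReader and both functions are hypothetical and much simpler than lavf's real http protocol handler, and the convention that a negative chunksize means "no chunk header parsed yet" is chosen here for illustration only.

```c
#include <stdint.h>

/* Hypothetical, simplified reader state; not lavf's actual HTTP context. */
typedef struct ChunkedReader {
    int64_t off;       /* current byte offset within the resource */
    int64_t chunksize; /* bytes left in the current chunk,
                          negative = no chunk header parsed yet */
} ChunkedReader;

/* Seeking starts a new request, so the byte count left over from the
 * previous response no longer applies. Resetting it forces the next read
 * to parse a fresh chunk-size line instead of treating it as payload. */
static int64_t reader_seek(ChunkedReader *r, int64_t new_off)
{
    r->off       = new_off;
    r->chunksize = -1;
    return r->off;
}

/* Returns nonzero when the next read must begin with a chunk-size line. */
static int reader_needs_chunk_header(const ChunkedReader *r)
{
    return r->chunksize < 0;
}
```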
[16:32:18] <peloverde> http://www.embedded.com/columns/technicalinsights/224701206?cid=RSSfeed_embedded_news
[17:01:52] <CIA-7> ffmpeg: rbultje * r23080 /trunk/libavcodec/iff.c:
[17:01:53] <CIA-7> ffmpeg: Ensure that width and height are > 0. avcodec_open() itself only checks that
[17:01:53] <CIA-7> ffmpeg: they are >= 0.
[17:01:53] <CIA-7> ffmpeg: Patch by Sebastian Vater <cdgs basty googlemail com>.
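The kind of check the r23080 log message describes might look roughly like the following. This is only a sketch: check_positive_dims is a hypothetical name, the actual change in iff.c may be written differently, and plain errno values are used here so the block builds without FFmpeg's headers.

```c
#include <errno.h>

/* Hypothetical stand-in for a decoder-init check: reject zero as well as
 * negative dimensions, since a generic ">= 0" test lets 0 through. */
static int check_positive_dims(int width, int height)
{
    if (width <= 0 || height <= 0)
        return -EINVAL; /* plain errno used to keep the sketch self-contained */
    return 0;
}
```

A decoder would call this before allocating any per-row or per-plane buffers, so that a 0x0 stream fails early with an error.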
[17:18:59] <CIA-7> ffmpeg: rbultje * r23081 /trunk/libavcodec/iff.c:
[17:18:59] <CIA-7> ffmpeg: Optimize decodeplane32().
[17:18:59] <CIA-7> ffmpeg: Patch by Sebastian Vater <cdgs basty googlemail com>.
[18:21:39] * janneg slaps ramiro with the there-is-no-libfaadbin_decoder-trout
[18:36:26] <BBB> maybe I'm just being unclear on the mailinglist
[18:36:28] <BBB> hmm...
[18:36:44] <BBB> can anyone confirm my replies are unclear in that grayscale/iff thread?
[19:01:28] <ramiro> ./configure --disable-filters && make --> ffmpeg.c:1664: undefined reference to `av_vsrc_buffer_add_frame'
[19:02:23] <ramiro> mru: ^^
[19:02:32] <ramiro> janneg: ?
[19:05:43] <KotH> BBB: do i have to read your replies first, or can i just confirm it? ;)
[19:14:21] <BBB> you lazy bugger :)
[19:19:15] <_av500_> BBB: what did you say?
[19:42:18] <janneg> ramiro: ./configure --enable-gpl --enable-libfaad --enable-libfaadbin --disable-ffserver misses -ldl in extralibs
[20:08:53] <ramiro> janneg: oooh, I finally understand it =) I was the one to introduce that, right?
[20:16:51] <janneg> ramiro: yes, three years ago, so I don't think it's an urgent problem
[20:17:26] <janneg> ramiro: patch sent to ml
[21:17:03] <CIA-7> ffmpeg: reimar * r23082 /trunk/libavcodec/x86/h264dsp_mmx.c:
[21:17:03] <CIA-7> ffmpeg: Replace more "m" constraints with MANGLE to fix compilation issues
[21:17:03] <CIA-7> ffmpeg: with x86_32 gcc 4.4.4 and -fPIC.
[22:21:56] <peloverde> "Howcast found that for some of its video transcoding, the video quality produced by the open-source application FFmpeg wasn't up to snuff." ouch http://news.idg.no/cw/art.cfm?id=8373D6CF-1A64-6A71-CE7AABF6DAB260F5
[23:52:47] <Compn> peloverde : ehe, good article, but i've never heard of 'howcast'
[23:54:39] <Compn> lol
[23:54:46] <Compn> howcast using ... facebook for video hosting
[23:56:04] <Compn> oops, no it's not
[23:56:06] <Compn> damn facebook crap
[23:57:52] <Compn> http://media.howcast.com/system/videos/6/28/03/328.flv
[23:57:58] <Compn> uses lame at least