[FFmpeg-devel-irc] IRC log for 2010-06-16

irc at mansr.com irc at mansr.com
Thu Jun 17 02:00:42 CEST 2010


[01:08:33] <bcoudurier> oh man, people can say flash sucks, but all other alternatives suck even more (except on the desktop of course)
[01:13:16] <CIA-92> ffmpeg: hyc * r23621 /trunk/ffserver.c:
[01:13:16] <CIA-92> ffmpeg: When reading a stream, should retry on EAGAIN instead of just failing. Also,
[01:13:16] <CIA-92> ffmpeg: when reading a live feed, should retry regardless of whether any client has
[01:13:16] <CIA-92> ffmpeg: opened the stream.
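The retry-on-EAGAIN behaviour described in the commit message above can be sketched roughly as follows. This is not the code from ffserver.c; `read_retry` and its `timeout_ms` parameter are hypothetical names for illustration, and only the POSIX `read` and `poll` calls are assumed.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, not taken from ffserver.c: read from a descriptor
 * and, instead of failing on EAGAIN, wait for data and try again. */
static ssize_t read_retry(int fd, void *buf, size_t len, int timeout_ms)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                  /* data, or 0 at end of stream */
        if (errno == EINTR)
            continue;                  /* interrupted by a signal: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                 /* a real error: report it */

        /* No data yet: wait until the descriptor becomes readable. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int r = poll(&pfd, 1, timeout_ms);
        if (r == 0) {
            errno = EAGAIN;            /* timed out, still nothing to read */
            return -1;
        }
        if (r < 0 && errno != EINTR)
            return -1;                 /* poll itself failed */
    }
}
```

The timeout keeps a stalled feed from blocking the caller forever; whether a live feed should ever give up is a policy choice this sketch leaves to the caller.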
[01:13:53] <hyc> oh while you're here, bcoudurier: one more patch - I rewrite the ffm file header with whatever header ffmpeg sent
[01:14:06] <hyc> since ffserver's notion of the codec parameters is pretty sparse
[01:14:46] <hyc> this way the ffm file will have a fully populated header, and can be played directly with ffplay etc.
[01:15:02] <hyc> can't dig up the email with the patch at the moment
[01:15:08] <hyc> still trawling
[01:16:28] <hyc> ah, here. the first hunk of the patch. http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2010-May/088944.html
[01:17:15] <hyc> though you could give a look to the second hunk too.
[01:18:32] <hyc> probably that should be handled by a different config option. basically I want to be able to define a stream with very few fixed parameters. as long as the bitrate is right, it should be OK for the framerate to be 24, 30, or whatever the input was.
[04:54:03] <CIA-92> ffmpeg: michael * r23622 /trunk/libavutil/internal.h: Document FF_SYMVER and attribute_used
[05:08:53] <Gottaname|Mobili> !seen hyc
[05:08:56] <Gottaname|Mobili> :(
[05:09:14] <siretart> god morgon
[05:09:21] <siretart> oh, michael is up early as well :-)
[05:11:37] <elenril> morning
[05:13:49] <elenril> siretart: "- libvpx is considered (L)GPL incompatible" ?
[05:21:19] <astrange> not anymore
[05:22:23] <elenril> i know, so what's this line doing in release notes?
[05:25:32] <elenril> siretart: 'the matroska demuxer was exented'
[05:34:37] <KotH> moin girls
[05:38:13] <siretart> elenril: please fix straight in svn
[05:38:26] <siretart> hi KotH!
[05:53:49] <elenril> siretart: i can't ;)
[06:11:18] <CIA-92> ffmpeg: siretart * r23623 /branches/0.6/RELEASE: remove note that libvpx was considered (L)GPL incompatible
[06:11:24] <siretart> elenril: done
[06:11:30] <siretart> need to hurry to work now
[06:11:53] <siretart> wow, Phoronix has already picked it up: http://bit.ly/bsn3Fx
[06:14:32] <av500> siretart: somebody needs to update the web!
[06:14:52] <peloverde> oh noes! the tubes!
[06:15:08] <av500> the web still says lgpl incompat!
[06:15:11] <av500> omg
[06:15:19] <av500> it will be in google cache in 5 min...
[06:16:06] <siretart> av500: the website is fine, you mean the file in the releases/ subdir?
[06:16:28] <siretart> it is included in the tarball, and I'm not going to reroll the tarballs for that
[06:16:44] <av500> http://ffmpeg.org/releases/ffmpeg-0.6.release
[06:16:50] <av500> - libvpx is considered (L)GPL incompatible
[06:17:08] <av500> it might scare ppl
[06:17:11] <siretart> yes, it's a bug in the release, maybe we can bribe KotH to cowboy/fake the file, it is not signed after all
[06:17:16] <siretart> will be fixed in 0.6.1
[06:18:31] <ohsix> it has a date on it right? it was correct when it was written?
[06:19:18] <siretart> ohsix: it doesn't mention the timezone, we have still june 15 for nearly 4 hours in apia time ;-)
[06:19:30] <siretart> afk now for work
[06:51:48] <elenril> kshishkov: http://news.slashdot.org/story/10/06/15/2256227/In-Ukraine-IT-Freelancing-Under-Threat
[06:59:52] <av500> elenril: "...or leave the country."
[07:11:10] <KotH> siretart: if you tell me what you want, then i might do it for you
[07:12:51] <siretart> KotH: av500 suggests to delete the (last) line in ffmpeg-0.6.release that claims libvpx was considered (L)GPL incompatible
[07:16:33] <KotH> siretart: and what do you offer to bribe me?
[07:16:44] <superdump> siretart: maybe it should read HE AAC v1 support rather than just HE AAC
[07:17:01] <superdump> the PS decoding necessary for full HE AAC support is still being worked on
[07:17:43] <superdump> though if 0.6 is likely to hang around for any length of time, it would be nice for PS support once merged to be backported
[07:18:40] <siretart> superdump: sounds good. the changelog should be clear enough for that, feel free to adjust the website
[07:19:09] <siretart> superdump: as for the backport, sounds great to me. please nominate the revisions in trunk to backport and I'll consider it
[07:19:46] <superdump> they haven't landed yet but peloverde is working on it :)
[07:20:07] <siretart> great! :-)
[07:32:34] <av500> lol at steve lhomme: "...For example in Matroska audio frames have to be actual frames for decoding.."
[07:33:23] <av500> well, except for RA when they are rmvb scrambled audio frames because they are too lazy to write a descrambler in libmkv...
[07:34:50] <jai> right no we just duplicate mplayer's descrambling code all over the place
[07:34:53] <jai> *now
[07:35:25] <av500> the descrambling code has one place, that is the RMVB demuxer
[07:35:42] <av500> why they decided to just copy the whole "superblocks" into mkv is beyond me
[07:35:56] <av500> what if I need to remux and cut inside such a superblock?
[07:36:04] <av500> I use MKV edit lists? :)
[07:36:19] <jai> dont do that with realaudio then ;)
[07:36:44] <av500> not that I would do anything with realaudio unless at gunpoint
[07:36:44] <elenril> why would you want to use realaudio anyway =p
[07:37:06] <av500> elenril: only as long as chinese ppl buy my products...
[07:37:09] <saintdev> because it's 1995
[07:37:32] <av500> already? new windows should be out soon then
[07:38:09] <saintdev> oh wait, they were only just founded in 95
[07:38:35] <elenril> av500: and ipv4 addresses will be depleted in 2 years
[07:38:49] <saintdev> guess that makes it 97 or so when it was really popular
[07:39:13] <av500> elenril: ipv4 will always be depleted in 2 years
[07:40:05] <elenril> av500: MyPointExactly =p
[07:40:15] <superdump> http://lwn.net/Articles/392153/
[07:40:26] <av500> elenril: no matching trope?
[07:41:04] <kshishkov> elenril, that's what you get when government decides to regulate things. Previously nobody cared about outsourcing and it was a good currency income. Let's see what they get now
[07:41:29] <elenril> sure, http://tvtropes.org/pmwiki/pmwiki.php/Main/DontExplainTheJoke
[07:42:05] <av500> lol: 6) The joke is German, where a lot of jokes are explained in the end, for some reason.
[07:42:49] <kshishkov> elenril: well, I remember hearing a prayer "oh god, don't let them regulate electronic commerce and stuff"
[08:45:27] <Tjoppen> the fft visualizer in ffplay shows a bunch of green lines when playing mono
[08:48:48] <av500> wrong color?
[08:49:29] <Tjoppen> more like it thinks it's stereo and transforms garbage
[08:50:03] <Tjoppen> it's red-ish for the left channel, green-ish for right, or white if both channels are the same. for mono, it should therefore just do the one channel and display in white
[08:52:49] <Tjoppen> ah, someone already reported it: https://roundup.ffmpeg.org/issue2005 "  FFplay displays random data for audio in mono files."
[08:53:27] <elenril> maybe that's a feature
[08:54:03] <Tjoppen> worksforme? :)
[09:29:13] <Kovensky> <@superdump> http://lwn.net/Articles/392153/ <-- lol @ that bawjaws comment
[09:29:17] <Kovensky> sure is fail
[09:32:13] <Kovensky> "I’ve never been so happy to see the number -448" lol mike
[09:33:38] <av500> I have had such moments
[10:20:08] <nfl> merbzt: ping
[10:29:25] <janneg> has anyone iso 14496-3 2009? google doesn't. I'm working on a LATM demuxer
[10:30:23] <av500> http://jongyeob.com/moniwiki/pds/upload/14496-3.pdf
[10:30:24] <mru> janneg: only 2005 here
[10:30:35] <av500> dunno if its 2009
[10:33:26] <janneg> mru: let's first see which version is behind the link. will take some minutes 52M with 50k
[10:34:06] <av500> janneg: yes, im at 50%
[10:34:13] <av500> 62%
[10:34:45] <janneg> 42% and it is accelerating
[10:35:38] <av500> 93
[10:36:17] <av500> 2004
[10:36:26] <av500> Amd2:2004
[10:36:42] <mru> janneg: sure you need 2009?
[10:36:44] <CIA-92> ffmpeg: cehoyos * r23624 /trunk/libavutil/mem.h: icc 12 finally fixed attribute(used) so gcc's DECLARE_ASM_CONST can be used.
[10:37:35] <av500> and it's a frakin scan?
[10:38:38] <janneg> mru: I don't think I need the latest for LATM.
[10:39:09] <jai> demuxer or bitstream filter?
[10:39:27] <janneg> av500: already suspected that. the pdf in the iso webstore is just 8M
[10:39:45] <janneg> jai: demuxer, and I have paul's bitstream filter
[10:39:58] <jai> hmm, k
[10:43:04] <janneg> av500: that's just iso14496-3:2001/Amd2:2004 SSC and already more than 1000 pages
[10:43:59] <mru> I have the 2005 version if that helps
[10:44:02] <mru> as proper pdfs
[10:44:09] <av500> mru: A8 without VFP . a big loss?
[10:44:20] <mru> is that even a valid configuration?
[10:44:40] <av500> it exists
[10:44:49] <mru> where?
[10:45:05] <av500> http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=i.MX508&webpageId=1267451468810721548DD8&nodeId=0162468rH31143ZrDR8DD8&fromPage=tax
[10:45:19] <mru> oh those idiots
[10:45:26] <av500> :)
[10:45:38] <barque> ARM sucks
[10:45:43] <barque> I used an ARM9 in my EDP
[10:45:46] <barque> slow as all hell
[10:45:52] <janneg> mru: yes
[10:45:52] <mru> barque: careful what you say
[10:46:04] <barque> if you want anything useful try PPC
[10:46:07] <av500> barque: please, keep some of your wisdom to yourself
[10:46:08] <barque> embedded of course
[10:46:08] <mru> rotfl
[10:46:34] <barque> and anything low powered try PIC
[10:46:40] <barque> and cheap
[10:46:43] * mru dies laughing
[10:46:53] <siretart> any mips fans around?
[10:46:55] <barque> well keep laughing
[10:47:09] <spaam> RIP mru :(
[10:47:09] <mru> siretart: my SGI Octane has fans
[10:47:16] * siretart fetches some popcorn :-)
[10:47:55] <spaam> now someone going to say that x86 is better then * ? :P
[10:48:18] <barque> No seriously, please tell me the strengths of ARM over PPC or PIC
[10:48:23] <barque> show me how great(er) it is
[10:48:26] <mru> barque: what were you doing with the arm9?
[10:48:40] <siretart> spaam: or m68k ;-)
[10:48:56] <spaam> siretart: :)
[10:48:56] <av500> x86 is better then x85
[10:48:57] <barque> small thin client embedded video/audio processing, should've gone with a DSP but I had *no* idea this thing was gonna be this bad
[10:49:05] <av500> lol
[10:49:20] <mru> barque: wrong tool for the job
[10:49:21] <av500> siretart: pass some popcorn
[10:49:34] <barque> of course but, 640x480 integer muls and this thing choked for 2 seconds
[10:49:35] <mru> a ppc with the same price or power consumption would be equally slow at that
[10:50:39] <av500> mru: back to vfp, big loss?
[10:50:44] <mru> yes
[10:50:55] <mru> even the a8 vfp is much faster than softfloat
[10:50:59] <mru> and there's also no neon
[10:51:01] <mru> HUGE LOSS
[10:51:02] <lu_zero> well ppc is easier to understand (less options)
[10:51:02] <av500> ah damn, its the fpu it self. eight
[10:51:05] <av500> ah damn, its the fpu it self. right
[10:51:10] * av500 stupid
[10:51:27] <mru> lu_zero: that's highly debatable
[10:51:37] <mru> the ppc instruction set has its share of quirks
[10:52:04] <mru> not saying ppc is bad per se
[10:52:22] <lu_zero> mru: there are fewer instructions -> possibly less quirk
[10:52:47] <lu_zero> beside when someone microcodes some instructions to spare space...
[10:53:04] * lu_zero looks at the multiple load and store that are now quite actively deprecated =_=
[10:53:12] <barque> honestly, you either want performance or low power. If you're looking at mass produced cheap-as-all-hell low powered stuff, PIC is your man. If you want performance, try a DSP. I really don't know where anything else fits in.
[10:53:36] <lu_zero> barque: the bfin is right in between
[10:53:53] <lu_zero> (talking about quirky instruction set)
[10:54:04] <barque> I was expecting some nice balance between the two on the ARM... BOOOY was I wrong.
[10:54:22] <lu_zero> barque: arm isn't that ugly
[10:54:24] <mru> you're looking at three corners of a triangle
[10:54:32] <lu_zero> just have to pick the right one
[10:54:38] <mru> PIC is ultra-low power uC
[10:54:39] <lu_zero> (same for ppc)
[10:54:50] <mru> DSP is highly specialised number-crunching
[10:54:54] <barque> maybe it was the ARM9,... meh water spilt anyway. I guess Cortex might've done me better.
[10:55:01] <mru> ARM, MIPS, PPC etc are general-purpose CPUs
[10:55:31] <mru> surely you can't be suggesting a PIC would have done better than an ARM9
[10:55:53] <barque> no, I was thinking (after the fact) that I should've gone with a Freescale DSP board
[10:56:01] <barque> I probably should have.
[10:56:36] <lu_zero> barque: all depends on what you have to do
[10:57:03] <lu_zero> freescale arm aren't bad even if you must be careful about their revision and errata in my experience...
[10:57:08] <barque> I know, I just (prior to implementation) was all hyped up about the ARM ... thinking "oh man with all this commotion, this thing will do magic!!"
[10:57:25] <lu_zero> barque: same they said about ppc and mips
[10:57:34] <lu_zero> then apple and sgi went doing something else
[10:57:54] <av500> barque: sorry, you make little sense
[10:58:14] <av500> video decoding on an arm9 is as bad as on a similarly clocked ppc
[10:58:17] <barque> av500, can you scroll up and read? how come you're the only one who can't comprehend.
[10:58:28] <av500> Im extra thick
[10:58:29] <barque> as I said: I was wrong.
[10:58:35] <lu_zero> and tall
[10:58:39] <lu_zero> and overly huge
[10:58:47] <lu_zero> so barque beware
[10:58:57] <av500> lu_zero: :)
[10:58:57] <merbzt> lu_zero: I always get scared when I meet av500
[10:59:22] <av500> barque: not many ppl admit they are wrong on irc, so I guess I missed that :)
[10:59:42] * mru only saw him saying using arm9 was wrong
[10:59:50] <mru> not that his uninformed opinion was wrong
[10:59:54] <barque> yeah my choice of ARM9 for that application was wrong.
[11:00:11] <barque> I guess PPC would've been 'as bad'
[11:00:19] <wbs> I never saw any retraction of the general statement that arm sucks either
[11:00:20] <av500> yep
[11:00:36] <barque> but honestly next 'media' application would definitely need a DSP for me
[11:00:45] <lu_zero> barque: btw
[11:00:54] <mru> guess why there are thousands of chips with ARM _and_ DSP
[11:01:06] <lu_zero> using altivec or neon will give you something that is nearly dsp-like as experience
[11:01:11] <mru> DSPs generally suck as application processors
[11:01:11] <lu_zero> pending some effort
[11:01:28] <barque> DSPs are tricky... you gotta be careful with your own static scheduling
[11:01:34] <mru> we know how they work
[11:01:36] <barque> make sure you don't cause any data hazards
[11:01:39] <lu_zero> the problem is that you have to put that effort most of the time
[11:01:45] <barque> yeah
[11:01:53] <jai> is there a reason we dont have a av_destroy_stream or equivalent function?
[11:02:49] <mru> jai: what would that do?
[11:03:18] <jai> mru: deallocate memory for a single stream and reduce nb_streams
[11:03:47] <jai> there is av_close_input_stream, but that does it for all open streams
[11:03:52] <mru> and when a packet for that stream is encountered?
[11:07:12] <jai> i assume the user application would drop them
[11:08:03] <mru> the user app wouldn't even get them in that case
[11:08:22] <mru> the data you want to release is required to demux properly
[11:08:26] <mru> might be at least
[11:08:29] <jai> ah, i'm looking at this from the muxing perspective
[11:08:49] <mru> then why did you open it in the first place?
[11:09:08] <jai> exactly, the user application shouldnt
[11:09:35] <mru> barque: so seriously... you tried to "process" 640x480 video on an ARM9, failed, and concluded that all ARMs suck?
[11:09:43] <mru> jai: so what's the problem?
[11:10:07] <jai> but i was just thinking if there is a way to possibly rollback the changes av_new_stream does
[11:10:32] <jai> it probably reflects bad api usage i guess
[11:12:01] <KotH> barque: just in case you don't know yet, the people here have a lot of first hand experience with different cpus and how to write highly optimized asm for those. and they didn't start doing it yesterday
[11:13:02] <barque> wow what happened here
[11:13:17] <mru> barque: you made an arse of yourself
[11:13:25] <barque> ok ok hold on
[11:13:29] <barque> 1 by 1 please
[11:13:31] <barque> I was in the kitchen
[11:13:40] <barque> I concluded that by no means ARM has anything special
[11:13:47] <barque> so yeah that's my definition of "sucks"
[11:13:59] <barque> KotH, I didn't say that
[11:14:05] <mru> and what, by your reckoning, has "something special"?
[11:14:09] <_av500_> i dont get the kitchen bit..
[11:14:37] <barque> I mentioned two ends of the extreme, PIC or DSPs
[11:14:39] <mru> _av500_: maybe he tried to fry an egg on an ARM and failed
[11:14:40] <barque> they are special
[11:15:01] <mru> barque: you mentioned two of the three major CPU types
[11:15:02] <barque> alright alright, I'm sorry if I made any ARM fanboys angry , you guys can get back to topic now. av500: I was eating
[11:15:08] <mru> PIC is an 8-bit microcontroller
[11:15:16] <barque> not just 8 bit
[11:15:17] <KotH> barque: rotfl.. neither PIC nor DSPs are special
[11:15:18] <mru> very limited in processing power
[11:15:20] <barque> they have 16 and 32 bit lines as well
[11:15:29] <mru> very low power consumption
[11:15:31] <mru> and very cheap
[11:15:37] <barque> yes I talked about all of that
[11:15:43] <mru> you use them when they are good enough for the job
[11:15:49] <mru> you would not use a PIC for video processing
[11:15:50] <KotH> mru: and they've troubles if you order more than 10k/month :)
[11:15:56] <barque> and I didn't say you would.
[11:16:37] <mru> a DSP is tailored for running very specific algorithms on large amounts of data
[11:16:42] <KotH> barque: again, the people here know their stuff
[11:16:49] <barque> sigh , I didn't say they don't
[11:16:54] <barque> that does not mean I don't know my stuff
[11:16:54] <KotH> barque: there is no need in telling them what a PIC or DSP is
[11:16:57] <barque> or I assumed otherwise
[11:17:01] <barque> this conversation ends here.
[11:17:09] <KotH> barque: no, you just show that you are very inexperienced
[11:17:16] <barque> back then I was
[11:17:18] <mru> and overconfident
[11:17:25] <mru> and a little arrogant
[11:17:26] <barque> as I said, no I make my choices differently, as I mentioned earlier.
[11:17:41] <barque> s/no/now
[11:17:45] <mru> I'm not stupid, I just make my choices differently
[11:18:08] <KotH> barque: you ended up in a corner of the internet where people are obsessed with getting the most out of any kind of cpu. in a corner where people know how to write efficient code in any language (that supports writing efficient code)
[11:18:20] <barque> you are assuming I didn't write efficient code
[11:18:25] <mru> yes, we are
[11:18:32] <janneg> jai: the problem is that stream numbers are allocated contiguously. if you remove stream 0, many applications will miss the last stream since its id is now equal to nb_streams
[11:18:39] <mru> you don't seem like the kind of person who'd know how to write efficient code
[11:18:39] <KotH> barque: most of the people here are writing code longer than you can spell cpu
[11:18:40] <barque> ok then, stick a poodle up your doodle
[11:19:13] <KotH> insulting op was never a good idea in any channel i've ever been ^^'
[11:19:19] <wbs> what's it with all the arrogant newcomers these days?
[11:19:31] <wbs> "oh please, can anyone notice me? see how I can use all these fancy words"
[11:19:33] * mru must troll them harder
[11:20:10] <KotH> wbs: i could say a lot of things now, but i probably should not, as i wasnt any different 10y ago ^^'
[11:20:57] <janneg> jai: mythtv has patched libavformat to remove streams, shuffle the streams around in the nb_streams large arrays and inform the application by a callback
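The approach janneg describes (remove a stream, shift the remaining ones down so the indices stay contiguous, and tell the application about each move) might look roughly like this. It is only a sketch under stated assumptions: `Stream`, `Context`, `MAX_STREAMS`, `renumber_cb` and `remove_stream` are hypothetical stand-ins, not libavformat's types and not MythTV's actual patch.

```c
#include <stddef.h>

#define MAX_STREAMS 20                     /* hypothetical limit for this sketch */

typedef struct Stream { int id; } Stream;  /* stand-in, not AVStream */

typedef struct Context {                   /* stand-in, not AVFormatContext */
    unsigned nb_streams;
    Stream  *streams[MAX_STREAMS];
} Context;

/* Called once for every stream whose index changes. */
typedef void (*renumber_cb)(void *opaque, unsigned old_index, unsigned new_index);

/* Remove streams[index], shift the later entries down by one so that the
 * indices 0..nb_streams-1 stay contiguous, and report each move.
 * The caller remains responsible for freeing the removed Stream. */
static int remove_stream(Context *c, unsigned index, renumber_cb cb, void *opaque)
{
    if (index >= c->nb_streams)
        return -1;
    for (unsigned i = index + 1; i < c->nb_streams; i++) {
        c->streams[i - 1] = c->streams[i];
        if (cb)
            cb(opaque, i, i - 1);
    }
    c->streams[--c->nb_streams] = NULL;
    return 0;
}
```

Without the callback, an application that cached per-stream state by index would silently attach it to the wrong stream after a removal, which is the failure mode mentioned in the log.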
[11:23:32] <twnqx> KotH: i didn't look at the ffmpeg level of code efficiency since i stopped demo coding around the 486 era :P
[11:24:37] <twnqx> ever since i've stopped at the level of algorithmic optimization... and prayed to the compiler gods for the rest.
[11:25:05] <mru> the compiler god doesn't exist and cannot hear your prayers
[11:25:41] <twnqx> that's something i've learned here, too :X
[11:25:58] <Honoome> mru: ooooh in that is nothing different from other gods, so it's a proper religion? :)
[11:26:09] <mru> hehe
[11:26:20] <KotH> Honoome: are you a true believer in FSM?
[11:26:37] <merbzt> FSM 4 the win
[11:27:08] <jai> janneg: that sounds quite flexible. for the time being though, i've fixed the application
[11:27:41] <Honoome> KotH: I'm actually at a loss, I can't see anything positive at all out there
[11:28:13] <KotH> Honoome: out there?
[11:28:42] <mru> people obsess far too much with looking for meaning, good, or bad in everything
[11:28:48] <mru> things happen, that's it
[11:28:57] <Honoome> KotH: or in here
[11:29:18] <Honoome> mru: is it just me or _bad_ things happen more often than not? :P
[11:29:27] <mru> if good things happen around you, enjoy them while it lasts
[11:29:34] <Honoome> I guess the only thing I can believe in is the insurmountable abyss of the human idiocy
[11:29:48] <mru> I try not to think of that so much
[11:29:51] <mru> it's depressing
[11:30:06] <KotH> Honoome: read "the black swan"
[11:30:22] <KotH> Honoome: and you'll notice that we dont notice the good things as much as we notice the bad things
[11:30:53] <mru> the thing is, most things are neither good nor bad
[11:30:59] <mru> and most good things aren't all that good
[11:31:04] <Honoome> oh I agree
[11:31:22] <mru> it's hard for something to be as good as a magnitude 9 earthquake is bad
[11:32:08] <KotH> what's bad about a mag 9 earthquake?
[11:32:14] <mru> ask reynaldo
[11:32:23] * Honoome can think of Shilling going to Mars, alone
[11:32:28] <Honoome> that'd be as good!
[11:32:39] <mru> that's the thing, would it?
[11:32:48] <mru> far more unlikely, yes
[11:32:56] <Honoome> or the FatELF guy forgetting all about his crazy idea
[11:33:17] <mru> isn't that idea thoroughly dead anyway?
[11:33:43] <iive> Honoome: i once read that human minds work in crisis mode. Something to do with the survival of the species during the last ice age. It just means we accept good things as normal and don't notice them, but we notice bad things and remember them.
[11:33:54] <_av500_> fat elves? yuk
[11:34:09] <mru> that's a contradiction in terms
[11:34:31] <Honoome> hmm some cosplay convention begs you to differ unfortunately
[11:34:49] <lu_zero> Honoome: that are long hobbits
[11:34:59] <mru> lol
[11:35:13] <lu_zero> s/that/those
[11:35:19] <mru> s/long/tall/
[11:35:35] * _av500_ hides
[11:35:35] * lu_zero should wake up and eat some vocabulary
[11:35:40] <Honoome> but no, the idea is not totally dead yet, people still bring it up from time to time
[11:35:47] <mru> oh dear
[11:36:00] <Honoome> "it's cool, Apple does it!"
[11:36:32] <mru> anyway, even the best events have much less impact than a moderately bad one
[11:36:38] <kshishkov> _av500_: ever seen "find the hidden elephant" picture?
[11:37:04] <_av500_> :)
[11:37:04] <mru> so we take more notice of the bad ones
[11:37:17] <mru> kshishkov: hiding in plain sight?
[11:38:16] <lu_zero> that's easy
[11:38:35] <lu_zero> you just need something that catch the attention more than you do
[11:38:36] <kshishkov> mru: no, not unable to hide at all so every part of elephant is sticking out behind the trees
[11:39:16] <mru> lu_zero: like a war to cover up a sex scandal
[11:39:27] <lu_zero> the other way round usually
[11:39:33] <lu_zero> death << sex
[11:39:45] <mru> did you see the film Wag the Dog?
[11:40:05] <lu_zero> people wants to forget about death and usually sex is the quickest route
[11:40:23] <lu_zero> mru: no
[11:40:31] <Honoome> lu_zero: that's just in Italy
[11:40:53] <mru> it's about the US president trying to cover up a sex scandal
[11:41:01] <mru> it came out about a year before the lewinsky ordeal
[11:41:18] <kshishkov> _their_ war was not in Albania though
[11:42:39] <mru> details
[11:42:48] <lu_zero> btw
[11:43:02] <lu_zero> berlusconi restated that he is into teenagers...
[11:43:20] <kshishkov> has he been caught with one again?
[11:43:35] <lu_zero> actually that or he couldn't translate properly "bella figliola" in English
[11:43:47] <mru> which means?
[11:44:13] <lu_zero> cute chick
[11:44:27] <mru> maybe to him it's the same thing
[11:44:30] <lu_zero> not cute daughter
[11:44:40] <lu_zero> oh well
[11:44:48] <lu_zero> both goes for him
[11:45:12] <lu_zero> either is the mother, so he could be his dad
[11:45:28] <lu_zero> or the daughter, so he could be her granddad
[11:45:34] <lu_zero> s/his/her/
[11:45:35] <lu_zero> damn
[11:45:47] <Honoome> lu_zero: you might want to read my latest post about -O0 causing segfaults :P
[11:45:48] <lu_zero> today I cannot write =_=
[11:45:57] <lu_zero> segfaults?
[11:46:11] <mru> who in their right mind uses -O0?
[11:46:28] <Honoome> mru: a few people seem to want to… that's why I'm documenting why it's a bad idea
[11:46:40] <mru> those are not in their right minds
[11:46:50] <Honoome> I know…
[11:47:50] <Honoome> especially with glibc that is basically a different library at -O0
[11:49:58] <enkidu> even -Os is better than -O0 ;/
[11:50:27] <Honoome> people seem to think that backtraces are useful at -O1
[11:50:35] <Honoome> while I usually get something useful at -O2 as well
[11:50:49] <Honoome> I use -O0 only to test the difference in code size for feng
[11:51:12] <mru> -O0 is not useful ever
[11:51:24] <mru> except maybe for chasing compiler bugs
[11:53:03] <Honoome> well the last one I found about the different code emitted between -O0 and -O1 make it fun ;P
[11:54:18] <Honoome> gcc can easily replace an sprintf() that would cause a buffer overflow with its return value as if the buffer was big enough
[11:56:07] <mru> if the string is never used later?
[11:58:02] <Honoome> yep
[11:58:13] <mru> that's a perfectly valid optimisation
[11:58:20] <Honoome> so you get (bad) code that works at -O1, but segfaults at -O0
[11:58:26] <mru> buffer overflows cause undefined behaviour
[11:58:32] <mru> which includes not crashing
[11:58:53] <Honoome> no doubt, the problem is that people who insist on -O0 probably don't know of this kind of stuff ;)
[11:59:45] <mru> obviously sprintf should only be used when the buffer can be proved large enough
[12:00:23] <Honoome> btw this only works out properly when the sprintf function is left undeclared, with gentoo's default compiler
[12:00:49] <Honoome> because if stdio.h is included, _FORTIFY_SOURCE=2 will define the fortified-sprintf wrapper, and cause the code to _still_ crash
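One common way to avoid depending on a buffer being "proved large enough" is to pass the size along and check for truncation with the standard `snprintf`. The sketch below is an illustration only; `format_label` and the "stream-%d" format are hypothetical and do not come from the code Honoome was discussing.

```c
#include <stdio.h>

/* Hypothetical helper: format "stream-<id>" into dst without ever writing
 * past dst[size - 1]. Returns the string length on success, or -1 if the
 * output did not fit (or snprintf reported an encoding error). */
static int format_label(char *dst, size_t size, int id)
{
    int n = snprintf(dst, size, "stream-%d", id);
    if (n < 0 || (size_t)n >= size)
        return -1;   /* truncated: dst holds at most size - 1 characters */
    return n;
}
```

Because the overflow can no longer happen, the behaviour is defined at every optimisation level, so the result does not change between -O0 and -O1.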
[12:04:34] <nfl> hi is there any way to shorten int16_t lp[i]  = av_clip_int16(-av_clip_int16(ff2+ff1 >> 10)); ?
[12:05:16] <kshishkov> yep, av_clip_int16(-(ff2+ff1 >>10))
[12:05:28] <kshishkov> should be more or less the same
[12:06:12] <nfl> not same for -(-32768)
[12:07:05] <kshishkov> indeed, it's turned to zero then
[12:07:18] <av500> its "clipped" :)
[12:07:44] <kshishkov> but since av_clip_int16 returns int16_t, there's no need to clip it again
[12:08:49] <mru> kshishkov: -INT16_MIN == INT16_MIN
[12:09:04] <mru> actually no
[12:10:01] <mru> the int16 will be promoted to int during the negation
[12:13:04] * Honoome is running the gentoo tinderbox against ffmpeg 0.6
[12:13:19] <nfl> if (ff1+ff2>>10) > 32767, it will return -32768
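The difference between the two forms can be checked with a small sketch. `clip_int16` below is a local stand-in that clamps to the int16_t range as the discussion assumes; it is not FFmpeg's `av_clip_int16` itself, and `x` stands for the already shifted sum `(ff2+ff1) >> 10` (note that `+` binds tighter than `>>`).

```c
#include <stdint.h>

/* Local stand-in that clamps to the int16_t range; not FFmpeg's own macro. */
static int16_t clip_int16(int a)
{
    if (a < INT16_MIN) return INT16_MIN;
    if (a > INT16_MAX) return INT16_MAX;
    return (int16_t)a;
}

/* nfl's original form: clip, negate, clip again.
 * The inner int16_t result is promoted to int before the negation,
 * so -(-32768) is 32768 as an int and the outer clip yields 32767. */
static int16_t neg_double_clip(int x)
{
    return clip_int16(-clip_int16(x));
}

/* kshishkov's shorter form: negate, then clip once. */
static int16_t neg_single_clip(int x)
{
    return clip_int16(-x);
}
```

Under these assumptions both forms give 32767 for x = -32768, and they differ only when x exceeds 32767: the double clip yields -32767 while the single clip yields -32768, which matches nfl's last remark.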
[12:14:00] <Honoome> hmm the nut test still fails…
[12:14:09] <mru> what nut test?
[12:14:32] <Honoome> regtest-nut in lavftest
[12:14:42] <mru> fails on 0.6?
[12:14:44] <mru> that's bad
[12:14:57] <Honoome> http://bugs.gentoo.org/attachment.cgi?id=229675 failed on 0.5 as well
[12:15:09] <Honoome> sorry 0.5_p* gentoo snaps
[12:16:08] <av500> mru: see the sorry state of building stuff under android: http://gitorious.org/~olvaffe/ffmpeg/ffmpeg-android/commits/android
[12:17:35] <mru> what kind of idiot is that?
[12:17:56] <av500> an average one
[12:18:25] <av500> all 3rd party libs in android are basically snapshots that are stripped of the configure run and added Android.mk files...
[12:18:30] <mru> Honoome: 0.6 branch passes make test here
[12:19:00] <mru> and fate
[12:19:09] <Honoome> mru: specifics? here is on x86, gcc 4.4 and 4.5, no particular cflags
[12:19:28] <mru> x86_64, gcc 4.3
[12:19:30] <mru> gentoo
[12:19:59] <pross-au> your running make test *after* the release?
[12:20:03] <pross-au> *you're
[12:20:14] <mru> I assumed siretart would have done it before
[12:20:17] * Honoome is, as it's part of packaging
[12:20:25] <mru> now Honoome is claiming there are failing tests
[12:20:35] <Honoome> well no regressions at least
[12:20:41] <Honoome> I had the same failure before
[12:21:03] <mru> plain ./configure
[12:21:09] <mru> did you use some extra stupid flags?
[12:21:27] <pross-au> rgr
[12:21:37] <Honoome> no extra cflags, the configure is executed through the ebuild
[12:21:57] <Honoome> I'll run a make test on my checkout just to be safe
[12:22:07] <mru> trying 4.4.4
[12:22:29] <mru> this is why I have an i7
[12:22:44] <lu_zero> Honoome: 0.6 is newer than the current snapshot btw?
[12:23:00] * lu_zero would be quite wary of introducing feature regressions
[12:23:20] <mru> pass with 4.4.4
[12:23:39] <Honoome> lu_zero: should be, at least I hope so
[12:25:16] <lu_zero> uhm
[12:25:22] <lu_zero> feng is still running...
[12:25:39] <Honoome> hmm
[12:25:40] <Honoome> http://paste.pocoo.org/show/226049/
[12:25:43] <Honoome> something's wrong…
[12:26:44] <mru> very
[12:26:58] <janneg> lu_zero: current snapshot is 22846 and 0.6 was created from r23017
[12:27:04] <Honoome> blah, using shared libraries it doesn't load the just-built ones but the system ones
[12:27:23] <mru> yeah, that's a problem
[12:27:39] <mru> maybe we should tinker with the env to make it load the right ones
[12:28:09] <Honoome> and if I rewrite the env it seems fine here.
[12:28:54] <Honoome> finishing the whole testsuite and then will give a try to a 32-bit build here
[12:45:19] <Honoome> I found already one package failing to build with ffmpeg-0.6 :/
[12:46:23] <janneg> Honoome: which one? I thought 0.5 and 0.6 were considered compatible
[12:46:43] <Honoome> backlite: http://bugs.gentoo.org/show_bug.cgi?id=324277 … seems like it's a C++ compatibility problem
[12:48:00] <av500> mru will tell you c++ is not supported :)
[12:48:17] <mru> what av500 said
[12:48:47] <Honoome> thus why I opened the bug to backlite and not to ffmpeg-0.6 ;)
[12:48:58] <Honoome> make: *** [regtest-mxf] Error 1
[12:48:58] <Honoome> this is my manual build at 32-bit though
[12:49:17] <Honoome> http://paste.pocoo.org/show/226057/
[12:49:34] <janneg> Honoome: missing  -D__STDC_CONSTANT_MACROS
[12:49:51] <Honoome> janneg: feel free to tell that to the maintainer :P
[12:52:52] <Honoome> ah there are actually a few bugs reported already: https://bugs.gentoo.org/showdependencytree.cgi?id=324255
[12:53:01] <Honoome> seems like they are all C++
[12:56:26] <Honoome> mru: try building ffmpeg with "gcc -m32" on gentoo and see if you also get test failures… maybe it's a 32-bit problem :/
[12:57:08] <mru> building
[12:58:40] <merbzt> cant we just add that define
[12:58:55] <mru> test passes
[12:58:58] <merbzt> all the noise it is causing isn't worth it
[12:58:59] <mru> merbzt: no
[12:59:08] <mru> where would we add it?
[12:59:09] <wbs> merbzt: doesn't help if the app has included stdint.h before including libavutil's headers
[12:59:34] <merbzt> so there is no sane way to define it in the headers ?
[13:00:06] <merbzt> mein godt ...
[13:00:09] <wbs> not in our headers, the c++ app has to define it itself, either before including the relevant headers, or as a global define
[13:00:40] <CIA-92> ffmpeg: lucabe * r23625 /trunk/libavformat/ (rtpenc.c rtpenc.h):
[13:00:40] <CIA-92> ffmpeg: If the video stream is H.264 with MP4 syntax, store the NAL length size in
[13:00:40] <CIA-92> ffmpeg: the RTP muxer context (it will be used later for splitting frames in NALs)
[13:01:35] <merbzt> could we somehow define an error saying that __STDC_CONSTANT_MACROS needs to be defined ?
[13:01:56] <lu_zero> because C++ users do not know better?
[13:02:00] <lu_zero> fine with that
[13:02:38] <lu_zero> we do pollute our headers but it's playing nice
[13:03:04] <merbzt> for the record I don't know how one would do it, just an idea
[13:03:31] <mru> I'm against that
[13:03:48] <mru> easy enough to do, but we just shouldn't be going into that territory
[13:04:00] <merbzt> I just think it causes lots of discussion that leads nowhere
[13:04:14] <mru> but what comes next?
[13:04:26] <merbzt> atom winter
[13:04:27] <pross-au> mru ffmpegOS
[13:06:07] <av500> pross-au: ffos wont run on anything else than ffcpu
[13:07:00] <pross-au> that too
[13:07:40] <pross-au> and ahme, i guess we'll need to support those who can only build ffcpu with discrete logic
[13:08:03] <CIA-92> ffmpeg: janne * r23626 /trunk/libavcodec/dvbsubdec.c:
[13:08:03] <CIA-92> ffmpeg: dvbsub: parse display definition segment
[13:08:03] <CIA-92> ffmpeg: The display definition segment is used to properly display SD DVB subtitles in
[13:08:03] <CIA-92> ffmpeg: HD video streams.
[13:08:39] <benoit-> merbzt: do you plan to post your translation to the ML, for a potential review ?
[13:09:04] <kshishkov> av500: you're totally wrong - FFOS should run on every modern >=32-bit CPU
[13:09:08] <mru> but who will do the klingon translation?
[13:09:46] <benoit-> mru: you talked about it, you do it
[13:09:48] <benoit-> that's fair
[13:11:05] <pross-au> i like that policy benoit-
[13:11:17] <enkidu> bleh
[13:11:27] <enkidu> 300K outside and inside chilling cold ;/
[13:12:57] * kshishkov prefers temperature measuring system proposed by Swedish astronomer and Swedish naturalist
[13:13:46] <elenril> kshishkov: that's not real temperature!
[13:14:15] <nfl> is minor to be bumped when changing the internal api?
[13:14:49] <kshishkov> elenril: if you can't deal with negative temperatures it's your problem
[13:15:26] <enkidu> ok, so - as I am living in city where Fahrenheit were born - lets use Fahrenheit scale ;]
[13:15:49] <elenril> kshishkov: that's not the problem actually
[13:16:10] * lu_zero prefers K
[13:16:38] <lu_zero> even if putting 0 at the water triple point is good as well
[13:17:07] <kshishkov> too big offset for practical usage
[13:17:11] <nfl> never mind
[13:20:05] <elenril> kshishkov: in a proper temperature scale 0 must correspond to zero energy/zero entropy
[13:20:21] <mru> impossible
[13:20:28] <mru> even at 0K you have some entropy
[13:20:31] <mru> due to quantum effects
[13:21:07] <av500> in proper temp scale, 0 is when hell freezes over...
[13:21:43] <enkidu> mru: no, in 0K you have no quantum effects
[13:21:51] <elenril> mru: entropy is zero by definition in a pure state
[13:21:52] <enkidu> that's why reaching 0K is impossible
[13:22:34] <mru> hmm, who should I trust, the physics prof or enkidu?
[13:22:57] <av500> wikipedia?
[13:23:05] <lu_zero> enkidu: no editing!
[13:23:09] <enkidu> maybe because I asked some doctors, why 0K is impossible?
[13:23:48] <elenril> mru: http://en.wikipedia.org/wiki/Entropy_(statistical_thermodynamics)#Counting_of_microstates
[13:23:58] <elenril> or get any book on statistical mechanics
[13:24:01] <lu_zero> maybe depends on the definition of 0K
[13:25:04] <enkidu> 0K - temperature, where all motion freezes
[13:25:42] <enkidu> perfect crystal, in 0K have entropy equal 0
[13:26:57] <kshishkov> that was proven wrong very long time ago
[13:27:18] <mru> direct consequence of the uncertainty principle
[13:27:53] <kshishkov> but I'd agree on 0K being unreachable
[13:27:57] <mru> no motion means infinite uncertainty of momentum
[13:28:44] <elenril> mru: if you have a system in a precisely known quantum state, then its entropy is zero
[13:29:05] <elenril> doesn't break any uncertainty or anything like that
[13:29:17] <enkidu> elenril: but such a system cannot be created.
[13:29:21] <elenril> indeed
[13:29:38] <elenril> i didn't say it can be
[13:30:14] <enkidu> thats why lim (T-> 0K)(S(T,V)) !=0
[13:32:16] <mru> which is exactly what I said a while ago
[13:32:29] <mru> that even at 0K you have some energy
[13:33:14] <enkidu> if in 0K you have any energy, it is not 0K
[13:34:00] <enkidu> because you can extract this energy in a form of heat
[13:36:33] <pengvado> what happens when you view your 0K system from another reference frame? then it has kinetic energy, but still no entropy.
[13:37:01] * mru views the system from a non-reference B-frame
[13:37:11] <av500> from a golden frame?
[13:38:43] <kshishkov> all B-frames are non-reference by definition
[13:38:52] <mru> not in h264
[13:38:55] <elenril> kinetic energy is a lie!
[13:39:04] <mru> B means bidirectional
[13:39:50] <elenril> you must be in rest relative to aether!
[13:40:09] <pengvado> not all B-frames are bidirectional in h264 either
[13:40:27] <mru> they don't have to be in mpeg2 either
[13:40:41] <mru> you could code all the MBs referencing only the previous ref frame
[13:40:55] <av500> you could hide P frames in B frames
[13:41:02] <av500> ah no
[13:41:20] <mru> in mpeg2 a B-frame can't be a ref frame
[13:41:35] <av500> right
[13:41:49] <mru> but you can hide an I-frame in a P-frame
[13:42:12] <av500> sure. make all MBs intra
[13:42:28] <elenril> so we herd you liek frames, so we put a frame in ur frame...
[13:44:03] <mru> wan't me, I was framed
[13:44:08] <mru> wasn't
[13:44:40] <elenril> http://tvtropes.org/pmwiki/pmwiki.php/Main/IncrediblyLamePun
[13:45:54] <siretart> mru: i did run both fate and all regtests before release.
[13:45:58] <siretart> mru: what's failing?
[13:46:03] <av500> c++
[13:46:29] <siretart> ?
[13:46:53] <av500> error: ‘UINT64_C’ was not declared
[13:46:53] <lu_zero> siretart: all our dumb c++ users
[13:46:54] <mru> siretart: nothing is failing here
[13:47:07] <lu_zero> but they were even before
[13:47:16] <mru> Honoome claimed something was failing for him
[13:47:41] <lu_zero> Honoome: is making sure Gentoo will fix those packages...
[13:48:29] <siretart> lu_zero: puh. I'm relieved :-)
[13:48:53] * Honoome is actually making sure Gentoo knows of those packages and will leave to the respective maintainer to fix them :P
[13:49:05] <Honoome> mru: I'm still having test failures here when building 32-bit, you don't?
[13:49:10] <mru> nope
[13:49:26] <Honoome> okay so I'll have to spend the afternoon tracking that down, I guess
[13:49:36] <mru> try svn head
[13:50:00] <siretart> hm. I guess this shows how well the advice "use svn trunk" works in practice :-/
[13:50:21] <Honoome> *cough* this _was_ with svn trunk ^^
[13:50:46] <mru> fate says trunk is fine
[13:51:48] <siretart> oh, I understood these reports were filed just today. ok.
[14:48:31] <av500> mpeg2 in .mp4?
[14:49:33] <mru> why not?
[14:49:48] <av500> used in the wild?
[14:50:23] <Honoome> hrm is there a proper term to refer to what valgrind with memcheck or dmalloc do?
[14:52:49] <elenril> magic
[14:53:43] <Honoome> "memory $something".. I get to think of accountability (like you do with money) but it might be a false friend from italian
[14:55:45] <mru> "Memcheck is a memory error detector"
[14:55:49] <mru> says valgrind.org
[14:56:31] <Honoome> I meant only the leak detection part actually
[14:59:53] <pengvado> then it's a "memory allocation error detector"
[15:00:46] <pengvado> if you want a transitive verb, "keep track of" or "verify"
[15:05:59] <enkidu> do not always trust valgrind. Sometimes it is good to leave a value uninitialised. Sometimes...
[15:06:39] <av500> and then use them in an if() statement?
[15:08:21] <enkidu> oc not :) but using presented approach is very good as random seed generator
[15:08:45] <av500> I would not trust it
[15:10:19] <av500> uninitialized does not mean random content
[15:10:33] <enkidu> not always *
[15:10:51] <wbs> and that has to be about the only place where uninitialized data is ok, it's not a good argument for "don't trust valgrind" generally
[15:11:05] <wbs> don't trust valgrind blindly, is a better statement though
[15:11:24] <enkidu> 17:05 < enkidu> do not always trust valgrind.
[15:12:30] <av500> enkidu: and as example you gave random seed generator
[15:14:08] <iive> probably "memcheck is a memory _usage_ error detector" would be more correct.
[15:14:20] <wbs> even for that, it can be trusted when it says you're using uninitialized data
[15:14:28] <wbs> whether that's what's intended or not is up to you
[15:16:37] <enkidu> small example - we use uninitialized values of integer  to pass it with time() into pseudo-random data generator
[15:17:11] <enkidu> it would be better, than just using time() based randomize
[15:17:37] <Honoome> problem is, like _any_ analysis tool, you need to know what the code is doing before trying to act on it
[15:17:46] <wbs> yes, and even then, you can fully trust valgrind when it says you're using uninitialized data
[15:17:55] <enkidu> true
[15:18:08] <av500> if the data is on the stack, it might be not initialized but very unrandom
[15:18:09] <Honoome> if any tool were able to understand what you're _trying_ to do, rather than what you're _doing_, we'd have truly self-healing software
[15:18:18] <enkidu> but your reaction cannot be: "int val = 0"
[15:18:28] <enkidu> if you dont need 0
[15:18:35] <wbs> of course not, you always need to analyze the issue at hand
[15:18:52] <wbs> for some things, the proper initialization value may be something completely different
[15:18:56] <wbs> or whatever
[15:18:57] <Honoome> enkidu: you cannot say that valgrind is untrustworthy because a maintainer is on crack and decides to change code he has no fucking clue about
[15:19:06] <av500> enkidu: so you are relying on int val; to be in fact int val=rand();  ???
[15:20:09] <enkidu> never mind... I will find an article about the debian openssl maintainer, who found an uninitialised value with valgrind... and cut the random space by 10^6
[15:22:30] <enkidu> http://lwn.net/Articles/282230/
[15:22:47] <wbs> yes, we all know that story already
[15:23:35] <wbs> that doesn't change the fact that you can trust valgrind, valgrind never said you should comment out that code, did it?
[15:23:43] <wbs> it just said that it was uninitialized, which was completely true
[15:25:46] <enkidu> it didn't. anyway, I'd rather not use valgrind - as someone said, debuggers are evil
[15:26:32] <enkidu> because too many people - rather than thinking while coding - are running debugger on broken code and then analysing, what fails
[15:31:32] <iive> debuggers are not evil. dumb coders thinking they are the greatest are evil.
[15:44:51] <BBB> spyfeng: good work! I'll help you testing, hopefully
[15:45:02] <BBB> Tjoppen: I admit guilt, again not ready yet, will do
[15:51:15] <Tjoppen> BBB: ok. heh
[15:51:22] <BBB> I really am sorry
[15:51:25] <BBB> I'll get to it
[15:51:25] <BBB> I
[15:51:34] <BBB> I'm being tossed around between stuff a little right now
[15:51:37] <BBB> I'll be better
[16:17:05] <Honoome> okay vdr is screwed
[16:51:14] <mru> vdr has been messed up since its inception
[17:17:46] <BBB> my first optimization patch
[17:17:49] <BBB> let's see how bad that is
[17:17:53] <BBB> actually, not really
[17:18:03] <BBB> no asm in it yet...
[17:18:08] <BBB> but I'll do that next I guess
[17:37:47] <Dark_Shikari> FAIL of the day
[17:37:47] <Dark_Shikari> 01:37 < holger> "The Atom libraries (s8/n8) and SSSE3 libraries (v8/u8) have been merged into a single optimization (designated by v8/u8) that optimizes the performance for both of these nearly identical architectures."
[17:40:28] <KotH> and why is that a fail?
[17:44:08] <Dark_Shikari> because the atom is totally different from every other ssse3 chip
[17:44:19] <Dark_Shikari> atom is more different from a core 2 than a 486 is
[17:44:37] <Dark_Shikari> also, both of the most interesting ssse3 instructions take about 6 times longer on the atom
[17:44:41] <Dark_Shikari> making them useless
[17:46:52] <enkidu> :o
[17:56:47] <BBB> atom is slow anyway
[17:56:54] <BBB> so 6x slower is just a relative thing which means little
[17:56:56] <BBB> :-p
[17:57:11] <BBB> Dark_Shikari: why don't you help me with something simple^2
[17:57:30] <Dark_Shikari> you didn't ask
[17:57:31] <BBB> actually, let me figure something out
[17:57:31] <BBB> brb
[17:57:43] <BBB> Dark_Shikari: that's because I'm still figuring out the question ;)
[18:07:02] <BBB> I'm trying to "shark" ffplay playing a vp8 movie to see what it does and when
[18:08:03] <BBB> is that a correct approach?
[18:09:08] <Dark_Shikari> "shark"?  huh
[18:09:37] <enkidu> dunno, how shark might change anything in ffplay...
[18:09:41] <BBB> macos version of valgrind with a prerrt ui
[18:09:50] <BBB> pretty*
[18:09:57] <Dark_Shikari> um, it's not valgrind
[18:09:59] <Dark_Shikari> it's oprofile
[18:10:35] <BBB> that thing that tells me how much time is spent in which functions
[18:10:44] <janneg> BBB: do you know why the number of skips is so high?
[18:10:58] <BBB> janneg: I don't know what a skip is
[18:11:39] <janneg> BBB: a not counted run
[18:11:51] <BBB> maybe because it's too fast?
[18:11:56] <BBB> honestly, no idea
[18:12:21] <Dark_Shikari> I don't know what your question is
[18:12:27] <Dark_Shikari> you're profiling it.  so what?
[18:13:10] <janneg> START/STOP_TIMER does not count outliers (for example due to scheduling)
[18:14:01] <janneg> BBB: it means that a high number of runs takes longer
[18:15:53] <Honoome> and of courseeeee chromium does not build with ffmpeg 0.6 because of the usual C++ screwup
[18:17:03] <peloverde> I blame bjarne
[18:18:21] <Honoome> peloverde: we can't blame him if there are so many crazy people who actually _use_ it
[18:18:35] <BBB> about 9% is spent in vp8_v/h_loop_filter16_inner_c, and a bunch more (6-8% each) in vp8_v/h_loop_filter8/16_c, so those are good functions to start optimizing?
[18:18:43] <BBB> (the 9% is also each)
[18:20:11] <BBB> put_vp8_epel*() is only 4% or less each, unfortunate, I understand what they do and thought they were easy enough to start with :(
[18:21:12] <Dark_Shikari> do them
[18:21:16] <Dark_Shikari> it's 4% each but there's lots of them
[18:21:25] <BBB> 4% for two actually
[18:21:28] <BBB> 1-2% each
[18:21:35] <BBB> there's 9x3 of them
[18:21:54] <BBB> I guess some fall off my chart because they're hardly used
[18:22:16] <BBB> so now come the stupid questions... how do I do that? (I'm serious)
[18:22:38] <BBB> e.g., how do I decide what optimization set (mmx, sse, whatnot) to use?
[18:22:53] <BBB> I guess I need to start by getting all these big phat manuals from intel.com?
[18:22:55] <Dark_Shikari> you have never written asm before?
[18:22:58] <BBB> no
[18:23:03] <BBB> but I lean quickly
[18:23:13] <BBB> learn*
[18:29:19] <CIA-92> ffmpeg: stefano * r23627 /trunk/libavutil/eval.c: Improve av_parse_eval() error reporting.
[18:29:20] <CIA-92> ffmpeg: stefano * r23628 /trunk/tests/codec-regression.sh:
[18:29:20] <CIA-92> ffmpeg: Remove the "b" from "Mb" in -b values for the dnxhd tests.
[18:29:20] <CIA-92> ffmpeg: They are just ignored, and tend to confuse both machines and humans.
[18:29:20] <CIA-92> ffmpeg: stefano * r23629 /trunk/libavutil/eval.c:
[18:29:21] <CIA-92> ffmpeg: Make av_parse_expr() fail if there are trailing chars at the end of
[18:29:21] <CIA-92> ffmpeg: the provided expression.
[18:29:22] <CIA-92> ffmpeg: Allow detection of mistyped expressions.
[18:33:00] <Yuvi> BBB: %y0-dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffl-cxxxxxxxxxxxxxxxxxxxxxxxxx∏”Ú¿∏ň¨JHo
[18:33:38] <Yuvi> err, I mean the % obviously depend on the sample, e.g. http://www.supergenije.com/cruncher/test.webm has a lot more subpel
[18:33:52] <BBB> Yuvi: but is the patch ok?
[18:34:16] <BBB> (I know the second patch adding the indexing is screwed up, fixed that locally)
[18:34:36] <Yuvi> yeah
[18:35:28] <BBB> then since DS has gone awol or so, how would I go about optimizing this stuff? would you suggest I start with epel or with the filter stuff?
[18:35:44] <BBB> epel looks simpler so is likely better for me to begin with
[18:36:13] <Yuvi> yeah, it should be a bit simpler since the loop filter will rely on masking to take the place of branching
[18:36:58] <BBB> ok, and any opinion/thoughts about sse vs mmx vs whatnot?
[18:37:05] <wbs> Yuvi: do you have a second to spare on libvorbis maintenance, btw? what do you think about this patch? http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2010-June/090370.html
[18:37:11] <Yuvi> for mmx vs. sse2, I'd write it in yasm and do both at the same time in a macro, though sse2 only for when 128 bit arith helps
[18:37:13] <BBB> (I'm asking because I don't know - I'll figure out by just doing it)
[18:37:30] <Yuvi> e.g. not width 4
[18:37:35] <BBB> right
[18:37:39] <BBB> ok, will start with mmx then
[18:38:18] <Yuvi> for x264's yasm macros, use mmsize, which equals 8 for mmx and 16 for sse
[18:38:41] <Yuvi> start with the _v* funcs, they'll be slightly easier than _h in simd
[18:39:48] <Dark_Shikari> BBB: if you want, I can give you the 30-minute "how to do asm" lesson
[18:39:54] <Dark_Shikari> gave it to my GSOC student a few days ago
[18:40:08] <_av500_> blog post?
[18:40:16] <Yuvi> wbs: should be okay, don't think they'll ever overlap though
[18:40:23] <Dark_Shikari> _av500_: it's kinda interactive
[18:40:53] * _av500_ will watch Dark_Shikari interact with BBB...
[18:41:03] <lu_zero> Dark_Shikari: I want it as well
[18:41:04] <Yuvi> Dark_Shikari: flash blog post?
[18:41:13] <_av500_> html5?
[18:41:19] <jai> html5+js you mean
[18:41:27] <Dark_Shikari> Yuvi: I can't embed my consciousness in Flash, it's too limited.
[18:42:02] <Yuvi> BBB: oh, modify the english in the comment for put_vp8_epel_pixels_tab to mention the 4tap vs. 6tap
[18:42:16] <Dark_Shikari> BBB: y/n?
[18:42:29] <BBB> Dark_Shikari: I'd love to, is tomorrow ok?
[18:42:35] <BBB> today I have to run in less than 30
[18:42:40] <BBB> but I'd really love to
[18:42:40] <Dark_Shikari> run where?
[18:42:44] <wbs> Yuvi: ok, I'll apply it soon then
[18:42:46] <BBB> doctor's appointment
[18:42:53] <Dark_Shikari> why not after that
[18:43:09] <BBB> I promised my wife I'd go home after that
[18:43:13] <BBB> no internet at home
[18:43:15] <BBB> I know, I'm lame
[18:43:31] <BBB> I can go after 6 EST, so ~3PM your time, if that's ok
[18:43:42] <Dark_Shikari> so in 3 hours?
[18:43:53] <BBB> and 15 minutes, yes
[18:44:07] <Dark_Shikari> k, sure
[18:44:21] <BBB> sorry... but again, would really love to
[18:45:07] <BBB> Yuvi: committed to your git tree, including correct table patch
[18:48:10] <Dark_Shikari> BBB: ping me then
[18:48:24] <BBB> ok
[18:48:42] <BBB> afk now ;)
[19:04:47] <CIA-98> ffmpeg: stefano * r23630 /trunk/configure:
[19:04:47] <CIA-98> ffmpeg: Name the default configure log filename as "config.log" rather than
[19:04:47] <CIA-98> ffmpeg: "config.err". The former name was misleading, as the file contains
[19:04:47] <CIA-98> ffmpeg: useful information not necessarily related to errors.
[19:04:47] <CIA-98> ffmpeg: mstorsjo * r23631 /trunk/libavcodec/libvorbis.c: libvorbis: Use memmove instead of memcpy for shifting data
[19:07:08] <wbs> Yuvi: any comments on http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2010-June/089909.html then? or should I ping superdump about that one?
[19:17:10] <CIA-98> ffmpeg: stefano * r23632 /trunk/doc/fftools-common-opts.texi: Document how to set boolean options.
[19:17:11] <CIA-98> ffmpeg: stefano * r23633 /trunk/doc/fftools-common-opts.texi: Document terminal coloring in the -loglevel option.
[19:31:54] <CIA-98> ffmpeg: michael * r23634 /trunk/libavformat/avformat.h: Marking what should be removed in relation to MAX_STREAMS.
[20:02:14] <enkidu> i have found a bug in ff
[20:02:35] <enkidu> [flv @ 0x99e6fe0]Broken FLV file, which says no streams present, this might fail
[20:03:27] <kierank> what's the bug?
[20:05:05] <_av500_> enkidu: roundup
[20:13:49] <enkidu> _av500_: under heavy load headers are not sent?
[20:14:10] <enkidu> it is a regression (compared to the version used one year ago)
[21:34:03] <peloverde> If I try to use cbrt_tablegen outside of aacdec.c there will be a duplicate table right?
[21:50:51] <astrange> > git blame --reverse 3af83fb0af63c..HEAD
[21:50:52] <astrange> fatal: cannot stat path '3af83fb0af63c..HEAD': No such file or directory
[21:50:55] <astrange> ...isn't that the right syntax
[21:51:37] <astrange> oh i see
[22:29:48] <BBB> Dark_Shikari: ping
[22:31:16] <Dark_Shikari> BBB: pong
[22:31:21] <BBB> woohoo
[22:31:21] <BBB> let's go
[22:31:30] <Dark_Shikari> ok, grab x264's source, open common/predict-a.asm, tell me when you're ready.
[22:31:41] <Dark_Shikari> and give me a rough estimate of what you know about:
[22:31:43] <Dark_Shikari> a) registers
[22:31:45] <Dark_Shikari> b) calling convention
[22:31:58] <Dark_Shikari> so I don't repeat what you already know.
[22:32:18] <BBB> I can read asm and I've RE'ed, so I think I know basics about a and b
[22:32:22] <BBB> let me checkout x264
[22:33:08] <janneg> lu_zero: ping ^^^
[22:34:21] <Dark_Shikari> ping me when you're done.
[22:36:12] <BBB> does it need to be compiled?
[22:36:24] <Dark_Shikari> no
[22:36:27] <BBB> ok, done
[22:36:53] <BBB> I don't see common/predict-a.asm
[22:37:01] <Dark_Shikari> common/x86/predict-a.asm
[22:37:04] <Dark_Shikari> find predict_4x4_dc_mmxext.
[22:37:05] <Dark_Shikari> brb.
[22:37:45] <CIA-98> ffmpeg: stefano * r23635 /trunk/libavutil/eval.c:
[22:37:45] <CIA-98> ffmpeg: Add more tests to eval, help detecting some of the more apparent
[22:37:45] <CIA-98> ffmpeg: errors, far from being a complete test system.
[22:38:15] <Dark_Shikari> found it?
[22:38:17] <Dark_Shikari> mmmm chicken
[22:38:25] <BBB> yes
[22:38:36] <Dark_Shikari> ok, so this function does the following
[22:38:38] <BBB> and the C function is in my other terminal
[22:38:48] <Dark_Shikari> it takes an input pixel array of stride FDEC_STRIDE
[22:38:50] <Dark_Shikari> of the form:
[22:38:55] <Dark_Shikari> N A B C D
[22:38:59] <Dark_Shikari> E X X X X
[22:39:00] <Dark_Shikari> F X X X X
[22:39:02] <Dark_Shikari> G X X X X
[22:39:08] <Dark_Shikari> H X X X X
[22:39:19] <Dark_Shikari> it calculates M = (A+B+C+D+E+F+G+H+4)>>3
[22:39:23] <Dark_Shikari> and sets all of "X" equal to M.
[22:39:25] <Dark_Shikari> got it?
[22:39:28] <BBB> yes
[22:39:35] <BBB> average+round
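A scalar reference for the function just described may help when reading the asm that follows. This is a sketch, not x264's actual C code: the name `predict_4x4_dc_ref` is hypothetical, and the row pitch is passed as a `stride` parameter instead of the FDEC_STRIDE constant.

```c
#include <stdint.h>
#include <string.h>

/* src points at the first X pixel; stride is the row pitch in bytes
 * (an assumed parameter standing in for FDEC_STRIDE). */
static void predict_4x4_dc_ref(uint8_t *src, int stride)
{
    int sum = 4;                         /* rounding term */
    for (int i = 0; i < 4; i++) {
        sum += src[i - stride];          /* A, B, C, D: the row above */
        sum += src[i * stride - 1];      /* E, F, G, H: the column to the left */
    }
    uint8_t m = (uint8_t)(sum >> 3);     /* M = (A+...+H+4) >> 3 */
    for (int y = 0; y < 4; y++)
        memset(src + y * stride, m, 4);  /* set one row of X to M */
}
```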
[22:39:36] <Dark_Shikari> so, let's start from the first line
[22:39:43] <Dark_Shikari> cglobal name, 1, 4
[22:40:00] <Dark_Shikari> 1 means "we want one of the function's arguments loaded into registers for us"
[22:40:08] <Dark_Shikari> Thus, "r0" will have the function's first argument.
[22:40:21] <Dark_Shikari> "4" means "we want 4 general-purpose registers to use during this function"
[22:40:30] <Dark_Shikari> This includes the one we used up with a parameter, r0.
[22:40:35] <BBB> ok
[22:40:36] <Dark_Shikari> So, we have r0, r1, r2, and r3 available to use.
[22:40:40] <Dark_Shikari> got it?
[22:40:52] * Honoome wonders if Dark_Shikari, mru and others would actually ever write a book on real-world hand-coded optimisations
[22:40:53] <BBB> so they're push'ed for us and pop'ed for us at the end
[22:40:56] <Dark_Shikari> Yup.
[22:40:59] <BBB> ok, got it
[22:41:01] <Dark_Shikari> pushing and popping is done for us.
[22:41:08] <Dark_Shikari> Now, first, we want to calculate A+B+C+D
[22:41:20] <Dark_Shikari> pxor clears mm7 (I assume you are familiar with the xor --> zero trick?)
[22:41:28] <BBB> yes
[22:41:37] <BBB> MS uses it everywhere :)
[22:41:58] <BBB> mm7 is a "mmx register"?
[22:42:00] <Dark_Shikari> yes.
[22:42:05] <BBB> are they all free to use?
[22:42:13] <Dark_Shikari> Yes.  They are all caller-save.
[22:42:16] <Dark_Shikari> mm0-mm7.
[22:42:19] <BBB> ok
[22:42:25] <Dark_Shikari> They are not used by any calling convention.
[22:42:32] <Dark_Shikari> for parameters/etc
[22:42:36] <Dark_Shikari> so you can do what you want with them.
[22:42:39] <Dark_Shikari> so, we zero mm7.
[22:42:45] <Dark_Shikari> Then, we move ABCD into mm0 using "movd"
[22:42:50] <Dark_Shikari> I assume you're familiar with bracket-syntax?
[22:42:54] <Dark_Shikari> i.e. [x+y] means *(x+y)
[22:42:56] <BBB> yes, dereference
[22:43:02] <BBB> (is that what it's called?)
[22:43:04] <Dark_Shikari> yes
[22:43:13] <Dark_Shikari> FYI, x86 can do the following inside a bracket
[22:43:23] <BBB> so r0 is "1 line above the start of block X"
[22:43:30] <Dark_Shikari> [REG1 + REG2*{0,1,2,4,8} + (32-bit constant)]
[22:43:40] <Dark_Shikari> REG1 and REG2 can be the same.
[22:43:57] <BBB> to make 5 or 3
[22:44:00] <Dark_Shikari> yes.
[22:44:03] <BBB> I've seen that for "fast multiplies"
[22:44:05] <BBB> ok
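The bracket arithmetic can be modelled in C as plain integer math. This is only an illustration of the address form quoted above; both function names are hypothetical.

```c
#include <stdint.h>

/* Effective address of [base + index*scale + disp].
 * The hardware only encodes scale factors of 1, 2, 4 or 8. */
static uintptr_t effective_address(uintptr_t base, uintptr_t index,
                                   unsigned scale, int32_t disp)
{
    return base + index * scale + (uintptr_t)(intptr_t)disp;
}

/* Using the same register as base and index yields x*3, x*5 or x*9,
 * which is the "fast multiply" BBB mentions. */
static uintptr_t times5(uintptr_t x)
{
    return effective_address(x, x, 4, 0);
}
```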
[22:44:07] <Dark_Shikari> so, now we do psadbw.
[22:44:17] <Dark_Shikari> psadbw does a SAD on the 8 pairs of bytes in the two inputs
[22:44:21] <Dark_Shikari> and sums it all up
[22:44:32] <Dark_Shikari> i.e. reg1 = {A1, B1 ... H1}
[22:44:37] <Dark_Shikari> and reg2 = {A2, B2 ... H2}
[22:44:44] <Dark_Shikari> sum = abs(A1-A2) + abs(B1-B2)...
[22:44:59] <Dark_Shikari> "movd" sets the top half of the mm register to 0, so mm0 contains ABCD0000
[22:45:05] <Dark_Shikari> mm7 contains 00000000
[22:45:12] <Dark_Shikari> so psadbw actually calculates A+B+C+D.
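A scalar model of the trick, as a sketch: psadbw on 8-byte registers is a sum of absolute differences, and against an all-zero operand it degenerates into a plain byte sum. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of psadbw on two 8-byte MMX registers:
 * the sum of absolute differences of the eight byte pairs. */
static unsigned psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (unsigned)abs(a[i] - b[i]);
    return sum;
}

/* movd zero-fills the upper four bytes, so only A..D contribute,
 * and with the other operand all zero the SAD is just A+B+C+D. */
static unsigned sum_top_row(const uint8_t abcd[4])
{
    uint8_t mm0[8] = { abcd[0], abcd[1], abcd[2], abcd[3], 0, 0, 0, 0 };
    uint8_t mm7[8] = { 0 };
    return psadbw_model(mm0, mm7);
}
```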
[22:45:23] <BBB> mm are 8-byte registers?
[22:45:28] <Dark_Shikari> yes.
[22:45:32] <Dark_Shikari> 64-bit.
[22:45:47] <BBB> this is still x86-(32) asm right?
[22:45:52] <Dark_Shikari> yes.
[22:45:54] <BBB> interesting...
[22:45:54] <BBB> ok
[22:46:11] <Dark_Shikari> so finally, once that's done, we move the result to r3d.
[22:46:12] <BBB> so sum goes back into mm7 then
[22:46:17] <Dark_Shikari> no, back into mm0.
[22:46:22] <Dark_Shikari> psadbw DST, SRC
[22:46:24] <BBB> oh right
[22:46:25] <BBB> yes
[22:46:33] <Dark_Shikari> "d" means 32-bit version of the register.
[22:46:35] <Dark_Shikari> "w" means 16-bit
[22:46:37] <Dark_Shikari> "b" means 8-bit
[22:46:40] <BBB> ok
[22:46:48] <Dark_Shikari> so "r3" means "native size of r3"
[22:46:52] <Dark_Shikari> "r3d" means "32-bit size"
[22:46:56] <Dark_Shikari> so on 64-bit, that would mean the low half.
[22:47:18] <Dark_Shikari> We use "d" for numerical calculations where native size isn't needed because, on 64-bit, it's a smaller instruction size.
[22:47:29] <Dark_Shikari> Note however this CANNOT be done if you're using the register as an address.
[22:47:30] <BBB> ok
[22:47:35] <Dark_Shikari> Because addresses must be native-size.
[22:47:40] <BBB> hence r0
[22:47:42] <BBB> and not r0d
[22:47:43] <Dark_Shikari> Yes.
[22:47:46] <BBB> ok
[22:47:50] <Dark_Shikari> so, now, r3d contains A+B+C+D.
[22:47:59] <Dark_Shikari> Now, for the first bit of macro fun you'll get to experience.
[22:48:15] <Dark_Shikari> %assign n 1
[22:48:17] <Dark_Shikari> %rep 3
[22:48:20] <Dark_Shikari> <stuff that uses n>
[22:48:22] <Dark_Shikari> %assign n n+1
[22:48:23] <Dark_Shikari> %endrep
[22:48:31] <Dark_Shikari> That repeats said "stuff", first with n=1, then n=2, then n=3.
[22:48:39] <Dark_Shikari> Do *that* with the C preprocessor.
[22:48:40] <Dark_Shikari> :)
[22:48:52] <BBB> so it's a for() loop that is auto-unrolled by yasm?
[22:48:59] <Dark_Shikari> treat it as a macro.
[22:49:05] <BBB> ok
[22:49:06] <Dark_Shikari> Don't treat it as "unrolling a for loop"
[22:49:08] <Dark_Shikari> But yes, that's what it does.
[22:49:21] <Dark_Shikari> so there what we do is:
[22:49:26] <Dark_Shikari> 1) move E into r1d (see the original chart)
[22:49:34] <Dark_Shikari> 2) repeatedly move F, G, H into r2d, then add to r1d.
[22:49:45] <Dark_Shikari> thus, at the end of the %rep, r1d contains E+F+G+H.
[22:49:51] <Dark_Shikari> and r3d still contains A+B+C+D.
[22:49:53] <Dark_Shikari> got it?
[22:49:54] <BBB> yes
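What the %rep block amounts to, written out in C as a sketch. The variable names mirror the registers in the discussion; `stride` is an assumed parameter standing in for FDEC_STRIDE, and the function name is hypothetical.

```c
#include <stdint.h>

/* r0 points at the first X pixel of the block. */
static unsigned sum_left_column(const uint8_t *r0, int stride)
{
    unsigned r1 = r0[-1];               /* E, loaded before the %rep block */
    unsigned r2;
    /* The three lines below are what "%rep 3" expands to, with
     * n = 1, 2, 3 substituted by the assembler, not at run time. */
    r2 = r0[stride * 1 - 1]; r1 += r2;  /* F */
    r2 = r0[stride * 2 - 1]; r1 += r2;  /* G */
    r2 = r0[stride * 3 - 1]; r1 += r2;  /* H */
    return r1;                          /* E+F+G+H */
}
```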
[22:50:02] <Dark_Shikari> do you know the magic of lea?
[22:50:15] <BBB> it's a quick-add or quick-multiply
[22:50:22] <Dark_Shikari> yeah, it's using the brackets for math.
[22:50:30] <Dark_Shikari> lea r1d, [r1+r3+4]
[22:50:36] <Dark_Shikari> we just did all of that in one op, so now we have
[22:50:39] <Dark_Shikari> A+B+C+D+E+F+G+H+4
[22:50:43] <Dark_Shikari> then we shr 3 (obvious)
[22:50:47] <BBB> can I use r1d/r3d in lea?
[22:50:54] <Dark_Shikari> Yes, but it's larger instruction size on 64-bit.
[22:50:58] <Dark_Shikari> And does the same thing.
[22:51:00] <BBB> ok
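The lea and shr pair, modelled in C as a sketch. The function name `dc_value` is hypothetical.

```c
#include <stdint.h>

/* Models "lea r1d, [r1+r3+4]" followed by "shr r1d, 3". */
static uint32_t dc_value(uint32_t left_sum, uint32_t top_sum)
{
    /* One lea performs both additions and adds the rounding constant. */
    uint32_t r1 = left_sum + top_sum + 4;
    return r1 >> 3;                      /* divide by 8 */
}
```

Since each sum is at most 4*255, the result always fits in a byte.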
[22:51:04] <Dark_Shikari> multiply with 0x01010101
[22:51:05] <Dark_Shikari> (splat)
[22:51:08] <Dark_Shikari> and then we store 4 times.
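The splat and the four stores, sketched in C. Both function names are hypothetical, and `stride` again stands in for FDEC_STRIDE.

```c
#include <stdint.h>
#include <string.h>

/* Multiplying a byte value by 0x01010101 copies it into all four bytes. */
static uint32_t splat4(uint8_t m)
{
    return (uint32_t)m * 0x01010101u;
}

/* Four 32-bit stores, one per row of the 4x4 block. */
static void store_dc(uint8_t *dst, int stride, uint8_t m)
{
    uint32_t v = splat4(m);
    for (int y = 0; y < 4; y++)
        memcpy(dst + y * stride, &v, sizeof v); /* memcpy sidesteps alignment and aliasing rules */
}
```

Because all four bytes are identical, the result does not depend on byte order.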
[22:51:17] <Dark_Shikari> Understand why this function works?
[22:51:26] <BBB> I think so
[22:51:29] <BBB> pretty simple :)
[22:51:33] <Dark_Shikari> Now, finally
[22:51:34] <Dark_Shikari> we RET
[22:51:35] <Dark_Shikari> not ret
[22:51:36] <Dark_Shikari> RET
[22:51:38] <Dark_Shikari> RET is overloaded.
[22:51:42] <BBB> ?
[22:51:44] <Dark_Shikari> It will automatically do any popping that needs to be done.
[22:51:53] <BBB> ah, ok
[22:52:00] <Dark_Shikari> Finally, note the _reason_ this function exists.
[22:52:05] <BBB> is that x264 magic or is that yasm?
[22:52:08] <Dark_Shikari> x264asm
[22:52:18] <Dark_Shikari> x264's asm abstraction syntax, written in yasm, which is in ffmpeg.
[22:52:29] <BBB> ok, so I can use it, good
[22:52:47] <Dark_Shikari> so, the _reason_ this function exists
[22:52:51] <Dark_Shikari> psadbw lets us do 4 additions in one op.
[22:52:53] <Dark_Shikari> That's it.
[22:52:58] <Dark_Shikari> er, 3 additions, I guess.
[22:53:05] <Dark_Shikari> Beyond that, there is no reason to write this function.
[22:53:06] <BBB> sum of 4 consecutive bytes
[22:53:10] <BBB> right
[22:53:12] <BBB> ok
[22:53:14] <Dark_Shikari> The only gain we get here is from using psadbw.
[22:53:16] <Dark_Shikari> Saves ~3 clocks.
[22:53:21] <Dark_Shikari> Otherwise, it's the same as the C.
[22:53:41] <BBB> is there a way to get gcc to output something like this to use as baseline?
[22:53:47] <Dark_Shikari> what do you mean?
[22:53:57] <BBB> can gcc output yasm syntax?
[22:54:02] <Dark_Shikari> no, but there's a perl script for that
[22:54:09] <Dark_Shikari> however, when writing asm, that's not generally what you do
[22:54:15] <Dark_Shikari> because most asm is actually easier to write in yasm than in gcc.
[22:54:33] <Dark_Shikari> This is just an example of a rare function that is mostly "C".
[22:54:42] <Dark_Shikari> or should I say, mostly scalar.
[22:54:49] <BBB> right, I was surprised how much asm was familiar :)
[22:54:58] <Dark_Shikari> So let's go to another.
[22:54:59] <Dark_Shikari> mc-a.asm
[22:55:06] <Dark_Shikari> pixel_avg2_w16_sse2
[22:55:16] <Dark_Shikari> function prototype is on line 505
[22:55:40] <Dark_Shikari> takes two sources with a given stride, and averages 16-pixel rows (width 16) from each
[22:55:46] <Dark_Shikari> and writes the result to dst
[22:55:52] <Dark_Shikari> it does this "height" times (that many rows)
[22:56:06] <Dark_Shikari> In other words, linear interpolation for motion compensation.
[22:56:11] <Dark_Shikari> Got it?
[22:56:39] <BBB> yes
[22:56:46] <BBB> there's no C function for this in common/*.c?
[22:56:49] <Dark_Shikari> Yes there is
[22:57:02] <Dark_Shikari> pixel_avg
[22:57:26] <BBB> oh ok
[22:58:01] <Honoome> [the books we all want to read: DS's and mru's “Hand-written Optimisations”; mine and Luca's “Mess-up Project Management”; michael's “Designing Operating Systems”… more or less]
[22:58:04] <BBB> so for every 16 pixels, it outputs 1 "pixel avg"?
[22:58:11] <Dark_Shikari> no
[22:58:22] <Dark_Shikari> it averages "height" rows of 16 pixels each
[22:58:52] <BBB> a "block" average
[22:59:10] <Dark_Shikari> ?
[22:59:35] <Dark_Shikari> on each iteration, it takes 2 sets of 16 pixels, and outputs 1 set of 16 pixels.
[22:59:42] <BBB> ok
[22:59:42] <Dark_Shikari> I'm not quite sure what you're saying
[22:59:51] <Dark_Shikari> do you not know how linear motion compensation works?
[22:59:56] <Dark_Shikari> vp8 does it, theora does it
[22:59:59] <Dark_Shikari> mpeg-2 does it, mpeg-4 does it
[23:00:00] <BBB> so ABCDEFGH + ZYX[etc] -> (A+Z)/2 (B+Y)/2 [etc.]
[23:00:06] <Dark_Shikari> Yes
[23:00:11] <Dark_Shikari> except /2 is (A+B+1)>>1
[23:00:15] <BBB> ok
[23:00:23] <BBB> sorry, misunderstood for a second
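A minimal C sketch of what the function computes, assuming 8-bit pixels. The name avg2_w16_ref and its argument order are made up for illustration and are not x264's actual prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: average two 16-pixel-wide blocks row by row.
 * src1 and src2 share one stride, as in the discussion above. */
void avg2_w16_ref(uint8_t *dst, ptrdiff_t dst_stride,
                  const uint8_t *src1, const uint8_t *src2,
                  ptrdiff_t src_stride, int height)
{
    assert(height > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < 16; x++)
            dst[x] = (uint8_t)((src1[x] + src2[x] + 1) >> 1); /* (A+B+1)>>1, ties round up */
        dst  += dst_stride;
        src1 += src_stride;
        src2 += src_stride;   /* three pointers move per row in the plain C form */
    }
}
```

Each output row depends only on the matching input rows, so the result is a 16-wide block of per-pixel averages, not a single number per row.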
[23:00:29] <Dark_Shikari> so, back to the asm
[23:00:31] <Dark_Shikari> 6,7
[23:00:36] <Dark_Shikari> we have 6 args, and we want them all in registers for us to use.
[23:00:41] <Dark_Shikari> 7: we need all 7 registers.
[23:00:46] <Dark_Shikari> Now, the first bit of magic
[23:01:10] <DonDiego> gnite
[23:01:12] <Dark_Shikari> You notice in the C, we increment 3 pointers per iteration, right?
[23:01:42] <BBB> I've seen MS doing it, it allows us to do *[address1] and *[address2_base+address1]
[23:01:47] <BBB> and then only increment address1 at the end of the loop
[23:02:00] <BBB> iteration, not loop
[23:02:09] <Dark_Shikari> you mean the "sub" trick here?
[23:02:11] <BBB> yes
[23:02:16] <Dark_Shikari> Ah, so you've seen that
[23:02:25] <Dark_Shikari> this is allowed because src1 and src2 have the same stride
[23:02:35] <BBB> ok
[23:02:37] <Dark_Shikari> so we increment only one, and keep src2 stored as an offset from src1.
[23:02:46] <Dark_Shikari> so you understand the first two lines.
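The "sub" trick can be sketched in C roughly as follows. This assumes src1 and src2 point into the same buffer, so that subtracting the pointers is well defined in C; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: keep src2 only as an offset from src1, so a single
 * source pointer is advanced per row. Works because both share one stride. */
void avg2_w16_offset(uint8_t *dst, ptrdiff_t dst_stride,
                     const uint8_t *src1, const uint8_t *src2,
                     ptrdiff_t src_stride, int height)
{
    ptrdiff_t off = src2 - src1;   /* the "sub": src2 becomes a distance from src1 */
    assert(height > 0);
    do {
        for (int x = 0; x < 16; x++)
            dst[x] = (uint8_t)((src1[x] + src1[off + x] + 1) >> 1);
        dst  += dst_stride;
        src1 += src_stride;        /* the only source pointer that moves */
    } while (--height > 0);
}
```

In the asm the offset lives in the register that used to hold src2, so addressing src2 becomes [src1 + offset] with no extra increment.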
[23:02:49] <Dark_Shikari> Now, the loop
[23:02:51] <Dark_Shikari> .height_loop:
[23:02:55] <Dark_Shikari> there's a label.  It's like a label in C.
[23:03:12] <Dark_Shikari> Next, we load 4 rows of 16 pixels: two from src1, two from src2
[23:03:16] <Dark_Shikari> into xmm0, xmm2, xmm1, xmm3
[23:03:21] <Dark_Shikari> xmm -> 16-byte SSE register
[23:03:38] <BBB> movdqu = load 8 bytes?
[23:03:38] <Dark_Shikari> "movdqu" --> unaligned load.  16-byte loads must always be aligned except when done via movdqu.  movdqu is slower.
[23:03:43] <Dark_Shikari> double quadword
[23:03:51] <Dark_Shikari> word == 16 bits
[23:03:55] <Dark_Shikari> thus, 16 bytes
[23:04:11] <BBB> ok
[23:04:27] <Dark_Shikari> our loads must be unaligned because this is motion compensation
[23:04:29] <Dark_Shikari> and our mv can point anywhere.
[23:04:49] <BBB> right
[23:04:52] <Dark_Shikari> now, the real magic
[23:04:53] <Dark_Shikari> pavgb.
[23:05:05] <Dark_Shikari> this does (A+B+1)>>1 for each pair of input bytes.
[23:06:17] <BBB> and movdqa is aligned-mov-16byte
[23:06:25] <Dark_Shikari> yes
[23:06:27] * BBB looks up in intel manual
[23:06:33] <Dark_Shikari> so we store our output interpolated bytes.
[23:06:41] <Dark_Shikari> then, we use lea to increment our pointers by stride*2
[23:06:48] <Dark_Shikari> we use "sub" to decrement our loop counter
[23:07:00] <Dark_Shikari> and "jg" back to height_loop (I assume you know about the jump condition codes)
[23:07:25] <BBB> "sort of", I sometimes have to look them up, ja is unsigned, jg is signed, right?
[23:08:05] <Dark_Shikari> yes
[23:08:08] <Dark_Shikari> jump if greater than
[23:08:11] <BBB> so sub sets a signed or "equal" bit if it's zero or sign changes
[23:08:15] <BBB> jg takes that into account
[23:08:18] <Dark_Shikari> so if r5d > 0, jump to height_loop
[23:08:25] <Dark_Shikari> i.e. this is a for loop counting towards zero.
[23:08:28] <BBB> right
[23:08:29] <Dark_Shikari> r5d is height
[23:08:32] <Dark_Shikari> so it's just like the C version
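The loop described above can be sketched with SSE2 intrinsics in C. This is an illustration rather than the x264 code: the function name is made up, the store is unaligned (the asm uses an aligned movdqa store), and a scalar fallback is used when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical sketch: two rows per iteration, counting height down to zero. */
void avg2_w16_rows(uint8_t *dst, ptrdiff_t dst_stride,
                   const uint8_t *src1, const uint8_t *src2,
                   ptrdiff_t src_stride, int height)
{
    assert(height > 0 && height % 2 == 0);
    do {
        for (int row = 0; row < 2; row++) {
            const uint8_t *a = src1 + row * src_stride;
            const uint8_t *b = src2 + row * src_stride;
            uint8_t *d = dst + row * dst_stride;
#if defined(__SSE2__)
            __m128i va = _mm_loadu_si128((const __m128i *)a);     /* movdqu */
            __m128i vb = _mm_loadu_si128((const __m128i *)b);     /* movdqu */
            _mm_storeu_si128((__m128i *)d, _mm_avg_epu8(va, vb)); /* pavgb, then store */
#else
            for (int x = 0; x < 16; x++)
                d[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
#endif
        }
        src1 += 2 * src_stride;    /* lea: advance pointers by stride*2 */
        src2 += 2 * src_stride;
        dst  += 2 * dst_stride;
        height -= 2;               /* sub */
    } while (height > 0);          /* jg .height_loop */
}
```

With the (A+B+1)>>1 definition, averaging 1 and 1 gives 1, which is what the test values below check.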
[23:08:39] <Dark_Shikari> now, finally, REP_RET
[23:08:43] <Dark_Shikari> have you ever seen "rep ret" in disassembly?
[23:08:47] <BBB> no
[23:09:14] <Dark_Shikari> I'm surprised.  it must be against corporate policy at the companies you've RE'd code from =p
[23:09:18] <BBB> I've seen rep used in other asm pieces, but not rep ret
[23:09:27] <Dark_Shikari> "rep" means repeat.
[23:09:31] <BBB> right
[23:09:34] <Dark_Shikari> "rep ret" makes no fucking sense.
[23:09:36] <BBB> what is rep ret then? :)
[23:09:44] <BBB> "return to the toplevel caller"?
[23:09:53] <Dark_Shikari> AMD has had a long standing problem in branch prediction that is widely documented.
[23:10:11] <Dark_Shikari> if you have a branch immediately before a return, it will mispredict.
[23:10:17] <Dark_Shikari> However, this only applies to the 1-byte "ret"
[23:10:25] <Dark_Shikari> if you put a nop before it, it'll be fine.
[23:10:28] <Dark_Shikari> Or if you use the 3-byte "ret 0"
[23:10:34] <Dark_Shikari> Or.... if you use the 2-byte "rep ret".
[23:10:42] <BBB> which is smaller :)
[23:10:46] <Dark_Shikari> REP_RET will use "rep ret" instead of "ret" IF AND ONLY IF there are no pops to perform.
[23:10:46] <BBB> ok, I think
[23:10:49] <BBB> so rep is nop
[23:10:55] <Dark_Shikari> rep is a prefix, not an instruction.
[23:11:04] <BBB> but it does nothing :)
[23:11:06] <Dark_Shikari> REP_RET is just like RET, except if there are no pops to perform, it does a rep ret.
[23:11:10] <Dark_Shikari> instead of a ret.
[23:11:14] <Dark_Shikari> to avoid that.
[23:11:16] <BBB> ok
[23:11:22] <Dark_Shikari> we do that because, besides the pops, we have nothing after the branch and before the ret.
[23:11:28] <Dark_Shikari> And on AMD64, there won't be any pops.
[23:11:30] <BBB> so in this case it probably is just ret, because you used 7 registers, right?
[23:11:37] <Dark_Shikari> On x86_32, yes.
[23:11:40] <Dark_Shikari> On x86_64, not quite so much.
[23:11:59] <BBB> how many caller-save regs does amd64 have?
[23:12:11] <BBB> or should I not make any assumptions and just always use rep_ret?
[23:12:58] <Dark_Shikari> no assumptions.
[23:13:03] <Dark_Shikari> of course, only use rep_ret where appropriate.
[23:13:07] <Dark_Shikari> also
[23:13:10] <Dark_Shikari> you cannot use more than 7 registers
[23:13:18] <Dark_Shikari> Any code that uses more than 7 registers is no longer platform-agnostic
[23:13:24] <Dark_Shikari> And thus must be ifdeffed for 64-bit
[23:13:32] <Dark_Shikari> You can do it, and we do it, of course
[23:13:37] <Dark_Shikari> it's just that the macro system is designed to abstract
[23:13:38] <BBB> ok
[23:13:44] <Dark_Shikari> and obviously if you're not abstracting, you have to special-case.
[23:13:51] <Dark_Shikari> Fortunately you rarely need more than 7.
[23:13:52] <BBB> pavgb documentation on intel's website says that avg(1+1)=2
[23:13:53] <BBB> why?
[23:14:07] <Dark_Shikari> you mean avg(1,1)?
[23:14:08] <Dark_Shikari> they're wrong
[23:14:17] <BBB> page 3-489
[23:14:24] <Dark_Shikari> I don't care about intel's manuals
[23:14:26] <BBB> it says so, twice
[23:14:26] <BBB> :)
[23:14:26] <Dark_Shikari> nor do I have them
[23:14:35] <Dark_Shikari> then you misunderstood it or it's a typo
[23:14:44] <Dark_Shikari> http://agner.org/optimize are the only manuals you need
[23:15:14] <Dark_Shikari> so anyways, that's a very very common "style" of function in x264
[23:15:19] <Dark_Shikari> and among the asm you'll be doing
[23:15:26] <Dark_Shikari> i.e.
[23:15:27] <Dark_Shikari> 1) init stuff
[23:15:33] <Dark_Shikari> 2) do a loop over pixels, loading and storing
[23:15:39] <Dark_Shikari> 3) increment pointers, check loop condition
[23:15:42] <Dark_Shikari> 4) finish up, ret
[23:16:58] <Dark_Shikari> Next, let's do a taste of macros.
[23:17:03] <Dark_Shikari> Line 227, sad-a.asm.
[23:17:54] <BBB> ok
[23:18:01] <Dark_Shikari> pixel_sad_8x16_sse2
[23:18:10] <Dark_Shikari> this does a SAD on an 8x16 source and reference block
[23:18:26] <Dark_Shikari> SAD_INC_4x8P_SSE is a macro.
[23:18:33] <Dark_Shikari> it does a sad on 4 lines of width 8.
[23:18:42] <Dark_Shikari> As you can tell, it loads 4 8-byte segments
[23:18:49] <BBB> movq
[23:18:57] <Dark_Shikari> then loads 4 more into the top half of the sse registers (movhps)
[23:18:59] <Dark_Shikari> then SADs
[23:19:01] <Dark_Shikari> then accumulates.
[23:19:04] <Dark_Shikari> But a few things are different.
[23:19:07] <Dark_Shikari> First of all, "m".
[23:19:13] <Dark_Shikari> "m" == "mm" when INIT_MMX is set.
[23:19:18] <Dark_Shikari> "m" = "xmm" when INIT_XMM is set.
[23:19:27] <Dark_Shikari> This lets you abstract between MMX and SSE easily.
[23:19:38] <Dark_Shikari> mova == movq when MMX is set, movdqa otherwise.
[23:19:43] <Dark_Shikari> movu is movq, movdqu otherwise
[23:19:47] <Dark_Shikari> movh is movd, movq otherwise
[23:19:53] <Dark_Shikari> mmsize is 8, 16 otherwise.
[23:20:13] <Dark_Shikari> get the idea?
[23:20:27] <BBB> I think so... I might get confused halfway but I'll try to not lose track :)
[23:20:33] <Dark_Shikari> This function doesn't use that abstraction.  But it does use another feature that we get from using "m" instead.
[23:20:39] <BBB> so this is a macro for doing both sse and mmx in one go
[23:20:43] <Dark_Shikari> Not this one.
[23:20:49] <Dark_Shikari> We use the _other_ feature you get from "m".
[23:20:52] <Dark_Shikari> Notice "SWAP"
[23:21:01] <Dark_Shikari> That swaps all _future_ usages of m0 and m1.
[23:21:14] <Dark_Shikari> No actual instruction is used.
[23:21:19] <Dark_Shikari> It's equivalent to an xchg, effectively.
[23:21:25] <Dark_Shikari> So what we did here is
[23:21:30] <Dark_Shikari> If on the first iteration, we SWAP m0 and m1
[23:21:31] <BBB> ah, so you save one instruction
[23:21:32] <Dark_Shikari> otherwise, we accumulate
[23:21:38] <Dark_Shikari> The naive way is to zero the accumulator
[23:21:42] <Dark_Shikari> and add into it 4 times
[23:21:46] <Dark_Shikari> instead, we SWAP into it on the first time
[23:21:55] <Dark_Shikari> and add the 3 other times.
[23:22:57] <Dark_Shikari> So as you can see, the combination of a macro and SWAP can save instructions.
[23:23:03] <Dark_Shikari> note the syntax
[23:23:06] <Dark_Shikari> %macro NAME NUMPARAMS
[23:23:55] <BBB> and so I call it with one of those unrolled-loops where the first argument is %n
[23:24:01] <BBB> so that the first time it swaps
[23:24:05] <BBB> and all other times it adds
[23:24:09] <BBB> thus saving one instruction
[23:24:12] <BBB> I think?
[23:24:26] <Dark_Shikari> yes
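A rough C analogue of the SWAP idea, with hypothetical function names: the first 4-row partial SAD becomes the accumulator directly, and only the remaining three are added, so no zeroing step is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: SAD over 4 rows of width 8, like one macro expansion. */
unsigned sad_4rows_w8(const uint8_t *a, ptrdiff_t a_stride,
                      const uint8_t *b, ptrdiff_t b_stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[y * a_stride + x] - b[y * b_stride + x]);
    return sum;
}

/* 8 wide, 16 tall: four expansions of the helper. */
unsigned sad_8x16(const uint8_t *a, ptrdiff_t a_stride,
                  const uint8_t *b, ptrdiff_t b_stride)
{
    assert(a != NULL && b != NULL);
    /* First pass: the partial result simply is the accumulator (the SWAP case). */
    unsigned acc = sad_4rows_w8(a, a_stride, b, b_stride);
    for (int i = 1; i < 4; i++) {          /* the 3 other times: accumulate */
        a += 4 * a_stride;
        b += 4 * b_stride;
        acc += sad_4rows_w8(a, a_stride, b, b_stride);
    }
    return acc;
}
```

In the asm the renaming costs nothing at run time, since SWAP only changes which physical register later uses of m0 and m1 refer to.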
[23:24:39] <Dark_Shikari> now, let's look at an example of this abstraction
[23:24:42] <Dark_Shikari> line 219, quant-a.asm
[23:24:45] <Dark_Shikari> we're not going to look at the same.
[23:24:47] <Dark_Shikari> er, the asm.
[23:24:59] <Dark_Shikari> But just accept that we have two macros, QUANT_DC and QUANT_AC.
[23:25:13] <Dark_Shikari> Each takes two arguments: the name and the number of iterations.
[23:25:25] <Dark_Shikari> Each of them calls other macros.
[23:25:36] <Dark_Shikari> QUANT_END, PABSW, and PSIGNW.
[23:25:45] <Dark_Shikari> QUANT_DC also calls QUANT_DC_START, in addition.
[23:26:25] <Dark_Shikari> PABSW reg1, reg2 takes the absolute value of reg2 and puts it in reg1.
[23:26:36] <Dark_Shikari> PSIGNW reg1, reg2 takes the signs from reg2, and applies them to reg1.
[23:26:36] <BBB> this is like perl-scripting in your assembly-editor?
[23:26:42] <Dark_Shikari> I'm not a perl guy.
[23:26:50] <Dark_Shikari> Now, these functions perform h264 quantization.
[23:27:02] <Dark_Shikari> now, PABSW and PSIGNW have two implementations.
[23:27:16] <Dark_Shikari> The first one is the pre-ssse3 implementation, which emulates them with existing instructions
[23:27:23] <Dark_Shikari> For example, absolute value is done exactly how you'd think it would be done.
[23:27:28] <Dark_Shikari> i.e. the asm hack for absolute value
[23:27:32] <Dark_Shikari> which you have probably seen before.
[23:27:52] <BBB> once, it looked scary
[23:28:00] <BBB> the function was called "abs()" so I skipped it ;)
[23:28:04] <Dark_Shikari> open x86util.asm
[23:28:11] <Dark_Shikari> %macro ABS1_MMX 2    ; a, tmp
[23:28:11] <Dark_Shikari>     pxor    %2, %2
[23:28:11] <Dark_Shikari>     psubw   %2, %1
[23:28:11] <Dark_Shikari>     pmaxsw  %1, %2
[23:28:11] <Dark_Shikari> %endmacro
[23:28:35] <Dark_Shikari> zero a temp reg
[23:28:45] <Dark_Shikari> (0-A)
[23:28:47] <Dark_Shikari> max A, -A
[23:28:51] <Dark_Shikari> get it?
[23:29:07] <BBB> yes
[23:29:07] <Dark_Shikari> max A,-A == absolute value of A
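The three instructions can be followed one 16-bit lane at a time in C. This is a sketch with hypothetical names; sign16_lane models the behaviour of the SSSE3 psignw instruction as described for PSIGNW earlier, not the pre-SSSE3 emulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One lane of pxor / psubw / pmaxsw. The subtraction wraps like psubw does. */
int16_t abs16_lane(int16_t a)
{
    uint16_t tmp = 0;                       /* pxor   tmp, tmp           */
    tmp = (uint16_t)(tmp - (uint16_t)a);    /* psubw  tmp, a   (0 - A)   */
    int16_t neg = (int16_t)tmp;             /* same bits, read as signed */
    return a > neg ? a : neg;               /* pmaxsw a, tmp   max(A,-A) */
}

/* One lane of psignw: negate a if b < 0, zero it if b == 0, else keep it. */
int16_t sign16_lane(int16_t a, int16_t b)
{
    if (b < 0)
        return (int16_t)-a;
    if (b == 0)
        return 0;
    return a;
}

/* Apply the abs trick to n lanes, e.g. 4 for an MMX register, 8 for SSE. */
void abs16_lanes(int16_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        v[i] = abs16_lane(v[i]);
}
```

Taking the absolute value and then reapplying the original signs gives back the input, which is the pattern the quant functions rely on. Because of the wrap-around, the most negative 16-bit value maps to itself rather than to a positive number.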
[23:29:19] <BBB> tmp is a "scratch register"
[23:29:21] <Dark_Shikari> ok, so anyways, that's an example of a pre-SSSE3 implementation of what SSSE3 has as a single instruction (pabsw)
[23:29:25] <BBB> ok
[23:29:34] <Dark_Shikari> So, we start by initializing everything to the ordinary, pre-ssse3 variants.
[23:29:38] <Dark_Shikari> QUANT_END is QUANT_END_MMX
[23:29:42] <Dark_Shikari> PABSW is PABSW_MMX
[23:29:45] <Dark_Shikari> PSIGNW is PSIGNW_MMX
[23:29:50] <Dark_Shikari> QUANT_DC_START is QUANT_DC_START_MMX
[23:30:04] <Dark_Shikari> Then we initialize our two quant DC (2x2 and 4x4) and quant AC (4x4 and 8x8) functions.
[23:30:19] <Dark_Shikari> the sizes are 1 (4 coefficients), 4 (16 coefficients), 4 (16 coefficients), and 16 (64 coefficients), respectively.
[23:30:25] <Dark_Shikari> Get what we did there?
[23:31:21] <BBB> I'm there
[23:31:45] <BBB> so these are macros that define whole functions
[23:31:59] <Dark_Shikari> yes
[23:32:05] <Dark_Shikari> QUANT_DC name, iter
[23:32:13] <Dark_Shikari> initializes a dc quant function of name name and number of iterations iter
[23:32:22] <Dark_Shikari> So, then, we move onto the SSE2 versions
[23:32:27] <Dark_Shikari> we INIT_XMM
[23:32:33] <Dark_Shikari> and use half the number of iterations.
[23:32:36] <Dark_Shikari> that was easy.
[23:33:00] <BBB> how come the DC variant suddenly takes 2 arguments instead of 1?
[23:33:05] <Dark_Shikari> 3 arguments, you mean
[23:33:07] <Dark_Shikari> the name is an argument
[23:33:12] <Dark_Shikari> well, here's the reason
[23:33:13] <BBB> oh right, yes, 3
[23:33:20] <Dark_Shikari> first let's start at the beginning
[23:33:24] <Dark_Shikari> normally, xmm registers are caller-save
[23:33:32] <Dark_Shikari> You can use all 8 (or 16, on x86_64)
[23:33:36] <Dark_Shikari> No popping or pushing.
[23:33:47] <Dark_Shikari> And then Microsoft decided we had it too easy.
[23:34:01] <Dark_Shikari> They decided that xmm6-15 were callee-save.
[23:34:10] <Dark_Shikari> In win64.
[23:34:22] <Dark_Shikari> cglobal actually has 3 numerical arguments, not 2.
[23:34:29] <Dark_Shikari> ,numparams, numregs, numxmmregs
[23:34:31] <Dark_Shikari> The last is just optional.
[23:34:35] <BBB> callee-save = value maintained with a call in between?
[23:34:40] <Dark_Shikari> It means we have to push it.
[23:34:43] <BBB> ok
[23:34:52] <Dark_Shikari> if our function uses more than 6 xmmregs (beyond xmm5)
[23:34:58] <Dark_Shikari> we must tell cglobal to push it for us.
[23:35:06] <Dark_Shikari> QUANT_DC does.
[23:35:23] <Dark_Shikari> So the third, optional, argument is the number of xmmregs to push.
[23:35:29] <Dark_Shikari> if you look at QUANT_DC
[23:35:33] <Dark_Shikari> %macro QUANT_DC 2-3
[23:35:43] <Dark_Shikari> 2-3 means 2 required, 3 total (3rd is optional)
[23:35:46] <Dark_Shikari> cglobal %1, 1,1,%3
[23:35:51] <Dark_Shikari> if %3 is set, it gets added to cglobal
[23:35:54] <Dark_Shikari> to tell it to back up those registers.
[23:36:27] <Dark_Shikari> So anyways, we now have initialized our sse2 functions.
[23:36:32] <Dark_Shikari> Then....
[23:36:33] <Dark_Shikari> %define PABSW PABSW_SSSE3
[23:36:33] <Dark_Shikari> %define PSIGNW PSIGNW_SSSE3
[23:36:38] <BBB> cglobal name, 1 function argument, uses 1 register, back up xmm register %3
[23:36:41] <Dark_Shikari> Bam, we now can initialize our ssse3 functions.
[23:36:49] <Dark_Shikari> BBB: back up %3 xmm registers.
[23:37:15] <BBB> because we use m0-7
[23:37:17] <Dark_Shikari> yes.
[23:37:21] <BBB> 6 are save, so back up 2
[23:37:21] <BBB> ok
[23:37:23] <Dark_Shikari> yeah
[23:37:27] <Dark_Shikari> so then we go init ssse3
[23:37:33] <BBB> I saw the pabsw SSSE3 already
[23:37:39] <Dark_Shikari> quant_2x2_dc is too small for sse, only mmx
[23:37:42] <Dark_Shikari> but it still benefits from pabsw/psignw
[23:37:50] <Dark_Shikari> so we INIT_MMX again and make an ssse3 version of that too.
[23:37:58] <Dark_Shikari> Then we go back to XMM with INIT_XMM
[23:38:14] <Dark_Shikari> and %define up our new QUANT_END and QUANT_DC_START macros for sse4
[23:38:16] <Dark_Shikari> and make the sse4 versions.
[23:38:41] <BBB> scary, but good
[23:38:42] <Dark_Shikari> So, using a plethora of macros -- but only one core version of each function
[23:38:47] <Dark_Shikari> we just made mmx, sse2, ssse3, and sse4 versions
[23:38:52] <Dark_Shikari> most importantly -- no code duplication
[23:39:04] <Dark_Shikari> if we found a way to optimize this further, we could go back, change 2 lines, and it would apply to all the functions.
[23:39:13] <Dark_Shikari> If not for the macros, we would have copy-pasted these functions 12 times.
[23:39:41] <BBB> interesting especially because it's built-in, so I don't have to add silly \ at the end of each line
[23:39:48] <Dark_Shikari> Yup
[23:39:53] <BBB> so to go from asm to a macro requires no changes to the actual function body
[23:39:56] <Dark_Shikari> exactly
[23:39:57] <BBB> sweet
[23:40:00] <Dark_Shikari> you can add %macro and %endmacro
[23:40:04] <Dark_Shikari> and put your %1, %2... etc
[23:40:06] <Dark_Shikari> your macro params
[23:40:08] <BBB> right
[23:40:10] <Dark_Shikari> and bam, you've macroed up some huge function
[23:40:11] <Dark_Shikari> in 30 seconds
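For comparison, here is how the same "one body, many variants" idea looks with the C preprocessor, trailing backslashes included. The body is a toy (a sum of absolute values), not real quantization, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two interchangeable helpers, standing in for the PABSW_MMX / PABSW_SSSE3 idea. */
static inline int abs_emulated(int16_t a) { int n = -a; return a > n ? a : n; }
static inline int abs_native(int16_t a)   { return a < 0 ? -a : a; }

/* One function body; name, helper and iteration count are macro parameters.
 * Each iteration covers 4 coefficients, as with the MMX sizes discussed above. */
#define DEFINE_SUM_ABS(name, ABS, iters)          \
    int name(const int16_t *coef)                 \
    {                                             \
        int sum = 0;                              \
        assert(coef != NULL);                     \
        for (int i = 0; i < (iters) * 4; i++)     \
            sum += ABS(coef[i]);                  \
        return sum;                               \
    }

/* Instantiate several variants from the single body above. */
DEFINE_SUM_ABS(sum_abs_2x2_emulated, abs_emulated, 1)
DEFINE_SUM_ABS(sum_abs_4x4_emulated, abs_emulated, 4)
DEFINE_SUM_ABS(sum_abs_4x4_native,   abs_native,   4)
```

Changing the shared body changes every generated function at once, which is the property the yasm macros provide without the line-continuation noise.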
[23:40:22] <BBB> ok, I'm gonna have to break at this stage, wife is calling me :-p
[23:40:27] <Dark_Shikari> k, we're almost done
[23:40:27] <astrange> hmm reordered_opaque doesn't seem to work perfectly with this divx+packed bframes file
[23:40:29] <Dark_Shikari> ping me when you're ready
[23:40:39] <BBB> haha, ok, sorry to have to break, but this is really cool
[23:40:42] <Dark_Shikari> ah, no problem
[23:40:43] <BBB> I think I'm getting it
[23:40:45] <BBB> brb :)
[23:40:49] <Dark_Shikari> no rush :)
[23:40:51] <astrange> i send in 5 pts values at the beginning and the first one i get back is the largest pts, not the lowest
[23:41:05] <astrange> so ffmpeg.c adds dup frames and probably the a/v sync is wrong
[23:45:32] <mru> Honoome: I'm afraid I can't write that book you requested
[23:48:06] <iive> Dark_Shikari: i have a few questions. back to the very start, first line, function name and arguments: do functions always get (n) arguments of native register size? And could an asm function return a result (like return -1;)?
[23:48:50] <Dark_Shikari> yes.  rax is the return register as always.
[23:49:00] <Dark_Shikari> and yes, all arguments are native size.
[23:49:18] <iive> rax even in 32bit?
[23:49:32] <mru> rax, eax, who cares
[23:49:38] <Dark_Shikari> rax maps to eax on 64-bit
[23:49:39] <Dark_Shikari> er, 32-bit
[23:49:46] <Dark_Shikari> the abstraction layer handles that
[23:50:53] <iive> good to know. i would have expected it to be RAX if it was an abstraction. so you can use any specific register names, not just r0,r1 ?
[23:52:16] <iive> rcx/ecx comes to mind.
[23:52:33] <Dark_Shikari> yes
[23:52:40] <Dark_Shikari> that's useful for cl
[23:53:26] <Dark_Shikari> though often you have to come up with custom register assignments for complex functions that you want equally optimal on multiple arches
[23:53:30] <Dark_Shikari> (for functions that are pure-gpr stuff)
[23:53:35] <Dark_Shikari> CABAC being the only example in x264.

