[FFmpeg-devel-irc] IRC log for 2010-09-04

irc at mansr.com
Sun Sep 5 02:00:22 CEST 2010


[02:01:04] <lu_zero> BBB: did you check whether the gcc unrolling is within its params, and whether not unrolling it that way gives a better result?
[02:01:46] <BBB> I've checked unrolling results more generally a while back
[02:01:54] <BBB> it's always slightly, really minusculely, faster
[02:02:14] <BBB> the problem is that the code size increases per iteration, leading to poor cache handling
[02:02:27] <BBB> so in the end it'll slow down the code as a whole
[02:02:58] <lu_zero> as a whole <- different units lumped together?
[02:03:06] <BBB> right
[02:03:16] <BBB> loopfilter + idct + intra pred + mc pred + etc. etc.
[02:04:59] <lu_zero> using --combine would give gcc a clue about that
[02:05:38] <BBB> gcc should figure that out itself
[02:05:49] <BBB> I can't spend hours profiling each and every gcc option ;)
[02:06:15] <lu_zero> gcc cannot
[02:06:34] <lu_zero> if those are different files gcc cannot do much
[02:07:26] <lu_zero> --param max-unrolled-insns=val
[02:07:55] <BBB> it depends on the type of loop, really
[02:07:59] <lu_zero> or max-average-unrolled-insns
[02:08:02] <BBB> an assign loop should be unrolled
[02:09:37] <lu_zero> BBB: gcc can do something good within its params, but it cannot know how that code will be used by other units, so either you merge units with --combine, or you do IPA, or you tell gcc, since you know better
[02:10:08] <BBB> or I write it in hand-written asm because I know better :-p
[02:11:44] <lu_zero> you don't, you might for a single target
[02:11:59] <BBB> right, this is all code in x86/
[02:18:03] <lu_zero> BBB: and you are not going to rewrite everything in asm
[02:18:40] <BBB> well that would take me forever
[02:22:58] <lu_zero> and once you get something changed in C you have to change it everywhere you made an asm version or the other way round...
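
To make BBB's cache argument concrete, a minimal sketch (function names hypothetical): the rolled loop pays one branch per iteration but stays tiny, while the unrolled form wins a microbenchmark yet roughly quadruples the body; across loopfilter + idct + pred + mc that growth evicts other hot code from the i-cache. The knob lu_zero mentions is applied as e.g. gcc -O3 --param max-unrolled-insns=64 -c file.c.

    #include <stdint.h>

    /* Rolled: a handful of instructions, one branch per iteration. */
    static void add_bias(int16_t *block, int bias)
    {
        for (int i = 0; i < 16; i++)
            block[i] += bias;
    }

    /* Unrolled by 4: slightly faster in isolation, but ~4x the code
     * size per loop body, which is what hurts once many such units
     * compete for the instruction cache. */
    static void add_bias_unrolled(int16_t *block, int bias)
    {
        for (int i = 0; i < 16; i += 4) {
            block[i + 0] += bias;
            block[i + 1] += bias;
            block[i + 2] += bias;
            block[i + 3] += bias;
        }
    }
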
[04:49:53] <Dark_Shikari> lu_zero: unrolling is almost always bad
[04:49:57] <Dark_Shikari> it is only useful in two cases:
[04:49:59] <Dark_Shikari> 1) in-order CPUs
[04:50:03] <Dark_Shikari> 2) constant propagation
[04:50:06] <Dark_Shikari> (mostly)
[05:35:30] <lu_zero> when you have many registers it might get useful as well
[05:36:44] <Dark_Shikari> again, that's only useful in the case of 1)
[05:36:54] <Dark_Shikari> that gives no gain on an out-of-order CPU
[05:37:33] <Dark_Shikari> and yes, if you're on an in-order cpu with few registers, you're screwed.
[05:38:01] <lu_zero> I'm not so sure out-of-order is _that_ smart
[05:38:04] <Dark_Shikari> it is
[05:38:14] <Dark_Shikari> out of order will start executing iteration N+1 of a loop before N is done
[05:38:29] <Dark_Shikari> (assuming the branch to continue the loop is predicted correctly, which it usually is)
[05:38:52] <lu_zero> so you want out-of-order+best-branch-prediction
[05:39:05] <Dark_Shikari> "want"?
[05:39:07] <Dark_Shikari> that's how it works
[05:39:13] <Dark_Shikari> if it didn't work that way, OOE would be useless
[05:39:24] <Dark_Shikari> You're proposing that the CPU flush the pipeline on every single branch, even correctly predicted ones
[05:39:41] <lu_zero> branch prediction misses more than you'd like
[05:40:06] <lu_zero> and that's why branch hinting (PGO or hand-made) gives you around 10% more
[05:40:11] <Dark_Shikari> um....
[05:40:16] <Dark_Shikari> PGO does not affect branch hints
[05:40:19] <Dark_Shikari> PGO affects which side of the branch is inlined
[05:40:28] <Dark_Shikari> "branch hints" are totally fucking useless because they're only listened to on the first run
[05:40:59] <Dark_Shikari> "branch prediction" misses more than you'd like -- but not on things which are generally perfectly predictable, like loops that always run 4 times.
[05:41:09] <Dark_Shikari> like, say, a loop over the idcts in a block.
[05:41:55] <lu_zero> from what you say loop unrolling is completely pointless on current intels
[05:42:04] <Dark_Shikari> Except in the case of constant propagation.
[05:42:15] <Dark_Shikari> And in the case where each loop iteration is extremely tiny.
[05:42:26] <Dark_Shikari> (the threshold for "extremely tiny" is... very small though)
[05:42:36] <Dark_Shikari> And in some cases it's actually better to be small
[05:42:45] <Dark_Shikari> the core 2 and core i7 have special buffers for loops of 64 bytes or less
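
The constant-propagation exception Dark_Shikari grants, sketched (names hypothetical): once the trip count is a compile-time constant, the compiler can specialize and fully unroll the inlined copy, while a short fixed-count loop like the per-block idct one is perfectly predicted and small enough for the <=64-byte decoded-loop buffer he describes.

    #include <stdint.h>

    static inline void idct_add(uint8_t *dst, const int16_t *coefs, int n)
    {
        for (int i = 0; i < n; i++)   /* stand-in for a real idct */
            dst[i] += coefs[i];
    }

    void decode_block(uint8_t *dst, const int16_t *coefs)
    {
        /* n = 4 is a compile-time constant here, so the inlined copy
         * can be specialized and fully unrolled; the outer loop always
         * runs 4 times and is perfectly predicted after the first pass. */
        for (int i = 0; i < 4; i++)
            idct_add(dst + 4 * i, coefs + 4 * i, 4);
    }
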
[05:44:37] <lu_zero> it's quite counterintuitive =|
[05:44:41] <Dark_Shikari> Not really
[05:44:46] <Dark_Shikari> Cache space is valuable
[05:45:39] <Dark_Shikari> code should never be larger than necessary unless a gain can be proven to be had.
[05:45:50] <lu_zero> we are going to do everything to avoid cache misses, since branch costs and branch penalties got reduced a lot
[05:46:20] <Dark_Shikari> there are no penalties or costs to perfectly predicted branches
[05:47:14] <lu_zero> ?
[05:47:45] <lu_zero> there is no such thing as perfect prediction
[05:47:58] <Dark_Shikari> Sure there is
[05:48:18] <Dark_Shikari> a loop which always goes 4 times, as in this example
[05:48:23] <Dark_Shikari> will be perfectly predicted on every single run after the first
[05:48:29] <Dark_Shikari> because it always does the same thing
[05:50:10] <lu_zero> and you spare the cache area for the unrolled part
[05:50:42] <Dark_Shikari> exactly.
[05:52:03] <lu_zero> regarding the special loop buffers, are they used through specific instructions, or does the cpu decide if they can be used or not?
[05:53:38] <Dark_Shikari> The CPU just keeps track of the last 64 bytes of decoded instructions.
[05:53:42] <Dark_Shikari> That's it.
[05:53:47] <Dark_Shikari> Thus, if a loop is smaller than 64 bytes
[05:53:53] <Dark_Shikari> it'll reload the decoded instructions from that buffer
[05:53:56] <Dark_Shikari> instead of re-decoding them
[05:54:02] <Dark_Shikari> It's not very fancy.
[05:55:20] <lu_zero> so it's enough to know that and make sure the code fits that way
[05:55:50] <hyc> makes it such a joy to hand-optimize for multiple chips
[05:56:14] <lu_zero> hyc: agreed
[05:57:29] <hyc> loop buffers are cool, the 68010 had one too, but then it was expanded into a full-blown I$ in 68020
[05:57:58] <hyc> x86 ISA is still a boat-anchor
[05:58:17] <hyc> caches for pre-decoded bytes, caches for decoded bytes, what a mess
[05:59:40] <lu_zero> intel is putting lots of effort into working around it
[06:00:18] <Dark_Shikari> hyc: not really
[06:00:25] <Dark_Shikari> the loop buffer doesn't make a big difference
[06:00:26] <lu_zero> I wonder what would happen if they provided a way to directly use the internal isa
[06:00:29] <Dark_Shikari> it's just a nice little convenience
[06:00:32] <Dark_Shikari> if anything, it's moreso for power saving
[06:00:36] <Dark_Shikari> as opposed to speed
[06:00:40] <Dark_Shikari> to avoid running the instruction decoder in tight loops
[06:00:49] <hyc> sure, but you get both benefits out of it
[06:01:01] <Dark_Shikari> it only helps speed if the instruction decoder is the bottleneck
[06:01:03] <hyc> on the 68010 you got effectively 0-cycle loops
[06:01:35] <hyc> (eh. no. 1-cycle.)
[06:02:23] <hyc> I've often wished to have direct access to the underlying microcode language
[06:02:59] <hyc> lu_zero: but I have to agree it's better this way
[06:03:16] <lu_zero> which one?
[06:03:16] <hyc> they can keep improving the hardware while retaining software compatibility
[06:03:35] <lu_zero> hyc: there is a balance
[06:03:44] <hyc> if you were coding directly to the micromachine, you'd be rewriting code all the time...
[06:05:02] <hyc> Dark_Shikari: does bypassing the instruction decoder, using the loop buffer, mean the decoder is free to look at something else? I guess there's nothing else for it to look at if the loop is running
[06:05:32] <Dark_Shikari> no.
[06:05:50] <Dark_Shikari> there is only one instruction flow in the CPU, the pipeline
[06:05:54] <Dark_Shikari> it can't magically go off and do something else.
[06:06:12] <hyc> yeah, makes sense.
[06:07:04] <hyc> was thinking about hyperthreading and getting myself confused
[06:12:42] <hyc> Dark_Shikari: do you get early access to new AMD and/or Intel chips to tune the x264 code for them?
[06:14:04] <Dark_Shikari> It would be nice.
[06:15:12] <hyc> I'll take that as a no. So then you just use their Optimization Guides as a starting point?
[06:19:06] <Dark_Shikari> no
[06:19:08] <Dark_Shikari> I just don't
[06:35:15] <lu_zero> so what do you do?
[06:44:23] <Dark_Shikari> I wait until the chip comes out
[06:44:26] <Dark_Shikari> and I get one
[06:44:56] <KotH> .o0(Dark_Shikari must have a huge computer room)
[06:45:40] <saintdev> s/get/get access to/
[06:47:27] <Dark_Shikari> yes
[07:04:33] <lu_zero> good morning KotH !
[07:07:14] <KotH> ohayou lu_zero
[07:18:49] <pentanol> ok, what does this involve?
[07:19:40] <pentanol> oops not for this channel;)
[07:20:20] <KotH> pentanol: i think the most important part is using the right channel ;)
[07:43:17] <pentanol> KotH sure;)
[08:46:46] <j-b> kierank?
[08:50:09] <_av500_> 'jour
[10:00:09] <CIA-11> ffmpeg: stefano * r25040 /trunk/ (27 files in 5 dirs):
[10:00:10] <CIA-11> ffmpeg: Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
[10:00:10] <CIA-11> ffmpeg: symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.
[10:05:58] <CIA-11> ffmpeg: stefano * r25041 /trunk/doc/APIchanges:
[10:05:58] <CIA-11> ffmpeg: Add APIchanges entry corresponding to the libavutil/cpu.h addition of
[10:05:58] <CIA-11> ffmpeg: r25040.
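
For context, a hedged sketch of the renamed API as it settled in libavutil/cpu.h after r25040 (av_get_cpu_flags() is the lavu accessor that accompanied the move; the function body here is a placeholder):

    #include "libavutil/cpu.h"

    static void init_dsputil(void)
    {
        int flags = av_get_cpu_flags();  /* formerly FF_MM_* in avcodec.h */

        if (flags & AV_CPU_FLAG_MMX) {
            /* install MMX function pointers */
        }
        if (flags & AV_CPU_FLAG_SSE2) {
            /* install SSE2 function pointers */
        }
    }
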
[11:59:04] <mru> morning BBB
[11:59:11] <BBB> what did I break
[11:59:17] <spaam> haha
[11:59:33] <BBB> fate looks greenish...
[11:59:38] <BBB> good morning mru :-p
[12:00:02] <mru> I updated armcc
[12:00:09] <mru> one bug down, at least one new
[12:02:40] <BBB> which one is new?
[12:03:12] <mru> 8 new failures
[12:03:28] <mru> some are almost certainly the same bug
[12:03:36] <mru> like dv failing 4 times
[12:03:46] <mru> wait, make that 12 new
[12:03:50] <mru> 4 were fixed
[12:05:59] <mru> hmm, which one should I investigate first?
[12:11:00] <BBB> is it fixable or is it a compiler failure?
[12:11:09] <BBB> PS config.asm is great, thanks for that!
[12:11:16] <mru> compiler bugs afaik
[12:14:39] <mru> but if I don't report them they'll never get fixed
[14:14:01] <felipec> I want to implement a hwaccel codec, but I need code to initialize/deinitialize and I don't see anything in AVHWAccel to do that
[14:14:08] <felipec> I guess I'll have to add init and close hwaccel functions
[14:14:28] <kierank> what accelerator?
[14:17:36] <felipec> kierank: TI's OMAP3 DSP
[14:17:52] <mru> that's not very specific
[14:18:27] <felipec> tidspbridge
[14:18:45] <mru> lol
[14:19:18] <mru> still not very specific
[14:19:32] <kshishkov> strong static charge should init it fine
[14:20:53] <felipec> mru: it is: tidspbridge requires a series of steps to initialize and deinitialize DSP socket-nodes
[14:21:16] <mru> dspbridge is a hell of a lot more than a video decoder interface
[14:21:24] <mru> or less, depending on how you see it
[14:21:54] <felipec> mru: yes, but the series of steps to initialize/deinitialize are the same regardless
[14:22:08] <mru> it's like you answering "tcp" when asked what protocol a web server uses
[14:22:16] <mru> and "http" would be the correct answer
[14:22:50] <felipec> mru: but if you want to know which specific socket-nodes: http://www.omappedia.org/wiki/L23.i3.8_Release_Notes
[14:23:53] <felipec> mru: http would be TI's usn, but that sits on top of tidspbridge's API, which would be your tcp
[14:25:41] <felipec> all encoders/decoders provided by TI use the usn protocol, but before even using the usn protocol, the socket-nodes have to be loaded through tidspbridge, and that's regardless of the protocol
[14:35:29] * mru reports another armcc bug
[14:36:22] <mru> overzealous constant propagation this time
[14:43:41] <felipec> ok, so I guess I need to create a new AVCodec with CODEC_ID_MPEG4... but how do I prioritize it over the other ones?
[14:43:56] <mru> sounds wrong
[14:45:15] <felipec> mru: what other option is there?
[14:45:37] <felipec> I need init/close functions, and private data to attach to AVCodec
[14:45:38] <mru> whatever the vaapi and vdpau decoders do
[14:46:11] <mru> you said you were writing a hwaccel thing
[14:46:56] <felipec> mru: they don't need any of the things I mentioned
[14:47:05] <kierank> can the omap do things like hardware fft?
[14:47:27] <mru> the dsp is probably quite good at it
[14:50:06] <felipec> mru: actually, mpeg4_vdpau also registers its own AVCodec
[14:50:14] <felipec> with the same codec id
[14:50:26] <janneg> felipec: add init/close callbacks if you need them
[14:50:56] <felipec> janneg: I started doing that... but then I also need private data
[14:51:20] <janneg> vdpau was added before AVHWAccel was added and is still not converted to AVHWAccel
[14:51:50] * kshishkov wants to get rid of it because he feels it'll never be converted
[14:51:56] <felipec> AVHWAccel has private data to attach to a "Picture", but nothing to attach to AVCodec
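
For reference, the shape of an AVHWAccel registration of that era (fields abridged from avcodec.h), with the gap felipec is describing marked; the .init/.close entries are his proposed, hypothetical additions, not existing API, and all tidsp_* names are made up:

    AVHWAccel tidsp_mpeg4_hwaccel = {
        .name           = "mpeg4_tidsp",            /* hypothetical */
        .type           = AVMEDIA_TYPE_VIDEO,
        .id             = CODEC_ID_MPEG4,
        .pix_fmt        = PIX_FMT_YUV420P,
        .start_frame    = tidsp_start_frame,
        .decode_slice   = tidsp_decode_slice,
        .end_frame      = tidsp_end_frame,
        .priv_data_size = sizeof(TidspPicContext),  /* per-Picture only */
        /* the gap under discussion: no .init/.close to bring the DSP
         * socket-nodes up and down once per codec instance */
    };
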
[15:18:01] <felipec> ok, so I've added hwaccel_private to AVCodecContext... however, now I'm wondering... can FFmpeg send more than one frame for decoding to the hw accel at a time?
[15:19:07] <kshishkov> no
[15:19:58] <felipec> kshishkov: ah... then it won't be efficient
[15:20:51] <kshishkov> well, you can introduce decoding delay
[15:22:03] <spaam> kshishkov: back in .de now? :)
[15:22:08] <felipec> kshishkov: what do you mean?
[15:25:07] <kshishkov> felipec: why did you say it won't be efficient?
[15:25:28] <kshishkov> spaam: well, it looks like Stockholm to me
[15:26:08] <spaam> kshishkov: ok :)
[15:26:13] <kshishkov> spaam: though the linje sju spårvagn (the line seven tram) features a lot of German ads and mentions Frankfurt
[15:26:34] <mru> that's the tourist line
[15:27:07] <kshishkov> it's Spårvagn City project now
[15:27:13] <felipec> kshishkov: pushing buffers to the DSP is not a free operation; it requires message passing, cache flushing, etc. so as one frame is being decoded, the next one should be pushed
[15:27:45] <kshishkov> mru: I doubt that planned ending for it (like Ropsten) is too tourist attractive
[15:28:13] <mru> when it opened some 10 years ago it was mainly as a tourist attraction
[15:28:21] <mru> maybe they've expanded it since
[15:28:29] <felipec> otherwise there will be unnecessary latency... in my tests some time ago I noticed some stuttering when only one buffer was handled at a time, and the bitrate suddenly increased
[15:28:44] <kshishkov> felipec: delay means: you push a buffer, return nothing, push another one, read data, push another one, read data, ..., push nothing, read the last frame
[15:29:05] <kshishkov> mru: now it runs to Sergels torg
[15:29:49] <felipec> kshishkov: ah, I think that would also introduce some latency, but it might work
[15:29:59] <kshishkov> mru: http://sl.se/Om-SL/SL-planerar-och-bygger/Sparvag-City/
[15:30:35] <kshishkov> felipec: delay is a synonym for latency indeed
[15:33:36] <felipec> kshishkov: no, I mean, while reading the data, the next buffer could be pushed at the same time, but I'm probably thinking in GStreamer terms (multi-thread)
[15:34:16] <kshishkov> indeed you are
[15:34:43] <mru> gst is known to have efficiency issues
[15:35:08] <kshishkov> does it mean it has some efficiency as well?
[15:35:28] <mru> it sucks up cpu cycles efficiently
[15:38:07] <felipec> that's true, but it's multi-threaded, which is useful for hwaccel
[15:40:50] <Dark_Shikari> mru: "overzealous constant propagation"?  how in the world does that happen?
[15:44:35] <kshishkov> Dark_Shikari: for example, reading from const int*
[15:44:56] <Dark_Shikari> that sounds more like "not understanding what const does"
[15:45:46] <kshishkov> yeees... maybe it was like int random(){return 42;}
[15:46:32] <mru> Dark_Shikari: it propagated a constant to places it shouldn't have
[15:47:08] <Dark_Shikari> example?
[15:47:32] <mru> parse_bs_info() in alsdec.c
[15:48:29] <Dark_Shikari> which values?
[15:48:36] <mru> the div parameter
[15:48:37] <Dark_Shikari> Aha.  it's recursive.
[15:48:46] <mru> it's always called with div=0
[15:48:59] <Dark_Shikari> And so it did constant propagation even though there's a div+=1 ?
[15:49:00] <mru> the compiler "forgot" it's recursive
[15:49:01] <Dark_Shikari> fail.
[15:49:24] <mru> I've reported the bug
[15:49:33] <mru> we'll see what they say in a few days
[15:49:49] <mru> they usually give an initial response rather quickly
[15:58:03] <felipec> I should not use malloc, but can I use posix_memalign?
[15:58:51] <mru> av_malloc
[15:59:51] <Dark_Shikari> av_malloc is already aligned
[15:59:56] <Dark_Shikari> Unless you need more than 16
[16:00:03] <felipec> I need buffers aligned to 128
[16:00:21] <mru> that's a common misconception
[16:00:29] <Dark_Shikari> aligned to 128 for what?
[16:00:33] <Dark_Shikari> no cpu I know of has 128-byte cachelines
[16:00:38] <mru> the dsp has 128-byte cachelines
[16:00:48] <mru> in L2
[16:00:48] <Dark_Shikari> which dsp?
[16:00:50] <Dark_Shikari> "the dsp"
[16:00:52] <mru> the omap3 one
[16:01:06] <mru> guess you missed the start of the discussion
[16:01:46] <Dark_Shikari> ah
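
Since av_malloc() only guarantees 16-byte alignment, a buffer for the DSP's 128-byte L2 cachelines has to be over-allocated and rounded up by hand; a minimal sketch, not lavu API:

    #include <stdint.h>
    #include "libavutil/mem.h"

    /* Returns a pointer aligned to 128 bytes inside a larger av_malloc'd
     * block; *raw receives the base pointer for the later av_free(). */
    static uint8_t *dsp_alloc_128(size_t size, void **raw)
    {
        uint8_t *p = av_malloc(size + 127);
        if (!p)
            return NULL;
        *raw = p;
        return (uint8_t *)(((uintptr_t)p + 127) & ~(uintptr_t)127);
    }
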
[16:11:05] <mru> same bug triggers in huffman.c
[16:11:57] <felipec> are there any restrictions on pthreads?
[16:12:19] <Dark_Shikari> that's a bit of a generic question
[16:12:28] <mru> Dark_Shikari: felipec always asks open-ended questions like that
[16:12:34] <Dark_Shikari> lol
[16:13:46] <kshishkov> so the generic answer is "it depends on the implementation"; use generously
[16:14:32] <felipec> Dark_Shikari: does the FFmpeg community have some restrictions regarding the usage of pthreads?
[16:15:19] <kshishkov> yes, usual Occam's shovel - do not use unless really needed
[16:15:38] <mru> felipec: you obviously can't call pthread functions directly
[16:15:49] <mru> we support non-pthread systems
[16:16:09] <mru> besides, there's no need for threads for what you're doing
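
What "can't call pthread functions directly" means in practice is gating on the build system's thread detection, roughly like this sketch (HAVE_PTHREADS is the real configure symbol; the function is hypothetical):

    #if HAVE_PTHREADS
    #include <pthread.h>
    #endif

    static int dsp_wait_event(void *priv)
    {
    #if HAVE_PTHREADS
        /* optional threaded path, e.g. a watchdog around the ioctl */
    #endif
        /* portable path: just block on the ioctl; a lavc decoder is
         * allowed to block inside decode() */
        return 0;
    }
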
[16:17:15] <felipec> kshishkov: well, I was going to say a separate thread is needed for DSP messaging, but perhaps I can avoid it
[16:17:29] <mru> why would you need a separate thread?
[16:20:53] <felipec> mru: because there's an ioctl to wait for events, and it might hang forever, but again, I'm thinking in gst terms
[16:21:09] <felipec> in FFmpeg it's ok to block
[16:21:09] <mru> try to not do that
[16:21:48] <Dark_Shikari> it's easy to think in gst terms
[16:21:50] <Dark_Shikari> 1) add queues everywhere
[16:21:52] <Dark_Shikari> 2) does it work?
[16:21:54] <Dark_Shikari> 3) if not, add more queues
[16:22:07] <mru> 4) hope there's enough ram
[16:22:30] <kshishkov> mru: s/hope/assume/
[16:22:38] <mru> pretend
[16:22:53] <kierank> ignore the concept of ram
[16:23:25] <kshishkov> swap it to ideal Turing machine
[16:28:05] <felipec> Dark_Shikari: most of GStreamer is that way... not my elements ;)
[16:33:29] <felipec> when I run ffplay, the codec seems to be init'ed/closed once before actually being used
[16:33:41] <felipec> is there a way to differentiate that step?
[16:34:54] <kshishkov> nope
[16:35:56] <felipec> kshishkov: hmm... it's quite wasteful to allocate/free the DSP nodes unnecessarily
[16:36:12] <mru> running ffplay is wasteful
[16:36:49] <kshishkov> well, it's the best player for obscure codecs we have
[16:37:09] <kshishkov> so its purpose is slightly different
[16:45:49] <_av500_> felipec: you have a callback from the dsp on returned buffers
[16:46:07] <_av500_> make that push to a queue and i dont think you need a thread
[16:46:20] <_av500_> just one frame delay as already suggested
[16:46:37] <_av500_> in decode N, you block on the return of N-1
[16:47:28] <felipec> _av500_: there's no callback, only an ioctl to wait for events
[16:47:54] <_av500_> hmm
[16:48:06] <_av500_> LCML something, no?
[16:48:07] <mru> or if that blocks too much, make the queue longer
[16:48:22] <mru> so in decode N you wait for N-x to finish
[16:48:35] <mru> with x as large as it needs to be to avoid waiting very long
[16:48:58] <_av500_> x=1 should be ok
[16:49:54] <_av500_> felipec: the api that i wrote against had a callback and was some LCML_foo stuff
[16:49:57] <mru> since decoding should take much longer than setup, yes
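
The scheme being described, sketched against the lavc decode callback of the time (the context and all dsp_* helpers are hypothetical): the call for frame N queues N to the DSP, then blocks on the completion of N-1 and returns that, giving a fixed one-frame delay.

    typedef struct TidspContext {
        int frames_in_flight;
    } TidspContext;

    static int tidsp_decode(AVCodecContext *avctx, void *data,
                            int *data_size, AVPacket *avpkt)
    {
        TidspContext *ctx = avctx->priv_data;
        AVFrame *pic = data;

        dsp_queue_input(ctx, avpkt->data, avpkt->size); /* push frame N */

        if (!ctx->frames_in_flight) {  /* first call primes the pipe */
            ctx->frames_in_flight = 1;
            *data_size = 0;            /* no output yet: 1-frame delay */
            return avpkt->size;
        }

        dsp_wait_and_read(ctx, pic);   /* block on N-1, fill pic */
        *data_size = sizeof(AVFrame);
        return avpkt->size;
    }
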
[16:50:23] <_av500_> felipec: but there might be a thread inside that LCML thing
[16:50:46] <felipec> _av500_: no, LCML sucks... and it's very tied to openmax
[16:50:48] <_av500_> maybe its another wrapper i should skip
[16:51:39] <_av500_> felipec: i use it without omx
[16:51:45] <_av500_> did not see the ties
[16:52:09] <felipec> _av500_: you need openmax headers for starters
[16:52:25] <_av500_> that yes
[16:52:43] <_av500_> for the omx_u32
[16:53:08] <_av500_> and for yet another BOOL definition
[16:53:16] <_av500_> or was that from bridge
[16:53:19] <felipec> _av500_: but in the end LCML has a "messaging thread" that does the same thing: wait for events in a loop
[16:53:30] <_av500_> yes, i remember now
[16:54:27] <_av500_> nevertheless, omx codec wrappers bring yet another thread...
[16:54:34] <_av500_> and i guess gst codec wrappers also have one
[16:57:58] <felipec> _av500_: and it's incredibly stupid that you need to dlopen it, you can't just link to it... and it's also mixing map and reserve addresses in the same structure while overwriting some other essential fields, and not even abstracting the dmm stuff correctly, as your code needs to 128-align the data anyway
[16:59:15] <felipec> _av500_: I tried to clean up LCML once, and I had many, many patches, but it's just impossible to collaborate with TI
[16:59:33] <felipec> it was much easier to write something from scratch: gst-dsp
[17:02:11] <_av500_> felipec: in my case i had 11k! lines of omx wrapper to get rid of, so lcml was the least of my worries
[17:02:46] <felipec> _av500_: I had probably more :)
[17:03:35] <felipec> _av500_: I took the MPEG-4 omx code + LCML + libdspbridge, it was 50k lines of code, and the result was gst-dsp doing the same in 5k lines
[17:04:05] <felipec> oh, plus whatever gst-openmax was taking
[17:04:25] <_av500_> that i dont count since i dont run gst
[17:04:53] <_av500_> i just care for whats between codec->decode() an the dsp
[17:07:56] <_av500_> felipec: btw, iirc i dont dlopen lcml
[17:08:28] <_av500_> but i might be wrong
[17:14:13] <felipec> _av500_: you are supposed to, otherwise some functions might get overridden... like GetHandle (should be LCML_GetHandle)... see http://lxr.post-tech.com/source/hardware/ti/omap3/omx/video/src/openmax_il/video_decode/src/OMX_VideoDec_Utils.c
[17:16:57] <_av500_> felipec: dont mention VideoDec_Utils :)
[17:18:05] <felipec> _av500_: I know... it's horrible
[18:16:52] <mru> armcc _really_ hates bytestream.h
[18:17:32] <felipec> I recall seeing some rendering support... are decoders able to request a buffer to write from the renderer?
[18:18:03] <mru> AVCodecContext.get_buffer()
[18:19:48] <felipec> mru: thx
[18:21:16] <felipec> I should ignore zero-copy for now though
[18:21:35] <mru> you should never copy stuff needlessly
[18:21:39] <mru> and there is never a need to copy
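
The zero-copy route, sketched with the lavc API of the day: the decoder asks the application for storage via AVCodecContext.get_buffer() and points the DSP's output straight at it (dsp_set_output() is a hypothetical stand-in):

    static int tidsp_setup_output(AVCodecContext *avctx, AVFrame *pic)
    {
        int ret = avctx->get_buffer(avctx, pic);
        if (ret < 0) {
            av_log(avctx, AV_LOG_ERROR, "get_buffer() failed\n");
            return ret;
        }
        /* hand the app-owned planes to the DSP: no memcpy afterwards */
        dsp_set_output(pic->data, pic->linesize);
        return 0;
    }
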
[18:26:43] <felipec> yes, but I don't want to complicate things right now
[18:42:43] <felipec> and how should fatal errors be reported?
[19:25:21] <twnqx> a = a / 0;
[19:31:20] <felipec> twnqx: seriously? exiting in that way might cause problems for the DSP
[19:31:43] <twnqx> no, i'm not serious
[19:31:47] <kierank> lol
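
For the record, the serious answer: lavc code reports fatal errors by logging through av_log() and returning a negative AVERROR code from the callback, for example (dsp_node_load() is hypothetical):

    if (dsp_node_load(ctx) < 0) {
        av_log(avctx, AV_LOG_ERROR, "failed to load DSP socket-node\n");
        return AVERROR(EIO);  /* errno-style codes via libavutil/error.h */
    }
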
[19:33:02] <lu_zero> hyc: ping
[19:33:31] <lu_zero> did you try ffrtsp -> android on an actual device or the emulator?
[19:38:21] <Vitor1001> wbs: ping
[19:38:53] <felipec> all right, all the DSP communication seems to be working :) now I just need to push the data out
[20:17:32] <wbs> Vitor1001: pong
[20:27:20] <felipec> I don't understand what a hwaccel should do to get something rendered
[20:28:07] <mru> rendering is outside the scope of libavcodec
[20:30:17] <funman> libavcorender!
[20:31:52] <felipec> mru: of course, but decoders write the decoded data somewhere
[20:33:40] <mru> yes, wherever the app tells them to
[20:36:16] <felipec> ah, current_picture_ptr
[20:37:23] <Vitor1001> wbs: Have you ever thought about a way to fate test network protocols?
[20:38:46] <wbs> Vitor1001: not much.. muxers/demuxers that simply either read or write work just fine if using a file instead; rtpenc is in this category
[20:39:22] <wbs> Vitor1001: but that wouldn't test the actual network URLProtocols.. and for frankenstein stuff like the rtsp demuxer, it would require quite a bit of work
[20:40:46] <Vitor1001> wbs: What about a simple program that opens a network port, waits for ffmpeg to connect, and follows a script?
[20:40:54] <wbs> Vitor1001: I guess the main question is how to test stuff without implementing fake test servers to run the tests against, if we explicitly want the actual URLProtocols tested
[20:41:12] <felipec> hmm, nope, writing to current_picture_ptr doesn't do anything... and besides that's specific for mp4
[20:41:20] <wbs> Vitor1001: yeah, you could do something simple like that using cat transcript | nc
[20:41:31] <wbs> felipec: hey there :-) Martin here
[20:42:16] <Vitor1001> wbs: exactly, I was thinking something like that. But some protocols might need to wait for the other party to reply.
[20:42:35] <wbs> Vitor1001: the problem is with more complex stuff like rtsp, where it may open many separate connections on other ports, too. otoh, you could at least test rtsp with tcp-interleaving
[20:42:42] <felipec> wbs: hello :)
[20:42:59] <felipec> wbs: I'm writing an omap3 dsp hwaccel ;)
[20:43:04] <Vitor1001> wbs: Ugh, there are protocols that require several ports? :p
[20:43:15] <mru> rtp
[20:43:17] <wbs> felipec: so I see, great that you've got progress :-)
[20:43:30] <wbs> Vitor1001: so you've totally missed out on all the complexity of rtsp/rtp ;P
[20:43:45] <felipec> wbs: I see all the communication happening... buffers in and out... but I don't know where to push the data
[20:43:53] <Vitor1001> wbs: I'm thinking about this because the network protocols is the least covered part of ffmpeg in the regtests
[20:44:20] <wbs> Vitor1001: long story short: rtsp is the control channel, where you negotiate a session and may transfer the actual data in rtp in almost whichever medium you want, inside or outside of that channel.
[20:45:01] <wbs> Vitor1001: yep, I've noticed that myself too - I almost never have to worry about breaking any regtests since there aren't any on the code I'm touching
[20:45:32] <Vitor1001> wbs: What about a program that follow a script that has the following commands:
[20:45:37] <Vitor1001> 1) open port X
[20:45:47] <Vitor1001> 2) wait for x bytes of data in port y
[20:45:57] <Vitor1001> 3) send data x to port y
[20:46:04] <Vitor1001> 4) close port x
[20:46:10] <Vitor1001> ?
[20:46:36] <Vitor1001> where x and y are parameters of the command?
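
A sketch of the tests/ helper being proposed here (entirely hypothetical, plain BSD sockets; the tests/mini_server name comes up later in the discussion): a single-connection server driven by a script of expect/send steps, numbered as in Vitor1001's list, so the whole conversation is reproducible and can be checksummed.

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <unistd.h>

    /* One scripted step: wait for expect_bytes from the peer (command 2),
     * then send the reply, if any (command 3). */
    typedef struct ScriptStep {
        int         expect_bytes;
        const char *reply;
    } ScriptStep;

    static int run_script(int port, const ScriptStep *steps, int nsteps)
    {
        struct sockaddr_in sa = {
            .sin_family = AF_INET,
            .sin_port   = htons(port),
        };
        char buf[4096];
        int fd, cfd, i;

        fd = socket(AF_INET, SOCK_STREAM, 0);             /* 1) open port */
        if (fd < 0 || bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
            listen(fd, 1) < 0)
            return -1;
        cfd = accept(fd, NULL, NULL);
        if (cfd < 0)
            return -1;

        for (i = 0; i < nsteps; i++) {
            int left = steps[i].expect_bytes;
            while (left > 0) {                            /* 2) wait for data */
                int n = read(cfd, buf,
                             left < (int)sizeof(buf) ? left : (int)sizeof(buf));
                if (n <= 0)
                    return -1;
                left -= n;
            }
            if (steps[i].reply)                           /* 3) send data */
                write(cfd, steps[i].reply, strlen(steps[i].reply));
        }
        close(cfd);                                       /* 4) close port */
        close(fd);
        return 0;
    }
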
[20:46:54] <wbs> felipec: haven't done much work within lavc myself; either you allocate a buffer internally in your codec and set the pointer to it in the AVFrame the user sent in, or then call get_buffer and set that pointer in the AVFrame returned to the user
[20:47:31] <wbs> Vitor1001: hmm, let me think...
[20:47:49] <felipec> wbs: yeah, but I'm not writing the codec... only the hwaccel
[20:48:05] <wbs> felipec: ok.. well, I know even less about the lavc hwaccel architecture
[20:49:40] <wbs> Vitor1001: for something simple like http, you'd want to both verify that the request that we sent was what we expected (sending bad requests would be failing the test... but sending a differently formatted request that does the same should be ok.. but for that we could of course update the checksums the test verifies against)
[20:50:30] <wbs> Vitor1001: and then feed back data on the port, which the client decodes
[20:50:54] <wbs> Vitor1001: all of this could be handled via cat data | nc > request
[20:51:37] <Vitor1001> wbs: Yes, for something simple like http nc should work fine
[20:52:06] <Vitor1001> wbs: But people will cry that it is not portable etc., and we'd need to cook up our own stuff anyway for the more complicated stuff
[20:52:15] <wbs> yeah
[20:52:25] <Vitor1001> wbs: Is there any reason the whole conversation will not be bitexact?
[20:53:16] <wbs> for http I think it should be bitexact without any problems, unless we send timestamps in the request for some reason
[20:53:36] <Vitor1001> wbs: Will any protocol you know cause a problem?
[20:53:43] <wbs> the rtp stuff uses both timestamps and random numbers though
[20:53:53] <wbs> the rtp muxer at least
[20:54:18] <wbs> so for regtest mode, we'd have to override them in some way
[20:54:21] <Vitor1001> wbs: We can always use 0 as random number when using "-flags +bitexact"...
[20:55:01] <wbs> yeah. setting timestamps to 0/some fixed value is a bit uglier, but probably doable too
[20:55:09] <Vitor1001> So maybe doing a script and doing the md5 of the whole conversation might work...
[20:55:23] <wbs> yeah
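
The override being discussed could look roughly like this in rtpenc (CODEC_FLAG_BITEXACT and av_get_random_seed() are real; treat the context fields as illustrative):

    #include "libavutil/random_seed.h"

    /* in the muxer's write_header, s1 being the AVFormatContext: */
    if (s1->streams[0]->codec->flags & CODEC_FLAG_BITEXACT) {
        s->ssrc           = 0;  /* fixed "random" SSRC         */
        s->base_timestamp = 0;  /* fixed initial RTP timestamp */
    } else {
        s->ssrc           = av_get_random_seed();
        s->base_timestamp = av_get_random_seed();
    }
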
[20:55:35] <felipec> I don't think the hwaccel API has any functionality for that
[20:56:09] <wbs> but like I said earlier, for rtpenc, you wouldn't need to do it over any protocol at all, you should be able to do ffmpeg -i foo -vcodec whatever -f rtp outputfile just fine, and test it like any other muxer
[20:56:18] <Vitor1001> wbs: Cool, I'll think about it a little more, and when I'd have a little time I'll send a RFC to -devel
[20:56:42] <Vitor1001> wbs: You could send a patch for a fate test for it.
[20:57:26] <wbs> then for the rtsp demuxer though... that's the hairiest of them all. if you explicitly request tcp interleaving, I guess that actually could work with the cat foo | nc > out approach too
[20:57:52] <janneg> felipec: you're probably right. the current hwaccels all include decoding and presentation
[20:58:07] <wbs> and if we can cover most of the protocols with that approach, I'm quite ok with testing of network protocols being limited to platforms with a good shell + nc
[20:58:22] <janneg> i.e. they don't return the decoded frame
[20:58:40] <Vitor1001> wbs: That might be ok too.
[20:58:58] <felipec> janneg: yeap... that's what I'm finding out
[20:59:20] <Vitor1001> wbs: I just don't know if mru will not be mad at the idea of "make fate" depending on installed nc...
[20:59:53] <mru> not acceptable
[21:00:12] <Vitor1001> That was fast
[21:00:35] <Vitor1001> mru: what about if we cook up our own nc on tests/?
[21:01:08] <mru> as long as it works for remote testing
[21:01:24] <mru> you might want to run the "server" on the host machine in that case
[21:02:37] <Vitor1001> Why not run everything on the client? It would save an extra configure option...
[21:02:58] <mru> huh?
[21:03:26] <mru> running multiple things on the client is not easy
[21:03:35] <Vitor1001> Running both the tests/mini_server and ffmpeg in your beagle
[21:03:54] <Vitor1001> :p
[21:03:59] <mru> what's the problem with running it on the host?
[21:04:50] <Vitor1001> Making cross-compiling and testing still more complicated
[21:05:13] <mru> I thought it would be easier that way
[21:05:35] <Vitor1001> You will have to pass in configure both the client and host ips
[21:05:48] <mru> no
[21:05:54] <wbs> what's suitable input data for a rtpenc muxer test? ideally, we'd want to test the packetization of all different codecs there, too.. can I pick any sample from the fate samples collection and stream copy from them?
[21:06:00] <mru> it's trivial to figure out the host ip
[21:06:05] <Vitor1001> So how will they know who to connect to?
[21:06:10] <funman> wbs: you know multicat?
[21:06:40] <wbs> funman: nope?
[21:07:18] <funman> http://www.videolan.org/projects/multicat.html < i've heard it does things with rtp so it might be useful
[21:08:00] <Vitor1001> wbs: mru: does either of you know a tool that would allow recording a session in a machine-friendly way?
[21:08:44] <wbs> Vitor1001: do you want to capture a real protocol session and record it to a file to build a test out of it?
[21:08:58] <Vitor1001> wbs: Basically, yes...
[21:09:27] * funman points to multicat again
[21:09:35] <Vitor1001> wbs: That would be testing a real-use case, no?
[21:09:39] <wbs> Vitor1001: yeah
[21:09:56] <wbs> I guess you can do it with wireshark in some way
[21:10:12] <Vitor1001> funman: Isn't multicat protocol specific?
[21:10:37] <wbs> for protocols within one single tcp connection I guess you should be able to dump the data from both directions to files in some way
[21:10:38] <Vitor1001> funman: I mean, can I test h264 on AVI over gopher?
[21:10:53] <wbs> funman: hmm, doesn't really seem to be what we need
[21:11:13] <Vitor1001> wbs: I would like to know in which sequence it was sent/received...
[21:12:20] <wbs> Vitor1001: well, you get that in the original wireshark capture, but if you dump all of it to files you'd obviously lose the relative orderings of received and sent data
[21:12:30] <wbs> but perhaps you can script it or extract it from a pcap file in some way :-)
[21:13:01] <Vitor1001> wbs: I'll take a look when I have some time, but I've never coded for the network stack...
[21:13:15] <funman> Vitor1001: yes it's specific
[21:14:09] <wbs> Vitor1001: wireshark is a great starting point regardless of what you want to do anyway, I'd say :-)
[21:14:25] <Vitor1001> wbs: And it looks like something useful to learn.
[21:14:32] <felipec> is there a way to render some test pattern to an AVFrame?
[21:15:01] <Vitor1001> felipec: Write a video filter
[21:15:27] <Vitor1001> See the "life source" patch in -devel
[21:15:45] <felipec> I just want a red frame or something
[21:15:58] <Vitor1001> Ah, BMP files.
[21:16:18] <Vitor1001> ffmpeg -i red.bmp out.avi
[21:16:22] <Vitor1001> for ex.
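
And if a frame is already in hand, filling the planes directly is even quicker; a sketch for a YUV420P AVFrame (solid red is roughly Y=81, U=90, V=240 in studio range):

    #include <string.h>
    #include "libavcodec/avcodec.h"

    /* Paint an already-allocated YUV420P frame of height h solid red. */
    static void fill_red(AVFrame *pic, int h)
    {
        memset(pic->data[0], 81,  pic->linesize[0] * h);       /* Y */
        memset(pic->data[1], 90,  pic->linesize[1] * (h / 2)); /* U */
        memset(pic->data[2], 240, pic->linesize[2] * (h / 2)); /* V */
    }
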
[21:20:04] <wbs> uhm.. do we have any tests of muxers at all at the moment?
[21:21:11] <wbs> tests/lavf-regression.sh seems to test only demuxers as far as I can see
[21:22:35] <Vitor1001> wbs: See tests/fate.mak: fate-nsv-demux
[21:22:50] <mru> lavf-regression tests loads of muxers
[21:23:17] <Vitor1001> wbs: But then you can already test the combo demuxing+decoding like most fate tests
[21:24:01] <Vitor1001> wbs: Oops, 10l, you said "muxers" but I read "demuxers" for some reason
[21:25:19] <wbs> mru: ok.. then I just have to read a bit closer to see where and how to hook in a new test
[21:28:01] <wbs> ah, now I see
[21:36:36] <kierank> the fluendo windows media codecs have debug symbols enabled, right?
[21:41:30] <Compn> kierank : some of the binary codecs over the years have had symbols ya
[21:41:37] <Compn> i think mike had a list
[21:45:47] <spaam> why do some arch/compiler fate instances have fewer tests than others and still show green? like 707, 713
[21:46:30] <spaam> ppc has fewer than x86..
[21:55:59] <kierank> did anyone ever complain about mplayer shipping binaries?
[22:02:53] <kierank> some nice samples in http://samples.mplayerhq.hu/A-codecs/MP3-pro/
[22:03:11] <kierank> interesting selection
[22:04:55] <kierank> are there any "DTS Express" samples, Compn?
[22:16:28] <felipec> all right... it was working all the time, it's just that for some reason the image is not updated
[22:17:43] <felipec> only after I seek I see the frame
[22:17:44] <merbanan> kierank: why are you wondering about the fluendo codecs ?
[22:18:23] <kierank> merbanan: i remember someone saying they had debug symbols
[22:18:26] <kierank> was just curious
[22:18:53] <merbanan> they have a few
[22:19:12] <merbanan> but they don't have much info we don't already have
[22:20:32] <kierank> the arcsoft dts symbols seem to have a few so maybe i might try dts express
[22:23:05] <kierank> haha, you know when you're looking at dts binaries, since there are loads and loads of tables
[22:29:41] <Compn> kierank : never heard of it, any info ?
[22:30:01] <kierank> it's a special codec for blu-ray commentaries
[22:30:07] <Compn> ugh
[22:30:30] <Compn> dont think so, i havent been paying attention to bluray stuff
[22:30:31] <kierank> so instead of a separate track with the talking and then the background film audio, you have the talking encoded separately and played on top of the normal film audio
[22:30:42] <Compn> oh like the reverse onion haha
[22:30:50] <Compn> or i guess like the onion
[22:30:52] <Compn> whichever
[22:38:09] * Compn looks around
[22:40:43] <felipec> is there a way to turn off that 'last message was repeated' blah
[22:48:03] <Compn> use an os/shell that doesnt do that ?
[22:48:10] <Compn> i think its shell related?
[22:49:18] <Dark_Shikari> no it isn't
[22:49:19] <Dark_Shikari> it's an ffmpeg bug
[22:49:36] <Compn> kierank : lol, i see you postin on doom9, someone just asked if you 'knew more about computers you'd know...'
[22:49:36] <Compn> :D
[22:49:41] <Compn> ah
[22:49:48] <Compn> http://forum.doom9.org/showthread.php?t=153332&page=3
[23:00:04] <felipec> I think my issue is with timestamps, but I don't see how
[23:07:13] <felipec> ahg, screw ffplay, I'll use omapfbplay
[23:15:59] <felipec> mru: LIBRARY_PATH doesn't work for cross-compiling, as you once said... I can't link to ffmpeg when it's in /opt/ffmpeg
[23:19:45] <hyc> mplayer should already have good support for OMAPfb
[23:19:53] <felipec> and now -lavcore is also needed... pkg-config would have dealt with all that
[23:21:02] <hyc> omapfbplay is pretty limited
[23:21:12] <felipec> hyc: I don't need a humongous player
[23:21:12] <felipec> just something to test ffmpeg
[23:22:19] <hyc> ok... I just remember having a lot of problems with omapfbplay. I wrote omapfb patches to get it working with rotation and clipping; I only use mplayer now
[23:23:36] <felipec> hyc: I just want this to work... once that's done then I might write a GStreamer element (not like gst-ffmpeg)

