[Ffmpeg-devel-irc] ffmpeg-devel.log.20140701

burek burek021 at gmail.com
Wed Jul 2 02:05:02 CEST 2014


[00:02] <Rin_Tohsaka> I'm not a coding guru so I'm not 100% sure about the on-topic-ness about this question ~~ does ffvp9 contain SSSE3 optimizations that, similar to its x64 optimizations, would result in sub-par performance on CPUs lacking such functionality?
[00:04] <nevcairiel> Rin_Tohsaka: both ssse3 and avx optimizations, and most of those are only available on 64-bit on top, so if you run on 32-bit or on a CPU without ssse3, it'll be quite a bit slower
[00:05] <Rin_Tohsaka> Well I'm running 64bit, but I'm running it on one of those pre-Bulldozer AMD CPUs, so...
[00:06] <Rin_Tohsaka> It's a bit depressing to so my little 2GHz Conroe matching something over twice as fast
[00:06] <Rin_Tohsaka> *to see my
[00:07] <nevcairiel> blame AMD for not adding new instructions for a long time
[00:07] <nevcairiel> they only started to play catch-up with bulldozer
[00:07] <Rin_Tohsaka> oh, I do, I do
[00:07] <Rin_Tohsaka> ...only to fall behind in other areas of their CPU design
[00:07] <Rin_Tohsaka> well, at least Bobcat has SSSE3
[00:08] <Rin_Tohsaka> basically I'm seeing almost 3x slower performance when SSSE3 isn't available
[00:08] <jamrial> but afaik it has 64bits execution units, which is awfully slow for sse
[00:10] <Rin_Tohsaka> Well come tomorrow afternoon I'll be acquiring my friends' old E-350 laptop, so I'll definitely investigate the performance
[00:11] <jamrial> and yes, amd pre bulldozer is not good for ffvp9. 1280x720 videos will run fine, but 1920x1080 or so will not
[00:11] <Rin_Tohsaka> 1280x720 30fps you mean
[00:11] <Rin_Tohsaka> ...which is the most depressing part
[00:12] <Rin_Tohsaka> 1280x720 60fps is nothing but crazy framedrops
[00:14] <Rin_Tohsaka> anyway, I do have a related question.  Perhaps it's due to my lack of SSSE3, but I'm finding that there's pretty much no performance difference between x86 and x64 for VP9 decoding
[00:15] <Rin_Tohsaka> are the x64 optimizations part of the same code that includes the SSSE3 optimizations by chance?
[00:15] <jamrial> the optimizations up to sse2 are all for both x86 and x64. it's only ssse3 and above that (most) require x64
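What jamrial and nevcairiel describe (SSE2 code on both x86-32 and x86-64, most SSSE3/AVX code on x86-64 only) comes down to runtime dispatch. The sketch below is a hypothetical model, not ffvp9's actual init code. The flag names and kernel functions are invented, and `is_x86_64` stands in for a compile-time architecture check.

```c
/* Hypothetical CPU-flag bits for this sketch (FFmpeg has its own
 * AV_CPU_FLAG_* constants; these are not them). */
enum {
    CPU_SSE2  = 1 << 0,
    CPU_SSSE3 = 1 << 1,
    CPU_AVX   = 1 << 2,
};

typedef const char *(*mc_func)(void);

/* Stand-ins for real SIMD kernels: each one just names itself. */
static const char *mc_c(void)     { return "c"; }
static const char *mc_sse2(void)  { return "sse2"; }
static const char *mc_ssse3(void) { return "ssse3"; }
static const char *mc_avx(void)   { return "avx"; }

/* Pick the best kernel for the given CPU flags and architecture.
 * Later checks override earlier ones, so the fastest usable version wins. */
static mc_func select_mc(int cpu_flags, int is_x86_64)
{
    mc_func f = mc_c;
    if (cpu_flags & CPU_SSE2)
        f = mc_sse2;                 /* usable on x86-32 and x86-64 */
    if (is_x86_64 && (cpu_flags & CPU_SSSE3))
        f = mc_ssse3;                /* x86-64 only in this sketch */
    if (is_x86_64 && (cpu_flags & CPU_AVX))
        f = mc_avx;                  /* x86-64 only in this sketch */
    return f;
}
```

Under these rules a 64-bit pre-Bulldozer AMD CPU (SSE2 but no SSSE3) stays on the SSE2 path, while a Core 2 "Conroe" running 64-bit code gets the SSSE3 one, which is consistent with the gap Rin_Tohsaka reports.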
[00:15] <BtbN> Isn't ssse3 part of amd64?
[00:15] <Rin_Tohsaka> that's SSE3
[00:16] <jamrial> no
[00:17] <Rin_Tohsaka> So the impression I'm getting is that SSE3 isn't really useful for video decoding... would this be accurate?
[00:18] <Rin_Tohsaka> or is that just a VP9 thing?
[00:18] <jamrial> sse3 is mostly float instructions, so no
[00:19] <Rin_Tohsaka> theoretically those could be useful in post-processing filters, but that's not really the same as the codec
[00:24] <Rin_Tohsaka> Considering that this channel is logged, I don't suppose it would be a crime if you are quoted elsewhere on the internet in regards to these SSSE3 optimizations, would you?
[00:25] <Rin_Tohsaka> *would it?
[00:25] <jamrial> nope
[00:42] <Timothy_Gu> jamrial: do you know anything about AMD K10 Barcelona's SSE4a?
[00:43] <Timothy_Gu> Wikipedia says it's not the same as Intel's SSE4
[00:43] <Timothy_Gu> What is it exactly?
[00:43] <fionag> It's basically just lzcnt and a couple other things
[00:44] <Timothy_Gu> fionag: you sure?
[00:44] <fionag> http://en.wikipedia.org/wiki/SSE4#SSE4a   even less, apparently.  LZCNT is *technically* under a separate flag, though it's included on all SSE4a chips
[00:44] <Timothy_Gu> https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT says it's something else
[00:45] <Timothy_Gu> oh ok, thanks
[00:45] <fionag> they marketed them together but they're under separate cpuflags
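fionag's point that SSE4a, LZCNT and POPCNT are reported under separate CPU flags can be checked directly with CPUID. This is a minimal sketch that assumes GCC or Clang on x86, because it uses their `<cpuid.h>` helper `__get_cpuid`. On other targets it reports nothing. The bit positions come from the Intel/AMD CPUID documentation, and the struct and function names are hypothetical.

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

struct x86_ext {
    bool ssse3;   /* CPUID.1:ECX bit 9 */
    bool sse41;   /* CPUID.1:ECX bit 19 */
    bool sse42;   /* CPUID.1:ECX bit 20 */
    bool popcnt;  /* CPUID.1:ECX bit 23, its own bit, not part of SSE4.x */
    bool lzcnt;   /* CPUID.80000001h:ECX bit 5 (AMD calls it ABM) */
    bool sse4a;   /* CPUID.80000001h:ECX bit 6, AMD only */
};

static struct x86_ext query_x86_ext(void)
{
    struct x86_ext e = {0};
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        e.ssse3  = ecx & (1u << 9);
        e.sse41  = ecx & (1u << 19);
        e.sse42  = ecx & (1u << 20);
        e.popcnt = ecx & (1u << 23);
    }
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) {
        e.lzcnt = ecx & (1u << 5);
        e.sse4a = ecx & (1u << 6);
    }
#endif
    return e;
}
```

On a K10 (Barcelona/Phenom) part, one would expect `sse4a`, `lzcnt` and `popcnt` to be set and `ssse3` and `sse41` to be clear. That is why SSE4a does not help code written against Intel's SSSE3/SSE4.1.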
[00:45] <fionag> I think it was another case of weird naming collision with both companies making new SSEs
[00:45] <Timothy_Gu> so they're not that useful for FFmpeg, right?
[00:46] <fionag> I don't think so, I've never used them
[00:46] <Timothy_Gu> by the way, what does "r4m" mean in asm?
[00:47] <Timothy_Gu> or "argm"
[00:47] <fionag> it means "the stack memory location associated with r4m, or r4 itself if r4m isn't on the stack, as in the case of arch_x86_64"
[00:47] <fionag> *associated with r4
[00:47] <Timothy_Gu> so if the argument isn't declared in cglobal, you have to use "m"
[00:48] <Timothy_Gu> ?
[00:48] <fionag> Ummm...    I guess that's one way of putting it?
[00:49] <fionag> you can use it to access the argument when it's still on the stack
[00:49] <Timothy_Gu> what happens if you don't declare an argument?
[00:50] <Timothy_Gu> it won't be in a register?
[00:50] <Timothy_Gu> on 32
[00:50] <fionag> you mean, declaring its name, or declaring it in the numerical bit?
[00:50] <fionag> like in the cglobal 5, 5, 8 bit
[00:50] <Timothy_Gu> number
[00:51] <fionag> X, Y, Z ->  X is the number of arguments to load into registers, Y is the number of registers to have free (Y has to be >= X)
[00:51] <fionag> so if X is less than the total number of actual arguments, the remaining ones will still be on the stack
[00:52] <fionag> be super careful doing this though, since on e.g. x86-64, if you have 5 arguments and you autoload 3, the remaining 2 will be in registers anyways
[00:52] <fionag> since that's how they come
[00:52] <fionag> so you can't *rely* on them being on the stack unless they're past the first 6.
[00:52] <J_Darnley> And don't forget that win64 only has 4 arguments in registers!
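The argument-placement rules fionag and J_Darnley describe can be written down as a small model. This sketch covers only integer/pointer arguments and assumes the `cglobal name, X, Y, ...` form discussed above. It ignores floating-point arguments, the register count Y and the remaining parameters, and it does not reproduce x86inc's macros. The enum and function names are hypothetical.

```c
#include <assert.h>

enum abi { ABI_X86_32, ABI_UNIX64, ABI_WIN64 };
enum loc { IN_REGISTER, ON_STACK };

/* Integer/pointer arguments the calling convention passes in registers. */
static int reg_args(enum abi abi)
{
    switch (abi) {
    case ABI_UNIX64: return 6;  /* rdi, rsi, rdx, rcx, r8, r9 */
    case ABI_WIN64:  return 4;  /* rcx, rdx, r8, r9 */
    default:         return 0;  /* x86-32 cdecl: everything on the stack */
    }
}

/* Where integer argument i (0-based) is after the prologue of a function
 * declared with X = x_loaded: the first X arguments are loaded into registers,
 * and on 64-bit the ABI's register arguments are there regardless of X. */
static enum loc arg_location(enum abi abi, int x_loaded, int i)
{
    assert(i >= 0 && x_loaded >= 0);
    if (i < reg_args(abi) || i < x_loaded)
        return IN_REGISTER;
    return ON_STACK;  /* reach it through the rNm memory operand */
}
```

Take fionag's example of five arguments with three autoloaded. Argument 4 (0-based) is still in a register on UNIX64, but it is on the stack on WIN64 and x86-32.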
[00:53] <Timothy_Gu> ok, but if you do declare the number, on 32 won't they still be on the stack?
[00:58] <Timothy_Gu> J_Darnley, is there any advantages of win64 that unix64 dont have?
[01:13] <J_Darnley> No, I think most people only regard win64 as having disadvantages
[01:13] <BBB> plepere: why are you shuffling your input data
[01:13] <BBB> plepere: if any shuffling is needed, life is over
[01:13] <BBB> plepere: no shuffling, ever
[01:14] <BBB> plepere: if you need different order, adjust your filter rodata
[01:14] <BBB> plepere: that may indeed mean different rodata per optimization variant, thats pretty typical in fact
[01:14] <J_Darnley> I can't name them but I'm sure other people can.
[01:14] <BBB> plepere: but youre shuffling your input data way too much
[01:20] <wm4> why is life over when shuffling?
[01:41] <BBB> plepere: my short recommendation is to get rid of all vextractf128, vinsertf128, vpermq, change final movu [rdi] into a mova, then try again
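BBB's advice to adjust the filter rodata instead of shuffling the input can be shown with a scalar analogy. If the shuffle only decides which sample meets which tap, the same permutation can be applied once to the coefficient table. This is a hypothetical sketch, not plepere's code (which is only linked from the pastebin). In SIMD code the permuted table would be a separate constant per optimization variant.

```c
#define TAPS 4

/* Shuffle-the-input version: gather samples into the filter's order at run
 * time, then multiply-accumulate. */
static int filter_shuffled_input(const int *x, const int *coef, const int *perm)
{
    int sum = 0;
    for (int k = 0; k < TAPS; k++)
        sum += x[perm[k]] * coef[k];
    return sum;
}

/* One-time rodata rewrite: put each coefficient at the position of the sample
 * it multiplies, so the hot loop can read x in natural order. */
static void permute_coefs(const int *coef, const int *perm, int *coef_out)
{
    for (int k = 0; k < TAPS; k++)
        coef_out[perm[k]] = coef[k];
}

/* Hot loop with no shuffling: natural input order, permuted coefficients. */
static int filter_natural_order(const int *x, const int *coef_permuted)
{
    int sum = 0;
    for (int j = 0; j < TAPS; j++)
        sum += x[j] * coef_permuted[j];
    return sum;
}
```

Both functions compute the same sum whenever `perm` is a permutation. The difference is that the reordering cost moves out of the loop and into data prepared ahead of time.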
[02:02] <fionag> which patch is this?
[02:25] <BBB> he posted a pastebin entry a few hours ago
[02:32] <BBB> <plepere> [22:26:48] BBB : http://pastebin.com/r5HqWC3h
[04:42] <cone-689> ffmpeg.git 03Michael Niedermayer 07master:80da227c660e: cmdutils_opencl: Use av_malloc_array()
[04:42] <cone-689> ffmpeg.git 03Michael Niedermayer 07master:7faa7d3d42af: avcodec/hevc: Use av_malloc(z)_array()
[04:42] <cone-689> ffmpeg.git 03Michael Niedermayer 07master:a97137e94869: avfilter/f_ebur128: Use av_malloc_array()
[11:24] <ubitux> > the multi-threaded h264 decoder from FFmpeg is great… Anton, who wrote it, is a libav developer!
[11:24] <ubitux> huh?
[11:24] <ubitux> afebe2f7cac1e23ea5b198cfe5bfabf5e7f1105f ?
[11:25] <ubitux> also, i don't remember elenril being author of ffmpeg-mt but i might be wrong
[11:39] <plepere> ubitux, can you take a look at my AVX2 code ? I don't have any significant speed-up from going from SSE2 to AVX2.
[11:46] <ubitux> plepere: didn't BBB commented on it already?
[12:53] <kierank> ubitux: you saw my comment
[12:54] <ubitux> kierank: yeah
[12:56] <J_Darnley> It's working!  It's working!  The beat detection is working!
[12:56] <ubitux> J_Darnley: you're working on an audio beat detector?
[12:57] <J_Darnley> As part of a larger visualiser, yes
[12:57] <ubitux> in ffmpeg?
[12:57] <ubitux> that's really great if so, it was in my todo list
[12:57] <J_Darnley> As a separate lib but I test it using an avf filter
[12:57] <ubitux> is it a custom algorithm or you're following a paper?
[12:58] <J_Darnley> Stealing old BSD-3 code
[12:58] <J_Darnley> well "stealing"
[12:58] <J_Darnley> "un-ashamedly copying"
[12:59] <ubitux> grumbl, "idsp"
[12:59] <ubitux> -    DSPContext dsp;
[12:59] <ubitux> +    IDCTDSPContext idsp;
[12:59] <ubitux> :/
[13:00] <ubitux> inverse digital signal processing @_@
[13:41] <cone-145> ffmpeg.git 03Diego Biurrun 07master:adcb8392c9b1: mjpeg: Split off bits shared by MJPEG and LJPEG encoders
[13:41] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:909f53f2b285: Merge commit 'adcb8392c9b185fd8a91a95fa256d15ab1432a30'
[13:47] <plepere> ubitux, BBB just gave me the feedback
[15:36] <cone-145> ffmpeg.git 03Diego Biurrun 07master:e3fcb1434746: dsputil: Split off IDCT bits into their own context
[15:36] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:581b5f0b9b93: Merge commit 'e3fcb14347466095839c2a3c47ebecff02da891e'
[15:53] <cone-145> ffmpeg.git 03Diego Biurrun 07master:79793f833784: Update Fiona's name in copyright statements.
[15:53] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:8d0c7031a846: Merge commit '79793f833784121d574454af4871866576c0749d'
[16:00] <Daemon404> ubitux, too many vowels
[16:00] <Daemon404> IDCTDSPCTX idspctx;
[16:00] <Daemon404> better ^
[16:04] <J_Darnley> :)
[16:33] <J_Darnley> ubitux: if you're interested in seeing the beat detection code, I've got my work in progress here:
[16:33] <J_Darnley> https://gitorious.org/jdarnley/advanced-visualization-studio
[16:34] <J_Darnley> look at the mess that is bpm.c and a little in main.c:avs_render_frame()
[16:39] <cone-145> ffmpeg.git 03Yusuke Nakamura 07master:20f95f21f9b9: mov: Support default-base-is-moof.
[16:39] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:fb318def5d3a: Merge commit '20f95f21f9b9595608ba668a6eca78f2d508be67'
[18:31] <cone-145> ffmpeg.git 03James Almer 07master:dd2c9034b174: x86/swr: convert resample_{common, linear}_double_sse2 to yasm
[19:34] <cone-145> ffmpeg.git 03Diego Biurrun 07master:d0449e754553: vaapi: Update idct_permutation location after dsputil/idctdsp split
[19:34] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:06e5d28e33e9: Merge commit 'd0449e754553b0c110b6cd75f6725b82144fbd2a'
[19:54] <ubitux> J_Darnley: cool ok :)
[20:07] <Daemon404> i like how carl responded with his cut&paste response after ubitux already provided a proper one
[20:10] <cone-145> ffmpeg.git 03Luca Barbato 07master:f1f6156b3fc9: matroska: K&R formatting cosmetics
[20:10] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:8365287e613d: Merge commit 'f1f6156b3fc9eb77b439d031ba18974d80b8341e'
[20:12] <ubitux> -    return (ebml_master){ avio_tell(pb), bytes };
[20:12] <ubitux> +    return (ebml_master) {avio_tell(pb), bytes };
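The two lines differ only in whitespace, and both return a C99 compound literal. Below is a minimal sketch. It uses a hypothetical two-field stand-in for `ebml_master` and a plain parameter in place of `avio_tell(pb)`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the muxer's ebml_master struct: where an EBML
 * master element starts and how many bytes its size field uses. */
typedef struct ebml_master {
    int64_t pos;
    int     sizebytes;
} ebml_master;

/* C99 compound literal: builds and returns a struct value in one expression.
 * A space after "(ebml_master)" is purely a matter of style. */
static ebml_master make_master(int64_t pos, int bytes)
{
    return (ebml_master){ pos, bytes };
}
```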
[20:12] <ubitux> lol
[20:17] <JEEB> wat
[20:18] <ubitux> JEEB: it's just the libav K&R dialect
[20:20] <ubitux> i don't understand why they move comments above some lines when it's actually related to one line in particular
[20:21] <ubitux> nor do i understand the need to break some av_log calls when they aren't even overflowing the 80-col limit
[20:21] <ubitux> or stuff like:
[20:21] <ubitux> -    start = 3600000*sh + 60000*sm + 1000*ss + 10*sc;
[20:21] <ubitux> +    start = 3600000 * sh + 60000 * sm + 1000 * ss + 10 * sc;
[20:21] <ubitux> is that really an improvement?
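The reformatted line does the same arithmetic either way. The log does not say what the variables hold, but the multipliers suggest that sh, sm, ss and sc are hours, minutes, seconds and centiseconds. Under that assumption, the expression converts a timestamp to milliseconds. The hypothetical helper below also widens the result to 64 bits.

```c
#include <stdint.h>

/* Assumes sh:sm:ss.sc is hours:minutes:seconds.centiseconds; returns ms. */
static int64_t timestamp_ms(int sh, int sm, int ss, int sc)
{
    return 3600000LL * sh + 60000LL * sm + 1000LL * ss + 10LL * sc;
}
```

For example, 1:02:03.45 gives 3600000 + 120000 + 3000 + 450 = 3723450 ms.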
[20:22] <wm4> yes, but high volumes of cosmetic commits are pretty annoying anyway
[20:30] <cone-145> ffmpeg.git 03Luca Barbato 07master:b75a1f9892b5: matroska: Factor out write_track from mkv_write_tracks
[20:30] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:ee78b0c252e5: Merge commit 'b75a1f9892b5b715397edbf837e4d4cda337907b'
[21:00] <cone-145> ffmpeg.git 03Luca Barbato 07master:48e6432407a7: matroska: Factor out mkv_write_stereo_mode
[21:00] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:141ee109132a: Merge commit '48e6432407a73d5006d84609456e6e0bc3dd8fc4'
[21:53] Action: Compn updates fiona in brain
[21:53] <Compn> does she still go by 'dark_shikari' or no ?
[21:58] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:4e09300ffab7: mjpegdec: Support pix_fmt_id == 0x22112200
[22:15] Action: TimNich-home cannot see the name Fiona without hearing Mike Myers uttering it in his Shrek Scottish accent
[22:24] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:cd417d947e65: avcodec/mjpegdec: fix width for non chroma in rescaling
[22:24] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:7558e5534581: avcodec/mjpegdec: Support pix_fmt_id==0x11222200
[22:25] <Mavrik> hmm, maybe a stupid question, but how does AVPacket reference counting work? :)
[22:25] <Mavrik> e.g. how do I allocate a reference counted avpacket
[22:51] <michaelni> av_new_packet()
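Regarding Mavrik's question: av_new_packet() allocates the payload as a reference-counted buffer attached to the packet (pkt->buf). Current FFmpeg releases also provide av_packet_ref() and av_packet_unref() to share and release that buffer. To avoid depending on libavcodec, the sketch below models the idea with hypothetical types and functions. It is not FFmpeg's implementation, which among other things adds input padding.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of a reference-counted packet buffer.
 * These are NOT FFmpeg's types; they only mirror the idea. */
typedef struct RefBuf {
    unsigned char *data;
    size_t size;
    int refcount;
} RefBuf;

typedef struct Packet {
    RefBuf *buf;           /* NULL means "not reference counted" */
    unsigned char *data;
    size_t size;
} Packet;

/* Similar in spirit to av_new_packet(): a fresh buffer holding one reference. */
static int packet_new(Packet *pkt, size_t size)
{
    RefBuf *b = malloc(sizeof(*b));
    if (!b)
        return -1;
    b->data = calloc(1, size ? size : 1);
    if (!b->data) {
        free(b);
        return -1;
    }
    b->size = size;
    b->refcount = 1;
    pkt->buf  = b;
    pkt->data = b->data;
    pkt->size = size;
    return 0;
}

/* Share src's buffer with dst instead of copying the payload. */
static void packet_ref(Packet *dst, const Packet *src)
{
    src->buf->refcount++;
    *dst = *src;
}

/* Drop one reference; the payload is freed when the last one goes away. */
static void packet_unref(Packet *pkt)
{
    if (pkt->buf && --pkt->buf->refcount == 0) {
        free(pkt->buf->data);
        free(pkt->buf);
    }
    memset(pkt, 0, sizeof(*pkt));
}
```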
[23:06] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:64d98dadc7d0: avcodec/mjpegdec: set upscale_h/upscale_v using generic code instead of hardcoding a list
[23:42] <J_Darnley> Dammit gdb!  Will you catch that crash!
[23:49] <J_Darnley> Oh of course.  I forgot this pointer was int* rather than uint8_t*
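The log does not show J_Darnley's code, but this kind of mix-up often comes from the way pointer arithmetic scales by the pointed-to type. Adding n to an int* moves n * sizeof(int) bytes, while adding n to a uint8_t* moves n bytes. A byte offset applied to the wrong pointer type can therefore run past the buffer. A minimal illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes covered by advancing an int* by n elements. */
static ptrdiff_t bytes_advanced_int(int *p, int n)
{
    return (const char *)(p + n) - (const char *)p;
}

/* Number of bytes covered by advancing a uint8_t* by n elements. */
static ptrdiff_t bytes_advanced_u8(uint8_t *p, int n)
{
    return (const char *)(p + n) - (const char *)p;
}
```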
[23:55] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:784e1cf76beb: avcodec/mjpegdec: handle luma upscale detection generically
[23:55] <cone-145> ffmpeg.git 03Michael Niedermayer 07master:ef7e8425e839: avcodec/mjpegdec: factorize some parts of the pix_fmt_id switch()
[00:00] --- Wed Jul  2 2014

