[Ffmpeg-devel-irc] ffmpeg-devel.log.20140213

burek burek021 at gmail.com
Fri Feb 14 02:05:02 CET 2014


[00:22] <cone-480> ffmpeg.git Derek Buitenhuis master:50ea93158d4c: Add libx265 encoder
[00:22] <cone-480> ffmpeg.git Michael Niedermayer master:c3204856331b: avformat/asfdec: pass on error code from avio_seek()
[00:22] <cone-480> ffmpeg.git Michael Niedermayer master:86316039ab06: Merge commit '50ea93158d4c480f64069e8bd1da388486dcf4ba'
[00:38] <cone-480> ffmpeg.git Ronald S. Bultje master:91be8df20b57: vp9: add fate sample for parallelmode.
[00:38] <cone-480> ffmpeg.git Michael Niedermayer master:89c5de66523b: Merge commit '91be8df20b57a18307e90f1c4886a35ea7b28880'
[00:47] <cone-480> ffmpeg.git Ronald S. Bultje master:dff1c19140e7: vp9: add a new segmentation sample.
[00:47] <cone-480> ffmpeg.git Michael Niedermayer master:e03c1af55eae: Merge remote-tracking branch 'qatar/master'
[01:27] <llogan> Daemon404: you already have a fan http://ffmpeg.org/pipermail/ffmpeg-user/2014-February/020004.html
[01:30] <cone-480> ffmpeg.git Lukasz Marek master:9c3478c23434: tools/uncoded_frame: fix double free
[01:32] <Daemon404> yes they didn't rtfm
[01:33] <Daemon404> crf is a private option for x264
[01:33] <Daemon404> i chose to make the options as automated as possible for libx265
[01:33] <Daemon404> -preset veryslow -x265-params crf=15
[01:34] Action: Daemon404 will maybe map crf if people constantly complain
[01:34] <llogan> they will
[01:34] <Daemon404> i absolutely refuse to do what the x264 wrapper does though
[01:35] <Daemon404> which is duplicate every single option by hand
[01:35] <Daemon404> that's already in x264opt
[01:35] <llogan> crf is also a private option for libvpx
[01:36] <BtbN> btw, can someone take a look at my nvenc encoder and check whether i made any major mistakes, also regarding ffmpeg code guidelines? It seems to be working fine now.
[01:36] <BtbN> Code is on my github fork: https://github.com/BtbN/FFmpeg/tree/nvenc
[01:36] <Daemon404> i need to map a few generic options in avctx like bframes, + crf, nothing more i think
[01:39] <J_Darnley> Daemon404: Does x264 support 9 and 12 bit per sample?
[01:39] <J_Darnley> ih x265
[01:40] <cone-480> ffmpeg.git James Almer master:429f742a61e7: tta: split off hybrid filter processing as ttadsp
[01:40] <Daemon404> no 9
[01:40] <J_Darnley> But yes to 12?
[01:40] <Daemon404> 12 in the future
[01:40] <Daemon404> i believe
[01:40] <Daemon404> im not sure we have a 12 bit colorspace defined
[01:40] <J_Darnley> oh, okay
[01:40] <J_Darnley> nevermind
[01:41] <J_Darnley> I just thought I saw some extra PIX_FMT defines that could be added
[01:41] <Daemon404> high bit depth support is sketchy in x265 right now
[01:43] Action: J_Darnley changes topic
[01:44] <J_Darnley> I'm not sure all this extra code is worth it to maybe save a reg or two
[02:06] <llogan> 6.69 KB/s...downloading snapshot tarball from ffmpeg.org.
[02:34] <Compn> http://mirrors.dotsrc.org/fosdem/2014/Janson/Sunday/NSA_operation_ORCHESTRA_Annual_Status_Report.webm
[02:53] <cone-480> ffmpeg.git Michael Niedermayer master:ccc48b318b56: avcodec/arm/int_neon: fix handling sizes % 16 != 0
[03:37] <Timothy_Gu> You guys can make it a GSoC project. If not, then I'd do it with a fee.
[03:38] <Timothy_Gu> The spec itself costs money too: http://sale.gb168.cn/Saleagent/Customer/Shopping/StandardDetails.aspx?StandNo=GB/T%2022726-2008
[03:38] <Timothy_Gu> But it's cheap enough (62 CNY, ~$6)
[10:13] <leperep> hello, small question in vp9 asm
[10:13] <leperep> in vp9mc.asm from line 158, the 8 tap filter
[10:14] <leperep> you do 8 loads. Is it not more efficient to do fewer loads and then shuffle the data ?
[11:39] <Skyler_> leperep: it /really/ depends; you'd have to bench it
[11:39] <Skyler_> newer CPUs have a lot of very very fast load hardware, and unless you're load bound, it may often be better to do "wasteful" loads to avoid doing lots of extra palignrs.
[11:41] <Skyler_> though I think the sbutterfly steps could probably be removed by doing something like what happens in x264's hpel_filter_h (munging the order of the coefficients you multiply by to avoid rearranging the inputs)
[12:25] <leperep> thanks Skyler_  :)
[12:54] <BBB> leperep: I tested the palignr-approach, and found it slower
[12:55] <BBB> leperep: in fact, I tested many approaches (and I bet so did Skyler_ in vp8), and this is for some reason just faster
[12:55] <BBB> leperep: note that vp8 mc does the same approach for the h 4/6tap filter
[13:09] <Compn> leperep : patches welcome to speed it up :)
[13:11] <leperep> Compn : I'm on HEVC, I'm looking up vp9 to see how you guys do it. :)
[13:13] <leperep> but yeah, I was taught that memory moves are the most expensive things that you can do, so I was trying to minimize them as much as possible.
[13:14] <leperep> so seeing all the redundant loads is counter-intuitive
[13:15] <BBB> leperep: you can only learn so much; in the end, it's all about measuring it
[13:16] <BBB> oh yes, I remember seeing your name in the openhevc repository in that intrinsics branch
[13:16] <leperep> hands-on experience is much better. :)
[13:16] <BBB> can hevc share the mc assembly?
[13:16] <BBB> it's 8-tap also right?
[13:16] <leperep> the qpel is 8-tap
[13:16] <BBB> sharing that would be really cool
[13:17] <BBB> I wrote the interface to be relatively ... non-vp9'y, i.e. it takes coefficients as input parameter in the assembly (see wrapper in x86/vp9_init.c)
[13:17] <leperep> in some instances, it's a 7-tap, but we use it as a 8 tap filter
[13:17] <BBB> and then the coefficients live there too (see top of file), but that can be split into its own file to be vp9-specific if the mc asm itself is shared
[13:17] <BBB> right, that's the same for vp9, it's sometimes 8-tap with one zero coefficient
[13:19] <leperep> it's the subpel1 define ?
[13:19] <leperep> in the dsp_init ?
[13:19] <BBB> filter_8tap_2d_fn
[13:19] <BBB> on top
[13:19] <leperep> ok
[13:20] <BBB> it basically calls the 1d h or v filter function in the assembly with a subpel position-adjusted coeff array
[13:20] <leperep> hmm
[13:20] <BBB> so the asm function is quite literally just h/v_filter(uint8_t *dst, stride, const uint8_t *src, sstride, int16_t filter[8*?]
[13:21] <BBB> oh the coeff array is int8_t, not int16_t, I guess
[13:21] <BBB> and then it's 8*16
[13:22] <leperep> we use (dst, dststride, src, srcstride, width, height, mx, my) with mx and my for the h and v filters (idx in a filter table)
[13:22] <BBB> right, so that means the function needs to be table-aware
[13:22] <leperep> in vp9 it's always 8x8, right ?
[13:22] <BBB> that was the reason I did it different in the assembly, so the asm itself can be shared, since it's now specific-coefficient unaware
[13:23] <BBB> no vp9 takes any height and width also
[13:23] <leperep> ok
[13:23] <BBB> height is indeed an input parameter
[13:23] <BBB> width is embedded in the function signature
[13:23] <leperep> ok
[13:23] <BBB> h4, h8, h16, v4, v8, v16
[13:23] <BBB> and 32/64 are wrappers in the c that call 16 twice or 4x
[13:23] <leperep> ok
[13:24] <BBB> that's mc_rep_func in vp9dsp_init.c
[13:24] <leperep> but that means that for the 32, you shift the src and dst between calls ?
[13:24] <BBB> and the 8tap-macro to set up the tables is F8_TAPS in vp9mc.asm
[13:25] <BBB> leperep: yes; you could probably gain a little by doing that directly in assembly on e.g. avx2 hardware
[13:25] <BBB> leperep: but I don't have avx2 hardware so I didn't do it yet
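A rough C sketch of the wrapper idea described above: a 32-wide operation built from two 16-wide calls, with src and dst advanced by 16 pixels between them. The names mc_fn, mc16_copy and mc32_from_16 are made up for illustration, and the 16-wide kernel here only copies pixels; this is not the actual mc_rep_func code.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature of a fixed-width kernel: width is implied by the function,
 * height is a parameter (hypothetical typedef). */
typedef void (*mc_fn)(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *src, ptrdiff_t src_stride, int h);

/* Stand-in for a 16-pixel-wide kernel; a real one would filter, this copies. */
static void mc16_copy(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *src, ptrdiff_t src_stride, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            dst[x] = src[x];
        dst += dst_stride;
        src += src_stride;
    }
}

/* 32-wide operation: call the 16-wide kernel twice, shifting both
 * pointers by 16 pixels for the second call. */
static void mc32_from_16(mc_fn fn16, uint8_t *dst, ptrdiff_t dst_stride,
                         const uint8_t *src, ptrdiff_t src_stride, int h)
{
    fn16(dst,      dst_stride, src,      src_stride, h);
    fn16(dst + 16, dst_stride, src + 16, src_stride, h);
}
```

A 64-wide variant would follow the same pattern with four calls at offsets 0, 16, 32 and 48.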
[13:25] <leperep> we have no AVX2 here. :p
[13:25] <leperep> but looking at the AVX2 functions, using it must be amazing
[13:26] <BBB> I haven't tried it yet, I believe ubitux wanted to play with it
[13:26] <ubitux> still waiting the hw
[13:26] <ubitux> ("soon"©®™)
[13:26] <nevcairiel> is  AVX2 that different? I figured its just SSE2/3 in 256-bit
[13:26] <Skyler_> it's a little bit tricky because of the lanes
[13:26] <leperep> I feel we're a bit insane, thinking of asm as "wanting to play"
[13:26] <Skyler_> and because of the pain of cross-lane shuffles things get a little more complicated when trying to optimize
[13:27] <Skyler_> leperep: don't worry, that's pretty normal!  at least    think    I hope
[13:27] <Skyler_> *I think
[13:27] <leperep> we're the ones having fun. :)
[13:29] <Skyler_> regarding the loads, consider that a modern i7 can typically do ~3 SIMD ops per cycle and ~2 SIMD loads, and very often only one of those ops can be a shuffle
[13:29] <leperep> are unpacks considered OK ?
[13:30] <Skyler_> so as long as you're doing less than 2 loads per cycle and your loads aren't artificially slow for any reason, you can add loads without even taking one more cycle
[13:31] <leperep> thanks for the info, Skyler_ 
[13:31] <Skyler_> unpacks are okay -- like any op you want to minimize them but they're not artificially slow, though on many modern chips only one shuffle per clock is possible, so a function consisting mostly of shuffles will be slower than you'd want.
[13:34] <leperep> BBB : here's our 8-tap filter in intrinsics : http://pastebin.com/PxYhFvbt
[13:34] <leperep> the filter load will be changed. :p
[13:35] Action: ubitux just sold his soul to the devil by doing some threading work
[13:39] <ubitux> 14 → 34 fps
[13:39] <ubitux> yay.
[13:39] <kierank> nice
[13:40] <leperep> how many threads ?
[13:40] <cone-12> ffmpeg.git Clément Bœsch master:13aec744c204: avfilter/lut3d: support slice threading.
[13:40] <ubitux> leperep: dunno, about 8 i suppose
[13:41] <leperep> from a single thread to 8, you go from 14 to 34 fps ?
[13:42] <ubitux> 9 threads actually, yes
[13:42] <leperep> cool. :)
[13:42] <ubitux> but it was tested with a 1080p h264 input, so decoding was probably threaded as well
[13:43] <ubitux> anyway, i can now use lut3d in real time :p
[13:45] <BBB> oh filter threading
[13:46] <ubitux> gonna make the filter a bit faster
[13:46] <ubitux> i wonder if anyone will add gpu accel to that filter at some point
[13:47] <leperep> gpu accel is only done in openCL, right ?
[13:47] <leperep> (or CUDA)
[13:48] <BBB> leperep: I still think it'd be really cool if the mc 8tap filter code could be shared between vp9 and hevc, but obviously up to you, let me know if you're interested in that and I'll help move things around until it fits
[13:48] <leperep> BBB : I am all for sharing the code
[13:49] <ubitux> leperep: we only somehow support opencl
[13:49] <ubitux> but i've heard gpu love 3d lut
[13:50] <cone-12> ffmpeg.git mrlika master:af786236cc61: mpegts muxer: DVB subtitles multiple languages support
[13:50] <leperep> BBB : I'm looking at the 8 tap filter, I think I'll try to use a wrapper function from HEVC to your code.
[13:51] <leperep> I don't know how to call your code, so I'll copy your code into my file for the moment
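A minimal C sketch of the two interfaces being compared, under stated assumptions: a table-unaware kernel that receives its 8 taps from the caller, and a table-aware wrapper that picks the taps from mx. The function names filter_h_taps and qpel_h are hypothetical. The tap values are the HEVC luma quarter-pel filters; positions 1 and 3 carry one zero tap, which is the "7-tap used as 8-tap" case mentioned earlier. The kernel keeps its output at 16 bits and assumes 3 valid pixels to the left of src and 4 to the right of the last column.

```c
#include <stddef.h>
#include <stdint.h>

/* HEVC luma quarter-pel taps for fractional positions 1..3.
 * Each row sums to 64. */
static const int8_t qpel_taps[3][8] = {
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* Table-unaware kernel (hypothetical): the caller supplies the taps,
 * so the same code could serve any codec with an 8-tap filter. */
static void filter_h_taps(int16_t *dst, ptrdiff_t dst_stride,
                          const uint8_t *src, ptrdiff_t src_stride,
                          int width, int height, const int8_t taps[8])
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += taps[k] * src[x + k - 3];
            dst[x] = (int16_t)sum;  /* kept at 16 bits, no rounding yet */
        }
        dst += dst_stride;
        src += src_stride;
    }
}

/* Table-aware wrapper (hypothetical): mx in 1..3 selects the taps. */
static void qpel_h(int16_t *dst, ptrdiff_t dst_stride,
                   const uint8_t *src, ptrdiff_t src_stride,
                   int width, int height, int mx)
{
    filter_h_taps(dst, dst_stride, src, src_stride, width, height,
                  qpel_taps[mx - 1]);
}
```

Keeping the table lookup in the wrapper is what lets the kernel stay unaware of any specific coefficient set.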
[13:55] <leperep> BBB : do you do the (un)weight operations at the same time as the 8-tap filter ?
[13:56] <BBB> hm...
[13:56] <BBB> so vp9 has no weighting
[13:56] <BBB> is mc in hevc always weighted?
[13:57] <leperep> it has some sort of weighting.
[13:57] <leperep> it can be a simple shift to a complex thing
[13:57] <leperep> for the moment we do the 8-tap, and another function for the weighting.
[13:59] <leperep> so as output of the 8-tap filter, we have  int16_t, which is used as input of the weighting which gives te result back to uint8
[13:59] <leperep> *the
[14:01] <Skyler_> is h265 like h264 in that it does mc -> 8bit -> weight -> 8bit 
[14:01] <Skyler_> or does it do mc -> weight -> 8bit?
[14:02] <leperep> 8bit -> mc -> weight -> 8bit
[14:03] <leperep> we tried putting the weighted in the same function of the mc, but the number of functions increased just too much
[14:03] <leperep> and the gain wasn't there
[14:03] <Skyler_> aahhh. so mc outputs to 16-bit instead of 8-bit like in h264
[14:04] <Skyler_> that sounds vaguely ickier to implement.  and way smarter.
[14:04] <leperep> the weighted is considered part of the mc, I think
[14:05] <leperep> Skyler_, but in the end, we have 4 types of weighting, so you have to multiply by 4 the number of functions.
[14:06] <Skyler_> 4 types???
[14:07] <leperep> put_unweighted_pred, put_weighted_pred_avg, weighted_pred, weighted_pred_avg
[14:07] <Skyler_> ah, okay
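A small C sketch of the second stage in the "8bit -> mc -> weight -> 8bit" pipeline, covering only the two unweighted cases for 8-bit video. It assumes the mc stage left 16-bit intermediates scaled by 64 (the sum of the 8 taps), so a single prediction is rounded with shift 6 and a bi-prediction average with shift 7. The function names are hypothetical and the explicitly weighted variants are left out.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Single prediction: scale the 16-bit intermediate back to 8 bits
 * (shift 6, rounding offset 32). */
static void put_unweighted(uint8_t *dst, const int16_t *mc, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = clip_u8((mc[i] + 32) >> 6);
}

/* Bi-prediction: add two 16-bit intermediates and round once
 * (shift 7, rounding offset 64), i.e. implicit weights of 1. */
static void put_avg(uint8_t *dst, const int16_t *mc0, const int16_t *mc1, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = clip_u8((mc0[i] + mc1[i] + 64) >> 7);
}
```

Because the average is taken on the 16-bit values, rounding happens once at the end rather than once per prediction.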
[14:10] <leperep> sure it looks cleaner, and it means that you don't have to do 2 loops or even need a temp 16-bit table for the mc results
[14:11] <leperep> but when we tried, the results weren't there and it meant a lot of trouble for an asm implementation.
[14:13] <leperep> BBB : is it possible in vp9 for the width to be 12 or 24 ?
[14:14] <BBB> leperep: no, only powers of 2
[14:15] <BBB> I think putting weight in the mc eventually isn't such a bad idea
[14:15] <BBB> I mean, mc simd is typically very small
[14:15] <BBB> and takes by far most of runtime
[14:16] <BBB> (in c)
[14:16] <BBB> so...
[14:16] <BBB> putting weight in there might not be such a bad idea
[14:16] <BBB> what's the diff between put_unweighted and put_weighted_avg etc?
[14:16] <BBB> isn't it put/avg_un/weighted?
[14:17] <leperep> yes
[14:17] <leperep> put_unweighted_pred, put_weighted_pred_avg, weighted_pred, weighted_pred_avg
[14:17] <leperep> oh wait the differences
[14:17] <BBB> there's 1 unweighted and 3 weighted
[14:17] <cone-12> ffmpeg.git Michael Niedermayer master:842b6c14bcfc: avformat/mpegtsenc: Check data array size in mpegts_write_pmt()
[14:17] <BBB> and one is avg and put
[14:17] <BBB> I'm confused
[14:18] <leperep> http://pastebin.com/wE6yGE3V if you want the details of each function
[14:19] <BBB> put_weighted_pred_avg is unweighted
[14:19] <BBB> right?
[14:19] <BBB> so the function name is odd
[14:19] <leperep> oh right. :/
[14:20] <leperep> no
[14:20] <leperep> mraulet just told me it's considered as weights of 1.
[14:20] <leperep> kinda silly
[14:22] <BBB> well that's unweighted
[14:22] <BBB> anyway, up to you
[14:22] <BBB> me work
[14:22] <BBB> bbl
[14:23] <leperep> ok ok
[14:32] <kurosu__> are vp9 intermediates 8 bits? hevc's are 16, whatever the bitdepth
[14:32] <leperep> seems so
[14:33] <kurosu__> and iirc, it's mc -> 16bits -> weight -> 8 bits
[14:33] <JEEB> yeah. I guess that is one of the ways they tried to make 8bit encoding suck a bit less
[14:33] <JEEB> because having the intermediates go down to output bit depth all the time certainly didn't help :)
[14:33] <JEEB> (in AVC)
[14:34] <kurosu__> there was also a rounding issue
[14:35] <cone-12> ffmpeg.git Clément Bœsch master:0e97ec54dea9: avfilter/curves: support slice threading.
[14:35] <ubitux> 92 → 129 fps
[14:35] <ubitux> threading is insane @_@
[14:36] <nevcairiel> 1/3 boost? doesn't sound like it threads well
[14:36] <kurosu__> depends if the amount of work per pixel is non-constant
[14:37] <kurosu__> (and if the filter is simple)
[14:37] <ubitux> nevcairiel: yeah, but i'd say it's good enough for now :)
[14:39] <kurosu__> can filters be threaded by frame? (like for decoders, even more so if there is no inter-frame dependency)
[14:39] <ubitux> no, unfortunately
[14:40] <ubitux> would be nice to have some "branch" threading in a filtergraph
[14:41] <nevcairiel> slice threading was a much easier goal, so it was implemented first
[14:41] <nevcairiel> works really well for yadif, too
[14:41] <nevcairiel> which is the only thing i use. :D
[14:42] <Daemon404> it's the only thing many use :P
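For readers unfamiliar with slice threading, here is a hedged C sketch of the usual row partitioning: each job gets a contiguous band of rows and touches nothing else, so bands can run on separate threads. The helper names slice_bounds and apply_lut_slice are hypothetical, and the per-pixel work is a plain 256-entry lookup chosen only as an example; no actual threading API is shown.

```c
#include <stdint.h>

/* Rows [start, end) handled by job `jobnr` out of `nb_jobs`
 * for a frame of height h. Bands are contiguous and cover all rows. */
static void slice_bounds(int h, int jobnr, int nb_jobs, int *start, int *end)
{
    *start = (h *  jobnr)      / nb_jobs;
    *end   = (h * (jobnr + 1)) / nb_jobs;
}

/* Per-slice worker: applies a lookup table in place to its own rows only,
 * so concurrent jobs never write to the same pixel. */
static void apply_lut_slice(uint8_t *data, int stride, int w, int h,
                            const uint8_t lut[256], int jobnr, int nb_jobs)
{
    int start, end;
    slice_bounds(h, jobnr, nb_jobs, &start, &end);
    for (int y = start; y < end; y++)
        for (int x = 0; x < w; x++)
            data[y * stride + x] = lut[data[y * stride + x]];
}
```

This only works when a filter's output rows depend on nothing that other jobs write, which is why simple per-pixel filters are the easy case.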
[14:57] <Balteck> hi all. Any1 can help me about an unsupported stream?
[15:27] <JEEB> the longest thread gets longer /~o/~
[15:35] <wm4> lol
[15:35] <Daemon404> i wonder why no intermediate work gets submitted
[15:35] <Daemon404> (or is it even possible?)
[15:36] <JEEB> well, he could post the M/S encoding part I guess? not sure tho
[15:36] <JEEB> this looks like a JW32 situation to me tho
[15:36] <JEEB> although in a smaller scale
[15:37] <Daemon404> schizo?
[15:37] <JEEB> no
[15:37] <JEEB> as in "here's a huge dump of code that does nice stuff, please merge"
[15:38] <Daemon404> yeah
[15:38] <JEEB> basically JW32's fork ended up being too large
[15:38] <Daemon404> i will reply on the ticket and ask
[15:38] <JEEB> and no-one is able to review it in a sane time
[15:40] <nevcairiel> he always said he does plan to submit it in smaller parts once he is happy with it. when that is, is another question.
[15:40] <JEEB> yeah, someone already poked him in the thread
[15:40] <JEEB> I think it was ubitux 
[15:40] <Daemon404> ah.
[16:16] <ubitux> seems changing if/else if into switch/case is now called a refactor...
[16:17] <durandal_1707> how would you call it?
[16:17] <ubitux> a change switch from if/else to switch
[16:17] <ubitux> or whatever, but not a refactor
[16:18] <durandal_1707> tell them to get back to school
[16:21] Action: durandal_1707 stopped caring long ago
[18:19] <superware> I'm using libav to decode a live network stream, what's the most simple way to sync incoming frames for rendering?
[18:25] <superware> I meant libav*
[18:33] <superware> how should I sync video frames from a live network stream?
[18:39] <Compn> superdump : ask on libav-user mailing list for help with developing application with libav* libraries
[18:39] <Compn> err
[18:39] <Compn> superware : http://ffmpeg.org/mailman/listinfo/libav-user
[18:40] <Compn> is the url to the mailing list with info how to subscribe
[19:01] <superware> Compn: do you know of a more relevant IRC channel?
[19:13] <Compn> superware : you can ask in #ffmpeg and #libav channels
[19:14] <Compn> this channel is more for development of the program and libs itself, not for 3rd party software
[19:56] <Daemon404> interesting
[19:56] <Daemon404> so THATS why trac was down
[19:58] Action: J_Darnley screams
[19:58] <J_Darnley> I can't work out what these XOP instruction macros are doing
[20:03] <Skyler_> which ones?
[20:04] <J_Darnley> specifically pmacsdd
[20:05] <J_Darnley> I've been trying to get A = A + B * [MEM]
[20:06] <Skyler_> that won't work
[20:06] <Skyler_> or actually wait, maybe it will?  trying to think of whether xop supports memory arguments in both of the rightmost positions
[20:07] <Skyler_> oh, hmm, I see.  it has the problem of clobbering A in the first step in the non-xop emulation.
[20:10] <Skyler_> eesh.  I think that particular calculation is actually impossible without either a) clobbering A b) clobbering B c) clobbering a temp
[20:10] <Skyler_> I guess the xop macros are kind of useless for general-purpose use then
[20:11] <Skyler_> I wonder why I didn't run into that before.
[20:11] <J_Darnley> They were designed for your use in x264 not my use in flacenc?
[20:12] <J_Darnley> I'll try and get my own working, or rather assembling.
[20:14] <Skyler_> more like "they happened to work in the one place I used them, but I didn't notice they would break in lots of other places because I was very stupid"
[20:39] <J_Darnley> Is there some kind of penalty for mixing AVX and older instructions on an AVX cpu?
[20:39] <J_Darnley> Something like: pmulld A, B, C; paddd D, A
[20:39] <J_Darnley> Rather than: pmulld A, B, C; paddd D, D, A
[21:01] <Skyler_> J_Darnley: first of all, x86inc automatically handles selection between AVX and non-AVX instructions
[21:01] <Skyler_> secondly, the rule is as follows:
[21:01] <Skyler_> 1) SSE + AVX 128-bit: okay
[21:01] <Skyler_> 2) AVX 128-bit + AVX 256-bit: okay
[21:02] <Skyler_> 3) SSE + AVX 256-bit, on the same registers, without a vzeroupper in between: bad bad bad bad bad bad bad horrible terrible awful
[21:33] <cone-12> ffmpeg.git James Darnley master:623f380a1844: lavc: fix flac encoder and decoder dependencies
[21:37] <J_Darnley> Skyler_: Thanks
[22:19] <cone-12> ffmpeg.git James Almer master:23a8c6345200: x86inc: Extend FMA_INSTR functionality
[22:19] <cone-12> ffmpeg.git James Almer master:e87974bc00e9: flac/x86: add ff_flac_lpc_32_xop()
[22:41] <wm4> I just found ff_cropTbl
[22:41] <wm4> what does it do?
[22:41] <ubitux> clip
[22:41] <ubitux> (iirc)
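To illustrate what clipping through a lookup table looks like, here is a minimal C sketch. It is not the actual ff_cropTbl definition: the names crop_tbl, init_crop_tbl and crop, and the margin value of 1024, are assumptions made for the example. The idea is that a padded table maps out-of-range values to 0 or 255, replacing two comparisons with one indexed load.

```c
#include <stdint.h>

#define MAX_NEG_CROP 1024  /* margin on each side; assumed value */

/* 256 identity entries with MAX_NEG_CROP saturated entries on each side. */
static uint8_t crop_tbl[256 + 2 * MAX_NEG_CROP];

static void init_crop_tbl(void)
{
    for (int i = 0; i < 256; i++)
        crop_tbl[MAX_NEG_CROP + i] = (uint8_t)i;      /* in range: identity */
    for (int i = 0; i < MAX_NEG_CROP; i++) {
        crop_tbl[i] = 0;                              /* below 0 -> 0 */
        crop_tbl[MAX_NEG_CROP + 256 + i] = 255;       /* above 255 -> 255 */
    }
}

/* Clip v to 0..255. Only valid for v in [-MAX_NEG_CROP, 255 + MAX_NEG_CROP];
 * anything outside that range would index past the table. */
static uint8_t crop(int v)
{
    return crop_tbl[MAX_NEG_CROP + v];
}
```

The caller has to guarantee the input range, which is the trade-off against a branch-based clip.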
[23:21] <J_Darnley> If I use a [mem] operand with pshufd (pshufd xmm, [mem], q0123), does the address have to be aligned?
[23:22] <Skyler_> yes
[23:22] <Skyler_> (with SSE, at least)
[23:23] <cone-12> ffmpeg.git Derek Buitenhuis master:955544e4d031: libx265: Fix use of uninitialized input picture
[23:23] <cone-12> ffmpeg.git Derek Buitenhuis master:25bc8390bb50: libx265: Remove redundant default param call
[23:23] <cone-12> ffmpeg.git Derek Buitenhuis master:c107769c680b: MAINTANERS: Add myself as libx265 maintainer
[23:25] <J_Darnley> You mean sse through to sse42
[23:25] <J_Darnley> ?
[23:26] <Skyler_> I mean SSE instructions (vs AVX)
[23:26] <Skyler_> AVX allows unaligned memory refs, though in my experience it doesn't really make any speed difference (vs a separate movdqu)
[23:27] <J_Darnley> what about movq in that case?
[23:29] <Daemon404> Skyler_, do you plan to backport the vpmacsdql emulation stuff to x264?
[23:42] <Skyler_> sure, do you have a patch/link to commit/etc?
[00:00] --- Fri Feb 14 2014

