[Ffmpeg-devel-irc] ffmpeg-devel.log.20140112

burek burek021 at gmail.com
Mon Jan 13 02:05:03 CET 2014


[00:03] <ubitux> hey btw
[00:03] <ubitux> for those who have seen the movie the fifth estate
[00:03] <ubitux> there is a ffmpeg reference at around 1h15
[00:04] <ubitux> "or we can use ffmpeg and toss it into final cut"
[00:04] <JEEB> :o
[00:04] <ubitux> who haven't* seen, actually
[00:05] <ubitux> we should create a page, just like nmap, with references to ffmpeg in movies
[00:06] <ubitux> llogan: hey, you should tweet that
[00:06] <ubitux> ;)
[00:10] <Compn> haha
[00:11] <Compn> ubitux : nice catch
[00:11] <Compn> i havent seen it
[00:11] <Compn> how is it? bad against wikileaks or ok ?
[00:11] Action: Compn guesses bad
[00:11] <ubitux> i haven't seen the movie, someone told me
[00:11] <ubitux> look at the scene, it's the heroes so i guess it's the good side assuming you're for wikileaks
[00:17] <Timothy_Gu> llogan, you should really set up a "Trivia" wiki/website page recording all these stuff.
[00:23] <{V}> ubitux, nope doesn't look like they think it's positive http://wikileaks.org/RELEASE-Julian-Assange.html 
[00:24] <ubitux> {V}: i thought the "it" from Compn was about ffmpeg, not the movie
[00:25] <ubitux> yes ofc wikileaks didn't appreciate the movie at all
[00:29] <cone-780> ffmpeg.git Michael Niedermayer master:2981f6a79fac: avcodec/dct-test: correct output bias of prores idct
[00:29] <cone-780> ffmpeg.git Michael Niedermayer master:65801040c648: avcodec/dct-test: reproduce 4..1019 clipping when testing prores IDCT
[00:31] <michaelni> BBB, dct-test says the prores asm idct has max_err=1 for 10bit so it should still be working fine
[00:59] <BBB> dunno where the code is that drops 2 bits, can't easily find it, and it's 2yrs since I last looked at it
[01:00] <BBB> but anyway, the asm is not bitperfect even though libav's is
[01:04] <cone-780> ffmpeg.git Guillaume Martres master:c9fe0caf7a1a: hevc: clip pixels when transquant bypass is used
[01:04] <cone-780> ffmpeg.git Guillaume Martres master:b00a8b4d194f: hevc: remove useless clip in FUNC(sao_band_filter)()
[01:05] <michaelni> smarter, applied your proper fix, thanks alot!
[01:05] <smarter> you're welcome :)
[01:05] <JEEB> fabulous alots all around \o/
[01:06] <ubitux> BBB: you're refering to 6398c0f7e19fc3637930514cd81dba7cfbbacd4a?
[01:06] <ubitux> (and 370d7ef2c7a9d00ee885da7ff5dec1b879b33650)
[01:06] <michaelni> BBB bit-perfect to apple or the C code ?
[01:08] <ubitux> (ah, and b87d882578807e9a45848a528891bd82a5165712)
[01:14] <ubitux> BBB: btw, transpose should be done, WIP available in my branch, but horizontal doesn't work yet for some reason
[01:43] <BBB> ubitux: awesome!
[01:43] <BBB> michaelni: c code ofc
[01:45] <BBB> ubitux: and the transpose itself works? that is, you can test that by itself and it generates expected output and converges?
[03:15] <michaelni> BBB, see [FFmpeg-devel] [PATCH 5/5] avcodec/x86/proresdsp_init: x86 prores IDCT is bitexact again
[03:15] <BBB> \o/
[03:18] <BBB> well wait that changes the results and in some cases amounts to cheating
[03:18] <BBB> 2/5 is cheating, you're decreasing the accuracy of the coeffs
[03:18] <BBB> (not that I care, I mean, it's prores, I'm just saying)
[03:36] <michaelni> what point do 17bit coeffs make for 10bit data ?
[03:36] <michaelni> and why was 17 enough and not 64bit ?
[03:39] <michaelni> also the coeffs are good enough so that mpeg P frames dont accumulate vissible errors after 300 frames being added on top of each other
[03:39] <michaelni> so there should be quite a bit of headroom for one I 1 frame
[03:42] <michaelni> also PSNR didnt change for the coeff change, that is 2 digits after the decimal point, in the fate test
[03:43] <michaelni> if someone has suggestions for some other test that i could do to ensure that the change has no negative impact ?
[04:05] <Compn> did you check if theres any difference when converting it to bt709 or bt601 ?
[04:05] <Compn> thats all anyone cares about
[04:12] <michaelni> i didnt check but likely there will be a pixel that is +1 or -1 in the picture
[04:12] <michaelni> or actually maybe the fate tests checked that already
[04:20] <cone-780> ffmpeg.git Michael Niedermayer master:2ce4543286a8: avcodec/dct-test: add support for C prores IDCT
[08:50] <ubitux> BBB: i don't know if the transpose works well, but that's not the problem currently
[08:51] <ubitux> i have a weird crash even without the transpose
[08:51] <ubitux> in the c code, if i transpose, call vertical, transpose, it works fine and passes fate
[08:52] <ubitux> now if i do exactly the same but calling the ssse3 vertical instead
[08:52] <ubitux> i have a crash
[08:53] <ubitux> => 0x0000000000a589c5 <+2421>:	pandn  xmm7,XMMWORD PTR [r10+rsi*1]
[08:53] <ubitux> here i have 0x18016a8 + 384, which is only aligned on 4
[08:53] <ubitux> i'm guessing this is the problem
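The guess is consistent with the x86 rule that non-VEX (pre-AVX) SSE instructions taking a 128-bit memory operand, such as pandn xmm, [mem], fault unless the address is a multiple of 16. A minimal C sketch (helper names are hypothetical) checks the quoted address. 0x18016a8 + 384 leaves a remainder of 8 modulo 16, so it fails the 16-byte requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 1 if addr can be used as a legacy-SSE 128-bit
 * memory operand (or with mova/movdqa), i.e. addr is a multiple of 16. */
static int is_aligned16(uintptr_t addr)
{
    return (addr & 15) == 0;
}

/* Hypothetical helper: how far addr is past the previous 16-byte boundary. */
static unsigned misalignment16(uintptr_t addr)
{
    return (unsigned)(addr & 15);
}
```

With the values from the gdb output, `misalignment16(0x18016a8 + 384)` is 8. An explicitly unaligned load (movu/movdqu) is the only legacy-SSE way to read such an address.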
[08:54] <ubitux> but if so, that's troublesome and i don't really know how to solve that…
[08:57] <ubitux> it was working fine when i wasn't transposing in-place but on a 16*16 temporary stack buffer
[08:58] <ubitux> i wonder if i should reproduce that
[09:26] <ubitux> it seems using an intermediate movu solves the problem
[09:30] <ubitux> or maybe not.
[09:39] <rcombs> ubitux: got a spare XMM register? You could use an unaligned load
[09:39] <ubitux> that's what i was trying to do, but it seems i haven't :(
[09:40] <rcombs> yeah, if you're pre-AVX and you're using a memory pointer as an operand for… pretty much anything but an explicitly unaligned load/store, that has to be aligned
[09:40] <ubitux> what's strange is that i didn't have the issue with the vertical filter
[09:41] <ubitux> as if vertical filter was always aligned
[09:41] <rcombs> what file is this?
[09:41] <ubitux> (which means i could eventually replace all the movu with mova for vertical)
[09:41] <ubitux> well, any fate file
[09:41] <ubitux> + my misc samples
[09:42] <rcombs> there's no significant difference between movu and mova in performance on modern CPUs; you'll just get better performance when you're aligned
[09:44] <ubitux> ok
[09:45] <cone-885> ffmpeg.git Stefano Sabatini master:d497141b8594: examples/muxing: simplify video PTS setting
[09:52] <ubitux> lol, got it working
[09:52] <ubitux> and my transpose was working out of the box..
[09:55] <nevcairiel> thats always suspicious
[09:57] <ubitux> nevcairiel: are you questioning my awesomen^Wluck?
[09:57] <nevcairiel> :D
[09:57] <nevcairiel> i would never!
[09:59] <ubitux> the thing is that i tested the transpose in a PoC, i only had to do the memory store
[09:59] <ubitux> and luckily it was working out of the box
[10:01] <ubitux> https://github.com/ubitux/FFmpeg/commit/48afad80c2b167d9d761c426a24673c2e349fe64#diff-651121e4f3585d617f2de38cdfbcc3adR65
[10:01] <ubitux> most ugly thing in the world
[10:02] <ubitux> i have no available registers, so i'm doing 3 movu instead
[10:06] <saste> what about removing ft_load_flags from drawtext?
[10:06] <saste> ubitux, ^ ?
[10:07] <ubitux> why?
[10:07] <ubitux> 5340 decicycles in ff_vp9_loop_filter_h_16_16_ssse3, 2096476 runs, 676 skips
[10:07] <ubitux> 3964 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 2096762 runs, 390 skips
[10:07] <ubitux> i need to improve that…
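Assuming these figures come from FFmpeg's START_TIMER/STOP_TIMER instrumentation, which prints a running average in tenths of a CPU cycle ("decicycles"), the conversion and the h/v cost ratio work out as below. The helper names are hypothetical. By this reading, the h filter costs about 534 cycles against about 396 for v, roughly 1.35x.

```c
#include <assert.h>

/* Assumption: a "decicycles" value is the average cost in tenths of a cycle. */
static double decicycles_to_cycles(unsigned long decicycles)
{
    return decicycles / 10.0;
}

/* Relative cost of function a versus function b from their timer output. */
static double cost_ratio(unsigned long a_decicycles, unsigned long b_decicycles)
{
    return (double)a_decicycles / (double)b_decicycles;
}
```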
[10:08] <saste> ubitux, because it doesn't make much sense to fiddle with FT flags
[10:08] <nevcairiel> h is transpose+v?
[10:08] <ubitux> nevcairiel: yes
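The "h is transpose + v" idea can be sketched in plain C: transpose the block into a 16-byte-aligned scratch buffer, run the vertical filter on it, then transpose back. The filter here is a hypothetical 1-2-1 smoothing filter, not the VP9 loop filter, and all names are made up. The sketch only demonstrates that filtering the columns of the transposed block gives the same result as filtering the rows of the original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK 16

/* Hypothetical stand-in for a vertical edge filter: 1-2-1 smoothing down
 * every column of a BLK x BLK block (first and last rows left untouched). */
static void filter_v(uint8_t *dst, ptrdiff_t stride)
{
    for (int x = 0; x < BLK; x++) {
        int prev = dst[x];                       /* unfiltered row y-1 */
        for (int y = 1; y < BLK - 1; y++) {
            int cur = dst[y * stride + x];
            dst[y * stride + x] = (uint8_t)((prev + 2 * cur + dst[(y + 1) * stride + x] + 2) >> 2);
            prev = cur;
        }
    }
}

/* The same 1-D filter applied along every row, as a reference. */
static void filter_h_direct(uint8_t *dst, ptrdiff_t stride)
{
    for (int y = 0; y < BLK; y++) {
        uint8_t *row = dst + y * stride;
        int prev = row[0];
        for (int x = 1; x < BLK - 1; x++) {
            int cur = row[x];
            row[x] = (uint8_t)((prev + 2 * cur + row[x + 1] + 2) >> 2);
            prev = cur;
        }
    }
}

/* out[x][y] = in[y][x] for a BLK x BLK block. */
static void transpose_blk(uint8_t *out, ptrdiff_t out_stride,
                          const uint8_t *in, ptrdiff_t in_stride)
{
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            out[x * out_stride + y] = in[y * in_stride + x];
}

/* Horizontal filter built from the vertical one via an aligned scratch block. */
static void filter_h_via_transpose(uint8_t *dst, ptrdiff_t stride)
{
    _Alignas(16) uint8_t tmp[BLK * BLK];
    transpose_blk(tmp, BLK, dst, stride);
    filter_v(tmp, BLK);
    transpose_blk(dst, stride, tmp, BLK);
}

/* Returns 1 if both horizontal variants produce identical output on a test pattern. */
static int h_variants_agree(void)
{
    enum { STRIDE = 32 };
    uint8_t a[BLK * STRIDE], b[BLK * STRIDE];
    for (int i = 0; i < BLK * STRIDE; i++)
        a[i] = b[i] = (uint8_t)(i * 37 + (i >> 3));
    filter_h_direct(a, STRIDE);
    filter_h_via_transpose(b, STRIDE);
    for (int i = 0; i < BLK * STRIDE; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

In SIMD code the transposes are where the extra cost goes, which is why the h variant ends up slower than v even though the filter arithmetic is shared.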
[10:09] <ubitux> saste: dunno, couldn't they be used to have a bitexact output so we could add the filter to fate?
[10:09] <saste> ubitux, oh i just realized your "why" was not related to my question
[10:09] <ubitux> saste: it was :)
[10:10] <saste> ubitux, the flags are probably too much low level to be exposed to the user
[10:10] <ubitux> even the hinting?
[10:10] <saste> there is something potentially useful, for example monochrome, but they can be set through a dedicated option/flag
[10:11] <saste> about bitexact, that really depends on FT version
[10:11] <ubitux> well maybe there is a bitexact flag in those
[10:11] <ubitux> (pedantic?)
[10:11] <ubitux> anyway, do they cause trouble?
[10:11] <saste> ubitux, see ramiro's patch
[10:12] <saste> it is messing with render, and conflicting with the way we let the user ser +render
[10:12] <saste> *set
[10:12] <ubitux> mmh
[10:12] <ubitux> dunno, no opinion then
[10:20] <ubitux> vertical definitely works assumed an aligned block all the time
[11:28] <ubitux> horizontal is only 3x faster :(
[11:38] <ubitux> 8.20s to 6.28s on overall decode time though
[11:38] <ubitux> (46 to 60 fps)
[11:38] <nevcairiel> just from one function? thats neat :)
[11:40] <ubitux> no, the two
[11:40] <ubitux> :p
[11:40] <ubitux> https://github.com/ubitux/FFmpeg/commit/9dcbf146dfed4c9ad00ea17beb50231b4c62a807
[13:42] <BBB> ubitux: the destination memory is likely not aligned?
[13:42] <BBB> ubitux: use LOCAL_ALIGNED if it's on the stack, or DECLARE_ALIGNED if it's in a struct that is allocated with av_malloc()
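LOCAL_ALIGNED and DECLARE_ALIGNED are FFmpeg's own macros and are not reproduced here. The sketch below shows the same idea with standard C11 instead: `_Alignas(16)` on a stack array or struct member, and `aligned_alloc` standing in for the aligned heap memory that av_malloc() provides. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context struct: the member's 16-byte alignment only holds
 * if the struct itself comes from a suitably aligned allocation. */
typedef struct FilterContext {
    _Alignas(16) uint8_t scratch[16 * 16];
} FilterContext;

static FilterContext *filter_context_alloc(void)
{
    /* C11 aligned_alloc: size must be a multiple of the alignment (256 % 16 == 0). */
    FilterContext *ctx = aligned_alloc(16, sizeof(FilterContext));
    if (ctx)
        memset(ctx, 0, sizeof(*ctx));
    return ctx;
}

/* Stack buffer with explicit alignment, similar in spirit to LOCAL_ALIGNED. */
static int stack_buffer_is_aligned(void)
{
    _Alignas(16) uint8_t tmp[16 * 16];
    return ((uintptr_t)tmp & 15) == 0;
}
```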
[13:47] <BBB> michaelni: I care more from the theoretical point of view, you're trying to save 2 bits of coefficient precision for the sole sake of making the idct like what, 4 instructions faster per 1d or so? I don't remember the exact amount but it was something trivial like that. I don't think that's necessary anymore. But do what you wish, it's your idct, I'm not going to pretend that I care about prores, but it does yet again confirm apple's point, we 
[13:47] <BBB> local tests with 1 person visible checking that it "looks good to him" and then commit as the de-facto implementation to millions of users, and we don't care the least that the results changed - "oh, it still looks ok'ish", that's not how video comparisons should be done
[14:05] <michaelni> BBB the C IDCT has ATM a MSE of 0.04873672, the SSE2 one of 0.02940234
[14:05] <michaelni> so my patchset actually improves accuracy
[14:06] <michaelni> that is as a whole, i didnt yet test them individually 
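The MSE figures quoted here compare IDCT output against a reference. The C function below is only a generic definition of that metric: the mean of squared per-sample differences. How dct-test generates its input blocks and its double-precision reference is not reproduced, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean squared error between tested IDCT output and reference output,
 * over `count` samples (e.g. many 8x8 blocks laid out back to back). */
static double mse(const int16_t *test, const int16_t *ref, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        double d = (double)test[i] - (double)ref[i];
        sum += d * d;
    }
    return count ? sum / (double)count : 0.0;
}
```

Under this definition, an MSE around 0.03 to 0.05 means most output samples match the reference exactly and only a few percent are off by one.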
[14:07] <michaelni> also using 17bit coeffs when every SIMD implementation will need to work with 15bit+1sign bit is quite inconvenient
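The inconvenience comes from instructions like SSE2 pmulhw, which multiply signed 16-bit lanes and keep the high 16 bits of the product, so each coefficient must fit in an int16_t. The C sketch below reads "17bit" as 16 magnitude bits plus sign. It uses cos(pi/16) as a generic DCT constant, not ProRes's actual coefficient table, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round c * 2^frac_bits to the nearest integer (no libm needed) and
 * report whether it fits a signed 16-bit SIMD lane. */
static int coef_fits_int16(double c, int frac_bits)
{
    double scaled = c * (double)(1L << frac_bits);
    long q = (long)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
    return q >= INT16_MIN && q <= INT16_MAX;
}

/* Scalar model of one pmulhw lane: signed 16x16 multiply, keep bits 16..31.
 * (Right-shifting a negative int is implementation-defined in C; mainstream
 * compilers shift arithmetically, matching the instruction.) */
static int16_t pmulhw_lane(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) >> 16);
}
```

With 15 fractional bits, cos(pi/16) rounds to 32138 and fits. With 16 fractional bits it becomes 64277, which would need a 17-bit signed lane.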
[14:08] <ubitux> BBB: are you commenting on the final patch?
[14:12] <BBB> ubitux: ?
[14:12] <BBB> oh new patch yay
[14:12] <BBB> no, your comment yesterday (or this morning)
[14:12] <ubitux> it's done and work now
[14:13] <ubitux> but see the note
[14:13] <BBB> movu...
[14:13] <BBB> that sucks
[14:13] <BBB> use mova
[14:13] <BBB> and make the memory aligned ;)
[14:14] <ubitux> for vertical it's mova
[14:14] <BBB> oh I think I see why
[14:14] <BBB> yeah that's actually expected
[14:14] <BBB> it's because vertical loads from topleft block position (which is aligned) - x*stride (which is aligned)
[14:14] <BBB> horizontal loads from topleft position - x
[14:15] <BBB> topleft position is aligned, but x isn't
[14:15] <BBB> and can't be
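BBB's explanation can be checked with some address arithmetic. Assume, as he says, that the block's top-left pointer is 16-byte aligned, and further assume that the stride is a multiple of 16. Then every vertical-filter tap lands on a 16-byte boundary, while a horizontal tap k pixels to the left is off by k. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the tap k pixels "before" the edge: the v filter steps across
 * rows (whole strides), the h filter steps across columns (single bytes). */
static ptrdiff_t tap_offset(int vertical, int k, ptrdiff_t stride)
{
    return vertical ? -(ptrdiff_t)k * stride : -(ptrdiff_t)k;
}

/* A 16-byte load at base + off can use mova only if the address is a
 * multiple of 16 (unsigned wraparound keeps the modulo-16 result correct). */
static int tap_is_aligned16(uintptr_t base, ptrdiff_t off)
{
    return ((base + (uintptr_t)off) & 15) == 0;
}
```

For example, with base 0x10000 and stride 64, the v tap three rows up stays aligned, but the h tap 8 pixels to the left sits 8 bytes past a boundary.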
[14:15] <BBB> so I guess that's ok
[14:15] <BBB> nevermind my comments the
[14:15] <BBB> you use 3 stores after the transpose right, not all 16
[14:15] <ubitux> huh?
[14:16] <BBB> awh
[14:16] <BBB> your transpose stores
[14:17] <BBB> so the ideal situation is that the transpose doesn't just work on memory -> memory
[14:17] <BBB> but the pre-lf transpose goes from memory -> mostly registers
[14:17] <michaelni> BBB, IDCT_1D has 112 instructions less if i didnt miscount. that would be 14 per 8x1 dct
[14:17] <ubitux> BBB: yes that's ugly, but actually there is not much i can do about that
[14:17] <BBB> and the reverse for the post-lf transpose (mostly registers -> memory)
[14:17] <michaelni> but possible some of this difference was from other changes, i didnt double check
[14:17] <ubitux> BBB: i re-use the pre-loaded reg at first
[14:17] <ubitux> BBB: but i need to store them anyway for later re-read
[14:18] <BBB> all of them?
[14:18] <BBB> hm...
[14:18] <ubitux> i tried to remove some store but it didn't work as expected
[14:18] <ubitux> there is probably a way to remove some, but it might be tricky
[14:18] <ubitux> unless i'm missing something obvious
[14:19] <BBB> nah it's not obvious, it's just grinding :-p
[14:19] <BBB> maybe we should add a fixme comment to investigate that later?
[14:19] <ubitux> sure, ok
[14:19] <BBB> I don't mind committing as-is, just would like to document things we think we can improve later on
[14:20] <BBB> you changed the filter14 at the end to use movu unconditionally
[14:20] <ubitux> ?
[14:20] <ubitux> +%ifidn %1, v
[14:20] <ubitux> +%define movu mova
[14:20] <BBB> euwh :-p
[14:21] <BBB> make that movx please, not good to overwrite default macros
[14:21] <BBB> just so it's clear it's sometimes movu sometimes mova :)
[14:21] <BBB> or something not mova and movu and movh and movd and movq etc.
[14:21] <ubitux> so s/movu/movx/ and a define for v and h?
[14:22] <BBB> yes
[14:22] <ubitux> ok
[14:22] <BBB> rest looks ok
[14:22] <ubitux> i'm definitely not satisfied with horizontal
[14:22] <BBB> well, it's a good first try, I mean, it's really a lot faster
[14:22] <ubitux> because of MASK_APPLY_FU, and the memory stores of the transpose
[14:22] <ubitux> but yeah, it's like 3x faster so...
[14:22] <BBB> yeah I saw the mask_apply_fu define
[14:23] <BBB> one thing you could do is allocate more stack
[14:23] <BBB> for the h one
[14:23] <BBB> stack is guaranteed to be aligned, and then use that for temp storage
[14:23] <BBB> then you can always use mova, that should help quite a bit
[14:23] <ubitux> mmh
[14:23] <BBB> (the initial load and final store are still movu, but the rest is mova then)
[14:33] <michaelni> BBB the coeff change patch changes MSE from 0.05012422 to 0.04890000
[14:33] <BBB> you know the prores fdct?
[14:33] <BBB> whoa
[14:36] <michaelni> not really but these tests use the double precission reference fdct
[14:36] <michaelni> iam sure the fdct used in the encoder could be iproved
[14:49] <kierank> i would assume the authors of the decoder have a hook into the fdct
[14:49] <BBB> sure, but the idct in proresdec should optimize for the fdct in proresenc, not some reference floating point fdct we used for academic purposes, that's not used
[14:49] <BBB> e.g. the vp9 idct and fdct were definitely tuned towards each other
[14:58] <michaelni> BBB, for vp9 you where in control of both fdct and idct, for prores apple is in control more or less well actually less. so there are already multiple fdcts around that are used
[14:58] <BBB> ?
[14:59] <BBB> I don't think apple cares the least about what ffmpeg outputs
[14:59] <BBB> you should assume prores is a corporate product with - for their purposes - one fdct and one idct
[14:59] <kierank> apple is in control yes, so ffmpeg must match apple
[14:59] <BBB> whatever ffmpeg does is, in their words, introducing artifacts
[15:00] <michaelni> yes, we should match apples fdct with our idct, sorry i misunderstood you
[15:00] <BBB> brb
[15:01] <michaelni> i dont have test samples from this fdct though so the reference one is the best that i can easily test against
[15:01] <kierank> michaelni: you should ask the authors to  tell you where in the dll the apple fdct is
[15:20] <pross-au> given dll & debugger, they're usually easy to spot.
[15:53] <cone-846> ffmpeg.git Kostya Shishkov release/1.1:5dcc17992430: vc1: Reset numref if fieldmode is not set
[15:53] <cone-846> ffmpeg.git Aurelien Jacobs release/1.1:3e089e8f7158: matroskadec: use correct compression parameters for current track CodecPrivate
[15:53] <cone-846> ffmpeg.git Martin Storsjö release/1.1:12479588d789: sdp: Check that fmt->oformat is non-null before accessing it
[15:53] <cone-846> ffmpeg.git Anton Khirnov release/1.1:343c87ac19c8: rv30: fix extradata size check.
[15:53] <cone-846> ffmpeg.git Anton Khirnov release/1.1:f194f2be418a: eacmv: check the framerate before setting it.
[15:53] <cone-846> ffmpeg.git Anton Khirnov release/1.1:5e7a5dd70b51: gifdec: return meaningful error codes.
[15:53] <cone-846> ffmpeg.git Anton Khirnov release/1.1:c5c7e3e6f7cf: gifdec: check that the image dimensions are non-zero
[15:53] <cone-846> ffmpeg.git Michael Niedermayer release/1.1:ce795ac0f53e: Merge commit 'c5c7e3e6f7cf17943c04bd078f260eaf789afbc9' into release/1.1
[16:13] <cone-846> ffmpeg.git Anton Khirnov release/1.1:819541ff833d: gifdec: convert to bytestream2
[16:13] <cone-846> ffmpeg.git Anton Khirnov release/1.1:ffa83bcc4937: lzw: switch to bytestream2
[16:13] <cone-846> ffmpeg.git Anton Khirnov release/1.1:a8f6d93071a8: pmpdec: check that there is at least one audio packet.
[16:13] <cone-846> ffmpeg.git Justin Ruggles release/1.1:24a8dfd37b45: lavr: check that current_buffer is not NULL before using it
[16:13] <cone-846> ffmpeg.git Luca Barbato release/1.1:0e8ae6d10c60: mpegvideo: Drop a faulty assert
[16:13] <cone-846> ffmpeg.git Michael Niedermayer release/1.1:7e3437989769: Merge commit '0e8ae6d10c609bb968c141aa2436413a55852590' into release/1.1
[16:28] <cone-846> ffmpeg.git Luca Barbato release/1.1:d6d2617d07fc: avio: Use AVERROR_PROTOCOL_NOT_FOUND
[16:28] <cone-846> ffmpeg.git Michael Niedermayer release/1.1:e776a1e8f37d: ac3dec: fix outptr increment.
[16:28] <cone-846> ffmpeg.git Michael Niedermayer release/1.1:cdc47c48137f: omadec: check GEOB sizes against buffer size
[16:28] <cone-846> ffmpeg.git Michael Niedermayer release/1.1:35f9a0896ee6: omadec: Fix wrong number of array elements
[16:28] <cone-846> ffmpeg.git Anton Khirnov release/1.1:51ff11647f8d: pcx: round up in bits->bytes conversion in a buffer size check
[16:28] <cone-846> ffmpeg.git Anton Khirnov release/1.1:7b337b122959: truemotion1: make sure index does not go out of bounds
[16:28] <cone-846> ffmpeg.git Michael Niedermayer release/1.1:c693ccb89a9e: Merge commit '7b337b122959b9bf634c31b549892df974f35b40' into release/1.1
[16:44] <cone-846> ffmpeg.git Anton Khirnov release/1.1:7c214e313c92: avidec: fix a memleak in the dv init code.
[16:44] <cone-846> ffmpeg.git Anton Khirnov release/1.1:26221a54eca3: motionpixels: clip VLC codes.
[16:44] <cone-846> ffmpeg.git Serhii Marchuk master:c917cde9cc52: mpegts muxer, DVB subtitles encoder: common DVB subtitles payload
[16:44] <cone-846> ffmpeg.git Anton Khirnov release/1.1:cbf51c4d36af: matroskadec: pad EBML_BIN data.
[16:44] <cone-846> ffmpeg.git Anton Khirnov release/1.1:f9f2591beb11: alsa-audio-dec: explicitly cast the delay to a signed int64
[16:44] <cone-846> ffmpeg.git Luca Barbato release/1.1:265603675722: ffv1: Assume bitdepth 0 means 8bit
[16:44] <cone-846> ffmpeg.git Luca Barbato release/1.1:0358a099f8ab: indeo4: Check the block size if reusing the band configuration
[16:44] <cone-846> ffmpeg.git Michael Niedermayer release/1.1:1203e92181b4: Merge commit '0358a099f8abe60230dc2e5bec59bfceb7d1be07' into release/1.1
[16:57] <cone-846> ffmpeg.git Luca Barbato release/1.1:03457cabd618: indeo4: Check the inherited quant_mat
[16:57] <cone-846> ffmpeg.git Anton Khirnov release/1.1:481e55eba7a7: audio_mix: fix channel order in mix_1_to_2_fltp_flt_c
[16:57] <cone-846> ffmpeg.git Luca Barbato release/1.1:1d7a453dcfe4: prores: Reject negative run and level values
[16:57] <cone-846> ffmpeg.git Luca Barbato release/1.1:e361fde8b011: avi: properly fail if the dv demuxer is missing
[16:57] <cone-846> ffmpeg.git Reinhard Tartler release/1.1:f53a5332b017: Prepare for 9.11 RELEASE
[16:57] <cone-846> ffmpeg.git Luca Barbato release/1.1:5bbee02ae04f: shorten: Extend fixed_coeffs to properly support pred_order 0
[16:57] <cone-846> ffmpeg.git Martin Storsjö release/1.1:d149c14a2263: mov: Don't allocate arrays with av_malloc that will be realloced
[16:57] <cone-846> ffmpeg.git Luca Barbato release/1.1:61057f4604eb: avi: directly resync on DV in AVI read failure
[16:57] <cone-846> ffmpeg.git Derek Buitenhuis release/1.1:5ae7ed3aa4f3: nut: Fix unchecked allocations
[16:57] <cone-846> ffmpeg.git Luca Barbato release/1.1:65830277d2d2: prores: Add a codepath for decoding errors
[16:57] <cone-846> ffmpeg.git Michael Niedermayer release/1.1:f479c178943f: Merge commit '65830277d2d2ee3658e1f070a61044fff261ed3e' into release/1.1
[17:07] <cone-846> ffmpeg.git Tim Walker release/1.1:a0866c71293d: shorten: Fix out-of-array read
[17:07] <cone-846> ffmpeg.git Michael Niedermayer release/1.1:9b89824f2022: Merge remote-tracking branch 'qatar/release/9' into release/1.1
[18:06] <ubitux> BBB: i'm going to try the stack thing
[18:07] <ubitux> but reading x86inc.asm, it seems the stack align is somehow tricky, and i don't really understand 
[18:07] <nevcairiel> shouldnt it do this for you
[18:08] <ubitux> ;      and an extra register will be allocated to hold the original stack
[18:08] <ubitux> ;      pointer (to not invalidate r0m etc.). To prevent the use of an extra
[18:08] <ubitux> ;      register as stack pointer, request a negative stack size.
[18:08] <ubitux> i'm a bit uncomfortable with this :)
[18:08] <nevcairiel> oh yeah that always confused me as well
[18:08] <nevcairiel> i dont suppose you still have a spare register eh
[18:09] <ubitux> :(
[18:09] <ubitux> i'm already using 8 reg
[18:09] <nevcairiel> well then you are over 32-bit anyway, so one more doesnt hurt
[18:09] <nevcairiel> :D
[18:09] <ubitux> :)
[18:10] <JEEB> aye
[18:15] <cone-846> ffmpeg.git Yu Xiaolei master:842b8f4ba2e7: fix build with gas-preprocessor.pl
[18:15] <cone-846> ffmpeg.git Michael Niedermayer master:6044f161d3bc: Revert "swscale: disable ARM code until its build failure with clang/iphone is fixed"
[18:24] <BBB> ubitux: it does it for you
[18:25] <BBB> ubitux: x86inc.asm indeed does creepy stuff, that's why we don't do it ourselves but let it handle it for us
[18:25] <ubitux> so i don't have to care at all about it?
[18:25] <BBB> ubitux: cglobal name, args, gprs, xmms, memory, name1, name2, etc
[18:25] <BBB> exactly
[18:25] <BBB> that's the whole point of x86inc.asm - don't care about complicated stuff that is annoying
[18:25] <BBB> just the easy fun stuff
[18:25] <ubitux> so i just add a 256 field and play with rsp?
[18:26] <BBB> don't write into rsp
[18:26] <BBB> :-p
[18:26] <BBB> but yes
[18:26] <BBB> if you add a 256 field, [rsp+0] until [rsp+256-16] is all yours to mova into
[18:26] <BBB> also do you remember http://blogs.gnome.org/rbultje/2014/01/12/brute-force-thread-debugging/? I finally got to write it down
[18:27] <ubitux> aha!
[18:27] <BBB> I think that's now 1.5 years ago
[18:27] <BBB> I was lazy I guess
[18:27] <ubitux> thx :)
[18:27] <ubitux> i plan to write a few things about that asm initiation
[18:28] <BBB> cool
[18:28] <BBB> x86inc.asm is awesome, maybe we should tell more people about it, yes
[18:30] <kierank> BBB: quite interesting post
[18:33] <jnvsor> Is the overlay filter supposed to drop frames from the higher fps input to match the lower one?
[18:34] <nevcairiel> i would think the master image would dictate that, and the image to-be-overlayed would be dropped or repeated if required
[18:34] <nevcairiel> but i have no clue how its actually implemented
[18:36] <jnvsor> Well my overlay input has a variable framerate (webcam through v4l2) and when the webcam stutters so does the master input, filing a bug now
[18:37] <BBB> kierank: thanks, that's why I posted it here (I don't think there's a planet ffmpeg, is there?)
[18:38] <kierank> I would submit it to reddit/hacker news but I don't think it's a good idea
[18:39] <kierank> you'll be told that the decoder  should be in Haskell
[18:40] <ubitux> :D
[18:40] <ubitux> "that would have never happened with erlang!"
[18:40] <BBB> feel free to submit, I don't know how that works :-p
[18:41] <kierank> done
[18:42] <ubitux> plussed
[18:43] <kierank> BBB: dunno if it's worth mentioning the ratio of horizontal mvs to vertical mvs and how that helps (I  think ffmpeg's frame threading is the same as x264's)
[18:44] <BBB> nah, frame-mt is more a "how this came about", the post is more about debugging it, rather than why frame-mt helps at all
[18:44] <BBB> but feel free to add a comment saying that, it's indeed helpful to know that kind of stuff as a basis for why frame-mt helps at all
[18:45] <nevcairiel> i always figured frame-mt works because mvs mostly just reference one or two rows of MBs, so when those are done decoding the next frame can start decoding its top rows, and so forth
[18:46] <kierank> horizontal movement dominates in a lot of motion
[18:46] <kierank> 5:1 at least
[18:46] <nevcairiel> guess that too
[18:46] <BBB> right, vertical motion tends to be small
[18:47] <BBB> so although there is some lag between progress of frame decoding in each thread, it's close enough that it works for large resolution content
[18:47] <BBB> and that's when you need it most anyway
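The lag argument can be made concrete. Below is a C sketch (hypothetical helpers, not FFmpeg's actual frame-threading code) of how many reference-frame block rows must be finished before block row r of the next frame can be decoded. It takes the largest downward vertical motion vector in whole pixels, plus the extra rows a sub-pixel interpolation filter reads below the block, which is an assumed parameter. When vertical motion is small, the result stays close to r + 1, so the threads can run nearly in step.

```c
#include <assert.h>

/* Last reference pixel row that block row `block_row` may read, given the
 * largest downward MV (non-negative, whole pixels) and the interpolation
 * filter's extra rows below the block. Not clamped to the frame height. */
static int last_ref_pixel_row_needed(int block_row, int row_height,
                                     int max_mv_down, int filter_rows_below)
{
    return (block_row + 1) * row_height - 1 + max_mv_down + filter_rows_below;
}

/* Number of reference block rows that must be complete before decoding. */
static int ref_rows_to_wait_for(int block_row, int row_height,
                                int max_mv_down, int filter_rows_below)
{
    int last = last_ref_pixel_row_needed(block_row, row_height,
                                         max_mv_down, filter_rows_below);
    return last / row_height + 1;
}
```

With 64-pixel rows and no vertical motion, row 0 waits for one reference row. An 8-pixel downward MV plus 4 filter rows pushes that to two. A 200-pixel MV on row 3 needs eight.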
[18:47] Action: kierank sends that article to someone who had slice thread heisenbugs with 4k
[18:48] <BBB> lol
[18:48] <BBB> frame-mt is happy now, right?
[18:48] <ubitux> yes but valgrind isn't
[18:48] <kierank> said person is having problems making a test case
[18:48] <BBB> :-p
[18:49] <BBB> valgrind valgrind
[18:49] <BBB> yeah I'll look into that
[18:49] <ubitux> ;)
[18:49] <ubitux> BBB: btw, should i push the 3 small itxfm patches?
[18:50] <BBB> I only saw one?
[18:51] <ubitux> really?
[18:51] <ubitux> https://github.com/ubitux/FFmpeg/compare/vp9-itxfm
[18:51] <BBB> the one where you merge the %2 and %3 in mulsub_2w_2x
[18:51] <BBB> oh!
[18:51] <BBB> I get it know
[18:51] <BBB> it shows me all 3 diffs merged
[18:51] <BBB> but it's actually 3 patches
[18:52] <BBB> I didn't see that :-p
[18:52] <BBB> yes they're all fine
[18:52] <ubitux> ok, will apply tonight
[18:54] <BBB> github is funny
[19:03] <ubitux> the stack thing is really going to simplify a lot the horizontal
[19:04] <jnvsor> If the overlay filter got a v4l2 stream that wasn't sending frames regularly, would it block waiting for the next frame?
[19:16] <BBB> ubitux: cool
[19:16] <ubitux> since i can hardcode the position
[19:16] <ubitux> without jungling with dst*q pointers
[19:16] <ubitux> it basically removes all the lea
[19:18] <BBB> only for the h
[19:18] <BBB> the v still needs the leas
[19:18] <BBB> and the h does need the leas for initial load and final store
[19:19] <BBB> (pre-lf transpose load and post-lf transpose store)
[19:19] <ubitux> yes only for h
[19:19] <ubitux> but still, it's cool to see simplification in the slowest one
[19:32] <BBB> yes agreed
[19:35] <ubitux> yay working out of the box
[19:35] <ubitux> let's bench this shit
[19:39] <ubitux> 5259 → 4936
[19:39] <ubitux> i love you BBB ;)
[19:40] <ubitux> so exactly 1k more than vertical
[19:40] <nevcairiel> so, you are happy now? :)
[19:41] <ubitux> yeah i guess
[19:41] <nevcairiel> did the FU go away?
[19:41] <ubitux> yup
[19:43] <ubitux> and the file is now 666 lines
[19:43] <ubitux> so i'm really happy
[19:43] <nevcairiel> hehe
[19:55] <ubitux> BBB: https://github.com/ubitux/FFmpeg/commit/03b6166e7ad0413e44e4aaf246f10eab199743e7
[19:55] <ubitux> comments?
[19:58] <ubitux> nevcairiel: i should have closed firefox from the beginning, i got free cycles doing so
[19:58] <nevcairiel> lol
[19:59] <nevcairiel> firefox the cycle eater
[20:11] <cone-846> ffmpeg.git Luca Barbato master:a1f5164814ac: vc1dsp: K&R formatting cosmetics
[20:11] <cone-846> ffmpeg.git Michael Niedermayer master:6bd001d76657: swscale: disable ARM code until its build failure with clang/iphone is fixed
[20:12] <cone-846> ffmpeg.git Michael Niedermayer master:4daf8bc31b91: Merge remote-tracking branch 'qatar/master'
[22:33] <BBB> ubitux: results look good, I don't think I had any further comments
[22:34] <BBB> so ok to commit
[22:34] <ubitux> cool :)
[22:34] <ubitux> will push in a few minutes with the itxfm ones
[22:34] <BBB> \o/
[22:34] <BBB> nice work
[22:36] <BBB> poke me when it's pushed, we should be beating the crap out of libvpx at this point
[22:41] <cone-846> ffmpeg.git Clément Bœsch master:7c55ee6168ea: vp9/x86: merge IDCT coef macros.
[22:41] <cone-846> ffmpeg.git Clément Bœsch master:c9aa0b8f70b1: vp9/x86: remove reg redundancy in VP9_MULSUB_2W_2X.
[22:41] <cone-846> ffmpeg.git Clément Bœsch master:e11ceea68ff3: vp9/x86: factor out some code in VP9_UNPACK_MULSUB_2W_4X.
[22:41] <cone-846> ffmpeg.git Clément Bœsch master:af68bd1c06ff: vp9/x86: add ff_vp9_loop_filter_[vh]_16_16_ssse3().
[22:41] <ubitux> BBB: here you go
[22:41] <BBB> yay
[22:45] <BBB> rebase, rebuild, reblah
[22:46] <ubitux> BBB: anyway,
[22:47] <ubitux> BBB: thanks a lot for your teachings and mentoring :)
[22:47] <ubitux> 'learned a lot with that stuff
[22:48] <BBB> don't forget the 8/4 loopfilters! :-p
[22:48] <ubitux> heh
[22:49] <ubitux> i'm not leaving :D
[22:49] <ubitux> yeah i'll do those
[22:49] <BBB> \o/
[22:49] <BBB> can you measure libvpx vs ffvp9 on your computer?
[22:49] <BBB> I can upload a few more files for that purpose if you want, I generated a few a while ago
[22:50] <ubitux> mmh i'm going to leave soon, but feel free to share the tests you want me to run
[22:52] <BBB> ok
[22:52] <ubitux> don't forget to remove the assert-level option for the bench
[22:52] <ubitux> :D
[23:21] <jnvsor> I notice cone monitors different repos for every developer, how do you guys coordinate?
[23:22] <ubitux> it doesn't
[23:22] <ubitux> unless i misunderstood you
[23:22] <jnvsor> Well your commits at 22:40 are on your github repo but not the main ffmpeg repos at github or ffmpeg.org
[23:23] <jnvsor> Ah wait I've got them from ffmpeg.org now
[23:24] <jnvsor> My bad
[00:00] --- Mon Jan 13 2014

