burek021 at gmail.com
Thu Jan 5 03:05:04 EET 2017
[01:52:08 CET] <atomnuker> bofh_: looked around, all ways of removing the memcpy would be slower
[01:52:50 CET] <atomnuker> I'll just precompute the FFT pre/post reindexing in init, apply the pre-indexing in the FFT and the post indexing in the post of the (i)MDCT
[01:54:00 CET] <atomnuker> (we have to copy and apply the twiddles in post for both inverse and forward transforms, so it'll remove the memcpy if I reorder there
[01:55:28 CET] <atomnuker> the fact that it won't be a complete FFT by itself annoys me slightly but oh well, let's hope nothing's going to use a non-mdct transform in the future
[01:56:24 CET] <atomnuker> (or if they do it'll have to have some kind of a post-fft operation to hide the reindexing into)
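atomnuker's plan is to drop the memcpy by applying the output reindexing inside the post-processing loop, which has to multiply by the twiddles anyway. Below is a minimal sketch of that fusion, not the mdct15 code from the patch. The permutation is an assumption for illustration: it is the 15-point prime-factor output map `(10*k1 + 6*k2) % 15`. `init_post_lut`, `post_two_pass` and `post_fused` are hypothetical names.

```c
#include <complex.h>

#define LEN 15

/* Built once at init: element i of the FFT's natural output order belongs at lut[i].
 * Here i = 5*k1 + k2 and the target is the 15-point prime-factor (CRT) output map. */
static void init_post_lut(int *lut)
{
    for (int k1 = 0; k1 < 3; k1++)
        for (int k2 = 0; k2 < 5; k2++)
            lut[5 * k1 + k2] = (10 * k1 + 6 * k2) % LEN;
}

/* Two passes: reorder into a temporary buffer, then apply the twiddles. */
static void post_two_pass(float complex *out, const float complex *fft_out,
                          const float complex *tw, const int *lut)
{
    float complex tmp[LEN];
    for (int i = 0; i < LEN; i++)
        tmp[lut[i]] = fft_out[i];
    for (int k = 0; k < LEN; k++)
        out[k] = tmp[k] * tw[k];
}

/* One pass: the reindexing happens inside the twiddle multiply, with no extra copy. */
static void post_fused(float complex *out, const float complex *fft_out,
                       const float complex *tw, const int *lut)
{
    for (int i = 0; i < LEN; i++)
        out[lut[i]] = fft_out[i] * tw[lut[i]];
}
```

Both variants compute the same products. The fused one touches each element once but scatters its writes. Whether that beats the copy in practice depends on the target.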
[02:14:35 CET] <bofh_> atomnuker: is precomputing the tables actually faster? it's a few simple arith ops
[02:14:45 CET] <bofh_> (iirc a mul, 3 shifts, 3 adds and a negate)
[02:14:49 CET] <bofh_> at least for preindex
[11:17:59 CET] <atomnuker> bofh_: patch's up on the ML
[11:18:23 CET] <atomnuker> having a LUT for pre/post FFT reindexing did improve performance
[11:19:35 CET] <atomnuker> got away with only having to use a single tmp buffer, though reindexing is done directly with real->complex conversion
[11:20:20 CET] <atomnuker> I had to remove the main body of the FFT function and put it into the MDCT because it was going to be unusable for a forward transform
[11:21:20 CET] <atomnuker> and because GODDAMNIT SHIT SEGFAULTED, which turned out to be fft_calc expecting to write to an aligned address and I'll let you guess that it wasn't
[11:28:39 CET] <atomnuker> disappointed the imdct was only 2x faster considering the FFT by itself is almost 6x faster
[11:29:34 CET] <atomnuker> capped by the input/output rearranging probably, still a simd on fft15 would help
[11:33:16 CET] <atomnuker> (the tmp buffer was already there and it was only used for a real-only->complex conversion to feed into the FFT)
[12:49:04 CET] <cone-892> ffmpeg 03Carl Eugen Hoyos 07master:38e4bcae0944: lavf/matroska: Fix the codec_id for mkv tag A_MPEG/L1.
[12:49:49 CET] <BtbN> I wonder what some people think about opening a massively broken Pullrequest, just to ask about some obscure issue.
[12:50:44 CET] <BtbN> also, new NUCs!
[12:52:20 CET] <BBB> BtbN: it's the new poke
[12:53:07 CET] <BtbN> https://github.com/FFmpeg/FFmpeg/pull/246
[12:53:21 CET] <BtbN> "wants to merge 861 commits into release/3.2 from master"
[12:53:53 CET] <BBB> atomnuker: you crazy bugger, that's amazing
[12:57:47 CET] <Chloe> BtbN: I'm not sure how people miss all the notices
[13:03:29 CET] <BtbN> It's not even a Pull Request in this case.
[13:03:40 CET] <BtbN> Just randomly pulling some branches, to ask about an issue.
[13:04:47 CET] <durandal_1707> BBB: fixed decoding bug, needs now to clean up the code
[13:05:26 CET] <durandal_1707> I dunno, should I use imdct or ifft
[13:19:57 CET] <BBB> durandal_1707: \o/
[13:19:59 CET] <BBB> brb
[13:25:53 CET] <cone-892> ffmpeg 03Paul B Mahol 07master:72d61015109e: avfilter/avf_aphasemeter: fix memleaks
[13:47:43 CET] <durandal_1707> BBB: I'm using ifft right now, but waves from subframes don't line up
[13:57:54 CET] <BBB> there must be some kind of overlap that is probably non-standard
[13:58:03 CET] <BBB> I can't quite say how it should work :/
[15:02:36 CET] <bofh_> atomnuker: cool, many many thanks
[15:04:32 CET] <bofh_> atomnuker: so b/c the post-reindex for the MDCT is the same as the one for the standard imdct, I re-used the asm for that (modified slightly so it unpacks stuff from interleaved to size-4 blocks). but if you've totally modified it to merge things then that wouldn't be very useful. I'll try to do a SIMD impl of fft15 today, it shouldn't be too hard since the input stride is fixed.
[15:07:48 CET] <bofh_> looking at the patch now
[15:20:23 CET] <bofh_> bit surprised you don't keep at least the fft5 twiddles hardcoded, it's 4 floats and it would seem leaving them as immediates might make loading them faster on some arches. did this turn out to not be the case?
[15:23:44 CET] <bofh_> (Also oops, I guess I forgot to use a DECLARE_ALIGNED() where it was needed)
[15:26:05 CET] <bofh_> (and yeah, in the MDCT case I knew you could do postreindexing at the same time as conversion, but it feels a bit... inelegant to integrate the FFT inseparably into the MDCT, though it will make it faster).
[15:29:21 CET] <durandal_1707> BBB: the waves either overlap to zero or do not overlap at all
[15:30:41 CET] <BBB> don't overlap at all - you mean in some cases if there was zero overlap the sound would continue perfectly?
[15:30:53 CET] <BBB> so maybe there's some bitstream toggle to switch overlap on a per-subframe basis?
[15:30:58 CET] <BBB> (that would be strange)
[15:57:23 CET] <durandal_1707> BBB: nope I'm cleaning code, trying to use lavc fft implementation
[15:59:09 CET] <durandal_1707> it perfectly goes to zero after each subframe ending
[17:08:30 CET] <BBB> that vp9 file is not black at all
[17:08:35 CET] <BBB> pebkac?
[17:09:05 CET] <BBB> durandal_1707: I have no idea at all, sorry :( &
[17:12:15 CET] <philipl> BtbN: you want to chime in on that thread about failing out early in cuvid?
[17:12:38 CET] <BtbN> Didn't he want to re-do something?
[17:13:00 CET] <philipl> yeah, he said that 422/444 worked for mjpeg (not a huge surprise, I think).
[17:13:10 CET] <philipl> but that means his change will start having codec specific conditionals in it.
[17:14:43 CET] <jamrial_> BBB: closed with no comment by the reporter, so guess pebkac indeed :p
[18:21:52 CET] <bofh_> atomnuker: nm, you want to easily do forward/inverse, that's why.
[18:24:38 CET] <bofh_> atomnuker: also the issue with the imdct, at least one of them, is that input stride argument
[18:26:00 CET] <bofh_> it can't be simplified out by the compiler, but as far as I can see will always be 1 or 2 (& a small change in opus_celt.c should make it possible to only need imdct stride == 1)
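bofh_'s point about the runtime stride can be illustrated with a common C pattern. Write the loop once as a `static inline` function that takes the stride. Then call it from thin wrappers that pass a literal 1 or 2, so the compiler can typically inline it and constant-fold the stride in each copy. This is a minimal sketch. `gather_rotate*` are hypothetical names, not functions from opus_celt.c or the MDCT code.

```c
#include <stddef.h>

/* Generic loop: multiply every stride-th input sample by a per-index weight. */
static inline void gather_rotate(float *restrict out, const float *restrict in,
                                 const float *restrict tw, int n, ptrdiff_t stride)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i * stride] * tw[i];
}

/* Wrappers with a compile-time constant stride; after inlining, each one
 * gets its own specialised loop (contiguous loads for stride 1). */
static void gather_rotate_s1(float *restrict out, const float *restrict in,
                             const float *restrict tw, int n)
{
    gather_rotate(out, in, tw, n, 1);
}

static void gather_rotate_s2(float *restrict out, const float *restrict in,
                             const float *restrict tw, int n)
{
    gather_rotate(out, in, tw, n, 2);
}

/* Dispatch once per call; assumes stride is only ever 1 or 2. */
static void gather_rotate_any(float *out, const float *in, const float *tw,
                              int n, ptrdiff_t stride)
{
    if (stride == 1)
        gather_rotate_s1(out, in, tw, n);
    else
        gather_rotate_s2(out, in, tw, n);
}
```

If the caller can be changed so that only stride 1 ever occurs, which is what bofh_ suggests for opus_celt.c, the dispatch and the second copy go away entirely.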
[18:40:09 CET] <atomnuker> bofh_: also we don't DECLARE_ALIGNED on static structs
[18:47:06 CET] <bofh_> any reason why?
[18:47:24 CET] <bofh_> does it fail on msvc or something?
[19:08:13 CET] <atomnuker> oh wait, we do, nvm
[19:08:37 CET] <atomnuker> bofh_: also hardcoding the exptab resulted in being quite inaccurate at large frame sizes
[19:10:20 CET] <bofh_> aoh, that's because the hardcoded values aren't rounded correctly and truncated 1-2 digits too early
[19:10:25 CET] <bofh_> whoops
[19:11:31 CET] <bofh_> worst-case error for me at a frame size of 2^12 * 15 was still only ~1e-5, though...
[19:25:22 CET] <durandal_1707> BBB: i cleaned up code a lot, but I still wonder why I don't get high frequencies
[19:26:13 CET] <durandal_1707> I have high frequency tones but missing treble I think
[19:33:46 CET] <Phi_> Hi folks, I've got a busted make install on the master branch, MinGW
[19:37:06 CET] <Phi_> https://bpaste.net/show/fc5a65b0e6f8
[19:38:02 CET] <Phi_> Should I be posting this here, or on #ffmpeg ?
[19:39:08 CET] <jamrial> on #ffmpeg
[19:44:54 CET] <BtbN> No idea if make install on msvc is even tested or intended
[19:46:34 CET] <jamrial> it is using an msys environment. we have fate clients doing that
[19:47:36 CET] <BtbN> is this even ffmpeg, or libav? Can't find any reference to doc/examples/libav
[19:51:14 CET] <jamrial> ffmpeg, but looks like it could be a dirty tree
[20:08:29 CET] <Phi_> it's ffmpeg, and I think MSVC is intended (toolchain=msvc)
[20:09:04 CET] <Phi_> I did do a git reset --hard before I updated the code
[20:09:11 CET] <Phi_> so it shouldn't be corrupt
[20:09:48 CET] <JEEB> if you want to clean up the code dir, there's also `git clean -dfx` (and yes, this will blow everything away)
[20:13:46 CET] <Phi_> I'm just re-cloning it now
[20:14:37 CET] <jamrial> make sure the build directory is also clean if you're doing an out-of-tree build
[20:15:42 CET] <Phi_> yep
[20:15:49 CET] <Phi_> so I'll move to #ffmpeg?
[20:17:30 CET] <jamrial> yes
[20:18:12 CET] <Phi_> *scurries away*
[20:21:48 CET] <bofh_> durandal_1707: can I have a look at your code? might have an idea
[20:26:51 CET] <durandal_1707> bofh_: github.com/richardpl/FFmpeg/tree/qdmc
[20:32:19 CET] <durandal_1707> for some files I still get wrong noise band selected
[20:32:52 CET] <durandal_1707> manual intervention helps though
[21:10:02 CET] <atomnuker> bofh_: do you have any idea what the revtab for the forward MDCT look like?
[21:10:46 CET] <atomnuker> not the FFT revtab, the MDCT revtab
[21:11:38 CET] <atomnuker> I had some code which didn't work for the previous FFT which would split and stride the input by 2 and then by 3 (for the 3x5 FFTs)
[21:12:01 CET] <atomnuker> (and that's the only thing I'm missing to get the mdct to work)
[21:16:24 CET] <durandal_1707> 3d fft?
[21:19:02 CET] <atomnuker> no, the FFT on the mailing list is something like a 2D FFT with different dimensions
[21:19:40 CET] <atomnuker> that's for a 15-point FFT where you 5-point FFT the input with a stride of 3 to recombine into 1
[21:31:42 CET] <kierank> atomnuker: does 15 have some special meaning
[21:31:46 CET] Action: kierank should read more about fft
[21:32:29 CET] <fritsch> kierank: it's a prime
[21:32:32 CET] <fritsch> :-)
[21:32:36 CET] <fritsch> and important in that case
[21:32:57 CET] <fritsch> I still thought the padding approach would be really slower - but cannot prove it
[21:33:41 CET] <Sesse> hi. I'm trying to get speedhq up in ffmpeg, but I'm struggling a bit with the VLC tables. what is the static_size parameter to INIT_VLC_STATIC? how do I know what to set it to?
[21:35:02 CET] <BBB> set it super-small and increase it until valgrind doesn't complain / it doesn't crash, or set it super-large and decrease it until valgrind starts complaining / it starts crashing
[21:35:12 CET] <Sesse> ... :-P
[21:36:11 CET] <BBB> I think it's a performance parameter
[21:36:21 CET] <kierank> Sesse: meh just init the table in a context
[21:36:22 CET] <atomnuker> kierank: 15*(2^n) is one way of getting to the opus framesizes
[21:36:25 CET] <BBB> let me see
[21:36:35 CET] <BBB> kierank: I've used STATIC, it's fine
[21:36:38 CET] <Sesse> BBB: I thought that was the bit_size stuff
[21:36:49 CET] <Sesse> that set the LUT size for a single lookup
[21:36:59 CET] <BBB> Sesse: yes you're right, looking
[21:37:44 CET] <peloverde> Sesse: IIRC it's the number of table rows needed given that bit_size
[21:38:06 CET] <Sesse> peloverde: and content, presumably?
[21:38:14 CET] <Sesse> I guess you have these overflow entries
[21:38:17 CET] <Sesse> (for longer codes)
[21:38:48 CET] <Sesse> perhaps I should just heed the "needed %d had %d" message if it complains
[21:39:20 CET] <peloverde> If my bit size is four and I have two five length codes, both of those codes would need an extra row
[21:39:26 CET] <Sesse> yes
[21:40:06 CET] <Sesse> okay, so basically (1 << num_bits) + num_extra_rows
[21:40:08 CET] <Sesse> makes sense
[21:40:48 CET] <Sesse> next up is figuring out which bits ffmpeg wants reversed and which it doesn't :-) (this is a little-endian codec, but it reuses a ton of the mpeg-2 tables)
[21:44:15 CET] <BBB> Sesse: it's floor(largest_bitsize / num_bits) << num_bits + num_extra_rows, I think
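The sizing rule being discussed can be made concrete with a small model of the multi-level table layout. As we understand libavcodec's VLC builder (`build_table()` in bitstream.c), it works like this:

- The top level has 1 << nb_bits entries.
- Every nb_bits-bit prefix shared by longer codes gets a subtable of 1 << min(longest remaining length, nb_bits) entries.
- Each subtable is built the same way, recursively.

The static_size given to INIT_VLC_STATIC has to cover the sum. The sketch below counts that total. `Code`, `vlc_table_size` and the `MAX_CODES` limit are hypothetical. This is a model, not the libavcodec code, so the "needed %d had %d" message remains the authority.

```c
#include <stdint.h>

#define MAX_CODES 64 /* illustration limit on the number of codes */

/* One prefix code word, MSB-first in the low `len` bits of `code`. */
typedef struct {
    uint32_t code;
    int len;
} Code;

/* Entries needed when this level decodes nb_bits bits. Codes longer than
 * nb_bits are grouped by their nb_bits-bit prefix; each group gets a
 * subtable of min(longest remainder, nb_bits) bits, sized recursively. */
static int vlc_table_size(const Code *codes, int n, int nb_bits)
{
    int total = 1 << nb_bits;

    for (uint32_t prefix = 0; prefix < (1u << nb_bits); prefix++) {
        Code sub[MAX_CODES];
        int nsub = 0, sub_bits = 0;

        for (int i = 0; i < n; i++) {
            int rest = codes[i].len - nb_bits;
            if (rest <= 0 || (codes[i].code >> rest) != prefix)
                continue;
            /* Strip the prefix and keep the remaining `rest` bits. */
            sub[nsub].code = codes[i].code & ((1u << rest) - 1);
            sub[nsub].len  = rest;
            nsub++;
            if (rest > sub_bits)
                sub_bits = rest;
        }
        if (nsub > 0) {
            if (sub_bits > nb_bits)
                sub_bits = nb_bits;
            total += vlc_table_size(sub, nsub, sub_bits);
        }
    }
    return total;
}
```

Take the prefix code {0, 10, 110, 1110, 11110, 11111} as an example. It needs 18 entries at nb_bits = 4: 16 for the top level plus 2 for the 1-bit subtable under the shared prefix 1111. It needs 12 entries at nb_bits = 3 and 10 at nb_bits = 2.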
[21:44:35 CET] <Sesse> BBB: makes sense. in any case, now I reversed the right amount of times and my code magically built and seems to work
[21:45:03 CET] <BBB> lol
[21:45:11 CET] <BBB> I'm pretty sure that's how we all got started
[21:45:18 CET] <BBB> in fact I still do it that way sometimes ;)
[21:52:33 CET] <durandal_1707> Sesse: you RE it or have source already?
[21:52:41 CET] <Sesse> durandal_1707: do I want to comment on this? :-)
[21:53:08 CET] <BBB> if you got the source illegally, you might not specifically want to mention that
[21:55:58 CET] <kierank> durandal_1707: he RE'd it
[21:56:18 CET] <Sesse> kierank: snitch! :-P
[21:56:39 CET] <kierank> Sesse: pretty commonplace here tbh
[21:58:03 CET] <durandal_1707> intra only is easy to do
[21:58:26 CET] <Sesse> frustrating enough the first time, I can assure you
[21:58:30 CET] <durandal_1707> try clear video one :)
[21:59:24 CET] <kierank> I need to write an email explaining all the problems I have with 10-bit and rgb in mpegvideo.c
[22:00:03 CET] <BBB> <kierank> I need to write an email explaining all the problems I have with [..] mpegvideo.c
[22:00:17 CET] <BBB> :-p
[22:00:20 CET] <Sesse> compression people are the only people in the world who can write stuff like this with no comment whatsoever:
[22:00:25 CET] <Sesse> level = ((level * 2 + 1) * qscale * quant_matrix[j]) >> 5;
[22:00:28 CET] <Sesse> level = (level - 1) | 1;
[22:00:31 CET] <Sesse> level = (level ^ SHOW_SBITS(re, &gb, 1)) -
[22:00:33 CET] <Sesse> SHOW_SBITS(re, &gb, 1);
[22:01:18 CET] <kierank> Sesse: https://www.urbandictionary.com/define.php?term=ricing
[22:01:33 CET] <Sesse> kierank: comments don't slow down your code
[22:01:40 CET] <kierank> I agree
[22:02:05 CET] <Sesse> I suppose the last one is something like if next bit is 1, level = -level
[22:02:46 CET] <BBB> looks like it
[22:02:53 CET] <Sesse> but what about the second line?
[22:02:54 CET] <BBB> the xor trick is pretty common to apply signs from bitstreams
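The third line of the snippet is the branchless conditional negation BBB refers to. Assume, as in libavcodec's get_bits.h, that a signed 1-bit peek returns 0 for a 0 bit and -1 for a 1 bit. Then `(level ^ s) - s` leaves `level` unchanged when s == 0. When s == -1 it yields `~level + 1`, which is `-level` in two's complement. The helpers below are hypothetical names. They read the bit once into a variable instead of peeking twice, which matches iive's suggestion.

```c
/* Model of a sign-extending 1-bit read: bit 0 -> 0, bit 1 -> -1 (all ones). */
static int sign_mask_from_bit(unsigned bit)
{
    return -(int)(bit & 1u);
}

/* Branchless: mask 0 keeps level, mask -1 gives (~level) + 1 == -level. */
static int apply_sign(int level, int mask)
{
    return (level ^ mask) - mask;
}

/* Straightforward branching form, for comparison. */
static int apply_sign_branchy(int level, unsigned s)
{
    return s == 0 ? level : -level;
}
```

A macro or inline helper like `apply_sign` keeps the trick in one documented place. Modern compilers often turn the branchy form into similar code anyway.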
[22:03:10 CET] <Sesse> BBB: you'd hope that could be wrapped in a macro, or even the compiler could deal with it
[22:03:27 CET] <BBB> I don't disagree
[22:03:33 CET] <BBB> but I'm too tired to fight the fight
[22:03:36 CET] <Sesse> :-)
[22:03:41 CET] <BBB> -ENOTWORTHIT
[22:05:21 CET] <BBB> the second line is strange, I don't know TBH
[22:05:33 CET] <BBB> ask michaelni, he often knows stuff like that
[22:08:01 CET] <michaelni> its how the dequant is defined by the standard IIRC
[22:08:31 CET] <Sesse> aha
[22:08:36 CET] <Sesse> well, not relevant for shq, then
[22:10:11 CET] <bofh_> kierank: telecom framesizes tend to be of that form b/c they correspond to integer numbers of ms for a frame when your samplerate is a multiple of 8000
[22:11:11 CET] <bofh_> kierank: jmspeex's background is telecom/speech coding and CELT is supposed to be able to do voip, hence the odd/insane choice of framesize
[22:12:15 CET] <kierank> bofh_: oh it's you from twitter
[22:12:18 CET] <kierank> bofh_: yeah
[22:27:25 CET] <atomnuker> bofh_: no, it was telecom people demanding, CELT was power of two
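Here is the arithmetic behind the 15·2^n sizes. At 48 kHz, one millisecond is 48 = 3 × 16 samples. Opus CELT's 2.5, 5, 10 and 20 ms frames are therefore 120, 240, 480 and 960 samples, i.e. 15 × 2^3 up to 15 × 2^6. The tiny check below uses hypothetical helpers.

```c
/* Samples in a frame of duration_us microseconds at rate_hz. */
static long frame_samples(long rate_hz, long duration_us)
{
    return rate_hz * duration_us / 1000000;
}

/* Returns n if samples == 15 * 2^n, otherwise -1. */
static int pow2_exponent_of_15x(long samples)
{
    if (samples <= 0 || samples % 15 != 0)
        return -1;
    long m = samples / 15;
    int n = 0;
    while (m % 2 == 0) {
        m /= 2;
        n++;
    }
    return m == 1 ? n : -1;
}
```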
[22:41:53 CET] <kierank> 9:08 PM <michaelni> its how the dequant is defined by the standard IIRC
[22:41:55 CET] <kierank> it's some idiom afaik
[22:42:56 CET] <kierank> an attempt to merge dequant and bitstream reading somehow
[22:45:01 CET] <kierank> doesn't match anything in the spec
[22:46:53 CET] <michaelni> libavcodec/mpegvideo.c, dct_unquantize_mpeg1_intra_c, level = (level - 1) | 1;
[22:46:59 CET] <michaelni> no bitstream here
[22:48:46 CET] <Sesse> I was wondering if it was some kind of anti-rounding
[22:48:56 CET] <kierank> A "normal" coefficient in which a value of run and level is decoded followed by a single bit, s, giving the
[22:48:57 CET] <kierank> sign of the coefficient signed_level is computed from level and s as shown below. run coefficients shall be
[22:48:57 CET] <kierank> set to zero and the subsequent coefficient shall have the value signed_level.
[22:48:57 CET] <kierank> if (s ==0)
[22:48:57 CET] <kierank> signed_level = level;
[22:48:57 CET] <kierank> else
[22:48:57 CET] <kierank> signed_level = (-level);
[22:48:59 CET] <kierank> it's that
[22:49:08 CET] <Sesse> kierank: that's the third line, no?
[22:49:48 CET] <iive> Sesse, michaelni using showbits twice kind of looks like it might be slower, than using a variable
[22:49:55 CET] <kierank> then I don't know
[22:50:03 CET] <kierank> I didn't write the mpeg-4 sstp dequant like that
[22:50:18 CET] <kierank> because I didn't understand what mpeg12 was doing
[22:50:23 CET] <iive> the second line looks like odd/even thing.
[22:50:36 CET] <iive> probably rounding down.
[22:50:57 CET] <iive> e.g. (3-1)|1 = 3
[22:51:07 CET] <iive> (4-1)|1 = 3
[22:51:25 CET] <kierank> 9:00 PM <BBB> <kierank> I need to write an email explaining all the problems I have with [..] mpegvideo.c
[22:51:32 CET] <kierank> yeah that's why I plan to do it during the weekend
[22:55:10 CET] <BBB> Sesse: I'm suspecting that the weird behaviour in the spec is an efficient way to not special-case omitting the sign if level = 0
[22:55:13 CET] <michaelni> mpeg1 spec says: if ( ( dct_recon[m][n] & 1 ) == 0 ) dct_recon[m][n] = dct_recon[m][n] - Sign(dct_recon[m][n]) ;
[22:56:15 CET] <michaelni> for the question why the spec does this i suspect it was intended to reduce idct rounding differences between implementations but that's a pure guess
[22:56:57 CET] <michaelni> mpeg2 replaced this by the last coeff lsb bit flip
[22:58:35 CET] <kierank> oh it's mpeg1
[23:01:06 CET] <iive> michaelni: I don't see how the second line implements the if() you quoted
[23:01:59 CET] <iive> for one thing, if sign() is positive, then the dct_recon would not be changed...
[23:02:32 CET] <michaelni> iive, the value in our implementation at this point is just the abs(), the sign/positive/negative is added in later
[23:02:54 CET] <iive> that's my point
[23:03:07 CET] <Sesse> someone went through all this trouble to save some and or or somewhere
[23:03:09 CET] <iive> it would round down both positive and negative values
[23:03:13 CET] <Sesse> but didn't bother multiplying qscale into quant_matrix
[23:06:39 CET] <iive> Sesse: qscale might be different for every MB, unless I confuse it with mpeg4
[23:07:08 CET] <Sesse> iive: it's part of the context, it seems
[23:07:47 CET] <Sesse> but okay, it seems you're right
[23:08:16 CET] <Sesse> seemingly it can change per-frame
[23:08:19 CET] <Sesse> still
[23:08:23 CET] <iive> still, there are 6 blocks in MB and just 4 matrix...
[23:11:51 CET] <michaelni> i would be surprised if no one tried to merge qscale in the matrix
[23:12:34 CET] <Sesse> I suppose you'd need multiple matrices
[23:12:45 CET] <Sesse> ie., keep them cached around
[23:13:49 CET] <michaelni> for the encoder side there is q_*_matrix
[23:15:30 CET] <michaelni> you can also use " -flags2 +fast" IIRC to completely ignore the matrix which is "wrong" but should be even faster
[23:15:49 CET] <Sesse> wait, how can you ignore the matrix :-)
[23:16:29 CET] <Sesse> like, no dequant?
[23:17:00 CET] <iive> assume they contain the same value, so use only quant
[23:17:25 CET] <iive> I was wrong, in mpeg2 the quant is present in the slice header.
[23:18:01 CET] <michaelni> also if you are interested in merging the multiply, it should be very easy to test what could be gained by this in best case. By simply removing the *qscale and benchmarking
[23:18:37 CET] <kierank> Sesse: there are lots of stupid, crashy, "fast" modes in ffmpeg
[23:18:51 CET] <iive> no...
[23:19:20 CET] <iive> there is macroblock_quant, that allows override
[23:21:23 CET] <iive> Sesse: one thing you could do is have 32*4 matrixes, one for each quant. of course, you trade cpu for cache
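iive's suggestion is to keep one pre-multiplied matrix per quantiser value. Rebuild `premul[q][j] = q * matrix[j]` whenever the matrix changes, and each coefficient then needs one multiply fewer. The sketch below uses the inter dequant expression Sesse pasted earlier. The bound of 64 qscale values and all helper names are assumptions for illustration. The table is 64 × 64 ints (16 KiB) per matrix, which is the cache cost iive mentions.

```c
#include <stdint.h>

#define QSCALE_MAX 64 /* assumed upper bound on qscale values, illustration only */

/* Rebuilt whenever the quant matrix changes: one row per qscale value. */
static void build_premul(int premul[QSCALE_MAX][64], const uint16_t matrix[64])
{
    for (int q = 0; q < QSCALE_MAX; q++)
        for (int j = 0; j < 64; j++)
            premul[q][j] = q * matrix[j];
}

/* Inter dequant with separate multiplies, as in the pasted snippet. */
static int dequant_direct(int level, int qscale, const uint16_t matrix[64], int j)
{
    return ((level * 2 + 1) * qscale * matrix[j]) >> 5;
}

/* Same value from the pre-multiplied row: one multiply fewer per coefficient. */
static int dequant_premul(int level, const int premul_row[64], int j)
{
    return ((level * 2 + 1) * premul_row[j]) >> 5;
}
```

As michaelni notes, the best-case gain can be estimated first by simply dropping the `* qscale` and benchmarking.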
[23:45:39 CET] <atomnuker> bofh_: so any idea how to make the forward mdct revtab?
[23:48:31 CET] <Sesse> what's the difference between idct_context.idct and idct_context.idct_put?
[23:48:45 CET] <Sesse> or more, which one do I want :-)
[23:50:55 CET] <Sesse> hm, idct_put seemingly clamps and deals with line size and stuff
[23:50:56 CET] <Sesse> so I guess that's it
[23:52:49 CET] <BBB> idct does a coeff to diff, typically both being int16_t
[23:53:06 CET] <BBB> idct_put places it in a pixel buffer with clamping, typically uint8_t with 8bit clamping in 0-255 range
[23:53:13 CET] <Sesse> diff?
[23:53:38 CET] <BBB> idct is not always pure pixels, it can be a residual on top of a prediction, e.g. motion prediction or intra prediction
[23:53:42 CET] <Sesse> ah
[23:53:45 CET] <Sesse> of course
[23:53:51 CET] <Sesse> that's why the idct_add, too
[23:53:52 CET] <BBB> in that case, you'd use idct_add instead of idct_put
[23:54:11 CET] <BBB> so idct_add means pixel += diff
[23:54:18 CET] <BBB> idct_put means pixel = diff
[23:54:23 CET] <Sesse> I suppose there's some convention for the scale factor of the idct coefficients, too?
[23:54:24 CET] <BBB> and diff = idct(coeff)
[23:54:34 CET] <Sesse> diff = dct(coeff), you mean
[23:54:35 CET] <Sesse> presumably
[23:54:40 CET] <Sesse> eh, no
[23:54:42 CET] <Sesse> nevermind me :-)
[23:54:44 CET] <BBB> :-p
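Here is BBB's description of the two output paths, sketched in C for one 8×8 block. `diff` stands for the idct output. `put` overwrites the destination with the clamped values. `add` adds them to the prediction already in the destination and clamps. The inverse transform itself is left out. `block_put`, `block_add` and `clip_u8` are hypothetical stand-ins, not the idctdsp function pointers; libavcodec has its own helpers such as av_clip_uint8.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the 8-bit pixel range 0..255. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* put: pixel = clip(diff); used when there is no prediction, e.g. MPEG-1/2 intra blocks. */
static void block_put(uint8_t *dst, ptrdiff_t linesize, const int16_t diff[64])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * linesize + x] = clip_u8(diff[y * 8 + x]);
}

/* add: pixel = clip(pixel + diff); dst already holds the motion or intra prediction. */
static void block_add(uint8_t *dst, ptrdiff_t linesize, const int16_t diff[64])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * linesize + x] = clip_u8(dst[y * linesize + x] + diff[y * 8 + x]);
}
```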
[23:54:55 CET] <Sesse> but my question about the scale factor of the idct coefficient still stands
[23:55:35 CET] <BBB> I'm assuming you're using simple_idct?
[23:55:52 CET] <BBB> we just had this discussion with kierank a few days ago
[23:55:59 CET] <Sesse> I have no idea what I'm using
[23:56:11 CET] <Sesse> I'm calling ff_idctdsp_init
[23:56:12 CET] <kierank> Sesse: well remember they copypasted the xvid one
[23:56:13 CET] <BBB> I don't recall what scale factor it uses, but the answer is that it's always wrong and you have to wiggle until it works
[23:56:20 CET] <Sesse> kierank: they don't use that one in the newer versions
[23:56:21 CET] <kierank> BBB: he's just doing boring 8-bit i think
[23:56:27 CET] <BBB> "boring"
[23:56:38 CET] <kierank> well he doesn't have dc coefficients that are > 32-bit
[23:56:43 CET] <kierank> sorry > 16-bit
[23:57:17 CET] <kierank> I think the idct i will have to write will be 32-bit coeffs, same intermediates, 16-bit output
[23:58:10 CET] <BBB> yes
[00:00:00 CET] --- Thu Jan 5 2017
More information about the Ffmpeg-devel-irc