[Ffmpeg-devel-irc] ffmpeg-devel.log.20151007

burek burek021 at gmail.com
Thu Oct 8 02:05:03 CEST 2015


[00:02:56 CEST] <jamrial> durandal_1707: yes, it passes. also tried a 13 minutes long sample and performance wise it's more even matched compared to the 1 minute sample
[00:15:10 CEST] <cone-644> ffmpeg Andrey Utkin master:fdb32838723e: avformat/httpauth: Add space after commas in HTTP/RTSP auth header
[00:39:12 CEST] <rcombs> nevcairiel: wanna talk about that matroskaenc AAC extradata issue?
[00:40:10 CEST] <rcombs> (and more broadly, automatically applying bitstream filters in mux.c and filtering a packet before writing the header)
[00:40:29 CEST] <rcombs> it looks like it'd be pretty simple to handle this in the interleave architecture
[01:31:34 CEST] <cone-644> ffmpeg James Almer master:72254b19b8ca: x86/alacdsp: add simd optimized functions
[01:31:35 CEST] <cone-644> ffmpeg James Almer master:285e41c34c9c: checkasm: add alacdsp tests
[01:58:38 CEST] <Compn> carl / ubitux , was this patch committed? just curious if bug is still valid, https://trac.ffmpeg.org/ticket/3971
[02:03:01 CEST] <rcombs> https://gist.github.com/742d985f3dd5848d7f2d OK, rate my evil
[02:09:40 CEST] <rcombs> (it works!)
[02:10:03 CEST] <rcombs> nevcairiel: wm4: ^
[02:43:52 CEST] <Daemon404> oh man youre removing my favourite bit of cargo culted pasting, rcombs 
[02:49:28 CEST] <rcombs> well, it lives on in av_apply_bitstream_filters
[02:49:59 CEST] <Daemon404> doesnt it still have the same problems of not being able to fail properly as the old api
[02:50:22 CEST] <Compn> _gb_
[02:50:23 CEST] <Compn> er
[02:51:24 CEST] <rcombs> did I miss an error case?
[02:52:10 CEST] <Daemon404> rcombs, no. the old api had the issue, and the new one still is a wrapper over it
[02:53:13 CEST] <Daemon404> i cant remember 100% what the issue was
[02:53:16 CEST] <rcombs> which bit are you referring to?
[02:53:32 CEST] <rcombs> I plan to split that patch into about 4 commits
[02:53:39 CEST] <Daemon404> its one of the api funcs that already exists
[02:54:09 CEST] <Daemon404> ignore me for now
[02:54:14 CEST] <Daemon404> the new api is still lightyears better
[02:54:28 CEST] Action: Daemon404 also remembers calling some filter->filter->thing() or something
[02:54:47 CEST] <rcombs> with a bit of tweaking it should be able to replace ffmpeg.c's filtering altogether
[02:55:00 CEST] <rcombs> have ffmpeg.c add filters to AVStream
[02:55:23 CEST] <rcombs> just needs matroskadec to add to the end of the filter linked list instead of replacing it
[02:55:30 CEST] <rcombs> (which there should be an API function for)
[02:55:49 CEST] <rcombs> also, I'd like to move bsf args to the bsfc
[02:55:49 CEST] <Daemon404> the best people to review the design are likely asleep atm
[02:56:25 CEST] <rcombs> I'll send it as a WIP then
[03:12:45 CEST] <rcombs> sent, along with explanation
[03:43:50 CEST] <Daemon404> lol lab
[04:47:36 CEST] <cone-037> ffmpeg Michael Niedermayer master:f4585e666fc9: avformat/flvdec: Print stream type in case a new stream is discovered after the header
[04:47:36 CEST] <cone-037> ffmpeg Michael Niedermayer master:e96ecaf053d8: avcodec/pngenc: Initialize fctl_chunk to 0
[05:02:13 CEST] <cone-037> ffmpeg Michael Niedermayer master:a852db796edc: avcodec/pngenc: Check that there is at least 1 frame
[10:10:57 CEST] <ubitux> Compn: no it's not; a one line patch from nevcairiel exists
[10:11:08 CEST] <nevcairiel> (its linked in the ticket)
[10:11:11 CEST] <ubitux> Compn: a user had this issue?
[12:59:02 CEST] <cone-428> ffmpeg Ganesh Ajjanagadde master:2d8ef1b6902d: doc/developer: use https instead of http
[13:27:10 CEST] <durandal11707> how to build ffmpeg with profiling enabled?
[13:29:08 CEST] <J_Darnley> For which profiler?
[13:29:38 CEST] <J_Darnley> oprofile only needs debug symbols (I think)
[13:29:56 CEST] <J_Darnley> gprof is -pg
[13:30:18 CEST] <J_Darnley> I haven't used any others so check their manual(s)
[14:06:25 CEST] <Compn> ubitux : no idea, just reviewing bug reports
[14:55:50 CEST] <cone-428> ffmpeg Ganesh Ajjanagadde master:b3b6665c6053: avcodec/libx264: silence -Waddress
[14:55:51 CEST] <cone-428> ffmpeg Ronald S. Bultje master:ce7872903345: vp9: don't keep a stack pointer if we don't need it.
[15:33:26 CEST] <durandal_1707> how to access uint8_t *in_lines_cur[2] from yasm?
[15:35:23 CEST] <J_Darnley> [address] and [address+gprsize]
[15:36:06 CEST] <J_Darnley> Or did you mean the uint8 data?
[15:36:30 CEST] <J_Darnley> in that case load the two pointers and then dereference
[15:36:51 CEST] <durandal_1707> i need to access in_lines_cur[0][0]
[15:38:08 CEST] <J_Darnley> mov register, [address]
[15:38:31 CEST] <J_Darnley> something something_else, [register]
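For reference, the two-step load described above (fetch the pointer, then dereference it) corresponds to the C below. This is only a minimal sketch: the helper names `first_byte_of_line0` and `first_byte_of_line1` are hypothetical, and the asm mnemonics in the comments are an illustration of the pattern, not code from the patch under discussion.

```c
#include <stdint.h>

/* in_lines_cur is an array of two pointers, so reading in_lines_cur[0][0]
 * takes two memory accesses: first the pointer, then the byte behind it. */
static uint8_t first_byte_of_line0(uint8_t *in_lines_cur[2])
{
    uint8_t *line0 = in_lines_cur[0]; /* mov   reg, [address]           */
    return line0[0];                  /* movzx other, byte [reg]        */
}

/* The second pointer sits one pointer-size (gprsize in x86inc terms)
 * after the first one. */
static uint8_t first_byte_of_line1(uint8_t *in_lines_cur[2])
{
    uint8_t *line1 = in_lines_cur[1]; /* mov   reg, [address + gprsize] */
    return line1[0];                  /* movzx other, byte [reg]        */
}
```

The second helper shows why the offset is `gprsize` rather than 1: the array elements are pointers, not bytes.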
[16:24:09 CEST] <cone-428> ffmpeg Arttu Ylä-Outinen master:b3777b2c2eb5: libkvazaar: Update to work with the latest version
[16:24:10 CEST] <cone-428> ffmpeg Arttu Ylä-Outinen master:b9446d0b5693: configure: Add version check for libkvazaar
[16:24:11 CEST] <cone-428> ffmpeg Arttu Ylä-Outinen master:425d6134ed88: libkvazaar: Remove unnecessary NULL checks
[16:24:12 CEST] <cone-428> ffmpeg Arttu Ylä-Outinen master:c09419ca80f1: libkvazaar: Replace asserts with proper errors
[16:24:13 CEST] <cone-428> ffmpeg Arttu Ylä-Outinen master:8db62f04191a: libkvazaar: Set pts and dts
[16:24:14 CEST] <cone-428> ffmpeg Arttu Ylä-Outinen master:5fefa7b512cc: libkvazaar: Use av_image_copy for copying pixels
[16:24:15 CEST] <cone-428> ffmpeg Arttu Ylä-Outinen master:0e348683875d: libkvazaar: Fix setting framerate
[16:24:16 CEST] <cone-428> ffmpeg Arttu Ylä-Outinen master:cb8999f368bb: doc/encoders: Fix libkvazaar documentation
[16:51:36 CEST] <cone-428> ffmpeg Shivraj Patil master:b0732b0214a4: avcodec/mips: build fix for MSA
[16:51:37 CEST] <cone-428> ffmpeg Shivraj Patil master:322e960dbf32: avcodec/mips: build fix for MSA 64bit
[17:04:23 CEST] <wm4> michaelni: opening a libswresample context for resampling is extremely slow on weak hardware (rpi2), it takes up to 40ms each time
[17:04:46 CEST] <wm4> why does it take so long?
[17:05:12 CEST] <fritsch> wm4: i can't answer this - but out of that reason - we have a "gpu driven resampler" especially for the PI in kodi
[17:05:30 CEST] <wm4> yeah, I want to avoid this
[17:05:52 CEST] <nevcairiel> it builds the polyphase filterbank, which i guess can be that slow?
[17:07:10 CEST] <michaelni> yes, also it can be tuned by phase_shift, filter_size, linear_interp, ...
[17:08:30 CEST] <michaelni> is there a situation where the init is done often and not just once (and on rare parameter changes) ?
[17:09:12 CEST] <wm4> I need to adjust audio playback speed (and this can happen fairly often), which is done by resampling the audio
[17:09:16 CEST] <nevcairiel> it looks like it scales with (2^phase_shift) * filter_size, approximately
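To put numbers on that scaling estimate, here is a small sketch that counts the coefficients in a polyphase bank. The function name `filter_bank_taps` is hypothetical and the values in the usage note are illustrative inputs, not a statement about what libswresample uses for any given configuration.

```c
#include <stdint.h>

/* Number of coefficients in a polyphase filter bank that has
 * 2^phase_shift phases of filter_size taps each. Each coefficient has to
 * be computed at init time, so this count is a rough proxy for the cost. */
static int64_t filter_bank_taps(int phase_shift, int filter_size)
{
    return ((int64_t)1 << phase_shift) * filter_size;
}
```

With phase_shift = 10 and filter_size = 32 this gives 32768 coefficients; dropping phase_shift to 8 and filter_size to 16 gives 4096, an eightfold reduction in the table that has to be built.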
[17:09:36 CEST] <nevcairiel> wm4: sounds like something you might want soundtouch for instead
[17:10:05 CEST] <wm4> can you eat it?
[17:10:07 CEST] <kierank> wm4: for refresh rate matching?
[17:10:12 CEST] <wm4> kierank: yes
[17:10:16 CEST] <fritsch> for sync playback
[17:10:17 CEST] <nevcairiel> then definitely get soundtouch
[17:10:19 CEST] <fritsch> we use it
[17:10:29 CEST] Action: kierank has the same problem
[17:10:31 CEST] <fritsch> (to display)
[17:10:38 CEST] <nevcairiel> http://www.surina.net/soundtouch/
[17:10:49 CEST] <nevcairiel> we use that for refresh matching, super easy to use
[17:10:54 CEST] <nevcairiel> and fast
[17:10:56 CEST] <wm4> nevcairiel: haven't seen that before
[17:11:00 CEST] Action: kierank just duplicates samples currently
[17:11:11 CEST] <kierank> but my drift is small
[17:11:27 CEST] <fritsch> https://github.com/xbmc/xbmc/blob/master/xbmc/cores/AudioEngine/Engines/ActiveAE/ActiveAEResamplePi.cpp <- swresample compatible PI gpu resampler
[17:11:59 CEST] <wm4> kierank: I've always wondered if the "pro" stuff can't sync exactly by using elaborate clock synchronization
[17:12:14 CEST] <kierank> if i was reading from a file, sure
[17:12:18 CEST] <fritsch> kierank: there is a way better solution
[17:12:25 CEST] <kierank> but I'm coming from a live stream somewhere else
[17:12:48 CEST] <wm4> fritsch: aha that's pretty fun
[17:13:05 CEST] <kierank> fritsch: what is the better solution (that's not resampling?)
[17:13:19 CEST] <fritsch> no better without of course
[17:13:20 CEST] <nevcairiel> fwiw, soundtouch is also what ReClock uses, which seemed to have inspired this whole feature in most media software today
[17:13:28 CEST] <fritsch> though keep in mind: with an AVR duplicate and passthrough
[17:13:35 CEST] <fritsch> will fool the AVR
[17:13:40 CEST] <fritsch> drop is not that hard
[17:13:48 CEST] <kierank> not if you duplicate in the right place
[17:13:52 CEST] <fritsch> as the timestamps don't duplicate but are still monotonic
[17:14:07 CEST] <fritsch> with passthrough you got no chance than to just add the same package again
[17:14:11 CEST] <kierank> ok maybe you are thinking of a different problem
[17:14:17 CEST] <kierank> I am thinking about ac3 passthrough
[17:14:22 CEST] <fritsch> jep - same here
[17:14:26 CEST] <kierank> you just duplicate one of the zero bytes and not the payload
[17:14:30 CEST] <fritsch> or dts, dts-hd etc.
[17:14:37 CEST] <michaelni> to sync timestamps (aka adjust sampling rate slightly) in swr theres, swr_set_compensation() swr_next_pts()
[17:14:57 CEST] <fritsch> michaelni: https://github.com/xbmc/xbmc/blob/master/xbmc/cores/AudioEngine/Engines/ActiveAE/ActiveAEResampleFFMPEG.cpp#L182 <- jep
[17:15:10 CEST] <fritsch> we asked for some ffmpeg patch which you committed some years ago, that made this possible
[17:15:13 CEST] <fritsch> thanks for that
[17:15:44 CEST] <wm4> eh, that's what it's for?
[17:17:31 CEST] <fritsch> jep, we got that added to public interface 2 years ago
[17:17:32 CEST] <fritsch> iirc
[17:17:34 CEST] <michaelni> also if someone needs faster init/open for swr, i can probably optimize it more, also using another filter_type, i think kaiser is default could make a noticeable difference, would need to try
[17:18:23 CEST] <fritsch> wm4: as you just profile? mind changing the filter_type?
[17:18:30 CEST] <fritsch> and check if it gets better?
[17:19:06 CEST] <fritsch> but if you reopened a new context whenever the ratio changed - i would rather use swr_set_compensation
[17:19:26 CEST] <wm4> just changing to cubic reduces the time to 10ms
[17:19:31 CEST] <wm4> yeah.
[17:19:56 CEST] <fritsch> i think we had this issue for "menu sounds" or on track change iirc
[17:20:05 CEST] <fritsch> or live tv stuttered when commercials set in
[17:20:08 CEST] <fritsch> on the pi
[17:20:23 CEST] <fritsch> yeah popcorn mix wanted to use the gpu anyways
[17:20:31 CEST] <fritsch> so reimplemented it in a pi specific way
[17:20:33 CEST] <nevcairiel> why would live tv care when commercials come in
[17:20:37 CEST] <nevcairiel> a broadcast doesnt change
[17:20:39 CEST] <fritsch> cause audio changes
[17:20:43 CEST] <fritsch> from ac3
[17:20:44 CEST] <fritsch> to stereo
[17:20:46 CEST] <fritsch> to whatever
[17:20:53 CEST] <nevcairiel> thats one odd broadcast you are watching
[17:20:54 CEST] <fritsch> decoded input changes from 6 channels
[17:20:56 CEST] <fritsch> to 2 channels
[17:21:01 CEST] <fritsch> no - that's DVB-C
[17:21:01 CEST] <wm4> I could steal that omx-based resampler code from kodi, but actually I don't want too much rpi-specific code at all
[17:21:02 CEST] <fritsch> :-)
[17:21:16 CEST] <wm4> I even wrote a kernel patch to make ALSA multichannel PCM work on rpi
[17:21:22 CEST] <fritsch> nevcairiel: it's rather odd you see no difference between james bond and tagesschau :-)
[17:21:33 CEST] <fritsch> wm4: yeah, performance sucks - right? :-)
[17:21:58 CEST] <wm4> which performance
[17:22:12 CEST] <kierank> 4:20 PM <nevcairiel> thats one odd broadcast you are watching
[17:22:14 CEST] <kierank> normal stuff
[17:22:24 CEST] <fritsch> audio / video sync performance
[17:22:32 CEST] <fritsch> omxplayer did not support that for a long time
[17:22:45 CEST] <fritsch> and RPi1 B had issues when handling video / audio differently for a long time
[17:22:50 CEST] <fritsch> got massively better with RPi2
[17:23:14 CEST] <fritsch> but still - the only devices that would help are "external DACs via USB"
[17:23:21 CEST] <fritsch> that have > 2 analog outs
[17:26:08 CEST] <fritsch> wm4: https://github.com/xbmc/xbmc/pull/5479 <- original commit
[17:31:47 CEST] <wm4> hm.
[17:43:05 CEST] <BBB> (lldb) bt
[17:43:06 CEST] <BBB> * thread #1: tid = 0x3ba6b2, 0x00000003, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x3)
[17:43:06 CEST] <BBB>   * frame #0: 0x00000003
[17:43:09 CEST] <BBB> I wonder how that happened
[17:43:24 CEST] <Daemon404> lldb? my condolences
[17:43:45 CEST] <Daemon404> 0x3 also looks interesting
[17:44:05 CEST] <mateo`> hey o/, is there some documentation that describe how the parsers work in ffmpeg ? also, does a video decoder always receive a full frame from the demuxer (+parser i guess) ? 
[17:44:31 CEST] <Daemon404> theres not much documentation on the parsers
[17:44:38 CEST] <Daemon404> i had to read ffmpeg.c when i used them
[17:44:46 CEST] <Daemon404> related: rcombs's patches from last night
[17:44:52 CEST] <mateo`> aren't they auto inserted before the decoders ?
[17:45:00 CEST] <wm4> video decoders expect exactly one frame per packet, yes
[17:45:08 CEST] <wm4> and parsers are sometimes used to achieve this
[17:45:17 CEST] <Daemon404> mateo`, in ffmpeg cli yes
[17:45:59 CEST] <kierank> j-b: qsv patch...
[17:45:59 CEST] <BBB> lldb is apple's gdb
[17:46:02 CEST] <BBB> I don't know why
[17:47:37 CEST] <Daemon404> it's llvm's debugger
[17:47:40 CEST] <mateo`> thanks (and i could have remembered that rule of one packet == one frame, my ffmpeg is a bit rusty :()
[17:48:04 CEST] <BBB> Daemon404: I know that much :D I'm actually quite comfi with it
[17:48:16 CEST] <BBB> Daemon404: I don't know why apple doesn't ship gdb, it only disappeared recently
[17:48:22 CEST] <BBB> whereas gcc disappeared ages ago
[17:48:35 CEST] <j-b> kierank: with static globals, I hope?
[17:48:38 CEST] <j-b> or worse?
[17:48:41 CEST] <kierank> yes
[17:48:54 CEST] <j-b> yes, worse?
[17:49:12 CEST] <kierank> static global
[17:49:18 CEST] <kierank> >+static AVQSVContext* g_av_qsv = NULL;
[17:49:27 CEST] <wm4> lol
[17:49:33 CEST] <ubitux> g for gore
[17:50:15 CEST] <Daemon404> BBB, gdb with clang is iffy is why
[17:50:20 CEST] <Daemon404> and apple uses clang
[17:53:13 CEST] <wm4> apple also are afraid of gpl
[17:53:16 CEST] <wm4> reason enough
[18:11:39 CEST] <BBB> Gramner: I'll address all your comments once I'm done writing a few more versions here, I'm sort of on a roll so I don't want to quit right now :D
[18:11:46 CEST] <BBB> Gramner: don't feel ignored ;)
[18:14:04 CEST] <cone-428> ffmpeg Michael Niedermayer master:6024c865ef13: swresample/resample: merge first iteration into init in bessel()
[18:14:05 CEST] <cone-428> ffmpeg Michael Niedermayer master:1bc873acd6e1: swresample/resample: manually unroll the main loop in bessel()
[18:25:43 CEST] <fritsch> wm4: ^^ now it should 18 ms
[18:26:22 CEST] <wm4> why 18 ms?
[18:26:40 CEST] <j-b> kierank: amazing code.
[18:26:49 CEST] <j-b> kierank: worse clearly the money
[18:27:20 CEST] <wm4> the qsv guy just wants a shared context, right? like vaapi has VADisplay
[18:33:26 CEST] <fritsch> wm4: commit message says: +10% and you had 20 ms before
[18:33:42 CEST] <wm4> I had about 40ms
[18:33:50 CEST] <fritsch> yeah, then scale linearly :-)
[18:33:59 CEST] <nevcairiel> wm4: it already allows putting in a shared context via hwaccel_context or whatever
[18:33:59 CEST] <fritsch> by 2
[18:34:01 CEST] <wm4> I've applied the patch and now it's about 37ms (maybe I made a mistake when rebuilding)
[18:34:09 CEST] <nevcairiel> wm4: but he wants it "automagic" when using ffmpeg.c or some shit
[18:34:17 CEST] <wm4> nevcairiel: I'm writing a reply
[18:39:39 CEST] <fritsch> wm4: yeah, off by one - with a millisecond timer - haha
[19:08:38 CEST] <cone-428> ffmpeg Nicolas George master:4982130d5a7b: lavfi/af_aresample: remove looping on request_frame().
[19:08:39 CEST] <cone-428> ffmpeg Nicolas George master:114f3f526e5a: lavfi/avf_showcqt: remove looping on request_frame().
[19:08:40 CEST] <cone-428> ffmpeg Nicolas George master:9a520c4d52a2: lavfi/avf_showspectrum: remove looping on request_frame().
[19:08:41 CEST] <cone-428> ffmpeg Nicolas George master:8a2e2fc34aae: lavfi/avf_showwaves: remove looping on request_frame().
[19:08:42 CEST] <cone-428> ffmpeg Nicolas George master:a45e96a54fc4: lavfi/vf_alphamerge: remove looping on request_frame().
[19:08:43 CEST] <cone-428> ffmpeg Nicolas George master:4bc7eb2dd232: lavfi/vf_fps: remove looping on request_frame().
[19:08:44 CEST] <cone-428> ffmpeg Nicolas George master:ca540fbdb448: lavfi/vf_select: remove looping on request_frame().
[19:08:45 CEST] <cone-428> ffmpeg Nicolas George master:73a5546ba8e2: lavfi/vf_thumbnail: remove looping on request_frame().
[19:08:46 CEST] <cone-428> ffmpeg Nicolas George master:86b8a82f4f9c: lavfi/vf_w3fdif: remove looping on request_frame().
[19:08:47 CEST] <cone-428> ffmpeg Nicolas George master:4883e5d54012: lavfi/vf_yadif: remove looping on request_frame().
[19:08:48 CEST] <cone-428> ffmpeg Nicolas George master:35c3043ea4e6: lavfi/avf_showspectrum: reindent after last commit.
[19:08:49 CEST] <cone-428> ffmpeg Nicolas George master:d7849248dd0d: lavfi/vf_alphamerge: reindent after last commit.
[19:08:50 CEST] <cone-428> ffmpeg Nicolas George master:90d087247c20: lavfi/vf_w3fdif: reindent after last commit.
[19:08:51 CEST] <cone-428> ffmpeg Nicolas George master:ea2fd42f9df4: lavfi/vf_thumbnail: reindent after last commit.
[19:08:52 CEST] <cone-428> ffmpeg Nicolas George master:8a9fa46e87da: lavfi/vf_yadif: reindent after last commit.
[20:13:23 CEST] <Daemon404> you know i just noticed, the cabac tablegen says: * Header file for hardcoded AAC SBR windows
[20:14:37 CEST] <JEEB> lol
[20:15:01 CEST] <peloverde> Getting ready for USAC I see :p
[20:15:57 CEST] <Daemon404> i had to google that
[20:16:34 CEST] <Daemon404> lul
[21:07:30 CEST] <wm4> michaelni: so what do the parameters of swr_set_compensation() mean?
[21:08:16 CEST] <wm4> hm I guess I can copy whatever ffplay.c does
[21:09:36 CEST] <fritsch> wm4: or what kodi does
[21:09:43 CEST] <fritsch> context is clear
[21:10:01 CEST] <fritsch> third param is also clear: that is what "would normally" happen if nothing needed to be compensated
[21:10:57 CEST] <wm4> not really clear
[21:11:01 CEST] <fritsch> second param is the delta, which means "new number of samples - normal number of samples"
[21:11:11 CEST] <fritsch> https://github.com/xbmc/xbmc/blob/master/xbmc/cores/AudioEngine/Engines/ActiveAE/ActiveAEResampleFFMPEG.cpp#L186
[21:11:14 CEST] <fritsch> see
[21:11:42 CEST] <wm4> is it all kinds of convoluted or am I dumb
[21:11:59 CEST] <fritsch> nope ignore the *m_dst_rate/m_src_rate for a moment
[21:12:03 CEST] <fritsch> and think it would be constant
[21:12:15 CEST] <fritsch> or think of dst_rate == src_rate
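The simplified case described above (dst_rate == src_rate) can be sketched as follows. The struct and the helper `compensation_for` are hypothetical names made up for this illustration; only `swr_set_compensation(struct SwrContext *s, int sample_delta, int compensation_distance)` is the real libswresample entry point, and it is shown in a comment so the sketch compiles without the library.

```c
/* Hypothetical holder for the two integer arguments that follow the
 * context in swr_set_compensation(). */
struct compensation {
    int sample_delta; /* "new number of samples - normal number of samples" */
    int distance;     /* what would normally happen without compensation    */
};

/* Assumes dst_rate == src_rate, so no rate scaling is applied. */
static struct compensation compensation_for(int normal_samples,
                                            int wanted_samples)
{
    struct compensation c;
    c.sample_delta = wanted_samples - normal_samples;
    c.distance     = normal_samples;
    return c;
}

/* With a real context this would then be passed on as:
 *     swr_set_compensation(swr, c.sample_delta, c.distance);
 * When the rates differ, the Kodi code linked above scales both values
 * by dst_rate / src_rate first. */
```

For example, asking for 1025 samples where 1024 would normally be produced yields a delta of +1 spread over a distance of 1024 samples; a negative delta speeds playback up instead.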
[21:12:19 CEST] <wm4> I know it does that because it's the "unit" these functions use
[21:12:27 CEST] <wm4> (even though this unit is documented on another function, urgh)
[21:12:58 CEST] <wm4> so why doesn't swr just allow changing the sample rate on the fly
[21:13:03 CEST] <wm4> same thing, but simpler concept
[21:13:15 CEST] <fritsch> cause it would need to reinit fully new?
[21:13:27 CEST] <fritsch> to get the init data changed?
[21:13:40 CEST] <fritsch> lookup tables, whatever?
[21:13:46 CEST] <wm4> but swr_set_compensation essentially does the same, change the apparent sample rate
[21:16:09 CEST] <fritsch> i don't think so
[21:16:25 CEST] <fritsch> at the end - of course it resamples
[21:16:39 CEST] <fritsch> but by choosing other coefficients within the current context?
[21:28:40 CEST] <rcombs> wm4: I was actually intending to tweak AVStream::bsfs to allow it to be set by the user as well, but only via a lavf API for safely adding bsfs to the list
[21:29:15 CEST] <wm4> lists of bsfs?
[21:31:44 CEST] <wm4> anyway, after ignoring the clunky API and just copying ffplay.c code, it works as I'd have expected
[21:31:46 CEST] <wm4> good enough
[21:40:01 CEST] <durandal_1707> w3fdif=simple is almost 50% faster than yadif with SIMD added
[21:58:24 CEST] <fritsch> durandal_1707: is that code already in master?
[21:58:52 CEST] <durandal_1707> fritsch: nope
[21:58:55 CEST] <fritsch> and concerning image quality?
[22:04:15 CEST] <durandal_1707> well, it is less blurred
[22:05:39 CEST] <wm4> I thought yadif was The Best
[22:05:47 CEST] <wm4> (for open source realtime uses)
[22:06:04 CEST] <iive> that doesn't use motion estimation
[22:17:32 CEST] <fritsch> last time I tried to use w3fdif=simple for kodi
[22:17:36 CEST] <fritsch> it was much, much too slow
[22:17:42 CEST] <fritsch> so we kept yadif
[22:17:54 CEST] <fritsch> we have half and full
[22:18:21 CEST] <fritsch> but will test again when the SIMD changes land
[22:36:48 CEST] <BBB> durandal_1707: dude!
[22:36:56 CEST] <BBB> durandal_1707: nice asm
[22:37:11 CEST] <BBB> durandal_1707: please do follow general conventions, e.g. split only the dsp interface bits in a header file, not the whole filter struct
[22:38:07 CEST] <BBB> durandal_1707: how big is coef?
[22:40:14 CEST] <durandal_1707> 32bit
[22:42:02 CEST] <BBB> 15bit
[22:42:04 CEST] <BBB> not 32
[22:42:12 CEST] <durandal_1707> what?
[22:42:12 CEST] <BBB> see coef_lf/hf
[22:43:06 CEST] <BBB> you can divide all coefs by 2 in that table, make them 14bit, and then use pmaddwd
[22:43:27 CEST] <BBB> then it works on sse2 also, and it's probably faster because you merge several things together
[22:43:51 CEST] <durandal_1707> isnt pmaddwd magic?
[22:44:03 CEST] <BBB> pmaddwd is the best thing in the world after pmaddubsw and pshufb
[22:44:11 CEST] <BBB> pmulld is sse4, so kind of shitty
[22:44:32 CEST] <BBB> pmaddwd is more powerful than pmulld, since it adds and multiplies in one instruction
[22:44:34 CEST] <jamrial> blah, you were faster BBB :P
[22:44:42 CEST] <BBB> sorry
[22:44:43 CEST] <jamrial> i was about to send an email suggesting pmaddwd
[22:44:50 CEST] <durandal_1707> but it uses only 2 registers
[22:45:02 CEST] Action: J_Darnley goes to lookup that instruction
[22:45:15 CEST] <BBB> J_Darnley: also see pmaddubsw
[22:45:46 CEST] <jamrial> i'm sending it anyway
[22:45:51 CEST] <jamrial> also, pmulld is painfully slow
[22:45:59 CEST] <jamrial> slower than any other mul instruction
[22:46:37 CEST] <jamrial> while every other mul instruction takes about 3 to 5 cycles to complete, pmulld can take up to 10 on some arches
[22:46:59 CEST] <J_Darnley> Oh.  A horizontal addition.
[22:49:10 CEST] <BBB> for a word register a and b of values a[0-7] and b[0-7], pmaddwd gives an output register of dwords a0*b0+a1*b1, a2*b2+a3*b3, a4*b4+a5*b5, a6*b6+a7*b7
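The pairing described above can be written out as a scalar model. This is a sketch for a 128-bit register (eight words in, four dwords out); `pmaddwd_model` is a hypothetical name, and the model does not reproduce the hardware's wraparound in the single corner case where all four inputs of a pair are -32768.

```c
#include <stdint.h>

/* Scalar model of pmaddwd: multiply corresponding signed 16-bit words,
 * then add each adjacent pair of 32-bit products into one dword.
 * Assumes no pair has all four inputs equal to -32768, which would
 * overflow int32_t here. */
static void pmaddwd_model(const int16_t a[8], const int16_t b[8],
                          int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t even = (int32_t)a[2 * i]     * b[2 * i];
        int32_t odd  = (int32_t)a[2 * i + 1] * b[2 * i + 1];
        out[i] = even + odd;
    }
}
```

This is why it suits a filter with 14-bit coefficients: placing two pixel values in `a` and two coefficients in `b` performs two taps of the multiply-accumulate per output dword in one instruction.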
[22:52:15 CEST] <Gramner> that's the weird thing about x86 simd. some instructions are just really, really good and it's worth designing your entire algorithm around getting the most use of those instructions. yet for some reason intel doesn't really seem to acknowledge that and extend/improve such instructions
[22:53:26 CEST] <BBB> maybe intel should have a few people lurking in this channel
[22:54:01 CEST] <wm4> I bet all intel does is checking what their compiler wants to do
[22:54:08 CEST] <BBB> intel people: can I please please please have a packing instruction like packsswd that also downshits with rounding? I'm ok if it does so with 15 bits like pmulhrsw, but making that imm8 would be even better
[22:54:12 CEST] <J_Darnley> Now I wish flac could use 16-bit internally.
[22:54:13 CEST] <wm4> and then add instructions which do just that
[22:54:27 CEST] <wm4> BBB: "downshits"? fatal typo
[22:54:28 CEST] <jamrial> downshit? :P
[22:54:32 CEST] <BBB> lol
[22:54:34 CEST] <BBB> oops
[22:54:37 CEST] <BBB> downshifts*
[22:54:50 CEST] <nevcairiel> yeah dont typo, now you will get a shitty instruction, instead of shifty one
[22:54:57 CEST] <BBB> hum...
[22:55:03 CEST] <BBB> I wonder what that instruction will do
[22:57:19 CEST] <Gramner> probably something like gather. you can now do multiple loads in one instruction! except it's a lot slower than doing multiple individual loads for no explainable reason
[22:58:18 CEST] <jamrial> one can hope gather/scatter will get faster with new arches. afaik palignr was kinda slow when it was introduced compared to current cpus
[22:59:08 CEST] <Gramner> i could perfectly understand if it had exactly the same performance as multiple loads, that way they could have support for it early on and then improve it. but it makes no sense for it to actually be slower
[23:01:09 CEST] <BBB> Gramner: but think of the positive; if it's slower, they can make it faster!
[23:01:24 CEST] <BBB> it's like a microsoft new windows release
[23:01:31 CEST] <BBB> you know in advance that it'll be the best windows ever
[23:01:44 CEST] <BBB> (because the previous one was pretty shifty)
[23:02:21 CEST] <BBB> jamrial: I doubt pshufb would be faster for zero extension tbh
[23:02:46 CEST] <BBB> durandal_1707: the magic instruction is movh m0, [data], pxor m1, m1; punpcklbw m0, m1
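A scalar model of that three-instruction sequence, as a sketch assuming 128-bit registers (where x86inc's `movh` loads 8 bytes). The function name `unpack_low_bytes_with_zero` is hypothetical.

```c
#include <stdint.h>

/* Scalar model of: movh m0, [data] / pxor m1, m1 / punpcklbw m0, m1.
 * punpcklbw interleaves the low bytes of m0 with the low bytes of m1;
 * since m1 is all zero, each resulting little-endian word is the source
 * byte zero-extended to 16 bits. */
static void unpack_low_bytes_with_zero(const uint8_t src[8], uint16_t dst[8])
{
    for (int i = 0; i < 8; i++) {
        uint8_t lo = src[i]; /* byte taken from m0          */
        uint8_t hi = 0;      /* byte taken from zeroed m1   */
        dst[i] = (uint16_t)(lo | (hi << 8));
    }
}
```

The words produced this way are always in the range 0..255, so they are safe to feed into a signed 16-bit multiply such as pmaddwd.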
[23:02:53 CEST] <jamrial> for byte to word most likely no, true
[23:03:06 CEST] <cone-428> ffmpeg Paul B Mahol master:0948ba320496: avfilter/x86/vf_blend.asm: add hardmix and phoenix sse2 SIMD
[23:03:07 CEST] <cone-428> ffmpeg Paul B Mahol master:e999210cec5b: avfilter/x86/vf_blend.asm: 11th register is used, update functions
[23:03:18 CEST] <Gramner> yes, with skylake gather got a lot faster. it's now 35% of the speed of doing individual loads/shuffles (compared to like 10% in haswell)
[23:03:27 CEST] <BBB> didn't I add comments to blend.asm?
[23:04:31 CEST] <BBB> oh he did address
[23:04:34 CEST] <BBB> durandal_1707: I had one more comment
[23:04:38 CEST] <BBB> durandal_1707: look at this:
[23:04:39 CEST] <BBB> +        pxor            m1, m2
[23:04:40 CEST] <BBB> +        pxor            m0, m3
[23:04:41 CEST] <BBB> +        pxor            m1, m3
[23:04:53 CEST] <BBB> durandal_1707: these are all bitwise instructions, so a double pxor can always be replaced by a single
[23:05:11 CEST] <BBB> if you do pxor m1, 255 and pxor m1, 128, you can effectively do pxor m1, 127
[23:05:23 CEST] <BBB> and the result is identical for obvious reasons, and saves one instruction
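The folding argument rests on XOR being associative: (x ^ a) ^ b equals x ^ (a ^ b), and 255 ^ 128 is 127. A brute-force check over every byte value, with the hypothetical helper name `xor_constants_fold`:

```c
#include <stdint.h>

/* Returns 1 if XORing with a and then with b gives the same result as a
 * single XOR with (a ^ b) for every possible byte value, 0 otherwise. */
static int xor_constants_fold(uint8_t a, uint8_t b)
{
    for (int x = 0; x < 256; x++) {
        uint8_t two_steps = (uint8_t)((x ^ a) ^ b);
        uint8_t one_step  = (uint8_t)(x ^ (a ^ b));
        if (two_steps != one_step)
            return 0;
    }
    return 1;
}
```

In SIMD terms the combined constant is computed once outside the loop, so each iteration saves one pxor.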
[23:09:10 CEST] <Gramner> another fun oversight (which I complained about to intel engineers before avx2 was even released): you cannot insert/extract elements smaller than 16 bytes from anything other than the lowest 16 bytes of a vector register
[23:10:04 CEST] <Daemon404> by the time you can look at it, its too late to change it
[23:10:43 CEST] <Gramner> with avx-512 you can kind-of do inserts with abusing broadcasts + merging opmasks, but that requires multiple instructions and is probably slow
[23:11:05 CEST] <Gramner> for avx2, yes of course. but they could and should have fixed it for avx-512
[23:11:22 CEST] <Daemon404> might have been quite far in the pipeline
[23:11:24 CEST] <Daemon404> i dunno
[23:11:28 CEST] <Daemon404> hardware is slow
[23:12:57 CEST] <Gramner> doubt it. integer instructions in avx-512 was kind of thrown on afterwards
[23:14:11 CEST] <jamrial> yeah, the avx512bw, dq, vl, ifma52 and vbmi extensions were announced a year or so after avx512 proper
[23:14:55 CEST] <cone-428> ffmpeg Paul B Mahol master:624a1a0e690e: avfilter/x86/vf_blend.asm: hardmix: do same with two pxor instructions less
[23:15:46 CEST] <Gramner> avx-512f is pretty much the larrabee instruction set copy&pasted (iirc with a single bit flipped in every instruction because then it's new!)
[23:32:46 CEST] <cone-428> ffmpeg Christophe Gisquet master:71199ee9077d: dnxhddec: better support for 4:4:4
[23:34:09 CEST] <cone-428> ffmpeg Paul B Mahol master:4e7fa057d2ff: avfilter/vf_w3fdif: scale down coefficiends by 2
[23:34:16 CEST] <andrelec1> hello
[23:34:41 CEST] <andrelec1> i need some help to create a very small project with ffmpeg ... 
[23:34:58 CEST] <andrelec1> i get a lot of errors when i compile my very small program -_-" !
[23:35:31 CEST] <J_Darnley> Then I suggest you install ffmpeg's libraries and headers properly.
[23:35:44 CEST] <andrelec1> yes
[23:37:01 CEST] <andrelec1> but i can't do it properly -_-"
[23:37:22 CEST] <andrelec1> well i just download the source ... configure make make install ...
[23:37:28 CEST] <andrelec1> i create a new project 
[23:38:01 CEST] <andrelec1> add avcodec.h and avformat.h 
[23:38:17 CEST] <andrelec1> and in main just add av_register_all();
[23:38:18 CEST] <J_Darnley> Unless you are going to propose a patch for ffmpeg's code then this is the wrong channel.  Go to #ffmpeg with user questions.
[23:39:08 CEST] <andrelec1> hoo 
[23:39:09 CEST] <andrelec1> sorry
[23:42:53 CEST] <durandal_1707> BBB: no single filter splits only dsp interface bits into separate file
[23:43:09 CEST] <BBB> durandal_1707: can we start doing that? I commented that to earlier patches also
[23:43:28 CEST] <BBB> libavfilter/psnr.h does btw
[23:43:38 CEST] <BBB> ssim also
[00:00:00 CEST] --- Thu Oct  8 2015