[Ffmpeg-devel-irc] ffmpeg-devel.log.20161006

burek burek021 at gmail.com
Fri Oct 7 03:05:02 EEST 2016


[02:18:21 CEST] <cone-136> ffmpeg Timothy Gu master:bdcd586c0df8: pixdesc: Order function prototypes semantically
[02:18:22 CEST] <cone-136> ffmpeg Timothy Gu master:54220ce731f1: pixfmt: Use enum assignment for aliases
[04:49:51 CEST] <rcombs> jamrial: around?
[05:49:58 CEST] <jamrial> rcombs: yeah
[05:51:02 CEST] <rcombs> jamrial: so, I haven't applied your MKV patch series locally, but my understanding is that it wouldn't help this case, because e.g. when writing the final global duration, we don't check if the pb is seekable and just try it anyway…
[05:51:36 CEST] <rcombs> …and then the seek _succeeds_ in the segment case, since the underlying pb (currently) is a file, and even if we mark it non-seekable the seek call still works
[05:51:42 CEST] <jamrial> in write_trailer()? isn't all that under a seekable check?
[05:51:54 CEST] <rcombs> just, we end up at that offset in the last segment, not the first/header one
[05:52:05 CEST] <rcombs> jamrial: writing each stream's duration is; writing the global one is not
[06:06:54 CEST] <jamrial> rcombs: look at the stuff inside the pb->seekable check in mkv_write_trailer. unless i'm missing something, both global and stream durations are in there
[06:10:35 CEST] <rcombs> jamrial: oh, sorry! It's been a while and I misremembered the cause of the problem
[06:11:02 CEST] <rcombs> jamrial: it wasn't the duration, but instead the `end_ebml_master(pb, mkv->segment);`
[06:11:35 CEST] <rcombs> though that's since been wrapped in an `is_live` check, so I suppose it can be worked around
[08:59:36 CEST] <cone-183> ffmpeg Rodger Combs master:4c9c4fe8b21b: lavf/utils: ignore outlier subtitle and data stream end times as well
[08:59:36 CEST] <cone-183> ffmpeg Rodger Combs master:a6bce3ca90de: lavf/utils: avoid using programs for duration when there's only one
[11:30:44 CEST] <ubitux> gopro videos have "moments", which represent specific instants in a video. they are stored in udta in mov
[11:30:52 CEST] <ubitux> should they be exported as a crafted string
[11:30:55 CEST] <ubitux> or "chapters"
[11:31:11 CEST] <ubitux> chapters represent a segment, while moments are technically just instants in time
[11:31:30 CEST] <ubitux> still, it's probably better than a crafted string with ts separated by space/comma/whatever
[11:31:52 CEST] <ubitux> or even multiple moment0=234 moment1=542 moment2=4531 etc
[11:32:28 CEST] <nevcairiel> chapters is probably fine, if no other chapter info is available
[11:32:59 CEST] <wm4> sounds like chapters to me
[11:33:15 CEST] <ubitux> so chapters with end=NOPTS?
[11:33:34 CEST] <nevcairiel> dont think thats valid
[11:33:53 CEST] <ubitux> there is a special case if (end != AV_NOPTS_VALUE && start > end) 
[11:34:02 CEST] <ubitux> at the beginning of avpriv_new_chapter()
[11:34:10 CEST] <ubitux> maybe it's an automatic "til the next/end"
[11:34:27 CEST] <nevcairiel> if it accepts end=NOPTS then thats probably fine, i thought it would also block that
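Editor's sketch of the check ubitux quotes above. It only mirrors the quoted condition; the real avpriv_new_chapter() may apply other validation that is not shown in this log. The names SKETCH_NOPTS_VALUE and sketch_chapter_range_ok are hypothetical stand-ins so the example compiles without libavutil.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for AV_NOPTS_VALUE (an assumption made so that this sketch
 * builds without the FFmpeg headers). */
#define SKETCH_NOPTS_VALUE INT64_MIN

/* Hypothetical helper: returns true when a (start, end) pair is not
 * rejected by the condition quoted in the chat. */
static bool sketch_chapter_range_ok(int64_t start, int64_t end)
{
    /* An unset end is never compared against start, so it passes. */
    if (end != SKETCH_NOPTS_VALUE && start > end)
        return false;
    return true;
}
```

Under this reading, a "moment" at ts 234 with an unset end would pass the quoted check, while a chapter whose end precedes its start would be rejected.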
[13:21:46 CEST] <BtbN> philipl, can't you use http://datamining.xmu.edu.cn/documentation/cuda4.1/group__CUDA__TEXREF_g26f709bbe10516681913d1ffe8756ee2.html#g26f709bbe10516681913d1ffe8756ee2 ?
[13:21:50 CEST] <BtbN> To bind your texture?
[13:59:47 CEST] <jkqxz> wm4:  VAAPI is usable with mesa/gallium on recent AMD graphics cards (and I think some Nvidia, though I haven't seen that working myself) - you could put one of those in a big-endian machine.  (It doesn't support 10-bit surfaces, though, so little point in doing that for now.)
[14:02:23 CEST] <cone-183> ffmpeg Nablet Developer master:8d858674fd1b: avcodec/qsvenc_h264: fix segfault when a53 SEI is not available
[14:04:59 CEST] <iive> Is that a real name? Nablet Developer?
[14:08:09 CEST] <wm4> it's a company
[14:09:17 CEST] <iive> aha
[15:03:46 CEST] <Compn> corporations are people, my friend
[15:10:27 CEST] <iive> Compn: I'll believe it, when texas executes one.
[15:15:27 CEST] Action: Rathann makes some popcorn and heads back to #libav-devel
[15:16:08 CEST] <Rathann> jkqxz: which cards are considered "recent"?
[15:16:50 CEST] <Rathann> jkqxz: is Radeon HD 7970 recent enough or ancient already? ;)
[15:26:05 CEST] <iive> Rathann: it is recent enough to get vulkan support eventually...
[15:26:13 CEST] <jkqxz> Rathann:  GCN-era.  That one should work, though you'd have to try it to see if it actually does.  Decode-only might work on earlier ones, I don't really know.
[15:30:56 CEST] <Rathann> cool
[15:41:10 CEST] <iive> i've been using mesa gallium video decode for ages on hd5670 :D and much earlier ones are supported.
[15:47:03 CEST] <wm4> I thought you use xvmc
[15:47:36 CEST] <iive> mesa gallium provides xvmc, vdpau , vaapi and probably some more.
[15:52:30 CEST] <BtbN> why would someone voluntarily use vaapi?
[15:54:40 CEST] <iive> why? Is it a bad API? I thought that just the implementation is bad - riddled with bugs and random crashes.
[15:54:41 CEST] <kwizart> iive, omxil 
[15:55:12 CEST] <iive> kwizart: i think it needs some external library for omx
[15:56:01 CEST] <BtbN> iive, it's a horrible-to-use API.
[15:56:03 CEST] <kwizart> yep, only for the loader (and the headers), one can use bellagio-omxil, but it's unmaintained
[15:56:23 CEST] <BtbN> And it's just a very thin abstraction around Intel-Hardware.
[15:56:34 CEST] <iive> BtbN: even for decoding only?
[15:56:38 CEST] <BtbN> The mesa devs implementing it for some other hardware had to jump through some hoops to kind of emulate the API.
[15:56:40 CEST] <BtbN> iive, yes.
[15:56:55 CEST] <nevcairiel> for decoding it seems largely similar to the other traditional hwaccels
[15:57:21 CEST] <nevcairiel> although the functions to download an image of a vaapi surface are apparently quite arcane, and there are like 3 of them
[15:57:41 CEST] <BtbN> The "proper" way to download images via VAAPI does not exist in ffmpeg.
[15:57:51 CEST] <BtbN> Which is why it's horribly slow, barely managing 60 FPS on 1080p
[15:57:57 CEST] <BtbN> Because it's a mess
[15:58:03 CEST] <wm4> I had quite some trouble with this
[15:58:13 CEST] <nevcairiel> from what I am told, the "proper" way is just quite annoying for various reasons
[15:58:18 CEST] <nevcairiel> and as such didnt fit ffmpeg
[15:58:29 CEST] <BtbN> Someone would have to write yasm for it. I have code using intrinsics for it.
[15:58:38 CEST] <BtbN> And something is wrong with that code, as it's still slow.
[15:59:12 CEST] <nevcairiel> anton wrote a sse4 download yasm for libav, it might get merged sometime this year if we're lucky =p
[15:59:37 CEST] <jkqxz> Er, what?  How can it possibly be that slow?  (Unless you're using a 3Hz Bay Trail or something.)
[16:00:00 CEST] <nevcairiel> without magic copy it slows down that much for me on windows too
[16:00:07 CEST] <nevcairiel> or at least used to
[16:00:17 CEST] <nevcairiel> more revent drivers and/or hardware seem to have improved that
[16:00:20 CEST] <nevcairiel> recent*
[16:00:34 CEST] <wm4> yeah, in some cases system memcpy seems to be just as fast
[16:00:36 CEST] <nevcairiel> dont have my old intel gpus around anymore
[16:00:47 CEST] <jkqxz> Download runs at ~800fps or so with the GPU copy.  Magic memcpy can match that on the high-power cores, but it seems to suck on the low-power ones.
[16:00:50 CEST] <nevcairiel> gave my sandybridge to my father =p
[16:01:43 CEST] <iive> no dma to system ram?
[16:02:03 CEST] <nevcairiel> old ones didnt have that, not sure if driver or hardware thing
[16:02:11 CEST] <nevcairiel> new setups seem to do it for me
[16:02:28 CEST] <nevcairiel> at least the braswell i tested last week was fine with just memcpy
[16:03:11 CEST] <BtbN> 1080p on bsw with memcpy is ~10 fps for me.
[16:03:34 CEST] <nevcairiel> use windows then! :P
[16:03:41 CEST] <bofh_> anything with enhanced rep movsb/stosb (cpuid eax=7, bit 9 of ebx) *should* be fine with just memcpy
[16:03:43 CEST] <nevcairiel> or update drivers, maybe it helps
[16:03:48 CEST] <bofh_> (assuming memcpy isn't implemented idiotically)
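Editor's sketch of the CPUID test bofh_ describes (leaf 7, EBX bit 9). It assumes a GCC or clang toolchain that ships <cpuid.h> with __get_cpuid_count; on other compilers or non-x86 targets the probe simply reports false. The sketch_ names are hypothetical, and nothing here says whether a given memcpy implementation actually uses the fast string instructions.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

/* Bit 9 of EBX in CPUID leaf 7, subleaf 0, as named in the chat. */
#define SKETCH_ERMS_BIT (1u << 9)

/* Pure helper: decode the feature bit from an EBX value. */
static bool sketch_ebx_has_erms(uint32_t ebx)
{
    return (ebx & SKETCH_ERMS_BIT) != 0;
}

/* Query the running CPU; returns false when CPUID leaf 7 is unavailable. */
static bool sketch_cpu_has_erms(void)
{
#if SKETCH_HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid_count returns 0 if the requested leaf is unsupported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return sketch_ebx_has_erms(ebx);
#else
    return false;
#endif
}
```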
[16:04:03 CEST] <BtbN> which drivers? It's all in kernel or vaapi intel.
[16:04:09 CEST] <BtbN> The graphics drivers don't matter
[16:04:11 CEST] <jkqxz> Yeah, memcpy sucks on Braswell, and magic memcpy isn't much better.  Use the GPU copy instead, it runs at >200fps.
[16:05:04 CEST] <BtbN> GPU memcpy?
[16:06:22 CEST] <jkqxz> What hwcontext_vaapi.c does at the moment.
[16:06:32 CEST] <BtbN> that barely manages 60 FPS for 1080p.
[16:08:49 CEST] <jkqxz> Actually looking at my results, it is ~120fps.  Increasable to ~150fps with a bit of hacking (avoid the second copy), but not in a way that fits cleanly into the current API.
[16:09:21 CEST] <jkqxz> Maybe you are doing something slow after that step?
[16:09:28 CEST] <bofh_> I just took a look at the libav diff for a GPU memcpy() and it's literally just a non-temporal memcpy.
[16:09:50 CEST] <bofh_> (no idea why they use movntdqa when movntps is equivalent and SSE2 instead of SSE4.1, but <shrug>)
[16:09:51 CEST] <BtbN> decoding to -c:v rawvideo -f null -?
[16:10:07 CEST] <bofh_> https://github.com/libav/libav/commit/d7bc52bf456deba0f32d9fe5c288ec441f1ebef5.patch for the curious
[16:10:15 CEST] <nevcairiel> bofh_: that instruction has special magic, its not just non-temporal, its specifically designed for reading from gpu memory
[16:10:32 CEST] <bofh_> nevcairiel: oh. huh. thanks, will reread intel manual.
[16:11:01 CEST] <nevcairiel> (or more specifically, from uncacheable speculative write combining memory)
[16:12:26 CEST] <bofh_> err, isn't the main use of a nontemporal load to move data to/from speculative write-combining memory, omitting caches in the process?
[16:12:37 CEST] <bofh_> load/store*
[16:13:02 CEST] <BtbN> the whole purpose of this is to get the raw frame out of VAAPI without plummeting to 10FPS because memcpy is slow for uswc memory
[16:14:56 CEST] <jkqxz> No, the whole purpose was to get the raw frame out of dxva where there isn't another route.  Doing it for VAAPI was put off for further study because there is already a fast path there, and even on cores where the uswc load is fast the gains were very marginal and required threading.
[16:15:08 CEST] <BtbN> "fast"
[16:15:09 CEST] <bofh_> ahh, I see, movntdqa in SSE4 is the only nontemporal *load*.
[16:15:25 CEST] <bofh_> the rest are only nontemporal *stores*.
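Editor's sketch of a copy loop built on the non-temporal load discussed above, using the SSE4.1 intrinsic _mm_stream_load_si128 (which maps to movntdqa). This is not the libav patch linked earlier; it is a minimal illustration. The helper name sketch_copy_from_uswc is hypothetical. The streaming path is only compiled when the build enables SSE4.1 (for example with -msse4.1) and is only taken when both pointers are 16-byte aligned; otherwise the sketch falls back to plain memcpy. On ordinary cacheable memory the streaming load behaves like a normal load, so any speedup would only be expected on write-combining mappings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/* Hypothetical helper: copy `size` bytes from `src` to `dst`, using
 * streaming loads when SSE4.1 is enabled and both pointers are aligned. */
static void sketch_copy_from_uswc(void *dst, const void *src, size_t size)
{
#if defined(__SSE4_1__)
    if ((uintptr_t)src % 16 == 0 && (uintptr_t)dst % 16 == 0) {
        __m128i *s = (__m128i *)(uintptr_t)src;
        __m128i *d = (__m128i *)dst;
        size_t blocks = size / 16;
        size_t i;

        /* 16 bytes per iteration: non-temporal load, regular aligned store. */
        for (i = 0; i < blocks; i++)
            _mm_store_si128(d + i, _mm_stream_load_si128(s + i));

        /* Copy any tail shorter than 16 bytes with memcpy. */
        memcpy((uint8_t *)dst + blocks * 16,
               (const uint8_t *)src + blocks * 16, size % 16);
        return;
    }
#endif
    memcpy(dst, src, size);
}
```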
[16:15:33 CEST] <BtbN> it's not fast enough for anything beyond 1080p
[16:15:40 CEST] <BtbN> 1080p60 to be precise
[16:20:06 CEST] <BtbN> philipl, https://github.com/FFmpeg/FFmpeg/compare/master...BtbN:master it's getting somewhere.
[16:21:10 CEST] <iive> wasn't glibc memcpy at some point implemented to copy from back to front, because on atom cpu it is faster? i remember some memcpy vs memmove issues
[16:21:41 CEST] <philipl> BtbN: nice stuff.
[16:22:02 CEST] <jkqxz> BtbN:  It is much faster for me.  I will be able to test this later today; maybe if you could share how you are testing we could work out where the discrepancy is coming from.
[16:22:20 CEST] <philipl> BtbN: that tex ref stuff might work to get a linear pointer to the texture.
[16:40:56 CEST] <philipl> BtbN: not sure how I work out the alignment though
[16:51:22 CEST] <cone-183> ffmpeg Matthieu Bouron master:091915165118: lavc/mediacodecdec: fix size variable shadowing in ff_mediacodec_dec_decode
[17:12:17 CEST] <BtbN> philipl, work out the alignment?
[17:14:44 CEST] <philipl> BtbN: GL texture -> 2D array -> device pointer
[17:14:59 CEST] <philipl> Copying to that memory will require an alignment and it won't be 256
[17:15:27 CEST] <BtbN> 256 is definitely enough for that.
[17:16:03 CEST] <philipl> The texture is already laid out in memory. If you blindly use 256, it'll be wrong
[17:17:14 CEST] <BtbN> but that function takes care of that?
[17:17:30 CEST] <philipl> How could it. That information gets discarded
[17:18:06 CEST] <BtbN> you give it the width and height in the ARRAY_DESCRIPTOR
[17:18:11 CEST] <BtbN> And the number of channels
[17:18:19 CEST] <philipl> I start with a GL Texture with some alignment. I map it to a CUarray with internally tracked alignment. I bind the array to a texref (with internal alignment). I get the linear pointer from the texref and that's it.
[17:18:49 CEST] <philipl> the memcpy2d needs to know the alignment to succeed.
[17:19:06 CEST] <BtbN> You create an OpenGL Texture, get the CUtexref for it, call cuTexRefSetAddress2D, and you can use the texture.
[17:19:12 CEST] <philipl> No.
[17:19:59 CEST] <philipl> You create the texture. You then use interop to get a CUarray, then use cuTexRefSetArray, then you use cuTexRefGetAddress
[17:20:16 CEST] <philipl> You can't 'get' a texref from a GL texture.
[17:21:17 CEST] <philipl> The punchline is that if I define an external buffer pool of these things, I can't tell cuvid what the alignment is.
[17:24:29 CEST] <philipl> The API is based around asking for buffers of a given size, where the hwcontext decides on alignment. But I can't provide GL texture backed buffers on that basis. I have to create them based on frame dimensions and they have whatever alignment GL gives them.
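Editor's sketch of why the row alignment (pitch) matters for a 2D copy, in plain C rather than CUDA. It does not use any CUDA or FFmpeg API; sketch_aligned_pitch and sketch_copy_2d are hypothetical helpers. The point is only that each side of the copy needs its own pitch: if the destination rows were laid out by someone else (as with a GL-backed texture), assuming a pitch such as 256-byte alignment puts every row after the first at the wrong offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a row width in bytes up to a multiple of `align`.
 * `align` must be a power of two. */
static size_t sketch_aligned_pitch(size_t width_bytes, size_t align)
{
    return (width_bytes + align - 1) & ~(align - 1);
}

/* Copy `height` rows of `width_bytes` each between two buffers whose rows
 * may be padded differently. Each buffer's pitch is the byte distance
 * between the starts of consecutive rows. */
static void sketch_copy_2d(uint8_t *dst, size_t dst_pitch,
                           const uint8_t *src, size_t src_pitch,
                           size_t width_bytes, size_t height)
{
    for (size_t y = 0; y < height; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, width_bytes);
}
```

For example, a 1920-byte row rounded up to 256-byte alignment has a pitch of 2048 bytes, so a buffer allocated with that assumption does not line up with one whose rows are packed at 1920 bytes.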
[18:47:06 CEST] <cone-183> ffmpeg Carl Eugen Hoyos master:d2af93ac1608: configure: Also try -mstack-alignment for clang,
[19:17:40 CEST] <cone-183> ffmpeg James Almer master:9b8ac526f6b2: avformat/matroskaenc: don't write an empty Colour master element
[19:17:41 CEST] <cone-183> ffmpeg James Almer master:a4044498f777: avformat/matroskadec: check for more reserved values on some Colour elements
[19:38:50 CEST] <jamrial> ubitux: your fate clients have been reporting problems with fate-aic for days now
[19:39:09 CEST] <jamrial> which is weird since no other client does
[19:39:51 CEST] <ubitux> i guess that's the bitflip thing
[19:40:03 CEST] <ubitux> didn't occur much lately though
[19:40:21 CEST] <ubitux> i disabled one disk from the raid, maybe i should switch disk to test
[19:40:25 CEST] <ubitux> give me a moment
[19:41:32 CEST] <ubitux> 78945f998c63b69be7cf84380c08bc39  fate/fate-suite/aic/small_apple_intermediate_codec.mov
[19:41:36 CEST] <ubitux> yup. yet again.
[22:32:40 CEST] <cone-988> ffmpeg James Almer master:d41aeea8a64b: avformat/matroskaenc: print debug message with cluster offsets only if the output is seekable
[22:32:40 CEST] <cone-988> ffmpeg James Almer master:4e3bdf729a80: avformat/matroskaenc: always use a dynamic buffer when writting clusters
[22:32:40 CEST] <cone-988> ffmpeg James Almer master:6724525a1576: avformat/matroskaenc: write a CRC32 element on each Cluster
[22:32:40 CEST] <cone-988> ffmpeg James Almer master:3b189fae7328: avformat/matroskaenc: write a CRC32 element on SeekHead
[22:32:40 CEST] <cone-988> ffmpeg James Almer master:79248795d4af: avformat/matroskaenc: write a CRC32 element on Cues
[22:32:40 CEST] <cone-988> ffmpeg James Almer master:87ce2595de2c: avformat/matroskaenc: write a CRC32 element on Tracks
[22:32:40 CEST] <cone-988> ffmpeg James Almer master:eccefece616e: avformat/matroskaenc: write a CRC32 element on Chapters
[22:32:41 CEST] <cone-988> ffmpeg James Almer master:4687240d52b3: avformat/matroskaenc: write a CRC32 element on Attachments
[22:32:42 CEST] <cone-988> ffmpeg James Almer master:650e17d88b63: avformat/matroskaenc: write a CRC32 element on Tags
[22:32:43 CEST] <cone-988> ffmpeg James Almer master:3bcadf822711: avformat/matroskaenc: write a CRC32 element on Info
[23:43:17 CEST] <cone-988> ffmpeg James Almer master:711bfb33df4e: avformat/matroskaenc: add an option to disable writting CRC32 elements
[00:00:00 CEST] --- Fri Oct  7 2016

