[Ffmpeg-devel-irc] ffmpeg-devel.log.20170105

burek burek021 at gmail.com
Fri Jan 6 03:05:02 EET 2017


[00:36:30 CET] <bofh_>  18/win 18
[00:41:03 CET] <cone-003> ffmpeg 03Bela Bodecs 07master:8c9c43fc43f8: avformat/hlsenc: bugfix in duplicate filename detection
[01:03:38 CET] <Sesse> hmm, frame->width and frame->height are 0x0
[01:03:41 CET] <Sesse> that's inconvenient
[01:04:10 CET] <Sesse> perhaps call av_frame_get
[01:04:50 CET] <Sesse> then I coredumped instead
[01:04:56 CET] <Sesse> and valgrind coredumps
[01:05:10 CET] <Sesse> perhaps time to --enable-debug
[01:07:48 CET] <RiCON> hm, decklink is silently disabled if w32threads aren't disabled or pthreads aren't specifically enabled on windows
[01:07:59 CET] <RiCON> shouldn't configure say something?
[01:09:43 CET] <kierank> Sesse: you need to call ff_set_dimensions
[01:09:55 CET] <Sesse> kierank: with 1920x1088?
[01:10:01 CET] <kierank> have a look at what I do in https://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/cfhd.c
[01:10:02 CET] <Sesse> it dies long before I get to 1080, though
[01:10:16 CET] <kierank> https://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/cfhd.c#l440
[01:10:53 CET] <Sesse> hmm
[01:12:13 CET] <kierank> actually you don't have that
[01:12:15 CET] <kierank> the container does it
[01:12:31 CET] <Sesse> yeah, I have frame->width and frame->height
[01:12:41 CET] <Sesse> that's 1920x1080, although I need to tell it 1920x1088 somehow
[01:12:42 CET] <kierank> you need it in avctx I believe
[01:12:48 CET] <Sesse> still, this isn't my problem (yet)
[01:12:48 CET] <kierank> and then it does get_buffer
[01:12:52 CET] <kierank> and gives you a filled in frame
[01:12:57 CET] <kierank> yeah not sure how to deal with cropping
[01:17:17 CET] <Sesse> let's see, so my base pointer is 0x1a252080
[01:17:23 CET] <Sesse> 0x1a565ba0 is seemingly way off
[01:17:30 CET] <kierank> base pointer of what?
[01:17:35 CET] <Sesse> the y output
[01:17:50 CET] <Sesse> okay, so that really is way too far
[01:17:56 CET] <kierank> have you set the correct pixel format
[01:18:00 CET] <kierank> and are you using the stride
[01:18:01 CET] <Sesse> since it's 3226400 bytes into something that should be 2073600 bytes
[01:18:08 CET] <Sesse>     avctx->pix_fmt= AV_PIX_FMT_YUV420P;
[01:18:11 CET] <Sesse> that ought to be right no?
[01:18:14 CET] <kierank> yes
[01:18:33 CET] <Sesse> I must be doing some silly arithmetic mistake
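Sesse's bounds reasoning can be replayed with the addresses from the log. This is a minimal sketch. It assumes an 8-bit plane is at least width * height bytes. FFmpeg usually pads each line, so the real buffer is somewhat larger. The function names are made up for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes between the start of a plane and an address being written to. */
static uint64_t write_offset(uint64_t plane_base, uint64_t write_addr)
{
    assert(write_addr >= plane_base);
    return write_addr - plane_base;
}

/* Lower bound on the size of an 8-bit plane (assumes no line padding;
 * real frames usually have linesize > width). */
static uint64_t min_plane_size(uint64_t width, uint64_t height)
{
    return width * height;
}

/* Heuristic: non-zero if the write lands past the lower-bound plane size. */
static int exceeds_min_plane(uint64_t plane_base, uint64_t write_addr,
                             uint64_t width, uint64_t height)
{
    return write_offset(plane_base, write_addr) >= min_plane_size(width, height);
}
```

With the numbers above, write_offset(0x1a252080, 0x1a565ba0) is 3226400. That is about 1.56 times the 2073600-byte lower bound for 1920x1080, which is far more than line padding could explain.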
[01:18:37 CET] <Sesse>             uint8_t *dest_y  = frame->data[0] + frame->linesize[0] * (y + field_number);
[01:18:51 CET] <Sesse> then a bunch of
[01:18:51 CET] <Sesse>                 if ((ret = decode_speedhq_block(s, &gb, 0, last_dc, dest_y, linesize_y)) < 0)
[01:18:54 CET] <Sesse> and finally 
[01:18:55 CET] <Sesse>                 dest_y += 16;
[01:18:58 CET] <Sesse> repeat for each macroblock
[01:21:00 CET] <Sesse> so, dest_y=0x1a37e080, but ff_put_pixels_clamped_sse2 writes to address 0x1a565ba0
[01:21:04 CET] <kierank> I assume you have logic to detect end of line
[01:21:04 CET] <Sesse> that's a bit extensive
[01:21:35 CET] <Sesse> note, linesize_y is twice frame->linesize[0], because interlacing
[01:21:36 CET] <Sesse> but still
[01:21:48 CET] <Sesse> and yes, it ought to detect end of line just fine
[01:22:03 CET] <kierank> y + field_number?
[01:22:10 CET] <Sesse> yes, field_number is 0 or 1
[01:22:13 CET] <Sesse> so offset for second field
[01:22:17 CET] <Sesse> I decode as psf
[01:22:22 CET] <kierank> ok
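The interlaced addressing described above can be written as a small offset calculation. Each block starts at frame line y + field_number and then steps with linesize_y = 2 * linesize[0]. This sketch uses hypothetical helper names. It assumes y is even, so a block never touches the other field's lines. In a 1088-line frame each field has 544 lines, as kierank notes, but the frame-line index still runs up to 1087:

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of row k (0..7) of a block whose top row is frame line
 * y + field, stepping with the doubled stride used for field decoding. */
static ptrdiff_t block_row_offset(int y, int field, int k, ptrdiff_t linesize)
{
    assert(field == 0 || field == 1);
    assert(y % 2 == 0);
    ptrdiff_t dest = linesize * (y + field);   /* data[0] + linesize[0] * (y + field_number) */
    return dest + (2 * linesize) * k;          /* k rows further at linesize_y */
}

/* Frame line touched by that row: always y + field + 2k. */
static int block_row_line(int y, int field, int k)
{
    return y + field + 2 * k;
}
```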
[01:22:30 CET] <Sesse> okay, so at the last macroblock
[01:22:31 CET] <Sesse> dest_y=0x1a3ba0d0
[01:22:36 CET] <Sesse> but write is to 0x1a586148
[01:23:07 CET] <Sesse> I must be misunderstanding idct_put somehow
[01:23:27 CET] <kierank> y goes from [0, 544]?
[01:23:33 CET] <kierank> 543 i mean?
[01:23:35 CET] <Sesse> should be 1080
[01:23:37 CET] <Sesse> in theory
[01:23:41 CET] <Sesse> well, 1088
[01:23:49 CET] <Sesse> let me see if perhaps it's chroma I messed up
[01:23:56 CET] <kierank> linesize i think
[01:24:11 CET] <kierank> frame->linesize[0] * (y + field_number);
[01:24:12 CET] <kierank> that is weird
[01:24:16 CET] <Sesse> why? it's the stride
[01:24:21 CET] <kierank> the latter
[01:24:30 CET] <Sesse> I don't understand
[01:24:41 CET] <kierank> each field is 544 lines
[01:24:43 CET] <Sesse> no
[01:24:47 CET] <Sesse> the image is 1088 lines
[01:24:53 CET] <Sesse> and I decode it as such
[01:24:53 CET] <kierank> sure so each field is 544
[01:24:56 CET] <Sesse> but I only write to every other line
[01:25:05 CET] <kierank> yes and you loop to 1088 over that
[01:25:07 CET] <kierank> as far as I can tell
[01:25:10 CET] <Sesse> no, I don't
[01:25:19 CET] <Sesse> it's a bit more complicated, due to the slicing and stuff
[01:25:31 CET] <Sesse> okay, the issue isn't y, it's cb
[01:25:52 CET] <kierank> then your subsampling is miscalculated
[01:25:55 CET] <Sesse> probably
[01:26:09 CET] <Sesse> let me see precisely which block is going wrong
[01:26:24 CET] <Sesse> aaah
[01:26:25 CET] <Sesse> it's 422, of course
[01:26:26 CET] <Sesse> not 420
[01:26:33 CET] <kierank> heh
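With avctx->pix_fmt set to AV_PIX_FMT_YUV420P, the allocated Cb and Cr planes have half as many lines as 4:2:2 data needs. A decoder writing 4:2:2 chroma therefore runs past them. AV_PIX_FMT_YUV422P is the matching format.

Below is a sketch of the plane-size arithmetic. Here log2_w/log2_h play the role of the log2_chroma_w/log2_chroma_h fields in FFmpeg's pixel format descriptors. The helper names are illustrative:

```c
#include <assert.h>

/* Right shift rounding up, for non-negative a (like AV_CEIL_RSHIFT). */
static int ceil_rshift(int a, int b) { return (a + (1 << b) - 1) >> b; }

/* Bytes in one 8-bit chroma plane without padding. */
static long chroma_plane_size(int width, int height, int log2_w, int log2_h)
{
    assert(width > 0 && height > 0);
    return (long)ceil_rshift(width, log2_w) * ceil_rshift(height, log2_h);
}
```

For 1920x1088, 4:2:0 (shifts 1, 1) gives 522240 bytes per chroma plane, while 4:2:2 (shifts 1, 0) needs 1044480. So 4:2:2 chroma written into a 4:2:0 buffer overruns each chroma plane by its full size.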
[01:26:59 CET] <Sesse> hey, I get output... garbled output, though
[01:27:02 CET] <Sesse> probably some dct scale
[01:27:13 CET] <Sesse> (it's super-intense)
[01:27:29 CET] <kierank> the mpeg2 dct should work in principle
[01:27:34 CET] <kierank> you're not doing anything odd
[01:27:37 CET] <Sesse> I need to scale down some
[01:27:38 CET] <Sesse> I suppose
[01:29:20 CET] <Sesse> okay, no, still only garbage
[01:31:23 CET] <Sesse> probably something with my zigzag or something
[01:31:46 CET] <kierank> well see if dc works
[01:31:52 CET] <Sesse> don
[01:31:53 CET] <kierank> also remember to zero the buffer
[01:31:55 CET] <Sesse> don't be so reasonable
[01:33:14 CET] <Sesse> luma-only doesn't work, and seems to give me lots of chroma
[01:33:15 CET] <Sesse> that's fun
[01:33:26 CET] <Sesse> err, of course that's because I'm doing dc-only
[01:33:29 CET] <Sesse> but that didn't work either, so
[01:34:49 CET] <Sesse> ok, I'd better sleep, Real Work(TM) tomorrow
[01:35:13 CET] <Sesse> good night :-)
[10:09:25 CET] <cone-538> ffmpeg 03Steven Liu 07master:93593674bc8d: avformat/hlsenc: fix memleak in hlsenc
[11:05:40 CET] <durandal_170> michaelni: how much bits can init vlc sparse accept?
[11:06:11 CET] <durandal_170> I seem to not get some numbers
[11:45:11 CET] <cone-538> ffmpeg 03Carl Eugen Hoyos 07master:f3adb6f74b8c: configure: Fix standalone compilation of the ljpeg encoder.
[12:08:26 CET] <Sesse> kierank: fwiw, I think it's not my IDCT that's wrong, but my coefficient parsing
[12:08:37 CET] <Sesse> kierank: it ends up eating up the data way too fast, because the last half or so of the slice is just nothing
[12:09:02 CET] <cone-538> ffmpeg 03Michael Niedermayer 07master:8f1d18a91be9: avcodec/bitstream: assert that *_size in ff_init_vlc_sparse() is valid
[12:09:03 CET] <cone-538> ffmpeg 03Michael Niedermayer 07master:7ca2a23aaad7: avcodec/bitstream: Document the values supported for *_size in ff_init_vlc_sparse()
[12:09:23 CET] <Sesse> with my own code, I had safeguards in place that would assert-fail on various impossible conditions (like trying to write to coefficient 100), but I don't have those in the ffmpeg version right now
[12:09:44 CET] <Sesse> (especially since EOB is implemented as a symbol with just a very high run)
[12:18:48 CET] <cone-538> ffmpeg 03Carl Eugen Hoyos 07master:e6050d81b019: lavc/Makefile: Clean up the amv encoder dependencies.
[12:24:41 CET] <michaelni> durandal_170, added docs and assert but maybe you meant some other bits than these
[12:26:38 CET] <cone-538> ffmpeg 03Bela Bodecs 07master:4c63910bdbf0: vformat/hlsenc: typo in default localtime pattern
[12:45:30 CET] <durandal_170> michaelni: the length of bits, have table with maximal length of 18
[12:50:29 CET] <michaelni> the used nb_bits * depth must be >= 18 then
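michaelni's rule gives a quick way to pick the lookup depth for codes up to 18 bits long. The depth must be at least ceil(max_len / nb_bits). This is a minimal sketch with a hypothetical helper name. The result is what would be passed as the max_depth argument when reading with get_vlc2():

```c
#include <assert.h>

/* Smallest number of table lookups so that nb_bits * depth >= max_code_len. */
static int min_vlc_depth(int max_code_len, int nb_bits)
{
    assert(max_code_len > 0 && nb_bits > 0);
    return (max_code_len + nb_bits - 1) / nb_bits;
}
```

For durandal_170's table with a maximum length of 18, nb_bits = 9 needs depth 2, and nb_bits = 7 needs depth 3.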
[12:55:04 CET] <durandal_170> here is spectrum of native and reference decoder output: https://transfer.sh/RARww/mpvshot
[12:55:33 CET] <durandal_170> reference is on the right
[12:55:55 CET] <durandal_170> why it is so rich?
[16:20:43 CET] <durandal_170> I finished decoder, just need to rename some variables
[16:54:49 CET] <BBB> durandal_170: \o/
[18:11:10 CET] <bofh_> atomnuker: so I just applied your patch locally and finished fighting w/perf_events, so hopefully we'll see in a sec.
[18:32:51 CET] <Sesse> kierank: now my coefficient reading is fine, now it's about the dct
[18:32:58 CET] <Sesse> it's almost as if my DC is wrong
[18:33:15 CET] <kierank> or a bit large
[18:34:23 CET] <Sesse> https://home.samfundet.no/~sesse/shq2_ffmpeg.png
[18:35:24 CET] <kierank> maybe a shift thing
[18:35:34 CET] <kierank> have a look how dc_precision does it
[18:35:42 CET] <Sesse> I've tried shifting both ways
[18:35:43 CET] <Sesse> neither really helps
[18:35:54 CET] <Sesse> also tried dc-only, no go
[18:36:56 CET] <BBB> is this a rgb codec?
[18:37:01 CET] <Sesse> no, ycbcr
[18:37:12 CET] <BBB> ok, so start with ignoring color, i.e. do y only
[18:37:52 CET] <BBB> the greenness is luma being off, which is likely a missing +128 or so (note how luma [0,255] is actually [-128,127], so a coded zero should be represented as 128)
[18:38:05 CET] <BBB> sorry, greenness = chroma being off
[18:38:13 CET] <Sesse> hmm
[18:38:19 CET] <Sesse> there's perhaps some other convention here
[18:38:28 CET] <Sesse> my decoder assumes chroma=0 is no chroma
[18:39:40 CET] <BBB> I mean the dct coefficient values, so the dc of chroma coefficient blocks basically
[18:39:43 CET] <Sesse> okay, so I memset chroma to 128 everywhere, now it's all black and white
[18:39:50 CET] <BBB> cool
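BBB's point about the chroma centre can be shown directly. 8-bit YUV stores "no colour" as 128, so values that are zero-centred in the codec's own convention need +128 before they are written out. This sketch assumes clamping to [0, 255] is acceptable. The helper name is illustrative:

```c
#include <stdint.h>

/* Map a chroma value centred on 0 (0 = no colour) to 8-bit storage,
 * where 128 means no colour, clamping to [0, 255]. */
static uint8_t chroma_to_u8(int centred)
{
    int v = centred + 128;
    if (v < 0)
        v = 0;
    if (v > 255)
        v = 255;
    return (uint8_t)v;
}
```

Filling both chroma planes with 128, as Sesse did with memset, is the same as writing chroma_to_u8(0) everywhere. That is why the picture turned black and white.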
[18:39:54 CET] <Sesse> so, for idct
[18:39:56 CET] <BBB> is the b/w correct?
[18:39:58 CET] <Sesse> no
[18:40:00 CET] <Sesse> still entirely wrong
[18:40:02 CET] <BBB> can I see? :)
[18:40:08 CET] <BBB> (not surprising btw)
[18:40:08 CET] <Sesse> for idct, what's the right base for 128?
[18:40:51 CET] <Sesse> https://home.samfundet.no/~sesse/shq2_ffmpeg_2.png
[18:41:03 CET] <Sesse> https://home.samfundet.no/~sesse/shq2_luma_10.png is roughly what it looks like with my own (non-ffmpeg) decoder
[18:43:28 CET] <BBB> do you read coefficients with the correct scantable permutation?
[18:43:35 CET] <Sesse> I believe so
[18:43:47 CET] <Sesse>     ff_init_scantable_permutation(ctx->idsp.idct_permutation, FF_IDCT_PERM_NONE);
[18:43:50 CET] <Sesse>     ff_init_scantable(ctx->idsp.idct_permutation, &ctx->intra_scantable, ff_zigzag_direct);
[18:44:01 CET] <Sesse> and then I read s->intra_scantable.permutated
[18:44:26 CET] <Sesse> if I go dc-only, it still is wrong in the same fashion (just more blocky)
[18:44:37 CET] <Sesse> so there's something with the offset that ffmpeg wants
[18:44:43 CET] <Sesse>     block[scantable[0]] = (last_dc[component] * quant_matrix[0] + 1024);
[18:44:53 CET] <Sesse> this gives correct results with my own decoder (with my own idct)
[18:45:16 CET] <BBB> you mean while (coeff = read_coeff()) { coeff[I believe the 1024 is correct in ffmpeg
[18:45:20 CET] <BBB> it looks like it shifts by 3 also
[18:45:44 CET] <Sesse> I lost you
[18:47:08 CET] <BBB> 1024 = 128 * 8 or 128 << 3
[18:47:20 CET] <Sesse> I tried >> 3 at the end, if that's what you're asking
[18:47:29 CET] <BBB> at the end of what?
[18:47:36 CET] <Sesse> block[scantable[0]] = (last_dc[component] * quant_matrix[0] + 1024) >> 3;
[18:48:31 CET] <Sesse> for output pixel 0..255, what is the correct range of the dc coefficient?
[18:48:42 CET] <Sesse> 0..255? 0..1023? -2048..2048? something else?
[18:48:53 CET] <BBB> 0..2048 I believe
[18:48:59 CET] <BBB> for the idct impl that you're using
[18:49:07 CET] <BBB> DC_SHIFT is 3 in simple_idct_template.c
[18:49:17 CET] <Sesse> I didn't actively pick an idct
[18:49:18 CET] <Sesse> I did
[18:49:19 CET] <Sesse>     ff_idctdsp_init(&ctx->idsp, avctx);
[18:49:26 CET] <Sesse> and supposed that would give me something reasonable
[18:49:35 CET] <BBB> which idct func are you calling?
[18:49:47 CET] <BBB> and what type of idct are you looking for (8x8 right?)
[18:49:54 CET] <Sesse> ctx->idsp->      s->idsp.idct_put(dest, linesize, block);
[18:49:55 CET] <Sesse> eh
[18:49:55 CET] <Sesse>       s->idsp.idct_put(dest, linesize, block);
[18:50:00 CET] <Sesse> and yes, 8x8
[18:51:36 CET] <BBB> I don't think you want the >> 3
[18:51:57 CET] <BBB> (in block[scantable[0]] assignment)
[18:51:57 CET] <Sesse> doesn't work with it, doesn't work without it
[18:52:12 CET] <Sesse> but just setting block[scantable[0]] = 1024 seems to give me an uniform gray
[18:52:17 CET] <BBB> yes
[18:52:20 CET] <BBB> that should set all values to 128
[18:52:22 CET] <BBB> so gray
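Why block[0] = 1024 gives a uniform 128 can be checked against the textbook 8x8 IDCT. With orthonormal scaling, a DC-only block produces F/8 at every pixel. This matches the 1024 = 128 << 3 relationship BBB mentions. Below is a slow floating-point sketch, not FFmpeg's integer simple_idct, whose rounding differs slightly. The cosine table avoids linking libm, and the helper names are illustrative:

```c
#include <assert.h>

/* cos(m * pi / 16) for m >= 0, via symmetry from a small table. */
static double cos_pi16(int m)
{
    static const double c[9] = {
        1.0,                 0.98078528040323044, 0.92387953251128674,
        0.83146961230254524, 0.70710678118654752, 0.55557023301960218,
        0.38268343236508977, 0.19509032201612826, 0.0
    };
    assert(m >= 0);
    m %= 32;                            /* period 2*pi = 32 steps of pi/16 */
    if (m > 16)
        m = 32 - m;                     /* cos(2*pi - t) == cos(t) */
    return m <= 8 ? c[m] : -c[16 - m];  /* cos(pi - t) == -cos(t) */
}

/* Reference orthonormal 8x8 inverse DCT. in[v * 8 + u] is the coefficient
 * for vertical frequency v and horizontal frequency u; out is unclamped. */
static void idct8x8_ref(const int in[64], double out[64])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            double sum = 0.0;
            for (int v = 0; v < 8; v++)
                for (int u = 0; u < 8; u++) {
                    double cu = u ? 1.0 : cos_pi16(4);  /* 1/sqrt(2) */
                    double cv = v ? 1.0 : cos_pi16(4);
                    sum += cu * cv * in[v * 8 + u]
                         * cos_pi16((2 * x + 1) * u)
                         * cos_pi16((2 * y + 1) * v);
                }
            out[y * 8 + x] = sum / 4.0;
        }
}
```

A DC of 0 gives 0 everywhere, so a zero-centred DC must be offset before the IDCT to land on mid-grey.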
[18:52:37 CET] <Sesse> so it's curious
[18:52:40 CET] <BBB> what is the value range of last_dc[..] and quant_matrix[0]?
[18:52:41 CET] <Sesse> maybe something is off in my quant matrix
[18:52:49 CET] <Sesse> quant_matrix[0] should always be 16, period
[18:52:50 CET] <BBB> it's possible, yes
[18:53:01 CET] <BBB> always 16... ok
[18:53:07 CET] <Sesse> and evidently, it is
[18:53:12 CET] <BBB> and last_dc[component] is ...
[18:53:16 CET] <BBB> 0-255?
[18:53:43 CET] <Sesse> no, it's a much wider range, including negative
[18:53:49 CET] <Sesse> iirc it's something like -1024..1024
[18:53:59 CET] <BBB> ok, so a range of 2048
[18:54:06 CET] <Sesse> don't take that as gospel
[18:54:09 CET] <Sesse> but it's somewhere around that
[18:54:34 CET] <BBB> then you need block[scantable[0]] = 1024 + ((last_dc[component] * quant_matrix[0]) >> 4);
[18:55:12 CET] <Sesse> hmm, I'm trying to mimic basically what their code is doing, to get as close in precision as possible
[18:55:30 CET] <BBB> then you need your own idct which is simple_idct with 4 more bits (similar to prores_idct)
[18:55:52 CET] <BBB> it's totally fine, but for testing it may be quicker to just remove the bits and use simple_idct, just to make sure it's all correct'ish
[18:55:54 CET] <Sesse> are you saying I might be overflowing at some point?
[18:56:26 CET] <BBB> no, just that you're truncating signal
[18:56:33 CET] <BBB> it's not very important for testing purposes
[18:57:10 CET] <Sesse> I struggle with why I need to change that 1024 factor compared to last_dc*quant_matrix in ffmpeg, but not in my own, completely plain idct
[18:57:24 CET] <BBB> one probably adds it as baseline, the other doesn't
[18:57:35 CET] <Sesse> mine doesn't
[18:57:43 CET] <BBB> you sure?
[18:57:49 CET] <Sesse> well, I wrote it myself
[18:58:15 CET] <BBB> idct(coeff[]) with a coeff dc centered around 0 (value range [-1024,1024]) would give negative pixel values in half of the cases, right?
[18:58:18 CET] <BBB> that doesn't make any sense
[18:58:22 CET] <Sesse> thus the +1024
[18:58:24 CET] <BBB> right
[18:58:25 CET] <Sesse> which is already there
[18:58:28 CET] <Sesse>     block[scantable[0]] = (last_dc[component] * quant_matrix[0] + 1024);
[18:58:43 CET] <BBB> the 1024 needs to be added pre-dequant though
[18:58:53 CET] <Sesse> so that's what I don't understand
[18:58:54 CET] <BBB> otherwise it wouldn't work obviously
[18:58:58 CET] <Sesse> that's not obvious to me
[18:59:03 CET] <BBB> hm ...
[18:59:25 CET] <BBB> dc is the average pixel value in the block right?
[18:59:29 CET] <BBB> that should always be positive
[18:59:34 CET] <Sesse> which +1024 does
[18:59:37 CET] <BBB> so block[scantable[0]] must always be positive
[19:00:00 CET] <BBB> if last_dc[component] is [-1024,1024], and quant_matrix[0] is 16, then the product of these 2 is [-16k,16k]
[19:00:06 CET] <BBB> so the offset must also be 16k
[19:00:11 CET] <BBB> to get an always positive dc
[19:00:16 CET] <BBB> otherwise it doesn't make any sense
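BBB's range argument can be worked through for quant_matrix[0] = 16. It takes last_dc to be roughly in [-1024, 1024], which is Sesse's own estimate, not a specification. The product then spans [-16384, 16384]. Adding 1024 alone leaves most of that range negative or far above the 0..2048 DC range the IDCT expects. Scaling by >> 4 first maps it onto [0, 2048].

Below is a sketch contrasting the two forms, with hypothetical function names. It assumes arithmetic right shift of negative values, as FFmpeg does:

```c
#include <assert.h>

/* The form from the log: the offset only shifts the range by 1024. */
static int dequant_dc_unscaled(int last_dc, int q0)
{
    assert(q0 > 0);
    return last_dc * q0 + 1024;
}

/* BBB's suggestion (18:54:34): scale the product down by 4 bits, then add
 * the 1024 that becomes pixel value 128 after the IDCT's divide-by-8. */
static int dequant_dc(int last_dc, int q0)
{
    assert(q0 > 0);
    return 1024 + ((last_dc * q0) >> 4);
}
```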
[19:00:28 CET] <Sesse> hmm, let me think
[19:00:44 CET] <BBB> is it possible that in your (non-ffmpeg) code, quant_matrix[0] is not actually 16 but 1?
[19:00:57 CET] <BBB> then it would totally make sense again
[19:00:57 CET] <Sesse> yes!
[19:00:58 CET] <Sesse> that's it
[19:01:00 CET] <BBB> aha
[19:01:01 CET] <BBB> ok
[19:01:04 CET] <BBB> well that was easy
[19:01:04 CET] <Sesse> you're perfectly right
[19:01:15 CET] <Sesse> hmm
[19:01:18 CET] <Sesse> no
[19:01:30 CET] <Sesse> sorry, I've forgotten all of my own code
[19:01:36 CET] <Sesse> in a mere few days
[19:02:03 CET] <Sesse> now dc level is right
[19:02:05 CET] <Sesse> on to ac
[19:02:15 CET] <BBB> the ac shouldn't need the offset
[19:02:20 CET] <Sesse> this is correct
[19:02:23 CET] <Sesse> and I also don't
[19:02:28 CET] <BBB> since all acs are centered around 0
[19:02:41 CET] <Sesse> this looks more like a permutation problem to me
[19:03:00 CET] <Sesse> https://home.samfundet.no/~sesse/shq2_ffmpeg_3.png
[19:03:02 CET] <BBB> that's what I said earlier ;) your code looks right but it's very tricky
[19:03:45 CET] <BBB> you're right-shifting acs by 4 also after dequant right?
[19:03:53 CET] <Sesse> yes
[19:03:55 CET] <Sesse>             block[j] = (level * quant_matrix[i]) >> 4;
[19:04:01 CET] <Sesse> j is scantable[i]
[19:04:35 CET] <BBB> and scantable = s->intra_scantable.permutated ?
[19:04:37 CET] <Sesse> yes
[19:05:01 CET] <BBB> I used to know how that scantable permutation monster worked... let me refresh my memory ..
[19:05:32 CET] <Sesse> if I just say block[i], I get another result, that's wrong in a different way
[19:07:42 CET] <BBB> I don't think you're supposed to call ff_init_scantable_permutation yourself
[19:07:52 CET] <BBB> I think the dsp implementation should do that for you
[19:08:04 CET] <BBB> you just call ff_idctdsp_init(..) and then read the value it set for you
[19:08:12 CET] <BBB> (using ff_scantable_init)
[19:08:51 CET] <Sesse> hey, that worked
[19:08:54 CET] <kierank> Sesse: https://github.com/kierank/ffmpeg-sstp/blob/master/libavcodec/mpeg4videodec.c#L1789
[19:08:58 CET] <kierank> oh
[19:09:14 CET] <Sesse> now it decodes just fine
[19:09:37 CET] <BBB> \o/
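What went wrong with the scan order can be shown without libavcodec. ff_init_scantable() composes the zigzag scan with the IDCT's coefficient permutation. ff_idctdsp_init() chooses that permutation (idsp.idct_permutation) to suit the IDCT it selects. Forcing FF_IDCT_PERM_NONE therefore puts coefficients where the chosen IDCT does not expect them.

Below is a standalone sketch of that composition. The transpose permutation is only an example of a non-identity layout, not necessarily the one the selected IDCT uses. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Zigzag scan: raster index of the i-th coefficient in scan order
 * (the same order as FFmpeg's ff_zigzag_direct). */
static const uint8_t zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
};

/* Compose scan and IDCT permutation: scan position i lands at
 * permutation[zigzag[i]] in the block the IDCT reads. */
static void build_permutated(const uint8_t permutation[64], uint8_t permutated[64])
{
    for (int i = 0; i < 64; i++)
        permutated[i] = permutation[zigzag[i]];
}

static void identity_perm(uint8_t p[64])
{
    for (int i = 0; i < 64; i++)
        p[i] = (uint8_t)i;
}

/* Example non-identity layout: swap rows and columns. */
static void transpose_perm(uint8_t p[64])
{
    for (int i = 0; i < 64; i++)
        p[i] = (uint8_t)(((i & 7) << 3) | (i >> 3));
}

/* Store one AC coefficient as in the log:
 * block[j] = (level * quant_matrix[i]) >> 4, with j = permutated[i]. */
static void put_coeff(int16_t block[64], const uint8_t permutated[64],
                      const uint8_t quant_matrix[64], int i, int level)
{
    assert(i >= 0 && i < 64);
    block[permutated[i]] = (int16_t)((level * quant_matrix[i]) >> 4);
}
```

The same second scanned coefficient lands at index 1 under the identity layout but at index 8 under the transposed one. Mixing up the two layouts produces the "right energy, wrong place" pictures seen above.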
[19:10:04 CET] <Sesse> it needs 50% of one core, though
[19:10:05 CET] <Sesse> for 1080p
[19:10:57 CET] <BBB> and your own (non-ffmpeg) decoder?
[19:11:07 CET] <Sesse> that's way way way slower
[19:11:10 CET] <Sesse> 500ms per frame or something
[19:11:13 CET] <Sesse> it's not built for speed at all
[19:11:39 CET] <Sesse> newtek claims ~2500 fps for 1080p, although that's all four threads and on a desktop i7
[19:11:44 CET] <Sesse> and encoding
[19:12:54 CET] <BBB> 2500 fps o_o
[19:13:17 CET] <BBB> maybe that's ricing
[19:13:28 CET] <Sesse> they claim avx2 and stuff
[19:13:40 CET] <kierank> BBB: yes their code is really riced
[19:13:48 CET] <BBB> there's a lot of opportunity to improve
[19:13:56 CET] <kierank> it's one function with tons of inline asm
[19:14:00 CET] <kierank> we wouldn't allow that kind of thing here
[19:14:08 CET] <BBB> for example, if you permutate the dequant table, you can merge the dequant into the idct (see prores)
[19:14:25 CET] <BBB> the bitstream reading itself will become the main factor of cpu time at that point
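Here is one way to read BBB's suggestion. If the quantiser matrix is re-indexed once into the layout the IDCT reads, the bitstream loop can store raw levels. Dequantisation then becomes a plain pass over the block in memory order, which an optimised IDCT could fold into its first stage.

Below is a sketch of the re-indexing only, not prores's actual code. It ignores the DC offset, which would still be applied separately, and reuses the >> 4 scaling from the log. The helper names are hypothetical:

```c
#include <stdint.h>

/* Re-index a quantiser matrix given in scan order into block layout, so
 * qmat_by_pos[k] multiplies block[k]. permutated[] must be a bijection,
 * e.g. the composed scantable. */
static void permute_qmat(const uint8_t qmat_scan[64], const uint8_t permutated[64],
                         uint8_t qmat_by_pos[64])
{
    for (int i = 0; i < 64; i++)
        qmat_by_pos[permutated[i]] = qmat_scan[i];
}

/* Dequantise a block of raw levels in memory order. An optimised decoder
 * would merge this multiply into the IDCT instead of running it separately. */
static void dequant_block(int16_t block[64], const uint8_t qmat_by_pos[64])
{
    for (int k = 0; k < 64; k++)
        block[k] = (int16_t)((block[k] * qmat_by_pos[k]) >> 4);
}
```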
[19:14:59 CET] <Sesse> kierank: supposedly also avx2
[19:15:03 CET] <Sesse> I guess they can do multiple idcts in parallel then
[19:16:16 CET] <BBB> that makes sense
[19:16:23 CET] <Sesse> http://pastebin.com/w8nQBTwb -- current profile
[19:16:31 CET] <BBB> you can do 2 8x8s using avx2
[19:16:32 CET] <kierank> most of that is ffmpeg
[19:16:38 CET] <kierank> do ffmpeg -i foo -f null -
[19:16:50 CET] <kierank> most of that is ffplay I mean
[19:16:55 CET] <kierank> doing chroma conversions and whatever
[19:16:56 CET] <BBB> most of that is swscale
[19:17:07 CET] <Sesse> Decoder (codec pcm_s16le) not found for input stream #0:1
[19:17:08 CET] <Sesse> hmm
[19:17:09 CET] <Sesse> let's see
[19:17:24 CET] <Sesse> I can never figure out the ffmpeg command line
[19:17:30 CET] <kierank> ffmpeg -i foo -f null -
[19:17:42 CET] <Sesse> yes, that's what I'm doing
[19:17:43 CET] <kierank> -an i think
[19:17:52 CET] <kierank> i had this problem actually myself
[19:17:55 CET] <Compn> who trolled everyone into working on shq all of a sudden :P
[19:17:58 CET] <Sesse> there we are
[19:17:59 CET] <Sesse> frame=  250 fps=133 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=5.31x    
[19:18:11 CET] <BBB> 133 fps is  not the worst
[19:18:14 CET] <Sesse> this is on a i7-4600U
[19:18:34 CET] <Sesse> http://pastebin.com/YUmT1Jsx
[19:18:35 CET] <Compn> can always wait until someone pays you to optimize it :D
[19:18:38 CET] <BBB> it's an intra-only codec right? so it's easy to speed it up by marking it as an intra only codec so it auto-uses frame threading
[19:18:39 CET] <Compn> leave it slow :PPP
[19:18:52 CET] <kierank> and use ff_thread_get_buffer iirc as well
[19:18:59 CET] <BBB> what's idct?
[19:19:19 CET] <Sesse> I don't know why it chose idct, it seems very mmx
[19:20:06 CET] <BBB> it's sse2 simple_idct impl
[19:20:14 CET] <Sesse> libavcodec/x86/simple_idct.c
[19:20:26 CET] <BBB> oh no, it's mmx
[19:20:28 CET] <BBB> you're right
[19:20:29 CET] <Sesse> it's in there
[19:20:29 CET] <BBB> ...
[19:20:34 CET] <kierank> yeah default mpeg-2 is mmx
[19:20:40 CET] <BBB> holy crap we have actual mmx code
[19:20:54 CET] <BBB> anyway that should be trivial to sse2ize, I wrote a sse2 idct like simple_idct for prores
[19:21:02 CET] <BBB> just adapt that a tiny bit and it'll work for simple_idct also
[19:21:10 CET] <Sesse> I don't really think I want to write my own idct...
[19:21:11 CET] <BBB> or like compn said, wait until somebody pays you to do that
[19:21:15 CET] <Sesse> rotfl
[19:21:25 CET] <Sesse> what I would really love, is if someone paid a hex-rays license :-P
[19:21:33 CET] <BBB> that would be nice yes
[19:21:47 CET] <BBB> anyway, idct as runtime sink is ok IMO
[19:22:01 CET] <BBB> it's dsp code so easy to fix later on
[19:22:12 CET] <Sesse> http://pastebin.com/YUmT1Jsx
[19:22:52 CET] <Sesse> so bitstream reader is actually more
[19:23:10 CET] <BBB> it's like 50/50
[19:23:24 CET] <BBB> I'm gonna grab lunch
[19:23:25 CET] <BBB> brb
[19:24:09 CET] <Sesse> I wonder how much the zero-extension of stuff like the scantable costs
[19:24:14 CET] <Sesse> I don't understand why that's uint8_t
[19:24:22 CET] <Sesse> no way it's going to kill the L1 cache
[19:28:14 CET] <Sesse> but okay, I now have one primary concern: how do I deal with cropping?
[19:28:32 CET] <Sesse> ie., if it's 1920x1080, I need to round up to 1920x1088 and decode that
[19:30:01 CET] <fritsch> h264's data should already be 1088
[19:30:05 CET] <fritsch> and you are supposed to crop
[19:30:25 CET] <Sesse> am I as codec supposed to crop?
[19:31:03 CET] <fritsch> nope, you aren't
[19:31:10 CET] <fritsch> but if you were a player
[19:32:04 CET] <Sesse> where does h264 come into play?
[19:34:45 CET] <fritsch> mpeg2 and mpeg4 require frame sizes to be multiples of 16 because of the macroblock size
[19:34:56 CET] <Sesse> I know. you're not answering my question at all
[19:35:15 CET] <Sesse> I'm implementing a codec. how do I tell ffmpeg that I'm decoding to a larger buffer than they expect?
[19:35:16 CET] <fritsch> your question cannot be answered without context
[19:35:27 CET] <BtbN> the h264 bitstream has fields that control cropping
[19:35:53 CET] <BtbN> and avcodec has fields for width vs. coded_width, and so on.
[19:36:08 CET] <Sesse> BtbN: can I set coded_width from the codec?
[19:36:16 CET] <BtbN> But after decoding it doesn't matter at all
[19:36:26 CET] <BtbN> Nobody needs to care about some extra data at the end of the buffer
[19:36:44 CET] <Sesse> no, but the frame needs to be big enough
[19:36:55 CET] <Sesse> and ff_get_buffer gives me a 1920x1080 frame
[19:36:55 CET] <BtbN> well, so allocate one that's big enough
[19:37:02 CET] <Sesse> I'm not the one allocating it
[19:37:08 CET] <Sesse> ffmpeg is
[19:37:19 CET] <BtbN> Then you need to tell it the correct size
[19:37:24 CET] <Sesse> and how do I do that?
[19:37:31 CET] <Sesse> (I've been asking this for ten minutes now)
[19:38:57 CET] <Sesse> can I just set avctx->coded_width? if so, when does it need to be set, and when can I expect ->width to be available?
[19:41:51 CET] <BtbN> Can allways allocate it yourself
[19:42:00 CET] <BtbN> I don't see the ff h264 decoder use ff_get_buffer at all
[19:42:06 CET] <Sesse> seemingly, this works:
[19:42:07 CET] <Sesse>     avctx->coded_width = FFALIGN(avctx->coded_width, 16);
[19:42:07 CET] <Sesse>     avctx->coded_height = FFALIGN(avctx->coded_height, 16);
[19:42:11 CET] <Sesse> at the top of my decode_frame function
[19:42:12 CET] <fritsch> yes
[19:42:23 CET] <BtbN> https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/utils.c#L900
[19:42:24 CET] <fritsch> as ff_get_buffer takes avctx and a frame ptr
[19:42:29 CET] <BtbN> does not use the coded_width for allocation
[19:42:52 CET] <BtbN> ah. nevermind, it does.
[19:42:52 CET] <fritsch> it uses it by accident
[19:42:55 CET] <Sesse>             frame->width  = FFMAX(avctx->width,  AV_CEIL_RSHIFT(avctx->coded_width,  avctx->lowres));
[19:42:57 CET] <fritsch> see 912
[19:43:14 CET] <Sesse> hmm
[19:43:14 CET] <BtbN> That's not by accident at all
[19:43:24 CET] <BtbN> So yeah, just set the coded_width/height correctly, and you're good.
[19:43:43 CET] <fritsch> why not setting width / height?
[19:43:45 CET] <Sesse> yeah, frame->width/->height is still saying 1920x1080, but I get a larger buffer
[19:43:48 CET] <Sesse> so it's fine
[19:44:00 CET] <BtbN> because width/height is supposed to be the actual frame size
[19:44:03 CET] <fritsch> yes
[19:44:08 CET] <fritsch> got it while I was writing
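The fix that emerged can be checked numerically. This sketch re-states FFALIGN and AV_CEIL_RSHIFT locally, for non-negative inputs only. It then applies the allocation rule quoted from libavcodec/utils.c. The log quotes only the width line, so the sketch assumes height is handled the same way. The helper names are illustrative:

```c
#include <assert.h>

/* Local stand-ins for FFALIGN (a must be a power of two) and
 * AV_CEIL_RSHIFT (non-negative a only); use the real macros in a decoder. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))
static int ceil_rshift(int a, int b) { return (a + (1 << b) - 1) >> b; }

/* Allocated dimension per the quoted rule:
 * FFMAX(avctx->width, AV_CEIL_RSHIFT(avctx->coded_width, avctx->lowres)). */
static int alloc_dim(int dim, int coded_dim, int lowres)
{
    assert(lowres >= 0);
    int c = ceil_rshift(coded_dim, lowres);
    return dim > c ? dim : c;
}
```

Aligning coded_height to 16 turns 1080 into 1088, so the buffer gets 1088 lines while height (and frame->height) stays at the visible 1080. This matches what Sesse observed.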
[19:54:03 CET] <Sesse> okay, so all that's left for shq2 now is the dreaded reversing
[19:54:22 CET] <Sesse> since INIT_VLC_LE seemingly also assumes codes are to be bit-reversed
[19:54:40 CET] <Sesse> you seemingly can't have little-endian reading with non-reversed codes
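Sesse observes that the little-endian VLC setup expects each code in bit-reversed form. If that holds, an MSB-first code table can be converted by reversing each code over its own length before building the VLC. A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `len` bits of `code` (bit 0 <-> bit len-1). */
static uint32_t reverse_bits(uint32_t code, int len)
{
    assert(len >= 0 && len <= 32);
    uint32_t out = 0;
    for (int i = 0; i < len; i++) {
        out = (out << 1) | (code & 1);  /* move lowest remaining bit to the output */
        code >>= 1;
    }
    return out;
}
```

The length matters: the 3-bit code 110 becomes 011, but the same value read as a 4-bit code 0110 also becomes 0110.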
[20:05:39 CET] <kierank> Sesse: you now feel our pain about random undocumented stuff
[20:08:39 CET] <Sesse> kierank: you think I've never felt any pain about random undocumented stuff before :-P
[20:17:58 CET] <Sesse> hm, why is EOB encoded as run=0,level=127 instead of just level=1,run=65? that would end the block without any extra checks needed
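Sesse's point about EOB can be illustrated with a generic run/level loop. The loop already has to stop when the coefficient index passes 63, the kind of safeguard mentioned at 12:09:23. So a symbol whose run overshoots the block ends it for free, while an explicit EOB symbol costs one more comparison per symbol. This sketch uses a hypothetical symbol struct and assumes coefficient 0 is the DC, so AC indices start at 1:

```c
#include <assert.h>

/* One decoded (run, level) symbol; hypothetical type for this sketch. */
struct rl {
    int run;    /* zero coefficients to skip before this one */
    int level;  /* coefficient value */
};

/* Place the AC coefficients of one 8x8 block. Returns symbols consumed. */
static int place_ac(const struct rl *syms, int nsyms, int coeffs[64])
{
    assert(nsyms >= 0);
    int i = 0, n;
    for (n = 0; n < nsyms; n++) {
        /* Explicit EOB (run = 0, level = 127 per the log): an extra test
         * every symbol has to pass through. */
        if (syms[n].run == 0 && syms[n].level == 127)
            return n + 1;
        i += syms[n].run + 1;
        /* Bounds check needed anyway; an overshooting run such as
         * run = 65, level = 1 ends the block through it. */
        if (i > 63)
            return n + 1;
        coeffs[i] = syms[n].level;
    }
    return n;
}
```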
[20:49:18 CET] <Sesse> Compn: do you know if there are any shq0 or shq4 samples around?
[20:50:12 CET] <Compn> Sesse : you need a hex rays license? ask kodi , they can probably buy you one
[20:50:15 CET] <durandal_170> you don't have encoder?
[20:50:31 CET] <Compn> i dont think we got samples for all of shq crap
[20:51:11 CET] <Sesse> durandal_170: I have a 4:2:2 encoder
[20:51:48 CET] <Sesse> which is shq2
[20:52:11 CET] <Sesse> (but I also have an shq2 sample :-) )
[20:52:25 CET] <Sesse> newtek said they could send me their internal test cases, I'm waiting for that now
[20:52:26 CET] <Compn> you saw this http://samples.ffmpeg.org/ffmpeg-bugs/trac/ticket5506/
[20:52:38 CET] <Sesse> yeah, SHQ2_cut.avi is the one I've been testing with
[20:52:42 CET] <Sesse> haven't bothered SHQ3 yet, it's with alpha
[20:53:08 CET] <Sesse> (1/3/5/7/9 is with alpha)
[20:53:18 CET] <Sesse> (0/2/4 is without, there is no 6 or 8)
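The SHQ variant numbering Sesse lists can be captured as a lookup. This is a hypothetical helper. It encodes only what is stated here and says nothing about chroma subsampling:

```c
/* 1 = variant carries alpha, 0 = no alpha, -1 = not a known variant
 * (1/3/5/7/9 have alpha, 0/2/4 do not, 6 and 8 do not exist). */
static int shq_has_alpha(int variant)
{
    switch (variant) {
    case 1: case 3: case 5: case 7: case 9:
        return 1;
    case 0: case 2: case 4:
        return 0;
    default:
        return -1;
    }
}
```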
[20:54:08 CET] <Compn> let me see
[20:54:21 CET] <Compn> someone uploadded shq3 and 5 , but the links are dead
[20:54:24 CET] <Compn> maybe carl has them...
[20:55:19 CET] <Compn> http://ffmpeg-users.933282.n4.nabble.com/Unknow-codec-SHQ3-5-td4662245.html
[20:56:13 CET] Action: Compn brb afk
[21:08:22 CET] <Compn> hmm might as well restart 
[21:54:21 CET] <Compn> http://samples.ffmpeg.org/allsamples.txt has now been updated.
[21:55:55 CET] <Compn> finally found my private key again...
[21:56:07 CET] <Compn> can commit to ffmpeg again... bwahaha
[22:00:22 CET] <durandal_170> nòoooooo, save us from now on
[22:20:33 CET] <durandal_170> llogan: i don't get that warning. which gcc version is that?
[22:27:15 CET] <BBB> Sesse: what's your interest in shq2?
[22:27:42 CET] <BBB> Sesse: is this one of those "I had a sample and it didn't play and I had time so I did stuff" things? or is there more to it?
[22:53:13 CET] <llogan> durandal_170: 6.2.1
[23:08:58 CET] <jamrial_> it may be a false positive. wouldn't be the first we get of this warning
[23:19:04 CET] <cone-496> ffmpeg 03Bela Bodecs 07master:4068f5fac7b8: doc/muxers/hlsenc: typo hls_flag: discont_starts => discont_start
[23:19:04 CET] <cone-496> ffmpeg 03Steve Lhomme 07master:fd0716b364f8: dxva2: make ff_dxva2_get_surface() static and rename it
[23:29:55 CET] <llogan> i wasn't sure of the bogus-ity of it. also, i conveniently forgot to include the ~~~~~~^~~ pointing to [idx].
[23:30:32 CET] <cone-496> ffmpeg 03Steven Liu 07master:57ae94a3c0fc: avformat/hlsenc: fix Explicit null dereferenced in hlsenc
[23:32:08 CET] <cone-496> ffmpeg 03Rostislav Pehlivanov 07master:4fdacf4cdbb3: imdct15: remove the AArch64 assembly
[23:32:09 CET] <cone-496> ffmpeg 03Rostislav Pehlivanov 07master:2d208aaabe20: imdct15: replace the FFT with a faster PFA FFT algorithm
[23:35:19 CET] <Compn> BBB : Sesse is working on decoder...
[23:35:23 CET] <Compn> i think
[23:35:49 CET] <Compn> or do you mean why working on a decoder in the first place ? i'd also kind of like to know :D
[23:36:24 CET] <cone-496> ffmpeg 03Steven Liu 07master:d1f3e475f980: avformat/test/fifo_muxer: add check for FailingMuxerPacketData alloc
[23:36:34 CET] <BBB> Compn: yeah the second ;)
[00:00:00 CET] --- Fri Jan  6 2017

