[FFmpeg-user] concat - C needs to go before B - How?

Wed Feb 21 18:55:23 EET 2024

On 21/02/2024 05.20, Paul B Mahol wrote:
> On Wed, Feb 21, 2024 at 10:12 AM Mark Filipak <markfilipak.imdb at gmail.com>
> wrote:
> 
>> Gyan, Paul, Devin, Jim, anyone. Help!
>>
>> You folks have been following my trim+concat adventure. I think I may have
>> found the problem, not just for me. It's a general problem.
>>
>> "N" signifies one frame time. All times are relative to ptsA.
>>
>> __DTS__  _PTS__
>> ptsA-2N  ptsA    I-frame with DTS-to-PTS=2N <- call this frame "A"
>> ptsA-N   ptsA-N  B-frame with DTS-to-PTS=0  <- call this frame "B"
>> --- join here ---
>> ptsA-2N  ptsA+N  I-frame with DTS-to-PTS=3N <- call this frame "C"
>>
>> You see, dtsC is before dtsB, so C needs to go before B. That's not
>> happening.
>>
>> I don't know of a way to force C to be before B in order to test whether
>> that fixes the glitch. Do you?
>>
>> Aside: Is it okay if dtsC==dtsA?
> 
> dts can be anything, and for decoding only pts are relevant.

DTS cannot be 'anything'.
ITU-T H.222.0: "2.1.24 decoding time-stamp (DTS) (system): A field that may be present in a PES
packet header that indicates the time that an access unit is decoded in the system target decoder."
More relevant is that DTS is directly used to control when access units are removed from various
buffers. H.222.0 goes on quite a bit about buffer management, including equations in which DTS
appears. It also covers the case in which DTS is not present in PES packets, but that's not the case
here because DTS is present.
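
To make the ordering problem concrete, here is a minimal sketch of the three frames from the table
quoted above. It assumes N = 1 and an arbitrary value for ptsA; the Packet class and the helper
names are mine, for illustration only, and are not anything from FFmpeg. It shows only that DTS
steps backward at the join and that a sort by DTS would put C ahead of B.

```python
from dataclasses import dataclass

N = 1        # one frame time, arbitrary units (assumption)
PTS_A = 10   # arbitrary reference value for ptsA (assumption)


@dataclass(frozen=True)
class Packet:
    name: str
    dts: int
    pts: int


def stream_order():
    """The three frames in the order they sit in the joined stream."""
    return [
        Packet("A", PTS_A - 2 * N, PTS_A),      # I-frame, DTS-to-PTS = 2N
        Packet("B", PTS_A - N, PTS_A - N),      # B-frame, DTS-to-PTS = 0
        # --- join here ---
        Packet("C", PTS_A - 2 * N, PTS_A + N),  # I-frame, DTS-to-PTS = 3N
    ]


def dts_regressions(packets):
    """Return (previous, current) name pairs where DTS goes backward."""
    return [(p.name, c.name)
            for p, c in zip(packets, packets[1:])
            if c.dts < p.dts]


def dts_sorted_names(packets):
    """Names in DTS order; sorted() is stable, so ties keep stream order."""
    return [p.name for p in sorted(packets, key=lambda p: p.dts)]


if __name__ == "__main__":
    frames = stream_order()
    print("DTS regressions:", dts_regressions(frames))
    print("order by DTS:   ", dts_sorted_names(frames))
```

Note that A and C tie on DTS in this model, which is the dtsC==dtsA case asked about in the aside.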

Rather than providing a single state machine model (i.e., a state diagram showing circles and 
arrows), ITU H.222's 'explanation' is text that is scattered across various paragraphs that 
'explain' the various PES header tags. It's one of the things that makes reading such narrowly 
crafted specifications so much 'fun'.

Gyan suggested a discontinuity flag. ITU H.222 says something about that and that there are two 
instances but I'm not sure what exactly Gyan was referring to -- what I tried didn't provoke any 
FFmpeg complaints but it didn't work, either.

I had a thought before falling asleep last night.
If I recall correctly, there's a tag that instructs the decoder to duplicate a decoded 'picture'
either once or twice. It may be possible to assert that tag to the current decoder rather than to
some future decoder. The 1st frame of the 2nd segment needs to be on both sides of the join at
once: before the join by DTS and after the join by PTS. I think the best way to handle that may be
to remove the B-frame that ends the 1st segment and, instead, to duplicate the current 'picture'
(i.e., the current decoded P-/I-frame) by using that duplication tag. That would preserve the
number of frames (and the running time) while eliminating DTS (buffer) issues, and it would allow
the 2nd segment's I-frame to stay in the 2nd segment. Of course, there are audio and subtitle
packets with their own DTSs that get in the way, but it should be trivial to give them new DTSs
that 'move' them into the 2nd segment -- the 'error' would be undetectable.
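
Here is a rough sketch of that idea applied to the same three frames, again assuming N = 1 and an
arbitrary ptsA. The `repeat` field is a hypothetical stand-in for the duplication tag (I have not
confirmed which tag it is), and the function names are mine. It checks only the bookkeeping: the
displayed-frame count and the DTS steps across the join. It says nothing about whether a real
decoder would accept the result.

```python
from dataclasses import dataclass, replace

N = 1        # one frame time, arbitrary units (assumption)
PTS_A = 10   # arbitrary reference value for ptsA (assumption)


@dataclass(frozen=True)
class Packet:
    name: str
    kind: str        # "I", "P" or "B"
    dts: int
    pts: int
    repeat: int = 0  # hypothetical: extra times the decoded picture is shown


def example_segments():
    """The 1st and 2nd segments from the table, split at the join."""
    first = [
        Packet("A", "I", PTS_A - 2 * N, PTS_A),
        Packet("B", "B", PTS_A - N, PTS_A - N),
    ]
    second = [Packet("C", "I", PTS_A - 2 * N, PTS_A + N)]
    return first, second


def drop_trailing_b(first, second):
    """Drop the B-frame ending `first`; repeat the preceding picture once."""
    if not first or first[-1].kind != "B":
        return first + second
    kept = first[:-1]  # a new list; the caller's list is not modified
    if kept:
        kept[-1] = replace(kept[-1], repeat=kept[-1].repeat + 1)
    return kept + second


def displayed_frames(packets):
    """Count pictures shown, including repeats."""
    return sum(1 + p.repeat for p in packets)


def dts_steps(packets):
    """(previous, current, DTS difference) for each adjacent pair."""
    return [(p.name, c.name, c.dts - p.dts)
            for p, c in zip(packets, packets[1:])]


if __name__ == "__main__":
    first, second = example_segments()
    joined = drop_trailing_b(first, second)
    print("frames before:", displayed_frames(first + second))
    print("frames after: ", displayed_frames(joined))
    print("DTS steps:    ", dts_steps(joined))
```

In this model the frame count is preserved and DTS no longer steps backward, but the step from A to
C is zero, so whether dtsC==dtsA is acceptable remains an open question.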

Comment: It appears to me that mpegts decoding and buffering have gone through quite a bit of
modification over the years as problems have cropped up and understanding has increased -- 
extensions and new tags have been created. I think the tag to which I referred in the previous 
paragraph is one such creation. Unfortunately, specifications don't usually talk about why 
such-and-such is created. They usually only present the new tags.

--Mark.