[Libav-user] Accessing coded frame pts for audio
chisholm at mitre.org
Mon Apr 15 21:19:04 CEST 2013
On 4/15/2013 1:08 PM, Brad O'Hearne wrote:
> In a processing workflow that includes first encoding a video / audio frame followed by writing that frame to a file/network stream, it is required to set the packet pts prior to encoding. The subsequent encoding operation then returns an output value (gotPacket) indicating whether a packet was returned, e.g.:
> returnVal = avcodec_encode_audio2(codecCtx, &_avPacket, _streamAudioFrame, &gotPacket);
> A packet might not have been returned, but if one has been returned, there are two notable characteristics about it:
> 1. it is not necessarily the same packet that was encoded; i.e. the encoder can return a different packet (presumably due to reordering as necessary, etc.).
> 2. The returned packet does not have an accurate pts set, and so this must be manually set in the code.
> When dealing with a video stream, the general approach to setting packet pts is to grab the AVCodecContext's pts of its coded frame, and then rescale it from a value relative to the context's time_base (that's the one based on frame rate) to one relative to the stream's time_base (the one based on time), as such:
> _avPacket.pts = av_rescale_q(codecCtx->coded_frame->pts, codecCtx->time_base, _videoStream->time_base);
> The key point there is that the pts of the frame involved with the packet returned is accessible in the context's coded_frame, i.e.:
> That's great for video. But when processing audio, coded_frame->pts is consistently junk, always valued -9223372036854775808. So my question is after encoding, if the avcodec_encode_audio2 indeed returns a packet how to access the pts for the encoded frame (or in the case of audio, it would probably be more proper to say the encoded samples)? Again, given that the packet returned by the encoder isn't necessarily the one that was encoded, we need the pts of the packet when it was encoded.
> How do I access this value?
I can't address everything here, but I can contribute a couple of things:
-9223372036854775808 is not technically "junk". It is AV_NOPTS_VALUE,
which is a deliberately chosen value with a specific meaning: there is
no pts. It is the minimum value of a signed 64-bit integer. Over the
years I've learned to recognize -2^31, but -2^63 is a more recent
animal... so I have missed this myself ;)
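
To make the sentinel concrete, here is a minimal sketch of a validity check. So that it compiles without the Libav headers it uses a local stand-in, NOPTS_SENTINEL, which is my own name and not part of the library; real code would include <libavutil/avutil.h> and compare against AV_NOPTS_VALUE directly. The helper name pts_is_valid is likewise hypothetical.

```c
#include <stdint.h>

/* Local stand-in for AV_NOPTS_VALUE (assumption: same value, the minimum
 * signed 64-bit integer, -9223372036854775808). Real code should use the
 * macro from <libavutil/avutil.h> instead of redefining it. */
#define NOPTS_SENTINEL INT64_MIN

/* Hypothetical helper: returns 1 if pts carries a real timestamp,
 * 0 if it is the "no pts" sentinel. */
static int pts_is_valid(int64_t pts)
{
    return pts != NOPTS_SENTINEL;
}
```

A check like this before rescaling keeps the sentinel from being treated as a (very large, negative) timestamp.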
What I've been doing to get timestamps of encoded frames in the face of
codec caching is maintain a timestamp queue. You push the pts of your
frame onto the queue each time you encode a frame; you only pop a
timestamp off if you got an encoded frame back. The first frame out of
the codec has to be the first one you put in, and it will match the first
timestamp popped off the queue. Unless the codec is eating frames, the
timestamps you pop off the queue _must_ match the frames you're getting
out of the codec. Pseudocode is something like:
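tsQueue.push(frame_pts);   // every time you hand a frame to the encoder
if (gotPacket)             // only when the encoder actually returned a packet
    encoded_frame_pts = tsQueue.pop();
// do whatever with the encoded frame and its pts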
The queue will naturally size itself to the latency of the codec, so you
can "autodetect" the latency of codecs by checking the size of your
timestamp queue. If there is no latency, you'll just be popping off the
timestamp immediately after pushing it on, which might be seen as a
waste of time, but the technique still works.
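
For anyone who wants something closer to real code, the timestamp queue described above could be sketched in C as a small fixed-size ring buffer. Everything here (TsQueue, the tsq_* functions, and TSQ_CAPACITY) is a hypothetical name chosen for illustration, and the capacity of 64 is an arbitrary assumption that only needs to exceed the codec's latency in frames. The encoder calls appear only in a comment, so the block compiles without Libav.

```c
#include <stddef.h>
#include <stdint.h>

#define TSQ_CAPACITY 64  /* assumption: larger than the codec latency in frames */

/* FIFO of presentation timestamps, stored in a ring buffer. */
typedef struct TsQueue {
    int64_t items[TSQ_CAPACITY];
    size_t  head;   /* index of the oldest timestamp */
    size_t  count;  /* number of timestamps currently queued */
} TsQueue;

static void tsq_init(TsQueue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Push the pts of a frame handed to the encoder.
 * Returns 0 on success, -1 if the queue is full. */
static int tsq_push(TsQueue *q, int64_t pts)
{
    if (q->count == TSQ_CAPACITY)
        return -1;
    q->items[(q->head + q->count) % TSQ_CAPACITY] = pts;
    q->count++;
    return 0;
}

/* Pop the oldest pts once the encoder has returned a packet.
 * Returns 0 on success, -1 if the queue is empty. */
static int tsq_pop(TsQueue *q, int64_t *pts)
{
    if (q->count == 0)
        return -1;
    *pts = q->items[q->head];
    q->head = (q->head + 1) % TSQ_CAPACITY;
    q->count--;
    return 0;
}

/* Current depth; in steady state this reflects the codec latency in frames. */
static size_t tsq_size(const TsQueue *q)
{
    return q->count;
}

/* Intended use around the encode call (not compiled here; assumes the
 * frame pts is expressed in the codec context's time_base):
 *
 *   tsq_push(&q, frame->pts);
 *   ret = avcodec_encode_audio2(codecCtx, &pkt, frame, &gotPacket);
 *   if (ret >= 0 && gotPacket && tsq_pop(&q, &pts) == 0)
 *       pkt.pts = av_rescale_q(pts, codecCtx->time_base, stream->time_base);
 */
```

A fixed-size buffer keeps the sketch allocation-free; a growable queue would work just as well if you would rather not guess at an upper bound for the latency.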