[PATCH] xiph packetizer
Hi guys, Here is a first attempt at a RTP packetizer for the Xiph codecs. This is still a work in progress; Vorbis likes to crash when rescaling the timebase, and the Theora output only vaguely resembles what it should. (There are some unrelated changes in there, particularly to the depacketizer. I will separate those later.) The only major architectural issue here is that the configuration identifier (here hard-coded to 0xfedcba) needs to be carried across from the SDP to the packetizer proper. Right now I don't believe there is a clean way to do that without adding another field to RTPMuxContext, a generic priv_data or something. The "best" way, IMO, would be to structure RTP packetization in a way similar to how depacketization is done -- through dynamic handlers. We might be able to get away with a hard-coded configuration identifier, but I don't know yet if my samples will periodically emit in-band extradata. (The RFC provisions for this.) In that case, the config ID needs to be refreshed, and kept across subsequent packets. Josh
On Wed, 21 Jul 2010, Josh Allmann wrote:
Here is a first attempt at a RTP packetizer for the Xiph codecs. This is still a work in progress; Vorbis likes to crash when rescaling the timebase, and the Theora output only vaguely resembles what it should. (There are some unrelated changes in there, particularly to the depacketizer. I will separate those later.)
Umm, it seems the patch is missing the rtpenc_xiph.c file... // Martin
On 22 July 2010 02:35, Martin Storsjö <martin@martin.st> wrote:
On Wed, 21 Jul 2010, Josh Allmann wrote:
Here is a first attempt at a RTP packetizer for the Xiph codecs. This is still a work in progress; Vorbis likes to crash when rescaling the timebase, and the Theora output only vaguely resembles what it should. (There are some unrelated changes in there, particularly to the depacketizer. I will separate those later.)
Umm, it seems the patch is missing the rtpenc_xiph.c file...
My bad, fixed. This version works better, but is not yet complete. -The scaffolding for Vorbis is mostly up, but I have to set its SDP properly. -Theora video has problems with I-frames (or whatever the Theora equivalent is), and exhibits serious blocking in areas of motion. I hope this is due to some invalid reads that Valgrind complains about. -Packing multiple frames in a single packet is another TODO. I will nail all of these tomorrow. Josh
On 07/26/2010 12:05 PM, Josh Allmann wrote:
+ +/** + * Packetize Xiph frames into RTP according to + * RFC 5215 (Vorbis) and the Theora RFC draft. + * (http://svn.xiph.org/trunk/theora/doc/draft-ietf-avt-rtp-theora-00.txt) + */
the draft-ietf-avt-rtp-theora-00.txt isn't exactly up to date... (yes, my fault) lu -- Luca Barbato Gentoo/linux http://dev.gentoo.org/~lu_zero
On 07/26/2010 12:05 PM, Josh Allmann wrote:
+ uint8_t comment[] = { + ' ','h', 'i','r','o',' ','n','a','k','a','m','u','r','a' + };
You should try to grab the comment from the sdp or just leave it empty ^^; lu -- Luca Barbato Gentoo/linux http://dev.gentoo.org/~lu_zero
Hi, On Mon, Jul 26, 2010 at 6:05 AM, Josh Allmann <joshua.allmann@gmail.com> wrote: [..]
@@ -154,6 +156,11 @@ static int rtp_write_header(AVFormatContext *s1) } case CODEC_ID_AAC: s->num_frames = 0; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if (!s->max_frames_per_packet || s->max_frames_per_packet > 15) + s->max_frames_per_packet = 15;
av_clip()? Ronald
On Mon, 26 Jul 2010, Ronald S. Bultje wrote:
Hi,
On Mon, Jul 26, 2010 at 6:05 AM, Josh Allmann <joshua.allmann@gmail.com> wrote: [..]
@@ -154,6 +156,11 @@ static int rtp_write_header(AVFormatContext *s1) } case CODEC_ID_AAC: s->num_frames = 0; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if (!s->max_frames_per_packet || s->max_frames_per_packet > 15) + s->max_frames_per_packet = 15;
av_clip()?
Umm, no, that's not what he's doing. But something like this could be useful: if (!max) max = 15; max = av_clip(max, 1, 15); Since you want both the default-initialization to a sane value, but still want to restrict the possible values to a certain range. (This way, you won't be able to set negative values.) // Martin
On Mon, 26 Jul 2010, Josh Allmann wrote:
This version works better, but is not yet complete.
-The scaffolding for Vorbis is mostly up, but I have to set its SDP properly. -Theora video has problems with I-frames (or whatever the Theora equivalent is), and exhibits serious blocking in areas of motion. I hope this is due to some invalid reads that Valgrind complains about. -Packing multiple frames in a single packet is another TODO.
A few comments: +void ff_rtp_send_xiph(AVFormatContext *s1, const uint8_t *buff, int size) +{ + RTPMuxContext *s = s1->priv_data; + int max_pkt_size, xdt, frag; + uint8_t *q; + + max_pkt_size = s->max_payload_size; + + /* set xiph data type */ + switch (*buff) { + case 0x01: // vorbis id + case 0x05: // vorbis setup + case 0x80: // theora header + case 0x82: // theora tables + xdt = 1; // packed config payload + case 0x03: // vorbis comments + case 0x81: // theora comments + xdt = 2; // comment payload + default: + xdt = 0; // raw data payload + } I guess you want break statements in the switch, too... + /* set ident + * Probably need a non-fixed way of generating + * this, but it has to be done in SDP and passed in from there. */ + q = s->buf; + *q++ = 0xfe; + *q++ = 0xcd; + *q++ = 0xba; I haven't read the specs, but what's the role of this ident code? Is there any harm in having it hardcoded to a specific value? Is it set in the original stream data somewhere, so that you'd have to parse out the correct value from there? Or is it only used to distinguish streams if you have more than one vorbis/theora stream in the same presentation? In that case, you could use e.g. one hardcoded value for vorbis and another for theora - that would probably be enough for some time at lesat. // Martin
On 07/27/2010 12:03 PM, Martin Storsjö wrote:
+ /* set ident + * Probably need a non-fixed way of generating + * this, but it has to be done in SDP and passed in from there. */ + q = s->buf; + *q++ = 0xfe; + *q++ = 0xcd; + *q++ = 0xba;
I haven't read the specs, but what's the role of this ident code? Is there any harm in having it hardcoded to a specific value? Is it set in the original stream data somewhere, so that you'd have to parse out the correct value from there? Or is it only used to distinguish streams if you have more than one vorbis/theora stream in the same presentation? In that case, you could use e.g. one hardcoded value for vorbis and another for theora - that would probably be enough for some time at lesat.
given that we don't have any way to change the extradata and it is needed just to make sure the packets we are receiving haven't changed the extradata is some way. That had been mandated by loud people wanting to support chained vorbis the same way they did in ogg iirc, we can safely hardcode it. lu -- Luca Barbato Gentoo/linux http://dev.gentoo.org/~lu_zero
Hi, On 27 July 2010 03:03, Martin Storsjö <martin@martin.st> wrote:
On Mon, 26 Jul 2010, Josh Allmann wrote:
This version works better, but is not yet complete.
-The scaffolding for Vorbis is mostly up, but I have to set its SDP properly.
Done.
-Theora video has problems with I-frames (or whatever the Theora equivalent is), and exhibits serious blocking in areas of motion.
Fixed, although on occassion it is a bit wonky. The depacketizer will complain about a missing start fragment at a particular position in big buck bunny, disrupting the video briefly, but otherwise it is fine.
I hope this is due to some invalid reads that Valgrind complains about.
Valgrind fixed.
-Packing multiple frames in a single packet is another TODO.
Not yet complete, but is not critical for proper operation, either.
A few comments:
+ /* set xiph data type */ + switch (*buff) { + case 0x01: // vorbis id + case 0x05: // vorbis setup + case 0x80: // theora header + case 0x82: // theora tables + xdt = 1; // packed config payload + case 0x03: // vorbis comments + case 0x81: // theora comments + xdt = 2; // comment payload + default: + xdt = 0; // raw data payload + }
I guess you want break statements in the switch, too...
Fixed.
+ /* set ident + * Probably need a non-fixed way of generating + * this, but it has to be done in SDP and passed in from there. */ + q = s->buf; + *q++ = 0xfe; + *q++ = 0xcd; + *q++ = 0xba;
I haven't read the specs, but what's the role of this ident code? Is there any harm in having it hardcoded to a specific value? Is it set in the original stream data somewhere, so that you'd have to parse out the correct value from there? Or is it only used to distinguish streams if you have more than one vorbis/theora stream in the same presentation? In that case, you could use e.g. one hardcoded value for vorbis and another for theora - that would probably be enough for some time at lesat.
As Luca said, it is only used to make sure the extradata doesn't change mid-stream. Different streams can have the same ident, eg a Vorbis and a Theora can share 0xfecdba. Our depacketizer doesn't handle changing the ident anyway. Revised patch attached. I also had to enlarge the outgoing RTSP buffer to handle the SDP extradata. Josh
On 27 July 2010 15:28, Josh Allmann <joshua.allmann@gmail.com> wrote:
Hi,
On 27 July 2010 03:03, Martin Storsjö <martin@martin.st> wrote:
On Mon, 26 Jul 2010, Josh Allmann wrote:
This version works better, but is not yet complete.
-The scaffolding for Vorbis is mostly up, but I have to set its SDP properly.
Done.
-Theora video has problems with I-frames (or whatever the Theora equivalent is), and exhibits serious blocking in areas of motion.
Fixed, although on occassion it is a bit wonky. The depacketizer will complain about a missing start fragment at a particular position in big buck bunny, disrupting the video briefly, but otherwise it is fine.
I hope this is due to some invalid reads that Valgrind complains about.
Valgrind fixed.
-Packing multiple frames in a single packet is another TODO.
Not yet complete, but is not critical for proper operation, either.
A few comments:
+ /* set xiph data type */ + switch (*buff) { + case 0x01: // vorbis id + case 0x05: // vorbis setup + case 0x80: // theora header + case 0x82: // theora tables + xdt = 1; // packed config payload + case 0x03: // vorbis comments + case 0x81: // theora comments + xdt = 2; // comment payload + default: + xdt = 0; // raw data payload + }
I guess you want break statements in the switch, too...
Fixed.
+ /* set ident + * Probably need a non-fixed way of generating + * this, but it has to be done in SDP and passed in from there. */ + q = s->buf; + *q++ = 0xfe; + *q++ = 0xcd; + *q++ = 0xba;
I haven't read the specs, but what's the role of this ident code? Is there any harm in having it hardcoded to a specific value? Is it set in the original stream data somewhere, so that you'd have to parse out the correct value from there? Or is it only used to distinguish streams if you have more than one vorbis/theora stream in the same presentation? In that case, you could use e.g. one hardcoded value for vorbis and another for theora - that would probably be enough for some time at lesat.
As Luca said, it is only used to make sure the extradata doesn't change mid-stream. Different streams can have the same ident, eg a Vorbis and a Theora can share 0xfecdba. Our depacketizer doesn't handle changing the ident anyway.
Revised patch attached. I also had to enlarge the outgoing RTSP buffer to handle the SDP extradata.
Re-sending, due to this hunk: @@ -154,6 +156,11 @@ static int rtp_write_header(AVFormatContext *s1) } case CODEC_ID_AAC: s->num_frames = 0; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if(!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length default: if (st->codec->codec_type == AVMEDIA_TYPE_AUDIO) { av_set_pts_info(st, 32, 1, st->codec->sample_rate); Apparently AAC and AMR both fall through to the default case. Whether that's intentional, I don't know, so I moved my stuff to minimize behavioral changes to existing code. Josh
On Tue, 27 Jul 2010, Josh Allmann wrote:
Re-sending, due to this hunk:
@@ -154,6 +156,11 @@ static int rtp_write_header(AVFormatContext *s1) } case CODEC_ID_AAC: s->num_frames = 0; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if(!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length default: if (st->codec->codec_type == AVMEDIA_TYPE_AUDIO) { av_set_pts_info(st, 32, 1, st->codec->sample_rate);
Apparently AAC and AMR both fall through to the default case. Whether that's intentional, I don't know, so I moved my stuff to minimize behavioral changes to existing code.
Yes, that's quite intentional. Unfortunately, with that design in the switch, it's quite hard to add yet another codec to share the fallback but with non-shared codec specific code, as you noticed. // Martin
On Tue, 27 Jul 2010, Josh Allmann wrote:
@@ -135,6 +137,12 @@ static int rtp_write_header(AVFormatContext *s1) s->nal_length_size = (st->codec->extradata[4] & 0x03) + 1; } break; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if(!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length + break;
You do want the default case handling (setting the audio sample rate as stream time base) for vorbis. One way of doing that is adding a defaultcase: directly after the default: line, and replacing the break; with a goto defaultcase; here.
+ /* set ident + * Probably need a non-fixed way of generating + * this, but it has to be done in SDP and passed in from there. */ + q = s->buf; + *q++ = 0xfe; + *q++ = 0xcd; + *q++ = 0xba;
Please add a note here saying that this must match what's in sdp.c.
+ s->buf_ptr = q; + + /* set fragment + * 0 - whole frame (possibly multiple frames) + * 1 - first fragment + * 2 - fragment continuation + * 3 - last fragmement */ + frag = size <= max_pkt_size ? 0 : 1;
You must readd the s->timestamp = s->cur_timestamp line that you removed. s->cur_timestamp gets set to the timestamp of the current AVPacket in rtp_write_packet, while ff_rtp_send_data only uses s->timestamp. If you don't set s->timestamp at all, all packets will be sent with the same timestamp. This value isn't copied directly, since if you bundle many AVPackets into the same RTP packet, you most often want the timestamp of the first packet, not the last.
@@ -872,7 +872,7 @@ int ff_rtsp_send_cmd_with_content_async(AVFormatContext *s, int send_content_length) { RTSPState *rt = s->priv_data; - char buf[4096], *out_buf; + char buf[16384], *out_buf; // large buffer to accommodate xiph sdp char base64buf[AV_BASE64_SIZE(sizeof(buf))];
You don't need to enlarge this buffer - the body data of the request never gets stuffed into this buffer. (If we'd want to support HTTP tunneling in the muxer, too, we'd might want to stuff it in here, though, or allocate this buffer dynamically.)
@@ -1295,7 +1295,7 @@ static int rtsp_setup_output_streams(AVFormatContext *s, const char *addr) rt->start_time = av_gettime();
/* Announce the stream */ - sdp = av_mallocz(8192); + sdp = av_mallocz(16384); // massive SDP buffer due to Xiph extradata if (sdp == NULL) return AVERROR(ENOMEM); /* We create the SDP based on the RTSP AVFormatContext where we @@ -1314,7 +1314,7 @@ static int rtsp_setup_output_streams(AVFormatContext *s, const char *addr) ff_url_join(sdp_ctx.filename, sizeof(sdp_ctx.filename), "rtsp", NULL, addr, -1, NULL); ctx_array[0] = &sdp_ctx; - if (avf_sdp_create(ctx_array, 1, sdp, 8192)) { + if (avf_sdp_create(ctx_array, 1, sdp, 16384)) { av_free(sdp); return AVERROR_INVALIDDATA; }
I fixed this to use a define now, you want to rebase this on top of that and adjust that define instead.
+ config[0] = config[1] = config[2] = 0; + config[3] = 1; + config[4] = 0xfe; + config[5] = 0xcd; + config[6] = 0xba;
Same here, add a note here saying that this must match what's in rtpenc_xiph.c.
+ av_strlcatf(buff, size, "a=rtpmap:%d theora/9000\r\n"
This should be 90000, not 9000. And sorry for quoting so little for each of the review points, I realized that it's quite hard to follow now. Ohwell, I hope you find which parts I refer to. With the changes I outlined here, I got it to work quite well (after fixing DSS to handle larger SDP sizes). Once in the test stream I used, I got this message from the depacketizer, though: [rtsp @ 0x102049000] Unimplemented RTP Xiph packet settings (1,2,0) [rtsp @ 0x102049000] Unimplemented RTP Xiph packet settings (2,2,0) [rtsp @ 0x102049000] Unimplemented RTP Xiph packet settings (3,2,0) Also, there's an issue with vorbis timestamps if you stream copy from an ogg file - not all packets have pts/dts set when reading them, and for vorbis, you'd need to decode it to figure out the proper length of each frame. In ffmpeg.c, lines 1622-1623, the pts is updated using an estimate of the frame size, which doesn't turn out to be right (in my case at least). If playing only vorbis over RTSP/RTP, it still worked, but the timestamps advanced slowly from 0 to perhaps 1, then jumped to 5-6 seconds, advanced slowly there again, and jumped onwards when the ogg demuxer returned a packet that actually had a timestamp. Combined with a video stream, it doesn't work too well in that setup of course. If live transcoding into theora/vorbis, and sending it out over RTSP/RTP, it seemed to work really well, though! // Martin
Hi, Thanks for the thorough review! On 28 July 2010 02:56, Martin Storsjö <martin@martin.st> wrote:
On Tue, 27 Jul 2010, Josh Allmann wrote:
@@ -135,6 +137,12 @@ static int rtp_write_header(AVFormatContext *s1) s->nal_length_size = (st->codec->extradata[4] & 0x03) + 1; } break; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if(!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length + break;
You do want the default case handling (setting the audio sample rate as stream time base) for vorbis. One way of doing that is adding a defaultcase: directly after the default: line, and replacing the break; with a goto defaultcase; here.
Fixed.
+ /* set ident + * Probably need a non-fixed way of generating + * this, but it has to be done in SDP and passed in from there. */ + q = s->buf; + *q++ = 0xfe; + *q++ = 0xcd; + *q++ = 0xba;
Please add a note here saying that this must match what's in sdp.c.
Done.
+ s->buf_ptr = q; + + /* set fragment + * 0 - whole frame (possibly multiple frames) + * 1 - first fragment + * 2 - fragment continuation + * 3 - last fragmement */ + frag = size <= max_pkt_size ? 0 : 1;
You must readd the s->timestamp = s->cur_timestamp line that you removed. s->cur_timestamp gets set to the timestamp of the current AVPacket in rtp_write_packet, while ff_rtp_send_data only uses s->timestamp. If you don't set s->timestamp at all, all packets will be sent with the same timestamp.
This value isn't copied directly, since if you bundle many AVPackets into the same RTP packet, you most often want the timestamp of the first packet, not the last.
Done.
@@ -872,7 +872,7 @@ int ff_rtsp_send_cmd_with_content_async(AVFormatContext *s, int send_content_length) { RTSPState *rt = s->priv_data; - char buf[4096], *out_buf; + char buf[16384], *out_buf; // large buffer to accommodate xiph sdp char base64buf[AV_BASE64_SIZE(sizeof(buf))];
You don't need to enlarge this buffer - the body data of the request never gets stuffed into this buffer. (If we'd want to support HTTP tunneling in the muxer, too, we'd might want to stuff it in here, though, or allocate this buffer dynamically.)
Removed.
@@ -1295,7 +1295,7 @@ static int rtsp_setup_output_streams(AVFormatContext *s, const char *addr) rt->start_time = av_gettime();
/* Announce the stream */ - sdp = av_mallocz(8192); + sdp = av_mallocz(16384); // massive SDP buffer due to Xiph extradata if (sdp == NULL) return AVERROR(ENOMEM); /* We create the SDP based on the RTSP AVFormatContext where we @@ -1314,7 +1314,7 @@ static int rtsp_setup_output_streams(AVFormatContext *s, const char *addr) ff_url_join(sdp_ctx.filename, sizeof(sdp_ctx.filename), "rtsp", NULL, addr, -1, NULL); ctx_array[0] = &sdp_ctx; - if (avf_sdp_create(ctx_array, 1, sdp, 8192)) { + if (avf_sdp_create(ctx_array, 1, sdp, 16384)) { av_free(sdp); return AVERROR_INVALIDDATA; }
I fixed this to use a define now, you want to rebase this on top of that and adjust that define instead.
Done.
+ config[0] = config[1] = config[2] = 0; + config[3] = 1; + config[4] = 0xfe; + config[5] = 0xcd; + config[6] = 0xba;
Same here, add a note here saying that this must match what's in rtpenc_xiph.c.
Done.
+ av_strlcatf(buff, size, "a=rtpmap:%d theora/9000\r\n"
This should be 90000, not 9000.
Done.
Once in the test stream I used, I got this message from the depacketizer, though:
[rtsp @ 0x102049000] Unimplemented RTP Xiph packet settings (1,2,0) [rtsp @ 0x102049000] Unimplemented RTP Xiph packet settings (2,2,0) [rtsp @ 0x102049000] Unimplemented RTP Xiph packet settings (3,2,0)
Interesting, that indicates your sample has a comment embedded in it. IIRC, the Theora decoder ignores comments anyway, but I'm not sure about Vorbis.
Also, there's an issue with vorbis timestamps if you stream copy from an ogg file - not all packets have pts/dts set when reading them, and for vorbis, you'd need to decode it to figure out the proper length of each frame. In ffmpeg.c, lines 1622-1623, the pts is updated using an estimate of the frame size, which doesn't turn out to be right (in my case at least).
If playing only vorbis over RTSP/RTP, it still worked, but the timestamps advanced slowly from 0 to perhaps 1, then jumped to 5-6 seconds, advanced slowly there again, and jumped onwards when the ogg demuxer returned a packet that actually had a timestamp. Combined with a video stream, it doesn't work too well in that setup of course.
If live transcoding into theora/vorbis, and sending it out over RTSP/RTP, it seemed to work really well, though!
That's pretty strange, I would like to see your sample. With Big Buck Bunny, Vorbis standalone timestamps are fairly linear. Using Theora and Vorbis together with RTSP/RTP, things work well. Theora standalone is still full of artifacts around motion with TCP, and UDP is a slideshow because of all the dropped packets. Tomorrow, I'll copy over the packetization routine from Feng and see if the results are more sane. Josh
On Thu, 29 Jul 2010, Josh Allmann wrote:
Also, there's an issue with vorbis timestamps if you stream copy from an ogg file - not all packets have pts/dts set when reading them, and for vorbis, you'd need to decode it to figure out the proper length of each frame. In ffmpeg.c, lines 1622-1623, the pts is updated using an estimate of the frame size, which doesn't turn out to be right (in my case at least).
If playing only vorbis over RTSP/RTP, it still worked, but the timestamps advanced slowly from 0 to perhaps 1, then jumped to 5-6 seconds, advanced slowly there again, and jumped onwards when the ogg demuxer returned a packet that actually had a timestamp. Combined with a video stream, it doesn't work too well in that setup of course.
If live transcoding into theora/vorbis, and sending it out over RTSP/RTP, it seemed to work really well, though!
That's pretty strange, I would like to see your sample. With Big Buck Bunny, Vorbis standalone timestamps are fairly linear.
Can you give an url to this test clip?
Using Theora and Vorbis together with RTSP/RTP, things work well. Theora standalone is still full of artifacts around motion with TCP, and UDP is a slideshow because of all the dropped packets. Tomorrow, I'll copy over the packetization routine from Feng and see if the results are more sane.
Hmm, that sounds really strange. Can you give the command lines that you've used for these test setups? // Martin
On 29 July 2010 04:54, Martin Storsjö <martin@martin.st> wrote:
On Thu, 29 Jul 2010, Josh Allmann wrote:
Also, there's an issue with vorbis timestamps if you stream copy from an ogg file - not all packets have pts/dts set when reading them, and for vorbis, you'd need to decode it to figure out the proper length of each frame. In ffmpeg.c, lines 1622-1623, the pts is updated using an estimate of the frame size, which doesn't turn out to be right (in my case at least).
If playing only vorbis over RTSP/RTP, it still worked, but the timestamps advanced slowly from 0 to perhaps 1, then jumped to 5-6 seconds, advanced slowly there again, and jumped onwards when the ogg demuxer returned a packet that actually had a timestamp. Combined with a video stream, it doesn't work too well in that setup of course.
If live transcoding into theora/vorbis, and sending it out over RTSP/RTP, it seemed to work really well, though!
That's pretty strange, I would like to see your sample. With Big Buck Bunny, Vorbis standalone timestamps are fairly linear.
Can you give an url to this test clip?
http://mirrorblender.top-ix.org/peach/bigbuckbunny_movies/big_buck_bunny_480...
Using Theora and Vorbis together with RTSP/RTP, things work well. Theora standalone is still full of artifacts around motion with TCP, and UDP is a slideshow because of all the dropped packets. Tomorrow, I'll copy over the packetization routine from Feng and see if the results are more sane.
Hmm, that sounds really strange. Can you give the command lines that you've used for these test setups?
ffmpeg -re -i acodec copy -vcodec copy -f rtsp rtsp://localhost/foo.sdp also with -an, -vn, and ?tcp variants. ffplay rtsp://localhost/foo.sdp Josh
On Thu, 29 Jul 2010, Josh Allmann wrote:
On 29 July 2010 04:54, Martin Storsjö <martin@martin.st> wrote:
On Thu, 29 Jul 2010, Josh Allmann wrote:
Also, there's an issue with vorbis timestamps if you stream copy from an ogg file - not all packets have pts/dts set when reading them, and for vorbis, you'd need to decode it to figure out the proper length of each frame. In ffmpeg.c, lines 1622-1623, the pts is updated using an estimate of the frame size, which doesn't turn out to be right (in my case at least).
If playing only vorbis over RTSP/RTP, it still worked, but the timestamps advanced slowly from 0 to perhaps 1, then jumped to 5-6 seconds, advanced slowly there again, and jumped onwards when the ogg demuxer returned a packet that actually had a timestamp. Combined with a video stream, it doesn't work too well in that setup of course.
If live transcoding into theora/vorbis, and sending it out over RTSP/RTP, it seemed to work really well, though!
That's pretty strange, I would like to see your sample. With Big Buck Bunny, Vorbis standalone timestamps are fairly linear.
Can you give an url to this test clip?
http://mirrorblender.top-ix.org/peach/bigbuckbunny_movies/big_buck_bunny_480...
Yes, this one seems to work just fine. I think this comes from the way that many individual packets are stored in one ogg page(?) or something - I think the libavformat ogg muxer changed to store many packets per page not too long ago. That would explain the difference.
Using Theora and Vorbis together with RTSP/RTP, things work well. Theora standalone is still full of artifacts around motion with TCP, and UDP is a slideshow because of all the dropped packets. Tomorrow, I'll copy over the packetization routine from Feng and see if the results are more sane.
Hmm, that sounds really strange. Can you give the command lines that you've used for these test setups?
ffmpeg -re -i acodec copy -vcodec copy -f rtsp rtsp://localhost/foo.sdp
also with -an, -vn, and ?tcp variants.
ffplay rtsp://localhost/foo.sdp
Ok, I'll see if I can reproduce your problems in a while. // Martin
On Thu, 29 Jul 2010, Josh Allmann wrote:
On 29 July 2010 04:54, Martin Storsjö <martin@martin.st> wrote:
On Thu, 29 Jul 2010, Josh Allmann wrote:
Also, there's an issue with vorbis timestamps if you stream copy from an ogg file - not all packets have pts/dts set when reading them, and for vorbis, you'd need to decode it to figure out the proper length of each frame. In ffmpeg.c, lines 1622-1623, the pts is updated using an estimate of the frame size, which doesn't turn out to be right (in my case at least).
If playing only vorbis over RTSP/RTP, it still worked, but the timestamps advanced slowly from 0 to perhaps 1, then jumped to 5-6 seconds, advanced slowly there again, and jumped onwards when the ogg demuxer returned a packet that actually had a timestamp. Combined with a video stream, it doesn't work too well in that setup of course.
If live transcoding into theora/vorbis, and sending it out over RTSP/RTP, it seemed to work really well, though!
That's pretty strange, I would like to see your sample. With Big Buck Bunny, Vorbis standalone timestamps are fairly linear.
Can you give an url to this test clip?
http://mirrorblender.top-ix.org/peach/bigbuckbunny_movies/big_buck_bunny_480...
Hmm, ffplay has some weird issues with this file on OS X, but if hacked to request slightly larger YUV overlay buffers from SDL (with a width being a multiple of 16), it works fine.
Using Theora and Vorbis together with RTSP/RTP, things work well. Theora standalone is still full of artifacts around motion with TCP, and UDP is a slideshow because of all the dropped packets. Tomorrow, I'll copy over the packetization routine from Feng and see if the results are more sane.
Hmm, that sounds really strange. Can you give the command lines that you've used for these test setups?
ffmpeg -re -i acodec copy -vcodec copy -f rtsp rtsp://localhost/foo.sdp
also with -an, -vn, and ?tcp variants.
ffplay rtsp://localhost/foo.sdp
Hmm, that's weird, Theora works just fine over TCP for me. Over UDP, I get lots of artefacts due to dropped packets, though, but that's probably to be expected. Tested it with the latest patch you sent (and with the previous one with my local modifications). // Martin
On 29 July 2010 23:48, Martin Storsjö <martin@martin.st> wrote:
On Thu, 29 Jul 2010, Josh Allmann wrote:
http://mirrorblender.top-ix.org/peach/bigbuckbunny_movies/big_buck_bunny_480...
Hmm, ffplay has some weird issues with this file on OS X, but if hacked to request slightly larger YUV overlay buffers from SDL (with a width being a multiple of 16), it works fine.
Works well on linux.
Using Theora and Vorbis together with RTSP/RTP, things work well. Theora standalone is still full of artifacts around motion with TCP, and UDP is a slideshow because of all the dropped packets. Tomorrow, I'll copy over the packetization routine from Feng and see if the results are more sane.
Hmm, that sounds really strange. Can you give the command lines that you've used for these test setups?
ffmpeg -re -i acodec copy -vcodec copy -f rtsp rtsp://localhost/foo.sdp
also with -an, -vn, and ?tcp variants.
ffplay rtsp://localhost/foo.sdp
Hmm, that's weird, Theora works just fine over TCP for me. Over UDP, I get lots of artefacts due to dropped packets, though, but that's probably to be expected. Tested it with the latest patch you sent (and with the previous one with my local modifications).
I don't know exactly what I did, but this round adds in support for multiple Xiph frames per packet, and the tcp issue is gone. Pebkac most likely. The r23231 bug with ogg stream copy/Vorbis pts is still there, but that's outside the scope of this patch. Josh
On Mon, 2 Aug 2010, Josh Allmann wrote:
On 29 July 2010 23:48, Martin Storsjö <martin@martin.st> wrote:
On Thu, 29 Jul 2010, Josh Allmann wrote:
http://mirrorblender.top-ix.org/peach/bigbuckbunny_movies/big_buck_bunny_480...
Hmm, ffplay has some weird issues with this file on OS X, but if hacked to request slightly larger YUV overlay buffers from SDL (with a width being a multiple of 16), it works fine.
Works well on linux.
Using Theora and Vorbis together with RTSP/RTP, things work well. Theora standalone is still full of artifacts around motion with TCP, and UDP is a slideshow because of all the dropped packets. Tomorrow, I'll copy over the packetization routine from Feng and see if the results are more sane.
Hmm, that sounds really strange. Can you give the command lines that you've used for these test setups?
ffmpeg -re -i acodec copy -vcodec copy -f rtsp rtsp://localhost/foo.sdp
also with -an, -vn, and ?tcp variants.
ffplay rtsp://localhost/foo.sdp
Hmm, that's weird, Theora works just fine over TCP for me. Over UDP, I get lots of artefacts due to dropped packets, though, but that's probably to be expected. Tested it with the latest patch you sent (and with the previous one with my local modifications).
I don't know exactly what I did, but this round adds in support for multiple Xiph frames per packet, and the tcp issue is gone. Pebkac most likely.
Hmm, weird. Now I'm getting some issues with vorbis audio with this patch. I'll debug it and see what's going wrong in a while - it may just as well be something in my setup, but I suspect something with multiple frames per packet.
The r23231 bug with ogg stream copy/Vorbis pts is still there, but that's outside the scope of this patch.
Yes, that's out of scope. // Martin
On Tue, 3 Aug 2010, Martin Storsjö wrote:
On Mon, 2 Aug 2010, Josh Allmann wrote:
I don't know exactly what I did, but this round adds in support for multiple Xiph frames per packet, and the tcp issue is gone. Pebkac most likely.
Hmm, weird.
Now I'm getting some issues with vorbis audio with this patch. I'll debug it and see what's going wrong in a while - it may just as well be something in my setup, but I suspect something with multiple frames per packet.
Yes, it was issues with multiple frames per packet - in the depacketizer actually. How did you test this - it didn't seem to work at all for me? After the two patches I sent to -devel, it seems to work just fine, though. Except that, it seems to work in general. Relatively thorough review below:
@@ -135,6 +137,14 @@ static int rtp_write_header(AVFormatContext *s1) s->nal_length_size = (st->codec->extradata[4] & 0x03) + 1; } break; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if (!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length + s->num_frames = 0; + if (st->codec->codec_id == CODEC_ID_VORBIS) goto defaultcase; + break;
I think you could do this for both codecs - the default case sets buf_ptr, which needs to be set for theora too.
diff --git a/libavformat/rtpenc_xiph.c b/libavformat/rtpenc_xiph.c new file mode 100644 index 0000000..989354f --- /dev/null +++ b/libavformat/rtpenc_xiph.c @@ -0,0 +1,117 @@ +/* + * RTP packetization for Xiph audio and video + * Copyright (c) 2010 Josh Allmann + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "avformat.h" +#include "rtpenc.h" + +/** + * Packetize Xiph frames into RTP according to + * RFC 5215 (Vorbis) and the Theora RFC draft. + * (http://svn.xiph.org/trunk/theora/doc/draft-ietf-avt-rtp-theora-00.txt) + */ +void ff_rtp_send_xiph(AVFormatContext *s1, const uint8_t *buff, int size) +{ + RTPMuxContext *s = s1->priv_data; + int max_pkt_size, xdt, frag; + uint8_t *q; + + max_pkt_size = s->max_payload_size; + + /* set xiph data type */ + switch (*buff) { + case 0x01: // vorbis id + case 0x05: // vorbis setup + case 0x80: // theora header + case 0x82: // theora tables + xdt = 1; // packed config payload + break; + case 0x03: // vorbis comments + case 0x81: // theora comments + xdt = 2; // comment payload + break; + default: + xdt = 0; // raw data payload + }
The indentation of the case statements usually is at the same level as the switch statement itself in ffmpeg. Also, a break at the end of the default case wouldn't hurt I think.
+ + /* Set ident. Must match the one in sdp.c + * Probably need a non-fixed way of generating + * this, but it has to be done in SDP and passed in from there. */ + q = s->buf; + *q++ = 0xfe; + *q++ = 0xcd; + *q++ = 0xba; + + /* set fragment + * 0 - whole frame (possibly multiple frames) + * 1 - first fragment + * 2 - fragment continuation + * 3 - last fragmement */ + frag = size <= max_pkt_size ? 0 : 1; + + if (s->num_frames && (xdt || frag)) { + /* immediately send any buffered frames + * if buffer is not raw data, or if current frame is fragmented. */ + ff_rtp_send_data(s1, s->buf, s->buf_ptr - s->buf, 0); + }
You could move this if clause to below the if (!frag && !xdt), then this one would be only if (s->num_frames).
+ + if (!frag && !xdt) { // do we have a whole frame of raw data? + int remaining = max_pkt_size - ((int)(s->buf_ptr - s->buf) + size);
This calculation is slightly off - max_pkt_size already has the 6 bytes header subtracted, but those 6 bytes still are included in s->buf_ptr - s->buf. Also, it feels a bit convoluted. If you happen to have a packet of e.g. 1453, this forces the lines below to send the currently buffered data, even if there isn't any frame buffered (s->buf_ptr - s->buf is 3 bytes, and only contains the ident header).
+ if (remaining < 0 || s->num_frames >= s->max_frames_per_packet) { + /* send previous packets now; no room for new data */ + ff_rtp_send_data(s1, s->buf, s->buf_ptr - s->buf, 0); + s->num_frames = 0; + } + + /* buffer current frame to send later */ + if (0 == s->num_frames) s->timestamp = s->cur_timestamp; + s->num_frames++; + *q++ = s->num_frames; // set packet header
Some comment mentioning that frag and xdt should be or'ed in here, too, but omitted since they're both 0, wouuld help the readability.
+ if (s->num_frames > 1) q = s->buf_ptr; // jump ahead if needed + *q++ = (size >> 8) & 0xff; + *q++ = size & 0xff; + memmove(q, buff, size);
Why memmove? I don't think there's any theoretical case where the buffers could overlap?
+ q += size; + s->buf_ptr = q; + return; + } + + s->timestamp = s->cur_timestamp; + s->num_frames = 0; + s->buf_ptr = q; + while (size > 0) { + int len = (!frag || frag == 3) ? size : max_pkt_size; + q = s->buf_ptr; + + /* set packet headers */ + *q++ = (frag << 6) | (xdt << 4); + *q++ = (len >> 8) & 0xff; + *q++ = len & 0xff; + /* set packet body */ + memmove(q, buff, len);
Same here
+ q += len; + buff += len; + size -= len; + + ff_rtp_send_data(s1, s->buf, q - s->buf, 0); + + frag = size <= max_pkt_size ? 3 : 2; + } +} diff --git a/libavformat/rtsp.c b/libavformat/rtsp.c index 689ad29..1be3cd0 100644 --- a/libavformat/rtsp.c +++ b/libavformat/rtsp.c @@ -52,7 +52,7 @@ int rtsp_default_protocols = (1 << RTSP_LOWER_TRANSPORT_UDP); #define SELECT_TIMEOUT_MS 100 #define READ_PACKET_TIMEOUT_S 10 #define MAX_TIMEOUTS READ_PACKET_TIMEOUT_S * 1000 / SELECT_TIMEOUT_MS -#define SDP_MAX_SIZE 8192 +#define SDP_MAX_SIZE 16384
Is there any limit on the size of the extradata for theora/vorbis? Can we be reasonably sure this is enough for at least one theora stream plus one vorbis stream?
diff --git a/libavformat/sdp.c b/libavformat/sdp.c index b34b944..acd954a 100644 --- a/libavformat/sdp.c +++ b/libavformat/sdp.c @@ -21,6 +21,7 @@ #include <string.h> #include "libavutil/avstring.h" #include "libavutil/base64.h" +#include "libavcodec/xiph.h" #include "avformat.h" #include "internal.h" #include "avc.h" @@ -220,6 +221,68 @@ static char *extradata2config(AVCodecContext *c) return config; }
+static char *xiph_extradata2config(AVCodecContext *c) +{ + char *config, *encoded_config; + uint8_t *header_start[3]; + int headers_len, header_len[3], config_len; + int first_header_size; + + switch (c->codec_id) { + case CODEC_ID_THEORA: + first_header_size = 42; + break; + case CODEC_ID_VORBIS: + first_header_size = 30; + break; + default: + av_log(c, AV_LOG_ERROR, "Unsupported Xiph codec ID\n"); + return NULL; + } + + if (ff_split_xiph_headers(c->extradata, c->extradata_size, + first_header_size, header_start, + header_len) < 0) { + av_log(c, AV_LOG_ERROR, "Extradata corrupt.");
Add a newline to the log message
+ return NULL; + } + + headers_len = header_len[0]+header_len[2];
Some space around the + would make it nicer to read :-)
+ config_len = 4 + // count + 3 + // ident + 2 + // packet size + 1 + // header count + 2 + // header size + headers_len; // and the rest + config = av_malloc(config_len); + encoded_config = av_malloc(AV_BASE64_SIZE(config_len)); + + if (!config || !encoded_config) { + av_log(c, AV_LOG_ERROR, + "Not enough memory for configuration string\n");
If either of them was allocated, but not the other, you'd leak memory here
+ return NULL; + } +
// Martin
Hi, On 4 August 2010 04:55, Martin Storsjö <martin@martin.st> wrote:
On Tue, 3 Aug 2010, Martin Storsjö wrote:
On Mon, 2 Aug 2010, Josh Allmann wrote:
I don't know exactly what I did, but this round adds in support for multiple Xiph frames per packet, and the tcp issue is gone. Pebkac most likely.
Hmm, weird.
Now I'm getting some issues with vorbis audio with this patch. I'll debug it and see what's going wrong in a while - it may just as well be something in my setup, but I suspect something with multiple frames per packet.
Yes, it was issues with multiple frames per packet - in the depacketizer actually. How did you test this - it didn't seem to work at all for me? After the two patches I sent to -devel, it seems to work just fine, though.
Thanks for that. I won't rely on my mother to test audio anymore.
Except that, it seems to work in general. Relatively thorough review below:
@@ -135,6 +137,14 @@ static int rtp_write_header(AVFormatContext *s1) s->nal_length_size = (st->codec->extradata[4] & 0x03) + 1; } break; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if (!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length + s->num_frames = 0; + if (st->codec->codec_id == CODEC_ID_VORBIS) goto defaultcase; + break;
I think you could do this for both codecs - the default case sets buf_ptr, which needs to be set for theora too.
Fixed.
diff --git a/libavformat/rtpenc_xiph.c b/libavformat/rtpenc_xiph.c new file mode 100644 index 0000000..989354f --- /dev/null +++ b/libavformat/rtpenc_xiph.c @@ -0,0 +1,117 @@ +/* + * RTP packetization for Xiph audio and video + * Copyright (c) 2010 Josh Allmann + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "avformat.h" +#include "rtpenc.h" + +/** + * Packetize Xiph frames into RTP according to + * RFC 5215 (Vorbis) and the Theora RFC draft. + * (http://svn.xiph.org/trunk/theora/doc/draft-ietf-avt-rtp-theora-00.txt) + */ +void ff_rtp_send_xiph(AVFormatContext *s1, const uint8_t *buff, int size) +{ + RTPMuxContext *s = s1->priv_data; + int max_pkt_size, xdt, frag; + uint8_t *q; + + max_pkt_size = s->max_payload_size; + + /* set xiph data type */ + switch (*buff) { + case 0x01: // vorbis id + case 0x05: // vorbis setup + case 0x80: // theora header + case 0x82: // theora tables + xdt = 1; // packed config payload + break; + case 0x03: // vorbis comments + case 0x81: // theora comments + xdt = 2; // comment payload + break; + default: + xdt = 0; // raw data payload + }
The indentation of the case statements usually is at the same level as the switch statement itself in ffmpeg. Also, a break at the end of the default case wouldn't hurt I think.
Fixed.
+ + /* Set ident. Must match the one in sdp.c + * Probably need a non-fixed way of generating + * this, but it has to be done in SDP and passed in from there. */ + q = s->buf; + *q++ = 0xfe; + *q++ = 0xcd; + *q++ = 0xba; + + /* set fragment + * 0 - whole frame (possibly multiple frames) + * 1 - first fragment + * 2 - fragment continuation + * 3 - last fragmement */ + frag = size <= max_pkt_size ? 0 : 1; + + if (s->num_frames && (xdt || frag)) { + /* immediately send any buffered frames + * if buffer is not raw data, or if current frame is fragmented. */ + ff_rtp_send_data(s1, s->buf, s->buf_ptr - s->buf, 0); + }
You could move this if clause to below the if (!frag && !xdt), then this one would be only if (s->num_frames).
Fixed.
+ + if (!frag && !xdt) { // do we have a whole frame of raw data? + int remaining = max_pkt_size - ((int)(s->buf_ptr - s->buf) + size);
This calculation is slightly off - max_pkt_size already has the 6 bytes header subtracted, but those 6 bytes still are included in s->buf_ptr - s->buf. Also, it feels a bit convoluted.
If you happen to have a packet of e.g. 1453, this forces the lines below to send the currently buffered data, even if there isn't any frame buffered (s->buf_ptr - s->buf is 3 bytes, and only contains the ident header).
Hmm, OK. I tightened it up. The total header size is actually 4 + 2n, where n is the number of frames in the packet -- we need to prefix the length before each frame. In this case though, those additional headers are included in (s->buf_ptr - s->buf), but we do need to have room for an additional 2 bytes if the current frame is not the first of the packet. I am not 100% certain this fix is correct, so a second pair of eyes would be good. + int data_size = (s->buf_ptr - s->buf) + size; + int header_size = s->num_frames ? 8 : 6; // 6 bytes standard, 2 more bytes for length of extra frame + int remaining = max_pkt_size - data_size - header_size;
+ if (remaining < 0 || s->num_frames >= s->max_frames_per_packet) { + /* send previous packets now; no room for new data */ + ff_rtp_send_data(s1, s->buf, s->buf_ptr - s->buf, 0); + s->num_frames = 0; + } + + /* buffer current frame to send later */ + if (0 == s->num_frames) s->timestamp = s->cur_timestamp; + s->num_frames++; + *q++ = s->num_frames; // set packet header
Some comment mentioning that frag and xdt should be or'ed in here, too, but omitted since they're both 0, wouuld help the readability.
Done.
+ if (s->num_frames > 1) q = s->buf_ptr; // jump ahead if needed + *q++ = (size >> 8) & 0xff; + *q++ = size & 0xff; + memmove(q, buff, size);
Why memmove? I don't think there's any theoretical case where the buffers could overlap?
Not sure why, either. Fixed.
+ q += size; + s->buf_ptr = q; + return; + } + + s->timestamp = s->cur_timestamp; + s->num_frames = 0; + s->buf_ptr = q; + while (size > 0) { + int len = (!frag || frag == 3) ? size : max_pkt_size; + q = s->buf_ptr; + + /* set packet headers */ + *q++ = (frag << 6) | (xdt << 4); + *q++ = (len >> 8) & 0xff; + *q++ = len & 0xff; + /* set packet body */ + memmove(q, buff, len);
Same here
Fixed.
+ q += len; + buff += len; + size -= len; + + ff_rtp_send_data(s1, s->buf, q - s->buf, 0); + + frag = size <= max_pkt_size ? 3 : 2; + } +} diff --git a/libavformat/rtsp.c b/libavformat/rtsp.c index 689ad29..1be3cd0 100644 --- a/libavformat/rtsp.c +++ b/libavformat/rtsp.c @@ -52,7 +52,7 @@ int rtsp_default_protocols = (1 << RTSP_LOWER_TRANSPORT_UDP); #define SELECT_TIMEOUT_MS 100 #define READ_PACKET_TIMEOUT_S 10 #define MAX_TIMEOUTS READ_PACKET_TIMEOUT_S * 1000 / SELECT_TIMEOUT_MS -#define SDP_MAX_SIZE 8192 +#define SDP_MAX_SIZE 16384
Is there any limit on the size of the extradata for theora/vorbis? Can we be reasonably sure this is enough for at least one theora stream plus one vorbis stream?
I am assuming you mean one theora stream, OR one vorbis stream, not AND. sdp_parse in rtsp.c (line 397) claims 16KB max for the FMTP line, but I'm not sure what the provenance of that number is. I took a quick glance through the Theora and Vorbis bitstream specs, and couldn't find any hard figures for this. From empirical testing, the Theora extradata is usually a bit smaller than the Vorbis. Even then, I have yet to see (non-base64-encoded) sizes of more than 5KB. So we should be comfortably under this limit.
diff --git a/libavformat/sdp.c b/libavformat/sdp.c index b34b944..acd954a 100644 --- a/libavformat/sdp.c +++ b/libavformat/sdp.c @@ -21,6 +21,7 @@ #include <string.h> #include "libavutil/avstring.h" #include "libavutil/base64.h" +#include "libavcodec/xiph.h" #include "avformat.h" #include "internal.h" #include "avc.h" @@ -220,6 +221,68 @@ static char *extradata2config(AVCodecContext *c) return config; }
+static char *xiph_extradata2config(AVCodecContext *c) +{ + char *config, *encoded_config; + uint8_t *header_start[3]; + int headers_len, header_len[3], config_len; + int first_header_size; + + switch (c->codec_id) { + case CODEC_ID_THEORA: + first_header_size = 42; + break; + case CODEC_ID_VORBIS: + first_header_size = 30; + break; + default: + av_log(c, AV_LOG_ERROR, "Unsupported Xiph codec ID\n"); + return NULL; + } + + if (ff_split_xiph_headers(c->extradata, c->extradata_size, + first_header_size, header_start, + header_len) < 0) { + av_log(c, AV_LOG_ERROR, "Extradata corrupt.");
Add a newline to the log message
Fixed.
+ return NULL; + } + + headers_len = header_len[0]+header_len[2];
Some space around the + would make it nicer to read :-)
Fixed.
+ config_len = 4 + // count + 3 + // ident + 2 + // packet size + 1 + // header count + 2 + // header size + headers_len; // and the rest + config = av_malloc(config_len); + encoded_config = av_malloc(AV_BASE64_SIZE(config_len)); + + if (!config || !encoded_config) { + av_log(c, AV_LOG_ERROR, + "Not enough memory for configuration string\n");
If either of them was allocated, but not the other, you'd leak memory here
Fixed. Josh
On Thu, 5 Aug 2010, Josh Allmann wrote:
+ + if (!frag && !xdt) { // do we have a whole frame of raw data? + int remaining = max_pkt_size - ((int)(s->buf_ptr - s->buf) + size);
This calculation is slightly off - max_pkt_size already has the 6 bytes header subtracted, but those 6 bytes still are included in s->buf_ptr - s->buf. Also, it feels a bit convoluted.
If you happen to have a packet of e.g. 1453, this forces the lines below to send the currently buffered data, even if there isn't any frame buffered (s->buf_ptr - s->buf is 3 bytes, and only contains the ident header).
Hmm, OK. I tightened it up. The total header size is actually 4 + 2n, where n is the number of frames in the packet -- we need to prefix the length before each frame.
In this case though, those additional headers are included in (s->buf_ptr - s->buf), but we do need to have room for an additional 2 bytes if the current frame is not the first of the packet.
I am not 100% certain this fix is correct, so a second pair of eyes would be good.
I'll have another look in a while.
+ int data_size = (s->buf_ptr - s->buf) + size; + int header_size = s->num_frames ? 8 : 6; // 6 bytes standard, 2 more bytes for length of extra frame + int remaining = max_pkt_size - data_size - header_size;
diff --git a/libavformat/rtsp.c b/libavformat/rtsp.c index 689ad29..1be3cd0 100644 --- a/libavformat/rtsp.c +++ b/libavformat/rtsp.c @@ -52,7 +52,7 @@ int rtsp_default_protocols = (1 << RTSP_LOWER_TRANSPORT_UDP); #define SELECT_TIMEOUT_MS 100 #define READ_PACKET_TIMEOUT_S 10 #define MAX_TIMEOUTS READ_PACKET_TIMEOUT_S * 1000 / SELECT_TIMEOUT_MS -#define SDP_MAX_SIZE 8192 +#define SDP_MAX_SIZE 16384
Is there any limit on the size of the extradata for theora/vorbis? Can we be reasonably sure this is enough for at least one theora stream plus one vorbis stream?
I am assuming you mean one theora stream, OR one vorbis stream, not AND. sdp_parse in rtsp.c (line 397) claims 16KB max for the FMTP line, but I'm not sure what the provenance of that number is. I took a quick glance through the Theora and Vorbis bitstream specs, and couldn't find any hard figures for this. From empirical testing, the Theora extradata is usually a bit smaller than the Vorbis. Even then, I have yet to see (non-base64-encoded) sizes of more than 5KB. So we should be comfortably under this limit.
No, I ment AND - if you stream both and audio video, the description of both go into the same single SDP block, so it has to be large enough for both of them. The comment at line 397 in rtsp.c refers to one single SDP line, so if there are cases where a vorbis line actually could be almost 16 KB, we'd need yet a bit more space if we'd want to stream theora at the same time. I'll have a look at the rest later. // Martin
On 5 August 2010 01:44, Martin Storsjö <martin@martin.st> wrote:
On Thu, 5 Aug 2010, Josh Allmann wrote:
-#define SDP_MAX_SIZE 8192 +#define SDP_MAX_SIZE 16384
Is there any limit on the size of the extradata for theora/vorbis? Can we be reasonably sure this is enough for at least one theora stream plus one vorbis stream?
I am assuming you mean one theora stream, OR one vorbis stream, not AND. sdp_parse in rtsp.c (line 397) claims 16KB max for the FMTP line, but I'm not sure what the provenance of that number is. I took a quick glance through the Theora and Vorbis bitstream specs, and couldn't find any hard figures for this. From empirical testing, the Theora extradata is usually a bit smaller than the Vorbis. Even then, I have yet to see (non-base64-encoded) sizes of more than 5KB. So we should be comfortably under this limit.
No, I ment AND - if you stream both and audio video, the description of both go into the same single SDP block, so it has to be large enough for both of them. The comment at line 397 in rtsp.c refers to one single SDP line, so if there are cases where a vorbis line actually could be almost 16 KB, we'd need yet a bit more space if we'd want to stream theora at the same time.
I have yet to see a theora+vorbis stream take more than ~14KB of sdp. If we encounter a sample that takes up more, we can enlarge the buffer then. Josh
On Thu, 5 Aug 2010, Josh Allmann wrote:
@@ -135,6 +137,14 @@ static int rtp_write_header(AVFormatContext *s1) s->nal_length_size = (st->codec->extradata[4] & 0x03) + 1; } break; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if (!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length + s->num_frames = 0; + if (st->codec->codec_id == CODEC_ID_VORBIS) goto defaultcase; + break;
I think you could do this for both codecs - the default case sets buf_ptr, which needs to be set for theora too.
Fixed.
@@ -135,6 +137,14 @@ static int rtp_write_header(AVFormatContext *s1) s->nal_length_size = (st->codec->extradata[4] & 0x03) + 1; } break; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if (!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length + s->num_frames = 0; + goto defaultcase; + break;
In this case, when you have an unconditional goto, you could drop the break
+ + if (!frag && !xdt) { // do we have a whole frame of raw data? + int remaining = max_pkt_size - ((int)(s->buf_ptr - s->buf) + size);
This calculation is slightly off - max_pkt_size already has the 6 bytes header subtracted, but those 6 bytes still are included in s->buf_ptr - s->buf. Also, it feels a bit convoluted.
If you happen to have a packet of e.g. 1453, this forces the lines below to send the currently buffered data, even if there isn't any frame buffered (s->buf_ptr - s->buf is 3 bytes, and only contains the ident header).
Hmm, OK. I tightened it up. The total header size is actually 4 + 2n, where n is the number of frames in the packet -- we need to prefix the length before each frame.
In this case though, those additional headers are included in (s->buf_ptr - s->buf), but we do need to have room for an additional 2 bytes if the current frame is not the first of the packet.
I am not 100% certain this fix is correct, so a second pair of eyes would be good.
+ int data_size = (s->buf_ptr - s->buf) + size; + int header_size = s->num_frames ? 8 : 6; // 6 bytes standard, 2 more bytes for length of extra frame + int remaining = max_pkt_size - data_size - header_size;
Actually, the problem is that it's too tight. The check above, frag = size <= max_pkt_size ? 0 : 1; indicated that it should fit without fragmenting it, but this check says that it doesn't fit and you should send whatever you have buffered from before - even if there's no packet buffered from before. You _could_ add an "s->num_frames > 0 &&" part to the if clause, so you won't accidentally send empty packets. I think the calculation of whether it fits or not is clearer this way: // We're allowed to write 6 + max_pkt_size bytes from s->buf. // Calculate how much space is left after writing 2 length bytes // + size from buf_ptr. int remaining = s->buf + 6 + max_pkt_size - (s->buf_ptr + 2 + size); So, the s->buf + 6 + max_pkt_size could be stored away as an end_ptr or something, if you like - that makes it easier to check with the current s->buf_ptr if the things we're going to write (2 + size) fits or not. Writing it all in one expression like this is ok for me, too. Except for this, it looks quite good. // Martin
On 5 August 2010 02:21, Martin Storsjö <martin@martin.st> wrote:
On Thu, 5 Aug 2010, Josh Allmann wrote:
@@ -135,6 +137,14 @@ static int rtp_write_header(AVFormatContext *s1) s->nal_length_size = (st->codec->extradata[4] & 0x03) + 1; } break; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if (!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length + s->num_frames = 0; + goto defaultcase; + break;
In this case, when you have an unconditional goto, you could drop the break
Fixed.
+ + if (!frag && !xdt) { // do we have a whole frame of raw data? + int remaining = max_pkt_size - ((int)(s->buf_ptr - s->buf) + size);
This calculation is slightly off - max_pkt_size already has the 6 bytes header subtracted, but those 6 bytes still are included in s->buf_ptr - s->buf. Also, it feels a bit convoluted.
If you happen to have a packet of e.g. 1453, this forces the lines below to send the currently buffered data, even if there isn't any frame buffered (s->buf_ptr - s->buf is 3 bytes, and only contains the ident header).
Hmm, OK. I tightened it up. The total header size is actually 4 + 2n, where n is the number of frames in the packet -- we need to prefix the length before each frame.
In this case though, those additional headers are included in (s->buf_ptr - s->buf), but we do need to have room for an additional 2 bytes if the current frame is not the first of the packet.
I am not 100% certain this fix is correct, so a second pair of eyes would be good.
+ int data_size = (s->buf_ptr - s->buf) + size; + int header_size = s->num_frames ? 8 : 6; // 6 bytes standard, 2 more bytes for length of extra frame + int remaining = max_pkt_size - data_size - header_size;
Actually, the problem is that it's too tight. The check above, frag = size <= max_pkt_size ? 0 : 1; indicated that it should fit without fragmenting it, but this check says that it doesn't fit and you should send whatever you have buffered from before - even if there's no packet buffered from before.
You _could_ add an "s->num_frames > 0 &&" part to the if clause, so you won't accidentally send empty packets. I think the calculation of whether it fits or not is clearer this way:
// We're allowed to write 6 + max_pkt_size bytes from s->buf. // Calculate how much space is left after writing 2 length bytes // + size from buf_ptr. int remaining = s->buf + 6 + max_pkt_size - (s->buf_ptr + 2 + size);
So, the s->buf + 6 + max_pkt_size could be stored away as an end_ptr or something, if you like - that makes it easier to check with the current s->buf_ptr if the things we're going to write (2 + size) fits or not. Writing it all in one expression like this is ok for me, too.
Split it into ptr and end_ptr for clarity. Also added the s->num_frames > 0 check as discussed on IRC. Also changed the commenting style to use // rather than /*. Makes it slightly easier to debug out chunks of code :) Josh
On 5 August 2010 11:40, Josh Allmann <joshua.allmann@gmail.com> wrote:
On 5 August 2010 02:21, Martin Storsjö <martin@martin.st> wrote:
On Thu, 5 Aug 2010, Josh Allmann wrote:
@@ -135,6 +137,14 @@ static int rtp_write_header(AVFormatContext *s1) s->nal_length_size = (st->codec->extradata[4] & 0x03) + 1; } break; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if (!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length + s->num_frames = 0; + goto defaultcase; + break;
In this case, when you have an unconditional goto, you could drop the break
Fixed.
+ + if (!frag && !xdt) { // do we have a whole frame of raw data? + int remaining = max_pkt_size - ((int)(s->buf_ptr - s->buf) + size);
This calculation is slightly off - max_pkt_size already has the 6 bytes header subtracted, but those 6 bytes still are included in s->buf_ptr - s->buf. Also, it feels a bit convoluted.
If you happen to have a packet of e.g. 1453, this forces the lines below to send the currently buffered data, even if there isn't any frame buffered (s->buf_ptr - s->buf is 3 bytes, and only contains the ident header).
Hmm, OK. I tightened it up. The total header size is actually 4 + 2n, where n is the number of frames in the packet -- we need to prefix the length before each frame.
In this case though, those additional headers are included in (s->buf_ptr - s->buf), but we do need to have room for an additional 2 bytes if the current frame is not the first of the packet.
I am not 100% certain this fix is correct, so a second pair of eyes would be good.
+ int data_size = (s->buf_ptr - s->buf) + size; + int header_size = s->num_frames ? 8 : 6; // 6 bytes standard, 2 more bytes for length of extra frame + int remaining = max_pkt_size - data_size - header_size;
Actually, the problem is that it's too tight. The check above, frag = size <= max_pkt_size ? 0 : 1; indicated that it should fit without fragmenting it, but this check says that it doesn't fit and you should send whatever you have buffered from before - even if there's no packet buffered from before.
You _could_ add an "s->num_frames > 0 &&" part to the if clause, so you won't accidentally send empty packets. I think the calculation of whether it fits or not is clearer this way:
// We're allowed to write 6 + max_pkt_size bytes from s->buf. // Calculate how much space is left after writing 2 length bytes // + size from buf_ptr. int remaining = s->buf + 6 + max_pkt_size - (s->buf_ptr + 2 + size);
So, the s->buf + 6 + max_pkt_size could be stored away as an end_ptr or something, if you like - that makes it easier to check with the current s->buf_ptr if the things we're going to write (2 + size) fits or not. Writing it all in one expression like this is ok for me, too.
Split it into ptr and end_ptr for clarity. Also added the s->num_frames > 0 check as discussed on IRC.
Also changed the commenting style to use // rather than /*. Makes it slightly easier to debug out chunks of code :)
Hopefully last round; added in Changelog entry and micro version bump.
Josh
On Thu, 5 Aug 2010, Josh Allmann wrote:
On 5 August 2010 11:40, Josh Allmann <joshua.allmann@gmail.com> wrote:
Split it into ptr and end_ptr for clarity. Also added the s->num_frames > 0 check as discussed on IRC.
Also changed the commenting style to use // rather than /*. Makes it slightly easier to debug out chunks of code :)
Hopefully last round; added in Changelog entry and micro version bump.
Looks good to me. I'll apply it in a few days if I don't hear anything from Luca A (I mailed him regarding reviewing rtp packetizers a few days ago). // Martin
On Thu, 5 Aug 2010, Martin Storsjö wrote:
On Thu, 5 Aug 2010, Josh Allmann wrote:
On 5 August 2010 11:40, Josh Allmann <joshua.allmann@gmail.com> wrote:
Split it into ptr and end_ptr for clarity. Also added the s->num_frames > 0 check as discussed on IRC.
Also changed the commenting style to use // rather than /*. Makes it slightly easier to debug out chunks of code :)
Hopefully last round; added in Changelog entry and micro version bump.
Looks good to me. I'll apply it in a few days if I don't hear anything from Luca A (I mailed him regarding reviewing rtp packetizers a few days ago).
No reply from Luca A yet, so I went ahead and applied it. // Martin
Hi, On Tue, Jul 27, 2010 at 10:48 PM, Josh Allmann <joshua.allmann@gmail.com> wrote:
Re-sending, due to this hunk:
@@ -154,6 +156,11 @@ static int rtp_write_header(AVFormatContext *s1) } case CODEC_ID_AAC: s->num_frames = 0; + case CODEC_ID_VORBIS: + case CODEC_ID_THEORA: + if(!s->max_frames_per_packet) s->max_frames_per_packet = 15; + s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15); + s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length default: if (st->codec->codec_type == AVMEDIA_TYPE_AUDIO) { av_set_pts_info(st, 32, 1, st->codec->sample_rate);
Apparently AAC and AMR both fall through to the default case. Whether that's intentional, I don't know, so I moved my stuff to minimize behavioral changes to existing code.
You could use goto default (where you would normally use "break") so that this works for all codecs without randomness. Ronald
On Tue, 27 Jul 2010, Josh Allmann wrote:
-Theora video has problems with I-frames (or whatever the Theora equivalent is), and exhibits serious blocking in areas of motion.
Fixed, although on occassion it is a bit wonky. The depacketizer will complain about a missing start fragment at a particular position in big buck bunny, disrupting the video briefly, but otherwise it is fine.
Hmm, is this over a real network or over loopback? Sometimes, this kind of issues can be fixed by running both the sending and receiving RTSP streams over TCP (by adding ?tcp at the end of the urls). If the issue still is present then, there's a real problem in the code... // Martin
On 28 July 2010 00:46, Martin Storsjö <martin@martin.st> wrote:
On Tue, 27 Jul 2010, Josh Allmann wrote:
-Theora video has problems with I-frames (or whatever the Theora equivalent is), and exhibits serious blocking in areas of motion.
Fixed, although on occassion it is a bit wonky. The depacketizer will complain about a missing start fragment at a particular position in big buck bunny, disrupting the video briefly, but otherwise it is fine.
Hmm, is this over a real network or over loopback? Sometimes, this kind of issues can be fixed by running both the sending and receiving RTSP streams over TCP (by adding ?tcp at the end of the urls). If the issue still is present then, there's a real problem in the code...
Looks like it's a real problem with the code, still happens with TCP over loopback.The depacketizer also complains about a bad cseq. I'll take a look at it in the morning. Josh
participants (4)
-
Josh Allmann -
Luca Barbato -
Martin Storsjö -
Ronald S. Bultje