[FFmpeg-devel] Microsoft Smooth Streaming

C Chatterjee cchatterj at hotmail.com
Wed Oct 26 21:59:07 CEST 2011

Thanks for the links to the smooth streaming patch.
I integrated it into my FFmpeg tree, which is the Dec 28, 2010 tarball.

What is the command to make this work on an input file say ob.ts?

Here's a simple command line:
ffmpeg -y -i ob.ts -acodec libfaac -vcodec libx264 -vpre veryfast -b 250k -threads 4 cc.ism

Am I missing something here? There has to be a .ism and a number of .mp4 files.

Please send me an example command.


Date: Wed, 26 Oct 2011 04:25:29 +0200
From: michaelni at gmx.at
To: ffmpeg-devel at ffmpeg.org
CC: baptiste.coudurier at gmail.com
Subject: Re: [FFmpeg-devel] Microsoft Smooth Streaming

On Tue, Oct 25, 2011 at 09:25:22PM -0200, Marcus Nascimento wrote:
> Please, check the answers below.
> Thank you very much in advance.

We have to thank you for the excellent explanation.
Also, I am CCing this to Baptiste, who is our mov/mp4 expert. He can probably
help explain how best to connect all the things together.

> On Tue, Oct 25, 2011 at 3:54 PM, Nicolas George <
> nicolas.george at normalesup.org> wrote:
> > Le quartidi 4 brumaire, an CCXX, Marcus Nascimento a écrit :
> > > I'd like to extend FFMpeg to support Microsoft Smooth Streaming
> > > (streaming playback), the same way it has been done by all the
> > > available Silverlight players.
> >
> > Contributions are always welcome on principle.
> >
> > > For now I do not intend to dump data to a file to be played locally or
> > > anything like that. And I will probably never intend to do that. I just
> > > want to play it.
> >
> > If it can play it, then it can also dump it to a file. I hope you were not
> > counting otherwise.
> >
> >
> Definitely not. I was only worried about legal issues. I don't want to cause
> trouble for FFMpeg or anything like that.
>
> > > I did some research on this mailing list and found some posts that
> > > discussed this before.
> > > However I couldn't find in depth information or anything beyond the point
> > > I'm stuck.
> > >
> > > I've done a lot of research on MS Smooth Streaming theory of operation,
> > > studied some ISOFF (and PIFF) and some more.
> > > It is pretty clear to me how MS Smooth Streaming works. Now it is time to
> > > focus on how to do that in the FFMpeg way.
> > >
> > > First things first, I'd like to know how a stream should be processed
> > > in order to be played by FFMpeg.
> >
> > I believe you would receive more relevant replies faster if you took a few
> > minutes to describe an overview of how the protocol works.
> >
> >
> Right away. I'll give as many details as necessary here. Prepare yourself
> for some reading!
> First of all, the basic idea of Microsoft Smooth Streaming is to encode the
> same video at multiple bitrates. The client decides which bitrate to use, and
> at any time it can switch to another bitrate based on available bandwidth
> and other measurements.
> Each encoded bitrate produces an independent ISMV file (IIS Smooth
> Media Video, I suppose).
> The encoding relies on the fragmented structure that ISOFF (ISO
> File Format - the MP4 file format) offers. Keyframes are generated at
> regular, equally spaced intervals (2s) in all ISMV files.
> This is more restrictive than regular encoding procedures, which allow some
> flexibility in keyframe intervals (I believe so, though I'm not a specialist
> on that).
> It is important to note that every fragment starts with a keyframe.
> Each ISOFF fragment is perfectly aligned between different bitrates (in
> terms of time, of course. Data size may vary drastically). That alignment
> allows the client to request one bitrate for one fragment and switch
> to another bitrate for the next fragment.
> The ISMV file format is called PIFF and is based on ISOFF with a few
> additions. There are 3 uuid box types that are dedicated to DRM purposes (I
> won't touch them here). Hence the meaning of PIFF: Protected Interoperable
> File Format. The PIFF brand (ftyp box value) is "piff".
> More on PIFF format here: http://go.microsoft.com/?linkid=9682897
> The server side (in the MS implementation) is just an extension to IIS
> called IIS Media Services.
> That is just a web service that accepts HTTP requests with a custom
> formatted URL.
> The base URL is something like http://domain.com/video.ism (note that it is
> .ism, not .ismv), which is never requested directly.
> When the client wants to play a video, it requests a Manifest
> file. The URL is <baseUrl>/Manifest.
> The Manifest is just an XML file that describes the different streams and
> provides some other information.
> Here is a basic example (modified parts of the original found here:
> http://playready.directtaps.net/smoothstreaming/SSWSS720H264/SuperSpeedway_720.ism/Manifest
> ):
> <SmoothStreamingMedia MajorVersion="2" MinorVersion="1"
>     Duration="1209510000">
>   <StreamIndex Type="video" Name="video" Chunks="4" QualityLevels="2"
>       MaxWidth="1280" MaxHeight="720" DisplayWidth="1280" DisplayHeight="720"
>       Url="QualityLevels({bitrate})/Fragments(video={start time})">
>     <QualityLevel Index="0" Bitrate="2962000" FourCC="H264" MaxWidth="1280"
>         MaxHeight="720"
>         CodecPrivateData="000000016764001FAC2CA5014016EFFC100010014808080A000007D200017700C100005A648000B4C9FE31C6080002D3240005A64FF18E1DA12251600000000168E9093525"/>
>     <QualityLevel Index="1" Bitrate="2056000" FourCC="H264" MaxWidth="992"
>         MaxHeight="560"
>         CodecPrivateData="000000016764001FAC2CA503E047BFF040003FC52020202800001F480005DC03030003EBE8000FAFAFE31C6060007D7D0001F5F5FC6387684894580000000168E9093525"/>
>     <c d="20020000"/>
>     <c d="20020000"/>
>     <c d="20020000"/>
>     <c d="6670001"/>
>   </StreamIndex>
>   <StreamIndex Type="audio" Index="0" Name="audio" Chunks="4"
>       QualityLevels="1"
>       Url="QualityLevels({bitrate})/Fragments(audio={start time})">
>     <QualityLevel FourCC="AACL" Bitrate="128000" SamplingRate="44100"
>         Channels="2" BitsPerSample="16" PacketSize="4" AudioTag="255"
>         CodecPrivateData="1210"/>
>     <c d="20201360"/>
>     <c d="19969161"/>
>     <c d="19969161"/>
>     <c d="8126985"/>
>   </StreamIndex>
> </SmoothStreamingMedia>
> We can see it gives the version of the smooth streaming media and the
> duration (measured in units of 1/10,000,000 of a second, i.e. 100 ns).
> Next we see the video section, which says each quality level has 4 chunks
> (fragments), with 2 quality levels available. It also gives the video
> dimensions and the URL format.
> Next it gives information about each bitrate, with codec information and
> codec private data (I believe it is used to configure the codec in an opaque
> way).
CodecPrivateData looks like H.264 SPS and PPS NAL units from a quick
look. This should be decoded hex->binary and placed in extradata or
injected into the bitstream. FFmpeg's decoders are normally quite forgiving
about where and how they get this data ...
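
To make the hex->binary step concrete, here is a minimal Python sketch (Python is used only for illustration; it is not FFmpeg code). It assumes the CodecPrivateData string is Annex B style, that is, NAL units separated by 00 00 01 start codes, which is what the quoted value looks like. The helper names `split_annexb` and `nal_type` are made up for this example.

```python
import re

# CodecPrivateData of the first video QualityLevel in the manifest quoted above.
CODEC_PRIVATE_DATA = (
    "000000016764001FAC2CA5014016EFFC100010014808080A000007D200017700"
    "C100005A648000B4C9FE31C6080002D3240005A64FF18E1DA1225160"
    "0000000168E9093525"
)

def split_annexb(hex_string):
    """Decode the hex string and split it into NAL units at start codes."""
    data = bytes.fromhex(hex_string)
    # A start code is two or more zero bytes followed by 0x01.
    return [nal for nal in re.split(rb"\x00{2,}\x01", data) if nal]

def nal_type(nal):
    """nal_unit_type is the low 5 bits of the first NAL byte (7 = SPS, 8 = PPS)."""
    return nal[0] & 0x1F

nals = split_annexb(CODEC_PRIVATE_DATA)
for nal in nals:
    print(nal_type(nal), len(nal), nal.hex())
```

For this value the split gives two units with types 7 and 8, i.e. an SPS followed by a PPS, which agrees with the reading above. The decoded bytes are what would go into extradata.
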
> Next it lists each fragment's size. The first fragment would be referenced
> as 0 (zero), and the others as the sum of the previous fragments' sizes. I'm
> not sure exactly what those values mean.
> Next we have the same structure for audio description.
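
As an illustration of reading the Manifest, here is a small Python sketch using only the standard library. The manifest text is a trimmed copy of the example quoted above (CodecPrivateData and the audio stream are left out to keep it short). It assumes, without confirmation from the thread, that the `d` values on the `<c>` elements use the same 1/10,000,000 s unit as `Duration`; 20020000 would then be about 2.002 s, which fits the 2 s keyframe spacing mentioned earlier. `parse_manifest` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

TICKS_PER_SECOND = 10_000_000  # manifest times are in 1/10,000,000 s

# Trimmed copy of the manifest quoted above.
MANIFEST = """\
<SmoothStreamingMedia MajorVersion="2" MinorVersion="1" Duration="1209510000">
  <StreamIndex Type="video" Name="video" Chunks="4" QualityLevels="2"
      Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Index="0" Bitrate="2962000" FourCC="H264"/>
    <QualityLevel Index="1" Bitrate="2056000" FourCC="H264"/>
    <c d="20020000"/>
    <c d="20020000"/>
    <c d="20020000"/>
    <c d="6670001"/>
  </StreamIndex>
</SmoothStreamingMedia>
"""

def parse_manifest(xml_text):
    """Return (duration_in_ticks, list of per-stream dicts)."""
    root = ET.fromstring(xml_text)
    streams = []
    for index in root.findall("StreamIndex"):
        streams.append({
            "type": index.get("Type"),
            "url": index.get("Url"),
            "bitrates": [int(q.get("Bitrate"))
                         for q in index.findall("QualityLevel")],
            # Assumed to be per-fragment durations in ticks.
            "durations": [int(c.get("d")) for c in index.findall("c")],
        })
    return int(root.get("Duration")), streams

duration, streams = parse_manifest(MANIFEST)
print("duration (s):", duration / TICKS_PER_SECOND)
for stream in streams:
    seconds = [d / TICKS_PER_SECOND for d in stream["durations"]]
    print(stream["type"], stream["bitrates"], seconds)
```
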
> After getting the Manifest file, the client must decide which quality level
> is best suited for the device and its resources.
> It is not clear to me on what parameters it bases its decisions. I heard
> about size of the screen and its resolution, computing power, download
> bandwidth, etc.
> As soon as the quality level is chosen, I suppose the decoder has to be
> configured in a suitable way, using the CodecPrivateData information
> provided.
> The client then will start requesting fragments following the URL pattern
> given in the Manifest.
> To request the first fragment for the first quality level, it would request
> <baseUrl>/QualityLevels(0)/Fragments(video=0).
> To request the fourth fragment for the second quality level, it would request
> <baseUrl>/QualityLevels(1)/Fragments(video=60060000).
> It is also possible to request just the audio following the same idea. For
> instance: <baseUrl>/QualityLevels(0)/Fragments(audio=20201360).
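
The start-time arithmetic can be sketched in a few lines of Python. Each fragment's start time is the running sum of the `d` values before it, so the fourth video fragment starts at 3 x 20020000 = 60060000, as in the example above. One caveat: the `Url` template in the manifest names its placeholder `{bitrate}`, so this sketch follows the template literally and substitutes the `Bitrate` attribute, whereas the prose examples above put an index in that position. The function names and the base URL constant are assumptions made for the example.

```python
from itertools import accumulate

def fragment_start_times(durations):
    """Start of fragment i is the sum of the durations of fragments 0..i-1."""
    return list(accumulate(durations, initial=0))[:-1]

def fragment_url(base_url, template, bitrate, start_time):
    """Fill the manifest's Url template and append it to the base URL."""
    path = (template
            .replace("{bitrate}", str(bitrate))
            .replace("{start time}", str(start_time)))
    return f"{base_url}/{path}"

BASE_URL = "http://domain.com/video.ism"
VIDEO_TEMPLATE = "QualityLevels({bitrate})/Fragments(video={start time})"
VIDEO_DURATIONS = [20020000, 20020000, 20020000, 6670001]

starts = fragment_start_times(VIDEO_DURATIONS)
for start in starts:
    print(fragment_url(BASE_URL, VIDEO_TEMPLATE, 2056000, start))
```
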
> Each fragment received is arranged in PIFF wire format. In other words, it
> contains exactly one moof box and exactly one mdat box and nothing
> more (check MP4 specs for more info).
> Of course there are boxes nested inside those where applicable. It may
> contain custom uuid boxes designed to allow DRM protection. Let's not
> consider them here.
> I'm not sure which information I can get from the moof boxes, but I assume
> it would be relevant for the demuxer only, since the codec would only work
> on the mdat contained opaque data. Correct me if I'm wrong, please.
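
To show what "one moof box and one mdat box" means at the byte level, here is a minimal Python sketch that walks the top-level boxes of a fragment. It relies on the ISO base media box header layout: a 32-bit big-endian size that includes the header, followed by a four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end of the data. The fragment here is synthetic, built by the made-up helper `make_box`; no real server response is used.

```python
import struct

def iter_boxes(data):
    """Yield (type, payload) for each top-level box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:      # 64-bit "largesize" follows the type field
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:    # box extends to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("truncated or malformed box")
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size

def make_box(box_type, payload):
    """Build a box with a plain 32-bit size header (test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Synthetic stand-in for one fragment: a moof box followed by an mdat box.
fragment = make_box(b"moof", b"\x00" * 16) + make_box(b"mdat", b"fake samples")
print([box_type for box_type, _ in iter_boxes(fragment)])
```

A demuxer would go on to parse the boxes nested inside moof to learn sample sizes and timing; this sketch stops at the top level.
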
> The client would apply some heuristics while requesting fragments and
> at some point it may decide to switch to another quality level. I suppose it
> would have to reconfigure the decoder, and repeat this over and over until
> the end of the stream.
Most likely no reconfiguration is needed; simply feeding the next
"fragment" to the decoder might work fine.
The decoder should detect changes and reconfigure itself.
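
The thread does not say which heuristics real clients use, so the following Python sketch is only one assumed policy for illustration: before each fragment request, pick the highest bitrate that fits within a fraction of the measured download bandwidth, and fall back to the lowest bitrate otherwise. The function name `pick_bitrate` and the 0.8 safety factor are invented for this example.

```python
def pick_bitrate(available_bitrates, measured_bps, safety=0.8):
    """Return the highest bitrate within safety * measured_bps, else the lowest."""
    budget = measured_bps * safety
    fitting = [b for b in available_bitrates if b <= budget]
    return max(fitting) if fitting else min(available_bitrates)

# Bitrates of the two video quality levels in the manifest quoted above.
BITRATES = [2962000, 2056000]

for measured in (5_000_000, 3_000_000, 1_000_000):
    print(measured, "->", pick_bitrate(BITRATES, measured))
```

Because fragments are time-aligned across bitrates, the choice can change from one fragment to the next without any other bookkeeping.
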
> > > 2 - A very simple external code just requests FFMpeg to play a smooth
> > > streaming media. FFMpeg will detect this is an HTTP based media and will
> > > request the manifest file for that (I believe I'd have to create a custom
> > > HTTP based solution for that). By the time the manifest is available,
> > > ffmpeg would configure the decoder. Then it makes further HTTP requests
> > > the same way as in 1.
> >
> > There is already HTTP client code, as surely you know.
> >
> >
> Yes. I've seen something about it. It looks suitable for the case.
> It may be my starting point for studying. But I still feel I need
> a big picture of how ffmpeg works in general.
What we have basically are demuxers and protocols.
Protocols are things that (for our purpose here) provide a bytestream
from some URL and may provide seeking support.
Demuxers are things that, on top of a protocol (or other things),
produce data packets for various streams.
What you describe can be implemented either as a protocol that works
on top of the HTTP protocol and then feeds its data to an mp4
demuxer (which possibly needs modifications to handle the data),
or as a demuxer that works on top of the HTTP protocol and has an instance
of an mp4 demuxer to which it passes the data.
There are other ways too ...
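
The first option (a protocol layered on HTTP that hands a plain bytestream to a demuxer) can be pictured with a toy Python sketch. None of these names are FFmpeg APIs: `fragment_stream`, `ByteStream` and the in-memory `FAKE_SERVER` are hypothetical stand-ins, and a real implementation would be written in C against FFmpeg's own protocol and demuxer interfaces.

```python
def fragment_stream(fetch, urls):
    """Fetch each fragment in order and yield its bytes."""
    for url in urls:
        yield fetch(url)

class ByteStream:
    """Minimal read(n) interface over the concatenated fragments."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = b""

    def read(self, n):
        # Pull more fragments until n bytes are buffered or input runs out.
        while len(self._buffer) < n:
            try:
                self._buffer += next(self._chunks)
            except StopIteration:
                break
        out, self._buffer = self._buffer[:n], self._buffer[n:]
        return out

# Stand-in for HTTP: maps a fragment URL to its body.
FAKE_SERVER = {
    "Fragments(video=0)": b"moof0mdat0",
    "Fragments(video=20020000)": b"moof1mdat1",
}

stream = ByteStream(fragment_stream(FAKE_SERVER.__getitem__, list(FAKE_SERVER)))
print(stream.read(5), stream.read(10), stream.read(100))
```

The consumer of `read()` never sees fragment boundaries, which is the point of the protocol approach. In the second option the component doing the fetching would itself be a demuxer and would hand each fragment to an inner mp4 demuxer instead.
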
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes
