[FFmpeg-user] Fwd: Working with HLS Multiple Audio Renditions

Alfredo Di Napoli alfredo.dinapoli at gmail.com
Mon Jul 4 13:45:39 CEST 2016


Hey guys!

(Sorry if you receive this twice; I just subscribed to the list and had
previously sent a message that was waiting for approval.)


First of all, thanks for all the outstanding work you guys are doing on
FFmpeg; it's my go-to solution for working with audio and video streams.

I'm currently trying to crack the "Multiple Audio Renditions" nut; I use
JWPlayer 7 in my
everyday job and they recently released support for alternate audio streams
for HLS as
described in this part of the standard (Master Playlist with Alternative
audio):

https://tools.ietf.org/html/draft-pantos-http-live-streaming-19#section-8.6

An example of what this should look like is visible here:

https://support.jwplayer.com/customer/portal/articles/1761348-multiple-audio-renditions

Without going into the gory details: at work we allow users to record
videos using an iPad camera, we now want to allow them to also record
multiple audio tracks from different sources (BT mics, for example), and
we then want our transcoding server to produce an HLS playlist where they
can switch between the different audio tracks with ease.

**I'm using ffmpeg 2.7.2**

As far as I know, FFmpeg does not offer any "out of the box" support for
this in the `hls` muxer, by which I mean the muxer only segments the input
media into `.ts` segments (audio + video) and creates a simple .m3u8
playlist, so my current flow is the following:

* I use the `hls` muxer with the `-an` flag to produce audio-less segments,
all listed in a valid manifest which ffmpeg generates (a rough sketch of
the command I run follows the playlist example below). Example
(`video.m3u8`):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:63
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:60.458333,
video_0.ts
#EXTINF:59.625000,
video_1.ts
#EXTINF:62.250000,
video_2.ts
....
#EXT-X-ENDLIST
```
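
The command for this step is roughly the following; it is only a sketch, the
input name and the x264 settings are placeholders, the relevant parts being
`-an` and the `hls` muxer options:

```
# Rough sketch of the video-only pass (input name and codec settings are
# placeholders): -an drops the audio, the hls muxer cuts ~60s .ts segments
# and writes the video.m3u8 playlist shown above.
ffmpeg -i input.mov -an -c:v libx264 -f hls \
       -hls_time 60 -hls_list_size 0 video.m3u8
```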

* I take the audio tracks I receive from the outside and use the
`segment` muxer to split them into valid segments:

```
ffmpeg -i foo.mp3 -threads 0 -ac 2 -acodec libmp3lame -async 1 -ar 44100 \
    -muxdelay 0 -f segment -segment_time 60 -segment_list_size 0 \
    -segment_format mp3 -segment_list audio1.m3u8 "foo_%d.mp3"
```

Which yields:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ALLOW-CACHE:YES
#EXT-X-TARGETDURATION:61
#EXTINF:60.003265,
foo_1.mp3
#EXTINF:60.003265,
foo_2.mp3
#EXTINF:60.003265,
foo_3.mp3
...
#EXT-X-ENDLIST
```

* I assemble the master playlist manually to conform with the standard.
Real-world example of what I'm producing (`index.m3u8`):

```
#EXTM3U
#EXT-X-VERSION:3

#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1200000,RESOLUTION=960x540,NAME="Main",CODECS="avc1.66.30",AUDIO="mp3"
video.m3u8

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="mp3",LANGUAGE="en",NAME="Left Part",DEFAULT=YES,AUTOSELECT=YES,URI="audio1.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="mp3",LANGUAGE="en",NAME="Right Part",DEFAULT=NO,AUTOSELECT=NO,URI="audio2.m3u8"
```

This produces a valid playlist which I'm then able to stream, but
unfortunately it seems to lose audio/video sync pretty quickly, and
generally speaking the way I'm doing things feels a bit wrong. If I jump
to a particular point in the video, the audio stops playing or the wrong
one plays. Maybe my segment size is too big?

*Thus, my questions for you guys are:*

1. What is the recommended workflow for producing HLS playlists with
multiple audio renditions?
   Can you guys point me to any helpful tutorial/documentation?

2. Currently I'm doing several passes:
    First pass: produce the audio-less video segments.
    Second .. N passes: for each audio input the user recorded, I run the
`segment` muxer.
    Is this a correct way to approach the problem? What constraints do I
need to enforce on the codecs/bitrates to make sure I'm not producing
audio segments which won't sync correctly?

3. Shall I simply give up any hope of making this work with separate audio
files and switch to working with media that has 1 video stream and N audio
streams? I can imagine at least 2 other scenarios to make this work:
    a. I would run ffmpeg once and map the video stream (0:0) to the HLS
muxer and all the N audio streams to the segment muxer, hoping ffmpeg will
be able to produce synced chunks (see the sketch after this list).
    b. I would run ffmpeg to produce a "normal" HLS playlist, then run it
again, on each chunk, to separate the audio from the video. The main
reasoning here would be, again, to have ffmpeg deal with all the
transcoding and the syncing.
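
For scenario (a), what I have in mind is a single invocation with multiple
outputs, roughly like the sketch below. It is untested; the input name,
codecs and segment length are placeholders, and I'm assuming `-segment_time`
should match `-hls_time` so the chunk boundaries line up:

```
# Rough, untested sketch of scenario (a): one ffmpeg run, the video stream
# mapped to the hls muxer and each audio stream mapped to its own segment
# muxer output. Input name, codecs and durations are placeholders.
ffmpeg -i input.mov \
    -map 0:v:0 -an -c:v libx264 -f hls -hls_time 60 -hls_list_size 0 \
        video.m3u8 \
    -map 0:a:0 -vn -c:a libmp3lame -ar 44100 -f segment -segment_time 60 \
        -segment_list audio1.m3u8 -segment_format mp3 "audio1_%d.mp3" \
    -map 0:a:1 -vn -c:a libmp3lame -ar 44100 -f segment -segment_time 60 \
        -segment_list audio2.m3u8 -segment_format mp3 "audio2_%d.mp3"
```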

As you guys can see, I'm quite confused and starting to ramble, so I'm
putting my trust in you audio/video gurus to help me out ;)

Sorry for the lengthy email and I hope I've made myself clear enough!

Peace,

Alfredo

