[FFmpeg-user] optimal exact segmentation without re-encoding

David Bernat david.bernat at gmail.com
Thu Jul 6 05:17:03 EEST 2023

On Wed, Jul 5, 2023 at 9:39 PM Carl Zwanzig <cpz at tuunq.com> wrote:

> On 7/5/2023 3:29 PM, David Bernat wrote:
> > Premise: segment a mov file into about one second segments without
> > re-encoding; yet preserving concatenation; such that the segmentation is
> > embarrassingly parallel, for high-speed segmenting.
> What is the purpose of these segmented files (how will they be used)? Are
> they going to be processed and then assembled together (which may involve
> decode/recode operations). Depending on the unknown intermediate work, it
> may be faster overall to decode all into an uncompressed state and work on
> that before concatenating into a reencode pass (which could also be done in
> segments which are then themselves concatenated).

The parallelization of processing is an essential motivation. (Thank you;
I look forward to working through this discussion and a solution with you.)

There are multiple uses, but may we focus on two:

1. computer vision and thumbnails:

Object detection runs quickly on images; and, in most cases, it is
sufficient to summarize a long video by applying object detection every 1
or 2 seconds.
Extracting a frame using -ss is very fast but not immediate: my speed tests
indicate that a CPU can achieve about 60 image extractions in about 15
seconds.
ffmpeg -ss can be embarrassingly parallelized and does speed up from about
18 seconds to 15 seconds.
The precise timestamp of the image is not essential (within a few frames is
more than sufficient) and so ffmpeg -ss is sufficient.
Accelerating this even further is a wonderful benefit.
Furthermore, if the video is already segmented, each segment can be
handled separately on a different cloud unit or CPU.
In this case, the key is motivating a storage option that is itself segmented.
Notice that this method does not strictly require that recomposition is
achievable.
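
The sampling scheme in this item could be sketched roughly as follows. This
is a minimal illustration, not the code behind the timings above: the helper
names (sample_times, extract_cmd, extract_all), the thumbnail naming pattern
and the worker count are all assumptions. Only the placement of -ss before -i
and the use of -frames:v 1 are standard ffmpeg usage.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def sample_times(duration, step=1.0):
    """Timestamps 0, step, 2*step, ... strictly below duration (seconds)."""
    times = []
    i = 0
    while i * step < duration:
        times.append(i * step)
        i += 1
    return times

def extract_cmd(src, t, out):
    """Build an ffmpeg command that grabs one frame near time t."""
    # -ss before -i seeks on the input: fast, but only near-exact.
    return ["ffmpeg", "-y", "-loglevel", "error", "-ss", f"{t:.3f}",
            "-i", src, "-frames:v", "1", out]

def extract_all(src, duration, step=1.0, workers=8):
    """Run one ffmpeg process per timestamp; returns the exit codes."""
    jobs = [extract_cmd(src, t, f"thumb_{i:04d}.jpg")  # hypothetical naming
            for i, t in enumerate(sample_times(duration, step))]
    # Threads suffice here because the real work happens in child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: subprocess.run(c).returncode, jobs))
```

Each job is independent, so the same command list could equally be spread
over separate machines instead of a local thread pool.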

2. storage and generic processing:

This adds the additional requirement that recomposition is achievable.
Numerous use cases apply the parallelization scheme described above.
It would be hugely beneficial if the segments stored for the above also
constituted the full storage of the video.
In an extreme case, you can imagine a video player in which the one-second
video clips also serve as a makeshift streaming solution.
Whether or not this is the best example of that use case, the concept is
sufficient, and additional use cases are welcome.

> Also, please post some of the commands you've used.

Here is an overview of the processes I am using at the moment:

*This series of commands identifies the KEYFRAMES.*

# cmd = "ffprobe -loglevel error -select_streams v:0 -show_entries packet=pts_time,flags -of csv=print_section=0 IMG_9209.MOV | awk -F',' '/K/ {print $1}'"
# result = subprocess.run(cmd, shell=True, cwd=os.getcwd(), capture_output=True)
#
# key_frames = [float(t) for t in result.stdout.decode().split()]
# ==> [0.000000, 1.068333, 2.135000, 3.211667, 4.295000]
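
The same keyframe listing can be obtained without the shell pipeline and awk
by filtering the ffprobe CSV output in Python. This is a sketch under the
assumption that each output line has the form "pts_time,flags", as in the
command above; the function names parse_key_frames and key_frames_of are
illustrative only.

```python
import subprocess

FFPROBE = ["ffprobe", "-loglevel", "error", "-select_streams", "v:0",
           "-show_entries", "packet=pts_time,flags",
           "-of", "csv=print_section=0"]

def parse_key_frames(csv_text):
    """Return pts_time of every packet whose flags field contains 'K'."""
    times = []
    for line in csv_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) < 2 or "K" not in fields[1]:
            continue
        try:
            times.append(float(fields[0]))
        except ValueError:
            pass  # e.g. a pts_time of "N/A"; skipped in this sketch
    return sorted(times)

def key_frames_of(path):
    """Run ffprobe on path and return its keyframe timestamps."""
    out = subprocess.run(FFPROBE + [path], capture_output=True,
                         text=True, check=True)
    return parse_key_frames(out.stdout)
```

Passing the arguments as a list avoids shell=True, so file names with spaces
need no extra quoting.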

This command creates SEGMENTs from KEYFRAME to KEYFRAME but is not
precisely exact, and hence fails the concatenation requirement.

# cmd = f"ffmpeg -y -ss {key_frames[i]} -to {key_frames[i+1]} -i IMG_9209.MOV -c copy segment_{i}.mov"

# The output of SEEK will have ffprobe results at timestamps like this
(K marks a keyframe; D marks a discard-flagged packet, which here has a
negative timestamp):
# [0.000000,K_ -0.066667,_D -0.100000,_D -0.033333,_D 0.135000,__
0.066667,__ 0.033333,__ 0.100000,__ 0.285000,__ ...
# ... 0.826667,__ 0.910000,__ 1.076667,K_ 1.035000,__ 0.993333,__
1.243333,__ 1.160000,__]

# Notice that 3.211667 - 2.135000 = 1.076667, as this is segment_2.mov
(i = 2), and timestamps are reset to start at zero.
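
The keyframe-to-keyframe arithmetic can be spelled out for every segment at
once. The sketch below pairs consecutive entries of the key_frames list given
earlier and builds the corresponding -ss/-to copy command; segment_plan and
copy_cmd are hypothetical helper names, and the final stretch from the last
keyframe to the end of the file is deliberately left out.

```python
def segment_plan(key_frames):
    """Pair consecutive keyframes into (index, start, end, duration)."""
    return [(i, a, b, round(b - a, 6))
            for i, (a, b) in enumerate(zip(key_frames, key_frames[1:]))]

def copy_cmd(src, i, start, end):
    """Stream-copy command for segment i, as in the SEEK step."""
    return ["ffmpeg", "-y", "-ss", f"{start:.6f}", "-to", f"{end:.6f}",
            "-i", src, "-c", "copy", f"segment_{i}.mov"]

key_frames = [0.000000, 1.068333, 2.135000, 3.211667, 4.295000]
plan = segment_plan(key_frames)
for i, start, end, duration in plan:
    print(i, start, end, duration)
```

The entry with index 2 covers 2.135000 to 3.211667, which is where the
1.076667 second duration of segment_2.mov comes from.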

This command TRIMs each SEGMENT and successfully fulfills the
concatenation requirement, but it is very, very slow (about 0.25x real time).

# cmd = "ffmpeg -i segment_2.mov -vf trim=0:1.076667 segment_2_trim.mov"
# TRIM takes 4-5 seconds despite this being a 1.07 second clip:
re-encoding is always slow. But the output is correct.
# [0.000000,K_ 0.033333,__ 0.016667,__ 0.008333,__ 0.025000,__
0.066667,__ 0.050000,__ 0.041667,__ ...
# ... 1.016667,__ 1.058333,__ 1.041667,__ 1.033333,__ 1.050000,__]
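
The difference between the SEEK and TRIM listings can be checked
mechanically. The sketch below flags a segment whose packets have negative
timestamps, carry the D flag, or lie at or beyond the intended duration.
These three criteria are an assumption about what "exact" should mean here,
and check_segment is a hypothetical helper; the sample values are a subset of
the ones shown above.

```python
def check_segment(packets, duration):
    """packets: list of (pts_time, flags) pairs reported by ffprobe.

    Returns a list of problems found; an empty list means none.
    """
    problems = []
    if any(pts < 0 for pts, _ in packets):
        problems.append("negative timestamps")
    if any("D" in flags for _, flags in packets):
        problems.append("discard-flagged packets")
    if any(pts >= duration for pts, _ in packets):
        problems.append("packets at or beyond the segment end")
    return problems

# Subset of the SEEK output for segment_2.mov.
seek = [(0.0, "K_"), (-0.066667, "_D"), (-0.1, "_D"), (-0.033333, "_D"),
        (0.135, "__"), (1.076667, "K_"), (1.243333, "__")]
# Subset of the TRIM output for segment_2_trim.mov.
trim = [(0.0, "K_"), (0.033333, "__"), (1.058333, "__"), (1.05, "__")]

print(check_segment(seek, 1.076667))
print(check_segment(trim, 1.076667))
```

On these samples the SEEK segment trips all three checks, while the TRIM
segment passes.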

This python code launches CONCAT:

# import os, subprocess, tempfile, time
# s = time.time()
# # write the concat list; absolute paths make the list file's location irrelevant
# with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
#     for i in range(28):
#         path = os.path.abspath(f"segment_{i}.mov")
#         f.write(f"file '{path}'\n")
# # pass arguments as a list so paths containing spaces survive
# cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", f.name, "-c", "copy", "concatenated.mov"]
# subprocess.run(cmd, cwd=os.getcwd())
# print(time.time() - s)
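
One detail of the list file is quoting: the concat demuxer reads each path
inside single quotes, so a single quote within a file name has to be written
as '\'' under ffmpeg's quoting rules. A minimal sketch of a list writer that
handles this follows; concat_list is a hypothetical helper, and using
absolute paths is a choice made here so that the location of the list file
does not matter.

```python
import os

def concat_list(paths):
    """Return the text of an ffmpeg concat list for the given files."""
    lines = []
    for p in paths:
        # Close the quote, emit an escaped quote, reopen the quote.
        escaped = os.path.abspath(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"

text = concat_list([f"segment_{i}.mov" for i in range(3)])
print(text, end="")
```

The returned text can be written to a temporary file and passed to
ffmpeg -f concat -safe 0 -i as in the code above.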

[this next one does not work: it is intended to create keyframes every
one second, but -force_key_frames has no effect under -c copy. This entire
avenue may be unnecessary.]

# cmd = f"ffmpeg -y -i IMG_9209.MOV -force_key_frames expr:gte(t,n_forced*1) -c copy keyframes.mov"

Thank you.
