[FFmpeg-devel] [GSoC] Motion Interpolation

Mon Jun 20 11:54:15 CEST 2016

On Sat, Jun 18, 2016 at 3:16 AM Michael Niedermayer <michael at niedermayer.cc>
wrote:

> On Fri, Jun 17, 2016 at 08:19:00AM +0000, Davinder Singh wrote:
> [...]
> > Yes, I did that, after understanding it completely. It now works with the
> > motion vectors generated by mEstimate filter. Now I’m trying to improve
> it
> > based on this paper: Overlapped Block Motion Compensation: An
> > Estimation-Theoretic Approach
>
> > <
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.112.8359&rep=rep1&type=pdf
> >
>
> this is 22 years old
>
>
> > and
> > this one: Window Motion Compensation
> > <https://www.researchgate.net/publication/252182199>.Takes a lot of time
>
> this is 25 years old
>
> not saying old papers are bad, just that this represents the knowledge
> of 20 years ago
>
> also its important to keep in mind that blind block matching of any
> metric will not be enough. To find true motion the whole motion
> vector fields of multiple frames will need to be considered
>
> For example a ball thrown accross the field of view entering and
> exiting the picture needs to move smoothly and at the ends (in time)
> there are frames without the ball then a frame with the ball
> these 2 are not enough to interpolate the frames between as we have
> just one location where the ball is. With the next frames though
> we can find the motion trajectory of the ball and interpolate it end
> to end
>
> I think papers which work on problems like this and also interpolation
> of all the areas that end up overlapping and covering each other
> like the backgroud behind the ball in that example would be better
> starting points for implementing motion estiation because ultimatly
> that is the kind of ME code we would like to have.
> Block matching with various windows, OBMC, ... are all good but
> if in our example the vectors for the ball or background are off that
> will look rather bad with any motion compensation
> So trying to move a bit toward this would make sense but first
> having some motion estimation even really basic and dumb with
> mc working in a testable filter (pair) should probably be done.
> Iam just mentioning this as a bit of a preview of what i hope could
> eventually be implemented, maybe this would be after GSoC but its
> the kind of code needed to have really usable frame interpolation
>
>
>
> > reading them. I think we need to add new Raised Cosine window (weights)
> > along with Linear Window (currently implemented). What do you say?
>
> i dont know, the windows used in snow are already the best of several
> tried (for snow).
> no great gains will be found by changing the OBMC window from snow.
>
>
> >
> > Also making mInterpolate work with variable macroblock size MC. The
> current
> > interpolation works without half pel accuracy, though.
>
> mcfps has fully working 1/4 pel OBMC code, that should be fine to be
> used as is i think unless i miss something
>
> half pel is 20 years old, it is not usefull
> multiple block sizes on the MC side should not really matter ATM
> smaller blocks are a bit slower but first we should get the code
> working, then working with good quality and then working fast.
>
> multiple block sizes may be usefull for the estimation side if it
> improves estimation somehow.
>
> Can i see your current "work in progress" ?
>
>
> [...]
> > I’m moving estimation code to some new file motion_est.c file and the
> > methods are shared by both mEstimate and mInterpolate filters. mEstimate
> > store the MVs in frame’s side data for any other filter. Moreover, any
> > other filter if need post processing on MVs it can directly use the
> shared
> > methods. But, mInterpolate use them internally, no saving in sidedata,
> and
> > saving unnecessary processing.
>
> This design sounds good
>
>
> >
> >
> > Also, Paper [1] doesn’t uses window with OBMC at all. It just find normal
> > average without weight. Perhaps to compare papers I either need to add
> > multiple option for each setting or need to assign the algorithm as
> > researcher’s name in filter options.
>
>
>
Paper [1] and [2] uses functions or do post processing on motion vectors,
so needs fast ME algorithms, which currently I’m working on. [*M]

Let me summarize the papers (from Email 1, this thread):

Paper [1]: Zhai et al. (2005) A Low Complexity Motion Compensated Frame
Interpolation Method

[Quote]
This paper propose a MCFI method intended for real time processing. It
first examines the motion vectors in the bitstream [*1]. 8x8 block size is
used rather than 16x16 as in most cases; Using smaller block size leads to
denser motion field, so neighboring MVs are more highly correlated, so
prediction is better. To reduce complexity, MVs in bitstream are utilized
[*1]. But need to be filtered as not all of them represent true motion.
They are grouped into “good vectors, can be used directly” and “bad
vectors, need to find true motion”. For classification of MVs into groups,
SAD and BAD is used. For an 8x8 block in to-be-interpolated frame F(in) we
get motion vector MV of block at same location in current frame. If F(in)
is exactly middle of F(prev) and F(cur), then MV/2 points to avblock in
prev frame & -MV/2 points to a block in current frame from F(in). Then SAD
& BAD of both of these blocks are compared to certain thresholds [*2],
based on which blocks are classified. For bad ones, overlapped block
bi-directional motion estimation (OBBME) is carried out to find true
motion. In OBBME, the size of block in F(in) is enlarged to 12x12, then
bi-directional ME is performed to get MV that minimizes the diff. between
two block located at MV/2 & -MV/2 in F(prev) & F(cur) wrt current block in
F(in). Diff is calc by eq (1) in Paper. Like in BMA, we can use any fast ME
algo here [*M]. After this, there are still few MVs. For that post
processing is performed on MVs that break the continuity. We calculate the
variation of each motion vector and its neighboring MVs. If variation
exceeds a certain threshold, the MV is regarded as a single bad motion
vector and then vector median filtering is applied. It finds one vector
among 8, that minimizes eq (2). Finally, OBMC is applied. No weights are
used [*3]. Pixels are simple averages given by eq 4-6.
[/Qoute]

[*1] We can for now use motion vectors generated on filter side. As you
suggested, later we can use decoder’s vectors.
[*2] Threshold values are not given in paper :(
[*3] Initially, we can test using the generated/refined motion vector field
with the currently implemented window based OBMC. Later to reduce
complexity we can use their method.

Paper [2]: Choi et al. (2007) Motion-Compensated Frame Interpolation Using
Bilateral Motion Estimation and Adaptive Overlapped Block Motion
Compensation

[Quote]
This algorithm has four steps. First, we propose the bilateral ME scheme to
obtain the motion field of an interpolated frame. Then, we partition a
frame into several object regions by clustering MVs. We apply the
variable-size block MC (VS-BMC) to object boundaries in order to
reconstruct edge information with a higher quality. Finally, we use the
adaptive overlapped block MC, which adjusts the coefficients of overlapped
windows based on the reliabilities of neighboring MVs. The adaptive OBMC
(AOBMC) can overcome the limitations of the conventional OBMC, such as
over-smoothing and poor de-blocking.
I. We perform bilateral ME which prevents overlapping and hole problem [*4]
by estimating the motion vectors of interpolated frame directly. If the
conventional BMA is used to find a block-wise motion vector field between
the previous frame and the current frame, the motion trajectories may not
cover all pixels in the interpolated frame, consequently yielding hole
regions. In addition, multiple trajectories may pass through the same
pixel, causing overlapping regions. Therefore, we should estimate the
motion vectors for the blocks in the interpolated frame, instead of using
the motion vectors between the previous frame and the current frame. In
proposed Bilateral ME we obtain the MV by comparing a block at a shifted
position in the F(prev) and another block at the opposite position in
F(curr), by minimizing SAD [*5][*M]. Since there can be multiple
trajectories through the current block, we impose a spatial smoothness
constraint to improve robustness of ME. The SMD is calc which is avg.
between abs. px. values at boundary of predicted and neighboring block. We
find best MV by minimizing weighted sum of SAD & SMD given by eq (6).
II. Then MBs are classified into clusters according to MVs. [TL;DR] First,
all MBs are considered as single object and cluster center is set to avg.
MV of blocks. If diff b/w block's MV and threshold T (=8), the block
belongs to new object. The avg. MV of the blocks in new object is set as a
new cluster center. Each cluster center is updated to the avg. of MVs in
the cluster. Steps 2–4 are iteratively repeated until there is no change in
the cluster centers. [/TL;DR]
III. To express complex motions, we adopt VS-BMC to reconstruct boundary
blocks. We adopt a quadtree-based VS-BMC, which divides an 8x8 boundary
block into 4x4 or 2x2 sub-blocks. [*6] Then we find MVs for sub-block using
SAD like before, if new MV is less than 1/4MV of orig. block, accept the
division - iterate it by subdividing it furthermore, or terminate procedure.
IV. Finally adaptive OBMC is used with window such as raised cosine [*7].
Conventional OBMC can yield blurring or over-smoothing artifacts. AOBMC
reconstructs the interpolated frame faithfully by controlling the weights
of overlapping windows according to the reliabilities of MVs. See Fig. 6 &
7.
[/Quote]

[*M] The shared methods from motion_est.c will allow this without
repetition of code.
[*5] It is very similar to OBBME used in Paper [1] except the block size is
not changed.
[*6] This required the current mInterpolate code to support variable size
OBMC.
[*7] We could use the linear window instead of raised cosine one. But too
late, I already implemented it.

Another interesting paper I found is 3D recursive search. It's little old
but very popular. See images here:
http://i65.tinypic.com/zkfgox.png
http://i67.tinypic.com/2dihmb7.png
http://i65.tinypic.com/rgw38n.png

Paper [3]: de Haan et al. (1994) True Motion Estimation with 3D Recursive
Search Block Matching
(http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=246088)
Gonna read now. It has unusual notation.

Once we implement these, then we can deal with objects entering or exiting
the screen. I think it is hole or overlapping problem addressed in paper
[2], several approaches have been proposed to handle it like median
filtering, spatial interpolation (
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=389461) or MC using
neighboring motion fields. Will look into it more. The hole or overlapping
problem is handled by bilateral motion estimation used in paper [2] (*4).
Also have to handle scene changing issues. I read in some paper that they
are too computational expensive.

Which one do you think we should start with? I think it should be 3DRS.
3DRS is fastest of these three. Paper 2 compares result of all these three.
3DRS is around 16fps, [1] is ~7fps. [2] is ~3fps. Paper 2 outperforms both
of them.

I'll push ME code soon: https://github.com/dsmudhar/FFmpeg/tree/dev
Will commit more frequently, from now.

> [...]
>
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

...

DSM_