
Efficient Data Structures for Nonlinear Video Processing

Nonlinear techniques are used extensively in image and video processing, with applications ranging from low-level kernels such as denoising and detail enhancement to higher-level operations such as object manipulation and special effects. In this talk, we will describe two computationally efficient data structures which dramatically simplify and accelerate a variety of algorithms for video processing.

Our first data structure is the bilateral grid, an image representation that explicitly accounts for intensity edges. By interpreting brightness differences as Euclidean distances, the bilateral grid naturally encodes the notion of edge-awareness into filters defined on it. Smooth functions defined on the bilateral grid are piecewise-smooth in image space. Within this framework, we derive efficient reinterpretations of a number of nonlinear filters commonly used in computational photography as operations on the bilateral grid, including the bilateral filter, edge-aware scattered data interpolation, and local histogram equalization. We also show how these techniques can be easily parallelized onto modern graphics hardware for real-time processing of high definition video.
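The splat-blur-slice pipeline behind the bilateral grid can be illustrated in a few lines. The following is a minimal NumPy sketch of a grayscale bilateral filter on a coarse grid, not the talk's GPU implementation; the 3-tap box blur and the sampling rates are simplifying assumptions.

```python
import numpy as np

def bilateral_grid_filter(img, sigma_s=16, sigma_r=0.1):
    """Grayscale bilateral filter via a bilateral grid (splat/blur/slice).

    img: 2D float array with values in [0, 1].
    sigma_s: spatial sampling rate (pixels per grid cell).
    sigma_r: range sampling rate (intensity units per grid cell).
    """
    h, w = img.shape
    gh = int(np.ceil(h / sigma_s)) + 2
    gw = int(np.ceil(w / sigma_s)) + 2
    gd = int(np.ceil(1.0 / sigma_r)) + 2

    # Splat: accumulate (intensity, weight) homogeneous pairs into a 3D grid
    # whose third axis is pixel intensity -- this is what makes the result
    # edge-aware, since pixels on opposite sides of an edge land far apart.
    grid = np.zeros((gh, gw, gd, 2))
    ys, xs = np.mgrid[0:h, 0:w]
    gy = np.round(ys / sigma_s).astype(int)
    gx = np.round(xs / sigma_s).astype(int)
    gz = np.round(img / sigma_r).astype(int)
    np.add.at(grid, (gy, gx, gz, 0), img)
    np.add.at(grid, (gy, gx, gz, 1), 1.0)

    # Blur: a small box blur along each grid axis stands in for a Gaussian.
    for axis in range(3):
        grid = (np.roll(grid, 1, axis) + grid + np.roll(grid, -1, axis)) / 3.0

    # Slice: read back at each pixel's grid position and normalize by weight.
    num = grid[gy, gx, gz, 0]
    den = grid[gy, gx, gz, 1]
    return np.where(den > 0, num / np.maximum(den, 1e-8), img)
```

Because the grid is tiny compared to the image, the blur is cheap regardless of the filter's spatial extent, which is what makes the real-time HD results feasible on graphics hardware.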

The second data structure we describe is the video mesh, designed as a flexible central data structure for general-purpose nonlinear video editing workflows. It represents objects in a video sequence as 2.5D “paper cutouts” and allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. In our representation, motion and depth are sparsely encoded by a set of points tracked over time. The video mesh is a triangulation over this point set and per-pixel information is obtained by interpolation. To handle occlusions and detailed object boundaries, we rely on the user to rotoscope the scene at a sparse set of frames using spline curves. We introduce an algorithm to robustly and automatically cut the mesh into local layers with proper occlusion topology, and propagate the splines to the remaining frames. Object boundaries are refined with per-pixel alpha mattes.
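The "per-pixel information obtained by interpolation" step amounts to barycentric interpolation of the sparse per-vertex data over each mesh triangle. A minimal sketch, with a function name and signature of our own invention rather than the system's API:

```python
import numpy as np

def barycentric_interpolate(tri, values, p):
    """Interpolate per-vertex attributes (e.g. depth or motion) at a
    point p inside a triangle of the mesh.

    tri: (3, 2) array of 2D vertex positions.
    values: (3,) or (3, k) per-vertex attribute values.
    p: 2D query point (e.g. a pixel center).
    """
    a, b, c = tri
    # Solve p = a + u*(b - a) + v*(c - a) for (u, v): a 2x2 linear system.
    m = np.column_stack([b - a, c - a])
    u, v = np.linalg.solve(m, np.asarray(p, dtype=float) - a)
    w = np.array([1.0 - u - v, u, v])  # barycentric weights, summing to 1
    return w @ values
```

Evaluating this at every pixel of a triangle densifies the sparse tracked motion and depth into per-pixel fields.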

At its core, the video mesh is a collection of texture-mapped triangles, which we can edit and render interactively using graphics hardware. We demonstrate the effectiveness of our representation with special effects such as 3D viewpoint changes, object insertion, depth-of-field manipulation, and 2D to 3D video conversion.
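Rendering layered "paper cutouts" with correct occlusions reduces to drawing triangles back to front. The talk describes a GPU renderer; the dictionary representation and ordering rule below are purely illustrative assumptions.

```python
def render_order(triangles):
    """Back-to-front draw order for 2.5D cutout layers.

    triangles: list of dicts with a 'layer' index (lower = further back)
    and a mean vertex 'depth' (larger = further from the camera).
    """
    # Draw background layers first; within a layer, draw deeper
    # triangles first (painter's algorithm).
    return sorted(triangles, key=lambda t: (t["layer"], -t["depth"]))
```

Sorting by layer first is what lets the mesh keep proper occlusion topology even where depth values alone would be ambiguous along object boundaries.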

This talk is part of the Microsoft Research Cambridge, public talks series.

