Efficient Data Structures for Nonlinear Video Processing
- Speaker: Jiawen Chen, MIT's Computer Science and Artificial Intelligence Laboratory
- Date & Time: Monday 11 April 2011, 14:00-15:00
- đ Venue: Small lecture theatre, Microsoft Research Ltd, 7 J J Thomson Avenue (Off Madingley Road), Cambridge
Abstract
Nonlinear techniques are used extensively in image and video processing, with applications ranging from low-level kernels such as denoising and detail enhancement to higher-level operations such as object manipulation and special effects. In this talk, we will describe two computationally efficient data structures that dramatically simplify and accelerate a variety of algorithms for video processing.
Our first data structure is the bilateral grid, an image representation that explicitly accounts for intensity edges. By interpreting brightness differences as Euclidean distances, the bilateral grid naturally encodes the notion of edge-awareness into filters defined on it: smooth functions defined on the bilateral grid are piecewise-smooth in image space. Within this framework, we derive efficient reinterpretations of a number of nonlinear filters commonly used in computational photography as operations on the bilateral grid, including the bilateral filter, edge-aware scattered data interpolation, and local histogram equalization. We also show how these techniques can be easily mapped onto modern graphics hardware for real-time processing of high-definition video.
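To make the idea concrete, the splat/blur/slice pipeline behind the bilateral grid can be sketched as follows. This is an illustrative grayscale implementation, not the authors' code: it uses nearest-neighbor splatting and slicing for brevity (a full implementation would use trilinear weights), and the parameter names `sigma_s` and `sigma_r` for the spatial and range scales are assumptions.

```python
# Sketch of bilateral filtering via a bilateral grid (splat / blur / slice).
# Illustrative only: nearest-neighbor splat/slice, assumed parameter names.
import numpy as np
from scipy.ndimage import gaussian_filter

def bilateral_grid_filter(img, sigma_s=8.0, sigma_r=0.1):
    """img: 2-D float array in [0, 1]. Returns an edge-aware smoothed image."""
    h, w = img.shape
    # Grid resolution: image downsampled by sigma_s in space, with a third
    # axis for intensity sampled at sigma_r; +3 cells of padding.
    gh, gw, gd = int(h / sigma_s) + 3, int(w / sigma_s) + 3, int(1.0 / sigma_r) + 3
    data = np.zeros((gh, gw, gd))     # accumulated intensities
    weight = np.zeros((gh, gw, gd))   # homogeneous weights (pixel counts)

    ys, xs = np.mgrid[0:h, 0:w]
    gy = (ys / sigma_s + 1).round().astype(int)
    gx = (xs / sigma_s + 1).round().astype(int)
    gz = (img / sigma_r + 1).round().astype(int)  # intensity ("range") axis

    # Splat: accumulate each pixel into its nearest grid cell.
    np.add.at(data, (gy, gx, gz), img)
    np.add.at(weight, (gy, gx, gz), 1.0)

    # Blur: a plain Gaussian in the 3-D grid is edge-aware in image space,
    # because pixels on opposite sides of an edge land in distant cells
    # along the intensity axis and so barely mix.
    data = gaussian_filter(data, sigma=1.0)
    weight = gaussian_filter(weight, sigma=1.0)

    # Slice: read back at each pixel's grid position and normalize.
    return data[gy, gx, gz] / np.maximum(weight[gy, gx, gz], 1e-8)
```

Because the blur is a linear, shift-invariant operation on a small 3-D volume, each stage parallelizes trivially, which is what makes the GPU mapping mentioned above straightforward.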
The second data structure we describe is the video mesh, designed as a flexible central data structure for general-purpose nonlinear video editing workflows. It represents objects in a video sequence as 2.5D “paper cutouts” and allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. In our representation, motion and depth are sparsely encoded by a set of points tracked over time. The video mesh is a triangulation over this point set and per-pixel information is obtained by interpolation. To handle occlusions and detailed object boundaries, we rely on the user to rotoscope the scene at a sparse set of frames using spline curves. We introduce an algorithm to robustly and automatically cut the mesh into local layers with proper occlusion topology, and propagate the splines to the remaining frames. Object boundaries are refined with per-pixel alpha mattes.
At its core, the video mesh is a collection of texture-mapped triangles, which we can edit and render interactively using graphics hardware. We demonstrate the effectiveness of our representation with special effects such as 3D viewpoint changes, object insertion, depth-of-field manipulation, and 2D to 3D video conversion.
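A minimal sketch (not the authors' implementation) of the interpolation step described above: per-pixel attributes such as depth or motion are stored only at the sparse tracked points, and every other pixel recovers its value by barycentric interpolation within the triangle that contains it. All function and variable names here are illustrative assumptions.

```python
# Sketch of dense attribute lookup in a video-mesh-style triangulation:
# sparse per-vertex values, barycentric interpolation everywhere else.
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of 2-D point p in triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return 1.0 - v - w, v, w

def interpolate_attribute(p, tri_pts, tri_vals):
    """Blend the per-vertex values (e.g. depth) of one triangle at pixel p."""
    u, v, w = barycentric(p, *tri_pts)
    return u * tri_vals[0] + v * tri_vals[1] + w * tri_vals[2]
```

For example, at the centroid of a triangle whose vertices carry depths 0, 1, and 2, the interpolated depth is their mean, 1.0. This is also exactly the interpolation a GPU performs when rasterizing texture-mapped triangles, which is why the representation renders interactively.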
Series: This talk is part of the Microsoft Research Cambridge public talks series.