Jason J. Corso

Snippet Topic: Semantic Segmentation
Parts-based Model of the Full Scene  Scene understanding remains a significant challenge in the computer vision community. The visual psychophysics literature has demonstrated the importance of interdependence among parts of the scene, yet the majority of methods in scene understanding remain local. Pictorial structures have arisen as a fundamental parts-based model for some vision problems, such as articulated object detection. However, the form of classical pictorial structures limits their applicability to global problems, such as semantic pixel labeling. In this work, we propose an extension of the pictorial structures approach, called pixel-support parts-sparse pictorial structures, or PS3, to overcome this limitation. Our model extends the classical form in two ways: first, it defines parts directly by their pixel support rather than in a parametric form, and second, it specifies a space of plausible parts-based scene models and permits one to be selected for inference on any given image. PS3 makes strides toward unifying object-level and pixel-level modeling of scene elements. We have thus far implemented the first half of our model and rely upon external knowledge to provide an initial graph structure for a given image.

J. J. Corso.
Toward parts-based scene understanding with pixel-support
parts-sparse pictorial structures.
Pattern Recognition Letters: Special Issue on Scene
Understanding and Behavior Analysis, 34(7):762-769, 2013.
Early version appears as arXiv.org tech report 1108.4079v1.

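For orientation, the classical pictorial structures objective that PS3 extends is typically written as a sum of appearance and pairwise deformation costs over a part graph. The notation below is the standard textbook form, not drawn from the paper itself:

```latex
% Classical pictorial structures: find the part configuration
% L = (l_1, ..., l_n) minimizing appearance costs m_i plus
% pairwise deformation costs d_ij over the part graph (V, E).
L^* = \arg\min_{L} \Bigl( \sum_{i=1}^{n} m_i(l_i)
      \;+ \sum_{(i,j) \in E} d_{ij}(l_i, l_j) \Bigr)
```

In PS3, each l_i ranges over pixel-support regions rather than parametric part poses, and the graph (V, E) itself is instantiated per image from the space of plausible parts-based scene models.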

Generalized Image Understanding  An NSF CAREER project is underway. From representation to learning to inference, effective use of high-level semantic knowledge in computer vision remains a challenge in bridging the signal-symbol gap. This research investigates the role of semantics in visual inference through the generalized image understanding problem: to automatically detect, localize, segment, and recognize the core high-level elements of an image and how they interact, and to provide a parsimonious semantic description of the image. Specifically, this research examines a unified methodology that integrates low-level (e.g., pixels and features), mid-level (e.g., latent structure), and high-level (e.g., semantics) elements for visual inference. Adaptive graph hierarchies induced directly from the images provide the core mathematical representation. A statistical interpretation of affinities between neighboring pixels and regions in the image drives this induction. Latent elements and structure are captured with multilevel Markov networks. A probabilistic ontology represents the core knowledge and uncertainty of the inferred structure and guides the ultimate semantic interpretation of the image. At each level, rigorous methods from computer science and statistics are connected to and combined with formal semantic methods from philosophy. 

GraphShifts: Greedy Hierarchical Energy Minimization  GraphShifts is an energy minimization algorithm that manipulates a dynamic hierarchical decomposition of the data to rapidly and robustly minimize an energy function. It is a steepest-descent minimizer that explores a non-local search space during minimization. The dynamic hierarchical representation makes exploration of this large search space feasible, and it facilitates both large jumps in the energy space, analogous to combined split-and-merge operations, as well as small jumps analogous to PDE-like moves. We use a deterministic approach to quickly choose the optimal move at each iteration. The algorithm has been applied to 2D and 3D joint image segmentation and classification in medical images, as depicted below for the segmentation of subcortical brain structures, and in natural images, as depicted at the semantic image labeling page. GraphShifts typically converges orders of magnitude faster than conventional minimization algorithms, such as PDE-based methods, and has been shown to be very robust to initialization.

J. J. Corso, A. Yuille, and Z. Tu.
GraphShifts: Natural Image Labeling by Dynamic Hierarchical
Computing.
In Proceedings of IEEE Conference on Computer Vision and
Pattern Recognition, 2008.

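To make the shift mechanics concrete, here is a minimal, flat (non-hierarchical) sketch of the steepest-descent idea behind GraphShifts, written against an illustrative Potts-style energy. The function names and the energy form are assumptions for illustration, not the paper's formulation, and the dynamic hierarchy that gives GraphShifts its speed is omitted:

```python
# Flat sketch of the GraphShifts idea: repeatedly apply the single
# label change ("shift") that most decreases the energy. The
# Potts-style energy and all names here are illustrative assumptions.

def potts_energy(labels, unary, edges, lam=1.0):
    """E(L) = sum_i unary[i][L[i]] + lam * #{(i,j) in edges : L[i] != L[j]}."""
    e = sum(unary[i][labels[i]] for i in range(len(labels)))
    e += lam * sum(1 for i, j in edges if labels[i] != labels[j])
    return e

def graph_shifts_flat(unary, edges, lam=1.0):
    """Steepest descent over single-node label shifts."""
    n, k = len(unary), len(unary[0])
    # Initialize each node with its best unary label.
    labels = [min(range(k), key=lambda c: unary[i][c]) for i in range(n)]
    while True:
        current = potts_energy(labels, unary, edges, lam)
        best_delta, best_move = 0.0, None
        for i in range(n):
            old = labels[i]
            for c in range(k):
                if c == old:
                    continue
                labels[i] = c  # tentatively apply the shift
                delta = potts_energy(labels, unary, edges, lam) - current
                if delta < best_delta:
                    best_delta, best_move = delta, (i, c)
            labels[i] = old  # undo the tentative shift
        if best_move is None:
            return labels  # local minimum: no shift decreases the energy
        i, c = best_move
        labels[i] = c  # commit the steepest-descent move
```

Each pass commits only the single best shift (largest energy decrease) and terminates when no shift improves the energy; in the full algorithm, shifts also operate on whole subtrees of the hierarchy, which is what enables the large split-and-merge-like jumps.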

Multilevel Segmentation with Bayesian Affinities  Automatic segmentation is a difficult problem: it is underconstrained, precise physical models are generally not yet known, and the data exhibit high intra-class variance. In this research, we study methods for automatic segmentation of image data that strive to combine the efficiency of bottom-up algorithms with the power of top-down models. The work takes one step toward unifying two state-of-the-art image segmentation approaches: graph affinity-based and generative model-based segmentation. Specifically, the main contribution of the work is a mathematical formulation for incorporating soft model assignments into the calculation of affinities, which are traditionally model-free. This Bayesian model-aware affinity measurement has been integrated into the multilevel Segmentation by Weighted Aggregation algorithm. As a byproduct of the integrated Bayesian model classification, each node in the graph hierarchy is assigned a most likely model class according to a set of learned model classes. The technique has been applied to the task of detecting and segmenting brain tumor and edema, subcortical brain structures, and multiple sclerosis lesions in multichannel magnetic resonance image volumes. 
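As a rough illustration of the model-aware idea, one can weight a conventional feature-similarity affinity by the probability, under soft model posteriors, that two neighboring nodes belong to the same model class. The sketch below conveys that idea only; the function name, the exponential similarity term, and the same-class weighting are assumptions, not the exact measurement derived in the paper:

```python
import math

def model_aware_affinity(f_i, f_j, post_i, post_j, beta=1.0):
    """Illustrative model-aware affinity between two neighboring nodes.

    f_i, f_j: scalar features (e.g., intensities) of the two nodes.
    post_i, post_j: soft posteriors over model classes, post[m] = P(m | node).
    The model-free term exp(-beta * |f_i - f_j|) is weighted by the
    probability that both nodes were drawn from the same model class.
    """
    # Probability the two nodes share a model class, under independence.
    same_class = sum(p_i * p_j for p_i, p_j in zip(post_i, post_j))
    # Conventional (model-free) feature-similarity affinity.
    similarity = math.exp(-beta * abs(f_i - f_j))
    return similarity * same_class
```

Under this toy weighting, nodes with identical features but incompatible model posteriors receive zero affinity, so the soft classification discourages aggregating regions across model boundaries.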

A Data Set for Video Label Propagation 
