In News

NJ Star Ledger
NJ.com
Newhouse News Service

 

Current Research Projects

3D Face Reconstruction from Selfie Videos

 

Past Research Projects

3D Human Shape Modeling from Video Imagery
3D Human Shape Modeling from 3D Sensor Data
Automated Scene Understanding
Facial Features Detection and Tracking
Facial Expression Recognition
Shoulder and Skin Blobs Tracking
Human Activity Recognition

 

Thesis Research

Spectral Latent Variable Model
Hierarchical Features for 3D Human Pose Estimation
Visual Tracking in Latent Space
Bayesian Mixture of Experts for 3D Human Pose Estimation

Software

Sparse Bayesian Multi-Category Classifier
Spectral Latent Variable Model
Bayesian Mixture of Experts

 

Publications/Patents
 

Thesis

CV/Resume

Atul Kanaujia, Ph.D.
Atul Kanaujia received his B.Tech. in Computer Science and Engineering from the Indian Institute of Technology Bombay in 2000 and his M.Sc. in Computer Science from Rutgers University in 2003. He worked as an Associate Member of Technical Staff at Mentor Graphics R&D (India) from 2003 to 2004. He received his Ph.D. from Rutgers, The State University of New Jersey, where his thesis was supervised by Dimitris Metaxas and co-supervised by Cristian Sminchisescu. Dr. Kanaujia is currently a Lead Research Scientist at the Nokia HERE research lab; prior to joining Nokia, he worked at ObjectVideo, Inc. His areas of research include 3D human pose and face reconstruction from monocular and multi-view imagery, non-linear manifold learning, semi-supervised learning, 2D facial feature detection and tracking, human activity recognition, and scene understanding.


Current Research

3D Human Face Modeling from Selfie Videos

I am currently working on high-precision 3D face reconstruction from image sequences. Areas of research include discriminative models for facial landmark detection and matching across keyframes in video, structure from motion, and 3D model fitting. Additionally, I am working on methods for learning 3D representations of human faces.

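As a minimal sketch of the keyframe-geometry step, the snippet below assumes that matched 2D facial landmarks for a pair of keyframes and the camera intrinsics K are already available (both assumptions for illustration only); it recovers the relative camera pose with OpenCV and triangulates a sparse 3D landmark cloud that a face model could subsequently be fitted to.

    import numpy as np
    import cv2

    def triangulate_landmarks(pts1, pts2, K):
        """Sparse 3D landmark reconstruction from two matched keyframes.

        pts1, pts2: (N, 2) matched landmark coordinates; K: (3, 3) intrinsics.
        """
        pts1 = np.asarray(pts1, dtype=np.float64)
        pts2 = np.asarray(pts2, dtype=np.float64)
        # Essential matrix from the landmark correspondences (RANSAC for robustness).
        E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
        # Relative rotation/translation of the second keyframe w.r.t. the first.
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
        # Projection matrices of the two views (first camera at the origin).
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, t])
        # Triangulate (expects 2xN arrays) and convert from homogeneous coordinates.
        X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
        return (X_h[:3] / X_h[3]).T  # (N, 3) landmark cloud

In a full pipeline the triangulated landmarks would be refined jointly over many keyframes before fitting a dense 3D face model.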
 

Past Research Projects

3D Human Shape Modeling from Video Imagery

We propose a robust framework that combines discriminative and generative approaches for inferring the 3D pose and anthropometric characteristics of a person. To deal with the loss of depth information and resolve ambiguities, we use a combination of techniques based on learned dynamical priors, biomechanical characterization of human pose, and multi-hypothesis tracking. Each technique constrains the pose search toward the pose and shape that best describe the person in the image. The system is fully automatic and has a modular architecture to support extensibility and facilitate the transition to operational deployment.

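As a toy illustration of how discriminative proposals, a dynamical prior, and generative re-scoring can be combined with multi-hypothesis tracking (a generic sketch, not the actual system), the function below keeps a pool of pose hypotheses per frame; the three callables it takes are assumed to be supplied by the caller.

    import numpy as np

    def track_multi_hypothesis(observations, propose, dynamics, likelihood, n_keep=10):
        """Generic multi-hypothesis 3D pose tracker.

        propose(obs)          -> (H, P) candidate poses from a discriminative model
        dynamics(poses)       -> poses propagated one step by a learned dynamical prior
        likelihood(pose, obs) -> generative image-matching score (higher is better)
        """
        hypotheses = propose(observations[0])
        track = [hypotheses[0]]
        for obs in observations[1:]:
            # Mix fresh discriminative proposals with dynamically propagated hypotheses.
            candidates = np.vstack([propose(obs), dynamics(hypotheses)])
            scores = np.array([likelihood(p, obs) for p in candidates])
            keep = np.argsort(scores)[::-1][:n_keep]  # best-scoring hypotheses survive
            hypotheses = candidates[keep]
            track.append(hypotheses[0])               # report the top hypothesis per frame
        return np.array(track)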
 

3D Human Shape Modeling from 3D Sensor Data

In this project we developed a fast and efficient system for accurate 3D human pose and shape estimation from multi-modal 3D sensor data. Although most existing surveillance systems capture monocular image sequences, multi-camera systems are increasingly being deployed both in indoor facilities (such as retail stores, sports training centers, and airports) and in outdoor environments to overcome the core deficiencies of single-camera systems. The goal is to apply pose and shape estimation algorithms to detect anomalous human shapes and hostile actions from 3D sensor data.

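The articulated fitting itself is beyond a short snippet, but a bare-bones rigid ICP alignment of a body template to a sensor point cloud conveys the basic fitting loop; this is only an illustrative stand-in using numpy and scipy.

    import numpy as np
    from scipy.spatial import cKDTree

    def rigid_icp(template, cloud, n_iters=30):
        """Rigidly align a template point set (N, 3) to a sensor point cloud (M, 3)."""
        R, t = np.eye(3), np.zeros(3)
        src = template.copy()
        tree = cKDTree(cloud)
        for _ in range(n_iters):
            # 1. Closest-point correspondences from the template to the cloud.
            _, idx = tree.query(src)
            tgt = cloud[idx]
            # 2. Optimal rotation/translation for these correspondences (Kabsch/SVD).
            mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
            U, _, Vt = np.linalg.svd((src - mu_s).T @ (tgt - mu_t))
            D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
            R_step = Vt.T @ D @ U.T
            t_step = mu_t - R_step @ mu_s
            # 3. Apply the incremental transform and accumulate it.
            src = src @ R_step.T + t_step
            R, t = R_step @ R, R_step @ t + t_step
        return R, t, src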
 

Automated Scene Understanding - Markov Logic Networks

Automatic extraction and representation of visual concepts and semantic information in a scene is a desired capability in security and surveillance operations. In this project we target the problem of visual event recognition in a networked information environment, where faulty sensors, a lack of effective visual processing tools, and incomplete domain knowledge frequently cause uncertainty in the data set and, consequently, in the visual primitives extracted from it. We adopt Markov Logic Networks (MLNs), which combine probabilistic graphical models and first-order logic, to address the task of reasoning under uncertainty. An MLN is a knowledge representation language that combines domain knowledge, visual concepts, and experience to infer simple and complex real-world events. MLNs generalize existing state-of-the-art probabilistic models, including hidden Markov models, Bayesian networks, and stochastic grammars. Moreover, the framework can be made scalable to support the variety of entities, activities, and interactions that are typically observed in the real world.

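To make the MLN idea concrete, here is a brute-force toy version over a hand-picked domain; the predicates, rules, and weights are invented for illustration and are not taken from the actual system. Each possible world is weighted by the exponentiated sum of the weights of its satisfied ground formulas, and a query marginal is read off the normalized distribution.

    import itertools
    import math

    # Ground atoms of a tiny, invented surveillance domain.
    atoms = ["Carries(p1,bag)", "Abandons(p1,bag)", "Suspicious(p1)"]

    # Weighted ground formulas: (weight, truth function over a world).
    rules = [
        (1.5, lambda w: (not w["Abandons(p1,bag)"]) or w["Carries(p1,bag)"]),  # abandoning implies having carried
        (2.0, lambda w: (not w["Abandons(p1,bag)"]) or w["Suspicious(p1)"]),   # abandoning a bag looks suspicious
        (0.5, lambda w: not w["Suspicious(p1)"]),                              # weak prior against suspicion
    ]

    def world_weight(world):
        # exp(sum of weights of the ground formulas this world satisfies)
        return math.exp(sum(wt for wt, f in rules if f(world)))

    worlds = [dict(zip(atoms, vals)) for vals in itertools.product([False, True], repeat=len(atoms))]

    # Conditional marginal, e.g. P(Suspicious(p1) | Abandons(p1,bag)).
    num = sum(world_weight(w) for w in worlds if w["Abandons(p1,bag)"] and w["Suspicious(p1)"])
    den = sum(world_weight(w) for w in worlds if w["Abandons(p1,bag)"])
    print("P(Suspicious | Abandons) =", num / den)

Real MLN engines avoid this exhaustive enumeration with lifted or sampling-based inference; the toy version only shows the semantics.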

Thesis Research

Facial Features Tracking

We present a generic framework to track shapes across large variations by learning the non-linear shape manifold as overlapping, piecewise-linear subspaces. We use landmark-based shape analysis to train a Gaussian mixture model over the aligned shapes and learn a Point Distribution Model (PDM) for each of the mixture components.

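A compact scikit-learn sketch of the piecewise-linear manifold idea, assuming the shapes are already Procrustes-aligned and flattened to vectors (the clustering and PCA settings are illustrative, not the trained model): fit a Gaussian mixture over the aligned shapes and learn one small PDM (mean plus principal deformation modes) per component.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.decomposition import PCA

    def learn_piecewise_pdm(shapes, n_clusters=5, n_modes=8):
        """shapes: (N, 2*L) array of Procrustes-aligned landmark vectors."""
        gmm = GaussianMixture(n_components=n_clusters, covariance_type="full").fit(shapes)
        labels = gmm.predict(shapes)
        pdms = []
        for k in range(n_clusters):
            cluster = shapes[labels == k]
            if len(cluster) < 2:
                pdms.append(None)  # too few shapes assigned to this component
                continue
            # One linear PDM (mean shape + principal deformation modes) per component.
            pca = PCA(n_components=min(n_modes, len(cluster) - 1)).fit(cluster)
            pdms.append((pca.mean_, pca.components_, pca.explained_variance_))
        return gmm, pdms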

Shoulder and Skin Blobs Tracking

The goal of this research was to track the hands of a generic human subject in approximately real time. Fast and convulsive hand movements provide important cues for recognizing deception and nervousness in a subject.

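A rough OpenCV sketch of the skin-blob detection part of such a tracker; the HSV thresholds below are illustrative guesses rather than calibrated values, and a real system would add temporal association between frames.

    import cv2
    import numpy as np

    # Illustrative HSV skin range; a real tracker calibrates this per subject and lighting.
    SKIN_LO = np.array([0, 40, 60], dtype=np.uint8)
    SKIN_HI = np.array([25, 180, 255], dtype=np.uint8)

    def hand_blob_centroids(frame_bgr, max_blobs=2):
        """Return centroids of the largest skin-colored blobs (assumed to be the hands)."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, SKIN_LO, SKIN_HI)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4 signature
        contours = sorted(contours, key=cv2.contourArea, reverse=True)[:max_blobs]
        centroids = []
        for c in contours:
            m = cv2.moments(c)
            if m["m00"] > 0:
                centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
        return centroids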
 

Facial Expression Recognition

We propose a framework to recognize various expressions by tracking facial features. Our method uses localized active shape models to track feature points in the subspace obtained from localized Non-negative Matrix Factorization. The tracked feature points are used to train a conditional model for recognizing prototypic expressions such as anger, disgust, fear, joy, surprise, and sadness. We formulate the task as a sequence labeling problem and use Conditional Random Fields (CRFs) to probabilistically predict expressions. In a CRF, the label distribution is conditioned on the entire observation sequence rather than a single observation.

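The central CRF computation behind this kind of sequence labeling, written out in plain numpy purely as an illustration (not the trained model): given per-frame emission scores and label-transition scores, a forward recursion in log space yields the partition function and, with it, the conditional log-probability of an expression label sequence given the whole observation sequence.

    import numpy as np
    from scipy.special import logsumexp

    def crf_sequence_log_prob(emissions, transitions, labels):
        """Linear-chain CRF: log P(labels | observations).

        emissions:   (T, K) per-frame scores for each of K expression labels
        transitions: (K, K) score for moving from label i to label j
        labels:      length-T integer label sequence
        """
        T, K = emissions.shape
        # Un-normalized score of the given label path.
        score = emissions[0, labels[0]]
        for t in range(1, T):
            score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]
        # Forward recursion in log space gives the log partition over all label paths.
        alpha = emissions[0].copy()
        for t in range(1, T):
            alpha = emissions[t] + logsumexp(alpha[:, None] + transitions, axis=0)
        return score - logsumexp(alpha)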
 

Human Activity Recognition

We present algorithms for recognizing human motions in monocular video sequences based on discriminative Conditional Random Fields (CRFs). Existing approaches make simplifying, often unrealistic assumptions about the conditional independence of observations given the motion class labels and cannot accommodate overlapping features or long-term contextual dependencies in the observation sequence. In contrast, conditional models like CRFs seamlessly represent contextual dependencies, support efficient, exact inference using dynamic programming, and can have their parameters trained using convex optimization.

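Complementing the scoring sketch above, the exact dynamic-programming inference that makes CRF-based labeling tractable is Viterbi decoding; the numpy version below again uses generic emission and transition scores rather than the trained activity model.

    import numpy as np

    def viterbi_decode(emissions, transitions):
        """Most probable label sequence under a linear-chain CRF.

        emissions: (T, K) per-frame scores; transitions: (K, K) pairwise label scores.
        """
        T, K = emissions.shape
        best = emissions[0].copy()              # best path score ending in each label
        backptr = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            cand = best[:, None] + transitions  # (previous label, current label) scores
            backptr[t] = cand.argmax(axis=0)
            best = cand.max(axis=0) + emissions[t]
        # Trace the arg-max path back from the best final label.
        labels = [int(best.argmax())]
        for t in range(T - 1, 0, -1):
            labels.append(int(backptr[t, labels[-1]]))
        return labels[::-1]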
 

Spectral Latent Variable Model  

We propose non-linear generative models, referred to as Sparse Spectral Latent Variable Models (SLVMs), that combine the advantages of spectral embeddings with those of parametric latent variable models: (1) they provide stable latent spaces that preserve global or local geometric properties of the modeled data; (2) they offer low-dimensional generative models with probabilistic, bi-directional mappings between latent and ambient spaces; and (3) they are probabilistically consistent (i.e., they reflect the data distribution, both jointly and marginally) and efficient to learn and use.

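A loose scikit-learn analogue of this idea (it does not capture the probabilistic consistency of the actual SLVM): compute a spectral embedding of the data, then learn parametric regressors in both directions so the latent space comes with explicit ambient-to-latent and latent-to-ambient mappings.

    import numpy as np
    from sklearn.manifold import SpectralEmbedding
    from sklearn.kernel_ridge import KernelRidge

    def fit_spectral_latent_model(X, n_latent=3):
        """X: (N, D) ambient data. Returns latent codes and the two learned mappings."""
        Z = SpectralEmbedding(n_components=n_latent).fit_transform(X)  # spectral latent space
        to_latent = KernelRidge(kernel="rbf", alpha=1e-2).fit(X, Z)    # ambient -> latent
        to_ambient = KernelRidge(kernel="rbf", alpha=1e-2).fit(Z, X)   # latent -> ambient
        return Z, to_latent, to_ambient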
 

Hierarchical Features for 3D Human Pose Estimation  

In this work we advocate the semi-supervised learning of hierarchical image descriptions in order to better tolerate variability at multiple levels of detail. We combine multilevel encodings that have improved stability to geometric transformations with metric learning and semi-supervised manifold regularization methods, in order to further tune them for task invariance: resistance to background clutter and to variance within the same human pose class. We quantitatively analyze the effectiveness of both the descriptors and the learning methods, and show that each can contribute, sometimes substantially, to more reliable 3D human pose estimates in cluttered images.

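One common way to realize a multilevel image description is a coarse-to-fine pyramid of block descriptors; the sketch below simply concatenates HOG computed at several resolutions of a (reasonably large, grayscale) window with scikit-image, as an illustration of the hierarchical idea rather than the exact descriptors used.

    import numpy as np
    from skimage.feature import hog
    from skimage.transform import resize

    def multilevel_hog(window, scales=(1.0, 0.5, 0.25)):
        """Concatenate HOG descriptors computed at several resolutions of a grayscale window."""
        feats = []
        for s in scales:
            img = resize(window, (int(window.shape[0] * s), int(window.shape[1] * s)))
            feats.append(hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2)))
        return np.concatenate(feats)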
 

Visual Tracking in Latent Space

We propose a family of algorithms for visual inference of 3D human motion in low-dimensional non-linear state spaces. Low-dimensional models are appropriate because many visual processes exhibit strong non-linear correlations in both the image observations and the target (hidden) state variables. We empirically show that the method successfully reconstructs the complex 3D motion of humans in real monocular video sequences.

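A schematic version of tracking in a learned low-dimensional state space; the actual work uses non-linear latent models, so PCA stands in here only to keep the sketch short, and the image likelihood is assumed to be supplied by the caller as a non-negative scoring function.

    import numpy as np
    from sklearn.decomposition import PCA

    def track_in_latent_space(pose_data, observations, likelihood,
                              n_latent=5, n_particles=200, noise=0.05):
        """pose_data: (N, D) training poses; likelihood(pose, obs) -> non-negative score."""
        rng = np.random.default_rng(0)
        pca = PCA(n_components=n_latent).fit(pose_data)
        z = np.repeat(pca.transform(pose_data[:1]), n_particles, axis=0)  # initial particles
        track = []
        for obs in observations:
            z = z + noise * rng.standard_normal(z.shape)  # diffuse in the latent space
            poses = pca.inverse_transform(z)              # lift back to full pose space
            w = np.array([likelihood(p, obs) for p in poses])
            w = w / w.sum()
            track.append(poses[w.argmax()])
            z = z[rng.choice(n_particles, n_particles, p=w)]  # resample by weight
        return np.array(track)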
 

3D Human Pose Estimation using Bayesian Mixture of Experts (BME)  

In this work we propose a discriminative framework for estimating 3D human motion in monocular video sequences. We aim for probabilistically motivated tracking algorithms and for models that can estimate complex multi-valued mappings based on different image descriptors encoding the observations.
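
A small numpy sketch of the mixture-of-experts prediction step, with linear experts and a softmax gate whose parameters are placeholders standing in for the learned model: each expert proposes a 3D pose from the image descriptor, and the gate weighs the experts conditioned on the same descriptor, which is what lets the model represent multi-valued image-to-pose mappings.

    import numpy as np

    def mixture_of_experts_predict(x, expert_W, expert_b, gate_W, gate_b):
        """x: (D,) image descriptor.

        expert_W: (M, P, D), expert_b: (M, P)  -- M linear experts, descriptor -> pose
        gate_W:   (M, D),    gate_b:   (M,)    -- softmax gate conditioned on the descriptor
        """
        poses = expert_W @ x + expert_b          # (M, P): one pose hypothesis per expert
        logits = gate_W @ x + gate_b
        gates = np.exp(logits - logits.max())
        gates = gates / gates.sum()              # input-dependent mixing proportions
        # Return all weighted hypotheses (the multi-valued output) and the top expert's pose.
        return poses, gates, poses[gates.argmax()]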