Tracking Facial Features using Mixture of Point Distribution Models

This technology is patent protected: "System and Method for Tracking Facial Features," Atul Kanaujia and Dimitris Metaxas, Rutgers Docket 07-015, Provisional Patent #60/874,451, filed December 12, 2006. No part of this technology may be reproduced or displayed in any form without the prior written permission of the authors.


Facial Features Detection

We present a generic framework to track shapes across large variations by learning the non-linear shape manifold as a set of overlapping, piecewise-linear subspaces. We use landmark-based shape analysis to train a Gaussian mixture model over the aligned shapes and learn a Point Distribution Model (PDM) for each of the mixture components. The target shape is searched by first maximizing the mixture probability density for the local feature intensity profiles along the normals, and then constraining the global shape using the most probable PDM cluster. The feature shapes are robustly tracked across multiple frames by dynamically switching between the PDMs. Our contribution is to apply ASM to the task of tracking shapes across wide aspect changes and generic movements. This is achieved by learning shape priors over the non-linear shape space and using them to constrain the search to plausible shapes. We demonstrate the results on tracking facial features and provide several empirical results to validate our approach. Our framework runs close to real time at 25 frames per second and can be extended to predict pose angles using a Mixture of Experts.

Our generic framework enables large-scale automated training of different shapes from multiple viewpoints. The model can handle a larger amount of variability and can be used to learn a non-linear, continuous shape manifold.

There have been several efforts in the past to represent non-linear shape variations using kernel PCA and multi-layer perceptrons. The results from non-linear approaches largely depend on whether all the shape variations have been adequately represented in the training data. Discontinuities in the shape space may cause these models to generate implausible shapes. Kernel methods suffer from a major drawback: a pre-image function must be learned to map shapes from the feature space back to the original space. We propose to use multiple overlapping subspaces to capture the larger shape variance that occurs in the data set due to full-profile head movement. Our objective is to accurately track facial features across large head rotations. The contributions of our work are: (1) improving the specificity of ASM to handle large shape variations by learning the non-linear shape manifold, (2) a real-time framework to track shapes, and (3) learning non-linearities for accurate prediction of 3D pose angles from 2D shapes.
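The idea of overlapping linear subspaces can be sketched as follows. This is a simplified illustration, not the paper's implementation: it uses deterministic hard clustering (farthest-point k-means) as a stand-in for full Gaussian-mixture EM, and all function names and parameters here are hypothetical.

```python
import numpy as np

def train_mixture_pdm(shapes, n_clusters=2, var_keep=0.95, n_iter=20):
    """Learn a mixture of PDMs: cluster the aligned training shapes (hard
    k-means assignment as a simplified stand-in for Gaussian-mixture EM)
    and fit one linear PDM (mean + PCA basis) per cluster."""
    shapes = np.asarray(shapes, float)
    # Farthest-point initialization keeps this sketch deterministic.
    centers = [shapes[0]]
    for _ in range(1, n_clusters):
        d = np.min([((shapes - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(shapes[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((shapes[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = shapes[labels == k].mean(0)
    pdms = []
    for k in range(n_clusters):
        cluster = shapes[labels == k]
        mean = cluster.mean(0)
        # PCA of the cluster via SVD; keep enough modes for var_keep variance.
        _, s, vt = np.linalg.svd(cluster - mean, full_matrices=False)
        var = s ** 2
        keep = int(np.searchsorted(np.cumsum(var), var_keep * var.sum())) + 1
        std = s[:keep] / np.sqrt(max(len(cluster) - 1, 1))
        pdms.append({"mean": mean, "basis": vt[:keep], "std": std})
    return pdms, labels

def constrain_shape(shape, pdm, limit=3.0):
    """Project a candidate shape onto a cluster's PDM subspace, clamping
    each mode to +/- limit standard deviations to keep the shape plausible."""
    b = pdm["basis"] @ (np.asarray(shape, float) - pdm["mean"])
    b = np.clip(b, -limit * pdm["std"], limit * pdm["std"])
    return pdm["mean"] + pdm["basis"].T @ b
```

During search, a candidate shape would be assigned to its most probable cluster and then constrained by that cluster's PDM, which is what lets the mixture cover a non-linear manifold with locally linear pieces.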

Shape fitting results on a full-profile pose initialized with the average frontal shape. The cluster-based approach allows occluded landmark points to be identified during the search and ignored during the likelihood optimization of the individual landmarks. This heuristic search gives very accurate face alignment for the full-profile face.
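Ignoring occluded landmarks during shape fitting can be illustrated with a small sketch: solve for the PDM coefficients using only the visible coordinates, and let the model fill in the rest. The function name, mask convention (one boolean per coordinate), and least-squares formulation are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def fit_pdm_visible(shape, mean, basis, visible):
    """Estimate PDM coefficients b from visible coordinates only, by
    minimizing || basis[:, mask].T @ b - (shape - mean)[mask] ||^2,
    then reconstruct the full shape (occluded points come from the model)."""
    mask = np.asarray(visible, bool)
    A = basis[:, mask].T                       # (n_visible, n_modes)
    r = (np.asarray(shape, float) - mean)[mask]
    b, *_ = np.linalg.lstsq(A, r, rcond=None)  # least-squares coefficients
    return mean + basis.T @ b                  # full reconstructed shape
```

As long as enough landmarks remain visible to determine the low-dimensional coefficients, the occluded points are recovered from the shape prior rather than from unreliable image evidence.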

SIFT Descriptor for Appearance Modeling

We improved the appearance modeling using SIFT descriptors. (Left) Gradient-profile matching cost of a landmark point over a window of size 19x19; note the multiple minima, which result in poor alignment of shapes. (Right) SIFT descriptor matching cost for the same landmark point.

(Top) Facial feature localization using ASM with gradient profiles. (Bottom) Localization using SIFT descriptors as local features. Note the more accurate localization of the eye features due to the SIFT descriptors.
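The benefit of descriptor-based matching can be sketched with a much-simplified SIFT-like descriptor: a magnitude-weighted histogram of gradient orientations over a patch, compared over a search window. Real SIFT adds spatial subgrids, Gaussian weighting, and orientation normalization; everything below (names, window sizes) is a hypothetical toy version.

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Simplified SIFT-like descriptor: an L2-normalized histogram of
    gradient orientations, weighted by gradient magnitude, over one patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def descriptor_cost(image, ref_desc, center, half=4, search=6):
    """Matching cost of a reference descriptor at every offset in a
    (2*search+1)^2 window around `center` (row, col)."""
    cy, cx = center
    costs = np.full((2 * search + 1, 2 * search + 1), np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = cy + dy, cx + dx
            patch = image[y - half:y + half + 1, x - half:x + half + 1]
            if patch.shape == (2 * half + 1, 2 * half + 1):
                d = orientation_histogram(patch)
                costs[dy + search, dx + search] = np.linalg.norm(d - ref_desc)
    return costs
```

Because the descriptor pools gradient statistics over the whole patch, its cost surface tends to have a single sharp minimum where a raw 1D gradient profile would show several, which is the effect illustrated in the figure above.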

Tracking the Shapes

Running ASM at every frame is computationally expensive and causes the feature points to jitter strongly. Instead, we track the features across consecutive frames using a Sum of Squared Intensity Differences (SSID) tracker. The SSID tracker is a method for registering two images: it computes the displacement of a feature by minimizing the intensity matching cost over a fixed-size window around the feature. For small inter-frame motion, a linear translation model can be accurately assumed. For an intensity surface at image location I(xi, yi, tk), the tracker estimates the displacement vector d = (δxi, δyi) from the new image I(xi + δxi, yi + δyi, tk+1) by minimizing the residual error over a window W around (xi, yi).
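An integer-pixel sketch of this minimization, assuming a brute-force search over candidate displacements (practical trackers such as KLT instead solve a linearized version with sub-pixel accuracy):

```python
import numpy as np

def ssid_displacement(prev, curr, center, half=7, search=5):
    """Estimate the displacement of a feature between two frames by
    exhaustively minimizing the sum of squared intensity differences (SSID)
    over a window of size (2*half+1)^2 around the feature at `center`
    (row, col), searching offsets in [-search, search]^2."""
    y, x = center
    ref = prev[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = curr[y + dy - half:y + dy + half + 1,
                        x + dx - half:x + dx + half + 1].astype(float)
            if cand.shape != ref.shape:
                continue  # candidate window falls outside the image
            cost = ((cand - ref) ** 2).sum()
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best
```

Tracking with this per-frame displacement is far cheaper than a full ASM search, and the shape model can be re-applied periodically to correct any accumulated drift.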

Tracking Results - Click on the images to view the movies


Full Profile Head Rotation

Generic Head Movement


Emblem Detection - Eye Blink, Head Nod and Shake Detection - Click on the images to view the videos


Eye Blink Detection

Eye Blink Detection

Eye Blink Detection

Head nodding and shaking are detected by recognizing the motion pattern of the nose in the videos.
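One simple way to classify such a nose trajectory is to compare its vertical and horizontal oscillation: a nod is dominated by vertical motion, a shake by horizontal motion. The variance-ratio rule and thresholds below are an illustrative assumption; the papers' motion-pattern recognizer may differ.

```python
import numpy as np

def classify_head_gesture(nose_xy, min_amp=2.0, ratio=2.0):
    """Classify a tracked nose trajectory (sequence of (x, y) points) as
    'nod' (dominant vertical oscillation), 'shake' (dominant horizontal),
    or 'none' (motion too small or ambiguous). Thresholds are hypothetical."""
    xy = np.asarray(nose_xy, float)
    sx, sy = xy[:, 0].std(), xy[:, 1].std()  # spread along each image axis
    if max(sx, sy) < min_amp:
        return "none"   # nose barely moved
    if sy > ratio * sx:
        return "nod"    # vertical motion dominates
    if sx > ratio * sy:
        return "shake"  # horizontal motion dominates
    return "none"
```

In practice the trajectory would be smoothed and windowed first, and an oscillation check (multiple direction reversals) would rule out a single head turn being reported as a shake.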


3D Head Pose Estimation using Facial Feature Tracking (Real Time) - Click on the images to view the movies

Generic Head Movement and Pose Prediction

Fast Head nodding and Shaking

Head Pose Estimation



  1. Tracking Facial Features Using Mixture of Point Distribution Models, Atul Kanaujia, Yuchi Huang, Dimitris Metaxas, CVGIP 2006. (PDF)
  2. Emblem Detections by Tracking Facial Features, Atul Kanaujia, Yuchi Huang, Dimitris Metaxas, CVPR Workshop on Semantic Learning, 2006. (PDF)
  3. Large Scale Learning of Active Shape Models, Atul Kanaujia, Dimitris Metaxas, ICIP 2007. (PDF)
  4. Dynamic Tracking of Facial Expressions using Adaptive, Overlapping Subspaces, Dimitris Metaxas, Atul Kanaujia, Zhiguo Li, ICCS 2007. (PDF)

