I graduated from Rutgers University and now work at Intel Labs in Hillsboro, Oregon. I defended my PhD thesis, titled "Preventing and Diagnosing Software Upgrade Failures," in October 2011 under the guidance of Ricardo Bianchini.
My research interests are in the reliability and manageability of operating systems and distributed systems. At Intel, I work in the systems security area.
Modern software systems are complex and comprise many interacting
and dependent components. Frequent upgrades are required to fix bugs, patch
security vulnerabilities, and add or remove features. Unfortunately, many
upgrades either fail or produce undesired behavior resulting in service
disruption, user dissatisfaction, and/or monetary
loss. To make matters worse, when upgrades fail or misbehave, developers are given
limited (and often unstructured) information to pinpoint and correct the problems.
As part of this project, we have built two systems to improve the management of software
upgrades. Both systems rely on environment information and dynamic execution data collected from users who have previously upgraded the software.
The first (called Mojave) is an upgrade recommendation system that informs a user who intends to upgrade the software about whether the upgrade is likely to succeed. Regardless of Mojave's recommendation, if the user decides to upgrade and it fails, our second system (called Sahara) comes into play.
Sahara is a failed upgrade debugging system that identifies a small subset of routines that are likely to contain the root cause of the failure. We evaluate both systems using several real upgrade failures with widely used software. Our results demonstrate that our systems are very accurate in predicting upgrade failures and identifying the likely culprits for upgrade failures.
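To make the idea concrete, here is a minimal sketch of the kind of recommendation step Mojave performs: compare an intending upgrader's environment to the environments of users who already upgraded, and predict the outcome seen by the most similar ones. The feature names, similarity measure, and voting rule below are illustrative assumptions, not the system's actual design.

```python
def similarity(env_a, env_b):
    """Fraction of environment attributes on which two environments agree."""
    keys = set(env_a) | set(env_b)
    if not keys:
        return 0.0
    same = sum(1 for k in keys if env_a.get(k) == env_b.get(k))
    return same / len(keys)

def recommend(user_env, history, k=3):
    """Majority vote over the k most similar past upgrades.

    history: list of (environment, succeeded) pairs from prior upgraders.
    Returns True if the upgrade is predicted to succeed.
    """
    ranked = sorted(history, key=lambda h: similarity(user_env, h[0]),
                    reverse=True)
    top = ranked[:k]
    return sum(1 for _, ok in top if ok) > len(top) / 2

# Toy history of past upgrades (hypothetical attributes).
past = [
    ({"os": "linux", "libc": "2.5", "arch": "x86"}, True),
    ({"os": "linux", "libc": "2.3", "arch": "x86"}, False),
    ({"os": "linux", "libc": "2.5", "arch": "x86_64"}, True),
]
verdict = recommend({"os": "linux", "libc": "2.5", "arch": "x86"}, past)  # True
```

The real systems also use dynamic execution data, which a sketch like this omits; the point is only that environment similarity across prior upgraders drives the prediction.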
The focus of this project was to improve the reliability of large distributed storage systems through better reliability metrics and more efficient policies for recovering from hardware failures. We proposed a new metric, Normalcy Deviation Score (NDS), for dynamically quantifying the reliability status of the storage system, and MinI (Minimum Intersection), a novel recovery scheduling policy that improves reliability by efficiently reconstructing data after a failure. We evaluated NDS and MinI for three popular data allocation schemes using a simulation of an erasure-coded distributed storage system.
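The intuition behind minimizing intersection can be sketched as follows: when several lost blocks must be reconstructed concurrently, choose each block's source disks so that in-flight reconstructions share as few disks as possible, reducing read contention. The greedy selection here is an illustrative assumption, not the published MinI algorithm.

```python
def pick_sources(candidates, needed, used):
    """Pick `needed` source disks for one block, preferring disks that
    the fewest in-flight reconstructions are already reading from.

    candidates: disks holding fragments of the block.
    used: dict mapping disk -> number of reconstructions reading it.
    """
    ranked = sorted(candidates, key=lambda d: used.get(d, 0))
    return ranked[:needed]

def schedule(lost_blocks, fragments_needed):
    """lost_blocks: {block: [disks holding its fragments]}.
    Returns a plan {block: [chosen source disks]}."""
    used = {}
    plan = {}
    for block, candidates in lost_blocks.items():
        chosen = pick_sources(candidates, fragments_needed, used)
        for d in chosen:
            used[d] = used.get(d, 0) + 1
        plan[block] = chosen
    return plan

# Two lost blocks whose fragment locations partially overlap.
lost = {
    "b1": ["d1", "d2", "d3", "d4"],
    "b2": ["d1", "d2", "d5", "d6"],
}
plan = schedule(lost, fragments_needed=2)
```

With this greedy choice, b2 avoids d1 and d2 (already claimed by b1) and reads from d5 and d6 instead, so the two reconstructions touch disjoint disks and can proceed in parallel.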
The aim of this project was to validate database administrator actions before they are propagated to the online system. The validation infrastructure detects major classes of operator mistakes, before they are exposed to the end-user. We designed and implemented the validation infrastructure for RUBiS, a multi-tier auction service using network virtualization and C-JDBC (Cluster JDBC).
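The validation idea can be illustrated with a minimal sketch: apply the operator's action to an isolated (virtualized) replica first, replay a sample of requests against both the replica and the live tier, and promote the action only if the responses agree. The classes, comparison rule, and action shapes below are hypothetical simplifications of the actual infrastructure.

```python
class Tier:
    """A toy service tier: a key-value store standing in for a replica."""
    def __init__(self, table):
        self.table = dict(table)
    def handle(self, key):
        return self.table.get(key)

def validate(action, replica, live, sample_requests):
    """Apply `action` to the replica only, then check that the replica
    still answers a sample of requests the same way the live tier does."""
    action(replica)
    return all(replica.handle(r) == live.handle(r) for r in sample_requests)

live = Tier({"item1": 10, "item2": 20})
sample = ["item1", "item2"]

# A harmless action passes validation...
ok = validate(lambda t: None, Tier(live.table), live, sample)

# ...while an action that drops data is caught before reaching users.
bad = validate(lambda t: t.table.pop("item2"), Tier(live.table), live, sample)
```

In the actual system, the isolation comes from network virtualization and the database tier is replicated via C-JDBC; the sketch only shows the compare-before-expose pattern.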
Rekha Bachwani — rbachwan[at]cs.university.edu