Research

Ongoing projects in our group

Driving in the MATRIX

Deep learning has rapidly transformed the state-of-the-art algorithms used to address a variety of problems in computer vision and robotics. These breakthroughs have, however, relied upon massive amounts of human-annotated training data, and this time-consuming labeling process has begun to impede progress. By training machine learning algorithms on a rich virtual world, we show that real objects in real scenes can be learned and classified using only synthetic data. This approach offers the possibility of accelerating deep learning's application to sensor-based classification problems like those that appear in self-driving cars.
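
The essential recipe is to train entirely on rendered imagery and then evaluate on real photographs. The sketch below illustrates that sim-to-real split with a generic image classifier; the directory names, model choice, and training schedule are illustrative assumptions and not this project's actual pipeline, which targets object detection on imagery rendered from a game engine.

```python
# Minimal sketch (not the project's pipeline): train on synthetic renders,
# evaluate on real photographs. Folder names are hypothetical.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Training data: imagery rendered from a virtual world.
train_set = datasets.ImageFolder("synthetic_train", transform=tf)
# Evaluation data: real-world imagery only.
test_set = datasets.ImageFolder("real_test", transform=tf)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32)

model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                      # short demo schedule
    model.train()
    for x, y in train_loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Accuracy on real images measures how well synthetic training transfers.
model.eval()
correct = total = 0
with torch.no_grad():
    for x, y in test_loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
print(f"real-image accuracy: {correct / total:.3f}")
```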

Failing To Learn

One of the major open challenges in self-driving cars is the ability to detect cars and pedestrians in order to safely navigate the world. Deep learning-based object detectors have enabled great advances in using camera imagery to detect and classify objects. But for a safety-critical application such as autonomous driving, the error rates of the current state of the art are still too high to enable safe operation. Moreover, the characterization of object detector performance is primarily limited to testing on prerecorded datasets: errors that occur on novel data go undetected without additional human labels. In this paper, we propose an automated method to identify mistakes made by object detectors without ground-truth labels. We show that inconsistencies in object detector output between a pair of similar images can be used as hypotheses for false negatives (i.e., missed detections), and that, using a novel set of features for each hypothesis, an off-the-shelf binary classifier can find valid errors. In particular, we study two distinct cues - temporal and stereo inconsistencies - using data that is readily available on most autonomous vehicles. Our method can be used with any camera-based object detector, and we illustrate the technique on several sets of real-world data. We show that a state-of-the-art detector, a tracker, and our classifier trained only on synthetic data can identify valid errors on the KITTI tracking dataset with an Average Precision of 0.88. We also release a new tracking dataset with over 100 sequences totaling more than 80,000 labeled pairs of stereo images, along with ground-truth disparity from a game engine, to facilitate further research.
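
To make the error-mining idea concrete, the sketch below shows one way the inconsistency cue could be wired up: detections that appear in one image of a similar pair (stereo or temporally adjacent) but have no overlapping match in the other become false-negative hypotheses, and an off-the-shelf binary classifier scores each hypothesis from a small feature vector. The feature set, matching threshold, and classifier choice are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of inconsistency-based error mining (not the paper's code).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fn_hypotheses(dets_a, dets_b, thresh=0.3):
    """Boxes detected in image A with no counterpart in image B are
    hypotheses for false negatives (missed detections) in image B."""
    return [box for box, score in dets_a
            if all(iou(box, b) < thresh for b, _ in dets_b)]

def features(box, score):
    """Illustrative per-hypothesis features: box geometry and detector score."""
    w, h = box[2] - box[0], box[3] - box[1]
    return [w, h, w * h, w / (h + 1e-9), score]

# Training: hypotheses with labels (1 = valid detector error), e.g. from
# labeled synthetic stereo data; two toy samples stand in for a real set.
X = np.array([features((100, 80, 180, 200), 0.9),
              features((300, 150, 330, 190), 0.4)])
y = np.array([1, 0])
clf = RandomForestClassifier(n_estimators=100).fit(X, y)

# Runtime: detections from a stereo (or temporally adjacent) image pair.
dets_left  = [((100, 80, 180, 200), 0.9), ((300, 150, 330, 190), 0.4)]
dets_right = [((305, 148, 336, 192), 0.5)]
for box in fn_hypotheses(dets_left, dets_right):
    score = next(s for b, s in dets_left if b == box)
    p_error = clf.predict_proba([features(box, score)])[0, 1]
    print(box, f"probability of a missed detection: {p_error:.2f}")
```

Any binary classifier exposing the same fit/predict interface could stand in for the one shown here; the important structure is hypothesis generation from paired detections followed by learned validation.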

Guaranteed Safe Autonomous Driving

Trajectory planning for autonomous vehicles requires a mathematical model of how the vehicle moves through the world. However, models are imperfect, and accounting for model uncertainty is critical to ensuring safety. Furthermore, depending on model complexity, a trajectory planner may or may not be able to find solutions in real time. The proposed work uses low-complexity models to produce trajectories, and bounds the error the vehicle incurs when tracking such trajectories. The range of states the vehicle can reach in this framework is computed offline as a Forward Reachable Set (FRS), represented as a function that conservatively approximates the vehicle's states (in 2-D space) over its parameterized trajectories. At runtime, the FRS is intersected with obstacles in the world to exclude unsafe trajectories; optimization over the remaining trajectories then selects one that the vehicle can follow safely despite uncertainty. This method is demonstrated in simulated comparisons against the Rapidly-exploring Random Tree (RRT) and Nonlinear Model Predictive Control (NMPC) approaches, and in hardware on a Segway RMP mobile robot and a Rover car-like robot.
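
A minimal sketch of the runtime step, under strong simplifying assumptions: the offline FRS is stood in for by a hypothetical function w(x, y, k) whose nonnegative level set over-approximates where the vehicle could be while tracking the trajectory with parameter k; obstacle points falling inside that set rule the parameter out, and the planner then optimizes over the parameters that remain. The specific function, parameterization, and cost below are illustrative, not the actual reachability computation.

```python
# Minimal sketch of FRS-based trajectory filtering (illustrative only).
import numpy as np

def w_frs(x, y, k):
    """Hypothetical FRS indicator: >= 0 where the vehicle may be while
    tracking the trajectory parameterized by k (here, a lateral offset)."""
    # Reachable tube: a disc whose center drifts laterally with k and whose
    # radius includes the bounded tracking error of the low-complexity model.
    cx, cy, radius = 2.0, 1.5 * k, 0.8
    return radius**2 - (x - cx)**2 - (y - cy)**2

def is_safe(k, obstacle_points):
    """A parameter is kept only if no obstacle point lies inside the FRS."""
    return all(w_frs(px, py, k) < 0 for px, py in obstacle_points)

def plan(obstacle_points, k_goal):
    """Pick the safe trajectory parameter closest to the desired one."""
    candidates = np.linspace(-1.0, 1.0, 41)           # discretized parameters
    safe = [k for k in candidates if is_safe(k, obstacle_points)]
    if not safe:
        # No safe parameter found; a real planner would fall back to a
        # contingency maneuver rather than returning None.
        return None
    return min(safe, key=lambda k: abs(k - k_goal))

obstacles = [(2.0, 0.3), (2.2, 0.5)]                  # sensed obstacle points
print(plan(obstacles, k_goal=0.0))                    # steers around obstacle
```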

Modeling Camera Effects

Recent work has focused on generating synthetic imagery and augmenting real imagery to increase the size and variability of training data for learning visual tasks in urban scenes, for example by increasing the occurrence of occlusions or varying environmental and weather effects. However, few have addressed modeling variation in the sensor domain, even though varying sensor effects can degrade the performance and generalizability of visual tasks trained on human-annotated datasets. This paper proposes an efficient, automated, physically-based augmentation pipeline to vary sensor effects (specifically chromatic aberration, blur, exposure, noise, and color cast) across both real and synthetic imagery. In particular, the paper illustrates that augmenting training datasets with the proposed pipeline improves the robustness and generalizability of object detection on a variety of benchmark vehicle datasets.
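
As a rough illustration of what such an augmentation step looks like, the sketch below applies exposure, color cast, blur, and additive noise to an image array (chromatic aberration is omitted for brevity). The parameter ranges and the simple 1-D box blur are illustrative assumptions, not the pipeline's physically-based sensor models.

```python
# Minimal sketch of sensor-effect augmentation (illustrative parameters).
import numpy as np

def augment_sensor_effects(img, rng=None):
    """img: float32 array in [0, 1] with shape (H, W, 3)."""
    rng = rng or np.random.default_rng()
    out = img.astype(np.float32)

    # Exposure: scale overall brightness up or down.
    out *= rng.uniform(0.6, 1.4)

    # Color cast: independent gain per channel.
    out *= rng.uniform(0.9, 1.1, size=3)

    # Blur: 1-D box filter along rows, a crude stand-in for defocus blur.
    k = rng.integers(1, 4)                       # half-width of the kernel
    kernel = np.ones(2 * k + 1) / (2 * k + 1)
    for c in range(3):
        out[..., c] = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="same"), 1, out[..., c])

    # Noise: additive Gaussian sensor noise.
    out += rng.normal(0.0, 0.02, size=out.shape)

    return np.clip(out, 0.0, 1.0)

# Example: augment a placeholder image.
image = np.random.default_rng(0).random((64, 64, 3)).astype(np.float32)
augmented = augment_sensor_effects(image)
print(augmented.shape, augmented.dtype)
```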