Ongoing projects in our group


Driving in the MATRIX

Deep learning has rapidly transformed the state-of-the-art algorithms used to address a variety of problems in computer vision and robotics. These breakthroughs have, however, relied upon massive amounts of human-annotated training data. This time-consuming annotation process has begun to impede the progress of these deep learning efforts. By training machine learning algorithms on a rich virtual world, we show that real objects in real scenes can be learned and classified using synthetic data. This approach offers the possibility of accelerating deep learning’s application to sensor-based classification problems like those that appear in self-driving cars.


Failing To Learn

One of the major open challenges in self-driving cars is detecting cars and pedestrians in order to navigate the world safely. Deep learning-based object detectors have enabled great advances in using camera imagery to detect and classify objects. But for a safety-critical application such as autonomous driving, the error rates of the current state of the art are still too high to enable safe operation. Moreover, the characterization of object detector performance is primarily limited to testing on prerecorded datasets. Errors that occur on novel data go undetected without additional human labels. In this paper, we propose an automated method to identify mistakes made by object detectors without ground truth labels. We show that inconsistencies in object detector output between a pair of similar images can be used as hypotheses for false negatives (i.e., missed detections), and that, using a novel set of features for each hypothesis, an off-the-shelf binary classifier can be used to find valid errors. In particular, we study two distinct cues, temporal and stereo inconsistencies, using data that is readily available on most autonomous vehicles. Our method can be used with any camera-based object detector, and we illustrate the technique on several sets of real-world data. We show that a state-of-the-art detector, a tracker, and our classifier trained only on synthetic data can identify valid errors on the KITTI tracking dataset with an Average Precision of 0.88. We also release a new tracking dataset with over 100 sequences totaling more than 80,000 labeled pairs of stereo images, along with ground truth disparity from a game engine, to facilitate further research.
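
The idea of turning inconsistencies between two similar images into false-negative hypotheses can be sketched roughly as follows. This is a minimal illustration, not the method from the paper: it assumes the detections from both images have already been brought into a common image frame (for example by a tracker or a disparity shift), and the names `iou`, `false_negative_hypotheses`, and the threshold `min_iou` are hypothetical. The feature extraction and the binary classifier are not shown.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def false_negative_hypotheses(
    dets_a: List[Box], dets_b: List[Box], min_iou: float = 0.5
) -> List[Box]:
    """Boxes detected in image A that have no overlapping detection in image B.

    Each returned box is only a hypothesis that image B contains a missed
    detection; a separate classifier would have to accept or reject it.
    """
    return [a for a in dets_a if all(iou(a, b) < min_iou for b in dets_b)]


# Hypothetical detections in two consecutive frames (assumed already aligned).
frame_t = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
frame_t1 = [(1.0, 0.0, 11.0, 10.0)]
hypotheses = false_negative_hypotheses(frame_t, frame_t1)
```

Here the second box in `frame_t` has no counterpart in `frame_t1`, so it becomes a candidate missed detection for the later frame.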

Guaranteed Safe Autonomous Driving

Trajectory planning for autonomous vehicles requires a mathematical model to describe how the vehicle moves through the world. However, models are imperfect, and accounting for model uncertainty is critical to ensuring safety. Furthermore, depending on model complexity, a trajectory planner may or may not be able to find solutions in real time. The proposed work uses low-complexity models to produce trajectories and bounds the error the vehicle incurs when following such trajectories. The range of states a vehicle can achieve in this framework is computed offline as a Forward Reachable Set (FRS), which is represented as a function that conservatively approximates the vehicle's states (in 2-D space) and its parameterized trajectories. The FRS is intersected with obstacles in the world at runtime to exclude unsafe trajectories; optimization over the remaining trajectories ensures that the chosen trajectory is safe for the vehicle to follow despite uncertainty. This method is demonstrated in simulation against the Rapidly-exploring Random Trees (RRT) and Nonlinear Model Predictive Control (NMPC) approaches, and on a Segway RMP mobile robot and a Rover car-like robot.
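
The runtime step described above (discard trajectory parameters whose reachable set touches an obstacle, then optimize over the rest) can be sketched as follows. This is only an illustration under strong assumptions: the FRS is represented here as a hypothetical lookup table from a trajectory parameter to a set of occupancy-grid cells, which is not the function representation used in the work, and `select_safe_parameter` and `cost` are invented names.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Cell = Tuple[int, int]  # hypothetical occupancy-grid cell (column, row)


def select_safe_parameter(
    frs: Dict[float, Set[Cell]],
    obstacles: Set[Cell],
    cost: Callable[[float], float],
) -> Optional[float]:
    """Return the lowest-cost trajectory parameter whose footprint is obstacle-free.

    frs maps each trajectory parameter to the cells the vehicle could occupy
    while following it, assumed to be precomputed offline and conservative.
    Returns None when every parameter is excluded.
    """
    safe = [k for k, cells in frs.items() if cells.isdisjoint(obstacles)]
    return min(safe, key=cost) if safe else None


# Hypothetical table: three steering-like parameters and their footprints.
frs_table = {
    -1.0: {(1, 0), (2, -1)},
    0.0: {(1, 0), (2, 0)},
    1.0: {(1, 0), (2, 1)},
}
obstacle_cells = {(2, 0)}  # blocks the straight-ahead trajectory
chosen = select_safe_parameter(frs_table, obstacle_cells, cost=lambda k: abs(k - 0.2))
```

With the straight-ahead trajectory excluded, the sketch picks the remaining parameter closest to the desired value of 0.2. Returning `None` when nothing is safe leaves the fallback behavior to the caller.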

Modeling Camera Effects

Recent work has focused on generating synthetic imagery and augmenting real imagery to increase the size and variability of training data for learning visual tasks in urban scenes. This includes increasing the occurrence of occlusions or varying environmental and weather effects. However, few works have addressed modeling the variation in the sensor domain. Unfortunately, varying sensor effects can degrade the performance and generalizability of results for visual tasks trained on human-annotated datasets. This paper proposes an efficient, automated, physically based augmentation pipeline to vary sensor effects (specifically, chromatic aberration, blur, exposure, noise, and color cast) across both real and synthetic imagery. In particular, this paper illustrates that augmenting training datasets with the proposed pipeline improves the robustness and generalizability of object detection on a variety of benchmark vehicle datasets.
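
As a rough illustration of what varying sensor effects means in practice, the sketch below applies three of the listed effects (exposure, color cast, and noise) to a list of RGB pixels. It is a simplified stand-in, not the physically based pipeline from the paper: the linear gain, per-channel offset, and Gaussian noise models are assumptions, as are the function name `augment_pixels` and all of its parameters. Chromatic aberration and blur are omitted because they need spatial operations on a full image.

```python
import random
from typing import List, Optional, Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]


def augment_pixels(
    pixels: List[Pixel],
    gain: float = 1.0,
    cast: Pixel = (0.0, 0.0, 0.0),
    noise_std: float = 0.0,
    seed: Optional[int] = None,
) -> List[Pixel]:
    """Apply an exposure gain, a per-channel color cast, and Gaussian noise.

    Every output channel is clipped back into [0, 1].
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    out: List[Pixel] = []
    for r, g, b in pixels:
        channels = []
        for value, offset in zip((r, g, b), cast):
            noisy = value * gain + offset + rng.gauss(0.0, noise_std)
            channels.append(min(1.0, max(0.0, noisy)))
        out.append((channels[0], channels[1], channels[2]))
    return out


# Hypothetical usage: brighten, add a warm cast, and add mild sensor noise.
gray = [(0.5, 0.5, 0.5)] * 4
augmented = augment_pixels(gray, gain=1.2, cast=(0.05, 0.0, -0.05), noise_std=0.02, seed=0)
```

Sampling `gain`, `cast`, and `noise_std` from ranges for each training image would yield a varied set of augmented copies.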

Pickup and Delivery

This paper develops a computationally efficient algorithm for the Multiple Vehicle Pickup and Delivery Problem (MVPDP) with the objective of minimizing the tour cost incurred while completing the task of pickup and delivery of customers. To this end, this paper constructs a novel 0-1 Integer Quadratic Programming (IQP) problem to exactly solve the MVPDP. Compared to the state-of-the-art Mixed Integer Linear Programming (MILP) formulation of the problem, the one presented here requires fewer constraints and decision variables. To ensure that this IQP formulation of the MVPDP can be solved in a computationally efficient manner, this paper devises a set of sufficient conditions to ensure convexity of this formulation when the integer variables are relaxed. In addition, this paper describes a transformation to map any non-convex IQP formulation of the MVPDP into an equivalent convex one. The superior computational efficacy of this convex IQP method when compared to the state-of-the-art MILP formulation is demonstrated through extensive simulated and real-world experiments.
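
To give a sense of why a convex reformulation is possible for 0-1 problems at all, the following is a standard diagonal-shift identity. It is shown only as an illustration and is not necessarily the transformation used in this paper; the symbols $Q$ (a symmetric cost matrix), $c$, $x$, and $\lambda$ are assumptions introduced for this sketch.

```latex
x_i \in \{0,1\} \;\Rightarrow\; x_i^2 = x_i,
\qquad\text{hence}\qquad
x^\top Q x + c^\top x
  \;=\; x^\top (Q + \lambda I)\, x + (c - \lambda \mathbf{1})^\top x
\quad \text{for all } x \in \{0,1\}^n .
```

Choosing $\lambda \ge -\lambda_{\min}(Q)$ makes $Q + \lambda I$ positive semidefinite, so the objective becomes convex once the integer variables are relaxed, while its value at every binary point is unchanged.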

Robust Environmental Mapping

Constructing a spatial map of environmental parameters is a crucial step in preventing hazardous chemical leaks and forest fires, and in estimating spatially distributed physical quantities such as terrain elevation. Although prior methods can perform such mapping tasks efficiently by dispatching a group of autonomous agents, they are unable to ensure satisfactory convergence to the underlying ground truth distribution in a decentralized manner when any of the agents fail. Since the agents used for such mapping are typically inexpensive and prone to failure, this results in poor overall mapping performance in real-world applications, which can in certain cases endanger human safety. This paper presents a Bayesian approach for robust spatial mapping of environmental parameters, in the presence of hardware failures, by deploying a group of mobile robots that are capable of ad-hoc communication and equipped with short-range sensors. Our approach first utilizes a variant of the Voronoi diagram to partition the region to be mapped into disjoint regions that are each associated with at least one robot. These robots are then deployed in a decentralized manner to maximize the likelihood that at least one robot detects every target in its associated region despite a non-zero probability of failure. A suite of simulation results is presented to demonstrate the effectiveness and robustness of the proposed method when compared to existing techniques.
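
The partitioning step can be illustrated with an ordinary nearest-robot (Voronoi) assignment on a grid. Note that this sketch uses the standard Voronoi partition, not the variant developed in the paper, and it does not model robot failures or the Bayesian update; the function name `voronoi_partition` and the grid discretization are assumptions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def voronoi_partition(cells: List[Point], robots: List[Point]) -> Dict[int, List[Point]]:
    """Assign every cell to its nearest robot (ties go to the lowest robot index).

    Returns a mapping from robot index to the cells in that robot's region.
    The regions are disjoint and together cover all cells.
    """
    regions: Dict[int, List[Point]] = {i: [] for i in range(len(robots))}
    for cell in cells:
        nearest = min(
            range(len(robots)),
            key=lambda i: (cell[0] - robots[i][0]) ** 2 + (cell[1] - robots[i][1]) ** 2,
        )
        regions[nearest].append(cell)
    return regions


# Hypothetical 4 x 2 grid of cell centers and two robot positions.
grid = [(float(x), float(y)) for x in range(4) for y in range(2)]
robot_positions = [(0.0, 0.0), (3.0, 1.0)]
regions = voronoi_partition(grid, robot_positions)
```

Each robot would then be responsible for mapping the cells in its own region; handling a failed robot would require reassigning its cells, which this sketch does not attempt.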


Pedestrians at Complex Urban Intersections

Driving in complex urban environments is one of the major challenges for autonomous vehicles (AVs). For AVs to operate in an environment crowded with people, understanding pedestrian pose, motion, behavior, and intention will greatly increase their ability to function safely and efficiently. We present a novel dataset titled PedX, a large-scale multimodal collection of pedestrians at complex urban intersections. PedX consists of more than 5,000 pairs of high-resolution stereo images and LiDAR data, along with 2D and 3D labels of pedestrians. We also present a novel 3D model fitting algorithm for automatic 3D labeling that harnesses constraints across different modalities together with novel shape and temporal priors. All annotated 3D pedestrians are localized in real-world metric space, and the generated 3D models are validated using a mocap system configured in a controlled outdoor environment to simulate pedestrians in urban intersections. We also show that the manual 2D labels can be replaced by state-of-the-art automated labeling approaches, thereby facilitating automatic generation of large-scale datasets.