Failing to Learn


Failing to Learn: Autonomously Identifying Perception Failures for Self-driving Cars

Manikandasriram Srinivasan Ramanagopal, Cyrus Anderson, Ram Vasudevan, Matthew Johnson-Roberson


One of the major open challenges in self-driving cars is the ability to detect cars and pedestrians to safely navigate in the world. Deep learning-based object detectors have enabled great advances in using camera imagery to detect and classify objects. But for a safety-critical application, such as autonomous driving, the error rates of the current state of the art are still too high to enable safe operation. Moreover, the characterization of object detector performance is primarily limited to testing on prerecorded datasets; errors that occur on novel data go undetected without additional human labels. In this letter, we propose an automated method to identify mistakes made by object detectors without ground truth labels. We show that inconsistencies in the object detector output between a pair of similar images can be used as hypotheses for false negatives (i.e., missed detections); using a novel set of features for each hypothesis, an off-the-shelf binary classifier can then identify valid errors. In particular, we study two distinct cues, temporal and stereo inconsistencies, using data that are readily available on most autonomous vehicles. Our method can be used with any camera-based object detector, and we illustrate the technique on several sets of real-world data. We show that a state-of-the-art detector, tracker, and our classifier trained only on synthetic data can identify valid errors on the KITTI tracking dataset with an average precision of 0.94. We also release a new tracking dataset with 104 sequences totaling 80,655 labeled pairs of stereo images, along with ground truth disparity from a game engine, to facilitate further research.
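The core idea, at a high level, is that a detection present in one image of a pair but absent in the other suggests a missed detection. A minimal sketch of generating such false-negative hypotheses via IoU matching follows; the box format (x1, y1, x2, y2) and the 0.5 threshold are illustrative choices, not the paper's exact parameters:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def false_negative_hypotheses(dets_a, dets_b, iou_thresh=0.5):
    """Boxes detected in image A with no sufficiently overlapping
    match in image B become hypotheses for missed detections in B."""
    return [box for box in dets_a
            if all(iou(box, other) < iou_thresh for other in dets_b)]

# Toy example: the second object is detected only in the left image.
left = [(10, 10, 50, 50), (100, 100, 160, 180)]
right = [(12, 10, 52, 50)]
hyps = false_negative_hypotheses(left, right)
```

In the paper, each such hypothesis is then scored by a binary classifier over a set of features, rather than being accepted outright.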


Published in IEEE Robotics and Automation Letters

Preprint on arXiv


    title={Failing to Learn: Autonomously Identifying Perception Failures for Self-driving Cars}, 
    author={M. Srinivasan Ramanagopal and C. Anderson and R. Vasudevan and M. Johnson-Roberson}, 
    journal={IEEE Robotics and Automation Letters}, 




Note: this data may only be used for non-commercial applications.

Data is provided in the KITTI tracking format. The data is captured at 10 Hz at different times of the day from a game engine. There are 104 sequences of varying lengths, totaling 80,655 images.

Sequences and annotations (22.5GB)
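Since the annotations follow the KITTI tracking format, each line of a label file encodes one object for one frame. A minimal parser might look like the following (field order per the public KITTI tracking devkit; the example line is made up for illustration):

```python
def parse_kitti_tracking_line(line):
    """Parse one line of a KITTI tracking label file.

    Field order (KITTI tracking devkit): frame, track_id, type,
    truncated, occluded, alpha, bbox (left, top, right, bottom),
    dimensions (h, w, l), location (x, y, z), rotation_y.
    """
    f = line.split()
    return {
        "frame": int(f[0]),
        "track_id": int(f[1]),
        "type": f[2],
        "truncated": float(f[3]),
        "occluded": int(f[4]),
        "alpha": float(f[5]),
        "bbox": tuple(map(float, f[6:10])),
        "dimensions": tuple(map(float, f[10:13])),
        "location": tuple(map(float, f[13:16])),
        "rotation_y": float(f[16]),
    }

# Hypothetical annotation line, not taken from the dataset:
example = "0 2 Car 0 0 -1.57 100.0 150.0 300.0 350.0 1.5 1.6 3.7 1.0 1.5 20.0 -1.5"
ann = parse_kitti_tracking_line(example)
```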

We generated corresponding fake right camera images using the depth buffer information and performed a simple in-painting operation with OpenCV to fill the holes. The corresponding ground truth disparity images are also provided below.

Right Camera Images (54.5GB)

Disparity Images (4.8GB)
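The right-image synthesis described above relies on the standard rectified-stereo relation disparity = f · B / depth: each left-image pixel shifts left by its disparity in the right image, and unfilled pixels are then in-painted. A minimal sketch of that relation (the focal length and baseline values here are illustrative, not the dataset's actual calibration):

```python
def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Convert metric depth to disparity in pixels for a rectified
    stereo pair: disparity = focal_length_px * baseline_m / depth_m."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m

def warp_to_right(x_left, disparity):
    """A pixel at column x_left in the left image appears at
    column x_left - disparity in the rectified right image."""
    return x_left - disparity

# Illustrative calibration values, not the dataset's actual ones:
d = depth_to_disparity(depth_m=20.0, focal_px=721.5, baseline_m=0.54)
x_right = warp_to_right(100.0, d)
```

Pixels that no left-image pixel maps to (occlusions, image borders) are the holes that the in-painting step fills.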