PEDX
Ming-Yuan Yu
myyu@umich.edu
Nick Goumas
Karl Rosaen
krosaen@umich.edu
Ram Vasudevan
ramv@umich.edu
Wonhui Kim
wonhui@umich.edu
Manikandasriram Srinivasan Ramanagopal
srmani@umich.edu
Matthew Johnson-Roberson
mattjr@umich.edu
Charles Barto
bartoc@umich.edu
All authors affiliated with the Robotics Department of the University of Michigan, Ann Arbor.
Abstract
PedX is a large-scale, multimodal dataset of pedestrians at complex urban intersections. It provides high-resolution stereo images and LiDAR data with manual 2D and automatic 3D annotations. The data was captured using two pairs of stereo cameras and four Velodyne LiDAR sensors.
DESCRIPTION
Understanding pedestrian pose, motion, behavior, and intention is important for mobile robots, such as autonomous vehicles (AVs), to function safely and efficiently in environments crowded with people. We collected a large-scale multimodal dataset, PedX (pedestrians in intersections), at downtown intersections in Ann Arbor, MI, USA in 2017. The PedX dataset consists of more than 5,000 pairs of high-resolution (12 MP) stereo images and LiDAR data, together with 2D and 3D labels of pedestrians. We also developed a novel 3D model-fitting algorithm for automatic 3D labeling that harnesses constraints across the different modalities as well as novel shape and temporal priors.
Based on the 3D poses and locations estimated in prior frames, we designed a biomechanically inspired recurrent neural network, Bio-LSTM, that predicts the location and 3D articulated body pose of pedestrians in a global coordinate frame in future frames. The network incorporates biomechanical constraints, including the periodicity of the human walking gait, the mirror symmetry of the human body, and the change of ground reaction forces over a gait cycle. Bio-LSTM shows improved longer-term pose prediction (>5 seconds) for in-the-wild pedestrians at real-world intersection scale and robustly handles the noise present in field data.
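For concreteness, below is a minimal PyTorch sketch of the underlying sequence-to-one prediction setup: an LSTM consumes a window of past per-frame pose vectors and predicts the pose for the next frame. The class name, pose dimensionality (72, as in a flattened SMPL-style pose vector), hidden size, and window length are illustrative assumptions, and the biomechanical loss terms described above (gait periodicity, mirror symmetry, ground reaction forces) belong to the training objective in the paper and are not reproduced here.

```python
import torch
import torch.nn as nn

class PosePredictor(nn.Module):
    """Minimal sketch of sequence-to-one pedestrian pose prediction.

    Input:  a window of past per-frame pose vectors (e.g. flattened
            3D joint parameters in a global frame).
    Output: the predicted pose vector for the next frame.
    """

    def __init__(self, pose_dim=72, hidden_dim=128, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, pose_dim)

    def forward(self, past_poses):
        # past_poses: (batch, T, pose_dim)
        out, _ = self.lstm(past_poses)
        # Use the hidden state at the last observed frame to predict
        # the next-frame pose.
        return self.head(out[:, -1])

# Hypothetical usage with a 1 s history at 10 Hz:
# model = PosePredictor()
# next_pose = model(torch.randn(8, 10, 72))   # (8, 72)
```

Longer horizons (the >5 s predictions above) can then be produced by feeding predictions back in autoregressively.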
Data Acquisition
Sensor Setup
Our data-collection platform was equipped with the following sensors:
4 laser scanners: Velodyne HDL-32E
4 color cameras (12 MP): Allied Vision Manta G-1236C
4 lenses (12 mm): V1228-MPY
The vehicle is equipped with four LiDAR scanners, two on each side of the roof with a roll angle of 45° between them. The cameras are mounted on top of the vehicle in two stereo pairs: the left pair sits on an independent bar rotated by 30° to capture the incoming road from the left, while the right pair faces directly forward. We arranged the cameras with baselines of 0.33 m and 0.27 m for the left and right stereo pairs, respectively. We triggered the cameras via a signal emitted when the second camera from the left started exposing its sensor. We recorded the timestamp of each image using the cameras' internal clocks, which were synchronized via the IEEE 1588-2008 PTP protocol; the computer clock was synchronized to the camera clocks using the same method. Using these timestamps, we associate the LiDAR returns that fall within a given camera frame.
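As an illustration of that association step, here is a minimal sketch assuming each LiDAR return carries a PTP-synchronized timestamp; the function name, file names, and exposure window are hypothetical, not part of the released tooling.

```python
import numpy as np

def lidar_mask_for_frame(lidar_stamps, frame_stamp, exposure_s=0.01):
    """Select LiDAR returns whose timestamps fall within the exposure
    window of a given camera frame.

    lidar_stamps : (N,) per-return timestamps in seconds, already
                   synchronized to the camera clock via PTP.
    frame_stamp  : camera frame timestamp in seconds.
    exposure_s   : assumed exposure window; in practice this would
                   come from the camera metadata.
    """
    return (lidar_stamps >= frame_stamp) & (lidar_stamps < frame_stamp + exposure_s)

# Hypothetical usage with points and stamps loaded from the recording:
# points = np.load("lidar_points.npy")   # (N, 3) xyz returns, meters
# stamps = np.load("lidar_stamps.npy")   # (N,) seconds
# frame_points = points[lidar_mask_for_frame(stamps, frame_stamp=1234.567)]
```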
Scene Selection
The capture sites and times were selected to maximize traffic volume, the complexity of crossing patterns, and variation in lighting and weather. To capture complex interactions between pedestrians and vehicles, we focused on four-way stop intersections without traffic signals. Three intersections were selected in the downtown area, where the pedestrian-to-camera distance ranges from 5 to 40 m. Lighting conditions vary with cloud cover and the shadows cast by buildings. We manually selected interesting sequences of captured frames for annotation based on the observed activity of pedestrians or pedestrian-vehicle interactions.
Data Formatting
Image Compression
The raw 12-bit Bayer images were converted into compressed PNG/JPEG formats. We compressed the raw images into 16-bit PNG files to preserve the high dynamic range. Due to their large file size, however, the 16-bit PNG images are not currently available for download; please contact us if you need them.
For the final release, the raw images were JPEG-compressed at a quality level of 90. We provide both original and rectified images in the Deep Blue dataset. Gamma correction was applied when rectifying the images.
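A minimal sketch of the two release formats using OpenCV is shown below. The Bayer pattern, gamma value (2.2), and file names are assumptions, and in the actual release the gamma correction is applied as part of rectification rather than as a standalone step.

```python
import cv2
import numpy as np

def process_raw(bayer12):
    """Sketch: convert a 12-bit Bayer mosaic (stored in a uint16 array)
    into a 16-bit lossless PNG and a gamma-corrected JPEG at quality 90.
    """
    # Demosaic, then shift 12-bit values into the full 16-bit range.
    rgb16 = cv2.cvtColor(bayer12, cv2.COLOR_BayerRG2BGR) << 4
    cv2.imwrite("frame_16bit.png", rgb16)  # high-dynamic-range release

    # Apply an assumed gamma of 2.2 and write an 8-bit JPEG.
    norm = (rgb16.astype(np.float32) / 65535.0) ** (1.0 / 2.2)
    cv2.imwrite("frame.jpg", (norm * 255).astype(np.uint8),
                [cv2.IMWRITE_JPEG_QUALITY, 90])
```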
Statistics
Distance
The histograms show the distributions of pedestrian distances. Distances between pedestrians and the camera centers were computed to plot the first two histograms. Note that the majority of pedestrians fall within the 20-35 m range.
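A sketch of how such a distance histogram could be computed, assuming hypothetical per-pedestrian 3D centroids expressed in the camera frame (so the camera center is the origin):

```python
import numpy as np

# Hypothetical file of per-pedestrian 3D centroids, (N, 3) in meters.
centroids = np.load("pedestrian_centroids.npy")
distances = np.linalg.norm(centroids, axis=1)      # Euclidean distance
counts, edges = np.histogram(distances, bins=np.arange(0, 45, 5))
```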
Orientation
RELATED PUBLICATIONS
C. Anderson, X. Du, R. Vasudevan, and M. Johnson-Roberson, "Stochastic Sampling Simulation for Pedestrian Trajectory Prediction," IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019. [PDF]
W. Kim, M. Srinivasan Ramanagopal, C. Barto, M.-Y. Yu, K. Rosaen, N. Goumas, R. Vasudevan, and M. Johnson-Roberson, "PedX: Benchmark Dataset for Metric 3-D Pose Estimation of Pedestrians in Complex Urban Intersections," in IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1940-1947, 2019. [PDF] [dataset] [code] [video]
X. Du, R. Vasudevan, and M. Johnson-Roberson, "Bio-LSTM: A Biomechanically Inspired Recurrent Neural Network for 3-D Pedestrian Pose and Gait Prediction," in IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1501-1508, 2019. [PDF] [video]
@inproceedings{anderson2019stochastic,
  title={Stochastic Sampling Simulation for Pedestrian Trajectory Prediction},
  author={Anderson, Cyrus and Du, Xiaoxiao and Vasudevan, Ram and Johnson-Roberson, Matthew},
  booktitle={IEEE/RSJ Int. Conf. Intell. Robots and Systems (IROS)},
  year={2019}
}

@article{kim2019pedx,
  title={PedX: Benchmark Dataset for Metric 3-D Pose Estimation of Pedestrians in Complex Urban Intersections},
  author={Kim, Wonhui and Ramanagopal, Manikandasriram Srinivasan and Barto, Charles and Yu, Ming-Yuan and Rosaen, Karl and Goumas, Nick and Vasudevan, Ram and Johnson-Roberson, Matthew},
  journal={IEEE Robotics and Automation Letters},
  volume={4},
  number={2},
  pages={1940--1947},
  year={2019},
  publisher={IEEE}
}

@article{du2019bio,
  title={Bio-LSTM: A Biomechanically Inspired Recurrent Neural Network for 3-D Pedestrian Pose and Gait Prediction},
  author={Du, Xiaoxiao and Vasudevan, Ram and Johnson-Roberson, Matthew},
  journal={IEEE Robotics and Automation Letters},
  volume={4},
  number={2},
  pages={1501--1508},
  year={2019},
  publisher={IEEE}
}
License
MIT License
Copyright (c) 2019 UM & Ford Center for Autonomous Vehicles (FCAV)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.