TY - GEN
T1 - Multimodal CNN Pedestrian Classification: A Study on Combining LIDAR and Camera Data
AU - Melotti, Gledson
AU - Premebida, Cristiano
AU - Goncalves, Nuno M. M. Da S.
AU - Nunes, Urbano J. C.
AU - Faria, Diego R.
PY - 2018/12/10
Y1 - 2018/12/10
N2 - This paper presents a study on pedestrian classification based on deep learning using data from a monocular camera and a 3D LIDAR sensor, separately and in combination. Early and late multi-modal sensor fusion approaches are revisited and compared in terms of classification performance. The problem of pedestrian classification finds applications in advanced driver assistance systems (ADAS) and autonomous driving, and it has regained particular attention recently because of, among other reasons, safety concerns involving self-driving vehicles. A Convolutional Neural Network (CNN) is used in this work as the classifier in two distinct situations: with data from a single sensor as input, and with data from both sensors combined in the CNN input layer. Range (distance) and intensity (reflectance) data from the LIDAR are considered as separate channels, where data from the LIDAR sensor is fed to the CNN in the form of dense maps, as a result of sensor coordinate transformation and spatial filtering; this allows a direct implementation of the same CNN-based approach on the data from both sensors. In terms of late fusion, the outputs from the individual CNNs are combined by means of learning and non-learning approaches. Pedestrian classification is evaluated on a ‘binary classification’ dataset created from the KITTI Vision Benchmark Suite, and results are shown for each sensor modality individually and for the fusion strategies.
AB - This paper presents a study on pedestrian classification based on deep learning using data from a monocular camera and a 3D LIDAR sensor, separately and in combination. Early and late multi-modal sensor fusion approaches are revisited and compared in terms of classification performance. The problem of pedestrian classification finds applications in advanced driver assistance systems (ADAS) and autonomous driving, and it has regained particular attention recently because of, among other reasons, safety concerns involving self-driving vehicles. A Convolutional Neural Network (CNN) is used in this work as the classifier in two distinct situations: with data from a single sensor as input, and with data from both sensors combined in the CNN input layer. Range (distance) and intensity (reflectance) data from the LIDAR are considered as separate channels, where data from the LIDAR sensor is fed to the CNN in the form of dense maps, as a result of sensor coordinate transformation and spatial filtering; this allows a direct implementation of the same CNN-based approach on the data from both sensors. In terms of late fusion, the outputs from the individual CNNs are combined by means of learning and non-learning approaches. Pedestrian classification is evaluated on a ‘binary classification’ dataset created from the KITTI Vision Benchmark Suite, and results are shown for each sensor modality individually and for the fusion strategies.
UR - https://ieeexplore.ieee.org/document/8569666/
U2 - 10.1109/ITSC.2018.8569666
DO - 10.1109/ITSC.2018.8569666
M3 - Conference paper
SN - 978-1-7281-0321-1
T3 - 2018 21st International Conference on Intelligent Transportation Systems (ITSC)
SP - 3138
EP - 3143
BT - 2018 21st International Conference on Intelligent Transportation Systems (ITSC)
PB - IEEE
T2 - 2018 21st International Conference on Intelligent Transportation Systems (ITSC)
Y2 - 4 November 2018 through 7 November 2018
ER -