https://arxiv.org/abs/2211.06108
In autonomous driving, LiDAR and radar are crucial for environmental perception. LiDAR offers precise 3D spatial information but struggles in adverse weather such as fog. Conversely, radar signals can penetrate rain and mist because of their longer wavelength, but they are prone to noise. Recent state-of-the-art works show that fusing radar and LiDAR yields robust detection in adverse weather. Existing methods adopt convolutional neural network architectures to extract features from each sensor's data, then align and aggregate the two branches' features to predict detections. However, these methods produce bounding boxes of limited accuracy because of their simplistic label assignment and fusion strategies.
In this paper, we propose a bird's-eye view (BEV) fusion learning-based anchor-box-free object detection system, which fuses features derived from the radar range-azimuth heatmap and the LiDAR point cloud to estimate possible objects.
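Before fusion, both modalities need to live in the same BEV grid. The following is a minimal NumPy sketch, under assumed grid parameters (a 320 x 320 grid at 0.5 m per cell) and an assumed azimuth convention, of how a range-azimuth heatmap and a point cloud could be brought into a common Cartesian BEV representation; it is an illustration, not the paper's preprocessing code.

```python
# Minimal sketch (not the paper's exact preprocessing): projecting both sensors
# into a shared bird's-eye-view (BEV) grid before fusion. Grid size, cell
# resolution, and channel choices below are illustrative assumptions.
import numpy as np

def radar_polar_to_bev(heatmap, range_res, bev_size=320, bev_res=0.5):
    """Resample a radar range-azimuth heatmap (num_ranges x num_azimuths)
    onto a Cartesian BEV grid centered on the sensor."""
    num_ranges, num_azimuths = heatmap.shape
    half = bev_size * bev_res / 2.0
    xs = (np.arange(bev_size) + 0.5) * bev_res - half
    ys = (np.arange(bev_size) + 0.5) * bev_res - half
    xx, yy = np.meshgrid(xs, ys)
    rng = np.hypot(xx, yy)
    # Azimuth assumed to increase counter-clockwise from +x in [0, 2*pi);
    # the dataset's convention may differ.
    azi = np.arctan2(yy, xx) % (2 * np.pi)
    r_idx = np.clip((rng / range_res).astype(int), 0, num_ranges - 1)
    a_idx = np.clip((azi / (2 * np.pi) * num_azimuths).astype(int),
                    0, num_azimuths - 1)
    bev = heatmap[r_idx, a_idx]
    bev[rng >= num_ranges * range_res] = 0.0   # zero out cells beyond radar range
    return bev

def lidar_points_to_bev(points, bev_size=320, bev_res=0.5):
    """Rasterize LiDAR points (N x 3, x/y/z in meters) into a simple
    two-channel BEV map: occupancy and per-cell maximum height."""
    half = bev_size * bev_res / 2.0
    bev = np.zeros((2, bev_size, bev_size), dtype=np.float32)
    mask = (np.abs(points[:, 0]) < half) & (np.abs(points[:, 1]) < half)
    pts = points[mask]
    ix = ((pts[:, 0] + half) / bev_res).astype(int)
    iy = ((pts[:, 1] + half) / bev_res).astype(int)
    bev[0, iy, ix] = 1.0                        # occupancy
    np.maximum.at(bev[1], (iy, ix), pts[:, 2])  # max height (clamped at 0 by the zero init)
    return bev
```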
Fig. 1. Performance of RaLiBEV in clear and foggy weather. The visualization overlays the radar range-azimuth heatmap (jet pseudo-color) in the background with the LiDAR point cloud (white) and 2D object bounding boxes. Ground-truth boxes are orange, and boxes predicted by RaLiBEV are red. All boxes use green lines to indicate the heading direction.
Different label assignment strategies are designed to enforce consistency between the foreground/background classification of anchor points and the corresponding bounding box regression. Furthermore, the performance of the proposed object detector is further enhanced by a novel interactive transformer module. The superior performance of the proposed methods is demonstrated on the recently published Oxford Radar RobotCar dataset. Our system's average precision significantly outperforms the state-of-the-art method by 13.1% and 19.0% at an Intersection over Union (IoU) of 0.8 under the 'Clear+Foggy' training condition, for 'Clear' and 'Foggy' testing respectively.
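For reference, the IoU = 0.8 threshold quoted above is applied to rotated BEV boxes. The snippet below is an illustrative sketch (using Shapely, which the paper does not necessarily use) of how the IoU between a predicted and a ground-truth rotated box can be computed and compared against that threshold; the (x, y, w, l, yaw) box parameterization and the example values are assumptions.

```python
# Illustrative sketch of the evaluation criterion only: IoU between two rotated
# BEV boxes (x, y, w, l, yaw). The paper's own evaluation code may differ.
import numpy as np
from shapely.geometry import Polygon

def box_to_polygon(box):
    x, y, w, l, yaw = box
    # Corner offsets in the box frame, rotated into the BEV frame.
    dx = np.array([w, w, -w, -w]) / 2.0
    dy = np.array([l, -l, -l, l]) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    xs = x + c * dx - s * dy
    ys = y + s * dx + c * dy
    return Polygon(list(zip(xs, ys)))

def rotated_iou(box_a, box_b):
    pa, pb = box_to_polygon(box_a), box_to_polygon(box_b)
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0 else 0.0

# Example: a prediction shifted by 0.1 m in x against its ground truth.
gt = (10.0, 5.0, 1.8, 4.5, 0.3)
pred = (10.1, 5.0, 1.8, 4.5, 0.3)
print(rotated_iou(gt, pred) >= 0.8)   # True: IoU ~ 0.89 clears the 0.8 threshold
```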
Fig. 2. Left: visualization of detection results of MVDNet and RaLiBEV. Right: weather-sensitive features in the sensor fusion process.
Fig. 3 shows the ground truth, the predictions of MVDNet in clear weather, and the predictions of RaLiBEV in clear and foggy weather at the 3332nd and 4103rd frames of the test dataset. The ground-truth bounding boxes are drawn in yellow, and the boxes predicted by each model are shown in red. All boxes use green lines to indicate the object heading. It is easy to see that MVDNet misses 3 detections and predicts 4 boxes with reversed heading direction in the top-row figure, and produces 1 false alarm and 2 boxes with reversed heading in the bottom-row figure, even though the weather is clear. In contrast, RaLiBEV shows perfect detection results in clear weather, and gives only one false detection and no reversed heading predictions in both frames under foggy weather.
In Fig. 3, the top two rows correspond to Frame 2799 and the bottom two rows correspond to Frame 3299, both selected randomly. Each frame includes examples from clear and foggy weather conditions for analysis. For each column, the feature maps $X_Q$, $X_R$, $X_L$, $X_Q X_R^T$, and $X_Q X_L^T$ are illustrated for visual comprehension. All feature maps are normalized to the range [0, 1] for a consistent representation. Using a unified BEV query $X_Q$, the input radar feature map $X_R$ and LiDAR feature map $X_L$ are weighted into $X_Q X_R^T$ and $X_Q X_L^T$ respectively, which then produce a fused feature for further processing. It can be observed that $X_Q$ treats the radar and LiDAR features differently under varying weather conditions. In clear weather, $X_Q X_L^T$ is noticeably larger in numerical value than $X_Q X_R^T$. In foggy conditions, however, the input $X_L$ is significantly degraded by noise, so $X_Q X_R^T$ becomes larger than $X_Q X_L^T$. From these observations, we draw the following conclusions (a code sketch of this query-based fusion follows the list):
• The approach uses an interactive transformer to aggregate information from the LiDAR and radar branches in a unified manner;
• The BEV query of the interactive transformer focuses on the more information-rich signals under different weather conditions.
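The following PyTorch sketch illustrates the query-based fusion just described: a unified BEV query attends over the flattened radar and LiDAR feature maps, so the products $X_Q X_R^T$ and $X_Q X_L^T$ act as per-location weights on each modality. The single-head formulation, the construction of the query from the sum of the two inputs, and all dimensions are simplifying assumptions rather than the paper's exact module.

```python
# Minimal sketch of the fusion step: a shared BEV query X_Q attends over
# flattened radar (X_R) and LiDAR (X_L) BEV features, so the attention weights
# X_Q X_R^T and X_Q X_L^T decide how much each modality contributes per location.
import torch
import torch.nn as nn

class InteractiveBEVFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q_proj = nn.Linear(channels, channels)   # builds the BEV query X_Q
        self.k_proj = nn.Linear(channels, channels)
        self.v_proj = nn.Linear(channels, channels)
        self.scale = channels ** -0.5

    def forward(self, radar_feat, lidar_feat):
        # radar_feat, lidar_feat: (B, C, H, W) BEV feature maps from each branch.
        B, C, H, W = radar_feat.shape
        x_r = radar_feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
        x_l = lidar_feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
        x_q = self.q_proj(x_r + x_l)                  # unified BEV query (assumed construction)
        kv = torch.cat([x_r, x_l], dim=1)             # keys/values from both modalities
        attn = torch.softmax(x_q @ self.k_proj(kv).transpose(1, 2) * self.scale, dim=-1)
        fused = attn @ self.v_proj(kv)                # (B, H*W, C)
        return fused.transpose(1, 2).reshape(B, C, H, W)

# Usage: fuse 64-channel radar and LiDAR BEV maps of size 32 x 32 (kept small,
# since the dense attention matrix grows quadratically with the BEV size).
fusion = InteractiveBEVFusion(64)
out = fusion(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(out.shape)   # torch.Size([2, 64, 32, 32])
```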
Fig. 4. The overall pipeline of the proposed radar and LiDAR fusion-based anchor-box-free object detector, RaLiBEV.
Fig. 5. Overview of label assignment strategies for object detection. (a) shows the object with a red-bordered ground-truth bounding box and a green heading line. Yellow and red ellipses represent the ground-truth and predicted Gaussian distributions, respectively. Red dots are positive sample points, and green dots mark the ground-truth Gaussian centers. Strategies (b) to (e) show different methods for selecting positive samples for the box loss calculation, ranging from (b) using all anchor points within the ground-truth area, to (c) selecting the center point, (d) the point with the highest foreground-background classification score, and (e) the point with the highest "score plus IoU". Strategy (f) combines the approach from (e) with an alternative loss function, Eq. (14) in the paper.
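As an illustration of strategy (e), the hedged sketch below selects, among the anchor points lying inside a ground-truth box, the single point with the highest classification score plus predicted-box IoU as the positive sample for the box regression loss. The tensor layout, names, and example values are assumptions for illustration, not the paper's implementation.

```python
# Sketch of strategy (e): pick the anchor point inside the ground-truth box
# whose foreground score plus predicted-box IoU is highest, and use only that
# point for the box regression loss.
import torch

def select_positive_index(inside_mask, cls_scores, pred_ious):
    """
    inside_mask: (N,) bool, anchor points lying inside the ground-truth box.
    cls_scores:  (N,) foreground classification scores in [0, 1].
    pred_ious:   (N,) IoU between each point's predicted box and the ground truth.
    Returns the index of the chosen positive sample.
    """
    quality = cls_scores + pred_ious                  # "score plus IoU"
    quality = torch.where(inside_mask, quality, torch.full_like(quality, -1e9))
    return int(torch.argmax(quality))

# Example with 5 candidate anchor points, 3 of which lie inside the box.
inside = torch.tensor([True, True, True, False, False])
scores = torch.tensor([0.60, 0.85, 0.70, 0.95, 0.40])
ious   = torch.tensor([0.55, 0.40, 0.75, 0.90, 0.10])
print(select_positive_index(inside, scores, ious))    # -> 2 (0.70 + 0.75 is largest)
```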