Abstract
For the task of autonomous indoor parking, Visual-Inertial Simultaneous Localization And Mapping (SLAM) systems are expected to perform well by exploiting the complementary strengths of visual cameras and Inertial Measurement Units (IMUs). Comparing competing SLAM systems requires publicly available datasets that offer an objective way to demonstrate the pros and cons of each system. However, such high-quality datasets are surprisingly scarce, because acquiring groundtruth trajectories in Global Positioning System (GPS)-denied indoor parking environments is profoundly challenging. In this article, we establish BeVIS, a large-scale Benchmark dataset with Visual (front-view), Inertial, and Surround-view sensors for evaluating the performance of SLAM systems developed for autonomous indoor parking; it is the first of its kind to make both the raw data and the groundtruth trajectories available. In BeVIS, the groundtruth trajectories are obtained by tracking artificial landmarks scattered throughout the indoor parking environments, whose coordinates are surveyed with a high-precision Electronic Total Station. Moreover, the groundtruth trajectories are comprehensively evaluated in two respects: reprojection error and pose volatility. Beyond BeVIS, we propose VISSLAM-2, a novel tightly coupled semantic SLAM framework leveraging Visual (front-view), Inertial, and Surround-view sensor modalities, designed specifically for the task of autonomous indoor parking. It is the first work attempting to provide a general form for modeling various semantic objects on the ground. Experiments on BeVIS demonstrate the effectiveness of the proposed VISSLAM-2. Our benchmark dataset BeVIS is publicly available at https://shaoxuan92.github.io/BeVIS.
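The reprojection-error criterion used above to assess groundtruth quality can be sketched as follows. This is a minimal illustration under our own assumptions (an undistorted pinhole camera, a world-to-camera pose, and the function name `reprojection_error`), not the paper's actual implementation: surveyed 3D landmark coordinates are projected through an estimated camera pose, and the RMS pixel distance to the detected landmark positions is reported.

```python
import numpy as np

def reprojection_error(K, R, t, points_3d, observed_px):
    """RMS reprojection error of surveyed landmarks under an estimated pose.

    K: 3x3 camera intrinsics; R, t: world-to-camera rotation and translation;
    points_3d: Nx3 landmark coordinates (e.g., from a total station survey);
    observed_px: Nx2 pixel coordinates of the detected landmarks.
    """
    cam = (R @ points_3d.T).T + t        # transform landmarks into the camera frame
    proj = (K @ cam.T).T                 # apply the pinhole projection
    px = proj[:, :2] / proj[:, 2:3]      # perspective divide to pixel coordinates
    residuals = px - observed_px         # per-landmark pixel residuals
    return float(np.sqrt(np.mean(np.sum(residuals**2, axis=1))))
```

A pose that projects every surveyed landmark exactly onto its detection yields an error of zero; systematic offsets in the groundtruth trajectory show up directly as a nonzero RMS value.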
SLAM for Indoor Parking: A Comprehensive Benchmark Dataset and a Tightly Coupled Semantic Framework