Abstract
Current self-supervised depth estimation architectures rely on clear, sunny weather scenes to train deep neural networks. However, in many locations this assumption is too strong: in the UK in 2021, for example, 149 days saw rain. For these architectures to be effective in real-world applications, we must create models that can generalise to all weather conditions, times of day and image qualities. Using a combination of computer graphics and generative models, one can augment existing sunny-weather data in a variety of ways that simulate adverse weather effects. While it is tempting to use such data augmentations for self-supervised depth, in the past this was shown to degrade performance rather than improve it. In this paper, we put forward a method that uses augmentations to remedy this problem. By exploiting the correspondence between unaugmented and augmented data, we introduce a pseudo-supervised loss for both depth and pose estimation. This brings back some of the benefits of supervised learning while still not requiring any labels. We also make a series of practical recommendations which collectively offer a reliable, efficient framework for weather-related augmentation of self-supervised depth from monocular video. We present extensive testing to show that our method, Robust-Depth, achieves SotA performance on the KITTI dataset while significantly surpassing SotA on challenging, adverse-condition data such as DrivingStereo, Foggy CityScape and NuScenes-Night.
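To illustrate the idea of a pseudo-supervised loss, the sketch below shows one plausible form it could take: the depth predicted from the clean image is treated as a fixed pseudo label for the depth predicted from its weather-augmented counterpart. This is a hypothetical NumPy sketch for intuition only, not the authors' implementation; the function name and the choice of an L1 penalty are assumptions.

```python
import numpy as np

def pseudo_supervised_depth_loss(depth_clean, depth_aug):
    """Hypothetical pseudo-supervised consistency loss (not the paper's code).

    The clean-image prediction acts as a pseudo label: in training it would
    be held fixed (no gradient) while the augmented-image prediction is
    pulled towards it with an L1 penalty.
    """
    pseudo_label = depth_clean  # treated as a constant target
    return float(np.mean(np.abs(depth_aug - pseudo_label)))

# Toy example: two 2x2 "depth maps" whose predictions differ by 0.5 metres
# everywhere, so the mean absolute difference is 0.5.
d_clean = np.full((2, 2), 10.0)
d_aug = np.full((2, 2), 10.5)
print(pseudo_supervised_depth_loss(d_clean, d_aug))  # 0.5
```

Because the pseudo label comes from the model's own clean-weather prediction, no ground-truth depth is ever required, which is how the method keeps the self-supervised setting while recovering a supervised-style training signal.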
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the 2023 International Conference on Computer Vision |
| Publisher | IEEE |
| Pages | 8907-8917 |
| Number of pages | 11 |
| Publication status | E-pub ahead of print - 6 Oct 2023 |
| Event | The 2023 International Conference on Computer Vision, Paris, France. Duration: 2 Oct 2023 → 6 Oct 2023. https://iccv2023.thecvf.com/ |
Conference
| Conference | The 2023 International Conference on Computer Vision |
| --- | --- |
| Abbreviated title | ICCV 2023 |
| Country/Territory | France |
| City | Paris |
| Period | 2/10/23 → 6/10/23 |
| Internet address | https://iccv2023.thecvf.com/ |
Bibliographical note
This ICCV paper is the Open Access version, provided by the Computer Vision Foundation. Except for this watermark, it is identical to the accepted version; the final published version of the proceedings is available on IEEE Xplore.

Funding & Acknowledgements: This research was funded and supported by the EPSRC's DTP, Grant EP/W524566/1. Most experiments were run on the Aston EPS Machine Learning Server, funded by the EPSRC Core Equipment Fund, Grant EP/V036106/1.