M.Sc. Niclas Joswig defends his doctoral thesis "Robust Visual Odometry under Feature-Sparse Conditions" on Friday, 21 March 2025 at 13:00 in the University of Helsinki Chemicum building, Auditorium A129 (A.I. Virtasen aukio 1, 1st floor). His opponent is Professor Janne Heikkilä (University of Oulu), and the Custos is Professor Laura Ruotsalainen (University of Helsinki). The defence will be held in English.
The thesis of Niclas Joswig is part of research done in the Department of Computer Science and in the Spatiotemporal Data Analysis Group at the University of Helsinki. His supervisor has been Professor Laura Ruotsalainen (University of Helsinki).
Robust Visual Odometry under Feature-Sparse Conditions
The robustness of Visual Odometry (VO), a technique for estimating camera motion from sequential images, in challenging feature-sparse environments is a critical problem in computer vision. This dissertation develops the following novel methodologies to address this challenge: a probabilistic prior that models camera pose uncertainty in challenging conditions; a dynamic decision algorithm that computes the camera motion with a novel adaptive fusion method combining data from multiple sensors; and two novel deep learning architectures that improve monocular depth estimation. Within the research area of VO, this dissertation specifically addresses the challenges posed by environments characterized by severe feature sparsity and a lack of structural features. Existing methods struggle to maintain accurate positioning under low-texture and complex-viewpoint conditions, motivating the need for innovative research in this field.
The first contribution is a Plane Prior mechanism that computes a prior probability distribution over the camera rotation. Integrating this prior into VO pipelines enhances the accuracy and robustness of pose computation under feature-sparse conditions.
In our second contribution, we propose a novel Manhattan Decision Module that dynamically switches between combinations of sensor inputs to track the Manhattan frame, providing drift-free rotational tracking even in challenging environments. Additionally, we propose a Plane Rescue Mechanism that approximates the camera translation in feature-sparse conditions after the rotation has been computed under the Manhattan assumption.
To also apply the proposed RGB-D-based VO methods to monocular input, this dissertation introduces two novel monocular depth estimation techniques. The first depth model uses temporal image sequences instead of single images, inferring depth from motion cues rather than conventional visual cues such as vanishing points. Training the model to use temporal cues enables the network to generalize better to more complex viewpoints, such as the bird's-eye view.
The second proposed technique is a novel self-supervised loss function that improves the depth model's ability to estimate accurate depth near edges in challenging environments. This improvement in depth accuracy at edges significantly boosts downstream RGB-D-based VO performance, as these methods often rely heavily on geometric features located at edges.
This dissertation not only improves the theoretical understanding of VO in adverse conditions but also provides a framework for future research aimed at overcoming the remaining challenges in depth estimation and camera tracking technologies. These contributions are important for advancing the field of computer vision, particularly in the development of more robust and resilient positioning systems.
Availability of the dissertation
An electronic version of the doctoral dissertation will be available in the University of Helsinki open repository Helda at http://urn.fi/URN:ISBN:978-952-84-0811-6.
Printed copies will be available on request from Niclas Joswig: niclas.joswig@helsinki.fi.