Getting Robots In The Future To Truly See

Developing robots that can process visual information in real time could lead to a new range of handy and helpful robots for use around the home and in industry. Professor Andrew Davison and Dr Stefan Leutenegger from the Dyson Robotics Lab at Imperial College London discuss the advances they are making in robotic vision.

Monocular, Real-Time Surface Reconstruction

Real-time depth-map fusion capable of high-detail, close-up reconstruction.

This video presents our scalable, real-time method for robust surface reconstruction that explicitly handles multiple scales. We fuse depth maps and colour directly into a multi-resolution triangular mesh that can be adaptively tessellated using the concept of Dynamic Level of Detail. The method obtains high-quality, close-up reconstructions as well as capturing overall scene geometry, while remaining memory- and computationally efficient.

Jacek Zienkiewicz, Akis Tsiotsios, Andrew Davison, Stefan Leutenegger. Monocular, Real-Time Surface Reconstruction using Dynamic Level of Detail. International Conference on 3D Vision (3DV), 2016.
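
As a rough illustration of the dynamic-level-of-detail idea, the Python sketch below chooses a subdivision level for a mesh patch from its projected size on screen, so nearby surfaces are tessellated finely and distant ones coarsely. The function name, pixel budget and halving rule are illustrative assumptions, not the implementation from the paper.

    def select_lod(patch_size_m, distance_m, focal_px,
                   pixel_budget=8.0, max_level=6):
        """Pick a tessellation level so each triangle edge projects to
        roughly pixel_budget pixels; each subdivision halves edge length.
        (Illustrative sketch, not the paper's implementation.)"""
        projected_px = focal_px * patch_size_m / max(distance_m, 1e-6)
        level = 0
        while projected_px > pixel_budget and level < max_level:
            projected_px /= 2.0
            level += 1
        return level

    # A 0.5 m patch seen from 0.4 m needs fine detail; from 4 m it does not.
    print(select_lod(0.5, 0.4, focal_px=525))  # -> 6 (close-up, fine mesh)
    print(select_lod(0.5, 4.0, focal_px=525))  # -> 4 (distant, coarser mesh)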


Real-Time Height Map Fusion using Differentiable Rendering

A real-time method performing dense reconstruction of high-quality height maps from monocular video.

This video presents a robust real-time method that performs dense reconstruction of high-quality height maps from monocular video. By representing the height map as a triangular mesh and using an efficient differentiable rendering approach, our method enables rigorous incremental probabilistic fusion of standard locally estimated depth and colour into an immediately usable dense model.

Jacek Zienkiewicz, Andrew J. Davison, Stefan Leutenegger. Real-Time Height Map Fusion using Differentiable Rendering. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016.
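
The incremental probabilistic fusion can be pictured as each part of the height map keeping a Gaussian belief over its height that is refined by every new, noisy depth-derived measurement. A minimal sketch follows, using a flat grid of independent per-cell Gaussians rather than the paper's triangular mesh and differentiable renderer; the class and parameter names are assumptions.

    import numpy as np

    class HeightGrid:
        """Per-cell Gaussian belief over height, fused by Kalman-style
        inverse-variance weighting (flat-grid stand-in for the mesh)."""

        def __init__(self, shape, prior_var=1e6):
            self.mean = np.zeros(shape)           # height estimate (m)
            self.var = np.full(shape, prior_var)  # near-uninformative prior

        def fuse(self, rows, cols, heights, meas_var):
            m, v = self.mean[rows, cols], self.var[rows, cols]
            gain = v / (v + meas_var)                 # trust new data more
            self.mean[rows, cols] = m + gain * (heights - m)
            self.var[rows, cols] = (1.0 - gain) * v   # uncertainty shrinks

    grid = HeightGrid((64, 64))
    rows, cols = np.array([10, 10]), np.array([20, 21])
    grid.fuse(rows, cols, heights=np.array([0.05, 0.30]), meas_var=0.01)
    print(grid.mean[10, 20:22])   # already close to the measurements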

ElasticFusion: Dense SLAM Without A Pose Graph

Demonstration of real-time ElasticFusion on the office, hotel and copy datasets.

The video above demonstrates the ElasticFusion system, a novel approach to real-time dense visual SLAM. Our approach applies local model-to-model surface loop closure optimisations to stay close to the mode of the map distribution, while utilising global loop closures to recover from arbitrary drift and maintain global consistency.

T. Whelan, S. Leutenegger, B. Glocker, R. F. Salas-Moreno, A. J. Davison. ElasticFusion: Dense SLAM Without A Pose Graph. Robotics: Science and Systems (RSS), Rome, Italy, July 2015.
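
ElasticFusion maintains the map as a cloud of surfels that are refined by weighted averaging as new aligned measurements arrive. The sketch below shows that running-average update in isolation; the field names, the weight cap and the dictionary representation are illustrative assumptions, and the system's loop closure machinery is not shown.

    import numpy as np

    def fuse_surfel(surfel, meas, meas_weight=1.0):
        """Confidence-weighted running average of a surfel's attributes
        with a new aligned measurement (illustrative sketch only)."""
        w = surfel["weight"]
        total = w + meas_weight
        for key in ("position", "normal", "colour"):
            surfel[key] = (w * surfel[key] + meas_weight * meas[key]) / total
        surfel["normal"] /= np.linalg.norm(surfel["normal"])  # keep unit length
        surfel["weight"] = min(total, 100.0)  # cap so the map stays adaptable
        return surfel

    s = {"position": np.array([1.0, 0.0, 2.0]),
         "normal": np.array([0.0, 0.0, 1.0]),
         "colour": np.array([100.0, 120.0, 90.0]),
         "weight": 4.0}
    m = {"position": np.array([1.02, 0.0, 1.98]),
         "normal": np.array([0.0, 0.1, 1.0]),
         "colour": np.array([110.0, 118.0, 92.0])}
    fuse_surfel(s, m)
    print(s["position"])  # nudged slightly toward the new measurement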

ElasticFusion: Dense SLAM Without A Pose Graph (extras)

ElasticFusion on the seating area, garden, The Burghers of Calais, stairs, MIT-76-417b and loopback datasets.

This video shows further demonstrations of the ElasticFusion system described above.

Deep Learning for Robot Grasping via Simulation

Deep Learning a Grasp Function for Grasping Under Gripper Pose Uncertainty (IROS 2016)

This video demonstrates our approach to grasping under gripper pose uncertainty. We assign a score to every possible grasp pose, achieving robustness to the gripper's pose uncertainty by smoothing the grasp function with the pose uncertainty function. Synthetic and real experiments demonstrate that the learned grasp score is more robust to gripper pose uncertainty than when this uncertainty is not accounted for.

E. Johns, S. Leutenegger, A. J. Davison. Deep Learning a Grasp Function for Grasping Under Gripper Pose Uncertainty. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016.
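
The core trick is that convolving a dense grasp-score function with the gripper's pose-uncertainty distribution gives the expected score under execution noise, so the argmax becomes a pose that still works if the gripper lands slightly off-target. A minimal 2-D sketch, assuming an isotropic Gaussian position uncertainty and ignoring the rotation dimension the paper also handles:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def robust_grasp(grasp_scores, pose_sigma_px):
        """Smooth a grasp-score map with Gaussian pose uncertainty and
        return the most robust grasp location plus the smoothed map."""
        expected = gaussian_filter(grasp_scores, sigma=pose_sigma_px)
        best = np.unravel_index(np.argmax(expected), expected.shape)
        return best, expected

    # A narrow high-scoring spike loses to a broad, slightly lower plateau
    # once pose uncertainty is taken into account.
    scores = np.zeros((50, 50))
    scores[10, 10] = 1.0            # sharp peak: fails if we miss by 1 px
    scores[30:36, 30:36] = 0.8      # wide plateau: tolerant to pose error
    best, _ = robust_grasp(scores, pose_sigma_px=2.0)
    print(best)                     # -> (32, 32), on the robust plateau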

Simultaneous Optical Flow and Intensity Estimation

Simultaneously recovering the motion field and brightness image while the event camera undergoes a generic motion through any scene.

This video demonstrates our algorithm for simultaneously recovering the motion field and brightness image while the event camera undergoes a generic motion through any scene. Our approach minimises a cost function that combines the asynchronous event data with spatial and temporal regularisation within a sliding time window. Our implementation relies on GPU optimisation and runs in near real time.

P Bardow, AJ Davison, S Leutenegger. Simultaneous Optical Flow and Intensity Estimation from an Event Camera. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 884-892
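
For intuition: each event reports that log brightness at one pixel changed by a fixed contrast threshold, so summing signed events approximately reconstructs the image up to noise, which is why the paper adds spatial and temporal regularisation to the cost. The toy sketch below just integrates events and applies one crude smoothing step; it is a drastic simplification of the joint variational estimation of flow and intensity, and the contrast value and smoothing weight are made up.

    import numpy as np

    def integrate_events(events, shape, contrast=0.1, smooth_weight=0.2):
        """Toy log-intensity reconstruction: sum signed event contrasts per
        pixel, then take one neighbour-averaging step as a crude stand-in
        for the variational spatial regulariser."""
        log_i = np.zeros(shape)
        for x, y, t, polarity in events:       # polarity is +1 or -1
            log_i[y, x] += contrast * polarity
        padded = np.pad(log_i, 1, mode="edge")
        neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                      padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        return (1 - smooth_weight) * log_i + smooth_weight * neighbours

    events = [(3, 4, 0.001, +1), (3, 4, 0.002, +1), (5, 5, 0.003, -1)]
    print(integrate_events(events, shape=(8, 8))[4, 3])  # brightened pixel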

SemanticFusion: Dense 3D Semantic Mapping with CNNs

We address the challenge of semantic mapping by combining Convolutional Neural Networks (CNNs) with a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories. These correspondences allow the CNN's semantic predictions from multiple viewpoints to be probabilistically fused into a map. This not only produces a useful semantic 3D map; we also show on the NYUv2 dataset that fusing multiple predictions improves even the 2D semantic labelling over baseline single-frame predictions. For a smaller reconstruction dataset with larger variation in prediction viewpoint, the improvement over single-frame segmentation increases further. Our system is efficient enough to allow real-time interactive use at frame rates of ≈25 Hz.

John McCormac, Ankur Handa, Andrew Davison, Stefan Leutenegger. SemanticFusion: Dense 3D Semantic Mapping with Convolutional Neural Networks. arXiv:1609.05130, 2016.
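
The probabilistic fusion step is a recursive Bayesian update: each surfel stores a distribution over classes, and every new CNN prediction that corresponds to it is multiplied in and renormalised. A minimal sketch of that update (array shapes and names are assumptions):

    import numpy as np

    def fuse_semantics(map_probs, cnn_probs, eps=1e-8):
        """Recursive Bayesian label fusion: multiply each surfel's stored
        class distribution by the CNN's new prediction and renormalise."""
        fused = map_probs * cnn_probs
        return fused / (fused.sum(axis=-1, keepdims=True) + eps)

    # Two surfels, three classes: repeated agreeing predictions sharpen
    # the distribution, which is why multi-view fusion beats single frames.
    surfels = np.full((2, 3), 1.0 / 3.0)             # uniform prior
    cnn = np.array([[0.6, 0.3, 0.1],                 # per-surfel prediction
                    [0.2, 0.5, 0.3]])
    for _ in range(3):                               # same view, three frames
        surfels = fuse_semantics(surfels, cnn)
    print(surfels.round(3))                          # peaked on classes 0 and 1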