Autonomous robot manipulation for planetary and terrestrial applications - Renaud Detry
posted on 23 March, 2021


Abstract: In this talk, I will first discuss the experimental validation of autonomous robot behaviors that support the exploration of Mars’ surface, lava tubes on Mars and the Moon, icy bodies and ocean worlds, and operations on orbit around the Earth. I will frame the presentation with the following questions: What new insights or limitations arise when applying algorithms to real-world data as opposed to benchmark datasets or simulations? How can we address the limitations of real-world environments—e.g., noisy or sparse data, non-i.i.d. sampling, etc.? What challenges exist at the frontiers of robotic exploration of unstructured and extreme environments? I will discuss our approach to validating autonomous machine-vision capabilities for the notional Mars Sample Return campaign, for autonomously navigating lava tubes, and for autonomously assembling modular structures on orbit. The talk will highlight the thought process that drove the decomposition of a validation need into a collection of tests conducted on off-the-shelf datasets, custom/application-specific datasets, and simulated or physical robot hardware, where each test addressed a different range of experimental parameters for sensing/actuation fidelity, breadth of environmental conditions, and breadth of jointly-tested robot functions. Next, I will present a task-oriented grasp model, that en- codes grasps that are configurationally compatible with a given task. The model consists of two independent agents: First, a geometric grasp model that computes, from a depth image, a distribution of 6D grasp poses for which the shape of the gripper matches the shape of the underlying surface. The model relies on a dictionary of geometric object parts annotated with workable gripper poses and preshape parameters. It is learned from experience via kinesthetic teaching. The second agent is a CNN-based semantic model that identifies grasp- suitable regions in a depth image, i.e., regions where a grasp will not impede the execution of the task. The semantic model allows us to encode relationships such as “grasp from the handle.” A key element of this work is to use a deep network to integrate contextual task cues, and defer the structured-output problem of gripper pose computation to an explicit (learned) geometric model. Jointly, these two models generate grasps that are mechanically fit, and that grip on the object in a way that enables the intended task.