From Interacting Hands to Expressive and Interacting Humans - Dimitris Tzionas
posted on 16 February 2021


Abstract: A long-term goal of computer vision and artificial intelligence is to develop human-centred AI that perceives humans in their environments and helps them accomplish their tasks. This requires holistic 3D scene understanding: modelling how people and objects look, estimating their 3D shape and pose, and inferring their semantics and spatial relationships. For humans and animals this perceptual capability seems effortless; endowing computers with similar capabilities, however, has proven hard. Fundamentally, the problem involves observing a scene through cameras and inferring the configuration of humans and objects from images. Challenges exist at all levels of abstraction, from the ill-posed inference of 3D structure from noisy 2D images to its semantic interpretation. The talk will discuss several projects (IJCV’16, TOG’17, CVPR’19, ICCV’19, ECCV’20) that attempt to understand, formalize, and model increasingly complex cases of human-object interactions. These cases range from interacting hands to expressive and interacting whole-body humans. More specifically, the talk will present novel statistical models of the human hand and the whole body, and the use of these models (1) to efficiently regularize 3D reconstruction from monocular 2D images and, eventually, (2) to build statistical models of interactions. The presented models are freely available for research purposes.
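
To make point (1) concrete, here is a minimal sketch of how a statistical body model can regularize 3D reconstruction from monocular 2D evidence. It assumes the SMPL-X model loaded through the `smplx` Python package; the 2D keypoints, the orthographic camera, the loss weights, and the quadratic prior are illustrative placeholders, not the talk's actual pipeline.

```python
import torch
import smplx

# Load a gender-neutral SMPL-X model; 'models/' must contain the downloaded
# SMPL-X model files (available for research from the project website).
model = smplx.create('models', model_type='smplx', gender='neutral')

# Free variables: shape coefficients, body pose (21 joints, axis-angle),
# and a simple orthographic camera (scale + 2D translation).
betas = torch.zeros(1, 10, requires_grad=True)
body_pose = torch.zeros(1, 63, requires_grad=True)
cam_scale = torch.ones(1, requires_grad=True)
cam_trans = torch.zeros(1, 2, requires_grad=True)

# Hypothetical 2D joint detections (e.g. from an off-the-shelf keypoint
# detector), matching the model's joint set; random values stand in for data.
with torch.no_grad():
    num_joints = model().joints.shape[1]
keypoints_2d = torch.rand(1, num_joints, 2)

optimizer = torch.optim.Adam([betas, body_pose, cam_scale, cam_trans], lr=0.01)
for step in range(200):
    optimizer.zero_grad()
    output = model(betas=betas, body_pose=body_pose)
    # Orthographic projection of the model's 3D joints into the image plane.
    proj = output.joints[..., :2] * cam_scale + cam_trans
    data_term = ((proj - keypoints_2d) ** 2).mean()
    # The statistical model is what regularizes the fit: penalizing large
    # shape and pose coefficients keeps the solution near plausible bodies.
    prior_term = (betas ** 2).sum() + (body_pose ** 2).sum()
    loss = data_term + 1e-3 * prior_term
    loss.backward()
    optimizer.step()
```

The quadratic penalty above is only a stand-in for the stronger learned priors used in practice; the key design point is that a low-dimensional statistical model turns an ill-posed monocular problem into a tractable search over plausible body configurations.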