Towards Open-world Long Video Understanding - Weidi Xie (Shanghai Jiao Tong University)
posted on 30 July, 2024


Abstract: Understanding videos has long been of great interest to the vision community. Compared with the analysis of static images, the extra time axis introduces both challenges and opportunities. In this talk, I will discuss some of our group's recent work on long video understanding, including visual-language alignment on instructional videos, grounded visual question answering on egocentric videos, retrieval-augmented video understanding, and open-world instance tracking within videos. For more information, please see the papers here: https://weidixie.github.io/research.htm