Abstract: Video understanding is challenging because video data is high-dimensional and its spatio-temporal context is complex and entangled. Traditional pixel-based approaches are often inefficient and struggle to extract the core contextual information. In this seminar, I will introduce high-level geometric features for video analysis, such as bounding boxes and human skeletal representations, and explain how they reduce computational complexity, improve model generalizability, and facilitate the identification of relevant patterns. Furthermore, I will explore how incorporating geometric concepts into video analysis improves the interpretability, privacy, and fairness of deep learning models. These advancements enable the application of video-based deep learning models in healthcare, for example in predicting Parkinson’s disease and cerebral palsy.