Abstract: Recent advances in world models for vision have been driven largely by powerful 2D generative models that predict future scene states from visual observations and planned actions, often with little or no built-in knowledge of 3D geometry or physical dynamics. In this talk, I will discuss why physical inductive biases, such as explicit 3D structure, remain essential for world modelling. I will present recent work showing how incorporating geometric and physical priors into learned models leads to more generalisable behaviour, significantly more efficient inference, and better control over predictions.