Abstract: The visual concepts depicted in images from domains such as medicine, biodiversity monitoring, and biological imaging are inherently "fine-grained". By this, we mean that distinct visual concepts may appear very similar to the untrained eye. Learning to distinguish these subtle differences is especially challenging in low-data regimes where expert annotations are scarce. In this talk, I will provide an overview of recent research from my group on this topic. I will present work on learning 3D representations of images without requiring explicit 3D supervision, new methods for automatically discovering visual concepts in data, models for estimating the spatial distribution of fine-grained categories, and ongoing work to develop a new dataset for text-based retrieval of fine-grained visual concepts.