Mining the Latent: A Tuning-Free Paradigm for Versatile Applications with Diffusion Models - Ye Zhu (Princeton University)
posted on 27 February, 2024


Abstract: Diffusion Models, whose core design models the transition between distributions as a stochastic Markovian process, have become state-of-the-art generative models for data synthesis in computer vision. Despite their impressive generative performance, their high training cost has limited the number of research groups able to participate and contribute, consequently hindering downstream applications. In this talk, I will present a novel methodological paradigm for leveraging pre-trained diffusion models in versatile applications, built on an in-depth understanding of their latent spaces from both theoretical and empirical perspectives. Specifically, we propose several tuning-free methods for data semantic editing [Zhu et al., NeurIPS 2023], data customization [Wang et al., ICLR 2024], and generalized unseen data synthesis [Zhu et al., arXiv 2024], all achieved by mining unique properties of the latent spaces, demonstrating their great potential for versatile applications in an efficient, robust, and tuning-free manner.
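
For context on the "stochastic Markovian process" mentioned in the abstract, the forward process of a standard denoising diffusion model (DDPM) is a Markov chain that gradually corrupts data with Gaussian noise; the latent spaces mined in the works above are the noisy states x_t of such a chain. This is the standard textbook formulation, not a detail specific to the cited papers:

\[ q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right), \qquad q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \]

\[ q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right), \quad \text{where } \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s). \]

Generation then reverses this chain with a learned denoiser, and tuning-free methods operate directly on the intermediate latents x_t of a pre-trained model rather than updating its weights.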