DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation - Nataniel Ruiz (Boston University)
Posted on 21 March 2023


Abstract: We present a new approach for the personalization of text-to-image diffusion models. Given just a few images of a subject, we fine-tune a pretrained text-to-image model to bind a unique identifier to that specific subject, so that we can synthesize fully novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model together with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images. We apply our technique to several previously unassailable tasks, including subject recontextualization, text-guided view synthesis, appearance modification, and artistic rendering, all while preserving the subject's key features. We also show diverse new applications of our work undertaken by users, spanning from personalized AI avatars to novel art pieces created by guiding the network through fine-tuning.
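To make the training objective concrete, here is a minimal PyTorch sketch of the class-specific prior-preservation idea: the fine-tuning loss combines a denoising reconstruction term on the few subject images (captioned with the unique identifier, e.g. "a [V] dog") with a second term on class images generated by the frozen pretrained model (captioned with the plain class prompt, e.g. "a dog"), which preserves the model's semantic prior for the class. The `ToyDenoiser`, the tensor shapes, and the `lambda_prior` weight are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for a pretrained text-to-image diffusion
    model; DreamBooth fine-tunes the real model's weights."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, noisy_latents: torch.Tensor,
                prompt_emb: torch.Tensor) -> torch.Tensor:
        # Predict the noise that was added to the latents
        # (epsilon-prediction, as in standard diffusion training).
        return self.net(noisy_latents + prompt_emb)

def dreambooth_loss(model: nn.Module,
                    subject_batch: tuple,
                    class_batch: tuple,
                    lambda_prior: float = 1.0) -> torch.Tensor:
    """Denoising loss on the subject images ("a [V] dog") plus the
    prior-preservation loss on class images the frozen model
    generated for the class prompt ("a dog")."""
    mse = nn.MSELoss()
    noisy_s, prompt_s, eps_s = subject_batch  # few-shot subject data
    noisy_c, prompt_c, eps_c = class_batch    # self-generated class data
    loss_subject = mse(model(noisy_s, prompt_s), eps_s)
    loss_prior = mse(model(noisy_c, prompt_c), eps_c)
    return loss_subject + lambda_prior * loss_prior

# Illustrative usage with random tensors in place of real latents,
# prompt embeddings, and sampled noise.
model = ToyDenoiser()
make_batch = lambda: (torch.randn(4, 8), torch.randn(4, 8), torch.randn(4, 8))
loss = dreambooth_loss(model, make_batch(), make_batch())
loss.backward()  # gradients flow into the fine-tuned weights
```

The prior-preservation term is what lets the model keep generating diverse ordinary members of the class after fine-tuning, instead of collapsing every class prompt onto the handful of subject images.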