Abstract: Tracking pixels in videos is typically studied as an optical flow estimation problem, where every pixel is described with a displacement vector that locates it in the next frame. Even though wider temporal context is freely available, prior efforts to exploit it have yielded only small gains over 2-frame methods. In this talk, I will present our methods for re-attacking pixel-level motion estimation, based on the idea of treating pixels like “particles”. In this approach, we assume from the outset that pixels have long-range trajectories and may change appearance over the course of tracking, much like tiny objects. Instead of producing dense flow fields for pairs of frames, our models produce sparse motion fields that span dozens of frames. I will demonstrate the advantages of our particle-based methods over optical flow and feature matching, and show new state-of-the-art results for tracking arbitrary pixels across occlusions.
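
To make the contrast between the two output representations concrete, here is a minimal NumPy sketch (an illustration only, not the models from the talk; the sizes and names are hypothetical). It shows dense 2-frame flow, where a long track must be built by chaining per-pair displacements so errors and occlusion mistakes compound, next to a particle-style output, where each query point carries a whole multi-frame trajectory plus a per-frame visibility flag, making occlusion part of the prediction rather than a failure mode.

```python
import numpy as np

# Hypothetical sizes: a 48-frame clip, 256 query points.
T, H, W, N = 48, 240, 320, 256

# Optical-flow representation: one dense (H, W, 2) displacement field
# per consecutive frame pair.
flow = np.zeros((T - 1, H, W, 2), dtype=np.float32)

def chain_flow(x0: float, y0: float) -> list[tuple[float, float]]:
    """Follow one pixel across the clip by composing 2-frame flow fields."""
    x, y = x0, y0
    track = [(x, y)]
    for t in range(T - 1):
        xi = int(np.clip(round(x), 0, W - 1))
        yi = int(np.clip(round(y), 0, H - 1))
        dx, dy = flow[t, yi, xi]  # displacement at the current position
        x, y = x + dx, y + dy     # drift accumulates with every step
        track.append((x, y))
    return track

# Particle-style representation: each of N query points gets a full
# trajectory across all T frames at once, with per-frame visibility
# so occluded timesteps are flagged explicitly.
trajectories = np.zeros((N, T, 2), dtype=np.float32)  # (x, y) per particle, per frame
visibility = np.ones((N, T), dtype=bool)              # False where the point is occluded
```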