Video Editing via Factorized Diffusion Distillation

Uriel Singer*, Amit Zohar*, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman
*Equal Contribution
Meta AI
Teaser videos (omitted), captioned with their edit instructions: In an aquarium | Sitting on a red bench | Set it in winter wonderland | Replace with a panda | In Minecraft style | Paint it pink and blue

Abstract

We introduce Emu Video Edit (EVE), a model that establishes a new state-of-the-art in video editing without relying on any supervised video editing data. To develop EVE, we separately train an image editing adapter and a video generation adapter, and attach both to the same text-to-image model. Then, to align the adapters towards video editing, we introduce a new unsupervised distillation procedure, Factorized Diffusion Distillation. This procedure distills knowledge from one or more teachers simultaneously, without any supervised data. We use this procedure to teach EVE to edit videos by jointly distilling knowledge to (i) precisely edit each individual frame, from the image editing adapter, and (ii) ensure temporal consistency among the edited frames, from the video generation adapter. Finally, to demonstrate the potential of our approach in unlocking other capabilities, we align additional combinations of adapters.
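The idea of distilling from two teachers at once can be illustrated with a deliberately tiny sketch. This is not the EVE training code, and the abstract does not specify the actual losses: here each frame is a single float, the two teachers are plain functions, and the student's output is optimized directly by gradient steps on a weighted sum of squared errors. All names (`image_teacher`, `video_teacher`, `joint_loss`, `distill_step`, `w_img`, `w_vid`) are assumptions made for illustration.

```python
# Toy sketch: each "frame" is one float and each "teacher" is a plain function.
# Every name below is hypothetical; none of it comes from the EVE codebase.

def image_teacher(frame):
    """Stand-in for a per-frame editing teacher: the 'edit' adds 1.0."""
    return frame + 1.0

def video_teacher(frames):
    """Stand-in for a temporal teacher: a moving average over neighbours."""
    n = len(frames)
    smoothed = []
    for i in range(n):
        window = frames[max(0, i - 1):min(n, i + 2)]
        smoothed.append(sum(window) / len(window))
    return smoothed

def teacher_targets(student, source):
    """Targets from both teachers; they are held fixed during an update."""
    img_targets = [image_teacher(f) for f in source]
    vid_targets = video_teacher(student)
    return img_targets, vid_targets

def joint_loss(student, source, w_img=1.0, w_vid=1.0):
    """Weighted sum of a per-frame term and a whole-clip term."""
    img_t, vid_t = teacher_targets(student, source)
    img_term = sum((s - t) ** 2 for s, t in zip(student, img_t))
    vid_term = sum((s - t) ** 2 for s, t in zip(student, vid_t))
    return w_img * img_term + w_vid * vid_term

def distill_step(student, source, lr=0.1, w_img=1.0, w_vid=1.0):
    """One gradient step on joint_loss, treating teacher targets as constants."""
    img_t, vid_t = teacher_targets(student, source)
    return [
        s - lr * 2.0 * (w_img * (s - ti) + w_vid * (s - tv))
        for s, ti, tv in zip(student, img_t, vid_t)
    ]

source = [0.0, 2.0, 0.0, 2.0]   # a flickering four-frame "clip"
student = list(source)          # student output starts as the unedited clip
initial_loss = joint_loss(student, source)
for _ in range(50):
    student = distill_step(student, source)
final_loss = joint_loss(student, source)
print(initial_loss, final_loss)
print(student)
```

In this toy setting the student ends up between the two teachers: its frames move towards the per-frame edit while the frame-to-frame flicker shrinks. The weights `w_img` and `w_vid` control that trade-off.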

Precise Video Editing

Video gallery (videos omitted). Each row lists an input clip followed by the three edit instructions shown with it.
Input | With pyramids in the back | Add a balloon | In cartoon style
Input | With a santa hat | Add a beard | At Tokyo
Input | Cover it with flowers | Transform into a penguin | Turn the floor into glass
Input | At a show stage | Cartoon style | Depth map
Input | Add a red bench | Convert to a sketch | In pink colors
Input | Dressed as a princess | With a fireman uniform | Replace with a penguin
Input | Remove the plant | Make the mushroom pink | Paint the sofa gold
Input | Make it a doctor | With a policeman uniform | At New York
Input | Convert to a glass bowl | Cartoon style | Generate a sketch
Input | In the snow | Cover with mud | As a pop art painting
Input | Remove glasses | Extract a pose map | Replace with a penguin
Input | Make the water green | Paint the tail with rainbow colors | Futuristic style
Input | Add metal gloves | At an amusement park | Make it snow
Input | Add fireworks in the back | Make the suit green | Make it autumn

Acknowledgements

We extend our gratitude to the following people for their contributions (alphabetical order):
Andrew Brown, Bichen Wu, Ishan Misra, Saketh Rambhatla, Xiaoliang Dai, Zijian He.



BibTeX


@misc{singer2024video,
      title={Video Editing via Factorized Diffusion Distillation}, 
      author={Uriel Singer and Amit Zohar and Yuval Kirstain and Shelly Sheynin and Adam Polyak and Devi Parikh and Yaniv Taigman},
      year={2024},
      eprint={2403.09334},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}