VACE ALL IN ONE

MimicPC

07/23/2025

ComfyUI

Video Generation

Wan 2.1

Detailed Introduction

Introduction

VACE is an all-in-one model designed for video creation and editing. It encompasses various tasks, including reference-to-video generation (R2V), video-to-video editing (V2V), and masked video-to-video editing (MV2V), allowing users to compose these tasks freely. This functionality enables users to explore diverse possibilities and streamlines their workflows effectively, offering a range of capabilities, such as Move-Anything, Swap-Anything, Reference-Anything, Expand-Anything, Animate-Anything, and more.

Wan-Video

Wan2.1 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. It offers these key features:

👍 SOTA Performance: Wan2.1 consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
👍 Supports Consumer-grade GPUs: The T2V-1.3B model requires only 8.19 GB VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without optimization techniques like quantization). Its performance is even comparable to some closed-source models.
👍 Multiple Tasks: Wan2.1 excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation.
👍 Visual Text Generation: Wan2.1 is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications.
👍 Powerful Video VAE: Wan-VAE delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.

https://github.com/Wan-Video/Wan2.1

https://huggingface.co/Kijai/WanVideo_comfy

https://github.com/ali-vilab/VACE

Recommended machine: Ultra-PRO

Workflow Overview

How to use this workflow

There are 5 workflows in total. Open the one you need.

Part 1: First and last frames

Step 1: Load Image

The top image is the first frame and the bottom image is the last frame.

When scaling the images, it is best to stay within 768*768. Larger images make generation take much longer.
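
As a rough illustration of the 768*768 limit, the sketch below computes a target size that fits inside that box while keeping the aspect ratio. The function name `fit_within` is hypothetical, and rounding each side down to a multiple of 16 is an assumption (many video models prefer such sizes), not something this workflow states. It only computes numbers; the actual resize still happens in the workflow's image-scaling node.

```python
def fit_within(width, height, max_side=768, multiple=16):
    """Return (new_width, new_height) that fits inside max_side x max_side.

    Keeps the aspect ratio and never upscales. Each side is rounded down to
    a multiple of `multiple`; that rounding is an assumption, not a rule
    taken from this workflow.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    new_w, new_h = width, height
    if width > max_side or height > max_side:
        if width >= height:
            # Landscape or square: the width hits the limit first.
            new_w = max_side
            new_h = height * max_side // width
        else:
            # Portrait: the height hits the limit first.
            new_h = max_side
            new_w = width * max_side // height

    # Round down to the chosen multiple, but never below one multiple.
    new_w = max(multiple, new_w // multiple * multiple)
    new_h = max(multiple, new_h // multiple * multiple)
    return new_w, new_h


if __name__ == "__main__":
    print(fit_within(1920, 1080))  # a 1080p source image
    print(fit_within(512, 512))    # already small enough
```

Pass `multiple=1` if you only want the 768 limit without any rounding.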

Step 2: Input the Prompt

The prompt can be written in Chinese or English. In testing, Chinese prompts were recognized better than English ones.

Step 3: Sampling parameter setting

At 25 sampling steps the video is already finely detailed. For a video with many small elements, such as a starry sky or flowers, set the step count above 35.

Step 4: Get Video

You can change the video length by setting frame_rate or num_frames (in WanVideo Empty Embeds). Video length = num_frames/frame_rate
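
The length formula can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the sample values (81 frames at 16 fps) are only illustrative. The optional snapping to a frame count of the form 4k + 1 is an assumption about what some Wan 2.1 setups expect, not something this page states.

```python
def video_length_seconds(num_frames, frame_rate):
    """Video length = num_frames / frame_rate, in seconds."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return num_frames / frame_rate


def frames_for_length(seconds, frame_rate, snap_4k_plus_1=False):
    """Return the num_frames value closest to the requested length.

    snap_4k_plus_1 rounds the result to the nearest 4k + 1 (1, 5, 9, ...).
    That constraint is an assumption, not taken from this workflow.
    """
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    frames = max(1, round(seconds * frame_rate))
    if snap_4k_plus_1:
        frames = 4 * round((frames - 1) / 4) + 1
    return frames


if __name__ == "__main__":
    print(video_length_seconds(81, 16))                    # 5.0625
    print(frames_for_length(5, 16))                        # 80
    print(frames_for_length(5, 16, snap_4k_plus_1=True))   # 81
```

Raising frame_rate while keeping num_frames fixed shortens the clip. To get a longer clip at the same frame rate, raise num_frames.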

Part 2: Image Fusion

Step 1: Load Image

Step 2: Input the Prompt

The prompt can be written in Chinese or English. In testing, Chinese prompts were recognized better than English ones.

Step 3: Sampling parameter setting

At 25 sampling steps the video is already finely detailed. For a video with many small elements, such as a starry sky or flowers, set the step count above 35.

Step 4: Get Video

You can change the video length by setting frame_rate or num_frames (in WanVideo Empty Embeds). Video length = num_frames/frame_rate

Part 3: Motion Transfer

Step 1: Load Video & Reference Image

The video supplies the motion and the reference image supplies the portrait (the subject's appearance).

Step 2: Input the Prompt

The prompt can be written in Chinese or English. In testing, Chinese prompts were recognized better than English ones.

Step 3: Sampling parameter setting

At 25 sampling steps the video is already finely detailed. For a video with many small elements, such as a starry sky or flowers, set the step count above 35.

Step 4: Get Video

Part 4: Video Editing

Step 1: Load Video

Step 2: Edit video

On the canvas, mark the coordinate points you want to remove and the coordinate points you want to keep.

Shift + left click: add a pos (positive) coordinate

Shift + right click: add a neg (negative) coordinate

Right click: delete a coordinate

Step 3: Input the Prompt

The prompt can be written in Chinese or English. In testing, Chinese prompts were recognized better than English ones.

Step 4: Sampling parameter setting

At 25 sampling steps the video is already finely detailed. For a video with many small elements, such as a starry sky or flowers, set the step count above 35.

Step 5: Get Video

Part 5: Video Expansion

Step 1: Load Video

Step 2: Input the Prompt

The prompt can be written in Chinese or English. In testing, Chinese prompts were recognized better than English ones.

Step 3: Sampling parameter setting

At 25 sampling steps the video is already finely detailed. For a video with many small elements, such as a starry sky or flowers, set the step count above 35.

Step 4: Get Video

Details

APP	ComfyUI (v0.3.34)
Update Time	07/23/2025
File Space	1.7 GB
Models	0
Extensions	15