WAN 2.2 S2V: Fast Talking & Singing Video from Image + Audio

🦙Rishappi

09/17/2025

ComfyUI

Video Generation

Wan 2.2

Detailed Introduction

The WAN 2.2 S2V workflow is a sound-to-video (S2V) workflow for ComfyUI that generates cinematic, HD-quality talking or singing videos from three inputs: an audio file, a reference image, and a short text prompt. Whether the audio is dialogue, narration, or music, the workflow synchronizes the character's motion with it to bring characters and personalities to life.

Key Features:

🎤 Sound-to-Video (S2V) – Transform audio or music into synchronized talking or singing videos.
🖼 Image + Audio Input – Upload an image of your character/person and an audio file (music, dialogue, or voice).
🎥 Automatic Frame Control – No need to calculate frames; the workflow automatically sets the video length from the audio duration.
🎨 Image Resolution Adjustment – Built-in node to customize input image resolution for sharper, cleaner results.
⚡ Fast & Efficient – Optimized for the ULTRA PRO GPU machine tier. The first run may take longer because the models must be loaded, but subsequent renders are significantly faster.
📘 Helpful Notes Included – Guidance provided inside the workflow for smooth setup and best practices.
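The automatic frame control described above comes down to converting the audio duration into a frame count. The sketch below shows one way to do that in plain Python. It assumes a 16 fps output rate and frame counts of the form 4n + 1 (a common constraint for Wan video latents); both values, and the helper names `wav_duration` and `frames_for_audio`, are hypothetical and not taken from the workflow's actual nodes. The standard-library `wave` module only reads uncompressed WAV files, so other formats would need a different reader.

```python
import math
import wave


def wav_duration(source) -> float:
    """Return the duration in seconds of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def frames_for_audio(duration_s: float, fps: int = 16) -> int:
    """Return a frame count that covers duration_s seconds at the given fps.

    Assumptions (not taken from the workflow): 16 fps output, and frame
    counts of the form 4n + 1, rounded up so no audio is cut off.
    """
    if duration_s <= 0:
        raise ValueError("audio duration must be positive")
    raw = math.ceil(duration_s * fps)  # frames needed to cover the audio
    n = math.ceil((raw - 1) / 4)       # smallest n with 4n + 1 >= raw
    return 4 * n + 1
```

Under these assumptions, a 5-second clip gives `frames_for_audio(5.0) == 81`: 80 frames are needed to cover the audio, and 81 is the next value of the form 4n + 1.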

Pro Tips:

Choose the output resolution carefully: higher resolutions take longer to render.
Best results are achieved with an ULTRA PRO GPU, for both speed and cinematic quality.
Perfect for creating AI music videos, talking avatars, character performances, and storytelling visuals.
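The image-resolution node and the first tip both come down to pixel count. The hypothetical helpers below scale an input image so its long side matches a target, snap both sides to a multiple of 16, and compare the pixel counts of two output resolutions as a rough proxy for relative render cost. The 832-pixel default and the multiple of 16 are illustrative assumptions; check the workflow's own resize node for the values it actually uses.

```python
def fit_resolution(width: int, height: int,
                   target_long_side: int = 832,
                   multiple: int = 16) -> tuple[int, int]:
    """Scale (width, height) so the long side is about target_long_side,
    keeping the aspect ratio and snapping each side to a multiple of `multiple`.

    The 832-pixel default and the multiple of 16 are illustrative assumptions.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_long_side / max(width, height)
    # Round each scaled side to the nearest multiple, never below one multiple
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


def relative_pixel_cost(base: tuple[int, int], other: tuple[int, int]) -> float:
    """Ratio of pixels per frame in `other` versus `base` (a rough cost proxy)."""
    return (other[0] * other[1]) / (base[0] * base[1])
```

For example, a 1920×1080 image fitted to an 832-pixel long side becomes 832×464, and rendering at 1280×720 instead processes roughly 2.4 times as many pixels per frame. Treat this ratio as a rough guide only: real render time also depends on attention cost, which grows faster than linearly with the number of latent tokens, and on fixed overheads such as model loading.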

With this workflow, making HD talking or singing videos is as simple as uploading your image and audio, writing a prompt, and clicking Queue. No manual setup is needed; you get fast, high-quality, cinematic results.

Details

App	ComfyUI (v0.3.53)
Update Time	09/17/2025
File Space	1.3 GB
Models	2
Extensions	8