WAN 2.2 S2V: Fast Talking & Singing Video from Image + Audio

🦙Rishappi
09/02/2025
ComfyUI
Popular & HOT
Video Generation
Wan 2.2
Detailed Introduction

The WAN 2.2 S2V Workflow is the latest sound-to-video solution in ComfyUI, designed to generate cinematic HD-quality talking or singing videos using nothing more than an audio file, an image, and a simple prompt. Whether it’s dialogue, narration, or music, this workflow synchronizes your chosen audio with visuals to bring characters and personalities to life.

Key Features:

  • 🎤 Sound-to-Video (S2V) – Transform audio or music into synchronized talking or singing videos.
  • 🖼 Image + Audio Input – Upload an image of your character/person and an audio file (music, dialogue, or voice).
  • 🎥 Automatic Frame Control – No need to calculate frames—the workflow automatically adjusts video length based on the audio duration.
  • 🎨 Image Resolution Adjustment – Built-in node to customize input image resolution for sharper, cleaner results.
  • ⚡ Fast & Efficient – Optimized for ULTRA PRO GPU use. First runs may take longer, but subsequent renders are significantly faster.
  • 📘 Helpful Notes Included – Guidance provided inside the workflow for smooth setup and best practices.
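The automatic frame control described above comes down to simple arithmetic: video length in frames is audio duration multiplied by frame rate. The sketch below is only an illustration of that idea, not the workflow's actual node logic. The function name `frames_for_audio` and the default of 16 frames per second are assumptions made for this example; check the workflow's own notes for the real values.

```python
import math


def frames_for_audio(duration_s: float, fps: int = 16) -> int:
    """Return how many video frames are needed to cover an audio clip.

    duration_s: audio length in seconds.
    fps: frames per second (16 is an assumed default, not taken from the workflow).
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    # Round up so the video never ends before the audio does.
    return math.ceil(duration_s * fps)


if __name__ == "__main__":
    # A 5-second clip at the assumed 16 fps.
    print(frames_for_audio(5.0))
```

Rounding up rather than down means the last fraction of a second of audio still gets a frame, at the cost of a video that may run slightly longer than the audio.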

Pro Tips:

  • Choose the output resolution carefully: higher resolutions take longer to render.
  • Best results are achieved with an ULTRA PRO GPU, for both speed and cinematic quality.
  • Perfect for creating AI music videos, talking avatars, character performances, and storytelling visuals.

With this workflow, making HD talking or singing videos is as simple as uploading your image and audio, writing a prompt, and clicking Queue. No manual setup is needed to get fast, high-quality, cinematic results.

Details
APP: ComfyUI (v0.3.53)
Update Time: 09/02/2025
File Space: 1.3 GB
Models: 2
Extensions: 8