The WAN 2.2 S2V Workflow is the latest sound-to-video solution in ComfyUI. It generates cinematic, HD-quality talking or singing videos from just three inputs: an audio file, an image, and a simple prompt. Whether it's dialogue, narration, or music, the workflow synchronizes your chosen audio with the visuals to bring characters and personalities to life.
Key Features:
- Sound-to-Video (S2V): Transform audio or music into synchronized talking or singing videos.
- Image + Audio Input: Upload an image of your character or person and an audio file (music, dialogue, or voice).
- Automatic Frame Control: No need to calculate frames. The workflow automatically adjusts the video length to match the audio duration.
- Image Resolution Adjustment: A built-in node lets you customize the input image resolution for sharper, cleaner results.
- Fast & Efficient: Optimized for ULTRA PRO GPU use. First runs may take longer, but subsequent renders are significantly faster.
- Helpful Notes Included: Guidance is provided inside the workflow for smooth setup and best practices.
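To give a feel for what Automatic Frame Control saves you from doing by hand, here is a minimal Python sketch of the underlying arithmetic: frame count is roughly audio duration multiplied by frames per second. This is an illustration only, not the workflow's actual node logic. The function names are hypothetical, the 16 fps default is an assumption (check the frame rate set in your own workflow), and the duration helper only reads uncompressed WAV files.

```python
import math
import wave


def audio_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def frames_for_audio(duration_s: float, fps: int = 16) -> int:
    """Number of video frames needed to cover the audio.

    fps=16 is an assumed default for illustration; use whatever
    frame rate your workflow is actually configured for.
    """
    # Round up so the video never ends before the audio does.
    return max(1, math.ceil(duration_s * fps))
```

For example, under the assumed 16 fps, a 5-second clip would need `frames_for_audio(5.0)` frames, which is 80. The workflow handles this step for you, so the sketch is only useful for estimating how long a render will be before you queue it.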
Pro Tips:
- Choose the output resolution carefully: higher resolutions take longer to render.
- Best results are achieved with the ULTRA PRO GPU, for both speed and cinematic quality.
- Perfect for creating AI music videos, talking avatars, character performances, and storytelling visuals.
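If you want a rough sense of the resolution trade-off before rendering, the Python sketch below shrinks an image's dimensions to fit a maximum side length and compares pixel counts between two sizes. It is a back-of-the-envelope aid, not part of the workflow. The function names are hypothetical, the 832-pixel default and the snap-to-multiple-of-16 rule are assumptions commonly used with video diffusion models, and pixel count is only a rough proxy for render time.

```python
def fit_resolution(width: int, height: int,
                   max_side: int = 832, multiple: int = 16) -> tuple[int, int]:
    """Scale (width, height) down so the longer side fits max_side.

    Both sides are snapped to the nearest multiple of `multiple`.
    max_side=832 and multiple=16 are illustrative assumptions.
    """
    # Never upscale: cap the scale factor at 1.0.
    scale = min(1.0, max_side / max(width, height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


def pixel_ratio(size_a: tuple[int, int], size_b: tuple[int, int]) -> float:
    """How many times more pixels per frame size_a has than size_b."""
    return (size_a[0] * size_a[1]) / (size_b[0] * size_b[1])
```

As an example, `pixel_ratio((1280, 720), (640, 360))` gives 4.0: doubling both sides means four times as many pixels in every frame, which is why a modest bump in resolution can noticeably lengthen a render.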
With this workflow, making HD talking or singing videos is as simple as uploading your image and audio, writing a prompt, and clicking queue. No manual setup needed: just fast, high-quality, cinematic results.