The WAN 2.2 S2V Workflow is the latest evolution of WAN 2.2, designed to create cinematic HD-quality talking or singing videos with minimal effort. This sound-to-video and video-to-video lipsync workflow for ComfyUI synchronizes audio with video, producing realistic lip movements and smooth motion.
In the AudioCrop node, specify which part of the uploaded audio the workflow should use. Set the start_time and end_time values so that the cropped audio covers exactly the segment of speech you want.

Key Features:
- Audio + Video Input: Upload an audio or music file along with a video of a person or character to generate synced talking or singing outputs.
- Automatic Frame Matching: No manual frame calculation is required; the workflow adjusts the frame count to match the audio duration and aligns it with the video length.
- Fast & Reliable: Optimized for fast execution of the WAN 2.2 sound-to-video workflow. The first run may take longer, but subsequent runs are significantly faster.
- Cinematic HD Quality: Produces vibrant, detailed visuals with precise, natural lip sync.
- Resolution Control: Adjust the output resolution to balance render speed against visual quality.
- User-Friendly: Upload, queue, and generate; the workflow is designed to be as simple as possible.
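The AudioCrop range and the automatic frame matching are linked: the cropped audio duration determines how many video frames have to be generated. The sketch below shows that arithmetic under stated assumptions. It assumes timestamps written as "M:SS" strings, a 16 fps output rate, and frame counts of the form 4n + 1; none of these values is taken from this workflow, and `parse_timestamp`, `crop_samples`, and `frames_for_crop` are hypothetical helpers, not ComfyUI APIs.

```python
import math


def parse_timestamp(ts: str) -> float:
    """Convert an "M:SS" (or "H:MM:SS" / "SS") string to seconds.

    The string format is an assumption for illustration.
    """
    seconds = 0.0
    for part in ts.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def crop_samples(samples, sample_rate: int, start_time: str, end_time: str):
    """Keep only the samples between start_time and end_time (hypothetical helper)."""
    start = int(parse_timestamp(start_time) * sample_rate)
    end = int(parse_timestamp(end_time) * sample_rate)
    return samples[start:end]


def frames_for_crop(start_time: str, end_time: str, fps: int = 16) -> int:
    """Frames needed to cover the cropped audio at `fps` (16 fps is an assumption)."""
    duration = parse_timestamp(end_time) - parse_timestamp(start_time)
    if duration <= 0:
        raise ValueError("end_time must be later than start_time")
    raw_frames = math.ceil(duration * fps)
    # Round up to the next 4n + 1 frame count (assumed model constraint).
    n = math.ceil((raw_frames - 1) / 4)
    return 4 * n + 1


if __name__ == "__main__":
    print(frames_for_crop("0:00", "0:05"))
```

With these assumptions, a five-second crop from "0:00" to "0:05" needs 80 frames at 16 fps, which rounds up to 81.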
Pro Tips:
- Use an ULTRA PRO GPU for the best speed and quality.
- Higher output resolutions require more generation time, so choose a resolution that fits your needs.
- Ideal for AI music videos, talking avatars, dubbing projects, character animations, and storytelling content.
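To get a feel for the resolution trade-off in the tips above, it helps to compare pixel counts, since each frame at a higher resolution carries more pixels to generate. The sketch below is a rough proxy only: actual generation time depends on the model and hardware and is not measured here. It also assumes that output sides must be multiples of 16, which is an illustrative assumption rather than a documented setting of this workflow; `snap_resolution` and `relative_pixel_cost` are hypothetical helpers.

```python
def snap_resolution(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    """Round each side down to a multiple of `multiple` (assumed model constraint)."""
    snapped_w = max(multiple, (width // multiple) * multiple)
    snapped_h = max(multiple, (height // multiple) * multiple)
    return snapped_w, snapped_h


def relative_pixel_cost(base: tuple[int, int], target: tuple[int, int]) -> float:
    """Ratio of target to base pixel count: a rough proxy for extra work per frame."""
    return (target[0] * target[1]) / (base[0] * base[1])


if __name__ == "__main__":
    base = snap_resolution(832, 480)
    target = snap_resolution(1280, 720)
    print(base, target, round(relative_pixel_cost(base, target), 2))
```

For example, 1280x720 has four times as many pixels as 640x360, so each frame involves roughly four times as much image data to generate.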
With the WAN 2.2 S2V Workflow, creating high-quality lip-synced videos is easier than ever. Just upload your audio and video, write a prompt, and let the workflow handle the rest, delivering smooth, cinematic results.