LTX 2.3 IMAGE TO VIDEO WITH CUSTOM AUDIO - PERFECT LIP SYNC WORKFLOW

🦙Rishappi
03/23/2026
ComfyUI
New & Trending
Lip-sync
LTX-2
Detailed Introduction

The LTX 2.3 Image-to-Video with Custom Audio Workflow is a specialized ComfyUI pipeline for creating high-quality lip-synced videos from your own custom audio or music. Built on the LTX 2.3 model, it delivers precise lip synchronization, smooth motion, and cinematic HD output, making it ideal for talking avatars, singing videos, and creative storytelling.

With just an image, an audio input, and a descriptive prompt, you can generate videos in which characters match the speech or lyrics, bringing static visuals to life with natural expressions and motion.

How It Works:

1️⃣ Upload the image you want to animate

2️⃣ Upload your custom audio or music file

3️⃣ Set the output resolution (width & height)

4️⃣ Define the audio duration

5️⃣ Write a detailed prompt describing actions (talking, dancing, walking) and camera motion

6️⃣ Click Queue to generate your video
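Step 4 asks for the audio duration. For a WAV file it can be read with Python's standard wave module instead of being estimated by ear. This is a minimal sketch, assuming an uncompressed PCM WAV file and assuming the duration field takes whole seconds (this page does not state the unit or precision the field expects). The function names are illustrative and are not part of the workflow.

```python
import math
import wave


def wav_duration_seconds(path):
    """Return the playing time of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
    return frame_count / float(sample_rate)


def duration_field_value(path):
    """Round the duration up to a whole second.

    Assumption: the workflow's duration field takes whole seconds.
    Rounding up keeps the end of the audio from being cut off.
    """
    return math.ceil(wav_duration_seconds(path))
```

For example, duration_field_value("speech.wav") on a 12.4-second file returns 13. MP3 and other compressed formats are not handled by the wave module and would need a different reader.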

This workflow integrates a free LTX text encoder API, which removes the heavy computational load normally associated with local text encoders. By offloading this step:

  • Generation runs smoother and faster
  • GPU memory usage is significantly reduced
  • Overall workflow efficiency is improved

To obtain the free API key:

Get a FREE LTX text encoder API key here: https://console.ltx.video

After logging in through the link above, create an API key in the console.

Then copy the API key into the API key field in the workflow.
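If the workflow is exported from ComfyUI in API format (a JSON object that maps node IDs to entries with "class_type" and "inputs"), the key can also be filled in by a script rather than pasted by hand. The sketch below is only an illustration: the input name "api_key" is a hypothetical placeholder, because this page does not say which node or field holds the key, so check the actual workflow before relying on it.

```python
import copy


def set_api_key(workflow, api_key, input_name="api_key"):
    """Return a copy of an API-format workflow with the key filled in.

    `input_name` is a hypothetical field name used for illustration;
    look up the real name on the text encoder node in the workflow.
    """
    if not api_key:
        raise ValueError("API key is empty")
    updated = copy.deepcopy(workflow)  # leave the caller's dict untouched
    changed = 0
    for node in updated.values():
        inputs = node.get("inputs", {})
        if input_name in inputs:
            inputs[input_name] = api_key
            changed += 1
    if changed == 0:
        raise KeyError("no node has an input named %r" % input_name)
    return updated
```

The key itself is best kept out of the saved workflow file, for instance by reading it from an environment variable at run time, so that sharing the workflow does not share the key.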

The workflow produces a fully lip-synced video, where the character’s mouth movements align accurately with the provided audio.

Key Features:

👄 Accurate Lip Sync – Matches mouth movements precisely to speech or song lyrics

🎧 Custom Audio Support – Use any voice, dialogue, or music track

🎤 Talking & Singing Capabilities – Perfect for AI presenters, performers, and virtual influencers

🎬 Prompt-Based Motion Control – Define actions, gestures, and camera movement through text

🖼️ HD Video Output – Clean, stable visuals with natural motion

♾️ Flexible Duration – Generate videos based on your audio length

Performance Tips:

  • ULTRA PRO GPU is recommended for best speed and quality
  • Higher resolutions increase generation time; lower the width and height if generation is too slow
  • Detailed prompts improve realism and motion accuracy

Ideal Use Cases:

  • AI talking avatars and presenters
  • Music videos and lyric-based animations
  • Social media content creation
  • Storytelling with character dialogue
  • Virtual influencer videos

The LTX 2.3 Image-to-Video with Custom Audio Workflow delivers a seamless way to create realistic, expressive, and perfectly lip-synced videos—combining advanced AI motion, audio alignment, and creative control inside ComfyUI.


Details
App: ComfyUI (v0.17.0)
Update Time: 03/23/2026
File Space: 11.3 GB
Models: 2
Extensions: 8