LTX 2.3 IMAGE TO VIDEO WITH CUSTOM AUDIO - PERFECT LIP SYNC WORKFLOW

🦙Rishappi

03/23/2026

ComfyUI

Lip-sync

LTX-2

Detailed Introduction

The LTX 2.3 Image-to-Video with Custom Audio Workflow is a specialized ComfyUI pipeline for creating high-quality lip-synced videos from your own audio or music. Built on the LTX 2.3 model, it delivers precise lip synchronization, smooth motion, and cinematic HD output, making it well suited to talking avatars, singing videos, and creative storytelling.

With just an image, an audio file, and a descriptive prompt, you can generate videos in which the character's mouth follows the speech or lyrics, bringing a static image to life with natural expressions and motion.

How It Works:

1️⃣ Upload the image you want to animate

2️⃣ Upload your custom audio or music file

3️⃣ Set the output resolution (width & height)

4️⃣ Define the audio duration

5️⃣ Write a detailed prompt describing actions (talking, dancing, walking) and camera motion

6️⃣ Click Queue to generate your video
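
Steps 3 and 4 involve a little arithmetic that is easy to get wrong by hand. Here is a minimal Python sketch for working out the values before you type them into the workflow. It rests on assumptions that are not stated on this page: that width and height should be multiples of 32 and that the frame count should have the form 8n + 1 (both rules come from earlier LTX-Video releases and may not apply to this workflow), and that the video runs at 24 fps. The helper names are made up for illustration, and the duration helper only reads uncompressed WAV files.

```python
import math
import wave


def snap_to_multiple(value: int, multiple: int = 32) -> int:
    """Round a dimension to the nearest multiple (never below one multiple).

    The multiple-of-32 rule is an assumption, not something this page states.
    """
    return max(multiple, ((value + multiple // 2) // multiple) * multiple)


def wav_duration_seconds(path: str) -> float:
    """Return the length of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def frames_for_duration(seconds: float, fps: int = 24) -> int:
    """Smallest frame count of the form 8n + 1 that covers the audio.

    Both the 8n + 1 form and the default fps are assumptions.
    """
    needed = math.ceil(seconds * fps)
    n = max(1, math.ceil((needed - 1) / 8))
    return 8 * n + 1


# Example usage (the file name is hypothetical):
#   duration = wav_duration_seconds("my_song.wav")
#   width, height = snap_to_multiple(1920), snap_to_multiple(1080)
#   frames = frames_for_duration(duration)
```

If the workflow's own nodes show different constraints, trust those over this sketch and change the defaults to match.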

This workflow integrates a free LTX text encoder API, which removes the heavy computational load normally associated with local text encoders. By offloading this step:

Generation runs smoother and faster
GPU memory usage is significantly reduced
Overall workflow efficiency is improved

To obtain the free API key:

Get a free LTX text encoder API key here: https://console.ltx.video

After logging in through the link above, create an API key in the console.

Then copy the key and paste it into the API key field in the workflow.
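
If you run an exported copy of the workflow from a script, you may prefer not to store the key in the file itself. The sketch below shows one possible way to fill it in at run time, assuming the workflow was exported in ComfyUI's API format (a dictionary of node IDs, each with `class_type` and `inputs`). The input name `api_key`, the environment variable `LTX_API_KEY`, and the node class in the example are all hypothetical; check the actual field name in your copy of the workflow.

```python
import copy
import os


def inject_api_key(workflow: dict, api_key: str, field: str = "api_key") -> dict:
    """Return a copy of an API-format workflow with `field` set on every
    node that already has an input of that name.

    The field name is an assumption; the original workflow is not modified.
    """
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        inputs = node.get("inputs", {})
        if field in inputs:
            inputs[field] = api_key
    return patched


# Tiny stand-in workflow; the node class name is made up for illustration.
example_workflow = {
    "1": {"class_type": "HypotheticalLTXTextEncoder",
          "inputs": {"api_key": "", "text": "a woman singing"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": "portrait.png"}},
}

# Read the key from the environment so it never lands in the saved file.
patched_workflow = inject_api_key(example_workflow, os.environ.get("LTX_API_KEY", ""))
```

Only nodes that already expose the named input are touched, so the rest of the workflow passes through unchanged.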

The workflow produces a fully lip-synced video, where the character’s mouth movements align accurately with the provided audio.

Key Features:

👄 Accurate Lip Sync – Matches mouth movements precisely to speech or song lyrics

🎧 Custom Audio Support – Use any voice, dialogue, or music track

🎤 Talking & Singing Capabilities – Perfect for AI presenters, performers, and virtual influencers

🎬 Prompt-Based Motion Control – Define actions, gestures, and camera movement through text

🖼️ HD Video Output – Clean, stable visuals with natural motion

♾️ Flexible Duration – Generate videos based on your audio length

Performance Tips:

An ULTRA PRO GPU is recommended for the best speed and quality
Higher resolutions increase generation time, so adjust the resolution to your needs
Detailed prompts improve realism and motion accuracy
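
To get a feel for how much heavier a higher resolution is, you can compare the total number of pixels generated. The sketch below is only a rough proxy, not a benchmark: it assumes the work grows in proportion to width × height × frames, and real generation time depends on the model and GPU and may grow faster than that. The baseline of 1280×720 at 121 frames is an arbitrary choice for illustration.

```python
def relative_cost(width: int, height: int, frames: int,
                  base: tuple = (1280, 720, 121)) -> float:
    """Ratio of generated pixels (width * height * frames) to a baseline.

    A crude proxy for generation cost, assumed proportional to pixel count.
    """
    base_w, base_h, base_f = base
    return (width * height * frames) / (base_w * base_h * base_f)


# Same clip length at 1920x1080 instead of 1280x720:
full_hd_ratio = relative_cost(1920, 1080, 121)
```

A ratio of 2 means roughly twice as many pixels to generate, so expect at least a proportionally longer wait.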

Ideal Use Cases:

AI talking avatars and presenters
Music videos and lyric-based animations
Social media content creation
Storytelling with character dialogue
Virtual influencer videos

The LTX 2.3 Image-to-Video with Custom Audio Workflow offers a straightforward way to create realistic, expressive, lip-synced videos, combining AI-driven motion, audio alignment, and creative control inside ComfyUI.

Details

APP	ComfyUI(v0.17.0)
Update Time	03/23/2026
File Space	11.3 GB
Models	2
Extensions	8