ACE STEP 1.5 TEXT TO MUSIC - (SUNO AT HOME) COMFYUI WORKFLOW

🦙Rishappi

02/06/2026

ComfyUI

Audio Generation

Detailed Introduction

The ACE STEP 1.5 Text-to-Music Workflow brings a new generation of open-source music creation into ComfyUI, delivering state-of-the-art text-to-music performance with exceptional speed, structure, and control. Designed as a true foundation model for music AI, ACE-Step overcomes the long-standing trade-offs between generation speed, musical coherence, and controllability found in existing approaches.

This workflow makes it possible to generate high-quality, musically coherent tracks—including vocals and instrumentals—directly from text, positioning ACE STEP 1.5 as a powerful open alternative to platforms like SUNO.

Why ACE STEP 1.5 Is Different

Traditional music generation models face hard limitations:

LLM-based systems excel at lyric alignment but are slow and prone to structural artifacts.
Diffusion-based systems are fast but often lack long-range musical coherence.

🎼 ACE STEP bridges this gap with a hybrid architecture that combines:

Diffusion-based music generation
Sana’s Deep Compression AutoEncoder (DCAE) for efficient high-fidelity audio
A lightweight linear transformer for long-range structure
Semantic representation alignment (REPA) against MERT and m-HuBERT during training, for stronger lyric–music consistency

The result is a model that delivers speed, structure, and expressiveness at the same time.

Key Capabilities:

⚡ Ultra-Fast Music Generation
Generates up to 4 minutes of music in ~20 seconds on an A100 GPU, making it up to 15× faster than LLM-based music models.

🎵 Strong Musical Coherence
Maintains consistent melody, harmony, rhythm, and structure across long durations—ideal for full-length tracks.

📝 Excellent Lyric Alignment
Accurately aligns lyrics with vocals, preserving timing, phrasing, and musical flow.

🎧 High-Fidelity Audio Output
Preserves fine-grained acoustic detail, enabling professional-quality results suitable for real creative workflows.
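
To put the speed figure above in perspective, the quoted numbers (up to 4 minutes of audio in about 20 seconds on an A100) can be turned into a real-time factor. The following is a minimal sketch of that arithmetic only; the function names are hypothetical, and the assumption that generation time scales linearly with track length is ours, not a claim made by the workflow author. Note that this factor is measured against playback time and is a different comparison from the "15× faster than LLM-based models" figure.

```python
def real_time_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per second of compute (hypothetical helper)."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


def estimate_generation_seconds(track_seconds: float, factor: float) -> float:
    """Rough generation time for a track, ASSUMING time scales linearly with length."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return track_seconds / factor


# Figures quoted in the description: 4 minutes of audio in ~20 seconds on an A100.
quoted_factor = real_time_factor(audio_seconds=4 * 60, generation_seconds=20)
print(f"Real-time factor: {quoted_factor:.0f}x")  # 240 / 20 = 12

# Under the linear-scaling assumption, a 90-second clip on the same hardware:
print(f"Estimated time for 90 s of audio: {estimate_generation_seconds(90, quoted_factor):.1f} s")
```

On other GPUs the factor will differ, so treat the result as an order-of-magnitude guide rather than a benchmark.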

A True Music Foundation Model

Rather than acting as a single-purpose text-to-music system, ACE STEP 1.5 is designed as a general-purpose foundation model for music AI. Its architecture makes it easy to build and train specialized music tools on top, unlocking a new ecosystem for artists, producers, and content creators. In spirit, ACE STEP aims to become the “Stable Diffusion moment” for music: fast, open, flexible, and creator-first.

Ideal Use Cases:

Text-to-music and lyric-based song generation
AI vocals and instrumental tracks
Music prototyping and ideation
Remixing and creative experimentation
Content creation for video, games, and media

The ACE STEP 1.5 Text-to-Music Workflow empowers you to create fast, coherent, and controllable music directly inside ComfyUI—bringing open, next-generation music AI into your creative pipeline.

Details

App	ComfyUI (v0.12.3)
Update Time	02/06/2026
File Space	13.5 MB
Models	0
Extensions	3