The ACE STEP 1.5 Text-to-Music Workflow brings a new generation of open-source music creation into ComfyUI, delivering state-of-the-art text-to-music performance with exceptional speed, structure, and control. Designed as a true foundation model for music AI, ACE-Step overcomes the long-standing trade-offs between generation speed, musical coherence, and controllability found in existing approaches.
This workflow makes it possible to generate high-quality, musically coherent tracks, including vocals and instrumentals, directly from text, positioning ACE STEP 1.5 as a powerful open alternative to platforms like SUNO.
Why ACE STEP 1.5 Is Different
Traditional music generation models face hard limitations:
- LLM-based systems excel at lyric alignment but are slow and prone to structural artifacts.
- Diffusion-based systems are fast but often lack long-range musical coherence.
ACE STEP bridges this gap with a hybrid architecture that combines:
- Diffusion-based music generation
- Sana's Deep Compression AutoEncoder (DCAE) for efficient high-fidelity audio
- A lightweight linear transformer for long-range structure
- MERT and m-HuBERT semantic alignment (REPA) during training for stronger lyric-music consistency
The result is a model that delivers speed, structure, and expressiveness at the same time.
Key Capabilities:
⥠Ultra-Fast Music Generation
Generates up to 4 minutes of music in ~20 seconds on an A100 GPU, making it up to 15Ă faster than LLM-based music models.
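As a rough sanity check on the speed figures quoted above, the short Python sketch below converts them into a real-time factor (seconds of audio produced per second of compute). The helper name `real_time_factor` is hypothetical and used only for illustration. The inputs are the "up to 4 minutes" and "~20 seconds" figures from the text, not independent measurements.

```python
def real_time_factor(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Return seconds of audio generated per second of wall-clock time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return audio_seconds / wall_clock_seconds


# Figures taken from the text: up to 4 minutes of audio in roughly 20 seconds.
audio_seconds = 4 * 60
wall_clock_seconds = 20

rtf = real_time_factor(audio_seconds, wall_clock_seconds)
print(f"About {rtf:.0f}x faster than real time")
```

Under those figures the model produces about 12 seconds of audio per second of compute. Actual throughput will depend on the GPU, the track length, and the sampler settings.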
Strong Musical Coherence
Maintains consistent melody, harmony, rhythm, and structure across long durations, which makes it well suited to full-length tracks.
Excellent Lyric Alignment
Accurately aligns lyrics with vocals, preserving timing, phrasing, and musical flow.
High-Fidelity Audio Output
Preserves fine-grained acoustic detail, enabling professional-quality results suitable for real creative workflows.
A True Music Foundation Model
Rather than acting as a single-purpose text-to-music system, ACE STEP 1.5 is designed as a general-purpose foundation model for music AI. Its architecture makes it easy to build and train specialized music tools on top, unlocking a new ecosystem for artists, producers, and content creators. In spirit, ACE STEP aims to become the "Stable Diffusion moment" for music: fast, open, flexible, and creator-first.
Ideal Use Cases:
- Text-to-music and lyric-based song generation
- AI vocals and instrumental tracks
- Music prototyping and ideation
- Remixing and creative experimentation
- Content creation for video, games, and media
The ACE STEP 1.5 Text-to-Music Workflow empowers you to create fast, coherent, and controllable music directly inside ComfyUI, bringing open, next-generation music AI into your creative pipeline.
