
ACE STEP 1.5 TEXT TO MUSIC - (SUNO AT HOME) COMFYUI WORKFLOW

🩙Rishappi
02/06/2026
ComfyUI
Audio Generation
New & Trending
Detailed Introduction

The ACE STEP 1.5 Text-to-Music Workflow brings a new generation of open-source music creation into ComfyUI, delivering state-of-the-art text-to-music performance with exceptional speed, structure, and control. Designed as a true foundation model for music AI, ACE-Step overcomes the long-standing trade-offs between generation speed, musical coherence, and controllability found in existing approaches.

This workflow makes it possible to generate high-quality, musically coherent tracks (including vocals and instrumentals) directly from text, positioning ACE STEP 1.5 as a powerful open alternative to platforms like SUNO.

Why ACE STEP 1.5 Is Different

Traditional music generation models face hard limitations:

  ‱ LLM-based systems excel at lyric alignment but are slow and prone to structural artifacts.
  ‱ Diffusion-based systems are fast but often lack long-range musical coherence.

đŸŽŒ ACE STEP bridges this gap with a hybrid architecture that combines:

  ‱ Diffusion-based music generation
  ‱ Sana’s Deep Compression AutoEncoder (DCAE) for efficient high-fidelity audio
  ‱ A lightweight linear transformer for long-range structure
  ‱ MERT and m-HuBERT semantic alignment (REPA) during training for stronger lyric–music consistency

The result is a model that delivers speed, structure, and expressiveness at the same time.

Key Capabilities:

⚡ Ultra-Fast Music Generation
Generates up to 4 minutes of music in ~20 seconds on an A100 GPU, making it up to 15× faster than LLM-based music models.
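
To put the speed figure in perspective, the quoted numbers can be turned into a real-time factor (seconds of audio produced per second of compute). The sketch below is only arithmetic on the figures stated above; the helper name `real_time_factor` is made up for illustration and is not part of ACE STEP or ComfyUI.

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Return seconds of audio generated per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# Figures quoted above: up to 4 minutes of music in roughly 20 seconds on an A100.
rtf = real_time_factor(audio_seconds=4 * 60, wall_seconds=20)
print(f"about {rtf:.0f}x faster than real time")
```

Actual throughput will depend on the GPU, track length, and sampler settings, so treat this as a best-case estimate rather than a benchmark.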

đŸŽ” Strong Musical Coherence
Maintains consistent melody, harmony, rhythm, and structure across long durations, making it ideal for full-length tracks.

📝 Excellent Lyric Alignment
Accurately aligns lyrics with vocals, preserving timing, phrasing, and musical flow.

🎧 High-Fidelity Audio Output
Preserves fine-grained acoustic detail, enabling professional-quality results suitable for real creative workflows.

A True Music Foundation Model

Rather than acting as a single-purpose text-to-music system, ACE STEP 1.5 is designed as a general-purpose foundation model for music AI. Its architecture makes it easy to build and train specialized music tools on top, opening up a new ecosystem for artists, producers, and content creators. In spirit, ACE STEP aims to become the “Stable Diffusion moment” for music: fast, open, flexible, and creator-first.

Ideal Use Cases:

  ‱ Text-to-music and lyric-based song generation
  ‱ AI vocals and instrumental tracks
  ‱ Music prototyping and ideation
  ‱ Remixing and creative experimentation
  ‱ Content creation for video, games, and media

The ACE STEP 1.5 Text-to-Music Workflow empowers you to create fast, coherent, and controllable music directly inside ComfyUI, bringing open, next-generation music AI into your creative pipeline.
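
For anyone who wants to drive the workflow from a script rather than the ComfyUI canvas, here is a minimal sketch of queuing a job through ComfyUI's HTTP `/prompt` endpoint. It assumes a local ComfyUI server on the default address `127.0.0.1:8188` and a workflow exported in API format. The `workflow` dictionary below is a hypothetical placeholder (the node id, `class_type`, and inputs are not taken from this workflow), and `build_queue_request` is an illustrative helper, not a ComfyUI function.

```python
import json
import urllib.request

# Assumption: ComfyUI is running locally on its default address.
COMFYUI_URL = "http://127.0.0.1:8188"


def build_queue_request(workflow: dict, base_url: str = COMFYUI_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request that queues an API-format workflow."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical placeholder: replace with the JSON exported from this workflow.
workflow = {
    "1": {
        "class_type": "ExampleNode",
        "inputs": {"text": "upbeat synth-pop with female vocals"},
    }
}

request = build_queue_request(workflow)

# With the server running, send it like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the payload before anything is queued.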

Details
APP: ComfyUI (v0.12.3)
Update Time: 02/06/2026
File Space: 13.5 MB
Models: 0
Extensions: 3