Comfy-WaveSpeed is an inference optimization tool designed for ComfyUI, leveraging dynamic caching and enhanced torch.compile technology to significantly boost image generation speed while maintaining high-quality output.
Core Features
1. First Block Cache (FBCache)
Caches the residual output of the first Transformer block and, when consecutive denoising steps produce sufficiently similar residuals, skips the remaining blocks entirely, achieving 1.5x to 3.0x speedups with minimal accuracy loss.
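The caching idea can be sketched in a few lines. This is an illustrative simplification, not the actual WaveSpeed implementation: compare the first block's residual against the cached one, and if the relative difference falls below the threshold, reuse the cached final output instead of running the remaining blocks. The names `FirstBlockCache`, `relative_diff`, and `run_remaining_blocks` are hypothetical.

```python
def mean_abs(xs):
    # Mean of absolute values over a flat list of floats.
    return sum(abs(x) for x in xs) / len(xs)

def relative_diff(current, cached):
    # mean(|current - cached|) / mean(|cached|), a simple relative-change metric.
    diff = [abs(a - b) for a, b in zip(current, cached)]
    return mean_abs(diff) / mean_abs(cached)

class FirstBlockCache:
    """Toy sketch of First Block Cache: skip the remaining Transformer
    blocks when the first block's residual barely changed."""

    def __init__(self, threshold=0.12):
        self.threshold = threshold
        self.cached_residual = None
        self.cached_output = None

    def step(self, residual, run_remaining_blocks):
        if (self.cached_residual is not None and
                relative_diff(residual, self.cached_residual) < self.threshold):
            # Cache hit: the residual is close enough, reuse the old output.
            return self.cached_output
        # Cache miss: run the remaining blocks and refresh the cache.
        output = run_remaining_blocks(residual)
        self.cached_residual = residual
        self.cached_output = output
        return output
```

A larger threshold skips more steps (faster, less accurate); a smaller one recomputes more often, which is the trade-off behind the recommended 0.12 below.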
2. Enhanced torch.compile
Wraps the model with torch.compile and caches the compilation artifacts, so subsequent runs skip recompilation and start faster, while remaining compatible with LoRA.
Installation Steps
Add WaveSpeed Node
In the MimicPC ComfyUI environment:
- Navigate to the custom_nodes directory (via file manager or terminal).
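From there, the node pack is typically installed by cloning its repository into `custom_nodes`. A sketch of the terminal route, assuming the publicly available Comfy-WaveSpeed repository and a standard ComfyUI directory layout:

```shell
# Run from the ComfyUI root; adjust the path if your install differs.
cd custom_nodes
git clone https://github.com/chengzeyi/Comfy-WaveSpeed.git
# Restart ComfyUI so the new wavespeed nodes are registered.
```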
Workflow Setup
In the MimicPC ComfyUI interface, build the workflow as follows:
1. Load Built-in Model
Use the Load Diffusion Model node and select flux-dev.safetensors (pre-installed on MimicPC).
2. Enable First Block Cache
1. Add the wavespeed->Apply First Block Cache node, connecting it to the model output.
2. Set residual_diff_threshold to 0.12 (recommended for FP8 quantization, 28 steps).
3. Add Compilation Boost (Optional)
Insert the wavespeed->Compile Model+ node, setting mode to max-autotune.
4. Generate Image
1. Connect KSampler and VAE Decode, then click Queue Prompt to generate the image.
Performance Results (A10 24GB Machine)
Tested on MimicPC with an A10 24GB GPU, using the built-in Flux.1-dev (FP8 quantization, 28 steps, 1024x1024 resolution):
- Without WaveSpeed Node
  - Generation Time: 73.08 seconds
  - Standard process, no optimizations.
- With WaveSpeed Acceleration
  - Configuration: Apply First Block Cache (residual_diff_threshold=0.12) + Compile Model+ (mode=max-autotune).
  - Generation Time: 46.02 seconds
  - Speedup: approximately 1.59x, saving about 27 seconds.