
Run Flux.1 NF4 on WebUI Forge: Maximizing Speed and Performance

mimicpc
08/28/2024
Maximize AI image generation speed and performance with Flux.1 NF4 on WebUI Forge. Learn how to optimize MimicPC configurations for rapid, high-quality results.

With a recent update from MimicPC, Stable Diffusion WebUI Forge now fully supports the Flux.1 model, offering users an enhanced experience in AI-driven image generation. The update brings significant gains in speed and precision, particularly when using the NF4 format. In this blog post, we’ll explore the advantages of running Flux.1 NF4 on Stable Diffusion WebUI Forge, focusing on speed improvements and on how to optimize performance across different MimicPC hardware configurations.


NF4 vs. FP8: A Comparison in Speed and Efficiency

Flux.1 checkpoints for WebUI Forge come in two primary formats: NF4 and FP8. Each has its own advantages, but NF4 stands out for its speed and efficiency.

  1. Speed Advantage: NF4 is significantly faster than FP8, especially on GPUs with limited VRAM. For instance, on an RTX 3070 Ti with 8GB of VRAM, NF4 cuts the time per iteration from 8.3 seconds (FP8) to just 2.15 seconds, a 3.86x speedup (8.3 / 2.15 ≈ 3.86). This makes NF4 the best choice for users who want fast image generation.
  2. Memory Efficiency: NF4 checkpoint files are about half the size of their FP8 counterparts, making them more storage-efficient and faster to load.
  3. Precision and Dynamic Range: While FP8 can sometimes offer higher precision, NF4 generally retains detail and dynamic range better. This comes from how NF4 compresses tensors: instead of applying one scale to a whole tensor, it quantizes the weights in small blocks, each with its own scaling factor, which keeps both storage and computation efficient.
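To make points 2 and 3 more concrete, here is a minimal pure-Python sketch of blockwise NF4 quantization, the general scheme bitsandbytes uses for NF4 weights. It is an illustration, not Forge or bitsandbytes code. Assumptions: the 16 code values are the published NF4 table rounded to four decimals, each block of 64 weights stores one float32 scale (its absolute maximum), and the scales are not further compressed ("double quantization" is ignored). The size estimate uses the published figure of roughly 12 billion parameters for Flux.1 and covers the transformer weights only; actual checkpoint files may also bundle text encoders and a VAE, so their size ratio can differ.

```python
# Illustrative sketch of blockwise NF4 quantization in pure Python.
# Assumptions (not taken from Forge or bitsandbytes source):
#   - NF4_CODEBOOK is the published 16-value NF4 table, rounded to 4 decimals;
#   - each block of `blocksize` weights stores one float32 absmax scale;
#   - no double quantization of the scales.

NF4_CODEBOOK = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]


def quantize_blockwise(weights, blocksize=64):
    """Return one (codes, absmax) pair per block; codes are indices 0..15."""
    blocks = []
    for start in range(0, len(weights), blocksize):
        block = weights[start:start + blocksize]
        # Per-block scale: a block of tiny weights is not drowned out by large ones.
        absmax = max(abs(w) for w in block) or 1.0
        codes = [
            min(range(16), key=lambda i: abs(NF4_CODEBOOK[i] - w / absmax))
            for w in block
        ]
        blocks.append((codes, absmax))
    return blocks


def dequantize_blockwise(blocks):
    """Map 4-bit codes back to floats using each block's absmax."""
    out = []
    for codes, absmax in blocks:
        out.extend(NF4_CODEBOOK[c] * absmax for c in codes)
    return out


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


def weight_gigabytes(num_params, bits_per_param):
    """Storage for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # 64 tiny weights followed by 64 large ones, as in a layer with mixed ranges.
    weights = [0.001 * i for i in range(1, 65)] + [10.0 * i for i in range(1, 65)]
    per_block = dequantize_blockwise(quantize_blockwise(weights, blocksize=64))
    one_scale = dequantize_blockwise(quantize_blockwise(weights, blocksize=128))
    print(max_abs_error(weights[:64], per_block[:64]))  # below 0.01
    print(max_abs_error(weights[:64], one_scale[:64]))  # about 0.064: tiny weights collapse to 0

    # Rough transformer-weight sizes for a ~12B-parameter model such as Flux.1.
    params = 12e9
    print(weight_gigabytes(params, 8))            # FP8: 12.0
    print(weight_gigabytes(params, 4 + 32 / 64))  # NF4 + one float32 scale per 64: 6.75
```

With a separate scale per block, the tiny weights keep their detail; with one scale for everything, they all round to zero. The 4.5 bits per parameter (4 bits of code plus 32 bits of scale shared by 64 weights) is why NF4 weights come out at a little over half the size of FP8.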

MimicPC Hardware Recommendations: Getting the Most Out of NF4

MimicPC offers a range of hardware configurations designed to cater to different needs. Here’s how to leverage NF4’s advantages on each of these setups:

  1. Medium (T4 16GB VRAM | 16GB RAM):
    • Recommendation: The NF4 checkpoint is ideal for this setup. With 16GB of VRAM, you get NF4’s speed improvements, and you can fine-tune the GPU weights and swap settings in WebUI Forge for optimal performance.
  2. Large (A10G 24GB VRAM | 16GB RAM):
    • Recommendation: This configuration’s 24GB of VRAM allows for handling larger models with ease. The NF4 checkpoint is well-suited here, enabling faster processing without sacrificing image quality. Consider increasing GPU weights to fully utilize the available VRAM.
  3. Large-Pro (A10G 24GB VRAM | 32GB RAM):
    • Recommendation: With additional RAM, the Large-Pro setup is perfect for more complex projects. Use the NF4 format and enable the Async swap method for even faster processing; if you run into instability, switch back to Queue.
  4. Ultra (L40S 48GB VRAM | 32GB RAM):
    • Recommendation: The Ultra model, with its 48GB of VRAM, is built for power users. Here, you can push NF4 to its limits, maxing out GPU weights and using advanced settings to handle multi-layered, complex scenes with ease.
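To keep the recommendations above in one place, here is a small, hypothetical Python helper that encodes them. The tier data (GPU, VRAM, RAM) comes straight from the list; the function and key names are made up for illustration and are not a MimicPC or Forge API. One assumption is labeled in the code: tiers for which this guide does not suggest Async are given Queue, the more conservative swap method.

```python
# Hypothetical helper that encodes this guide's per-tier advice.
# Tier data comes from the MimicPC list above; names are illustrative only
# and are not a MimicPC or Forge API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    gpu: str
    vram_gb: int
    ram_gb: int


TIERS = [
    Tier("Medium", "T4", 16, 16),
    Tier("Large", "A10G", 24, 16),
    Tier("Large-Pro", "A10G", 24, 32),
    Tier("Ultra", "L40S", 48, 32),
]

# GPU Weights advice per VRAM size, paraphrasing the recommendations above.
GPU_WEIGHTS_ADVICE = {
    16: "start mid-range, then tune",
    24: "increase to use the available VRAM",
    48: "max out",
}


def suggested_settings(tier):
    return {
        "checkpoint": "flux1-dev-bnb-nf4-v2",
        # Assumption: Queue is kept wherever the guide does not suggest Async.
        "swap_method": "Async" if tier.name in ("Large-Pro", "Ultra") else "Queue",
        "gpu_weights": GPU_WEIGHTS_ADVICE[tier.vram_gb],
    }


for tier in TIERS:
    print(tier.name, suggested_settings(tier))
```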

Diffusion in Low Bits: Choosing the Right Setting

In WebUI Forge, you can force the type used to load the weights with the "Diffusion in Low Bits" setting. The options include Auto, nf4, fp8e4, fp4, and fp8e5.

However, in most cases you can simply leave this option on Auto, which uses the precision stored in the checkpoint you downloaded (for example, NF4 for an NF4 checkpoint). That way you run the checkpoint in the format it was built for, without adjusting the configuration manually.
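The fp8e4 and fp8e5 options differ in how they split 8 bits between exponent and mantissa. Assuming they refer to the float8 e4m3fn and e5m2 layouts (the formats PyTorch exposes as `torch.float8_e4m3fn` and `torch.float8_e5m2`), the sketch below computes each format's largest value, smallest normal value, and relative step size from its bit layout. It shows the basic trade-off: e4m3fn has finer steps but a much smaller range, while e5m2 covers a huge range with coarser steps. With a recent PyTorch, `torch.finfo(torch.float8_e4m3fn).max` should report the same 448.0 as a cross-check.

```python
# Worked example: range vs. precision of the two FP8 layouts.
# Assumption: "fp8e4" and "fp8e5" refer to float8 e4m3fn and e5m2.

def fp8_properties(exp_bits, man_bits, finite_only):
    """Return (largest finite value, smallest normal value, relative step size)."""
    bias = 2 ** (exp_bits - 1) - 1
    top_exp_field = 2 ** exp_bits - 1
    if finite_only:
        # e4m3fn has no infinities: only exponent=all-ones with mantissa=all-ones
        # is NaN, so the top exponent is usable with mantissa up to all-ones minus one.
        largest_mantissa = 1 + (2 ** man_bits - 2) / 2 ** man_bits
        largest_exp = top_exp_field - bias
    else:
        # e5m2 follows IEEE conventions: the all-ones exponent is reserved for inf/NaN.
        largest_mantissa = 1 + (2 ** man_bits - 1) / 2 ** man_bits
        largest_exp = top_exp_field - 1 - bias
    largest = largest_mantissa * 2 ** largest_exp
    smallest_normal = 2.0 ** (1 - bias)
    relative_step = 2.0 ** -man_bits  # gap between 1.0 and the next representable value
    return largest, smallest_normal, relative_step


print("e4m3fn", fp8_properties(4, 3, finite_only=True))   # (448.0, 0.015625, 0.125)
print("e5m2  ", fp8_properties(5, 2, finite_only=False))  # (57344.0, 6.103515625e-05, 0.25)
```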

Optimizing NF4 on Stable Diffusion WebUI Forge

No matter which MimicPC model you’re using, the following settings will help you optimize the performance of Flux.1 NF4 on Stable Diffusion WebUI Forge:

  1. Swap Location:
    • CPU Swap: This method offloads part of the model to CPU memory when VRAM is insufficient. It’s reliable but slower.
    • Shared Memory Swap: For MimicPC models with ample RAM, consider using shared memory swap, which can be up to 15% faster than CPU swap, although it may cause instability on some systems.

  2. GPU Weights Slider: Adjust the GPU weights according to your project needs. Larger weights increase speed but require more VRAM. For most MimicPC configurations, start with a mid-range setting and adjust based on performance.
  3. Swap Method:
    • Queue: This method processes layers sequentially, providing stable and predictable performance.
    • Async: Ideal for higher-end MimicPC models like Large-Pro and Ultra, Async can accelerate processing but requires careful GPU memory management.
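The trade-off behind the GPU Weights slider can be sketched as simple arithmetic. The helper below is hypothetical and rests on one labeled assumption: the slider caps how many MB of model weights stay resident in VRAM, the rest of the model lives at the swap location and is moved in as needed, and whatever VRAM the weights do not occupy is left for computation. The model size used in the example is a made-up round number, not a measured value.

```python
# Hypothetical budget helper for the GPU Weights slider.
# Assumption: the slider caps how many MB of model weights stay resident in VRAM;
# weights beyond the cap live at the swap location and are moved in as needed.

def vram_budget(total_vram_mb, gpu_weights_mb, model_mb):
    """Return (resident MB, swapped MB, MB left for computation)."""
    if not 0 <= gpu_weights_mb <= total_vram_mb:
        raise ValueError("GPU weights must be between 0 and total VRAM")
    resident_mb = min(model_mb, gpu_weights_mb)
    swapped_mb = model_mb - resident_mb
    free_for_compute_mb = total_vram_mb - resident_mb
    return resident_mb, swapped_mb, free_for_compute_mb


# Hypothetical numbers: a 6,750 MB model on an 8 GB card vs. a 16 GB card.
print(vram_budget(8192, 5000, 6750))    # (5000, 1750, 3192)
print(vram_budget(16384, 12000, 6750))  # (6750, 0, 9634)
```

Under this model, raising the slider shrinks the swapped portion (fewer transfers, so faster steps) but also shrinks the memory left for computation; push it too far on a small card and generation can run out of memory.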

Distilled CFG Guidance

Flux-dev is a guidance-distilled model. The recommended setup is CFG = 1 with no negative prompt; guidance strength is controlled through “Distilled CFG Guidance” instead, whose default value is 3.5.

Note that when CFG is set to 1, the negative prompt field is greyed out in the UI.
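Why does CFG = 1 make the negative prompt irrelevant? Standard classifier-free guidance combines two predictions per step: result = uncond + CFG × (cond − uncond), where `uncond` comes from the negative (or empty) prompt and `cond` from your prompt. At CFG = 1 this simplifies to `cond`, so the negative prompt cancels out entirely. The sketch below shows that arithmetic on plain lists with made-up values; it is the generic CFG formula, not Forge's sampler code. Flux-dev, having been trained with guidance distillation, instead takes the Distilled CFG value as an input to the model itself.

```python
# Classifier-free guidance (CFG) combination step, element-wise on plain lists.
# `uncond` stands for the prediction driven by the negative/empty prompt and
# `cond` for the prediction driven by the positive prompt. Values are made up.

def cfg_combine(uncond, cond, cfg_scale):
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]
cond = [0.5, -0.25]
print(cfg_combine(uncond, cond, 1.0))  # [0.5, -0.25]: identical to cond
print(cfg_combine(uncond, cond, 3.5))  # [1.75, -3.375]: pushed away from uncond
```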

Generate images with NF4

UI select: flux

Checkpoint select: flux1-dev-bnb-nf4-v2

Astronaut in a jungle, cold color palette, muted colors, very detailed, sharp focus

Steps: 20, Sampler: Euler, Schedule type: Simple, CFG scale: 1, Distilled CFG Scale: 3.5, Seed: 12345, Size: 896x1152, Model: flux1-dev-bnb-nf4-v2
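If you want to reproduce these settings or compare runs, it can help to turn a parameter line like the one above into a dictionary. Here is a minimal, naive sketch; the function name is hypothetical, and it assumes no value contains ", " (true for the parameter lines in this guide, but not for arbitrary text such as prompts, which would need a real parser).

```python
# Naive parser for the generation-parameter lines shown in this guide.
# Assumption: no value contains ", " (true here, not in general).

def parse_generation_params(line):
    params = {}
    for item in line.split(", "):
        key, _, value = item.partition(": ")
        params[key.strip()] = value.strip()
    return params


line = ("Steps: 20, Sampler: Euler, Schedule type: Simple, CFG scale: 1, "
        "Distilled CFG Scale: 3.5, Seed: 12345, Size: 896x1152, "
        "Model: flux1-dev-bnb-nf4-v2")
params = parse_generation_params(line)
width, height = (int(v) for v in params["Size"].split("x"))
print(params["Seed"], width, height)  # 12345 896 1152
# Sanity check against the Distilled CFG advice: CFG scale should be 1.
assert params["CFG scale"] == "1"
```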

This produces an image like the following:

Black Myth: Wukong has been taking the world by storm lately, so let's see what kind of Wukong NF4 has in store for us!

Chinese mythology, the Monkey King wukong, wearing a golden hoop spell, holding a golden rod, riding a somersault cloud, soaring in the heavenly palace

Steps: 20, Sampler: Euler, Schedule type: Simple, CFG scale: 1, Distilled CFG Scale: 3.5, Seed: 3107193459, Size: 896x1152, Model hash: bea01d51bd, Model: flux1-dev-bnb-nf4-v2, Version: f2.0.1v1.10.1-previous-361-g65ec461f

Well, a happy monkey who hasn't experienced the Black Myth.

Girl, 20 years old, HD close-up photo of face, Disney style, very detailed

Steps: 20, Sampler: Euler, Schedule type: Simple, CFG scale: 1, Distilled CFG Scale: 3.5, Seed: 3107193459, Size: 896x1152, Model hash: bea01d51bd, Model: flux1-dev-bnb-nf4-v2, Version: f2.0.1v1.10.1-previous-361-g65ec461f

European vintage style living room with black wooden furniture, brown wooden floor, large floor-to-ceiling windows, brown leather sofa, crystal chandelier, white carved plaster ceiling

Steps: 20, Sampler: Euler, Schedule type: Simple, CFG scale: 1, Distilled CFG Scale: 3.5, Seed: 2503002636, Size: 896x1152, Model hash: bea01d51bd, Model: flux1-dev-bnb-nf4-v2, Version: f2.0.1v1.10.1-previous-361-g65ec461f

Conclusion

With MimicPC’s latest update to Stable Diffusion WebUI Forge, using the Flux.1 model, especially in the NF4 format, has never been more powerful. By matching your settings to your hardware, you can fully exploit NF4’s speed and efficiency and make your image generation workflow faster and more effective. Whether you’re using a Medium setup or the Ultra model, this guide should help you get the most out of Flux.1 NF4 and push your creative limits further than ever before.

Explore these updates today with MimicPC and experience the next level of AI-driven creativity.

