Auto1111 Tutorial Chapter 1: Basic Theory and Software Foundation

The basic concept and theory of AI art

Early Research and Limitations

In 2012, academics began exploring image generation using deep learning models. Initially, the resolution and content of the generated images were disappointing.

Recent Advancements

In recent years, AI-generated images have significantly improved in quality and accuracy, often containing notable aesthetic value. Today, AI art is recognized as a valuable tool for artists.

The Unique Process of AI Art

Diffusion: A New Approach

AI art doesn't follow the traditional human painting process (drafting, outlining, detailing). Instead, it utilizes a method called "diffusion," which is more effective for generating images.

Step 1: Adding Noise

Noise is added to the image until it becomes indistinct. This dissociation from the original image creates a larger imaginative space; the process is known as "diffusion."

Step 2: Deep Learning Extraction

The model uses deep learning to extract characteristic features from many different images and learns to connect them with the original image.

Step 3: Noise Removal and Style Transformation

The noise is then removed step by step, and the image is repainted in a different style. This allows a realistic image to be transformed into various styles, such as a cartoon.
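To make the noising step concrete, here is a toy sketch in Python with NumPy. The linear noise schedule and all values are illustrative only; real models such as Stable Diffusion use carefully tuned schedules and operate in a learned latent space.

    import numpy as np

    def add_noise(image, t, num_steps=1000):
        """Blend an image with Gaussian noise according to timestep t."""
        # Illustrative linear schedule; real models use tuned schedules.
        betas = np.linspace(1e-4, 0.02, num_steps)
        alphas_cumprod = np.cumprod(1.0 - betas)
        a = alphas_cumprod[t]
        noise = np.random.randn(*image.shape)
        # Standard diffusion forward step: x_t = sqrt(a)*x_0 + sqrt(1-a)*noise
        return np.sqrt(a) * image + np.sqrt(1.0 - a) * noise

    image = np.random.rand(64, 64, 3)   # stand-in for a real image
    early = add_noise(image, t=50)      # mostly image, a little noise
    late = add_noise(image, t=950)      # almost pure noise

At early timesteps the output still resembles the input; by the final timesteps it is essentially pure noise, which is the state the reverse (denoising) process starts from.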




The Terminology of AI Art

AI-Generated Image vs. AI Art

Since AI art doesn't paint in the traditional way, the term "AI-generated image" is more accurate than "AI art."

For example, the original image on the left could be regenerated through AI to produce the result on the right. The styles of the two images are completely different, which is why "AI-generated image" is more accurate than "AI art".


The Emergence of AI Art Tools


If you are a graphic designer, you need to learn Photoshop. If you are a video editor, you need to learn Premiere or Final Cut. So here is the question for AI art: which tool should you use, and which software do you need to learn?

Exploring AI Art Tools

In the world of AI art, there are various applications available, such as Midjourney, Fooocus, and NovelAI. However, many of these apps rely on the vendor's algorithms and effects, and require expensive subscriptions to create a larger number of images, which is a limiting factor for users.

The Rise of Stable Diffusion

In August 2022, a groundbreaking application named "Stable Diffusion" revolutionized the quality of AI art through its iterative denoising algorithm. It brought significant improvements, generating outputs in seconds and running on any computer with an ordinary graphics card.

In June 2024, Stable Diffusion 3 was officially launched. It represents a significant advancement in generative models, offering improved image quality, better text-image alignment, and enhanced performance. Its applications span many fields, providing valuable tools for creativity, research, and industry. As the technology continues to evolve, it promises to further transform how we generate and interact with visual content.

Capabilities of Stable Diffusion

Stable Diffusion allows users to create a wide variety of images in different styles, such as manga, 3D models, and realistic photos. The precision in controlling the art style is enhanced through tools like LoRA and ControlNet.

Advantages of Open-Source Software

The most important aspect of Stable Diffusion is that it is open-source software. This means users can operate the entire system on their own computers for free and without any limitations. As a result, most AI art application tools are now developed based on Stable Diffusion, leveraging its powerful capabilities and flexibility.

In conclusion, while there are several AI art tools available, Stable Diffusion stands out due to its efficiency, versatility, and open-source nature. For those looking to delve into AI art, learning to use Stable Diffusion and its associated tools like LoRA and ControlNet will be invaluable.


Configuration


Computer Requirements:

You need a computer running either Windows or macOS. Stable Diffusion cannot be run on mobile phones.

If your local computer lacks sufficient hardware resources, consider using a cloud service such as Google Colab, RunDiffusion, or MimicPC, which provide remote access to powerful computing resources.

Graphics Card (GPU):

NVIDIA GPUs are highly recommended for Stable Diffusion due to their excellent performance in AI tasks.

For optimal performance, use a dedicated (discrete) graphics card rather than a GPU integrated into the CPU.

Different NVIDIA GPU models vary in performance (compute throughput and VRAM), which directly affects the speed and efficiency of AI image generation.

Storage:

Adequate storage is essential for storing datasets, model checkpoints, and generated images.

Note that output resolution and the complexity of the models you can run are limited mainly by GPU memory (VRAM); disk storage determines how many models, datasets, and outputs you can keep.

Ensure you have enough free storage space to accommodate large datasets and outputs.

Configuration Requirements:

Before running Stable Diffusion, verify that your system meets the minimum hardware and software requirements specified by the application.

Install necessary drivers for your GPU and ensure they are up to date. Stable Diffusion often requires specific versions of CUDA and cuDNN for optimal performance.

Cloud GPU services can circumvent these hassles, and it's also vital to choose the right cloud service for you.

Optimizing Performance:

Monitor GPU utilization and temperature to ensure stable operation and prevent overheating.

Consider optimizing GPU settings and memory allocation based on the recommendations provided by Stable Diffusion or similar applications.

Use GPU monitoring tools to adjust settings for maximum efficiency during image generation.
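As a concrete example of such monitoring, here is a minimal Python sketch using the nvidia-ml-py bindings (pip install nvidia-ml-py); it assumes an NVIDIA GPU and driver are present. The nvidia-smi command-line tool reports the same information.

    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU utilization: {util.gpu}%  temperature: {temp} C  "
          f"VRAM: {mem.used / 1024**3:.1f}/{mem.total / 1024**3:.1f} GiB")
    pynvml.nvmlShutdown()

Running a check like this during generation makes it easy to spot VRAM exhaustion or thermal throttling before they cause crashes or slowdowns.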


How to Install and Launch the Stable Diffusion WebUI


For Users Familiar with Python and Git:

  1. Download and Install via Git:
    • Clone the official repository: git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
    • Navigate into the cloned directory (stable-diffusion-webui) and launch the WebUI with webui-user.bat on Windows or ./webui.sh on Linux/macOS, as described in the repository documentation.

For Beginners or Those Using an Integration Pack:

  1. Download and Install Using an Integration Pack:
    • Visit the official Stable Diffusion website or an integration pack provider such as AUTOMATIC1111: https://github.com/AUTOMATIC1111/stable-diffusion-webui
    • Download the integration package (typically a zip file).
    • Use decompression software such as Bandizip or WinRAR to extract the downloaded zip file.
    • Alternatively, use a cloud service vendor, which generally provides Stable Diffusion pre-built and set up.
  2. Setting Up Stable Diffusion:
    • Create a new folder on your computer for Stable Diffusion. Ensure the folder path contains only English characters and has sufficient local disk storage.
    • Unzip the contents of the downloaded zip file into this new folder.
  3. Running Stable Diffusion:
    • Locate and double-click the run.bat file within the Stable Diffusion folder.
    • Wait for the application to load; this may take a moment.
    • The Stable Diffusion WebUI homepage should automatically open in your default web browser.
  4. Operating Stable Diffusion:
    • Keep the command-line interface (CLI) window open the entire time you are using the Stable Diffusion WebUI; closing it shuts the application down.
    • Interact with the WebUI through your browser to generate AI images or perform other tasks.
    • When you are done working in the browser, close the CLI window to shut everything down.
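Tip: if you installed via Git rather than an integration pack, the launcher is webui-user.bat (or webui-user.sh) instead of run.bat, and startup flags go in that file. A minimal sketch of the Windows launcher with two real WebUI flags added (--api enables the HTTP endpoints used in the scripting example later in this chapter; --autolaunch opens the browser automatically):

    @echo off
    set PYTHON=
    set GIT=
    set VENV_DIR=
    set COMMANDLINE_ARGS=--api --autolaunch
    call webui.bat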


Key Functions for AI Image Generation:

Now we are going to use the WebUI to start the learning journey!

  1. Text to Image (txt2img):
    • This function takes a text prompt and generates an image based on the description provided.

  2. Image to Image (img2img):
    • This function generates a new image from an existing one, often modifying or enhancing it.

Additional Tools and Features:

  1. Upscaling (Extras Tab):
    • The Extras tab is primarily used for enlarging (upscaling) images and processing them with AI to achieve clearer and more detailed results.
  2. Settings:
    • This section allows users to configure output parameters (such as resolution, format, and quality) and specify the save path for generated images.

  3. Extensions:
    • Extensions enable the installation of additional features, such as LoRA extensions, to expand the capabilities of the AI image generation tool.


  4. Gallery Browser:
    • The Gallery Browser is used to view and manage generated images and their associated data, providing a comprehensive overview of all outputs.


Steps to Generate an AI Image


  1. Download and Select the Model:
    • Start by downloading your preferred AI model and selecting it within the image generation tool. The chosen model will define the overall style of the generated image.
  2. Text to Image:
    • Click on "Text to Image" to begin the image generation process.
  3. Input Prompts:
    • Positive Prompt: the key elements you want in your image. Example: a girl, blonde, having coffee, instagram photo, (masterpiece:1.2), best quality, highres, original, extremely detailed wallpaper, perfect lighting, (extremely detailed CG:1.2), looking at viewer, close-up, upper body
    • Negative Prompt: the elements to avoid in your image. Example: NSFW, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, (ugly:1.331), (duplicate:1.331), (morbid:1.21), (mutilated:1.21), mutated hands, (poorly drawn hands:1.5), blurry, (bad anatomy:1.21), (bad proportions:1.331), extra limbs, (disfigured:1.331), (missing arms:1.331), (extra legs:1.331), (fused fingers:1.61051), (too many fingers:1.61051), (unclear eyes:1.331), bad hands, missing fingers, extra digit, (((extra arms and legs)))
  4. Adjust Settings:
    • Configure the settings to fine-tune the output:
      • Steps: 20
      • Sampler: DPM++ SDE Karras
      • CFG Scale: 8
      • Seed: 1315345756
      • Face Restoration: GFPGAN
      • Size: 800x450
      • Model Hash: 038ba203d8
      • Clip Skip: 2
      • ENSD: 31337
  5. Generate and Save Image:
    • After setting up everything, initiate the generation process.
    • Once the image is generated, right-click on the image and choose "Save As" to download it to your desired location.
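If you prefer to script these steps instead of clicking through the browser, the WebUI exposes an HTTP API when launched with the --api flag. Below is a minimal Python sketch using the requests library, reusing the settings from step 4 (prompts shortened for readability). The endpoint and field names follow the WebUI's txt2img API, but details can vary between versions, so treat this as illustrative.

    import base64
    import requests

    # Settings from step 4, sent to a locally running WebUI started
    # with the --api flag (default address and port shown below).
    payload = {
        "prompt": "a girl, blonde, having coffee, instagram photo, "
                  "(masterpiece:1.2), best quality, highres",
        "negative_prompt": "NSFW, (worst quality:2), (low quality:2), lowres",
        "steps": 20,
        "sampler_name": "DPM++ SDE Karras",
        "cfg_scale": 8,
        "seed": 1315345756,
        "restore_faces": True,  # the restoration model (e.g. GFPGAN) is chosen in Settings
        "width": 800,
        "height": 450,
    }

    response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
    response.raise_for_status()

    # The API returns images as base64-encoded strings.
    for i, image_b64 in enumerate(response.json()["images"]):
        with open(f"output_{i}.png", "wb") as f:
            f.write(base64.b64decode(image_b64))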




The basic concept of prompt

A prompt is the text or information provided by the user to guide an AI model in generating an image based on specific requirements. Essentially, a prompt is the language through which the user communicates their desired outcome to the AI.

In the context of AI image generation, "text to image" refers to the process where the entire generation is based solely on textual descriptions. On the other hand, "image to image" involves using an initial image to convey information, although prompts remain an important element in this process as well.

The scope of a prompt is broad, encompassing everything from the title of the image to its various characteristics and details.



The basic logic of prompt

Firstly, prompts must be written in English. Secondly, prompts should be composed of phrases rather than complete sentences, with phrases separated by commas.

Here are some example prompts to help you generate better images:

Character & Main Characteristics

  • Clothing: white dress
  • Hair style & color: blonde hair, long hair, short hair
  • Face details: small eyes, big mouth, large nose
  • Expression: smiling, crying
  • Body language: stretching arms, standing

Environment Details

  • Indoor/Outdoor
  • Main environment: forest, city, street
  • Details: tree, bush, white flower, day/night, morning, sunset, sunlight, blue sky

Composition

  • Distance: close-up, distant
  • Proportion: full body, upper body

Quality

  • High quality: best quality, ultra-detailed, masterpiece, high-res, 8k
  • Specified high detail: extremely detailed CG unity 8k wallpaper, unreal engine rendered

Painting Style

  • Illustration style: painting, illustration, drawing
  • Two-dimensional: anime, comic, game CG
  • Realistic: photorealistic, realistic
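Putting these categories together, one possible prompt (an illustrative combination of the phrases above, not a prescribed recipe) could be:

    a girl, white dress, blonde hair, long hair, smiling, standing, outdoor, forest, sunset, sunlight, full body, best quality, masterpiece, ultra-detailed, anime

Notice how it reads as comma-separated phrases covering character, environment, composition, quality, and style rather than as full sentences.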




Weight of prompt

The effect of weight is to increase or decrease the priority of certain prompts. For example, if you provide many prompts to the AI, it might overlook some due to the sheer number. Therefore, you can add weight to the prompts you want to emphasize most in the image.

There are two ways to adjust the weight:

  1. Using Parentheses:
    • Wrapping a prompt in parentheses increases its weight by a factor of 1.1. Each additional level of nesting multiplies the weight by 1.1 again, so triple parentheses raise it to 1.1³ ≈ 1.331.
    • Example: (green flowers), (((green flowers)))
  2. Using Numbers:
    • Adding a number after a colon sets the prompt's weight directly to that factor.
    • Example: (green flowers:1), (white flowers:1.5), (purple flowers:1.5)

However, it's important to avoid setting the weight too high, as it might cause distortion in the image. A reasonable range is from 0.5 to 1.5.
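Since each level of parentheses multiplies the weight by 1.1, the effective weight is simply 1.1 raised to the nesting depth. A quick Python check of the numbers quoted above:

    # Each pair of parentheses multiplies the prompt's weight by 1.1,
    # so n levels of nesting give a weight of 1.1 ** n.
    for n in range(1, 4):
        print(f"{'(' * n}green flowers{')' * n} -> weight {1.1 ** n:.3f}")
    # (green flowers) -> 1.100, ((green flowers)) -> 1.210,
    # (((green flowers))) -> 1.331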




Negative Prompt

Negative prompts specify the elements you do not want to appear in your image. Conversely, positive prompts list the elements you want to include.

Here are some common negative prompts:

  • Low quality: low quality, low resolution
  • Single color: monochrome, grayscale
  • Body & face: ugly, bad proportions, short
  • Body details: missing hands, extra fingers


Parameter setting

The higher the number of steps, the more detailed the generated image will be. However, once the steps exceed 20, the changes in the image become minimal. There is only a slight difference between images generated with 20 and 40 steps. Additionally, increasing the steps adds to the algorithm's processing time. Therefore, 20 steps is the default option, while the recommended range is 10 to 30 steps.

The sampler refers to the specific algorithm the AI uses to generate images. The WebUI offers more than 10 algorithms, but typically only 4-5 are commonly used. Different algorithms have distinct characteristics. For example, Euler and Euler a are suitable for illustration styles, while DPM++ 2M and DPM++ 2M Karras generate faster. Samplers whose names include "++" are recommended for their stability. Some models are optimized for specific samplers, making the recommended sampler the best choice.


If the resolution is too low, the generated image will be blurry and lack detail. If the resolution is too high, the algorithm will run slower. Therefore, it is crucial to test various resolutions to find the optimal balance between quality and efficiency.

Face restoration is recommended, as it can enhance the faces of some characters. Tiling should not be enabled unless you are generating repeating patterns. The safe range for the CFG scale is between 7 and 12.


Batch generation is a useful feature for producing multiple images at once. You can set the batch count to produce as many images as you need, but it is recommended to keep the number of images per batch at 1.
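For reference, the txt2img API shown earlier exposes the same two knobs; n_iter and batch_size are the field names used by the WebUI API for batch count and images per batch. A one-line sketch extending the earlier payload:

    # Generate four images as four batches of one image each.
    payload.update({"n_iter": 4, "batch_size": 1})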



