Auto1111 Tutorial Chapter 2 Text to Image Introduction & Prompt Basis

mimicpc

05/30/2025

Stable Diffusion 3

Unlock the power of prompts in AI image generation with Auto1111 Chapter 2. Discover the essentials of crafting effective prompts, adjusting prompt weight, and managing negative prompts for exceptional AI-created visuals.

The basic concept of prompt

A prompt is the text or information provided by the user to guide an AI model in generating an image based on specific requirements. Essentially, a prompt is the language through which the user communicates their desired outcome to the AI.

In the context of AI image generation, "text to image" refers to the process where the entire generation is based solely on textual descriptions. On the other hand, "image to image" involves using an initial image to convey information, although prompts remain an important element in this process as well.

The scope of a prompt is broad, covering everything from the main subject of the image to its individual characteristics and details.

The basic logic of prompt

First, prompts must be written in English. Second, prompts should be built from short phrases rather than complete sentences, with the phrases separated by commas.

Here are some example prompts to help you generate better images:

Character & Main Characteristics

Clothing: white dress
Hair style & color: blonde hair, long hair, short hair
Face details: small eyes, big mouth, large nose
Expression: smiling, crying
Body language: stretching arms, standing

Environment Details

Setting: indoor, outdoor
Main environment: forest, city, street
Details: tree, bush, white flower, day/night, morning, sunset, sunlight, blue sky

Composition

Distance: close-up, distant
Proportion: full body, upper body

Quality

High quality: best quality, ultra-detailed, masterpiece, high-res, 8k
Specified high detail: extremely detailed CG unity 8k wallpaper, unreal engine rendered

Painting Style

Illustration style: painting, illustration, drawing
Two-dimensional: anime, comic, game CG
Realistic: photorealistic, realistic
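
The category lists above can be combined into a single comma-separated prompt. The Python sketch below is only an illustration of that idea: the `build_prompt` helper and the category names are hypothetical, and the phrases are taken from the examples in this section.

```python
def build_prompt(categories):
    """Join the phrases from every category into one comma-separated prompt."""
    phrases = []
    for group in categories.values():
        for phrase in group:
            phrase = phrase.strip()
            # Skip empty entries and avoid repeating the same phrase twice.
            if phrase and phrase not in phrases:
                phrases.append(phrase)
    return ", ".join(phrases)


# Hypothetical grouping that mirrors the categories listed above.
categories = {
    "character": ["white dress", "blonde hair", "long hair", "smiling", "standing"],
    "environment": ["forest", "white flower", "sunlight", "blue sky"],
    "composition": ["full body"],
    "quality": ["best quality", "masterpiece"],
    "style": ["illustration"],
}

prompt = build_prompt(categories)
print(prompt)
```

Keeping the phrases grouped by category makes it easier to swap out one aspect, such as the environment, without rewriting the whole prompt.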

Weight of prompt

The effect of weight is to increase or decrease the priority of certain prompts. For example, if you provide many prompts to the AI, it might overlook some due to the sheer number. Therefore, you can add weight to the prompts you want to emphasize most in the image.

There are two ways to adjust the weight:

Using Parentheses:
- Wrapping a prompt in parentheses multiplies its weight by 1.1. Parentheses can be nested for greater emphasis: double parentheses give 1.1 × 1.1 = 1.21, and triple parentheses give 1.1 × 1.1 × 1.1 = 1.331.
- Example: (green flowers), (((green flowers)))
Using Numbers:
- Writing a prompt as (prompt:number) sets its weight to that number. A value of 1 leaves the weight unchanged, values above 1 add emphasis, and values below 1 reduce it.
- Example: (green flowers:1), (white flowers:1.5), (purple flowers:1.5)

However, it's important to avoid setting the weight too high, as it might cause distortion in the image. A reasonable range is from 0.5 to 1.5.
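
The arithmetic behind nested parentheses can be checked with a few lines of Python. This is a minimal sketch: the helpers `paren_weight` and `weighted` are hypothetical names used only for illustration, and the code assumes each pair of parentheses multiplies the weight by 1.1, as described above.

```python
def paren_weight(depth, base=1.1):
    """Effective weight of a prompt wrapped in `depth` pairs of parentheses."""
    return round(base ** depth, 4)


def weighted(phrase, weight):
    """Format a prompt with an explicit numeric weight, e.g. (white flowers:1.5)."""
    return f"({phrase}:{weight:g})"


# Each extra pair of parentheses multiplies the weight by 1.1 again.
for depth in (1, 2, 3):
    wrapped = "(" * depth + "green flowers" + ")" * depth
    print(wrapped, "->", paren_weight(depth))

print(weighted("white flowers", 1.5))
```

Beyond two or three levels of nesting, the numeric form is usually easier to read and to keep inside the 0.5 to 1.5 range.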

Negative Prompt

Negative prompts specify the elements you do not want to appear in your image. Conversely, positive prompts list the elements you want to include.

Here are some common negative prompts:

Low quality: low quality, low resolution
Single color: monochrome, grayscale
Body & face: ugly, bad proportions, short
Body details: missing hands, extra fingers

Parameter setting

The higher the number of steps, the more detailed the generated image will be. However, once the steps exceed 20, the changes in the image become minimal. There is only a slight difference between images generated with 20 and 40 steps. Additionally, increasing the steps adds to the algorithm's processing time. Therefore, 20 steps is the default option, while the recommended range is 10 to 30 steps.

The sampler is the specific algorithm the AI uses to generate images. WebUI offers more than 10 samplers, but only 4-5 are commonly used, and each has distinct characteristics. For example, Euler and Euler a are suitable for illustration styles, while DPM++ 2M and DPM++ 2M Karras generate images faster. Samplers with "++" in their names are recommended for their stability. Some models are optimized for specific samplers; in that case, the sampler recommended for the model is the best choice.

If the resolution is too low, the generated image will be blurry and lack detail. If the resolution is too high, the algorithm will run slower. Therefore, it is crucial to test various resolutions to find the optimal balance between quality and efficiency.

Enabling the Restore faces option is recommended, as it can improve the faces of some characters. Tiling should not be enabled unless you are generating repeating patterns. The safe range for CFG scale is between 7 and 12.

Batch drawing is a useful feature for generating multiple images at once. Set the number of batches (Batch count) to produce as many images as you need. However, it is recommended to keep the number of images per batch (Batch size) at 1.
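
The settings in this section can also be collected in one place. The sketch below assembles them into a Python dictionary. The field names are assumed to follow the request body of the web UI's `/sdapi/v1/txt2img` endpoint (available when the web UI is started with `--api`), which this tutorial does not otherwise cover. The helper `make_settings`, the chosen sampler, and the 512x512 resolution are illustrative choices, not recommendations from the tutorial.

```python
def make_settings(prompt, negative_prompt="", steps=20, cfg_scale=7,
                  width=512, height=512, batch_count=1, batch_size=1):
    """Collect txt2img settings, rejecting values outside the ranges suggested above."""
    if not 10 <= steps <= 30:
        raise ValueError("steps outside the recommended 10-30 range")
    if not 7 <= cfg_scale <= 12:
        raise ValueError("CFG scale outside the suggested 7-12 range")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "sampler_name": "Euler a",  # illustrative choice of sampler
        "cfg_scale": cfg_scale,
        "width": width,             # illustrative resolution
        "height": height,
        "n_iter": batch_count,      # number of batches
        "batch_size": batch_size,   # images per batch
        "restore_faces": True,
        "tiling": False,            # enable only for repeating patterns
    }


settings = make_settings(
    prompt="white dress, blonde hair, forest, best quality",
    negative_prompt="low quality, low resolution, extra fingers",
    batch_count=4,
)
print(settings["n_iter"] * settings["batch_size"], "images requested")
```

Raising an error for out-of-range values is a deliberately strict choice for this sketch; the web UI itself accepts a wider range of values.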
