Overview of Commonly Used Models in Stable Diffusion Webui (Part 2)

Here is a brief introduction to the functionality and usage of commonly used models in Stable Diffusion Webui, including fantasy-world, pony-diffusion-v4, Openjourney v4, dosmix, MeinaMix, seek.art MEGA, Pastel Boys, and AnyLoRA.

Satyam_SSJ10/fantasy-world Model

The Satyam_SSJ10/fantasy-world model is based on the original Dpepteahand3 model and was fine-tuned on artwork from various artists. The author spent more than two weeks collecting the artwork and manually selecting and cropping some of the pieces. Generated images may display the name of the Dpepteahand3 model; this can be ignored.

AstraliteHeart/pony-diffusion-v4 Model

AstraliteHeart/pony-diffusion-v4. The pony-diffusion model is a fine-tuned text-to-image diffusion model focused on high-quality images of ponies, furry characters, and other non-photorealistic subjects. It was fine-tuned from the Stable Diffusion base model, a latent diffusion model. This checkpoint was fine-tuned for 15 epochs with a learning rate of 5.0e-6 on around 3 million text-image pairs of pony, furry, and other cartoon images (using metadata from derpibooru, e621, and danbooru). CLIP skip 2 is usually the best option, but it is still worth trying both CLIP skip 1 and 2. Adding "derpibooru_p_95" to the prompt and "derpibooru_p_low" to the negative prompt is recommended to improve the quality of generated pony images.
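The tag recommendation above can be sketched as a small prompt-assembly helper. `build_prompts` is a hypothetical function written for this article, not part of any library:

```python
# Sketch: prepend the derpibooru quality tags recommended for
# pony-diffusion-v4 to a positive/negative prompt pair.
# build_prompts is a hypothetical helper, not an official API.

def build_prompts(subject: str, negatives: str = "") -> tuple:
    """Return (positive, negative) prompts with the quality tags added."""
    positive = f"derpibooru_p_95, {subject}"
    negative = f"derpibooru_p_low, {negatives}".rstrip(", ")
    return positive, negative

pos, neg = build_prompts("a pony standing in a meadow", "blurry, lowres")
print(pos)  # derpibooru_p_95, a pony standing in a meadow
print(neg)  # derpibooru_p_low, blurry, lowres
```

The same pair of strings would then be passed as the prompt and negative prompt fields in the Webui.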

prompthero/openjourney-v4 | Openjourney v4 Model

The Openjourney v4 model was trained on 124,000 images for 4 epochs and 12,400 steps, with a total training time of 32 hours. The new version no longer requires the "mdjrny-v4 style" trigger and supports multiple spatial styles, such as conference room, office, and warehouse styles. This model was trained on top of Stable Diffusion 1.5. For more information about this model, visit the Prompts page on their website.
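The prompt change between versions can be illustrated with a tiny helper. `openjourney_prompt` is a hypothetical function for this article only; the "mdjrny-v4 style" trigger and the spatial-style keywords come from the description above:

```python
# Sketch: Openjourney v4 prompts no longer need the "mdjrny-v4 style"
# trigger token; older versions still do. Hypothetical helper only.

def openjourney_prompt(scene, spatial_style=None, legacy=False):
    parts = []
    if legacy:                 # older Openjourney versions need the trigger
        parts.append("mdjrny-v4 style")
    parts.append(scene)
    if spatial_style:          # e.g. "office style", "warehouse style"
        parts.append(spatial_style)
    return ", ".join(parts)

print(openjourney_prompt("a modern startup workspace", "office style"))
# a modern startup workspace, office style
print(openjourney_prompt("a castle at dusk", legacy=True))
# mdjrny-v4 style, a castle at dusk
```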

DiaryOfSta/dosmix Model

DiaryOfSta/dosmix. While this model works well for some subjects, results can be unpleasant for others. The fix is to include weighted realism tags, (Realistic: 0.11.4) or (realistic: 0.11), in the prompts.

Default prompt: best quality, masterpiece
Default negative prompt: (low quality, worst quality:1.4)

Apply a VAE: kl-f8-anime2 or vae-ft-mse-840000-ema-pruned gives better color results. Set the denoising strength to 0.5 and upscale by 2x; if you skip the upscale and denoise, you may not get the expected results. Prompts such as "upper body" and "cowboy shot" are recommended.

Meina/meinamix Model

Meina/meinamix. This model is a merged model that includes MeinaMix V1~6, MeinaPastel V3, MeinaHentai V2, Night Sky YOZORA Style Model, PastelMix, and Facebomb. There is no exact recipe because the author conducted several weighted merges with multiple settings and kept the better versions of each merge.

Recommended parameters:

  1. Sampler: Euler a (40~60 steps) or DPM++ SDE Karras (30~60 steps)
  2. CFG Scale: 7
  3. Resolutions: 512x768 or 512x1024 for portrait; 768x512, 1024x512, or 1536x512 for landscape
  4. Hires.fix: R-ESRGAN 4x+Anime6B, 10 steps, denoising 0.1 up to 0.3
  5. Clip skip: 2
  6. Negative prompt: (worst quality: 2, low quality: 2), (zombie, sketch, interlocked fingers, comic)
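For convenience, the recommended parameters can be collected into a plain settings dictionary. The key names are my own shorthand for this sketch, not Automatic1111 field names:

```python
# The recommended MeinaMix parameters as a settings dictionary
# (illustrative only; key names are assumptions, not an API).

MEINAMIX_PORTRAIT = [(512, 768), (512, 1024)]
MEINAMIX_LANDSCAPE = [(768, 512), (1024, 512), (1536, 512)]

def meinamix_settings(portrait=True):
    return {
        "sampler": "DPM++ SDE Karras",   # 30-60 steps; or Euler a, 40-60
        "steps": 30,
        "cfg_scale": 7,
        "clip_skip": 2,
        "resolution": (MEINAMIX_PORTRAIT if portrait else MEINAMIX_LANDSCAPE)[0],
        "hires_upscaler": "R-ESRGAN 4x+Anime6B",
        "hires_steps": 10,
        "hires_denoising": 0.1,          # up to 0.3
        "negative_prompt": "(worst quality: 2, low quality: 2), "
                           "(zombie, sketch, interlocked fingers, comic)",
    }

print(meinamix_settings()["resolution"])        # (512, 768)
print(meinamix_settings(portrait=False)["resolution"])  # (768, 512)
```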

coreco/seekart-mega|seek.art MEGA Model

coreco/seekart-mega. seek.art MEGA is a general-purpose "can-do-everything" model that is significantly better than SD 1.5 across dozens of styles. It was created by Coreco at seek.art.

The model was trained on nearly 50,000 high-quality digital artworks and photos on top of SD 1.5; no merging or blending was involved.

Settings for portrait samples:

  1. Use a resolution of 640px or higher.
  2. Use the vae-ft-mse-840000-ema VAE.
  3. Use the negative prompt to omit unwanted styles; for more realistic output, negatives such as "bad anime crayon scribble" work well.
  4. Be as specific as possible in your prompts.
  5. The Auto1111 sd-dynamic-thresholding (dynamic CFG) extension works wonders with this model; follow its recommended advanced settings.
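The portrait settings above can be sketched as a txt2img request payload in the shape used by the AUTOMATIC1111 web API. No request is sent here, and the prompt text is an invented example:

```python
# Sketch: seek.art MEGA portrait settings as an AUTOMATIC1111-style
# txt2img payload (illustration only; nothing is sent anywhere).

payload = {
    "prompt": "portrait of an elderly fisherman, detailed oil painting",
    "negative_prompt": "bad anime crayon scribble",   # omit unwanted styles
    "width": 640,                                     # 640px or higher
    "height": 640,
    "sd_vae": "vae-ft-mse-840000-ema",
}

# The resolution guideline from step 1:
assert min(payload["width"], payload["height"]) >= 640
print(payload["negative_prompt"])  # bad anime crayon scribble
```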

bodlo/pastelboys2d | Pastel Boys Model

bodlo/pastelboys2d is a model suitable for drawing various types of men. The “Pastel Boys 2nd” version has brighter colors and more detailed backgrounds, but the finger shapes need improvement. Recommended settings for this model are as follows:

  1. Sampling method: Euler a/DPM++ SDE Karras
  2. Clip skip: 2
  3. Hires.fix upscaler: R-ESRGAN 4x+Anime6B
  4. CFG Scale: 7~9
  5. VAE: vae-ft-mse-840000-ema-pruned/kl-f8-anime2
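The list above can be turned into a small settings validator. This is a hypothetical helper written for this article; the option strings follow the list, not any Webui API:

```python
# Sketch: validate a settings combination against the Pastel Boys
# recommendations listed above. Hypothetical helper only.

ALLOWED_SAMPLERS = {"Euler a", "DPM++ SDE Karras"}
ALLOWED_VAES = {"vae-ft-mse-840000-ema-pruned", "kl-f8-anime2"}

def check_settings(sampler, clip_skip, cfg, vae):
    return (sampler in ALLOWED_SAMPLERS
            and clip_skip == 2
            and 7 <= cfg <= 9
            and vae in ALLOWED_VAES)

print(check_settings("Euler a", 2, 8, "kl-f8-anime2"))   # True
print(check_settings("DDIM", 2, 8, "kl-f8-anime2"))      # False
```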

Lykon/anylora | AnyLoRA Model

AnyLoRA is a checkpoint model intended to keep future LoRA training and updated models compatible, with a style neutral enough to reproduce any LoRA style accurately. LoRAs train more strongly on it than on NAI, so you may need to adjust weights or offsets (I suspect this is because NAI is now greatly diluted in newer models); I usually find that a weight of 0.65 works well, and then shift it toward 1. Although it was made primarily for training, it is also useful for inference (especially with styles) and has become my preferred anime model. It also consumes very little VRAM. In summary, AnyLoRA is a lightweight anime model for LoRA training and inference, with good performance and a small VRAM footprint, suitable for Colab and local environments.
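To make the "weight of 0.65" concrete: at inference, a LoRA's low-rank update B @ A is scaled by that weight before being added to the base weight matrix. The following is a tiny pure-Python illustration of that arithmetic, not how any real library stores LoRA tensors:

```python
# Sketch of what a LoRA weight (e.g. 0.65) does at inference:
# W' = W + scale * (B @ A), with tiny matrices for illustration.

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def apply_lora(w, b, a, scale=0.65):
    """Return the base weight nudged by scale * (B @ A)."""
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(wr, dr)]
            for wr, dr in zip(w, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight matrix
B = [[1.0], [0.0]]             # rank-1 LoRA factors
A = [[0.0, 2.0]]
print(apply_lora(W, B, A))     # base weight nudged by 0.65 * (B @ A)
```

Lowering the scale toward 0.65 simply shrinks the LoRA's contribution relative to the base model, which is why a LoRA trained against a "stronger" base may need a reduced weight.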