Stable Diffusion on AMD GPUs: Setup Guide

Stable Diffusion usually performs best on NVIDIA GPUs with CUDA support. Many users, however, want to run it on the AMD hardware they already own, and that is possible: thanks to ROCm (Linux) and DirectML (Windows), Stable Diffusion also runs on AMD GPUs such as the RX 5000, RX 6000, and RX 7000 series and the Radeon VII. This guide shows step by step how to run Stable Diffusion locally on an AMD GPU using Diffusers and the AUTOMATIC1111 WebUI.


✅ Supported AMD GPUs for Stable Diffusion

  • Radeon RX 5000 Series (RDNA1)
  • Radeon RX 6000 Series (RDNA2)
  • Radeon RX 7000 Series (RDNA3)
  • Radeon VII
  • Radeon Pro Series

🔧 Requirements to Run Stable Diffusion on AMD GPUs

| Requirement | Linux | Windows |
| --- | --- | --- |
| Python 3.10+ | ✅ | ✅ |
| AMD GPU 8GB+ VRAM | ✅ | ✅ |
| ROCm drivers | ✅ | ❌ |
| DirectML | ❌ | ✅ |
| Git | ✅ | ✅ |
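
Before installing anything, it can help to confirm the basics from the requirements table. The following is a minimal sketch using only the Python standard library; the function name `preflight_check` is hypothetical, and looking for `rocminfo` on the PATH is only a rough signal that the ROCm tools are installed (it is expected to be missing on Windows).

```python
import shutil
import sys


def preflight_check(min_python=(3, 10)):
    """Return a dict of basic prerequisite checks for this guide."""
    return {
        # Python 3.10+ is listed as a requirement on both platforms
        "python_ok": sys.version_info >= min_python,
        "python_version": ".".join(map(str, sys.version_info[:3])),
        # Git is needed to clone the WebUI repositories
        "git_found": shutil.which("git") is not None,
        # rocminfo ships with ROCm; absent on Windows/DirectML setups
        "rocm_tools_found": shutil.which("rocminfo") is not None,
    }


if __name__ == "__main__":
    for name, value in preflight_check().items():
        print(f"{name}: {value}")
```

This does not check VRAM or driver versions; those still need to be verified with the vendor tools.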

🚀 Method 1: Run Stable Diffusion on AMD (Windows – DirectML Setup)

DirectML runs Stable Diffusion on AMD GPUs without ROCm. If you’re on Windows it is the easier option, and honestly, it’s the path of least resistance.

Installation

  1. Install Python: python.org/downloads
  2. Install Git: git-scm.com/downloads
  3. Open PowerShell, create a virtual environment, and install the packages:
python -m venv sd-env
sd-env\Scripts\activate
pip install torch-directml
pip install diffusers transformers accelerate safetensors

Test Generation

from diffusers import DiffusionPipeline
import torch_directml

# torch-directml provides its own device object; "dml" is not a valid torch device string
device = torch_directml.device()
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.to(device)
image = pipe("cinematic robot portrait, detailed, 4k").images[0]
image.save("amd_test.png")

🐧 Method 2: Run Stable Diffusion on AMD (Linux – ROCm Setup)

ROCm delivers faster performance on Linux. If you’re comfortable with Linux, this is the way to go.

Install ROCm Drivers

Follow AMD instructions: ROCm docs

Install PyTorch with ROCm

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

Install Diffusers

pip install diffusers transformers accelerate safetensors

Run Stable Diffusion

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
# ROCm builds of PyTorch expose AMD GPUs through the "cuda" device name
pipe.to("cuda")
image = pipe("fantasy castle, golden sunset, high detail").images[0]
image.save("output.png")

🖥️ Run Stable Diffusion WebUI (AUTOMATIC1111) on AMD GPUs

Windows (DirectML)

git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml.git
cd stable-diffusion-webui-directml
python launch.py --precision full --no-half --use-directml

Linux (ROCm)

DirectML is a Windows technology, so on Linux use the upstream WebUI repository together with the ROCm build of PyTorch:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
python launch.py --enable-insecure-extension-access

⚑ Performance Optimization for AMD Stable Diffusion

| Setting | Recommendation |
| --- | --- |
| Resolution | 512x512 or 768x512 |
| Steps | 20–30 |
| Scheduler | Euler A |
| Precision | --no-half on DirectML |

🧯 Basic Troubleshooting for AMD Stable Diffusion Setup

| Issue | Solution |
| --- | --- |
| Slow generation | Use the SD Turbo model |
| Memory crash | Lower the resolution |
| Driver error | Update AMD Adrenalin |
| HF auth error | Log in to Hugging Face |


🔧 Before You Begin – AMD GPU Compatibility Notes

CUDA is NVIDIA-only, so the CUDA builds of PyTorch cannot drive an AMD GPU. To run Stable Diffusion on AMD hardware you must use one of the following:

  • DirectML (Windows) – easiest setup
  • ROCm (Linux) – fastest performance

| Feature | DirectML (Windows) | ROCm (Linux) |
| --- | --- | --- |
| Performance | Medium | High |
| Stability | Good | Very good |
| FP16 support | Limited | Full |
| Best Use | Beginners | Performance users |

⚑ AMD vs NVIDIA – Performance Reality

Let me be honest: AMD GPUs can run Stable Diffusion, but performance depends on model and settings. It won’t be as fast as NVIDIA, but it definitely works.

| GPU | Speed (img/min) | Works with | Notes |
| --- | --- | --- | --- |
| RX 580 (8GB) | 1 | DirectML | Slow but works |
| RX 5700 XT | 2 | DirectML | Good entry GPU |
| RX 6700 XT | 4 | DirectML/ROCm | Solid performance |
| RX 7900 XT | 6–8 | ROCm | Recommended 🔥 |

🛡️ Security Tip: Hugging Face Tokens

Avoid storing tokens in scripts. Use environment variables or .env files.

import os
from huggingface_hub import login
login(os.environ.get("HF_TOKEN"))
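
The snippet above reads the token from an environment variable. For the `.env` route, here is a minimal sketch of a loader written with the standard library only; the helper names `load_env_file` and `get_hf_token` are hypothetical, and the parser handles only plain `KEY=VALUE` lines (a dedicated package would cover more edge cases). Keep the `.env` file out of version control.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Load KEY=VALUE lines from a .env file into os.environ.

    Variables that are already set in the environment are not overwritten.
    Returns the key/value pairs that were parsed from the file.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    loaded = {}
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def get_hf_token():
    """Return the Hugging Face token from the environment or raise a clear error."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it or add it to a .env file")
    return token
```

With this in place, `load_env_file()` followed by `login(get_hf_token())` fails loudly when the token is missing instead of passing `None` to `login`.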

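🔌 Optional: Use SD Turbo for faster AMD rendering

For faster speeds:

import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("stabilityai/sd-turbo")
use_rocm = torch.cuda.is_available()  # ROCm builds report the GPU as "cuda"
if not use_rocm: import torch_directml  # DirectML has no "dml" device string
pipe.to("cuda" if use_rocm else torch_directml.device())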
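
Because the right device differs between the ROCm and DirectML setups, the selection logic can be pulled into one helper. This is a sketch under the assumptions of this guide (a ROCm build of PyTorch on Linux, or the `torch-directml` package on Windows); the function name `pick_device` is hypothetical, and it falls back to the CPU when neither backend is found.

```python
import importlib.util


def pick_device():
    """Return the best available torch device for this guide, or "cpu" as a fallback."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all
    import torch

    if torch.cuda.is_available():
        return "cuda"  # ROCm builds report AMD GPUs through torch.cuda
    if importlib.util.find_spec("torch_directml") is not None:
        import torch_directml

        return torch_directml.device()  # DirectML device object, not a string
    return "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

A pipeline can then be moved with `pipe.to(pick_device())` on either platform.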

❓ FAQ

Does Stable Diffusion work on AMD GPUs? Yes, with DirectML (Windows) or ROCm (Linux).

Do I need CUDA? No, CUDA is NVIDIA-only. AMD uses ROCm or DirectML.

What is the easiest way for AMD users? DirectML on Windows.

Can I use AUTOMATIC1111 WebUI with AMD GPUs? Yes, using the DirectML fork.

Is ROCm better than DirectML? It is usually faster, but the ROCm setup in this guide is Linux-only.


🔥 AUTOMATIC1111 on AMD – Full Optimized Setup

Windows (DirectML Optimized)

git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml.git
cd stable-diffusion-webui-directml
python launch.py --precision full --no-half --use-directml --opt-split-attention

Linux (ROCm Optimized)

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
bash webui.sh --opt-sdp-attention

🧩 ComfyUI Setup for AMD (Workflows & Power Use)

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python main.py             # Linux: uses the installed ROCm build of PyTorch
python main.py --directml  # Windows: uses torch-directml

🚀 Performance Benchmarks (AMD GPU)

| GPU | SD 1.5 Speed | SDXL Speed |
| --- | --- | --- |
| RX 6600 | 3.5 it/s | 1.1 it/s |
| RX 6700 XT | 5.1 it/s | 2.2 it/s |
| RX 6800 XT | 6.2 it/s | 2.9 it/s |
| RX 7900 XT | 8.8 it/s | 4.1 it/s |
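
The it/s figures translate into time per image by dividing the step count by the speed. The sketch below shows that arithmetic; the function name `seconds_per_image` is hypothetical, and the estimate covers only the sampling loop, so model loading, VAE decoding, and resolution differences are ignored.

```python
def seconds_per_image(iterations_per_second, steps):
    """Estimate sampling time for one image from a sampler speed in it/s."""
    if iterations_per_second <= 0:
        raise ValueError("iterations_per_second must be positive")
    return steps / iterations_per_second


if __name__ == "__main__":
    # Figures from the RX 6700 XT row of the benchmark table, at 25 steps
    for label, speed in [("SD 1.5", 5.1), ("SDXL", 2.2)]:
        print(f"{label}: {seconds_per_image(speed, 25):.1f} s per image")
```

For the RX 6700 XT row this gives roughly 4.9 s for SD 1.5 and 11.4 s for SDXL at 25 steps.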

✅ Best Models for 6GB VRAM (Entry GPUs – RX 580, RX 5500 XT, RX 6500 XT)

| Model | Type | Link |
| --- | --- | --- |
| SD Turbo | Fast | stabilityai/sd-turbo |
| Realistic Vision v5.1 Lite | Realistic | CivitAI #4201 |
| DreamShaper 6 Lite | General/Creative | CivitAI #4384 |

✅ Best Models for 8GB VRAM (RX 5700 XT, RX 6600 XT)

| Model | Type | Link |
| --- | --- | --- |
| DreamShaper XL | SDXL Allround | CivitAI #4384 (XL) |
| RevAnimated | Anime + Realistic Mix | CivitAI #7371 |
| Realistic Vision v5.1 | Photography | CivitAI #4201 |

✅ Best Models for 12GB VRAM (RX 6700 XT, RX 6800 XT)

| Model | Type | Link |
| --- | --- | --- |
| Juggernaut XL | Realistic XL | RunDiffusion/Juggernaut-XI |
| SDXL Base 1.0 | Official Base | stabilityai/stable-diffusion-xl-base-1.0 |
| BluePencil XL | Art XL | CivitAI #240138 |

✅ Best Models for 16GB+ VRAM (RX 7900 XT/XTX, Radeon Pro)

| Model | Type | Link |
| --- | --- | --- |
| FLUX.1-dev | Future Model | black-forest-labs/FLUX.1-dev |
| Photon XL | Hyper Realistic | CivitAI #303980 |
| ZavyChroma XL | Clean Realism | CivitAI #129925 |

| LoRA | Style | Link |
| --- | --- | --- |
| Cyberpunk City | Sci-fi | CivitAI #78775 |
| Film Color Science | Photographic | CivitAI #20182 |
| Pixar Style | Animation | CivitAI #137573 |

✨ Power Prompt Library (Creator-Grade Prompts)

Realistic Portrait (Studio Quality)

ultra detailed portrait of a {subject}, cinematic studio lighting, sony a7r iv photography, 85mm lens, skin texture, film grain, 8k, sharp focus, masterpiece

Negative: lowres, bad hands, overexposed, jpeg artifacts, blurry, distorted
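
The `{subject}` placeholder in the portrait prompt can be filled programmatically. Here is a small sketch that does so with `str.format` and pairs the result with the negative prompt; the helper name `build_portrait_prompt` is hypothetical, while `prompt` and `negative_prompt` are the keyword names the Diffusers Stable Diffusion pipelines accept.

```python
PORTRAIT_TEMPLATE = (
    "ultra detailed portrait of a {subject}, cinematic studio lighting, "
    "sony a7r iv photography, 85mm lens, skin texture, film grain, 8k, "
    "sharp focus, masterpiece"
)
NEGATIVE_PROMPT = "lowres, bad hands, overexposed, jpeg artifacts, blurry, distorted"


def build_portrait_prompt(subject):
    """Fill the {subject} placeholder and pair the prompt with the negative prompt."""
    return {
        "prompt": PORTRAIT_TEMPLATE.format(subject=subject),
        "negative_prompt": NEGATIVE_PROMPT,
    }


if __name__ == "__main__":
    print(build_portrait_prompt("weathered fisherman")["prompt"])
```

With a loaded pipeline, the result can be passed directly as `pipe(**build_portrait_prompt("weathered fisherman"))`.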

Cinematic Environment

ancient sci-fi temple in the mountains, volumetric god rays, dramatic scale, epic atmosphere, unreal engine, cinematic composition, ultra detailed environment concept art

Product Render

minimalist product render of futuristic headphones, soft reflections, volumetric rim light, octane render, high gloss, elegant industrial design, clean background

Cyberpunk City Shot

neon cyberpunk street in tokyo, wet asphalt reflections, neon haze, dystopian aesthetic, rain particles, atmospheric depth, cinematic 2.39:1

Digital Illustration

fantasy warrior princess, ornate golden armor, flowing cape, intricate costume design, mystical aura, dramatic lighting, artgerm style, greg rutkowski, wlop influence

🧩 ControlNet Setup for AMD GPUs (Essential Models Only)

ControlNet allows precise control over image structure. These 3 models work best and are AMD-compatible.

Install ControlNet in AUTOMATIC1111

  1. Go to Extensions → Install from URL
  2. Add the ControlNet repo URL:
https://github.com/Mikubill/sd-webui-controlnet
  3. Click Apply and Restart UI

Download Essential ControlNet Models

| Model | Purpose | Download |
| --- | --- | --- |
| Canny | Edge-based object control | lllyasviel/ControlNet-v1-1 |
| Depth | Perspective + composition | lllyasviel/ControlNet-v1-1 |
| OpenPose | Human pose + character posing | lllyasviel/ControlNet-v1-1 |

AMD Performance Settings

  • Enable low VRAM mode if GPU < 12GB
  • Use only 1 ControlNet at a time on 8GB GPUs
  • Set preprocessors to lightweight modes

🔧 LoRA Integration on AMD (WebUI + Diffusers)

LoRA allows style transfer and character consistency with very small model files.

Use LoRA in AUTOMATIC1111 (AMD)

  1. Download any .safetensors LoRA file from CivitAI
  2. Place it in:
stable-diffusion-webui/models/Lora/
  3. Activate in prompt:
<lora:FilmColor-LORA:0.7>

| Purpose | Weight Example |
| --- | --- |
| Subtle style influence | 0.4 – 0.6 |
| Strong creative style | 0.7 – 0.9 |
| Character consistency | 0.9 – 1.1 |
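
The `<lora:name:weight>` tag can be generated rather than typed by hand, which helps when trying several weights from the table. This is a minimal sketch; the helper names `lora_tag` and `add_loras` are hypothetical, and the name is assumed to be the LoRA file name without its `.safetensors` extension.

```python
def lora_tag(name, weight=0.7):
    """Build the <lora:name:weight> prompt tag used in the WebUI prompt box."""
    if not name:
        raise ValueError("LoRA name must not be empty")
    return f"<lora:{name}:{weight:g}>"


def add_loras(prompt, loras):
    """Append one tag per (name, weight) pair to the prompt text."""
    tags = " ".join(lora_tag(name, weight) for name, weight in loras)
    return f"{prompt} {tags}".strip()


if __name__ == "__main__":
    print(add_loras("portrait photo, soft light", [("FilmColor-LORA", 0.5)]))
```

The example prints the prompt followed by `<lora:FilmColor-LORA:0.5>`, a weight in the "subtle style influence" range.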

Use LoRA with Diffusers (AMD Compatible)

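# Assumes `pipe` is an already loaded pipeline, as in the earlier examples
prompt = "portrait photo, soft window light"
pipe.load_lora_weights("./lora/FilmColor-LORA.safetensors")
image = pipe(prompt, num_inference_steps=28).images[0]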

Performance Tips for LoRA on AMD

  • Avoid stacking more than 2 LoRAs on 8GB GPUs
  • Use SD Turbo for faster results, with a LoRA trained for that base model
  • If VRAM error: reduce resolution to 512x768 or enable CPU offload

🏗️ SDXL on AMD GPUs (Performance Setup)

SDXL delivers better detail and realism than SD 1.5, but it requires more VRAM. It works on AMD GPUs with ROCm (Linux) or DirectML (Windows).

| Setting | Value |
| --- | --- |
| Sampler | Euler A or DPM++ 2M |
| Steps | 20–30 |
| CFG Scale | 6.5–8 |
| Resolution | 1024×1024 (12GB VRAM) |
| Refiner | Optional |

Load SDXL in Diffusers (AMD Compatible)

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
)
if torch.cuda.is_available():
    pipe.to("cuda")  # ROCm build of PyTorch on Linux
else:
    import torch_directml  # DirectML on Windows; "dml" is not a valid device string
    pipe.to(torch_directml.device())
image = pipe("epic fantasy castle, golden hour lighting").images[0]
image.save("sdxl_amd.png")

SDXL + Refiner Workflow

import torch
from diffusers import DiffusionPipeline

prompt = "epic fantasy castle, golden hour lighting"
base = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
refiner = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16)
base.to("cuda")
refiner.to("cuda")
# The base model outputs latents, which the refiner then finishes
image = base(prompt, output_type="latent").images
image = refiner(prompt, image=image).images[0]

✅ Best for AMD GPUs with 12GB+ VRAM: RX 6700 XT, RX 6800 XT, RX 7900 XT/XTX


🖥️ Windows vs Linux for AMD – Which Setup to Choose?

| Feature | Windows (DirectML) | Linux (ROCm) |
| --- | --- | --- |
| Speed | Medium ⚡ | Fast 🔥 |
| Stability | Good | Very Stable ✅ |
| SDXL Support | Yes | Yes |
| FLUX Compatibility | Limited | Partial |
| ControlNet Support | Yes | Yes |
| Setup Difficulty | Easy ✅ | Medium ⚙️ |
| Best For | Beginners | Performance Users |

| GPU Class | Resolution | Steps | Sampler | Notes |
| --- | --- | --- | --- | --- |
| 6GB (RX 580, 5500 XT) | 512×768 | 20 | Euler A | Use SD Turbo |
| 8GB (RX 5700 XT, 6600 XT) | 768×768 | 24 | DPM++ 2M | Good balance |
| 12GB (RX 6700 XT, 6800) | 1024×1024 | 26 | DPM++ SDE | Best for SDXL |
| 16GB+ (RX 7900 XT/XTX) | 1216×1216 | 28 | DPM++ Karras | High quality |
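
The per-GPU-class recommendations can be turned into a small lookup so a script picks settings from the available VRAM. The sketch below copies the values from the table; the names `PRESETS` and `recommended_settings` are hypothetical, and the resolution is read as width×height.

```python
# (minimum VRAM in GB, settings) rows copied from the table, highest tier first
PRESETS = [
    (16, {"width": 1216, "height": 1216, "steps": 28, "sampler": "DPM++ Karras"}),
    (12, {"width": 1024, "height": 1024, "steps": 26, "sampler": "DPM++ SDE"}),
    (8, {"width": 768, "height": 768, "steps": 24, "sampler": "DPM++ 2M"}),
    (6, {"width": 512, "height": 768, "steps": 20, "sampler": "Euler A"}),
]


def recommended_settings(vram_gb):
    """Return the settings for the highest tier that fits into vram_gb."""
    for min_vram, settings in PRESETS:
        if vram_gb >= min_vram:
            return dict(settings)  # copy so callers can modify it safely
    raise ValueError(f"No preset for {vram_gb} GB; the table starts at 6 GB")


if __name__ == "__main__":
    print(recommended_settings(8))
```

A 10GB card falls into the 8GB tier here, since the lookup never picks a tier above the available VRAM.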

⚙️ AUTOMATIC1111 Performance Flags for AMD (Quick Boost)

Use these flags in your launch.py command to improve speed and stability on AMD GPUs:

python launch.py --precision full --no-half --enable-insecure-extension-access --opt-sdp-attention

Recommended flags explained:

  • --precision full → prevents DirectML crashes
  • --no-half → fixes FP16 instability on AMD
  • --opt-sdp-attention → speeds up sampling
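
Since the flag sets differ between the DirectML and ROCm setups, a launcher script can assemble the command per backend. This is only a sketch: the names `BACKEND_FLAGS` and `build_launch_command` are hypothetical, and the flag lists simply mirror the launch commands shown in this guide.

```python
import shlex

# Flag sets taken from the launch commands in this guide
BACKEND_FLAGS = {
    "directml": ["--use-directml", "--precision", "full", "--no-half", "--opt-split-attention"],
    "rocm": ["--opt-sdp-attention"],
}


def build_launch_command(backend, extra_flags=()):
    """Return the launch.py command for the given backend as an argument list."""
    if backend not in BACKEND_FLAGS:
        raise ValueError(f"Unknown backend {backend!r}; expected one of {sorted(BACKEND_FLAGS)}")
    return ["python", "launch.py", *BACKEND_FLAGS[backend], *extra_flags]


if __name__ == "__main__":
    print(shlex.join(build_launch_command("directml")))
```

The argument list can be passed to `subprocess.run` from inside the WebUI directory, or printed and pasted into a terminal.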

Example ROCm 6.0 installation on Ubuntu (follow the AMD ROCm docs for other distributions or versions):

# Remove any old AMD GPU drivers first
sudo amdgpu-uninstall || true
# System update
sudo apt update && sudo apt upgrade -y
# Install dependencies
sudo apt install wget gnupg2 software-properties-common -y
# Add the ROCm repository key
wget https://repo.radeon.com/rocm/rocm.gpg.key
gpg --dearmor < rocm.gpg.key | sudo tee /etc/apt/trusted.gpg.d/rocm.gpg > /dev/null
# Add the ROCm repository (assumes Ubuntu 22.04; replace "jammy" with your release codename)
sudo add-apt-repository "deb [arch=amd64] https://repo.radeon.com/rocm/apt/6.0 jammy main"
# Install ROCm
sudo apt update
sudo apt install rocm-dev -y
# Allow the current user to access the GPU (log out and back in afterwards)
sudo usermod -aG render,video $USER
# Enable ROCm PATH
echo 'export PATH=/opt/rocm/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

ControlNet Models (Quick Reference)

| Model | Purpose | Download |
| --- | --- | --- |
| Canny | Strong edges and shape control | lllyasviel/ControlNet-v1-1 |
| Depth | Scene structure & perspective | lllyasviel/ControlNet-v1-1 |
| OpenPose | Human pose control | lllyasviel/ControlNet-v1-1 |

AMD Performance Settings for ControlNet

  • Use 1 ControlNet at a time on 8GB VRAM GPUs
  • Use Preprocessor: canny or depth for speed
  • Disable High-Res Fix on low VRAM GPUs

| Model | Status | Notes |
| --- | --- | --- |
| SD 1.5 | ✅ Full Support | Fastest |
| SD Turbo | ✅ Best for low VRAM | Fast |
| SDXL | ✅ Slower but works | Needs 12GB VRAM |
| FLUX | ⚠ Experimental | Advanced setup |

⚙ Diffusers Performance Boost for AMD

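pipe.enable_attention_slicing()
pipe.enable_vae_slicing()
pipe.enable_sequential_cpu_offload()

  • Best for low VRAM stability
  • Attention and VAE slicing work on both DirectML and ROCm; sequential CPU offload relies on accelerate and targets the "cuda" device by default, so treat it as a ROCm option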

⚙ VRAM Optimization (AMD)

  • Use attention slicing
  • Use --no-half on DirectML
  • Use SD Turbo for speed
  • Reduce resolution to 512×512

🛑 Offline Mode (No Internet Required)

Download the models while online, then start the WebUI without its install and download steps:

python launch.py --skip-install --no-download-sd-model

🧯 Troubleshooting (AMD Common Issues)

| Issue | Cause | Fix |
| --- | --- | --- |
| torch not compiled with ROCm | Wrong torch install | Reinstall ROCm wheel: pip install torch --index-url https://download.pytorch.org/whl/rocm6.0 |
| HIP Error | ROCm not initialized | Reboot + check rocminfo output |
| VRAM out of memory | Resolution too high | Use 512×768 and enable attention slicing |
| Slow speed | Using CPU by mistake | Check GPU usage: torch.cuda.is_available() |
| WebUI crash on AMD | FP16 issue | Use flags --precision full --no-half |
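
Several of the issues above come down to which PyTorch build is installed and whether it sees the GPU. The following sketch collects those facts in one place; the function name `gpu_diagnostics` is hypothetical, and it applies to the ROCm route only (DirectML devices are not reported through `torch.cuda`).

```python
import importlib.util


def gpu_diagnostics():
    """Collect basic facts about the installed PyTorch build."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info
    import torch

    info["torch_version"] = torch.__version__
    # torch.version.hip holds a version string on ROCm builds and is None otherwise
    info["rocm_build"] = getattr(torch.version, "hip", None) is not None
    info["gpu_available"] = torch.cuda.is_available()
    if info["gpu_available"]:
        info["gpu_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If `rocm_build` is False on Linux, the wrong wheel is installed; if it is True but `gpu_available` is False, the driver or ROCm runtime is the more likely culprit.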


✅ Conclusion

Stable Diffusion is fully usable on AMD GPUs using either DirectML (Windows) or ROCm (Linux). Performance depends on GPU VRAM and driver setup, but with the configurations in this guide, AMD users can run SD 1.5, SD Turbo, SDXL, LoRA, and ControlNet effectively.

Choose DirectML for a simple setup on Windows or ROCm for maximum performance on Linux. Use the performance flags and memory optimizations to prevent crashes and speed up generation; trust me, these matter.

Install dependencies, load your model, and start generating locally. It’s not as seamless as NVIDIA, but it works.