Stable Diffusion normally performs best on NVIDIA GPUs with CUDA support, but many users want to run it on AMD hardware without switching GPUs, and that is entirely possible. Thanks to ROCm (Linux) and DirectML (Windows), Stable Diffusion also runs on AMD GPUs such as the RX 5000, 6000, and 7000 series and the Radeon VII. This guide shows, step by step, how to run Stable Diffusion locally on an AMD GPU using Diffusers and the AUTOMATIC1111 WebUI.
Supported AMD GPUs for Stable Diffusion
- Radeon RX 5000 Series (RDNA1)
- Radeon RX 6000 Series (RDNA2)
- Radeon RX 7000 Series (RDNA3)
- Radeon VII
- Radeon Pro Series
Requirements to Run Stable Diffusion on AMD GPUs
| Requirement | Linux | Windows |
|---|---|---|
| Python 3.10+ | ✅ | ✅ |
| AMD GPU 8GB+ VRAM | ✅ | ✅ |
| ROCm drivers | ✅ | ❌ |
| DirectML | ❌ | ✅ |
| Git | ✅ | ✅ |
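Before installing anything GPU-specific, a quick pre-flight check confirms the basics from the table above. This is a minimal sketch; run it with the Python interpreter you plan to use:

```python
import shutil
import sys

# Pre-flight check: Python version and git availability
assert sys.version_info >= (3, 10), "Python 3.10+ is required"
print("Python:", sys.version.split()[0])
print("git found:", shutil.which("git") is not None)
```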
Method 1: Run Stable Diffusion on AMD (Windows, DirectML Setup)
DirectML allows Stable Diffusion to run on AMD GPUs without ROCm. It is the easier option on Windows and the path of least resistance for most users.
Installation
- Install Python: python.org/downloads
- Install Git: git-scm.com/downloads
- Open PowerShell and create environment:
```powershell
python -m venv sd-env
sd-env\Scripts\activate
pip install torch-directml
pip install diffusers transformers accelerate safetensors
```
Test Generation
```python
from diffusers import DiffusionPipeline
import torch_directml

# The torch-directml plugin exposes the AMD GPU through its own device object
device = torch_directml.device()
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.to(device)
image = pipe("cinematic robot portrait, detailed, 4k").images[0]
image.save("amd_test.png")
```
Method 2: Run Stable Diffusion on AMD (Linux, ROCm Setup)
ROCm delivers faster performance on Linux. If you are comfortable with Linux, this is the recommended path.
Install ROCm Drivers
Follow AMD instructions: ROCm docs
Install PyTorch with ROCm
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
```
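To confirm the ROCm build of PyTorch actually sees your Radeon card, run a short check. A sketch; on ROCm the GPU is addressed through the regular "cuda" device API:

```python
import torch

print("HIP/ROCm version:", torch.version.hip)      # None on a CPU-only or CUDA build
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```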
Install Diffusers
```bash
pip install diffusers transformers accelerate safetensors
```
Run Stable Diffusion
```python
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.to("cuda")  # on ROCm, the Radeon GPU is addressed through the "cuda" device
image = pipe("fantasy castle, golden sunset, high detail").images[0]
image.save("output.png")
```
Run Stable Diffusion WebUI (AUTOMATIC1111) on AMD GPUs
Windows (DirectML)
```powershell
git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml.git
cd stable-diffusion-webui-directml
python launch.py --precision full --no-half --use-directml
```
Linux (ROCm)
On Linux, the upstream AUTOMATIC1111 repository works with ROCm (its webui.sh installs the ROCm build of PyTorch on AMD systems):
```bash
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
./webui.sh --enable-insecure-extension-access
```
Performance Optimization for AMD Stable Diffusion
| Setting | Recommendation |
|---|---|
| Resolution | 512x512 or 768x512 |
| Steps | 20–30 |
| Scheduler | Euler A |
| Precision | --no-half on DirectML |
Basic Troubleshooting for AMD Stable Diffusion Setup
| Issue | Solution |
|---|---|
| Slow generation | Use SD Turbo model |
| Memory crash | Lower resolution |
| Driver error | Update AMD Adrenalin |
| HF auth error | Login to Hugging Face |
Useful Resources
- AMD ROCm: rocm.docs.amd.com
- DirectML Stable Diffusion: lshqqytiger/stable-diffusion-webui-directml
- Diffusers Library: huggingface/diffusers
Before You Begin: AMD GPU Compatibility Notes
PyTorch's standard CUDA builds do not support AMD GPUs, because CUDA is NVIDIA-only. To run Stable Diffusion on AMD hardware you must use one of the following:
- DirectML (Windows): easiest setup
- ROCm (Linux): fastest performance
| Feature | DirectML (Windows) | ROCm (Linux) |
|---|---|---|
| Performance | Medium | High |
| Stability | Good | Very good |
| FP16 support | Limited | Full |
| Best Use | Beginners | Performance users |
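At runtime the two paths are selected differently: ROCm shows up through PyTorch's regular "cuda" device, while DirectML uses the torch-directml plugin. A minimal selection sketch, assuming torch-directml is installed on the Windows path:

```python
import torch

if torch.cuda.is_available():          # ROCm build on Linux
    device = torch.device("cuda")
else:                                   # Windows / DirectML path
    import torch_directml
    device = torch_directml.device()
print("Using device:", device)
```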
AMD vs NVIDIA: Performance Reality
AMD GPUs can run Stable Diffusion, but performance depends on the model and settings. It will not match comparable NVIDIA cards, yet it works reliably.
| GPU | Speed (img/min) | Works with | Notes |
|---|---|---|---|
| RX 580 (8GB) | 1 | DirectML | Slow but works |
| RX 5700 XT | 2 | DirectML | Good entry GPU |
| RX 6700 XT | 4 | DirectML/ROCm | Solid performance |
| RX 7900 XT | 6–8 | ROCm | Recommended |
Security Tip: Hugging Face Tokens
Avoid storing tokens in scripts. Use environment variables or .env files.
```python
import os
from huggingface_hub import login

login(os.environ.get("HF_TOKEN"))
```
Optional: Use SD Turbo for faster AMD rendering
For faster speeds:
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("stabilityai/sd-turbo")
pipe.to("cuda")  # ROCm; on Windows/DirectML move the pipeline to the torch_directml device instead
# SD Turbo is tuned for very few steps with guidance disabled
image = pipe("cinematic robot portrait", num_inference_steps=1, guidance_scale=0.0).images[0]
image.save("sd_turbo.png")
```
FAQ
Does Stable Diffusion work on AMD GPUs? Yes, with DirectML (Windows) or ROCm (Linux).
Do I need CUDA? No, CUDA is NVIDIA-only. AMD uses ROCm or DirectML.
What is the easiest way for AMD users? DirectML on Windows.
Can I use AUTOMATIC1111 WebUI with AMD GPUs? Yes, using the DirectML fork.
Is ROCm better than DirectML? Yes, but ROCm is Linux-only.
AUTOMATIC1111 on AMD: Full Optimized Setup
Windows (DirectML Optimized)
```powershell
git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml.git
cd stable-diffusion-webui-directml
python launch.py --precision full --no-half --use-directml --opt-split-attention
```
Linux (ROCm Optimized)
```bash
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
bash webui.sh --opt-sdp-attention
```
ComfyUI Setup for AMD (Workflows & Power Use)
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
python main.py               # picks up the ROCm build of PyTorch automatically on Linux
# On Windows with torch-directml installed, launch with: python main.py --directml
```
Performance Benchmarks (AMD GPU)
| GPU | SD 1.5 Speed | SDXL Speed |
|---|---|---|
| RX 6600 | 3.5 it/s | 1.1 it/s |
| RX 6700 XT | 5.1 it/s | 2.2 it/s |
| RX 6800 XT | 6.2 it/s | 2.9 it/s |
| RX 7900 XT | 8.8 it/s | 4.1 it/s |
Recommended Stable Diffusion Models for AMD GPUs (by VRAM)
Best Models for 6GB VRAM (Entry GPUs: RX 580, RX 5500 XT, RX 6500 XT)
| Model | Type | Link |
|---|---|---|
| SD Turbo | Fast | stabilityai/sd-turbo |
| Realistic Vision v5.1 Lite | Realistic | CivitAI #4201 |
| DreamShaper 6 Lite | General/Creative | CivitAI #4384 |
Best Models for 8GB VRAM (RX 5700 XT, RX 6600 XT)
| Model | Type | Link |
|---|---|---|
| DreamShaper XL | SDXL Allround | CivitAI #4384 (XL) |
| RevAnimated | Anime + Realistic Mix | CivitAI #7371 |
| Realistic Vision v5.1 | Photography | CivitAI #4201 |
Best Models for 12GB VRAM (RX 6700 XT, RX 6800 XT)
| Model | Type | Link |
|---|---|---|
| Juggernaut XL | Realistic XL | RunDiffusion/Juggernaut-XI |
| SDXL Base 1.0 | Official Base | stabilityai/sdxl-base-1.0 |
| BluePencil XL | Art XL | CivitAI #240138 |
Best Models for 16GB+ VRAM (RX 7900 XT/XTX, Radeon Pro)
| Model | Type | Link |
|---|---|---|
| FLUX.1-dev | Future Model | black-forest-labs/FLUX.1-dev |
| Photon XL | Hyper Realistic | CivitAI #303980 |
| ZavyChroma XL | Clean Realism | CivitAI #129925 |
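Most CivitAI checkpoints are distributed as a single .safetensors file. In Diffusers they can be loaded with from_single_file; a minimal sketch, with a placeholder file path:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder path: point this at the .safetensors checkpoint you downloaded from CivitAI
pipe = StableDiffusionPipeline.from_single_file(
    "./models/realisticVisionV51.safetensors",
    torch_dtype=torch.float16,   # use float32 on DirectML if FP16 is unstable
)
pipe.to("cuda")  # "cuda" maps to the Radeon GPU on ROCm
```

For the SDXL checkpoints listed above, use StableDiffusionXLPipeline.from_single_file instead. AUTOMATIC1111 users simply place the file in stable-diffusion-webui/models/Stable-diffusion/.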
Recommended AMD LoRA Packs
| LoRA | Style | Link |
|---|---|---|
| Cyberpunk City | Sci-fi | CivitAI #78775 |
| Film Color Science | Photographic | CivitAI #20182 |
| Pixar Style | Animation | CivitAI #137573 |
Power Prompt Library (Creator-Grade Prompts)
Realistic Portrait (Studio Quality)
```
ultra detailed portrait of a {subject}, cinematic studio lighting, sony a7r iv photography, 85mm lens, skin texture, film grain, 8k, sharp focus, masterpiece
```
Negative: lowres, bad hands, overexposed, jpeg artifacts, blurry, distorted
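In Diffusers, the negative prompt is passed separately through the negative_prompt argument. A minimal sketch, assuming pipe is a pipeline loaded as in the earlier sections and the {subject} placeholder has been filled in with an example subject:

```python
# Assumption: `pipe` is an already-loaded StableDiffusionPipeline from the earlier examples
image = pipe(
    prompt="ultra detailed portrait of a jazz musician, cinematic studio lighting, 85mm lens, film grain, 8k, sharp focus",
    negative_prompt="lowres, bad hands, overexposed, jpeg artifacts, blurry, distorted",
    num_inference_steps=25,
    guidance_scale=7.0,
).images[0]
image.save("portrait.png")
```

In AUTOMATIC1111, the same negative text goes into the Negative prompt field.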
Cinematic Environment
```
ancient sci-fi temple in the mountains, volumetric god rays, dramatic scale, epic atmosphere, unreal engine, cinematic composition, ultra detailed environment concept art
```
Product Render
```
minimalist product render of futuristic headphones, soft reflections, volumetric rim light, octane render, high gloss, elegant industrial design, clean background
```
Cyberpunk City Shot
```
neon cyberpunk street in tokyo, wet asphalt reflections, neon haze, dystopian aesthetic, rain particles, atmospheric depth, cinematic 2.39:1
```
Digital Illustration
```
fantasy warrior princess, ornate golden armor, flowing cape, intricate costume design, mystical aura, dramatic lighting, artgerm style, greg rutkowski, wlop influence
```
ControlNet Setup for AMD GPUs (Essential Models Only)
ControlNet allows precise control over image structure. These 3 models work best and are AMD-compatible.
Install ControlNet in AUTOMATIC1111
- Go to Extensions → Install from URL
- Add ControlNet repo URL:
```
https://github.com/Mikubill/sd-webui-controlnet
```
- Click Apply and Restart UI
Download Essential ControlNet Models
| Model | Purpose | Download |
|---|---|---|
| Canny | Edge-based object control | lllyasviel/ControlNet-v1-1 |
| Depth | Perspective + composition | lllyasviel/ControlNet-v1-1 |
| OpenPose | Human pose + character posing | lllyasviel/ControlNet-v1-1 |
AMD Performance Settings
- Enable low VRAM mode if GPU < 12GB
- Use only 1 ControlNet at a time on 8GB GPUs
- Set preprocessors to lightweight modes
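ControlNet also works directly in Diffusers on AMD. A minimal Canny sketch, assuming the diffusers-format checkpoint lllyasviel/control_v11p_sd15_canny and a pre-computed edge image (both are assumptions, not taken from the table above):

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Assumed checkpoint name for the ControlNet v1.1 Canny model in diffusers format
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")                   # ROCm; on DirectML use the torch_directml device and float32
pipe.enable_attention_slicing()   # helps on 8GB cards

edge_map = load_image("canny_edges.png")  # placeholder: a pre-computed Canny edge image
image = pipe("cinematic robot portrait", image=edge_map, num_inference_steps=25).images[0]
image.save("controlnet_amd.png")
```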
LoRA Integration on AMD (WebUI + Diffusers)
LoRA allows style transfer and character consistency with very small model files.
Use LoRA in AUTOMATIC1111 (AMD)
- Download any `.safetensors` LoRA file from CivitAI
- Place it in: `stable-diffusion-webui/models/Lora/`
- Activate it in the prompt:
```
<lora:FilmColor-LORA:0.7>
```
Recommended LoRA Strength Settings
| Purpose | Weight Example |
|---|---|
| Subtle style influence | 0.4–0.6 |
| Strong creative style | 0.7–0.9 |
| Character consistency | 0.9–1.1 |
Use LoRA with Diffusers (AMD Compatible)
```python
# Assumes `pipeline` is a StableDiffusionPipeline loaded as in the earlier examples
pipeline.load_lora_weights("./lora/FilmColor-LORA.safetensors")
image = pipeline(prompt, num_inference_steps=28).images[0]
```
Performance Tips for LoRA on AMD
- Avoid stacking more than 2 LoRAs on 8GB GPUs
- Use SD Turbo for faster LoRA results
- If VRAM error: reduce resolution to `512x768` or enable CPU offload
SDXL on AMD GPUs (Performance Setup)
SDXL delivers better detail and realism than SD 1.5, but it requires more VRAM. It works on AMD GPUs with ROCm (Linux) or DirectML (Windows).
Recommended SDXL Settings for AMD
| Setting | Value |
|---|---|
| Sampler | Euler A or DPM++ 2M |
| Steps | 20–30 |
| CFG Scale | 6.5–8 |
| Resolution | 1024×1024 (12GB VRAM) |
| Refiner | Optional |
Load SDXL in Diffusers (AMD Compatible)
```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
)
pipe.to("cuda")  # ROCm; on DirectML move the pipeline to the torch_directml device
image = pipe("epic fantasy castle, golden hour lighting").images[0]
image.save("sdxl_amd.png")
```
SDXL + Refiner Workflow
```python
# Reuses DiffusionPipeline/torch from the previous example; `prompt` is your SDXL prompt
base = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
refiner = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16)
base.to("cuda"); refiner.to("cuda")
image = base(prompt, output_type="latent").images  # keep latents for the refiner
image = refiner(prompt, image=image).images[0]
```
Best for AMD GPUs with 12GB+ VRAM: RX 6700 XT, RX 6800 XT, RX 7900 XT/XTX
Windows vs Linux for AMD: Which Setup to Choose?
| Feature | Windows (DirectML) | Linux (ROCm) |
|---|---|---|
| Speed | Medium | Fast |
| Stability | Good | Very Stable |
| SDXL Support | Yes | Yes |
| FLUX Compatibility | Limited | Partial |
| ControlNet Support | Yes | Yes |
| Setup Difficulty | Easy | Medium |
| Best For | Beginners | Performance Users |
Recommended Settings for AMD GPUs
| GPU Class | Resolution | Steps | Sampler | Notes |
|---|---|---|---|---|
| 6GB (RX 580, 5500 XT) | 512×768 | 20 | Euler A | Use SD Turbo |
| 8GB (RX 5700 XT, 6600 XT) | 768×768 | 24 | DPM++ 2M | Good balance |
| 12GB (RX 6700 XT, 6800) | 1024×1024 | 26 | DPM++ SDE | Best for SDXL |
| 16GB+ (RX 7900 XT/XTX) | 1216×1216 | 28 | DPM++ 2M Karras | High quality |
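To apply these settings in a Diffusers script, set the scheduler and resolution explicitly. A sketch using the entry-level row (Euler A, 20 steps, 512×768); the model and prompt are reused from earlier examples:

```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
# "Euler A" in the WebUI corresponds to the Euler ancestral scheduler in Diffusers
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")  # ROCm; on DirectML use the torch_directml device and float32

image = pipe(
    "fantasy castle, golden sunset, high detail",
    height=768, width=512,
    num_inference_steps=20,
).images[0]
image.save("tuned_amd.png")
```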
AUTOMATIC1111 Performance Flags for AMD (Quick Boost)
Use these flags with launch.py (or add them to COMMANDLINE_ARGS in webui-user.bat / webui-user.sh) to improve speed and stability on AMD GPUs:
```bash
python launch.py --precision full --no-half --enable-insecure-extension-access --opt-sdp-attention
```
Recommended flags explained:
- `--precision full`: prevents DirectML crashes
- `--no-half`: fixes FP16 instability on AMD
- `--opt-sdp-attention`: speeds up sampling
Full ROCm Install (Ubuntu 22.04, Recommended for Maximum AMD Performance)
```bash
# Remove any old AMD GPU drivers first
sudo amdgpu-uninstall || true

# System update
sudo apt update && sudo apt upgrade -y

# Install dependencies
sudo apt install wget gnupg2 software-properties-common -y

# Add ROCm repository
wget https://repo.radeon.com/rocm/rocm.gpg.key
sudo gpg --dearmor < rocm.gpg.key | sudo tee /etc/apt/trusted.gpg.d/rocm.gpg > /dev/null
sudo add-apt-repository "deb [arch=amd64] https://repo.radeon.com/rocm/apt/6.0 ubuntu main"

# Install ROCm
sudo apt update
sudo apt install rocm-dev -y

# Enable ROCm PATH
echo 'export PATH=/opt/rocm/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
```
ControlNet Models (Quick Reference)
| Model | Purpose | Download |
|---|---|---|
| Canny | Strong edges and shape control | lllyasviel/ControlNet-v1-1 |
| Depth | Scene structure & perspective | lllyasviel/ControlNet-v1-1 |
| OpenPose | Human pose control | lllyasviel/ControlNet-v1-1 |
AMD Performance Settings for ControlNet
- Use 1 ControlNet at a time on 8GB VRAM GPUs
- Use Preprocessor: `canny` or `depth` for speed
- Disable High-Res Fix on low VRAM GPUs
Model Support on AMD (Overview)
| Model | Status | Notes |
|---|---|---|
| SD 1.5 | Full support | Fastest |
| SD Turbo | Best for low VRAM | Fast |
| SDXL | Slower but works | Needs 12GB VRAM |
| FLUX | Experimental | Advanced setup |
Diffusers Performance Boost for AMD
```python
pipe.enable_attention_slicing()        # slice attention computation to reduce peak VRAM
pipe.enable_vae_slicing()              # decode the VAE in slices
pipe.enable_sequential_cpu_offload()   # offload idle submodules to system RAM (needs accelerate)
```
- Best for low VRAM stability
- Works on both DirectML and ROCm
VRAM Optimization (AMD)
- Use attention slicing
- Use `--no-half` on DirectML
- Use SD Turbo for speed
- Reduce resolution to 512×512
Offline Mode (No Internet Required)
```bash
python launch.py --disable-safe-unpickle --offline
```
Troubleshooting (AMD Common Issues)
| Issue | Cause | Fix |
|---|---|---|
| `torch not compiled with ROCm` | Wrong torch install | Reinstall the ROCm wheel: `pip install torch --index-url https://download.pytorch.org/whl/rocm6.0` |
| HIP error | ROCm not initialized | Reboot and check `rocminfo` output |
| VRAM out of memory | Resolution too high | Use 512×768 and enable attention slicing |
| Slow speed | Using the CPU by mistake | Check GPU usage with `torch.cuda.is_available()` |
| WebUI crash on AMD | FP16 issue | Use the flags `--precision full --no-half` |
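For the ROCm-side issues above, a short diagnostic sketch shows whether the GPU is actually in use and how much VRAM PyTorch can address (on DirectML, torch.cuda.is_available() is expected to return False):

```python
import torch

print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("Device:", props.name)
    print("VRAM (GB):", round(props.total_memory / 1024**3, 1))
else:
    print("No ROCm GPU detected - check the driver and PyTorch install.")
```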
Related Guides
- SDXL Best Practices: /blog/sdxl-best-practices-guide
- Stable Diffusion on Apple Silicon: /blog/stable-diffusion-apple-guide
- Stable Diffusion on Google Colab: /blog/stable-diffusion-colab-pro-guide
- Stable Diffusion Prompting: /blog/stable-diffusion-prompting-guide
Conclusion
Stable Diffusion is fully usable on AMD GPUs using either DirectML (Windows) or ROCm (Linux). Performance depends on GPU VRAM and driver setup, but with the configurations in this guide, AMD users can run SD 1.5, SD Turbo, SDXL, LoRA, and ControlNet effectively.
Choose DirectML for a simple setup on Windows, or ROCm for maximum performance on Linux. Use the performance flags and memory optimizations above to prevent crashes and speed up generation; they make a noticeable difference.
Install the dependencies, load your model, and start generating locally. It is not as seamless as on NVIDIA hardware, but it works.