Stable Diffusion normally performs best on NVIDIA GPUs with CUDA support, but many users want to run it on AMD hardware without switching GPUs, and that is entirely possible. Thanks to ROCm (Linux) and DirectML (Windows), Stable Diffusion also runs on AMD GPUs such as the RX 5000, 6000, and 7000 series and the Radeon VII. This guide shows, step by step, how to run Stable Diffusion locally on an AMD GPU using Diffusers and the AUTOMATIC1111 WebUI.
Supported AMD GPUs for Stable Diffusion
- Radeon RX 5000 Series (RDNA1)
- Radeon RX 6000 Series (RDNA2)
- Radeon RX 7000 Series (RDNA3)
- Radeon VII
- Radeon Pro Series
Requirements to Run Stable Diffusion on AMD GPUs
| Requirement | Linux | Windows |
|---|---|---|
| Python 3.10+ | ✅ | ✅ |
| AMD GPU 8GB+ VRAM | ✅ | ✅ |
| ROCm drivers | ✅ | ❌ |
| DirectML | ❌ | ✅ |
| Git | ✅ | ✅ |
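Before installing anything GPU-specific, a quick pre-flight check confirms the basics from the table above. This is a minimal sketch; run it with the Python interpreter you plan to use:

```python
import shutil
import sys

# Pre-flight check: Python version and git availability
assert sys.version_info >= (3, 10), "Python 3.10+ is required"
print("Python:", sys.version.split()[0])
print("git found:", shutil.which("git") is not None)
```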
Method 1: Run Stable Diffusion on AMD (Windows, DirectML Setup)
DirectML allows Stable Diffusion to run on AMD GPUs without ROCm. It is the easier option on Windows and the path of least resistance for most users.
Installation
- Install Python: python.org/downloads
- Install Git: git-scm.com/downloads
- Open PowerShell and create environment:
```powershell
python -m venv sd-env
sd-env\Scripts\activate
pip install torch-directml
pip install diffusers transformers accelerate safetensors
```
Test Generation
```python
from diffusers import DiffusionPipeline
import torch_directml

# The torch-directml plugin exposes the AMD GPU through its own device object
device = torch_directml.device()
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.to(device)
image = pipe("cinematic robot portrait, detailed, 4k").images[0]
image.save("amd_test.png")
```
Method 2: Run Stable Diffusion on AMD (Linux, ROCm Setup)
ROCm delivers faster performance on Linux. If you are comfortable with Linux, this is the recommended path.
Install ROCm Drivers
Follow AMD instructions: ROCm docs
Install PyTorch with ROCm
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
```
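To confirm the ROCm build of PyTorch actually sees your Radeon card, run a short check. A sketch; on ROCm the GPU is addressed through the regular "cuda" device API:

```python
import torch

print("HIP/ROCm version:", torch.version.hip)      # None on a CPU-only or CUDA build
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```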
Install Diffusers
```bash
pip install diffusers transformers accelerate safetensors
```
Run Stable Diffusion
```python
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.to("cuda")  # on ROCm, the Radeon GPU is addressed through the "cuda" device
image = pipe("fantasy castle, golden sunset, high detail").images[0]
image.save("output.png")
```
Run Stable Diffusion WebUI (AUTOMATIC1111) on AMD GPUs
Windows (DirectML)
```powershell
git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml.git
cd stable-diffusion-webui-directml
python launch.py --precision full --no-half --use-directml
```
Linux (ROCm)
On Linux, the upstream AUTOMATIC1111 repository works with ROCm (its webui.sh installs the ROCm build of PyTorch on AMD systems):
```bash
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
./webui.sh --enable-insecure-extension-access
```
Performance Optimization for AMD Stable Diffusion
| Setting | Recommendation |
|---|---|
| Resolution | 512x512 or 768x512 |
| Steps | 20–30 |
| Scheduler | Euler A |
| Precision | --no-half on DirectML |
Basic Troubleshooting for AMD Stable Diffusion Setup
| Issue | Solution |
|---|---|
| Slow generation | Use SD Turbo model |
| Memory crash | Lower resolution |
| Driver error | Update AMD Adrenalin |
| HF auth error | Login to Hugging Face |
Useful Resources
- AMD ROCm: rocm.docs.amd.com
- DirectML Stable Diffusion: lshqqytiger/stable-diffusion-webui-directml
- Diffusers Library: huggingface/diffusers
Before You Begin: AMD GPU Compatibility Notes
PyTorch's standard CUDA builds do not support AMD GPUs, because CUDA is NVIDIA-only. To run Stable Diffusion on AMD hardware you must use one of the following:
- DirectML (Windows): easiest setup
- ROCm (Linux): fastest performance
| Feature | DirectML (Windows) | ROCm (Linux) |
|---|---|---|
| Performance | Medium | High |
| Stability | Good | Very good |
| FP16 support | Limited | Full |
| Best Use | Beginners | Performance users |
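At runtime the two paths are selected differently: ROCm shows up through PyTorch's regular "cuda" device, while DirectML uses the torch-directml plugin. A minimal selection sketch, assuming torch-directml is installed on the Windows path:

```python
import torch

if torch.cuda.is_available():          # ROCm build on Linux
    device = torch.device("cuda")
else:                                   # Windows / DirectML path
    import torch_directml
    device = torch_directml.device()
print("Using device:", device)
```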
AMD vs NVIDIA: Performance Reality
AMD GPUs can run Stable Diffusion, but performance depends on the model and settings. It will not match comparable NVIDIA cards, yet it works reliably.
| GPU | Speed (img/min) | Works with | Notes |
|---|---|---|---|
| RX 580 (8GB) | 1 | DirectML | Slow but works |
| RX 5700 XT | 2 | DirectML | Good entry GPU |
| RX 6700 XT | 4 | DirectML/ROCm | Solid performance |
| RX 7900 XT | 6–8 | ROCm | Recommended |
Security Tip: Hugging Face Tokens
Avoid storing tokens in scripts. Use environment variables or .env files.
```python
import os
from huggingface_hub import login

login(os.environ.get("HF_TOKEN"))
```
Optional: Use SD Turbo for faster AMD rendering
For faster speeds:
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("stabilityai/sd-turbo")
pipe.to("cuda")  # ROCm; on Windows/DirectML move the pipeline to the torch_directml device instead
# SD Turbo is tuned for very few steps with guidance disabled
image = pipe("cinematic robot portrait", num_inference_steps=1, guidance_scale=0.0).images[0]
image.save("sd_turbo.png")
```
FAQ
Does Stable Diffusion work on AMD GPUs? Yes, with DirectML (Windows) or ROCm (Linux).
Do I need CUDA? No, CUDA is NVIDIA-only. AMD uses ROCm or DirectML.
What is the easiest way for AMD users? DirectML on Windows.
Can I use AUTOMATIC1111 WebUI with AMD GPUs? Yes, using the DirectML fork.
Is ROCm better than DirectML? Yes, but ROCm is Linux-only.
AUTOMATIC1111 on AMD: Full Optimized Setup
Windows (DirectML Optimized)
```powershell
git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml.git
cd stable-diffusion-webui-directml
python launch.py --precision full --no-half --use-directml --opt-split-attention
```
Linux (ROCm Optimized)
```bash
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
bash webui.sh --opt-sdp-attention
```
ComfyUI Setup for AMD (Workflows & Power Use)
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
python main.py               # picks up the ROCm build of PyTorch automatically on Linux
# On Windows with torch-directml installed, launch with: python main.py --directml
```
Performance Benchmarks (AMD GPU)
| GPU | SD 1.5 Speed | SDXL Speed |
|---|---|---|
| RX 6600 | 3.5 it/s | 1.1 it/s |
| RX 6700 XT | 5.1 it/s | 2.2 it/s |
| RX 6800 XT | 6.2 it/s | 2.9 it/s |
| RX 7900 XT | 8.8 it/s | 4.1 it/s |
Recommended Stable Diffusion Models for AMD GPUs (by VRAM)
Best Models for 6GB VRAM (Entry GPUs: RX 580, RX 5500 XT, RX 6500 XT)
| Model | Type | Link |
|---|---|---|
| SD Turbo | Fast | stabilityai/sd-turbo |
| Realistic Vision v5.1 Lite | Realistic | CivitAI #4201 |
| DreamShaper 6 Lite | General/Creative | CivitAI #4384 |
Best Models for 8GB VRAM (RX 5700 XT, RX 6600 XT)
| Model | Type | Link |
|---|---|---|
| DreamShaper XL | SDXL Allround | CivitAI #4384 (XL) |
| RevAnimated | Anime + Realistic Mix | CivitAI #7371 |
| Realistic Vision v5.1 | Photography | CivitAI #4201 |
Best Models for 12GB VRAM (RX 6700 XT, RX 6800 XT)
| Model | Type | Link |
|---|---|---|
| Juggernaut XL | Realistic XL | RunDiffusion/Juggernaut-XI |
| SDXL Base 1.0 | Official Base | stabilityai/sdxl-base-1.0 |
| BluePencil XL | Art XL | CivitAI #240138 |
Best Models for 16GB+ VRAM (RX 7900 XT/XTX, Radeon Pro)
| Model | Type | Link |
|---|---|---|
| FLUX.1-dev | Future Model | black-forest-labs/FLUX.1-dev |
| Photon XL | Hyper Realistic | CivitAI #303980 |
| ZavyChroma XL | Clean Realism | CivitAI #129925 |
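Most CivitAI checkpoints are distributed as a single .safetensors file. In Diffusers they can be loaded with from_single_file; a minimal sketch, with a placeholder file path:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder path: point this at the .safetensors checkpoint you downloaded from CivitAI
pipe = StableDiffusionPipeline.from_single_file(
    "./models/realisticVisionV51.safetensors",
    torch_dtype=torch.float16,   # use float32 on DirectML if FP16 is unstable
)
pipe.to("cuda")  # "cuda" maps to the Radeon GPU on ROCm
```

For the SDXL checkpoints listed above, use StableDiffusionXLPipeline.from_single_file instead. AUTOMATIC1111 users simply place the file in stable-diffusion-webui/models/Stable-diffusion/.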
Recommended AMD LoRA Packs
| LoRA | Style | Link |
|---|---|---|
| Cyberpunk City | Sci-fi | CivitAI #78775 |
| Film Color Science | Photographic | CivitAI #20182 |
| Pixar Style | Animation | CivitAI #137573 |
Power Prompt Library (Creator-Grade Prompts)
Realistic Portrait (Studio Quality)
```
ultra detailed portrait of a {subject}, cinematic studio lighting, sony a7r iv photography, 85mm lens, skin texture, film grain, 8k, sharp focus, masterpiece
```
Negative: lowres, bad hands, overexposed, jpeg artifacts, blurry, distorted
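In Diffusers, the negative prompt is passed separately through the negative_prompt argument. A minimal sketch, assuming pipe is a pipeline loaded as in the earlier sections and the {subject} placeholder has been filled in with an example subject:

```python
# Assumption: `pipe` is an already-loaded StableDiffusionPipeline from the earlier examples
image = pipe(
    prompt="ultra detailed portrait of a jazz musician, cinematic studio lighting, 85mm lens, film grain, 8k, sharp focus",
    negative_prompt="lowres, bad hands, overexposed, jpeg artifacts, blurry, distorted",
    num_inference_steps=25,
    guidance_scale=7.0,
).images[0]
image.save("portrait.png")
```

In AUTOMATIC1111, the same negative text goes into the Negative prompt field.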
Cinematic Environment
```
ancient sci-fi temple in the mountains, volumetric god rays, dramatic scale, epic atmosphere, unreal engine, cinematic composition, ultra detailed environment concept art
```
Product Render
```
minimalist product render of futuristic headphones, soft reflections, volumetric rim light, octane render, high gloss, elegant industrial design, clean background
```
Cyberpunk City Shot
```
neon cyberpunk street in tokyo, wet asphalt reflections, neon haze, dystopian aesthetic, rain particles, atmospheric depth, cinematic 2.39:1
```
Digital Illustration
```
fantasy warrior princess, ornate golden armor, flowing cape, intricate costume design, mystical aura, dramatic lighting, artgerm style, greg rutkowski, wlop influence
```
ControlNet Setup for AMD GPUs (Essential Models Only)
ControlNet allows precise control over image structure. These 3 models work best and are AMD-compatible.
Install ControlNet in AUTOMATIC1111
- Go to Extensions → Install from URL
- Add ControlNet repo URL:
```
https://github.com/Mikubill/sd-webui-controlnet
```
- Click Apply and Restart UI
Download Essential ControlNet Models
| Model | Purpose | Download |
|---|---|---|
| Canny | Edge-based object control | lllyasviel/ControlNet-v1-1 |
| Depth | Perspective + composition | lllyasviel/ControlNet-v1-1 |
| OpenPose | Human pose + character posing | lllyasviel/ControlNet-v1-1 |
AMD Performance Settings
- Enable low VRAM mode if GPU < 12GB
- Use only 1 ControlNet at a time on 8GB GPUs
- Set preprocessors to lightweight modes
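ControlNet also works directly in Diffusers on AMD. A minimal Canny sketch, assuming the diffusers-format checkpoint lllyasviel/control_v11p_sd15_canny and a pre-computed edge image (both are assumptions, not taken from the table above):

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Assumed checkpoint name for the ControlNet v1.1 Canny model in diffusers format
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")                   # ROCm; on DirectML use the torch_directml device and float32
pipe.enable_attention_slicing()   # helps on 8GB cards

edge_map = load_image("canny_edges.png")  # placeholder: a pre-computed Canny edge image
image = pipe("cinematic robot portrait", image=edge_map, num_inference_steps=25).images[0]
image.save("controlnet_amd.png")
```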
LoRA Integration on AMD (WebUI + Diffusers)
LoRA allows style transfer and character consistency with very small model files.
Use LoRA in AUTOMATIC1111 (AMD)
- Download any `.safetensors` LoRA file from CivitAI
- Place it in: `stable-diffusion-webui/models/Lora/`
- Activate it in the prompt:
```
<lora:FilmColor-LORA:0.7>
```
Recommended LoRA Strength Settings
| Purpose | Weight Example |
|---|---|
| Subtle style influence | 0.4–0.6 |
| Strong creative style | 0.7–0.9 |
| Character consistency | 0.9–1.1 |
Use LoRA with Diffusers (AMD Compatible)
```python
# Assumes `pipeline` is a StableDiffusionPipeline loaded as in the earlier examples
pipeline.load_lora_weights("./lora/FilmColor-LORA.safetensors")
image = pipeline(prompt, num_inference_steps=28).images[0]
```
Performance Tips for LoRA on AMD
- Avoid stacking more than 2 LoRAs on 8GB GPUs
- Use SD Turbo for faster LoRA results
- If VRAM error: reduce resolution to `512x768` or enable CPU offload
SDXL on AMD GPUs (Performance Setup)
SDXL delivers better detail and realism than SD 1.5, but it requires more VRAM. It works on AMD GPUs with ROCm (Linux) or DirectML (Windows).
Recommended SDXL Settings for AMD
| Setting | Value |
|---|---|
| Sampler | Euler A or DPM++ 2M |
| Steps | 20–30 |
| CFG Scale | 6.5–8 |
| Resolution | 1024×1024 (12GB VRAM) |
| Refiner | Optional |
Load SDXL in Diffusers (AMD Compatible)
```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
)
pipe.to("cuda")  # ROCm; on DirectML move the pipeline to the torch_directml device
image = pipe("epic fantasy castle, golden hour lighting").images[0]
image.save("sdxl_amd.png")
```
SDXL + Refiner Workflow
```python
# Reuses DiffusionPipeline/torch from the previous example; `prompt` is your SDXL prompt
base = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
refiner = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16)
base.to("cuda"); refiner.to("cuda")
image = base(prompt, output_type="latent").images  # keep latents for the refiner
image = refiner(prompt, image=image).images[0]
```
Best for AMD GPUs with 12GB+ VRAM: RX 6700 XT, RX 6800 XT, RX 7900 XT/XTX
Windows vs Linux for AMD: Which Setup to Choose?
| Feature | Windows (DirectML) | Linux (ROCm) |
|---|---|---|
| Speed | Medium | Fast |
| Stability | Good | Very Stable |
| SDXL Support | Yes | Yes |
| FLUX Compatibility | Limited | Partial |
| ControlNet Support | Yes | Yes |
| Setup Difficulty | Easy | Medium |
| Best For | Beginners | Performance Users |
Recommended Settings for AMD GPUs
| GPU Class | Resolution | Steps | Sampler | Notes |
|---|---|---|---|---|
| 6GB (RX 580, 5500 XT) | 512×768 | 20 | Euler A | Use SD Turbo |
| 8GB (RX 5700 XT, 6600 XT) | 768×768 | 24 | DPM++ 2M | Good balance |
| 12GB (RX 6700 XT, 6800) | 1024×1024 | 26 | DPM++ SDE | Best for SDXL |
| 16GB+ (RX 7900 XT/XTX) | 1216×1216 | 28 | DPM++ 2M Karras | High quality |
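To apply these settings in a Diffusers script, set the scheduler and resolution explicitly. A sketch using the entry-level row (Euler A, 20 steps, 512×768); the model and prompt are reused from earlier examples:

```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
# "Euler A" in the WebUI corresponds to the Euler ancestral scheduler in Diffusers
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")  # ROCm; on DirectML use the torch_directml device and float32

image = pipe(
    "fantasy castle, golden sunset, high detail",
    height=768, width=512,
    num_inference_steps=20,
).images[0]
image.save("tuned_amd.png")
```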
AUTOMATIC1111 Performance Flags for AMD (Quick Boost)
Use these flags with launch.py (or add them to COMMANDLINE_ARGS in webui-user.bat / webui-user.sh) to improve speed and stability on AMD GPUs:
```bash
python launch.py --precision full --no-half --enable-insecure-extension-access --opt-sdp-attention
```
Recommended flags explained:
- `--precision full`: prevents DirectML crashes
- `--no-half`: fixes FP16 instability on AMD
- `--opt-sdp-attention`: speeds up sampling
Full ROCm Install (Ubuntu 22.04, Recommended for Maximum AMD Performance)
```bash
# Remove any old AMD GPU drivers first
sudo amdgpu-uninstall || true

# System update
sudo apt update && sudo apt upgrade -y

# Install dependencies
sudo apt install wget gnupg2 software-properties-common -y

# Add ROCm repository
wget https://repo.radeon.com/rocm/rocm.gpg.key
sudo gpg --dearmor < rocm.gpg.key | sudo tee /etc/apt/trusted.gpg.d/rocm.gpg > /dev/null
sudo add-apt-repository "deb [arch=amd64] https://repo.radeon.com/rocm/apt/6.0 ubuntu main"

# Install ROCm
sudo apt update
sudo apt install rocm-dev -y

# Enable ROCm PATH
echo 'export PATH=/opt/rocm/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
```
ControlNet Models (Quick Reference)
| Model | Purpose | Download |
|---|---|---|
| Canny | Strong edges and shape control | lllyasviel/ControlNet-v1-1 |
| Depth | Scene structure & perspective | lllyasviel/ControlNet-v1-1 |
| OpenPose | Human pose control | lllyasviel/ControlNet-v1-1 |
AMD Performance Settings for ControlNet
- Use 1 ControlNet at a time on 8GB VRAM GPUs
- Use Preprocessor: `canny` or `depth` for speed
- Disable High-Res Fix on low VRAM GPUs
Model Support on AMD (Overview)
| Model | Status | Notes |
|---|---|---|
| SD 1.5 | Full support | Fastest |
| SD Turbo | Best for low VRAM | Fast |
| SDXL | Slower but works | Needs 12GB VRAM |
| FLUX | Experimental | Advanced setup |
Diffusers Performance Boost for AMD
```python
pipe.enable_attention_slicing()        # slice attention computation to reduce peak VRAM
pipe.enable_vae_slicing()              # decode the VAE in slices
pipe.enable_sequential_cpu_offload()   # offload idle submodules to system RAM (needs accelerate)
```
- Best for low VRAM stability
- Works on both DirectML and ROCm
VRAM Optimization (AMD)
- Use attention slicing
- Use `--no-half` on DirectML
- Use SD Turbo for speed
- Reduce resolution to 512×512
Offline Mode (No Internet Required)
```bash
python launch.py --disable-safe-unpickle --offline
```
Troubleshooting (AMD Common Issues)
| Issue | Cause | Fix |
|---|---|---|
| `torch not compiled with ROCm` | Wrong torch install | Reinstall the ROCm wheel: `pip install torch --index-url https://download.pytorch.org/whl/rocm6.0` |
| HIP error | ROCm not initialized | Reboot and check `rocminfo` output |
| VRAM out of memory | Resolution too high | Use 512×768 and enable attention slicing |
| Slow speed | Using the CPU by mistake | Check GPU usage with `torch.cuda.is_available()` |
| WebUI crash on AMD | FP16 issue | Use the flags `--precision full --no-half` |
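For the ROCm-side issues above, a short diagnostic sketch shows whether the GPU is actually in use and how much VRAM PyTorch can address (on DirectML, torch.cuda.is_available() is expected to return False):

```python
import torch

print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("Device:", props.name)
    print("VRAM (GB):", round(props.total_memory / 1024**3, 1))
else:
    print("No ROCm GPU detected - check the driver and PyTorch install.")
```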
Related Guides
- SDXL Best Practices: /blog/sdxl-best-practices-guide
- Stable Diffusion on Apple Silicon: /blog/stable-diffusion-apple-guide
- Stable Diffusion on Google Colab: /blog/stable-diffusion-colab-pro-guide
- Stable Diffusion Prompting: /blog/stable-diffusion-prompting-guide
Conclusion
Stable Diffusion is fully usable on AMD GPUs using either DirectML (Windows) or ROCm (Linux). Performance depends on GPU VRAM and driver setup, but with the configurations in this guide, AMD users can run SD 1.5, SD Turbo, SDXL, LoRA, and ControlNet effectively.
Choose DirectML for a simple setup on Windows, or ROCm for maximum performance on Linux. Use the performance flags and memory optimizations above to prevent crashes and speed up generation; they make a noticeable difference.
Install the dependencies, load your model, and start generating locally. It is not as seamless as on NVIDIA hardware, but it works.