Stable Diffusion on Apple Silicon: M1/M2/M3 Setup


October 23, 2025
7 min read

Apple Silicon has strong support for local AI workloads thanks to the Metal Performance Shaders (MPS) backend. With the right setup, you can run Stable Diffusion locally on your Mac, even without an NVIDIA GPU.

This guide is written for beginners and walks through the full setup.

It includes:

  • ✅ Installation using Automatic1111 WebUI
  • ✅ One-click Mac alternatives (DiffusionBee, Draw Things)
  • ✅ Optimized settings for speed & memory
  • ✅ Fixes for common Mac errors

✅ System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| Chip | M1 | M2 Pro/Max or M3 |
| RAM | 8GB | 16GB+ |
| macOS | 13.3 or newer | Latest version |
| Storage | 10–30GB | 50GB+ |
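Before installing anything, it is worth confirming the machine really is Apple Silicon rather than Intel. A minimal check using only `uname`, which ships with macOS:

```shell
# Check whether this Mac is Apple Silicon (arm64) or Intel (x86_64)
arch="$(uname -m)"
echo "Architecture: $arch"
if [ "$arch" = "arm64" ]; then
  echo "Apple Silicon detected: the MPS settings in this guide apply."
else
  echo "Not arm64: the MPS-specific flags in this guide will not help."
fi
```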

✅ Method 1: Easiest Option (No Terminal)

Option A — DiffusionBee (One Click Installer)

  • Download: diffusionbee.com
  • Pros: Easiest option; beginner-friendly, just download and run
  • Cons: Limited customization

Option B – Draw Things

  • Download from Mac App Store
  • Pros: Local processing, supports LoRA & ControlNet
  • Cons: Slower workflow

If you want full control and extensions such as ControlNet, LoRA, upscaling, and custom models, continue to Method 2.


✅ Method 2: Install AUTOMATIC1111 WebUI (Full Control)

This is the recommended setup using Terminal.

Step 1: Install Homebrew

Open Terminal and run:

```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Install Python and Git:

```shell
brew install python git
```
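To confirm both tools are on your PATH before continuing, a quick check that assumes nothing beyond a POSIX shell:

```shell
# Report the installed versions of git and python3, or flag a missing tool
for tool in git python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
  else
    echo "$tool: NOT FOUND - revisit the brew install step"
  fi
done
```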

Step 2: Clone Stable Diffusion WebUI

```shell
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
```

Step 3: Configure for Apple Silicon

Edit webui-user.sh (create if missing):

```shell
echo 'export COMMANDLINE_ARGS="--skip-torch-cuda-test --upcast-sampling --no-half"' >> webui-user.sh
```

Add MPS support:

```shell
echo 'export PYTORCH_ENABLE_MPS_FALLBACK=1' >> webui-user.sh
```
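After running both echo commands, `webui-user.sh` should contain these two lines (shown here as the file's contents, not commands to run):

```shell
# webui-user.sh - Apple Silicon launch configuration
export COMMANDLINE_ARGS="--skip-torch-cuda-test --upcast-sampling --no-half"
export PYTORCH_ENABLE_MPS_FALLBACK=1
```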

Step 4: Install Requirements + Launch

```shell
./webui.sh
```

The WebUI runs at: http://127.0.0.1:7860
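From a second terminal you can check whether the server is up yet. This sketch uses curl, which ships with macOS; a 200 means the WebUI is serving, anything else means it is still starting (the first launch downloads several gigabytes of dependencies):

```shell
# Print the HTTP status code of the local WebUI, or a note if it is not up yet
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:7860 || echo "WebUI not reachable yet"
```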


✅ Download a Model (SDXL or SD1.5)

Download a model from Hugging Face and place it into:

stable-diffusion-webui/models/Stable-diffusion/

Example (SDXL base model): stabilityai/stable-diffusion-xl-base-1.0
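One way to fetch a checkpoint is the Hugging Face CLI (`pip install huggingface_hub`). The filename below is the one published in the stabilityai repo at the time of writing and may change, so treat it as an example; the final `ls` confirms what the WebUI will see on startup either way:

```shell
# Download the SDXL base checkpoint into the WebUI's model folder (if the
# Hugging Face CLI is installed), then list whatever checkpoints are present.
model_dir="stable-diffusion-webui/models/Stable-diffusion"
mkdir -p "$model_dir"
if command -v huggingface-cli >/dev/null 2>&1; then
  huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 \
    sd_xl_base_1.0.safetensors --local-dir "$model_dir"
fi
ls "$model_dir"/*.safetensors 2>/dev/null || echo "No checkpoints in $model_dir yet"
```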


✅ Enable Optimizations for Mac

In Settings → Optimization:

  • ✓ Enable MPS support
  • ✓ Reduce VRAM usage
  • ✓ Add --medvram or --lowvram to the launch arguments on 8GB Macs

Optionally, run this in the terminal before launching to disable the MPS memory allocation limit:

```shell
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
```
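If you launch the WebUI often, you can persist the MPS-related variables in your shell profile instead of exporting them each session. This assumes the default zsh shell on modern macOS:

```shell
# Append the MPS-related variables to ~/.zshrc so every new terminal has them
cat >> ~/.zshrc <<'EOF'
export PYTORCH_ENABLE_MPS_FALLBACK=1
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
EOF
```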

✅ Common Errors and Fixes

| Error | Fix |
| --- | --- |
| Torch not compiled with MPS | Install the standard macOS PyTorch wheels, which include MPS support: `pip install torch torchvision` |
| Slow generation | Use a smaller model (SD 1.5) instead of SDXL |
| WebUI crashes | Add `--no-half` to the launch arguments |
| Out of memory | Enable `--lowvram` |

✅ Recommended Generation Settings

| Model | Steps | Sampler | CFG |
| --- | --- | --- | --- |
| SD 1.5 | 20–25 | Euler a | 6–8 |
| SDXL | 18–24 | DPM++ 2M | 5–7 |
| Realistic Vision | 22 | DPM++ | 7 |

✅ ControlNet & LoRA Support on Mac

Both work on Apple Silicon, though more slowly than on NVIDIA GPUs. Place LoRA files here:

models/Lora/

Install ControlNet via Extensions → Available → Search “ControlNet”.


✅ Performance Tips

  • Close all other apps
  • Use image size 768×768 for speed
  • Use SD1.5 instead of SDXL if slow
  • Use Euler a sampler for fastest speed
  • Lower batch size to 1 on M1

✅ Conclusion

You now have Stable Diffusion running natively on Apple Silicon using the AUTOMATIC1111 WebUI. This setup supports model experimentation, LoRA, ControlNet, and upscaling.


✅ Bonus: Install ComfyUI on Apple Silicon (M1/M2/M3)

ComfyUI also works on macOS and is often faster than AUTOMATIC1111.

Step 1: Install dependencies

```shell
brew install git python
```

Step 2: Clone ComfyUI

```shell
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
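Before launching, you can sanity-check the environment; this snippet handles a missing torch install gracefully. If torch imports but reports MPS as unavailable, the ComfyUI README suggests installing a newer (nightly) PyTorch build for Apple Silicon:

```shell
# Verify that Python runs and that torch (if installed) can see the MPS device
python3 - <<'EOF'
import sys
print("python:", sys.version.split()[0])
try:
    import torch
    print("torch:", torch.__version__, "| MPS available:", torch.backends.mps.is_available())
except ImportError:
    print("torch not installed yet - re-run: pip install -r requirements.txt")
EOF
```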

Step 3: Launch ComfyUI

```shell
python main.py --force-fp16
```

Open in browser: http://127.0.0.1:8188

✅ Supports LoRA, ControlNet, and SDXL.


✅ Bonus: Install Forge (Stable Diffusion WebUI Forge) on Mac

Forge is a fork of AUTOMATIC1111 that is generally faster and supports memory-efficient attention, making it a solid alternative.

```shell
git clone https://github.com/lllyasviel/stable-diffusion-webui-forge.git
cd stable-diffusion-webui-forge
chmod +x webui.sh
./webui.sh --no-half --skip-torch-cuda-test
```

✅ Good for SDXL on M1/M2.



✅ Install FLUX on Apple Silicon (M1/M2/M3)

FLUX can run on Apple Silicon using CPU + Metal acceleration. It is significantly slower than on NVIDIA GPUs, but works well enough for experimentation.

Step 1: Install Required Dependencies

From your main AI folder:

```shell
cd stable-diffusion-webui
source venv/bin/activate || true
pip install transformers accelerate safetensors
```

Step 2: Download FLUX Models

Create a folder for FLUX assets:

models/FLUX/

Download a FLUX checkpoint (a .safetensors file) from Hugging Face and place the main checkpoint here:

stable-diffusion-webui/models/Stable-diffusion/

Step 3: Install FLUX Support

Install ComfyUI FLUX nodes (best support on macOS):

```shell
cd ComfyUI/custom_nodes
git clone https://github.com/city96/ComfyUI-FLUX.git
```

Restart ComfyUI.

Step 4: FLUX Settings for Mac

Use these settings for best performance:

| Setting | Value |
| --- | --- |
| Precision | fp16 |
| Batch size | 1 |
| Mode | float fallback enabled |

✅ Recommended: Use FLUX Schnell on Mac for faster generation.



⚡ FLUX Performance Optimization on Apple Silicon

Running FLUX on Apple Silicon is slower than on NVIDIA, but with the right settings you can improve generation speed and avoid memory crashes.

Use these flags when launching ComfyUI to improve stability:

```shell
python main.py --force-fp16 --disable-smart-memory --lowvram
```

✅ Environment Variables for Better Stability

Add these to your terminal before launching:

```shell
export PYTORCH_ENABLE_MPS_FALLBACK=1
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
```

✅ Memory Optimization

| Setting | Recommendation |
| --- | --- |
| Resolution | 768×768 (max for M1/M2 16GB) |
| Batch Size | 1 only |
| Model | Use FLUX.1-schnell (faster) |
| Sampler | Euler or DPM++ 2M |
| Precision | fp16 |

✅ Speed Boost Options

| Method | Gain |
| --- | --- |
| Use float32 fallback | Prevents model crashes on MPS |
| Use schnell model | 2× faster than dev |
| Disable VAE decoding | Slight speed gain |
| Close Chrome tabs | Frees unified memory |



🛠 Troubleshooting Summary

| Problem | Cause | Solution |
| --- | --- | --- |
| WebUI crashes | Missing MPS args | Add `--skip-torch-cuda-test --no-half` |
| Slow generation | SDXL too heavy | Use SD1.5 or FLUX-schnell |
| RuntimeError: MPS fallback | No GPU ops available | Set `PYTORCH_ENABLE_MPS_FALLBACK=1` |
| Out of memory | Mac RAM limit | Use `--lowvram` and 768×768 |
| Cannot load model | Wrong folder | Move to `models/Stable-diffusion/` |


✅ Final Notes

This guide provides a reliable and tested setup for Stable Diffusion on Apple Silicon systems using Automatic1111, ComfyUI, and Forge, with support for SD1.5, SDXL, and FLUX on macOS. Performance will not match CUDA GPUs, but the setup is stable and functional for local generation.