
Install Wan 2.2 in ComfyUI: Complete Local Setup Guide


If you’re generating AI video locally and you haven’t used Wan 2.2 yet, this is the guide you’ve been looking for. I’ve tested this highly capable Mixture-of-Experts model on everything from 8GB to 24GB VRAM GPUs, and the results are genuinely cinematic. Whether you are struggling to fit the massive 14B model into your VRAM or just trying to get rid of the dreaded “slow-motion” generation effect, this guide covers exactly how to install, configure, and optimize Wan 2.2 in ComfyUI.


🔍 What is Wan 2.2?

Wan 2.2 is an advanced, open-weights large-scale video generative model developed by Alibaba Cloud (Tongyi Wanxiang). Unlike traditional diffusion models that process every step through a single massive network, the flagship Wan 2.2 14B model utilizes an innovative Mixture-of-Experts (MoE) architecture.

It splits the denoising process into two distinct phases. A “High-Noise Expert” handles the early stages of generation, focusing on overall layout, composition, and broad motion. Then, a “Low-Noise Expert” takes over for the later stages, refining micro-details and textures. This allows the model to possess 27B total parameters while only requiring 14B active parameters per step, keeping computational costs reasonable while delivering staggering quality. For users with lower-end hardware, they also offer a highly compressed 5B hybrid model (TI2V) that runs on standard consumer graphics cards.
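The two-expert handoff described above can be sketched in a few lines of Python. This is a conceptual illustration only: the function name, the normalized noise scale, and the 50/50 boundary are all illustrative choices, not the model's actual internals (in practice the switch point is a tuned sigma threshold inside the sampler).

```python
# Conceptual sketch of Wan 2.2's two-expert MoE denoising schedule.
# The boundary value and expert names here are illustrative.

HIGH_NOISE_BOUNDARY = 0.5  # illustrative switch point on a [0, 1] noise scale


def pick_expert(sigma: float, boundary: float = HIGH_NOISE_BOUNDARY) -> str:
    """Route one denoising step to an expert.

    sigma is the current noise level, normalized so that 1.0 is
    pure noise and 0.0 is the clean latent.
    """
    if sigma >= boundary:
        return "high_noise_expert"  # early steps: layout, composition, motion
    return "low_noise_expert"       # late steps: textures, micro-detail


# Walk a simple linear 8-step schedule from pure noise to clean latent.
schedule = [1.0 - i / 7 for i in range(8)]
experts = [pick_expert(s) for s in schedule]
```

Only one expert's 14B parameters are active at any step, which is how the 27B-parameter model keeps per-step compute at 14B-model levels.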


⚡ Why Use Wan 2.2?

Wan 2.2 is currently one of the most powerful open-source video models available. Here is why you need it in your workflow:

  • Cinematic Aesthetics: Built-in aesthetic control naturally produces professional lighting, dynamic color grading, and complex camera movements.
  • Complex Motion: Thanks to a massively expanded training dataset, it handles complex physics, hand gestures, and human motion far better than its predecessors.
  • Hardware Flexibility: Through FP8 quantization, GGUF formatting, and the lightweight 5B model, you can run this on almost any modern NVIDIA GPU.
  • Open Ecosystem: Released under the Apache 2.0 license, it is completely free for commercial use and heavily supported by the ComfyUI community.

🖥️ System Requirements

Before downloading anything, ensure your hardware can handle the model you intend to run. The MoE architecture is RAM and VRAM intensive.

|           | Minimum (5B / GGUF)   | Recommended (14B FP8)        |
|-----------|-----------------------|------------------------------|
| GPU VRAM  | 8 GB (e.g., RTX 3060) | 24 GB (e.g., RTX 3090 / 4090) |
| RAM       | 32 GB                 | 64 GB+                       |
| Storage   | 30 GB SSD             | 100 GB+ NVMe SSD             |
| OS        | Windows 10            | Windows 11 / Linux           |

✅ Step 1 – Update ComfyUI & Install Prerequisites

Wan 2.2 requires the absolute latest version of ComfyUI. If you are running an older portable version, it will not recognize the new nodes required for the MoE architecture.

First, update ComfyUI. If you are using the portable Windows version, run the update/update_comfyui.bat script.

Next, open the ComfyUI Manager and install the following custom nodes:

  • ComfyUI-GGUF: Required if you plan to run quantized models on low VRAM GPUs (8GB–12GB).
  • ComfyUI-WanVideoWrapper (by Kijai): Highly recommended. This custom node suite handles the complex routing of the MoE high/low models much more efficiently than the native nodes.

Restart ComfyUI completely after installation.


✅ Step 2 – Download the Required Models

The sheer number of files required can be confusing. You need a Text Encoder, a VAE, and the actual Diffusion Models. Download these and place them in the correct directories.

Text Encoder & VAE (Required for all setups)

  1. Download umt5_xxl_fp8_e4m3fn_scaled.safetensors and place it in ComfyUI/models/text_encoders/.
  2. Download the VAE. If you are using the 14B model, you need wan_2.1_vae.safetensors. If you are using the 5B model, you need wan2.2_vae.safetensors. Place them in ComfyUI/models/vae/.
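The placement rules above can be captured in a small lookup so you can double-check where each file belongs. This is just a convenience sketch: `COMFY_ROOT` is a placeholder for wherever your ComfyUI install lives, and the mapping only covers the files named in this step.

```python
from pathlib import Path

# Map each downloaded file to its ComfyUI subdirectory, per the steps above.
# COMFY_ROOT is a placeholder; point it at your actual install.
COMFY_ROOT = Path("ComfyUI")

MODEL_PLACEMENT = {
    "umt5_xxl_fp8_e4m3fn_scaled.safetensors": "models/text_encoders",
    "wan_2.1_vae.safetensors": "models/vae",  # VAE for the 14B model
    "wan2.2_vae.safetensors": "models/vae",   # VAE for the 5B model
}


def target_path(filename: str) -> Path:
    """Return the full destination path for a downloaded model file."""
    return COMFY_ROOT / MODEL_PLACEMENT[filename] / filename


print(target_path("umt5_xxl_fp8_e4m3fn_scaled.safetensors").as_posix())
```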

Diffusion Models (Choose based on your VRAM)

For 24GB+ VRAM (RTX 3090 / 4090): Download the FP8 14B models. You need both the High Noise and Low Noise files.

  • wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors
  • wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors

Place them in ComfyUI/models/diffusion_models/.

For 12GB–16GB VRAM (RTX 4070 / 4080): You should use the GGUF quantized versions of the 14B model to prevent Out of Memory errors. Search HuggingFace for QuantStack/Wan2.2-T2V-A14B-GGUF. Download the Q5_K_M or Q6 versions of both the high and low noise models. Place them in ComfyUI/models/diffusion_models/.

For 8GB VRAM (RTX 3060 / 4060): Stick to the 5B Hybrid model. Download wan2.2_ti2v_5B_fp16.safetensors (or a Q5 GGUF equivalent if you are extremely tight on memory) and place it in ComfyUI/models/diffusion_models/.
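The three VRAM tiers above boil down to a simple decision rule. Here is a sketch of it as a helper function; the thresholds mirror this guide's recommendations and should be treated as starting points, not hard limits.

```python
def recommend_model(vram_gb: int) -> str:
    """Suggest a Wan 2.2 variant for a given VRAM budget.

    Thresholds follow the tiers in the guide above (24GB+, 12-16GB, 8GB);
    they are rules of thumb, not hard limits.
    """
    if vram_gb >= 24:
        return "wan2.2 14B FP8 (high + low noise safetensors)"
    if vram_gb >= 12:
        return "wan2.2 14B GGUF Q5_K_M or Q6 (high + low noise)"
    return "wan2.2 TI2V 5B (fp16, or GGUF Q5 if memory is very tight)"


print(recommend_model(16))
```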



✅ Step 3 – Configure the Workflow

The easiest way to get started is to use the official templates built into the newest version of ComfyUI.

  1. Open ComfyUI.
  2. Click Workflow -> Browse Templates -> Video.
  3. Select Wan2.2 14B T2V (or the 5B version if you downloaded that).

If you are using the 14B workflow, you will see two Load Diffusion Model nodes. Ensure the top one is loading your high_noise model, and the bottom one is loading the low_noise model.

Double-check that your Load CLIP node points to the umt5_xxl file, and your Load VAE points to the correct VAE.

Basic Generation Settings

  • Resolution: Stick to multiples of 16. 832x480 is standard for testing. Pushing to 1280x720 requires massive memory and time.
  • Frames: 81 frames equals roughly 5 seconds of video at 16fps.
  • Steps: Without speed LoRAs, the standard workflow requires about 30-40 steps for good quality. This is slow. See the Tips section below for how to accelerate this.
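Two quick sanity checks for the settings above, sketched as helpers: one verifies the multiple-of-16 resolution rule, the other converts a frame count to an approximate clip length (81 frames at 16 fps is about 5 seconds).

```python
def check_resolution(width: int, height: int) -> bool:
    """Wan 2.2 resolutions should be multiples of 16, per the guide above."""
    return width % 16 == 0 and height % 16 == 0


def clip_seconds(frames: int, fps: int = 16) -> float:
    """Approximate clip duration: 81 frames at 16 fps is roughly 5 seconds."""
    return frames / fps


assert check_resolution(832, 480)       # the standard test resolution
print(f"{clip_seconds(81):.2f} s")      # ~5 seconds of video
```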

Write your prompt in the text encoder, press Queue Prompt, and watch your system memory carefully. The model loading phase will briefly spike your system RAM to 30GB+ before offloading to VRAM.


🛠️ Troubleshooting

Getting video models running locally is notoriously finicky. Here are the most common issues you will encounter with Wan 2.2.

| Error | Cause | Fix |
|---|---|---|
| Out of Memory (OOM) during generation | Model is too large for your VRAM. | Switch to the 5B model or use a Q5/Q4 GGUF quantization. Add --lowvram to your launch arguments. |
| System freezes / RAM maxes out on load | GGUF Q8 or large FP8 models consume massive system RAM during loading. | Increase your Windows pagefile to 60GB+, upgrade to 64GB RAM, or drop to a Q5 GGUF. |
| Video looks like a noisy, colorful mess | Sampler mismatch or missing Lightx2v LoRA parameters. | Ensure you are using the correct CFG (usually 1.0 with LoRAs) and a compatible sampler like Euler/Simple. |
| Characters moving in extreme slow motion | Typical behavior for Wan 2.2 without speed LoRAs, or bad shift settings. | Set ModelSamplingSD3 shift to 5 for I2V or 8 for T2V. Add “24 fps, fast motion” to your prompt. |
| “Error: CUDA Out of Memory” during VAE decode | High-resolution decoding blew out your VRAM. | Use the VAE Decode (Tiled) node instead of the standard VAE Decode. |
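The slow-motion fix from the table above is easy to encode as a small helper. The function name and return shape are illustrative; the shift values (8 for T2V, 5 for I2V) and the prompt suffix come straight from the troubleshooting advice.

```python
# Quick fixes for the slow-motion issue, per the troubleshooting table above.
SHIFT_BY_MODE = {"t2v": 8, "i2v": 5}  # ModelSamplingSD3 shift values


def fix_slow_motion(mode: str, prompt: str) -> tuple[int, str]:
    """Return (shift, prompt) tweaked to counter slow-motion output."""
    shift = SHIFT_BY_MODE[mode.lower()]
    return shift, prompt + ", 24 fps, fast motion"


print(fix_slow_motion("t2v", "a cat running through tall grass"))
```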

💡 Tips & Best Practices

💡 Tip: If you are using the Lightx2v or Lightning speed LoRAs, set your CFG to exactly 1.0. This effectively disables the negative prompt, but it is necessary for the LoRA to drastically speed up generation without deep-frying the image.

💡 Tip: For 14B workflows, you must split your steps between the High Noise and Low Noise samplers. A common optimized setup using the 4-step LoRA is to run 4 steps on the High model (for composition) and 4 steps on the Low model (for detail).
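The step split from the tip above can be expressed as a tiny helper. The even 50/50 default matches the common 4-high + 4-low setup with the 4-step LoRA; the fraction is a starting point you can tune, not a fixed rule.

```python
def split_steps(total_steps: int, high_fraction: float = 0.5) -> tuple[int, int]:
    """Split sampler steps between the High and Low noise models.

    With the 4-step speed LoRA, the common setup is total_steps=8
    split evenly: 4 high-noise steps (composition) + 4 low-noise
    steps (detail).
    """
    high = round(total_steps * high_fraction)
    return high, total_steps - high


print(split_steps(8))  # the 4-step LoRA setup: 4 high + 4 low
```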

💡 Tip: SageAttention is practically mandatory for fast generation on modern GPUs. It is an optimized attention mechanism that shaves minutes off render times. If you are on Linux, or using an installer like Stability Matrix, enabling SageAttention is a one-click process that you should absolutely utilize.

💡 Tip: Stick to resolutions perfectly divisible by 16 (e.g., 832x480). Odd resolutions will cause the VAE to panic during the decoding phase, resulting in immediate OOM errors or corrupted frames.

💡 Tip: Do not skip the High Noise model just to save time. The MoE architecture relies on the High model to establish the layout. Using only the Low Noise model essentially reverts the system back to Wan 2.1 quality levels.


✅ Final Thoughts

Wan 2.2 is a powerhouse, bringing true cinematic video generation to local hardware. The Mixture-of-Experts architecture is brilliant, allowing complex scenes to render accurately without requiring enterprise-grade server racks. Whether you are squeezing the 5B model onto an 8GB card via GGUF or running the full 14B MoE on an RTX 4090, the flexibility is unmatched. Take the time to optimize your samplers, leverage the speed LoRAs, and get your shift settings right. Happy generating!


❓ FAQ

Q: Can I run Wan 2.2 on an 8GB GPU?

A: Yes, but you must make compromises. Stick entirely to the Wan2.2-TI2V-5B model (using GGUF Q5 or Q6 quantizations if necessary) and keep your video resolution strictly around 480p with shorter frame counts (e.g., 49 frames).

Q: Do I actually need both the High Noise and Low Noise models?

A: If you are running the 14B model, yes. The MoE architecture fundamentally relies on the High Noise model for initial composition and the Low Noise model for granular detail. Skipping one degrades the video quality significantly. The 5B model, however, is unified and does not require a two-step process.

Q: What is SageAttention and why does everyone talk about it?

A: SageAttention is a highly optimized attention mechanism that significantly speeds up the math behind diffusion inference. It can cut generation times by 30-40%. However, installing it manually requires a compatible Python and CUDA environment, which is why one-click installers that bundle it are popular.

Q: Why are my Wan 2.2 videos moving in slow motion?

A: This is a known quirk of the base model. To fix it, adjust your ModelSamplingSD3 shift value to 8 for Text-to-Video or 5 for Image-to-Video. Additionally, prompting explicitly with terms like “24 fps” or “timelapse” helps force the model to render faster movement.
