If you’re generating images locally and you are frustrated by the massive hardware requirements of modern models, this is the guide you’ve been looking for. I’ve been testing the new FLUX.2 [klein] 4B and 9B models extensively in ComfyUI, and their speed-to-quality ratio changes everything. We are talking about sub-second generation times on consumer GPUs and out-of-the-box Image-to-Image editing that rivals complex ControlNet pipelines. This guide covers exactly how to install them, the critical difference between the distilled and base versions, and the highly specific “file-level” prompting tricks you need to stop the model from hallucinating new faces during restorations.
🔍 What is FLUX.2 [klein]?
FLUX.2 [klein] is the latest compact architecture from Black Forest Labs, designed specifically to unify Text-to-Image (T2I) and Image-to-Image (I2I) generation into a single, blazing-fast model. It uses a rectified flow transformer architecture and is the fastest image model family the company has ever released.
The model family is split into two primary parameter sizes: 4B (4 billion parameters) and 9B (9 billion parameters). The 4B model is fully open under the Apache 2.0 license, while the 9B model requires a non-commercial agreement for local use.
Beyond the size, you have to choose between two fundamentally different training approaches:
- Distilled: Optimized for interactive workflows and latency-critical applications. These models converge in roughly 4 steps, delivering end-to-end inference in under a second on high-end hardware (like an A100 or RTX 5090).
- Base: The undistilled foundation. These require 16 to 24 steps to generate an image but offer maximum flexibility, better adherence for complex multi-subject scenes, and serve as the ideal starting point for fine-tuning and LoRA training.
⚡ Why Use FLUX.2 [klein]?
- ✅ Blistering Speed: The distilled 4B version can generate a 1024x1024 image in under 1 second on a high-end GPU, making real-time interactive generation a reality.
- ✅ Native I2I Mastery: Complex multi-reference editing, style transforms, and object replacements work directly out of the box without needing external adapters.
- ✅ Hardware Friendly: While the official requirements list ~13GB VRAM, using GGUF quantizations allows even 8GB and 12GB cards to run the 9B model comfortably.
- ✅ Superior Text Encoder: The shift to `qwen_3_4b` and `qwen_3_8b` for text encoding means the model understands incredibly dense, narrative-heavy prompts.
✅ Step 1 – System Requirements and Model Downloads
Before you begin, ensure your system can handle the models.
| Hardware | Minimum (GGUF) | Recommended (FP8) |
|---|---|---|
| GPU VRAM | 8 GB | 16 GB+ |
| RAM | 16 GB | 32 GB |
| Storage | 30 GB | 60 GB |
| OS | Windows 10 | Windows 11 / Linux |
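As a rough sanity check, the table above can be condensed into a tiny helper that suggests which download to grab for a given amount of free VRAM. This is a sketch of my own (the function name and exact thresholds are assumptions derived from the table and the GGUF tip later in this guide, not an official tool):

```python
def pick_variant(vram_gb: float) -> str:
    """Suggest a FLUX.2 [klein] download based on free GPU VRAM.

    Thresholds follow the requirements table: 16 GB+ comfortably runs
    the FP8 files, ~13 GB is the official FP8 floor, and Q8_0 GGUF
    quantizations fit 8-12 GB cards.
    """
    if vram_gb >= 16:
        return "9B FP8 (flux-2-klein-9b-fp8.safetensors)"
    if vram_gb >= 13:
        return "4B FP8, or 9B GGUF Q8_0 for headroom"
    if vram_gb >= 8:
        return "9B GGUF Q8_0 (requires the ComfyUI-GGUF custom node)"
    return "below minimum; consider a cloud GPU"
```

Treat the output as a starting point; actual headroom depends on resolution, batch size, and what else is resident in VRAM.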
You need to download the Diffusion models, the VAE, and the specific Qwen text encoders. Place them in the exact directories listed below.
For the 4B Model:
- Diffusion (Distilled): `flux-2-klein-4b-fp8.safetensors`
- Diffusion (Base): `flux-2-klein-base-4b-fp8.safetensors`
- Text Encoder: `qwen_3_4b.safetensors`
For the 9B Model:
- Diffusion (Distilled): `flux-2-klein-9b-fp8.safetensors`
- Diffusion (Base): `flux-2-klein-base-9b-fp8.safetensors`
- Text Encoder: `qwen_3_8b_fp8mixed.safetensors`
Common Files:
- VAE: `flux2-vae.safetensors`
Your ComfyUI directory structure should look exactly like this:
```
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   ├── flux-2-klein-9b-fp8.safetensors
│   │   └── flux-2-klein-base-9b-fp8.safetensors
│   ├── 📂 text_encoders/
│   │   └── qwen_3_8b_fp8mixed.safetensors
│   └── 📂 vae/
│       └── flux2-vae.safetensors
```

💡 Tip: If you only have 8GB or 12GB of VRAM and want to run the 9B model, do not use the FP8 files. Instead, download the Q8_0 GGUF variants (available via Unsloth on HuggingFace). They offer nearly identical quality to the 18GB non-quantized variants but sit comfortably around 10GB in size. You will need the `ComfyUI-GGUF` custom node to load them.
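If you want to verify the layout before launching ComfyUI, a small script can report which of the expected 9B files are missing. This is an illustrative helper of my own; the paths and filenames mirror the tree above:

```python
from pathlib import Path

# Expected locations for the 9B setup, matching the directory tree above.
EXPECTED = {
    "models/diffusion_models": [
        "flux-2-klein-9b-fp8.safetensors",
        "flux-2-klein-base-9b-fp8.safetensors",
    ],
    "models/text_encoders": ["qwen_3_8b_fp8mixed.safetensors"],
    "models/vae": ["flux2-vae.safetensors"],
}


def missing_files(comfy_root: str) -> list[str]:
    """Return relative paths of any expected model files that are absent."""
    root = Path(comfy_root)
    missing = []
    for subdir, names in EXPECTED.items():
        for name in names:
            if not (root / subdir / name).exists():
                missing.append(f"{subdir}/{name}")
    return missing
```

Run it with your ComfyUI root (e.g. `missing_files("C:/ComfyUI")`) and download anything it lists; swap in the 4B filenames if you chose that model.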
✅ Step 2 – Update ComfyUI to Nightly
FLUX.2 uses a slightly updated sampling architecture and relies on new workflow templates that were introduced after version 0.9.2. If your ComfyUI is outdated, you will either be missing the built-in templates or the generation will fail entirely.
```shell
# Navigate to your ComfyUI root directory
cd ComfyUI

# Pull the latest updates
git pull

# Update dependencies
pip install -r requirements.txt
```

Once updated, restart your server. You should now see the specific FLUX.2 Klein templates in your default workflow menu.
✅ Step 3 – The Text-to-Image (T2I) Workflow
The default T2I workflow is straightforward but requires strict adherence to step counts depending on the model version you chose.
- Load your desired checkpoint using the `Load Diffusion Model` node (or `Unet Loader (GGUF)` if you went the low-VRAM route).
- Connect your `qwen_3_8b` text encoder to the `DualCLIPLoader` node.
- If using the Distilled model: set your `KSampler` to exactly `4` steps and a CFG of `1.0` to `1.5`. Using more steps on the distilled model will result in overcooked, waxy, or “deep-fried” images.
- If using the Base model: set your `KSampler` to `20` to `24` steps and a CFG of `3.5` to `5.0`.
- Always use the `Euler` sampler (avoid `Euler Ancestral`, as it does not converge cleanly on these models) and the `Simple` scheduler.
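The step and CFG rules above are easy to mix up, so here is a minimal sketch that encodes them as data (the function name and dict keys are my own shorthand, not the actual ComfyUI node API):

```python
def ksampler_settings(variant: str) -> dict:
    """Recommended KSampler settings per FLUX.2 [klein] variant.

    Distilled: exactly 4 steps, CFG 1.0-1.5 (more steps "deep-fries" output).
    Base: 20-24 steps, CFG 3.5-5.0.
    Both: Euler sampler with the Simple scheduler
    (Euler Ancestral does not converge cleanly on these models).
    """
    common = {"sampler": "euler", "scheduler": "simple"}
    if variant == "distilled":
        return {**common, "steps": 4, "cfg": 1.0}
    if variant == "base":
        return {**common, "steps": 20, "cfg": 4.0}
    raise ValueError(f"unknown variant: {variant!r}")
```

The distilled CFG of `1.0` is the safe default; nudge it toward `1.5` only if prompt adherence feels weak.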
Prompting Like a Novelist
FLUX.2 [klein] does not use prompt upsampling under the hood. What you write is exactly what you get. If you write “1girl, big boobs, masterpiece, trending on artstation”, you will get a very generic, poor-quality output.
Instead, you must write flowing, descriptive prose. Start with the subject, move to the setting, describe the exact lighting, and finish with the camera details.
Example T2I Prompt:
“A medium close-up shot features a disheveled male figure from the chest up, looking directly at the camera with an intense, unsettling glare, shot from a slightly low angle. His face is covered in smeared white clown makeup, with dark blue triangles above and below his eyes and an exaggerated red smile extending onto his cheeks. The main illumination consists of a soft, warm key light from the front-right, while a strong, golden-orange backlight from the subject’s rear left creates a prominent, intense rim light around his hair. The image has a strong analog film aesthetic reminiscent of Kodak Vision3, characterized by a fine, organic grain structure and cinematic texture.”
✅ Step 4 – The Image-to-Image (I2I) Restoration Workflow
Where FLUX.2 [klein] truly shines is editing and restoring old or low-quality photos. However, users frequently complain that when they try to restore a photo of a person, the model completely hallucinates a new identity.
If you use a prompt like “beautiful woman, soft lighting, 4k resolution” on an old photo, the model interprets that as a command to generate a new beautiful woman over your source image.
To fix this, you must use “File-Level” Prompts. You need to tell the model what action to take on the digital file, treating the prompt like a list of Photoshop commands.
The Master Restoration Prompt Combo:
“clean digital file, remove blur and noise, histogram equalization, unsharp mask, color grade, white balance correction, micro-contrast, lens distortion correction.”
By asking for “histogram equalization” instead of “good lighting”, and “micro-contrast” instead of “sharp details”, the model acts like a restorative filter rather than a generator. The original facial features stay perfectly intact, but the image quality skyrockets.
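If you build restoration prompts programmatically, it can help to keep the file-level operations as a list and guard against generative trigger words slipping in. This is a hypothetical convenience of my own (function and constant names are assumptions), reusing the exact operations from the combo above:

```python
# The "Master Restoration Prompt Combo" operations, verbatim.
FILE_LEVEL_OPS = [
    "clean digital file", "remove blur and noise", "histogram equalization",
    "unsharp mask", "color grade", "white balance correction",
    "micro-contrast", "lens distortion correction",
]

# Terms that push the model toward generating a new subject
# instead of restoring the existing one.
GENERATIVE_TERMS = {"enhance", "beautiful", "good lighting", "4k resolution"}


def build_restoration_prompt(ops=FILE_LEVEL_OPS) -> str:
    """Join file-level operations into a prompt, rejecting vague terms."""
    for op in ops:
        if op.lower() in GENERATIVE_TERMS:
            raise ValueError(f"avoid generative term: {op!r}")
    return ", ".join(ops)
```

Drop or reorder operations freely; the point is that every entry names a concrete filter-like action rather than a desired aesthetic.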
🛠️ Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `KeyError: 'model.diffusion_model...'` | ComfyUI is out of date and doesn’t recognize the Qwen text encoder. | Run `git pull` in your ComfyUI directory and update requirements. |
| Completely black or static images | Mismatched step counts. | Distilled models strictly require ~4 steps. Base models require 20+ steps. |
| OOM (Out of Memory) Error | Loading the 18GB Base FP16 model on a 12GB card. | Switch to the 4B model, use the FP8 version, or download the Q8 GGUF format. |
| Plastic or “waxy” skin textures | CFG is too high on the Distilled model, or missing texture prompts. | Lower CFG to 1.2. Add “makeup, plastic surgery, CGI, face smoothing” to a negative prompt node. |
| Model adds random artifacts (like extra teeth) during I2I | Aggressive denoising or lack of strong reference conditioning. | Lower denoise to 0.30 and chain multiple ReferenceLatent nodes together in your workflow. |
💡 Tips & Best Practices
💡 Tip: If you are doing extreme image restorations and still finding that the face changes slightly, try pre-processing your source image so that the dimensions are exactly divisible by 16. Avoid using the `ImageScaleToTotalPixels` node, as the resizing math can sometimes force the model to shift facial proportions.
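Snapping to a multiple of 16 is a one-liner; here is a small helper (my own naming) that rounds both dimensions down, crop-style, so no resampling is forced on the model:

```python
def snap_to_16(width: int, height: int) -> tuple[int, int]:
    """Round image dimensions down to the nearest multiple of 16.

    Useful for identity-preserving I2I restorations; rounding down means
    you crop a few pixels rather than stretch the image.
    """
    return (width // 16) * 16, (height // 16) * 16
```

For example, `snap_to_16(1023, 769)` gives `(1008, 768)`; crop or pad your source to those dimensions before feeding it into the workflow.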
💡 Tip: The Base models are noticeably better for complex prompts with multiple subjects interacting. The Distilled models are incredibly fast, but they will sometimes lose track of who is doing what in a crowded scene. Use Base for complexity, Distilled for speed and simple portraits.
💡 Tip: If you want to use the Distilled model but hate the smooth, plastic skin it sometimes generates, pipe the output into a secondary `KSampler` running a standard SDXL or Z-Image Turbo model at a very low denoise (`0.15` to `0.20`) just to add realistic skin texture back in.
💡 Tip: Never use the word “enhance” in your I2I prompts. The model’s training data associates the word “enhance” with heavy AI upscaling artifacts, and it will intentionally generate those artifacts into your output.
💡 Tip: For I2I workflows, keep your source images well-lit. The model struggles to do “shadow recovery” if the source image has crushed blacks, often hallucinating weird textures in the dark areas.
✅ Final Thoughts
FLUX.2 [klein] proves that we are entering an era where you no longer need an enterprise-grade server to achieve state-of-the-art generation. By leveraging the distilled 9B model for rapid ideation and the base model for complex, high-fidelity restorations—combined with surgical, file-level prompting—you can execute workflows that previously required a half-dozen ControlNets and massive patience. Now go make something worth sharing.
❓ FAQ
Q: Do I need a special VAE for FLUX.2 [klein]?
A: The standard flux2-vae.safetensors is required for all Klein models. Do not try to use the older FLUX.1 VAE or standard SDXL VAEs, as they will produce heavily distorted colors.
Q: Why do my distilled generations look deep-fried?
A: You are using too many steps or your CFG is too high. The distilled models perform best at exactly 4 steps with a CFG scale around 1.0 to 1.5. Anything higher breaks the image math.
Q: Can I use GGUF models in the default templates?
A: Yes, but you have to replace the standard Load Diffusion Model node with the Unet Loader (GGUF) node from the ComfyUI-GGUF custom node suite. Everything else in the workflow remains exactly the same.