FLUX.2 Klein in ComfyUI: Complete Setup & Workflow Guide


If you’re generating images locally and are frustrated by the massive hardware requirements of modern models, this is the guide you’ve been looking for. I’ve been testing the new FLUX.2 [klein] 4B and 9B models extensively in ComfyUI, and their speed-to-quality ratio is a genuine step change. We’re talking sub-second generation times on consumer GPUs and out-of-the-box image-to-image editing that rivals complex ControlNet pipelines. This guide covers exactly how to install them, the critical difference between the distilled and base versions, and the highly specific “file-level” prompting tricks you need to stop the model from hallucinating new faces during restorations.


🔍 What is FLUX.2 [klein]?

FLUX.2 [klein] is the latest compact architecture from Black Forest Labs, designed specifically to unify Text-to-Image (T2I) and Image-to-Image (I2I) generation into a single, blazing-fast model. It uses a rectified flow transformer architecture, and these are the fastest image models the company has ever released.

The model family is split into two primary parameter sizes: 4B (4 billion parameters) and 9B (9 billion parameters). The 4B model is fully open under the Apache 2.0 license, while the 9B model requires a non-commercial agreement for local use.

Beyond the size, you have to choose between two fundamentally different training approaches:

  • Distilled: Optimized for interactive workflows and latency-critical applications. These models converge in roughly 4 steps, delivering end-to-end inference in under a second on high-end hardware (like an A100 or RTX 5090).
  • Base: The undistilled foundation. These require 16 to 24 steps to generate an image but offer maximum flexibility, better adherence for complex multi-subject scenes, and serve as the ideal starting point for fine-tuning and LoRA training.

⚡ Why Use FLUX.2 [klein]?

  • Blistering Speed: The distilled 4B version can generate a 1024x1024 image in under 1 second on a high-end GPU, making real-time interactive generation a reality.
  • Native I2I Mastery: Complex multi-reference editing, style transforms, and object replacements work directly out of the box without needing external adapters.
  • Hardware Friendly: While the official requirements list ~13GB VRAM, using GGUF quantizations allows even 8GB and 12GB cards to run the 9B model comfortably.
  • Superior Text Encoder: The shift to the Qwen3 text encoders (qwen_3_4b and qwen_3_8b) means the model understands incredibly dense, narrative-heavy prompts.

✅ Step 1 – System Requirements and Model Downloads

Before you begin, ensure your system can handle the models.

| Hardware | Minimum (GGUF) | Recommended (FP8) |
| --- | --- | --- |
| GPU VRAM | 8 GB | 16 GB+ |
| RAM | 16 GB | 32 GB |
| Storage | 30 GB | 60 GB |
| OS | Windows 10 | Windows 11 / Linux |

You need to download the Diffusion models, the VAE, and the specific Qwen text encoders. Place them in the exact directories listed below.

For the 4B Model:

  • Diffusion (Distilled): flux-2-klein-4b-fp8.safetensors
  • Diffusion (Base): flux-2-klein-base-4b-fp8.safetensors
  • Text Encoder: qwen_3_4b.safetensors

For the 9B Model:

  • Diffusion (Distilled): flux-2-klein-9b-fp8.safetensors
  • Diffusion (Base): flux-2-klein-base-9b-fp8.safetensors
  • Text Encoder: qwen_3_8b_fp8mixed.safetensors

Common Files:

  • VAE: flux2-vae.safetensors

Your ComfyUI directory structure should look exactly like this:

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   ├── flux-2-klein-9b-fp8.safetensors
│   │   └── flux-2-klein-base-9b-fp8.safetensors
│   ├── 📂 text_encoders/
│   │   └── qwen_3_8b_fp8mixed.safetensors
│   └── 📂 vae/
│       └── flux2-vae.safetensors
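Before firing up ComfyUI, it can save a failed first run to confirm everything landed in the right folder. Here is a small, hypothetical verification script (the `COMFY_ROOT` path and the file list reflect the 9B layout above; adjust both for the 4B variant):

```python
import os

# Hypothetical helper: check that the FLUX.2 [klein] files sit exactly where
# ComfyUI expects them. Adjust COMFY_ROOT and the file list for your setup.
COMFY_ROOT = "ComfyUI"

EXPECTED = {
    "models/diffusion_models": [
        "flux-2-klein-9b-fp8.safetensors",
        "flux-2-klein-base-9b-fp8.safetensors",
    ],
    "models/text_encoders": ["qwen_3_8b_fp8mixed.safetensors"],
    "models/vae": ["flux2-vae.safetensors"],
}

def missing_files(root: str) -> list:
    """Return the relative paths of any expected model files that are absent."""
    missing = []
    for subdir, files in EXPECTED.items():
        for name in files:
            if not os.path.isfile(os.path.join(root, subdir, name)):
                missing.append(os.path.join(subdir, name))
    return missing

if __name__ == "__main__":
    for path in missing_files(COMFY_ROOT):
        print(f"MISSING: {path}")
```

If the script prints nothing, your directory layout matches the tree above and the loaders in Step 3 should find every file.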

💡 Tip: If you only have 8GB or 12GB of VRAM and want to run the 9B model, do not use the FP8 files. Instead, download the Q8_0 GGUF variants (available via Unsloth on HuggingFace). They offer nearly identical quality to the 18GB non-quantized variants but sit comfortably around 10GB in size. You will need the ComfyUI-GGUF custom node to load them.


✅ Step 2 – Update ComfyUI to Nightly

FLUX.2 uses a slightly updated sampling architecture and relies on new workflow templates that were introduced after version 0.9.2. If your ComfyUI is outdated, you will either be missing the built-in templates or the generation will fail entirely.

```shell
# Navigate to your ComfyUI root directory
cd ComfyUI
# Pull the latest updates
git pull
# Update dependencies
pip install -r requirements.txt
```

Once updated, restart your server. You should now see the specific FLUX.2 Klein templates in your default workflow menu.



✅ Step 3 – The Text-to-Image (T2I) Workflow

The default T2I workflow is straightforward but requires strict adherence to step counts depending on the model version you chose.

  1. Load your desired checkpoint using the Load Diffusion Model node (or Unet Loader (GGUF) if you went the low-VRAM route).
  2. Connect your qwen_3_8b to the DualCLIPLoader node.
  3. If using the Distilled model: Set your KSampler to exactly 4 steps and a CFG of 1.0 to 1.5. Using more steps on the distilled model will result in overcooked, waxy, or “deep-fried” images.
  4. If using the Base model: Set your KSampler to 20 or 24 steps and a CFG of 3.5 to 5.0.
  5. Always use the Euler sampler (avoid Euler Ancestral as it does not converge cleanly on these models) and the Simple scheduler.

Prompting Like a Novelist

FLUX.2 [klein] does not use prompt upsampling under the hood. What you write is exactly what you get. If you write “1girl, big boobs, masterpiece, trending on artstation”, you will get a very generic, poor-quality output.

Instead, you must write flowing, descriptive prose. Start with the subject, move to the setting, describe the exact lighting, and finish with the camera details.

Example T2I Prompt:

“A medium close-up shot features a disheveled male figure from the chest up, looking directly at the camera with an intense, unsettling glare, shot from a slightly low angle. His face is covered in smeared white clown makeup, with dark blue triangles above and below his eyes and an exaggerated red smile extending onto his cheeks. The main illumination consists of a soft, warm key light from the front-right, while a strong, golden-orange backlight from the subject’s rear left creates a prominent, intense rim light around his hair. The image has a strong analog film aesthetic reminiscent of Kodak Vision3, characterized by a fine, organic grain structure and cinematic texture.”
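If you generate prompts programmatically, the subject → setting → lighting → camera ordering can be enforced with a toy helper like this (the function and its field names are mine, not an API):

```python
# Toy helper reflecting the recommended prompt structure: subject first, then
# setting, lighting, and camera details. Purely illustrative, not a real API.

def narrative_prompt(subject: str, setting: str, lighting: str, camera: str) -> str:
    """Compose a flowing T2I prompt in the subject->setting->lighting->camera order."""
    parts = (subject, setting, lighting, camera)
    # Normalize each part to end with exactly one period, then join with spaces.
    return " ".join(p.strip().rstrip(".") + "." for p in parts)
```

The point is not the string-joining; it is that each slot forces you to actually describe that aspect of the scene instead of falling back on tag soup.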


✅ Step 4 – The Image-to-Image (I2I) Restoration Workflow

Where FLUX.2 [klein] truly shines is editing and restoring old or low-quality photos. However, users frequently complain that when they try to restore a photo of a person, the model completely hallucinates a new identity.

If you use a prompt like “beautiful woman, soft lighting, 4k resolution” on an old photo, the model interprets that as a command to generate a new beautiful woman over your source image.

To fix this, you must use “File-Level” Prompts. You need to tell the model what action to take on the digital file, treating the prompt like a list of Photoshop commands.

The Master Restoration Prompt Combo:

“clean digital file, remove blur and noise, histogram equalization, unsharp mask, color grade, white balance correction, micro-contrast, lens distortion correction.”

By asking for “histogram equalization” instead of “good lighting”, and “micro-contrast” instead of “sharp details”, the model acts like a restorative filter rather than a generator. The original facial features stay perfectly intact, but the image quality skyrockets.
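Since these file-level operations are modular, you can assemble them programmatically and toggle individual corrections on or off per image. A minimal sketch (the helper name is hypothetical):

```python
# Hypothetical helper that assembles the "file-level" restoration prompt from
# the operations above, so individual corrections can be toggled per image.
RESTORATION_OPS = [
    "clean digital file",
    "remove blur and noise",
    "histogram equalization",
    "unsharp mask",
    "color grade",
    "white balance correction",
    "micro-contrast",
    "lens distortion correction",
]

def restoration_prompt(exclude=frozenset()) -> str:
    """Join the file-level operations into one comma-separated prompt, skipping any in `exclude`."""
    return ", ".join(op for op in RESTORATION_OPS if op not in exclude)
```

Dropping "unsharp mask" for already-sharp scans, for example, avoids halo artifacts while keeping the rest of the correction stack intact.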


🛠️ Troubleshooting

| Error | Cause | Fix |
| --- | --- | --- |
| KeyError: 'model.diffusion_model...' | ComfyUI is out of date and doesn’t recognize the Qwen text encoder. | Run git pull in your ComfyUI directory and update requirements. |
| Completely black or static images | Mismatched step counts. | Distilled models strictly require ~4 steps; base models require 20+. |
| OOM (Out of Memory) error | Loading the 18GB Base FP16 model on a 12GB card. | Switch to the 4B model, use the FP8 version, or download the Q8 GGUF format. |
| Plastic or “waxy” skin textures | CFG too high on the Distilled model, or missing texture prompts. | Lower CFG to 1.2. Add “makeup, plastic surgery, CGI, face smoothing” to a negative prompt node. |
| Random artifacts (like extra teeth) during I2I | Aggressive denoising or weak reference conditioning. | Lower denoise to 0.30 and chain multiple ReferenceLatent nodes in your workflow. |

💡 Tips & Best Practices

💡 Tip: If you are doing extreme image restorations and still finding that the face changes slightly, try pre-processing your source image so that the dimensions are exactly divisible by 16. Avoid using the ImageScaleToTotalPixels node, as the resizing math can sometimes force the model to shift facial proportions.
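Snapping dimensions to a multiple of 16 is a one-liner you can run in a pre-processing script before resizing your source image (the helper name is mine):

```python
# Illustrative helper: round image dimensions to the nearest multiple of 16,
# which FLUX.2 [klein] handles cleanly during I2I restorations.

def snap_to_multiple_of_16(width: int, height: int) -> tuple:
    """Round each dimension to the nearest multiple of 16 (minimum 16)."""
    def snap(v: int) -> int:
        return max(16, round(v / 16) * 16)
    return snap(width), snap(height)
```

Resize your source to the snapped dimensions with any standard image tool before feeding it into the workflow, rather than letting a scaling node pick the resolution for you.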

💡 Tip: The Base models are noticeably better for complex prompts with multiple subjects interacting. The Distilled models are incredibly fast, but they will sometimes lose track of who is doing what in a crowded scene. Use Base for complexity, Distilled for speed and simple portraits.

💡 Tip: If you want to use the Distilled model but hate the smooth, plastic skin it sometimes generates, pipe the output into a secondary KSampler running a standard SDXL or Z-Image Turbo model at a very low denoise (0.15 to 0.20) just to add realistic skin texture back in.

💡 Tip: Never use the word “enhance” in your I2I prompts. The model’s training data associates the word “enhance” with heavy AI upscaling artifacts, and it will intentionally generate those artifacts into your output.

💡 Tip: For I2I workflows, keep your source images well-lit. The model struggles to do “shadow recovery” if the source image has crushed blacks, often hallucinating weird textures in the dark areas.


✅ Final Thoughts

FLUX.2 [klein] proves that we are entering an era where you no longer need an enterprise-grade server to achieve state-of-the-art generation. By leveraging the distilled 9B model for rapid ideation and the base model for complex, high-fidelity restorations—combined with surgical, file-level prompting—you can execute workflows that previously required a half-dozen ControlNets and massive patience. Now go make something worth sharing.


❓ FAQ

Q: Do I need a special VAE for FLUX.2 [klein]?

A: The standard flux2-vae.safetensors is required for all Klein models. Do not try to use the older FLUX.1 VAE or standard SDXL VAEs, as they will produce heavily distorted colors.

Q: Why do my distilled generations look deep-fried?

A: You are using too many steps or your CFG is too high. The distilled models perform best at exactly 4 steps with a CFG scale around 1.0 to 1.5. Anything higher breaks the image math.

Q: Can I use GGUF models in the default templates?

A: Yes, but you have to replace the standard Load Diffusion Model node with the Unet Loader (GGUF) node from the ComfyUI-GGUF custom node suite. Everything else in the workflow remains exactly the same.

