
AI Toolkit vs OneTrainer: Z-Image LoRA Training 2026


If you’ve trained Z-Image Base LoRAs in both AI Toolkit and OneTrainer, you’ve probably hit a wall that feels designed to drive you insane: one tool nails the body but softens the face, the other gets the face exactly right but mangles the arms. I’ve dug into both tools across dozens of character training runs, and this isn’t random β€” there are specific, fixable reasons why the two diverge. This guide breaks down the key differences in settings, speed, and training behavior, then gives you working configs for both.


πŸ” What Is Z-Image Base LoRA Training?

Z-Image Base is the full, non-distilled Z-Image checkpoint from Tongyi-MAI. It runs at 30–50 sampling steps with full CFG and negative prompt support β€” unlike Z-Image Turbo, which is an 8-step distilled variant. Because Base is the source model, it’s the cleanest target for LoRA fine-tuning: any LoRA you train here has access to the full weight space without fighting distillation artifacts.

Both AI Toolkit (by Ostris) and OneTrainer are open-source LoRA trainers that support Z-Image Base. They both work. The question is which gives you more predictable results β€” and why they behave so differently even with nearly identical settings.


⚑ Why the Tool Choice Actually Matters

The biggest surprise for most people is that identical hyperparameters produce meaningfully different outputs in the two tools. Here’s the practical summary before the details:

  • βœ… Speed: OneTrainer runs roughly 1.4–2Γ— faster than AI Toolkit on the same hardware. The main driver is torch.compile and int8 quantized training (w8a8), which OneTrainer enables by default and AI Toolkit currently lacks.
  • βœ… Character bodies: AI Toolkit tends to produce more anatomically consistent bodies β€” especially at 3,000+ steps.
  • βœ… Character faces: OneTrainer frequently produces sharper, more accurate facial likeness, particularly with the Prodigy_Adv optimizer and stochastic rounding enabled.
  • βœ… Workflow: AI Toolkit has a cleaner, simpler UI and supports job queuing for multi-run iteration.
  • βœ… Options: OneTrainer exposes more optimizer choices and ships fine-grained presets per VRAM tier.
  • ❌ The catch? Getting both face and body right in a single tool requires deliberate tuning β€” but it’s fully achievable in either one.

πŸ“Š Quick Comparison Table

| Feature | AI Toolkit | OneTrainer |
| --- | --- | --- |
| UI Simplicity | βœ… Very clean | ❌ Dense but powerful |
| Speed | ❌ ~1.4–2Γ— slower | βœ… Faster via torch.compile + int8 |
| Job Queue | βœ… Yes | ❌ No |
| Z-Image Base Support | βœ… Native | βœ… Native |
| FLUX.2 Klein 9B Support | βœ… Yes | βœ… Via fork (PR not yet merged) |
| Prodigy_Adv Optimizer | ❌ Not available | βœ… Yes |
| Stochastic Rounding | ❌ Not exposed in UI | βœ… Available |
| VRAM Presets | ❌ Manual config | βœ… Presets per tier |
| Windows Support | βœ… Yes | βœ… Yes |

πŸ₯‡ AI Toolkit by Ostris

AI Toolkit is the natural starting point for most people β€” Ostris ships solid defaults, the job queue makes iterating across multiple runs painless, and the cloud UI on RunComfy removes local setup entirely. Here are the full recommended settings for Z-Image Base character LoRAs.

πŸ’‘ Key Features

  • Job queue for multi-dataset iteration without manual restarts
  • FlowMatch sampler with CFG support for accurate in-training preview images
  • float8 quantization for Transformer and Text Encoder
  • BF16 checkpoint output
  • Cloud deployment via RunComfy (H100/H200)
βš™οΈ Recommended Settings

| Setting | Value |
| --- | --- |
| Model Architecture | Z-Image |
| Model Path | Tongyi-MAI/Z-Image |
| Quantization (Transformer) | float8 |
| Quantization (Text Encoder) | float8 |
| Target Type | LoRA |
| Linear Rank | 32 |
| Save Data Type | BF16 |
| Batch Size | 1 |
| Steps | 3,000–7,000 |
| Optimizer | AdamW8Bit |
| Learning Rate | 0.0001 |
| Weight Decay | 0.0001 |
| Timestep Type | Weighted |
| Timestep Bias | Balanced |
| EMA | OFF |
| Resolutions | 768 + 1024 |
| Sample Guidance Scale | 4 |
| Sample Steps | 30–50 |

Steps per image: aim for ~100 steps per image in your dataset. 64 images β†’ 6,400 steps. 30 images β†’ 3,000 steps. Don’t undershoot β€” if your face is coming out slightly wide or soft at 3,000 steps with a 60-image dataset, extending to 5,000–6,500 and checking intermediate checkpoints is the right move, not switching tools.
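The steps-per-image arithmetic is easy to script. A minimal helper (hypothetical, not part of either tool):

```python
def recommended_steps(image_count: int, steps_per_image: int = 100) -> int:
    """Community baseline: ~100 training steps per dataset image at batch size 1."""
    return image_count * steps_per_image

print(recommended_steps(64))  # 6400
print(recommended_steps(30))  # 3000
```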

Sample settings matter. The most common Z-Image Base misconfiguration in AI Toolkit is using Turbo-style sampling β€” 8 steps, little or no CFG β€” for preview images. Set Guidance Scale to 3–5 and Sample Steps to at least 30. Turbo settings here make previews look undercooked, which causes people to stop training too early or assume the LoRA isn’t working when it just needs more steps.
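A quick guard for catching Turbo-style preview misconfiguration before a run starts. This is illustrative only; the parameter names are assumptions, not AI Toolkit's actual config schema:

```python
def validate_base_sampling(guidance_scale: float, sample_steps: int) -> list[str]:
    """Flag Turbo-style preview settings that make Z-Image Base previews look undercooked."""
    warnings = []
    if not 3 <= guidance_scale <= 5:
        warnings.append("guidance_scale should be 3-5 for Z-Image Base previews")
    if sample_steps < 30:
        warnings.append("sample_steps should be >= 30; 8-step Turbo sampling misleads")
    return warnings

print(validate_base_sampling(1.0, 8))   # flags both settings
print(validate_base_sampling(4.0, 30))  # no warnings
```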

❌ Known Limitations

  • No Prodigy_Adv optimizer β€” only standard Prodigy and AdamW variants
  • Stochastic rounding is not exposed in the UI
  • ~1.4–2Γ— slower than OneTrainer at equivalent settings
  • FLUX.2 Klein 9B training sample images are notoriously misleading β€” test checkpoints manually at a fixed seed
  • Speed gap versus OneTrainer widens above batch size 2

πŸ’° Cloud Cost

Running on RunComfy H100: approximately $2–4 per full character training run depending on step count and resolution.
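That range is easy to sanity-check with a back-of-envelope calculator. Both default values below are assumptions (throughput varies with resolution; check RunComfy's current pricing):

```python
def estimate_run_cost(steps: int, steps_per_hour: int = 4000,
                      hourly_rate_usd: float = 2.5) -> float:
    """Rough cost estimate: training hours times GPU hourly rate.
    steps_per_hour and hourly_rate_usd are assumed values, not measured."""
    hours = steps / steps_per_hour
    return round(hours * hourly_rate_usd, 2)

print(estimate_run_cost(3000))  # short run, low end of the range
print(estimate_run_cost(6400))  # 64-image run at 100 steps/image
```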


πŸ₯ˆ OneTrainer

OneTrainer trades UI simplicity for raw flexibility. The settings page is dense, but that density is the point β€” it exposes optimizer options and quantization controls that AI Toolkit either doesn’t have or doesn’t surface. For Z-Image Base character training specifically, the Prodigy_Adv optimizer with stochastic rounding directly improves facial likeness in ways that default AI Toolkit settings don’t match.

πŸ’‘ Key Features

  • torch.compile for significantly faster training on all supported GPUs
  • Prodigy_Adv optimizer with stochastic rounding
  • Per-VRAM-tier presets (8 GB, 12 GB, 16 GB, 24 GB+)
  • DoRA support for faster convergence on fine character details
  • Image pair training for edit-style LoRAs
βš™οΈ Recommended Settings

| Setting | Value |
| --- | --- |
| Base Model | Z-Image Base |
| LoRA Rank | 16–32 |
| Alpha | Equal to rank |
| Optimizer | Prodigy_Adv (or AdamW + stochastic rounding) |
| Learning Rate | 1.0 (Prodigy) / 0.0001 (AdamW) |
| Epochs | ~100–120 (at batch 1, steps = epochs Γ— image count) |
| Stochastic Rounding | ON |
| LoRA Weight Data Type | BF16 |
| EMA | OFF |
| Resolution | 768 |
| Gradient Checkpointing | ON if VRAM is tight |
| Differential Guidance | OFF |

Alpha and rank: keep alpha equal to rank. Setting alpha = 1 with a high rank effectively decouples the update scale from the learning rate in a way that makes tuning harder to reason about. Equal alpha and rank keeps the math straightforward and the LR more predictable.
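The reasoning is visible in the standard LoRA forward pass, where the low-rank update is multiplied by alpha/rank. A toy illustration of that scaling factor, not either trainer's internals:

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Effective multiplier applied to the LoRA update: alpha / rank."""
    return alpha / rank

print(lora_scale(32, 32))  # 1.0 -> the LR behaves exactly as tuned
print(lora_scale(1, 32))   # 0.03125 -> update silently ~32x weaker
```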

Stochastic rounding: in OneTrainer’s optimizer settings, enable stochastic rounding whenever you’re using BF16 as the LoRA weight data type. This is the single setting most people miss when first using OneTrainer, and it meaningfully affects convergence quality on fine facial detail.

Epoch-to-steps math: 100 epochs at batch size 1 with 64 images = 6,400 steps. Train to 120 epochs (7,680 steps) and manually select the best checkpoint. The sweet spot is often around epochs 115–118 β€” not the final one.
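The epoch-to-steps conversion as a small helper (hypothetical, not part of OneTrainer):

```python
import math

def epochs_to_steps(epochs: int, image_count: int, batch_size: int = 1) -> int:
    """Optimizer steps for a run: epochs * ceil(images / batch)."""
    return epochs * math.ceil(image_count / batch_size)

print(epochs_to_steps(100, 64))  # 6400
print(epochs_to_steps(120, 64))  # 7680
```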

❌ Known Limitations

  • FLUX.2 Klein 9B support requires a fork (PR #1301, not yet merged to main)
  • Dense UI β€” configuring from scratch takes significantly longer than AI Toolkit
  • No job queue for automated multi-run training
  • Sample images during training can be misleading on some models β€” use a fixed seed for manual checkpoint evaluation

πŸ’Ύ Recommended Settings by VRAM Tier

These apply to both tools unless otherwise noted.

| Setting | 12–16 GB | 24 GB | 48 GB+ |
| --- | --- | --- | --- |
| Quantization | float8 (Transformer + TE) | float8 | Optional |
| Rank | 16 | 32 | 32–48 |
| Resolutions | 512 + 768 | 768 + 1024 | 1024 + 1280 + 1536 |
| Sample Steps | 30 | 30–40 | 40–50 |
| Steps (~60-image character set) | 4,000–6,000 | 5,000–7,000 | Same, faster iteration |
| EMA | OFF | OFF | OFF |
| BF16 LoRA Weights | YES | YES | YES |
| Stochastic Rounding (OneTrainer) | ON | ON | ON |

🧩 Fixing the Character LoRA Face vs Body Problem

This is the most common frustration when training Z-Image Base character LoRAs, and it has a real explanation rather than being random tool behavior.

Why AI Toolkit gets the body right but softens the face:

At 3,000 steps with a 64-image dataset, a Z-Image Base LoRA trained in AI Toolkit has learned enough for body poses, composition, and general likeness β€” but hasn’t converged on high-frequency facial detail. The weighted timestep approach AI Toolkit uses by default distributes training broadly, which favors coarser features (body shape, clothing, pose) before finer ones (facial geometry, eye spacing). The fix is more steps, not a different tool.
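AI Toolkit's actual weighting function isn't reproduced here, but the effect can be illustrated with a logit-normal timestep draw (a common weighting scheme in flow-matching trainers), which concentrates training away from the extreme timesteps where high-frequency detail is refined:

```python
import math
import random

def weighted_timestep(rng: random.Random) -> float:
    """Logit-normal draw: concentrates samples in the mid timestep range.
    Stand-in for a weighted scheme, not AI Toolkit's exact function."""
    return 1.0 / (1.0 + math.exp(-rng.gauss(0.0, 1.0)))

def uniform_timestep(rng: random.Random) -> float:
    """Flat draw over [0, 1) for comparison."""
    return rng.random()

rng = random.Random(0)
weighted = [weighted_timestep(rng) for _ in range(10_000)]
mid_fraction = sum(1 for t in weighted if 0.25 < t < 0.75) / len(weighted)
print(f"weighted draws in mid range: {mid_fraction:.2f}")  # ~0.73 vs 0.50 for uniform
```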

Why OneTrainer nails the face but fails the body:

Body failures in OneTrainer β€” slimmer proportions, deformed arms and hands β€” are almost always a dataset imbalance issue, not an optimizer issue. If your dataset has 40 portrait crops and 24 full-body shots, the LoRA overweights facial features relative to body structure. Prodigy_Adv and stochastic rounding accelerate convergence, which means the imbalance shows up faster and more severely than it does in AI Toolkit.

It sounds obvious, but check your dataset ratio before changing your optimizer. Add more full-body images with clear arm and hand visibility. Physically crop out other people in group shots rather than captioning them out; caption exclusion is inconsistent.
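Counting the split mechanically takes a few lines. A crude sketch that classifies by caption keywords (the keyword list and filenames are assumptions; adapt to your captioning style):

```python
def dataset_balance(captions: dict[str, str]) -> dict[str, int]:
    """Crude portrait vs. full-body split based on caption keywords."""
    full_body_terms = ("full body", "full-length", "standing", "wide shot")
    counts = {"portrait": 0, "full_body": 0}
    for filename, caption in captions.items():
        text = caption.lower()
        if any(term in text for term in full_body_terms):
            counts["full_body"] += 1
        else:
            counts["portrait"] += 1
    return counts

# Hypothetical example dataset
example = {
    "img01.png": "a woman, close-up portrait, soft light",
    "img02.png": "a woman standing in a park, full body",
    "img03.png": "headshot, studio background",
}
print(dataset_balance(example))  # {'portrait': 2, 'full_body': 1}
```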

A two-LoRA approach that works:

The most reliable high-likeness method currently used by the community: train a full character LoRA in AI Toolkit at 6,000+ steps for body and pose fidelity, then train a face-only LoRA in OneTrainer using a tightly cropped face-only dataset. Apply both in ComfyUI β€” pass the face LoRA through a FaceDetailer node using FLUX.2 Klein 4B as the refinement model. This setup consistently produces 95–100% likeness across poses and lighting conditions.


πŸ› οΈ Troubleshooting

| Error | Cause | Fix |
| --- | --- | --- |
| Face slightly wide or nose larger (AI Toolkit) | Too few steps β€” underfitting | Increase to 6,000+ steps; test every 250-step checkpoint |
| Arms deformed or body too slim (OneTrainer) | Dataset imbalance β€” too many portrait crops | Add full-body images; crop out bystanders physically |
| "LoRA does nothing" at inference | Turbo sampling settings used on Base | Set guidance scale 3–5 and sample steps 30–50 at inference |
| OOM at 1024 resolution | Rank too high or quantization off | Enable float8 for the Transformer; drop to 768; reduce rank to 16 |
| Loss curve never drops (OneTrainer) | Differential guidance accidentally enabled | Turn OFF differential guidance in advanced settings |
| LoRA looks great on Base, poor on Turbo | Base–Turbo weight deviation | Train on Turbo for Turbo deployment, or use a 4-step distilled LoRA merge |
| FLUX.2 Klein 9B OOM on 16 GB | Model too large for VRAM at default settings | Enable layer offloading; use 7-bit quant; drop rank to 8–16 |
| Training samples look terrible during Klein runs | Known issue β€” base-model mid-training samples | Ignore samples; manually test checkpoints at a fixed seed |

πŸ’‘ Tips & Best Practices

πŸ’‘ Tip: Don’t deploy a Z-Image Base LoRA directly on Z-Image Turbo and expect matching results. Turbo is a distilled finetune with diverged weights β€” LoRA strength, facial sharpness, and style transfer all behave differently. You’ll need to push strength to 1.3–1.5 at minimum to compensate, and results still won’t match a Turbo-trained LoRA. Train and deploy on the same base for consistent quality.

πŸ’‘ Tip: In OneTrainer, always verify that stochastic rounding is enabled in the optimizer settings when using BF16 as the LoRA weight data type. It’s not automatic for all optimizers and is the most commonly missed setting. Without it, precision loss from BF16 can subtly degrade convergence on fine detail β€” exactly the kind of problem that shows up as soft or inaccurate faces.

πŸ’‘ Tip: Caption dropout at 0.05 prevents the trigger word from becoming too tightly bound to a single lighting condition or background. AI Toolkit sets this by default; OneTrainer requires you to set it manually. Skipping it makes your LoRA less flexible at inference time β€” prompts for different backgrounds or lighting will partially bleed into the trigger response.
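The mechanism itself is trivial: with probability p, the caption is replaced by an empty string for that training step. A sketch of the idea, not either trainer's implementation:

```python
import random

def maybe_drop_caption(caption: str, rng: random.Random, p: float = 0.05) -> str:
    """With probability p, train on an empty caption so the concept
    isn't bound exclusively to one phrasing, background, or lighting."""
    return "" if rng.random() < p else caption

rng = random.Random(42)
dropped = sum(1 for _ in range(10_000)
              if maybe_drop_caption("zimg_char, portrait", rng) == "")
print(dropped)  # roughly 500 of 10,000 at p=0.05
```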

πŸ’‘ Tip: For FLUX.2 Klein 9B in AI Toolkit, set timestep_type to linear instead of the default weighted. Community members consistently report better character likeness at 1,800–2,000 steps with this single change, using lr=0.0001 and rank=32. Also plan to run the LoRA at 1.3–1.5 strength when using the distilled 9B model at inference β€” base-trained LoRAs don’t fully transfer at strength 1.0.

πŸ’‘ Tip: The 100-steps-per-image rule is a floor, not a ceiling. For high-fidelity character training, running to 120 steps per image and manually selecting the best intermediate checkpoint typically beats stopping at the minimum. Evaluate each checkpoint at a fixed prompt and seed β€” the best result is usually around 115–118 epochs of training, not at the end of the run.
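The checkpoint sweep is easy to plan up front. A hypothetical helper, not part of either tool:

```python
def checkpoint_schedule(image_count: int, max_steps_per_image: int = 120,
                        save_every: int = 250) -> list[int]:
    """Steps at which to save and evaluate checkpoints, running to
    120 steps per image rather than stopping at the 100-step floor."""
    total = image_count * max_steps_per_image
    return list(range(save_every, total + 1, save_every))

plan = checkpoint_schedule(64)
print(plan[-1])   # 7500 -- last 250-step multiple below 7680
print(len(plan))  # 30 checkpoints to compare at a fixed prompt and seed
```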

πŸ’‘ Tip: If FLUX.2 Klein 9B hits VRAM limits on 16 GB, the 4B variant uses Apache 2.0 and trains without forced quantization compromises on consumer hardware. The 9B produces marginally sharper fine detail, but for most character and style use cases the 4B results are competitive and far easier to work with locally.


βœ… Final Thoughts

There’s no single best tool for Z-Image Base LoRA training β€” there are clear trade-offs. AI Toolkit gives you a simpler workflow, a job queue, and body-consistent character training. OneTrainer runs faster, exposes better optimizer options, and produces stronger facial likeness when configured correctly. The face-versus-body problem is almost always a dataset balance or step-count issue, not a fundamental limitation of either tool.

Fix your dataset ratio, run more steps than you think you need, enable stochastic rounding in OneTrainer, and evaluate intermediate checkpoints at a fixed seed rather than trusting training previews. That combination resolves the overwhelming majority of Z-Image Base character training frustrations β€” in both tools.

Go train something.


❓ FAQ

❓ Q: Why is AI Toolkit so much slower than OneTrainer for LoRA training?

OneTrainer is 1.4–2Γ— faster primarily because it implements torch.compile and int8 quantized training (w8a8) by default β€” optimizations AI Toolkit currently lacks. The gap widens at higher batch sizes, where OneTrainer’s VRAM scaling is more efficient. For speed-critical workflows, OneTrainer is the better choice; for workflow simplicity and job queuing, AI Toolkit wins.

❓ Q: Can I use a Z-Image Base LoRA on Z-Image Turbo?

You can, but results vary and typically require increasing LoRA strength to 1.3–1.5 to compensate for the weight deviation between Base and Turbo. For best consistency, train on the same model you’ll deploy to. If Turbo deployment is the goal, either retrain on Turbo or combine a Base-trained character LoRA with a separately available 4-step distilled LoRA for Base inference.

❓ Q: How many training steps does a Z-Image Base character LoRA actually need?

The community baseline is 100–120 steps per image at batch size 1: 64 images β†’ 6,400–7,680 steps. Training too short is the most common cause of soft faces and missed likeness in AI Toolkit. Save checkpoints every 250 steps and manually evaluate at a fixed seed β€” the best result is often around 115–118 epochs, not the final checkpoint.

❓ Q: Does Prodigy_Adv actually improve results over AdamW for Z-Image LoRA training?

The quality improvement comes primarily from two things, not the optimizer itself: Prodigy_Adv dynamically adjusts the learning rate (removing the need to tune it manually), and it’s typically paired with stochastic rounding enabled in OneTrainer. If you enable stochastic rounding alongside a well-tuned AdamW learning rate, the quality gap between the two closes significantly. Prodigy_Adv is mainly a convenience, not a secret ingredient.

❓ Q: Why do Z-Image Base LoRA training samples look bad during training?

Training samples are generated mid-run at reduced quality to avoid slowing down the run. For Z-Image Base, they’re produced using in-progress weights at 30 steps, which doesn’t reflect final LoRA quality. Don’t make decisions based on training samples alone. After the run completes, manually test each checkpoint at a consistent prompt and seed to find the actual best point.

