If you've trained Z-Image Base LoRAs in both AI Toolkit and OneTrainer, you've probably hit a wall that feels designed to drive you insane: one tool nails the body but softens the face, the other gets the face exactly right but mangles the arms. I've dug into both tools across dozens of character training runs, and this isn't random; there are specific, fixable reasons why the two diverge. This guide breaks down the key differences in settings, speed, and training behavior, then gives you working configs for both.
🔍 What Is Z-Image Base LoRA Training?
Z-Image Base is the full, non-distilled Z-Image checkpoint from Tongyi-MAI. It runs at 30–50 sampling steps with full CFG and negative prompt support, unlike Z-Image Turbo, which is an 8-step distilled variant. Because Base is the source model, it's the cleanest target for LoRA fine-tuning: any LoRA you train here has access to the full weight space without fighting distillation artifacts.
Both AI Toolkit (by Ostris) and OneTrainer are open-source LoRA trainers that support Z-Image Base. They both work. The question is which gives you more predictable results, and why they behave so differently even with nearly identical settings.
⚡ Why the Tool Choice Actually Matters
The biggest surprise for most people is that identical hyperparameters produce meaningfully different outputs in the two tools. Here's the practical summary before the details:
- Speed: OneTrainer runs roughly 1.4–2× faster than AI Toolkit on the same hardware. The main driver is torch.compile and int8 quantized training (w8a8), which OneTrainer enables by default and AI Toolkit currently lacks.
- Character bodies: AI Toolkit tends to produce more anatomically consistent bodies, especially at 3,000+ steps.
- Character faces: OneTrainer frequently produces sharper, more accurate facial likeness, particularly with the Prodigy_Adv optimizer and stochastic rounding enabled.
- Workflow: AI Toolkit has a cleaner, simpler UI and supports job queuing for multi-run iteration.
- Options: OneTrainer exposes more optimizer choices and ships fine-grained presets per VRAM tier.
- The catch? Getting both face and body right in a single tool requires deliberate tuning, but it's fully achievable in either one.
📊 Quick Comparison Table
| Feature | AI Toolkit | OneTrainer |
|---|---|---|
| UI Simplicity | Very clean | Dense but powerful |
| Speed | ~1.4–2× slower | Faster via torch.compile + int8 |
| Job Queue | Yes | No |
| Z-Image Base Support | Native | Native |
| FLUX.2 Klein 9B Support | Yes | Via fork (PR not yet merged) |
| Prodigy_Adv Optimizer | Not available | Yes |
| Stochastic Rounding | Not exposed in UI | Available |
| VRAM Presets | Manual config | Presets per tier |
| Windows Support | Yes | Yes |
🔥 AI Toolkit by Ostris
AI Toolkit is the natural starting point for most people: Ostris ships solid defaults, the job queue makes iterating across multiple runs painless, and the cloud UI on RunComfy removes local setup entirely. Here are the full recommended settings for Z-Image Base character LoRAs.
💡 Key Features
- Job queue for multi-dataset iteration without manual restarts
- FlowMatch sampler with CFG support for accurate in-training preview images
- float8 quantization for Transformer and Text Encoder
- BF16 checkpoint output
- Cloud deployment via RunComfy (H100/H200)
✅ Recommended Settings (Character LoRA, 24 GB GPU)
| Setting | Value |
|---|---|
| Model Architecture | Z-Image |
| Model Path | Tongyi-MAI/Z-Image |
| Quantization (Transformer) | float8 |
| Quantization (Text Encoder) | float8 |
| Target Type | LoRA |
| Linear Rank | 32 |
| Save Data Type | BF16 |
| Batch Size | 1 |
| Steps | 3,000–7,000 |
| Optimizer | AdamW8Bit |
| Learning Rate | 0.0001 |
| Weight Decay | 0.0001 |
| Timestep Type | Weighted |
| Timestep Bias | Balanced |
| EMA | OFF |
| Resolutions | 768 + 1024 |
| Sample Guidance Scale | 4 |
| Sample Steps | 30–50 |
Steps per image: aim for ~100 steps per image in your dataset. 64 images → 6,400 steps. 30 images → 3,000 steps. Don't undershoot: if your face is coming out slightly wide or soft at 3,000 steps with a 60-image dataset, extending to 5,000–6,500 and checking intermediate checkpoints is the right move, not switching tools.
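The steps-per-image arithmetic is worth scripting so you don't undershoot. A minimal sketch; `recommended_steps` is a made-up helper name, and 100–120 is the multiplier range this section recommends:

```python
def recommended_steps(num_images: int, per_image: int = 100) -> int:
    """Target step count from the ~100 steps-per-image rule of thumb."""
    return num_images * per_image

# The floor for a 64-image dataset, and a higher-fidelity 120-steps-per-image target
floor = recommended_steps(64)        # 6,400 steps
ceiling = recommended_steps(64, 120)  # 7,680 steps
```

Treat the result as a floor: save checkpoints along the way and keep training past it if the face is still soft.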
Sample settings matter. The most common Z-Image Base misconfiguration in AI Toolkit is using Turbo-style sampling (8 steps, little or no CFG) for preview images. Set Guidance Scale to 3–5 and Sample Steps to at least 30. Turbo settings here make previews look undercooked, which causes people to stop training too early or assume the LoRA isn't working when it just needs more steps.
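A cheap guard against the Turbo-style preview trap is to validate sample settings before launching a run. This is a hypothetical check, not an AI Toolkit API; the thresholds mirror the guidance above:

```python
def looks_like_turbo(guidance_scale: float, sample_steps: int) -> bool:
    """Flag Turbo-style sampling (few steps, little or no CFG) on a Base run."""
    return sample_steps < 30 or guidance_scale < 3.0

# Base-appropriate preview settings pass; Turbo-style settings get flagged
assert not looks_like_turbo(guidance_scale=4.0, sample_steps=40)
assert looks_like_turbo(guidance_scale=1.0, sample_steps=8)
```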
❌ Known Limitations
- No Prodigy_Adv optimizer; only standard Prodigy and AdamW variants
- Stochastic rounding is not exposed in the UI
- ~1.4–2× slower than OneTrainer at equivalent settings
- FLUX.2 Klein 9B training sample images are notoriously misleading; test checkpoints manually at a fixed seed
- Speed gap versus OneTrainer widens above batch size 2
💰 Cloud Cost
Running on RunComfy H100: approximately $2–4 per full character training run depending on step count and resolution.
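To sanity-check a quote like this yourself, the cost is just wall-clock training time times the hourly GPU rate. The per-step time and hourly price below are illustrative assumptions, not measured values:

```python
def training_cost_usd(steps: int, sec_per_step: float, usd_per_hour: float) -> float:
    """Rough cloud cost estimate: training wall-clock hours times GPU rate."""
    hours = steps * sec_per_step / 3600
    return round(hours * usd_per_hour, 2)

# Assumed figures: ~0.7 s/step on an H100 billed at ~$2.50/hr
cost = training_cost_usd(6400, sec_per_step=0.7, usd_per_hour=2.5)  # ≈ $3.11
```

Plug in your own measured seconds-per-step and provider rate; the formula is the whole trick.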
🔥 OneTrainer
OneTrainer trades UI simplicity for raw flexibility. The settings page is dense, but that density is the point: it exposes optimizer options and quantization controls that AI Toolkit either doesn't have or doesn't surface. For Z-Image Base character training specifically, the Prodigy_Adv optimizer with stochastic rounding directly improves facial likeness in ways that default AI Toolkit settings don't match.
💡 Key Features
- torch.compile for significantly faster training on all supported GPUs
- Prodigy_Adv optimizer with stochastic rounding
- Per-VRAM-tier presets (8 GB, 12 GB, 16 GB, 24 GB+)
- DoRA support for faster convergence on fine character details
- Image pair training for edit-style LoRAs
✅ Recommended Settings (Character LoRA, Z-Image Base)
| Setting | Value |
|---|---|
| Base Model | Z-Image Base |
| LoRA Rank | 16–32 |
| Alpha | Equal to Rank |
| Optimizer | Prodigy_Adv (or AdamW + stochastic rounding) |
| Learning Rate | 1.0 (Prodigy) / 0.0001 (AdamW) |
| Epochs | ~100–120 (at batch 1, steps = epochs × image count) |
| Stochastic Rounding | ON |
| LoRA Weight Data Type | BF16 |
| EMA | OFF |
| Resolution | 768 |
| Gradient Checkpointing | ON if VRAM is tight |
| Differential Guidance | OFF |
Alpha and rank: keep alpha equal to rank. Setting alpha = 1 with a high rank effectively decouples the update scale from the learning rate in a way that makes tuning harder to reason about. Equal alpha and rank keeps the math straightforward and the LR more predictable.
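This is the standard LoRA scale factor at work: the learned update `B @ A` is multiplied by `alpha / rank` before being added to the base weights, so unequal alpha and rank silently rescales your effective learning rate:

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling: the update (B @ A) is multiplied by alpha / rank."""
    return alpha / rank

assert lora_scale(32, 32) == 1.0      # alpha == rank: full-scale update at any rank
assert lora_scale(1, 32) == 0.03125   # alpha = 1 at rank 32: update shrunk 32x,
                                      # so the LR no longer means what you think
```

Keeping alpha equal to rank pins this factor at 1.0, which is why the LR stays predictable when you experiment with different ranks.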
Stochastic rounding: in OneTrainer's optimizer settings, enable stochastic rounding whenever you're using BF16 as the LoRA weight data type. This is the single setting most people miss when first using OneTrainer, and it meaningfully affects convergence quality on fine facial detail.
Epoch-to-steps math: 100 epochs at batch size 1 with 64 images = 6,400 steps. Train to 120 epochs (7,680 steps) and manually select the best checkpoint. The sweet spot is often around epochs 115–118, not the final one.
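The epoch-to-steps conversion generalizes to any batch size; a small sketch (the helper name is made up):

```python
import math

def epochs_to_steps(epochs: int, num_images: int, batch_size: int = 1) -> int:
    """Total optimizer steps: steps per epoch (images / batch, rounded up) times epochs."""
    return epochs * math.ceil(num_images / batch_size)

assert epochs_to_steps(100, 64) == 6400   # the 100-epoch baseline above
assert epochs_to_steps(120, 64) == 7680   # the extended 120-epoch run
```

At batch size 4 the same 100 epochs would be only 1,600 optimizer steps, which is why step-count advice always needs the batch size attached.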
❌ Known Limitations
- FLUX.2 Klein 9B support requires a fork (PR #1301, not yet merged to main)
- Dense UI: configuring from scratch takes significantly longer than in AI Toolkit
- No job queue for automated multi-run training
- Sample images during training can be misleading on some models; use a fixed seed for manual checkpoint evaluation
⚙️ Z-Image Base: Recommended Configs by VRAM Tier
These apply to both tools unless otherwise noted.
| Setting | 12–16 GB | 24 GB | 48 GB+ |
|---|---|---|---|
| Quantization | float8 (Transformer + TE) | float8 | Optional |
| Rank | 16 | 32 | 32–48 |
| Resolutions | 512 + 768 | 768 + 1024 | 1024 + 1280 + 1536 |
| Sample Steps | 30 | 30–40 | 40–50 |
| Steps (~60 img, character) | 4,000–6,000 | 5,000–7,000 | Same, faster iteration |
| EMA | OFF | OFF | OFF |
| BF16 LoRA Weights | YES | YES | YES |
| Stochastic Rounding (OT) | ON | ON | ON |
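The tier table can be encoded as a simple lookup for your own launch scripts. The dict keys here are hypothetical names, not actual AI Toolkit or OneTrainer config fields:

```python
def zimage_tier_config(vram_gb: int) -> dict:
    """Recommended Z-Image Base settings per VRAM tier (from the table above)."""
    if vram_gb >= 48:
        return {"rank": 32, "resolutions": [1024, 1280, 1536], "fp8_quant": False}
    if vram_gb >= 24:
        return {"rank": 32, "resolutions": [768, 1024], "fp8_quant": True}
    if vram_gb >= 12:
        return {"rank": 16, "resolutions": [512, 768], "fp8_quant": True}
    raise ValueError("under 12 GB is outside the tiers this guide covers")

assert zimage_tier_config(24) == {"rank": 32, "resolutions": [768, 1024], "fp8_quant": True}
```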
🧩 Fixing the Character LoRA Face vs Body Problem
This is the most common frustration when training Z-Image Base character LoRAs, and it has a real explanation rather than being random tool behavior.
Why AI Toolkit gets the body right but softens the face:
At 3,000 steps with a 64-image dataset, a Z-Image Base LoRA trained in AI Toolkit has learned enough for body poses, composition, and general likeness, but hasn't converged on high-frequency facial detail. The weighted timestep approach AI Toolkit uses by default distributes training broadly, which favors coarser features (body shape, clothing, pose) before finer ones (facial geometry, eye spacing). The fix is more steps, not a different tool.
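One common "weighted" timestep scheme in flow-matching trainers is logit-normal sampling, which concentrates training mid-schedule where coarse structure is decided. Whether AI Toolkit's weighted mode is exactly this is an assumption, but it illustrates why broad features converge before fine ones:

```python
import math
import random

def logit_normal_timestep(rng: random.Random, mean: float = 0.0, std: float = 1.0) -> float:
    """Sample t in (0, 1) with mass concentrated mid-schedule (logit-normal)."""
    return 1.0 / (1.0 + math.exp(-rng.gauss(mean, std)))

rng = random.Random(0)
samples = [logit_normal_timestep(rng) for _ in range(10_000)]
# Most samples land mid-schedule, so coarse structure gets more gradient signal
mid_fraction = sum(0.25 < t < 0.75 for t in samples) / len(samples)
```

High-frequency facial detail lives near the low-noise end of the schedule, which this weighting visits less often per step; that deficit is what extra steps pay down.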
Why OneTrainer nails the face but fails the body:
Body failures in OneTrainer (slimmer proportions, deformed arms and hands) are almost always a dataset imbalance issue, not an optimizer issue. If your dataset has 40 portrait crops and 24 full-body shots, the LoRA overweights facial features relative to body structure. Prodigy_Adv and stochastic rounding accelerate convergence, which means the imbalance shows up faster and more severely than it does in AI Toolkit.
It sounds obvious, but check your dataset ratio before changing your optimizer. Add more full-body images with clear arm and hand visibility. Physically crop out other people in group shots rather than captioning them out; caption exclusion is inconsistent.
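A quick ratio check before touching optimizer settings costs nothing. How you tag each image (manually, or via an aspect-ratio heuristic) is up to you, and the 50% cap below is an illustrative threshold rather than a community standard:

```python
def portrait_fraction(labels: list[str]) -> float:
    """Share of the dataset tagged as portrait crops."""
    return labels.count("portrait") / len(labels)

# The imbalanced example from above: 40 portrait crops vs 24 full-body shots
dataset = ["portrait"] * 40 + ["full_body"] * 24
frac = portrait_fraction(dataset)   # 0.625: well over half the data is faces
needs_more_bodies = frac > 0.5
```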
A two-LoRA approach that works:
The most reliable high-likeness method currently used by the community: train a full character LoRA in AI Toolkit at 6,000+ steps for body and pose fidelity, then train a face-only LoRA in OneTrainer using a tightly cropped face-only dataset. Apply both in ComfyUI, passing the face LoRA through a FaceDetailer node with FLUX.2 Klein 4B as the refinement model. This setup consistently produces 95–100% likeness across poses and lighting conditions.
🛠️ Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| Face slightly wide or nose larger (AI Toolkit) | Too few steps (underfitting) | Increase to 6,000+ steps; test every 250-step checkpoint |
| Arms deformed or body too slim (OneTrainer) | Dataset imbalance (too many portrait crops) | Add full-body images; crop out bystanders physically |
| "LoRA does nothing" at inference | Using Turbo sampling settings on Base | Set guidance scale 3–5 and sample steps 30–50 at inference |
| OOM at 1024 resolution | Rank too high or quantization off | Enable float8 for Transformer; drop to 768; reduce rank to 16 |
| Loss curve never drops (OneTrainer) | Differential guidance accidentally enabled | Turn OFF differential guidance in advanced settings |
| LoRA looks great on Base, poor on Turbo | Base→Turbo weight deviation | Train on Turbo for Turbo deployment; or use a 4-step distilled LoRA merge |
| FLUX.2 Klein 9B OOM on 16 GB | Model too large for VRAM at default settings | Enable layer offloading; use 7-bit quant; drop rank to 8–16 |
| Training samples look terrible during Klein runs | Known issue with base-model mid-training samples | Ignore samples; manually test checkpoints at a fixed seed |
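For the fixed-seed evaluation workflow that several of these fixes recommend, it helps to enumerate exactly which checkpoints you will be reviewing. The helper below is hypothetical, assuming checkpoints saved at a fixed interval:

```python
def checkpoint_steps(total_steps: int, every: int = 250) -> list[int]:
    """Step numbers at which checkpoints get saved during a run."""
    return list(range(every, total_steps + 1, every))

ckpts = checkpoint_steps(6000)   # 24 checkpoints: 250, 500, ..., 6000
# Evaluate each one at a single fixed prompt and seed, then pick the best by eye
```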
💡 Tips & Best Practices
💡 Tip: Don't deploy a Z-Image Base LoRA directly on Z-Image Turbo and expect matching results. Turbo is a distilled finetune with diverged weights: LoRA strength, facial sharpness, and style transfer all behave differently. You'll need to push strength to 1.3–1.5 at minimum to compensate, and results still won't match a Turbo-trained LoRA. Train and deploy on the same base for consistent quality.
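If you do cross-deploy, the compensation can at least be made explicit in your inference setup. A sketch; the 1.4 figure is simply the midpoint of the 1.3–1.5 range quoted above:

```python
def deploy_strength(trained_on: str, deploy_on: str) -> float:
    """Suggested LoRA strength when moving between Z-Image Base and Turbo."""
    if trained_on == deploy_on:
        return 1.0   # same model: no compensation needed
    if trained_on == "base" and deploy_on == "turbo":
        return 1.4   # midpoint of the 1.3-1.5 compensation range
    raise ValueError("retrain on the deployment model for predictable results")

assert deploy_strength("base", "base") == 1.0
assert deploy_strength("base", "turbo") == 1.4
```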
💡 Tip: In OneTrainer, always verify that stochastic rounding is enabled in the optimizer settings when using BF16 as the LoRA weight data type. It's not automatic for all optimizers and is the most commonly missed setting. Without it, precision loss from BF16 can subtly degrade convergence on fine detail: exactly the kind of problem that shows up as soft or inaccurate faces.
💡 Tip: Caption dropout at 0.05 prevents the trigger word from becoming too tightly bound to a single lighting condition or background. AI Toolkit sets this by default; OneTrainer requires you to set it manually. Skipping it makes your LoRA less flexible at inference time: prompts for different backgrounds or lighting will partially bleed into the trigger response.
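The mechanic behind caption dropout is a one-liner: with probability 0.05, the caption is dropped entirely so the trigger concept is sometimes learned without its usual context. The function name here is made up for illustration:

```python
import random

def apply_caption_dropout(caption: str, rate: float = 0.05, rng=None) -> str:
    """Replace the caption with an empty string at the dropout rate."""
    rng = rng or random
    return "" if rng.random() < rate else caption

rng = random.Random(42)
drops = sum(apply_caption_dropout("photo of sks person", rng=rng) == ""
            for _ in range(10_000))
# drops lands near 500, i.e. roughly 5% of 10,000 captions
```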
💡 Tip: For FLUX.2 Klein 9B in AI Toolkit, set `timestep_type` to `linear` instead of the default `weighted`. Community members consistently report better character likeness at 1,800–2,000 steps with this single change, using `lr=0.0001` and `rank=32`. Also plan to run the LoRA at 1.3–1.5 strength when using the distilled 9B model at inference; base-trained LoRAs don't fully transfer at strength 1.0.
💡 Tip: The 100-steps-per-image rule is a floor, not a ceiling. For high-fidelity character training, running to 120 steps per image and manually selecting the best intermediate checkpoint typically beats stopping at the minimum. Evaluate each checkpoint at a fixed prompt and seed; the best result is usually around 115–118 epochs of training, not at the end of the run.
💡 Tip: If FLUX.2 Klein 9B hits VRAM limits on 16 GB, the 4B variant is Apache 2.0 licensed and trains without forced quantization compromises on consumer hardware. The 9B produces marginally sharper fine detail, but for most character and style use cases the 4B results are competitive and far easier to work with locally.
✅ Final Thoughts
There's no single best tool for Z-Image Base LoRA training, only clear trade-offs. AI Toolkit gives you a simpler workflow, a job queue, and body-consistent character training. OneTrainer runs faster, exposes better optimizer options, and produces stronger facial likeness when configured correctly. The face-versus-body problem is almost always a dataset balance or step-count issue, not a fundamental limitation of either tool.
Fix your dataset ratio, run more steps than you think you need, enable stochastic rounding in OneTrainer, and evaluate intermediate checkpoints at a fixed seed rather than trusting training previews. That combination resolves the overwhelming majority of Z-Image Base character training frustrations in both tools.
Go train something.
❓ FAQ
❓ Q: Why is AI Toolkit so much slower than OneTrainer for LoRA training?
OneTrainer is 1.4–2× faster primarily because it implements torch.compile and int8 quantized training (w8a8) by default, optimizations AI Toolkit currently lacks. The gap widens at higher batch sizes, where OneTrainer's VRAM scaling is more efficient. For speed-critical workflows, OneTrainer is the better choice; for workflow simplicity and job queuing, AI Toolkit wins.
❓ Q: Can I use a Z-Image Base LoRA on Z-Image Turbo?
You can, but results vary and typically require increasing LoRA strength to 1.3–1.5 to compensate for the weight deviation between Base and Turbo. For best consistency, train on the same model you'll deploy to. If Turbo deployment is the goal, either retrain on Turbo or combine a Base-trained character LoRA with a separately available 4-step distilled LoRA for Base inference.
❓ Q: How many training steps does a Z-Image Base character LoRA actually need?
The community baseline is 100–120 steps per image at batch size 1: 64 images → 6,400–7,680 steps. Training too short is the most common cause of soft faces and missed likeness in AI Toolkit. Save checkpoints every 250 steps and manually evaluate at a fixed seed; the best result is often around 115–118 epochs, not the final checkpoint.
❓ Q: Does Prodigy_Adv actually improve results over AdamW for Z-Image LoRA training?
The quality improvement comes primarily from two things, not the optimizer itself: Prodigy_Adv dynamically adjusts the learning rate (removing the need to tune it manually), and it's typically paired with stochastic rounding enabled in OneTrainer. If you enable stochastic rounding alongside a well-tuned AdamW learning rate, the quality gap between the two closes significantly. Prodigy_Adv is mainly a convenience, not a secret ingredient.
❓ Q: Why do Z-Image Base LoRA training samples look bad during training?
Training samples are generated mid-run at reduced quality to avoid slowing down the run. For Z-Image Base, they're produced using in-progress weights at 30 steps, which doesn't reflect final LoRA quality. Don't make decisions based on training samples alone. After the run completes, manually test each checkpoint at a consistent prompt and seed to find the actual best point.
📚 Additional Resources
- AI Toolkit by Ostris – GitHub
- Z-Image Base LoRA Training Guide – RunComfy
- OneTrainer FLUX.2 Klein 9B Support – PR #1301
- Z-Image Base Working Solution Thread – Reddit