Installation
$npx skills add MCKRUZ/ComfyUI-Expert --skill comfyui-lora-trainingSummary
Enable the agent to prepare datasets, configure model-specific training (FLUX, SDXL), and evaluate checkpoints to produce character LoRAs for consistent identity generation. Invoke when the user needs to fine-tune a base model on reference images for repeatable character consistency.
SKILL.MD
ComfyUI LoRA Training
Guide the user through dataset preparation, training configuration, and evaluation for character LoRAs.
When to Train vs Zero-Shot
| Scenario | Recommendation |
|---|---|
| Need absolute consistency across many images | Train LoRA |
| Building a character series or ongoing project | Train LoRA |
| Quick one-off generation | Use zero-shot (InstantID/PuLID) |
| Limited references (1-5 images) | Use zero-shot |
| Testing concepts | Use zero-shot first, train if committing |
Training Pipeline
1. DATASET PREP
|-- Collect/generate 15-30 reference images
|-- Preprocess (crop, resize, diversify styles)
|-- Caption with trigger word + descriptions
|
2. CONFIGURE TRAINING
|-- Select training tool (Kohya/AI-Toolkit/FluxGym)
|-- Set hyperparameters based on model type
|-- Configure checkpointing
|
3. TRAIN
|-- Monitor loss curve
|-- Save checkpoints every 250-500 steps
|
4. EVALUATE
|-- Test each checkpoint with identical prompts
|-- Check identity accuracy, flexibility, overfitting
|-- Select best checkpoint
|
5. INTEGRATE
|-- Copy to ComfyUI models/loras/
|-- Update character profile with trigger word + strength
|-- Test in full workflow (LoRA + identity method)
Dataset Preparation
Image Requirements
| Aspect | Minimum | Optimal | Maximum |
|---|---|---|---|
| Count | 10-15 | 20-30 | 50+ |
| Resolution | 512x512 | 1024x1024 | - |
| Format | PNG/high JPEG | PNG | - |
Content Diversity Checklist
- Multiple angles (front, 3/4, profile, back)
- Various expressions (neutral, smile, serious, laugh, etc.)
- Different lighting conditions (studio, natural, dramatic)
- Varied backgrounds (or transparent/solid)
- Multiple outfits/contexts
- Some close-ups, some medium shots
- If from 3D renders: include style variations (see below)
Preprocessing 3D Renders
Problem: Training directly on 3D renders bakes in the "3D" aesthetic.
Solution: Generate style variations first:
- Run each render through img2img with varied style prompts
- Mix: 60% style variations, 40% original renders
- This teaches identity, not style
Style prompts for variation:
"photorealistic portrait, dslr photo"
"oil painting portrait"
"digital illustration"
"pencil sketch"
"watercolor portrait"
Captioning Rules
Trigger word: ALWAYS use a unique token as first word.
- Good:
sage_character,ohwx_sage,sks_person - Bad:
woman,redhead,character(too generic)
Caption structure:
{trigger}, {subject type}, {clothing}, {pose}, {setting}, {lighting}, {style}
DO NOT describe face features (let the model learn them):
- Bad: "woman with green eyes, freckles, auburn hair, defined cheekbones"
- Good: "sage_character, woman, indoor portrait, wearing blue sweater"
DO describe everything else: clothing, pose, background, lighting, expression.
Folder Structure
dataset/{character_name}/{repeats}_{trigger_word}/
001.png + 001.txt
002.png + 002.txt
...
Folder naming: 10_sage_character = each image repeated 10x per epoch.
Training Configurations
FLUX LoRA (AI-Toolkit) - Recommended
network:
type: lora
linear: 16 # Rank (16-32 for characters)
linear_alpha: 16 # Alpha = rank for FLUX
train:
batch_size: 1
gradient_accumulation_steps: 4
steps: 1500 # FLUX converges faster
lr: 4e-4 # Higher than SDXL
optimizer: adamw8bit
dtype: bf16
datasets:
- resolution: [1024]
caption_ext: "txt"
sample:
sample_every: 250
prompts:
- "{trigger}, photorealistic portrait"
FLUX training notes:
- Converges 2-3x faster than SDXL
- 1000-2000 steps usually sufficient
- Watch for overfitting (quality plateaus early)
- 24GB VRAM for standard, 9GB with NF4 quantization (SimpleTuner)
SDXL LoRA (Kohya_ss) - Proven
pretrained_model: "RealVisXL_V5.0.safetensors"
network_dim: 32 # Rank (16-64)
network_alpha: 16 # Usually dim/2
resolution: "1024,1024"
train_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 0.0001 # 1e-4
lr_scheduler: "cosine_with_restarts"
lr_scheduler_num_cycles: 3
max_train_epochs: 10
optimizer_type: "AdamW8bit"
mixed_precision: "bf16"
enable_bucket: true
min_snr_gamma: 5
Step calculation:
total_steps = (images x repeats x epochs) / batch_size
Target: 1500-3000 steps for SDXL
Example: 20 images x 10 repeats x 5 epochs / 1 = 1000 steps
Low VRAM Training (FluxGym / SimpleTuner)
For 12-16GB VRAM:
use_8bit_adam: true
gradient_checkpointing: true
cache_latents_to_disk: true
max_data_loader_n_workers: 0
train_batch_size: 1
gradient_accumulation_steps: 8
quantize_base_model: nf4 # SimpleTuner only
Evaluation Protocol
Test Each Checkpoint
Use identical prompts across all checkpoints:
Prompt 1: "{trigger}, photorealistic portrait, neutral expression"
Prompt 2: "{trigger}, photorealistic portrait, smiling, outdoor"
Prompt 3: "{trigger}, wearing formal suit, standing, office"
Prompt 4: "a person standing in a park" (WITHOUT trigger - should NOT produce character)
Quality Indicators
Good training:
- Character recognizable from trigger word alone
- Responds to different prompts/contexts
- Doesn't always produce same pose/expression
- Prompt 4 does NOT produce the character
Overfitting signs:
- Same exact pose/expression regardless of prompt
- Training backgrounds appearing in outputs
- Ignores clothing/setting prompts
- Prompt 4 produces the character (too strong)
Best Epoch Selection
If using sample_every: 250 with 1500 steps:
- Checkpoint 250: Usually underfit
- Checkpoint 500-750: Often sweet spot for FLUX
- Checkpoint 1000-1500: May be overfitting
Compare visually and select the checkpoint with best identity + prompt flexibility balance.
Post-Training Integration
- Copy best checkpoint to
{ComfyUI}/models/loras/ - Update character profile:
lora: trained: true model_file: "sage_character_flux.safetensors" trigger_word: "sage_character" best_strength: 0.8 - Test in full workflow: LoRA (0.7-0.9) + PuLID/IP-Adapter (0.5-0.7)
- Record successful settings in character's
generation_history
Combining LoRA with Zero-Shot Methods
Best practice: LoRA as base identity, zero-shot for enhancement.
[Load Checkpoint] → [Load LoRA (0.7-0.9)] → [Apply PuLID/IP-Adapter (0.5-0.7)] → [Generate]
Lower weights on both prevents conflict while reinforcing identity.
Troubleshooting
| Issue | Solution |
|---|---|
| LoRA not activating | Check trigger word spelling, ensure loaded before KSampler |
| Identity drift at angles | Add more angle variety to dataset, reduce network_dim |
| Overfitting | Reduce epochs, increase dataset, lower network_dim |
| Style contamination | Better caption diversity, don't describe style in captions |
| Poor quality/artifacts | Check training images for compression, reduce LR |
Reference
references/lora-training.md- Full parameter referencereferences/models.md- Training tool download links- Character profiles in
projects/for trigger words and reference images