
LoRA Training Guide: Create Custom AI Models for Your Style

By Cemhan Biricik · 2026-01-26 · 22 min read

Every AI image generator produces images in the styles it was trained on. Want a specific person's face, a particular product, a unique art style, or a proprietary brand aesthetic? The base model does not know these concepts. You need to teach it. LoRA — Low-Rank Adaptation — is the most efficient and widely used method for doing exactly that.

LoRA was originally developed for large language models by Edward Hu and colleagues at Microsoft Research, but it was quickly adopted by the image generation community because it solves a critical problem: how do you customize a multi-gigabyte model without retraining the entire thing? A LoRA fine-tune modifies only a tiny fraction of the model's parameters by training small, low-rank matrices that are injected alongside the original weights. The result is a file that is typically 10–200 MB, trains in minutes to hours rather than days, and can be loaded and unloaded from the base model instantly.

This guide walks through the entire LoRA training process from dataset preparation to deployment. Whether you are training for FLUX or SDXL, the principles are the same; the implementation details differ, and we cover both.

Understanding LoRA: How Low-Rank Adaptation Works

To understand why LoRA is so effective, you need to understand what happens during fine-tuning. A diffusion model like SDXL has approximately 3.5 billion parameters organized in layers of weight matrices. During standard fine-tuning, you update all of these weights through gradient descent, which requires storing gradients and optimizer states for every parameter — enormous memory requirements and the risk of catastrophically forgetting the model's original capabilities.

LoRA takes a different approach. For each weight matrix W in the model, instead of updating W directly, LoRA freezes W and trains two small matrices A and B such that the modified weight becomes W + BA. The key insight is that A and B have a much lower rank than W — if W is a 4096×4096 matrix (16.7 million parameters), A might be 4096×16 and B might be 16×4096, totaling only 131,072 trainable parameters for that layer. Across the entire model, this typically reduces trainable parameters by 99% or more while still being able to learn meaningful adaptations.
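A quick back-of-the-envelope check of the numbers above, in plain Python with no ML libraries (the helper name is ours, for illustration only):

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters for one LoRA-adapted layer: A is d_in x rank, B is rank x d_out."""
    return d_in * rank + rank * d_out

full = 4096 * 4096                         # frozen weight W: 16,777,216 parameters
lora = lora_param_count(4096, 4096, 16)    # A: 4096x16, B: 16x4096
print(lora)                                # 131072
print(f"{lora / full:.2%}")                # 0.78% of the original layer's parameters
```

At rank 16 the adapter holds under 1% of the layer's parameters, which is why LoRA files stay in the tens of megabytes.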

The rank parameter (commonly called rank or dim) controls how many parameters the LoRA has. Higher rank means more expressive capacity but larger file size and higher risk of overfitting. Lower rank produces smaller, more efficient LoRAs but may not capture complex concepts. For most use cases, a rank of 16–64 is sufficient. Character likeness LoRAs often work well at rank 32. Complex style LoRAs may benefit from rank 64–128.

LoRA vs Full Fine-Tuning vs Textual Inversion

| Method | File Size | Training Time | VRAM Needed | Expressiveness |
|---|---|---|---|---|
| Textual Inversion | ~10 KB | 1–4 hours | 8 GB+ | Limited (token-level only) |
| LoRA (rank 32) | 30–100 MB | 30 min–3 hours | 12 GB+ | High (modifies attention layers) |
| LoRA (rank 128) | 100–300 MB | 1–6 hours | 16 GB+ | Very high |
| Full Fine-Tune | 2–7 GB | 12–48 hours | 24 GB+ | Maximum |

Dataset Preparation: The Foundation of Quality

The quality of your LoRA is determined primarily by the quality of your training dataset. No amount of hyperparameter tuning will compensate for a poorly curated dataset. This section covers how to build a dataset that produces excellent results.

How Many Images Do You Need?

The right dataset size depends on what you are teaching the model:

  - Character or face LoRA: 15–30 high-quality images with varied poses, lighting, and expressions.
  - Style LoRA: 50–200 images that consistently demonstrate the target style.
  - Object or product LoRA: 20–50 images from different angles and contexts.

Quality always matters more than quantity: ten sharp, varied images beat fifty near-duplicates.

Image Quality Requirements

Every image in your dataset should meet these criteria:

  - High resolution: at least 1024×1024 for SDXL and FLUX training. The trainer can downscale, but it cannot invent detail.
  - Sharp focus on the subject, with no motion blur or heavy compression artifacts.
  - Varied composition: different poses, angles, backgrounds, and lighting, so the model learns the concept rather than a single scene.
  - No watermarks, text overlays, or unrelated people and objects you do not want the LoRA to absorb.
  - Accurate content: every image must actually show the concept you are training.

Captioning Your Dataset

Every training image needs a text caption that accurately describes its contents. This is how the model learns to associate your visual concept with specific text tokens. There are two captioning approaches:

Trigger word captioning: Use a unique trigger word (e.g., sks person, ohwx style) followed by a natural description of the image. The trigger word becomes the activation token — include it in your generation prompt to activate the LoRA. Example: sks woman, portrait, studio lighting, neutral expression, dark background.

Natural language captioning: Describe the image in full natural language without a trigger word. The LoRA learns to associate the visual concept with the descriptive patterns in your captions. This approach can produce more flexible LoRAs but requires more careful captioning. Example: A portrait photograph of a woman with auburn hair and green eyes, studio lighting with soft shadows, neutral expression, wearing a white blouse, dark background.

For FLUX LoRAs, natural language captioning typically works better because FLUX's T5 encoder processes full sentences more effectively than keyword clusters. For SDXL, trigger word captioning is the established standard.
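If you adopt the trigger-word approach, a short script can enforce the convention across a caption folder. This is a hedged sketch: the trigger word, directory layout, and function name are our assumptions, and the edit is naive (it prepends rather than rewriting the caption):

```python
from pathlib import Path

def ensure_trigger_word(caption_dir, trigger="sks woman"):
    """Prepend the trigger word to every .txt caption that does not already start with it.

    caption_dir and trigger are illustrative; adjust to your dataset.
    """
    for txt in Path(caption_dir).glob("*.txt"):
        caption = txt.read_text(encoding="utf-8").strip()
        if not caption.startswith(trigger):
            txt.write_text(f"{trigger}, {caption}", encoding="utf-8")
```

Run it once after auto-captioning, then still review the files by hand, since the script cannot catch captions that describe the image incorrectly.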

Automated Captioning Tools

Manually captioning 50–200 images is tedious. Several tools can auto-generate captions that you then review and correct:

  - BLIP / BLIP-2: natural-language captions, widely integrated into training GUIs.
  - WD14 tagger: booru-style keyword tags, a popular fit for SDXL trigger-word captioning.
  - Vision-language models (e.g., LLaVA or GPT-4o): detailed natural-language captions well suited to FLUX training.

Regardless of the tool, always review auto-generated captions. Incorrect captions teach the model incorrect associations. Add your trigger word to every caption if using the trigger word approach, and ensure the descriptions accurately reflect what is visible in each image.

Training Parameters: Getting the Settings Right

Learning Rate

The learning rate controls how aggressively the model updates weights during training. Too high, and the LoRA overtrains quickly, producing distorted or oversaturated images. Too low, and training takes forever or the LoRA has no visible effect.

As starting points, 1e-4 (0.0001) is the most reliable default for SDXL, while 5e-5 to 1e-4 works well for FLUX. Pair either with a cosine scheduler for stable decay, and adjust based on test generations.
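A cosine schedule decays the learning rate smoothly from its starting value toward zero over the run. A minimal sketch of the shape (the function name and defaults are illustrative, not any trainer's API):

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-4, min_lr=0.0):
    """Cosine-decay schedule: starts at base_lr, ends at (approximately) min_lr."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 1500))     # 0.0001 at the start
print(cosine_lr(1500, 1500))  # ~0 at the end
```

Trainers like kohya_ss expose this as a scheduler option, so you normally just select it rather than implement it.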

Training Steps and Epochs

An epoch is one complete pass through your training dataset. The total number of training steps is: epochs × (dataset_size / batch_size). For a 30-image dataset with batch size 1 and 50 epochs, that is 1,500 steps.
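The step arithmetic above can be sketched directly (a toy helper, not part of any training tool):

```python
def total_training_steps(dataset_size, epochs, batch_size=1):
    """Total steps = epochs x (dataset_size / batch_size), with floor division for batching."""
    return epochs * (dataset_size // batch_size)

print(total_training_steps(dataset_size=30, epochs=50))  # 1500, matching the example above
```

Larger batch sizes cut the step count proportionally, but each step then sees more images.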

General guidelines for total training steps: most character LoRAs land somewhere in the 1,000–3,000 step range (the 30-image example above reaches 1,500 steps), while style LoRAs with larger datasets often need more. Treat these as starting points; checkpoint testing, not a fixed step target, should decide when training is done.

Overtraining is the most common mistake. An overtrained LoRA produces images that look exactly like the training data regardless of the prompt — the model has memorized rather than learned. Save checkpoints every 200–500 steps and test each one. The best checkpoint is usually well before the final one.

Rank and Alpha

The rank (dim) determines the LoRA's capacity. The alpha determines the scaling factor applied to the LoRA's contribution: effective_weight = alpha / rank * lora_weight. A common convention is to set alpha equal to rank (so the scaling factor is 1.0), but some trainers prefer alpha = rank/2 for more conservative initial influence.
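The alpha/rank scaling can be checked in a couple of lines (illustrative only):

```python
def lora_scale(alpha, rank):
    """Scaling factor applied to the LoRA delta: effective = (alpha / rank) * lora_weight."""
    return alpha / rank

assert lora_scale(alpha=32, rank=32) == 1.0   # common convention: alpha == rank
assert lora_scale(alpha=16, rank=32) == 0.5   # alpha = rank/2: more conservative influence
```

Note that alpha is baked into the trained file's effective strength, so changing rank without adjusting alpha changes how strongly the LoRA applies.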

| Use Case | Recommended Rank | File Size (SDXL) |
|---|---|---|
| Simple concept (trigger word for an object) | 8–16 | 10–30 MB |
| Character likeness | 16–32 | 30–80 MB |
| Art style | 32–64 | 80–150 MB |
| Complex multi-concept | 64–128 | 150–300 MB |

Network Modules to Train

You can select which layers of the model the LoRA modifies. The standard target modules are the attention layers (Q, K, V, and output projection) in the U-Net (SDXL) or the DiT (FLUX). Some trainers also include feed-forward network layers for additional expressiveness at the cost of larger file sizes.

For character LoRAs, training only the attention layers is usually sufficient. For complex style LoRAs that need to modify how the model processes textures and colors at a fundamental level, including feed-forward layers can improve results.
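As a rough illustration, the two choices might look like this as trainer configuration. The keys follow the spirit of kohya_ss options and the module names follow diffusers-style attention naming (`to_q`, `to_k`, `to_v`, `to_out`), but exact names vary between trainers, so treat this as a sketch rather than a drop-in config:

```python
# Hypothetical config sketch -- key and module names vary between kohya_ss and ai-toolkit.
character_lora = {
    "network_dim": 32,
    "network_alpha": 32,
    # attention projections only: usually enough for likeness
    "target_modules": ["to_q", "to_k", "to_v", "to_out"],
}

style_lora = {
    "network_dim": 64,
    "network_alpha": 64,
    # feed-forward pattern added for texture/color-level style changes (name varies by trainer)
    "target_modules": ["to_q", "to_k", "to_v", "to_out", "ff.net"],
}
```

The tradeoff is visible in the file sizes: the style config trains roughly twice the parameters of the character config.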

The Training Process: Step by Step

Setting Up Your Environment

The two most popular LoRA training tools are kohya_ss (sd-scripts) and ai-toolkit by Ostris. Both support SDXL and FLUX training with full parameter control. kohya_ss has a web UI (via bmaltais GUI) that simplifies configuration, while ai-toolkit uses YAML configuration files for more precise control.

  1. Install your chosen training tool following its documentation. Ensure your GPU drivers and CUDA are up to date.
  2. Organize your dataset in a directory structure: training_data/[num]_[concept_name]/ where [num] is the number of repeats per image per epoch. For a 20-image character dataset, 10_sks_person means each image is seen 10 times per epoch.
  3. Place caption files (.txt) alongside each image with the same filename: image_001.png and image_001.txt.
  4. Configure your training parameters (learning rate, steps, rank, etc.) in the UI or config file.
  5. Set a regularization dataset (optional but recommended) — images of the base concept without your specific subject, to prevent the model from associating your trigger word with generic features of the class.
  6. Start training and monitor the loss curve. Loss should decrease steadily and plateau. If it drops to near zero, you are probably overtraining.
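Steps 2–3 can be scaffolded with a short script. This is a convenience sketch (the helper name is ours, not part of kohya_ss); it creates the `[num]_[concept]` folder and an empty caption stub alongside each image placeholder:

```python
from pathlib import Path

def scaffold_dataset(root, repeats, concept, image_names):
    """Create the kohya-style [repeats]_[concept] folder with per-image caption stubs."""
    folder = Path(root) / f"{repeats}_{concept}"
    folder.mkdir(parents=True, exist_ok=True)
    for name in image_names:
        (folder / name).touch()                       # placeholder; copy real images here
        (folder / name).with_suffix(".txt").touch()   # matching caption file to fill in
    return folder

# e.g. scaffold_dataset("training_data", 10, "sks_person", ["image_001.png"])
# creates training_data/10_sks_person/image_001.png and image_001.txt
```

In practice you would copy your real images into the folder and then write captions into the generated .txt stubs.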

Regularization Images

Regularization (also called "class images" or "prior preservation") prevents a phenomenon called language drift, where the LoRA causes the model to associate generic class words with your specific concept. Without regularization, training a LoRA on "sks woman" might cause the model to generate your specific woman whenever anyone prompts "woman" without the trigger word.

To create regularization images, generate 200–500 images from the base model using the class prompt (e.g., "a woman, portrait photograph") without the LoRA. These images represent what the model normally generates for the class, and during training they anchor the model's understanding of the generic class while allowing it to learn your specific concept under the trigger word.

Testing and Deployment

Evaluating Your LoRA

Test your LoRA checkpoints systematically. Generate images with several prompts that vary in specificity and context:

  - The trigger word alone, to confirm the concept activates.
  - The trigger word in a novel context not present in the training data (different setting, outfit, or composition), to check generalization.
  - A prompt without the trigger word, to check for style bleeding.
  - The trigger word at several LoRA weights (e.g., 0.5, 0.7, 1.0), to find the usable range.

Deploying Your LoRA

LoRA files are portable and platform-independent. A LoRA trained with kohya_ss works in ComfyUI, Automatic1111, Forge, and any other tool that supports the safetensors format. To use your LoRA:

  1. Place the .safetensors file in your tool's LoRA directory (typically models/loras/).
  2. Load it alongside the base model it was trained on. An SDXL LoRA works with SDXL checkpoints; a FLUX LoRA works with FLUX checkpoints.
  3. Set the LoRA weight (start at 0.7) and include the trigger word in your prompt.
  4. Generate and adjust weight as needed.

On ZSky AI, you can upload custom LoRAs and use them with our RTX 5090 inference infrastructure for fast generation with your custom models.

Common Training Problems and Solutions

Overtrained / Fried LoRA

Symptoms: images look exactly like training data regardless of prompt, colors are oversaturated, faces look waxy or plastic. Solution: use an earlier checkpoint, reduce learning rate by 50%, reduce training steps, or increase rank (which distributes learning across more parameters, reducing per-parameter overfitting).

Undertrained LoRA

Symptoms: trigger word has no visible effect, or effect is extremely subtle even at weight 1.0. Solution: increase training steps, increase learning rate by 50%, verify that captions include the trigger word in every file, and check that images are being loaded correctly (correct directory structure, correct file format).

Style Bleeding

Symptoms: the LoRA changes overall image style even when the trigger word is not used. Solution: add more regularization images, reduce training steps, ensure captions describe image content beyond just the trigger word (so the model does not attribute everything in the image to the trigger word), and test with lower LoRA weights.

Poor Generalization

Symptoms: LoRA only works well in poses/angles/contexts similar to the training data. Solution: increase dataset variety. Add more images showing the concept in diverse situations. For character LoRAs, add profile shots, three-quarter views, full-body shots, and different lighting conditions. For style LoRAs, add more subject variety within the style.

Advanced Techniques

LyCORIS: Beyond Standard LoRA

LyCORIS (Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion) extends LoRA with alternative matrix decomposition methods. LoHa (Hadamard product) and LoKr (Kronecker product) offer different capacity-efficiency tradeoffs that can capture certain concepts more effectively than standard LoRA. If standard LoRA struggles with your concept at reasonable ranks, try LyCORIS methods — they sometimes produce better results for complex style adaptations.

Pivotal Tuning / DreamBooth + LoRA

Combining DreamBooth's approach (learning a new text token) with LoRA's efficient weight modification produces particularly strong character LoRAs. The model learns both a new embedding for your subject and modified attention patterns, resulting in stronger likeness capture with better prompt followability than either method alone.

Multi-Concept LoRA Training

You can train a single LoRA on multiple concepts by using different trigger words for each concept and organizing your dataset accordingly. A single LoRA file could contain both a character and a style, each activated by their own trigger word. This is more memory-efficient during inference than loading multiple separate LoRAs.

Use Your Custom LoRAs on ZSky AI

Train your LoRA, upload it to ZSky AI, and generate with dedicated RTX 5090 GPU power. Custom model support with no queue times.

Try ZSky AI Free →

Frequently Asked Questions

What is a LoRA in AI image generation?

LoRA (Low-Rank Adaptation) is a fine-tuning technique that trains small adapter weights to modify a base model's behavior without changing the original weights. A LoRA file is typically 10–200 MB compared to the base model's multi-gigabyte size. LoRAs teach models new concepts like specific people, art styles, objects, or aesthetic preferences, and can be loaded and unloaded instantly during generation.

How many images do I need to train a LoRA?

For a character or face LoRA, 15–30 high-quality images with varied poses, lighting, and expressions typically produce good results. For a style LoRA, 50–200 images consistently demonstrating the target style are recommended. For an object LoRA, 20–50 images from different angles and contexts work well. Quality always matters more than quantity.

What GPU do I need to train a LoRA?

For SDXL LoRAs, a GPU with at least 12 GB VRAM (like an RTX 3060 12GB or RTX 4070) is sufficient. For FLUX LoRAs, 16–24 GB VRAM is recommended. Training time ranges from 30 minutes to several hours depending on dataset size and parameters. Cloud GPU services are an option if local hardware is insufficient.

What is the best learning rate for LoRA training?

For SDXL, 1e-4 (0.0001) is the most reliable starting point. For FLUX, 5e-5 to 1e-4 works well. Use a cosine scheduler for stable training. If outputs are oversaturated or distorted, lower the learning rate. If the LoRA has no effect at weight 1.0, increase it or add more training steps.

Can I combine multiple LoRAs at once?

Yes. A common workflow combines a character LoRA with a style LoRA to place a specific person in a specific artistic style. When stacking, reduce each LoRA's weight to 0.5–0.8 to prevent over-conditioning. More than 3–4 simultaneous LoRAs typically degrades quality as their modifications may conflict.
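Conceptually, stacking LoRAs just sums their weight deltas, each scaled by its loading weight. A toy numeric sketch with flattened matrices (illustrative only; real implementations apply these deltas to tensors inside the model):

```python
def combine_lora_deltas(deltas, weights):
    """Element-wise weighted sum of per-LoRA deltas: W' = W + sum_i(weight_i * delta_i)."""
    return [sum(w * d[i] for w, d in zip(weights, deltas)) for i in range(len(deltas[0]))]

# Two stacked LoRAs, each reduced to weight 0.7 (2x2 deltas flattened for illustration)
character_delta = [0.1, 0.0, 0.2, 0.0]
style_delta     = [0.0, 0.3, 0.0, 0.1]
combined = combine_lora_deltas([character_delta, style_delta], [0.7, 0.7])
```

Because the deltas add, stacking many LoRAs at full weight pushes the combined modification far from the base model, which is why reduced weights help when combining.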

What is the difference between LoRA, LyCORIS, and full fine-tuning?

LoRA trains low-rank matrices modifying specific layers, producing small files with efficient training. LyCORIS extends LoRA with additional decomposition methods (LoHa, LoKr) for more complex adaptations. Full fine-tuning modifies every weight, producing maximum quality but requiring the most resources. LoRA offers the best balance for most users.

Ready to put your training knowledge to work? Try ZSky AI free with 200 free credits at signup + 100 daily when logged in. Power users who need more can compare plans here.