
Image-to-Image Guide: Transform Photos with AI

By Cemhan Biricik · 2026-02-22 · 20 min read

Text-to-image generates from noise. Image-to-image generates from your image. That distinction changes everything about what is possible. Instead of describing what you want from scratch, you provide a starting point — a photograph, a sketch, a screenshot, a painting — and the diffusion model transforms it according to your text prompt while preserving as much or as little of the original as you choose.

This is not a filter. Filters apply fixed mathematical transformations to pixel values. Image-to-image (img2img) runs your image through a diffusion model that understands what it is looking at — it recognizes faces, objects, spatial relationships, lighting, and composition — and regenerates the image with those understood elements restyled, enhanced, or transformed according to your instructions. The results are fundamentally different from anything achievable with traditional image processing.

This guide covers the mechanics of img2img, the critical denoising strength parameter, practical workflows for style transfer, sketch-to-image conversion, photo enhancement, and advanced techniques for getting exactly the transformation you want. These techniques work across FLUX, SDXL, and most modern diffusion models through tools like ZSky AI, ComfyUI, and Automatic1111.

How Image-to-Image Generation Works

Understanding the mechanics helps you predict and control results. In standard text-to-image, the model starts with pure random noise and progressively removes noise over many steps, guided by the text prompt, until a coherent image emerges. In img2img, the starting point is not random noise — it is your input image with a controlled amount of noise added to it.

The process works as follows:

  1. Your input image is encoded into the model's latent space using the VAE encoder, producing a latent representation of the image.
  2. Gaussian noise is added to this latent, proportional to the denoising strength parameter. A denoising strength of 0.5 means enough noise is added to correspond to 50% of the total diffusion process. A strength of 1.0 adds enough noise to completely obscure the original image.
  3. The denoising process begins not from the first step but from the step corresponding to the noise level. At strength 0.5 with 20 total steps, denoising starts at step 10. The model runs the last 10 steps of its normal denoising process.
  4. During each denoising step, the model predicts and removes noise guided by the text prompt, exactly as in text-to-image. But because the starting point contains information from your input image (not just random noise), the output retains structural elements of the original.
  5. The denoised latent is decoded back to pixel space by the VAE decoder, producing the final transformed image.

This mechanism explains why denoising strength is so powerful: at low values, very little noise is added and very few denoising steps run, so the output closely resembles the input with minor modifications. At high values, heavy noise almost completely obscures the original, and many denoising steps run, allowing the model to generate substantially new content using the original as only a vague structural guide.
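The arithmetic behind steps 2 and 3 is easy to sketch. Assuming a scheduler with evenly spaced steps (real schedulers space timesteps non-uniformly, but the step counting works the same way), denoising strength determines how many steps of the schedule actually run:

```python
def img2img_steps(total_steps: int, strength: float) -> tuple:
    """Return (start_step, steps_run) for an img2img pass.

    strength is the denoising strength in [0, 1]: the fraction of the
    full diffusion schedule that is re-run on the noised input latent.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    steps_run = int(total_steps * strength)   # how many denoising steps execute
    start_step = total_steps - steps_run      # schedule index where denoising begins
    return start_step, steps_run

# Matches the worked example above: strength 0.5 with 20 total steps
# starts at step 10 and runs the last 10 steps.
print(img2img_steps(20, 0.5))  # (10, 10)
```

This also explains why very low strengths can look under-refined: at 0.2 with 20 steps, only 4 denoising steps run at all.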

Mastering Denoising Strength

Denoising strength is the single most important parameter in img2img. It is a continuous scale from 0.0 (no change) to 1.0 (essentially text-to-image with a vague compositional bias), and understanding where to set it for different tasks is the key skill.

| Range | Transformation Level | Use Cases |
|---|---|---|
| 0.1–0.3 | Minimal — subtle refinement | Noise reduction, slight color correction, texture enhancement, minor quality improvement |
| 0.3–0.5 | Moderate — recognizable changes | Gentle style transfer, lighting adjustment, color palette shift, detail enhancement |
| 0.5–0.7 | Significant — clear transformation | Full style transfer (photo to painting), environment changes, substantial aesthetic transformation |
| 0.7–0.85 | Major — loose reference | Dramatic reimagining, sketch-to-finished-art, concept exploration from rough references |
| 0.85–1.0 | Near-complete — structural echo only | Using input as compositional inspiration only, generating "variations" of a concept |

Finding the Sweet Spot

Start at 0.5 and generate. If the result is too similar to the input, increase by 0.1. If the result has lost too much of the original's structure, decrease by 0.1. After 2–3 adjustments, you will find the exact strength that gives the right balance of transformation and preservation for your specific input and prompt combination.

The sweet spot changes based on the input image. Photographs with strong, clear compositions tolerate higher denoising before losing structure (their structure is so strong that it persists through more noise). Sketches and abstract inputs need lower denoising to preserve their compositional intent. Complex scenes with many small elements need lower denoising because fine details are the first things destroyed by noise.
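The adjust-by-0.1 procedure can be written down as a tiny helper. The `judge` callback here is hypothetical, a stand-in for you visually inspecting each result, but the search logic is exactly the loop described above:

```python
def find_strength(judge, start=0.5, step=0.1, max_tries=4):
    """Dial in denoising strength by the adjust-by-0.1 procedure.

    judge(strength) returns "too_similar", "too_changed", or "ok".
    In real use, judge is you inspecting the generated output.
    """
    s = start
    for _ in range(max_tries):
        verdict = judge(s)
        if verdict == "ok":
            break
        s += step if verdict == "too_similar" else -step
        s = min(1.0, max(0.0, s))  # clamp to the valid [0, 1] range
    return round(s, 2)

# Example: an input/prompt pair that only looks right at 0.65 or above
print(find_strength(lambda s: "ok" if s >= 0.65 else "too_similar"))  # 0.7
```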

Style Transfer: Photo to Art

Style transfer is img2img's most popular application: take a photograph and transform it into an oil painting, watercolor, anime illustration, pixel art, or any other visual style. The photo provides composition and content; the prompt provides the target style.

Effective Style Transfer Prompts

The style prompt should be specific about the target medium and aesthetic:

# Instead of this:
"painting of a landscape"

# Write this:
"oil painting on canvas, thick impasto brushstrokes,
warm color palette, impressionist style, soft edges,
visible paint texture, gallery lighting"

Describe the target medium (oil painting, watercolor, charcoal drawing), the technique (impasto, wet-on-wet, cross-hatching), the aesthetic movement (impressionist, art nouveau, expressionist), and quality markers (museum quality, gallery exhibition, masterwork). The more specific your style description, the more convincingly the model transforms the image.

Style Transfer Settings by Target Style

| Target Style | Denoising | CFG Scale | Key Prompt Terms |
|---|---|---|---|
| Oil painting | 0.55–0.7 | 7–9 | oil on canvas, brushstrokes, impasto, gallery lighting |
| Watercolor | 0.5–0.65 | 6–8 | watercolor wash, transparent layers, wet edges, paper texture |
| Anime/manga | 0.6–0.75 | 7–10 | anime style, cel shaded, clean lines, vibrant colors |
| Pencil sketch | 0.5–0.65 | 6–8 | graphite pencil drawing, cross-hatching, white paper |
| Cyberpunk | 0.55–0.7 | 8–10 | neon lighting, rain-slicked surfaces, holographic, dystopian |
| Vintage photograph | 0.3–0.5 | 5–7 | faded film, grain, 1970s Polaroid, warm cast, soft focus |
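If you script style transfers, the settings table reduces to a lookup. This is an illustrative helper, not any tool's API; the ranges are copied from the table, and the midpoints make reasonable first attempts:

```python
# (denoising range, CFG range) per target style, taken from the table above
STYLE_SETTINGS = {
    "oil painting":  ((0.55, 0.70), (7, 9)),
    "watercolor":    ((0.50, 0.65), (6, 8)),
    "anime":         ((0.60, 0.75), (7, 10)),
    "pencil sketch": ((0.50, 0.65), (6, 8)),
    "cyberpunk":     ((0.55, 0.70), (8, 10)),
    "vintage photo": ((0.30, 0.50), (5, 7)),
}

def starting_point(style: str):
    """Return (denoising, cfg) midpoints as a first attempt for a style."""
    (d_lo, d_hi), (c_lo, c_hi) = STYLE_SETTINGS[style]
    return round((d_lo + d_hi) / 2, 3), (c_lo + c_hi) / 2

print(starting_point("watercolor"))  # (0.575, 7.0)
```

From that starting point, apply the adjust-by-0.1 procedure from the denoising section to converge on the right value for your specific image.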

Using LoRAs for Style Transfer

For the most convincing style transfers, combine img2img with a style LoRA. A LoRA trained on a specific artist's work or a particular visual style will produce more authentic results than prompt engineering alone. Load the style LoRA at weight 0.6–0.8, write a style-matching prompt, and set denoising to 0.5–0.7. The LoRA handles the stylistic nuance that words cannot fully capture.

Sketch-to-Image: From Rough to Refined

Img2img transforms rough sketches into finished, rendered images. This workflow is popular among concept artists, game designers, and illustrators who want to iterate quickly on compositions before committing to full rendering.

Preparing Your Sketch

The sketch does not need to be polished. Even rough gesture drawings and basic shape compositions work as img2img inputs, though cleaner line work and stronger contrast generally produce more predictable results.

Sketch-to-Image Settings

Use denoising strength 0.7–0.9 for sketches. The model needs significant freedom to transform rough lines into rendered content. Lower values keep too much of the sketch's rough quality. The prompt should describe the finished result in detail — surface materials, lighting, atmosphere, and style — because the sketch provides only structure.

For more precise control over which elements of the sketch are followed, consider using ControlNet in scribble or lineart mode instead of standard img2img. ControlNet provides structural conditioning without the noise-addition mechanism of img2img, giving you more independent control over structure and style.

Photo Enhancement and Restoration

Img2img can enhance photographs in ways that go beyond traditional editing tools. Because the model understands the content of the image, it can add genuine detail, correct lighting, and improve composition — not just sharpen pixels.

Quality Enhancement

For quality enhancement without changing the image content, use low denoising strength (0.2–0.35) with a quality-focused prompt:

professional photograph, high resolution, sharp focus,
clean detail, natural lighting, DSLR quality,
well-exposed, accurate colors

The model refines textures, reduces noise, and enhances detail while keeping the image recognizably the same. This is particularly effective for improving smartphone photos, compressed images, and older digital photographs. The results exceed traditional sharpening because the model synthesizes plausible new detail rather than amplifying existing noise; keep in mind that this detail is inferred by the model, not recovered from the original.

Lighting Correction

Describe the desired lighting in your prompt while using moderate denoising (0.3–0.5): "natural golden hour lighting, warm tones, soft shadows" transforms a harshly lit photo into a warmly lit one. "Studio three-point lighting, professional portrait" transforms casual portrait lighting into studio-quality lighting. The model re-renders the lighting of the scene while preserving subject identity and composition.

Background Enhancement

For images where the subject is good but the background is distracting, use a moderate denoising (0.4–0.6) with a prompt that describes the desired background: "professional portrait, clean blurred background, shallow depth of field, bokeh." The model typically preserves the subject while replacing or enhancing the background, though for precise control, consider inpainting the background specifically.

Advanced Img2Img Techniques

Progressive Refinement

Instead of attempting one perfect transformation, apply img2img iteratively with low denoising at each step. Start with the original, apply img2img at 0.3 denoising with your style prompt. Take the output, feed it back as the input, and apply img2img again at 0.3. Each iteration nudges the image closer to the target style without the jarring artifacts that can occur with a single high-denoising pass. Three to five iterations of gentle transformation often produce more coherent results than one aggressive transformation.
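Why several gentle passes behave differently from one aggressive pass can be seen in a toy model. Suppose each pass closes a fraction `strength` of the remaining distance to the target style. This is a deliberate simplification, since real diffusion is not linear, but the convergence shape is similar:

```python
def progressive(strength: float, passes: int) -> float:
    """Fraction of the style distance covered after repeated gentle passes.

    Toy model: each pass closes `strength` of the remaining gap,
    so the leftover gap shrinks by (1 - strength) per pass.
    """
    remaining = 1.0
    for _ in range(passes):
        remaining *= 1.0 - strength
    return 1.0 - remaining

# Five passes at 0.3 cover most of the distance, but each individual
# step is small enough that structure survives along the way:
print(round(progressive(0.3, 5), 3))  # 0.832
```

A single pass at 0.83 covers the same nominal distance, but does it in one jump, which is where jarring structural artifacts come from.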

Multi-Model Pipeline

Different models have different strengths. A powerful workflow for maximum quality:

  1. Generate the base image with FLUX (best prompt adherence and composition)
  2. Feed the FLUX output into SDXL via img2img for stylistic treatment (some styles are better represented in SDXL's training data)
  3. Upscale the result with Real-ESRGAN
  4. Refine the upscaled image with a final img2img pass at low denoising (0.2–0.3) to add fine detail

This pipeline extracts the best qualities from each model. FLUX for composition, SDXL for style, Real-ESRGAN for resolution, and a final pass for detail refinement.

Seed Control for Variation Exploration

Keep the same input image and prompt but change the seed to explore variations. Each seed produces a different interpretation of the transformation. Generate 8–16 variations, select favorites, and then fine-tune those with additional img2img passes or inpainting. This is faster than tweaking the prompt word by word because the visual differences between seeds are immediate and dramatic.

CFG Scale Interaction with Img2Img

CFG (Classifier-Free Guidance) scale interacts differently with img2img than with text-to-image. In text-to-image, higher CFG pushes the model toward the prompt more aggressively. In img2img, high CFG combined with low denoising can produce over-saturated or artifact-heavy results because the model is trying to push the slightly-noised image strongly toward the prompt with very few steps.

For img2img, use lower CFG than you would for text-to-image. If you normally generate at CFG 7–8, try 5–7 for img2img. At very low denoising (0.2–0.3), reducing CFG to 3–5 often produces the most natural results. The input image already provides strong structural guidance, so less prompt pressure is needed.
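Under the hood, classifier-free guidance combines two noise predictions at every denoising step: one conditioned on the prompt and one unconditioned. A minimal sketch of the standard CFG formula (the widely used formulation, not any specific library's code) makes the over-push problem concrete:

```python
def cfg_combine(uncond: float, cond: float, scale: float) -> float:
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by `scale`.

    scale = 1 gives the plain conditional prediction; larger values
    push harder toward the prompt. In img2img at low denoising, only
    a few steps run, so a large push per step can over-saturate.
    """
    return uncond + scale * (cond - uncond)

# With predictions 1.0 apart, CFG 7.5 extrapolates far past
# the conditional prediction itself:
print(cfg_combine(0.0, 1.0, 7.5))  # 7.5
```

In full pipelines this operates on latent tensors rather than scalars, but the extrapolation per step is the same, which is why lowering CFG compensates for the reduced step count.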

Img2Img vs ControlNet: When to Use Which

Img2img and ControlNet both use reference images, but they work fundamentally differently and excel at different tasks:

| Criterion | Img2Img | ControlNet |
|---|---|---|
| How reference is used | Noise added to reference, then denoised | Structural info extracted from reference, used as conditioning |
| Content preservation | Preserves actual colors, textures, content from the original | Preserves only the specific structure type (edges, depth, pose) |
| Best for style transfer | Yes — directly transforms content | Only if combined with IP-Adapter for style reference |
| Best for structural control | Moderate — structure degrades with high denoising | Excellent — structure is maintained independently of content |
| Best for photo enhancement | Yes — preserves and enhances photo content | Only via Tile ControlNet |
| Best for sketch-to-image | Good with high denoising | Better — ControlNet scribble/lineart designed for this |
| Combining with prompts | Prompt and image compete (denoising balances them) | Prompt and control are complementary (independent signals) |

Use img2img when you want to transform existing visual content — style transfer, photo enhancement, quality improvement, and iterative refinement. Use ControlNet when you want structural guidance from a reference while generating entirely new visual content — pose-guided generation, composition control, and edge-guided rendering. For maximum control, combine both: use ControlNet for structural guidance and img2img's noise-addition for content-level transformation.

Batch Processing and Workflow Automation

For professional workflows that require transforming multiple images consistently — converting a product photo set to illustration style, enhancing a batch of event photographs, or generating variations for A/B testing — batch processing is essential.

In ComfyUI, batch img2img is built into the workflow system. Load a directory of images, apply the same prompt and settings to each, and output to a results directory. The consistency of the transformation depends on using identical settings for every image in the batch.

For the most consistent batch results, lock the seed across all images in the batch. While this means every image gets the same random influence, it removes one source of variation, making the transformation more uniform across the set. If you need variation, use sequential seeds (seed, seed+1, seed+2) rather than random seeds to maintain some consistency.
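The sequential-seed advice is trivial to script. `transform` below is a hypothetical stand-in for whatever img2img call your tool exposes; the point is only how seeds are assigned across the batch:

```python
def batch_seeds(base_seed: int, n: int) -> list:
    """Sequential seeds for a consistent batch: base, base+1, base+2, ..."""
    return [base_seed + i for i in range(n)]

def run_batch(images, prompt, base_seed=42, transform=None):
    """Apply the same prompt to every image with sequential seeds.

    transform(image, prompt, seed) is a hypothetical img2img call;
    identical prompt and settings per image keep the batch uniform.
    """
    seeds = batch_seeds(base_seed, len(images))
    return [transform(img, prompt, seed) for img, seed in zip(images, seeds)]

print(batch_seeds(42, 4))  # [42, 43, 44, 45]
```

To lock the seed fully, pass the same value for every image instead of incrementing; sequential seeds are the middle ground between full uniformity and uncontrolled variation.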

Transform Your Images with AI

Upload any photo and transform it with img2img on ZSky AI's dedicated RTX 5090 GPUs. Style transfer, enhancement, and creative transformation in seconds.


Frequently Asked Questions

What is image-to-image (img2img) in AI generation?

Img2img is an AI generation mode where you provide an existing image as a starting point along with a text prompt. The diffusion model adds noise to the input image (controlled by denoising strength), then denoises it guided by your prompt. The result inherits the composition and structure of the input while being transformed according to the prompt. Low denoising preserves the original closely; high denoising produces more dramatic transformations.

What denoising strength should I use for img2img?

Start at 0.5 and adjust. Use 0.2–0.4 for subtle refinements like quality enhancement. Use 0.4–0.6 for moderate style transfer. Use 0.6–0.8 for significant transformations like photo-to-painting. Use 0.8–1.0 when you want only a loose structural reference from the original.

How do I do style transfer with AI?

Upload your source image in img2img mode and write a detailed prompt describing the target style. Set denoising strength to 0.5–0.7. For stronger style adherence, combine with a style LoRA. Describe the specific medium, technique, and aesthetic movement for the most convincing results.

Can I convert a rough sketch into a finished image?

Yes. Upload your sketch as the img2img input, write a detailed prompt describing the finished result, and set denoising strength to 0.7–0.9. The model uses your sketch as a compositional guide while generating fully rendered content. For more precise structural control, ControlNet's scribble mode is even more effective.

What is the difference between img2img and ControlNet?

Img2img directly transforms your image by adding noise and denoising with a prompt. ControlNet extracts structural information from a reference and uses it as conditioning for generation from noise. Img2img transforms content; ControlNet provides structural guidance. Use img2img for style transfer and enhancement, ControlNet for compositional control.

How do I enhance photo quality with img2img?

Use low denoising (0.2–0.4) with a quality-focused prompt: "high resolution, sharp focus, professional lighting, clean detail." The model enhances textures and detail while preserving content. For best results, combine with upscaling (Real-ESRGAN) and a final low-denoising refinement pass.