Image-to-Image Guide: Transform Photos with AI
Text-to-image generates from noise. Image-to-image generates from your image. That distinction changes everything about what is possible. Instead of describing what you want from scratch, you provide a starting point — a photograph, a sketch, a screenshot, a painting — and the diffusion model transforms it according to your text prompt while preserving as much or as little of the original as you choose.
This is not a filter. Filters apply fixed mathematical transformations to pixel values. Image-to-image (img2img) runs your image through a diffusion model that understands what it is looking at — it recognizes faces, objects, spatial relationships, lighting, and composition — and regenerates the image with those understood elements restyled, enhanced, or transformed according to your instructions. The results are fundamentally different from anything achievable with traditional image processing.
This guide covers the mechanics of img2img, the critical denoising strength parameter, practical workflows for style transfer, sketch-to-image conversion, photo enhancement, and advanced techniques for getting exactly the transformation you want. These techniques work across FLUX, SDXL, and most modern diffusion models through tools like ZSky AI, ComfyUI, and Automatic1111.
How Image-to-Image Generation Works
Understanding the mechanics helps you predict and control results. In standard text-to-image, the model starts with pure random noise and progressively removes noise over many steps, guided by the text prompt, until a coherent image emerges. In img2img, the starting point is not random noise — it is your input image with a controlled amount of noise added to it.
The process works as follows:
- Your input image is encoded into the model's latent space using the VAE encoder, producing a latent representation of the image.
- Gaussian noise is added to this latent, proportional to the denoising strength parameter. A denoising strength of 0.5 means enough noise is added to correspond to 50% of the total diffusion process. A strength of 1.0 adds enough noise to completely obscure the original image.
- The denoising process begins not from the first step but from the step corresponding to the noise level. At strength 0.5 with 20 total steps, denoising starts at step 10. The model runs the last 10 steps of its normal denoising process.
- During each denoising step, the model predicts and removes noise guided by the text prompt, exactly as in text-to-image. But because the starting point contains information from your input image (not just random noise), the output retains structural elements of the original.
- The denoised latent is decoded back to pixel space by the VAE decoder, producing the final transformed image.
This mechanism explains why denoising strength is so powerful: at low values, very little noise is added and very few denoising steps run, so the output closely resembles the input with minor modifications. At high values, heavy noise almost completely obscures the original, and many denoising steps run, allowing the model to generate substantially new content using the original as only a vague structural guide.
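The strength-to-steps arithmetic can be sketched in a few lines. This is a hypothetical helper (`img2img_schedule` is not a library function), shown only to make the relationship between strength, noise level, and steps concrete; real pipelines derive the same quantities internally:

```python
def img2img_schedule(strength: float, total_steps: int) -> dict:
    """Map denoising strength to the effective img2img schedule.

    Strength controls both how much noise is added to the input latent
    and where in the schedule denoising begins: only the final
    `strength * total_steps` steps are actually executed.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    steps_run = int(round(strength * total_steps))  # denoising steps executed
    start_step = total_steps - steps_run            # schedule position to start from
    return {"start_step": start_step, "steps_run": steps_run}

print(img2img_schedule(0.5, 20))   # {'start_step': 10, 'steps_run': 10}
print(img2img_schedule(0.25, 20))  # {'start_step': 15, 'steps_run': 5}
print(img2img_schedule(1.0, 20))   # {'start_step': 0, 'steps_run': 20}
```

At strength 1.0 the full schedule runs, which is why that setting behaves almost like text-to-image.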
Mastering Denoising Strength
Denoising strength is the single most important parameter in img2img. It is a continuous scale from 0.0 (no change) to 1.0 (essentially text-to-image with a vague compositional bias), and understanding where to set it for different tasks is the key skill.
| Range | Transformation Level | Use Cases |
|---|---|---|
| 0.1–0.3 | Minimal — subtle refinement | Noise reduction, slight color correction, texture enhancement, minor quality improvement |
| 0.3–0.5 | Moderate — recognizable changes | Gentle style transfer, lighting adjustment, color palette shift, detail enhancement |
| 0.5–0.7 | Significant — clear transformation | Full style transfer (photo to painting), environment changes, substantial aesthetic transformation |
| 0.7–0.85 | Major — loose reference | Dramatic reimagining, sketch-to-finished-art, concept exploration from rough references |
| 0.85–1.0 | Near-complete — structural echo only | Using input as compositional inspiration only, generating "variations" of a concept |
Finding the Sweet Spot
Start at 0.5 and generate. If the result is too similar to the input, increase by 0.1. If the result has lost too much of the original's structure, decrease by 0.1. After 2–3 adjustments, you will find the exact strength that gives the right balance of transformation and preservation for your specific input and prompt combination.
The sweet spot changes based on the input image. Photographs with strong, clear compositions tolerate higher denoising before losing structure (their structure is so strong that it persists through more noise). Sketches and abstract inputs need lower denoising to preserve their compositional intent. Complex scenes with many small elements need lower denoising because fine details are the first things destroyed by noise.
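The adjust-by-0.1 procedure above can be written as a tiny search loop. `tune_strength` and its `judge` callback are illustrative stand-ins for you eyeballing each generation, not a real API:

```python
def tune_strength(judge, start=0.5, step=0.1, max_tries=4):
    """Dial in denoising strength by the rule of thumb above.

    `judge(strength)` stands in for inspecting a generation at that
    strength; it returns "too_similar", "too_different", or "good".
    """
    strength = start
    for _ in range(max_tries):
        verdict = judge(strength)
        if verdict == "good":
            break
        strength += step if verdict == "too_similar" else -step
        strength = min(1.0, max(0.0, strength))  # clamp to valid range
    return round(strength, 2)

# Example: suppose anything below 0.6 looks too similar for this input.
print(tune_strength(lambda s: "good" if s >= 0.6 else "too_similar"))  # 0.6
```

In practice you are the judge; the point is that two or three bounded adjustments converge on a working value.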
Style Transfer: Photo to Art
Style transfer is img2img's most popular application: take a photograph and transform it into an oil painting, watercolor, anime illustration, pixel art, or any other visual style. The photo provides composition and content; the prompt provides the target style.
Effective Style Transfer Prompts
The style prompt should be specific about the target medium and aesthetic:
# Instead of this:
"painting of a landscape"
# Write this:
"oil painting on canvas, thick impasto brushstrokes,
warm color palette, impressionist style, soft edges,
visible paint texture, gallery lighting"
Describe the target medium (oil painting, watercolor, charcoal drawing), the technique (impasto, wet-on-wet, cross-hatching), the aesthetic movement (impressionist, art nouveau, expressionist), and quality markers (museum quality, gallery exhibition, masterwork). The more specific your style description, the more convincingly the model transforms the image.
Style Transfer Settings by Target Style
| Target Style | Denoising | CFG Scale | Key Prompt Terms |
|---|---|---|---|
| Oil painting | 0.55–0.7 | 7–9 | oil on canvas, brushstrokes, impasto, gallery lighting |
| Watercolor | 0.5–0.65 | 6–8 | watercolor wash, transparent layers, wet edges, paper texture |
| Anime/manga | 0.6–0.75 | 7–10 | anime style, cel shaded, clean lines, vibrant colors |
| Pencil sketch | 0.5–0.65 | 6–8 | graphite pencil drawing, cross-hatching, white paper |
| Cyberpunk | 0.55–0.7 | 8–10 | neon lighting, rain-slicked surfaces, holographic, dystopian |
| Vintage photograph | 0.3–0.5 | 5–7 | faded film, grain, 1970s Polaroid, warm cast, soft focus |
Using LoRAs for Style Transfer
For the most convincing style transfers, combine img2img with a style LoRA. A LoRA trained on a specific artist's work or a particular visual style will produce more authentic results than prompt engineering alone. Load the style LoRA at weight 0.6–0.8, write a style-matching prompt, and set denoising to 0.5–0.7. The LoRA handles the stylistic nuance that words cannot fully capture.
Sketch-to-Image: From Rough to Refined
Img2img transforms rough sketches into finished, rendered images. This workflow is popular among concept artists, game designers, and illustrators who want to iterate quickly on compositions before committing to full rendering.
Preparing Your Sketch
The sketch does not need to be polished. Even rough gesture drawings and basic shape compositions work as img2img inputs. However, a few preparation steps improve results:
- High contrast: Dark lines on a white or light background. The model reads structure from contrast, and faint gray lines may not register as structural elements.
- Clean background: Erase smudges and accidental marks. The model treats everything in the image as intentional content.
- Correct proportions: The model will follow the proportions in your sketch. If the head is too large relative to the body, the output will have the same issue. Get proportions roughly right even if detail is minimal.
- Add basic shading if possible: Even rough value indications (dark areas, light areas) help the model understand depth and lighting in the scene, producing more three-dimensional results.
Sketch-to-Image Settings
Use denoising strength 0.7–0.9 for sketches. The model needs significant freedom to transform rough lines into rendered content. Lower values keep too much of the sketch's rough quality. The prompt should describe the finished result in detail — surface materials, lighting, atmosphere, and style — because the sketch provides only structure.
For more precise control over which elements of the sketch are followed, consider using ControlNet in scribble or lineart mode instead of standard img2img. ControlNet provides structural conditioning without the noise-addition mechanism of img2img, giving you more independent control over structure and style.
Photo Enhancement and Restoration
Img2img can enhance photographs in ways that go beyond traditional editing tools. Because the model understands the content of the image, it can add genuine detail, correct lighting, and improve composition — not just sharpen pixels.
Quality Enhancement
For quality enhancement without changing the image content, use low denoising strength (0.2–0.35) with a quality-focused prompt:
professional photograph, high resolution, sharp focus,
clean detail, natural lighting, DSLR quality,
well-exposed, accurate colors
The model refines textures, reduces noise, and enhances detail while keeping the image recognizably the same. This is particularly effective for improving smartphone photos, compressed images, and older digital photographs. The results exceed traditional sharpening because the model synthesizes plausible new detail rather than amplifying existing pixels; keep in mind, though, that this detail is generated, not recovered, so it may not match what was actually in the scene.
Lighting Correction
Describe the desired lighting in your prompt while using moderate denoising (0.3–0.5): "natural golden hour lighting, warm tones, soft shadows" transforms a harshly lit photo into a warmly lit one. "Studio three-point lighting, professional portrait" transforms casual portrait lighting into studio-quality lighting. The model re-renders the lighting of the scene while preserving subject identity and composition.
Background Enhancement
For images where the subject is good but the background is distracting, use a moderate denoising (0.4–0.6) with a prompt that describes the desired background: "professional portrait, clean blurred background, shallow depth of field, bokeh." The model typically preserves the subject while replacing or enhancing the background, though for precise control, consider inpainting the background specifically.
Advanced Img2Img Techniques
Progressive Refinement
Instead of attempting one perfect transformation, apply img2img iteratively with low denoising at each step. Start with the original, apply img2img at 0.3 denoising with your style prompt. Take the output, feed it back as the input, and apply img2img again at 0.3. Each iteration nudges the image closer to the target style without the jarring artifacts that can occur with a single high-denoising pass. Three to five iterations of gentle transformation often produce more coherent results than one aggressive transformation.
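This feedback loop is simple to automate. `img2img` here is a placeholder callback for whatever backend you use (diffusers, a ComfyUI API call, etc.); the string stub below only demonstrates the data flow of feeding each output back in as the next input:

```python
def progressive_refine(image, prompt, img2img, strength=0.3, passes=4):
    """Apply several gentle img2img passes instead of one aggressive one.

    `img2img(image, prompt, strength)` is a stand-in for a real backend;
    each pass uses the previous pass's output as its input.
    """
    for _ in range(passes):
        image = img2img(image, prompt, strength=strength)
    return image

# Stub backend for illustration: tags each pass onto the "image".
result = progressive_refine("photo", "oil painting",
                            lambda img, p, strength: img + "+pass")
print(result)  # photo+pass+pass+pass+pass
```

With a real backend, each pass at strength 0.3 moves the image a modest step toward the prompt while staying anchored to the previous result.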
Multi-Model Pipeline
Different models have different strengths. A powerful workflow for maximum quality:
- Generate the base image with FLUX (best prompt adherence and composition)
- Feed the FLUX output into SDXL via img2img for stylistic treatment (some styles are better represented in SDXL's training data)
- Upscale the result with Real-ESRGAN
- Refine the upscaled image with a final img2img pass at low denoising (0.2–0.3) to add fine detail
This pipeline extracts the best qualities from each model: FLUX for composition, SDXL for style, Real-ESRGAN for resolution, and a final pass for detail refinement.
Seed Control for Variation Exploration
Keep the same input image and prompt but change the seed to explore variations. Each seed produces a different interpretation of the transformation. Generate 8–16 variations, select favorites, and then fine-tune those with additional img2img passes or inpainting. This is faster than tweaking the prompt word by word because the visual differences between seeds are immediate and dramatic.
CFG Scale Interaction with Img2Img
CFG (Classifier-Free Guidance) scale interacts differently with img2img than with text-to-image. In text-to-image, higher CFG pushes the model toward the prompt more aggressively. In img2img, high CFG combined with low denoising can produce over-saturated or artifact-heavy results because the model is trying to push the slightly-noised image strongly toward the prompt with very few steps.
For img2img, use lower CFG than you would for text-to-image. If you normally generate at CFG 7–8, try 5–7 for img2img. At very low denoising (0.2–0.3), reducing CFG to 3–5 often produces the most natural results. The input image already provides strong structural guidance, so less prompt pressure is needed.
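These rules of thumb can be captured in a small heuristic. The thresholds below are assumptions drawn from the ranges in this section, not values from any library:

```python
def suggest_cfg(denoise: float, base_cfg: float = 7.5) -> float:
    """Heuristic: lower CFG for img2img, lowest at low denoising strength.

    At low denoise the input image already provides strong guidance,
    so less prompt pressure is needed; at high denoise behaviour
    approaches text-to-image and the usual CFG applies.
    """
    if denoise <= 0.3:
        return max(3.0, base_cfg - 3.5)  # subtle passes: CFG ~3-5
    if denoise <= 0.6:
        return base_cfg - 1.5            # moderate transforms: CFG ~5-7
    return base_cfg                      # near txt2img: normal CFG

print(suggest_cfg(0.25))  # 4.0
print(suggest_cfg(0.5))   # 6.0
print(suggest_cfg(0.8))   # 7.5
```

Treat the output as a starting point to adjust from, not a fixed rule; different models and samplers shift the comfortable CFG range.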
Img2Img vs ControlNet: When to Use Which
Img2img and ControlNet both use reference images, but they work fundamentally differently and excel at different tasks:
| Criterion | Img2Img | ControlNet |
|---|---|---|
| How reference is used | Noise added to reference, then denoised | Structural info extracted from reference, used as conditioning |
| Content preservation | Preserves actual colors, textures, content from the original | Preserves only the specific structure type (edges, depth, pose) |
| Best for style transfer | Yes — directly transforms content | Only if combined with IP-Adapter for style reference |
| Best for structural control | Moderate — structure degrades with high denoising | Excellent — structure is maintained independently of content |
| Best for photo enhancement | Yes — preserves and enhances photo content | Only via Tile ControlNet |
| Best for sketch-to-image | Good with high denoising | Better — ControlNet scribble/lineart designed for this |
| Combining with prompts | Prompt and image compete (denoising balances them) | Prompt and control are complementary (independent signals) |
Use img2img when you want to transform existing visual content — style transfer, photo enhancement, quality improvement, and iterative refinement. Use ControlNet when you want structural guidance from a reference while generating entirely new visual content — pose-guided generation, composition control, and edge-guided rendering. For maximum control, combine both: use ControlNet for structural guidance and img2img's noise-addition for content-level transformation.
Batch Processing and Workflow Automation
For professional workflows that require transforming multiple images consistently — converting a product photo set to illustration style, enhancing a batch of event photographs, or generating variations for A/B testing — batch processing is essential.
In ComfyUI, batch img2img is built into the workflow system. Load a directory of images, apply the same prompt and settings to each, and output to a results directory. The consistency of the transformation depends on using identical settings for every image in the batch.
For the most consistent batch results, lock the seed across all images in the batch. While this means every image gets the same random influence, it removes one source of variation, making the transformation more uniform across the set. If you need variation, use sequential seeds (seed, seed+1, seed+2) rather than random seeds to maintain some consistency.
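A minimal sketch of the two seeding strategies, fixed seed for maximum uniformity and sequential seeds for controlled variation (`batch_seeds` is an illustrative helper, not a ComfyUI API):

```python
def batch_seeds(base_seed: int, n: int, sequential: bool = False):
    """Seeds for a batch of n images.

    sequential=False locks every image to the same seed (most uniform);
    sequential=True uses base_seed, base_seed+1, ... for controlled variation.
    """
    if sequential:
        return [base_seed + i for i in range(n)]
    return [base_seed] * n

print(batch_seeds(1234, 4))              # [1234, 1234, 1234, 1234]
print(batch_seeds(1234, 4, sequential=True))  # [1234, 1235, 1236, 1237]
```

The seed list would then be zipped with the input images, with all other settings held identical across the batch.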
Transform Your Images with AI
Upload any photo and transform it with img2img on ZSky AI's dedicated RTX 5090 GPUs. Style transfer, enhancement, and creative transformation in seconds.
Try ZSky AI Free →
Frequently Asked Questions
What is image-to-image (img2img) in AI generation?
Img2img is an AI generation mode where you provide an existing image as a starting point along with a text prompt. The diffusion model adds noise to the input image (controlled by denoising strength), then denoises it guided by your prompt. The result inherits the composition and structure of the input while being transformed according to the prompt. Low denoising preserves the original closely; high denoising produces more dramatic transformations.
What denoising strength should I use for img2img?
Start at 0.5 and adjust. Use 0.2–0.4 for subtle refinements like quality enhancement. Use 0.4–0.6 for moderate style transfer. Use 0.6–0.8 for significant transformations like photo-to-painting. Use 0.8–1.0 when you want only a loose structural reference from the original.
How do I do style transfer with AI?
Upload your source image in img2img mode and write a detailed prompt describing the target style. Set denoising strength to 0.5–0.7. For stronger style adherence, combine with a style LoRA. Describe the specific medium, technique, and aesthetic movement for the most convincing results.
Can I convert a rough sketch into a finished image?
Yes. Upload your sketch as the img2img input, write a detailed prompt describing the finished result, and set denoising strength to 0.7–0.9. The model uses your sketch as a compositional guide while generating fully rendered content. For more precise structural control, ControlNet's scribble mode is even more effective.
What is the difference between img2img and ControlNet?
Img2img directly transforms your image by adding noise and denoising with a prompt. ControlNet extracts structural information from a reference and uses it as conditioning for generation from noise. Img2img transforms content; ControlNet provides structural guidance. Use img2img for style transfer and enhancement, ControlNet for compositional control.
How do I enhance photo quality with img2img?
Use low denoising (0.2–0.4) with a quality-focused prompt: "high resolution, sharp focus, professional lighting, clean detail." The model enhances textures and detail while preserving content. For best results, combine with upscaling (Real-ESRGAN) and a final low-denoising refinement pass.