ZSky AI vs ChatGPT Images 2.0 — 10 Prompts, Side By Side, No Cherry Picking

By Cemhan Biricik · Companion piece to "The Original Anti-Slop AI"

OpenAI launched ChatGPT Images 2.0 on April 21, 2026 with a "thinking" step that reasons about prompts before generating. ZSky AI launched on March 13, 2026 with the same architectural idea, fine-tuned by a working photographer rather than a generic LLM. To find out which actually delivers, I ran the same 10 prompts through both, kept the first generation, and made no edits. Here is exactly what each tool produced, with my honest verdict on every pair.

Scoreboard: ZSky wins 6 · ChatGPT wins 3 · 1 tie. Median generation time: ZSky 6.9s; ChatGPT (Plus) 30–60s.


Prompt 01 — "a woman in a red dress" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.7s)]

ZSky placed her in a wood-paneled library with warm window light, structured tailoring, and gold accents; ChatGPT delivered a competent but flat studio portrait against a neutral wall. Art direction wins on environment and styling specificity.

Prompt 02 — "mountain range at golden hour" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.9s)]

ZSky delivered an actual photographic golden hour: snow-capped peaks, fog in the valleys, genuinely shallow depth of field. ChatGPT gave us an oversaturated AI fantasy. If you can tell a stock landscape from a real Sony A7R shot, this one is decisive.

Prompt 03 — "a man at a window during a storm" · Verdict: Lean ChatGPT
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (8.1s)]

Both delivered the brief. ChatGPT edges this one with actual lightning in the night sky and a darker storm energy. ZSky's is more intimate — a man in a sweater holding a coffee mug, a warm interior lamp, a palm tree visible through the rain — and reads as a quiet storm-day moment rather than a cinematic storm still. Honest call: ChatGPT delivered more on the literal word "storm."

Prompt 04 — "a French bistro menu titled Chez Henri with three appetizers, four mains, two desserts, all in French, hand-lettered chalkboard style" · Verdict: ChatGPT wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.2s)]

ChatGPT's clean win. Text rendering is the announced headline feature of Images 2.0, and it delivered: clean accents, correct French, exactly the items requested. ZSky's chalkboard is more atmospheric but ships typos: "oixon" should be "oignon", "chève" should be "chèvre", "Boèuf" should be "Bœuf", "Rahatouille" should be "Ratatouille". If text accuracy is your job, use ChatGPT for this prompt.

Prompt 05 — "a model walking" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.1s)]

ChatGPT defaulted to runway catwalk with empty audience. ZSky read it as street fashion editorial: trench coat, structured pants, sneakers, real city sidewalk, candid stride. This is what the Creative Director enhancer does — it bridges from "a model walking" to "Annie Leibovitz street editorial."

Prompt 06 — "a cyberpunk sushi chef" · Verdict: Lean ChatGPT
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.2s)]

Both delivered cyberpunk atmosphere with neon. ChatGPT's scene has more environmental storytelling — Japanese signage that reads correctly ("未来寿司", "future sushi") and more layered detail. ZSky's is cleaner and more centered but less ambitious in its world-building. Honest call: ChatGPT edges this one.

Prompt 07 — "a white ceramic coffee mug on a wooden table" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.2s)]

ZSky added what a real product photographer would: rising steam, soft window light, shallow depth of field, tactile wood grain. ChatGPT delivered a flat literal mug on a table. This is the gap between "AI image of a mug" and "product photo."

Prompt 08 — "a fisherman mending nets at dawn" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (7.2s)]

Both are competent. ZSky's is colder and quieter (real dawn light, fog on the water, weathered boat in the background) and reads as documentary photography. ChatGPT's is warmer and more sentimental, with a seagull and harbor activity. For editorial work the ZSky version is more useful; for greeting cards the ChatGPT version is fine. Calling it for ZSky on craft.

Prompt 09 — "a Tokyo street sign reading 'welcome' in Japanese with cherry blossoms" · Verdict: Tie
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.3s)]

Both rendered ようこそ ("yōkoso", welcome) correctly — this was meant as ChatGPT's second layup, and ZSky kept up. ChatGPT added Tokyo Tower for extra storytelling; ZSky's street sign is more documentary and its cherry-blossom framing more painterly. Honest tie.

Prompt 10 — "loneliness" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (7.2s)]

The most interesting prompt of the test, because it has no concrete subject. ChatGPT defaulted to the literal reading: a lonely figure on a bench facing the sea. ZSky read it as an emotional brief: a hooded figure on a stone dock dipping a foot into still water, fog erasing the horizon, ripples spreading from the point of contact, painterly atmosphere. This is the gap between an AI illustrating a word and an AI directing an image.

What this comparison actually proves

ZSky won 6 of 10. ChatGPT won 3 of 10 (one of which was the menu prompt designed to favor it). One was a tie. This is not us claiming ZSky is "better at AI" — the underlying image models are different products doing different things. What the comparison proves is that art direction at the prompt layer is the lever no one was pulling, and that fine-tuning that layer with an actual photographer's archive produces visibly different output than fine-tuning it with a general-purpose reasoning LLM.

The reason OpenAI had to ship a "thinking" mode for Images 2.0 is the same reason ZSky shipped a Creative Director enhancer 39 days earlier: the gap between "AI image" and "usable creative output" is bridged by direction, not by a bigger model. We just got there first, and ours is fine-tuned by someone who has done the actual job.

For the founder writeup, methodology, and the timeline of what shipped when, see "The Original Anti-Slop AI."

Reproduce this test yourself

The 10 prompts above are unmodified. ChatGPT Images 2.0 access requires ChatGPT Plus ($20/mo). ZSky AI is free at zsky.ai/create with no signup. Press accounts (ad-free): [email protected].

Editorial note: Verdicts are the author's. ZSky timings are from dispatcher engine logs (text-to-image only, excluding queue and client overhead). ChatGPT timings are OpenAI's own publicly stated 30–60 sec range on the Plus tier. Image files are saved as captured: ChatGPT shipped at 1086×1448 / 1122×1402 / 1536×1024 (PNG, ~1.5 MP); ZSky shipped at 1024×1024 (WebP). Press inquiries: [email protected].