ZSky AI vs ChatGPT Images 2.0 — 10 Prompts, Side By Side, No Cherry Picking

By Cemhan Biricik · Companion piece to "The Original Anti-Slop AI"

OpenAI launched ChatGPT Images 2.0 on April 21, 2026 with a "thinking" step that reasons about prompts before generating. ZSky AI launched on March 13, 2026 with the same architectural idea, fine-tuned by a working photographer rather than a generic LLM. To find out which actually delivers, I ran the same 10 prompts through both, kept the first generation, and made no edits. Here is exactly what each tool produced, with my honest verdict on every pair.

Scoreboard: ZSky wins 6 · ChatGPT wins 3 · 1 tie. Median generation time: ZSky 6.9s; ChatGPT (Plus) 30–60s.


Prompt 01 — "a woman in a red dress" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.7s)]

ZSky placed her in a wood-paneled library with warm window light, structured tailoring, and gold accents; ChatGPT delivered a competent but flat studio portrait against a neutral wall. Art direction wins on environment and styling specificity.

Prompt 02 — "mountain range at golden hour" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.9s)]

ZSky delivered an actual photographic golden hour: snow-capped peaks, fog in the valleys, genuinely shallow depth of field. ChatGPT gave us an oversaturated AI fantasy. If you can tell a stock landscape from a real Sony A7R shot, this one is decisive.

Prompt 03 — "a man at a window during a storm" · Verdict: Lean ChatGPT
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (8.1s)]

Both delivered the brief. ChatGPT edges this one with actual lightning in the night sky and a darker storm energy. ZSky's is more intimate — a man in a sweater holding a coffee mug, a warm interior lamp, a palm tree visible through the rain — and reads as a quiet storm-day moment rather than a cinematic storm still. Honest call: ChatGPT delivered more on the literal word "storm."

Prompt 04 — "a French bistro menu titled Chez Henri with three appetizers, four mains, two desserts, all in French, hand-lettered chalkboard style" · Verdict: ChatGPT wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.2s)]

ChatGPT's clean win. Text rendering is the announced headline feature of Images 2.0, and it delivered: clean accents, correct French, exactly the items requested. ZSky's chalkboard is more atmospheric but ships typos: "oixon" should be "oignon", "chève" should be "chèvre", "Boèuf" should be "Bœuf", "Rahatouille" should be "Ratatouille". If text accuracy is your job, use ChatGPT for this prompt.

Prompt 05 — "a model walking" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.1s)]

ChatGPT defaulted to runway catwalk with empty audience. ZSky read it as street fashion editorial: trench coat, structured pants, sneakers, real city sidewalk, candid stride. This is what the Creative Director enhancer does — it bridges from "a model walking" to "Annie Leibovitz street editorial."

Prompt 06 — "a cyberpunk sushi chef" · Verdict: Lean ChatGPT
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.2s)]

Both delivered cyberpunk atmosphere with neon. ChatGPT's scene has more environmental storytelling — Japanese signage that reads correctly ("未来寿司", "future sushi") and more layered detail. ZSky's is cleaner and more centered but less ambitious in its world-building. Honest call: ChatGPT edges this one.

Prompt 07 — "a white ceramic coffee mug on a wooden table" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.2s)]

ZSky added what a real product photographer would: rising steam, soft window light, shallow depth of field, tactile wood grain. ChatGPT delivered a flat literal mug on a table. This is the gap between "AI image of a mug" and "product photo."

Prompt 08 — "a fisherman mending nets at dawn" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (7.2s)]

Both are competent. ZSky's is colder and quieter (real dawn light, fog on the water, weathered boat in the background) and reads as documentary photography. ChatGPT's is warmer and more sentimental, with a seagull and harbor activity. For editorial work the ZSky version is more useful; for greeting cards the ChatGPT version is fine. Calling it for ZSky on craft.

Prompt 09 — "a Tokyo street sign reading 'welcome' in Japanese with cherry blossoms" · Verdict: Tie
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (6.3s)]

Both rendered ようこそ ("yōkoso", welcome) correctly — this was meant as ChatGPT's second layup, and ZSky kept up. ChatGPT added Tokyo Tower for extra storytelling; ZSky's street sign is more documentary and its cherry-blossom framing more painterly. Honest tie.

Prompt 10 — "loneliness" · Verdict: ZSky wins
[Images: ChatGPT Images 2.0 output (~30–60s) · ZSky AI output (7.2s)]

The most interesting prompt of the test, because it has no concrete subject. ChatGPT defaulted to the literal reading: a lonely figure on a bench facing the sea. ZSky read it as an emotional brief: a hooded figure on a stone dock dipping a foot into still water, fog erasing the horizon, ripples spreading from the point of contact, painterly atmosphere. This is the gap between an AI illustrating a word and an AI directing an image.

What this comparison actually proves

ZSky won 6 of 10. ChatGPT won 3 of 10 (one of which was the menu prompt designed to favor it). One was a tie. This is not us claiming ZSky is "better at AI" — the underlying image models are different products doing different things. What the comparison proves is that art direction at the prompt layer is the lever no one was pulling, and that fine-tuning that layer with an actual photographer's archive produces visibly different output than fine-tuning it with a general-purpose reasoning LLM.

The reason OpenAI had to ship a "thinking" mode for Images 2.0 is the same reason ZSky shipped a Creative Director enhancer 39 days earlier: the gap between "AI image" and "usable creative output" is bridged by direction, not by a bigger model. We just got there first, and ours is fine-tuned by someone who has done the actual job.

For the founder writeup, methodology, and the timeline of what shipped when, see "The Original Anti-Slop AI."

Reproduce this test yourself

The 10 prompts above are unmodified. ChatGPT Images 2.0 access requires ChatGPT Plus ($20/mo). ZSky AI is free at zsky.ai/create with no signup. Press accounts (ad-free): [email protected].

Editorial note: Verdicts are the author's. ZSky timings are from dispatcher engine logs (text-to-image only, excluding queue and client overhead). ChatGPT timings are OpenAI's own publicly stated 30–60 sec range on the Plus tier. Image files are saved as captured: ChatGPT shipped at 1086×1448 / 1122×1402 / 1536×1024 (PNG, ~1.5 MP); ZSky shipped at 1024×1024 (WebP). Press inquiries: [email protected].