The 5-Element Prompt Formula Every Photographer Knows (And AI Users Should Steal)
I spent a decade as a professional photographer before I ever typed a text-to-image prompt. When I started using AI tools, I realized most people were missing the one framework every photo pro already knows: every image is five decisions.
I'm Cemhan. I run ZSky AI. Before that I shot fashion editorial work and was shortlisted for the Sony World Photography Awards. This post is the crossover guide I wish existed when I first started prompting — the exact formula I use to write prompts that actually produce what I'm picturing.
(Well, what I'm "picturing" — I have aphantasia, so my mental imagery is text-based. Which is ironically perfect training for prompt writing.)
The formula
[SUBJECT] + [COMPOSITION] + [LIGHTING] + [STYLE] + [MOOD]
Every real photograph is answering five questions:
- Subject — what is the image of?
- Composition — how is the camera positioned and framed?
- Lighting — where is the light coming from and what's its quality?
- Style — what visual tradition is the image working inside?
- Mood — what should the viewer feel?
Every AI image generator works better when you address all five explicitly. And here's the secret: you don't have to be fancy. You just have to not skip any of them.
Let me show you.
A bad prompt vs a good prompt
Bad prompt:
a woman walking in the city
This is what most first-time AI users write. It names a subject and a vague setting, which is one of the five elements at best, and at the weakest level of specificity.
Good prompt using the formula:
a woman in her 60s with silver hair wearing a long wool coat (SUBJECT),
medium wide shot from slightly below eye level, she's walking
left-to-right across the frame (COMPOSITION),
late afternoon golden hour sunlight raking in from the left,
warm amber tones, shadows stretching across the pavement (LIGHTING),
shot on 35mm film with gentle grain, cinematic aspect ratio,
Kodak Portra 400 color palette (STYLE),
contemplative, peaceful, a quiet moment of independence (MOOD)
This is the same idea. It's also ~10x the fidelity in output.
You can copy-paste it into ChatGPT (DALL-E), Midjourney, Runway, Pika, ZSky, Kling, Stable Diffusion, or anything else, and get back something recognizably in the same family across tools.
Why this works (the photography background)
Professional photographers don't walk into a shoot thinking "I want a nice picture." They walk in with answers to all five questions before they press the shutter. The lighting diagram is drawn. The shot list is written. The mood board is pinned. The composition is blocked out on set before the model arrives.
When you're writing a prompt, you're doing the same mental work — just at a keyboard instead of on a set. The camera equivalent is already in the AI. You just have to tell it what you would have told a gaffer, a DP, a wardrobe stylist, and a retoucher.
The reason AI image generators feel random to beginners is that the beginners are leaving four of the five decisions up to the model. Random output is exactly what should happen. Once you make the decisions yourself, the output stops being random.
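If you assemble prompts in a script, one way to stop yourself from skipping a decision is to make the five elements required fields. This is only a minimal sketch: the ShotPrompt class and its method names are my own illustration, not part of any tool's API, and joining the elements with commas is an assumption about what your generator accepts.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ShotPrompt:
    """Hypothetical helper: one field per decision in the formula."""
    subject: str
    composition: str
    lighting: str
    style: str
    mood: str

    def missing(self) -> list[str]:
        """Return the names of any elements left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the five elements in formula order, refusing to skip any."""
        skipped = self.missing()
        if skipped:
            raise ValueError("undecided: " + ", ".join(skipped))
        return ", ".join(getattr(self, f.name).strip() for f in fields(self))


shot = ShotPrompt(
    subject="a woman in her 60s with silver hair wearing a long wool coat",
    composition="medium wide shot from slightly below eye level",
    lighting="late afternoon golden hour sunlight raking in from the left",
    style="shot on 35mm film with gentle grain",
    mood="contemplative, peaceful",
)
print(shot.render())

# The beginner prompt decides one element and leaves four to the model.
vague = ShotPrompt("a woman walking in the city", "", "", "", "")
print(vague.missing())  # ['composition', 'lighting', 'style', 'mood']
```

Calling vague.render() raises a ValueError instead of quietly handing four decisions to the model, which is the whole point of the formula.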
The 3-tool portability test
Here's a practical exercise: write one prompt using the 5-element formula, then run it on 3 different AI tools. You should see three recognizably different images — different because the tools have different stylistic defaults — but all three should land in the same conceptual neighborhood.
If your results are wildly different (one is a portrait, one is a landscape, one is an abstract), your prompt is too vague. Go back and tighten the composition and subject elements.
If your results are identical (all three tools produced the same mood and style), your prompt is too specific, probably because it names a specific artist or technique that all three tools have in their training data. That's sometimes what you want and sometimes a sign you're being lazy. A real photographic brief describes the light, the framing, and the palette directly instead of leaning on a single name like "like Van Gogh".
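The two failure cases above can be written down as a small check, if you like keeping notes on your tests. This is a rough sketch under my own assumptions: you look at each output yourself and jot down a "kind" (portrait, landscape, abstract), a "style", and a "mood" label, and the hypothetical diagnose function only compares those hand-written labels. It does not look at the images.

```python
def diagnose(observations: dict[str, dict[str, str]]) -> str:
    """Compare hand-written labels for the same prompt run on several tools."""
    if len(observations) < 2:
        raise ValueError("run the prompt on at least two tools")

    def norm(text: str) -> str:
        return text.strip().lower()

    kinds = {norm(o["kind"]) for o in observations.values()}
    looks = {(norm(o["style"]), norm(o["mood"])) for o in observations.values()}

    if len(kinds) > 1:
        # One portrait, one landscape, one abstract: the prompt is too vague.
        return "too vague: tighten subject and composition"
    if len(looks) == 1:
        # Every tool landed on the same style and mood: likely over-specified.
        return "too specific: check for a named artist or technique"
    return "portable"


notes = {
    "tool_a": {"kind": "portrait", "style": "film grain", "mood": "calm"},
    "tool_b": {"kind": "portrait", "style": "glossy digital", "mood": "calm"},
    "tool_c": {"kind": "portrait", "style": "painterly", "mood": "wistful"},
}
print(diagnose(notes))  # portable
```

The labels are deliberately coarse. The test is about whether the tools agree on what the image is while still showing their own stylistic defaults.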
Negative prompts: the opposite of the formula
If the 5-element formula tells the model what to do, the negative prompt tells it what not to do. I use a standard negative prompt that I add to almost everything:
deformed hands, extra fingers, misplaced features, blurry, low resolution, watermark, text overlay, duplicate subjects, inconsistent anatomy, poorly drawn face
Negative prompts work best when they're specific to the kinds of errors the model you're using actually makes. If you're getting weird hands, you put "deformed hands" and "extra fingers" first. If you're getting weird eyes, "asymmetric eyes" and "cross-eyed" go in.
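Here is one way to sketch that reordering in code. It assumes your tool accepts a comma-separated negative prompt (not every tool has a negative prompt field), and the build_negative_prompt function is a hypothetical helper of mine. It puts the errors you are actually seeing first, then appends the standard list without repeating anything.

```python
# The standard negative prompt from above, as an ordered tuple.
BASE_NEGATIVE = (
    "deformed hands", "extra fingers", "misplaced features", "blurry",
    "low resolution", "watermark", "text overlay", "duplicate subjects",
    "inconsistent anatomy", "poorly drawn face",
)


def build_negative_prompt(priority: list[str], base: tuple[str, ...] = BASE_NEGATIVE) -> str:
    """Put the errors you are seeing first, then the rest, without duplicates."""
    seen: set[str] = set()
    ordered: list[str] = []
    for term in [*priority, *base]:
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            ordered.append(term.strip())
    return ", ".join(ordered)


# Weird eyes: the eye terms lead, the standard list follows.
print(build_negative_prompt(["asymmetric eyes", "cross-eyed"]))

# Extra fingers: that term moves to the front and is not repeated later.
print(build_negative_prompt(["extra fingers"]))
```

Whether term order matters at all depends on the model, so treat the ordering as a habit for keeping the list focused and not as a guarantee.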
The camera-as-prompt analogy
The deeper reason the formula works is that a prompt is a camera setting. You're telling a virtual camera what to look at, where to stand, how to light, what lens filter to use, and what mood to aim for. The AI is doing the image processing (what used to happen in a darkroom or in post), but you are still the one making the photograph.
I say this a lot because I think it matters: AI didn't replace photographers. It replaced cameras. Which means the photographer's craft is now the craft of prompting. Photographers who embrace this will have an edge. Photographers who resist it will lose relevance.
If you're reading this as a photographer wondering whether to learn AI tools: yes, learn them. Your visual decision-making is exactly the skill the prompt layer needs. You already know how to answer the five questions. You just have to type them instead of setting up a tripod.
And if you're reading this as an AI user who wishes your outputs were more consistent and more intentional: steal the formula. Make all five decisions before you hit generate. The randomness goes away.
TL;DR
- Every image is 5 decisions: Subject + Composition + Lighting + Style + Mood
- Bad prompts leave 2-4 of those undecided and blame the AI for being random
- Replace "beautiful" and "high quality" with actual decisions
- Test portability by running the same prompt on 3 tools
- Negative prompts handle the specific error patterns of the tool you're using
- Photographers already know this formula — they just call it "the shot"
Copy this post, bookmark it, use it. It's MIT-licensed in spirit. Go make something.
— Cemhan
Frequently Asked Questions
What is the 5-element prompt formula?
It is the framework every professional photographer uses: Subject + Composition + Lighting + Style + Mood. Every real photograph is answering those five questions before the shutter is pressed. Every AI image or video prompt works better when all five are addressed explicitly. The formula ports cleanly to any AI tool, including ChatGPT image, Midjourney, Runway, Pika, Kling, and ZSky AI.
Why do AI image generators feel random to beginners?
Because beginners leave four of the five decisions up to the model. Random output is exactly what should happen when the prompt does not specify the subject, composition, lighting, style, and mood. Once you make those decisions yourself, the output stops being random. This is the same thing a photographer does before a shoot, just typed at a keyboard instead of blocked out on set.
Should I use words like beautiful, stunning, or 8k in my prompts?
No. Words like "beautiful", "stunning", "masterpiece", "high quality", "8k", "photorealistic", and "trending on artstation" are hopes, not decisions. They do not tell the model what to do, they just tell it to be good. The model is already trying to be good. Replace hope words with actual decisions: instead of "beautiful", write "soft side light, face partially in shadow". Instead of "masterpiece", write "Vermeer-style chiaroscuro". Decisions, not hopes.
What is the shorthand version of the 5-element formula?
Subject, lighting word, style word, mood word. Example: "A young dancer mid-leap in a rehearsal room, harsh window light, 35mm film grain, triumphant." You can type these in 15 seconds and they outperform most 200-word mega prompts because they hit all the important decisions without wasting tokens on decoration.
How do negative prompts fit into the 5-element formula?
If the 5-element formula tells the model what to do, the negative prompt tells it what not to do. Standard negative prompts include deformed hands, extra fingers, misplaced features, blurry, low resolution, watermark, text overlay, duplicate subjects, inconsistent anatomy, and poorly drawn face. Negative prompts work best when they target the specific error patterns of the tool you are using. If you keep getting weird hands, put deformed hands and extra fingers first.
Why should photographers learn AI image tools?
Because a prompt is a camera setting. A photographer is already trained in the five-element visual decision-making that prompts require: they decide subject, composition, lighting, style, and mood before every shot. AI did not replace photographers, it replaced cameras. Photographers who embrace prompting will have an edge because their craft is now the craft of prompting. Photographers who resist it will lose relevance.