GPT-4 Prompt Engineering: What I Actually Do Differently
I'll start with a confession: when GPT-4 first came out, I kept using it the same way I used GPT-3.5. Short prompts, quick questions, accept the first answer. It worked fine — until I realized I was using a sports car to drive to the grocery store.
GPT-4 is a fundamentally different tool. Not just "smarter" — it processes instructions differently, handles complexity differently, and frankly, punishes lazy prompts more than weaker models do.
After several months of daily use, here's what actually changed in my approach.
The Biggest Mistake: Treating GPT-4 Like GPT-3.5
With GPT-3.5, I learned to keep prompts short. Long instructions confused it. Complex multi-step tasks fell apart. So I kept things simple — and it worked.
Then GPT-4 came along, and I kept using the same short prompts. The answers were better than 3.5, sure. But they were still generic. Still surface-level. Still "safe."
The turning point was when I gave GPT-4 a task I'd normally break into three separate prompts. Instead, I wrote one detailed prompt — full context, specific constraints, desired output format, examples of what I wanted — and the first response was better than anything I'd gotten after five rounds of back-and-forth with 3.5.
That's when I understood: GPT-4 rewards thoroughness. Punishes vagueness. The more precisely you describe what you want, the better it performs.
This is the opposite of what many people assume. They think a smarter model needs less instruction. In reality, a smarter model can handle more instruction — and benefits from it.
What Changed in My Prompting
I started writing longer prompts
Not rambling — structured. My prompts now typically include:
- Context: What project this is for, who the audience is
- My role: What I'm trying to accomplish and why
- Specific requirements: Format, length, tone, must-include and must-avoid items
- Examples: When possible, a sample of the output style I want
This might be 100-200 words where before I'd write 20. The quality difference is significant.
I stopped asking for answers, started asking for reasoning
GPT-3.5 gave you answers. Some were right, some were wrong, and it was hard to tell which. GPT-4 can show its reasoning — and when it does, the answers are dramatically better.
My most-used phrase now: "Think through this step by step before giving your final answer."
For anything involving analysis, comparison, or decision-making, this single instruction improves output quality more than any other technique I've found.
I use GPT-4 to critique itself
This is the technique I use most and talk about least. The workflow:
- Ask GPT-4 to generate something (a plan, a draft, an analysis)
- Then ask: "Now critique your own output. What are the weaknesses? What did you miss?"
- Then ask it to revise based on its own critique
This two-step process consistently produces better results than asking for a "perfect" output in one shot. GPT-4 is surprisingly good at finding flaws in its own work — as long as you explicitly ask it to.
I give it a specific persona
Not just "be an expert" — a specific one. "You're a senior backend engineer who's reviewed thousands of code reviews and has strong opinions about code readability." Or "You're a skeptical editor who pushes back on vague claims and demands evidence."
GPT-4 doesn't just adopt surface-level language patterns when given a role. It actually shifts its reasoning framework. A "skeptical editor" persona will genuinely challenge your arguments. A "patient teacher" will actually check for understanding.
The key is specificity. "Be helpful" does nothing. "Be a startup CTO who's seen three companies fail and is ruthlessly pragmatic about what actually matters" — that changes the output.
What I Stopped Doing
I stopped using GPT-4 for simple lookups. If I just need a quick fact or a simple code snippet, that's what search engines and documentation are for. GPT-4's strength is synthesis, analysis, and creation — not retrieval.
I stopped accepting the first answer for important tasks. Even GPT-4 benefits from iteration. I'll often say: "That's a good first draft. Now make it more concise and add a section about X."
I stopped treating it as an oracle. GPT-4 still hallucinates. It still makes up statistics. It still writes confident-sounding nonsense. For anything I'm going to publish or make decisions on, I verify the critical claims.
I stopped using it to do my thinking. This is subtler. Early on, I'd ask GPT-4 to make decisions for me — "what should I build?" "which framework should I choose?" The answers were reasonable but generic. I get better results when I bring my own judgment and use GPT-4 to refine, challenge, and execute my ideas rather than generate them from scratch.
The Honest Truth About "Prompt Engineering"
Here's what I've come to believe: most of what's taught as "prompt engineering" is just clear communication.
Giving context isn't a trick. Specifying your requirements isn't a hack. Showing examples isn't a technique — it's just how you'd explain something to a competent colleague.
The reason it feels like GPT-4 needs "prompt engineering" more than GPT-3.5 is that GPT-4 actually uses the information you give it. When you provide detailed context, it incorporates that context. Your effort isn't wasted.
With weaker models, extra context often gets ignored or creates confusion. So people learned to keep things minimal. That habit doesn't serve you with GPT-4.
One Last Thing
The most valuable thing GPT-4 taught me has nothing to do with prompts. It taught me that the quality of the output is directly proportional to the quality of the thinking that went into the question.
When I take five minutes to write a thorough, specific prompt, I get a great answer. When I type three lazy words, I get a mediocre answer — regardless of how powerful the model is.
GPT-4 didn't make me a better prompt engineer. It made me realize I wasn't thinking clearly about what I actually wanted. The prompt was just the mirror.
Apply this lesson to your own work: before writing a prompt, spend a few minutes thinking about what you actually want. What does "good" look like? What constraints matter? What context would help someone understand your request? The clearer you are with yourself, the clearer your prompts will be.
When to Use Other Models Instead
Despite all the techniques in this article, there are times when GPT-4 is not the right choice. Use GPT-4o-mini for high-volume, low-complexity tasks. If you need to generate hundreds of product descriptions or classify thousands of support tickets, mini is faster and cheaper with minimal quality loss. Consider Claude for long-document tasks. Claudes 200K token context window makes it ideal for analyzing long documents or codebases. Consider Gemini for multimodal tasks. Googles strong multimodal capabilities make it better for tasks involving images, charts, or visual content. Use specialized models for specialized tasks. For code generation, consider dedicated code models. For creative writing, models like Claude often produce more natural prose.
Common Mistakes to Avoid
Here are the most common mistakes I see people make when working with GPT-4. Being vague in prompts. The single biggest mistake is not being specific enough. "Write something good" will always produce mediocre output. Be explicit about every dimension that matters: length, tone, format, audience, purpose, constraints. Over-explaining to the point of confusion. Conversely, some people write prompts so long and complex that the model loses the thread. If your prompt is more than three hundred words, consider breaking it into multiple turns. Expecting the model to know your context. GPT-4 does not know your industry, your project, or your preferences unless you tell it. Always include the essential context. Using GPT-4 for everything. Simple lookups, fact retrieval, and quick calculations do not need GPT-4. Save it for tasks that benefit from deep reasoning and creative generation.
Prompt Library
Keep prompts in a searchable tool organized by use case. Record context effectiveness and modifications for each. Notice patterns where certain approaches work consistently. Share prompts with colleagues and learn from theirs.
Tips and Tricks
Keep prompts organized by use case recording context for each template carefully. Notice patterns where specific framing works consistently. Share winning templates with colleagues while adopting their proven approaches for better results.
Advanced Prompting Techniques
Beyond basic prompts, GPT-4 responds well to several advanced techniques. Chain-of-Thought (CoT) asks the model to think step by step, improving reasoning accuracy. Few-shot learning provides 2-3 examples of desired output format. Self-consistency generates multiple answers and picks the most common. ReAct combines reasoning and acting in a loop for complex tasks.
Common Mistakes to Avoid
Being too vague causes GPT-4 to guess what you want, often incorrectly. Over-specifying with too many constraints confuses the model. Ignoring context window limits leaves less room for responses. Not iterating on prompts is a mistake—first prompts rarely produce perfect results. Forgetting about latency impact matters for complex prompts.
Building a Prompt Library
Create a library of tested prompts for your specific use cases. Store prompts in version control, tag by use case and quality level, include expected outputs as test cases, review and update when GPT-4 is updated, and share effective prompts with your team.