Prompt Engineering as a Debugging Process
If you search “prompt engineering”, you’ll find a million posts telling you patterns that work. But instead of trying to memorize prompt patterns, I did something more boring. I kept a timeline of prompt changes in my project: what I changed, what broke, what improved, and what that probably says about how the model “thinks”.
This article is basically my prompt debug log, cleaned up into something readable.
Also: this is personal experience. I might be wrong. I'm writing this because I want to re-check my thinking and spot problems I still don't see.
Before anything: a short note on terminology
A prompt is just the input we give the model so it can produce its next output.
A Large Language Model is a next-word prediction machine. You give it some text, and it predicts what text should come next. A simple mental model: it predicts the next chunk of text, then the next, then the next, until it stops. It's extremely advanced autocomplete. That's why wording matters: it changes which pattern the model continues. The better the context, the better its predictions.
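To make the autocomplete picture concrete, here's a toy version of that loop. The word table below is a made-up stand-in for the model; real LLMs predict over sub-word tokens with billions of parameters, but the predict-append-repeat loop has the same shape.

```python
import random

# A toy "model": for each word, a distribution over possible next words.
# Entirely made up; the point is only the shape of the loop around it.
TOY_MODEL = {
    "the":    {"room": 0.6, "floor": 0.3, "<end>": 0.1},
    "room":   {"is": 0.7, "<end>": 0.3},
    "floor":  {"plan": 0.8, "<end>": 0.2},
    "plan":   {"<end>": 1.0},
    "is":     {"bright": 0.5, "empty": 0.4, "<end>": 0.1},
    "bright": {"<end>": 1.0},
    "empty":  {"<end>": 1.0},
}

def predict_next_word(prev_word: str) -> str:
    dist = TOY_MODEL.get(prev_word, {"<end>": 1.0})
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs)[0]

def autocomplete(prompt: str, max_words: int = 20) -> str:
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next_word(words[-1])  # predict the next chunk...
        if nxt == "<end>":                  # ...until it decides to stop
            break
        words.append(nxt)                   # append it and go again
    return " ".join(words)

print(autocomplete("the"))  # e.g. "the room is bright"
```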
What about temperature, top-K, and all those settings?
At temperature 0, the model always picks the most likely next word. Very predictable. Very safe. Sometimes very boring or flat. As you increase temperature (say 0.4, 0.7, 1.0), the model starts considering less likely options. You get more variety and creativity but also more randomness and occasional weirdness.
Top-K limits how many options the model considers at each step. A top-K of 40 means it only looks at the 40 most likely next words before choosing. Lower numbers mean more focus. Higher numbers mean more diversity.
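Here's a small, self-contained sketch of how those two knobs shape the choice of the next word. The candidate words and scores are made up, and real implementations differ in detail, but the mechanics are the same: top-K trims the list of candidates, and temperature decides how evenly the model spreads its bets across them.

```python
import math
import random

# Made-up scores ("logits") for candidate next words.
logits = {"room": 4.0, "kitchen": 3.2, "ceiling": 2.5, "banana": 0.1}

def sample_next_word(logits, temperature=0.7, top_k=3):
    # Top-K: keep only the K highest-scoring candidates.
    candidates = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    if temperature == 0:
        # Temperature 0: always take the single most likely word.
        return candidates[0][0]

    # Higher temperature flattens the distribution, giving less likely
    # words a better chance; lower temperature sharpens it.
    weights = [math.exp(score / temperature) for _, score in candidates]
    words = [word for word, _ in candidates]
    return random.choices(words, weights=weights)[0]

print(sample_next_word(logits, temperature=0.0))  # always "room"
print(sample_next_word(logits, temperature=1.0))  # mostly "room", sometimes others
```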
The actual problem my prompts were trying to solve
One of my project's goals sounded simple in one sentence: convert a 2D floor plan into a photorealistic 3D interior view, at eye level, without breaking the layout.
I am using image generation models (primarily Gemini) that take images as input and produce images as output.
The core challenge was getting the AI to look at a flat 2D floor plan and produce a realistic photo of what that room would look like from the inside, at eye level, with correct walls, windows, doors, materials, and lighting.
Here’s what actually happened, phase by phase.
Phase 1: The polite prompt (and why it failed)
The prompt: "Generate a photorealistic 3D interior photograph from this floor plan. Camera at human eye level. Floor at bottom, ceiling at top."
The result: top-down and bird's-eye views. Everything except what a person standing inside the room would actually see. What I learned: the model defaulted to top-down views because "floor plan → top-down" is a very strong mental shortcut.
Phase 2: Over-correcting with “DO NOT”
So I went into full angry mode: "What you MUST NOT create: top-down view; bird's-eye view; isometric view; any view showing the entire floor layout from above."
It helped, but it also introduced a strange side effect: the more "DON'Ts" I added, the more unstable the outputs felt. Lesson: negative constraints help, but too many become noise. They also don't tell the model what to do instead.
Phase 3: The shift to telling it what TO do, very precisely
Instead of saying "don't make a top-down view", I tried this: "Horizon line centered vertically at approximately 50% from the top. All vertical architectural lines remain perfectly parallel to image edges. Zero-Tilt: all vertical edges must be 90° to the horizon." This worked noticeably better. What I learned: positive, specific, measurable instructions beat negative ones.
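For what it's worth, this is roughly the shape that constraint block took in my prompt-building code. The wording below is illustrative rather than the full production prompt; the point is that every rule is positive and checkable.

```python
# Rough shape of the positive, measurable constraint block
# (illustrative wording, not the full production prompt).
CAMERA_GEOMETRY_RULES = """
Camera and geometry rules:
- Camera at human eye level.
- Horizon line centered vertically at approximately 50% from the top.
- All vertical architectural lines remain perfectly parallel to the image edges.
- Zero-Tilt: all vertical edges must be at 90 degrees to the horizon.
"""

def build_prompt(scene_description: str) -> str:
    # Scene description first, then the non-negotiable geometry rules.
    return scene_description.strip() + "\n\n" + CAMERA_GEOMETRY_RULES.strip()
```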
Phase 4: The role change mattered more than expected
I changed my system instruction from: “Act as an Architectural Visualizer” to: “Act as a Professional Interior Photographer”. The outputs shifted toward eye-level perspectives almost immediately.
My theory is that the model's training data associates "architect" with blueprints, plans, and top-down diagrams, while "photographer" is associated with eye-level shots, composition, and real-world perspectives. I'm not 100% sure this explanation is correct, but the results were consistent enough that I kept it.
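If you're using the google-generativeai Python SDK, the role goes in the system instruction, roughly like this. The model name is a placeholder (use whichever model you're actually calling), and other SDKs expose system instructions differently.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# The role lives in the system instruction, not in the per-request prompt.
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",  # placeholder; swap in the model you use
    system_instruction=(
        "Act as a Professional Interior Photographer. "
        "You photograph real interiors at eye level."
    ),
)

response = model.generate_content("Describe the shot you would set up for this room.")
print(response.text)
```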
Phase 5: Language mattered more than I expected
Calling structural elements “immutable anchors” worked better than saying “don’t move the walls”. The model seemed to respond much better to domain-specific terminology — words that are strongly tied to concrete concepts in its training data. What I learned: Use the vocabulary of the domain you want the output to match. If you want architectural accuracy, use architectural terms. If you want photographic quality, use photography terms.
Phase 6: Multi-pass beats single-pass, always
Trying to do everything in one prompt never worked reliably.
So I split the process:
Pass 1: Generate a render, a plain white architectural model with correct geometry, shadows, and perspective. No textures. No furniture. No colors. Low temperature (0.2) for consistency.
Pass 2: Take that render and add materials, furniture, and lighting. Use reference images for style. Higher temperature (0.4) for creative freedom.
This was far more stable than any single-pass prompt.
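Here's the shape of that pipeline as a sketch. generate_image is a hypothetical helper standing in for whatever image-in, image-out call you use (for me, a Gemini image model); the two prompts and the two temperatures are the point, not the wrapper.

```python
from PIL import Image

def generate_image(prompt: str, images: list, temperature: float) -> Image.Image:
    # Hypothetical helper: wrap your own image-in/image-out model call here
    # (for me, a Gemini image generation model).
    raise NotImplementedError("wire up your image model here")

def render_interior(floor_plan: Image.Image, style_refs: list) -> Image.Image:
    # Pass 1: geometry only. Plain white model, correct perspective,
    # no textures, no furniture, no colors. Low temperature for consistency.
    white_model = generate_image(
        prompt=(
            "Plain white architectural model of this room, eye-level camera, "
            "correct geometry, shadows, and perspective. No textures, "
            "no furniture, no colors."
        ),
        images=[floor_plan],
        temperature=0.2,
    )

    # Pass 2: materials, furniture, and lighting on top of the locked geometry.
    # Higher temperature for creative freedom; reference images set the style.
    return generate_image(
        prompt=(
            "Add realistic materials, furniture, and lighting to this render. "
            "Keep all walls, windows, and doors exactly where they are "
            "(immutable anchors). Match the style of the reference images."
        ),
        images=[white_model, *style_refs],
        temperature=0.4,
    )
```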
What actually helped
What actually helped wasn’t clever prompts. It was documenting what worked, what didn’t, and asking why.
This article comes from a real project I built, but the lessons apply even if you’re just typing questions into a chat window. No code. No settings. Just curiosity and patience.
Document everything as you go. Not in a polished way. Just messy notes: “Tried X, got Y, expected Z.” That’s it.
After a few weeks, go back and read your notes. Patterns will jump out. You’ll see your recurring mistakes. Experiment, document and analyze.
Why this works better than memorizing tips
Prompt engineering is deeply context-dependent. What works for interior rendering doesn’t directly work for code generation or essay writing. The patterns I found are specific to my use case, my model, and my inputs. Your patterns will be different.
I might be wrong about some explanations in this article. I’m explaining why I think certain things worked, based on my understanding of how these models work. The internal mechanics may be different.
I’m not an ML researcher. I’m a developer who needed to get something working. And this is what actually helped.
I’m currently experimenting with Gemini voice models for my next project. I’m approaching it the same way. If I learn something useful, I’ll probably update this article.