Prompt Engineering as a Debugging Process
If you search “prompt engineering”, you’ll find a million posts telling you patterns that work. But instead of trying to memorize prompt patterns, I did something more boring. I kept a timeline of prompt changes in my project: what I changed, what broke, what improved, and what that probably says about how the model “thinks”.
This article is basically my prompt debug log, cleaned up into something readable.
Also: this is personal experience. I might be wrong. I'm writing this because I want to re-check my thinking and spot problems I still don't see.
Before anything: a short note on terminology
A prompt is just the input we give the model so it can produce its next output.
A Large Language Model is a next-word prediction machine. You give it some text, and it predicts what text should come next. A simple mental model: it predicts the next chunk of text, then the next, then the next, until it stops. It's extremely advanced autocomplete. That's why wording matters: it changes which pattern the model continues. The better the context, the better its predictions.
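To make the autocomplete picture concrete, here's a toy version of that loop. The word table below is a made-up stand-in for the model; real LLMs predict over sub-word tokens with billions of parameters, but the predict-append-repeat loop has the same shape.

```python
import random

# A toy "model": for each word, a distribution over possible next words.
# Entirely made up; the point is only the shape of the loop around it.
TOY_MODEL = {
    "the":    {"room": 0.6, "floor": 0.3, "<end>": 0.1},
    "room":   {"is": 0.7, "<end>": 0.3},
    "floor":  {"plan": 0.8, "<end>": 0.2},
    "plan":   {"<end>": 1.0},
    "is":     {"bright": 0.5, "empty": 0.4, "<end>": 0.1},
    "bright": {"<end>": 1.0},
    "empty":  {"<end>": 1.0},
}

def predict_next_word(prev_word: str) -> str:
    dist = TOY_MODEL.get(prev_word, {"<end>": 1.0})
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs)[0]

def autocomplete(prompt: str, max_words: int = 20) -> str:
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next_word(words[-1])  # predict the next chunk...
        if nxt == "<end>":                  # ...until it decides to stop
            break
        words.append(nxt)                   # append it and go again
    return " ".join(words)

print(autocomplete("the"))  # e.g. "the room is bright"
```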
What about temperature, top-K, and all those settings?
At temperature 0, the model always picks the most likely next word. Very predictable. Very safe. Sometimes very boring or flat. As you increase temperature (say 0.4, 0.7, 1.0), the model starts considering less likely options. You get more variety and creativity but also more randomness and occasional weirdness.
Top-K limits how many options the model considers at each step. A top-K of 40 means it only looks at the 40 most likely next words before choosing. Lower numbers mean more focus. Higher numbers mean more diversity.
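Here's a small, self-contained sketch of how those two knobs shape the choice of the next word. The candidate words and scores are made up, and real implementations differ in detail, but the mechanics are the same: top-K trims the list of candidates, and temperature decides how evenly the model spreads its bets across them.

```python
import math
import random

# Made-up scores ("logits") for candidate next words.
logits = {"room": 4.0, "kitchen": 3.2, "ceiling": 2.5, "banana": 0.1}

def sample_next_word(logits, temperature=0.7, top_k=3):
    # Top-K: keep only the K highest-scoring candidates.
    candidates = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    if temperature == 0:
        # Temperature 0: always take the single most likely word.
        return candidates[0][0]

    # Higher temperature flattens the distribution, giving less likely
    # words a better chance; lower temperature sharpens it.
    weights = [math.exp(score / temperature) for _, score in candidates]
    words = [word for word, _ in candidates]
    return random.choices(words, weights=weights)[0]

print(sample_next_word(logits, temperature=0.0))  # always "room"
print(sample_next_word(logits, temperature=1.0))  # mostly "room", sometimes others
```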
The actual problem my prompts were trying to solve
One of my project's goals sounded simple in one sentence: convert a 2D floor plan into a photorealistic 3D interior view, at eye level, without breaking the layout.
I am using image generation models (primarily Gemini) that take images as input and produce images as output.
The core challenge was getting the AI to look at a flat 2D floor plan and produce a realistic photo of what that room would look like from the inside, at eye level, with correct walls, windows, doors, materials, and lighting.
Here’s what actually happened, phase by phase.
Phase 1: The polite prompt (and why it failed)
The prompt: "Generate a photorealistic 3D interior photograph from this floor plan. Camera at human eye level. Floor at bottom, ceiling at top."
The result: top-down and bird's-eye views. Everything except what a person standing inside the room would actually see. What I learned: the model defaulted to top-down views because "floor plan → top-down" is a very strong mental shortcut.
Phase 2: Over-correcting with “DO NOT”
So I went into full angry mode: "What you MUST NOT create: top-down view; bird's-eye view; isometric view; any view showing the entire floor layout from above."
It helped, but it also introduced a strange side effect: the more "DON'Ts" I added, the more unstable the outputs felt. Lesson: negative constraints help, but too many become noise. They also don't tell the model what to do instead.
Phase 3: The shift to telling it what TO do, very precisely
Instead of saying "don't make a top-down view", I tried this: "Horizon line centered vertically at approximately 50% from the top. All vertical architectural lines remain perfectly parallel to image edges. Zero-Tilt: all vertical edges must be 90° to the horizon." This worked noticeably better. What I learned: positive, specific, measurable instructions beat negative ones.
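For what it's worth, this is roughly the shape that constraint block took in my prompt-building code. The wording below is illustrative rather than the full production prompt; the point is that every rule is positive and checkable.

```python
# Rough shape of the positive, measurable constraint block
# (illustrative wording, not the full production prompt).
CAMERA_GEOMETRY_RULES = """
Camera and geometry rules:
- Camera at human eye level.
- Horizon line centered vertically at approximately 50% from the top.
- All vertical architectural lines remain perfectly parallel to the image edges.
- Zero-Tilt: all vertical edges must be at 90 degrees to the horizon.
"""

def build_prompt(scene_description: str) -> str:
    # Scene description first, then the non-negotiable geometry rules.
    return scene_description.strip() + "\n\n" + CAMERA_GEOMETRY_RULES.strip()
```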
Phase 4: The role change mattered more than expected
I changed my system instruction from: “Act as an Architectural Visualizer” to: “Act as a Professional Interior Photographer”. The outputs shifted toward eye-level perspectives almost immediately.
My theory is that the model's training data associates "architect" with blueprints, plans, and top-down diagrams, while "photographer" is associated with eye-level shots, composition, and real-world perspectives. I'm not 100% sure this explanation is correct, but the results were consistent enough that I kept it.
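If you're using the google-generativeai Python SDK, the role goes in the system instruction, roughly like this. The model name is a placeholder (use whichever model you're actually calling), and other SDKs expose system instructions differently.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# The role lives in the system instruction, not in the per-request prompt.
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",  # placeholder; swap in the model you use
    system_instruction=(
        "Act as a Professional Interior Photographer. "
        "You photograph real interiors at eye level."
    ),
)

response = model.generate_content("Describe the shot you would set up for this room.")
print(response.text)
```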
Phase 5: Language mattered more than I expected
Calling structural elements “immutable anchors” worked better than saying “don’t move the walls”. The model seemed to respond much better to domain-specific terminology — words that are strongly tied to concrete concepts in its training data. What I learned: Use the vocabulary of the domain you want the output to match. If you want architectural accuracy, use architectural terms. If you want photographic quality, use photography terms.
Phase 6: Multi-pass beats single-pass, always
Trying to do everything in one prompt never worked reliably.
So I split the process:
Pass 1: Generate a render, a plain white architectural model with correct geometry, shadows, and perspective. No textures. No furniture. No colors. Low temperature (0.2) for consistency.
Pass 2: Take that render and add materials, furniture, and lighting. Use reference images for style. Higher temperature (0.4) for creative freedom.
This was far more stable than any single-pass prompt.
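Here's the shape of that pipeline as a sketch. generate_image is a hypothetical helper standing in for whatever image-in, image-out call you use (for me, a Gemini image model); the two prompts and the two temperatures are the point, not the wrapper.

```python
from PIL import Image

def generate_image(prompt: str, images: list, temperature: float) -> Image.Image:
    # Hypothetical helper: wrap your own image-in/image-out model call here
    # (for me, a Gemini image generation model).
    raise NotImplementedError("wire up your image model here")

def render_interior(floor_plan: Image.Image, style_refs: list) -> Image.Image:
    # Pass 1: geometry only. Plain white model, correct perspective,
    # no textures, no furniture, no colors. Low temperature for consistency.
    white_model = generate_image(
        prompt=(
            "Plain white architectural model of this room, eye-level camera, "
            "correct geometry, shadows, and perspective. No textures, "
            "no furniture, no colors."
        ),
        images=[floor_plan],
        temperature=0.2,
    )

    # Pass 2: materials, furniture, and lighting on top of the locked geometry.
    # Higher temperature for creative freedom; reference images set the style.
    return generate_image(
        prompt=(
            "Add realistic materials, furniture, and lighting to this render. "
            "Keep all walls, windows, and doors exactly where they are "
            "(immutable anchors). Match the style of the reference images."
        ),
        images=[white_model, *style_refs],
        temperature=0.4,
    )
```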
What actually helped
What actually helped wasn’t clever prompts. It was documenting what worked, what didn’t, and asking why.
This article comes from a real project I built, but the lessons apply even if you’re just typing questions into a chat window. No code. No settings. Just curiosity and patience.
Document everything as you go. Not in a polished way. Just messy notes: “Tried X, got Y, expected Z.” That’s it.
After a few weeks, go back and read your notes. Patterns will jump out. You’ll see your recurring mistakes. Experiment, document and analyze.
Why this works better than memorizing tips
Prompt engineering is deeply context-dependent. What works for interior rendering doesn’t directly work for code generation or essay writing. The patterns I found are specific to my use case, my model, and my inputs. Your patterns will be different.
I might be wrong about some explanations in this article. I’m explaining why I think certain things worked, based on my understanding of how these models work. The internal mechanics may be different.
I’m not an ML researcher. I’m a developer who needed to get something working. And this is what actually helped.
I’m currently experimenting with Gemini voice models for my next project. I’m approaching it the same way. If I learn something useful, I’ll probably update this article.