52). LLM Nodes & Durable Patterns: Function calling and JSON schemas, retries, and guardrails that make AI outputs reliable instead of random

LLMs feel unpredictable when you treat them like smart text boxes.

You tweak a prompt, bump a temperature, swap a model… and suddenly:

  • The ad generator stops respecting character limits
  • The brief writer forgets key fields
  • The “structured” JSON answer shows up as a paragraph again

Nobody sees it until it hits production.

The fix isn’t “better prompts.” It’s treating every LLM step as a node in a system with contracts and control loops.

This is the mental model I use:

  • Each LLM step is a node
  • Each node has a bounded job, a typed interface, and tests
  • Four patterns hold it together: function calling, JSON schemas, retries that repair, and guardrails

Once you lock those in, you go from “prompt luck” to something you can actually ship and maintain.


What’s an “LLM node”?

Think of an LLM node exactly like you’d think of a service or component:

  • It takes structured input
  • It does one clear task
  • It emits structured output

That means:

  • You can test it with fixtures
  • You can measure it (latency, tokens, error rate)
  • You can version it and roll it back
  • You can swap the implementation without breaking downstream nodes, as long as it still respects the contract

In other words, “LLM node” is less about the model and more about how you wrap it:

  • Define the job
  • Define the input/output shapes
  • Wire retries and guardrails around it
  • Log what happened
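A minimal sketch of that wrapper in Python (the class and method names here are illustrative, not a specific library; it assumes the jsonschema package for validation):

import json
from dataclasses import dataclass

from jsonschema import validate  # pip install jsonschema


@dataclass
class LLMNode:
    """One bounded job with a typed interface and logging around it."""
    name: str
    input_schema: dict   # JSON Schema for what the node accepts
    output_schema: dict  # JSON Schema for what the node emits

    def call_model(self, payload: dict) -> dict:
        # Swap models or vendors freely; the contract is what stays fixed.
        raise NotImplementedError

    def run(self, payload: dict) -> dict:
        validate(payload, self.input_schema)   # reject bad input early
        output = self.call_model(payload)      # the one clear task
        validate(output, self.output_schema)   # enforce the output contract
        print(json.dumps({"node": self.name, "ok": True}))  # minimal run log
        return output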

The patterns below are just ways of making that concrete.


Pattern 1: Function calling is the backbone

Function calling turns the model from “writer” into “router + argument builder.”

Instead of free-form “do everything” output, the model:

  • Chooses which tool to use
  • Fills in typed arguments
  • Hands that to your system for validation and execution

You get a clean split:

  • Model: pick tools, guess arguments
  • System: validate, run, and sanity-check results

A few rules that keep this sane:

  1. Make tools narrow
  2. Use strong typing on arguments
  3. Validate before execution
  4. Wrap tools in timeouts and circuit breakers
  5. Make calls idempotent or include request IDs
  6. Log the chain

That’s what lets you debug “why did the agent decide to do this?” later.

Function calling isn’t magic. It’s just a clean way to turn LLM intent into typed actions your system can trust.
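Here’s a hedged sketch of the system side of that split. The tool, its argument schema, and the stubbed implementation are all made up for illustration; the point is that nothing runs until the model’s arguments pass validation:

import json

from jsonschema import validate  # pip install jsonschema

# One narrow tool with strongly typed arguments (rules 1-2).
TOOLS = {
    "get_product_price": {
        "schema": {
            "type": "object",
            "additionalProperties": False,
            "required": ["sku"],
            "properties": {"sku": {"type": "string", "minLength": 1}},
        },
        "fn": lambda args: {"sku": args["sku"], "price_usd": 49.0},  # stub
    },
}

def execute_tool_call(name: str, raw_args: str, request_id: str) -> dict:
    tool = TOOLS.get(name)
    if tool is None:
        raise ValueError(f"model requested unknown tool: {name}")
    args = json.loads(raw_args)     # arguments as the model produced them
    validate(args, tool["schema"])  # rule 3: validate before execution
    result = tool["fn"](args)       # rule 4: wrap in a timeout in real code
    # Rule 6: log the chain, keyed by a request ID (which also helps rule 5).
    print(json.dumps({"request_id": request_id, "tool": name, "args": args}))
    return result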


Pattern 2: JSON schemas make outputs contract-first

Natural language is ambiguous. Contracts aren’t.

If you let a model answer in free text and then try to parse it, you’ll fight edge cases forever.

Better pattern:

  1. Define a JSON schema for the output
  2. Tell the model to respond only with JSON that matches that schema
  3. Validate the output before anything downstream touches it

Example idea (simplified):

{
  "type": "object",
  "additionalProperties": false,
  "required": ["audience", "offer", "channels", "kpis"],
  "properties": {
    "audience": { "type": "string", "minLength": 3 },
    "offer": { "type": "string", "minLength": 3 },
    "channels": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "string",
        "enum": ["email", "linkedin", "x", "blog", "ads"]
      }
    },
    "tone": {
      "type": "string",
      "enum": ["direct", "casual", "formal"]
    },
    "kpis": {
      "type": "array",
      "minItems": 1,
      "items": { "type": "string" }
    },
    "constraints": {
      "type": "array",
      "items": { "type": "string" }
    }
  }
}

That schema says:

  • No extra keys
  • Audience and offer must be real strings
  • Channels must be chosen from a safe list
  • Tone must be one of three options
  • KPIs and constraints are lists of strings

Now you can:

  • Run the model output through a JSON Schema validator
  • Reject or repair on failure
  • Pass only validated output into other nodes or systems
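With the jsonschema package in Python, that gate is a few lines (BRIEF_SCHEMA stands in for the schema above):

import json

from jsonschema import Draft202012Validator  # pip install jsonschema

def parse_brief(model_output: str, schema: dict) -> dict:
    # Raises if the model answered with anything other than JSON.
    obj = json.loads(model_output)
    # Collect every violation, not just the first, for better repair prompts.
    errors = [e.message for e in Draft202012Validator(schema).iter_errors(obj)]
    if errors:
        raise ValueError("schema validation failed: " + "; ".join(errors))
    return obj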

The difference in practice:

  • Without schema: “it mostly works until it doesn’t”
  • With schema: you know exactly when it’s off, and you can handle it


Pattern 3: Retries that repair, not just repeat

“Just retry it” helps when:

  • The model hit a transient error
  • The API call failed
  • The network glitched

It does nothing when:

  • The output fundamentally doesn’t match your contract
  • The model misunderstood the instructions
  • The output keeps failing schema validation

You want retries that repair. That usually looks like a small loop:

  1. Call the model with your normal prompt
  2. Validate against schema
  3. If valid → done
  4. If not valid: send the schema, the validation errors, and the failing output back in a repair prompt, then re-validate
  5. If it still fails after N attempts: stop, log it, and fail loudly (fallback output, explicit error, or human review)

The repair prompt is simple:

“Here is a JSON schema and an object that failed validation. Fix the object so it passes validation. Return only valid JSON, nothing else.”

Pair that with:

  • A strict JSON Schema validator
  • A max retry count
  • Logging of failure reasons

Now your node isn’t just “retrying and hoping,” it’s actively using the schema to fix its own output.
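A sketch of that loop, assuming a call_model(prompt) -> str function for whatever client you use:

import json

from jsonschema import ValidationError, validate  # pip install jsonschema

REPAIR_PROMPT = (
    "Here is a JSON schema and an object that failed validation. Fix the "
    "object so it passes validation. Return only valid JSON, nothing else."
)

def generate_with_repair(call_model, prompt: str, schema: dict,
                         max_attempts: int = 3) -> dict:
    raw = call_model(prompt)                      # 1. normal prompt
    for attempt in range(1, max_attempts + 1):
        try:
            obj = json.loads(raw)
            validate(obj, schema)                 # 2. validate against schema
            return obj                            # 3. valid -> done
        except (json.JSONDecodeError, ValidationError) as err:
            print(f"attempt {attempt} failed: {err}")  # log failure reasons
            if attempt == max_attempts:           # 5. give up loudly
                raise RuntimeError(
                    f"still invalid after {max_attempts} attempts") from err
            raw = call_model(                     # 4. repair, don't just repeat
                f"{REPAIR_PROMPT}\n\nSchema:\n{json.dumps(schema)}\n\n"
                f"Failing object:\n{raw}\n\nValidation error:\n{err}"
            )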

This pattern is especially powerful when:

  • You’re building briefs, configs, or structured plans
  • The first answer is almost always “close, but not quite”
  • You can’t afford malformed outputs downstream


Pattern 4: Guardrails

Guardrails are the rules around the model, not inside the prompt.

They answer questions like:

  • What inputs do we refuse?
  • What outputs do we block or sanitize?
  • What costs or token budgets are acceptable?
  • When do we stop and ask a human?

Some practical guardrails to consider:

1. Input filters

  • Reject prompts that contain certain patterns (PII, secrets, disallowed topics)
  • Normalize or redact inputs (emails, phone numbers)
  • Enforce length limits and truncate safely

2. Output filters

  • Block outputs with banned content, slurs, or unsafe instructions
  • Strip secrets, identifiers, or internal IDs
  • Run a second “safety classifier” model if needed

3. Cost and token budgets

Per node and per request, enforce:

  • Max prompt tokens
  • Max completion tokens
  • Max tool calls per request

If the node hits a budget:

  • Cut it off
  • Return an explicit “truncated / partial” response
  • Log that as a budget event, not a generic failure

4. Environment constraints

  • Different guardrail settings for dev, staging, prod
  • Different tools allowed per environment
  • Stricter cost limits in prod

5. Human-in-the-loop hooks

  • Any time the node is uncertain (low confidence, repeated failures), emit a review event
  • Route that to Slack, email, or a review UI
  • Let humans correct and feed that back into your prompts/schemas later

Guardrails don’t make the model perfect. They make the system predictable enough to trust.
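As one concrete example, the token budget from point 3 can be a thin wrapper around the prompt. The whitespace “tokenizer” below is a deliberate stand-in; real code would count tokens with your model’s actual tokenizer:

from dataclasses import dataclass

@dataclass
class Budget:
    max_prompt_tokens: int = 2000
    max_completion_tokens: int = 1000
    max_tool_calls: int = 5

def enforce_prompt_budget(prompt: str, budget: Budget) -> str:
    words = prompt.split()  # crude stand-in for a real tokenizer
    if len(words) <= budget.max_prompt_tokens:
        return prompt
    # Cut it off, mark it explicitly, and log a budget event, not a failure.
    print({"event": "budget_exceeded", "kind": "prompt_tokens",
           "count": len(words)})
    return " ".join(words[: budget.max_prompt_tokens]) + "\n[TRUNCATED]"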


Putting It Together: How An “LLM Node” Actually Looks

Take a marketing brief generator node as an example.

Input contract

{
  "type": "object",
  "required": ["product", "audience", "goal"],
  "properties": {
    "product": { "type": "string" },
    "audience": { "type": "string" },
    "goal": { "type": "string" },
    "constraints": {
      "type": "array",
      "items": { "type": "string" }
    }
  }
}

LLM node behavior

  1. Validate inputs against input schema
  2. Use function calling / tools if needed (e.g., fetch product details)
  3. Call the model asking for JSON that matches the CampaignBrief schema
  4. Validate the output
  5. If invalid: run the repair loop from Pattern 3, then fail loudly after N attempts

Guardrails

  • Limit tokens (e.g., 2k prompt, 1k completion)
  • Block certain channels or KPIs based on compliance rules
  • Send to human review if validation keeps failing or the node reports low confidence

Logging

For each run, record:

  • Input payload (or a redacted version)
  • Model, version, and parameters
  • Tool calls and results
  • Validation results and repair attempts
  • Final JSON output
  • Tokens, latency, and cost

At that point, you don’t have “a prompt.” You have a component that:

  • Can be unit tested
  • Can be regression tested with golden fixtures
  • Can be upgraded and rolled back like any other service
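As a sketch, the whole node is a short composition of the pieces above. It reuses generate_with_repair from Pattern 3, and input_schema / brief_schema stand in for the two contracts shown earlier; the model call is whatever client you wire in:

import json
import time

from jsonschema import validate  # pip install jsonschema

def run_brief_node(call_model, payload: dict, input_schema: dict,
                   brief_schema: dict) -> dict:
    validate(payload, input_schema)  # 1. enforce the input contract
    prompt = (
        "Write a campaign brief. Respond only with JSON matching this schema.\n"
        f"Schema:\n{json.dumps(brief_schema)}\n\nInputs:\n{json.dumps(payload)}"
    )
    start = time.monotonic()
    brief = generate_with_repair(call_model, prompt, brief_schema)  # steps 3-5
    print(json.dumps({                      # one structured record per run
        "input": payload,                   # redact in real code
        "model": "model-and-params-here",   # placeholder for model + params
        "output": brief,
        "latency_s": round(time.monotonic() - start, 2),
    }))
    return brief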


Why These Patterns Hold Up Over Time

Models will change.

Vendors will change.

Your stack will change.

If you rely on “prompt magic,” you’ll constantly chase regressions.

If you rely on:

  • Function calling
  • JSON schemas
  • Retries that repair
  • Guardrails

…you can swap in new models, tools, and backends as long as one thing stays true:

Each LLM node keeps honoring its contract.

That’s what makes the behavior durable.

You’re not betting on one model. You’re betting on the discipline of treating LLM steps as real nodes in a real system.

And once you do that, AI stops feeling like a science experiment and starts feeling like the rest of your engineering work: defined, observable, and fixable when it breaks.

Dino Cajic
