Memory Control Techniques for Self-Optimizing Generative Systems: The Contract Review Use Case

In a self-optimizing generative system for contract review, the goal is to refine the LLM's ability to redline legal documents in a loop. While working on the contract review system, we also realized that bootstrapping it, i.e., building the initial task descriptions, was itself a task worth automating.

The architecture of the TritonGPT Contract Review Assistant relies on the following self-optimization loop (see Figure 1).

Figure 1. Evolutionary Memory Management for the Contract Review System: A Self-Optimizing Agent

The self-optimizing process functions with the following knowledge fragments (KFs):

- Task Description: the initial instructions the large language model executes for the redlining task

- Source Document: the original, unedited legal document (contract)

- Corrected Document: the same legal document redlined by an expert

- Expert Instructions: information provided by an expert that contributes to the general description of the task

- Optimization Task: a set of instructions that provides a "rubric" for the agent to evaluate and compare the AI-generated redline with the one created by the expert

- Task Modification: a set of instructions for modifying the initial task instructions to improve the performance of the redlining system and bring it closer to the performance exemplified in the expert-generated redlines

- Optimization Modification: a meta-level knowledge fragment that represents evolving knowledge about the optimization process itself

- Generated Edit: the source document redline produced by the generative system
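
These knowledge fragments can be sketched as a small data model. The sketch below is illustrative only; the class and field names (`KnowledgeFragment`, `activation_score`, `half_life_loops`) are assumptions for this post, not the production schema of the TritonGPT system.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
import time

class KFKind(Enum):
    """The knowledge fragment types used in the self-optimization loop."""
    TASK_DESCRIPTION = auto()
    SOURCE_DOCUMENT = auto()
    CORRECTED_DOCUMENT = auto()
    EXPERT_INSTRUCTIONS = auto()
    OPTIMIZATION_TASK = auto()
    TASK_MODIFICATION = auto()
    OPTIMIZATION_MODIFICATION = auto()
    GENERATED_EDIT = auto()

@dataclass
class KnowledgeFragment:
    kind: KFKind
    content: str
    version: int = 1
    activation_score: float = 1.0   # starts high; decays if the fragment stops helping
    half_life_loops: float = 4.0    # loops until the score halves (tunable)
    created_at: float = field(default_factory=time.time)

# A hypothetical Task Modification fragment entering the loop:
kf = KnowledgeFragment(KFKind.TASK_MODIFICATION,
                       "Prefer narrower indemnification language.")
```

Attaching `activation_score` and `half_life_loops` as metaknowledge on each fragment is what makes the decay and clearance policies discussed below possible.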

The memory management question is critical. In the self-optimization system, each loop generates modifications to the task description. Should we retain all new versions or manage them through an optimization process? The same applies to the optimization task itself: should we keep the history of all modifications in short-term or long-term memory, or perhaps discard it?

Our approach to this problem builds on the memory control techniques Vince Kellen, PhD, discussed in his "Metabolism of Knowledge" post:

Knowledge fragment half-life. Ideally, each knowledge bundle could have a metaknowledge component that identifies a couple of temporal aspects: its half-life (or the rate at which its value diminishes over time) and its clearance rate (the rate at which it needs to be removed from the AI platform). Today we typically control this by maintaining schedules of ingestion and removal from vector stores 'out-of-band' and according to the source of the knowledge fragment. For example, we may choose to manually remove outdated knowledge fragments from the various stores in either the AI platform or in the source systems feeding the AI platform.
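
One way to make half-life and clearance rate concrete is exponential decay with a removal floor. This is a minimal sketch of that idea, assuming age is measured in optimization loops and a fragment is cleared once its value falls below a threshold; the threshold value is an assumption, not a recommendation.

```python
def decayed_value(initial: float, age_loops: float, half_life: float) -> float:
    """Exponential decay: the fragment's value halves every `half_life` loops."""
    return initial * 0.5 ** (age_loops / half_life)

def due_for_clearance(value: float, clearance_threshold: float = 0.1) -> bool:
    """Clearance modeled as a floor: below it, remove the fragment from the store."""
    return value < clearance_threshold

# Two half-lives elapsed -> value drops to a quarter of its initial score.
v = decayed_value(1.0, age_loops=8, half_life=4)
```

A fragment's source system could set `half_life` at ingestion, replacing today's manual out-of-band removal schedules with a per-fragment policy.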

Knowledge pathways. With their potential for splitting and recombining via various agents or tools, knowledge fragments may travel through the AI platform and through the organization in complex and unplanned ways, especially with autonomous agents and humans sharing knowledge. A knowledge pathway can be described by following a knowledge fragment through the various pathways within the agentic service, within the AI platform and any of its components, and within the organization. The collection of these pathways, in aggregate, is complex, with many targets of intervention possible to improve the flow and quality of knowledge across the organization.
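
Describing a pathway starts with recording where a fragment travels. The sketch below is a toy append-only tracer; the component names ("ingestion", "redline_agent", etc.) are hypothetical stops, not components of any specific platform.

```python
from collections import defaultdict

class PathwayTracer:
    """Append-only trace of where each knowledge fragment travels."""
    def __init__(self):
        self.hops = defaultdict(list)  # fragment_id -> ordered list of components

    def record(self, fragment_id: str, component: str) -> None:
        self.hops[fragment_id].append(component)

    def pathway(self, fragment_id: str) -> list[str]:
        return list(self.hops[fragment_id])

# Trace a hypothetical Task Modification through the platform:
tracer = PathwayTracer()
for stop in ["ingestion", "vector_store", "redline_agent", "optimizer"]:
    tracer.record("task_mod_v3", stop)
```

Aggregating these per-fragment traces is what exposes the intervention points mentioned above: any component that appears on many low-scoring pathways is a candidate for improvement.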

We implement the following memory management steps in the optimization process:

- Performance-Driven Half-Life Adjustment: Each "Task Modification" version is associated with performance metrics that explain its contribution. We initially give it a high "activation score". If it leads to sustained performance improvement over multiple redlining loops, its "half-life" is extended and its "activation score" remains high. If it shows no improvement, degrades performance, or is superseded by a more effective subsequent modification, its "half-life" decays. The system maintains the rate of decay through an agentic introspection process.

- "Optimization Task" Evolution: The "Optimization Task" itself undergoes a similar process. The critical insight is that our initial ideas about how the system should be improved, or what the "evaluation rubric" should be, may be significantly suboptimal. If only the latest optimization task is used, we may lose the contributions of earlier versions: the version actively used in the current loop for evaluating performance is not necessarily the best one. We therefore retain the older versions in the context with their appropriate "scores". The "half-life" here is controlled by an agentic introspection process based on how well the "Optimization Task" correlates with expert assessment, or how efficiently it drives improvement in "Task Modifications."
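
The first step can be sketched as a simple update rule. This is an illustrative minimal version, assuming a scalar redline-quality metric; the extend/shrink factors and clamps are placeholder values, and in our system the decay rate is chosen by an agentic introspection process rather than fixed constants.

```python
def adjust_half_life(half_life: float, delta_score: float,
                     extend: float = 1.5, shrink: float = 0.5,
                     min_hl: float = 0.5, max_hl: float = 32.0) -> float:
    """Extend a Task Modification's half-life after improvement, shrink it otherwise.

    delta_score: change in the redline-quality metric attributed to this
    version over the last loop (positive = sustained improvement).
    """
    factor = extend if delta_score > 0 else shrink
    return min(max_hl, max(min_hl, half_life * factor))

hl = 4.0
hl = adjust_half_life(hl, delta_score=0.12)   # improvement: half-life grows
hl = adjust_half_life(hl, delta_score=-0.03)  # regression: half-life shrinks
```

Clamping the half-life keeps a briefly lucky version from becoming effectively permanent, and keeps an unlucky one recoverable.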

The techniques above have been established in the #AI field for decades. The research community offers numerous papers on evolutionary pruning and decay in the context of temporal management of long-term and short-term memory. The core of managing evolving instructions, such as "Task Modifications" and "Optimization Task" definitions, lies in evolutionary pruning and decay mechanisms; simply retaining all versions is unsustainable and suboptimal.

In the more recent #genAI context, our approach adapts memory management to task complexity and observed performance. The system dynamically adjusts the size and composition of the context window, prioritizing the types of knowledge fragments that are "in play" for the task at hand. For instance, if the #LLM struggles with a specific legal concept, more detailed semantic memory related to that concept is injected or prioritized.
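
Composing the context window under a token budget can be as simple as a greedy pass over scored fragments. The sketch below assumes fragments carry an activation score and a token cost; the fragment names and numbers are made up for illustration, and a production system would likely also enforce per-type quotas (e.g., always include the Source Document).

```python
def select_context(fragments: list[tuple[str, float, int]],
                   token_budget: int) -> list[str]:
    """Greedy context assembly: highest activation score first, within budget.

    fragments: (name, activation_score, token_cost) tuples.
    """
    chosen, used = [], 0
    for name, score, cost in sorted(fragments, key=lambda f: -f[1]):
        if used + cost <= token_budget:
            chosen.append(name)
            used += cost
    return chosen

# Hypothetical fragments competing for a 1,500-token window:
frags = [("task_desc", 0.9, 400), ("old_task_mod", 0.2, 300),
         ("indemnity_notes", 0.7, 500), ("expert_rubric", 0.8, 600)]
selected = select_context(frags, token_budget=1500)
```

Here the low-scoring older Task Modification is the one squeezed out, which is exactly the decay behavior the half-life mechanism is meant to produce.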

In metabolism-of-knowledge terms, a knowledge fragment that no longer fits well into the prompting floats about unused until it is removed.

Competition for selection (version "popularity", if versions can be efficiently tagged) can deprioritize prior versions.
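
If versions are tagged, popularity reduces to counting how often each version is selected into the context and normalizing. A minimal sketch, assuming per-version selection counters are available:

```python
def popularity_weights(selection_counts: dict[str, int]) -> dict[str, float]:
    """Normalize per-version selection counts into competition weights.

    Rarely selected versions get low weights and drift toward clearance.
    """
    total = sum(selection_counts.values()) or 1  # guard against empty history
    return {version: count / total for version, count in selection_counts.items()}

# Hypothetical selection history for three Task Modification versions:
w = popularity_weights({"v1": 1, "v2": 3, "v3": 6})
```

These weights could feed back into the activation scores, so that versions losing the selection competition decay faster without being discarded outright.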

More articles by Jack Brzezinski
