Memory Control Techniques for Self-Optimizing Generative Systems: The Contract Review Use Case

In a self-optimizing generative system for contract review, the goal is to refine the LLM's ability to redline legal documents in a loop. While working on the contract review system, we also realized that bootstrapping it, i.e., building the initial task descriptions, was itself a task worth automating.

The architecture of the TritonGPT Contract Review Assistant relies on the following self-optimization loop (see Figure 1).

Figure 1. Evolutionary Memory Management for the Contract Review System: A Self-Optimizing Agent

The self-optimizing process functions with the following knowledge fragments (KFs):

- Task Description: the initial instructions the large language model executes for the redlining task

- Source Document: the original, unedited legal document (contract)

- Corrected Document: the same legal document redlined by an expert

- Expert Instructions: information provided by an expert that contributes to the general description of the task

- Optimization Task: a set of instructions that provides a "rubric" for the agent to evaluate and compare the AI-generated redline with the one created by the expert

- Task Modification: a set of instructions for modifying the initial task instructions to improve the performance of the redlining system and bring it closer to the performance exemplified in the expert-generated redlines

- Optimization Modification: a meta-level knowledge fragment that represents evolving knowledge about the optimization process itself

- Generated Edit: the source document redline produced by the generative system
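
These knowledge fragments can be sketched as a small data model. The sketch below is illustrative only; the class and field names (`KnowledgeFragment`, `activation_score`, `half_life_loops`) are assumptions for this post, not the production schema of the TritonGPT system.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
import time

class KFKind(Enum):
    """The knowledge fragment types used in the self-optimization loop."""
    TASK_DESCRIPTION = auto()
    SOURCE_DOCUMENT = auto()
    CORRECTED_DOCUMENT = auto()
    EXPERT_INSTRUCTIONS = auto()
    OPTIMIZATION_TASK = auto()
    TASK_MODIFICATION = auto()
    OPTIMIZATION_MODIFICATION = auto()
    GENERATED_EDIT = auto()

@dataclass
class KnowledgeFragment:
    kind: KFKind
    content: str
    version: int = 1
    activation_score: float = 1.0   # starts high; decays if the fragment stops helping
    half_life_loops: float = 4.0    # loops until the score halves (tunable)
    created_at: float = field(default_factory=time.time)

# A hypothetical Task Modification fragment entering the loop:
kf = KnowledgeFragment(KFKind.TASK_MODIFICATION,
                       "Prefer narrower indemnification language.")
```

Attaching `activation_score` and `half_life_loops` as metaknowledge on each fragment is what makes the decay and clearance policies discussed below possible.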

The memory management question is critical. In the self-optimization system, each loop generates modifications to the task description. Should we retain all new versions or manage them through an optimization process? The same applies to the optimization task itself: should we keep the history of all modifications in short-term or long-term memory, or perhaps discard it?

Our approach to this problem builds on the memory control techniques Vince Kellen, PhD, discussed in his "Metabolism of Knowledge" post:

Knowledge fragment half-life. Ideally, each knowledge bundle could have a metaknowledge component that identifies a couple of temporal aspects: its half-life (or the rate at which its value diminishes over time) and its clearance rate (the rate at which it needs to be removed from the AI platform). Today we typically control this by maintaining schedules of ingestion and removal from vector stores 'out-of-band' and according to the source of the knowledge fragment. For example, we may choose to manually remove outdated knowledge fragments from the various stores in either the AI platform or in the source systems feeding the AI platform.
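
One way to make half-life and clearance rate concrete is exponential decay with a removal floor. This is a minimal sketch of that idea, assuming age is measured in optimization loops and a fragment is cleared once its value falls below a threshold; the threshold value is an assumption, not a recommendation.

```python
def decayed_value(initial: float, age_loops: float, half_life: float) -> float:
    """Exponential decay: the fragment's value halves every `half_life` loops."""
    return initial * 0.5 ** (age_loops / half_life)

def due_for_clearance(value: float, clearance_threshold: float = 0.1) -> bool:
    """Clearance modeled as a floor: below it, remove the fragment from the store."""
    return value < clearance_threshold

# Two half-lives elapsed -> value drops to a quarter of its initial score.
v = decayed_value(1.0, age_loops=8, half_life=4)
```

A fragment's source system could set `half_life` at ingestion, replacing today's manual out-of-band removal schedules with a per-fragment policy.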

Knowledge pathways. With their potential for splitting and recombining via various agents or tools, knowledge fragments may travel through the AI platform and through the organization in complex and unplanned ways, especially with autonomous agents and humans sharing knowledge. A knowledge pathway can be described by following a knowledge fragment through the various pathways within the agentic service, within the AI platform and any of its components, and within the organization. The collection of these pathways, in aggregate, is complex, with many targets of intervention possible to improve the flow and quality of knowledge across the organization.
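
Describing a pathway starts with recording where a fragment travels. The sketch below is a toy append-only tracer; the component names ("ingestion", "redline_agent", etc.) are hypothetical stops, not components of any specific platform.

```python
from collections import defaultdict

class PathwayTracer:
    """Append-only trace of where each knowledge fragment travels."""
    def __init__(self):
        self.hops = defaultdict(list)  # fragment_id -> ordered list of components

    def record(self, fragment_id: str, component: str) -> None:
        self.hops[fragment_id].append(component)

    def pathway(self, fragment_id: str) -> list[str]:
        return list(self.hops[fragment_id])

# Trace a hypothetical Task Modification through the platform:
tracer = PathwayTracer()
for stop in ["ingestion", "vector_store", "redline_agent", "optimizer"]:
    tracer.record("task_mod_v3", stop)
```

Aggregating these per-fragment traces is what exposes the intervention points mentioned above: any component that appears on many low-scoring pathways is a candidate for improvement.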

We implement the following memory management steps in the optimization process:

- Performance-Driven Half-Life Adjustment: Each "Task Modification" version is associated with performance metrics that explain its contribution. We initially give it a high "activation score". If it leads to sustained performance improvement over multiple redlining loops, its "half-life" is extended and its "activation score" remains high. If it shows no improvement, degrades performance, or is superseded by a more effective subsequent modification, its "half-life" decays. The system maintains the rate of decay through an agentic introspection process.

- "Optimization Task" Evolution: The "Optimization Task" itself undergoes a similar process. The critical insight is that our initial ideas about how the system should be improved, or what the "evaluation rubric" should be, may be significantly suboptimal. If only the latest optimization task is used, we may lose the contributions of earlier versions: the version actively used in the current loop for evaluating performance is not necessarily the best one. We therefore retain the older versions in the context with their appropriate "scores". The "half-life" here is controlled by an agentic introspection process based on how well the "Optimization Task" correlates with expert assessment, or how efficiently it drives improvement in "Task Modifications."
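
The first step can be sketched as a simple update rule. This is an illustrative minimal version, assuming a scalar redline-quality metric; the extend/shrink factors and clamps are placeholder values, and in our system the decay rate is chosen by an agentic introspection process rather than fixed constants.

```python
def adjust_half_life(half_life: float, delta_score: float,
                     extend: float = 1.5, shrink: float = 0.5,
                     min_hl: float = 0.5, max_hl: float = 32.0) -> float:
    """Extend a Task Modification's half-life after improvement, shrink it otherwise.

    delta_score: change in the redline-quality metric attributed to this
    version over the last loop (positive = sustained improvement).
    """
    factor = extend if delta_score > 0 else shrink
    return min(max_hl, max(min_hl, half_life * factor))

hl = 4.0
hl = adjust_half_life(hl, delta_score=0.12)   # improvement: half-life grows
hl = adjust_half_life(hl, delta_score=-0.03)  # regression: half-life shrinks
```

Clamping the half-life keeps a briefly lucky version from becoming effectively permanent, and keeps an unlucky one recoverable.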

The techniques above have been established in the #AI field for decades. The research community offers numerous papers on evolutionary pruning and decay in the context of temporal management of long-term and short-term memory. The core of managing evolving instructions, such as "Task Modifications" and "Optimization Task" definitions, lies in evolutionary pruning and decay mechanisms; simply retaining all versions is unsustainable and suboptimal.

In the more recent #genAI context, our approach adapts memory management to task complexity and observed performance. The system dynamically adjusts the size and composition of the context window, prioritizing the types of knowledge fragments that are "in play" for the task at hand. For instance, if the #LLM struggles with a specific legal concept, more detailed semantic memory related to that concept is injected or prioritized.
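
Composing the context window under a token budget can be as simple as a greedy pass over scored fragments. The sketch below assumes fragments carry an activation score and a token cost; the fragment names and numbers are made up for illustration, and a production system would likely also enforce per-type quotas (e.g., always include the Source Document).

```python
def select_context(fragments: list[tuple[str, float, int]],
                   token_budget: int) -> list[str]:
    """Greedy context assembly: highest activation score first, within budget.

    fragments: (name, activation_score, token_cost) tuples.
    """
    chosen, used = [], 0
    for name, score, cost in sorted(fragments, key=lambda f: -f[1]):
        if used + cost <= token_budget:
            chosen.append(name)
            used += cost
    return chosen

# Hypothetical fragments competing for a 1,500-token window:
frags = [("task_desc", 0.9, 400), ("old_task_mod", 0.2, 300),
         ("indemnity_notes", 0.7, 500), ("expert_rubric", 0.8, 600)]
selected = select_context(frags, token_budget=1500)
```

Here the low-scoring older Task Modification is the one squeezed out, which is exactly the decay behavior the half-life mechanism is meant to produce.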

In metabolism-of-knowledge terms, a knowledge fragment that no longer fits well into the prompting floats about unused until it is removed.

Competition for selection (version "popularity", if versions can be efficiently tagged) can deprioritize prior versions.
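
If versions are tagged, popularity reduces to counting how often each version is selected into the context and normalizing. A minimal sketch, assuming per-version selection counters are available:

```python
def popularity_weights(selection_counts: dict[str, int]) -> dict[str, float]:
    """Normalize per-version selection counts into competition weights.

    Rarely selected versions get low weights and drift toward clearance.
    """
    total = sum(selection_counts.values()) or 1  # guard against empty history
    return {version: count / total for version, count in selection_counts.items()}

# Hypothetical selection history for three Task Modification versions:
w = popularity_weights({"v1": 1, "v2": 3, "v3": 6})
```

These weights could feed back into the activation scores, so that versions losing the selection competition decay faster without being discarded outright.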

More articles by Jack Brzezinski
