Note 7: Designing with Time in Mind: A Practical Reflection for Data Solutions Architects
This is a personal reflection on how time quietly shapes everything we, as architects, design and build. When I hear the word time in a data context, I notice my mind branches depending on the hat I'm wearing.
For instance, as a data scientist, I used to think of time series: forecasting, seasonality...
As a developer, one may think of Git: version control, rollbacks, diffs, and the history of changes.
As a data engineer, another might think of ACID transactions, especially the A and I parts (Atomicity and Isolation).
But now, as an architect, I’m beginning to see time somewhat differently! Not as a domain-specific concern, but as a foundational element that touches many decisions we make in data systems.
This isn't a novel insight. We all know time is the fourth dimension. Einstein, physics, even pop culture; they've all said it before.
But what's often missing from our technical discussions about data is an explicit approach to designing for time in modern architectures.
It’s not just about knowing "when" something happened. It’s about how data evolves, when it’s valid, how fresh it needs to be, how infrastructure scales over time, and how our systems respond to that change.
This article is a practical reflection; not on time as a scientific concept (I'm an architect, not a physicist), but as a dimension in data systems.
I’ll walk through how time shows up across architectural roles and what it means to treat it with intention.
In fact, I found it even more interesting to try and gather (as much as I could) moments where time quietly, or sometimes loudly, entangles itself with our architecting process.
In some moments, time dominates the conversation, becoming the main concern, such as when designing real-time event processing architectures, where latency directly shapes business decisions. In other moments, it quietly lingers in the background, as with schema evolution, subtly guiding how we adapt our data models while businesses evolve over months or even years.
As I reflected on different angles, I found the thinking easier to follow when grouped into four distinct categories. Each captures a different way time shapes our decisions: change tracking, real-time flow, temporal signals in models, or governance needs. This structure helped me make sense of it all, and I hope it helps you too as we explore each in turn.
1. Time as a Layer of Change: Data & Schema Versioning
In this category, I gathered the main moments where time helps us track how things change.
These are the patterns where we care deeply about what something looked like before, not just what it looks like now.
Here, I like to think of two different things we may want to version across time: first the data itself, and second the schema.
1.1. Point-in-Time Queries and Time Travel
This is one of the clearest ways time influences how we design data systems. At some point, someone will ask: “What did the data look like back then?”
At first, this might sound like a simple filter; something like WHERE date = '2025-03-01'.
But that kind of query only returns what the data looks like today for that date. It misses any corrections made after the fact, late-arriving records, or deletions that happened later.
And real data life is full of these: corrections, late arrivals, deletions.
This is where point-in-time queries go a level deeper (or maybe we should say, further back).
They let you reconstruct the exact state of your tables (data) as they were known at a specific moment.
Two scenarios where this capability becomes essential are:
From an architectural point of view, supporting this means building systems with a notion of memory: the ability to retain and restore historical states. That involves:
Many modern platforms build on this foundation by exposing time travel; a user-friendly “rewind” feature. Behind the scenes, this is powered by snapshot storage, append-only logs, or metadata that captures state transitions over time.
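To make the replay idea concrete, here is a minimal sketch of reconstructing table state from an append-only change log. The `Change` type, field names, and log entries are illustrative, not any particular platform's API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Change:
    key: str
    value: Optional[dict]  # None means the row was deleted
    ts: int                # commit timestamp (epoch seconds)

def as_of(log: list, ts: int) -> dict:
    """Reconstruct the table state as it was known at `ts`
    by replaying the append-only change log in commit order."""
    state = {}
    for change in sorted(log, key=lambda c: c.ts):
        if change.ts > ts:
            break  # everything after `ts` is "the future" for this query
        if change.value is None:
            state.pop(change.key, None)   # apply a deletion
        else:
            state[change.key] = change.value
    return state

log = [
    Change("order-1", {"amount": 100}, ts=10),
    Change("order-1", {"amount": 120}, ts=30),  # correction after the fact
    Change("order-2", {"amount": 50},  ts=20),
    Change("order-2", None,            ts=40),  # later deletion
]

print(as_of(log, 25))  # state as known at ts=25: both original rows
print(as_of(log, 45))  # state as known at ts=45: correction applied, order-2 gone
```

Notice how a plain `WHERE` filter against the current table could never return the first result; the correction and the deletion would already have overwritten it.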
That said, designing for memory comes with real costs, and some trade-offs should be weighed. The main considerations are storage overhead (keeping multiple versions of data uses space), query latency (reconstructing historical states may be slower than reading the current table), and governance complexity (more metadata, more policies, more edge cases to handle).
But when it’s done right, you give your system a critical superpower: The ability to revisit and verify any moment in the past; almost like hitting Ctrl + Z on the entire data platform!
1.2. Schema Evolution Tracking
Until I trained a churn model on a carefully prepared dataset and later found out the model failed in production because one of the features I used was no longer allowed, I hadn’t really thought about schema versioning. It just happened, quietly, without any notice.
It wasn’t part of the ML tutorials. No one warned me that the "structure" (a.k.a schema) of the data could change. But it did. And it still does. Often.
Let’s face it. Schemas change. New fields get added. Columns get renamed. Definitions shift, sometimes with no notice at all.
It doesn't just affect ML models. Pipelines break, reports stop working, and dashboards light up with errors.
Someone in the data team ends up spending the weekend fixing things.
Just a quick side comment: that's not a failure. It's actually a sign of movement, from a business perspective. If your schema has never changed, your business probably hasn't either!
But back to the real question: is your architecture ready for it?
I always come back to three questions when thinking about this:
If the answer to any of these is no, then we have a design problem!
Well... logging schema changes in a README file or tracking them manually in a spreadsheet is not enough.
So, when it comes to schema evolution, we need:
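One useful building block is a machine-readable schema diff that runs before deployment, rather than a README or spreadsheet. A minimal sketch, with illustrative column names and a deliberately simple "name -> type" schema representation:

```python
def diff_schema(old: dict, new: dict) -> dict:
    """Compare two schema versions (column name -> type) and
    report what changed, so breaking changes are caught early."""
    added   = {c: t for c, t in new.items() if c not in old}
    removed = {c: t for c, t in old.items() if c not in new}
    retyped = {c: (old[c], new[c]) for c in old.keys() & new.keys()
               if old[c] != new[c]}
    return {"added": added, "removed": removed, "retyped": retyped}

v1 = {"customer_id": "string", "plan": "string", "age": "int"}
v2 = {"customer_id": "string", "plan_name": "string", "age": "float"}

changes = diff_schema(v1, v2)
# A renamed column surfaces as one removal plus one addition;
# flagging that combination for human review is a safe default.
print(changes)
```

In a real platform this check would sit in a schema registry and gate deployments, but even this tiny diff answers the question "what changed between version 1 and version 2, and when?"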
2. Time as a Flow: Streaming & Scheduling Realities
So far, we’ve mostly talked about time as a way to look back; versioning, history, evolution.
But there’s another face of time. One that deals not with what was, but with what’s happening right now.
In this category, time governs how data flows through a system. It shapes how data arrives, when it gets processed, and how it's grouped and interpreted as it moves.
This is where time feels less like a reference field in a table and more like a driving force in the architecture.
2.1. Event Time vs. Processing Time
One of the first things you learn when working with streaming data is that there's more than one version of time: event time, when something actually happened, and processing time, when your system finally saw it.
In batch systems, this distinction often goes unnoticed. But in streaming systems, or anything distributed or asynchronous, it matters a lot.
A sensor might send a reading at 10:01, but if your pipeline doesn’t see it until 10:05, how do you treat it? Was it late? Is it out of order? Should it be discarded, reprocessed, or adjusted?
Architecturally, this becomes a decision about:
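One such decision is how to bucket events into event-time windows and what to do with readings that arrive too long after the fact. A minimal sketch; the window size, lateness threshold, and sensor events are all illustrative:

```python
from dataclasses import dataclass

@dataclass
class Event:
    sensor: str
    event_time: int       # when the reading was actually taken
    processing_time: int  # when the pipeline first saw it

def assign_windows(events, window_size=300, allowed_lateness=120):
    """Bucket events into tumbling event-time windows; events whose
    processing lag exceeds the allowed lateness go to a late-data path
    instead of silently landing in the wrong aggregate."""
    windows, late = {}, []
    for e in events:
        if e.processing_time - e.event_time > allowed_lateness:
            late.append(e)  # reprocess, adjust, or discard downstream
            continue
        start = e.event_time - e.event_time % window_size  # window start
        windows.setdefault(start, []).append(e)
    return windows, late

events = [
    Event("s1", event_time=601, processing_time=610),  # ~on time
    Event("s1", event_time=605, processing_time=900),  # arrived 295s late
]
windows, late = assign_windows(events)
```

Stream processors handle this with watermarks and configurable lateness rather than a hand-rolled loop, but the architectural decision is the same: pick the clock, pick the tolerance, and decide the fate of late data explicitly.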
2.2. Pipeline Timeliness and Scheduling
Not everything is a stream. But even scheduled pipelines carry a concept of time.
Whether it’s an hourly job, a nightly batch, or a daily sync, time defines:
This becomes especially important when you’re designing for SLAs, freshness-sensitive features, or anything “near real-time.”
Here you may start thinking about time as a control mechanism.
You have to think not just about what the pipeline does, but when it does it, what happens if it doesn't, and maybe when to reprocess it.
Should it reprocess yesterday’s failed run? Should it pick up only today’s data? Should it notify someone, or silently retry?
These are time-sensitive design choices that have nothing to do with the data itself, but everything to do with how users experience it.
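Those choices can be made explicit in a small freshness policy rather than left implicit in cron configs. A sketch, where the thresholds and action names are illustrative assumptions, not a standard:

```python
from datetime import datetime, timedelta, timezone

def check_freshness(last_success: datetime, sla: timedelta,
                    now: datetime) -> str:
    """Decide what a scheduler should do when a pipeline's last
    successful run falls behind its freshness SLA."""
    lag = now - last_success
    if lag <= sla:
        return "ok"                   # within the freshness budget
    if lag <= 2 * sla:
        return "retry"                # silently retry within one cycle
    return "alert_and_backfill"       # notify someone, reprocess the gap

now = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
print(check_freshness(now - timedelta(hours=1), timedelta(hours=2), now))
print(check_freshness(now - timedelta(hours=5), timedelta(hours=2), now))
```

The point is not these particular thresholds; it is that "how stale is too stale, and what happens then" becomes a reviewable design artifact instead of a surprise.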
3. Time as a Signal: Subtle but Critical in ML Models
When I introduced this article, I mentioned that time doesn’t always show up loudly. Sometimes, it lingers quietly in the background; shaping decisions without ever being named.
This is one of those moments.
In machine learning, time rarely appears as a headline field.
(And no, I’m not referring to time series modeling; there, time is obviously the main character. We already touched on that in the introduction.)
What I’m referring to here is more subtle. The kind of time that creeps in behind the scenes, even when it’s not explicitly part of the model.
This is where data scientists and ML engineers start asking questions like, “Can we still trust these predictions?”
And often, time is at the center of that doubt. (A side note: in linguistic terms, "still" is classified as an aspectual, or temporal, adverb.)
Yes, I’m talking about concepts like drift and staleness. They’re tricky to detect. But time, when used intentionally, becomes a signal; one that helps you monitor, realign, and improve your models.
In this section, I’ll highlight three key areas where time plays this quiet but critical role: Model Drift and Staleness, Feature Freshness, and Training vs. Serving Alignment.
3.1. Model Drift and Staleness
Model performance isn’t just about accuracy scores on day one. It’s about staying relevant over time; through changing data, shifting behavior, and business decisions that weren’t part of the original training set.
Yes, we are very skilled in catching the loud failures during development; the ones that throw error codes or break the pipeline.
But there’s another kind of failure that slips by silently.
If you're coming from a programming background, this is the difference between a syntax error and a logic error. The code runs fine, but it doesn’t do what you intended.
Models fail like that too. They make predictions. The pipeline runs. Nothing crashes. But slowly, the model starts making worse and worse decisions.
What changed? Time.
Let’s be honest; your model was trained to predict behavior. But your business doesn’t sit still. You introduce new campaigns, shift strategies, and try to change behavior. In doing that, you change the very patterns the model was built on.
And unless your system is designed to help the model learn from those changes, it starts to drift.
That’s why time matters; not as a data column, but as a marker of relevance.
If you can’t answer questions like:
… then you're not really monitoring your models. You're just watching them age.
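One common way to turn aging into a measurable signal is the Population Stability Index (PSI), which compares a feature's distribution at training time against what the model sees in serving. A minimal sketch; the bin counts are made-up and the usual rule-of-thumb thresholds are noted in the docstring:

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between a training-time and a
    serving-time histogram of the same feature. Common rules of
    thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # clamp to avoid log(0)
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

training = [30, 40, 30]   # feature distribution at training time
serving  = [31, 39, 30]   # roughly unchanged -> low PSI
drifted  = [10, 20, 70]   # mass moved to the last bin -> high PSI

print(round(psi(training, serving), 4))
print(round(psi(training, drifted), 4))
```

Tracked over time per feature, a metric like this answers "when was this model last aligned with reality?" instead of waiting for accuracy to visibly decay.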
3.2. Feature Freshness
Not all model inputs age the same way.
Some features, like a customer's birthdate or sign-up date, are basically timeless. Others, like page views or click counts, start losing value within minutes.
And here’s the thing: when those behavioral features go stale, your pipeline won’t complain. Everything runs. Predictions get served. But your accuracy? It quietly slips away.
No errors. No alarms. Just a slow fade (well... at least until you’ve mastered a few MLOps tricks).
This is where time shows up again, not in the data structure, but in what the data means. Because even though we’re still talking about data, here we care about when that data was last refreshed.
Its usefulness isn’t just about content, it’s about timing.
Even time-invariant fields, like a customer’s country or signup date, need to carry a sense of when they were last verified.
So what does this mean for ML architecture?
It means you treat freshness as part of the feature’s identity.
Then monitor what you’ve built:
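One lightweight way to make freshness part of a feature's identity is to attach a maximum age to each feature and check it on a schedule. The feature names and budgets below are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

# Freshness budget declared alongside each feature (illustrative).
FEATURE_MAX_AGE = {
    "country":          timedelta(days=365),   # slow-moving, verify yearly
    "clicks_last_hour": timedelta(minutes=10), # behavioral, decays fast
}

def stale_features(last_refreshed: dict, now: datetime) -> list:
    """Return the features whose last refresh exceeds their budget."""
    return [name for name, budget in FEATURE_MAX_AGE.items()
            if now - last_refreshed[name] > budget]

now = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
print(stale_features({
    "country":          now - timedelta(days=30),   # fine
    "clicks_last_hour": now - timedelta(hours=2),   # stale!
}, now))
```

Note that even the "timeless" field carries a budget; it just has a much longer one, which matches the idea that slow-moving attributes still need a last-verified timestamp.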
3.3. Training vs. Serving Alignment
This one can hit you hard; especially when everything looks like it’s working.
I’ve seen it happen to ML teams after they’ve spent weeks trying to convince the business side to trust this “new ML thing.”
And to be fair, skepticism is natural; especially when models don’t speak the same language as classical dashboards or SQL queries.
(Yes, I’m referring to the probabilistic nature of ML versus the deterministic world of SQL; where a question has exactly one answer, and no one asks about confidence intervals.)
And since you’ve probably noticed by now that I enjoy side comments, here’s a slightly nerdy one:
In embedding-space terms, “skepticism” and “probabilistic” both have a strong projection along the same latent “uncertainty” dimension; so their vectors cluster pretty tightly in that region of the semantic manifold.
But just when trust starts to build, the model goes into production… and things start drifting (yes, I use the "drift" term again).
Something is off. And more often than not, what’s off is time.
Take a financial risk model, for example. You compute a rolling 14-day balance volatility feature. It worked perfectly in training, where all data was clean and synchronized. But in production, if transactions arrive late, your model sees an incomplete picture, and makes a decision based on that.
The model isn’t broken. But it’s living in a different temporal reality than the one it was trained on.
This is what we call training-serving skew. And when that skew is caused by time, whether lag, drift, or inconsistent cut-offs, it’s called temporal skew.
It’s subtle, silent, and dangerous.
Yes, this could easily be framed as an ML engineering problem. But architecture is about designing with foresight; building in the precautions so things don’t fall apart later.
Here are some of the safeguards that should be baked into the architecture:
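One such safeguard is a point-in-time correct feature join: when building training sets, attach to each label only the feature values that were already known at that label's cutoff, so training sees the same temporal reality serving will. A sketch with illustrative entities and numbers:

```python
def point_in_time_join(labels, feature_history):
    """For each (entity, cutoff_ts, label), attach the latest feature
    value with ts <= cutoff_ts; never a value from the future."""
    rows = []
    for entity, cutoff, label in labels:
        visible = [(ts, v) for ts, v in feature_history.get(entity, [])
                   if ts <= cutoff]            # drop future knowledge
        value = max(visible)[1] if visible else None  # latest visible value
        rows.append((entity, cutoff, value, label))
    return rows

# A rolling 14-day volatility feature, recomputed over time (illustrative)
history = {"acct-1": [(100, 0.8), (200, 1.4), (300, 2.1)]}
labels  = [("acct-1", 250, "default"),   # gets the value from ts=200
           ("acct-1", 50,  "ok")]        # nothing known yet -> None

print(point_in_time_join(labels, history))
```

Feature stores implement exactly this kind of as-of join at scale; the discipline it enforces is that the training set never quietly leaks information the serving path could not have had.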
4. Time in Governance: Trust, Traceability, and Policy
This section might feel less intuitive than the others. Time shows up clearly in streaming systems, pipelines, and ML models.
But governance? It’s not always obvious.
Until, of course, you’ve dealt with it.
If you’ve ever had to revoke access for a user, track what someone could see last quarter, figure out when a dataset became sensitive, or delete records after a legal retention period; you’ve already dealt with time in governance. You just might not have called it that.
So while governance often gets framed as a set of policies, the moment you try to implement those policies, it becomes an architectural concern. And time sits quietly in the middle of it all.
From an architectural perspective, designing for time in governance, mainly, means solving three recurring challenges:
4.1. Versioned Lineage & Provenance
Here, architects may focus on using immutable change logs or versioned data stores (e.g., append-only event tables, CDC streams) to capture each write with metadata like who, when, and why.
Building on that, you might expose APIs or query layers that let systems reconstruct the state of a dataset "as of" a specific timestamp.
4.2. Retention & Tiered Lifecycle Management
Some data must be kept for a fixed period. Some must be deleted after that period. And some can be archived based on cost, risk, or policy.
Here, the architectural focus could be to codify retention policies directly into the platform with lifecycle rules, and to automate purging and archival workflows.
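Codified, a lifecycle rule can be as simple as a policy table that maps record age to an action. The tiers and retention periods below are illustrative, not legal advice:

```python
from datetime import datetime, timedelta, timezone

# Retention policy expressed as data (periods are illustrative).
POLICY = [
    (timedelta(days=90),  "hot"),      # keep online and queryable
    (timedelta(days=365), "archive"),  # move to cold storage
]

def lifecycle_action(created: datetime, now: datetime) -> str:
    """Classify a record by age: keep hot, archive, or delete."""
    age = now - created
    for max_age, tier in POLICY:
        if age <= max_age:
            return tier
    return "delete"  # past the full retention period

now = datetime(2025, 3, 1, tzinfo=timezone.utc)
print(lifecycle_action(now - timedelta(days=10),  now))   # hot
print(lifecycle_action(now - timedelta(days=200), now))   # archive
print(lifecycle_action(now - timedelta(days=400), now))   # delete
```

Because the policy is data rather than scattered cron jobs, auditors can read it, and changing a retention period is a one-line, reviewable edit.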
4.3. Temporal Access Control & Policy Evaluation
This becomes especially important in large organizations; particularly those that rely on outsourcing or have frequently changing access rights.
Sometimes, the question isn’t “can this person access the data now?” It’s “could they access it at that time?”
You should design for attribute-based access control (ABAC) or policy-as-code engines that evaluate permissions against the request timestamp.
This makes access enforcement time-aware and better reflects the real-world nature of shifting roles, evolving policies, and changing data classifications.
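A minimal sketch of time-aware access evaluation: grants carry validity intervals and expire instead of being edited in place, so "could they access it at that time?" stays answerable. The grant structure and names are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Grant:
    user: str
    dataset: str
    valid_from: datetime
    valid_to: datetime   # grants expire rather than being mutated

def could_access(grants, user, dataset, at: datetime) -> bool:
    """Evaluate access *as of* a given timestamp, so audits can answer
    'could they see it then?', not just 'can they see it now?'."""
    return any(g.user == user and g.dataset == dataset
               and g.valid_from <= at < g.valid_to
               for g in grants)

utc = timezone.utc
grants = [Grant("contractor-7", "sales",
                datetime(2024, 1, 1, tzinfo=utc),
                datetime(2024, 7, 1, tzinfo=utc))]

print(could_access(grants, "contractor-7", "sales",
                   datetime(2024, 3, 1, tzinfo=utc)))  # inside the interval
print(could_access(grants, "contractor-7", "sales",
                   datetime(2024, 9, 1, tzinfo=utc)))  # after expiry
```

Real policy-as-code engines evaluate richer attributes than this, but the core move is the same: the request timestamp is an input to the decision, not an afterthought.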
5. Closing Thoughts
Not every system is built around time, but almost every system is shaped by it.
In this reflection, I didn’t aim to provide a framework or checklist; just a way of noticing. Noticing where time sneaks in, where it demands clarity, and where ignoring it creates hidden fragility.
These patterns came from side notes, past decisions, and the kind of questions that surface only after things break; or almost do.
If there’s one thing I took from writing this, it’s that designing with time in mind doesn’t need to be complex. It just needs to be intentional.
And sometimes, that’s enough.