Elementary, My Dear Network
This article builds upon concepts defined in previous articles in the series. I do recommend reading them in order.
Scope of Network Automation
Although a network operates as a single autonomous system, it consists of numerous smaller subsystems, each requiring specific connections and configurations to achieve the desired overall network behaviour. Individual systems, such as routers, switches and transponders, manage their own configurations and maintain their own state data. Part of network automation is reconciling the individual systems with the greater network services. To distinguish these two levels, we will use the terms "Single Element" and "Network".
Single Element
Focusing on single elements means examining individual components of the network. This could be a specific switch, router, or firewall, and it means understanding that component's configuration, performance, and issues. If you look at a single device, you might see operational alarms which point to problems with that device, such as heat, CPU or RAM issues. Alarms like these describe the operational reality of a single element.
Network
Network-wide management addresses scenarios where multiple elements collaborate to deliver services. It encompasses overall topology, end-to-end performance, and the integration of network segments to ensure seamless communication and service delivery.
A basic network level truth is a circuit. A circuit will by definition include at least two elements working together to provide a connection. This circuit may comprise multiple hops. When we are considering how to configure our network, we need to think about not just the configuration of each side, but what this larger network service is and if it is consistently deployed.
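As a minimal sketch of what "consistently deployed" could mean for a circuit, the snippet below compares the settings that must agree at each end. The endpoint names, the `vlan` and `mtu` attributes, and the `circuit_mismatches` helper are all assumptions made for illustration, not part of any particular tool.

```python
def circuit_mismatches(endpoints, keys=("vlan", "mtu")):
    """Return the attributes whose values differ between the circuit's endpoints.

    `endpoints` maps a (hypothetical) device name to the circuit-related
    settings found on that device.
    """
    mismatches = {}
    for key in keys:
        values = {name: cfg.get(key) for name, cfg in endpoints.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


# Hypothetical two-ended circuit: each side looks valid on its own,
# but the MTU does not match, so the circuit as a whole is inconsistent.
circuit = {
    "router-a": {"vlan": 100, "mtu": 9000},
    "router-b": {"vlan": 100, "mtu": 1500},
}
print(circuit_mismatches(circuit))
```

Neither device would raise an alarm on its own here; the problem only becomes visible at the network level.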
How to differentiate?
If we are designing an automation system, how should we differentiate between these two possibilities? First, let us consider the concept of configuration.
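Configuration in our model has the following dimensions:
(1) The actual configuration running on each device.
(2) The intended configuration for each device.
(3) The network service, as an aggregate view of the current intent.
(4) The requests for change (intent records) made against that service.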
If a request for change is an immutable source of intent truth, then we can consider (4) as the sum of all requests for change to that particular service. Those change requests must follow some sort of format, typically one defined by an order system, so that they can be related effectively to actual configuration. Our service (3) is effectively an aggregate of the ordered sum of the requests. For example, records of type (4) above might be recorded as:
We can see how a service would aggregate these, hiding old information, displaying what is current. So for each of our time periods above (1, 2, 3) we would have the network service configuration of:
Each of these states will map to a set of intended configurations for actual devices, which is point (2).
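The relationship between dimensions (4), (3) and (2) can be sketched as a fold over the ordered change requests. Everything below is a hypothetical illustration: the `ChangeRequest` fields, the device names, and the three-step history are assumptions, chosen only so that the third request removes device B from the VLAN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    """An immutable intent record, dimension (4). All fields are hypothetical."""
    seq: int      # position in the ordered history
    action: str   # "add" or "remove"
    vlan: str
    device: str


def service_state(requests, up_to):
    """Fold the ordered requests into the current service view, dimension (3)."""
    members = {}
    for req in sorted(requests, key=lambda r: r.seq):
        if req.seq > up_to:
            break
        devices = members.setdefault(req.vlan, set())
        if req.action == "add":
            devices.add(req.device)
        elif req.action == "remove":
            devices.discard(req.device)
    return members


def intended_device_configs(state):
    """Project the service state onto per-device intended config, dimension (2)."""
    configs = {}
    for vlan, devices in state.items():
        for device in devices:
            configs.setdefault(device, set()).add(vlan)
    return configs


history = [
    ChangeRequest(1, "add", "acme", "A"),
    ChangeRequest(2, "add", "acme", "B"),
    ChangeRequest(3, "remove", "acme", "B"),
]

for period in (1, 2, 3):
    state = service_state(history, up_to=period)
    print(period, state, intended_device_configs(state))
```

The history itself is never edited; the service view simply hides superseded records, which is what keeps the requests usable as a source of intent truth.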
Finally we get to the actual configuration on the devices. This should be exactly the same as (2) unless a change is in progress. If a change is ongoing, for example we are going from step 2 to step 3 (remove VLAN from device B), then the config for device B must change, and potentially so must the config of other devices whose trunk ports are used solely to provide that VLAN connectivity to B. So we could model it in the following way:
VLAN Intent + Intent record (Remove device B from VLAN) -> VLAN Intent prime (this is our new intent). In the real world the intent record might be an HTTP DELETE on /vlan/acme/device/b. The DELETE does not need to specify the entire VLAN definition, just the change to the definition that is requested.
VLAN Intent prime + Device C config -> Device C config intent prime. This will be our new intended config for device C.
Device C config intent prime -> Device C config prime. This is the process of actually applying the calculated config to the device.
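The three steps above can be sketched as three small functions. This is a simplified illustration under stated assumptions: the dictionary shapes, the `TRANSIT_FOR` topology table (device C carries the VLAN only to reach B), and the in-memory `running` dict standing in for real devices are all hypothetical.

```python
# Hypothetical topology: device "c" carries VLANs only on behalf of device "b".
TRANSIT_FOR = {"c": {"b"}}


def apply_intent_record(vlan_intent, record):
    """Step 1: VLAN intent + intent record -> VLAN intent prime."""
    devices = set(vlan_intent["devices"])
    if record["method"] == "DELETE":
        devices.discard(record["device"])
    elif record["method"] == "PUT":
        devices.add(record["device"])
    return {**vlan_intent, "devices": devices}


def derive_device_intent(vlan_intent, device, device_config):
    """Step 2: VLAN intent prime + device config -> device config intent prime."""
    served = TRANSIT_FOR.get(device, set())
    needed = device in vlan_intent["devices"] or bool(served & vlan_intent["devices"])
    vlans = set(device_config["vlans"])
    if needed:
        vlans.add(vlan_intent["name"])
    else:
        vlans.discard(vlan_intent["name"])
    return {**device_config, "vlans": vlans}


def push_config(running, device, intended):
    """Step 3: apply the calculated config (a dict stands in for the device)."""
    running[device] = intended
    return running[device]


vlan_intent = {"name": "acme", "devices": {"a", "b"}}
running = {"c": {"vlans": {"acme", "mgmt"}}}

# Roughly what an HTTP DELETE on /vlan/acme/device/b would carry.
record = {"method": "DELETE", "device": "b"}

intent_prime = apply_intent_record(vlan_intent, record)
c_intent_prime = derive_device_intent(intent_prime, "c", running["c"])
c_config_prime = push_config(running, "c", c_intent_prime)
print(c_config_prime)
```

Keeping the steps as separate functions means each intermediate result (new intent, new intended config, applied config) can be inspected or tested on its own.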
How to get there?
This diagram illustrates an idealized version of network automation, where all services are fully defined, and declarative mappings between services and configurations are clearly established. However, in real-world automation projects, especially the successful ones, the focus often shifts to more practical, short-term, and achievable goals.
To achieve this, the existing device configuration can be used as a data source to inform and shape the new configuration. This approach closely mirrors how network configurations have traditionally been managed by engineers. You start with a clear definition of what the customer (internal or external) needs and implement changes by referencing the current configuration on the device. The goal is to ensure the new service functions as intended without disrupting existing services or functionality. Achieving this involves carefully assessing the scope of change, identifying which configuration components are dedicated to a specific service, and distinguishing those shared across multiple services.
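A small sketch of that scoping exercise, using the current configuration as the data source: given the VLANs carried on each trunk port, it separates ports dedicated to one service from ports shared with others. The port names, VLAN names and the `classify_ports` helper are hypothetical.

```python
def classify_ports(current_config, vlan):
    """Split the ports carrying `vlan` into dedicated and shared ones.

    `current_config` maps a port name to the set of VLANs it carries,
    as read from the existing device configuration.
    """
    dedicated, shared = [], []
    for port, vlans in sorted(current_config.items()):
        if vlan not in vlans:
            continue
        if vlans == {vlan}:
            dedicated.append(port)   # safe to remove with the service
        else:
            shared.append(port)      # only prune this VLAN from the port
    return dedicated, shared


# Hypothetical trunk configuration read from a device.
current = {
    "eth1": {"acme"},
    "eth2": {"acme", "globex"},
    "eth3": {"globex"},
}
dedicated, shared = classify_ports(current, "acme")
print(dedicated, shared)
```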
As automation systems evolve to become more authoritative, the reliance on the current configuration as a reference point diminishes. Instead, the focus shifts toward applying a new service based solely on its declarative intent. In this model, the final device configuration is derived as a perfect declaration, representing the aggregate of all relevant service states. This approach eliminates the need to inspect or depend on existing configurations, ensuring consistency and alignment with the intended state of the network.
In this authoritative model, we can demonstrate that the network is operating as intended by validating the following:
The advantage is that all of these are testable. If we are rigorous in our definition of change control and testing, then we can say with a high degree of certainty that what is running on our network is exactly what we intend.
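One check of this kind, assumed here purely for illustration, is that the actual configuration on every device matches the intended one. The sketch below reports drift per device; the data shapes and the `config_drift` helper are hypothetical.

```python
def config_drift(intended, actual):
    """Compare intended and actual per-device VLAN sets and report differences."""
    drift = {}
    for device in sorted(set(intended) | set(actual)):
        want = intended.get(device, set())
        have = actual.get(device, set())
        if want != have:
            drift[device] = {"missing": want - have, "unexpected": have - want}
    return drift


# Hypothetical state: device B still carries a VLAN that intent has removed.
intended = {"A": {"acme"}, "B": set()}
actual = {"A": {"acme"}, "B": {"acme"}}
print(config_drift(intended, actual))
```

An empty result is the testable statement that what is running matches what is intended, at least for the attributes being compared.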
Closing
This was a deep dive! If you've made it this far, it’s clear that you are interested in this subject. I’d love to hear your thoughts, feedback, or any contributions you’d like to make. Let’s keep the conversation going—reach out, and let’s explore these ideas further.
In the next article, I delve into the concept of time as it relates to network automation. This is an emerging focus, particularly in GitOps-centric projects like OpsMill and Kubenet. I believe time is the critical dimension that differentiates successful automation strategies, as even the most accurate declarative truths are ineffective without the ability to map intent across time. It can be found here.
Finally, I want to extend my gratitude to Wim Henderickx and Brad Zellefrow for their invaluable insights and ideas, which significantly influenced this article.
Thank you for reading, and I hope you’ll join me for the next instalment!
James Henderson great article and I completely agree on the need to have network-wide declarative models to define services and operate networks. Have a look at avd.arista.com, which has transformed many customers to leverage Git and CI/CD pipelines to operate their networks!
Having worked with James on this, it is also important to point out that keeping track of the relationships at all times is important. If you can track from the CLI all the way up to the original intent, then documentation, maintenance and troubleshooting become much easier. Combining that with a historical timeline, we will now be able to track what changes happened, when, and why.
Great write-up on a very important topic. It's interesting to see how you are presenting it. Last week at Autocon2 I presented something along the same lines during the Data modeling workshop on Monday. The language is slightly different but I think the idea is the same. Slides 83 to 87 in this presentation https://speakerdeck.com/dgarros/autocon2-workshop-data-modeling?slide=83 It is great to see more articles on this topic. I feel like the market isn't really thinking about it the same way yet.