Getting Started with Network Automation
I have recently been asked by a few people how we got started and what lessons we learned along the way. I can tell you it hasn't been an easy road, but with every step we have grown stronger and smarter in how we approach our daily networking work. As the saying goes, "anything easy is probably not worth doing!" All too often I hear people say "we don't have the time" or "it will take too much work to get it done." It's true that we are all too busy with our "regular jobs" to take the time to think outside the box and find better, more efficient ways to get that same work done, but with a little creativity and passion we can all benefit from making the time.
Just a few months ago, we were not much different from any other network team: overworked, understaffed, and focused on keeping our clients happy and the lights on. Thinking about the future, much less acting on it, was a daunting challenge. We knew a large budget increase for dedicated automation positions wasn't going to happen, so we needed to work with the resources we had. The first step, then, was to find people with some time to spend on network automation. We started with a presentation to our management on the benefits of automation and what we could do today to position ourselves to carry it out in the near future. Once we had their support, it was much easier to announce the effort to the teams and recruit volunteers. I can't emphasize enough that getting the proper support in the early stages will make the transition much easier as time goes on. For us, this happened back in June. The rest of this article covers the steps we recommend and the lessons we learned while building a team that uses network automation to greatly improve the services we provide our clients. Note that we chose Ansible, but most of these lessons apply to whatever tool you decide to use.
As I mentioned, we went with Ansible over, let's say, Salt, Puppet, or Chef for a few reasons. One is that Ansible is agentless, and when you are dealing with network equipment such as switches and routers, a tool that requires an agent on each device just won't work. The bigger factor was that while automation has been going strong for years on the systems side of the house, it's still a new area for network engineers, and we wanted a tool whose backers had publicly committed to pushing the network agenda to the forefront rather than treating it as an afterthought. Red Hat has done a great job of dedicating developers to bringing network automation to where, in my opinion, it belongs: the spotlight! Oh yeah, and did I mention it's free? Whatever you and your team decide, make sure it's the best solution for you and the services you intend to deploy. Here are a few factors to consider:
- Budget and cost
- Types of devices you will support with it
- Types of services you will deliver with it
- Current skills and knowledge on the team
The next step may surprise some of you, while others may just say "of course!" As a network infrastructure team, you are now heading into uncharted waters: you will, for all intents and purposes, be developing code. Ansible playbooks, for example, are written in YAML, and you may also be writing Python and creating Jinja2 templates. This means your engineers will, to some extent, become developers over time. Trying to code and run a network automation effort under a waterfall model is doomed not only to fail but to fail badly. You need to look into Agile, and specifically Scrum, to get the most out of this effort. I won't get into all the details of that transformation in this article, but feel free to read my series on transforming a network infrastructure team to Agile. Here is a link to the first article: Agile Transformation for Network Infrastructure Teams - Part 1. Ideally, bring in a Certified ScrumMaster or, even better, an Agile coach to help with this part; worst case, look to your existing project managers who have an interest in Agile and see if one of them can take on that role, learning and teaching Agile best practices as you progress. Start small: use Scrum only for this network automation effort, and use it right away. In other words, begin a sprint cycle even before you have anything to code. We called it Sprint 0; it lasted two weeks and let us practice Scrum while keeping the momentum going as we installed our VMs, Python, Ansible, and any other tools we would need.
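To give a feel for the kind of "code" network engineers end up writing, here is a minimal Python sketch of the idea behind Jinja2 templates: a template with placeholders is combined with per-device variables to produce configuration lines. To keep it runnable without extra packages, it swaps in the standard library's string.Template as a stand-in for Jinja2 (where Jinja2 would use {{ name }}, this uses $name). The interface names, descriptions, VLAN numbers, and the render_interfaces helper are all hypothetical.

```python
from string import Template
from typing import Iterable, Mapping

# Hypothetical access-port template; Jinja2 would write {{ name }} instead of $name.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " switchport access vlan $vlan\n"
)


def render_interfaces(interfaces: Iterable[Mapping[str, object]]) -> str:
    """Render one config stanza per interface from a list of variable dicts."""
    # substitute() raises KeyError when a variable is missing, so a typo in the
    # data fails loudly instead of producing a half-rendered config.
    return "".join(INTERFACE_TEMPLATE.substitute(intf) for intf in interfaces)


if __name__ == "__main__":
    # Hypothetical per-device data, similar to what might live in host_vars.
    device_interfaces = [
        {"name": "GigabitEthernet1/0/1", "description": "User-Port", "vlan": 10},
        {"name": "GigabitEthernet1/0/2", "description": "Printer", "vlan": 20},
    ]
    print(render_interfaces(device_interfaces), end="")
```

The point is the separation: the template holds the device syntax once, and the data varies per device. Ansible's template module applies the same idea with Jinja2, which adds loops, conditionals, and filters on top.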
For testing and training environments, we installed VirtualBox on each of our laptops and ran Ansible in a CentOS VM. This made it quick and easy to get up and running for training purposes. There are plenty of installation guides out there, so I won't bore you with those details, but here are a few things to keep in mind, plus additional software to install during this phase:
- Set up a shared folder between your Windows host and your CentOS VM. Ideally, make that the folder for your Git repository (don't worry, I'll get to this).
- Select a good code editor and standardize on it across the team. We went with Microsoft Visual Studio Code, but there are many great editors out there, and many are free to install.
- Go directly to Git; do not pass Go... well, you know how it goes. Early on, we made the mistake of thinking we could store Ansible playbooks like regular files, only to discover later the power of having a Git repository. More on this later, as it deserves its own section.
- Create a "golden image" of your dev environment VM. That way, as more engineers join your cause, it's easy to get them up and running and learning right away. You can simply export an image from VirtualBox or use a tool like Vagrant for this.
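Before capturing the golden image, it can help to confirm that every tool the team expects is actually installed. The sketch below is a hypothetical helper, not part of Ansible: it uses only the Python standard library (shutil.which) to check whether each command is on the PATH. The tool list (ansible, ansible-playbook, git, python3) is an assumption; adjust it to whatever your Sprint 0 installs.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Assumed tool list for the dev VM; adjust to match your own golden image.
REQUIRED_TOOLS = ("ansible", "ansible-playbook", "git", "python3")


def check_tools(tools: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def report(tools: Iterable[str]) -> List[str]:
    """Print a status line per tool and return the names of missing tools."""
    results = check_tools(tools)
    for tool, path in results.items():
        print(f"{tool:<18} {path or 'MISSING'}")
    return [tool for tool, path in results.items() if path is None]


if __name__ == "__main__":
    missing = report(REQUIRED_TOOLS)
    if missing:
        print("Install before imaging:", ", ".join(missing))
```

Running this once inside the VM before exporting the image, and again on each new engineer's copy, is a cheap way to catch a half-provisioned environment early.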
Now that you are ready to go, begin your first true sprint, Sprint 1, by working on Ansible playbooks (scripts). The goal at this point is to learn Ansible and Scrum, not to develop every solution for every client you may have. So start small. Think of user stories that can be completed in two weeks, or however long your sprint is. Trust me, time will fly at first as you learn. Your team will get stuck on syntax errors for hours before realizing the problem was a single missing space. Ansible is very sensitive to syntax: watch case, spaces, quotes, and so on. This is another good reason to use an editor with plugins that help with Jinja2 and YAML. Make sure the whole team takes time to read the Ansible documentation; there is a lot to learn there. We didn't do that right away and would have saved time early on if we had. Here is a good place to start reading: Ansible for Network Automation. That being said, you can't learn it all by just reading. Code, code, code until you are cross-eyed! Each error you encounter will teach you more. Expect many hours of Google research, but it's all well worth it. Think about your regular network engineering job and how network automation can make it easier and quicker. As you learn more about what is possible, you will find more and more useful solutions. At each demo/retro meeting, we took time to brainstorm new ways to use Ansible in our daily tasks.
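Since one missing space or a stray tab can cost hours, a quick pre-check can save some pain. Here is a minimal, hypothetical Python sketch that flags two common whitespace mistakes in playbook text: tab characters in indentation (YAML does not allow tabs for indentation) and indentation that isn't a multiple of two spaces. The two-space rule is a team convention we're assuming, not a YAML requirement, and this is no substitute for real tools such as yamllint, ansible-lint, or ansible-playbook --syntax-check.

```python
from typing import List, Tuple

INDENT_STEP = 2  # Assumed team convention; YAML itself only requires consistency.


def check_yaml_whitespace(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for suspicious leading whitespace."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        # Skip blank lines and comment-only lines.
        if not stripped or stripped.startswith("#"):
            continue
        leading = line[: len(line) - len(stripped)]
        if "\t" in leading:
            problems.append((lineno, "tab character in indentation"))
        elif len(leading) % INDENT_STEP:
            problems.append((lineno, f"indentation of {len(leading)} spaces"))
    return problems


if __name__ == "__main__":
    # Hypothetical playbook snippet with a 3-space indent on the "hosts" line.
    playbook = (
        "- name: Show version on routers\n"
        "   hosts: routers\n"
        "  gather_facts: false\n"
    )
    for lineno, message in check_yaml_whitespace(playbook):
        print(f"line {lineno}: {message}")
```

This deliberately stays naive (it does not understand block scalars, for example), which is why a real linter belongs in the editor and, later, in the Git workflow.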
Collaboration! I know, it's overused term number 1,001 these days, but there is no escaping the truth. If we want to perform better and more efficiently, we need to not only enable but actively push a culture of collaboration. Here are a few items to work on in this area:
- If you aren't already, start using a collaboration tool. I'm a huge fan of Slack, but if that's not possible in your organization, Microsoft Teams has a ton of great features, and other similar tools work well too. This is especially important for team members who are remote from each other.
- Pair development works very well. We did this early on, and it accelerated our learning curve in a big way. It seems to work best to pair people in the same location, but over time, rotate the pairs to accelerate learning even more.
- Use the daily stand-ups not only to provide updates but also to enlist help from other team members. We solved many issues in the hour after our daily stand-up this way: team members would raise a blocker and then group with a few others to solve it right after the call.
Last but not least: Git. I promised I would get back to this. I wouldn't say it's the most important of all the steps we covered, but it is one that we as network engineers don't often think about. We definitely didn't, and it hurt us to have to backtrack in later stages to get our house in order. Once you are in full swing writing new playbooks and improving old ones, you will see how critical it is to use a Git repository. Now add to that a team of 5, 10, or 20 members all trying to update the same playbook or create new versions of it, and you can begin to see where a file server falls very short. Spend the time early on, during Sprint 0, to set up a Git repository. You don't want your proprietary data out on public GitHub, so that isn't a solution for most, if not all, organizations. We went with Bitbucket since our company already had it up and running for the DevOps teams, so we simply requested a repo and went from there. There are other options if this doesn't work for you; just make sure you take the time. For those who don't know, Git is not just a file system but a very different way of storing and controlling your code. The learning curve won't be easy at first, but once it clicks, it hits you all at once and really isn't bad at all. Here, too, do the reading, but practice over and over again so the commands stick. And please use the CLI and stay away from the GUI. Here is a good place to start reading: GIT Tutorial. Here are a few things to consider and plan as you learn and set up Git:
- A standard naming convention for files (playbooks and others)
- A folder structure for your Git repository. There are many recommended structures out there.
- A .gitignore file. Setting this up correctly at the beginning will help you avoid storing files in Git that aren't needed or don't need that level of control.
- The use of branches, such as a dev or sandbox branch. We had many debates on this and eventually settled on just Master, Dev, and Sandbox, but we are still evaluating whether that's the best approach or whether we need another structure. There is a lot of guidance for regular DevOps teams, but for the network side it's a bit slimmer, so we will share once we have a firm recommendation.
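As a starting point for the naming, folder structure, and .gitignore decisions listed above, here is a small Python sketch that scaffolds a repository skeleton and flags playbooks that break a naming rule. The layout is only loosely modeled on the sample directory layout in the Ansible documentation; the exact folder names, the .gitignore patterns (*.retry files are what older Ansible versions leave behind after a failed run), and the lowercase_with_underscores naming convention are our assumptions for illustration, not a standard.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Hypothetical layout, loosely modeled on Ansible's sample directory layout.
FOLDERS = (
    "inventories/production/group_vars",
    "inventories/production/host_vars",
    "inventories/lab/group_vars",
    "inventories/lab/host_vars",
    "playbooks",
    "roles",
    "templates",
)

# Files that usually don't need version control (assumed list; extend as needed).
GITIGNORE_PATTERNS = ("*.retry", "*.log", "__pycache__/", "*.pyc", ".vscode/")

# Assumed naming convention: lowercase words joined by underscores, .yml or .yaml.
PLAYBOOK_NAME = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*\.ya?ml$")


def scaffold_repo(root: Path) -> None:
    """Create the folder skeleton and a .gitignore under root (safe to rerun)."""
    for folder in FOLDERS:
        path = root / folder
        path.mkdir(parents=True, exist_ok=True)
        # Git does not track empty directories, so add a placeholder file
        # (".gitkeep" is a common convention, not a Git feature).
        (path / ".gitkeep").touch()
    (root / ".gitignore").write_text("\n".join(GITIGNORE_PATTERNS) + "\n")


def badly_named_playbooks(root: Path) -> List[str]:
    """Return file names in root/playbooks that break the naming convention."""
    return sorted(
        p.name
        for p in (root / "playbooks").glob("*")
        if p.is_file() and p.name != ".gitkeep" and not PLAYBOOK_NAME.match(p.name)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        scaffold_repo(root)
        (root / "playbooks" / "backup_ios_configs.yml").touch()
        (root / "playbooks" / "Backup Configs.yml").touch()
        print(badly_named_playbooks(root))
```

Agreeing on something like this in Sprint 0 matters more than the specific names chosen; a naming check like the one above can later run as a pre-commit hook so the convention enforces itself.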
In closing, I want to wish you the best of luck on your journey down the network automation path. We have not only learned a lot in four months, but we have also begun to see the fruits of our labor in our day-to-day jobs. We would love to connect with other teams going down this road and help in any way we can. Having a strong network automation community will benefit us all. Please feel free to reach out with any questions you may have, and stay tuned for future articles with more hands-on, practical solutions, including sample code, as we continue to grow!