Don’t touch that switch…

Early one morning I was sitting at my admin desk, just as I did every day back then, when I began to notice my access to various systems, as well as to the outside world, disappearing for several moments and then reappearing. I gave a shout to see who else within earshot was experiencing the same issues. Everyone responded with similar stories. Several of us quickly headed upstairs to the server room and, after several minutes of investigation, discovered that our Cisco 6509 switches were in distress, but we could not figure out why.

The department head was not in the office that day and was unreachable, so the next guy in line surveyed the group for suggestions. While we were reviewing options, a few of the VPs and C-levels paid us a visit. You see, at this point the entire corporate office, roughly six hundred employees, could not use ANY of the network resources. No phones, ERP, production systems, building security, you name it, ALL gone! What the hell was going on?! After a few hours the executives decided to shut down the office for the remainder of the day.

All of us in IT camped out in the server room and troubleshot every system in an effort to locate a shred of information as to why we were down. No luck! So later that evening the decision was made to reboot the Cisco switches in an effort to start from scratch. Down they went! Once power was restored the switches came back up, but now we didn’t have even the intermittent access we had before the reboot. The IT sky was almost literally falling now! As it turned out, many edits had been made to the Cisco configs since the installation a few years prior, with one colossal mistake: NO usable configuration file backup had EVER been made. And what’s worse, we had NO onsite staff skilled at crafting layer three configs from scratch.

After hours on the phone with various support avenues, and several shifts for naps, we were able to return to limited functionality by early the next morning and to full functionality a couple of days after that.

So what went wrong, exactly? As it turns out, our switches were doing precisely what they were supposed to do. They were shutting down ports due to a hard loopback issue. And where did the hard loopback come from? One of our very own! One of the IT folks over in Support was not paying attention and plugged an Ethernet cable back into the same switch at his desk. To add insult to injury, one day later someone in another IT department made the exact same mistake with a small consumer-grade switch under his desk.

As a result of these two identical mistakes, several changes were made to documented policies and procedures. Among them: the switches were configured to isolate and shut down problem ports instead of producing cascade failures. ALL small switches were banned from the office. All infrastructure equipment is to have routine automatic configuration file backups, with human verification and sign-off on all configuration changes, as well as offsite storage of said backups!
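For anyone wondering what "routine backups with human verification" might look like in practice, here is a minimal sketch in Python. It is not what we actually ran. It assumes some other job already drops one file per device into a backup folder, and the `<device>.cfg` naming, the 24-hour freshness window, and the function names are all made up for illustration. It flags devices whose backup is missing, empty, or stale, and produces a SHA-256 fingerprint a reviewer could record when signing off on a config change.

```python
import hashlib
import time
from pathlib import Path


def check_backups(backup_dir, devices, max_age_hours=24.0, now=None):
    """Return {device: problem} for each device without a usable, recent backup."""
    now = time.time() if now is None else now
    problems = {}
    for device in devices:
        # Hypothetical naming scheme: one "<device>.cfg" file per device.
        path = Path(backup_dir) / f"{device}.cfg"
        if not path.is_file():
            problems[device] = "missing"
        elif path.stat().st_size == 0:
            problems[device] = "empty"
        elif now - path.stat().st_mtime > max_age_hours * 3600:
            problems[device] = "stale"
    return problems


def fingerprint(path):
    """SHA-256 of a backup file, so a sign-off refers to one exact version."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```

A check like this only proves that a recent, non-empty file exists. It says nothing about whether the config inside would actually bring a switch back up, which is why the human verification and offsite copy still matter.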

Who says IT is not exciting?

By Sam Rose