Code Blue at 4:07am...

Setting the scene:

CIX has a policy of monitoring the service supplied to every customer. Our Network Operations Centre (NOC) responds to alerts from this monitoring around the clock, 365 days a year. During the 10pm to 6am window, our standard operating procedure requires us to wait fifteen minutes if the equipment is customer owned, in case the alert was generated by a server rebooting while security patches are applied. In all other cases we react immediately. Each customer has supplied us with a unique escalation plan to follow when an alert fires. As a result of this process, more than 80% of service-affecting issues are reported to the customer by our NOC before the customer is aware of a problem.
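The waiting rule above can be sketched in a few lines of Python. This is only an illustration of the policy as described, not the tooling the NOC actually runs: the names `QUIET_START`, `QUIET_END`, `GRACE_PERIOD`, `response_delay` and `earliest_escalation` are hypothetical, and treating 10pm as inside the window and 6am as outside it is an assumption.

```python
from datetime import datetime, time, timedelta

# Hypothetical constants mirroring the policy described above.
QUIET_START = time(22, 0)              # 10pm, assumed to be inside the window
QUIET_END = time(6, 0)                 # 6am, assumed to be outside the window
GRACE_PERIOD = timedelta(minutes=15)   # wait applied to customer-owned equipment


def in_quiet_window(t: time) -> bool:
    """Return True if t falls in the overnight window, which wraps past midnight."""
    return t >= QUIET_START or t < QUIET_END


def response_delay(alert_at: datetime, customer_owned: bool) -> timedelta:
    """How long to wait before acting on an alert raised at alert_at."""
    if customer_owned and in_quiet_window(alert_at.time()):
        # The alert may just be a reboot during overnight patching.
        return GRACE_PERIOD
    # In all other cases, react immediately.
    return timedelta(0)


def earliest_escalation(alert_at: datetime, customer_owned: bool) -> datetime:
    """Earliest moment the escalation plan should be started."""
    return alert_at + response_delay(alert_at, customer_owned)
```

Under these assumptions, an overnight alert on customer-owned equipment is held for fifteen minutes before the escalation plan starts, while the same alert at midday, or on equipment that is not customer owned, is acted on at once.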

On Sunday 4th June the following story evolved as a result of this process:

4:07am, the CIX Network Operations Centre (NOC) was alerted that two servers monitored for a customer had stopped responding to pings. This indicated that the customer's services might be offline, but a visual check of the customer rack did not show any obvious problem.

4:25am, the NOC Commander began following the customer's escalation plan. Each phone number and email address was contacted in sequence but the customer was not reachable. Voicemail was left where possible, but these were office phones. The final entry in the escalation plan was an 'out of hours' mobile number, but this number was no longer in service. As it was a bank holiday weekend here in Ireland, we feared these messages might not be heard until Tuesday.

9:00am, with no response from anybody on the escalation plan contact list, we trawled through email communications with our customer and eventually found a personal mobile phone number for a senior executive. We left a voicemail on that number too.

11:00am, we made contact with the senior executive and she advised us that her company was now fully aware of the issue and was mobilising repair technicians to come on site.

2:15pm, two repair technicians arrived on site and identified that the 'top of rack' switch was not functioning. A power cycle recovered the failed switch.

3:00pm, full services were restored. The escalation plan was amended in agreement with the customer to include up-to-date contact details.

On Tuesday, 6th June, CIX received a thank you message from the company CEO. He explained that they offer an Enterprise SaaS solution predominantly to UK customers. Monday was a bank holiday in Ireland but not in the UK. He was delighted that the problem was rectified before it affected customers on Monday morning. He also asked us to add his personal mobile phone number to the end of our escalation plan as a final point of contact if all other contacts in the escalation plan failed to respond.

Another happy customer!

Great article Jerry. CIX have always been great at communicating outages (both planned and unexpected!) with your customers. I really appreciated that as a small customer of yours. Continued success!

Well done and nicely written
