The Evolution from the Computer Room to Data Center Clusters

Data center strategies have come a long way from where they began. In the beginning, most companies had just one "Data Center" and never thought twice about needing a second. It was not even called a Data Center; it was called the "Computer Room." Most companies did not have the funds or the vision to dedicate an entire building to a Data Center. If the computer (yes, singular) went down, it was a minor inconvenience. Today, when the computers go down, not only is your business impacted, your credibility is at risk.

As time went on and the dependency on computers increased, the need to have a backup plan gained traction. The first step was to protect the data beyond the local tape backups, and offsite data vaults started to pop up. That was a good start; however, it soon became clear that just having the data in a safe place was not enough. You needed a place to process the data should your computer room become inoperable. Even with the newfound attention, a dedicated backup to the computer room was cost prohibitive for most companies. This was a boon for companies like IBM and Sungard Availability Services, which offered companies the ability to use their network of data centers as the backup site.

For those companies that needed to go this route, the Recovery Point Objective (RPO: the age of the data that must be recovered from backup storage for normal operations to resume, aka data loss) and the Recovery Time Objective (RTO: the amount of time it would take to get the systems back online) were measured in days. Remember the technology of the time: trucks, not communication lines, were used to transport the tapes to the recovery site.
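To put rough numbers on it, here is a back-of-the-envelope sketch in Python; the backup interval, shipping, and restore times are illustrative assumptions, not figures from any particular shop:

```python
from datetime import timedelta

# Illustrative assumptions for a tape-and-truck recovery model
backup_interval = timedelta(hours=24)      # nightly full backup to tape
offsite_shipment = timedelta(hours=24)     # tapes couriered to the vault once a day
truck_to_recovery_site = timedelta(hours=12)
restore_from_tape = timedelta(hours=36)    # load configuration and data

# Worst case, the disaster strikes just before the next backup completes,
# so the newest recoverable data is a full backup cycle plus a shipment old.
rpo = backup_interval + offsite_shipment

# RTO is the time to get the tapes to the recovery site and restore them.
rto = truck_to_recovery_site + restore_from_tape

print(f"Worst-case RPO: {rpo}")   # ~2 days of potentially lost data
print(f"Approximate RTO: {rto}")  # ~2 days before systems are back online
```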

The reliance on computers continued to grow, resulting in the need to improve both the RTO and the RPO. To accomplish this, the delivery method needed to get faster, much faster. Enter the age of asynchronous data replication.

Depending on the distance between the sites and the speed of the data circuits, the RPO could be reduced to under a minute. The RTO would depend on the state of the computers; remember that they were still sitting at a managed site. The cost depended on the level of service in your contract (by the way, this model is very similar to the current Cloud Providers … I will save writing about that for the future). If the infrastructure was shared with other customers, you would first have to notify the managed service provider, and if the servers you signed up for were available for use, you would be given permission to proceed. The first step would be to get your configuration and data loaded; depending on the complexity and size of your environment, this could still take days before you reached system restoration. If the servers were not available for your use (someone asked for them before you did), you would have to scramble to come up with a solution. This led companies to decide between having dedicated servers at the managed site or building out their own disaster recovery capabilities.
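For illustration, here is a minimal Python sketch of the asynchronous model; the queue, the simulated circuit delay, and the record names are all assumptions, and real replication runs at the storage or database layer rather than in application code. The point it shows: writes are acknowledged locally before they reach the remote site, so whatever is still in flight when disaster strikes is potential data loss.

```python
import queue
import threading
import time

# Minimal sketch of asynchronous replication (illustrative only).
replication_queue: "queue.Queue[str]" = queue.Queue()
remote_site: list[str] = []  # stand-in for the recovery site's copy of the data

def write(record: str) -> None:
    """Commit locally and acknowledge immediately; ship to the remote later."""
    # ... local commit happens here ...
    replication_queue.put(record)  # replication lags behind the acknowledgment

def replicator() -> None:
    """Background shipper: drains the queue over the (slower) WAN circuit."""
    while True:
        record = replication_queue.get()
        time.sleep(0.05)  # simulated circuit latency / bandwidth limit
        remote_site.append(record)

threading.Thread(target=replicator, daemon=True).start()

for i in range(100):
    write(f"txn-{i}")  # every write is acknowledged before it reaches the remote

# Anything still queued when the primary site fails is the "lost" data;
# that backlog is exactly why the RPO of this model is greater than zero.
print(f"In flight (potential data loss): {replication_queue.qsize()} records")
```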

This brings us to about the late 1980s and early 1990s, long before the likes of Amazon and Rackspace. My personal experience was that we were able to switch to a self-recovery model for much less money and with significantly better recovery capabilities.

The two data center model using asynchronous data replication is a very robust solution as long as the network and servers are dedicated for your use. By eliminating the time needed to get your configuration online, you can achieve very good RTO and RPO numbers. That said, the RPO will always be greater than zero with this model, and you will need a process in place to recover the "lost" data/transactions.

As the reliance on computers continued to grow and the tolerance for downtime shrank, businesses required more than Disaster Recovery; they needed Business Continuity. Enter Synchronous Data Replication and Active/Active Server configurations. To accomplish this, you first needed to understand your business requirements. For companies with high performance needs, the data centers would need to be within 25 miles of each other to achieve synchronous data replication with minimal impact to performance. This close proximity did raise some eyebrows: should there be a localized incident, both data centers could be impacted. Depending on the business requirements and operating model, this led to two different data center strategies for providing Business Continuity and Disaster Recovery. The choices for the highest data center availability were either a two-site configuration with synchronous data replication, or a three-site configuration in which two sites use synchronous data replication and the third uses asynchronous data replication.
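The distance limit is ultimately a speed-of-light problem: a synchronous write is not acknowledged until the remote copy confirms it, so every write pays a full network round trip. A back-of-the-envelope sketch follows; the fiber constant, path inflation, and equipment overhead are illustrative assumptions:

```python
# Light in fiber travels at roughly 2/3 the speed of light in a vacuum,
# i.e. about 200 km per millisecond (~124 miles/ms).
FIBER_KM_PER_MS = 200.0

def sync_write_penalty_ms(distance_km: float, equipment_overhead_ms: float = 0.1) -> float:
    """Added latency per synchronous write: one full round trip plus gear overhead."""
    round_trip_km = 2 * distance_km
    return round_trip_km / FIBER_KM_PER_MS + equipment_overhead_ms

# 25 miles is roughly 40 km; fiber routes are rarely straight lines,
# so assume ~1.5x path inflation (an assumption, not a measured figure).
path_km = 40 * 1.5

print(f"{sync_write_penalty_ms(path_km):.2f} ms added to every write")  # ~0.70 ms

# At regional distances (~400 km of fiber path) the same math gives ~4 ms
# per write, which compounds badly for chatty, write-heavy workloads.
print(f"{sync_write_penalty_ms(400):.2f} ms per write at regional distances")
```

This is why the long leg of a three-site configuration falls back to asynchronous replication: at regional distances the round trip makes synchronous writes impractical for performance-sensitive workloads.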

As time went on, the businesses that needed the highest levels of availability had plenty of experience under their belts with the configuration they chose. Both had a common weakness: the network that supported the replication between the sites. Even with private fiber connecting them and multiple levels of redundancy, they were exposed to interruption. Circuits would go down for a multitude of reasons, and you would find yourself without redundancy, or completely isolated from the replication site.

A new paradigm was needed, one that removed the weakest link … the replication network. One way to accomplish this is to build two logical data centers within a single physical data center cluster. [A data center cluster is either multiple buildings on the same campus or independent, isolated data halls within the same building. You could technically put both logical infrastructures in a single data hall; however, if you need this level of redundancy, spend the extra money on physical separation so a single event like a fire does not take out both logical environments.] Then connect that site to an out-of-region data center with the same capabilities.
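To picture the resulting topology, here is a hypothetical sketch expressed as plain data; the names, distances, and link descriptions are illustrative assumptions, not a description of any specific build:

```python
# A hypothetical cluster-plus-out-of-region topology. Names and figures
# are illustrative assumptions only.
topology = {
    "campus_cluster": {
        "logical_dc_a": {"data_hall": "Building 1"},
        "logical_dc_b": {"data_hall": "Building 2"},  # physically separate hall
        "replication": "synchronous",   # campus-length fiber, negligible latency
        "links": "diverse private fiber within the campus",
    },
    "out_of_region_site": {
        "distance_miles": 500,          # far enough to survive a regional event
        "replication": "asynchronous",  # RPO > 0, but protects against the region
    },
}

# A single component failure (server, storage array, even an entire data hall)
# fails over inside the campus cluster with no data loss and no disaster
# declaration; only a campus-wide event forces a move to the remote region.
```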

If you already have a three-physical-site configuration, you will see significant savings should you decide to move to this new configuration, and those savings would go a long way toward funding the build-out of the enhanced solution. Being able to recover from multiple single-component failures without having to declare a disaster is a significant improvement over data center models that rely on an alternate physical site to recover.

You have accomplished two very critical needs: protecting your ability to run your business, and protecting your reputation.
