Hypervisor HA

ossama rashad

Published May 10, 2016

In the classic data center, it was necessary to deploy a clustering solution to provide High Availability for critical services, such as email and databases.

In the virtual data center (VDC), however, High Availability features are integrated into the hypervisor layer and provide some, if not all, of the same failover capabilities.

In general, hypervisor HA provides much the same redundancy that an OS cluster did in a classic data center. In the event of a service or server failure, the OS cluster would restart processing on another node in the cluster. The hypervisor performs a similar function by restarting a failed VM on the same hypervisor or on another hypervisor in the cluster. One capability that may not carry over, however, is the ability to detect a service failure.

Can your hypervisor detect that a service, such as Microsoft Exchange, has failed on a VM and trigger a restart of the VM? While most OSs have internal processes to restart failed services, they will typically do so only a finite number of times before giving up. The time to restart a VM on another node varies with a number of factors, such as the number of services on the VM and the utilization of the hypervisor. That restart may or may not take longer than restarting just the services on another cluster node. If your hypervisor has a hot standby feature, where a synchronized copy of the VM is kept on another hypervisor, failover is almost instantaneous. However, this type of environment may carry other restrictions that should be considered.
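The escalation described above, a limited number of service restarts followed by a restart of the whole VM, can be sketched as follows. This is only an illustration: `recover_service`, `restart_service`, and `restart_vm` are hypothetical names, not the API of any particular hypervisor or operating system, and the default limit of three attempts is an assumption.

```python
from typing import Callable


def recover_service(
    restart_service: Callable[[], bool],
    restart_vm: Callable[[], None],
    max_service_restarts: int = 3,  # assumed limit, not a vendor default
) -> str:
    """Try a bounded number of service restarts, then escalate to a VM restart."""
    for attempt in range(1, max_service_restarts + 1):
        # restart_service is a hypothetical callback that returns True
        # when the service comes back healthy.
        if restart_service():
            return f"service recovered after {attempt} restart(s)"
    # The service never came back, so hand the problem to the hypervisor layer.
    restart_vm()
    return "escalated to VM restart"
```

Whether the escalation step exists at all depends on the hypervisor being able to see the service failure, which is exactly the question to ask of your platform.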

Deploying an OS cluster within a hypervisor environment is generally a bit more complex, because clusters require certain disk configurations. Since the hypervisor typically masks much of that configuration from the VM, those requirements need special attention. Deploying services into a cluster also usually requires additional IP addresses, DNS names, and other network components so that the service can float between the cluster nodes. Weigh the time to restart the services on another node against the time to restart the entire VM. The cluster, however, will natively be able to detect a service failure and trigger a restart.
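One way to frame that comparison is to add up detection time and startup time for each approach. The sketch below is a hypothetical helper, not a measurement tool; the class and function names are assumptions, and the timings you pass in must come from your own environment.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    detection_s: float  # seconds to notice the failure
    startup_s: float    # seconds to bring the service (or the whole VM) back

    @property
    def total_s(self) -> float:
        return self.detection_s + self.startup_s


def faster_option(cluster: RecoveryEstimate, vm_restart: RecoveryEstimate) -> str:
    """Name the approach with the shorter estimated outage."""
    if cluster.total_s < vm_restart.total_s:
        return "OS cluster failover"
    if vm_restart.total_s < cluster.total_s:
        return "hypervisor VM restart"
    return "tie"
```

The comparison covers recovery time only. It says nothing about the extra disk, IP, and DNS configuration that the cluster requires.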

Examine the maximum number of cluster nodes that the OS supports. You may need to configure multiple clusters for your services, which can complicate the environment even further. Finally, be aware of how failback is configured, especially if you are using both hypervisor HA and OS clusters. If the cluster is configured to fail the resource back immediately, then once the hypervisor restarts the VM, the service will suffer a second outage as it moves back to its original node.
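The interaction between hypervisor HA and cluster failback can be traced with a small model. It is a simplified sketch under stated assumptions: a two-node cluster, one failed VM that hypervisor HA restarts, and a hypothetical `immediate_failback` flag standing in for whatever failback setting your cluster software exposes.

```python
from typing import List


def failover_timeline(immediate_failback: bool) -> List[str]:
    """List the service-affecting events after a single VM failure."""
    events = [
        "outage: VM fails, cluster moves the service to the standby node",
        "no outage: hypervisor HA restarts the failed VM",
    ]
    if immediate_failback:
        # The cluster moves the service back as soon as the node rejoins.
        events.append("outage: service fails back to the restarted VM")
    else:
        events.append("no outage: service stays put until a planned failback")
    return events


def outage_count(immediate_failback: bool) -> int:
    """Count how many times the service is interrupted."""
    return sum(e.startswith("outage") for e in failover_timeline(immediate_failback))
```

In this model, immediate failback produces two interruptions for one failure, while a controlled failback produces one.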

For application clusters, or applications that are configured with redundant components, be sure that the redundant copies run on different hypervisors so that a server failure doesn’t impact the entire application. Depending on how the application is configured, you may or may not want to use hypervisor HA as well. If the application has an automated failback mechanism, you may not want to use hypervisor HA, since that could also cause service disruption during the failback process. If the failback can be controlled, then you may want to use hypervisor HA or some other process to restart the VM so that if the second node fails, you do not lose the service entirely.
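A placement check of this kind is easy to automate. The sketch below assumes you can export two mappings from your own inventory, VM to hypervisor and application to its redundant VMs. The function name and data shapes are hypothetical and not tied to any hypervisor's API.

```python
from collections import defaultdict
from typing import Dict, List


def find_colocated_replicas(
    placement: Dict[str, str],     # VM name -> hypervisor name
    groups: Dict[str, List[str]],  # application -> its redundant VMs
) -> Dict[str, List[str]]:
    """Return, per application, the replicas that share a hypervisor."""
    violations: Dict[str, List[str]] = {}
    for app, vms in groups.items():
        by_host: Dict[str, List[str]] = defaultdict(list)
        for vm in vms:
            by_host[placement[vm]].append(vm)
        # Any host carrying more than one replica is a single point of failure.
        shared = sorted(
            vm for host_vms in by_host.values() if len(host_vms) > 1 for vm in host_vms
        )
        if shared:
            violations[app] = shared
    return violations
```

For example, with `web-1` and `web-2` both on `hv-a`, the function reports both VMs under `web`. An application whose replicas sit on separate hypervisors does not appear in the result.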

