Defining High Availability and Disaster Recovery
High availability (HA) and disaster recovery (DR) are often treated as synonyms. A highly available infrastructure component or IT system is described as “fault tolerant” or as having the ability to “fail over”. An example of high availability at the component level is adding redundant power supplies. At the datacenter level, adding dual UPS units (A/B power) makes the power systems more highly available. To some, this implies the system is resilient enough to survive a disaster. Implementing high availability on its own, however, does not achieve disaster recovery. So what is the difference between high availability and disaster recovery?
Here are a couple of definitions. IEEE defines high availability as “…the availability of resources in a computer system, in the wake of component failures in the system,” while the Disaster Recovery Journal defines disaster recovery as “Resources and activities to re-establish information technology services (including components such as infrastructure, telecommunications, systems, applications and data) at an alternate site following a disruption of IT services.”
There are several key differences between the two concepts.
- Disaster recovery includes the use of an alternate site (geographic diversity), not just redundancy at the system or datacenter level.
- Disaster recovery includes a focus on re-establishing services after an incident, not just failover.
- Disaster recovery addresses multiple failures in a datacenter, while high availability typically accounts for a single predictable failure (such as the failure of a processor, memory or power supply).
- Disaster recovery includes the people and processes necessary to execute recovery, while high availability focuses on technology design and implementation.
When you take apart the two definitions, the differences between the two terms become much clearer.
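The single-failure point in the list above can be made concrete with a little availability arithmetic. The sketch below is only illustrative: the availability figures are made-up assumptions, the function names are hypothetical, and the model assumes component failures are independent of one another. It shows that redundancy inside one datacenter improves availability against component failures, but the result is still capped by the availability of the site itself.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    Assumes failures are independent: the group is down only when
    every copy is down at the same time.
    """
    return 1.0 - (1.0 - component_availability) ** copies


def with_site_risk(redundant_availability: float, site_availability: float) -> float:
    """All redundant copies live in one site, so the site is in series with them."""
    return redundant_availability * site_availability


# Illustrative (assumed) figures, not measurements.
power_supply = 0.99   # one power supply is up 99% of the time
site = 0.999          # the datacenter as a whole is up 99.9% of the time

single = parallel_availability(power_supply, 1)
dual = parallel_availability(power_supply, 2)
dual_in_one_site = with_site_risk(dual, site)

print(f"single supply:          {single:.6f}")
print(f"redundant pair:         {dual:.6f}")
print(f"redundant pair in site: {dual_in_one_site:.6f}")
```

Under these assumptions the redundant pair is far more available than a single supply, yet the combined figure can never exceed the availability of the one site that hosts both. That gap is what an alternate site is meant to address.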
Datacenter managers talking about infrastructure or applications might bring up high availability or redundancy when they really mean disaster recovery. Similarly, end users may talk about adding a “business continuity disaster recovery” solution when they really intend to make a service highly available. More often than not, elements of both high availability and disaster recovery are blended in the discussion. If you are a service provider, listening to the requirements and asking clarifying questions will help identify whether geographic diversity is needed and how much downtime can be tolerated before a system is restored.
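The two clarifying questions above can be captured as a simple decision sketch. Everything here is an assumption made for illustration: the `Requirement` fields, the `classify` function, and especially the five-minute threshold used to separate "needs failover" from "can wait for restoration" are hypothetical, and a real engagement would set them per service.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    needs_alternate_site: bool    # must the service survive the loss of a whole datacenter?
    max_downtime_minutes: float   # how long the service may be down before it is restored


def classify(req: Requirement, failover_threshold_minutes: float = 5.0) -> str:
    """Map the answers to the two questions onto HA, DR, or both.

    The threshold is an assumed cut-off: below it, the service cannot wait
    for a manual recovery and needs automatic failover.
    """
    needs_failover = req.max_downtime_minutes <= failover_threshold_minutes
    if req.needs_alternate_site and needs_failover:
        return "high availability and disaster recovery"
    if req.needs_alternate_site:
        return "disaster recovery"
    if needs_failover:
        return "high availability"
    return "neither indicated by these two questions"


# Example: a user asks for "DR" but only needs to ride through component failures.
print(classify(Requirement(needs_alternate_site=False, max_downtime_minutes=1)))
```

The point of the sketch is not the specific threshold but that the label a customer uses matters less than the answers to these two questions.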
Can Disaster Recovery Include High Availability?
Disaster recovery can, and often does, include high availability in the technology design. Often this configuration takes the form of highly available clustered servers for an application within a production datacenter, with backup hardware in the recovery datacenter. With data from the production server backed up or replicated to the recovery datacenter, systems are protected from component failures at the production datacenter and can also be recovered at the recovery datacenter if a disaster strikes the production site.
The ultimate combination of high availability and disaster recovery occurs when servers are configured as “active-active” or in a “continental cluster” across geographically diverse datacenters. In this case, clustered servers for an IT application reside in two different datacenters connected by a load balancer and a very low latency data connection. Data between the two servers is synchronously replicated and both systems are “active” at the same time. Should one datacenter be impacted by a disaster, the server in the second internet datacenter picks up the full load of the application and continues on uninterrupted. Add to this the people, processes and documentation necessary to manage and respond to a datacenter incident, and the high availability configuration is now incorporated within the disaster recovery program.
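The active-active behavior described above can be modeled with a toy simulation. This is a minimal in-memory sketch, not a real cluster: the `Site` and `ActiveActivePair` classes, the site names, and the round-robin routing policy are all assumptions chosen for illustration, and "synchronous replication" is reduced to applying each write to every healthy site before returning.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    healthy: bool = True
    data: dict = field(default_factory=dict)


class ActiveActivePair:
    """Two sites that both serve traffic and hold the same data."""

    def __init__(self, first: Site, second: Site) -> None:
        self.sites = [first, second]
        self._next = 0  # round-robin counter used by the (assumed) load balancer

    def _healthy_sites(self) -> list:
        healthy = [s for s in self.sites if s.healthy]
        if not healthy:
            raise RuntimeError("no healthy site available")
        return healthy

    def write(self, key: str, value: str) -> None:
        # Synchronous replication: the write returns only after every
        # healthy site has applied it.
        for site in self._healthy_sites():
            site.data[key] = value

    def route(self) -> str:
        # Load balancer: spread requests across whichever sites are healthy.
        healthy = self._healthy_sites()
        site = healthy[self._next % len(healthy)]
        self._next += 1
        return site.name


east, west = Site("dc-east"), Site("dc-west")
pair = ActiveActivePair(east, west)

pair.write("order-1", "paid")
print(pair.route(), pair.route())   # both sites take traffic

east.healthy = False                # a disaster takes out one datacenter
print(pair.route(), pair.route())   # the surviving site carries the full load
print(west.data["order-1"])         # replicated data is still available
```

What the simulation leaves out is exactly what the paragraph above adds at the end: the people, processes and documentation that turn this configuration into part of a disaster recovery program.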
The next time you are in a discussion about disaster recovery or high availability, think carefully about what is intended. Both high availability and disaster recovery mitigate risks to IT applications, but only disaster recovery includes geographic diversity and preparedness for the worst-case scenario.
Nitin Mishra heads the product management and solutions engineering functions at Netmagic Solutions. During his nine years with the company, he has been responsible for conceptualizing and packaging hosting and managed services focused on IT infrastructure requirements of Internet and Enterprise applications.