This is part one of a three-part series that explores disaster-recovery options, from models to environments to public cloud offerings. Are you ready?
In technological terms, you can define a disaster as any unplanned event that impacts your business’s continuity and stability. People often think of natural disasters such as earthquakes, floods, and storms. But a disaster could also take any of the following forms:
Building a solid disaster recovery (DR) solution is not just about meeting your customer or compliance requirements. The steps you take to plan and build a suitable DR solution also help ensure your business’s stability and safeguard its reputation.
Extended outages to critical systems can result in the following consequences:
You should build your DR plan on a sound set of policies, processes, tools, and documentation. The plan should also focus on restoring your critical data and applications in the event of a disaster. Critical steps in your plan should include:
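Recovery time objective (RTO) and recovery point objective (RPO) are two key metrics to consider when planning and designing your DR solution. RTO is the maximum acceptable time to restore a system after an outage. RPO is the maximum acceptable amount of data loss, measured as a span of time before the outage.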
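As a rough illustration of how these two metrics can be checked against a real incident, the following Python sketch computes the data-loss window and the downtime from three timestamps and compares them to the targets. It assumes the usual definitions (RTO as the longest tolerable time to restore service, RPO as the longest tolerable window of lost data). The function names and the incident timeline are hypothetical and chosen only for this example.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_good_backup, outage_start, service_restored):
    """Return (data_loss_window, downtime) for a single incident."""
    # Data written after the last usable copy is lost when the outage begins.
    data_loss_window = outage_start - last_good_backup
    # Downtime runs from the start of the outage until service is restored.
    downtime = service_restored - outage_start
    return data_loss_window, downtime


def meets_objectives(data_loss_window, downtime, rpo, rto):
    """Return True only if both the RPO and the RTO were met."""
    return data_loss_window <= rpo and downtime <= rto


# Hypothetical incident timeline, for illustration only.
last_backup = datetime(2024, 1, 10, 2, 0)
outage = datetime(2024, 1, 10, 9, 30)
restored = datetime(2024, 1, 10, 11, 0)

loss, down = recovery_metrics(last_backup, outage, restored)
print("data loss window:", loss)
print("downtime:", down)
print("objectives met:",
      meets_objectives(loss, down, rpo=timedelta(hours=8), rto=timedelta(hours=2)))
```

In this made-up timeline, 7.5 hours of data are lost and the service is down for 1.5 hours, so an 8-hour RPO and a 2-hour RTO are both met. Tightening either target makes the same incident a miss.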
Typically, lower RTO and RPO requirements result in higher costs. In other words, the quicker you need to bring your systems back online, the higher the cost for the required components and services. For example, your online customer-facing applications typically need to be recovered very quickly, and so would have a lower RTO and RPO.
The DR model you select should be based on your recovery goals. For example, a system that stores compliance-related historical data would probably not need quick access to data. In that case, a high RTO and a cold DR model, based on backup and restore, might be appropriate.
The following models are common in successful DR strategies:
The cold approach, based on backup and restore, is simple, inexpensive, and suitable for non-critical systems that can tolerate a high RTO and RPO. For example, if your system is non-critical and can tolerate an outage of up to 24 hours, a daily backup to tape or disk may be sufficient to meet your recovery needs.
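The arithmetic behind a backup schedule can be sketched in a few lines of Python. This is a simplified model, not a sizing tool: it assumes that each backup captures the state of the system at the moment the backup starts, and that a backup is usable only once it has finished. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, backup_duration=timedelta(0)):
    """Longest possible gap between a failure and the last usable backup.

    Assumption: a backup reflects the system state at its start time and
    becomes usable only when it completes. A failure just before the next
    backup completes therefore loses one full interval plus the duration.
    """
    return backup_interval + backup_duration


def schedule_meets_rpo(backup_interval, rpo, backup_duration=timedelta(0)):
    """Return True if the schedule can never lose more data than the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


daily = timedelta(hours=24)

# Daily backup, treated as instantaneous, against a 24-hour RPO.
print(schedule_meets_rpo(daily, rpo=timedelta(hours=24)))

# The same schedule when each backup takes two hours to complete.
print(schedule_meets_rpo(daily, rpo=timedelta(hours=24),
                         backup_duration=timedelta(hours=2)))
```

Under these assumptions, a daily backup only just satisfies a 24-hour RPO, and a long-running backup job can push the worst case past the target. A tighter RPO calls for more frequent backups or for replication.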
In the pilot light approach, a scaled-down copy of your infrastructure runs at a secondary site alongside production. Database data is replicated from your primary site to your secondary site in real time, while your web and application tiers are switched off and used only during DR failover or testing.
Unlike a cold approach, the core elements, such as the database, are already configured and running in your secondary site. The application and web servers are not running, but templates and machine images are replicated to the DR site. In a DR situation, you can quickly provision an environment, which includes the required core components. These can then be scaled up to handle production load through automation and auto-scaling.
In the warm model, both your database and application tier data are continuously replicated to a secondary site, and the environment can be designed to fail over automatically. This model builds on the pilot light model by ensuring that the secondary site has a fully functional set of components. All components are running and ready to go, but at a reduced scale compared to the primary production site. In a DR situation, you can set up the secondary site to automatically scale up to handle production load. You can route user traffic automatically by using suitable traffic-routing services, such as Route 53 in AWS® or Azure® Traffic Manager.
The warm model usually costs less than the hot model described next, but your site might experience some downtime while the secondary site comes online. The warm, or active-passive, model is a good compromise between cost and DR capabilities because you can still meet aggressive RTO and RPO requirements at a reduced cost.
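The automatic failover decision in an active-passive setup can be sketched as a small state machine. The Python below is a simplified simulation and does not call any cloud provider's API: the class name, the threshold of three consecutive failed health checks, and the sequence of check results are all assumptions made for illustration. Managed traffic-routing services implement their own health-check and failover logic.

```python
class FailoverMonitor:
    """Tracks primary-site health checks and decides where traffic goes."""

    def __init__(self, failure_threshold=3):
        # Hypothetical threshold: consecutive failures needed to fail over.
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active_site = "primary"

    def record_check(self, primary_healthy):
        """Record one health-check result and return the active site."""
        if self.active_site == "secondary":
            # This sketch never fails back automatically; returning to the
            # primary site is left as a deliberate, manual step.
            return self.active_site
        if primary_healthy:
            # A single success resets the failure streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active_site = "secondary"
        return self.active_site


# Simulated health-check results: a brief blip, then a sustained outage.
checks = [True, False, False, True, False, False, False, True]
monitor = FailoverMonitor(failure_threshold=3)
history = [monitor.record_check(result) for result in checks]
print(history)
```

Requiring several consecutive failures keeps a momentary blip from triggering a failover, at the price of a slightly longer detection time, which counts toward the RTO.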
The hot model is the most expensive and consists of a fully redundant copy of your production environment at a secondary site. Both sites are active and share the day-to-day production load. In a disaster situation, the secondary site takes over the full load, so it must be sized to handle normal production loads on its own.
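To tie the four models back to recovery goals, the following Python sketch maps an RTO and RPO pair to a candidate model. The thresholds are illustrative assumptions, not industry standards or figures from this series. The right cut-offs depend on your systems, budget, and compliance requirements, and the function name is hypothetical.

```python
from datetime import timedelta


def suggest_dr_model(rto, rpo):
    """Suggest a DR model from recovery targets, using assumed thresholds."""
    # The stricter of the two targets drives the choice in this sketch.
    tightest = min(rto, rpo)
    if tightest < timedelta(minutes=5):
        return "hot"          # fully redundant, both sites active
    if tightest < timedelta(hours=1):
        return "warm"         # everything running at reduced scale
    if tightest < timedelta(hours=12):
        return "pilot light"  # core components running, the rest provisioned on demand
    return "cold"             # backup and restore


# A system that can tolerate a 24-hour outage and a day of lost data.
print(suggest_dr_model(rto=timedelta(hours=24), rpo=timedelta(hours=24)))

# A customer-facing application with aggressive targets.
print(suggest_dr_model(rto=timedelta(minutes=15), rpo=timedelta(minutes=1)))
```

A real selection process would also weigh cost, because tighter targets push you toward the more expensive models.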
Check out Part Two of this series on disaster preparation.