
What is Disaster Recovery-as-a-Service (DRaaS) and why is it important?

July 20, 2022

Extensive security and business continuity strategies, including a Disaster Recovery-as-a-Service (DRaaS) solution, are crucial for modern business survival. In today’s always-on, information-driven organizations, IT resilience depends on IT infrastructures that are available 24/7. The costs of downtime are huge, and data loss can put a company out of business. Data loss is caused not only by natural disasters, power outages, hardware failure and user errors, but increasingly by software problems and cybersecurity-related disasters.

The potential implications of disaster-related IT failures are far-reaching and can range from breached medical records to legal repercussions, privacy leaks, lost revenue and damage to brand value. For all these reasons, any business reliant on IT for any part of its ongoing operations needs to have an up-to-date, regularly tested disaster recovery plan.

Similar to selecting a cloud service provider, deploying an effective DRaaS solution should begin with a solid understanding of the business and technical goals driving the need for a recovery plan. This means starting with a business impact study conducted in-house by an auditing group or, if offered, by the chosen DRaaS provider itself. This research should be undertaken in cooperation with the business leaders or business units themselves, not just the various IT departments, to classify the processes and systems most vital to the success of the company. These classifications will be based on agreement about the acceptable amount of downtime and data loss for each of these systems. The goal is to understand not only the impact on business processes, but also the expectations of the people responsible for each distinct aspect of the identified business operations.

It is also important during this assessment to ensure that all parties clearly understand the difference between recovery time objectives (RTO) and recovery point objectives (RPO).

RTO and RPO
RTO: The maximum acceptable length of time that an application or business process can be unavailable before it is restored. For revenue-generating activities, the cost of downtime is easily measured in terms of lost income, such as sales per hour of downtime. For operational activities, such as manufacturing production, it is often calculated in terms of lost productivity, the write-off of perishable product or SLA penalties.

RPO: The maximum acceptable amount of data, expressed as a window of time before the disruption, that can be lost when recovering from a backup or failing over to a secondary site. The cost of data loss is measured most simply in terms of the value of the lost data, such as information collected about purchasing, inventory or sales. There are often secondary costs to data loss as well, including compliance penalties, public relations expenses and loss of brand value, that can easily exceed the value of the data itself.
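The two objectives show up as two different cost lines in an impact study. The sketch below is a simplified model under assumed inputs: the `ImpactInputs` fields, the per-hour figures and the single secondary-loss multiplier are hypothetical placeholders that a real business impact study would replace with its own measurements.

```python
from dataclasses import dataclass


@dataclass
class ImpactInputs:
    # Hypothetical per-hour figures gathered during a business impact study.
    revenue_per_hour: float           # income lost for each hour a process is down
    sla_penalty_per_hour: float       # contractual penalties per hour of downtime
    data_value_per_hour: float        # value of records created per hour (orders, inventory)
    secondary_loss_multiplier: float  # compliance, PR and brand costs, relative to data value


def downtime_cost(inputs: ImpactInputs, outage_hours: float) -> float:
    """RTO side: cost of the process being unavailable for outage_hours."""
    return outage_hours * (inputs.revenue_per_hour + inputs.sla_penalty_per_hour)


def data_loss_cost(inputs: ImpactInputs, lost_hours_of_data: float) -> float:
    """RPO side: value of the lost records plus the secondary costs they trigger."""
    direct = lost_hours_of_data * inputs.data_value_per_hour
    return direct * (1 + inputs.secondary_loss_multiplier)


example = ImpactInputs(
    revenue_per_hour=50_000,
    sla_penalty_per_hour=5_000,
    data_value_per_hour=20_000,
    secondary_loss_multiplier=1.5,
)
print(f"4-hour outage:         ${downtime_cost(example, 4):,.0f}")   # $220,000
print(f"24 hours of lost data: ${data_loss_cost(example, 24):,.0f}")  # $1,200,000
```

Even with made-up numbers, keeping the two lines separate makes the point that the data-loss side can outgrow the downtime side once secondary costs are counted.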

At first pass, many business leaders will focus most of their energy on the amount of time a given system, process or application can be offline before being recovered, often insisting that even a minor disruption represents a catastrophic loss of revenue or productivity. This downtime window is referred to as the recovery time objective and generates a lot of interest because it is the impact most easily perceived by people trying to get work done. Days, hours, minutes or even seconds of downtime will translate into lost revenue, contractual penalties or loss of return on investment due to missed milestones in key projects. Consider a business reliant on online or mobile sales transactions such as a large retail enterprise. For a company like this, even a single minute of its sales site being offline translates into hundreds of thousands or even millions of dollars in lost revenue.

Systems with a low tolerance for downtime need to be classified as such, with effective process recovery plans in place. These plans can include straightforward solutions such as the ability to reroute workstreams to warm or hot disaster recovery sites, or load-balanced, geographically distributed access that ensures workers and customers can continue to transact business. This is important not only in the event of disasters or malicious acts but also for more mundane events such as system maintenance, software patches and infrastructure updates, so that this routine, behind-the-scenes work causes minimal disruption to business operations.
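Rerouting to a warm or hot site ultimately comes down to a routing decision. A minimal sketch, assuming health checks run elsewhere and report a simple healthy/unhealthy flag; the `Site` fields and the priority scheme are hypothetical and do not reflect any particular DRaaS product's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    name: str
    role: str       # "primary", "hot" or "warm" (illustrative labels only)
    priority: int   # lower number = preferred destination
    healthy: bool   # assumed to come from an external health check


def select_site(sites: list[Site]) -> Optional[Site]:
    """Send work to the most preferred healthy site; None means nothing is available."""
    available = [site for site in sites if site.healthy]
    if not available:
        return None
    return min(available, key=lambda site: site.priority)


sites = [
    Site("datacenter-a", "primary", priority=0, healthy=False),  # down for patching
    Site("dr-hot", "hot", priority=1, healthy=True),
    Site("dr-warm", "warm", priority=2, healthy=True),
]
chosen = select_site(sites)
print(chosen.name if chosen else "no site available")  # dr-hot
```

The same logic covers planned maintenance: marking the primary unhealthy before patching moves work to the hot site without waiting for a failure. A warm site usually needs time to start its services before it can take traffic, which a fuller version would have to model.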

However, while the RTO is very important, the recovery point objective cannot be overlooked or underestimated. It is here that many businesses can be truly lost as they deal with far-reaching implications not initially understood at the time of a system outage. The RPO defines the amount of data that can be sacrificed in order to secure a clean recovery of business operations. A daily backup scheme, for example, might mean that up to 24 hours of data could be entirely lost. This affects everything from supply chain tracking to order and manufacturing processes to financial operations. It is not difficult to imagine the impact of a day’s worth of customer credit card transactions lost within an online ordering system that had to be restored to a recovery point a day old. These transactions would likely have already been processed against customers’ banks, but the business would be left without vital information like the customer’s name, what they ordered and where the item should be delivered.
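The arithmetic behind a backup-based recovery point is simple, which is exactly why it is easy to underestimate. A sketch, assuming copies are taken on a fixed schedule and orders arrive at a steady, hypothetical rate; it ignores backup duration and replication lag, which real systems must also account for.

```python
from datetime import datetime, timedelta


def data_at_risk(last_good_copy: datetime, failure_time: datetime) -> timedelta:
    """Changes made after the last good copy are lost when restoring from it."""
    return failure_time - last_good_copy


def meets_rpo(copy_interval: timedelta, target_rpo: timedelta) -> bool:
    # A failure just before the next scheduled copy loses nearly a full interval,
    # so the interval itself is the worst-case data-loss window.
    return copy_interval <= target_rpo


def orders_lost(window: timedelta, orders_per_hour: float) -> float:
    """Rough count of transactions with no surviving record (assumes a steady rate)."""
    return orders_per_hour * (window / timedelta(hours=1))


last_backup = datetime(2022, 7, 19, 0, 0)
failure = datetime(2022, 7, 19, 23, 30)
window = data_at_risk(last_backup, failure)
print(window)                                                # 23:30:00
print(meets_rpo(timedelta(hours=24), timedelta(minutes=15)))  # False
print(orders_lost(window, 400))                              # 9400.0
```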

Recovery time can inconvenience a business and even cause significant financial impact, but poorly planned recovery points can completely cripple a company. In addition to lost revenue and decreased productivity, lost data can also create financial or legal liability, regulatory compliance penalties and diminished confidence in the business. Ensuring that business leadership not only understands the terminology surrounding RTO and RPO, but also completely thinks through the business implications of each, is a critical first step in developing an impact study and classifying each workstream.
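Classifying a workstream means checking both objectives at once. The sketch below assumes a hypothetical four-tier scheme; the tier names and thresholds are illustrative only, and `classify` simply picks the least demanding tier that still satisfies both the agreed RTO and the agreed RPO.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    rto: timedelta  # recovery time the tier's protection is designed to deliver
    rpo: timedelta  # data-loss window the tier's protection is designed to deliver


# Hypothetical tiers, ordered from least to most stringent (and usually from
# cheapest to most expensive). Real thresholds come out of the impact study.
TIERS = [
    Tier("tier 3", rto=timedelta(hours=72), rpo=timedelta(hours=24)),
    Tier("tier 2", rto=timedelta(hours=8), rpo=timedelta(hours=4)),
    Tier("tier 1", rto=timedelta(hours=1), rpo=timedelta(minutes=15)),
    Tier("tier 0", rto=timedelta(minutes=5), rpo=timedelta(seconds=30)),
]


def classify(required_rto: timedelta, required_rpo: timedelta) -> Tier:
    """Return the least stringent tier that satisfies BOTH agreed objectives."""
    for tier in TIERS:
        if tier.rto <= required_rto and tier.rpo <= required_rpo:
            return tier
    raise ValueError("no tier meets these objectives; the plan needs another option")


# Order processing can tolerate a few hours offline but almost no lost orders,
# so its RPO, not its RTO, pushes it into the most demanding tier.
print(classify(timedelta(hours=4), timedelta(minutes=5)).name)  # tier 0
print(classify(timedelta(days=7), timedelta(days=1)).name)      # tier 3
```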

Additionally, the applications and processes identified as business- or mission-critical during the initial assessment need to be considered in relation to each other and to their dependencies on other IT systems. Many core applications are heavily reliant on a variety of systems, tools and even legacy applications that have not been effectively modernized. These ancillary systems are used for a variety of easily overlooked, but nonetheless critical, processes such as login and user authentication, data storage and retrieval, and network access. Effective recovery from a disaster means considering all of these dependencies and implementing a plan that is broad-reaching enough to cover them all, yet flexible enough to adapt as the flow of information changes over time.

Think again of that online consumer transaction. An order placed on the website flows from a web front-end through a customer relationship management (CRM) database and a transaction processing application to an order management utility. That, in turn, is tied into a supply chain or inventory control system, possibly into a manufacturing resource manager, and eventually into the back-end accounting and finance systems. At each step of this transaction, the applications that move an order through the business have infrastructure dependencies, access controls, and data collection and archiving processes that are tied together.
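One way to make these handoffs explicit is to treat the process as a dependency graph and derive a restore order from it. The sketch below uses Python's standard-library `graphlib`; the dependency edges are an assumption for illustration (every stage needs shared network, storage and authentication services, and each stage needs the next one available to accept its handoff), so a real map would come from the impact study.

```python
from graphlib import TopologicalSorter

# Stages of the order flow described above (descriptive names, not real products).
ORDER_FLOW = [
    "web front-end",
    "CRM database",
    "transaction processing",
    "order management",
    "inventory control",
    "manufacturing resource manager",
    "accounting and finance",
]

# Each entry maps a system to the systems that must be running before it.
depends_on: dict[str, set[str]] = {
    "network": set(),
    "storage": {"network"},
    "authentication": {"network", "storage"},
}
for stage, next_stage in zip(ORDER_FLOW, ORDER_FLOW[1:] + [None]):
    needs = {"authentication", "storage"}  # shared ancillary systems
    if next_stage is not None:
        needs.add(next_stage)  # assumption: a stage needs the next one to accept handoffs
    depends_on[stage] = needs

# static_order() yields every system after all of its dependencies.
recovery_order = list(TopologicalSorter(depends_on).static_order())
for step, system in enumerate(recovery_order, start=1):
    print(step, system)
```

Under these assumptions the web front-end comes back last, so customers are not placing orders before the systems that record them are ready. `TopologicalSorter` raises `graphlib.CycleError` if the map contains a circular dependency, which is itself worth surfacing during planning.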

These cross-dependencies need to be fully explored and categorized so that keystone systems and infrastructure are not classified at a lower priority than the more visible mission-critical applications they support. Thus, it is important to think in terms of the entire business process, not just single systems or applications. A traditional backup application is typically focused on a specific data set tied to a specific application and may not take into account the spectrum of different systems relying on that data. Application failover or clustering functions may help ensure that a core application stays up and accessible but, unless scrupulously planned and implemented, may break key upstream or downstream dependencies when failover occurs.
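The rule that keystone systems must not rank below the applications they support can be checked mechanically. A sketch, assuming a numeric tier scheme where 0 is most critical and a hypothetical default of tier 3 for anything nobody classified; each system inherits the most critical tier of anything that depends on it, directly or indirectly.

```python
from graphlib import TopologicalSorter

DEFAULT_TIER = 3  # hypothetical: systems nobody classified start at the lowest tier


def effective_tiers(depends_on: dict[str, set[str]], assigned: dict[str, int]) -> dict[str, int]:
    """Tier 0 is most critical. A system inherits the most critical tier of
    anything that depends on it, directly or indirectly."""
    order = list(TopologicalSorter(depends_on).static_order())
    effective = {system: assigned.get(system, DEFAULT_TIER) for system in order}
    # static_order() lists dependencies before dependents; walking it in reverse
    # finalizes each dependent's tier before pushing it down to what it relies on.
    for system in reversed(order):
        for dependency in depends_on.get(system, set()):
            effective[dependency] = min(effective[dependency], effective[system])
    return effective


depends_on = {
    "order management": {"legacy auth service", "storage"},
    "legacy auth service": {"storage"},
    "reporting": {"storage"},
}
assigned = {"order management": 0, "reporting": 3, "legacy auth service": 3, "storage": 2}

for system, tier in sorted(effective_tiers(depends_on, assigned).items()):
    if tier < assigned.get(system, DEFAULT_TIER):
        print(f"{system}: raise from tier {assigned.get(system, DEFAULT_TIER)} to tier {tier}")
# legacy auth service: raise from tier 3 to tier 0
# storage: raise from tier 2 to tier 0
```

The legacy authentication service and the shared storage were both ranked below the mission-critical order management application that relies on them; the check flags exactly those gaps.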

DRaaS can help safeguard the entire workstream from start to finish. Unlike traditional backup utilities, a disaster recovery service is built around an entire business process that takes into account cross-dependencies, performance requirements, and the implications of both downtime and recovery points.


Get in touch with our disaster recovery experts today to learn more about DRaaS solutions for your business.
