DR 101: RPO Meaning, Applications, and Drivers (Updated in September 2024)
Overview
Data has become as important a resource to organizations as infrastructure and physical assets. Data loss can cost an organization thousands or even millions of dollars. To minimize data loss, organizations turn to disaster recovery solutions, which focus on recovering as much data as possible after a disaster and measure that goal with recovery point objectives. In this blog, we’ll discuss recovery point objectives in depth, but we can begin with a very simple definition:
A recovery point objective (RPO) is the point in time you would like to restore to in the event of a disaster.
But there is more to it. Let’s investigate the meaning of RPO and how it adds value to a disaster recovery plan.
What Does RPO Mean? A Goal for Minimizing Data Loss
An RPO is, as the word “objective” implies, a goal for having minimal data loss in a disaster scenario. It is usually defined by a service-level agreement (SLA), which exists for internal or external customers of key data and systems. An RPO may also be defined by regulatory requirements for certain industries and governmental organizations.
RPOs are one of the measurements of disaster recovery effectiveness. While an RPO goal may be defined in your disaster recovery plans, the RPO you can achieve is determined by the disaster recovery tools you have in place and your ability to use them effectively. RPOs must be measured frequently to ensure you are meeting or surpassing your goals so that when disaster strikes, data can be recovered with an acceptable amount of loss.
→ Learn about the difference between RPO and RTO
Measuring the Amount of Acceptable Loss
RPOs are expressed as units of time, from days to minutes or seconds. Rather than measuring data itself, RPOs measure the time between the moment of data loss and the last point in time from which data can be recovered. Here’s a scenario:
ACME Corporation backs up their data every 12 hours, at 6 a.m. and 6 p.m. daily. If ACME Corporation experienced a disaster in which data was lost at 2 p.m., then the nearest point in time from which they could recover—the 6 a.m. backup—would be 8 hours prior to the disaster. If they recovered to that point, they would lose 8 hours of data. However, this does not mean that ACME Corporation has an RPO of 8 hours.
If the disaster instead hit at 5:59 p.m., just before the next backup was scheduled, the nearest recovery point would still be 6 a.m. Now the data loss becomes 11 hours and 59 minutes or, rounding up, 12 hours. A 12-hour RPO is the minimum ACME’s disaster recovery plan can achieve because the two recovery points are 12 hours apart. Even this measurement assumes that the nearest recovery point is reliable—if for some reason it isn’t, then an earlier recovery point must be used, increasing the RPO further.
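The ACME arithmetic above can be reproduced with a short Python sketch. The function names and the calendar date are illustrative assumptions, not part of any product; only the 6 a.m. and 6 p.m. schedule comes from the scenario.

```python
from datetime import datetime, timedelta

def data_loss(disaster: datetime, backups: list[datetime]) -> timedelta:
    """Time between the disaster and the latest backup taken at or before it."""
    earlier = [b for b in backups if b <= disaster]
    if not earlier:
        raise ValueError("no backup exists before the disaster")
    return disaster - max(earlier)

def best_achievable_rpo(backups: list[datetime]) -> timedelta:
    """Largest gap between consecutive backups in the schedule."""
    ordered = sorted(backups)
    if len(ordered) < 2:
        raise ValueError("need at least two backups to measure a gap")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))

# Arbitrary example day; backups at 6 a.m., 6 p.m., and 6 a.m. the next day.
day = datetime(2024, 9, 1)
backups = [
    day + timedelta(hours=6),
    day + timedelta(hours=18),
    day + timedelta(days=1, hours=6),
]

loss_2pm = data_loss(day + timedelta(hours=14), backups)
loss_559pm = data_loss(day + timedelta(hours=17, minutes=59), backups)
achievable = best_achievable_rpo(backups)
print(loss_2pm, loss_559pm, achievable)
```

The loss observed in a single incident depends on when the disaster hits, while the achievable RPO depends only on the spacing of the recovery points.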
The amount of data that ACME Corporation might lose over those 12 hours varies with the amount of data they create during that time. The value of that data can also vary, but data loss always carries a business impact: lost productivity and intellectual property, damaged reputation, and even regulatory fines. If ACME Corporation determines that 12 hours of data loss is acceptable, then this disaster recovery plan is good for them.
Tiering RPOs for Different Data
The reality is that few organizations can afford 12 hours of business-critical data loss. Instead, organizations are looking to solutions that can provide an RPO of minutes or even seconds for their most critical data.
For less critical data, such as data on internal test/dev systems, a different RPO that defines a longer period might be acceptable. Therefore, many organizations tier data based on business criticality: a lower RPO (with time measured in minutes or seconds) for some data and a higher RPO (with time measured in hours or even days) on other data. By defining multiple RPOs, organizations can save money by using low-cost backup solutions for less critical data and reserving low-RPO disaster recovery solutions for business-critical data.
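As a rough sketch of tiering, RPO targets can be kept in a simple lookup table and checked against what a given solution achieves. The tier names and target values below are hypothetical and chosen only for illustration.

```python
from datetime import timedelta

# Hypothetical tiers and RPO targets; real values come from SLAs or regulation.
RPO_TARGETS = {
    "business-critical": timedelta(seconds=30),
    "important": timedelta(hours=1),
    "test-dev": timedelta(hours=24),
}

def meets_target(tier: str, achieved: timedelta) -> bool:
    """True when the achieved RPO is at or below the target for the tier."""
    return achieved <= RPO_TARGETS[tier]

# A backup taken every 12 hours can achieve a 12-hour RPO at best.
twelve_hour_backup = timedelta(hours=12)
print(meets_target("test-dev", twelve_hour_backup))
print(meets_target("business-critical", twelve_hour_backup))
```

Under these assumed targets, the 12-hour backup schedule is enough for the test/dev tier but not for business-critical data.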
Main Drivers for RPOs
The RPOs you can achieve will mostly be determined by the technology solutions you have or plan to have in place for disaster recovery and backup. Bandwidth and quality of service (QoS) may also play a role, especially if you want to achieve aggressive RPOs. Here’s how to achieve or surpass your RPOs.
Your Technology Solution
There are many kinds of backup and disaster recovery solutions to choose from, and each one of them achieves RPOs differently.
• Backups
A traditional backup solution, even one based on snapshot technologies, periodically creates different backups of applications, data, and even entire virtual machines. Although snapshots themselves can theoretically be taken frequently, they are usually taken every few hours because taking snapshots negatively impacts performance. For this reason, backup solutions are really only suitable for long-term backup retention or protecting low-tier data in a disaster scenario.
• Replication
Replication solutions can achieve low RPOs through synchronous replication. Synchronous replication is very limited in geographic range and cost-prohibitive to achieve, but it can achieve near-zero RPOs when implemented properly. Replication also includes snapshot-based technologies that are unable to achieve RPOs of seconds or minutes. The RPOs a given replication solution can achieve in real implementations may vary.
→ Difference between replication and backup
• Continuous Data Protection
Generally, continuous data protection (CDP) creates many recovery points in a short time so data can be recovered from any point as needed. The success of a CDP solution depends on its design, but the core principle of multiple recent recovery points can help ensure low RPOs. When the most recent recovery point cannot be used successfully, the next recovery point may be used.
• Application-Centric Protection
File and folder data is relatively simple to protect and recover, but applications are far more complex. An application can span multiple virtual machines or containers and rely on data stored in multiple locations to function properly. Recovering an application to a specific point in time requires consistency in protection, which is hard for most replication solutions to achieve. It is important to consider how often a replication solution can create a consistent recovery point from which an application can recover.
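To illustrate why application consistency affects RPO, here is a simplified Python sketch. It assumes each virtual machine exposes a set of checkpoint timestamps (hypothetical integers, in seconds) and treats a recovery point as consistent only when every VM in the application has a checkpoint at that same timestamp. Real products use more sophisticated mechanisms; this is only a model of the idea.

```python
from typing import Optional

def latest_consistent_point(checkpoints: dict[str, set[int]]) -> Optional[int]:
    """Newest timestamp at which every VM has a checkpoint, or None if there is none."""
    if not checkpoints:
        return None
    common = set.intersection(*checkpoints.values())
    return max(common) if common else None

# Hypothetical three-VM application with checkpoint timestamps in seconds.
app = {
    "web": {100, 160, 220, 280},
    "app": {100, 220, 280},
    "db": {100, 220, 250},
}

consistent = latest_consistent_point(app)
newest_any = max(max(points) for points in app.values())
print(consistent, newest_any)
```

Here the newest checkpoint on any single VM is at 280, but the application as a whole can only be recovered consistently to 220, so the effective RPO for the application is worse than the per-VM figures suggest.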
Bandwidth
Bandwidth is not unlimited. Your RPO can only be measured after the data has been received at the recovery site and is fully available for recovery. While real-time replication helps achieve a low RPO, bandwidth can be an issue during high rates of data change where replication is moving larger amounts of data. Latency and network disruptions may also affect available bandwidth.
Using QoS may further restrict the bandwidth available to replication, and this can impact RPOs. You should consider the importance of RPOs relative to other network processes when configuring your QoS. By planning, you can ensure that bandwidth is available to achieve the RPOs you need.
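The effect of bandwidth on RPO can be approximated with a simple backlog model. The sketch below assumes a fixed replication bandwidth and a per-minute data change rate (all figures are made up), and estimates the lag as the time needed to drain whatever has not yet reached the recovery site. A QoS cap can be modeled by lowering the bandwidth value.

```python
def replication_lag_seconds(change_mb_per_min: list[float],
                            link_mb_per_min: float) -> list[float]:
    """Estimated lag after each minute: time to drain the unsent backlog at link speed."""
    backlog = 0.0
    lags = []
    for changed in change_mb_per_min:
        # Data that could not be sent this minute carries over as backlog.
        backlog = max(0.0, backlog + changed - link_mb_per_min)
        lags.append(backlog / link_mb_per_min * 60)
    return lags

# Hypothetical burst: the change rate exceeds the 600 MB/min link for two minutes.
lags = replication_lag_seconds([300, 900, 1200, 300, 300], link_mb_per_min=600)
print(lags)  # [0.0, 30.0, 90.0, 60.0, 30.0]
```

In this model the lag peaks at 90 seconds during the burst and recovers only gradually afterward, which is why sizing bandwidth for peak change rates matters for aggressive RPOs.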
Testing and Reporting
If you’ve recovered before, then you probably know that no recovery is guaranteed. If your first recovery attempt is not successful, then you must recover from the next available recovery point. With backup solutions, these can be hours apart. But with continuous data protection solutions, these can be seconds apart.
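The cost of falling back to an older recovery point can be sketched as follows. The function name is illustrative, and the 5-second CDP interval is an assumed figure for comparison, not a measured value.

```python
from datetime import timedelta

def worst_case_loss(interval: timedelta, failed_points: int) -> timedelta:
    """Worst-case data loss when the newest `failed_points` recovery points are unusable."""
    # In the worst case, the disaster hits just before the next point is created,
    # so the newest point is already one full interval old.
    return interval * (failed_points + 1)

backup_loss = worst_case_loss(timedelta(hours=12), failed_points=2)
cdp_loss = worst_case_loss(timedelta(seconds=5), failed_points=2)
print(backup_loss, cdp_loss)
```

With two unusable recovery points, the 12-hour backup schedule could lose up to 36 hours of data, while recovery points 5 seconds apart would lose up to 15 seconds.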
The only way to know whether a recovery point is usable is through testing, which is also one of the best ways to ensure you can achieve your RPOs. Testing your recovery not just before but also during a disaster helps confirm you have good recovery data, especially in a cyberattack.
Testing also generates reporting. With reports, you can identify and address any issues where RPOs are not meeting requirements. Reporting may also satisfy stakeholder requirements for regulatory compliance. Regular testing and reporting ensure that your RPOs are meeting targets when disaster strikes.
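A minimal sketch of such a report, assuming RPO measurements have already been collected as a list of durations (the sample values, target, and field names below are hypothetical):

```python
from datetime import timedelta
from typing import NamedTuple

class RpoReport(NamedTuple):
    worst: timedelta    # largest RPO observed
    violations: int     # number of samples above the target
    compliance: float   # fraction of samples at or below the target

def summarize(samples: list[timedelta], target: timedelta) -> RpoReport:
    """Summarize measured RPO samples against a target."""
    if not samples:
        raise ValueError("no RPO samples to summarize")
    violations = sum(1 for s in samples if s > target)
    return RpoReport(max(samples), violations, 1 - violations / len(samples))

samples = [timedelta(seconds=s) for s in (5, 8, 12, 7)]
report = summarize(samples, target=timedelta(seconds=10))
print(report)
```

In this made-up data set, one of four samples exceeds the 10-second target, giving 75% compliance and a worst observed RPO of 12 seconds.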
Achieving the Lowest RPOs in the Industry with Zerto
Lowering RPOs and RTOs to within a few seconds, at scale and across sites, is not only possible but also simple with Zerto. Zerto consistently achieves the lowest RTOs and RPOs in the industry to help businesses become ransomware-resilient. Using its own CDP technology, which combines real-time replication, recovery point journaling, and application-centric protection, Zerto adapts your organization’s disaster recovery strategy to keep RPOs as low as possible.
Now, learn more about RTO, the other important DR metric, or about the difference between RTO and RPO, or even about the relationship between RTO, RPO and the business continuity metrics: MTD and MTDL.
You can also increase your knowledge about DR with our Disaster Recovery Guide!
Frequently Asked Questions about RPO
What does RPO stand for?
RPO stands for recovery point objective. It represents the point in time you would like to restore to in the event of a disaster or a disruption.
What is the meaning of RPO in technology?
RPO is a key metric in IT disaster recovery. Its value, expressed in time (seconds to hours or more), is a target representing the maximum amount of data loss to be incurred in a disaster scenario. It is usually defined by a service-level agreement (SLA), which exists for internal or external customers of key data and systems.
What is an example of RPO?
Let’s say that an organization’s business continuity team determines that the maximum tolerable data loss (MTDL) is 4 hours based on the amount of critical data created by the business every hour. Based on this target, and to ensure compliance, the DR team establishes an RPO of 2 hours, based on backups created every 2 hours. In the worst-case scenario, if a disruption occurs at the end of a backup window, the DR team can try to revert to its first restore point (~2 hours prior). If for whatever reason this restore point cannot be used, it can go back further to the second restore point (4 hours prior). That misses the 2-hour RPO target but still stays within the 4-hour MTDL.
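The margin in this example can be checked with a one-line calculation. The function name is an illustrative assumption; the 4-hour MTDL and 2-hour backup interval come from the example above.

```python
from datetime import timedelta

def restore_points_within_mtdl(mtdl: timedelta, backup_interval: timedelta) -> int:
    """How many restore points, newest first, keep worst-case loss within the MTDL."""
    # Dividing one timedelta by another with // yields a whole number.
    return mtdl // backup_interval

usable = restore_points_within_mtdl(timedelta(hours=4), timedelta(hours=2))
print(usable)  # 2
```

With these figures, the first and second restore points both keep the worst-case loss within the MTDL, while a third restore point (6 hours prior) would exceed it.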