When it comes to business continuity and disaster recovery, we all know that data is king. Reporting on metrics is one of the few ways to truly know that what you’re doing works, but for many business continuity and disaster recovery managers, this is a huge challenge. If you don’t have an automated tool, it’s likely that you rely on Word, Excel and colleagues in other departments to collect BC/DR metrics. We all know the struggle of working with Kyle from finance, a guy who is “way too busy” for your “little” business continuity project.
So, what’s a BC/DR manager to do? You already know that BC/DR is a critical component of an organization’s success. And you know that you need metrics to measure the effectiveness of your efforts. The first step is to understand the metrics that matter in business continuity and disaster recovery planning, which is exactly what this guide will cover. You’ll also need a tool to collect and report on these metrics. Depending on your organization’s size and the maturity level of your BC/DR program, this could range from an Excel template to powerful, automated software.
Important BC/DR Metrics
There are seven important BC/DR metrics that you should track to measure and improve your recovery plans:
- Recovery Time Objectives (RTO)
- Recovery Point Objectives (RPO)
- The number of plans that cover each critical business process
- The amount of time since each plan was updated
- The number of business processes that are threatened by a potential disaster
- The actual time it takes to recover a business process
- The difference between your target and actual recovery time
While there are several other metrics that you could track, these seven form the core of a program review and indicate how prepared you are for a real disaster.
Critical Metrics in BC/DR
The first two important BC/DR metrics are Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs). An RTO is the maximum acceptable length of time that a process or system can be down. An RPO is the maximum age of the data you can afford to lose, measured back from the moment of failure, and it determines how often your backups must run to protect the rest. For example, if you can afford to lose an hour’s worth of data, you’ll have to run backups at least every hour.
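To make the RPO arithmetic concrete, here is a minimal Python sketch. It assumes each backup captures a point-in-time snapshot when it starts and only becomes usable once it finishes, so the worst-case data loss is one backup interval plus the backup’s own run time. The function names are illustrative, not taken from any particular backup tool.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst case: the system fails just before the next backup completes.

    The newest usable snapshot was taken at the start of the previous backup,
    one full interval earlier, plus however long the in-flight backup had run.
    """
    return interval + backup_duration


def meets_rpo(interval: timedelta, rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if the backup schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(interval, backup_duration) <= rpo


rpo = timedelta(hours=1)
print(meets_rpo(timedelta(hours=1), rpo))                            # True
print(meets_rpo(timedelta(hours=1), rpo, timedelta(minutes=10)))     # False
print(meets_rpo(timedelta(minutes=45), rpo, timedelta(minutes=10)))  # True
```

With a zero backup duration this reduces to the rule of thumb above: a one-hour RPO needs backups at least every hour. Once backups take meaningful time to run, the interval has to shrink accordingly.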
Backup and recovery procedures are at the heart of a good BC/DR plan, so you need to consider both RTOs and RPOs to choose the best backup and recovery tools for the job. If, for example, you generate continuous transactions of moderate-to-high volume and value, how many minutes’ worth of transactions could you afford to lose? How long could you afford to be out of service? Such an application could benefit from the very frequent, block-level backups that Continuous Data Protection (CDP) makes possible, but you wouldn’t know that unless you looked at both the RTOs and RPOs.
Finally, you should measure the number of plans that cover each business process, as well as the amount of time since each plan was updated. Key Performance Indicators (KPIs) measure how well a program works, and they are not something you can ignore. You can set KPIs for how often you review and update your plans (for example, every month, every six months or every year) and for how many business functions are covered by a recovery plan, with an action plan to reach 100% coverage. If you are short on time and resources, start with your most critical business processes.
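Both planning KPIs (plans per process and time since last update) can be computed from a simple plan inventory. The sketch below assumes each plan records the processes it covers and its last-update date; the `Plan` class, the process names and the 182-day (roughly six-month) review threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Plan:
    name: str
    covers: set[str]        # business processes this plan covers
    last_updated: date


def plans_per_process(processes: list[str], plans: list[Plan]) -> dict[str, int]:
    """Number of plans covering each business process."""
    return {p: sum(1 for plan in plans if p in plan.covers) for p in processes}


def coverage_pct(processes: list[str], plans: list[Plan]) -> float:
    """Share of processes covered by at least one plan, as a percentage."""
    if not processes:
        return 0.0
    counts = plans_per_process(processes, plans)
    return 100.0 * sum(1 for n in counts.values() if n > 0) / len(processes)


def stale_plans(plans: list[Plan], today: date,
                max_age: timedelta = timedelta(days=182)) -> list[str]:
    """Plans whose last update is older than the review KPI allows."""
    return [plan.name for plan in plans if today - plan.last_updated > max_age]


processes = ["payroll", "accounts payable", "order entry", "customer support"]
plans = [
    Plan("Finance DR", {"payroll", "accounts payable"}, date(2024, 1, 15)),
    Plan("Payroll BC", {"payroll"}, date(2024, 6, 1)),
]
print(plans_per_process(processes, plans))
# {'payroll': 2, 'accounts payable': 1, 'order entry': 0, 'customer support': 0}
print(coverage_pct(processes, plans))              # 50.0
print(stale_plans(plans, today=date(2024, 9, 1)))  # ['Finance DR']
```

Processes with a count of zero become the action list for reaching 100% coverage.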
Metrics for Planning
Enterprises can have hundreds or even thousands of processes, and you can’t restore a process without a plan. A key metric for BC/DR planning is therefore the number of processes that are threatened by a potential disaster.
You should start with a risk analysis and a business impact analysis to understand a) the greatest risks that threaten your organization and b) the impact of those risks on various functions of the business. Then, you can create plans to protect these processes and minimize the disruption when disaster strikes.
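Once the risk analysis and business impact analysis tell you which processes each risk could disrupt, counting threatened processes is simple set arithmetic. In this sketch the risks, processes and the `planned` set are hypothetical inputs you would take from your own analysis; it also flags threatened processes that don’t have a plan yet.

```python
def threatened_processes(risk_map: dict[str, set[str]]) -> set[str]:
    """All processes threatened by at least one identified risk."""
    threatened: set[str] = set()
    for processes in risk_map.values():
        threatened |= processes
    return threatened


def processes_per_risk(risk_map: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Risks ranked by how many processes they threaten (largest first)."""
    return sorted(((risk, len(procs)) for risk, procs in risk_map.items()),
                  key=lambda item: item[1], reverse=True)


def threatened_without_plan(risk_map: dict[str, set[str]], planned: set[str]) -> set[str]:
    """Threatened processes that no recovery plan covers yet."""
    return threatened_processes(risk_map) - planned


risk_map = {
    "data center flood": {"order entry", "payroll"},
    "ransomware": {"payroll", "accounts payable", "email"},
    "regional power outage": {"order entry"},
}
print(len(threatened_processes(risk_map)))  # 4
print(processes_per_risk(risk_map))
# [('ransomware', 3), ('data center flood', 2), ('regional power outage', 1)]
print(sorted(threatened_without_plan(risk_map, {"payroll", "email"})))
# ['accounts payable', 'order entry']
```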
But static plans can stagnate. You can’t restore processes unless you update plans periodically to account for changes in applications, data, environments, employees and risks. Set reminders for yourself to prompt plan reviews at appropriate points in the cycle. In a perfect world, department managers would confirm that they have reviewed and updated their plans, but let’s be real: reviewing and updating those plans is a huge hassle, and it’s near miraculous if they do it on time. Software can alleviate this pain point. You can automate email reminders to the various plan owners and track their progress within the software, with no passive-aggressive emails needed! Software also removes many of the tedious tasks associated with change management. For example, data integrations will keep your data current as it changes in other applications. If a single contact is used in 100 plans and their phone number changes, an integrated system will carry that change over to your business continuity and emergency management plans as well.
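An integrated system can push a single phone-number change into 100 plans because the plans reference one shared contact record instead of each storing its own copy. Here is a minimal Python sketch of that idea. The `Contact` and `Plan` classes, the `contacts` directory and the `apply_phone_update` helper are hypothetical stand-ins for whatever your BC/DR software and HR integration actually use.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    phone: str


@dataclass
class Plan:
    name: str
    contact_ids: list[str] = field(default_factory=list)  # references, not copies


def call_list(plan: Plan, directory: dict[str, Contact]) -> list[str]:
    """Resolve a plan's contacts at read time, so it always sees current data."""
    return [directory[cid].phone for cid in plan.contact_ids]


def apply_phone_update(directory: dict[str, Contact], contact_id: str, phone: str) -> None:
    """Hypothetical hook an HR-system integration would call when a number changes."""
    directory[contact_id].phone = phone


contacts = {"c-001": Contact("Kyle", "555-0100")}
plans = [Plan(f"Plan {i}", ["c-001"]) for i in range(100)]

apply_phone_update(contacts, "c-001", "555-0199")  # one change...
print(all(call_list(p, contacts) == ["555-0199"] for p in plans))  # ...seen by all 100 plans: True
```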
Using Metrics to Measure Plan and Recovery Effectiveness
One of the simplest ways to determine how business functions are interdependent is by using a dependency modeling tool. This will help you visualize whether application dependencies allow you to meet RTOs and SLAs.
For example, if you need to recover an accounts payable service in 12 hours, but it depends on finance software that can take up to 24 hours to recover, accounts payable cannot meet a 12-hour SLA. A dependency modeler illustrates these relationships dynamically and shows when and how a plan will break down as a result.
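Under the hood, a dependency modeler walks the dependency graph. The Python sketch below assumes a service can only start recovering once everything it depends on is back, and that independent dependencies recover in parallel. The service names and hour figures are hypothetical; accounts payable is given an illustrative 2-hour recovery of its own on top of the finance software’s 24 hours.

```python
def recovery_times(own_hours: dict[str, float],
                   depends_on: dict[str, list[str]]) -> dict[str, float]:
    """Earliest time (hours after the disruption) each service is back up.

    A service starts recovering only after all of its dependencies have
    recovered; dependencies are assumed to recover in parallel.
    """
    done: dict[str, float] = {}
    visiting: set[str] = set()

    def visit(service: str) -> float:
        if service in done:
            return done[service]
        if service in visiting:
            raise ValueError(f"dependency cycle involving {service!r}")
        visiting.add(service)
        ready_at = max((visit(dep) for dep in depends_on.get(service, [])), default=0.0)
        visiting.discard(service)
        done[service] = ready_at + own_hours[service]
        return done[service]

    for service in own_hours:
        visit(service)
    return done


def rto_breaches(times: dict[str, float], rto_hours: dict[str, float]) -> dict[str, float]:
    """Services whose earliest recovery exceeds their RTO, with the overrun in hours."""
    return {s: times[s] - rto for s, rto in rto_hours.items() if times[s] > rto}


own_hours = {"finance software": 24.0, "accounts payable": 2.0}
depends_on = {"accounts payable": ["finance software"]}
times = recovery_times(own_hours, depends_on)
print(times)                                     # {'finance software': 24.0, 'accounts payable': 26.0}
print(rto_breaches(times, {"accounts payable": 12.0}))  # {'accounts payable': 14.0}
```

Even if accounts payable itself recovered instantly, it could not come back before hour 24, which is why no amount of tuning inside the accounts payable plan alone can meet a 12-hour SLA.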
You should be measuring the actual time it takes to recover a business process. You can test recovery procedures using a BC/DR tool to track the time each step takes.
Alternatively, you could use the old-school method of timing each step manually. These tests will help you determine whether your people and processes can meet RTOs using your existing plan. You should be able to complete recovery tasks in the time the plan allows, and if you can’t, you need to revise your plan so that it’s realistic and achievable.
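If your BC/DR tool doesn’t time steps for you, even a small script beats a stopwatch. This hypothetical `RecoveryTestLog` class records how long each step of a recovery test takes and flags steps that ran over the time the plan allows. The clock is injectable so the example can be checked without waiting; the step names and time budgets are illustrative.

```python
import time
from collections.abc import Callable, Iterator
from contextlib import contextmanager


class RecoveryTestLog:
    """Records the duration of each step in a recovery test, in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.steps: list[tuple[str, float]] = []

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Recorded even if the step raises, so failed steps are still timed.
            self.steps.append((name, self._clock() - start))

    def total_seconds(self) -> float:
        return sum(duration for _, duration in self.steps)

    def over_budget(self, allowed: dict[str, float]) -> list[str]:
        """Steps that took longer than the plan allows."""
        return [name for name, duration in self.steps
                if name in allowed and duration > allowed[name]]


# Simulated clock readings (seconds) so the example runs instantly.
ticks = iter([0.0, 120.0, 120.0, 1020.0])
log = RecoveryTestLog(clock=lambda: next(ticks))
with log.step("restore database"):
    pass
with log.step("repoint application"):
    pass
print(log.total_seconds())  # 1020.0
print(log.over_budget({"restore database": 600.0,
                       "repoint application": 600.0}))  # ['repoint application']
```

In a real test you would keep the default clock and wrap each actual recovery step in a `with log.step(name):` block.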
The last metric covered in this resource is the difference between your target and actual recovery time; measuring it is known as a gap analysis. You can (and should!) look for gaps with tabletop exercises, failover and recovery tests, and enterprise-wide BC/DR tests. Once you’ve identified the gaps in your plans, you can set KPIs to close them and build those KPIs into your planning process.
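The gap itself is simple arithmetic: actual recovery time minus target recovery time, per process. A minimal sketch, assuming you have target times (for example, your RTOs) and measured times from your tests, both in hours; the process names and numbers are hypothetical. Processes that have a target but haven’t been tested yet are listed separately rather than silently dropped.

```python
def recovery_gaps(target_hours: dict[str, float],
                  actual_hours: dict[str, float]) -> dict[str, float]:
    """Actual minus target recovery time for each tested process.

    Positive values mean the test overran the target; negative values are headroom.
    """
    return {p: actual_hours[p] - target
            for p, target in target_hours.items() if p in actual_hours}


def untested(target_hours: dict[str, float], actual_hours: dict[str, float]) -> list[str]:
    """Processes with a target but no measured recovery time yet."""
    return [p for p in target_hours if p not in actual_hours]


targets = {"payroll": 8.0, "order entry": 4.0, "email": 24.0}
actuals = {"payroll": 6.5, "order entry": 7.0}
gaps = recovery_gaps(targets, actuals)
print(gaps)                                   # {'payroll': -1.5, 'order entry': 3.0}
print([p for p, g in gaps.items() if g > 0])  # ['order entry']
print(untested(targets, actuals))             # ['email']
```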
Best Practices for Clean BC/DR Data
The data that your BC/DR software collects needs to be “clean” to ensure accurate reports and planning. For good data hygiene, make sure you’re standardizing data input with drop-down menus, pick lists, text formatting and data validation. For example, if you’re entering employee phone numbers into a plan, you’ll want to validate that those phone numbers include an area code and are still in use.
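As an example of data validation, the sketch below checks that a phone number includes an area code and normalizes it to one consistent format before it is stored. It assumes North American (NANP) 10-digit numbers and an E.164-style `+1XXXXXXXXXX` output; other countries would need different patterns. It cannot tell whether a number is still in use, which requires a separate check.

```python
import re
from typing import Optional

# Optional leading country code 1, then a 3-digit area code and a 3-digit
# exchange (each starting 2-9, per NANP rules), then a 4-digit line number.
_NANP = re.compile(r"1?([2-9]\d{2})([2-9]\d{2})(\d{4})")


def normalize_phone(raw: str) -> Optional[str]:
    """Return the number as +1XXXXXXXXXX, or None if it lacks a valid area code."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, dots, parentheses, '+'
    match = _NANP.fullmatch(digits)
    if match is None:
        return None
    return "+1" + "".join(match.groups())


for raw in ["(415) 555-0132", "+1 415.555.0132", "555-0132"]:
    print(raw, "->", normalize_phone(raw))
# (415) 555-0132 -> +14155550132
# +1 415.555.0132 -> +14155550132
# 555-0132 -> None
```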
Deduplication and Identity and Access Management (IAM) can help you keep your data clean. Deduplication eliminates repeated copies of the same entry. Credentials (authentication) combined with permissions (authorization) ensure that only qualified users can enter vital records and data. You’ll also save yourself a lot of time and headaches by integrating your BC/DR system with other applications (for example, your HR system) to avoid duplicate records and reduce the chance of errors.
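Deduplication usually comes down to choosing a normalized key and keeping one record per key. This sketch deduplicates contact records by email address, ignoring case and surrounding whitespace, and keeps the first record it sees. The field names and sample data are hypothetical, and a real system might match on several fields or merge duplicates instead of discarding them.

```python
from collections.abc import Callable, Hashable, Iterable


def dedupe(records: Iterable[dict], key: Callable[[dict], Hashable]) -> list[dict]:
    """Keep the first record for each normalized key, preserving input order."""
    seen: dict[Hashable, dict] = {}
    for record in records:
        seen.setdefault(key(record), record)
    return list(seen.values())


def email_key(record: dict) -> str:
    """Normalize an email address so trivially different spellings match."""
    return record["email"].strip().casefold()


contacts = [
    {"name": "Kyle Smith", "email": "Kyle.Smith@example.com"},
    {"name": "kyle smith", "email": " kyle.smith@example.com "},
    {"name": "Ana Lopez", "email": "ana.lopez@example.com"},
]
print([c["name"] for c in dedupe(contacts, email_key)])  # ['Kyle Smith', 'Ana Lopez']
```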
Where to Start
We live in a world where disasters happen and companies either suffer or die. BC/DR is critical to the success and resilience of an organization, and it’s your responsibility to keep the business afloat and your staff safe in an emergency… but you already knew that.
With the weight of the world on your shoulders, you can only rely on data to sleep soundly at night.
You’ve made a great start to BC/DR planning by making it to the end of this guide, but now it’s time to turn your knowledge into action! Start by identifying your critical business functions and how they depend on one another, using a dependency modeling tool.
Next, set an acceptable downtime threshold using RTO and RPO metrics. Test your plans to see if you come close to or exceed those thresholds. If you do, revise the plans and test them again. You should set KPIs to measure how often your plans are updated and tested, and conduct a gap analysis to compare the planned vs. actual recovery time.
Finally, make sure that you’re maintaining “hygienic” data for accurate reporting. Your BC/DR metrics are useless if the data isn’t accurate. It may seem like a no-brainer, but it’s surprising how many companies lull themselves into a false sense of security with reports that misrepresent how well they can meet their SLAs. It’s always better to be a realist, even if that means acknowledging the risks you are accepting.