Resolver’s Hosted Platform Disaster Recovery Plan (DRP) encompasses the applications, application environment, network, and data communications infrastructure involved in the product. Any event that has a negative impact on a company’s business continuity or finances could be termed a disaster. This includes hardware or software failure, a network outage, a power outage, physical damage to a building such as fire or flooding, human error, or some other significant event. To mitigate the risk of a disaster, whether natural or man-made, the company has developed a detailed Hosted DRP. The plan includes the strategies and actions that the company’s technical and management personnel will need to perform before, during, and after a disruption.
The DRP provides guidelines for declaring a disaster and defines the roles and responsibilities assigned to each team involved.
The purpose of this policy is to protect the confidentiality, integrity, and availability of Resolver’s and its customers’ information by controlling remote access to Resolver’s IT systems.
The Hosted DRP describes the step-by-step process for recovering from the loss of Resolver’s Platform. It includes guidelines on the priority of work to ensure that applications and systems are recovered in a timely fashion.
The purpose of the DRP is to define precisely how Resolver will recover its IT infrastructure, IT services, and all data (including personal data) within set deadlines in the case of a disaster or other disruptive incident. The objective of this Plan is to complete the recovery of IT infrastructure, IT services, and data within the set recovery time objective (RTO).
This Plan includes all resources and processes necessary for the recovery and covers all the information security aspects of business continuity management.
Users of this document are members of top management and the employees needed to carry out the recovery.
This whitepaper uses two common industry terms for disaster planning:
Recovery time objective (RTO) — The time it takes after a disruption to restore a business process to its service level, as defined by the operational level agreement (OLA). For example, if a disaster occurs at 12:00 PM (noon) and the RTO is eight hours, the DR process should restore the business process to the acceptable service level by 8:00 PM.
Recovery point objective (RPO) — The acceptable amount of data loss measured in time. For example, if a disaster occurs at 12:00 PM (noon) and the RPO is one hour, the system should recover all data that was in the system before 11:00 AM. Data loss will span only one hour, between 11:00 AM and 12:00 PM (noon).
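The arithmetic in the two examples above can be sketched in a few lines of Python. This is only an illustration of the definitions: the function names and the calendar date are assumptions made for the example and are not part of the Plan.

```python
from datetime import datetime, timedelta


def rto_deadline(disaster_time: datetime, rto: timedelta) -> datetime:
    """Latest time by which the business process should be restored."""
    return disaster_time + rto


def rpo_recovery_point(disaster_time: datetime, rpo: timedelta) -> datetime:
    """Point in time up to which all data should be recoverable."""
    return disaster_time - rpo


# Disaster at 12:00 PM (noon); the date itself is arbitrary.
disaster = datetime(2020, 7, 1, 12, 0)

# RTO of eight hours: service restored by 8:00 PM.
print(rto_deadline(disaster, timedelta(hours=8)))

# RPO of one hour: all data from before 11:00 AM is recovered.
print(rpo_recovery_point(disaster, timedelta(hours=1)))
```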
A company typically decides on an acceptable RTO and RPO based on the financial impact to the business when systems are unavailable. The company determines financial impact by considering many factors, such as the loss of business and damage to its reputation due to downtime and the lack of systems availability.
IT organizations then plan solutions to provide cost-effective system recovery based on the RPO within the timeline and the service level established by the RTO.
The DRP team verifies that database data is backed up daily. The backup and restoration of all databases are tested regularly.
All of Resolver’s Cloud Platform applications are hosted in Amazon Web Services (AWS) data centers.
A Resolver DRP was developed to map each team’s tasks and the interdependencies of those tasks.
The DRP is reviewed and tested on an annual basis to keep the plan in sync with current business and Resolver’s Platform environment needs.
A disaster will be declared if Resolver’s Platform is inaccessible for four consecutive hours or if Resolver management believes Resolver’s Platform will be unavailable for a twenty-four (24) hour period. Declaring a Disaster Recovery Event is a serious step, and a conservative approach will be taken when a decision is required.
If Resolver’s Platform production facility is destroyed, a disaster will be declared immediately.
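The declaration criteria above can be expressed as a simple check. The sketch below is illustrative only: the function and parameter names are hypothetical, and treating the four-hour and twenty-four-hour thresholds as inclusive is an assumption. The actual decision remains a human judgement made by the people named in this Plan; the check only reports whether one of the stated criteria has been met.

```python
from datetime import timedelta
from typing import Optional

# Thresholds taken from the declaration criteria in this Plan.
OUTAGE_THRESHOLD = timedelta(hours=4)            # consecutive inaccessibility
EXPECTED_OUTAGE_THRESHOLD = timedelta(hours=24)  # management's estimate


def declaration_criteria_met(
    outage_so_far: timedelta,
    expected_total_outage: Optional[timedelta] = None,
    production_facility_destroyed: bool = False,
) -> bool:
    """Return True if any stated criterion for declaring a disaster is met."""
    if production_facility_destroyed:
        return True
    if outage_so_far >= OUTAGE_THRESHOLD:  # inclusive bound is an assumption
        return True
    if (
        expected_total_outage is not None
        and expected_total_outage >= EXPECTED_OUTAGE_THRESHOLD
    ):
        return True
    return False


# One hour into an outage that management expects to last a full day.
print(declaration_criteria_met(timedelta(hours=1), timedelta(hours=24)))
```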
Declaring a disaster is the responsibility of the Development Operations (DevOps) team and the COO.
The COO is the Disaster Recovery Executive. In the COO’s absence, the VP of Engineering and/or the CEO are authorized to make decisions on behalf of Resolver. A DevOps team member will advise the Disaster Recovery Executive about the potential Disaster Recovery situation. The Disaster Recovery Executive will determine whether an incident should be classified as a Disaster Recovery Event, which will put the Hosted Disaster Recovery Plan into effect. Furthermore, the Disaster Recovery Executive will notify the Director of the Customer Service group and the EVP of Operations that a Disaster Recovery Event has been declared. The Director of Customer Service will determine the information that will be communicated to customers.
Key Personnel Contact Info (Call tree)
|Name and Title||Contact details||Contact Numbers|
|BCP Coordinator||Mobile phone|
|BCP Coordinator||Mobile phone|
|VP Customer Success, Communications Team Lead||Mobile phone|
|DevOps Director||Mobile phone|
|Information Security Analyst||Mobile phone|
The Manager of DevOps has overall responsibility to ensure that the Hosted Disaster Recovery Plan and Disaster Recovery environment are properly maintained and tested. This person is also the leader of the Disaster Recovery Team and will lead the Disaster Recovery Team in implementing the Hosted Disaster Recovery Plan.
If a potential Disaster Recovery Event occurs after hours, the On-Call DevOps team member is responsible for identifying a possible incident and following the escalation process.
The hosting personnel located at the Production Site(s) will assist in assessing the impact of the incident and reporting this information to Resolver.
The Communications Team is responsible for ensuring that all stakeholders are updated regularly, every 4 hours.
The Communications Team is responsible for ensuring that the entire company has been notified of the disaster. The best and/or most practical means of contacting all employees will be used, with preference given to the following methods (in order):
The Director of Customer Service is responsible for notifying customers of the declaration of a Disaster Recovery Event. The Director of Customer Service will inform customers of the nature of the disaster and the estimated time to recovery. Customers will receive regular communications (every 4-6 hours) on status and progress for the duration of the Disaster Recovery Event.
The Public Relations Representative (PR Rep) is a member of the executive staff or their assigned representative. This person is the only Resolver employee authorized to give any statement to the media.
Important note: All Disaster Recovery team members will refer members of the media to the CEO and CRO.
The disaster recovery process consists of defining rules, processes, and disciplines to ensure that critical business processes will continue to function if there is a failure of one or more of the information processing or telecommunications resources upon which their operations depend. The following are key elements of a disaster recovery plan:
Key people from each business unit should be members of the team and included in all disaster recovery planning activities. The disaster recovery planning group needs to understand the business processes, technology, networks, and systems in order to create a DRP. A risk and business impact analysis should be prepared by the disaster recovery planning group that includes at least the top ten potential disasters. After analyzing the potential risks, priority levels should be assigned to each business process and application/system.
It is important to keep the inventory up to date and to maintain a complete list of equipment (physical and virtual), third-party services on which our products depend, locations, and points of contact.
The goal is to provide viable, effective, and economical recovery across all technology domains.
Each product group should create a list of assets and can use the following chart to classify them:
|Priority Level||Classification||Description|
|1||Mission Critical||Mission Critical to accomplishing the mission of the organization. It can be performed only by computers. No alternative manual processing capability exists.|
|2||Critical||Critical in accomplishing the work of the organization. Primarily performed by computers. It can be performed manually for a limited time period. Must be restored starting at 36 hours and within 5 days.|
|3||Essential||Essential in completing the work of the organization. Performed by computers. It can be performed manually for an extended time period. It can be restored as early as 5 days; however, it can take longer.|
|4||Non-Critical||Non-Critical to accomplishing the mission of the organization. It can be delayed until the damaged site is restored and/or a new computer system is purchased. It can be performed manually.|
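As an illustration of how a product group might record this classification alongside its asset inventory, the sketch below models the four priority levels and orders assets for recovery. The asset names are hypothetical placeholders, and the data structure is an assumption made for the example rather than a format prescribed by this Plan.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """Priority levels from the classification chart (1 is recovered first)."""
    MISSION_CRITICAL = 1
    CRITICAL = 2
    ESSENTIAL = 3
    NON_CRITICAL = 4


@dataclass(frozen=True)
class Asset:
    name: str
    priority: Priority


def recovery_order(assets):
    """Order assets by priority level; sorted() is stable, so assets in the
    same level keep their original inventory order."""
    return sorted(assets, key=lambda asset: asset.priority)


# Hypothetical inventory entries, for illustration only.
inventory = [
    Asset("reporting-archive", Priority.NON_CRITICAL),
    Asset("customer-database", Priority.MISSION_CRITICAL),
    Asset("internal-wiki", Priority.ESSENTIAL),
    Asset("notification-service", Priority.CRITICAL),
]

for asset in recovery_order(inventory):
    print(asset.priority.value, asset.priority.name, asset.name)
```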
This phase includes the activities to notify the Disaster Recovery Executive of a possible disaster, direct the Disaster Recovery Team to assess the damage to Resolver’s Platform, and begin the Disaster Recovery process if necessary.
2. The DevOps Team member will notify key personnel of the incident. At a minimum, the following personnel must be notified:
The DevOps Team member will include the following notification information if applicable:
3. The Disaster Recovery Executive will determine how to proceed. The following actions may be taken:
a) Require the Disaster Recovery Team to conduct a further damage assessment. Information on the following items will be reported back to the Disaster Recovery Executive hourly:
b) Determine the extent of the incident and whether it is considered a Disaster Recovery Event as defined by the company’s Disaster Recovery Guidelines. If so, the Disaster Recovery Executive will instruct the Disaster Recovery Team to activate the Hosted Disaster Recovery Plan.
This phase includes the activities that initiate the Disaster Recovery Event. Team members are notified, assembled, and updated on the present situation.
The DevOps Team member will contact the COO. If the COO is not reachable, the DevOps Team member will contact the Company CTO. If the CTO is not reachable, the DevOps Team member will contact the CEO.
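The contact order described above (COO, then CTO, then CEO) amounts to walking an ordered escalation chain until someone answers. A minimal sketch follows; the `is_reachable` callback is a hypothetical stand-in for whatever contact method is actually used (phone call, page, and so on) and is not part of this Plan.

```python
from typing import Callable, Optional, Sequence

# Escalation order as stated in this phase of the Plan.
ESCALATION_CHAIN = ("COO", "CTO", "CEO")


def first_reachable(
    chain: Sequence[str], is_reachable: Callable[[str], bool]
) -> Optional[str]:
    """Return the first role in the chain that can be reached, or None."""
    for role in chain:
        if is_reachable(role):
            return role
    return None


# Example: the COO does not answer, so the CTO is the first contact reached.
print(first_reachable(ESCALATION_CHAIN, lambda role: role != "COO"))
```

If nobody in the chain can be reached, the function returns `None`; what happens in that case is not specified here and would need to be decided separately.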
The recovery phase covers the steps taken to restore Resolver’s Platform at the Disaster Recovery Site.
The initial focus in bringing up the Disaster Recovery environment as Production is to ensure that the data is as current as possible and to determine how much data was lost between the most recent Production-to-Disaster Recovery data synchronization and the time Production was lost.
In parallel to the data recovery effort, DevOps will be working to re-point customers, and start pushing the new Domain Name entries across the Internet. Lastly, when the process is under control, DevOps will work to find a new site for Production if that is necessary.
It is important to have the engineering leads available throughout the entire Disaster Recovery process. Work to bring Resolver’s Platform live will be performed around the clock.
The RTO for Resolver’s AWS-hosted Production environments should be no more than 15 minutes within the scope of the AWS Region.
The RPO for Resolver’s AWS-hosted Production environments is one (1) hour: potential data loss will not exceed one hour.
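To show how these targets relate to the timestamps gathered during a recovery or a test, the sketch below computes the data loss window (time between the last Production-to-Disaster Recovery synchronization and the loss of Production) and the downtime, then compares them with the 15-minute RTO and one-hour RPO. The function name, the result keys, and the sample timestamps are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Targets stated in this Plan for AWS-hosted Production environments.
RTO_TARGET = timedelta(minutes=15)
RPO_TARGET = timedelta(hours=1)


def recovery_metrics(
    last_sync: datetime, production_lost: datetime, service_restored: datetime
) -> dict:
    """Compare an observed recovery against the RTO and RPO targets."""
    if not (last_sync <= production_lost <= service_restored):
        raise ValueError("timestamps must be in chronological order")
    data_loss_window = production_lost - last_sync
    downtime = service_restored - production_lost
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= RPO_TARGET,
        "rto_met": downtime <= RTO_TARGET,
    }


# Hypothetical timestamps: last sync 09:30, Production lost 10:00, restored 10:12.
result = recovery_metrics(
    datetime(2020, 1, 15, 9, 30),
    datetime(2020, 1, 15, 10, 0),
    datetime(2020, 1, 15, 10, 12),
)
print(result["data_loss_window"], result["rpo_met"])
print(result["downtime"], result["rto_met"])
```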
Disaster Recovery testing recurs annually in January and incorporates a full mock failover up to, but not including, the point of redirecting customers.
The Hosted Disaster Recovery Plan and Disaster Recovery environment will be modified in response to changes in the Resolver’s Platform environment. Such changes might include personnel changes, critical application changes, and network, hardware, or software changes. Resolver’s Platform Hosted Disaster Recovery Plan is tested annually to ensure that Resolver has the appropriate environment to support Disaster Recovery at 100% capacity.
The Disaster Recovery test is designed to ensure that Disaster Recovery data is in sync with Production data and the Disaster Recovery applications function the same as Production applications.
This document is valid as of July 2020.
The owner of this document is an Information Security Analyst who must check and, if necessary, update the document at least once a year.
When evaluating the effectiveness and adequacy of this document, the following criteria need to be considered:
EFFECTIVE ON: September 2020
REVIEW CYCLE: Annual at least and as needed
REVIEW, APPROVAL & CHANGE HISTORY: Last reviewed and approved in August 2020 by Resolver’s Information Technology Security team.