Business Continuity & Disaster Recovery Frameworks

A comprehensive reference for BC/DR planning, recovery objectives, business impact analysis, and testing methodologies.

Core Concepts

Business Continuity (BC)

The capability of an organisation to continue delivery of products or services at acceptable predefined levels following a disruptive incident.

Disaster Recovery (DR)

The process, policies, and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organisation after a natural or human-induced disaster.

Key Difference

BC = Keeping the business running (people, processes, facilities)
DR = Recovering IT systems and data

Recovery Objectives

Recovery Time Objective (RTO)

Definition: The maximum acceptable time that a system, application, or function can be down after a disruption.

Examples:

Critical payment system: RTO = 1 hour
Email system: RTO = 4 hours
Internal file storage: RTO = 24 hours

Business Question: "How long can we survive without this?"

Recovery Point Objective (RPO)

Definition: The maximum acceptable amount of data loss, measured as a period of time before the disruption. It determines how often data must be backed up or replicated.

Examples:

Financial transactions: RPO = 0 (no data loss is acceptable)
CRM system: RPO = 1 hour (up to 1 hour of data can be lost)
Document management: RPO = 24 hours

Business Question: "How much data can we afford to lose?"
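
The link between RPO and backup frequency can be sketched in Python. This is a minimal illustration, assuming periodic backups that capture a snapshot when they start and only become usable once they finish; the function names and the `backup_duration` parameter are hypothetical and not taken from any standard.

```python
from datetime import timedelta


def worst_case_data_loss(
    backup_interval: timedelta,
    backup_duration: timedelta = timedelta(0),
) -> timedelta:
    """Worst-case window of lost data for periodic backups.

    Assumption: a backup reflects the data as it was when the backup started,
    and cannot be restored from until it has finished. A failure just before
    the next backup completes therefore loses interval + duration of data.
    """
    return backup_interval + backup_duration


def meets_rpo(
    rpo: timedelta,
    backup_interval: timedelta,
    backup_duration: timedelta = timedelta(0),
) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


crm_rpo = timedelta(hours=1)
hourly_ok = meets_rpo(crm_rpo, timedelta(hours=1), timedelta(minutes=10))
tighter_ok = meets_rpo(crm_rpo, timedelta(minutes=45), timedelta(minutes=10))
```

Under these assumptions, an hourly backup that takes 10 minutes to complete does not meet the CRM system's 1-hour RPO, whereas a 45-minute interval does.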

Maximum Tolerable Downtime (MTD)

Definition: The time after which an organisation's viability will be threatened if normal operations cannot be resumed.

Business Question: "At what point does this outage become an existential threat?"

Recovery Consistency Objective (RCO)

Definition: The required level of data consistency across interdependent systems once recovery is complete.

Example: Customer orders, inventory, and payment systems must all recover to the same point in time to maintain data integrity.
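
One way to picture the example above: if each system can be restored to any moment up to its own latest recovery point, the latest point they all share is the earliest of those. The sketch below assumes point-in-time restore is available for every system; the helper name, system names, and timestamps are illustrative only.

```python
from datetime import datetime


def common_recovery_point(latest_points: dict[str, datetime]) -> datetime:
    """Latest time to which every system can be restored together.

    Assumption: each system supports restoring to any time at or before its
    own latest recovery point, so the shared point is the minimum.
    """
    if not latest_points:
        raise ValueError("at least one system is required")
    return min(latest_points.values())


latest = {
    "orders": datetime(2024, 1, 15, 10, 58),
    "inventory": datetime(2024, 1, 15, 10, 45),
    "payments": datetime(2024, 1, 15, 11, 0),
}
target = common_recovery_point(latest)  # restore all three systems to this time
```

Here the inventory system limits the shared recovery point, so orders and payments would be rolled back to 10:45 as well.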

BC/DR Planning Frameworks

ISO 22301:2019 - Business Continuity Management

Region: International

Purpose: International standard for Business Continuity Management Systems (BCMS).

Key Components:

Context of the organisation
Leadership and planning
Business impact analysis and risk assessment
BC strategy and solutions
Exercising and testing
Performance evaluation and continual improvement

Best For: Organisations seeking certification or formal BCMS

Link: ISO 22301:2019

NIST SP 800-34 Rev. 1 - Contingency Planning Guide

Region: United States

Purpose: Federal guidance for IT system contingency planning.

Key Components:

Contingency planning policy
Business impact analysis
Preventive controls
Contingency strategies
Plan development, testing, and maintenance

Best For: US federal agencies, contractors, and organisations following NIST guidance

Link: NIST SP 800-34

BS 25999-2 (Superseded by ISO 22301)

Region: United Kingdom

Note: Withdrawn in 2012 and replaced by ISO 22301. Still referenced in some legacy documentation.

BCI Good Practice Guidelines (GPG)

Region: International

Purpose: Professional practice guidance from the Business Continuity Institute.

Key Components:

Policy and programme management
Embedding BC in the organisation's culture
Analysis (BIA and risk assessment)
Design (strategies and solutions)
Implementation
Validation (exercising, testing, maintenance)

Best For: BC practitioners seeking professional development and implementation guidance

Link: BCI Good Practice Guidelines

DRII Professional Practices

Region: International

Purpose: Professional practice framework from DRI International, a certification and education body that is separate from the BCI.

Key Components: 10 Professional Practice areas covering BC lifecycle

Link: DRI International

Business Impact Analysis (BIA) Process

Purpose

Identify and quantify the impacts of disruptions to critical business functions and the resources required to support them.

BIA Steps

flowchart TD
    A[1. Identify Business Functions] --> B[2. Assess Impact Over Time]
    B --> C[3. Determine RTO/RPO Requirements]
    C --> D[4. Identify Critical Resources]
    D --> E[5. Document Dependencies]
    E --> F[6. Prioritise Recovery]
    F --> G[7. Present Findings to Leadership]
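
Steps 5 and 6 can be illustrated with a dependency-ordered recovery sequence. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the dependency map is entirely hypothetical and would come from the BIA in practice.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the items in its set.
DEPENDENCIES = {
    "Customer payments": {"Payments database", "Network"},
    "Customer support portal": {"Customer database", "Network"},
    "Payments database": {"Storage"},
    "Customer database": {"Storage"},
    "Network": set(),
    "Storage": set(),
}


def recovery_sequence(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every item follows the items it depends on.

    Raises graphlib.CycleError if the documented dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


sequence = recovery_sequence(DEPENDENCIES)
```

Any order produced this way restores storage and network before the databases, and the databases before the business functions that rely on them. A circular dependency raises an error, which is itself a useful finding when documenting dependencies.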

Impact Categories to Assess

Impact Type	Examples
Financial	Lost revenue, fines, compensation costs
Operational	Inability to deliver services, supply chain disruption
Reputational	Customer confidence, media coverage, brand damage
Regulatory/Legal	Compliance breaches, contractual penalties
Health & Safety	Risk to staff or public safety

BIA Output Examples

Business Function	MTD	RTO	RPO	Impact (4hr)	Impact (24hr)
Customer payments	2hr	1hr	0	£50k loss, regulatory breach	Business-critical
Customer support portal	8hr	4hr	1hr	Reputation damage	£20k loss, SLA breach
Internal email	24hr	8hr	4hr	Productivity impact	Minor impact
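
The rows above can be checked mechanically. The sketch below assumes that a function's RTO should never exceed its MTD and, as one possible prioritisation rule, orders recovery by MTD and then RTO; the `BiaEntry` class and both helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiaEntry:
    function: str
    mtd_hours: float
    rto_hours: float
    rpo_hours: float


def validate(entry: BiaEntry) -> list[str]:
    """Return a list of problems found in one BIA row (empty if none)."""
    problems = []
    if min(entry.mtd_hours, entry.rto_hours, entry.rpo_hours) < 0:
        problems.append(f"{entry.function}: objectives cannot be negative")
    if entry.rto_hours > entry.mtd_hours:
        problems.append(f"{entry.function}: RTO exceeds MTD")
    return problems


def recovery_order(entries: list[BiaEntry]) -> list[BiaEntry]:
    """Most time-critical functions first (assumed rule: MTD, then RTO)."""
    return sorted(entries, key=lambda e: (e.mtd_hours, e.rto_hours))


# Values copied from the BIA output examples table.
ENTRIES = [
    BiaEntry("Internal email", 24, 8, 4),
    BiaEntry("Customer payments", 2, 1, 0),
    BiaEntry("Customer support portal", 8, 4, 1),
]

issues = [p for e in ENTRIES for p in validate(e)]
ordered = [e.function for e in recovery_order(ENTRIES)]
```

With the example rows, no problems are reported and customer payments comes first in the recovery order.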

BC/DR Strategy Development

Strategy Options by Recovery Speed

Strategy	RTO Range	Cost	Description
Hot Site	Minutes-1hr	High	Fully equipped, continuously synchronised alternate site
Warm Site	4-24hrs	Medium	Partially equipped site with some infrastructure ready
Cold Site	Days-weeks	Low	Empty facility with power and connectivity only
Cloud DR	Minutes-hours	Medium	Cloud-based recovery using IaaS/PaaS
Mobile Recovery	24-72hrs	Medium	Transportable recovery facilities

Backup Strategies by RPO

RPO Target	Backup Strategy	Technology Examples
0 (Zero data loss)	Synchronous replication or journalling	Database mirroring, synchronous SAN replication, synchronous redundant database writes
Minutes	Asynchronous replication	Continuous data protection, near-real-time replication
Hours	Frequent backups	Hourly incremental backups, log shipping
24 hours	Daily backups	Nightly full or incremental backups
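
The table above can be read as a lookup from RPO target to backup strategy. The sketch below is one possible encoding; the exact boundaries (under 1 hour counts as "Minutes", under 24 hours as "Hours") are assumptions made for illustration, since the table does not define them.

```python
from datetime import timedelta


def backup_strategy_for(rpo: timedelta) -> str:
    """Map an RPO target to the strategy tier in the table above.

    The tier boundaries are assumed, not defined by the table.
    """
    if rpo < timedelta(0):
        raise ValueError("RPO cannot be negative")
    if rpo == timedelta(0):
        return "Synchronous replication or journalling"
    if rpo < timedelta(hours=1):
        return "Asynchronous replication"
    if rpo < timedelta(hours=24):
        return "Frequent backups"
    return "Daily backups"


payments = backup_strategy_for(timedelta(0))
crm = backup_strategy_for(timedelta(hours=1))
documents = backup_strategy_for(timedelta(hours=24))
```

With these boundaries, the RPO examples given earlier map to synchronous replication or journalling for financial transactions, frequent backups for the CRM system, and daily backups for document management.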

Testing Methodologies

Test Types (Progressive Complexity)

1. Tabletop Exercise

Description: Discussion-based session where team members walk through scenarios verbally.

Duration: 2-4 hours

Frequency: Quarterly

Advantages:

Low cost and disruption
Good for training and identifying gaps
Tests understanding and decision-making

Disadvantages:

Doesn't test actual systems
May not reveal technical issues

Example Scenario: "The primary data centre has lost power and cooling. Walk through your response steps."

2. Simulation Test

Description: Teams respond to a scenario in near-real-time, without affecting production systems.

Duration: 4-8 hours

Frequency: Semi-annually

Advantages:

Tests coordination and communication
Identifies process gaps
Minimal business disruption

Disadvantages:

Doesn't validate technical recovery
Requires significant planning

3. Parallel Test

Description: Recovery systems are activated alongside production systems without failover.

Duration: 1-2 days

Frequency: Annually

Advantages:

Tests actual recovery capability
No business disruption
Validates backup data integrity

Disadvantages:

Costly
Doesn't test full failover process

4. Full Interruption Test

Description: Production systems are shut down and operations fail over fully to the recovery environment.

Duration: Varies (planned outage window)

Frequency: Every 1-3 years (rarely performed)

Advantages:

Complete validation of DR capability
Tests all aspects including staff response

Disadvantages:

High risk and cost
Significant business disruption
Requires executive approval

Note: Typically only performed for critical systems with mature DR programmes.

Test Documentation Requirements

Pre-Test:

Test objectives and scope
Success criteria
Participants and roles
Test scenario details
Rollback procedures

During Test:

Actions taken (timestamped)
Issues encountered
Decisions made

Post-Test:

Results vs. success criteria
Lessons learned
Action items for plan improvement
Updated RTO/RPO actuals
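
A timestamped action log makes it straightforward to derive the actual recovery time for the post-test report. The sketch below assumes the log is a list of (timestamp, event) pairs; the event labels and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def actual_recovery_time(
    log: list[tuple[datetime, str]],
    start_event: str = "Disruption declared",
    end_event: str = "Service restored",
) -> timedelta:
    """Elapsed time between the first start event and the first end event."""
    start = next((ts for ts, event in log if event == start_event), None)
    end = next((ts for ts, event in log if event == end_event), None)
    if start is None or end is None:
        raise ValueError("log must contain both the start and end events")
    return end - start


test_log = [
    (datetime(2024, 3, 1, 9, 0), "Disruption declared"),
    (datetime(2024, 3, 1, 9, 10), "Recovery team assembled"),
    (datetime(2024, 3, 1, 9, 50), "Service restored"),
]
elapsed = actual_recovery_time(test_log)
met_rto = elapsed <= timedelta(hours=1)  # compared with a 1-hour RTO target
```

The resulting figure can be recorded as the RTO actual and compared with the success criteria set before the test.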

BC/DR Plan Components

Essential Plan Elements

Plan Activation Criteria
- Who can invoke the plan
- Triggering events
- Decision tree
Emergency Contact Information
- Crisis management team
- Key vendors/suppliers
- Emergency services
- Notification cascades
Roles and Responsibilities
- Crisis management team structure
- Recovery team leaders
- Communication coordinators
Recovery Procedures
- Step-by-step technical recovery tasks
- System dependencies and sequence
- Estimated timeframes
Communication Plan
- Internal communications (staff)
- External communications (customers, suppliers, media)
- Regulatory notifications
- Templates for common scenarios
Alternative Working Arrangements
- Remote working capabilities
- Alternative facilities
- Equipment and supplies
Vendor and Third-Party Contact Details
- Support contracts and escalation paths
- SLA reference information

Industry-Specific Requirements

Financial Services

PRA/FCA (UK): Operational resilience requirements
FFIEC (US): Business continuity planning handbook
Basel Committee: Principles for operational resilience

Healthcare

NHS England: Business continuity guidance for NHS organisations
HIPAA (US): Contingency plan requirements (45 CFR 164.308(a)(7))

Critical Infrastructure

NIS Regulations (UK): BC requirements for operators of essential services
NIS2 Directive (EU): Enhanced resilience measures

Quick Selection Guide

Organisation Profile	Recommended Framework	Testing Frequency
Small business (<50 staff)	Simplified BCI GPG approach	Annual tabletop
Medium enterprise	ISO 22301 or BCI GPG	Quarterly tabletop, Annual simulation
Large enterprise	ISO 22301 + industry-specific	Monthly tabletop, Quarterly simulation, Annual parallel test
US Federal/Contractor	NIST SP 800-34	Per agency requirements
Financial services (UK)	ISO 22301 + PRA/FCA guidance	Quarterly minimum
Healthcare (UK)	ISO 22301 + NHS guidance	Semi-annual minimum

Key Metrics and KPIs

Metric	Description	Target
Plan Currency	% of plans reviewed within last 12 months	100%
Staff Awareness	% of staff who know how to access BC plans	>80%
Test Coverage	% of critical systems tested annually	100%
RTO Achievement	% of recovery tests meeting RTO targets	>95%
RPO Achievement	% of recoveries meeting RPO targets	>95%
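
The achievement metrics above are simple percentages. As a rough sketch, assuming each test result is recorded as an (actual, target) pair in hours, the calculation might look like this; the function name and sample figures are illustrative only.

```python
def achievement_rate(results: list[tuple[float, float]]) -> float:
    """Percentage of results where the actual value met its target.

    Each result is (actual_hours, target_hours); meeting the target means
    the actual value is less than or equal to it.
    """
    if not results:
        raise ValueError("at least one result is required")
    met = sum(1 for actual, target in results if actual <= target)
    return 100.0 * met / len(results)


# Hypothetical recovery test results: (actual RTO, target RTO) in hours.
rto_results = [(0.8, 1), (5, 4), (6, 8), (20, 24)]
rate = achievement_rate(rto_results)
meets_kpi = rate > 95  # target from the table above
```

In this sample, three of four tests met their RTO, giving 75%, which falls short of the >95% target.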

Common Pitfalls

Plans Not Maintained: Plans become outdated as technology and staff change
Insufficient Testing: Tabletop exercises only, no validation of actual recovery
Single Points of Failure: Key person dependencies or single-vendor reliance
Inadequate Documentation: Plans are too high-level to act on or too technical for non-specialist responders
No Alternative Communications: Primary communication method fails and no backup exists
Backup Data Not Tested: Backups exist but restoration has never been validated
Scope Creep: Trying to protect everything instead of focusing on critical functions
