Business Continuity & Disaster Recovery Frameworks

A comprehensive reference for BC/DR planning, recovery objectives, business impact analysis, and testing methodologies.


Core Concepts

Business Continuity (BC)

The capability of an organisation to continue delivery of products or services at acceptable predefined levels following a disruptive incident.

Disaster Recovery (DR)

The process, policies, and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organisation after a natural or human-induced disaster.

Key Difference

  • BC = Keeping the business running (people, processes, facilities)
  • DR = Recovering IT systems and data

Recovery Objectives

Recovery Time Objective (RTO)

Definition: The maximum acceptable time that a system, application, or function can be down after a disruption.

Examples:

  • Critical payment system: RTO = 1 hour
  • Email system: RTO = 4 hours
  • Internal file storage: RTO = 24 hours

Business Question: "How long can we survive without this?"


Recovery Point Objective (RPO)

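Definition: The maximum acceptable amount of data loss, measured as the period of time before the disruption for which data may be unrecoverable. It determines how frequently data must be backed up or replicated.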

Examples:

  • Financial transactions: RPO = 0 (no data loss is acceptable)
  • CRM system: RPO = 1 hour (up to 1 hour of data can be lost)
  • Document management: RPO = 24 hours

Business Question: "How much data can we afford to lose?"
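
As a rough illustration of how RPO drives backup frequency: the worst-case data loss is approximately the gap between two consecutive recovery points, so the backup or replication interval should not exceed the RPO. The sketch below assumes that simplification (it ignores how long a backup takes to complete), and `meets_rpo` is a hypothetical helper name rather than part of any tool.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the gap between recovery points does not exceed the RPO.

    Simplifying assumption: worst-case data loss equals the backup interval.
    """
    return backup_interval <= rpo


# CRM example from above: RPO = 1 hour
print(meets_rpo(timedelta(minutes=30), timedelta(hours=1)))  # True
print(meets_rpo(timedelta(hours=24), timedelta(hours=1)))    # False
```

Under this model an RPO of 0 can only be met with an interval of zero, which corresponds to synchronous replication rather than periodic backups.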


Maximum Tolerable Downtime (MTD)

Definition: The time after which an organisation's viability will be threatened if normal operations cannot be resumed.

Business Question: "At what point does this outage become an existential threat?"


Recovery Consistency Objective (RCO)

Definition: Ensures data consistency across interdependent systems during recovery.

Example: Customer orders, inventory, and payment systems must all recover to the same point in time to maintain data integrity.
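
One way to check this after a recovery is to compare the point in time each interdependent system was restored to. The sketch below is illustrative only: the system names and timestamps are made up, and `recovery_point_spread` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def recovery_point_spread(recovery_points: dict[str, datetime]) -> timedelta:
    """Return the gap between the oldest and newest restored recovery points."""
    points = recovery_points.values()
    return max(points) - min(points)


# Hypothetical recovery points reported by each system after a restore
restored = {
    "orders": datetime(2024, 1, 1, 12, 0, 0),
    "inventory": datetime(2024, 1, 1, 12, 0, 0),
    "payments": datetime(2024, 1, 1, 11, 55, 0),
}

spread = recovery_point_spread(restored)
print(spread)                  # 0:05:00
print(spread == timedelta(0))  # False: the systems are not at the same point in time
```

A non-zero spread means at least one system holds records that the others do not, for example orders with no matching payment.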


BC/DR Planning Frameworks

ISO 22301:2019 - Business Continuity Management

Region: International

Purpose: International standard for Business Continuity Management Systems (BCMS).

Key Components:

  • Context of the organisation
  • Leadership and planning
  • Business impact analysis and risk assessment
  • BC strategy and solutions
  • Exercising and testing
  • Performance evaluation and continual improvement

Best For: Organisations seeking certification or formal BCMS

Link: ISO 22301:2019


NIST SP 800-34 Rev. 1 - Contingency Planning Guide

Region: United States

Purpose: Federal guidance for IT system contingency planning.

Key Components:

  • Contingency planning policy
  • Business impact analysis
  • Preventive controls
  • Contingency strategies
  • Plan development, testing, and maintenance

Best For: US federal agencies, contractors, and organisations following NIST guidance

Link: NIST SP 800-34


BS 25999-2 (Superseded by ISO 22301)

Region: United Kingdom

Note: Withdrawn in 2012 and replaced by ISO 22301. Still referenced in some legacy documentation.


BCI Good Practice Guidelines (GPG)

Region: International

Purpose: Professional practice guidance from the Business Continuity Institute.

Key Components:

  • Policy and programme management
  • Embedding BC in the organisation's culture
  • Analysis (BIA and risk assessment)
  • Design (strategies and solutions)
  • Implementation
  • Validation (exercising, testing, maintenance)

Best For: BC practitioners seeking professional development and implementation guidance

Link: BCI Good Practice Guidelines


DRII Professional Practices

Region: International

Purpose: Professional practice framework from DRI International (Disaster Recovery Institute International), a certification body separate from the BCI.

Key Components: Ten Professional Practice areas covering the BC lifecycle

Link: DRI International


Business Impact Analysis (BIA) Process

Purpose

Identify and quantify the impacts of disruptions to critical business functions and the resources required to support them.

BIA Steps

flowchart TD
    A[1. Identify Business Functions] --> B[2. Assess Impact Over Time]
    B --> C[3. Determine RTO/RPO Requirements]
    C --> D[4. Identify Critical Resources]
    D --> E[5. Document Dependencies]
    E --> F[6. Prioritise Recovery]
    F --> G[7. Present Findings to Leadership]

Impact Categories to Assess

| Impact Type | Examples |
|---|---|
| Financial | Lost revenue, fines, compensation costs |
| Operational | Inability to deliver services, supply chain disruption |
| Reputational | Customer confidence, media coverage, brand damage |
| Regulatory/Legal | Compliance breaches, contractual penalties |
| Health & Safety | Risk to staff or public safety |

BIA Output Examples

| Business Function | MTD | RTO | RPO | Impact (4 hr) | Impact (24 hr) |
|---|---|---|---|---|---|
| Customer payments | 2 hr | 1 hr | 0 | £50k loss, regulatory breach | Business-critical |
| Customer support portal | 8 hr | 4 hr | 1 hr | Reputation damage | £20k loss, SLA breach |
| Internal email | 24 hr | 8 hr | 4 hr | Productivity impact | Minor impact |
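
The BIA output above can be sanity-checked and turned into a recovery priority list. The sketch below makes two assumptions that are not stated in the table: that an RTO should never exceed its MTD, and that functions are prioritised by shortest MTD first, then shortest RTO. `BiaEntry` and both helper functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiaEntry:
    function: str
    mtd_hours: float
    rto_hours: float
    rpo_hours: float


# Values copied from the example BIA output
entries = [
    BiaEntry("Internal email", 24, 8, 4),
    BiaEntry("Customer payments", 2, 1, 0),
    BiaEntry("Customer support portal", 8, 4, 1),
]


def inconsistent(entries):
    """Functions whose RTO exceeds the MTD (assumed to be a planning error)."""
    return [e.function for e in entries if e.rto_hours > e.mtd_hours]


def recovery_order(entries):
    """Assumed prioritisation: shortest MTD first, ties broken by shortest RTO."""
    ranked = sorted(entries, key=lambda e: (e.mtd_hours, e.rto_hours))
    return [e.function for e in ranked]


print(inconsistent(entries))    # []
print(recovery_order(entries))  # ['Customer payments', 'Customer support portal', 'Internal email']
```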

BC/DR Strategy Development

Strategy Options by Recovery Speed

| Strategy | RTO Range | Cost | Description |
|---|---|---|---|
| Hot Site | Minutes to 1 hr | High | Fully equipped, continuously synchronised alternate site |
| Warm Site | 4-24 hrs | Medium | Partially equipped site with some infrastructure ready |
| Cold Site | Days to weeks | Low | Empty facility with power and connectivity only |
| Cloud DR | Minutes to hours | Medium | Cloud-based recovery using IaaS/PaaS |
| Mobile Recovery | 24-72 hrs | Medium | Transportable recovery facilities |

Backup Strategies by RPO

| RPO Target | Backup Strategy | Technology Examples |
|---|---|---|
| 0 (zero data loss) | Synchronous replication or journalling | Database mirroring, synchronous SAN replication, synchronous redundant database writes |
| Minutes | Asynchronous replication | Continuous data protection, near-real-time replication |
| Hours | Frequent backups | Hourly incremental backups, log shipping |
| 24 hours | Daily backups | Nightly full or incremental backups |
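
The mapping above can be expressed as a simple lookup. The table gives tiers rather than exact boundaries, so the cut-offs in this sketch (under 1 hour counts as "minutes", under 24 hours counts as "hours") are assumptions, and `backup_strategy_for` is a hypothetical helper.

```python
from datetime import timedelta


def backup_strategy_for(rpo: timedelta) -> str:
    """Map an RPO target to a backup strategy tier (boundaries are assumed)."""
    if rpo < timedelta(0):
        raise ValueError("RPO cannot be negative")
    if rpo == timedelta(0):
        return "Synchronous replication or journalling"
    if rpo < timedelta(hours=1):    # assumed boundary for the "Minutes" tier
        return "Asynchronous replication"
    if rpo < timedelta(hours=24):   # assumed boundary for the "Hours" tier
        return "Frequent backups"
    return "Daily backups"


print(backup_strategy_for(timedelta(0)))           # Synchronous replication or journalling
print(backup_strategy_for(timedelta(minutes=15)))  # Asynchronous replication
print(backup_strategy_for(timedelta(hours=1)))     # Frequent backups
print(backup_strategy_for(timedelta(hours=24)))    # Daily backups
```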

Testing Methodologies

Test Types (Progressive Complexity)

1. Tabletop Exercise

Description: Discussion-based session where team members walk through scenarios verbally.

Duration: 2-4 hours

Frequency: Quarterly

Advantages:

  • Low cost and disruption
  • Good for training and identifying gaps
  • Tests understanding and decision-making

Disadvantages:

  • Doesn't test actual systems
  • May not reveal technical issues

Example Scenario: "The primary data centre has lost power and cooling. Walk through your response steps."


2. Simulation Test

Description: Teams respond to a scenario in near real time, without affecting production systems.

Duration: 4-8 hours

Frequency: Semi-annually

Advantages:

  • Tests coordination and communication
  • Identifies process gaps
  • Minimal business disruption

Disadvantages:

  • Doesn't validate technical recovery
  • Requires significant planning


3. Parallel Test

Description: Recovery systems are activated alongside production systems without failover.

Duration: 1-2 days

Frequency: Annually

Advantages:

  • Tests actual recovery capability
  • No business disruption
  • Validates backup data integrity

Disadvantages:

  • Costly
  • Doesn't test full failover process


4. Full Interruption Test

Description: Production systems are shut down and operations fail over fully to the recovery environment.

Duration: Varies (planned outage window)

Frequency: Every 1-3 years (rarely performed)

Advantages:

  • Complete validation of DR capability
  • Tests all aspects including staff response

Disadvantages:

  • High risk and cost
  • Significant business disruption
  • Requires executive approval

Note: Typically only performed for critical systems with mature DR programmes.


Test Documentation Requirements

Pre-Test:

  • Test objectives and scope
  • Success criteria
  • Participants and roles
  • Test scenario details
  • Rollback procedures

During Test:

  • Actions taken (timestamped)
  • Issues encountered
  • Decisions made

Post-Test:

  • Results vs. success criteria
  • Lessons learned
  • Action items for plan improvement
  • Updated RTO/RPO actuals
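
The timestamped action log is what makes the "RTO actuals" measurable. As a minimal sketch, assuming the log's first entry marks the start of the disruption and its last entry marks service restoration, the achieved recovery time can be derived as follows. The log entries and the 1-hour target are made-up illustration values.

```python
from datetime import datetime, timedelta

# Hypothetical test log: (timestamp, action taken)
log = [
    (datetime(2024, 3, 1, 9, 0), "Disruption declared"),
    (datetime(2024, 3, 1, 9, 20), "Failover started"),
    (datetime(2024, 3, 1, 9, 50), "Service restored"),
]


def actual_recovery_time(log) -> timedelta:
    """Elapsed time between the earliest and latest log entries."""
    timestamps = [ts for ts, _ in log]
    return max(timestamps) - min(timestamps)


actual = actual_recovery_time(log)
target = timedelta(hours=1)  # assumed RTO for the system under test

print(actual)            # 0:50:00
print(actual <= target)  # True
```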

BC/DR Plan Components

Essential Plan Elements

  1. Plan Activation Criteria

    • Who can invoke the plan
    • Triggering events
    • Decision tree
  2. Emergency Contact Information

    • Crisis management team
    • Key vendors/suppliers
    • Emergency services
    • Notification cascades
  3. Roles and Responsibilities

    • Crisis management team structure
    • Recovery team leaders
    • Communication coordinators
  4. Recovery Procedures

    • Step-by-step technical recovery tasks
    • System dependencies and sequence
    • Estimated timeframes
  5. Communication Plan

    • Internal communications (staff)
    • External communications (customers, suppliers, media)
    • Regulatory notifications
    • Templates for common scenarios
  6. Alternative Working Arrangements

    • Remote working capabilities
    • Alternative facilities
    • Equipment and supplies
  7. Vendor and Third-Party Contact Details

    • Support contracts and escalation paths
    • SLA reference information
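
For the recovery procedures element, the "system dependencies and sequence" can be derived rather than maintained by hand. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the systems and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be recovered before it
depends_on = {
    "payments app": {"database", "network"},
    "database": {"storage", "network"},
    "storage": set(),
    "network": set(),
}


def recovery_sequence(depends_on):
    """Return an order in which every system follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


sequence = recovery_sequence(depends_on)
print(sequence)
```

Systems with no dependency between them (here, storage and network) may appear in either order. A circular dependency raises `graphlib.CycleError`, which is itself a useful finding when documenting dependencies.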

Industry-Specific Requirements

Financial Services

  • PRA/FCA (UK): Operational resilience requirements
  • FFIEC (US): Business continuity planning handbook
  • Basel Committee: Principles for operational resilience

Healthcare

  • NHS England: Business continuity guidance for NHS organisations
  • HIPAA Security Rule (US): Contingency plan requirements (45 CFR § 164.308(a)(7))

Critical Infrastructure

  • NIS Regulations (UK): BC requirements for operators of essential services
  • NIS2 Directive (EU): Enhanced resilience measures

Quick Selection Guide

| Organisation Profile | Recommended Framework | Testing Frequency |
|---|---|---|
| Small business (<50 staff) | Simplified BCI GPG approach | Annual tabletop |
| Medium enterprise | ISO 22301 or BCI GPG | Quarterly tabletop, annual simulation |
| Large enterprise | ISO 22301 + industry-specific | Monthly tabletop, quarterly simulation, annual parallel test |
| US Federal/Contractor | NIST SP 800-34 | Per agency requirements |
| Financial services (UK) | ISO 22301 + PRA/FCA guidance | Quarterly minimum |
| Healthcare (UK) | ISO 22301 + NHS guidance | Semi-annual minimum |

Key Metrics and KPIs

| Metric | Description | Target |
|---|---|---|
| Plan Currency | % of plans reviewed within last 12 months | 100% |
| Staff Awareness | % of staff who know how to access BC plans | >80% |
| Test Coverage | % of critical systems tested annually | 100% |
| RTO Achievement | % of recovery tests meeting RTO targets | >95% |
| RPO Achievement | % of recoveries meeting RPO targets | >95% |
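
The achievement metrics are simple ratios over test results. As a sketch, with made-up results expressed as (actual hours, target hours) pairs and a hypothetical helper name:

```python
def achievement_pct(results):
    """Percentage of results where the actual value met its target."""
    if not results:
        raise ValueError("no test results to evaluate")
    met = sum(1 for actual, target in results if actual <= target)
    return 100.0 * met / len(results)


# Hypothetical recovery test results: (actual recovery hours, RTO target hours)
tests = [(0.8, 1), (3.5, 4), (9, 8), (20, 24)]

pct = achievement_pct(tests)
print(pct)       # 75.0
print(pct > 95)  # False: below the >95% target
```

The same function works for RPO Achievement if each pair holds the actual data loss window and the RPO target.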

Common Pitfalls

  1. Plans Not Maintained: Plans become outdated as technology and staff change
  2. Insufficient Testing: Tabletop exercises only, no validation of actual recovery
  3. Single Points of Failure: Key person dependencies or single-vendor reliance
  4. Inadequate Documentation: Plans are too high-level or too technical
  5. No Alternative Communications: Primary communication method fails and no backup exists
  6. Backup Data Not Tested: Backups exist but restoration has never been validated
  7. Scope Creep: Trying to protect everything instead of focusing on critical functions