Lesson 46 of 60 intermediate

RPO, RTO & DR Drills

Business language for technical people

Real-world analogy

DR planning is like fire drills. Everyone thinks they’ll remember what to do until the alarm actually rings. RPO is ‘how much data can we lose?’ RTO is ‘how long can we be down?’ Drills are the difference between plan and performance.

What is it?

RPO (Recovery Point Objective), RTO (Recovery Time Objective), and DR (disaster recovery) planning translate technical resilience into business language, and drills reveal whether ‘backups’ actually amount to real recovery. Juniors who can speak this language become the bridge between ops and business.

Real-world relevance

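Consider a core banking outage at 10:00 AM on a Monday, with an RPO target of 5 minutes and an RTO target of 30 minutes. The DR plan is rehearsed quarterly. The incident commander declares a disaster, failover begins at 10:04, and service resumes at 10:27, which is 27 minutes after the outage began and inside the 30-minute RTO target. The auditor is happy; customers barely notice.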
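The arithmetic behind that scenario can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `evaluate_recovery`, the placeholder calendar date, and the 09:58 last-recovery-point time are assumptions added for the example (the scenario gives no replication timestamp); the clock times and targets come from the scenario itself.

```python
from datetime import datetime, timedelta


def evaluate_recovery(outage_start, service_restored, last_recovery_point,
                      rpo_target, rto_target):
    """Compare the achieved data-loss window and downtime against RPO/RTO targets."""
    data_loss_window = outage_start - last_recovery_point  # achieved RPO
    downtime = service_restored - outage_start             # achieved RTO
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo_target,
        "rto_met": downtime <= rto_target,
    }


# Arbitrary placeholder date; only the clock times come from the scenario.
day = datetime(2026, 1, 5)

result = evaluate_recovery(
    outage_start=day.replace(hour=10, minute=0),
    service_restored=day.replace(hour=10, minute=27),
    # Assumption: the last replicated transaction reached the DR site at 09:58.
    last_recovery_point=day.replace(hour=9, minute=58),
    rpo_target=timedelta(minutes=5),
    rto_target=timedelta(minutes=30),
)

print("Downtime:", result["downtime"], "RTO met:", result["rto_met"])
print("Data-loss window:", result["data_loss_window"], "RPO met:", result["rpo_met"])
```

Note that RTO is measured from the start of the outage, not from the moment failover begins, so the four minutes spent declaring the disaster count against the target.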

Key points

Code example

// DR drill report template (partial exercise)

Exercise:     Restore CRM database to DR site
Date:         2026-04-15
Participants: DBA, Network, Security, App Owner, IT Manager
RPO target:   15 minutes
RTO target:   2 hours

Pre-checks:
  [ ] Current backup validated (timestamp + checksum)
  [ ] DR network links reachable
  [ ] DR identity and DNS ready
  [ ] Runbook version 2.3 printed and reviewed

Execution timeline:
  T+0       Declaration of exercise
  T+10 min  Backup delivered to DR storage
  T+25 min  DB restored, checksum verified
  T+40 min  App reconfigured to DR endpoints
  T+55 min  User smoke tests passed

Results:
  RPO achieved:  10 minutes (target 15)
  RTO achieved:  55 minutes (target 120)
  Issues found:  DNS TTLs too high; runbook missing a cert step
  Actions:       shorten DNS TTLs; update runbook; re-drill in Q3

Line-by-line walkthrough

  1. DR drill template header
  2. Exercise name
  3. Date
  4. Participants
  5. RPO target
  6. RTO target
  7. Blank separator
  8. Pre-checks section
  9. Validate backup
  10. DR networking
  11. DR identity/DNS
  12. Runbook version
  13. Blank separator
  14. Execution timeline header
  15. T+0 declare
  16. T+10 deliver backup
  17. T+25 DB restored
  18. T+40 app reconfigured
  19. T+55 smoke tests
  20. Blank separator
  21. Results header
  22. RPO achieved vs target
  23. RTO achieved vs target
  24. Issues found
  25. Actions to close gaps
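The Results section of the report can be derived from the execution timeline rather than typed by hand. The sketch below is one possible way to do that; the `DrillReport` class and its method names are hypothetical, while the numbers are taken from the template above.

```python
from dataclasses import dataclass


@dataclass
class DrillReport:
    rpo_target_min: int
    rto_target_min: int
    rpo_achieved_min: int
    timeline: list  # (minutes after T+0, milestone) pairs, in chronological order

    @property
    def rto_achieved_min(self):
        # The drill's recovery time is the offset of the final milestone.
        return self.timeline[-1][0]

    def step_durations(self):
        """Minutes spent reaching each milestone from the previous one."""
        return [
            (label, minutes - prev_minutes)
            for (prev_minutes, _), (minutes, label)
            in zip(self.timeline, self.timeline[1:])
        ]

    def margins(self):
        """Headroom in minutes against each target; a negative value is a miss."""
        return {
            "rpo": self.rpo_target_min - self.rpo_achieved_min,
            "rto": self.rto_target_min - self.rto_achieved_min,
        }


report = DrillReport(
    rpo_target_min=15,
    rto_target_min=120,
    rpo_achieved_min=10,
    timeline=[
        (0, "Declaration of exercise"),
        (10, "Backup delivered to DR storage"),
        (25, "DB restored, checksum verified"),
        (40, "App reconfigured to DR endpoints"),
        (55, "User smoke tests passed"),
    ],
)

print("RTO achieved (min):", report.rto_achieved_min)
print("Margins (min):", report.margins())
for label, minutes in report.step_durations():
    print(f"  {minutes:>3} min  {label}")
```

Breaking the timeline into per-step durations shows where the recovery time actually goes, which helps decide what to automate before the next drill.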

Spot the bug

A company has an RTO target of 1 hour for core banking, but no DR site, no runbook, and backups kept on a USB drive in the same building.
Hint: Why is the stated RTO meaningless here?
Answer: Without a DR architecture, recent and tested offsite backups, and a documented runbook, a 1-hour RTO is aspirational, not achievable. Fix: a geographically separated DR site (cloud or second datacenter), 3-2-1 backups with immutability, dependency-mapped runbooks, drills at least annually, and monitoring that detects failures fast.
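The 3-2-1 rule mentioned in the answer (at least three copies of the data, on two different media types, with one copy offsite) is simple enough to check mechanically. The sketch below is illustrative only: the `DataCopy` class, the `backup_gaps` function, and the inventory entries are hypothetical, and the immutability check is an extra condition layered on top of the basic rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    name: str
    media: str               # e.g. "disk", "usb", "tape", "object-storage"
    offsite: bool
    immutable: bool = False


def backup_gaps(copies):
    """Return the 3-2-1 (plus immutability) conditions that the copies fail."""
    gaps = []
    if len(copies) < 3:
        gaps.append("fewer than 3 copies of the data")
    if len({c.media for c in copies}) < 2:
        gaps.append("fewer than 2 media types")
    if not any(c.offsite for c in copies):
        gaps.append("no offsite copy")
    if not any(c.immutable for c in copies):
        gaps.append("no immutable copy")
    return gaps


# The company from the exercise: production data plus one USB drive on site.
current = [
    DataCopy("production database", media="disk", offsite=False),
    DataCopy("USB backup", media="usb", offsite=False),
]

# A hypothetical improved setup with an offsite, immutable copy added.
improved = current + [
    DataCopy("cloud backup", media="object-storage", offsite=True, immutable=True),
]

print("Current gaps:", backup_gaps(current))
print("Improved gaps:", backup_gaps(improved))
```

Passing this check says nothing about whether a restore actually works; only a drill proves that.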

Explain like I'm 5

If your phone dies, how much of what was on it can you stand to lose (RPO), and how long can you wait before getting a new one (RTO)? DR is that question, but for a whole company, and a drill proves you know the answer.

Fun fact

The Bangladesh central bank’s directives and modern ICT-risk frameworks explicitly expect banks to plan, document, test, and report DR exercises. Many banks now run at least annual drills with evidence retained — a trend mirrored across global financial regulators.

Hands-on challenge

Write a one-page DR drill plan for a small fictional company: 3 apps, 1 DB, 1 file server. Include RPO, RTO, dependencies, runbook steps, roles, and success criteria.
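For the dependencies part of the plan, it may help to derive the recovery order from a dependency map instead of guessing it. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the system names and the dependencies between them are invented for illustration and are not part of the challenge.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be running before it.
depends_on = {
    "db": set(),
    "file_server": set(),
    "app_a": {"db"},
    "app_b": {"db", "file_server"},
    "app_c": {"file_server"},
}

# static_order() yields each system only after everything it depends on.
recovery_order = list(TopologicalSorter(depends_on).static_order())

for step, system in enumerate(recovery_order, start=1):
    print(f"Step {step}: restore {system}")
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding to record in the plan.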
