Backup Theory That Survives Audits
From copy to recoverability
Real-world analogy
Backups are spare keys. A spare key that you’ve never tested might be the wrong shape. A spare key stored on the same keyring as the original is not a spare. A good backup is a copy kept elsewhere, tested often, and restorable to a known, recent state.
What is it?
Backup discipline is the spine of availability and recovery. You don’t need to architect it as a junior; you need to recognize when a setup is audit-ready and when it isn’t — and raise it before an incident proves it.
Real-world relevance
A mid-sized company is hit by ransomware at 2 AM. Its backups live on the same SAN that gets encrypted, with no immutable copy and nothing offsite, so recovery drags on for weeks. A comparable company nearby recovers in 48 hours because it follows 3-2-1, keeps immutable copies, and has run restore drills recently.
Key points
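- Full, incremental, differential — Full: the entire dataset (slow to take, large). Incremental: changes since the last backup of any kind (small and fast, but a restore must replay the whole chain since the last full). Differential: changes since the last full (grows over time, but a restore needs only the full plus the newest differential). Modern tools often run forever-incremental and periodically build a synthetic full by merging existing backups on the backup side, without rereading production.
- 3-2-1 rule — Three copies of data (production counts as one), on at least two different media, with at least one copy offsite. Combined with immutability, this protects against both ransomware and hardware failure.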
- Immutability and WORM — Write-Once-Read-Many (WORM) storage prevents backups from being modified or deleted within a retention window. Critical against ransomware that seeks to destroy backups before encrypting data.
- Retention policy — How long each backup is kept. Often tiered: daily for 30 days, weekly for 90 days, monthly for 1 year, yearly for 7 years. Regulators define minimums for some industries.
- Application-consistent vs crash-consistent — Crash-consistent: captures the disk as if the power had been pulled, so in-flight writes and data still in memory are missed. Application-consistent: quiesces applications first (for example, Windows Volume Shadow Copy Service (VSS) or pre/post scripts) so their data is flushed and consistent before capture. Databases almost always need app-consistent backups.
- Restore is the product — Backups must be restore-tested. ‘We have a backup’ means nothing; ‘we ran a restore drill last month and documented the recovery time against our RTO (recovery time objective)’ means everything. Unverified backups tend to fail at the worst possible moment.
- Backup as security control — Protect backups like any other high-value system: segmented backup credentials, a separate domain or tenant for backup admins, an isolated backup network, and monitoring of backup jobs. In ransomware cases, attackers often go after backups first.
- Common pitfalls — (1) Backups on the same storage as production. (2) Jobs fail silently. (3) No offsite copy. (4) No immutability. (5) Never-tested restore. (6) Encryption keys lost. (7) Retention mismatch with regulation.
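The restore-chain trade-off between incremental and differential backups can be sketched in a few lines of Python. This is a simplified model, assuming one backup per day, a catalog of (day, kind) records, and that a differential holds everything changed since the last full; the `Backup` type and `restore_chain` function are hypothetical names, not any backup product's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Backup:
    day: date
    kind: str  # "full", "incremental", or "differential"


def restore_chain(backups: list[Backup], target: date) -> list[Backup]:
    """Return the backups needed to restore the state as of `target`, oldest first."""
    history = sorted((b for b in backups if b.day <= target), key=lambda b: b.day)
    full_positions = [i for i, b in enumerate(history) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup at or before the target date")

    # Every chain starts from the most recent full.
    start = full_positions[-1]
    chain = [history[start]]
    rest = history[start + 1:]

    # A differential already holds every change since that full, so only the
    # newest one is needed; incrementals taken before it are superseded.
    diff_positions = [i for i, b in enumerate(rest) if b.kind == "differential"]
    if diff_positions:
        chain.append(rest[diff_positions[-1]])
        rest = rest[diff_positions[-1] + 1:]

    # Incrementals after that point must all be replayed, in order.
    chain.extend(b for b in rest if b.kind == "incremental")
    return chain


def day(n: int) -> date:
    return date(2024, 1, n)


forever_incremental = [Backup(day(1), "full")] + [Backup(day(n), "incremental") for n in range(2, 8)]
differential_week = [Backup(day(1), "full")] + [Backup(day(n), "differential") for n in range(2, 8)]

print(len(restore_chain(forever_incremental, day(7))))  # full + 6 incrementals
print(len(restore_chain(differential_week, day(7))))  # full + newest differential
```

Seven pieces to replay versus two is the trade-off from the bullet: a long incremental chain is cheap to take, but one damaged link blocks every later restore point, which is one reason tools periodically build synthetic fulls.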
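WORM semantics can be illustrated with a toy in-memory store that refuses overwrites and refuses deletes until each object's retention lock expires. This only models the behavior: `WormStore` and `ImmutableViolation` are hypothetical names, and a check inside your own process protects nothing in practice. Real immutability has to be enforced by the storage system itself, ideally so that even a stolen admin account cannot shorten the lock.

```python
from datetime import datetime, timedelta


class ImmutableViolation(Exception):
    """Raised when a write or delete would break the write-once guarantee."""


class WormStore:
    """Toy write-once-read-many store with a fixed retention lock per object."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self._objects: dict[str, tuple[bytes, datetime]] = {}

    def put(self, key: str, data: bytes, now: datetime) -> None:
        # Write once: an existing key can never be overwritten in place.
        if key in self._objects:
            raise ImmutableViolation(f"{key} already exists; overwrite refused")
        self._objects[key] = (data, now + self.retention)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def delete(self, key: str, now: datetime) -> None:
        # Deletes are refused until the retention lock has expired.
        _, locked_until = self._objects[key]
        if now < locked_until:
            raise ImmutableViolation(f"{key} is locked until {locked_until.isoformat()}")
        del self._objects[key]


t0 = datetime(2024, 1, 1)
store = WormStore(retention=timedelta(days=30))
store.put("db-2024-01-01.bak", b"backup bytes", now=t0)

try:
    # An attacker holding admin credentials tries to wipe the backup on day 5.
    store.delete("db-2024-01-01.bak", now=t0 + timedelta(days=5))
except ImmutableViolation as exc:
    print("blocked:", exc)

# Once the 30-day window has passed, normal retention cleanup may delete it.
store.delete("db-2024-01-01.bak", now=t0 + timedelta(days=31))
```

The 30-day window here is arbitrary; it should be at least as long as the worst-case attacker dwell time you plan for.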
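The tiered retention example (daily for 30 days, weekly for 90, monthly for 1 year, yearly for 7 years) can be expressed as a keep/delete rule. This sketch assumes daily backups in which the Sunday backup counts as the weekly, the backup from the 1st of the month as the monthly, and the January 1 backup as the yearly. Those selection rules, the 7 × 365-day approximation of seven years, and the `tiers_keeping` helper are illustrative assumptions, not a regulatory definition.

```python
from datetime import date

# (tier name, which backups belong to the tier, maximum age in days)
# The selection rules are illustrative assumptions, not a standard.
TIERS = [
    ("daily", lambda d: True, 30),
    ("weekly", lambda d: d.weekday() == 6, 90),  # Sunday backups
    ("monthly", lambda d: d.day == 1, 365),  # backup from the 1st of each month
    ("yearly", lambda d: d.month == 1 and d.day == 1, 7 * 365),  # Jan 1, about 7 years
]


def tiers_keeping(backup_day: date, today: date) -> list[str]:
    """Names of the tiers that still require this backup to be retained."""
    age = (today - backup_day).days
    return [name for name, selects, max_age in TIERS if selects(backup_day) and 0 <= age <= max_age]


def should_keep(backup_day: date, today: date) -> bool:
    return bool(tiers_keeping(backup_day, today))


today = date(2024, 6, 15)
for candidate in [date(2024, 6, 1), date(2024, 3, 24), date(2024, 3, 10), date(2020, 1, 1)]:
    print(candidate, tiers_keeping(candidate, today) or "delete")
```

A backup survives as long as any tier still claims it: the Sunday backup from 2024-03-10 is 97 days old, so it has aged out of the weekly tier and is deleted, while the 2020-01-01 backup is still held by the yearly tier.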
Code example
// Backup hygiene checklist
[ ] 3 copies: production + onsite backup + offsite/cloud
[ ] 2 media types (disk + cloud; disk + tape; etc.)
[ ] 1 offsite / offline / immutable copy
[ ] Immutability window is at least the worst-case attacker dwell time
[ ] Application-consistent backups for databases
[ ] Separate backup admin identity / tenant
[ ] Backup jobs monitored -> on failure, alert + ticket
[ ] Retention matches regulation (e.g., 7y for some finance data)
[ ] Encryption keys stored in a vault, with escrow
[ ] Documented restore runbook
[ ] Scheduled restore drills (at least quarterly)
[ ] Last restore drill date is within policy window
Line-by-line walkthrough
- 1. Backup hygiene checklist
- 2. Three copies pattern
- 3. Two media types
- 4. One offsite/immutable
- 5. Immutability matches attacker dwell
- 6. App-consistent DB backups
- 7. Separate backup identity/tenant
- 8. Monitored jobs + alerts
- 9. Retention matches regulation
- 10. Keys in a vault with escrow
- 11. Documented restore runbook
- 12. Quarterly restore drills
- 13. Last drill within policy
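The checklist lends itself to automation: its items can be encoded as checks over a description of a backup setup. This is a hypothetical sketch. The `BackupPolicy` and `BackupCopy` fields, the 90-day drill interval (read from "at least quarterly"), and the example values are assumptions for illustration; in practice the inputs would come from your backup tool's reports rather than being typed in by hand.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BackupCopy:
    location: str  # storage system holding this copy, e.g. "array-1"
    media: str  # e.g. "disk", "tape", "object-storage"
    offsite: bool
    immutable_days: int  # length of the WORM lock; 0 means the copy is mutable


@dataclass
class BackupPolicy:
    production_location: str
    production_media: str
    copies: list[BackupCopy]  # backup copies only; production is the extra copy
    retention_days: int
    required_retention_days: int  # from regulation or internal policy
    assumed_dwell_days: int  # worst-case attacker dwell time you plan for
    app_consistent_db: bool
    separate_backup_identity: bool
    failure_alerts: bool
    keys_in_vault: bool
    has_runbook: bool
    last_restore_drill: Optional[date]
    drill_interval_days: int = 90  # "at least quarterly"


def audit(p: BackupPolicy, today: date) -> list[str]:
    """Return human-readable findings; an empty list means every check passed."""
    findings = []
    # A copy on the production storage is not independent: it dies with production.
    independent = [c for c in p.copies if c.location != p.production_location]
    if len(independent) < len(p.copies):
        findings.append("a backup copy lives on the production storage")
    if 1 + len(independent) < 3:
        findings.append("fewer than 3 independent copies")
    if len({p.production_media, *(c.media for c in independent)}) < 2:
        findings.append("fewer than 2 media types")
    if not any(c.offsite for c in independent):
        findings.append("no offsite copy")
    if max((c.immutable_days for c in independent), default=0) < p.assumed_dwell_days:
        findings.append("no immutable copy covering the assumed dwell time")
    if p.retention_days < p.required_retention_days:
        findings.append("retention shorter than required")
    yes_no_checks = [
        (p.app_consistent_db, "database backups are not application-consistent"),
        (p.separate_backup_identity, "backup admins share the production identity"),
        (p.failure_alerts, "failed jobs do not raise alerts"),
        (p.keys_in_vault, "encryption keys are not vaulted with escrow"),
        (p.has_runbook, "no documented restore runbook"),
    ]
    findings.extend(message for ok, message in yes_no_checks if not ok)
    if p.last_restore_drill is None:
        findings.append("restore has never been tested")
    elif (today - p.last_restore_drill).days > p.drill_interval_days:
        findings.append("last restore drill is outside the policy window")
    return findings


# Hypothetical setup: one backup share on the production array, nothing else.
same_array = BackupPolicy(
    production_location="array-1", production_media="disk",
    copies=[BackupCopy("array-1", "disk", offsite=False, immutable_days=0)],
    retention_days=14, required_retention_days=30, assumed_dwell_days=30,
    app_consistent_db=False, separate_backup_identity=False, failure_alerts=False,
    keys_in_vault=False, has_runbook=False, last_restore_drill=None,
)
for finding in audit(same_array, date(2024, 6, 15)):
    print("-", finding)
```

Each finding maps back to a checklist item or one of the common pitfalls, so the output doubles as a to-do list for the setup being audited.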
Spot the bug
A junior admin sets up daily backups to a share on the same storage array as production.
Retention is 14 days. No offsite copy, no immutability. Last restore test: never.
Hint: how does this setup fail against a ransomware attack?
Answer
Ransomware encrypts the shared storage array and destroys the backups along with production. With 14-day retention, an attacker who has been inside for more than two weeks leaves no retained backup that predates the compromise. A backup that has never been restored cannot be trusted. Fix: 3-2-1 with an immutable offsite copy, longer retention per regulation, tested quarterly restores, separate backup identity and network, and monitored alerts on failed jobs.
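The retention problem in the answer can be made concrete: what matters is whether any retained backup was taken before the compromise began. This sketch assumes daily backups, treats every backup taken after the (hypothetical) compromise date as untrustworthy, and uses a made-up `latest_clean_restore_point` helper. In a real incident the compromise start is a forensic estimate, so extra margin on top of it is prudent.

```python
from datetime import date, timedelta
from typing import Optional


def latest_clean_restore_point(
    backup_days: list[date], compromise_start: date, today: date, retention_days: int
) -> Optional[date]:
    """Newest backup still retained that was taken before the compromise began."""
    oldest_retained = today - timedelta(days=retention_days)
    retained = [d for d in backup_days if oldest_retained <= d <= today]
    # Backups taken after the attacker got in may already contain their foothold.
    clean = [d for d in retained if d < compromise_start]
    return max(clean, default=None)


# Daily backups since January; the attacker got in on 2024-05-20 and the
# ransomware fires on 2024-06-15, a dwell time of 26 days.
daily = [date(2024, 1, 1) + timedelta(days=i) for i in range(167)]
for retention in (14, 90):
    point = latest_clean_restore_point(daily, date(2024, 5, 20), date(2024, 6, 15), retention)
    print(f"{retention}-day retention -> {point or 'no clean restore point'}")
```

With 14 days of retention every remaining backup is newer than the compromise, so nothing is known to be clean; with 90 days, 2024-05-19 is still available.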
Explain like I'm 5
Keep a spare key you know works, store it somewhere safe, check it regularly, and never keep it on the same keychain as the original, or you lose both at once.
Fun fact
In many post-ransomware reviews, the victim had backups — but attackers had also compromised the backup admin account and wiped them. Backups must be defended as an identity/access frontier, not just a storage job.
Hands-on challenge
Install Veeam Community Edition (or use an Azure Backup trial). Back up a test VM. Simulate data loss by deleting a file. Restore it and time the process. Write a 6-line restore runbook.
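Before working with Veeam or Azure Backup, the shape of a drill can be rehearsed locally. This is a deliberately tiny stand-in built only on the Python standard library: it copies one file to a separate directory, records a SHA-256 checksum of the original, deletes the original, restores it, verifies the checksum, and times the restore. `restore_drill` and the sample file are hypothetical; a real drill restores a whole VM or database and measures the time until the service is usable again.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_drill(source: Path, backup_dir: Path) -> float:
    """Back up one file, simulate its loss, restore and verify it.

    Returns the seconds spent restoring and verifying.
    """
    expected = sha256_of(source)  # checksum of the original, recorded at backup time
    backup_copy = backup_dir / source.name
    shutil.copy2(source, backup_copy)

    source.unlink()  # simulate data loss

    start = time.perf_counter()
    shutil.copy2(backup_copy, source)
    if sha256_of(source) != expected:
        raise RuntimeError("restored file does not match the original checksum")
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as prod_dir, tempfile.TemporaryDirectory() as backup_dir:
    data_file = Path(prod_dir) / "orders.csv"
    data_file.write_text("id,amount\n1,100\n2,250\n")
    seconds = restore_drill(data_file, Path(backup_dir))
    print(f"restore verified in {seconds:.4f} s")
```

Timing the verification together with the copy is a deliberate choice: a restore only counts toward your recovery time once you know the data is correct.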
More resources
- Veeam Backup Community Edition (Veeam)
- Azure Backup (Microsoft Learn)
- CISA backup guidance (CISA)