Lesson 47 of 60 advanced

From Tabletop to Real Recovery

How juniors contribute during outages

Real-world analogy

A real incident is like a hospital code-blue. Nobody needs a hero; everyone needs to follow the playbook. The junior who documents, stays calm, and hands over clearly is more valuable than the one who tries to solve everything.

What is it?

Real recovery work depends on clear roles, disciplined communications, and structured learning, not heroics. Juniors who build these habits early are well placed to move into Incident Commander tracks.

Real-world relevance

A DNS change at 1 AM breaks authentication across a bank. A junior, acting as scribe, captures the timeline. The Incident Commander (IC) declares a Sev 1. The Communications Lead updates customers every 30 minutes. The network and IAM teams roll back the change. The post-incident review (PIR) finds a missing pre-production check step, so the runbook is updated and automation is added. Service is fully restored in four hours, and the follow-up work delivers weeks of lasting improvement.
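
The 30-minute update cadence in this scenario can be sketched as a small schedule generator. This is only an illustration, not real incident tooling: the function name `update_schedule`, the calendar date, and the assumption that the first update goes out at 01:00 are all hypothetical, chosen to match the story above.

```python
from datetime import datetime, timedelta


def update_schedule(start, duration, interval):
    """Return the times at which a customer update is due.

    The first update is due at `start`; further updates follow every
    `interval` until `start + duration` (inclusive).
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    times = []
    current = start
    end = start + duration
    while current <= end:
        times.append(current)
        current += interval
    return times


# Hypothetical values: first update at 01:00, service restored four hours later.
declared = datetime(2024, 1, 15, 1, 0)
schedule = update_schedule(declared, timedelta(hours=4), timedelta(minutes=30))
for due in schedule:
    print(due.strftime("%H:%M"), "customer update due")
```

Writing the cadence down in advance, in whatever form, means the Communications Lead does not have to decide mid-incident when the next update is owed.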

Key points

Code example

// Incident response roles (concise)

Incident Commander (IC)
  - Runs the incident, makes decisions, owns timeline
  - Does NOT type commands or fix hands-on

Communications Lead
  - Owns customer/internal/exec messaging cadence
  - Drafts updates, coordinates with PR / legal if needed

SMEs (network, identity, DB, app)
  - Investigate and execute recovery steps
  - Report facts to IC; do not broadcast independently

Scribe
  - Captures timestamped facts: events, decisions, actions
  - Supports the post-incident review with clean evidence

Executive Liaison
  - Summarizes status for execs; translates technical to business
  - Shields IC from non-essential escalations

Line-by-line walkthrough

  1. Incident roles block
  2. Incident Commander duties
  3. Role boundaries for IC
  4. Blank separator
  5. Communications Lead duties
  6. Coordination with PR/legal
  7. Blank separator
  8. SMEs header
  9. Investigate and execute
  10. Report to IC, don’t broadcast solo
  11. Blank separator
  12. Scribe duties
  13. Capture timestamped evidence
  14. Support PIR
  15. Blank separator
  16. Executive Liaison duties
  17. Translate status to business
  18. Shield IC from non-essentials
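
The scribe's job of capturing timestamped events, decisions, and actions can be sketched as a small append-only log. This is a minimal sketch under stated assumptions, not a standard tool: the `Entry` and `Timeline` names, the three entry kinds, and every timestamp and message in the usage example are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Entry kinds taken from the Scribe role: events, decisions, actions.
KINDS = ("event", "decision", "action")


@dataclass(frozen=True)
class Entry:
    at: datetime   # when it happened (timezone-aware)
    kind: str      # one of KINDS
    author: str    # role that reported it, e.g. "IC"
    text: str      # the fact itself, without speculation


class Timeline:
    """Append-only incident log kept by the scribe (hypothetical helper)."""

    def __init__(self):
        self._entries = []

    def record(self, kind, author, text, at=None):
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        when = at if at is not None else datetime.now(timezone.utc)
        entry = Entry(when, kind, author, text)
        self._entries.append(entry)
        return entry

    def render(self):
        """Return the log as text, ordered by timestamp, for the PIR."""
        ordered = sorted(self._entries, key=lambda e: e.at)
        return "\n".join(
            f"{e.at:%H:%M} UTC [{e.kind.upper()}] {e.author}: {e.text}"
            for e in ordered
        )


def at(hour, minute):
    # Illustrative timestamps only; the date is made up.
    return datetime(2024, 1, 15, hour, minute, tzinfo=timezone.utc)


log = Timeline()
log.record("event", "IAM SME", "Authentication failures reported", at=at(1, 5))
log.record("decision", "IC", "Declared Sev 1", at=at(1, 10))
log.record("action", "Network SME", "Rolled back DNS change", at=at(1, 40))
print(log.render())
```

Keeping entries immutable and tagging each with a kind and an author is one way to give the post-incident review clean evidence; a shared document with the same columns would serve equally well.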

Spot the bug

During a Sev 1 outage, the junior posts 'We’re getting hacked! DM me for details' on LinkedIn.
Hint: Which three rules does this break, and what could it cost?
Answer: (1) Only the designated spokesperson communicates externally. (2) Never speculate publicly during an incident. (3) Preserve confidentiality. Consequences: customer panic, regulatory breach, a benefit to the attacker (who reads public posts too), and personal disciplinary action. Right behavior: funnel all comms through the Communications Lead; save reflections for a blameless PIR later.

Explain like I'm 5

In a real emergency, you want teammates who listen, take notes, update people calmly, and don’t panic. That’s exactly what a great junior looks like in an outage — priceless.

Fun fact

Google’s site reliability engineering culture popularized ‘blameless post-mortems.’ The premise: humans make mistakes; systems should be designed to survive them. Blame cultures quietly destroy transparency — and transparency is the only reliable input to learning.

Hands-on challenge

Run a tabletop with a friend: pretend an M365 outage is underway. You are the Incident Commander; they play a senior engineer. Practice three decision points and three customer comms updates, each with a timestamp.
