Branch Outage, ATM Outage & Audit Evidence
Regulated incidents require controlled response
Real-world analogy
A branch outage is a fire in the bank’s bakery — customers are hungry, staff are anxious, regulators watch the smoke. Put it out calmly, preserve the oven (evidence), and document the recipe (runbook) for next time.
What is it?
Regulated incident response connects technical fixes to policy, audit, and regulatory expectations. Juniors who can translate their work into 'audit evidence' are far more useful than juniors who can only type commands.
Real-world relevance
A single branch loses connectivity at 10:00. The junior confirms that the WAN link is down, that the branch PABX is still up, and that the backup VPN fails to establish because its certificate has expired. They coordinate with the vendor, replace the expired certificate, and get the WAN back at 11:15. They then collect the timeline, ticket, approvals, and a cert-renewal action item, and publish a clean audit pack by end of day.
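The timeline in a scenario like this is easier to defend in an audit when each entry is a timestamped, factual record rather than a recollection. The following is a minimal sketch, not a prescribed format: `TimelineEntry`, `elapsed_minutes`, the actor labels, and the calendar date are illustrative assumptions, and only the 10:00 and 11:15 times come from the scenario above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEntry:
    at: datetime   # when the event happened or was observed
    actor: str     # who acted or observed (hypothetical role labels)
    event: str     # what happened, stated factually


def elapsed_minutes(timeline: list[TimelineEntry]) -> int:
    """Minutes between the earliest and the latest entry."""
    times = sorted(entry.at for entry in timeline)
    return int((times[-1] - times[0]).total_seconds() // 60)


# Illustrative date; the two times follow the branch scenario.
day = datetime(2024, 1, 15)
timeline = [
    TimelineEntry(day.replace(hour=10, minute=0), "branch", "Branch loses connectivity"),
    TimelineEntry(day.replace(hour=11, minute=15), "junior", "WAN link restored"),
]

print(elapsed_minutes(timeline))  # 75
```

Frozen entries are a deliberate choice here: a correction is appended as a new entry instead of overwriting an old one, which keeps the sequence of decisions visible.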
Key points
- Branch outage first moves — Confirm scope (one branch or many). Check WAN, power, local servers, identity, CBS channel, printing. Comms to the branch manager every 15 minutes. Escalate with scope + timeline + suspected layer.
- ATM outage — financial and reputational — Card declines hurt customers immediately. Coordinate with switch team, acquirer, issuer. Confirm whether it’s card-specific, BIN-specific, geography-specific, or network-wide. Preserve switch logs and timestamps.
- Customer-facing comms discipline — Branch notices, in-app banners, IVR messages — coordinated and approved. No ad-hoc claims. Updates at set intervals. Apologize without speculating about blame.
- Stakeholder map during a Sev 1 — Incident Commander, Comms Lead, Branch/Ops leadership, CISO/SOC (if security-related), Legal, Regulator liaison, Vendor contacts, Business owners. Know who does what before the day you need them.
- Evidence preservation under regulation — Logs, timestamps, sequence of actions, decisions, and approvals. Chain of custody for any artefact that might go to an audit or regulator. Don’t delete, don’t share outside approved channels.
- Audit evidence pack — what it contains — Runbook used, timeline log, change tickets associated, communications records, approvals, technical logs, RCA/PIR, remediation actions with owners and due dates.
- Regulatory reporting windows — Some frameworks expect critical-incident notifications in specific timeframes (often within 72 hours for major events). Junior IT doesn’t report; junior IT hands clean evidence to whoever does.
- The difference between a mess and a case — A well-handled incident produces a ‘case’ — a clean folder with everything needed. A mishandled incident leaves blame, gaps, and repeated questions. Over years, this distinction defines careers.
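The ATM scoping question in the key points (card-specific, BIN-specific, geography-specific, or network-wide) can be sketched as a first-pass heuristic over decline records. This assumes declines can be exported as simple records; `Decline`, `classify_scope`, the field names, and the sample values are hypothetical, and a real diagnosis would also compare against successful transactions and switch logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decline:
    card_ref: str    # tokenised card reference (assumed), never a raw card number
    bin_prefix: str  # issuer prefix of the card
    region: str      # region of the ATM that declined


def classify_scope(declines: list[Decline]) -> str:
    """Rough label for how wide a decline pattern looks."""
    if not declines:
        return "no-declines"
    cards = {d.card_ref for d in declines}
    bins = {d.bin_prefix for d in declines}
    regions = {d.region for d in declines}
    if len(cards) == 1:
        return "card-specific"
    if len(bins) == 1:
        return "bin-specific"
    if len(regions) == 1:
        return "geography-specific"
    # Several BINs across several regions: treat as suspected network-wide.
    return "network-wide"


sample = [
    Decline("tok-1", "123456", "north"),
    Decline("tok-2", "123456", "south"),
]
print(classify_scope(sample))  # bin-specific
```

The label is only a starting hypothesis for the escalation message; it tells the switch team, acquirer, or issuer where to look first.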
Code example
// ATM-down first-hour checklist
[ ] Confirm scope: specific ATM? region? scheme-wide?
[ ] Snapshot switch logs with timestamps
[ ] Confirm host + switch + acquirer connectivity
[ ] Open incident with IC assigned; start timeline log
[ ] Notify comms lead + business owner (cards, retail banking)
[ ] Draft customer-facing notice (branch + app + IVR) — keep factual
[ ] Check for pending changes in the last 48h
[ ] Engage vendor / acquirer / scheme support with evidence
[ ] Update stakeholders at fixed cadence (e.g., 15 min)
[ ] Preserve evidence (logs, tickets, approvals)
[ ] After recovery: run PIR, update runbook, plan fix actions
Line-by-line walkthrough
- 1. ATM-down first-hour checklist
- 2. Confirm scope
- 3. Snapshot switch logs
- 4. Confirm connectivity chain
- 5. Open incident + timeline
- 6. Notify comms + business owner
- 7. Customer-facing notice
- 8. Check recent changes
- 9. Engage vendor/scheme support
- 10. Fixed cadence updates
- 11. Preserve evidence
- 12. Post-incident actions
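One way to make the checklist itself produce evidence is to record who completed each step and when. The sketch below is an assumption-laden illustration: `ChecklistRun`, its methods, and the actor label are hypothetical, and the step names simply reuse the walkthrough titles above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

FIRST_HOUR_STEPS = [
    "Confirm scope",
    "Snapshot switch logs",
    "Confirm connectivity chain",
    "Open incident + timeline",
    "Notify comms + business owner",
    "Customer-facing notice",
    "Check recent changes",
    "Engage vendor/scheme support",
    "Fixed cadence updates",
    "Preserve evidence",
    "Post-incident actions",
]


@dataclass
class ChecklistRun:
    # step name -> (actor, completion time in UTC)
    done: dict[str, tuple[str, datetime]] = field(default_factory=dict)

    def complete(self, step: str, actor: str, at: Optional[datetime] = None) -> None:
        """Mark a known step as done, stamping the current UTC time by default."""
        if step not in FIRST_HOUR_STEPS:
            raise ValueError(f"unknown step: {step}")
        self.done[step] = (actor, at or datetime.now(timezone.utc))

    def open_steps(self) -> list[str]:
        """Steps not yet completed, in checklist order."""
        return [s for s in FIRST_HOUR_STEPS if s not in self.done]


run = ChecklistRun()
run.complete("Confirm scope", actor="junior-on-call")
print(len(run.open_steps()))  # 10
```

At each stakeholder update, `open_steps()` gives a factual answer to "what is still outstanding", and the `done` mapping can be exported into the timeline log.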
Spot the bug
Senior tells junior to 'just delete the old logs, they're taking space'.
Junior deletes 180 days of SWIFT workstation logs during a live audit.
Hint
What regulation, process, and trust problems does this create?
Answer
Deleting regulatory logs — especially during audit — is a serious violation: retention rules, chain of custody, audit integrity, and possibly legal discovery duties. Never delete logs without written retention policy approval. Correct path: raise the disk-space issue through change control; move logs to compliant archive storage; get approvals and document.
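The "move logs to compliant archive storage" step can be sketched as copy-then-verify, with no deletion at all in the script. This is a minimal illustration under stated assumptions: `archive_logs` and `sha256_of` are hypothetical helpers, the `*.log` pattern and directory layout are invented for the example, and a real archive target would be whatever storage the retention policy approves.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large logs fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_logs(source_dir: Path, archive_dir: Path) -> dict[str, str]:
    """Copy every *.log file to archive_dir and return file name -> SHA-256.

    Nothing is deleted here. Removing originals is a separate step that
    needs its own approval under the retention policy.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    manifest: dict[str, str] = {}
    for log_file in sorted(source_dir.glob("*.log")):
        copied = Path(shutil.copy2(log_file, archive_dir / log_file.name))
        original_hash = sha256_of(log_file)
        if sha256_of(copied) != original_hash:
            raise RuntimeError(f"hash mismatch after copy: {log_file.name}")
        manifest[log_file.name] = original_hash
    return manifest
```

The returned manifest can be attached to the change ticket, so that anyone reviewing the archive later can confirm the files are unchanged.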
Explain like I'm 5
When the bank has a bad day, don’t run around. Find out what broke, tell the right people calmly, fix it by the book, save every receipt, and explain clearly what happened afterward. That’s the whole game.
Fun fact
Mature banks treat incident writeups as learning artefacts, not punishment. Monthly or quarterly reviews of incidents across branches become the most reliable input to new detection rules, training, and process changes.
Hands-on challenge
Build a one-page ATM outage runbook for a fictional bank: scope, escalation, logs to capture, comms steps, evidence pack, RCA structure. Show it to a peer and iterate.
More resources
- SWIFT Customer Security Controls Framework (SWIFT)
- ISO/IEC 27035 Incident management (ISO)
- NIST SP 800-61r2 (NIST)