Lesson 19 of 60 · Intermediate

Domain Joins & Login Failures

Safe first actions under identity pressure


Real-world analogy

A domain join is like a device signing an employment contract. Once signed, it gets an ID, trusts the building, and plays by the rules. If the contract is torn or the building loses the signature page, the device is suddenly a stranger at the door.

What is it?

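A domain join creates a computer account in Active Directory and gives the machine a secret (its machine account password), which it uses to set up a secure channel with the domain controllers. This lesson is the crisis-response layer on top of everything you have learned about AD, DNS, DHCP, GPO, and permissions: when identity breaks, your first moves should be safe, documented, and scoped.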

Real-world relevance

Half of HQ can’t log in after a weekend change window. A junior immediately starts rejoining laptops and blames ‘AD.’ A senior runs gpresult, checks Event Viewer, and finds a new GPO whose misconfigured logon right is blocking network logons on Windows 11. The senior reverts the GPO, and all 200 users are back in within 10 minutes.


Code example

# Multi-user logon failure: scoped triage
1) Scope
   - 1 user or many?
   - 1 site or many?
   - 1 OS/image or mixed?

2) Identity path
   ipconfig /all                            # client uses internal DNS servers?
   nslookup -type=SRV _ldap._tcp.<domain>   # DC SRV records resolve?
   Test-ComputerSecureChannel -Verbose      # machine trust intact? (Windows PowerShell 5.1)
   w32tm /query /status                     # clock skew (Kerberos default tolerance: 5 min)

3) Policy
   gpresult /h report.html                  # which GPOs actually applied
   Recent GPO changes -> correlate with the time symptoms started

4) Auth logs
   Event Viewer -> Security log on DCs
   4768  Kerberos TGT requested
   4769  Kerberos service ticket requested
   4625  Logon failed (logged on the machine where the logon was attempted)

5) Change control
   What changed in the last 48h?
   What can be reverted safely?
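The clock check in step 2 has a concrete threshold. Kerberos treats requests whose timestamps differ from the DC's clock by more than the domain's tolerance as invalid, and that tolerance defaults to 5 minutes. Below is a minimal Python sketch. It assumes you have already collected both times, for example by reading w32tm output by hand. The function names are hypothetical, and the sketch assumes the default 5-minute policy. It compares instants in UTC, so a client that shows a different time zone but the same UTC time has zero skew.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the domain keeps the default Kerberos policy value for
# "Maximum tolerance for computer clock synchronization" (5 minutes).
KERBEROS_MAX_SKEW = timedelta(minutes=5)


def clock_skew(client_utc: datetime, dc_utc: datetime) -> timedelta:
    """Absolute difference between two timezone-aware timestamps (compared as UTC)."""
    if client_utc.tzinfo is None or dc_utc.tzinfo is None:
        raise ValueError("use timezone-aware timestamps so the comparison is in UTC")
    return abs(client_utc - dc_utc)


def skew_breaks_kerberos(client_utc: datetime, dc_utc: datetime,
                         tolerance: timedelta = KERBEROS_MAX_SKEW) -> bool:
    """True if the skew exceeds the tolerance, i.e. Kerberos would likely reject the client."""
    return clock_skew(client_utc, dc_utc) > tolerance


dc_time = datetime(2024, 3, 4, 8, 0, tzinfo=timezone.utc)
print(skew_breaks_kerberos(dc_time + timedelta(minutes=3), dc_time))
print(skew_breaks_kerberos(dc_time + timedelta(minutes=7), dc_time))

# Same instant shown in UTC+1: a different time zone, but no skew.
client_plus_one = datetime(2024, 3, 4, 9, 0, tzinfo=timezone(timedelta(hours=1)))
print(clock_skew(client_plus_one, dc_time))
```

A wrong time zone alone is a display problem. A wrong UTC time is what breaks Kerberos.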

Line-by-line walkthrough

  1. Scoped triage playbook
  2. Step 1: scope the blast radius
  3. How many users are affected
  4. How many sites are affected
  5. One OS/image or a mix
  6. Blank separator
  7. Step 2: identity path checks
  8. Verify the client uses internal DNS
  9. Verify DC SRV records resolve
  10. Check the machine's secure channel (trust) with the domain
  11. Check clock skew against the domain time source
  12. Blank separator
  13. Step 3: policy check
  14. Generate a gpresult HTML report
  15. Correlate recent GPO changes with when symptoms started
  16. Blank separator
  17. Step 4: auth logs
  18. Security log on DCs
  19. TGT request event ID
  20. Service ticket event ID
  21. Logon failure event ID
  22. Blank separator
  23. Step 5: change control
  24. List recent changes
  25. Pick safe revert candidates
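Steps 3 and 5 come down to lining up the change log against the moment symptoms started. Below is a minimal Python sketch. It assumes your change log can be exported as timestamped entries. The Change record, its fields, the sample entries and the suspects() helper are all hypothetical. The sketch keeps only changes in the 48-hour window from the playbook and orders them by how close they landed to the outage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    """One entry from a (hypothetical) change log export."""
    when: datetime
    area: str         # e.g. "GPO", "DNS", "DHCP", "firewall"
    summary: str
    revertible: bool  # is there a tested, documented rollback?


def suspects(changes: list[Change], symptoms_started: datetime,
             window: timedelta = timedelta(hours=48)) -> list[Change]:
    """Changes made within `window` before symptoms began, closest to the outage first."""
    earliest = symptoms_started - window
    recent = [c for c in changes if earliest <= c.when <= symptoms_started]
    return sorted(recent, key=lambda c: symptoms_started - c.when)


log = [
    Change(datetime(2024, 3, 2, 22, 0), "GPO", "New user rights assignment", True),
    Change(datetime(2024, 2, 20, 10, 0), "DNS", "Added conditional forwarder", True),
    Change(datetime(2024, 3, 3, 1, 30), "firewall", "Rule cleanup on HQ edge", False),
]
started = datetime(2024, 3, 4, 8, 0)

for c in suspects(log, started):
    flag = "revert candidate" if c.revertible else "needs a rollback plan first"
    print(c.when, c.area, c.summary, "->", flag)
```

Ordering by proximity is a heuristic, not proof. The closest change is simply the first one to test against the symptoms. The revertible flag helps decide which rollback is safe to try first.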

Spot the bug

Monday morning: 40 users at HQ report ‘cannot connect to domain’.
A junior rejoins all 40 laptops one by one; it takes 6 hours.
Hint: what cheaper, safer first step would have revealed the real root cause?
Answer: Scope first. Run gpresult /h on a couple of affected machines, check the Security log on a DC, and look for recent changes (DNS, DHCP, GPO, firewall). Often a single misconfigured change is the cause; reverting it takes minutes and spares you 40 rejoins.
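The "scope first" step can be written down as a small decision helper. Below is a minimal Python sketch. It assumes each complaint has been recorded with a user, a site and an OS image. The FailureReport record, the scope() function, the image names and the suggested next steps are all hypothetical heuristics for illustration, not a fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureReport:
    user: str
    site: str
    image: str  # OS build or deployment image on the affected device


def scope(reports: list[FailureReport]) -> dict:
    """Summarize the blast radius and suggest where to look first (heuristic)."""
    if not reports:
        raise ValueError("no failure reports to scope")
    users = {r.user for r in reports}
    sites = {r.site for r in reports}
    images = {r.image for r in reports}

    if len(users) == 1:
        hint = "single user: check that account and device before touching anything shared"
    elif len(sites) == 1 and len(images) == 1:
        hint = "one site, one image: look for a change targeted at that site or image"
    elif len(sites) == 1:
        hint = "one site, mixed images: look at site infrastructure (local DC, DNS, DHCP, network)"
    elif len(images) == 1:
        hint = "many sites, one image: look at what that OS/image receives (GPO filters, updates)"
    else:
        hint = "many users, sites and images: look at domain-wide changes and core identity services"
    return {"users": len(users), "sites": len(sites), "images": len(images), "hint": hint}


reports = [
    FailureReport("alice", "HQ", "win11-23h2"),
    FailureReport("bob", "HQ", "win11-23h2"),
    FailureReport("carol", "HQ", "win11-23h2"),
]
print(scope(reports)["hint"])
```

In the scenario above, 40 HQ users on one image would point straight at a site- or image-targeted change, not at 40 broken trusts.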

Explain like I'm 5

When the building doesn’t recognize your ID anymore, you don’t knock down the front door. You check: is the nameplate right, is the badge still valid, is the building’s clock the same as yours, and did anyone change the rules last night?

Fun fact

Many "mystery" account lockouts turn out to be phones. A salesperson changes their domain password on a laptop but forgets to update the Exchange account on their iPhone. The phone keeps retrying the old password until the account locks out, again and again, and the ‘attacker’ turns out to be the user themselves.
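To find the ‘attacker’, count failures per source. Below is a minimal Python sketch. It assumes you have already exported (user, source) pairs from the DCs' security logs, for example the caller computer name recorded in account lockout events (4740). The function name and the sample host names are hypothetical. For a phone that syncs mail through Exchange, the source may show up as the mail server rather than the phone itself, and that is a strong clue in its own right.

```python
from collections import Counter


def failure_sources(records: list[tuple[str, str]], user: str) -> list[tuple[str, int]]:
    """Count failed-authentication records for one user by source, most frequent first.

    `records` holds (user, source) pairs, assumed to be exported from DC security logs.
    User names are compared case-insensitively, as AD account names are.
    """
    target = user.lower()
    counts = Counter(source for who, source in records if who.lower() == target)
    return counts.most_common()


records = [
    ("jdoe", "MAIL01"), ("jdoe", "MAIL01"), ("jdoe", "MAIL01"),
    ("jdoe", "LAPTOP-17"),
    ("asmith", "LAPTOP-02"),
]
print(failure_sources(records, "JDoe"))
```

One source dominating the count for a single account is the classic stale-password pattern. Failures spread across many unrelated sources look more like a real attack.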

Hands-on challenge

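On a lab VM, simulate a broken trust. Take a domain-joined client and reset its computer account in Active Directory Users and Computers (right-click the computer object, then Reset Account). Then try to log in with a domain user; you should typically see a trust-relationship error. Log in with a local administrator account and run Test-ComputerSecureChannel to confirm that the secure channel is broken. Repair it with Reset-ComputerMachinePassword (or Test-ComputerSecureChannel -Repair), using domain credentials that are allowed to reset that computer account. Document each step.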

Course: IT Jobs Bootcamp