DNS, DHCP & Why AD Breaks When They Break
The hidden dependencies behind logon failures
Real-world analogy
AD is like a courier company. Kerberos is the signed delivery note. But the phonebook (DNS) tells drivers where each address is, and the city council (DHCP) assigns addresses to new buildings. Break either and the couriers can’t deliver — even though the paperwork is valid.
What is it?
DNS and DHCP are the invisible plumbing Active Directory runs on. Many ‘AD is broken’ tickets turn out to be DNS or DHCP misconfigurations, so checking them first can save a lot of misdiagnosis.
Real-world relevance
An office renovates and swaps its switch. Next morning every laptop says ‘cannot contact domain.’ Junior A blames AD. Junior B runs ipconfig, sees DNS pointing at 8.8.8.8 instead of the internal DC, fixes DHCP option 6, and restores the office in 10 minutes.
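The check Junior B did by eye can be sketched in a few lines of Python. This is a minimal sketch, not a definitive tool: the function name `classify_client_config` is hypothetical, and it assumes internal DNS servers use private (RFC 1918) addresses, which holds for most offices but not all.

```python
import ipaddress


def classify_client_config(ip, dns_servers):
    """Return a list of warnings for the values shown by ipconfig /all.

    Assumption: internal DNS servers live on private (RFC 1918) addresses.
    """
    warnings = []

    # 169.254.x.x is the self-assigned (APIPA) range used when DHCP gives no answer.
    if ipaddress.ip_address(ip).is_link_local:
        warnings.append("APIPA address: DHCP did not answer")

    # A public resolver such as 8.8.8.8 knows nothing about the internal domain.
    for server in dns_servers:
        if not ipaddress.ip_address(server).is_private:
            warnings.append(f"DNS server {server} is public, not an internal DC")

    return warnings


if __name__ == "__main__":
    # Example values only; substitute what ipconfig /all reports.
    for warning in classify_client_config("10.0.0.57", ["8.8.8.8"]):
        print(warning)
```

An empty list means the address and DNS servers look sane. It does not prove the DNS servers are actually domain controllers.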
Key points
- DHCP — who gets an address — DHCP assigns IP, subnet mask, gateway, DNS servers, sometimes domain suffix. If DHCP fails, devices may self-assign APIPA (169.254.x.x) and can’t reach anything. DHCP scopes, reservations, and exclusions are standard sysadmin work.
- DNS — who knows the name — DNS maps names to IPs. AD relies heavily on DNS SRV records (_ldap._tcp, _kerberos._tcp). If clients can’t resolve these, domain logon fails — even though the user, password, and network are all fine.
- Why ‘AD is slow/down’ is often DNS — Logon uses DNS to find a DC. Slow DNS = slow logon. Wrong DNS (pointing at public DNS instead of DCs) = DC locator fails → user ‘cannot contact domain.’ Rule: domain-joined clients should use internal DNS servers, not 8.8.8.8.
- Forward vs reverse zones — Forward: name → IP. Reverse: IP → name. Many enterprise tools (backup, monitoring, Kerberos delegation) need reverse records to work correctly. Skipping reverse zones bites you later.
- TTL (Time To Live) — How long a DNS answer may be cached. Low TTL = fast change propagation, higher DNS load. High TTL = efficient, slow changes. Don’t mass-lower TTLs without thinking.
- nslookup / Resolve-DnsName basics — Tools to query DNS directly and see which server answered. Essential for diagnosing ‘DNS says X but I expected Y.’
- DHCP reservations and exclusions — Reservation: fixed IP for a specific MAC (printers, cameras). Exclusion: range of IPs DHCP should not hand out (static servers). Confusing these creates IP conflicts and ‘random’ connectivity issues.
- The 4-question AD-logon diagnostic — (1) Does the client have a sane IP and correct DNS? (2) Can it resolve the SRV records? (3) Can it reach the DC on required ports? (4) Is the machine account/trust still valid?
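The reservation/exclusion point above is easier to see with numbers. The following is a rough sketch that assumes a scope and its exclusions are described as simple (start, end) address pairs. The function name `unprotected_static_ips` and all addresses are made up for illustration. It flags statically configured hosts that sit inside the scope but outside every exclusion range, which are the addresses DHCP could still lease to someone else.

```python
import ipaddress


def _in_range(ip, start, end):
    """True if ip lies between start and end, inclusive."""
    return ipaddress.ip_address(start) <= ip <= ipaddress.ip_address(end)


def unprotected_static_ips(scope, exclusions, static_ips):
    """Return static IPs inside the DHCP scope that no exclusion range covers.

    scope      -- (start, end) of the addresses the scope may lease
    exclusions -- list of (start, end) ranges DHCP must not hand out
    static_ips -- addresses configured by hand on servers or devices
    """
    risky = []
    for raw in static_ips:
        ip = ipaddress.ip_address(raw)
        inside_scope = _in_range(ip, *scope)
        excluded = any(_in_range(ip, start, end) for start, end in exclusions)
        if inside_scope and not excluded:
            risky.append(raw)
    return risky


if __name__ == "__main__":
    # Hypothetical office scope: .1 to .20 are excluded for static servers.
    print(unprotected_static_ips(
        scope=("10.0.0.1", "10.0.0.254"),
        exclusions=[("10.0.0.1", "10.0.0.20")],
        static_ips=["10.0.0.10", "10.0.0.50", "10.0.1.5"],
    ))
```

Any address this returns needs either an exclusion covering it or a move out of the scope.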
Code example
// Logon failure triage — DNS/DHCP lens
1) ipconfig /all
- IP present? subnet correct? gateway reachable?
- DNS servers point at internal DCs (NOT public)?
- Correct DNS suffix?
2) nslookup contoso.local
Resolve-DnsName _ldap._tcp.dc._msdcs.contoso.local -Type SRV
- Are SRV records returning?
- Are the answering IPs the expected DCs?
3) Test-ComputerSecureChannel -Verbose
- Is the machine's trust with the domain healthy?
4) Ports from client to DC (on-prem):
- 53/TCP+UDP DNS
- 88/TCP+UDP Kerberos
- 389/TCP+UDP LDAP (UDP carries the DC locator's LDAP ping)
- 445/TCP SMB
- 3268/TCP Global Catalog
- 123/UDP Time (NTP)
Line-by-line walkthrough
- 1. Title: logon failure triage playbook
- 2. Step 1: dump the client’s full IP configuration
- 3. Sub-check: valid IP, correct subnet, reachable gateway
- 4. Sub-check: DNS servers are internal DCs, not public resolvers
- 5. Sub-check: correct DNS suffix
- 6. Step 2: resolve the domain name
- 7. Query the DC locator SRV record
- 8. Sub-check: SRV records come back
- 9. Sub-check: answers are the expected DCs
- 10. Step 3: test the machine’s secure channel
- 11. Sub-check: the machine’s trust with the domain is healthy
- 12. Step 4: confirm required ports from client to DC
- 13. DNS
- 14. Kerberos
- 15. LDAP
- 16. SMB
- 17. Global Catalog
- 18. NTP time sync
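Step 4 of the playbook can be approximated from any machine with Python. This is a minimal sketch that only covers the TCP ports in the list. UDP services such as NTP on 123 are connectionless and cannot be confirmed with a plain connect. The host name in the usage line is hypothetical, and `tcp_port_open` and `check_dc_ports` are illustrative names.

```python
import socket

# TCP ports from the playbook above, keyed by port number.
DC_TCP_PORTS = {
    53: "DNS",
    88: "Kerberos",
    389: "LDAP",
    445: "SMB",
    3268: "Global Catalog",
}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_dc_ports(host, ports=DC_TCP_PORTS, timeout=2.0):
    """Map each service name to whether its TCP port answered."""
    return {name: tcp_port_open(host, port, timeout) for port, name in ports.items()}


if __name__ == "__main__":
    # Hypothetical DC name; replace with one returned by the SRV lookup.
    for service, is_open in check_dc_ports("dc01.contoso.local").items():
        print(f"{service}: {'open' if is_open else 'unreachable'}")
```

A result of `False` for every port usually points at name resolution or routing rather than at the DC itself, which is why this step comes after the DNS checks.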
Spot the bug
Problem: After an office DHCP change, users get ‘trust relationship failed’ when logging on.
The junior admin starts rejoining every laptop to the domain, one by one.
Need a hint?
Is the trust issue real, or a symptom of something else broken first?
Show answer
Before rejoining anything, verify DNS and DHCP: do clients receive internal DNS servers via DHCP option 6? Can they resolve SRV records? Can they reach the DC on required ports? Rejoining masks the real fix and creates unnecessary work. Run Test-ComputerSecureChannel only after DNS is validated.
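The ordering in that answer matters more than any single check, so here it is as a small decision function. This is only a sketch: the function name, parameter names and messages are invented for illustration, and each boolean is assumed to come from a check you have already run by hand.

```python
def next_action(ip_ok, dns_is_internal, srv_resolves, ports_open, secure_channel_ok):
    """Return the first thing to fix, checking cheap root causes before trust."""
    if not ip_ok:
        return "fix DHCP: client has no usable address"
    if not dns_is_internal:
        return "fix DHCP option 6: client must use internal DNS"
    if not srv_resolves:
        return "fix DNS: SRV records do not resolve"
    if not ports_open:
        return "fix network path: DC ports unreachable"
    # Only now is a trust failure worth believing.
    if not secure_channel_ok:
        return "repair the machine's secure channel"
    return "no DNS/DHCP or trust fault found"


if __name__ == "__main__":
    # The scenario above: addresses are fine, but DHCP hands out public DNS.
    print(next_action(True, False, False, False, False))
```

In the scenario above the function stops at option 6, so no laptop gets rejoined until DNS is known to be good.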
Explain like I'm 5
Before anyone in the company can say ‘yes, you can come in,’ your phone has to know WHERE to call. DNS is the phonebook. DHCP is the operator giving new phones their numbers. Break either and every door stops answering.
Fun fact
Microsoft engineers often joke that ‘it’s always DNS’, and they’re usually right. In AD troubleshooting, DNS is one of the most common root causes of authentication issues.
Hands-on challenge
On your own machine: run ipconfig /all and note your DNS servers. If the machine is domain-joined, run Resolve-DnsName _ldap._tcp.dc._msdcs.<your-domain> -Type SRV; otherwise run Resolve-DnsName microsoft.com -Type MX. Read the output until every field makes sense.
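If you are unsure what names to query, they follow a fixed pattern. The sketch below builds a few common DC locator SRV names for a domain and the reverse-lookup (PTR) name for an IP. The helper names are hypothetical, and contoso.local is the example domain used earlier in this lesson.

```python
import ipaddress


def dc_locator_srv_names(domain):
    """Return common SRV record names a client queries to find a DC."""
    return [
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_ldap._tcp.{domain}",
    ]


def ptr_name(ip):
    """Return the reverse-zone name that should hold the PTR record for ip."""
    return ipaddress.ip_address(ip).reverse_pointer


if __name__ == "__main__":
    for name in dc_locator_srv_names("contoso.local"):
        print(name)
    print(ptr_name("10.0.0.5"))
```

Each printed name can be pasted into Resolve-DnsName with -Type SRV or -Type PTR.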