Linux Processes, Services & Logs
Where Linux incidents become visible
Real-world analogy
Processes are the workers, services are their managers, and logs are the diary every worker keeps. When something goes wrong, you don’t interrogate every worker — you read the diaries, check the manager, and decide who to restart.
What is it?
Linux service administration is watching the right logs, issuing the right systemctl commands, and resisting the temptation to reboot everything. It’s quiet work that prevents 80% of Linux outages.
Real-world relevance
A production API server runs fine all day, then fails every night at 2 AM. journalctl shows an OOM-kill of the app every night. You correlate with a nightly backup that bloats RAM. You either tune the backup, add memory, or move backup to a quieter window — not ‘restart the service.’
Key points
- Process vs service — A process is any running program. A service (systemd unit) is a managed, auto-restarted, logged process. Services survive reboots because systemd knows about them; random scripts don’t.
- systemctl & journalctl — systemctl status/start/stop/restart/enable/disable/mask. journalctl reads the systemd journal (logs). These two commands handle 80% of Linux service support.
- Process inspection: ps, top, htop, pidof — ps auxf shows a tree. top/htop are live monitors. pidof nginx / pgrep nginx find PIDs. kill sends signals (SIGTERM 15 polite, SIGKILL 9 last resort).
- Resource health: load avg, memory, disk, I/O — uptime (load averages), free -h (memory), df -h (disk), du -sh (dir size), iostat and iotop (I/O), vmstat (virtual memory). Know these names; try them monthly.
- Logs beyond journal — /var/log/syslog (Debian/Ubuntu) or /var/log/messages (RHEL), /var/log/auth.log (auth events), application-specific dirs. grep, less, tail -f are your friends.
- Common first-aid actions — Service won’t start: journalctl -u <service> -e and systemctl status for the exit code. Disk full: df -h then du -sh /var/* /home/*. Runaway CPU: top → identify PID → inspect with ps/strace → decide whether to restart or investigate.
- OOM killer awareness — When memory runs out, the kernel kills the process it considers worst. dmesg and journalctl show ‘Killed process’ entries. A repeating OOM-kill is a capacity issue, not a ‘flaky app.’
- Safe restart order — Restart the smallest scope first: service → dependent service → container → host. Never reboot a production server as a shortcut without approval in regulated environments.
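The “process vs service” distinction above comes down to a unit file. A minimal sketch — the name myapp.service, the description, and the binary path are all hypothetical placeholders, not a real unit:

```ini
# /etc/systemd/system/myapp.service — hypothetical example unit
[Unit]
Description=My application
After=network.target

[Service]
ExecStart=/usr/local/bin/myapp
Restart=on-failure        # systemd restarts it automatically on crash
RestartSec=5              # wait 5 s between restart attempts

[Install]
WantedBy=multi-user.target
```

After `systemctl daemon-reload` and `systemctl enable --now myapp`, the same binary that was “just a process” is now restarted on failure, started at boot, and logged to the journal.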
Code example
# Linux triage cheat-script
# Is the service alive?
systemctl status myapp
journalctl -u myapp -e --since "30 min ago"

# Is the system healthy?
uptime # load average
free -h # memory
df -h # disk
top -b -n 1 # snapshot of CPU/memory

# Is a specific process misbehaving?
pgrep -a myapp
ps -o pid,pcpu,pmem,etime,cmd -p $(pgrep myapp)
kill -15 <pid> # graceful
kill -9 <pid> # last resort

# Did the kernel kill a process?
dmesg | grep -i "killed process"
journalctl -k --since today | grep -i oom
Line-by-line walkthrough
- 1. Linux triage cheat-script
- 2. Is service alive block
- 3. systemctl status
- 4. journalctl recent window
- 5. Blank separator
- 6. System health block
- 7. uptime
- 8. free -h
- 9. df -h
- 10. top snapshot
- 11. Blank separator
- 12. Process-specific block
- 13. pgrep
- 14. ps per-pid stats
- 15. Graceful kill
- 16. Last-resort kill
- 17. Blank separator
- 18. Kernel-kill check
- 19. dmesg grep
- 20. journalctl OOM grep
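To make the “logs beyond journal” key point concrete, here is a self-contained sketch of the grep/tail idioms. It builds a throwaway sample file so it runs anywhere; on a real Debian/Ubuntu box you would point the same commands at /var/log/auth.log or /var/log/syslog instead:

```shell
#!/bin/sh
# Simulate a small auth log so the example is self-contained.
log=$(mktemp)
printf '%s\n' \
  'Jan 10 02:00:01 host sshd[912]: Failed password for root from 10.0.0.5' \
  'Jan 10 02:00:05 host sshd[913]: Accepted publickey for deploy' \
  'Jan 10 02:00:09 host sshd[914]: Failed password for admin from 10.0.0.5' > "$log"

# Count failed logins (real box: grep -c 'Failed password' /var/log/auth.log)
grep -c 'Failed password' "$log"   # prints 2

# Show the most recent entry (real box: tail -f /var/log/syslog to follow live)
tail -n 1 "$log"

rm -f "$log"
```

The same two idioms — grep for a known error string, tail for the newest lines — cover most classic-log triage before you reach for anything fancier.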
Spot the bug
Ticket: 'API slow at 2 AM.' Junior reboots the VM every morning at 8 AM and closes the ticket.
Need a hint?
Which logs would tell you what actually breaks at 2 AM?
Show answer
journalctl since the previous day, dmesg for OOM-kill, systemctl status for crashes, and system metrics for CPU/memory/disk peaks. Often it’s a nightly job colliding with the app. Reboots mask the cause; real fix is finding the 2 AM trigger and tuning capacity or scheduling.
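One way to run that investigation as shell, sketched under assumptions: the unit name api is hypothetical, the date arithmetic is GNU date, and the journalctl calls only run on a systemd host (the snippet degrades gracefully elsewhere):

```shell
#!/bin/sh
# Build an explicit window around the 2 AM incident instead of reading the whole day.
START="$(date -d 'yesterday 01:45' '+%Y-%m-%d %H:%M')"
END="$(date -d 'yesterday 02:30' '+%Y-%m-%d %H:%M')"

if command -v journalctl >/dev/null 2>&1; then
  # App-level errors and crashes in the window ("api" is a placeholder unit name)
  journalctl -u api --since "$START" --until "$END" --no-pager
  # Kernel messages: OOM kills land here, not in the app's own log
  journalctl -k --since "$START" --until "$END" --no-pager | grep -iE 'oom|killed process' || true
else
  echo "journalctl not available; would inspect window $START to $END"
fi
```

Scoping the journal with --since/--until is what turns “the API is slow at 2 AM” into “the backup job and the API peak at the same minute.”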
Explain like I'm 5
Every program is a worker. Services are trustworthy workers with a manager. Logs are their diary. When something breaks, you read the diary, not shout at the worker.
Fun fact
systemd is so central to modern Linux that ‘systemctl status’ has become the first command most engineers type when something feels wrong — even before they check CPU or disk.
Hands-on challenge
On an Ubuntu VM: install nginx, verify via systemctl status, break it by editing nginx.conf to invalid syntax, try to restart, read the error via journalctl, fix, restart successfully. Save the command history.
More resources
- systemd documentation (freedesktop.org)
- journalctl cheatsheet (DigitalOcean)
- Linux systemd in depth (Learn Linux TV)