Process Inspection & Control (ps, top, htop, kill, Background Jobs)
Version: 2.0.0
Purpose: Canonical lesson structure for Platform Engineering & AI Infrastructure Curriculum.
Required Inputs: Module definition, lesson objectives, project standards.
Outputs: Standards-compliant lesson markdown.
Lesson Metadata
- Lesson ID: MOD-LINUX-ADM-03
- Module: Linux Administration (MOD-LINUX-ADM)
- Difficulty: Beginner
- Estimated Duration: 45 minutes
- Learning Track: 🟢 Core
- Version: 2.0.0
- Last Updated: 2026-06-28
Lesson Overview
This lesson pulls back the curtain on the active execution engine of the Linux operating system, exploring how the Linux kernel schedules, monitors, and terminates active software applications. By mastering ps, top, htop, kill, and the elegant mechanics of background job control (&, bg, fg), you will establish the dynamic monitoring capabilities supporting our module capability: “I can administer a Linux server, manage permissions, automate simple tasks, and troubleshoot common issues.”
Learning Objectives
- Define what a Linux process is and explain the architectural role of the Process ID (PID).
- Inspect static snapshots of active system processes using ps aux combined with grep.
- Monitor dynamic, real-time CPU and memory resource consumption using top and htop.
- Terminate runaway or unresponsive software processes safely using kill (SIGTERM) and kill -9 (SIGKILL).
- Manage background jobs with shell job control using &, bg, fg, and jobs.
Prerequisites
- Completion of MOD-LINUX-ADM-02 (Linux Permission Mechanics).
- Foundational terminal piping and filtering skills (cat, grep, |).
Why This Exists
In the preceding lessons, we mastered the static filesystem of Linux—how to navigate directories, inspect file contents, and secure permission locks. However, files sitting on a hard drive are completely dormant. When you execute a Python script, launch a web server, or start a database engine, those static files are loaded into active computer memory (RAM) and handed to the CPU for execution.
In a desktop GUI operating system, when a graphical application freezes or consumes 100% of your computer’s CPU, you press Ctrl + Alt + Delete to open the graphical “Task Manager,” locate the frozen app icon, and click “End Task.”
However, in a headless cloud server or AI container, there is no graphical Task Manager. If a runaway AI training script consumes 100% of your server’s memory, or a web server daemon freezes, how do you see what is consuming your system resources? How do you forcefully shut down the failing software?
To solve this mission-critical operational requirement, Linux provides an elite suite of Process Inspection and Control Utilities (ps, top, kill). These dynamic tools act as your terminal Task Manager, empowering Platform Engineers to monitor real-time resource consumption, manage background jobs, and surgically terminate failing applications with absolute precision.
Core Concepts
1. What is a Process? (PIDs)
Think of a running computer like a busy office building. A process is simply an active worker in that office doing a specific job. Every single worker is assigned a unique employee badge number called a Process ID (PID).
- The CEO / Founder (PID 1): The very first worker hired when the computer boots up. PID 1 (the init system, usually systemd on modern Linux) acts as the boss and ancestor of every other user-space worker in the entire building!
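To make the badge-number idea concrete, here is a minimal sketch of asking the shell for its own PID and its parent's PID. It assumes a bash-compatible shell and the procps version of ps; the variable names are illustrative, and the numbers and the name reported for PID 1 will differ from machine to machine.

```shell
# Every shell is itself a process, so it has a PID and a parent PID (PPID).
my_pid=$$          # special variable: PID of the current shell
parent_pid=$PPID   # special variable: PID of the process that started this shell
echo "This shell is PID $my_pid, started by PID $parent_pid"

# Ask ps for just this one process: its PID, parent PID, and command name.
ps -o pid,ppid,comm -p "$my_pid"

# Look up the name of PID 1. On a server this is often systemd; inside a
# container it is whatever program the container was started with.
init_name=$(ps -o comm= -p 1)
echo "PID 1 is: $init_name"
```

Following the PPID column upward from any process eventually leads back to PID 1, which is the office hierarchy described above.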
2. Static Process Snapshots (ps aux)
When you need to take an instant photograph of everything happening in the office at a specific second, you use ps, which acts like The Office Camera Snapshot.
ps aux: The universal master camera button. Let’s break down the flags:
- a: Takes a photo of workers from all departments (processes belonging to every user, not just you).
- u: Prints a human-readable, user-oriented format that shows each worker’s owner and how hard they are working (CPU and memory share).
- x: Includes workers who are hidden in the back rooms without a desk (processes with no controlling terminal, such as daemons).
ps aux | grep "python"
In this pipeline, ps aux captures a photo of every worker (say, all 500 of them), but instead of flooding your desk with photos, the pipe (|) hands them to grep, which throws away every photo except the ones containing “python”!
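grep matches text anywhere on the line, so a keyword can also hit usernames or command arguments. When the PID is already known, comparing against the PID column is more exact. The sketch below is one way to do that; it uses awk, which this lesson does not otherwise cover, and the variable names are illustrative.

```shell
# Start a throwaway process so there is something to find.
sleep 300 &
target_pid=$!   # $! holds the PID of the most recent background command

# Keep the header row (NR == 1) plus any row whose PID column ($2) matches.
matches=$(ps aux | awk -v pid="$target_pid" 'NR == 1 || $2 == pid')
echo "$matches"

# Clean up the throwaway process.
kill "$target_pid"
```

The output is the ps aux header followed by the single row for the sleep process, with no grep line to filter away.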
3. Real-Time Resource Monitoring (top and htop)
When you need to watch the workers on a Live Security Feed (like a heart monitor), you use top or htop.
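- top: Available on nearly every Linux machine (it ships with the procps package, although very minimal container images may omit it). It prints a live leaderboard of who is working the hardest, refreshing every 3 seconds by default.
- htop: A modern, colorful alternative to top with per-CPU meters, scrolling, and mouse support. It usually has to be installed separately.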
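The live view needs an interactive terminal, which a script or a cron job does not have. As a sketch, assuming the procps-ng version of top, batch mode prints one plain-text frame and exits, which is handy for logging:

```shell
# -b = batch mode (plain text, no interactive screen)
# -n 1 = draw one refresh, then exit
# head keeps the summary area and the first few process rows.
snapshot=$(top -b -n 1 | head -n 12)
echo "$snapshot"
```

The first lines summarize load, tasks, CPU, and memory; the rows after the column header are the busiest processes at that instant.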
4. Terminating Processes (kill and Signals)
To fire a worker, you use The Firing Process with the kill command and their badge number (kill [PID]). Behind the scenes, kill sends a message to the worker:
- kill 1234 (Polite Request to Leave / SIGTERM): This is the professional way to fire someone. It tells the worker: “Please finish writing your current email, pack up your desk safely, and leave the building.”
- kill -9 1234 (The Bouncer / SIGKILL): This is the last resort. It bypasses the worker entirely: the signal cannot be caught, blocked, or ignored, so the kernel removes the process immediately with no chance to clean up. Use this only when a worker is completely frozen and ignoring polite requests.
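The difference between the two signals is easiest to see from the worker's side. In this minimal sketch (assuming bash; the worker scripts and variable names are made up for illustration), the first worker installs a SIGTERM handler with trap and exits cleanly, while the second receives SIGKILL and never gets to run any code.

```shell
# Worker 1 installs a SIGTERM handler (trap), so it can tidy up before leaving.
bash -c 'trap "echo worker: saving state and exiting; exit 0" TERM
         while true; do sleep 0.1; done' &
polite_pid=$!
sleep 0.5                 # give the worker a moment to install its trap
kill "$polite_pid"        # no signal named, so kill sends SIGTERM (15)
term_status=0
wait "$polite_pid" || term_status=$?
echo "SIGTERM: worker exit status $term_status"

# Worker 2 gets SIGKILL. No handler can run; the kernel ends it immediately.
sleep 300 &
frozen_pid=$!
kill -9 "$frozen_pid"
kill_status=0
wait "$frozen_pid" || kill_status=$?
echo "SIGKILL: worker exit status $kill_status"   # 128 + 9 = 137
```

An exit status above 128 means the process was ended by a signal; subtracting 128 gives the signal number, which is why a SIGKILL shows up as 137.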
5. Background Jobs (&, bg, fg)
If you give a worker a massive task in your main terminal, they will lock up your desk for hours. You can send them to the background!
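- [command] &: Adding an ampersand (&) sends the worker to the back room instantly, immediately returning your desk to you!
- jobs: Prints a list of all workers currently in the back room, each with a job number.
- fg 1: Brings back room worker number 1 into the foreground at your desk.
- bg 1: Lets a paused worker carry on in the back room. Pressing Ctrl + Z pauses (suspends) whatever is running at your desk; bg 1 then resumes job number 1 in the background.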
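fg and bg are meant for interactive use at a prompt. Inside a script, job control is normally switched off, so a common pattern is to record each worker's PID with $! and block on it with wait. A minimal sketch, assuming bash and using illustrative variable names:

```shell
# Launch two workers in the background and remember their PIDs.
sleep 1 &
first_pid=$!
sleep 1 &
second_pid=$!

# List this shell's background jobs, then block until both have finished.
jobs
wait_status=0
wait "$first_pid" "$second_pid" || wait_status=$?
echo "both background jobs finished (status $wait_status)"
```

Both sleeps run at the same time, so the script takes about one second rather than two.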
Architecture
Real-World Example
Imagine you are managing an AI inference GPU cluster at a company like OpenAI. This maps perfectly to our layered architecture:
- Layer 1: Process Execution: An engineer reports that their new LLM serving container (spawned as a process) has become completely unresponsive.
- Layer 2: Monitoring Tools: You log into the server via SSH and execute htop to inspect live performance. You instantly spot a runaway Python process (PID 4052) consuming 99.9% of the server’s CPU and memory!
- Layer 3: Signal Transmission: You first attempt a polite shutdown using kill 4052 (SIGTERM), but after 10 seconds, htop shows the process is still frozen and running. You escalate to the last resort: kill -9 4052 (SIGKILL).
- Layer 4: Process Termination: The Linux kernel terminates the runaway process and reclaims the 256 gigabytes of RAM it was holding. The server recovers, and your AI cluster returns to a healthy state!
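The "polite first, force later" escalation in Layers 3 and 4 can be sketched as a small helper. This is an illustration under assumptions, not a standard tool: stop_gracefully is a hypothetical function name, the timeouts are arbitrary, and the code assumes bash.

```shell
# Hypothetical helper: ask a process to stop, wait up to N seconds, then force it.
stop_gracefully() {
    local pid=$1 timeout=${2:-10} waited=0
    kill -TERM "$pid" 2>/dev/null || return 0    # nothing to stop: already gone
    while kill -0 "$pid" 2>/dev/null; do         # kill -0 only tests existence
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 1                             # had to escalate to SIGKILL
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0                                     # exited on SIGTERM alone
}

# A cooperative worker stops on SIGTERM.
sleep 300 &
if stop_gracefully "$!" 5; then echo "cooperative worker: stopped by SIGTERM"; fi

# A stubborn worker ignores SIGTERM (trap "" TERM), so the helper escalates.
bash -c 'trap "" TERM; while true; do sleep 0.2; done' &
stubborn_pid=$!
sleep 0.5
if stop_gracefully "$stubborn_pid" 2; then
    echo "stubborn worker: stopped by SIGTERM"
else
    echo "stubborn worker: needed SIGKILL"
fi
```

The return value tells the caller whether SIGKILL was needed, which is worth logging because it means the process had no chance to clean up.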
Hands-on Demonstration
Let’s look at how an engineer inspects active processes using ps aux | grep, launches a background job using &, and terminates a process safely using kill.
Input 1: Launching a Background Job and Inspecting Processes
We launch a simulated long-running sleep command in the background using &, verify it with jobs, and locate its exact PID using ps aux | grep.
Code 1
# Launch a simulated 500-second long-running task in the background using '&'.
sleep 500 &
# Verify the list of active background jobs in our terminal.
jobs
# Use 'ps aux' piped into 'grep' to locate the exact Process ID (PID) of our sleep command.
ps aux | grep "sleep" | grep -v "grep"
Expected Output 1
[1] 24500
[1]+ Running sleep 500 &
aloysius 24500 0.0 0.0 5144 716 pts/0 S 04:20 0:00 sleep 500
Explanation 1
Look at how transparent Linux is! When we execute sleep 500 &, the terminal instantly prints [1] 24500, telling us it launched job number 1 with PID 24500 (the PID on your machine will differ). Our prompt returns instantly! When we run ps aux | grep "sleep", the output is narrowed to the rows that contain “sleep”. Notice our addition of grep -v "grep" (invert match): it filters out the grep command itself, whose own command line also contains the word “sleep”.
Input 2: Terminating a Process via kill
We use the kill command to gracefully terminate our running background sleep process, and verify its termination.
Code 2
# Gracefully terminate the background sleep process using its exact PID (24500).
kill 24500
# Verify the status of our background jobs to confirm termination.
jobs
Expected Output 2
[1]+ Terminated sleep 500
Explanation 2
When we execute kill 24500, Linux sends a SIGTERM signal to the sleep process. When we check jobs, the shell confirms [1]+ Terminated sleep 500. The shell reports this status once; running jobs again prints nothing, because the job has been removed from the list. The process has been cleanly shut down!
Hands-on Lab
- Objective: Inspect active processes, manage background jobs, and terminate applications using kill.
- Estimated Time: 15 minutes
- Difficulty: Beginner
- Environment: Interactive Browser Terminal / Local Sandbox
Step-by-step Instructions
- Open your terminal sandbox.
- Type sleep 300 & to launch a background sleep process.
- Type jobs to verify your active background job number.
- Type ps aux | grep sleep to locate the exact Process ID (PID) of your sleep command.
- Type kill [PID] (using the exact number you found) to gracefully terminate the process.
- Type jobs to verify the process successfully reports Terminated.
- Type top to open the live real-time system monitor. (Press the q key on your keyboard to quit top when finished!)
Verification
jobs
ps aux | grep sleep | grep -v grep
If jobs reports the sleep job as Terminated (or prints nothing because the job has already been cleared) and the ps aux pipeline returns empty output, you have mastered the basics of Linux process control!
Troubleshooting
- Issue: kill returns bash: kill: (12345) - No such process.
- Solution: You typed an incorrect PID number, or the process already finished running on its own. Use ps aux to verify active PIDs.
Cleanup
No cleanup is required for this dynamic process lab.
Production Notes
In enterprise cloud architectures (such as Kubernetes microservices), Platform Engineers rely heavily on SIGTERM and SIGKILL signals to manage automated deployments. When Kubernetes wants to scale down a container or deploy a new version of your code, it first sends a SIGTERM signal to PID 1 inside the container, giving your application a grace period (30 seconds by default, configurable with terminationGracePeriodSeconds) to finish serving active user web requests gracefully. If the container is still running when the grace period expires, Kubernetes sends SIGKILL (the equivalent of kill -9) to terminate the container immediately!
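For that grace period to be useful, whatever runs as PID 1 has to react to SIGTERM. The sketch below shows one possible shape for a shell entrypoint that forwards the signal to its workload. Everything here is illustrative: sleep 300 stands in for a real server, on_term and the variable names are invented, and the one-second timer only simulates the orchestrator so the example can run on its own.

```shell
# Hypothetical container entrypoint: forward SIGTERM to the real workload.
got_term=0
app_pid=""

on_term() {
    got_term=1
    echo "entrypoint: SIGTERM received, asking the app to finish up"
    [ -n "$app_pid" ] && kill -TERM "$app_pid" 2>/dev/null
}
trap on_term TERM

sleep 300 &            # stand-in for the real server process
app_pid=$!

# Simulate the orchestrator: send SIGTERM to this script after one second.
( sleep 1; kill -TERM "$$" ) &

# wait returns early when a trapped signal arrives; the second wait reaps the app.
wait "$app_pid" || true
wait "$app_pid" 2>/dev/null || true
echo "entrypoint: app stopped, exiting cleanly (got_term=$got_term)"
```

Without the trap, a shell running as PID 1 would typically not pass the signal on, and the workload would sit there until the SIGKILL arrives.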
Common Mistakes
- The Brutal Habit of Reaching for kill -9 First: Beginners often develop the terrible habit of using kill -9 for every single process termination. kill -9 gives the software absolutely zero chance to save open files or close database transactions cleanly, which can leave data half-written or corrupted! Always use standard kill first; only use kill -9 as a last resort!
- Forgetting Background Jobs When Closing Terminals: If you launch a long-running script in the background using & and then close your SSH terminal window, the session receives a SIGHUP (Hangup) signal, which by default terminates your background job! To keep jobs running after you close your terminal, use the nohup (no hangup) command (nohup ./backup.sh &) or a terminal multiplexer like tmux.
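Here is a small sketch of what nohup actually buys you. It assumes bash and GNU coreutils; sleep 30 stands in for a real script such as backup.sh, and the hangup is simulated by sending SIGHUP by hand rather than by closing a terminal.

```shell
log_file=$(mktemp)     # scratch file standing in for a real log such as backup.log

# nohup makes the command ignore SIGHUP; redirecting output means the job
# does not depend on the terminal that is about to go away.
nohup sleep 30 > "$log_file" 2>&1 &
job_pid=$!
sleep 0.5              # let nohup finish starting the command

# Simulate the terminal closing by sending the hangup signal ourselves.
kill -HUP "$job_pid"
sleep 0.5
if kill -0 "$job_pid" 2>/dev/null; then
    survived=yes
else
    survived=no
fi
echo "job survived SIGHUP: $survived"

# Clean up: SIGTERM still works, because nohup only shields against SIGHUP.
kill "$job_pid"
rm -f "$log_file"
```

Redirecting output explicitly is a design choice: without it, nohup writes to a file named nohup.out in the current directory when the output would otherwise go to a terminal.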
Failure-Driven Learning
Imagine a junior engineer attempts to use kill to terminate a critical system daemon process owned by root while logged in as a standard user.
Simulated Failure
# Attempting to kill the master system logging daemon owned by root
kill $(pgrep rsyslogd)
Output
bash: kill: (842) - Operation not permitted
Diagnosis & Recovery
Why did this fail? The error Operation not permitted occurs because Linux’s multi-user security model strictly isolates process control! A standard user ($) is only authorized to signal processes that run under their own user ID. You cannot kill processes owned by other engineers or by root. To recover, the engineer must elevate their privileges using sudo: sudo kill 842.
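Before reaching for sudo, it helps to check who owns the target. The following sketch compares a process's owner with the current user; it assumes the procps version of ps, and it targets a sleep process started by the script itself so that it is safe to run anywhere.

```shell
# Start a process that we definitely own, then compare its owner with ours.
sleep 300 &
target_pid=$!

owner_uid=$(ps -o uid= -p "$target_pid" | tr -d ' ')   # numeric owner of the process
my_uid=$(id -u)                                         # our own numeric user ID

if [ "$my_uid" -eq 0 ] || [ "$owner_uid" = "$my_uid" ]; then
    verdict="allowed"      # root, or same owner: kill will be permitted
else
    verdict="needs sudo"   # someone else's process: expect 'Operation not permitted'
fi
echo "PID $target_pid is owned by UID $owner_uid; we are UID $my_uid: $verdict"

kill "$target_pid"
```

Pointing the same check at a root-owned daemon from a standard account would report "needs sudo", matching the failure above.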
Engineering Decisions
Monolithic OS Monitoring vs. Containerized Isolation
When architecting an enterprise platform, engineering leaders must decide how processes share server resources.
- Monolithic Server Deployments: Run fifty different applications directly on the same Linux server. If one badly written application suffers a memory leak, it can consume 100% of the server’s RAM, causing the Linux kernel’s Out-Of-Memory (OOM) killer to terminate whichever processes it scores as the best victims, which can include unrelated critical services!
- Containerized Isolation (Docker / Kubernetes): Wrap each workload inside its own isolated cgroup and namespaces. If a containerized process attempts to consume more than its assigned memory limit, the kernel’s OOM killer acts only inside that container’s cgroup, leaving the rest of the server running!
- The Platform Decision: Platform Engineers strictly mandate containerized process isolation for all modern cloud workloads.
Best Practices
- Master htop Filtering: When viewing htop, press the F4 key on your keyboard to filter the live process table for a specific keyword like python or node!
- Use pgrep and pkill for Speed: Instead of typing ps aux | grep nginx and then kill 1234, you can use pgrep nginx to print the matching PIDs, or pkill nginx to send SIGTERM to every process whose name matches nginx! The name is treated as a pattern, so check what pgrep matches before you run pkill.
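Because pkill acts on everything that matches, it pays to narrow the match. This sketch assumes the procps versions of pgrep and pkill and restricts both commands to children of the current shell, so only the two throwaway sleep workers are affected.

```shell
# Start two throwaway workers as children of this shell.
sleep 300 &
sleep 300 &

# -P limits matches to children of the given parent PID; -x requires the
# process name to match exactly; -l prints the name next to each PID.
pgrep -l -P "$$" -x sleep

# Send SIGTERM (pkill's default signal) to the same narrow set.
pkill -P "$$" -x sleep
sleep 0.5              # give the workers a moment to exit

if pgrep -P "$$" -x sleep > /dev/null; then
    remaining=yes
else
    remaining=no
fi
echo "sleep children still running: $remaining"
```

Running pgrep with exactly the options you plan to give pkill is a cheap dry run: the PIDs it prints are the ones pkill would signal.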
Troubleshooting Guide
Issue 1: “Out of Memory: Kill process” (The OOM Killer)
- Cause: Your Linux server completely runs out of physical RAM and swap memory due to heavy application load.
- Diagnosis: A critical database or web server process suddenly vanishes. Inspecting the system logs (sudo grep -i oom /var/log/syslog, or sudo dmesg | grep -i oom on distributions without that file) reveals Out of memory: Kill process 1102 (postgres) score 851 or sacrifice child.
- Solution: When Linux completely runs out of memory, the kernel’s Out-Of-Memory (OOM) Killer steps in to save the operating system from crashing by forcefully terminating (SIGKILL) the process with the highest oom_score, which is usually the one consuming the most RAM. To resolve this, you must upgrade your server’s physical RAM (vertical scaling) or configure strict memory limits in your application settings.
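The scores the OOM Killer works from can be read directly from /proc. This read-only sketch prints them for the current shell; it assumes a Linux system with /proc mounted and simply reports when a file is unavailable, as can happen in some sandboxes.

```shell
# Each process exposes its current OOM ranking under /proc/<pid>/.
# A higher oom_score means the process is a more likely victim;
# oom_score_adj is the manual adjustment (-1000 to 1000) applied to it.
checked=0
for field in oom_score oom_score_adj; do
    path="/proc/$$/$field"
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$field" "$(cat "$path")"
    else
        printf '%s is not available on this system\n' "$field"
    fi
    checked=$((checked + 1))
done
```

Substituting the PID of a database or web server for $$ shows how exposed that service is. Lowering a process's oom_score_adj requires root privileges.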
Summary
- A Process is an active running program, tracked by a unique Process ID (PID) starting from PID 1.
- ps aux | grep captures instant static snapshots of running processes across all users.
- top and htop provide live, continuously updating real-time views of CPU and memory consumption.
- kill (SIGTERM) requests polite, graceful process shutdown; kill -9 (SIGKILL) commands instant, brutal kernel termination.
- Background job control (&, bg, fg, jobs) empowers Platform Engineers to multitask efficiently within a single terminal window.
Cheat Sheet
# Capture a static snapshot of all running processes on the server
ps aux
# Filter the process snapshot table for a specific keyword
ps aux | grep "python" | grep -v "grep"
# Open the live, real-time interactive process monitor (Press 'q' to quit)
top
htop
# Launch a command in the background instantly
[command] &
# List all active background jobs in your terminal
jobs
# Bring a background job back into the foreground
fg [job_number]
# Gracefully terminate a process (Polite SIGTERM / Signal 15)
kill [PID]
# Forcefully terminate a frozen process (Brutal SIGKILL / Signal 9)
kill -9 [PID]
# Find the exact PID of a process by its name
pgrep [process_name]
# Kill all processes matching a specific name instantly
pkill [process_name]
Knowledge Check
Multiple Choice Questions
- You are managing a production cloud server and notice a runaway background process (PID 8820) is completely frozen. You try executing kill 8820, but after 30 seconds, ps aux shows the process is still running and ignoring your signal. Which command do you execute to forcefully command the Linux kernel to terminate the process instantly?
  - A) chmod 000 8820
  - B) kill -9 8820
  - C) pkill --polite 8820
  - D) bg 8820
Scenario Questions
You are writing an automated deployment script that needs to launch a heavy database migration script named migrate.sh in the background so that the deployment pipeline doesn’t freeze up waiting for it to finish. Based on what you learned in this lesson, what exact character do you add to the end of the command to achieve this, and what command would an engineer type to verify that the job is successfully running in the background?
Short Answer Questions
Explain the exact architectural difference between sending a SIGTERM (Signal 15) to a process versus sending a SIGKILL (Signal 9) to a process.
View Answers
Multiple Choice
- B - kill -9 sends the SIGKILL signal, forcefully and instantly terminating the process.
Scenario
Add an ampersand (&) to the end of the command (e.g., ./migrate.sh &). The engineer would then type jobs to verify that the job is running in the background.
Short Answer
SIGTERM (15) politely requests a process to terminate, giving it a chance to gracefully save state and clean up resources. SIGKILL (9) forcefully terminates the process instantly at the kernel level, skipping any cleanup and potentially causing data loss.
Interview Preparation
Beginner Questions
- What is a PID in Linux, and what process always holds PID 1?
- What does the ps aux command do?
- Why is it considered bad practice to use kill -9 as your first choice when stopping a process?
Intermediate Questions
- Explain what the jobs, fg, and bg commands do in the context of terminal job control.
- What is the purpose of the grep -v grep addition when filtering ps aux output?
Advanced Questions
- Explain how the Linux kernel’s Out-Of-Memory (OOM) killer calculates the oom_score of running processes, and how a Site Reliability Engineer can modify oom_score_adj to protect critical daemons from being terminated during an OOM event.
Scenario-Based Discussions
- Discuss the operational trade-offs of running heavy background administrative tasks using nohup or tmux directly on cloud servers versus architecting dedicated asynchronous worker queues (e.g., Celery, AWS SQS) in an enterprise platform environment.
View Answers
Beginner
- PID and PID 1: A PID (Process ID) is a unique numerical identifier assigned by the kernel to every running process. PID 1 is always assigned to the init system (usually systemd on modern Linux), which is the first user-space process launched by the kernel and the ancestor of all other user-space processes.
- ps aux command: ps aux displays a snapshot of all currently running processes on the system across all users (a), providing detailed user-centric information (u), including processes not attached to a terminal (x).
- Why avoid kill -9: kill -9 sends a SIGKILL signal, which abruptly terminates the process at the kernel level without warning. The process cannot intercept it to save state, close file descriptors, or release locks, which can lead to data corruption or orphaned resources. SIGTERM (kill -15) should always be used first.
Intermediate
- jobs, fg, and bg commands: jobs lists the background tasks currently managed by your shell session. fg brings a background or suspended job to the foreground so you can interact with it. bg resumes a suspended job and lets it continue executing asynchronously in the background.
- Purpose of grep -v grep: When you run ps aux | grep [pattern], the grep command itself becomes a running process containing the pattern. Piping to grep -v grep filters out that grep process from the output, leaving only the actual target process you are searching for.
Advanced
- OOM killer and oom_score_adj: The OOM killer assigns each process an oom_score based primarily on memory consumption. When memory is exhausted, the kernel kills the process with the highest score. An SRE can write a lower or negative value to /proc/[pid]/oom_score_adj (the range is -1000 to 1000, and -1000 exempts the process entirely) for critical daemons (like SSHD or a database), drastically reducing their score and protecting them from being targeted.
Scenario-Based Discussions
- nohup/tmux vs async worker queues: Running tasks with nohup/tmux is quick and requires no infrastructure overhead, but lacks observability, retries, distributed scaling, and fault tolerance. Worker queues (Celery/SQS) introduce architectural complexity but provide robust durability, horizontal scalability, automated retries, and comprehensive metric tracking crucial for enterprise-grade automation.