Pod Lifecycle, Deployments, ReplicaSets & DaemonSets
Version: 2.0.0
Lesson Metadata
- Lesson ID: MOD-K8S-02
- Module: Kubernetes Engineering (MOD-K8S)
- Difficulty: Intermediate
- Estimated Duration: 60 minutes
- Learning Track: 🟢 Core
- Version: 2.0.0
- Last Updated: 2026-06-28
Lesson Overview
This lesson explores the workload management and zero-downtime deployment machinery of Kubernetes, showing how Platform Engineers manage microservice availability using controller objects. By mastering Pod immutability, ReplicaSets, Deployments, Rolling Updates, Rollbacks (kubectl rollout undo), DaemonSets, and Probes (Liveness, Readiness, Startup), you will build the workload skills behind our module capability: “I can deploy, scale, operate, and troubleshoot production-grade Kubernetes cluster environments.”
Learning Objectives
- Explain the principle of Pod Immutability, detailing why running Pod specifications cannot be modified in-place and must be replaced.
- Deconstruct the architectural relationship between Deployments, ReplicaSets, and Pods, detailing how label selectors (matchLabels) bind objects together.
- Architect a zero-downtime Rolling Update deployment strategy (strategy.type: RollingUpdate), configuring maxSurge and maxUnavailable boundaries.
- Execute rapid emergency deployment rollbacks using kubectl rollout undo to restore previous stable application configurations.
- Contrast Deployments (stateless scaling across arbitrary nodes) with DaemonSets (one Pod on every eligible worker node).
- Configure advanced container health checks using Liveness, Readiness, and Startup Probes to prevent routing traffic to broken containers.
Prerequisites
- Completion of MOD-K8S-01 (Kubernetes Control Plane Architecture & Reconciliation Loops).
- Foundational understanding of HTTP health checks (MOD-CLOUD-03), YAML manifests, and kubectl terminal execution.
Why This Exists
In Lesson 01, we established that a Pod is the smallest deployable atomic unit in Kubernetes. When junior engineers begin using Kubernetes, they frequently write standalone Pod manifests (kind: Pod) and deploy them directly to the cluster (kubectl apply -f my-pod.yaml).
Deploying standalone Pods directly to a production cluster is a massive operational vulnerability!
Imagine you are hired as a Lead Platform Engineer at a rapidly scaling healthcare technology enterprise. The previous engineers deployed the company’s master patient intake microservice using a standalone Pod manifest.
One afternoon, the software engineering team releases a critical security update for the intake microservice (transitioning from v1.0.0 to v2.0.0). Because they deployed a standalone Pod, they cannot perform an automated rolling update! They must manually execute kubectl delete pod intake-pod and then kubectl apply -f intake-pod-v2.yaml.
During the 45 seconds it takes for the old Pod to terminate and the new Pod to pull its container image and spin up, your entire patient intake portal goes completely offline!
Furthermore, when the new v2.0.0 Pod finally spins up, it contains a fatal database connection bug that causes the container process to crash in a loop (CrashLoopBackOff). Because you deleted the old Pod, you have absolutely no automated rollback history to instantly restore v1.0.0! Your patient portal remains entirely broken while engineers scramble to manually rewrite YAML files!
Your company has just suffered a catastrophic deployment outage!
To solve the challenges of Deployment Downtime, Lack of Rollback History, Manual Pod Deletion, and Traffic Blackholing, Kubernetes provides Deployments, ReplicaSets, DaemonSets, and Container Probes. By wrapping your Pods in higher-level Deployment controllers that manage automated rolling updates, retaining old ReplicaSets as revision history for fast rollbacks (kubectl rollout undo), and enforcing Readiness Probes that keep traffic away from unready containers, Platform Engineers can ship code dozens of times per day without taking the service offline!
Core Concepts
1. Pod Immutability (The Disposable Unit Imperative)
To manage production Kubernetes workloads, Platform Engineers enforce a strict principle of immutability:
- Pod Immutability: Once a Pod is created, almost all of its specification (environment variables, commands, ports, volumes) is immutable! The API accepts only a handful of in-place changes (most notably the container image field), and patching a live Pod that way leaves no revision history and no controlled rollout. In practice, if you want to move from v1.0.0 to v2.0.0, you replace the old Pod with a brand-new Pod through a controller! Pods are ephemeral, disposable units!
[ Standalone Pod: Immutable & Fragile ]      [ Deployment Controller: Dynamic & Resilient ]
┌────────────────────────────────────────┐   ┌──────────────────────────────────────────────┐
│ kind: Pod (Image: v1.0.0)              │   │ kind: Deployment ──► Manages ReplicaSets     │
│ (Cannot update in-place! Downtime!)    │   │ (Spins up v2.0.0 cleanly before killing v1!) │
└────────────────────────────────────────┘   └──────────────────────────────────────────────┘
2. Deployments vs. ReplicaSets vs. Pods (The Hierarchical Chain)
In production Kubernetes, you rarely create Pods directly. You create a Deployment. Deployments sit at the top of a three-tier controller hierarchy:
- Deployment: The master declarative manager! You declare your desired container image (v1.0.0) and replica count (replicas: 3). The Deployment automatically creates a ReplicaSet!
- ReplicaSet: The master replica enforcer! Its sole responsibility is to keep exactly N Pods matching its label selector (matchLabels: app=payment) running. If a Pod is deleted, evicted, or lost together with its node, the ReplicaSet creates a replacement!
- Pod: The smallest deployable unit, wrapping the one or more containers that run your application code!
[ The Kubernetes Controller Hierarchy ]
[ kind: Deployment ] (Master Declarative Manager: Updates & Rollbacks)
  │
  └──► [ kind: ReplicaSet ] (Master Replica Enforcer: Guarantees N Pods)
         │
         └──► [ kind: Pod ] (Physical Execution Containers)
3. Zero-Downtime Rolling Updates (maxSurge / maxUnavailable)
When you update a Deployment’s container image from v1.0.0 to v2.0.0, the Deployment executes a highly governed Rolling Update utilizing two strict mathematical guardrails:
- maxSurge: The maximum number of extra Pods that can be created above your desired replica count during the update (e.g., 25% or 1).
- maxUnavailable: The maximum number of Pods that can be unavailable below your desired replica count during the update (e.g., 25% or 0).
- The Rolling Mechanics: The Deployment creates a brand-new ReplicaSet for v2.0.0. With maxSurge: 1 and maxUnavailable: 0, it spins up 1 new Pod in the new ReplicaSet. Once that new Pod passes its readiness check, the Deployment scales down the old v1.0.0 ReplicaSet by 1 Pod. It repeats this rolling dance until 100% of the Pods are running v2.0.0, with serving capacity never dropping below the desired replica count!
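The two guardrails can be worked through with plain shell arithmetic. The sketch below is a minimal illustration, assuming a hypothetical Deployment with 10 replicas and both values set to 25%; it relies on the documented rounding behavior (percentages for maxSurge round up, percentages for maxUnavailable round down). The variable names are illustrative and not part of any Kubernetes tooling.

```shell
#!/usr/bin/env bash
# Sketch: resolve percentage-based maxSurge / maxUnavailable into Pod counts.
# Assumption: replicas=10 and 25% for both values (hypothetical example numbers).
replicas=10
surge_pct=25      # maxSurge: 25%
unavail_pct=25    # maxUnavailable: 25%

# maxSurge percentages are rounded UP (integer ceiling of replicas * pct / 100).
max_surge=$(( (replicas * surge_pct + 99) / 100 ))
# maxUnavailable percentages are rounded DOWN (integer floor).
max_unavailable=$(( replicas * unavail_pct / 100 ))

# Bounds the Deployment controller respects while the rollout is in progress.
max_total=$(( replicas + max_surge ))
min_available=$(( replicas - max_unavailable ))

echo "maxSurge=${max_surge} maxUnavailable=${max_unavailable}"
echo "Pods during rollout: at most ${max_total} total, at least ${min_available} available"
```

With these inputs the rollout may run up to 13 Pods at once and may dip to 8 available Pods. Setting unavail_pct=0 keeps min_available equal to replicas, which is the zero-downtime configuration used throughout this lesson.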
[ Zero-Downtime Rolling Update Mechanics ]
(Old ReplicaSet: v1.0.0) [Pod 1: RUNNING] [Pod 2: RUNNING] [Pod 3: TERMINATING]
(New ReplicaSet: v2.0.0) [Pod 4: RUNNING] [Pod 5: PENDING]
4. Automated Rollbacks (kubectl rollout undo)
What happens when you deploy v2.0.0, but it contains a catastrophic runtime bug?
- ReplicaSet Revision History: When a Deployment finishes a rolling update, it does NOT delete the old v1.0.0 ReplicaSet! It simply scales its replica count to 0 and keeps the object in the cluster as revision history (up to spec.revisionHistoryLimit revisions, 10 by default)!
- Fast Rollback: If v2.0.0 fails, you simply type kubectl rollout undo deployment/my-app. The Deployment scales the broken v2.0.0 ReplicaSet down to 0 and scales the old stable v1.0.0 ReplicaSet back up to 3, following the same rolling update strategy. Because the old Pod template is still stored in that ReplicaSet, there is no YAML to rewrite!
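A rollback is usually a short sequence rather than a single command: inspect the history, undo, then watch the rollout converge. The sketch below only prints that sequence, in the same simulated style as the rest of this lesson. The function name rollback_plan, the deployment name my-app, and the 120s timeout are assumptions chosen for illustration; the kubectl subcommands and flags themselves are standard.

```shell
#!/usr/bin/env bash
# Sketch: print the kubectl commands for a controlled rollback.
# rollback_plan is a hypothetical helper, not part of kubectl.
rollback_plan() {
  local deploy="$1"
  local revision="${2:-}"   # optional: explicit revision number to return to

  # 1. Inspect which revisions are still retained.
  echo "kubectl rollout history deployment/${deploy}"

  # 2. Undo to the previous revision, or to a specific one if given.
  if [ -n "$revision" ]; then
    echo "kubectl rollout undo deployment/${deploy} --to-revision=${revision}"
  else
    echo "kubectl rollout undo deployment/${deploy}"
  fi

  # 3. Block until the rollback has converged (or the timeout expires).
  echo "kubectl rollout status deployment/${deploy} --timeout=120s"
}

plan="$(rollback_plan my-app 1)"
echo "$plan"
```

Against a real cluster the printed lines could be reviewed and then executed, for example with rollback_plan my-app 1 | sh.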
5. Deployments vs. DaemonSets
Platform Engineers must choose between two distinct workload controller paradigms depending on scheduling requirements:
- Deployment (Arbitrary Placement): Scales stateless Pods across arbitrary worker nodes based on free CPU/RAM. You might have 3 Pods running on Node A and 0 Pods running on Node B! Use Case: Web applications, APIs, microservices!
- DaemonSet (Strict Node Placement): A specialized controller that ensures one Pod runs on every eligible worker node in your cluster (nodes that match its node selector and whose taints its Pods tolerate)! Rather than letting placement follow free capacity, the DaemonSet controller pins each Pod to its target node through node affinity. If you add 50 brand-new worker nodes to your cluster, the DaemonSet automatically creates 1 Pod on each new node! Use Case: Log forwarding agents (FluentBit), monitoring daemons (Prometheus Node Exporter), and CNI network plugins (Calico)!
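To make the contrast concrete, here is a minimal sketch of a DaemonSet manifest for a log forwarder, written to a temporary file. The object name, namespace, memory limit, and image tag are illustrative placeholders rather than values from this lesson. Note that the spec carries no replicas field, and that the toleration shown is one common way to make control-plane nodes eligible as well.

```shell
#!/usr/bin/env bash
# Sketch: write a DaemonSet manifest for a node-level log forwarder.
# All names, the namespace, and the image tag are illustrative placeholders.
manifest="$(mktemp)"

cat << 'EOF' > "$manifest"
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-forwarder
  namespace: kube-system
  labels:
    app: log-forwarder
spec:
  selector:
    matchLabels:
      app: log-forwarder
  template:
    metadata:
      labels:
        app: log-forwarder
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:latest
        resources:
          limits:
            memory: "200Mi"
EOF

# A DaemonSet has no replica count: the Pod count follows the node count.
replica_lines="$(grep -c 'replicas:' "$manifest")"
echo "kind: $(awk '/^kind:/ {print $2}' "$manifest"), replicas fields: ${replica_lines}"
```

On a real cluster the file could be applied with kubectl apply -f "$manifest", after pinning the image to a specific tag.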
[ Deployment: Arbitrary Node Placement ] [ DaemonSet: Exactly 1 Pod Per Node ]
┌────────────────────────────────────────┐ ┌────────────────────────────────────────┐
│ Node 1: [Pod] [Pod] | Node 2: (Empty)│ │ Node 1: [Pod] | Node 2: [Pod] │
└────────────────────────────────────────┘ └────────────────────────────────────────┘
6. Container Probes (Liveness vs. Readiness vs. Startup)
How does Kubernetes know whether your container is healthy enough to receive incoming web traffic? Platform Engineers configure three distinct Container Probes:
- Liveness Probe: Answers: “Is the container process dead or stuck in a deadlock?” (HTTP GET /healthz). If the Liveness probe fails 3 consecutive times (the default failureThreshold), kubelet kills the container and restarts it!
- Readiness Probe: Answers: “Is the container ready to receive live user web traffic?” (HTTP GET /readyz). If the Readiness probe fails, kubelet does NOT kill the container! Instead, the Pod is marked not Ready and its IP address is removed from the endpoints of every Service that selects it, preventing incoming user traffic from reaching the unready container!
- Startup Probe: Answers: “Has the legacy monolithic application finished its massive 3-minute bootup sequence?” Protects slow-starting containers by disabling Liveness and Readiness probes until the Startup probe passes!
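A Startup probe's time budget is simply failureThreshold multiplied by periodSeconds. The sketch below generates a probe fragment sized for the 3-minute boot mentioned above; the threshold of 20, the period of 10 seconds, and the reuse of /healthz on port 8080 are assumptions picked for illustration, not required values.

```shell
#!/usr/bin/env bash
# Sketch: size a startupProbe so a slow-booting container is not killed early.
# Assumed example values: 20 attempts, 10 seconds apart.
failure_threshold=20
period_seconds=10

# The container gets this many seconds to pass the startup probe once.
startup_budget=$(( failure_threshold * period_seconds ))

# Container-level YAML fragment (belongs next to livenessProbe / readinessProbe).
fragment="$(cat << EOF
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: ${failure_threshold}
  periodSeconds: ${period_seconds}
EOF
)"

echo "$fragment"
echo "# startup budget: ${startup_budget}s"
```

A budget of 200 seconds comfortably covers a 180-second boot. Once the Startup probe succeeds, the Liveness and Readiness probes take over with their own, tighter settings.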
Architecture
Real-World Example
Imagine you are managing an airline’s booking system. The system runs on a Production Floor (Kubernetes Cluster) functioning through a strict layered architecture.
Originally, the team deployed the flight search microservice directly at Layer 3 (The Work Unit) and completely omitted Readiness and Liveness checks at Layer 4.
One Friday afternoon, the software engineering team ships a brand-new release of the flight search API. Because they lack a Layer 1: Rollout Manager, they manually dismiss the old workers and hire new ones.
When the new Worker Units spin up, they start instantly, but the internal system takes exactly 45 seconds to prepare its massive database connection pools. Because there is no check at Layer 4: The Health Inspector, the system assumes the workers are ready the exact millisecond they show up!
The system instantly floods the brand-new Worker Units with thousands of live user flight search requests at Layer 5 (Network Routing). Because the database pools aren’t ready, every single user request fails! Thousands of customers abandon their bookings!
Because you maintain elite standards, you take command of the workload re-architecture. You transition the flight search microservice to be governed by Layer 1 (Deployment), which manages Layer 2 (ReplicaSet), and enforce strict Pulse Checks and Traffic Readiness Checks at Layer 4 (kubelet Probe Engine).
You configure a Readiness check to ensure it passes exclusively after database pools are fully initialized. You configure a zero-downtime strategy.
Now, when the team ships a new version, the Layer 1 Rollout Manager commands Layer 2 to spin up a new Layer 3 Worker Unit. Layer 4 (The Health Inspector) continuously checks the Traffic Readiness. For the first 45 seconds, the check fails, so Layer 5 (Network Routing) refuses to send a single user request to the new Worker Unit! Once it passes, the system routes live traffic cleanly to the new Worker Unit, and safely retires an old Worker Unit. Your airline enterprise achieves absolute zero-downtime deployments with zero dropped user requests!
Hands-on Demonstration
Let’s look at how an engineer inspects a production Deployment manifest using cat, inspects active rollout statuses using kubectl rollout status, and executes an emergency rollback using kubectl rollout undo.
Input 1: Inspecting Production Deployment Manifests (deployment.yaml)
We use cat to inspect a pristine, highly governed Kubernetes Deployment manifest defining a Rolling Update strategy, label selectors, resource requests, Liveness probes, and Readiness probes.
Code 1
# Inspect the declarative production Kubernetes Deployment manifest.
# (We simulate inspecting a compliant Kubernetes Deployment configuration file)
cat << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-payment-api
  namespace: default
  labels:
    app: payment-api
    tier: backend
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
      - name: payment-microservice
        image: mycompany/payment-api:v2.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
EOF
Expected Output 1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-payment-api
  namespace: default
  labels:
    app: payment-api
    tier: backend
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
      - name: payment-microservice
        image: mycompany/payment-api:v2.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
Explanation 1
Let’s deconstruct the key workload elements of this Deployment configuration:
- strategy.rollingUpdate: maxSurge: 1 allows creating 1 extra Pod during updates, while maxUnavailable: 0 means a rollout never takes the Deployment below 3 available Pods!
- selector.matchLabels: The controller binding glue! Guarantees the Deployment manages Pods possessing app: payment-api.
- readinessProbe: The traffic shield! kubelet probes /readyz every 5 seconds; while the probe fails, the Pod’s IP is kept out of the Service endpoints!
Input 2: Inspecting Rollout Statuses and Executing Emergency Rollbacks
We simulate executing kubectl rollout status to monitor an active deployment rollout, and simulate executing kubectl rollout undo to perform an emergency rollback.
Code 2
# Monitor the live rollout execution status of a Deployment rolling update.
# (We simulate the clean plain-text output of kubectl rollout status)
kubectl rollout status deployment/production-payment-api 2>/dev/null || cat << 'EOF'
Waiting for deployment "production-payment-api" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "production-payment-api" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "production-payment-api" rollout to finish: 1 old replicas are pending termination...
deployment "production-payment-api" successfully rolled out
EOF
# Simulate a catastrophic v2.0.0 runtime failure and execute an emergency rollback to v1.0.0.
# (We simulate the clean plain-text output of kubectl rollout undo)
echo -e "--- CATASTROPHIC RUNTIME BUG DETECTED (HTTP 500) ---\n# ACTION: Executing emergency rollback to previous stable ReplicaSet revision...\nkubectl rollout undo deployment/production-payment-api\ndeployment.apps/production-payment-api rolled back"
Expected Output 2
Waiting for deployment "production-payment-api" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "production-payment-api" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "production-payment-api" rollout to finish: 1 old replicas are pending termination...
deployment "production-payment-api" successfully rolled out
--- CATASTROPHIC RUNTIME BUG DETECTED (HTTP 500) ---
# ACTION: Executing emergency rollback to previous stable ReplicaSet revision...
kubectl rollout undo deployment/production-payment-api
deployment.apps/production-payment-api rolled back
Explanation 2
kubectl rollout status reports the rolling update progress line by line and ends with successfully rolled out. The emergency rollback simulation demonstrates the recovery path: with a single command (kubectl rollout undo), Kubernetes scales down the broken v2.0.0 ReplicaSet and scales our stable v1.0.0 ReplicaSet back up!
Hands-on Lab
- Objective: Author a declarative Deployment manifest defining a Rolling Update strategy, Liveness probes, and Readiness probes, simulate executing kubectl rollout status, simulate executing kubectl rollout undo, and verify workload governance.
- Estimated Time: 20 minutes
- Difficulty: Intermediate
- Environment: Interactive Browser Terminal / Local Sandbox (with kubectl installed)
Step-by-step Instructions
- Open your terminal sandbox and create a brand-new directory named deployment-lab: mkdir ~/deployment-lab && cd ~/deployment-lab.
- Create a declarative YAML manifest defining a production Kubernetes Deployment by typing:
cat << 'EOF' > deployment-spec.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-web-deployment
  namespace: default
  labels:
    app: web-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-container
        image: nginx:1.26-alpine
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 2
          periodSeconds: 5
EOF
- Type cat deployment-spec.yaml to inspect your Kubernetes Deployment declaration! Notice strategy.type: RollingUpdate and readinessProbe.
- Simulate applying your Deployment declaration to the cluster using kubectl apply -f deployment-spec.yaml by typing:
# (We simulate the exact kubectl apply execution)
echo "deployment.apps/production-web-deployment created"
- Simulate verifying active Deployment execution states using kubectl get deployments by typing:
# (We simulate the exact kubectl get deployments execution)
echo -e "NAME\t\t\t\tREADY\tUP-TO-DATE\tAVAILABLE\tAGE\nproduction-web-deployment\t3/3\t3\t\t3\t\t45s"
- Simulate updating your Deployment container image to a non-existent broken image (nginx:broken-tag) by typing:
# (We simulate executing kubectl set image)
echo "kubectl set image deployment/production-web-deployment web-container=nginx:broken-tag"
echo "deployment.apps/production-web-deployment image updated"
- Simulate verifying the stuck rollout status caused by the broken image by typing:
# (We simulate the exact kubectl rollout status execution during a broken update)
echo "Waiting for deployment \"production-web-deployment\" rollout to finish: 1 out of 3 new replicas have been updated..."
echo "# WARNING: Rollout stuck! New Pod stuck in ImagePullBackOff / ErrImagePull. Old 3 Pods remain 100% online due to maxUnavailable: 0!"
- Simulate executing an emergency rollback to restore your previous stable ReplicaSet revision by typing:
# (We simulate the exact kubectl rollout undo execution)
echo "kubectl rollout undo deployment/production-web-deployment"
echo "deployment.apps/production-web-deployment rolled back"
echo "SUCCESS: Rollback complete. Stable Nginx 1.26 ReplicaSet fully restored!"
Verification
grep -E "type: *RollingUpdate" deployment-spec.yaml && echo "RollingUpdate Verified"
If your terminal prints the type: RollingUpdate line followed by RollingUpdate Verified, your manifest declares the Rolling Update strategy and you have practiced the foundational Deployment and rollback mechanics!
Troubleshooting
- Issue: kubectl rollout undo fails with error: no rollout history found for deployment "production-web-deployment".
- Solution: You have disabled revision history retention by setting spec.revisionHistoryLimit: 0 inside your Deployment manifest, OR you just created the Deployment and have never performed an update, so there is no previous revision to return to! Leave revisionHistoryLimit unset (the default is 10) or set it to a positive value!
Cleanup
# Safely remove the demonstration deployment lab directory
rm -rf ~/deployment-lab
Production Notes
In enterprise Kubernetes architecture, what happens when you want to deploy a brand-new version (v2.0.0) of your microservice, but you want to route exactly 5% of live user web traffic to the new version to test its stability before rolling it out to 100% of users? Standard Kubernetes Deployments cannot easily do this, because a Service spreads traffic across all Ready Pods, so the split can only follow integer Pod counts (1 new Pod out of 4 means roughly 25%, not 5%)! Platform Engineers solve this by deploying Argo Rollouts or Istio Service Mesh. Argo Rollouts introduces a custom controller (kind: Rollout) that integrates with your Ingress controller or service mesh to manage Canary Deployments with explicit traffic weighting steps (setWeight: 5)!
Common Mistakes
- Mismatched Label Selectors (matchLabels): Beginners frequently declare matchLabels: app=payment-api inside their Deployment spec.selector, but declare labels: app=web-frontend inside their template.metadata. If the template labels do not satisfy the selector, the API Server rejects the manifest with a validation error! Every selector label MUST appear with the same value in the template labels (the template may carry additional labels)!
- Setting maxUnavailable: 100%: Junior developers frequently set maxUnavailable: 100% inside their rollingUpdate strategy to make deployments finish faster. This permits Kubernetes to terminate 100% of your running Pods at once, before any new Pod is Ready, causing total platform downtime! Never set maxUnavailable to 100% in production!
Failure-Driven Learning
Imagine a junior engineer attempts to deploy an application into a Kubernetes cluster, but when they inspect kubectl get pods, the Pod is stuck in a frustrating, endless crash cycle known as CrashLoopBackOff.
Simulated Failure
# Simulating a Pod stuck in CrashLoopBackOff due to a misconfigured Liveness probe
# (We simulate the exact kubectl get pods / kubectl describe pod error during probe failures)
echo -e "NAME\t\t\tREADY\tSTATUS\t\tRESTARTS\tAGE\nproduction-api-pod\t0/1\tCrashLoopBackOff\t12 (2m ago)\t35m\n\n--- KUBECTL DESCRIBE POD EVENTS ---\nWarning Unhealthy 35m (x36 over 35m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 404\nNormal Killing 35m (x12 over 35m) kubelet Container production-api failed liveness probe, will be restarted\n# FATAL: Pod stuck in CrashLoopBackOff. kubelet continuously killing and restarting container."
Output
NAME READY STATUS RESTARTS AGE
production-api-pod 0/1 CrashLoopBackOff 12 (2m ago) 35m
--- KUBECTL DESCRIBE POD EVENTS ---
Warning Unhealthy 35m (x36 over 35m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 404
Normal Killing 35m (x12 over 35m) kubelet Container production-api failed liveness probe, will be restarted
# FATAL: Pod stuck in CrashLoopBackOff. kubelet continuously killing and restarting container.
Diagnosis & Recovery
Why did this fail? Look at this classic container probe failure: Liveness probe failed: HTTP probe failed with statuscode: 404 followed by Normal Killing! When a Pod enters CrashLoopBackOff, it means the container keeps starting and then crashing or being killed shortly after, and kubelet is waiting an increasing back-off delay between restarts! The junior engineer configured a Liveness probe pointing to HTTP GET /healthz. However, the application developer forgot to implement the /healthz HTTP route! Because /healthz returns HTTP 404 Not Found, kubelet treats the container as unhealthy and kills it after every 3 consecutive failed probes (the default failureThreshold), which matches the 36 failures and 12 kills in the events above! To recover, the engineer must either correct the Liveness probe path to a valid route (e.g., /) or have the developer implement /healthz. Once the probe succeeds, the restarts stop and the Pod leaves CrashLoopBackOff!
Engineering Decisions
Workload Controller: Deployment vs. StatefulSet vs. DaemonSet
When architecting an enterprise workload strategy, engineering leaders must choose the master controller object.
- Deployment: Manages stateless Pods across arbitrary worker nodes. Pods are completely anonymous and interchangeable (pod-abc, pod-xyz). Excellent for web servers, APIs, and stateless microservices.
- DaemonSet: Runs one Pod on every eligible worker node, pinned per node rather than placed by free capacity. Excellent for cluster-wide daemons (log aggregators, monitoring agents, CNI plugins).
- StatefulSet: Manages stateful Pods requiring unique, persistent identities (pod-0, pod-1) and stable persistent storage attachments (volumeClaimTemplates). Excellent for databases (PostgreSQL, MongoDB) and distributed consensus clusters (Kafka, ZooKeeper).
- The Platform Decision: Platform Engineers mandate Deployments as the controller for all stateless applications, deploy DaemonSets for all cluster logging/monitoring infrastructure, and reserve StatefulSets for stateful database clusters.
Best Practices
- Master kubectl rollout history: Before executing an emergency rollback, inspect your revision history using kubectl rollout history deployment/[name]. It lists the revisions still retained for the Deployment, allowing you to roll back to a specific revision using --to-revision=2!
- Define Conservative Initial Delay (initialDelaySeconds): When configuring Liveness and Readiness probes, configure a generous initialDelaySeconds (e.g., 15 to 30 seconds). This gives slow-booting application runtimes (e.g., JVM or Node.js) adequate time to initialize before kubelet begins firing health check pings! For containers whose boot time is long or unpredictable, pair this with a Startup probe as described in Core Concepts!
Troubleshooting Guide
Issue 1: “ImagePullBackOff / ErrImagePull” vs. “CrashLoopBackOff”
- Cause: You attempt to deploy workloads, but encounter container image retrieval failures or continuous process crashes.
- Diagnosis & Solution:
  - ImagePullBackOff / ErrImagePull: kubelet successfully received the Pod specification, but when it attempted to pull the container image from the registry, the image tag did not exist (nginx:non-existent), OR the private container registry rejected the request due to missing authentication credentials (imagePullSecrets)! To fix, verify the image tag string and ensure the Pod references an imagePullSecrets entry that contains valid registry credentials!
  - CrashLoopBackOff: The container image pulled successfully and the process started, but the application immediately crashed with a fatal exit code (exit 1, e.g., due to a missing environment variable or malformed configuration file), OR a Liveness probe keeps failing! To fix, execute kubectl logs [pod_name] --previous to view the output of the crashed container process, and check kubectl describe pod [pod_name] for probe failure events!
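The triage above can be sketched as a small lookup that maps a Pod status to the first command worth running. The function name next_step and the pod name web-1 are hypothetical; the function only prints a suggestion and does not contact a cluster.

```shell
#!/usr/bin/env bash
# Sketch: map a Pod STATUS column value to a first diagnostic command.
# next_step is a hypothetical helper; it prints a command, it does not run it.
next_step() {
  local status="$1"
  local pod="$2"
  case "$status" in
    ImagePullBackOff|ErrImagePull)
      # Pull failures show up in the Pod events (bad tag or missing credentials).
      echo "kubectl describe pod ${pod}"
      ;;
    CrashLoopBackOff)
      # The previous container instance holds the crash output.
      echo "kubectl logs ${pod} --previous"
      ;;
    *)
      # Anything else: start with the overall Pod state and node placement.
      echo "kubectl get pod ${pod} -o wide"
      ;;
  esac
}

next_step ImagePullBackOff web-1
next_step CrashLoopBackOff web-1
```

If the previous logs show no application crash, the restarts are likely probe-driven, and the events from kubectl describe pod are the next place to look.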
Summary
- Pod Immutability means running Pods are replaced with new Pods rather than edited in-place.
- Deployments manage ReplicaSets, which keep exactly N Pods matching a label selector (matchLabels) running.
- Rolling Updates achieve zero downtime by spinning up new Pods cleanly before terminating old Pods (maxSurge / maxUnavailable).
- Automated Rollbacks (kubectl rollout undo) restore previous stable ReplicaSet revisions retained by the Deployment.
- DaemonSets run one Pod on every eligible worker node in your cluster.
- Liveness Probes kill deadlocked containers; Readiness Probes unlink unready containers from active network routing tables.
Cheat Sheet
# Create or update a declarative Deployment manifest inside your cluster
kubectl apply -f deployment-spec.yaml
# Monitor live rollout execution status of an active Deployment rolling update
kubectl rollout status deployment/[deployment_name]
# Inspect active ReplicaSet revision history preserved in etcd for a Deployment
kubectl rollout history deployment/[deployment_name]
# Execute an emergency rollback to instantly restore the previous stable ReplicaSet revision
kubectl rollout undo deployment/[deployment_name]
# Execute an emergency rollback to a specific historical ReplicaSet revision
kubectl rollout undo deployment/[deployment_name] --to-revision=2
# Retrieve all active Deployments, ReplicaSets, and DaemonSets in your cluster
kubectl get deployments,replicasets,daemonsets -o wide
Knowledge Check
Multiple Choice Questions
- A developer configures a Deployment manifest for a critical web API with replicas: 4. They configure a RollingUpdate strategy with maxSurge: 1 and maxUnavailable: 0. They update the container image from v1.0 to v2.0. The v2.0 container image has a bug that causes the container to fail its Readiness probe (/readyz). What is the correct evaluation of how Kubernetes handles this rolling update?
  - A) Kubernetes will instantly terminate all 4 old Pods and spin up 4 new broken Pods, causing total platform downtime.
  - B) The Deployment spins up 1 new v2.0 Pod in a new ReplicaSet (maxSurge: 1). Because the new Pod fails its Readiness probe, it never enters Ready state. Because maxUnavailable: 0 is enforced, Kubernetes refuses to terminate a single old v1.0 Pod! The rollout stalls safely in place, and the 4 old Pods remain 100% online serving user traffic with zero downtime.
  - C) kubelet will automatically rewrite the application code to fix the bug.
  - D) The Deployment requires chmod 777.
Scenario Questions
You have deployed a monitoring agent into your Kubernetes cluster using a standard Deployment manifest with replicas: 10 across your 10 Worker Nodes. Tomorrow, your company executes a massive scaling event and provisions 20 brand-new Worker Nodes (bringing the total to 30 nodes). You notice that the monitoring agent completely fails to spin up on the 20 new nodes. Based on what you learned in this lesson, what exact workload controller object should you have used instead of a Deployment to guarantee automatic placement on new nodes?
Short Answer Questions
Explain the operational difference between a Liveness Probe and a Readiness Probe, specifically addressing what kubelet does to the container when each probe fails.
Interview Preparation
Beginner Questions
- What is a Kubernetes Deployment?
- What is the difference between a Deployment and a DaemonSet?
- What does kubectl rollout undo do?
Intermediate Questions
- Explain how maxSurge and maxUnavailable control the execution of a zero-downtime Rolling Update.
- What is the difference between a Liveness probe and a Readiness probe?
Advanced Questions
- Explain how a Kubernetes Deployment controller calculates the hash of a Pod template (pod-template-hash) to uniquely identify and manage child ReplicaSets in etcd, and describe the architectural implications of modifying a Deployment’s label selector (spec.selector) after creation.
Scenario-Based Discussions
- Discuss the architectural trade-offs of establishing a deployment strategy that relies exclusively on standard Kubernetes Rolling Updates (strategy.type: RollingUpdate) versus adopting an advanced Blue/Green deployment strategy utilizing separate, isolated Deployments and dynamic Service endpoint switching, specifically addressing cloud resource consumption (doubling active Pod counts), rollback latency, and handling breaking database schema migrations.
View Answers
Beginner
- Kubernetes Deployment: A master declarative manager for your applications that dictates the desired state (e.g., replica count, container image) and automatically manages the lifecycle, scaling, and rolling updates of Pods via ReplicaSets.
- Deployment vs. DaemonSet: A Deployment manages a specific number of Pod replicas scheduled arbitrarily across the cluster. A DaemonSet runs one Pod on every eligible worker node in the cluster and adds a Pod automatically whenever a new node joins.
- kubectl rollout undo: A command that rolls a Deployment back to a previous ReplicaSet revision retained in its history, scaling down the broken Pods and scaling up the healthy ones.
Intermediate
- maxSurge and maxUnavailable: maxSurge sets the maximum number of extra Pods that can be created above the desired replica count during an update. maxUnavailable sets the maximum number of Pods that can be unavailable below the desired count. Together, they bound how far a Rolling Update may stray from the desired replica count; with maxUnavailable: 0, serving capacity never drops during the update.
- Liveness vs. Readiness probe: A Liveness probe checks if a container is running properly (e.g., not deadlocked); if it fails, kubelet restarts the container. A Readiness probe checks if a container is ready to accept traffic; if it fails, the Pod’s IP is removed from Service endpoints (no traffic routed), but the container is not killed.
Advanced
- Pod template hash and Label selectors: The Deployment controller calculates a hash (pod-template-hash) based on the Pod template (spec.template) and adds it as a label to the ReplicaSet and its Pods to uniquely identify and map them. In apps/v1, a Deployment’s label selector (spec.selector) is immutable after creation, because changing it would break the mapping to existing ReplicaSets and Pods, creating orphaned resources or overlap with other Deployments.
Scenario-Based Discussions
- Rolling Updates vs. Blue/Green Deployments: Rolling Updates consume minimal extra cloud resources (governed by maxSurge), but rollbacks can take time (re-spinning older image Pods) and rolling deployments make breaking database schema changes difficult because two versions of the app run concurrently. Blue/Green deployments provision an entirely isolated duplicate environment (doubling cloud resource consumption temporarily). However, Blue/Green enables near-instant rollbacks by switching the Service selector back to the old environment, and provides an isolated boundary in which to execute and test breaking database schema migrations before live user traffic cuts over.