Kubernetes Deployment Order
Namespace
A Namespace is a virtual cluster inside a physical Kubernetes
cluster. It provides logical isolation — you can have the same resource names in different
namespaces without conflict. Think of it like folders on a computer. By default K8s has: default, kube-system, kube-public, kube-node-lease.
"Namespace provides a mechanism for isolating groups of resources within a single cluster. Resources like Deployments, Services, and Pods are namespace-scoped, while Nodes, PersistentVolumes, and ClusterRoles are cluster-scoped. This lets teams share a cluster without stepping on each other — dev, staging, prod can all live in one cluster but in separate namespaces with their own resource quotas and RBAC policies."
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    env: production
    team: backend
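The quoted answer above mentions per-namespace resource quotas. A minimal ResourceQuota sketch — the name and the limit values here are illustrative, not from these notes:

```yaml
# Caps aggregate resource usage inside the my-app namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota        # illustrative name
  namespace: my-app
spec:
  hard:
    pods: "20"            # max pod count in the namespace
    requests.cpu: "4"     # sum of all CPU requests
    requests.memory: 8Gi  # sum of all memory requests
```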
kubectl commands
# Create namespace
kubectl create namespace my-app
kubectl create ns my-app

# List all namespaces
kubectl get ns

# Run all commands in a namespace
kubectl get pods -n my-app
kubectl get all -n my-app

# Set default namespace for session
kubectl config set-context --current --namespace=my-app

ConfigMap
A ConfigMap stores non-sensitive configuration data as key-value pairs. It decouples environment-specific configuration from container images, so the same image works in dev/staging/prod by just swapping the ConfigMap. Pods consume ConfigMaps as env vars, command-line args, or mounted files.
"ConfigMap is a K8s API object used to store non-confidential data in key-value pairs. The main benefit is separation of concerns — your app image stays the same but the config changes per environment. Pods can consume ConfigMap values as environment variables via envFrom or env.valueFrom, or as files mounted via a volume. One important thing — ConfigMap updates don't automatically restart pods. If you mount it as a volume, K8s will eventually update the file, but if you use it as an env var, you need to manually restart the pod."
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: my-app
data:
  APP_ENV: "production"
  LOG_LEVEL: "info"
  DB_HOST: "postgres-service"
  config.yaml: |   # file-style key
    server:
      port: 8080
      timeout: 30s
---
# Pod consuming ConfigMap in all 3 ways
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
  namespace: my-app
spec:
  containers:
  - name: app
    image: myapp:1.0
    # Pattern 1: Load ALL keys as env vars
    envFrom:
    - configMapRef:
        name: app-config
    # Pattern 2: Load specific key as env var
    env:
    - name: DATABASE_HOST
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: DB_HOST
    # Pattern 3: Mount as file in container
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: app-config
kubectl commands
# From literal values
kubectl create cm app-config --from-literal=APP_ENV=prod --from-literal=LOG_LEVEL=info

# From a file
kubectl create cm app-config --from-file=config.yaml

# From env file (KEY=VALUE format)
kubectl create cm app-config --from-env-file=.env

# Generate YAML without creating (exam trick!)
kubectl create cm app-config --from-literal=K=V --dry-run=client -o yaml

# View configmap data
kubectl describe cm app-config -n my-app

Secret
A Secret stores sensitive data like passwords,
tokens, SSH keys. Values are base64 encoded (NOT encrypted by default — just encoded). For
real security, use encryption at rest (EncryptionConfiguration) + RBAC. Secret types: Opaque (generic), kubernetes.io/tls, kubernetes.io/dockerconfigjson, kubernetes.io/service-account-token.
"Secrets are similar to ConfigMaps but designed for sensitive data. Values are base64 encoded — which is encoding not encryption, so anyone who can access the Secret object can decode it. Best practice is to enable etcd encryption at rest and use strict RBAC. Secrets are mounted as tmpfs (in-memory) volumes so they never hit disk in the container. One key difference from ConfigMap — when you create a secret imperatively, kubectl auto base64-encodes the values. But in YAML, you need to base64-encode yourself unless you use the stringData field."
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
  namespace: my-app
type: Opaque
# Option A: base64 encoded values
data:
  DB_PASSWORD: cGFzc3dvcmQxMjM=   # echo -n "password123" | base64
  API_KEY: c2VjcmV0a2V5
# Option B: plain text (K8s encodes automatically)
stringData:
  DB_PASSWORD: "password123"
  API_KEY: "secretkey"
---
# Consuming in Pod
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app
    image: myapp:1.0
    envFrom:
    - secretRef:
        name: app-secret
    env:
    - name: DB_PASS
      valueFrom:
        secretKeyRef:
          name: app-secret
          key: DB_PASSWORD
    volumeMounts:
    - name: secret-vol
      mountPath: /etc/secrets
      readOnly: true
  volumes:
  - name: secret-vol
    secret:
      secretName: app-secret
kubectl commands
kubectl create secret generic app-secret --from-literal=DB_PASSWORD=pass123
kubectl create secret generic app-secret --from-file=ssh-key=id_rsa

# TLS secret
kubectl create secret tls my-tls --cert=cert.pem --key=key.pem

# Docker registry secret
kubectl create secret docker-registry regcred \
  --docker-server=gcr.io --docker-username=user --docker-password=pass

# Decode a secret value
kubectl get secret app-secret -o jsonpath='{.data.DB_PASSWORD}' | base64 -d

PersistentVolume & PersistentVolumeClaim
PV = actual storage provisioned by admin (NFS, EBS, HostPath). It's
cluster-scoped.
PVC = user's request for storage. Pod uses PVC, not PV directly. K8s matches PVC to PV via
accessModes + capacity.
Access Modes: ReadWriteOnce (RWO — 1 node), ReadOnlyMany (ROX — many nodes read), ReadWriteMany (RWX
— many nodes write).
Reclaim Policy: Retain (keep data), Delete (delete on PVC delete), Recycle (deprecated).
"PV and PVC implement a two-tier abstraction for storage in K8s. Admin creates PVs that represent actual storage infrastructure. Developers create PVCs to claim that storage without needing to know the underlying infrastructure. K8s binds a PVC to a PV when capacity and accessModes match. The Pod then references the PVC. This separation means infrastructure and application code are decoupled. StorageClass adds dynamic provisioning — the PV gets created automatically when PVC is submitted, no manual PV creation needed."
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce            # RWO
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual   # must match PVC
  hostPath:                  # for local/dev (use NFS/EBS in prod)
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  namespace: my-app
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi           # request <= PV capacity
  storageClassName: manual   # must match PV
---
apiVersion: v1
kind: Pod
metadata:
  name: app-with-storage
  namespace: my-app
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: storage
      mountPath: /data
  volumes:
  - name: storage
    persistentVolumeClaim:
      claimName: my-pvc      # reference PVC, not PV
kubectl commands
# No direct imperative command for PV/PVC creation — use YAML
# But useful commands:
kubectl get pv                          # list all PVs
kubectl get pvc -n my-app               # list PVCs in ns
kubectl describe pvc my-pvc -n my-app   # check binding

# Check PVC status — should be "Bound"
kubectl get pvc my-pvc -n my-app -o wide

Pod
A Pod is the smallest deployable unit in K8s. It
wraps one or more containers that share the same network namespace and storage. Containers in a Pod
communicate via localhost. Pods are ephemeral — when they die,
they're gone. That's why you use Deployments/StatefulSets to manage them. Key concepts: init
containers (run before app), sidecar containers (run alongside app),
resource requests/limits.
"A Pod is the atomic unit of scheduling in Kubernetes. It hosts one or more tightly-coupled containers that share an IP address, hostname, and storage volumes. In practice, most pods have one container — multi-container pods are used for sidecar patterns like log shipping or service mesh proxies. Pods are ephemeral by design; you never manage them directly in production. Instead, controllers like Deployment or StatefulSet manage pods and ensure the desired count is always running. Resource requests are used for scheduling decisions, while limits enforce runtime constraints."
apiVersion: v1
kind: Pod
metadata:
  name: full-app-pod
  namespace: my-app
  labels:
    app: myapp
    version: "1.0"
spec:
  # Init container runs first, completes, then app starts
  initContainers:
  - name: init-db-check
    image: busybox
    command: ['sh', '-c', 'until nc -z postgres-service 5432; do sleep 2; done']
  containers:
  - name: app
    image: myapp:1.0
    ports:
    - containerPort: 8080
    # Resources
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "256Mi"
        cpu: "500m"
    # Env from ConfigMap + Secret
    envFrom:
    - configMapRef:
        name: app-config
    - secretRef:
        name: app-secret
    # Liveness + Readiness probes
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: storage
      mountPath: /data
    - name: config-volume
      mountPath: /etc/config
  # Sidecar container
  - name: log-shipper
    image: fluentd:latest
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log
  volumes:
  - name: storage
    persistentVolumeClaim:
      claimName: my-pvc
  - name: config-volume
    configMap:
      name: app-config
  - name: shared-logs
    emptyDir: {}
  restartPolicy: Always   # Always | OnFailure | Never
kubectl commands
kubectl run nginx-pod --image=nginx --port=80
kubectl run nginx-pod --image=nginx --dry-run=client -o yaml > pod.yaml

# With env vars
kubectl run app --image=myapp --env="ENV=prod" --env="PORT=8080"

# With resource limits
# (note: --requests/--limits were deprecated and later removed from kubectl run;
#  on current kubectl, set resources in YAML instead)
kubectl run app --image=myapp --requests='cpu=100m,memory=128Mi' \
  --limits='cpu=200m,memory=256Mi'

# Execute command in pod
kubectl exec -it nginx-pod -- /bin/bash
kubectl exec nginx-pod -- env | grep APP

# Logs
kubectl logs nginx-pod -f               # follow
kubectl logs nginx-pod -c log-shipper   # specific container
kubectl logs nginx-pod --previous       # crashed pod logs

ReplicaSet
A ReplicaSet ensures a specified number of Pod
replicas are running at all times. If a pod dies, RS creates a new one. Uses a label
selector to track pods.
⚠️ In practice — don't use RS directly! Use
Deployment instead which manages RS and adds rolling updates + rollback capabilities. RS is
the underlying mechanism Deployment uses.
"ReplicaSet is a controller that maintains a stable set of replica pods running at any given time. It uses label selectors to identify which pods it manages. If you manually delete a pod, the ReplicaSet notices the actual count doesn't match desired count and creates a new one. However, ReplicaSet alone has no rollout strategy — you can't do rolling updates with it. That's why in production we always use Deployment, which owns a ReplicaSet and adds update strategies, rollback, and pause/resume capabilities on top."
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: app-rs
  namespace: my-app
spec:
  replicas: 3
  selector:          # RS uses this to find/track pods
    matchLabels:
      app: myapp     # MUST match pod template labels
  template:
    metadata:
      labels:
        app: myapp   # MUST match selector
    spec:
      containers:
      - name: app
        image: myapp:1.0
        ports:
        - containerPort: 8080
kubectl commands
# No direct imperative for RS — generate YAML:
kubectl get rs -n my-app
kubectl describe rs app-rs -n my-app

# Scale replicaset
kubectl scale rs app-rs --replicas=5 -n my-app

Deployment
A Deployment manages ReplicaSets and adds declarative
updates. It's the most common workload for stateless apps. Key features:
rolling updates (zero downtime), rollback, scaling,
pause/resume.
Strategy types:
• RollingUpdate: gradual replacement (default) — configurable via maxSurge / maxUnavailable
• Recreate: kill all pods then create new (causes downtime)
"Deployment is the standard way to run stateless applications in Kubernetes. It manages one or more ReplicaSets — when you update the Deployment, it creates a new RS and gradually shifts traffic from old to new RS, that's your rolling update. You can control the speed with maxSurge (extra pods during update) and maxUnavailable (pods that can go down). Every update creates a new RS, and Kubernetes keeps old RSes for rollback. You can roll back with kubectl rollout undo, which just swaps which RS is active. It's an immutable history of your rollouts."
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
  namespace: my-app
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # max extra pods during update
      maxUnavailable: 0   # no downtime (0 = zero-downtime)
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "256Mi"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
kubectl commands
kubectl create deployment app-deploy --image=myapp:1.0 --replicas=3
kubectl create deployment app-deploy --image=myapp:1.0 --dry-run=client -o yaml

# Scale
kubectl scale deployment app-deploy --replicas=5 -n my-app

# Update image (triggers rolling update)
kubectl set image deployment/app-deploy app=myapp:2.0 -n my-app

# Check rollout status
kubectl rollout status deployment/app-deploy -n my-app

# Rollback to previous version
kubectl rollout undo deployment/app-deploy -n my-app
kubectl rollout undo deployment/app-deploy --to-revision=2

# View history
kubectl rollout history deployment/app-deploy

# Pause/Resume rolling update
kubectl rollout pause deployment/app-deploy
kubectl rollout resume deployment/app-deploy

StatefulSet
A StatefulSet manages stateful applications that need stable
identity and persistent storage. Unlike Deployment, each pod gets:
• Sticky identity: pod-0, pod-1,
pod-2 (not random names)
• Stable DNS: pod-0.service.ns.svc.cluster.local
• Per-pod PVC: via volumeClaimTemplates
Use for: databases (MySQL, PostgreSQL, MongoDB), Kafka,
Zookeeper, Elasticsearch. Requires a Headless Service.
"StatefulSet is for stateful workloads where pods need a stable network identity and persistent storage that survives pod restarts. The key difference from Deployment is that pods are created and deleted in order — pod-0 must be Running before pod-1 starts. Each pod has a predictable DNS name through a headless service. volumeClaimTemplates creates a separate PVC for each pod, so pod-0 always gets pvc-0 even after rescheduling. This is essential for databases where each replica node needs its own dedicated storage and stable hostname for replication configuration."
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless   # Headless service for DNS
  namespace: my-app
spec:
  clusterIP: None        # Makes it headless!
  selector:
    app: mysql
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: my-app
spec:
  serviceName: mysql-headless   # MUST reference headless service
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  # Each pod gets its own PVC!
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
kubectl commands
# No direct imperative — use YAML. Useful debugging:
kubectl get statefulset -n my-app
kubectl get pods -l app=mysql -n my-app   # see pod-0, pod-1, pod-2
kubectl scale statefulset mysql --replicas=5 -n my-app

# Access specific pod by DNS
# mysql-0.mysql-headless.my-app.svc.cluster.local

DaemonSet
A DaemonSet ensures one pod runs on every node (or a
subset via node selectors). When nodes are added to cluster, pods are added automatically. When nodes are
removed, pods are garbage collected.
Use cases: log collectors (Fluentd, Filebeat), monitoring agents
(Prometheus node-exporter, Datadog), network plugins (CNI like Calico, Weave),
storage daemons.
"DaemonSet guarantees that a copy of a pod runs on every node — or a subset if you use nodeSelector or affinity rules. It's cluster infrastructure stuff — log shippers, monitoring agents, network proxies, anything that needs to run at the node level. Unlike Deployment where you specify replica count, DaemonSet is implicitly replicas=number-of-nodes. When you add a node, K8s automatically schedules the DaemonSet pod on it. It also tolerates the control-plane taint by default in newer K8s versions so it can run on master nodes too for system-level agents."
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      tolerations:   # Run on master/control-plane too
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluentd:latest
        resources:
          limits:
            memory: "200Mi"
            cpu: "100m"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
kubectl commands
kubectl get daemonset -n kube-system
kubectl describe ds log-collector -n kube-system

# Trick: generate YAML from deployment then change kind
kubectl create deployment log-ds --image=fluentd --dry-run=client -o yaml \
  | sed 's/kind: Deployment/kind: DaemonSet/' \
  | sed '/replicas/d' | sed '/strategy/d' > ds.yaml

Job
A Job runs pods to completion (not indefinitely like
Deployment). It guarantees that a specified number of pods successfully terminate. Key
params:
• completions: how many successful pod completions needed
• parallelism: how many pods run simultaneously
• backoffLimit: retry limit before marking job failed
• activeDeadlineSeconds: max job duration
Use restartPolicy: OnFailure or Never (not Always).
"Job creates one or more pods and tracks successful completions. When completions are reached, the job is done. For batch processing, you set parallelism to run multiple pods simultaneously and completions to total tasks. The key difference from a Deployment is that a Job terminates — pods aren't restarted after success. You use restartPolicy: OnFailure so failed pods retry, or Never to get a new pod each attempt. BackoffLimit controls how many times a failed pod retries before the whole job fails."
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
  namespace: my-app
spec:
  completions: 10              # total successful completions needed
  parallelism: 3               # 3 pods at a time
  backoffLimit: 4              # retry 4 times before fail
  activeDeadlineSeconds: 300   # fail if not done in 5min
  template:
    spec:
      restartPolicy: OnFailure   # OnFailure or Never (NOT Always!)
      containers:
      - name: processor
        image: python:3.9
        command: ["python", "-c", "print('Processing batch job')"]
        resources:
          requests:
            cpu: "100m"
            memory: "64Mi"
kubectl commands
kubectl create job my-job --image=busybox -- echo "hello"
kubectl create job my-job --image=busybox --dry-run=client -o yaml -- echo "hello"
kubectl get jobs -n my-app
kubectl describe job data-processor
kubectl logs job/data-processor

CronJob
A CronJob creates Jobs on a schedule (like Linux
cron). Uses standard cron syntax: * * * * * (minute hour day month weekday). Key
params:
• successfulJobsHistoryLimit: keep N successful jobs (default 3)
• failedJobsHistoryLimit: keep N failed jobs (default 1)
• concurrencyPolicy: Allow | Forbid | Replace
• startingDeadlineSeconds: deadline to start if missed window
"CronJob is a layer on top of Job that runs it on a schedule using cron syntax. CronJob creates a Job, which creates Pods. So it's CronJob → Job → Pod. The concurrencyPolicy is important — Allow means multiple scheduled jobs can run simultaneously, Forbid skips new job if previous is still running, Replace kills the old and starts new. You need to manage job history to avoid accumulating too many completed jobs — use successfulJobsHistoryLimit and failedJobsHistoryLimit to control that."
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
  namespace: my-app
spec:
  schedule: "0 2 * * *"         # Every day at 2 AM
  # schedule: "*/5 * * * *"     # Every 5 minutes
  concurrencyPolicy: Forbid     # Don't run if previous still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  startingDeadlineSeconds: 60   # Start within 60s of schedule or skip
  jobTemplate:                  # Job spec goes here
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:latest
            command: ["/bin/sh", "-c", "pg_dump -h postgres > /backup/dump.sql"]
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc
kubectl commands
kubectl create cronjob daily-backup --image=busybox --schedule="0 2 * * *" -- echo backup
kubectl create cronjob my-cj --image=busybox --schedule="*/5 * * * *" --dry-run=client -o yaml
kubectl get cronjob -n my-app
kubectl get jobs -n my-app   # see jobs created by cronjob

# Manually trigger a cronjob
kubectl create job manual-run --from=cronjob/daily-backup

Service Types
A Service exposes pods via a stable network endpoint. Pods come and go but Service IP stays. Uses label selectors to find target pods and kube-proxy to route traffic.
| Type | Access | Use case |
|---|---|---|
| ClusterIP | Internal only | Default. Pod-to-pod communication |
| NodePort | Node IP:Port | Dev/test external access (30000-32767) |
| LoadBalancer | Cloud LB IP | Production cloud external access |
| ExternalName | DNS alias | Alias external service (no selector) |
| Headless | Direct pod IPs | StatefulSets, service discovery |
"Service provides stable DNS and IP for a dynamic set of pods. ClusterIP is default — only reachable within the cluster. NodePort extends it and opens a port on every node's IP. LoadBalancer extends NodePort and provisions a cloud load balancer in front. The key insight is they're additive — LoadBalancer creates NodePort creates ClusterIP. For StatefulSets you use a headless service — clusterIP: None — which returns the actual pod IPs from DNS instead of a virtual IP, so clients can connect directly to individual pods."
# 1. ClusterIP (default - internal only)
apiVersion: v1
kind: Service
metadata:
  name: app-svc
  namespace: my-app
spec:
  type: ClusterIP
  selector:
    app: myapp
  ports:
  - port: 80         # service port
    targetPort: 8080 # container port
    protocol: TCP
---
# 2. NodePort (external via node IP)
apiVersion: v1
kind: Service
metadata:
  name: app-nodeport
spec:
  type: NodePort
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080  # optional, auto-assigned if omitted (30000-32767)
---
# 3. LoadBalancer (cloud provider LB)
apiVersion: v1
kind: Service
metadata:
  name: app-lb
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080
---
# 4. Headless (StatefulSet / direct pod access)
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None    # This makes it headless!
  selector:
    app: mysql
  ports:
  - port: 3306
---
# 5. ExternalName (alias to external DNS)
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: my-database.rds.amazonaws.com   # no selector!
kubectl commands
# Expose a deployment as ClusterIP
kubectl expose deployment app-deploy --port=80 --target-port=8080

# Expose as NodePort
kubectl expose deployment app-deploy --type=NodePort --port=80 --target-port=8080

# Create service directly
kubectl create service clusterip my-svc --tcp=80:8080
kubectl create service nodeport my-svc --tcp=80:8080 --node-port=30080
kubectl create service loadbalancer my-svc --tcp=80:8080

# Get service endpoints
kubectl get endpoints app-svc -n my-app

# Test connectivity from inside cluster
kubectl run test --image=busybox --rm -it -- wget -qO- http://app-svc

HPA — Horizontal Pod Autoscaler
HPA automatically scales pod count based on observed
metrics. Runs as a control loop (every 15s by default).
CPU-based HPA: Built-in. Uses Metrics Server. Scales when average CPU across pods crosses
target %.
KEDA (Kubernetes Event Driven Autoscaler): External operator. Scales based on event
sources — queue length (RabbitMQ, SQS, Kafka), cron, Prometheus metrics, etc. Can scale
to 0 (saves cost). CPU HPA cannot scale to 0.
"HPA watches resource metrics and adjusts replica count. The standard HPA uses CPU and memory from the metrics server — if average CPU exceeds your target, it scales up. KEDA extends this with 50+ scalers for event-driven sources. The critical advantage of KEDA over CPU-HPA is scale-to-zero — if there are no messages in your queue, KEDA scales to 0 replicas. CPU-HPA can only scale to minReplicas (minimum 1). For microservices processing async jobs from a queue, KEDA is the right choice. For typical HTTP services, CPU-based HPA is simpler and sufficient."
# CPU-based HPA (requires metrics-server installed)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
  namespace: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale up if CPU > 70%
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
---
# KEDA ScaledObject (scale based on queue length)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: app-keda-scaler
  namespace: my-app
spec:
  scaleTargetRef:
    name: app-deployment
  minReplicaCount: 0   # KEDA can scale to ZERO!
  maxReplicaCount: 20
  triggers:
  - type: rabbitmq
    metadata:
      host: amqp://rabbitmq:5672
      queueName: tasks
      queueLength: "5"   # 1 pod per 5 messages in queue
kubectl commands
# Create HPA imperatively (CPU)
kubectl autoscale deployment app-deployment --cpu-percent=70 --min=2 --max=10
kubectl get hpa -n my-app
kubectl describe hpa app-hpa -n my-app   # see current metrics

# Generate YAML
kubectl autoscale deployment app-deployment --cpu-percent=70 --min=2 --max=10 \
  --dry-run=client -o yaml

| Feature | HPA (CPU) | KEDA |
|---|---|---|
| Scale to zero | ❌ min 1 | ✅ yes |
| Event sources | CPU, Memory | 50+ (queue, cron, DB, Prometheus...) |
| Installation | Built-in | Separate operator |
| Best for | HTTP services | Async/event-driven workloads |
| Lag before scale | ~15-30s | Near real-time |
VPA — Vertical Pod Autoscaler
VPA automatically adjusts CPU and memory
requests/limits for containers (vertical scaling = more resources, not more pods). It has 3
modes:
• Off: Only recommend, no auto-apply
• Initial: Apply only at pod creation
• Auto: Apply and restart pods when needed
⚠️ VPA + HPA conflict! Don't use both on same target for CPU/Memory. Use VPA for vertical,
HPA for horizontal, or use HPA with custom metrics + VPA on non-conflicting resources.
"VPA solves the problem of right-sizing — developers often over-provision CPU/memory to be safe, wasting resources. VPA observes actual usage over time and recommends or automatically adjusts resource requests and limits. In Auto mode, it will evict and restart pods with updated resources. The main drawback is that restarting pods causes brief disruption. So for production, many teams use Recommendation mode to get suggestions and apply them during maintenance windows. VPA and HPA shouldn't both be targeting the same CPU metric — they'll fight each other. Safe combo is HPA on CPU + VPA in recommendation-only mode."
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
  namespace: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  updatePolicy:
    updateMode: "Auto"       # "Off" | "Initial" | "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"     # apply to all containers
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
Taints & Tolerations
Taints are applied to Nodes — they repel pods that
can't tolerate them.
Tolerations are applied to Pods — they allow pods to schedule on tainted
nodes.
Taint effects:
• NoSchedule: New pods won't schedule. Existing pods stay.
• PreferNoSchedule: Try not to schedule (soft).
• NoExecute: Evict existing pods too (hard). Pod needs toleration or gets
kicked.
⚠️ Taints/Tolerations only say "this pod CAN go to tainted node" — it doesn't FORCE it there. Use
NodeAffinity for forcing.
"Taints and tolerations are a push mechanism — taints push pods away from nodes. A taint on a node means 'no pod is allowed here unless they tolerate this taint'. Tolerations on pods say 'I can handle that taint, don't reject me.' NoSchedule prevents future pods, NoExecute additionally evicts existing pods that don't tolerate it. A classic use case is GPU nodes — you taint them with gpu=true:NoSchedule so general workloads don't land on expensive GPU nodes, and only your ML pods that have the matching toleration get scheduled there. But important — tolerations alone don't guarantee the pod goes to that node. You still need node affinity or nodeSelector for that positive selection."
# Taints are applied with kubectl (node level)
# kubectl taint nodes node1 gpu=true:NoSchedule
# kubectl taint nodes node1 gpu=true:NoSchedule-   (remove taint with -)

# Pod with Toleration
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"        # Equal | Exists
    value: "true"
    effect: "NoSchedule"     # NoSchedule | NoExecute | PreferNoSchedule
  # - key: "gpu"
  #   operator: "Exists"     # tolerates any value with key "gpu"
  containers:
  - name: ml-app
    image: tensorflow/tensorflow:latest-gpu
kubectl commands
# Add taint to node
kubectl taint nodes node1 gpu=true:NoSchedule
kubectl taint nodes node1 env=prod:NoExecute
kubectl taint nodes node1 dedicated=backend:PreferNoSchedule

# Remove taint (append -)
kubectl taint nodes node1 gpu=true:NoSchedule-

# View taints on nodes
kubectl describe node node1 | grep -i taint
kubectl get nodes -o json | jq '.items[].spec.taints'

Node Affinity & Pod Affinity
Node Affinity: PULL pods TOWARD specific nodes (based on node
labels). Advanced version of nodeSelector.
• requiredDuringSchedulingIgnoredDuringExecution: Hard rule (MUST match)
• preferredDuringSchedulingIgnoredDuringExecution: Soft rule (try to
match)
Pod Affinity / Anti-Affinity: Schedule pods RELATIVE to other pods.
• Affinity: "Schedule near pods with label X" (co-location)
• Anti-Affinity: "Don't schedule near pods with label X" (spread out, HA)
"Node affinity is the positive complement to taints — taints repel, affinity attracts. You label your nodes (disk=ssd, region=us-east) and use requiredDuringScheduling for hard constraints or preferredDuringScheduling for best-effort. Pod affinity is more nuanced — it says 'schedule me on the same node or same zone as pods matching this selector'. Anti-affinity is the opposite — 'spread my replicas across zones'. For high availability, you always use pod anti-affinity with topologyKey=topology.kubernetes.io/zone to ensure replicas land in different availability zones, so one zone failure doesn't take down your whole app."
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ha-deployment
  namespace: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      affinity:
        # NODE AFFINITY — schedule only on SSD nodes
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disk-type
                operator: In   # In | NotIn | Exists | DoesNotExist | Gt | Lt
                values:
                - ssd
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: region
                operator: In
                values: [us-east-1]
        # POD ANTI-AFFINITY — spread replicas across zones (HA!)
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: myapp   # don't schedule with other myapp pods
            topologyKey: topology.kubernetes.io/zone   # one per zone
        # POD AFFINITY — schedule near cache pods (co-location)
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: redis-cache
              topologyKey: kubernetes.io/hostname   # same node as redis
      containers:
      - name: app
        image: myapp:1.0
| Operator | Meaning | Example |
|---|---|---|
| In | Label value is in list | disk-type In [ssd, nvme] |
| NotIn | Label value NOT in list | env NotIn [dev, staging] |
| Exists | Key exists (any value) | gpu Exists |
| DoesNotExist | Key doesn't exist | spot DoesNotExist |
| Gt | Greater than | cores Gt 4 |
| Lt | Less than | memory Lt 8 |
| Concept | Applied To | Direction | Type |
|---|---|---|---|
| Taint | Node | REPEL pods | Push (node says "go away") |
| Toleration | Pod | Accept taint | "I can handle that taint" |
| Node Affinity | Pod | ATTRACT to node | Pull (pod says "go here") |
| Pod Affinity | Pod | Near other pods | Co-locate |
| Pod Anti-Affinity | Pod | Away from pods | Spread for HA |
| nodeSelector | Pod | Exact label match | Simpler node affinity |
Must-know shortcuts for the exam
# 1. ALWAYS use --dry-run to generate YAML fast
kubectl run pod --image=nginx --dry-run=client -o yaml > pod.yaml
kubectl create deploy app --image=nginx --dry-run=client -o yaml > deploy.yaml

# 2. Edit running resource
kubectl edit deployment app-deployment

# 3. Apply changes imperatively
kubectl apply -f pod.yaml
kubectl replace --force -f pod.yaml   # delete + recreate

# 4. jsonpath — extract specific field
kubectl get pod my-pod -o jsonpath='{.status.podIP}'
kubectl get nodes -o jsonpath='{.items[*].metadata.name}'

# 5. Watch resources in real-time
kubectl get pods -w

# 6. Check what went wrong
kubectl describe pod my-pod | tail -20   # events section
kubectl events --for pod/my-pod

# 7. Copy file to/from pod
kubectl cp my-pod:/etc/config/app.conf ./app.conf
kubectl cp ./app.conf my-pod:/etc/config/app.conf

# 8. Port forward for quick testing
kubectl port-forward pod/my-pod 8080:80
kubectl port-forward svc/my-svc 8080:80