This is Part 1 of our DevOps Interview QA series, covering the 10 most frequently asked DevOps interview questions. DevOps bridges software development and IT operations to deliver software faster, more reliably, and with tighter feedback loops — emphasizing automation, collaboration, and continuous improvement.
Q1: What Is CI/CD and How Do You Design a Pipeline?
Answer:
CI/CD (Continuous Integration / Continuous Delivery or Deployment) is the backbone of DevOps automation. CI merges code frequently into a shared repository and verifies each change with automated builds and tests. CD then automates the release path: continuous delivery keeps every validated build deployable and promotes it to staging, while continuous deployment pushes it to production without a manual step. Together they reduce manual errors, accelerate releases, and provide rapid feedback.
graph LR
subgraph CI["Continuous Integration"]
COMMIT["Code Commit<br/>(Git push)"]
COMMIT --> LINT["Lint &<br/>Static Analysis"]
LINT --> BUILD["Build<br/>(compile, package)"]
BUILD --> UNIT["Unit Tests"]
UNIT --> INTEG["Integration Tests"]
end
subgraph CD["Continuous Delivery / Deployment"]
INTEG --> ARTIFACT["Push Artifact<br/>(container image)"]
ARTIFACT --> STAGING["Deploy to Staging"]
STAGING --> E2E["E2E / Smoke Tests"]
E2E --> GATE["Approval Gate<br/>(manual or auto)"]
GATE --> PROD["Deploy to Production"]
PROD --> MONITOR["Monitor &<br/>Rollback if needed"]
end
style CI fill:#6cc3d5,stroke:#333,color:#fff
style CD fill:#56cc9d,stroke:#333,color:#fff
Continuous Delivery vs Continuous Deployment
| Aspect | Continuous Delivery | Continuous Deployment |
| --- | --- | --- |
| Definition | Code is always release-ready; deployment requires manual approval | Every change passing tests is deployed to production automatically |
| Human gate | Yes (manual approval before prod) | No (fully automated) |
| Risk | Lower (human review) | Requires robust automated testing |
| Speed | Fast, but gated | Fastest possible |
| Best for | Regulated industries, critical systems | High-velocity teams with strong test coverage |
CI/CD Pipeline Best Practices
| Practice | Description |
| --- | --- |
| Fast feedback | Unit tests run first (<5 min); slow tests run later |
| Fail fast | Pipeline stops on first failure, team notified immediately |
| Immutable artifacts | Build once, deploy same artifact to all environments |
| Environment parity | Dev/staging/prod are as similar as possible |
| Secrets isolation | Use vault/secrets manager, never hardcode credentials |
| Caching | Cache dependencies, Docker layers, test results |
| Parallelization | Run independent test suites concurrently |
| Idempotent deployments | Re-running deployment produces same result |
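The fail-fast and fast-feedback practices above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CI tool is implemented; `run_pipeline` and the stage names are hypothetical, and each stage is modeled as a function that returns `True` on success.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail fast)."""
    executed: List[str] = []
    for name, step in stages:
        executed.append(name)
        if not step():
            return False, executed  # later stages never run
    return True, executed


# Hypothetical stages, ordered cheapest-first for fast feedback.
stages = [
    ("lint", lambda: True),
    ("unit-tests", lambda: False),  # simulated failure
    ("integration-tests", lambda: True),
]
ok, executed = run_pipeline(stages)
print(ok, executed)  # False ['lint', 'unit-tests']
```

Ordering the cheapest checks first means a broken commit is reported before the slower integration suite ever starts.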
CI/CD Tools Comparison
| Tool | Type | Key Feature | Best For |
| --- | --- | --- | --- |
| GitHub Actions | SaaS, YAML workflows | Deep GitHub integration, marketplace | GitHub-centric teams |
| GitLab CI/CD | Integrated, YAML | Built into GitLab, Auto DevOps | GitLab users, all-in-one |
| Jenkins | Self-hosted, plugins | Maximum flexibility, huge ecosystem | Complex enterprise pipelines |
| CircleCI | SaaS | Fast, parallelism, Docker-native | Speed-focused teams |
| ArgoCD | GitOps, K8s-native | Declarative, auto-sync from Git | Kubernetes deployments |
| Tekton | K8s-native, CRDs | Cloud-native, reusable tasks | K8s-native CI/CD |
Q2: How Do Docker Containers Work and Why Are They Used in DevOps?
Answer:
Docker containers package an application with all its dependencies (code, runtime, libraries, config) into a lightweight, portable unit that runs consistently across any environment. Unlike VMs, containers share the host OS kernel, making them fast to start and resource-efficient.
graph TD
subgraph VM["Virtual Machines"]
HW1["Hardware"]
HW1 --> HYP["Hypervisor"]
HYP --> OS1["Guest OS 1<br/>(full OS)"]
HYP --> OS2["Guest OS 2<br/>(full OS)"]
OS1 --> APP1["App A + Libs"]
OS2 --> APP2["App B + Libs"]
end
subgraph Container["Docker Containers"]
HW2["Hardware"]
HW2 --> HOST["Host OS + Docker Engine"]
HOST --> C1["Container 1<br/>(App A + Libs)"]
HOST --> C2["Container 2<br/>(App B + Libs)"]
HOST --> C3["Container 3<br/>(App C + Libs)"]
end
style VM fill:#6cc3d5,stroke:#333,color:#fff
style Container fill:#56cc9d,stroke:#333,color:#fff
Docker vs Virtual Machines
| Feature | Docker Containers | Virtual Machines |
| --- | --- | --- |
| Startup time | Seconds | Minutes |
| Size | MBs (application layer only) | GBs (full OS) |
| Resource usage | Lightweight (shared kernel) | Heavy (dedicated OS per VM) |
| Isolation | Process-level (namespaces, cgroups) | Hardware-level (hypervisor) |
| Portability | Run anywhere Docker is installed | Tied to hypervisor |
| Density | 100s of containers per host | 10s of VMs per host |
| Use case | Microservices, CI/CD, dev environments | Legacy apps, strong isolation, different OS |
Docker Architecture
| Component | Purpose |
| --- | --- |
| Dockerfile | Recipe to build an image (FROM, RUN, COPY, CMD) |
| Image | Immutable template; layers of filesystem changes |
| Container | Running instance of an image |
| Registry | Store and distribute images (Docker Hub, ECR, GCR) |
| Docker Compose | Define multi-container applications in YAML |
| Docker Engine | Daemon that builds, runs, manages containers |
Dockerfile Best Practices
# Multi-stage build: smaller final image
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY . .
# Run as non-root user (security)
RUN useradd -r appuser
USER appuser
EXPOSE 8000
# Start through "python -m": only site-packages is copied from the builder,
# so the uvicorn launcher script in /usr/local/bin is not present in this stage
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Key Practices
1. Use multi-stage builds → smaller images
2. Pin base image versions → reproducibility
3. Run as non-root → security
4. Use .dockerignore → exclude unnecessary files
5. Order layers by change frequency → better caching
6. One process per container → composability
7. Health checks → orchestrator can detect unhealthy containers
8. No secrets in images → use runtime env vars or secrets mounts
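Practice 5 (order layers by change frequency) is easier to see with a toy model of the build cache. The sketch below assumes a simplified rule: once one layer's input changes, that layer and every layer after it are rebuilt. The function `layers_rebuilt` and the layer names are hypothetical and only illustrate the ordering effect.

```python
from typing import List, Set


def layers_rebuilt(layers: List[str], changed: Set[str]) -> List[str]:
    """Return the layers that must be rebuilt: the first changed one and all after it."""
    for i, layer in enumerate(layers):
        if layer in changed:
            return layers[i:]
    return []


# Rarely changing steps first, frequently changing source last.
good = ["base image", "requirements.txt", "pip install", "app source"]
# Source copied early, so every code edit invalidates the dependency install.
bad = ["base image", "app source", "requirements.txt", "pip install"]

changed = {"app source"}
print(layers_rebuilt(good, changed))  # ['app source']
print(layers_rebuilt(bad, changed))   # ['app source', 'requirements.txt', 'pip install']
```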
Q3: How Does Kubernetes Orchestrate Containers at Scale?
Answer:
Kubernetes (K8s) is an open-source container orchestration platform that automates deployment, scaling, self-healing, and management of containerized applications. It abstracts infrastructure into a declarative API — you describe the desired state, and Kubernetes makes it happen.
graph TD
subgraph ControlPlane["Control Plane"]
API["API Server<br/>(kubectl, REST)"]
ETCD["etcd<br/>(cluster state store)"]
SCHED["Scheduler<br/>(assigns pods to nodes)"]
CM["Controller Manager<br/>(reconciliation loops)"]
end
subgraph WorkerNode["Worker Node"]
KUBELET["Kubelet<br/>(node agent)"]
PROXY["Kube-Proxy<br/>(networking)"]
RUNTIME["Container Runtime<br/>(containerd)"]
POD1["Pod<br/>(container(s))"]
POD2["Pod<br/>(container(s))"]
end
API --> ETCD
API --> SCHED
API --> CM
SCHED --> KUBELET
KUBELET --> RUNTIME
RUNTIME --> POD1
RUNTIME --> POD2
style ControlPlane fill:#6cc3d5,stroke:#333,color:#fff
style WorkerNode fill:#56cc9d,stroke:#333,color:#fff
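The "describe the desired state" idea rests on reconciliation loops like those run by the Controller Manager. A minimal sketch of one pass, assuming a hypothetical `reconcile` function and made-up pod names (real controllers work against the API server and handle far more cases), might look like this:

```python
from typing import List


def reconcile(desired_replicas: int, running_pods: List[str]) -> List[str]:
    """One pass of a control loop: return the actions that move actual toward desired."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        return [f"create pod-{i}" for i in range(diff)]
    if diff < 0:
        return [f"delete {name}" for name in running_pods[:-diff]]
    return []  # actual state already matches desired state


print(reconcile(3, ["web-a"]))                    # ['create pod-0', 'create pod-1']
print(reconcile(1, ["web-a", "web-b", "web-c"]))  # ['delete web-a', 'delete web-b']
print(reconcile(2, ["web-a", "web-b"]))           # []
```

Running the same pass repeatedly is safe: once the actual state matches the declaration, the loop produces no further actions.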
Core Kubernetes Objects
| Object | Purpose | Example |
| --- | --- | --- |
| Pod | Smallest deployable unit (1+ containers) | Single app instance |
| Deployment | Manages ReplicaSets, rolling updates, rollbacks | Stateless web app |
| Service | Stable network endpoint for pods (ClusterIP, NodePort, LoadBalancer) | Internal or external access |
| ConfigMap | Non-sensitive configuration data | App settings, feature flags |
| Secret | Sensitive data (base64-encoded, not encrypted by default) | DB passwords, API keys |
| Ingress | HTTP/S routing rules, TLS termination | Domain-based routing |
| StatefulSet | Ordered, persistent pods with stable IDs | Databases, message queues |
| DaemonSet | One pod per node | Log collectors, monitoring agents |
| Job / CronJob | Run-to-completion tasks | Batch processing, scheduled tasks |
| HPA (Horizontal Pod Autoscaler) | Adjusts the replica count based on observed metrics | Scale pods by CPU, memory, or custom metrics |
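For the HPA row, the scaling rule documented by Kubernetes is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below applies that formula and clamps the result to a replica range; the function name and default bounds are assumptions for illustration, and the real autoscaler also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Apply the HPA ratio formula, then clamp to the configured replica range."""
    raw = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(4, 90.0, 60.0))   # 6  (CPU at 90% against a 60% target)
print(desired_replicas(4, 30.0, 60.0))   # 2  (load dropped, scale in)
print(desired_replicas(8, 200.0, 50.0))  # 10 (raw result 32, capped at max_replicas)
```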
Kubernetes Self-Healing
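| Mechanism | What It Does |
| --- | --- |
| Liveness probe | Restarts the container if its health check fails |
| Readiness probe | Removes the pod from Service endpoints until it reports ready |
| ReplicaSet | Ensures the desired number of pods is always running |
| Node failure | Controllers create replacement pods and the scheduler places them on healthy nodes |
| PodDisruptionBudget | Keeps a minimum number of pods available during voluntary disruptions (e.g., node drains) |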
Kubernetes Networking
| Concept | Description |
| --- | --- |
| Pod-to-Pod | All pods can communicate without NAT (flat network) |
| Service | Virtual IP + DNS name load-balanced across pods |
| Ingress | L7 routing (path/host-based) from external traffic to services |
| NetworkPolicy | Firewall rules between pods (namespace/label selectors) |
| Service Mesh | Sidecar proxies for mTLS, retries, observability (Istio, Linkerd) |
Q4: What Is Infrastructure as Code (IaC) and How Do You Use Terraform?
Answer:
Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable configuration files rather than manual processes. It makes infrastructure reproducible, version-controlled, auditable, and testable — treating infrastructure the same way you treat application code.
graph TD
subgraph Traditional["Manual Infrastructure"]
MANUAL["Click in Console<br/>(AWS/GCP/Azure)"]
MANUAL --> DRIFT["Configuration Drift<br/>(snowflake servers)"]
DRIFT --> UNDOC["Undocumented<br/>Changes"]
end
subgraph IaC["Infrastructure as Code"]
CODE["Define in Code<br/>(Terraform/CloudFormation)"]
CODE --> GIT["Version Control<br/>(Git)"]
GIT --> REVIEW["Code Review<br/>(PR/MR)"]
REVIEW --> PLAN["Plan<br/>(preview changes)"]
PLAN --> APPLY["Apply<br/>(provision infra)"]
APPLY --> STATE["State File<br/>(tracks what exists)"]
end
style Traditional fill:#ff6b6b,stroke:#333,color:#fff
style IaC fill:#56cc9d,stroke:#333,color:#fff
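The plan step in the diagram boils down to comparing the configuration you declared with the recorded state. As a rough sketch only (Terraform's actual planner also resolves dependencies, providers, and resource lifecycles), a hypothetical `plan` function over plain dictionaries shows the idea; the resource names are invented.

```python
from typing import Dict, List


def plan(desired: Dict[str, dict], state: Dict[str, dict]) -> List[str]:
    """Preview the changes needed to make the recorded state match the declared config."""
    actions: List[str] = []
    for name in sorted(desired.keys() - state.keys()):
        actions.append(f"+ create {name}")
    for name in sorted(desired.keys() & state.keys()):
        if desired[name] != state[name]:
            actions.append(f"~ update {name}")
    for name in sorted(state.keys() - desired.keys()):
        actions.append(f"- destroy {name}")
    return actions


desired = {"vpc": {"cidr": "10.0.0.0/16"}, "db": {"size": "large"}}
state = {"db": {"size": "small"}, "old_bucket": {}}
print(plan(desired, state))  # ['+ create vpc', '~ update db', '- destroy old_bucket']
```

Because the preview is computed before anything is provisioned, it can be attached to a pull request and reviewed like any other diff.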
Reusable modules for common patterns (VPC, EKS, RDS)
Remote state
Store state in S3/GCS with locking (DynamoDB/GCS)
State isolation
Separate state per environment (dev/staging/prod)
Plan in CI
Auto-run terraform plan on PRs for review
Drift detection
Periodically compare actual vs desired state
Secrets out of code
Use variables, vault references, or encrypted values
Tagging
Tag all resources (team, env, cost-center)
Blast radius
Small, focused modules limit impact of mistakes
Q5: What Are Deployment Strategies and When Do You Use Each?
Answer:
A deployment strategy defines how new application versions are rolled out to production. The right strategy depends on risk tolerance, rollback requirements, infrastructure complexity, and team capabilities.
New version gets copy of traffic, results discarded
No
N/A (not serving)
Medium
Zero
A/B testing
Split users by segment
No
Instant
Medium
Low
Feature flags
Toggle features in code without deploy
No
Instant (flip flag)
Low
Low
When to Use Each Strategy
| Strategy | Use When |
| --- | --- |
| Recreate | Dev/test environments; can tolerate downtime |
| Rolling | Standard choice for K8s; good test coverage exists |
| Blue-green | Need instant rollback; can afford 2x infrastructure |
| Canary | High-risk changes; want to validate with real traffic |
| Shadow | Major rewrites; need production validation without risk |
| Feature flags | Decouple deployment from release; gradual feature rollout |
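Canary releases and percentage-based feature flags both need a stable way to decide which users see the new version. One common approach, sketched here under the assumption that a string user ID is available, is to hash the ID into a bucket from 0 to 99; the `route` function is hypothetical and not taken from any specific load balancer or flag service.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Deterministically send roughly canary_percent of users to the canary."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket 0..99
    return "canary" if bucket < canary_percent else "stable"


# The same user always lands on the same version for a given percentage.
print(route("alice", 10) == route("alice", 10))  # True

# Roughly 10% of a large population reaches the canary.
share = sum(route(f"user-{i}", 10) == "canary" for i in range(10_000)) / 10_000
print(f"canary share: {share:.1%}")
```

Raising `canary_percent` only ever adds users to the canary group, so nobody flips back and forth between versions as the rollout widens.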
Zero-Downtime Deployment Requirements
For zero-downtime deployments, ensure:
1. Backward-compatible API changes (old clients must still work)
2. Database migrations are non-breaking (add column, NOT rename)
3. Health checks configured (readiness + liveness probes)
4. Graceful shutdown (drain connections before terminating)
5. Load balancer removes unhealthy instances automatically
6. Session handling is stateless (or externalized to Redis)
7. Enough capacity to serve traffic during rollout
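Requirement 2 (add a column, do not rename) is often called the expand/contract pattern. The sketch below uses an in-memory SQLite database with a hypothetical `users` table to show that, after the expand step, both the old and the new application version can read their column at the same time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")

# Expand: add the new column next to the old one, then backfill it.
conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")

# The old application version still reads the old column.
old_read = conn.execute("SELECT name FROM users").fetchall()
# The new application version reads the new column.
new_read = conn.execute("SELECT full_name FROM users").fetchall()
print(old_read, new_read)  # [('Ada',)] [('Ada',)]
```

The contract step (dropping `name`) would ship in a later release, once no running version reads the old column.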
Q6: How Do You Implement GitOps?
Answer:
GitOps is an operational framework that uses Git as the single source of truth for both application code and infrastructure declarations. Changes are made via pull requests, and an operator (ArgoCD, Flux) continuously reconciles the cluster state to match what’s declared in Git.
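To make the reconcile step concrete, here is a small sketch of drift detection at the field level. It compares a manifest declared in Git with the live object and reports which fields differ. The `drifted_fields` and `sync_status` helpers are hypothetical; the status labels borrow Argo CD's "Synced" and "OutOfSync" wording, but the logic is illustrative rather than a description of how Argo CD or Flux compute diffs.

```python
from typing import Any, Dict, List


def drifted_fields(desired: Dict[str, Any], live: Dict[str, Any], path: str = "") -> List[str]:
    """Return dotted paths where the live object differs from what Git declares."""
    drift: List[str] = []
    for key in sorted(desired.keys() | live.keys()):
        here = f"{path}.{key}" if path else key
        want, have = desired.get(key), live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.extend(drifted_fields(want, have, here))
        elif want != have:
            drift.append(here)
    return drift


def sync_status(desired: Dict[str, Any], live: Dict[str, Any]) -> str:
    return "Synced" if not drifted_fields(desired, live) else "OutOfSync"


in_git = {"spec": {"replicas": 3, "image": "app:1.2"}}
in_cluster = {"spec": {"replicas": 5, "image": "app:1.2"}}  # someone scaled by hand
print(drifted_fields(in_git, in_cluster))  # ['spec.replicas']
print(sync_status(in_git, in_cluster))     # OutOfSync
```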
Q7: How Do You Implement Monitoring and Observability?
Answer:
Observability is the ability to understand a system’s internal state from its external outputs. It combines three pillars — metrics, logs, and traces — to provide complete visibility into distributed systems. Monitoring is proactive alerting on known failure modes; observability enables investigation of unknown unknowns.
Numeric measurements over time (counters, gauges, histograms)
Time-series
Prometheus, Datadog, CloudWatch
Logs
Discrete events with context (structured JSON preferred)
Text/JSON
ELK Stack, Loki, Fluentd
Traces
Request path across services with timing
Spans + trace ID
Jaeger, Tempo, OpenTelemetry
Key Metrics to Monitor (USE/RED)
| Method | Metrics | Apply To |
| --- | --- | --- |
| USE (Utilization, Saturation, Errors) | CPU %, queue depth, error count | Infrastructure (servers, disks, network) |
| RED (Rate, Errors, Duration) | Requests/sec, error rate %, p99 latency | Services (APIs, microservices) |
| Four Golden Signals (Google SRE) | Latency, traffic, errors, saturation | Any production system |
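As a worked example of the RED method, the sketch below derives rate, error rate, and p99 latency from a list of observed requests. The `red_metrics` function and the sample data are hypothetical; it treats HTTP 5xx responses as errors, uses a nearest-rank percentile, and expects a non-empty list. In practice a metrics system such as Prometheus computes these from counters and histograms.

```python
from typing import Dict, List, Tuple


def red_metrics(requests: List[Tuple[float, int]], window_seconds: float) -> Dict[str, float]:
    """requests: (duration_ms, http_status) pairs observed during the window."""
    durations = sorted(duration for duration, _ in requests)
    errors = sum(1 for _, status in requests if status >= 500)
    # Nearest-rank p99 using integer arithmetic: ceil(0.99 * n).
    rank = (99 * len(durations) + 99) // 100
    return {
        "rate_per_sec": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "p99_ms": durations[rank - 1],
    }


# 99 successful requests taking 1..99 ms, plus one slow failure.
sample = [(float(ms), 200) for ms in range(1, 100)] + [(900.0, 500)]
print(red_metrics(sample, window_seconds=10))
# {'rate_per_sec': 10.0, 'error_rate': 0.01, 'p99_ms': 99.0}
```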
Monitoring Stack
| Component | Purpose | Tool Options |
| --- | --- | --- |
| Metric collection | Scrape/push metrics from services | Prometheus, Telegraf, StatsD |
| Log aggregation | Centralize logs from all services | Fluentd/Fluent Bit → Loki/Elasticsearch |
| Distributed tracing | Track requests across microservices | OpenTelemetry → Jaeger/Tempo |
| Visualization | Dashboards and exploration | Grafana, Kibana, Datadog |
| Alerting | Notify on-call when thresholds breach | Alertmanager, PagerDuty, OpsGenie |
| SLO tracking | Monitor service level objectives | Sloth, Nobl9, custom Prometheus rules |
Alerting Best Practices
| Practice | Description |
| --- | --- |
| Alert on symptoms, not causes | Alert on “API error rate > 5%” not “CPU > 80%” |
| Reduce noise | Group related alerts; avoid duplicate pages |
| Actionable alerts | Every alert should have a clear runbook/response |
| Severity levels | Critical (page), Warning (ticket), Info (dashboard) |
| SLO-based alerts | Burn rate alerts: “burning error budget 10x faster than the SLO allows” |
| Test your alerts | Periodically verify alerts fire correctly |
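Burn rate is the observed error rate divided by the error budget (1 minus the SLO), so a value of 1.0 means the budget would be used up exactly at the end of the SLO window. The following sketch computes it from request counts; the function names and the paging threshold of 10 are assumptions chosen to mirror the "10x" example above.

```python
def burn_rate(errors: int, total: int, slo: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    error_budget = 1.0 - slo
    return (errors / total) / error_budget


def should_page(errors: int, total: int, slo: float, threshold: float = 10.0) -> bool:
    """Page the on-call engineer when the budget burns at or above the threshold."""
    return burn_rate(errors, total, slo) >= threshold


# With a 99.9% SLO the budget is 0.1% of requests.
print(round(burn_rate(20, 1000, 0.999), 2))  # 20.0 (2% errors against a 0.1% budget)
print(should_page(20, 1000, 0.999))          # True
print(should_page(2, 1000, 0.999))           # False (burn rate of about 2)
```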
Q8: How Do You Manage Secrets in DevOps?
Answer:
Secrets management ensures sensitive data (API keys, database passwords, TLS certificates, tokens) is stored securely, accessed with least privilege, rotated regularly, and never exposed in code or logs.
graph TD
subgraph Bad["Anti-Patterns ✗"]
HARDCODE["Hardcoded in code"]
ENV_FILE[".env committed to Git"]
PLAIN["Plain text ConfigMap"]
end
subgraph Good["Best Practices ✓"]
VAULT["Secrets Manager<br/>(Vault, AWS SM)"]
INJECT["Runtime Injection<br/>(env vars, volumes)"]
ROTATE["Auto-Rotation<br/>(scheduled renewal)"]
AUDIT["Audit Logging<br/>(who accessed what)"]
end
Bad -->|"Migrate to"| Good
style Bad fill:#ff6b6b,stroke:#333,color:#fff
style Good fill:#56cc9d,stroke:#333,color:#fff
Secrets Management Tools
| Tool | Type | Key Feature | Best For |
| --- | --- | --- | --- |
| HashiCorp Vault | Self-hosted/SaaS | Dynamic secrets, PKI, transit encryption | Enterprise, multi-cloud |
| AWS Secrets Manager | Managed (AWS) | Auto-rotation for RDS, Lambda integration | AWS-native |
| AWS SSM Parameter Store | Managed (AWS) | Free tier, hierarchical keys | Simple AWS use cases |
| Azure Key Vault | Managed (Azure) | HSM-backed, RBAC integration | Azure-native |
| GCP Secret Manager | Managed (GCP) | IAM integration, versioning | GCP-native |
| Sealed Secrets | K8s-native | Encrypt secrets in Git, decrypt in cluster | GitOps workflows |
| External Secrets Operator | K8s-native | Sync secrets from external vault into K8s | Multi-provider |
| SOPS | CLI tool | Encrypt YAML/JSON files with cloud KMS | Config files in Git |
Secrets in Kubernetes
| Approach | Security Level | Complexity |
| --- | --- | --- |
| K8s Secret (base64) | Low (not encrypted at rest by default) | Simple |
| K8s Secret + etcd encryption | Medium | Moderate |
| Sealed Secrets | Medium-High (encrypted in Git) | Moderate |
| External Secrets Operator | High (pulls from Vault/SM at runtime) | Higher |
| CSI Secrets Store Driver | High (mounts secrets as volumes) | Higher |
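The "Low" rating for a plain K8s Secret follows from what base64 is: an encoding, not encryption. A short sketch with a made-up password shows that anyone who can read the stored value can recover the original without any key.

```python
import base64

# What ends up in the Secret's data field is just an encoded string.
encoded = base64.b64encode(b"s3cr3t-password").decode("ascii")

# Reversing it needs no key or credential at all.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)  # s3cr3t-password
```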
Secrets Management Principles
1. Never store secrets in source code or container images
2. Use separate secrets per environment (dev/staging/prod)
3. Apply least-privilege access (RBAC, IAM policies)
4. Rotate secrets automatically on a schedule
5. Audit all secret access (who, when, from where)
6. Encrypt secrets at rest AND in transit
7. Use short-lived credentials where possible (dynamic secrets)
8. Detect secrets in code with pre-commit hooks (gitleaks, trufflehog)
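Principle 8 can be illustrated with a tiny scanner of the kind a pre-commit hook might run. This is a simplified sketch, not how gitleaks or trufflehog work internally: it checks only two assumed patterns (the `AKIA` prefix format of AWS access key IDs and PEM private key headers), and the `scan` function is hypothetical. The sample key is the placeholder value used in AWS documentation.

```python
import re
from typing import List, Tuple

PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspected secret."""
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
print(scan(sample))  # [(2, 'aws-access-key-id')]
```

A hook built on this idea would exit with a non-zero status when `scan` returns any findings, which blocks the commit.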
Q9: How Do You Handle Incident Response and Post-Mortems?
Answer:
Incident response is the structured process of detecting, diagnosing, resolving, and learning from production failures. DevOps teams need clear processes, defined roles, and a blame-free culture to handle incidents effectively and prevent recurrence.
Check dashboards, logs, traces; identify root cause
Grafana, Kibana, Jaeger
Mitigation
Rollback, feature flag off, scale up, failover
ArgoCD, kubectl, feature flags
Resolution
Deploy fix, verify recovery, close incident
CI/CD pipeline
Post-mortem
Blameless review, timeline, action items
Confluence, Google Docs
Post-Mortem Template
## Incident Post-Mortem: [Title]

**Date:** 2026-05-21
**Duration:** 47 minutes (10:15 - 11:02 UTC)
**Severity:** SEV-2
**Impact:** 30% of users experienced 500 errors on checkout

### Timeline
- 10:15 - Alert: checkout error rate > 10%
- 10:18 - On-call engineer acknowledged
- 10:25 - Root cause identified: bad config deployment
- 10:32 - Rollback initiated
- 11:02 - Error rate returned to baseline

### Root Cause
A config change removed the database connection pool setting,
causing connection exhaustion under load.

### What Went Well
- Alert fired within 2 minutes of impact
- Rollback was fast (< 10 minutes)

### What Could Be Improved
- Config changes lacked validation tests
- No canary stage for config deployments

### Action Items
1. [ ] Add schema validation for config files (Owner: Alice, Due: May 28)
2. [ ] Canary deploy for config changes (Owner: Bob, Due: June 4)
3. [ ] Add integration test for DB pool settings (Owner: Carol, Due: May 25)
SRE Concepts
| Concept | Definition |
| --- | --- |
| SLI (Service Level Indicator) | Metric measuring service quality (e.g., % requests < 200ms) |
| SLO (Service Level Objective) | Target value for SLI (e.g., 99.9% requests < 200ms) |
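Using the examples in the table, an SLI can be computed as the percentage of requests faster than 200 ms and then compared against a 99.9% SLO. The helper names and the latency sample below are hypothetical, and the SLI function expects a non-empty list.

```python
from typing import List


def sli_fast_requests(latencies_ms: List[float], threshold_ms: float = 200.0) -> float:
    """SLI: percentage of requests that completed in under threshold_ms."""
    fast = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return 100.0 * fast / len(latencies_ms)


def meets_slo(sli_percent: float, slo_percent: float = 99.9) -> bool:
    """SLO check: is the measured SLI at or above the target?"""
    return sli_percent >= slo_percent


# 998 fast requests and 2 slow ones out of 1000.
latencies = [120.0] * 998 + [450.0] * 2
sli = sli_fast_requests(latencies)
print(f"{sli:.1f}%", meets_slo(sli))  # 99.8% False
```

Two slow requests in a thousand are enough to miss a 99.9% target, which is why tight SLOs need large sample windows.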
Q10: How Do You Secure a DevOps Pipeline (DevSecOps)?
Answer:
DevSecOps integrates security practices into every stage of the DevOps lifecycle — “shifting security left” so vulnerabilities are caught early rather than discovered in production. Security is automated, continuous, and everyone’s responsibility.
graph LR
subgraph ShiftLeft["Shift Left Security"]
PLAN["Plan<br/>(threat modeling)"]
CODE["Code<br/>(SAST, secrets scan)"]
BUILD["Build<br/>(dependency scan,<br/>image scan)"]
TEST["Test<br/>(DAST, pen test)"]
DEPLOY["Deploy<br/>(policy gates,<br/>signed images)"]
OPERATE["Operate<br/>(runtime security,<br/>monitoring)"]
end
PLAN --> CODE --> BUILD --> TEST --> DEPLOY --> OPERATE
style ShiftLeft fill:#56cc9d,stroke:#333,color:#fff
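The "policy gates, signed images" step at the Deploy stage can be pictured as a function that inspects scan results and returns reasons to block a release. This is a sketch under assumptions: the `ScanResult` fields, the `deploy_gate` function, and the limit of five high-severity findings are invented for illustration and do not correspond to any particular scanner or admission controller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanResult:
    image: str
    signed: bool
    critical_vulns: int
    high_vulns: int


def deploy_gate(result: ScanResult, max_high: int = 5) -> List[str]:
    """Return reasons to block the deploy; an empty list means the gate passes."""
    reasons: List[str] = []
    if not result.signed:
        reasons.append("image is not signed")
    if result.critical_vulns > 0:
        reasons.append(f"{result.critical_vulns} critical vulnerabilities")
    if result.high_vulns > max_high:
        reasons.append(f"{result.high_vulns} high vulnerabilities (limit {max_high})")
    return reasons


print(deploy_gate(ScanResult("app:1.2", signed=True, critical_vulns=0, high_vulns=2)))
# []
print(deploy_gate(ScanResult("app:1.3", signed=False, critical_vulns=1, high_vulns=0)))
# ['image is not signed', '1 critical vulnerabilities']
```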