This is Part 2 of our System Design Interview QA series, focusing on infrastructure components and operational systems that power production-grade architectures. While Part 1 covered foundational concepts (scalability, CAP theorem, etc.), this article dives deep into how specific infrastructure components work and how to design them.
Q1: How Does Load Balancing Work and How Do You Design a Load Balancer?
Answer:
A load balancer distributes incoming network traffic across multiple backend servers to ensure no single server is overwhelmed, improving availability, throughput, and fault tolerance.
graph TD
subgraph RR["Round Robin"]
RR1["Request 1 → Server A"]
RR2["Request 2 → Server B"]
RR3["Request 3 → Server C"]
RR4["Request 4 → Server A"]
end
subgraph LC["Least Connections"]
LC1["Server A: 5 active"]
LC2["Server B: 2 active ← next request"]
LC3["Server C: 8 active"]
end
subgraph WRR["Weighted Round Robin"]
WRR1["Server A (weight 5): gets 5 of every 8"]
WRR2["Server B (weight 2): gets 2 of every 8"]
WRR3["Server C (weight 1): gets 1 of every 8"]
end
style RR fill:#56cc9d,stroke:#333,color:#fff
style LC fill:#ffce67,stroke:#333
style WRR fill:#6cc3d5,stroke:#333,color:#fff
| Algorithm | Best For | Weakness |
| --- | --- | --- |
| Round Robin | Equal-capacity servers, stateless services | Ignores server load |
| Weighted Round Robin | Mixed hardware capacities | Static weights, doesn’t adapt |
| Least Connections | Long-lived connections (WebSocket, DB) | May route to slow servers |
| Least Response Time | Latency-sensitive services | Requires constant measurement |
| IP Hash | Session affinity without sticky cookies | Uneven with few clients |
| Consistent Hashing | Cache distribution (e.g., Memcached clients) | Complex implementation |
| Random | Large server pools, simplicity | Variance with few servers |
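As a rough illustration of the three algorithms in the diagram, the Python sketch below picks a backend per request. The names `RoundRobin`, `least_connections`, and `weighted_round_robin` are invented for this example, and a real load balancer would also have to deal with concurrency and health state.

```python
import itertools
from typing import Dict, Iterator, List


class RoundRobin:
    """Cycle through servers in order, ignoring their load."""

    def __init__(self, servers: List[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=lambda name: active[name])


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield servers in proportion to their static weights.

    Naive expansion: a weight of 5 means 5 consecutive slots per cycle.
    Production balancers usually interleave the slots more smoothly.
    """
    slots = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(slots)


rr = RoundRobin(["A", "B", "C"])
rr_picks = [rr.pick() for _ in range(4)]

lc_pick = least_connections({"A": 5, "B": 2, "C": 8})

wrr = weighted_round_robin({"A": 5, "B": 2, "C": 1})
wrr_picks = [next(wrr) for _ in range(8)]
```

With the inputs from the diagram, the fourth round-robin request wraps back to Server A, least connections selects Server B, and one full weighted cycle of 8 requests splits 5/2/1.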
Session Persistence (Sticky Sessions)
Problem: User state (shopping cart, login session) lives on one server.
If the next request goes to a different server → state is lost.
Solutions (from worst to best):
1. Sticky sessions (cookie/IP-based routing to same server)
- Simple but defeats load balancing purpose
- Server failure = lost sessions
2. Session replication (broadcast sessions to all servers)
- Network overhead grows O(n²)
- Memory wasted on every server
3. Centralized session store (Redis/Memcached) ← RECOMMENDED
- Any server can handle any request
- Session stored in Redis with TTL
- Server failure has zero impact on sessions
- Scales independently
Health Check Design
| Type | Mechanism | Interval | Use Case |
| --- | --- | --- | --- |
| TCP check | Can connect to port? | 5-10s | Basic availability |
| HTTP check | GET /health returns 200? | 5-10s | Application-level health |
| Deep health check | Checks DB connectivity, disk space, dependencies | 30s | Comprehensive readiness |
Health check state machine:
HEALTHY → 3 consecutive failures → UNHEALTHY (remove from pool)
UNHEALTHY → 2 consecutive successes → HEALTHY (add back to pool)
Drain mode: stop sending new requests, wait for in-flight requests to complete
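The state machine above can be sketched in a few lines of Python. The `HealthTracker` class and its field names are hypothetical; the thresholds (3 failures, 2 successes) come from the text, and probe scheduling and drain handling are left out.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one backend's state from a stream of probe results."""

    fail_threshold: int = 3      # consecutive failures before removal from the pool
    success_threshold: int = 2   # consecutive successes before re-adding
    healthy: bool = True
    _streak: int = 0             # consecutive results that contradict the current state

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the resulting state."""
        if probe_ok == self.healthy:
            self._streak = 0  # result agrees with the current state, so reset the counter
            return self.healthy
        self._streak += 1
        threshold = self.fail_threshold if self.healthy else self.success_threshold
        if self._streak >= threshold:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
probes = [False, False, True, False, False, False, True, True]
states = [tracker.record(ok) for ok in probes]
```

Requiring consecutive results means a single dropped probe (the third entry in `probes`) resets the count instead of removing the server, which keeps the pool from flapping.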
Q2: How Do You Design a Caching System and What Caching Strategies Exist?
Answer:
Caching stores frequently accessed data in fast storage (typically memory) to reduce latency and database load. For reads that hit the cache, a well-designed strategy can cut latency from around 100ms to under 1ms.
Problem: Cache key expires → hundreds of requests simultaneously hit DB → DB overload
Solutions:
1. Lock/mutex: Only one request fetches from DB, others wait
   cache_key = "user:123"
   lock_key = f"lock:{cache_key}"
   data = redis.get(cache_key)
   if data is None:
       # nx=True: set only if absent; ex=5: lock expires if the holder crashes
       if redis.set(lock_key, "1", nx=True, ex=5):
           try:
               data = db.query(...)
               redis.set(cache_key, data, ex=300)
           finally:
               redis.delete(lock_key)  # release the lock even if the query fails
       else:
           data = wait_for_cache()  # poll until the lock holder populates the cache
2. Probabilistic early recomputation:
- Each read checks: should I refresh? (probability increases near TTL)
- Spreads refresh across time window
3. Background refresh (refresh-ahead):
- Background job refreshes popular keys before expiry
- No stampede possible
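For option 2, one common formulation (often called XFetch) refreshes early when `now - recompute_seconds * beta * ln(u) >= expires_at`, with `u` drawn uniformly from (0, 1). The sketch below assumes that formulation; `should_refresh_early` and its parameters are names chosen for this example, and the `rng` argument exists only so the behaviour can be checked deterministically.

```python
import math
import random
from typing import Callable


def should_refresh_early(
    now: float,
    expires_at: float,
    recompute_seconds: float,
    beta: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Return True if this reader should recompute the value before it expires.

    The random head start grows with recompute_seconds (how long a refresh
    takes) and beta (values above 1 refresh more eagerly), so the chance of
    refreshing rises as the expiry time approaches.
    """
    u = max(rng(), 1e-12)  # random.random() can return 0.0; avoid log(0)
    head_start = -recompute_seconds * beta * math.log(u)  # always >= 0
    return now + head_start >= expires_at


# Deterministic checks with a fixed "random" draw of 0.5 (head start ~0.69s).
near_expiry = should_refresh_early(100.0, 100.5, 1.0, rng=lambda: 0.5)
far_from_expiry = should_refresh_early(90.0, 100.5, 1.0, rng=lambda: 0.5)
already_expired = should_refresh_early(101.0, 100.5, 1.0, rng=lambda: 0.5)
```

Because each reader draws its own random number, refreshes are spread over the window before expiry instead of all landing at the same instant.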
Q3: How Do Message Queues Work and When Should You Use Them?
Answer:
Message queues enable asynchronous communication between services by decoupling producers (senders) from consumers (receivers). They provide buffering, load leveling, and durable delivery (typically at-least-once).
Exactly-once in practice:
- Kafka: Idempotent producer + transactions, with consumer offsets committed inside the same transaction
- Application-level: Idempotency key in each message
→ Consumer checks: "Have I processed message with ID X?"
→ If yes → skip (dedup)
→ If no → process + record ID in DB (same transaction)
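The application-level check above can be sketched with SQLite standing in for the consumer's database. The table names, the `handle_deposit` function, and the deposit use case are all hypothetical; the point is that the dedup record and the business update commit in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('acct-1', 0)")
conn.commit()


def handle_deposit(db: sqlite3.Connection, message_id: str, account: str, amount: int) -> bool:
    """Apply a deposit once; return False if the message was a duplicate."""
    try:
        with db:  # one transaction: commits on success, rolls back on error
            db.execute("INSERT INTO processed_messages VALUES (?)", (message_id,))
            db.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # primary-key conflict: this message ID was already processed


first = handle_deposit(conn, "msg-1", "acct-1", 50)
duplicate = handle_deposit(conn, "msg-1", "acct-1", 50)
balance = conn.execute("SELECT amount FROM balances WHERE account = 'acct-1'").fetchone()[0]
```

Letting the primary key reject the duplicate avoids a separate "have I seen this?" query and the race between that query and the insert.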
Dead Letter Queue (DLQ)
Message processing flow:
1. Consumer picks up message
2. Processing fails → retry (exponential backoff: 1s, 5s, 30s, 5min)
3. After max retries (e.g., 5 attempts) → move to Dead Letter Queue
4. DLQ messages are inspected manually or by automated systems
5. Fix the bug → replay DLQ messages back to original queue
Why DLQ matters:
- Prevents poison messages from blocking the queue
- Preserves failed messages for debugging
- Allows retry after fix is deployed
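A minimal sketch of the retry-then-DLQ flow, assuming an in-memory list as the dead letter queue. The `consume` function, its parameters, and the message shape are illustrative; the delays and the 5-attempt limit are taken from the flow above.

```python
from typing import Callable, List

BACKOFF_SECONDS = [1, 5, 30, 300]  # delays between attempts, from the flow above


def consume(
    message: dict,
    handler: Callable[[dict], None],
    dead_letters: List[dict],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = lambda seconds: None,  # no-op so the sketch runs instantly
) -> bool:
    """Try the handler up to max_attempts times; park the message in the DLQ on exhaustion."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception as exc:  # broad on purpose: any failure counts as a failed attempt
            last_error = repr(exc)
            if attempt < max_attempts:
                sleep(BACKOFF_SECONDS[min(attempt - 1, len(BACKOFF_SECONDS) - 1)])
    # Keep the payload and the last error so the message can be debugged and replayed.
    dead_letters.append({"message": message, "attempts": max_attempts, "error": last_error})
    return False


def poison_handler(message: dict) -> None:
    raise ValueError("cannot parse payload")


dlq: List[dict] = []
ok = consume({"id": "m1"}, lambda m: None, dlq)
failed = consume({"id": "m2"}, poison_handler, dlq)
```

In a real consumer you would pass `time.sleep` (or schedule a delayed redelivery) as `sleep`; the poison message ends up in `dlq` while later messages keep flowing.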
Q4: How Do You Design a Microservices Architecture?
Answer:
Microservices architecture structures an application as a collection of loosely coupled, independently deployable services, each owning its own data and business logic.
Order → Payment → Inventory (compensating on failure)
Saga Pattern for Distributed Transactions
graph LR
subgraph Happy["Happy Path"]
O1["Create Order<br/>(PENDING)"] --> P1["Reserve Payment"]
P1 --> I1["Reserve Inventory"]
I1 --> O2["Confirm Order<br/>(CONFIRMED)"]
end
subgraph Compensate["Compensation (on failure)"]
I_FAIL["Inventory fails"] --> P_COMP["Refund Payment"]
P_COMP --> O_COMP["Cancel Order"]
end
style Happy fill:#56cc9d,stroke:#333,color:#fff
style Compensate fill:#ff7851,stroke:#333,color:#fff
| Saga Type | Coordination | Pros | Cons |
| --- | --- | --- | --- |
| Choreography | Each service listens to events and acts | Decoupled, no central coordinator | Hard to track overall flow, debugging complex |
| Orchestration | Central orchestrator directs the workflow | Easy to understand and monitor | Orchestrator is a single point of failure |
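To make the orchestration variant concrete, here is a small in-process sketch: each step pairs an action with a compensation, and a failure triggers the compensations of completed steps in reverse order. The `run_saga` function and the step names are assumptions for illustration; a production orchestrator would persist saga state and call remote services.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            completed.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False
    return True


def out_of_stock() -> None:
    raise RuntimeError("inventory unavailable")


def noop() -> None:
    pass


saga_log: List[str] = []
succeeded = run_saga(
    [
        ("create_order", noop, noop),
        ("reserve_payment", noop, noop),
        ("reserve_inventory", out_of_stock, noop),
    ],
    saga_log,
)
```

This mirrors the compensation path in the diagram: when inventory fails, the payment step is undone first and the order is cancelled last.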
Service Discovery
Problem: Services scale dynamically (pods come and go).
How does Service A find Service B's current address?
Solution: Service Registry
1. Service starts → registers itself (IP:port) with registry
2. Service wants to call another → queries registry for addresses
3. Registry health-checks registered services, removes dead ones
4. Client-side load balancing across returned addresses
Tools:
- Consul (HashiCorp) — service mesh + KV store + health checks
- Eureka (Netflix) — Java-focused, Spring Cloud native
- Kubernetes DNS — built-in (service-name.namespace.svc.cluster.local)
- etcd — distributed KV store (used by Kubernetes internally)
Q5: What Are Database Replication and Partitioning Strategies?
Answer:
Replication and partitioning are the two fundamental mechanisms for scaling databases beyond a single machine, addressing read throughput, write throughput, storage capacity, and availability.
Scenario: User updates profile (write to primary), immediately reads (from replica)
Problem: Replica hasn't received the update yet → shows stale data
Solutions:
1. Read-your-writes consistency:
→ After write, read from primary for N seconds
→ Or track last-write timestamp, read from replica only if up-to-date
2. Monotonic reads:
→ Always route same user to same replica (sticky reads)
→ Prevents seeing data go "backward"
3. Causal consistency:
→ Track dependencies between writes
→ Replica only serves reads after all causal dependencies are applied
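The first option (read from the primary for N seconds after a write) can be sketched as a small routing helper. `ReadRouter`, `pin_seconds`, and the `"primary"`/`"replica"` labels are hypothetical names; the window is assumed to be longer than typical replication lag.

```python
import time
from typing import Dict, Optional


class ReadRouter:
    """Send a user's reads to the primary for a short window after they write."""

    def __init__(self, pin_seconds: float = 5.0) -> None:
        self.pin_seconds = pin_seconds  # assumption: longer than typical replication lag
        self._last_write: Dict[str, float] = {}

    def record_write(self, user_id: str, now: Optional[float] = None) -> None:
        self._last_write[user_id] = time.monotonic() if now is None else now

    def target_for_read(self, user_id: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        last = self._last_write.get(user_id)
        if last is not None and now - last < self.pin_seconds:
            return "primary"
        return "replica"


router = ReadRouter(pin_seconds=5.0)
router.record_write("user_42", now=100.0)
soon_after = router.target_for_read("user_42", now=101.0)
later = router.target_for_read("user_42", now=106.0)
other_user = router.target_for_read("user_7", now=101.0)
```

Only the user who just wrote is pinned to the primary, so the extra primary load stays proportional to write traffic.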
Sharding (Partitioning) Deep Dive
| Shard Key Strategy | Example | Pros | Cons |
| --- | --- | --- | --- |
| Hash-based | shard = hash(user_id) % 4 | Even distribution | Range queries span all shards |
| Range-based | shard1: dates Jan-Mar | Efficient range scans | Hot shards (recent data accessed most) |
| Geographic | shard_us, shard_eu, shard_asia | Data locality, compliance | Uneven if one region dominates |
| Directory | Lookup table: user123 → shard2 | Maximum flexibility | Directory is bottleneck/SPOF |
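A sketch of the hash-based row, with one practical caveat: Python's built-in `hash()` is salted per process for strings, so this example uses `hashlib` to get a mapping that is stable across machines. The function names are invented here, and `moved_fraction` is only a helper to show the resharding cost of plain modulo placement.

```python
import hashlib
from typing import List


def shard_for(user_id: str, num_shards: int = 4) -> int:
    """Map a key to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys: List[str], old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


keys = [f"user{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
```

Going from 4 to 5 shards relocates roughly 80% of keys in expectation, since a key stays put only when its hash gives the same remainder modulo 4 and modulo 5. That cost is the usual motivation for consistent hashing or a directory when the shard count is expected to change.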
Cross-Shard Operations
Challenge: Query that spans multiple shards (e.g., "all orders > $100")
Approaches:
1. Scatter-gather: Query all shards, merge results (expensive)
2. Denormalize: Copy needed data into each shard (storage trade-off)
3. Global index: Secondary index service spans all shards
4. Avoid: Design schema so most queries hit single shard
→ Shard by user_id, and most queries are user-scoped
Q6: How Does Kubernetes Work and How Do You Design for Container Orchestration?
Answer:
Kubernetes (K8s) is a container orchestration platform that automates deployment, scaling, and management of containerized applications. It’s the de facto standard for running microservices in production.
graph TD
subgraph ControlPlane["Control Plane"]
API["API Server<br/>(kube-apiserver)"]
SCHED["Scheduler<br/>(kube-scheduler)"]
CM["Controller Manager"]
ETCD["etcd<br/>(cluster state store)"]
end
subgraph WorkerNode["Worker Node 1"]
KUBELET["kubelet"]
PROXY["kube-proxy"]
POD1["Pod A<br/>(Container 1)"]
POD2["Pod B<br/>(Container 2, Container 3)"]
end
subgraph WorkerNode2["Worker Node 2"]
KUBELET2["kubelet"]
POD3["Pod C"]
POD4["Pod D"]
end
API --> SCHED
API --> CM
API --> ETCD
API --> KUBELET
API --> KUBELET2
style ControlPlane fill:#56cc9d,stroke:#333,color:#fff
style WorkerNode fill:#ffce67,stroke:#333
style WorkerNode2 fill:#6cc3d5,stroke:#333,color:#fff
Core Kubernetes Objects
| Object | Purpose | Example |
| --- | --- | --- |
| Pod | Smallest deployable unit (1+ containers) | Single instance of your app |
| Deployment | Manages desired state of Pods (replicas, rolling updates) | |
# Pod resource specification
resources:
  requests:           # Guaranteed minimum
    cpu: "250m"       # 0.25 CPU cores
    memory: "256Mi"   # 256 MB RAM
  limits:             # Maximum allowed
    cpu: "1000m"      # 1 CPU core
    memory: "512Mi"   # 512 MB RAM

# HPA (Horizontal Pod Autoscaler)
# Scale between 3-10 pods when CPU > 70%
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 70
Kubernetes Networking
| Concept | Purpose |
| --- | --- |
| ClusterIP | Internal service (only within cluster) |
| NodePort | Expose service on each node’s IP at a static port |
| LoadBalancer | Provision cloud LB (e.g., AWS NLB) for external traffic |
| Ingress | L7 routing rules (path-based, host-based) |
| Network Policy | Firewall rules between Pods (default: all-open) |
| Service Mesh | Sidecar proxy for mTLS, observability, traffic control |
Q7: How Do You Design a CI/CD Pipeline?
Answer:
CI/CD (Continuous Integration / Continuous Delivery) automates the process of building, testing, and deploying software. A well-designed pipeline ensures rapid, reliable releases with minimal manual intervention.
graph LR
DEV["Developer<br/>pushes code"]
DEV --> CI["CI Pipeline"]
subgraph CI["Continuous Integration"]
BUILD["Build<br/>(compile, deps)"]
LINT["Lint &<br/>Static Analysis"]
TEST["Unit Tests"]
INT["Integration Tests"]
SEC["Security Scan<br/>(SAST, deps)"]
IMG["Build Container<br/>Image"]
end
CI --> CD["CD Pipeline"]
subgraph CD["Continuous Delivery"]
STAGE["Deploy to<br/>Staging"]
E2E["E2E Tests<br/>(staging)"]
APPROVE["Manual Approval<br/>(optional)"]
PROD["Deploy to<br/>Production"]
SMOKE["Smoke Tests<br/>(production)"]
end
style CI fill:#56cc9d,stroke:#333,color:#fff
style CD fill:#6cc3d5,stroke:#333,color:#fff
CI/CD Pipeline Stages
| Stage | Purpose | Tools | Feedback Time |
| --- | --- | --- | --- |
| Lint / Format | Code style consistency | ESLint, Black, gofmt | < 30s |
| Unit Tests | Test individual functions/classes | pytest, JUnit, Jest | 1-5 min |
| Build | Compile code, resolve dependencies | Maven, npm, pip | 1-3 min |
| Integration Tests | Test service interactions | Testcontainers, docker-compose | 5-15 min |
| Security Scan (SAST) | Find vulnerabilities in code | Snyk, SonarQube, Semgrep | 2-5 min |
| Container Build | Build Docker image, push to registry | Docker, Buildah, Kaniko | 2-5 min |
| Deploy to Staging | Deploy to pre-production environment | ArgoCD, Helm, Terraform | 3-10 min |
| E2E Tests | Full user flow tests in staging | Playwright, Cypress, Selenium | 10-30 min |
| Deploy to Production | Rolling update / canary / blue-green | ArgoCD, Spinnaker, Flux | 5-15 min |
| Smoke Tests | Verify critical paths in production | Custom health checks, synthetic monitors | 1-3 min |
CI/CD Best Practices
Pipeline design principles:
1. Fast feedback: fail early (lint → unit tests → integration)
2. Immutable artifacts: build once, deploy to all environments
3. Environment parity: staging mirrors production
4. Infrastructure as Code: Terraform/Pulumi for infra changes
5. GitOps: desired state in Git, reconciler applies it (ArgoCD)
6. Feature flags: decouple deployment from release
7. Rollback plan: every deployment has automated rollback trigger
Branch strategy:
- Trunk-based development (preferred for fast teams):
→ Short-lived feature branches (< 1 day)
→ Merge to main frequently
→ Feature flags hide incomplete work
→ Main is always deployable
- GitFlow (for teams needing release management):
→ develop → feature branches → release branches → main
→ More overhead, longer release cycles
GitOps with ArgoCD
GitOps workflow:
1. Developer merges PR → main branch
2. CI pipeline builds image → pushes to registry (e.g., v1.2.3)
3. CI updates manifest repo (Helm values / kustomize with new image tag)
4. ArgoCD detects drift between Git manifest and cluster state
5. ArgoCD applies changes to Kubernetes cluster
6. If deployment fails health checks → ArgoCD marks the app Degraded; roll back with git revert (automatic rollback needs extra tooling such as Argo Rollouts)
Benefits:
- Git is single source of truth
- Full audit trail (who changed what, when)
- Easy rollback (git revert)
- Declarative (describe desired state, not imperative steps)
Q8: How Do You Design a Monitoring and Observability System?
Answer:
Observability is the ability to understand a system’s internal state by examining its external outputs. The three pillars are metrics, logs, and traces. Together they enable debugging, alerting, and performance optimization.
{"timestamp":"2026-05-21T10:30:45.123Z","level":"ERROR","service":"order-service","trace_id":"abc-123-def-456","span_id":"span-789","user_id":"user_42","method":"POST","path":"/api/v1/orders","status_code":500,"duration_ms":2345,"error":"ConnectionRefusedError: payment-service:8080","message":"Failed to process payment for order"}
Distributed Tracing
Request: User places order
┌─ API Gateway (12ms) ─────────────────────────────────────┐
│ ┌─ Order Service (45ms) ──────────────────────────────┐ │
│ │ ┌─ User Service (8ms) ────┐ │ │
│ │ └─────────────────────────┘ │ │
│ │ ┌─ Payment Service (320ms) ← BOTTLENECK ─────────┐│ │
│ │ │ ┌─ Stripe API (280ms) ───────────────────────┐││ │
│ │ │ └────────────────────────────────────────────┘││ │
│ │ └────────────────────────────────────────────────┘│ │
│ │ ┌─ Inventory Service (15ms)──┐ │ │
│ │ └────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────┘
Total: 392ms (Payment → Stripe is 71% of total time)
SLOs, SLIs, and SLAs
| Term | Definition | Example |
| --- | --- | --- |
| SLI (Service Level Indicator) | The metric you measure | P99 latency, availability %, error rate |
| SLO (Service Level Objective) | The target for the SLI | P99 latency < 200ms, 99.9% availability |
| SLA (Service Level Agreement) | Contract with consequences if SLO breached | 99.9% uptime or customer gets credits |
| Error Budget | How much failure is allowed before violating SLO | 99.9% = 43 minutes downtime/month budget |
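The 43-minute figure follows from simple arithmetic over a 30-day month, which the sketch below spells out. The function name and the 30-day default window are assumptions made for this example.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    window_minutes = window_days * 24 * 60  # 43,200 minutes in 30 days
    return window_minutes * (1 - slo_percent / 100)


budget_three_nines = error_budget_minutes(99.9)    # about 43.2 minutes
budget_four_nines = error_budget_minutes(99.99)    # about 4.32 minutes
```

Each additional nine shrinks the budget by a factor of ten, which is why moving from 99.9% to 99.99% usually requires automated failover instead of human response.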
Alerting Strategy
Alert design principles:
1. Alert on symptoms, not causes
✅ "Error rate > 5% for 3 min" (symptom)
❌ "CPU > 90%" (may not impact users)
2. Severity levels:
- P1 (Critical): Revenue impacting, page immediately
- P2 (High): Degraded service, page during business hours
- P3 (Medium): Non-urgent, ticket in queue
- P4 (Low): Informational, dashboard only
3. Reduce noise:
- Group related alerts
- Require duration threshold (not single spike)
- Suppress during maintenance windows
- Escalation: Slack → PagerDuty → phone call
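The duration-threshold rule ("error rate > 5% for 3 min") can be sketched as a check over recent samples. This assumes one error-rate sample per minute; `should_alert` and its parameters are names made up for the example.

```python
from typing import List


def should_alert(
    error_rates: List[float],
    threshold: float = 0.05,
    min_consecutive: int = 3,
) -> bool:
    """Fire only if the most recent min_consecutive samples all exceed the threshold."""
    if len(error_rates) < min_consecutive:
        return False
    return all(rate > threshold for rate in error_rates[-min_consecutive:])


single_spike = should_alert([0.01, 0.01, 0.50])
sustained = should_alert([0.01, 0.06, 0.07, 0.08])
recovered = should_alert([0.06, 0.07, 0.08, 0.01])
```

A single spike, however large, does not page anyone; only a sustained breach does.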
Q9: How Does Event-Driven Architecture Work?
Answer:
Event-Driven Architecture (EDA) is a design pattern where services communicate by producing and consuming events (facts about something that happened). It enables loose coupling, real-time processing, and scalable async workflows.
graph TD
subgraph Producers["Event Producers"]
US["User Service<br/>→ UserCreated"]
OS["Order Service<br/>→ OrderPlaced"]
PS["Payment Service<br/>→ PaymentProcessed"]
end
subgraph EventBus["Event Bus / Broker (Kafka)"]
T1["Topic: user-events"]
T2["Topic: order-events"]
T3["Topic: payment-events"]
end
subgraph Consumers["Event Consumers"]
EMAIL["Email Service"]
ANALYTICS["Analytics Service"]
INVENTORY["Inventory Service"]
SEARCH["Search Indexer"]
end
US --> T1
OS --> T2
PS --> T3
T1 --> EMAIL
T1 --> ANALYTICS
T2 --> INVENTORY
T2 --> ANALYTICS
T3 --> EMAIL
T3 --> SEARCH
style EventBus fill:#56cc9d,stroke:#333,color:#fff
style Consumers fill:#ffce67,stroke:#333
Event Types
| Type | Description | Example | Size |
| --- | --- | --- | --- |
| Domain Event | Something significant happened in the business | OrderPlaced, UserRegistered | Small (metadata + IDs) |
| Integration Event | Event shared between services (bounded contexts) | PaymentCompleted consumed by Order service | Small |
| Event-Carried State Transfer | Event contains full state (eliminates need to query source) | Debezium captures INSERT/UPDATE/DELETE from DB binlog | Row-level |
Event Sourcing
graph LR
CMD["Command:<br/>PlaceOrder"]
CMD --> ES["Event Store<br/>(append-only log)"]
ES --> E1["OrderCreated"]
ES --> E2["ItemAdded (x3)"]
ES --> E3["PaymentReceived"]
ES --> E4["OrderShipped"]
ES -->|"Replay events"| STATE["Current State:<br/>Order #123<br/>Status: Shipped<br/>Items: 3<br/>Total: $59.99"]
style ES fill:#56cc9d,stroke:#333,color:#fff
style STATE fill:#6cc3d5,stroke:#333,color:#fff
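Replaying events to rebuild state amounts to a left fold over the log. The sketch below assumes simplified event payloads and a hypothetical `apply_event` function; it tracks only status and item count, not prices.

```python
from functools import reduce
from typing import Dict, List


def apply_event(state: Dict, event: Dict) -> Dict:
    """Return the next state after one event, without mutating the input."""
    kind = event["type"]
    if kind == "OrderCreated":
        return {"order_id": event["order_id"], "status": "Created", "items": 0, "paid": False}
    if kind == "ItemAdded":
        return {**state, "items": state["items"] + 1}
    if kind == "PaymentReceived":
        return {**state, "paid": True, "status": "Paid"}
    if kind == "OrderShipped":
        return {**state, "status": "Shipped"}
    return state  # ignore unknown event types so old consumers survive new events


def replay(events: List[Dict]) -> Dict:
    """Rebuild current state by folding over the append-only log."""
    return reduce(apply_event, events, {})


event_log = [
    {"type": "OrderCreated", "order_id": 123},
    {"type": "ItemAdded"},
    {"type": "ItemAdded"},
    {"type": "ItemAdded"},
    {"type": "PaymentReceived"},
    {"type": "OrderShipped"},
]
current = replay(event_log)
midway = replay(event_log[:3])
```

Replaying a prefix of the log (`midway`) yields the state as of that point in time, which is what makes audit and time-travel debugging possible with this pattern.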
Reads and writes have different performance profiles and scaling needs
Write side: Normalized, optimized for consistency and validation
Read side: Denormalized, pre-computed views optimized for queries
Sync mechanism: Events from write side update read projections (async)
Trade-off: Eventual consistency between write and read models
Pairs with: Event Sourcing (events feed both write log and read projections)
Idempotent Event Processing
Problem: Network failures → events may be delivered multiple times.
Consumer must handle duplicates safely.
Solutions:
1. Idempotency key in every event:
Event: { "id": "evt_abc123", "type": "PaymentReceived", "data": {...} }
Consumer:
- Before processing, check: "Have I seen evt_abc123?"
- If yes → skip
- If no → process + record evt_abc123 in processed_events table
2. Idempotent operations (naturally safe):
- SET operations (overwrite): last write wins
- Upsert with same data: same result regardless of count
3. Transactional outbox pattern:
- Write business data + event to same DB (single transaction)
- Background process reads outbox table → publishes to Kafka
- Guarantees: if data saved, event will eventually publish
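A minimal sketch of the transactional outbox, using SQLite as the service database and a plain callable in place of a Kafka producer. Table names, `place_order`, and `relay_once` are all hypothetical.

```python
import json
import sqlite3
from typing import Callable, List

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
db.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
    "payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
)
db.commit()


def place_order(order_id: str) -> None:
    """Write the order and its event in one local transaction."""
    event = {"id": f"evt_{order_id}", "type": "OrderPlaced", "order_id": order_id}
    with db:  # both inserts commit together or not at all
        db.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_once(publish: Callable[[dict], None]) -> int:
    """Publish pending outbox rows in order; mark each row only after publish succeeds."""
    rows = db.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays pending for the next run
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


sent: List[dict] = []
place_order("o1")
first_pass = relay_once(sent.append)
second_pass = relay_once(sent.append)
```

If the relay crashes after publishing but before marking the row, the event is sent again on the next run. The outbox therefore gives at-least-once publishing, which is why it is paired with the idempotency key from option 1.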
Q10: How Do You Design for Service Mesh and Inter-Service Communication?
Answer:
A service mesh is an infrastructure layer that handles service-to-service communication, providing observability, security (mTLS), and traffic management without changing application code. It’s typically implemented as sidecar proxies alongside each service.
| Capability | With Service Mesh | Without Service Mesh |
| --- | --- | --- |
| Security (mTLS) | Automatic encryption + identity between all services | Each service manages certs manually |
| Traffic management | Canary releases, A/B testing, fault injection | Custom load balancer config per service |
| Observability | Automatic metrics, traces, access logs from proxy | Instrument every service manually |
| Retries & timeouts | Configurable retry policies per route | Each service implements retry logic |
| Circuit breaking | Auto-stop traffic to failing services | Library-based (Hystrix, Resilience4j) |
| Rate limiting | Per-service traffic control | Centralized rate limiter service |
| Access control | Policy-based authorization (which service can call which) | Manual firewall rules / code checks |
Service Mesh Comparison
| Feature | Istio | Linkerd | Consul Connect |
| --- | --- | --- | --- |
| Proxy | Envoy | Linkerd2-proxy (Rust) | Envoy or built-in |
| Complexity | High (many CRDs) | Low (lightweight) | Medium |
| Performance | Moderate overhead | Low overhead | Low overhead |
| Features | Full-featured (traffic, security, observability) | Core features, simple | Service discovery + mesh |
| Best for | Large orgs needing full control | Teams wanting simplicity | HashiCorp ecosystem users |
Traffic Management Patterns
| Pattern | Purpose | Configuration |
| --- | --- | --- |
| Canary | Route 5% traffic to v2, 95% to v1 | Weight-based routing |
| Header-based routing | Internal testers get v2 via header x-version: canary | Match rules on headers |
| Fault injection | Inject 500ms delay to test resilience | Delay/abort rules for testing |
| Mirroring | Copy production traffic to test environment | Traffic shadowing (no impact to users) |
| Circuit breaking | Max 100 concurrent requests per service | Connection pool limits |
| Retry budget | Max 20% additional requests as retries | Prevent retry storms |
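The retry-budget row can be illustrated with a simple counter: retries are allowed only while they stay at or below 20% of primary requests. `RetryBudget` is a hypothetical class for this sketch; real proxies typically track the ratio over a sliding time window instead of lifetime totals.

```python
class RetryBudget:
    """Allow retries only while they stay under a fixed ratio of primary requests."""

    def __init__(self, ratio: float = 0.2) -> None:
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_acquire_retry(self) -> bool:
        """Return True and count the retry if the budget allows one more."""
        if self.retries + 1 > self.requests * self.ratio:
            return False
        self.retries += 1
        return True


budget = RetryBudget(ratio=0.2)
for _ in range(10):
    budget.record_request()
granted = [budget.try_acquire_retry() for _ in range(3)]
```

With 10 primary requests the budget covers two retries and rejects the third, so a failing dependency sees at most 1.2x its normal load instead of a retry storm.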
When to Use (and NOT Use) a Service Mesh
Use a service mesh when:
✅ Running 10+ microservices in production
✅ Need mTLS between all services (zero trust)
✅ Want consistent observability without code changes
✅ Complex traffic routing (canary, A/B, fault injection)
✅ Need policy-based access control
Do NOT use when:
❌ Fewer than 5 services (overhead not worth it)
❌ Team doesn't have Kubernetes expertise
❌ Simple request-response with no special routing
❌ Latency-critical paths where sidecar overhead matters (~1-3ms)
❌ Monolith or early-stage product
Summary Table
| # | Topic | Key Concepts |
| --- | --- | --- |
| 1 | Load Balancing | L4 vs L7, algorithms (round robin, least connections, consistent hashing), health checks, sticky sessions |
| 2 | Caching | Cache-aside, write-through, write-behind, eviction policies, Redis vs Memcached, stampede prevention |
| 3 | Message Queues | Kafka vs RabbitMQ vs SQS, delivery guarantees, DLQ, partitions, consumer groups |
| 4 | Microservices | Service communication, Saga pattern, service discovery, database per service |