We use cookies to improve your browsing experience, support the operation of this site, and understand how visitors use our content.
You can accept all cookies, accept only essential cookies, or deny non-essential cookies.
Privacy Policy
10 most-asked system design interview questions covering URL shortener, rate limiter, chat system, social media feed, file storage, video streaming, notification system, search autocomplete, key-value store, and API gateway.
This is Part 3 of our System Design Interview QA series, covering the 10 most frequently asked system design questions at FAANG+ companies. Each question follows the proven 4-step framework: Requirements → High-Level Design → Deep Dive → Trade-offs.
1. Alice sends message via WebSocket → Chat Gateway
2. Gateway publishes to Kafka topic (partitioned by chat_id)
3. Message Router consumes from Kafka:
a. Persist message to Cassandra (status: "sent")
b. Look up Bob's connection in Session Store (Redis)
c. If online → push via WebSocket → update status: "delivered"
d. If offline → send push notification
4. Bob opens app → fetch undelivered messages → update: "delivered"
5. Bob reads message → client sends ACK → update: "read"
Group Messaging Fan-out
Strategy
How It Works
Best For
Fan-out on write
Copy message to each member’s inbox at send time
Small groups (<100 members)
Fan-out on read
Store once, recipients pull on connect
Large groups / channels
Hybrid
Fan-out on write for small groups, on read for large
Production systems (WhatsApp)
Q4: Design a Social Media News Feed (Twitter/Instagram)
Answer:
A news feed system aggregates and ranks posts from users you follow, delivering a personalized, near real-time content stream.
Feed generation <500ms; High availability; 500M DAU; Eventual consistency acceptable
Capacity
2 posts/user/day → 1B posts/day; Average 300 followers; Feed shows top 20 posts
Fan-out Strategy: The Core Decision
graph LR
subgraph FanOutWrite["Fan-out on Write (Push)"]
POST1["New Post"] --> COPY["Copy to all<br/>followers' feeds"]
COPY --> F1["Alice's Feed Cache"]
COPY --> F2["Bob's Feed Cache"]
COPY --> F3["Carol's Feed Cache"]
end
subgraph FanOutRead["Fan-out on Read (Pull)"]
OPEN["Open Feed"] --> FETCH["Fetch posts from<br/>all followed users"]
FETCH --> M1["User A's posts"]
FETCH --> M2["User B's posts"]
FETCH --> M3["User C's posts"]
end
style FanOutWrite fill:#56cc9d,stroke:#333,color:#fff
style FanOutRead fill:#6cc3d5,stroke:#333,color:#fff
Aspect
Fan-out on Write (Push)
Fan-out on Read (Pull)
When
At post creation time
At feed request time
Latency
Fast reads (pre-computed)
Slow reads (compute on demand)
Write cost
High (copy to all followers)
Low (store once)
Hot users
Celebrity with 50M followers → 50M writes
No write amplification
Best for
Normal users (<10K followers)
Celebrities / high-follower users
Hybrid Approach (Twitter/Instagram’s Actual Design)
Normal users (<10K followers):
→ Fan-out on write: pre-compute feed for all followers
→ Feed is ready when they open the app
Celebrity users (>10K followers):
→ Fan-out on read: don't pre-compute
→ When user opens feed, merge:
- Pre-computed feed (from normal users they follow)
- On-demand fetch (from celebrities they follow)
→ Rank the merged result
Feed Ranking
Signal
Weight
Source
Recency
High
Post timestamp
Engagement
High
Likes, comments, shares on the post
Relationship
Medium
Interaction history with poster
Content type
Medium
User preference (video vs text)
Diversity
Low
Avoid showing too many posts from one user
Q5: Design a File Storage System (Dropbox/Google Drive)
Answer:
A cloud file storage system lets users upload, download, and sync files across devices with high reliability and global availability.
graph TD
CLIENT["Desktop / Mobile Client"]
CLIENT -->|"Upload (chunked)"| API["API Gateway"]
API --> META["Metadata Service"]
META --> METADB["Metadata DB<br/>(MySQL)"]
API --> UPLOAD["Upload Service"]
UPLOAD --> QUEUE["Upload Queue"]
QUEUE --> STORE["Object Storage<br/>(S3)"]
CLIENT2["Another Device"]
CLIENT2 --> SYNC["Sync Service"]
SYNC --> NOTIFY["Notification Service<br/>(WebSocket / Long Poll)"]
SYNC --> CDN["CDN<br/>(Download cache)"]
style API fill:#56cc9d,stroke:#333,color:#fff
style STORE fill:#6cc3d5,stroke:#333,color:#fff
style NOTIFY fill:#ffce67,stroke:#333
Requirements
Type
Requirement
Functional
Upload/download files; Sync across devices; File versioning; Share files/folders
Non-functional
High reliability (no data loss); Low latency downloads; Support files up to 10GB; 100M users, 1M DAU
Capacity
1 file/user/day, avg 5MB → 5TB/day; Total storage: ~1.5PB
Chunked Upload Design
Why chunking?
- Resume interrupted uploads (mobile networks)
- Deduplicate at chunk level (save storage)
- Parallel upload of chunks (faster)
- Delta sync: only upload changed chunks
Chunk size: 4MB (balance between overhead and resume granularity)
Upload flow:
1. Client splits file into 4MB chunks
2. Client computes SHA-256 hash per chunk
3. Client asks server: "Do you have chunk with hash X?"
- Yes → skip (deduplication)
- No → upload chunk
4. After all chunks uploaded → server assembles file
5. Server updates metadata DB with file record
6. Notification service alerts other devices to sync
Multi-channel: push, SMS, email, in-app; User preferences and opt-out; Notification templates; Rate limiting per user
Non-functional
Soft real-time (<30s for push, minutes for email); At-least-once delivery; 10B notifications/day; Pluggable providers
Architecture Deep Dive
Event flow:
1. Service emits event: {"type": "order_shipped", "user_id": 123, "data": {...}}
2. Notification Service:
a. Check user preferences (opted-in channels, quiet hours)
b. Check rate limits (max 5 push/hour per user)
c. Render template with user data
d. Enqueue to channel-specific queues with priority
3. Channel workers:
a. Dequeue message
b. Call provider API (APNs, Twilio, SES)
c. Handle retries with exponential backoff
d. Log delivery status
4. Analytics:
- Track: sent, delivered, opened, clicked, unsubscribed
Handling Scale and Reliability
Challenge
Solution
Provider failures
Retry with exponential backoff + fallback providers
Duplicate notifications
Idempotency key per notification (dedup in Redis)
Quiet hours / time zones
Store user timezone, schedule delivery accordingly
Notification fatigue
Rate limiting + batching (digest emails)
Provider rate limits
Queue with controlled concurrency per provider
Delivery tracking
Webhook callbacks from providers + polling
Q8: Design a Search Autocomplete System
Answer:
An autocomplete system suggests query completions in real time as users type, based on popularity, personalization, and recency.
graph TD
USER["User types: 'syst'"]
USER --> API["Autocomplete API<br/>(<100ms response)"]
API --> CACHE["Local Cache<br/>(per server)"]
CACHE -->|"miss"| TRIE["Trie Service<br/>(in-memory)"]
TRIE --> RESULTS["Top-K results:<br/>1. system design<br/>2. systems programming<br/>3. systematic review"]
subgraph Offline["Offline Pipeline (hourly)"]
LOGS["Search Logs"] --> AGG["Aggregation<br/>(MapReduce)"]
AGG --> BUILD["Build Trie<br/>(top queries)"]
BUILD --> DEPLOY["Deploy to<br/>Trie Servers"]
end
style API fill:#56cc9d,stroke:#333,color:#fff
style TRIE fill:#6cc3d5,stroke:#333,color:#fff
style Offline fill:#ffce67,stroke:#333
Requirements
Type
Requirement
Functional
Return top 5-10 suggestions per prefix; Rank by popularity / recency / personalization; Handle misspellings (fuzzy match)
Non-functional
P99 latency <100ms; Support 100K QPS; Update suggestions without downtime
Trie Data Structure
Trie for ["system", "systems", "syslog", "syntax"]:
root
|
s
|
y
/ \
s n
| |
t t
| |
e a
| |
m x
/
s
Each node stores:
- Character
- Top-K queries passing through this prefix
- Frequency / score for ranking
Two-Phase Architecture
Phase
Component
Latency
Frequency
Online (serve)
Trie servers + cache
<100ms
Per keystroke
Offline (build)
MapReduce + Trie builder
Minutes
Every 15-60 min
Ranking Signals
Signal
Description
Weight
Query frequency
How often this query is searched
High
Recency
Trending queries weighted higher
Medium
Personalization
User’s past search history
Medium
Freshness
New events (e.g., breaking news)
Variable
Q9: Design a Distributed Key-Value Store
Answer:
A distributed key-value store provides fast, reliable storage and retrieval of data across a cluster of machines, handling partitioning, replication, and failure recovery.
get(key) → value; put(key, value); delete(key); Support arbitrary value sizes (up to 1MB)
Non-functional
High availability (AP system); Tunable consistency; Low latency (<10ms P99); Horizontal scaling (add nodes without downtime)
CAP Theorem Trade-offs
graph TD
CAP["CAP Theorem:<br/>Pick 2 of 3"]
CAP --> C["Consistency<br/>(every read gets latest write)"]
CAP --> A["Availability<br/>(every request gets a response)"]
CAP --> P["Partition Tolerance<br/>(works despite network failures)"]
C --- CP["CP Systems:<br/>MongoDB, HBase, Redis Cluster"]
A --- AP["AP Systems:<br/>Cassandra, DynamoDB, CouchDB"]
style CAP fill:#56cc9d,stroke:#333,color:#fff
style CP fill:#6cc3d5,stroke:#333,color:#fff
style AP fill:#ffce67,stroke:#333
Key Design Components
Component
Design Choice
Rationale
Partitioning
Consistent hashing with virtual nodes
Even distribution, minimal reshuffling when nodes join/leave
Replication
Replicate to N=3 clockwise neighbors
Fault tolerance
Consistency
Quorum: W + R > N (configurable)
Tunable: W=1,R=3 (fast writes) or W=2,R=2 (balanced)
Conflict resolution
Vector clocks + last-write-wins
Handle concurrent writes during partitions
Failure detection
Gossip protocol
Decentralized, scalable node health checks
Write path
Write-ahead log → MemTable → SSTable
Fast writes, durable, efficient reads
Consistency Levels (DynamoDB/Cassandra Style)
Setting
Write (W)
Read (R)
Behavior
Strong
W=N
R=1 or W=1, R=N
Always latest value
Quorum
W=2, R=2 (N=3)
Latest value if no concurrent writes
Eventual
W=1
R=1
Fastest, may read stale
Q10: Design an API Gateway and Load Balancer
Answer:
An API Gateway is the single entry point for all client requests, handling routing, authentication, rate limiting, and protocol translation. A Load Balancer distributes traffic across backend servers for high availability and throughput.
Route /api/users/* → User Service, /api/orders/* → Order Service
Authentication
Validate JWT/OAuth tokens before forwarding
Rate limiting
Per-client, per-endpoint throttling
Request transformation
REST ↔︎ gRPC, request/response rewriting
Circuit breaker
Stop forwarding to unhealthy services
Caching
Cache GET responses for static/semi-static data
Logging & metrics
Centralized request logging, latency tracking
SSL termination
Handle HTTPS at the edge
Load Balancing Algorithms
Algorithm
How It Works
Best For
Round Robin
Distribute sequentially to each server
Equal-capacity servers
Weighted Round Robin
Higher-capacity servers get more requests
Mixed hardware
Least Connections
Route to server with fewest active connections
Variable request duration
IP Hash
Hash client IP → always same server
Session affinity
Consistent Hashing
Hash-ring-based routing
Cache servers, stateful services
Health Checks
Active health checks:
- Gateway pings /health on each backend every 5-10s
- Unhealthy after 3 consecutive failures
- Healthy after 2 consecutive successes
- Remove unhealthy servers from rotation
Passive health checks:
- Monitor response codes and latency
- If >50% of requests to a server fail → mark unhealthy
- Automatic recovery when success rate improves
High Availability Design
Layer
Redundancy Strategy
API Gateway
Multiple instances behind DNS round-robin or network LB
Load Balancer
Active-passive pair with virtual IP failover
Backend services
Minimum 3 instances per service, across availability zones
Database
Primary-replica with automatic failover
Cache
Redis Cluster with replication
Summary Table
#
System
Key Concepts
1
URL Shortener
Base62 encoding, key generation, read-heavy caching, 301 vs 302
2
Rate Limiter
Token bucket, sliding window, Redis counters, API gateway placement