System Design Interview Prep: The Engineer's Framework for Whiteboard Architecture

Wrok | 14 min read

Most engineers who fail system design interviews don't fail because they don't know the concepts. They fail because they run out of time, wait to be asked questions instead of driving the session, and mistake drawing boxes for demonstrating judgment.

The system design round is the one interview where knowing things is necessary but not sufficient. You need to know distributed systems fundamentals, manage a 45-minute conversation under pressure, and know which problems to surface before the interviewer asks. The engineers who pass aren't the ones with the most knowledge. They're the ones who've practiced running the session.

Here's the complete framework.


What the Interviewer Is Actually Scoring

Before you learn a single design pattern, understand what you're being evaluated on. The rubric is broadly consistent across Google, Meta, and Amazon, even if the scoring language differs:

1. Requirements scoping — Do you clarify before you design? Jumping into architecture before agreeing on scale, latency requirements, and functional scope is the single most common failure mode. Engineers who build the wrong thing fluently still fail.

2. Design breadth — Can you produce a high-level architecture that covers the full system lifecycle? Interviewers want to see all major components before any single one gets deep-dived. A candidate who spends 30 minutes on the database schema and never touches the API layer or caching hasn't demonstrated breadth.

3. Deep-dive quality: When the interviewer pushes on a specific component ("walk me through how you'd handle a 10x traffic spike on this service"), can you reason about it concretely? This is where staff-level candidates separate from senior ones: they don't just answer, they pre-empt the question by volunteering the hard problems.

4. Trade-off reasoning — The most differentiated signal. For any design decision, can you explain what you gave up? Why a message queue instead of direct service calls? Why NoSQL here and SQL there? Candidates who state preferences without explaining trade-offs are guessing, and interviewers know it.

5. Communication under pressure — This is often what tips borderline candidates. Silence while drawing is a bad signal. Narrating your thinking out loud, asking clarifying questions before pivoting, and explicitly flagging assumptions keeps the interviewer in the conversation and demonstrates clarity.


The 45-Minute Framework

The session has five phases. Practice them with a timer until the pacing is automatic; knowing the material is a different skill from delivering it under a clock.

Phase 1: Requirements (Minutes 0–5)

Don't touch the whiteboard. Ask first.

Functional requirements: What does this system need to do? State the core use cases explicitly and confirm them. "I'm going to design this to support creating short URLs and redirecting visitors to the original URL. Out of scope: analytics, custom slugs, and link expiration — unless you want me to include those."

Non-functional requirements: Scale, latency, availability, durability. The numbers shape every subsequent decision. A system handling 100 requests per second looks nothing like one handling 100,000. Ask:

  • How many daily active users?
  • Read-heavy or write-heavy?
  • What latency SLA does the product need?
  • Any consistency requirements (can users see stale data)?

Scope confirmation: Restate what you're building and what you're not. Treat this as a contract. Five minutes spent clarifying here prevents 20 minutes designing the wrong system.

Phase 2: Estimates and API (Minutes 5–15)

Before drawing the architecture, do back-of-envelope math and define the API.

Scale estimation: Pick two or three numbers that drive your architecture. For a URL shortener: 1B URLs stored, 100:1 read-to-write ratio, 500ms P99 redirect latency. Write them visibly. Interviewers check whether your subsequent design choices reflect your own estimates.
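To show how those numbers turn into capacity figures, here is a rough sketch of the arithmetic. The 1B URLs and the 100:1 ratio come from the estimate above; the five-year accumulation window and the 500-byte record size are assumptions added purely for illustration, so swap in whatever you agreed with your interviewer.

```python
# Back-of-envelope sizing for the URL shortener estimate.
# Values marked ASSUMPTION are illustrative and not part of the estimate itself.

SECONDS_PER_DAY = 86_400

total_urls = 1_000_000_000   # from the estimate: 1B URLs stored
read_write_ratio = 100       # from the estimate: 100 reads per write
retention_years = 5          # ASSUMPTION: the 1B URLs accumulate over 5 years
bytes_per_record = 500       # ASSUMPTION: short code + long URL + metadata

seconds = retention_years * 365 * SECONDS_PER_DAY
writes_per_sec = total_urls / seconds              # average, not peak
reads_per_sec = writes_per_sec * read_write_ratio
storage_gb = total_urls * bytes_per_record / 1e9

print(f"writes/sec ~ {writes_per_sec:.1f}")
print(f"reads/sec  ~ {reads_per_sec:.0f}")
print(f"storage    ~ {storage_gb:.0f} GB")
```

Under these assumptions the averages come out to roughly 6 writes per second, a few hundred reads per second, and about 500 GB of storage. These are averages; peak traffic is usually a multiple of them, and saying so out loud is part of the signal.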

API definition: State the contract before building the system behind it. For each core feature:

POST /shorten  { url: string } → { shortCode: string }
GET  /{shortCode}              → 302 redirect

This step grounds the rest of the session. If your API changes mid-session, you've exposed that you didn't think it through upfront.
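If it helps to see the contract as code, the following is a minimal in-memory sketch of the two endpoints, not a real web server. The class name `ShortenerApi`, the injected `make_code` generator, the 201 status on creation, and the 400 and 404 error cases are all assumptions for illustration; the contract above only fixes the request and response shapes and the 302.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ShortenerApi:
    """In-memory stand-in for the two endpoints (hypothetical, for illustration)."""

    make_code: Callable[[str], str]                       # hypothetical code generator
    store: Dict[str, str] = field(default_factory=dict)   # shortCode -> original URL

    def post_shorten(self, body: dict) -> Tuple[int, dict]:
        """POST /shorten  { url } -> { shortCode }"""
        url = body.get("url", "")
        if not url.startswith(("http://", "https://")):
            # ASSUMPTION: reject anything that is not an absolute http(s) URL.
            return 400, {"error": "url must be an absolute http(s) URL"}
        code = self.make_code(url)
        self.store[code] = url
        return 201, {"shortCode": code}                   # ASSUMPTION: 201 Created

    def get_code(self, short_code: str) -> Tuple[int, dict]:
        """GET /{shortCode} -> 302 redirect"""
        url = self.store.get(short_code)
        if url is None:
            return 404, {}                                # ASSUMPTION: unknown code
        return 302, {"Location": url}


# Usage: a counter-based generator stands in for real key generation.
counter = iter(range(1, 1_000_000))
api = ShortenerApi(make_code=lambda url: f"c{next(counter)}")
status, body = api.post_shorten({"url": "https://example.com/a"})
print(status, body)
print(api.get_code(body["shortCode"]))
```

Writing the contract this concretely makes it easier to notice gaps, such as what an unknown short code should return, before you start drawing boxes.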

Phase 3: High-Level Design (Minutes 15–25)

Now draw the boxes. Five to seven major components, minimal detail. Show how data flows through the system end-to-end. The rule: every component should be nameable and have a clear responsibility.

A typical skeleton:

  • Client (mobile app, browser, API consumer)
  • Load balancer / API gateway
  • Application servers (stateless, horizontally scalable)
  • Primary data store (SQL or NoSQL — state your choice and why)
  • Cache (where and what you cache)
  • Async processing layer (queue, event stream — if applicable)
  • CDN (for read-heavy, latency-sensitive content)

Don't hide uncertainty. Say "I'm choosing Postgres here because the data is relational and the write volume is manageable — I'd reconsider if writes hit 100K/sec." A stated reason beats a silent assumption every time.

Phase 4: Deep Dives (Minutes 25–42)

The interviewer will pick two or three components and probe. Senior candidates wait to be asked. Staff candidates say "The two areas I'd want to deep-dive are the cache invalidation strategy and the database sharding plan — which would you rather start with?"

Volunteer the hard problems. This is the single biggest signal of level.

Common deep-dive areas by problem type:

  • URL shortener: collision-resistant key generation, cache hit rate optimization, redirect latency
  • Chat system: message delivery guarantees, presence tracking at scale, fan-out to large groups
  • News feed: fan-out on write vs. fan-out on read trade-off, ranking algorithm, cache staleness
  • Rate limiter: distributed enforcement without a single-point shared state, clock skew, Redis vs in-process counters
  • Notification system: exactly-once delivery, retry logic, unsubscribe state propagation

Phase 5: Wrap-Up (Minutes 42–45)

Reserve the final three minutes to surface what you'd want to address if you had more time: monitoring and alerting, failure modes, scaling plan, known weaknesses in the current design. Interviewers weight this heavily — it's a proxy for production judgment and operational awareness.


Five Core Problem Archetypes

Every classic system design question is an instance of a pattern. Learn the patterns, not the specific problems.

Pattern 1: Key-Value Storage at Scale

Problems: URL shortener, Pastebin, feature flags

The core challenge: generate a short, unique identifier, map it to a value, and serve reads at very high volume with low latency.

Key decisions:

  • Hash generation: MD5 of the original URL truncated to 6 chars introduces collision risk. Base62 encoding of an auto-incremented ID is more predictable. Discuss the trade-off.
  • Storage: Write once, read many. A simple key-value store (DynamoDB, Redis with persistence) beats a relational database for this access pattern.
  • Cache: Cache the hot URLs. Link popularity is heavily skewed (roughly Zipfian), so a small fraction of URLs receives most of the traffic; the 80/20 split is a common rule of thumb, not a guarantee. Use LRU eviction with a TTL.
  • Redirect latency: Return a 302 (temporary, so every click passes through your server) vs. a 301 (permanent, so browsers may cache the redirect and skip your server entirely). The right answer depends on whether you want accurate click analytics: a 302 preserves them, while a 301 trades them for lower latency and less load.
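The Base62 option from the key-generation bullet can be sketched in a few lines. This is a minimal illustration, assuming the ID comes from some auto-incrementing source such as a database sequence; the alphabet ordering (digits, then lowercase, then uppercase) and the function names are choices made here, not a standard.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an arbitrary choice for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))   # most significant digit first


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


print(base62_encode(125))   # "21"  (2 * 62 + 1)
print(62 ** 6)              # number of distinct 6-character codes
```

Six characters give 62^6, or about 56.8 billion, distinct codes, and because each ID maps to exactly one code there are no collisions to resolve. The cost is that sequential IDs produce guessable codes, which is the trade-off worth naming against a truncated hash.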

Pattern 2: Real-Time Messaging

Problems: Chat system, live notifications, collaborative editing

The core challenge: deliver messages with low latency and strong delivery guarantees between millions of concurrent users.

Key decisions:

  • Protocol: HTTP long-polling works but creates connection overhead. WebSockets maintain a persistent connection and are appropriate for bidirectional real-time communication. Server-Sent Events work for unidirectional updates.
  • Message delivery guarantees: At-most-once (fire and forget) vs. at-least-once (acknowledge and retry) vs. exactly-once (idempotent operations with deduplication). Each level has a cost.
  • Presence tracking: Knowing who's online requires either polling (expensive) or heartbeat-based TTLs in a fast key-value store.
  • Fan-out to groups: Sending a message to a 10,000-person group requires a choice: fan-out on write (write to each recipient's mailbox at message creation) or fan-out on read (store the message once, compute the audience at read time). Large, write-heavy groups favor fan-out on read, because fan-out on write multiplies every message by the member count.
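The delivery-guarantee bullet is easier to reason about with a concrete case. The sketch below pairs at-least-once delivery (retry until acknowledged) with receiver-side deduplication by message ID, which is one common way to get effectively-once processing. All names are hypothetical, the "network" is a direct function call, and lost acks are simulated with the `ack_lost_on_attempts` parameter.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class Receiver:
    """Applies each message ID at most once, however often it is delivered."""

    seen_ids: Set[str] = field(default_factory=set)
    applied: List[str] = field(default_factory=list)
    deliveries: int = 0

    def deliver(self, msg_id: str, body: str) -> bool:
        self.deliveries += 1
        if msg_id not in self.seen_ids:   # deduplicate on the message ID
            self.seen_ids.add(msg_id)
            self.applied.append(body)
        return True                       # ack duplicates too, so the sender stops


def send_at_least_once(
    receiver: Receiver,
    msg_id: str,
    body: str,
    ack_lost_on_attempts: FrozenSet[int] = frozenset(),   # simulated ack loss
    max_attempts: int = 5,
) -> int:
    """Retry until an ack gets through. Returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        ack = receiver.deliver(msg_id, body)
        if ack and attempt not in ack_lost_on_attempts:
            return attempt
    raise TimeoutError(f"no ack for {msg_id} after {max_attempts} attempts")


receiver = Receiver()
attempts = send_at_least_once(receiver, "m1", "hello", ack_lost_on_attempts=frozenset({1}))
print(attempts, receiver.deliveries, receiver.applied)
```

Here the first ack is lost, so the message is delivered twice but applied once. The cost you would call out in the interview is the deduplication state: the set of seen IDs has to be stored, bounded, and expired somehow.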

Pattern 3: Feed Generation

Problems: Twitter/X timeline, Instagram feed, LinkedIn activity feed

The core challenge: serve a personalized, ranked stream of content to hundreds of millions of users with sub-second latency.

Key decisions:

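  • Fan-out model: Fan-out on write pushes each new post into every follower's precomputed feed, which keeps reads cheap but costs one write per follower. For accounts with millions of followers (celebrities), that would saturate the write path. Fan-out on read stores the post once and assembles the feed at request time, which is write-efficient but makes reads more expensive for users who follow thousands of accounts. Hybrid models write to followers up to a threshold (e.g., 10K followers) and pull posts from "celebrity" accounts at read time.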
  • Ranking: A simple chronological feed is easy to implement but produces poor engagement. A ranking layer applies a scoring model and re-sorts the candidate set before serving.
  • Cache invalidation: A user's feed should reflect recent content. Cache TTLs, event-driven invalidation, and "lazy" re-computation on miss are each valid approaches with different freshness vs. cost trade-offs.
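A hybrid fan-out model can be sketched as follows. This is a toy, in-memory illustration under several assumptions: the class and method names are hypothetical, an author counts as a "celebrity" based on follower count at publish time, and there is no ranking beyond newest-first ordering.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Post = Tuple[int, str, str]   # (timestamp, author, text)


class HybridFeed:
    def __init__(self, celebrity_threshold: int = 10_000):
        self.threshold = celebrity_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)   # author -> followers
        self.following: Dict[str, Set[str]] = defaultdict(set)   # user -> authors
        self.inbox: Dict[str, List[Post]] = defaultdict(list)    # precomputed feeds
        self.outbox: Dict[str, List[Post]] = defaultdict(list)   # celebrity posts

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def publish(self, ts: int, author: str, text: str) -> None:
        post = (ts, author, text)
        if len(self.followers[author]) > self.threshold:
            self.outbox[author].append(post)        # fan-out on read: store once
        else:
            for follower in self.followers[author]:
                self.inbox[follower].append(post)   # fan-out on write: one per follower

    def read_feed(self, user: str, limit: int = 20) -> List[Post]:
        posts = list(self.inbox[user])
        for author in self.following[user]:
            posts.extend(self.outbox.get(author, []))   # pull celebrity posts now
        return sorted(posts, reverse=True)[:limit]      # newest first


feed = HybridFeed(celebrity_threshold=2)
feed.follow("alice", "bob")
for fan in ("alice", "carol", "dave"):
    feed.follow(fan, "star")             # "star" exceeds the threshold of 2
feed.publish(1, "bob", "hi")
feed.publish(2, "star", "news")
print(feed.read_feed("alice"))
```

The post from "star" is written once and merged in at read time, while the post from "bob" is copied into each follower's inbox. A real system would also need to handle accounts that cross the threshold in either direction, which is a good follow-up to volunteer.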

Pattern 4: Distributed Rate Limiting

Problems: API rate limiter, DDoS protection, abuse prevention

The core challenge: enforce per-user or per-IP limits across multiple application servers without a single bottleneck.

Key decisions:

  • Algorithm: Token bucket (bursty traffic allowed up to a ceiling) vs. sliding window counter (smooth rate enforcement) vs. leaky bucket (fixed output rate). Token bucket is the most common choice for API rate limiting because it allows reasonable bursts.
  • Distributed enforcement: Storing counters in a single Redis instance is simple but creates a single point of failure and a network round-trip per request. Local in-process counters with periodic sync to a shared store reduce latency at the cost of brief over-limit windows.
  • Failure modes: What happens when the rate-limit store is unavailable? Fail open (allow traffic) or fail closed (reject all)? State your choice and the business context that drives it.
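For reference, a token bucket fits in about twenty lines. The sketch below is a single-process version with an injectable clock so it can be tested deterministically; the class name and the `cost` parameter are choices made for this example. A distributed limiter would keep the same state (token count and last refill time) in a shared store, which is where the latency and failure-mode questions above come in.

```python
import time
from typing import Callable


class TokenBucket:
    """Single-process token bucket: bursts up to `capacity`, refills continuously."""

    def __init__(
        self,
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.last
        # Refill for the elapsed time, never exceeding the ceiling.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=5, refill_per_sec=1, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(6)]   # five allowed, sixth rejected
fake_now[0] = 2.0                            # two seconds later: two tokens refilled
later = [bucket.allow() for _ in range(3)]
print(burst, later)
```

The `cost` argument lets one request consume more than one token, which is one way to adapt the same structure to limits measured in something other than request count.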

Pattern 5: Search Indexing

Problems: Web search, product search, full-text search

The core challenge: index large corpora and serve low-latency query results.

Key decisions:

  • Inverted index: The core data structure. Maps terms to the documents that contain them. Discuss how you'd build and update it.
  • Write path: Ingestion pipeline → tokenization → posting list updates. Batch vs. streaming updates depending on freshness requirements.
  • Ranking: BM25 for text relevance, combined with behavioral signals (click-through rate, dwell time) for a hybrid scoring model.
  • Query processing: Single-term lookups are fast. Multi-term queries require posting list intersection. Prefix matching (autocomplete) needs a different data structure (trie or Elasticsearch's edge n-gram tokenizer).
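The inverted index and posting list intersection described above can be sketched like this. It is a deliberately small in-memory version: the tokenizer is a simple lowercase split on non-alphanumeric characters, posting lists are Python sets rather than sorted, compressed lists, and there is no ranking. All names are illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)   # term -> doc IDs

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """AND query: return IDs of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # Intersect starting from the shortest posting list to keep the work small.
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(lists[0])
        for posting in lists[1:]:
            result &= posting
        return sorted(result)


index = InvertedIndex()
index.add(1, "Distributed rate limiting with Redis")
index.add(2, "Rate limiting algorithms: token bucket")
index.add(3, "Search indexing and ranking")
print(index.search("rate limiting"))   # documents containing both terms
print(index.search("rate redis"))
```

Starting the intersection from the rarest term is the same idea production engines use, and it is a natural lead-in to discussing how posting lists are sorted, compressed, and sharded.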

Senior vs. Staff: Where the Bar Shifts

The system design round is evaluated differently depending on the level you're interviewing for.

At the senior level (for example L5 at Google, E5 at Meta, or SDE III at Amazon), the expectation is that you know the standard patterns, can produce a complete high-level design, and can answer deep-dive questions with reasonable accuracy. The interviewer does most of the steering. You respond well.

At the staff level (for example L6 at Google, E6 at Meta, or Principal at Amazon), you're expected to drive the session like a technical lead running a design review. Senior candidates wait to be prompted; staff candidates volunteer the hard problems before the interviewer surfaces them. The staff bar includes:

  • Pre-emptive risk surfacing: "The hot spot I'd be most worried about is the fan-out path for accounts with >1M followers — let me walk through how I'd handle that before we move on."
  • Operational judgment: Monitoring, alerting, runbooks, and degraded-mode behavior are part of the design, not an afterthought. Mentioning specific SLIs and SLOs (e.g., "I'd alert on P99 latency exceeding 300ms and cache hit rate dropping below 85%") signals production experience.
  • Cross-functional awareness: Data privacy, compliance constraints, cost optimization, and abuse vectors are in scope at staff level. A staff candidate designing a notification system asks "what's our legal obligation on data retention for message delivery receipts?" A senior candidate doesn't.
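To make the SLI example concrete, here is a small sketch of how the two alert conditions quoted above (P99 latency over 300ms, cache hit rate under 85%) might be evaluated over a window of samples. The function names and the nearest-rank percentile method are assumptions for illustration; a real monitoring stack would compute these from histograms or counters rather than raw lists.

```python
import math
from typing import List, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def slo_alerts(
    latencies_ms: Sequence[float],
    cache_hits: int,
    cache_lookups: int,
    p99_limit_ms: float = 300.0,     # threshold from the example above
    min_hit_rate: float = 0.85,      # threshold from the example above
) -> List[str]:
    alerts = []
    p99 = percentile(latencies_ms, 99)
    if p99 > p99_limit_ms:
        alerts.append(f"P99 latency {p99:.0f}ms exceeds {p99_limit_ms:.0f}ms")
    hit_rate = cache_hits / cache_lookups if cache_lookups else 1.0
    if hit_rate < min_hit_rate:
        alerts.append(f"cache hit rate {hit_rate:.0%} below {min_hit_rate:.0%}")
    return alerts


window = [50.0] * 98 + [400.0, 450.0]   # two slow requests out of 100
print(slo_alerts(window, cache_hits=80, cache_lookups=100))
```

With two slow requests in a hundred, the P99 lands on a slow sample and both alerts fire. Being able to explain why a mean would hide those two requests is the kind of detail that reads as production experience.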

For principal and distinguished engineers (L7+), the session often becomes a collaborative technical discussion rather than an evaluation — you're expected to challenge the problem framing and propose alternatives the interviewer hadn't considered.


The 2026 Addition: AI System Design

If you're interviewing at a company that ships AI features — which is almost any company in 2026 — expect at least one prompt involving AI infrastructure. These questions were niche in 2024 and are now mainstream.

Real system design prompts showing up in 2026 FAANG loops include:

  • Design a customer-support chatbot that uses a third-party LLM
  • Design the serving infrastructure for an LLM that handles 100K requests per day
  • Design a retrieval-augmented generation (RAG) pipeline for a document search product
  • Design a rate limiter for LLM API calls that accounts for variable token costs

You don't need to be an ML engineer to answer these. The infrastructure patterns are extensions of what you already know:

  • RAG pipeline: Document ingestion → chunking → embedding → vector store → retrieval → LLM prompt assembly → response. Know the read path (query → embedding → nearest-neighbor search → inject into context) well enough to explain latency contributors.
  • LLM serving: The key cost driver isn't compute per request; it's token throughput. Rate limits are typically expressed in tokens per minute, not requests per second. Reuse work across requests that share a prompt prefix (prefix caching of the attention KV cache) to reduce redundant computation.
  • Vector databases: Pinecone, Weaviate, and pgvector each make different latency vs. consistency trade-offs. Know that approximate nearest neighbor (ANN) search is standard at scale because exact search compares the query against every stored vector, so its cost grows linearly with the corpus.
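The RAG read path (query, embedding, nearest-neighbor search, prompt assembly) can be sketched end to end in plain Python. Two loud assumptions: `embed` here is a toy bag-of-words counter standing in for a real embedding model, and `retrieve` does exact, brute-force search over every chunk, which is exactly the linear scan that ANN indexes exist to avoid. The prompt template and all function names are illustrative, and no LLM is called.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Exact nearest-neighbor search: scores every chunk, O(n) per query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    """Assemble the context that would be sent to the LLM (hypothetical template)."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "Password resets require email verification.",
]
print(retrieve("how long do refunds take", chunks, k=1))
print(build_prompt("how long do refunds take", chunks, k=1))
```

Each stage is a latency contributor you can name: embedding the query, searching the index, and the LLM call itself, which typically dominates. In a real pipeline the chunk embeddings are computed once at ingestion time and stored, not recomputed per query as this sketch does.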

The failure mode in AI system design questions is the same as in classical ones: not knowing when to apply a pattern. Knowing that "RAG uses a vector database" is not the same as knowing when to use RAG instead of fine-tuning, or why you'd use a vector store vs. a full-text search engine for a given retrieval problem.


How to Practice

Use a timer. Set a 45-minute countdown and design a full system start to finish in every practice session. Knowing the material without timing yourself is like knowing a language without speaking it.

Talk out loud. Practice narrating your reasoning to someone — a study partner, a rubber duck, a recording. Interviewers penalize silence. Narrating while building is a skill that only develops through repetition.

Study problems by pattern, not by name. Don't memorize "how to design Twitter." Understand the fan-out trade-off, then apply it to Twitter, Instagram, Discord, and any other system that has followers/publishers and feeds/consumers.

Use Alex Xu's System Design Interview (Volumes 1 and 2) for canonical problems. Each chapter ends with a summary of the core trade-offs; read those first as a study shortcut. NeetCode's system design playlist is the best free video supplement.

Do at least two timed mock sessions with a real person before your onsite. Explaining architecture while someone asks follow-up questions is qualitatively different from solo practice. interviewing.io and Pramp offer peer and paid mock system design rounds.


TL;DR

  1. You're scored on five things: requirements scoping, design breadth, deep-dive quality, trade-off reasoning, and communication. Knowing patterns is necessary but not sufficient.
  2. Follow the 45-minute framework: 5 minutes on requirements, 10 minutes on scale estimates and API, 10 minutes on high-level design, 17 minutes on deep dives, 3 minutes on operational wrap-up.
  3. Learn five core archetypes: key-value storage, real-time messaging, feed generation, distributed rate limiting, and search indexing. Every classic problem is an instance of one of these patterns.
  4. The senior-to-staff gap is about driving the session. Senior candidates answer well. Staff candidates pre-empt the hard questions and surface operational concerns unprompted.
  5. AI system design is now mainstream. Know RAG pipelines, LLM serving patterns, and vector databases at the infrastructure level — not the ML model level.
  6. Practice with a timer and out loud. Two timed mock sessions with a real person will do more for your onsite performance than two more weeks of reading.

Related: Meta, Google, and Amazon Interview Loops Decoded — how the system design round fits into each company's specific evaluation framework and what each company scores differently.

Related: The Technical Interview Reboot for 2026 — how AI-assisted coding rounds and format changes are reshaping the engineering interview across the board.

Related: The Engineer's Behavioral Interview Playbook — while you're prepping system design, don't let the behavioral round catch you off guard.


