FEM: Fullstack System Design

November 06, 2025

https://frontendmasters.com/workshops/fullstack-system-design/

Tools

https://app.diagrams.net/

Diagramming

Pizza Shop

Customer places an order
Restaurant receives the order
Staff takes the order after completing other orders
If missing ingredients, has to prepare/acquire them
Order ready: delivery or in store
Customer receives order

Pizza Ordered

graph TD
   A1[Customer Places Phone Order] --> C
   A2[Customer Places Website Order] --> C
   A3[Customer Places In Store Order] --> C
   B[Take Payment]
   C[Staff Receives Order]
   C --> |Staff Busy| C
   C --> |Has Ingredients| D
   C --> |Missing Ingredients| G
   D[Staff Makes Order] --> E
   E[Counter or Delivery Receives Complete Order]
   E --> |Paid| F
   E --> |Needs payment| B --> E
   F[Customer Gets Order]
   G[Acquire/Prepare Ingredients] --> D

Flow

graph TD
    A[Client]
    A --> |Cache Hit| C
    A --> |Cache Miss| B(Send Request)
    B[Load Balancer or CDN] --> D
    C@{ shape: diamond, label: "Cache" } --> A
    D[Server]
    D --> |Cache Hit| A
    D --> |Cache Miss| F(Return Data)
    E@{ shape: diamond, label: "Cache" }
    F@{shape: cyl, label: 'Database'} --> E

Translating Business Requirements

TODO App

What is actually needed?
Create a TODO
Read a TODO or list
Update a TODO
Complete a TODO
Delete a TODO

Mobile Banking

How realtime?
Manage Users: CRUD
Authenticate a User
Authorize a User
Display recent transactions
Display historical transactions
Make a payment
Transfer money
Withdraw money
Show Balance
Apple/Google Pay

Non-functional

performant, very fast
live data
secure

URL Shortener

Questions:

Can users customize the shortened URL?
How long are short URLs persisted?
What about keys/duplicates?
What happens if the site is no longer available?
How long is too long for a short URL?
HTTPS?
Performance expectations
Can a URL change?
Can a URL be deleted?

Functional:

Users should be able to convert long URLs into shortened versions
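
One common way to produce a short key is to encode a numeric ID in base 62. The sketch below assumes every long URL is assigned an auto-incrementing integer ID, which is an assumption and not something decided in these notes; `encode_base62` and `decode_base62` are hypothetical helper names.

```python
import string

# 0-9, a-z, A-Z: 62 characters that are safe to use in a URL path
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base-62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them
    return "".join(reversed(chars))


def decode_base62(s: str) -> int:
    """Decode a base-62 string back into the integer ID."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Seven base-62 characters cover 62^7 (about 3.5 trillion) IDs, which helps answer "how long is too long" once the expected number of URLs is known.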

TODO App

Functional

tasks can only be text
users should be able to:
- read their todos
- edit todos
- create new todos
- mark a todo as complete
- delete a todo
- reorder todos
- create a list
- edit list
- filter or sort
- delete a list
- ~~share a list~~
- create account
- login

non-functional

only authenticated users can access tasks
task operations must complete within 1000ms

CAP Theorem

Reliability: ability for a system to function correctly over time
Availability: proportion of time a system is operational and accessible
Resiliency: how well does the system handle failures
Consistency: how do we ensure that all the users see the same data at the same time

Distributed systems can only guarantee 2 of the 3 CAP properties at a time.

Consistency: every read receives the most recent write or an error
Availability: a request for data gets a response, even if one or more nodes are down
Partition Tolerance: the cluster must continue to work despite any number of communication breakdowns between nodes in the system

~~C + A: only works without network issues~~
C + P: Always shows the latest data, but requests may fail or time out during a partition
A + P: Always responds but might show outdated data

Non-functional Requirements

Mobile Banking

How many MAUs?
How many transactions per user?
How often do users transfer money, i.e. write through the app?
How often do the users use the app per month?

Non-functional

the system should have 4 nines of availability
transactions should be backed up daily
transaction data must be encrypted in transit and at rest
transactions cannot be lost
every transaction and user action must be audited
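
To make "4 nines" concrete, it helps to convert the availability target into a downtime budget. The function below is a small sketch; `allowed_downtime_minutes` is a hypothetical name and the 365-day year is an assumption.

```python
def allowed_downtime_minutes(nines: int, days: int = 365) -> float:
    """Minutes of downtime permitted over `days` at the given number of nines."""
    unavailability = 10 ** (-nines)      # 4 nines -> 0.0001 (i.e. 99.99% available)
    total_minutes = days * 24 * 60
    return total_minutes * unavailability


# 4 nines leaves roughly 52.6 minutes of downtime per year
yearly_budget = allowed_downtime_minutes(4)
```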

URL Shortener

How many MAUs?
What is the average number of requests per second?
What is the maximum latency allowed?
What is the max length of the URL?
Do URLs expire?

Non-functional

Redirects should happen in no more than 500ms
the system should support 1 million RPS
long URLs can be at most 3 KB
short URLs can be at most 0.3 KB

Modeling

graph LR
  A[Requirements] --> B
  B[Entity Modeling] --> C
  C[API Design] --> D
  D[Endpoints Optional]

Entities

erDiagram
    USER ||--o{ TASK : has
    USER ||--o{ LIST : has
    USER {
        string username
        string password
        string id
    }
    TASK {
        string contents
        string status
        string id
    }
    LIST }|..o{ TASK: contains
    LIST {
        string description
        string tasks
        string id
    }
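
The entities above could be expressed in code roughly as follows. This is a sketch only: the class is called `TodoList` to avoid shadowing Python's built-in `list`, the `"open"` default status is an assumption, and `password_hash` replaces the diagram's `password` field on the assumption that only a hash is stored.

```python
from dataclasses import dataclass, field
from uuid import uuid4


def new_id() -> str:
    """Generate a random string ID (hypothetical helper)."""
    return uuid4().hex


@dataclass
class Task:
    contents: str
    status: str = "open"                      # assumed values: "open" / "complete"
    id: str = field(default_factory=new_id)


@dataclass
class TodoList:
    description: str
    task_ids: list[str] = field(default_factory=list)  # references to Task.id
    id: str = field(default_factory=new_id)


@dataclass
class User:
    username: str
    password_hash: str                        # assumption: store a hash, not the raw password
    id: str = field(default_factory=new_id)
```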

Protocols

HTTP: stateless, simple, human readable, supported by all browsers
WebSockets: bi-directional communication, persistent connection
Server-Sent Events: one-way communication (server to client), human readable
gRPC: binary protocol (HTTP/2), strongly typed contracts
REST: multiple endpoints, human readable, supported by all, stateless
GraphQL: single endpoint, precise data retrieval, self documenting API, strongly typed

graph LR
    A[Is this internal service-to-service communication?] --> |Yes| E
    A --> |No| B
    B[Do you need realtime updates?] --> |Yes| C
    B --> |No| D
    C[Do you need bi-directional communication?] --> |Yes| F
    C --> |No| G
    D[Do you have complex data from many sources?] --> |Yes| H
    D --> |No| I
    E[consider gRPC]
    F[WebSockets]
    G[Server-Sent Events]
    H[GraphQL]
    I[REST]
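
The decision tree above can also be read as a small function. `choose_protocol` and its parameter names are hypothetical; the branches mirror the diagram one to one.

```python
def choose_protocol(
    internal: bool,
    realtime: bool = False,
    bidirectional: bool = False,
    complex_data: bool = False,
) -> str:
    """Walk the protocol decision tree and return the suggested option."""
    if internal:
        return "gRPC"                      # internal service-to-service traffic
    if realtime:
        # Realtime: pick by whether the client also needs to push data
        return "WebSockets" if bidirectional else "Server-Sent Events"
    # Not realtime: pick by how complex the data requirements are
    return "GraphQL" if complex_data else "REST"
```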

Database Scaling

Partitioning

Same DB
Easier
Transactional guarantees
Queries span multiple tables

Sharding

Load balanced over different machines
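
A minimal sketch of how a request might be routed to a shard, assuming simple hash-based sharding on a string key (the notes do not say which sharding scheme the workshop used). `shard_for` is a hypothetical helper.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Use a cryptographic hash rather than built-in hash(), which is
    # randomized per process for strings and so is not stable across machines
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Plain modulo hashing moves most keys when `shard_count` changes, which is why consistent hashing is a common alternative.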

Replication

making copies of your data across multiple servers or locations
increases fault tolerance
increases read performance

Primary / Replica

All writes go to the primary server
Replicas copy data from the primary and handle read requests
If the primary fails, a replica can take over

graph LR
    A[Service] --> |READ| D
    B[Service] --> |WRITE| E
    C[Service] --> |READ| F
    D[Replica] --> |READ| E
    E[Primary]
    F[Replica] --> |READ| E
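
A toy model of the primary/replica split, using in-memory dicts in place of real database servers. `ReplicatedStore` is a hypothetical class, and replication is done synchronously here for simplicity; in practice it is often asynchronous, which is where stale reads come from.

```python
import itertools


class ReplicatedStore:
    def __init__(self, replica_count: int = 2):
        self.primary: dict = {}
        self.replicas = [dict() for _ in range(replica_count)]
        # Round-robin over replicas to spread read load
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value) -> None:
        """All writes go to the primary, then are copied to each replica."""
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        """Reads are served by replicas, never by the primary."""
        return self.replicas[next(self._next_replica)].get(key)
```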

Primary / Primary

Multiple servers accept reads/writes
Data is synchronized between all servers

graph LR
    A[Service] --> |READ/WRITE| D
    B[Service] --> |READ/WRITE| E
    C[Service] --> |READ/WRITE| F
    D[Primary] <--> E
    E[Primary] 
    F[Primary] <--> E

Peer to Peer

Every server can read and write
Changes are shared with all other servers

graph LR
    A[Peer A] <--> D
    B[Peer B] <--> C
    C[Peer C]
    D[Peer D]
    A <--> B & C <--> D

Strategies

Transactional
Snapshotting
Merging

Caching

Cache Aside (Lazy Loading) READ

Cache miss
Read from database
Update Cache

graph LR
  A[Service] --> |3 WRITE| B
  A --> |2 READ| C
  B@{ shape: diamond, label: "cache" } --> |1| A
  C@{ shape: database, label: "database" }
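
The read path above can be sketched as follows, with plain dicts standing in for the cache and the database. `CacheAsideReader` and the `db_reads` counter are hypothetical names added for illustration.

```python
class CacheAsideReader:
    def __init__(self, database: dict):
        self.database = database
        self.cache: dict = {}
        self.db_reads = 0                      # counts how often the database is hit

    def get(self, key):
        if key in self.cache:                  # 1. try the cache first
            return self.cache[key]
        self.db_reads += 1
        value = self.database.get(key)         # 2. on a miss, read from the database
        if value is not None:
            self.cache[key] = value            # 3. populate the cache for next time
        return value
```

The service owns all three steps here; the cache itself knows nothing about the database.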

Cache Aside (Lazy Loading) WRITE

Write to database
Write to cache

graph LR
  A[Service] --> |1 WRITE| C
  A --> |2 WRITE| B
  B@{ shape: diamond, label: "cache" } 
  C@{ shape: database, label: "database" }

Write Through

Write to cache
Write to DB

graph LR
  A[Service] --> |1 WRITE| B
  B@{ shape: diamond, label: "cache" } 
  B --> |2 WRITE| C
  C@{ shape: database, label: "database" }
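
A sketch of write-through, again with dicts in place of real stores. `WriteThroughCache` is a hypothetical class; the key point is that the database write happens synchronously, before `put` returns.

```python
class WriteThroughCache:
    def __init__(self, database: dict):
        self.database = database
        self.cache: dict = {}

    def put(self, key, value) -> None:
        self.cache[key] = value        # 1. write to the cache
        self.database[key] = value     # 2. write to the database before returning

    def get(self, key):
        return self.cache.get(key)
```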

Read Through

Read from cache
on miss, read from db
write to cache

graph LR
  A[Service] --> |1 READ| B
  B --> |cache hit| A
  B@{ shape: diamond, label: "cache" }  --> |2 READ| C
  C@{ shape: database, label: "database" }
  C --> |3| B

Write Behind

write to cache
immediately return
asynchronously, write to db

graph LR
  A[Service] --> |1 WRITE| B
  B --> |2| A
  B@{ shape: diamond, label: "cache" }  --> |3 WRITE| C
  C@{ shape: database, label: "database" }
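
Write-behind can be sketched with a pending-writes queue. `WriteBehindCache` is a hypothetical class, and `flush` is called explicitly here; in a real system it would run asynchronously in the background.

```python
from collections import deque


class WriteBehindCache:
    def __init__(self, database: dict):
        self.database = database
        self.cache: dict = {}
        self.pending: deque = deque()          # writes not yet persisted

    def put(self, key, value) -> None:
        self.cache[key] = value                # 1. write to the cache
        self.pending.append((key, value))      # 2. return immediately, defer the DB write

    def flush(self) -> int:
        """3. Persist queued writes in order; returns how many were written."""
        written = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.database[key] = value
            written += 1
        return written
```

The trade-off is that anything still in `pending` is lost if the cache fails before a flush.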

Cache Invalidation

Time-based expiration (TTL)
Event-based
Version tagging
Refresh ahead
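
Time-based expiration is the simplest of these to sketch. `TTLCache` is a hypothetical class; the injectable `clock` parameter is an assumption added so that expiry can be tested without sleeping.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict = {}               # key -> (value, expires_at)

    def set(self, key, value) -> None:
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:         # entry is stale: drop it and report a miss
            del self._entries[key]
            return None
        return value
```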

Cache Eviction

LRU is commonly implemented with a doubly linked list and a hash map.

FIFO: first in first out
LIFO: last in first out
LRU: least recently used
MRU: most recently used
LFU: least frequently used
RR: random replacement
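
As an illustration of LRU eviction, here is a minimal sketch built on Python's `collections.OrderedDict`, which keeps keys in order and can move a key to the end cheaply. `LRUCache` is a hypothetical class, not code from the workshop.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()   # oldest first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)               # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)        # evict the least recently used
```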

Estimations

helps ground vague requirements in reality
helps you think about the specifics of the system components
shows the interviewer your thought process
don't have to be precise

Strategy

clarify
- what are you estimating?
  - users, requests, storage
- ask or make reasonable assumptions
  - how many users?
- validate your assumptions
  - write it down
do the math
sanity check the results
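
A worked example of the strategy, using the URL shortener numbers from earlier in these notes (1 million RPS, long URLs up to 3 KB, short URLs up to 0.3 KB). The 1% write ratio is purely an assumption for illustration, and 1 KB is taken as 1000 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# From the requirements
requests_per_second = 1_000_000
long_url_bytes = 3_000
short_url_bytes = 300

# Assumption (not from the notes): 1 in 100 requests creates a new short URL
write_percent = 1

writes_per_second = requests_per_second * write_percent // 100
writes_per_day = writes_per_second * SECONDS_PER_DAY

# Worst case: every record stores a maximum-length long and short URL
bytes_per_record = long_url_bytes + short_url_bytes
storage_bytes_per_day = writes_per_day * bytes_per_record
storage_tb_per_day = storage_bytes_per_day / 1e12   # roughly 2.85 TB per day
```

Sanity check: almost 3 TB of new data per day is a lot for a URL shortener, which suggests either the write ratio or the maximum URL size should be questioned before designing storage around it.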

TODO System Design

Questions

Requirements

graph LR
    A@{shape: curv-trap, label: 'client'} -->B
    A -->|client| FEC
    FEC@{shape: diamond, label: 'cache'}
    B[reverse proxy] -->GO
    B -->WC
    subgraph GO
        direction TB
        D[web server]
        E[web server]
    end
    GO -->F
    F[load balancer]
    F --> C1
    F --> C2
    F --> C3
    WC[cache] --> AUS
    AUS[autoupdate server] -->F
    C1@{shape: diamond, label: 'redis cache'} -->|READ| DBR1
    C2@{shape: diamond, label: 'redis cache'}  -->|WRITE| DBW1
    C3@{shape: diamond, label: 'redis cache'}  -->|READ| DBR2
    DBR1@{shape: database, label: 'database'}
    DBR1 -->|READ| DBW1
    DBW1@{shape: database, label: 'database'}
    DBR2@{shape: database, label: 'database'}
    DBR2 -->|READ| DBW1

Security

SSL/TLS

option 1: termination at load balancer
- most common
- application receives http
option 2: termination at application layer
- load balancer passes through encrypted traffic
- each application instance handles decryption
option 3: re-encryption
- terminate at load balancer
- re-encrypt between load balancer and applications

Authentication

Who are you?

Authorization

Determines permissions, what can you do?
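
The difference between the two checks can be sketched in a few lines. Everything here is hypothetical: the in-memory `USERS` table, the `admin`/`viewer` roles, and the choice of PBKDF2 with 100,000 iterations are assumptions for illustration, not recommendations from the workshop.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


_SALT = os.urandom(16)
# Hypothetical user table; a real system would keep this in a database
USERS = {
    "alice": {"salt": _SALT, "hash": hash_password("s3cret", _SALT), "role": "admin"},
    "bob": {"salt": _SALT, "hash": hash_password("hunter2", _SALT), "role": "viewer"},
}
PERMISSIONS = {"admin": {"read", "write", "delete"}, "viewer": {"read"}}


def authenticate(username: str, password: str) -> bool:
    """Authentication: is this user who they claim to be?"""
    user = USERS.get(username)
    if user is None:
        return False
    candidate = hash_password(password, user["salt"])
    return hmac.compare_digest(user["hash"], candidate)   # constant-time comparison


def authorize(username: str, action: str) -> bool:
    """Authorization: is this (already authenticated) user allowed to do this?"""
    user = USERS.get(username)
    if user is None:
        return False
    return action in PERMISSIONS.get(user["role"], set())
```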

Asynchronicity

helps address the challenge of working with computationally expensive tasks
keeps the system responsive

Expensive Tasks

uploading and processing a large video file
generating a report
processing payments
image resizing or thumbnail creation

Components

message broker
- RabbitMQ
- Kafka
message queue
- RabbitMQ
- Amazon SQS
worker management
- Kubernetes
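
The queue-plus-workers idea can be tried locally with the standard library before reaching for a real broker. This sketch uses `queue.Queue` and threads as a stand-in for a message queue and worker pool; `run_workers` and `handler` are hypothetical names.

```python
import queue
import threading

_STOP = object()   # sentinel telling a worker to shut down


def run_workers(jobs, handler, worker_count: int = 2) -> list:
    """Push jobs onto a queue and let worker threads process them."""
    q: queue.Queue = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            job = q.get()
            if job is _STOP:
                q.task_done()
                return
            try:
                outcome = handler(job)
            except Exception as exc:       # record failures instead of killing the worker
                outcome = exc
            with lock:
                results.append(outcome)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:                       # the producer returns as soon as jobs are queued
        q.put(job)
    for _ in threads:
        q.put(_STOP)
    q.join()                               # wait until every queued item is processed
    for t in threads:
        t.join()
    return results
```

Results arrive in completion order, not submission order, which is typical of queue-based processing.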

Video Upload Service

supported resolutions / formats?
is there a size limit?
do we need generated thumbnails?
how many users are uploading at once?
do we need to process videos?
do we need captions / subtitles?
do we need to process audio?

features

up to 4K
max file size 4 GB
1 upload per user / day @ 1000 users
yes thumbnails
no trim/edit
not today for captions
yes on audio
audio track is separate
no perf metrics on upload speed

entities

erDiagram
    USER ||--o{ MANIFEST : has
    USER ||--o{ VIDEO  : uploads
    METADATA ||--o{ VIDEO : has
    METADATA ||--o{ AUDIO : has
    METADATA ||--o{ THUMBNAILS : has
    VIDEO
    THUMBNAILS
    AUDIO
    METADATA
    MANIFEST

    USER {
        string username
        string password
        string id
    }

user gives a title
user starts uploading a video
user is notified when upload is successful
process video
user notified when processing complete

graph TD
    C[client] -->|POST videos| W
    W[web server] -->|video| SV
    W --> MDB
    SV[source video storage] --> N
    N[notification service] -->|notify| P
    N -->|notify| C
    P[video processing service]
    subgraph B["`
        convert to 4k
        convert to HD
    `"]
        B1[queue]
        B2[queue]
    end
    B --> W
    MDB@{shape: database, label: "database"}
    MDB --> A
    A@{shape: database, label: 'audio'}
    subgraph W ["video broker"]
        W1[worker]
        W2[worker]
    end
    subgraph AB ["audio broker"]
        B1[queue]
        B2[queue]
    end
    AB --> AW
    subgraph AW
        W1[worker]
        W2[worker]
    end
    PV[processed video storage]
    PV -->|get video| SV
    PV --> B