
Foundations

15 min read

TL;DR

Languages age. Clouds rebrand. Frameworks churn. What transfers across all of it is a small set of mental models — complexity, latency, consistency, CAP/PACELC tradeoffs, delivery semantics, idempotency, backpressure, state, composition. Learn these well enough to reach for them reflexively and most "new" problems turn out to be old problems in new syntax. This roadmap is the map of that small set and the order to visit them.

You will be able to

  • Name the nine foundational models and say what each one predicts.
  • Spot the signature of a missing foundation when a design or incident looks weird.
  • Explain why every "exactly-once" claim you will ever read is really at-least-once plus idempotency.

The Map


The stations form a dependency chain — the one most engineers stumble through in practice. You can't reason about retries until you understand delivery semantics, and you can't reason about those until you understand what consistency means. State and composition cut across everything; they show up at every station.

Station 1 — Complexity, and its lies

Big-O is not the answer. Big-O is the shape of the answer once the input is large enough for constants and cache effects to disappear. On real hardware, with real inputs, a linear scan over a contiguous array beats a tree walk that has the same asymptotic class, because prefetchers love contiguity and hate pointers.

  n small                  n medium                 n large
  ┌────────┐               ┌────────┐               ┌────────┐
  │ const  │ ← dominates   │  c·n   │ ← dominates   │  n²    │ ← dominates
  └────────┘               └────────┘               └────────┘
  O(n²) may win            Linear takes over         Asymptotic wins

The model you want: every algorithm has a crossover point below which constants beat class. Know it for the hot paths. When someone says "but this is O(log n)," ask "over what range?"
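The crossover point can be made concrete with a toy cost model. A minimal sketch, assuming illustrative constants — the 120 below is a stand-in for per-step cache-miss cost, not a measured number:

```python
import math

def linear_cost(n, c=1.0):
    """Cost model for a contiguous scan: small constant per element."""
    return c * n

def tree_cost(n, c=120.0):
    """Cost model for a pointer-chasing lookup: few steps, but each one
    is expensive. The constant 120 is illustrative, not measured."""
    return c * math.log2(n)

def crossover(f, g, lo=2, hi=10**7):
    """Smallest n at which f stops beating g, or None if it never does."""
    return next((n for n in range(lo, hi) if f(n) > g(n)), None)

# Below n_star, the "worse" asymptotic class wins on this model.
n_star = crossover(linear_cost, tree_cost)
```

Under these constants the linear scan wins for every n below roughly a thousand — which is exactly the regime most hot-path collections live in.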

WARNING

Cache behavior matters more than complexity class for anything numeric. A HashMap<Integer, Integer> lookup can cost 10x what a contiguous array scan costs at the sizes modern CPUs care about.

Go deeper: Sedgewick's "Algorithms" Ch. 1 on empirical analysis; Ulrich Drepper's "What Every Programmer Should Know About Memory" §3; a weekend benchmarking sorted array vs TreeMap lookup across n = 10…10^7.

Station 2 — Latency numbers you refuse to forget

Every performance argument is secretly an argument about a latency table. Memorize it once and entire debates collapse.

  L1 cache reference ......................... ~1 ns
  Branch mispredict .......................... ~3 ns
  L2 cache reference ......................... ~4 ns
  Mutex lock/unlock .......................... ~17 ns
  Main memory reference ...................... ~100 ns
  Send 1 KB over 10 Gbps link ................ ~1,000 ns      =    1 µs
  Compress 1 KB with Snappy .................. ~2,000 ns      =    2 µs
  Read 1 MB sequentially from memory ......... ~4,000 ns      =    4 µs
  SSD random read ............................ ~16,000 ns     =   16 µs
  Read 1 MB sequentially from SSD ............ ~50,000 ns     =   50 µs
  Round trip within same datacenter .......... ~500,000 ns    =  0.5 ms
  Read 1 MB sequentially from disk (HDD) ..... ~20,000,000 ns =   20 ms
  Packet CA → Netherlands → CA ............... ~150,000,000 ns = 150 ms

The model you want: every hop is a multiplier. L1 to disk is 10^7. Same-host to cross-region is 10^5. Architects who feel latency viscerally design differently from architects who don't.
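The multipliers fall straight out of the table. A minimal sketch using the rounded figures above (names and the hop-chain model are illustrative):

```python
# Rough figures from the latency table, in nanoseconds.
LATENCY_NS = {
    "main_memory_ref": 100,
    "ssd_random_read": 16_000,
    "same_dc_round_trip": 500_000,
    "cross_region_round_trip": 150_000_000,
}

def budget_ms(hops):
    """Total latency of a sequential chain of hops, in milliseconds."""
    return sum(LATENCY_NS[h] for h in hops) / 1_000_000

# One hundred chatty same-DC calls still cost a third of ONE cross-region call.
chatty_local = budget_ms(["same_dc_round_trip"] * 100)   # 50 ms
one_remote   = budget_ms(["cross_region_round_trip"])    # 150 ms
```

This is the back-of-envelope math you want to be able to do in your head: any architecture review that starts with "it's just one more call" should end with this table.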

TIP

The numbers move slightly each year; the ratios between tiers barely move. Memorize the ratios.

Go deeper: Jeff Dean's "Numbers Everyone Should Know" (2011, still 90% valid after you adjust absolutes); Brendan Gregg's latency scale; a day of perf stat on hot loops in your own code.

Station 3 — Consistency, honestly

"Consistent" without a modifier is meaningless. The useful levels, strongest to weakest:

  • Linearizable — every operation appears to take effect at a single instant, in real-time order. Looks like a single machine. Most expensive.
  • Sequential — operations form a single total order consistent with each client's order, but not necessarily real-time. Cheaper than linearizable.
  • Causal — if A caused B, every observer sees A before B. Concurrent events can be seen in any order.
  • Read-your-writes — within a session, you see your own writes. A minimum dignity for user-facing apps.
  • Monotonic reads — a client never sees time move backward. Forgetting this is how a user sees a post, then their refresh makes it disappear.
  • Eventual — if you stop writing, replicas converge. Says nothing about when.

The model you want: weaker consistency is cheaper and more available, but pushes work onto the application (conflict resolution, reconciliation, idempotent consumers).
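Read-your-writes and monotonic reads are cheap to sketch with a session token — the client remembers the highest write version it has seen and refuses replicas that haven't caught up. A minimal sketch; all class and function names here are hypothetical:

```python
class Replica:
    """An eventually consistent replica that tracks the highest
    write version it has applied."""
    def __init__(self):
        self.version = 0
        self.store = {}

    def apply(self, version, key, value):
        self.store[key] = value
        self.version = max(self.version, version)

def session_read(replicas, key, token):
    """Session guarantee: only accept a replica that has caught up to
    the token (the client's last-seen version). This gives the client
    read-your-writes and monotonic reads on top of eventual consistency."""
    for r in replicas:
        if r.version >= token:
            return r.store.get(key)
    return None  # no replica is fresh enough; caller waits or retries

# A write lands on one replica; the client records version 1 as its token.
fresh, stale = Replica(), Replica()
fresh.apply(1, "post", "hello")
```

The stale replica is simply skipped until replication catches it up — which is exactly the "work pushed onto the application" the model predicts.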

CAUTION

"Strong consistency" is a marketing term with exactly the specificity of "fast car." Always ask: linearizable, sequential, or snapshot isolation? The gap between those three is orders of magnitude of coordination cost.

Go deeper: Jepsen reports for any database you actually use; Kleppmann's Designing Data-Intensive Applications ch. 9; Peter Bailis's "Highly Available Transactions" paper.

Station 4 — CAP, and its honest sibling PACELC

CAP says: during a network Partition, a system can be Consistent or Available, not both. That's a theorem and it's correct, but it's half the story.

PACELC adds: Else (no partition), the system trades Latency for Consistency. So a real system is characterized by two letters — what it gives up under partition, and what it gives up when healthy.

   Partition?
    yes│    ┌─────────────┐
       ├───▶│ CP   or  AP │   what you give up during a partition
       │    └─────────────┘
    no │    ┌─────────────┐
       └───▶│ EL   or  EC │   what you give up when healthy
            └─────────────┘

   Real systems: PC/EC (Spanner), PA/EL (Cassandra), PA/EC (MongoDB w/ defaults)

The model you want: CAP is the rare case, PACELC is the everyday case. Most of the time your system is not partitioned, and the tradeoff you actually feel is latency vs consistency.

WARNING

"We solve CAP" is the distributed-systems equivalent of "we abolished gravity." Nobody solves CAP; people pick a corner and decorate it. Ask which corner — and what it costs you when healthy.

Go deeper: Abadi's PACELC paper; MongoDB, DynamoDB, Cassandra, and Spanner docs read side by side on write latency vs staleness guarantees.

Station 5 — Delivery semantics

Three levels, in increasing cost and honesty:

  • At-most-once — cheap, loses messages under failure. UDP-ish.
  • At-least-once — the honest default of distributed messaging. Losses become duplicates.
  • Exactly-once — a property, not a mechanism. Always implemented as at-least-once plus an idempotent consumer, or as a transaction across the message broker and the side-effect store (Kafka's transactional producer + consumer offsets). There is no magic.

The model you want: "exactly-once" is a property you engineer into the consumer, not a guarantee you get from the broker.

CAUTION

Every time someone says "we need exactly-once," what they usually need is "I don't want the side effect to happen twice." Those are different problems. The first is hard; the second is idempotency (next station).

Go deeper: Kafka's transactional-producer docs read end to end; Confluent's "Exactly-once Semantics Are Possible" post; Tyler Treat's "You Cannot Have Exactly-Once Delivery."

Station 6 — Idempotency and retries

An operation is idempotent when applying it N times has the same effect as applying it once. Idempotency is the prerequisite for every safe retry.


The model you want: every retryable operation needs an idempotency key the server dedupes against. This is what Stripe's Idempotency-Key header, SQS deduplication IDs, and Kafka's producer IDs all are. Same mechanism, different names.
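The mechanism is small enough to sketch in full. This is the pattern behind those headers, not any vendor's actual implementation — names and the result shape are illustrative:

```python
class ChargeService:
    """Dedupes charges by idempotency key: the first call applies the
    side effect, every replay returns the stored first result."""
    def __init__(self):
        self.balance = 0
        self._results = {}   # idempotency key -> first result

    def charge(self, idempotency_key, amount):
        if idempotency_key in self._results:
            return self._results[idempotency_key]   # replay: no new effect
        self.balance += amount                       # side effect, once
        result = {"charged": amount, "balance": self.balance}
        self._results[idempotency_key] = result
        return result

svc = ChargeService()
first = svc.charge("key-1", 100)
retry = svc.charge("key-1", 100)   # duplicate delivery or client retry
```

In production the `_results` map lives in durable storage with a TTL, and the check-then-apply step needs to be atomic — but the shape is exactly this.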

Retry policy completes the picture:

  • Backoff — exponential (or decorrelated jitter) so retries don't synchronize.
  • Jitter — randomize the wait so a herd doesn't retry in lockstep.
  • Budget — cap total retry time; prefer giving up to cascading upstream.
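All three knobs fit in a dozen lines. A minimal sketch of the decorrelated-jitter variant from the AWS post, with a hard budget (defaults here are illustrative):

```python
import random

def retry_delays(base=0.1, cap=2.0, budget=10.0, seed=None):
    """Decorrelated-jitter backoff: each delay is drawn from
    uniform(base, 3 * previous), clamped to cap. The generator
    stops once the total sleep would exceed the budget, so the
    caller gives up instead of retrying forever."""
    rng = random.Random(seed)
    sleep, spent = base, 0.0
    while spent + sleep <= budget:
        yield sleep
        spent += sleep
        sleep = min(cap, rng.uniform(base, sleep * 3))
```

Usage is a plain loop: `for delay in retry_delays(): attempt(); time.sleep(delay)` — when the generator runs dry, the budget is spent and you fail upward instead of cascading.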

WARNING

Retry without backoff is a thundering herd. Retry without jitter is a thundering herd that ticks like a metronome. Retry without a budget is how partial failure becomes total outage.

Go deeper: AWS's "Exponential Backoff and Jitter" post; Stripe's idempotency-key docs; the Netflix Hystrix paper (dated but still the clearest failure-domain writeup).

Station 7 — Backpressure and flow control

A queue that cannot push back is a memory leak with extra steps. Backpressure is the mechanism by which a slow consumer tells a fast producer to slow down.

 producer ──▶ [ queue ] ──▶ consumer
                ├─ bounded + blocking push → producer waits
                ├─ bounded + drop-on-full  → producer loses, but survives
                └─ unbounded               → OOM when consumer slows 💥

The model you want: an unbounded queue is not a queue, it's a memory leak waiting for a traffic spike. Every queue needs a bound and a full-policy (block, drop, or spill).
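The bound-plus-policy pair is two lines in most languages. A minimal sketch using Python's standard `queue` module, simulating a burst that arrives while the consumer is stalled:

```python
import queue

q = queue.Queue(maxsize=4)        # the bound
dropped = 0

# A burst of 10 items arrives while the consumer is stalled.
for item in range(10):
    try:
        q.put_nowait(item)        # full-policy: drop (block and spill are
    except queue.Full:            # the other two legitimate choices)
        dropped += 1
```

The process sheds six items and survives with bounded memory. Swap `put_nowait` for a blocking `put` and the producer waits instead — either way, someone upstream feels the pressure, which is the whole point.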

CAUTION

"We'll just scale up the consumer" is not flow control. It's a prayer with a credit card attached, and the bill arrives before the prayer.

Go deeper: Reactive Streams spec (the minimal interface for backpressure across languages); the Akka flow-control docs; Little's Law applied to your own queues.

Station 8 — State and effects

Code is either pure (output is a function of input, no side effects) or impure (reads or writes the world — the clock, the DB, the network, a global, a file). The distinction is operational, not ideological: pure code is easy to test, safe to parallelize, cheap to cache. Impure code is the opposite, in exactly the amount that matches its impurity.

     input ─▶  ┌──────┐  ─▶ output        pure
               │  f   │
               └──────┘
                                            ─ parallelize freely
                                            ─ cache freely
                                            ─ test with one assertion

     input ─▶  ┌──────┐  ─▶ output        impure
               │  g   │  ─▶ DB, clock, net
               └──────┘
                                            ─ needs a seam for testing
                                            ─ ordering constraints
                                            ─ race windows

The model you want: push impurity to the edges. Keep the core pure; wrap it in a thin impure shell that handles I/O. This is the single most transferable design habit across languages — it's called "functional core, imperative shell" in one tradition and "hexagonal architecture" in another, and it's the same idea.
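A minimal sketch of the shape — the pricing rule and the three I/O seams are hypothetical, stand-ins for whatever your DB and payment calls look like:

```python
# Functional core: pure, trivially testable, safe to cache and parallelize.
def apply_discount(total_cents, is_member):
    """Pure pricing rule (illustrative): members get 10% off."""
    return total_cents * 90 // 100 if is_member else total_cents

# Imperative shell: all I/O lives here, injected through thin seams.
def checkout(fetch_cart_total, fetch_membership, charge, user_id):
    """Read the world, call the pure core, write the world.
    The three callables are the seams for DB and payment I/O."""
    total = fetch_cart_total(user_id)
    member = fetch_membership(user_id)
    charge(user_id, apply_discount(total, member))
```

The core is tested with one assertion per rule; the shell is tested with three lambdas standing in for the world — no mocking framework required.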

TIP

When testing feels hard, the usual fix is moving the impurity, not adding more mocks. If a function mixes a computation with a DB call, split it.

Go deeper: Gary Bernhardt's "Boundaries" talk (mandatory, 30 min); Domain Modeling Made Functional; any Haskell tutorial until monads click — not for Haskell itself, for the clarity about effects.

Station 9 — Composition, the cross-cutter

Composition is the glue for every other station. Functions compose. Objects compose (preferred over inheritance). Services compose. Promises compose. Streams compose. One model, many scales.


The model you want: prefer composition over inheritance, over configuration, over magic. The composition you can read top-to-bottom is the composition that survives the next refactor.
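At the function scale, the idea is ten lines. A minimal sketch of left-to-right composition — the pipeline reads top to bottom, which is the "read it and survive the refactor" property in miniature:

```python
from functools import reduce

def compose(*steps):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    def pipeline(value):
        return reduce(lambda acc, step: step(acc), steps, value)
    return pipeline

normalize = compose(str.strip, str.lower)
slugify   = compose(normalize, lambda s: s.replace(" ", "-"))
```

Note that `slugify` is built from `normalize`, not by editing it — composed pieces stack; inherited pieces entangle.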

WARNING

"Just extend this base class" is the siren song of a codebase two years from being rewritten. Composition asks more of you up front and gives more back forever.

Go deeper: Design Patterns ch. 1 (read only the first chapter, the rest has aged); Gilad Bracha's talks on modularity; any week spent rewriting a deep inheritance hierarchy as composition.

How the stations connect

The stations are not independent. Read them as a single chain, and the "why" of distributed systems gets much shorter:


The chain runs: costs → numbers → guarantees → tradeoffs → messaging → retries → flow control. The side stations — state and composition — are the techniques that make every other station tractable.


Test yourself

A service behind a load balancer sees duplicate charges after a deploy. The team proposes "exactly-once delivery." What's the real problem and the cheap fix?

The real problem is that the charge endpoint is not idempotent. The cheap fix is an idempotency key (a UUID the client generates per intended charge). "Exactly-once delivery" from the broker doesn't help if the consumer reapplies on retry. See Station 5 and Station 6.

Your p99 latency doubles when a single remote dependency lives in another region. Why is that predictable from the latency table alone?

Same-DC round trips are ~0.5 ms; cross-region is ~150 ms — roughly 300×. Any dependency that lands in the p99 critical path and sits cross-region will dominate the budget. The table predicts this without you measuring it. See Station 2.

A background worker has an unbounded in-memory queue. The queue is "fine" in staging. Predict what happens in production and name the fix.

Under a traffic spike or a consumer slowdown, the queue grows without bound until the process OOMs. The fix is a bounded queue with an explicit full-policy (block, drop, or spill to disk). Staging didn't surface it because the arrival rate never outran the service rate. See Station 7.