Instructions
A programming language is a set of constraints you choose, and those constraints shape what's easy and what's hard.
The CPU of Act II only runs instructions: numbered operations on registers and memory. People want to write code in something more humane. A programming language closes the gap. It does two jobs. First, it translates source text into instructions the machine can run. Second, it refuses certain operations — touching raw memory, skipping a type check, freeing your own allocations — and the refusal is what gives the language its character. Algorithms and data structures sit above all of this; they belong to the math, not the syntax.
What a program is
A source file is just bytes — the CPU has no idea what for or def or int main() mean. A translator reads that text and produces instructions the CPU does understand. Call it a compiler when its output runs later, an interpreter when its output runs immediately. Either way the work is the same: split source into tokens (lexing), build a tree (parsing), check that names and types are consistent (semantic analysis), and produce an intermediate representation (IR) that's easier to optimize and emit code from.
The IR decouples "what the language means" from "what the CPU runs". A front end lowers source to LLVM IR; a back end takes IR and emits x86 or ARM. Add a new language and every existing back end works for it; add a new CPU and every existing language runs on it. Linters and Language Servers reuse the same front end.
Between front end and back end sits the middle end: a sequence of optimization passes, each consuming IR and producing better IR. A pass might fold 1 + 2 to 3, delete a branch the compiler proves never runs, or inline a function's body at the call site. The same passes appear in every serious compiler — constant folding, dead-code elimination, inlining, loop transforms. Most of a release build's wall-clock goes here, not parsing. -O0 to -O2 typically buys 2-10x runtime speed on the same source.
The back end's two hard jobs are instruction selection (which opcodes for which IR ops) and register allocation (which values live in registers, which spill to the stack). Registers are scarce — x86-64 has 16. Spilling in a hot loop can erase the wins from every other optimization, which is why -O2 matters.
Three paths to execution
The translator can produce instructions at three different times, and the choice shapes everything else about the language.
Ahead-of-time (AOT) compilation translates the whole program to native machine code before it runs. The output is a single binary the OS loads into memory. C, C++, Rust, and Go take this path. Compile time is a one-time cost; runtime overhead is roughly zero.
Interpretation keeps the source (or a parsed tree, or bytecode) and walks it every time. Bash, classic CPython, and Ruby take this path. The interpreter loop adds 10-100x overhead per operation, but startup is instant and code can be modified live.
Bytecode plus a VM sits between. The translator compiles once to a portable byte-stream (.class for Java, .pyc for Python); a runtime — the VM — interprets that bytecode wherever the program lands. Add a JIT ("just-in-time" compiler) to the VM and hot parts get re-compiled to native code at runtime, recovering most of the AOT speed. The JVM, .NET, V8, and modern Python all do this.
__pycache__/ and is interpreted. V8 has four JIT tiers. The shape of each path is durable; the lines between them aren't.Linking: the AOT path's last step
A C program is rarely one source file. Each .c file compiles separately to an object file — machine code with holes where references to other files' functions go. The compiler has no idea where printf lives; it leaves a hole labelled printf and trusts someone to fill it.
The linker is that someone. It scans every object file and library, builds a global table of which file defines each symbol, then patches every hole with the right address. Two files defining the same symbol: link error. No file defining it: "undefined reference" — you forgot -lm or misspelled the function.
Worked example: linking main.o and foo.o into one binary
Two source files, compiled separately:
main.ocontains the machine code formain(), which callsfoo(). Its symbol table says: DEFINESmainat offset0x00; UNDEFINEDfoo(a hole at byte0x12insidemain's code).foo.ocontains the machine code forfoo(). Its symbol table says: DEFINESfooat offset0x00; no undefined references.
The linker runs in three passes:
- Scan defines. Walk every object file once and build a global table:
{ main → main.o:0x00, foo → foo.o:0x00 }. Two files defining the same name here would abort with "multiple definition". - Resolve undefs. For every undefined reference, look it up in the table.
main.oneedsfoo→ found infoo.o. Nothing left undefined; the program is closed. If anything were still missing, the linker would printundefined reference to 'foo'and stop. - Lay out and patch. Pick final addresses (say
mainat0x401000,fooat0x401080), then write those addresses into the holes (the byte atmain.o:0x12is rewritten from00 00 00 00to point at0x401080). This last step is called relocation.
The output a.out now has zero holes. Every CALL instruction inside main jumps to a concrete address. Add a third file bar.o with another UNDEFINED foo and the linker resolves it to the same 0x401080 — that's how one definition serves the whole program.
Function calls and the stack
A running program lives mostly on its call stack — a region of memory that grows downward as functions call each other and shrinks back as they return. Each call pushes a stack frame: the return address, the caller's saved registers, and the callee's local variables.
Where do function arguments live? That's the calling convention — a contract between caller and callee. On x86-64 Linux/macOS, the first six integer arguments arrive in registers rdi, rsi, rdx, rcx, r8, r9; the rest go on the stack; the return value comes back in rax. Windows x64 picks different registers. The CPU doesn't care — these are conventions. Get them wrong and arguments arrive scrambled, which is exactly what happens when you forget extern "C" across a language boundary.
How a JIT works
A JIT recovers AOT speed without losing the bytecode-VM startup model. The runtime starts by interpreting bytecode — slow but instant. Each method gets a counter; once it passes a tier-up threshold (around 1,000 calls for a first-tier JIT, often 10,000 for an aggressive second tier), the JIT compiles that method to native code. Subsequent calls jump to the native version.
A JIT can be more aggressive than an AOT compiler because it sees actual runtime behaviour. If a call site has only ever seen int arguments, the JIT specializes for int and skips the type checks. If a branch has never been taken, the fast path omits it. These are bets. If a later call passes a string instead, the runtime de-optimizes: throws away the native code, falls back to the interpreter, and may re-compile later with the new profile.
Worked example: a hot function's life through a tiered JIT
Picture one method, dot(a, b), called inside a loop a million times. The runtime (think V8 or the JVM) carries a per-method counter and a profile of seen argument types. Threshold for tier 1 is 1,000 calls; tier 2 is 10,000.
| Call # | What runs | Counter | Notes |
|---|---|---|---|
| 1 | Interpreter walks bytecode | 1 | first invocation; slow, no compile cost yet |
| 2-999 | Interpreter, recording types | 999 | every call: a: float[], b: float[] — no surprises |
| 1,000 | Tier-up trigger. Baseline JIT compiles dot to native code. Specializes for float[] args; emits a guard if (typeof(a) !== float[]) deopt. | 1,000 | one-time compile latency (a few ms); next call jumps to native |
| 1,001-9,999 | Native code, with the guard checked each entry | 9,999 | maybe 20× faster than the interpreter |
| 10,000 | Tier-up again. Optimizing JIT re-compiles with inlining, loop unrolling, register allocation tuned to the observed shape. | 10,000 | another compile pause; afterwards ≈100× the interpreter |
| 10,001-N | Heavily optimized native code | — | hottest path; bulk of the program's time spent here |
| Some later call | Caller passes a string instead | — | guard fails → deopt: throw away the native code, resume in the interpreter, eventually re-profile and re-compile |
Two consequences worth noticing. Startup is slower than a pure interpreter because the first few thousand calls pay both the interpreter cost and the profile bookkeeping. Steady state is close to AOT because tier-2 native code knows things an AOT compiler can only guess at — like "this call site has only ever seen one type". Microbenchmarks that run for 100ms never escape tier 0 and report misleadingly slow numbers.
Four trade-offs every language makes
A language is defined as much by what it refuses as by what it allows. The refusals cluster on four axes. There's no neutral position — abstaining is itself a stance.
Control vs safety
Hardware lets you do dangerous things — read uninitialized memory, treat an integer as a pointer, write off the end of an array. A language decides whether to expose that power.
C and assembly expose it. Cast an integer to a pointer and dereference — if the address isn't yours, the result is undefined behaviour: a segfault, a corrupted neighbour, a silently wrong answer, or a security exploit. Python and JavaScript hide it: no raw pointers, every collection access bounds-checked, every mistake a clean exception. Rust threads the needle — pointers exist but are tracked at compile time by ownership rules, and unsafe operations live in explicit unsafe blocks.
The trade is real. C lets you write a memcpy that beats glibc's. Python prevents you from writing a buffer overflow at all.
Speed vs expressiveness
Low-level code runs fast but is slow to write. High-level code is fast to write but slower at runtime. A language picks where on that line it wants to live.
Assembly is fastest but most painful — every function call is a stack manipulation you spelled out. C adds structs, control flow, and a standard library, no garbage collector. Java/Go/C# add managed memory, generics, richer abstractions; expressiveness rises and runtime cost rises with it. Python, Ruby, and Lisp keep climbing — comprehensions, metaclasses, dynamic dispatch on every method call — and pay another order of magnitude.
Rough rule: each step up the abstraction ladder buys you fewer lines per feature and costs you roughly an order of magnitude at runtime — Python is typically 10-50x slower than C on the same task. Rust sits near C on speed but far up on expressiveness; that's its claim.
Static vs dynamic typing
A program might call .length on an integer. When does the language tell you?
Static typing checks types before the program runs. The compiler refuses to produce a binary if a call doesn't make sense. Dynamic typing waits until the call happens and the actual value disagrees. Static catches the bug before any user sees it; dynamic lets you write the first version faster.
Modern static checkers also narrow types as control flow refines them. A variable typed string | null becomes just string inside an if (x !== null) branch — the compiler proves the other case is gone. TypeScript, Kotlin, Swift, and Rust's match all do this. The technique is called flow-sensitive typing, and it's what makes "explicit nullability" tolerable instead of cast-heavy.
Manual vs managed memory
Every program allocates memory, and that memory has to get freed when nobody needs it. Who does the freeing?
Manual management: you do it. Every malloc needs a matching free, and the language doesn't check. Get it wrong and you produce one of three classic bugs: a leak (forgot to free, memory grows forever), a use-after-free (freed too early, the pointer now points at someone else's memory), or a double-free (freed twice, the allocator's bookkeeping corrupts). These three account for a large fraction of the CVEs ever filed against C and C++ programs.
Garbage collection (GC): the runtime does it. A tracing collector walks the live heap from a set of roots (CPU registers, the call stack, globals), marks every reachable object, and frees the rest. Java, Go, Python, and JavaScript all use GC. The cost is a pause while the collector runs. Modern concurrent collectors (Go's, Java's ZGC and Shenandoah) keep the stop-the-world portion in the single-digit milliseconds even on multi-hundred-gigabyte heaps, but it's never zero.
Ownership: the compiler does it. Rust's invention. Every value has exactly one owner; the owner gets dropped at the end of its scope; borrows are checked at compile time. No runtime cost, no leak class, no GC pause — paid for in compile-time effort and in code that won't compile until the lifetimes line up.
The simplest tracing GC is mark-and-sweep. Phase one: from the roots, follow every pointer and mark each reachable object. Phase two: walk the heap and free anything unmarked. Cost is proportional to live data on mark, total heap on sweep. Anything unreachable is, by definition, garbage.
The decisive optimization on top of mark-and-sweep is the generational hypothesis: most objects die young. Across decades of measurements on real workloads, the large majority of allocations become unreachable shortly after they're made — often before the next allocation of comparable size. A generational collector splits the heap into a small young generation (collected often, cheaply) and a large old generation (collected rarely). Survivors of a few young-gen collections get promoted to the old gen. The young-gen pause stays small because the young gen does.
The other GC family is reference counting (RC). Every object carries a count of references to it; the count goes up on copy, down on drop; the object frees itself when the count hits zero. No tracing, no pauses — freeing happens exactly when the last reference dies. Cycles are the failure mode: two objects pointing at each other keep each other alive even when nothing outside the cycle does, and neither count reaches zero.
shared_ptr) ship a weak reference type that doesn't bump the count. Python also runs a periodic tracing collector to clean cycles the programmer forgot to break.Paradigms
A paradigm is a default answer to "how should this code be organized?" Four answers dominate. Imperative programs read as sequences of commands that mutate state. Object-oriented programs model the world as things that hold state and respond to messages. Functional programs are transformations over immutable values. Declarative programs describe the desired result and let the runtime figure out how. Modern languages mix all four; the choice is per-module, not per-language.
Two implementation tricks make these paradigms cheap at runtime. The first is the closure: a function value that carries the variables it captured from the enclosing scope, stored on the heap so they outlive the original stack frame. A callback often needs the local variables of the function that registered it, but by the time the callback runs that function has returned. The closure makes those variables stay.
void* user_data alongside every function pointer. C++ lambdas, Rust closures, and Go function literals all desugar to the same struct: code pointer plus captured fields.The second trick is the vtable. In OO code, dog.speak() and cat.speak() run different functions even though the source looks identical — a lookup has to happen at runtime to pick the right one. The vtable does it in two memory loads. Each class has one vtable — an array of function pointers, one per method. Each instance starts with a header containing a pointer to its class's vtable. A virtual call becomes: load the vtable pointer from the object, load the right slot, call it. Branch predictors learn the common case within a few iterations.
dyn Trait is a "fat pointer" — data pointer plus vtable pointer — placing the vtable beside the reference rather than inside the object. Go's interfaces use the same shape.Algorithms
An algorithm is a procedure whose cost can be reasoned about independently of any particular machine. The reasoning uses Big-O notation: write O(f(n)) to mean "for large enough n, this algorithm's work grows no faster than some constant times f(n)". Constants drop because real CPUs vary them too much to be portable claims.
What matters is the shape. O(1) is constant. O(log n) doubles input but adds one step. O(n) scales linearly. O(n log n) is what good sorts achieve. O(n²) is fine on small inputs and painful past a few thousand. O(2ⁿ) is hopeless past about thirty.
Sorting
Take n items, return them in order. A theorem says comparison-based sorts can't beat O(n log n) on average — an information-theoretic floor, not an engineering limit. Non-comparison sorts (radix, counting) reach O(n) but only when keys have structure (fixed-width integers, bounded range).
Real standard libraries don't ship a textbook sort. They ship hybrids that switch algorithm based on input shape. Python's list.sort() is Timsort: merge sort that detects already-sorted runs. C++'s std::sort is introsort: quicksort that falls back to heapsort if recursion gets too deep, then to insertion sort on small ranges. Rust uses pdqsort.
Graph traversal
Given a graph (nodes connected by edges), visit every node reachable from a starting point.
Breadth-first search (BFS) uses a queue and visits nodes in order of distance from the start: root, then neighbours, then their neighbours. Best for shortest-path-by-edge-count. Depth-first search (DFS) uses a stack (or recursion, which is a stack) and goes as deep as possible before backtracking. Best for cycle detection and topological ordering. Both run in O(V + E). The only structural difference is queue vs stack.
Recursion vs iteration
Recursion and iteration compute the same answer with different memory shapes. A recursive factorial(n) allocates n stack frames before any multiplication happens. An iterative version uses a single accumulator and constant stack. The recursive form is often clearer; it also overflows the stack on large n in any language without tail-call optimization (TCO).
TCO is a compiler trick. If the recursive call is in tail position — its return value is the caller's return value with no further work — the compiler reuses the same stack frame instead of pushing a new one. Recursion compiles to JMP, not CALL. Scheme and OCaml require it; Rust and the JVM don't guarantee it; C compilers apply it opportunistically.
Data structures
An algorithm is a procedure; a data structure is the shape it operates on. The right shape often makes an algorithm trivial; the wrong one makes the same algorithm infeasible. Four staples cover most needs.
Arrays (contiguous memory, indexed by integer): O(1) random access, expensive insertion in the middle. Hash maps (key-to-value via a hash function): O(1) average lookup and insertion, no order. Balanced trees (especially B-trees): O(log n) lookup with sortedness preserved — the basis for ordered maps and database indexes. Queues and stacks (FIFO / LIFO): what BFS and DFS use, plus task scheduling and undo stacks.
Inside the hash map
The hash map deserves a closer look because most programs use one. Recipe: hash the key into an integer, take that modulo the bucket-array size to pick a slot, store the key-value pair there.
Two keys can hash to the same slot — a collision, unavoidable past a certain density (the birthday paradox makes the first one likely after about √N insertions). Two strategies handle them. Chaining stores a small list per bucket. Open addressing probes neighbouring slots until it finds an empty one.
Past a load factor of about 0.75 (entries / slots), both strategies trigger a rehash: allocate a bigger bucket array, re-insert everything, throw the old one away. That single rehash is O(n), breaking the "O(1) per insert" claim — but amortized across all the cheap inserts before it, the average is still O(1).
Worked example: a collision in bucket 7, then a rehash
Start with a map of 8 slots (N = 8) using chaining. The hash of each key, mod 8, decides its slot:
- Insert
"Alice" → 1.hash("Alice") % 8 = 3. Slot 3 is empty; drop the pair in. Entries: 1, load factor 1/8 = 0.125. - Insert
"Bob" → 2.hash("Bob") % 8 = 7. Slot 7 is empty; drop it in. Load factor 0.25. - Insert
"Carol" → 3.hash("Carol") % 8 = 7. Collision — slot 7 already holdsBob. Chaining strategy: appendCarolto slot 7's mini-list. Slot 7 is now a 2-item chain[Bob, Carol]. Lookups forBobandCarolboth go to slot 7, then walk the short list comparing keys. Load factor 0.375. - Insert
"Dave" → 4.hash("Dave") % 8 = 7. Another collision in slot 7; chain becomes[Bob, Carol, Dave]. Load factor 0.5. Still under threshold. - Inserts continue until entries = 6. Load factor = 6/8 = 0.75 — threshold hit.
- Rehash. Allocate a new array of 16 slots. Walk the old array, re-hash every key against the new size (
hash(key) % 16), and place it in the new array.Bob,Carol, andDavevery likely scatter to different slots now because% 16distributes them more finely than% 8did. Free the old array. - Insert #7 onward runs against the bigger array. Load factor resets to 7/16 ≈ 0.44; collisions become rare again. The next rehash waits until entries = 12.
That one rehash at step 6 touched all 6 old entries — O(n) work. But the 5 cheap inserts before it, plus the cheap inserts that follow until the next rehash, average out: across many operations the cost per insert is O(1) amortized. With open addressing instead of chaining, step 3 would probe slot 8, 9, 10… until it found a free one — same load-factor logic, no extra list.
The pattern recurs everywhere. Compilers use hash maps for symbol tables, balanced trees for type lookup, queues for parser lookahead. Databases (Act VI) use B-trees for indexes, LSM trees for write-heavy workloads. Networking (Act V) uses queues for packet buffering and hash tables for flow tracking. The right shape isn't an algorithmic question; it's a question about which operation dominates.
Standards
Every language and algorithm cited on this page has a canonical reference.
- C — ISO/IEC 9899 (current: C23). Defines the abstract machine, undefined behaviour, and the standard library every C compiler targets.
- C++ — ISO/IEC 14882 (current: C++23).
- JavaScript — ECMA-262 (annual editions). The host integration with browsers is split across WHATWG specs (HTML, DOM, Fetch).
- JSON — ECMA-404 / RFC 8259 (handed forward from Act I).
- Python — The Python Language Reference and the PEP series. The language is single-vendor but the spec is public; CPython is the reference implementation.
- Rust — The Rust Reference. Single implementation, but the language and borrow checker are specified.
- Go — The Go Programming Language Specification.
- Java — The Java® Language Specification and The Java® Virtual Machine Specification (Oracle, JSR 924). The JVM bytecode spec is what every alternative JVM language (Kotlin, Scala, Clojure) targets.
- POSIX — IEEE 1003.1 (the runtime contract every Unix-like OS exposes to programs; foreshadows Act IV).
- SQL — ISO/IEC 9075 (foreshadows Act VI; programs embed it everywhere).
- Algorithmic complexity — Bachmann–Landau notation (1894 / 1909); modern usage codified in Knuth, The Art of Computer Programming, vols. 1–4. No ISO standard, but the notation is universal.
- Floating-point — IEEE 754-2019 (handed forward from Act I & Act II; affects every language's numeric semantics).
Branches that earn their own article.
- Compiler internals (lexing, parsing, IRs, optimization passes, code generation).
- Interpreter and VM design.
- JIT compilation.
- Garbage collection strategies (mark-sweep, generational, concurrent).
- Type theory and type systems.
- Specific language deep dives.
- Formal algorithm analysis.
- Individual data structure internals (B-trees, red-black trees, skip lists, bloom filters).
- Concurrency primitives (threads, async/await, actors, CSP).
- Formal methods and verification.