Interfaces

An interface has 16.67 ms to draw the next frame and roughly 100 ms before a user calls it slow. Inside that envelope the browser parses HTML into a DOM, computes styles for every element, lays out the page, paints layers, composites them on the GPU, and ships pixels to the display. A 250 KB JavaScript bundle on a mid-range Android phone takes longer to parse than a frame budget allows. A button positioned three pixels off lands outside the thumb's natural arc. Interface work is the engineering of these constraints — the runtime, the render pipeline, layout, components, rendering strategies, platform APIs, performance budgets, accessibility, and the native and design surfaces that surround the web.

[Figure: The interface stack — from user input through the browser runtime down to the silicon. Layers: user (eye, fingers) · design system (tokens, components) · framework (React, Vue, Svelte, Solid) · web platform (DOM, CSSOM, Fetch, Workers) · render pipeline (parse, style, layout, paint, composite) · JS engine (V8, JSC, SpiderMonkey) · OS (GPU, CPU, pixels on glass). Every layer competes for the same 16.67 ms frame budget.]
The interface stack. Every layer is replaceable; none is optional. A slow choice anywhere shows up as a missed frame at the top.

The browser as a runtime

A native app ships with one runtime: the language VM, the OS's UI toolkit, and the libraries the developer chose. A web app ships into a runtime it does not control — a browser that another team built, on a device the user picked, configured for a network the developer never measured. The web runtime is the most heterogeneous deployment target in software, and every design decision starts from that fact.

The browser exposes three primary languages. HTML declares the document tree — elements, attributes, the parent/child structure that becomes the DOM. CSS describes how every element looks and lays out — colour, geometry, font, position. JavaScript runs as the imperative layer that mutates both at runtime: it adds nodes, changes styles, listens to user input, and talks to the network. The DOM is the shared data structure all three engines mutate; the main thread is the single CPU the JS engine runs on.

The JavaScript engine is itself a small compiler. V8 (Chrome, Edge, Node), JavaScriptCore (Safari, Bun), and SpiderMonkey (Firefox) all share a similar shape: a bytecode interpreter for cold code, and an optimising JIT (TurboFan in V8, FTL in JSC, Warp in SpiderMonkey) that recompiles hot functions to machine code based on observed types. Hidden classes and inline caches let property lookups on regular objects compile down to a single load instruction; the moment an object's shape changes unexpectedly, the cache invalidates and the lookup falls back to a slow generic path. Writing fast JS is mostly about keeping object shapes stable.
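Shape stability is easiest to see in code. A minimal sketch (function names illustrative): both constructors below return the same data, but one assigns properties in one fixed order, keeping a single hidden class, while the other adds a property conditionally after construction, forcing a shape transition.

```javascript
// Two ways to build the same point — one keeps a stable hidden class,
// one forces a shape transition per object.
function makeStable(x, y) {
  // Every object gets the same shape {x, y}, assigned in the same order.
  return { x, y };
}

function makeUnstable(x, y) {
  const p = { x };
  if (y !== undefined) p.y = y; // shape change after construction
  return p;
}

// A monomorphic call site: if every `p` passed in shares one hidden class,
// the engine can inline-cache the property loads down to single instructions.
function magnitudeSq(p) {
  return p.x * p.x + p.y * p.y;
}

const points = Array.from({ length: 1000 }, (_, i) => makeStable(i, i + 1));
let sum = 0;
for (const p of points) sum += magnitudeSq(p);
console.log(sum);
```

Both versions compute the same result; the difference only shows up in the inline caches. Mixing objects from both constructors at one call site makes it polymorphic, and the engine falls back toward the generic lookup path.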

[Figure: JS engine pipeline — parse (AST, roughly 1 ms / 100 KB) → bytecode (Ignition, interpret) → baseline JIT (Sparkplug, non-optimising) → optimising JIT (TurboFan, type-specialised) → deopt (shape changed, type guard failed). Hot-path triggers: a function called roughly 100 times, a loop with stable types, a monomorphic call site. Deopt triggers: adding a property after construction, passing a different type to a hot function, delete obj.x on a shaped object.]
Tiering lets the engine spend optimisation budget only on hot code. A single deopt can drop a function from 10 ns/call back to 200 ns/call.

Three caveats sit underneath everything else. The main thread is single-threaded — long JS blocks every other engine. The event loop processes one task at a time; if a task runs for 200 ms, no scroll, no click, no animation lands during that window. Memory is managed by a generational garbage collector — most objects die young and get freed cheaply, but a long-lived data structure that gets mutated each frame can drag old-generation collections that pause for 30 ms or more. And the DOM is not a JS data structure — it's a C++ tree exposed across a binding layer, so each element.style.color = "red" is a cross-boundary call that the engine cannot optimise the way it optimises pure JS arithmetic.

[Figure: The event loop — task queue (setTimeout, I/O, postMessage: click handlers, fetch completions, setTimeout callbacks; one task per loop iteration), microtask queue (Promise.then, queueMicrotask: resolved promises, mutation observers; drained completely before the next task), and the render step (style, layout, paint, composite; runs roughly every 16.67 ms). A microtask that spawns microtasks can starve the render step — the source of "the page froze".]
Promises drain before paint. An infinite await Promise.resolve() chain looks like a hang because no render step fires.
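The ordering rules can be observed directly. A runnable sketch: synchronous code finishes first, then the microtask queue drains completely, and only then does the next task (the timer callback) run.

```javascript
// Observe task vs microtask ordering: the microtask queue drains
// completely before the next (macro)task runs.
const order = [];

const timerDone = new Promise((resolve) => {
  setTimeout(() => {        // macrotask: runs on a later loop iteration
    order.push("task");
    resolve();
  }, 0);
});

Promise.resolve().then(() => order.push("microtask")); // drains first
queueMicrotask(() => order.push("microtask 2"));       // still before the task

order.push("sync"); // the current task finishes before any queue drains

timerDone.then(() => console.log(order.join(" → ")));
// sync → microtask → microtask 2 → task
```

In a browser, the render step would slot in between tasks — which is exactly why an unbroken microtask chain never lets a frame through.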

The async primitives all fall out of that loop. Promise schedules a microtask; async / await is the same with sugar. requestAnimationFrame schedules a callback for the next paint, ideal for visual updates because it runs at the right moment in the pipeline. requestIdleCallback fires only when the main thread is idle — useful for background work like prefetch or analytics. scheduler.yield() (Chrome) and setImmediate (Node, partial browser support) let a long task voluntarily cede the thread between input events. Posting to a MessageChannel is the fastest way to schedule a fresh task — it avoids setTimeout's nested-timer clamping — and is the trick behind several time-sliced library implementations.
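Those primitives combine into the standard pattern for keeping long work off the input path: chunk the work and yield between chunks. A sketch (helper names illustrative) that prefers scheduler.yield() where it exists and falls back to a setTimeout-based yield everywhere else:

```javascript
// Break a long task into chunks that yield back to the event loop.
// scheduler.yield() is Chrome-only at the time of writing, so this
// falls back to a setTimeout-based yield that works everywhere.
const yieldToLoop =
  typeof globalThis.scheduler?.yield === "function"
    ? () => globalThis.scheduler.yield()
    : () => new Promise((resolve) => setTimeout(resolve, 0));

async function processInChunks(items, workFn, chunkSize = 100) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      results.push(workFn(item));
    }
    await yieldToLoop(); // input events and rAF callbacks can land here
  }
  return results;
}

processInChunks([1, 2, 3, 4, 5], (n) => n * n, 2).then((r) =>
  console.log(r) // [1, 4, 9, 16, 25]
);
```

The chunk size is a trade: bigger chunks waste less time yielding, smaller chunks keep input latency lower. Fifty milliseconds of work per chunk is the usual ceiling, since that is where the long-task threshold sits.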

The trade-off is the price of the web's reach. Every interactive site runs on top of three coupled engines you do not control, on hardware that might be a 16-core M3 or a five-year-old Android phone with thermal throttling. The mental model that survives both is: HTML and CSS are declarations the browser can optimise heavily; JavaScript is the imperative escape hatch that costs main-thread time. Spend the budget where it pays the user back.

The render pipeline

A web page is not painted once and forgotten. Each scroll, each animation tick, each setState call risks producing a new frame, and a new frame is a small pipeline the browser walks from the DOM down to the GPU. Engineering performant interfaces is mostly about knowing which mutations re-enter which stage of that pipeline, and how expensive the re-entry is.

The pipeline has six named stages. Parse converts raw HTML and CSS bytes into the DOM and CSSOM trees. Style recalc matches every element to its applicable CSS rules, producing a computed style. Layout (sometimes called reflow) computes the geometric box for every element — position, width, height — given the styles and the viewport. Paint turns each layer into a list of drawing commands (filled rect, glyph run, image blit). Composite decides how the painted layers stack and what transforms apply to each. Raster runs the drawing commands into actual GPU textures the compositor swaps into the next frame.

[Figure: The render pipeline — six stages with rough timings on a mid-range laptop: parse (HTML → DOM, 2–8 ms per page) → style (match rules, 0.5–4 ms) → layout (geometry, 1–10 ms) → paint (display list, 0.5–3 ms) → composite (layer tree, roughly 0.5 ms) → raster (GPU blit, on the compositor). Re-entry points: a DOM insert or class change re-runs style → layout → paint → composite; a color/background change re-runs style → paint → composite (no layout); transform/opacity are composite-only and skip the main thread; reading offsetTop after a write forces a synchronous layout — forced reflow. The cheapest mutation is the one that only re-enters the compositor.]
The pipeline is one-directional but re-entrant. Knowing which property reaches which stage is the difference between a 60 fps animation and a 12 fps slideshow.

A worked trace makes the costs concrete. On a 2023 MacBook Air rendering a typical news article — around 2,400 DOM nodes, 8,000 CSS rules across three stylesheets, three above-the-fold images — Chrome's DevTools Performance tab shows roughly: HTML parse 6 ms, style recalc 3 ms, layout 7 ms, paint 2 ms, composite 0.4 ms. Total first-frame work on the main thread: around 18 ms, plus image decode that runs on a worker thread. Subsequent frames during a scroll cost almost nothing — composite-only — because scroll position is just a transform on the composited document layer.

The pipeline is engineered around one optimisation: composited layers. Properties on the GPU-friendly list — transform, opacity, filter, will-change — can be applied entirely on the compositor thread without re-running layout or paint. A 60 fps card-flip animation that uses transform: rotateY(180deg) runs on the compositor and never touches the main thread. The same animation written as width and left mutations forces layout and paint every frame and chokes on any device under a flagship phone. Force a layout in the middle of a JS task and the whole frame budget collapses — reading offsetTop or getBoundingClientRect after a write flushes pending style and layout work synchronously, a pattern called layout thrashing.
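The card-flip contrast reads clearest as stylesheet rules. A minimal sketch (class names illustrative): the first version animates a compositor-friendly property, the commented-out second version forces layout and paint every frame.

```css
/* Compositor-only card flip: transform never re-enters layout or paint,
   so the animation runs on the compositor thread at full frame rate. */
.card {
  transform-style: preserve-3d;
  transition: transform 300ms ease;
  will-change: transform; /* promotes to its own layer — use sparingly */
}
.card.flipped {
  transform: rotateY(180deg);
}

/* The layout-triggering version of a "similar" effect — avoid:
.card.flipped { width: 0; left: 150px; }
   width and left re-run layout + paint on every frame of the transition. */
```

will-change is a hint, not a free lunch: every promoted layer costs GPU memory, and blanket will-change on hundreds of elements can be slower than none at all.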

[Figure: Layout thrashing — interleaving reads and writes forces a synchronous layout per pair.]

Thrashing · 50 ms:

    for (let el of items) {
      el.style.width = "200px";
      let h = el.offsetHeight;          // sync layout
      el.style.height = h + 10 + "px";
      let w = el.offsetWidth;           // sync layout
    }
    // 100 items × 2 reads × 0.25 ms = 50 ms · drops 3 frames

Batched · 4 ms:

    // read phase
    const ws = items.map(e => e.offsetWidth);
    const hs = items.map(e => e.offsetHeight);
    // write phase
    items.forEach((e, i) => {
      e.style.width = ws[i] + "px";
      e.style.height = hs[i] + 10 + "px";
    });
    // 1 layout total · 1 paint

Read-then-write batching is the cheapest performance win on a list-heavy page.
The browser tries to defer layout until the next frame. Reading a geometry property mid-write defeats the deferral and pays the full cost per iteration.

The frame budget is the honest limit. At 60 fps the browser has 16.67 ms between vsync ticks to produce a new frame, and the OS compositor itself eats one or two milliseconds. At 120 fps — common on flagship phones and recent laptops — the budget shrinks to 8.33 ms. The compositor itself never blocks; the main thread does, and the main thread is where your JS, your style recalc, and your layout all live. Every frame you ship is a contract with the main thread to finish in under 10 ms.

CSS as a layout engine

CSS is the most underestimated programming language on the platform. Every selector is a pattern that matches against the DOM, every property cascades from parent to child unless reset, every rule competes against every other rule for the same slot on the same element. The browser does all of this on every style invalidation. Getting it right is the difference between an interface that scales to a million DOM nodes and one that stutters at ten thousand.

The cascade is the algorithm that picks a winner when multiple rules set the same property on the same element. It walks four ladders in order: origin and importance (user-agent vs author, plus !important), then specificity (a tuple counting ID selectors, class/attribute/pseudo-class selectors, and type selectors — so #nav .item beats .menu .item), then source order (later rules win ties), then inheritance (some properties — color, font-family, line-height — inherit from the parent if unset; most do not). Cascade layers (@layer) added in 2022 let you bound that fight: rules in an earlier layer always lose to rules in a later layer regardless of specificity, which makes a design-system base layer reliably overridable by app-level rules.
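The layer mechanics are easier to believe in code. A sketch (layer and class names illustrative): a high-specificity rule in an earlier layer still loses to a low-specificity rule in a later one.

```css
/* Cascade layers: a later layer wins regardless of selector specificity. */
@layer reset, base, components, app;

@layer components {
  /* Design-system rule with heavy specificity (1,2,0)… */
  #nav .button.primary { background: purple; }
}

@layer app {
  /* …still loses to this (0,1,0) rule, because `app` is declared
     after `components` in the @layer order above. */
  .button { background: green; }
}
```

Unlayered styles sit above all layers, which is why a design system shipped entirely inside layers stays overridable by any plain app stylesheet.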

[Figure: The cascade — how the browser picks a winner from competing CSS rules. Origin and importance: user !important > author !important > author normal > user-agent; later wins. Specificity: an a/b/c tuple of #id / .class and :hover / tag and pseudo-element counts — (1,2,1) > (0,3,5). Source order: across stylesheets and inline styles, the last declaration wins. Inheritance: color, font-family, line-height, visibility inherit; most properties reset. Cascade layers (@layer reset, base, components, app): a later layer always wins regardless of specificity, moving the fight out of selector arms-races into named order.]
Specificity is the most common source of "why doesn't my style apply." Layers turn the answer from a count to a name.

Layout has three engines worth knowing. Flexbox lays out children along one axis, with growing, shrinking, and alignment along the cross axis — perfect for toolbars, button rows, single-row card lists. Grid lays out children in two dimensions with named tracks and explicit areas — perfect for full-page layouts, dashboards, and any design where alignment crosses rows and columns. Containment (contain: layout style paint) tells the browser that a subtree's layout, style, or paint cannot affect the rest of the document, so the browser can skip work on the rest of the page when the contained subtree changes. Container queries (@container) finally let a component respond to its parent's size rather than the viewport — the missing piece that made truly reusable design-system components possible.
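The three engines compose. A sketch (selector names illustrative): Grid for the collection, containment plus a container query so each card responds to its slot rather than the viewport.

```css
/* Grid handles the two-dimensional collection. */
.card-list {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(16rem, 1fr));
  gap: 1rem;
}

/* Each slot is a size container; its layout is also contained,
   so a card changing inside it cannot dirty the rest of the page. */
.card-slot {
  container-type: inline-size;
  contain: layout paint;
}

/* Fires on the slot's own width — the same card component lays out
   differently in a narrow sidebar and a wide main column. */
@container (min-width: 24rem) {
  .card { display: flex; gap: 0.75rem; }
}
```

Note the direction of the query: the component asks about its parent's size, which is exactly what viewport media queries could never express for a reusable component.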

[Figure: Flexbox versus Grid — one-dimensional flow versus two-dimensional placement. Flexbox (1D): main axis plus cross alignment, flex: 1 / flex: 2 / auto; use for rows of buttons, wrapping card lists, toolbar alignment. Grid (2D): tracks plus named areas (header, nav, main, aside); use for page-level layout, dashboards with cross-row alignment, photo galleries with grid-auto-flow: dense.]
Flexbox aligns along one axis at a time; Grid pins both axes simultaneously. Most pages use both — Grid for the page shell, Flexbox inside cells.

CSS also reshaped colour over the last few years. oklch() specifies colour in a perceptually uniform space — oklch(70% 0.15 240) means 70% lightness, 0.15 chroma, 240° hue, and a 10% lightness step looks like the same brightness change anywhere on the wheel. color-mix(in oklch, var(--brand), white 20%) blends two colours predictably without manually computing intermediates. Relative colour syntax lets you derive a colour from another (oklch(from var(--brand) calc(l + 0.1) c h)), turning a design system's tokens into a self-consistent system instead of a flat palette. These features remove the need for a Sass build step for most teams.
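Put together, the new colour functions turn a palette into a derivation. A sketch (custom property names illustrative):

```css
/* A self-consistent token ramp derived from one brand colour. */
:root {
  --brand: oklch(55% 0.15 260);

  /* Perceptually even lightness steps — each reads as the same jump,
     because oklch lightness is uniform across the hue wheel. */
  --brand-hover:  oklch(from var(--brand) calc(l + 0.1) c h);
  --brand-active: oklch(from var(--brand) calc(l - 0.1) c h);

  /* Predictable tint without hand-picking an intermediate value. */
  --brand-subtle: color-mix(in oklch, var(--brand), white 80%);
}
```

Changing --brand now re-derives the whole ramp — the work a Sass colour-function pipeline used to do at build time happens in the cascade instead.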

The trade-off is that CSS executes on every change. Style invalidation propagates from a changed element up, down, and across the DOM depending on which selectors might now match. A descendant combinator (.modal .button) forces a re-match across the modal subtree on any structural change inside it. A :has() selector — useful but expensive — can invalidate ancestors when descendants change. Avoiding deep selectors, scoping work with containment, and preferring class toggles over inline-style writes are not micro-optimisations on a complex app; they decide whether style recalc fits in the frame budget. CSS earned its place as a serious engineering surface the same way SQL did: people kept trying to escape it, and kept ending up writing worse versions of it.

The component model and state

Once an app grows past a few hundred elements, raw DOM manipulation stops being viable. Every interaction risks updating the wrong node, leaving stale state, or stacking event handlers. The component model is the response: split the UI into named, reusable units, each owning a slice of state and a render function, each receiving inputs (props) from its parent and emitting events upward. Data flows down, events flow up; nothing else crosses the boundary.

That contract sounds simple and isn't. The hard problem is reactivity — when a piece of state changes, which components need to re-render, and how does the framework know? Three families of answers compete, and the choice has real performance consequences past 10,000 components.

Virtual DOM (React) re-runs the component function on state change, produces a new tree of JSX-described nodes, and diffs it against the previous tree to compute a minimal patch list applied to the real DOM. Easy to reason about — your render function is a pure description of "what the UI looks like at this state" — but the diff costs CPU on every update, and the developer is responsible for telling React when not to re-run (memoisation via useMemo, memo, and the new React Compiler). At scale, careless React apps spend more time diffing than applying the patches.

Fine-grained reactivity (Vue 3, Solid, Svelte 5) flips the model. Each piece of state is a reactive primitive — a signal in Solid, a ref in Vue, a $state rune in Svelte 5 — and the framework tracks at compile or run time which DOM nodes depend on which signal. When the signal changes, only those nodes update. No diff, no re-execution of the component function. Solid's component function runs exactly once per mount; the closures it creates are the subscriptions. The cost is a sharper conceptual model — you can't freely destructure or read state outside a tracking scope — and a smaller ecosystem.
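The tracking mechanism fits in a few lines. A toy sketch of the fine-grained model — not any framework's actual implementation: reads register the currently running effect as a subscriber, writes notify only those subscribers.

```javascript
// A toy signal/effect system illustrating fine-grained reactivity.
let activeEffect = null;

function createSignal(value) {
  const subscribers = new Set();
  const read = () => {
    if (activeEffect) subscribers.add(activeEffect); // track the reader
    return value;
  };
  const write = (next) => {
    value = next;
    for (const fn of [...subscribers]) fn(); // notify only what read it
  };
  return [read, write];
}

function createEffect(fn) {
  activeEffect = fn;
  fn(); // the first run registers the subscriptions
  activeEffect = null;
}

// Only the effect that read `count` re-runs on write — no diff,
// no component re-execution.
const [count, setCount] = createSignal(0);
const log = [];
createEffect(() => log.push(`count is ${count()}`));
setCount(1);
setCount(2);
console.log(log); // ["count is 0", "count is 1", "count is 2"]
```

The sketch also shows where the "losing reactivity" footgun lives: `const c = count()` outside an effect copies the value once and subscribes nothing, which is why signal frameworks insist reads happen inside a tracking scope.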

Compiled reactivity (Svelte) takes the same idea further. Svelte's compiler reads your component at build time and emits imperative DOM-mutation code directly — no virtual DOM, no runtime reconciler beyond a tiny scheduler. Bundle sizes drop by tens of kilobytes; updates are pointer-fast. The trade is a heavier compiler and the need to learn Svelte's template syntax rather than plain JSX.

[Figure: Three reactivity models. Virtual DOM (React): state.x = 5 → component() returns vDOM → diff(old, new) → apply patches; cost per update O(component subtree), memoise to scope; main risk: over-rendering big trees. Signals (Solid, Vue 3, Svelte 5 runes): signal.set(5) → notify subscribers of the tracked node; cost per update O(subscribers), tracked at runtime; main risk: losing the reactivity scope. Compiled (Svelte templates): the compiler emits the write (textNode.data = 5) statically; cost per update O(1) for known writes, tracked at compile time; main risk: bespoke template syntax. All three converge on the same goal — update only what changed — by different routes.]
vDOM is "describe everything, diff." Signals are "subscribe what reads, notify what writes." Compiled is "emit the writes statically." The right choice depends on team size, app shape, and where the bottleneck actually is.

State scope matters as much as state mechanism. Local state lives in one component and disappears with it. Lifted state moves up to a common ancestor when two siblings need to share. Global state — a store, a context, a signal exposed at module scope — survives the component tree. Three layers, each with a cost: local is cheap but doesn't share; lifted forces re-renders down the whole subtree; global needs an explicit subscription model so that not every component re-renders on every change. Frameworks differ on the global story: Redux made reducers explicit; Zustand and Jotai went smaller; signal-based libraries (Solid, Vue) make global state just another signal.
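The "explicit subscription model" for global state is worth seeing concretely. A minimal sketch of the idea behind selector-scoped stores like Zustand — not its actual API: listeners subscribe to a slice, and a write only notifies listeners whose slice actually changed.

```javascript
// A minimal global store with selector-scoped subscriptions.
function createStore(initialState) {
  let state = initialState;
  const listeners = new Set();

  return {
    getState: () => state,
    setState(partial) {
      state = { ...state, ...partial };
      for (const l of listeners) l();
    },
    // Subscribe to a slice; the callback fires only when that slice changes.
    subscribe(selector, callback) {
      let prev = selector(state);
      const listener = () => {
        const next = selector(state);
        if (!Object.is(next, prev)) {
          prev = next;
          callback(next);
        }
      };
      listeners.add(listener);
      return () => listeners.delete(listener);
    },
  };
}

// The sidebar listener never fires on keystrokes into `draft`.
const store = createStore({ draft: "", user: "ada" });
const sidebarRenders = [];
store.subscribe((s) => s.user, (user) => sidebarRenders.push(user));
store.setState({ draft: "h" });    // user unchanged — sidebar does nothing
store.setState({ draft: "he" });
store.setState({ user: "grace" }); // fires exactly once
console.log(sidebarRenders); // ["grace"]
```

This is the narrowing described above: the store is global, but each component pays only for the slice it selects.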

[Figure: Data flows down, events flow up — three state scopes across an App → Header / Sidebar → Form / List tree (user · onLogout, user · onNav as props down, events up). Local: form input, disclosure open. Lifted: shared between siblings, still tree-local. Global: cross-tree, explicit subscription, selector-scoped (e.g. a List subscribing to a store).]
The three scopes are not stylistic — picking the wrong one is the most common cause of "the whole app re-renders on every keystroke."

The honest limit: state architecture is the lever that decides whether the app scales. A 200-component app survives anything; a 20,000-component dashboard with a single global store and no memoisation will drop frames on every keystroke. The fix is rarely "pick a different framework" — it's usually "narrow the subscription so the right components, and only the right components, re-render on the right writes."

Rendering strategies

A web page exists at one of four times: at build, on a server, at the edge, or in the browser. Picking which one renders the HTML decides what TTFB looks like, what LCP looks like, what the first interactive moment looks like, and how big the JavaScript bundle has to be. The named strategies are the recognisable points on that curve.

Client-side rendering (CSR) ships an empty HTML shell and a JavaScript bundle that builds the whole page in the browser. The bundle parses, the framework hydrates, the data loads, and only then does the user see content. Great for app-shaped interfaces behind a login, where SEO does not matter and the user is already committed; brutal for content sites, where Time-to-First-Byte arrives in 100 ms and the page is still blank at 3 seconds.

Server-side rendering (SSR) generates the HTML on each request, sends it down, then ships the JavaScript that hydrates the static markup into an interactive page. First paint arrives early because the server already shaped the content; interactivity arrives later because the hydration JS still has to run. Next.js, Remix, Nuxt, SvelteKit all default here.

Static site generation (SSG) does the same render at build time and serves the pre-rendered HTML from a CDN. Cheapest TTFB on the planet (10–30 ms from an edge POP), best caching story, no server cost per request — at the price of a build that takes longer as the site grows and a content-update lag bounded by the build pipeline. Blogs, docs, marketing sites live here.

Streaming SSR pipelines the render. The server flushes the HTML shell first, then streams in chunks of content as data resolves — using <Suspense> in React, await blocks in SvelteKit, <Suspense> in Vue. The user sees pixels at 100 ms even if the slowest data dependency takes 800 ms; the JS hydrates each chunk as it arrives. The improvement on perceived speed is large and the cost on engineering is mostly understanding which boundaries to draw.
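The shape of a streaming render is simple to sketch without a framework. An illustrative sketch (all names hypothetical): the shell flushes immediately, the slow data dependency resolves later, and its chunk streams in behind it.

```javascript
// Streaming SSR sketch: flush the shell first, stream slow content
// as its data resolves. Framework-free; in React this boundary would
// be a <Suspense> fence instead of an explicit await.
async function* renderPage(fetchComments) {
  yield "<html><body><h1>Article</h1><p>Body text…</p>"; // shell paints now
  yield '<div id="comments">loading…</div>';             // placeholder
  const comments = await fetchComments();                // slow dependency
  yield `<template id="c">${comments.join("")}</template>`; // late chunk
  yield "</body></html>";
}

async function collect(gen) {
  const chunks = [];
  for await (const chunk of gen) chunks.push(chunk);
  return chunks;
}

// Simulated slow data source.
const slowComments = () =>
  new Promise((r) => setTimeout(() => r(["<p>First!</p>"]), 50));

collect(renderPage(slowComments)).then((chunks) => {
  console.log(chunks[0].includes("<h1>")); // shell arrived before comments
});
```

In a real server, each yielded chunk is a res.write() flush; the browser parses and paints the early chunks while the late ones are still pending, which is where the perceived-speed win comes from.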

Islands (Astro, Qwik, Fresh) ship mostly static HTML with small interactive components — islands — hydrated independently. A blog post might be 100% static except for a comment widget and a like button, each its own small bundle. Total JS shipped: 5–20 KB instead of 200 KB. The model trades the convenience of "the whole page is one app" for a hard split between static and interactive zones.

[Figure: Rendering strategies on the TTFB / LCP / TTI trade-off. CSR: 200 KB JS bundle, no SEO content. SSR: 200 KB JS, good SEO. SSG: 0–80 KB JS, best SEO. Streaming SSR: 200 KB JS, good SEO. Islands: 5–20 KB JS, best SEO. Bar height ≈ time on a fast 4G connection; darker = main-thread blocking.]
No strategy is universally best. Islands win for content; SSR with streaming wins for dynamic pages; CSR survives for behind-login app shells where the first paint is a loading state anyway.

Hydration is the cost shared by every strategy except pure SSG. The server-rendered HTML is dead until the JS engine runs the framework, attaches event handlers, and rebuilds the component tree in memory. On a mid-range Android phone with 200 KB of JS to parse, that's 400–800 ms after the HTML arrived — a window during which clicks land on a page that looks interactive but isn't. Partial hydration (Astro), resumability (Qwik, which serialises the entire framework state into HTML so the client never re-runs setup), and progressive hydration (hydrate above-the-fold first, defer the rest) are the responses. React Server Components add a new shape — components that run only on the server, never ship JS for themselves, and stream their rendered output into a client tree — letting the client bundle skip everything that doesn't need interactivity.

[Figure: Hydration timeline, 0 ms to 2 s. SSR (full hydrate): HTML, then JS download + parse + hydrate, interactive well after first paint. Islands (partial): HTML, then each island hydrates independently, page interactive on first paint. Resumable (Qwik): HTML plus serialised state at first paint; listeners wired lazily on first interaction.]
The same first paint, three different paths to interactivity. The gap between "looks ready" and "is ready" is the bug users describe as "the site froze for a second."

Incremental Static Regeneration (ISR) sits between SSG and SSR — pages are served from a static cache, regenerated on a schedule or on demand when stale. Cheap reads, fresh content, no per-request server cost during steady state. Edge rendering moves the server to the CDN's edge POPs (Cloudflare Workers, Vercel Edge, Netlify Edge) — the server is closer to the user, so even SSR can hit 50 ms TTFB anywhere in the world. The trade is a constrained runtime (no Node-specific APIs, limited execution time per request) and a different deployment model.

The trade-off across the table: every shift left on the rendering curve (toward static / server) cuts JS bundle and improves first paint; every shift right (toward client) buys richer interactivity per byte. Pick the cheapest strategy that meets your interactivity floor. Most marketing and content sites should be SSG with islands; most logged-in app surfaces should be SSR with streaming; pure CSR survives only in app-shell scenarios where SEO does not exist.

The web platform

A browser used to be a renderer that ran a sandboxed scripting language. The shape now is closer to a portable OS: storage, networking, threading, peer-to-peer media, low-level graphics, and background execution are all exposed through standardised JavaScript APIs. Building a sophisticated interface is partly knowing which capability lives behind which API.

Networking. fetch() superseded XMLHttpRequest as the request primitive — promise-based, with explicit Request and Response objects, streaming bodies, and AbortController for cancellation. Streams (ReadableStream, WritableStream, TransformStream) let you process a 100 MB response incrementally instead of buffering it. Server-Sent Events (EventSource) deliver a one-way stream of text events over HTTP — perfect for token streams from an LLM. WebSockets open a full-duplex, message-oriented channel — perfect for chat, collaborative editing, live game state. WebTransport runs on HTTP/3 with QUIC underneath, giving you unreliable datagrams and reliable streams on the same connection.
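Incremental processing with streams looks like this in practice. A sketch that stands in a hand-built ReadableStream for a real response.body — the same pipeline applies to a fetch() — and re-chunks the bytes on newlines, roughly what an SSE or NDJSON consumer does. (Async iteration over ReadableStream is recent in browsers; it works in Node 18+.)

```javascript
// Process a "response body" incrementally instead of buffering it.
function makeSource() {
  return new ReadableStream({
    start(controller) {
      const enc = new TextEncoder();
      // The network delivers arbitrary splits, not tidy lines.
      for (const chunk of ["data: hel", "lo\n", "data: world\n"]) {
        controller.enqueue(enc.encode(chunk));
      }
      controller.close();
    },
  });
}

async function readLines(stream) {
  const lines = [];
  let buffer = "";
  const decoded = stream.pipeThrough(new TextDecoderStream());
  for await (const part of decoded) {
    buffer += part;
    let idx;
    while ((idx = buffer.indexOf("\n")) >= 0) { // re-chunk on newlines
      lines.push(buffer.slice(0, idx));
      buffer = buffer.slice(idx + 1);
    }
  }
  return lines;
}

readLines(makeSource()).then((lines) => console.log(lines));
// ["data: hello", "data: world"]
```

Memory stays bounded at one partial line, no matter how large the response — the property that matters when the body is 100 MB or an unbounded event stream.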

Storage. localStorage is synchronous, string-only, and capped around 5 MB — fine for a theme preference, wrong for anything structured. sessionStorage is the same but cleared on tab close. IndexedDB is the structured client database — async, indexed, supports binary blobs, capped by available disk minus a safety margin (often gigabytes). The Cache API stores Request/Response pairs and is what Service Workers use to implement offline. Origin Private File System (OPFS) gives you a sandboxed POSIX-ish filesystem accessible only to your origin — used by SQLite-WASM and other in-browser databases.

Workers. The main thread is single-threaded, but the browser is not. Web Workers run JS in a separate OS thread with no DOM access, communicating over postMessage with structured-clone serialisation. SharedWorker is one worker shared across tabs of the same origin. Service Workers are network proxies — installed once, they intercept fetch requests and can serve them from cache, modify them, or synthesise responses entirely. The Service Worker is what makes an app installable, offline-capable, and push-notification-capable.

[Figure: The browser as an operating system — capability APIs grouped by domain. Networking: fetch(), Streams, WebSocket, EventSource (SSE), WebRTC, WebTransport, Beacon, Service Worker. Storage: localStorage, sessionStorage, IndexedDB, Cache API, OPFS, Cookie Store, FileSystem API, storage quota. Compute: Web Workers, SharedWorker, Service Worker, WebAssembly, WebGPU, WebGL2, Canvas / OffscreenCanvas, Web Audio. Media, input, device: getUserMedia, Pointer Events, Web MIDI, WebHID, WebUSB, Geolocation, MediaStream, Keyboard Lock, Gamepad, WebNFC (Android), Vibration, Wake Lock. Availability varies by browser and platform — check caniuse.com before reaching for the obscure ones.]
The capability surface is wider than any single team uses. The shape of an app is largely a choice of which corners to live in.

Media and peer-to-peer. getUserMedia() opens camera and microphone (with user permission). WebRTC adds peer-to-peer audio, video, and data channels — the substrate behind Google Meet, Discord voice, and most browser-based video calls. The signalling is left to you (typically WebSocket); WebRTC handles the NAT traversal, the SRTP encryption, and the codec negotiation.

Low-level graphics. WebGL2 exposes OpenGL ES 3.0 — adequate for most 3D needs. WebGPU, shipping in stable browsers from 2023 onward, exposes the explicit-graphics model used by Vulkan, Metal, and D3D12 — bind groups, compute pipelines, command encoders — at performance levels competitive with native. Used for browser-based ML inference, GPU-accelerated image processing, and the next generation of in-browser games.

WebAssembly sits alongside JS as a second compile target. Compiled C, C++, Rust, Go, AssemblyScript, and an increasing list of other languages produce .wasm modules the browser executes at near-native speed. The boundary with JS is explicit: shared memory through SharedArrayBuffer, function calls through imports and exports, no DOM access without a JS bridge. WebAssembly earns its place when CPU-bound work is the bottleneck — Photoshop on the web, Figma's vector engine, in-browser SQLite (sql.js, OPFS-backed wa-sqlite), AV1 decoders, ML inference runtimes. The ergonomic cost is the boundary itself; nothing is free across it, and a chatty call pattern can be slower than pure JS.
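The explicit boundary is small enough to show whole. The bytes below are a hand-assembled module — magic number, then type, function, export, and code sections — equivalent to what `wat2wasm` emits for a one-function module exporting `add(i32, i32) -> i32`.

```javascript
// The smallest useful WebAssembly module: exports add(i32, i32) -> i32.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32,i32)->i32
  0x03, 0x02, 0x01, 0x00,                               // func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, 1 body
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

WebAssembly.instantiate(wasmBytes).then(({ instance }) => {
  // Every call crosses the JS↔wasm boundary — cheap, but not free,
  // which is why a chatty call pattern can lose to pure JS.
  console.log(instance.exports.add(2, 3)); // 5
});
```

Real modules arrive via WebAssembly.instantiateStreaming(fetch("mod.wasm")), which compiles while the bytes download; the inline array here just keeps the sketch self-contained.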

The trade-off is fragmentation. Capability detection (if ('serviceWorker' in navigator)) is non-negotiable; a feature that works in Chrome may be stage-2 in Safari and unimplemented in Firefox. Bridge APIs (WebUSB, WebHID, WebNFC) ship in Chromium-derived browsers and not in the others by design — Apple has chosen not to implement them. The practical rule: lean on the platform where it's broadly supported (Fetch, Streams, IndexedDB, Service Workers, WebSockets, WebRTC are universal), and treat the long-tail capability APIs as enhancements that gracefully degrade.

Performance for users

User-perceived performance is not a feeling. It is a measurable distribution of milliseconds across the page lifecycle, and Google's Core Web Vitals are the three numbers most teams now treat as the contract. Largest Contentful Paint (LCP) measures when the largest above-the-fold image or text block becomes visible — target under 2.5 seconds at the 75th percentile across real users. Interaction to Next Paint (INP) replaced First Input Delay in 2024 and measures the worst tap-to-paint latency across the session — target under 200 ms. Cumulative Layout Shift (CLS) measures how much visible content jumps around during load — target under 0.1.

Three thresholds are worth holding in mind beneath those numbers. Under 200 ms feels instantaneous. Under 1 second feels responsive and keeps attention. Above 10 seconds breaks the user's task — by then they have switched tabs, lost the goal, or left. The Core Web Vitals thresholds are tuned to keep most interactions inside the first two windows.

[Figure: Core Web Vitals with target ranges, measured at the 75th percentile across real users — synthetic numbers do not count. LCP (largest paint): good ≤ 2.5 s, needs work 2.5–4 s, poor > 4 s; levers: preload the LCP image, AVIF/WebP, font swap, cut render-blocking JS. INP (tap → next paint): good ≤ 200 ms, needs work 200–500 ms, poor > 500 ms; levers: break long tasks, debounce expensive work, workers for heavy compute, CSS containment. CLS (layout shift): good ≤ 0.1, needs work 0.1–0.25, poor > 0.25; levers: explicit width/height, reserve space for ads, size-adjust for fonts, min-height skeletons.]
Three numbers, three thresholds. Every product team should know its current values and the worst-performing route in the app.

A worked Core Web Vitals budget turns the targets into concrete decisions. Consider a product landing page: hero image, headline, paragraph, CTA, three feature cards. On a mid-range Android phone over a 4G connection (RTT 100 ms, throughput 5 Mbps after slow-start), the budget walks like this.

LCP target 2.5 s. The largest element is the hero image. Network goes: DNS 30 ms, TCP+TLS 200 ms, server TTFB 200 ms — 430 ms before any byte of HTML. The HTML itself is 30 KB gzipped: 50 ms. Critical CSS inline in the head: 0 ms extra. The browser sees the <img> tag at 480 ms; the image is 80 KB AVIF: 130 ms to download. Decode: 30 ms. LCP arrives around 640 ms — well under budget. AVIF over JPEG is the single biggest LCP lever — typical 50% size reduction at equal quality. WebP gives most of the win with broader Safari support pre-2023. Preloading the hero (<link rel="preload" as="image">) saves another 50–100 ms.
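The transfer arithmetic behind that walk is worth making explicit, since it is the part teams re-run for their own pages. A sketch using the same assumptions as above (5 Mbps effective throughput after slow-start; function name illustrative):

```javascript
// Transfer time for a compressed payload at a given effective throughput.
function downloadMs(sizeKB, throughputMbps) {
  const bits = sizeKB * 1024 * 8;           // payload in bits
  return (bits / (throughputMbps * 1e6)) * 1000;
}

console.log(Math.round(downloadMs(30, 5)));  // 30 KB HTML:      ~49 ms
console.log(Math.round(downloadMs(80, 5)));  // 80 KB AVIF hero: ~131 ms
console.log(Math.round(downloadMs(160, 5))); // same hero, JPEG: ~262 ms
```

The JPEG row is the format lever in numbers: the roughly 50% size win of AVIF is 130 ms of LCP on this connection, before any code changes at all.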

INP target 200 ms. Every interactive element — the CTA, the cards, the menu — must respond within 200 ms of tap. The bundle is the constraint. A 250 KB gzipped JS bundle parses in roughly 100 ms on a mid-range phone; if the framework runtime is 60 KB of that, only the rest is your code. Long tasks (over 50 ms on the main thread) block input handling; scheduler.yield() and requestIdleCallback break work into chunks that fit between input events. Heavy synchronous work — hashing, JSON parsing of a 10 MB blob, image manipulation — belongs in a Worker.
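
The chunking pattern needs no framework. A sketch in which yieldToMain is a hand-rolled stand-in (a timeout-resolved promise) for the newer scheduler.yield(), which is not available in every browser:

```javascript
// Give the event loop a chance to handle pending input between chunks.
// Falls back to setTimeout(0) where scheduler.yield() is unavailable.
function yieldToMain() {
  if (globalThis.scheduler?.yield) return scheduler.yield();
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Process items in chunks small enough to stay under the 50 ms long-task line.
async function processInChunks(items, processOne, chunkSize = 100) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      results.push(processOne(item));
    }
    if (i + chunkSize < items.length) await yieldToMain(); // let taps through
  }
  return results;
}
```

Each await returns control to the event loop, so a tap that arrives mid-batch paints within one chunk's duration instead of waiting for the whole job.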

[Figure: INP — input → next paint, with and without a long task in the way, on a 0–400 ms timeline. Good (80 ms INP): tap → short delay → 40 ms handler → paint. Bad (340 ms INP): tap → blocked by a 144 ms long task → handler → 96 ms style + layout → paint.]
INP measures the worst tap-to-paint of the session. One pathological interaction on a 5-year-old phone defines the field score.

CLS target 0.1. Every image and embed needs explicit width and height (or aspect-ratio) so the browser can reserve space before the asset loads. Ads and embeds need their own reserved slots — a 250 px ad that arrives after the page rendered will push 250 px of content down, eating CLS budget instantly. Custom fonts shift text when they swap: font-display: swap makes the swap visible (better LCP, risks CLS), optional blocks the swap (worse LCP, no CLS); size-adjust and the @font-face overrides line the metrics up so the swap is invisible.
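
CLS is not a raw sum: shifts are grouped into session windows (a gap under 1 s between shifts, each window capped at 5 s), and the page's score is the worst window. A sketch of that aggregation over hypothetical layout-shift entries, in the shape a PerformanceObserver delivers:

```javascript
// entries: [{ startTime: ms, value: shift score }], sorted by startTime.
function cumulativeLayoutShift(entries) {
  let worst = 0;
  let windowSum = 0, windowStart = 0, prevTime = -Infinity;
  for (const { startTime, value } of entries) {
    const sameWindow =
      startTime - prevTime < 1000 && startTime - windowStart < 5000;
    if (sameWindow) {
      windowSum += value;
    } else {
      windowSum = value;       // start a new session window
      windowStart = startTime;
    }
    prevTime = startTime;
    worst = Math.max(worst, windowSum);
  }
  return worst;
}

// Two shifts close together form one window; a late shift starts a new one.
console.log(cumulativeLayoutShift([
  { startTime: 100, value: 0.05 },
  { startTime: 600, value: 0.05 },   // same window → 0.1
  { startTime: 9000, value: 0.06 },  // new window → 0.06
])); // 0.1
```

The windowing is why one late-arriving ad can blow the budget on its own: its shift lands in a fresh window with nothing to dilute it.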

[Figure: Worked LCP budget — request through paint on a mid-range phone over 4G. Target 2.5 s, arrived ~640 ms: DNS 30 ms → TCP + TLS 200 ms → TTFB 200 ms → HTML 50 ms → image download 130 ms → decode 30 ms → paint, with LCP firing at the end. Every box is a lever — TTFB, format, preload, decode priority.]
Walked end to end, the LCP budget is mostly network and decode, not framework code. Optimising hero delivery is more leverage than any framework swap.

The toolbox underneath those targets is consistent across teams. Code-splitting with dynamic imports keeps the initial bundle small — route-level chunks load only the JS the current route needs. Tree-shaking in current bundlers (Vite, esbuild, Rollup) drops unused exports automatically. Lazy loading with loading="lazy" on images defers below-the-fold work to scroll time. Font subsetting drops glyphs you do not use — a Latin-only subset of a typical Google Font drops from 400 KB to 30 KB. fetchpriority="high" on the LCP image tells the browser to schedule the download ahead of others. HTTP/2 server push is dead; 103 Early Hints is the surviving mechanism for telling the client to preload critical assets before the main response arrives.
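
Route-level code-splitting leans on dynamic import() returning a promise, and on caching that promise so each chunk loads once per session. A framework-agnostic sketch; loadSettings and its loader are hypothetical stand-ins for () => import("./settings.js"):

```javascript
// Wrap a chunk loader so repeated navigations reuse the in-flight promise.
function lazyOnce(loadChunk) {
  let promise = null;
  return () => (promise ??= loadChunk());
}

// Hypothetical usage — in an app the loader would be a dynamic import().
let loads = 0;
const loadSettings = lazyOnce(() => {
  loads++;
  return Promise.resolve({ render: () => "settings page" });
});

loadSettings();
loadSettings();      // same promise, no second network request
console.log(loads);  // 1
```

Caching the promise rather than the module means concurrent navigations to the same route share one request instead of racing two.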

The honest limit is JavaScript. Every kilobyte of JS costs parse, compile, and execution time on the device's CPU, not just download time on the network. On flagship hardware the cost is invisible; on the 5-year-old Android phone in a developing market, 500 KB of JS is the difference between a 2-second LCP and a 9-second one. Performance work is mostly the discipline of shipping less code, later, on fewer threads.

Accessibility as engineering

About 16% of the world's population — over a billion people — lives with some form of disability. The fraction of users who navigate your site by keyboard, by screen reader, by voice, or with adapted input is larger than any browser's market share except Chrome. Accessibility is the engineering work that lets one codebase serve all of them. Done right, it improves the interface for every user; done wrong, it locks out a measurable fraction and exposes you to legal risk in most jurisdictions.

The foundation is semantic HTML. A <button> is keyboard-focusable, has a default Enter/Space activation, announces as a button to screen readers, has correct focus styling, and participates in the form-submission contract. A <div onclick> has none of these. Every native element — <a>, <button>, <input>, <label>, <select>, <form>, <table>, <dialog> — encodes a contract with assistive technology that an arbitrary div does not. The first accessibility rule is: use the native element if one exists.

ARIA (Accessible Rich Internet Applications) is the patch when the platform falls short. role tells assistive tech what an element is when the HTML can't (role="tab" on a custom tab). aria-label provides an accessible name when no visible text exists. aria-live="polite" makes a region announce its updates to screen readers. aria-expanded, aria-selected, aria-checked describe state. The hard rule, drilled into every accessibility engineer, is "no ARIA is better than bad ARIA" — a wrong role or stale aria-expanded is worse than no attribute at all because it lies to the user.

[Figure: The accessibility layer — semantic HTML, ARIA, focus, keyboard, screen reader. DOM nodes map to accessibility-tree nodes: <button>Save</button> → button "Save" (focusable, pressable, enabled — native semantics and state); <a href="/x"> → link (focusable, activates the URL; href required for link role); <input aria-label="search"> → textbox "search" (accessible name patched in); <div role="tab" tabindex="0"> → tab "Overview" (selected) in a tablist — custom, so keyboard handling and role must be supplied by hand. Screen readers read the accessibility tree, not the DOM — every visual element needs a tree counterpart.]
The accessibility tree is what screen readers see. Native HTML maps cleanly; custom components need ARIA to build their tree node.

A worked accessibility audit makes the decisions concrete. Consider a sign-up form: email, password, confirm password, marketing-opt-in checkbox, submit button. Across the form, the engineering questions cascade.

Each input needs a programmatic label. <label for="email">Email</label><input id="email" type="email"> ties them with for/id. Screen readers will announce "Email, edit text" when the input gets focus. A placeholder is not a label — it disappears the moment the user types, leaving the user unable to recall the field's purpose. Visually-hidden labels (<label class="sr-only">) work when design dictates a label-less look, but they must exist.

Focus order follows reading order. Tab moves Email → Password → Confirm → Checkbox → Submit. Natively focusable elements join the tab order as if they had tabindex="0"; positive tabindex values (tabindex="3") override document order and are almost always wrong. The checkbox is a real <input type="checkbox">, not a styled div with a click handler — Space toggles it, and screen readers announce "checked" or "not checked."

[Figure: Focus order on a sign-up form — Tab moves through fields in reading order: (1) Email (type=email, aria-describedby="email-err"), announced "Email, edit text, required"; (2) Password, "Password, edit text"; (3) Confirm password, "Confirm password, edit text"; (4) marketing checkbox, "Email me, checkbox, not checked"; (5) Create account, "Create account, button". Tab order matches visual order matches screen-reader order — three views of the same sequence.]
Right-side annotations are what a screen reader announces. The first regression test is "can a sighted keyboard user complete this form without a mouse?"

Inline errors after a failed submit need to be announced. Three pieces work together. The input gets aria-invalid="true". The error message gets an id, and the input gets aria-describedby="email-error" pointing at it. A live region (<div aria-live="assertive" role="alert">) announces the summary error once submission fails — "Sign-up failed. 2 fields have errors." Focus moves programmatically to the first invalid field so the user lands on the problem instead of hunting for it.

Colour contrast must hit WCAG 2.2 AA: 4.5:1 for body text against background, 3:1 for large text and UI components. A subtle "Forgot password?" link at #999 on white background is around 2.8:1 — fails. The fix is one design-token swap. Errors marked only in red colour also fail — colour alone cannot carry meaning for users with red-green colour blindness or screen-reader users. Add an icon, the word "Error:", or both.
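
The 4.5:1 figure comes from WCAG's relative-luminance formula, which is short enough to compute directly. A sketch that reproduces the failing #999-on-white example:

```javascript
// WCAG relative luminance of a hex colour like "#999999" or "#999".
function luminance(hex) {
  const h = hex.replace("#", "");
  const full = h.length === 3 ? [...h].map((c) => c + c).join("") : h;
  const [r, g, b] = [0, 2, 4].map((i) => {
    const channel = parseInt(full.slice(i, i + 2), 16) / 255;
    return channel <= 0.03928
      ? channel / 12.92
      : ((channel + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colours, per WCAG 2.x.
function contrast(fg, bg) {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

console.log(contrast("#999", "#fff").toFixed(2)); // "2.85" — fails AA's 4.5:1
console.log(contrast("#767676", "#fff") >= 4.5);  // true — a passing grey
```

Running this in CI over the design-token palette catches contrast regressions before a designer's eye has to.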

Keyboard navigation must complete the task. A user with no mouse must reach Submit, fill every field, and submit. A keyboard-only walkthrough catches the bugs visual testing misses: focus traps in modals (Tab cycles within an open dialog and Esc closes it), skip-to-content links (a <a href="#main"> first in the tab order lets keyboard users skip the nav), and visible focus rings (:focus-visible styling that's clearly visible against the background).
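
The core of a modal focus trap is index arithmetic: Tab past the last focusable element wraps to the first, and Shift+Tab from the first wraps to the last. A sketch of the wrapping logic separated from the DOM; the DOM half is querying the dialog's focusable elements and calling .focus():

```javascript
// Next index in a focus trap of `count` focusable elements.
// shiftKey reverses direction; both directions wrap around.
function nextFocusIndex(current, count, shiftKey) {
  const step = shiftKey ? -1 : 1;
  return (current + step + count) % count;
}

console.log(nextFocusIndex(2, 3, false)); // 0 — Tab off the last wraps to first
console.log(nextFocusIndex(0, 3, true));  // 2 — Shift+Tab off the first wraps to last
console.log(nextFocusIndex(1, 3, false)); // 2 — ordinary forward move
```

In the real handler this runs on keydown for "Tab" inside an open dialog with preventDefault(), and Escape closes the dialog and returns focus to the element that opened it.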

WCAG 2.2 AA is the minimum level most legal frameworks treat as "accessible." It defines 55 testable criteria across four principles — Perceivable, Operable, Understandable, Robust. Modern testing tools (axe-core, Lighthouse, Pa11y) catch around a third of WCAG violations automatically; the rest need manual review with a screen reader (NVDA on Windows, JAWS for the enterprise market, VoiceOver on macOS and iOS, TalkBack on Android). The engineering practice is to run automated checks on every PR (axe-core in unit tests) and manual screen-reader testing on every flow before release.

The trade-off is engineering time, and the honest answer is: less than you think, if you start with semantic HTML; far more than you think, if you bolt accessibility on after launch. Every framework component you write is either accessible by default or accessible by retrofit — and retrofits are always more expensive than getting the contract right the first time.

Mobile, native, and design as a discipline

The web reaches further than any other distribution channel, but it is not the only surface. Phones are the dominant general-purpose computing device, and the constraints there — thermal, battery, network, screen size, touch input, app-store rules — shape interface engineering in ways the web does not. Three paths reach a phone.

Native means writing once per platform in the platform's language and toolkit. iOS: Swift with SwiftUI (declarative, modern) or UIKit (imperative, mature). Android: Kotlin with Jetpack Compose (declarative) or the older XML-layout + View system. Native delivers the best performance, lowest battery use, and full access to every OS API the day it ships — Live Activities, Dynamic Island, ARKit, Health, CarPlay, WidgetKit. The cost is two codebases, two engineering tracks, and two release cycles. Native pays off when the app is the product, the user is in it daily, and the surface justifies the duplication.

Cross-platform toolkits target the duplication directly. React Native runs JS in a separate engine (Hermes by default on both platforms, historically JSC on iOS) and bridges to native UIKit / Android Views; the new architecture (Fabric + TurboModules) shrinks the bridge cost. Flutter does not use the native widgets at all — it ships its own rendering engine (Skia, now Impeller on iOS) and draws every pixel from a Dart codebase, achieving consistent visuals at the cost of platform-feel divergence. Both let one team ship to both phones; both also lag the platforms by months when new OS features arrive.

Progressive Web Apps (PWAs) bring the web to phones via the Service Worker, the manifest, and install-to-home-screen. Installable, offline-capable, push-notifiable on most platforms (Apple is selectively restrictive). No app-store gatekeeper, instant updates, and the same codebase as the web. The trade is reduced access to OS capabilities — file pickers, Bluetooth, NFC, contacts work in pockets of the spec, and the experience on iOS is more constrained than on Android.

[Figure: Mobile delivery paths — native, cross-platform, PWA — on the cost/capability trade-off. Native (Swift, Kotlin): full day-1 OS APIs, best performance with native widgets, two codebases and two teams, app-store review (1–7 days); wins for a flagship app in daily use. Cross-platform (React Native, Flutter): most capability with a plugin gap, near-native performance (Fabric), one codebase plus platform shims, app-store review still applies; wins for two platforms with a small team. PWA (web, Service Worker): web-platform subset, good performance on main-thread JS, one codebase owned by the web team, instant shipping with no review; wins for content and light interaction. Most teams ship a web product first, then a native app if engagement justifies it.]
The phone surface is three routes to the same screen. Pick the one whose constraint shape matches the team you have.

Across all three surfaces, design as a discipline decides whether the interface succeeds. Design is not the colour of a button; it is the set of decisions about what the product does and how the user moves through it. A designer who only makes screens hands a developer pictures; a designer who decides product behaviour hands a developer a spec for how the system responds to a user goal. The second one is harder to hire and unambiguously more valuable.

A working design system centralises decisions before each surface re-decides them. Design tokens are the foundational primitives — --color-bg-primary, --radius-md, --space-4, --font-size-lg — that propagate across web, iOS, and Android in a single source (often JSON via Style Dictionary, Tokens Studio, or the W3C Design Tokens spec). A token change updates every surface in one PR. Above the tokens sit component primitives — Button, Input, Modal, Toast — each accessible by default, each with documented props and states. Above those sit patterns — Sign-up flow, Empty state, Error recovery — that compose primitives into recognisable interaction shapes.

[Figure: Design tokens flow — one source propagates to every platform. tokens.json (color.bg.primary, space.4, radius.md) feeds the web (CSS custom properties, --bg-primary, consumed by React / Vue / Svelte), iOS (Swift constants, Color.bgPrimary, consumed by SwiftUI / UIKit), and Android (Compose tokens, MaterialTheme, consumed by Jetpack Compose). Style Dictionary or similar emits per-platform output from one JSON source of truth.]
A rebrand becomes a one-PR change. Without tokens it becomes a six-month migration touching every surface.
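
The one-source-many-outputs step is a small tree walk. A sketch of the web half — flattening a nested token object into CSS custom properties; the token names are illustrative, not a real design system:

```javascript
// Flatten { color: { bg: { primary: "#fff" } } } into
// [["--color-bg-primary", "#fff"], ...] — the CSS custom-property output.
function flattenTokens(tokens, prefix = "-") {
  return Object.entries(tokens).flatMap(([key, value]) =>
    typeof value === "object"
      ? flattenTokens(value, `${prefix}-${key}`)
      : [[`${prefix}-${key}`, String(value)]]
  );
}

const tokens = {
  color: { bg: { primary: "#ffffff" }, text: { primary: "#111111" } },
  radius: { md: "8px" },
  space: { 4: "16px" },
};

const css = flattenTokens(tokens)
  .map(([name, value]) => `  ${name}: ${value};`)
  .join("\n");
console.log(`:root {\n${css}\n}`);
```

The iOS and Android emitters are the same walk with different string templates, which is essentially what Style Dictionary's format plugins do.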

Motion deserves its own engineering treatment. A 200 ms ease-out transition tells the user "this thing came from that thing" — a property panel sliding in from the right hints at the spatial relationship between the trigger and the result. A 600 ms transition feels slow; a 100 ms one feels abrupt. Spring physics (overshoot, settle) produces motion that matches how physical objects move and signals tactility. Motion that does not communicate — a sparkle effect after every click, a fade-in on every page load — is noise that costs frame budget. Every animation should answer "what does this teach the user about the system?"

Perception research provides the constants. Fitts's law — formalised by Paul Fitts in 1954 — says the time to point at a target grows with distance and shrinks with target size: T = a + b * log₂(D / W + 1). The practical version: bigger targets are faster to hit, and targets at screen edges (corners, the edges of a phone) are effectively infinite in size because the cursor or finger cannot overshoot. Apple's minimum tap target is 44×44 pt; Google's is 48×48 dp. The 200 ms attention threshold is the rough window during which a user's gaze remains on an interaction's result before moving on; feedback that arrives after 200 ms feels disconnected from the action. Hick's law says decision time grows logarithmically with the number of options — a menu of 50 items is not 5 times slower to choose from than one of 10; by the log₂(n + 1) model it is closer to 1.6 times slower. The real cost of a long menu is scanning and recall, not the decision itself.
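
Both laws are one-line formulas, and plugging numbers in shows how slowly the logarithm grows. The intercept a and slope b are device-specific; the values here are illustrative constants, not measured ones:

```javascript
// Fitts's law: time to acquire a target of width W at distance D.
const fitts = (D, W, a = 50, b = 150) => a + b * Math.log2(D / W + 1);

// Hick's law: decision time among n equally likely options.
const hick = (n, b = 150) => b * Math.log2(n + 1);

// Doubling target size at the same distance cuts acquisition time:
console.log(fitts(400, 20).toFixed(0)); // "709" — 20 px target, 400 px away
console.log(fitts(400, 40).toFixed(0)); // "569" — same distance, twice the size

// A 50-item menu vs a 10-item menu: 5x the options, nowhere near 5x the time.
console.log((hick(50) / hick(10)).toFixed(2)); // "1.64"
```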

Typography on screens is a precise engineering surface. Type ramp sets a small number of sizes (12 / 14 / 16 / 20 / 24 / 32 / 48) and forbids the rest — five sizes carry every screen instead of fifty ad-hoc ones. Line height rises with font size for body text (1.5×) and falls for display text (1.1×). Measure — characters per line — stays in the 45–75 range for sustained reading. Variable fonts (woff2 files with wght, wdth, opsz axes) collapse what used to be six weight files into one, cutting font payload by 60–80% while letting design vary weight continuously.

The honest limit on design as engineering: most teams either underinvest (the designer makes pictures, the engineer makes ad-hoc decisions on every spec gap) or overinvest (a design system big enough to need its own engineering team consumes more capacity than it returns). The sweet spot is a small system — tokens, 15–25 components, a handful of patterns — owned by a designer-engineer pair and used by every product team. At that size, each product built on the system pays back its share of the investment in weeks; past it, payback stretches to months and the system becomes a product of its own.

Standards

The web platform is the most heavily specified surface in software. Most of the standards below are either WHATWG living specs (continuously updated), W3C Recommendations (snapshot standards), or vendor docs that have become de facto canon.

Web platform specs:

  • WHATWG HTML — html.spec.whatwg.org. The living spec for HTML, the DOM, parsing, the event loop, and most of what a browser implements at the document level.
  • WHATWG DOM — dom.spec.whatwg.org. The tree model, mutation observers, custom elements, shadow DOM.
  • ECMAScript — tc39.es/ecma262, plus the TC39 proposals tracker. The JavaScript language spec; proposals advance through stages 0–4 before becoming part of the annual edition.
  • CSS Working Group specs — drafts.csswg.org, with the canonical CSS Snapshot. Cascade, selectors, flexbox, grid, container queries, colour, and every module published or in draft.
  • URL Standard — url.spec.whatwg.org. The parsing and serialisation rules every fetch and link obeys.
  • Fetch — fetch.spec.whatwg.org. Defines Request, Response, CORS, and the network fetching semantics behind fetch() and Service Workers.
  • Streams — streams.spec.whatwg.org. ReadableStream, WritableStream, TransformStream; the substrate behind streaming responses and pipe chains.
  • Service Worker — w3.org/TR/service-workers. The installable network proxy that powers offline and PWAs.
  • IndexedDB — w3.org/TR/IndexedDB. The async structured client database.
  • WebSockets — RFC 6455, plus websockets.spec.whatwg.org. The full-duplex protocol layered on an HTTP/1.1 upgrade.
  • WebRTC — w3.org/TR/webrtc. Peer connection, data channels, getUserMedia integration.
  • WebGPU — gpuweb.github.io/gpuweb. Modern explicit-graphics API for the web; the shading language is WGSL.
  • Web Components — w3c.github.io/webcomponents. Custom elements, shadow DOM, HTML templates; the browser-native component model.
  • WebAssembly — webassembly.github.io/spec. The portable bytecode that runs alongside JS in every major browser.

Accessibility:

  • WCAG 2.2 — w3.org/TR/WCAG22. The current Web Content Accessibility Guidelines; AA is the practical legal floor in most jurisdictions, AAA is aspirational.
  • WAI-ARIA 1.3 — w3.org/TR/wai-aria-1.3. Accessible Rich Internet Applications: roles, states, properties.
  • ARIA Authoring Practices Guide (APG) — w3.org/WAI/ARIA/apg. Reference patterns for combobox, dialog, listbox, tabs, treegrid — the worked examples every component library cribs from.
  • ATAG 2.0 — w3.org/TR/ATAG20. Authoring Tool Accessibility Guidelines; for tools that generate web content.
  • Accessible Name and Description Computation (AccName) — w3.org/TR/accname. How browsers compute the string a screen reader announces.

Performance:

  • Core Web Vitals — web.dev/vitals. LCP, INP, CLS definitions, thresholds, and field-measurement methodology.
  • web-vitals JS library — github.com/GoogleChrome/web-vitals. Reference implementation for measuring Core Web Vitals in production.
  • Performance Timeline — w3.org/TR/performance-timeline. The PerformanceObserver API and entry types (navigation, resource, paint, largest-contentful-paint, event).
  • HTTP Archive Web Almanac — almanac.httparchive.org. Annual report on the state of the web platform from real-site crawls.
  • caniuse.com — the reference for browser-feature support tables; check before relying on any spec less than three years old.

JavaScript engines:

  • V8 — v8.dev. Chrome, Edge, Node, and Deno run V8. Deep blog posts on TurboFan, Sparkplug, Maglev, Liftoff.
  • JavaScriptCore (JSC) — webkit.org/blog and the JSC wiki. Safari, Bun, and Tauri on macOS.
  • SpiderMonkey — firefox-source-docs.mozilla.org/js. Firefox; Warp is the current optimising tier.

Frameworks and libraries:

  • React — react.dev. The reference for React 19+, including Server Components and the React Compiler.
  • Vue — vuejs.org/guide. Vue 3 docs cover the Composition API, reactivity (ref, reactive), and SFCs.
  • Svelte — svelte.dev/docs. Svelte 5 documentation with the runes ($state, $derived, $effect) reactivity model.
  • Solid — docs.solidjs.com. Fine-grained signal-based reactivity; no virtual DOM.
  • Angular — angular.dev. The full-framework stack; recently added signals alongside the older zone-based change detection.
  • Astro — docs.astro.build. Islands architecture for content-led sites.
  • Qwik — qwik.dev. Resumable framework that serialises the framework state into HTML so the client never re-bootstraps.
  • Next.js — nextjs.org/docs. The React full-stack framework that popularised the App Router and Server Components.
  • SvelteKit / Nuxt / Remix — official docs at kit.svelte.dev, nuxt.com/docs, remix.run/docs.

Mobile platforms:

  • Apple Developer — developer.apple.com. SwiftUI, UIKit, and the Human Interface Guidelines.
  • Android Developers — developer.android.com. Kotlin, Jetpack Compose, and Material Design guidance.
  • React Native — reactnative.dev. The cross-platform docs, including the new architecture and Hermes.
  • Flutter — docs.flutter.dev. Dart, the widget catalogue, and the Impeller renderer.

Design references:

  • Refactoring UI — Adam Wathan and Steve Schoger, 2018, refactoringui.com. Practical visual-design heuristics for engineers.
  • The Design of Everyday Things — Don Norman, MIT Press, revised edition 2013. The foundational text on affordances, signifiers, and feedback in interface design.
  • Fitts (1954) — Paul M. Fitts, "The information capacity of the human motor system in controlling the amplitude of movement," Journal of Experimental Psychology 47(6). The original derivation of the relationship between target size, distance, and pointing time.
  • Designing Interfaces — Jenifer Tidwell, Charles Brewer, Aynne Valencia, O'Reilly, 4th ed. 2020. The pattern-language reference for UI interactions.
  • Inclusive Design Principles — inclusivedesignprinciples.org. The Paciello Group's seven principles for designing for the full range of users.

Cross-act references:

  • Image, audio, and font encoding — every JPEG, AVIF, WebP, WOFF2, and Unicode glyph the browser renders is a byte pattern decided in Act I. Performance work at the interface layer often becomes encoding work upstream.
  • Browser tabs are processes; tabs use threads; the OS schedules them. The reality underneath the main-thread metaphor lives in Act IV — virtual memory, file descriptors, the scheduler that decides when your tab gets CPU.
  • HTTP/2, HTTP/3, TLS, DNS — every request that produces a pixel travels through the protocol stack documented in Act Va. Latency at the interface is mostly network latency in disguise.
  • Caching, CDNs, observability, and the back-end performance story — the systems that sit behind every API call from the browser — are Act Vc.
  • The team practice that ships the interface and keeps it improving — version control, code review, testing, CI/CD, decisions on paper — is Act IXb. Once the interface is live, keeping it alive under load — capacity, profiling, on-call, incident response, SLOs — is Act IXc.

Going deeper

Branches that earn their own article.

  • Browser engine internals (Blink, WebKit, Gecko).
  • JavaScript engine deep dives (V8 hidden classes, inline caches, TurboFan).
  • CSS engine internals (style invalidation, layout algorithms).
  • Framework internals (React fiber, Vue reactivity, Svelte compiler, Solid signals).
  • Mobile platform deep dives (UIKit/SwiftUI, Jetpack Compose).
  • WebGPU and graphics on the web.
  • WebAssembly in production.
  • Accessibility testing tooling and audits.
  • Design tokens and design-system architecture.
  • Typography for screens.