Networking

Two computers want to exchange bytes. Between them is a wire — or fibre, or radio — that delays, drops, reorders, duplicates, and corrupts what they send. That's the entire problem. Every protocol on this page exists to mask one of those failures so the layer above can pretend the wire is something it isn't: a reliable stream, a named host, a request that returned an answer.

[Figure: The networking stack: five layers and the failure modes they mask. Left column, top to bottom: Application (HTTP, DNS, SMTP, TLS; what the program speaks), Transport (TCP reliable stream, UDP fast datagram, QUIC), Network (IPv4, IPv6; packets routed across networks), Link (Ethernet, Wi-Fi; frames between neighbours), Physical (copper, fibre, radio; voltage, photons, EM waves). Right column, the unreliable wire that every layer above masks: delay (RTT 0.1–300 ms), drop (loss 0.01–5%), reorder (queue rebalances), duplicate (retx echoes), corrupt (BER 10⁻⁹–10⁻³), plus partition, congestion, and adversaries. Each layer adds a header above and consumes the header below. HTTP/3 swapped TCP for QUIC without touching IP, link, or physical.]
The whole act is the right column hiding behind the left. Remove TCP and the network doesn't become reliable — the wire's behaviour just stops being hidden from the application.

The unreliable wire

A signal in copper or fibre travels at roughly two-thirds the speed of light. Strip the protocols away and you're left with a medium that makes one promise — best effort — and breaks it in five well-defined ways.

Delay is unavoidable. A transcontinental round trip is about 70 ms with a perfect path and no queues. Real round-trip times typically sit between 0.1 ms (same datacentre rack) and 300 ms (a long intercontinental path), and a geostationary satellite hop pushes the round trip to roughly 500 ms or more. Drops happen when a router's queue fills or a frame's checksum fails. 0.01% loss is healthy, 1–5% is congested, and anything higher turns a reliability protocol into stop-and-go traffic. Reorder comes from equal-cost paths with different latencies: two packets in the same flow can take different routes and arrive out of order. Duplication is rare on the wire itself but common after retransmission echoes. Corruption (single bits flipped) is detected by checksums at multiple layers, with bit-error rates from 10⁻⁹ on fibre to 10⁻³ on noisy wireless.

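[Figure: Five failure modes of the unreliable wire between hosts A and B. Delay: RTT 0.1–300 ms, transcontinental ≈70–150 ms (propagation + queueing + processing). Drop: loss 0.01%–5%. Reorder: packets sent as 1 2 3 4 5 arrive as 1 3 2 5 4 (ECMP). Duplicate: packets sent as 1 2 3 arrive as 1 2 2 3 3 (retx echo). Corrupt: 01101001 arrives as 01111001 (BER 10⁻⁹–10⁻³).]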
Five failure modes, one wire. Fibre is quiet, Wi-Fi is loud, but the stack must handle all five — one bad hop on the path is enough to expose any of them to the application.

The wire isn't the only thing that can fail. Routers crash. Links flap. BGP withdraws a prefix and a continent's traffic shifts. A NAT entry expires and an idle connection dies. Software treats these as variants of "drop," but the timescales differ enormously: a buffer overflow loses a packet in microseconds; a BGP reconvergence stalls traffic for tens of seconds. Anything that makes the network unreliable lives in this column.

Layers

The problem. A single end-to-end protocol over a wire would have to handle modulation, neighbour addressing, global addressing, reliability, semantics, and security all at once. Change anything and you change everything. Nobody can deploy that.

The fix. Split the work into five layers, each with a narrow contract. The physical layer turns bits into voltages, photons, or radio waves. The link layer (Ethernet, Wi-Fi) frames bits and addresses immediate neighbours on the same segment. The network layer (IP) carries packets from any source to any destination across routers. The transport layer (TCP, UDP, QUIC) offers either a reliable byte stream or an unreliable datagram. The application layer (HTTP, DNS, SMTP) defines what the bytes mean.

Each layer prepends its own header to whatever it gets from above — encapsulation — and strips it on receive. The layer above is opaque payload. Headers carry only the metadata the current layer needs.
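As a rough illustration of encapsulation, the sketch below wraps a payload in three placeholder headers and strips them again in reverse. The header strings and the function names `encapsulate` and `decapsulate` are assumptions made for this example; real Ethernet, IP, and TCP headers are binary structures, not text tags.

```python
# Toy encapsulation: each layer prepends its own header and treats the rest as opaque bytes.
# The header strings are placeholders, not real Ethernet/IP/TCP formats.
LAYERS = [b"TCP|", b"IP|", b"ETH|"]  # order applied on send: transport first, link last


def encapsulate(payload: bytes) -> bytes:
    """Wrap an application payload on its way down the stack."""
    for header in LAYERS:
        payload = header + payload
    return payload


def decapsulate(frame: bytes) -> bytes:
    """Strip headers outermost-first on the way back up the stack."""
    for header in reversed(LAYERS):
        if not frame.startswith(header):
            raise ValueError(f"expected header {header!r}")
        frame = frame[len(header):]
    return frame


frame = encapsulate(b"GET / HTTP/1.1")
print(frame)               # b'ETH|IP|TCP|GET / HTTP/1.1'
print(decapsulate(frame))  # b'GET / HTTP/1.1'
```

No layer in this sketch inspects anything beyond its own header, which is the property that lets each layer evolve independently.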

[Figure: Encapsulation: an HTTP request wrapped in TCP, IP, and Ethernet. The Ethernet frame contains the IP packet, which contains the TCP segment, which contains the HTTP request (opaque payload to TCP, IP, and Ethernet). On the wire: ETH | IP | TCP | HTTP, with each header read and stripped in reverse on receive.]
Each layer treats the one above as a blob. The Ethernet MTU (typically 1500 bytes) caps the IP packet each frame can carry; longer messages are fragmented at IP or segmented at TCP.

The win is independent evolution. HTTP went from text in 1991 to binary multiplexed streams in 2015 without TCP changing. TCP went from Reno congestion control to CUBIC to BBR without IP changing. IPv6 deployed alongside IPv4 without Ethernet changing. A new physical medium (LTE, fibre to the home) plugs in below the link layer without anything above noticing. Shared state between layers would have killed every one of these transitions.

The pitfall. The model is clean; production crosses the lines whenever performance or policy demands it. A modern NIC offloads TCP segmentation and TLS encryption into hardware. A router doing deep packet inspection reads inside TCP and HTTP on purpose to apply policy. Layering is the abstraction, not the implementation.

ARP — bridging two address spaces

Link and network layers use different addresses. Ethernet uses 48-bit MAC addresses; IP uses 32-bit (v4) or 128-bit (v6) addresses. To send an IP packet to 192.168.1.5, the host needs the MAC address of whichever interface owns that IP on the local segment — because the Ethernet frame is addressed to a MAC, not an IP.

ARP (Address Resolution Protocol) bridges the gap. The host broadcasts "who has 192.168.1.5?" on the local segment. Whichever interface owns that address replies with its MAC. The host caches the mapping for a few minutes, then re-asks when it expires. ARP only works inside one broadcast domain — once the packet crosses a router, the MAC pair changes hop by hop while the IP pair survives end to end. IPv6 replaces ARP with Neighbor Discovery; same idea, multicast instead of broadcast.

[Figure: ARP resolves a local IP to a MAC before the frame can leave. Host A (192.168.1.2, aa:aa:…:01) and host B (192.168.1.5, bb:bb:…:05) share one Ethernet segment (one broadcast domain). ARP request, broadcast to ff:ff:ff:ff:ff:ff: "who has 192.168.1.5?" ARP reply: "192.168.1.5 is at bb:bb:…:05". Frame on the wire: dst MAC bb:bb:…:05, src MAC aa:aa:…:01, payload IP{src 1.2, dst 1.5} | TCP | data. The link layer addresses neighbours; the network layer addresses endpoints. Crossing a router rewrites the MAC pair; the IP pair survives end to end.]
The ARP cache is tiny but load-bearing. Spoofed replies are the basis of classic LAN attacks — tell everyone the gateway is at the attacker's MAC, then silently relay every byte.

IP and routing

The problem. A packet leaving a host needs to reach any other host anywhere on Earth, through a path the sender doesn't know in advance, across networks owned by different organizations.

The fix. IP (Internet Protocol) gives every host a numeric address and lets routers forward packets toward it one hop at a time. IPv4 uses 32-bit addresses written as four decimal octets (93.184.216.34) — about 4.3 billion total, exhausted at the registry level since 2011. IPv6 uses 128-bit addresses written as hex groups (2606:2800:0220::1946) and is large enough that exhaustion isn't a concern this millennium. Both use the same routing model.

Addresses are grouped into CIDR prefixes. 93.184.216.0/24 covers 256 addresses from .0 to .255. The slash counts the leading bits that name the network; the rest names the host inside. Prefixes nest: 10.0.0.0/8 contains 10.1.0.0/16 contains 10.1.2.0/24.
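The prefix arithmetic above can be checked with Python's standard `ipaddress` module. This is only a small sketch; the variable names are arbitrary.

```python
import ipaddress

net = ipaddress.ip_network("93.184.216.0/24")
print(net.num_addresses)                              # 256
print(net[0], net[-1])                                # 93.184.216.0 93.184.216.255
print(ipaddress.ip_address("93.184.216.34") in net)   # True

# Prefixes nest: each longer prefix is a subnet of the shorter one.
a = ipaddress.ip_network("10.0.0.0/8")
b = ipaddress.ip_network("10.1.0.0/16")
c = ipaddress.ip_network("10.1.2.0/24")
print(c.subnet_of(b), b.subnet_of(a))                 # True True
```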

A router's job is forwarding. Its routing table is a list of prefixes paired with next hops. For each packet, it picks the most specific matching prefix — longest-prefix match — and forwards to the next hop named there. No router knows the entire path; each makes a local decision. The packet's full route is the consequence of those independent decisions strung together.

[Figure: Subnetting a /24 into four /26 blocks. 192.168.1.0/24 has 256 addresses, 24 network bits, and 8 host bits; borrowing 2 host bits makes 4 subnets.]

Subnet              Address range   Usable hosts
192.168.1.0/26      .0 – .63        62
192.168.1.64/26     .64 – .127      62
192.168.1.128/26    .128 – .191     62
192.168.1.192/26    .192 – .255     62

Prefix   Mask               Addresses   Usable hosts
/24      255.255.255.0      256         254
/25      255.255.255.128    128         126
/26      255.255.255.192    64          62
/30      255.255.255.252    4           2
Each bit borrowed from the host portion doubles the number of subnets and halves their size. Two addresses per subnet are reserved — the network (all zeros) and broadcast (all ones) — which is why a /30 point-to-point link gives only two usable hosts.
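The same split can be reproduced with the standard `ipaddress` module. In this sketch, `prefixlen_diff=2` stands for the two borrowed host bits; the variable names are illustrative.

```python
import ipaddress

parent = ipaddress.ip_network("192.168.1.0/24")

# Borrowing 2 host bits yields 2**2 = 4 subnets of 64 addresses each.
subnets = list(parent.subnets(prefixlen_diff=2))
for s in subnets:
    usable = s.num_addresses - 2   # minus the network and broadcast addresses
    print(s, s.network_address, s.broadcast_address, usable)

# A /30 point-to-point link: 4 addresses, 2 usable.
p2p = ipaddress.ip_network("192.168.1.0/30")
print(p2p.num_addresses - 2)       # 2
```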
[Figure: Routers forward by longest-prefix match. Routers R1, R2, R4, R5, and R6 sit between src and dst 198.51.100.42.]

R2's routing table:

Prefix             Next hop   Metric
10.0.0.0/8         R3         10
198.51.100.0/24    R4         5
0.0.0.0/0          R3         100    (default route, last resort)

Longest-prefix match: 198.51.100.42 matches the /24 (most specific), so R2 forwards to R4.
R2 receives a packet, scans its table, picks the longest matching prefix, hands it off. Repeat at every hop. The path is whatever falls out of those local decisions.
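A minimal sketch of that lookup, using R2's three routes. The linear scan and the `next_hop` helper are simplifications chosen for readability; production routers use specialised structures such as tries or TCAM rather than scanning a list.

```python
import ipaddress

# R2's table: (prefix, next hop). Metrics are omitted for brevity.
TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), "R3"),
    (ipaddress.ip_network("198.51.100.0/24"), "R4"),
    (ipaddress.ip_network("0.0.0.0/0"), "R3"),   # default route
]


def next_hop(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in TABLE if addr in net]
    # Longest prefix wins. The default route matches everything, so matches is never empty.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


print(next_hop("198.51.100.42"))  # R4 (the /24 beats the default route)
print(next_hop("10.9.8.7"))       # R3 (via 10.0.0.0/8)
print(next_hop("8.8.8.8"))        # R3 (via the default route)
```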

Traceroute

The path is invisible to the endpoints — neither end knows which routers the packets visited. Traceroute reconstructs it with a clever trick on the IP TTL (time-to-live) field. Every router decrements TTL by one; if TTL hits zero, the router drops the packet and returns an ICMP "Time Exceeded" message identifying itself.

Traceroute sends a probe with TTL=1, learns the first hop from the ICMP reply, then TTL=2 for the second, and so on. TTL exists to kill packets stuck in routing loops; traceroute repurposes it as a diagnostic.
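The TTL trick can be simulated without sending any packets. In the sketch below the router names, `send_probe`, and `traceroute` are all hypothetical; a real implementation needs raw sockets and ICMP parsing, which are left out here.

```python
# Simulated path: the router names are hypothetical and no packets are sent.
PATH = ["r1.example", "r2.example", "r3.example"]
DEST = "dst.example"


def send_probe(ttl):
    """Return (reply_type, responder) for a probe sent with the given TTL."""
    for router in PATH:
        ttl -= 1                               # every router decrements TTL by one
        if ttl == 0:
            return ("time-exceeded", router)   # router drops the probe and names itself
    return ("reached", DEST)


def traceroute(max_hops=30):
    hops = []
    for ttl in range(1, max_hops + 1):
        kind, who = send_probe(ttl)
        hops.append(who)
        if kind == "reached":
            break
    return hops


print(traceroute())  # ['r1.example', 'r2.example', 'r3.example', 'dst.example']
```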

Inside vs between networks

A network operated by one organization is an autonomous system (AS). Inside an AS, routers use interior protocols like OSPF or IS-IS to flood link-state updates so every router builds the same map and computes shortest paths.

Between ASes, BGP-4 carries policy. AS 15169 (Google) announces to AS 7018 (AT&T): "I have a path to 8.8.8.0/24 of length 1." BGP is a path-vector protocol — every announcement carries the list of ASes it has crossed, which is how loops are detected (see your own AS in the path, ignore the route) and how operators tiebreak between alternatives. But length is only one signal. Operators override it constantly with policy attributes encoding business rules: prefer customer routes over peer routes over upstream routes, because customers pay you and upstreams charge you. The internet's routes are shaped as much by money as by physics.

[Figure: BGP route propagation: AS-PATH grows by one with each hop. AS 1 (origin of 203.0.113.0/24) sends an UPDATE to AS 2 (transit) with AS-PATH [1]; AS 2 sends an UPDATE to AS 3 (customer) with AS-PATH [2, 1]. Each AS prepends its number before re-announcing; policy attributes (LOCAL_PREF) override length when business rules dictate.]
BGP is path-vector: a route is the prefix plus the list of ASes it has crossed. The path is policy first, length second.
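A heavily simplified sketch of the two ideas: prepending with loop detection, and policy outranking path length. The `receive` and `best` functions and the LOCAL_PREF values are assumptions for illustration; the real BGP decision process has many more steps.

```python
def receive(my_as, prefix, as_path):
    """Process an announcement at my_as; return what it would re-announce, or None."""
    if my_as in as_path:
        return None                      # loop detected: our own AS is already in the path
    return (prefix, [my_as] + as_path)   # prepend our AS number before re-announcing


origin = ("203.0.113.0/24", [1])         # what AS 1 sends to AS 2
sent_by_as2 = receive(2, *origin)        # ('203.0.113.0/24', [2, 1])
sent_by_as3 = receive(3, *sent_by_as2)   # ('203.0.113.0/24', [3, 2, 1])
looped = receive(1, *sent_by_as2)        # None: AS 1 sees itself in the path


def best(routes):
    """Policy first (highest LOCAL_PREF), AS-path length second (shortest)."""
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))


routes = [
    {"via": "peer", "local_pref": 100, "as_path": [7]},            # shorter path
    {"via": "customer", "local_pref": 200, "as_path": [9, 8, 7]},  # preferred by policy
]
print(best(routes)["via"])  # customer
```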

Anycast — one address, many sites

The problem. Anchoring a service to one IP address means anchoring it to one location. If that location dies, the service dies. If the user is on another continent, the round trip is bad.

The fix. Have multiple sites announce the same prefix from different ASes. Routers pick the path that looks shortest to them, so each client lands at whichever site is closest on the BGP topology. This is anycast: one IP advertised from many places, the network's own routing decides who goes where. The DNS root servers all use it. CDNs use it. DDoS scrubbers use it.

The catch is flow stability. If BGP reroutes mid-connection, a long-lived TCP session can suddenly find itself talking to a different site that has no idea about the connection. Anycast works cleanly for stateless or short-lived flows (DNS, fresh HTTPS). QUIC's connection IDs were partly designed to survive such mid-flow reroutes.

NAT and address sharing

The problem. IPv4 ran out of public addresses. Most home and corporate networks have far more devices than public IPs.

The fix. NAT (Network Address Translation) lets many devices share one public IP by rewriting the source address and port on outbound packets. The NAT keeps a table mapping (internal IP, internal port) to (public IP, public port). When the reply arrives, it looks up the entry and rewrites in reverse. From the outside, the whole network looks like one host.

The pitfall is port exhaustion: the public side has only 65,535 ports, and many flows reuse a small range. Heavy concurrent users behind one NAT — especially apps that open hundreds of TLS connections to the same destination — can run out. NAT also breaks any protocol that embeds addresses inside the payload, since rewriting the header doesn't touch the body.

[Figure: NAT rewrites source address and port on outbound flows. Hosts A, B, and C on the private LAN 10.0.0.0/24 sit behind a NAT that rewrites the source to 203.0.113.7; the public internet sees one IP per LAN.]

NAT translation table (per flow):

Internal (src:port)   External (src:port)   Remote (dst:port)
10.0.0.5:50321        203.0.113.7:40001     93.184.216.34:443
10.0.0.6:50321        203.0.113.7:40002     93.184.216.34:443
10.0.0.7:33108        203.0.113.7:40003     1.1.1.1:53
Two devices behind one NAT can both use source port 50321 to the same destination — the NAT picks unique external ports to keep them apart.
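The translation table can be sketched as two dictionaries. The `Nat` class, its sequential port allocator, and the starting port 40001 are assumptions for this example; real NATs also time out idle entries and recycle ports.

```python
import itertools

PUBLIC_IP = "203.0.113.7"


class Nat:
    def __init__(self, first_port=40001):
        self._ports = itertools.count(first_port)  # naive allocator, never reuses ports
        self.out = {}    # (internal ip, internal port, remote ip, remote port) -> public port
        self.back = {}   # public port -> (internal ip, internal port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source of an outbound packet, creating a mapping if needed."""
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self.out:
            port = next(self._ports)
            self.out[key] = port
            self.back[port] = (src_ip, src_port)
        return (PUBLIC_IP, self.out[key], dst_ip, dst_port)

    def inbound(self, public_port):
        """Look up the internal host for a reply; None means no mapping, so drop."""
        return self.back.get(public_port)


nat = Nat()
print(nat.outbound("10.0.0.5", 50321, "93.184.216.34", 443))  # source port becomes 40001
print(nat.outbound("10.0.0.6", 50321, "93.184.216.34", 443))  # source port becomes 40002
print(nat.inbound(40002))                                     # ('10.0.0.6', 50321)
print(nat.inbound(50000))                                     # None
```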

TCP and UDP — the two transports

The problem. IP delivers packets unreliably. The application wants to send a file and have all of it arrive, in order, exactly once. Building that on top of "best effort" requires every byte to be tracked, missing ones detected and resent, and the sender slowed when the network can't keep up.

The fix. TCP (Transmission Control Protocol) turns IP's best-effort packets into a reliable, in-order, full-duplex byte stream between two endpoints. It does this with three mechanisms layered on the same packet exchange: connection setup, sequence numbers with acknowledgements, and windows that limit how much data is in flight.

The three-way handshake

Before sending data, both sides need to agree they're talking to each other and exchange starting sequence numbers. The three-way handshake does this in three packets:

  1. Client sends SYN with initial sequence number x.
  2. Server replies with SYN-ACK, acknowledging x and announcing its own initial sequence y.
  3. Client sends ACK acknowledging y.

Three packets, one full RTT between the client's first SYN and the moment it can attach data to the closing ACK. Both sides now know the other's starting sequence. That latency penalty — paid before any useful byte moves — is one reason QUIC and TCP Fast Open exist.

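[Figure: TCP three-way handshake and small data exchange. Client to server: SYN seq=x. Server to client: SYN-ACK seq=y, ack=x+1. Client to server: ACK ack=y+1 (established, 1.5 RTT later). Client to server: DATA seq=x+1, len=460. Server to client: ACK ack=x+461 (cumulative: covers everything up to x+460).]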
Initial sequence numbers are randomized so off-path attackers can't inject data by guessing them. The SYN itself consumes a sequence value.
Worked example: one handshake with concrete sequence numbers

Each side picks an initial sequence number (ISN) at random — say the client picks 1000, the server picks 5000. The handshake is three packets that teach each side the other's starting number.

Step   Direction   Flags      seq    ack    What it says
1      C to S      SYN        1000   -      "I want to talk. My byte stream starts at 1000."
2      S to C      SYN, ACK   5000   1001   "Got your 1000. Mine starts at 5000."
3      C to S      ACK        1001   5001   "Got your 5000. Ready."

After step 3 both sides know the other's ISN. The SYN itself consumes one sequence value, which is why the acks are 1001 and 5001 rather than 1000 and 5000.

Now suppose the client sends 460 bytes of data and the server sends nothing back yet:

Step   Direction   Flags   seq    ack    Notes
4      C to S      DATA    1001   5001   bytes 1001 through 1460
5      S to C      ACK     5001   1461   "I have everything up to byte 1460."

The ack number is always the next byte the receiver expects — not the last one it got. If a later segment arrives with seq = 1600 but the segment starting at 1461 was dropped, the server still acks 1461, repeatedly. The duplicate acks tell the client which byte to retransmit.

Sequence numbers and retransmission

Every byte in the stream has a sequence number. The receiver acknowledges the highest contiguous sequence it has received — ACKs are cumulative, so one ACK can cover many segments. The sender retransmits anything not acknowledged within a retransmission timeout (RTO) derived from the round-trip time it has measured.

Cumulative ACKs have a weakness: if segment 5 is lost but 6, 7, 8 arrived, the receiver can only ACK up to 4. The sender doesn't know whether 6, 7, 8 arrived or were lost too. Selective acknowledgement (SACK) fixes this — the receiver explicitly lists the ranges it has, so the sender retransmits only what's missing.
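To make the difference concrete, here is a sketch of what a receiver could report given the segments it holds. The `ack_state` helper is hypothetical and works on byte numbers from the worked example; SACK ranges are given as (first byte, byte after last), and real TCP adds limits on how many blocks fit in the option space.

```python
def ack_state(next_expected, segments):
    """segments: list of (seq, length) received so far, in any order.
    Returns (cumulative ack, list of SACK ranges beyond the first gap)."""
    ack = next_expected
    ranges = []
    for seq, length in sorted(segments):
        end = seq + length                 # first byte after this segment
        if seq <= ack:
            ack = max(ack, end)            # contiguous: advance the cumulative ack
        elif ranges and seq <= ranges[-1][1]:
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))  # extend the last block
        else:
            ranges.append((seq, end))      # out-of-order block, reportable via SACK
    return ack, ranges


# 460 bytes arrive in order: ack moves to 1461.
print(ack_state(1001, [(1001, 460)]))                            # (1461, [])
# The segment starting at 1461 is lost, a later one arrives: ack is stuck at 1461.
print(ack_state(1001, [(1001, 460), (1600, 100)]))               # (1461, [(1600, 1700)])
# The retransmission fills the gap: ack jumps past everything held.
print(ack_state(1001, [(1001, 460), (1600, 100), (1461, 139)]))  # (1700, [])
```

With only the cumulative number, the sender sees 1461 in both the second and a hypothetical "everything after 1461 lost" case; the SACK range is what distinguishes them.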

Two windows

Flow control is a peer concern: the receiver advertises a receive window (rwnd) in every ACK, telling the sender how many bytes its buffer can hold. The sender never has more than rwnd bytes outstanding. Without this, a fast sender would drown a slow receiver.

Congestion control is a shared-resource concern: the sender keeps its own congestion window (cwnd), an estimate of how much the network can absorb without dropping. The actual in-flight limit is min(rwnd, cwnd).

Why two? Because the receiver knows its own buffer but not the network's; the sender knows the network's behaviour but not the receiver's buffer. Each enforces what it can see.

Congestion control

If every sender pushed packets as fast as it could, the bottleneck router's queue would fill, packets would drop, senders would retransmit, queues would fill again. The network would collapse. Congestion control is the agreement that prevents this — and the classic rule is AIMD: additive increase, multiplicative decrease.

  • Slow start. A new connection opens cwnd at ≈10 segments and doubles it every RTT (exponential ramp).
  • Congestion avoidance. Once a threshold is reached, growth slows to ≈1 segment per RTT (additive).
  • On loss. Cut cwnd in half (multiplicative). Resume additive growth.

The asymmetry — climb slowly, fall fast — is what makes competing TCP flows converge to a fair share of a bottleneck. Two senders both backing off on loss leaves room for each to grow back, and they end up trading congestion signals at roughly equal rates.

[Figure: Congestion window: slow start, AIMD, loss. Axes: cwnd against time, with ssthresh marked. Slow start (×2 per RTT) climbs to the first loss; congestion avoidance (+1 per RTT) follows; each loss sets cwnd ← cwnd / 2.]
Exponential climb until the first loss, then linear growth, then halve on each loss event. The slope is gentle; the cliffs are sharp.
Worked example: one connection through slow start, loss, and recovery

A fresh TCP connection. Take RTT = 50 ms and segment size = 1500 bytes. The congestion window is measured in segments.

RTT   Phase                  cwnd before   cwnd after   What just happened
0     slow start             -             10           initial window
1     slow start             10            20           every ack of the 10 in-flight grew cwnd by 1
2     slow start             20            40           doubled again
3     slow start             40            80           doubled again; sender is now pushing 80 × 1500 = 120 KB per RTT
4     loss event             80            40           a drop is detected; cwnd halved; ssthresh now 40
5     congestion avoidance   40            41           linear growth from here: +1 per RTT
6     congestion avoidance   41            42
...
30    congestion avoidance   65            66           a long climb back

The asymmetry is the whole story. The connection took 4 RTTs (200 ms) to find its first ceiling, and it will take 40 RTTs (two seconds) of +1 growth to climb from 40 back to that ceiling of 80 after one drop. This is why a single packet loss on a long path is so much more expensive than the lost bytes suggest, and why short-lived connections (most HTTP requests) never escape slow start at all.
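The table can be reproduced with a few lines of simulation. This is a sketch of the textbook slow-start and AIMD rules only, measured in whole segments per RTT; the `simulate` function is an illustrative name, and real stacks (CUBIC, BBR) behave differently.

```python
def simulate(rtts, loss_at, initial=10):
    """Return cwnd (in segments) after each RTT; index 0 is the initial window."""
    cwnd, ssthresh, history = initial, None, [initial]
    for rtt in range(1, rtts + 1):
        if rtt in loss_at:
            cwnd = max(cwnd // 2, 1)   # multiplicative decrease
            ssthresh = cwnd
        elif ssthresh is None or cwnd < ssthresh:
            cwnd *= 2                  # slow start: double per RTT
        else:
            cwnd += 1                  # congestion avoidance: +1 segment per RTT
        history.append(cwnd)
    return history


h = simulate(44, loss_at={4})
print(h[:7])            # [10, 20, 40, 80, 40, 41, 42]
print(h[30])            # 66
print(h.index(80, 5))   # 44: the first RTT at which cwnd is back to 80
```

The window is back at 80 only at RTT 44, forty round trips after the loss at RTT 4, which at 50 ms per RTT is two seconds.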

Two variants dominate today. CUBIC (Linux default) replaces linear increase with a cubic function of time since the last loss — it recovers high windows faster on long, fast paths. BBR (Google) abandons loss-as-signal entirely and measures the bottleneck bandwidth and minimum round-trip time directly, pacing packets to fill the pipe without queueing. BBR is faster on lossy paths; fairness with CUBIC is contested.

Bufferbloat

The classic congestion control assumes loss is the signal. But cheap RAM made router buffers enormous, and a deep queue holds packets for hundreds of milliseconds before they're dropped. The result is bufferbloat: throughput is fine, latency is awful, and the sender doesn't notice because it isn't seeing loss yet. Interactive applications — video calls, gaming — die under load.

Active queue management drops packets before the buffer fills, so the congestion signal arrives in time. CoDel (widely deployed on Linux-based home routers, often as fq_codel) drops based on how long packets have been queued: if the smallest queueing delay seen over the last interval (100 ms by default) stays above the 5 ms target, drop one. Little or no tuning is required. The trade-off is a slight throughput hit for predictable end-to-end latency under load, and most interactive workloads prefer that bargain.

UDP — the unreliable alternative

The problem with TCP. Reliable, in-order delivery isn't always what the application wants. A video frame that arrives 200 ms late is useless. A DNS query is one packet — retransmission, ordering, congestion control add cost for no gain.

UDP (User Datagram Protocol) is the alternative: an 8-byte header (source port, destination port, length, checksum) and a payload. No connection. No retransmission. No ordering. No congestion control. The application gets a thin shell over IP and handles everything itself — or chooses not to. DNS, NTP, video calls, game traffic, QUIC, and most metrics protocols use UDP because TCP's contract is wrong for them.
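The header is small enough to build by hand. The sketch below packs the four 16-bit fields with the standard `struct` module; `udp_header` is an illustrative helper, and it leaves the checksum at 0, which over IPv4 means "not computed" (IPv6 requires a real checksum).

```python
import struct


def udp_header(src_port, dst_port, payload, checksum=0):
    """Pack the 8-byte UDP header: four 16-bit big-endian fields.
    The length field covers the header plus the payload."""
    length = 8 + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


hdr = udp_header(50321, 53, b"hello")
print(len(hdr))                      # 8
print(struct.unpack("!HHHH", hdr))   # (50321, 53, 13, 0)
```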

[Figure: TCP vs UDP on the same lossy exchange. TCP (client and server): SYN, SYN-ACK, ACK, data #1, data #2 (lost), ACK #1, retransmit #2; connection, ordered, reliable, congestion-controlled. UDP (client and server): datagram #1, datagram #2 (lost), datagram #3; no ACK, no retransmit, #2 simply gone; connectionless, datagram-preserving, best-effort.]
TCP guarantees what an application wants for a file transfer. UDP guarantees nothing and gets out of the way. Real-time protocols pick UDP because waiting for a retransmit costs more than the lost data was worth.

Pitfall — head-of-line blocking. TCP delivers bytes in order. Lose one segment mid-burst and the receiver buffers everything after it until the retransmission arrives — even if the application could process later bytes independently. HTTP/2 multiplexes many streams over one TCP connection, so one lost packet stalls every stream. HTTP/3 fixes this by moving to QUIC, where each stream has its own loss-recovery state.

DNS

The problem. Humans remember example.com. Computers route on 93.184.216.34. Something has to translate between them — and it has to scale to billions of names and millions of changes per day without any central authority.

The fix. DNS (Domain Name System) is a distributed, cached, hierarchical lookup service. It maps names to whatever the application needs — usually an IP address, but also mail servers, text records, key fingerprints, and aliases.

The namespace is a tree. The unwritten root (.) sits at the top, served by 13 root server letters (A–M) operated by 12 organizations and replicated by anycast to about 1,800 instances worldwide. Beneath the root sit top-level domains (TLDs) — .com, .org, country codes, the long tail of new gTLDs. Beneath each TLD sit authoritative servers for individual zones.

Most clients never query any of these. They ask a resolver (run by their ISP, or a public service like 1.1.1.1) which walks the hierarchy on their behalf and caches every step. A query for www.example.com is at most: root tells resolver "ask the .com servers," .com tells resolver "ask example.com's nameservers," example.com returns the answer. Two referrals, one answer. Subsequent queries hit the cache.

[Figure: DNS resolution walking root, TLD, authoritative. The client (stub resolver) asks the recursive resolver "A? www.example.com". On a cache miss the resolver asks the root (".com NS?", referral to .com), then the .com TLD ("example.com NS?", referral to ns1.example.com), then the authoritative server ("www.example.com A?", answer 93.184.216.34 with TTL 300), and returns 93.184.216.34 to the client. Cache effects: every step is cached at the resolver, so most production queries are cache hits and never touch the authoritative server.]
The resolver does the recursion; the client makes one lookup. Each answer carries a TTL — short (60–300 s) for fast failover, long (1–7 days) for static infrastructure.

Record types and TTL

A handful of record types carry most traffic. A maps a name to an IPv4 address; AAAA to IPv6. CNAME is an alias to another name — resolvers chase the chain. MX points to mail servers with priorities. TXT holds arbitrary strings (SPF, DKIM, domain ownership proofs). NS declares which servers are authoritative for a zone. SOA carries the zone's serial number and refresh timers.

Each record carries a TTL that governs how long resolvers may cache it. TTL 300 s gives 5-minute failover; TTL 86,400 s reduces resolver load but locks you out of fast change. TTL is advisory — resolvers may keep records past TTL on failure, and browsers cache independently. Treat it as a lower bound on propagation time, not a contract.
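A resolver's cache is, at its core, a dictionary with expiry times. The sketch below is a minimal version that honours TTL strictly; the `TtlCache` class and its injectable clock are assumptions for the example, and it does not model serving stale records on failure.

```python
import time


class TtlCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}   # (name, record type) -> (value, expiry time)

    def put(self, name, rtype, value, ttl):
        self._store[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name, rtype):
        entry = self._store.get((name, rtype))
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._store[(name, rtype)]   # expired: the caller must re-resolve
            return None
        return value


# A fake clock makes the expiry visible without sleeping.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com", "A", "93.184.216.34", ttl=300)
print(cache.get("www.example.com", "A"))   # 93.184.216.34
now[0] = 301.0
print(cache.get("www.example.com", "A"))   # None
```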

Privacy and integrity

Plain DNS over UDP port 53 is unencrypted and unauthenticated. Anyone on the network sees what names you resolve, and anyone in the path can forge answers. DNS over TLS (DoT) and DNS over HTTPS (DoH) wrap queries in TLS for confidentiality. DNSSEC is a separate concern: it adds cryptographic signatures to zone data so a resolver can verify the answer came from the real authoritative server — integrity, not confidentiality.

HTTP

The problem. Two programs need to ask each other for data over a network. The protocol needs to be flexible enough for HTML pages, file downloads, JSON APIs, video streaming, file uploads, websockets — without re-negotiating from scratch every time.

The fix. HTTP uses a single request-response model with rich metadata. A request has a method, a target URI, headers, and an optional body. A response has a status code, headers, and an optional body. The methods declare intent; the headers carry every cross-cutting concern.

Methods and idempotence

  • GET retrieves. No body. Idempotent. Cacheable.
  • POST creates or submits. Has a body. Not idempotent.
  • PUT replaces a resource at a known URI. Idempotent.
  • DELETE removes. Idempotent.
  • PATCH modifies in place. Not necessarily idempotent.
  • HEAD is GET without the body. OPTIONS asks what's allowed.

Idempotence is the property that doing the operation twice produces the same outcome as once. It matters because the network can deliver a request twice — a client retries on timeout, the original arrives anyway — and a non-idempotent operation (POST) processed twice has now charged the card twice. Idempotent methods are safe to retry; non-idempotent ones aren't, and most "duplicate order" bugs are this issue.
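The difference shows up as soon as a request is replayed. In this sketch an in-memory `Store` class (a hypothetical stand-in for a server) receives each request twice, as it would after a client retry.

```python
class Store:
    def __init__(self):
        self.orders = {}
        self._next_id = 1

    def put(self, order_id, body):
        """Replace the resource at a known id: idempotent."""
        self.orders[order_id] = body

    def post(self, body):
        """Create a new resource: every call adds one."""
        order_id = self._next_id
        self._next_id += 1
        self.orders[order_id] = body
        return order_id

    def delete(self, order_id):
        """Remove if present: deleting twice leaves the same state."""
        self.orders.pop(order_id, None)


store = Store()
order = {"sku": "WIDGET-001", "qty": 3}

store.put(7, order)
store.put(7, order)              # retried PUT: still exactly one order at id 7
after_put = len(store.orders)    # 1

store.post(order)
store.post(order)                # retried POST: a second, duplicate order
after_post = len(store.orders)   # 3

print(after_put, after_post)     # 1 3
```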

Status codes and headers

The status code is a three-digit number. The first digit identifies the family:

  • 2xx success (200 OK, 201 Created, 204 No Content)
  • 3xx redirection (301 Moved, 304 Not Modified)
  • 4xx client error (400 Bad Request, 401, 403, 404, 429)
  • 5xx server error (500, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout)

The family is the rule, the specific code is the detail. Headers carry the rest of the metadata: Authorization for credentials, Content-Type for body format, Cache-Control for cache lifetime, Accept for content negotiation, If-None-Match for conditional requests that get 304 instead of a full body. Most engineering happens in the headers.

[Figure: Anatomy of an HTTP request. Annotations: request line (method, target, version); headers (cross-cutting metadata); blank line, then optional body.]

    POST /api/v1/orders HTTP/1.1
    Host: api.example.com
    Authorization: Bearer eyJhbGciOiJSUzI1NiIs…
    Content-Type: application/json
    Content-Length: 44
    Accept-Encoding: gzip, br

    {"sku":"WIDGET-001","qty":3,"customer":1234}
Methods declare intent. Headers parameterize it. The body, when present, is whatever the headers said it would be.
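The structure is simple enough to parse by hand. The sketch below builds a request like the one in the figure (without the Authorization header, whose token is elided there) and splits it back apart. `parse_request` is an illustrative helper that handles only a single request with a Content-Length body; real parsers must also deal with chunked encoding, repeated headers, and malformed input.

```python
body = b'{"sku":"WIDGET-001","qty":3,"customer":1234}'
lines = [
    b"POST /api/v1/orders HTTP/1.1",
    b"Host: api.example.com",
    b"Content-Type: application/json",
    b"Content-Length: " + str(len(body)).encode(),   # computed from the body
    b"",                                             # blank line ends the headers
]
raw = b"\r\n".join(lines) + b"\r\n" + body


def parse_request(data):
    head, _, rest = data.partition(b"\r\n\r\n")      # blank line separates head from body
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    length = int(headers.get("content-length", 0))
    return method, target, version, headers, rest[:length]


method, target, version, headers, parsed_body = parse_request(raw)
print(method, target, version)        # POST /api/v1/orders HTTP/1.1
print(headers["content-length"])      # 44
print(parsed_body == body)            # True
```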

Three wire formats, one semantics

The semantics (methods, codes, headers) have been stable for thirty years. The wire format now exists in three versions, and each successor reduces what blocks under loss.

HTTP/1.1 is plain text, one request per connection at a time. Connection: keep-alive lets the same TCP connection serve sequential requests, but you can't send the next request before the previous response arrives. To work around this, browsers open six parallel TCP connections per origin.

HTTP/2 frames everything as binary on a single TCP connection, with multiple streams interleaved on the same wire. Many requests in flight at once, no need for six connections. It also adds header compression (HPACK).

HTTP/3 replaces TCP with QUIC over UDP. QUIC keeps reliability and congestion control but moves them into the same protocol as TLS — and gives each stream its own loss-recovery state. One lost packet stalls only the affected stream, not the whole connection. The TLS handshake is bundled into the transport handshake, so first byte arrives at 1 RTT (or 0 RTT on resumption).

[Figure: HTTP/1.1, HTTP/2, HTTP/3 under one packet loss. HTTP/1.1 over TCP+TLS: TCP handshake, TLS 1.3 handshake, then req 1 to req 4 sent sequentially, one in flight at a time; browsers open 6 parallel TCP connections per origin to compensate. HTTP/2 over TCP+TLS: TCP handshake, TLS 1.3 handshake, then streams 1–N interleaved over one TCP connection; because TCP delivers in order, one lost packet stalls every stream. HTTP/3 over QUIC (UDP): QUIC with TLS bundled, streams 1–N with per-stream loss recovery; after a packet loss the other streams keep flowing.]
Each version reduces what blocks under loss. HTTP/1.1 serializes requests. HTTP/2 serializes the underlying TCP stream. HTTP/3 isolates streams at the transport. The application semantics are unchanged.

Pitfall — caching is the hardest part of HTTP. Cache-Control: public, max-age=31536000, immutable and Cache-Control: private, no-store look similar and behave nothing alike. Stale CDN edges, browser caches, and intermediary proxies all interpret the rules slightly differently. A surprising share of "works for me but not for them" production bugs eventually trace back to a cache header.

TLS

The problem. Two strangers across the open internet need to:

  1. Agree on a shared secret key without ever sending it in the clear.
  2. Encrypt their traffic so observers learn nothing.
  3. Detect tampering — a middle party flipping bits.
  4. Confirm the server they reached really is who it claims.

Each of these is hard alone. TLS solves all four together, on top of TCP or inside QUIC.

Hybrid cryptography

The trade-off. Asymmetric algorithms (RSA, ECDHE) let two parties agree on a secret over a public channel without sharing it directly — but they're slow, milliseconds per operation. Symmetric algorithms (AES-GCM, ChaCha20-Poly1305) encrypt bulk data at gigabits per second but require a pre-shared key.

The fix. TLS uses asymmetric crypto only at the start, to establish a one-time symmetric session key. Then it switches to the fast symmetric algorithm for everything else. The expensive operation runs once per connection; the cheap operation runs on every byte.

Certificates and the trust chain

Encryption without authentication is useless — an attacker who intercepts the handshake can negotiate keys with you and read everything. The server has to prove its identity. TLS certificates do this.

A certificate is a public key plus a name plus a signature from a Certificate Authority (CA). The browser ships with about 150 root CAs in its trust store. A real-world site presents a leaf certificate (its own key, signed by an intermediate CA), and the browser walks the chain back to a root it trusts, checking each signature.

If any link breaks — expired, revoked, wrong name, untrusted root — the connection fails and the user sees the red lock. Leaf certs are short-lived (90 days is common); intermediates last years; roots last decades and are zealously protected, because compromising a root would let an attacker impersonate any site under it.

[Figure: TLS certificate chain validation. The leaf cert (CN = example.com, valid 90 days) is signed by the intermediate CA (e.g. Let's Encrypt R3, valid years), which is signed by the root CA (e.g. ISRG Root X1, self-signed, in the OS trust store).]

Browser walk:

  1. Verify the leaf's signature using the intermediate's public key.
  2. Verify the intermediate's signature using the root's public key.
  3. Confirm the root is in the trust store, the leaf name matches the host, and nothing has expired or been revoked.

Compromise of an intermediate means thousands of leaf certs to re-issue. Compromise of a root would let an attacker impersonate any site under it until every browser shipped an update distrusting the root.
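The browser walk can be sketched as a loop over the chain. Everything here is a toy assumption: the `Cert` record is a hypothetical stand-in for X.509, the signatures are unpadded textbook RSA over publicly known Mersenne primes, expiry is a plain day number, and revocation is not modelled at all. It shows only the order of the checks.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by every toy key

def make_key(p, q):
    """Return (modulus, private exponent) for textbook RSA."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))

def digest(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

@dataclass
class Cert:
    name: str
    public_n: int      # subject's public key (RSA modulus)
    issuer: str
    not_after: int     # expiry as a day number
    signature: int = 0

    def tbs(self):
        """Bytes covered by the issuer's signature."""
        return f"{self.name}|{self.public_n}|{self.issuer}|{self.not_after}".encode()

def issue(cert, issuer_n, issuer_d):
    cert.signature = pow(digest(cert.tbs(), issuer_n), issuer_d, issuer_n)
    return cert

def signed_by(cert, issuer_n):
    return pow(cert.signature, E, issuer_n) == digest(cert.tbs(), issuer_n)

def validate(chain, host, today, trust_store):
    """Walk a chain ordered leaf first, root last."""
    if chain[0].name != host:                      # wrong name
        return False
    if any(today > c.not_after for c in chain):    # expired
        return False
    for cert, issuer in zip(chain, chain[1:]):     # each link's signature
        if cert.issuer != issuer.name or not signed_by(cert, issuer.public_n):
            return False
    root = chain[-1]                               # untrusted root
    return trust_store.get(root.name) == root.public_n and signed_by(root, root.public_n)

root_n, root_d = make_key(2**107 - 1, 2**127 - 1)
inter_n, inter_d = make_key(2**89 - 1, 2**107 - 1)
leaf_n, _ = make_key(2**61 - 1, 2**89 - 1)

root = issue(Cert("Toy Root", root_n, "Toy Root", 9000), root_n, root_d)
inter = issue(Cert("Toy Intermediate", inter_n, "Toy Root", 3000), root_n, root_d)
leaf = issue(Cert("example.com", leaf_n, "Toy Intermediate", 90), inter_n, inter_d)

trust_store = {"Toy Root": root_n}
chain = [leaf, inter, root]
```

`validate(chain, "example.com", 30, trust_store)` succeeds; changing the host, moving `today` past day 90, emptying the trust store, or swapping the leaf's key each makes it fail at a different check.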

The 1-RTT handshake

The TLS 1.3 handshake takes one round trip before the first encrypted byte:

  1. ClientHello — supported ciphers, random nonce, ephemeral public key (its half of the ECDHE exchange).
  2. ServerHello — server's ephemeral key, certificate, signature over the handshake transcript, Finished message. The signature proves the server holds the private key matching the cert.
  3. Client computes the shared secret, verifies the certificate and signature, sends its Finished, and starts sending encrypted application data.

Resumed connections can use 0-RTT: the client sends application data on its first packet, encrypted under a key derived from a previous session. Faster, but replayable — so only safe for idempotent requests.
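From application code the handshake is usually one library call. A minimal sketch with Python's standard `ssl` module, assuming the local OpenSSL build supports TLS 1.3; the helper names `tls13_context` and `negotiated` are made up for this example.

```python
import socket
import ssl

def tls13_context():
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context()            # loads the system trust store
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # reject TLS 1.2 and older
    return ctx

def negotiated(host, port=443, timeout=5.0):
    """Connect, run the handshake, and report the protocol version and cipher."""
    ctx = tls13_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # server_hostname is used both for SNI and for the certificate name check.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Calling `negotiated("example.com")` needs network access and returns a pair such as the protocol string and the negotiated cipher-suite name; the handshake, chain validation, and hostname check all happen inside `wrap_socket`.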

Figure: TLS 1.3 1-RTT handshake. Client to server: ClientHello (cipher suites, ephemeral key, random). Server to client: ServerHello (ephemeral key; {cert, signature, Finished} encrypted). One RTT later the keys are established and the transcript is verified. Client to server: {Finished} plus application data, so the first encrypted application byte leaves at 1 RTT.
TLS 1.3 halves the 1.2 handshake from 2 RTTs to 1, removes legacy weak ciphers, and encrypts the certificate. The earlier version sent the cert in the clear, which leaked the server's name to anyone watching.
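The saving is easy to put in numbers. A rough sketch, assuming a 70 ms round trip, a TCP handshake that costs one round trip before the ClientHello can be sent, and no queueing or processing time; the function name and the table of setups are illustrative.

```python
def ms_before_first_request(rtt_ms, transport_rtts, tls_rtts):
    """Time until the client can send its first encrypted request."""
    return rtt_ms * (transport_rtts + tls_rtts)

RTT_MS = 70  # assumed transcontinental round trip

# (transport handshake RTTs, TLS handshake RTTs)
SETUPS = {
    "TCP + TLS 1.2": (1, 2),
    "TCP + TLS 1.3": (1, 1),
    "TCP + TLS 1.3 0-RTT resumption": (1, 0),
    # QUIC folds the transport and TLS handshakes into one round trip.
    "QUIC (TLS 1.3 inside)": (0, 1),
}

delays = {name: ms_before_first_request(RTT_MS, *rtts) for name, rtts in SETUPS.items()}

for name, delay in delays.items():
    print(f"{name:32s} {delay:4d} ms")
```

Under these assumptions the 1.2 setup waits 210 ms before the first request, 1.3 waits 140 ms, and a 0-RTT resumption over TCP still pays the 70 ms of the TCP handshake.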

Worked example: how the shared key actually appears

The trick is Elliptic-Curve Diffie-Hellman (ECDHE). Both sides pick a fresh private number, compute a public point from it, and trade public points. A mathematical property of the curve makes the result identical when each side combines its own private number with the other's public point — but the result is unreachable to anyone who only sees the two public points.

  1. Client picks a private number a and computes public point A = a · G, where G is a fixed curve generator both sides agreed on. It sends A in the ClientHello.
  2. Server picks a private number b, computes B = b · G, and sends B in the ServerHello.
  3. Client computes S = a · B. Server computes S = b · A. The curve guarantees these are the same point — that point is the shared secret.
  4. Both sides feed S plus a hash of the handshake transcript (which covers both handshake randoms) through a key derivation function (HKDF) to produce the symmetric keys used by AES-GCM or ChaCha20-Poly1305 for the rest of the connection.

An attacker on the wire sees A and B but not a or b. Recovering S from A and B is the elliptic-curve Diffie-Hellman problem, believed to be as hard as the elliptic-curve discrete-log problem, which has no known efficient classical solution. The certificate plays no part in producing S. It only signs the handshake transcript, so the client knows it really did exchange keys with the holder of example.com's private key, not a middle attacker who substituted their own B.

This is also where forward secrecy comes from. The private numbers a and b are discarded the moment the handshake ends. Steal the server's long-term cert key tomorrow and you still cannot recompute S for any past session — the only inputs that ever existed are gone.
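The four steps above can be run end to end on a curve small enough to check by hand. This sketch assumes a textbook toy curve, y² = x³ + 2x + 2 over the integers mod 17 with generator (5, 1) and group order 19, which offers no security whatsoever; real TLS uses curves such as X25519 or P-256. The `hkdf` helper is a single-block HKDF-SHA256 in the shape of RFC 5869, not the full TLS 1.3 key schedule, and the transcript bytes are a made-up stand-in.

```python
import hashlib
import hmac
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17, generator G = (5, 1), order 19.
P_FIELD, A_COEFF = 17, 2
G = (5, 1)
ORDER = 19

def add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None  # a point plus its negative
    if p1 == p2:
        slope = (3 * x1 * x1 + A_COEFF) * pow(2 * y1, -1, P_FIELD)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD)
    x3 = (slope * slope - x1 - x2) % P_FIELD
    y3 = (slope * (x1 - x3) - y1) % P_FIELD
    return x3, y3

def mul(k, point):
    """Double-and-add scalar multiplication k * point."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result

def hkdf(secret, transcript, length=32):
    """Single-block HKDF-SHA256: extract with a zero salt, then expand once."""
    prk = hmac.new(b"\x00" * 32, secret, hashlib.sha256).digest()
    return hmac.new(prk, transcript + b"\x01", hashlib.sha256).digest()[:length]

# Steps 1 and 2: each side picks a fresh private number and publishes a point.
a = secrets.randbelow(ORDER - 1) + 1
b = secrets.randbelow(ORDER - 1) + 1
A = mul(a, G)   # sent in the ClientHello
B = mul(b, G)   # sent in the ServerHello

# Step 3: each side combines its own private number with the other's point.
client_secret = mul(a, B)
server_secret = mul(b, A)

# Step 4: derive the session key from S and a stand-in for the transcript.
transcript = f"ClientHello{A}ServerHello{B}".encode()
client_key = hkdf(bytes([client_secret[0]]), transcript)
server_key = hkdf(bytes([server_secret[0]]), transcript)
```

Both sides land on the same point because a · (b · G) and b · (a · G) are the same multiple of G. Once `a` and `b` are thrown away, nothing that was sent on the wire is enough to rebuild the key, which is the forward-secrecy property in miniature.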

Forward secrecy

The problem. If an attacker records encrypted traffic now and steals the server's long-term private key years later, can they go back and decrypt?

The fix. Forward secrecy. TLS 1.3 mandates ephemeral key exchange (ECDHE): the symmetric key is derived from one-time private values that are discarded as soon as the handshake completes. The server's certificate proves identity; it never decrypts traffic. Steal the cert's private key tomorrow and yesterday's recordings stay sealed. RSA-only key exchange, which lacks this property, was removed in 1.3.

Pitfall — TLS hides what, not who. The Server Name Indication (SNI) field, sent in the ClientHello to let one server host certificates for many sites, is unencrypted. A network observer still sees which host you're connecting to, even if the contents are sealed. Encrypted Client Hello (ECH) is the in-progress fix; deployment is slow.

Standards

  • Internet Protocol: RFC 791 (IPv4), RFC 8200 (IPv6, replaces RFC 2460).
  • Transmission Control Protocol: RFC 9293 (current; obsoletes RFC 793 and folds in a long list of later updates). Congestion control (RFC 5681), the retransmission timer (RFC 6298), and the larger initial window (RFC 6928) remain separate documents.
  • User Datagram Protocol: RFC 768. Eight-byte header; no reliability, no ordering, no congestion control.
  • DNS: RFC 1034 (concepts and facilities) and RFC 1035 (encoding); RFC 8484 (DNS over HTTPS); RFC 7858 (DNS over TLS); RFC 4033, 4034, 4035 (DNSSEC).
  • HTTP: RFC 9110 (semantics, replaces RFC 7230–7235), RFC 9112 (HTTP/1.1), RFC 9113 (HTTP/2, replaces RFC 7540), RFC 9114 (HTTP/3).
  • HTTP caching: RFC 9111. The rules every CDN, browser, and proxy claims to obey.
  • QUIC: RFC 9000 (transport), RFC 9001 (TLS in QUIC), RFC 9002 (loss detection and congestion control).
  • TLS: RFC 8446 (TLS 1.3); RFC 5246 (TLS 1.2, predecessor still widely deployed).
  • Routing: RFC 4271 (BGP-4), RFC 2328 (OSPFv2), RFC 5340 (OSPFv3 for IPv6); ECMP described in RFC 2992.
  • Link layer: IEEE 802.3 (Ethernet, including the MAC frame format and 10 Gb / 25 Gb / 100 Gb / 400 Gb physical layers); IEEE 802.11 (Wi-Fi, including 802.11ax / Wi-Fi 6 and 802.11be / Wi-Fi 7).
  • CIDR: RFC 4632 (replaces RFC 1519). The classless prefix notation that replaced fixed Class A/B/C addressing in 1993.
  • Sockets API: IEEE 1003.1 (POSIX) defines the runtime API. The Berkeley sockets interface itself has no single RFC; it is documented across the BSD historical record and POSIX.
  • Forward refs: TLS 1.3 handshake details and X.509 certificate structure are handed forward to Act VIIa (Security). Distributed-systems consequences of the unreliable wire (consensus, replication, partitions) are handed forward to Act Vb. Serialization formats (JSON, Protocol Buffers), on which most application protocols are built, are handed back from Act I.

Going deeper

Branches that earn their own article.

  • Physical layer (Ethernet, fibre, wireless, signal encoding).
  • BGP and internet routing in depth.
  • TCP congestion control algorithms (Reno, CUBIC, BBR).
  • UDP-based protocols.
  • DNS security (DNSSEC, DoH, DoT).
  • WebSockets, gRPC, Server-Sent Events.
  • NAT, firewalls, load balancers (L4 vs L7).
  • CDNs.
  • QUIC protocol internals.