Information
Everything is a bit pattern, and the meaning is always an agreement.
A computer holds nothing but bits. A bit means something only when two parties agree what it means.
The bit
The smallest decision a machine can make: on or off, present or absent, 1 or 0.
Bit patterns and agreements
Eight bits is a byte. The same byte can be a number, a letter, or a pixel — only the agreement decides which.
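A minimal Python sketch of the idea: one stored byte, read under three different agreements. The grayscale reading assumes the common 8-bit convention where 0 is black and 255 is white. That is an illustrative choice, not any specific image format.

```python
raw = b"\x41"          # one byte in memory: the bit pattern 01000001
byte = raw[0]          # indexing bytes yields an int in 0..255

as_number = byte       # agreement 1: unsigned integer
as_letter = chr(byte)  # agreement 2: character code (ASCII/Unicode)
as_gray = byte / 255   # agreement 3 (assumed): 8-bit grayscale, 0.0 = black, 1.0 = white

print(f"{byte:08b} as number: {as_number}")
print(f"{byte:08b} as letter: {as_letter!r}")
print(f"{byte:08b} as pixel:  {as_gray:.3f} of full brightness")
```

Nothing in the byte itself says which of the three lines is "right". Only the code reading it decides.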
Numbers from bits
Place values
Binary works like decimal, but each column is worth twice the one to its right.
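The place-value rule can be sketched directly in Python. The function name `binary_to_int` is hypothetical. Python's built-in `int(s, 2)` does the same job and serves as a cross-check.

```python
def binary_to_int(bits: str) -> int:
    """Sum place values: each column is worth twice the column to its right."""
    total = 0
    for position, bit in enumerate(reversed(bits)):  # rightmost column is position 0
        if bit == "1":
            total += 2 ** position                   # columns are worth 1, 2, 4, 8, ...
    return total

bits = "01000001"
contributions = [2 ** i for i, b in enumerate(reversed(bits)) if b == "1"]
print(contributions)         # [1, 64]
print(binary_to_int(bits))   # 65
print(int(bits, 2))          # 65: the built-in parser agrees
```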
01000001 = 64 + 1 = 65.
Addition
Add column by column, right to left. When a column overflows, carry the 1.
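The same procedure written out as a short Python sketch that works on strings of 0s and 1s, so each column and carry is visible. The name `add_binary` is hypothetical.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings column by column, right to left, carrying the 1."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)     # pad with leading zeros to equal length
    result = []
    carry = 0
    for i in range(width - 1, -1, -1):        # rightmost column first
        column_sum = int(a[i]) + int(b[i]) + carry
        result.append(str(column_sum % 2))    # the bit that stays in this column
        carry = column_sum // 2               # 1 if the column overflowed
    if carry:
        result.append("1")                    # a final carry becomes a new leftmost column
    return "".join(reversed(result))

print(add_binary("0101", "0011"))  # 1000  (5 + 3 = 8)
```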
Negative numbers
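A 4-bit register holds 16 patterns. Two's complement parks half on each side of zero (-8 through -1, then 0 through 7) so addition just works: the same adder serves signed and unsigned values, and subtraction is simply addition of the negation, so no special hardware is needed.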
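A small Python sketch of the 4-bit case. Every result is masked to four bits, the way a 4-bit register would drop the carry out of the top column. The helper names (`encode`, `decode`, `add`, `negate`) are hypothetical, chosen for this illustration.

```python
BITS = 4
MASK = (1 << BITS) - 1   # 0b1111: keep only the low 4 bits, like a 4-bit register

def encode(n: int) -> str:
    """Signed integer in [-8, 7] -> its 4-bit two's-complement pattern."""
    if not -(1 << (BITS - 1)) <= n < (1 << (BITS - 1)):
        raise ValueError(f"{n} does not fit in {BITS} bits")
    return format(n & MASK, f"0{BITS}b")

def decode(pattern: str) -> int:
    """4-bit pattern -> signed value: the top bit is worth -8 instead of +8."""
    raw = int(pattern, 2)
    return raw - (1 << BITS) if raw & (1 << (BITS - 1)) else raw

def add(p: str, q: str) -> str:
    """One unsigned adder for every value; the carry out of the top bit is dropped."""
    return format((int(p, 2) + int(q, 2)) & MASK, f"0{BITS}b")

def negate(p: str) -> str:
    """Two's-complement negation: invert every bit, then add 1."""
    return add(format(~int(p, 2) & MASK, f"0{BITS}b"), "0001")

# Subtraction as addition of the negation: 5 - 3 == 5 + (-3)
diff = add(encode(5), negate(encode(3)))
print(diff, "->", decode(diff))          # 0010 -> 2

# 7 + 1 does not fit: 0111 wraps to 1000, which reads as -8
wrapped = add("0111", "0001")
print(wrapped, "->", decode(wrapped))    # 1000 -> -8
```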
0111 wraps to 1000: overflow.
Floats
Real numbers don't fit in finite bits. IEEE 754 standardizes a compromise: binary scientific notation with a sign bit, a biased exponent, and a fractional mantissa.
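To see the three fields, the Python sketch below reinterprets a double's 8 bytes as an integer and slices out the sign, the exponent (biased by 1023), and the 52 stored fraction bits. It assumes Python's `float` is an IEEE 754 binary64, which holds on typical platforms. The helper names `float_fields` and `from_fields` are hypothetical, and the rebuild formula covers normal numbers only (not zero, subnormals, Inf, or NaN).

```python
import struct
from decimal import Decimal

def float_fields(x: float) -> tuple[int, int, int]:
    """Split a binary64 into IEEE 754 sign, biased exponent, and fraction bits."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # same 8 bytes, read as uint64
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)     # 52 bits; the leading 1 is implicit for normal numbers
    return sign, exponent, fraction

def from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normal number: (-1)^sign * (1 + fraction / 2^52) * 2^(exponent - 1023)."""
    return (-1) ** sign * (1 + fraction / 2**52) * 2.0 ** (exponent - 1023)

s, e, f = float_fields(-6.25)             # -6.25 = -1.5625 * 2**2 = -1.1001b * 2**2
print(s, e - 1023, hex(f))                # 1 2 0x9000000000000
print(from_fields(s, e, f))               # -6.25

print(Decimal(0.1))                       # exact value of the double nearest 0.1: slightly above 0.1
```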
Values such as 0.1 round on the way in.
Text from bits
ASCII
Seven bits, 128 codepoints — every English letter, digit, and control code, all in one table.
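A quick look at the table from Python: `ord` gives the codepoint, and the `"ascii"` codec refuses anything outside the 128 entries. The sample characters are arbitrary picks.

```python
# ASCII: a fixed table mapping 7-bit patterns (0-127) to characters.
for ch in ["A", "a", "0", "\n"]:          # letter, lowercase letter, digit, control code
    code = ord(ch)
    print(f"{ch!r:6}{code:4d}  {code:07b}")

encoded = "Hi!".encode("ascii")
print(list(encoded))                       # [72, 105, 33]

try:
    "café".encode("ascii")                 # 'é' has no ASCII codepoint
except UnicodeEncodeError as err:
    print("not ASCII:", err.object[err.start])
```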
Unicode
ASCII covers English. Unicode aims to cover every writing system in use, plus symbols and emoji: a codespace of about 1.1 million codepoints (U+0000 through U+10FFFF), organized into 17 planes of 65,536 each.
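The plane arithmetic is small enough to check in Python. Because a plane is 2^16 codepoints, the plane number is just the codepoint shifted right by 16. The helper name `plane` is hypothetical. `unicodedata.name` comes from the standard library.

```python
import unicodedata

def plane(codepoint: int) -> int:
    """Plane number: each plane holds 65,536 (2**16) codepoints."""
    return codepoint >> 16

print(0x10FFFF + 1, 17 * 65_536)   # 1114112 1114112: the whole codespace

for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    cp = ord(ch)
    print(f"U+{cp:04X}  plane {plane(cp)}  {unicodedata.name(ch)}")
```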
UTF-8
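A codepoint can be up to 21 bits. UTF-8 packs each one into 1 to 4 bytes. It is self-synchronizing (continuation bytes always start with the bits 10, so a decoder can find the next character boundary from anywhere in a stream), and ASCII is the no-op case: ASCII text is already valid UTF-8, byte for byte.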
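A hand-written encoder makes the 1-to-4-byte layout concrete. This is a Python sketch following the bit patterns in RFC 3629, checked against Python's own `str.encode("utf-8")`. The names `utf8_encode` and `is_continuation` are hypothetical.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one codepoint by hand, following the RFC 3629 bit layout."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} cannot be encoded in UTF-8")  # surrogates are forbidden
    if cp < 0x80:        # up to 7 bits:  0xxxxxxx (ASCII, unchanged)
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def is_continuation(byte: int) -> bool:
    """Continuation bytes look like 10xxxxxx; a decoder skips them to resynchronize."""
    return (byte & 0xC0) == 0x80

for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")   # matches Python's built-in encoder
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```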
Beyond text
Images, audio, video — same trick, different sampling. Pick a grid, pick a rate, pick a bit depth. Agree.
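As a sketch of "pick a rate, pick a bit depth" for audio, the Python below samples a 440 Hz sine tone on a time grid and quantizes each sample to a 16-bit signed integer, then packs the samples as bytes. The 8,000 Hz rate, 16-bit depth, and little-endian packing are illustrative assumptions, not the requirements of any particular audio format. The name `sample_tone` is hypothetical.

```python
import math
import struct

SAMPLE_RATE = 8_000   # samples per second (illustrative; CD audio uses 44,100)
BIT_DEPTH = 16        # bits per sample, signed two's complement
FREQ = 440.0          # tone frequency in Hz

def sample_tone(duration_ms: int) -> list[int]:
    """Sample a sine wave on an evenly spaced time grid and quantize each sample."""
    peak = 2 ** (BIT_DEPTH - 1) - 1                 # 32767 for 16-bit samples
    n = SAMPLE_RATE * duration_ms // 1000           # number of grid points in the window
    return [round(peak * math.sin(2 * math.pi * FREQ * t / SAMPLE_RATE))
            for t in range(n)]

samples = sample_tone(10)                           # 10 ms of sound
pcm = struct.pack(f"<{len(samples)}h", *samples)    # agreed layout: little-endian int16
print(len(samples), "samples ->", len(pcm), "bytes")   # 80 samples -> 160 bytes
```

A player can only reproduce the tone if it agrees on all three choices: the rate, the sample width, and the byte order.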
Compression
Most data has redundancy. Compression rewrites it in fewer bits — losslessly when the original must be recoverable, lossily when perception (or the use case) can absorb the error.
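A minimal Python sketch of lossless compression on redundant data: a hand-written run-length encoder (`rle_encode` and `rle_decode` are hypothetical names), followed by the standard library's `zlib`, which wraps DEFLATE. Both round-trip exactly. Lossy schemes are not shown.

```python
import zlib

def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Run-length encoding: replace each run of equal bytes with (count, byte)."""
    runs: list[tuple[int, int]] = []
    for b in data:                            # iterating bytes yields ints
        if runs and runs[-1][1] == b:
            runs[-1] = (runs[-1][0] + 1, b)   # extend the current run
        else:
            runs.append((1, b))               # start a new run
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    return b"".join(bytes([b]) * count for count, b in runs)

data = b"A" * 500 + b"B" * 500                # highly redundant input
runs = rle_encode(data)
print(runs)                                   # [(500, 65), (500, 66)]
assert rle_decode(runs) == data               # lossless: exact round trip

packed = zlib.compress(data)                  # DEFLATE inside a zlib wrapper
print(len(data), "->", len(packed), "bytes")
assert zlib.decompress(packed) == data
```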
Serialization
Programs hold structured data in memory. To send it across a wire or write it to disk, the structure must be flattened to bytes — both sides agreeing on the layout.
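To illustrate flattening one record two ways, here is a Python sketch using the standard `struct` and `json` modules. The record, its field names, and the binary layout string `"<Idd"` are assumptions invented for this example. The last lines show what happens when the receiver uses a different layout than the sender.

```python
import json
import struct

# A record in memory (illustrative fields).
point = {"id": 7, "x": 1.5, "y": -2.0}

# Binary layout both sides must share: little-endian, uint32 id, then two binary64 floats.
LAYOUT = "<Idd"
wire = struct.pack(LAYOUT, point["id"], point["x"], point["y"])
print(len(wire), wire.hex(" "))             # 20 bytes: 4 + 8 + 8

id_, x, y = struct.unpack(LAYOUT, wire)     # same layout on the receiving side
assert (id_, x, y) == (7, 1.5, -2.0)

# Same bytes, wrong agreement: reading the id as big-endian gives garbage.
print(struct.unpack(">Idd", wire)[0])       # 117440512

# Text layout: JSON carries field names with the data; larger, but self-describing.
text = json.dumps(point)
print(text, len(text.encode("utf-8")), "bytes")
assert json.loads(text) == point
```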
Standards
Every agreement on this page has a canonical specification. Cite these, not blog posts.
- ASCII — ANSI X3.4-1986 / ISO 646. 7-bit, 128 codepoints.
- Two's complement integers — universal in modern hardware; no separate spec, but documented in every CPU ISA reference (e.g. Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 1).
- Floating-point — IEEE 754-2019 (revision of IEEE 754-1985 / -2008). Defines binary32, binary64, rounding modes, NaN/Inf semantics.
- Unicode — The Unicode Standard, current major version. Code charts, normalization, bidi, collation. Maintained by the Unicode Consortium; tracked by ISO/IEC 10646.
- UTF-8 — RFC 3629 (obsoletes RFC 2279). Defines the byte sequences shown in the encoding diagram, including the prohibition on surrogate codepoints and overlong forms.
- JSON — RFC 8259 / ECMA-404. Defines the grammar; JSON.stringify semantics live in ECMA-262 (JavaScript).
- Protocol Buffers — Encoding (developers.google.com/protocol-buffers/docs/encoding) and the .proto language reference. Wire format is stable across proto2/proto3; the schema language is not.
- Run-length and other lossless schemes — RFC 1951 (DEFLATE) underpins gzip, zlib, PNG; RFC 7932 (Brotli) is the modern successor.
Branches that earn their own article.
- Deep dive on IEEE 754 floating point.
- Unicode specification and encoding details.
- Image format internals (PNG, JPEG, WebP).
- Audio/video codecs.
- Information theory (Shannon, entropy).
- Compression algorithms (Huffman, LZ family).
- Serialization formats compared (ASN.1, MessagePack, Avro, Parquet).