Handbook · Digital · Hardware
Signals & Embedded
38 min read
TL;DR
The physical world is continuous. Temperature, pressure, voltage, sound, light, radio waves, and the position of a mechanical arm are all analog signals that vary smoothly over time. Digital systems only understand numbers. Any time we want a computer to sense or act on the physical world, something has to turn the analog signal into a sequence of numbers on the way in, and numbers back into an analog signal on the way out. That translation, and everything we do with the numbers in between, is what signals and embedded systems is about.
The field has two halves, tightly coupled. Signal processing is the math of sampling, filtering, detecting, modulating, and compressing signals — ideas that live in the frequency domain and date back to Fourier, Nyquist, and Shannon. Embedded systems is the engineering of running that math on small microcontrollers — chips with kilobytes of RAM, no operating system worth the name, direct wires to sensors, and a real-time deadline measured in microseconds. A car's anti-lock braking system, a drone's flight controller, a pacemaker, a guitar pedal, a Wi-Fi chip, and a factory PLC are all built in this world.
This handbook walks the chain from the sensor to the control loop, and from the wire back to the deadline. Sampling — how to turn a continuous signal into a sequence without lying about what is in it (Nyquist). Quantization — how much precision each sample needs (SNR). Filters — how to remove noise and separate bands of interest (FIR, IIR). FFT and windowing — how to see the frequency content and the trade-offs of finite data. Modulation — how to ride digital bits on analog carriers so they can travel over wire or air. Board-level buses — I²C, SPI, UART, CAN — how chips talk to each other on one circuit board. Interrupts and DMA — how a microcontroller handles real-time events without stopping to poll. Real-time scheduling — how to guarantee a deadline is met when the bus is already saturated.
You will be able to
- Pick a sample rate, anti-alias filter, and bit depth for a given signal and state the Nyquist-Shannon constraint that forced each choice.
- Read an I²C, SPI, UART, or CAN frame on a logic analyzer and say which end is misbehaving.
- Predict whether an RTOS task set will meet its deadlines using rate-monotonic analysis, and know when to reach for priority inheritance.
The Map
- Station 1 — Sampling and Nyquist
- Station 2 — Quantization, SNR, and bit depth
- Station 3 — Filters: FIR, IIR, and the tradeoffs
- Station 4 — FFT, windowing, and the frequency domain
- Station 5 — Modulation: AM, FM, QAM, OFDM
- Station 6 — Board-level buses: I²C, SPI, UART, CAN
- Station 7 — Interrupts, DMA, and the ISR discipline
- Station 8 — Real-time systems, WCET, and rate-monotonic scheduling
- How the stations connect
- Standards & Specs
- Test yourself
Two threads run through every station: information (what the signal is telling you and how not to destroy that information on the way in) and time (what must happen before the next deadline, and what happens if it doesn't). The embedded world is where those two threads touch the ground.
Station 1 — Sampling and Nyquist
An analog signal varies continuously over time; a digital one is a sequence of numbers taken at discrete instants. Sampling is the act of measuring the analog signal at those instants and writing down the values. The central question is how often do we have to measure? Sample too slowly and the digital sequence fails to represent the analog signal correctly — high-frequency wiggles get aliased onto lower frequencies and appear as spurious content that is not there in the original.
The answer is the Nyquist-Shannon sampling theorem (Harry Nyquist 1928, Claude Shannon 1949): to faithfully represent a signal whose highest frequency is B, you must sample at a rate of at least 2B. Audio CD samples at 44.1 kHz because human hearing tops out around 20 kHz (with headroom for the anti-alias filter). A digital oscilloscope capturing a 1 GHz signal needs more than 2 GSamples/s. If the incoming signal has content above B that you do not want, you must add an anti-alias filter — an analog low-pass filter — before the sampler so the high frequencies never reach the digitiser.
An analog signal is a continuous function of time; a digital system can only hold samples — snapshots at discrete instants. Sampling is that reduction. The Nyquist–Shannon theorem (Shannon 1949, Kotelnikov 1933, Whittaker 1915) gives the price: if a band-limited signal contains no frequencies above B Hz, you can reconstruct it exactly from samples taken at rate f_s > 2B. Sample below 2B and the reconstruction aliases higher frequencies down to lower ones — and you cannot tell, from the samples alone, which is which.
Signal at 3 kHz, sampled at 8 kHz: the samples determine the sine uniquely —
reconstruction succeeds, because 8 > 2 × 3.

Signal at 5 kHz, sampled at 8 kHz: the sampler emits exactly the same sample
values, so reconstruction returns 3 kHz, not 5 kHz —
→ 5 kHz has "aliased" to |5 − 8| = 3 kHz.

Nyquist frequency   f_N = f_s / 2
Alias of f          |f − k·f_s|, for the integer k that folds f into [0, f_N]
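The folding arithmetic is easy to check numerically. A minimal pure-Python sketch (frequencies and sample count chosen for illustration): a 5 kHz sine sampled at 8 kHz produces, sample for sample, the same values as a phase-inverted 3 kHz sine — from the samples alone, the two are indistinguishable.

```python
import math

fs = 8000.0          # sample rate (Hz)
T = 1.0 / fs

def sample(freq_hz, n_samples):
    """Sample a unit-amplitude sine of the given frequency at rate fs."""
    return [math.sin(2 * math.pi * freq_hz * n * T) for n in range(n_samples)]

s3 = sample(3000.0, 16)   # below Nyquist (4 kHz): representable
s5 = sample(5000.0, 16)   # above Nyquist: aliases to |5 - 8| = 3 kHz

# The 5 kHz samples equal the 3 kHz samples with inverted sign (a phase flip):
assert all(abs(a + b) < 1e-9 for a, b in zip(s3, s5))
```

This is why the anti-alias filter must act before sampling: once the 5 kHz energy lands on the 3 kHz samples, no downstream computation can tell it apart from genuine 3 kHz content.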
Because no real sensor is perfectly band-limited, every ADC front end includes an anti-alias filter — a low-pass filter with cutoff below f_N — so out-of-band content is attenuated below the noise floor before sampling. That filter's order, ripple, and phase response are often the unsung gating factor for signal quality.
- CD audio samples at 44.1 kHz — 2.205× the upper-edge 20 kHz human hearing limit, with ~2 kHz of guard band for a realistic brick-wall filter. Blu-ray uses 48 kHz or 96 kHz; studio master formats go to 192 kHz mostly as headroom for editing, not because ears hear above 20 kHz.
- Oversampling ADCs (sigma-delta, common in audio and modern sensor front ends) sample at 10–1000× the Nyquist rate with 1-bit quantization, then decimate and filter digitally to trade rate for resolution. This moves the anti-alias burden from analog hardware into a digital filter that's easier to design.
- Reconstruction from samples uses the Shannon–Whittaker sinc interpolation: y(t) = Σ y[n] · sinc((t − nT)/T). Real DACs approximate sinc with a zero-order hold (flat between samples) plus a smoothing filter; the hold introduces a sin(x)/x droop that needs compensation for flat frequency response near Nyquist.
- Sample-and-hold circuits freeze the analog voltage while the ADC converts. The aperture uncertainty (jitter in the hold instant) sets a ceiling on effective resolution at high frequencies — at 1 MHz input, 1 ps of jitter limits ENOB (effective number of bits) to about 17.
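That aperture-jitter ceiling follows from the standard formula SNR = −20·log₁₀(2π·f_in·t_jitter), converted to effective bits. A pure-Python sketch of the arithmetic (the function name is mine); for 1 MHz input and 1 ps of jitter it lands at about 17 effective bits:

```python
import math

def jitter_limited_enob(f_in_hz, jitter_s):
    """SNR ceiling set by aperture jitter, converted to effective bits."""
    snr_db = -20 * math.log10(2 * math.pi * f_in_hz * jitter_s)
    return (snr_db - 1.76) / 6.02

# 1 MHz input, 1 ps of aperture jitter:
print(round(jitter_limited_enob(1e6, 1e-12)))   # → 17
```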
The model you want: every sample rate is a bandwidth promise, and every ADC front end is a filter that keeps the promise. If the anti-alias filter is wrong, no amount of downstream DSP rescues the signal — the aliased energy is now indistinguishable from real signal.
WARNING
"Just sample faster" is not always the answer. Doubling the sample rate doubles the ADC's power, doubles the data to move, doubles the memory footprint, and can halve the filter's stop-band attenuation requirement (easier analog) — but if your signal genuinely contains energy above the new Nyquist, the aliasing problem moves rather than disappearing.
Go deeper: Shannon, "Communication in the Presence of Noise" (Proc. IRE 1949); Oppenheim & Schafer, Discrete-Time Signal Processing (3rd ed) chapters 4 and 11; Kester, The Data Conversion Handbook (Analog Devices); play with numpy.fft on a sine generator to watch aliasing happen in real time.
Station 2 — Quantization, SNR, and bit depth
Sampling (Station 1) captures the signal at discrete times. The second half of going digital is quantization — rounding each analog measurement to a finite number of possible values so it fits in a fixed-size integer. A 16-bit analog-to-digital converter (ADC) produces one of 2¹⁶ = 65,536 output values; a 24-bit ADC produces one of ~16.7 million. The difference between the true analog value and the rounded digital value is quantization noise, and the fundamental tradeoff is: more bits means less noise (and more storage, more bandwidth, more power).
The canonical figure of merit is signal-to-noise ratio (SNR). For a full-scale sine wave quantized to N bits, the theoretical SNR is approximately 6.02 × N + 1.76 dB — each extra bit gives 6 dB of dynamic range. A 16-bit ADC has ~98 dB SNR; a 24-bit has ~146 dB (though noise in the analog front-end usually prevents the last bits from being useful). The bit depth you actually need is determined by the SNR you need in your application; anything past that is cost without benefit.
Sampling discretizes time; quantization discretizes amplitude. An N-bit ADC produces one of 2^N codes per sample. That rounding step is the fundamental, unavoidable error of digitization — quantization noise — and it sets the signal-to-noise ratio ceiling no downstream math can beat.
Quantization of a sine into 3-bit codes (8 levels): the smooth analog curve
becomes a staircase — every sample is rounded to the nearest of the 2³
output codes.

step size       q = Vref / 2^N
max error       ≤ q / 2
noise power     = q² / 12   (uniform-distribution assumption)

SNR for a full-scale sine:
    SNR (dB) = 6.02 · N + 1.76
    → each extra bit adds ~6 dB of SNR.
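The 6.02·N + 1.76 figure can be verified directly: quantize a sine, measure the error power, compare. A pure-Python sketch (tone frequency and sample count are arbitrary choices; the tone is deliberately incommensurate with the sample rate so the error decorrelates):

```python
import math

def measured_snr_db(n_bits, n_samples=100_000):
    """Quantize a full-scale sine to n_bits and measure SNR numerically."""
    q = 2.0 / 2 ** n_bits                 # step size for a ±1.0 full-scale range
    sig_pow = err_pow = 0.0
    for i in range(n_samples):
        x = math.sin(2 * math.pi * 0.12345 * i)   # incommensurate tone
        xq = round(x / q) * q                     # mid-tread quantizer
        sig_pow += x * x
        err_pow += (x - xq) ** 2
    return 10 * math.log10(sig_pow / err_pow)

for n in (8, 12, 16):
    print(f"{n:2d} bits: measured {measured_snr_db(n):6.2f} dB,"
          f" theory {6.02 * n + 1.76:6.2f} dB")
```

The measured values track the formula to within a fraction of a dB — the q²/12 model is a very good one for busy signals.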
The 6.02·N + 1.76 dB formula is the headline number for every ADC datasheet. A perfect 16-bit ADC has ~98 dB SNR; a 24-bit audio ADC has ~146 dB on paper but delivers roughly 120 dB of real dynamic range (about 20 effective bits) because thermal noise in real silicon dominates the low bits. ENOB (Effective Number of Bits) is the honest number: ENOB = (SINAD − 1.76) / 6.02, where SINAD includes quantization noise, thermal noise, harmonic distortion, and spurs.
- Audio: 16 bits per sample at 44.1 kHz (CD), 24 bits at 48/96/192 kHz (studio). 16-bit dithered PCM has a dynamic range of ~96 dB — more than any human listening environment needs; 24 bits is headroom for mix/edit operations that scale down.
- Sensors: pressure / strain / temperature transducers are often 12–16 bit; motion IMUs 16–24 bit; scientific instruments 18–24 bit with sigma-delta converters; high-speed scopes 8–12 bit because speed trades against bits.
- Non-uniform quantization exists where the signal's distribution is known: μ-law and A-law (G.711 telephony) compand 14-bit linear audio into 8 bits per sample by log-ish steps that match the ear's logarithmic perception — a 2× bitrate win for voice with no perceived quality loss.
- Dither — a small random noise deliberately added before quantization — linearizes quantization error and removes pattern artifacts at low signal levels. Counter-intuitive but well-established; every competent mastering engineer dithers on bit-depth reduction.
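The μ-law curve from G.711 is one line of math. A sketch of the continuous-domain compressor (the real codec then quantizes the companded value to 8 bits, which is omitted here):

```python
import math

MU = 255.0   # μ-law parameter used by G.711 (North America / Japan)

def mu_law_compress(x):
    """Map x in [-1, 1] through the logarithmic mu-law companding curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# Small signals occupy a disproportionate share of the output range:
print(round(mu_law_compress(0.01), 2))   # ≈ 0.23 — 1% of input → ~23% of output
print(mu_law_compress(1.0))              # → 1.0 (full scale maps to full scale)
```

The log-shaped allocation is exactly why 8 companded bits sound as good as ~14 linear bits for voice: quiet passages, where the ear is most sensitive, get the finest steps.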
The model you want: a bit is ~6 dB; a dB is ~1.12× amplitude; ENOB is the number that actually measures. When someone quotes a 24-bit ADC with 90 dB SINAD, they are selling you a 15-bit ADC. Read the datasheet page 2.
TIP
Scale your signal to use the top ~80% of the ADC's range — too small and you waste bits in quantization noise, too large and you clip. A calibration routine that adjusts analog gain to a target RMS at setup time is worth an extra bit or two of real resolution for free.
Go deeper: Analog Devices' "MT-001: Taking the Mystery out of the Infamous Formula SNR = 6.02N + 1.76dB" tutorial; Widrow & Kollár, Quantization Noise: Roundoff Error in Digital Computation, Signal Processing, Control, and Communications; a datasheet for any modern ΔΣ ADC (e.g. AD7768, ADS1299) read from ENOB backwards.
Station 3 — Filters: FIR, IIR, and the tradeoffs
Once you have a digital signal, you almost always need to reshape it — remove noise above some frequency, isolate a band you care about, smooth out jitter, equalize a microphone, match the anti-alias filter before a downsample. The tool for this is a digital filter: a mathematical operation that takes an input sample sequence and produces an output sequence with a desired frequency response.
There are two fundamental classes. A Finite Impulse Response (FIR) filter computes each output sample as a weighted sum of the last N input samples — the impulse response dies away in finite time, and the filter has no internal memory that persists. FIR filters are easy to design with exact linear phase (all frequencies delayed by the same time), always stable, and the go-to choice when phase matters. An Infinite Impulse Response (IIR) filter feeds its own output back into the calculation, so the impulse response rings on forever — this makes IIR filters much more efficient (many fewer multiplies per sample for the same sharpness) but introduces phase distortion and possible instability. Classical analog filter prototypes (Butterworth, Chebyshev, elliptic) all have digital IIR counterparts.
A filter is a system that selectively attenuates some frequency content and passes the rest. Every serious signal-processing task is a cascade of filters: the anti-alias filter, the decimation filter, the DC blocker, the noise rejector, the band splitter, the matched filter. Two families dominate: FIR (Finite Impulse Response) and IIR (Infinite Impulse Response).
FIR (non-recursive): N taps, convolves input with a fixed kernel
y[n] = h[0]·x[n] + h[1]·x[n-1] + … + h[N-1]·x[n-N+1]
─ linear phase possible (symmetric h[]) → no delay distortion ✓
─ always stable (poles at origin, zeros anywhere) ✓
─ N taps to hit a steep cutoff; N grows with 1 / transition-width
─ cost: N multiply-adds per output sample
IIR (recursive): past outputs feed back
y[n] = Σ b[k]·x[n-k] − Σ a[k]·y[n-k]
─ few taps (order 2–8 is common) match FIR orders of 100s
─ cheap: O(order) MACs per sample
─ can ring, can oscillate if coefficients drift (stability matters)
─ phase response is not linear
Classical analog-derived IIR designs — Butterworth (flat passband), Chebyshev I/II (ripple in passband or stopband, steeper rolloff), elliptic / Cauer (steepest, ripple in both), Bessel (linear phase, gentle rolloff) — have closed-form coefficient recipes and are one scipy.signal.iirdesign away. FIRs are designed by windowing a sinc, by Parks-McClellan (optimal equiripple, Remez exchange), or by least-squares methods.
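As a concrete instance of the windowed-sinc recipe, here is a pure-Python sketch (tap count, cutoff, and the Hann window are arbitrary choices of mine; scipy.signal.firwin packages the same idea):

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase FIR low-pass: ideal sinc kernel shaped by a Hann window.
    cutoff is a fraction of the sample rate (0 < cutoff < 0.5)."""
    m = num_taps - 1
    h = []
    for n in range(num_taps):
        x = n - m / 2.0
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / m)
        h.append(ideal * hann)
    s = sum(h)
    return [c / s for c in h]          # normalize DC gain to exactly 1

def gain_at(h, freq):
    """Magnitude response at freq (fraction of the sample rate)."""
    re = sum(c * math.cos(2 * math.pi * freq * n) for n, c in enumerate(h))
    im = sum(-c * math.sin(2 * math.pi * freq * n) for n, c in enumerate(h))
    return math.hypot(re, im)

h = windowed_sinc_lowpass(63, 0.1)     # 63 symmetric taps, cutoff at 0.1·fs
print(gain_at(h, 0.0))                 # passband: ~1.0
print(gain_at(h, 0.4))                 # stopband: tiny
```

The symmetry of h[] is what buys linear phase: every frequency is delayed exactly (63 − 1)/2 = 31 samples.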
- A typical audio equalizer is a cascade of biquad IIR sections (2nd-order), 8–16 of them covering the band. Each biquad costs ~5 MACs and 4 words of state — trivially small even on 8-bit MCUs.
- FIR decimation filters are bread and butter in ΔΣ ADC decimation chains — a 256-tap FIR followed by 64× downsampling gives an effective 16384-tap anti-alias response at the cost of a 256-tap convolution. The polyphase decomposition makes the math cheap.
- Group delay matters for anything that must preserve waveshape — ECG, radar pulse, data communication. A symmetric FIR has group delay exactly (N − 1)/2 samples at all frequencies; an IIR's group delay varies with frequency, so fast transients blur.
- Fixed-point implementation on an MCU needs attention to headroom (intermediate sums exceeding the accumulator), coefficient quantization (a 16-bit coefficient can turn a stable IIR into an unstable one), and limit cycles (small oscillations from rounding). DSP cores like ARM Cortex-M4 have a SIMD MAC and saturating arithmetic exactly for this.
The model you want: FIR when phase matters or stability is non-negotiable; IIR when compute or memory is tight and the phase response is acceptable. The choice is a system question, not a DSP-theory question.
CAUTION
A sharper filter (narrower transition, deeper stopband) always costs either more taps (FIR) or higher order (IIR with more sensitivity to coefficient error). "Make the filter steeper" on a Monday becomes "the filter is unstable on Tuesday" on fixed-point silicon. Prototype in floating-point, verify in fixed-point, and test with worst-case inputs.
Go deeper: Oppenheim & Schafer, Discrete-Time Signal Processing chapters 5–7 and 10; Lyons, Understanding Digital Signal Processing (3rd ed); the SciPy signal package docs; ARM's CMSIS-DSP library for reference MCU implementations.
Station 4 — FFT, windowing, and the frequency domain
Most signals are easiest to reason about in the time domain (what voltage at each instant?) for storage and playback, but hardest in the time domain for understanding — "is there a 60 Hz hum in this recording?" is a question about frequency content, not time content. Fourier's theorem says any signal can be expressed as a sum of sinusoids at different frequencies with different amplitudes and phases. The Fourier transform converts a signal from time-domain samples into frequency-domain amplitudes (and phases) at each frequency.
In computation, we use the Discrete Fourier Transform (DFT), which operates on a finite-length buffer of samples. Computing it naïvely takes O(N²) operations; the Fast Fourier Transform (FFT) (Cooley-Tukey 1965) does the same thing in O(N log N), which is what makes real-time frequency analysis feasible. The FFT is the single most important algorithm in signal processing — audio equalizers, spectrum analyzers, radio receivers, MRI scanners, image compressors, and wireless modulation all lean on it.
The catch is that the DFT assumes the N-sample buffer is one period of an infinitely repeating signal. In practice your buffer is just a chunk of a longer signal, and the abrupt ends create spectral leakage — energy smeared across frequencies that were not really there. Windowing (multiplying the buffer by a smooth-edged envelope like Hann, Hamming, or Blackman) softens the ends and reduces leakage at the cost of a small loss in frequency resolution.
The DFT (Discrete Fourier Transform) maps N samples in time to N complex values in frequency:
X[k] = Σ x[n] · e^(-j · 2π · k · n / N), n = 0 … N-1
Naively this costs O(N²). The FFT (Cooley & Tukey, 1965, echoing Gauss 1805) factors the DFT recursively into smaller DFTs of size 2 or 4, costing O(N log N). For N = 1024 that is ~10 000 vs ~1 000 000 — two orders of magnitude, which is why the FFT is the one signal-processing algorithm every embedded engineer will compute at least once.
Bin k of the FFT corresponds to frequency f_k = k · f_s / N. Bin spacing Δf = f_s / N is the frequency resolution. To resolve two tones 10 Hz apart at f_s = 48 kHz you need N ≥ 4800. Longer FFTs buy resolution; shorter FFTs buy time-locality — the uncertainty principle of DSP.
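A toy radix-2 FFT shows both the recursion and the bin-to-frequency mapping. This is a sketch for reading, not production (real code uses an iterative in-place transform or a library); the test tone and sizes are arbitrary:

```python
import cmath
import math

def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])        # DFT of even-index samples
    odd = fft(x[1::2])         # DFT of odd-index samples
    out = [0j] * n
    for k in range(n // 2):    # combine with twiddle factors
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

# A 1 kHz tone sampled at 8 kHz with N = 64 lands exactly on bin k = f·N/f_s = 8
fs, n = 8000.0, 64
x = [complex(math.sin(2 * math.pi * 1000.0 * i / fs)) for i in range(n)]
spectrum = fft(x)
peak_bin = max(range(n // 2), key=lambda k: abs(spectrum[k]))
print(peak_bin)   # → 8
```

The tone here completes an integer number of periods in the buffer, so there is no leakage; shift it to 1 010 Hz and the energy smears into neighbouring bins — which is exactly what windowing exists to tame.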
- Windowing is not optional. A rectangular window (just the N samples as they are) produces sinc-shaped leakage into every other bin; any real-world non-periodic signal looks like garbage. Hann, Hamming, Blackman–Harris, and Kaiser windows trade main-lobe width for side-lobe level — pick the window that matches whether you care about resolving close tones (narrow main lobe, Hann) or detecting weak tones near strong ones (low side lobes, Blackman–Harris).
- Real signals produce a conjugate-symmetric spectrum: X[N − k] = X[k]*. Only bins 0 through N/2 are independent — libraries exploit this (RFFT, half-complex format) to halve compute and memory.
- STFT / spectrogram: slide a window across the signal, FFT each frame, stack the magnitude spectra. This is how you get a time-frequency picture. Overlap (50%, 75%) recovers information lost at frame edges.
- FFT on MCUs: CMSIS-DSP ships a radix-4 FFT tuned for Cortex-M4/M7. A 512-point fixed-point FFT runs in ~100 µs on a 180 MHz Cortex-M4 — fast enough for kHz-rate analysis in a battery-powered device.
The model you want: the FFT is a basis change from time to frequency; every DSP operation that is hard in one basis is typically easy in the other. Convolution in time is multiplication in frequency; modulation in time is shift in frequency; correlation is a dot product under the DFT.
TIP
When you can't see a weak tone in a noisy spectrum, average the magnitude squared of many short FFTs rather than taking one long FFT. Noise averages down as 1/√K for K averages; signal stays the same. Welch's method is the packaged form.
Go deeper: Cooley & Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series" (Math. Comp. 1965); Frigo & Johnson, "The Design and Implementation of FFTW3" (Proc. IEEE 2005); Harris, "On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform" (Proc. IEEE 1978); Oppenheim & Schafer chapter 10.
Station 5 — Modulation: AM, FM, QAM, OFDM
To send a digital signal across a wire or through the air, we almost always ride it on top of a higher-frequency carrier wave. The carrier gives the signal access to a frequency band licensed for that purpose (a Wi-Fi channel, an FM radio station, a cellular band) and takes advantage of better propagation at RF frequencies. Modulation is the process of imprinting the digital message onto the carrier; demodulation is recovering it at the other end.
Three families cover most of what you will meet. Amplitude modulation (AM) varies the height of the carrier to encode the message — the oldest form, still used for AM radio and some industrial telemetry. Frequency modulation (FM) varies the carrier frequency — more resistant to amplitude noise; FM radio, two-way radios, and many simple digital links. Phase modulation (PM) and its practical cousin Quadrature Amplitude Modulation (QAM) vary the phase (and amplitude) of the carrier in discrete steps — each step represents a cluster of bits. 64-QAM, 256-QAM, and 1024-QAM are what modern Wi-Fi and cellular signals use to push many bits per Hz of bandwidth. Orthogonal Frequency-Division Multiplexing (OFDM) splits the available spectrum into hundreds of narrow subcarriers, each modulated independently — the basis of Wi-Fi 4/5/6, 4G LTE, 5G NR, and DSL.
To move bits through the air, you ride them on a high-frequency carrier. A 2.4 GHz Wi-Fi carrier propagates through a house; a 100 Hz baseband signal would need an antenna kilometres long. Modulation is how you imprint information on the carrier; demodulation recovers it at the far end.
Three ways to imprint a bit on a sine wave c(t) = A·cos(2πf·t + φ):
AM — amplitude A varies with the signal
FM — frequency f varies with the signal
PM — phase φ varies with the signal
Digital flavours:
ASK, FSK, PSK — same three, with discrete symbol alphabets
QAM — joint amplitude + phase, typically 4 / 16 / 64 / 256 / 1024 / 4096 points
OFDM — many narrowband QAM-modulated subcarriers in parallel
Constellation diagram for 16-QAM (points on the complex plane):

    ●  ●  ●  ●      each point = one 4-bit symbol
    ●  ●  ●  ●      noise moves points; decode = nearest-neighbour
    ●  ●  ●  ●      spectral efficiency: 4 bits per symbol
    ●  ●  ●  ●      (≈4 bit/s per Hz at Nyquist signalling)
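Nearest-neighbour decoding is literally a minimum over the constellation. A sketch with a standard 16-QAM grid and an arbitrary noise sample of my choosing:

```python
# 16-QAM: I and Q each take one of four levels, 16 points = 4 bits per symbol
levels = [-3.0, -1.0, 1.0, 3.0]
constellation = [complex(i, q) for i in levels for q in levels]

def nearest(symbol):
    """Decode by picking the constellation point closest to the received sample."""
    return min(constellation, key=lambda p: abs(p - symbol))

tx = complex(3, -1)                 # transmitted symbol
rx = tx + complex(0.4, -0.3)        # channel noise nudges the point
assert nearest(rx) == tx            # still inside the decision region
```

The decision regions shrink as the constellation grows — which is why 1024-QAM demands so much more SNR than 16-QAM: the same noise nudge that is harmless here crosses a boundary when the points sit closer together.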
OFDM — Orthogonal Frequency-Division Multiplexing — is the technique behind Wi-Fi, LTE, 5G, DVB-T, and DSL. Instead of one fast QAM stream, split the channel into hundreds or thousands of narrow subcarriers, each running QAM at a low symbol rate, spaced so they are mutually orthogonal over one symbol period. A cyclic prefix turns the multi-path channel into a set of independent flat-fading subcarriers. The FFT is how you do the math — transmit = IFFT, receive = FFT.
- Wi-Fi 6/6E (802.11ax) uses 256-point to 2048-point OFDMA, with subcarrier spacing 78.125 kHz, symbol time 12.8 µs + 0.8–3.2 µs cyclic prefix, and up to 1024-QAM (10 bits/symbol) per subcarrier in excellent SNR. Peak PHY rate on a 160 MHz channel with 8 spatial streams: ~9.6 Gbps.
- 5G NR uses OFDM in DL and CP-OFDM or DFT-s-OFDM in UL, with scalable subcarrier spacing 15–240 kHz. The same OFDM idea, different parameter sets for FR1 (sub-6 GHz) and FR2 (mmWave 24–52 GHz).
- Shannon capacity is the cap: C = B · log₂(1 + SNR), bits per second, where B is bandwidth in Hz and SNR is signal-to-noise ratio (linear). A 20 MHz channel at 30 dB SNR caps at ~200 Mbps per spatial stream; breaking the cap is how you know your channel model is wrong.
- Spread spectrum (DSSS, FHSS) hides the signal in a wider band than it needs, trading bandwidth for SNR or stealth. GPS uses DSSS with a 1.023 MHz chip rate to deliver navigation data at 50 bps with a processing gain of ~43 dB — which is why a GPS receiver works with a signal below the thermal noise floor.
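The capacity formula is worth keeping around as a two-line sanity check against any quoted link rate:

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_db):
    """C = B · log2(1 + SNR), with SNR converted from dB to linear."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

# The 20 MHz / 30 dB example from the text:
print(shannon_capacity_bps(20e6, 30) / 1e6)   # ≈ 199.3 Mbps per spatial stream
```

A marketing number above this line means wider bandwidth, more spatial streams, or a channel model that does not match your room.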
The model you want: the carrier is a vehicle; the modulation is how you load it; the channel decides how much you can load before symbols start bumping into each other. Every wireless standard is a negotiation between modulation order, bandwidth, SNR, and multi-path — with codes and retransmission to catch what slips through.
WARNING
A 1024-QAM link on paper needs ~30 dB SNR to survive with good BER; a typical home Wi-Fi environment gives you 20 dB on a bad day. "My router says 9.6 Gbps" is the PHY peak in lab conditions; in your kitchen you may be one modulation step down and a spatial stream short. This is not a defect, it is physics.
Go deeper: Proakis & Salehi, Digital Communications (5th ed); Tse & Viswanath, Fundamentals of Wireless Communication; the IEEE 802.11ax (Wi-Fi 6) and 3GPP TS 38.211 (5G NR PHY) standards for worked-out OFDM parameter sets; SDR with an RTL-SDR dongle and GNU Radio to see modulation live.
Station 6 — Board-level buses: I²C, SPI, UART, CAN
An embedded system is not one chip — it is a microcontroller plus several peripherals (sensors, displays, flash memory, power-management ICs) that have to communicate over a few wires. Board-level buses are simple, standardized wire protocols that allow one or more chips to exchange bytes at tens of kilobits per second up to tens of megabits per second without Ethernet's complexity or a full network stack.
The four you will meet constantly. UART (Universal Asynchronous Receiver-Transmitter) is the simplest — two wires (TX and RX), no clock, each side clocks its own data at an agreed baud rate. Used for debug consoles, GPS modules, Bluetooth modems, and anything that wants "just send me bytes." SPI (Serial Peripheral Interface) adds a shared clock and a chip-select wire per peripheral; fast (tens of MHz), full-duplex, point-to-point — the bus of choice for flash chips, displays, and ADCs. I²C (Inter-Integrated Circuit) is a two-wire multi-drop bus — one data line, one clock, addresses let up to 127 devices share the bus at 100 kHz / 400 kHz / 1 MHz. Slower than SPI but needs only two wires for any number of peripherals. CAN (Controller Area Network) is the bus a modern car's electronics run on — multi-drop, differential, 1 Mbps, with built-in arbitration so any node can start a frame and the highest-priority ID wins.
Inside an embedded device, the interesting wires are board-level buses — how the microcontroller talks to sensors, displays, flash chips, and other MCUs. Four protocols dominate, each tuned to a different balance of pin count, speed, reach, and complexity.
UART (async serial)
2 wires: TX, RX (GND shared)
no clock; agreed baud rate, start/stop bits, optional parity
typical: 9600 – 115 200 – 3 000 000 baud
reach: ~1 m unterminated; km with RS-232 or RS-485 transceivers
good for: debug console, GPS module, old modems
SPI (synchronous, full-duplex)
4 wires: SCLK, MOSI, MISO, SS (one SS per peripheral)
master provides clock; up to tens of MHz
no addressing — SS selects the slave; no acknowledgements
good for: SD cards, flash memory, high-rate ADCs, displays
I²C (synchronous, half-duplex, multi-drop)
2 wires: SDA (data), SCL (clock) with pull-ups to Vdd
7-bit addresses, speeds 100 kHz (std) / 400 kHz (fast) / 3.4 MHz (HS)
built-in ACK per byte, clock stretching, arbitration
good for: sensors, EEPROMs, fan controllers, dozens of devices per bus
CAN (Controller Area Network, automotive + industrial)
2 wires differential (CAN_H, CAN_L), up to 1 Mbps (Classical), 8 Mbps (CAN-FD)
message-based, no addressing — every node filters by 11- or 29-bit ID
non-destructive arbitration, reach up to ~40 m at 1 Mbps, longer slower
good for: vehicles, motor drives, agricultural machinery
The choice is almost always a pin-count vs speed vs reach tradeoff. SPI wins raw speed; I²C wins pin count and makes long daisy chains; UART wins simplicity; CAN wins electrical robustness and determinism under contention.
- I²C's open-drain lines need external pull-ups (typically 2.2–10 kΩ to Vdd). Too weak and rise times blow past the 300 ns spec at 400 kHz; too strong and the drivers can't sink enough current. This is the #1 reason "my I²C is unreliable" is a hardware problem, not firmware.
- CAN arbitration is non-destructive: when two nodes transmit simultaneously, the one with the lower ID wins (dominant 0 overrides recessive 1) and the loser silently retries. That's why safety-critical messages get the lowest IDs — they preempt, they don't wait.
- SPI has no universal framing: every peripheral's datasheet defines the command bytes and clock polarity/phase (CPOL, CPHA — four modes, and they mismatch frequently). A logic analyzer in "SPI decode" mode answers 90% of SPI bring-up questions in minutes.
- RS-485 (which UARTs often speak through a transceiver) uses differential signaling over a twisted pair, allowing multi-drop up to 32–256 nodes over 1 200 m at 100 kbps. It is the backbone of Modbus-RTU, DMX512 lighting, and industrial sensor networks.
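CAN's wired-AND arbitration can be simulated bit by bit. A sketch over 11-bit identifiers (the example IDs are arbitrary): every contender drives its ID MSB-first; a dominant 0 from anyone pulls the bus low, and a node that wrote recessive 1 but reads back 0 drops out.

```python
def arbitrate(ids):
    """Bitwise CAN arbitration over 11-bit IDs: dominant 0 beats recessive 1."""
    contenders = set(ids)
    for bit in range(10, -1, -1):                        # MSB first
        wire = min((i >> bit) & 1 for i in contenders)   # any 0 makes the bus dominant
        # nodes that wrote recessive 1 but read dominant 0 back off and retry later
        contenders = {i for i in contenders if (i >> bit) & 1 == wire}
    (winner,) = contenders
    return winner

assert arbitrate([0x123, 0x100, 0x7FF]) == 0x100   # lowest ID wins, losers retry
```

The crucial property: the winner's frame is transmitted undamaged — arbitration costs the losers nothing but a retry, which is what makes the latency of the highest-priority message analyzable.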
The model you want: a bus is a contract that decides who talks, when, and how loudly. Pin count, clock source, drive strength, addressing, and acknowledgement are negotiable; electrical integrity is not. A misreading on any bus is almost always termination, pull-ups, or ground return at fault before firmware.
TIP
Buy a cheap logic analyzer (Saleae clones are ~$20) and plug it into every new bus before you trust the firmware. "The waveform looks wrong" diagnoses 80% of bring-up bugs in a minute; staring at printouts does not.
Go deeper: NXP UM10204 (I²C spec); Motorola/Freescale SPI application notes; Bosch CAN 2.0 specification (15 pages, readable); ISO 11898 (CAN physical layer); TI SLLA272 ("RS-485 Basics"); any MCU vendor's reference manual chapter on their SERCOM / USART / SPI / CAN peripheral.
Station 7 — Interrupts, DMA, and the ISR discipline
A microcontroller's main loop cannot spend its time asking peripherals "do you have something for me?" — polling wastes CPU and misses fast events. The hardware answer is interrupts: when a peripheral (a sensor, a UART, a timer) has something to report, it raises a signal that forces the CPU to stop what it was doing, save its state, and jump to an interrupt service routine (ISR) — a dedicated function that handles the event, after which the CPU resumes what it was doing.
An ISR has to be fast. Anything slow you do inside it (blocking waits, complex calculations, logging to a slow UART) adds latency to every interrupt that happens while you are in the ISR and can cause events to be missed. The ISR discipline: do the absolute minimum in the ISR — grab the data from the peripheral's register, set a flag, enqueue a message — and let the main loop or a lower-priority task do the real work.
A second hardware feature, DMA (Direct Memory Access), lets peripherals move bytes to and from memory without involving the CPU at all. A high-speed ADC can stream thousands of samples per second straight into a buffer via DMA; the CPU only gets an interrupt when the buffer is full. DMA is what makes a small microcontroller capable of serious real-time throughput — without it, the CPU would spend all its time servicing per-byte interrupts and never get anything else done.
An MCU's main loop cannot poll every sensor every microsecond. Interrupts let peripherals shout "I have data" and preempt whatever the CPU was doing. DMA (Direct Memory Access) lets peripherals move data to or from memory without bothering the CPU at all — the CPU configures the DMA controller, DMA does the transfer, and fires an interrupt only when done.
Interrupt life cycle on Cortex-M:
peripheral asserts IRQ → NVIC prioritizes → CPU:
1. finishes current instruction
2. pushes xPSR, PC, LR, R12, R3–R0 onto the stack (automatic)
3. loads vector table entry for IRQ → branches to ISR
4. ISR runs (in privileged handler mode)
5. BX LR (special EXC_RETURN) restores registers, resumes
tail-chaining: if another IRQ is pending at return, skip unstack+stack
lazy stacking: FPU regs saved only if ISR uses FPU
typical latency from IRQ assert to first ISR instruction: ~12 cycles
- Priority matters. Cortex-M NVIC supports 8–256 priority levels depending on implementation. A high-priority ISR preempts a low-priority one; same-priority ISRs run to completion. Get priorities wrong and critical timers starve behind long-running "housekeeping" ISRs.
- Keep ISRs short. The working rule is do the minimum to clear the hardware, defer the rest. Good shapes: read one byte from a UART FIFO into a ring buffer; sample an ADC into a circular DMA buffer; set a flag a task picks up. Bad shapes: log a message, traverse a data structure, call printf.
- DMA patterns come in three flavours. Memory-to-memory for fast copies. Peripheral-to-memory (ADC, UART RX, SPI RX) fills a buffer with incoming data and interrupts at half/full thresholds — the standard pattern for continuous sampling. Memory-to-peripheral (UART TX, SPI TX, DAC, LED strip) lets the CPU build a frame in RAM and hand off the byte-by-byte transfer.
- Race conditions: an ISR sharing state with a main-loop task is a concurrency problem, and the rules are the same as in the Operating Systems handbook's threading section.
volatile is necessary (it tells the compiler that every read and write must actually reach memory), but rarely sufficient — multi-byte values need atomic updates or critical sections. Cortex-M's LDREX/STREX and BASEPRI give the primitives.
Double-buffering (ping-pong DMA) keeps the CPU processing one half of a buffer while DMA fills the other. It is the canonical pattern for continuous data in real time — audio codecs, high-rate ADCs, camera sensors, LED strips all do it.
The model you want: an ISR is a promise: I will be brief, I will not allocate, I will not block, I will leave the system in a consistent state. DMA is a promise kept by the hardware: I will move these bytes while you do something else.
CAUTION
Don't call malloc, printf, or any blocking API from an ISR. The standard library is not generally reentrant; allocators hold global locks; printf can pull in hundreds of cycles plus a UART TX wait. The ISR will "work" in testing and deadlock in production.
Go deeper: Joseph Yiu, The Definitive Guide to Arm Cortex-M3/M4/M7 (3rd ed) chapters 7–9; ARM's NVIC reference manual; the ChibiOS or FreeRTOS port-layer code for a reference implementation of interrupt-entry discipline; the datasheet's DMA channel table for any MCU you care about.
Station 8 — Real-time systems, WCET, and rate-monotonic scheduling
An ordinary operating system tries to be fast on average — minimise wait times, maximise throughput, share fairly. A real-time system tries to be fast in the worst case — guarantee that a specific task finishes by a specific deadline every single time, no exceptions. An anti-lock braking system cannot miss a deadline once every thousand years; a pacemaker cannot afford even that. Real-time systems trade average performance for guaranteed bounds.
The key quantity is Worst-Case Execution Time (WCET) — how long this task could take in the worst possible case, including all cache misses, interrupts, and context switches. WCET is hard to measure (measurements only give average + tail) and hard to prove (static analysis tools like aiT, OTAWA try). Once you have WCETs, classical real-time scheduling theory (Liu & Layland 1973) tells you whether a set of periodic tasks can all meet their deadlines. Rate-monotonic scheduling (RMS) assigns priority by frequency — faster deadlines get higher priority — and is optimal among fixed-priority schedulers; a set of tasks is schedulable by RMS if total utilisation is below n × (2^(1/n) − 1), which approaches ln(2) ≈ 0.693 as n grows. Earliest Deadline First (EDF) is optimal among dynamic schedulers and can hit 100% utilisation.
A Real-Time Operating System (RTOS) — FreeRTOS, Zephyr, VxWorks, QNX, ThreadX — provides the scheduler, priority-inheritance mutexes, event flags, message queues, and timers that real-time software needs. Priority inheritance (not just priority ceilings) exists because of the 1997 Mars Pathfinder priority-inversion failure, a classic case study every embedded engineer eventually reads.
A real-time system is one where correctness depends on producing the right answer before a deadline. Hard real-time: missing a deadline is a failure (a pacemaker, an airbag, a motor controller). Soft real-time: missing is ugly but tolerable (video playback, UI responsiveness). An RTOS provides tasks, priority-based preemption, inter-task primitives (queues, semaphores, event groups), and timers with bounded, analyzable scheduling behaviour.
Task set with deterministic periods. Can we meet every deadline?
task   period T   WCET C   priority (rate-monotonic: shortest T = highest prio)
────   ────────   ──────   ────────
τ1     10 ms      2 ms     highest
τ2     40 ms      10 ms    mid
τ3     100 ms     20 ms    lowest
Utilization U = Σ C_i / T_i
= 0.2 + 0.25 + 0.2 = 0.65
Liu & Layland RM bound for n tasks:
U ≤ n · (2^(1/n) − 1) → n=3: 0.779
0.65 < 0.779 → schedulable ✓
For arbitrary deadlines, use response-time analysis (RTA):
R_i = C_i + Σ_{j higher-prio} ceil(R_i / T_j) · C_j
schedulable if each R_i ≤ D_i
- WCET — Worst-Case Execution Time — is the maximum time a task can take under any input, any cache state, any interrupt interleaving. Getting WCET right is hard on a Cortex-M with a cache and an FPU, and much harder on a multi-core with coherence traffic. Tools (aiT, OTAWA) exist; empirical worst-case plus a generous margin is common in industry.
- Rate-monotonic scheduling (Liu & Layland 1973) is optimal among fixed-priority static policies. Earliest-Deadline-First is optimal among dynamic policies (utilization up to 100% for independent periodic tasks), but a misbehaving task in EDF can starve the others because priorities shift every frame. RM is simpler and has cleaner failure modes.
- Priority inversion is the classic bug: a low-priority task holds a mutex, a medium-priority task preempts it, a high-priority task blocks waiting on the mutex — the high-priority task now effectively runs at medium priority. Priority inheritance (promote the holder to the highest waiter's priority) fixes it. The Mars Pathfinder bug in 1997 was exactly this; the fix was turning on priority inheritance in VxWorks.
- Jitter: the variance in when a periodic task actually starts. Sources include ISR preemption, cache effects, DMA contention, and higher-priority tasks. Bound jitter matters as much as bound latency for closed-loop control.
The model you want: a real-time system promises both a correct value and a correct time; every design decision either protects or compromises that joint promise. RM and response-time analysis are how you check mathematically that a task set can keep the promise on the hardware you've picked.
WARNING
"We'll just run it on Linux" is not a real-time answer by itself — stock Linux has worst-case scheduling latency in the milliseconds, sometimes tens of milliseconds under load. PREEMPT_RT (merged in 6.12, late 2024) brings it down to tens-of-microseconds territory, and Xenomai / RTAI give microsecond-grade guarantees with a co-kernel approach. Know which you're running before the first missed deadline.
Go deeper: Liu & Layland, "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment" (JACM 1973); Burns & Wellings, Real-Time Systems and Programming Languages (4th ed); Kopetz, Real-Time Systems: Design Principles for Distributed Embedded Applications (2nd ed); the FreeRTOS Kernel Reference Manual; Glenn Reeves' engineering log on the Mars Pathfinder priority-inversion bug.
How the stations connect
Signals travel a pipeline from the physical world to a decision: a transducer converts a physical quantity into a voltage; an ADC turns the voltage into numbers (Stations 1–2); DSP operations interpret the numbers (Stations 3–5); buses carry the results or the commands (Station 6); interrupts and DMA make the flow asynchronous and cheap (Station 7); the RTOS orchestrates it all under deadlines (Station 8).
The chip these signals run on is the subject of the Computer Architecture handbook; the fixed-point and float encodings the samples live in are defined in Foundations Station 3.
Standards & Specs
- IEEE 802.11ax / 802.11be (Wi-Fi 6/7) — the canonical OFDM deployments.
- 3GPP TS 38.211 — 5G NR physical channels and modulation.
- IEEE 1588 (PTP) — sub-microsecond time synchronization over Ethernet, the basis of every industrial / audio-over-IP network.
- NXP UM10204 — I²C-bus specification, Rev 7.0 — the authoritative I²C document.
- ISO 11898-1:2024 — CAN data link layer and ISO 11898-2 — CAN physical layer.
- TIA-232-F (RS-232) and TIA-485-A (RS-485) — the serial-line electrical standards.
- MIPI CSI-2 / DSI / I3C — the mobile-era successors to parallel camera and display buses.
- IEEE 754-2019 — the arithmetic behind every float in your DSP.
- Canonical papers — Shannon, "A Mathematical Theory of Communication" (1948) and "Communication in the Presence of Noise" (1949); Nyquist, "Certain Topics in Telegraph Transmission Theory" (1928); Cooley & Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series" (1965); Harris, "On the Use of Windows for Harmonic Analysis" (1978); Liu & Layland, "Scheduling Algorithms" (1973); Sha et al., "Priority Inheritance Protocols" (IEEE Trans. Comp. 1990); Frigo & Johnson, "Design and Implementation of FFTW3" (2005).
- Books — Oppenheim & Schafer, Discrete-Time Signal Processing (3rd ed). Lyons, Understanding Digital Signal Processing (3rd ed). Proakis & Salehi, Digital Communications. Tse & Viswanath, Fundamentals of Wireless Communication. Burns & Wellings, Real-Time Systems and Programming Languages. Kopetz, Real-Time Systems. Yiu, The Definitive Guide to Arm Cortex-M.
Test yourself
A pressure sensor has usable content up to 200 Hz. The ADC samples at 400 Hz. 60 Hz mains hum appears clearly in the output. Predict the spectrum the system actually records and the cheapest fix.
60 Hz is within band at f_s = 400 Hz (Nyquist = 200 Hz), so the mains tone lands at 60 Hz in the spectrum — the problem is not aliasing, it is direct in-band coupling. The fix is either a notch filter centered at 60 Hz (IIR biquad) in DSP, or a differential front end that cancels mains common-mode. If the issue were aliasing — say, sampling at 100 Hz with content up to 200 Hz — 60 Hz would appear plus the alias of anything near 160 Hz, and the fix would have to be analog anti-aliasing before the ADC, not DSP after. See Station 1 and Station 3.
A Cortex-M4 system runs three periodic tasks with periods 10/50/100 ms and WCETs 3/10/30 ms. Rate-monotonic assigned by period. Will they meet deadlines?
Utilization U = 3/10 + 10/50 + 30/100 = 0.3 + 0.2 + 0.3 = 0.8. The Liu–Layland bound for n = 3 is 3·(2^(1/3) − 1) ≈ 0.779. So U > bound: the bound is sufficient but not necessary, so RM may still work — do response-time analysis (RTA). R₁ = 3 ≤ 10 ✓. R₂ solves R₂ = 10 + ⌈R₂/10⌉·3: iterate 10 → 13 → 16 → 16 stable; 16 ≤ 50 ✓. R₃ solves R₃ = 30 + ⌈R₃/10⌉·3 + ⌈R₃/50⌉·10: iterate 30 → 49 → 55 → 68 → 71 → 74 → 74 stable; 74 ≤ 100 ✓. Schedulable. See Station 8.
An I²C bus with six sensors at 400 kHz starts producing garbled bytes when a seventh sensor is added to the bus. Name the likely cause and the two things to measure.
Adding a device adds capacitance to SDA/SCL (10–15 pF per device, plus PCB trace). Past the 400 pF bus-capacitance limit, rise times exceed the I²C fast-mode 300 ns spec and the receiver samples the slope instead of a stable level. Measure: (a) the rise time on SDA/SCL with a scope (must be under 300 ns at 400 kHz) and (b) the total bus capacitance (each device pin adds ~10 pF). Fixes: stronger pull-ups (lower resistance, more current), a slower clock (standard mode at 100 kHz allows a 1000 ns rise time), or an I²C bus buffer like the PCA9515. See Station 6.