FingerIO

Turn your laptop into an active sonar. The speaker emits a near-inaudible OFDM symbol on a loop in the 18–22 kHz band; the microphone records the echoes; per-symbol channel-impulse-response estimation reveals the moving reflector — your finger or hand — against the static room. After Nandakumar et al., FingerIO (CHI 2016).

The idea in one minute

Active sonar with everyday hardware. Transmit a known signal x[n], capture y[n], and the recording is a delayed-and-scaled superposition of x — one copy per echo path. Recover the per-delay path strengths and you have a 1-D depth map: direct path, walls, your hand.

The clever bit in FingerIO is the choice of waveform. Repeat the same length-N OFDM symbol back-to-back and every period of the recording is a circular convolution of one symbol with the channel. So Y(k) = X(k)·H(k) — pointwise — and the channel impulse response falls out of one division and one inverse FFT:

h[n] = IFFT( Y(k) / X(k) )

Each symbol is its own cyclic prefix, so there is no preamble, no training sequence, no handshake. New CIR every N/fs seconds — about 5 ms with N = 256 at 48 kHz. Static reflectors live in the same range bins frame after frame; subtract a slow running average and what is left is motion.

Parameters

Lower edge of the near-inaudible OFDM band (default 18.0 kHz). Raise it if you can still hear a faint hiss; lower it if your speaker rolls off above 19 kHz, as most laptop speakers do.

Upper edge of the band (default 22.0 kHz). A wider band gives finer time resolution, but most laptop tweeters fall apart past 21 kHz.

Symbol length N (default 5.3 ms / 0.91 m, i.e. N = 256 at 48 kHz). Longer symbols span more range bins (max range = c·N/(2·fs)) but update the CIR less often. N = 256 at 48 kHz is the FingerIO sweet spot: 91 cm of unambiguous range refreshed at about 188 Hz.

Time constant τ of the running average that estimates static clutter (default 0.60 s). A short τ absorbs slow movement into the background and so erases it from the display; a long τ lets every wave of the hand glow but holds onto stale clutter.

Output amplitude (default 0.30). Raise it for stronger echoes; the band sits at or above the upper limit of most adults' hearing, so a fairly loud level is usually tolerable. Some laptop speakers compress hard above ~0.5; back off if the visualization saturates.

Maximum range plotted on the y-axis (default 50 cm). Most of the useful action happens within the first 30 cm of the speaker.
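The default values above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes a speed of sound of 343 m/s (room-temperature air) and uses the common rule of thumb that two echoes can be separated when they are about c/(2B) apart for bandwidth B. The variable names are illustrative and not taken from the page's source.

```python
import math

C = 343.0                         # speed of sound in m/s (assumed, ~20 degC air)
FS = 48_000                       # sample rate in Hz
N = 256                           # symbol length in samples
F_LO, F_HI = 18_000.0, 22_000.0   # band edges in Hz

bin_hz = FS / N                          # subcarrier spacing
k_lo = math.ceil(F_LO / bin_hz)          # first active subcarrier
k_hi = math.floor(F_HI / bin_hz)         # last active subcarrier
n_active = k_hi - k_lo + 1

symbol_s = N / FS                        # one symbol = one CIR frame
update_hz = FS / N                       # CIR refresh rate
max_range_m = C * N / (2 * FS)           # unambiguous one-way range
bin_m = C / (2 * FS)                     # one-way distance per range bin
resolution_m = C / (2 * (F_HI - F_LO))   # rule-of-thumb echo separation

print(f"{n_active} subcarriers, bins {k_lo}..{k_hi}")
print(f"symbol {symbol_s * 1e3:.2f} ms, refresh {update_hz:.1f} Hz")
print(f"max range {max_range_m:.3f} m, {bin_m * 1e3:.2f} mm per bin")
print(f"resolution about {resolution_m * 1e2:.1f} cm")
```

Note the gap between the last two numbers: range bins are spaced a few millimetres apart, but with a 4 kHz band two separate reflectors only resolve cleanly when they are several centimetres apart.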

Run

Click Start, allow microphone access, then wave a hand 5–25 cm above the speaker.

Place the laptop flat on a quiet surface, speakers up. The browser’s echo cancellation, AGC, and noise suppression are all disabled so the mic captures the sonar honestly. Headphones won’t work: the sound has to travel through the air, bounce off your hand, and return to the mic.

How the math works

Step 1. The OFDM symbol. Pick subcarrier indices k in the inaudible band, i.e. k with k·fs/N between band-low and band-high. Set X(k) = ±1 with random signs on those bins, zero elsewhere, and impose Hermitian symmetry X(N−k) = X(k)* so the IFFT comes out real. x[n] = IFFT(X(k)) is the symbol — about 5 ms long at the default settings — and we play x on a loop.
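Step 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the page's actual code: the function name make_symbol and the seed are made up, and a naive O(N²) inverse DFT stands in for the IFFT so the example needs no third-party packages.

```python
import cmath
import random

FS, N = 48_000, 256
F_LO, F_HI = 18_000.0, 22_000.0

def make_symbol(seed=0):
    """Return (X, x): a Hermitian-symmetric spectrum and its time-domain symbol."""
    rng = random.Random(seed)
    X = [0j] * N
    for k in range(1, N // 2):
        if F_LO <= k * FS / N <= F_HI:
            X[k] = complex(rng.choice((-1.0, 1.0)))   # random sign on active bins
            X[N - k] = X[k].conjugate()               # Hermitian symmetry -> real x[n]
    # Naive inverse DFT over the nonzero bins; a real implementation would use an FFT.
    x = []
    for n in range(N):
        acc = sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                  for k in range(N) if X[k])
        x.append(acc / N)
    return X, x

X, x = make_symbol()
active = [k for k in range(N // 2) if X[k] != 0]
max_imag = max(abs(v.imag) for v in x)    # should be rounding noise only
samples = [v.real for v in x]             # the symbol to play on a loop

print(f"active bins {active[0]}..{active[-1]}, max |imag| = {max_imag:.1e}")
```

The imaginary parts come out at rounding-error level, which is the practical check that the Hermitian symmetry was imposed correctly.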

Step 2. Why the loop is the trick. If consecutive periods of x are identical, then for any starting offset φ the N-sample window y[φ:φ+N] equals the circular convolution of one period of x (cyclically shifted by φ) with the channel h, provided the channel is shorter than N. A lone symbol would give only a linear convolution, with its echo tail spilling past the end of the window; in a loop, the tail of the previous period’s echoes lands at the start of the window and supplies exactly the wrap-around terms. Periodic transmission gives every symbol its cyclic prefix for free.
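A small numerical check of this claim, using a made-up 16-sample "symbol" and a made-up three-tap channel (both hypothetical, chosen only to keep the output small). It simulates the endless loop by linear convolution and compares windows at many offsets against a circular convolution of one period.

```python
import random

def circular_convolve(x, h):
    """Circular convolution of one period x with a shorter channel h."""
    n = len(x)
    return [sum(h[m] * x[(i - m) % n] for m in range(len(h))) for i in range(n)]

def looped_recording(x, h, periods):
    """Linear convolution of h with x repeated `periods` times, starting from silence."""
    n = len(x)
    return [sum(h[m] * x[(t - m) % n] for m in range(len(h)) if t >= m)
            for t in range(n * periods)]

rng = random.Random(0)
N = 16
x = [rng.uniform(-1.0, 1.0) for _ in range(N)]   # stand-in for one symbol
h = [1.0, 0.0, 0.0, 0.5, 0.0, 0.2]               # made-up echo paths, shorter than N
y = looped_recording(x, h, periods=4)

def window_is_circular(phi):
    rotated = x[phi % N:] + x[:phi % N]          # one period as seen from offset phi
    expected = circular_convolve(rotated, h)
    return all(abs(a - b) < 1e-12 for a, b in zip(y[phi:phi + N], expected))

steady = all(window_is_circular(phi) for phi in range(N, 3 * N))
first = window_is_circular(0)    # no previous period yet, so the wrap-around is missing

print("steady-state windows circular:", steady)
print("very first window circular:", first)
```

The very first window fails the check because nothing was playing before it, which is the same reason a one-shot symbol would need an explicit cyclic prefix.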

Step 3. Solve for the channel. Circular convolution diagonalizes under the DFT:

Y(k) = X(k)·H(k)  ⇒  H(k) = Y(k) / X(k)

We divide only on the active subcarriers (where |X(k)| = 1) and set every other bin of H to zero, so the h[n] recovered by IFFT is band-limited to the inaudible window. Range bin n corresponds to round-trip delay n/fs seconds, i.e. one-way distance

d[n] = c · n / (2 fs)   ≈   n · 3.57 mm at 48 kHz
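Steps 1 to 3 can be exercised end to end on a synthetic channel. The sketch below is an illustration under assumptions, not the page's implementation: the two echo paths (bins 12 and 70) are invented, c = 343 m/s is assumed, the received frame is simulated by circular convolution, and a naive DFT helper replaces the FFT to keep the example dependency-free.

```python
import cmath
import random

FS, N, C = 48_000, 256, 343.0
K_LO, K_HI = 96, 117            # active subcarriers for 18-22 kHz at these settings

def dft(v, sign=-1):
    """Naive DFT (sign=-1) or unnormalized inverse DFT (sign=+1); FFT stand-in."""
    n = len(v)
    w = [cmath.exp(sign * 2j * cmath.pi * k / n) for k in range(n)]
    return [sum(v[i] * w[(i * k) % n] for i in range(n)) for k in range(n)]

# Known symbol: random +/-1 on the active bins, Hermitian-symmetric.
rng = random.Random(1)
X = [0j] * N
for k in range(K_LO, K_HI + 1):
    X[k] = complex(rng.choice((-1.0, 1.0)))
    X[N - k] = X[k].conjugate()
x = [v.real / N for v in dft(X, +1)]

# Hypothetical channel: direct path at bin 12, weaker hand echo at bin 70.
taps = {12: 1.0, 70: 0.3}
y = [sum(a * x[(n - d) % N] for d, a in taps.items()) for n in range(N)]

Y = dft(y)
H = [Y[k] / X[k] if X[k] else 0j for k in range(N)]   # divide on active bins only
h = [abs(v) / N for v in dft(H, +1)]                  # |h[n]| per range bin

peak_bin = max(range(N), key=h.__getitem__)
peak_cm = 100 * C * peak_bin / (2 * FS)
print(f"strongest path at bin {peak_bin}, about {peak_cm:.1f} cm")
```

Because only the 18-22 kHz bins survive the division, each path shows up as an oscillating burst around its bin rather than a single clean spike; the strongest sample still sits at the true delay in this noise-free case.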

Step 4. Find the symbol boundary. The transmit→receive path through the speaker driver, the air, and the input ADC has unknown latency, typically tens of milliseconds and varying by browser and OS. Because the signal is periodic, only that latency modulo N matters. Cross-correlate two periods of received signal against one known symbol; the lag at the peak is the offset, and we read length-N frames from there on.
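The boundary search can be sketched as a brute-force correlation over all N candidate lags. Everything specific here is an assumption for illustration: the offset of 137 samples, the extra echo, the noise level, and the helper name find_offset. The symbol is built from its closed-form cosine sum, which is proportional to the IFFT of the ±1 spectrum.

```python
import math
import random

N = 256
K_LO, K_HI = 96, 117

rng = random.Random(2)
signs = {k: rng.choice((-1.0, 1.0)) for k in range(K_LO, K_HI + 1)}
# Real OFDM symbol as a sum of cosines on the active subcarriers.
x = [sum(s * math.cos(2 * math.pi * k * n / N) for k, s in signs.items())
     for n in range(N)]

def find_offset(rx, symbol):
    """Lag in [0, N) maximizing |correlation| of rx[lag:lag+N] with the symbol."""
    n = len(symbol)
    def score(lag):
        return abs(sum(rx[lag + i] * symbol[i] for i in range(n)))
    return max(range(n), key=score)

true_offset = 137     # hypothetical pipeline latency, already taken modulo N
# Two received periods: direct path, one weaker echo 40 samples later, plus noise.
rx = [x[(t - true_offset) % N]
      + 0.3 * x[(t - true_offset - 40) % N]
      + rng.gauss(0.0, 0.5)
      for t in range(2 * N)]

offset = find_offset(rx, x)
print("estimated symbol boundary at sample", offset)
```

Two periods are needed so that every candidate window rx[lag:lag+N] is fully populated. Taking the absolute value of the correlation is a design choice that keeps the estimate valid even if the audio path inverts polarity.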

Step 5. Background subtraction. Walls, table, the speaker grille — all those reflectors are stationary, so |h[n]| is roughly constant in time at their range bins. Maintain a per-bin exponential moving average m[n] with time constant τ and display max(0, |h[n]| − m[n]). The static room cancels; what remains is everything that moved in the last τ seconds. The strongest mover — reported live above — is your hand.
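A minimal sketch of the clutter filter, run on synthetic |h[n]| frames. The mapping from τ to a per-frame weight, α = 1 − exp(−(N/fs)/τ), is an assumption (the page does not say how it discretizes τ), as are the class name ClutterFilter, the 64-bin frame, and the choice to seed the average with the first frame.

```python
import math

FS, N = 48_000, 256
TAU = 0.60                                  # clutter time constant in seconds
ALPHA = 1.0 - math.exp(-(N / FS) / TAU)     # assumed per-frame EMA weight

class ClutterFilter:
    """Per-bin exponential moving average with rectified subtraction."""

    def __init__(self, first_frame):
        self.mean = list(first_frame)       # seed with the first frame (a design choice)

    def step(self, mag):
        # Display what exceeds the current background, then update the background.
        moving = [max(0.0, v - m) for v, m in zip(mag, self.mean)]
        self.mean = [m + ALPHA * (v - m) for v, m in zip(mag, self.mean)]
        return moving

def frame(i):
    """Synthetic frame: static reflector at bin 5; a hand appears at bin 20
    on frame 100 and then holds still."""
    mag = [0.0] * 64
    mag[5] = 1.0
    if i >= 100:
        mag[20] = 0.5
    return mag

filt = ClutterFilter(frame(0))
out = [filt.step(frame(i)) for i in range(1, 800)]   # out[j] belongs to frame j + 1

print("static bin when the hand appears:", out[99][5])
print("hand bin when it appears:", out[99][20])
print("hand bin after holding still ~3.7 s:", round(out[-1][20], 4))
```

The static reflector never shows up, the hand appears at full strength on arrival, and a hand that stops moving fades into the background over a few τ, which is the trade-off the τ slider controls.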

Things to try

What this isn’t

The original FingerIO uses a phone’s two-microphone array to triangulate a finger in 2D, plus a finite-state tracker that follows a single peak. This page does the front-end — the per-symbol CIR pipeline that the rest of the system rests on — on whatever single microphone your device gives the browser. You see the depth dimension. Add a second mic and the math for the lateral coordinate is the same trick run twice.