SonicSieve

Your ear can tell a voice in front of you from one behind without turning your head — the folds of your pinna color each direction with its own pattern of spectral notches, and your brain reads the color. SonicSieve 3D-prints that trick: a hollow resin cylinder with a ring of surface holes and internal capillary tubes, clipped onto an earphone mic, stamps a direction-dependent fingerprint onto incoming sound — so a neural net can pull out the talker you’re facing with just two mics, beating a flat five-mic array. Change the diameter and number of holes below and watch Fig. 2 (per-angle coloring) and Fig. 6 (spatial diversity) redraw.

The microstructure — geometry

Controls: Microstructure or Bare mic · Diameter 20 mm · Surface holes 6 · Focus aim −18° · Focus width ±15°

With Bare mic selected, every direction sounds identical, so two mics have almost nothing to separate by.

A larger diameter resonates at longer wavelengths, sliding the notch comb down into the 1–8 kHz speech band: Owlet’s 15 mm design only differs between directions above 7 kHz, while the paper’s 20 mm design spreads its cues across speech. Holes act as virtual sound sources: too few and whole directions get attenuated; too many and the cylinder opens up, washing the coloring out. SonicSieve’s sweet spot is 20 mm with 6 holes. The focus region is the angular wedge the system keeps; the paper reports a 5.0 dB gain over a 30° (±15°) wedge.
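The focus wedge can be pictured as a simple angular test. The sketch below is only an illustration: the function name in_focus_region and the wrap-around convention are assumptions, and the defaults mirror the slider values shown above (aim −18°, width ±15°), not anything taken from the paper's code.

```python
def in_focus_region(angle_deg, aim_deg=-18.0, half_width_deg=15.0):
    """Return True if a direction of arrival falls inside the kept wedge.

    Hypothetical helper: angles are in degrees and the difference is
    wrapped into [-180, 180) so that 342 degrees counts as -18 degrees.
    """
    diff = (angle_deg - aim_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_width_deg


target_inside = in_focus_region(-18.0)      # talker exactly on the aim
edge_inside = in_focus_region(-33.0)        # talker on the wedge boundary
interferer_inside = in_focus_region(72.0)   # talker 90 degrees off the aim
```

With these toy angles the target and the boundary case are kept, while an interferer 90° away from the aim falls outside the wedge.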

Top-down scene — drag the talkers

Phone on the table · earphone mic + pinna microstructure

Legend: target talker (drag) · interferer (drag) · focus region kept · direction of arrival

Figure 2 — effect of the microstructure on coloring

Curves: arrival angles 0°, 60°, 120°, 180°

Redrawn from the paper’s Fig. 2. Left (bare mic): all four angles overlap — no spatial cue. Right (with microstructure): each angle leaves a distinct pattern of peaks and notches a model can learn. Grow the diameter and the structure pushes deeper into the speech band; the right panel reverts to flat lines whenever you choose Bare mic.

Directional fingerprint — magnitude vs frequency

Microstructure transfer function |H(f, θ)| · notches encode angle

Legend: target coloring |H(f, θ_t)| · interferer coloring |H(f, θ_i)| · direction-matched mask m(f)

Each direction lands its notches at different frequencies. The net knows your target sits at θ_t, so it favors bands where the target’s coloring is strong and the interferer’s is weak (the gold mask). When the two fingerprints overlap — talkers close together, or the microstructure off — there’s nothing to grab and the mask flattens to ½.
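One way to picture the gold mask is as a ratio mask over the two colorings. The sketch below is an illustration under that assumption, not the paper's network: the name direction_matched_mask, the power-ratio formula, and the toy magnitudes are all hypothetical, chosen because the ratio sits near 1 where the target's coloring dominates and settles at ½ when the two fingerprints coincide.

```python
import numpy as np


def direction_matched_mask(h_target, h_interferer, eps=1e-12):
    """Power-ratio mask in [0, 1] (assumed form, for illustration only).

    Close to 1 in bands where the target's coloring is strong and the
    interferer's is weak; exactly balanced colorings give about 0.5.
    """
    p_t = np.abs(h_target) ** 2
    p_i = np.abs(h_interferer) ** 2
    return p_t / (p_t + p_i + eps)


# Toy magnitudes on eight bands; these are made-up values, not measurements.
h_t = np.array([1.0, 0.9, 0.2, 1.0, 0.8, 0.1, 1.0, 0.9])
h_i = np.array([0.2, 0.9, 1.0, 0.1, 0.8, 1.0, 0.3, 0.9])

mask = direction_matched_mask(h_t, h_i)            # distinct fingerprints
flat_mask = direction_matched_mask(h_t, h_t)       # identical fingerprints
```

In the toy data the first band (target strong, interferer notched) gets a mask value near 1, the third band (target notched) gets a value near 0, and the identical-fingerprint case gives ½ everywhere.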

Figure 6 — spatial diversity across the design

Diversity ‖M_θ1(f) − M_θ2(f)‖ over a semicircle

Mean spatial diversity: 0.13 (D = 20 mm · 6 holes)

Redrawn from the paper’s Fig. 6: each pixel is the spectral distance between two arrival angles, so the dark diagonal is “an angle vs. itself.” More yellow off the diagonal = more separable directions. Sweep the diameter up and the map lights up; crowd in too many holes and it fades back toward blue. Compare 10 mm/10 holes against 20 mm/6 holes — the SonicSieve design the authors landed on.
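The map described above can be sketched as a pairwise distance matrix over per-angle spectra. This is a minimal illustration, not the paper's computation: the function names, the plain L2 norm without normalization, and the toy ripple fingerprints are assumptions, so the numbers it produces are not comparable to the 0.13 readout.

```python
import numpy as np


def diversity_map(M):
    """Pairwise L2 distance between rows of M, shape (n_angles, n_freqs).

    Entry (i, j) is ||M[i] - M[j]||; the diagonal is an angle vs. itself.
    """
    diff = M[:, None, :] - M[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def mean_diversity(D):
    """Average of the off-diagonal entries of a square distance map."""
    off_diagonal = ~np.eye(D.shape[0], dtype=bool)
    return D[off_diagonal].mean()


# Toy fingerprints over a semicircle: a ripple whose spacing depends on angle.
angles = np.deg2rad(np.arange(0, 181, 30))
freqs = np.linspace(1000.0, 8000.0, 64)
M_structure = 1.0 + 0.5 * np.cos(
    2 * np.pi * freqs[None, :] / 4000.0 * (1.0 + angles[:, None])
)
M_bare = np.ones_like(M_structure)  # bare mic: every angle looks the same

D_structure = diversity_map(M_structure)
D_bare = diversity_map(M_bare)
```

The bare-mic map is zero everywhere, while the toy structure gives a zero diagonal and positive off-diagonal entries, which is the "yellow off the diagonal" pattern the caption describes.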

Result

Signal-quality gain: +5.0 dB (target vs interferer)
Angular separation: 90° (target ↔ interferer)
Fingerprint distance: 0.42 dex (spectral contrast)
Interferer: OUTSIDE of focus region

Listen

Two synthetic vowels, colored by the microstructure, then masked by direction.

How it works

A microphone array localizes by time-of-arrival differences between elements — which needs many mics spread wide. SonicSieve instead borrows the pinna’s trick: a passive shape that turns direction into a spectral pattern on a single channel, so the cue lives in the color of the sound, not the spacing of sensors.
The microstructure is a clip-on 3D-printed resin cylinder on the in-line mic of cheap wired earphones — no electronics, no power. A ring of surface holes acts as virtual sound sources; sound from each angle θ couples into the holes facing it, travels capillary tubes of differing length to the mic, and the delayed copies interfere into angle-dependent notches.
Diameter sets the band. The cylinder resonates at wavelengths tied to its size, so a bigger diameter slides the notch comb down into the 1–8 kHz speech range. Owlet’s 15 mm design only differs above 7 kHz; SonicSieve’s 20 mm spreads the cues right across speech — watch the Fig. 6 map brighten as you widen it.
Holes have a sweet spot. Too few and whole swaths of directions never couple in (they get attenuated and look alike); too many and the wall between holes vanishes — the cylinder becomes acoustically open and the coloring washes out. The authors tuned a ten-hole design down to 6 strategically placed holes.
Because each angle prints a distinct fingerprint, an on-device neural net can extract a target by direction: the paper reports a 5.0 dB gain over a 30° region, with two mics outperforming a flat five-mic array. But separation lives or dies on fingerprint distance: two talkers close in angle share nearly the same notches and the gain collapses; pull them apart and the target pops out.
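The "delayed copies interfere into notches" step in the list above can be sketched with a simple multipath model. This is an idealized illustration, not the paper's acoustic simulation: it assumes lossless paths with fixed gains, and the path lengths, function names, and the 343 m/s speed of sound are hypothetical choices rather than the real tube geometry.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air


def transfer_function(freqs_hz, path_lengths_m, gains):
    """Sum of delayed copies: H(f) = sum_k g_k * exp(-j 2 pi f L_k / c)."""
    delays = np.asarray(path_lengths_m, dtype=float) / SPEED_OF_SOUND
    phases = np.exp(-2j * np.pi * np.outer(freqs_hz, delays))
    return phases @ np.asarray(gains, dtype=float)


def first_notch_hz(path_difference_m):
    """Two equal-gain paths first cancel when they differ by half a wavelength."""
    return SPEED_OF_SOUND / (2.0 * path_difference_m)


# Hypothetical path pairs: same short path, different long path.
short_pair = [0.010, 0.030]   # 20 mm path difference
long_pair = [0.010, 0.050]    # 40 mm path difference

f_short = first_notch_hz(0.020)   # about 8575 Hz
f_long = first_notch_hz(0.040)    # about 4287.5 Hz

h_at_notch = transfer_function(np.array([f_short]), short_pair, [1.0, 1.0])
h_at_dc = transfer_function(np.array([0.0]), short_pair, [1.0, 1.0])
```

Under this toy model the two copies add to magnitude 2 at 0 Hz and cancel at the first notch, and doubling the path difference halves the notch frequency. That is consistent with the qualitative claim that a larger structure slides the notches down toward the speech band.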