SonicSieve

Your ear can tell a voice in front of you from one behind without turning your head — the folds of your pinna color each direction with its own pattern of spectral notches, and your brain reads the color. SonicSieve 3D-prints that trick: a hollow resin cylinder with a ring of surface holes and internal capillary tubes, clipped onto an earphone mic, stamps a direction-dependent fingerprint onto incoming sound — so a neural net can pull out the talker you’re facing with just two mics, beating a flat five-mic array. Change the diameter and number of holes below and watch Fig. 2 (per-angle coloring) and Fig. 6 (spatial diversity) redraw.

The microstructure — geometry

Bare mic: every direction sounds identical, so two mics have almost no spatial cue for separating talkers.
Settings: diameter 20 mm · 6 holes · −18° · focus region ±15°

A larger diameter resonates at longer wavelengths, sliding the notch comb down into the 1–8 kHz speech band, so the paper's 20 mm design beats Owlet's 15 mm one. The holes act as virtual sound sources: with too few, whole directions get attenuated; with too many, the cylinder opens up and the coloring washes out. SonicSieve's sweet spot is 20 mm with 6 holes. The focus region is the angular wedge around the target direction that the system keeps; the paper reports a 5.0 dB gain over a 30° (±15°) wedge.
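To see why a bigger ring pushes the notches lower, here is a toy model. It is our own simplification, not the paper's acoustic model: each hole is a virtual source that picks up the plane wave at its spot on the ring, a capillary tube adds a further delay, and the mic hears the average of all the delayed copies. The tube lengths `(k + 1) * D / 4` and the function name `coloring` are hypothetical. Because every delay scales with D, doubling the diameter halves every notch frequency.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, air at about 20 °C


def coloring(freq_hz, theta_rad, diameter_m, n_holes):
    """Toy |H(f, θ)| for a ring of holes feeding one mic (hypothetical model).

    Each hole acts as a virtual source: it samples the plane wave at its own
    position on the ring, then a capillary tube adds a further delay.
    The mic hears the average of all these delayed copies.
    """
    if n_holes == 0:
        return 1.0  # bare mic: the same flat response from every direction
    radius = diameter_m / 2
    total = 0j
    for k in range(n_holes):
        phi_k = 2 * math.pi * k / n_holes  # angular position of hole k
        # Holes facing the source hear the wavefront earlier (negative delay).
        arrival_s = -radius * math.cos(theta_rad - phi_k) / SPEED_OF_SOUND
        # Hypothetical tube length (k + 1) * D / 4, so every delay scales with D.
        tube_s = (k + 1) * diameter_m / 4 / SPEED_OF_SOUND
        total += cmath.exp(-2j * math.pi * freq_hz * (arrival_s + tube_s))
    return abs(total) / n_holes


# Same direction, same 6-hole ring, two diameters: the 20 mm response at f
# equals the 10 mm response at 2f, so every notch sits an octave lower.
for f in (1000, 2000, 4000, 8000):
    big = coloring(f, 0.0, 0.020, 6)
    small = coloring(2 * f, 0.0, 0.010, 6)
    print(f"{f:5d} Hz  D=20 mm: {big:.3f}   D=10 mm at {2 * f} Hz: {small:.3f}")
```

With the bare mic (`n_holes=0`) the function returns 1 for every angle, which is the "every direction sounds identical" case; with 6 holes the magnitude at a fixed frequency changes with θ, which is the fingerprint.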

Top-down scene — drag the talkers

Phone on the table · earphone mic + pinna microstructure. Legend: target talker · interferer · focus region kept · direction of arrival.
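Two quantities in the scene feed the result cards: the angular separation between the talkers and whether the interferer falls inside the focus region. A minimal sketch, assuming azimuths in degrees and a focus region that is a symmetric wedge around the target (±15° for the paper's 30° wedge); the function names are hypothetical. The key detail is wrapping, so that 350° and 10° count as 20° apart rather than 340°.

```python
def angular_separation_deg(a_deg, b_deg):
    """Smallest absolute difference between two azimuths, in [0, 180] degrees."""
    d = (a_deg - b_deg) % 360.0  # Python's % is non-negative for a positive modulus
    return min(d, 360.0 - d)


def in_focus_region(theta_deg, target_deg, half_width_deg=15.0):
    """True if a direction lies inside the wedge kept around the target."""
    return angular_separation_deg(theta_deg, target_deg) <= half_width_deg


# Hypothetical layout: target straight ahead, interferer off to the side.
target, interferer = 0.0, 90.0
print(angular_separation_deg(target, interferer))  # separation in degrees
print(in_focus_region(interferer, target))         # is the interferer kept?
```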

Figure 2 — effect of the microstructure on coloring

Arrival angles: 60° · 120° · 180°

Redrawn from the paper's Fig. 2. Left (bare mic): the curves for all four angles overlap, so there is no spatial cue. Right (with microstructure): each angle leaves a distinct pattern of peaks and notches that a model can learn. Increase the diameter and the notch pattern shifts further into the speech band; the right panel reverts to flat lines whenever you select Bare mic.

Directional fingerprint — magnitude vs frequency

Microstructure transfer function |H(f, θ)|: notches encode angle. Curves: target coloring |H(f, θ_t)| · interferer coloring |H(f, θ_i)| · direction-matched mask m(f).

Each direction places its notches at different frequencies. The network knows the target sits at θ_t, so it favors bands where the target's coloring is strong and the interferer's is weak (the gold mask). When the two fingerprints overlap (talkers close together, or the microstructure switched off), there is no contrast to exploit and the mask flattens to ½.
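A simple rule with exactly the behavior described above is a per-band power ratio mask, m(f) = |H(f, θ_t)|² / (|H(f, θ_t)|² + |H(f, θ_i)|²). This is our illustrative stand-in, not SonicSieve's trained network, and the function name is hypothetical. It approaches 1 where the target's coloring dominates, approaches 0 where the interferer's does, and is exactly ½ wherever the two fingerprints coincide.

```python
def direction_matched_mask(target_gain, interferer_gain):
    """Ratio mask m(f) = |H_t|^2 / (|H_t|^2 + |H_i|^2), one value per band.

    Illustrative stand-in for what the network learns, not SonicSieve's model.
    """
    mask = []
    for h_t, h_i in zip(target_gain, interferer_gain):
        p_t, p_i = h_t * h_t, h_i * h_i
        # If both colorings vanish in a band there is no information: use 1/2.
        mask.append(0.5 if p_t + p_i == 0 else p_t / (p_t + p_i))
    return mask


# Three made-up bands: target strong, identical colorings, interferer strong.
print(direction_matched_mask([1.0, 0.6, 0.2], [0.2, 0.6, 1.0]))
```

The middle band shows the "nothing to grab" case: identical magnitudes give exactly ½ regardless of their level.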

Figure 6 — spatial diversity across the design

Diversity ‖M_θ1(f) − M_θ2(f)‖ for each pair of arrival angles (θ1, θ2) over a semicircle
Mean spatial diversity: 0.13 (D = 20 mm · 6 holes)

Redrawn from the paper's Fig. 6: each pixel is the spectral distance between two arrival angles, so the dark diagonal is "an angle vs. itself." More yellow off the diagonal means more separable directions. Increase the diameter and the map lights up; crowd in too many holes and it fades back toward blue. Compare 10 mm with 10 holes against 20 mm with 6 holes, the SonicSieve design the authors settled on.
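The map and its mean can be computed from one magnitude response per angle, all sampled on the same frequency grid. A minimal sketch: pixel (i, j) is ‖M_θi(f) − M_θj(f)‖, and the mean is taken over off-diagonal pixels only. Dividing by the number of bins (an RMS distance) is our normalization choice; the page does not say how its 0.13 is scaled, so the numbers here are not comparable to it. Function names are hypothetical.

```python
import math


def diversity_matrix(responses):
    """Pairwise spectral distance; responses[i] is |M_theta_i(f)| on a shared grid."""
    n = len(responses)
    bins = len(responses[0])
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays 0: an angle vs. itself
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((a - b) ** 2 for a, b in zip(responses[i], responses[j]))
            d = math.sqrt(sq / bins)  # RMS over bins (our normalization choice)
            matrix[i][j] = matrix[j][i] = d
    return matrix


def mean_diversity(matrix):
    """Average over off-diagonal pixels, i.e. over pairs of distinct angles."""
    n = len(matrix)
    if n < 2:
        return 0.0
    off = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(off) / len(off)


# Made-up responses for three angles over four bands.
colored = [[1.0, 0.2, 0.9, 0.5], [0.4, 1.0, 0.3, 0.8], [0.9, 0.6, 0.1, 1.0]]
bare = [[1.0] * 4 for _ in range(3)]  # bare mic: every angle looks the same
print(mean_diversity(diversity_matrix(colored)))
print(mean_diversity(diversity_matrix(bare)))  # no spatial diversity at all
```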

Result

Signal-quality gain: +5.0 dB (target vs interferer)
Angular separation: 90° (target ↔ interferer)
Fingerprint distance: 0.42 dex (spectral contrast)
Interferer: outside the focus region
Two synthetic vowels, colored by the microstructure, then masked by direction.
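The page does not define how its gain and fingerprint-distance readouts are computed, so here is one plausible reading, stated as an assumption. Gain is the change in signal-to-interferer ratio (SIR) after applying a per-band amplitude mask to the target and interferer power spectra. Fingerprint distance is the RMS difference of log10 magnitudes, which is naturally measured in dex (1 dex is a factor of 10). All function names are hypothetical.

```python
import math


def sir_db(target_power, interferer_power, mask=None):
    """Signal-to-interferer ratio in dB after a per-band amplitude mask."""
    if mask is None:
        mask = [1.0] * len(target_power)  # unprocessed mixture
    t = sum(m * m * p for m, p in zip(mask, target_power))
    i = sum(m * m * p for m, p in zip(mask, interferer_power))
    return 10.0 * math.log10(t / i)


def sir_gain_db(target_power, interferer_power, mask):
    """Improvement in dB: SIR after masking minus SIR of the raw mixture."""
    return sir_db(target_power, interferer_power, mask) - sir_db(target_power, interferer_power)


def fingerprint_distance_dex(h_target, h_interferer):
    """RMS difference of log10 magnitudes across bands (1 dex = factor of 10)."""
    sq = [(math.log10(a) - math.log10(b)) ** 2 for a, b in zip(h_target, h_interferer)]
    return math.sqrt(sum(sq) / len(sq))


# Made-up two-band example: each talker dominates one band.
target_p, interferer_p = [1.0, 0.1], [0.1, 1.0]
mask = [0.9, 0.2]  # keep the target's band, suppress the interferer's
print(f"gain: {sir_gain_db(target_p, interferer_p, mask):+.1f} dB")
print(f"distance: {fingerprint_distance_dex([1.0, 0.3], [0.3, 1.0]):.2f} dex")
```

A mask that scales every band equally leaves the SIR unchanged, which is why a flat ½ mask (overlapping fingerprints) buys no gain.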

How it works