SonicSieve
Your ear can tell a voice in front of you from one behind without turning your head — the folds of your pinna color each direction with its own pattern of spectral notches, and your brain reads the color. SonicSieve 3D-prints that trick: a hollow resin cylinder with a ring of surface holes and internal capillary tubes, clipped onto an earphone mic, stamps a direction-dependent fingerprint onto incoming sound — so a neural net can pull out the talker you’re facing with just two mics, beating a flat five-mic array. Change the diameter and number of holes below and watch Fig. 2 (per-angle coloring) and Fig. 6 (spatial diversity) redraw.
The microstructure — geometry
A larger diameter resonates at longer wavelengths, sliding the notch comb down into the 1–8 kHz speech band; this is the paper’s case for its 20 mm design over Owlet’s 15 mm one. Holes act as virtual sound sources: too few and whole directions get attenuated; too many and the cylinder opens up, washing the coloring out. SonicSieve’s sweet spot is 20 mm with 6 holes. The focus region is the angular wedge the system keeps; the paper reports a 5.0 dB gain over a 30° (±15°) wedge.
Top-down scene — drag the talkers
Figure 2 — effect of the microstructure on coloring
Redrawn from the paper’s Fig. 2. Left (bare mic): all four angles overlap — no spatial cue. Right (with microstructure): each angle leaves a distinct pattern of peaks and notches a model can learn. Grow the diameter and the structure pushes deeper into the speech band; the right panel reverts to flat lines whenever you choose Bare mic.
Directional fingerprint — magnitude vs frequency
Each direction lands its notches at different frequencies. The net knows your target sits at θ_t, so it favors bands where the target’s coloring is strong and the interferer’s is weak (the gold mask). When the two fingerprints overlap — talkers close together, or the microstructure off — there’s nothing to grab and the mask flattens to ½.
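The masking idea in this paragraph can be sketched with a classical per-band ratio mask. This is a stand-in for illustration only, not the paper's neural network: the function name `ratio_mask` and the band magnitudes below are assumptions made up for the example.

```python
def ratio_mask(target_mag, interferer_mag, eps=1e-12):
    """Per-band soft mask: target energy divided by total energy.

    Values near 1 mean the band is dominated by the target's coloring,
    values near 0 mean the interferer dominates, and 0.5 means the two
    fingerprints are indistinguishable in that band.
    """
    mask = []
    for t, i in zip(target_mag, interferer_mag):
        t2, i2 = t * t, i * i
        total = t2 + i2
        # With no energy from either talker there is no evidence: use 0.5.
        mask.append(0.5 if total < eps else t2 / total)
    return mask


# Hypothetical fingerprint magnitudes in four frequency bands.
target = [1.0, 0.2, 0.9, 0.5]
interferer = [0.2, 1.0, 0.9, 0.5]
mask = ratio_mask(target, interferer)
```

In the first band the target is strong and the interferer weak, so the mask is close to 1; in the second it is the other way round; in the last two the fingerprints coincide and the mask sits at exactly ½, which is the flattening described above.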
Figure 6 — spatial diversity across the design
Redrawn from the paper’s Fig. 6: each pixel is the spectral distance between two arrival angles, so the dark diagonal is “an angle vs. itself.” More yellow off the diagonal = more separable directions. Sweep the diameter up and the map lights up; crowd in too many holes and it fades back toward blue. Compare 10 mm/10 holes against 20 mm/6 holes — the SonicSieve design the authors landed on.
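A map of this kind can be sketched as a pairwise distance matrix over per-angle spectra. The fingerprint model here is a hypothetical two-path toy (direct sound plus one delayed copy whose extra path depends on angle and is capped at one diameter), not the paper's measured or simulated responses, and it does not model the number of holes. The names `toy_fingerprint`, `spectral_distance`, and `diversity_map` are assumptions for the example.

```python
import math


def toy_fingerprint(angle_deg, freqs_hz, diameter_m=0.020, c=343.0):
    """Hypothetical magnitude response for one arrival angle.

    Two equal-amplitude copies with delay tau give
    |1 + exp(-j*2*pi*f*tau)| = 2*|cos(pi*f*tau)|.
    The extra path shrinks from one diameter at 0 degrees to zero at 180.
    """
    extra_path = diameter_m * (1 + math.cos(math.radians(angle_deg))) / 2
    delay = extra_path / c
    return [2 * abs(math.cos(math.pi * f * delay)) for f in freqs_hz]


def spectral_distance(a, b):
    """Root-mean-square difference between two magnitude spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def diversity_map(angles_deg, freqs_hz, fingerprint=toy_fingerprint):
    """Matrix whose entry [i][j] is the distance between angles i and j."""
    prints = [fingerprint(a, freqs_hz) for a in angles_deg]
    return [[spectral_distance(p, q) for q in prints] for p in prints]


angles = [0, 45, 90, 135, 180]
freqs = [1000 + 250 * k for k in range(29)]  # 1 kHz to 8 kHz
dmap = diversity_map(angles, freqs)
```

The diagonal of `dmap` is zero (an angle against itself) and the matrix is symmetric. Within this toy model, shrinking `diameter_m` to 10 mm reduces the off-diagonal distances in the speech band, which matches the direction of the trend described above; the washing-out effect of too many holes is outside what the sketch represents.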
Result
How it works
- A microphone array localizes by time-of-arrival differences between elements — which needs many mics spread wide. SonicSieve instead borrows the pinna’s trick: a passive shape that turns direction into a spectral pattern on a single channel, so the cue lives in the color of the sound, not the spacing of sensors.
- The microstructure is a clip-on 3D-printed resin cylinder on the in-line mic of cheap wired earphones: no electronics, no power. A ring of surface holes acts as virtual sound sources; sound from each angle θ couples into the holes facing it, travels capillary tubes of differing length to the mic, and the delayed copies interfere into angle-dependent notches.
- Diameter sets the band. The cylinder resonates at wavelengths tied to its size, so a bigger diameter slides the notch comb down into the 1–8 kHz speech range. Owlet’s 15 mm design only differs above 7 kHz; SonicSieve’s 20 mm spreads the cues right across speech. Watch the Fig. 6 map brighten as you widen it.
- Holes have a sweet spot. Too few and whole swaths of directions never couple in (they get attenuated and look alike); too many and the wall between holes vanishes — the cylinder becomes acoustically open and the coloring washes out. The authors tuned a ten-hole design down to 6 strategically placed holes.
- Because each angle prints a distinct fingerprint, an on-device neural net extracts a target by direction: the paper reports a 5.0 dB gain over a 30° region, with two mics outperforming a flat five-mic array. But separation lives or dies on fingerprint distance: two talkers close in angle share nearly the same notches and the gain collapses; pull them apart and the target pops out.
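The step where delayed copies interfere into notches follows from standard two-path comb filtering, sketched below. Two equal-amplitude copies whose paths differ by ΔL cancel at f_k = (2k + 1)·c / (2·ΔL). The path differences used here (30 mm and 60 mm) are illustrative assumptions, not the paper's tube lengths, and the real structure sums more than two paths.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature


def notch_frequencies(path_diff_m, f_max_hz, c=SPEED_OF_SOUND):
    """Notch frequencies up to f_max_hz for two equal-amplitude copies.

    Cancellation occurs where the delay equals an odd number of half
    periods: f_k = (2k + 1) * c / (2 * path_diff_m).
    """
    if path_diff_m <= 0:
        return []  # no delay, no comb
    first = c / (2.0 * path_diff_m)
    notches = []
    k = 0
    while (2 * k + 1) * first <= f_max_hz:
        notches.append((2 * k + 1) * first)
        k += 1
    return notches


def two_path_magnitude(freq_hz, path_diff_m, c=SPEED_OF_SOUND):
    """|1 + exp(-j*2*pi*f*tau)| = 2*|cos(pi*f*tau)|, tau = path_diff / c."""
    tau = path_diff_m / c
    return 2.0 * abs(math.cos(math.pi * freq_hz * tau))


# Hypothetical path differences; search for notches below 8 kHz.
short_path = notch_frequencies(0.03, 8000.0)
long_path = notch_frequencies(0.06, 8000.0)
```

Doubling the path difference halves the first notch frequency (about 5.7 kHz for 30 mm, about 2.9 kHz for 60 mm), which is the same direction as the claim that a larger structure pushes the cues down into the speech band. Because the path difference depends on which holes face the source, each arrival angle ends up with its own notch positions.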