Signals · 2026-W18 — Fullduplex

What happened this week

Four preprints worth forwarding, all evaluation-leaning. The Hugging Face / GitHub / lab-blog buckets did not surface a primary-sourced release in scope for this window, so the issue is paper-only.

The headline — full-duplex evaluation

HumDial-FDBench is the comprehensive write-up of the ICASSP 2026 HumDial Challenge full-duplex track. The headline contribution is a dual-channel dataset of real human-recorded conversations — capturing interruptions, overlap, and feedback mechanisms — and a public leaderboard that compares open-source and proprietary systems on interruption handling and conversational flow. This is the most concrete shared eval artifact full-duplex has had since FD-Bench v3, and it slots in as a peer to the existing HumDial track Fullduplex already tracks under /benchmarks.

Paralinguistic and timing control

Two papers push the controllability frontier rather than capability:

SpeechParaling-Bench expands paralinguistic feature coverage from fewer than 50 to over 100 fine-grained features, with 1,000+ English-Chinese parallel queries across three tasks (fine-grained control, intra-utterance variation, context-aware adaptation). The pairwise-comparison evaluation pipeline is the methodologically interesting bit. The headline empirical finding is that paralinguistic misinterpretation accounts for 43.3 percent of errors in situational dialogue even on leading proprietary models.
MAGIC-TTS is presented as the first TTS system with explicit token-level local timing control over both content duration and pause. The training mechanisms — high-confidence duration supervision plus zero-value bias correction — are the parts that read as transferable. The scenario-based editing benchmark covers navigation guidance, guided reading, and accessibility-oriented code reading.

Fairness and robustness in ASR

Do LLM Decoders Listen Fairly? ships a 216-run stress test of nine ASR models across three architectural generations (CTC-only, encoder-decoder, and explicit-LLM decoder) on Common Voice 24 plus Meta's Fair-Speech. The two findings worth reading are that LLM decoders do not amplify racial bias — Granite-8B has the best ethnicity fairness in the sweep — and that audio compression, not LLM scale, is the dominant predictor of accent fairness. Whisper enters catastrophic repetition loops under chunk masking, while explicit-LLM decoders produce ~38x fewer insertions.

What is not here

No Hugging Face / GitHub / lab-blog signal landed inside the window with a primary source we can cite. The Mistral, Hume, and Microsoft speech releases that sometimes get cited under "this week" are all earlier in March or early April. If something open-weights ships before next Sunday, it will move into 2026-W19.

Corrections to hello@fullduplex.ai.

Signals · 2026-W18.

What happened this week

The headline — full-duplex evaluation

Paralinguistic and timing control

Fairness and robustness in ASR

What is not here