1) Speech & ASR/TTS data (with deep Sukuma capability)
We design and run full-stack speech data programs, from prompt design through public release, tailored to under-resourced languages.
- Sukuma voice leadership: recruitment within Sukuma-speaking regions, accent coverage across Mwanza, Shinyanga, Simiyu, and Tabora, phonetically balanced prompts, and culturally relevant content.
- Production standards: 16-bit PCM WAV at ≥48 kHz, SNR ≥40 dB, tight text-to-audio alignment, and multi-stage QA.
- TTS readiness: ≥50 hours of recorded speech per language that meet multi-speaker criteria.
- Scale: plans to reach ≥500 speakers and ≥500 hours of read speech, as required.
- Open licensing and FAIR data: release under CC BY 4.0 (or an equivalent license) with comprehensive metadata.
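The format and SNR thresholds in the production standards can be checked automatically at ingest. The following is a minimal Python sketch using only the standard library, not a description of the actual QA pipeline. It assumes each prompt recording begins with a short lead-in of room tone (the 0.25 s length is a hypothetical choice) and estimates SNR crudely as the ratio of mean power after that lead-in to mean power within it. Python's `wave` module opens only uncompressed PCM WAV, so other encodings raise `wave.Error` before these checks run.

```python
import array
import io
import math
import sys
import wave

# Thresholds from the production standards.
REQUIRED_SAMPLE_WIDTH_BYTES = 2   # 16-bit PCM
MIN_SAMPLE_RATE_HZ = 48_000       # >= 48 kHz
MIN_SNR_DB = 40.0                 # SNR >= 40 dB

# Assumption (hypothetical convention): every recording opens with this
# much room tone, which serves as the noise-floor estimate.
NOISE_LEAD_IN_S = 0.25


def mean_power(samples):
    """Mean squared amplitude of a sequence of integer samples."""
    return sum(s * s for s in samples) / len(samples) if len(samples) else 0.0


def estimate_snr_db(samples, rate):
    """Crude SNR: lead-in treated as noise, the remainder as speech."""
    split = int(NOISE_LEAD_IN_S * rate)
    noise = mean_power(samples[:split])
    speech = mean_power(samples[split:])
    if noise == 0.0:
        return math.inf          # digital silence in the lead-in
    if speech == 0.0:
        return -math.inf         # nothing recorded after the lead-in
    return 10.0 * math.log10(speech / noise)


def check_recording(source):
    """Return a list of problems for a WAV path or file object; [] means it passes."""
    problems = []
    with wave.open(source, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    if width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width {8 * width} bits, expected 16")
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz, expected >= {MIN_SAMPLE_RATE_HZ}")
    if width == REQUIRED_SAMPLE_WIDTH_BYTES:
        samples = array.array("h", raw)
        if sys.byteorder == "big":
            samples.byteswap()   # WAV sample data is little-endian
        first_channel = samples[::channels]   # keep the estimate simple
        snr = estimate_snr_db(first_channel, rate)
        if snr < MIN_SNR_DB:
            problems.append(f"estimated SNR {snr:.1f} dB, expected >= {MIN_SNR_DB}")
    return problems


def make_test_wav(rate, noise_amp=3, tone_amp=10_000):
    """In-memory mono 16-bit WAV: a low-level alternating signal standing in
    for room tone during the lead-in, then 1 s of a 440 Hz tone."""
    n_noise = int(NOISE_LEAD_IN_S * rate)
    noise = [noise_amp if i % 2 else -noise_amp for i in range(n_noise)]
    tone = [int(tone_amp * math.sin(2 * math.pi * 440 * i / rate)) for i in range(rate)]
    data = array.array("h", noise + tone)
    if sys.byteorder == "big":
        data.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())
    buf.seek(0)
    return buf
```

For example, `check_recording(make_test_wav(44_100))` reports only the sample-rate problem. A production pipeline would more likely estimate SNR with voice-activity detection than with a fixed lead-in; the fixed lead-in simply keeps the sketch dependency-free.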
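The TTS-readiness and scale targets, together with the regional accent coverage for Sukuma, reduce to tallies over a corpus manifest. The following is a minimal sketch, assuming a hypothetical manifest schema (`Utterance` with language, speaker, region, and duration fields) and applying the targets per language. The multi-speaker TTS criteria are project-specific, so they are not encoded here. The target regions are the four named above, keyed by the ISO 639-3 code `suk` for Sukuma.

```python
from collections import defaultdict
from dataclasses import dataclass

# Numeric targets from the capability list.
TTS_MIN_HOURS_PER_LANGUAGE = 50.0
SCALE_MIN_SPEAKERS = 500
SCALE_MIN_HOURS = 500.0

# Accent-coverage targets per language (ISO 639-3 code -> regions).
TARGET_REGIONS = {"suk": {"Mwanza", "Shinyanga", "Simiyu", "Tabora"}}


@dataclass(frozen=True)
class Utterance:
    """One row of a hypothetical corpus manifest."""
    language: str      # ISO 639-3 code, e.g. "suk"
    speaker_id: str
    region: str        # recording or speaker home region
    duration_s: float


def readiness(manifest):
    """Summarize hours, speakers, and region coverage per language."""
    seconds = defaultdict(float)
    speakers = defaultdict(set)
    regions = defaultdict(set)
    for u in manifest:
        seconds[u.language] += u.duration_s
        speakers[u.language].add(u.speaker_id)
        regions[u.language].add(u.region)

    report = {}
    for lang in seconds:
        hours = seconds[lang] / 3600.0
        n_speakers = len(speakers[lang])
        report[lang] = {
            "hours": round(hours, 2),
            "speakers": n_speakers,
            # Hours only; multi-speaker criteria must be checked separately.
            "tts_ready_by_hours": hours >= TTS_MIN_HOURS_PER_LANGUAGE,
            "meets_scale_target": n_speakers >= SCALE_MIN_SPEAKERS
            and hours >= SCALE_MIN_HOURS,
            "regions_missing": sorted(TARGET_REGIONS.get(lang, set()) - regions[lang]),
        }
    return report
```

Keeping the tally separate from the audio checks means the same manifest can also drive the metadata export for an open release.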