“The fastest LLM-based TTS system available, with competitive voice quality, virtually zero content hallucinations, and a footprint light enough for on-device deployment.”
The trick here is dead simple and obvious in hindsight. Instead of compressing audio into dozens of tokens per text token like everyone else, TADA aligns one acoustic vector per text token. That kills content hallucination by design: if every text token owns exactly one acoustic vector, the model has no room to drop, repeat, or invent text units. No more TTS systems skipping words or producing sounds that were never in the script. Hume AI gets credit for open-sourcing the 1B and 3B models on Hugging Face, but the real question is whether “open-source” here means actually open or just open-weights-with-strings-attached.
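To make the structural argument concrete, here is a minimal sketch (my own illustration, not Hume's code; function names are hypothetical) of the difference between a one-to-one alignment and a codec-token stream. In the one-to-one case a length mismatch is a hard error, so skipped or invented content is impossible by construction; in the codec-token case the audio length is decoupled from the text, and nothing ties the two back together.

```python
from typing import List, Tuple

def align_one_to_one(
    text_tokens: List[str], acoustic_vecs: List[List[float]]
) -> List[Tuple[str, List[float]]]:
    # One acoustic vector per text token: a mismatch raises instead of
    # silently producing audio that diverges from the script.
    if len(text_tokens) != len(acoustic_vecs):
        raise ValueError(
            f"alignment broken: {len(text_tokens)} text tokens vs "
            f"{len(acoustic_vecs)} acoustic vectors"
        )
    return list(zip(text_tokens, acoustic_vecs))

def codec_style_emit(text_tokens: List[str], tokens_per_word: int = 40) -> List[int]:
    # Hypothetical stand-in for a neural-codec TTS: emits ~dozens of audio
    # tokens per text token, so the output length carries no structural
    # guarantee that every word was actually rendered.
    return [0] * (len(text_tokens) * tokens_per_word)
```

The point is not the trivial code but the invariant: in the first function the text/audio correspondence is checked at the type-and-length level, while in the second any dropped word just shortens an already variable-length stream and nothing complains.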