The Future of Voice AI is Here: Real-Time Cloning, On-Device & Live Translation (Gradium CEO)
Summary
Current voice AI is too slow and expensive for interactive applications like gaming and robotics. Enter Gradium, a commercial spin-off from the Kyutai AI lab. In this demo, Gradium CEO Neil Zeghidour showcases the company's real-time voice infrastructure. Watch its killer features in action: a high-fidelity text-to-speech model running entirely on a CPU, interactive voice agents that maintain natural conversation flow, and real-time speech translation with voice cloning. The team even demonstrates restoring the voice of ALS patient Olivier Goy.
00:31 - The backstory: A commercial spin-off from the Kyutai AI lab.
01:16 - The shift from offline to interactive voice in gaming and live streams.
03:12 - Live demo: AI-generated personalized esports commentary.
04:33 - Restoring the voice of ALS patient Olivier Goy.
05:16 - Creating real-time personalized videos.
06:41 - Running a 100M-parameter text-to-speech model locally on a CPU.
08:41 - Building interactive voice agents that use function calling.
11:47 - Hibiki: Real-time, on-device speech-to-speech translation.
Gradium
Website - @
X/Twitter - @AI
HOSTED BY:
FirstMark Capital
Website - @
X/Twitter - @rkCap
Matt Turck (Managing Director)
Blog - @
LinkedIn - @k/
X/Twitter - @ck
This session was recorded live at a recent Data Driven NYC, our in-person, monthly event series. If you are ever in New York, you can join the upcoming events by following FirstMark on Luma: @rkcap
Check out the MAD Podcast:
Spotify - @LATDSaFvgJG80ACcRJtq
Apple - https://podcasts.apple.com/us/podcast/the-mad-podcast-with-matt-turck/id1686238724
