Audio and voice-powered AI is going beyond flashy demos on the CES show floor. On Tuesday, Deepgram, a voice AI API platform, announced a $130 million Series C funding round.
The round, which included investors such as AVP, Citi Ventures, Columbia University, SAP and Alumni Ventures, brings the company’s total funding to more than $215 million and raises its valuation to $1.3 billion. Alongside the funding, Deepgram announced that it acquired OfOne, an AI voice platform built specifically for restaurants, drive-throughs and quick-service establishments.
Deepgram offers companies the building blocks to develop voice and audio AI, including text-to-speech, speech-to-text, conversational speech recognition and voice-powered agents.
“What we see is that the world is moving to voice as the interface between humans and technology,” Scott Stephenson, CEO and Co-Founder of Deepgram, told The Deep View. “It's an emerging trillion-dollar market that is the voice AI economy.”
Voice and audio AI could remove the friction of interfacing with AI via screens, especially once these models become more lifelike, said Stephenson.
- One of Deepgram’s goals for the upcoming year is to pass the Audio Turing Test, which assesses how realistic and human-like AI-generated audio sounds.
- “Can you have a five-minute conversation with the machine and not know it?” said Stephenson. “In a year, that will be passed, and then within two years, it will be scaled.”
The use cases for human-like voice AI capabilities are wide-ranging, from call centers to drive-through ordering to consumer apps. “If you get a call from a dentist rescheduling your appointment, you don't really care if it's AI. They would just like to have a pleasant conversation and get it rescheduled,” said Stephenson.
Deepgram is among several firms that regard voice as the next frontier, with many searching for the next big thing beyond chatbots. OpenAI, for example, has begun ramping up its AI audio efforts as it prepares to debut its first consumer device. But getting people to adopt this form factor might involve tearing them away from their screens, Stephenson noted.
“It's the next frontier, but it was also the first frontier,” Stephenson said. “Computers couldn't converse with humans before, so we learned how to deal with that … but [voice] is the natural interface. The technical capability just wasn't there before. Now it is.”




