Audio and voice technologies have burst into 2026 as one of the hottest trends in AI. And new research from audio tech startup Modulate could transform AI far beyond just audio models.
On Tuesday, Boston-based Modulate announced a research breakthrough, the Ensemble Listening Model (ELM), a novel approach to AI that it says can outperform LLMs on both cost and accuracy.

In an interview with The Deep View, Mike Pappas, CEO of Modulate, said, “We think we’ve actually cracked the code on how to do what I’m going to call heterogeneous ensembles… which we think is extremely relevant to the broader AI development community.”
The breakthrough:
- Combining hundreds of different small, specialized models (analyzing background noise, transcripts, emotions, cultural cues, synthetic voices, etc.)
- Bringing them together through Modulate's new heterogeneous ensemble architecture
- Using the company's dynamic real-time orchestration method to weave the signals together and produce a clear, accurate interpretation of what's happening in the conversation
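Modulate hasn't published implementation details in the announcement, but the idea behind a heterogeneous ensemble with real-time orchestration can be sketched roughly as a weighted blend of many small specialist signals. Everything below — the listener names, weights, and scoring functions — is invented purely for illustration and is not Modulate's actual code or method:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative sketch only. Each "listener" stands in for a small,
# specialized model scoring one aspect of a conversation (noise,
# transcript toxicity, emotion, etc.). All names/weights are hypothetical.

@dataclass
class Listener:
    name: str
    score: Callable[[dict], float]  # maps conversation features to a 0..1 signal
    weight: float                   # orchestration weight, adjustable in real time

def orchestrate(listeners: list[Listener], features: dict) -> float:
    """Blend the specialists' signals into one overall interpretation."""
    total_weight = sum(l.weight for l in listeners)
    return sum(l.weight * l.score(features) for l in listeners) / total_weight

listeners = [
    Listener("background_noise", lambda f: f.get("noise", 0.0), weight=0.5),
    Listener("transcript_toxicity", lambda f: f.get("toxicity", 0.0), weight=2.0),
    Listener("emotion_aggression", lambda f: f.get("aggression", 0.0), weight=1.5),
]

risk = orchestrate(listeners, {"noise": 0.1, "toxicity": 0.8, "aggression": 0.6})
print(f"blended risk score: {risk:.4f}")
```

In a real system the orchestrator would presumably re-weight listeners dynamically as the conversation evolves, rather than using the fixed weights shown here.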
Modulate cut its teeth performing voice analysis on online gaming platforms such as Call of Duty and Grand Theft Auto Online, where it helped the platforms analyze voice conversations to identify hate speech and other policy violations.
Also released on Tuesday was Velma 2.0, Modulate's enterprise platform that competes with conversational AI offerings from OpenAI, Google (Gemini), Microsoft, ElevenLabs, and others. Examples of enterprise uses of the Modulate platform include:
- Fraud detection — A food delivery company uses the software to flag emotionally manipulative callers trying to scam drivers and get free meals.
- Call center burnout and retention — Modulate’s aggression/emotion scores enable customer support to automatically grant short breaks to associates after difficult calls, boosting well-being and retention.
- Protecting at-risk users — Modulate can read vocal cues like age, distress, and confusion, then route at-risk users, such as children or seniors, to human agents, helping companies meet regulatory guidelines and minimize harmful AI interactions.
- AI agent oversight — Vendors can use Modulate to put guardrails on their AI voice agents, sensing user emotions and ensuring the agents respond appropriately rather than going off-policy.
Modulate also released a research paper so other AI model builders can learn from the heterogeneous ensemble architecture and dynamic real-time orchestration techniques behind its breakthrough.
