Breakthrough: New class of AI tops LLMs

By Jason Hiner

Jan 20, 2026, 12:30pm UTC

Audio and voice technologies have burst onto the scene in 2026 as one of the hottest new trends in AI. And new research from audio tech startup Modulate could transform AI far beyond audio models.

On Tuesday, Boston-based Modulate announced a research breakthrough, the Ensemble Listening Model (ELM), a novel approach to AI that it says can outperform LLMs on both cost and accuracy.

In an interview with The Deep View, Mike Pappas, CEO of Modulate, said, “We think we’ve actually cracked the code on how to do what I’m going to call heterogeneous ensembles… which we think is extremely relevant to the broader AI development community.”

The breakthrough:

  • Combining hundreds of different small, specialized models (analyzing background noise, transcripts, emotions, cultural cues, synthetic voices, etc.)
  • Bringing them together through Modulate's new heterogeneous ensemble architecture
  • Using the company's dynamic real-time orchestration method to weave the signals together and produce a clear, accurate interpretation of what's happening in the conversation
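Modulate has not published implementation details beyond its paper, but the orchestration idea described above can be sketched in a few lines. Everything below — the `Signal` type, the specialist stubs, and the confidence-weighted fusion — is hypothetical and illustrative, not Modulate's actual code:

```python
from dataclasses import dataclass

@dataclass
class Signal:
    label: str         # what the specialist detected (e.g., "aggression")
    score: float       # strength of the detection, 0..1
    confidence: float  # how much this specialist trusts itself on this input

# Hypothetical specialist models: stand-ins for the hundreds of small models
# described (background noise, transcripts, emotions, synthetic voices, ...).
def noise_model(audio):           return Signal("background_noise", 0.2, 0.9)
def emotion_model(audio):         return Signal("aggression", 0.8, 0.7)
def synthetic_voice_model(audio): return Signal("synthetic_voice", 0.1, 0.6)

SPECIALISTS = [noise_model, emotion_model, synthetic_voice_model]

def orchestrate(audio):
    """Fuse heterogeneous specialist signals into one interpretation.

    Each specialist votes on its own label; votes are weighted by the
    specialist's self-reported confidence, then normalized per label.
    """
    signals = [model(audio) for model in SPECIALISTS]
    fused = {}
    for s in signals:
        prev_score, prev_weight = fused.get(s.label, (0.0, 0.0))
        fused[s.label] = (prev_score + s.confidence * s.score,
                          prev_weight + s.confidence)
    # Normalize each label's accumulated score by the weight behind it.
    return {label: total / weight for label, (total, weight) in fused.items()}
```

Calling `orchestrate(b"raw-audio-bytes")` returns one dictionary of fused scores, which is the appeal of the approach: each small model stays cheap and narrowly focused, while the orchestrator produces a single interpretation of the conversation.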

Modulate cut its teeth performing voice analysis on online gaming platforms such as Call of Duty and Grand Theft Auto Online, where it helped those platforms analyze voice conversations to identify hate speech and other policy violations.

Also released on Tuesday was Velma 2.0, Modulate's enterprise platform that competes with conversational AI offerings from OpenAI, Google, Microsoft, ElevenLabs, and others. Examples of enterprise uses of the Modulate platform include:

  • Fraud detection — A food delivery company uses the software to flag emotionally manipulative callers trying to scam drivers and get free meals.
  • Call center burnout and retention — Modulate’s aggression/emotion scores enable customer support to automatically grant short breaks to associates after difficult calls, boosting well-being and retention.
  • Protecting at-risk users — Modulate can read vocal cues like age, distress, and confusion, then route at-risk users, such as children or seniors, to human agents, helping companies meet regulatory guidelines and minimize harmful AI interactions.
  • AI agent oversight — Vendors can put guardrails on their AI voice agents with Modulate to prevent bots from going off-policy by sensing user emotions and ensuring agents respond appropriately.

Modulate also released a research paper so that other AI model builders can learn from the techniques behind its heterogeneous ensemble architecture and dynamic real-time orchestration.

Our Deeper View

What Modulate is trying to solve is incredibly challenging. In autonomous vehicles, the analogous problem is called "sensor fusion": an AI must take signals from multiple sensors on the vehicle (cameras, radar, lidar, etc.) and combine them, often from conflicting information, to make life-or-death decisions in real time. Modulate's claim that combining smaller, specialized models can deliver cheaper and more accurate results than LLMs points to another emerging 2026 trend: small language models (SLMs) and domain-specific models running circles around the big LLMs from frontier labs.
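The sensor-fusion analogy can be made concrete with the textbook inverse-variance weighted average, a standard way to combine two noisy estimates of the same quantity. This is a generic technique for illustration, not Modulate's method:

```python
def fuse(est_a, var_a, est_b, var_b):
    """Inverse-variance fusion of two noisy estimates of one quantity.

    Each estimate is weighted by 1/variance, so the more trusted
    (lower-variance) sensor dominates; the fused variance is always
    smaller than either input variance.
    """
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused_est = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused_est, fused_var

# Hypothetical readings: radar says an obstacle is 10.0 m away (noisy,
# variance 1.0); lidar says 10.4 m (precise, variance 0.25).
est, var = fuse(10.0, 1.0, 10.4, 0.25)
# est = 10.32 (pulled toward the precise lidar), var = 0.2
```

The same principle — weight each signal by how much you trust it, then combine — is what any heterogeneous ensemble, whether of vehicle sensors or small audio models, has to do in real time.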