AI assistants are going voice-first, and Mistral AI just launched new models to compete.
On Wednesday, the French AI startup released Voxtral Transcribe 2, a next-generation family of speech-to-text models. According to the company, the models deliver state-of-the-art transcription quality, speaker diarization, and timestamps while maintaining ultra-low latency. They are also small enough to run on-device, which helps with both privacy and cost.
“Voxtral Transcribe 2 proves that state-of-the-art transcription can run locally, without compromising accuracy or speed. For businesses and users who demand privacy and control, this changes everything,” Pierre Stock, VP Science at Mistral AI, told The Deep View.
The launch includes:
- Voxtral Realtime - A 4-billion-parameter model built for live transcription, achieving “state of the art” transcription with 480ms latency across 13 languages. Its latency can be configured down to under 200ms.
- Voxtral Mini Transcribe V2 - Offers high-quality transcription at a lower cost, with Mistral claiming it achieves “the lowest word error rate, at the lowest price point.”
- An audio playground in Mistral Studio where users can test the transcription capabilities of Voxtral Transcribe 2.
On the FLEURS benchmark, Voxtral Mini Transcribe V2 performs competitively against models from Google (Gemini) and OpenAI, and Mistral reports that it has the lowest diarization error rate of the models compared.

The models can adapt to speaker accents and domain-specific jargon across languages, helping make content accessible to a broader audience. Real-world enterprise uses include AI-powered customer service and multilingual subtitling. Because the models can run on-device, they are well suited to industries that handle sensitive data, such as healthcare and finance. In keeping with Mistral's open-source approach, the company has released the model weights under the Apache 2.0 license.
“Open-weight models like Voxtral Realtime aren’t just about transparency - they’re about acceleration. By putting this technology in the hands of developers worldwide, we’re not just releasing a tool; we’re unlocking a wave of innovation where low latency is critical,” added Stock.

