ElevenLabs, a Palo Alto-based AI startup valued at $3.3 billion, has introduced its first stand-alone speech-to-text model, Scribe. The company, which is better known for its audio-generation capabilities, now aims to disrupt the speech recognition market by providing a faster and more accurate alternative to existing offerings such as OpenAI's Whisper and Deepgram. Scribe supports over 99 languages, with top accuracy in more than 25, including English, French, and Spanish.
The new AI model has already outperformed competitors like Google's Gemini 2.0 Flash and OpenAI's Whisper Large V3 in benchmark tests. It also includes features like speaker diarisation, accurate subtitles, and sound event tagging, which could appeal to customers in media and content creation. While Scribe currently only works with pre-recorded audio, ElevenLabs plans to release a real-time version soon.
Priced at $0.40 per hour of transcribed audio, Scribe offers a competitive rate, though some rivals currently offer lower prices. With this move into speech-to-text, ElevenLabs is positioning itself to expand its AI offerings and challenge established players in the field.
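To put the quoted rate in concrete terms, the short Python sketch below estimates what a given amount of audio would cost at $0.40 per hour. It assumes a flat, linearly prorated rate with no minimum charge, volume tiers, or billing-increment rounding, none of which the announcement details; the function name `transcription_cost` is hypothetical and not part of any ElevenLabs API.

```python
# Rate quoted in the article; everything else here is an illustrative assumption.
SCRIBE_USD_PER_HOUR = 0.40


def transcription_cost(audio_seconds: float,
                       usd_per_hour: float = SCRIBE_USD_PER_HOUR) -> float:
    """Estimate transcription cost in USD, assuming flat linear proration."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = audio_seconds / 3600
    # Round to whole cents for display purposes.
    return round(hours * usd_per_hour, 2)


if __name__ == "__main__":
    # A 90-minute recording and a 250-hour archive.
    print(transcription_cost(90 * 60))
    print(transcription_cost(250 * 3600))
```

Under these assumptions, a 90-minute recording works out to $0.60 and a 250-hour archive to $100.00; actual invoices could differ depending on how usage is metered.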