VexaScribe Features
VexaScribe — AI transcription in 99 languages. Speaker detection, timestamps, AI summaries, and built-in translation (133 languages). Upload files or send a meeting bot to Zoom, Meet, or Teams. From $2/month.
What VexaScribe is, in 80 words
VexaScribe is a web app that turns audio and video into searchable, timestamped, speaker-labeled transcripts using OpenAI Whisper. Drop a file (up to 5 GB) or send a bot to your Zoom, Google Meet, or Teams meeting. Get a transcript in any of 99 languages in ~5–10 minutes per hour of audio, plus an optional AI summary with action items and exports to TXT, DOCX, SRT, VTT, or JSON. 30 minutes free, then $2–$20/month. No credit card to start.
What VexaScribe doesn't do
Five things VexaScribe is genuinely not built for, with the tool we'd actually recommend in each case. If your use case is on this list, save yourself the trial signup.
No real-time live captioning
Transcripts are generated after upload, not as you speak. A 1-hour file takes 5–10 minutes to process — fine for meetings you watch back, wrong for live events.
Use instead: Otter Live, Google Meet's built-in captions, or Web Captioner for free browser-based live captions.
No public REST API
VexaScribe is a web app for humans, not a backend service. There's no developer API, no SDK, no webhook for programmatic uploads.
Use instead: OpenAI Whisper API ($0.006/min), Deepgram Nova-3 (~$0.0043/min), or AssemblyAI (~$0.012/min).
No video editing
You can export SRT/VTT subtitles to drop into your editor, but VexaScribe won't cut clips, remove filler words, or burn captions onto video.
Use instead: Descript or Vrew for transcript-based video editing; Premiere/Final Cut/DaVinci for traditional NLE workflows.
No custom vocabulary tuning
You can't upload a dictionary of brand names, drug names, or technical jargon to bias the model toward those terms. Whisper is used as-is, with no per-account fine-tuning.
Use instead: AssemblyAI's “word boost” or Deepgram's “keywords” param for proper-noun-heavy domains.
No on-premise / enterprise self-hosting
Audio is processed in our cloud. There's no air-gapped deployment, and no deployment covered by a signed HIPAA Business Associate Agreement (BAA). For attorney-client, clinical therapy, or classified content where a breach creates direct legal liability, no cloud tool (ours included) is the right call.
Use instead: install OpenAI Whisper locally (free, runs on your machine, audio never leaves it), or for legal-grade accuracy use human transcription (Rev, GoTranscript) at $1.25–$1.99/min.
Honest accuracy — what the numbers really mean
VexaScribe uses OpenAI Whisper (specifically large-v3 class models). Marketing pages love to say “99% accuracy” — that's not honest. Real-world Whisper accuracy depends heavily on audio quality, accent, and number of speakers. Here's what to expect.
Transcription accuracy (Whisper)
- Clean studio English, single speaker: ~92–97%
- Accented English (non-native, regional): ~85–92%
- Noisy environments (cafes, phone, outdoor): ~80–90%
- Clean Spanish, French, German, Italian, Portuguese, Dutch: ~88–94%
- Korean, Japanese, Indonesian, Turkish, Arabic, Polish: ~85–92%
Source: Open ASR Leaderboard + Whisper paper benchmarks (LibriSpeech, FLEURS, Common Voice).
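The benchmarks cited above report word error rate (WER): substitutions, deletions, and insertions divided by the number of words in a human reference transcript. Reading "accuracy" as 1 - WER is a common convention and is assumed here. The sketch below computes WER with a word-level edit distance. Real leaderboards also normalize punctuation and numbers before scoring; this version only lowercases.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference.

    Uses a word-level Levenshtein (edit-distance) table, keeping one row at a time.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edits needed to turn the first i reference words into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at 0 (WER can exceed 1 with many insertions)."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    print(f"WER {word_error_rate(ref, hyp):.3f}, accuracy {accuracy(ref, hyp):.1%}")
```

In the example, two of nine reference words are wrong, so WER is 2/9 and accuracy is about 78%. On a one-hour recording of roughly 9,000 words, even 95% accuracy still means around 450 words to fix.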
Speaker diarization accuracy
- 2 speakers, no overlap: 95%+
- 3–4 speakers, occasional overlap: ~88–94%
- 5–6 speakers, meeting dynamics: ~80–90%
- 7–15 speakers, panel or focus group: ~70–82%
- Up to 50 speakers (max supported): variable
Accuracy is best with 2–6 distinct speakers. You can rename Speaker 1/2/3 in the editor afterward.
What moves the needle
Three things that matter more than picking the “best” transcription tool:
- A decent mic (a USB headset or lapel mic beats a laptop's built-in mic by 5–15 accuracy points).
- One speaker at a time — overlap kills both transcription and diarization.
- Low background noise. Record in a closed room, not next to a fan or HVAC vent.
If you need legal-grade accuracy (court filings, regulated research), use human transcription services like Rev or GoTranscript at $1.25–$1.99/min. AI gets you to ~95% at 1–2% of the cost: fine for most use cases, wrong for some.
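To make the cost gap concrete, here is a small sketch that prices one hour of audio using the per-minute rates quoted on this page: $1.25–$1.99/min for human transcription, and the Whisper API, Deepgram Nova-3, and AssemblyAI prices listed under "No public REST API". The rates are copied from this page rather than fetched from the vendors, and the sketch assumes flat per-minute billing with no minimums or volume discounts.

```python
# Per-minute prices (USD) as quoted on this page; check each vendor before relying on them.
AI_PRICES_PER_MIN = {
    "OpenAI Whisper API": 0.006,
    "Deepgram Nova-3": 0.0043,
    "AssemblyAI": 0.012,
}
HUMAN_PRICES_PER_MIN = (1.25, 1.99)  # Rev / GoTranscript range


def cost_for(minutes: float, price_per_min: float) -> float:
    """Cost in USD of transcribing `minutes` of audio at a flat per-minute price."""
    return minutes * price_per_min


def compare(minutes: float = 60) -> dict[str, tuple[float, float]]:
    """For each AI service: (cost in USD, cost as % of the cheapest human rate)."""
    human_low = cost_for(minutes, HUMAN_PRICES_PER_MIN[0])
    result = {}
    for name, price in AI_PRICES_PER_MIN.items():
        usd = cost_for(minutes, price)
        result[name] = (usd, 100 * usd / human_low)
    return result


if __name__ == "__main__":
    lo, hi = (cost_for(60, p) for p in HUMAN_PRICES_PER_MIN)
    print(f"Human, 1 hour: ${lo:.2f}-${hi:.2f}")
    for name, (usd, pct) in compare(60).items():
        print(f"{name}, 1 hour: ${usd:.2f} ({pct:.2f}% of the cheapest human rate)")
```

At those rates an hour costs $75.00–$119.40 with human transcription versus $0.26–$0.72 through the APIs, which is under 1% of the cheapest human rate.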
Core Features
99 Languages Supported
Transcribe audio and video in 99 languages with automatic language detection. From English to Japanese, Spanish to Arabic.
Speaker Detection
Automatic speaker diarization identifies and labels different voices. Perfect for interviews, podcasts, and meetings.
Timestamps
Every transcript includes precise timestamps. Click any timestamp to jump to that moment in your audio.
5 Export Formats
Export as TXT, DOCX, SRT, VTT, or JSON. Choose the format that fits your workflow.
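SRT and VTT carry the same information but differ in small syntax details: SRT numbers each cue and puts a comma before the milliseconds (00:01:02,500), while WebVTT starts with a WEBVTT header, uses a period (00:01:02.500), and can tag the speaker with a `<v>` voice span. The sketch below writes both from a list of timed segments. The `Segment` shape is a hypothetical illustration, not VexaScribe's actual JSON export schema.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical segment shape for illustration only.
    start: float  # seconds from the start of the recording
    end: float    # seconds
    speaker: str
    text: str


def fmt_time(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Numbered cues separated by blank lines; speaker name prefixed to the text."""
    blocks = [
        f"{i}\n{fmt_time(seg.start, ',')} --> {fmt_time(seg.end, ',')}\n"
        f"{seg.speaker}: {seg.text}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def to_vtt(segments: list[Segment]) -> str:
    """WEBVTT header, then cues; the speaker goes in a <v> voice span."""
    cues = [
        f"{fmt_time(seg.start, '.')} --> {fmt_time(seg.end, '.')}\n"
        f"<v {seg.speaker}>{seg.text}\n"
        for seg in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


if __name__ == "__main__":
    segs = [Segment(0.0, 2.5, "Speaker 1", "Hello."), Segment(2.5, 4.0, "Speaker 2", "Hi.")]
    print(to_srt(segs))
    print(to_vtt(segs))
```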
Fast Processing
AI-powered transcription completes in minutes, not hours. A 1-hour recording typically processes in 5–10 minutes.
Built-in Editor
Review and edit your transcripts directly in the browser. Fix errors, rename speakers, and perfect your transcript before exporting.
Meeting Bot
Send an AI bot to your Zoom, Google Meet, or Teams meetings. It records, transcribes, and generates structured summaries with action items and decisions. Bot-recorded minutes use 3× the normal transcription credits.
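A quick way to budget meeting-bot usage is sketched below. It assumes credits are counted per minute of audio and that the 3× multiplier applies linearly with no rounding; both are assumptions, and since plan allowances aren't listed here, the credit balance is a parameter.

```python
# Assumptions (not stated on this page): credits are counted per minute of
# audio, and the meeting bot's 3x multiplier applies linearly with no rounding.
BOT_MULTIPLIER = 3


def credits_used(minutes: float, via_meeting_bot: bool = False) -> float:
    """Credit minutes consumed by one recording."""
    return minutes * (BOT_MULTIPLIER if via_meeting_bot else 1)


def bot_minutes_covered(credit_balance: float) -> float:
    """Minutes of bot-recorded meetings that a credit balance would cover."""
    return credit_balance / BOT_MULTIPLIER


if __name__ == "__main__":
    print(credits_used(60, via_meeting_bot=True))  # a 60-minute call recorded by the bot
    print(credits_used(60))                        # the same recording uploaded as a file
    print(bot_minutes_covered(300))                # hypothetical balance of 300 credit minutes
```

Under those assumptions, a 60-minute meeting recorded by the bot uses 180 credit minutes, while uploading your own recording of the same meeting uses 60.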
AI Summaries
Turn any transcript into structured key points, action items, chapter markers, and decisions. Included on all paid plans.
Transcript Translation
Translate any transcript into 133 languages via Google Translate — no extra cost, no third-party account needed.
Bulk Upload — 50 Files at Once
Upload up to 50 audio or video files in one go. All processed in parallel — not one at a time. Mix formats freely and download everything as a ZIP.
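If you're preparing a large folder for bulk upload, a local pre-flight check can catch problems before you start. This is a hypothetical helper, not a VexaScribe feature. It applies the limits stated on this page (50 files per upload, 5 GB per file, and the audio and video extensions listed in the file-formats FAQ), treats 5 GB as 5 × 10⁹ bytes (an assumption), and splits the accepted files into batches of 50.

```python
from pathlib import Path

# Limits and extensions as stated on this page. This pre-flight check is a
# hypothetical local helper, not part of VexaScribe.
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".wma", ".opus"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".wmv", ".flv"}
SUPPORTED_EXTS = AUDIO_EXTS | VIDEO_EXTS
MAX_FILES_PER_UPLOAD = 50
MAX_FILE_BYTES = 5 * 10**9  # assumes 5 GB means decimal gigabytes


def plan_batches(files: list[tuple[str, int]]) -> tuple[list[list[str]], list[str]]:
    """Split (filename, size_in_bytes) pairs into upload batches of at most 50.

    Returns (batches, rejected). A file is rejected if its extension isn't
    supported or it exceeds the per-file size limit.
    """
    accepted, rejected = [], []
    for name, size in files:
        if Path(name).suffix.lower() in SUPPORTED_EXTS and size <= MAX_FILE_BYTES:
            accepted.append(name)
        else:
            rejected.append(name)
    batches = [
        accepted[i:i + MAX_FILES_PER_UPLOAD]
        for i in range(0, len(accepted), MAX_FILES_PER_UPLOAD)
    ]
    return batches, rejected


if __name__ == "__main__":
    demo = [(f"episode{n:03d}.mp3", 60_000_000) for n in range(120)]
    demo += [("notes.txt", 1_000), ("raw_4k.mov", 6 * 10**9)]
    batches, rejected = plan_batches(demo)
    print([len(b) for b in batches], rejected)
```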
Supported Formats
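Audio Formats
MP3, WAV, M4A, FLAC, OGG, AAC, WMA, OPUS
Video Formats
MP4, MOV, AVI, MKV, WebM, WMV, FLV
Export Formats (5)
TXT: Plain text
DOCX: Word document
SRT: Subtitles
VTT: Web subtitles
JSON: Structured data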
Use Cases
Meeting Transcription
AI bot joins Zoom, Meet, or Teams meetings
Podcast Transcription
Turn episodes into show notes and blog posts
Interview Transcription
Transcribe with speaker detection
Lecture Transcription
Convert class recordings to study notes
Video to Text
Extract transcripts and create subtitles
MP3 to Text
Convert audio files to text documents
Audio Transcription
General audio to text conversion
Powered by Advanced AI
VexaScribe uses OpenAI Whisper (large-v3 class) speech recognition models, trained on millions of hours of audio.
Accuracy for clear audio: ~92–97% (clean English)
Languages supported: 99
Processing time per hour: 5–10 minutes
Feature Availability by Plan
All plans include a free trial. No credit card required to start.
| Feature | Free Trial | Starter ($2/mo) | Pro ($10/mo) |
|---|---|---|---|
| Audio & video transcription | ✓ | ✓ | ✓ |
| 99 languages supported | ✓ | ✓ | ✓ |
| Speaker detection | ✓ | ✓ | ✓ |
| Timestamps | ✓ | ✓ | ✓ |
| Export: TXT, DOCX, SRT, VTT, JSON | ✓ | ✓ | ✓ |
| Transcript translation (133 languages) | ✓ | ✓ | ✓ |
| Built-in editor | ✓ | ✓ | ✓ |
| AI Summaries | — | ✓ | ✓ |
| Meeting Bot (Zoom, Meet, Teams) | — | ✓ | ✓ |
| Bulk transcription | ✓ | ✓ | ✓ |
Feature FAQ
What languages does VexaScribe support and how accurate are they?
99 languages, all powered by OpenAI Whisper. Accuracy depends on the language and audio quality: ~92–97% on clean English audio (per the Open ASR Leaderboard), ~88–94% on clean Spanish, French, German, Italian, Portuguese, and Dutch, and ~85–92% on Korean, Japanese, Indonesian, Turkish, Polish, and Arabic. Heavily accented speech, overlapping speakers, or noisy environments drop those numbers by 5–10 points. Language is detected automatically, or you can set it manually.
Does VexaScribe have a public REST API?
Not yet — VexaScribe is a web app, not an API product. If you need to integrate transcription into your own software, use the OpenAI Whisper API ($0.006/min), Deepgram (~$0.0043/min on Nova-3), or AssemblyAI (~$0.012/min). All three have proper SDKs and documentation. If you want a no-code workflow that uploads files and downloads transcripts, VexaScribe is the right fit — just not via API.
What file formats can VexaScribe transcribe?
Audio: MP3, WAV, M4A, FLAC, OGG, AAC, WMA, OPUS. Video: MP4, MOV, AVI, MKV, WebM, WMV, FLV. The audio track is extracted from video files automatically — no separate conversion step. Exports: TXT, DOCX, SRT, VTT, JSON.
How large a file can I upload?
Up to 5 GB per file, which is roughly 90 hours of MP3 at 128 kbps, or about 8 hours of 1080p video at a heavily compressed ~1.4 Mbps (higher-bitrate footage fits much less). That's well past the 25 MB cap on most free converter sites and the ~200 MB cap on Otter's free tier. For files larger than 5 GB, split the file in Audacity or ffmpeg and upload the chunks separately.
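The duration that fits in 5 GB is size × 8 ÷ bitrate, so it swings widely with bitrate: at 128 kbps it comes to roughly 87 hours (about 93 if 5 GB means 5 GiB), and 8 hours of video in 5 GB implies a stream of about 1.4 Mbps. The sketch below does that arithmetic and builds (but does not run) an ffmpeg command that splits an oversized file using ffmpeg's segment muxer with stream copy, so nothing is re-encoded. It assumes a roughly constant bitrate that you supply. Stream-copy cuts land on keyframes, so chunks can run slightly long; a 10% margin is left by default.

```python
import math
import shlex
from pathlib import Path


def hours_for_size(size_bytes: float, bitrate_kbps: float) -> float:
    """Approximate hours of media that fit in size_bytes at a constant bitrate."""
    return size_bytes * 8 / (bitrate_kbps * 1000) / 3600


def ffmpeg_split_command(src: str, bitrate_kbps: float,
                         limit_bytes: float = 5e9, margin: float = 0.9) -> list[str]:
    """Build an ffmpeg command that cuts src into chunks under limit_bytes.

    -c copy avoids re-encoding; -f segment writes numbered output files;
    -reset_timestamps 1 makes each chunk start at time zero.
    """
    seconds = math.floor(limit_bytes * margin * 8 / (bitrate_kbps * 1000))
    p = Path(src)
    # e.g. talk.mp4 -> talk_part000.mp4, talk_part001.mp4, ...
    pattern = str(p.with_name(f"{p.stem}_part%03d{p.suffix}"))
    return ["ffmpeg", "-i", src, "-c", "copy",
            "-f", "segment", "-segment_time", str(seconds),
            "-reset_timestamps", "1", pattern]


if __name__ == "__main__":
    print(f"5 GB at 128 kbps: {hours_for_size(5e9, 128):.1f} h")
    # Hypothetical 8 Mbps recording; paste the printed command into a shell.
    print(shlex.join(ffmpeg_split_command("talk.mp4", 8000)))
```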
Does VexaScribe train AI models on my audio?
No. Your audio and transcripts are not used to train any model — ours or OpenAI's (we use Whisper via API, which doesn't train on inputs). You can delete any file from your dashboard at any time, and both the audio and the transcript are removed. Audio is encrypted in transit (TLS) and at rest. For attorney-client, clinical, or classified content where a breach would create direct legal liability, install OpenAI Whisper locally instead — no cloud tool, including ours, is worth that risk.
What's included on the free trial?
30 minutes of transcription credit, all 99 languages, speaker detection, timestamps, every export format (TXT/DOCX/SRT/VTT/JSON), the built-in editor, bulk upload (up to 50 files), and transcript translation into 133 languages. No credit card required. AI summaries and the meeting bot (Zoom/Meet/Teams) are paid-only — those start at $2/month on the Starter plan.
Ready to Start Transcribing?
Try VexaScribe free with 30 minutes of transcription. No credit card required.