VexaScribe Features

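AI transcription in 99 languages. Speaker identification, timestamps, AI summaries, and built-in translation (133 languages). Upload a file or send a meeting bot to Zoom, Meet, or Teams. From $2/month.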

What VexaScribe is, in 80 words

VexaScribe is a web app that turns audio and video into searchable, timestamped, speaker-labeled transcripts using OpenAI Whisper. Drop a file (up to 5 GB) or send a bot to your Zoom, Google Meet, or Teams meeting. Get a transcript in 99 languages in ~5–10 minutes per hour of audio, optional AI summary with action items, and exports to TXT, DOCX, SRT, VTT, or JSON. 30 minutes free, then $2–$20/month. No credit card to start.

What VexaScribe doesn't do

Five things VexaScribe is genuinely not built for, with the tool we'd actually recommend in each case. If your use case is on this list, save yourself the trial signup.

No real-time live captioning

Transcripts are generated after upload, not as you speak. A 1-hour file takes 5–10 minutes to process — fine for meetings you watch back, wrong for live events.

Use instead: Otter Live, Google Meet's built-in captions, or Web Captioner for free browser-based live captions.

No public REST API

VexaScribe is a web app for humans, not a backend service. There's no developer API, no SDK, no webhook for programmatic uploads.

Use instead: OpenAI Whisper API ($0.006/min), Deepgram Nova-3 (~$0.0043/min), or AssemblyAI (~$0.012/min).
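To make the per-minute rates above concrete, here is a minimal sketch that multiplies each quoted price by a recording length. The rates are copied from the line above and are approximate; the names `PRICE_PER_MIN` and `api_cost_usd` are illustrative and not part of any vendor's SDK.

```python
# Per-minute list prices quoted above (USD). Approximate and subject to change.
PRICE_PER_MIN = {
    "OpenAI Whisper API": 0.006,
    "Deepgram Nova-3": 0.0043,
    "AssemblyAI": 0.012,
}


def api_cost_usd(minutes: float, price_per_min: float) -> float:
    """Cost of transcribing `minutes` of audio at a flat per-minute rate."""
    return round(minutes * price_per_min, 4)


if __name__ == "__main__":
    for name, price in PRICE_PER_MIN.items():
        print(f"{name}: ${api_cost_usd(60, price):.2f} per hour of audio")
```

At these rates an hour of audio costs well under a dollar with any of the three, so the choice usually comes down to features rather than price.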

No video editing

You can export SRT/VTT subtitles to drop into your editor, but VexaScribe won't cut clips, remove filler words, or burn captions onto video.

Use instead: Descript or Vrew for transcript-based video editing; Premiere/Final Cut/DaVinci for traditional NLE workflows.

No custom vocabulary tuning

You can't upload a dictionary of brand names, drug names, or technical jargon to bias the model toward. Whisper is used as-is, with no per-account fine-tuning.

Use instead: AssemblyAI's “word boost” or Deepgram's “keywords” param for proper-noun-heavy domains.

No on-premise / enterprise self-hosting

Audio is processed in our cloud — there's no air-gapped or HIPAA-BAA-signed deployment available. For attorney-client, clinical therapy, or classified content where a breach creates direct legal liability, no cloud tool (ours included) is the right call.

Use instead: install OpenAI Whisper locally (free, runs on your machine, audio never leaves), or for legal-grade 100% accuracy use human transcription (Rev, GoTranscript) at $1.25–$1.99/min.
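For the local route, a minimal sketch using the open-source `openai-whisper` Python package might look like the following. It assumes the package and `ffmpeg` are installed on your machine. `whisper.load_model` and `model.transcribe` are the package's own calls; `format_segments` and `transcribe_locally` are helper names made up for this example.

```python
import sys


def format_segments(segments):
    """Render Whisper-style segments as '[start-end s] text' lines."""
    lines = []
    for seg in segments:
        lines.append(f"[{seg['start']:.1f}-{seg['end']:.1f}s] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_locally(path, model_name="large-v3"):
    """Transcribe a file on this machine; the audio is never uploaded."""
    import whisper  # pip install openai-whisper (ffmpeg must be on PATH)

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(path)         # dict with "text", "segments", "language"
    return format_segments(result["segments"])


if __name__ == "__main__" and len(sys.argv) > 1:
    print(transcribe_locally(sys.argv[1]))
```

The large models need a capable GPU to run at a reasonable speed. Smaller ones such as `base` or `small` run on modest hardware at the cost of some accuracy.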

Honest accuracy — what the numbers really mean

VexaScribe uses OpenAI Whisper (specifically large-v3 class models). Marketing pages love to say “99% accuracy” — that's not honest. Real-world Whisper accuracy depends heavily on audio quality, accent, and number of speakers. Here's what to expect.

Transcription accuracy (Whisper)

Audio type	Accuracy
Clean studio English, single speaker	~92–97%
Accented English (non-native, regional)	~85–92%
Noisy environments (cafes, phone, outdoor)	~80–90%
Clean Spanish, French, German, Italian, Portuguese, Dutch	~88–94%
Korean, Japanese, Indonesian, Turkish, Arabic, Polish	~85–92%

Source: Open ASR Leaderboard + Whisper paper benchmarks (LibriSpeech, FLEURS, Common Voice).
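This page does not define how the accuracy percentages are measured. The benchmarks cited report word error rate (WER), so a common reading, assumed here, is accuracy ≈ 1 − WER. The sketch below computes WER as word-level edit distance divided by the number of reference words. The function name and the simple lowercase/whitespace normalization are illustrative choices, not the benchmarks' exact procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the quick brown fox jumps", "the quick brown box jumps")
    print(f"WER {wer:.0%}, accuracy about {1 - wer:.0%}")
```

Under this reading, one wrong word in every twenty corresponds to roughly 95% accuracy. Punctuation and casing differences can move the number noticeably, depending on how the text is normalized.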

Speaker diarization accuracy

Scenario	Accuracy
2 speakers, no overlap	95%+
3–4 speakers, occasional overlap	~88–94%
5–6 speakers, meeting dynamics	~80–90%
7–15 speakers, panel or focus group	~70–82%
Up to 50 speakers (max supported)	variable

Accuracy is best with 2–6 distinct speakers. You can rename Speaker 1/2/3 in the editor afterward.
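Renaming is done in the editor, but the same idea can be applied to an exported transcript. The sketch below assumes a hypothetical segment shape with a `speaker` field. VexaScribe's actual JSON schema is not documented on this page, so treat the field names as placeholders.

```python
def rename_speakers(segments, names):
    """Return a copy of `segments` with speaker labels replaced per `names`.

    Labels that are missing from `names` are left unchanged.
    """
    return [
        {**seg, "speaker": names.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


if __name__ == "__main__":
    # Hypothetical segments; the real export format may differ.
    transcript = [
        {"speaker": "Speaker 1", "text": "Welcome to the show."},
        {"speaker": "Speaker 2", "text": "Thanks for having me."},
    ]
    for seg in rename_speakers(transcript, {"Speaker 1": "Host", "Speaker 2": "Guest"}):
        print(f"{seg['speaker']}: {seg['text']}")
```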

What moves the needle

Three things that matter more than picking the “best” transcription tool:

A decent mic (USB headset or lapel beats laptop built-in by 5–15 accuracy points).
One speaker at a time — overlap kills both transcription and diarization.
Low background noise. Record in a closed room, not next to a fan or HVAC vent.

If you need legal-grade 100% accuracy (court filings, regulated research), use a human transcription service such as Rev or GoTranscript at $1.25–$1.99/min. AI gets you to ~95% at 1–2% of the cost: fine for most use cases, wrong for some.

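Core features

99 languages supported

Transcribe audio and video in 99 languages with automatic language detection. From English to Japanese, from Spanish to Arabic.

Speaker identification

Automatic speaker diarization identifies and labels different voices. Well suited to interviews, podcasts, and meetings.

Timestamps

Every transcript includes precise timestamps. Click any timestamp to jump to that moment in the audio.

5 export formats

Export as TXT, DOCX, SRT, VTT, or JSON. Pick the format that best fits your workflow.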
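As a rough illustration of how the subtitle formats relate to timestamped segments, the sketch below renders a list of segments as SRT. The segment shape (`start`, `end`, `text`, with times in seconds) is a hypothetical stand-in, since the JSON export schema is not described here. The SRT layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is the standard one.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> '01:01:01,500'."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([
        {"start": 0.0, "end": 1.2, "text": "Hello."},
        {"start": 1.2, "end": 3.0, "text": "Welcome back."},
    ]))
```

VTT is close to SRT: the file starts with a `WEBVTT` header line, and timestamps use a period instead of a comma before the milliseconds.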

Fast processing

AI-powered transcription finishes in minutes, not hours. A 1-hour recording is typically processed in 5–10 minutes.
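For planning purposes, the 5–10 minutes per hour figure can be turned into a rough estimate. This sketch assumes processing time scales linearly with recording length, which the page implies but does not guarantee. The function name is illustrative.

```python
def estimated_processing_minutes(audio_minutes: float) -> tuple[float, float]:
    """Return a (low, high) estimate in minutes, at 5-10 minutes per hour of audio."""
    hours = audio_minutes / 60
    return (5 * hours, 10 * hours)


if __name__ == "__main__":
    low, high = estimated_processing_minutes(90)
    print(f"A 90-minute recording: roughly {low:g}-{high:g} minutes")
```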

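Built-in editor

View and edit your transcript directly in the browser. Fix errors and rename speakers before you export.

Meeting bot

Send an AI bot to your Zoom, Google Meet, or Teams meeting. It records, transcribes, and generates a structured summary with action items and decisions. Uses 3x transcription credits.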
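The 3x multiplier is easy to overlook when budgeting. The sketch below assumes credits are counted in minutes of audio (the free trial is described as 30 minutes), which this page does not state outright. The names are illustrative.

```python
BOT_CREDIT_MULTIPLIER = 3  # meeting-bot recordings use 3x credits, per the text above


def credits_used(minutes: float, via_bot: bool = False) -> float:
    """Credits consumed by a recording, assuming 1 credit per uploaded minute."""
    return minutes * (BOT_CREDIT_MULTIPLIER if via_bot else 1)


if __name__ == "__main__":
    print(credits_used(60))                # uploaded file
    print(credits_used(60, via_bot=True))  # same meeting recorded by the bot
```

Under that assumption, a 60-minute meeting recorded by the bot uses the same credits as three hours of uploaded audio.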

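AI summary

Turn any transcript into structured key points, action items, chapter markers, and decisions. Included in all paid plans.

Transcript translation

Translate any transcript into 133 languages via Google Translate, at no extra cost and with no third-party account required.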

Bulk Upload — 50 Files at Once

Upload up to 50 audio or video files in one go. All processed in parallel — not one at a time. Mix formats freely and download everything as a ZIP.
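If you have more than 50 files, you will need several uploads. The sketch below groups a file list into batches of at most 50 and sets aside anything over the 5 GB per-file limit mentioned earlier. Whether that limit also applies to bulk uploads, and whether "GB" means 1024³ bytes, are assumptions. The helper only plans batches locally and does not talk to VexaScribe.

```python
MAX_FILES_PER_BATCH = 50      # bulk-upload limit stated above
MAX_FILE_BYTES = 5 * 1024**3  # 5 GB per file; binary gigabytes assumed


def plan_batches(files):
    """Split (name, size_bytes) pairs into upload batches.

    Returns (batches, oversized): lists of names with at most 50 per batch,
    plus the names of files that exceed the per-file size limit.
    """
    accepted = [name for name, size in files if size <= MAX_FILE_BYTES]
    oversized = [name for name, size in files if size > MAX_FILE_BYTES]
    batches = [
        accepted[i:i + MAX_FILES_PER_BATCH]
        for i in range(0, len(accepted), MAX_FILES_PER_BATCH)
    ]
    return batches, oversized


if __name__ == "__main__":
    demo = [(f"episode_{n:03d}.mp3", 80_000_000) for n in range(120)]
    batches, oversized = plan_batches(demo)
    print([len(b) for b in batches], oversized)
```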

Supported formats

Audio formats

MP3, WAV, M4A, FLAC, OGG, AAC, WMA, OPUS

Video formats

MP4, MOV, AVI, MKV, WebM, WMV, FLV

Export formats (5)

TXT: plain text
DOCX: Word document
SRT: subtitles
VTT: web subtitles
JSON: structured data

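Powered by advanced AI

VexaScribe uses state-of-the-art speech recognition models trained on millions of hours of audio.

95%: accuracy on clean audio
99: supported languages
5–10 min: processing time per hour of audio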

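Feature availability by plan

All plans include a free trial. No credit card required to start.

Feature	Free trial	Starter ($2/month)	Pro ($10/month)
Audio and video transcription	✓	✓	✓
99 languages supported	✓	✓	✓
Speaker identification	✓	✓	✓
Timestamps	✓	✓	✓
Export: TXT, DOCX, SRT, VTT, JSON	✓	✓	✓
Transcript translation (133 languages)	✓	✓	✓
Built-in editor	✓	✓	✓
AI summary	—	✓	✓
Meeting bot (Zoom, Meet, Teams)	—	✓	✓
Bulk transcription	✓	✓	✓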



Ready to start transcribing?

Try VexaScribe free with 30 minutes of transcription. No credit card required.
