VexaScribe features

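AI transcription in 99 languages. Speaker recognition, timestamps, AI summaries, and translation into 133 languages are all included. Upload a file or send a bot to your meeting. From $2/month.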

What VexaScribe is, in 80 words

VexaScribe is a web app that turns audio and video into searchable, timestamped, speaker-labeled transcripts using OpenAI Whisper. Drop a file (up to 5 GB) or send a bot to your Zoom, Google Meet, or Teams meeting. Get a transcript in 99 languages in ~5–10 minutes per hour of audio, optional AI summary with action items, and exports to TXT, DOCX, SRT, VTT, or JSON. 30 minutes free, then $2–$20/month. No credit card to start.

What VexaScribe doesn't do

Five things VexaScribe is genuinely not built for, with the tool we'd actually recommend in each case. If your use case is on this list, save yourself the trial signup.

No real-time live captioning

Transcripts are generated after upload, not as you speak. A 1-hour file takes 5–10 minutes to process — fine for meetings you watch back, wrong for live events.

Use instead: Otter Live, Google Meet's built-in captions, or Web Captioner for free browser-based live captions.

No public REST API

VexaScribe is a web app for humans, not a backend service. There's no developer API, no SDK, no webhook for programmatic uploads.

Use instead: OpenAI Whisper API ($0.006/min), Deepgram Nova-3 (~$0.0043/min), or AssemblyAI (~$0.012/min).
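For a rough sense of scale, the per-minute prices quoted above can be turned into a cost for a batch of audio. This is a minimal Python sketch, assuming those list prices and plain per-minute billing with no minimums or rounding rules. The function name and the price table are illustrative and not part of any vendor SDK.

```python
# Per-minute list prices quoted in the text above (USD).
# These are approximate; check each vendor's pricing page for current rates.
PRICE_PER_MINUTE = {
    "OpenAI Whisper API": 0.006,
    "Deepgram Nova-3": 0.0043,
    "AssemblyAI": 0.012,
}


def estimate_cost(minutes: float, price_per_minute: float) -> float:
    """Return the estimated cost in USD, assuming plain per-minute billing."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return round(minutes * price_per_minute, 4)


if __name__ == "__main__":
    hours = 10  # example workload
    for name, price in PRICE_PER_MINUTE.items():
        cost = estimate_cost(hours * 60, price)
        print(f"{name}: ${cost:.2f} for {hours} h of audio")
```

At these rates, even tens of hours of audio cost a few dollars, so the deciding factor between the three is usually features rather than price.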

No video editing

You can export SRT/VTT subtitles to drop into your editor, but VexaScribe won't cut clips, remove filler words, or burn captions onto video.

Use instead: Descript or Vrew for transcript-based video editing; Premiere/Final Cut/DaVinci for traditional NLE workflows.

No custom vocabulary tuning

You can't upload a dictionary of brand names, drug names, or technical jargon to bias the model toward. Whisper is used as-is, with no per-account fine-tuning.

Use instead: AssemblyAI's “word boost” or Deepgram's “keywords” param for proper-noun-heavy domains.

No on-premise / enterprise self-hosting

Audio is processed in our cloud — there's no air-gapped or HIPAA-BAA-signed deployment available. For attorney-client, clinical therapy, or classified content where a breach creates direct legal liability, no cloud tool (ours included) is the right call.

Use instead: install OpenAI Whisper locally (free, runs on your machine, audio never leaves), or for legal-grade 100% accuracy use human transcription (Rev, GoTranscript) at $1.25–$1.99/min.
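For readers taking the local route, here is a minimal sketch using the open-source `openai-whisper` Python package (`pip install openai-whisper`, with `ffmpeg` available on the PATH). The helper names `format_segments` and `transcribe_locally` are our own for illustration. The model weights are downloaded once on first use, and after that the audio itself stays on the machine.

```python
from typing import Dict, List


def format_segments(segments: List[Dict]) -> str:
    """Render Whisper-style segments as '[MM:SS] text' lines."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_locally(path: str, model_name: str = "large-v3") -> str:
    """Transcribe a file on this machine and return timestamped lines."""
    # Imported here so the formatting helper works without the package installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads weights on first call
    result = model.transcribe(path)         # dict with "text", "segments", "language"
    return format_segments(result["segments"])


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(transcribe_locally(sys.argv[1]))
```

Note that the open-source package does not label speakers on its own. If speaker labels are needed, a separate diarization step would have to be added.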

Honest accuracy — what the numbers really mean

VexaScribe uses OpenAI Whisper (specifically large-v3 class models). Marketing pages love to say “99% accuracy” — that's not honest. Real-world Whisper accuracy depends heavily on audio quality, accent, and number of speakers. Here's what to expect.

Transcription accuracy (Whisper)

  • Clean studio English, single speaker: ~92–97%
  • Accented English (non-native, regional): ~85–92%
  • Noisy environments (cafes, phone, outdoor): ~80–90%
  • Clean Spanish, French, German, Italian, Portuguese, Dutch: ~88–94%
  • Korean, Japanese, Indonesian, Turkish, Arabic, Polish: ~85–92%

Source: Open ASR Leaderboard + Whisper paper benchmarks (LibriSpeech, FLEURS, Common Voice).
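The benchmarks cited above report word error rate (WER), so a figure like "~92–97%" is most naturally read as roughly 1 − WER. That reading is an assumption here, not something the benchmarks define. To check accuracy on your own audio, the sketch below computes WER from a hand-corrected reference transcript and the machine output using word-level edit distance. Real benchmarks also normalize punctuation and number formats before scoring; this sketch only lowercases.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word dropped by the model
                curr[j - 1] + 1,     # extra word inserted by the model
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("send the report by friday", "send a report friday")
    print(f"WER: {wer:.2f}, accuracy: {1 - wer:.0%}")
```

In the example, one word is substituted and one is dropped out of five reference words, which gives a WER of 0.4. Short samples swing wildly, so score at least a few minutes of representative audio before drawing conclusions.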

Speaker diarization accuracy

  • 2 speakers, no overlap: 95%+
  • 3–4 speakers, occasional overlap: ~88–94%
  • 5–6 speakers, meeting dynamics: ~80–90%
  • 7–15 speakers, panel or focus group: ~70–82%
  • Up to 50 speakers (max supported): variable

Accuracy is best with 2–6 distinct speakers. You can rename Speaker 1/2/3 in the editor afterward.

What moves the needle

Three things that matter more than picking the “best” transcription tool:

  1. A decent mic (USB headset or lapel beats laptop built-in by 5–15 accuracy points).
  2. One speaker at a time — overlap kills both transcription and diarization.
  3. Low background noise. Record in a closed room, not next to a fan or HVAC vent.

If you need legal-grade 100% accuracy (court filings, regulated research), use human transcription services like Rev or GoTranscript at $1.25–$1.99/min. AI gets you to ~95% at 1–2% of the cost, which is fine for most use cases and wrong for some.

Key features

99 languages supported

Transcribe audio and video in 99 languages with automatic language detection, from English to Japanese to Spanish.

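Speaker recognition

Automatic speaker diarization identifies and labels multiple voices. Ideal for interviews and meetings.

Timestamps

Every transcript comes with accurate timestamps. Click one to jump to that moment in the recording.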

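5 export formats

Export as TXT, DOCX, SRT, VTT, or JSON. Pick the format that fits your workflow.

Fast processing

AI transcription finishes in minutes. A 1-hour recording is processed in 5–10 minutes.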

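Built-in editor

Review and edit transcripts directly in the browser. Rename speakers and fix errors.

Meeting bot

Send an AI bot to Zoom, Google Meet, or Teams. It records, transcribes, and generates a structured summary. Uses 3x credits.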

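AI summary

Structures the transcript into key points, action items, and decisions. Included in all paid plans.

Transcript translation

Translate into 133 languages with Google Translate, at no extra cost and with no separate account.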

Bulk Upload — 50 Files at Once

Upload up to 50 audio or video files in one go. All processed in parallel — not one at a time. Mix formats freely and download everything as a ZIP.

Supported formats

Audio formats

MP3, WAV, M4A, FLAC, OGG, AAC, WMA, OPUS

Video formats

MP4, MOV, AVI, MKV, WebM, WMV, FLV

Export formats (5)

  • TXT: plain text
  • DOCX: Word document
  • SRT: subtitles
  • VTT: web subtitles
  • JSON: structured data
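If you take the JSON export and want subtitles from it later, the conversion to SRT is mechanical. The sketch below assumes a hypothetical segment shape (`start` and `end` in seconds, `text`, and an optional `speaker`). The actual VexaScribe JSON schema is not documented here, so adjust the field names to match a real export.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp that SRT uses."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Build SRT text from segments; field names are assumed, not official."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        speaker = seg.get("speaker")
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "speaker": "Speaker 1", "text": "Hello."},
        {"start": 2.5, "end": 5.0, "text": "Thanks for joining."},
    ]
    print(segments_to_srt(demo))
```

VTT is nearly the same layout: it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.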

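Powered by advanced AI

VexaScribe uses a state-of-the-art speech recognition model trained on millions of hours of audio data.

  • 95%: accuracy on clear audio
  • 99: supported languages
  • 5–10 min: processing time per hour of audio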

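Features by plan

A free trial is available on every plan. No credit card required.

Feature | Free trial | Starter ($2/month) | Pro ($10/month)
Audio and video transcription
99 languages supported
Speaker recognition
Timestamps
Export: TXT, DOCX, SRT, VTT, JSON
Transcript translation (133 languages)
Built-in editor
AI summary
Meeting bot (Zoom, Meet, Teams)
Bulk transcription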

Frequently asked questions

Start transcribing

Try VexaScribe free for 30 minutes. No credit card required.