VexaScribe features

AI transcription in 99 languages. Speaker identification, timestamps, AI summaries, and built-in translation (133 languages). Upload a file or send a meeting bot to Zoom, Meet, or Teams. From just $2/month.

Try it free for 30 minutes · View pricing

What VexaScribe is, in 80 words

VexaScribe is a web app that turns audio and video into searchable, timestamped, speaker-labeled transcripts using OpenAI Whisper. Drop in a file (up to 5 GB) or send a bot to your Zoom, Google Meet, or Teams meeting. You get a transcript in any of 99 languages in about 5–10 minutes per hour of audio, an optional AI summary with action items, and exports to TXT, DOCX, SRT, VTT, or JSON. 30 minutes free, then $2–$20/month. No credit card to start.

What VexaScribe doesn't do

Five things VexaScribe is genuinely not built for, with the tool we'd actually recommend in each case. If your use case is on this list, save yourself the trial signup.

No real-time live captioning

Transcripts are generated after upload, not as you speak. A 1-hour file takes 5–10 minutes to process — fine for meetings you watch back, wrong for live events.

Use instead: Otter Live, Google Meet's built-in captions, or Web Captioner for free browser-based live captions.

No public REST API

VexaScribe is a web app for humans, not a backend service. There's no developer API, no SDK, no webhook for programmatic uploads.

Use instead: OpenAI Whisper API ($0.006/min), Deepgram Nova-3 (~$0.0043/min), or AssemblyAI (~$0.012/min).
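To put those per-minute rates in perspective, the sketch below converts them to a cost per hour of audio. It is plain arithmetic on the approximate prices quoted above; provider pricing changes, so treat the figures as illustrative and check each vendor's pricing page before budgeting.

```python
# Approximate per-minute list prices quoted above (USD).
# Assumption: these may be out of date; verify with each provider.
PRICES_PER_MIN = {
    "OpenAI Whisper API": 0.006,
    "Deepgram Nova-3": 0.0043,
    "AssemblyAI": 0.012,
}


def cost_usd(minutes: float, price_per_min: float) -> float:
    """Cost of transcribing `minutes` of audio at a flat per-minute rate."""
    return round(minutes * price_per_min, 4)


if __name__ == "__main__":
    for name, price in PRICES_PER_MIN.items():
        print(f"{name}: ${cost_usd(60, price):.2f} per hour of audio")
```

At these rates an hour of audio costs well under a dollar on any of the three APIs; the trade-off is that you build and run the upload, storage, and review tooling yourself.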

No video editing

You can export SRT/VTT subtitles to drop into your editor, but VexaScribe won't cut clips, remove filler words, or burn captions onto video.

Use instead: Descript or Vrew for transcript-based video editing; Premiere/Final Cut/DaVinci for traditional NLE workflows.

No custom vocabulary tuning

You can't upload a dictionary of brand names, drug names, or technical jargon to bias the model toward. Whisper is used as-is, with no per-account fine-tuning.

Use instead: AssemblyAI's “word boost” or Deepgram's “keywords” param for proper-noun-heavy domains.

No on-premise / enterprise self-hosting

Audio is processed in our cloud — there's no air-gapped or HIPAA-BAA-signed deployment available. For attorney-client, clinical therapy, or classified content where a breach creates direct legal liability, no cloud tool (ours included) is the right call.

Use instead: install OpenAI Whisper locally (free, runs on your machine, audio never leaves), or for legal-grade 100% accuracy use human transcription (Rev, GoTranscript) at $1.25–$1.99/min.

Honest accuracy — what the numbers really mean

VexaScribe uses OpenAI Whisper (specifically large-v3 class models). Marketing pages love to say “99% accuracy” — that's not honest. Real-world Whisper accuracy depends heavily on audio quality, accent, and number of speakers. Here's what to expect.

Transcription accuracy (Whisper)

Audio	Expected accuracy
Clean studio English, single speaker	~92–97%
Accented English (non-native, regional)	~85–92%
Noisy environments (cafes, phone, outdoor)	~80–90%
Clean Spanish, French, German, Italian, Portuguese, Dutch	~88–94%
Korean, Japanese, Indonesian, Turkish, Arabic, Polish	~85–92%

Source: Open ASR Leaderboard + Whisper paper benchmarks (LibriSpeech, FLEURS, Common Voice).
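Benchmarks like the Open ASR Leaderboard report word error rate (WER), and accuracy figures such as the ones above are commonly read as roughly 1 − WER. A minimal sketch of how WER is computed, using word-level Levenshtein (edit) distance. Real benchmarks normalize text (casing, punctuation, numbers) before scoring; this sketch only lowercases and splits on whitespace, so its numbers will not match leaderboard results exactly.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (starts at i = 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "send the bot to the meeting at noon"
    hyp = "send a bot to the meeting noon"
    error = wer(ref, hyp)  # one substitution + one deletion over 8 words
    print(f"WER = {error:.1%}, accuracy ≈ {1 - error:.1%}")
```

Note that insertions can push WER above 100%, which is one reason "accuracy" is only a loose shorthand for it.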

Speaker diarization accuracy

Speakers	Expected accuracy
2 speakers, no overlap	95%+
3–4 speakers, occasional overlap	~88–94%
5–6 speakers, meeting dynamics	~80–90%
7–15 speakers, panel or focus group	~70–82%
Up to 50 speakers (maximum supported)	Variable

Accuracy is best with 2–6 distinct speakers. You can rename Speaker 1, Speaker 2, and so on in the editor afterward.

What moves the needle

Three things that matter more than picking the “best” transcription tool:

A decent mic: a USB headset or lapel mic beats a laptop's built-in mic by 5–15 accuracy points.
One speaker at a time — overlap kills both transcription and diarization.
Low background noise. Record in a closed room, not next to a fan or HVAC vent.

If you need legal-grade 100% accuracy (court filings, regulated research), use human transcription services like Rev or GoTranscript at $1.25–$1.99/min. AI gets you to ~95% at 1–2% of the cost: fine for most use cases, wrong for some.

Core features

99 languages supported

Transcribe audio and video in 99 languages with automatic language detection. From English to Japanese, Spanish to Arabic.

Speaker identification

Automatic speaker diarization identifies and labels the different voices. Perfect for interviews, podcasts, and meetings.

Timestamps

Every transcript includes precise timestamps. Click any timestamp to jump to that moment in your audio.

5 export formats

Export as TXT, DOCX, SRT, VTT, or JSON. Choose the format that fits your workflow.
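If you script around the exports: SRT is a plain-text format of numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and one or more text lines, with cues separated by blank lines. A minimal sketch that renders timestamped, speaker-labeled segments into SRT. The `(start, end, speaker, text)` segment tuple is a hypothetical shape chosen for illustration, not VexaScribe's JSON export schema.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, speaker, text).
Segment = Tuple[float, float, str, str]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing each line with its speaker."""
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    # Each cue ends with a newline, so joining on "\n" leaves one blank line between cues.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        (0.0, 2.5, "Speaker 1", "Shall we start?"),
        (2.5, 65.04, "Speaker 2", "Yes, first item is the budget."),
    ]
    print(to_srt(demo))
```

Whether each cue carries a speaker prefix is a style choice; drop it for clean on-screen captions. A WebVTT file is similar, but it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.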

Fast processing

AI transcription finishes in minutes, not hours. A 1-hour recording is typically processed in 5–10 minutes.

Built-in editor

Review and edit transcripts directly in your browser. Fix errors, rename speakers, and polish the transcript before exporting.

Meeting bot

Send an AI bot to your Zoom, Google Meet, or Teams meetings. The bot records, transcribes, and produces a structured summary with action items and decisions. Meeting-bot minutes consume transcription credits at 3× the normal rate.
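Put another way, minutes recorded by the meeting bot count three times against your allowance. A minimal sketch of that arithmetic, assuming credits are metered in minutes of audio; `credits_used` is a hypothetical helper for illustration, not VexaScribe's billing code.

```python
# Assumption: credits are metered in minutes of audio, and bot recordings
# are charged at 3x the rate of uploaded files.
BOT_MULTIPLIER = 3


def credits_used(audio_minutes: float, via_meeting_bot: bool) -> float:
    """Hypothetical helper: transcription credits consumed, in minutes."""
    return audio_minutes * (BOT_MULTIPLIER if via_meeting_bot else 1)


if __name__ == "__main__":
    print(credits_used(60, via_meeting_bot=False))  # uploaded 1-hour file
    print(credits_used(60, via_meeting_bot=True))   # 1-hour meeting via the bot
```

Under that assumption, a one-hour meeting recorded by the bot uses as many credits as three hours of uploaded audio.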

AI summaries

Turn any transcript into structured key points, action items, chapter markers, and decisions. Available on all paid plans.

Transcript translation

Translate any transcript into 133 languages via Google Translate, at no extra cost and with no third-party account needed.

Bulk upload: 50 files at once

Upload up to 50 audio or video files in one go. All files are processed in parallel, not one at a time. Mix formats freely and download everything as a ZIP.

Supported formats

Audio formats

MP3, WAV, M4A, FLAC, OGG, AAC, WMA, OPUS

Video formats

MP4, MOV, AVI, MKV, WebM, WMV, FLV

Export formats (5)

TXT: plain text
DOCX: Word document
SRT: subtitles
VTT: web subtitles
JSON: structured data

Powered by advanced AI

VexaScribe uses state-of-the-art speech recognition models trained on millions of hours of audio.

95%

Accuracy on clear audio

99

Languages supported

5–10 min

Processing time per hour of audio

Features by plan

All plans include a free trial. No credit card required to start.

Feature	Free trial	Starter ($2/month)	Pro ($10/month)
Audio and video transcription	✓	✓	✓
99 languages	✓	✓	✓
Speaker identification	✓	✓	✓
Timestamps	✓	✓	✓
Export: TXT, DOCX, SRT, VTT, JSON	✓	✓	✓
Transcript translation (133 languages)	✓	✓	✓
Built-in editor	✓	✓	✓
AI summaries	✗	✓	✓
Meeting bot (Zoom, Meet, Teams)	✗	✓	✓
Bulk transcription	✓	✓	✓

See full pricing details →


Ready to start transcribing?

Try VexaScribe free with 30 minutes of transcription. No credit card required.

Start your free trial · View pricing