By NovaScribe Editorial · 8 tools tested April 2026

Best Transcription Software for Japanese Audio in 2026

The best tool for Japanese audio depends on the use case. For standard business Japanese, Notta has a CJK edge (~5% CER, 93.7% multilingual accuracy). NovaScribe (Whisper-based, $2/mo) is a close second and significantly cheaper. For technical Japanese with specialized kanji, choose Sonix with custom vocabulary. In AI transcription accuracy, Japanese sits in Tier 2: significantly better than Arabic or Turkish, but below French or German. The main challenges are keigo (honorific forms), technical kanji compounds, and script selection. Otter.ai does NOT support Japanese.

Quick Decision Rule:

  • CJK-optimized accuracy → Notta ($13.99/mo, ~5% CER, CJK-specific training)
  • Best value + translation → NovaScribe ($2/mo, ~5% CER, free translation to 133 languages)
  • Unlimited Japanese transcription → TurboScribe ($20/mo, no minute limits)
  • Technical / medical kanji → Sonix (custom vocabulary) or Rev (human option)
  • Human-grade accuracy → Rev or Happy Scribe human plans

Disclosure: NovaScribe is our product. We rank Notta first for Japanese because its CJK-specific engine has a genuine edge on keigo and formal Japanese. NovaScribe matches Notta on standard speech and is significantly cheaper ($2/mo vs $13.99/mo). All pricing verified on official sites April 8, 2026.

Key Takeaways

  • CJK leader: Notta — $13.99/mo, ~5% CER, CJK-optimized engine with keigo edge
  • Best value: NovaScribe — $2/mo, ~5% CER on standard Japanese, free translation to 133 languages
  • Japanese uses CER, not WER — no word boundaries mean Character Error Rate is the standard metric
  • Otter does NOT support Japanese — English-only. Don't waste time trying
  • Keigo is hard for AI: Polite form works, but complex honorifics are often flattened to plain or polite form
  • Three scripts: AI must choose correctly between kanji, hiragana, and katakana; these script-selection errors are specific to Japanese's mixed writing system

Japanese Accuracy: CER vs WER

Japanese doesn't use spaces between words, so the Word Error Rate (WER) metric that is standard for English and European languages doesn't apply directly. Instead, Japanese transcription accuracy is measured by Character Error Rate (CER): the number of character substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of characters in the reference. A 5% CER means roughly 5 character errors per 100 reference characters.
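To make the metric concrete, here is a minimal Python sketch of CER as a character-level edit distance. The function names and the sample hypothesis are our own illustration, not the scoring code of any tool in this comparison. Real evaluations usually also normalize punctuation and character width before scoring, which this sketch skips.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Whitespace splitting yields a single "word", which is why WER is unhelpful.
print(len("今日は天気がいいです".split()))  # 1

# Hypothetical ASR output that picks the wrong homophone kanji for "hashi".
reference = "橋を渡ります"
hypothesis = "箸を渡ります"
print(f"{cer(reference, hypothesis):.1%}")  # 16.7% (1 error in 6 characters)
```

Scoring on characters means a single wrong kanji counts as one error regardless of how the sentence would be segmented into words.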

Accuracy varies significantly depending on the formality level, domain, and dialect. Standard NHK-style broadcast Japanese is well-represented in training data and performs well. Business meetings with keigo, technical content with specialized kanji, and regional dialects are progressively harder.

Context | Notta (CJK) | Whisper-based | Others
Standard NHK-style (clean) | ~5% | ~5% | ~7–9%
Business meeting (standard keigo) | ~7–8% | ~8–10% | ~10–14%
Heavy keigo (formal) | ~10–12% | ~10–14% | ~14–18%
Technical / medical kanji | ~12–15% | ~14–18% | ~16–22%
Kansai dialect | ~8–10% | ~8–12% | ~12–16%
Japanese-English mixing | ~8–10% | ~8–12% | ~14–18%

Testing methodology: CER measured on 10+ hours of real-world Japanese audio per category. “Whisper-based” covers NovaScribe and TurboScribe (both use Whisper large-v3). “Others” includes tools with non-specialized Japanese models. Notta's CJK-specific training gives it a measurable edge on keigo and formal contexts.

Why Japanese Is Unique for AI Transcription

Japanese presents five distinct challenges that don't exist (or are much milder) for European languages. Understanding these helps you set realistic accuracy expectations and choose the right tool.

1. Three Writing Systems

Japanese uses three scripts simultaneously: 漢字 (kanji), Chinese-derived characters for content words; ひらがな (hiragana), the native syllabary for grammar and native words; and カタカナ (katakana), the syllabary for foreign loanwords and emphasis. AI must choose the correct script for each word. Writing “coffee” as コーヒー (katakana) is correct; writing it in hiragana (こーひー) is an error. This script-selection problem is largely specific to Japanese, since Chinese and Korean are each written mainly in a single script.
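If you want to spot script-selection errors when reviewing a transcript, a rough check is to classify each character by its Unicode block. The sketch below is an illustration under simple assumptions: it covers only the basic hiragana, katakana, and CJK Unified Ideographs ranges, and the helper names are hypothetical.

```python
def script_of(ch: str) -> str:
    """Classify one character by Unicode block (basic ranges only)."""
    if ch == "ー":  # U+30FC long-vowel mark, used with both kana scripts
        return "mark"
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def scripts_used(text: str) -> set[str]:
    """Return the set of scripts in a string, ignoring the long-vowel mark."""
    return {script_of(ch) for ch in text} - {"mark"}


print(scripts_used("コーヒー"))  # {'katakana'}
print(scripts_used("こーひー"))  # {'hiragana'}
```

Comparing `scripts_used` for a reference word and the transcribed word flags cases like the coffee example, where the reading is right but the script is wrong.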

2. No Spaces Between Words

Japanese text has no spaces: 今日は天気がいいです (kyou wa tenki ga ii desu — “the weather is nice today”). AI must segment the continuous character stream into words, which affects both accuracy measurement (CER vs WER) and downstream tasks like translation. Incorrect segmentation can change meaning entirely.

3. Keigo (Politeness System)

Keigo has three levels: 丁寧語 (teineigo), the standard polite form; 尊敬語 (sonkeigo), the respectful form for others' actions; and 謙譲語 (kenjōgo), the humble form for your own actions. The same verb changes form across levels: “to eat” = 食べます (polite) / 召し上がります (respectful) / いただきます (humble). AI often flattens respectful and humble forms to plain or polite form.

4. Homophone Kanji

Many kanji share the same pronunciation but have different meanings. Classic example: はし (hashi) can be 橋 (bridge), 箸 (chopsticks), or 端 (edge). AI resolves homophones using context, but specialized domains with unusual kanji compounds cause errors. This is why custom vocabulary (Sonix) helps — it biases the model toward domain-relevant kanji.
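The idea of biasing toward domain-relevant kanji can be illustrated with a toy example. This is not how Sonix or any other tool implements custom vocabulary: the candidate scores are invented, and `pick_kanji` and the `boost` value are hypothetical names chosen for this sketch.

```python
# Made-up prior scores for the three "hashi" homophones from the text.
HOMOPHONES = {
    "はし": {"橋": 0.5, "箸": 0.3, "端": 0.2},
}


def pick_kanji(reading: str, glossary=frozenset(), boost: float = 0.5) -> str:
    """Pick the highest-scoring candidate, boosting any glossary entries."""
    candidates = HOMOPHONES[reading]
    return max(
        candidates,
        key=lambda k: candidates[k] + (boost if k in glossary else 0.0),
    )


print(pick_kanji("はし"))                    # 橋 (highest prior wins)
print(pick_kanji("はし", glossary={"箸"}))   # 箸 (glossary boost wins)
```

With no glossary the most common candidate wins; adding a term to the glossary tips the choice toward it, which is the effect a domain glossary aims for on rare kanji compounds.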

5. Wasei-eigo (和製英語)

Japanese has many English-derived words with Japanese pronunciation and sometimes altered meaning: パソコン (pasokon — personal computer), サラリーマン (sarariman — salaryman/office worker), マンション (manshon — apartment, not mansion). AI must recognize these as Japanese words written in katakana, not attempts at English. Whisper-based tools handle common wasei-eigo well but may stumble on newer coinages.

Otter.ai Does NOT Support Japanese

Otter.ai is English-only. There is no Japanese language option, no Japanese model, and no announced plans to add Japanese support. If you upload Japanese audio to Otter, you will get garbled English text — not a Japanese transcript.

This matters because Otter frequently appears in “best transcription software” listicles without language qualifiers. If you found this page searching for Japanese transcription, skip Otter entirely. The same applies to Zoom's built-in transcription — it does not reliably support Japanese.

Japanese-capable alternatives: Notta ($13.99/mo, CJK-optimized), NovaScribe ($2/mo), TurboScribe ($20/mo), Sonix ($10/hr), Amberscript (€25/mo), Happy Scribe (€17/mo), Rev ($0.25/min AI), Descript ($24/mo).

Quick Picks: 8 Japanese Transcription Tools

Tool | Price | Japanese CER | Standout Feature
Notta (CJK Leader) | $13.99/mo | ~5% CER | CJK-optimized engine, bilingual pairs
NovaScribe (Best Value) | $2/mo | ~5% CER | Free translation to 133 languages
TurboScribe (Volume Pick) | $20/mo | ~5% CER | Unlimited transcription on paid plans
Sonix (Custom Vocab) | $10/hr | ~6% CER | Custom glossaries for technical kanji
Amberscript (EU Native) | €25/mo | ~7% CER | EU-hosted, GDPR by design
Happy Scribe (Human Option) | €17/mo | ~7% CER | Human proofreading available
Rev (Human Option) | $0.25/min AI | ~7% CER | Human Japanese transcription available
Descript (Video + Japanese) | $24/mo | ~8% CER | Transcription inside video editor

All pricing verified April 2026. CER measured on standard Japanese test set.
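Because some tools charge per hour or per minute and others charge a flat fee, the cheapest option depends on monthly volume. The sketch below compares only the usage-based and unlimited USD prices quoted in this article (Sonix pay-as-you-go and Premium, Rev AI, TurboScribe Pro). It assumes a single user and ignores accuracy differences, free tiers, and taxes, so treat it as rough arithmetic rather than a quote.

```python
def monthly_cost_usd(hours: float) -> dict[str, float]:
    """Estimated monthly cost for a given number of audio hours."""
    minutes = hours * 60
    return {
        "Sonix pay-as-you-go": 10.0 * hours,   # $10/hr
        "Sonix Premium": 22.0 + 5.0 * hours,   # $22/mo + $5/hr
        "Rev AI": 0.25 * minutes,              # $0.25/min
        "TurboScribe Pro": 20.0,               # $20/mo, unlimited
    }


for hours in (1, 2, 5, 20):
    costs = monthly_cost_usd(hours)
    cheapest = min(costs, key=costs.get)
    print(f"{hours:>2} h/mo: cheapest is {cheapest} (${costs[cheapest]:.2f})")
```

Under these assumptions, Sonix pay-as-you-go and TurboScribe Pro cost the same at 2 hours per month, and the flat plan is cheaper at any higher volume.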

Keigo: The Formality Challenge

Keigo is Japanese's multi-layered politeness system. It's not just vocabulary — it changes verb forms, sentence structure, and even the words used for common actions. AI transcription tools handle the levels unevenly.

Level | “to eat” | Used when | AI accuracy
Plain (普通体) | 食べる (taberu) | Friends, casual | Good
Polite (丁寧語) | 食べます (tabemasu) | Standard business | Good
Respectful (尊敬語) | 召し上がります (meshiagarimasu) | Referring to superiors | Partial
Humble (謙譲語) | いただきます (itadakimasu) | Your own actions (to superiors) | Partial

The Flattening Problem

AI tools tend to “flatten” complex keigo to plain or polite form. A speaker saying 召し上がりますか (meshiagarimasu ka — “would you like to eat?” in respectful form) may be transcribed as 食べますか (tabemasu ka — polite but not respectful). The meaning is preserved, but the formality level is lost. In business and government contexts where keigo precision matters, this is a significant issue.

Impact on Formal Transcripts

For board meetings, government proceedings, or customer-facing communications where keigo is expected, AI transcripts will sound inappropriately casual. The content is correct but the register is wrong. This can be offensive in Japanese business culture where formality signals respect.

Workaround: For formal keigo-heavy content, use AI transcription as a draft and have a native Japanese speaker review and restore the correct formality levels. Notta's CJK engine preserves keigo slightly better than Whisper-based tools, but human review is still needed for critical formal documents.
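A reviewer can speed up this pass by flagging polite forms that have respectful or humble counterparts, since those are the spots where flattening may have happened. The sketch below is a hypothetical reviewer aid: the lookup holds only the “to eat” example from this article, and a real list would need to be built by a native speaker. A flag is a prompt to check the audio, not proof of an error.

```python
# Polite form -> (respectful form, humble form). Only the article's example.
KEIGO_ALTERNATIVES = {
    "食べます": ("召し上がります", "いただきます"),
}


def review_candidates(transcript: str) -> list[tuple[int, str, str, str]]:
    """List (position, polite, respectful, humble) for each polite form found."""
    hits = []
    for polite, (respectful, humble) in KEIGO_ALTERNATIVES.items():
        start = transcript.find(polite)
        while start != -1:
            hits.append((start, polite, respectful, humble))
            start = transcript.find(polite, start + 1)
    return sorted(hits)


for pos, polite, respectful, humble in review_candidates("先生は何を食べますか"):
    print(f"char {pos}: '{polite}' may have been '{respectful}' or '{humble}'")
```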

Japanese-English Bilingual Workflows

One of the most common use cases: transcribe a Japanese meeting, then get an English summary for international stakeholders. Here's how the tools handle this workflow.

Notta: Bilingual Pairs

Notta supports bilingual transcription — it can output Japanese and English simultaneously for meetings where both languages are spoken. The CJK-optimized engine handles Japanese-English code-switching at ~8–10% CER. For teams with both Japanese and English speakers, this is the most streamlined option.

NovaScribe: Transcribe + Translate

Upload Japanese audio → get Japanese transcript → translate to English (or any of 133 languages) with one click. Translation is included free on all plans. Japanese-to-English machine translation is mature and works well for standard business content. The $2/mo price point makes this the most cost-effective transcribe-and-translate workflow.

Quality Considerations

Japanese-to-English translation quality is generally high for factual content. Keigo nuance is typically lost — respectful and humble forms are translated the same as plain form. Idiomatic expressions and cultural references may need human review. For customer-facing translations, consider human review on top of AI.

Keigo in Translation

Even when AI correctly transcribes keigo in Japanese, the formality distinctions are lost in English translation. English doesn't have equivalent politeness levels, so 召し上がりますか and 食べますか both become “would you like to eat?” This is a fundamental limitation of Japanese-to-English translation, not a tool-specific issue.

Detailed Reviews: 8 Japanese Transcription Tools

CJK Optimized

Notta

Best for: CJK-optimized Japanese transcription
Price: Free (120 min/mo) | Pro $13.99/seat/mo
Japanese CER: ~5% standard | Languages: 58
Pricing source: notta.ai/pricing (verified Apr 2026)

Notta is the CJK leader for a reason: its engine is specifically trained on Japanese, Chinese, and Korean audio data. On standard Japanese, it matches Whisper at ~5% CER. Where it pulls ahead is keigo and formal business Japanese — ~7–8% CER vs Whisper's ~8–10%. The bilingual pair feature (Japanese + English simultaneous output) is unique and valuable for international teams.

Real-time transcription for meetings works well with standard business Japanese. The 58-language support is narrower than NovaScribe (99) but covers all major Asian and European languages. At $13.99/seat/mo, it's 7x the price of NovaScribe but the CJK optimization is genuine, not marketing.

Strengths:

  • ✓ CJK-specific training data and engine
  • ✓ Better keigo handling than Whisper-based tools
  • ✓ Bilingual Japanese-English transcription
  • ✓ Real-time meeting transcription

Weaknesses:

  • ✗ $13.99/seat/mo (7x NovaScribe price)
  • ✗ 58 languages (vs 99 NovaScribe)
  • ✗ No custom vocabulary for technical kanji
  • ✗ No built-in translation to non-CJK languages
Choose if: Japanese accuracy is your top priority and you need CJK-optimized keigo handling. Worth the premium for formal business Japanese where politeness levels matter.

Best Value

NovaScribe

Best for: Budget Japanese transcription with translation
Price: $2/mo individual | Team from $35/mo
Japanese CER: ~5% standard | Languages: 99
Pricing source: novascribe.ai/pricing (verified Apr 2026)

NovaScribe uses Whisper large-v3 under the hood, achieving ~5% CER on standard Japanese — matching Notta on clean speech. The killer feature is built-in translation to 133 languages, included free on all plans. Upload Japanese audio, get a Japanese transcript, click translate, get English (or French, German, etc.) — all for $2/mo. The meeting bot supports Zoom, Meet, and Teams for live Japanese meeting transcription.

Where NovaScribe falls slightly behind Notta: keigo-heavy formal Japanese (~8–10% CER vs Notta's ~7–8%) and business meetings with complex honorifics. For standard Japanese — interviews, podcasts, lectures, casual meetings — the accuracy is equivalent at a fraction of the cost. GDPR-compliant but US-hosted.

Strengths:

  • ✓ ~5% CER on standard Japanese (Whisper large-v3)
  • ✓ Free translation to 133 languages included
  • ✓ $2/mo — cheapest Japanese transcription
  • ✓ Meeting bot for Zoom/Meet/Teams

Weaknesses:

  • ✗ Keigo handling behind Notta (~8–10% vs ~7–8% CER)
  • ✗ No CJK-specific engine optimization
  • ✗ US-hosted (not EU data residency)
  • ✗ Minute-based limits on individual plan
Choose if: You want the cheapest Japanese transcription with built-in translation. Best for freelancers, content creators, and teams who need Japanese + other languages without paying 7x for CJK optimization.

Volume Pick

TurboScribe

Best for: High-volume Japanese transcription
Price: Free (3/day) | Pro $20/mo unlimited
Japanese CER: ~5% standard | Languages: 98
Pricing source: turboscribe.ai/pricing (verified Apr 2026)

TurboScribe uses the same Whisper large-v3 model as NovaScribe, so Japanese accuracy is identical (~5% CER on standard speech). The advantage: unlimited transcription on the $20/mo Pro plan. If you batch-process Japanese podcast archives, lecture recordings, or interview libraries, TurboScribe avoids per-minute costs. The free tier gives 3 transcriptions per day to test Japanese quality.

The trade-off vs. NovaScribe: 10x the price ($20 vs. $2/mo) but no minute limits. No built-in translation — you'll need to export and use a separate tool for Japanese-to-English conversion. No meeting bot for live transcription.

Strengths:

  • ✓ Unlimited transcription on Pro ($20/mo)
  • ✓ ~5% CER on standard Japanese (Whisper large-v3)
  • ✓ Batch upload for high-volume workflows
  • ✓ Free tier to test Japanese quality

Weaknesses:

  • ✗ $20/mo vs. $2/mo NovaScribe for low volume
  • ✗ No built-in translation
  • ✗ No CJK-specific optimization (generic Whisper)
  • ✗ No meeting bot or real-time transcription
Choose if: You transcribe large volumes of Japanese audio regularly and need unlimited minutes. If volume is low, NovaScribe at $2/mo is more cost-effective. If CJK optimization matters, use Notta.

Best Custom Vocab

Sonix

Best for: Technical Japanese with custom kanji glossaries
Price: $10/hr pay-as-you-go | Premium $5/hr + $22/mo
Japanese CER: ~6% standard | Languages: 53
Pricing source: sonix.ai/pricing (verified Apr 2026)

Sonix's standout feature for Japanese is custom vocabulary. You can upload glossaries of domain-specific kanji compounds — medical terms, legal jargon, company-specific terminology — and the model biases toward those characters. This is critical for technical Japanese where rare kanji compounds cause errors on general-purpose tools. A cardiology glossary, for example, can reduce CER on medical Japanese from ~15% to ~10%.

Base accuracy on standard Japanese (~6% CER) is slightly behind Whisper-based tools, but the custom vocabulary feature more than compensates in specialized domains. Translation is available as an add-on. The pay-as-you-go pricing ($10/hr) works well for occasional technical transcription.

Strengths:

  • ✓ Custom vocabulary for domain-specific kanji
  • ✓ Significantly better on technical/medical Japanese
  • ✓ In-browser transcript editor
  • ✓ Translation available as add-on

Weaknesses:

  • ✗ ~6% CER baseline (slightly behind Whisper)
  • ✗ $10/hr is expensive for high volume
  • ✗ 53 languages (narrower than competitors)
  • ✗ Custom vocab setup requires domain expertise
Choose if: You transcribe technical, medical, or legal Japanese with specialized kanji that general tools get wrong. The custom vocabulary feature is worth the premium for domain-specific accuracy.

EU Native

Amberscript

Best for: EU-hosted Japanese transcription
Price: AI from €25/mo | Human available
Japanese CER: ~7% standard | Languages: 39
Pricing source: amberscript.com/pricing (verified Apr 2026)

Amberscript is primarily a European language specialist, but supports Japanese AI transcription. The EU-hosted infrastructure (Amsterdam) is relevant for organizations processing Japanese audio under EU data protection requirements — e.g., EU-based companies with Japanese clients. Japanese accuracy (~7% CER) is behind Notta and Whisper-based tools but serviceable for general content.

The 39-language coverage focuses on European and major Asian languages. Human transcription is available but with limited Japanese native speaker capacity compared to CJK-focused services. Best suited for EU organizations that need occasional Japanese transcription alongside their primary European language workflows.

Strengths:

  • ✓ EU-hosted (Amsterdam) — GDPR by design
  • ✓ ISO 27001 certified
  • ✓ In-browser transcript editor
  • ✓ Human transcription option

Weaknesses:

  • ✗ ~7% CER (behind Notta and Whisper tools)
  • ✗ Not CJK-optimized
  • ✗ Limited Japanese human transcriber pool
  • ✗ €25/mo starting price
Choose if: You need EU data residency for Japanese audio processing. For Japanese-first workflows without EU requirements, Notta or NovaScribe deliver better accuracy at lower cost.

Human Option

Happy Scribe

Best for: Japanese transcription with human proofreading
Price: AI from €17/mo | Human €2.00/min
Japanese CER: ~7% AI | Languages: 62
Pricing source: happyscribe.com/pricing (verified Apr 2026)

Happy Scribe offers both AI and human Japanese transcription. The human option is particularly valuable for keigo-heavy content where AI flattens formality levels — native Japanese proofreaders can restore the correct register. AI accuracy on standard Japanese (~7% CER) is adequate but behind Notta and Whisper-based tools.

The Barcelona-based EU hosting is a plus for European organizations. 62-language support covers Japanese alongside European languages. The in-browser editor with synchronized audio playback makes reviewing Japanese transcripts easier. Human Japanese transcription at €2.00/min is more expensive than European language rates due to specialist availability.

Strengths:

  • ✓ Human Japanese proofreading available
  • ✓ EU-hosted (Barcelona) — GDPR by design
  • ✓ In-browser editor with audio sync
  • ✓ 62 languages for multilingual workflows

Weaknesses:

  • ✗ ~7% CER AI (behind Notta and Whisper tools)
  • ✗ Human Japanese transcription €2.00/min (premium)
  • ✗ Not CJK-optimized
  • ✗ Slower turnaround on Japanese human orders
Choose if: You need human-reviewed Japanese transcripts with correct keigo, or you're an EU organization needing reliable Japanese alongside European languages.

Human Option

Rev

Best for: Human Japanese transcription with NDA
Price: AI $0.25/min | Human pricing varies
Japanese CER: ~7% AI | Languages: 36
Pricing source: rev.com/pricing (verified Apr 2026)

Rev's strength is human transcription with NDA availability — important for confidential Japanese business audio. The AI engine handles Japanese at ~7% CER, behind Notta and Whisper-based tools. Human Japanese transcription is available through Rev's global freelancer network, though turnaround times are longer than for English.

The NDA option is particularly relevant for Japanese corporate content where confidentiality matters. Rev's 36-language coverage is the second narrowest in this comparison (only Descript's 23 is lower) but includes the major languages. For Japanese-specific workflows, the per-minute pricing can get expensive at scale.

Strengths:

  • ✓ Human Japanese transcription available
  • ✓ NDA available for confidential content
  • ✓ Pay-per-minute (no subscription required)
  • ✓ Established reputation and reliability

Weaknesses:

  • ✗ ~7% CER AI (behind Notta and Whisper tools)
  • ✗ Per-minute pricing expensive at scale
  • ✗ 36 languages (second-narrowest coverage, after Descript)
  • ✗ Longer turnaround for Japanese human orders
Choose if: You need human Japanese transcription with NDA for confidential content, or you prefer pay-per-minute without a subscription commitment.

Video + Japanese

Descript

Best for: Japanese video editing with transcription
Price: Free (1 project) | Pro $24/mo
Japanese CER: ~8% standard | Languages: 23
Pricing source: descript.com/pricing (verified Apr 2026)

Descript is a video and podcast editor with built-in transcription. For Japanese content creators who edit video by editing text, the workflow is unique — delete a sentence from the Japanese transcript and the corresponding video clip is removed. Japanese transcription at ~8% CER is the weakest in this comparison but serviceable for content creation workflows where perfect accuracy isn't critical.

The 23-language limit includes Japanese but is restrictive compared to competitors. At $24/mo, Descript is expensive purely for transcription — you're paying for the video editor. Japanese subtitle generation is automatic. If you only need Japanese transcripts without video editing, NovaScribe at $2/mo or Notta at $13.99/mo are far better options.

Strengths:

  • ✓ Text-based video editing (unique workflow)
  • ✓ Japanese subtitles generated automatically
  • ✓ Screen recording with Japanese transcription
  • ✓ Filler word removal

Weaknesses:

  • ✗ ~8% CER — weakest Japanese accuracy
  • ✗ $24/mo expensive for transcription alone
  • ✗ Only 23 languages
  • ✗ No CJK optimization
Choose if: You create Japanese video or podcast content and want transcription integrated into your editing workflow. Not cost-effective as a standalone Japanese transcription tool.

Tools That Do NOT Support Japanese

These popular transcription tools appear in many comparison articles but do not reliably support Japanese. If you're specifically looking for Japanese transcription, avoid them.

Tool | Language Support | Notes
Otter.ai | English only | No Japanese, no plans announced. Use Notta or NovaScribe instead.
Zoom native transcription | English + limited | Zoom’s built-in transcription does not support Japanese reliably. Use a dedicated tool.

Bottom line: If Japanese is your primary language, start with Notta ($13.99/mo, CJK-optimized) or NovaScribe ($2/mo, Whisper-based). Both deliver ~5% CER on standard Japanese with reliable Japanese language models.

Frequently Asked Questions

What is the most accurate tool for Japanese transcription?

Notta (~5% CER, CJK-optimized) and NovaScribe (~5% CER, Whisper-based) are tied on standard speech. Notta has a slight edge on keigo and formal Japanese due to CJK-specific training.

Does Otter.ai support Japanese?

No. Otter.ai is English-only. For Japanese, use Notta (CJK-optimized), NovaScribe (Whisper-based), or TurboScribe.

What's the difference between CER and WER for Japanese?

Japanese doesn't use spaces between words, so Character Error Rate (CER) is used instead of Word Error Rate. 5% CER means roughly 5 incorrect characters per 100. This is the standard accuracy metric for Japanese and other CJK languages.

How well does AI handle kanji?

Well on common kanji, poorly on rare or specialized kanji compounds. Homophone kanji (e.g., 橋 bridge vs 箸 chopsticks, both 'hashi') are resolved by context. Custom vocabulary tools like Sonix can improve accuracy on domain-specific kanji.

Can AI handle keigo (honorific Japanese) correctly?

Partially. Standard polite form (丁寧語 teineigo) is handled well. Complex honorific (尊敬語 sonkeigo) and humble (謙譲語 kenjōgo) forms are often flattened to plain or polite form, making formal transcripts sound inappropriately casual.

Which tool is best for Japanese meetings?

NovaScribe ($2–$35/mo with meeting bot for Zoom/Meet/Teams) or Notta ($13.99/seat with CJK optimization). Both handle standard business Japanese well.

Can I translate Japanese transcripts to English?

Yes. NovaScribe includes free translation to 133 languages. Japanese-to-English machine translation is mature — transcript-then-translate works well for standard Japanese. Keigo nuance may be lost in translation.

What about Kansai dialect (Osaka/Kyoto)?

Expect CER roughly 3–5 percentage points higher than on standard Japanese. Distinctive vocabulary (e.g., 'ookini' instead of 'arigatō') may be misrecognized. Most Whisper-based tools handle Kansai acceptably for informal content.

Test Japanese Transcription

Start with 30 free minutes. No credit card required. ~5% CER on standard Japanese with free translation to 133 languages.