What Is Transcription? (The Simple Definition)
Transcription is the process of converting spoken words from an audio or video recording into written text. (This page covers audio/video transcription — not biological transcription, which is a different process in genetics.)
For most of history, transcription was done by hand: stenographers took shorthand notes in courtrooms and boardrooms. Magnetic tape recorders made it possible to replay speech and type it out later. Digital recorders and software made files easier to manage and share. Today, AI transcription services process 1 hour of audio in under 5 minutes.
The 4 Types of Transcription
Not all transcription is the same. The type you need depends on your use case.
| Type | What It Includes | Used For | Example |
|---|---|---|---|
| Full Verbatim | Every word, filler (um, uh), pauses, laughter, non-verbal sounds | Legal, psychological research, linguistics | [laughs] Um, I— I think the, the contract... |
| Clean Verbatim | All meaningful words, fillers removed, grammar intact | Business meetings, journalism, most professional use | I think the contract covers this. |
| Edited | Grammar corrected, restructured for readability | Publishing, marketing, blog posts | The contract covers this scenario. |
| Phonetic | Sound-based notation of speech sounds | Linguistic analysis, dialect research | Not used in commercial transcription |
Most business transcription uses “clean verbatim” by default.
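To make the difference between full verbatim and clean verbatim concrete, here is a minimal Python sketch that reduces a full-verbatim line to something close to clean verbatim. The filler list and the `to_clean_verbatim` helper are illustrative assumptions, not how any particular service does it. Real style guides differ on what counts as a filler, and a human editor would still need to review the result.

```python
import re

# Hypothetical filler list; style guides differ on what counts as a filler.
FILLERS = ("um", "uh", "er", "erm")


def to_clean_verbatim(text: str) -> str:
    """Roughly reduce a full-verbatim line to clean verbatim."""
    # Drop bracketed non-verbal tags such as [laughs] or [pause].
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # Drop filler words, together with a trailing comma if present.
    filler_pattern = r"\b(?:" + "|".join(FILLERS) + r")\b,?"
    text = re.sub(filler_pattern, " ", text, flags=re.IGNORECASE)
    # Collapse stutters and false starts where a word is repeated after
    # a comma or dash ("the, the" becomes "the").
    text = re.sub(
        r"\b(\w+)(?:\s*[,\u2014-]+\s*\1\b)+", r"\1", text, flags=re.IGNORECASE
    )
    # Normalize whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


# The full-verbatim example from the table (\u2014 is the long dash).
print(to_clean_verbatim("[laughs] Um, I\u2014 I think the, the contract..."))
# prints: I think the contract...
```

Rules like these only approximate clean verbatim: a repeated word can be intentional, and some fillers carry meaning in context.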
What Is a Transcript Used For?
Transcripts serve every industry that relies on spoken communication.
Legal
Depositions, court records, and legal proceedings. Verbatim accuracy for official documentation.
Research
Qualitative interviews, focus groups, and oral histories. Make recordings searchable and citable.
Journalism
Interview records, source quotes, and fact-checking. Never misquote a source again.
Content Creation
Podcast show notes, blog posts, YouTube chapters, and social media clips from your recordings.
Accessibility
Making audio and video content available to deaf and hard-of-hearing users. Required by ADA and WCAG.
Education
Lecture notes, study guides, and course materials. Helps students review and retain information.
Transcription vs Captions vs Subtitles: What's the Difference?
These three terms are often confused. Here's a clear breakdown.
| | Transcription | Captions | Subtitles |
|---|---|---|---|
| Has timestamps | No (optional) | Yes | Yes |
| Language | Same as audio | Same as audio | Different language |
| Purpose | Reading / searching | Accessibility | Translation |
| File format | TXT, DOCX, PDF | SRT, VTT | SRT, VTT |
| Used on | Documents | Video players | Video players |
How Transcription Works: Manual vs AI
Two approaches, very different trade-offs.
Manual (Human) Transcription
AI Transcription
Transcription File Formats Explained
Different formats serve different purposes. NovaScribe exports all of them.
Plain Text
No formatting, universally compatible with any text editor or app.
Microsoft Word
Easy to edit and share. The most common format for professional transcripts.
PDF Document
Read-only format for distribution and archiving.
SubRip Subtitles
Timed subtitles for video platforms like YouTube and Vimeo.
WebVTT Captions
Web captions for HTML5 video players and streaming platforms.
When Is Transcription Legally Required?
For many organizations, transcription is not optional — it's a legal obligation.
ADA (Americans with Disabilities Act)
Requires accessible audio and video content for public-facing businesses. Organizations must provide transcripts or captions so deaf and hard-of-hearing users can access the same content.
Section 508 (US Federal)
US federal agencies must caption and transcribe all audio and video content. This applies to training materials, public announcements, recorded meetings, and online video.
WCAG 2.1 Level AA
Web Content Accessibility Guidelines require captions for all pre-recorded video content on websites. Level AA is the standard required by most accessibility laws and policies worldwide.
Consult a legal professional for specific compliance requirements.
How NovaScribe Transcription Works
Upload Your File
MP3, WAV, M4A, MP4, MOV, WEBM, and more. Drag and drop or browse to select your audio or video file.
AI Transcribes
NovaScribe's AI processes your file and generates an accurate text transcript — results ready in minutes.
Export Your Transcript
Download as TXT, DOCX, PDF, SRT, or VTT. Edit in-browser before exporting, or share a link directly.
Affordable Pricing
The Pro plan costs $10/month for 2,500 minutes of transcription. All export formats are included at no extra cost.
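As a rough worked example, the Pro plan figures above translate into an effective rate as follows. This sketch assumes the full 2,500 minutes are used each month; the effective rate is higher if fewer minutes are used. The function names are illustrative.

```python
PLAN_PRICE_USD = 10.00  # Pro plan monthly price quoted above
PLAN_MINUTES = 2_500    # minutes included per month


def cost_per_minute(price: float = PLAN_PRICE_USD, minutes: int = PLAN_MINUTES) -> float:
    """Effective cost per audio minute if every included minute is used."""
    return price / minutes


def cost_per_audio_hour(price: float = PLAN_PRICE_USD, minutes: int = PLAN_MINUTES) -> float:
    """Effective cost per hour of audio under the same assumption."""
    return cost_per_minute(price, minutes) * 60


print(f"${cost_per_minute():.3f} per minute, ${cost_per_audio_hour():.2f} per audio hour")
# prints: $0.004 per minute, $0.24 per audio hour
```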
Transcription FAQ
What is the difference between transcription and translation?
Transcription converts spoken words into written text in the same language — a spoken English interview becomes a written English transcript. Translation converts content from one language to another — a written English document becomes written Spanish. Some services combine both: transcription + translation produces a written transcript in a different language than the original speech.
What is verbatim transcription?
Verbatim transcription captures every spoken word exactly as said, including filler words (um, uh, like), false starts, repetitions, laughter, pauses, and non-verbal sounds. It’s used in legal proceedings, psychological research, and linguistic studies where the exact manner of speech is important. Most business transcription uses ‘clean verbatim’ instead, which removes fillers while keeping all meaningful content.
What is the difference between transcription, captions, and subtitles?
Transcription is a text document without timing data — for reading and searching. Captions are timed text synchronized with audio/video, shown in the same language as the spoken content — designed for accessibility (deaf and hard-of-hearing audiences). Subtitles are also timed and synchronized, but typically in a different language than the audio — used for translation. NovaScribe exports all three formats: TXT/DOCX for transcripts, SRT/VTT for captions and subtitles.
How long does transcription take?
AI transcription processes 1 hour of audio in 2–5 minutes. Human transcription typically takes 4–6 hours per hour of audio for a standard turnaround, or 12–24 hours for rush service. The actual time depends on audio quality, number of speakers, and subject complexity. AI services like NovaScribe return results in minutes rather than hours.
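To apply these per-hour figures to a recording of a given length, a back-of-the-envelope estimate might look like the sketch below. It assumes processing time scales linearly with audio length, which is a simplification since audio quality and speaker count also matter. The `estimate_range` helper is a hypothetical name.

```python
# Per-hour-of-audio figures quoted in the answer above.
AI_MINUTES_PER_AUDIO_HOUR = (2, 5)
HUMAN_HOURS_PER_AUDIO_HOUR = (4, 6)


def estimate_range(audio_minutes: float, per_hour_range: tuple[float, float]) -> tuple[float, float]:
    """Scale a (low, high) per-audio-hour figure to the given audio length."""
    audio_hours = audio_minutes / 60
    low, high = per_hour_range
    return (low * audio_hours, high * audio_hours)


# A 90-minute recording:
print(estimate_range(90, AI_MINUTES_PER_AUDIO_HOUR))   # (3.0, 7.5) minutes
print(estimate_range(90, HUMAN_HOURS_PER_AUDIO_HOUR))  # (6.0, 9.0) hours
```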
What file formats can I get my transcript in?
Common transcript formats include: TXT (plain text, universally compatible), DOCX (Microsoft Word, most common for editing), PDF (read-only sharing), SRT (SubRip — timed subtitles for video platforms), VTT (WebVTT — web captions for HTML5 video), and JSON (structured data for developers). NovaScribe exports TXT, DOCX, PDF, SRT, and VTT.
Is transcription important for accessibility?
Yes — transcription is a key accessibility tool. The Americans with Disabilities Act (ADA) and Web Content Accessibility Guidelines (WCAG 2.1 AA) require that audio and video content be accessible to deaf and hard-of-hearing users. Transcripts and captions fulfill this requirement. Universities, government agencies, and companies subject to Section 508 compliance must provide transcripts or captions for all recorded audio/video content.
How accurate is AI transcription?
AI transcription reaches 95–98% accuracy in ideal conditions — clear audio, single speaker, standard accent, general vocabulary. In challenging conditions (multiple speakers, background noise, heavy accents, technical jargon), accuracy typically falls to 70–90%. For most business use cases like meeting notes, podcast show notes, and YouTube captions, AI accuracy is more than sufficient.
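The answer above does not say how accuracy is measured. A common convention, assumed here, is to report accuracy as 1 minus the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the AI output into a trusted reference transcript, divided by the reference word count. The sketch below computes WER with a word-level edit distance. It lowercases and splits on whitespace only, whereas real evaluations usually also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping one row of the table at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from output
            insertion = current[j - 1] + 1  # extra word in the output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "i think the contract covers this scenario in full detail"
hypothesis = "i think the contact covers this scenario in full detail"
wer = word_error_rate(reference, hypothesis)
print(f"WER {wer:.0%}, accuracy {1 - wer:.0%}")
# prints: WER 10%, accuracy 90%
```

Under this convention, 95% accuracy means roughly one wrong word in every twenty, which is why a review pass is still worthwhile for quotes and legal records.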
What is the difference between transcription and dictation?
Dictation is the real-time process of speaking for immediate capture — like speaking to a voice assistant or dictating a letter. Transcription is the conversion of pre-recorded audio into text after the fact. The key difference is timing: dictation happens live, transcription happens later. Many AI transcription tools can also handle dictation (real-time speech-to-text), but the primary use case is post-recording conversion.
Note: This guide covers audio and video transcription for business, legal, research, and content creation. For biological transcription (DNA to RNA), refer to molecular biology resources.
Ready to convert your audio or video to text? NovaScribe accepts common audio and video formats and covers use cases from quick meeting notes to captions for accessibility compliance.
Related Transcription Tools
AI vs Human Transcription
A detailed comparison of accuracy, speed, cost, and use cases.
Transcription Software
Review of the best transcription tools and software in 2026.
SRT Generator
Generate SRT subtitle files from any audio or video recording.
Interview Transcription
Transcribe interviews with speaker labels and accurate timestamps.