How to Generate Subtitles from Video (2026 Complete Guide)
4-step workflow: extract audio → AI transcribe → export SRT/VTT → attach to video. Takes 5 minutes. Works on any video length. $0–$10/month depending on volume.
By NovaScribe Editorial · Updated April 2026
Generate Subtitles from Video in 5 Minutes
1. Extract audio: or skip this step, since most AI tools accept video directly.
2. AI transcribe: upload to NovaScribe, TurboScribe, or Descript.
3. Export SRT/VTT: both formats come from the same transcript.
4. Attach to video: YouTube, soft subs, burn-in, or sidecar.
AI subtitle generation in 2026 achieves 92–96% accuracy on clean audio. For most creators, that's sufficient for YouTube uploads and social media content. For institutional use (courses, broadcast), human review of the AI draft is recommended. See our accuracy verdict for a full rubric.
Step 1: Extract Audio (or Skip to Step 2)
Most transcription tools accept video directly. Only extract audio if you hit file size limits or your tool is audio-only.
When to extract audio first
- Video file is larger than 2 GB (most AI uploaders cap at 2 GB)
- You only have the audio portion (podcast workflow)
- Tool only accepts audio (most legacy tools)
When to skip
- Video is under 2 GB and the tool accepts video (NovaScribe, Descript)
- You want to preserve video timestamps
- Modern subtitle workflow: upload the original MP4
Extraction methods
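FFmpeg: ffmpeg -i input.mp4 -vn -acodec copy audio.aac
No re-encoding, keeps original audio quality, completes in seconds. Stream copy to .aac assumes the source audio track is already AAC, which is the case for most MP4 files.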
Open video → Audio Source → Export as MP3 or AAC. Free, cross-platform, no command line.
CloudConvert, Convertio — upload video, select audio format, download. Slower but zero setup. Note: don't use for confidential content.
NovaScribe and Descript accept video files directly and handle audio extraction internally. Simplest workflow.
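For scripted workflows, the extract-or-skip decision and the FFmpeg command above can be wrapped in a few lines of Python. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, the 2 GB threshold is simply the cap quoted above (check your tool's actual limit), and it assumes ffmpeg is installed and on PATH.

```python
import shutil
import subprocess

# Assumption: the 2 GB upload cap quoted in this guide; real limits vary by tool.
SIZE_CAP_BYTES = 2 * 1024**3


def should_extract_audio(video_size_bytes: int, tool_accepts_video: bool) -> bool:
    """Return True when extracting audio first makes sense."""
    return (not tool_accepts_video) or video_size_bytes > SIZE_CAP_BYTES


def build_extract_command(video: str, audio: str) -> list[str]:
    """Build the stream-copy FFmpeg command (no re-encoding, video dropped)."""
    return ["ffmpeg", "-i", video, "-vn", "-acodec", "copy", audio]


def extract_audio(video: str, audio: str) -> None:
    """Run FFmpeg; raises CalledProcessError if the extraction fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_extract_command(video, audio), check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.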
Step 2: Transcribe with AI
Upload your audio or video to an AI transcription tool. Takes 2–5 minutes for a 60-minute video on most platforms.
Tool choice by video type
| Video Type | Best Tool | Why |
|---|---|---|
| YouTube long-form (10+ min) | NovaScribe + YouTube upload | $0.24/hr + YouTube auto-sync |
| Instagram/TikTok Reels | CapCut or platform auto-caption | Platform-native, 60s clips |
| Online course (2–10 hrs) | TurboScribe unlimited | No caps, batch upload |
| Live webinar recording | Otter.ai + post-edit | Already transcribed live; just export |
| Professional client video | Descript or NovaScribe + human review | Quality + SRT export |
| Multilingual video | NovaScribe + translation | 100+ languages |
Transcription process
1. Upload the video/audio file to your chosen tool
2. Wait 2–5 minutes (processing speed relative to real time varies by tool)
3. Review the transcript in the editor and fix the 4–8% of words the AI gets wrong
4. Check speaker labels if multi-speaker (may need manual assignment)
5. Click export
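To put a number on how much the review step changed, you can compare the AI draft against your corrected transcript. One common measure is word error rate (WER), with accuracy taken as 1 minus WER. The guide does not say how its accuracy figures were measured, so treating them as 1 minus WER is an assumption, and the function name below is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Classic dynamic-programming edit distance, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the draft
            insertion = curr[j - 1] + 1  # extra word in the draft
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


corrected = "the quick brown fox jumps"
ai_draft = "the quick brown box jumps"
accuracy = 1 - word_error_rate(corrected, ai_draft)  # one wrong word out of five
```

This sketch ignores punctuation and casing differences only to the extent that lowercasing handles them; strip punctuation first if you want those excluded.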
Step 3: Export as SRT or VTT
Multiple subtitle formats exist. SRT and VTT cover 95% of use cases. Export both — most tools generate them simultaneously.
| Format | Extension | Use For | Styling | Support |
|---|---|---|---|---|
| SRT | .srt | Most video players | Basic | Universal |
| VTT | .vtt | HTML5 web video | Yes | Modern browsers |
| SBV | .sbv | YouTube (legacy) | No | YouTube only |
| TTML | .ttml | Broadcast TV | Full | Professional |
| ASS/SSA | .ass | Anime/Gaming | Advanced | Specialized |
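If a tool exports only SRT, a basic VTT file can usually be derived from it, because the two formats differ mainly in the WEBVTT header line and the decimal separator in timestamps. The sketch below assumes well-formed SRT input without styling tags; the function name is illustrative and it is not a full parser.

```python
import re

# An SRT timestamp looks like 00:00:01,000; WebVTT uses a dot: 00:00:01.000
_TIMESTAMP = re.compile(r"(\d{2,}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text."""
    cleaned = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    converted = [
        # Only touch timing lines so commas in the dialogue are left alone.
        _TIMESTAMP.sub(r"\1.\2", line) if "-->" in line else line
        for line in cleaned.split("\n")
    ]
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"
```

The numeric cue counters from the SRT are kept, since WebVTT allows them as optional cue identifiers.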
Step 4: Attach Subtitles to Video
Four attachment methods, ranked by ease and use case.
1. YouTube (easiest)
Difficulty: Easy
- Upload video to YouTube (via Studio)
- Go to Subtitles in YouTube Studio
- Upload your .srt or .vtt file
- Choose language, review timing, publish
YouTube auto-generates captions too, but accuracy is 85–88% vs 92–96% from dedicated AI tools. Always upload your own for institutional content.
2. Embed as soft subtitles (recommended for most)
Difficulty: Medium
- Premiere: Import SRT → Captions track → Embed in export
- Final Cut: Add captions track → Import SRT
- DaVinci Resolve: Import → Create captions from imported file
- Export video with caption track enabled
Soft subtitles let viewers toggle captions on/off. Best choice for web video, course platforms, and YouTube alternatives.
3. Burn-in (hardcoded) subtitles
Difficulty: Easy (FFmpeg) / Medium (GUI)
- FFmpeg: ffmpeg -i video.mp4 -vf subtitles=captions.srt output.mp4
- HandBrake: Add SRT → enable "Burned In" checkbox
- Premiere: Render captions on top of video
Use burn-in when: Instagram Reels, TikTok, or other platforms that strip soft subs. Downside: subtitles are permanent and can't be turned off or translated later.
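The FFmpeg burn-in command above can be generated per video when you process a batch. The sketch below also builds a soft-subtitle variant for MP4 for comparison; that second command is not from this guide and is included as an assumption about a typical FFmpeg setup (mov_text is FFmpeg's subtitle codec for MP4). Function names are hypothetical, and the burn-in filter string assumes a simple file name with no characters that need escaping inside an FFmpeg filter.

```python
def build_burn_in_command(video: str, srt: str, output: str) -> list[str]:
    """Hardcode subtitles into the picture; the video is re-encoded."""
    return ["ffmpeg", "-i", video, "-vf", f"subtitles={srt}", output]


def build_soft_sub_command(video: str, srt: str, output_mp4: str) -> list[str]:
    """Mux the SRT as a toggleable track; audio and video are stream-copied."""
    return [
        "ffmpeg", "-i", video, "-i", srt,
        "-c", "copy",        # copy streams without re-encoding...
        "-c:s", "mov_text",  # ...except subtitles, which MP4 stores as mov_text
        output_mp4,
    ]
```

Either list can be passed to subprocess.run(..., check=True). Burn-in requires an FFmpeg build with subtitle rendering (libass) support.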
4. Separate sidecar file
Difficulty: Easy
- Name SRT identically to video: video.mp4 + video.srt
- Distribute both files together
- VLC, Plex, modern TVs auto-load same-named .srt
Best for: technical viewers, self-hosted media servers, archival distribution. Not for platforms like YouTube or Instagram.
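Because sidecar loading depends on exact name matching, a quick check before distributing a folder can catch missing or misnamed files. This is a small sketch under stated assumptions: the function name and the list of video extensions are illustrative, and it only compares names, not subtitle contents.

```python
from pathlib import PurePath

# Assumption: an illustrative, non-exhaustive set of video extensions.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov"}


def missing_sidecars(filenames: list[str]) -> list[str]:
    """Return the video files that have no same-named .srt next to them."""
    paths = [PurePath(name) for name in filenames]
    srt_stems = {p.stem for p in paths if p.suffix.lower() == ".srt"}
    return [
        p.name
        for p in paths
        if p.suffix.lower() in VIDEO_EXTENSIONS and p.stem not in srt_stems
    ]


print(missing_sidecars(["video.mp4", "video.srt", "talk.mkv"]))
```

In practice the input list could come from os.listdir() on the folder you plan to ship.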
Quality Checklist Before Publishing
Verify before shipping — especially for institutional, educational, or professional content.
Accessibility Compliance (ADA, FCC, WCAG)
Subtitles aren't just a nice-to-have — they're legally required for many institutional publishers. Here's what each standard actually requires.
ADA (US, public-facing content)
Captions required on public video content. AI auto-captions = "good faith effort" baseline but not full compliance.
Recommendation: AI draft plus human review for institutional publishers.
FCC (broadcast TV)
99%+ accuracy required for broadcast television captions.
Recommendation: AI alone is insufficient. Standard practice: AI + human review + certified captioner for compliance sign-off.
YouTube auto-captions
Acceptable for ADA baseline, not for FCC.
Recommendation: Always upload a reviewed SRT for institutional content; don't rely on YouTube's auto-generated captions alone.
WCAG 2.1 AA (web)
Captions synchronized within 0.25 seconds of spoken word.
Recommendation: Any good AI subtitle output meets this. Verify timing in checklist above.
Section 504 / ADA (education)
Accurate captions on course video content required.
Recommendation: Best practice for universities: AI first pass + faculty/TA review before publishing to LMS.
Multilingual Subtitles
Publishing for international audiences? Here's the multi-language subtitle workflow.
1. Generate English subtitles first
English has the highest AI accuracy, so use it as your source-of-truth baseline.
2. Use AI translation for other languages
NovaScribe supports translation into 100+ languages from any source language. Translate the English SRT to the target languages.
3. Native speaker review strongly recommended
AI translation is 85–92% accurate for European languages and 75–85% for Asian languages. Have a native speaker review critical content.
4. Export a separate SRT per language
Naming convention: video.en.srt, video.es.srt, video.pt.srt. Upload each as a separate track.
5. YouTube: add each language as a separate subtitle track
YouTube Studio → Subtitles → Add Language → Upload SRT. Viewers select the language via the settings gear.
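The per-language naming convention above is easy to automate when exporting many tracks. A minimal sketch, with hypothetical function names, assuming two-letter language codes like the ones in the example:

```python
from pathlib import PurePath


def subtitle_track_name(video_filename: str, lang_code: str) -> str:
    """Build a name like video.en.srt from video.mp4 and a language code."""
    stem = PurePath(video_filename).stem
    return f"{stem}.{lang_code.lower()}.srt"


def track_names(video_filename: str, lang_codes: list[str]) -> dict[str, str]:
    """Map each language code to its subtitle file name."""
    return {code: subtitle_track_name(video_filename, code) for code in lang_codes}


tracks = track_names("video.mp4", ["en", "es", "pt"])
```

Keeping the language code in the file name makes it harder to upload a track under the wrong language.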
Cost of Subtitle Generation
YouTube auto-captions, CapCut auto-captions (60s clips), Whisper self-hosted. Quality varies.
NovaScribe Starter $2 (200 min), Pro $10 (2,500 min). TurboScribe $10 unlimited. 92–96% accuracy.
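52 videos × 10 min = 520 min/year. That volume fits the NovaScribe Starter plan ($2/month, about $24/year); at the Pro per-minute rate ($10 for 2,500 min) it works out to roughly $2 of transcription.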
A typical 10-minute YouTube video costs ~$0.04 to subtitle on NovaScribe Pro. For full pricing details, see our podcast transcription cost guide (same math applies to video).
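The per-video figure follows directly from the plan prices listed above. The short calculation below reproduces it, using only the prices and minute allowances quoted in this guide; the variable names are illustrative, and actual billing terms may differ.

```python
def rate_per_minute(plan_price_usd: float, plan_minutes: int) -> float:
    """Effective cost per transcribed minute if the plan is fully used."""
    return plan_price_usd / plan_minutes


# NovaScribe Pro as quoted above: $10 for 2,500 minutes ($0.24 per hour).
pro_rate = rate_per_minute(10, 2500)

video_cost = pro_rate * 10                # one 10-minute video
yearly_minutes = 52 * 10                  # one 10-minute video per week
yearly_usage_cost = pro_rate * yearly_minutes

# Starter as quoted above: $2/month for 200 minutes.
starter_yearly_subscription = 2 * 12
fits_starter = yearly_minutes / 12 <= 200  # average monthly minutes vs. allowance
```

The per-minute rate only holds if you use the full allowance; at low volume the subscription price, not the usage rate, is what you actually pay.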
Common Mistakes to Avoid
✗ Using auto-generated YouTube captions as final
YouTube auto-captions are 85–88% accurate. Dedicated AI tools are 92–96%. The extra 4–11 percentage points matter for professional content.
✗ Not previewing subtitles before publishing
Timing errors are only visible when watching. A 5-minute preview saves embarrassment.
✗ Burn-in when soft subs would work
Burn-in is irreversible. Don't do it unless your target platform (Instagram Reels, certain TikTok formats) strips soft subs.
✗ Ignoring character-per-line limits
Lines over 42 characters break awkwardly on mobile. Tools rarely enforce this, so you have to check.
✗ Skipping speaker labels in multi-speaker content
Missing labels confuse viewers. Adding them takes almost no time and makes content look more professional.
✗ Wrong aspect ratio planning
Subtitles can be cropped off on vertical (9:16) or square (1:1) video if not positioned correctly. Test on the target platform.
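Since tools rarely enforce the 42-character limit mentioned above, a short script can flag overlong lines before you publish. This is a rough sketch that assumes well-formed SRT (counter line, timing line, then text lines, with blank lines between cues); the function name is hypothetical.

```python
MAX_CHARS_PER_LINE = 42  # the per-line limit cited in this guide


def long_lines(srt_text: str, limit: int = MAX_CHARS_PER_LINE) -> list[tuple[int, str]]:
    """Return (cue_number, line) for every subtitle text line over the limit."""
    problems = []
    for block in srt_text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue  # skip blocks that do not look like a cue
        cue_number = int(lines[0])
        for text in lines[2:]:  # lines[1] is the timing line
            if len(text) > limit:
                problems.append((cue_number, text))
    return problems
```

An empty result means every line fits; anything returned should be re-broken by hand at a natural phrase boundary.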
Generate subtitles in 5 minutes
NovaScribe exports SRT, VTT, TXT, and DOCX from any video. $2/month covers 200 minutes — enough for ~20 short YouTube videos.
Frequently Asked Questions
How do I generate subtitles from a video automatically?
The 4-step workflow: (1) Extract audio from video or upload video directly to an AI tool, (2) Transcribe with AI (NovaScribe, TurboScribe, Descript), (3) Export as SRT or VTT, (4) Attach to video via YouTube upload, embed as soft subs, burn-in with FFmpeg, or deliver as sidecar file. Full process takes about 5 minutes for a standard 10-minute video.
What's the best free subtitle generator?
YouTube Studio for YouTube videos (built-in auto-captions, 85–88% accuracy). CapCut for short-form video content. Whisper self-hosted for unlimited use with 92–96% accuracy (requires technical setup). TurboScribe free tier (3 files/month) is the best free option for occasional standalone SRT generation.
How accurate are auto-generated subtitles?
Dedicated AI tools (NovaScribe, TurboScribe, Descript): 92–96% on clear audio, 85–90% on noisy audio. YouTube's built-in auto-captions: 85–88%. Accuracy is highest with clean studio audio, single speaker, and neutral English accent. Accented speech and multi-speaker content reduce accuracy by 5–10%.
Can I generate subtitles in another language?
Yes. NovaScribe supports transcription in 100+ languages natively. For multilingual subtitles, the recommended workflow: generate English subtitles first (highest accuracy), then use AI translation for target languages, then native speaker review for critical content. Upload each language as a separate subtitle track on YouTube.
SRT or VTT: which format should I use?
Use SRT for universal compatibility — nearly all video players, editors (Premiere, Final Cut, DaVinci Resolve), and platforms (YouTube, Vimeo) support SRT. Use VTT for modern HTML5 web video with styling features. Most transcription tools export both simultaneously, so you don’t have to choose.
How long does it take to subtitle a 1-hour video?
With AI transcription: roughly 4–8 minutes (2–5 minutes to process + 2–3 minutes to review and attach to video). With human transcription: 12–48 hours turnaround at $90–$180 cost. For most use cases, AI is the right tradeoff. For legal or broadcast compliance (99%+ accuracy), human transcription is still the standard.
Do I need to burn in subtitles?
Only if the target platform strips soft subtitles. Instagram Reels and certain TikTok formats require burned-in (hardcoded) captions. For YouTube, Vimeo, and most platforms, soft subtitles are better because viewers can toggle them on/off and you can add subtitles in multiple languages. Don’t burn in unless you have to.
Are AI-generated subtitles ADA compliant?
AI subtitles count as a "good faith effort" baseline under ADA. For full compliance, especially for institutional publishers (universities, government, large enterprises), human review of AI-generated subtitles is recommended. For FCC broadcast captions, human review is required — 99%+ accuracy cannot be consistently achieved by AI alone.