
How to Generate Subtitles from Video (2026 Complete Guide)

4-step workflow: extract audio → AI transcribe → export SRT/VTT → attach to video. Takes 5 minutes. Works on any video length. $0–$10/month depending on volume.

By NovaScribe Editorial · Updated April 2026

Generate Subtitles from Video in 5 Minutes

  1. Extract audio (0–2 min). Or skip: most AI tools accept video directly.
  2. AI transcribe (2–5 min). Upload to NovaScribe, TurboScribe, or Descript.
  3. Export SRT/VTT (instant). Both formats from the same transcript.
  4. Attach to video (1–3 min). YouTube, soft subs, burn-in, or sidecar.

AI subtitle generation in 2026 achieves 92–96% accuracy on clean audio. For most creators, that's sufficient for YouTube uploads and social media content. For institutional use (courses, broadcast), human review is recommended after AI draft. See our accuracy verdict for a full rubric.

Step 1: Extract Audio (or Skip to Step 2)

Most transcription tools accept video directly. Only extract audio if you hit file size limits or your tool is audio-only.

When to extract audio first

  • Video file is > 2GB (most AI uploaders cap at 2GB)
  • You only have the audio portion (podcast workflow)
  • Tool only accepts audio (most legacy tools)

When to skip

  • Video < 2GB and tool accepts video (NovaScribe, Descript)
  • You want to preserve video timestamps
  • Modern subtitle workflow: upload the original MP4

Extraction methods

FFmpeg (technical — fastest)
ffmpeg -i input.mp4 -vn -acodec copy audio.aac

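No re-encoding, so it keeps the original audio quality and finishes in seconds. The .aac output assumes the source audio track is AAC, which is typical for MP4; for other codecs, match the output extension to the codec or re-encode.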
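If you script the extraction, the command can be assembled in Python. This is a minimal sketch, not part of any tool named here: `build_extract_cmd` is a hypothetical helper, and the re-encode fallback (AAC at 128k in an .m4a file) is an assumption for sources whose audio is not already AAC. It only builds the argument list; pass the result to `subprocess.run` to execute it.

```python
import shlex
from pathlib import Path


def build_extract_cmd(video: str, copy_stream: bool = True) -> list[str]:
    """Return an ffmpeg argument list that extracts the audio track of `video`."""
    src = Path(video)
    if copy_stream:
        # Stream copy: no re-encoding. A raw .aac output only works when the
        # source audio is already AAC (typical for MP4, but not guaranteed).
        out = src.with_suffix(".aac")
        codec_args = ["-acodec", "copy"]
    else:
        # Assumed fallback: re-encode to AAC inside an .m4a container.
        out = src.with_suffix(".m4a")
        codec_args = ["-c:a", "aac", "-b:a", "128k"]
    # -vn drops the video stream so only audio is written.
    return ["ffmpeg", "-i", str(src), "-vn", *codec_args, str(out)]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_extract_cmd("input.mp4")))
```

Building the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces.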

HandBrake (GUI)

Open video → Audio Source → Export as MP3 or AAC. Free, cross-platform, no command line.

Online tools (easiest for one-offs)

CloudConvert, Convertio — upload video, select audio format, download. Slower but zero setup. Note: don't use for confidential content.

Built into transcription tools

NovaScribe and Descript accept video files directly and handle audio extraction internally. Simplest workflow.

Step 2: Transcribe with AI

Upload your audio or video to an AI transcription tool. Takes 2–5 minutes for a 60-minute video on most platforms.

Tool choice by video type

| Video Type | Best Tool | Why |
|---|---|---|
| YouTube long-form (10+ min) | NovaScribe + YouTube upload | $0.24/hr + YouTube auto-sync |
| Instagram/TikTok Reels | CapCut or platform auto-caption | Platform-native, 60s clips |
| Online course (2–10 hrs) | TurboScribe unlimited | No caps, batch upload |
| Live webinar recording | Otter.ai + post-edit | Was live transcribed, just export |
| Professional client video | Descript or NovaScribe + human review | Quality + SRT export |
| Multilingual video | NovaScribe + translation | 100+ languages |

Transcription process

  1. Upload video/audio file to your chosen tool
  2. Wait 2–5 minutes (real-time multiplier varies by tool)
  3. Review transcript in editor and fix the 4–8% of words AI gets wrong (the flip side of 92–96% accuracy)
  4. Check speaker labels if multi-speaker (may need manual assignment)
  5. Click export

Pro tip: For videos with specialized vocabulary (medical, legal, technical), AI tends to mis-transcribe the same domain terms repeatedly. Before exporting, do a find-and-replace pass for the common terms in your niche.
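The find-and-replace pass can be scripted once you know which terms your tool gets wrong. The sketch below is illustrative only: `apply_glossary` is a hypothetical helper and the example corrections are made up, so build your own list from the errors you actually see.

```python
import re


def apply_glossary(transcript: str, corrections: dict[str, str]) -> str:
    """Replace each mis-transcribed term with its correction, whole words only."""
    for wrong, right in corrections.items():
        # \b limits matches to whole words; IGNORECASE also catches
        # capitalized variants at the start of a sentence.
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE)
        # A function replacement inserts `right` literally, even if it
        # contains backslashes.
        transcript = pattern.sub(lambda _match, r=right: r, transcript)
    return transcript


if __name__ == "__main__":
    # Hypothetical corrections for a medical transcript.
    fixes = {"a fib": "AFib", "met formin": "metformin"}
    print(apply_glossary("Patient has a fib and takes Met Formin daily.", fixes))
```

Run it on the exported transcript text before generating SRT/VTT, then skim the result, since a whole-word match can still hit a legitimate phrase.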

Step 3: Export as SRT or VTT

Multiple subtitle formats exist. SRT and VTT cover 95% of use cases. Export both — most tools generate them simultaneously.

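| Format | Extension | Use For | Styling | Support |
|---|---|---|---|---|
| SRT | .srt | Most video players | Basic | Universal |
| VTT | .vtt | HTML5 web video | Yes | Modern browsers |
| SBV | .sbv | YouTube (legacy) | No | YouTube only |
| TTML | .ttml | Broadcast TV | Full | Professional |
| ASS/SSA | .ass | Anime/Gaming | Advanced | Specialized |
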
Recommendation: Export both SRT and VTT from your transcription tool. Use SRT for video editors (Premiere, Final Cut, DaVinci Resolve). Use VTT for HTML5 web video and modern web players. Our SRT generator and subtitle generator produce both formats simultaneously.
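If your tool only exports SRT, the two formats are close enough that a basic conversion is a few lines. The sketch below assumes a well-formed SRT file; `srt_to_vtt` is a hypothetical helper. It adds the `WEBVTT` header and switches the millisecond separator from a comma to a period on timing lines, and it does not add any VTT styling.

```python
import re

# SRT writes 00:00:01,000; WebVTT writes 00:00:01.000.
_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to a minimal WebVTT document."""
    # Drop a byte-order mark and normalize Windows line endings.
    cleaned = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    converted = [
        # Only touch timing lines, so commas in caption text are left alone.
        _TIMESTAMP.sub(r"\1.\2", line) if "-->" in line else line
        for line in cleaned.split("\n")
    ]
    # SRT cue numbers are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nHello, world.\n"
    print(srt_to_vtt(sample))
```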

Step 4: Attach Subtitles to Video

Four attachment methods, ranked by ease and use case.

1. YouTube (easiest)

Easy
  1. Upload video to YouTube (via YouTube Studio)
  2. Go to Subtitles in YouTube Studio
  3. Upload your .srt or .vtt file
  4. Choose language, review timing, publish

YouTube auto-generates captions too, but accuracy is 85–88% vs 92–96% from dedicated AI tools. Always upload your own for institutional content.

2. Embed as soft subtitles (recommended for most)

Medium
  1. Premiere: Import SRT → Captions track → Embed in export
  2. Final Cut: Add captions track → Import SRT
  3. DaVinci Resolve: Import → Create captions from imported file
  4. Export video with caption track enabled

Soft subtitles let viewers toggle captions on/off. Best choice for web video, course platforms, and YouTube alternatives.

3. Burn-in (hardcoded) subtitles

Easy (FFmpeg) / Medium (GUI)
  1. FFmpeg: ffmpeg -i video.mp4 -vf subtitles=captions.srt output.mp4
  2. HandBrake: Add SRT → enable "Burned In" checkbox
  3. Premiere: Render captions on top of video

Use burn-in when: Instagram Reels, TikTok, or other platforms that strip soft subs. Downside: subtitles are permanent and can't be turned off or translated later.
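For batch burn-in, the FFmpeg command above can be generated per file. This is a sketch under assumptions: `build_burn_in_cmd` is a hypothetical helper, the optional `-c:a copy` (keep the audio untouched while the video is re-encoded) is an addition not in the command above, and subtitle paths containing characters that are special inside an FFmpeg filter argument are rejected rather than escaped.

```python
import shlex

# Characters with special meaning inside an ffmpeg filter argument.
_FILTER_SPECIALS = set("\\':,[];")


def build_burn_in_cmd(video: str, srt: str, output: str,
                      copy_audio: bool = True) -> list[str]:
    """Return an ffmpeg argument list that hardcodes `srt` into `video`."""
    if any(ch in _FILTER_SPECIALS for ch in srt):
        # Escaping rules for filter arguments are easy to get wrong, so this
        # sketch asks for a simple relative file name instead.
        raise ValueError(f"rename the subtitle file to a simple name: {srt!r}")
    cmd = ["ffmpeg", "-i", video, "-vf", f"subtitles={srt}"]
    if copy_audio:
        # Burn-in only changes the picture, so the audio can be stream-copied.
        cmd += ["-c:a", "copy"]
    return cmd + [output]


if __name__ == "__main__":
    print(shlex.join(build_burn_in_cmd("video.mp4", "captions.srt", "output.mp4")))
```

Keep the original video: the burned-in output cannot be reverted.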

4. Separate sidecar file

Easy
  1. Name SRT identically to video: video.mp4 + video.srt
  2. Distribute both files together
  3. VLC, Plex, modern TVs auto-load same-named .srt

Best for: technical viewers, self-hosted media servers, archival distribution. Not for platforms like YouTube or Instagram.

Quality Checklist Before Publishing

Verify before shipping — especially for institutional, educational, or professional content.

  • Timing: Subtitles appear within ±0.25 seconds of the spoken word
  • Duration: Each subtitle visible minimum 1 second, max 6 seconds
  • Line length: Max 42 characters per line, 2 lines max per caption
  • Reading speed: Max 17 characters per second (slower for kids)
  • Accuracy: Spot-check 5 random subtitles for correct words and correct speaker
  • Speaker labels: Include [Speaker 1] or - [Name]: for multi-speaker content
  • Non-speech cues: Include [music], [applause], [laughter] for accessibility
  • Sound effects: Add important audio cues relevant for understanding
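The numeric items in this checklist (duration, line length, line count, reading speed) can be checked automatically. The sketch below assumes a well-formed SRT file and counts every character, spaces included, toward reading speed; `parse_srt` and `check_cue` are hypothetical helpers. Timing against the audio and transcription accuracy still need a human preview.

```python
import re
from dataclasses import dataclass

_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    lines: list[str]


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str) -> list[Cue]:
    """Split SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        rows = block.split("\n")
        for i, row in enumerate(rows):
            match = _TIMING.search(row)
            if match:
                g = match.groups()
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), rows[i + 1:]))
                break
    return cues


def check_cue(cue: Cue) -> list[str]:
    """Return the checklist rules this cue breaks (empty list if none)."""
    problems = []
    duration = cue.end - cue.start
    chars = sum(len(line) for line in cue.lines)
    if duration < 1.0:
        problems.append("shorter than 1 second")
    if duration > 6.0:
        problems.append("longer than 6 seconds")
    if len(cue.lines) > 2:
        problems.append("more than 2 lines")
    if any(len(line) > 42 for line in cue.lines):
        problems.append("line over 42 characters")
    if duration > 0 and chars / duration > 17:
        problems.append("reading speed over 17 characters per second")
    return problems


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nHello, world.\n"
    for number, cue in enumerate(parse_srt(sample), start=1):
        print(number, check_cue(cue) or "ok")
```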

Accessibility Compliance (ADA, FCC, WCAG)

Subtitles aren't just a nice-to-have — they're legally required for many institutional publishers. Here's what each standard actually requires.

ADA (US, public-facing content)

Captions required on public video content. AI auto-captions = "good faith effort" baseline but not full compliance.

Recommendation: AI draft + human review for institutional publishers.

FCC (broadcast TV)

99%+ accuracy required for broadcast television captions.

Recommendation: AI alone insufficient. Standard practice: AI + human review + certified captioner for compliance sign-off.

YouTube auto-captions

Acceptable for ADA baseline, not for FCC.

Recommendation: Always upload a reviewed SRT for institutional content; don't rely on YouTube's auto-generated captions alone.

WCAG 2.1 AA (web)

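Captions required for prerecorded and live video, synchronized with the audio. WCAG itself sets no numeric tolerance; within 0.25 seconds of the spoken word is a common working target.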

Recommendation: Any good AI subtitle output meets this. Verify timing in checklist above.

Section 504 / ADA (education)

Accurate captions on course video content required.

Recommendation: Best practice for universities: AI first pass + faculty/TA review before publishing to LMS.

Multilingual Subtitles

Publishing for international audiences? Here's the multi-language subtitle workflow.

  1. Generate English subtitles first

    English has the highest AI accuracy, so use it as your source-of-truth baseline.

  2. Use AI translation for other languages

    NovaScribe supports 100+ language translation from any source. Translate the English SRT to target languages.

  3. Native speaker review strongly recommended

    AI translation is 85–92% accurate for European languages, 75–85% for Asian languages. Get native review for critical content.

  4. Export separate SRT per language

    Naming convention: video.en.srt, video.es.srt, video.pt.srt. Upload each as a separate track.

  5. YouTube: add each language as separate subtitle track

    YouTube Studio → Subtitles → Add Language → Upload SRT. Viewers select language via settings gear.
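The per-language naming convention is easy to generate rather than type by hand. A minimal sketch, assuming two-letter language codes as in the examples above; `sidecar_name` is a hypothetical helper that returns only the file name, not a full path.

```python
from pathlib import Path
from typing import Optional


def sidecar_name(video: str, lang: Optional[str] = None, ext: str = "srt") -> str:
    """Build a subtitle file name that matches the video's base name."""
    stem = Path(video).stem  # "video.mp4" -> "video"
    parts = [stem] + ([lang] if lang else []) + [ext]
    return ".".join(parts)


if __name__ == "__main__":
    for code in ("en", "es", "pt"):
        print(sidecar_name("video.mp4", code))
```

Called without a language code, the same helper yields the plain same-named sidecar (video.srt) that players such as VLC and Plex pick up automatically.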

For language-specific tool comparisons, see our multilingual transcription comparison.

Cost of Subtitle Generation

$0
Free options

YouTube auto-captions, CapCut auto-captions (60s clips), Whisper self-hosted. Quality varies.

$2–$10/mo
Best value

NovaScribe Starter $2 (200 min), Pro $10 (2,500 min). TurboScribe $10 unlimited. 92–96% accuracy.

~$24/year
Weekly YouTube channel

52 videos × 10 min = 520 min/year, about 43 min/month. That fits within NovaScribe Starter's 200 minutes, so the yearly cost is $2 × 12 = ~$24.

A typical 10-minute YouTube video costs ~$0.04 to subtitle on NovaScribe Pro. For full pricing details, see our podcast transcription cost guide (same math applies to video).
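The arithmetic behind these figures can be reproduced from the plan prices quoted above. This is a sketch for checking your own volume, not a pricing API: `PLANS`, `cost_per_minute`, and `cheapest_plan` are hypothetical names, and the numbers are copied from this guide, so confirm current pricing before relying on them.

```python
# (USD per month, included minutes), as quoted in this guide.
PLANS = {"Starter": (2.00, 200), "Pro": (10.00, 2500)}


def cost_per_minute(plan: str) -> float:
    """Effective price per minute if the plan's minutes are fully used."""
    price, minutes = PLANS[plan]
    return price / minutes


def cheapest_plan(minutes_per_month: float):
    """Return the lowest-priced plan that covers the monthly volume, or None."""
    fitting = [
        (price, name)
        for name, (price, minutes) in PLANS.items()
        if minutes >= minutes_per_month
    ]
    return min(fitting)[1] if fitting else None


if __name__ == "__main__":
    print(f"10-minute video on Pro: ${cost_per_minute('Pro') * 10:.2f}")
    monthly = 52 * 10 / 12  # weekly 10-minute video
    plan = cheapest_plan(monthly)
    print(f"{monthly:.0f} min/month fits {plan}: ${PLANS[plan][0] * 12:.0f}/year")
```

The per-minute rate only holds when a plan's minutes are fully used; a low-volume channel pays the flat monthly price regardless, which is why the weekly-channel example lands on the smaller plan.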

Common Mistakes to Avoid

Using auto-generated YouTube captions as final

YouTube auto-captions are 85–88% accurate. Dedicated AI tools are 92–96%. The extra 5–10% accuracy matters for professional content.

Not previewing subtitles before publishing

Timing errors are only visible when watching. A 5-minute preview saves embarrassment.

Burn-in when soft subs would work

Burn-in is irreversible. Don't do it unless your target platform (Instagram Reels, certain TikTok formats) strips soft subs.

Ignoring character-per-line limits

Over 42 characters per line breaks awkwardly on mobile. Tools rarely enforce this — you have to check.

Skipping speaker labels in multi-speaker content

Missing labels confuse viewers. Adding them takes almost no time and makes the content look more professional.

Wrong aspect ratio planning

Subtitles can be cropped off on vertical (9:16) or square (1:1) video if not positioned correctly. Test on target platform.

Generate subtitles in 5 minutes

NovaScribe exports SRT, VTT, TXT, and DOCX from any video. $2/month covers 200 minutes — enough for ~20 short YouTube videos.



Frequently Asked Questions

How do I generate subtitles from a video automatically?

The 4-step workflow: (1) Extract audio from video or upload video directly to an AI tool, (2) Transcribe with AI (NovaScribe, TurboScribe, Descript), (3) Export as SRT or VTT, (4) Attach to video via YouTube upload, embed as soft subs, burn-in with FFmpeg, or deliver as sidecar file. Full process takes about 5 minutes for a standard 10-minute video.

What's the best free subtitle generator?

YouTube Studio for YouTube videos (built-in auto-captions, 85–88% accuracy). CapCut for short-form video content. Whisper self-hosted for unlimited use with 92–96% accuracy (requires technical setup). TurboScribe free tier (3 files/month) is the best free option for occasional standalone SRT generation.

How accurate are auto-generated subtitles?

Dedicated AI tools (NovaScribe, TurboScribe, Descript): 92–96% on clear audio, 85–90% on noisy audio. YouTube's built-in auto-captions: 85–88%. Accuracy is highest with clean studio audio, single speaker, and neutral English accent. Accented speech and multi-speaker content reduce accuracy by 5–10%.

Can I generate subtitles in another language?

Yes. NovaScribe supports transcription in 100+ languages natively. For multilingual subtitles, the recommended workflow: generate English subtitles first (highest accuracy), then use AI translation for target languages, then native speaker review for critical content. Upload each language as a separate subtitle track on YouTube.

SRT or VTT: which format should I use?

Use SRT for universal compatibility — nearly all video players, editors (Premiere, Final Cut, DaVinci Resolve), and platforms (YouTube, Vimeo) support SRT. Use VTT for modern HTML5 web video with styling features. Most transcription tools export both simultaneously, so you don’t have to choose.

How long does it take to subtitle a 1-hour video?

With AI transcription: roughly 4–8 minutes (2–5 minutes to process + 2–3 minutes to spot-check and attach to video). With human transcription: 12–48 hours turnaround at $90–$180 cost. For most use cases, AI is the right tradeoff. For legal or broadcast compliance (99%+ accuracy), human transcription is still the standard.

Do I need to burn in subtitles?

Only if the target platform strips soft subtitles. Instagram Reels and certain TikTok formats require burned-in (hardcoded) captions. For YouTube, Vimeo, and most platforms, soft subtitles are better because viewers can toggle them on/off and you can add subtitles in multiple languages. Don’t burn in unless you have to.

Are AI-generated subtitles ADA compliant?

AI subtitles count as a "good faith effort" baseline under ADA. For full compliance, especially for institutional publishers (universities, government, large enterprises), human review of AI-generated subtitles is recommended. For FCC broadcast captions, human review is required, because 99%+ accuracy cannot be consistently achieved by AI alone.