What Is a VTT File? WebVTT Format Explained + Sample + How to Create One (2026)
A WebVTT file (Web Video Text Tracks Format, .vtt extension, MIME type text/vtt) is a plain-text UTF-8 subtitle format published as a W3C Community Group Report (May 2013, latest editor's draft 2024) for use with the HTML5 <track> element (2014). Every file begins with the literal string 'WEBVTT' on line one; cues use hh:mm:ss.mmm timing separated by '-->' (period before milliseconds, unlike SRT's comma). Natively supported by all major browsers (Chrome, Firefox, Safari, Edge). Preferred over SRT for HTML5 web video; SRT remains more universal across desktop players and NLEs.
TL;DR
A WebVTT file (.vtt) is a plain-text UTF-8 subtitle format published by the W3C in May 2013 for HTML5 video. Every file starts with the literal string WEBVTT on line one, followed by cues that combine timing (hh:mm:ss.mmm --> hh:mm:ss.mmm, note the PERIOD before milliseconds) and text. VTT is natively supported by every major browser through the HTML5 <track> element, and it adds CSS styling, positioning cues, and inline classes that SRT can't do. Use VTT for HTML5 web video; use SRT for max compatibility everywhere else.
WebVTT Defined
WebVTT is the Web Video Text Tracks Format — a plain-text subtitle and caption format built specifically for HTML5 video. The name is a mouthful, so most people just say “VTT” or “WebVTT”. Here are the facts you need to know before touching one:
- Full name: Web Video Text Tracks Format
- Extension: .vtt
- MIME type: text/vtt
- Encoding: UTF-8, required by spec; a leading BOM is permitted but best omitted for tool compatibility
- Standardized by: W3C, published as a Community Group Report in May 2013; latest editor's draft (2024) refines cue positioning, styling, and metadata handling
- Designed for: the HTML5 <track> element, which reached W3C Recommendation status in October 2014
- Extends SRT with: CSS-like styling, positioning cues (line / position / align), cue metadata, and NOTE blocks for inline comments
WebVTT was designed to solve a problem SRT ignored: how do you put styled, positioned captions on a native web video without shipping a JavaScript rendering engine? The answer is a text format the browser parses itself, which is exactly what VTT does through the HTML5 <track> element.
References: W3C — WebVTT: The Web Video Text Tracks Format; MDN Web Docs — WebVTT API; WHATWG HTML Living Standard — the track element.
Anatomy of a WebVTT Cue
A WebVTT file is a header line, a blank line, and a sequence of cues separated by blank lines. Each cue combines an optional identifier, a timing line, one or two lines of text, and (optionally) inline settings and formatting. Here's a real, valid WebVTT sample you can save as demo.vtt and load in any HTML5 <video>:
WEBVTT

1
00:00:00.000 --> 00:00:04.500
Welcome to the podcast. Today we're
talking about productivity tips.

2
00:00:04.500 --> 00:00:08.750
Thanks for having me on. I've been
working remotely for five years.

3
00:00:08.750 --> 00:00:12.000 line:90% position:50% align:center
Great to have you. What's your
number one tip?

Breaking down every part
- Header: WEBVTT. Required: without it the file is invalid, and browsers reject it silently.
- Cue identifier (optional): 1, intro, or any string without -->. Useful for CSS ::cue(#intro) targeting or JavaScript access.
- Timing line: hh:mm:ss.mmm --> hh:mm:ss.mmm. Note the period before milliseconds, unlike SRT's comma. The arrow --> is two hyphens plus a greater-than, with a space on either side.
- Cue settings (optional): line:90% (vertical position), position:50% (horizontal offset), align:center (text alignment; older drafts of the spec used middle). SRT has no equivalent.
- Inline formatting: <i>, <b>, <u>, and <c.classname>text</c>, which applies a CSS class defined by the video player.
- Comments: a block that starts with NOTE is ignored by the parser, for example: NOTE This section covers productivity tips

Reference: W3C WebVTT 1, Cues.
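The structure described above can be checked mechanically. The following is a minimal Python sketch, not a spec-complete parser: the names `parse_vtt` and `to_seconds` and the dictionary layout are illustrative assumptions, it only accepts the full hh:mm:ss.mmm timestamp form used in this guide, and it simply skips NOTE blocks and anything else it does not recognize.

```python
import re

# hh:mm:ss.mmm --> hh:mm:ss.mmm, optionally followed by cue settings
TIMING = re.compile(
    r"^(\d{2,}):(\d{2}):(\d{2})\.(\d{3}) --> "
    r"(\d{2,}):(\d{2}):(\d{2})\.(\d{3})(?: (.*))?$"
)


def to_seconds(h, m, s, ms):
    """Convert timestamp parts (as strings) to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(text):
    """Return a list of cue dicts; raise ValueError if the header is missing."""
    text = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    blocks = re.split(r"\n{2,}", text)  # blocks are separated by blank lines
    if not blocks or not blocks[0].startswith("WEBVTT"):
        raise ValueError("line 1 must start with WEBVTT")
    cues = []
    for block in blocks[1:]:
        lines = block.split("\n")
        if lines[0].startswith("NOTE"):
            continue  # comment block
        ident = None
        if "-->" not in lines[0]:
            # First line is an optional cue identifier
            ident, lines = lines[0], lines[1:]
        match = TIMING.match(lines[0]) if lines else None
        if match is None:
            continue  # not a cue this sketch understands
        g = match.groups()
        cues.append({
            "id": ident,
            "start": to_seconds(*g[0:4]),
            "end": to_seconds(*g[4:8]),
            "settings": g[8] or "",
            "text": "\n".join(lines[1:]),
        })
    return cues
```

Calling `parse_vtt` on the contents of demo.vtt should yield one dictionary per cue, with the settings string of the third cue preserved for later inspection.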
VTT vs SRT vs ASS vs DFXP / TTML
Four subtitle formats dominate real-world workflows. Pick based on where the file will play, how much styling you need, and what downstream tools consume it.
| Format | Standardized by | Best for | Styling | Browser support | File size |
|---|---|---|---|---|---|
| SRT (SubRip) | De facto (~2001) | Max compatibility, YouTube, editors | Minimal (bold / italic) | Via player, not native | Smallest |
| WebVTT | W3C (May 2013 CG Report) | HTML5 web video, styled captions | CSS + positioning + inline classes | Native (all major browsers) | Small |
| ASS / SSA | De facto (Advanced SubStation Alpha; no formal standards body) | Anime, karaoke, advanced styling | Full color / position / font control | Not browser-native (VLC etc.) | Medium |
| DFXP / TTML | W3C TTML1 / TTML2 + IMSC 1.2 profile | Netflix, broadcast, professional | Full XML styling | Not browser-native | Larger (XML) |
Source: W3C WebVTT + W3C TTML2 specifications; Netflix Timed Text Style Guide; verified July 2026.
Which format to pick
Pick SRT for max compatibility. It's the format YouTube prefers for uploads, the format video editors (Premiere Pro, DaVinci Resolve, Final Cut, CapCut) import cleanly, and the format desktop players (VLC, MPV, PotPlayer) handle without configuration. If you're delivering to a client or a platform whose subtitle-format policy you don't know, SRT is the safe default. See our SRT format guide for the full spec.
Pick WebVTT when you're serving video through a native HTML5 <video> element with a <track> child. Every major browser parses VTT natively, with no JavaScript library required. Among the formats browsers render natively, VTT is the only one that lets you position cues (top / bottom / percentage-based), style them with CSS via ::cue, and apply inline classes for speaker differentiation.
Pick ASS / SSA only when you need heavy typographic control — karaoke effects, per-character animation, precise font positioning. Aegisub is the standard editor. Web browsers ignore it entirely, so ASS is a hardsub-workflow format for anime, fansubs, and specialized encoding.
Pick DFXP / TTML only when a broadcast or streaming client explicitly requires it. Netflix's Timed Text Style Guide mandates IMSC 1.2 (a W3C TTML profile) for delivery. Outside that world, TTML's XML weight is overkill.
Timing and Formatting Rules
The WebVTT spec itself doesn't enforce reading-speed or character limits — that's a caption-quality standard set by the broadcast and streaming industry. Follow these rules or your captions will fail professional QC and accessibility review.
| Rule | Source / rationale |
|---|---|
| Minimum cue duration: 1 second | Below 1s viewers can't finish reading |
| Maximum cue duration: 7 seconds | Above 7s viewers lose sync with speech |
| Reading speed: 15–20 characters per second | Industry style guides (Netflix caps adult programs at 20 cps); WCAG 2.1 SC 1.2.2 requires captions but sets no reading speed |
| Max 42 characters per line | Netflix Timed Text Style Guide, BBC subtitle guidelines |
| Max 2 lines per cue | Broadcast + streaming standard |
| Minimum 2-frame gap between cues | Prevents cues bleeding together on screen |
Sources: WCAG 2.1 SC 1.2.2 Captions (Prerecorded); Netflix Timed Text Style Guide; BBC Subtitle Guidelines.
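These limits are easy to check automatically before a file goes to QC. A minimal sketch in Python, assuming the thresholds from the table above as defaults; the function name `check_cue` and its parameters are hypothetical, and the 2-frame gap rule is left out because it depends on the video's frame rate.

```python
def check_cue(start, end, text, max_cps=20, max_line_len=42,
              max_lines=2, min_dur=1.0, max_dur=7.0):
    """Return a list of human-readable problems for one cue (empty if none).

    start and end are in seconds; text may contain newline-separated lines.
    """
    problems = []
    duration = end - start
    lines = text.split("\n")
    chars = sum(len(line) for line in lines)

    if duration < min_dur:
        problems.append(f"too short: {duration:.2f}s")
    if duration > max_dur:
        problems.append(f"too long: {duration:.2f}s")
    if duration > 0 and chars / duration > max_cps:
        problems.append(f"reading speed too high: {chars / duration:.1f} cps")
    if len(lines) > max_lines:
        problems.append(f"too many lines: {len(lines)}")
    for number, line in enumerate(lines, start=1):
        if len(line) > max_line_len:
            problems.append(f"line {number} too wide: {len(line)} characters")
    return problems
```

Running this over every cue and printing only the non-empty results gives a quick pre-flight report; the thresholds can be tightened per client style guide.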
Positioning cues — VTT's unique feature
Appended to the timing line, cue settings let you place text anywhere on the video surface. This is the biggest capability gap between VTT and SRT:
- line:X% sets the vertical position (0% = top, 100% = bottom). Use it for captions that need to move around on-screen text.
- position:X% sets the horizontal offset (0% = left edge, 50% = center, 100% = right edge).
- align:start | center | end sets text alignment within the cue box (left and right are also valid; older drafts used middle instead of center).
- size:X% sets the cue box width as a percentage of the video width.
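Cue settings are just space-separated name:value pairs, so they are simple to inspect in code. The sketch below is an illustration under stated assumptions: `parse_settings` and `settings_problems` are hypothetical helper names, the set of setting names and align values follows the current WebVTT spec as I understand it (where center replaced the middle value of older drafts), and values other than align are not validated.

```python
KNOWN_SETTINGS = {"vertical", "line", "position", "size", "align", "region"}
ALIGN_VALUES = {"start", "center", "end", "left", "right"}


def parse_settings(settings):
    """Split a cue-settings string such as 'line:90% align:center' into a dict."""
    result = {}
    for token in settings.split():
        name, sep, value = token.partition(":")
        if sep and name and value:
            result[name] = value
    return result


def settings_problems(settings):
    """Return a list of problems found in a cue-settings string."""
    problems = []
    for name, value in parse_settings(settings).items():
        if name not in KNOWN_SETTINGS:
            problems.append(f"unknown setting: {name}")
        elif name == "align" and value not in ALIGN_VALUES:
            problems.append(f"unsupported align value: {value}")
    return problems
```

A check like this is mainly useful for catching files written against older drafts, where an align value of middle would be flagged.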
Inline CSS class styling
Wrap text in <c.classname>...</c> to apply a CSS class defined by the player. Common uses: colour-coding by speaker (<c.host> vs <c.guest>) or marking non-speech events (<c.sfx>[laughter]</c>). The player targets these with ::cue(.host) { color: yellow; }.
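When cue text is generated by a script, the class span can be added at write time. A small sketch, assuming a hypothetical helper named `with_class`; it escapes the three characters that would otherwise be read as markup inside cue text.

```python
def with_class(text, class_name):
    """Wrap cue text in a WebVTT class span, escaping &, < and >."""
    escaped = (
        text.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
    )
    return f"<c.{class_name}>{escaped}</c>"
```

For example, `with_class("[laughter]", "sfx")` produces the `<c.sfx>[laughter]</c>` span shown above, which the page can then style with a ::cue(.sfx) rule.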
How to Open, Edit, or Create a VTT File
WebVTT is plain text, so every workflow starts with a text editor. Beyond that, tooling scales up with your needs.
Opening a VTT file
Any plain-text editor works. On Windows: Notepad or Notepad++. On Mac: TextEdit (Format → Make Plain Text first) or BBEdit. Cross-platform: VS Code, Sublime Text, or the command line (nano, vim). VLC also plays VTT tracks natively when the file sits next to a video with a matching filename.
Editing a VTT file
For small typos and wording tweaks, a text editor is fine. For retiming or bulk edits, use a subtitle editor. Subtitle Edit (Windows, free, actively maintained) is the industry standard for cleanup and format conversion. Aegisub (cross-platform, free) is preferred for advanced styling and positioning work. Both give you a waveform view for aligning cues to speech visually.
Creating a VTT file manually (7 steps)
Fine for a short demo. Painful past a couple of minutes.
- Open a plain-text editor (Notepad, TextEdit in plain-text mode, or VS Code).
- Type WEBVTT on line 1, then leave a blank line.
- Type the cue timing on the next line: hh:mm:ss.mmm --> hh:mm:ss.mmm, with a period before the milliseconds, not a comma.
- Type the cue text below the timing (max 2 lines, around 42 characters each).
- Leave a blank line, then start the next cue.
- Save with a .vtt extension and UTF-8 encoding without BOM.
- Test by loading in an HTML5 <video> element with a <track> child.
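The same steps can be scripted once the cue times are known. A minimal Python sketch, assuming cues arrive as (start_seconds, end_seconds, text) tuples; `format_timestamp`, `build_vtt`, and `save_vtt` are illustrative names, not part of any library.

```python
def format_timestamp(seconds):
    """Format seconds as hh:mm:ss.mmm (period before the milliseconds)."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(cues):
    """Build WebVTT text from an iterable of (start, end, text) tuples."""
    parts = ["WEBVTT"]  # required header on line 1
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        parts.append(f"{timing}\n{text}")
    # Blank line between the header and each cue, newline at end of file
    return "\n\n".join(parts) + "\n"


def save_vtt(path, cues):
    """Write the file as UTF-8; Python's 'utf-8' codec does not add a BOM."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(build_vtt(cues))
```

Calling `save_vtt("demo.vtt", [(0, 4.5, "Welcome to the podcast.")])` writes a one-cue file that can be loaded through a <track> element for testing.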
Generating a VTT file with AI
The realistic option for anything longer than a couple of minutes. An AI transcription tool listens to the audio, produces word-level timestamps, and formats the result as WebVTT (or SRT — most tools do both).
- VexaScribe — paste an audio or video file, get a valid WebVTT with word-level timing, character-limit-aware cue splits, and multi-language support. See the subtitle generator for VTT + SRT export, or the dedicated SRT generator for SRT-only workflows.
- Whisper (local): free, MIT-licensed, requires Python and a decent CPU or GPU. Use with the --output_format vtt flag.
- Descript, Rev: commercial editor-based alternatives. Descript is video-editor-first; Rev leans transcription-first.
Converting SRT to VTT (find-and-replace)
If you already have a valid SRT file, converting to VTT takes about ten seconds in any text editor:
- Add the literal string WEBVTT followed by a blank line at the very top of the file.
- Find-and-replace the comma before milliseconds with a period in every timing line: , → . (be careful not to change commas inside the caption text).
- Save with a .vtt extension and UTF-8 encoding without BOM.
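For batches of files, the conversion can be scripted so that only timing lines are touched. A minimal Python sketch, assuming well-formed SRT input; `srt_to_vtt` is a hypothetical helper, and SRT-specific extras such as font tags are left as they are.

```python
import re

# Matches only SRT timing lines, so commas in caption text are left alone
SRT_TIMING = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3}) --> (\d{2,}:\d{2}:\d{2}),(\d{3})",
    re.MULTILINE,
)


def srt_to_vtt(srt_text):
    """Convert SRT text to WebVTT text."""
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n")
    # Swap the comma before milliseconds for a period on timing lines
    body = SRT_TIMING.sub(r"\1.\2 --> \3.\4", body)
    # Prepend the required header followed by a blank line
    return "WEBVTT\n\n" + body.strip() + "\n"
```

The SRT sequence numbers are kept, since a numeric line before the timing is a valid WebVTT cue identifier.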
Browser and Platform Support Matrix
Every major browser parses WebVTT natively through the HTML5 <track> element — no library required. Beyond browsers, VTT support across social platforms, meeting tools, and NLEs is patchier:
| Platform | Native VTT support | Notes |
|---|---|---|
| Chrome | ✓ Native since v22 | HTML5 <track> element |
| Firefox | ✓ Native since v31 | Full VTT feature support |
| Safari | ✓ Native since v10 | iOS Safari supported |
| Edge | ✓ Native, all versions | Since Chromium migration |
| YouTube | ✓ Accepts uploads | Prefers SRT; VTT also accepted |
| Vimeo | ✓ Accepts VTT | Uploads processed to native player format |
| Netflix | ✗ | Requires DFXP / IMSC 1.2 per its Timed Text Style Guide |
| Zoom | ✓ | Live captions exported as VTT |
| Microsoft Teams | ✓ | Live captions VTT export |
| VLC | ✓ | Full VTT support |
| Premiere Pro | Partial | Import supported; direct export requires plugin |
| DaVinci Resolve | Partial | Similar to Premiere |
Source: MDN Web Docs — WebVTT API; caniuse.com — WebVTT; platform documentation; verified July 2026.
Common VTT Problems and How to Fix Them
Five failure modes account for almost every broken WebVTT file we've seen. Here's how to spot and fix each.
⚠ Timecode uses period, not comma
Cause: Editors converting SRT to VTT commonly miss this — SRT uses commas, VTT uses periods
Fix: Write 00:00:04.500 (period), NOT 00:00:04,500 (comma). This is the opposite of SRT. If you're converting an SRT, replace the comma before milliseconds with a period in every timing line.
⚠ Missing 'WEBVTT' header on line 1
Cause: File saved without the required literal WEBVTT header — most common failure mode
Fix: Line 1 of every WebVTT file must be the literal string 'WEBVTT' (uppercase). Some players reject the file silently. Add it, followed by a blank line, then start your cues.
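⚠ UTF-8 BOM added by Windows Notepad
Cause: Older versions of Windows Notepad add a UTF-8 BOM (byte order mark) when saving as UTF-8. The WebVTT spec permits an optional BOM before the WEBVTT header, but converters and players that match on the literal header may not expect it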
Fix: Save without BOM. In VS Code: click the encoding menu bottom-right → 'Save with Encoding' → 'UTF-8' (not 'UTF-8 with BOM'). In Notepad on Windows 11, use the encoding dropdown in the Save As dialog and pick 'UTF-8' (no BOM).
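If a tool in the pipeline trips over a BOM and re-saving by hand is impractical, the three BOM bytes can be stripped programmatically. A minimal sketch in Python; `strip_bom` and `rewrite_without_bom` are illustrative names.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def rewrite_without_bom(path):
    """Rewrite a file in place only when it actually starts with a BOM."""
    with open(path, "rb") as handle:
        data = handle.read()
    cleaned = strip_bom(data)
    if cleaned != data:
        with open(path, "wb") as handle:
            handle.write(cleaned)
```

Working on raw bytes avoids any accidental re-encoding of the caption text itself.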
⚠ Wrong MIME type served over HTTP
Cause: Some CDNs and static hosts send Content-Type: text/plain for .vtt files; stricter browsers and players expect text/vtt
Fix: Configure your server (nginx, Apache, S3, Cloudflare, Netlify) to send Content-Type: text/vtt for .vtt extensions. Without it, some clients refuse to load the track.
⚠ CORS blocks VTT loaded from a different origin
Cause: The <track> element enforces same-origin policy — a VTT hosted on a CDN separate from the video will be blocked
Fix: Send Access-Control-Allow-Origin: * (or a matching origin) on the VTT response, and add crossorigin='anonymous' to the <video> element. Both sides are required.
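To reproduce or rule out the MIME and CORS problems locally, a small development server that sends both headers can help. This is a sketch for local testing only, built on Python's standard http.server module; the class name `VTTHandler`, the `serve` function, and the port are assumptions, and a production server would be configured through its own settings instead.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class VTTHandler(SimpleHTTPRequestHandler):
    """Serve the current directory with WebVTT-friendly headers."""

    # Map the .vtt extension to the WebVTT MIME type
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".vtt": "text/vtt",
    }

    def end_headers(self):
        # Allow pages on other origins to fetch the caption file
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def serve(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), VTTHandler).serve_forever()
```

Calling `serve()` from the folder that holds the .vtt file and then inspecting the response headers in the browser's network panel shows whether the page-side crossorigin attribute is the remaining piece.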
VTT vs SRT: When to Use Which
This is the decision most people come to this page for. Both formats are plain-text UTF-8 with cue-based structure, so conversion between them is trivial (see "Converting SRT to VTT" above). What actually differs is the ecosystem each format targets:
Use WebVTT for
- HTML5 web video: native browser support via <track>
- Styled captions using CSS ::cue selectors
- Positioning control: line, position, align settings
- Speaker differentiation with inline CSS classes
- Same-language captions for accessibility on your own site
- Video.js, Plyr, Shaka Player, and other web video libraries
Use SRT for
- YouTube uploads (SRT is the preferred format for import)
- Desktop players — VLC, MPV, PotPlayer, Windows Media Player
- Video editor timelines — Premiere Pro, DaVinci Resolve, Final Cut, CapCut
- Legacy broadcast workflows and older LMS platforms
- Handoff to clients — SRT is the safe universal default
- Simple captions where styling and positioning aren't needed
The practical takeaway: if the video will play in a browser via HTML5 <video>, use VTT. If it'll play anywhere else — YouTube, editors, desktop players, social — use SRT. If you need both, generate both (any decent AI subtitle tool exports side-by-side). See our companion SRT format guide for the full SRT spec, timing rules, and troubleshooting.
Generate WebVTT from any audio or video with VexaScribe
Upload an audio or video file, get a valid WebVTT (and SRT) with word-level timestamps, character-limit-aware cue splits, and UTF-8 encoding by default. No manual timing required.
Try the subtitle generator →
Frequently Asked Questions
What is a VTT file?
A WebVTT (.vtt) file is a plain-text UTF-8 subtitle format designed by the W3C (May 2013) for use with HTML5 video. It stores timed captions or subtitles that a video player displays synchronized with the audio.
What's the difference between VTT and SRT?
Both are plain-text subtitle formats. VTT is a W3C standard (2013) designed for HTML5 web video with native browser support, CSS styling, and positioning cues. SRT is an older de facto standard (SubRip, ~2001) with wider compatibility across desktop players and video editors, but only minimal styling (bold / italic) and no positioning support. Timecode formats differ: VTT uses periods before milliseconds (00:00:04.500); SRT uses commas (00:00:04,500).
Does YouTube support VTT?
Yes. YouTube accepts VTT uploads for captions and subtitles, though SRT remains the more common upload format. Both work equally well for creators.
Can I open a VTT file with Notepad?
Yes. VTT is plain text — Notepad (Windows), TextEdit (Mac), VS Code, or any text editor can open, view, and edit it. Save as UTF-8 without BOM.
How do I convert SRT to VTT?
Two changes: (1) add the literal string 'WEBVTT' followed by a blank line at the very top of the file; (2) replace the comma before milliseconds with a period in timing lines (00:00:04,500 → 00:00:04.500). Save with .vtt extension.
Why is my VTT file not showing captions in the browser?
Most common causes: missing 'WEBVTT' header on line 1; wrong MIME type (Content-Type must be text/vtt); CORS issues serving VTT from a different origin than the video; timecode uses commas instead of periods (invalid VTT).