TikTok Transcript Generator — Free, No Signup

Paste a public TikTok URL. We pull the existing captions and reformat them into TXT, SRT, VTT, JSON, or CSV — usually in 2–3 seconds. Free with one transcript per day, no signup. Sign in for unlimited Whisper Large-v3 transcription on audio or video files you upload yourself.

Free, no signup required for Path A. 1 free per day across our TikTok and Instagram tools.

TL;DR

- Path A (free, no signup): paste a public TikTok URL and we fetch TikTok's own auto-captions, reformatted into your chosen export. Usually 2–3 seconds. We're not re-transcribing; you get the same caption track TikTok already has on file.
- Path B (signed in): upload the audio or video file directly. We run full Whisper Large-v3 transcription with speaker diarization. Works on videos without TikTok captions and on any audio file from your device.
- Caption ≠ transcript ≠ on-screen overlay. All three terms get used interchangeably and they mean different things. See the disambiguation below.

TikTok transcript vs caption vs subtitle

These three terms get used as if they're the same thing. They aren't. If you ask the wrong tool for the wrong one you'll get garbage.

Transcript (this tool)

The spoken words inside the video, in plain or timed text. This is what this tool returns.

Caption

The text the creator types in the description: the part with hashtags and emoji. Copy this from the TikTok app directly. This is not the same as TikTok's auto-captions, the spoken-word track that Path A fetches.

Subtitle / overlay

The on-screen text burned into the video (karaoke-style bouncing words). Usually done in Submagic, CapCut, or TikTok's own caption sticker.

How this tool actually works

Two paths. Different problems. We explain both because most competitors don't.

Path A — URL paste (free, no signup)

Paste a public TikTok URL. We hit TikTok's public endpoints, fetch the existing caption track TikTok's own auto-caption system produced, and reformat into your chosen output (TXT, SRT, VTT, JSON, or CSV). No re-transcription — you get the same text TikTok itself has on file.

Best when (a) the video is public, (b) it has spoken content, and (c) TikTok has already generated captions for it. If any of those is missing, Path A fails fast and tells you to try Path B.

Path B — signed-in bulk upload (real Whisper transcription)

Sign in and upload the file directly — up to 50 files per batch (MP4, MOV, MKV, MP3, M4A, WAV, FLAC, OGG, OPUS). We run OpenAI Whisper Large-v3 with speaker diarization and return the transcript with timestamps, plus exports as TXT, DOCX, SRT, VTT, JSON, or PDF.

Path B costs your normal NovaScribe transcription credits, but it works on private videos, deleted videos you saved locally, and videos TikTok never generated captions for.

Format export decision matrix

Pick the export by what you're doing next with it — not by which sounds fanciest.

Format	Pick this when…	Skip this when…
SRT	Importing into Premiere, DaVinci, CapCut, Final Cut, YouTube Studio.	You don't need timestamps.
VTT	HTML5 <track> tags, in-browser captions.	Your editor prefers SRT.
TXT	Quoting in a blog, brief, or thread. Just the words.	You need timestamps.
JSON	Feeding an LLM, building a search index, programmatic parsing.	A human reads it directly.
CSV	Spreadsheet workflows, tagging clips by timestamp.	You don't open it in Excel/Sheets.

Default to SRT if you're unsure — you can always strip timestamps later to convert to TXT.
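The "strip timestamps later" step can be sketched in a few lines of Python. This is a minimal illustration, not NovaScribe's own code: the function name `srt_to_text` and the sample cues are made up for the example, and it assumes a conventionally laid-out SRT file (cue number, timing line, text, blank line).

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Return only the spoken text from an SRT document, one cue per line."""
    cues = []
    normalized = srt.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # Drop the numeric cue index if present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timing line.
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        if lines:
            # Join multi-line cue text into a single line.
            cues.append(" ".join(lines))
    return "\n".join(cues)


# Hypothetical sample input, not real tool output.
SAMPLE_SRT = """1
00:00:00,000 --> 00:00:02,000
Hey everyone,

2
00:00:02,000 --> 00:00:04,500
welcome back
to the channel.
"""

print(srt_to_text(SAMPLE_SRT))
# Hey everyone,
# welcome back to the channel.
```

Going the other way (TXT back to SRT) is not possible without the original timings, which is why SRT is the safer default.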

Accuracy — what to expect on each path

Path A (URL paste)

We're reformatting TikTok's own captions — we're not running speech recognition. On clear English speech with no overlapping music, TikTok's auto-captions typically land in the 85–92% accuracy range. On music-heavy clips, accented speech, or fast-paced creator delivery, accuracy drops. If the result looks rough, run Path B for the same video.

Path B (Whisper Large-v3)

The published figure for OpenAI's Whisper Large-v3 is a 5.6% word error rate on Common Voice 15 English, roughly 94% accuracy under benchmark conditions. (The Whisper model family is described in Radford et al., 2022; Large-v3 itself was released later, in November 2023.) Real-world TikToks with music, overlapping speech, and creator-style fast pacing typically land between that benchmark and Path A.
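For readers unfamiliar with the metric, here is a rough sketch of how word error rate is computed and how it maps to the "roughly 94%" figure. It is a simplified illustration: the function names are invented for this example, and real benchmarks apply text normalization (punctuation, numerals, casing) that this sketch reduces to lowercasing and whitespace splitting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def approx_accuracy_percent(wer: float) -> float:
    """Convert WER to the informal 'accuracy' figure, floored at zero."""
    return round(max(0.0, 1.0 - wer) * 100, 1)


# One substituted word out of five reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))
# 0.2

# The 5.6% WER quoted above.
print(approx_accuracy_percent(0.056))
# 94.4
```

Because insertions count as errors, WER can exceed 100% on very noisy audio, so "accuracy = 100% minus WER" is a convenient approximation rather than a strict definition.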

Honest caveat: ElevenLabs Scribe v2 publishes higher English benchmark accuracy (96.7%) than Whisper Large-v3. For creator workflows, content repurposing, and accessibility captioning, Path B is plenty. For high-stakes content where every word matters, use a dedicated paid transcription service.

Multilingual TikTok — what to expect on each path

Path A language coverage

The dropdown supports 15 languages plus auto-detect. Path A inherits whatever languages TikTok itself supports for auto-captions — English, Spanish, Portuguese, French, German, Italian, Japanese, Korean, and Indonesian are reliably available; coverage for smaller languages is patchier.

Path B language coverage (Whisper Large-v3, 99 languages)

Whisper supports 99 languages with widely varying quality. The honest breakdown:

Tier 1 — 92–95% accuracy on clean audio

English, Spanish, Portuguese (BR/PT), French, German, Italian, Dutch, Polish, Russian, Japanese, Korean, Mandarin.

Tier 2 — 88–92% accuracy

Turkish, Arabic, Hindi, Vietnamese, Indonesian, Thai, Hebrew, Swedish, Norwegian, Danish, Finnish, Czech.

Tier 3 — 75–88% accuracy (lower-resource languages)

Bengali, Tamil, Urdu, Swahili, Yoruba, and other lower-resource languages. Usable, but expect noticeable errors that need a manual review pass.

When the tool fails — six common cases

We'd rather tell you upfront when Path A won't work than have you copy a URL six times and curse the page.

1. No captions available on this TikTok

TikTok hasn't generated captions for this video yet — common on brand-new uploads, lower-engagement posts, or very short clips. What to do: sign in and use Path B with the file.

2. Private TikTok

Path A only reaches publicly viewable videos. Private accounts and friends-only content can't be fetched. What to do: if you have access, screen-record and upload via Path B.

3. Deleted video

Once the original is gone, captions are gone too. What to do: if you saved the file before deletion, upload via Path B.

4. Music-only clip or dance TikTok

No spoken content means no meaningful transcript. Tools that claim to transcribe pure music are either hallucinating lyrics or just listing the music track ID. What to do: not a use case for this tool.

5. Clip under 3 seconds

Too short for TikTok to bother generating captions. Whisper can technically transcribe it, but a clip that short rarely has enough audio context for useful output.

6. Non-video URL

Profile pages, hashtag pages, sound/audio pages, and photo carousels won't work. The tool only accepts video content URLs.
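If you are collecting links in bulk, a quick pre-check can catch case 6 before you paste anything. The sketch below is an assumption-heavy illustration, not the tool's actual validation: the function name is hypothetical, and the URL shapes (`/@user/video/<id>`, `/@user/photo/<id>`, `/tag/...`, `/music/...`, and the `vm.`/`vt.` short-link hosts) are based on commonly seen TikTok links and may change.

```python
import re
from urllib.parse import urlparse


def classify_tiktok_url(url: str) -> str:
    """Best-effort guess at what kind of TikTok page a URL points to."""
    url = url.strip()
    if "://" not in url:
        # urlparse needs a scheme to recognise the host.
        url = "https://" + url
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "tiktok.com" and not host.endswith(".tiktok.com"):
        return "not_tiktok"

    path = parsed.path.rstrip("/")
    # Short share links redirect to the real page, so their type is
    # unknown until the redirect is followed.
    if host in ("vm.tiktok.com", "vt.tiktok.com") or path.startswith("/t/"):
        return "short_link"
    if re.fullmatch(r"/@[^/]+/video/\d+", path):
        return "video"
    if re.fullmatch(r"/@[^/]+/photo/\d+", path):
        return "photo"
    if re.fullmatch(r"/@[^/]+", path):
        return "profile"
    if path.startswith("/tag/"):
        return "hashtag"
    if path.startswith("/music/"):
        return "sound"
    return "unknown"


# Hypothetical example links.
for link in [
    "https://www.tiktok.com/@someuser/video/7234567890123456789?lang=en",
    "https://www.tiktok.com/@someuser",
    "https://www.tiktok.com/tag/cooking",
    "https://www.tiktok.com/music/original-sound-123",
]:
    print(classify_tiktok_url(link))
# video
# profile
# hashtag
# sound
```

Only links classified as `video` (or short links that resolve to one) are worth pasting into Path A.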

5-step practical workflow

1. Copy the TikTok URL. Tap Share → Copy link in the TikTok app, or copy from the browser address bar.
2. Paste and pick a format. SRT for video editing, TXT for writing, JSON for LLM workflows. If unsure, pick SRT.
3. Generate. Path A returns in 2–3 seconds. If it fails, the page tells you why.
4. Skim and correct. Auto-captions miss brand names, slang, and proper nouns. Quick pass in a text editor to fix obvious errors.
5. Use it. Import SRT into your editor, paste TXT into your CMS, or pipe JSON into your LLM workflow.
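If the same brand names keep getting mangled, the correction pass in step 4 can be partly scripted. The following is a minimal sketch under stated assumptions: the function name `apply_glossary`, the glossary entries, and the sample cues are all hypothetical, and it only handles simple whole-word replacements. It edits cue text and leaves cue numbers and timing lines alone.

```python
import re

# Start of an SRT/VTT timing line, e.g. "00:00:01,000 -->".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->")


def apply_glossary(srt: str, glossary: dict[str, str]) -> str:
    """Replace known mis-transcriptions in cue text only."""
    # One case-insensitive, whole-word pattern per wrong spelling.
    rules = [
        (re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE), right)
        for wrong, right in glossary.items()
    ]
    out = []
    for line in srt.splitlines():
        stripped = line.strip()
        is_text = (
            stripped
            and not stripped.isdigit()
            and not TIMING.match(stripped)
        )
        if is_text:
            for pattern, right in rules:
                # A function replacement avoids backslash interpretation.
                line = pattern.sub(lambda _m, r=right: r, line)
        out.append(line)
    return "\n".join(out)


# Hypothetical sample input and glossary.
SAMPLE = """1
00:00:00,000 --> 00:00:02,000
I edited this in cap cut

2
00:00:02,000 --> 00:00:04,000
then ran it through nova scribe"""

GLOSSARY = {"cap cut": "CapCut", "nova scribe": "NovaScribe"}

print(apply_glossary(SAMPLE, GLOSSARY))
```

A scripted pass like this only catches errors you already know about, so a manual skim is still worth doing.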

Submagic vs NovaScribe for TikTok transcripts

Submagic is one of the most-searched tools alongside NovaScribe for TikTok work. They're actually built for different jobs.

Submagic is best for…

Burning karaoke-style captions into your vertical video for posting back to TikTok. You upload a video, and Submagic transcribes it and renders animated captions in the bouncy-word style that looks native in the TikTok feed. The output is a new video file with captions baked in.

NovaScribe is best for…

Getting the text out of a TikTok so you can use it elsewhere — blog posts, scripts, search indexes, LLM workflows, accessibility captions for other platforms. We return text files, not styled videos.

Short version: if your output is a video, use Submagic. If your output is a text file, use this tool.

ElevenLabs Scribe v2 vs NovaScribe for TikTok

ElevenLabs Scribe v2 publishes higher English accuracy (96.7%) than Whisper Large-v3 (~94%). The trade-offs:

ElevenLabs: better raw English accuracy on benchmarks. Paid only. No free URL-paste path — you have to upload the file. Designed for production media workflows that need maximum word-level precision.
NovaScribe: Path A is free with no signup, which ElevenLabs doesn't offer. Path B uses Whisper Large-v3, which is roughly 2–3 percentage points behind ElevenLabs on English but covers 99 languages with documented per-language WER. Cheaper per minute when you do sign in.

Short version: if you need the absolute highest English transcription accuracy and you're happy to pay, use ElevenLabs. If you want a free fast option for public TikToks, or you transcribe non-English content regularly, use this tool.

Honest tool comparison — when to use which

We're not going to claim we're best at everything. Here's what each tool is actually good at.

Tool	Best for	Honest verdict
NovaScribe (this tool)	Free URL paste → SRT/VTT/TXT, plus signed-in Whisper bulk upload.	Dual-path. No login for Path A. Honest about failure modes.
Submagic	Burned-in animated captions for posting back to TikTok.	Best in class for that aesthetic. Not a transcript extractor.
ElevenLabs Scribe v2	Highest published English accuracy.	Paid only, no free URL-paste path. Worth it for high-stakes English.
Opus Clip	Auto-clipping a long video into short-form TikTok content.	Different job entirely. Use this if you're chopping podcasts into TikToks.
Descript	Editing transcripts and video together (cut by deleting text).	Overkill for just extracting text. Best if you're editing video too.
Kapwing	In-browser editor with styled captions.	Closer to Submagic than to us. Pick it if you're also editing.

Frequently asked questions

How do I get a transcript from a TikTok video?

Copy the TikTok URL from the share menu or browser address bar, paste it into the box at the top of this page, pick your output format (SRT, VTT, TXT, JSON, or CSV), and click Generate. If TikTok has already generated captions for that video, you'll get the transcript in 2–3 seconds. If not, sign in and use the bulk upload path with the original file.

Does TikTok have a built-in transcript or caption feature?

TikTok introduced auto-generated captions in 2021. They display during playback and creators can toggle them on or off per video. But TikTok doesn't give you a way to download or export the transcript text. This tool fills that gap — we pull the same auto-caption track TikTok has on file and reformat it into a downloadable file.

What's the difference between TikTok captions and a transcript?

The TikTok caption is the text the creator types in the description with hashtags and emoji. The TikTok transcript is the spoken words inside the video itself. They're different things. This tool returns the transcript. The caption is something you can copy directly from the TikTok app.

How accurate are TikTok transcript tools?

For Path A (URL paste), accuracy depends on what TikTok's own auto-captioning produced — typically 85–92% on clear English speech, lower on music-heavy clips, accented voices, or fast pacing. For Path B (signed-in Whisper Large-v3 upload), the published benchmark is 5.6% word error rate on Common Voice 15 English, roughly 94% accuracy under benchmark conditions.

Can I transcribe a TikTok without downloading the video?

Yes — that's exactly what Path A does. Paste the public TikTok URL and we fetch the existing caption track directly. No download required. Path B requires the file (so you'd need to save it first), but it works on private content and on TikToks that have no auto-captions yet.

What file formats are best for TikTok transcripts?

SRT for importing into video editors (Premiere, DaVinci, CapCut, Final Cut). VTT for HTML5 web players. TXT if you just want the words for a blog post or quote. JSON if you're feeding the transcript to an LLM or building a search index. CSV for spreadsheet workflows. Default to SRT if you're unsure.
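If you exported SRT and later need VTT for a web player, the two formats are close enough that a small script can convert between them. This is a rough sketch, not the tool's exporter: the function name and sample are made up, and it relies only on two well-known differences (WebVTT files begin with a `WEBVTT` header line, and WebVTT timestamps use a period rather than a comma before the milliseconds). It does not handle SRT styling tags.

```python
import re

# SRT timing line: "HH:MM:SS,mmm --> HH:MM:SS,mmm".
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt: str) -> str:
    """Convert plain SRT text to a minimal WebVTT document."""
    # Remove a byte-order mark and normalise line endings.
    cleaned = srt.lstrip("\ufeff").replace("\r\n", "\n").strip()
    body = []
    for line in cleaned.split("\n"):
        match = SRT_TIMING.match(line.strip())
        if match:
            start = f"{match.group(1)}.{match.group(2)}"
            end = f"{match.group(3)}.{match.group(4)}"
            line = f"{start} --> {end}"
        # Numeric cue indexes are kept; WebVTT allows them as cue identifiers.
        body.append(line)
    return "WEBVTT\n\n" + "\n".join(body) + "\n"


# Hypothetical sample input.
SAMPLE_SRT = """1
00:00:00,000 --> 00:00:02,000
Hey everyone,
"""

print(srt_to_vtt(SAMPLE_SRT))
# WEBVTT
#
# 1
# 00:00:00.000 --> 00:00:02.000
# Hey everyone,
```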

How do I transcribe a TikTok in another language?

Pick the language from the dropdown above the Generate button — we support 15 languages explicitly plus auto-detect. Path A inherits whatever languages TikTok's auto-caption system supports for that video. Path B uses Whisper Large-v3, which supports 99 languages with varying accuracy (see the Multilingual section above for tier breakdown).

Why did the tool fail on this TikTok?

Six common reasons: (1) TikTok hasn't generated captions for this video yet, (2) the account is private, (3) the video was deleted, (4) it's a music-only or dance clip with no spoken content, (5) it's under 3 seconds, or (6) you pasted a profile, hashtag, sound, or photo URL instead of a video URL. If it's reason 1, 2, or 3 and you have the file, sign in and upload via Path B.

Can I bulk-transcribe multiple TikToks at once?

Yes, on Path B (signed-in upload). Sign in and upload up to 50 audio or video files per batch. Each gets transcribed with Whisper Large-v3 and you can export each as SRT, VTT, TXT, JSON, DOCX, or PDF. The free URL paste path (Path A) is one URL at a time.

Sources

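Radford, A. et al. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. arXiv:2212.04356. The paper that introduced the Whisper model family. The 5.6% WER figure quoted on this page is for Whisper Large-v3 on Common Voice 15 English; Large-v3 was released after this paper, in November 2023.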
TikTok Newsroom — introducing auto-captions (2021)
W3C WebVTT specification — the standard for in-browser caption tracks.

Page reviewed and accuracy figures verified June 2026.