How to Transcribe an Interview

AI transcription takes 3–5 minutes per hour of audio. Manual transcription takes 4–6 hours per hour of audio. Here's how to choose the right method and get accurate results.

Verbatim + clean verbatim supported | Speaker identification | 100+ languages

Accepted recording formats:

MP3, WAV, M4A, FLAC, MP4, MOV

Choose Your Transcription Style First

The type of transcript you need determines which method is right for you.

Style | What it includes | Best for | NovaScribe output
True verbatim | Every word, “um”, stutter, false start, [laughter] | Academic qualitative research, legal | Review + edit pass for filler words
Clean verbatim | Words only: fillers removed, grammar smoothed, meaning preserved | Journalism, UX research, podcasts | Default AI output (excellent fit)
Edited | Reorganized by topic or theme, not chronological | HR debrief summaries, meeting recaps | AI summary output

AI tools produce clean verbatim by default — filler words and false starts are filtered. If you need true verbatim (academic or legal use), do a review pass to restore any fillers after transcription.

Method 1: AI Transcription (Recommended)

3–5 minutes per hour. Best for journalism, UX research, HR calls, academic research (with review pass).

1. Export your recording

Export from Zoom, Google Meet, Teams, Riverside, voice memo, or any device. Formats: MP3, WAV, M4A, FLAC, MP4, MOV. Files up to 500 MB supported.

2. Upload to NovaScribe

Drag and drop, or click to browse. Select language (or leave on auto-detect for single-language interviews).

3. Review speaker labels

NovaScribe labels speakers as "Speaker 1", "Speaker 2". Rename to interviewer and interviewee names in the editor.

4. Correct errors

Use the built-in editor. Focus on: proper nouns, technical terms, and any moments with crosstalk or background noise.

5. Export

Download as TXT, DOCX, SRT, or VTT. Import directly into NVivo, ATLAS.ti, Dovetail, or any analysis tool.
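For readers unfamiliar with the SRT export option, the sketch below shows what a speaker-labelled SRT file looks like when built by hand. The `Segment` class and the function names are hypothetical and are not part of any NovaScribe API; only the cue layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) follows the usual SRT convention.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One speaker turn (hypothetical structure, for illustration only)."""
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used in SRT cues."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, one speaker turn per cue."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.0, "Interviewer", "Thanks for joining."),
        Segment(2.0, 4.5, "Interviewee", "Happy to be here."),
    ]
    print(to_srt(demo))
```

TXT and DOCX suit qualitative analysis tools; SRT and VTT keep the timing information, which helps when a quote has to be checked against the recording.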

Method 2: Manual Transcription

4–6 hours per hour of audio. Use for legal testimony, medical dictation, or very short transcripts where zero errors are required.

1. Set playback speed to 0.5×

Use VLC, QuickTime, or Audacity. Typing along at 0.5× is faster than constantly stopping and rewinding at full speed.

2. Open a text editor with timestamps

Add a timestamp like [00:05:23] every 2–3 minutes. This helps you navigate back to specific moments.

3. Label speakers consistently

"INT:" for interviewer, "RES:" for respondent, or use first names. Be consistent throughout.

4. Use shorthand first, edit later

Get the words down at speed during the first pass. Do a clean-up pass afterward.

5. Proofread against the recording

Play at normal speed and scan the transcript simultaneously. Catch missed words and speaker misattributions.
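The timestamp and speaker-label conventions from the steps above can be kept consistent with a small helper. This is a minimal sketch; the function names are made up for illustration, and the 150-second default is just the midpoint of the 2–3 minute interval suggested above.

```python
def timestamp_marker(seconds: int) -> str:
    """Format a position in the recording as [HH:MM:SS]."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def marker_schedule(duration_s: int, every_s: int = 150) -> list[str]:
    """List the markers to insert, one every `every_s` seconds."""
    return [timestamp_marker(t) for t in range(0, duration_s + 1, every_s)]


def transcript_line(seconds: int, speaker: str, text: str) -> str:
    """Build one transcript line with a marker and a speaker label."""
    return f"{timestamp_marker(seconds)} {speaker}: {text}"


if __name__ == "__main__":
    print(marker_schedule(600))
    print(transcript_line(323, "INT", "Can you walk me through your first week?"))
```

Printing the marker schedule before you start gives you a checklist of positions where a timestamp should appear in the finished transcript.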

Time & cost estimate: Manual transcription of a 2-hour interview takes 8–12 hours. At $25/hr of your time, that’s $200–$300 per interview. AI transcription via NovaScribe: about $1.20 for the same interview on the $2/month plan (200 minutes, or $0.01 per minute).
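The arithmetic behind this estimate can be sketched as follows. The 4–6× effort ratio and the $25/hr rate come from the text above; prorating the $2/month plan over its 200 included minutes is an assumption about how to attribute a subscription cost to a single interview, and the function names are illustrative.

```python
def manual_cost(audio_hours: float,
                hourly_rate: float = 25.0,
                effort_ratio: tuple[float, float] = (4.0, 6.0)) -> tuple[float, float]:
    """Return the (low, high) cost of transcribing by hand.

    effort_ratio is hours of typing per hour of audio (the 4-6x rule).
    """
    low, high = effort_ratio
    return audio_hours * low * hourly_rate, audio_hours * high * hourly_rate


def prorated_ai_cost(audio_minutes: float,
                     plan_price: float = 2.0,
                     plan_minutes: float = 200.0) -> float:
    """Share of a monthly plan consumed by one recording (assumed proration)."""
    return audio_minutes * plan_price / plan_minutes


if __name__ == "__main__":
    low, high = manual_cost(2.0)
    print(f"Manual: ${low:.0f}-${high:.0f}")
    print(f"AI (prorated): ${prorated_ai_cost(120):.2f}")
```

Change `hourly_rate` to your own rate; the manual figure scales linearly with it, while the AI figure depends only on the plan.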

AI vs. Manual vs. Human Professional

Method | AI (NovaScribe) | Manual (yourself) | Human professional
Time | 3–5 min/hour | 4–6 hours/hour | 12–24 hr turnaround
Cost | ~$0.20–0.60 per hour of audio, depending on plan | Your time (~$200+/interview) | $1.99/min (~$120/hour)
Accuracy | 90–95% (clean audio) | 99%+ | 99%+
Speaker labels | Automatic | Manual | Manual or included
True verbatim | With review pass | |
Languages | 100+ | Your skills | Limited (specialist)
Best for | Most interviews | Legal/medical critical | Legal, medical, depositions

How to Transcribe an Interview with NovaScribe

Export your recording

Export from Zoom, Google Meet, Teams, Riverside, voice memo, or any device. Accepted: MP3, WAV, M4A, FLAC, MP4, MOV.

AI transcribes with speaker labels

NovaScribe processes your interview in ~3–5 minutes per hour. Speakers are labeled and timestamped throughout.

Review, rename speakers, and export

Rename Speaker 1/2 to interviewer and interviewee names. Correct any errors. Export as TXT, DOCX, SRT, or VTT.
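If you prefer to rename speakers after export rather than in the editor, a plain-text transcript can be relabelled with a few lines of code. This sketch assumes each turn in the exported TXT starts with a label such as `Speaker 1:` at the beginning of a line; the actual export layout is not documented here, so check a real file first. The function name is illustrative.

```python
import re


def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Replace 'Speaker N' labels at the start of a line with real names.

    Labels without an entry in `names` are left unchanged.
    """
    # Match the label only when it opens a line and is followed by a colon.
    pattern = re.compile(r"^(Speaker \d+)(?=:)", flags=re.MULTILINE)
    return pattern.sub(lambda m: names.get(m.group(1), m.group(1)), transcript)


if __name__ == "__main__":
    raw = "Speaker 1: How did onboarding go?\nSpeaker 2: Smoothly, mostly."
    print(rename_speakers(raw, {"Speaker 1": "Interviewer", "Speaker 2": "Dana"}))
```

Anchoring the pattern to the start of a line avoids rewriting the phrase "Speaker 1" if it happens to appear inside someone's answer.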

Recording Quality Tips for Better Accuracy

These factors affect AI accuracy more than any other variable.

1. Record each person on a separate track

Zoom allows separate audio track recording in settings. Dual-track files give speaker diarization much better results, especially for 3+ person interviews.

2. Use a wired mic or external recorder

Laptop microphones pick up keyboard noise, HVAC, and room echo. A $30 USB mic or dedicated recorder like a Zoom H1 dramatically improves accuracy for all participants.

3. Choose a quiet room, not a café

Background noise is the single biggest accuracy killer. A lightly padded room (bookshelves, carpet) beats an open office. Close the door.

4. Minimize crosstalk

Brief interviewees to wait for a full pause before responding. AI speaker identification works best on clear turn-taking. Overlapping speech reduces accuracy.

5. Name your file before uploading

Include the interviewee’s name, date, and topic (e.g., john-smith-2026-03-29-product-feedback.mp3). Helps when searching transcripts across a research project.
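The naming pattern from the last tip is easy to apply consistently with a short helper. This is a minimal sketch using only the standard library; the function names are illustrative, and the slug rule (lowercase, hyphens for anything that is not a letter or digit) is one reasonable choice rather than a requirement.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and join runs of letters/digits with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def interview_filename(name: str, when: date, topic: str, ext: str = "mp3") -> str:
    """Build a name-date-topic file name such as the example above."""
    return f"{slugify(name)}-{when.isoformat()}-{slugify(topic)}.{ext}"


if __name__ == "__main__":
    print(interview_filename("John Smith", date(2026, 3, 29), "Product Feedback"))
```

Using an ISO date (year-month-day) means files for the same interviewee sort chronologically in any file browser.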

Why NovaScribe for Interview Transcription

Built for journalism, UX research, academic, and HR use cases

Speaker identification

Auto-labels each speaker throughout (Speaker 1, Speaker 2), ready to rename as interviewer and interviewee. 90–95% accuracy on 2-person interviews with clean audio.

Timestamps on every sentence

Jump back to the exact moment for quote verification. Essential for journalism and academic research.

100+ languages

Works for interviews in 100+ languages. Language is auto-detected from the audio, or you can specify it manually.

Clean verbatim output

Filler words filtered, grammar smoothed, meaning preserved. Suitable for journalism, UX research, and HR.

Export to DOCX, TXT, SRT, VTT

Compatible with NVivo, ATLAS.ti, Dovetail, and any qualitative analysis tool. Import transcripts directly.

From $2/month

Transcribe ~3 one-hour interviews per month on the Starter plan. Free trial with 30 minutes, no credit card.

Simple, Affordable Pricing

Transcribe an entire qualitative research project for less than one professional transcription.

Plan | Price | Included minutes
Free | Free | 30 min (1 short interview)
Starter | $2/mo | 200 min (~3 interviews)
Basic | $5/mo | 1,000 min (~16 interviews)
Pro | $10/mo | 2,500 min (full research project)
Studio | $20/mo | 6,000 min (agency/team)
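To pick a plan from the list above, compare the minutes you expect to record against each plan's allowance. The sketch below hard-codes the prices and minutes listed on this page and assumes the allowances are per month; the function name is illustrative and this is not an official calculator.

```python
# (plan name, price in USD per month, included minutes), cheapest first.
PLANS = [
    ("Free", 0, 30),
    ("Starter", 2, 200),
    ("Basic", 5, 1000),
    ("Pro", 10, 2500),
    ("Studio", 20, 6000),
]


def cheapest_plan(minutes_needed: int):
    """Return (name, price) of the cheapest plan covering the minutes, or None."""
    for name, price, included in PLANS:
        if included >= minutes_needed:
            return name, price
    return None  # more audio than the largest plan listed here


if __name__ == "__main__":
    # Example: eight one-hour interviews in a month.
    print(cheapest_plan(8 * 60))
```

Because the list is ordered by price, the first plan with enough minutes is also the cheapest one.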


Interview Transcription FAQ

How long does it take to transcribe a 1-hour interview?

AI transcription takes 3–5 minutes for a 60-minute interview. Manual transcription takes 4–6 hours for the same recording (the 4–6× rule). Professional human services take 12–24 hours with standard turnaround. For time-sensitive work, AI is the only practical option.

What is the difference between verbatim and clean verbatim transcription?

True verbatim captures everything: filler words (um, uh), false starts, repetitions, and non-verbal sounds like laughter. Clean verbatim removes fillers and false starts and lightly smooths grammar while preserving the speaker's meaning. Edited transcription reorganizes or paraphrases the content for readability. Most researchers use clean verbatim; legal depositions use true verbatim.

What is the best software for transcribing interviews?

For research and journalism: NovaScribe (90–95% accuracy on clean audio, $2/mo, 100+ languages). For diarization focus: Otter.ai (strong speaker tracking, $16.99/mo). For bulk research: Sonix (good editor, $10/hr or $22/mo). For guaranteed human-level quality: Rev ($1.50/min, 24-hr turnaround). Choose based on your accuracy needs and budget.

Can AI handle multiple speakers in an interview?

Yes. NovaScribe automatically identifies and labels different speakers (Speaker 1, Speaker 2, etc.). For best results, ensure speakers don’t overlap and record in a quiet environment. You can rename speaker labels in the editor after transcription. Most AI tools handle 2–8 speakers reliably.

Is AI transcription accurate enough for academic research?

Modern AI achieves 90–98% accuracy on clear recordings, which is generally acceptable for academic work provided researchers manually verify the transcript. Always review the transcript against the audio before quoting. For verbatim quotes in published work, a second read-through is standard practice.
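To check accuracy on your own recordings, hand-correct a short excerpt and compare it with the AI output. The sketch below computes word error rate (WER) with a standard word-level edit distance and treats accuracy as 1 − WER. That definition is an assumption, since this page does not say how its accuracy percentages are measured, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    reference:  the hand-corrected excerpt
    hypothesis: the AI transcript of the same excerpt
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edits needed to turn the reference words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1,      # word missing from the AI output
                            curr[j - 1] + 1,  # extra word in the AI output
                            substitution))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the quick brown fox jumps", "the quick brown box jumps")
    print(f"WER: {wer:.2f}, accuracy: {1 - wer:.0%}")
```

A few minutes of corrected audio per interview is usually enough to see whether a recording setup is helping or hurting; punctuation attached to words counts as a difference in this simple version, so strip it first if you only care about the words.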

How much does interview transcription cost?

AI tools: $0–2/month for casual use, $5–22/month for regular researchers. Human professional services: roughly $1–2/minute of audio ($60–120 per hour-long interview). Manual DIY transcription is “free” but costs 4–6 hours of your time; at $25/hr that's $100–150 in opportunity cost per interview.

Can I transcribe a phone call or Zoom interview?

Yes. Record the call using your phone’s built-in recorder, a Zoom local recording, or a third-party app like Grain or Fireflies. Then upload the audio or video file to NovaScribe. Phone call audio (single-channel) is slightly less accurate than in-person dual-track recordings, but still achieves 90–95% accuracy.