Free vs paid transcription

A plain-English breakdown for researchers and journalists: what free transcription really covers, what it quietly costs, and when paying earns its keep.

The short answer

Free transcription covers real work but with hard limits: short per-file caps, no speaker labels or timestamps, and terms that may let the tool train on your audio. Paid buys a diarized, timestamped, exportable file and a no-training guarantee. For irregular, project-based volume, a pay-once tool with free starter minutes sits between the two.

What does 'free transcription' actually get you?

Free transcription comes in three shapes, and each one caps somewhere. First, the free tiers of hosted tools. Otter's free Basic plan, for instance, allows 300 transcription minutes a month, 30 minutes per conversation, and three lifetime file imports. That covers a short interview or two, then it stops.
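To see how those two caps interact, here is a minimal sketch that checks a batch of recordings against a plan. The 300-minute and 30-minute figures come from the paragraph above; the names `PlanLimits` and `fits_plan` are made up for illustration, and the three-import lifetime limit is left out to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlanLimits:
    """Hypothetical container for a free tier's published caps."""
    monthly_minutes: float   # total minutes allowed per month
    per_file_minutes: float  # longest single recording accepted


def fits_plan(recording_minutes: list[float], limits: PlanLimits) -> tuple[bool, list[str]]:
    """Return (ok, problems) for a month's worth of recordings."""
    problems = []
    for number, minutes in enumerate(recording_minutes, start=1):
        if minutes > limits.per_file_minutes:
            problems.append(
                f"recording {number} ({minutes:g} min) is over the "
                f"{limits.per_file_minutes:g}-minute per-file cap"
            )
    total = sum(recording_minutes)
    if total > limits.monthly_minutes:
        problems.append(
            f"total of {total:g} min is over the "
            f"{limits.monthly_minutes:g}-minute monthly cap"
        )
    return (not problems, problems)


# Caps taken from the free plan described above.
free_tier = PlanLimits(monthly_minutes=300, per_file_minutes=30)

ok, problems = fits_plan([25, 90], free_tier)
for problem in problems:
    print(problem)
```

In this example the monthly total (115 minutes) is well inside the allowance, yet the 90-minute interview still fails on the per-file cap.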

Second, general AI assistants, where the audio limits are stricter than most people expect. Gemini's free tier does accept audio uploads, but caps them at 10 minutes total, extendable to about three hours only on a paid plan. Claude's free tier accepts no audio at all, just documents and images, so it can't transcribe a recording in the first place.

A general assistant also works on audio inside a chat window. That's a different job from making a transcript. You get a summary or an answer, not a speaker-labeled, timestamped file you can export and cite. Gemini's free access is metered too, on compute-based limits that refresh every five hours up to a weekly cap, with the exact thresholds unpublished.

Third, do it yourself with an open-source model. OpenAI's Whisper is released under the permissive MIT license, so you can self-host it for nothing. The catch: you supply the hardware and handle the upkeep yourself. OpenAI's hosted transcription API is easier, but it caps uploads at 25 MB per file, a limit a long recording exceeds quickly.
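How quickly? A rough back-of-the-envelope sketch, assuming a constant-bitrate file, decimal megabytes (1 MB = 1,000,000 bytes), and no container overhead. The function name and the example bitrates are illustrative, not taken from any vendor's documentation.

```python
def max_minutes_under_cap(bitrate_kbps: float, cap_mb: float = 25.0) -> float:
    """Estimate how many minutes of audio fit under a file-size cap.

    Assumes constant bitrate and decimal megabytes; real files vary.
    """
    cap_bits = cap_mb * 1_000_000 * 8          # cap expressed in bits
    seconds = cap_bits / (bitrate_kbps * 1000)  # kbps -> bits per second
    return seconds / 60


for kbps in (64, 128, 256):
    print(f"{kbps} kbps: about {max_minutes_under_cap(kbps):.0f} minutes")
```

Under those assumptions, a 128 kbps file reaches 25 MB after about 26 minutes, so a 90-minute interview would need to be re-encoded at a lower bitrate or split before upload.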

Is free transcription less accurate than paid?

Not necessarily. On clean conversational speech, the accuracy gap is smaller than the price gap suggests. In one benchmark, professional human transcribers hit 5.9% word error rate and a top automated system 5.8% on the same audio. Free automatic tools run on the same class of models. Accuracy tracks the recording and the accent far more than the price tag.

Where any tool slips is hard audio. Across five commercial speech engines, average word error rate ran 35% for Black speakers versus 19% for white speakers, nearly double. Overlapping voices, thick accents, background hum, and quiet talkers push errors up on free and paid alike. Paying more does not fix a bad recording.
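If you want to measure this on your own audio, word error rate is the standard yardstick: substitutions, deletions, and insertions divided by the number of words in a reference transcript. The sketch below is a minimal version that only lowercases and splits on spaces. Published benchmarks normalize punctuation and spelling more carefully, so treat its numbers as rough.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "the source said the budget was cut"
hypothesis = "the source said budget was cat"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")  # 2 errors / 7 words
```

Check a minute or two of your hardest recording by hand, run it through each tool you're considering, and compare the scores.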

So accuracy is the wrong axis for this decision. Both free and paid land in a similar range on good audio, and both struggle on bad audio. We go deeper on the numbers in how accurate AI transcription really is. The real differences show up elsewhere: what the tool hands you, and what it does with your data.

The hidden cost of free is often your data

Plenty of free tiers reserve the right to train on what you upload. Anthropic's consumer terms, for one, let it train new models on Free, Pro, and Max account chats when the model-improvement setting is on. Choosing to allow it extends retention to five years. Users had until October 8, 2025 to make the call.

The control is real, but it's easy to miss. Claude's Privacy Center confirms your chats improve the model only if you allow it, and Incognito chats are always excluded. The point isn't that any one vendor is careless. It's that on a free tier the defaults usually favor the tool, not you. For off-the-record or IRB-governed audio, that default matters.

There's a legal edge here too. If your recording carries identifiable voices, it counts as personal data. GDPR makes processing lawful only when you have a valid basis, such as freely given, specific, informed consent from each speaker. A free tool that retains or trains on that audio can leave you with a compliance obligation you didn't plan for. We walk through the sensitive-audio workflow in confidential transcription.

What free tools usually leave out

The artifact. A research or newsroom workflow needs a specific file: speaker-labeled, timestamped, and exportable to something you can cite from. Most free options hand you plain text instead. A general assistant gives you a summary in a chat. Free tiers often drop speaker labels (diarization) unless you upgrade.

Timestamps are the other common omission. Without them, you can't jump back to the exact second a source said the line you're about to quote, and free plain-text exports usually strip that anchor out. When you're coding qualitative data or fact-checking against a deadline, a timestamp lets you re-hear the moment before you commit a quote to print. Without it, you're scrubbing the whole recording to find one line.

Self-hosting Whisper does get you a file, but only after you stand up the model on your own hardware and build in whatever it lacks out of the box. Paid tools bundle that work. You upload once and export to TXT, DOCX, PDF, SRT, or VTT without building a pipeline first.
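To give a sense of what "building a pipeline" involves, here is a minimal sketch of just the last step: turning timed, speaker-labeled segments into an SRT file. The `Segment` class and the sample data are hypothetical. The sketch assumes the transcription and speaker-labeling steps already ran, and in a self-hosted setup the speaker labels would typically have to come from a separate diarization step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; times are in seconds."""
    start: float
    end: float
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form that SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues with a speaker prefix."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        cues.append(
            f"{number}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    return "\n".join(cues)


# Made-up sample data for illustration.
sample = [
    Segment(0.0, 2.5, "Interviewer", "Thanks for joining."),
    Segment(2.5, 6.0, "Source", "Happy to be here."),
]
print(to_srt(sample))
```

The formatting is the easy part. The segments, the timings, and the speaker labels are what a bundled tool saves you from producing yourself.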

When is paying actually worth it?

When your time is worth more than the fee. Transcribing by hand runs up to six hours of work per hour of audio. If a tool turns that into a few minutes of review, it pays for itself on the first serious project. The break-even is low, and it drops further as your volume rises.
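The break-even is simple arithmetic. The sketch below uses the six-hours-per-audio-hour figure from above as the upper bound for manual work. The review time (15 minutes per audio hour) and the hourly rate in the example are assumptions, so plug in your own.

```python
def break_even_fee(
    audio_hours: float,
    hourly_rate: float,
    manual_hours_per_audio_hour: float = 6.0,   # upper bound cited above
    review_hours_per_audio_hour: float = 0.25,  # assumed: 15 min of review
) -> float:
    """Largest fee at which a tool still costs less than your own time."""
    hours_saved = audio_hours * (
        manual_hours_per_audio_hour - review_hours_per_audio_hour
    )
    return hours_saved * hourly_rate


# Example: two hours of interviews, with your time valued at 30 per hour.
print(break_even_fee(audio_hours=2, hourly_rate=30))  # 345.0
```

With those inputs, any fee under 345 for two hours of audio comes out ahead of typing it yourself. If your manual pace is faster than six hours per audio hour, lower that parameter and the ceiling drops accordingly.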

The usual catch with paid is the subscription. A monthly plan sits idle between projects, which punishes anyone with irregular volume: researchers, freelancers, students. A pay-once model closes that gap. You buy minutes when you need them, and credits never expire, so a quiet month costs nothing.

Pepys sits in that middle. You get 60 free minutes to test your real audio, then pay once for what you use, with no training on your files and auto-delete when you're done. You still get the artifact the free tiers skip: a diarized, timestamped transcript you can export, without the subscription paid tools usually attach.

How do you choose between free and paid transcription?

Match the tool to the job. For a one-off five-minute voice memo, a free tier is plenty. For a 90-minute interview you'll quote in print, or any recording with a named, identifiable source, the short caps and data terms stop being a bargain.

Here's a simple test. If the transcript is a disposable note, free works. If it's an artifact you'll cite, code, or archive, you need diarization, timestamps, export, and a no-training guarantee. Start with free minutes to check a tool on your actual audio, then pay only when the recording earns it.

Tips from people who do this a lot

Test any free tool on your worst audio, not your cleanest. Accents, crosstalk, and room noise are where the gap between free and paid actually shows.
Check the data setting before you upload anything sensitive. Free tiers often default to model-improvement on, and the opt-out is buried a menu deep.
Watch the per-conversation cap, not just the monthly total. A plan with 300 minutes a month but a 30-minute ceiling can't handle a single long interview.
A general AI assistant is fine for asking questions about a short clip. It won't hand you a timestamped, speaker-labeled file you can quote from.
Self-hosting Whisper is genuinely free, but budget the setup time. If your hourly rate beats the fee, a hosted tool usually wins on total cost.

Free vs paid transcription – questions, answered

Is free transcription good enough for real work?

For clean, short audio, often yes. On one benchmark, top automatic systems and professional humans both landed near 5.8 to 5.9% word error rate. The limits are practical rather than accuracy-related: short caps, missing speaker labels and timestamps, and terms that may let the tool train on your files.

Can free AI assistants transcribe audio?

Partly. Gemini's free tier accepts audio but caps it at 10 minutes total, extendable to about three hours only on a paid plan. Claude's free tier takes no audio at all, only documents and images. And an assistant analyzes audio in a chat; it doesn't hand you an exportable, speaker-labeled transcript file.

Is Whisper really free?

Yes. OpenAI released Whisper under the MIT license, so you can self-host it at no cost. But you supply the hardware and keep it running, and the hosted API caps uploads at 25 MB per file. It's free in dollars, not in time, which is the real trade for a busy researcher or reporter.

Do free transcription tools use my audio to train AI?

Some can. Anthropic's consumer terms let it train on Free, Pro, and Max chats when the model-improvement setting is on, and allowing it extends retention to five years. Read the data setting before uploading sensitive recordings, since GDPR treats audio with identifiable voices as personal data needing a lawful basis.

When is paid transcription worth the money?

When your time beats the fee. Manual transcription runs up to six hours per hour of audio, so any tool that turns that into minutes of review pays for itself fast. Paid earns its keep when you need diarization, timestamps, export, and a no-training guarantee for an attributable transcript.

References

1. Otter.ai Pricing – Basic (free) plan limits – Otter.ai (official pricing page)
2. Upload & analyze files in Gemini Apps – audio upload limits – Google (Gemini Apps Help)
3. Gemini Apps limits & upgrades – compute-based usage limits – Google (Gemini Apps Help)
4. What kinds of documents can I upload to Claude.ai? – Anthropic (Claude Help Center)
5. OpenAI Whisper – MIT License – OpenAI (GitHub repository)
6. OpenAI API – Speech to text guide (25 MB file limit) – OpenAI (official API docs)
7. Updates to our Consumer Terms and Privacy Policy (Aug 28, 2025) – Anthropic
8. Claude Privacy Center – Is my data used for model training? – Anthropic (Claude Privacy Center)
9. Haberl et al. (2023), Take the aTrain – transcription time cost, citing Bell et al. (2018) – arXiv / University of Graz
10. Xiong et al. (2016), Achieving Human Parity in Conversational Speech Recognition – Microsoft Research / arXiv
11. Koenecke et al. (2020), Racial disparities in automated speech recognition – PNAS (peer-reviewed)
12. Regulation (EU) 2016/679 (GDPR), Art. 6(1) and Art. 4(11) – European Union / EUR-Lex
