9,438,517 minutes transcribed

Transcription for accessibility, done right

Drop the video or paste the link – get a clean, speaker-labeled transcript you can correct in minutes, then export the caption files and the on-page transcript your viewers actually need.

60 min free · no card required · we never train on your audio

Trusted by 100,000+ creators, podcasters, journalists & researchers

How do you transcribe videos?

Transcription for accessibility means turning a video into a corrected, time-coded transcript you can export as caption files (SRT and VTT) plus a readable on-page transcript. Pepys returns a speaker-labeled draft in minutes that you fix and ship, so captions are accurate rather than auto-generated guesses. It's pay-as-you-go with no subscription, and credits never expire.

Upload or paste a link

Drop your video or paste its link – any audio or video, in any language.

Get your transcript

A clean, speaker-labeled transcript with AI notes tuned to your format, ready in minutes.

Edit and export

Fix anything inline, then export to SRT, VTT, TXT, Markdown, DOCX, PDF, or JSON.

Made for accessibility teams

The green checkmark that lied
An unchecked auto-caption track mangles names, numbers, and noisy passages, and nobody catches it until a complaint does – Pepys hands you a time-coded draft to correct before the video ships.
A panel reading as one wall
Run two or three speakers together and a deaf viewer loses the thread – Pepys returns each voice labeled, so a panel or interview reads as distinct people.
A back catalogue, a team of three
Hundreds of legacy videos and nobody to hand-type them – correct a draft instead of transcribing from silence, and pay per video with credits that never expire.

Built in, not bolted on

A corrected transcript, caption files, and a readable on-page version

Every video is analyzed automatically the moment it’s transcribed. Here’s a real sample, run through the analysis.

captioning-clinic-staff-training.mp4 · AI analysis, built in

Why Auto-Captions Aren't Accessible (and How to Fix It)

In this staff captioning clinic, a digital accessibility specialist makes the case that an unchecked auto-caption track is not an accessible video – it fails exactly on the names, numbers, and noisy passages where accuracy decides meaning. The fix is to start from a real, time-coded transcript, correct the words and speaker labels, then export it as a caption file, which is the only realistic way a small team clears a large back catalogue. The session also draws the line between captions (timed, on-screen) and a transcript (the full readable document), argues that speaker labels are comprehension rather than polish, and closes on building access into publishing so it stops being a one-time cleanup that rots.

Key points

An auto-caption track that nobody checked is not an accessible video – auto-captions fail exactly where it matters: names, technical terms, numbers, and anything said over background noise.
Don't treat accuracy as a percentage: a deaf viewer doesn't experience eighty-percent-correct as eighty percent of the meaning, because one wrong word in a sentence can flip what the whole sentence means.
Work from a corrected draft, not the green checkmark: you start from a real transcript and you correct it, fix the names and the numbers, then export it as a caption file – the only way three people clear four hundred videos.
Captions and transcripts are different deliverables: captions are timed to the video and appear on screen as it plays, while a transcript is the full text as one readable document – you want both.
Speaker labels are comprehension, not a nicety: when several people are speaking, an undifferentiated wall of caption text loses the thread, so mark who is speaking.
The closing principle: caption for meaning, not just for words, and build access into publishing so it becomes how you publish instead of a one-time cleanup that quietly rots.

Run this on your own video

Clean, speaker-labeled, click-to-seek

Transcribe your first video free – 60 min

Ask, don’t scrub

Ask the transcript anything.

An hour-long recording? Don’t skim it – ask. Every answer stays grounded in your transcript and cites the exact timestamp, so you can jump to the moment and check it yourself.

captioning-clinic-staff-training.mp4 · Ask AI

Why isn't turning on the platform's automatic captions enough for accessibility?

The facilitator calls an auto-caption track that nobody checked not an accessible video, just a liability with a green checkmark next to it. He says auto-captions fail exactly where it matters – names, technical terms, numbers, and anything said over background noise – and that a deaf viewer doesn't experience eighty-percent-correct as eighty percent of the meaning, because one wrong word can flip the whole sentence.

Cited: 0:00, 0:28

What's the difference between captions and a transcript, and do speaker labels matter?

Captions are timed to the video and appear on screen as it plays, while a transcript is the full text as one readable document with no timing – and you want both. On labels, he's blunt that they're comprehension, not a nicety: an undifferentiated wall of caption text loses the thread when several people are speaking, so mark who is talking.

Cited: 1:24, 1:53

Grounded in your transcript – if the answer isn’t in the audio, it says so instead of guessing.

Clean paragraphs. No more um's and ah's.

The left is what Pepys hands back – logical paragraphs with the filler stripped out, punctuated and readable. The right is the raw, one-line-per-segment dump most transcribers leave you with.

reel-voiceover.mp4 · Raw

um so yeah everyone keeps telling you to like lead with your best line right but uh honestly if you give away the whole answer in the first second you know there's basically no reason for anyone to keep watching so the hook isn't kind of the smartest thing you say it's like a loop you open that they need to close and um that's the part that actually keeps people around

Who said what

Speaker labels that survive cross-talk

Automatic speaker diarization. Two people, four people, cross-talk and interruptions – interviews, panels, messy meetings. Pepys keeps each voice on its own line instead of blurring them into one, so you never rewind to figure out who was talking.

Reporter: So the festival nearly didn't happen this year–
Mara Okonkwo: –it almost didn't. We lost the venue three weeks out.
Reporter: Three weeks? How do you even start to–
Mara Okonkwo: You call everyone you know. The whole town pitched in.
Reporter: And that's how it ended up in the park.

Corrected caption files
A time-coded transcript you fix in minutes, then export as SRT and VTT – accurate captions instead of an auto track nobody checked.
On-page transcript
The full readable transcript to publish under each video for screen-reader users – and for the search engines that index the page.
Speaker-labeled panels
Multi-speaker talks come back with each voice separated, so a panel or interview reads as distinct people rather than one undifferentiated wall.
Back-catalogue captioning
A whole archive of legacy video turned into correctable drafts, so a small team can work through hundreds of clips instead of typing each from silence.

Record in any language – 99+ detected automatically

English
中文
Español
العربية
हिन्दी
Français
日本語
Português
Русский
Deutsch
한국어
Italiano
বাংলা
Türkçe
فارسی
Tiếng Việt
தமிழ்
Polski
ไทย
Українська
Nederlands
עברית
Ελληνικά
తెలుగు
Bahasa Indonesia
اردو
Svenska
मराठी
Română
Magyar
Čeština
ગુજરાતી
Kiswahili
ქართული
Tagalog
አማርኛ
99+ total

Works with the platforms you live in.

Paste a link from YouTube, TikTok, Instagram, Facebook, Spotify, or Apple Podcasts – or drop in any audio or video file. We transcribe it once, then you export it however your workflow needs.

YouTube
TikTok
Instagram
Facebook
Spotify
Apple Podcasts
or any file

Export to any format

TXT
Markdown
DOCX
PDF
SRT
VTT
JSON

Most useful for accessibility teams: SRT · VTT · Transcript (TXT) · DOCX · JSON

Timestamps, speaker labels, and subtitle timing carry through to every export.

Why accessibility teams pick Pepys

No subscription – pay per video you caption, and the credits never expire while you work through a back catalogue.
You get a correctable draft, not a locked auto-caption track – fix the names and numbers, then export, so captions are accurate by the time they ship.
Captions and a readable transcript come out of one pass: SRT and VTT for the player, plain text and DOCX for the page.
Speaker labels keep a multi-person panel separated, so a deaf viewer can follow who is speaking instead of reading one merged block.

What accessibility teams say

I had hours of interviews and that horrible feeling that the story was somewhere in there, but I could not see it yet. Reading the transcripts made the shape of the film visible. I could search, highlight, pull quotes, and start building the cut before opening the timeline.
Giulia F. · Documentary filmmaker · The edit got unstuck
The translation is useful, but the magic is that the timing survives. That is the part that used to ruin my afternoon.
Lucas D. · Subtitle translator · Timing survived translation
Our video archive always felt like this impossible accessibility debt. With Pepys, we get a transcript we can correct, SRT/VTT files for the player, and plain text for the page. It turned a giant project into a queue we can actually work through.
Sam V. · Accessibility lead · Back catalogue moving

Transcription for accessibility – questions, answered

How is this different from the automatic captions my platform already generates?

Auto-captions are a first guess that fails on names, technical terms, numbers, and anything said over background noise – and nobody usually checks them. Pepys gives you a clean, time-coded transcript you correct in minutes and then export as a caption file, so what ships is accurate rather than an unchecked guess.

What's the difference between captions and a transcript, and can I get both?

Captions are timed to the video and appear on screen as it plays; a transcript is the full text as one readable document with no timing required. You want both, and you get both from a single pass: export SRT and VTT caption files for the player, and a TXT or DOCX transcript to publish on the page.

Can it tell speakers apart on a panel or interview?

Yes. Speaker diarization separates each voice, so a two- or three-person video comes back labeled rather than as one undifferentiated wall of text. You can rename a speaker once and it updates everywhere, which matters when a viewer needs to follow who is talking.
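One common way to get "rename once, update everywhere" is to have each segment point at a speaker ID and keep the display names in a single lookup table. The Python sketch below illustrates that idea only; the `Transcript` class, its fields, and the `S1`/`S2` IDs are hypothetical and say nothing about how Pepys stores speakers internally.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Hypothetical transcript model: segments reference speakers by ID."""
    speakers: dict = field(default_factory=dict)   # speaker_id -> display name
    segments: list = field(default_factory=list)   # (speaker_id, text) pairs

    def rename(self, speaker_id: str, new_name: str) -> None:
        """Change one display name; every segment with that ID picks it up."""
        if speaker_id not in self.speakers:
            raise KeyError(f"unknown speaker id: {speaker_id}")
        self.speakers[speaker_id] = new_name

    def lines(self) -> list:
        """Render each segment with the current display name of its speaker."""
        return [f"{self.speakers[sid]}: {text}" for sid, text in self.segments]


if __name__ == "__main__":
    t = Transcript(
        speakers={"S1": "Speaker 1", "S2": "Speaker 2"},
        segments=[
            ("S1", "Three weeks? How do you even start?"),
            ("S2", "You call everyone you know."),
            ("S1", "And that's how it ended up in the park."),
        ],
    )
    t.rename("S1", "Reporter")
    t.rename("S2", "Mara Okonkwo")
    print("\n".join(t.lines()))
```

Because the name lives in one place, a correction made on the first line of a panel also fixes every later caption from the same voice.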

We have a large back catalogue and a small team. How do we get through it?

You correct drafts instead of typing from silence. Each video comes back as a time-coded transcript you fix – names, numbers, the noisy moments – and then export. Editing a good draft is a different, faster job than transcribing from scratch, which is how a small team clears hundreds of legacy videos. Credits never expire, so you can work at the pace your funding allows.

What caption and transcript formats can I export?

SRT and VTT caption files that drop straight into your player, plus a plain-text or DOCX transcript for the page, and JSON if you need to pipe it into another system. One click each.

Does it handle other languages for multilingual content?

Yes. It auto-detects the spoken language across 99+ languages, so a video in another language transcribes without you changing a setting. You can also transcribe in the original language and get a translated version with the caption timing preserved.

Do I have to commit to a monthly plan?

No. Pepys is pay-as-you-go – buy a block of hours, use them across however many videos you caption, and the credits never expire. You can start free with 60 minutes, no card.

Start your first video free

Don't just take our word for it.

Ask ChatGPT, Claude, or Perplexity whether Pepys is the right fit for accessibility teams.

Turn your next video into accurate captions and a transcript – and pay only for that video.

Pay as you go – credits never expire, nothing to cancel. Or start free with 60 minutes, no card.

Start free – 60 minutes · See pricing
