Pepys
12,438,517 minutes transcribed

Video transcription, built for the edit

Drop in the footage or paste a link – get a speaker-labeled, click-to-seek transcript plus ready-to-burn captions, so you can cut to the words instead of scrubbing the timeline.

Or paste a link from Instagram, TikTok, YouTube, Facebook, Spotify, or Apple Podcasts.

60 min free · no card required · we never train on your audio

Trusted by 100,000+ creators, podcasters, journalists & researchers

How do you transcribe videos?

To transcribe a video, upload the file or paste its link and Pepys returns a speaker-labeled, time-coded transcript in minutes – plus exportable SRT and VTT captions and a quick AI summary. It's pay-as-you-go with no subscription, and credits never expire.

Made for videographers

Every project lives twice: once as footage on a card and once as a timeline you have to assemble. The hardest part of the assembly is finding the right words buried across hours of interviews, b-roll banter, and run-and-gun audio. A transcript turns that haystack into a document you can read, search, and cut from – so you spend your hours coloring and shaping, not scrubbing back and forth hunting for the one line that makes the cut.

A paper edit comes down to matching what was said to where it lives in the footage. That takes word-level timestamps you can click to seek, and speaker labels that keep your subjects from collapsing into one block. Pick your selects on the page, mark your circle-takes, then export frame-accurate SRT and VTT straight into Premiere or DaVinci Resolve. With voice-driven transcription, you build the cut around the line that lands and let the footage follow it, instead of the other way around.

Clean paragraphs. No more ums and ahs.

The left is what Pepys hands back – logical paragraphs with the filler stripped out, punctuated and readable. The right is the raw, one-line-per-segment dump most transcribers leave you with.

reel-voiceover.mp4

um so yeah everyone keeps telling you to like lead with your best line right but uh honestly if you give away the whole answer in the first second you know there's basically no reason for anyone to keep watching so the hook isn't kind of the smartest thing you say it's like a loop you open that they need to close and um that's the part that actually keeps people around

Raw
  • Captions for every cut

    Frame-accurate SRT and VTT files that drop straight into your NLE or your social uploads, no retyping.

  • A paper edit you can read

    A clean, time-coded transcript so you can mark your selects on the page before you ever touch the timeline.

  • Find any line in seconds

    A searchable transcript that jumps you to the exact frame where a phrase was spoken, instead of making you scrub for it.

  • Pull the soundbites

    A quick summary surfaces the strongest lines, so the clips you cut for the highlight reel write themselves.

Built in, not bolted on

A searchable transcript, summary, and captions – the moment it uploads

Every video is analyzed automatically the moment it’s transcribed. Here’s a real sample run through it.

hartley-wedding-prep-memo.mp4 · AI analysis, built in

On-Set Memo: Shooting the Hartley Wedding So the Edit Cuts Itself

A two-camera wedding shoot, planned out loud before call time. The locked wide is the safety net while the long lens hunts reactions, and the whole approach centers on the vows audio, because the edit is cut to the voice first and the picture follows. Lav-and-backup-recorder audio, exposing for faces against a blown-out window, grabbing b-roll and safe portrait frames early, mirrored cards and fresh batteries, and a two-drive backup before leaving all serve one goal: never lose the moment the couple actually paid for.

Key points

  • Two-camera plan: the A-cam wide of the altar is the locked safety shot, while the B-cam long lens lives on faces – "The story is in the reactions, not the wide."
  • Audio is treated as the make-or-break: a lav on the officiant plus a backup recorder on the lectern, because "If the lav fails, the whole ceremony is unusable".
  • Expose for the couple against the harsh west window: "A blown window looks intentional. A muddy gray face looks like a mistake."
  • The edit is voice-first: "Find the line, then find the frame. The voice drives the cut, never the other way around."
  • Capture b-roll and the five known-good portrait frames early, since golden hour gives roughly twenty minutes of light: "Get the safe shots before you get the pretty shots."
  • Protect the footage: mirror to two cards per body, fresh batteries at three forty-five, and back up to two drives before leaving – "The footage doesn't exist until it's in two places."

Clean, speaker-labeled, click-to-seek


Ask, don’t scrub

Ask the transcript anything.

An hour-long recording? Don’t skim it – ask. Every answer stays grounded in your transcript and cites the exact timestamp, so you can jump to the moment and check it yourself.

hartley-wedding-prep-memo.mp4 · Ask AI

What's the audio plan for the ceremony, and what's the backup if it fails?

She's putting a lav on the officiant plus a recorder in his jacket pocket, because the on-camera mic is garbage at thirty feet. If the lav fails the whole ceremony is unusable, so she's also running a backup recorder on the lectern.

Cited 0:26

Why does she expose for the couple's faces and let the window blow out?

The four o'clock sun comes straight through the big west window behind the altar, so the couple will be backlit. She exposes for their faces and lets the window go, on the logic that a blown window looks intentional while a muddy gray face looks like a mistake.

Cited 0:40

Grounded in your transcript – if the answer isn’t in the audio, it says so instead of guessing.

Who said what

Speaker labels that survive cross-talk

Automatic speaker diarization. Two people, four people, cross-talk and interruptions – interviews, panels, messy meetings. Pepys keeps each voice on its own line instead of blurring them into one, so you never rewind to figure out who was talking.

Reporter

So the festival nearly didn't happen this year–

Mara Okonkwo

–it almost didn't. We lost the venue three weeks out.

Reporter

Three weeks? How do you even start to–

Mara Okonkwo

You call everyone you know. The whole town pitched in.

Reporter

And that's how it ended up in the park.

Works with the platforms you live in.

Paste a link from YouTube, TikTok, Instagram, Facebook, Spotify, or Apple Podcasts – or drop in any audio or video file. We transcribe it once, then you export it however your workflow needs.

  • YouTube
  • TikTok
  • Instagram
  • Facebook
  • Spotify
  • Apple Podcasts
  • or any file

Export to any format

  • TXT
  • Markdown
  • DOCX
  • PDF
  • SRT
  • VTT
  • JSON

Most useful for videographers: SRT · VTT · TXT · DOCX · PDF

Timestamps, speaker labels, and subtitle timing carry through to every export.

How video transcription works

Upload or paste a link

Drop in your video or paste its link: any audio or video file, in any of 99+ languages.

Get your transcript

A clean, speaker-labeled transcript with AI notes tuned to your format, ready in minutes.

Edit and export

Fix anything inline, then export to SRT, VTT, TXT, DOCX, PDF, or JSON.

Why videographers pick Pepys

  • No subscription – pay per video, and credits never expire between shoots.

  • Captions are built in, not a separate caption tool to round-trip through.

  • Paste a YouTube, Vimeo, or direct video link – no exporting the file first.

  • Speaker labels keep your interview subjects from blurring into one block of text.

What videographers say

  • captions, chapters AND a hook breakdown straight off the upload. i pull 3 shorts out of every long video now. huge.
    Daniel K. · YouTube creator · Product Hunt
  • I transcribe in the original language and receive a translated version with the subtitles still intact. It saved an entire round of contractor work on my last film. Thank you for building this.
    Giulia F. · Documentary filmmaker · email
  • every module comes back captioned with a handout written from the transcript. launch prep went from a week to an afternoon, wish id found this sooner honestly.
    Alina M. · Course creator · Reddit

Video transcription – questions, answered

How do I transcribe a video?

Upload the video file or paste its link (YouTube, Vimeo, or a direct URL) and Pepys returns a speaker-labeled, time-coded transcript in minutes, along with a short AI summary and exportable captions. You don't need to strip the audio out first.

Can I get burn-in or sidecar captions for my edit?

Yes. Every video exports to frame-accurate SRT and VTT files, ready to import into Premiere, DaVinci Resolve, Final Cut, or a social uploader as sidecar captions, or to burn in from your NLE on export. Edit any wording inline before you export.

Does it separate the people speaking in an interview?

Yes. Speaker diarization splits each voice, so a multi-person interview or a two-subject piece comes back labeled rather than as one wall of text. Rename "Speaker 1" to your subject's name and it updates everywhere.

Can I do a paper edit from the transcript?

That's the point. The transcript is time-coded and click-to-seek, so you can read the whole shoot, mark your selects on the page, and jump straight to the frame where each line was spoken before you build the timeline.

What can I export for a project?

SRT and VTT captions, plus the transcript as plain text, Markdown, DOCX, PDF, or JSON. One click each, and the timecodes stay intact so everything lines up back in your NLE.

How does it handle on-location audio and accents?

It auto-detects the spoken language from 99+ supported languages and handles a range of accents as well as noisier run-and-gun audio. Anything it mishears, you can fix inline in the editor before exporting.

Do I have to subscribe?

No. Pepys is pay-as-you-go – buy a block of hours, use them across as many shoots as you like, and the credits never expire. You can start free with 60 minutes, no card.


Turn your next shoot into a searchable transcript and ready-to-burn captions – and pay only for that video.

Pay as you go – credits never expire, nothing to cancel. Or start free with 60 minutes, no card.