Transcript to JSON
Upload a recording or paste a link and get structured JSON – an array of cues with start, end, speaker, and text.
60 min free · no card required · we never train on your audio
How do I export a transcript to JSON?
To get a transcript as JSON, upload a recording or paste a link to Pepys. It transcribes the audio, then exports structured JSON – an array of cues, each with start, end, speaker, and text. The output is ready to parse in any programming language or pipeline, and transcription covers 99+ spoken languages. Your first 60 minutes are free, no card required.
How Transcript to JSON works
Upload audio or paste a link
Drop in a recording or paste a link – we extract the audio automatically.
Get your transcript
Within minutes, AI transcribes it into speaker-labeled cues, each with a start and end timestamp.
Export to JSON
Download structured JSON – an array of cues you can parse in code – or export TXT, Markdown, DOCX, PDF, SRT, or VTT.
A .json export hands your code the transcript as data, not prose: an array of cues, each with a start, end, speaker, and text field. Pepys transcribes your audio or link and returns exactly that, so a search index, a captions player, a summarizer, or a dataset can iterate over the cues without scraping or parsing free-form text.
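As a rough illustration of what iterating over the cues can look like, here is a minimal Python sketch. It assumes the export has already been read into a string; the sample values and the `find_cues` helper are hypothetical and not part of Pepys.

```python
import json

# Sample export in the shape described above (values are illustrative only).
raw = """[
  {"start": 0.0, "end": 4.2, "speaker": "Speaker 1", "text": "Welcome back to the show."},
  {"start": 4.2, "end": 9.8, "speaker": "Speaker 2", "text": "Glad to be here. Let's get into it."}
]"""

cues = json.loads(raw)  # a plain list of dicts, one per cue


def find_cues(cues, term):
    """Return every cue whose text contains `term`, case-insensitively."""
    needle = term.lower()
    return [cue for cue in cues if needle in cue["text"].lower()]


for cue in find_cues(cues, "glad"):
    print(f'{cue["start"]:.2f}-{cue["end"]:.2f} {cue["speaker"]}: {cue["text"]}')
```

Because each cue is already a record with named fields, the search needs no regular expressions or text scraping.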
The timestamps are precise enough to sync subtitles, jump a player to a quote, or align segments for translation. It works in 99+ auto-detected languages, every cue is grounded in the real audio, and we never train on your data. Pay only for what you transcribe; credits never expire.
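To show how the timestamps might drive a player, the sketch below looks up the cue playing at a given moment and the start time of a quote. It assumes start and end are in seconds and that cues are sorted and non-overlapping; neither is stated on this page, and `cue_at` and `seek_time` are hypothetical helper names.

```python
from bisect import bisect_right

# Illustrative cues; times are assumed to be seconds.
cues = [
    {"start": 0.0, "end": 4.2, "speaker": "Speaker 1", "text": "Welcome back to the show."},
    {"start": 4.2, "end": 9.8, "speaker": "Speaker 2", "text": "Glad to be here. Let's get into it."},
]


def cue_at(cues, t):
    """Return the cue playing at time `t`, or None if nothing is spoken then.

    Assumes cues are sorted by start time and do not overlap.
    """
    starts = [cue["start"] for cue in cues]
    i = bisect_right(starts, t) - 1  # index of the last cue starting at or before t
    if i >= 0 and t < cues[i]["end"]:
        return cues[i]
    return None


def seek_time(cues, quote):
    """Return the start time of the first cue containing `quote`, or None."""
    needle = quote.lower()
    for cue in cues:
        if needle in cue["text"].lower():
            return cue["start"]
    return None
```

A player could pass the result of `seek_time` to its own seek function, and call `cue_at` on each time update to highlight the current line.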
What the JSON looks like
[
{ "start": 0.00, "end": 4.20, "speaker": "Speaker 1",
"text": "Welcome back to the show." },
{ "start": 4.20, "end": 9.80, "speaker": "Speaker 2",
"text": "Glad to be here. Let's get into it." }
]
An array of cues – start, end, speaker, and text fields – ready to parse in any language or runtime
Precise timestamps to sync captions, drive a player, or align segments for translation
Speaker labels included, so your code knows who said each line
99+ languages, auto-detected · we never train on your audio · credits never expire
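One thing the speaker labels make easy is per-speaker statistics. The following sketch totals talk time for each speaker, assuming times are in seconds; `talk_time` is a hypothetical helper, not part of the export.

```python
from collections import defaultdict

# Illustrative cues; times are assumed to be seconds.
cues = [
    {"start": 0.0, "end": 4.2, "speaker": "Speaker 1", "text": "Welcome back to the show."},
    {"start": 4.2, "end": 9.8, "speaker": "Speaker 2", "text": "Glad to be here. Let's get into it."},
]


def talk_time(cues):
    """Sum the duration of every cue, grouped by its speaker label."""
    totals = defaultdict(float)
    for cue in cues:
        totals[cue["speaker"]] += cue["end"] - cue["start"]
    return dict(totals)


for speaker, seconds in talk_time(cues).items():
    print(f"{speaker}: {seconds:.1f}s")
```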
Works with the platforms you live in.
Paste a link from YouTube, TikTok, Instagram, Facebook, Spotify, or Apple Podcasts – or drop in any audio or video file. We transcribe it once, then you export it however your workflow needs.
- YouTube
- TikTok
- Spotify
- Apple Podcasts
- or any file
Export to any format
- TXT
- Markdown
- DOCX
- SRT
- VTT
- JSON
Most useful for your work: JSON
Timestamps, speaker labels, and subtitle timing carry through to every export.
Transcript to JSON – questions, answered
How do I export a transcript to JSON?
Upload a recording or paste a link on this page – the first 60 minutes are free, no card. Pepys transcribes it and lets you download structured JSON cues with timestamps, speakers, and text.
What's the shape of the JSON?
You get an array of cues, where each cue has a start time, an end time, a speaker label, and the text spoken – simple to iterate over in any programming language.
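If you want that shape checked before your code relies on it, a small typed wrapper is one option. This is a sketch only: the `Cue` class and `parse_cue` function are hypothetical, and treating start and end as numbers is an assumption based on the sample on this page.

```python
from dataclasses import dataclass

REQUIRED_FIELDS = {"start", "end", "speaker", "text"}


@dataclass(frozen=True)
class Cue:
    start: float  # assumed to be seconds
    end: float    # assumed to be seconds
    speaker: str
    text: str


def parse_cue(obj):
    """Build a Cue from one decoded JSON object, raising ValueError on a bad shape."""
    missing = REQUIRED_FIELDS.difference(obj)
    if missing:
        raise ValueError(f"cue is missing fields: {sorted(missing)}")
    cue = Cue(float(obj["start"]), float(obj["end"]), str(obj["speaker"]), str(obj["text"]))
    if cue.end < cue.start:
        raise ValueError(f"cue ends before it starts: {obj}")
    return cue
```

Calling `[parse_cue(obj) for obj in json.loads(raw)]` on the decoded export would then fail early on a malformed file instead of partway through a pipeline.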
Can I convert the JSON to subtitles later?
Yes – the same transcript also exports directly to SRT and VTT, so you can grab JSON for your code and a subtitle file from the one upload.
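If all you have on hand is the JSON, converting it yourself is also straightforward. Here is a minimal sketch that renders cues as SRT text, assuming start and end are in seconds; `srt_timestamp` and `cues_to_srt` are hypothetical helpers, and prefixing each line with the speaker label is a choice of this example rather than anything the SRT export is known to do.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the HH:MM:SS,mmm form SRT uses."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def cues_to_srt(cues):
    """Render cues as SRT: a 1-based index, a timing line, then the text."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f'{srt_timestamp(cue["start"])} --> {srt_timestamp(cue["end"])}'
        blocks.append(f'{index}\n{timing}\n{cue["speaker"]}: {cue["text"]}')
    return "\n\n".join(blocks) + "\n"


cues = [
    {"start": 0.0, "end": 4.2, "speaker": "Speaker 1", "text": "Welcome back to the show."},
    {"start": 4.2, "end": 9.8, "speaker": "Speaker 2", "text": "Glad to be here. Let's get into it."},
]
print(cues_to_srt(cues))
```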
Does it support multiple speakers and languages?
Yes. Cues come speaker-labeled, and language is auto-detected across 99+ languages – you can translate the finished transcript before exporting too.
Transcript to JSON – free to start
Pay as you go – credits never expire, nothing to cancel. Or start free with 60 minutes, no card.