9,438,517 minutes transcribed

Chinese Audio to Text

Drop in a Mandarin or Cantonese recording and get a clean, timestamped 逐字稿 (verbatim transcript) back – in Simplified or Traditional characters.


Accepts Chinese audio or video – MP3, M4A, WAV, MP4 and more, or a link · returns a clean, timestamped Chinese transcript.

60 min free · no card required · we never train on your audio

Trusted by 100k+ users

How do I turn Chinese audio into text?

Upload your file or paste a link and Pepys writes your Chinese audio out as a timestamped transcript in minutes. It hears Mandarin (普通話 / 國語) and Cantonese (粵語) alike, returns Simplified or Traditional characters to match the speaker, and picks Chinese out of 99+ languages on its own. The first 60 minutes are free, no card required.

How Chinese audio to text works

Upload or paste a link

Drop in a Mandarin or Cantonese recording, or paste a link – no install.

Get your transcript

Pepys writes the speech out as timestamped characters in minutes.

Edit and export

Fix any character inline, then export to TXT, Markdown, DOCX, PDF, SRT, VTT, or JSON.

"Chinese" on a recording could be a Beijing lecturer in crisp 普通話, a Hong Kong host rattling through 粵語, or a Taipei guest who calls it 國語 – and Pepys takes all of them. Feed it an interview, a class recording, a podcast episode, or a quick voice memo, and you get back searchable, timestamped characters you can quote, edit, and translate without retyping a word.

The thing that defeats off-the-shelf models here is that Chinese carries meaning in tone and is written with no spaces between words, so the same syllable maps to a different character for each tone: Mandarin has four tones plus a neutral one, and Cantonese has six. Pepys is tuned for exactly that, and it hands you a finished 逐字稿 (verbatim transcript) in either Simplified or Traditional script. Chinese is detected automatically among 99+ languages, your first 60 minutes are free, credits never expire, and your audio is never used to train anything.
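To make the tone problem concrete, here is a small Python sketch. It is an illustration only, not how Pepys works internally: the `TONE_EXAMPLES` table and both helper functions are hypothetical names invented for this example. It shows how four distinct Mandarin words collapse into one ambiguous syllable once the tone is thrown away.

```python
import unicodedata

# Illustrative only: the syllable "ma" in the four Mandarin tones, written in
# pinyin with tone marks, each mapping to a different common character.
TONE_EXAMPLES = {
    "mā": ("媽", "mother"),    # first tone, high and level
    "má": ("麻", "hemp"),      # second tone, rising
    "mǎ": ("馬", "horse"),     # third tone, dipping
    "mà": ("罵", "to scold"),  # fourth tone, falling
}


def strip_tone(pinyin: str) -> str:
    """Drop the tone mark, roughly what a tone-blind model would hear."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def candidates_without_tone(syllable: str) -> list[str]:
    """Return every character that matches once tone is ignored."""
    return [
        char
        for pinyin, (char, _meaning) in TONE_EXAMPLES.items()
        if strip_tone(pinyin) == syllable
    ]


if __name__ == "__main__":
    # Without tone, one syllable has four equally plausible characters.
    print(candidates_without_tone("ma"))
```

A recognizer that tracks pitch can separate these four; one that does not has to guess from context alone.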

Clean paragraphs. No more um's and ah's.

The left is what Pepys hands back – logical paragraphs with the filler stripped out, punctuated and readable. The right is the raw, one-line-per-segment dump most transcribers leave you with.

reel-voiceover.mp4

um so yeah everyone keeps telling you to like lead with your best line right but uh honestly if you give away the whole answer in the first second you know there's basically no reason for anyone to keep watching so the hook isn't kind of the smartest thing you say it's like a loop you open that they need to close and um that's the part that actually keeps people around

Raw


Accurate transcripts for both Mandarin (普通話 / 國語) and Cantonese (粵語)
Timestamps and per-turn speaker labels · export to TXT, Markdown, DOCX, PDF, SRT, VTT, or JSON
Translate the finished Chinese transcript into another language in one click
99+ languages including Chinese, auto-detected · we never train on your audio · credits never expire

Any language – 99+ detected automatically

English
中文
Español
العربية
हिन्दी
Français
日本語
Português
Русский
Deutsch
한국어
Italiano
বাংলা
Türkçe
فارسی
Tiếng Việt
தமிழ்
Polski
ไทย
Українська
Nederlands
עברית
Ελληνικά
తెలుగు
Bahasa Indonesia
اردو
Svenska
मराठी
Română
Magyar
Čeština
ગુજરાતી
Kiswahili
ქართული
Tagalog
አማርኛ
99+ total

Works with the platforms you live in.

Paste a link from YouTube, TikTok, Instagram, Facebook, Spotify, or Apple Podcasts – or drop in any audio or video file. We transcribe it once, then you export it however your workflow needs.

YouTube
TikTok
Instagram
Facebook
Spotify
Apple Podcasts
or any file

Export to any format

TXT
Markdown
DOCX
PDF
SRT
VTT
JSON

Timestamps, speaker labels, and subtitle timing carry through to every export.
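As a rough sketch of what carrying timestamps and speaker labels into a subtitle file involves, the Python below turns a list of timestamped segments into SRT text. The segment shape (`start`, `end`, `text`, and an optional `speaker`) is an assumption made for this example, not the documented layout of the Pepys JSON export, and the function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp that SRT expects."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments; the dict keys are assumed, not official."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        # Prefix the speaker label when the segment carries one.
        text = f"{seg['speaker']}: {seg['text']}" if seg.get("speaker") else seg["text"]
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "speaker": "Speaker 1", "text": "大家好"},
        {"start": 2.5, "end": 5.0, "text": "歡迎收聽"},
    ]
    print(segments_to_srt(demo))
```

VTT differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds, so the same segment list could feed both formats.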

Chinese audio to text – questions, answered

How do I turn Chinese audio into text?

Upload your recording or paste a link on this page – the first 60 minutes are free, no card. Pepys writes the Mandarin or Cantonese speech out as timestamped characters within minutes.

Does it handle both Mandarin and Cantonese?

Yes. Pepys transcribes Mandarin (普通話 / 國語) and Cantonese (粵語), and returns Simplified or Traditional characters to suit the speaker. If anything is off, you can correct it inline before exporting.

Why is Chinese so hard for speech models?

Tone changes the word, and there are no spaces to mark where one word ends and the next begins – so a model that ignores tone and word boundaries guesses the wrong homophone constantly. Pepys is built around that, then lets you tidy any character by hand.
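The missing-spaces half of the problem can be shown with a toy word segmenter. The Python below is a deliberately naive sketch (greedy forward maximum matching over a five-word vocabulary made up for this example) and says nothing about how Pepys itself segments text. It only shows how easily a greedy guess splits a sentence in the wrong place.

```python
def forward_max_match(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Greedy segmentation: always take the longest vocabulary word available."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Toy vocabulary, invented for this example.
    vocab = {"研究", "研究生", "生命", "起源", "的"}
    # Intended reading: 研究 / 生命 / 的 / 起源 ("study the origin of life").
    # The greedy pass grabs 研究生 ("graduate student") and strands 命.
    print(forward_max_match("研究生命的起源", vocab))
```

Both splits are made of real words, which is why word boundaries in Chinese have to be inferred from context rather than read off the page.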

Can I choose Simplified or Traditional output?

Yes. The transcript comes back in whichever script fits your source, and you can convert or edit characters inline before you export.

Is my Chinese audio kept private?

Yes. We never train AI on your audio or transcripts, and you can set files to auto-delete once processing is done.


Don't just take our word for it.

Ask ChatGPT, Claude, or Perplexity what Pepys is and who it's for. One click, and your favorite AI does the homework.

Ask ChatGPT · Ask Claude · Ask Perplexity

Chinese audio to text – free to start

Pay as you go – credits never expire, nothing to cancel. Or start free with 60 minutes, no card.

Start free – 60 minutes · or see pricing