Transcribe Chinese Video
Hand Pepys a Mandarin or Cantonese video and get a timestamped transcript plus matching SRT/VTT captions.
Accepts a Chinese video (MP4, MOV, MKV and more) or a link · returns a timestamped Chinese transcript, with SRT/VTT subtitle export.
60 min free · no card required · we never train on your audio
How do I transcribe a Chinese video?
Drop in the video or paste a link, and Pepys pulls out the Chinese speech, writes it up as a timestamped transcript, and exports SRT or VTT captions to load alongside the clip. It reads Mandarin (普通話 / 國語) and Cantonese (粵語), returns Simplified or Traditional characters, and detects Chinese on its own. The first 60 minutes are free, no card.
How transcribing a Chinese video works
Add your Chinese video
Upload the clip or paste a link – Pepys lifts the audio track for you.
Get transcript & captions
The Mandarin or Cantonese speech comes back as timestamped characters and cue files.
Export
Take the transcript (TXT, Markdown, DOCX, PDF, SRT, VTT, or JSON) or grab SRT/VTT captions with the timing already lined up.
A lecture from a mainland university, a Cantonese cooking channel, a cross-strait interview where the guest slips between 國語 and English mid-sentence: Pepys turns any of it into text. It reads Mandarin (普通話 / 國語) and Cantonese (粵語), writes the result in Simplified or Traditional characters, and gives you a timestamped transcript plus SRT or VTT captions to drop next to the video.
Two things make Chinese video hard to transcribe: tone decides the word, and the script is written without spaces between words. A careless model picks the wrong homophone and leaves you guessing where one word ends and the next begins. Pepys is built for that, and it copes with the real-world mess of a video soundtrack: background music, a roomful of people, a phone mic across the table. Chinese is detected automatically among 99+ languages, the first 60 minutes are free, credits never expire, and your video is never used to train anything.
Clean paragraphs. No more ums and ahs.
The left is what Pepys hands back – logical paragraphs with the filler stripped out, punctuated and readable. The right is the raw, one-line-per-segment dump most transcribers leave you with.
um so yeah everyone keeps telling you to like lead with your best line right but uh honestly if you give away the whole answer in the first second you know there's basically no reason for anyone to keep watching so the hook isn't kind of the smartest thing you say it's like a loop you open that they need to close and um that's the part that actually keeps people around
Chinese video to text with timestamps · plus matching SRT/VTT captions
Reads Mandarin (普通話 / 國語) and Cantonese (粵語), even over music and crowd noise
Paste a link or upload the file · every common video format
99+ languages including Chinese, auto-detected · we never train on your audio · credits never expire
Any language – 99+ detected automatically
- English
- 中文
- Español
- العربية
- हिन्दी
- Français
- 日本語
- Português
- Русский
- Deutsch
- 한국어
- Italiano
- বাংলা
- Türkçe
- فارسی
- Tiếng Việt
- தமிழ்
- Polski
- ไทย
- Українська
- Nederlands
- עברית
- Ελληνικά
- తెలుగు
- Bahasa Indonesia
- اردو
- Svenska
- मराठी
- Română
- Magyar
- Čeština
- ગુજરાતી
- Kiswahili
- ქართული
- Tagalog
- አማርኛ
Works with the platforms you live in.
Paste a link from YouTube, TikTok, Instagram, Facebook, Spotify, or Apple Podcasts – or drop in any audio or video file. We transcribe it once, then you export it however your workflow needs.
- YouTube
- TikTok
- Spotify
- Apple Podcasts
- or any file
Export to any format
- TXT
- Markdown
- DOCX
- SRT
- VTT
- JSON
Timestamps, speaker labels, and subtitle timing carry through to every export.
Transcribe Chinese video – questions, answered
How do I transcribe a Chinese video?
Upload the clip or paste a link (the first 60 minutes are free, no card). Pepys extracts the audio, transcribes the Mandarin or Cantonese speech, and returns a timestamped transcript plus captions in minutes.
Can I get Chinese captions as well?
Yes. Export the captions as a downloadable SRT or VTT sidecar file with the timestamps already aligned to the video.
What if the speakers mix Chinese and English?
Pepys handles it. Code-switching is common in cross-strait and Hong Kong video, so Pepys keeps the English words intact inside the Chinese transcript instead of mangling them into characters.
Which video formats work?
MP4, MOV, MKV, WEBM, AVI and more, plus links. Pepys extracts the audio track and transcribes the speech.
Is my video kept private?
Yes. We never train on your video or transcripts, and you can have files auto-deleted once processing finishes.
Transcribe Chinese video – free to start
Pay as you go – credits never expire, nothing to cancel. Or start free with 60 minutes, no card.