Pepys
9,438,517 minutes transcribed

Vietnamese Audio to Text

Drop in a Vietnamese recording or paste a link and get a timestamped transcript with every tone mark and diacritic in the right place.

Accepts Vietnamese audio or video – MP3, M4A, WAV, MP4 and more, or a link · returns a clean, timestamped transcript in correctly accented quốc ngữ.

60 min free · no card required · we never train on your audio

Podcaster · Journalist · Content creator · Researcher · Student
Trusted by 100k+ users · Rated 4.9 out of 5 by 100k+ users

How do I convert Vietnamese audio to text?

Upload your file or paste a link, and Pepys turns Vietnamese speech into a clean, timestamped transcript in minutes – correctly spelled in quốc ngữ, with the six tones reflected in word choice. It follows Northern (Hanoi) and Southern (Saigon) speakers alike, auto-detects Vietnamese among 99+ languages, and adds an AI summary. The first 60 minutes are free, no card.

How Vietnamese audio to text works

01

Upload or paste a link

Drop in a Vietnamese recording or paste a link – any format, nothing to install.

02

Get your transcript

Pepys writes the speech out in quốc ngữ with timestamps, ready in minutes.

03

Edit and export

Fix any term inline, then export to TXT, Markdown, DOCX, PDF, SRT, VTT, or JSON.

Tiếng Việt is written in quốc ngữ – a Latin alphabet so loaded with diacritics that a single vowel can carry a tone mark and a vowel modifier at once (think ấ, ệ, ở). Get one of those marks wrong and you've written a different word entirely. Pepys is tuned to place them correctly, so an interview, lecture, podcast, or voice note comes back as text you can actually search, quote, and translate – not a stream of bare vowels.
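To make the stacking concrete, here is a minimal Python sketch that uses the standard `unicodedata` module to split a Vietnamese vowel into its base letter and combining marks, and to show what a transcript looks like once those marks are lost. The helper names `describe_marks` and `strip_marks` are ours, for illustration only; this says nothing about how Pepys works internally.

```python
import unicodedata

def describe_marks(ch: str) -> list[str]:
    """Return the Unicode names of the base letter and each combining mark in ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

def strip_marks(text: str) -> str:
    """Drop every combining mark, leaving the 'bare vowels' described above."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Each of these vowels decomposes into one base letter plus two marks.
for ch in "ấệở":
    print(ch, describe_marks(ch))

# Without its marks, "quốc ngữ" collapses to plain ASCII.
print(strip_marks("quốc ngữ"))
```

Note that đ is a separate letter rather than d plus a combining mark, so this particular stripping approach leaves it untouched.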

The deeper challenge is the tones. Vietnamese carries six of them, and they do real lexical work: "ma, má, mà, mả, mã, mạ" are six unrelated words separated only by pitch. Generic speech models flatten that distinction; a Vietnamese-aware model has to hear it. Pepys also rolls with the regional split – clipped Northern (Hanoi) vowels, the softer Southern (Saigon) sound. Vietnamese is auto-detected among 99+ languages, your first 60 minutes are free, credits never expire, and we never train on your audio.
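The six-way "ma" contrast can be checked in written form with a short Python sketch. It reads the tone off the combining mark in each syllable, using the standard Vietnamese tone names. The function `tone_of` is a hypothetical helper for illustration; it works on text only and is not a description of how Pepys recognises tones in audio.

```python
import unicodedata

# Combining marks for the five marked tones; the sixth tone (ngang) is unmarked.
TONE_MARKS = {
    "\u0301": "sắc",    # acute accent
    "\u0300": "huyền",  # grave accent
    "\u0309": "hỏi",    # hook above
    "\u0303": "ngã",    # tilde
    "\u0323": "nặng",   # dot below
}

def tone_of(syllable: str) -> str:
    """Name the tone of a written syllable by finding its tone mark, if any."""
    for c in unicodedata.normalize("NFD", syllable):
        if c in TONE_MARKS:
            return TONE_MARKS[c]
    return "ngang"

for word in ["ma", "má", "mà", "mả", "mã", "mạ"]:
    print(word, tone_of(word))
```

Vowel modifiers such as the circumflex or horn are not in `TONE_MARKS`, so a syllable like "mô" is still reported as ngang.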

Clean paragraphs. No more ums and ahs.

Pepys hands back logical paragraphs with the filler stripped out, punctuated and readable. Most transcribers leave you with a raw, one-line-per-segment dump like the sample below.

reel-voiceover.mp4 (raw)

um so yeah everyone keeps telling you to like lead with your best line right but uh honestly if you give away the whole answer in the first second you know there's basically no reason for anyone to keep watching so the hook isn't kind of the smartest thing you say it's like a loop you open that they need to close and um that's the part that actually keeps people around

  • Correct quốc ngữ spelling – tone marks and vowel diacritics land in the right place, across Hanoi and Saigon accents

  • Timestamps and per-chunk speaker labels · export to TXT, Markdown, DOCX, PDF, SRT, VTT, or JSON

  • Translate the finished Vietnamese transcript into another language with one click

  • 99+ languages including Vietnamese, auto-detected · we never train on your audio · credits never expire

Any language – 99+ detected automatically

Works with the platforms you live in.

Paste a link from YouTube, TikTok, Instagram, Facebook, Spotify, or Apple Podcasts – or drop in any audio or video file. We transcribe it once, then you export it however your workflow needs.

  • YouTube
  • TikTok
  • Instagram
  • Facebook
  • Spotify
  • Apple Podcasts
  • or any file

Export to any format

  • TXT
  • Markdown
  • DOCX
  • PDF
  • SRT
  • VTT
  • JSON

Timestamps, speaker labels, and subtitle timing carry through to every export.
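For readers who want to post-process an export themselves, here is a rough Python sketch of how timestamped, speaker-labelled segments map onto SRT cues. The `Segment` shape is an assumption made for illustration; the actual layout of a Pepys JSON export is not documented here and may differ.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    # Hypothetical segment shape, not the documented Pepys export schema.
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str
    text: str

def srt_time(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        timing = f"{srt_time(seg.start)} --> {srt_time(seg.end)}"
        cues.append(f"{i}\n{timing}\n{seg.speaker}: {seg.text}\n")
    return "\n".join(cues)

print(to_srt([
    Segment(0.0, 2.5, "Speaker 1", "Xin chào các bạn."),
    Segment(2.5, 5.0, "Speaker 2", "Cảm ơn bạn."),
]))
```

WebVTT uses the same cue idea with a `WEBVTT` header line and a period instead of a comma before the milliseconds.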

Vietnamese audio to text – questions, answered

How do I convert Vietnamese audio to text?

Drop your Vietnamese recording on this page or paste a link – the first 60 minutes are free, no card. Pepys writes it out in timestamped quốc ngữ within minutes, then lets you tidy anything up inline.

Will the tone marks and diacritics be correct?

That's the whole point. Pepys places tone marks and vowel diacritics (á, ầ, ệ, ở and the rest) where they belong, since a missing mark changes the word. Anything off, you can fix inline before exporting.

Does it understand Northern and Southern accents?

Yes. It follows clipped Northern (Hanoi) speech and the softer Southern (Saigon) sound, and you can correct any term inline before you export.

Why is Vietnamese tricky for generic transcribers?

Six tones decide a word's meaning – ma, má, mà, mả, mã and mạ are six different words. Generic models smear those together; Pepys is built to keep them apart.

Is my Vietnamese audio private?

Yes. We never train AI on your audio or transcripts, and you can set files to auto-delete after processing.

Don't just take our word for it.

Ask ChatGPT, Claude, or Perplexity what Pepys is and who it's for. One click, and your favorite AI does the homework.

Vietnamese audio to text – free to start

Pay as you go – credits never expire, nothing to cancel. Or start free with 60 minutes, no card.