Apple Voice Memos transcription

What Apple's built-in transcript can and can't do for interviews and research, and how to turn a Voice Memo into a diarized, timestamped file you can actually cite.

The short answer

Apple's Voice Memos transcribes recordings to text on the device itself – a Mac on macOS 15 or later with Apple silicon, or an iPhone 12 or later on iOS 18 – but it adds no speaker labels, offers no timestamped export file, and keeps the transcript tied to the recording. For a diarized, timestamped, exportable transcript, export the recording as an .m4a and upload it to a transcription tool.

What the built-in transcript is, and what it needs

Recent versions of Voice Memos turn a recording into text on the device itself, with no third-party app in the loop. On a Mac, that takes macOS 15 (Sequoia) or later on Apple silicon. Apple's Mac guide says that on those machines, "speech in your audio recordings can be recognized and transcribed to text."

On iPhone, the feature needs an iPhone 12 or later running iOS 18, and Apple notes "it's not available in all countries or regions."

The work appears to happen on your own hardware, so the audio shouldn't need a server round-trip. Apple's own Voice Memos pages don't use the phrase "on-device," but the requirement for Apple silicon, or an iPhone 12-class chip, points that way. Independent reporting describes this kind of transcription as processing audio "locally on the user's hardware ... without connecting to an external server."

Supported languages differ by device

The language list isn't universal, which trips people up. Apple's Mac guide names 16: "English, Danish, Dutch, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish, Turkish, Chinese (Simplified), Chinese (Traditional), Japanese, Korean, and Vietnamese."

The iPhone page is shorter. It documents 10 languages: English (all variants), Spanish, Portuguese, Italian, French, German, Japanese, Korean, Simplified Chinese, and Traditional Chinese. Same feature, narrower list.

Before you count on a language, open Apple's page for the device you'll actually use. What the Mac handles and what the iPhone handles are not the same set.
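
To make the gap between the two lists concrete, here is a minimal Python sketch that compares them as sets. The names come from the two Apple lists quoted above; the only liberty taken is writing the iPhone page's "Simplified Chinese" and "Traditional Chinese" in the Mac guide's "Chinese (Simplified)" form so the names match. MAC, IPHONE, and mac_only are illustrative names, not anything Apple provides.

```python
# Language lists as quoted above from Apple's Mac guide and iPhone page.
# Assumption: the two Chinese entries are spelled the Mac guide's way in
# both sets so that identical languages compare equal.
MAC = {
    "English", "Danish", "Dutch", "French", "German", "Italian",
    "Norwegian", "Portuguese", "Spanish", "Swedish", "Turkish",
    "Chinese (Simplified)", "Chinese (Traditional)",
    "Japanese", "Korean", "Vietnamese",
}
IPHONE = {
    "English", "Spanish", "Portuguese", "Italian", "French", "German",
    "Japanese", "Korean", "Chinese (Simplified)", "Chinese (Traditional)",
}


def mac_only(mac=MAC, iphone=IPHONE):
    """Return the languages the Mac list names that the iPhone list does not."""
    return sorted(mac - iphone)


if __name__ == "__main__":
    print(mac_only())
```

On the lists as quoted, the Mac-only languages are Danish, Dutch, Norwegian, Swedish, Turkish, and Vietnamese. Apple can change either list, so treat this as a snapshot and check the live pages.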

Where Voice Memos transcription stops short

For a solo memo, the built-in transcript is genuinely useful. For anything with two or more voices, three gaps start to matter. None of Apple's transcription pages describe speaker labels; the documentation covers viewing, searching, and copying text, and stops there.

No speaker labels means no diarization. A two-person interview comes back as one undivided block of text, with no "Interviewer" and "Source" turns to work from. Rebuilding who-said-what by ear is the slow part, and it's exactly what speaker diarization exists to remove.

There's no timestamped transcript file either. Apple documents selecting the transcript text and copying it, "Control-click it, then choose Copy," but nothing exports a standalone document or caption file with times attached. So you can't jump from a written line straight back to the second it was spoken.

The transcript is also device-bound. With iCloud, your recordings "appear automatically" in Voice Memos across your Mac, iPhone, and iPad when you're signed in to the same Apple Account. Handy, but that's sync, not a portable artifact. You can't hand the transcript to a fact-checker or a coding tool as a file. Copy the text out and you lose whatever structure it had.

Turning the .m4a into a citeable transcript

The recording itself travels fine. A Voice Memo exports as an .m4a audio file (Apple: "by default, recordings are exported in .m4a format"), a plain, widely supported container that most transcription tools accept. That audio file, not the built-in transcript, is what you want to move.

Push that .m4a through a tool that diarizes and timestamps, and you get the two things Voice Memos withholds: labeled speaker turns and times you can cite. From there you can export a timestamped file such as SRT or a formatted document, then check any quote against the audio in seconds.

On a fresh recording the path is short. Record in Voice Memos, export the .m4a, upload it, and edit only the quotes you'll actually publish.
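
For a sense of what "labeled speaker turns and times you can cite" looks like as a file, here is a minimal Python sketch that writes diarized segments out as SRT. The Segment structure and the "Speaker 1" labels are hypothetical, since every transcription tool has its own output schema; only the SRT layout itself (a counter, a start --> end line in HH:MM:SS,mmm, then the text) is standard.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One speaker turn. Hypothetical shape, not any specific tool's output."""
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    speaker: str   # e.g. "Speaker 1"
    text: str


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as SRT cues, prefixing each line with its speaker label."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_time(seg.start)} --> {srt_time(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Speaker 1", "Thanks for making time."),
        Segment(2.5, 6.0, "Speaker 2", "Happy to. Where should we start?"),
    ]
    print(to_srt(demo))
```

With the times in the file, checking a quote means reading the cue's start time and scrubbing the audio to that second.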

Is the built-in transcript enough on its own?

For a voice note to yourself, yes. For interview or research work where you'll quote people, it gets you a rough read but not a citeable file: the speakers arrive unlabeled, the timing never lands in a file, and nothing exports to hand off. Whether that's a dealbreaker depends on what you'll do with the words.

The economics still favor letting a machine draft first. Transcribing by hand can take "up to six hours of manual work" for a single hour of audio. Modern speech recognition, by contrast, reaches "a Word Accuracy of 97.9329%" on a benchmark of clean read speech. So you're correcting a strong draft, not typing from nothing. Feed the tool clean audio and your time goes to the quotes that matter.
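
As a rough back-of-the-envelope check on those figures, the sketch below estimates how many words in a machine draft would need correcting. The 150 words-per-minute speaking rate is an assumption of this sketch, not a number from the sources, and the accuracy figure is for clean read speech, so a real interview with crosstalk or background noise should be expected to do worse.

```python
WORD_ACCURACY = 0.979329   # benchmark figure quoted above (clean read speech)
WORDS_PER_MINUTE = 150     # ASSUMPTION: rough conversational pace, not from the sources


def expected_word_errors(audio_minutes, wpm=WORDS_PER_MINUTE, accuracy=WORD_ACCURACY):
    """Estimate how many words in a draft transcript are likely to be wrong."""
    total_words = audio_minutes * wpm
    return round(total_words * (1 - accuracy))


if __name__ == "__main__":
    print(expected_word_errors(60))
```

Under those assumptions, an hour of audio is about 9,000 words, of which roughly 186 would need fixing. That is a best case, but it shows why reviewing a draft is a different job from typing six hours of transcript.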

Consent is the one thing no tool settles for you. U.S. federal law makes "one-party consent ... the minimum requirement," and about 11 states require every party to agree, so get a clear yes on the record before the substance starts. If you recorded the interview this way, the full interview workflow covers cleanup, verbatim style, and export.

Tips from people who do this a lot

Send the .m4a, not the copied transcript text – a tool can only rebuild speaker turns and timestamps from the audio.
Check Apple's page for the exact device you'll use before you rely on a language: the Mac guide lists 16 transcription languages and the iPhone page lists 10, so a language that works on your laptop may not be there on your phone.
For a two-person interview, don't split the undivided block by ear – a diarizing tool labels the turns in one pass and frees your attention for the quotes.
The transcript lives inside the recording. Copy it out and the words survive, but the structure doesn't.
Say each speaker's name and the date into the recording at the top; once you move the .m4a somewhere that can label speakers, that anchor tells you who "Speaker 1" really is.

Apple Voice Memos transcription – questions, answered

Can Voice Memos transcribe a recording by itself?

Yes, on recent hardware. A Mac needs macOS 15 or later with Apple silicon, and an iPhone needs an iPhone 12 or later on iOS 18. The feature turns speech into text on the device, but it isn't available in every country or region.

Does Voice Memos label who is speaking?

No. Apple's transcription pages describe viewing, searching, and copying the text, with no speaker labels. A two-person interview comes back as one undivided block, so you'd separate the turns by ear or run the audio through a tool that diarizes it for you.

Can I export a timestamped transcript from Voice Memos?

Not as a file. Apple documents selecting and copying the transcript text, but there's no export to a standalone document or caption file with timestamps. The transcript stays attached to the recording and syncs across your own Apple devices rather than existing as a portable artifact.

What languages can Voice Memos transcribe?

It depends on the device. Apple's Mac guide lists 16 languages, including English, Danish, Dutch, and Vietnamese, while the iPhone page lists 10. Check Apple's page for your specific device, since the supported set is not the same on Mac and iPhone.

How do I get a diarized, timestamped transcript from a Voice Memo?

Export the recording as an .m4a, which is the default format, then upload it to a transcription tool that labels speakers and adds timestamps. You get back a citeable draft you can edit and export, which the built-in transcript doesn't provide.

References

1. View a transcription of a recording in Voice Memos (Mac) – Apple Support
2. View a transcription of a recording in Voice Memos (iPhone) – Apple Support
3. Export a Voice Memos recording to Files (iPhone) – .m4a default format – Apple Support
4. See your Voice Memos recordings on all your Apple devices (iCloud sync) – Apple Support
5. Audio transcription compared – cloud-based vs on-device – AppleInsider
6. Haberl, Fleiss, Kowald, Thalmann (2023), Take the aTrain (arXiv:2310.11967) – manual transcription time, citing Bell et al. (2018) – arXiv
7. Whisper MLPerf Inference v5.1 benchmark – Word Accuracy on LibriSpeech – MLCommons
8. Introduction to the Reporter's Recording Guide – one-party vs all-party consent – Reporters Committee for Freedom of the Press
