Record before you transcribe a Zoom meeting
Zoom can generate an audio transcript for cloud recordings, but that feature needs a paid plan – Pro, Business, Education, or Enterprise – with cloud recording turned on by an admin. On a free Basic account it isn't there by default. So your first decision is how you'll capture the call: Zoom's cloud transcript, a local recording, or a separate file per speaker.
The best capture for accuracy is per-participant audio. Zoom can save a separate audio file for each participant, so every speaker lands on their own track. Isolated tracks make speaker labeling far cleaner, because the tool isn't guessing who's talking when two people talk over each other. Turn that setting on before the meeting, not after.
If you can only get a single mixed recording, that's workable – you'll just fix more speaker turns by hand. Either way, have each person say their name and the date at the top of the call. That ties each voice to a name, so you know who 'Speaker 1' is, and puts a date on the recording alongside any consent given on it.
Should you use Zoom's transcript or upload the audio?
Zoom's built-in transcript is convenient, but it's tied to a paid plan. Uploading your recording to a dedicated tool is the more flexible route, especially if you captured per-participant tracks. The underlying workflow is the same one you'd use to transcribe any interview: AI does the first pass, you do the cleanup.
Typing it up by hand is the slow path: it can take up to six hours for a single hour of audio, which is most of a working day for one meeting. An AI first pass cuts the transcription itself to a few minutes of processing, followed by a focused cleanup, so you're editing, not re-transcribing.
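To put rough numbers on that trade-off, here is a minimal sketch of the arithmetic. The six-hours-per-audio-hour figure is the upper bound quoted above; the processing time and cleanup ratio are placeholder assumptions, not measurements, so swap in your own.

```python
# Upper bound quoted in the text: six hours of typing per hour of audio.
MANUAL_HOURS_PER_AUDIO_HOUR = 6.0


def manual_hours(audio_hours: float) -> float:
    """Worst-case time to transcribe entirely by hand."""
    return audio_hours * MANUAL_HOURS_PER_AUDIO_HOUR


def assisted_hours(
    audio_hours: float,
    processing_minutes_per_audio_hour: float = 5.0,  # assumption, not a benchmark
    cleanup_hours_per_audio_hour: float = 1.0,       # assumption: one read-through against the audio
) -> float:
    """Estimated time for an AI first pass plus human cleanup."""
    per_hour = processing_minutes_per_audio_hour / 60 + cleanup_hours_per_audio_hour
    return audio_hours * per_hour


if __name__ == "__main__":
    audio = 1.0  # one-hour meeting
    print(f"by hand:  up to {manual_hours(audio):.1f} h")
    print(f"assisted: about {assisted_hours(audio):.2f} h")
```

If your cleanup usually takes longer than one pass through the audio, raise `cleanup_hours_per_audio_hour`; the comparison only holds as long as that number stays well under six.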
Where the machine still needs you: proper nouns, acronyms, numbers said fast, and crosstalk where two people speak at once. Those are exactly the lines that matter for an attributable quote. Let the tool handle the bulk and spend your attention on the small share that's load-bearing.
What gives you the cleanest speaker labels on a Zoom recording?
Feed the tool separated audio. Zoom's separate-file-per-participant recording gives each speaker a clean track, which is the single biggest lever on label accuracy – bigger than any setting inside the transcription tool. When every voice arrives on its own file, the labels stay distinct even through fast back-and-forth.
With a single mixed file, the tool separates speakers by voice, which works but stumbles on overlapping speech and similar-sounding people. Read the labeled draft against the audio and correct the turns around crosstalk. Keeping gallery-view video helps too – matching a voice to a face resolves an ambiguous label in seconds.
Don't rename speakers until the draft is otherwise clean. Fix the turn boundaries first, then swap 'Speaker 1' for the real name in one pass, so a mislabeled stretch doesn't propagate the wrong name through the whole transcript.
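If your tool exports plain text, the one-pass rename can be sketched in a few lines. This assumes each turn starts with a label like `Speaker 1:` at the beginning of a line, which is a common export layout but not a guaranteed one; the names `Ana` and `Ben` are made up for the example.

```python
import re


def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Swap generic speaker labels for real names in a single pass."""
    # Match a label only when it opens a line and is followed by a colon,
    # so "Speaker 1" mentioned inside someone's speech is left alone.
    # \d+ consumes the whole number, so "Speaker 10" is never half-replaced.
    label = re.compile(r"^(Speaker \d+)(?=:)", flags=re.MULTILINE)
    # Labels missing from the mapping are kept unchanged.
    return label.sub(lambda m: names.get(m.group(1), m.group(1)), transcript)


if __name__ == "__main__":
    draft = (
        "Speaker 1: Thanks for joining.\n"
        "Speaker 2: Happy to be here.\n"
        "Speaker 1: Let's start with the budget."
    )
    print(rename_speakers(draft, {"Speaker 1": "Ana", "Speaker 2": "Ben"}))
```

Doing the swap from one mapping, rather than by repeated find-and-replace, means a wrong assignment is fixed by changing one entry and rerunning on the cleaned draft.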
Verbatim, clean, or readable – and quoting people fairly
Decide your transcript style before you edit, because it changes every line. In research, that choice is a real act of representation, not a neutral clerical step. Strict verbatim keeps every 'um' and false start; clean verbatim drops the filler but keeps the words; readable verbatim lightly tidies grammar so a quote reads smoothly in print.
For the quotes you'll actually publish, tighten to your chosen style and keep the timestamp so you can re-check the line against the audio before it goes out. Don't polish the whole transcript to publication quality. Most of it you'll never quote, so spend the effort on the lines going into the piece.
Never silently fix a factual slip a speaker made. If they say the wrong year, the quote keeps the wrong year, and you flag it with a bracketed [sic] – the standard way to show the error is the source's, not yours – rather than making a quiet edit.
Consent and where the recording lives
Zoom shows a recording banner, but a banner isn't the same as consent. US recording law varies: federal law and most states allow one-party consent, while roughly a dozen states require every party to agree. Get a clear yes on the record before the substance starts.
Zoom calls routinely cross state lines, and that complicates things. When participants sit in different states, the cautious move is to assume the stricter state law applies – so treat an all-party-consent state on the call as the rule for everyone. Asking each person to consent on the recording is the cleanest defense.
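The "assume the stricter rule" approach is simple enough to write down as a check. This is a sketch of the decision rule only: the caller supplies the set of all-party-consent states, and the two-state sample below is deliberately incomplete and purely illustrative, so build the real set from current statutes.

```python
def consent_rule(participant_states, all_party_states):
    """Return the stricter rule that applies across everyone on the call."""
    on_call = {s.strip().upper() for s in participant_states}
    strict = {s.strip().upper() for s in all_party_states}
    # One participant in an all-party state is enough to apply it to everyone.
    return "all-party" if on_call & strict else "one-party"


# Illustrative, incomplete sample. Not a legal reference.
SAMPLE_ALL_PARTY = {"CA", "FL"}

if __name__ == "__main__":
    print(consent_rule(["NY", "CA"], SAMPLE_ALL_PARTY))
    print(consent_rule(["NY", "TX"], SAMPLE_ALL_PARTY))
```

Even when the check comes back one-party, asking for a recorded yes from each person costs a few seconds and removes the question.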
For sensitive meetings, mind where the audio and transcript sit. Use a tool that doesn't train on your files and lets you delete them after processing; Pepys never trains on your audio or text, and you can auto-delete files after transcription. When you're done, export to DOCX for writing or SRT/VTT for captions, and keep the master recording somewhere access-controlled.