Why a focus group is the hardest recording to transcribe
A focus group is the worst case for automatic speaker labeling. On clean one- or two-person audio, diarization error stays below 10%; on overlap-heavy, many-speaker recordings it runs 35-45% (Ryant et al., DIHARD III, 2021). More voices and more crosstalk mean more turns you'll fix by hand.
A typical group runs 6 to 10 participants (J-PAL, MIT), plus a moderator, all in one room. People finish each other's sentences, laugh over answers, and interrupt. That's exactly the signal that pushes a diarization model off track, because it has to guess who's speaking during every overlap.
The task underneath a good transcript is diarization: labeling who spoke in each turn. It's reliable when voices are distinct and rarely collide. In a group of eight, the model still gives you a strong first draft, but the overlaps and the two soft-spoken participants at the far end of the table are where you'll spend your cleanup time.
How do you record so the voices stay separable?
Separation at the microphone is what saves you hours later. The model can only distinguish voices it can actually hear apart. Put a boundary mic (often sold as a PZM) in the center of the table for the group, and give the moderator a separate mic or channel, so the person steering the discussion is always cleanly captured.
Kill the room's noise before you record. Turn off HVAC, close windows onto traffic, and move away from the buzzing mini-fridge. A round table beats a long boardroom slab, because everyone sits roughly equidistant from the center mic instead of trailing off at the ends.
Open with a name round. Have each participant say their first name, or an assigned ID, in turn before the discussion starts. It anchors who 'Speaker 3' is, timestamps the moment for later, and gives you a clean voice sample of every person to match against.
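If your draft comes out as a list of turns with generic labels, the name round can become a simple lookup table. Here is a minimal sketch, assuming a hypothetical draft format of one dictionary per turn with `speaker` and `text` keys (your tool's export will differ) and a mapping you fill in by hand while listening to the name round:

```python
# Hypothetical draft format: one dict per turn. Real exports will differ.
draft = [
    {"speaker": "Speaker 1", "text": "Let's go around the table. I'm the moderator."},
    {"speaker": "Speaker 2", "text": "I'm P1."},
    {"speaker": "Speaker 3", "text": "P2 here."},
    {"speaker": "Speaker 4", "text": "Hi, P3."},
]

# Filled in by hand while listening to the name round.
name_round = {
    "Speaker 1": "Moderator",
    "Speaker 2": "P1",
    "Speaker 3": "P2",
}


def relabel(turns, mapping):
    """Return new turns with known labels replaced; unknown labels are flagged."""
    relabeled = []
    for turn in turns:
        label = mapping.get(turn["speaker"], turn["speaker"] + " (unverified)")
        relabeled.append({**turn, "speaker": label})
    return relabeled


named = relabel(draft, name_round)
for turn in named:
    print(f'{turn["speaker"]}: {turn["text"]}')
```

Any label missing from the mapping is marked "(unverified)" rather than guessed, so unmatched voices stay visible during cleanup. The original draft is left unchanged.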
How to transcribe a focus group into a clean draft
Typing a group discussion by hand is brutal: a single audio hour can take up to six hours of manual transcription (Haberl et al., 2023, citing Bell et al. 2018), so a 90-minute session can eat up to nine hours, more than a full working day. An AI first pass turns that into minutes of processing plus targeted cleanup.
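The arithmetic behind that estimate is easy to rerun for your own session lengths. The sketch below is illustrative only: the six-to-one ratio is the upper bound cited above, while the function name and the eight-hour workday are assumptions made for the example.

```python
def manual_hours(audio_minutes, ratio=6.0):
    """Upper-bound manual transcription time, in hours,
    at `ratio` hours of typing per hour of audio."""
    return audio_minutes / 60 * ratio


session = manual_hours(90)  # a 90-minute focus group
print(f"{session:.1f} hours of typing")  # 9.0 hours of typing
print(f"{session / 8:.1f} workdays")     # assuming an 8-hour workday
```

Treat the result as a ceiling, not a forecast: the cited figure is "up to" six hours per audio hour, and real sessions vary with audio quality and typing speed.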
Upload the recording to get a multi-speaker, timestamped draft with each turn attributed. Read it against the audio and fix the predictable failure points first: the moderator's questions mislabeled as a participant, two people talking at once, and the quiet voices the mic barely caught.
Correct the speaker labels in the transcript as you go, and mark true overlap with a bracketed tag such as [crosstalk] rather than forcing a guess. A group draft will never be turn-perfect out of the box. Your job is to fix attribution wherever a quote or a coded segment depends on knowing exactly who said it.
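Overlap is one failure point you can locate before you start listening. Here is a minimal sketch, assuming the draft gives each turn a start and end time in seconds. The tuple layout and the 0.5-second threshold are assumptions for illustration, not any tool's real export format. It lists pairs of turns by different speakers whose time ranges collide, so you know which spots to check against the audio first.

```python
# Hypothetical turns: (start_seconds, end_seconds, speaker, text).
turns = [
    (0.0, 4.2, "Moderator", "What did you think of the packaging?"),
    (4.0, 9.5, "P3", "Honestly I liked it, the colors were"),
    (8.1, 11.0, "P5", "The colors were great, yeah."),
    (11.4, 15.0, "P1", "I didn't notice the colors at all."),
]


def find_overlaps(turns, min_overlap=0.5):
    """Return (index_a, index_b, seconds) for turns by different speakers
    whose time ranges overlap by at least `min_overlap` seconds."""
    ordered = sorted(range(len(turns)), key=lambda i: turns[i][0])
    flagged = []
    for pos, a in enumerate(ordered):
        for b in ordered[pos + 1:]:
            if turns[b][0] >= turns[a][1]:
                break  # sorted by start, so no later turn can overlap `a`
            shared = min(turns[a][1], turns[b][1]) - turns[b][0]
            if turns[a][2] != turns[b][2] and shared >= min_overlap:
                flagged.append((a, b, round(shared, 2)))
    return flagged


for a, b, seconds in find_overlaps(turns):
    print(f"Check {turns[b][0]:.1f}s: {turns[a][2]} and {turns[b][2]} "
          f"overlap for {seconds}s; consider [crosstalk]")
```

The script only points at candidates. Whether a flagged spot is true crosstalk or a diarization slip is still a judgment you make with the audio playing.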
What should you keep: verbatim, clean, or coding-ready?
Decide your transcription style before you edit, because in qualitative work the choice between naturalized and denaturalized transcription shapes the analysis (Oliver, Serovich & Mason, 2005), not just readability. Naturalized keeps every utterance in detail; denaturalized corrects grammar and strips interview noise.
For a focus group, the interaction is often the data: who agrees, who pushes back, where the laughter lands. If you're analyzing group dynamics, keep it naturalized, overlaps and all. If you only need the content of what was said, denaturalized reads cleaner. Pick one and apply it consistently across the file.
Then get it into your coding software. Export the cleaned transcript to DOCX for NVivo, ATLAS.ti, or Dedoose, keeping speaker labels and timestamps intact. For the deeper decisions (how to code group data and which convention to choose), see the qualitative research transcription guide.
Consent, confidentiality, and anonymizing the transcript
You cannot promise confidentiality in a focus group. IRB guidance is blunt: because every participant hears each disclosure, the confidentiality of anything said in the session cannot be guaranteed (University of Connecticut IRB; Boise State IRB says the same). Tell participants that upfront.
Get consent to record, ideally captured in the audio. Recording-consent laws vary by state (RCFP): some need one party's consent, others require every participant to agree. With a roomful of people, all-party consent is the safe default, so collect a clear yes from everyone before you hit record.
Anonymize in the transcript, not just the report. Under the Common Rule's definition of identifiable private information (45 CFR 46.102), a name, an employer, or a too-specific anecdote can re-identify someone. Replace these with role labels in a working copy and keep the master access-controlled. With Pepys, the source recording is auto-deleted about 30 days after upload, while the transcript and your exports are kept, so the sensitive original file doesn't linger on a server.
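The role-label replacement described above can be scripted for the identifiers you already know. This is a minimal sketch using Python's standard `re` module; the names, the labels, and the sample sentence are invented for illustration. It is case-sensitive and only catches the terms you list, so nicknames, misspellings, and too-specific anecdotes still need a human read.

```python
import re

# Hypothetical key: real identifier -> role label. Store this key with the
# access-controlled master, never alongside the working copy.
pseudonyms = {
    "Maria": "P1",
    "Devon": "P2",
    "Acme Logistics": "[employer]",
}


def pseudonymize(text, key):
    """Replace each listed identifier with its label, matching whole words only."""
    # Longest terms first so a multi-word term wins over a shorter one inside it.
    terms = sorted(key, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: key[m.group(1)], text)


master = "Maria: I agree with Devon. At Acme Logistics we tried that last year."
working_copy = pseudonymize(master, pseudonyms)
print(working_copy)
```

Run it on a copy, then search the result for anything the key missed before the working file leaves your hands.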