How to transcribe a focus group

A working guide for qualitative and UX researchers running group sessions – how to get accurate, attributable, coding-ready transcripts from a room full of voices.

The short answer

To transcribe a focus group, record so each voice stays separable, then upload the file for a speaker-labeled, timestamped draft. Expect to fix more speaker turns by hand than in a one-on-one interview, because overlap and 6 to 10 voices push automatic labeling to its limit. Clean only the passages you'll code or quote, and anonymize names in the transcript itself.

Why a focus group is the hardest recording to transcribe

A focus group is the worst case for automatic speaker labeling, known as diarization. On clean one- or two-person audio, diarization error stays below 10%; on overlap-heavy, many-speaker recordings it runs 35 to 45% (Ryant et al., DIHARD III, 2021). More voices and more crosstalk mean more turns you'll fix by hand.
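For readers who want to see what those percentages measure: diarization error rate is conventionally the share of speech time that is missed, falsely detected, or attributed to the wrong speaker. A minimal sketch of that ratio follows. The function name and the durations are hypothetical, chosen for illustration only; they are not figures from DIHARD III.

```python
def diarization_error_rate(missed_s, false_alarm_s, confusion_s, total_speech_s):
    """Return DER as a fraction of the total reference speech time (seconds)."""
    if total_speech_s <= 0:
        raise ValueError("total_speech_s must be positive")
    return (missed_s + false_alarm_s + confusion_s) / total_speech_s


# Hypothetical group session containing 3000 seconds of actual speech.
der = diarization_error_rate(
    missed_s=300,        # speech the model did not detect at all
    false_alarm_s=150,   # non-speech the model labeled as speech
    confusion_s=750,     # speech attributed to the wrong participant
    total_speech_s=3000,
)
print(f"DER: {der:.0%}")  # DER: 40%
```

In a crowded room the confusion term tends to dominate, which is why most of the cleanup is relabeling turns rather than retyping words.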

A typical group runs 6 to 10 participants (J-PAL, MIT), plus a moderator, all in one room. People finish each other's sentences, laugh over answers, and interrupt. That's exactly the signal that pushes a diarization model off track, because it has to guess who's speaking during every overlap.

The task underneath a good transcript is diarization: labeling who spoke in each turn. It's reliable when voices are distinct and rarely collide. In a group of eight, the model still gives you a strong first draft, but the overlaps and the two soft-spoken participants at the far end of the table are where you'll spend your cleanup time.

How do you record so the voices stay separable?

Separation at the microphone is what saves you hours later. The model can only distinguish voices it can actually hear apart. Put a boundary (PZM) mic in the center of the table for the group, and give the moderator a separate mic or channel, so the person steering the discussion is always cleanly captured.

Kill the room's noise before you record. Turn off HVAC, close any windows that face traffic, and move away from the buzzing mini-fridge. A round table beats a long boardroom slab, because everyone sits roughly equidistant from the center mic instead of trailing off at the ends.

Open with a name round. Have each participant say their first name, or an assigned ID, in turn before the discussion starts. It anchors who 'Speaker 3' is, timestamps the moment for later, and gives you a clean voice sample of every person to match against.

How to transcribe a focus group into a clean draft

Typing a group discussion by hand is brutal: a single audio hour can take up to six hours of manual transcription (Haberl et al., 2023, citing Bell et al. 2018), so a 90-minute session can take up to nine hours, a full working day gone. An AI first pass turns that into minutes of processing plus targeted cleanup.
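The arithmetic behind that estimate is simple enough to rerun for your own session lengths. A small sketch, assuming the six-hours-per-audio-hour upper bound quoted above; the function name is hypothetical, and the rate is an adjustable assumption rather than a measured figure for your team:

```python
def manual_transcription_hours(audio_minutes, hours_per_audio_hour=6.0):
    """Estimate manual typing time from the recording length in minutes."""
    return audio_minutes / 60 * hours_per_audio_hour


estimate = manual_transcription_hours(90)
print(f"{estimate:.1f} hours")  # 9.0 hours
```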

Upload the recording to get a multi-speaker, timestamped draft with each turn attributed. Read it against the audio and fix the predictable failure points first: the moderator's questions mislabeled as a participant, two people talking at once, and the quiet voices the mic barely caught.

Correct the speaker labels in the transcript as you go, and mark true overlap with a bracket, [crosstalk], rather than forcing a guess. A group draft will never be turn-perfect out of the box. Your job is to fix attribution wherever a quote or a coded segment depends on knowing exactly who said it.
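If you work from a structured export rather than a word processor, the two corrections above (swapping generic labels for name-round IDs, and tagging overlap instead of guessing) can be scripted. The sketch below is illustrative only: the `Turn` structure, the field names, and the half-second threshold are assumptions, not the format of any particular tool's export.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Turn:
    start: float   # seconds from the start of the recording
    end: float
    speaker: str
    text: str


def relabel(turns, names):
    """Swap generic labels for the IDs captured in the opening name round."""
    return [replace(t, speaker=names.get(t.speaker, t.speaker)) for t in turns]


def flag_crosstalk(turns, min_overlap=0.5):
    """Tag both turns of any adjacent pair that overlaps by min_overlap seconds or more."""
    ordered = sorted(turns, key=lambda t: t.start)
    flagged = set()
    for i in range(len(ordered) - 1):
        current, following = ordered[i], ordered[i + 1]
        if current.end - following.start >= min_overlap:
            flagged.update({i, i + 1})
    return [
        replace(t, text=t.text + " [crosstalk]") if i in flagged else t
        for i, t in enumerate(ordered)
    ]


# Hypothetical draft: the second and third turns overlap by one second.
draft = [
    Turn(0.0, 4.0, "Speaker 1", "What did you think of the onboarding?"),
    Turn(4.2, 9.0, "Speaker 3", "It felt long, honestly."),
    Turn(8.0, 10.5, "Speaker 4", "Yeah, way too long."),
]
names = {"Speaker 1": "Moderator", "Speaker 3": "P3", "Speaker 4": "P4"}

cleaned = flag_crosstalk(relabel(draft, names))
for turn in cleaned:
    print(f"[{turn.start:06.1f}] {turn.speaker}: {turn.text}")
```

The script only surfaces candidates. You still listen to each flagged passage and decide whether the attribution matters for a quote or a code.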

What should you keep: verbatim, clean, or coding-ready?

Decide your transcription style before you edit, because in qualitative work the choice between naturalized and denaturalized transcription shapes the analysis (Oliver, Serovich & Mason, 2005), not just readability. Naturalized keeps every utterance in detail; denaturalized corrects grammar and strips interview noise.

For a focus group, the interaction is often the data: who agrees, who pushes back, where the laughter lands. If you're analyzing group dynamics, keep it naturalized, overlaps and all. If you only need the content of what was said, denaturalized reads cleaner. Pick one and apply it consistently across the file.

Then get it into your coding software. Export the cleaned transcript to DOCX for NVivo, ATLAS.ti, or Dedoose, keeping speaker labels and timestamps intact. For the deeper decisions (how to code group data and which convention to choose), see the qualitative research transcription guide.

Consent, confidentiality, and anonymizing the transcript

You cannot promise confidentiality in a focus group. IRB guidance is blunt: because every participant hears each disclosure, the confidentiality of anything said in the session cannot be guaranteed (University of Connecticut IRB; Boise State IRB). Tell participants that up front.

Get consent to record, ideally captured in the audio. In the US, recording-consent laws vary by state (RCFP): some need one party's consent, others require every participant to agree. With a roomful of people, all-party consent is the safe default, so collect a clear yes from everyone before you hit record.

Anonymize in the transcript, not just the report. Under the Common Rule's definition of identifiable private information (45 CFR 46.102), a name, employer, or a too-specific anecdote can re-identify someone. Replace them with role labels in a working copy and keep the master access-controlled. With Pepys, the source recording is auto-deleted about 30 days after upload, while the transcript and your exports are kept, so the sensitive original file doesn't linger on a server.
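The replace-with-role-labels step can be partly scripted for names and employers you already know about. A minimal sketch, assuming you maintain the term list yourself; the function name, the labels, and the sample sentence are hypothetical. A pattern match will not catch a too-specific anecdote, so a read-through is still required.

```python
import re


def anonymize(text, replacements):
    """Replace each known identifying term with its label (whole words, any case)."""
    # Longest terms first, so a full name is replaced before a first name alone.
    for term in sorted(replacements, key=len, reverse=True):
        label = replacements[term]
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        # A function replacement inserts the label literally, with no escape processing.
        text = pattern.sub(lambda match, label=label: label, text)
    return text


replacements = {"Maria": "[P5]", "Acme Corp": "[employer]"}
line = "Like Maria said, at Acme Corp we never got training."

redacted = anonymize(line, replacements)
print(redacted)  # Like [P5] said, at [employer] we never got training.
```

Run it on the working copy only, and keep the mapping of labels to real names with the access-controlled master.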

The steps, in order

1. Set the room and mics
Put a boundary mic in the center of the table and give the moderator a separate mic or channel. Kill HVAC and traffic noise, and seat the group around a round table so everyone is roughly equidistant from the mic.
2. Do a name round at the top
Before the discussion, have each participant say their first name or assigned ID in turn. It anchors the speaker labels and gives the tool a clean voice sample of everyone in the room.
3. Upload for a multi-speaker draft
Drop the recording into Pepys and get a speaker-labeled, timestamped draft in minutes instead of a day of typing. Every turn comes back attributed and time-coded.
4. Fix attribution and overlap
Read the draft against the audio. Correct mislabeled turns, split the moderator from participants, and mark true crosstalk as [crosstalk] rather than guessing who spoke.
5. Choose a style, then export for coding
Apply naturalized or denaturalized transcription consistently, then export to DOCX for NVivo, ATLAS.ti, or Dedoose with speaker labels and timestamps intact.

Tips from people who do this a lot

A single boundary (PZM) mic in the center of a round table usually beats several scattered phone recorders, because every voice reaches it at a similar level.
Put the moderator on a separate channel or lav. The one voice that appears most often is the easiest to isolate and the most useful anchor for cleanup.
Run the opening name round even if it feels stiff. It's the fastest way to map 'Speaker 4' to a real person when you clean the draft later.
Don't clean the whole transcript. Polish only the segments you'll code or quote; the rest just needs to be searchable.
Anonymize in a copy and keep an un-redacted master somewhere access-controlled, so you never lose the original attribution if you need to verify a segment.

Transcribe a focus group – questions, answered

Why is a focus group harder to transcribe than a one-on-one interview?

More voices and more overlap. On clean one- or two-person audio, automatic speaker-labeling error stays under 10%, but on overlap-heavy, many-speaker recordings it runs 35 to 45% (Ryant et al., DIHARD III, 2021). A group of 6 to 10 people talking over each other is the hard case, so expect more manual cleanup.

How do I get accurate speaker labels for six or more people?

Separate the voices at the microphone. Use a center boundary mic, put the moderator on their own channel, and open with a name round so each voice has a labeled sample. You'll still correct some turns by hand, especially where people talk at once, but a clean recording cuts that work sharply.

Can I promise participants their answers stay confidential?

No. IRB guidance states that because everyone in the room hears each disclosure, confidentiality cannot be guaranteed in a focus group (University of Connecticut IRB; Boise State IRB). Tell participants that before you start, and ask them to respect each other's privacy after the session ends.

Should a focus group be transcribed verbatim or cleaned up?

It depends on your analysis. If group dynamics are the data, keep a naturalized transcript with overlaps and fillers intact. If you only need the content, denaturalized transcription corrects grammar and removes noise for cleaner reading. The choice affects your findings, so pick one convention and apply it across the file.

What happens to my recording after it's transcribed?

With Pepys, the source audio is auto-deleted about 30 days after upload, while the transcript and your exports are kept. That matters for IRB and off-the-record material, because the large, sensitive original file doesn't sit on a server indefinitely once you have your transcript.

References

1. Ryant et al. (2021), The Third DIHARD Diarization Challenge – diarization error by domain – arXiv:2012.01477 (Interspeech 2021)
2. Implementing qualitative methods in the field – focus group size (6-10 participants) – Abdul Latif Jameel Poverty Action Lab (J-PAL), MIT
3. Haberl et al. (2023), Take the aTrain – manual transcription time, citing Bell et al. (2018) – arXiv / University of Graz
4. IRB Researcher Guide – Focus Groups (confidentiality cannot be guaranteed) – University of Connecticut, Office of the Vice President for Research (IRB)
5. Guidance on the Use of Focus Groups (confidentiality limitation) – Boise State University, Office of Research Compliance (IRB)
6. Introduction to the Reporter's Recording Guide – one-party vs all-party consent – Reporters Committee for Freedom of the Press
7. Oliver, Serovich & Mason (2005), Constraints and Opportunities with Interview Transcription – Social Forces (Oxford University Press)
8. 45 CFR 46.102 – Definitions (Common Rule): identifiable private information – Cornell Legal Information Institute (eCFR)
