How to analyze interview transcripts

A working method for researchers and journalists: turn a clean transcript into coded, defensible themes – not a pile of highlighted quotes.

The short answer

To analyze interview transcripts, start from a clean, speaker-labeled transcript and code it: label the meaningful segments, then group those codes into themes. Braun and Clarke's six phases – familiarize, generate codes, search, review, define, and report – give the standard route. Evidence each theme with timestamped quotes drawn fairly across the whole data set, not just the striking lines.

Start with a transcript you can actually code

The analysis is only as good as the transcript under it. Before you code a word, you want a clean, speaker-labeled, timestamped transcript – ideally line-numbered, so you can point a co-coder to "line 214." Producing that artifact is a job of its own; the qualitative research transcription workflow covers making and formatting it. This guide picks up once the transcript exists.

Decide your verbatim level before you analyze, because it shapes what you can claim. Strict verbatim – every "um," pause, and false start – matters for discourse or conversation analysis, where how something is said is the data. For most thematic work, clean verbatim (filler removed, words intact) is enough. Apply one style consistently across every interview in the set.

Keep the audio close. A transcript is a reduction of the recording, and you'll want to re-hear ambiguous lines while coding. Timestamps make that a two-second jump instead of a scrub through an hour of tape. Flag anything genuinely unclear as [inaudible] with its timestamp, rather than guessing at a word you might later quote.
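
As a rough illustration of what "coding-ready" can mean in practice, the sketch below parses a transcript into line-numbered turns and lists the ones flagged [inaudible]. The line format "[HH:MM:SS] Speaker: text" and the names Turn, parse_transcript, and unclear_turns are assumptions made for this example, not an export format any particular tool guarantees.

```python
import re
from dataclasses import dataclass

# Assumed (hypothetical) line format: "[HH:MM:SS] Speaker: text"
TURN = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s+([^:]+):\s*(.*)$")


@dataclass
class Turn:
    line_no: int    # 1-based line number, so a co-coder can find "line 214"
    timestamp: str
    speaker: str
    text: str


def parse_transcript(raw: str) -> list[Turn]:
    """Turn a speaker-labeled, timestamped transcript into numbered turns."""
    turns = []
    for line_no, line in enumerate(raw.splitlines(), start=1):
        match = TURN.match(line.strip())
        if match:  # lines that do not fit the assumed format are skipped
            timestamp, speaker, text = match.groups()
            turns.append(Turn(line_no, timestamp, speaker.strip(), text))
    return turns


def unclear_turns(turns: list[Turn]) -> list[Turn]:
    """Turns flagged [inaudible], to re-hear against the audio."""
    return [t for t in turns if "[inaudible]" in t.text.lower()]


sample = """[00:00:04] Interviewer: How did the rollout go?
[00:00:09] P1: Honestly it was [inaudible] for the first month.
[00:00:15] P1: After that we found a rhythm."""

turns = parse_transcript(sample)
for t in unclear_turns(turns):
    print(f"line {t.line_no} @ {t.timestamp} ({t.speaker}): {t.text}")
```

The printed line number and timestamp are what let you jump straight back to the audio before you quote the line.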

Coding turns interview transcripts into analyzable data

Coding means labeling segments of the transcript so you can retrieve and compare them. Johnny Saldaña frames it in cycles: First Cycle coding assigns codes to portions of data ranging from a single word to a full page, and Second Cycle coding recodes and regroups those codes into categories, themes, and concepts (Saldaña, 2013). Coding is cyclical – the first pass is rarely the last.

Grounded theory supplies the classic coding vocabulary. Strauss and Corbin's approach moves from open coding to axial coding, where categories are reconnected around a "coding paradigm" (Kelle, 2005). Open coding fractures the data into concepts; axial coding builds the relationships between them.

Not everyone accepted that. In Emergence vs. Forcing (1992), Glaser argued that axial coding and coding paradigms "force" categories onto the data instead of letting them emerge (Kelle, 2005). The practical lesson: a coding structure is a tool, not a verdict. Hold it loosely enough to notice what it hides.

However you code, tie each code to evidence you can attribute. Pull the exact line, with its speaker and timestamp, as you tag it – a timestamped quote is what turns a code into something you can defend in the write-up. Codes without traceable extracts are just opinions wearing labels.
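
One way to keep codes traceable is to store every tag together with its evidence. The sketch below is a minimal example under assumptions: the Extract fields, the interview IDs, and the sample quotes are all hypothetical, and a spreadsheet with the same columns would serve equally well.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Extract:
    interview: str   # hypothetical interview ID
    speaker: str
    timestamp: str
    quote: str       # exact words, copied from the transcript


def build_index(coded):
    """Group (code, extract) pairs so every extract under a code is retrievable."""
    index = defaultdict(list)
    for code, extract in coded:
        index[code].append(extract)
    return index


# Illustrative data only
coded = [
    ("workload", Extract("int01", "P1", "00:04:12", "I was covering two roles.")),
    ("workload", Extract("int02", "P2", "00:10:40", "There was never a quiet week.")),
    ("manager support", Extract("int01", "P1", "00:07:55", "My manager checked in daily.")),
]

index = build_index(coded)
for e in index["workload"]:
    print(f'{e.interview} {e.timestamp} {e.speaker}: "{e.quote}"')
```

Because each entry carries its interview, speaker, and timestamp, any quote that reaches the write-up can be checked against the recording.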

Braun and Clarke's six phases of thematic analysis

Thematic analysis is the most common route for interview data, and Braun and Clarke's six-phase guide is its reference procedure. The phases are: familiarizing yourself with the data, generating initial codes, searching for themes, reviewing themes, defining and naming themes, and producing the report (Braun & Clarke, 2006). They're recursive, not strictly linear.

A theme is more than a topic that keeps coming up. Braun and Clarke define it precisely: a theme "captures something important about the data in relation to the research question, and represents some level of patterned response or meaning within the data set" (Braun & Clarke, 2006). Frequency alone doesn't make a theme.

One reframing matters. Braun and Clarke reject the idea that themes "emerge" from data; in reflexive thematic analysis, themes are "analytic outputs, created from codes and through the researcher's active engagement with their data" (Braun & Clarke, 2019). You build themes; you don't find them lying in wait.

That's also why they don't advocate inter-rater reliability scores or a fixed coding frame for reflexive TA (Braun & Clarke, 2019). Coding "inescapably bears the mark of the researcher," so there's no single accurate coding to agree on. If your design needs multiple independent coders, that's a coding-reliability approach – a different tradition. Be clear which you're using.
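
If you are working in the coding-reliability tradition instead, agreement between coders is usually reported as a statistic. The source names no specific one; Cohen's kappa is used here as a common choice, computed as (observed agreement − chance agreement) / (1 − chance agreement). The function name and the sample decisions are assumptions for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labeling the same segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    n = len(coder_a)
    # Share of segments where the two coders made the same decision
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Illustrative data: 1 = code applied to the segment, 0 = not applied
coder_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
coder_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

In this sample the coders agree on 8 of 10 segments and chance agreement is 0.5, so kappa comes out at about 0.6. None of this applies to reflexive thematic analysis, where no such score is expected.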

What CAQDAS software does, and what it can't

Software organizes coding; it doesn't do the thinking. A peer-reviewed reflection on NVivo puts it plainly: the main function of CAQDAS "is not necessarily to analyse data, but rather to aid the analysis process, which the researcher must always remain in control of" (Zamawe, 2015). The interpretation stays yours.

What these tools do well is store, tag, and retrieve. To get a transcript in, export a clean DOCX first. ATLAS.ti imports .doc, .docx, .rtf, and .txt, and separately imports .srt and .vtt caption files as transcripts; NVivo imports Word and text sources and PDFs, converting formats like PowerPoint to PDF first. An audio-to-DOCX export lands cleanly in either.

Speaker structure is worth preserving on import. Both packages can auto-code by speaker when your transcript's turns are formatted consistently – the payoff for tidy speaker labels upstream. The dissertation transcription guide walks through the speaker-turn import mechanics if you're coding in NVivo or ATLAS.ti.

Manual coding is still legitimate. For a handful of interviews, colored highlighters, a spreadsheet, or a word processor's comments work fine. Software earns its place when the data set is large, several people code together, or you need to retrieve every extract under a code across dozens of transcripts.

How many interviews before the themes hold up?

Fewer than most people expect, if your sample and question are tight. A well-known Field Methods study found saturation within the first twelve interviews, with the basic elements of metathemes present by six (Guest, Bunce & Johnson, 2006). Scope, not a magic number, drives it.

A 2022 systematic review put a range on it. Hennink and Kaiser found studies reached saturation within 9–17 interviews, or 4–8 focus groups, for homogeneous populations with narrowly defined aims (Hennink & Kaiser, 2022). Broader questions and more varied participants push the number up.
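
A simple way to watch for saturation in your own project is to count how many codes each successive interview adds. The sketch below is an informal illustration, not the procedure used in either study cited above; the function name and the sample codes are assumptions.

```python
def new_codes_per_interview(codes_by_interview):
    """Count codes that appear for the first time in each interview, in order."""
    seen = set()
    counts = []
    for codes in codes_by_interview:
        fresh = set(codes) - seen   # codes not seen in any earlier interview
        counts.append(len(fresh))
        seen |= fresh
    return counts


# Illustrative data: the set of codes applied in each interview, in the order conducted
codes_by_interview = [
    {"workload", "manager support", "onboarding"},
    {"workload", "pay", "onboarding"},
    {"pay", "remote work"},
    {"workload", "remote work"},
    {"manager support", "pay"},
]

counts = new_codes_per_interview(codes_by_interview)
print(counts)  # [3, 1, 1, 0, 0]
```

A run of zeros at the end suggests new interviews are adding little, but it is a prompt for judgment rather than proof: a broader question or a more varied sample can still surface new codes later.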

Saturation is about having enough data; rigor is about using all of it. The temptation is to quote the three vivid lines that fit your argument. Silverman's rule cuts against that: do not look for telling examples, but "analyse your data thoroughly and fairly" (Silverman, 2011). Test each theme against the whole set.

Look for the disconfirming case on purpose. A theme that survives the interview that seemed to contradict it is stronger than one built only from supportive quotes. Report the counter-examples and how you accounted for them – that's what separates analysis from a highlight reel.
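
Testing a theme against the whole set can be made concrete with a coverage table: for each theme, which interviews support it, which contradict it, and which say nothing. The sketch below assumes you have tagged each extract with a stance of "supports" or "contradicts"; those labels, the function name, and the sample data are hypothetical.

```python
from collections import defaultdict


def theme_coverage(extracts, all_interviews):
    """For each theme, list supporting, contradicting, and silent interviews.

    extracts: (interview, theme, stance) tuples, where stance must be
    "supports" or "contradicts" (any other value raises KeyError).
    """
    by_theme = defaultdict(lambda: {"supports": set(), "contradicts": set()})
    for interview, theme, stance in extracts:
        by_theme[theme][stance].add(interview)

    report = {}
    for theme, stances in by_theme.items():
        touched = stances["supports"] | stances["contradicts"]
        report[theme] = {
            "supports": sorted(stances["supports"]),
            "contradicts": sorted(stances["contradicts"]),
            "silent": sorted(set(all_interviews) - touched),
        }
    return report


# Illustrative data only
all_interviews = ["int01", "int02", "int03", "int04"]
extracts = [
    ("int01", "burnout", "supports"),
    ("int02", "burnout", "supports"),
    ("int03", "burnout", "contradicts"),
    ("int01", "autonomy", "supports"),
]

report = theme_coverage(extracts, all_interviews)
for theme, row in report.items():
    print(theme, row)
```

A theme with an empty "contradicts" list and a long "silent" list is a cue to go back and look for the counter-example before you write it up.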

The steps, in order

  1. Prepare a coding-ready transcript

    Start from a clean, speaker-labeled, timestamped transcript, ideally line-numbered. Fix names, jargon, and unclear lines against the audio, and settle on one verbatim style across the set.

  2. Read for familiarity first

    Read the whole transcript once before coding anything, noting first impressions and recurring ideas. This is the familiarization phase, and it stops you coding on autopilot.

  3. Code the data in cycles

    In the first cycle, label meaningful segments from a word to a paragraph. In the second cycle, recode and regroup those labels into tighter categories.

  4. Build themes from the codes

    Cluster related codes into candidate themes. A theme should capture something important about the data in relation to your research question, not just a frequent topic.

  5. Review and define each theme

    Check every theme against its coded extracts and the full data set. Merge, split, or drop themes that don't hold, then name and define the ones that do.

  6. Evidence themes fairly, then report

    Pull timestamped quotes for each theme from across the data, including counter-examples. Analyze thoroughly and fairly rather than selecting only the striking lines, then write it up.
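
The move from codes to candidate themes in the steps above can be sketched as a simple roll-up. The grouping itself is your interpretive decision; the code only records it and shows which codes are still unassigned. The mapping theme_of, the function roll_up, and the counts are hypothetical, and the counts are there for orientation only, since frequency alone doesn't make a theme.

```python
def roll_up(code_counts, theme_of):
    """Group codes under the candidate theme the researcher assigned them to."""
    themes = {}
    unassigned = []
    for code, n in code_counts.items():
        theme = theme_of.get(code)
        if theme is None:
            unassigned.append(code)   # still needs a decision: merge, keep, or drop
        else:
            themes.setdefault(theme, {})[code] = n
    return themes, unassigned


# Hypothetical second-cycle grouping: code -> candidate theme
theme_of = {
    "workload": "pressure at work",
    "unpaid overtime": "pressure at work",
    "manager support": "what helps people stay",
}

# Illustrative data: number of extracts tagged with each code
code_counts = {"workload": 7, "unpaid overtime": 3, "manager support": 4, "commute": 2}

themes, unassigned = roll_up(code_counts, theme_of)
print(themes)
print("unassigned:", unassigned)
```

Revisiting the unassigned list on each pass is one way to keep the review step honest: a code that fits nowhere may signal a missing theme rather than noise.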

Tips from people who do this a lot

  • Code on a line-numbered copy so you can cite "line 214" in a memo or to a co-coder, and keep an untouched master for verification.

  • Write an analytic memo the moment a code feels important – the reasoning behind it fades faster than you expect once you're twelve interviews deep.

  • Keep a codebook with each code's name, a one-line definition, and an example extract, so the same label means the same thing on interview 12 as on interview 2.

  • If you're doing reflexive thematic analysis, drop the inter-rater score – Braun and Clarke don't advocate it, because coding carries the researcher's mark by design.

  • Hunt for the quote that contradicts your theme on purpose. A disconfirming case you can explain is what keeps the analysis honest and the theme defensible.
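
The codebook tip above can be kept as structured data so that drifting labels are easy to catch. This is a minimal sketch under assumptions: the CodebookEntry fields mirror the three items the tip lists, and the entries and applied codes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookEntry:
    name: str
    definition: str   # one-line definition
    example: str      # example extract from the data


entries = [
    CodebookEntry(
        name="workload",
        definition="Participant describes the volume or pace of their tasks.",
        example="I was covering two roles.",
    ),
    CodebookEntry(
        name="manager support",
        definition="Participant describes help or attention from a direct manager.",
        example="My manager checked in daily.",
    ),
]
codebook = {entry.name: entry for entry in entries}


def undefined_codes(applied, codebook):
    """Codes used in the transcripts that have no codebook entry."""
    return sorted(set(applied) - set(codebook))


# Illustrative data: labels applied while coding, including a drifted spelling
applied = ["workload", "manager support", "work load", "workload"]
print(undefined_codes(applied, codebook))  # ['work load']
```

Running the check after each coding session surfaces near-duplicate labels early, before they split one idea across two codes.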

How to analyze interview transcripts – questions, answered

What's the difference between coding and thematic analysis?

Coding is the narrower act of labeling segments of the transcript so you can retrieve and compare them. Thematic analysis is the wider process that clusters those codes into themes and reports them. Coding is one stage inside thematic analysis, and inside grounded theory too.

What is open coding versus axial coding?

Open coding fractures the data into concepts, labeling it segment by segment. Axial coding, from Strauss and Corbin, reconnects those categories around a coding paradigm. Glaser objected that axial coding forces categories onto the data rather than letting them emerge – a live debate in grounded theory.

Do I need software to analyze interview transcripts?

No. CAQDAS tools like NVivo and ATLAS.ti store, tag, and retrieve coded data, but they don't interpret it – the analysis stays with you. Manual coding with highlighters or a spreadsheet is valid for a small set. Software mainly earns its place with large data or several coders.

How many interviews do I need for saturation?

It depends on scope. One Field Methods study reached saturation within twelve interviews, with the basic elements of metathemes present by six. A 2022 review found 9 to 17 interviews for homogeneous samples with narrow aims. The more similar your participants and the tighter your question, the fewer interviews you need.

Do themes really "emerge" from the data?

Not in reflexive thematic analysis. Braun and Clarke reframe themes as analytic outputs the researcher actively creates from codes, not patterns waiting to be found. The phrase "themes emerged" understates the interpretive choices you make. You build themes through engagement with the data.

References

  1. Braun & Clarke (2006), Using thematic analysis in psychology (six phases; theme definition). Qualitative Research in Psychology (Routledge/Taylor & Francis); full text via Charles University repository.
  2. Braun & Clarke (2019), Answers to frequently asked questions about thematic analysis (themes are constructed, not emerging; no inter-rater reliability). Virginia Braun & Victoria Clarke, University of Auckland.
  3. Kelle (2005), "Emergence" vs. "Forcing" of empirical data (open/axial coding; Glaser's objection). Forum Qualitative Sozialforschung / Forum: Qualitative Social Research (FQS).
  4. Saldaña (2013), The Coding Manual for Qualitative Researchers, 2nd ed. (First/Second Cycle coding). Johnny Saldaña, SAGE Publications.
  5. Guest, Bunce & Johnson (2006), How Many Interviews Are Enough? (saturation within 12 interviews). Field Methods (SAGE Publications).
  6. Hennink & Kaiser (2022), Sample sizes for saturation in qualitative research (9–17 interviews; 4–8 focus groups). Social Science & Medicine (Elsevier); indexed on PubMed (PMID 34785096).
  7. Zamawe (2015), The Implication of Using NVivo Software in Qualitative Data Analysis (CAQDAS aids, does not analyze). Malawi Medical Journal; via PubMed Central (PMC4478399).
  8. Silverman (2011), Interpreting Qualitative Data, 4th ed., Ch. 3, Rule 4 (analyse thoroughly and fairly, do not look for telling examples). David Silverman, SAGE Publications.
  9. ATLAS.ti 23 Windows User Manual, supported document file formats (transcript import: .doc/.docx/.rtf/.txt). ATLAS.ti Scientific Software Development GmbH (official user manual).
  10. ATLAS.ti 23 Windows User Manual, Importing Automated Transcripts in VTT and SRT format (caption-file import). ATLAS.ti Scientific Software Development GmbH (official user manual).
  11. NVivo support documentation, Documents and PDFs (Word/text/PDF import; converts PowerPoint to PDF first). QSR International / Lumivero (official NVivo support docs).
