Every transcript is a choice, not a copy
Types of transcription sit under one parent idea: a transcript is a written representation of speech. Writing speech down always re-renders it. That framing matters before you pick a style. Bucholtz argues that transcription involves interpretive decisions (what you write down) and representational decisions (how you write it), and that a truly objective transcription is not possible.
Qualitative researchers describe two poles for that choice. Naturalism captures every utterance in as much detail as possible. Denaturalism corrects grammar, removes interview noise like stutters and pauses, and standardizes non-standard accents. Oliver, Serovich and Mason call transcription a consequential act of representation whose decisions carry into the research findings themselves.
So the type you choose shapes the record itself. It sets what a later reader, coder, or court can see. Choose it on purpose, before you start editing.
Four types of transcription by style, from verbatim to intelligent verbatim
Most named transcription types describe how much of the raw speech survives the edit. Strict verbatim, also called true verbatim, keeps everything: every 'um', false start, repetition, stutter, and pause. It sits at the naturalism pole. Reach for it when how something was said carries meaning, as in discourse or legal analysis.
Clean verbatim removes filler words and stammers but keeps the speaker's real words, grammar, and meaning. It is the readable default for journalism and most research quoting. Edited transcription goes a step further, fixing grammar and trimming tangents for a tidy record. Intelligent verbatim, sometimes called readable verbatim, lightly rewrites for smooth reading without changing what was meant.
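To make the clean-verbatim step concrete, here is a minimal sketch in Python. The filler list and the function name clean_verbatim are illustrative assumptions, not a standard; a real style guide would define its own list. The repeated-word rule is deliberately crude and would also collapse a legitimate repeat such as "had had", so its output still needs a human read.

```python
import re

# Hypothetical filler list for illustration; adjust to your own style guide.
FILLERS = ("um", "uh", "er", "erm")

def clean_verbatim(text: str) -> str:
    """Drop filler words and immediately repeated words; keep everything else."""
    # Remove each filler plus an optional trailing comma/period and whitespace.
    filler_pattern = r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*"
    text = re.sub(filler_pattern, "", text, flags=re.IGNORECASE)
    # Collapse a word repeated back to back ("I I think" -> "I think").
    text = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Tidy any double spaces left behind.
    return re.sub(r"\s{2,}", " ", text).strip()

print(clean_verbatim("Um, I I think we should uh go."))
# prints: I think we should go.
```

Note that the sketch never touches grammar or word choice. That restraint is what separates clean verbatim from the edited and intelligent styles.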
Read the four as points on a naturalism-to-denaturalism line, not four sealed boxes. The line between verbatim and clean verbatim, and when each is right, is worth its own walk-through. One rule holds across all of them: never silently fix a factual error a speaker made – flag it instead.
Phonetic and conversation-analytic transcription capture what words miss
Two specialist types record sound and delivery rather than just words. Phonetic transcription writes down how speech is pronounced, using the International Phonetic Alphabet, the standard for notating speech sounds. It comes in two grains, broad and narrow.
Broad, or phonemic, transcription ignores as many details as possible, capturing only enough to tell one word from another. Narrow transcription captures as many pronunciation details as it can, within the limits of hearing and IPA conventions. Linguists, speech pathologists, and language teachers use these grains. A journalist almost never will.
Conversation analysis supplies the second specialist type, the Jeffersonian transcript. It encodes delivery – timing, speed, emphasis, pitch, and volume – plus overlap and measured pauses. Park and Hepburn note that standard orthographic transcripts wipe out these core elements, the very things speakers use to build meaning.
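As a rough illustration of what a measured pause looks like on the page, the sketch below inserts Jefferson-style pause marks between timed words: a timed pause in seconds to one decimal place, such as (0.8), and (.) for a micropause. The input format, the function name mark_pauses, and both thresholds are assumptions made for this example. A full Jeffersonian transcript also marks overlap, pitch, and emphasis, which this sketch leaves out.

```python
def mark_pauses(words, micro=0.2, ignore_below=0.1):
    """Insert pause marks between timed words.

    words: list of (text, start_sec, end_sec) tuples in time order
           (a hypothetical format, e.g. from word-level timestamps).
    micro: gaps at or above this length are written as timed pauses.
    ignore_below: gaps shorter than this are not marked at all.
    """
    out = []
    prev_end = None
    for text, start, end in words:
        if prev_end is not None:
            gap = start - prev_end
            if gap >= micro:
                out.append(f"({gap:.1f})")  # timed pause, e.g. (0.8)
            elif gap >= ignore_below:
                out.append("(.)")           # micropause
        out.append(text)
        prev_end = end
    return " ".join(out)

timed = [("well", 0.0, 0.3), ("I", 0.45, 0.5), ("suppose", 1.3, 1.8)]
print(mark_pauses(timed))
# prints: well (.) I (0.8) suppose
```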
Human, AI, and hybrid: the method axis
The second axis is who produces the transcript, and the gap between people and machines has narrowed. On the Switchboard conversational benchmark, professional human transcribers reached a 5.9% word error rate, and an automated system reached 5.8%, edging past that human mark. On audio of that quality, in other words, AI is now competitive with a professional.
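Word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. A 5.9% rate therefore means roughly 59 errors per 1,000 reference words. The sketch below computes it with a word-level edit distance; the function name is illustrative, and the lowercase-and-split normalization is a simplifying assumption, since published benchmarks apply their own scoring rules.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))
# prints: 0.333
```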
Human transcription still leads on hard audio: heavy accents, crosstalk, poor recordings, and specialist vocabulary. But it is slow and costly. Transcribing one hour of interview audio can take up to six hours of manual work. That cost is why pure-human transcription is now saved for the jobs that truly need it.
The hybrid method splits the difference. An AI first pass produces a draft in minutes; a person then corrects names, jargon, numbers, and overlapping speech. For most research and journalism, this is the practical type to use. Where the method really matters, such as certified or regulated work, the AI-versus-human tradeoff deserves a closer look.
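One way to organize the human pass is to have the draft point the reviewer at its riskiest words. The sketch below assumes the AI first pass returns a confidence score per word, which many speech-to-text tools expose but not all; the DraftWord type, the 0.85 threshold, and the digit rule are hypothetical choices for illustration, not any vendor's API.

```python
from dataclasses import dataclass

@dataclass
class DraftWord:
    text: str
    confidence: float  # assumed 0.0-1.0 score from the AI first pass

def needs_review(word: DraftWord, threshold: float = 0.85) -> bool:
    """Flag low-confidence words, plus anything containing a digit."""
    has_digit = any(ch.isdigit() for ch in word.text)
    return word.confidence < threshold or has_digit

def review_queue(draft, threshold: float = 0.85):
    """Return (position, text) for every word a person should check."""
    return [
        (i, w.text)
        for i, w in enumerate(draft)
        if needs_review(w, threshold)
    ]

draft = [
    DraftWord("Dr", 0.99),
    DraftWord("Okonkwo", 0.52),  # unusual name, low confidence
    DraftWord("raised", 0.97),
    DraftWord("42", 0.95),       # number: always checked
]
print(review_queue(draft))
# prints: [(1, 'Okonkwo'), (3, '42')]
```

Numbers are flagged regardless of confidence because a confidently wrong figure is the costliest kind of error in a quote.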
Which type of transcription do you actually need?
Match the type to the job, not the other way round. In qualitative research, the naturalism-versus-denaturalism call comes first, because it shapes your coding and your findings. Capture the detail you will actually analyze, then apply the same style consistently across every transcript in the study.
Legal and court work sits at the strict-verbatim end. Federal court proceedings must be recorded verbatim by statute, and depositions hold to the same standard. This is where the method matters most; the legal transcription guide covers when a certified human is required. For accessibility, a plain readable transcript is often the baseline, since WCAG 2.1 requires a text alternative for prerecorded audio-only content.
For journalism, clean verbatim plus a hybrid workflow covers almost everything: readable quotes, fast turnaround, and strict verbatim held back for the lines where exact phrasing is the story.