Types of transcription

How the style you choose and the method you use change what a transcript can be used for, and how to match the type to the job.

The short answer

Transcription types split along two axes. Style covers how much you keep: strict verbatim records every filler and false start, while clean, edited, and intelligent verbatim progressively tidy the words for readability. Method covers who does the work: human, AI, or a hybrid first-pass-plus-cleanup. Phonetic transcription is separate, using the International Phonetic Alphabet (IPA) to represent speech sounds rather than words. Match the type to your use case.

Every transcript is a choice, not a copy

Types of transcription sit under one parent idea: a transcript is a written representation of speech. Writing speech down always re-renders it. That framing matters before you pick a style. Bucholtz argues that transcription involves interpretive decisions (what you write down) and representational decisions (how you write it), and that a truly objective transcription is not possible.

Qualitative researchers describe two poles for that choice. Naturalism captures every utterance in as much detail as possible. Denaturalism corrects grammar, removes interview noise like stutters and pauses, and standardizes non-standard accents. Oliver, Serovich and Mason call transcription a consequential act of representation whose decisions carry into the research findings themselves.

So the type you choose shapes the record itself. It sets what a later reader, coder, or court can see. Choose it on purpose, before you start editing.

Four types of transcription by style, from verbatim to intelligent verbatim

Most named transcription types describe how much of the raw speech survives the edit. Strict verbatim, also called true verbatim, keeps everything: every 'um', false start, repetition, stutter, and pause. It sits at the naturalism pole. Reach for it when how something was said carries meaning, as in discourse or legal analysis.

Clean verbatim removes filler words and stammers but keeps the speaker's real words, grammar, and meaning. It is the readable default for journalism and most research quoting. Edited transcription goes a step further, fixing grammar and trimming tangents for a tidy record. Intelligent verbatim, sometimes called readable verbatim, lightly rewrites for smooth reading without changing what was meant.
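
To make the gap between strict and clean verbatim concrete, here is a minimal sketch of a clean-verbatim pass. The filler list and the function name `clean_verbatim` are assumptions chosen for illustration, not a standard; a real style guide would define its own rules.

```python
import re

# Hypothetical filler list; every style guide draws this line differently.
FILLERS = {"um", "uh", "er", "erm", "hmm"}


def _core(token: str) -> str:
    """Lower-case a token and strip punctuation so words can be compared."""
    return re.sub(r"[^\w']", "", token).lower()


def clean_verbatim(text: str) -> str:
    """Drop standalone fillers and immediate one-word repetitions (stammers)."""
    kept = []
    for token in text.split():
        core = _core(token)
        if core in FILLERS:
            continue  # filler word: removed in clean verbatim
        if core and kept and _core(kept[-1]) == core:
            continue  # stammer such as "I I": keep only the first occurrence
        kept.append(token)
    return " ".join(kept)


strict = "Um I I think uh we should go."
print(clean_verbatim(strict))
```

The sketch leaves the speaker's words and grammar alone, which is the point of clean verbatim. It is deliberately naive: it would also collapse a legitimate repetition such as "had had", so a person still needs to review the result.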

Read the four as points on a naturalism-to-denaturalism line, not four sealed boxes. The line between verbatim and clean verbatim, and when each is right, is worth its own walk-through. One rule holds across all of them: never silently fix a factual error a speaker made – flag it instead.

Phonetic and conversation-analytic transcription capture what words miss

Two specialist types record sound and delivery rather than just words. Phonetic transcription writes down how speech is pronounced, using the International Phonetic Alphabet, the standard for notating speech sounds. It comes in two grains, broad and narrow.

Broad, or phonemic, transcription ignores as many details as possible, capturing only enough to tell one word from another. Narrow transcription captures as many pronunciation details as it can, within the limits of hearing and IPA conventions. Linguists, speech pathologists, and language teachers use these grains. A journalist almost never will.

Conversation analysis adds a third system, the Jeffersonian transcript. It encodes delivery – timing, speed, emphasis, pitch, and volume – plus overlap and measured pauses. Park and Hepburn note that standard orthographic transcripts wipe out these core elements, the very things speakers use to build meaning.

Human, AI, and hybrid: the method axis

The second axis is who produces the transcript, and the gap between people and machines has narrowed. On the Switchboard conversational benchmark, professional human transcribers reached a 5.9% word error rate and an automated system reached 5.8%, marginally better than the human mark. On clean conversational speech, in other words, AI is now competitive with a professional.
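
Word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. The function name and the simple lower-case, whitespace-split normalization are assumptions for illustration; published benchmarks apply their own text normalization rules.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A rate of 5.9% therefore means about six word-level errors per hundred reference words. The metric weights every word equally, so a wrong name counts the same as a wrong "the".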

Human transcription still leads on hard audio: heavy accents, crosstalk, poor recordings, and specialist vocabulary. But it is slow and costly. Transcribing one hour of interview audio can take up to six hours of manual work. That cost is why pure-human transcription is now saved for the jobs that truly need it.

The hybrid method splits the difference. An AI first pass produces a draft in minutes; a person then corrects names, jargon, numbers, and overlapping speech. For most research and journalism, this is the practical type to use. Where the method really matters, such as certified or regulated work, the AI-versus-human tradeoff deserves a closer look.
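
One way to picture the human half of a hybrid workflow is a review queue built from the AI draft. The sketch below assumes the draft arrives as words with a per-word confidence score. That format, the `Word` type, and the 0.85 threshold are all hypothetical; real speech-to-text tools expose different fields, and some expose no confidence at all.

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    text: str
    confidence: float  # hypothetical per-word score between 0 and 1


def needs_review(word: Word, threshold: float = 0.85) -> bool:
    """Flag numbers, all-caps acronyms, and low-confidence words for a person."""
    token = word.text.strip(".,;:!?\"'()")
    has_digit = any(ch.isdigit() for ch in token)
    is_acronym = len(token) >= 2 and token.isalpha() and token.isupper()
    return has_digit or is_acronym or word.confidence < threshold


def review_queue(words: List[Word]) -> List[Tuple[int, str]]:
    """Return (position, text) pairs a human editor should check first."""
    return [(i, w.text) for i, w in enumerate(words) if needs_review(w)]


draft = [
    Word("The", 0.99),
    Word("WHO", 0.97),
    Word("reported", 0.98),
    Word("42", 0.95),
    Word("cases", 0.60),
]
print(review_queue(draft))
```

The heuristics are intentionally crude. Proper names and overlapping speech are not detected here and would need a speaker list or a listen-through, so the queue is a starting point for the editor, not a replacement.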

Which type of transcription do you actually need?

Match the type to the job, not the other way round. In qualitative research, the naturalism-versus-denaturalism call comes first, because it shapes your coding and your findings. Capture the detail you will actually analyze, then apply the same style consistently across every transcript in the study.

Legal and court work sits at the strict-verbatim end. Federal court proceedings must be recorded verbatim by statute, and depositions hold to the same standard. This is where the method matters most; the legal transcription guide covers when a certified human is required. For accessibility, a plain readable transcript is often the baseline, since WCAG 2.1 (Success Criterion 1.2.1) requires a text alternative for prerecorded audio-only content.

For journalism, clean verbatim plus a hybrid workflow covers almost everything: readable quotes, fast turnaround, and strict verbatim held back for the lines where exact phrasing is the story.

Tips from people who do this a lot

Decide style and method as two separate questions. 'Verbatim' answers how much you keep; 'human vs AI' answers who types it. Conflating them leads to over-editing.
If you might code or quote how something was said, capture it in the first pass. You can always clean a naturalized transcript down, but you cannot recover detail you deleted.
Reserve strict verbatim for the passages that need it. Transcribing every 'um' across a three-hour recording you will quote twice is wasted effort.
Phonetic (IPA) transcription answers a different question than an ordinary transcript. If you need pronunciation, no standard interview transcript, human or AI, will give it to you.
In a hybrid workflow, spend your human time on names, numbers, acronyms, and crosstalk. That is where AI drafts break, and where a wrong word costs the most.

Types of transcription – questions, answered

What are the main types of transcription?

They fall on two axes. By style: strict verbatim (every filler kept), clean verbatim (filler removed, words intact), edited, and intelligent or readable verbatim. By method: human, AI, or a hybrid of the two. Phonetic transcription is a separate system that records pronunciation using the IPA rather than ordinary spelling.

What is the difference between verbatim and clean verbatim?

Strict verbatim keeps every word and sound: fillers, false starts, stutters, and pauses. Clean verbatim removes those disfluencies but preserves the speaker's actual words, grammar, and meaning. Verbatim suits discourse, legal, and conversation analysis; clean verbatim is the readable default for journalism and most research quoting.

What is naturalized versus denaturalized transcription?

Naturalism captures every utterance in as much detail as possible. Denaturalism corrects grammar, removes interview noise like stutters and pauses, and standardizes non-standard accents. Researchers treat this as a consequential choice, because it shapes how the material can later be coded and reported.

What is phonetic transcription used for?

Phonetic transcription records how speech is pronounced, using the International Phonetic Alphabet. Broad (phonemic) transcription notes only enough to tell words apart; narrow transcription captures fine pronunciation detail. It is used in linguistics, speech pathology, and language teaching, not in ordinary interview or meeting transcription.

Is AI or human transcription more accurate?

It depends on the audio. On clean conversational speech, automated systems have reached roughly the same word error rate as professional human transcribers. On hard audio, such as heavy accents, crosstalk, poor recordings, and specialist terms, humans still lead, which is why a hybrid AI-draft-plus-human-edit workflow is common.

References

1. Oliver, Serovich & Mason (2005), Constraints and Opportunities with Interview Transcription: Towards Reflection in Qualitative Research – Social Forces (Oxford University Press)
2. Bucholtz, M. (2000), The politics of transcription – Journal of Pragmatics (Elsevier); UC Santa Barbara eScholarship
3. Handbook of the International Phonetic Association, Foreword – International Phonetic Association / Cambridge University Press
4. Broad and narrow transcriptions (phonetics course resource) – University of Manitoba, Dept. of Linguistics
5. Park & Hepburn (2022), The Benefits of a Jeffersonian Transcript – Frontiers in Communication
6. 28 U.S. Code § 753 – Reporters (verbatim recording of court proceedings) – Cornell Law School Legal Information Institute
7. Haberl et al. (2023), aTrain – manual transcription time, citing Bell et al. (2018) – arXiv
8. Xiong et al. (2016), Achieving Human Parity in Conversational Speech Recognition – Microsoft Research (arXiv)
9. W3C, Understanding WCAG 2.1: SC 1.2.1 Audio-only and Video-only (Prerecorded) – W3C / Web Accessibility Initiative
