How to transcribe a phone interview

A working guide for journalists and researchers recording calls – how to stay legal, get the cleanest audio a phone line allows, and turn it into accurate, attributable quotes.

The short answer

To transcribe a phone interview, first confirm you can legally record the call – U.S. federal law allows one-party consent, but about a dozen states require everyone to agree. Capture the cleanest audio the line allows, ideally each speaker on a separate track. Then run an AI first pass for a speaker-labeled, timestamped draft and hand-check the quotes you'll publish.

Get consent before you press record

Before you record a call, make sure you're allowed to. Under U.S. federal law, it's lawful to record a conversation you're a party to, or where one party has given prior consent – the one-party-consent baseline (18 U.S.C. Sec. 2511). That's the floor, not the ceiling.

State law can be stricter. About a dozen states require all-party consent (11, by the Reporters Committee's count), meaning everyone on the line has to agree. That list includes California, Florida, Illinois, Maryland, Massachusetts, Pennsylvania, and Washington. A phone interview often crosses state lines, and the safe default is to assume the stricter state's law applies. When you're not sure, ask and get a clear yes on the recording itself.
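The "assume the stricter state" rule can be sketched as a small check. This is an illustration, not legal advice: the set below holds only the seven states this guide names, so it is deliberately incomplete, and the function name and return strings are made up for the example.

```python
# States this guide names as all-party-consent states. NOT the full list:
# the guide counts about a dozen, so absence from this set proves nothing.
KNOWN_ALL_PARTY_STATES = {
    "California", "Florida", "Illinois", "Maryland",
    "Massachusetts", "Pennsylvania", "Washington",
}

def consent_rule(participant_states):
    """Return the consent rule to plan for, given each participant's state."""
    if any(state in KNOWN_ALL_PARTY_STATES for state in participant_states):
        # One all-party state on the line is enough to apply the stricter rule.
        return "all-party"
    # "Unverified" rather than "one-party", because the set above is partial.
    return "unverified: ask everyone anyway"

# Hypothetical call between a reporter in Texas and a source in California.
print(consent_rule(["Texas", "California"]))  # all-party
```

Either branch ends in the same place: ask, and get the yes on the recording.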

Say the date, the participants, and the consent out loud at the top of the call, so your permission is timestamped in the audio. For the full picture on what's legal where, and how consent works under all-party rules, see the legal walk-through on recording and transcribing. This guide keeps consent practical; that one goes deep. We can't give legal advice, so treat a clear yes as the safe default.

Phone audio is narrowband by design

Phone calls throw away audio that speech recognition depends on. Traditional telephone lines are band-limited to 300–3400 Hz and sampled at 8 kHz (so-called narrowband), which strips out higher-frequency detail useful for telling many speech sounds apart (Liu, Fu & Narayanan, 2009). That's the baseline every phone transcript inherits.
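The arithmetic behind those numbers is short. A minimal sketch: the sample rate caps the highest frequency a recording can hold at half the rate, and the 300–3400 Hz passband trims it further. The function names and the test frequencies are illustrative, not measurements of any particular speech sound.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2

def survives_phone_line(freq_hz, low_hz=300, high_hz=3400):
    """True if a frequency falls inside the traditional telephone passband."""
    return low_hz <= freq_hz <= high_hz

print(nyquist_hz(8000))  # 4000.0

# Illustrative frequencies only: anything outside 300-3400 Hz is filtered out.
for freq in (150, 1000, 3000, 6000):
    print(freq, survives_phone_line(freq))
```

Of the four example frequencies, only 1000 Hz and 3000 Hz pass; whatever a voice carried at 150 Hz or 6000 Hz never reaches the recording.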

The 8 kHz PCM standard behind that is ITU-T G.711, first published in 1972 and in force in its current version since 1988. Mobile and VoIP calls add lossy compression on top: G.729, for example, codes speech at just 8 kbit/s (against G.711's 64 kbit/s) to save bandwidth, discarding detail in the process. The recording never had the fidelity of an in-person mic – the codec decided that before you pressed record.
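To put the two codecs side by side, here is the bitrate arithmetic as a short sketch. G.711 stores 8,000 samples per second at 8 bits each; G.729's 8 kbit/s figure is the one quoted above. The helper name is made up for the example, and the sizes ignore container and network overhead.

```python
def megabytes_per_hour(bits_per_second):
    """Raw audio payload for one hour, in megabytes (10^6 bytes)."""
    return bits_per_second * 3600 / 8 / 1_000_000

G711_BPS = 8000 * 8  # 8,000 samples per second x 8 bits = 64 kbit/s
G729_BPS = 8000      # 8 kbit/s

print(megabytes_per_hour(G711_BPS))  # 28.8
print(megabytes_per_hour(G729_BPS))  # 3.6
print(G711_BPS / G729_BPS)           # 8.0
```

An hour of G.729 speech fits in an eighth of the space G.711 needs, and that saving comes out of the signal your transcript is built on.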

VoIP adds one more hazard: dropped packets. Packet loss leaves gaps in the audio stream when data fails to arrive, and it's a growing problem as calls move to internet telephony. A half-second dropout mid-sentence is a word your transcriber will never recover. Knowing the audio's ceiling is set this low tells you where to spend your editing time later.
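If you want a list of likely dropouts before you start editing, a rough scan for dead air is one way to get it. A minimal sketch, assuming lost packets show up as runs of exact-zero samples; many calling apps fill gaps with synthesized audio or comfort noise instead, in which case you would need to raise `silence_level` or listen for the gaps yourself. The function and parameter names are hypothetical.

```python
def find_dropouts(samples, sample_rate_hz, min_gap_s=0.25, silence_level=0):
    """Return (start_s, end_s) spans where the signal stays at or below silence_level."""
    min_len = int(min_gap_s * sample_rate_hz)
    gaps, run_start = [], None
    for i, sample in enumerate(samples):
        if abs(sample) <= silence_level:
            if run_start is None:
                run_start = i  # a silent run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                gaps.append((run_start / sample_rate_hz, i / sample_rate_hz))
            run_start = None
    # Handle a silent run that lasts until the end of the recording.
    if run_start is not None and len(samples) - run_start >= min_len:
        gaps.append((run_start / sample_rate_hz, len(samples) / sample_rate_hz))
    return gaps

# Synthetic example at 8 kHz: 1 s of signal, 0.5 s of dead air, 1 s of signal.
rate = 8000
samples = [1000] * rate + [0] * (rate // 2) + [1000] * rate
print(find_dropouts(samples, rate))  # [(1.0, 1.5)]
```

Each span it returns is a place to check the draft against the audio, since a word may be missing there.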

Capture the call as cleanly as the line allows

You can't add fidelity a phone line stripped out, but you can avoid making the recording worse. The cleanest capture records each side separately. If the interview runs over a computer, a per-track recording gives you the source on one channel and yourself on another, which keeps the two voices from bleeding together.
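If your recorder saves both sides into one stereo file, the channels can be pulled apart afterward. A minimal sketch using only Python's standard library, assuming a 16-bit stereo WAV on a little-endian machine; which channel holds the source and which holds you depends on the recorder, and the function names are made up for the example.

```python
import wave
from array import array

def split_stereo_16bit(frames):
    """Split interleaved 16-bit stereo PCM bytes into (left, right) mono bytes."""
    samples = array("h")       # signed 16-bit integers, native byte order
    samples.frombytes(frames)
    left = samples[0::2]       # even positions: first channel
    right = samples[1::2]      # odd positions: second channel
    return left.tobytes(), right.tobytes()

def split_wav_file(path):
    """Read a 16-bit stereo WAV and return (left_bytes, right_bytes, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        frames = wav.readframes(wav.getnframes())
        return (*split_stereo_16bit(frames), wav.getframerate())

# Synthetic check: three stereo frames, positive on one side, negative on the other.
interleaved = array("h", [100, -100, 200, -200, 300, -300]).tobytes()
left, right = split_stereo_16bit(interleaved)
print(array("h", left).tolist())   # [100, 200, 300]
print(array("h", right).tolist())  # [-100, -200, -300]
```

This only helps if the two voices were recorded to separate channels in the first place. A mono mix saved as stereo gives you two copies of the same blend.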

When you only have the handset, use a dedicated call-recording app rather than a speakerphone in a noisy room. Speakerphone pulls in the fan, the keyboard, and the room echo on top of an already-thin signal. A wired or app-based recording of the call itself skips the room and keeps the compressed-but-clean line audio.

Record a 10-second test and play it back before the real conversation starts. A dropped connection or a muted mic caught now costs you seconds; caught after the interview, it costs you the whole thing. Note the file's start time so you can line up timestamps against your notes afterward.
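Lining up timestamps against notes is simple date arithmetic once you have the start time. A minimal sketch with a hypothetical start time and made-up helper names:

```python
from datetime import datetime, timedelta

def wall_clock(recording_start, offset_seconds):
    """Clock time of a moment that sits offset_seconds into the recording."""
    return recording_start + timedelta(seconds=offset_seconds)

def recording_offset(recording_start, noted_time):
    """Seconds into the recording for a clock time jotted in your notes."""
    return (noted_time - recording_start).total_seconds()

start = datetime(2024, 5, 14, 10, 30, 0)  # hypothetical: recording began at 10:30:00

print(wall_clock(start, 724))  # 2024-05-14 10:42:04
print(recording_offset(start, datetime(2024, 5, 14, 10, 45, 0)))  # 900.0
```

A note scribbled at 10:45 points you to the 15-minute mark of the file, and a quote at 12:04 in the transcript was spoken at 10:42:04.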

One mixed line makes who-said-what harder

If both voices land on a single mixed track, labeling who said what gets harder – especially when you talk over each other. Overlapping speech is a leading source of speaker-diarization error, and in strict scoring it counts directly against the error rate (DIHARD II, 2020). Separate channels sidestep most of that, because each voice is already isolated.

On one mono line, the software has to infer each speaker turn from voice characteristics alone, and a phone call's narrow band gives it less to work with. That's reasoning, not a benchmark: the fewer acoustic cues in the signal, the more turns you'll fix by hand. Automatic speaker labeling is the tool to lean on for a mixed track; use it for the first pass, then correct the boundaries by hand where you cut each other off.

Expect the most errors around crosstalk and quick back-and-forth. When you and the source overlap, the transcript may drop a turn or merge two people into one – so those are the spots to check against the audio. A short question you asked can get folded into the source's answer, which changes who owns the quote.
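If your transcript comes with per-turn start and end times, you can list the crosstalk spots before you start reading. A minimal sketch, assuming each turn is a `(speaker, start_seconds, end_seconds)` tuple and that the tool reports overlapping turns rather than forcing them end to end; that layout and the function name are assumptions, not any particular tool's export format.

```python
def find_overlaps(segments):
    """Return (earlier_turn, later_turn, overlap_seconds) for overlapping turns by different speakers."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    flagged = []
    for i, (spk_a, start_a, end_a) in enumerate(ordered):
        for spk_b, start_b, end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start time, so no later turn can overlap this one
            if spk_a != spk_b:
                overlap = round(min(end_a, end_b) - start_b, 2)
                flagged.append(((spk_a, start_a, end_a), (spk_b, start_b, end_b), overlap))
    return flagged

# Hypothetical turns: the source starts answering early, then the reporter interjects.
turns = [
    ("Reporter", 0.0, 4.0),
    ("Source", 3.5, 10.0),
    ("Reporter", 9.0, 9.8),
    ("Source", 12.0, 15.0),
]
for earlier, later, seconds in find_overlaps(turns):
    print(later[1], seconds)
# 3.5 0.5
# 9.0 0.8
```

Each flagged start time is a place where a short question may have been folded into an answer, so check those first.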

Transcribe the phone interview with an AI first pass

Once you have the recording, don't type it out. Manual transcription can take up to six hours of work for a single hour of audio (Bell et al., via Haberl et al., 2023) – most of a working day. An AI first pass built for reporting turns that into a few minutes of processing plus a focused cleanup.

From there, the workflow is the same as any interview transcript: read the draft against the audio and fix the spots AI struggles with. On a phone call that means names, jargon, numbers said fast, and every overlapped turn. Bracket anything you genuinely can't hear as [inaudible] with its timestamp rather than guessing at a quote.

Clean only the lines you'll publish, to one consistent verbatim style, and keep the timestamps so every quote stays re-checkable. A phone transcript carries more uncertain patches than a studio recording, so a quote you can trace back to 12:04 in the audio is worth more than a paragraph of tidy but unverifiable text.
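The [inaudible]-with-timestamp convention can be sketched in a few lines. This assumes your tool exposes a confidence score per word, which not every tool does; the `(text, start_seconds, confidence)` layout, the 0.5 threshold, and the function names are all hypothetical. Treat the output as a list of places to re-listen, not a substitute for listening.

```python
def format_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS once the recording passes an hour."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"

def mark_inaudible(words, min_confidence=0.5):
    """Replace each run of low-confidence words with one timestamped [inaudible] marker."""
    out, in_gap = [], False
    for text, start, confidence in words:
        if confidence < min_confidence:
            if not in_gap:  # open one marker per run, stamped at the run's first word
                out.append(f"[inaudible {format_timestamp(start)}]")
                in_gap = True
        else:
            out.append(text)
            in_gap = False
    return " ".join(out)

# Hypothetical word-level output around the 12:04 mark.
words = [
    ("We", 724.0, 0.95), ("cut", 724.3, 0.90),
    ("the", 724.5, 0.30), ("budget", 724.7, 0.20),
    ("twice", 725.4, 0.92),
]
print(mark_inaudible(words))  # We cut [inaudible 12:04] twice
```

The marker keeps the gap honest and tells you exactly where to go back in the audio.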

The steps, in order

  1. Confirm you can legally record

    Check consent law before the call. Federal law allows one-party consent, but about a dozen states require all parties to agree, and on interstate calls the safe assumption is that the stricter law applies. Get a clear yes on the recording.

  2. Record each side as cleanly as possible

    Use per-track recording or a dedicated call-recording app, not a speakerphone in a noisy room. State the date, participants, and consent at the top, and run a 10-second test first.

  3. Upload it for an AI first pass

    Drop the audio into a transcription tool to get a speaker-labeled, timestamped draft in minutes, instead of spending hours typing from a compressed, narrowband recording.

  4. Fix the phone-audio trouble spots

    Read the draft against the call and correct names, jargon, fast numbers, and overlapped turns where two voices merge. Mark anything unclear as [inaudible] with its timestamp.

  5. Clean the quotes and export

    Tighten only the lines you'll publish to one verbatim style, keep timestamps for re-checking, then export to DOCX or TXT and store the audio securely.

Tips from people who do this a lot

  • Per-track recording is the single biggest upgrade on a phone call – separate channels remove most of the crosstalk that trips up speaker labeling.

  • Assume the stricter state law on any call that crosses state lines, and capture the consent in the audio so you never have to prove it later.

  • Don't record on speakerphone. It stacks room noise and echo on top of an already narrowband line and gives the transcriber less to work with.

  • Check overlapped turns first. Crosstalk is where a phone transcript most often drops a turn or merges two speakers into one.

  • Keep timestamps on every quote. A phone recording has more uncertain patches, so being able to jump back to the exact second matters more, not less.

How to transcribe a phone interview – questions, answered

Is it legal to record a phone interview?

Often, but it depends where the parties are. U.S. federal law allows one-party consent, so you can record a call you're on. About a dozen states require all parties to agree, and on an interstate call the safe assumption is that the stricter state's law applies. Get a clear yes on the recording, and treat that as the safe minimum.

Why is phone audio harder to transcribe?

Phone lines are narrowband (limited to about 300–3400 Hz, sampled at 8 kHz), which removes higher-frequency detail that helps tell speech sounds apart. Mobile and VoIP calls add lossy compression and occasional dropped packets, so the recording starts with less fidelity than an in-person microphone would capture.

How do I get accurate speaker labels on a phone call?

Record each side on a separate channel where you can, so the two voices are already isolated. On a single mixed line the tool has to infer speaker turns from voice alone, and overlapping speech is a leading source of labeling error. Expect to correct more turns by hand around crosstalk.

How long does it take to transcribe a phone interview?

By hand, up to about six hours for one hour of audio – most of a working day. An AI first pass cuts that to a few minutes of processing plus a focused cleanup, where you fix names, jargon, numbers, and overlapped turns rather than typing the whole thing from scratch.

Should I record on speakerphone?

Avoid it if you can. Speakerphone adds room echo, background noise, and keyboard clatter on top of an already thin phone signal, which gives the transcriber less to work with. A dedicated call-recording app or a per-track computer recording keeps the line audio cleaner.

References

  1. Liu, Fu & Narayanan (2009), Effect of bandwidth extension to telephone speech recognition in cochlear implant users. Journal of the Acoustical Society of America / NIH PubMed Central.
  2. ITU-T Recommendation G.711 – Pulse code modulation (PCM) of voice frequencies. International Telecommunication Union (ITU-T).
  3. 18 U.S.C. Sec. 2511 – interception of communications (federal one-party consent). Cornell Law School – Legal Information Institute.
  4. Introduction to the Reporter's Recording Guide (state consent laws). Reporters Committee for Freedom of the Press.
  5. Haberl et al. (2023), Take the aTrain – manual transcription time, citing Bell et al. (2018). arXiv / University of Graz.
  6. Lin et al. (2020), DIHARD II is Still Hard – overlapping speech and diarization error. arXiv.
  7. ITU-T Recommendation G.729 – coding of speech at 8 kbit/s (CS-ACELP). International Telecommunication Union (ITU-T).
  8. INTERSPEECH 2022 Audio Deep Packet Loss Concealment Challenge – packet loss in VoIP. arXiv / INTERSPEECH 2022.
