Pepys

Guide

What HIPAA-compliant transcription actually requires

A plain-English guide for researchers and journalists: who HIPAA binds, when you need a BAA, and how de-identifying your data takes it out of scope.

The short answer

HIPAA-compliant transcription only matters if HIPAA binds you. The Privacy Rule covers health plans, clearinghouses, most healthcare providers, and the vendors acting on their behalf. If that's you and a tool will handle patient data, you need a signed business associate agreement first. De-identify the data and it stops being PHI, so the requirement falls away.

Who does HIPAA actually bind?

HIPAA's Privacy Rule binds a narrow set of organizations, not everyone who touches health information. It covers health plans, healthcare clearinghouses, and healthcare providers who transmit health information electronically in connection with standard transactions such as billing, plus the business associates who handle protected health information (PHI) on their behalf (45 CFR 160.103). The stakes explain the strictness. Between 2009 and 2025, 7,418 healthcare breaches affecting 500 or more people were reported to the HHS Office for Civil Rights, exposing data on more than a billion Americans (HIPAA Journal, 2026). The year 2024 alone set the record, with 289 million individuals' data exposed (HIPAA Journal, 2026).

By that definition, an independent journalist or an academic researcher is usually none of the three, and works on behalf of no covered entity. Read literally, the Privacy Rule doesn't reach them. That last point is an inference from the definitions, not a sentence HHS wrote about your job. HHS's own plain-language guidance frames it the same way: if you fit neither definition, the HIPAA Rules don't apply to you (HHS).

Two things can flip that. Handle PHI on a covered entity's behalf (say, transcribing a clinic's patient interviews under contract) and you become a business associate yourself. And even when HIPAA stays silent, an IRB's rules, the Common Rule, or a state privacy law can still govern how you handle the recording. So the honest first question isn't whether a tool is compliant. It's whether you're even covered.

When you need a business associate agreement

You need a business associate agreement, a BAA, whenever a vendor will create, receive, maintain, or transmit PHI for you. That applies when you're a covered entity or another business associate. The rule is strict about it. 45 CFR 164.502(e) lets you disclose PHI to a vendor only if you first obtain 'satisfactory assurance' it will safeguard the data. That assurance has to be documented in a written contract.

Skip the contract and the upload itself is the violation. Because the disclosure is permitted only when that written assurance exists, handing PHI to a vendor with no BAA is an impermissible disclosure. It breaks the Privacy Rule before any data ever leaks. The Security Rule sets a parallel requirement for electronic PHI at 45 CFR 164.308(b).

In practice this splits transcription vendors into two groups. Some will sign a BAA and take identifiable PHI. Many consumer tools, Pepys included, will not. If you're a covered entity with real patient data, that decides your shortlist. If you're not, or you can take the data out of PHI scope first, the whole BAA question falls away, which is the path most researchers and journalists actually want.
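The decision described above can be sketched in a few lines of Python. This is a minimal illustration of the logic as this guide lays it out, not a compliance tool: the `Upload` fields and the `baa_status` function are hypothetical names invented for the example, and the answers you feed in still have to come from your own reading of the definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upload:
    """Facts about one planned upload. All field names are hypothetical."""
    covered_entity: bool      # health plan, clearinghouse, or covered provider
    business_associate: bool  # handling PHI on a covered entity's behalf
    contains_phi: bool        # False once the data has been de-identified
    baa_signed: bool          # written contract already in place with the vendor


def baa_status(upload: Upload) -> str:
    """Return 'not_required', 'ok', or 'blocked' for a planned upload."""
    bound_by_hipaa = upload.covered_entity or upload.business_associate
    # No HIPAA status, or no PHI in the file: the BAA question never arises.
    if not bound_by_hipaa or not upload.contains_phi:
        return "not_required"
    # Bound by HIPAA and moving PHI: the contract must exist before the upload.
    return "ok" if upload.baa_signed else "blocked"
```

A covered clinic uploading identifiable interviews with no contract comes back as `blocked`; the same file after de-identification comes back as `not_required`.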

How de-identified data sidesteps the Privacy Rule

De-identified data isn't PHI, so the Privacy Rule stops applying to it. 45 CFR 164.514(a) says health information with no reasonable basis to identify a person 'is not individually identifiable health information.' Once a transcript clears that bar, it sits outside the Rule, no BAA required. HHS recognizes exactly two ways to get there.

The first is Expert Determination: a person with appropriate statistical expertise determines, and documents, that the risk of re-identification is very small. The second is the method HHS calls Safe Harbor, which means removing 18 specified identifiers, from names and dates to record numbers, and having no actual knowledge that what remains could identify someone. 'Safe Harbor' is HHS's label for that removal method. The regulation itself just lists the identifiers you strip.

For a researcher, this is the clean path. If your transcript never carries patient identifiers, or you redact them before analysis, the BAA question never arises. The redaction craft is a job of its own: stripping direct identifiers, generalizing the quasi-identifiers that still single people out, and keeping an un-redacted master. 'How to anonymize a transcript' walks through it.
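As a rough sketch of the first step, stripping direct identifiers, here is a small pattern-based pass in Python. Everything in it is an assumption made for illustration: the `PATTERNS` list, the placeholder labels, and the `redact` function are not from HHS or from any tool. It covers only a handful of the 18 Safe Harbor categories and will miss names, addresses, and dates written out in words, so treat it as a first sweep before a human read-through, not as de-identification on its own.

```python
import re

# Illustrative patterns only: a few structured identifiers that regexes
# catch well. Names and free-form dates need a different approach.
PATTERNS = [
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[PHONE]", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),
    ("[DATE]", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("[RECORD]", re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE)),
]


def redact(text: str) -> str:
    """Return a redacted copy of `text`; the original string is untouched."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Because `redact` returns a new string, the un-redacted master stays as it was; store that master separately, under the access controls you'd give raw PHI.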

Redacting the transcript doesn't de-identify the audio

The voice itself is a listed identifier. Safe Harbor's list of 18 identifiers includes, at subparagraph (P), 'biometric identifiers, including finger and voice prints,' alongside full-face photos at (Q) (45 CFR 164.514(b)(2)(i)). A raw voice recording is one of those identifiers. So the audio can still identify a speaker even after you clean the transcript perfectly.

This trips people up. They redact every name in the text, then keep the original recording in the same folder, and treat the job as done. But text redaction removes identifiers from the text, not from the audio beside it. If you're de-identifying to leave HIPAA behind, the recording is in scope too, so it needs deleting or the same access controls you'd give raw PHI.
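One way to catch the same-folder mistake is a quick check over your file list. The sketch below is an assumption-laden example, not a standard tool: the suffix sets and the `audio_beside_transcripts` function are hypothetical, and it only compares folder names, so it can't tell whether a given recording is already locked down.

```python
from pathlib import PurePosixPath

# Hypothetical suffix lists; extend them to match the formats you use.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
TRANSCRIPT_SUFFIXES = {".txt", ".docx", ".srt", ".vtt"}


def audio_beside_transcripts(paths: list[str]) -> list[str]:
    """Return audio files that share a folder with any transcript file."""
    parsed = [PurePosixPath(p) for p in paths]
    transcript_dirs = {
        p.parent for p in parsed if p.suffix.lower() in TRANSCRIPT_SUFFIXES
    }
    return sorted(
        str(p)
        for p in parsed
        if p.suffix.lower() in AUDIO_SUFFIXES and p.parent in transcript_dirs
    )
```

Any path it returns is a recording sitting next to a transcript, which is the case worth reviewing before you call the dataset de-identified.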

The order matters most when the transcript is analysis data, not a published quote. The health identifiers have to come out before the transcript enters your coding software or your dataset, and the audio has to be handled alongside it. 'Qualitative research transcription' covers how de-identification fits the wider coding and IRB workflow.

GDPR reaches health transcripts HIPAA doesn't

GDPR applies on entirely different terms. Under it, 'data concerning health' is a special category, and Article 9(1) says processing it 'shall be prohibited' unless you meet one of the specific conditions in Article 9(2). So an EU interviewee's health transcript carries obligations even where US HIPAA never applies at all.

The trigger is different from HIPAA's. HIPAA turns on who you are, a covered entity or business associate. GDPR turns on whose data it is and where the processing sits, regardless of your job title. A journalist outside every US health rule can still be squarely inside GDPR the moment an interviewee is an EU data subject.

De-identification helps here too, but treat GDPR and HIPAA as separate checks rather than one. They trigger differently and set their own bars, so clearing HIPAA's Safe Harbor list doesn't automatically satisfy GDPR. If your recordings include EU health data, confirm the GDPR condition separately before you rely on a US-shaped workflow.

What makes a HIPAA-compliant transcription workflow?

No tool is 'HIPAA-compliant' on its own. Compliance lives in your workflow, not a badge. The rule already settled what matters. Are you a covered entity or a business associate? And if so, is a signed BAA in place before any PHI moves? A logo on a pricing page answers neither.

A defensible workflow starts before any tool choice: confirm whether you're actually covered. If you are and PHI is involved, get a BAA signed or de-identify the data first. Treat the audio, not only the transcript, as an identifier. And whatever the vendor, make sure it won't train on your files and lets you delete them when you're done.
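Those checks can be written down as a short pre-upload gate. The Python below is a sketch under stated assumptions: `open_issues` and its parameter names are made up for this example, and each answer is something you supply after checking your own status and the vendor's terms.

```python
def open_issues(
    *,
    bound_by_hipaa: bool,          # covered entity or business associate
    contains_phi: bool,            # False once the data is de-identified
    baa_signed: bool,              # written contract with the vendor
    raw_audio_secured: bool,       # recording deleted or access-controlled
    vendor_trains_on_files: bool,  # vendor uses uploads for model training
    vendor_allows_deletion: bool,  # you can delete files when you're done
) -> list[str]:
    """Return the unresolved items; an empty list means nothing is blocking."""
    issues = []
    if bound_by_hipaa and contains_phi and not baa_signed:
        issues.append("sign a BAA or de-identify before any PHI moves")
    if not raw_audio_secured:
        issues.append("delete or lock down the raw audio")
    if vendor_trains_on_files:
        issues.append("vendor trains on uploaded files")
    if not vendor_allows_deletion:
        issues.append("vendor offers no way to delete files")
    return issues
```

Keyword-only arguments are a deliberate choice here: every answer has to be named at the call site, which makes it harder to skip a question by accident.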

Here's where Pepys sits, plainly. It isn't a covered entity or a business associate and doesn't sign BAAs, so it isn't the tool for identifiable PHI you're handling for a hospital. It never trains on your audio or text, source media auto-deletes 30 days after upload by default, and an unclaimed anonymous job is purged in about 12 hours. For de-identified research recordings, interview transcription is often the right fit, and you can start a transcript without an account.

Tips from people who do this a lot

  • HIPAA is a who-question first. Confirm whether you're a covered entity or business associate (45 CFR 160.103) before you weigh any tool's 'compliance.'

  • No product is 'HIPAA compliant' by itself. Compliance rides on your status and whether a BAA is in place, not on a software badge.

  • De-identify to leave the Rule behind, but don't forget the recording. A voice print is a Safe Harbor identifier, so redacted text beside raw audio still identifies the speaker.

  • If you're a covered entity, no signed BAA means no lawful upload. Disclosing PHI to a vendor without one is itself a Privacy Rule violation, before any breach.

  • HIPAA isn't the only rule in play. An EU interviewee's health data is a GDPR special category even when US HIPAA never applies.


HIPAA-compliant transcription: questions, answered

Is AI transcription HIPAA compliant?

No tool is compliant on its own. HIPAA binds covered entities and business associates; if you're one and the tool handles PHI, you need a signed BAA first. De-identify the data and it stops being PHI, so the requirement doesn't apply. Compliance is a property of your workflow, not the software.

Does HIPAA apply to a journalist's or researcher's interviews?

Usually not directly. HIPAA's Privacy Rule binds health plans, clearinghouses, most providers, and their business associates (45 CFR 160.103). An independent journalist or academic researcher is typically none of those. But you can become a business associate by handling PHI for a covered entity, and IRB or state rules may still apply.

Do I need a business associate agreement for a transcription tool?

Only if you're a covered entity or business associate and the vendor will handle PHI. 45 CFR 164.502(e) permits that disclosure only with documented, written assurances. Without a signed BAA, the disclosure itself violates the Privacy Rule, before any data leaks. If you're not covered, the requirement doesn't apply.

Does de-identifying a transcript remove HIPAA obligations?

For the text, largely yes. 45 CFR 164.514(a) says data with no reasonable basis to identify someone isn't PHI, so it falls outside the Rule. But a raw voice recording is itself a listed identifier, a voice print, so the audio still counts even when the transcript is clean.

Does GDPR cover health transcripts that HIPAA doesn't?

Yes, on different terms. GDPR treats 'data concerning health' as a special category whose processing is prohibited without a specific legal condition (Article 9). An EU interviewee's health transcript carries obligations regardless of whether US HIPAA applies, because GDPR turns on whose data it is, not on your job.

References

  1. 45 CFR § 160.103 – HIPAA covered entity & business associate definitions. Cornell Law School Legal Information Institute (official 45 CFR text)
  2. Covered Entities and Business Associates guidance. U.S. Department of Health & Human Services (HHS)
  3. 45 CFR § 164.502(e) – business associate disclosures & written contract. Cornell Law School Legal Information Institute (official 45 CFR text)
  4. 45 CFR § 164.514 – de-identification standards (Expert Determination & Safe Harbor). Cornell Law School Legal Information Institute (official 45 CFR text)
  5. GDPR Article 9 – processing of special categories of personal data. gdpr-info.eu (Regulation (EU) 2016/679; canonical text at EUR-Lex)
  6. Healthcare Data Breach Statistics (2009–2025; HHS OCR portal data). The HIPAA Journal (source data: HHS OCR Breach Portal)
  7. 2025 Healthcare Data Breach Report (2024 record figure). The HIPAA Journal (aggregating HHS OCR data)
