Who does HIPAA actually bind?
HIPAA's Privacy Rule binds a narrow set of organizations, not everyone who touches health information. It covers three kinds of covered entity: health plans, healthcare clearinghouses, and healthcare providers who transmit health information electronically in standard transactions such as insurance claims. It also covers the business associates who handle protected health information (PHI) on their behalf (45 CFR 160.103). The stakes explain the strictness. Between 2009 and 2025, 7,418 healthcare breaches affecting 500 or more people were reported to the HHS Office for Civil Rights, exposing more than a billion individuals' records in total, a figure that counts people breached more than once (HIPAA Journal, 2026). 2024 alone set the annual record, with 289 million individuals affected (HIPAA Journal, 2026).
Measured against those definitions, an independent journalist or academic researcher is usually none of the three covered-entity types and doesn't work on behalf of one. Read literally, the Privacy Rule doesn't reach them. That conclusion is an inference from the definitions, not a sentence HHS wrote about your job, but HHS's own plain-language guidance frames it the same way: if you're neither a covered entity nor a business associate, the HIPAA Rules don't apply to you (HHS).
Two things can change that picture. Handle PHI on a covered entity's behalf, for example by transcribing a clinic's patient interviews under contract, and you become a business associate yourself. And even where HIPAA stays silent, IRB requirements, the Common Rule, or a state privacy law can still govern how you handle the recording. So the honest first question isn't whether a tool is compliant. It's whether you're covered at all.
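The coverage test above boils down to a few yes/no questions. The sketch below encodes them as a simplified decision function, assuming a hypothetical `Role` record with one flag per definition in 45 CFR 160.103. It illustrates the reasoning only; edge cases aren't modeled, and a "not covered" answer says nothing about IRB, Common Rule, or state-law duties.

```python
from dataclasses import dataclass


@dataclass
class Role:
    """Hypothetical yes/no answers about who you are (45 CFR 160.103)."""
    health_plan: bool = False
    clearinghouse: bool = False
    # Providers count only if they send health information electronically
    # in a HIPAA standard transaction, such as an insurance claim.
    provider_with_standard_transactions: bool = False
    # Creating, receiving, maintaining, or transmitting PHI for a covered
    # entity, e.g. transcribing a clinic's interviews under contract.
    handles_phi_for_covered_entity: bool = False


def hipaa_status(role: Role) -> str:
    """Return a rough HIPAA status label for the role."""
    if role.health_plan or role.clearinghouse or role.provider_with_standard_transactions:
        return "covered entity"
    if role.handles_phi_for_covered_entity:
        return "business associate"
    # IRB rules, the Common Rule, and state law can still apply here.
    return "not covered by HIPAA"


print(hipaa_status(Role()))  # independent journalist: not covered by HIPAA
print(hipaa_status(Role(handles_phi_for_covered_entity=True)))  # business associate
```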
When you need a business associate agreement
You need a business associate agreement (BAA) whenever a vendor will create, receive, maintain, or transmit PHI on your behalf, whether you're a covered entity or a business associate passing work to a subcontractor. The rule is strict: 45 CFR 164.502(e) lets you disclose PHI to a vendor only if you first obtain 'satisfactory assurance' that it will safeguard the data, and that assurance has to be documented in a written contract or other written agreement.
Skip the contract and the upload itself is the violation. Because the disclosure is permitted only when that written assurance exists, handing PHI to a vendor with no BAA is an impermissible disclosure. It breaks the Privacy Rule before any data ever leaks. The Security Rule sets a parallel requirement for electronic PHI at 45 CFR 164.308(b).
In practice this splits transcription vendors into two groups. Some will sign a BAA and take identifiable PHI. Many consumer tools, Pepys included, will not. If you're a covered entity with real patient data, that decides your shortlist. If you're not covered, or you can take the data out of PHI scope first, the whole BAA question falls away, which is the path most researchers and journalists actually want.
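The BAA rule reduces to a short gate that runs before any upload. This is a minimal sketch: the role strings are hypothetical labels chosen for illustration, and a "permitted" result only means HIPAA itself doesn't block the upload.

```python
# Roles that trigger the BAA requirement under 45 CFR 164.502(e).
# The string labels are hypothetical, chosen for this sketch.
REGULATED_ROLES = {"covered entity", "business associate"}


def baa_required(your_role: str, vendor_handles_phi: bool) -> bool:
    """A BAA is needed when a covered entity or business associate lets a
    vendor create, receive, maintain, or transmit PHI on its behalf."""
    return your_role in REGULATED_ROLES and vendor_handles_phi


def upload_permitted_by_hipaa(your_role: str, vendor_handles_phi: bool, baa_signed: bool) -> bool:
    """Whether HIPAA allows handing the files to the vendor.

    Uploading PHI without a signed BAA is itself an impermissible disclosure,
    whether or not anything later leaks. A True result says nothing about IRB,
    Common Rule, state-law, or GDPR duties.
    """
    if not baa_required(your_role, vendor_handles_phi):
        return True
    return baa_signed
```

The de-identification route shows up as the case where `vendor_handles_phi` is false: once the files no longer carry PHI, the gate opens without a BAA.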
How de-identified data sidesteps the Privacy Rule
De-identified data isn't PHI, so the Privacy Rule stops applying to it. 45 CFR 164.514(a) says health information with no reasonable basis to identify a person 'is not individually identifiable health information.' Once a transcript clears that bar, it sits outside the Rule, no BAA required. HHS recognizes exactly two ways to get there.
The first is Expert Determination: a qualified expert applies statistical and scientific methods, concludes that the re-identification risk is very small, and documents that analysis. The second, which HHS calls Safe Harbor, means removing 18 specified identifiers, from names and dates to record numbers, for the person and for their relatives, employers, and household members, and having no actual knowledge that what remains could identify them. 'Safe Harbor' is HHS's label; the regulation itself just lists the identifiers you strip.
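Because Safe Harbor is a fixed list, it works well as a checklist. Here's a sketch that carries the 18 categories of 45 CFR 164.514(b)(2)(i), keyed by subparagraph letter. The labels are paraphrased, so the regulation's own wording governs, and the `unchecked` helper is a hypothetical convenience for tracking which categories a reviewer has cleared.

```python
# Paraphrased labels for the 18 identifier categories in
# 45 CFR 164.514(b)(2)(i), keyed by subparagraph letter.
SAFE_HARBOR_IDENTIFIERS = {
    "A": "Names",
    "B": "Geographic subdivisions smaller than a state (street, city, county, ZIP code)",
    "C": "All elements of dates (except year) related to the person; ages over 89",
    "D": "Telephone numbers",
    "E": "Fax numbers",
    "F": "Email addresses",
    "G": "Social Security numbers",
    "H": "Medical record numbers",
    "I": "Health plan beneficiary numbers",
    "J": "Account numbers",
    "K": "Certificate/license numbers",
    "L": "Vehicle identifiers and serial numbers, including license plates",
    "M": "Device identifiers and serial numbers",
    "N": "Web URLs",
    "O": "IP addresses",
    "P": "Biometric identifiers, including finger and voice prints",
    "Q": "Full-face photographs and comparable images",
    "R": "Any other unique identifying number, characteristic, or code",
}


def unchecked(reviewed):
    """Return the category letters a reviewer hasn't cleared yet, in order."""
    return [letter for letter in SAFE_HARBOR_IDENTIFIERS if letter not in reviewed]
```

The checklist only covers the list itself; the "no actual knowledge" condition still needs a human judgment about what remains.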
For a researcher, this is the clean path. If your transcript never carries patient identifiers, or you redact them before analysis, the BAA question never arises. The redaction craft is a job of its own: stripping direct identifiers, generalizing the quasi-identifiers that still single people out, and keeping an un-redacted master. How to anonymize a transcript walks through it.
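To make the first step of that craft concrete, here's a minimal sketch of pattern-based redaction for a handful of direct identifiers. The regexes, the placeholder tokens, and the assumed formats (US-style phone numbers, numeric dates, "MRN 12345" record numbers) are all illustrative assumptions. It covers only four of the 18 categories plus names you supply, and it leaves quasi-identifiers untouched, so on its own it doesn't get a transcript to Safe Harbor.

```python
import re

# Hypothetical patterns for four Safe Harbor categories, tagged with their
# subparagraph letters. Formats are assumptions; real transcripts need more.
PATTERNS = [
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),    # (F) email addresses
    ("[PHONE]", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),  # (D) telephone numbers
    ("[DATE]", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),       # (C) dates, numeric form only
    ("[MRN]", re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE)),  # (H) medical record numbers
]


def redact(text, known_names=()):
    """Swap known names and pattern matches for bracketed placeholders.

    Names (A) come from a list you supply, because regexes can't reliably
    spot them. Quasi-identifiers (job, town, rare condition) are left
    untouched and still need a human pass.
    """
    for name in known_names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text)
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


line = "Maria called 555-123-4567 on 3/14/2024, MRN 448812, email jl@example.org"
print(redact(line, known_names=["Maria"]))
# [NAME] called [PHONE] on [DATE], [MRN], email [EMAIL]
```

Replacing with typed placeholders rather than deleting keeps the transcript readable for coding, while the unredacted master stays wherever you keep raw data.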
Redacting the transcript doesn't de-identify the audio
The voice itself is on the list. Safe Harbor's 18 identifiers include, at subparagraph (P), 'biometric identifiers, including finger and voice prints,' alongside full-face photographs at (Q) (45 CFR 164.514(b)(2)(i)). The regulation doesn't define 'voice print', but a raw recording carries the very voice a voiceprint is drawn from, so the safe reading is to treat the recording as an identifier. The audio can still identify a speaker even after you clean the transcript perfectly.
This trips people up. They redact every name in the text, leave the original recording in the same folder, and treat the job as done. But text redaction removes identifiers from the text, not from the audio beside it. If you're de-identifying to leave HIPAA behind, the recording is in scope too: delete it, or give it the same access controls you'd give raw PHI.
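A quick way to catch the leftover-audio mistake is to scan the folder you're about to call de-identified. This is a minimal sketch, assuming the listed extensions match what your recorder produces; an empty result only means no recognizable audio file sits in the folder, not that the dataset is clean.

```python
from pathlib import Path

# Assumed audio extensions; extend for whatever your recorder produces.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".flac", ".ogg", ".aac"}


def is_audio(path: Path) -> bool:
    """Match on the file extension, case-insensitively."""
    return path.suffix.lower() in AUDIO_EXTENSIONS


def audio_left_behind(folder):
    """List audio files anywhere under a folder of 'de-identified' transcripts."""
    root = Path(folder)
    return sorted(p for p in root.rglob("*") if p.is_file() and is_audio(p))
```

Running `audio_left_behind("study_transcripts")` (a hypothetical folder name) before you hand the folder to a coding tool or a collaborator turns the rule into a check rather than a habit.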
The order matters most when the transcript is analysis data, not a published quote. The health identifiers have to come out before the transcript enters your coding software or your dataset, and the audio has to be handled alongside it. Qualitative research transcription covers how de-identification fits the wider coding and IRB workflow.
GDPR reaches health transcripts HIPAA doesn't
GDPR applies on entirely different terms. Under it, 'data concerning health' is a special category, and Article 9(1) says processing it 'shall be prohibited' unless you meet one of the specific conditions in Article 9(2). So an EU interviewee's health transcript carries obligations even where US HIPAA never applies at all.
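The trigger is different from HIPAA's. HIPAA turns on who you are: a covered entity or a business associate. GDPR turns on whose data it is and where and how the processing happens (Article 3), regardless of your job title or HIPAA status. A journalist outside every US health rule can still be squarely inside GDPR, for instance when they're established in the EU or their work otherwise falls within that territorial scope.
De-identification helps here too, but treat GDPR and HIPAA as separate checks rather than one. They trigger differently and set their own bars, so clearing HIPAA's Safe Harbor list doesn't automatically satisfy GDPR. Under Recital 26, pseudonymized data, such as a transcript with names swapped for codes while you keep a key or an unredacted master, is still personal data; only information that can no longer be linked to a person by means reasonably likely to be used falls outside the Regulation. If your recordings include EU health data that's still identifiable under that test, confirm an Article 9(2) condition separately before you rely on a US-shaped workflow.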
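To make the "two separate checks" point concrete, here's a sketch in which the HIPAA and GDPR questions are evaluated independently, so passing one never short-circuits the other. Every field is a hypothetical simplification: Expert Determination isn't modeled, only the Article 9 question from the text is covered, and `gdpr_anonymous` stands in for a judgment made under GDPR's own identifiability test, not Safe Harbor's list.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """Hypothetical yes/no facts about one set of recordings and transcripts."""
    hipaa_covered: bool        # you are a covered entity or business associate
    gdpr_in_scope: bool        # the processing falls within GDPR's territorial scope
    health_data: bool          # the content concerns someone's health
    safe_harbor_cleared: bool  # all 18 identifiers removed, audio included
    gdpr_anonymous: bool       # unlinkable by means reasonably likely to be used (Recital 26)
    art9_condition: str = ""   # the Article 9(2) condition relied on, if any


def open_issues(rec: Recording) -> list:
    issues = []
    # HIPAA check: only bites if you're covered and the data is still PHI.
    if rec.hipaa_covered and rec.health_data and not rec.safe_harbor_cleared:
        issues.append("HIPAA: still PHI; sign a BAA or de-identify first")
    # GDPR check runs on its own inputs; safe_harbor_cleared is never consulted.
    if rec.gdpr_in_scope and rec.health_data and not rec.gdpr_anonymous and not rec.art9_condition:
        issues.append("GDPR: identifiable health data needs an Article 9(2) condition")
    return issues


# A Safe Harbor-cleared transcript that is still only pseudonymized under GDPR:
print(open_issues(Recording(False, True, True, True, False)))
```

The example case is the trap the section describes: the HIPAA check passes, yet the GDPR check still flags the transcript because a key or master copy keeps it linkable.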
What makes a HIPAA-compliant transcription workflow?
No tool is 'HIPAA-compliant' on its own. Compliance lives in your workflow, not in a badge. The questions that matter are the ones above: are you a covered entity or a business associate, and if so, is a signed BAA in place before any PHI moves? A logo on a pricing page answers neither.
A defensible workflow starts before any tool choice: confirm whether you're actually covered. If you are, and PHI is involved, get a BAA signed or de-identify the data first. Treat the audio, not only the transcript, as an identifier. And whatever the vendor, make sure it won't train on your files and lets you delete them when you're done.
Here's where Pepys sits, plainly. It isn't a covered entity or a business associate and doesn't sign BAAs, so it isn't the tool for identifiable PHI you're handling for a hospital. It never trains on your audio or text, source media auto-deletes 30 days after upload by default, and an unclaimed anonymous job is purged in about 12 hours. For de-identified research recordings, interview transcription is often the right fit, and you can start a transcript without an account.