Is AI transcription confidential? It depends on the vendor
The honest answer starts with who runs the tool, not the model inside it. Automatic transcription is confidential only when the provider encrypts your files, deletes them on a schedule, and won't feed your content into model training. Confidentiality is a promise a company makes and keeps. It isn't something the technology hands you for free.
The security world has a precise word for this. NIST defines confidentiality as preserving authorized restrictions on information access and disclosure – who is allowed to see your data, and who isn't. That's separate from integrity (whether the file is accurate) and availability (whether it's reachable). For a transcript, confidentiality is the entire question, because your recording can hold names, medical details, off-the-record sources, or privileged testimony.
The worry is mainstream, not niche. In Cisco's 2024 Consumer Privacy Survey, 84% of generative-AI users said they were concerned about data they enter going public, and 30% admitted entering confidential information anyway. The concern has also reached the courts. A consolidated class action now accuses a well-known AI notetaker of transcribing conversations and training its models without participant consent (National Law Review). Those allegations are unproven, and no liability has been found. Still, that case is why 'will this vendor train on my data?' turned into a real buying question.
Confidentiality and no-training are two different promises
These two get blurred constantly, and the difference matters when you compare tools. Confidentiality covers who can access your file: encryption, access controls, and how soon it's deleted. The no-training guarantee is narrower. It covers one thing: whether your audio and transcript become fuel for someone's model. A tool can be fully encrypted and still train on your content. Another can promise no training and still keep your files forever.
The class action against the AI notetaker sits squarely on the training-and-consent side. The complaint alleges the tool recorded and transcribed non-users' conversations without their consent (OpenClassActions), and that it used those recordings to train its ASR and machine-learning models (National Law Review). Again, these are unproven allegations. But the case shows why you want a written no-training term in the privacy policy, not just a reassuring line on the marketing page.
Here's where we'll be specific about Pepys. We don't use your content to train our own models, and our AI gateway is configured for zero data retention. One honest caveat: our subprocessors don't train on your content where our agreements support that. That's a contractual condition, not an absolute guarantee, and we'd rather say so plainly. If your goal is to strip identifying details from the transcript text itself, that's a different job – our guide to anonymizing a transcript covers de-identifying the content, while this guide is about how the vendor handles your file.
What do 'encryption in transit' and 'encryption at rest' mean?
Encryption scrambles readable data into ciphertext that only an authorized key can unlock (NIST). Two kinds matter for a transcription tool, and a credible vendor does both. Skip either one and there's a gap where your file sits in the clear.
Encryption in transit protects your file while it moves across the network – the padlock in your browser bar. That's the job of TLS, the protocol behind HTTPS, which NIST describes as protecting data during transmission across the Internet (NIST SP 800-52). Without it, anyone sitting on the network between you and the server could read the upload.
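If you script uploads yourself instead of using a browser, the same protection can be made explicit on the client side. The sketch below uses Python's standard `ssl` module to build a context that verifies the server certificate and refuses anything older than TLS 1.2. The function name `strict_tls_context` and the TLS 1.2 floor are illustrative assumptions, not a description of how any particular vendor is configured.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Return a client-side TLS context for HTTPS uploads (illustrative)."""
    # create_default_context() enables certificate verification and
    # hostname checking for client connections.
    ctx = ssl.create_default_context()
    # Assumed floor: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A context like this can be passed to `urllib.request.urlopen(url, context=ctx)`. If the certificate doesn't validate, the connection fails instead of quietly proceeding over an unverified link.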
Encryption at rest protects the file once it's stored on disk. NIST treats storage encryption as a separate problem from data in motion (NIST SP 800-111). If a drive or backup is stolen, at-rest encryption keeps the contents unreadable. Pepys encrypts your files in transit over HTTPS/TLS and at rest through its infrastructure providers, so both legs are covered.
How long does the vendor keep your files?
Retention is the overlooked half of confidentiality. A file that no longer exists is safer than one sitting on a server indefinitely. The GDPR's storage-limitation principle captures the idea: personal data should be kept no longer than is necessary for the processing purpose (GDPR Article 5(1)(e)). So the sharp question to ask any vendor is simple. When does my audio actually get deleted?
Pepys answers it concretely. By default, we auto-purge your uploaded source audio or video 30 days after upload, while the transcript and every export stay until you delete them or close your account. Transcribe without an account and the unclaimed anonymous job is deleted roughly 12 hours after it's created. You keep the useful artifact. The sensitive raw recording doesn't linger.
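To see what those windows mean for a specific upload, the arithmetic can be sketched in a few lines of Python. The two durations come from the policy described above; the constant and function names are hypothetical, and the 12-hour figure is approximate, so treat the second result as an estimate.

```python
from datetime import datetime, timedelta, timezone

# Windows taken from the policy described above; the names are hypothetical.
SOURCE_MEDIA_RETENTION = timedelta(days=30)
ANONYMOUS_JOB_RETENTION = timedelta(hours=12)  # stated as "roughly 12 hours"


def source_purge_time(uploaded_at: datetime) -> datetime:
    """Time at which uploaded audio or video becomes due for auto-purge."""
    return uploaded_at + SOURCE_MEDIA_RETENTION


def anonymous_job_expiry(created_at: datetime) -> datetime:
    """Approximate time at which an unclaimed anonymous job is deleted."""
    return created_at + ANONYMOUS_JOB_RETENTION


uploaded = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
print(source_purge_time(uploaded).isoformat())     # 2025-03-31T09:00:00+00:00
print(anonymous_job_expiry(uploaded).isoformat())  # 2025-03-01T21:00:00+00:00
```

Running the same calculation against any vendor's stated retention window is a quick way to put a calendar date on "when does my audio actually get deleted?"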
Whatever tool you choose, read the retention section of its privacy policy before uploading anything sensitive, then delete the source file yourself once you have the transcript. You can try a transcript in your browser with no card and see exactly what's stored. For an IRB study or a sensitive-source project, map where the audio lives at every step before you hit record – our guide to qualitative-research transcription walks through that workflow.
What a self-serve tool can't promise, and how to choose
A vendor's confidentiality only reaches as far as the companies it hands your data to. Under GDPR Article 28, a processor can't bring in a sub-processor without the controller's written authorisation, and must bind that sub-processor to the same data-protection terms. In plain terms: the subprocessor list is the real perimeter of who touches your file. Read it.
For regulated data, confidentiality becomes a legal instrument, not a preference. To let any vendor handle protected health information, a HIPAA-covered entity must first obtain a written Business Associate Agreement (45 CFR 164.502(e)). A self-serve tool without a signed BAA is not the right home for PHI. We'll be blunt about our own limits: Pepys does not offer a BAA and does not advertise SOC 2 or HIPAA certification. If your work requires those, pick a vendor that contracts for them.
So, is AI transcription confidential? It can be, if you choose on evidence instead of adjectives. Look for five things: encryption in transit and at rest, a stated deletion schedule, a written no-training term, a published subprocessor list, and a BAA if you handle PHI. When confidentiality is a professional duty rather than a preference, treat it as one – our guide to legal transcription covers the duty of confidentiality and consent in more depth. The honest vendors tell you exactly what they do and don't guarantee. Read for that, and upload accordingly.
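If you compare several vendors, the five-item checklist can be written down as a small Python sketch so every tool is judged against the same evidence. The class, field, and function names here are hypothetical, and the one piece of logic worth noting is that the BAA only counts as a gap when you actually handle PHI.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VendorEvidence:
    """What a vendor's written policies actually state (hypothetical model)."""
    encrypts_in_transit: bool
    encrypts_at_rest: bool
    states_deletion_schedule: bool
    written_no_training_term: bool
    publishes_subprocessors: bool
    signs_baa: bool


def missing_items(vendor: VendorEvidence, handles_phi: bool) -> list[str]:
    """List the checklist items the vendor has not documented."""
    checks = {
        "encryption in transit and at rest":
            vendor.encrypts_in_transit and vendor.encrypts_at_rest,
        "stated deletion schedule": vendor.states_deletion_schedule,
        "written no-training term": vendor.written_no_training_term,
        "published subprocessor list": vendor.publishes_subprocessors,
    }
    # A BAA is only required when protected health information is involved.
    if handles_phi:
        checks["signed BAA"] = vendor.signs_baa
    return [item for item, ok in checks.items() if not ok]


# Example: an invented vendor that documents everything except a BAA.
example = VendorEvidence(True, True, True, True, True, signs_baa=False)
print(missing_items(example, handles_phi=False))  # []
print(missing_items(example, handles_phi=True))   # ['signed BAA']
```

An empty list means the vendor has put all the relevant promises in writing. It says nothing about whether those promises are kept, which is why the policy text and the subprocessor list still deserve a careful read.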