Pepys

Is AI transcription confidential?

A plain-English guide to what confidentiality actually means for a transcription tool – encryption, deletion, and the no-training promise – so you can judge a vendor on evidence, not adjectives.

The short answer

It depends on the vendor, not the technology. AI transcription is confidential when the provider encrypts your files in transit and at rest, deletes them on a clear schedule, and contractually won't train models on your content. None of that is automatic, so read the privacy policy for retention, subprocessors, and a written no-training term before you upload anything sensitive.

Is AI transcription confidential? It depends on the vendor

The honest answer starts with who runs the tool, not the model inside it. Automatic transcription is confidential only when the provider encrypts your files, deletes them on a schedule, and won't feed your content into model training. Confidentiality is a promise a company makes and keeps. It isn't something the technology hands you for free.

The security world has a precise word for this. NIST defines confidentiality as preserving authorized restrictions on information access and disclosure – who is allowed to see your data, and who isn't. That's separate from whether the file is accurate or reachable. For a transcript it's the entire question, because your recording can hold names, medical details, off-the-record sources, or privileged testimony.

The worry is mainstream, not niche. In Cisco's 2024 Consumer Privacy Survey, 84% of generative-AI users said they were concerned about data they enter going public, and 30% admitted entering confidential information anyway. The risk isn't theoretical. A consolidated class action now accuses a well-known AI notetaker of transcribing conversations and training its models without participant consent (National Law Review). Those allegations are unproven, and no liability has been found. Still, cases like this one help explain why 'will this vendor train on my data?' has become a real buying question.

Confidentiality and no-training are two different promises

These two get blurred constantly, and the difference matters when you compare tools. Confidentiality covers who can access your file: encryption, access controls, and how soon it's deleted. The no-training guarantee is narrower. It covers one thing: whether your audio and transcript become fuel for someone's model. A tool can be fully encrypted and still train on your content. Another can promise no training and still keep your files forever.

That same case sits squarely on the training-and-consent side. The complaint alleges the tool recorded and transcribed non-users' conversations without their consent (OpenClassActions), and that it used those recordings to train its ASR and machine-learning models (National Law Review). Again, these are unproven allegations. But the case shows why you want a written no-training term in the privacy policy, not just a reassuring line on the marketing page.

Here's where we'll be specific about Pepys. We don't use your content to train our own models, and our AI gateway is configured for zero data retention. One honest caveat concerns our subprocessors: they don't train on your content where our agreements with them support that. That's a contractual condition, not an absolute guarantee, and we'd rather say so plainly. If your goal is to strip identifying details from the transcript text itself, that's a different job, covered in our guide to anonymizing a transcript; this guide is about how the vendor handles your file.

What do 'encryption in transit' and 'encryption at rest' mean?

Encryption scrambles readable data into ciphertext that only an authorized key can unlock (NIST). Two kinds matter for a transcription tool, and a credible vendor does both. Skip either one and there's a stage where your file travels or sits in the clear.

Encryption in transit protects your file while it moves across the network – the padlock in your browser bar. That's the job of TLS, the protocol behind HTTPS, which NIST describes as protecting data during transmission across the Internet (NIST SP 800-52). Without it, anyone sitting on the network between you and the server could read the upload.
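
To make the padlock concrete, here is a minimal Python sketch using only the standard `ssl` and `socket` modules. It assumes you want to confirm that an upload host negotiates TLS with a valid certificate and a matching hostname. The TLS 1.2 floor is our assumption (roughly the minimum NIST SP 800-52 Rev. 2 expects), and the function names are illustrative, not part of any vendor's API.

```python
import socket
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server.

    create_default_context() loads the system CA roots and sets
    verify_mode=CERT_REQUIRED and check_hostname=True. We additionally
    refuse anything older than TLS 1.2 (an assumed floor).
    """
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port over TLS and return the negotiated protocol,
    e.g. 'TLSv1.3'.

    Raises ssl.SSLError (including certificate-verification failures)
    if the server's certificate, hostname, or protocol version is rejected.
    """
    ctx = strict_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Usage: `negotiated_tls_version("upload.example.com")`, where the hostname is a placeholder. A clean return means the file would travel encrypted to a server whose identity was checked. An exception means the upload path doesn't meet the bar.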

Encryption at rest protects the file once it's stored on disk. NIST treats storage encryption as a separate problem from data in motion (NIST SP 800-111). If a drive or backup is stolen, at-rest encryption keeps the contents unreadable. Pepys encrypts your files in transit over HTTPS/TLS and at rest through its infrastructure providers, so both legs are covered.

How long does the vendor keep your files?

Retention is the overlooked half of confidentiality. A file that no longer exists is safer than one sitting on a server indefinitely. The GDPR's storage-limitation principle captures the idea: personal data should be kept no longer than is necessary for the processing purpose (GDPR Article 5(1)(e)). So the sharp question to ask any vendor is simple. When does my audio actually get deleted?

Pepys answers it concretely. By default, we auto-purge your uploaded source audio or video 30 days after upload, while the transcript and every export stay until you delete them or close your account. Transcribe without an account and the unclaimed anonymous job is deleted roughly 12 hours after it's created. You keep the useful artifact. The sensitive raw recording doesn't linger.
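
The schedule above reduces to a per-item deletion deadline. Here is a minimal Python sketch of that logic. The item kinds and function names are hypothetical, chosen for illustration; this is not Pepys's actual code. It also treats the "roughly 12 hours" window for anonymous jobs as exactly 12 hours for simplicity.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical constants mirroring the default windows described above.
SOURCE_MEDIA_TTL = timedelta(days=30)    # uploaded audio/video: purged 30 days after upload
ANONYMOUS_JOB_TTL = timedelta(hours=12)  # unclaimed anonymous job: about 12 hours after creation


def purge_deadline(kind: str, created_at: datetime) -> Optional[datetime]:
    """Return when an item becomes due for deletion, or None when it is
    kept until the user deletes it (transcripts and exports)."""
    if kind == "source_media":
        return created_at + SOURCE_MEDIA_TTL
    if kind == "anonymous_job":
        return created_at + ANONYMOUS_JOB_TTL
    if kind in ("transcript", "export"):
        return None
    raise ValueError(f"unknown item kind: {kind!r}")


def is_due(kind: str, created_at: datetime, now: datetime) -> bool:
    """True once an item with a deadline has reached or passed it."""
    deadline = purge_deadline(kind, created_at)
    return deadline is not None and now >= deadline


if __name__ == "__main__":
    uploaded = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(purge_deadline("source_media", uploaded))  # 2025-01-31 09:00:00+00:00
```

The design choice worth noticing is the `None` branch. The transcript has no automatic deadline, which is why deleting the source yourself (and, when you're done, the transcript) is still your job.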

Whatever tool you choose, read the retention section of its privacy policy before uploading anything sensitive, then delete the source file yourself once you have the transcript. You can try a transcript in your browser with no card and see exactly what's stored. For an IRB study or a sensitive-source project, map where the audio lives at every step before you hit record; our guide to qualitative-research transcription walks through that workflow.

What a self-serve tool can't promise, and how to choose

A vendor's confidentiality only reaches as far as the companies it hands your data to. Under GDPR Article 28, a processor can't bring in a subprocessor without the controller's prior written authorization, and must bind that subprocessor to the same data-protection obligations. In plain terms: the subprocessor list is the real perimeter of who touches your file. Read it.
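
Subprocessor lists can change after you've read them, so one practical habit is to save the list you reviewed and compare it with the one the vendor publishes later. Here is a minimal Python sketch of that comparison. The function names and the vendor names in the test are hypothetical. Names are trimmed and case-folded so that formatting differences don't show up as changes.

```python
from typing import Dict, Iterable, Set


def _normalize(names: Iterable[str]) -> Set[str]:
    """Trim whitespace and case-fold names, dropping empty entries."""
    return {n.strip().casefold() for n in names if n.strip()}


def subprocessor_changes(reviewed: Iterable[str], published: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare the list you last reviewed with the vendor's current list.

    'added' = companies that joined the perimeter since your review;
    'removed' = companies that no longer appear.
    """
    r, p = _normalize(reviewed), _normalize(published)
    return {"added": p - r, "removed": r - p}
```

Anything in `added` is a new company that may touch your file. That's the entry to look up before your next sensitive upload.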

For regulated data, confidentiality becomes a legal instrument, not a preference. To let any vendor handle protected health information, a HIPAA-covered entity must first obtain a written Business Associate Agreement (45 CFR 164.502(e)). A self-serve tool without a signed BAA is not the right home for PHI. We'll be blunt about our own limits: Pepys does not offer a BAA and does not advertise SOC 2 or HIPAA certification. If your work requires those, pick a vendor that contracts for them.

So, is AI transcription confidential? It can be, if you choose on evidence instead of adjectives. Look for five things: encryption in transit and at rest, a stated deletion schedule, a written no-training term, a published subprocessor list, and a BAA if you handle PHI. When confidentiality is a professional duty rather than a preference, treat it as one; our guide to legal transcription covers the duty of confidentiality and consent in more depth. The honest vendors tell you exactly what they do and don't guarantee. Read for that, and upload accordingly.
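
If you're comparing several vendors, the five checks can be written down as a simple rubric. Here is a minimal Python sketch. The class and field names are hypothetical. Each value should reflect what you could confirm in writing (privacy policy, data-processing terms, contract), not what a marketing page implies. The BAA check only counts when you handle PHI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VendorEvidence:
    """What you could confirm in writing for one vendor."""
    encrypts_in_transit: bool
    encrypts_at_rest: bool
    deletion_schedule_days: Optional[int]  # None = no stated deletion schedule
    written_no_training_term: bool
    publishes_subprocessor_list: bool
    signs_baa: bool


def missing_evidence(v: VendorEvidence, handles_phi: bool) -> List[str]:
    """Return the checklist items this vendor has not shown evidence for."""
    gaps: List[str] = []
    if not (v.encrypts_in_transit and v.encrypts_at_rest):
        gaps.append("encryption in transit and at rest")
    if v.deletion_schedule_days is None:
        gaps.append("stated deletion schedule")
    if not v.written_no_training_term:
        gaps.append("written no-training term")
    if not v.publishes_subprocessor_list:
        gaps.append("published subprocessor list")
    if handles_phi and not v.signs_baa:
        gaps.append("signed BAA (required for PHI)")
    return gaps
```

An empty list means the vendor cleared every check you could verify. Any entry is a reason to hold the upload until you get an answer in writing.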

Tips from people who do this a lot

  • Read the retention clause before the marketing page. 'We take privacy seriously' is not a deletion schedule; a specific number of days is.

  • Check the subprocessor list, not just the headline promise. A vendor's confidentiality is only as strong as the companies it forwards your file to.

  • Separate the two questions: 'Can anyone see my file?' (encryption plus deletion) and 'Will it train a model?' (a written no-training term). A tool can pass one and fail the other.

  • If you handle PHI, a signed BAA is non-negotiable – a self-serve tool without one isn't a lawful option, no matter how good the encryption is.

  • Delete the source audio yourself once you have the transcript. Even a 30-day auto-purge is 30 days you didn't need to keep a sensitive recording.

Is AI transcription confidential? Questions, answered

Is AI transcription confidential?

Yes, but only when the vendor earns it, not because the software is 'AI.' Three concrete commitments make the difference: files encrypted in transit and at rest, a stated deletion window, and a written pledge never to train on your content. If the privacy policy skips any one, treat the upload as exposed.

Does AI transcription train on my recordings?

Some tools do, some don't; it's a contract term, not a default. A consolidated class action accuses a well-known AI notetaker of training its models on conversations without consent (unproven allegations). Look for a written statement that your content won't be used for training, and check whether it also binds the vendor's subprocessors.

What is the difference between encryption in transit and at rest?

Encryption in transit protects your file while it moves across the network, using TLS, the protocol behind HTTPS. Encryption at rest protects the same file once it's stored on disk, so a stolen drive stays unreadable. NIST treats them as separate safeguards, and a serious transcription vendor should do both.

Can I use AI transcription for HIPAA or medical recordings?

Only with a signed Business Associate Agreement. Under 45 CFR 164.502(e), a HIPAA-covered entity may hand protected health information to a vendor only after getting that written assurance. A self-serve tool without a BAA is not a lawful home for PHI, however strong its encryption, so confirm the BAA before uploading.

How long does a transcription service keep my audio?

It varies, so check the retention clause. The GDPR's storage-limitation principle says personal data shouldn't be kept longer than necessary, but each vendor sets its own window. Pepys auto-purges uploaded source media 30 days after upload and keeps the transcript until you delete it; unclaimed anonymous jobs go in about 12 hours.

References

  1. 2024 Consumer Privacy Survey – concern about GenAI data going public. Cisco Systems.
  2. AI Notetaking Tools Under Fire: Lessons from the Otter.ai Class Action Complaint. The National Law Review.
  3. Otter.ai Privacy Wiretap Class Action (Brewer v. Otter.ai) – unproven allegations. OpenClassActions.
  4. Glossary – Confidentiality (FIPS 199 / 44 U.S.C. 3542). NIST Computer Security Resource Center.
  5. Glossary – Encryption (CNSSI 4009-2015). NIST Computer Security Resource Center.
  6. SP 800-52 Rev. 2 – Guidelines for the Selection, Configuration, and Use of TLS Implementations. NIST.
  7. SP 800-111 – Guide to Storage Encryption Technologies for End User Devices. NIST.
  8. GDPR Article 5(1)(e) – Storage Limitation. gdpr-info.eu (Regulation (EU) 2016/679).
  9. GDPR Article 28 – Processor and sub-processor obligations. gdpr-info.eu (Regulation (EU) 2016/679).
  10. 45 CFR 164.502(e) – Business Associate Agreement requirement. Cornell Law School Legal Information Institute.