Pepys


Captions vs subtitles vs transcripts

For anyone captioning, translating, or publishing video: which text artifact each viewer actually needs, and what accessibility law requires.

The short answer

Captions and subtitles are both time-synced text on video, but they solve different problems. Captions transcribe speech plus non-speech sound (speaker IDs, sound effects, music) for viewers who can't hear the audio. Subtitles translate only the dialogue for viewers who can hear but don't know the language. A transcript, by contrast, is a standalone document with no timing.

Three artifacts, three different jobs

Transcripts, captions, and subtitles all turn spoken audio into text, but they serve different viewers and formats. A transcript is a standalone document you read on its own. Captions and subtitles are time-synced text laid over video. The W3C Web Accessibility Initiative separates captions from subtitles by who needs them and what each one includes.

The dividing line is sound versus language. Captions exist for people who can't hear the audio, so they include non-speech information alongside dialogue. Subtitles exist for people who can hear but don't know the spoken language, so they carry the words only. A transcript sits outside video timing entirely – it's the written record you read or search.

The terms blur because they overlap. W3C notes that the two are sometimes distinguished as "intralingual subtitles" (same language) and "interlingual subtitles" (different language). The prefix tells you whether the text stays in the source language or crosses into another. The surface is the same, text on or beside video, but the job underneath is different.

Captions vs subtitles: sound vs language

Captions and subtitles differ by what they include and who they're for. Per the W3C WAI, captions are a text version of both the speech and the non-speech audio needed to understand the content, made for Deaf people and others who can't hear. Subtitles translate spoken audio into another language for viewers who can hear but don't know it.

In practice, captions do the work of an ear. They identify who is speaking and mark the sounds that carry meaning: [phone rings], [tense music], [laughter], [door slams]. Mute a thriller and the plot may hinge on a sound you can't see. A caption tells you it happened; a subtitle assumes you heard it.

Subtitles take hearing for granted. They render only the dialogue, translated, trusting the viewer to catch tone, effects, and music unaided. Swap one for the other and the file fails its audience – a hard-of-hearing viewer handed subtitles loses every non-speech cue, and in many videos those cues carry much of the story.
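To make the difference concrete, here is a small Python sketch that splits one caption cue into its non-speech annotations and its bare dialogue. The dialogue half is roughly what a subtitle carries (before translation); the annotation half is what a hard-of-hearing viewer loses without captions. The bracket and uppercase speaker-label conventions, and the `split_caption` name, are assumptions for illustration, not a formal standard.

```python
import re

# Assumed conventions for this sketch (not a formal standard): sound cues in
# square brackets, e.g. "[phone rings]", and a speaker ID as an uppercase
# label at the start of the cue, e.g. "MARIA:".
SOUND_CUE = re.compile(r"\[[^\]]*\]")
SPEAKER_ID = re.compile(r"^[A-Z][A-Z .'-]*:\s*")


def split_caption(text: str) -> tuple[list[str], str]:
    """Split one caption cue into (non-speech annotations, dialogue only)."""
    annotations = SOUND_CUE.findall(text)
    speaker = SPEAKER_ID.match(text)
    if speaker:
        annotations.insert(0, speaker.group(0).strip())
    # Remove sound cues first, then the leading speaker label.
    dialogue = SPEAKER_ID.sub("", SOUND_CUE.sub("", text))
    # Collapse the extra spaces the removals leave behind.
    dialogue = " ".join(dialogue.split())
    return annotations, dialogue


annotations, dialogue = split_caption("MARIA: [phone rings] Don't answer that.")
print("Caption-only cues:", annotations)
print("Dialogue a subtitle keeps:", dialogue)
```

Everything in the first list (who is speaking, the phone) exists only in the caption; a subtitle track would start from the second value alone.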

A transcript is a standalone document

A transcript is a standalone text alternative with no synchronization to playback: a record you read on its own. WCAG 2.1 Success Criterion 1.2.1 (Level A) requires one for prerecorded audio-only content, in the form of a text alternative that presents equivalent information.

Timing is the whole difference. A transcript is one continuous document you read top to bottom or search by keyword. Captions and subtitles are chopped into short cues, each stamped with an in and out time so text surfaces exactly as the words are spoken. Add those in and out times to a transcript's segments and you have a timestamped transcript – the raw material of a caption file.
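The difference is easy to see in code. Below is a minimal Python sketch, not any particular tool's implementation: transcript text becomes a caption cue once it gains an in-time and an out-time. The `Cue` class and the `srt_timestamp` and `to_srt` helpers are hypothetical names; the output follows the common SRT layout of a sequence number, a `start --> end` line with a comma before the milliseconds, the cue text, and a blank line.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # in-time, seconds from the start of the video
    end: float    # out-time, seconds from the start of the video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues as SRT: index, 'start --> end', text, blank line."""
    blocks = []
    for i, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{i}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)


print(to_srt([Cue(0.0, 2.5, "Welcome back."), Cue(2.5, 4.0, "[tense music]")]))
```

Strip the sequence numbers and timing lines back out and what remains is, in effect, the transcript again.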

That shared lineage is why one recording becomes all three. Start with a transcript, segment it, add timing, and export it as an SRT caption file; translate those cues and you have subtitles. The document comes first; the synced tracks are what you build from it.
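One way that transcript-to-subtitles pipeline might look, as a Python sketch under stated assumptions: the transcript arrives with word-level timings (the `(word, start, end)` shape is hypothetical), cues are packed greedily up to an assumed 42-character limit, and the `translate` argument stands in for whatever translation step you use. Serializing the cues to an SRT file is left out here. The key point is in `translate_cues`: subtitles reuse the caption timings and replace only the text.

```python
from typing import Callable

# Hypothetical input shape: word-level timings as (word, start_s, end_s),
# which many speech-to-text tools can export in some form.
Word = tuple[str, float, float]
Cue = tuple[float, float, str]  # (in-time, out-time, text)

MAX_CHARS = 42  # assumed per-cue character limit; style guides differ


def _make_cue(words: list[Word]) -> Cue:
    # A cue opens when its first word starts and closes when its last word ends.
    return (words[0][1], words[-1][2], " ".join(w for w, _, _ in words))


def segment(words: list[Word], max_chars: int = MAX_CHARS) -> list[Cue]:
    """Greedily pack consecutive words into cues of at most max_chars."""
    cues: list[Cue] = []
    current: list[Word] = []
    for word in words:
        candidate = " ".join(w for w, _, _ in current + [word])
        if current and len(candidate) > max_chars:
            cues.append(_make_cue(current))
            current = []
        current.append(word)
    if current:
        cues.append(_make_cue(current))
    return cues


def translate_cues(cues: list[Cue], translate: Callable[[str], str]) -> list[Cue]:
    """Subtitles: keep every caption timing, replace only the text."""
    return [(start, end, translate(text)) for start, end, text in cues]


words = [("Thanks", 0.0, 0.4), ("for", 0.4, 0.5), ("joining", 0.5, 0.9), ("us.", 0.9, 1.2)]
captions = segment(words, max_chars=15)
# str.upper is only a stand-in for a real translation step.
subtitles = translate_cues(captions, str.upper)
print(captions)
print(subtitles)
```

Greedy packing is the simplest rule that works. Real captioning tools also tend to break at punctuation and cap how long a cue stays on screen, which this sketch skips, and translation often reads better when done on whole sentences before re-splitting into cues.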

Closed captions, open captions, and SDH

Captions split into closed and open by one question: can the viewer turn them off? The DCMP Captioning Key defines closed captions as hidden and decoded on demand – toggleable on and off. Open captions are always visible, burned into the picture and impossible to switch off.

SDH, subtitles for the deaf and hard of hearing, is a hybrid of captions and subtitles. DCMP describes SDH as just like subtitles but adding sound effects, speaker identification, and other non-speech features. In effect, it packages a caption's non-speech cues – speaker labels, effects, music – into a subtitle-format track.

For most work, closed captions are the safer default. They're user-controlled, can be restyled, and can be corrected without re-rendering the video. Open captions can't be switched off, which is why creators burn them into short social clips built to be watched on mute – but that permanence is also their drawback.
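In a typical ffmpeg workflow the closed/open split shows up directly in the commands. The Python sketch below only builds argument lists (it does not run ffmpeg). The closed version muxes the SRT into an MP4 as a separate `mov_text` stream that the player can toggle; the open version uses ffmpeg's `subtitles` filter to paint the text into every frame, which forces a video re-encode. The function names and file names are placeholders, and the filter assumes an ffmpeg build with libass.

```python
import shlex


def closed_caption_cmd(video: str, srt: str, out: str) -> list[str]:
    """Mux the SRT as a separate, toggleable text track (closed captions)."""
    return [
        "ffmpeg", "-i", video, "-i", srt,
        "-map", "0", "-map", "1",  # keep the video's streams plus the SRT
        "-c", "copy",              # no re-encode of audio or video
        "-c:s", "mov_text",        # MP4's text subtitle codec
        out,
    ]


def open_caption_cmd(video: str, srt: str, out: str) -> list[str]:
    """Render the SRT into the picture itself (open captions)."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"subtitles={srt}",  # burn-in filter; needs libass
        "-c:a", "copy",             # video is re-encoded, audio need not be
        out,
    ]


print(shlex.join(closed_caption_cmd("talk.mp4", "talk.srt", "talk_cc.mp4")))
print(shlex.join(open_caption_cmd("talk.mp4", "talk.srt", "talk_open.mp4")))
```

The re-encode is the practical cost of permanence: fixing a typo in open captions means rendering the video again, while a corrected closed-caption file can simply be re-muxed with `-c copy`. File names containing colons or quotes need extra escaping inside the filter string.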

What accessibility law requires

For accessibility, the requirement is captions, not subtitles. WCAG 2.1 Success Criterion 1.2.2 (Level A) requires captions for all prerecorded audio in synchronized media, with one exception: a media alternative for text that's clearly labeled as such. Subtitles don't satisfy it, because they drop the non-speech information.

US federal content answers to Section 508. The revised standards incorporate WCAG 2.0 Level A and AA by reference for web and non-web content, and Section508.gov maps the captioning criteria straight onto federal video.

State and local government adds ADA Title II. Under DOJ's 2024 web rule, their web content and mobile apps must meet WCAG 2.1 Level AA, which includes captions. After DOJ's April 2026 extension, compliance falls due on April 26, 2027 for governments serving 50,000 or more people, and April 26, 2028 for smaller entities and special districts.

Broadcast runs on its own rulebook. FCC regulation 47 CFR 79.1 requires distributors to caption 100% of new, nonexempt English- and Spanish-language programming. Carve-outs cover other languages, airings between 2 a.m. and 6 a.m., short promos and PSAs, and programming that's mostly non-vocal music.

So which one do you actually need?

Match the artifact to the barrier your viewer faces. Can't hear the audio? You need captions – speech plus sound cues, ideally closed so they toggle. Can hear but don't speak the language? You need subtitles. Publishing audio only, or serving readers rather than viewers? A transcript is the deliverable, and WCAG treats it as the baseline for prerecorded audio-only content.

Most real projects need more than one. A published interview wants a transcript for readers and quotes, captions for the embedded video, and subtitles if it crosses languages. Because all three descend from the same source text, producing one gets you most of the way to the next – segment and time a transcript, then add subtitles to the video once the caption cues exist.

Tips from people who do this a lot

  • If a deaf or hard-of-hearing viewer might watch, ship captions, not subtitles. Subtitles drop the music, effects, and speaker changes that carry meaning when you can't hear the audio.

  • Default to closed captions over burned-in open captions. Closed captions can be toggled off, restyled, and corrected without re-rendering the video; open captions are permanent.

  • Keep the transcript as your master file. Segment and timestamp it into SRT or VTT for captions, then translate those cues for subtitles – one source, three deliverables.

  • For state or local government video, WCAG 2.1 AA (captions included) is the bar, with April 2027 and April 2028 deadlines. Build captioning in now, not at audit time.

  • Reach for SDH when you need caption-level detail inside a subtitle-format track. Unlike plain subtitles, it carries the speaker labels and sound effects a deaf or hard-of-hearing viewer needs.
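For the SRT-or-VTT step in the third tip, the two formats differ mostly in framing: a WebVTT file opens with a `WEBVTT` header line, and its timestamps put a period rather than a comma before the milliseconds. A rough Python converter, assuming well-formed SRT input, might look like this. It keeps SRT's sequence numbers, which WebVTT accepts as cue identifiers, and it ignores styling and positioning; `srt_to_vtt` is a hypothetical name.

```python
import re

# SRT timing line, e.g. "00:00:02,500 --> 00:00:04,000". SRT puts a comma
# before the milliseconds; WebVTT uses a period.
SRT_TIMING = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2,}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert well-formed SRT text to WebVTT (no styling or positioning)."""
    out = ["WEBVTT", ""]  # WebVTT files must open with this header line
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n")
    for line in text.split("\n"):
        match = SRT_TIMING.match(line)
        if match:
            start, start_ms, end, end_ms = match.groups()
            # Any trailing SRT position hints on the line are dropped.
            line = f"{start}.{start_ms} --> {end}.{end_ms}"
        out.append(line)
    return "\n".join(out).rstrip("\n") + "\n"


srt = "1\n00:00:00,000 --> 00:00:02,500\nWelcome back.\n\n2\n00:00:02,500 --> 00:00:04,000\n[tense music]\n"
print(srt_to_vtt(srt))
```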

Try it now

Drop in your recording or paste a link and get a clean, speaker-labeled transcript in minutes. Your first 60 minutes are free.


60 min free · no card required · we never train on your audio

Trusted by 100,000+ creators, podcasters, journalists & researchers

Captions vs subtitles – questions, answered

What's the difference between captions and subtitles?

Captions include the non-speech audio – speaker changes, sound effects, music – for viewers who can't hear. Subtitles translate only the spoken dialogue for viewers who can hear but don't know the language. Both appear as time-synced text on video; the difference is sound versus translation.

Is a transcript the same as captions?

No. A transcript is a standalone document with no timing, meant to be read or searched on its own. Captions are short, time-synced cues that appear on the video as words are spoken. WCAG requires a transcript for audio-only content and captions for synchronized media.

What's the difference between closed and open captions?

Closed captions can be toggled on and off by the viewer and restyled; open captions are burned into the picture and always visible. The words are usually identical – the difference is delivery. Closed captions are the accessible default; open captions suit clips watched on mute.

What is SDH?

SDH stands for subtitles for the deaf and hard of hearing. Per the DCMP Captioning Key, it's like subtitles but adds sound effects, speaker identification, and other non-speech cues. So it packages caption-style detail into a subtitle-format track for viewers who can't hear the audio.

Does the law require captions or subtitles?

Captions. WCAG 2.1 Success Criterion 1.2.2 requires captions for prerecorded synchronized media at Level A, and Section 508, ADA Title II, and FCC rule 47 CFR 79.1 all mandate captioning rather than subtitling. Subtitles don't qualify because they omit non-speech sound information.

References

  1. Captions/Subtitles – media accessibility guidance. W3C Web Accessibility Initiative (WAI)
  2. Understanding SC 1.2.2: Captions (Prerecorded). W3C (WCAG 2.1)
  3. Understanding SC 1.2.1: Audio-only and Video-only (Prerecorded). W3C (WCAG 2.1)
  4. Applicability & Conformance (Revised 508 Standards incorporate WCAG 2.0 AA). U.S. General Services Administration / Section508.gov
  5. Video and Other Synchronized Media (captioning criteria). U.S. General Services Administration / Section508.gov
  6. Fact Sheet: New Rule on Web Content and Mobile Apps (ADA Title II). U.S. Department of Justice / ADA.gov
  7. 47 CFR § 79.1 – Closed captioning of televised video programming. U.S. Code of Federal Regulations (eCFR) via Cornell LII
  8. Captioning Types, Methods, and Styles (Captioning Key). Described and Captioned Media Program (DCMP)


Get your transcript – free to start

Pay as you go – credits never expire, nothing to cancel. Or start free with 60 minutes, no card.