Guide

Caption accuracy standards

What "accurate captions" actually means under the FCC's rules, WCAG, and Section 508 – how the bar is defined, measured, and why live and prerecorded captions are judged differently.

The short answer

Accurate captions match the spoken words in the order spoken, keep proper names right, and convey non-speech information like speaker identity, music, and sound effects. In the US, the FCC judges captions on four standards – accuracy, synchronicity, completeness, and placement – while WCAG and Section 508 set the accessibility baseline. There's no single legal accuracy percentage; errorless captions are the stated goal.

What 'accurate captions' actually means

Under the FCC's accuracy standard, captions must match the spoken words in the order spoken, without substituting words for proper names and places. They also have to convey nonverbal information: speaker identity, the presence of music, sound effects, and audience reaction (47 CFR 79.1(j)(2)(i)). So "accurate" means more than a clean word-for-word match.

A caption file is more than a transcript with timecodes. The accuracy standard treats a missing speaker label or an unmarked laugh as a defect, because a deaf or hard-of-hearing viewer depends on captions for everything the audio carries. Getting the words right but dropping a [MUSIC] cue still misses the point.

This is why raw speech-to-text output isn't automatically caption-grade. A model can nail the transcript and still miss speaker identity, sound cues, and the on-screen timing captions need. For the mechanics of scoring the words alone, see how word error rate is calculated.

The FCC's four caption accuracy standards

US caption rules rest on four quality standards, listed in the rule in this order: accuracy, synchronicity, completeness, and placement (47 CFR 79.1(j)(2)). The FCC codified them in its 2014 Closed Captioning Quality Report and Order, and they govern captioned TV programming.

Each standard covers a distinct failure. Accuracy is the words plus non-speech information. Synchronicity means captions line up with the audio and hold on screen long enough to read. Completeness means captions run from the start of a program to the end. Placement means captions don't block faces, featured on-screen text, or other important visual content.

These are the caption accuracy standards most US broadcasters and programmers work against. They're standards, not one pass/fail number: the FCC weighs them together and, as the next sections show, applies them differently to live versus prerecorded content.

How caption accuracy gets measured

There's no universal legal percentage for caption accuracy. The DCMP Captioning Key, a widely used quality reference, sets no numeric threshold at all – it states that "errorless captions are the goal for each production." The one figure DCMP hosts, 98% or better, sits on its real-time captioning page and is attributed to captioning companies, not set by DCMP.

That distinction matters. A "99% accuracy" figure often gets repeated as if it were an official caption standard, but it isn't DCMP's. DCMP's own guidance asks for errorless captions and, for live work, cites the 98%-or-better rate vendors typically set. Offline captions are usually scored with word error rate; live captions use a different model, covered below.

Real-world speech-to-text rarely reaches these bars on messy audio. Overlap, accents, background noise, and unfamiliar names all drag accuracy down, which is why an AI first pass plus human cleanup beats either alone. For a grounded look at the numbers, see how accurate AI transcription really is.
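Word error rate, the usual score for offline captions, is conventionally computed as the word-level edit distance between a reference transcript and the caption text, divided by the number of reference words: WER = (S + D + I) / N, counting substitutions, deletions, and insertions. The Python sketch below assumes a human-verified reference transcript is available and normalizes text only by lowercasing and splitting on whitespace; real scoring tools usually also strip punctuation and normalize numbers. The name `word_error_rate` is illustrative, not a library API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: normalization is just lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j caption words (word-level Levenshtein distance).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert every caption word

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, captioning "the quick brown fox" as "the quick brown box" is one substitution in four words, a WER of 0.25. Under this reading a track called "99% accurate" would have a WER of about 0.01, and WER still says nothing about speaker labels, sound cues, or timing.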

Reading speed matters too. The FCC folds it into synchronicity, which requires captions to appear at a speed viewers can read, so captions that flash past too fast fail even when every word is correct. The DCMP Captioning Key's presentation-rate guidance caps captions at 130 words per minute for lower-level material, 140 for middle-level, and 160 for upper-level.
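A rough way to check a caption track against those caps is to divide the words shown by the time they spend on screen. The Python sketch below assumes cues with start and end times in seconds; using total on-screen time as the denominator is our assumption, not DCMP's stated method. `Cue`, `presentation_rate`, and `exceeds_cap` are hypothetical names.

```python
from dataclasses import dataclass

# DCMP Captioning Key presentation-rate caps in words per minute, as cited above.
DCMP_WPM_CAPS = {"lower": 130, "middle": 140, "upper": 160}


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def words_per_minute(cue: Cue) -> float:
    """Rate for a single cue: words shown divided by time on screen."""
    duration = cue.end - cue.start
    if duration <= 0:
        raise ValueError("cue must have a positive duration")
    return len(cue.text.split()) * 60.0 / duration


def presentation_rate(cues: list[Cue]) -> float:
    """Aggregate rate: all caption words over total on-screen time (an assumption)."""
    total_words = sum(len(c.text.split()) for c in cues)
    on_screen = sum(c.end - c.start for c in cues)
    if on_screen <= 0:
        raise ValueError("cues must have a positive total duration")
    return total_words * 60.0 / on_screen


def exceeds_cap(cues: list[Cue], level: str = "upper") -> bool:
    """True if the aggregate rate is above the DCMP cap for the given level."""
    return presentation_rate(cues) > DCMP_WPM_CAPS[level]
```

Two cues, "Welcome back to the show." over 3 seconds and "We have a lot to cover today, so let's go." over 1 second, average 225 wpm, above even the 160 wpm upper-level cap. The per-cue view (100 vs. 600 wpm) points to the second cue as the one to split or hold longer.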

Why are live captions judged by a different standard?

Because you can't proofread speech as it happens. The FCC's rules define live, near-live, and prerecorded programming separately, and apply the quality standards differently to live and near-live content than to prerecorded content (47 CFR 79.1). A stenographer or re-speaker working in real time can't hit an offline editor's word-perfect bar.

So live captions get their own metric. Instead of plain word error rate, live subtitling quality is scored with a severity-weighted model called NER, developed by Pablo Romero-Fresco. It weights errors by how much they distort meaning, not by counting every slip equally (Romero-Fresco & Pöchhacker; Wolk & Korzinek).
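As the NER model is usually described in the live-subtitling literature, accuracy = (N - E - R) / N × 100, where N is the number of words, E the edition errors, and R the recognition errors, each weighted by severity (minor 0.25, standard 0.5, serious 1). The Python sketch below assumes a human reviewer has already classified every error; it only does the arithmetic. `ner_accuracy` and the severity labels are hypothetical names, and the weights should be checked against the original papers before relying on them.

```python
# Severity weights as the NER model is commonly described (assumption: verify
# against Romero-Fresco's published model before using for real assessments).
SEVERITY_WEIGHTS = {"minor": 0.25, "standard": 0.5, "serious": 1.0}


def ner_accuracy(word_count: int,
                 edition_errors: list[str],
                 recognition_errors: list[str]) -> float:
    """NER accuracy in percent: (N - E - R) / N * 100.

    edition_errors and recognition_errors hold one severity label per error,
    already assigned by a human reviewer.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    e = sum(SEVERITY_WEIGHTS[s] for s in edition_errors)
    r = sum(SEVERITY_WEIGHTS[s] for s in recognition_errors)
    return (word_count - e - r) * 100 / word_count
```

With 200 words, two edition errors (minor, standard) and two recognition errors (serious, minor), the weighted score is 99.0, where a flat count of four errors would give 98.0. That difference is the point of weighting by how much each error distorts meaning.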

Regulators pair accuracy with timing. The UK's Ofcom measures live subtitling on three things: average speed, average latency – the delay between speech and subtitle – and the number and severity of errors. The University of Roehampton validates the measurements (Ofcom). Speed and latency are separate metrics, not part of the NER score.
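Latency, in Ofcom's sense, is the delay between speech and the matching subtitle. A minimal Python sketch, assuming each subtitle has already been aligned to the onset of the speech it transcribes (the alignment step is not shown), is the mean of those delays. This is not Ofcom's or Roehampton's actual measurement procedure, and `average_latency` is a hypothetical helper.

```python
from statistics import mean


def average_latency(pairs: list[tuple[float, float]]) -> float:
    """Mean delay in seconds between speech onset and subtitle onset.

    Each pair is (speech_onset, subtitle_onset) for one aligned subtitle.
    """
    if not pairs:
        raise ValueError("need at least one aligned subtitle")
    delays = [subtitle - speech for speech, subtitle in pairs]
    if any(d < 0 for d in delays):
        # A live subtitle appearing before its speech usually means a bad alignment.
        raise ValueError("subtitle appears before its speech; check the alignment")
    return mean(delays)
```

Subtitles arriving 4, 5, and 6 seconds after their speech average a 5-second latency. Speed would be measured separately, which matches the point above that speed and latency sit outside the NER score.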

WCAG, Section 508, and the legal baseline

Outside broadcast TV, caption requirements usually flow through web accessibility law. WCAG 2.1 makes captions for prerecorded video a Level A requirement (SC 1.2.2) – the minimum conformance level – and captions for live video a Level AA requirement (SC 1.2.4).

US federal agencies inherit those rules through Section 508. The revised standards incorporate WCAG 2.0 Level AA by reference and apply it to both web and non-web electronic content. For a government site or a federal contractor, "accessible video" means captions meeting the WCAG AA bar.

For internet-delivered video, the FCC's IP-captioning rule (47 CFR 79.4) grew out of the 2010 Twenty-First Century Communications and Video Accessibility Act, which extended captioning to programming shown online after it aired on TV. That's background, not legal advice: which rules bind you depends on your content and audience.

Whichever rule binds you, the criteria converge. Accurate captions carry every word in order, name the speakers, mark the non-speech sounds, stay in sync, run start to finish, and stay on screen long enough to read. No single percentage makes a caption "compliant." For your own video, start from a clean, time-synced file and check it against the audio – you can export a time-synced SRT caption file, then correct names, sound cues, and timing by hand.

This explainer stops at the standards. For the production procedure – attaching or burning in the file, syncing it, and exporting for a specific platform – see how to add subtitles to a video.

Tips from people who do this a lot

Accuracy isn't only the words. A transcript that nails every sentence but drops speaker labels and sound cues like [MUSIC] or (applause) still fails the FCC accuracy standard.
There's no magic percentage. DCMP asks for "errorless" captions and cites 98% or better only for real-time work – don't treat a "99%" figure as an official caption standard.
Watch reading speed, not just correctness. Captions running past roughly 160 words per minute outpace many viewers even when every word is right (DCMP caps: 130/140/160 wpm).
Grade live and offline captions differently. Live work is scored on severity-weighted errors plus speed and latency, so hold real-time captions to a realistic bar, not a word-perfect one.
On a federal or contractor site, the operative bar is WCAG 2.0 AA via Section 508 – prerecorded captions are the minimum (Level A), live captions are Level AA.

Caption accuracy standards – questions, answered

What are the FCC's caption quality standards?

The FCC sets four caption quality standards: accuracy, synchronicity, completeness, and placement, codified in 47 CFR 79.1 from its 2014 Closed Captioning Quality Report and Order. Together they require captions to match the spoken words and non-speech sounds, stay in sync, run the full program, and sit clear of on-screen text.

Is there a required caption accuracy percentage?

No single legal percentage defines an accurate caption. The DCMP Captioning Key sets no numeric threshold and calls errorless captions the goal. The one figure DCMP hosts, 98% or better, applies to real-time captioning and is the rate captioning companies typically set, not an official standard.

How is caption accuracy measured?

Offline captions are usually scored with word error rate: the number of substituted, deleted, and inserted words relative to a reference transcript of the audio, divided by the number of words in that reference. Live captions use a severity-weighted model called NER, developed by Pablo Romero-Fresco, that weights errors by how much they distort meaning. Regulators like Ofcom add separate speed and latency measurements.

Do WCAG and Section 508 require captions?

Yes. WCAG 2.1 makes captions for prerecorded video a Level A requirement (SC 1.2.2) and live captions a Level AA requirement (SC 1.2.4). US Section 508 incorporates WCAG 2.0 Level AA by reference for federal web and non-web content, so accessible video means captions meeting that AA bar.

Are live captions held to the same standard as prerecorded ones?

No. The FCC defines live, near-live, and prerecorded programming separately and applies its quality standards differently to live and near-live content than to prerecorded content. Live captions can't be proofread in advance, so they're held to a realistic real-time bar; quality frameworks such as the NER model and Ofcom's reviews score them on severity-weighted errors plus speed and latency rather than a word-perfect match.

References

1. 47 CFR 79.1 – Closed captioning of televised video programming (four quality standards; live/near-live/prerecorded definitions) – Legal Information Institute, Cornell Law School
2. Closed Captioning Quality Report and Order, Declaratory Ruling, and FNPRM (CG Docket 05-231) – Federal Communications Commission
3. Captioning Key – Elements of Quality Captioning (Accurate: errorless captions are the goal) – Described and Captioned Media Program (DCMP)
4. Real-Time Captioning (98% or better accuracy, as set by the captioning company) – Described and Captioned Media Program (DCMP)
5. Captioning Key – Presentation Rate (130/140/160 wpm caps) – Described and Captioned Media Program (DCMP)
6. Understanding SC 1.2.2: Captions (Prerecorded) – Level A – W3C Web Accessibility Initiative (WAI)
7. Understanding SC 1.2.4: Captions (Live) – Level AA – W3C Web Accessibility Initiative (WAI)
8. Applicability & Conformance – Revised 508 Standards incorporate WCAG 2.0 Level AA – U.S. General Services Administration (Section508.gov)
9. 47 CFR 79.4 – Closed captioning of video programming delivered using Internet protocol (CVAA) – Legal Information Institute, Cornell Law School
10. Romero-Fresco & Pöchhacker, Quality assessment in interlingual live subtitling: The NTR Model – Linguistica Antverpiensia, New Series – Themes in Translation Studies (University of Antwerp)
11. Wolk & Korzinek, Comparison and Adaptation of Automatic Evaluation Metrics for Quality Assessment of Re-Speaking – arXiv
12. The quality of live subtitling (average speed, latency, and error measurement; Roehampton validation) – Ofcom