Subtitles or captions – which do you actually need?
Subtitles and captions use the same file formats but carry different content. Subtitles render spoken dialogue as text, often for viewers who can hear but don't share the language. Captions add speaker labels and non-speech sound – [music], [applause], a door slamming – for deaf and hard-of-hearing viewers. If accessibility is the goal, you want captions, not bare subtitles.
That distinction has legal weight. WCAG 2.1 Success Criterion 1.2.2 requires captions for all prerecorded audio content in synchronized media at Level A, the baseline conformance level. In the US, the Revised Section 508 Standards, which cover federal agencies' information and communication technology, incorporate WCAG 2.0 Level A and AA, and that includes this same criterion; Section508.gov explains how it applies to federal video. So for an institutional or public-facing video, "add subtitles" really means "add compliant captions."
Even outside compliance, subtitles widen who can watch. People view muted in offices and on trains, and non-native speakers read along to keep up. Text on screen is the cheapest reach you'll buy. Decide up front which you're making, because it changes what goes in each cue.
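To make the difference concrete, here is a minimal Python sketch that renders the same moment as a subtitle cue and as a caption cue. The `Utterance` structure and the `Name:` speaker-label style are assumptions for illustration; caption style guides differ on how they mark speakers and sounds.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One stretch of transcript (hypothetical structure for this example)."""
    speaker: str
    text: str
    sounds: list[str] = field(default_factory=list)  # non-speech audio, e.g. "door slams"


def subtitle_text(u: Utterance) -> str:
    # Subtitles carry the dialogue only.
    return u.text


def caption_text(u: Utterance) -> str:
    # Captions add bracketed sound descriptions and a speaker label.
    lines = [f"[{sound}]" for sound in u.sounds]
    lines.append(f"{u.speaker}: {u.text}")
    return "\n".join(lines)
```

For `Utterance("Dana", "Who's there?", ["door slams"])`, the subtitle cue is just the question, while the caption cue reads `[door slams]` above `Dana: Who's there?`.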
Why auto-captions won't cut it on their own
Auto-generated captions save typing but miss the accuracy bar. A university accessibility office states it plainly: automated captions aren't sufficient for public content or accommodation requests without human editing. So the machine draft needs correction before you ship it.
Typing from scratch, though, is brutal. Manual transcription can take up to six hours for a single hour of audio – most of a working day for one video. The workable path splits the job: an AI first pass handles the bulk, then you edit. You're correcting, not re-transcribing.
Spend that editing time where ASR fails. Proper nouns, acronyms, numbers said quickly, and punctuation that changes meaning are the usual suspects, and they're exactly what a viewer notices on screen. For a recorded talk or spoken-word footage, the same first-pass-then-fix workflow applies, and our lecture transcription guide covers the spoken-word specifics.
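One rough way to aim that editing time is to flag cues containing the patterns ASR tends to miss. This Python sketch is a hypothetical heuristic (the `review_flags` name and its rules are illustrative, not any tool's): it flags digits, all-caps acronyms, and capitalized words that don't follow sentence-ending punctuation. It over-flags (a mid-sentence "I'm", for instance) and can't catch meaning-changing punctuation, so treat its output as a reading list rather than a verdict.

```python
from __future__ import annotations

import re

SENTENCE_END = (".", "!", "?")


def review_flags(text: str) -> list[str]:
    """Return reasons a cue may deserve a human look (heuristic sketch)."""
    flags = []
    if re.search(r"\d", text):
        flags.append("number")
    if re.search(r"\b[A-Z]{2,}\b", text):
        flags.append("acronym")
    # A capitalized, not-all-caps word that doesn't open a sentence
    # is often a proper noun, which ASR frequently misspells.
    words = text.split()
    for prev, word in zip(words, words[1:]):
        if word[:1].isupper() and not word.isupper() and not prev.endswith(SENTENCE_END):
            flags.append("proper noun?")
            break
    return flags
```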
SRT, WebVTT, or burned into the picture?
Subtitles reach the screen two ways. A sidecar file rides alongside the video and the player toggles it on; burned-in (open) subtitles are painted into the pixels and can't be switched off. WebVTT is the web-native standard here: it's a published W3C specification and the format the HTML5 <track> element reads.
SRT (SubRip) is a plain-text timed format that players, editors, and social platforms widely accept. Use an audio-to-SRT export when you need a caption file to upload with your video. For a <track> on your own site, WebVTT is the choice, from an audio-to-VTT export. Both are sidecar files; keep one as your editable master.
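Because the two formats are so close, converting an SRT master to WebVTT is mostly mechanical: WebVTT needs a `WEBVTT` header line, and its timestamps put a period before the milliseconds where SRT uses a comma. A minimal Python sketch, assuming plain cue text with no SRT positioning or `<font>` styling:

```python
import re


def srt_to_vtt(srt: str) -> str:
    """Convert SRT text to WebVTT (minimal sketch: no styling or positioning)."""
    out = ["WEBVTT", ""]  # WebVTT files start with this header, then a blank line
    for line in srt.lstrip("\ufeff").replace("\r\n", "\n").split("\n"):
        if "-->" in line:
            # SRT: 00:00:01,000  ->  WebVTT: 00:00:01.000
            line = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", line)
        out.append(line)
    return "\n".join(out)
```

SRT's numeric cue counters are valid WebVTT cue identifiers, so the sketch leaves them in place.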
Burn-in is the fallback. For feeds that autoplay muted or ignore uploaded caption files, rendering the subtitles into the frame guarantees they show. The trade-off: you lose the viewer's toggle, and fixing one typo means re-rendering the whole clip. So keep the sidecar file as the source of truth and burn a copy only when a platform forces it.
Format subtitles people can actually read
Readable subtitles hold to a reading speed and a line budget. The BBC Subtitle Guidelines recommend 160–180 words per minute, a 37-character line-length limit, and a maximum of two lines per cue for landscape or square video. Push past that and viewers can't finish a line before it changes on them.
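Those limits are easy to check mechanically. A minimal Python sketch that reports cues breaking the BBC figures quoted above; the `Cue` structure is an assumption, and words per minute is computed per cue, which is noisy for very short cues, so read the output as prompts to review rather than hard failures.

```python
from dataclasses import dataclass

MAX_WPM = 180   # upper end of the BBC's 160-180 wpm recommendation
MAX_CHARS = 37  # BBC line-length limit
MAX_LINES = 2   # BBC maximum lines per cue (landscape or square video)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # lines separated by "\n"


def check_cue(cue: Cue) -> list:
    """Return human-readable problems with one cue (empty list if none)."""
    problems = []
    lines = cue.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")
    for line in lines:
        if len(line) > MAX_CHARS:
            problems.append(f"line of {len(line)} chars (max {MAX_CHARS})")
    duration = cue.end - cue.start
    if duration <= 0:
        problems.append("non-positive duration")
    else:
        wpm = len(cue.text.split()) / duration * 60
        if wpm > MAX_WPM:
            problems.append(f"{wpm:.0f} wpm (max {MAX_WPM})")
    return problems
```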
Break lines at natural clause boundaries, never mid-phrase. Don't split a name across cues or strand a preposition from its noun. Let each cue sit long enough to read – even a short line wants roughly a second on screen. Cramming three lines to fit a long sentence just loses people.
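A line-breaking rule can be sketched the same way. This Python function is a hypothetical helper, not a standard algorithm: among the break points that keep both lines within 37 characters, it prefers one after clause punctuation, avoids leaving a small function word at the end of the top line, and otherwise balances the two lines. Punctuation is only a proxy for a clause boundary, so check its choices by eye.

```python
from __future__ import annotations

MAX_CHARS = 37  # BBC line-length limit
# Illustrative list of words that bind to what follows and read badly at a line end.
WEAK_ENDINGS = {"a", "an", "the", "of", "to", "in", "on", "at", "for", "with", "and", "or"}
CLAUSE_PUNCT = (",", ".", ";", ":", "?", "!")


def split_two_lines(text: str, max_chars: int = MAX_CHARS) -> list[str] | None:
    """Fit text into one or two lines of at most max_chars; None if impossible."""
    if len(text) <= max_chars:
        return [text]
    words = text.split()
    best, best_score = None, None
    for i in range(1, len(words)):
        top, bottom = " ".join(words[:i]), " ".join(words[i:])
        if len(top) > max_chars or len(bottom) > max_chars:
            continue
        last = words[i - 1]
        score = abs(len(top) - len(bottom))  # smaller = more balanced
        if last.endswith(CLAUSE_PUNCT):
            score -= 100  # strongly prefer breaking at a clause boundary
        if last.lower() in WEAK_ENDINGS:
            score += 100  # avoid stranding "the", "of", ...
        if best_score is None or score < best_score:
            best, best_score = [top, bottom], score
    return best
```

A `None` result means the text needs a second cue rather than a third line.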
Sync matters as much as wording. Subtitles should appear as the words are spoken and clear soon after. An off-by-a-second cue is more distracting than a small wording slip, so spot-check the timing against the audio using your transcript's timestamps before you export.
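When the spot-check shows every cue early or late by the same amount, a uniform shift fixes it. A minimal Python sketch that moves every SRT timestamp by `offset_ms` milliseconds, clamping at zero; the function name is hypothetical, and a constant shift won't repair drift that grows over the length of the video, which needs re-timing instead.

```python
from __future__ import annotations

import re

# An SRT timestamp: hours:minutes:seconds,milliseconds
STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt: str, offset_ms: int) -> str:
    """Move every cue's start and end by offset_ms (negative = earlier)."""
    def shift(match: re.Match) -> str:
        h, m, s, ms = (int(g) for g in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        total = max(0, total)  # a cue can't start before the video does
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    out = []
    for line in srt.split("\n"):
        if "-->" in line:  # only timing lines carry timestamps
            line = STAMP.sub(shift, line)
        out.append(line)
    return "\n".join(out)
```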
How to add the subtitles to your video
Attaching a sidecar track is the clean route. For your own site, reference the .vtt file in a <track> element inside <video>; as long as the video has the controls attribute, the browser draws the captions toggle. For YouTube, Vimeo, or social, upload the SRT in the caption settings. There's no re-encoding, and viewers keep control of whether text shows.
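For the sidecar route on your own site, the markup is short. It's sketched here as a Python helper that emits the HTML; the helper and the file names are hypothetical. `kind="captions"` marks the track as captions rather than plain subtitles, `default` turns it on unless the viewer switches it off, and `controls` is what makes the built-in toggle appear.

```python
from html import escape


def video_with_captions(video_src: str, vtt_src: str,
                        lang: str = "en", label: str = "English") -> str:
    """Return <video> markup with a WebVTT captions track (hypothetical helper)."""
    # escape() also quotes '"' so values are safe inside attributes.
    return (
        f'<video controls src="{escape(video_src)}">\n'
        f'  <track kind="captions" src="{escape(vtt_src)}" '
        f'srclang="{escape(lang)}" label="{escape(label)}" default>\n'
        "</video>"
    )
```

If the .vtt file is served from a different origin than the page, browsers generally require CORS headers on it and a `crossorigin` attribute on the `<video>`.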
Burn in when you must. An editor (or ffmpeg) renders the subtitle file into the video for platforms that won't take a sidecar. Style for legibility: high-contrast text, a slight shadow or backing box, positioned bottom-center and clear of any lower-third graphics.
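For burn-in with ffmpeg, the `subtitles` filter (available when ffmpeg is built with libass) renders a subtitle file into the frame, and its `force_style` option accepts ASS style fields. A Python sketch that builds the command as an argument list; the style values are assumptions to tune, not recommendations, and the sketch assumes a subtitle filename without colons or quotes, which the filter syntax would otherwise need escaped.

```python
from __future__ import annotations


def burn_in_command(video: str, subs: str, output: str) -> list[str]:
    """Build an ffmpeg command that renders subs into the picture (sketch)."""
    # ASS style fields; values are starting points to tune.
    style = ",".join([
        "FontName=Arial",
        "FontSize=24",
        "Outline=1",    # thin outline for contrast
        "Shadow=1",     # slight drop shadow
        "Alignment=2",  # ASS numpad layout: 2 = bottom center
        "MarginV=40",   # lift text clear of lower-third graphics
    ])
    return [
        "ffmpeg", "-i", video,
        "-vf", f"subtitles={subs}:force_style='{style}'",
        "-c:a", "copy",  # audio passes through; video must be re-encoded
        output,
    ]
```

Run it with `subprocess.run(cmd, check=True)`. The video stream is re-encoded every time, which is why a single typo fix means a new render.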
Either way, do the transcript first. You can get a timed, editable transcript in minutes, correct it, then export SRT or VTT, rather than typing and timing every cue by hand. The transcript is the real work; the subtitle file is just its export.
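Exporting is the easy part once the transcript is correct. A minimal Python sketch that writes SRT from timed segments; the `(start_seconds, end_seconds, text)` tuple shape is an assumption, since each transcription tool has its own export format.

```python
from __future__ import annotations


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Number each (start, end, text) segment and emit SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text.strip()}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues
```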