Start with a clean feed from the soundboard
The caption is only as good as the audio behind it. A microphone at the back of a sanctuary picks up reverb, the HVAC, and the whole congregation, and speech recognition gets worse on that far-field sound. One study reports a 75% relative rise in word error rate when a close mic is swapped for a far-field array mic (Purushothaman et al., 2021).
Take the feed straight off the soundboard instead. Most church mixing desks have a record output or a spare aux send, and that signal carries the pulpit mic and the lavaliers without the room around them. Send that into your transcription tool and the captions come back cleaner, because the model isn't fighting echo and crowd noise.
The capture details are the same ones that make any sermon recording usable: which board output to use, mono versus stereo, matching your levels. Rather than repeat them here, we cover soundboard capture in depth in how to transcribe a sermon.
Add captions to a church service, live or recorded
Two different jobs hide inside 'captioning a service,' and the accessibility standards treat them differently. Captions on a livestream are live captions, which WCAG 2.1 lists as a Level AA success criterion (W3C, SC 1.2.4). Captions on the recording you post afterward are prerecorded captions, a Level A criterion (W3C, SC 1.2.2). Level A is the baseline tier, so prerecorded captions are expected even at minimum conformance, while live captions only come in at AA.
Live captioning runs in real time, so it leans on either automatic speech recognition or a human captioner typing along with the service. It keeps up, but it's the harder path. There's no second pass, and every misheard name goes out uncorrected. For most teams the live track is good enough to follow, not clean enough to publish.
Captioning the recording afterward is where the accuracy comes from. You transcribe the finished audio, read it against the video, fix what the model missed, then attach a corrected caption file. Plenty of churches do both: a rough live caption during the stream, and a clean track on the archived sermon.
Turn the recording into an SRT or VTT file
A caption file is a plain-text list of lines, each with a start and end timestamp, and two formats cover almost every platform: SRT and WebVTT. WebVTT is a published W3C specification (W3C, WebVTT), and it's the format the HTML `<track>` element reads to show captions on a web video (MDN, <track>).
To make one, run the service audio through a transcription tool that timestamps every line and exports a caption file directly. Church-service transcription gives you a timestamped, speaker-labeled transcript you can export as SRT or VTT. For the church website, export a .vtt file and point a `<track>` tag at it; for a livestream or video platform, upload the .srt version, which is the format most of them expect.
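If your tool hands back timestamped segments instead of a finished file, both formats are simple enough to write yourself. Here is a minimal Python sketch, assuming each segment has a start time, an end time in seconds, and its text. The names `Cue`, `to_srt`, and `to_vtt` are made up for illustration and don't come from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the recording
    end: float
    text: str


def _timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    # SRT: numbered cues, with a comma before the milliseconds.
    blocks = [
        f"{i}\n{_timestamp(c.start, ',')} --> {_timestamp(c.end, ',')}\n{c.text}"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


def to_vtt(cues: list[Cue]) -> str:
    # WebVTT: a "WEBVTT" header line, with a period before the milliseconds.
    blocks = [
        f"{_timestamp(c.start, '.')} --> {_timestamp(c.end, '.')}\n{c.text}"
        for c in cues
    ]
    return "WEBVTT\n\n" + "\n\n".join(blocks) + "\n"
```

The two outputs differ only in the header, the cue numbers, and the millisecond separator, which is why most tools can export either one from the same transcript.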
The parts that aren't specific to a church are the same for any video: burning captions into the picture versus a separate track, splitting long lines, choosing between SRT and WebVTT. We walk through those in how to add subtitles to a video.
Fix the names and Scripture before you publish
Automatic captions get the sermon mostly right and the proper nouns wrong. End-to-end speech recognition keeps a measurable error rate on entity names that show up rarely in its training data (Pusateri et al., 2024). That's exactly your hard part: Scripture references, book and chapter numbers, hymn titles, and the names of people in the congregation.
So budget a short correction pass on those spots. Read the caption file against the audio and fix the names, the 'Ephesians' the model heard as three separate words, the chapter-and-verse numbers, and any vocabulary specific to your tradition. The rest of the transcript usually needs only a light touch.
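Once you've seen the same mishearing a few times, part of that pass can be scripted. The Python sketch below is one possible approach, not a feature of any transcription tool: it swaps known wrong phrases for the right ones. The glossary entries are invented examples, so build your own from the errors you actually find, and still read the result against the audio.

```python
import re

# Hypothetical glossary: misheard phrase -> correct spelling.
# Replace these entries with errors from your own captions.
CORRECTIONS = {
    "a fee shuns": "Ephesians",
    "habit cook": "Habakkuk",
}


def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace each known mishearing, ignoring case, on word boundaries."""
    for wrong, right in corrections.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement inserts `right` literally.
        text = re.sub(pattern, lambda _m, r=right: r, text, flags=re.IGNORECASE)
    return text
```

For example, `apply_corrections("Turn to a fee shuns chapter two", CORRECTIONS)` returns "Turn to Ephesians chapter two". A list like this only catches errors you've already met, so it shortens the manual pass without replacing it.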
While you're editing, keep the lines readable. Broadcast subtitle norms keep to about two lines on screen at once and pace reading around 160 to 180 words per minute (BBC Subtitle Guidelines). If a caption flashes past faster than a person can read it, split it into two.
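The reading-rate check is simple arithmetic: words in the caption divided by the seconds it stays on screen, times 60. A small Python sketch, assuming you have each caption's text and its start and end times in seconds (the function names are illustrative):

```python
def words_per_minute(text: str, start: float, end: float) -> float:
    """Reading rate a caption demands, given its start and end in seconds."""
    duration = end - start
    if duration <= 0:
        raise ValueError("caption must end after it starts")
    return len(text.split()) / duration * 60


def too_fast(text: str, start: float, end: float, limit: float = 180.0) -> bool:
    # limit defaults to the top of the 160 to 180 wpm range quoted above
    return words_per_minute(text, start, end) > limit
```

A seven-word caption shown for two seconds works out to 210 words per minute, so `too_fast` flags it and it's a candidate for splitting.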
What the ADA actually requires of a church
Start with the honest legal picture, then decide on the merits. The ADA's Title III, the part that covers public accommodations, does not apply to religious entities, including places of worship (28 CFR 36.102(e)). The Department of Justice reads that exemption broadly, covering a religious entity's activities whether they're religious or secular (ADA.gov).
That's the general rule, not a reason to skip accessibility, and it isn't legal advice. The same church can still fall under Title I employment obligations once it has enough staff. Local law, grant conditions, and your own denomination's standards may point the other way, so check with counsel before you rely on the exemption.
The stronger reason to caption isn't the statute anyway. Over 5% of the world's population, about 430 million people, live with disabling hearing loss (WHO, 2026). Captions let those members follow the sermon, and they make your archived services searchable for everyone. WCAG's caption criteria are a voluntary standard you can choose to meet even where no law demands it.