SRT vs VTT: what's the difference?

A plain comparison for anyone exporting captions: what differs between the two formats, and which one your player needs.

The short answer

SRT and WebVTT both pair caption text with timecodes, but they differ in a few ways. SRT separates milliseconds with a comma (00:00:01,000); WebVTT uses a full stop (00:00:01.000). WebVTT files open with a WEBVTT header and support on-screen positioning and CSS styling. The HTML5 <track> element reads WebVTT, not SRT. Use SRT for broad uploads, WebVTT for web players.

SRT and WebVTT both carry timed captions

Both formats do the same core job: pair lines of text with start and end timecodes so a player shows the right caption at the right moment. The difference is lineage. SubRip's SRT is the older, plainer format – Matroska's technical reference calls it perhaps the most basic of all subtitle formats. WebVTT is the web-native one, published by the W3C Timed Text Working Group on the Recommendation track. No standards body governs SRT; it grew as a de-facto convention, which is why its rules stay loose.

You can tell them apart at a glance. A WebVTT file must open with the literal string WEBVTT on its first line, after an optional byte-order mark. An SRT file has no header at all: it goes straight into the first cue, which is numbered with a sequential index. SRT expects that index above every block; WebVTT makes the cue identifier optional.
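As a rough illustration of that first-line check, here is a minimal Python sketch. The function name detect_caption_format is made up for this example, and the SRT branch is only a heuristic (it assumes the first non-blank line is a numeric index), since SRT has no formal grammar to test against.

```python
def detect_caption_format(text: str) -> str:
    """Guess 'vtt', 'srt', or 'unknown' from the start of a caption file."""
    # WebVTT allows an optional byte-order mark before the header.
    body = text.lstrip("\ufeff")
    first_line = body.split("\n", 1)[0].rstrip("\r")
    # The header is the literal string WEBVTT, optionally followed by
    # a space or tab and more text on the same line.
    if first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t")):
        return "vtt"
    # Heuristic: SRT has no header, so the first non-blank line
    # should be the numeric index of the first cue.
    for line in body.splitlines():
        if line.strip():
            return "srt" if line.strip().isdigit() else "unknown"
    return "unknown"
```

Calling it on the text of a file that starts with WEBVTT returns "vtt"; a file that starts with a bare 1 returns "srt".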

SRT vs VTT: the comma-versus-full-stop timecode divide

The most cited difference is a single punctuation mark. SRT writes its milliseconds after a comma – the Matroska reference shows a cue running 00:02:17,440 --> 00:02:20,375. WebVTT writes the same field after a full stop, as in 00:02:17.440. The WebVTT grammar is explicit: a timestamp ends with a full stop followed by three ASCII digits for the thousandths of a second. Both formats write hours, minutes, and seconds followed by a three-digit fraction (WebVTT also lets you omit the hours); only the separator changes.
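To make the separator difference concrete, the sketch below parses either style into milliseconds and reports which format the separator implies. The names TIMESTAMP and parse_timestamp are illustrative, and the pattern only handles the full hours:minutes:seconds form, not WebVTT's shorter form without hours.

```python
import re

# Hours, minutes, seconds, then a comma (SRT) or full stop (WebVTT)
# and exactly three digits for the thousandths of a second.
TIMESTAMP = re.compile(r"^(\d{2,}):([0-5]\d):([0-5]\d)([,.])(\d{3})$")

def parse_timestamp(stamp: str) -> tuple[int, str]:
    """Return (milliseconds, 'srt' or 'vtt') for a single timestamp."""
    match = TIMESTAMP.match(stamp.strip())
    if match is None:
        raise ValueError(f"not a recognised timestamp: {stamp!r}")
    hours, minutes, seconds, separator, millis = match.groups()
    total_ms = (
        (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    ) * 1000 + int(millis)
    return total_ms, "srt" if separator == "," else "vtt"

print(parse_timestamp("00:02:17,440"))  # (137440, 'srt')
print(parse_timestamp("00:02:17.440"))  # (137440, 'vtt')
```

Both calls land on the same instant, 137,440 ms; the only thing that differs is the format the separator points to.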

That one character is why you can't turn one file into the other with a rename. A valid WebVTT file also needs its WEBVTT header, which an SRT file never carries. A converter has to add that header and swap the comma in every timecode for a full stop, leaving commas in the caption text alone. If you're building a caption file by hand, start from the plainer SRT structure and adjust from there.
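Those two conversion steps can be sketched in a few lines of Python. This is a minimal illustration, not a full converter: srt_to_vtt and TIMING are names invented here, the SRT index lines are kept as WebVTT cue identifiers (which WebVTT permits), and any non-standard markup inside the caption text is passed through untouched.

```python
import re

# Matches an SRT timing line such as "00:02:17,440 --> 00:02:20,375".
TIMING = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3})(\s*-->\s*)(\d{2,}:\d{2}:\d{2}),(\d{3})(.*)$"
)

def srt_to_vtt(srt_text: str) -> str:
    """Add the WEBVTT header and fix the separator on timing lines only."""
    lines = srt_text.lstrip("\ufeff").splitlines()
    out = ["WEBVTT", ""]  # header line, then a blank line before the cues
    for line in lines:
        match = TIMING.match(line)
        if match:
            # Only timing lines are rewritten, so commas in caption
            # text ("Hello, world") are left alone.
            line = f"{match[1]}.{match[2]}{match[3]}{match[4]}.{match[5]}{match[6]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Restricting the swap to lines that match the timing pattern is the design choice that matters: a blanket replace of every comma would also rewrite punctuation in the captions themselves.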

Styling and positioning set the two apart

WebVTT carries formatting that SRT has no mechanism for. The spec defines six cue settings – vertical, line, position, size, align, and region – that place a caption anywhere in the frame. SRT has none of these. Matroska notes there are no general settings for SRT at all, so the container's settings field for an SRT track is left blank.

Styling goes further in WebVTT. CSS style sheets can target cues through the ::cue pseudo-element, and a file can hold STYLE blocks for reusable rules and REGION definitions for scrollable caption areas. SRT expresses none of that in the format. Its plainness is the point: if you need a caption parked in a specific corner or styled to match a brand, that lives in WebVTT.
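As a sketch of what cue settings look like in a file, the helper below appends name:value pairs to a cue's timing line. The function name format_cue is hypothetical, the values in the usage example are illustrative, and the helper checks only the setting names, not whether each value is valid for its setting.

```python
# The six cue setting names defined by WebVTT.
CUE_SETTINGS = {"vertical", "line", "position", "size", "align", "region"}

def format_cue(start: str, end: str, text: str, **settings: str) -> str:
    """Build one WebVTT cue, with optional settings after the timings."""
    unknown = set(settings) - CUE_SETTINGS
    if unknown:
        raise ValueError(f"unknown cue settings: {sorted(unknown)}")
    timing = f"{start} --> {end}"
    # Settings follow the end timestamp as space-separated name:value pairs.
    for name, value in settings.items():
        timing += f" {name}:{value}"
    return f"{timing}\n{text}\n"

print(format_cue(
    "00:00:01.000", "00:00:04.000", "Top-right caption",
    line="0", position="100%", align="end",
))
```

The printed cue starts with 00:00:01.000 --> 00:00:04.000 line:0 position:100% align:end, followed by the caption text on the next line. An SRT timing line has nowhere to put those extra fields.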

Which format does your destination expect?

Your player decides the format more than your preference does. The HTML5 <track> element is defined to read WebVTT: MDN states the tracks are formatted in WebVTT format (.vtt files), and the WHATWG HTML standard ties the track element's URL to a WebVTT resource. SRT is not a defined <track> format.

WebVTT also carries more than captions. Through the <track> kind attribute, it can hold chapters, metadata, and descriptions. The WebVTT spec lists captions, subtitles, chapters, audio descriptions, and metadata as data kinds a file can carry. SRT defines no such mechanism. If your target is a web page, export a WebVTT file and point a <track> at it.
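For the web-page case, the markup itself is short. The sketch below builds a <track> tag as a string; track_tag is a name invented for this example, and the file name, language, and label are placeholders. The five kind values are the ones the HTML standard defines for the kind attribute.

```python
from html import escape

# The kind values the HTML standard defines for <track>.
TRACK_KINDS = {"subtitles", "captions", "descriptions", "chapters", "metadata"}

def track_tag(src: str, kind: str = "captions", srclang: str = "en",
              label: str = "English", default: bool = False) -> str:
    """Return a <track> element pointing at a WebVTT file."""
    if kind not in TRACK_KINDS:
        raise ValueError(f"unsupported kind: {kind!r}")
    attrs = [
        f'kind="{kind}"',
        f'src="{escape(src, quote=True)}"',
        f'srclang="{escape(srclang, quote=True)}"',
        f'label="{escape(label, quote=True)}"',
    ]
    if default:
        attrs.append("default")  # boolean attribute: present means on
    return "<track " + " ".join(attrs) + ">"

print(track_tag("captions.vtt"))
# <track kind="captions" src="captions.vtt" srclang="en" label="English">
```

The resulting tag goes inside the page's <video> element; switching kind to "chapters" or "metadata" is how the same file format serves the non-caption uses described above.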

SRT's reach comes from its plainness. As the most basic subtitle format, it's the lowest-common-denominator file that most desktop players and upload fields accept. That makes it a sensible default when you don't know what the far end supports. Exporting from a recording for those uploads? Generate an SRT instead.

Getting your captions onto the video

With the format settled, the next step is producing the file. Upload your recording and export it in the format you picked. From there, the file has to reach your player, and how you deliver it depends on where it's going.

On a web page, you host the .vtt and point an HTML5 <track> at it. A desktop player is different: side-load the .srt next to the video so the player finds it, or embed it in the file. Social platforms take the caption file in an upload field on the post. Getting the captions to display is its own step, separate from choosing the format, but the format you exported determines which of these routes applies.

Tips from people who do this a lot

The comma-versus-full-stop separator is the one edit that trips up most manual SRT-to-VTT conversions: change 00:00:01,000 to 00:00:01.000 before anything else.
Renaming a .srt to .vtt won't produce a valid file. WebVTT requires the WEBVTT header line on top, which SRT never has.
Need a caption parked in a specific corner or styled with CSS? That only exists in WebVTT's cue settings and ::cue rules – SRT has no positioning or styling in the format.
When you don't know what the far end supports, default to SRT. It's the most basic subtitle format and the most widely accepted plain-text file.
WebVTT can also carry chapters, metadata, and descriptions, not just captions – the <track> kind attribute selects which. SRT has no equivalent.

SRT vs VTT: questions, answered

What's the main difference between SRT and VTT?

The clearest difference is the timecode separator. SRT writes milliseconds after a comma (00:00:01,000); WebVTT writes them after a full stop (00:00:01.000). WebVTT also opens with a WEBVTT header, supports on-screen positioning and CSS styling, and is the format the HTML5 <track> element reads. SRT carries none of that formatting.

Can I just rename a .srt file to .vtt?

No. A valid WebVTT file must begin with the literal WEBVTT header line, which an SRT file never has. You also have to change every timecode's comma to a full stop. Renaming alone leaves a file that isn't valid WebVTT, so a proper conversion is the safe route.

Which format does an HTML5 video player use?

The HTML5 <track> element is defined to read WebVTT (.vtt) files, so a caption track embedded in a web page uses WebVTT. SRT is not a defined <track> format. For social uploads and desktop players, SRT is the more broadly accepted plain-text default.

Does SRT support text positioning or styling?

No. SRT has no positioning or styling mechanism in the format itself; its container settings are left blank. WebVTT defines six cue settings (vertical, line, position, size, align, region) plus CSS ::cue styling and STYLE and REGION blocks. For captions placed or styled precisely, use WebVTT.

Is SRT or VTT better for captions?

Neither is better; they suit different destinations. Use SRT for the widest, plainest compatibility on social uploads and desktop players. Use WebVTT when the file feeds a web page's HTML5 <track>, or when you need positioning, styling, or non-caption data like chapters and metadata.

References

1. WebVTT 1.0 – file structure, timestamp grammar, cue settings, and styling – W3C (World Wide Web Consortium)
2. Matroska technical reference – SRT Subtitles (structure, timecode example, settings) – Matroska.org
3. The <track> element – WebVTT format and the kind attribute – MDN Web Docs (Mozilla)
4. HTML Living Standard – media (the track element's WebVTT resource) – WHATWG
