Compatibility:
WebSite X5 Evo and Pro
Multimedia content such as video and audio plays a central role in the web experience. To be truly accessible to everyone, however, it must be accompanied by subtitles, transcripts, and audio descriptions. These tools are essential for users with hearing or visual impairments, but they also benefit many others, such as people browsing in noisy environments, people who cannot turn the sound on, or those who simply prefer reading to listening.
Beyond improving website inclusivity, implementing these measures can also provide benefits in terms of SEO. Transcripts, for example, offer textual content that search engines can index, enhancing page ranking.
This guide explores the key tools and techniques for making your multimedia content more accessible, covering the following points:
- WCAG guidelines for multimedia content
- Creating synchronized subtitles for pre-recorded videos
- Using transcripts to provide complete textual alternatives
- Using audio descriptions to interpret essential visual elements
- How to create subtitles, transcripts, and audio descriptions
- Managing video accessibility in WebSite X5
Implementing these solutions is not only a way to comply with accessibility regulations but also a choice that improves the user experience and increases the value of your website for all visitors.
WCAG guidelines for multimedia content
The Web Content Accessibility Guidelines (WCAG), the international standard for web accessibility, include specific recommendations to make multimedia content accessible to all users, including people with sensory disabilities. The main guideline dedicated to multimedia content is 1.2: Time-based Media, which requires providing an equivalent alternative for all content where time is a key element for understanding.
What is meant by time-based media?
Time-based media includes multimedia content that relies on a timeline to convey information. These can be divided into three main categories:
- Audio-only – Files that contain only audio tracks, such as podcasts or narrations.
- Video-only – Visual content without audio, such as animations or silent slides.
- Audio-Video (Multimedia) – Content that combines both audio and video, such as webinars or tutorials.
The main success criteria of Guideline 1.2
For each type of media, Guideline 1.2 specifies success criteria that help ensure accessibility. Here are the most important ones:
- Criterion 1.2.1 - Alternatives for Audio-only or Video-only Media (Level A)
Requires that for media content that is audio-only (e.g., podcasts) or video-only (e.g., silent animations), text-based alternatives are provided that describe the content in detail. For audio content, this could be a transcript of everything spoken or narrated; for video content, it could be a detailed description of the scenes shown.
- Criterion 1.2.2 - Captions for Pre-recorded Media (Level A)
Applies to videos with a pre-recorded audio track, such as interviews or tutorials, and requires synchronized captions. These captions must include not only dialogue but also indications of tone of voice and meaningful sounds in the context (e.g., [door closing]).
- Criterion 1.2.3 - Audio Descriptions or Transcripts for Pre-recorded Media (Level A)
Requires an audio description or transcript for videos that contain critical visual elements that are not explained in the audio. This is particularly relevant when images convey information that is not directly described in dialogue or narration.
- Criterion 1.2.5 - Audio Description for Complex Content (Level AA)
Designed for complex video content where visual information is essential for understanding the message. It requires an audio description: a separate narration track that describes what is happening on screen, inserted between lines of dialogue or during silent moments.
These success criteria help ensure that no user is excluded from the multimedia experience by providing equivalent alternatives such as captions, transcripts, and audio descriptions. Let’s take a closer look at what these elements involve.
Creating synchronized subtitles for pre-recorded videos
Subtitles are one of the most effective tools for ensuring the accessibility of multimedia content for those who cannot hear or understand spoken language due to hearing impairments, cognitive difficulties, or environmental conditions (e.g., excessive background noise).
As we have seen, subtitles are essential for all pre-recorded videos that include an audio track, such as interviews, tutorials, or presentations. They are a synchronized transcription of the dialogue and relevant sounds, designed to be displayed directly on the video. In addition to faithfully transcribing the spoken content, subtitles should include (see the example after this list):
- relevant sounds for the context (example: [applause], [laughter]);
- identification of the speaker when it is not obvious (example: John: "Let’s go!").
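On a web page, synchronized subtitles can either be burned into the video itself or kept in a separate caption file (for example in the WebVTT format) that the player loads alongside the video. As a minimal, purely illustrative sketch, assuming a hypothetical video element with the id "promo-video" and a hypothetical caption file named "promo-captions.vtt", a caption track could be attached like this:

```typescript
// Minimal sketch: attach an external WebVTT caption file to an HTML5 video.
// "promo-video" and "promo-captions.vtt" are hypothetical names used only for illustration.
const video = document.getElementById("promo-video") as HTMLVideoElement | null;

if (video) {
  const track = document.createElement("track");
  track.kind = "captions";          // captions cover dialogue plus relevant sounds
  track.label = "English";          // label shown in the player's caption menu
  track.srclang = "en";             // language of the caption file
  track.src = "promo-captions.vtt"; // WebVTT file containing the synchronized cues
  track.default = true;             // display these captions by default
  video.appendChild(track);
}
```

A WebVTT file itself is just a plain-text list of timed cues: each cue pairs a time range with the text to display, including sound indications such as [applause] and speaker labels where needed.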
Using transcripts to provide complete textual alternatives
Transcripts are essential for ensuring the accessibility of audio or video content, particularly for podcasts, audio files without video, and videos where subtitles or audio descriptions cannot be included. They are similar in purpose to subtitles but more detailed, offering a complete text version of the content, including dialogue, significant sounds, and relevant visual descriptions.
Media players handle transcripts differently, so it is best to provide them in one of the following ways (a sketch of the first option follows the list):
- Directly on the page: inserting the transcript text below the multimedia content.
- As a downloadable file: providing users with a link to download the transcript.
- On a separate page: linking to a dedicated page containing the transcript.
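For the first option, the transcript can be placed right after the player, for example inside a collapsible block so the page stays compact. The following sketch is only an illustration, assuming a hypothetical video element with the id "promo-video" and a hypothetical plain-text transcript file named "promo-transcript.txt":

```typescript
// Minimal sketch: load a plain-text transcript and show it below the video.
// "promo-video" and "promo-transcript.txt" are hypothetical names used only for illustration.
async function addTranscript(videoId: string, transcriptUrl: string): Promise<void> {
  const video = document.getElementById(videoId);
  if (!video) return;

  const response = await fetch(transcriptUrl);
  const text = await response.text();

  // A collapsible block keeps the page compact while remaining readable by screen readers.
  const details = document.createElement("details");
  const summary = document.createElement("summary");
  summary.textContent = "Read the transcript";
  const body = document.createElement("p");
  body.textContent = text;

  details.append(summary, body);
  video.insertAdjacentElement("afterend", details);
}

addTranscript("promo-video", "promo-transcript.txt");
```

The same text could just as easily be pasted directly into the page, offered as a downloadable file, or published on a dedicated page, as in the other two options.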
#tip - Transcripts not only improve accessibility for people with hearing impairments but also have a positive impact on SEO: search engine bots can index the text, helping to improve the page's ranking without incurring duplicate-content penalties.
Using audio descriptions to interpret essential visual elements
For videos where images contain crucial information not described in the audio, such as charts, actions, or significant gestures, it is necessary to provide audio descriptions. These are additional narrations that describe visual elements, helping blind or visually impaired users understand the content.
Audio descriptions should include non-verbal information, such as:
- Facial expressions or significant gestures.
- Relevant actions that are not described in the dialogues.
- Visual details of the environment that help contextualize the content.
They can be inserted during natural pauses in the dialogue or soundtrack. When these pauses are not long enough, extended audio descriptions can be used: the video is temporarily paused so that the narrator can describe all the relevant visual elements before playback continues.
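In HTML5 video, one possible way to approximate an extended audio description is to store the descriptions as timed cues in a text track of kind "descriptions" and have a small script pause the video while each cue is read aloud by the browser's speech synthesis. The sketch below only illustrates this idea and reuses the hypothetical "promo-video" element from the earlier examples; a pre-recorded narration track produced with dedicated tools is usually the more robust solution:

```typescript
// Simplified sketch of an "extended" audio description in the browser:
// when a description cue starts, pause the video, read the cue aloud, then resume.
// Assumes the video already includes a <track kind="descriptions"> element (hypothetical setup).
const video = document.getElementById("promo-video") as HTMLVideoElement | null;

if (video) {
  for (const track of Array.from(video.textTracks)) {
    if (track.kind !== "descriptions") continue;

    track.mode = "hidden"; // keep the cues active without rendering them on screen
    track.addEventListener("cuechange", () => {
      const cue = track.activeCues?.[0] as VTTCue | undefined;
      if (!cue) return;

      video.pause(); // extended description: stop playback while narrating
      const utterance = new SpeechSynthesisUtterance(cue.text);
      utterance.onend = () => { void video.play(); }; // resume when the narration ends
      window.speechSynthesis.speak(utterance);
    });
  }
}
```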
How to create subtitles, transcripts, and audio descriptions
Creating subtitles, transcripts, and audio descriptions may seem like a complex task, especially for those with no prior experience. However, various technical solutions simplify the process, making it more accessible even for those with limited resources or time.
To create subtitles synchronized with audio, one of the easiest solutions is to use YouTube Studio. Once the video is uploaded, YouTube automatically generates synchronized subtitles, which can be manually edited to improve accuracy. Other free online platforms, such as Amara and Kapwing, offer intuitive tools for creating custom subtitles and easily synchronizing them with audio.
Subtitles can serve as a useful starting point for producing transcripts, reducing the time required to create them. Alternatively, dedicated tools can generate accurate automatic transcripts: Otter.ai and Descript are two of the most popular solutions, producing transcripts quickly and offering user-friendly interfaces for correcting any errors.
Audio descriptions, compared to subtitles and transcripts, require a more creative and time-intensive process. A free platform like YouDescribe allows users to add audio descriptions to YouTube videos, simplifying the process. Alternatively, professional software like Adobe Premiere Pro and Camtasia enable recording additional audio tracks and integrating them directly into videos, offering greater control over quality and the final result.
Although implementing subtitles, transcripts, and audio descriptions may present some challenges, the importance of making content accessible should not be underestimated. With the right technical solutions and proper planning, the process becomes manageable, improving the accessibility of your multimedia content without requiring excessive effort.
Managing video accessibility in WebSite X5
WebSite X5 allows for easy integration of video and audio files within web pages, but it is the user's responsibility to ensure that these media are accessible to all users, including those with visual or hearing impairments. This means that:
- Subtitles must be generated and embedded in the video file before being added to the site.
- Transcripts should be provided as separate text or included within the page.
- Audio descriptions should be recorded and integrated into the videos, if necessary.
As we have seen, external tools such as YouTube Studio, Amara, Otter.ai, or Adobe Premiere Pro can simplify the creation and synchronization of subtitles and transcripts. Once generated, these elements can be integrated into WebSite X5 by adding extra text or by linking to external resources.
While the software does not automate the generation of these elements, it ensures maximum flexibility in their integration, allowing users to create web pages that are accessible and compliant with WCAG standards.