Provide text equivalents for audio - general advice on transcripts

Why this is important

Video content that contains spoken or other audio information (on- or off-screen sound effects, or background music) important to understanding the video's content will present access barriers to anyone who is deaf or hard of hearing and unable to hear the video's soundtrack. The same problem will, of course, be encountered by anyone accessing the video on a computer without a sound card or speakers, in a noisy environment, or where no sound is permitted. Additionally, people who are deaf-blind, and so unable to access either visual or audio content, will face access barriers if this content is not presented in an alternative way.

General Principles

There are two accessibility solutions for this barrier. The most equitable option is to provide the video content with captions and audio descriptions. For more details, see the How To articles providing General advice on captions and General advice on audio description.

Unfortunately, implementing effective captioning and audio description often requires significant expertise and time, most of it spent deciding what to include, and the effort grows with the complexity of the video content. Professional captioning and audio description organisations are therefore best placed to provide a solution, but using them brings delay and expense. Where resources are limited, providing an HTML transcript, however 'rough-and-ready', is a practical alternative - and once you have a transcript, you have a resource from which you can create captions.

Before you continue

The advice on this page helps you avoid introducing a specific accessibility barrier, but it's not a magic formula. To avoid attempting to follow a technical solution that is not appropriate to the resource and its intended purpose, you need to know the context in which the multimedia resource is being used:

  1. The purpose or aim of the multimedia resource in question, and whether it supplements another resource in the learning environment or whether students are required to use it.
  2. The target audience, their knowledge and expectations, and the type of browsing and assistive technology that they may be using.
  3. Whether the information and experiences provided by the multimedia technology are already available in an equivalent, alternative form.

For more background on this approach, see our Guide to the use of multimedia in accessible e-learning.

Technique Details

There are two stages to creating a text transcript of an audio or audiovisual multimedia resource:

  1. Generation of the content of the transcript.
  2. Publishing the transcript in structured HTML.

Generating the transcript is by far the more challenging and time-consuming of the two stages. It requires listening to and writing down the spoken content (take advantage of any existing script), plus any non-spoken audio information important to understanding. You also need to watch the video so that you can add text describing important visual events (in other words, text equivalents of what would be spoken as part of an audio description), so that people who are deaf-blind can understand the content. Links to guidelines and advice on transcription techniques are provided in the External Links section on this page; as a rule of thumb, the transcript should contain:

  1. All spoken content, including speaker name;
  2. Any additional contextual information relating to the spoken content, for example whether dialogue is whispered or shouted;
  3. Any non-spoken information without which understanding of the video or audio content would be reduced or lost;
  4. Any text that would be spoken as part of an audio description - in other words, descriptions of visual events important to the understanding of the content.

We recommend publishing online transcripts in HTML rather than as plain text or as a Word document. HTML allows you to structure the transcript, and Cascading Style Sheets allow you to present this structure visually, for example by distinguishing between different speakers and non-spoken sounds.

Example 1 is taken from one of the transcripts used in this resource. Here, non-spoken information is provided using italicised text:

Alistair sits in front of a computer screen showing the example and talks through it

Alistair: There are many ways we can use video clips in a very low tech way. In this case we've got the students here and we're passing round a cheap digital camera that does video clips - has video clip capability - and so we've got the student explaining to other students how to use a particular piece of kit.

Shot of the computer screen showing the videoclip, with sound

Alistair: Now what we got from a learning point of view is tremendous reinforcement going on because......

Example 1: Sample of transcript for a video clip.
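
As a minimal sketch of the second stage, publishing in structured HTML, the following Python script renders transcript entries as an HTML fragment in which speaker names and non-spoken information carry separate classes that a stylesheet can target. The Entry structure, the render_transcript function and the class names transcript, speaker and non-spoken are all assumptions made for illustration; they are not taken from the markup actually used in this resource.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class Entry:
    """One transcript item (hypothetical structure).

    If `speaker` is set the entry is speech; otherwise it describes a
    sound or a visual event.
    """
    text: str
    speaker: Optional[str] = None


# Hypothetical class names: speaker names are shown in bold and
# non-spoken information in italics.
STYLESHEET = """\
.speaker { font-weight: bold; }
.non-spoken { font-style: italic; }
"""


def render_transcript(entries: List[Entry]) -> str:
    """Return an HTML fragment with one paragraph per transcript entry."""
    lines = ['<div class="transcript">']
    for entry in entries:
        text = escape(entry.text)  # escape &, <, > and quotes
        if entry.speaker is None:
            lines.append(f'  <p class="non-spoken">{text}</p>')
        else:
            name = escape(entry.speaker)
            lines.append(
                f'  <p><span class="speaker">{name}:</span> {text}</p>'
            )
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Entries taken from the start of Example 1.
    sample = [
        Entry("Alistair sits in front of a computer screen showing "
              "the example and talks through it"),
        Entry("There are many ways we can use video clips in a very "
              "low tech way.", speaker="Alistair"),
    ]
    print(f"<style>\n{STYLESHEET}</style>")
    print(render_transcript(sample))
```

Keeping the distinction between speech and non-spoken information in class names, rather than in inline formatting, means the visual presentation can be changed later in the stylesheet without touching the transcript itself.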

Testing

The subjective nature of this accessibility technique, and the lack of formal, universally applicable guidelines, make it impossible to test using automated means. Instead, feedback from end users is highly recommended, particularly from people who are blind, deaf or hard of hearing, or deaf-blind.