Captioning with QuickTime SMIL
Author
Patrick H. Lauke
Key multimedia technology
SMIL; MAGpie (Media Access Generator); QuickTime.
Specific Issues/Key Terms
Captions; text transcripts; web standards; cross-browser compatibility.
Introduction
First, some background information on this project: in September 2004 the Web Essentials 04 conference took place in Sydney, Australia. Among the highlights of this conference was a keynote speech by renowned web designer Jeffrey Zeldman. As he was unable to attend in person, the keynote was delivered as a pre-recorded video presentation. In November 2004, Zeldman made this video available on his company's site. After announcing the availability of the video on various mailing lists, I received a reply from a deaf user asking for a transcript. As I had experimented with small video captioning examples in the past, I decided to take this opportunity to apply some of my findings to a real-world problem.
Project Aims and Accessibility Design Objectives
The main aim of the project was to make the keynote video more accessible to deaf/hard of hearing users. Additionally, I wanted to use the opportunity to raise awareness of accessibility issues and showcase the possibilities offered by simple and easily available technologies such as SMIL - which up to now have mostly been confined to academic proofs of concept - on a high-profile (in web standards/development circles) example.
Technology Used
This project used SMIL (Synchronized Multimedia Integration Language), an official W3C (World Wide Web Consortium) specification which offers a simple framework for creating rich media presentations from separate elements. As captioning is essentially the simultaneous playback of video and text, this technology seemed the most appropriate standards-based solution available to developers.
There were three additional technical considerations which reinforced this decision:
1. As the outcome of the project was going to be published on my personal site, I needed a solution which would enable me to reference the existing 9 MB video file without having to host it myself (and bear the resulting bandwidth cost).
2. The caption text should be stored "in the clear", thus making it available (and editable - see next point) to all users (e.g. text-only browsers, users wishing to copy/paste extracts, etc.).
3. I wanted to realise the project without the need for any expensive, specialised software, to demonstrate the ease with which new and existing audio/video resources can be made more accessible.
As SMIL essentially works as a "container" for external content, it satisfied points 1 and 2, enabling me to host the text caption file (as well as the actual SMIL file itself) and, on playback, synchronise it with the video held on Zeldman's site. As the caption and SMIL files are both pure text files, the format also satisfied point 3, since it required no software other than a standard text editor.
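To illustrate the "container" idea, here is a minimal Python sketch that writes out a SMIL 1.0 file pairing a remotely hosted video with a locally hosted caption stream. It is not the file used in the project: the URLs, region names, dimensions and the build_smil helper are all assumptions made for illustration, and only core SMIL 1.0 elements (layout regions, par, video, textstream) are used.

```python
from xml.sax.saxutils import quoteattr


def build_smil(video_url, captions_url, width=320, video_height=240, caption_height=60):
    """Return a SMIL 1.0 document pairing a video with a caption text stream."""
    total_height = video_height + caption_height
    lines = [
        "<smil>",
        "  <head>",
        "    <layout>",
        f'      <root-layout width="{width}" height="{total_height}" background-color="black"/>',
        # The video sits at the top, the captions in a strip directly below it.
        f'      <region id="videoregion" top="0" left="0" width="{width}" height="{video_height}"/>',
        f'      <region id="textregion" top="{video_height}" left="0" width="{width}" height="{caption_height}"/>',
        "    </layout>",
        "  </head>",
        "  <body>",
        # <par> plays its children in parallel, which is what keeps video and text in step.
        "    <par>",
        f'      <video src={quoteattr(video_url)} region="videoregion"/>',
        f'      <textstream src={quoteattr(captions_url)} region="textregion"/>',
        "    </par>",
        "  </body>",
        "</smil>",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical locations: the video stays on someone else's server,
# while the caption file lives next to the SMIL file.
smil_document = build_smil("http://example.com/media/keynote.mov", "captions.txt")
print(smil_document)
```

Because the video is only referenced by URL, the SMIL and caption files can be hosted anywhere, and both remain plain text that can be edited by hand.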
Project success
The project was successful insofar as the deaf user who originally requested the transcript was able to watch and take advantage of the captioned version of Zeldman's keynote speech (the user's positive feedback can be read in an archived email post).
Following completion of this project (which was publicised to great effect when Jeffrey Zeldman himself added a link to it from the download location of his original movie), I have received positive feedback from a large number of users worldwide (both with and without hearing impairments). To date, there has only been a single report of technical problems on Macintosh OS X / Safari, which may be due to specific configuration issues at the user's end.
Reflections
Overall, the project achieved all of the original aims. It made the video accessible to deaf/hard of hearing users. It also provided a highly visible, real world example of what can be achieved with SMIL, and a framework for producing captioned versions of audio-visual material at minimal cost without the need for any additional software.
The major cost in achieving these aims was time. The initial transcription of the 7.5 minute video took around 30 minutes. This could have been dramatically reduced had even a rough transcript been available - which emphasises the point that accessibility requirements should be taken into account at the planning stages (in this case, before filming) rather than subsequently "retro-fitting" an inaccessible resource.
Manually converting the transcript into an actual caption file, which includes timing information for each discrete piece of text, took an additional hour, including fine-tuning and testing. This step can be simplified by the use of free programs such as MAGpie (Media Access Generator).
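As a rough idea of what "timing information for each discrete piece of text" looks like, the Python sketch below turns a list of timed cues into a QTtext-style caption file. The caption text and timings are placeholders rather than the real transcript, the helper names are hypothetical, and the header descriptors ({QTtext}, {timeScale:...}, {timeStamps:absolute} and so on) are written from memory of Apple's text descriptor documentation, so they should be checked against it before use.

```python
def format_timestamp(seconds):
    """Format seconds as [HH:MM:SS.mmm]; the fraction is in 1/1000ths to match timeScale:1000."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}]"


def build_qttext(cues, end_time, width=320, height=60):
    """Build a QTtext-style caption file from (start_seconds, text) cues.

    Each caption stays on screen until the next timestamp; end_time closes the last one.
    """
    header = (
        "{QTtext}{font:Arial}{plain}{size:12}{justify:center}"
        "{timeScale:1000}{timeStamps:absolute}"
        f"{{width:{width}}}{{height:{height}}}"
    )
    lines = [header]
    for start, text in cues:
        lines.append(format_timestamp(start))
        lines.append(text)
    lines.append(format_timestamp(end_time))
    return "\n".join(lines) + "\n"


# Placeholder cues for illustration only.
example_cues = [
    (0.0, "First caption (placeholder text)"),
    (3.2, "Second caption (placeholder text)"),
    (7.75, "Third caption (placeholder text)"),
]
print(build_qttext(example_cues, end_time=11.0))
```

Writing the timestamps is the slow part when done by hand, which is the step tools such as MAGpie are designed to speed up.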
Finally, SMIL was ratified as an official W3C Recommendation in June 1998, but has not seen wide adoption in the same way as proprietary formats such as Macromedia Flash.
Although SMIL is a standard, it still only provides a framework to "glue" files in other formats together. Therefore, there are a variety of different and incompatible implementations. This particular project used QuickTime-specific SMIL components (the original QuickTime video and a caption file in QTtext format).
A final technical stumbling block was that, although QuickTime understands (its specific implementation of) SMIL, it does not currently register itself with the operating system (at least on Microsoft Windows) as the default player for files with a .smil suffix or the application/smil MIME type. This makes it impossible to simply provide a link on a web page to a QuickTime SMIL file, as the user's machine would not know how to handle it (requiring the user to explicitly force QuickTime to open the file). Worse still, other players such as RealPlayer (which also has its own SMIL implementation) may attempt to play the file and fail because of the QuickTime-specific format.
To circumvent this particular problem, I opted to embed the SMIL in an HTML file via the <object> element, forcing the browser to use QuickTime. This is not an optimal solution, as an embedded object does not allow the user to easily resize the display area.
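The case study does not reproduce the markup that was used, so the following Python sketch shows only one plausible way of generating such an embed: an <object> element whose type is declared as video/quicktime, so that the browser hands the SMIL file to the QuickTime plug-in rather than to whichever player claims .smil files. The build_embed helper, the file name and the dimensions are assumptions; some browsers of the period may additionally have needed Apple's ActiveX classid, which is omitted here.

```python
from html import escape


def build_embed(smil_url, width=320, height=316):
    """Return an <object> element that asks the browser to pass the SMIL file to QuickTime."""
    src = escape(smil_url, quote=True)
    return (
        # Declaring the QuickTime MIME type, not application/smil, is what selects the plug-in.
        f'<object type="video/quicktime" data="{src}" width="{width}" height="{height}">\n'
        f'  <param name="src" value="{src}">\n'
        '  <param name="autoplay" value="true">\n'
        '  <param name="controller" value="true">\n'
        # Fallback content for browsers that cannot render the object.
        f'  <p><a href="{src}">Open the captioned presentation (requires QuickTime)</a></p>\n'
        "</object>"
    )


# Hypothetical file name for illustration.
print(build_embed("captioned-keynote.smil"))
```

The fixed width and height in the markup are also the reason the user cannot easily resize the display area.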
Due to these deployment issues, a follow-up project currently in planning will use a more universal and consistent playback solution, such as that provided by Flash, while still keeping the video and captions as separate, independently editable resource files. In future, I would certainly like to see a complete implementation of a general purpose SMIL parser for Flash, which would resolve most of the issues outlined above.
The original video: Web Essentials 04 - Zeldman video keynote
The completed project: Web Essentials 04 - Zeldman keynote captioned with QuickTime SMIL
About the author
Patrick works as webmaster for the University of Salford, where he heads a small central web team; in his spare time, he runs a small web/design consultancy - Splintered.co.uk. He has been engaged in the discourse on accessibility for the last 5 years, regularly contributing to a variety of web development and accessibility related mailing lists, as well as taking on the role of administrator for Accessify.com and moderator for the Accessify forum.
Related Sites
- Best Practices in Online Captioning (Joe Clark)
- An extremely comprehensive on-line resource relating to all aspects of captioning multimedia content.
- Creating Captions for Rich Media (NCAM)
- The National Center for Accessible Media's overview of methods for captioning web-based multimedia, including links to tutorials and examples of captioned video and animation.
- QuickTime Interactivity - SMIL (Apple)
- Apple's page on SMIL and QuickTime, including a list of QuickTime's SMIL extensions.
- SMIL 1.0 Specification (W3C)
- The full specification of version 1.0 of the Synchronized Multimedia Integration Language, published in 1998.
- Splintered.co.uk
- Home page of the author's web/design consultancy.
- Web Essentials 04 Conference (WE04)
- The web site of the Web Essentials Conference, which took place in Sydney, Australia in September 2004, and at which the Zeldman presentation was aired.
- Zeldman.com
- The web site of Jeffrey Zeldman, a leading figure in the web standards movement, and subject of the captioned video.
Related Resources
How To
- Provide audio descriptions for video or animated content - with Synchronised Multimedia Integration Language (SMIL)
- Provide audio descriptions for video or animated content - general advice
- Provide text equivalents for audio - with Synchronised Multimedia Integration Language (SMIL)
- Provide text equivalents for audio - general advice on captions
- Provide text equivalents for audio - general advice on transcripts
- Use media to enhance text - using video
Articles
Case Studies
- Providing captioned video clips for the Skills for Access web site
- Captioning Video for Accessibility