What is Automated Speech Recognition?

Updated on Mar 28, 2024

This guide explains the basics of Automated Speech Recognition (ASR) and related concepts. More information regarding Automated Speech Recognition can be found here.

What is Automated Speech Recognition (ASR)?

ASR is a technology that recognizes spoken language and automatically generates text transcripts from it.

Do all my videos receive transcripts?

New videos uploaded to the platform are transcribed automatically, with two exceptions: media items uploaded before automatic transcription was enabled are not transcribed, and videos over four hours in length are not automatically transcribed. Videos can still be submitted for transcription, or transcripts can be uploaded manually.

How long does it take for my video to be transcribed?

Automated transcript requests normally take 30 minutes to complete.

What is the difference between transcripts and closed captioning?

Transcripts are a textual representation of the spoken audio in a video. Closed captions meet ADA requirements for accessibility and also include contextual audio such as background noises. For example, take a scene in which someone opens a door and greets another person. A transcript would record only the spoken words, while closed captions would also include speaker identification and sounds such as the door opening and other background noises, in addition to the spoken audio.