Auto-caption a video
February 26, 2021 - Doug Sillars in captions, Delegated Upload, Video Status, NodeJS
Captions (subtitles) make it easier to follow along with your video. Some viewers have hearing difficulties. Others simply don't want the audio playing out loud. Captions let all of them understand the video.
To add captions to a video, you need a WebVTT file. We've written about the process of creating WebVTT files before. The format is easy to understand, but the files are tedious to create by hand. You have to break a transcript into segments and add start and end times for each one, as in this excerpt:
3
00:00:07.519 --> 00:00:11.759
The little train rumbled over the tracks.

4
00:00:10.880 --> 00:00:15.279
She was a happy little train for She had such

5
00:00:13.439 --> 00:00:16.719
a jolly load to carry.

6
00:00:15.599 --> 00:00:20.480
Her cars were filled full of good things for boys

7
00:00:18.879 --> 00:00:21.199
and girls.
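Generating a file like this from timed transcript segments is mostly timestamp bookkeeping. Below is a minimal TypeScript sketch. It assumes the segments are already available as start/end times in seconds, and the `Segment` type and function names are illustrative rather than taken from any library. A complete file must begin with a `WEBVTT` header line. The numeric cue identifiers are optional, and they are included here only to mirror the excerpt.

```typescript
// One timed piece of a transcript; times are in seconds (illustrative type).
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format seconds as a WebVTT timestamp: hh:mm:ss.ttt
function toTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Build a whole WebVTT document: the WEBVTT header, then numbered cues,
// each separated from the next by a blank line.
function buildWebVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg, i) => `${i + 1}\n${toTimestamp(seg.start)} --> ${toTimestamp(seg.end)}\n${seg.text}`
  );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

For example, `toTimestamp(7.519)` yields `00:00:07.519`, the start time of cue 3 above.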
In this post, we'll show how the Authot transcription API can automatically add captions to your videos, without the fuss of creating the VTT caption file by hand.
Authot receives the video file, transcribes its audio, and returns a complete WebVTT file. api.video then ingests this file under the same videoId as the video, which adds the captions to the video.
It sounds like magic! To prove that it actually works, we've built a demo application, caption.a.video, that uses api.video and Authot to host, stream, and auto-caption your videos.
How it works
The first step is to upload your video to api.video. This demo uses a delegated token upload: a temporary public token lets the browser upload the video directly, without exposing the account's API key.
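A browser-side delegated upload can be sketched as a single multipart POST. This assumes api.video's token upload endpoint, `POST /upload?token=...` on `https://ws.api.video` (or `https://sandbox.api.video` for sandbox), with the video in a form field named `file` and a JSON response containing `videoId`. Check those details against the current API reference. The function names are illustrative.

```typescript
// Assumed production base URL; the sandbox uses https://sandbox.api.video
const API_BASE = "https://ws.api.video";

// Build the delegated-upload endpoint for a given upload token.
function delegatedUploadUrl(token: string, base: string = API_BASE): string {
  const url = new URL("/upload", base);
  url.searchParams.set("token", token);
  return url.toString();
}

// Upload a file with a delegated token. No API key is sent, which is what
// makes this safe to run in the browser. Resolves with the new video's ID.
async function uploadWithToken(file: Blob, fileName: string, token: string): Promise<string> {
  const form = new FormData();
  form.append("file", file, fileName);
  const res = await fetch(delegatedUploadUrl(token), { method: "POST", body: form });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  const body = (await res.json()) as { videoId: string };
  return body.videoId;
}
```

This sketch sends the whole file in one request. Large files are normally uploaded in chunks instead.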
Upload complete
Once the video is uploaded, api.video begins transcoding it into an HLS stream and an MP4 version. Authot needs the MP4 file to create a transcription. The application therefore polls the video status endpoint, which reports when each video stream is ready for playback, until the MP4 has been encoded. It then submits the file to Authot for transcription.
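The polling step might look like the following sketch. It assumes that `GET /videos/{videoId}/status` returns an `encoding.qualities` array whose entries carry a `type` (such as `"hls"` or `"mp4"`) and a `status` (such as `"encoded"`). That response shape is an assumption to verify against the API reference. It also assumes that an access token has already been obtained. The helper names are illustrative.

```typescript
// Assumed subset of the video status response.
interface Quality {
  type: string;    // e.g. "hls" or "mp4"
  quality: string; // e.g. "720p"
  status: string;  // e.g. "encoding" or "encoded"
}
interface VideoStatus {
  encoding: { playable: boolean; qualities: Quality[] };
}

// True once at least one MP4 rendition has finished encoding.
function isMp4Ready(status: VideoStatus): boolean {
  return status.encoding.qualities.some((q) => q.type === "mp4" && q.status === "encoded");
}

// Poll the status endpoint until an MP4 is ready, or give up after maxAttempts.
async function pollUntilMp4Ready(
  videoId: string,
  accessToken: string,
  intervalMs = 5000,
  maxAttempts = 120
): Promise<void> {
  const url = `https://ws.api.video/videos/${encodeURIComponent(videoId)}/status`;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url, { headers: { Authorization: `Bearer ${accessToken}` } });
    if (!res.ok) throw new Error(`Status request failed: ${res.status}`);
    if (isMp4Ready((await res.json()) as VideoStatus)) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`MP4 for ${videoId} not ready after ${maxAttempts} attempts`);
}
```

Keeping `isMp4Ready` separate from the network loop makes the readiness rule easy to adjust if the response format differs.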
Transcription
When the video is submitted to Authot, the API returns the transcription's ID. Authot's status endpoint then reports which state the transcription is in:
The initial state is 0: uploading.
There are several states in between, including 2: transcribing.
The final state is 10: transcoded.
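The states above map naturally onto a small polling loop. This sketch knows only the three codes listed in this post and treats any other value as "not finished yet". `getStatus` is a hypothetical wrapper around Authot's status endpoint that resolves with the numeric code. Its real request and response format is not covered here.

```typescript
// Status codes named in this post; other intermediate codes fall through to "other".
const AUTHOT_STATUS: Record<number, string> = {
  0: "uploading",
  2: "transcribing",
  10: "transcoded",
};

function describeAuthotStatus(code: number): string {
  return AUTHOT_STATUS[code] ?? `other (${code})`;
}

// The VTT file is requested only once the status reaches 10.
function isTranscriptionDone(code: number): boolean {
  return code === 10;
}

// Poll via the hypothetical getStatus wrapper until the transcription is done.
async function waitForTranscription(
  transcriptionId: string,
  getStatus: (id: string) => Promise<number>,
  intervalMs = 10000,
  maxAttempts = 360
): Promise<void> {
  let code = -1;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    code = await getStatus(transcriptionId);
    if (isTranscriptionDone(code)) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Transcription ${transcriptionId} is still ${describeAuthotStatus(code)}`);
}
```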
Getting the WebVTT
Once the transcription is complete (when Authot returns status 10), the Node server requests the VTT file from Authot and uploads it to the existing video at api.video.
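The upload half of this step can be sketched as follows. It assumes api.video's captions endpoint, `POST /videos/{videoId}/captions/{language}`, which takes the VTT file as a multipart field named `file`, and it assumes that an access token is already in hand. How the VTT text is fetched from Authot is left out. The quick header check is an illustrative safeguard against uploading an error page instead of captions.

```typescript
// Caption upload endpoint for one video and one language tag (e.g. "en").
function captionUploadUrl(videoId: string, language: string): string {
  return `https://ws.api.video/videos/${encodeURIComponent(videoId)}/captions/${encodeURIComponent(language)}`;
}

// Cheap sanity check: a WebVTT file starts with "WEBVTT",
// optionally preceded by a byte-order mark.
function looksLikeWebVtt(text: string): boolean {
  return /^\uFEFF?WEBVTT(?:[ \t\r\n]|$)/.test(text);
}

// Upload the WebVTT text as the caption track for one language.
async function uploadCaptions(
  videoId: string,
  language: string,
  vtt: string,
  accessToken: string
): Promise<void> {
  if (!looksLikeWebVtt(vtt)) throw new Error("Refusing to upload: not a WebVTT file");
  const form = new FormData();
  form.append("file", new Blob([vtt], { type: "text/vtt" }), `${language}.vtt`);
  const res = await fetch(captionUploadUrl(videoId, language), {
    method: "POST",
    headers: { Authorization: `Bearer ${accessToken}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Caption upload failed: ${res.status}`);
}
```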
Completed
Finally, the webpage updates its status text to "success". Clicking the video link opens the video, where the user can enable captions and watch them play back.
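Put together, the server side of this flow is a short chain of awaits. In this sketch each step is passed in through a hypothetical `CaptionSteps` interface, so the ordering can be read (and tested) without any network calls. The step names and signatures are illustrative and not taken from the demo's source code.

```typescript
// Hypothetical step signatures; the demo's real helpers may be shaped differently.
interface CaptionSteps {
  waitForMp4(videoId: string): Promise<string>; // resolves with a URL for the encoded MP4
  submitToAuthot(mp4Url: string): Promise<string>; // resolves with the transcription ID
  waitForTranscription(transcriptionId: string): Promise<void>; // resolves once status is 10
  fetchVtt(transcriptionId: string): Promise<string>; // resolves with the WebVTT text
  uploadCaptions(videoId: string, language: string, vtt: string): Promise<void>;
}

// Run the steps in order for a video that has already been uploaded,
// resolving with the status text the webpage shows at the end.
async function autoCaption(videoId: string, language: string, steps: CaptionSteps): Promise<string> {
  const mp4Url = await steps.waitForMp4(videoId);
  const transcriptionId = await steps.submitToAuthot(mp4Url);
  await steps.waitForTranscription(transcriptionId);
  const vtt = await steps.fetchVtt(transcriptionId);
  await steps.uploadCaptions(videoId, language, vtt);
  return "success";
}
```

Injecting the steps keeps the ordering logic separate from the api.video and Authot HTTP details, so either side can change without touching the other.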
Automatic captions!
Of course, automatic captions are not 100% accurate, and a few words may be transcribed incorrectly. Still, in my testing the Authot API performed well, and the captions were of high quality.
Creating captions manually, on the other hand, takes either a lot of time or a lot of money (to pay someone else to do the work). Ease of use and low cost make auto-captioning a compelling choice.
Conclusion
We've described an application that lets you upload a video and automatically add captions to it. Try it out today at caption.a.video. If you're curious about how we built this app, the code is available on GitHub. Let us know what you think by posting on our community forum.
Add captions, and watch your videos' watch time increase!