
Add Closed Caption to Videos – Part Two

The Bottleneck – Transcription File to Captioning file

After working through the Watson – Amara approach, it became clear that a piece is missing. Watson does a decent job of creating a transcript, and it even produces something it calls a “Timing File”. Unfortunately, the demo page offers no easy way to save either this file or the JSON file that is also available. I presume this is because Watson Speech-to-Text is a paid service and I am using a demo, so I will have to try the full-fledged service.

Trying out the Watson Speech-to-Text API

The biggest question is “Can the Watson speech API output a Caption file format?” I went over to Wikipedia for some background and a list of the different formats.
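As a rough sketch of what bridging that gap could look like, the Python below groups word-level timings into SubRip (SRT) cues. It assumes the JSON has the shape Watson returns when word timestamps are requested, where each result's first alternative carries a `timestamps` list of `[word, start_seconds, end_seconds]` entries. The function names and the fixed words-per-cue grouping are my own illustrative choices, not part of any Watson tooling.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(response, max_words=7):
    """Group word timings into numbered SRT cues of up to max_words words.

    Assumes response["results"][i]["alternatives"][0]["timestamps"] is a
    list of [word, start_seconds, end_seconds] entries (an assumption
    about the JSON shape, not something this post verified).
    """
    words = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            words.extend(alternatives[0].get("timestamps", []))

    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w[0] for w in chunk)
        # An SRT cue is: index, time range, text, then a blank line.
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Tiny hand-made sample in the assumed shape, not real Watson output.
    sample = {"results": [{"alternatives": [{"timestamps": [
        ["hello", 0.0, 0.5], ["world", 0.5, 1.0], ["again", 1.25, 2.0],
    ]}]}]}
    print(words_to_srt(sample, max_words=2))
```

Splitting on a fixed word count is the simplest possible grouping. A real captioning pass would more likely break cues on pauses or punctuation and cap the line length.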

A short simple introduction can be found here:

A more in-depth article on Subtitles can be found here:

What I was looking for, a chart of the formats, is here:

From the upload dialog the accepted formats are:

Our site accepts SRT, SSA, SBV, DFXP, TXT, and VTT format. Only files ending in .srt, .ssa, .sbv, .dfxp, .txt, .vtt or .xml (for dfxp) are accepted.
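Whatever ends up producing the caption file, it has to end in one of those extensions. A minimal Python check, assuming the match is case-insensitive (the dialog does not say), might look like this. The function name is hypothetical.

```python
from pathlib import Path

# Extensions listed in the upload dialog quoted above.
ACCEPTED_EXTENSIONS = {".srt", ".ssa", ".sbv", ".dfxp", ".txt", ".vtt", ".xml"}


def is_accepted_caption_file(filename):
    """Return True if the filename ends in one of the accepted extensions.

    Case-insensitive matching is an assumption, not documented behaviour.
    """
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["talk.srt", "talk.VTT", "talk.json"]:
        print(name, is_accepted_caption_file(name))
```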

Attempt 1 – GitHub, SubtitleMe

I did what all of us do: I googled “Watson Speech API convert to subtitles.” Some of the first results were GitHub repositories, so I tried the first.

It is a program called “SubtitleMe”, which claims to use the Watson Speech API to create a subtitle file. Here is my first attempt:

This certainly could be user error, but I definitely want the easiest solution, so I will try the second GitHub entry.

Attempt 2 – GitHub, Subtitler

It turns out that Subtitler is a fork of the first one, but it did seem to get a little further. I fed it a one-minute file. After more than a minute of streaming the file, I still had no results, so I set that window aside and moved on to the next approach.

Attempt 3 – Using IBM-Watson Nodes for Node-RED

In the next article, I will be going through how I used Node-RED to query IBM Watson’s Speech-to-Text API. For now, here is a screenshot: