Typing as Fast as You Can Speak

A typical computer uses a standard keyboard with more than 100 keys. Many of these have a secondary function, activated by modifier keys such as Shift and Ctrl. This is more than enough to encode the entire alphabet in upper- and lowercase, numbers 0 to 9, a selection of everyday symbols, and common functions that interact with the operating system.

On the other hand, a stenotype machine has fewer than 25 keys, which is not enough for all the letters of the English alphabet, never mind the numbers and punctuation marks. It can get away with so few because the operator is more interested in the sound of a word than its spelling. This allows a speed of more than 200 words per minute while moving the hands as little as possible.
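As a rough illustration of how so few keys can cover the missing letters, here is a minimal Python sketch in which several keys pressed together stand for a single sound. The chord table is my own small selection from commonly published English steno theories, not a complete layout, and the function name `decode_initial` is purely hypothetical.

```python
# Illustrative only: a few left-hand chords from common English steno
# theories. The left-hand consonant keys have no D, B, L, F, M or N,
# so those sounds are made by pressing two or three keys at once.
LEFT_KEYS = "STKPWHR"  # left-hand consonant keys, in their fixed order

CHORDS = {
    "TK": "D",
    "PW": "B",
    "HR": "L",
    "TP": "F",
    "PH": "M",
    "TPH": "N",
}


def decode_initial(keys: str) -> str:
    """Return the sound for a set of left-hand keys pressed together."""
    # Keys are pressed simultaneously, so normalise to the fixed key
    # order before looking the chord up ("WP" and "PW" are the same).
    ordered = "".join(k for k in LEFT_KEYS if k in keys)
    # Anything not in the table is returned unchanged, e.g. a single key.
    return CHORDS.get(ordered, ordered)


if __name__ == "__main__":
    for pressed in ("PW", "TPH", "K"):
        print(pressed, "->", decode_initial(pressed))
```

A real machine pairs chords like these with vowel and final-consonant keys and a large dictionary of whole words, which is where most of the speed comes from.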

Incidentally, the one punctuation mark on the device is an asterisk, used to mark corrections. In some messaging applications, where messages can’t be recalled, users will typically do something similar, typing an asterisk underneath followed by the corrected word.

However, the stenotype is now decades old and technology has moved on. Below is a video about live subtitling for proceedings in Parliament.

A video hosted on YouTube with an overview of how subtitles are produced for Parliamentary sessions.

In this application, voice recognition is used. However, it’s far easier to program a computer to understand just one voice instead of many, so an operator listens through headphones to the words spoken on TV and repeats them.

You’ll notice from the video that the operator speaks in something of a monotone regardless of how passionate the MPs are feeling, and this helps the software to provide a consistent result. Punctuation also needs to be added manually, not to mention switching between different people; colour codes are often used to help viewers work out which person said what.

Such software is also available for home users. For a period when I had RSI, I used Dragon NaturallySpeaking to give my fingers a rest. It worked to a high standard, I found, even straight out of the box and with a Scottish accent. However, it produces its best results when connected to the Internet, as it can benefit from deep learning techniques. If it can’t, the audio is processed locally and there’s a noticeable decrease in quality.

Captioning the Moment

By law, UK broadcasters must make sure that a minimum percentage of their output is subtitled. This week, I’ve been finding out how this is done.

Traditionally, a typist would listen to the broadcast and enter the words using a stenography machine. These have a keyboard that accepts syllables rather than individual letters, and complete words would appear to viewers.

However, this method has been superseded by a technique called respeaking. Rather than entering the words by hand, the operator listens to the audio and repeats it into another microphone, where it’s converted into text by software.

So why not simply take the broadcast audio output and convert that directly into text? The computer would have to work out what is speech and filter out any background noise such as applause, then accommodate different people’s accents and mannerisms. Lord Prescott, for instance, is notorious for not finishing his sentences.

Even today, a person can identify the correct content much more effectively than a machine, and can cope better with understanding one voice than thousands.

Respeaking also has two advantages over traditional stenography:

  1. It can take between two and five years of full-time training to use the keyboard at 200 words per minute. Respeakers can reach trainee standard after six months.
  2. The respeaker’s fingers are left free to make other adjustments, such as the position and colour of the text on the screen.

I mentioned a couple of weeks ago that I use Dragon NaturallySpeaking to assist me in my own writing. While writing this entry, I opened too many browser tabs and other applications, leaving not enough memory to run the software. I could have rebooted the computer to free up space, but I instead typed it out by hand.