voice recognition – Gavin Cameron

Typing as Fast as You Can Speak

A typical computer uses a standard keyboard with more than 100 buttons. Many of these will have a secondary function, activated by modifier keys such as Shift and Ctrl. This is more than enough to encode the entire alphabet in upper- and lowercase, numbers 0 to 9, a selection of everyday symbols, and common functions that interact with the operating system.

On the other hand, a stenotype machine has fewer than 25 buttons, which is not enough for all the letters of the English alphabet, never mind the numbers and punctuation marks. It can manage with so few because the operator is more interested in the sound of a word than the spelling, and this allows a speed of more than 200 words per minute while moving the hands as little as possible.

Incidentally, the one punctuation mark on the device is an asterisk, used to mark corrections. In some messaging applications, where messages can’t be recalled, users will typically type an asterisk underneath, followed by the corrected word.

However, the stenotype is decades old and technology has now moved beyond it. Below is a video about live subtitling for proceedings in Parliament.

A video hosted on YouTube with an overview of how subtitles are produced for Parliamentary sessions.

In this application, voice recognition is used. However, it’s far easier to program a computer to understand just one voice instead of many, so an operator listens through headphones to the words spoken on TV and repeats them.

You’ll notice from the video that the operator speaks in something of a monotone regardless of how passionate the MPs are feeling, and this helps the software to provide a consistent result. Punctuation also needs to be added manually, not to mention switching between different people; colour codes are often used to help viewers work out which person said what.

Such software is also available for home users. For a period when I had RSI, I used Dragon NaturallySpeaking to give my fingers a rest. It worked to a high standard, I found, even straight out of the box and with a Scottish accent. However, it produces its best results when connected to the Internet, as it can benefit from deep learning techniques. If it can’t, the audio is processed locally and there’s a noticeable decrease in quality.

Uncategorized

Repetitive Strain Recovery

It was around the time of the 2014 Commonwealth Games when I really started to notice the strain in my fingers. It had started off weeks before as a pain in the middle finger of the hand I used to click a computer mouse, but as I was writing humorous commentary about the opening ceremony to online friends, it was difficult to keep going.

The cause was obvious. I had a job where I was typing for most of the day, and I was using a computer outside of working hours. As such, something had to change before my fingers dropped off and I couldn’t write any stories or poems.

One practical adjustment I could make at work was to apply for a roller mouse. The roller is in a fixed place, and it can be controlled with different parts of your hand to avoid straining one place. I’d already been using AutoCorrect to save keystrokes when entering common phrases and jargon.

Outside of work, however, there was more freedom. I started to write my first drafts by hand, making use of the lined pages in my diary. To type up the second draft, I learnt how to use voice recognition. Used properly, speech-to-text software has a good level of accuracy even out of the box, but it’s important to exercise patience while it learns the way you speak.

Furthermore, I found that lifting free weights at the gym relieved the pain in my fingers temporarily. As I usually go at lunchtime, this helped me out in the afternoons.

This year, I’ve realised that by making these changes, I can now type for much longer without my hands hurting. However, I still keep my other measures in place as I don’t want to spend another five or six years beating RSI.

Pencil to Paper, Mouth to Microphone

Margaret Atwood launched her latest novel The Testaments last Tuesday with a worldwide cinema broadcast. This included a short biographical film, long readings by three actresses, and an interview with the author herself.

I discovered she likes to write her first draft on paper, although she says her spelling is terrible. It’s then passed to a typist who makes the necessary corrections. I also make my first draft by hand, then enter it into a PC.

I don’t, however, pass my writing to a typist. What I do is speak my words using Dragon NaturallySpeaking software. As you can hear in the recording below, the software reacts best when you speak in a monotone – although it can handle variations in speech rather well. There are also seemingly awkward gaps while the software catches up with what I’m saying.

You’ll notice I have to say which punctuation I want; this can be done automatically, but I prefer to specify. At 1m 15s into the recording, you can also hear me make a correction, as the software had misunderstood the word ‘pass’ as ‘passed’. I then say ‘choose two’, where I’m selecting the correct word from a list of other possibilities.

Admittedly, dictating can take longer than typing, but there are two advantages. Firstly, since I type every day in my job, my hands are given a rest from the same repetitive motion. Secondly, I can make corrections when it’s transferred into the computer, creating a more refined second draft. For a longer piece, I might then print it off and make further corrections by hand, then return to the PC.

However you choose to write and edit your work, my best piece of advice is to leave time between one draft and the next. On my next reading, I invariably find spelling errors, plot holes, and self-indulgent passages. If even an experienced author like Margaret Atwood can make mistakes, then we should definitely rewrite and rewrite until it’s as good as it can be.

Captioning the Moment

By law, UK broadcasters must make sure that a minimum percentage of their output is subtitled. This week, I’ve been finding out how this is done.

Traditionally, a typist would be listening to the broadcast and entering the words using a stenography machine. These have a keyboard that accepts syllables rather than individual letters, and complete words would appear to viewers.

However, this method has been superseded by a technique called respeaking. Rather than entering the words by hand, the operator listens to the audio and speaks it into another microphone, where it’s converted into text by software.

So why not simply take the broadcast audio output and convert that directly into text? The computer would have to work out what is speech and filter out any background noise such as applause, then it would need to accommodate different people’s accents and mannerisms. Lord Prescott, for instance, is notorious for not finishing his sentences.

Even today, a person can identify the correct content much more effectively than a machine, and can cope better with understanding one voice than thousands.

Respeaking also has two advantages over traditional stenography:

It can take between two and five years of full-time training to use the keyboard at 200 words per minute. Respeakers can reach trainee standard after six months.
The operator’s fingers are left free to make other adjustments, such as the position and colour of the text on the screen.

I mentioned a couple of weeks ago that I use Dragon NaturallySpeaking to assist me in my own writing. While writing this entry, I opened too many browser tabs and other applications, leaving too little memory to run the software. I could have rebooted the computer to free up space, but I instead typed it out by hand.
