Smart Computing ®



Communication
July 2000 • Vol.6 Issue 7
Page(s) 71-75 in print issue

Use Voice Recognition Software
Free Your Hands From The Keyboard
The same skills people learn almost automatically at a young age can be the most difficult for computers to pick up. Although babies all over the world pick up language abilities at an astonishing rate, computers muddle along with their superduper electronic brains, incapable of understanding even rudimentary speech. Even your lowly mutt who sleeps on a pile of old blankets in the garage can make more sense of your voice.

Even if your computer doesn't understand what you say, the newest software can at least recognize it. Today's voice recognition software comes closer than ever to the goal of making keyboards optional. We aren't there yet, but for a variety of users, voice recognition software is actually well worth the money. Below, we take a look at how voice recognition works and how it can work for you.



How It Works. The first step in voice recognition is converting a speaker's voice into something a computer has a chance of recognizing. That means translating your voice into something digital, which means numbers and more numbers.

Sampling. When we speak, the sounds we make travel through the air as vibrations in the form of an analog wave. An analog wave is like a ramp: it can take any of an infinite set of values. The height of the wave (its amplitude) tracks how loud the sound is, and how quickly the wave rises and falls (its frequency) tracks how high or low the pitch is. This is the most realistic form of sound because every inflection is represented at one of the infinite number of points on the wave.

However, today's computers don't deal with analog input; they deal with digital information. For a computer to interpret and output sound, it must turn analog waves into digital data through a process called sampling. Instead of recording the entire analog wave of sound, an analog-to-digital converter in your computer's sound card takes "snapshots" of the wave at regular points in time. Each snapshot is given a value, which is then translated into the digital language of binary code (a string of ones and zeros) that the computer can understand. When the sound is digitized, the result is more like a set of stairs than a ramp. Unlike an analog wave, a value can only be on one of the steps, not anywhere in between.
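To make the ramp-versus-stairs idea concrete, here is a minimal Python sketch of sampling. It is only an illustration, not how any particular sound card works: the 440Hz test tone, the rate of 8,000 snapshots per second, and the 3-bit step count are all assumptions chosen to keep the output short.

```python
import math

def sample_and_quantize(signal, n_samples, sample_rate_hz, bits):
    """Take regular snapshots of a signal and round each one to a fixed step."""
    levels = 2 ** bits                      # number of "stairs" available
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz              # time of this snapshot, in seconds
        value = signal(t)                   # analog value, assumed in [-1.0, 1.0]
        # Map the range [-1, 1] onto whole-number steps 0 .. levels - 1.
        codes.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return codes

def tone(t):
    """Illustrative stand-in for a voice: a pure 440Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

codes = sample_and_quantize(tone, n_samples=8, sample_rate_hz=8000, bits=3)
bits_out = [format(c, "03b") for c in codes]   # the ones and zeros the PC stores
print(codes)
print(bits_out)
```

With only three bits, every snapshot has to land on one of eight steps. Real sound cards use many more bits per sample, so the stairs are far finer, but the principle is the same.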



The Dragon NaturallySpeaking New User Wizard guides you through the process of setting up your own profile.

As a person speaks into a microphone, the electrical signal generated is divided up into tiny slices of time. Each of these slices is assigned a number based on the characteristics of the signal at that instant. This long string of numbers represents a digital version of the speaker's analog sentence.

Before the voice recognition software even begins thinking about what these numbers represent, it's helpful to narrow the jumble down to the human component only. The computer carefully listens for pauses in the speech so it can sample background noises and then remove those noises from the entire data set. Voice recognition programs can also compare the numbers to known characteristics of human speech to remove occasional extraneous noises, such as automobiles and ringing phones.
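The idea of listening for pauses to learn what the background sounds like can be sketched in a few lines of Python. This is a deliberately simple stand-in (a noise gate based on loudness), not the filtering any commercial package actually uses; the frame size, the "quietest quarter" rule, and the margin value are all assumptions made for illustration.

```python
def frame_energy(frame):
    """Average squared sample value: a rough measure of loudness."""
    return sum(x * x for x in frame) / len(frame)

def split_frames(samples, size):
    """Cut the sample list into equal slices, dropping any short tail."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def speech_flags(samples, frame_size=4, margin=4.0):
    """Return True for each frame that is clearly louder than the pauses."""
    energies = [frame_energy(f) for f in split_frames(samples, frame_size)]
    # Assume the quietest quarter of the frames are pauses, and use them
    # to estimate the level of the background noise.
    quiet = sorted(energies)[:max(1, len(energies) // 4)]
    noise_floor = sum(quiet) / len(quiet)
    return [e > noise_floor * margin for e in energies]

pause = [0.01, -0.02, 0.01, -0.01]     # made-up background hiss
speech = [0.5, -0.6, 0.4, -0.5]        # made-up spoken sound
flags = speech_flags(pause + speech + speech + pause)
print(flags)
```

Frames flagged False would be treated as background and set aside, leaving only the slices that look like a voice.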

Phonemes. Once the sounds are in a filtered digital format, the real work begins. At first, the problem may not seem all that difficult. Speak a particular word, and a particular string of numbers should result. If the computer runs the process in reverse, it should be able to recognize words from those incoming numbers. In practice, however, the task is a bit more challenging. Because no two people sound exactly alike, and because the same person can sound different depending on mood, excitement, fatigue, and other variables, sampling and simplistic raw matching is not enough. Voice recognition programs have to gamble a bit if they're going to be useful.

Researchers divide all the sounds that go into making up the words of a language into units called phonemes. These building blocks, such as the "p," "i," and "n" sounds that together make up "pin," are what voice recognition software attempts to find within the string of incoming numbers. Chances are the computer will recognize many of the phonemes it sees but not others. So, the software starts making educated guesses based on years of research into which phonemes typically follow others.
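A toy version of that educated guess might look like the Python sketch below. The four training words and their phoneme spellings are hypothetical, and real products draw on vastly larger bodies of research, but the principle of counting which sound tends to follow which is the same.

```python
from collections import Counter, defaultdict

# Hypothetical training data: words spelled out as phoneme sequences.
TRAINING = [
    ["p", "ih", "n"],   # pin
    ["p", "ih", "t"],   # pit
    ["t", "ih", "n"],   # tin
    ["p", "ae", "n"],   # pan
]

def count_transitions(sequences):
    """Count how often each phoneme is followed by each other phoneme."""
    follows = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            follows[prev][nxt] += 1
    return follows

def guess_next(follows, prev, candidates):
    """Pick the candidate that most often followed `prev` in training."""
    return max(candidates, key=lambda ph: follows[prev][ph])

follows = count_transitions(TRAINING)
# The software heard "p" clearly but can't tell whether "ih" or "ae" came next.
print(guess_next(follows, "p", ["ih", "ae"]))
```

Because "p" was followed by "ih" twice and by "ae" only once in this tiny sample, the sketch bets on "ih."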

These hunches about ambiguous sounds also are solidified based on what the software already knows about how you speak. In the early days of speech recognition, "training" time required speakers to read paragraph after paragraph of prechosen sentences into their microphones. Because the software knew ahead of time what the person was going to say, it could learn how an individual said the various phonemes. Today's more advanced programs skip these lengthy sessions. Often, they only need the speaker to say a few quick sentences up-front; the software learns the rest along the way as the speaker dictates sentences. As the software's database improves, it needs to make fewer and fewer corrections.

Statistical analysis. Once the software has a handle on which phonemes it just heard, it begins combining them into words. Again, statistical analysis is used to fill in the gaps. For example, sounds alone will never tell the software whether someone said "to," "two," or "too." The problem grows larger when you figure that many words, technically speaking, aren't supposed to sound the same but are pronounced similarly by people speaking quickly or in particular dialects.

Careful research into spoken and written language usually provides an answer for the computer. For instance, say the software is pretty sure it heard "I like" and then something with the "oo" phoneme. Based simply on the sound alone it could be "you," "two," "moo," or many other choices. But the software also knows not all of these words are likely to follow "I like." By combining the sound and word probability scores, the software can make a good guess that "I like you" is the correct sentence.
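Here is a rough Python sketch of how two kinds of evidence could be combined for the "I like" example. Every number in it is invented for illustration, and multiplying the two scores is just one simple way to combine them; the article does not say how any specific product does it.

```python
# Invented scores: how well each word matches the sound that was heard.
ACOUSTIC = {"you": 0.40, "two": 0.35, "moo": 0.25}

# Invented scores: how likely each word is to follow "I like".
FOLLOWS_I_LIKE = {"you": 0.30, "two": 0.02, "moo": 0.001}

def best_word(acoustic, language):
    """Multiply the sound score by the word-order score and keep the winner."""
    scores = {w: acoustic[w] * language.get(w, 0.0) for w in acoustic}
    return max(scores, key=scores.get), scores

word, scores = best_word(ACOUSTIC, FOLLOWS_I_LIKE)
print(word)
```

On sound alone the three candidates are close, but once word order is factored in, "you" wins comfortably.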



Applying Technology. Developers putting all of this voice recognition research to use generally explore one of three market-hungry applications: dictation, interface control, and security.

Dictation software, which allows users to speak into a computer microphone and watch their words pop up on the screen, is the most popular. Dictation programs have been steadily improving during the past decade, moving from a discrete speech model to today's more accurate natural language products. Discrete speech recognition technology requires a pause after each spoken word to separate the words cleanly and to let the computer catch up. Although this was better than typing for some people, the artificial stiltedness of the monologue kept such software from being swept into the mainstream.

Natural language technology, on the other hand, recognizes words at a more normal, conversational pace. Although a pause may still be necessary to add in punctuation and formatting, users of the latest software can speak in something approaching a regular pattern.

Several developers are vying for the growing natural language dictation market. Dragon Systems' NaturallySpeaking (http://www.dragonsystems.com) is probably the best known. IBM pioneered much of the research in this area and competes directly with Dragon through its ViaVoice software line (http://www.software.ibm.com/speech). Lernout & Hauspie recently teamed up with Microsoft to develop Voice Xpress (http://www.lhs.com/voicexpress), which comes in several versions and lets users navigate and dictate through Microsoft Office applications. In addition, European giant Philips Language Processing offers FreeSpeech 2000 (http://www.speech.philips.com/freespeech2000) for a variety of languages. The standard versions of leading packages usually cost about $100. Specialty users can spend more for add-ons and standalone solutions that are aimed at particular fields, such as law and medicine.

Many of the leading developers also build in compatibility tools that let mobile users take advantage of this technology. For example, Dragon offers NaturallySpeaking Mobile, a $229 digital recorder that you talk into and later plug into your PC, where NaturallySpeaking Preferred translates your voice.

Dictation technology also provides new and efficient ways for students to learn. Auralog's TeLL me More series (http://www.auralog.com) offers comprehensive language study that uses speech recognition as a sort of private tutor. The ministries of education in Spain and France endorse the software, and AT&T and Ford use it to train employees. In addition, an IBM-implemented program in Philadelphia uses speech recognition to help teach kids how to read. Students use an on-screen cartoon tutor to help identify pronunciation mistakes. As a result, teachers can use the program to tailor lessons to each student's individual needs.



At Your Command. Besides being a fast typing tool or an educational resource, speech recognition also carries the potential to eliminate the mouse. Interface software turns voice commands into the clicks, drags, and codes that make programs run. Today's software often includes the capability to open and close windows, click buttons, or run Web browsers. (See the "Browse The Web With Your Voice" sidebar.)

Command recognition is also branching out to noncomputer uses. Perhaps you've used a telephone voice mail or switching system that lets you say numbers rather than punch them in by hand. Many new wireless phones let drivers dial numbers by voice so they can keep their hands on the steering wheel. Microsoft promotes a similar idea called Auto PC, which is a voice interface device for your car that works as a CD player and information manager to give you directions, read e-mail aloud, and skip to selected songs on your compact discs.

Combine dictation and interface capabilities, and you're well on your way to a computer that needs neither the keyboard nor the mouse. Today's versions are still a little clunky, but the components are workable and improving with each version. No one doubts that speech recognition will grow to become one of the most important ways in which people interface with computers. The technology promises a convenient way to control home appliances, quickly accomplish routine tasks, or even navigate from point A to point B in your car by asking questions and getting constantly updated directions spoken back to you.



Training requires new users to read a few minutes' worth of text so the software can begin adjusting its vocabulary to your speech patterns.


Stop Right There. Security is another up-and-coming role for speech recognition, but it's one you may not think of right away. Biometric technology punts hard-to-remember passwords in favor of voice identification and verification. Companies are beginning to test systems that guard access to ATMs, computers, voice mail, and wireless phones. Veritel's VoiceCrypt software (http://www.veritelcorp.com) brings the idea to consumer machines. Users can use the program to lock others out of selected files or their entire computer.

Voice recognition also can be used to secure people. VoiceTrack (http://www.voicetrack.com) is a program designed to track parolees, probationers, pretrial defendants, offenders who haven't been sentenced yet, juveniles, and individuals on work release. The voice verification feature allows corrections officers to determine whether someone is where he or she is supposed to be at any given time. With a single phone call, the system matches the subject's voiceprint and verifies the person's identity.

That's not the end of the story for this diverse technology. New companies and new uses for speech recognition products are popping up all the time. Check out the Web site at http://www.tiac.net/users/rwilcox/speech.html#NEW for a comprehensive list of speech recognition industry links. For example, Vocal Systems (http://www.vocalsystems.com) is set to begin marketing a program called OmniBabel, which not only recognizes speech but also translates it into another language and then speaks it in that language. It won't be long before language-translation services are cheap enough to facilitate telephone and videophone conversations between business users and friends in different countries.



Speaking Of Today. The only way to make sure the buzz over speech recognition isn't all talk is to try it for yourself. We'll take a closer look at installing, training, and using the market leader, Dragon's NaturallySpeaking 4.0. Investing the time and money Dragon requires still isn't for everyone, but you may be surprised how well dictation and interface software works today.

To begin, load the CD in your CD-ROM drive and click the Install button. After the setup process completes, open the Dragon NaturallySpeaking program in your Start menu or click the newly minted menu button in Microsoft Word 97/2000 or Corel WordPerfect 8/9. If this is the first time you've run the program, the first thing you should see is a user dialog box. Click New to begin the process of setting up your own profile.

The New User Wizard appears to guide you through the process of setting up the program to recognize your voice and adjust the hardware and software to run best in your environment. Click the Next button to begin. In the next box, enter your name. The default choices in the other drop-down menus are best for new users. Click Next again to proceed.

After a few moments of loading files, the wizard asks how you will be using the program. Most users should select the first choice, speaking Directly To The Computer. However, if you will be dictating into a portable recorder for later translation into text, pick the second option. Click Next again.

Next, the program wants to know how you will be speaking to the machine. The USB (Universal Serial Bus) version of NaturallySpeaking comes with a special USB microphone for more accurate recording. For other versions, you will probably want to select the name of your sound card where you plug in a standard microphone or headset. Click OK to continue.

Now the wizard tests your microphone levels. Click the Next button to move past the explanation dialog box. Pick the radio button next to the type of microphone you will be using. In most cases, headset is the best option.

On the following screen, you'll see a picture with a brief description of where you should place the microphone. When you're all set, move to the next screen. Click the Start Adjusting button and read out loud the text you see. The software adjusts the volume to compensate for how loudly or softly you speak. When you are done, if you think you didn't speak as you normally would, click Start Adjusting to start over. When you're finished, go to the next screen.

Click the Start Quality Check button so the software can test how well your microphone and sound card work together. Then click Finish.



Teach It A Lesson. Now it's time for General Training, so click the Continue button to get started. The software presents a couple of test sentences; click the Record button and read them aloud. After that, choose from the list of longer selections and read them aloud as you did the other sentences. If you make a mistake or the computer doesn't understand what you said, a yellow arrow will appear to tell you where to start over. The process takes about five minutes. When you're finished with the training, click OK.



NaturallySpeaking's correction screen pops up whenever you tell the software it has made a mistake. You choose the correct option, and the program then "learns" how you speak.

NaturallySpeaking then calibrates its calculations to match your voice. When it is done, the wizard box reappears with a button titled Run Vocabulary Builder. It's not necessary to do this right away, but it helps the software's accuracy.

Inside the Vocabulary Builder, the first screen begins by outlining the process. Click Next to get going. The following screen lets you browse the files on your hard drive or network for a customized word list. Unless you've worked with a similar program in the past, you probably don't have a list of unusual words you regularly use. Click Next for other options.

The next screen lets you add any sort of word processor file stored on your drives for the program to analyze. It's a good idea to select documents that represent the kinds of things you frequently write about and any document with specialized vocabulary you're likely to use again. Click Add to fetch a few and then click the Analyze Documents button. The software runs through the files and looks for any words it does not know. When it's through, click Next.
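The document analysis described above boils down to comparing the words in your files against the words the program already knows. The Python sketch below shows that idea only; the tiny known-word set and the sample sentence are made up, and it makes no claim about how the Vocabulary Builder itself is written.

```python
import re
from collections import Counter

def unknown_words(text, known):
    """List words not in the known vocabulary, most frequent first."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(w for w in words if w not in known)
    return counts.most_common()

# Hypothetical starting vocabulary and a snippet from a user's document.
known = {"the", "patient", "has", "a"}
document = "The patient has tachycardia. Tachycardia persists."

result = unknown_words(document, known)
print(result)
```

Sorting by how often each word appears puts the specialized terms you use most at the top of the list, where they are easiest to spot and check.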

Now look through the list of words that appears and check those you want the software to know. Click Add Checked Words To Vocabulary and follow the instructions to make the words part of the program's database. Click Next twice and then click Finish so the software can analyze the dictation style from your documents.

Back in the wizard, the program asks you to take the Quick Tour to learn basic dictation techniques. Unless you're pressed for time or already know the drill, this is a useful introduction to how voice recognition software works.

Finally, click Finish to try it yourself. Open your word processor, put on the headset, and look for the microphone icon in the Windows Taskbar System Tray. If it is lying down, click it with the mouse to make it start listening. When the microphone is pointing up at a 45-degree angle, you can begin speaking your first sentences.



Let Your Voice Be Heard. Don't expect too much out of the system the first time you speak into your microphone. Unfortunately, the days of computers carrying on conversations with their owners are not quite here yet. Your first day with a voice recognition package won't be the last day you touch a keyboard. In fact, you may find that the first few times you use the software, it doesn't seem to recognize your words very well at all.

Just don't give up right away. As with anything else, it takes some practice to figure out how to best use a speech recognition package. We found it helps if you speak clearly and enunciate your words. There is no need to overcompensate, but we had the best results when we spoke just a bit more slowly than if we were speaking with a real person. Of course, that probably depends on how fast you normally speak. In either case, accuracy should improve as the software begins to understand your particular speech patterns.

When the program makes a mistake in translating your voice, it is very important that you correct it. With NaturallySpeaking, you usually can just say "correct that" when you notice a bad translation. In the box that appears, select or type the correct phrase. Each correction you make helps improve the program's accuracy. For instance, we initially had trouble with similar-sounding words, such as Jack, chat, and cat. A few corrections brought a great deal of improvement, although the program still fell a bit short of perfection.

It's also a good idea to learn the basic formatting commands that make word processors such useful editing tools. Take a trip through the instructions that came with your speech recognition software so you know how to do things such as move the cursor around the screen, select and deselect text, and operate the program's common commands. For example, NaturallySpeaking lets you say the word "click" followed by the name of almost any menu or on-screen button.

Once you have the basics down for controlling work within a program, take a trip into the broader realm of Windows. NaturallySpeaking lets you use your voice to do most of the things a mouse can do, such as open programs, click menu options, and switch between windows. The program also functions with just about any text-related program you might use, so it's possible to dictate and send e-mail, put numbers in spreadsheet cells, and enter information into databases.

We did find a few limitations as we dictated this article and gave speech recognition a tour through some other office work. Dictation can be enjoyable, but fast-paced editing seems a little clunky when you're relying on voice commands rather than the mouse. Although hands-free manipulation of words, sentences, and paragraphs is possible, it remains more time-consuming than we'd like.

Of course, technology that requires sustained speaking also is not conducive to environments such as the classroom, the library, a busy open office, or meetings. And just like your mother always used to tell you, we don't recommend eating and speaking into the microphone at the same time.



The Bottom Line. Most of the time, however, when you're sitting alone in your cubicle or in your private office, speech recognition gives you a great alternative for completing many jobs. There are a number of reasons to climb over that learning curve. Some people may think more clearly while speaking rather than typing. Slow, and even fast, typists may find they can dictate faster than they can move their fingers. Painful conditions, such as carpal tunnel syndrome, also can be a big incentive to make voice recognition work. It's also nice just to be able to get up and walk around rather than being strapped down to your desk for eight hours a day. Unless your walking and talking bothers your co-workers, the only limit is the length of your microphone cord, so stretch your legs.

No, speech recognition still isn't 100% accurate; on the other hand, typing isn't 100% accurate, either. Until someone invents a computer that can simply read our thoughts, there will always be errors in the translation between the human brain and the screen. You might as well let your fingers off the hook and give them a break on occasion.

by Alan Phelps


Key Points

• Voice recognition software uses a three-step process of digital sampling, recognizing phonemes, and statistical analysis to enter what you say on-screen. Although the software still isn't perfect, it has steadily improved in the past few years.

• You can use voice recognition software for more than simple dictation. Other uses include controlling your computer interface, security, and even browsing the Web.

• One of the first steps to using any voice recognition package is training it to understand your voice. Follow your program's instructions carefully so you get the best results.

• The more you use your software, the better it will become at recognizing your voice. If the software makes mistakes early on, be sure you correct it.


Browse The Web With Your Voice

Although dictation abilities get most of the press, voice recognition software is good for other jobs, too. One of the most interesting that we discovered using Dragon NaturallySpeaking is the ability to surf the Web without touching the keyboard.

Getting into gear doesn't take long with NaturallySpeaking or similar programs, such as Conversa Web and IBM's ViaVoice. The common commands of your Web browser are fairly easy to master. For instance, in NaturallySpeaking, the "go back" command tells the system to load the previous page. "Go forward" moves you ahead one page, and "go home" tells the browser to display your start page. You can even tell the browser to go to sites stored in your Favorites file.

The most impressive capability, however, is the way the software lets you click from page to page simply by speaking up. In NaturallySpeaking, users say the first few words of the links they want to follow. For example, if you happen to be viewing a page of links to news stories, you can just start to read one of the headlines. A red arrow appears on-screen to indicate the link you have chosen, and the new page loads automatically. ViaVoice displays small numbers next to various links, and speaking the number aloud is equivalent to a mouse-click.
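Following a link by reading its first few words can be pictured as a simple prefix match, as in this Python sketch. The headlines are invented, and the matching rule is our own assumption about the simplest way such a feature could work, not a description of either product's internals.

```python
def match_link(spoken, links):
    """Return the first link whose opening words match what was said."""
    said = spoken.lower().split()
    if not said:
        return None                     # nothing recognized, nothing to follow
    for text in links:
        if text.lower().split()[:len(said)] == said:
            return text
    return None

# Invented headlines standing in for the links on a news page.
links = ["Senate passes budget bill", "Storm closes coastal roads"]

print(match_link("storm closes", links))
print(match_link("weather", links))
```

If two links begin with the same words, this sketch simply takes the first one. A numbered-link scheme like the one in ViaVoice sidesteps that ambiguity.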

Although the feature isn't quite as intuitive, these software packages also let you fill in Web page forms so you can perform searches, buy things online, or enter text on your favorite chat pages.

Would you want to surf by voice all the time? Probably not. Rabid Web users will find a voice-activated system responds more slowly than simple furious mouse-clicking. But it's nice to lean back once in a while, put your feet on the desk, and speak your way through the Web.


Terms To Know

biometrics—Technology used to either identify or verify the identity of someone using one or more physical characteristics, such as the sound of their voice. Biometric systems can be an easy-to-use, secure substitute for passwords or keys.

discrete speech recognition—Older technology that requires users to pause between words to allow time for the software to keep up with what the user is saying.

natural language—The growing ability of voice recognition programs to translate sounds into screen text nearly as fast as people naturally speak.

phonemes—The basic building blocks of language, phonemes are the sounds that people string together to form words and sentences. Voice recognition software begins its work by breaking down incoming data into phonemes and using statistical analysis to fill in the occasional phonemes it can't quite understand.

sampling—Part of the process of turning analog data, such as human speech, into digital form that a computer can manipulate. The analog data is divided into tiny "slices" and assigned a set value that can be represented in digital form.

training—The first step in using any voice recognition package is going through a training process that gives the software an idea of how a particular person sounds when he or she says different words. Thanks to ongoing advancements in the voice recognition field, training time has been reduced with each new software version. Many programs now feature a kind of "on-the-job" training that lets users begin real work much sooner than previously possible.








Copyright © by Sandhills Publishing Company 2010. All rights reserved.