Use Voice Recognition Software
Communication, July 2000, Vol. 6 Issue 7, Page(s) 71-75 in print issue
Free Your Hands From The Keyboard
Even if your computer doesn't understand what you say, the newest software can at least recognize it. Today's voice recognition software comes closer than ever to the goal of making keyboards optional. We aren't there yet, but for a variety of users, voice recognition software is well worth the money. Below, we take a look at how voice recognition works and how it can work for you.
Sampling. When we speak, the sounds we make travel through the air as vibrations in the form of an analog wave. An analog wave is like a ramp that has an infinite set of values. The height of the wave (its amplitude) determines how loud the sound is, and how quickly the wave rises and falls (its frequency) determines the pitch. This is the most realistic form of sound because every inflection is represented at one of the infinite number of points on the wave.
However, today's computers don't deal with analog input; they deal with digital information. For a computer to interpret and output sound, it must turn analog waves into digital data through a process called sampling. Instead of recording the entire analog wave of sound, an analog-to-digital converter in your computer's sound card takes "snapshots" of the wave at regular intervals. Each snapshot is given a value, which is then translated into binary code (a string of ones and zeros) that the computer can understand. When the sound is digitized, the result is more like a set of stairs than a ramp. Unlike an analog wave, a value can only be on one of the steps, not anywhere in between.
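To make the ramp-versus-stairs idea concrete, here is a minimal Python sketch of sampling. The 440Hz tone, the 8,000-snapshots-per-second rate, and the 3-bit depth are our own illustrative assumptions, not the settings of any particular sound card, and the function names are hypothetical.

```python
import math

def tone(t):
    """A stand-in for the analog wave: a 440 Hz sine with values from -1.0 to 1.0."""
    return math.sin(2 * math.pi * 440.0 * t)

def sample_and_quantize(signal, num_samples, sample_rate_hz, bits):
    """Take evenly spaced snapshots of `signal` and round each one to a step."""
    levels = 2 ** bits                      # number of "stairs" available
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz              # time of this snapshot, in seconds
        value = max(-1.0, min(1.0, signal(t)))
        # Map the range -1.0 .. 1.0 onto the whole-number steps 0 .. levels - 1.
        codes.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return codes

# Eight snapshots at 8,000 per second, each stored on one of 2**3 = 8 steps.
codes = sample_and_quantize(tone, num_samples=8, sample_rate_hz=8000, bits=3)
binary = [format(code, "03b") for code in codes]   # the ones and zeros
print(codes)
print(binary)
```

Raising `bits` adds more steps, and raising `sample_rate_hz` takes snapshots more often; both bring the staircase closer to the original ramp at the cost of more data.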
Before the voice recognition software even begins thinking about what these numbers represent, it's helpful to narrow the jumble down to the human component only. The computer carefully listens for pauses in the speech so it can sample background noises and then remove those noises from the entire data set. Voice recognition programs can also compare the numbers to known characteristics of human speech to remove occasional extraneous noises, such as automobiles and ringing phones.
Phonemes. Once the sounds are in a filtered digital format, the real work begins. At first, the problem may not seem all that difficult. Speak a particular word, and a particular string of numbers should result. If the computer runs the process in reverse, it should be able to recognize words from those incoming numbers. In practice, however, the task is a bit more challenging. Because no two people sound exactly alike, and because the same person can sound different depending on mood, excitement, fatigue, and other variables, sampling and simplistic raw matching are not enough. Voice recognition programs have to gamble a bit if they're going to be useful.
Researchers divide all the sounds that go into making up the words of a language into units called phonemes. These building blocks, such as the "p" and "in" sounds that together make up "pin," are what voice recognition software attempts to find within the string of incoming numbers. Chances are the computer will recognize many of the phonemes it sees but not others. So, the software starts making educated guesses based on years of research into which phonemes typically follow others. These hunches about ambiguous sounds also are solidified based on what the software already knows about how you speak.
In the early days of speech recognition, "training" time required speakers to read paragraph after paragraph of prechosen sentences into their microphones.
Because the software knew ahead of time what the person was going to say, it could learn how an individual said the various phonemes. Today's more advanced programs skip these lengthy sessions. Often, they only need the speaker to say a few quick sentences up-front; the software learns the rest along the way as the speaker dictates sentences. As the software's database improves, fewer and fewer corrections are needed.
Statistical analysis. Once the software has a handle on which phonemes it just heard, it begins combining them into words. Again, statistical analysis is used to fill in the gaps. For example, sounds alone will never tell the software whether someone said "to," "two," or "too." The problem grows larger when you figure that many words, technically speaking, aren't supposed to sound the same but are pronounced similarly by people speaking quickly or in particular dialects.
Careful research into spoken and written language usually provides an answer for the computer. For instance, say the software is pretty sure it heard "I like" and then something with the "oo" phoneme. Based on the sound alone, it could be "you," "two," "moo," or many other choices. But the software also knows not all of these words are likely to follow "I like." By combining the sound and word probability scores, the software can make a good guess that "I like you" is the correct sentence.
Dictation software, which allows users to speak into a computer microphone and watch their words pop up on the screen, is the most popular type of voice recognition software. Dictation programs have been steadily improving during the past decade, moving from a discrete speech model to today's more accurate natural language products. Discrete speech recognition technology requires pauses between each spoken word to provide distinct phonemes and to let the computer catch up.
Although this was better than typing for some people, the artificial stiltedness of the monologue kept such software from being swept into the mainstream. Natural language technology, on the other hand, recognizes words at a more normal, conversational pace. Although a pause may still be necessary to add in punctuation and formatting, users of the latest software can speak in something approaching a regular pattern.
Several developers are vying for the growing natural language dictation market. Dragon Systems' NaturallySpeaking (http://www.dragonsystems.com) is probably the best known. IBM pioneered much of the research in this area and competes directly with Dragon through its ViaVoice software line (http://www.software.ibm.com/speech). Lernout & Hauspie recently teamed up with Microsoft to develop Voice Xpress (http://www.lhs.com/voicexpress), which comes in several versions and lets users navigate and dictate through Microsoft Office applications. In addition, European giant Philips Language Processing offers FreeSpeech 2000 (http://www.speech.philips.com/freespeech2000) for a variety of languages.
The standard versions of leading packages usually cost about $100. Specialty users can spend more for add-ons and standalone solutions that are aimed at particular fields, such as law and medicine. Many of the leading developers also build in compatibility tools that let mobile users take advantage of this technology. For example, Dragon offers NaturallySpeaking Mobile, a $229 digital recorder that you talk into and later plug into your PC, where NaturallySpeaking Preferred translates your voice into text.
Dictation technology also provides new and efficient ways for students to learn. Auralog's TeLL me More series (http://www.auralog.com) offers comprehensive language study that uses speech recognition as a sort of private tutor. The ministries of education in Spain and France endorse the software, and AT&T and Ford use it to train employees.
In addition, an IBM-implemented program in Philadelphia uses speech recognition to help teach kids how to read. Students use an on-screen cartoon tutor to help identify pronunciation mistakes. As a result, teachers can use the program to tailor lessons to each student's individual needs.
Command recognition is also branching out to noncomputer uses. Perhaps you've used a telephone voice mail or switching system that lets you say numbers rather than punch them in by hand. Many new wireless phones let drivers dial numbers by voice so they can keep their hands on the steering wheel. Microsoft promotes a similar idea called Auto PC, which is a voice interface device for your car that works as a CD player and information manager to give you directions, read e-mail aloud, and skip to selected songs on your compact discs.
Combine dictation and interface capabilities, and you're well on your way to a computer that needs neither the keyboard nor the mouse. Today's versions are still a little clunky, but the components are workable and improving with each version. No one doubts that speech recognition will grow to become one of the most important ways in which people interface with computers. The technology promises a convenient way to control home appliances, quickly accomplish routine tasks, or even navigate from point A to point B in your car by asking questions and getting constantly updated directions spoken back to you.
Voice recognition can also be used to keep track of people. VoiceTrack (http://www.voicetrack.com) is a program designed to track parolees, probationers, pretrial defendants, offenders who haven't been sentenced yet, juveniles, and individuals on work release. The voice verification feature allows corrections officers to determine whether someone is where he or she is supposed to be at any given time. With a single phone call, the system matches the subject's voiceprint and verifies the person's identity.
That's not the end of the story for this diverse technology. New companies and new uses for speech recognition products are popping up all the time. Check out the Web site at http://www.tiac.net/users/rwilcox/speech.html#NEW for a comprehensive list of speech recognition industry links. For example, Vocal Systems (http://www.vocalsystems.com) is set to begin marketing a program called OmniBabel, which not only recognizes speech but also translates it into another language and then speaks it in that language. It won't be long before language-translation services are cheap enough to facilitate telephone and videophone conversations between business users and friends in different countries.
To begin, load the Dragon NaturallySpeaking CD in your CD-ROM drive and click the Install button. After the setup process completes, open the Dragon NaturallySpeaking program in your Start menu or click the newly minted menu button in Microsoft Word 97/2000 or Corel WordPerfect 8/9. If this is the first time you've run the program, a user dialog box should be the first thing you see. Click New to begin setting up your own profile.
The New User Wizard appears to guide you through the process of setting up the program to recognize your voice and adjusting the hardware and software to run best in your environment. Click the Next button to begin. In the next box, enter your name. The default choices in the other drop-down menus are best for new users. Click Next again to proceed.
After a few moments of loading files, the wizard asks how you will be using the program. Most users should select the first choice, Speaking Directly To The Computer. However, if you will be dictating into a portable recorder for later translation into text, pick the second option. Click Next again.
Next, the program wants to know how you will be speaking to the machine. The USB (Universal Serial Bus) version of NaturallySpeaking comes with a special USB microphone for more accurate recording. For other versions, you will probably want to select the name of the sound card where you plug in a standard microphone or headset. Click OK to continue.
Now the wizard tests your microphone levels. Click the Next button to move past the explanation dialog box. Pick the radio button next to the type of microphone you will be using. In most cases, a headset is the best option. On the following screen, you'll see a picture with a brief description of where you should place the microphone. When you're all set, move to the next screen.
Click the Start Adjusting button and read out loud the text you see. The software adjusts the volume to compensate for how loudly or softly you speak. When you are done, if you think you didn't speak as you normally would, click Start Adjusting to start over. When you're finished, go to the next screen. Click the Start Quality Check button so the software can test how well your microphone and sound card work together. Then click Finish.
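The wizard doesn't say how it picks a volume setting, so the following Python sketch is only our guess at the general idea, not Dragon's actual method: measure the average loudness of what you read aloud, then choose a boost that brings it to a comfortable level. The function names, the target level of 0.2, and the gain cap of 10 are all hypothetical.

```python
import math

def rms(samples):
    """Average loudness (root mean square) of samples scaled from -1.0 to 1.0."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def suggest_gain(samples, target_rms=0.2, max_gain=10.0):
    """Return the multiplier that would bring the recording to the target level."""
    level = rms(samples)
    if level == 0:
        return max_gain                 # total silence: boost as far as allowed
    return min(max_gain, target_rms / level)

# A soft-spoken reader whose samples hover around 0.05 needs roughly a 4x boost.
quiet_reading = [0.05, -0.05] * 100
print(suggest_gain(quiet_reading))
```

A loud speaker would get a multiplier below 1, which turns the microphone down instead of up.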
Inside the Vocabulary Builder, the first screen begins by outlining the process. Click Next to get going. The following screen lets you browse the files on your hard drive or network for a customized word list. Unless you've worked with a similar program in the past, you probably don't have a list of unusual words you regularly use. Click Next for other options.
The next screen lets you add any sort of word processor file stored on your drives for the program to analyze. It's a good idea to select documents that represent the kinds of things you frequently write about and any document with specialized vocabulary you're likely to use again. Click Add to fetch a few and then click the Analyze Documents button. The software runs through the files and looks for any words it does not know. When it's through, click Next.
Now look through the list of words that appears and check those you want the software to know. Click Add Checked Words To Vocabulary and follow the instructions to make the words part of the program's database. Click Next, then Next again, and then Finish so the software can analyze the dictation style from your documents.
Back in the wizard, the program asks you to take the Quick Tour to learn basic dictation techniques. Unless you're pressed for time or already know the drill, this is a useful introduction to how voice recognition software works. Finally, click Finish to try it yourself.
Open your word processor, put on the headset, and look for the microphone icon in the Windows Taskbar System Tray. If it is lying down, click it with the mouse to make it start listening. When the microphone is pointing up at a 45-degree angle, you can begin speaking your first sentences.
Just don't give up right away. As with anything else, it takes some practice to figure out how to best use a speech recognition package. We found it helps if you speak clearly and enunciate your words.
There is no need to overcompensate, but we had the best results when we spoke just a bit more slowly than if we were speaking with a real person. Of course, that probably depends on how fast you normally speak. In either case, accuracy should improve as the software begins to understand your particular speech patterns.
When the program makes a mistake in translating your voice, it is very important that you correct it. With NaturallySpeaking, you usually can just say, "correct that" when you notice a bad translation. In the box that appears, select or type the correct phrase. Each correction you make helps improve the program's accuracy. For instance, we initially had trouble with similar-sounding words, such as Jack, chat, and cat. A few corrections brought a great deal of improvement, although the program still fell a bit short of perfection.
It's also a good idea to learn the basic formatting commands that make word processors such useful editing tools. Take a trip through the instructions that came with your speech recognition software so you know how to do things such as move the cursor around the screen, select and deselect text, and operate the program's common commands. For example, NaturallySpeaking lets you say the word "click" followed by the name of almost any menu or on-screen button.
Once you have the basics down for controlling work within a program, take a trip into the broader realm of Windows. NaturallySpeaking lets you use your voice to do most of the things a mouse can do, such as open programs, click menu options, and switch between windows. The program also functions with just about any text-related program you might use, so it's possible to dictate and send e-mail, put numbers in spreadsheet cells, and enter information into databases.
We did find a few limitations as we dictated this article and gave speech recognition a tour through some other office work.
Dictation can be enjoyable, but fast-paced editing seems a little clunky when you're relying on voice commands rather than the mouse. Although hands-free manipulation of words, sentences, and paragraphs is possible, it remains more time-consuming than we'd like. Of course, technology that requires sustained speaking also is not conducive to environments such as the classroom, the library, a busy open office, or meetings. And just like your mother always used to tell you, we don't recommend eating and speaking into the microphone at the same time.
No, speech recognition still isn't 100% accurate; on the other hand, typing isn't 100% accurate, either. Until someone invents a computer that can simply read our thoughts, there will always be errors in the translation between the human brain and the screen. You might as well let your fingers off the hook and give them a break on occasion.
by Alan Phelps