Voice security may sound futuristic, but speech systems have been under development for more than 30 years. Texas Instruments (TI) pioneered this work, actively researching the human voice and developing systems as early as the 1960s. As far back as the 1970s, TI produced voice security systems for combat pilots in the U.S. Air Force. AT&T, Telcordia Technologies, IDIAP (in Switzerland), Sandia National Laboratories, the Speech Science Institute in conjunction with Lawrence Livermore National Laboratories, and a number of the world's universities have also conducted research into voice recognition security systems.
Biometrics. Biometrics is a science that uses an individual's unique identifiers, which automated processes can read easily. The different categories of biometrics include fingerprints, finger geometry, hand geometry measurements, iris and retinal scans, signature verification, facial recognition, and voice recognition. There's no universal answer to which type of biometric is best. Selection and implementation depend on which solution fits the application and on resources such as available money, personnel, and environment. Physical condition and other considerations also enter into the picture. For instance, people who work with their hands in construction, art, or similar occupations where their fingers are exposed to chemicals or the elements frequently have fingerprints that are worn or not well defined. In such instances, another type of biometric, such as voice, is a better choice. Additionally, people with disabilities that make it difficult to use their hands and arms can benefit from voice biometrics instead of anything that requires them to use their hands.
Voice Biometrics. To use voice biometrics, the user states a given word or phrase three times so the system can create a template. This template is based on numerous voice characteristics, including pitch, tone, cadence, and shape of the larynx. The template is generally stored in a database for later comparison with people's voices. With our rapidly changing technology, though, it has become possible to store the template on a microchip and embed the microchip into a device. Voice biometrics is considered a hybrid behavioral and physiological biometric because, while the voice pattern is largely determined by the physical shape of the throat and larynx, the user can alter the voice. Recording devices and background noise can affect voice biometrics, too. In the early voice biometrics systems, when people had colds or allergies, their voices could not always be matched against their own voice prints. Today, however, voice biometrics has been considerably refined, and voice patterns depend on several characteristics, such as cadence and tone. With these improvements to the technology, even people with colds can normally match their voice prints without incident.
Just to be certain that the products we looked at actually work and aren't just hype on the part of the manufacturers, we imposed upon several friends suffering from colds, flu, and allergies to test various voice recognition products. With all the products, whether for encryption, access control, or dictation, once the user "trained" the software in the tone and cadence of his or her voice, nasal congestion and vocal changes didn't appear to have any significant effect on using the software. It must be noted that even without a cold or allergies, it sometimes took more than one try for the software to identify the user and carry out instructions. We attribute this to both background noise and failure to speak clearly into the microphone when making a voiceprint.
With a cold or allergies, the software generally worked well, and only occasionally did words or directions need to be repeated. While this may have been due to the cold or allergies, it seemed to be caused by the speaker's failure to speak clearly into the microphone or by background noise. However, it is important to note that in one instance where the speaker was extremely congested and her voice was very hoarse, the software did not identify and verify the speaker, and the software could not be used. In that instance, it was necessary to circumvent the software and use the old-fashioned method of typing commands instead of speaking them.
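The enrollment step described at the start of this section, in which three repetitions of a phrase are combined into one template, can be sketched roughly as follows. This is only an illustration in Python, not how any product named in this article works: it assumes each repetition has already been reduced to a fixed-length list of numeric features, and the function name build_template and the sample values are hypothetical.

```python
from statistics import fmean

def build_template(utterances):
    """Average several feature vectors (one per spoken repetition) into one template."""
    if len(utterances) < 3:
        raise ValueError("enrollment expects at least three repetitions")
    length = len(utterances[0])
    if any(len(u) != length for u in utterances):
        raise ValueError("all feature vectors must have the same length")
    # Average each feature position across the repetitions.
    return [fmean(column) for column in zip(*utterances)]

# Hypothetical feature values (for example, pitch in Hz and a cadence measure).
repetitions = [
    [118.0, 4.0],
    [122.0, 5.0],
    [120.0, 6.0],
]
template = build_template(repetitions)
print(template)
```

Averaging is the simplest possible way to combine repetitions; it smooths out small differences between one attempt and the next, which is one reason systems ask for the phrase more than once.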
Understanding Speech Recognition. Let's examine how speech recognition works. Characteristics of speech are speaker specific and are due to differences in physiological and behavioral aspects of human speech production. The main physiological difference is vocal tract shape. This area, found above the vocal folds, is generally considered to be the center of speech production. An acoustic wave is produced when the trachea carries airflow from the lungs through the vocal cords. The source of excitation can be one or a combination of whispering, vibration, compression, phonation, or frication. As an acoustic wave passes through the vocal tract, modifications of the spectral content occur, and speech is produced. Within a large, widely distributed user group, voice security is the most flexible and efficient, and the least expensive, biometric. Your voice is unique to you alone. Stealing your voice is virtually impossible even when voices are imitated professionally. Because voice security works with an ordinary PC microphone (including those in headsets) or a standard telephone, costs are driven down. Each verification attempt produces a score, and a preset threshold of reliability or acceptance determines what happens to that score. Different businesses and different individuals will have different priorities when it comes to setting a matching or acceptable confidence level, so each organization's administrators can set their own acceptable statistical score. When a speaker calls in, if the verification score is higher than the preset threshold, the user is accepted (identified and verified). If the score is lower, the speaker is denied access.
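The accept-or-deny decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name decide, the organization names, and the numeric scores and thresholds are all hypothetical, and treating a score exactly equal to the threshold as a pass is a choice made here, not something the products specify.

```python
def decide(score, threshold):
    """Return True (accept) when the verification score reaches the threshold."""
    # Treating a score exactly equal to the threshold as a pass is an assumption.
    return score >= threshold

# Hypothetical thresholds chosen by two different administrators.
THRESHOLDS = {"bank": 0.90, "voicemail": 0.70}

score = 0.82  # hypothetical verification score for one caller
for organization, threshold in THRESHOLDS.items():
    outcome = "accepted" if decide(score, threshold) else "denied"
    print(f"{organization}: {outcome}")
```

The same caller with the same score is denied by the stricter organization and accepted by the more lenient one, which is the trade-off administrators are making when they pick a confidence level.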
Voice Recognition Software. Voice or speaker recognition is the process of automatically recognizing who is speaking based on individual information included in the voice or speech. Voice recognition is divided into voice identification and voice verification. Many people think these are the same thing, but they're not. Voice identification starts with a set of known speakers who have provided samples of their speech; a match picks out one voice from among the many in that group. Voice verification deals with a single claimed identity: the speaker is either accepted or rejected, and in this way the voice is verified as belonging to whoever the speaker claims to be. Voice recognition allows a speaker to take control of access to restricted locations, services, and applications. Because of its increasing popularity, most of us are exposed to or use voice recognition in our daily lives without even realizing that's what we're using.
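The difference between identification (one voice matched against many) and verification (one voice checked against one claimed identity) can be made concrete with a short Python sketch. Everything here is illustrative: the feature vectors, the speaker names, the 0.95 threshold, and the use of cosine similarity as the comparison are assumptions chosen for simplicity, not the method used by any vendor mentioned in this article.

```python
import math

def similarity(a, b):
    """Cosine similarity between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify(sample, enrolled):
    """Identification: pick the best match from a set of known speakers."""
    return max(enrolled, key=lambda name: similarity(sample, enrolled[name]))

def verify(sample, claimed_name, enrolled, threshold=0.95):
    """Verification: accept or reject one claimed identity."""
    template = enrolled.get(claimed_name)
    if template is None:
        return False  # nobody enrolled under that name
    return similarity(sample, template) >= threshold

# Hypothetical enrolled templates and one incoming voice sample.
enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.3]}
sample = [0.9, 0.1, 0.2]

print(identify(sample, enrolled))         # which known speaker is closest?
print(verify(sample, "alice", enrolled))  # is this really alice?
print(verify(sample, "bob", enrolled))    # is this really bob?
```

Note that identify always returns someone from the enrolled group, even for a stranger, while verify can say no. That is why access control is normally built on verification.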
Uses Of Voice Security. Voice security, consisting of voice identification and voice verification, is generally used in instances where identity needs to be verified remotely across telephone lines. There are also applications for privacy, where access to something or some place is restricted in order to maintain the privacy and integrity of the thing or place. There is a growing number of vendors offering voice security products or services, including SAFlink, VeriVoice, Veritel Corporation (Veritel), Lernout & Hauspie Speech Products (L&H), Philips, SpeechWorks Systems, Keyware, Qvoice, Voice Security Systems, IBM, ITT, Buytel, Sentry Systems, T-NETIX, InterVoice-Brite, GTE, and Nuance Communications (Nuance), to name a few. While the cost of larger systems is generally based on the number of users, reasonably priced products are now available for the individual user. Since voice security is generally an add-on, hardware costs are minimal.
Cellular Industry. The cellular market is a billion-dollar market that is growing rapidly. But frequently, people make fraudulent calls on others' cell phones and services and run up large bills. When someone discovers that the calls were fraudulent, the cellular providers often wind up eating the cost and losing money since they can't collect from the perpetrators. In the United States alone during 1997, the wireless industry lost $643 million due to fraud. Using voice security can render a stolen or fraudulently obtained unit inoperable and dramatically reduce fraud. We tested this technique by buying a Samsung telephone that has voice recognition. We signed up for cellular service for that specific phone and recorded several authorized user voices. Unless an authorized voice activates the phone, it doesn't work. In this way, nobody except an authorized user (the only users with voiceprints on this phone) can activate it. This type of voice security makes others unable to activate and dial the telephone and keeps accounts safe from phone fraud. The method used for this involves chip technology. Chip technology, where a stored digital voice print is placed on a microchip and embedded in a cellular phone, notebook computer, or various other devices, is designed to prevent devices from being used by the wrong people. Using a recorded password or key phrase, access is denied to everyone but the individual whose voice print matches the one recorded. Such chips are also being embedded in notebook computers, although software remains more popular for PC voice security. Voice Security Systems and Philips are both developers and innovators in the field of chip technology for the cellular market. Their technology involves burning voice verification onto an existing microprocessor within a cellular device. As a result, an outside processor or hard drive is unnecessary.
Other Uses. The United States prison systems and criminal systems regularly use voice verification to verify parolees through telephone systems. Charles Schwab & Co. uses Nuance Communications products to provide a voice recognition system that scans anyone attempting to gain physical or virtual entrance to the site, access accounts, and conduct transactions. Customers can get information, check account balances, and make trades once they've identified themselves by voice. The company says transactions in the trillions of dollars are safely and securely conducted in this manner. Another user of Nuance technology is the Home Shopping Network. Since its business is based on customer satisfaction, and it has a huge customer base, by replacing account numbers with voice security, the company has combined increased security with increased efficiency in the trillions of dollars of transactions handled by the Home Shopping Network. InterVoice-Brite has developed software in conjunction with Nuance Communications and SpeechWorks International that is able to recognize 80% of the names in the United States and immediately refers the remaining 20% to a live representative. Use of this system is already prominent within the financial area, and it is expected to expand into other industry areas. Quincy Medical Center in Quincy, Mass., has implemented Lernout & Hauspie's voice products (i.e., VoiceXpress) with medical terminology embedded in its templates. The 10 physicians who are registered in the system can efficiently create and update records for more than 32,000 patients annually while maintaining the security and confidentiality of the contents of the records. This lowers costs for maintaining the records and lowers the probability of liability due to sacrificed confidentiality. BBN Technologies, a division of GTE, created technology that is 85% accurate in recognizing the continuous speech in news broadcasts.
This accuracy extends to telephone interviews and leads the way for handling business intelligence applications. Dragon NaturallySpeaking, software that comes in several different versions for average use and for use in professional specialties such as medicine and dentistry, has long been an extremely successful tool available to the public. Lernout & Hauspie, a Belgian firm that has competing software for user authentication and dictation, recently acquired Dragon Systems, the company behind it. L&H also has a long reputation for working with the American government to identify and implement voice-activated solutions. There are many other applications in banking, financial management, stock trading and management, utilities, corporate call centers, and information gathering that can benefit from using voice recognition software and applications.
Product Testing. We went to several stores (CompUSA, Staples, Home Depot, RCS, Circuit City) that any reader could visit and looked for voice recognition software. Our first product was Voicecrypt by Veritel. Voicecrypt is an encryption/decryption product that safeguards PC files from any unauthorized viewer. Voicecrypt installed in 10 minutes, and it took about 20 minutes for us to train the software to the sound of our voices and commands. We tested it on Windows 98 applications and files, selectively encrypting files and then attempting to access them. The encryption worked well; we couldn't read the files. In fact, since they were encrypted, we couldn't even find them visually. However, with a voice command, we were able to successfully decrypt and access the files. We ran our test on both a Pentium PC and a Pentium laptop. The product worked equally well on both. Voicecrypt was easy to use and inexpensive ($59.95). We recommend it for individuals who don't want to go through a complicated manual encryption process.
We loved the second product, SAFtyLatch by SAFlink, because it was really simple to use and provided great access protection. After loading it onto our Windows 98 desktop, we invoked the help of two associates, and we all initialized the product by creating individual voiceprints using the included microphone. Creating individual voiceprints meant that all of us were authorized users, so we could each protect our own files with our voice recognition. This meant that all of us were able to use the same PC, but we couldn't access each other's files. Yet, each of us was able to access our own files very easily. One especially nice feature was that other individuals could not use the desktop. Although the PC could be turned on, everyone except for the authorized users was locked out, making the PC and its data secure while out in the open. This is one product we recommend for both home and office use. It's inexpensive (less than $50) and very good.
We also tried several dictation packages and found that while all the products worked, they did differ. Dragon NaturallySpeaking Preferred Edition for Windows 98 installed easily and took about two hours to "train" to the sound of a voice. It was remarkably easy to use but lacked a technical vocabulary, although we were able to add information technology and other scientific terms. The Preferred Edition contains more features for business users than the Standard Edition. In addition to typing as we spoke, there was recorded speech playback, which made editing easier. We paid $149.95 plus tax for the Preferred Edition. The Standard Edition costs less. A Mobile Option Kit contains a small mobile recorder that you can carry with you. It's then possible to synchronize the recorder to your desktop or PC containing NaturallySpeaking and use the recorded data to type your text. Removable memory cards permit an unlimited amount of recording. A combined package called Dragon NaturallySpeaking Mobile includes the Preferred software and the mobile options. Additionally, Dragon offers DragonDictate, a product that permits a user to be completely speech dependent because of its voice-mouse, macro processing capabilities, and screen reading software. The most comprehensive package is Dragon NaturallySpeaking Professional, which combines enhanced macro control, voice-mouse, DragonDictate, and a high-quality microphone in one package. Dragon Systems also makes a Medical Edition with an approximately 300,000-word medical vocabulary (approximately $900) and a Legal Edition with an approximately 300,000-word legal vocabulary (approximately $900).
Lernout & Hauspie has a wide variety of packages available.
The basic L&H VoiceXpress Standard is great for the general PC user, and for less than $50, it offers Natural Language Technology (continuous uninterrupted speech dictation), which can be used with most Windows applications. The next version, VoiceXpress Advanced, includes two plug-in vocabularies for Microsoft Word and costs about $80. L&H also has a Professional version that integrates with Word, Excel, and PowerPoint, includes two plug-in vocabularies, acclimates to a mobile unit, and comes with a headset containing a microphone (approximately $150). Naturally there's a Mobile Professional version that offers everything in the Professional version plus a mobile dictation unit. L&H has a number of excellent specialty packages, including VoiceXpress for Medicine, Mental Health ($600); VoiceXpress for Medicine, Specialty Suite ($800); VoiceXpress for Medicine, Medical Compendium ($500); VoiceXpress for Legal, General Practice ($800); and VoiceXpress for Legal, Litigation ($800). We worked exclusively with the Professional version, and it took about three hours to "train" it to voice commands. Once again we found that although the product worked well, it lacked the vocabulary we needed, so we had to spend a lot of time adding to the vocabulary. L&H appeared to be more accurate than Dragon NaturallySpeaking.
IBM provides a selection of ViaVoice products such as ViaVoice Pro, Web, Standard, and Personal Millennium Editions. The ViaVoice Personal Millennium Edition was the only one that we tried, and of all the products we tested, we found it the most difficult to install and train to voices. However, ViaVoice had the largest technical vocabulary, and although it took almost four hours to train to a voice, it performed better than any of the other products and required less time to add to its vocabulary. The product ($250) came with headphones and a microphone. Like Dragon and L&H, IBM uses continuous speech, making it unnecessary to pause at the end of each phrase or sentence.
Philips Speech Processing developed the first commercially available PC-based natural, continuous speech-recognition engine for speech-to-text applications in 1993. Still developing and selling its applications, Philips offers FreeSpeech for SOHO (small office/home office) and consumer markets, SpeechPro for professional dictation users, SpeechMagic for client/server environments, and Speech SDK for software developers. Available in 15 languages, Philips technology has been successfully incorporated into many consumer electronic products and into applications in banking, travel, and other industries. Philips FreeSpeech lacked some of the nicer features that the other packages had. While most Windows applications can be dictated to, its functionality is much more limited than that of any of the other packages, and it accomplishes less.
Workplace Security. Security solutions within corporations are based on set policies and known sensitivities. To aid acceptance, voice security is frequently combined with more traditional forms of security. Biometrics is strongly supported by NIST (the National Institute of Standards and Technology). However, the constantly evolving environment plus the advanced technology already in place present a unique situation to those heavily involved with and dedicated to voice biometrics. Voice security can be cost effective and can promote efficiency in the workplace. The ability to delete template voiceprints easily means that administrators don't have to recall access cards from former employees. Help desk backlog and technical support are also minimized. In a society where telecommunications are the norm and electronic commerce is a way of life, there is a growing need to protect sensitive information. This defined need will likely trigger the improvement and acceptance of voice biometrics. Voice security is attractive to the business environment because it offers network administrators a way to economically implement advanced user authentication. Because of falling prices and increased popularity, it's anticipated that the next few years could be a turning point for the voice recognition community. We can only wait and see what happens and how broadly voice security will be accepted.
by Diane E. Levine
Key Points
Voice recognition software can authenticate your voice even when you have a cold.
Prevent fraudulent calls from being made on your cell phone by setting authorized user voice prints.
Even a professional impersonator can't duplicate all the qualities unique to your voice.
What You Need
The hardware and software requirements for voice security packages vary considerably, but we have listed the requirements for a few different options.

SAFtyLatch by SAFlink
Windows 95/98 (Win9x)
32MB (megabytes) RAM (random-access memory)
5MB hard drive space
CD-ROM drive (2X)
SVGA (Super Video Graphics Array) monitor
100MHz (megahertz) processor
Microphone included
Sound Blaster-compatible sound card

Dragon NaturallySpeaking Preferred
Win9x or NT 4.0
200MHz Intel Pentium processor with MMX
48MB RAM for Win9x
64MB RAM for Windows NT 4.0
200MB hard drive space
Creative Labs Sound Blaster 16 or compatible sound board
CD-ROM drive
Speakers for listening to playback of recorded speech and Quick Tour
Noise-canceling headset microphone

L&H VoiceXpress Professional
Win9x or NT 4.0
Processor performance equivalent to Intel Pentium II
48MB RAM for Win9x
64MB RAM for NT 4.0
200MB hard drive space
Sound Blaster 16-compatible sound board
Speakers for listening to Talking Text and Quick Tour
CD-ROM drive
Noise-canceling headset microphone

IBM ViaVoice 98 Personal
Win9x or NT 4.0
Intel Pentium processor, 166MHz with MMX
48MB RAM for Win9x
64MB RAM for NT 4.0
180MB hard drive space
16-bit sound card with microphone input jack
CD-ROM drive, double speed or faster