Smart Computing ®



Miscellanea
February 2001 • Vol.5 Issue 1
Page(s) 208-213 in print issue

Distributed Computing Harnesses Unused CPU Power
How Your PC Can Help Do The Work Of A Supercomputer
IBM's $100 million ASCI White is the world's largest supercomputer, rated at 12.3 teraflops (a teraflop is 1 trillion floating-point operations per second). It's the size of two basketball courts and uses enough electricity to power a small town. ASCI White spends its time simulating a nuclear blast at Lawrence Livermore National Laboratory in northern California. Other large scientific undertakings, such as modeling the Earth's atmosphere 50 years in the future or searching for a million-digit prime number, require the kind of immense computing power that ASCI White provides. However, aside from nonprofit and government-funded institutions, few organizations that pursue these projects can afford to buy a supercomputer. Some of these organizations are instead seeking to harness computers connected to the Internet to do their calculations through distributed computing.

Distributed computing is a model of data processing that runs opposite to the "bigger is better" model of supercomputer manufacturers. In the distributed computing model, many smaller computers on a network work together to do the same amount of processing as one supercomputer. The Internet, the world's largest network, connects millions of computers. Even a fraction of these computers provides more computing power than ASCI White's 8,192 processors. By finding a way to allow many different computers to process smaller chunks of data, computer scientists essentially turn the Internet into the world's largest supercomputer.



Distributed.net grew out of RSA Data Security's Internet distribution of encryption breaking challenges and Bovine, a group of computer users who banded together to win the challenge. Distributed.net continues to facilitate distributed computing projects.
Over the last decade, organizations such as SETI (the Search for Extraterrestrial Intelligence) and RSA Data Security have successfully used distributed computing to process large amounts of data. These organizations faced a large-scale data processing problem: a small computer may take dozens of years to process the behemoth amounts of data they gather. SETI records an entire spectrum of signals from a radio telescope. RSA Data Security tests encryption security by trying to crack 56- and 64-bit encryption keys. SETI and RSA Data Security broke down their data processing into small chunks and distributed them over the Internet for volunteers to process. The volunteers processed the data using software obtained from the sponsoring organizations and sent back the finished results. The organizations then compiled the results for their finished project.

SETI and RSA Data Security use the same method of distributed computing. Using the Internet, they harness the idle processing power, called cycles, of volunteers' computers. And PCs have a ton of idle processing power; when you are using your computer, it's not working nearly as hard as it could. The fastest typist may strike 10 keystrokes per second. A computer can process 100 million instructions per second. Cycles are like time: once they pass, you never get them back. Either the processor is sitting idle or it's working. You cannot save the idle time to use later. SETI@home software harnesses those idle cycles by running as a screen saver, analyzing the SETI data only when your computer would normally be sleeping. Other distributed computing software runs in the background as a low-priority process so it won't cannibalize the speed of your real work. By successfully distributing the processing work over the Internet, SETI@home and RSA Data Security proved the viability of the distributed computing model, an idea imagined almost three decades earlier.
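
To put rough numbers on that idle time, here is a minimal Python sketch using the two figures quoted above. The cost of handling a single keystroke (50,000 instructions) is purely an assumption made for illustration; the article gives no such figure.

```python
# Figures quoted in the article.
KEYSTROKES_PER_SECOND = 10               # a very fast typist
INSTRUCTIONS_PER_SECOND = 100_000_000    # what the processor can execute

# Assumption for illustration only: instructions spent handling one keystroke.
INSTRUCTIONS_PER_KEYSTROKE = 50_000


def idle_fraction(keystrokes_per_s, instr_per_keystroke, instr_per_s):
    """Return the share of processor capacity left unused while typing."""
    busy = keystrokes_per_s * instr_per_keystroke
    return 1 - busy / instr_per_s


# How many instructions the processor could run between two keystrokes.
budget_per_keystroke = INSTRUCTIONS_PER_SECOND // KEYSTROKES_PER_SECOND
idle = idle_fraction(KEYSTROKES_PER_SECOND,
                     INSTRUCTIONS_PER_KEYSTROKE,
                     INSTRUCTIONS_PER_SECOND)

print(f"Instructions available per keystroke: {budget_per_keystroke:,}")
print(f"Idle share while typing flat out: {idle:.1%}")
```

Under that assumption, even a typist working at full speed leaves the processor idle more than 99% of the time, which is the capacity that distributed computing software tries to claim.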



The Idea Mill. Almost as early as networks linked computers, the idea arose of harnessing unused computer cycles. Experiments with distributed computing began in the early 1970s when a pair of programs called Creeper and Reaper used the ARPAnet network, the predecessor of the Internet. Scientists designed the programs to clone themselves as part of the experiment of distributed computing. Referred to as worms, the programs moved from one machine to another, using the machine's spare cycles to propagate themselves and move on to the next machine.

In the mid-1970s, Xerox PARC (Palo Alto Research Center) created the first Ethernet network. Scientists created a worm that patrolled the network of 100 computers at night, installed itself on the computers, and used spare cycles for calculation-intensive projects. If the worm needed more computing power, it would clone itself, find another unused machine on the network, and install itself to continue processing.

More than 20 years later, the Internet now provides an unparalleled network for distributed computing. However, more problems arise when using the public Internet. Machines come in all types and run on different platforms, such as PC, Mac, and UNIX, and the network itself can be somewhat unreliable. Further, each machine has a separate owner, making it necessary for organizations like SETI@home to contact, persuade, and instruct strangers on the finer points of their projects. It's quite an onerous task, but in the last decade, numerous organizations have successfully convinced computer users to donate their processor time.



Sweet Success. The first Internet-based distributed-computing project began in 1988 at the DEC System Research Center in Palo Alto, Calif., after DEC scientists found that distributed computing was a perfect way to factor large numbers. Factoring extremely large numbers (that is, finding the prime factors of composite integers) is one of the hardest classical mathematical computations. Adding only three digits to the length of a number doubles the effort needed to factor it. Scientists wrote software to distribute the factoring workload among workstations in their laboratory, and after some success, they extended this model to include computers outside the network. The scientists sent the factoring tasks to volunteers through e-mail, and the volunteers then performed the computations and returned the results. By 1990, DEC labs had about 100 e-mail collaborators who helped factor numbers with 100 digits.
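
The idea of splitting one factoring job into independent tasks can be sketched in a few lines of Python. This is only an illustration: it uses simple trial division, not whatever algorithm the DEC scientists actually ran, and the function names and chunk size are made up for the example.

```python
from math import isqrt


def make_tasks(n, chunk_size):
    """Split the candidate divisors 2..isqrt(n) into independent ranges."""
    limit = isqrt(n)
    return [(start, min(start + chunk_size - 1, limit))
            for start in range(2, limit + 1, chunk_size)]


def work_on_task(n, task):
    """What one volunteer does: test every candidate divisor in its range."""
    low, high = task
    return [d for d in range(low, high + 1) if n % d == 0]


def factor_distributed(n, chunk_size=1000):
    """Coordinator: hand out tasks, gather the replies, report a factor pair."""
    divisors = []
    # Here the tasks run one after another; in a real project each task
    # would go to a different machine and the replies would arrive later.
    for task in make_tasks(n, chunk_size):
        divisors.extend(work_on_task(n, task))
    if not divisors:
        return None  # no divisor found, so n is prime
    smallest = min(divisors)
    return smallest, n // smallest


print(factor_distributed(1009 * 2003))  # prints (1009, 2003)
```

Because no task depends on the result of another, the coordinator can mail out the ranges in any order and simply merge whatever comes back.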

RSA Data Security, a data security company, sponsored more factoring problems and other puzzles, offering cash prizes to people who solved them. In 1993, DEC labs coordinated a group of 600 people in factoring a number with 129 digits; the project was called RSA-129. In 1995, a larger group broke the 130-digit barrier (called, of course, RSA-130). This time, however, RSA Data Security coordinated the work through a Web interface instead of e-mail. The company's ultimate goals were to test the security of its own products and highlight vulnerabilities in encryption it believed to be second rate. RSA-129 and RSA-130 were part of this program. Other RSA puzzles involved direct attacks on encrypted text.

RSA sponsored a crack of a 1970s government-created cipher called DES (Data Encryption Standard). DES uses a 56-bit key, a binary number 56 digits long, which allows roughly 72,000,000,000,000,000, or 72 quadrillion, combinations. The RSA puzzles gained popularity, and separate groups formed to compete for the money. Each group coordinated its own effort, assigning pieces of the analysis and processing the pieces on individual users' machines. Within the groups, smaller teams formed, centering on everything from coworkers to UNIX users, and competed for bragging rights over the most pieces processed. In June 1997, one group found the correct code after only trying 18% of the possibilities.

RSA issued another challenge, encouraging the groups to crack a 56-bit key for a different cipher, RC5. Three large groups dominated the competition, attracting thousands of interested computer users. The competition drove the groups to develop more interesting and higher quality client software for their users. The Bovine group advanced distributed computing software by creating a GUI (graphical user interface) that displayed statistics on each individual's and team's progress, keeping the excitement fresh as each user helped try more of the 56-bit key combinations. By the end of the contest, more than 4,000 teams had formed inside the Bovine group, testing key combinations at a rate of 7 billion per second. Bovine's combined computing power was the equivalent of 26,000 Pentium processors. Distributed.net (http://www.distributed.net) now sponsors and facilitates RSA challenges and boasts nearly half a million computers on its network.
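
A quick Python sketch shows the scale of the search. A 56-bit key allows exactly 2^56 combinations (roughly 72 quadrillion), and the 7 billion keys per second figure is the rate quoted above. Treating that rate as sustained for the whole search, and the block size handed to each client, are assumptions made for illustration.

```python
KEY_BITS = 56
KEYS_PER_SECOND = 7_000_000_000   # Bovine's rate, as quoted in the article
SECONDS_PER_DAY = 86_400

total_keys = 2 ** KEY_BITS


def days_to_search(fraction, keys_per_second=KEYS_PER_SECOND):
    """Days needed to test the given fraction of the key space."""
    return total_keys * fraction / keys_per_second / SECONDS_PER_DAY


def key_blocks(block_bits=28):
    """Yield (first_key, last_key) ranges a server could hand to clients.

    block_bits is a hypothetical block size chosen for this example.
    """
    block_size = 2 ** block_bits
    for first in range(0, total_keys, block_size):
        yield first, first + block_size - 1


print(f"Keys in a 56-bit key space: {total_keys:,}")
print(f"Days to try every key: {days_to_search(1.0):.0f}")
print(f"Days to try half of them: {days_to_search(0.5):.0f}")
print("First block:", next(key_blocks()))
```

At that rate the full key space takes about 119 days to exhaust, and on average the right key turns up after about half that time. Cutting the key space into fixed-size blocks is what lets thousands of machines search without repeating each other's work.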

If figuring factors or cracking encryption keys doesn't excite you, maybe talking to aliens will. In 1999, the SETI scientists conceived of a way to use distributed computing to analyze data for their project. The group has been using a giant radio telescope in Puerto Rico for 20 years to listen for signals that may have been sent by alien life forms. Scanning the sky, the telescope picks up radio signals that the scientists analyze for anomalies. SETI already had access to a supercomputer that performed a general search for any unusual spikes in the noise, but it wanted to conduct deeper analysis for other noise patterns. Because another supercomputer was cost prohibitive, the group developed a program called SETI@home.

If you're interested, you can download software from the SETI@home Web site (http://setiathome.berkeley.edu) and help scan the data more thoroughly. The software runs mainly as a screen saver, using your spare computing cycles only when you are not using your computer. SETI scientists at Berkeley break up the data into small pieces that participants can quickly download, analyze, and send back within a week. SETI plans to collect data for two years, scanning the sky three complete times. It initially hoped to gain support from thousands of people. To date, it has more than 2.5 million participants from 223 countries. It has not discovered any aliens yet, but the project proves how overwhelmingly successful distributed computing can be. (For more information on the SETI@home project and how it works, see the "How Distributed Computing Works" graphic in this article.)



Network Necessity. As we have seen, distributed computing wasn't developed overnight. It is the culmination of technologies converging and scientists putting those technologies to a new use. To understand how distributed computing works, it is necessary to understand the basics of networks and applications and how they communicate.

Applications are typically composed of three pieces called layers. Each layer performs a specific function and those functions create a working application. The first layer is the presentation layer. It's the user interface, where the application's user types in commands, ordering the system to do something, or inputs information for the system to process. The second layer is the application or business logic. It takes the user's input and interprets it. It also tells the presentation layer what to display. The real workhorse is the bottom layer, which provides all of the general services for the business logic. The bottom layer does the printing, communicating, or database processing. Once it is finished, it hands the results back to the business logic that orders the presentation layer to display the results.

Take an online registration form, for example. The presentation layer shows you the Web page with the form and accepts the keystrokes as you fill out your name and address. When you click Submit, the business logic layer accepts the information and tells the services layer to record the information in a database. The services layer records everything and sends an all-finished message back to the business logic. The business logic cues the presentation layer to display a new Web page, thanking you for signing up.
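
The registration example can be sketched in Python with one small class per layer. The class and method names are invented for this illustration, and a plain list stands in for the database.

```python
class ServicesLayer:
    """Bottom layer: general services such as storing records."""

    def __init__(self):
        self.database = []  # stand-in for a real database

    def save(self, record):
        self.database.append(record)
        return "all-finished"


class BusinessLogic:
    """Middle layer: interprets the input and decides what happens next."""

    def __init__(self, services):
        self.services = services

    def register(self, name, address):
        if not name or not address:
            return "Please fill in both your name and address."
        status = self.services.save({"name": name, "address": address})
        if status == "all-finished":
            return f"Thanks for signing up, {name}!"
        return "Sorry, something went wrong."


class PresentationLayer:
    """Top layer: collects what the user types and shows the reply."""

    def __init__(self, logic):
        self.logic = logic

    def submit(self, form):
        # A real Web page would render this message as HTML.
        return self.logic.register(form.get("name", ""), form.get("address", ""))


services = ServicesLayer()
page = PresentationLayer(BusinessLogic(services))
print(page.submit({"name": "Pat", "address": "12 Elm St."}))
```

Each layer talks only to the one next to it, which is why the layers can later be moved onto different machines without rewriting the whole application.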

Spreading out application layers is the fundamental reason computer networks exist. A network is simply a group of computers connected to each other in order to share resources. The linked machines can be close together and linked by cables, an arrangement called a LAN (local-area network), or they can be widely dispersed and connected by phone lines, as on the Internet. There are several different kinds of networks. A peer-to-peer network is two or more computers sharing resources and application layers, such as what PARC did with the first Ethernet network. Because each computer handles part of the resources, they are all considered servers.

A client/server network consists of a server, a computer that handles all of the basic services and controls the resources, and a client, the computer the user works on. The client can act alone or request the server to perform certain functions. Due to their efficiency, client/server networks are the most widely used. Different servers can specialize in tasks, such as print servers or file servers.

The Internet is basically a network of other networks. It works along the client/server network model. The Web browser on your machine connects to a host Web server on the Internet and displays the information presented. The Internet has a multitude of servers hooked to it for all kinds of specialized reasons: e-mail, newsgroups, and file transfers, for example.



How SETI@home Works. Distributed computing uses computers connected to a network for processing. Let's look at it in the context of the SETI project. SETI developed software to run on its server to distribute and collect data. It also designed the SETI@home screen saver to sit on client machines and do its processing. It distributed the software as a free download from the Web.

SETI breaks its data into bite-sized chunks, called work-units, that client machines can process easily. In the case of SETI@home, its data is a radio signal spectrum 2.5MHz wide, centered on 1420MHz. Its homegrown splitter software breaks the recorded data into 256 pieces, each about 10kHz wide and 107 seconds long. Each work-unit is 250KB, making it small enough to download quickly over phone lines.
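
The arithmetic behind the split is easy to check with a short Python sketch. It assumes the 256 pieces are equal-width frequency slices of the band; the function name is made up for the example, and this is not SETI's actual splitter code.

```python
BAND_WIDTH_HZ = 2_500_000      # the 2.5MHz recorded band
CENTER_HZ = 1_420_000_000      # centered on 1420MHz
PIECES = 256


def split_band(center_hz, width_hz, pieces):
    """Return (low_hz, high_hz) edges for each equal-width slice of the band."""
    slice_width = width_hz / pieces
    band_start = center_hz - width_hz / 2
    return [(band_start + i * slice_width, band_start + (i + 1) * slice_width)
            for i in range(pieces)]


slices = split_band(CENTER_HZ, BAND_WIDTH_HZ, PIECES)
width = slices[0][1] - slices[0][0]

print(f"{len(slices)} slices, each {width:,.1f}Hz wide")
print(f"Band runs from {slices[0][0]:,.0f}Hz to {slices[-1][1]:,.0f}Hz")
```

An even split works out to 9,765.625Hz per slice, just under the 10kHz figure quoted above.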



The SETI@home screen saver displays a graphical representation of the data analysis as it happens. It also tells you how much of the work-unit has been analyzed, the date and location of that work-unit, and how many units you have processed for SETI@home.
The server software transmits data and instructions over the network (the Internet) to computers that have the SETI@home screen saver loaded. The screen saver is essentially the services layer of this application, processing the data for the project and sending the results back through the network. The server software collects the analyzed data and compiles it into a database for final analysis.

The SETI@home software processes each work-unit for several days, the actual time depending on the speed of your processor. When the analysis is finished, you must log onto the Internet, at which point the software transmits the finished work-unit back to the SETI@home servers. The software is available for several platforms (PC, Mac, Unix, and Linux), maximizing its availability to the diverse computers connected to the Internet.

This is the model that all of the successful distributed computing projects follow. The growth in network technology, client/server technology, and application development has propelled distributed computing to heights unknown 25 years ago.



Distributing Profits. It was only a matter of time before someone used the success of nonprofit projects like SETI@home as a model for business. Dot-com companies are springing up around distributed computing. The idea goes something like this: if millions of people will give away their computers' extra cycles for research on numbers or aliens, wouldn't even more people be willing to part with unused cycles if they got paid?



Companies like Popular Power offer businesses with projects needing a lot of computing power a chance to harness its users' computers for spare cycles. Popular Power also works on nonprofit projects such as optimizing the influenza vaccine.
Popular Power (http://www.popularpower.com) is a San Francisco-based company hoping to be able to sell companies supercomputer power for a fraction of the cost. Popular Power developed software that links Internet-connected computers, giving the users' computers the ability to crunch hunks of data from Popular Power's customers fairly cheaply. The company sees itself as an exchange of sorts, buying and selling this spare computer time like a commodity.

After you download Popular Power's software, it acts as a screen saver, running its computations only when your machine is idle and shutting off when you're working. The company started with a nonprofit project, finding ways to optimize vaccines against influenza, to prove itself to businesses. It is targeting businesses with intense computing needs, such as insurance and pharmaceutical companies, and pitching them on reduced costs. Why spend money on a supercomputer or storage banks full of smaller computers that you have to maintain and administer yourself when you can lease space on the Internet? In fact, Popular Power estimates that its network's combined power will exceed the processing power of the top supercomputers several times over.

The incentive for the denizens of the Internet is simple. Do enough computing work for Popular Power and get free (or nearly free) Internet service. Popular Power is partnering with ISPs (Internet service providers), letting you knock $10 off your monthly bill. Alternatively, you may choose gift certificates to online stores. Participants receive a ranking based on how much work they do for Popular Power. When new projects come up, the participants who have put in the most work get first shot at higher paying jobs. You can choose to work for commercial or nonprofit projects and get equal work credit for the two. Of course, only commercial projects pay incentives, but you may feel better doing the nonprofit work.

United Devices (http://www.ud.com) is another Internet startup that began in December 1999 out of Austin, Texas. Much like Popular Power, the company develops the technology needed to support large distributed computing efforts across the Internet. It also administers a mix of commercial and nonprofit projects, including analyzing DNA gene sequences and evaluating molecules as potential cancer fighting agents.

Its payment method centers on frequent flier miles and a daily sweepstakes: at the end of 2000, United Devices gave away an MP3 player every day for 60 days. Frequent flier miles are awarded based on the amount of work you do, while all active members have an equal chance of winning the sweepstakes. If that's not enough, it also offers cash prizes to its top 100 members, up to $2,000 for the top worker.



Altruistic Computing. Not every new corporation is trying to cash in on distributed computing. Following the success of SETI@home, a for-profit corporation called Entropia launched FightAIDS@Home (http://www.fightaidsathome.org) in September 2000. Following a unique business model, Entropia uses profits from distributed computing work done for commercial clients to fund AIDS research in cooperation with The Scripps Research Institute. Users download and install the Entropia 2000 software (http://www.entropia.com). The software runs in the background using idle cycles that would normally be wasted. It works on a specific computational problem, part of modeling the evolution of drug resistance, in order to help design the drugs necessary to fight AIDS. When your computer has finished the computation, the results are sent to Entropia, collected, and turned over to Scripps researchers for analysis. Then a new problem is sent to your computer. The Entropia 2000 software occasionally runs commercial tasks on your computer. The profits generated from running commercial tasks pay for the AIDS research.

Distributed computing will only grow in the near future. Whether commercial success can be had from harnessing Internet users' spare cycles remains to be seen. But as innovation in networking and application development continues, computers and networks will get faster, software will get smarter, and more computers will be able to perform more work for distributed computing networks. And if we're lucky, maybe we'll hear from some aliens too.

by Greg Schick

View How SETI@Home Distributed Computing Works.


Terms To Know

application layers—Three pieces of a software program needed for the application to work properly. The presentation layer is where the user interacts with the software, typing commands or pressing buttons. The business logic layer interprets the user's input and instructs the services layer to do the data processing or other communication requested. The layers do not necessarily need to exist on the same computer.

client/server network—A network arrangement with a server and one or more clients. Both the server and the clients are complete, standalone computers. The server can be a personal computer, minicomputer, or mainframe, and it provides resources, such as data management, and allows clients to share information among the other clients.

cycle—A single occurrence of a repeating event. One computer cycle is an instruction to do something. Modern computers can handle tens of millions of these instructions per second.

Ethernet—Created by Xerox in the 1970s, Ethernet is the most widely used LAN (local-area network) protocol.

LAN (local-area network)—A group of computers, usually in one building or office, physically connected in a manner that lets them communicate and interact with each other. In order for a network to operate, it needs a server, which is a computer that holds data used by the different computers on the network. Some of the benefits of a network connection include the ability to share document files and expensive equipment, such as laser printers. Networks can be connected using different combinations of topologies, protocols, software, and hardware.

teraflop—A measure of a computer's speed, meaning it can perform 1 trillion floating-point operations per second. Flops (floating-point operations per second) count calculations on numbers in which the decimal point is not in a fixed location. Floating-point notation includes the digits of a number (called the mantissa) and an exponent. For instance, an extremely small number such as 0.000023 can be written as 23E-6. A large number like 23,000,000 can be written as 23E6. Floating-point processors are designed to perform calculations using this type of shorthand notation, and the processors can speed up the production of graphics and other tasks.

worm—A destructive program containing code that replicates itself until it fills the target drive or network, causing it to malfunction. The first worms were used to test networks of computers, but recently, users with malicious intent use worms to spread computer viruses.







Copyright © 2009 Sandhills Publishing Company U.S.A. All rights reserved.