Science may be catching up with video gaming. Physicists hope to adapt some of the most potent computer components the industry has developed to capitalize on growing consumer demand for realistic simulations on personal computer screens. For researchers, that means more power, lower cost, and much faster and more accurate calculations of some of Nature's most basic, if complex, processes.
Jefferson Lab is entering the second phase of a three-year effort to create an off-the-shelf supercomputer using the next generation of relatively inexpensive, readily available microprocessors. Thus far, scientists and engineers from JLab's Chief Information Office have created a "cluster supercomputer" that, at peak operation, can process 250 billion calculations per second. Such a 250 "gigaflops" machine -- the term joins the metric prefix for billion to the abbreviation for "floating-point operations per second" -- will be scaled up to 800 gigaflops by June, just shy of one trillion operations per second, or one teraflop. The world's fastest computer, the Earth Simulator in Japan, currently runs at roughly 35 teraflops; the next four most powerful machines, all in the United States, operate in the 5.6 to 7.7 teraflops range.
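For a sense of scale, here is a quick back-of-the-envelope comparison in Python, using only the figures quoted above; the point is the unit conversion, not the machines' exact benchmark numbers.

```python
# Quick sanity check on the performance figures quoted in the article.
GIGA = 1e9   # one billion floating-point operations per second
TERA = 1e12  # one trillion floating-point operations per second

current = 250 * GIGA          # JLab prototype today: 250 gigaflops
by_june = 800 * GIGA          # planned by June: 800 gigaflops
earth_simulator = 35 * TERA   # world's fastest machine, roughly

print(f"prototype today:      {current / TERA:.2f} teraflops")
print(f"planned by June:      {by_june / TERA:.2f} teraflops")   # just shy of 1
print(f"Earth Simulator lead: {earth_simulator / by_june:.0f}x faster")
```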
The Lab cluster-supercomputer effort is part of a broader collaboration among JLab, Brookhaven National Laboratory, Fermi National Accelerator Laboratory and their university partners, in a venture known as the Scientific Discovery through Advanced Computing project, or SciDAC, administered by the Department of Energy's Office of Science. SciDAC's aim is to make terascale computational capability routinely available to scientists. Such powerful machines are essential to "lattice quantum chromodynamics," or LQCD, a computational formulation of the theory of the strong force that requires rigorous calculations of the interactions between quarks, the particles within the atomic nucleus that many scientists believe are among the basic building blocks of all matter.
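The article does not describe the Lab's LQCD code itself, but a toy sketch can show what "lattice" means in practice: space-time is replaced by a finite grid, the gluon field lives on the links between grid points as SU(3) matrices, and observables are built by multiplying those matrices around closed loops. The hypothetical, drastically simplified Python sketch below computes the simplest such loop, the average "plaquette," on a tiny random lattice; a production calculation does essentially this, plus the quark physics, on vastly larger lattices, which is why teraflops are needed.

```python
# Illustrative sketch only: this is not JLab's actual LQCD code.
import numpy as np

L = 4          # lattice sites per direction (toy size)
DIM = 4        # four space-time directions: x, y, z, t
rng = np.random.default_rng(0)

def random_su3():
    """Generate a random SU(3) matrix (unitary, determinant 1)."""
    z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    q, r = np.linalg.qr(z)
    q = q @ np.diag(np.diagonal(r) / np.abs(np.diagonal(r)))
    return q / np.linalg.det(q) ** (1.0 / 3.0)

# One SU(3) matrix per site and direction: the "link variables" (gluon field).
links = np.empty((L, L, L, L, DIM, 3, 3), dtype=complex)
for idx in np.ndindex(L, L, L, L, DIM):
    links[idx] = random_su3()

def shift(site, mu):
    """Site one step forward in direction mu, with periodic boundaries."""
    s = list(site)
    s[mu] = (s[mu] + 1) % L
    return tuple(s)

def plaquette(site, mu, nu):
    """Re Tr of the 1x1 Wilson loop in the (mu, nu) plane, normalized by 3."""
    u1 = links[site + (mu,)]
    u2 = links[shift(site, mu) + (nu,)]
    u3 = links[shift(site, nu) + (mu,)].conj().T
    u4 = links[site + (nu,)].conj().T
    return np.trace(u1 @ u2 @ u3 @ u4).real / 3.0

# Average plaquette over the whole lattice -- a standard LQCD observable.
total, count = 0.0, 0
for site in np.ndindex(L, L, L, L):
    for mu in range(DIM):
        for nu in range(mu + 1, DIM):
            total += plaquette(site, mu, nu)
            count += 1
print(f"average plaquette on a random test lattice: {total / count:.4f}")
```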
"The big computational initiative at JLab will be the culmination of the lattice work we're doing now," says Chip Watson, head of the Lab's High-Performance Computer Group. "We're prototyping these off-the-shelf computer nodes so we can build a supercomputer. That's setting the stage for both hardware and software. "
The Lab is also participating in the Particle Physics Data Grid, an application that will run on a high-speed, high-capacity telecommunications network 1,000 times faster than current systems, to be deployed within the next three years. Planners intend for the Grid to give researchers across the globe instant access to the large amounts of data routinely shared among far-flung groups of scientific collaborators.
Computational grids integrate networking, communication, computation and information to provide a virtual platform for computation and data management in the same way that the Internet permits users to access a wide variety of information. Whether users access the Grid to use one resource such as a single computer or data archive, or to use several resources in aggregate as a coordinated, virtual computer, in theory all Grid users will be able to "see" and make use of data in predictable ways. To that end, software engineers are in the process of developing a common set of computational, programmatic and telecommunications standards.
"Data grid technology will tie together major data centers and make them accessible to the scientific community," Watson says. "That's why we're optimizing cluster-supercomputer design: a lot of computational clockspeed, a lot of memory bandwidth and very fast communications."
Key to the success of the Lab's cluster-supercomputer approach are its computational nodes: stripped-down versions of the circuit boards found in home computers. The boards are placed in slim metal boxes, stacked together and interconnected to form a cluster. The Lab currently operates a 128-node cluster and is procuring a 256-node cluster. As the project develops, new clusters will be added each year; by 2005 a single cluster may have as many as 1,024 nodes. The Lab's goal is to reach several teraflops by 2005 and, if additional funding is available, 100 teraflops by 2010.
"[Our cluster supercomputer] is architecturally different from machines built today," Watson says. "We're wiring all the computer nodes together, to get the equivalent of three-dimensional computing."
That approach is possible because of continuing increases in microprocessor power and decreases in cost. The Lab's strategy, Watson explains, is to upgrade continuously at the lowest feasible cost, replacing the oldest third of the system each year. Already, he points out, the Lab's prototype supercomputer costs one-fifth as much as a comparable stand-alone machine, and by next year it will cost one-tenth as much. Each year, as developers devise more efficient ways of interconnecting the clusters and write better software to run LQCD calculations, the Lab will have at its disposal a less expensive but more capable supercomputer.
"We're always hungry for more power and speed. The calculations need it," Watson says. "We will grow and move on. The physics doesn't stop until we get to 100 petaflops [100,000 teraflops], maybe by 2020. That's up to one million times greater than our capability today. Then we can calculate reality at a fine enough resolution to extract from theory everything we think it could tell us. After that, who knows what comes next?"