Taming big data and particle beams: how SLAC researchers are pushing AI to the edge
Researchers across the lab are developing AI tools to harness data and particle beams in real time and make molecular movies, speeding up the discovery process in the era of big data.
DOE/SLAC National Accelerator Laboratory
This is the first of a two-part series exploring a sampling of ways artificial intelligence helps researchers from around the world perform cutting-edge science with the lab’s state-of-the-art facilities and instruments.
Every day, researchers at the Department of Energy’s SLAC National Accelerator Laboratory tackle some of the biggest questions in science and technology – from laying the foundations for new drugs to developing new battery materials and solving big data challenges associated with particle physics and cosmology.
To get a hand with that work, they are increasingly turning to artificial intelligence. “AI will help accelerate our science and technology further,” said Ryan Coffee, a SLAC senior scientist. “I am really excited about that.”
Fine tuning particle beams for studying speedy atoms and molecules
Understanding the structure and behavior of atoms and molecules for materials or biological applications requires sophisticated X-ray and ultrafast instruments and machines, such as SLAC’s Linac Coherent Light Source (LCLS), Stanford Synchrotron Radiation Lightsource (SSRL) and the Megaelectronvolt Ultrafast Electron Diffraction (MeV-UED) instrument, that can reveal nature at the smallest and fastest scales through, for example, molecular movies.
These scientific endeavors, however, require finely tuned machines and create massive volumes of complex data at ultrafast rates. SLAC researchers are turning these challenges into an opportunity to drive and lead a new era of machine learning tools to optimize these facilities, experiments and data management.
Particle accelerators are the backbone of SLAC’s X-ray and ultrafast facilities, creating unprecedented opportunities for the large global research community. One challenge is quickly tuning the electron beam that generates the X-rays for the unique requirements of each experiment. Experienced operators must consider and adjust hundreds of parameters with limited information, so it can be hard to see exactly how the adjustments are shaping the beam and to decide what to try next.
Machine learning tools make this process easier, so researchers can spend less time tuning. “You want a more cohesive picture of the beam when tuning – the ability to flexibly, quickly adjust settings to produce beams that each researcher wants, dynamically control those beams in real time and have some indication of how that is feeding back into the end science goals. For LCLS we want to be able to rapidly switch between different configurations for different researchers,” said Auralee Edelen, SLAC accelerator scientist.
One method Edelen’s team has been working on is phase space reconstruction, which combines a physics simulation with a machine learning model to visualize the beam quickly from just a few data points. In contrast, some approaches that use machine learning models to predict the beam distribution can take thousands of data points to train, while other reconstruction methods can require many hours of data gathering and computation.
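To give a flavor of how a physics model lets a few measurements go a long way, here is a toy sketch (not SLAC's actual reconstruction code, which handles full phase space distributions) in the spirit of the idea: a simplified quadrupole-scan model written down in code, with the beam's second-moment parameters recovered from just five simulated measurements by a least-squares fit. All variable names and numbers are illustrative.

```python
import numpy as np

def beam_size_sq(k, sig11, sig12, sig22, L=1.0):
    """Toy physics model: measured beam size squared after a thin-lens
    quadrupole of strength k followed by a drift of length L."""
    m11, m12 = 1 - L * k, L  # transfer-matrix elements for quad + drift
    # sigma propagation: size^2 = m11^2*s11 + 2*m11*m12*s12 + m12^2*s22
    return m11**2 * sig11 + 2 * m11 * m12 * sig12 + m12**2 * sig22

# "True" beam moments (unknown in a real experiment) and five scan points
true = (2.0e-6, -1.0e-6, 3.0e-6)
ks = np.linspace(-2, 2, 5)
meas = np.array([beam_size_sq(k, *true) for k in ks])

# Because size^2 is linear in the moments, a tiny dataset suffices:
# build the design matrix for L = 1 and solve by least squares.
A = np.stack([(1 - ks)**2, 2 * (1 - ks), np.ones_like(ks)], axis=1)
fit, *_ = np.linalg.lstsq(A, meas, rcond=None)
print(fit)  # recovers the three "true" beam moments
```

The point of the sketch is the division of labor: the physics model supplies the structure, so the fit needs only a handful of measurements rather than the thousands a purely data-driven model might require.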
This tool can reduce the time to visualize the beam from many hours to just a few minutes with a simple setup that can be found at almost every accelerator facility. Ryan Roussel, a member of Edelen's team who is leading the development of the technique, is now working on bringing it into regular operation at the LCLS facility’s upgraded X-ray laser. “We’re trying to make it easier to understand what’s going on in the accelerator at any given point in time,” Edelen said.
SLAC’s machine learning tools have been deployed at other accelerators around the world and can be adapted for other types of instruments, such as the MeV-UED. “We have tried to make sure that the software tools we make have very specific tasks and have standard interfaces,” said Edelen. “We modularize those tools to make it as easy as possible to swap out individual pieces as needed, which helps when trying to apply them to different systems with different software ecosystems and sets of needs.”
Working on the edge
Among AI’s strong suits is handling massive amounts of data arriving over short time periods. Such abilities come in handy at the LCLS facility’s upgraded X-ray laser, where experiments are expected to churn out data at an astonishing one terabyte per second. This is analogous to streaming about 1,000 full-length movies per second, says Abhilasha Dave, a SLAC digital design engineer. The conventional approach of storing the data on a computer and processing and analyzing it afterwards would not be feasible, given the power consumption, storage space and costs involved.
SLAC’s solution is edge machine learning, which enables processing and analyzing the data on special hardware, called a field programmable gate array (FPGA), on the instrument detector close to the data source, the so-called “edge.”
“We’re working on how to accelerate this data flow so that you can analyze data in flight,” said Coffee. Edge machine learning reduces the amount of data to a minimum useful set that can be easily stored and reduces the need for expensive and power-hungry computing. To do this, however, traditional machine learning models used on a computer must now shrink in size to fit in the limited space on the FPGA.
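The "minimum useful set" idea can be sketched in a few lines. This is a hypothetical illustration, not SLAC's pipeline: a lightweight decision function stands in for an edge model, flagging which detector frames are worth keeping so only those are stored rather than the raw stream. The threshold and frame contents are made up for the example.

```python
import numpy as np

def interesting(frame, threshold=50.0):
    """Stand-in for an edge ML model deciding whether a frame is
    worth keeping; here, a simple total-intensity cut."""
    return frame.sum() > threshold

# A fake stream: 97 empty frames plus 3 bright "hit" frames
stream = [np.zeros((8, 8)) for _ in range(97)]
stream += [np.full((8, 8), 2.0) for _ in range(3)]  # sum = 128 each

# "In flight" reduction: keep only what the model flags
kept = [f for f in stream if interesting(f)]
print(len(kept), "of", len(stream), "frames stored")  # 3 of 100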
Dave has a few options for downsizing the model. “We first explore techniques to reduce the model size through several strategies,” she explained. “One approach involves training the machine learning model on data that has already been compressed. Another method focuses on developing faster algorithms, and a third involves grouping the data together to reduce its complexity. Which strategy we choose depends on the specific application.”
After training the model and then ensuring it performs with the required accuracy, Dave needs to move the model to the FPGA, a completely different kind of computer from the one used for training – and a computer without standardized support for machine learning models that Dave can use.
To bridge this gap, SLAC software engineer J.J. Russell created the SLAC Neural Network Library (SNL). The SNL serves as an intermediary between the model and the FPGA by translating the model’s actions into instructions the FPGA understands. Now, the FPGA can be put into the instrument to keep up with huge volumes of data arriving at blazing rates from machines such as LCLS – edge machine learning in action. With ever-increasing need for edge machine learning on the horizon, the team designed SNL to be easy to use for even a novice to machine learning, helping ensure that its impact will cut across many fields of science.
As SLAC teams across the lab continue to invent, refine and expand the capabilities of machine learning tools, they are looking to design them to be flexibly applied to other instruments, facilities and applications within and outside of the lab. “We purposefully try to design all of our architectures, software and firmware to be very widely applicable, so doing work on one project may benefit another project,” said Ryan Herbst, director of the Technology Innovation Directorate’s Instrumentation Division. “We’re starting to think about areas in distributed sensing environments, such as the smart grid and low latency control systems for fusion power plants.”
“Ultimately, everything is going to benefit from machine learning, just like everything benefitted from the computer,” said Coffee, adding, “This year’s Nobel Prizes in physics and chemistry emphasize the role AI is now playing in science.”
These advances reflect the growing use of AI throughout the lab, says SLAC scientist Daniel Ratner, a leading AI and machine learning scientist at SLAC. “Looking forward, we expect to see more real-time AI in the loop, guidance during data acquisition, facility control and scientific exploration across the lab’s mission.”
This research was supported by the DOE Office of Science. LCLS and SSRL are DOE Office of Science user facilities.
Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.