AI/ML's bright future at NSLS-II
From automation to analysis, AI-driven innovations are making synchrotron science faster, smarter, and more efficient
DOE/Brookhaven National Laboratory
image: NSLS-II schematics
Credit: Brookhaven National Laboratory
The National Synchrotron Light Source II (NSLS-II) — a U.S. Department of Energy (DOE) Office of Science user facility at DOE’s Brookhaven National Laboratory — is among the world’s most advanced synchrotron light sources, enabling and supporting science across various disciplines. Advances in automation, robotics, artificial intelligence (AI), and machine learning (ML) are transforming how research is done at NSLS-II, streamlining workflows, enhancing productivity, and alleviating workloads for both users and staff.
Artificial intelligence, real solutions
As synchrotron facilities rapidly advance — providing brighter beams, automation, and robotics to accelerate experiments and discovery — the quantity, quality, and speed of data generated during an experiment continue to increase. Visualizing, analyzing, and sorting these large volumes of data can require an impractical, if not impossible, amount of time and attention. Presenting scientists with real-time analysis is as important as preparing samples for beam time, optimizing the experiment, performing error detection, and remedying anything that may go awry during a measurement.
Addressing these challenges requires tools that are fast, adaptable, and applicable facility wide. The rapid advancement of AI/ML provides the opportunity to optimize beamline operations, solve data challenges, and automate repetitive tasks with specialized applications created by NSLS-II staff and collaborators.
Anomaly detection
NSLS-II operates around the clock, and experiments run even when staff and users are not at the beamlines. They might be working on other aspects of their research — sample preparation, for example — or even sleeping as experiments run late into the night. Real-time quality control of an experiment is crucial, especially when measurements take several hours or even days. If a sample is damaged or misaligned, equipment fails, or any other unforeseen disturbance derails an experiment, the issue needs to be detected and addressed immediately. Subtle anomalies require the experience and knowledge of experts to be identified and corrected.
To ensure that any problems with the experiment are identified and addressed quickly, beamlines have started to employ AI agents that are based on supervised ML models trained on hundreds to thousands of examples of “good” data sets compared with characteristically unusable “bad” measurements. These agents are able to keep watch over the experiment and integrate with various messaging platforms at the facility to report on data quality as the experiment is ongoing. This kind of oversight saves beam time, resources, and effort that can go into other research.
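The gatekeeping idea behind these watchdog agents can be sketched with a minimal supervised classifier. Everything below is illustrative — the synthetic "good" and "bad" traces and the nearest-centroid rule stand in for NSLS-II's actual trained models, which are far more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative training data: each row is a 1D detector trace.
# "Good" frames show a clean peak; "bad" frames are flat noise,
# as might result from a lost beam or a misaligned sample.
x = np.linspace(-1, 1, 64)
good = np.exp(-x**2 / 0.02) + rng.normal(0, 0.05, (200, 64))
bad = rng.normal(0, 0.3, (200, 64))

# Nearest-centroid classifier: one prototype trace per class.
centroids = {"good": good.mean(axis=0), "bad": bad.mean(axis=0)}

def classify(frame):
    """Label a new frame by its closest class prototype."""
    dists = {label: np.linalg.norm(frame - c) for label, c in centroids.items()}
    return min(dists, key=dists.get)

# A live frame that still shows the expected peak should pass,
# while a flat, noisy frame should trigger an alert.
print(classify(np.exp(-x**2 / 0.02)))
print(classify(rng.normal(0, 0.3, 64)))
```

In a real deployment the `classify` step would feed a messaging hook so that a "bad" label pages the beamline staff rather than just printing a string.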
Digital beamline user assistants
As large language models (LLMs) become more robust, chatbots trained in specialized areas via methods such as retrieval-augmented generation (RAG) can supplement beamline staff in assisting users. These bots can answer general questions or be trained on NSLS-II-specific resources. RAG allows an LLM to draw on documentation provided by staff, pulling vetted information alongside the general knowledge at its disposal and favoring those vetted answers. These digital assistants can help new users navigate the proposal system, design and guide experiments, and recall important safety information to ensure that beam time goes smoothly. They can also assist NSLS-II staff in summarizing and categorizing the large volume of information in the proposal system, allowing common themes and user community needs to be quickly detected and highlighted.
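The retrieval half of a RAG pipeline can be sketched very simply: score the question against a store of vetted documents and prepend the best match to the prompt. The documents, question, and word-overlap scoring here are invented placeholders — production systems use embedding models, not word counts.

```python
# Minimal retrieval step of a RAG pipeline. The document snippets
# below are illustrative stand-ins for staff-vetted documentation.
docs = {
    "safety": "All users must complete beamline safety training before their first shift.",
    "proposals": "General user proposals are submitted through the facility proposal system each cycle.",
    "samples": "Powder samples should be sealed before shipment to the facility.",
}

def tokenize(text):
    return set(text.lower().replace(".", "").replace("?", "").split())

def retrieve(question):
    """Return the document with the largest word overlap with the question."""
    q = tokenize(question)
    return max(docs.values(), key=lambda d: len(q & tokenize(d)))

def build_prompt(question):
    """Ground the LLM's answer in vetted facility documentation."""
    return f"Context: {retrieve(question)}\nQuestion: {question}\nAnswer using the context."

print(build_prompt("How do I submit a user proposal?"))
```

The point of the pattern is the prompt assembly: the model answers from retrieved, curated text first, which is what lets staff steer the assistant toward "preferential" vetted answers.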
Data science for beamline data
Data science methods, like unsupervised learning for real-time monitoring, can provide users with technique-independent ways to track, organize, and visualize data. This type of ML allows models to sort through unlabeled or partially labeled data and find patterns and similarities without prior instruction. A key benefit is that these methods don’t require extensive pre-training for the AI as long as they have enough data to work with.
So-called “non-negative matrix factorization” is a decomposition method that allows researchers to break down complex datasets into a small set of components and their relative contributions, helping to simplify and reconstruct the data with minimal error. To improve the quality of these results even further, NSLS-II scientists have added the ability to impose constraints based on known or assumed factors, helping the decomposition capture the true underlying physical patterns. By injecting prior knowledge into this AI analysis, constrained non-negative matrix factorization can provide real-time feedback and enable precise timing decisions for in situ experiments. In molten salt research, for example, this kind of analysis helped to detect an intermediate phase that occurred during heating.
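The core decomposition can be shown in a few lines. This is a minimal sketch of plain (unconstrained) NMF via multiplicative updates on synthetic two-component "patterns" — the constrained variants developed at NSLS-II add physics-informed terms on top of this idea.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a series of scattering patterns: each
# measured pattern is a nonnegative mix of two hidden components
# (think "phase A" and "phase B" signatures).
t = np.linspace(0, 1, 80)
comp_a = np.exp(-((t - 0.3) ** 2) / 0.005)
comp_b = np.exp(-((t - 0.7) ** 2) / 0.005)
weights = np.array([[1.0, 0.0], [0.7, 0.3], [0.3, 0.7], [0.0, 1.0]])
X = weights @ np.vstack([comp_a, comp_b])   # 4 patterns x 80 channels

# NMF via multiplicative updates: X ~ W @ H with W, H >= 0.
# W recovers the mixing fractions, H the component spectra.
k = 2
W = rng.random((X.shape[0], k))
H = rng.random((k, X.shape[1]))
eps = 1e-9
for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.4f}")
```

Because the data are an exact nonnegative rank-2 mixture, the reconstruction error drops close to zero; on real beamline data the recovered components are where constraints (smoothness, known endmembers) earn their keep.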
Other unsupervised learning models, like hierarchical clustering, can take large datasets or data streaming in real-time and quickly sort and categorize them. This is particularly helpful in rapidly identifying sample damage or sample changes, like phase changes in response to temperature. These techniques could be used to detect very quick, subtle changes in materials in ways that traditional data collection could miss.
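A streaming flavor of this sorting can be sketched with a threshold rule: assign each incoming frame to the nearest existing cluster prototype, or open a new cluster when nothing is close enough. The data and threshold below are invented to mimic an abrupt phase change; facility implementations use richer hierarchical methods.

```python
import numpy as np

rng = np.random.default_rng(2)

def stream_cluster(frames, threshold):
    """Assign each frame to the nearest cluster prototype (the first
    frame that opened the cluster), or start a new cluster if no
    prototype is within the distance threshold."""
    prototypes, labels = [], []
    for frame in frames:
        if prototypes:
            d = [np.linalg.norm(frame - p) for p in prototypes]
            best = int(np.argmin(d))
            if d[best] < threshold:
                labels.append(best)
                continue
        prototypes.append(frame.copy())
        labels.append(len(prototypes) - 1)
    return labels

# Illustrative stream: the sample sits in one phase, then abruptly
# transforms partway through (e.g., a phase change on heating).
x = np.linspace(-1, 1, 32)
frames = [np.sin(3 * x) + rng.normal(0, 0.05, 32) for _ in range(10)]
frames += [np.cos(3 * x) + rng.normal(0, 0.05, 32) for _ in range(10)]

labels = stream_cluster(frames, threshold=1.0)
print(labels)
```

The moment the label switches is the kind of quick, subtle change the article describes: an analyst watching raw frames might miss it, while the clustering flags the exact frame.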
AI-driven analysis
One of the broader goals of integrating AI/ML-driven tools and methods is to accelerate data analysis to catch up to the speed of data collection. Instead of relying on analysis after the fact, scientists can analyze data using AI/ML tools throughout the experiment, allowing for both real-time information and the capacity for a more focused “self-driving” study. Using expert-labeled training data, AI can also learn to perform advanced analysis methods tailored to a specific technique, assisting users who are not yet experts.
As an example, to analyze high-throughput X-ray diffraction data, an NSLS-II-led team of scientists developed the X-ray crystallography companion agent (XCA). This AI agent addresses material characterization challenges by employing a fully synthetic dataset used to train a probabilistic model prior to the experiment. The AI agent can then successfully identify known phases in experimental data in real-time. This system speeds up analysis by learning from structural databases and using a probabilistic approach to avoid overconfidence in its own predictions. Acting as a smart assistant for researchers, this agent improves accuracy and saves time.
More advanced methods could run many candidate analysis approaches in parallel, presenting composite results to users. When more than one form of analysis is possible, constraints on time and process often force researchers to pick just one, limiting what can be learned from the results. There are even instances where models that produce incorrect answers can be informative when compared against a large batch of results. Using AI/ML to analyze data with several different models at once and aggregate the outputs can reveal facets of a measurement that users may not have initially considered.
Agent-driven science
It’s not just analysis models that are being explored. Researchers at NSLS-II have developed methods to drive experiments at beamlines in real time. For example, reinforcement learning, in which AI models learn by interacting with a real or simulated environment, can help to optimize data collection, particularly under time constraints. Unlike traditional supervised learning methods, reinforcement learning models learn through trial and error guided by a “reward” structure. In an experiment with a few hundred samples, for instance, this kind of model could determine the optimal time allocation for each to maximize data quality within the limited beam time window.
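A stripped-down version of this reward-driven allocation is the epsilon-greedy bandit: mostly spend beam time on the sample whose measurements are paying off, but keep exploring the others. The three "samples," their true quality scores, and the reward noise below are all invented for illustration — real beamline agents use far richer state and reward definitions.

```python
import random

random.seed(7)

# Toy reward structure: each "sample" returns a noisy data-quality
# score when a unit of beam time is spent on it. True qualities are
# hidden from the agent and invented for this sketch.
true_quality = [0.2, 0.8, 0.5]

def measure(sample):
    return true_quality[sample] + random.uniform(-0.1, 0.1)

# Epsilon-greedy allocation: exploit the best running average most
# of the time, explore a random sample the rest of the time.
counts = [0, 0, 0]
totals = [0.0, 0.0, 0.0]
epsilon = 0.1

for shot in range(300):
    if random.random() < epsilon or 0 in counts:
        sample = random.randrange(3)                  # explore
    else:
        avg = [t / c for t, c in zip(totals, counts)]
        sample = avg.index(max(avg))                  # exploit
    counts[sample] += 1
    totals[sample] += measure(sample)

print("beam-time allocation per sample:", counts)
```

The agent converges on spending most of the 300 "shots" on the highest-quality sample without ever being told which one it is — the trial-and-error, reward-driven behavior the article describes.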
AI-agent-driven capabilities not only allow scientists to perform their existing experimental workflows better, but they also enable entirely new methods of experimentation. To push the limits of what this technology could do to accelerate materials discovery at NSLS-II, a group of scientists and engineers developed and executed an autonomous, multimodal AI/ML-driven experiment. This experiment successfully measured a single sample library across two distant beamlines as measurements were taken simultaneously — a world first in beamline science.
Different kinds of analysis, however, require different kinds of AI tools, whether it’s poring through massive amounts of data to find a very small detail at the end of an experiment or monitoring something in real time and looking for a particular indicator. Because of this, several teams at NSLS-II are constantly working on new and innovative ways to leverage data analysis for different beamlines, techniques, and experiments. To use these tools and integrate AI agents brought to the facility by user groups, a robust framework is needed.
The future of human-AI collaboration in beamline science
The future of human-AI interfaces at beamlines is likely to combine automation with interactivity: researchers will use AI models to quickly analyze large amounts of data while retaining the ability to directly interact with and interpret their results. Building a network of agents and tools that communicate with one another through rich human-AI interfaces is a major initiative at NSLS-II, and considerable work is going into intuitive interfaces that enhance the user experience.
In this kind of cooperative framework, AI agents perform specific tasks, like classifying data or suggesting new measurements, and they can interact with human researchers or other agents. The goal is to have users interface with these agents easily, even with no AI expertise. Having meta-agents that act as adjudicators — that is, direct which, if any, of many different agents is controlling the beamline at a given time — ensures the right tool is being used for the right job at the right time.
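The adjudicator pattern can be sketched as a simple dispatch rule: each agent advertises when it is applicable, and a meta-agent picks at most one to control the next step. The agent names, predicates, and priority scheme below are hypothetical, invented purely to illustrate the idea.

```python
# Hypothetical meta-agent adjudicator sketch. Agent names and the
# priority-based decision rule are illustrative, not NSLS-II's design.
class Agent:
    def __init__(self, name, applies, priority):
        self.name = name
        self.applies = applies      # predicate over the experiment state
        self.priority = priority    # higher priority wins when several apply

def adjudicate(agents, state):
    """Return the highest-priority applicable agent, or None."""
    candidates = [a for a in agents if a.applies(state)]
    return max(candidates, key=lambda a: a.priority, default=None)

agents = [
    Agent("anomaly-watchdog", lambda s: s["anomaly_score"] > 0.9, priority=10),
    Agent("measurement-suggester", lambda s: s["queue_empty"], priority=5),
    Agent("classifier", lambda s: True, priority=1),
]

# An anomaly outranks routine work; otherwise the routine agents run.
print(adjudicate(agents, {"anomaly_score": 0.95, "queue_empty": True}).name)
print(adjudicate(agents, {"anomaly_score": 0.1, "queue_empty": False}).name)
```

The value of the pattern is that users never choose among agents themselves — the adjudicator guarantees "the right tool at the right time" while individual agents stay simple and single-purpose.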
Much like the people who work at NSLS-II, these tools must work together — not in isolation — to enhance overall scientific output and the user experience at NSLS-II. The key to leveraging AI for experiments at NSLS-II is to ensure that the framework is versatile enough to be used for multiple types of experiments on a number of different beamlines. This is where Bluesky, the backbone of these projects, comes in. Simply put, Bluesky is an open-source software suite for collecting, storing, managing, and analyzing data. It’s a library for experiment control and a collection of scientific data and metadata. The Bluesky control system, which was initially developed at NSLS-II and is now used and further developed at light sources around the world, allows the seamless integration of AI tools across the facility.
Bluesky is a versatile toolbox. It’s like having a set of building blocks that can be fit together and swapped around to meet the needs of the experiment that’s being performed; it’s modular and adaptable. By having a standardized interface, beamlines can integrate agents that can perform almost any function that is required. Being open source, Bluesky also fosters collaboration. The experience of users, as they explore this new frontier, can help shape the software’s development and interoperability between experiments, beamlines, and even other facilities in the future.
NSLS-II is building a future where AI/ML can facilitate experiments requiring high-complexity data analysis, multi-modality setups, and broader decision-making tools. By allowing users to process more samples or handle more complex experiments at accelerated time scales, these technologies empower researchers to take risks and tackle high-complexity samples they might otherwise avoid due to time and personnel constraints. AI/ML has already had a great impact on synchrotron research, and NSLS-II is continuously expanding AI-driven approaches to address new scientific challenges.
Brookhaven National Laboratory is supported by the Office of Science of the U.S. Department of Energy. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit science.energy.gov.
Follow @BrookhavenLab on social media. Find us on Instagram, LinkedIn, X, and Facebook.
Related Links
- AI Agent Helps Identify Material Properties Faster
- Game on: Science Edition
- Computer, Is My Experiment Finished?
- An "Internet of Things" Approach to Enterprise Beamlines
- An AI-Based Biomolecular Structure Prediction & Design Tool
- Advancing Discovery with Artificial Intelligence and Machine Learning at NSLS-II