Algorithms -- A new perspective on data
DOE/Pacific Northwest National Laboratory
Statisticians at Pacific Northwest National Laboratory are marrying computational power with statistical techniques to sift through many forms of data together. Their work is being applied in a variety of areas, such as analyzing handwriting and identifying bioagents.
Whether clients come in with existing data or PNNL gathers the data, statisticians help uncover hidden information through exploratory analysis, grouping like kinds of information and extracting key features. Using systematic sampling and experimental design techniques, they ensure data are reliable and will support confident decisions.
"We take varying types of information, whether it's text, video or audio, and turn it into mathematical representations. Once we have a mathematical representation, we can apply our statistical techniques of clustering and data analysis," said Brent Pulsipher, who manages PNNL's statistical and quantitative sciences group.
PNNL statisticians use clustering algorithms to find groups that share a common feature in some dimension and "cluster" them together. "Many of our algorithms are self-clustering. We don't say 'group these into a certain category that relates to a certain feature.' The algorithms specify categories themselves," said Pulsipher. Identifying these groupings is called "lead generation" because it provides leads that may explain what is causing a problem.
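To make the idea concrete, the sketch below turns short texts into word-count vectors and then lets groups emerge from the distances between them, with no categories named in advance. It is only an illustration, not PNNL's algorithm: the function names, the sample texts and the distance threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter

def bag_of_words(documents):
    """Turn each text into a vector of word counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

def cluster_by_distance(points, threshold):
    """Give the same label to points linked by a chain of neighbors within `threshold`."""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # already absorbed into an earlier cluster
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and math.dist(points[i], points[j]) <= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

# Hypothetical sample texts; the threshold of 1.5 is an assumed tuning value.
documents = [
    "sensor detects bacteria",
    "sensor detects spores",
    "handwriting slant height",
    "handwriting slant density",
]
vocabulary, vectors = bag_of_words(documents)
labels = cluster_by_distance(vectors, threshold=1.5)
print(labels)
```

The number of groups is not passed in; it falls out of the data and the threshold, which is the sense in which such a method chooses its own categories.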
In one project, statisticians are developing algorithms to identify handwriting samples. These algorithms quantify handwriting characteristics, such as density, height and slant. "We use statistical methods to test for similarities and differences between unknown and known handwriting samples," said Kris Jarman, who leads the effort.
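One simple way to picture such a comparison is to score each measured characteristic of an unknown sample against the spread seen in a known writer's samples. The sketch below does this with z-scores. The measurements, the function name and the cutoff of 2.0 are invented for illustration, and the release does not say which statistical tests the PNNL team actually uses.

```python
from statistics import mean, stdev

def feature_z_scores(known_samples, unknown_sample):
    """Score each feature of the unknown sample against the known writer's samples."""
    scores = {}
    for feature, value in unknown_sample.items():
        known_values = [sample[feature] for sample in known_samples]
        # z-score: how many sample standard deviations the unknown value
        # lies from the mean of the known samples
        scores[feature] = (value - mean(known_values)) / stdev(known_values)
    return scores

# Hypothetical measurements for one known writer (units are arbitrary).
known_samples = [
    {"density": 0.30, "height": 4.0, "slant": 10.0},
    {"density": 0.35, "height": 5.0, "slant": 12.0},
    {"density": 0.40, "height": 6.0, "slant": 14.0},
]
unknown_sample = {"density": 0.35, "height": 8.0, "slant": 13.0}

scores = feature_z_scores(known_samples, unknown_sample)
# Assumed cutoff: flag features more than 2 standard deviations from the known mean.
differing = sorted(f for f, z in scores.items() if abs(z) > 2.0)
print(differing)
```

In this made-up case only the letter height stands out, which would count as evidence against a match on that characteristic.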
In another project, statisticians are pairing algorithms with a bio-pathogen sensor under development at the Laboratory, called Matrix-Assisted Laser Desorption Ionization Mass Spectrometry (MALDI-MS). The algorithms quickly identify the unique features of suspect bacteria and categorize those features in real time according to pathogen type. In lab tests, these algorithms were more than 95 percent accurate in classifying bacterial strains.
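Categorizing a new measurement by pathogen type can be pictured as matching it to the closest known profile. The sketch below uses a nearest-centroid rule on made-up three-number feature vectors. The strain labels, the values and the choice of rule are assumptions for illustration only and do not describe the MALDI-MS algorithms themselves.

```python
import math

def nearest_centroid(reference, sample):
    """Return the label whose mean feature vector is closest to `sample`."""
    best_label, best_distance = None, math.inf
    for label, vectors in reference.items():
        # Average the reference vectors for this label, feature by feature.
        centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
        distance = math.dist(centroid, sample)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label

# Hypothetical reference features for two strains (values are invented).
reference = {
    "strain_a": [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4]],
    "strain_b": [[0.1, 0.9, 0.7], [0.2, 0.8, 0.6]],
}
predicted = nearest_centroid(reference, [0.85, 0.2, 0.3])
print(predicted)
```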