News Release

NASA-funded competition rewards efforts to predict penguin populations


NASA/Goddard Space Flight Center

Penguins

image: On the Antarctic Peninsula, an adult Adélie penguin incubates a chick to keep it warm and dry.

Credit: Catherine Foley/SUNY Stony Brook University

Penguins are arguably the most iconic and well-loved of the handful of animal species that call Antarctica home. From a scientist's point of view, they are also important ecosystem indicators: how well their populations do reflects the health of the krill and other fishery species that the birds prey upon. Changes in the penguins' environment -- sea ice, atmospheric and ocean temperatures, among other factors -- will ultimately affect the distribution and size of their colonies.

Ecologists have monitored Antarctic penguins for decades, whether through field counts at sites near research stations or, more recently, through satellite imagery spanning most of the continent. Traditional population datasets are patchy and cover limited time spans, which makes it challenging to forecast how the birds will do in the future.

A competition funded by NASA recently aimed to improve statistical models that forecast penguin populations by bringing in data scientists to aid ecologists in their efforts.

"NASA is committed to developing satellite-based tools that increase our understanding of the distribution and abundance of wildlife to foster more effective management in the field by those charged with preserving organisms, like Antarctica's penguins, which underpin natural ecosystems," says Woody Turner, manager of the Ecological Forecasting Program at NASA Headquarters in Washington.

The data contest is the brainchild of Heather Lynch, associate professor of ecology and evolution at Stony Brook University, and Grant Humphries, a former post-doctoral student with Lynch who now works as a data scientist at Black Bawks Data Science in the United Kingdom. Lynch's laboratory has developed a NASA-funded web tool, Mapping Application for Penguin Populations and Projected Dynamics (MAPPPD), which allows anyone, from fisheries managers to citizen scientists, to check on the population data available for the four species of Antarctic penguins and make forecasts for future trends.

"As we were working on developing MAPPPD, I realized that we had one of the largest data sets for penguin populations out there," Humphries said. "I was familiar with data science competitions and I thought that we could put our population data out there and see if a competitor could come up with a solution that challenged some of our current thinking about how we predict penguin populations."

Lynch hoped that the winning models from the data competition would allow her team to expand MAPPPD's predictions.

"In the tool we have developed for NASA, we want to do ensemble forecasts, just like the National Hurricane Center does for hurricanes when they put out four to five projections for which paths a particular storm might follow," Lynch said. "We'd like to put out something like that for penguin population dynamics so we can see how much various models may differ in terms of their population predictions."

But despite having one of the largest existing ecological datasets on animal populations, Lynch said, her team had trouble finding a host for the contest: the largest data competition hosting site deemed the penguin population dataset minuscule when compared to the average size of datasets used in other contests. Finally, DrivenData -- a smaller site specializing in competitions with a social impact -- offered to host the contest.

The competition, which was open for two months from April 27 to June 27 and attracted 97 participants who submitted more than 600 models, had two main prize pools plus a bonus award, for a total of $16,000. The first category of prizes, called the Prediction Competition, gauged how well the competitors' models could predict the abundance of Antarctic penguins at each of the 660 known penguin colonies for the 2014-15, 2015-16 and 2016-17 field seasons. For this, the researchers provided population data from 1980 to 2013, withholding the numbers from 2014 to 2017. The Prediction Competition awarded five prizes to the most accurate models; the winners were from all over the world -- India, United Kingdom, Israel, Ukraine and Brazil -- and had varied backgrounds that ranged from IT consulting to coastal engineering.
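The hold-out evaluation described above can be illustrated with a minimal sketch. The counts below are invented for a single hypothetical colony, the baseline model simply repeats the recent average, and mean absolute percentage error is assumed as the score; none of this is taken from the competition itself, which may have used a different metric.

```python
from statistics import mean

# Hypothetical counts for one colony, keyed by the year each field season starts.
# These numbers are invented for illustration; they are not MAPPPD data.
history = {2009: 1200, 2010: 1180, 2011: 1150, 2012: 1165, 2013: 1140}  # given to competitors
withheld = {2014: 1130, 2015: 1100, 2016: 1090}  # held back for scoring


def naive_forecast(history, years):
    """Baseline model: repeat the mean of the three most recent counts."""
    recent = [history[y] for y in sorted(history)[-3:]]
    level = mean(recent)
    return {y: level for y in years}


def mean_abs_pct_error(predicted, actual):
    """Average of |predicted - actual| / actual over all withheld seasons, in percent."""
    return 100 * mean(abs(predicted[y] - actual[y]) / actual[y] for y in actual)


forecast = naive_forecast(history, withheld.keys())
error = mean_abs_pct_error(forecast, withheld)
print(f"{error:.1f}% mean absolute percentage error")
```

A real entry would replace naive_forecast with a fitted model and average the error across all colonies rather than one.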

"The winning models did a lot better than I had expected," Humphries said. "Given that the participants were only given two months' time to work on them, I was thinking their predictions would be within 6 to 10 percent error to the real population numbers -- and their error range was under 5 percent."

Humphries was particularly excited that the first-place winner used a method in his model that the researchers themselves have been trying in recent years: using machine learning algorithms to fill in gaps in the penguin population datasets.

"I was very glad to see that that approach won, because machine learning and artificial intelligence are quickly advancing fields, and to see them being applied to penguin population forecasting is really cool," Humphries said.

The second main category of prizes was the Modeling Report Competition, for which Lynch and Humphries evaluated submitted reports explaining the biological reasoning behind the models. The winners for this category came from the pool of five who had the most accurate predictions. The bonus prize, which is pending, will reward the model that makes the closest prediction to the 2018 Adélie penguin population, once the researchers have carried out their 2017-2018 penguin counting field season.

Humphries said that their next step will be writing a scientific paper with the five winners on the commonalities and differences in approaches of the winning models. He would also like to organize another contest in the near future, pairing up Antarctic researchers with data scientists.

"Penguin populations are changing dramatically, and we really don't understand why: we penguin biologists have been banging our head against the wall about this for decades," Lynch said. "At some point, you've tried everything you can think to try. So we thought, 'Let's see if we can get some other smart people on board, and get some fresh ideas.'"

###

The MAPPPD project is a collaborative effort between Oceanites, Inc., a non-profit environmental organization in Chevy Chase, Maryland; Black Bawks Data Science Ltd.; and Lynch's lab at Stony Brook University in Stony Brook, New York. NASA provided funding for MAPPPD and prize money for the data competition.

