Washington, D.C. — A new multi-million dollar collaboration will enable university researchers to harness the full potential of the data-rich world that characterizes all fields of science and discovery. This ambitious partnership, which includes New York University, the University of California, Berkeley and the University of Washington, will spur collaborations within and across the three campuses and other partners pursuing similar data-intensive science goals.
The new 5-year, $37.8 million initiative, supported by the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation, was announced at a meeting sponsored by the White House Office of Science and Technology Policy (OSTP) that focused on developing innovative partnerships to advance technologies supporting data management and data analytic techniques.
At a time when the natural, mathematical, computational and social sciences are all producing data with relentlessly increasing volume, variety and velocity, capturing the full potential of a progressively data-rich world has become a daunting hurdle for both data scientists and those who use data science to advance their research.
While data science is already contributing to scientific discovery, substantial systemic challenges need to be overcome to maximize its impact on academic research.
To overcome these challenges, this effort seeks to achieve three core goals:
- Develop meaningful and sustained interactions and collaborations between researchers with backgrounds in specific subjects (such as astrophysics, genetics and economics) and researchers in the methodology fields (such as computer science, statistics and applied mathematics), with the specific aim of recognizing what it takes to move each of the sciences forward;
- Establish career paths that are long-term and sustainable, using alternative metrics and reward structures to retain a new generation of scientists whose research focuses on the multi-disciplinary analysis of massive, noisy, and complex scientific data and the development of the tools and techniques that enable this analysis; and
- Build on current academic and industrial efforts to work towards an ecosystem of analytical tools and research practices that is sustainable, reusable, extensible, learnable and easy to translate across research areas, and that enables researchers to spend more time focusing on their science.
"Dramatic expansion in the scale of data collection, analysis and dissemination could revolutionize the speed and volume of discovery," said Chris Mentzel, Moore's Data-Driven Discovery program officer. "However, success ultimately depends on the individuals and teams that combine domain expertise with computational, statistical and mathematical skills – what we are calling 'data science.'"
"It's been hard to establish these essential roles as durable and attractive career paths in academic research," explained Josh Greenberg, who directs the Sloan Foundation's Digital Information Technology program. "This joint project will work to create examples at the three universities that demonstrate how an institution-wide commitment to data scientists can deliver dramatic gains in scientific productivity."
The initiative will tap leading researchers at the three institutions – some of the best minds in science and academia. Faculty leads include:
- Yann LeCun, Silver Professor of Computer Science and Neural Science at New York University's Courant Institute of Mathematical Sciences and founding director of New York University's Center for Data Science;
- Saul Perlmutter, professor of physics at the University of California, Berkeley, astrophysicist at Lawrence Berkeley National Laboratory, and Nobel laureate; and
- Ed Lazowska, Bill & Melinda Gates Chair in Computer Science & Engineering at the University of Washington and director of the University of Washington's eScience Institute.
The three leaders believe universities are uniquely positioned to empower researchers to harness the deluge of valuable, heterogeneous, and noisy data continuing to come their way – and help navigate the flood of software analysis tools and approaches that are often incompatible, hard to learn or poorly written by brilliant scientists trying to get their job done.
"As someone whose research depends on the fluent use of data," said Saul Perlmutter, lead faculty member at the University of California, Berkeley, "I'm excited that we now have an opportunity to identify the typical data-science barriers, little and big, that slow our progress, and to see which could be mitigated – or, occasionally, just plain solved!"
"We must build on our existing efforts that leverage existing industry tools, generate new working tools and practices and support the multi-disciplinary experts who develop new approaches and tools needed to fill gaps," said Ed Lazowska, faculty lead at the University of Washington. "Working together, we believe we're going to shift the culture at our universities – and help accelerate broader uptake – for supporting data-intensive discovery."
"With the onslaught of data, much of the knowledge in the world is going to be extracted by machines," said Yann LeCun, faculty lead at New York University. "Universities must find new ways to advance data-science methodologies while facilitating the use of new methods and tools by researchers from every field. Universities also have an opportunity to train new generations of researchers in data-driven science."
Each of the three universities will contribute additional resources to the investment made by the Moore and Sloan foundations, including new faculty positions, physical space on campus and research support.
Each of the partner universities has distinguished itself in recent years by pioneering new approaches to discovery in fields as diverse as astronomy, biology, oceanography, and sociology through deep collaborations between researchers in these fields and researchers in data science methodology fields such as computer science, statistics and applied mathematics.
This new partnership – a coordinated, distributed experiment involving researchers at these leading universities – hopes to establish models that will dramatically accelerate this data science revolution by addressing several specific challenges.
Cross-university teams will organize their efforts around six primary areas: strengthening an ecosystem of tools and software environments, establishing academic careers for data scientists, championing education and training in data science at all levels, promoting and facilitating efforts that are accessible and reproducible, creating physical and intellectual hubs for data science activities, and identifying the scientists' data-science bottlenecks and needs through directed ethnography.
This partnership will connect with others, practice open science and share lessons along the way.
The Gordon and Betty Moore Foundation believes in bold ideas that create enduring impact in the areas of science, environmental conservation and patient care. Intel co-founder Gordon and his wife Betty established the foundation to create positive change around the world and at home in the San Francisco Bay Area. Science looks for opportunities to transform – or even create – entire fields by investing in early-stage research, emerging fields and top research scientists. Our environmental conservation efforts promote sustainability, protect critical ecological systems and align conservation needs with human development. Patient care focuses on eliminating preventable harms and unnecessary healthcare costs through meaningful engagement of patients and their families in a supportive, redesigned healthcare system. For more information, please visit http://www.moore.org or follow @MooreScientific.
The Alfred P. Sloan Foundation is a philanthropic, not-for-profit grantmaking institution that supports original research and education in science, technology, engineering, mathematics, and economic performance. Funds for this project were provided through the Foundation's Digital Information Technology program, which leverages developments in information technology to increase the effectiveness of computational research and scholarly communication. For more information, please visit http://www.sloan.org.
New York University, founded in 1831, is one of the world's foremost research universities and a member of the selective Association of American Universities. The first Global Network University, it has degree-granting university campuses in New York, Abu Dhabi, and Shanghai, as well as 11 other global academic sites, and it sends more students to study abroad than any other U.S. college or university. Through its 18 schools and colleges, NYU conducts research and provides education in the arts and sciences, law, medicine, business, dentistry, education, nursing, the cinematic and performing arts, music and studio arts, public administration, social work, engineering, and continuing and professional studies, among other areas. For more information, please visit http://www.nyu.edu or follow @nyuniversity.
The University of California, Berkeley is the world's premier public university with a mission to excel in teaching, research and public service. This longstanding mission has led to the university's distinguished record of Nobel-level scholarship, constant innovation, a concern for the betterment of our world and consistently high rankings of its schools and departments. The campus offers superior, high-value education for extraordinarily talented students from all walks of life, operational excellence, and a commitment to the competitiveness and prosperity of California and the nation. For more information, please visit http://www.berkeley.edu.
Founded in 1861, the University of Washington is one of the oldest public institutions of higher education on the West Coast and is one of the world's preeminent research-intensive universities, with more than 100 members of the National Academies, elite programs in many fields, and annual standing since 1974 among the top five universities in receipt of federal research funding. For more information, please visit http://www.washington.edu.