More Data, More Science and … Moore's Law?
In the same way that the Internet has combined with web content and search engines to revolutionize every aspect of our lives, the scientific process is poised to undergo a radical transformation based on the ability to access, analyze, and merge large, complex data sets. Scientists will be able to combine their own data with that of other scientists to validate models, interpret experiments, re-use and re-analyze data, and make use of sophisticated mathematical analyses and simulations to drive the discovery of relationships across data sets. This “scientific web” will yield higher quality science, more insights per experiment, an increased democratization of science, and a higher impact from major investments in scientific instruments.
In this talk, Yelick will describe some examples of how science disciplines from biology to astrophysics are changing in the face of their own data explosion, and how mathematical analyses, programming models, and workflow tools can enable different types of scientific exploration. This will lead to a set of open questions for computer scientists, arising from the scale of the data sets, the data rates, inherent noise and complexity, and the need to “fuse” disparate data sets. Rather than being at odds with scientific simulation, many important scientific questions will only be answered by combining simulation and observational data, sometimes in a real-time setting. Along with scientific simulations, experimental analytics problems will drive the need for increased computing performance, although the types of computing systems and software configurations may be quite different.
Finally, she will present a vision of an Extreme Data Scientific Facility, which will bring together data from many research projects, institutions, and subdomains of science to enable the transformation of the scientific process.
Katherine Yelick is a Professor of Electrical Engineering and Computer Sciences at the University of California at Berkeley and the Associate Laboratory Director for Computing Sciences at Lawrence Berkeley National Laboratory. She is known for her research in parallel languages, compilers, algorithms, libraries, architecture, and runtime systems. She co-invented the UPC and Titanium languages and developed analyses, optimizations, and runtime systems for their implementation.
Her work also includes memory hierarchy optimizations, communication-avoiding algorithms, and automatic performance tuning, including the first autotuned sparse matrix library. She earned her Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology and has been on the faculty of UC Berkeley since 1991, with a joint research appointment at Berkeley Lab since 1996. She was the director of the National Energy Research Scientific Computing Center (NERSC) from 2008 to 2012, and in her current role as Associate Laboratory Director she manages a 300-person organization that includes NERSC, the Energy Sciences Network (ESnet), and the Computational Research Division. She has received multiple research and teaching awards, including the Athena Award, and she is an ACM Fellow and an IEEE Senior Member.
She has served on study committees for the National Research Council and is a member of the California Council on Science and Technology, the National Academies Computer Science and Telecommunications Board, and the Science and Technology Board overseeing research at Los Alamos and Lawrence Livermore National Laboratories.