
White House seeks to get a handle on “big data”

MAY 01, 2012
Scientific enterprise is “drowning in data but starving for understanding.”

DOI: 10.1063/PT.3.1555

Five federal science and technology agencies announced plans to spend more than $200 million in total to develop new tools and techniques to process and analyze huge volumes of digital data. The initial participants in the “big data” R&D effort are the Department of Energy, NSF, the Department of Defense and its Defense Advanced Research Projects Agency (DARPA), the National Institutes of Health, and the US Geological Survey (USGS).

Presidential science adviser John Holdren said the initiative, announced on 29 March, responds to criticism from the President’s Council of Advisors on Science and Technology that the government has been underinvesting in technologies needed to collect, store, preserve, manage, analyze, and share large quantities of data. The world is now generating 10²¹ bytes of data each year, and the volume is growing rapidly, Holdren said. The data are generated from such diverse sources as remote sensors, online retail transactions, text messages, email, video messages, computers running large-scale simulations, and scientific instruments, including particle accelerators and telescopes. Big data, Holdren said, “are critical to accelerating the pace of discovery in many different domains of science and engineering.”

William Brinkman, director of DOE’s Office of Science, said experiments at the Large Hadron Collider generate terabytes of data each second, and a climate-model simulation produces 10 terabytes a day. As part of the data initiative, DOE announced a $25 million, four-year award to a national laboratory–university consortium led by Lawrence Berkeley National Laboratory. The goal is to establish a scalable data management, analysis, and visualization institute to assist researchers in using the latest software tools to analyze the data generated by the labs’ high-performance computers.

A joint solicitation announced by NSF and NIH is aimed at advancing the core scientific and technological means for managing, analyzing, visualizing, and extracting useful information from large and diverse data sets. Grants will be awarded for research on new algorithms, statistical methods, technologies and tools for improved data collection and management, data analysis, and e-science collaboration environments. NSF also announced a $10 million grant to researchers at the University of California, Berkeley, who are developing novel data-center programming models, improved computational infrastructure, and new scalable machine-learning algorithms and data management tools for handling large-scale heterogeneous data sets. The NSF National Center for Supercomputing Applications, at the University of Illinois at Urbana-Champaign, is home to one of the most powerful supercomputers in the world, and it opened access to researchers just a few weeks ago, noted NSF director Subra Suresh.

The NIH announced that it is making its 1000 Genomes Project, the world’s largest collection of data on human genetic variation, available for free on a cloud operated by Amazon.com. At 200 terabytes, that data printed out would fill 16 million file cabinets, said NIH director Francis Collins. The project has collected a data set so large that few researchers have the computing power needed to make use of it.

The Pentagon “is placing a big bet on big data,” said Zachary Lemnios, assistant secretary of defense for research and engineering. The $60 million of new research funding just announced brings DOD spending on big data R&D to $250 million annually. Some of the funding will be devoted to open prize competitions (see PHYSICS TODAY, November 2010, page 21) to be announced in the months ahead, Lemnios said. One DOD goal is to enable “truly autonomous systems that go well beyond tethered joysticks. These systems will be agile, they will maneuver and understand their environment, they will make decisions by themselves, and [they will] also know when to call upon a human,” he said. Ken Gabriel, acting director of DARPA, said that agency will devote $25 million a year for four years to developing computational techniques and software tools to sort through mountains of internet traffic looking for terrorist threats. Gabriel likened the quest to searching for a 55-gallon drum in the Atlantic Ocean.

“We are drowning in data but starving for understanding,” said USGS director Marcia McNutt. The agency’s John Wesley Powell Center for Analysis and Synthesis announced the award of eight new research projects for transforming big data sets and big ideas about Earth science theories into scientific discoveries.

Holdren promised that other federal agencies would have additional announcements about big data in the months ahead. He invited industry and universities to participate and said the effort is “not something the government can or wants to do by itself.”

A White House fact sheet (http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_fact_sheet_final_1.pdf) lists dozens of ongoing federal programs that address challenges and opportunities afforded by big data in support of agency missions and science and innovation.

The evolution of Hurricane Katrina. This simulation was generated by researchers in the Advanced Visualization Laboratory at the NSF-funded National Center for Supercomputing Applications. The AVL team transformed terabytes of data into an animation of the 36-hour period when the storm gained energy over the warm waters of the Gulf of Mexico and headed toward New Orleans.

NSF

More about the Authors

David Kramer. dkramer@aip.org

This Content Appeared In

Volume 65, Number 5
