Mathematical tools to tame big data

AUG 01, 2018
Data Analysis Techniques for Physical Scientists, Claude A. Pruneau, Cambridge U. Press, 2017, $89.99

DOI: 10.1063/PT.3.4000

Emilie Martin-Hein

In this era of big data, most researchers in the physical sciences will encounter statistics and data analytics at some point in their careers. And data science skills are not just relevant to the physical sciences—they are applicable to a wide array of modern problems in areas ranging from health care to marketing. Data Analysis Techniques for Physical Scientists by Claude Pruneau is thus of potential interest not only for physical scientists but also for those interested in other fields that deal with large data sets and their challenges.

Pruneau draws on his extensive research experience at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory and at the Large Hadron Collider at CERN, and his book contains much of the fundamental knowledge required for graduate students studying nuclear and high-energy physics. As such, it lies in a sparsely populated middle ground in the literature. Books specifically about data analysis techniques, such as The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition, 2009) by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, usually assume some prior knowledge of basic statistics and contain little introductory material. The few books about data analysis methods for physical scientists, including Roger Barlow’s Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences (1989) and Brian Martin’s Statistics for Physical Sciences: An Introduction (2012), are aimed primarily at undergraduates.

Data Analysis Techniques for Physical Scientists, in contrast, presents a comprehensive, high-level treatment of topics specific to nuclear- and particle-physics experiments. It is also accessible to researchers who work with multidimensional data sets and are interested in learning about the data analysis methods used in large-scale particle experiments, a topic that’s typically not covered in general statistics texts.

The introductory chapter takes the reader on a philosophical journey that sets the tone for the rest of the book. Using anthropological and historical arguments, it explains the meaning and purpose of the scientific method. The chapter illustrates that statistical methods are essential for extracting significant scientific results. It is a delightful read for scientists and nonscientists alike.

The 13 chapters that follow are divided into three parts. Each chapter closes with problems designed to deepen students’ understanding of the concepts discussed. Those exercises allow the reader to derive relevant formulas by means of creative and realistic examples often taken from actual experiments.

The first part of the book, “Foundation in Probability and Statistics,” gives a thorough and mathematically rigorous tour of its topic that could in fact be a standalone introduction to advanced statistics for a broad range of readers. The section features a modern account of the frequentist and Bayesian interpretations of probabilities. The in-depth catalog of the language of statistics and probability is a welcome resource that the reader will be able to reference as needed.

Part 1 also contains three chapters devoted to classical inference, a formal introduction to confidence intervals and hypothesis testing, and an excellent review of Kalman filtering. A study of Bayesian inference methodology brings part 1 to a conclusion.

Part 2, “Measurement Techniques,” is equally outstanding. Discussions of particle decays, cross sections, and corresponding observables are followed by thorough treatments of particle identification, event reconstruction, and correlation functions. Instrumental effects, detection efficiency, and unfolding methods also receive extensive consideration. The brief part 3, “Simulation Techniques,” highlights Monte Carlo methods.

Some topics are missing from this otherwise thorough 716-page book. As he admits in his preface, Pruneau does not go into the details of detector technologies. In my view, that gap is not detrimental to a reader interested in learning specifically about data analysis. However, some missing topics deserved at least a brief mention. A word on minimizing experimenters’ bias by performing blind analyses would be beneficial, especially to graduate students entering the field. Overviews of some commonly used advanced algorithms, such as neural networks and decision trees, would complete this modern data analysis practitioner’s toolbox. Fortunately, those topics are covered elsewhere—for example, interested physical-sciences students may find a good supplemental read in Adrian Bevan’s Statistical Data Analysis for the Physical Sciences (2013).

Data Analysis Techniques for Physical Scientists offers an accessible but rigorous and comprehensive presentation of data analysis techniques in modern large-scale experiments. Furthermore, much of the book is applicable beyond the physical sciences; it is a useful resource on probability and statistics that would benefit anyone who works with large data sets. Taken as a whole, it is an exceptional general reference for graduate students and seasoned experimental researchers alike.

More about the Authors

Emilie Martin-Hein earned a PhD in high-energy physics at the University of California, Irvine, in 2009 and a data mining and applications graduate certificate at Stanford University in 2017. She currently works at Skyline College and City College of San Francisco in California.

This Content Appeared In
Volume 71, Number 8
