Synchrotrons face a data deluge
The storage ring of the European Synchrotron Radiation Facility sits at the confluence of the Drac and Isère rivers in Grenoble, France. Credit: ESRF/Jocelyn Chavy
In August the European Synchrotron Radiation Facility (ESRF) in Grenoble, France, opened the Extremely Brilliant Source (EBS), a newly upgraded light source that is 100 times as brilliant as its predecessor and some 100 billion times as powerful as the average hospital x-ray source.
There’s something else EBS can do better than its predecessor: produce data.
The light source has a theoretical capacity to produce 1 petabyte of data per day, says Harald Reichert of ESRF.
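For a rough sense of scale, a petabyte per day is about what a detector would deliver if it streamed continuously at the 12-gigabyte-per-second peak rate cited later in this article for state-of-the-art beamlines. The arithmetic below is only a back-of-the-envelope sketch: treating that peak rate as a sustained, round-the-clock figure is an illustrative assumption, not a measurement.

    # Back-of-the-envelope check: a sustained ~12 GB/s stream for a full day
    # lands in petabyte territory. Treating the 12 GB/s peak rate as a
    # sustained rate is an assumption for illustration only.
    rate_gb_per_s = 12                      # assumed sustained rate, GB/s
    seconds_per_day = 24 * 60 * 60          # 86,400 s
    daily_gb = rate_gb_per_s * seconds_per_day
    daily_pb = daily_gb / 1e6               # 1 PB = 10^6 GB (decimal units)
    print(f"{daily_gb:,.0f} GB/day ≈ {daily_pb:.2f} PB/day")
    # -> 1,036,800 GB/day ≈ 1.04 PB/day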
Since the 1980s, both beamline photon fluxes and detector data rates have grown far faster than the exponential pace described by Moore’s law. Even as scientists turn to automation, the sheer variety of experiments that synchrotrons host makes automating their output uniquely complicated.
In the early 2000s, three months’ worth of data from a detector could fit into a 100-megabyte archive, says Stefan Vogt of the Advanced Photon Source (APS) at Argonne National Laboratory.
The data transfer and storage infrastructure can and does fail to keep up, resulting in lags that can stretch to days, according to James Holton of Lawrence Berkeley National Laboratory.
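To see how a transfer bottleneck turns into a multiday lag, consider a sketch with hypothetical numbers: a beamline that writes data faster than the network and storage can absorb it builds up a backlog that keeps draining long after the experiment ends. Every rate below is an illustrative assumption, not a figure from the article.

    # Illustrative only: hypothetical rates showing how a backlog builds up
    # during an experiment and how long it takes to drain afterward.
    acquisition_rate = 12.0   # GB/s produced at the detector (assumed)
    transfer_rate = 2.0       # GB/s the network/storage can absorb (assumed)
    run_hours = 8.0           # length of the experiment (assumed)

    run_seconds = run_hours * 3600
    backlog_gb = (acquisition_rate - transfer_rate) * run_seconds
    drain_hours = backlog_gb / transfer_rate / 3600

    print(f"Backlog after the run: {backlog_gb / 1000:.0f} TB")
    print(f"Time to drain it: {drain_hours:.0f} hours (~{drain_hours / 24:.1f} days)")
    # -> Backlog after the run: 288 TB
    # -> Time to drain it: 40 hours (~1.7 days)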
The rapid inflation of data also makes it difficult to future-proof new beamlines. “You have to be making choices now against what the computing infrastructure is going to look like in about five years’ time,” says Graeme Winter, an x-ray crystallographer at the Diamond Light Source in the UK.
Upgrading the storage infrastructure only shifts the bottleneck further downstream. There, automation can pick up the reins. Not only can AI, machine learning, and neural networks help with analysis, but they can also make datasets far more manageable by discarding poor-quality measurements as they are acquired. They can further trim the excess by halting data collection mid-experiment once predefined conditions have been met.
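A minimal sketch of what that kind of automation might look like is shown below: a loop that scores each detector frame as it arrives, discards frames that fail a simple quality test, and stops the acquisition once enough good frames have been banked. The frame stream, the signal-to-noise score, and the thresholds are all hypothetical stand-ins; real beamline pipelines use far more sophisticated, technique-specific criteria.

    import numpy as np

    def frame_stream(n_frames=500, shape=(256, 256), seed=0):
        """Hypothetical stand-in for a detector: yields noisy frames,
        some containing a real signal and some containing only noise."""
        rng = np.random.default_rng(seed)
        for _ in range(n_frames):
            frame = rng.normal(0.0, 1.0, shape)      # background noise
            if rng.random() < 0.6:                   # ~60% of frames carry signal
                frame[100:120, 100:120] += 8.0       # a bright feature
            yield frame

    def quality_score(frame):
        """Crude quality metric: peak height relative to background noise."""
        return (frame.max() - frame.mean()) / frame.std()

    MIN_SCORE = 5.0     # discard frames scoring below this (assumed threshold)
    TARGET_GOOD = 200   # stop the run once this many good frames are kept

    kept = []
    for i, frame in enumerate(frame_stream()):
        if quality_score(frame) < MIN_SCORE:
            continue                      # poor-quality frame: never stored
        kept.append(frame)
        if len(kept) >= TARGET_GOOD:      # early stopping: enough good data
            print(f"Stopping after {i + 1} frames; kept {len(kept)}.")
            break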
Indeed, the Large Hadron Collider (LHC), which CERN claims generates roughly a petabyte of collision data every second, copes by using trigger systems that discard all but a tiny fraction of events before anything reaches permanent storage. That approach works because each LHC experiment is built around a narrowly defined physics program; a synchrotron’s beamlines, by contrast, serve a huge range of techniques, samples, and users.
Consequently, it’s often left to the users of each beamline and application to develop their own specialized firmware and algorithms. When large synchrotrons like APS host dozens of beamlines, some of which are deeply customizable, the volume of specialized use cases renders a streamlined system like CERN’s impractical.
The sample chamber within a beamline at the National Synchrotron Light Source II in Upton, New York. State-of-the-art synchrotron beamlines can produce up to 12 gigabytes of data per second.
When users do take up the challenge of building tools, the programmers who make them often approach the problem differently from the scientists who need them. Earlier this year, crystallographers used Diamond to aid drug discovery efforts by scanning a protease of the coronavirus SARS-CoV-2 against a library of candidate drug fragments.
Frank von Delft, a macromolecular crystallographer at Diamond, says that programmers should focus on making their tools easier to use. “When that’s achieved,” he says, “your whole platform suddenly becomes powerful.” In particular, he cites Phenix, a widely used software suite for solving macromolecular structures from crystallographic data.
Fortunately, the future seems to be pointing toward greater streamlining, including at the synchrotron end. Traditionally, facilities left the data-handling part of their science to the users, but the enormous data volumes, as well as other factors such as more computation shifting to the cloud, are changing that.
Reichert believes each synchrotron facility should help provide scientists the tools they need and assist with the computation. “When we give [a scientist] beam time,” he says, “we’d better ask the question: What do you do with the data, and what kind of help do you need to actually get an answer to your scientific problem and put the answer out in the open?”