Big Data Analytics for 3-D Atomic Scale Imaging (#4290)
1 University at Buffalo-the State University of New York, Dept. of Materials Design and Innovation, Buffalo, New York, United States of America
In this talk we explore the application of advanced data analytics to data derived from atomic-scale characterization. We focus on two genres of imaging: atom probe tomography and time-resolved X-ray diffraction. Advanced manifold learning methods can provide insight into both the interpretation of images and the performance of the instrumentation. The presentation will suggest ways in which machine learning can help guide the development of instruments with computational intelligence capabilities.
Keywords: atom probe tomography, x-ray, manifold learning
Building a Data System for LCLS-II (#4011)
J. B. Thayer1
1 SLAC National Accelerator Laboratory, LCLS, Menlo Park, CA, United States of America
The volume of data generated by the upcoming LCLS-II upgrade will present a considerable challenge for data acquisition, data processing, and data management. According to current estimates, one instrument could generate instantaneous data rates of hundreds of GB/s. In this high-throughput regime, it will be necessary to reduce the data on the fly before writing it to persistent storage. Even with data compression, operating all four new instruments is expected to produce an estimated 100 PB of data in the first year alone. We present a description of the envisioned LCLS-II Data System, which provides: a scalable data acquisition system to acquire shot-by-shot data at repetition rates up to 1 MHz; an inline data reduction pipeline with a configurable set of tools, including feature extraction, data compression, and event veto, to reduce the amount of data written to disk; an analysis software framework for the timely processing of this data set; and an expanded data management system for accessing experiment data and metadata.
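The three reduction tools named above (event veto, feature extraction, data compression) can be illustrated with a minimal sketch. It is not the LCLS-II implementation: the abstract does not specify any algorithms, so the total-intensity veto, the two extracted features, the use of `zlib`, and all names (`ReducedEvent`, `reduce_event`, `reduction_pipeline`, `veto_threshold`) are assumptions chosen purely for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class ReducedEvent:
    """What survives reduction for one shot (hypothetical layout)."""
    shot_id: int
    peak_value: int    # extracted feature: maximum pixel value
    total_signal: int  # extracted feature: sum over all pixels
    payload: bytes     # losslessly compressed pixel data


def reduce_event(shot_id: int, pixels: List[int],
                 veto_threshold: int) -> Optional[ReducedEvent]:
    """Apply veto, feature extraction, and compression to one shot."""
    total = sum(pixels)
    # Event veto (assumed criterion): drop shots with too little signal.
    if total < veto_threshold:
        return None
    # Compression: zlib stands in for whatever codec a facility would use.
    # bytes() assumes 8-bit pixel values in the range 0-255.
    payload = zlib.compress(bytes(pixels))
    return ReducedEvent(shot_id, max(pixels), total, payload)


def reduction_pipeline(events: Iterable[Tuple[int, List[int]]],
                       veto_threshold: int) -> Iterator[ReducedEvent]:
    """Yield only the shots that pass the veto, in reduced form."""
    for shot_id, pixels in events:
        reduced = reduce_event(shot_id, pixels, veto_threshold)
        if reduced is not None:
            yield reduced


# Two synthetic shots: one empty, one containing signal.
shots = [
    (0, [0] * 64),
    (1, [0] * 60 + [200, 180, 90, 40]),
]
kept = list(reduction_pipeline(shots, veto_threshold=100))
```

Writing the pipeline as a generator means each shot is processed and discarded before the next one is read, which mirrors the requirement that reduction happens before data reaches persistent storage.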
Data Challenges in Serial Femtosecond Crystallography (#2924)
S. J. Aplin1
1 Deutsches Elektronen-Synchrotron (DESY), Centre for Free Electron Laser Science, Hamburg, Hamburg, Germany
Serial Femtosecond Crystallography (SFX) experiments performed at X-ray free electron laser (FEL) and synchrotron sources represent one of photon science’s biggest data processing challenges. The latest generation of pixel detectors currently being deployed at synchrotron and X-ray free electron laser facilities is capable of generating data rates well in excess of 1 gigabyte per second, with frame rates exceeding 1 kHz. These rates will increase by a factor of 10 in the near future. Such data rates, coupled with the fact that data reduction and online data rejection are not widely utilised in the field of photon science, create a significant data handling challenge at such facilities. Experiments currently undertaken at the Linac Coherent Light Source (LCLS) produce experimental data sets of over 100 terabytes in less than a single week of operation. To date, the complete raw data from these experiments has been saved to disk and then archived to tape. The length of time that this data will be held at facilities is currently unclear, but it has been assumed to be in excess of 5 years. Keeping the complete raw data has allowed users to perform repeated offline processing on each and every frame produced by the detector. Given the expected increase in data rates at X-ray free electron laser facilities such as EU-XFEL and LCLS-II, these facilities have stated that this approach will no longer be feasible, as storing all data produced by the detector for a number of years would incur prohibitive costs. This presentation will use SFX experiments, and the data processing software that they employ, to discuss where and how this massive data challenge can be addressed, and will focus on opportunities for data reduction and online data rejection.
Keywords: Data Processing, Online, Crystallography
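A back-of-envelope calculation helps put the figures quoted in this abstract in proportion. The sketch below uses only the numbers stated above (1 GB/s, a tenfold increase, 100 TB per experiment) with decimal units (1 TB = 1000 GB). The function names and the 5 % retention fraction are hypothetical, chosen only to show how online rejection would scale the stored volume.

```python
def acquisition_time_hours(volume_tb: float, rate_gb_per_s: float) -> float:
    """Hours of continuous acquisition needed to accumulate volume_tb."""
    seconds = volume_tb * 1000.0 / rate_gb_per_s  # 1 TB = 1000 GB
    return seconds / 3600.0


def stored_volume_tb(raw_tb: float, keep_fraction: float) -> float:
    """Volume written to storage if only keep_fraction of frames is kept."""
    return raw_tb * keep_fraction


# Figures from the abstract: 100 TB data sets, 1 GB/s today, 10x in future.
hours_now = acquisition_time_hours(100.0, 1.0)
hours_future = acquisition_time_hours(100.0, 10.0)

# Hypothetical retention fraction, for illustration only.
kept_tb = stored_volume_tb(100.0, keep_fraction=0.05)

print(f"100 TB at 1 GB/s:  {hours_now:.1f} h of continuous acquisition")
print(f"100 TB at 10 GB/s: {hours_future:.1f} h of continuous acquisition")
print(f"Stored volume at 5% retention: {kept_tb:.1f} TB")
```

At 1 GB/s a 100 TB data set corresponds to about 27.8 hours of continuous acquisition, and a tenfold rate increase shortens that to under 3 hours, which is why retaining every frame for years becomes costly.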