SAXSPipe Full Documentation
Our high-throughput SAXS Autosampler can collect 96 samples autonomously over a five-hour period. To address the large volume of data this generates, we have developed SAXSPipe, an automated SAXS data processing and analysis pipeline. SAXSPipe requires no user intervention and produces a web-viewable table of graphs and statistics that lets the user rapidly review the results of the SAXS experiments, from basic data quality checks through to protein flexibility analysis, oligomer state determination, and protein density modeling. The results of the analysis are stored in an automatic analysis folder that the user can take home and consult later.
SAXSPipe can also be run from the command line, which makes it possible to re-analyze data after the experiment.
If run without any parameters, it prints a basic usage message:
saxspipe2.py <sample> [buffer]
If only a sample file is given, the program searches for the most recent buffer file and, if one is found, uses it for the analysis. To analyze the sample without buffer subtraction, enter '0' as the buffer.
If only a sample is given and that file carries a buffer designation (Bxxx in the filename), it is buffer-subtracted only if the buffer file that was found has the same sample name as the sample file. This covers buffers taken during a single concentration series. If the previous buffer has a different name, no buffer subtraction is performed. If both sample and buffer are given on the command line, this test is skipped.
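The buffer selection rules above can be summarized in a short sketch. This is not the code used in saxspipe2.py; the Measurement type, its fields, and the choose_buffer function are hypothetical names introduced for illustration, and the sketch assumes the sample name and the Bxxx designation have already been parsed from the filename.

```python
from typing import NamedTuple, Optional


class Measurement(NamedTuple):
    """Hypothetical record for one data file (not part of SAXSPipe)."""
    path: str
    sample_name: str   # assumed to be parsed from the filename
    is_buffer: bool    # True when the filename carries a Bxxx designation


def choose_buffer(sample: Measurement,
                  explicit_buffer: Optional[str],
                  most_recent_buffer: Optional[Measurement]) -> Optional[str]:
    """Return the buffer path to subtract, or None for no subtraction."""
    # A buffer given on the command line always wins; '0' disables subtraction.
    if explicit_buffer is not None:
        return None if explicit_buffer == "0" else explicit_buffer
    # No buffer argument: fall back to the most recent buffer, if any.
    if most_recent_buffer is None:
        return None
    # A file that is itself a buffer is only subtracted when the previous
    # buffer belongs to the same sample (same concentration series).
    if sample.is_buffer and sample.sample_name != most_recent_buffer.sample_name:
        return None
    return most_recent_buffer.path
```

Note that the same-name test is applied only in the branch without an explicit buffer, which mirrors the statement that the test is skipped when both files are given on the command line.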
Sample and buffer can be specified in three different ways:
1) full path to the file: saxspipe2.py /path/to/my/data/file.tif
2) relative path to the file: saxspipe2.py ../path/data/file.tif
3) series number of the file: saxspipe2.py 6
For case 3), data are expected to be in the ../data/ folder.
In case 1), results are saved in /path/to/my/analysis/automatic_analysis/
In cases 2) and 3), results are saved in ./automatic_analysis/
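As a rough illustration of the three cases, the following sketch classifies a sample argument and derives the results folder. The function names classify_spec and output_dir are hypothetical, and the rule used for case 1) (an "analysis" folder next to the data folder) is an assumption inferred from the example paths above rather than confirmed behavior of saxspipe2.py.

```python
import posixpath


def classify_spec(spec: str) -> str:
    """Classify a sample argument as 'series', 'absolute' or 'relative'."""
    if spec.isdigit():
        return "series"      # case 3): series number, data expected in ../data/
    if posixpath.isabs(spec):
        return "absolute"    # case 1): full path
    return "relative"        # case 2): relative path


def output_dir(spec: str) -> str:
    """Return the folder where results would be saved for this argument."""
    if classify_spec(spec) == "absolute":
        data_dir = posixpath.dirname(spec)    # e.g. /path/to/my/data
        parent = posixpath.dirname(data_dir)  # e.g. /path/to/my
        # Assumption: 'analysis' sits next to the data folder.
        return posixpath.join(parent, "analysis", "automatic_analysis")
    # Cases 2) and 3): results go below the current working directory.
    return posixpath.join(".", "automatic_analysis")


print(output_dir("/path/to/my/data/file.tif"))  # /path/to/my/analysis/automatic_analysis
print(output_dir("../path/data/file.tif"))      # ./automatic_analysis
print(output_dir("6"))                          # ./automatic_analysis
```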
The pipeline is written in Python 2 and uses analysis software from the ATSAS package by Dmitri Svergun’s group in Hamburg, together with our in-house image scaling and integration software SasTool. The software has been designed to be modular and adaptable, so that new software tools can be added as they arise. SAXSPipe is now regularly used during biological SAXS experiments to help users analyze their data.
SAXSPipe output is organized as shown here:
SAXSPipe html table output overview:
The output analysis table has the following parts:
1) Initial data collection information – this section includes information on the data collection, such as the time it was taken, the sample and buffer filenames, the results of the scaling and integration of the images by sastool, and an Rg and distribution function analysis. Clicking on the .sub file name brings up the text file with the scattering profile data, which can then be opened in an appropriate program (for instance Primus or Graphit) for further analysis. The sastool log can also be accessed by clicking on the entries in the ‘Buffer(rejected) Sample(rejected)’ column. Log files from autorg and datgnom can be viewed by clicking on the respective links.
2) Image graphs - this section of the table features four graphs to help the user quickly identify problems with the data collection:
- a. The variance of the scaled and integrated images from the buffer and sample data collections, allowing for the rapid identification of radiation damage and other problems such as air bubbles. A horizontal dashed line shows the limit above which images are rejected.
- b. A comparison of the averaged buffer and sample scattering (tot files), which allows the user to quickly see how strongly the sample scatters.
- c. Low-q (< 0.2 Å^-1) comparison of the subtracted sample scattering profiles for each image, again allowing the user to identify radiation damage over time.
- d. A graph of the change in Rg over the image series
3) Data graphs - this section contains a series of basic SAXS data analysis graphs:
- a. log(I) vs q plot - the usual SAXS analysis graph
- b. log(I) vs log(q) plot - as above, but with the lower q region emphasized
- c. q^2 × I(q) vs q plot (Kratky plot) - protein flexibility/disorder graph
Clicking on any of these plots opens its interactive version.
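The three data graphs are simple transformations of the same subtracted profile. The sketch below shows one way to compute the plotted coordinates from lists of q and I(q) values; plot_transforms is a hypothetical helper, not a SAXSPipe function, and dropping non-positive intensities from the logarithmic plots is an assumption made here for the sketch.

```python
import math


def plot_transforms(q, intensity):
    """Return (x, y) point lists for the three standard SAXS plots."""
    log_i, log_log, kratky = [], [], []
    for qi, ii in zip(q, intensity):
        # Kratky plot: q^2 * I(q) against q, defined for every point.
        kratky.append((qi, qi * qi * ii))
        # Logarithmic plots need positive intensities (noisy subtracted
        # data can dip below zero at high q).
        if ii > 0:
            log_i.append((qi, math.log10(ii)))
            if qi > 0:
                log_log.append((math.log10(qi), math.log10(ii)))
    return {"log_i_vs_q": log_i, "log_i_vs_log_q": log_log, "kratky": kratky}
```

In a Kratky plot, a compact folded protein typically gives a bell-shaped peak that returns toward the baseline, whereas a flexible or disordered protein tends to plateau or keep rising at high q.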
4) Data analysis graphs - results of automated data analysis:
- a. Guinier plot – the Guinier region of the plot used for Rg calculation
- b. P(r) distribution - the P(r) function calculated by datgnom
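For readers unfamiliar with the Guinier plot, the underlying relation is ln I(q) = ln I0 - (Rg^2 / 3) q^2 at low q, so a straight-line fit of ln I against q^2 yields Rg and I0. The sketch below is a plain least-squares fit over points supplied by the caller; it is not how autorg works internally (autorg also selects the fitting range automatically), and guinier_fit is a hypothetical name.

```python
import math


def guinier_fit(q, intensity):
    """Fit ln I = ln I0 - (Rg^2 / 3) q^2 and return (Rg, I0).

    The caller is expected to pass only points from the Guinier region
    (commonly taken as q * Rg below about 1.3 for globular proteins).
    """
    if len(q) < 2 or len(q) != len(intensity):
        raise ValueError("need at least two matching (q, I) points")
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]  # requires positive intensities
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: no physical Rg")
    return math.sqrt(-3.0 * slope), math.exp(intercept)
```

A rising Guinier plot (positive slope) has no physical Rg and is reported here as an error rather than a number.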
5) Row analysis - comparison of data from one concentration series:
- a. sub(n)/sub(1) – ratio of each dataset in the concentration series to the first dataset. This graph will show any concentration effects on the scattering.
- b. Rg - a graph of the change of Rg across the concentration series
- c. I0 - a graph of change of I0 across the concentration series
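The sub(n)/sub(1) comparison can be sketched as a point-by-point ratio of each subtracted profile to the first one in the series. This is an illustration only: ratio_to_first is a hypothetical helper, and it assumes all profiles are sampled on the same q grid and that any scaling by concentration has been handled elsewhere.

```python
def ratio_to_first(profiles):
    """Divide every intensity profile in a series by the first profile.

    profiles: list of equal-length intensity lists on a shared q grid.
    Points where the first profile is zero are returned as NaN.
    """
    first = profiles[0]
    nan = float("nan")
    return [
        [i_n / i_1 if i_1 != 0 else nan for i_n, i_1 in zip(profile, first)]
        for profile in profiles
    ]
```

A ratio that stays flat across q suggests the profiles differ only by a scale factor, while a ratio that bends at low q points to concentration-dependent effects.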