1.Software Introduction
The Stereo-seq Analysis Workflow(1) (SAW) software suite is a set of bundled pipelines that map sequenced reads to their spatial locations on the tissue section, quantify spatial gene expression, and visualize its spatial distribution. SAW processes Stereo-seq(2) sequencing data to generate spatial gene expression matrices, which users can take as the starting point for downstream analysis. SAW includes thirteen essential and suggested pipelines, as well as auxiliary tools that support other handy functions:
- splitMask: Split Stereo-seq Chip T mask file into several pieces according to CID indexing in the SE FASTQ files.
- CIDCount: Count CIDs in the Stereo-seq Chip T mask file and roughly estimate the memory required for mapping.
- mapping: Match the in situ captured reads that Stereo-seq records in FASTQ(3,4) files to their spatial information. It also aligns reads to the reference genome and generates coordinate-sorted BAM files.
- merge (optional): Combine the CID (barcode) list files with read counts from multiple runs of mapping. Needed only when an analysis combines multiple pairs of FASTQ files.
- count: Read BAM files generated from mapping to perform gene annotation, de-duplication, and gene expression analysis on the aligned reads.
- register: Align the microscopic tissue staining image with the gene expression matrix file (GEF) generated from count. register is an optional pipeline when the image fails QC or the input image is absent.
- imageTools: Convert TIFF images from IPR, such as the template-aligned stitched TIFF image and the binarized tissue segmentation and cell segmentation images. Optional module when the image fails QC or the input image is absent.
- tissueCut: Identify the tissue coverage area on the chip and extract the gene expression matrix of the corresponding spatial location, taking inputs from both count and register, or from count alone.
- spatialCluster: Perform clustering analysis for spots (bin200) according to the gene expression matrix of the tissue coverage area generated from tissueCut.
- cellCut: Identify the cell coverage area on the staining image and extract the gene expression matrix of the corresponding spatial location, taking inputs from count and from the register and imageTools pipelines. Optional module when the image fails QC or the input image is absent.
- cellCluster: Perform clustering analysis for cell bins according to the gene expression matrix generated from cellCut. Optional module when the image fails QC or the input image is absent.
- saturation: Calculate the sequencing saturation of the tissue coverage area based on the sampling file generated from count.
- report: Generate a JSON-format statistical summary report that integrates the analysis results from each step, as well as an HTML analysis report that shows the spatial expression distribution of genes, key statistical metrics, sequencing saturation plots, and clustering analysis results. Depending on the image input state and register mode, the HTML report may or may not include cell bin statistics and key image processing results.
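The dependencies among the pipelines listed above can be sketched as a small graph. The following Python example is only an illustration and is not part of SAW: the edges are read off the descriptions above, and several of them are assumptions (splitMask and CIDCount are treated as prerequisites of mapping, imageTools is placed after register, saturation is placed after tissueCut, and merge feeds only report because its consumers are not stated here). The names UPSTREAM, IMAGE_STEPS, and run_order are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Upstream steps of each pipeline, as inferred from the list above.
# Assumed edges: splitMask/CIDCount -> mapping, register -> imageTools,
# tissueCut -> saturation. merge has no stated consumer except report.
UPSTREAM = {
    "splitMask": set(),
    "CIDCount": set(),
    "mapping": {"splitMask", "CIDCount"},
    "merge": {"mapping"},
    "count": {"mapping"},
    "register": {"count"},
    "imageTools": {"register"},
    "tissueCut": {"count", "register"},
    "spatialCluster": {"tissueCut"},
    "cellCut": {"count", "register", "imageTools"},
    "cellCluster": {"cellCut"},
    "saturation": {"count", "tissueCut"},
}
# report integrates the results of every other step.
UPSTREAM["report"] = set(UPSTREAM)

# Steps described as optional when the image fails QC or is absent.
IMAGE_STEPS = {"register", "imageTools", "cellCut", "cellCluster"}


def run_order(skip_image_steps=False):
    """Return one valid execution order, optionally without image steps."""
    graph = {}
    for step, deps in UPSTREAM.items():
        if skip_image_steps and step in IMAGE_STEPS:
            continue
        graph[step] = deps - IMAGE_STEPS if skip_image_steps else set(deps)
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(run_order()))
    print(" -> ".join(run_order(skip_image_steps=True)))
```

Calling run_order(skip_image_steps=True) mirrors the case where the image fails QC or is absent: tissueCut then relies on count alone, and the cell-level steps are dropped.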
Other handy functions:
- Other applications of cellCut: Manipulate GEF files.
- rapidRegister: Run registration without performing cell segmentation.
- checkGTF: Check whether a GTF/GFF file is in the correct format and, if it is not, re-format it to meet the compatibility requirements of count.
- Other applications of imageTools: 1) Merge two to three TIFF images in R-G-B order, which is useful for checking segmentation results. 2) Plot templates on the panoramic image to help evaluate stitching and registration results. 3) Write TIFF images into RPI format.
- manualRegister: Acquire manual registration operation parameters from StereoMap visualization software and run manualRegister to modify registration records in IPR. Users can turn on the "fine-tune" switch to let manualRegister perform an automatic adjustment to the manually operated result to make the registration more precise.
- lasso: Extract one or more cell or spatial gene expression matrix subsets according to a GEOJSON file, which stores the coordinate data of region(s) manually delineated in the StereoMap visualization software.
2.System Requirements
SAW runs on Linux systems that meet the following minimum requirements:
- 8-core Intel or AMD processor (>24 cores recommended)
- 128GB RAM (>256GB recommended)
- 1TB or more of free disk space
- 64-bit CentOS/RedHat 7.8 or Ubuntu 20.04
To install and run SAW, install one of the following:
- docker(5): version 20.10.8 or higher
- singularity(6): version 3.8 or higher
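The minimum requirements above can be checked before installation with a short script. The following Python sketch is not shipped with SAW. It assumes a Linux host, counts sizes in decimal gigabytes, and only tests whether docker or singularity is on the PATH, without verifying their versions. The function names check_requirements and probe_host are hypothetical.

```python
import os
import shutil

# Minimum values taken from the requirements list above.
MIN_CORES = 8
MIN_RAM_GB = 128
MIN_DISK_GB = 1000  # 1TB, counted in decimal gigabytes


def check_requirements(cores, ram_gb, free_disk_gb, runtimes):
    """Return a list of human-readable problems (empty if all minimums are met)."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"{cores} CPU cores found, at least {MIN_CORES} required")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"{ram_gb:.0f}GB RAM found, at least {MIN_RAM_GB}GB required")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(f"{free_disk_gb:.0f}GB free disk, at least {MIN_DISK_GB}GB required")
    if not runtimes:
        problems.append("neither docker nor singularity found on PATH")
    return problems


def probe_host(path="."):
    """Collect the values for check_requirements() from the current Linux host."""
    cores = os.cpu_count() or 0
    # Physical memory in bytes = page size * number of physical pages.
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_bytes = shutil.disk_usage(path).free
    runtimes = [name for name in ("docker", "singularity") if shutil.which(name)]
    return cores, ram_bytes / 1e9, free_bytes / 1e9, runtimes


if __name__ == "__main__":
    for problem in check_requirements(*probe_host()):
        print("WARNING:", problem)
```

The check is kept separate from the probing so that it can be run against figures for a remote or planned machine as well as the local host.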
3.Related Software
- ImageQC: STOmics microscope ImageQC software is a desktop application for assessing the quality of microscope images. The fluorescent image should pass ImageQC evaluation to ensure that it fulfills the requirements of the SAW pipelines. SAW >= 5.0.0 requires ImageQC version >= 1.1.0, which provides the IPR file for recording image processing data. ImageQC v1.2.0 is the final version and will no longer be updated.
- ImageStudio: ImageStudio is a fully upgraded image processing desktop application that integrates the entire ImageQC functionality. It contains four main image processing modules: image QC, manual stitching, manual tissue segmentation, and manual cell segmentation. The outputs of each module can be input into SAW for further analysis. SAW v6.1 recommends ImageStudio version >= v2.1.
- StereoMap: StereoMap is an HD visualization desktop application for displaying Stereo-seq analysis results. SAW outputs such as the gene expression matrix GEF file, image RPI and IPR files, and clustering results can be visualized in StereoMap. SAW v6.1 recommends StereoMap version >= v2.1.
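The version pairings stated above can be encoded as a small check. The following Python sketch is illustrative only: it covers just the three rules given in this section (SAW >= 5.0.0 with ImageQC >= 1.1.0, and SAW v6.1 with ImageStudio and StereoMap >= v2.1), and the helper names parse_version and companion_warnings are hypothetical.

```python
def parse_version(text):
    """Turn 'v2.1' or '6.1.0' into a 3-part tuple such as (2, 1, 0)."""
    parts = [int(p) for p in text.lstrip("vV").split(".")]
    parts += [0] * (3 - len(parts))  # pad so that '2.1' compares equal to '2.1.0'
    return tuple(parts[:3])


def companion_warnings(saw, imageqc=None, imagestudio=None, stereomap=None):
    """Return warnings for companion tools older than the versions stated above."""
    saw_v = parse_version(saw)
    warnings = []
    # SAW >= 5.0.0 requires ImageQC >= 1.1.0 (for the IPR file).
    if imageqc and saw_v >= (5, 0, 0) and parse_version(imageqc) < (1, 1, 0):
        warnings.append(f"ImageQC {imageqc} is older than the required 1.1.0")
    # SAW v6.1 recommends ImageStudio >= v2.1 and StereoMap >= v2.1.
    if saw_v[:2] == (6, 1):
        if imagestudio and parse_version(imagestudio) < (2, 1, 0):
            warnings.append(f"ImageStudio {imagestudio} is older than the recommended 2.1")
        if stereomap and parse_version(stereomap) < (2, 1, 0):
            warnings.append(f"StereoMap {stereomap} is older than the recommended 2.1")
    return warnings


if __name__ == "__main__":
    for message in companion_warnings("6.1.0", imagestudio="v2.0", stereomap="v2.1"):
        print("WARNING:", message)
```

Versions are compared as integer tuples rather than strings, so that, for example, 2.10 is correctly treated as newer than 2.9.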