File Name | Common Applications | Visualizations Supported | Description |
---|---|---|---|
SN.raw.gef | Transcription | False | Original transcription matrix of whole chip area, containing only bin1 geneExp group. |
SN.tissue.gef | Transcription | False | Transcription matrix of tissue-covered region, containing only bin1 geneExp group. |
SN.gef | Transcription | True | Visual transcription matrix of whole chip area, containing multiple bin geneExp and wholeExp groups. |
SN.cellbin.gef | Transcription, ssDNA/DAPI | True | Cell-gene expression matrix of tissue-covered region. |
SN.adjusted.cellbin.gef | Transcription, ssDNA/DAPI | True | Cell-gene expression matrix of tissue-covered region using adjusted cell shapes. |
SN.<protein_IF>.gef | Transcription, mIF | True | Original transcription matrix of labeled area, containing only bin1 geneExp group. |
SN.<protein_IF>.cellbin.gef | Transcription, mIF | True | Cell-gene expression matrix extracted with a cell mask obtained by grayscale threshold filtering of the IF image. [This is the recommended name. The file is not generated by default; producing it requires switching to cellCut after tissueCut.] |
SN.protein.raw.gef | Protein | False | Visual matrix of labeled area, only containing bin1 geneExp. |
SN.protein.tissue.gef | Protein | True | Protein matrix of the tissue cut area, containing multiple bin geneExp and wholeExp groups. |
SN.protein.gef | Protein | True | Protein visualization matrix of the complete chip area, including multi-bin geneExp and wholeExp. |
SN.protein.cellbin.gef | Protein | True | Cell-protein expression matrix of tissue-covered region. |
Option 1: use C++ compiled geftools:
https://github.com/STOmics/geftools
Option 2: use Python package - gefpy (e.g. 0.6.1):
https://pypi.org/project/gefpy/
https://gefpy.readthedocs.io/en/latest/index.html
pip install gefpy==0.6.1
Option 3: with installed SAW sif (e.g. v5.1.3):
https://hub.docker.com/repository/docker/stomics/saw
singularity exec SAW_v5.1.3.sif cellCut
Please use Singularity version 3.8 or later
    export HDF5_USE_FILE_LOCKING=FALSE

    ## gef2gem using geftools
    geftools view -i <SN>.gef -o <SN>.gem -s <SN>
    # -i input square bin GEF, e.g. SN.raw.gef or SN.gef
    # -o output GEM
    # -s SN

    ## gef2gem using gefpy
    python
    >>> from gefpy.bgef_reader_cy import BgefR
    >>> bgef=BgefR(filepath='<SN>.gef',bin_size=200,n_thread=4)
    >>> bgef.to_gem('<SN>.bin200.gem')

    ## gef2gem using SAW sif
    ## export SINGULARITY_BIND="/path/to/input/dir,/path/to/output/dir"
    singularity exec SAW_v5.1.3.sif cellCut view -i <SN>.gef -o <SN>.gem -s <SN>

    ## cgef2cgem
    geftools view -i <SN>.cellbin.gef -o <SN>.cellbin.gem -d <SN>.raw.gef -s <SN>
    # -i input cellbin GEF
    # -o output cellbin GEM
    # -d input square bin GEF, e.g. SN.raw.gef or SN.gef
    # -s SN

    ## gem2gef
    geftools bgef -i <SN>.gem -o <SN>.gef -b 1,20,50 -O Transcriptomics
    # -i input square bin GEM
    # -o output square bin GEF
    # -b bin sizes separated by comma, default: 1,10,20,50,100,200,500
    # -O omics name
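When these conversions are scripted for many chips, it can help to assemble the command lines in one place. The following is a minimal sketch that only builds argument lists from the flags shown above (`view -i/-o/-s/-d` and `bgef -i/-o/-b/-O`). The function names are hypothetical helpers, not part of geftools or gefpy, and nothing is executed.

```python
import shlex


def gef_to_gem_cmd(sn, gef_path, gem_path):
    """Argument list for square bin GEF -> GEM (geftools view)."""
    return ["geftools", "view", "-i", gef_path, "-o", gem_path, "-s", sn]


def cgef_to_cgem_cmd(sn, cellbin_gef, cellbin_gem, square_bin_gef):
    """Argument list for cellbin GEF -> cellbin GEM; -d is the square bin GEF."""
    return ["geftools", "view", "-i", cellbin_gef, "-o", cellbin_gem,
            "-d", square_bin_gef, "-s", sn]


def gem_to_gef_cmd(gem_path, gef_path, bin_sizes=(1, 20, 50),
                   omics="Transcriptomics"):
    """Argument list for square bin GEM -> GEF (geftools bgef)."""
    bins = ",".join(str(b) for b in bin_sizes)  # comma-separated bin sizes
    return ["geftools", "bgef", "-i", gem_path, "-o", gef_path,
            "-b", bins, "-O", omics]


if __name__ == "__main__":
    sn = "SN"  # placeholder serial number, replace with the real chip SN
    for cmd in (
        gef_to_gem_cmd(sn, f"{sn}.gef", f"{sn}.gem"),
        cgef_to_cgem_cmd(sn, f"{sn}.cellbin.gef", f"{sn}.cellbin.gem",
                         f"{sn}.raw.gef"),
        gem_to_gef_cmd(f"{sn}.gem", f"{sn}.gef"),
    ):
        print(shlex.join(cmd))  # print only; nothing is run here
```

To actually run a command, the list could be passed to `subprocess.run(cmd, check=True)` with `HDF5_USE_FILE_LOCKING=FALSE` set in the environment, as in the shell example.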
The first plot indicates whether sequencing is saturated. If the fitted curve reaches or approaches a plateau, the sample is close to saturation. Depending on the goal of the project, additional sequencing runs may be needed. For example, a project that aims to recover very lowly expressed transcripts, or that involves precious samples, may call for a higher sequencing saturation. The recommended saturation of 80% is an empirical threshold, not a rigid value.
The second and third plots are computed from bin-level statistics, and they reach their plateaus later than the first plot. The first plot serves as the main indicator of the potential benefit of additional sequencing.
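As a rough illustration of how the 80% guideline might be applied, the sketch below assumes the commonly used definition of sequencing saturation, 1 - (unique reads / total reads). This definition is an assumption for illustration; the exact formula SAW uses is not stated here, and both function names are hypothetical.

```python
def sequencing_saturation(total_reads, unique_reads):
    """Saturation under the ASSUMED definition 1 - unique / total."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= unique_reads <= total_reads:
        raise ValueError("unique_reads must be between 0 and total_reads")
    return 1.0 - unique_reads / total_reads


def below_recommended(saturation, target=0.80):
    """True if saturation is under the empirical (non-rigid) 80% guideline."""
    return saturation < target


if __name__ == "__main__":
    sat = sequencing_saturation(total_reads=1_000_000, unique_reads=350_000)
    print(f"saturation = {sat:.2%}, below guideline: {below_recommended(sat)}")
```

Because the threshold is empirical, `target` is left as a parameter so that projects with different goals can raise or lower it.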
The SAW register pipeline includes a cell segmentation procedure, whereas rapidRegister does not.
Step 1: Check whether the "Valid CID Reads" ratio in the HTML report is lower than 10%. If so, please check whether the FASTQ corresponds to the chip SN.
Step 2: If the "Valid CID Reads" ratio is low, at around 10% - 30%, there are two possible causes:
Reference genome does not meet the format requirement: if the ratio of multi-mapped reads is high and the ratio of uniquely mapped reads is extremely low, please run SAW checkGTF on the GFF/GTF file to verify that its format is valid for running the pipelines.
Contamination: please troubleshoot the wet lab workflow.
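The two steps above can be sketched as a small triage function. The 10% and 30% cut-offs come from the text; the function name and the returned labels are hypothetical, and the "around 10% - 30%" range is treated here as a closed interval for simplicity.

```python
def triage_valid_cid_ratio(ratio):
    """Map a "Valid CID Reads" ratio (0.0 to 1.0) to a suggested next check."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be a fraction between 0 and 1")
    if ratio < 0.10:
        # Step 1: the FASTQ may not belong to this chip SN.
        return "check_fastq_matches_sn"
    if ratio <= 0.30:
        # Step 2: check the GFF/GTF with SAW checkGTF, or look for contamination.
        return "check_reference_format_or_contamination"
    return "no_action_from_these_steps"


if __name__ == "__main__":
    for r in (0.05, 0.22, 0.75):
        print(f"{r:.0%}: {triage_valid_cid_ratio(r)}")
```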
CID filtering: filter out reads whose CID cannot be matched with any CID recorded in the Stereo-seq Chip T mask file.
MID filtering: filter out reads whose MID contains an N base, reads whose MID has poly-A content, and reads with at least one base whose quality score is lower than 10.
Reads filtering: filter out reads containing linkers and DNB sequences, and filter out reads with length < 30 bp after removing adapters.
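The MID and read-length rules above can be illustrated with a short sketch. It is not SAW's implementation. Two points are assumptions made only for this example: "poly-A content" is interpreted as a MID made up entirely of A bases, and the quality check is applied to the MID bases. The function names are hypothetical.

```python
def mid_passes(mid_seq, mid_quals, min_quality=10):
    """Return True if a MID survives the three MID filters (as assumed here)."""
    if len(mid_seq) != len(mid_quals):
        raise ValueError("sequence and quality lists must have equal length")
    seq = mid_seq.upper()
    if "N" in seq:
        return False  # MID contains an N base
    if seq and set(seq) == {"A"}:
        return False  # ASSUMPTION: poly-A means the MID is all A
    if any(q < min_quality for q in mid_quals):
        return False  # at least one base with quality below 10
    return True


def read_long_enough(trimmed_length, min_length=30):
    """Return True if the read is at least 30 bp after adapter removal."""
    return trimmed_length >= min_length


if __name__ == "__main__":
    print(mid_passes("ACGTACGTAC", [30] * 10))
    print(mid_passes("ACGNACGTAC", [30] * 10))
    print(read_long_enough(29))
```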
The outcome of cell segmentation depends on multiple factors, such as the quality of microscope imaging and the segmentation algorithm used. Overexposure and blurring can interfere with the automatic identification of cell areas and lead to poor segmentation output. In dense areas that are also blurred, especially where cells overlap, it is particularly difficult for the algorithm to segment accurately. Segmentation mistakes also arise where brightness is locally uneven over the tissue area, or where background impurities and hangover from cell movement were introduced during the experiment (see examples below).
From the perspective of the algorithm itself, automatic segmentation was trained on specific datasets with manually assigned labels. Hence, the algorithm may perform poorly on rare cell morphologies that are not represented in those datasets.
If the automatic segmentation does not work well, users can manually adjust the results using ImageStudio, a desktop image processing software, or redo the segmentation with Stereopy or other algorithms. If the identified cells need to be enlarged, the cell correction algorithm in Stereopy can be used to increase the cell diameters and obtain larger cell coverage.
Blur
Overexposure
Abnormal shapes, like fibers or clumps
Hangover
Bubble
Background impurity
Local uneven brightness
Cells of special forms
The immunofluorescence signal visualizes the location of the targeted proteins on the tissue slice. High fluorescent intensity indicates that a large number of cells in that region actively express the target proteins.
In the SAW workflow, the register module uses an automatic global thresholding algorithm to compute the gray-level threshold that binarizes the IF image into foreground and background regions. The foreground region of the IF image is used as the mask file in the tissueCut module to obtain the gene expression matrix of the corresponding region.
If the segmentation result based on the automatically calculated gray level is unsatisfactory, users can use the ImageStudio "Tissue Segmentation" module to manually adjust the grayscale threshold of the IF image and obtain a new tissue segmentation result.
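To make the idea of global thresholding concrete, here is a minimal pure-Python sketch of Otsu's method, one widely used global thresholding algorithm. The text does not say which algorithm the register module uses, so Otsu's method is a stand-in chosen for illustration. The sketch assumes an 8-bit image flattened into a list of integers from 0 to 255.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes between-class variance."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]          # pixels with value <= t (background)
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue             # no background pixels yet
        w_fg = total - w_bg      # pixels with value > t (foreground)
        if w_fg == 0:
            break                # everything is background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, threshold):
    """Foreground mask: True where the pixel is brighter than the threshold."""
    return [p > threshold for p in pixels]


if __name__ == "__main__":
    image = [10] * 50 + [200] * 50   # toy image: dark background, bright signal
    t = otsu_threshold(image)
    print("foreground pixels:", sum(binarize(image, t)))
```

Manually adjusting the threshold, as described for ImageStudio, corresponds to calling `binarize` with a user-chosen value instead of the computed one.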
Our current quality check strategy for IF images requires a paired DAPI image as input. The assessment covers track line recognition on the DAPI image, evaluation of microscope stitching for the DAPI/IF images, and calibration between the DAPI and IF images based on tissue morphology.
The track lines detected in the DAPI image during the QC step provide a fiducial reference frame for automatic image registration with the chip. Microscope stitching evaluation determines whether there are obvious stitching errors in the microscope-stitched global image, which safeguards the quality of subsequent tissue segmentation and alignment. Calibration evaluation aims to ensure that IF images can be processed in the same way as the DAPI image in terms of stitching, rotation, scaling, translation, and flip, so that the IF images can finally be registered with the expression matrix.
However, the tissue morphology in an IF image may differ from that in the DAPI image, which can cause calibration QC to fail. In such cases, ImageStudio can be used to adjust each image pair with the "Calibration" module.
If the DAPI image fails QC for track line recognition and microscope stitching, the related IF images cannot be processed further automatically.
The alignment between the IF image and the spatial gene expression matrix is achieved indirectly by taking the DAPI image as a reference frame.
DAPI and IF images of the same tissue slice are taken back to back by switching channels. Because the chip stays fixed during imaging, the DAPI and IF images share the same stitching, scale, and angle parameters relative to the spatial gene expression map. The information used for DAPI image stitching, rotation, scaling, translation, and transformation can therefore be applied to the IF layer as well, including its alignment with the expression matrix.
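The reuse of DAPI registration parameters for the IF layer can be sketched as follows. A single transform is built once and applied to coordinates from either image. The parameter names and the order of operations (flip, then scale, then rotate, then translate) are assumptions for illustration, not the convention used by SAW or ImageStudio.

```python
import math


def make_transform(scale, angle_deg, tx, ty, flip_x=False):
    """Build a function mapping image (x, y) to expression-matrix coordinates."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def apply(x, y):
        if flip_x:
            x = -x                        # optional horizontal flip
        xs, ys = scale * x, scale * y     # scaling
        return (cos_t * xs - sin_t * ys + tx,   # rotation + translation
                sin_t * xs + cos_t * ys + ty)

    return apply


if __name__ == "__main__":
    # Parameters would be estimated once from the DAPI image (values are made up).
    dapi_to_chip = make_transform(scale=2.0, angle_deg=90, tx=5.0, ty=0.0)

    dapi_point = (1.0, 0.0)
    if_point = (1.0, 0.0)   # same pixel position in the IF channel

    # The IF layer reuses the DAPI transform unchanged.
    print(dapi_to_chip(*dapi_point))
    print(dapi_to_chip(*if_point))
```

Since both channels pass through the same function, a pixel at the same position in the DAPI and IF images lands on the same location in the expression matrix, which is the indirect alignment described above.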