1. File format:
GFF or GTF files. Supported file suffixes: gtf/gtf.gz, gff/gff.gz, and gff3/gff3.gz.
2. GTF file format:
Comment lines begin with #
The main body has 9 tab-separated columns: seqname source feature start end score strand frame attributes
feature: the annotation types must include gene, transcript, and exon
start/end: must be less than 2^31 (2147483648)
strand: forward and reverse strands, represented as + and -, respectively
attributes: the 9th column, in the format tag "value", with different attributes separated by a space; the following four are required.
gene_name value
gene_id value: a unique ID for the genomic locus (gene) that the transcript belongs to. 'gene_id' and 'value' are separated by a space. An empty value means that there is no corresponding gene.
transcript_name value
transcript_id value: a unique ID that identifies a transcript. An empty value means that there is no transcript.
At present, the number of valid genes must be less than 2^20, that is, 1048576
Do not disrupt the order: the transcripts and exons of the same gene need to be arranged in order
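The GTF rules above can be checked mechanically before an annotation file is used. The following is a minimal Python sketch and not part of SAW: the names `parse_gtf_attributes` and `check_gtf_line` are hypothetical, and the check assumes that all four required tags are present on every row (empty values are accepted), which may be stricter than the tool itself.

```python
import re

MAX_COORD = 2**31  # start/end must be below this value
REQUIRED_TAGS = ("gene_id", "gene_name", "transcript_id", "transcript_name")

def parse_gtf_attributes(field):
    """Return a dict built from a GTF attribute column of the form tag "value"."""
    return dict(re.findall(r'([A-Za-z_][\w.]*)\s+"([^"]*)"', field))

def check_gtf_line(line):
    """Return a list of problems found in one GTF line (empty list if none)."""
    if line.startswith("#"):
        return []  # comment lines are skipped
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, got {len(cols)}"]
    problems = []
    try:
        start, end = int(cols[3]), int(cols[4])
        if max(start, end) >= MAX_COORD:
            problems.append("start/end must be less than 2^31")
    except ValueError:
        problems.append("start/end must be integers")
    if cols[6] not in ("+", "-"):
        problems.append(f"unexpected strand {cols[6]!r}")
    attrs = parse_gtf_attributes(cols[8])
    for tag in REQUIRED_TAGS:
        # Assumption: every row carries all four tags; values may be empty.
        if tag not in attrs:
            problems.append(f"missing attribute {tag}")
    return problems
```

Calling `check_gtf_line` on each line of a file and printing the non-empty results gives a quick pre-flight report. It does not verify the gene count limit or the ordering of transcripts and exons.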
3. GFF file format:
Comment lines begin with #
The main body has 9 tab-separated columns: seqid source type start end score strand phase attributes
type: the annotation types must include gene, mRNA, and exon
start/end: the larger of the two must be less than 2^31 (2147483648)
strand: "+" stands for forward strands, "-" stands for reverse strands, "." indicates there is no need to specify positive or negative strands, "?" means unknown
attributes: the 9th column, in the format tag=value, with different attributes separated by semicolons
ID, Name, and Parent must be provided (Parent is not required for gene rows)
For the naming rules of the 3rd column, please carefully check the "dendrachy" (tree-shaped hierarchy): do not list 'child' rows without their 'parent' rows. An example is shown as follows:
At present, the number of valid genes must be less than 2^20, that is, 1048576
Although strict ordering is not required, a gene must still appear before its corresponding mRNAs, and an mRNA must appear before its corresponding exons.
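The parent-before-child rule for GFF files can be sketched as a single pass that remembers every ID seen so far. This is an illustrative Python sketch under stated assumptions, not the check that SAW performs: `parse_gff_attributes` and `find_orphans` are hypothetical names, and rows that do not have 9 columns are simply skipped.

```python
def parse_gff_attributes(field):
    """Parse a GFF attribute column of the form tag=value;tag=value."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}

def find_orphans(lines):
    """Yield (line_number, parent_id) for rows whose Parent has not appeared yet."""
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # malformed rows are out of scope for this sketch
        attrs = parse_gff_attributes(cols[8])
        # A Parent attribute may hold several comma-separated IDs.
        for parent in attrs.get("Parent", "").split(","):
            if parent and parent not in seen_ids:
                yield number, parent
        if "ID" in attrs:
            seen_ids.add(attrs["ID"])
```

An empty result from `list(find_orphans(open_file))` means every mRNA follows its gene and every exon follows its mRNA. Any reported pair points at a child row listed before, or without, its parent.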
4. Others to note:
gene/gene_name must not contain special symbols (spaces, any type of bracket, quotation marks, <>, %, etc.); only common symbols such as "_" and "." are allowed
gene/gene_name must be shorter than 64 characters
Although the mainly used GFF files are version 3 (GFF3), please name them as .gff ; likewise, please name GTF files as .gtf
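The naming constraints on gene/gene_name can be screened with a small helper. This is a hedged Python sketch: `gene_name_problems` is a hypothetical name, and the forbidden set contains only the symbols listed explicitly above, so anything covered by "etc." is not detected.

```python
# Assumption: only the explicitly listed symbols are rejected.
FORBIDDEN_CHARS = set(" ()[]{}<>%\"'")
MAX_NAME_LENGTH = 64  # names must be shorter than this

def gene_name_problems(name):
    """Return a list of rule violations for one gene/gene_name value."""
    problems = []
    if len(name) >= MAX_NAME_LENGTH:
        problems.append(f"length {len(name)} is not shorter than {MAX_NAME_LENGTH}")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains special symbols: " + " ".join(repr(c) for c in bad))
    return problems
```

For example, `gene_name_problems("abc (1)")` reports the space and both parentheses, while a plain name such as `"abc_1.2"` returns an empty list.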
ImageQC | ImageQC description | SAW | SAW description |
---|---|---|---|
<= 1.0.8 | File format: .json + .tar.gz; Features: ssDNA image QC | <= 4.1.0 | Support ssDNA image registration and tissue segmentation |
>= 1.1.0 | File format: .ipr + .tar.gz; Features: ssDNA image QC | >= 5.1.3 | Support cell segmentation on ssDNA image; enable analysis of FASTQ data in Q4 format |
ImageStudio | ImageStudio description | SAW | SAW description | StereoMap | StereoMap description |
---|---|---|---|---|---|
1.0.0 | File format: .ipr + .tar.gz; Features: ssDNA image QC and manual processing | >= 5.5.0 | Support cell segmentation on ssDNA image; enable analysis of FASTQ data in Q4 format | 1.0.0 | Support displaying spatial expression heatmap, co-visualization of gene distribution, and ssDNA image. Manual registration enabled |
2.0.0 | File format: .ipr + .tar.gz; Features: Image QC for ssDNA, DAPI, mIF stains and manual processing | >= 6.0.0 | Support mIF image registration; allow for rRNA filtering | 2.0.0 | Display of individual mIF images and the ones stacked with different image layers |
2.1 | File format: .ipr + .tar.gz; Features: Image QC for ssDNA, DAPI, mIF stains and their manual image processing; fully manual procedure for QC-failed images | >= 6.1 <7.0 | Support analysis of the manually processed image outputs from ImageStudio and StereoMap | 2.1 <3.0 | Support reading multiple gef files at a time, which will be displayed in individual tabs |
2.2 | File format: .ipr + .tar.gz; Features: Image QC for ssDNA, DAPI, mIF stains and their manual image processing; fully manual procedure for QC-failed images | >= 6.1 <7.0 | Support analysis with the results of the fully manual procedure done by ImageStudio | 2.1 <3.0 | Support reading multiple gef files at a time, which will be displayed in individual tabs |
3.0 | File format: .ipr + .tar.gz; Features: Image QC for ssDNA, DAPI, H&E, mIF stains and their manual image processing; fully manual procedure for QC-failed images | 7.0 | Reconstructed 'count' goes online; 'register' reconstructed with new tissue segmentation algorithm and new 'V03' cell segmentation algorithm; support H&E whole process; support cell correction using EDM algorithm based on mask file of cell segmentation result | 3.0 | Support reading h5ad files with different binsize/resolution; /codedCellBlock information is written into cgef file after the SAW cellChunk module; render cellbin heatmap while loading cgef files |
ImageStudio is integrated into StereoMap | File format: .tar.gz (includes .ipr); Features: Image QC for ssDNA, DAPI, H&E, mIF stains and manual processing; fully manual processing for QC-failed images | 8.0 | ● The standard spatial transcriptomic analysis workflow is now integrated into one command line ● Support one-stop computational workflow for FFPE samples (including microorganism analysis) ● Output zipped report file ● Output zipped package for visualization | 4.0 | ● Visualization: support reading with .stereo manifest file; compatible with reading data from old versions ● Manual processing: process image data in a step-by-step manner |
ImageStudio is integrated into StereoMap | (same as above) | 8.1 | ● Support Stereo-seq T FF V1.3 and Stereo-CITE T FF data analysis | 4.1 | ● Visualization: support the display of gene expression heatmaps for cellbin analysis; support linked display for protein and marker genes ● Manual processing: new registration method available (feature point registration) ● The output file supports user-defined directories |
There are three directions in which the investigation can be carried out.
1. Sequencing quality. Low sequencing quality can affect alignment results. In addition to Q30, the presence of unknown base calls needs to be considered, which can be examined by reviewing the base distribution in the sequencing report. If the proportion of N bases is high, sequencing problems have likely affected the valid CID ratio. It is recommended to check this first.
2. The chip mask h5 file does not correspond to the FASTQ datasets. Because the CIDs recorded in the mask do not match the CIDs obtained by sequencing the sample, the valid CID ratio is low. If this is the only cause, the ratio is usually extremely low. If the next situation is also involved, the ratio can vary considerably and requires case-by-case analysis.
3. (Cross) contamination. This occurs when other samples are mixed in during the experiment, library preparation, or sequencing, which lowers the valid CID ratio. In this case, two chips may both map to the sequencing data of the same library. If there is a lot of mixing, a distinct tissue pattern should be visible. If the proportion is extremely small, some local bright spots may appear in some cases.
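For the first check, the sequencing report is the primary source. When only the raw reads are at hand, the proportion of N bases can be estimated directly. The sketch below is an illustration in Python, assuming a standard 4-line FASTQ record layout; `n_base_fraction` is a hypothetical helper and not a SAW command.

```python
from collections import Counter

def n_base_fraction(fastq_lines):
    """Return the fraction of N base calls across all reads.

    Assumes each record spans exactly 4 lines:
    header, sequence, separator, quality.
    """
    counts = Counter()
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # the sequence line of each record
            counts.update(line.strip().upper())
    total = sum(counts.values())
    return counts["N"] / total if total else 0.0
```

The function accepts any iterable of lines, so a compressed file can be passed as `gzip.open(path, "rt")` from the standard library. What counts as a "high" proportion is not defined here and should be judged against the sequencing report.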
Prior information, such as the cell sizes of specific tissue types, can be used to choose the bin size. It is recommended to adjust the bin level iteratively based on the results of downstream analyses, trying bin20, bin50, bin100, and bin200. Bin20 is about the size of a regular mammalian cell, while bin50 and bin100 are both frequently adopted in analysis. Bin200 is generally used for immediate visualization of SAW outputs.
Given that the diameter of a typical mammalian cell is approximately 10 μm, a cell is comparable to a bin20 spot with an area of 10 μm x 10 μm, or to a bin14 spot with a diagonal of 10 μm.
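The bin sizes above follow from simple arithmetic. As a worked sketch in Python, the spot pitch of 0.5 μm used here is not stated directly in the text; it is inferred from the statement that bin20 measures 10 μm per side. The function names are illustrative.

```python
import math

# Inferred assumption: bin20 is 10 um wide, so one spot step is 10 / 20 = 0.5 um.
SPOT_PITCH_UM = 10 / 20

def bin_side_um(bin_size):
    """Side length of a square bin in micrometers."""
    return bin_size * SPOT_PITCH_UM

def bin_diagonal_um(bin_size):
    """Diagonal of a square bin in micrometers."""
    return bin_side_um(bin_size) * math.sqrt(2)

for size in (14, 20, 50, 100, 200):
    print(f"bin{size}: side {bin_side_um(size):.1f} um, "
          f"diagonal {bin_diagonal_um(size):.1f} um")
```

Under this assumption, bin14 has a side of 7 μm and a diagonal of about 9.9 μm, which matches the approximate 10 μm diagonal quoted above.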
No, Stereo-seq Transcriptomics Set library requires different sequencing protocols and sequencing reagents compared to other libraries.
Stereo-seq Transcriptomics Set library can be sequenced on DNBSEQ-G400RS, MGISEQ-2000RS and DNBSEQ-T7RS platforms.
No, samples of different tissue types need to be tested for different permeabilization conditions. In addition, different samples should not be processed on the same chip to prevent cross-contamination.
It can be used for practice.
Yes, they are different.