FAQ
Q What are the major filtering steps for the sequencing data in SAW pipelines?
A
  • CID filtering: filter out reads whose CID cannot be matched to any CID recorded in the Stereo-seq Chip T mask file.

  • MID filtering: filter out reads whose MID contains an N base, reads whose MID has poly-A content, and reads with at least one base whose quality score is lower than 10.

  • Reads filtering: filter out reads containing DNB sequences; filter out reads shorter than 30 bp after adapter removal; filter out reads shorter than 30 bp after poly-A removal.

  • For more information, please refer to [SAW User Manual > Algorithms > Reads processing algorithms].
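
The MID and read-length checks above can be sketched in Python as follows. This is a minimal illustration, not the SAW implementation: the function names and the Phred+33 quality encoding are assumptions made for the example, and the poly-A criterion is left out because its threshold is not stated here. Only the quality cutoff of 10 and the 30 bp minimum length come from the answer above.

```python
MIN_BASE_QUALITY = 10   # from the MID filtering rule above
MIN_READ_LENGTH = 30    # from the reads filtering rule above


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding is an assumption)."""
    return [ord(ch) - offset for ch in quality]


def mid_passes(mid_seq, mid_qual):
    """Return True if the MID has no N base and no base with quality below 10."""
    if "N" in mid_seq.upper():
        return False
    return all(q >= MIN_BASE_QUALITY for q in phred_scores(mid_qual))


def read_passes_length(trimmed_seq):
    """Return True if the read still has at least 30 bp after trimming."""
    return len(trimmed_seq) >= MIN_READ_LENGTH
```

A read would be kept only if both mid_passes and read_passes_length return True for its MID and its trimmed sequence.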


Q What factors affect cell segmentation results? How can segmentation be optimized?
A
  • Cell segmentation results depend on multiple factors, such as the quality of microscope imaging and the segmentation algorithm used. Overexposure and blurring interfere with the automatic identification of cell areas and lead to poor segmentation output. Dense areas that are also blurred, especially when cells overlap, are particularly difficult for the algorithm to segment accurately. Segmentation errors also arise when brightness is locally uneven across the tissue area, or when background impurities or residue from cell movement are introduced during the experiment (see examples below). For more information, please refer to the Microscope Assessment Guideline: https://enfile.stomics.tech/STUM-PE001%20Microscope%20Assessment%20Guideline_ver%20C.pdf

  • From the perspective of the algorithm itself, automatic segmentation was trained on specific datasets with manually assigned labels. The algorithm may therefore perform poorly on rare cell morphologies that are not covered by those datasets.

  • If the automatic segmentation does not work well, users can manually adjust the results in StereoMap, or redo the segmentation with another algorithm and then import the binary cell mask file into StereoMap Image Processing for further analysis. For detailed operation, please refer to StereoMap User Manual > Image processing Guide > Nuclear staining image > Step 4: Cell segmentation.

[Image: examples of imaging problems that cause segmentation errors]



Q What's the principle behind IF image QC?
A

For SAW <= v8.1, StereoMap <= v4.1, ImageStudio <= v3.0:

  • The current quality check strategy for IF images requires a paired DAPI image as input. The assessment covers track line recognition on the DAPI image, evaluation of microscope stitching for the DAPI/IF images, and calibration between the DAPI and IF images based on tissue morphology.
  • The track lines detected in the DAPI image during the QC step provide a fiducial reference frame for automatic registration of the image with the chip. Microscope stitching evaluation determines whether there are obvious stitching errors in the microscope-stitched global image, which safeguards the quality of subsequent tissue segmentation and alignment. Calibration evaluation aims to ensure that the IF images can be processed in the same way as the DAPI image in terms of stitching, rotation, scaling, translation, and flip, so that the IF images can finally be registered with the expression matrix.

  • However, IF images may show tissue morphology that differs from the DAPI image, which can cause calibration QC to fail. In such cases, ImageStudio can be used to adjust the images pairwise with the "Calibration" module.


If the DAPI image fails QC for track line recognition and microscope stitching, the related IF images cannot be processed further automatically.


Q How are immunofluorescence (IF) images mapped to the gene expression matrix?
A


The alignment between the IF image and the spatial gene expression matrix is achieved indirectly, using the DAPI image as a reference frame.

DAPI and IF images of the same tissue slice are captured back to back by switching channels. Because the chip stays fixed during imaging, the DAPI and IF images share the same stitching, scale, and angle parameters relative to the spatial gene expression map. The information used for stitching, rotating, scaling, translating, and transforming the DAPI image can therefore be applied to the IF layer as well, including alignment with the expression matrix.
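
The idea of reusing the DAPI transformation for the IF layer can be illustrated with a small Python sketch. It assumes that registration can be summarized as a single 2D affine transform (flip, scale, rotation, translation); the function names and all parameter values are hypothetical and do not reflect how SAW or StereoMap store these parameters.

```python
import math


def make_transform(angle_deg, scale, tx, ty, flip_x=False):
    """Build a 2D affine transform as a 3x3 row-major matrix.

    Order of operations: optional horizontal flip, then scaling,
    then rotation, then translation.
    """
    a = math.radians(angle_deg)
    sx = -scale if flip_x else scale
    return [
        [sx * math.cos(a), -scale * math.sin(a), tx],
        [sx * math.sin(a), scale * math.cos(a), ty],
        [0.0, 0.0, 1.0],
    ]


def apply_transform(matrix, points):
    """Map (x, y) image coordinates into the target coordinate frame."""
    out = []
    for x, y in points:
        out.append((
            matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
        ))
    return out


# Parameters estimated once from the DAPI image (values are made up).
dapi_matrix = make_transform(angle_deg=90, scale=2.0, tx=10.0, ty=5.0)

# The same matrix is reused for the IF channel, so a pixel at the same
# position in both images lands on the same spot of the expression map.
dapi_points = apply_transform(dapi_matrix, [(1.0, 0.0)])
if_points = apply_transform(dapi_matrix, [(1.0, 0.0)])
```

The design point is that no separate registration is computed for the IF image: it inherits the matrix derived from the DAPI image.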


Q How can rRNA alignments be removed during analysis? Can the rRNA sequences to be removed be specified manually?
A

Please choose one of the methods below, depending on the SAW version used.

SAW >= v8.0
  • Use SAW makeRef to construct a reference genome index file for alignment. If rRNA removal is required, set the --rRNA-fasta parameter during construction so that the rRNA information is added automatically when the index file is built. Please refer to [SAW User Manual > Tutorials > Preparation of reference].
  • When running SAW count, enable the --rRNA-remove parameter and use the index file that includes rRNA information. Please refer to [SAW User Manual > Analysis > Pipelines > SAW commands] for more information.


SAW < v8.0

rRNA sequences can be added manually to the reference genome FASTA file, after which the reference indices must be rebuilt. With the rRNAremove switch on, SAW mapping filters out reads that are mapped to rRNA sequences. The rRNA filtering function was added in SAW v6.0.

Rules for adding rRNA sequences: include the rRNA sequences to be filtered out in the FASTA file, and append '_rRNA' to the end of the usual sequence name on the line starting with ">", so that the program can identify them. Examples are as follows:

[Image: example FASTA sequence names with the '_rRNA' suffix]
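
As a hedged illustration of the naming rule, the following Python sketch appends '_rRNA' to the sequence name of every header in an rRNA-only FASTA text before it is added to the genome FASTA. The function name is hypothetical, and treating the first whitespace-delimited token after ">" as the sequence name is an assumption.

```python
def tag_rrna_headers(fasta_text, suffix="_rRNA"):
    """Append the suffix to the sequence name on every FASTA header line.

    Only the first whitespace-delimited token after '>' is treated as the
    name (an assumption); any description after it is kept unchanged.
    Names that already end with the suffix are left as they are.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            if name and not name.endswith(suffix):
                name += suffix
            rest = " " + parts[1] if len(parts) > 1 else ""
            line = ">" + name + rest
        out.append(line)
    return "\n".join(out)
```

The function should be applied only to the rRNA sequences, not to the whole genome FASTA, since every header it sees receives the suffix.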

  • Add a row of "rRNAremove" to the bcPara file before running SAW mapping. Examples are as follows:

in=<mask>
in1=<lane_read_1.fq.gz>
in2=<lane_read_2.fq.gz>
barcodeReadsCount=<lane.barcodeReadsCount.txt>
barcodeStart=0
barcodeLen=25
umiStart=25
umiLen=10
umiRead=1
mismatch=1
bcNum=<CIDCount>
polyAnum=15
mismatchInPolyA=2
rRNAremove

If a query read is mapped to an rRNA sequence, the 3rd column of the alignment record shows the corresponding RNAME with the "_rRNA" suffix, matching the sequence name in the reference genome, and the optional field in the 12th column has the XF:i tag set to 3. The rRNA ratio is computed from the XF tag records during the subsequent annotation step.
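
The two markers described above can be checked on a SAM-style text record with a short Python sketch. This is an illustration only: the function names are hypothetical, the tag is searched in all optional fields from the 12th column onward, and using all alignment records as the denominator of the ratio is an assumption, not the documented SAW calculation.

```python
def is_rrna_alignment(sam_line):
    """Return True if a SAM record carries both rRNA markers described above.

    Checks the RNAME (3rd column) for the '_rRNA' suffix and the optional
    fields (12th column onward) for the tag XF:i:3.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment record needs at least 11 columns")
    rname_is_rrna = fields[2].endswith("_rRNA")
    has_xf_3 = "XF:i:3" in fields[11:]
    return rname_is_rrna and has_xf_3


def rrna_ratio(sam_lines):
    """Fraction of alignment records flagged as rRNA (header lines skipped)."""
    records = [ln for ln in sam_lines if ln.strip() and not ln.startswith("@")]
    if not records:
        return 0.0
    return sum(is_rrna_alignment(ln) for ln in records) / len(records)
```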




Q Is there any helpful tool for checking for errors in annotation files?
A

Please choose one of the methods below, depending on the SAW version used.


SAW >= v8.0

  • Use the checkGTF pipeline from the SAW package as shown below:

SAW checkGTF \
    --input-gtf=/path/to/input/GTF/or/GFF \
    --output-gtf=/path/to/output/GTF/or/GFF


SAW < v8.0

  • Use the checkGTF function from the SAW SIF file as shown below:

## export SINGULARITY_BIND="/path/to/input/dir,/path/to/output/dir"
## -i: GTF/GFF file input to be checked
## -o: [optional] set to output the revised GTF/GFF file. Be aware that this may remove some genes which do not meet the requirements and cannot be fixed.
singularity exec SAW.sif checkGTF \
    -i <input.gtf/gff> \
    -o <output.gtf/gff>

  • Genes whose annotation format cannot be rectified automatically are not retained in the output. You can find the relevant records in the log, correct the annotation file accordingly, and run the command again.



Q Why are most genes in the annotation file not annotated?
A
  • The input annotation file may not conform to the format rules. Please double-check it against the GTF/GFF file format requirements.

  • Another possibility is that the strand symbols are not in the right format. Strand values in annotation files must be either "+" (forward) or "-" (reverse). Do not confuse "-" (hyphen) with "_" (underscore).

  • If genes with the same name on the same chromosome have inconsistent strand directions, the annotation is regarded as abnormal and all such genes are discarded.


Q How to deal with the error "Fatal INPUT FILE error, no valid exon lines in the GTF file" during reference genome indexing?
A

One possibility is that the chromosome names in the GTF/GFF annotation file are not fully consistent with those in the genome FASTA file. Please make sure the chromosome names are identical in both files.
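
A quick consistency check of this kind can be sketched in Python. The function names are hypothetical, and the sketch assumes that the FASTA sequence name is the first token after ">" and that the chromosome name is the first tab-separated column of the annotation file.

```python
def fasta_names(fasta_text):
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def annotation_seqnames(annotation_text):
    """Collect column-1 values from GTF/GFF lines, skipping comments and blanks."""
    names = set()
    for line in annotation_text.splitlines():
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def unmatched_seqnames(fasta_text, annotation_text):
    """Names used in the annotation that are missing from the FASTA."""
    return annotation_seqnames(annotation_text) - fasta_names(fasta_text)
```

A non-empty result, for example "1" in the annotation versus "chr1" in the FASTA, points to the naming mismatch described above.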

Q In which situations are genes omitted when annotation files are read?
A

The gene_name, gene_id, transcript_name, or transcript_id attributes are missing in GTF format (only gene_name and gene_id are needed for gene records)

The ID, Name, or Parent attributes are missing in GFF format (Parent is not needed for gene records)

Multiple gene IDs are assigned to the same gene, logged as "Multiple gene IDs for gene xxx: id1, id2..."

Both forward and reverse strands are assigned to the same gene, logged as "Strand disagreement for gene xxx - skipping"

A transcript/exon has no transcript_id, logged as "Record does not have transcriptID for gene xxx"

A gene has multiple transcripts with the same transcript_id / ID, logged as "Transcript appears more than once for xxx"

start > end for some exons, logged as "Exon has 0 or negative extent for xxx"

Exons of the same transcript overlap, logged as "Exons overlap for xxx"

A gene has no transcript, logged as "No transcript for gene xxx"

Note: multiple genes on the same contig that share the same gene_name are merged into one.
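
Three of the situations above (start > end, overlapping exons, and strand disagreement) can be screened for before running the pipeline. The following Python sketch is an independent illustration with hypothetical function names and its own messages; it does not reproduce the SAW checks or log output, and it assumes inclusive exon coordinates.

```python
def exon_problems(exons):
    """Check the exons of one transcript, given as (start, end) tuples.

    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    for start, end in exons:
        if start > end:
            problems.append(f"exon {start}-{end} has start > end")
    ordered = sorted((s, e) for s, e in exons if s <= e)
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 <= e1:  # coordinates are inclusive, so sharing a base is an overlap
            problems.append(f"exons {s1}-{e1} and {s2}-{e2} overlap")
    return problems


def strand_disagreement(strands_by_gene):
    """Return the sorted names of genes whose records list more than one strand.

    strands_by_gene maps a gene name to the strand symbols of its records.
    """
    return sorted(g for g, strands in strands_by_gene.items()
                  if len(set(strands)) > 1)
```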


Q What are the requirements for genomic annotation files in GTF/GFF format?
A

1. File format:

GFF or GTF files are accepted, with the file suffixes gtf/gtf.gz, gff/gff.gz, and gff3/gff3.gz supported.

2. GTF file format:

Comment lines begin with #

The main body has 9 tab-separated columns: seqname source feature start end score strand frame attributes

feature (type): the annotation types must include gene, transcript, and exon

start/end: must be less than 2^31

strand: the forward and reverse strands are represented as + and -, respectively

attributes: the 9th column, in the format tag "value", with different attributes separated by space; the following four are required:

gene_name value

gene_id value: a unique ID for the genomic locus of the transcript. 'gene_id' and 'value' are separated by a space. An empty value means that there is no corresponding gene.

transcript_name value

transcript_id value: a unique ID that identifies a transcript. An empty value means that there is no transcript.

At present, the number of valid genes must be less than 2^20, that is, 1048576

Do not disrupt the order: the transcripts and exons of the same gene need to be arranged in order
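
The per-line GTF rules above can be expressed as a small Python sketch. It is a simplified pre-check with hypothetical names, not a replacement for checkGTF: it validates a single non-comment line and does not cover ordering or gene-count limits. The attribute pattern accepts attributes separated by spaces, with or without a trailing semicolon, which is an assumption about the input.

```python
import re

MAX_COORD = 2 ** 31  # start/end must be less than this
REQUIRED_ATTRS = ("gene_name", "gene_id", "transcript_name", "transcript_id")
ATTR_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')  # matches: tag "value"


def check_gtf_line(line):
    """Return a list of problems found in one non-comment GTF line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    problems = []
    feature, start, end, strand = cols[2], cols[3], cols[4], cols[6]
    if not (start.isdecimal() and end.isdecimal()):
        problems.append("start/end must be integers")
    elif int(start) >= MAX_COORD or int(end) >= MAX_COORD:
        problems.append("start/end must be less than 2^31")
    if strand not in ("+", "-"):
        problems.append(f"strand must be + or -, found {strand!r}")
    attrs = dict(ATTR_PATTERN.findall(cols[8]))
    # Gene records only need gene_name and gene_id.
    required = REQUIRED_ATTRS[:2] if feature == "gene" else REQUIRED_ATTRS
    for tag in required:
        if not attrs.get(tag):
            problems.append(f"missing attribute {tag}")
    return problems
```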

3. GFF file format:

Comment lines begin with #

The main body has 9 tab-separated columns: seqid source type start end score strand phase attributes

type: the annotation types must include gene, mRNA, and exon

start/end: the larger of the two must be less than 2^31

strand: "+" stands for the forward strand, "-" stands for the reverse strand, "." indicates that the strand does not need to be specified, and "?" means unknown

attributes: the 9th column, in the format tag=value, with different attributes separated by a semicolon

ID, Name, and Parent must be provided (Parent is not required for gene records)

For the naming rules of the 3rd column, carefully check the tree-shaped hierarchy: do not list 'child' rows without their 'parent' rows. An example is shown as follows:

[Image: example of the parent/child hierarchy in a GFF file]

At present, the number of valid genes must be less than 2^20, that is, 1048576

Although strict ordering is not required, a 'gene' must still appear before its corresponding mRNA, and an mRNA must appear before its corresponding exons.
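
The parent-before-child rule can be checked with a short Python sketch. The function names are hypothetical, and the sketch makes two simplifying assumptions: every non-gene record is expected to have a Parent, and Parent holds a single ID rather than a comma-separated list.

```python
def parse_gff_attributes(column9):
    """Parse 'tag=value;tag=value' attributes into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            tag, value = item.split("=", 1)
            attrs[tag.strip()] = value.strip()
    return attrs


def orphan_records(gff_text):
    """Return IDs of records whose Parent has not appeared on an earlier line."""
    seen_ids = set()
    orphans = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # malformed lines are out of scope for this sketch
        attrs = parse_gff_attributes(cols[8])
        parent = attrs.get("Parent")
        if cols[2] != "gene" and (parent is None or parent not in seen_ids):
            orphans.append(attrs.get("ID", "<no ID>"))
        if "ID" in attrs:
            seen_ids.add(attrs["ID"])
    return orphans
```

Any ID returned by orphan_records marks a 'child' row that is listed without, or before, its 'parent' row.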

4. Others to note:

gene/gene_name should not contain special symbols (spaces, any type of bracket, quotation marks, <>, %, etc.); common symbols such as "_" and "." are allowed

gene/gene_name must be shorter than 64 characters

Although most GFF files in use are version 3 (GFF3), please name them with the .gff suffix; likewise, please name GTF files with the .gtf suffix
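
These naming rules can be approximated with a short Python sketch. The names are hypothetical, and the forbidden-character set covers only the symbols explicitly listed above (whitespace, brackets, quotation marks, <>, %); since the list ends with "etc.", the real rule may reject more characters. The suffix list is taken from the file format item earlier in this answer.

```python
import re

# Only the symbols named explicitly in the rule above (an incomplete set).
FORBIDDEN = re.compile(r"""[\s()\[\]{}<>"'%]""")
MAX_NAME_LENGTH = 64  # names must be shorter than this
SUFFIXES = (".gtf", ".gtf.gz", ".gff", ".gff.gz", ".gff3", ".gff3.gz")


def gene_name_ok(name):
    """Return True if the name is non-empty, short enough, and free of listed symbols."""
    return bool(name) and len(name) < MAX_NAME_LENGTH and not FORBIDDEN.search(name)


def suffix_ok(filename):
    """Return True if the file name ends with a supported suffix."""
    return filename.lower().endswith(SUFFIXES)
```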

