STOmics

FAQ
Q How should I deal with a gene expression visualization that does not show any tissue morphology?
A

Step 1: Check whether the "Valid CID Reads" ratio in the HTML report is lower than 10%. If so, please check whether the FASTQ files correspond to the chip SN.

Step 2: A low "Valid CID Reads" ratio of around 10% to 30% can have one of two causes:

The reference genome does not meet the format requirements: if the multi-mapped read ratio is high and the uniquely mapped read ratio is extremely low, please run SAW checkGTF on the GTF/GFF file to verify that its format is valid for the pipeline.

Contamination: please troubleshoot the wet lab workflow.
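The two-step triage above can be sketched as a small shell function. This is a hypothetical helper, not part of SAW: it takes integer percentages read off the HTML report, and the cutoffs used for "extremely low" unique mapping (below 20%) and "high" multi-mapping (above 50%) are illustrative assumptions, because the FAQ gives no numbers for them.

```shell
# Hypothetical triage helper (not part of SAW) following Step 1 and Step 2.
# Arguments are integer percentages taken from the HTML report:
#   $1 = "Valid CID Reads" ratio
#   $2 = uniquely mapped reads ratio
#   $3 = multi-mapped reads ratio
# The cutoffs 20 and 50 are illustrative assumptions, not SAW defaults.
triage_valid_cid() {
    local valid=$1 unique=$2 multi=$3
    if [ "$valid" -lt 10 ]; then
        echo "check that the FASTQ files match the chip SN"
    elif [ "$valid" -le 30 ]; then
        if [ "$unique" -lt 20 ] && [ "$multi" -gt 50 ]; then
            echo "run SAW checkGTF on the GTF/GFF file"
        else
            echo "troubleshoot the wet lab workflow for contamination"
        fi
    else
        echo "ratio is outside the ranges covered by this FAQ"
    fi
}

triage_valid_cid 22 8 75   # prints: run SAW checkGTF on the GTF/GFF file
```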


Q What are the major filtering steps for the sequencing data in SAW pipelines?
A

CID filtering: filter out reads whose CID cannot be matched to any CID recorded in the Stereo-seq Chip T mask file.

MID filtering: filter out reads whose MID contains an N base, reads whose MID has poly-A content, and reads with at least one base whose quality score is lower than 10.

Reads filtering: filter out reads containing linker or DNB sequences, and reads shorter than 30 bp after adapter removal.
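A minimal sketch of the MID filter described above, written as a shell function around awk. It is not SAW's implementation. It assumes Phred+33 quality encoding, that the quality check applies to the MID bases, and that "poly-A" means an MID made up entirely of A bases; the function name mid_filter is hypothetical.

```shell
# Hypothetical sketch of the MID filter (not SAW's code).
# $1 = MID sequence, $2 = its quality string (assumed Phred+33).
# Prints "pass" or the first reason the read would be filtered out.
mid_filter() {
    awk 'BEGIN {
        mid = ARGV[1]; qual = ARGV[2]
        # Map each printable ASCII character to its code for Phred decoding
        for (i = 33; i <= 126; i++) code[sprintf("%c", i)] = i
        if (index(mid, "N") > 0) { print "fail: N base"; exit }
        # Assumption: "poly-A" = the whole MID consists of A bases
        if (mid ~ /^A+$/) { print "fail: poly-A"; exit }
        for (i = 1; i <= length(qual); i++)
            if (code[substr(qual, i, 1)] - 33 < 10) { print "fail: low quality"; exit }
        print "pass"
    }' "$1" "$2"
}

mid_filter ACGTNCGTAC IIIIIIIIII   # prints: fail: N base
```

Only a BEGIN block is used, so awk never tries to open the two arguments as files; reading them from ARGV also avoids the escape processing that `awk -v` would apply to quality strings containing a backslash.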


Q What factors affect cell segmentation results? How can segmentation be optimized?
A
  • Cell segmentation results depend on multiple factors, such as the quality of microscope imaging and the segmentation algorithm used. Overexposure and blurring interfere with the automatic identification of cell areas, which leads to poor segmentation output. Dense areas that are also blurred, or that contain overlapping cells, are especially difficult for the algorithm to segment accurately. Segmentation errors also arise where brightness is locally uneven across the tissue, or where background impurities or hangover of cell movement were introduced during the experiment (see examples below).

  • On the algorithm side, the automatic segmentation model was trained on specific datasets with manually assigned labels. It may therefore perform poorly on rare cell morphologies that those datasets do not cover.

  • If automatic segmentation does not work well, users can adjust the results manually in StereoMap, a desktop image processing application, or re-run segmentation with another algorithm and import the resulting binary cell mask file into StereoMap Image Processing for further analysis.

Examples:

  • Blur
  • Overexposure
  • Abnormal shapes, like fibers or clumps
  • Hangover
  • Bubble
  • Background impurity
  • Locally uneven brightness
  • Cells with special forms





Q What's the principle behind IF image QC?
A

Our current quality check strategy for IF images requires a paired DAPI image as input. The assessment covers track line recognition on the DAPI image, evaluation of microscope stitching for the DAPI and IF images, and calibration between the DAPI and IF images based on tissue morphology.

Track lines detected in the DAPI image during QC provide a fiducial reference frame for automatic registration of the image with the chip. Microscope stitching evaluation checks the microscope-stitched global image for obvious stitching errors, which safeguards the quality of subsequent tissue segmentation and alignment. Calibration evaluation ensures that the IF images can be processed in the same way as the DAPI image in terms of stitching, rotation, scaling, translation, and flip, so that the IF images can ultimately be registered with the expression matrix.

However, an IF image may show tissue morphology that differs from the DAPI image, in which case it can fail calibration QC. In such cases, use the "Calibration" module in ImageStudio to adjust the DAPI and IF images pairwise.


If the DAPI image fails QC for track line recognition and microscope stitching, the related IF images cannot be processed further automatically.
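The QC outcomes above decide what happens to an IF image. The hypothetical function below sketches that decision flow; it is not how SAW or ImageStudio is implemented, and it assumes that a DAPI failure in either track line recognition or stitching is enough to block automatic processing.

```shell
# Hypothetical routing of QC results (not SAW/ImageStudio logic).
# Each argument is "pass" or "fail":
#   $1 = DAPI track line recognition
#   $2 = DAPI microscope stitching evaluation
#   $3 = DAPI/IF calibration
if_qc_route() {
    if [ "$1" != pass ] || [ "$2" != pass ]; then
        # Assumption: either DAPI failure blocks the automatic path
        echo "cannot be processed automatically"
    elif [ "$3" != pass ]; then
        echo "adjust pairwise in ImageStudio Calibration"
    else
        echo "register with the DAPI transform"
    fi
}

if_qc_route pass pass fail   # prints: adjust pairwise in ImageStudio Calibration
```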


Q How are immunofluorescence (IF) images mapped to the gene expression matrix?
A


The IF image is aligned with the spatial gene expression matrix indirectly, using the DAPI image as the reference frame.

DAPI and IF images of the same tissue section are captured back to back by switching fluorescence channels. Because the chip stays fixed during imaging, the DAPI and IF images share the same stitching, scale, and angle parameters relative to the spatial gene expression map. The parameters used for DAPI image stitching, rotation, scaling, translation, and other transformations can therefore also be applied to the IF layer, including its alignment with the expression matrix.
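To illustrate why one set of DAPI parameters is enough, the sketch below applies a single flip, rotation, scaling, and translation to point coordinates given as "x y" pairs on standard input. The parameter names and the order of operations are assumptions for illustration, not SAW's actual transform definition. Running it on DAPI and IF coordinates with the same arguments maps both layers identically.

```shell
# Hypothetical sketch: reuse one set of DAPI registration parameters for the
# IF layer. Order assumed here: flip, rotate, scale, translate.
#   $1 flip (1 = mirror across the y axis, 0 = none)
#   $2 rotation angle in degrees, $3 scale factor
#   $4 x offset, $5 y offset
# Reads "x y" pairs on stdin and prints the transformed pairs.
apply_dapi_transform() {
    awk -v flip="$1" -v deg="$2" -v s="$3" -v tx="$4" -v ty="$5" '
    BEGIN { rad = deg * atan2(0, -1) / 180; c = cos(rad); sn = sin(rad) }
    {
        x = $1; y = $2
        if (flip == 1) x = -x
        xr = x * c - y * sn          # rotate counterclockwise
        yr = x * sn + y * c
        printf "%.2f %.2f\n", xr * s + tx, yr * s + ty   # scale, translate
    }'
}

printf '10 0\n' | apply_dapi_transform 0 90 2 100 50   # prints: 100.00 70.00
```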


Q How can rRNA alignments be removed during analysis? Can the rRNA sequences to be removed be specified manually?
A

Yes. You can manually add rRNA sequences to the reference genome FASTA file and then rebuild the reference indices. With the rRNAremove switch on, SAW mapping filters out reads that map to rRNA sequences. The rRNA filtering function was added in SAW v6.0.

Rules for adding rRNA sequences: include the rRNA sequences to be filtered out in the FASTA file, and append '_rRNA' to the end of each sequence name (on the header line starting with ">") so that the program can identify them. Examples are as follows:

[Image: example FASTA header lines with the '_rRNA' suffix]
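The naming rule can be applied with a one-line awk script. This hypothetical helper appends "_rRNA" to the first word of each header line and leaves any description and all sequence lines unchanged. The file names in the usage comment are placeholders, and the reference indices still need to be rebuilt afterwards.

```shell
# Hypothetical helper (not part of SAW): append "_rRNA" to the sequence name,
# i.e. the first word after ">", on every FASTA header line. Any description
# after the name and all sequence lines are printed unchanged.
tag_rrna_headers() {
    awk '/^>/ { sub(/^>[^ \t]+/, "&_rRNA") } { print }' "$1"
}

# Usage (placeholder file names): append the tagged rRNA records to a copy of
# the genome FASTA, then rebuild the reference indices.
# tag_rrna_headers rRNA.fa >> genome_with_rRNA.fa
```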

Add an "rRNAremove" row to the bcPara file before running SAW mapping, for example:

Plain Text
in=<mask>
in1=<lane_read_1.fq.gz>
in2=<lane_read_2.fq.gz>
barcodeReadsCount=<lane.barcodeReadsCount.txt>
barcodeStart=0
barcodeLen=25
umiStart=25
umiLen=10
umiRead=1
mismatch=1
bcNum=<CIDCount>
polyAnum=15
mismatchInPolyA=2
rRNAremove
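If you script your runs, the row can be added idempotently so that re-running the setup does not duplicate it. The helper below is a hypothetical POSIX shell sketch, not a SAW utility; it also adds a missing trailing newline so the new row is not glued onto the last parameter.

```shell
# Hypothetical helper (not a SAW tool): enable rRNA removal in a bcPara file.
# $1 = path to the bcPara file; the row is added only if it is not present.
enable_rrna_remove() {
    if ! grep -qx 'rRNAremove' "$1"; then
        # Make sure the last existing row ends with a newline before appending
        if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
            echo >> "$1"
        fi
        echo 'rRNAremove' >> "$1"
    fi
}

# Usage (placeholder path): enable_rrna_remove lane.bcPara
```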

If a read maps to an rRNA sequence, the 3rd column (RNAME) of the alignment record shows the corresponding reference sequence name with the "_rRNA" suffix, and the optional field in the 12th column carries the XF:i tag set to 3. The rRNA ratio is then computed from the XF tag records in the subsequent annotation step.

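For a quick independent check, the rRNA fraction of an alignment file can be estimated by counting records that carry XF:i:3. This is a hypothetical sketch, not SAW's annotation code: it counts every non-header SAM record as one alignment, so its result may differ from the ratio in the SAW report (for example, in how multi-mapped or unmapped reads are handled). The BAM file name in the usage comment is a placeholder.

```shell
# Hypothetical sketch (not SAW's annotation step): fraction of SAM records
# tagged XF:i:3 among all non-header records read from stdin.
rrna_ratio() {
    awk -F '\t' '
    /^@/ { next }                                  # skip SAM header lines
    {
        total++
        # optional fields start at column 12
        for (i = 12; i <= NF; i++) if ($i == "XF:i:3") { rrna++; break }
    }
    END { if (total > 0) printf "%.4f\n", rrna / total; else print "0" }'
}

# Usage (placeholder file name): samtools view sample.bam | rrna_ratio
```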



Q Is there a tool for checking annotation files for errors?
A

Yes. The checkGTF tool in the SAW SIF image is designed for this purpose. Run it as follows:

Bash
## Optional: bind the input and output directories into the container
## export SINGULARITY_BIND="/path/to/input/dir,/path/to/output/dir"
## -i: GTF/GFF file to be checked
## -o: [optional] output path for the revised GTF/GFF file; genes that do not meet the requirements and cannot be fixed may be removed from it
singularity exec SAW.sif checkGTF \
    -i <input.gtf/gff> \
    -o <output.gtf/gff>

Gene annotation records that the program cannot fix are removed from the output but written to the log file. Please correct these items and run the program again.


Q Why are most genes in the annotation file not annotated?
A
  • The input annotation file may not conform to the format requirements. Please double-check it against the file format requirements mentioned above.

  • Another possibility is that the strand symbols are not in the correct format. Strand values in annotation files must be either "+" (forward) or "-" (reverse); do not confuse "-" (hyphen) with "_" (underscore).

  • If genes that share the same name on the same chromosome have inconsistent strand directions, the annotation file is regarded as abnormal and all such genes are discarded.
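Both strand problems can be spotted before running the pipeline. The hypothetical awk-based checker below (not SAW checkGTF) reports lines whose strand column is neither "+" nor "-", and gene names that appear with both strands on the same chromosome. It assumes GTF input with a gene_name attribute; GFF files would need a different attribute parser.

```shell
# Hypothetical checker (not SAW checkGTF). $1 = GTF file.
# Reports invalid strand symbols (column 7) and gene names that occur on
# the same chromosome with both "+" and "-" strands.
check_strands() {
    awk -F '\t' '
    /^#/ || NF < 9 { next }
    {
        if ($7 != "+" && $7 != "-") {
            printf "line %d: invalid strand \"%s\"\n", NR, $7
            next
        }
        if (match($9, /gene_name "[^"]*"/)) {
            # skip the 11 characters of: gene_name "
            name = substr($9, RSTART + 11, RLENGTH - 12)
            key = $1 SUBSEP name
            seen[key] = seen[key] $7      # collect strands seen for this gene
        }
    }
    END {
        for (key in seen)
            if (index(seen[key], "+") && index(seen[key], "-")) {
                split(key, parts, SUBSEP)
                printf "gene %s on %s: strand disagreement\n", parts[2], parts[1]
            }
    }' "$1"
}

# Usage (placeholder file name): check_strands genes.gtf
```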


Q How should I deal with the error "Fatal INPUT FILE error, no valid exon lines in the GTF file" reported during reference genome indexing?
A

One possible cause is that the chromosome names in the GTF/GFF annotation file do not exactly match those in the genome FASTA file. Please use the same chromosome names in both files.
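A quick way to find naming mismatches is to list the sequence names used in the annotation that have no matching FASTA header. The helper below is a hypothetical sketch; it takes the FASTA name as the first word after ">", which is how most indexers read headers (an assumption here), and the file names in the usage comment are placeholders.

```shell
# Hypothetical helper: print each sequence name used in the annotation
# ($2, GTF/GFF) that has no matching header in the genome FASTA ($1).
missing_chroms() {
    awk -F '\t' '
    FNR == NR {                                  # first file: the FASTA
        if (/^>/) { sub(/^>/, ""); split($0, w, /[ \t]/); names[w[1]] = 1 }
        next
    }
    /^#/ || NF < 9 { next }                      # second file: annotation
    !($1 in names) && !($1 in reported) { print $1; reported[$1] = 1 }
    ' "$1" "$2"
}

# Usage (placeholder file names): missing_chroms genome.fa genes.gtf
```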

Q In which situations are genes omitted when annotation files are read?
A

A GTF file lacks the gene_name, gene_id, transcript_name, and transcript_id attributes (gene records only need gene_name and gene_id).

A GFF file lacks the ID, Name, and Parent attributes (gene records do not need Parent).

Multiple gene IDs are assigned to the same gene, as reported in the log: "Multiple gene IDs for gene xxx: id1, id2..."

Both forward and reverse strands are assigned to the same gene, as reported in the log: "Strand disagreement for gene xxx - skipping"

A transcript or exon has no transcript_id, as reported in the log: "Record does not have transcriptID for gene xxx"

A gene has multiple transcripts with the same transcript_id / ID, as reported in the log: "Transcript appears more than once for xxx"

An exon has start > end, as reported in the log: "Exon has 0 or negative extent for xxx"

Exons of the same transcript overlap, as reported in the log: "Exons overlap for xxx"

A gene has no transcripts, as reported in the log: "No transcript for gene xxx"

Note: multiple genes on the same contig that share a gene_name are merged into one gene.
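Several of these conditions can be caught before indexing. The hypothetical pre-check below covers three of them: start > end, exons without a transcript_id, and more than one gene ID for one gene name on the same chromosome. The messages mimic the log texts listed above, but the checks are simplified assumptions and will not reproduce SAW's parser exactly. It expects GTF input.

```shell
# Hypothetical GTF pre-check (not SAW's parser). $1 = GTF file.
precheck_gtf() {
    awk -F '\t' '
    # Return the value of attribute "key" from column 9, or "" if absent
    function attr(key,    re) {
        re = key " \"[^\"]*\""
        if (match($9, re))
            return substr($9, RSTART + length(key) + 2, RLENGTH - length(key) - 3)
        return ""
    }
    /^#/ || NF < 9 { next }
    $3 == "exon" {
        gname = attr("gene_name")
        if ($4 + 0 > $5 + 0) print "Exon has 0 or negative extent for " gname
        if (attr("transcript_id") == "")
            print "Record does not have transcriptID for gene " gname
    }
    {
        gname = attr("gene_name"); gid = attr("gene_id")
        key = $1 SUBSEP gname                    # same name on same chromosome
        if (gname == "" || gid == "" || (key in warned)) next
        if (!(key in id)) id[key] = gid
        else if (id[key] != gid) {
            print "Multiple gene IDs for gene " gname ": " id[key] ", " gid
            warned[key] = 1
        }
    }' "$1"
}

# Usage (placeholder file name): precheck_gtf genes.gtf
```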

