This chapter provides reference output for testing a SAW pipeline run. It lists the statistical results and shows examples of the critical files produced at each key step, so that users can compare their own results against them.
SN:SS200000135TL_D1
"..." in the examples stands for lines of log output that have been omitted.
1. mapping
1.1 Statistical Report for CID Mapping and Filtering
$ cat /path/to/output/01.mapping/E100026571_L01_trim_read_1_barcodeMap.stat
...
getBarcodePositionMap_uniqBarcodeTypes: 645784920
total_reads: 1002214171
reads_with_polyA: 131113905 13.08%
reads_filteredByPolyA: 22008148 2.20%
mapped_reads: 826344259 82.45%
reads_with_adapter: 9007116 0.90%
reads_with_dnb: 42264284 4.22%
barcode_exactlyOverlap_reads: 682746301 68.12%
barcode_misOverlap_reads: 143590127 14.33%
barcode_withN_reads: 7831 0.00%
Q10_bases_in_barcode: 99.54%
Q20_bases_in_barcode: 97.49%
Q30_bases_in_barcode: 91.74%
Q10_bases_in_umi: 99.26%
Q20_bases_in_umi: 96.32%
Q30_bases_in_umi: 89.45%
Q10_bases_in_seq: 99.47%
Q20_bases_in_seq: 97.12%
Q30_bases_in_seq: 91.08%
umi_filter_reads: 8265089 0.82%
umi_with_N_reads: 13025 0.00%
umi_with_polyA_reads: 12365 0.00%
umi_with_low_quality_base_reads: 8239699 0.82%
mapped_dnbs: 75619113
...
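When comparing your own run against this report, it can help to load the `name: count percent` lines into a dictionary. The following is a minimal sketch, not part of SAW: the function name `parse_barcode_stat` is hypothetical, and the sample text is a subset of the lines shown above. In this sample the exact-overlap, mismatch-overlap, and N-containing barcode reads add up to `mapped_reads`; treating that as a general invariant is an assumption.

```python
def parse_barcode_stat(text):
    """Parse 'name: count [pct%]' lines into {name: (count, pct_or_None)}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip '...' and blank lines
        name, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        first = fields[0]
        if first.endswith("%"):
            # Quality lines such as 'Q30_bases_in_umi: 89.45%' carry only a percentage.
            stats[name.strip()] = (None, float(first.rstrip("%")))
        else:
            pct = float(fields[1].rstrip("%")) if len(fields) > 1 else None
            stats[name.strip()] = (int(first), pct)
    return stats


# Subset of the report shown above.
SAMPLE = """\
total_reads: 1002214171
mapped_reads: 826344259 82.45%
barcode_exactlyOverlap_reads: 682746301 68.12%
barcode_misOverlap_reads: 143590127 14.33%
barcode_withN_reads: 7831 0.00%
Q30_bases_in_umi: 89.45%
"""

stats = parse_barcode_stat(SAMPLE)
mapped = stats["mapped_reads"][0]
total = stats["total_reads"][0]
# Assumption: these three categories partition mapped_reads (true for this sample).
parts = sum(stats[key][0] for key in (
    "barcode_exactlyOverlap_reads",
    "barcode_misOverlap_reads",
    "barcode_withN_reads",
))
mapped_pct = 100 * mapped / total
print(f"mapped fraction: {mapped_pct:.2f}%")
print("components sum to mapped_reads:", parts == mapped)
```

A mismatch between the recomputed percentage and the reported one usually points to a truncated or edited stat file rather than a pipeline problem.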
1.2 Statistical Report for Reference Genome Alignment
$ cat /path/to/output/01.mapping/E100026571_L01_trim_read_1.Log.final.out
...
Number of input reads | 766807770
Average input read length | 95
UNIQUE READS:
Uniquely mapped reads number | 643871246
Uniquely mapped reads % | 83.97%
Average mapped length | 95.21
Number of splices: Total | 67595584
Number of splices: Annotated (sjdb) | 65674308
Number of splices: GT/AG | 66407685
Number of splices: GC/AG | 457595
Number of splices: AT/AC | 41563
Number of splices: Non-canonical | 688741
Mismatch rate per base, % | 0.50%
Deletion rate per base | 0.07%
Deletion average length | 3.91
Insertion rate per base | 0.03%
Insertion average length | 1.25
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 87649341
% of reads mapped to multiple loci | 11.43%
Number of reads mapped to too many loci | 5301054
% of reads mapped to too many loci | 0.69%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 28773993
% of reads unmapped: too short | 3.75%
Number of reads unmapped: other | 1212136
% of reads unmapped: other | 0.16%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
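The alignment report uses `name | value` lines, so a quick consistency check is to confirm that the read categories account for every input read. The sketch below is a hypothetical helper (`parse_star_log` is not a SAW or STAR function) run on count lines copied from the report above; the chimeric category is left out because it is 0 in this sample.

```python
def parse_star_log(text):
    """Return {field name: value string} for every 'name | value' line."""
    fields = {}
    for line in text.splitlines():
        if "|" in line:
            name, _, value = line.partition("|")
            fields[name.strip()] = value.strip()
    return fields


# Count lines copied from the report above.
SAMPLE = """\
Number of input reads | 766807770
Uniquely mapped reads number | 643871246
Number of reads mapped to multiple loci | 87649341
Number of reads mapped to too many loci | 5301054
Number of reads unmapped: too many mismatches | 0
Number of reads unmapped: too short | 28773993
Number of reads unmapped: other | 1212136
"""

log = parse_star_log(SAMPLE)
total = int(log["Number of input reads"])
# Every remaining line is one mutually exclusive read category.
accounted = sum(int(value) for name, value in log.items()
                if name != "Number of input reads")
unique_pct = 100 * int(log["Uniquely mapped reads number"]) / total
print(f"accounted for {accounted} of {total} reads")
print(f"uniquely mapped: {unique_pct:.2f}%")
```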
1.3 Example of mapping BAM
$ samtools view /path/to/output/01.mapping/E100026571_L01_trim_read_1.Aligned.sortedByCoord.out.bam | head -2
E100026571L1C007R00303973559 256 1 3000644 3 100M * 0 0 GCCTCATTGTGCCCCATATGTTTGCCTATGTTGTGGACTTATTTTCATTAAACTTTAAAACATCTTTAATTTTTTTCTTTATTTCATCATTGACCAAGCT -FCA9D?GFFD<-D<cgfegd-dg*fgfdfbe;e(9bgge38fffg9gg;0?ggfgb?e@g:ggg3gf79f0ggdg?g
2. merge
2.1 Example of Mapped CID List with Reads Count File
$ head /path/to/output/02.merge/SS200000135TL_D1.barcodeReadsCount.txt
7127 18002 48
4348 19028 1
14130 8635 1
7618 14537 24
4912 10945 5
16783 12914 1
15539 8177 1
9288 8082 14
7274 16533 59
9087 10657 10
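Each line of this file holds three whitespace-separated integers. The sketch below assumes they are two CID coordinates followed by the read count; the column order (x before y) and the names `CidCount` and `parse_cid_counts` are assumptions made for illustration. It summarizes the ten lines shown above.

```python
from collections import namedtuple

# Assumed column meaning: coordinate, coordinate, number of reads.
CidCount = namedtuple("CidCount", ["x", "y", "reads"])


def parse_cid_counts(lines):
    """Yield one CidCount per well-formed three-column line."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        yield CidCount(*map(int, parts))


# The ten lines printed by `head` above.
SAMPLE = """\
7127 18002 48
4348 19028 1
14130 8635 1
7618 14537 24
4912 10945 5
16783 12914 1
15539 8177 1
9288 8082 14
7274 16533 59
9087 10657 10
"""

records = list(parse_cid_counts(SAMPLE.splitlines()))
total_reads = sum(r.reads for r in records)
singletons = sum(1 for r in records if r.reads == 1)
top = max(records, key=lambda r: r.reads)
print(f"{len(records)} CIDs, {total_reads} reads, {singletons} seen once")
print("most covered CID:", top)
```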
3. count
3.1 Statistical Report for MID Filtering and Gene Annotation
$ cat /path/to/output/03.count/SS200000135TL_D1.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam.summary.stat
## FILTER & DEDUPLICATION METRICS
TOTAL_READS PASS_FILTER ANNOTATED_READS UNIQUE_READS FAIL_FILTER_RATE FAIL_ANNOTATE_RATE DUPLICATION_RATE
731520587 643871246 532386027 108123310 11.98 17.31 79.69
## ANNOTATION METRICS
TOTAL_READS MAP EXONIC INTRONIC INTERGENIC TRANSCRIPTOME ANTISENSE
643871246 643871246 483163052 49222975 111485219 532386027 109940618
100.0 100.0 75.0 7.6 17.3 82.7 17.1
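The rate columns can be recomputed from the count columns. The formulas in the sketch below are inferred from the sample numbers above, not taken from SAW documentation, so treat them as assumptions; they do reproduce the reported 11.98, 17.31, and 79.69.

```python
# Counts copied from the two metric tables above.
filter_metrics = {
    "TOTAL_READS": 731520587,
    "PASS_FILTER": 643871246,
    "ANNOTATED_READS": 532386027,
    "UNIQUE_READS": 108123310,
}
annotation = {
    "TOTAL_READS": 643871246,
    "EXONIC": 483163052,
    "INTRONIC": 49222975,
    "INTERGENIC": 111485219,
    "TRANSCRIPTOME": 532386027,
}


def pct(part, whole, digits=2):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


total = filter_metrics["TOTAL_READS"]
passed = filter_metrics["PASS_FILTER"]
annotated = filter_metrics["ANNOTATED_READS"]
unique = filter_metrics["UNIQUE_READS"]

# Assumed definitions, chosen because they match the reported values.
fail_filter_rate = pct(total - passed, total)
fail_annotate_rate = pct(passed - annotated, passed)
duplication_rate = pct(annotated - unique, annotated)
print(fail_filter_rate, fail_annotate_rate, duplication_rate)

# In this sample, exonic + intronic reads equal the transcriptome reads,
# and adding intergenic reads gives the annotation total.
transcriptome = annotation["EXONIC"] + annotation["INTRONIC"]
print(transcriptome == annotation["TRANSCRIPTOME"])
print(transcriptome + annotation["INTERGENIC"] == annotation["TOTAL_READS"])
```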
3.2 Example of Annotated BAM
$ samtools view /path/to/output/03.count/SS200000135TL_D1.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam | head -2
E100026571L1C003R03702347721 0 1 3001778 255 100M * 0 0 GTATGACATCTGTCCAGGATCTTCTAGCTTTCATAGTCTCTGGTGAGAAGTCTGGAGTAATTCTAATAGGCCTGCATTTATATGTTACTTGACCTTTTTC EEFEDFFEFFFFEFFFFEC@EFFFFDFFEEFFEFFFFCFCEFFAFBFCED??FGBEFFDC:FFFDCFAF4FAFFDFFDG?DFBD.F@FECA/FEDEFFAA NH:i:1 HI:i:1 AS:i:92 nM:i:3 Cx:i:12136 Cy:i:14034 UR:Z:C0808 XF:i:2
E100026571L1C005R02302788444 528 1 3016331 0 100M * 0 0 TTTATGTGGAGTTCCTTAATCCACTTAGATTTGACCTTAGTACAAGGAGATAGGAATGGATCAATTCGCATTCTTCTACATGATAACAGCCAGTTGTACC ;FDF>FCFFEAD:FFEBF=@FFDEEFFFC@EFCEFDDFFCE?FDFF7EEECFDEFFFCEFCCEEDEEEFEFBFEEFFDEEFFFEEDFFEDFEEEEFFEED NH:i:5 HI:i:1 AS:i:96 nM:i:1 Cx:i:6628 Cy:i:7872 UR:Z:EDFF9
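To inspect records like these programmatically, the FLAG column can be decoded into its bit names and the optional `TAG:TYPE:VALUE` fields collected into a dictionary. The sketch below follows the SAM format's bit definitions; the helper names are hypothetical, the two records are the ones shown above with SEQ and QUAL replaced by `*` for brevity, and splitting on whitespace is a simplification (real SAM is tab-delimited). The meaning of the `Cx`, `Cy`, `UR`, and `XF` tags is not defined in this chapter, so they are parsed but not interpreted.

```python
# FLAG bits as defined in the SAM format specification.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse_strand", 0x20: "mate_reverse_strand",
    0x40: "first_in_pair", 0x80: "second_in_pair",
    0x100: "secondary", 0x200: "qc_fail",
    0x400: "duplicate", 0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_tags(fields):
    """Turn 'TAG:TYPE:VALUE' fields into {TAG: value}; 'i' values become int."""
    tags = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


def parse_sam_line(line):
    cols = line.split()  # simplification: real SAM files are tab-delimited
    return {
        "qname": cols[0], "flag": int(cols[1]), "rname": cols[2],
        "pos": int(cols[3]), "mapq": int(cols[4]), "cigar": cols[5],
        "tags": parse_tags(cols[11:]),
    }


# Records from above; SEQ and QUAL are replaced by '*' to keep the lines short.
REC1 = ("E100026571L1C003R03702347721 0 1 3001778 255 100M * 0 0 * * "
        "NH:i:1 HI:i:1 AS:i:92 nM:i:3 Cx:i:12136 Cy:i:14034 UR:Z:C0808 XF:i:2")
REC2 = ("E100026571L1C005R02302788444 528 1 3016331 0 100M * 0 0 * * "
        "NH:i:5 HI:i:1 AS:i:96 nM:i:1 Cx:i:6628 Cy:i:7872 UR:Z:EDFF9")

rec1, rec2 = parse_sam_line(REC1), parse_sam_line(REC2)
print(decode_flag(rec1["flag"]), rec1["tags"])
print(decode_flag(rec2["flag"]), rec2["tags"])
```

Here flag 528 decodes to the reverse-strand and QC-fail bits, and flag 256 (seen in the mapping BAM example) would decode to a secondary alignment.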
3.3 Example of count Gene Expression Matrix
$ h5dump -n /path/to/output/03.count/SS200000135TL_D1.raw.gef
HDF5 "/path/to/output/03.count/SS200000135TL_D1.raw.gef" {
FILE_CONTENTS {
group /
group /geneExp
group /geneExp/bin1
dataset /geneExp/bin1/exon
dataset /geneExp/bin1/expression
dataset /geneExp/bin1/gene
}
}
$ h5dump -d /geneExp/bin1/expression /path/to/output/03.count/SS200000135TL_D1.raw.gef | head -15
HDF5 "/path/to/output/03.count/SS200000135TL_D1.raw.gef" {
DATASET "/geneExp/bin1/expression" {
DATATYPE H5T_COMPOUND {
H5T_STD_U32LE "x";
H5T_STD_U32LE "y";
H5T_STD_U8LE "count";
}
DATASPACE SIMPLE { ( 76041339 ) / ( 76041339 ) }
DATA {
(0): {
4888,
10392,
1
},
(1): {
$ h5dump -d /geneExp/bin1/gene /path/to/output/03.count/SS200000135TL_D1.raw.gef | head -20
HDF5 "/path/to/output/03.count/SS200000135TL_D1.raw.gef" {
DATASET "/geneExp/bin1/gene" {
DATATYPE H5T_COMPOUND {
H5T_STRING {
STRSIZE 32;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} "gene";
H5T_STD_U32LE "offset";
H5T_STD_U32LE "count";
}
DATASPACE SIMPLE { ( 24661 ) / ( 24661 ) }
DATA {
(0): {
"Gm1992",
0,
132
},
(1): {
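The `gene` dataset stores an `offset` and a `count` per gene, and the `expression` dataset is one flat array of `(x, y, count)` records. A plausible reading, assumed here from the field names and not confirmed by this chapter, is that each gene owns the slice `expression[offset : offset + count]`. The sketch below illustrates that layout with plain Python lists; all values and gene names are invented toy data, not taken from the GEF file.

```python
# Toy stand-ins for the two datasets; every value here is invented.
expression = [            # (x, y, count)
    (10, 20, 1), (10, 21, 3), (11, 20, 2),   # records 0-2
    (40, 50, 1), (41, 50, 1),                # records 3-4
]
genes = [                 # (gene, offset, count)
    ("geneA", 0, 3),      # owns expression[0:3]
    ("geneB", 3, 2),      # owns expression[3:5]
]


def records_for_gene(name):
    """Return the expression records of one gene under the assumed layout."""
    for gene, offset, count in genes:
        if gene == name:
            return expression[offset:offset + count]
    raise KeyError(name)


def total_count(name):
    """Sum the per-spot counts of one gene."""
    return sum(count for _x, _y, count in records_for_gene(name))


print(records_for_gene("geneB"))
print(total_count("geneA"))
```

Under this reading, the first gene entry above (`"Gm1992"`, offset 0, count 132) would cover the first 132 expression records. Note also that the per-record `count` field is an unsigned 8-bit integer in the listing, so a single record cannot exceed 255.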
3.4 Example of count Sampling File
$ head -8 /path/to/output/03.count/SS200000135TL_D1_raw_barcode_gene_exp.txt
y x geneIndex MIDIndex readCount
10392 4888 10551 665954 4
7096 8901 10551 881671 1
7096 8901 10551 357383 20
18783 7397 10551 355789 1
13032 9155 10551 297666 1
13032 9155 10551 298690 1
11778 10617 10551 686313 4
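Because the sampling file has a header row, it can be read by column name and grouped per spot and gene. The sketch below is a minimal illustration using the lines shown above; the function name `summarize` is hypothetical, and it assumes each row is one distinct MID observed at one coordinate for one gene.

```python
from collections import defaultdict

# Header and rows copied from the output above.
SAMPLE = """\
y x geneIndex MIDIndex readCount
10392 4888 10551 665954 4
7096 8901 10551 881671 1
7096 8901 10551 357383 20
18783 7397 10551 355789 1
13032 9155 10551 297666 1
13032 9155 10551 298690 1
11778 10617 10551 686313 4
"""


def summarize(text):
    """Group rows by (x, y, geneIndex); collect MIDs and sum read counts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    groups = defaultdict(lambda: {"mids": set(), "reads": 0})
    for line in lines[1:]:
        row = dict(zip(header, map(int, line.split())))
        key = (row["x"], row["y"], row["geneIndex"])
        groups[key]["mids"].add(row["MIDIndex"])
        groups[key]["reads"] += row["readCount"]
    return groups


groups = summarize(SAMPLE)
total_reads = sum(g["reads"] for g in groups.values())
total_mids = sum(len(g["mids"]) for g in groups.values())
for key, g in groups.items():
    print(key, "MIDs:", len(g["mids"]), "reads:", g["reads"])
print("reads:", total_reads, "distinct MIDs:", total_mids)
```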
4. register and imageTools
4.1 Registered Image
Registered image files:
/path/to/output/04.register/fov_stitched_transformed.tif
/path/to/output/04.register/SS200000135TL_D1_regist.tif
4.2 Image Process Record File
$ h5dump -n /path/to/output/04.register/SS200000135TL_D1_20220527_201353_1.1.0.ipr
HDF5 "/path/to/output/04.register/SS200000135TL_D1_20220527_201353_1.1.0.ipr" {
FILE_CONTENTS {
group /
group /CellSeg
dataset /CellSeg/CellMask
group /ImageInfo
dataset /ImageInfo/RGBScale
group /ManualState
dataset /Preview
group /QCInfo
group /QCInfo/CrossPoints
dataset /QCInfo/CrossPoints/0_0
...
dataset /QCInfo/CrossPoints/9_7
dataset /QCInfo/TrackDistanceTemplate
group /Register
dataset /Register/MatrixTemplate
group /StereoResepSwitch
group /Stitch
group /Stitch/BGIStitch
dataset /Stitch/BGIStitch/StitchedGlobalLoc
group /Stitch/ScopeStitch
dataset /Stitch/ScopeStitch/GlobalLoc
group /Stitch/StitchEval
dataset /Stitch/StitchEval/StitchEvalH
dataset /Stitch/StitchEval/StitchEvalV
dataset /Stitch/TemplatePoint
dataset /Stitch/TransformTemplate
group /TissueSeg
dataset /TissueSeg/TissueMask
}
}
$ h5dump -A /path/to/output/04.register/SS200000135TL_D1_20220527_201353_1.1.0.ipr | head -20
HDF5 "/path/to/output/04.register/SS200000135TL_D1_20220527_201353_1.1.0.ipr" {
GROUP "/" {
ATTRIBUTE "IPRVersion" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "0.0.1"
}
}
GROUP "CellSeg" {
ATTRIBUTE "CellSegShape" {
DATATYPE H5T_STD_I64LE
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): 21482, 22337
4.3 ImageTools merge
Merged image of the microscopy image SS200000135TL_D1_regist.tif and the tissue segmentation mask SS200000135TL_D1_tissue_cut.tif, used to check tissue segmentation performance.
Part of the merged image of the microscopy image SS200000135TL_D1_regist.tif and the cell segmentation mask SS200000135TL_D1_mask.tif, used to check cell segmentation performance.
4.4 ImageTools overlay
Stack the stitching template onto fov_stitched_transformed.tif to check the stitching result.
Stack the registration template onto SS200000135TL_D1_regist.tif to check the registration result.
5. tissueCut
5.1 Statistical Report for Tissue Covered Region
$ cat /path/to/output/05.tissuecut/tissuecut.stat
# Tissue Statistic Analysis with Stain Image
Contour_area: 88637560
Number_of_DNB_under_tissue: 36679634
Ratio: 41.38%
Total_gene_type: 24299
MID_counts: 89816137
Fraction_MID_in_spots_under_tissue: 83.07%
Reads_under_tissue: 648371996
Fraction_reads_in_spots_under_tissue: 78.46%
binSize=1
Mean_reads_per_spot: 17.68
Median_reads_per_spot: 11.00
Mean_gene_type_per_spot: 1.71
Median_gene_type_per_spot: 1
Mean_Umi_per_spot: 2.45
Median_Umi_per_spot: 2
binSize=50
Mean_reads_per_spot: 18045.92
Median_reads_per_spot: 16198.00
Mean_gene_type_per_spot: 1151.22
Median_gene_type_per_spot: 1117
Mean_Umi_per_spot: 2499.82
Median_Umi_per_spot: 2309
binSize=100
Mean_reads_per_spot: 71116.81
Median_reads_per_spot: 64454.00
Mean_gene_type_per_spot: 3083.32
Median_gene_type_per_spot: 3081
Mean_Umi_per_spot: 9851.50
Median_Umi_per_spot: 9066
binSize=150
Mean_reads_per_spot: 157601.36
Median_reads_per_spot: 143773.00
Mean_gene_type_per_spot: 4891.22
Median_gene_type_per_spot: 5029
Mean_Umi_per_spot: 21831.83
Median_Umi_per_spot: 20242
binSize=200
Mean_reads_per_spot: 276727.25
Median_reads_per_spot: 254272.00
Mean_gene_type_per_spot: 6403.27
Median_gene_type_per_spot: 6719
Mean_Umi_per_spot: 38333.82
Median_Umi_per_spot: 35679
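The report has a summary part followed by one block per `binSize`. The sketch below parses both into dictionaries and recomputes three values. The parser name is hypothetical, the sample is a subset of the report above, and the three relationships (Ratio = DNBs under tissue / contour area, and the bin 1 means = reads or MIDs divided by DNBs under tissue) are inferred from these numbers rather than documented, so treat them as assumptions.

```python
def parse_tissuecut_stat(text):
    """Split the report into a summary dict and {bin size: metrics dict}."""
    summary, bins = {}, {}
    target = summary
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("binSize="):
            # Lines after this marker belong to the given bin size.
            target = bins.setdefault(int(line.split("=", 1)[1]), {})
            continue
        name, _, value = line.partition(":")
        target[name.strip()] = float(value.strip().rstrip("%"))
    return summary, bins


# Subset of the report shown above.
SAMPLE = """\
# Tissue Statistic Analysis with Stain Image
Contour_area: 88637560
Number_of_DNB_under_tissue: 36679634
Ratio: 41.38%
MID_counts: 89816137
Reads_under_tissue: 648371996
binSize=1
Mean_reads_per_spot: 17.68
Mean_Umi_per_spot: 2.45
binSize=50
Mean_reads_per_spot: 18045.92
Mean_Umi_per_spot: 2499.82
"""

summary, bins = parse_tissuecut_stat(SAMPLE)
dnbs = summary["Number_of_DNB_under_tissue"]
# Assumed relationships, chosen because they reproduce the reported values.
ratio = round(100 * dnbs / summary["Contour_area"], 2)
mean_reads = round(summary["Reads_under_tissue"] / dnbs, 2)
mean_umi = round(summary["MID_counts"] / dnbs, 2)
print(ratio, mean_reads, mean_umi)
print(sorted(bins))
```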
5.2 Example of Gene Expression Matrix for Tissue Covered Region
$ h5dump -n /path/to/output/05.tissuecut/SS200000135TL_D1.tissue.gef
HDF5 "/path/to/output/05.tissuecut/SS200000135TL_D1.tissue.gef" {
FILE_CONTENTS {
group /
group /geneExp
group /geneExp/bin1
dataset /geneExp/bin1/exon
dataset /geneExp/bin1/expression
dataset /geneExp/bin1/gene
}
}
$ h5dump -d /geneExp/bin1/expression /path/to/output/05.tissuecut/SS200000135TL_D1.tissue.gef | head -15
HDF5 "/path/to/output/05.tissuecut/SS200000135TL_D1.tissue.gef" {
DATASET "/geneExp/bin1/expression" {
DATATYPE H5T_COMPOUND {
H5T_STD_U32LE "x";
H5T_STD_U32LE "y";
H5T_STD_U8LE "count";
}
DATASPACE SIMPLE { ( 62647604 ) / ( 62647604 ) }
DATA {
(0): {
4888,
10392,
1
},
(1): {
$ h5dump -d /geneExp/bin1/gene /path/to/output/05.tissuecut/SS200000135TL_D1.tissue.gef | head -20
HDF5 "/path/to/output/05.tissuecut/SS200000135TL_D1.tissue.gef" {
DATASET "/geneExp/bin1/gene" {
DATATYPE H5T_COMPOUND {
H5T_STRING {
STRSIZE 32;
STRPAD H5T_STR_NULLPAD;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} "gene";
H5T_STD_U32LE "offset";
H5T_STD_U32LE "count";
}
DATASPACE SIMPLE { ( 24299 ) / ( 24299 ) }
DATA {
(0): {
"Gm1992\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
0,
112
},
(1): {
5.3 Example of Gene Expression Matrix for a complete GEF
$ h5dump -n /path/to/output/05.tissuecut/SS200000135TL_D1.gef
HDF5 "/path/to/output/05.tissuecut/SS200000135TL_D1.gef" {
FILE_CONTENTS {
group /
group /geneExp
group /geneExp/bin1
dataset /geneExp/bin1/exon
dataset /geneExp/bin1/expression
dataset /geneExp/bin1/gene
group /geneExp/bin10
dataset /geneExp/bin10/exon
dataset /geneExp/bin10/expression
dataset /geneExp/bin10/gene
group /geneExp/bin100
dataset /geneExp/bin100/exon
dataset /geneExp/bin100/expression
dataset /geneExp/bin100/gene
group /geneExp/bin20
dataset /geneExp/bin20/exon
dataset /geneExp/bin20/expression
dataset /geneExp/bin20/gene
group /geneExp/bin200
dataset /geneExp/bin200/exon
dataset /geneExp/bin200/expression
dataset /geneExp/bin200/gene
group /geneExp/bin50
dataset /geneExp/bin50/exon
dataset /geneExp/bin50/expression
dataset /geneExp/bin50/gene
group /geneExp/bin500
dataset /geneExp/bin500/exon
dataset /geneExp/bin500/expression
dataset /geneExp/bin500/gene
group /stat
dataset /stat/gene
group /wholeExp
dataset /wholeExp/bin1
dataset /wholeExp/bin10
dataset /wholeExp/bin100
dataset /wholeExp/bin20
dataset /wholeExp/bin200
dataset /wholeExp/bin50
dataset /wholeExp/bin500
group /wholeExpExon
dataset /wholeExpExon/bin1
dataset /wholeExpExon/bin10
dataset /wholeExpExon/bin100
dataset /wholeExpExon/bin20
dataset /wholeExpExon/bin200
dataset /wholeExpExon/bin50
dataset /wholeExpExon/bin500
}
}
$ h5dump -d /stat/gene /path/to/output/05.tissuecut/SS200000135TL_D1.gef | head -20
HDF5 "/path/to/output/05.tissuecut/SS200000135TL_D1.gef" {
DATASET "/stat/gene" {
DATATYPE H5T_COMPOUND {
H5T_STRING {
STRSIZE 32;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} "gene";
H5T_STD_U32LE "MIDcount";
H5T_IEEE_F32LE "E10";
}
DATASPACE SIMPLE { ( 24661 ) / ( 24661 ) }
DATA {
(0): {
"Gm42418",
5861037,
60.1033
},
(1): {
6. cellCut
6.1 Example of Gene Expression Matrix for Cell Bins
$ h5dump -n /path/to/output/051.cellcut/SS200000135TL_D1.cellbin.gef
HDF5 "/path/to/output/051.cellcut/SS200000135TL_D1.cellbin.gef" {
FILE_CONTENTS {
group /
group /cellBin
dataset /cellBin/blockIndex
dataset /cellBin/blockSize
dataset /cellBin/cell
dataset /cellBin/cellBorder
dataset /cellBin/cellExon
dataset /cellBin/cellExp
dataset /cellBin/cellExpExon
dataset /cellBin/cellTypeList
dataset /cellBin/gene
dataset /cellBin/geneExon
dataset /cellBin/geneExp
dataset /cellBin/geneExpExon
}
}
7. saturation
$ cat /path/to/output/07.saturation/sequence_saturation.tsv
sample bar_x bar_y1 bar_y2 bar_umi bin_x bin_y1 bin_y2 bin_umi
0.05 26619302 0.250959 1 19938952 26619302 0.27571 3270 7613
0.1 53238604 0.390241 1 32462699 53238604 0.41122 4268 12394
0.2 106477208 0.543149 1 48644210 106477208 0.558617 5215 18573
0.3 159715808 0.625887 1 59751787 159715808 0.638094 5693 22814
0.4 212954416 0.67839 1 68488171 212954416 0.688522 5995 26150
0.5 266193008 0.714813 1 75914701 266193008 0.723539 6204 28985
0.6 319431616 0.741736 1 82497808 319431616 0.749427 6378 31499
0.7 372670208 0.76249 1 88513055 372670208 0.769402 6517 33795
0.8 425908832 0.779116 1 94076279 425908832 0.78542 6642 35920
0.9 479147392 0.792733 1 99311385 479147392 0.798541 6747 37918
1 532386027 0.804159 1 104262941 532386027 0.809561 6840 39472
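In this table, `bar_y1` can be reproduced as `1 - bar_umi / bar_x`, that is, the fraction of sampled reads that are duplicates of an already seen molecule. This relationship is inferred from the numbers above, not taken from SAW documentation, so the sketch below treats it as an assumption and simply checks it on three of the rows.

```python
COLUMNS = ["sample", "bar_x", "bar_y1", "bar_y2", "bar_umi",
           "bin_x", "bin_y1", "bin_y2", "bin_umi"]

# Three rows copied from the table above (sampling fractions 0.05, 0.5, 1).
ROWS = """\
0.05 26619302 0.250959 1 19938952 26619302 0.27571 3270 7613
0.5 266193008 0.714813 1 75914701 266193008 0.723539 6204 28985
1 532386027 0.804159 1 104262941 532386027 0.809561 6840 39472
"""


def parse_rows(text):
    """Return one {column: float} dict per whitespace-separated row."""
    return [dict(zip(COLUMNS, map(float, line.split())))
            for line in text.strip().splitlines()]


def saturation(reads, unique):
    """Assumed definition: share of reads that repeat an existing molecule."""
    return 1 - unique / reads


rows = parse_rows(ROWS)
diffs = []
for row in rows:
    recomputed = saturation(row["bar_x"], row["bar_umi"])
    diffs.append(abs(recomputed - row["bar_y1"]))
    print(f'{row["sample"]:>4}: reported {row["bar_y1"]:.6f}, '
          f'recomputed {recomputed:.6f}')
```

The curve flattening toward the last rows (0.79 at 90% of reads, 0.80 at 100%) is what indicates that additional sequencing would yield few new molecules.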
8. report
8.1 Example of Statistical Summary Report
$ cat /path/to/output/08.report/SS200000135TL_D1.statistics.json
{
"version": "version_v2",
"1.Filter_and_Map": {
"1.1.Adapter_Filter": [
{
"Sample_id": "E100026571_L01_trim_read_1",
"getCIDPositionMap_uniqCIDTypes": "645784920",
"total_reads": "1002214171",
"mapped_reads": "826344259(82.45%)",
"CID_misOverlap_reads": "143590127(14.33%)",