MultiQC Report

General Statistics

Showing 26/26 rows and 15/24 columns.

Sample Name	Error rate	Non-primary	Reads mapped	% Mapped	% Proper pairs	% MapQ 0 reads	Total seqs	Mean insert	≥ 1X	≥ 5X	≥ 10X	≥ 30X	≥ 50X	Median	Vars	SNP	Indel	Ts/Tv	MNP	Multiallelic	Multiallelic SNP	Change rate	Ts/Tv	M Variants
HCC1395_BL	0.45%	0.0M	2.0M	100.0%	99.9%	13.0%	2.0M	301.7bp	100.0%	100.0%	100.0%	100.0%	99.0%	149X	4	0	0	0.00	0	0	0
HCC1395_BL.deconflicted_germline															354	334	20	3.58	0	0	0
HCC1395_BL.deconflicted_germline_custom.ann_snpEff																						131949	3.461	0.00M
HCC1395_BL_custom.ann_snpEff																						193675569	0.000	0.00M
HCC1395_tumor	0.45%	0.0M	5.3M	100.0%	99.9%	13.1%	5.3M	292.3bp	100.0%	100.0%	100.0%	100.0%	100.0%	334X
HCC1395_tumor_vs_HCC1395_BL.manta.diploid_sv															3	0	0	0.00	0	0	0
HCC1395_tumor_vs_HCC1395_BL.manta.diploid_sv_custom.ann_snpEff																						15569994	0.000	0.00M
HCC1395_tumor_vs_HCC1395_BL.manta.somatic_sv															2	0	0	0.00	0	0	0
HCC1395_tumor_vs_HCC1395_BL.manta.somatic_sv_custom.ann_snpEff																						23354991	0.000	0.00M
HCC1395_tumor_vs_HCC1395_BL.strelka.somatic_indels															42	0	42	0.00	0	0	0
HCC1395_tumor_vs_HCC1395_BL.strelka.somatic_indels_custom.ann_snpEff																						1112142	0.000	0.00M
HCC1395_tumor_vs_HCC1395_BL.strelka.somatic_snvs															300	300	0	1.07	0	0	0
HCC1395_tumor_vs_HCC1395_BL.strelka.somatic_snvs_custom.ann_snpEff																						155699	1.069	0.00M
Sig_18_Blood	0.46%	0.0M	2.5M	100.0%	100.0%	12.4%	2.5M	309.3bp	100.0%	100.0%	100.0%	100.0%	99.0%	195X	4	0	0	0.00	0	0	0
Sig_18_Blood.deconflicted_germline															395	369	26	2.93	0	0	0
Sig_18_Blood.deconflicted_germline_custom.ann_snpEff																						118253	2.865	0.00M
Sig_18_Blood_custom.ann_snpEff																						193675569	0.000	0.00M
Sig_18_tissue	0.42%	0.0M	12.3M	100.0%	99.9%	12.0%	12.3M	239.5bp	100.0%	100.0%	100.0%	100.0%	100.0%	838X
custom_Sig_18_tumor_normal.manta.diploid_sv															6	0	3	0.00	0	0	0
custom_Sig_18_tumor_normal.manta.diploid_sv_custom.ann_snpEff																						7784997	0.000	0.00M
custom_Sig_18_tumor_normal.manta.somatic_sv															0	0	0	0.00	0	0	0
custom_Sig_18_tumor_normal.manta.somatic_sv_custom.ann_snpEff																						0	0.000	0.00M
custom_Sig_18_tumor_normal.strelka.somatic_indels															88	0	88	0.00	0	0	0
custom_Sig_18_tumor_normal.strelka.somatic_indels_custom.ann_snpEff																						530795	0.000	0.00M
custom_Sig_18_tumor_normal.strelka.somatic_snvs															237	237	0	1.24	0	0	0
custom_Sig_18_tumor_normal.strelka.somatic_snvs_custom.ann_snpEff																						197088	1.236	0.00M

Table ID: general_stats_table_table

Group	Column	Description	ID	Scale
Samtools Flagstat: stats	Error rate	Error rate: mismatches (NM) / bases mapped (CIGAR)	`samtools_flagstat_stats-error_rate`
Samtools Flagstat: stats	Non-primary	Non-primary alignments (millions)	`samtools_flagstat_stats-non_primary_alignments`	read_count
Samtools Flagstat: stats	Reads mapped	Reads mapped in the bam file (millions)	`samtools_flagstat_stats-reads_mapped`	read_count
Samtools Flagstat: stats	% Mapped	% Mapped reads	`samtools_flagstat_stats-reads_mapped_percent`
Samtools Flagstat: stats	% Proper pairs	% Properly paired reads	`samtools_flagstat_stats-reads_properly_paired_percent`
Samtools Flagstat: stats	% MapQ 0 reads	% of reads that are ambiguously placed (MapQ=0)	`samtools_flagstat_stats-reads_MQ0_percent`
Samtools Flagstat: stats	Total seqs	Total sequences in the bam file (millions)	`samtools_flagstat_stats-raw_total_sequences`	read_count
Samtools Flagstat: stats	Mean insert	Average insert size	`samtools_flagstat_stats-insert_size_average`
Mosdepth	≥ 1X	Fraction of genome with at least 1X coverage	`mosdepth-1_x_pc`
Mosdepth	≥ 5X	Fraction of genome with at least 5X coverage	`mosdepth-5_x_pc`
Mosdepth	≥ 10X	Fraction of genome with at least 10X coverage	`mosdepth-10_x_pc`
Mosdepth	≥ 30X	Fraction of genome with at least 30X coverage	`mosdepth-30_x_pc`
Mosdepth	≥ 50X	Fraction of genome with at least 50X coverage	`mosdepth-50_x_pc`
Mosdepth	Median	Median coverage	`mosdepth-median_coverage`
Bcftools: Stats	Vars	Variations total	`bcftools_stats-number_of_records`
Bcftools: Stats	SNP	Variation SNPs	`bcftools_stats-number_of_SNPs`
Bcftools: Stats	Indel	Variation Insertions/Deletions	`bcftools_stats-number_of_indels`
Bcftools: Stats	Ts/Tv	Variant SNP transition / transversion ratio	`bcftools_stats-tstv`
Bcftools: Stats	MNP	Variation multinucleotide polymorphisms	`bcftools_stats-number_of_MNPs`
Bcftools: Stats	Multiallelic	Variation sites with multiple alleles	`bcftools_stats-number_of_multiallelic_sites`
Bcftools: Stats	Multiallelic SNP	Variation sites with multiple SNPs	`bcftools_stats-number_of_multiallelic_SNP_sites`
SNPeff	Change rate	Change rate	`snpeff-Change_rate`
SNPeff	Ts/Tv	Transitions / Transversions ratio	`snpeff-Ts_Tv_ratio`
SNPeff	M Variants	Number of variants before filter (millions)	`snpeff-Number_of_variants_before_filter`
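
The Error rate column is described as mismatches (NM) divided by bases mapped (CIGAR). A minimal sketch of that arithmetic is shown here. The function name is hypothetical, and the mismatch count is an illustrative value (it is not reported anywhere in this document); it was chosen so that, with the 281.2 Mb of mapped bases listed for HCC1395_BL in the Alignment stats table, the result lands on the reported 0.45%.

```python
def error_rate(mismatches: int, bases_mapped_cigar: int) -> float:
    """Return mismatches / bases mapped (CIGAR) as a fraction."""
    if bases_mapped_cigar <= 0:
        raise ValueError("bases_mapped_cigar must be positive")
    return mismatches / bases_mapped_cigar


# 281.2 Mb is the rounded "Mapped bases (CIGAR)" value for HCC1395_BL.
# The mismatch count is a hypothetical input, not a value from the report.
rate = error_rate(mismatches=1_265_400, bases_mapped_cigar=281_200_000)
print(f"{rate:.2%}")
```

Because the report rounds base counts to 0.1 Mb, any value recomputed this way is only approximate.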

altera/sarek Workflow Summary

This information is collected when the pipeline is started. https://github.com/altera/sarek

Input/output options

input: s3://natera-platform-sandbox/pipeline-inputs/test_sarek/end_to_end_regression/samplesheet/workorder.csv
outdir: s3://natera-rnd-pltf-dev-s3-gitlab-results/sarek/build2810611/regression-run/63904030

Main options

hrd_coverage_profile: s3://natera-platform-sandbox/pipeline-resources/AIH/hrd_score_altera_bam/dbsnp_baseline_altera.260120.tsv.gz
intervals_dir: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/intervals/regression_intervals
intervals_vc_dir: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/intervals/regression_padded_intervals
mimsi_microsatellites_list: s3://natera-platform-sandbox/pipeline-resources/mimsi/1500_dropped_panel_with_boosted_msi_regions.tsv
mimsi_model: s3://natera-platform-sandbox/pipeline-resources/mimsi/mi_msi_v0_4_0_200x_attn.model
tools: ngscheckmate,contamination,tnseq,strelka,manta,msisensor2,mimsi,cnvkit,facets,tmb,hrd,whatshap,snpeff,merge,sentieon_haplotyper_rf,chip_detection,snv_indel,germline_cnv,sex_estimation,somacnv
wes: true

Variant Calling

chip_cohort_blacklist: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CHIP/cohort_blacklist.bed.gz
chip_cosmic_heme: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CHIP/cosmic_heme.tsv.gz
chip_encode_blacklist: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CHIP/encode_blacklist.bed.gz
chip_gene_family_blacklist: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CHIP/gene_family_blacklist.bed.gz
chip_gene_tiers: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CHIP/gene_tiers.tsv
chip_pon: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CHIP_PON/pon.hotspot_protected.raw.vcf.gz
chip_pon_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CHIP_PON/pon.hotspot_protected.raw.vcf.gz.tbi
cnvkit_reference: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/cnvkit/cnvkit_wes_altera.reference.cnn
pon: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/PON/pon_tnseq_42_curated_v4.vcf.gz
pon_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/PON/pon_tnseq_42_curated_v4.vcf.gz.tbi
pot: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/POT/aih_tumor_1577_pot_1pct_artifacts_only.vcf.gz
pot_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/POT/aih_tumor_1577_pot_1pct_artifacts_only.vcf.gz.tbi

General reference genome options

igenomes_base: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes/
optitype_reference: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/optitype/original_v3.15_2014/

Reference genome options

blacklist_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CustomBEDs/blacklist_grch38.bed.gz
blacklist_bed_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CustomBEDs/blacklist_grch38.bed.gz.tbi
blacklist_header: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CustomBEDs/blacklist_header.txt
conpair_markers: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/Conpair/GRCh38.autosomes.phase3_shapeit2_mvncall_integrated.20130502.SNV.genotype.sselect_v4_MAF_0.4_LD_0.8.liftover.bed
container_registry_seqera: 292967571998.dkr.ecr.us-west-2.amazonaws.com/community.wave.seqera.io
dbsnp: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/dbsnp_146.hg38.vcf.gz
dbsnp_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/dbsnp_146.hg38.vcf.gz.tbi
dbsnp_vqsr: --resource:dbsnp,known=false,training=true,truth=false,prior=2.0 dbsnp_146.hg38.vcf.gz
dict: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.dict
exome_bed: s3://natera-platform-sandbox/pipeline-inputs/test_sarek/end_to_end_regression/bed/xgen-exome-hyb-panel-v2-targets-hg38_short.mrg_chr21.bed
fasta: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta
fasta_fai: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta.fai
genome_annotations: s3://natera-platform-sandbox/pipeline-resources/ensembl/Homo_sapiens.GRCh38.110.gtf.gz
germline_rescue_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/germline_rescue_targets.bed
germline_resource: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/af-only-gnomad.hg38.vcf.gz
germline_resource_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/af-only-gnomad.hg38.vcf.gz.tbi
gt_correction_model: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/gt_correction_model_v2.joblib
gt_correction_threshold_grch38: 0.5
lowdepth_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CustomBEDs/low_depth_grch38.tsv.gz
lowdepth_bed_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CustomBEDs/low_depth_grch38.tsv.gz.tbi
lowdepth_header: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CustomBEDs/low_depth_header.txt
ngscheckmate_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/NGSCheckMate/SNP_GRCh38_hg38_wChr.bed
probe_bed: s3://natera-platform-sandbox/pipeline-inputs/test_sarek/end_to_end_regression/bed/xgen-exome-hyb-panel-v2_AND_altera_v3_probes_short_hg38_chr21.bed
repeatmasker_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CustomBEDs/repeatmasker_grch38.bed.gz
repeatmasker_bed_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CustomBEDs/repeatmasker_grch38.bed.gz.tbi
repeatmasker_header: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/CustomBEDs/repeatmasker_header.txt
rf_blacklist_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/rf_blacklist.bed.gz
rf_blacklist_bed_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/rf_blacklist.bed.gz.tbi
rf_boosted_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/rf_boosted_exons.bed.gz
rf_boosted_bed_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/rf_boosted_exons.bed.gz.tbi
rf_indel_fp_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/indel_fp_regions.bed.gz
rf_indel_fp_bed_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/indel_fp_regions.bed.gz.tbi
rf_indel_fp_rates_tsv: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/indel_locus_fp_rates.tsv
rf_indel_model: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/indel_rf_model_v7.joblib
rf_low_depth_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/rf_low_depth.bed.gz
rf_low_depth_bed_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/rf_low_depth.bed.gz.tbi
rf_repeatmasker_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/rf_repeatmasker.bed.gz
rf_repeatmasker_bed_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/rf_repeatmasker.bed.gz.tbi
rf_snv_fp_bed: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/snv_fp_regions.bed.gz
rf_snv_fp_bed_tbi: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/snv_fp_regions.bed.gz.tbi
rf_snv_model: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/RF_Models/snv_rf_model_v4.joblib
snpeff_cache: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/annotation-cache/snpeff_cache/
snpeff_db: GRCh38.105
target_beds: s3://natera-platform-sandbox/pipeline-inputs/test_sarek/end_to_end_regression/bed/xgen-exome-hyb-panel-v2-targets-hg38_AND_altera_v3_targets_postQC_hg38_chr21.bed,s3://natera-platform-sandbox/pipeline-inputs/test_sarek/end_to_end_regression/bed/altera_v3_targets_coding_postQC_hg38_chr21.bed
vep_cache: s3://natera-platform-sandbox/pipeline-resources/ngi-igenomes/annotation-cache/vep_cache/
vep_cache_version: 113
vep_genome: GRCh38
vep_species: homo_sapiens

Institutional config options

modules_testdata_base_path: s3://natera-platform-sandbox/pipeline-inputs/test_sarek/

Generic options

task_job_queue: Nextflow-OnDemand

Core Nextflow options

configFiles: N/A
container: [GERMLINE_CNV:292967571998.dkr.ecr.us-west-2.amazonaws.com/sarek/altera_cnv:0.5.0, SOMA_CNV_PROBE_COUNTS|SOMA_CNV_ALLELE_COUNTS|SOMA_CNV_NORMALIZE|SOMA_CNV_BUILD_REFERENCE|SOMA_CNV_CALL|SOMA_CNV_EXPORT_VIEWER|SOMA_CNV_BGZIP_TABIX:292967571998.dkr.ecr.us-west-2.amazonaws.com/soma-cnv:20260604-3cb5ec5]
containerEngine: docker
launchDir: /code
profile: docker,test_regression,eks
projectDir: /tmp/home/.nextflow/assets/rd-platform/bioinformatics/nextflow/sarek
runName: gitlab-sarek-build2810611-regression-run-63904030
userName: nextflow
workDir: /natera-rnd-pltf-dev-nextflow-scratch-01/work

Samtools Flagstat

Toolkit for interacting with BAM/CRAM files. http://www.htslib.org DOI: 10.1093/bioinformatics/btp352

Percent mapped

Alignment metrics from samtools stats; mapped vs. unmapped reads vs. reads mapped with MQ0.

For a set of samples that have come from the same multiplexed library, similar numbers of reads for each sample are expected. Large differences in numbers might indicate issues during the library preparation process. Whilst large differences in read numbers may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below the levels recommended for your application.

Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).

Reads mapped with MQ0 often indicate that the reads are ambiguously mapped to multiple locations in the reference sequence. This can be due to repetitive regions in the genome, the presence of alternative contigs in the reference, or due to reads that are too short to be uniquely mapped. These reads are often filtered out in downstream analyses.
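
The three categories described here (mapped, mapped with MQ0, unmapped) can be expressed as percentages of the total read count. The sketch below assumes the categories are disjoint, with MQ0 reads counted as a subset of mapped reads; the function name and the example counts are hypothetical and are not taken from this report.

```python
def mapping_breakdown(total_reads: int, mapped_reads: int, mq0_reads: int) -> dict:
    """Split reads into mapped (MQ>0), mapped with MQ0, and unmapped percentages."""
    if total_reads <= 0 or not (0 <= mq0_reads <= mapped_reads <= total_reads):
        raise ValueError("expected 0 <= mq0_reads <= mapped_reads <= total_reads, total_reads > 0")
    return {
        "mapped_mq_gt0_pct": 100.0 * (mapped_reads - mq0_reads) / total_reads,
        "mq0_pct": 100.0 * mq0_reads / total_reads,
        "unmapped_pct": 100.0 * (total_reads - mapped_reads) / total_reads,
    }


# Hypothetical counts for illustration only.
breakdown = mapping_breakdown(total_reads=2_000_000, mapped_reads=1_990_000, mq0_reads=260_000)
for category, pct in breakdown.items():
    print(f"{category}: {pct:.1f}%")
```

The three percentages always sum to 100, which makes the function a quick consistency check on counts pulled from a stats file.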

Created with MultiQC

Alignment stats

This module parses the output from samtools stats. All numbers in millions.

Sample Name	Total sequences	Mapped & paired	Properly paired	Duplicated	QC Failed	Reads MQ0	Mapped bases (CIGAR)	Bases Trimmed	Duplicated bases	Diff chromosomes	Other orientation	Inward pairs	Outward pairs
HCC1395_BL	2.0M	2.0M	2.0M	0.4M	0.0M	0.3M	281.2Mb	0.0Mb	57.6Mb	0.0M	0.0M	0.9M	0.1M
HCC1395_tumor	5.3M	5.3M	5.3M	2.1M	0.0M	0.7M	767.8Mb	0.0Mb	299.6Mb	0.0M	0.0M	2.4M	0.2M
Sig_18_Blood	2.5M	2.5M	2.5M	0.4M	0.0M	0.3M	358.5Mb	0.0Mb	54.3Mb	0.0M	0.0M	1.1M	0.1M
Sig_18_tissue	12.3M	12.3M	12.3M	3.9M	0.0M	1.5M	1734.8Mb	0.0Mb	546.4Mb	0.0M	0.0M	5.4M	0.8M

Table ID: samtools-stats-dp_table

Column	Description	ID	Scale
Total sequences	Total sequences	`raw_total_sequences`	read_count
Mapped & paired	Paired-end technology bit set + both mates mapped	`reads_mapped_and_paired`	read_count
Properly paired	Proper-pair bit set	`reads_properly_paired`	read_count
Duplicated	PCR or optical duplicate bit set	`reads_duplicated`	read_count
QC Failed	QC Failed	`reads_QC_failed`	read_count
Reads MQ0	Reads mapped and MQ=0	`reads_MQ0`	read_count
Mapped bases (CIGAR)	Mapped bases (CIGAR)	`bases_mapped__cigar`	base_count
Bases Trimmed	Bases Trimmed	`bases_trimmed`	base_count
Duplicated bases	Duplicated bases	`bases_duplicated`	base_count
Diff chromosomes	Pairs on different chromosomes	`pairs_on_different_chromosomes`	read_count
Other orientation	Pairs with other orientation	`pairs_with_other_orientation`	read_count
Inward pairs	Inward oriented pairs	`inward_oriented_pairs`	read_count
Outward pairs	Outward oriented pairs	`outward_oriented_pairs`	read_count
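
The Alignment stats table reports total and duplicated reads but not their ratio. The sketch below derives an approximate duplication percentage per sample from the values in that table. The dictionary layout and function name are assumptions made for illustration, and because the inputs are rounded to 0.1M the results are approximate.

```python
# Read counts in millions, copied from the Alignment stats table (rounded to 0.1M).
alignment_stats = {
    "HCC1395_BL": {"total": 2.0, "duplicated": 0.4},
    "HCC1395_tumor": {"total": 5.3, "duplicated": 2.1},
    "Sig_18_Blood": {"total": 2.5, "duplicated": 0.4},
    "Sig_18_tissue": {"total": 12.3, "duplicated": 3.9},
}


def duplication_pct(stats: dict) -> dict:
    """Return duplicated reads as a percentage of total reads for each sample."""
    return {
        sample: 100.0 * values["duplicated"] / values["total"]
        for sample, values in stats.items()
    }


pct = duplication_pct(alignment_stats)
for sample, value in pct.items():
    print(f"{sample}: ~{value:.1f}% duplicated")
```

On these rounded inputs the two tumor/tissue samples show a noticeably higher duplicated fraction than their matched blood samples.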

Mosdepth

Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing. https://github.com/brentp/mosdepth DOI: 10.1093/bioinformatics/btx699

Cumulative coverage distribution

Proportion of bases in the reference genome with at least a given depth of coverage. A BED file was provided for all 4 samples, so each sample has both a global (whole-genome) report and a region report; the data shown here is calculated across the BED regions.

For a set of DNA or RNA reads mapped to a reference sequence, such as a genome or transcriptome, the depth of coverage at a given base position is the number of high-quality reads that map to the reference at that position, while the breadth of coverage is the fraction of the reference sequence to which reads have been mapped with at least a given depth of coverage (Sims et al. 2014).

Defining coverage breadth in terms of coverage depth is useful, because sequencing experiments typically require a specific minimum depth of coverage over the region of interest (Sims et al. 2014), so the extent of the reference sequence that is amenable to analysis is constrained to lie within regions that have sufficient depth. With inadequate sequencing breadth, it can be difficult to distinguish the absence of a biological feature (such as a gene) from a lack of data (Green 2007).

For increasing coverage depths (1×, 2×, …, N×), coverage breadth is calculated as the percentage of the reference sequence that is covered by at least that number of reads, and coverage breadth (y-axis) is then plotted against coverage depth (x-axis). This plot shows the relationship between sequencing depth and breadth for each read dataset, which can be used to gauge, for example, the likely effect of a minimum depth filter on the fraction of a genome available for analysis.
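
The breadth-versus-depth calculation described here can be sketched from a list of per-base depths. This is an illustrative implementation, not mosdepth's own code; the function name, the `max_depth` parameter, and the toy depth values are assumptions.

```python
from collections import Counter


def coverage_breadth(depths, max_depth=None):
    """Return {d: percent of positions with depth >= d} for d = 1..max_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    n_positions = len(depths)
    top = max(depths) if max_depth is None else max_depth
    counts = Counter(depths)

    # Positions deeper than the largest threshold still count towards every threshold.
    at_or_above = sum(c for d, c in counts.items() if d > top)
    breadth = {}
    # Walk thresholds from high to low, accumulating positions at or above each depth.
    for d in range(top, 0, -1):
        at_or_above += counts.get(d, 0)
        breadth[d] = 100.0 * at_or_above / n_positions
    return dict(sorted(breadth.items()))


# Toy per-base depths for eight positions (hypothetical values).
example = coverage_breadth([0, 1, 1, 2, 3, 3, 3, 5])
for depth, pct in example.items():
    print(f">= {depth}X: {pct:.1f}%")
```

The result is non-increasing in depth by construction, which is why the plotted curves can only fall from left to right.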

Average coverage per contig

Average coverage per contig or chromosome

XY coverage

Bcftools

Utilities for variant calling and manipulating VCFs and BCFs. https://samtools.github.io/bcftools DOI: 10.1093/gigascience/giab008

Variant Substitution Types

Variant Quality

Indel Distribution

Variant depths

Read depth support distribution for called variants

Vcftools

Program to analyse and report on VCF files. https://vcftools.github.io DOI: 10.1093/bioinformatics/btr330

TsTv by Count

Plot of TSTV-BY-COUNT - the transition to transversion ratio as a function of alternative allele count from the output of vcftools TsTv-by-count.

A transition is a purine-to-purine or pyrimidine-to-pyrimidine point mutation. A transversion is a purine-to-pyrimidine or pyrimidine-to-purine point mutation. Alternative allele count is the number of alternative alleles at the site. Note: only bi-allelic SNPs are used (multi-allelic sites and INDELs are skipped). Refer to the Vcftools manual (https://vcftools.github.io/man_latest.html) on --TsTv-by-count.
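
The purine/pyrimidine rule given here translates directly into code. The sketch below classifies bi-allelic SNPs and computes a Ts/Tv ratio; the function names and the example SNP list are hypothetical, and this is not how vcftools itself is implemented.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_snp(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a bi-allelic SNP: {ref!r} > {alt!r}")
    # Both bases in the same chemical class means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps: list) -> float:
    """Return transitions / transversions for a list of (ref, alt) pairs."""
    ts = sum(1 for ref, alt in snps if classify_snp(ref, alt) == "transition")
    tv = len(snps) - ts
    return ts / tv if tv else float("nan")


# Hypothetical SNPs: two transitions (A>G, C>T) and two transversions (A>C, G>T).
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]))
```

Of the twelve possible single-base substitutions, four are transitions and eight are transversions, so a ratio well above 0.5 reflects mutational bias rather than chance.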

TsTv by Qual

Plot of TSTV-BY-QUAL - the transition to transversion ratio as a function of SNP quality from the output of vcftools TsTv-by-qual.

A transition is a purine-to-purine or pyrimidine-to-pyrimidine point mutation. A transversion is a purine-to-pyrimidine or pyrimidine-to-purine point mutation. Quality here is the Phred-scaled quality score as given in the QUAL column of the VCF. Note: only bi-allelic SNPs are used (multi-allelic sites and INDELs are skipped). Refer to the Vcftools manual (https://vcftools.github.io/man_latest.html) on --TsTv-by-qual.
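
To illustrate what a Ts/Tv-by-quality curve represents, the sketch below recomputes the ratio over the SNPs that pass each of several QUAL thresholds. It is a simplified illustration and does not reproduce the exact columns of the vcftools output; the function names, thresholds, and SNP records are hypothetical, and the inputs are assumed to be valid bi-allelic SNPs.

```python
def is_transition(ref: str, alt: str) -> bool:
    """True for A<>G or C<>T substitutions (same chemical class)."""
    return {ref.upper(), alt.upper()} in ({"A", "G"}, {"C", "T"})


def ts_tv_by_qual(snps, thresholds):
    """Return {threshold: Ts/Tv among SNPs with QUAL >= threshold}.

    snps is an iterable of (ref, alt, qual) tuples. The ratio is None
    when no transversions pass the threshold.
    """
    snps = list(snps)
    result = {}
    for threshold in thresholds:
        kept = [(ref, alt) for ref, alt, qual in snps if qual >= threshold]
        ts = sum(is_transition(ref, alt) for ref, alt in kept)
        tv = len(kept) - ts
        result[threshold] = ts / tv if tv else None
    return result


# Hypothetical (ref, alt, QUAL) records for illustration only.
records = [("A", "G", 50.0), ("C", "T", 40.0), ("A", "C", 10.0), ("G", "T", 60.0), ("G", "A", 5.0)]
print(ts_tv_by_qual(records, thresholds=[0, 20, 55]))
```

A ratio that shifts sharply as the threshold rises can hint that low-quality calls have a different error profile from high-quality ones.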

SNPeff

Annotates and predicts the effects of variants on genes (such as amino acid changes). http://snpeff.sourceforge.net DOI: 10.4161/fly.19695

Variants by Genomic Region

The stacked bar plot shows locations of detected variants in the genome and the number of variants for each location.

The upstream and downstream interval size to detect these genomic regions is 5000bp by default.
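
The role of the 5000bp interval can be illustrated with a simplified, strand-aware classifier for a single gene. This is only a sketch: SnpEff's real logic works on transcripts and also distinguishes exons, introns, UTRs and other regions. The function name, coordinates, and region labels here are assumptions.

```python
def flank_region(pos: int, gene_start: int, gene_end: int, strand: str, flank: int = 5000) -> str:
    """Classify a 1-based position relative to the gene interval [gene_start, gene_end]."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if gene_start <= pos <= gene_end:
        return "gene"
    # Lower coordinates are upstream on the forward strand, downstream on the reverse strand.
    if gene_start - flank <= pos < gene_start:
        return "upstream" if strand == "+" else "downstream"
    if gene_end < pos <= gene_end + flank:
        return "downstream" if strand == "+" else "upstream"
    return "intergenic"


# Hypothetical gene spanning positions 10,000 to 20,000.
for position in (4_999, 9_000, 15_000, 25_000, 25_001):
    print(position, flank_region(position, 10_000, 20_000, "+"))
```

Widening or narrowing `flank` moves variants between the upstream/downstream and intergenic categories, which is why the interval size matters when comparing reports.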

Variant Effects by Impact

The stacked bar plot shows the putative impact of detected variants and the number of variants for each impact.

There are four levels of impacts predicted by SnpEff:

High: High impact (e.g. a stop codon)
Moderate: Moderate impact (e.g. substitution with the same type of amino acid)
Low: Low impact (e.g. a silent mutation)
Modifier: No impact
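
As an illustration of how these four levels can be tallied, the sketch below counts impacts from SnpEff `ANN` annotation strings, where entries are comma-separated and the third pipe-delimited subfield holds the impact. The MultiQC plot itself is built from SnpEff's summary output rather than from this code; the function name and the annotation strings are hypothetical and shortened for readability.

```python
IMPACT_LEVELS = ("HIGH", "MODERATE", "LOW", "MODIFIER")


def count_impacts(ann_values) -> dict:
    """Count annotation entries per impact level across ANN field values."""
    counts = dict.fromkeys(IMPACT_LEVELS, 0)
    for ann in ann_values:
        # One ANN value can hold several comma-separated annotation entries.
        for entry in ann.split(","):
            fields = entry.split("|")
            # Subfields: Allele | Annotation | Annotation_Impact | Gene_Name | ...
            if len(fields) > 2 and fields[2] in counts:
                counts[fields[2]] += 1
    return counts


# Hypothetical, truncated ANN values for illustration only.
ann_examples = [
    "A|missense_variant|MODERATE|GENE1",
    "T|stop_gained|HIGH|GENE2",
    "G|synonymous_variant|LOW|GENE3,G|upstream_gene_variant|MODIFIER|GENE4",
]
print(count_impacts(ann_examples))
```

Counting every annotation entry, as done here, gives larger totals than counting one impact per variant, so the counting unit should be stated when comparing numbers.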

Variants by Effect Types

The stacked bar plot shows the effect of variants at protein level and the number of variants for each effect type.

This plot shows the effect of variants with respect to the mRNA.

Variants by Functional Class

The stacked bar plot shows the effect of variants and the number of variants for each effect type.

This plot shows the effect of variants on the translation of the mRNA as protein. There are three possible cases:

Silent: The amino acid does not change.
Missense: The amino acid is different.
Nonsense: The variant generates a stop codon.
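
The three cases can be reproduced by translating the reference and altered codons with the standard genetic code and comparing the results. The sketch below is a simplification under stated assumptions: it handles only in-frame single-codon substitutions, the function name is hypothetical, and a change from a stop codon to an amino acid (stop lost) is reported as missense because the list above has no separate case for it.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def functional_class(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon substitution as 'silent', 'missense' or 'nonsense'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if alt_aa == ref_aa:
        return "silent"      # the amino acid does not change
    if alt_aa == "*":
        return "nonsense"    # the variant generates a stop codon
    return "missense"        # the amino acid is different


# GAA encodes glutamate: GAG is also glutamate, GTA is valine, TAA is a stop codon.
for alt in ("GAG", "GTA", "TAA"):
    print("GAA ->", alt, functional_class("GAA", alt))
```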

Variant Qualities

The line plot shows the number of variants as a function of the variant quality score.

The quality score corresponds to the QUAL column of the VCF file. This score is set by the variant caller.
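
Since the score comes from the QUAL column (the sixth tab-separated column of a VCF record), a quality distribution can be sketched by binning that column. This is an illustration only, not SnpEff's or MultiQC's implementation; the function name, bin width, and VCF records are hypothetical.

```python
from collections import Counter


def qual_histogram(vcf_lines, bin_width: int = 10) -> dict:
    """Count variants per QUAL bin, keyed by the lower edge of each bin."""
    hist = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        qual = line.rstrip("\n").split("\t")[5]
        if qual == ".":  # missing QUAL
            continue
        bin_start = int(float(qual) // bin_width) * bin_width
        hist[bin_start] += 1
    return dict(sorted(hist.items()))


# Hypothetical VCF records for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr21\t1000\t.\tA\tG\t12.5\tPASS\t.",
    "chr21\t2000\t.\tC\tT\t18\tPASS\t.",
    "chr21\t3000\t.\tG\tT\t47\tPASS\t.",
    "chr21\t4000\t.\tT\tC\t.\tPASS\t.",
]
print(qual_histogram(example_vcf))
```

Because each caller sets QUAL on its own scale, distributions from different callers are not directly comparable.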

VEP

Determines the effect of variants on genes, transcripts and protein sequences, as well as regulatory regions. https://www.ensembl.org/info/docs/tools/vep/index.html DOI: 10.1186/s13059-016-0974-4

General Statistics

Table showing general statistics of VEP annotation run

Showing 12/12 rows and 7/8 columns.

Sample Name	Overlapped regulatory features	Overlapped transcripts	Overlapped genes	Existing variants	Novel variants	Variants processed	Lines of input read
HCC1395_BL.deconflicted_germline_custom.ann_snpEff_VEP.ann	16	201	193	354	0	354	354
HCC1395_BL_custom.ann_snpEff_VEP.ann	1	6	6	3	0	3	4
HCC1395_tumor_vs_HCC1395_BL.manta.diploid_sv_custom.ann_snpEff_VEP.ann	0	4	4	3	0	3	3
HCC1395_tumor_vs_HCC1395_BL.manta.somatic_sv_custom.ann_snpEff_VEP.ann	0	2	2	2	0	2	2
HCC1395_tumor_vs_HCC1395_BL.strelka.somatic_indels_custom.ann_snpEff_VEP.ann	5	51	51	42	0	42	42
HCC1395_tumor_vs_HCC1395_BL.strelka.somatic_snvs_custom.ann_snpEff_VEP.ann	15	169	162	300	0	300	300
Sig_18_Blood.deconflicted_germline_custom.ann_snpEff_VEP.ann	16	199	192	395	0	395	395
Sig_18_Blood_custom.ann_snpEff_VEP.ann	1	6	6	3	0	3	4
custom_Sig_18_tumor_normal.manta.diploid_sv_custom.ann_snpEff_VEP.ann	0	10	10	6	0	6	6
custom_Sig_18_tumor_normal.manta.somatic_sv_custom.ann_snpEff_VEP.ann	0	0	0			0	0
custom_Sig_18_tumor_normal.strelka.somatic_indels_custom.ann_snpEff_VEP.ann	6	81	81	88	0	88	88
custom_Sig_18_tumor_normal.strelka.somatic_snvs_custom.ann_snpEff_VEP.ann	11	154	151	237	0	237	237

Table ID: vep-general-stats_table

Column	Description	ID	Scale
Overlapped regulatory features	Overlapped regulatory features	`vep-Overlapped_regulatory_features`
Overlapped transcripts	Overlapped transcripts	`vep-Overlapped_transcripts`
Overlapped genes	Overlapped genes	`vep-Overlapped_genes`
Existing variants	Existing variants	`vep-Existing_variants`	variants
Novel variants	Novel variants	`vep-Novel_variants`	variants
Variants filtered out	Variants filtered out	`vep-Variants_filtered_out`	variants
Variants processed	Variants processed	`vep-Variants_processed`	variants
Lines of input read	Lines of input read	`vep-Lines_of_input_read`

Variant classes

Classes of variants found in the data.

Consequences

Predicted consequences of variations.

SIFT summary

SIFT variant effect prediction.

PolyPhen summary

PolyPhen variant effect prediction.

Variants by chromosome

Number of variants found on each chromosome.

Position in protein

Relative position of affected amino acids in protein.

Software Versions

Software Versions lists versions of software tools extracted from file contents.

Group	Software	Version
AGGREGATE_VARIANT_QC	aggregate_qc_metrics	`2.0.0`
	python	`3.12.6`
ANNOTATE_BED_INFO	bcftools	`1.22`
BCFTOOLS_ANNOTATE	bcftools	`1.2`
BCFTOOLS_CONCAT	bcftools	`1.2`
BCFTOOLS_MPILEUP	bcftools	`1.2`
BCFTOOLS_QUERY	bcftools	`1.22`
BCFTOOLS_STATS	bcftools	`1.2`
BGZIPTABIX_GERMLINE_CNV	tabix	`1.2`
CNVKIT_BATCH	cnvkit	`0.9.10`
	samtools	`1.17`
CNVKIT_CALL	cnvkit	`0.9.10`
CNVKIT_EXPORT	cnvkit	`0.9.10`
CNVKIT_GENEMETRICS	cnvkit	`0.9.10`
CNV_FACETS	facets	`0.6.2`
CONCAT_WHATSHAP_TNSEQ_VCFS	bcftools	`1.2`
CONCAT_WHATSHAP_VCFS	bcftools	`1.2`
CONPAIR_CONTAMINATION	conpair	`1.0`
	numpy	`1.24.4`
	python	`3.8.20`
CONPAIR_PILEUP	conpair	`1.0`
	gatk	`4.6.2.0`
	python	`3.8.20`
DIFF_DV_SENTIEON	cyvcf2	`0.31.1`
	python	`3.11.14`
DV_RESCUE	deepvariant	`1.6.1`
ENSEMBLVEP_VEP	ensemblvep	`113.0`
EXTRACT_CHROMOSOMES	bcftools	`1.22`
EXTRACT_CHROMOSOMES_TNSEQ	bcftools	`1.22`
GT_CORRECTION	bedtools	`2.31.1`
	cyvcf2	`0.31.1`
	lightgbm	`4.6.0`
	python	`3.11.14`
HARD_FILTER_VCF	python	`3.9.15`
INDEL_ENTROPY_ANNOTATION	numpy	`2.4.0`
	pysam	`0.23.3`
	python	`3.13.11`
	scipy	`1.16.3`
LABEL_HOTSPOTS	python	`3.9.15`
MANTA_SOMATIC	manta	`1.6.0`
MERGE_FORCECALLED	bcftools	`1.22`
MERGE_SENTIEON_HAPLOTYPER_VCFS	gatk4	`4.5.0.0`
MERGE_STRELKA_INDELS	gatk4	`4.5.0.0`
MERGE_STRELKA_SNVS	gatk4	`4.5.0.0`
MIMSI_ANALYZE	mimsi	`0.4.5.1`
MSISENSOR2_MSI	msisensor2	`0.1`
Mosdepth	mosdepth	`0.3.8`
N7_COSMIC_SIGS	n7_cosmic_sigs	`0.0.1`
	r-base	`4.3.3`
PREP_STRELKA_INDEL	bcftools	`1.22`
PREP_TNSEQ_SNV	bcftools	`1.22`
SAMTOOLS_MPILEUP	samtools	`1.21`
SAMTOOLS_STATS	samtools	`1.21`
SCARHRD	scarHRD	`0.6.2`
SENTIEON_FORCECALL	sentieon	`202308.03`
SENTIEON_HAPLOTYPER	sentieon	`202308.03`
SENTIEON_HAPLOTYPER_RF_FILTER	bedtools	`2.31.1`
	cyvcf2	`0.31.1`
	lightgbm	`4.6.0`
	python	`3.11.14`
SENTIEON_TNFILTER	sentieon	`202308.03`
SENTIEON_TNHAPLOTYPER2	sentieon	`202308.03`
SNPEFF_SNPEFF	snpeff	`5.1d`
SOMA_CNV_ALLELE_COUNTS	soma-cnv	`20260604-3cb5ec5`
SOMA_CNV_BGZIP_TABIX	htslib	`1.21`
SOMA_CNV_CALL	soma-cnv	`20260604-3cb5ec5`
SOMA_CNV_NORMALIZE	soma-cnv	`20260604-3cb5ec5`
SOMA_CNV_PROBE_COUNTS	soma-cnv	`20260604-3cb5ec5`
SPLIT_MULTIALLELIC	bcftools	`1.22`
STRELKA_SOMATIC	strelka	`2.9.10`
TABIX_BGZIPTABIX	tabix	`1.2`
TABIX_TABIX	tabix	`1.2`
TABIX_VEP	tabix	`1.2`
TIH_HRD_CALLING	hrd_calling	`0.0.1`
	r-base	`4.3.3`
TMB_SNV_INDEL	python	`3.9.15`
VARIANT_QC_TO_CSV	python	`3.12.6`
	variant_qc_to_csv	`1.0.0`
VCFTOOLS_TSTV_COUNT	vcftools	`0.1.16`
WHATSHAP	whatshap	`2.8`
WHATSHAP_TNSEQ	whatshap	`2.8`
Workflow	Nextflow	`25.10.2`
	altera/sarek	`v3.5.0-g3298dc3`

nf-core/sarek Methods Description

Suggested text and references to use when describing pipeline usage within the methods section of a publication. https://github.com/nf-core/sarek

Methods

Data was processed using nf-core/sarek v3.5.0 of the nf-core collection of workflows (Ewels et al., 2020), utilising reproducible software environments from the Bioconda (Grüning et al., 2018) and Biocontainers (da Veiga Leprevost et al., 2017) projects.

The pipeline was executed with Nextflow v25.10.2 (Di Tommaso et al., 2017) with the following command:

nextflow run 'https://gitlab.natera.com/rd-platform/bioinformatics/nextflow/sarek.git' 'https://gitlab.natera.com/rd-platform/bioinformatics/nextflow/sarek.git' -name gitlab-sarek-build2810611-regression-run-63904030 -r 3298dc303ca456eca35c00bea397722f955ed993 -profile docker,test_regression,eks --outdir 's3://natera-rnd-pltf-dev-s3-gitlab-results/sarek/build2810611/regression-run/63904030' -output-dir 's3://natera-rnd-pltf-dev-s3-gitlab-results/sarek/build2810611/regression-run/63904030'

References

Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. doi: 10.1038/nbt.3820
Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276–278. doi: 10.1038/s41587-020-0439-x
Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: 10.1038/s41592-018-0046-7
da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: 10.1093/bioinformatics/btx192

Notes:

If available, make sure to update the text to include the Zenodo DOI of the pipeline version used.
The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!
You should also cite all software used within this run. Check the "Software Versions" of this report to get version information.
