Pipeline overview
Workflow
Viralgenie takes in a set of reads and performs 5 major analyses, each of which is explained in more detail in the following sections:
- Preprocessing
- Metagenomic diversity
- Assembly & Polishing
- Variant analysis & iterative refinement
- Consensus evaluation
By default all analyses are run.
Skipping steps
All steps can be skipped and the pipeline can be run with only the desired steps. This can be done with the --skip_preprocessing, --skip_read_classification, --skip_assembly, --skip_polishing, --skip_variant_analysis, --skip_iterative_refinement, --skip_consensus_qc flags.
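As a rough illustration of how these flags combine, the Python sketch below derives the skip flags from the list of steps you do want to run. Only the flag names come from the text above; the step names used as dictionary keys and the helper `skip_flags_for` are hypothetical and exist only for this example.

```python
# Hypothetical step names mapped to the skip flags listed above.
SKIP_FLAGS = {
    "preprocessing": "--skip_preprocessing",
    "read_classification": "--skip_read_classification",
    "assembly": "--skip_assembly",
    "polishing": "--skip_polishing",
    "variant_analysis": "--skip_variant_analysis",
    "iterative_refinement": "--skip_iterative_refinement",
    "consensus_qc": "--skip_consensus_qc",
}


def skip_flags_for(wanted_steps):
    """Return the skip flags for every step that is NOT in wanted_steps."""
    wanted = set(wanted_steps)
    unknown = wanted - set(SKIP_FLAGS)
    if unknown:
        raise ValueError(f"Unknown step(s): {sorted(unknown)}")
    # Dicts preserve insertion order, so flags come out in the order listed above.
    return [flag for step, flag in SKIP_FLAGS.items() if step not in wanted]


if __name__ == "__main__":
    # Example: only preprocessing and read classification are wanted.
    print(" ".join(skip_flags_for(["preprocessing", "read_classification"])))
```

The printed flags would be appended to whatever command you normally use to launch the pipeline. Passing no wanted steps yields all seven flags, and passing every step yields none, which matches the default of running all analyses.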
Subway map
1. Read QC (FastQC)
2. Performs optional read pre-processing
   - Adapter trimming (fastp, Trimmomatic)
   - Read UMI deduplication (HUMID)
   - Low complexity and quality filtering (bbduk)
   - Host-read removal (BowTie2)
3. Metagenomic diversity mapping
4. De novo assembly (SPAdes, TRINITY, megahit), combine contigs.
5. Contig reference identification (blastn)
   - Identify top 5 blast hits
   - Merge blast hits and all contigs of a sample
6. [Optional] Precluster contigs based on taxonomy
7. Cluster contigs (or every taxonomic bin) of samples, options are:
8. Scaffolding of contigs to centroid (Minimap2, iVar-consensus)
9. [Optional] Annotate 0-depth regions with external reference bin/lowcov_to_reference.py.
10. [Optional] Select best reference from --mapping_constrains:
11. Mapping filtered reads to supercontig and mapping constraints (BowTie2, BWAmem2 and BWA)
12. [Optional] Deduplicate reads (Picard or, if UMIs are used, UMI-tools)
13. Variant calling and filtering (BCFTools, iVar)
14. Create consensus genome (BCFTools, iVar)
15. Repeat steps 11-14 multiple times for the de novo contig route
16. Consensus evaluation and annotation (QUAST, CheckV, blastn, mmseqs-search)
17. Result summary visualisation for raw read, alignment, assembly, variant calling and consensus calling results (MultiQC)
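To make the contig reference identification step above more concrete, here is a minimal Python sketch of keeping the top 5 blast hits per contig. It rests on two assumptions that are not stated in this overview: that blastn results are available in the default tabular layout (`-outfmt 6`), and that hits are ranked by bitscore. The pipeline itself may select hits differently, and the function name `top_hits` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST tabular output (-outfmt 6); assumed here.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def top_hits(blast_tsv, n=5):
    """Return {contig: [subject ids]} with the n best-scoring distinct subjects."""
    best = defaultdict(dict)  # contig -> {subject: best bitscore seen so far}
    reader = csv.DictReader(blast_tsv, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        score = float(row["bitscore"])
        subjects = best[row["qseqid"]]
        # A subject can appear on several lines (multiple HSPs); keep its best score.
        if score > subjects.get(row["sseqid"], float("-inf")):
            subjects[row["sseqid"]] = score
    # Ranking by bitscore is an assumption made for this sketch.
    return {
        contig: sorted(subjects, key=subjects.get, reverse=True)[:n]
        for contig, subjects in best.items()
    }


if __name__ == "__main__":
    # Made-up hits for a hypothetical contig "contig_1".
    example = io.StringIO(
        "contig_1\trefA\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t900\n"
        "contig_1\trefB\t95.0\t500\t25\t0\t1\t500\t1\t500\t0.0\t800\n"
        "contig_1\trefA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150\n"
    )
    print(top_hits(example, n=1))
```

Collapsing repeated subjects before ranking keeps a single reference with many partial alignments from filling all five slots. The selected reference sequences would then be merged with the contigs of the same sample, as described in the list above.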