Pipeline overview
Workflow
Viralgenie takes in a set of reads and performs five major analyses, each of which is explained in more detail in the following sections:
- Preprocessing
- Metagenomic diversity
- Assembly & Polishing
- Variant analysis & iterative refinement
- Consensus evaluation
By default all analyses are run.
Skipping steps
All steps can be skipped and the pipeline can be run with only the desired steps. This can be done with the --skip_preprocessing, --skip_read_classification, --skip_assembly, --skip_polishing, --skip_variant_analysis, --skip_iterative_refinement and --skip_consensus_qc flags.
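As a rough illustration of how the skip flags combine, the sketch below derives the flags to pass when only some steps should run. The flag names are taken from the text above; the step keys, the `SKIP_FLAGS` table and the `skip_args` helper are hypothetical names introduced for this example and are not part of the pipeline.

```python
# Skip flags exactly as listed in the text above, keyed by the step they disable.
# The keys are illustrative labels derived from the flag names (an assumption).
SKIP_FLAGS = {
    "preprocessing": "--skip_preprocessing",
    "read_classification": "--skip_read_classification",
    "assembly": "--skip_assembly",
    "polishing": "--skip_polishing",
    "variant_analysis": "--skip_variant_analysis",
    "iterative_refinement": "--skip_iterative_refinement",
    "consensus_qc": "--skip_consensus_qc",
}


def skip_args(keep):
    """Return the skip flags for every step that is NOT in `keep`."""
    unknown = set(keep) - set(SKIP_FLAGS)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    # Preserve the order in which the flags are documented.
    return [flag for step, flag in SKIP_FLAGS.items() if step not in keep]
```

For instance, `skip_args({"preprocessing", "read_classification"})` yields the five remaining skip flags, which could then be appended to whatever command is used to launch the pipeline.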
Subway map
- Read QC (FastQC)
- Perform optional read pre-processing
- Metagenomic diversity mapping
- De novo assembly (SPAdes, TRINITY, megahit), combine contigs.
- [Optional] Extend the contigs with sspace_basic and filter with prinseq++
- [Optional] Map reads to contigs for coverage estimation (BowTie2, BWAmem2 and BWA)
- Contig reference identification (blastn)
  - Identify top 5 blast hits
  - Merge blast hits and all contigs of a sample
- [Optional] Precluster contigs based on taxonomy
- Cluster contigs (or every taxonomic bin) of samples, options are:
- [Optional] Remove clusters with low read coverage (bin/extract_clusters.py)
- Scaffolding of contigs to centroid (Minimap2, iVar-consensus)
- [Optional] Annotate 0-depth regions with external reference (bin/lowcov_to_reference.py)
- [Optional] Select best reference from --mapping_constrains:
- Mapping filtered reads to supercontig and mapping constraints (BowTie2, BWAmem2 and BWA)
- [Optional] Deduplicate reads (Picard or, if UMIs are used, UMI-tools)
- Variant calling and filtering (BCFTools, iVar)
- Create consensus genome (BCFTools, iVar)
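To make the "identify top 5 blast hits" step in the list above more concrete, here is a minimal sketch that picks the five best-scoring references per contig from a BLAST result table. It assumes the default 12-column tabular output (`-outfmt 6`) and ranks by bitscore; both are assumptions made for illustration, since the text does not state which format or ranking criterion the pipeline uses. The `top_hits` function is a hypothetical helper, not pipeline code.

```python
import csv
from collections import defaultdict

# Column order of BLAST tabular output (-outfmt 6) with its default fields.
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def top_hits(blast_lines, n=5):
    """Return the n highest-bitscore subject IDs for each query sequence.

    `blast_lines` is any iterable of tab-separated lines, e.g. an open file.
    Ranking by bitscore is an assumption made for this sketch.
    """
    best = defaultdict(dict)  # query id -> {subject id: best bitscore seen}
    reader = csv.DictReader(blast_lines, fieldnames=OUTFMT6_COLUMNS, delimiter="\t")
    for row in reader:
        score = float(row["bitscore"])
        subjects = best[row["qseqid"]]
        # A subject can appear in several alignments; keep its best score only.
        if score > subjects.get(row["sseqid"], float("-inf")):
            subjects[row["sseqid"]] = score
    return {
        query: [
            subject
            for subject, _ in sorted(
                subjects.items(), key=lambda item: item[1], reverse=True
            )[:n]
        ]
        for query, subjects in best.items()
    }
```

Keeping one score per subject avoids counting the same reference several times when it aligns to a contig in multiple segments.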