Pipeline overview

Workflow

Viralgenie takes in a set of reads and performs 5 major analyses, each of which is explained in more detail in the following sections.

By default all analyses are run.

Skipping steps

Any step can be skipped, so the pipeline can be run with only the desired steps. Use one or more of the following flags: --skip_preprocessing, --skip_read_classification, --skip_assembly, --skip_polishing, --skip_variant_analysis, --skip_iterative_refinement, --skip_consensus_qc.
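As a minimal sketch of how these flags combine, the snippet below derives the --skip_* flags for every step you do not want to run. The helper name build_skip_flags and the SKIPPABLE_STEPS tuple are hypothetical and not part of Viralgenie; only the flag names come from the list above.

```python
# Hypothetical helper for illustration; not shipped with Viralgenie.
# The step names are taken from the --skip_* flags listed in this section.
SKIPPABLE_STEPS = (
    "preprocessing",
    "read_classification",
    "assembly",
    "polishing",
    "variant_analysis",
    "iterative_refinement",
    "consensus_qc",
)


def build_skip_flags(steps_to_run):
    """Return the --skip_* flags for every step NOT listed in steps_to_run."""
    unknown = set(steps_to_run) - set(SKIPPABLE_STEPS)
    if unknown:
        raise ValueError(f"Unknown step(s): {sorted(unknown)}")
    # Keep the documented order so the resulting command line is predictable.
    return [f"--skip_{step}" for step in SKIPPABLE_STEPS if step not in steps_to_run]


if __name__ == "__main__":
    # Example: run only pre-processing and read classification.
    print(" ".join(build_skip_flags({"preprocessing", "read_classification"})))
```

The printed flags can be appended to the command you normally use to launch the pipeline. Passing an empty set produces all seven flags; passing every step produces none, which matches the default of running all analyses.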

Subway map

viralgenie-workflow

Read QC (FastQC)
Performs optional read pre-processing
- Adapter trimming (fastp, Trimmomatic)
- Read UMI deduplication (HUMID)
- Low complexity and quality filtering (BBDuk, prinseq++)
- Host-read removal (Bowtie2)
Metagenomic diversity mapping
- Performs taxonomic classification and/or profiling using one or more of:
  - Kraken2
  - Bracken (optional)
  - Kaiju
- Plotting Kraken2 and Kaiju (Krona)
De novo assembly (SPAdes, Trinity, MEGAHIT), then combine the contigs.
[Optional] Extend the contigs with sspace_basic and filter with prinseq++
[Optional] Map reads to contigs for coverage estimation (Bowtie2, BWA-MEM2 and BWA)
Contig reference identification (blastn)
- Identify top 5 blast hits
- Merge blast hit and all contigs of a sample
[Optional] Precluster contigs based on taxonomy
- Identify taxonomy with Kraken2 and/or Kaiju
- Resolve potential inconsistencies in taxonomy, then filter and/or simplify taxa (bin/extract_precluster.py)
Cluster contigs (or every taxonomic bin) of samples, options are:
[Optional] Remove clusters with low read coverage (bin/extract_clusters.py)
Scaffolding of contigs to centroid (Minimap2, iVar-consensus)
[Optional] Annotate 0-depth regions with an external reference (bin/lowcov_to_reference.py)
[Optional] Select best reference from --mapping_constrains:
- Mash sketch
- Mash screen
Map filtered reads to the supercontig and the mapping constraints (Bowtie2, BWA-MEM2 and BWA)
[Optional] Deduplicate reads (Picard, or UMI-tools if UMIs are used)
Variant calling and filtering (BCFtools, iVar)
Create consensus genome (BCFtools, iVar)
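
To make the "Identify top 5 blast hits" step in the list above more concrete, here is a rough sketch of picking the best hits per contig from blastn tabular output. It assumes the default 12-column -outfmt 6 layout and ranks hits by bitscore; both are assumptions made for illustration, and the function name top_hits_per_contig is hypothetical. The pipeline's own selection logic may differ.

```python
import csv
from collections import defaultdict

# Assumption: default blastn -outfmt 6 columns, i.e.
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QUERY_COL, SUBJECT_COL, BITSCORE_COL = 0, 1, 11


def top_hits_per_contig(blast_lines, n=5):
    """Return {contig: [up to n subject IDs]} ranked by best bitscore (illustrative only)."""
    best = defaultdict(dict)  # contig -> {subject: highest bitscore seen}
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject = row[QUERY_COL], row[SUBJECT_COL]
        bitscore = float(row[BITSCORE_COL])
        # A subject can appear several times (multiple HSPs); keep its best score.
        if bitscore > best[query].get(subject, float("-inf")):
            best[query][subject] = bitscore
    # Sort by descending bitscore, breaking ties alphabetically for stable output.
    return {
        query: sorted(scores, key=lambda s: (-scores[s], s))[:n]
        for query, scores in best.items()
    }
```

The function accepts any iterable of tab-separated lines, so an open file handle works as well as a list of strings. The selected references would then be merged with the contigs of the same sample, as described in the list.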