
Pipeline overview

Workflow

Viralgenie takes in a set of reads and performs 5 major analyses, each of which is explained in more detail in the following sections:

  1. Preprocessing
  2. Metagenomic diversity
  3. Assembly & Polishing
  4. Variant analysis & iterative refinement
  5. Consensus evaluation

By default, all analyses are run.

Skipping steps

Any step can be skipped, so the pipeline can be run with only the desired steps. Use the --skip_preprocessing, --skip_read_classification, --skip_assembly, --skip_polishing, --skip_variant_analysis, --skip_iterative_refinement and --skip_consensus_qc flags to do this.
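As a rough illustration, the skip flags can be assembled into a launch command programmatically. This is a minimal Python sketch, not part of Viralgenie: the pipeline name nf-core/viralgenie, the -profile docker option and the --input/--outdir parameters follow general nf-core conventions and are assumptions here, and the build_command helper and its stage labels are hypothetical.

```python
import shlex

# Skip flags as listed above; the dictionary keys are hypothetical stage labels.
SKIP_FLAGS = {
    "preprocessing": "--skip_preprocessing",
    "read_classification": "--skip_read_classification",
    "assembly": "--skip_assembly",
    "polishing": "--skip_polishing",
    "variant_analysis": "--skip_variant_analysis",
    "iterative_refinement": "--skip_iterative_refinement",
    "consensus_qc": "--skip_consensus_qc",
}


def build_command(samplesheet: str, outdir: str, profile: str, skip: set[str]) -> list[str]:
    """Return a `nextflow run` argument list that skips the requested stages.

    Assumption: standard nf-core `--input`/`--outdir` parameters; check the
    parameter documentation of your Viralgenie version before relying on this.
    """
    unknown = set(skip) - set(SKIP_FLAGS)
    if unknown:
        raise ValueError(f"unknown stage(s): {sorted(unknown)}")
    cmd = [
        "nextflow", "run", "nf-core/viralgenie",
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
    ]
    # Append flags in the fixed order of SKIP_FLAGS so the command is reproducible.
    cmd += [flag for stage, flag in SKIP_FLAGS.items() if stage in skip]
    return cmd


if __name__ == "__main__":
    argv = build_command("samplesheet.csv", "results", "docker", {"assembly", "polishing"})
    print(shlex.join(argv))
```

Run as a script, this prints `nextflow run nf-core/viralgenie -profile docker --input samplesheet.csv --outdir results --skip_assembly --skip_polishing`. Rejecting unknown stage names early avoids silently launching a full run because of a typo.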

Subway map

[Figure: viralgenie workflow subway map]

  1. Read QC (FastQC)
  2. Optional read pre-processing
  3. Metagenomic diversity mapping
    • Performs taxonomic classification and/or profiling using one or more of:
    • Plot Kraken2 and Kaiju results (Krona)
  4. De novo assembly (SPAdes, Trinity, MEGAHIT); the resulting contigs are combined
  5. Contig reference identification (blastn)
    • Identify the top 5 BLAST hits
    • Merge the BLAST hits with all contigs of a sample
  6. [Optional] Precluster contigs based on taxonomy
    • Identify taxonomy with Kraken2 and/or Kaiju
    • Resolve potential inconsistencies in taxonomy and filter/simplify taxa (bin/extract_precluster.py)
  7. Cluster the contigs of each sample (or of each taxonomic bin); options are:
  8. Scaffold contigs to the cluster centroid (Minimap2, iVar-consensus)
  9. [Optional] Annotate 0-depth regions with an external reference (bin/lowcov_to_reference.py)
  10. [Optional] Select the best reference from --mapping_constrains:
  11. Map filtered reads to the supercontig and to the mapping constraints (Bowtie2, BWA-MEM2 and BWA)
  12. [Optional] Deduplicate reads (Picard, or UMI-tools if UMIs are used)
  13. Variant calling and filtering (BCFtools, iVar)
  14. Create the consensus genome (BCFtools, iVar)
  15. Repeat steps 11-14 multiple times for the de novo contig route
  16. Consensus evaluation and annotation (QUAST, CheckV, blastn, mmseqs-search)
  17. Result summary visualisation for raw reads, alignment, assembly, variant calling and consensus calling results (MultiQC)
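Step 5 keeps the top 5 BLAST hits for each contig. The sketch below shows one way such a selection could work on BLAST+ tabular output (-outfmt 6, which by default has 12 columns). It is a hypothetical helper, not the pipeline's own code; ranking by bitscore with e-value as a tie-breaker, and counting each subject sequence only once, are assumptions.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(handle, n=5):
    """Return {contig: [subject ids]} with the n best distinct subjects per contig.

    Assumption: hits are ranked by bitscore (higher first), ties broken by
    e-value (lower first); Viralgenie's own ranking may differ.
    """
    by_query = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and '#' comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(OUTFMT6, row))
        by_query[rec["qseqid"]].append(
            (-float(rec["bitscore"]), float(rec["evalue"]), rec["sseqid"]))
    result = {}
    for query, hits in by_query.items():
        chosen = []
        for _, _, subject in sorted(hits):
            # One subject can yield several HSPs; keep it only once.
            if subject not in chosen:
                chosen.append(subject)
            if len(chosen) == n:
                break
        result[query] = chosen
    return result


example = io.StringIO(
    "contig_1\trefA\t99.1\t900\t8\t0\t1\t900\t1\t900\t0.0\t1650\n"
    "contig_1\trefB\t95.0\t880\t44\t0\t1\t880\t1\t880\t1e-300\t1400\n"
    "contig_1\trefA\t90.0\t100\t10\t0\t901\t1000\t901\t1000\t1e-20\t120\n"
    "contig_2\trefC\t98.0\t500\t10\t0\t1\t500\t1\t500\t1e-250\t900\n"
)
print(top_hits(example))  # {'contig_1': ['refA', 'refB'], 'contig_2': ['refC']}
```

The second refA line is a weaker HSP of a subject already selected, so it does not take up one of the five slots.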
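Step 9 fills regions without any read support using an external reference. The following is a minimal sketch of that idea, assuming the consensus, a per-position depth list and the reference are already in the same coordinates (equal length, no indels between them). It does not reproduce bin/lowcov_to_reference.py; the fill_zero_depth name and the choice to write filled bases in lowercase are hypothetical.

```python
def fill_zero_depth(consensus: str, depth: list[int], reference: str) -> str:
    """Replace zero-depth positions of a consensus with reference bases.

    Hypothetical helper: assumes all three inputs share one coordinate system.
    Filled bases are lowercased so they remain distinguishable from
    read-supported calls.
    """
    if not (len(consensus) == len(depth) == len(reference)):
        raise ValueError("consensus, depth and reference must have equal length")
    return "".join(
        ref.lower() if d == 0 else base
        for base, d, ref in zip(consensus, depth, reference)
    )


print(fill_zero_depth("ACNNT", [5, 3, 0, 0, 7], "ACGGA"))  # prints ACggT
```

Marking reference-derived bases (here by case) keeps them traceable downstream, since they reflect the reference rather than the sample.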