Viralgenie

A metagenomic analysis pipeline for eukaryotic viruses, written in Nextflow.

Introduction

Viralgenie is a bioinformatics best-practice analysis pipeline for reconstructing consensus genomes and identifying intra-host variants from metagenomic sequencing data or from enrichment-based sequencing data such as hybrid capture.

Pipeline summary

[Figure: Viralgenie workflow diagram]

Read QC (FastQC)
Performs optional read pre-processing
- Adapter trimming (fastp, Trimmomatic)
- Read UMI deduplication (HUMID)
- Low complexity and quality filtering (BBDuk, prinseq++)
- Host-read removal (Bowtie2)
Metagenomic diversity mapping
- Performs taxonomic classification and/or profiling using one or more of:
  - Kraken2
  - Bracken (optional)
  - Kaiju
- Plotting Kraken2 and Kaiju results (Krona)
De novo assembly (SPAdes, Trinity, MEGAHIT), then combine contigs.
[Optional] Extend the contigs with sspace_basic and filter with prinseq++
[Optional] Map reads to contigs for coverage estimation (Bowtie2, BWA-MEM2 and BWA)
Contig reference identification (blastn)
- Identify the top 5 BLAST hits
- Merge the BLAST hits with all contigs of a sample
[Optional] Precluster contigs based on taxonomy
- Identify taxonomy with Kraken2 and/or Kaiju
- Resolve potential inconsistencies in taxonomy, then filter and simplify taxa (bin/extract_precluster.py)
Cluster the contigs (or every taxonomic bin) of each sample; options are:
- cdhitest
- vsearch
- mmseqs-linclust
- mmseqs-cluster
- vRhyme
- Mash
[Optional] Remove clusters with low read coverage (bin/extract_clusters.py)
Scaffolding of contigs to centroid (Minimap2, iVar-consensus)
[Optional] Annotate 0-depth regions with an external reference (bin/nocov_to_reference.py)
[Optional] Select best reference from --mapping_constraints:
- Mash sketch
- Mash screen
Map filtered reads to the supercontig and the mapping constraints (Bowtie2, BWA-MEM2 and BWA)
[Optional] Deduplicate reads (Picard, or UMI-tools if UMIs are used)
Variant calling and filtering (BCFtools, iVar)
Create consensus genome (BCFtools, iVar)
Repeat steps 12-15 multiple times for the de novo contig route
Consensus evaluation and annotation (QUAST, CheckV, blastn, Prokka, mmseqs-search, MAFFT for alignment of contigs vs. iterations and consensus)
Result summary visualisation for raw read, alignment, assembly, variant calling and consensus calling results (MultiQC)

Usage

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1,fastq_2
sample1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
sample2,AEG588A5_S5_L003_R1_001.fastq.gz,
sample3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz

Each row represents a FASTQ file (single-end) or a pair of FASTQ files (paired-end). For single-end samples, leave the fastq_2 column empty, as in the sample2 row.
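Before launching a run, it can help to sanity-check the samplesheet. The following is only a minimal sketch and not part of Viralgenie: the function name check_samplesheet is hypothetical, and it assumes exactly the three-column layout shown above (the pipeline itself may accept more than this). It reports each sample as single-end or paired-end and returns a non-zero status if the header or any row looks malformed.

```shell
# Hypothetical helper, not shipped with Viralgenie.
# Assumes the header is exactly "sample,fastq_1,fastq_2".
check_samplesheet() {
    awk -F',' '
        NR == 1 {
            # The first line must be the expected header.
            if ($0 != "sample,fastq_1,fastq_2") { print "bad header: " $0; bad = 1 }
            next
        }
        # Every row needs three fields, a sample name and at least fastq_1.
        NF != 3 || $1 == "" || $2 == "" { print "bad row " NR ": " $0; bad = 1; next }
        # An empty fastq_2 field marks a single-end sample.
        { printf "%s\t%s\n", $1, ($3 == "" ? "single-end" : "paired-end") }
        END { exit bad }
    ' "$1"
}
```

Running check_samplesheet samplesheet.csv on the example above would list sample1 and sample3 as paired-end and sample2 as single-end. The check only looks at the CSV structure; it does not verify that the FASTQ files exist.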

Now, you can run the pipeline using:

nextflow run Joon-Klaps/viralgenie \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
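As a minimal sketch of the -params-file route mentioned in the warning, the two parameters from the command above can be kept in a YAML file and passed to Nextflow in one go. The file name params.yaml, the output directory results, and the wrapper function run_viralgenie are placeholders chosen for illustration; only --input and --outdir come from this document.

```shell
# Write the pipeline parameters to a YAML file.
# File name and values are illustrative placeholders.
cat > params.yaml <<'EOF'
input: "samplesheet.csv"
outdir: "results"
EOF

# Hypothetical wrapper: the first argument is the profile to use
# (for example docker or singularity).
run_viralgenie() {
    nextflow run Joon-Klaps/viralgenie \
        -profile "$1" \
        -params-file params.yaml
}
```

Calling run_viralgenie docker would then start the pipeline with the parameters from the file. Keeping parameters in a file makes a run easier to repeat and to share than a long command line.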

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

Viralgenie was originally written by Joon-Klaps.

We thank the following people for their extensive assistance in the development of this pipeline:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

[!WARNING] Viralgenie has not yet been published. Please cite as: Klaps J, Lemey P, Kafetzopoulou L. Viralgenie: A metagenomics analysis pipeline for eukaryotic viruses. GitHub https://github.com/Joon-Klaps/viralgenie

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.