Installation

This guide provides instructions on how to set up your own HAVoC.

HAVoC

The tool can be downloaded directly from the repository (https://bitbucket.org/auto_cov_pipeline/havoc.git), either by visiting the link or by running the following command in a terminal:

git clone https://bitbucket.org/auto_cov_pipeline/havoc.git

Note that the download may take some time, depending on your internet speed, because the FASTQ files are relatively large (200–400 MB).

Bioinformatics software

  1. Trimmomatic or Fastp
  2. BWA-MEM or Bowtie2
  3. Sambamba/Samtools
  4. BEDtools
  5. LoFreq
  6. BCFtools/Samtools
  7. Pangolin

Installing bioinformatics software

Before you start using HAVoC, you will need to have all of the tools listed above installed on your system.

We recommend installing these via package managers, such as Bioconda (https://bioconda.github.io/) or, alternatively, brew (https://brew.sh/).

All dependencies can be conveniently installed from Bioconda with the following command (this assumes the bioconda and conda-forge channels are already configured):

conda install fastp trimmomatic bowtie2 bwa sambamba samtools bedtools lofreq bcftools pangolin

Alternatively, follow the installation instructions on each tool's website. These are popular, widely used bioinformatics tools, and most of them are already installed on many university servers.
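
After installation, it can help to confirm that the tools are actually reachable from your shell. The following is a minimal sketch and not part of HAVoC itself: the helper name missing_tools is hypothetical, and it assumes that each executable has the same name as its Bioconda package in the command above.

```shell
# Print the name of every listed command that is not found on PATH.
# Returns 0 if all commands were found, 1 otherwise.
missing_tools() {
    _status=0
    for _tool in "$@"; do
        if ! command -v "$_tool" >/dev/null 2>&1; then
            echo "$_tool"
            _status=1
        fi
    done
    return "$_status"
}

# Assumption: executables carry the same names as the Bioconda packages.
if missing_tools fastp trimmomatic bowtie2 bwa sambamba samtools bedtools lofreq bcftools pangolin; then
    echo "All listed tools were found."
else
    echo "The tools printed above were not found on PATH." >&2
fi
```

Because the list contains alternatives (for example Trimmomatic or Fastp, BWA-MEM or Bowtie2), a missing name is not necessarily a problem as long as the other tool of the pair is available.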

Trimmomatic

Trimmomatic performs a variety of useful preprocessing tasks for Illumina paired-end and single-end data. See the Trimmomatic documentation for further information. The tool can be downloaded via:

wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.39…

unzip Trimmomatic-0.39.zip

Fastp

Fastp is a fast all-in-one read preprocessing tool similar to Trimmomatic. It includes automated adapter detection and polyG tail trimming. For further information, refer to the Fastp documentation. The tool can be downloaded via:

wget http://opengene.org/fastp/fastp
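
The downloaded file is a prebuilt binary and is not marked as executable by default. A minimal sketch of the remaining step, assuming the file was saved as ./fastp in the current directory and that your system can run the prebuilt binary (the helper name make_executable is only illustrative):

```shell
# Mark a downloaded file as executable and confirm the bit was set.
make_executable() {
    chmod a+x "$1" && [ -x "$1" ]
}

# Assumes wget saved the binary as ./fastp in the current directory.
if [ -f ./fastp ]; then
    make_executable ./fastp && ./fastp --version
fi
```

If the version is printed, the binary can be run as ./fastp or moved to a directory on your PATH.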

BWA-MEM

BWA is a fast and accurate aligner designed to align reads and other short DNA sequences against large reference genomes. See the documentation of the Burrows-Wheeler Aligner for installation and use.

Bowtie2

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences or genomes.

Samtools

Samtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. See the documentation of Samtools for installation and use.

BEDtools

Bedtools is a collection of tools for a wide range of genomic analysis tasks. One of its useful functions is masking low-coverage regions in a sequence.
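
As an illustration of such masking, the sketch below computes per-interval depth with bedtools genomecov, keeps the intervals below a minimum depth, and masks them with bedtools maskfasta. It is not necessarily how HAVoC performs this step: the file names sample.sorted.bam and consensus.fa, the threshold of 30, and the helper name low_coverage_regions are all placeholders chosen for the example.

```shell
# Read bedGraph lines (chrom, start, end, depth) from stdin and print, as
# BED, only the intervals whose depth is below the given minimum.
low_coverage_regions() {
    awk -v min="$1" '$4 < min { print $1 "\t" $2 "\t" $3 }'
}

# Hypothetical inputs; the commands run only if bedtools and both files exist.
if command -v bedtools >/dev/null 2>&1 && [ -f sample.sorted.bam ] && [ -f consensus.fa ]; then
    # -bga reports depth for every interval, including zero-coverage ones.
    bedtools genomecov -ibam sample.sorted.bam -bga | low_coverage_regions 30 > lowcov.bed
    # By default, bases inside the BED intervals are replaced with N.
    bedtools maskfasta -fi consensus.fa -bed lowcov.bed -fo consensus.masked.fa
fi
```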

LoFreq

LoFreq is a sensitive and robust tool for calling single-nucleotide variants (SNVs) from high-coverage sequencing datasets.

Pangolin

Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. It allows the user to assign the most likely Pango lineage to a SARS-CoV-2 query sequence.