Installation

This guide explains how to set up your own copy of HAVoC.

HAVoC

The tool can be downloaded directly from the repository, either through its web page or by running the following command in a terminal:

git clone

Note that the download may take some time depending on your internet speed, as the FASTQ files in the repository are relatively large (200–400 MB).

Bioinformatics software

  1. Trimmomatic or Fastp
  2. BWA-MEM or Bowtie2
  3. Sambamba/Samtools
  4. BEDtools
  5. LoFreq
  6. BCFtools/Samtools
  7. Pangolin

Installing bioinformatics software

Before you start using HAVoC, you will need to install all of the tools listed above on your system.

We recommend installing them with a package manager such as Bioconda or, alternatively, Homebrew (brew).

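All dependencies can be installed with Bioconda using the following command (Bioconda packages also depend on the conda-forge channel, so both channels are listed):

conda install -c conda-forge -c bioconda fastp trimmomatic bowtie2 bwa sambamba samtools bedtools lofreq bcftools pangolin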

Alternatively, follow the installation instructions on each tool's website. These are widely used bioinformatics tools, and most of them are already installed on many university servers.
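Whichever installation route you take, it can help to confirm that every tool is reachable before running HAVoC. The following is a minimal shell sketch and not part of HAVoC itself: the function name check_tools is hypothetical, and the executable names are assumed to match those provided by the Bioconda packages above.

```shell
# Hypothetical helper (not part of HAVoC): report which dependencies are on PATH.
# The executable names are assumed to match those installed by the Bioconda packages.
check_tools() {
  n_missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      n_missing=$((n_missing + 1))
    fi
  done
  # Exit status is the number of missing tools, so 0 means everything was found.
  return "$n_missing"
}

check_tools fastp trimmomatic bowtie2 bwa sambamba samtools bedtools lofreq bcftools pangolin ||
  echo "$? tool(s) missing; install them before running HAVoC"
```

Using the exit status as a count keeps the helper usable in scripts: a zero status means nothing is missing.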

Trimmomatic

Trimmomatic performs a variety of useful preprocessing tasks for Illumina paired-end and single-end data. See the Trimmomatic documentation for further information. The tool can be downloaded and unpacked with:

wget

unzip Trimmomatic-0.39.zip
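Once unpacked, Trimmomatic runs as a Java program. The sketch below wraps a paired-end run in a shell function. The function name trim_pe, the output file names, and the trimming steps (adapted from the example in the Trimmomatic manual) are assumptions for illustration, not HAVoC's own settings. It also assumes Java is installed and that the archive unpacks to Trimmomatic-0.39/ in the current directory.

```shell
# Hypothetical wrapper (not HAVoC's own settings): paired-end trimming with the
# Trimmomatic 0.39 jar unpacked above. Trimming steps follow the manual's example.
TRIMMOMATIC_DIR=Trimmomatic-0.39

trim_pe() {
  if [ "$#" -ne 3 ]; then
    echo "usage: trim_pe <reads_R1.fastq.gz> <reads_R2.fastq.gz> <output_prefix>" >&2
    return 2
  fi
  prefix=$3
  # PE mode takes two inputs and four outputs (paired and unpaired reads per mate).
  java -jar "$TRIMMOMATIC_DIR/trimmomatic-0.39.jar" PE -phred33 \
    "$1" "$2" \
    "${prefix}_R1_paired.fastq.gz" "${prefix}_R1_unpaired.fastq.gz" \
    "${prefix}_R2_paired.fastq.gz" "${prefix}_R2_unpaired.fastq.gz" \
    "ILLUMINACLIP:$TRIMMOMATIC_DIR/adapters/TruSeq3-PE.fa:2:30:10" \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
}
```

For example, trim_pe sample_R1.fastq.gz sample_R2.fastq.gz sample would write sample_R1_paired.fastq.gz and three further output files. The adapter file should match your library preparation; TruSeq3-PE.fa is only an example.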

Fastp

Fastp is a fast, all-in-one read preprocessing tool similar to Trimmomatic. It includes automated adapter detection and polyG tail trimming. For further information, refer to the fastp documentation. The tool can be downloaded via:

wget
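A paired-end fastp run can be sketched as follows. The function name fastp_pe and the output file names are hypothetical; the sketch assumes the fastp binary is on PATH (a downloaded binary may first need chmod a+x ./fastp) and does not reproduce HAVoC's own options.

```shell
# Hypothetical wrapper: paired-end preprocessing with fastp.
# Assumes fastp is on PATH; output names are illustrative only.
fastp_pe() {
  if [ "$#" -ne 3 ]; then
    echo "usage: fastp_pe <reads_R1.fastq.gz> <reads_R2.fastq.gz> <output_prefix>" >&2
    return 2
  fi
  prefix=$3
  # --detect_adapter_for_pe enables sequence-based adapter detection for paired-end data;
  # polyG tail trimming is switched on automatically for NextSeq/NovaSeq reads.
  fastp -i "$1" -I "$2" \
    -o "${prefix}_R1.trimmed.fastq.gz" -O "${prefix}_R2.trimmed.fastq.gz" \
    --detect_adapter_for_pe \
    -h "${prefix}_fastp.html" -j "${prefix}_fastp.json"
}
```

Besides the trimmed reads, fastp writes an HTML report and a JSON report, which are convenient for checking read quality before alignment.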

BWA-MEM

BWA is a fast and accurate aligner designed to align reads and other short DNA sequences against large reference genomes. See the BWA documentation for installation and use.
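A typical BWA-MEM workflow indexes the reference once, aligns the reads, and sorts the output with Samtools. The sketch below is illustrative only: the function name align_bwa, the thread count, and the file names are assumptions rather than HAVoC's settings.

```shell
# Hypothetical wrapper: align paired-end reads with BWA-MEM and
# produce a coordinate-sorted, indexed BAM.
align_bwa() {
  if [ "$#" -ne 4 ]; then
    echo "usage: align_bwa <reference.fasta> <reads_R1.fastq.gz> <reads_R2.fastq.gz> <output.bam>" >&2
    return 2
  fi
  ref=$1
  # Build the BWA index once per reference (creates ref.amb, .ann, .bwt, .pac, .sa).
  [ -e "$ref.bwt" ] || bwa index "$ref" || return 1
  # bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-").
  bwa mem -t 4 "$ref" "$2" "$3" | samtools sort -o "$4" - || return 1
  samtools index "$4"
}
```

Piping directly into samtools sort avoids writing an intermediate SAM file to disk.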

Bowtie2

Bowtie2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences or genomes.
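Bowtie2 can be used in the same way, with its own index format. In this sketch the function name align_bowtie2, the index prefix (the reference file name without its extension), and the thread count are assumptions for illustration.

```shell
# Hypothetical wrapper: align paired-end reads with Bowtie2 and
# produce a coordinate-sorted, indexed BAM.
align_bowtie2() {
  if [ "$#" -ne 4 ]; then
    echo "usage: align_bowtie2 <reference.fasta> <reads_R1.fastq.gz> <reads_R2.fastq.gz> <output.bam>" >&2
    return 2
  fi
  ref=$1
  index_prefix=${ref%.*}
  # bowtie2-build writes index files named <prefix>.1.bt2, <prefix>.2.bt2, and so on.
  [ -e "$index_prefix.1.bt2" ] || bowtie2-build "$ref" "$index_prefix" || return 1
  # Without -S, bowtie2 writes SAM to stdout, which is sorted straight into BAM.
  bowtie2 -p 4 -x "$index_prefix" -1 "$2" -2 "$3" | samtools sort -o "$4" - || return 1
  samtools index "$4"
}
```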

Samtools

Samtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. See the Samtools documentation for installation and use.
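As an illustration of the per-position side of Samtools, the hypothetical helper below prints mapping statistics for a sorted BAM and writes its read depth at every reference position. The function name and output file name are assumptions, not part of HAVoC.

```shell
# Hypothetical helper: summarise a coordinate-sorted BAM with Samtools.
alignment_summary() {
  if [ "$#" -ne 1 ]; then
    echo "usage: alignment_summary <sorted.bam>" >&2
    return 2
  fi
  # Mapping statistics: total, mapped and properly paired reads, among others.
  samtools flagstat "$1" || return 1
  # Per-position depth (reference name, position, depth); -a includes zero-depth positions.
  samtools depth -a "$1" > "${1%.bam}.depth.tsv"
}
```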

BEDtools

BEDtools is a collection of tools for a wide range of genomics analysis tasks. One useful function is masking low-coverage regions in a sequence.
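Masking low-coverage regions can be sketched in two BEDtools steps: compute coverage intervals from the alignment, then replace the low-coverage intervals with N. This is a hypothetical example, not HAVoC's exact procedure. The depth threshold MIN_DEPTH=10 and the function name are assumptions, and it assumes the consensus FASTA is in reference coordinates with sequence names matching the BAM header.

```shell
# Hypothetical sketch: mask low-coverage positions of a consensus sequence with N.
# MIN_DEPTH is an assumed threshold; the FASTA names must match the BAM reference names.
MIN_DEPTH=10

mask_low_coverage() {
  if [ "$#" -ne 3 ]; then
    echo "usage: mask_low_coverage <sorted.bam> <consensus.fasta> <masked.fasta>" >&2
    return 2
  fi
  bed="${3%.*}.lowcov.bed"
  # -bga reports coverage as BedGraph intervals, including zero-coverage ones;
  # awk keeps intervals whose depth (column 4) is below the threshold.
  bedtools genomecov -ibam "$1" -bga |
    awk -v min="$MIN_DEPTH" '$4 < min' > "$bed" || return 1
  # maskfasta replaces every interval in the BED file with N (its default mask character).
  bedtools maskfasta -fi "$2" -bed "$bed" -fo "$3"
}
```

If the consensus contains insertions or deletions relative to the reference, its coordinates no longer match the BAM, so the mask would need to be applied before or during consensus generation instead.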

LoFreq

LoFreq is a sensitive and robust tool for calling single-nucleotide variants (SNVs) from high-coverage sequencing datasets.
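A basic LoFreq SNV call needs the reference FASTA and a sorted, indexed BAM. In the sketch below the function name call_snvs and the argument order are hypothetical, and no HAVoC-specific filtering is applied.

```shell
# Hypothetical wrapper: call SNVs with LoFreq from a sorted, indexed BAM.
call_snvs() {
  if [ "$#" -ne 3 ]; then
    echo "usage: call_snvs <reference.fasta> <sorted.bam> <output.vcf>" >&2
    return 2
  fi
  # Make sure the reference has a .fai index before calling.
  [ -e "$1.fai" ] || samtools faidx "$1" || return 1
  # -f gives the reference, -o the output VCF; the BAM is the positional argument.
  lofreq call -f "$1" -o "$3" "$2"
}
```

By default this reports SNVs only; calling indels with LoFreq requires adding indel qualities to the BAM first (lofreq indelqual) and passing --call-indels.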

Pangolin

Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. It allows the user to assign the most likely Pango lineage to a SARS-CoV-2 query sequence.
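Lineage assignment takes a FASTA file of one or more consensus sequences. The wrapper below is a hypothetical convenience function; the output file name is an assumption.

```shell
# Hypothetical wrapper: assign Pango lineages to consensus sequences.
assign_lineage() {
  if [ "$#" -ne 2 ]; then
    echo "usage: assign_lineage <consensus.fasta> <lineage_report.csv>" >&2
    return 2
  fi
  # Writes a CSV report with one row per query sequence and its assigned lineage.
  pangolin "$1" --outfile "$2"
}
```

Because lineage definitions change over time, keeping Pangolin and its data up to date matters more than for the other tools listed here.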