Toolbox for generic NGS analyses

Manual for people in a hurry

Input files naming convention

TOGGLe automatically assigns a sample name (also called readgroup) based on the structure of the input file name, taking the text before the first underscore (_) or the first dot (.), whichever comes first.

  • individual1_1.fastq will be understood as individual1
  • mapping1.sam will be understood as mapping1
  • myVcf.complete.vcf will be understood as myVcf
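The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not TOGGLe's own code; the function name sample_name is hypothetical.

```python
import os
import re


def sample_name(filename: str) -> str:
    """Return the text before the first underscore or dot of a file name."""
    # Drop any leading directories so only the file name itself is inspected.
    base = os.path.basename(filename)
    # Split once at the first "_" or "."; the first piece is the sample name.
    return re.split(r"[_.]", base, maxsplit=1)[0]


for name in ("individual1_1.fastq", "mapping1.sam", "myVcf.complete.vcf"):
    print(name, "->", sample_name(name))
```

Files that should belong to the same sample therefore need to share the same text before their first underscore or dot.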

Please use only standard UTF-8 characters in file names: no spaces, pipes, tildes, commas of any kind, or other unusual characters. In addition, if you are working with paired FASTQ files, the two mate files do not need to share the same name: TOGGLe re-assembles the pairs based on the name of the first read in each file.

Creating a pipeline

The SNPdiscoveryPaired.config.txt file is an example of how to customize your pipeline.

Providing an order

The order of the pipeline steps is provided with the key $order:

3=bwa mem
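To make the idea of an ordered pipeline concrete, here is a minimal Python sketch that reads numbered steps from a $order block. It assumes that the key sits on its own line, that each step is written as number=software, and that the block ends at the next line starting with "$"; these layout details, the function name parse_order and the placeholder steps softwareA and softwareB are assumptions made for illustration, not a description of TOGGLe's parser.

```python
def parse_order(config_text: str) -> list[str]:
    """Return the software names of the $order block, sorted by step number."""
    steps = {}
    in_order = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("$"):
            # Assumption: any "$key" line opens a new block.
            in_order = (line == "$order")
            continue
        if in_order and "=" in line:
            number, software = line.split("=", 1)
            steps[int(number)] = software.strip()
    return [steps[n] for n in sorted(steps)]


# softwareA and softwareB are placeholders; only "3=bwa mem" comes from the manual.
example = """$order
1=softwareA
2=softwareB
3=bwa mem
"""
print(parse_order(example))
```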

Sending soft options

Then add the configuration of each software using the key $softName, followed by its options, as shown below:

$First Software

For example:

$bwa mem

Launching an analysis

The current version is based on the script -d|--directory DIR -c|--config FILE -o|--outputdir DIR [-r|--reference FILE] [-k|--keyfile FILE] [-g|--gff FILE] [-nocheck|--nocheckFastq] [-report|--report] [-add|--add] [-rerun|--rerun] [--help|-h]

For instance, a typical command would be: -d ~/toggle/inputFile -c ~/toggle/SNPdiscoveryPaired.config.txt -o ~/toggle/outputFolder -r ~/toggle/reference.fasta -nocheck

All paths (files and folders) can be provided as absolute (/home/mylogin/data/myRef.fasta) or relative (../data/myRef.fasta or ~/data/myRef.fasta).
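As an illustration of how the three path forms relate, the short Python sketch below turns each of them into an absolute path. It only shows standard path handling (the function name resolve is hypothetical) and makes no claim about how TOGGLe resolves paths internally.

```python
import os


def resolve(path: str) -> str:
    """Return an absolute version of an absolute, relative or ~-prefixed path."""
    # expanduser replaces a leading "~" with the home directory;
    # abspath resolves "../" and friends against the current working directory.
    return os.path.abspath(os.path.expanduser(path))


for p in ("/home/mylogin/data/myRef.fasta",
          "../data/myRef.fasta",
          "~/data/myRef.fasta"):
    print(p, "->", resolve(p))
```

Note that a relative path is interpreted from the directory in which the command is launched, so the same command can point to different files when run from another location.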

Required named arguments:
-d / --directory DIR: a folder containing the raw data to be processed (FASTQ, FASTQ.GZ, SAM, BAM, VCF, VCF.GZ, BED, GFF3, GTF, TXT)
-c / --config FILE: generally the software.config.txt file, but it can be any text file structured as described in the pipeline creation section above.
-o / --outputdir DIR: the folder where the results will be written. It must be empty. If it does not exist, TOGGLe will create it.
Optional named arguments:
-r / --reference FILE: the reference FASTA file to be used. (1)
-g / --gff FILE: the GFF/GTF file to be used by some tools.
-k / --keyfile FILE: the keyfile used for the demultiplexing step.
-add / --add: use if you want to add new samples to an analysis that has already been run.
-rerun / --rerun: use if you want to re-run samples that previously encountered errors.
-nocheck / --nocheckFastq: by default, TOGGLe checks that the format of every input file is correct. This option skips that step.
-report / --report: generate a PDF report.
-v / --version: use if you want to know which version of TOGGLe you are using.
-h / --help: show the help message and exit.

(1): If no index exists, it will be created according to the index required by the pipeline. If the index already exists, it will not be re-created unless the pipeline order (see Providing an order, above) explicitly requests it (e.g. to update the index).
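The rule given for -o / --outputdir (the folder must be empty, and is created when it does not exist) can be checked before launching a run. The Python sketch below is one possible pre-flight check; the function name prepare_output_dir is hypothetical, and creating missing parent folders is an assumption rather than documented TOGGLe behaviour.

```python
from pathlib import Path


def prepare_output_dir(path: str) -> Path:
    """Create the output folder if missing; refuse a non-empty existing one."""
    out = Path(path).expanduser()
    if out.exists():
        if not out.is_dir():
            raise NotADirectoryError(f"{out} exists but is not a directory")
        if any(out.iterdir()):
            raise FileExistsError(f"{out} is not empty")
    else:
        # Assumption: missing parent folders are created as well.
        out.mkdir(parents=True)
    return out
```

Calling prepare_output_dir("~/toggle/outputFolder") before a run would either return the ready-to-use folder or raise an error explaining why the folder cannot be used.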

Output results

TOGGLe will generate an output folder containing different files and subfolders, as follows:

[Figure: TOGGLe output folder]

The final results are contained in the finalResults folder. TOGGLe also copies the software config file (and the reference files, if any) used for the analysis, so that users can retrieve the options that were applied. The output folder contains all sub-analyses, i.e. the individual analyses or intermediate data.


TOGGLe generates two main types of logs: the .o files for normal output and the .e files for errors and warnings (the latter are normally empty). Each level of TOGGLe generates this pair of logs:

  • TOGGLe_ANALYSIS_date.o/.e logs represent the general output for the complete analysis. They are located at the root of the output directory.
  • IndividualName_global_log.o/.e logs represent the local output of each sub-analysis. They are located in their respective subdirectories in the output folder.
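Since the .e logs are normally empty, a quick way to spot problems after a run is to list the error logs that are not. The Python sketch below walks an output directory and returns every non-empty .e file; the function name nonempty_error_logs is hypothetical, and the only assumption about the layout is that the logs carry the .e extension described above.

```python
from pathlib import Path


def nonempty_error_logs(output_dir: str) -> list[Path]:
    """Return all non-empty .e log files found under the output directory."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    # rglob searches the root and every subdirectory.
    return sorted(p for p in root.rglob("*.e")
                  if p.is_file() and p.stat().st_size > 0)
```

For example, looping over nonempty_error_logs("outputFolder") and printing each path gives the list of analyses whose errors or warnings deserve a closer look.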