Download the latest TAGADA release

Get it from https://github.com/FAANG/proj-gs-rna-seq/releases

For example

wget https://github.com/FAANG/proj-gs-rna-seq/archive/0.3.1.zip
unzip 0.3.1.zip
rm 0.3.1.zip

In this case the code is unpacked into proj-gs-rna-seq-0.3.1, a directory we will refer to as $codedir.
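
As a minimal sketch, $codedir could be set as follows. It assumes the archive was unzipped in the current directory; the variable name version is only illustrative.

```shell
# Release used in the example above.
version=0.3.1
# Assumption: the archive was unzipped in the current directory.
codedir="$PWD/proj-gs-rna-seq-$version"

# Report whether the directory is actually there.
if [ -d "$codedir" ]; then
  echo "TAGADA code found in $codedir"
else
  echo "Directory $codedir not found; unzip the release here first" >&2
fi
```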

Install the necessary software

Notes:

Check the pipeline works on a small example

For example, on the Genotoul Slurm cluster with Singularity:

srun --pty bash
module load bioinfo/Nextflow-v20.01.0
module load system/singularity-3.5.3
export SINGULARITY_PULLFOLDER=/work/project/fragencode/workspace/geneswitch/code/containers/singularity
$codedir/nextflow-run $codedir/main.nf --output <outdir> --profile test slurm singularity > nextflow.out

A typical command line for TAGADA

./nextflow-run path/to/pipeline \
  --output path/to/output \
  --profile slurm singularity \
  --reads path/to/reads/* \
  --annotation path/to/annotation.gtf \
  --genome path/to/genome.fa \
  --keep-temp \
  --resume

Running TAGADA on the pig FR-AgENCODE data (10 tissues, 4 animals, several runs per sample)

The command line would be

cd /work2/project/fragencode/workspace/sdjebali/courses/tagada_plus4pigs_march16th2021/fragencode
sbatch --mem=8G --cpus-per-task=1 -J tag.pig --mail-user=sarah.djebali@inserm.fr --mail-type=END,FAIL --export=ALL --workdir=$PWD -p workq launch.TAGADA.sh

where the content of launch.TAGADA.sh is the following:

#!/bin/sh

module load bioinfo/Nextflow-v20.01.0
module load system/singularity-3.5.3
export SINGULARITY_PULLFOLDER=/work2/project/fragencode/workspace/geneswitch/code/containers/singularity
export SINGULARITY_CACHEDIR=/work2/project/fragencode/workspace/geneswitch/code/containers/singularity
export SINGULARITY_TMPDIR=/work2/project/fragencode/workspace/geneswitch/code/containers/singularity

basedir=/work2/project/fragencode/workspace/sdjebali/courses/tagada_plus4pigs_march16th2021
datadir=/work2/project/fragencode/data

$basedir/proj-gs-rna-seq-0.3.1/nextflow-run $basedir/proj-gs-rna-seq-0.3.1/main.nf \
--output $basedir/fragencode \
--profile slurm,singularity \
--reads $datadir/metadata/rnaseq/sus_scrofa_read.files_for_TAGADA.txt \
--metadata $datadir/metadata/rnaseq/sus_scrofa_metadata_for_TAGADA.tsv --merge tissue \
--annotation $datadir/species/sus_scrofa/Sscrofa11.1.102/sus_scrofa.gtf \
--genome $datadir/species/sus_scrofa/Sscrofa11.1.102/sus_scrofa.fa \
--keep-temp --resume > $basedir/fragencode/nextflow.out
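
Before submitting a long job like this one, it can help to confirm that every input file exists and is not empty. The function below is a minimal sketch of such a check; check_inputs is a hypothetical helper, not part of TAGADA.

```shell
# Return non-zero if any of the given paths is missing or empty.
# Usage: check_inputs file1 [file2 ...]
check_inputs() {
  status=0
  for f in "$@"; do
    # -s is true only for an existing file with size greater than zero.
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return $status
}
```

It could be called near the top of launch.TAGADA.sh with the paths given to --reads, --metadata, --annotation and --genome, for instance check_inputs "$datadir/species/sus_scrofa/Sscrofa11.1.102/sus_scrofa.gtf" || exit 1.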

We can look at the input files

This job finishes in less than 12 hours on genologin

We can look at the output files

How to troubleshoot errors by looking in temp
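
For each task, Nextflow writes hidden files into the task's working directory, including .command.sh (the script that was run), .command.err (its standard error) and .exitcode (its exit status). The sketch below uses them to find failed tasks. It assumes that --keep-temp preserves these task directories under a temp directory; the exact location of that directory is an assumption to check against your own output, and list_failed_tasks and show_task are hypothetical helper names.

```shell
# List task directories whose recorded exit status is non-zero.
# Usage: list_failed_tasks <directory holding the task directories>
list_failed_tasks() {
  find "$1" -type f -name .exitcode | while read -r f; do
    code=$(cat "$f")
    if [ "$code" != "0" ]; then
      echo "$(dirname "$f") exit=$code"
    fi
  done
}

# Show the script and the end of the standard error for one task directory.
# Usage: show_task <task directory>
show_task() {
  echo "== script =="
  cat "$1/.command.sh"
  echo "== last lines of stderr =="
  tail -n 20 "$1/.command.err"
}
```

A typical session would first run list_failed_tasks on the temp directory, then show_task on one of the directories it reports.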