STRAP is a Nextflow pipeline for studying trans-kingdom small RNAs (miRNA, siRNA, piRNA) from smRNA-seq data, based on the ShortStack and miRDeep2 software.
It takes as input a samplesheet describing the FASTQ samples (ID, path, group) and the genomes of the two reference species. The pipeline runs the following steps:
- Quality check and trimming
  - Raw read QC (FastQC)
  - 3’ adapter trimming (ShortStack) (Optional)
  - Read quality and length filter (fastp)
  - Trimmed read QC (FastQC)
- List of known miRNAs
  - Known miRNAs download from miRBase (Optional)
  - Novel miRNAs and known miRNAs discovery (miRDeep2) (Optional)
- Repeat sequence GTF creation for piRNA analysis (Optional)
  - De novo repeat discovery (RepeatModeler)
  - Annotation of the repeats (RepeatMasker)
- Novel miRNAs and known miRNAs quantification
  - Reads alignment on 1st reference (ShortStack)
  - Unmapped reads filtration (SAMtools)
  - Unmapped reads alignment on 2nd reference (ShortStack)
- Differential expression analysis (AskoR) (Optional)
- Overlap analysis and final report
- miRNA QC (miRTrace) (Optional)
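The two-reference quantification step listed above can be pictured as a partition of the reads: anything that maps on the first reference stays there, and only the leftovers are tried on the second. The sketch below is a toy illustration of that ordering, not the pipeline's code. Exact substring matching stands in for the real aligner, and the function names (`maps_to`, `sequential_mapping`) are made up for this example.

```python
from typing import Dict, List


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def maps_to(read: str, genome: str) -> bool:
    """Toy stand-in for an aligner: exact match on either strand."""
    return read in genome or revcomp(read) in genome


def sequential_mapping(reads: List[str], ref_1: str, ref_2: str) -> Dict[str, List[str]]:
    """Assign each read to ref_1 first; only unmapped reads are tried on ref_2."""
    result: Dict[str, List[str]] = {"ref_1": [], "ref_2": [], "unmapped": []}
    for read in reads:
        if maps_to(read, ref_1):
            result["ref_1"].append(read)
        elif maps_to(read, ref_2):
            result["ref_2"].append(read)
        else:
            result["unmapped"].append(read)
    return result


# Made-up mini genomes that embed the two example miRNA sequences.
ref_1 = "TTTTGGAATGTAAAGAAGTATGGAGTTTT"
ref_2 = "CCCCTCGGACCAGGCTTCATTCCCCAAAA"
reads = [
    "TGGAATGTAAAGAAGTATGGAG",  # found in ref_1
    "TCGGACCAGGCTTCATTCCCC",   # found only in ref_2
    "ACGTACGTACGTACGTACGT",    # found in neither
]
partition = sequential_mapping(reads, ref_1, ref_2)
```

One consequence of this ordering is that a read present in both genomes is always attributed to the first reference, so the choice of which genome is passed as reference 1 matters.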
- Complete the samplesheet with sample IDs (which must not start with a number) and FASTQ paths. The group column is used for the differential expression analysis, and the code column (a 3-character code) is used by miRDeep2.
- Run the pipeline with:
    nextflow run strap.nf \
      -profile slurm,[singularity/conda] \
      --input samplesheet \
      --genome_ref_1 path/to/ref \
      --genome_ref_2 path/to/ref \
      --gtf_ref_1 path/to/ref \
      --gtf_ref_2 path/to/ref \
      [option]
  - If you have repeat GTF files, add:
      --gtf_repeat_ref_1 path/to/ref
      --gtf_repeat_ref_2 path/to/ref
      --skip_modeler
  - If you have known miRNA FASTA files, add:
      --mature path/to/file (optional for ShortStack and miRDeep2)
      --mature_other path/to/file (optional for miRDeep2)
      --hairpin path/to/file (optional for miRDeep2)
    - or --species xxx (to download the mature and hairpin sequences from miRBase automatically)
    - or --skip_mirdeep
  - If you have only one group, add:
      --skip_askor
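Before launching, the samplesheet constraints described above (IDs that do not start with a number, a 3-character code for miRDeep2) can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the samplesheet is taken to be a comma-separated file with a header row, and the column names `id`, `path`, `group` and `code` are hypothetical, so adjust them to the actual header.

```python
import csv
import io
from typing import List

# Hypothetical column names; the real samplesheet header may differ.
REQUIRED_COLUMNS = ("id", "path", "group", "code")


def check_samplesheet(text: str, delimiter: str = ",") -> List[str]:
    """Return a list of problems found in the samplesheet (empty if none)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]
    problems: List[str] = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample_id = (row["id"] or "").strip()
        path = (row["path"] or "").strip()
        code = (row["code"] or "").strip()
        if not sample_id:
            problems.append(f"line {line_no}: empty id")
        elif sample_id[0].isdigit():
            problems.append(f"line {line_no}: id '{sample_id}' starts with a number")
        if not path:
            problems.append(f"line {line_no}: empty path")
        if len(code) != 3:
            problems.append(f"line {line_no}: code '{code}' is not 3 characters long")
    return problems


def count_groups(text: str, delimiter: str = ",") -> int:
    """Number of distinct non-empty groups; with only one, use --skip_askor."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return len({(row.get("group") or "").strip() for row in reader} - {""})


sheet = (
    "id,path,group,code\n"
    "S1,reads/S1.fastq.gz,control,aph\n"
    "2B,reads/2B.fastq.gz,treated,aphi\n"
)
problems = check_samplesheet(sheet)
n_groups = count_groups(sheet)
```

Here `problems` reports the second sample twice (its ID starts with a number and its code has four characters), and `n_groups` is 2, so the differential expression step would not need to be skipped.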
Command: nextflow run strap.nf -profile [standard/slurm,singularity/conda] [option]

REQUIRED parameters
    -profile
    --input             Samplesheet
    --genome_ref_1      Genome reference for the first alignment
    --genome_ref_2      Genome reference for the second alignment
    --gtf_ref_1         GTF/GFF with genes to overlap with miRNAs
    --gtf_ref_2         GTF/GFF with genes to overlap with miRNAs
  If the profile is singularity
    --singularity       "-B root/to/mount/"
  If trimming is enabled
    --trimming_key      Sequence of a highly abundant known miRNA
                        For example:
                          aphid: TGGAATGTAAAGAAGTATGGAG
                          plant: TCGGACCAGGCTTCATTCCCC

OPTIONAL parameters
    -resume
  Cluster resources
    --max_memory        ["200.GB"]
    --max_cpus          ["32"]
    --max_time          ["336.h"]
  Overlap
    --gtf_repeat_ref_1  GTF/GFF with repeat sequences to overlap with miRNAs
    --gtf_repeat_ref_2  GTF/GFF with repeat sequences to overlap with miRNAs
  Known miRNAs
    --species           miRBase species code (3 letters) used to filter miRNAs from miRBase when no files are provided [No filter]
    --mature            Mature miRNA file (ShortStack and miRDeep2)
    --mature_other      Mature miRNA file from related species (miRDeep2)
    --hairpin           Hairpin miRNA file for the species (miRDeep2)
  Results directory name
    --resultsDir        ["Results"]
  ShortStack
    --pad               Merge clusters less than x nucleotides away [100]
  Filtering (read length)
    --min               ["18"]
    --max               ["30"]
  AskoR (DE analysis)
    --contrast          Contrast file (created automatically if not provided)
    --logfc             ["0"]
  Skip processes
    --skip_trimming     Skip trimming [False]
    --skip_mirtrace     Skip miRTrace [False]
    --skip_mirdeep      Skip miRDeep2 [False]
    --skip_modeler      Skip repeat GFF creation [False]
    --skip_askor        Skip DE analyses [False]
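To make the effect of `--pad` more concrete, the sketch below merges genomic intervals whose gap is smaller than the pad value. It is an illustration of the idea only, not ShortStack's implementation: the function name `merge_with_pad` is made up, intervals are assumed to lie on a single chromosome, and the strict "less than" boundary follows the wording of the help text above rather than the tool's source.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on a single chromosome


def merge_with_pad(intervals: List[Interval], pad: int = 100) -> List[Interval]:
    """Merge intervals separated by fewer than `pad` nucleotides."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        # Gap between the previous cluster end and this start (negative if overlapping).
        if merged and start - merged[-1][1] < pad:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


clusters = [(100, 150), (200, 260), (500, 520)]
default_pad = merge_with_pad(clusters)          # [(100, 260), (500, 520)]
large_pad = merge_with_pad(clusters, pad=300)   # [(100, 520)]
small_pad = merge_with_pad(clusters, pad=50)    # unchanged: no gap is below 50
```

A larger pad therefore yields fewer, longer clusters, while a smaller one keeps neighbouring loci separate.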
Each of the parameters above can be specified as a command-line option, in a launch file, or in a config file.
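As one way of keeping parameters out of the command line, Nextflow can read pipeline parameters from a JSON or YAML file passed with `-params-file`. The sketch below writes such a file and builds the matching command as an argument list. It assumes that the pipeline reads its options from `params` in the usual Nextflow way; the helper name `write_params_file` and the paths in the usage note are placeholders, and treating this as the "launch file" mentioned above is an assumption.

```python
import json
from pathlib import Path
from typing import Dict, List


def write_params_file(params: Dict[str, object], path: str, profile: str) -> List[str]:
    """Write params as JSON and return the nextflow command as an argv list."""
    # Keys are parameter names without the leading "--".
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return [
        "nextflow", "run", "strap.nf",
        "-profile", profile,
        "-params-file", path,
    ]
```

For instance, calling `write_params_file({"input": "samplesheet", "genome_ref_1": "path/to/ref", "genome_ref_2": "path/to/ref", "gtf_ref_1": "path/to/ref", "gtf_ref_2": "path/to/ref", "skip_askor": True}, "params.json", "slurm,singularity")` writes `params.json` and returns a list that can be handed to `subprocess.run`. Values given on the command line still take precedence over those in the params file.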
AskoR: https://github.com/askomics/askoR
fastp: https://github.com/OpenGene/fastp
miRDeep2: https://github.com/rajewsky-lab/mirdeep2
RepeatModeler: https://github.com/Dfam-consortium/RepeatModeler
RepeatMasker: https://github.com/Dfam-consortium/RepeatMasker
SAMtools: https://github.com/samtools/samtools
ShortStack: https://github.com/MikeAxtell/ShortStack
AskoR: Alves-Carvalho, S.; Gazengel, K.; Bretaudeau, A.; Robin, S.; Daval, S.; Legeai, F. AskoR, A R Package for Easy RNASeq Data Analysis, in Proceedings of the 1st International Electronic Conference on Entomology, 1–15 July 2021, MDPI: Basel, Switzerland, doi:10.3390/IECE-10646
fastp: Shifu Chen. 2023. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2: e107. https://doi.org/10.1002/imt2.107
miRDeep2: Sebastian Mackowiak & Marc Friedländer
RepeatModeler & RepeatMasker: Robert Hubley, Arian Smit - Institute for Systems Biology
SAMtools: Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li. Twelve years of SAMtools and BCFtools. GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008
ShortStack: Johnson NR, Yeoh JM, Coruh C, Axtell MJ. (2016). G3 6:2103-2111. doi:10.1534/g3.116.030452