alexisbourdais/STRAP

STRAP

Overview

STRAP is a Nextflow pipeline for studying trans-kingdom small RNAs (miRNA, siRNA, piRNA) from small RNA-seq (smRNAseq) data, built on ShortStack and miRDeep2.


As input, it takes a samplesheet describing the FASTQ samples (ID, path, group) and the genomes of the two reference species.

Pipeline summary

  1. Quality check and trimming
    1. Raw read QC (FastQC)
    2. 3’ adapter trimming (ShortStack) (Optional)
    3. Read quality and length filter (fastp)
    4. Trimmed read QC (FastQC)
  2. List of known miRNAs
    1. Known miRNA download from miRBase (Optional)
    2. Novel and known miRNA discovery (miRDeep2) (Optional)
  3. Repeat sequence GTF creation for piRNA analysis (Optional)
    1. De novo repeat discovery (RepeatModeler)
    2. Annotation of the repeats (RepeatMasker)
  4. Novel and known miRNA quantification
    1. Read alignment on the 1st reference (ShortStack)
    2. Unmapped read filtering (SAMtools)
    3. Unmapped read alignment on the 2nd reference (ShortStack)
  5. Differential expression analysis (AskoR) (Optional)
  6. Overlap analysis and final report
  7. miRNA QC (miRTrace) (Optional)
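
The two-reference logic of step 4 can be pictured with a toy sketch. This is not how the pipeline aligns reads: exact substring matching stands in for the real aligner, and the function name `route_reads` and the example sequences are hypothetical. It only shows the routing idea: a read is first tried against reference 1, and only reads that fail there are tried against reference 2.

```python
from typing import Dict, List


def route_reads(reads: List[str], ref_1: str, ref_2: str) -> Dict[str, List[str]]:
    """Assign each read to the first reference it matches (toy stand-in for step 4)."""
    routed: Dict[str, List[str]] = {"ref_1": [], "ref_2": [], "unmapped": []}
    for read in reads:
        if read in ref_1:
            # Step 4.1: the read aligns on the first reference.
            routed["ref_1"].append(read)
        elif read in ref_2:
            # Steps 4.2 and 4.3: reads unmapped on reference 1 are retried on reference 2.
            routed["ref_2"].append(read)
        else:
            routed["unmapped"].append(read)
    return routed


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    first_genome = "ACGTACGTTTGACCA"
    second_genome = "GGGCCCTTTAAACCC"
    print(route_reads(["ACGTACG", "CCCTTTAAA", "TTTTTTTT"], first_genome, second_genome))
```

Because reference 1 is always tried first, a read that could match both genomes is attributed to reference 1 in this sketch.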

Quick start

  • Complete the samplesheet with sample IDs (which must not start with a number) and file paths. The group column is used for the differential expression analysis, and the code column (a 3-character code) for miRDeep2.
  • Run the pipeline with:

nextflow run strap.nf \
  -profile slurm,[singularity/conda] \
  --input samplesheet \
  --genome_ref_1 path/to/ref \
  --genome_ref_2 path/to/ref \
  --gtf_ref_1 path/to/ref \
  --gtf_ref_2 path/to/ref

  • If you already have repeat GTF files, add:
    --gtf_repeat_ref_1 path/to/ref
    --gtf_repeat_ref_2 path/to/ref
    --skip_modeler

  • If you have known miRNA FASTA files, add:
    --mature path/to/file (optional for ShortStack and miRDeep2)
    --mature_other path/to/file (optional for miRDeep2)
    --hairpin path/to/file (optional for miRDeep2)

    • or --species xxx (to download mature and hairpin sequences from miRBase automatically)
    • or --skip_mirdeep
  • If you have only one group:
    --skip_askor

[option]
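
The two samplesheet constraints mentioned above (IDs must not start with a number, and the miRDeep2 code is 3 characters long) can be checked before launching a run. The snippet below is a minimal sketch, not part of STRAP: the helper name `check_sample` is hypothetical, and it deliberately does not read the samplesheet itself, since the file layout is not described here.

```python
from typing import List


def check_sample(sample_id: str, code: str) -> List[str]:
    """Return a list of problems for one samplesheet row (an empty list means OK)."""
    problems: List[str] = []
    # Rule 1: the ID must exist and must not start with a number.
    if not sample_id or sample_id[0].isdigit():
        problems.append(f"ID {sample_id!r} is empty or starts with a number")
    # Rule 2: the miRDeep2 code must be exactly 3 characters long.
    if len(code) != 3:
        problems.append(f"code {code!r} is not 3 characters long")
    return problems


if __name__ == "__main__":
    # Hypothetical rows: (ID, code)
    for row in [("sample_A", "abc"), ("1_sample", "abc"), ("sample_B", "abcd")]:
        print(row, check_sample(*row))
```

Returning a list of messages rather than raising on the first error makes it easy to report every problem in a samplesheet at once.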

Parameters

Command : nextflow run strap.nf -profile [standard/slurm,singularity/conda] [option]

REQUIRED parameters

-profile

--input             Samplesheet
--genome_ref_1      Genome reference for the first alignment
--genome_ref_2      Genome reference for the second alignment
--gtf_ref_1         GTF/GFF with genes to overlap with miRNAs
--gtf_ref_2         GTF/GFF with genes to overlap with miRNAs

if profile singularity
--singularity       "-B root/to/mount/"

if trimming TRUE
--trimming_key      Sequence of a highly abundant known miRNA
                    For example:
                    aphid: TGGAATGTAAAGAAGTATGGAG
                    plant: TCGGACCAGGCTTCATTCCCC

OPTIONAL parameters

-resume

Cluster resources
--max_memory        ["200.GB"]
--max_cpus          ["32"]
--max_time          ["336.h"]  

Overlap
--gtf_repeat_ref_1  GTF/GFF with repeat sequences to overlap with miRNAs
--gtf_repeat_ref_2  GTF/GFF with repeat sequences to overlap with miRNAs

Known_MiRNA
--species           miRBase species code (3 letters) used to filter miRNAs from miRBase if files are not provided [No filter]
--mature            Mature miRNA file (shortstack and mirdeep)
--mature_other      Mature miRNA file from related species (mirdeep)
--hairpin           Hairpin species miRNA file (mirdeep)

Results directory name
--resultsDir        ["Results"]

Shortstack
--pad               Merge clusters less than x nucleotides away [100]

Filtering (reads length)
--min               ["18"]
--max               ["30"]

AskoR (DE analysis)
--contrast          Contrast file (created automatically if not provided)
--logfc             ["0"]

Skip process
--skip_trimming     Skip trimming    [False]
--skip_mirtrace     Skip MirTrace    [False]
--skip_mirdeep      Skip MirDeep2    [False]
--skip_modeler      Skip repeat gff  [False]
--skip_askor        Skip DE analyses [False]
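
To make the `--pad` description concrete, here is a minimal sketch of merging clusters that lie less than `pad` nucleotides apart. It is an illustration of the idea only, not ShortStack's implementation: the function name `merge_clusters`, the inclusive coordinates, and measuring the gap as `start - previous_end` with a strict `<` comparison are all assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on a single chromosome; assumed inclusive


def merge_clusters(clusters: List[Interval], pad: int = 100) -> List[Interval]:
    """Merge clusters whose gap to the previous cluster is smaller than `pad`."""
    merged: List[Interval] = []
    for start, end in sorted(clusters):
        if merged and start - merged[-1][1] < pad:
            # Close enough to the previous cluster (or overlapping): extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    print(merge_clusters([(100, 150), (200, 260), (500, 540)], pad=100))
```

In this sketch, a larger `pad` yields fewer, longer clusters, and a smaller `pad` keeps neighbouring loci separate.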

Each of the parameters above can be specified as a command-line option, in a launch file, or in a config file.
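
For the command-line route, a small helper can assemble the `nextflow run strap.nf` call from a dictionary of the parameters listed above. This is a hedged sketch rather than part of STRAP: `build_command` is a hypothetical name, and it assumes that boolean switches such as `--skip_askor` are passed bare, as in the Quick start.

```python
from typing import Dict, List, Union

ParamValue = Union[str, int, bool, None]


def build_command(profile: str, params: Dict[str, ParamValue], resume: bool = False) -> List[str]:
    """Return the argument list for `nextflow run strap.nf` with the given options."""
    cmd = ["nextflow", "run", "strap.nf", "-profile", profile]
    if resume:
        cmd.append("-resume")
    for name, value in params.items():
        if value is True:
            # Boolean switch, passed without a value (e.g. --skip_askor).
            cmd.append(f"--{name}")
        elif value is False or value is None:
            # Disabled or unset options are left out entirely.
            continue
        else:
            cmd.extend([f"--{name}", str(value)])
    return cmd


if __name__ == "__main__":
    command = build_command(
        "slurm,singularity",
        {
            "input": "samplesheet",
            "genome_ref_1": "path/to/ref",
            "genome_ref_2": "path/to/ref",
            "gtf_ref_1": "path/to/ref",
            "gtf_ref_2": "path/to/ref",
            "skip_askor": True,
        },
    )
    print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)`; building a list instead of a single string avoids shell quoting problems with paths.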

Documentation

AskoR : https://github.com/askomics/askoR

Fastp : https://github.com/OpenGene/fastp

Mirdeep2 : https://github.com/rajewsky-lab/mirdeep2

Repeat Modeler : https://github.com/Dfam-consortium/RepeatModeler

Repeat Masker : https://github.com/Dfam-consortium/RepeatMasker

Samtools : https://github.com/samtools/samtools

Shortstack : https://github.com/MikeAxtell/ShortStack

Citations

AskoR : Alves-Carvalho, S.; Gazengel, K.; Bretaudeau, A.; Robin, S.; Daval, S.; Legeai, F. AskoR, A R Package for Easy RNASeq Data Analysis, in Proceedings of the 1st International Electronic Conference on Entomology, 1–15 July 2021, MDPI: Basel, Switzerland, doi:10.3390/IECE-10646

Fastp : Shifu Chen. 2023. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2: e107. https://doi.org/10.1002/imt2.107

Mirdeep2 : Sebastian Mackowiak & Marc Friedländer

RepeatModeler & Repeat Masker : Robert Hubley, Arian Smit - Institute for Systems Biology

Samtools : Twelve years of SAMtools and BCFtools Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008

Shortstack : Johnson NR, Yeoh JM, Coruh C, Axtell MJ. (2016). Improved Placement of Multi-mapping Small RNAs. G3 6:2103-2111. doi:10.1534/g3.116.030452

About

Small Trans-kingdom RNA Analysis Pipeline
