This repository is a library of common scripts and templates created to avoid reinventing the wheel time and again.
Please cite this repository when you use it :-)
2022/12/06 I found a serious bug in the DE template! I have corrected the DE template: https://github.com/UPSCb/UPSCb-common/tree/master/templates/R, but please check any analysis you have done since 2021/03/05. Sadly, it means that any affected analysis needs redoing! 😞 😞 😞 The issue is that the template passed lfcThreshold when retrieving the DE results (using the DESeq2 results function). Unlike the alpha parameter, which sets the FDR threshold used to return results, lfcThreshold changes the test that is performed. Instead of testing for a difference in expression of 0, it tests for a difference in expression at the selected value (+0.5 by default). Results are likely to be quite drastically different!
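To spell out the difference described above, here is a sketch of the two null hypotheses. The symbols are introduced here for illustration only: β is the log2 fold change of a gene and θ is the value given to lfcThreshold. The second line assumes the DESeq2 default altHypothesis = "greaterAbs"; whether the template changed that default is not stated here.

```math
\begin{aligned}
\text{without lfcThreshold:}\quad & H_0: \beta = 0 && \text{vs.}\quad H_A: \beta \neq 0 \\
\text{with lfcThreshold} = \theta:\quad & H_0: |\beta| \le \theta && \text{vs.}\quad H_A: |\beta| > \theta
\end{aligned}
```

Under the second test, a gene with a real but modest change (for example |β| = 0.3 with θ = 0.5) is not expected to be reported, even though it could be significant under the first test. This is why the resulting gene lists can differ substantially.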
There are three important directories:
|- pipeline
|- src
   |- bash
   |- R
|- templates
   |- R
- pipeline is a directory that contains scripts written (mostly) in bash to be copied into your project and used through a SLURM queueing system.
- src (source) contains a number of subdirectories, sorted by programming language, holding a number of utilities.
- templates contains a number of subdirectories, sorted by programming language, holding a number of templates to be re-used. Most notably it contains examples of data analysis in R (e.g. BiologicalQA.R, DifferentialExpression.R, etc.) and some bash examples to submit jobs to the queue.
These instructions are specific to users at the Umeå Plant Science Centre; you are welcome to get inspired by them, though.
- Follow the instructions there and our discussions in Slack.
We use git and SLURM to ensure reproducible research. Here are gists on how to enable that in your projects:
- Git setup
- SLURM usage
- A video of a tech seminar on both (duration ~1h)
- A description of our server structure
- A video on how to download data from Novogene
Before re-inventing the wheel, check the templates directory! A number of useful templates are available there:
- R/empty.R to initiate an R script with the Rmd header and session info blocks.
- R/BiologicalQA.R to do the initial Exploratory Data Analysis (EDA) of your RNA-Seq data.
- R/DifferentialExpression.R to perform a Differential Expression (DE) analysis of your data (follows the previous one).
- bash/runTemplate.sh to prototype a script to be run on an HPC (High Performance Computing) system using SLURM (a job manager).
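For readers unfamiliar with SLURM, the following is a minimal sketch of what a batch job script can look like. It is not the content of bash/runTemplate.sh; the file name, job name and resource values are hypothetical placeholders, and our cluster may require additional options (such as an account or partition) that are not shown. The snippet writes the job script to the current directory and submits it only if sbatch is available.

```shell
#!/bin/bash
# Sketch only: writes a minimal SLURM batch script and submits it when possible.
set -euo pipefail

# Hypothetical file name; the job name inside the script matches it.
job_script="example-job.sh"

# The quoted 'EOF' keeps variables unexpanded until the job actually runs.
cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=example-job
#SBATCH --output=example-job_%j.out
#SBATCH --error=example-job_%j.err
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:10:00

set -euo pipefail

# Record where and when the job ran, which helps reproducibility.
echo "Host: $(hostname)"
echo "Started: $(date)"

# Replace this line with the actual command(s) of your analysis.
echo "Hello from SLURM job ${SLURM_JOB_ID:-unknown}"
EOF

# Submit only if SLURM is installed; otherwise just report what was written.
if command -v sbatch >/dev/null 2>&1; then
  sbatch "$job_script"
else
  echo "sbatch not found; wrote $job_script without submitting"
fi
```

In the #SBATCH lines, %j is replaced by the job ID, so each run gets its own log files. Keeping the job script under git alongside the analysis code is one way to make the run reproducible.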
The following link will bring you to a server introduction guide with tips about how to connect to our server, how to run SLURM jobs properly, and how to use AspSeq.
It also contains a video about how to make Apptainer containers if you need to install software that we do not currently have.
For UPSC members, ask us to be added to our Slack channel and our mailing list. These are the two channels we use to communicate about server updates and downtime (as well as other technical issues), and also to discuss projects, provide support, etc.
Special thanks to loalon for some contributed code over the years.