These repositories represent accumulated tools and knowledge that enable labs to get up and running with cloud computing, mostly using cancer genome analysis as the use case. Some of the contents include:
- How to get your lab signed up for Google Cloud access through WUSTL IT
- A lecture and slides on getting started on the cloud, with details specific to WUSTL
- End-to-end genomic workflows for variant calling, RNA-seq analyses, epigenomics, and more
- Guidance for running these pipelines (or others) on Google Cloud, including complete walkthrough examples of using it to run an immunogenomics pipeline (see below)
- Guides to running workflows on the local compute cluster, using GMS (example) or Cromwell for workflow orchestration
- Annotation files pre-loaded on the cloud that are needed for many of these pipelines, as well as detailed instructions for creating your own
- Links to useful resources for analysis, cloud computing, or running workflows on other providers such as Terra
The following GitHub repositories contain tutorials on how to run the immuno workflow end-to-end in different compute environments. The tutorials differ in where the data is stored, whether it is staged somewhere else for compute, which job scheduler is used (e.g. GCP Batch, LSF, Slurm), and whether compute is in the cloud or on-premises.
- Data stored on the local storage1 cluster that needs to be staged to Google storage and then processed using Google Cloud Compute
- Data stored on the local storage1 cluster that will be processed using LSF and compute1
- A simplified version for data that may already be on the cloud and will be processed using Google Cloud Compute
- Data stored in Google storage and processed using Terra
- Data stored on JHU SafeStor that will be processed using Slurm and JHU Discovery
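
To help choose among these tutorials, the list above can be sketched as a small lookup table. This is only an illustrative summary, not code from any of the repositories: the `TutorialEnvironment` class, its field names, and the `find_tutorials` helper are hypothetical, and the scheduler is left as `None` wherever the list does not name one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TutorialEnvironment:
    """One tutorial from the list above (hypothetical structure)."""
    data_location: str          # where the input data starts out
    staged_to: Optional[str]    # where data is copied before compute, if anywhere
    compute: str                # where the workflow runs
    scheduler: Optional[str]    # None where the list does not name a scheduler


# Values are taken only from the tutorial list; nothing else is assumed.
TUTORIALS = [
    TutorialEnvironment("storage1", "Google storage", "Google Cloud Compute", None),
    TutorialEnvironment("storage1", None, "compute1", "LSF"),
    TutorialEnvironment("cloud", None, "Google Cloud Compute", None),
    TutorialEnvironment("Google storage", None, "Terra", None),
    TutorialEnvironment("JHU SafeStor", None, "JHU Discovery", "Slurm"),
]


def find_tutorials(
    data_location: Optional[str] = None,
    scheduler: Optional[str] = None,
) -> List[TutorialEnvironment]:
    """Return tutorials matching every criterion that is not None."""
    matches = []
    for tutorial in TUTORIALS:
        if data_location is not None and tutorial.data_location != data_location:
            continue
        if scheduler is not None and tutorial.scheduler != scheduler:
            continue
        matches.append(tutorial)
    return matches


if __name__ == "__main__":
    # Example: list the options for data that currently lives on storage1.
    for tutorial in find_tutorials(data_location="storage1"):
        print(tutorial)
```

For example, `find_tutorials(data_location="storage1")` returns the two storage1 entries (staged to Google storage, or processed locally with LSF on compute1), which are the two options for data that starts on the local cluster.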