EM-seq Analysis Pipeline

This repository contains Nextflow-based analysis tools for Enzymatic Methylation Sequencing (EM-seq) and Enzymatic 5hmC-seq (E5hmC-seq) data processing.

Main Analysis Pipeline (main.nf)

Complete EM-seq processing pipeline that accepts uBAM inputs:

  • Adapter trimming (fastp) and read alignment (bwa-meth)
  • Duplicate marking (Picard)
  • Methylation calling (MethylDackel)
  • Quality control metrics and statistics (Picard, Samtools, FastQC, MultiQC)
  • Optional BED file intersection for targeted analysis (bedtools)

FASTQ to uBAM pipeline (fastq_to_ubam.nf)

If your files are in FASTQ format, convert them to uBAMs before running the main pipeline, e.g.:

nextflow run fastq_to_ubam.nf \
  --input_glob "tests/fixtures/fastq/emseq-test*{.ds.1,.ds.2}.fastq.gz" \
  --read_format 'paired-end'

| Parameter | Description | Default |
| --- | --- | --- |
| --input_glob | Glob for your gzipped FASTQ files | ['*.{1,2}.fastq.gz'] |
| --read_format | 'paired-end' or 'single-end' | 'paired-end' |
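
A quick pre-flight check can catch unpaired files before a paired-end conversion. The sketch below is not part of the repository: check_mates is a hypothetical helper, and it assumes the <sample>.1.fastq.gz / <sample>.2.fastq.gz naming implied by the default --input_glob.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report read-1 files
# that lack a read-2 mate, assuming the default *.{1,2}.fastq.gz naming.
check_mates() {
  local dir="$1" r1 r2 missing=0
  for r1 in "$dir"/*.1.fastq.gz; do
    [ -e "$r1" ] || continue            # the glob matched nothing
    r2="${r1%.1.fastq.gz}.2.fastq.gz"   # expected mate file name
    if [ ! -e "$r2" ]; then
      echo "missing mate for $(basename "$r1")"
      missing=1
    fi
  done
  return "$missing"
}
```

Call it as check_mates path/to/fastq_dir; a non-zero exit status means at least one mate is missing.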

Quick Start

  1. Install miniforge and bioconda (see Requirements)
  2. Install Nextflow (e.g. conda install nextflow, or see Nextflow installation guide)
  3. Clone this repository (git clone https://github.com/nebiolabs/EM-seq.git) and adjust nextflow.config for your environment. For example, to run locally, change the executor block to 'local' and limit resources with, e.g., --max_cpus 10 --max_memory 30.GB.
  4. Download or prepare a genome reference FASTA file (see Reference Genomes)
  5. Create a bwameth index for the FASTA file and add it to your references in conf/references.config
  6. Run the pipeline with appropriate parameters (see Basic Usage)
  7. Examine results in the em-seq_output directory:
    • EM-seq-Alignment-Summary-<FLOWCELL_ID>_multiqc_report.html in em-seq_output for an overall QC summary
    • Mbias files in em-seq_output/methylDackelExtracts/mbias (to identify sample-dependent positional biases)
    • Methylation output files in em-seq_output/methylDackelExtracts (suitable for analysis with methylKit)
    • Aligned reads in em-seq_output/markduped_bams (methylation coloring is recommended for visualization in IGV)
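
For a quick sanity check of the methylation output before loading it into methylKit, a per-file summary can be computed with awk. This is a sketch under assumptions: summarize_methylkit is a hypothetical helper, and it assumes each file has a header row followed by methylKit's tab-separated columns (chrBase, chr, base, strand, coverage, freqC, freqT). Inspect one of your own files before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Assumed columns: chrBase chr base strand coverage freqC freqT,
# where freqC is the percentage of methylated calls at the site.
summarize_methylkit() {
  awk -F'\t' '
    NR > 1 { cov += $5; meth += $5 * $6 / 100 }   # skip the header row
    END {
      if (cov > 0)
        printf "sites=%d coverage=%d methylated_pct=%.1f\n", NR - 1, cov, 100 * meth / cov
    }' "$1"
}
```

The reported percentage is weighted by coverage, so deeply covered sites contribute more than shallow ones; nothing is printed for a file with no covered sites.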

Basic Usage

nextflow run main.nf \
  --genome 'test' \
  --ubam_dir './' \
  --email your.email@example.com \
  --flowcell FLOWCELL_ID

--ubam_dir should point to the directory that contains your uBAM files.
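
An empty or mistyped uBAM directory is an easy mistake to make. The sketch below assembles the command shown above only after confirming that the directory contains at least one file ending in .bam. The helper name emseq_command and the .bam suffix are assumptions, not part of the repository; it prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the nextflow
# command for a run after checking that the uBAM directory is not empty.
# Assumes uBAM files use the .bam extension.
emseq_command() {
  local genome="$1" ubam_dir="$2" email="$3" flowcell="${4:-}"
  local found=0 f
  for f in "$ubam_dir"/*.bam; do
    if [ -e "$f" ]; then found=1; break; fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no .bam files found in $ubam_dir" >&2
    return 1
  fi
  local cmd=(nextflow run main.nf --genome "$genome" --ubam_dir "$ubam_dir" --email "$email")
  if [ -n "$flowcell" ]; then
    cmd+=(--flowcell "$flowcell")   # the flowcell identifier is optional
  fi
  printf '%s\n' "${cmd[*]}"
}
```

For example, emseq_command test ./ubams your.email@example.com FLOWCELL_ID prints the full command line for review.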

Key Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| --genome | Reference genome found in conf/references.config | Required |
| --email | Email for notifications | Required |
| --flowcell | Flowcell identifier | Optional |
| --outputDir | Output directory | em-seq_output |
| --enable_neb_agg | Enable NEB aggregation reporting | False |

References Config

Modify the conf/references.config file to specify your genome files:

  • genome_fa: path to your genome FASTA file
  • genome_fai: path to the FASTA index (.fai) file for that genome
  • bwameth_index: path to the genome FASTA file alongside which the bwameth index files exist
  • target_bed: BED file for targeted analysis (optional)
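
Before adding a genome to the config, it can help to confirm that the index files sit next to the FASTA. The sketch below is a hypothetical helper (check_reference is not part of the repository). It assumes the .fai was produced by samtools faidx and the bwameth index by bwameth.py index, which writes a converted reference named <fasta>.bwameth.c2t beside the FASTA; the aligner-specific index files built on top of it are not checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): verify that a genome
# FASTA, its .fai index, and the bwameth converted reference all exist
# and are non-empty.
check_reference() {
  local fa="$1" status=0 f
  for f in "$fa" "$fa.fai" "$fa.bwameth.c2t"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      status=1
    fi
  done
  return "$status"
}
```

If a file is reported missing, samtools faidx genome.fa and bwameth.py index genome.fa are the usual ways to create the .fai and the bwameth index, respectively.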

Advanced Options

  • --tmp_dir - Temporary directory (default: /tmp)
  • --workflow - Workflow identifier (default: EM-seq)
  • --enable_neb_agg - Enable NEB aggregation reporting (default: False)

Reference Genomes

Pre-built reference genomes with methylation spike-in controls:

Requirements

Historical Workflows

The workflows in the "legacy" folder are retained for reference and reproducibility, but they are not actively maintained and are not compatible with the latest Nextflow versions. To reproduce the results in the EM-seq paper, use NXF_VER=22.10.4 nextflow run ...

  • em-seq.nf - Original alignment and methylation calling workflow
  • bins.nf - TSS-centered binned coverage analysis
  • cov_vs_meth.nf - Coverage vs methylation analysis for genomic features

Citation

Analysis methods in this repository were used in the following publication:

Vaisvila R, Ponnaluri VKC, Sun Z, et al. Enzymatic methyl sequencing detects DNA methylation at single-base resolution from picograms of DNA. Genome Res. 2021;31(7):1280-1289. doi:10.1101/gr.266551.120

Related Projects

You may also be interested in the nf-core methylseq project.

Developer documentation

Production:

  • git tag -f current_production
  • git push -f origin current_production

Development:

  • The development workflow runs from the master branch

Testing:

  • Tests are run using nf-test and are integrated into GitHub Actions
  • Install nf-test from bioconda using conda/mamba
  • To run all tests:
nf-test test
  • When new tests are added or results change, to update the results snapshot:
nf-test test --updateSnapshot
