This repository contains Nextflow-based analysis tools for Enzymatic Methylation Sequencing (EM-seq) and Enzymatic 5hmC-seq (E5hmC-seq) data processing.
Complete EM-seq processing pipeline that accepts UBAM inputs:
- Adapter trimming and read alignment (fastp, bwa-meth)
- Duplicate marking (Picard)
- Methylation calling (MethylDackel)
- Quality control metrics and statistics (Picard, Samtools, FastQC, MultiQC)
- Optional BED file intersection for targeted analysis (bedtools)
If your files are in FASTQ format, you will need to convert them to uBAMs before running the main pipeline, e.g.:
nextflow run fastq_to_ubam.nf \
--input_glob "tests/fixtures/fastq/emseq-test*{.ds.1,.ds.2}.fastq.gz" \
--read_format 'paired-end'
| Parameter | Description | Default |
|---|---|---|
| --input_glob | Glob for your gzipped FASTQ files | ['*.{1,2}.fastq.gz'] |
| --read_format | 'paired-end' or 'single-end' | 'paired-end' |
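For paired-end input, it can help to confirm that every read-1 file has a read-2 mate before converting. The following is a minimal sketch, not part of this repository: `check_fastq_pairs` is a hypothetical helper, and it assumes files follow the default `*.{1,2}.fastq.gz` naming shown in the table above.

```shell
# Hypothetical helper (not part of the pipeline): report read-1 FASTQ files
# in a directory that lack a matching read-2 file.
# Assumes the default naming implied by --input_glob ('*.{1,2}.fastq.gz').
check_fastq_pairs() {
    dir="$1"
    status=0
    for r1 in "$dir"/*.1.fastq.gz; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$r1" ] || continue
        # Derive the expected mate name by swapping the .1 suffix for .2.
        r2="${r1%.1.fastq.gz}.2.fastq.gz"
        if [ ! -f "$r2" ]; then
            echo "missing mate for $r1" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: check_fastq_pairs path/to/fastq_dir && echo "all pairs complete"
```

If your files use a different suffix (such as the `.ds.1`/`.ds.2` test fixtures), the suffix patterns in the helper would need to be adjusted accordingly.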
- Install miniforge and bioconda (see Requirements)
- Install Nextflow (e.g. conda install nextflow, or see Nextflow installation guide)
- Clone this repository (git clone https://github.com/nebiolabs/EM-seq.git). Modify nextflow.config as needed for your environment, e.g. if running locally, change the executor block to 'local' and set, e.g., --max_cpus 10 --max_memory 30.GB.
- Download or prepare a genome reference FASTA file (see Reference Genomes)
- Create a bwameth index for the fasta and add it to your references in conf/references.config
- Run the pipeline with appropriate parameters (see Basic Usage)
- Examine results in the em-seq_output directory
- EM-seq-Alignment-Summary-<FLOWCELL_ID>_multiqc_report.html in em-seq_output for overall QC summary
- Mbias files in em-seq_output/methylDackelExtracts/mbias (to identify sample-dependent positional biases)
- Methylation output files in em-seq_output/methylDackelExtracts (suitable for analysis with methylKit)
- Aligned reads in em-seq_output/markduped_bams (methylation coloring is recommended for visualization in IGV)
nextflow run main.nf \
--genome 'test' \
--ubam_dir './' \
--email your.email@example.com \
--flowcell FLOWCELL_ID
--ubam_dir should be the folder that contains your uBAM files.
| Parameter | Description | Default |
|---|---|---|
| --genome | Reference genome found in conf/references.config | Required |
| --email | Email for notifications | Required |
| --flowcell | Flowcell identifier | Optional |
| --outputDir | Output directory | em-seq_output |
| --enable_neb_agg | Enable NEB aggregation reporting | False |
Modify the conf/references.config file to specify your genome files:
- genome_fa: path to your genome FASTA file
- genome_fai: path to your genome FASTA .fai index file
- bwameth_index: path to your genome FASTA file where bwameth indices exist
- target_bed: BED file for targeted analysis (optional)
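Before pointing conf/references.config at a genome, it may be worth checking that the FASTA, its .fai index, and the bwameth index files are all in place. The sketch below is an assumption-laden example rather than part of this repository: `check_reference` is a hypothetical helper, it assumes the .fai sits next to the FASTA (as `samtools faidx` writes it), and it assumes the index was built with `bwameth.py index`, which writes a `.bwameth.c2t` converted FASTA plus the bwa index files beside it. If your index was built differently (for example with bwa-mem2), the file names will differ.

```shell
# Hypothetical pre-flight check for the paths placed in conf/references.config.
# Assumption: the index was built with `bwameth.py index <genome_fa>`, which
# writes <genome_fa>.bwameth.c2t and the bwa index files next to the FASTA,
# and the .fai was created next to the FASTA by `samtools faidx`.
check_reference() {
    fa="$1"
    status=0
    for f in "$fa" "$fa.fai" "$fa.bwameth.c2t" \
             "$fa.bwameth.c2t.amb" "$fa.bwameth.c2t.ann" \
             "$fa.bwameth.c2t.bwt" "$fa.bwameth.c2t.pac" \
             "$fa.bwameth.c2t.sa"; do
        # -s is true only for files that exist and are non-empty.
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: check_reference /path/to/genome.fa && echo "reference looks complete"
```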
- --tmp_dir - Temporary directory (default: /tmp)
- --workflow - Workflow identifier (default: EM-seq)
- --enable_neb_agg - Enable NEB aggregation reporting (default: False)
Pre-built reference genomes with methylation spike-in controls:
- T2T CHM13: https://neb-em-seq-sra.s3.amazonaws.com/T2T_chm13v2.0%2Bbs_controls.fa
- GRCh38: https://neb-em-seq-sra.s3.amazonaws.com/grch38_core%2Bbs_controls.fa
- Create your own reference by appending the control sequences to your preferred genome FASTA (e.g. cat genome.fa methylation_controls.fa > genome+methylation_controls.fa)
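When building your own combined reference, one thing that can go wrong is a control sequence sharing a name with a contig already in the genome, which indexing and alignment tools generally do not handle well. The following sketch shows one way to check for that after concatenation; `duplicate_fasta_names` is a hypothetical helper, not something provided by this repository.

```shell
# Hypothetical helper: print sequence names that occur more than once in a
# FASTA file. The name is taken as the text between '>' and the first
# whitespace on each header line.
duplicate_fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort | uniq -d
}

# Usage (file name taken from the example above):
#   dups=$(duplicate_fasta_names genome+methylation_controls.fa)
#   [ -z "$dups" ] || echo "duplicate sequence names: $dups" >&2
```

An empty result means all sequence names are unique; any names printed should be renamed in one of the input files before indexing.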
- Nextflow
- Miniforge, Micromamba, or Conda for dependency management
- Bioconda channel configured
- Sufficient computational resources (memory scales with input size)
The scripts in the "legacy" folder are retained for reference and reproducibility, but they are not actively maintained and are not compatible with the latest Nextflow versions. Use NXF_VER=22.10.4 nextflow run ... to reproduce the results in the EM-seq paper.
- em-seq.nf - Original alignment and methylation calling workflow
- bins.nf - TSS-centered binned coverage analysis
- cov_vs_meth.nf - Coverage vs methylation analysis for genomic features
Analysis methods in this repository were used in the following publication:
Vaisvila R, Ponnaluri VKC, Sun Z, et al. Enzymatic methyl sequencing detects DNA methylation at single-base resolution from picograms of DNA. Genome Res. 2021;31(7):1280-1289. doi:10.1101/gr.266551.120
You may also be interested in the nf-core methylseq project
- git tag -f current_production
- git push -f origin current_production
- The development workflow runs from the master branch
- Tests are run using nf-test and are integrated into GitHub Actions
- Install nf-test from Bioconda using conda/mamba
- To run all tests:
nf-test test
- When new tests are added or results change, to update the results snapshot:
nf-test test --updateSnapshot