A new compression strategy to reduce the size of nanopore sequencing data
- Kavindu Jayasooriya1,2,3,4,7,
- Sasha P. Jenner1,7,
- Pasindu Marasinghe4,
- Udith Senanayake4,
- Hassaan Saadat5,
- David Taubman5,
- Roshan Ragel4,
- Hasindu Gamaarachchi1,2,3,8 and
- Ira W. Deveson1,2,6,8
- 1Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales 2010, Australia;
- 2Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales 2010, Australia;
- 3School of Computer Science and Engineering, University of New South Wales, Sydney, New South Wales 2052, Australia;
- 4Department of Computer Engineering, University of Peradeniya, Peradeniya 20400, Sri Lanka;
- 5School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, New South Wales 2052, Australia;
- 6St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales 2052, Australia
Abstract
Nanopore sequencing is an increasingly central tool for genomics. Despite rapid advances in the field, large data volumes and computational bottlenecks continue to pose major challenges. Here, we introduce ex-zd, a new data compression strategy that helps address the large size of raw signal data generated during nanopore experiments. Ex-zd encompasses both a lossless compression method, which modestly outperforms all current methods for nanopore signal data compression, and a ‘lossy’ method, which can be used to achieve additional savings. The latter component works by reducing the number of bits used to encode signal data. We show that the three least significant bits in signal data generated on instruments from Oxford Nanopore Technologies (ONT) predominantly encode noise. Their removal reduces file sizes by half without impacting downstream analyses, including basecalling and detection of modified DNA or RNA bases. Ex-zd compression saves hundreds of gigabytes on a single ONT sequencing experiment, thereby increasing the scalability, portability, and accessibility of nanopore sequencing.
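The bit-reduction idea behind the lossy component can be illustrated with a minimal sketch. This is not the ex-zd implementation: it assumes raw signal samples are stored as integers, and it discards the three least significant bits with a plain right shift (truncation), which may differ from how ex-zd handles rounding. The function names `drop_low_bits` and `restore_scale` are hypothetical and used here only for illustration.

```python
def drop_low_bits(samples, bits=3):
    """Discard the `bits` least significant bits of each integer sample.

    Assumption: a plain right shift (truncation toward negative infinity).
    The actual ex-zd method may round or encode the values differently.
    """
    return [s >> bits for s in samples]


def restore_scale(shifted, bits=3):
    """Shift samples back to their original scale; the dropped bits become zero."""
    return [s << bits for s in shifted]


# Illustrative values only, not real nanopore signal data.
raw = [512, 515, 519, 1023, 1024]
shifted = drop_low_bits(raw)
approx = restore_scale(shifted)

# With 3 bits removed, each restored sample differs from the original
# by less than 2**3 = 8 signal units.
max_error = max(abs(a - b) for a, b in zip(raw, approx))
print(shifted, approx, max_error)
```

Because the shifted values span a smaller range, they need fewer bits per sample before any lossless compression is applied, at the cost of a bounded per-sample error.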
Footnotes
- 7 Joint first authors.
- 8 Joint senior authors.
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.280090.124.
- Freely available online through the Genome Research Open Access option.
- Received October 2, 2024.
- Accepted May 2, 2025.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.