A new compression strategy to reduce the size of nanopore sequencing data

  Ira W. Deveson (1,2,6,8)

  1. Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales 2010, Australia;
  2. Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales 2010, Australia;
  3. School of Computer Science and Engineering, University of New South Wales, Sydney, New South Wales 2052, Australia;
  4. Department of Computer Engineering, University of Peradeniya, Peradeniya 20400, Sri Lanka;
  5. School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, New South Wales 2052, Australia;
  6. St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales 2052, Australia
  • Corresponding authors: hasindu{at}garvan.org.au, i.deveson{at}garvan.org.au
  • Abstract

    Nanopore sequencing is an increasingly central tool for genomics. Despite rapid advances in the field, large data volumes and computational bottlenecks continue to pose major challenges. Here, we introduce ex-zd, a new data compression strategy that helps address the large size of raw signal data generated during nanopore experiments. Ex-zd encompasses both a lossless compression method, which modestly outperforms all current methods for nanopore signal data compression, and a ‘lossy’ method, which can be used to achieve additional savings. The latter component works by reducing the number of bits used to encode signal data. We show that the three least significant bits in signal data generated on instruments from Oxford Nanopore Technologies (ONT) predominantly encode noise. Their removal reduces file sizes by half without impacting downstream analyses, including basecalling and detection of modified DNA or RNA bases. Ex-zd compression saves hundreds of gigabytes on a single ONT sequencing experiment, thereby increasing the scalability, portability, and accessibility of nanopore sequencing.
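
The bit-reduction idea behind the lossy component can be illustrated with a minimal Python sketch. This is not the ex-zd implementation: the function names `drop_low_bits` and `restore_scale` are hypothetical, the samples are assumed to be plain integers, and discarding bits by an arithmetic right shift (rather than, say, rounding) is an assumption made only for illustration.

```python
def drop_low_bits(samples, bits=3):
    """Discard the `bits` least significant bits of each integer sample.

    Assumption: truncation via arithmetic right shift; the abstract does
    not specify how ex-zd removes the bits.
    """
    return [s >> bits for s in samples]


def restore_scale(truncated, bits=3):
    """Shift samples back to the original scale; the dropped bits become zero."""
    return [t << bits for t in truncated]


if __name__ == "__main__":
    raw = [487, 492, 501, -13]          # made-up signal values
    small = drop_low_bits(raw)          # values that need 3 fewer bits each
    approx = restore_scale(small)       # original scale, low bits zeroed
    print(small, approx)
```

With three bits dropped, each restored value differs from the original by less than 8 raw units, which is the part of the signal the abstract describes as predominantly noise. The smaller value range is what a downstream lossless encoder can then exploit.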

    Footnotes

    • 7 Joint first authors.

    • 8 Joint senior authors.

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.280090.124.

    • Freely available online through the Genome Research Open Access option.

    • Received October 2, 2024.
    • Accepted May 2, 2025.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
