read_biom segfaults on EMP data #5

@pbradz

Issue description

I am attempting to read in a subset of the Earth Microbiome Project data available at ftp://ftp.microbio.me/emp/release1/otu_tables/deblur (sorry, FTP links don't seem to work here), specifically the version emp_deblur_90bp.subset_2k.rare_5000.biom. However, I am getting the following error:

> library(biomformat)
> tmp <- read_biom("emp_deblur_90bp.subset_2k.rare_5000.biom")
Error: segfault from C stack overflow
Warning message:
Lost warning messages
Error : C stack usage  140735518249684 is too close to the limit

R then becomes unresponsive until I send a Ctrl+C.
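
For context, the limit that the "C stack usage" message refers to can be inspected with base R's `Cstack_info()`. The snippet below is only a diagnostic sketch, meant to be run in a fresh session; the values it prints depend on the machine and on shell settings such as `ulimit -s`, and the `reported` value is simply copied from the error message above.

```r
# Cstack_info() ships with base R and reports the C stack limit R enforces.
info <- Cstack_info()
print(info)  # named integer vector: size, current, direction, eval_depth

# Share of the stack currently in use (NA if the platform does not report a size).
usage_fraction <- unname(info["current"] / info["size"])

# Usage figure copied from the error message, for comparison with the limit.
reported <- 140735518249684
exceeds_limit <- reported > info[["size"]]
```

Since `Cstack_info()` returns the limit as an integer number of bytes, a reported usage of roughly 1.4e14 bytes cannot be a real stack depth, which may point to the stack check itself being confused rather than to ordinary deep recursion.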

Steps to reproduce the issue

  1. Download the .biom file from the EMP FTP site.
  2. Attempt to read it with biomformat::read_biom.
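
The two steps above could be scripted roughly as follows. This is a sketch under assumptions: the full URL is built by joining the directory and file name given in the issue description (the file is assumed to sit directly in that directory), and `reproduce_read` is a hypothetical helper name, not part of biomformat.

```r
# Assumption: the file sits directly inside the deblur directory named above.
emp_dir  <- "ftp://ftp.microbio.me/emp/release1/otu_tables/deblur"
emp_file <- "emp_deblur_90bp.subset_2k.rare_5000.biom"
emp_url  <- paste(emp_dir, emp_file, sep = "/")

# Hypothetical helper: download the table once, then attempt the read.
reproduce_read <- function(url = emp_url, dest = basename(url)) {
  if (!file.exists(dest)) {
    # Binary mode keeps the .biom file byte-for-byte intact on every platform.
    download.file(url, destfile = dest, mode = "wb")
  }
  biomformat::read_biom(dest)  # the call that fails in this report
}
```

Calling `tmp <- reproduce_read()` in a fresh R session should then trigger the same read attempt, provided the FTP site is still reachable.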

Workaround

If I use the command-line biom tool to convert this .biom file to .tsv format and back, following the guide here and preserving only the sample metadata (I don't need any taxon metadata for my purposes), the file loads fine:

> library(biomformat)
> emp <- read_biom("emp_deblur_orig_metadata.biom")
> meta <- biomformat::sample_metadata(emp)
> meta[1:5,1:5]
                                                   depth_m BarcodeSequence
1883.2008.269.Crump.Artic.LTREB.main.lane2.NoIndex      16    TCAAGCAATACG
1453.45796SDZ4.G4.Pnem.stom                              0    CTCGTGAATGAC
1039.L.Vermelha.SA                                    0.08    GTCGGAAATTGT
1773.Thraupis.gauco3.lgi                                 0    CATCTGGGCAAT
1453.45300SDZ4.D7.Pnem.stom                              0    ACAGCTCAAACA
                                                   run_center altitude_m
1883.2008.269.Crump.Artic.LTREB.main.lane2.NoIndex       CCME        0.0
1453.45796SDZ4.G4.Pnem.stom                              CCME        0.0
1039.L.Vermelha.SA                                       CCME        0.0
1773.Thraupis.gauco3.lgi                                  ANL        0.0
1453.45300SDZ4.D7.Pnem.stom                              CCME        0.0
                                                   elevation_m
1883.2008.269.Crump.Artic.LTREB.main.lane2.NoIndex       719.0
1453.45796SDZ4.G4.Pnem.stom                              18.57
1039.L.Vermelha.SA                                       13.11
1773.Thraupis.gauco3.lgi                                   100
1453.45300SDZ4.D7.Pnem.stom                              18.57

The converted file ends up somewhat smaller (36M vs. 54M), but I'm nowhere near running out of memory, so I don't think file size is the problem.

Additional details

Output of sessionInfo():

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomformat_1.10.1

loaded via a namespace (and not attached):
 [1] compiler_3.5.3  plyr_1.8.4      Matrix_1.2-16   tools_3.5.3
 [5] rhdf5_2.26.2    Rcpp_1.0.1      grid_3.5.3      jsonlite_1.6
 [9] lattice_0.20-38 Rhdf5lib_1.4.3

This may be related to this issue?
