Issue description
I am attempting to read in a subset of the Earth Microbiome Project data available at ftp://ftp.microbio.me/emp/release1/otu_tables/deblur (sorry, FTP links don't seem to work here), specifically the version emp_deblur_90bp.subset_2k.rare_5000.biom. However, I am getting the following error:
> library(biomformat)
> tmp <- read_biom("emp_deblur_90bp.subset_2k.rare_5000.biom")
Error: segfault from C stack overflow
Warning message:
Lost warning messages
Error : C stack usage 140735518249684 is too close to the limit
R then becomes unresponsive until I send a Ctrl+C.
Steps to reproduce the issue
- Download the biom file from the EMP FTP site.
- Attempt to read it with biomformat::read_biom.
Workaround
If I use the command line biom tool to convert this .biom file to .tsv format and back, following the guide here and preserving just the sample metadata (I don't need any taxon metadata for my purpose), the file loads fine:
> library(biomformat)
> emp <- read_biom("emp_deblur_orig_metadata.biom")
> meta <- biomformat::sample_metadata(emp)
> meta[1:5,1:5]
depth_m BarcodeSequence
1883.2008.269.Crump.Artic.LTREB.main.lane2.NoIndex 16 TCAAGCAATACG
1453.45796SDZ4.G4.Pnem.stom 0 CTCGTGAATGAC
1039.L.Vermelha.SA 0.08 GTCGGAAATTGT
1773.Thraupis.gauco3.lgi 0 CATCTGGGCAAT
1453.45300SDZ4.D7.Pnem.stom 0 ACAGCTCAAACA
run_center altitude_m
1883.2008.269.Crump.Artic.LTREB.main.lane2.NoIndex CCME 0.0
1453.45796SDZ4.G4.Pnem.stom CCME 0.0
1039.L.Vermelha.SA CCME 0.0
1773.Thraupis.gauco3.lgi ANL 0.0
1453.45300SDZ4.D7.Pnem.stom CCME 0.0
elevation_m
1883.2008.269.Crump.Artic.LTREB.main.lane2.NoIndex 719.0
1453.45796SDZ4.G4.Pnem.stom 18.57
1039.L.Vermelha.SA 13.11
1773.Thraupis.gauco3.lgi 100
1453.45300SDZ4.D7.Pnem.stom 18.57
The file size ends up being somewhat smaller (36M vs. 54M) but I'm not remotely close to running out of memory so I don't think that's the problem.
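For anyone wanting to try the same workaround, the round trip described above might look roughly like the sketch below. It assumes the biom command-line tool (biom-format 2.x) is on the PATH. The helper name roundtrip_biom and all file names are hypothetical. It also assumes the sample metadata is re-attached from a separate tab-separated mapping file, since as far as I know the plain TSV table does not carry sample metadata. The exact commands used for the file in this report may have differed.

```shell
#!/usr/bin/env bash
# Sketch of the .biom -> .tsv -> .biom round trip. All file names are placeholders.

# Usage: roundtrip_biom IN.biom OUT.biom MAPPING.tsv
# Hypothetical helper: rewrites IN.biom via TSV and attaches sample metadata
# from MAPPING.tsv (assumed to be a tab-separated sample mapping file).
roundtrip_biom() {
    local in_biom="$1" out_biom="$2" mapping="$3"
    local tmp_dir status=0
    tmp_dir="$(mktemp -d)" || return 1

    # 1. Dump the count matrix as TSV (observation/taxon metadata is not kept).
    biom convert -i "$in_biom" -o "$tmp_dir/table.tsv" --to-tsv || status=$?

    # 2. Rebuild an HDF5 BIOM table from the TSV.
    if [ "$status" -eq 0 ]; then
        biom convert -i "$tmp_dir/table.tsv" -o "$tmp_dir/table.biom" \
            --to-hdf5 --table-type="OTU table" || status=$?
    fi

    # 3. Re-attach only the sample metadata.
    if [ "$status" -eq 0 ]; then
        biom add-metadata -i "$tmp_dir/table.biom" -o "$out_biom" \
            --sample-metadata-fp "$mapping" || status=$?
    fi

    rm -rf "$tmp_dir"
    return "$status"
}
```

Calling it as roundtrip_biom emp_deblur_90bp.subset_2k.rare_5000.biom emp_deblur_orig_metadata.biom mapping.tsv (where mapping.tsv stands in for whichever sample mapping file is used) should produce a file that can then be passed to read_biom as shown above.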
Additional details
Output of sessionInfo():
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomformat_1.10.1
loaded via a namespace (and not attached):
[1] compiler_3.5.3 plyr_1.8.4 Matrix_1.2-16 tools_3.5.3
[5] rhdf5_2.26.2 Rcpp_1.0.1 grid_3.5.3 jsonlite_1.6
[9] lattice_0.20-38 Rhdf5lib_1.4.3
Possibly related to this issue?