High-performance I/O tools for R

Anyone dealing with large data knows that the stock tools in R are slow at loading (non-binary) data into R. This package started as an attempt to provide high-performance parsing tools that minimize copying and avoid the use of strings where possible (see mstrsplit, for example).
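
As a small illustration (the separator and column type below are choices for this example, not package defaults), mstrsplit can parse a batch of delimited lines into a matrix in a single call:

```r
library(iotools)

## three CSV-style records as a character vector
lines <- c("1,2.5,10", "2,3.7,20", "3,4.1,30")

## parse all lines at once into a 3x3 numeric matrix,
## avoiding per-field string allocation
m <- mstrsplit(lines, sep = ",", type = "numeric")
colMeans(m)
```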

To allow processing of arbitrarily large files, we added a way to process input chunk-wise, making it possible to compute on streaming input as well as on very large files (see chunk.reader and chunk.apply).
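
For example, here is a minimal sketch of computing column sums over a file too large to load at once (the file name `data.csv` and its three numeric, comma-separated columns are hypothetical):

```r
library(iotools)

## each chunk arrives as a raw vector of complete lines; parse it
## with mstrsplit and compute partial column sums; CH.MERGE stacks
## the per-chunk results into a matrix
partial <- chunk.apply("data.csv",
    function(chunk) {
        m <- mstrsplit(chunk, sep = ",", type = "numeric")
        colSums(m)
    },
    CH.MERGE = rbind)

## fold the per-chunk partial sums into the final totals
total <- colSums(partial)
```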

The next natural step was to add support for Hadoop streaming. The major goal was to make it possible to compute with Hadoop MapReduce by writing very natural code, much like using lapply on data chunks, without the need to know anything about Hadoop. See the wiki page for the idea and the hmr function for the documentation.
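
To sketch the flavor of the interface, here is an illustrative sum-by-key job; this is not the documented example, and the HDFS path, the column layout, and the assumption that keys travel as row names through the tab-separated streaming format are all hypothetical:

```r
library(iotools)

## illustrative job: sum a numeric value (column 2) by key
## (column 1) across a large HDFS file, written much like
## applying a function to data chunks
r <- hmr(hinput("/user/me/values.csv",
                formatter = function(x) mstrsplit(x, sep = ",")),
         map = function(m) {
             ## m is one parsed chunk; rowsum() produces partial
             ## sums whose row names become the emitted keys
             rowsum(as.numeric(m[, 2]), m[, 1])
         },
         reduce = function(m) {
             ## rows sharing a key arrive grouped together, so the
             ## contiguous-key ctapply() can combine the partial sums
             ctapply(as.numeric(m), rownames(m), sum)
         })
```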
