@mdeland mdeland commented Mar 30, 2014

Introduces two ways to apply a function over a stream, one using threads and one using processes, without first reading the entire stream into memory.
The default implementation of .to_vw() now uses the parallel process approach. Benchmarks showed a 2-4x speedup over a single process. The more CPU-bound the function, the larger the improvement.
The threading functions are not currently used by the project, but could be useful in the future.
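
For readers who want a picture of the general idea, here is a minimal sketch of applying a function over a stream in fixed-size chunks. It is not the code in this pull request: the name `imap_chunked` and the parameters `executor_cls`, `n_jobs`, and `chunksize` are hypothetical, chosen only for illustration. It uses the standard library's `concurrent.futures` executors, which may differ from what the project actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def imap_chunked(func, stream, executor_cls=ThreadPoolExecutor,
                 n_jobs=2, chunksize=100):
    """Lazily yield func(x) for each x in stream, preserving input order.

    Hypothetical helper for illustration only. At most `chunksize` items
    are pulled from the stream at a time, so the stream is never fully
    materialized.
    """
    it = iter(stream)
    with executor_cls(max_workers=n_jobs) as executor:
        while True:
            # Pull the next chunk; an empty chunk means the stream is done.
            chunk = list(islice(it, chunksize))
            if not chunk:
                break
            # executor.map returns results in the same order as the inputs.
            yield from executor.map(func, chunk)
```

With the default `ThreadPoolExecutor` this corresponds to the thread-based variant. Passing `executor_cls=ProcessPoolExecutor` (also from `concurrent.futures`) gives a process-based variant, which is the one likely to help for CPU-bound functions. In that case `func` and the stream items must be picklable, and on platforms that spawn worker processes the calling code should sit under an `if __name__ == "__main__":` guard.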

dkrasner pushed a commit that referenced this pull request Apr 3, 2014
@dkrasner dkrasner merged commit dcdd19c into columbia-applied-data-science:master Apr 3, 2014
@dkrasner dkrasner deleted the stream_parallel branch April 3, 2014 16:19