```{r}
lrn_rpart$parallel_predict = TRUE
prediction = lrn_rpart$predict(tsk_sonar)
```

### Parallelization with `mirai` {#sec-parallel-mirai}

```{r, include = FALSE}
mirai::daemons(0)
```

The `r ref_pkg("mirai")` package is integrated in `mlr3` as an alternative parallelization backend.

`mirai` provides a lightweight approach to parallelization by starting persistent R sessions, called daemons, that evaluate tasks in parallel.
These daemons can be launched either locally or on remote machines, e.g., via SSH or cluster managers.
Compared to the `r ref_pkg("future")` package, `mirai` has significantly lower overhead per task, which is especially advantageous when training many fast-fitting models.

As with `future`, users only need to configure the backend before starting any computations.
The following sections demonstrate how to use `mirai` to parallelize resamplings, benchmarks, and tuning.

To use `mirai` for parallelization, we first need to start the daemons.
Here, we start two daemons and check their status.

```{r, eval = FALSE}
library(mirai)

mirai::daemons(2)

mirai::status()
```

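If reproducibility is required, the `seed` argument of `daemons()` can be set; `mirai` then derives statistically independent L'Ecuyer-CMRG RNG streams for the daemons. A minimal sketch (the seed value is arbitrary):

```{r, eval = FALSE}
# restart the daemons with reproducible RNG streams
mirai::daemons(0)
mirai::daemons(2, seed = 42)
```
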
We parallelize a three-fold cross-validation of a decision tree on the sonar task.

```{r} | ||
tsk_sonar = tsk("sonar") | ||
lrn_rpart = lrn("classif.rpart") | ||
rsmp_cv3 = rsmp("cv", folds = 3) | ||
system.time({resample(tsk_sonar, lrn_rpart, rsmp_cv3)}) | ||
``` | ||
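Benchmarks are parallelized in the same way and require no further configuration. A minimal sketch on the running daemons, comparing the decision tree against a featureless baseline (the choice of learners is only illustrative):

```{r, eval = FALSE}
# benchmark two learners with the same resampling; runs on the daemons
design = benchmark_grid(tsk_sonar,
  lrns(c("classif.rpart", "classif.featureless")), rsmp_cv3)
bmr = benchmark(design)
bmr$aggregate(msr("classif.ce"))
```
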
One advantage of `mirai` over `future` is that it eliminates the need to manually set chunk sizes.
By default, a dispatcher process queues the tasks and sends each one to the next available daemon, so the work is balanced across the daemons automatically, even when individual tasks take different amounts of time.

Since the daemons are already running, we can proceed directly with the tuning example.

```{r}
instance = tune(
  tnr("random_search", batch_size = 12),
  tsk("penguins"),
  lrn("classif.rpart", minsplit = to_tune(2, 128)),
  rsmp("cv", folds = 3),
  term_evals = 20
)

instance$archive$n_evals
```

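The result of the parallelized tuning run is accessed as usual, e.g., the best configuration found:

```{r, eval = FALSE}
# best hyperparameter configuration and its estimated performance
instance$result
```
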
`mirai` also supports nested resampling.
In the following example, the outer resampling loop is parallelized by starting one daemon per outer iteration, while the inner tuning loop runs sequentially.

```{r}
# reset daemons
mirai::daemons(0)

mirai::daemons(5)

lrn_rpart = lrn("classif.rpart",
  minsplit = to_tune(2, 128))

lrn_rpart_tuned = auto_tuner(tnr("random_search", batch_size = 2),
  lrn_rpart, rsmp("cv", folds = 3), msr("classif.ce"), 2)

rr = resample(tsk("penguins"), lrn_rpart_tuned, rsmp("cv", folds = 5))
```

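The outer resampling result can then be scored as usual:

```{r, eval = FALSE}
# aggregated outer performance estimate
rr$aggregate(msr("classif.ce"))
```
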
We can also parallelize both loops: the `everywhere()` function runs code on every daemon, which we use here to start the daemons for the inner loop from within each daemon of the outer loop.

```{r, eval = FALSE}
# reset daemons
mirai::daemons(0)

mirai::daemons(5)

everywhere({
  mirai::daemons(3)
})
```

Note that running the outer loop sequentially in the main session while parallelizing the inner loop is currently not supported.
However, you can run the outer loop in a single daemon and the inner loop on multiple daemons:

```{r, eval = FALSE}
# reset daemons
mirai::daemons(0)

mirai::daemons(1)

everywhere({
  mirai::daemons(3)
})
```

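When the computations are finished, the daemons should be shut down by resetting them, as done at the start of the examples above:

```{r, eval = FALSE}
# shut down all daemons
mirai::daemons(0)
```
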
## Error Handling {#sec-error-handling}

In large experiments, it is not uncommon that a model fit or prediction fails with an error.\index{debugging}