From 9195731f0e3329e83c28dc0318ddde033270bedc Mon Sep 17 00:00:00 2001 From: be-marc Date: Wed, 21 May 2025 08:23:47 +0200 Subject: [PATCH 01/14] add learner weights --- book/chapters/chapter1/introduction_and_overview.qmd | 7 +++++++ book/chapters/chapter2/data_and_basic_modeling.qmd | 7 ++++--- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index 8d7b45026..5d4cb2365 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -3,6 +3,13 @@ aliases: - "/introduction_and_overview.html" --- + +```{r} +# extra packages that must be installed in the docker image +remotes::install_github("mlr-org/mlr3") +remotes::install_github("mlr-org/mlr3learners") +``` + # Introduction and Overview {#sec-introduction} {{< include ../../common/_setup.qmd >}} diff --git a/book/chapters/chapter2/data_and_basic_modeling.qmd b/book/chapters/chapter2/data_and_basic_modeling.qmd index 1ee1bb062..c1e0ebf27 100644 --- a/book/chapters/chapter2/data_and_basic_modeling.qmd +++ b/book/chapters/chapter2/data_and_basic_modeling.qmd @@ -1027,7 +1027,8 @@ There are seven column roles: 4. `"order"`: Variable(s) used to order data returned by `$data()`; must be sortable with `order()`. 5. `"group"`: Variable used to keep observations together during resampling. 6. `"stratum"`: Variable(s) to stratify during resampling. -7. `"weight"`: Observation weights. Only one numeric column may have this role. +7. `"weights_learner"`: Weights used during training by the learner. Only one numeric column may have this role. +8. `"weights_measure"`: Weights used during scoring by the measure. Only one numeric column may have this role. We have already seen how features and targets work in @sec-tasks, which are the only column roles that each task must have. In @sec-strat-group we will have a look at the `stratum` and `group` column roles. @@ -1051,7 +1052,7 @@ tsk_mtcars_order$data(ordered = TRUE) In this example we can see that by setting `"idx"` to have the `"order"` column role, it is no longer used as a feature when we run `$data()` but instead is used to order the observations according to its value. This metadata is not passed to a learner. -The `weights` column role is used to weight data points differently. +The `weights_learner` column role is used to weight data points differently. One example of why we would do this is in classification tasks with severe class imbalance, where weighting the minority class more heavily may improve the model's predictive performance for that class. For example in the `breast_cancer` dataset, there are more instances of benign tumors than malignant tumors, so if we want to better predict malignant tumors we could weight the data in favor of this class: @@ -1065,7 +1066,7 @@ df$weights = ifelse(df$class == "malignant", 2, 1) # create new task and role cancer_weighted = as_task_classif(df, target = "class") -cancer_weighted$set_col_roles("weights", roles = "weight") +cancer_weighted$set_col_roles("weights", roles = "weights_learner") # compare weighted and unweighted predictions split = partition(cancer_unweighted) From d954914ef0959fe0f6f88035f505a65eb1bd10fa Mon Sep 17 00:00:00 2001 From: be-marc Date: Wed, 21 May 2025 08:32:29 +0200 Subject: [PATCH 02/14] ... --- book/chapters/chapter1/introduction_and_overview.qmd | 1 - 1 file changed, 1 deletion(-) diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index 5d4cb2365..e56a5e8ef 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -7,7 +7,6 @@ aliases: ```{r} # extra packages that must be installed in the docker image remotes::install_github("mlr-org/mlr3") -remotes::install_github("mlr-org/mlr3learners") ``` # Introduction and Overview {#sec-introduction} From 971f649f364be504c043a1f1a375b704112a1831 Mon Sep 17 00:00:00 2001 From: be-marc Date: Wed, 21 May 2025 08:45:18 +0200 Subject: [PATCH 03/14] ... --- book/chapters/chapter1/introduction_and_overview.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index e56a5e8ef..a8a42e7bf 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -6,7 +6,7 @@ aliases: ```{r} # extra packages that must be installed in the docker image -remotes::install_github("mlr-org/mlr3") +remotes::install_github("mlr-org/mlr3@custom_cv") ``` # Introduction and Overview {#sec-introduction} From 108af3a52df69ab8e578aa7281fc91e7099c3319 Mon Sep 17 00:00:00 2001 From: be-marc Date: Wed, 21 May 2025 09:46:27 +0200 Subject: [PATCH 04/14] ... --- book/chapters/chapter1/introduction_and_overview.qmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index a8a42e7bf..b1a02a11f 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -6,7 +6,8 @@ aliases: ```{r} # extra packages that must be installed in the docker image -remotes::install_github("mlr-org/mlr3@custom_cv") +remotes::install_github("mlr-org/mlr3") +remotes::install_github("mlr-org/mlr3pipelines") ``` # Introduction and Overview {#sec-introduction} From f7984a06e2acd130c883f8d9ec2393b308f74ed1 Mon Sep 17 00:00:00 2001 From: be-marc Date: Wed, 21 May 2025 11:17:04 +0200 Subject: [PATCH 05/14] ... --- book/chapters/chapter1/introduction_and_overview.qmd | 1 + 1 file changed, 1 insertion(+) diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index b1a02a11f..8414097ec 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -8,6 +8,7 @@ aliases: # extra packages that must be installed in the docker image remotes::install_github("mlr-org/mlr3") remotes::install_github("mlr-org/mlr3pipelines") +remotes::install_github("mlr-org/mlr3fairness@weights") ``` # Introduction and Overview {#sec-introduction} From 9148578365f0ad1809cf8cb87c4b79bd5e16fea6 Mon Sep 17 00:00:00 2001 From: be-marc Date: Fri, 23 May 2025 12:16:08 +0200 Subject: [PATCH 06/14] ... --- book/chapters/chapter9/preprocessing.qmd | 1 + 1 file changed, 1 insertion(+) diff --git a/book/chapters/chapter9/preprocessing.qmd b/book/chapters/chapter9/preprocessing.qmd index 95a7e1632..acd7d7797 100644 --- a/book/chapters/chapter9/preprocessing.qmd +++ b/book/chapters/chapter9/preprocessing.qmd @@ -239,6 +239,7 @@ magick::image_trim(fig) Using this pipeline we can now run experiments with `lrn("regr.ranger")`, which cannot handle missing data; we also compare a simpler pipeline that only uses OOR imputation to demonstrate performance differences resulting from different strategies. ```{r preprocessing-016} +#| eval: false glrn_rf_impute_hist = as_learner(impute_hist %>>% lrn("regr.ranger")) glrn_rf_impute_hist$id = "RF_imp_Hist" From 1c28b4a01025a0c04f25651152cfc842f5aab619 Mon Sep 17 00:00:00 2001 From: be-marc Date: Fri, 23 May 2025 13:17:39 +0200 Subject: [PATCH 07/14] ... --- book/chapters/chapter1/introduction_and_overview.qmd | 1 + 1 file changed, 1 insertion(+) diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index 8414097ec..368e4f901 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -9,6 +9,7 @@ aliases: remotes::install_github("mlr-org/mlr3") remotes::install_github("mlr-org/mlr3pipelines") remotes::install_github("mlr-org/mlr3fairness@weights") +remotes::install_github("mlr-org/mlr3learners") ``` # Introduction and Overview {#sec-introduction} From 865bab9a177a18d44a9927fdc4ef6b92277356c6 Mon Sep 17 00:00:00 2001 From: be-marc Date: Fri, 23 May 2025 14:21:20 +0200 Subject: [PATCH 08/14] ... --- book/chapters/chapter9/preprocessing.qmd | 1 + 1 file changed, 1 insertion(+) diff --git a/book/chapters/chapter9/preprocessing.qmd b/book/chapters/chapter9/preprocessing.qmd index acd7d7797..f14606112 100644 --- a/book/chapters/chapter9/preprocessing.qmd +++ b/book/chapters/chapter9/preprocessing.qmd @@ -447,6 +447,7 @@ These outputs look sensible compared to @fig-energy so we can now run our final We do not need to add the `PipeOp` to each learner as we can apply it once (as above) before any model training by applying it to all available data. ```{r preprocessing-027, warning=FALSE, R.options = list(datatable.print.nrows = 13, datatable.print.class = FALSE, datatable.print.keys = FALSE, datatable.print.trunc.cols = TRUE)} +#| eval: false learners = list(lrn_baseline, lrn("regr.rpart"), glrn_xgb_impact, glrn_rf_impute_oor, glrn_lm_robust, glrn_log_lm_robust) From d52e8ed7b247ea961416dc2124efaa9ff95d1d93 Mon Sep 17 00:00:00 2001 From: be-marc Date: Mon, 26 May 2025 07:13:44 +0200 Subject: [PATCH 09/14] ... --- book/chapters/chapter9/preprocessing.qmd | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/book/chapters/chapter9/preprocessing.qmd b/book/chapters/chapter9/preprocessing.qmd index 71b1717eb..977d402f3 100644 --- a/book/chapters/chapter9/preprocessing.qmd +++ b/book/chapters/chapter9/preprocessing.qmd @@ -238,12 +238,8 @@ magick::image_trim(fig) Using this pipeline we can now run experiments with `lrn("regr.ranger")`, which cannot handle missing data; we also compare a simpler pipeline that only uses OOR imputation to demonstrate performance differences resulting from different strategies. -<<<<<<< HEAD -```{r preprocessing-016} -#| eval: false -======= ```{r preprocessing-015} ->>>>>>> main +#| eval: false glrn_rf_impute_hist = as_learner(impute_hist %>>% lrn("regr.ranger")) glrn_rf_impute_hist$id = "RF_imp_Hist" From 2ed60bc843cfc47d4e72ec20e6212c6dd3ab00c5 Mon Sep 17 00:00:00 2001 From: be-marc Date: Mon, 26 May 2025 08:51:09 +0200 Subject: [PATCH 10/14] ... --- book/chapters/chapter1/introduction_and_overview.qmd | 1 + 1 file changed, 1 insertion(+) diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index 5472f22b4..ac98f7e8d 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -10,6 +10,7 @@ remotes::install_github("mlr-org/mlr3") remotes::install_github("mlr-org/mlr3pipelines") remotes::install_github("mlr-org/mlr3fairness@weights") remotes::install_github("mlr-org/mlr3learners") +remotes::install_github("mlr-org/mlr3batchmark@logger_index") ``` # Introduction and Overview {#sec-introduction} From 1d1b40993bcb4398b4916b5ae633ffc3ae8b263e Mon Sep 17 00:00:00 2001 From: be-marc Date: Mon, 26 May 2025 10:24:35 +0200 Subject: [PATCH 11/14] ... --- book/chapters/chapter1/introduction_and_overview.qmd | 1 + 1 file changed, 1 insertion(+) diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index ac98f7e8d..6a485c147 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -11,6 +11,7 @@ remotes::install_github("mlr-org/mlr3pipelines") remotes::install_github("mlr-org/mlr3fairness@weights") remotes::install_github("mlr-org/mlr3learners") remotes::install_github("mlr-org/mlr3batchmark@logger_index") +remotes::install_cran("iml") ``` # Introduction and Overview {#sec-introduction} From b19464a523bc23e9a26fcb89bc61f82e18f92665 Mon Sep 17 00:00:00 2001 From: be-marc Date: Mon, 26 May 2025 11:37:46 +0200 Subject: [PATCH 12/14] ... --- book/chapters/chapter1/introduction_and_overview.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index 6a485c147..683bc584b 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -10,7 +10,7 @@ remotes::install_github("mlr-org/mlr3") remotes::install_github("mlr-org/mlr3pipelines") remotes::install_github("mlr-org/mlr3fairness@weights") remotes::install_github("mlr-org/mlr3learners") -remotes::install_github("mlr-org/mlr3batchmark@logger_index") +remotes::install_github("mlr-org/mlr3batchmark") remotes::install_cran("iml") ``` From 83d4cb1fd6b5b3fc5757133ab098cc769c150a2b Mon Sep 17 00:00:00 2001 From: be-marc Date: Mon, 26 May 2025 13:42:18 +0200 Subject: [PATCH 13/14] ... --- book/chapters/chapter1/introduction_and_overview.qmd | 1 + 1 file changed, 1 insertion(+) diff --git a/book/chapters/chapter1/introduction_and_overview.qmd b/book/chapters/chapter1/introduction_and_overview.qmd index 683bc584b..ec5ca2cd6 100644 --- a/book/chapters/chapter1/introduction_and_overview.qmd +++ b/book/chapters/chapter1/introduction_and_overview.qmd @@ -12,6 +12,7 @@ remotes::install_github("mlr-org/mlr3fairness@weights") remotes::install_github("mlr-org/mlr3learners") remotes::install_github("mlr-org/mlr3batchmark") remotes::install_cran("iml") +remotes::install_github("mlr-org/mlr3spatiotempcv@task_row_hash") ``` # Introduction and Overview {#sec-introduction} From fa663755766a20b23f4a2ebb7bdbc29e74a53766 Mon Sep 17 00:00:00 2001 From: be-marc Date: Mon, 26 May 2025 15:55:34 +0200 Subject: [PATCH 14/14] ... --- book/chapters/appendices/solutions.qmd | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/book/chapters/appendices/solutions.qmd b/book/chapters/appendices/solutions.qmd index dad9012d9..3589568a2 100644 --- a/book/chapters/appendices/solutions.qmd +++ b/book/chapters/appendices/solutions.qmd @@ -2162,10 +2162,12 @@ rows = seq_len(nrow(df))[df$race %in% c("Black", "White") & df$sex %in% c("Femal adult_subset$filter(rows) adult_subset$set_col_roles("race", add_to = "pta") ``` + And evaluate our measure again: ```{r solutions-122} -prediction$score(msr_3, adult_subset) +#| eval: false +prediction$score(msr_3, task = adult_subset) ``` We can see, that between women there is an even bigger discrepancy compared to men.