
How can we incorporate the choice of which data (test/train/both) of a task to use for calculation of a measure? #1333

Open
@bblodfon


So now when we do:

re = resample(task, learner, resampling)
re$score(msr("surv.brier"))

and the measure takes the arguments task and train_set (as in here), the .score function has access to the training dataset to perform some estimation. For example, in survival analysis we usually estimate the censoring distribution $G(t)$ via Kaplan-Meier. In the non-resampling case, if task and train_set are not supplied, the test data is used for this purpose instead.
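To make explicit where $G(t)$ enters, here is the inverse-probability-of-censoring weighted Brier score as it is commonly written. The notation is an assumption, not taken from the issue: $T_i$ is the observed time, $\delta_i$ the event indicator, $\hat S_i(t)$ the predicted survival probability, and $\hat G$ the Kaplan-Meier estimate of the censoring distribution. The exact variant implemented by surv.brier may differ, for example by using $\hat G(T_i-)$.

$$
\mathrm{BS}(t) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{\hat S_i(t)^2\,\mathbb{1}(T_i \le t,\ \delta_i = 1)}{\hat G(T_i)} + \frac{\bigl(1-\hat S_i(t)\bigr)^2\,\mathbb{1}(T_i > t)}{\hat G(t)}\right]
$$

The sum runs over the observations being scored, while $\hat G$ can in principle be estimated from a different set of observations. That second choice is what this issue is about.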

We now have evidence (paper link) that the choice of data (train / test / both) used to estimate $G(t)$ can influence the score positively or negatively. It would therefore be nice to have a more general way to say "apply this score, and estimate the quantities it requires using only the test set, only the train set, or both", in both resampling and non-resampling schemes. Note that this is unrelated to which observations the score is calculated for (the use of predict_sets).
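A minimal sketch of the three options, using only the survival package on simulated data. This is not mlr3 or mlr3proba API: the helper `estimate_cens()`, its `use` argument, and the row ids are hypothetical names made up for illustration.

```r
library(survival)

set.seed(1)
n = 200
# Simulated right-censored data (purely illustrative)
event_time = rexp(n, rate = 0.10)
cens_time  = rexp(n, rate = 0.05)
dat = data.frame(
  time   = pmin(event_time, cens_time),
  status = as.integer(event_time <= cens_time)  # 1 = event, 0 = censored
)

train_ids = 1:150
test_ids  = 151:200

# Hypothetical helper: Kaplan-Meier estimate of the censoring distribution G(t),
# fitted on the rows selected by `use`
estimate_cens = function(data, train_ids, test_ids,
                         use = c("train", "test", "both")) {
  use = match.arg(use)
  ids = switch(use,
    train = train_ids,
    test  = test_ids,
    both  = union(train_ids, test_ids)
  )
  # Flipping the status indicator treats censoring as the event of interest
  fit = survfit(Surv(time, 1 - status) ~ 1, data = data[ids, ])
  # Right-continuous step function; G(t) = 1 before the first observed time
  stepfun(fit$time, c(1, fit$surv))
}

G_train = estimate_cens(dat, train_ids, test_ids, "train")
G_test  = estimate_cens(dat, train_ids, test_ids, "test")
G_both  = estimate_cens(dat, train_ids, test_ids, "both")

# The three estimates generally differ at any given time point
t0 = 5
c(train = G_train(t0), test = G_test(t0), both = G_both(t0))
```

A measure-level option could take the same shape: a single setting that selects which rows feed the estimate of $G(t)$, independent of which rows are scored.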
