predict_newdata may want to check / convert column types #685

@mb706

When I train a model on a task containing a numeric column and then predict on new data where that column is an integer, $predict_newdata() internally creates a task that reports the feature as a dbl feature, while the data retrieved with $data() is still integer. In most cases integers and numerics are interchangeable in R, but when calling external code (C, Python), passing the wrong type is a problem. The discrepancy between the reported type and the actual data returned by $data() breaks code that relies on the Task's reported type to do feature conversion.

library(mlr3)
library(data.table)

ll <- lrn("classif.debug", save_tasks = TRUE)$train(tsk("iris"))
ll$predict_newdata(data.table(
    Sepal.Length = 1L, Sepal.Width = 1L,
    Petal.Length = 1L, Petal.Width = 1L
))
#> <PredictionClassif> for 1 observations:
#>  row_ids truth  response
#>        1  <NA> virginica


ll$model$task_predict
#> <TaskClassif:iris> (1 x 5)
#> * Target: Species
#> * Properties: multiclass
#> * Features (4):
#>   - dbl (4): Petal.Length, Petal.Width, Sepal.Length, Sepal.Width
ll$model$task_predict$feature_types  # prediction task reports type 'numeric' (printed as 'dbl' above)
#>              id    type
#> 1: Petal.Length numeric
#> 2:  Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4:  Sepal.Width numeric

str(ll$model$task_predict$data())  # actual data: integer type
#> Classes ‘data.table’ and 'data.frame':	1 obs. of  5 variables:
#>  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: NA
#>  $ Petal.Length: int 1
#>  $ Petal.Width : int 1
#>  $ Sepal.Length: int 1
#>  $ Sepal.Width : int 1
#>  - attr(*, ".internal.selfref")=<externalptr> 

Ideally, $predict_newdata() should check that the given newdata is compatible with the reported feature types and, in this case, convert the columns to numeric.
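
To make the suggestion concrete, here is a rough sketch of the kind of conversion I have in mind. It uses base R only; `coerce_to_reported_types()` is a hypothetical helper, not an mlr3 function, and the `feature_types` table is a hand-built stand-in with the same `id` / `type` columns as the output above. It only handles the integer-to-numeric case from this report.

```r
# Hypothetical helper (not part of mlr3): convert integer columns to double
# wherever the reported feature type is "numeric".
coerce_to_reported_types <- function(newdata, feature_types) {
  numeric_ids <- feature_types$id[feature_types$type == "numeric"]
  for (id in intersect(numeric_ids, names(newdata))) {
    if (is.integer(newdata[[id]])) {
      newdata[[id]] <- as.numeric(newdata[[id]])
    }
  }
  newdata
}

# Stand-in for task$feature_types (assumed layout: columns `id` and `type`)
feature_types <- data.frame(
  id = c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width"),
  type = "numeric"
)

# Same integer-valued newdata as in the example above
newdata <- data.frame(
  Sepal.Length = 1L, Sepal.Width = 1L,
  Petal.Length = 1L, Petal.Width = 1L
)

fixed <- coerce_to_reported_types(newdata, feature_types)
vapply(fixed, typeof, character(1))  # storage type of each column after conversion
```

Whether the conversion should happen silently, with a warning, or only for lossless cases like integer to numeric is a separate design question; the sketch just converts.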
