Closed
Description
When I train a model on a task containing a numeric column and then predict on new data in which that column is an integer, $predict_newdata() internally creates a task that reports the feature as dbl, although the data is integer when retrieved with $data(). In most cases integers and numerics are interchangeable in R, but when calling external code (C, Python) the wrong type causes trouble. The discrepancy between the reported type and the actual data returned by $data() causes problems for code that relies on the Task's reported type to do feature conversion.
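library(mlr3)
library(data.table)
# train the debug learner and keep the tasks it sees
ll <- lrn("classif.debug", save_tasks = TRUE)$train(tsk("iris"))
# predict on new data whose feature columns are integer, not double
ll$predict_newdata(data.table(
  Sepal.Length = 1L, Sepal.Width = 1L,
  Petal.Length = 1L, Petal.Width = 1L
))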
#> <PredictionClassif> for 1 observations:
#> row_ids truth response
#> 1 <NA> virginica
ll$model$task_predict
#> <TaskClassif:iris> (1 x 5)
#> * Target: Species
#> * Properties: multiclass
#> * Features (4):
#> - dbl (4): Petal.Length, Petal.Width, Sepal.Length, Sepal.Width
ll$model$task_predict$feature_types # prediction task types reported as 'dbl'
#> id type
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
str(ll$model$task_predict$data()) # actual data: integer type
#> Classes ‘data.table’ and 'data.frame': 1 obs. of 5 variables:
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: NA
#> $ Petal.Length: int 1
#> $ Petal.Width : int 1
#> $ Sepal.Length: int 1
#> $ Sepal.Width : int 1
#> - attr(*, ".internal.selfref")=<externalptr>
Ideally, $predict_newdata() should check that the given newdata is compatible with the training task and, in this case, convert the integer columns to numeric.
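
As a rough illustration of the conversion suggested above, here is a minimal base-R sketch. The helper name coerce_to_declared_types is hypothetical and not part of mlr3, and the feature_types table is hand-written to mirror the id/type columns shown in the output above. It only covers the integer-to-numeric case from this report and is not a proposal for the actual implementation.

```r
# Hypothetical helper (not part of mlr3): convert integer columns of `newdata`
# to double wherever the feature-type table declares them as "numeric".
coerce_to_declared_types <- function(newdata, feature_types) {
  for (i in seq_len(nrow(feature_types))) {
    id <- feature_types$id[i]
    type <- feature_types$type[i]
    # skip features that are missing from the new data
    if (!id %in% names(newdata)) next
    col <- newdata[[id]]
    if (type == "numeric" && is.integer(col)) {
      newdata[[id]] <- as.numeric(col)
    }
  }
  newdata
}

# Hand-written stand-in for the id/type columns of task$feature_types
feature_types <- data.frame(
  id = c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width"),
  type = "numeric",
  stringsAsFactors = FALSE
)

# Same integer-valued new data as in the example above
newdata <- data.frame(
  Sepal.Length = 1L, Sepal.Width = 1L,
  Petal.Length = 1L, Petal.Width = 1L
)

converted <- coerce_to_declared_types(newdata, feature_types)
# storage type of each column after conversion
vapply(converted, typeof, character(1))
```

Until something like this happens inside $predict_newdata(), calling such a helper on newdata beforehand should work as a workaround on the user side.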