Description
Hi @fabsig, thank you for your work, this sounds like an exciting method.
IIRC, you currently only support 0/1 binary outcomes with a logistic link (based on a Ctrl+F search for 'logit').
Is there a small modification you could make that would enable gpboost to run on proportion outcomes in [0, 1]?
# in R
library(gpboost)
X = matrix(rnorm(2*100), ncol=2)
b = c(2, -2)
eta = X%*%b - 1
p = plogis(eta)
n = rep(10, length(p))
# Simulate y wins in n games
y = rbinom(100, n, p)
outcome_xgb = y/n
group_data = sample(letters[1:4], length(p),replace=TRUE)
gp_model <- fitGPModel(group_data=group_data, likelihood="bernoulli_logit", y=outcome_xgb, X=X)
I get the obvious error:
Error in gpb.call("GPB_OptimLinRegrCoefCovPar_R", ret = NULL, private$handle, :
[GPBoost] [Fatal] Response variable (label) data needs to be 0 or 1 for likelihood of type 'bernoulli_logit'.
I was hoping it would work similarly to xgboost when using the 'reg:logistic' objective:
https://datascience.stackexchange.com/questions/10595/difference-between-logistic-regression-and-binary-logistic-regression
library(xgboost)
# ?xgboost
# https://datascience.stackexchange.com/questions/10595/difference-between-logistic-regression-and-binary-logistic-regression
# objective = "reg:logistic"
param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = 2, objective = "reg:logistic")
# convert pair (y,n) into scalar proportion (y/n)
outcome_xgb = y/n
dtrain <- xgb.DMatrix(X, label = outcome_xgb)
dtest <- xgb.DMatrix(X, label = outcome_xgb)
watchlist <- list(train = dtrain, eval = dtest)
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist)
pred <- predict(bst,dtrain)
head(pred)
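To make the request concrete: as I understand it, 'reg:logistic' minimizes the logistic (cross-entropy) loss and accepts labels anywhere in [0, 1]. With a constant number of trials n, that loss on y/n is the binomial negative log-likelihood up to a factor n and a constant that does not depend on p. The base-R check below is only a sketch of that identity; the helper names cross_entropy and binom_nll are mine and are not part of xgboost or gpboost.

```r
# Sketch: cross-entropy on proportions vs. binomial negative log-likelihood.
# Helper names below are hypothetical, not from any package.
set.seed(1)
n_obs <- 100
n_trials <- 10
X <- matrix(rnorm(2 * n_obs), ncol = 2)
p <- as.vector(plogis(X %*% c(2, -2) - 1))
y <- rbinom(n_obs, n_trials, p)   # wins out of n_trials games
q <- y / n_trials                 # proportion outcome in [0, 1]

# Cross-entropy loss evaluated on fractional labels q
cross_entropy <- function(q, p) -sum(q * log(p) + (1 - q) * log(1 - p))

# Binomial negative log-likelihood on the original counts
binom_nll <- function(y, n, p) -sum(dbinom(y, n, p, log = TRUE))

lhs <- binom_nll(y, n_trials, p)
# The lchoose term is constant in p, so both losses share the same minimizer
rhs <- n_trials * cross_entropy(q, p) - sum(lchoose(n_trials, y))
print(all.equal(lhs, rhs))
```

So a likelihood that accepts labels in [0, 1] with the same logistic loss would cover the binomial case whenever n is the same for every row.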
This feature would solve a special use case and I think would be helpful in other scenarios as well.
I see this post, and hope this is not a big ask, but I understand if there is hidden complexity that I cannot foresee.
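In the meantime, one workaround I can think of is to expand each (y, n) pair into n individual 0/1 rows, so that the existing 'bernoulli_logit' likelihood accepts the data. This is only a sketch under my own assumptions: the names idx, y_long, X_long and group_long are made up for illustration, the fitGPModel call just reuses the arguments from my example above, and I have not checked how well it scales when n is large.

```r
# Sketch: expand binomial counts into Bernoulli rows (base R only for the expansion).
set.seed(1)
n_obs <- 100
X <- matrix(rnorm(2 * n_obs), ncol = 2)
p <- as.vector(plogis(X %*% c(2, -2) - 1))
n <- rep(10, n_obs)
y <- rbinom(n_obs, n, p)
group_data <- sample(letters[1:4], n_obs, replace = TRUE)

# Repeat row i of the original data n[i] times
idx <- rep(seq_len(n_obs), times = n)

# For row i: y[i] ones followed by n[i] - y[i] zeros
y_long <- unlist(lapply(seq_len(n_obs), function(i) {
  rep(c(1, 0), times = c(y[i], n[i] - y[i]))
}))
X_long <- X[idx, , drop = FALSE]
group_long <- group_data[idx]

# Sanity checks: one row per trial, total successes preserved
stopifnot(length(y_long) == sum(n), sum(y_long) == sum(y))

# Fit only if gpboost is installed; same arguments as in the failing call above
if (requireNamespace("gpboost", quietly = TRUE)) {
  gp_model <- gpboost::fitGPModel(group_data = group_long,
                                  likelihood = "bernoulli_logit",
                                  y = y_long, X = X_long)
}
```

The expanded rows keep the group label of their original row, so they share the same random effect. Native support for proportions would still be nicer, since this multiplies the number of rows by n.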