proportion outcome between [0,1] #75

@mikejacktzen

Hi @fabsig, thank you for your work; this sounds like an exciting method.

IIRC, you currently support only binary 0/1 outcomes with a logistic link (based on a Ctrl+F search for 'logit' in likelihoods.h):

https://github.com/fabsig/GPBoost/blob/18f32760ac8617e3db65e1b0993fc4f00c1017be/include/GPBoost/likelihoods.h

Is there a small modification that would enable GPBoost to handle proportion outcomes in [0, 1]?

```r
# in R
library(gpboost)
set.seed(1)

X = matrix(rnorm(2*100), ncol=2)
b = c(2, -2)
eta = X%*%b - 1
p = plogis(eta)
n = rep(10, length(p))
# Simulate y wins in n games
y = rbinom(100, n, p)

outcome_xgb = y/n
group_data = sample(letters[1:4], length(p), replace=TRUE)

gp_model <- fitGPModel(group_data=group_data, likelihood="bernoulli_logit", y=outcome_xgb, X=X)
```

As expected, this fails with the following error:

```
Error in gpb.call("GPB_OptimLinRegrCoefCovPar_R", ret = NULL, private$handle, :
[GPBoost] [Fatal] Response variable (label) data needs to be 0 or 1 for likelihood of type 'bernoulli_logit'.
```
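Until proportion labels are supported, one possible workaround when the number of trials n is known (as in the simulation) is to expand each (y, n) pair into n Bernoulli rows: y ones and n - y zeros, each keeping the same covariates and group. The binomial log-likelihood equals the sum of the Bernoulli log-likelihoods up to the constant log(choose(n, y)), so the fit should in principle be the same. The helper `expand_binomial` is hypothetical (not part of GPBoost); this base-R sketch only checks the log-likelihood identity numerically.

```r
# Hypothetical helper (not part of GPBoost): expand binomial counts (y, n)
# into Bernoulli rows, repeating covariates and group labels for each trial.
expand_binomial <- function(y, n, X, group) {
  stopifnot(length(y) == length(n), all(y >= 0), all(y <= n))
  row <- rep(seq_along(n), times = n)   # original row index of each trial
  # within original row i: y[i] ones followed by n[i] - y[i] zeros
  y_bin <- unlist(mapply(function(s, m) c(rep(1, s), rep(0, m - s)),
                         y, n, SIMPLIFY = FALSE), use.names = FALSE)
  list(y = y_bin, X = X[row, , drop = FALSE], group = group[row], row = row)
}

# A simulation like the one above
set.seed(1)
X <- matrix(rnorm(2 * 100), ncol = 2)
b <- c(2, -2)
p <- plogis(drop(X %*% b - 1))
n <- rep(10, length(p))
y <- rbinom(100, n, p)
group_data <- sample(letters[1:4], length(p), replace = TRUE)

long <- expand_binomial(y, n, X, group_data)

# Binomial log-likelihood minus the constant log(choose(n, y)) ...
ll_binom <- sum(dbinom(y, n, p, log = TRUE) - lchoose(n, y))
# ... equals the Bernoulli log-likelihood of the expanded 0/1 data
ll_bern <- sum(dbinom(long$y, 1, p[long$row], log = TRUE))
all.equal(ll_binom, ll_bern)
```

The expanded data could then presumably be passed to the same call, e.g. `fitGPModel(group_data = long$group, likelihood = "bernoulli_logit", y = long$y, X = long$X)`, though I have not verified this in GPBoost. It multiplies the number of rows by n and does not help when only a continuous proportion (without n) is observed.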

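I was hoping it would work similarly to xgboost with the 'reg:logistic' objective, which accepts labels anywhere in [0, 1] rather than only 0/1 (see this discussion of the difference between logistic regression and binary logistic regression):
https://datascience.stackexchange.com/questions/10595/difference-between-logistic-regression-and-binary-logistic-regression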


```r
library(xgboost)
# ?xgboost

# objective = "reg:logistic" accepts labels in [0, 1]
param <- list(max_depth = 2, eta = 1, verbosity = 0, nthread = 2, objective = "reg:logistic")

# convert pair (y, n) into scalar proportion (y/n)
outcome_xgb = y/n
dtrain <- xgb.DMatrix(X, label = outcome_xgb)
# for illustration only: the "eval" set is the same data as the training set
dtest <- xgb.DMatrix(X, label = outcome_xgb)
watchlist <- list(train = dtrain, eval = dtest)

bst <- xgb.train(param, dtrain, nrounds = 2, watchlist)
pred <- predict(bst, dtrain)
head(pred)
```
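For reference, 'reg:logistic' can take fractional labels because the logistic (cross-entropy) loss -[y log(p) + (1 - y) log(1 - p)], with p = plogis(margin), is well defined for any y in [0, 1], and its gradient and Hessian with respect to the raw margin, p - y and p(1 - p), are the same formulas used for 0/1 labels. This base-R sketch (the function names are my own, for illustration) computes them and checks them against finite differences at fractional labels.

```r
# Logistic (cross-entropy) loss for a label in [0, 1] and raw margin,
# where p = plogis(margin); computed on the log scale for numerical stability.
logistic_loss <- function(margin, label) {
  -(label * plogis(margin, log.p = TRUE) +
      (1 - label) * plogis(margin, lower.tail = FALSE, log.p = TRUE))
}

# Analytic gradient and Hessian of the loss with respect to the margin.
logistic_grad_hess <- function(margin, label) {
  p <- plogis(margin)
  list(grad = p - label,      # first derivative
       hess = p * (1 - p))    # second derivative (does not depend on label)
}

# Fractional labels, e.g. proportions y/n
margin <- c(-2, -0.5, 0, 1.5)
label  <- c(0.3, 0.7, 0.5, 0.9)
gh <- logistic_grad_hess(margin, label)

# Central finite differences as an independent check
h <- 1e-4
f_plus  <- logistic_loss(margin + h, label)
f_0     <- logistic_loss(margin, label)
f_minus <- logistic_loss(margin - h, label)
num_grad <- (f_plus - f_minus) / (2 * h)
num_hess <- (f_plus - 2 * f_0 + f_minus) / h^2

cbind(grad = gh$grad, num_grad, hess = gh$hess, num_hess)
```

These two vectors are what a custom objective returns to xgboost (a list with `grad` and `hess`). If GPBoost's non-Gaussian likelihoods rely on the analogous first and second derivatives of the log-likelihood (an assumption on my part; I have not traced this through likelihoods.h), the 0/1 check on the label may be the main obstacle.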

This feature would solve my specific use case, and I think it would be helpful in other scenarios as well.

I saw the related issue below and hope this is not a big ask, but I understand if there is hidden complexity that I cannot foresee:

#7

Labels: enhancement (New feature or request)
