Update Section 13.2 in the initial survival analysis example #884

Open
wants to merge 3 commits into
base: main
@@ -128,12 +128,14 @@ By using `po("learner_cv")` for internal resampling and `po("tunethreshold")` to
## Survival Analysis {#sec-survival}

`r index("Survival analysis")` is a field of statistics concerned with trying to predict/estimate the time until an event takes place.
-This predictive problem is unique as survival models are trained and tested on data that may include 'censoring', which occurs when the event of interest does *not* take place.
+This predictive problem is unique because survival models are trained and tested on data that may include censoring, which occurs when the exact event time is not observed.
+The most common type of censoring is 'right censoring', which happens when the event of interest has not yet occurred by the time observation ends.
+For the rest of this section, censoring means right censoring unless otherwise stated.
Survival analysis can be hard to explain in the abstract, so as a working example consider a marathon runner in a race.
Here the 'survival problem' is trying to predict the time when the marathon runner finishes the race.
-However, if the event of interest does not take place (e.g., the marathon runner gives up and does not finish the race), they are said to be censored.
+However, if the organizers stop recording finish times after a certain point, then any runner still running beyond that time will be censored.
Instead of throwing away information about censored events, survival analysis datasets include a status variable that provides information about the 'status' of an observation.
-So in our example, we might write the runner's outcome as $(4, 1)$ if they finish the race at four hours, otherwise, if they give up at two hours we would write $(2, 0)$.
+In our example, we might record a runner's outcome as $(3, 1)$ if they finish the race at three hours and we observe it, and as $(4, 0)$ if they are still running at four hours when we stop observing.

The key to modeling in survival analysis is that we assume there exists a hypothetical time at which the marathon runner would have finished had they not been censored.
It is then the job of a survival learner to estimate what the true survival time would have been for a similar runner, assuming they are *not* censored (see @fig-censoring).
Mathematically, this is represented by the hypothetical event time, $Y$, the hypothetical censoring time, $C$, the observed outcome time, $T = \min(Y, C)$, the event indicator $\Delta := (T = Y)$, and as usual some features, $X$.
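
As a small worked illustration of this notation (a sketch, not part of the proposed change), suppose recording stops at $C = 4$ hours for every runner. The first runner's finish time of three hours matches the $(3, 1)$ outcome above. The second runner's hypothetical finish time of five hours is an assumed value chosen only for illustration, since in practice it would never be observed.

$$
\begin{aligned}
\text{Runner 1: } & Y = 3,\ C = 4 \;\Rightarrow\; T = \min(3, 4) = 3,\ \Delta = 1 \;\Rightarrow\; (T, \Delta) = (3, 1) \\
\text{Runner 2: } & Y = 5,\ C = 4 \;\Rightarrow\; T = \min(5, 4) = 4,\ \Delta = 0 \;\Rightarrow\; (T, \Delta) = (4, 0)
\end{aligned}
$$

In both cases only the pair $(T, \Delta)$ and the features $X$ would appear in the dataset. For the censored runner, all we learn is that $Y > 4$.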