I am porting the functionality of this repository into Survival. This package should still be usable (under the name AcceleratedFailure, to avoid name conflicts): I will officially deprecate it when all the functionality has been ported.
To install the package, simply run
Pkg.clone("https://github.com/piever/AcceleratedFailure.jl.git")
in the Julia REPL.
Load relevant packages:
using AcceleratedFailure
using DataFrames
Load dataset and create event column. Event.time is time, whereas Event.censored is true if the data is censored and false otherwise.
filepath = joinpath(Pkg.dir("AcceleratedFailure", "examples"), "rossi.csv")
rossi = readtable(filepath)
rossi[:event] = Event.(rossi[:week], rossi[:arrest].== 0)
Run Cox regression
outcome = coxph(@formula(event ~ fin+age+race+wexp+mar+paro+prio),rossi)
And you should get this outcome (computed with Efron method for ties):
Model: Cox; Formula: event ~ fin + age + race + wexp + mar + paro + prio
Estimate Std.Error z value Pr(>|z|)
fin -0.378636 0.191352 -1.97874 0.0478
age -0.0570578 0.0219674 -2.59738 0.0094
race 0.314349 0.308045 1.02046 0.3075
wexp -0.152399 0.212026 -0.718774 0.4723
mar -0.431883 0.381726 -1.1314 0.2579
paro -0.0850866 0.195743 -0.434686 0.6638
prio 0.0907363 0.0286157 3.17086 0.0015
This package also includes Kaplan Meier estimator of the survival function, which takes a vector of events as input:
x,surv = kaplan_meier(rossi[:event])
In the output x
will be the array with the times of death, surv
the estimated cdf.
To visualize it (using Plots.jl) you can simply type:
using Plots
plot(x,surv, line = :step)
Nelson Aalen estimator for the cumulative hazard:
x, chaz = nelson_aalen(rossi[:event])
To check that everything went well, you may want to verify that cumulative = 1 - exp.(-chaz)
plot(x,surv, line = :step)
plot!(x,exp.(-chaz), line = :step)
You can also get the baseline cumulative hazard from the outcome of a Cox regression:
x, chaz = nelson_aalen(outcome)
In turn, it's always possible to get the cdf from the cumulative hazard:
chaz2cdf(x,chaz)
and the same is true for the hazard:
chaz2haz(x,chaz)
chaz2haz(x,chaz, bw)
where, in the second case, bw
is a parameter used for smoothing.
This package also supports accelerated failure time models.
using Distributions
using DataFrames
using AcceleratedFailure
Let's generate a fake dataset:
N = 50000
x = randn(N)
y = randn(N)
z = randn(N)
t1 = rand.(Gamma.(10,exp.(x-0.3y)))
t2 = rand(Gamma(15,1),N)
W = [(t2[i]>t1[i]) ? Event(t1[i], false) : Event(t2[i], true) for i = 1:N]
df = DataFrame(x = x, y = y, z = z, a = W)
Let's specify the formula and distribution (only PGamma
is implemented so far, where PGamma(params) = Gamma(exp(params[1]),exp(-params[1]))
):
res = aft(@formula(a ~ 1 + x +y + x*y+ z), df, PGamma(); tol = 1e-3)
The outcome should look something like this:
Model: Accelerated Failure Time, dist = AcceleratedFailure.PGamma(params=[2.30856]);
Formula: a ~ 1 + x + y + x * y + z
Estimate Std.Error z value Pr(>|z|)
params1 2.30856 0.00754425 306.003 <1e-99
(Intercept) 2.30315 0.00205682 1119.76 <1e-99
x 0.999012 0.00226804 440.474 <1e-99
y -0.295107 0.00200584 -147.124 <1e-99
z -0.000542461 0.00168103 -0.322695 0.7469
x & y -0.00222899 0.00215507 -1.0343 0.3010
The first coefficient is the parameter of the distribution (in this case the parameter of PGamma
is the log of the shape paramete of the corresponding Gamma
distribution). The remaining coefficients are the log of the contribution to the mean of the various covariates.
nelson_aalen
, kaplan_meier
, Cox regression and accelerated failure time models can be used with Nullable Arrays. The rows missing relevant data will be discarded.
01/02/2017
over 1 year ago
47 commits