Finding area under the curve in a Gompertz distribution in R

I am trying to fit a Gompertz model to survival data. I am using the 'flexsurv' package, and after setting up the data I use the following specification:
Gompertz.fit <- flexsurvreg(surv.age ~ Region + Sex + Income,
data = SA_Data, dist = "gompertz")
As I want to estimate the area under the curve (AUC) but couldn't find a command for it, I thought about estimating the AUC "by hand", for which I need the gamma (shape) parameter. However, I can't seem to find it. When I try survreg to estimate the parameters I get the following error:
> result.survreg.0 <- survreg(Surv(Age_At_DC, status) ~ 1, data = SA_Data,
                              dist = "gompertz")
Error in match.arg(dist, names(survreg.distributions)) :
  'arg' should be one of “extreme”, “logistic”, “gaussian”, “weibull”,
  “exponential”, “rayleigh”, “loggaussian”, “lognormal”, “loglogistic”, “t”
Has anyone else estimated the AUC with a gompertz distribution in R?
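Note that survreg does not support a Gompertz distribution, which is why match.arg complains; the parameter estimates have to come from the flexsurv fit itself (they are stored in the res slot, or shown when you print the model). Here is a minimal sketch of computing the AUC by numerically integrating the fitted survival curve. It assumes an intercept-only refit (fit0 is a name introduced here; with covariates in the model, the rate parameter differs across covariate patterns):
library(flexsurv)
# intercept-only fit so there is a single shape/rate pair
fit0 <- flexsurvreg(surv.age ~ 1, data = SA_Data, dist = "gompertz")
pars  <- fit0$res            # estimates matrix with rows "shape" and "rate"
shape <- pars["shape", "est"]
rate  <- pars["rate", "est"]
# AUC = integral of S(t) from 0 to Inf; note this diverges if shape < 0,
# because the Gompertz survival curve then plateaus above zero
auc <- integrate(function(t) pgompertz(t, shape = shape, rate = rate,
                                       lower.tail = FALSE),
                 lower = 0, upper = Inf)
auc$value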

Related

How to estimate coefficients for linear regression model on CEM matched data using `R` `cem` package?

I'm working on a project where we'd like to run a follow-up linear regression model on treatment-control data where the treatments have been matched to the controls using the cem package to perform coarsened exact matching:
match <- cem(treatment="cohort", data=df, drop=c("member_id","period","cohort_period"))
est <- att(match, total_cost ~ cohort + period + cohort_period, data = df)
where I'd like to estimate the coefficient and 95% CI on the "cohort_period" interaction term. It seems the att function in the cem package only estimates the coefficient for the specified treatment variable (in this case, "cohort") while adjusting for other variables in the regression.
Is there a way to return the coefficients and 95% CIs for the other regression terms?
Figured it out! I was using the wrong package: instead of cem, I discovered the MatchIt and Zelig packages allow me to perform both exact matching and parametric regression on the matched data:
library(MatchIt)
library(Zelig)
matched_df <- matchit(cohort ~ age_catg + sex + market_code + risk_score_catg, method="exact", data=df)
matched_df_reg <- zelig(total_cost ~ cohort + period + cohort_period, data = match.data(matched_df), model = "ls")
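If you only need the coefficients and 95% CIs, a plain lm() on the matched data works too, and confint() then returns intervals for every term, including cohort_period. A minimal sketch, assuming match.data() returns the matched rows with a weights column (as MatchIt documents):
m_data <- match.data(matched_df)   # matched rows plus a "weights" column
fit <- lm(total_cost ~ cohort + period + cohort_period,
          data = m_data, weights = weights)
summary(fit)$coefficients   # estimates for every regression term
confint(fit, level = 0.95)  # 95% CIs, including cohort_period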

How do you obtain the zero-inflation parameter (pz) of a zero-inflated NB model using glmmADMB?

I have run a zero-inflated negative binomial model using the glmmADMB package in R. From what I understand, pz is the zero-inflation parameter, and the package fits it along with the rest of the model: it searches for the pz value that best fits your data, starting from pz = 0.2. This starting value is the default and can be changed.
After you run the model, does anyone know how to find what pz value is chosen for the data?
The zero-inflation estimate can be obtained (along with its standard deviation) from the model object. See below, using the built-in Owls data from the glmmADMB package:
library(glmmADMB)
# munge data
Owls = transform(Owls, Nest = reorder(Nest, NegPerChick),
logBroodSize = log(BroodSize), NCalls = SiblingNegotiation)
# fit model
fit_zinb = glmmadmb(NCalls ~ (FoodTreatment + ArrivalTime) * SexParent +
offset(logBroodSize) + (1 | Nest),
data = Owls, zeroInflation = TRUE,
family = "nbinom")
# overall summary, check for match
summary(fit_zinb)
# zero-inflation estimate
fit_zinb$pz
# zero-inflation standard deviation
fit_zinb$sd_pz
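Since sd_pz is reported alongside the estimate, you can also sketch an approximate Wald 95% interval for pz (this assumes the normal approximation is reasonable on the pz scale):
# approximate Wald 95% CI for the zero-inflation parameter
fit_zinb$pz + c(-1.96, 1.96) * fit_zinb$sd_pz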

Format in R for point prediction of survival analysis

I am befuddled by the format needed to perform a simple prediction using R's survival package:
library(survival)
lung.surv <- survfit(Surv(time,status) ~ 1, data = lung)
So fitting a simple exponential regression (for example purposes only) is:
lung.reg <- survreg(Surv(time,status) ~ 1, data = lung, dist="exponential")
How would I predict the percent survival at time=400?
When I use the following:
myPredict400 <- predict(lung.reg, newdata=data.frame(time=400), type="response")
I get the following:
> myPredict400
       1
421.7758
I was expecting something like 37%, so I am missing something pretty obvious.
The point of this survival function is to find an empirical distribution that fits the survival times. Essentially you are associating each survival time with a probability. Once you have that distribution, you can pick out the survival rate for a given time.
Try this:
library(survival)
lung.reg <- survreg(Surv(time,status) ~ 1, data = lung) # because you want a distribution
pct <- 1:99/100 # this creates the empirical survival probabilities
myPredict400 <- predict(lung.reg, newdata=data.frame(time=400),type='quantile', p=pct)
indx <- which.min(abs(myPredict400 - 400)) # find the survival time closest to 400
print(1 - pct[indx]) # 0.39
Straight from the help docs, here's a plot of it:
matplot(myPredict400, 1-pct, xlab="Months", ylab="Survival", type='l', lty=c(1,2,2), col=1)
Edited
You're basically fitting a regression to a distribution of probabilities (hence 1...99 out of 100). If you make it go to 100, then the last value of your prediction is Inf, because the survival time at the 100th percentile is infinite. This is what the type = 'quantile' and p = pct arguments control.
For example, setting pct = 1:999/1000 gives you much more finely spaced values for the prediction (myPredict400). Also, if you set pct to some value that's not a proper probability (i.e. less than 0 or more than 1) you'll get an error. I suggest you play with these values and see how they impact your survival rates.
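For the exponential fit from the question there is also a closed-form check: predict(..., type = "response") returns the mean survival time (the 421.7758 above), and an exponential survival function is S(t) = exp(-t / mean). A quick sketch, refitting with dist = "exponential" as in the question:
lung.exp <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")
mu <- exp(coef(lung.exp))  # mean survival time, 421.7758 as in the question
exp(-400 / mu)             # S(400) = exp(-400/421.78) ~ 0.39, matching the answer above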

R: varying-coefficient GAMM models in mgcv - extracting 'by' variable coefficients?

I am creating a varying-coefficient GAMM using 'mgcv' in R with a continuous 'by' variable. However, I am having difficulty locating the parameter estimate of the effect of the 'by' variable. In this example we estimate the spatially varying effect of temperature t on sole eggs (i.e. how the linear effect of temperature on sole eggs changes across space):
require(mgcv)
require(gamair)
data(sole)
b = gam(eggs ~ s(la,lo) + s(la,lo, by = t), data = sole)
We can then plot the predicted effects of s(la,lo, by = t) against the predictor t:
pred <- predict(b, type = "terms", se.fit =T)
by.variable.prediction <- pred[[1]][,2]
plot(x= sole$t, y = by.variable.prediction)
However, I can't find a listing/function with the parameter estimates of the 'by' variable t for each sampling location. summary(), coef(), and predict() do not give you the parameter estimates.
Any help would be appreciated!
The coefficient for the variable t at a given location is the predicted value of the term where t is equal to 1, conditional on the latitude and longitude. So one way to get the coefficient/parameter estimate for t at each latitude and longitude is to construct your own data frame with a grid of latitude/longitude combinations, set t = 1, and run predict.gam on that (rather than running predict.gam on the data used to fit the model, as you have done). So:
preddf <- expand.grid(list(la=seq(min(sole$la), max(sole$la), length.out=100),
lo=seq(min(sole$lo), max(sole$lo), length.out=100),
t=1))
preddf$parameter <- predict(b, preddf, type="response")
And then if you want to visualize this coefficient over space, you could graph it with ggplot2.
library(ggplot2)
ggplot(preddf) +
geom_tile(aes(x=lo, y=la, fill=parameter))
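One caveat: with type = "response" the prediction also includes the intercept and the plain s(la,lo) smooth, not just the varying-coefficient term. To isolate the by-variable term itself, with standard errors, you can use type = "terms" on the same t = 1 grid; the column label "s(la,lo):t" is what mgcv typically uses for this term, but check colnames() on your own fit:
pred_terms <- predict(b, preddf, type = "terms", se.fit = TRUE)
colnames(pred_terms$fit)  # locate the "s(la,lo):t" column
preddf$t_coef <- pred_terms$fit[, "s(la,lo):t"]    # coefficient of t at each location
preddf$t_se   <- pred_terms$se.fit[, "s(la,lo):t"] # its standard error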

R: How to get confidence interval for multinomial logit?

Let me use the UCLA example on multinomial logit as a running example:
library(nnet)
library(foreign)
ml <- read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")
ml$prog2 <- relevel(ml$prog, ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
dses <- data.frame(ses = c("low", "middle", "high"), write = mean(ml$write))
predict(test, newdata = dses, "probs")
I wonder how I can get a 95% confidence interval?
This can be accomplished with the effects package, which I showcased for another question at Cross Validated here.
Let's look at your example.
library(nnet)
library(foreign)
ml <- read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")
ml$prog2 <- relevel(ml$prog, ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
Instead of using predict() from base R, we use Effect() from the effects package:
require(effects)
fit.eff <- Effect("ses", test, given.values = c("write" = mean(ml$write)))
data.frame(fit.eff$prob, fit.eff$lower.prob, fit.eff$upper.prob)
prob.academic prob.general prob.vocation L.prob.academic L.prob.general L.prob.vocation U.prob.academic
1 0.4396845 0.3581917 0.2021238 0.2967292 0.23102295 0.10891758 0.5933996
2 0.4777488 0.2283353 0.2939159 0.3721163 0.15192359 0.20553211 0.5854098
3 0.7009007 0.1784939 0.1206054 0.5576661 0.09543391 0.05495437 0.8132831
U.prob.general U.prob.vocation
1 0.5090244 0.3442749
2 0.3283014 0.4011175
3 0.3091388 0.2444031
If we want to, we can also plot the predicted probabilities with their respective confidence intervals using the facilities in effects.
plot(fit.eff)
Simply use the confint function on your model object.
ci <- confint(test, level=0.95)
Note that confint is a generic function and a specific version is run for multinom, as you can see by running
> methods(confint)
[1] confint.default confint.glm* confint.lm* confint.multinom*
[5] confint.nls*
EDIT: As for the matter of calculating confidence intervals for the predicted probabilities, I quote from https://stat.ethz.ch/pipermail/r-help/2004-April/048917.html:
Is there any possibility to estimate confidence intervalls for the
probabilties with the multinom function?
No, as confidence intervals (sic) apply to single parameters not
probabilities (sic). The prediction is a probability distribution, so
the uncertainty would have to be some region in Kd space, not an interval.
Why do you want uncertainty statements about predictions (often called
tolerance intervals/regions)? In this case you have an event which
happens or not and the meaningful uncertainty is the probability
distribution. If you really have need of a confidence region, you could
simulate from the uncertainty in the fitted parameters, predict and
summarize somehow the resulting empirical distribution.
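Following that suggestion, here is a minimal sketch of the simulation approach: draw coefficient vectors from a multivariate normal with mean coef(test) and covariance vcov(test), compute predicted probabilities for dses under each draw, and summarize the draws with quantiles. The row-major stacking of the coefficient matrix and the factor levels of ses are assumptions you should verify against your own fit:
library(MASS)  # for mvrnorm
set.seed(1)
# make sure the factor levels in the new data match the fitted model
dses$ses <- factor(dses$ses, levels = levels(ml$ses))
X <- model.matrix(~ ses + write, data = dses)
b    <- as.vector(t(coef(test)))  # stack the coefficient matrix row by row
sims <- mvrnorm(5000, mu = b, Sigma = vcov(test))
pred_probs <- function(beta) {
  eta <- X %*% t(matrix(beta, nrow = nrow(coef(test)), byrow = TRUE))
  p   <- cbind(1, exp(eta))  # reference category "academic" has eta = 0
  p / rowSums(p)
}
draws <- vapply(seq_len(nrow(sims)), function(i) pred_probs(sims[i, ]),
                FUN.VALUE = matrix(0, nrow(dses), 3))
# 2.5% and 97.5% quantiles for each row of dses and each outcome category
apply(draws, c(1, 2), quantile, probs = c(0.025, 0.975))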
