survreg with left truncated (delayed entry) data - r

I am trying to fit a delayed entry parametric regression model for a Poisson process with Weibull baseline rate. It doesn't appear that R's survreg function supports left truncated data (I get the error: start-stop type Surv objects are not supported). Is there an alternate approach/R package that I could use to do this?

You may want to try something like:
flexsurv::flexsurvreg(formula = Surv(starttime, stoptime, status) ~ x1 + x2,
                      data = data, dist = "weibull")
Check the options the package offers; one of them may fit your need.

The other way would be to define a truncated distribution and use interval values. See the survival package's example of a truncated Gaussian distribution, aka the "tobit" model: https://www.rdocumentation.org/packages/survival/versions/2.44-1.1/topics/tobin
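For reference, the left-censored ("tobit") example from that help page looks roughly like this, using the tobin dataset shipped with survival (a sketch of the documented example, not of the delayed-entry model asked about):
library(survival)
# Left-censored Gaussian ("tobit") fit from the tobin example
tfit <- survreg(Surv(durable, durable > 0, type = "left") ~ age + quant,
                data = tobin, dist = "gaussian")
summary(tfit)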

Related

Generalized Linear Model (GLM) in R

I have a response variable (A), which I transformed (logA), and a predictor (B) from data (X); both are continuous. How do I check the linearity between the two variables using a Generalized Additive Model (GAM) in R? I use the following code
model <- gamlss(logA ~ pb(B) , data = X, trace = F)
but I am not sure about it. Can I add family = Poisson to the call when logA is continuous in a GLM? Any thoughts on this?
Thanks in advance
If your dependent variable is a count variable, you can use family = PO() without a log transformation. With family = PO() a log link is already applied, so the mean is modelled on the log scale. See the help page for gamlss.family and also section 2.1 of the vignette on count regression.
So it will go like:
library(gamlss)
fit = gamlss(gear ~ pb(mpg),data=mtcars,family=PO())
You can see that the predictions are log transformed and you need to take the exponential:
with(mtcars,plot(mpg,gear))
points(mtcars$mpg,exp(predict(fit,what="mu")),col="blue",pch=20)

fit weibull modified in R 3 parameter

I want to fit the following data to a Weibull distribution function multiplied by a constant (b1):
data: enter link description here
y = b1*(1 - exp(-(x/b2)^b3))
However, I could not find a solution using the nls function in R.
Could someone guide me down the path to follow in order to find a solution?
The code used is the following:
ajuste_cg <- nls(y ~ b1*(1 - exp(-((x/b2)^b3))), data = d,
                 start = list(b1 = 1000, b2 = 140, b3 = 20),
                 trace = TRUE, control = list(maxiter = 10000000))
Thanks!
I suggest you use the survival package. It is made for fitting parametric survival regressions (Weibull models included, of course). Here's the code:
library(survival)
weibull_model = survreg(Surv(time, event) ~ explanatory_variables, dist = "weibull")
The Surv() object that you see instead of a y is an R "survival object", and works like your dependent variable in a survival regression. The time and event variables must represent duration and event occurrence (0 or 1), respectively.
Please replace explanatory_variables with your appropriate set of variables.
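As a concrete illustration (my own example, using the lung dataset that ships with survival rather than your data):
library(survival)
# Weibull AFT model: time is the follow-up time, status the event indicator
# (Surv() accepts 0/1 or 1/2 coding for the event)
weibull_model <- survreg(Surv(time, status) ~ age + sex,
                         data = lung, dist = "weibull")
summary(weibull_model)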

How to extract the value of the loss function of Cox models from glmnet in R?

I fit some data using a Cox model via the glmnet R package, and my little R example is:
library(fastcox); data(FHT); attach(FHT)
library(glmnet)
library(survival)
fit = glmnet(x,Surv(y,status),family="cox",alpha=1)
From the help document, we know glmnet fits penalized models of the form
-loglik/nobs + λ*penalty
i.e., objective function = loss function + penalty function.
I want to fetch -loglik/nobs (the loss function value, i.e. the negative partial log-likelihood of the fitted model, or its two-term Taylor series expansion) from the fit object.
Any ideas? Thanks.
BTW, following the -loglik/nobs + λ*penalty formulation, we also tried
fit0 = glmnet(x,Surv(y,status),family="cox",alpha=1,lambda=0)
but it shows errors.
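One possible sketch (my own assumption, not from the glmnet docs quoted above): if deviance(fit) returns 2*(loglik_sat - loglik) along the lambda path and the saturated Cox partial log-likelihood is zero, the loss can be backed out from the deviance. The scaling by nobs should be double-checked against the documentation of deviance.glmnet.
library(glmnet)
library(survival)
library(fastcox); data(FHT)

fit <- glmnet(FHT$x, Surv(FHT$y, FHT$status), family = "cox", alpha = 1)

# Assumption: deviance(fit) = 2*(loglik_sat - loglik) and loglik_sat = 0 for
# the Cox partial likelihood, so -loglik/nobs = deviance(fit) / (2 * nobs)
loss <- deviance(fit) / (2 * fit$nobs)   # one value per lambda on the path
plot(fit$lambda, loss, type = "l", xlab = "lambda", ylab = "-loglik/nobs")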

How to get coefficients and their confidence intervals in mixed effects models?

In lm and glm models, I use functions coef and confint to achieve the goal:
m = lm(resp ~ 0 + var1 + var1:var2) # var1 categorical, var2 continuous
coef(m)
confint(m)
Now I have added a random effect to the model, using mixed effects models via the lmer function from the lme4 package. But then the coef and confint functions no longer work for me!
> mix1 = lmer(resp ~ 0 + var1 + var1:var2 + (1|var3))
# var1, var3 categorical, var2 continuous
> coef(mix1)
Error in coef(mix1) : unable to align random and fixed effects
> confint(mix1)
Error: $ operator not defined for this S4 class
I tried to google and to read the docs, but with no result. Please point me in the right direction.
EDIT: I was also thinking whether this question fits more to https://stats.stackexchange.com/ but I consider it more technical than statistical, so I concluded it fits best here (SO)... what do you think?
Not sure when it was added, but now confint() is implemented in lme4.
For example the following example works:
library(lme4)
m = lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
confint(m)
There are two newer packages, lmerTest and lsmeans, that can calculate 95% confidence limits for lmer and glmer output; maybe you can look into those? coefplot2 can do it too, though (as Ben points out below) in a less sophisticated way: from the standard errors of the Wald statistics, as opposed to the Kenward-Roger and/or Satterthwaite df approximations used in lmerTest and lsmeans. It is just a shame that there are still no built-in plotting facilities in lsmeans (as there are in the effects package, which, by the way, also returns 95% confidence limits for lmer and glmer objects, but does so by refitting the model without any of the random factors, which is evidently not correct).
I suggest that you use good old lme (in the nlme package). It has confint, and if you need confint of contrasts, there is a series of choices (estimable in gmodels, contrast in the contrast package, glht in multcomp).
Why p-values and confint are absent in lmer: see http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76742.html .
Assuming a normal approximation for the fixed effects (which confint would also have done), we can obtain 95% confidence intervals as
estimate ± 1.96 * standard error.
The following does not apply to the variance components/random effects.
library("lme4")
mylm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
# standard error of coefficient
days_se <- sqrt(diag(vcov(mylm)))[2]
# estimated coefficient
days_coef <- fixef(mylm)[2]
upperCI <- days_coef + 1.96*days_se
lowerCI <- days_coef - 1.96*days_se
I'm going to add a bit here. If m is a fitted (g)lmer model (most of these work for lme too):
fixef(m) is the canonical way to extract coefficients from mixed models (this convention began with nlme and has carried over to lme4)
you can get the full coefficient table with coef(summary(m)); if you have loaded lmerTest before fitting the model, or convert the model after fitting (and then loading lmerTest) via coef(summary(as(m,"merModLmerTest"))), then the coefficient table will include p-values. (The coefficient table is a matrix; you can extract the columns via e.g. ctab[,"Estimate"], ctab[,"Pr(>|t|)"], or convert the matrix to a data frame and use $-indexing.)
As stated above you can get likelihood profile confidence intervals via confint(m); these may be computationally intensive. If you use confint(m, method="Wald") you'll get the standard +/- 1.96SE confidence intervals. (lme uses intervals(m) instead of confint().)
If you prefer to use broom.mixed:
tidy(m,effects="fixed") gives you a table with estimates, standard errors, etc.
tidy(as(m,"merModLmerTest"), effects="fixed") (or fitting with lmerTest in the first place) includes p-values
adding conf.int=TRUE gives (Wald) CIs
adding conf.method="profile" (along with conf.int=TRUE) gives likelihood profile CIs
You can also get confidence intervals by parametric bootstrap (method="boot"), which is considerably slower but more accurate in some circumstances.
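Putting a few of those pieces together, a minimal sketch using the sleepstudy example from above (assuming lme4 and broom.mixed are installed):
library(lme4)
library(broom.mixed)

m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

fixef(m)                                      # fixed-effect estimates
confint(m, method = "Wald")                   # quick +/- 1.96*SE intervals
confint(m, method = "profile")                # likelihood profile intervals (slower)
confint(m, method = "boot", nsim = 500)       # parametric bootstrap intervals
tidy(m, effects = "fixed", conf.int = TRUE)   # tidy table with Wald CIs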
To find the coefficients, you can simply use the summary function of lme4:
m = lmer(resp ~ 0 + var1 + var1:var2 + (1|var3)) # var1, var3 categorical, var2 continuous
m_summary <- summary(m)
To get all the coefficients:
m_summary$coefficients
If you want the confidence interval, multiply the standard error by 1.96; this gives the half-width to add to and subtract from each estimate:
CI <- m_summary$coefficients[, "Std. Error"] * 1.96
print(CI)
I'd suggest the tab_model() function from the sjPlot package as an alternative. Clean and readable output, ready for markdown; see the package reference and examples for details.
For those more visually inclined, plot_model() from the same package might come in handy too.
An alternative solution is the parameters package, using its model_parameters() function.
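For completeness, a small sketch of those alternatives on the same sleepstudy model (assuming sjPlot and parameters are installed):
library(lme4)
library(sjPlot)        # tab_model(), plot_model()
library(parameters)    # model_parameters()

m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

tab_model(m)           # HTML table of estimates with 95% CIs
plot_model(m)          # forest-style plot of the estimates
model_parameters(m)    # coefficient table with CIs in the console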

GLM with autoregressive term to correct for serial correlation

I have a stationary time series to which I want to fit a linear model with an autoregressive term to correct for serial correlation, i.e. using the formula A_t = c1*B_t + c2*C_t + u_t, where u_t = r*u_{t-1} + e_t (u_t is an AR(1) term to correct for serial correlation in the error terms).
Does anyone know what to use in R to model this?
Thanks
Karl
The GLMMarp package will fit these models. If you just want a linear model with Gaussian errors, you can do it with the arima() function where the covariates are specified via the xreg argument.
There are several ways to do this in R. Here are two examples using the "Seatbelts" time series dataset in the datasets package that comes with R.
The arima() function comes in package:stats, which is included with R. The function takes an argument of the form order=c(p, d, q) where you can specify the order of the auto-regressive, integrated, and moving-average components. In your question, you suggest that you want to create an AR(1) model to correct for first-order autocorrelation in the errors and that's it. We can do that with the following command:
arima(Seatbelts[, "drivers"], order = c(1, 0, 0),
      xreg = Seatbelts[, c("kms", "PetrolPrice", "law")])
The value for order specifies that we want an AR(1) model. The xreg component should be a series of other Xs we want to add as part of a regression. The output looks a little bit like the output of summary.lm() turned on its side.
Another alternative, which might be more familiar given the way you've fit regression models before, is to use gls() in the nlme package. The following code turns the Seatbelts time series object into a data frame and then adds a new column (t) that is just a counter along the sorted time series:
Seatbelts.df <- data.frame(Seatbelts)
Seatbelts.df$t <- 1:(dim(Seatbelts.df)[1])
The two lines above are only getting the data in shape. Since the arima() function is designed for time series, it can read time series objects more easily. To fit the model with nlme you would then run:
library(nlme)
m <- gls(drivers ~ kms + PetrolPrice + law,
data=Seatbelts.df,
correlation=corARMA(p=1, q=0, form=~t))
summary(m)
The line that begins with correlation is how you pass the ARMA correlation structure to gls(). The results won't be exactly the same because arima() uses maximum likelihood to estimate models and gls() uses restricted maximum likelihood by default. If you add method="ML" to the call to gls(), you will get estimates identical to those you got with the arima() function above.
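To illustrate that last point, refitting the gls() model above with maximum likelihood (same data and variables as before):
m_ml <- gls(drivers ~ kms + PetrolPrice + law,
            data = Seatbelts.df,
            correlation = corARMA(p = 1, q = 0, form = ~t),
            method = "ML")
summary(m_ml)   # estimates should now match the arima() fit above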
What is your link function?
The way you describe it sounds like a basic linear regression with autocorrelated errors. In that case, one option is to use lm to get a consistent estimate of your coefficients and use Newey-West HAC standard errors.
I'm not sure of the best answer for GLMs more generally.
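A minimal sketch of that lm + Newey-West approach, reusing the Seatbelts data from the previous answer (the choice of lmtest/sandwich here is my suggestion, not part of the original answer):
library(lmtest)     # coeftest()
library(sandwich)   # NeweyWest()

Seatbelts.df <- data.frame(Seatbelts)
ols <- lm(drivers ~ kms + PetrolPrice + law, data = Seatbelts.df)
coeftest(ols, vcov = NeweyWest(ols))   # OLS estimates with HAC standard errors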
