I have the following problem:
I'm using the RailTrail data from the mosaicData library.
I already have the coefficients of the following linear regression model, computed for the population:
lm(volume ~ hightemp + cloudcover + weekday, data = RailTrail)
Now I need to estimate the coefficients of that model from samples and build a 95% confidence interval.
So I need to compute the coefficients for each of the data samples previously generated. I was asked to use a 'for' loop, but I don't know how to fit the LR models inside it. I also need to store the coefficients obtained.
I tried the following:
trial <- list()
set.seed(101)
for(i in 1:100){
trial[[i]] <- RailTrail %>%
lm(volume ~ hightemp + cloudcover + weekday, data = RailTrail)
}
but I get the following error:
Error in xj[i] : invalid subscript type 'language'
Thank you,
Don't hesitate to ask for clarification if my request is not clear.
Francisco
Do you mean the confidence interval for the model parameters? If so, I hope this example illustrates a succinct way of getting it:
model <- lm(mpg ~ cyl + disp + gear, data = mtcars)
# tidy() can return the interval itself; confint_tidy() is deprecated in recent broom releases
broom::tidy(model, conf.int = TRUE, conf.level = 0.95)
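If the loop version is what you were asked for, here is a minimal sketch. It assumes a bootstrap is intended: resample rows with replacement, refit the model on each resample, store the coefficients, then take percentile intervals. It uses mtcars as a stand-in for RailTrail so that it runs with base R only; n_rep, boot_coefs and ci are illustrative names. The error in the question most likely comes from piping RailTrail into lm while also passing data =: the piped data frame becomes the first argument (formula) and the formula ends up in subset.

```r
set.seed(101)

n_rep <- 100                                  # number of resamples (assumption)
fit_formula <- mpg ~ cyl + disp + gear        # stand-in for the RailTrail model
coef_names <- names(coef(lm(fit_formula, data = mtcars)))

# One row per resample, one column per coefficient
boot_coefs <- matrix(NA_real_, nrow = n_rep, ncol = length(coef_names),
                     dimnames = list(NULL, coef_names))

for (i in seq_len(n_rep)) {
  idx <- sample(nrow(mtcars), replace = TRUE) # resample row indices
  fit_i <- lm(fit_formula, data = mtcars[idx, ])
  boot_coefs[i, ] <- coef(fit_i)              # store this fit's coefficients
}

# Percentile 95% interval for each coefficient
ci <- apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
ci
```

For RailTrail, swapping in the original formula and data frame should be enough. With a factor predictor such as weekday, a resample that misses a level would change the number of coefficients, so it is worth checking for that case.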
I'm performing predictive analysis where I train a model to a portion of my data and test the model with the remaining portion. I'm familiar with the MICE package and the imputation procedure using predictive mean matching.
My understanding is that the proper way to utilize imputation is to create numerous imputed data sets, fit a model to each of those imputed data sets, then combine the coefficients across all of those fitted models into one single model. I know how to do this and view the summary of the coefficients with which I can perform inference on the variables. However, that is not my objective; I need to end up with a single model that I can use to predict new values.
Simply put, when I try to use the predict function with this model I got from using MICE, it doesn't work.
Any suggestions? I am coding this in R.
Edit: using the airquality data set as an example, my code looks like this:
imputed_data <- mice(airquality, method = c(rep("pmm", 6)), m = 5, maxit = 5)
model <- with(imputed_data, lm(Ozone ~ Solar.R + Wind + Temp + Month + Day))
pooled_model <- pool(model)
This gives me a pooled model across my 5 imputed data sets. However, I am unable to use the predict function with this model. When I then execute:
predict(pooled_model, newdata = airquality)
I get this error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('mira', 'matrix')"
Not sure exactly what you're looking for, but something like this might work:
library(mice)
library(mitools)
data(mtcars)
mtcars$qsec[c(4,6,8,21)] <- NA
imps <- mice(mtcars, m=10)
comps <- lapply(1:imps$m, function(i)complete(imps, i))
mods <- lapply(comps, function(x)lm(qsec ~hp + drat + wt, data=x))
pmod <- MIcombine(mods)
pmod$coefficients
#> (Intercept) hp drat wt
#> 18.15389098 -0.02570887 0.11434023 0.92348390
newvals <- data.frame(hp=300, drat=4, wt=2.58)
X <- model.matrix(~hp + drat + wt, data=newvals)
preds <- X %*% pmod$coefficients
preds
#> [,1]
#> 1 13.28118
Created on 2023-02-01 by the reprex package (v2.0.1)
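Another option, sketched here under the assumption that only point predictions are needed: predict from each of the fitted models stored in the mira object (its analyses element) and average the results. For a linear model this should give the same point predictions as multiplying by the averaged coefficients. new_obs is a made-up pair of observations, and the seed is arbitrary.

```r
library(mice)

# Impute and fit one lm per imputed data set (as in the question)
imputed_data <- mice(airquality, method = "pmm", m = 5, maxit = 5,
                     seed = 123, printFlag = FALSE)
model <- with(imputed_data, lm(Ozone ~ Solar.R + Wind + Temp + Month + Day))

# Hypothetical new observations to predict for
new_obs <- data.frame(Solar.R = c(200, 150), Wind = c(10, 8),
                      Temp = c(80, 75), Month = c(7, 8), Day = c(15, 3))

# One column of predictions per imputed-data fit, then average across fits
per_fit <- sapply(model$analyses, predict, newdata = new_obs)
avg_pred <- rowMeans(per_fit)
avg_pred
```

This does not give pooled prediction intervals; it only combines the point predictions.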
I would like to test the main effect of a categorical variable using a permutation test on a likelihood ratio test. I have a continuous outcome and a dichotomous grouping predictor and a categorical time predictor (Day, 5 levels).
Data is temporarily available in rda format via this Drive link.
library(lme4)
lmer1 <- lmer(outcome ~ Group*Day + (1 | ID), data = data, REML = F, na.action=na.exclude)
lmer2 <- lmer(outcome ~ Group + (1 | ID), data = data, REML = F, na.action=na.exclude)
library(predictmeans)
permlmer(lmer2,lmer1)
However, this code gives me the following error:
Error in density.default(c(lrtest1, lrtest), kernel = "epanechnikov") :
need at least 2 points to select a bandwidth automatically
The following code does work, but I don't believe it gives me the result of a permutation LR test:
library(nlme)
lme1 <- lme(outcome ~ Genotype*Day,
random = ~1 | ID,
data = data,
na.action = na.exclude)
library(pgirmess)
PermTest(lme1)
Can anyone point out why I get the "epanechnikov" error when using the permlmer function?
Thank you!
The issue is the missing values (NA/NaN). Remove all rows with missing values from your dataset and rerun the models. I had the same problem and that solved it.
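A minimal sketch of that clean-up, on made-up data since I have not run it on the file from the question: keep only the rows that are complete on the variables the models use, then fit both models to the same reduced data frame. toy, model_vars and data_cc are illustrative names.

```r
set.seed(1)

# Toy data with the same column roles as in the question (values are made up)
toy <- data.frame(
  ID      = factor(rep(1:10, each = 5)),
  Group   = factor(rep(c("a", "b"), each = 25)),
  Day     = factor(rep(1:5, times = 10)),
  outcome = rnorm(50)
)
toy$outcome[c(3, 17, 40)] <- NA   # inject some missing values

# Keep only rows that are complete on every variable used by the models
model_vars <- c("outcome", "Group", "Day", "ID")
data_cc <- toy[complete.cases(toy[, model_vars]), ]

# Then refit both models on data_cc and rerun the permutation test, e.g.:
# lmer1 <- lmer(outcome ~ Group * Day + (1 | ID), data = data_cc, REML = FALSE)
# lmer2 <- lmer(outcome ~ Group + (1 | ID), data = data_cc, REML = FALSE)
# permlmer(lmer2, lmer1)
nrow(data_cc)
```

Fitting both models to the same complete-case data also keeps them comparable for the likelihood ratio test.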
I am a complete newbie to R.
I have the following logit equation I am estimating:
allAM <- glm (AM ~ VS + Prom + LS_Exp + Sex + Age + Age2 + Jpart + X2004LS + X2009LS + X2014LS + factor(State), family = binomial(link = "logit"), data = mydata)
AM is a standard binary (happened/didn’t happen). The three “X****LS” variables are dummies indicating different sessions of congress and “factor(State)” is used to generate fixed effects/dummies for each state.
VS is the key independent variable of interest and I want to generate the predicted probability that AM=1 for each value of VS between 0 and 60, holding everything else at its mean.
I am running into trouble, however, generating and plotting the predicted probabilities because “State” is a factor. I want to be able to show the average effects, not 50 different charts/effects for each state.
Per (Hanmer and Kalkan 2013) http://onlinelibrary.wiley.com/doi/10.1111/j.1540-5907.2012.00602.x/abstract I was advised to do the following to plot the predicted probabilities:
pred.seq <- seq(from=0, to=60, by=0.01)
pred.out <- c()
for(i in 1:length(pred.seq)){
mydata.c <- mydata
mydata.c$VS <- pred.seq[i]
pred.out[i] <- mean(predict(allAM, newdata=mydata.c, type="response"))
}
plot(pred.out ~ pred.seq, type="l")
This approach seems to work, though I don’t really understand it.
I want to add the upper and lower 95% confidence intervals to the plot, but when I attempt to do it by hand the way I know how:
lower <- pred.out$fit - (1.96*pred.out$se.fit)
upper <- pred.out$fit + (1.96*pred.out$se.fit)
I get the following error:
Error in pred.out$fit : $ operator is invalid for atomic vectors
Can anyone advise how I can plot the confidence intervals and how I can specify different levels of VS so that I can report some specific predicted probabilities?
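The error arises because pred.out is a plain numeric vector of averaged probabilities, so it has no $fit or $se.fit elements. One possible way to get intervals for averaged predictions is simulation: draw coefficient vectors from the estimated sampling distribution, recompute the average predicted probability for each draw, and take the 2.5% and 97.5% quantiles. The sketch below does that on simulated data; toy, fit, beta_draws and res are illustrative names, and the number of draws (1000) is an arbitrary choice.

```r
library(MASS)   # for mvrnorm()
set.seed(1)

# Simulated stand-in for mydata (made-up values)
n <- 300
toy <- data.frame(
  VS    = runif(n, 0, 60),
  Age   = rnorm(n, 50, 10),
  State = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
toy$AM <- rbinom(n, 1, plogis(-2 + 0.05 * toy$VS + 0.01 * (toy$Age - 50)))

fit <- glm(AM ~ VS + Age + State, family = binomial(link = "logit"), data = toy)

pred.seq   <- seq(0, 60, by = 5)
beta_draws <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))  # coefficient draws
rhs        <- delete.response(terms(fit))                       # predictors only

res <- t(sapply(pred.seq, function(v) {
  toy.c <- toy
  toy.c$VS <- v                               # set VS to v for every observation
  X <- model.matrix(rhs, data = toy.c)
  avg_p <- colMeans(plogis(X %*% t(beta_draws)))  # one average per draw
  c(fit   = mean(predict(fit, newdata = toy.c, type = "response")),
    lower = unname(quantile(avg_p, 0.025)),
    upper = unname(quantile(avg_p, 0.975)))
}))

plot(pred.seq, res[, "fit"], type = "l", ylim = range(res),
     xlab = "VS", ylab = "Average predicted probability")
lines(pred.seq, res[, "lower"], lty = 2)
lines(pred.seq, res[, "upper"], lty = 2)

# Specific values of VS can be read straight from the table
cbind(VS = pred.seq, res)[pred.seq %in% c(10, 30, 50), ]
```

The loop in the question does the same averaging for the point estimate: it sets VS to one value for every row, predicts, and averages over the observed values of the other covariates, which is why State does not produce 50 separate curves.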
Here is my problem.
Sample data:
library(lme4)
library(ISLR)
data(Auto)
mdl<-lmer(mpg ~ horsepower + I(horsepower^2) + displacement + I(displacement^2) + (1|name) + (1|year),data=Auto)
I want to use this model to predict for a range of horsepower while keeping displacement at its mean value.
horsepower <- min(Auto$horsepower):max(Auto$horsepower)
displacement <- rep(mean(Auto$displacement),185)
data <- data.frame(horsepower,displacement)
# Use predict
yVals <- predict(mdl, newdata = data)
Error in eval(expr, envir, enclos) : object 'name' not found
I think this error occurs because I am not supplying name, which is one of my random effects, to the predict function. Does anyone know how to address this error?
In addition, I want to fit the quadratic function in the plot of mpg ~ horsepower based on the coefficients generated from the model.
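One possible approach, assuming population-level predictions (fixed effects only) are what is wanted: predict for merMod objects accepts re.form = NA, which leaves the random effects out, so the grouping variables do not need to be in newdata. The sketch uses the sleepstudy data that ships with lme4 instead of Auto so that it needs only one package; newdat is an illustrative name.

```r
library(lme4)

# Quadratic fixed effect with a random intercept (stand-in for the Auto model)
mdl <- lmer(Reaction ~ Days + I(Days^2) + (1 | Subject), data = sleepstudy)

# New data needs only the fixed-effect predictors when re.form = NA
newdat <- data.frame(Days = 0:9)
yVals <- predict(mdl, newdata = newdat, re.form = NA)

# Overlay the fitted quadratic curve on the raw data
plot(Reaction ~ Days, data = sleepstudy)
lines(newdat$Days, yVals, lwd = 2)
```

For the Auto model the same call would take a data frame with horsepower and displacement columns, with displacement fixed at its mean.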
I have built a survival cox-model, which includes a covariate * time interaction (non-proportionality detected).
I am now wondering how could I most easily get survival predictions from my model.
My model was specified:
coxph(formula = Surv(event_time_mod, event_indicator_mod) ~ Sex +
ageC + HHcat_alt + Main_Branch + Acute_seizure + TreatmentType_binary +
ICH + IVH_dummy + IVH_dummy:log(event_time_mod)
I was hoping to get predictions using survfit, supplying newdata for the combination of covariate values I want predictions for:
survfit(cox, newdata = new)
Because event_time_mod appears on the right-hand side of my model, I need to specify it in the new data frame passed to survfit. It would need to be set to each individual prediction time. Is there an easy way to make survfit use the correct time for event_time_mod?
Or are there any other options for achieving predictions from my model?
Of course, I could create as many rows in the new data frame as there are distinct prediction times and set event_time_mod to the correct value in each, but that feels really cumbersome, and I thought there must be a better way.
You have done what is referred to as
An obvious but incorrect approach ...
in the "Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model" vignette in version 2.41-3 of the R survival package. Instead, you should use the time-transform functionality, i.e., the tt function described in the same vignette. The code would be similar to this example from the vignette:
> library(survival)
> vfit3 <- coxph(Surv(time, status) ~ trt + prior + karno + tt(karno),
+ data=veteran,
+ tt = function(x, t, ...) x * log(t+20))
>
> vfit3
Call:
coxph(formula = Surv(time, status) ~ trt + prior + karno + tt(karno),
data = veteran, tt = function(x, t, ...) x * log(t + 20))
              coef exp(coef) se(coef)     z       p
trt        0.01648   1.01661  0.19071  0.09  0.9311
prior     -0.00932   0.99073  0.02030 -0.46  0.6462
karno     -0.12466   0.88279  0.02879 -4.33 1.5e-05
tt(karno)  0.02131   1.02154  0.00661  3.23  0.0013
Likelihood ratio test=53.8 on 4 df, p=5.7e-11
n= 137, number of events= 128
However, survfit does not work when the model has a tt term:
> survfit(vfit3, veteran[1, ])
Error in survfit.coxph(vfit3, veteran[1, ]) :
The survfit function can not yet process coxph models with a tt term
However, you can easily get out the terms, linear predictor or mean response with predict. Further, you can create the term over time for the tt term using the answer here.
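As a small illustration of what the tt term implies, the time-varying coefficient of karno can be reconstructed directly from the fitted coefficients: with tt = x * log(t + 20), the effect at time t is coef["karno"] + coef["tt(karno)"] * log(t + 20). This is only a sketch of that arithmetic; t_grid and beta_t are illustrative names and the grid of times is arbitrary.

```r
library(survival)

# Same model as above: karno has a coefficient that changes with log(t + 20)
vfit3 <- coxph(Surv(time, status) ~ trt + prior + karno + tt(karno),
               data = veteran,
               tt = function(x, t, ...) x * log(t + 20))

b <- coef(vfit3)
t_grid <- seq(1, 400, by = 1)                        # times (days) to evaluate at

# Effect of karno on the log hazard at each time in t_grid
beta_t <- b["karno"] + b["tt(karno)"] * log(t_grid + 20)

plot(t_grid, beta_t, type = "l",
     xlab = "Time", ylab = "Coefficient of karno")
abline(h = 0, lty = 2)
```

Multiplying beta_t by a covariate value gives that covariate's contribution to the linear predictor at each time, which is the building block for any hand-computed prediction.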