Regression Modelling of Linear, Exponential, and Power Curves in R

Please note this is cross-posted: https://stats.stackexchange.com/questions/427649/regression-modelling-of-linear-exponential-and-power-curves
I am trying to model reaction time (and other) data across trials (trials 1-5) using different mathematical functions. Specifically, I model linear, exponential, and power functions by fitting linear mixed-effects models to transformed data, and I use AIC/BIC to compare the fits:
Linear: lmer(ReactionTime ~ Trial + (Trial | Subjects), data = lmerdata)
Exponential: lmer(log(ReactionTime) ~ Trial + (Trial | Subjects), data = lmerdata)
Power: lmer(log(ReactionTime) ~ log(Trial) + (Trial | Subjects), data = lmerdata)
By doing this, the exponential and power models imply a different error distribution than the linear model, and their likelihoods are computed on the log scale rather than the response scale. The consequence is that the exponential and power AIC/BIC values are not directly comparable to the linear fit, and make those models look artificially good.
Is there a way to account for this using lmer()? Alternatively, how would this be done with non-linear mixed-effects modelling? I've attempted it with nlme(), nlmer(), and glmer(), but all methods run into issues (e.g., failure to converge).
Here is sample data:
#Create Empty Matrix
lmerdata <- matrix(NA, 20, 3)
#Add Participant IDs
lmerdata[, 1] <- rep(1:4, 5)
#Add Trial Counts
lmerdata[, 2] <- as.numeric(sort(rep(1:5, 4)))
#Add Reaction Time Data
lmerdata[, 3] <- c(2.184308,2.754287,2.396167,1.305267,1.943866,1.70844,2.586035,1.261954,1.768063,1.76659,2.242142,1.489634,1.62544,1.677268,2.378175,1.550744,1.481052,1.424327,1.738102,1.247097)
#Name Columns
colnames(lmerdata) <- c('Subjects', 'Trial', 'ReactionTime')
#Convert to Data Frame
lmerdata <- as.data.frame(lmerdata)
#Turn Subjects into Factor
lmerdata$Subjects <- as.factor(lmerdata$Subjects)
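One commonly suggested fix, sketched below, is to put all three AICs on the ReactionTime scale by adding the change-of-variables Jacobian term to the log-response models. This is only a sketch (the model names m.lin, m.exp, and m.pow are mine, and with only four subjects these random-slope fits may well be singular):
library(lme4)
# ML rather than REML so the AICs are comparable across fixed effects
m.lin <- lmer(ReactionTime ~ Trial + (Trial | Subjects), data = lmerdata, REML = FALSE)
m.exp <- lmer(log(ReactionTime) ~ Trial + (Trial | Subjects), data = lmerdata, REML = FALSE)
m.pow <- lmer(log(ReactionTime) ~ log(Trial) + (Trial | Subjects), data = lmerdata, REML = FALSE)
# For a model fit to log(y), the log-likelihood on the y scale is
# logLik(log-model) - sum(log(y)), so its AIC gains + 2*sum(log(y)).
jac <- 2 * sum(log(lmerdata$ReactionTime))
c(linear = AIC(m.lin),
  exponential = AIC(m.exp) + jac,
  power = AIC(m.pow) + jac)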

Related

Including an offset when using cph {rms} for validation of a Cox model

I am externally validating and updating a Cox model in R. The model predicts 5 year risk. I don't have access to the original data, just the equation for the linear predictor and the value of the baseline survival probability at 5 years.
I have assessed calibration and discrimination of the model in my dataset and found that the model needs to be updated.
I want to update the model by adjusting baseline risk only, so I have been using a Cox model with the linear predictor ("beta.sum") included as an offset term, to restrict its coefficient to be 1.
I want to be able to use cph instead of coxph as it makes internal validation by bootstrapping much easier. However, when including the linear predictor as an offset I get the error:
"Error in exp(object$linear.predictors) :
non-numeric argument to mathematical function"
Is there something I am doing incorrectly, or does the cph function not allow an offset within the formula? If so, is there another way to restrict the coefficient to 1?
My code is below:
load(file="k.Rdata")
### Predicted risk ###
# linear predictor (LP)
k$beta.sum <- -0.2201 * ((k$age/10)-7.036) + 0.2467 * (k$male - 0.5642) - 0.5567 * ((k$epi/5)-7.222) +
0.4510 * (log(k$acr_mgmmol/0.113)-5.137)
k$pred <- 1 - 0.9365^exp(k$beta.sum)
# Recalibrated model
# Using coxph:
cox.new <- coxph(Surv(time, rrt) ~ offset(beta.sum), data = k, x=TRUE, y=TRUE)
# new baseline survival at 5 years
library(pec)
predictSurvProb(cox.new, newdata=data.frame(beta.sum=0), times = 5) #baseline = 0.9570
# Using cph
cph.new <- cph(Surv(time, rrt) ~ offset(beta.sum), data=k, x=TRUE, y=TRUE, surv=TRUE)
The model will run without surv=TRUE included, but this means a lot of the commands I want to use cannot work, such as calibrate, validate and predictSurvProb.
EDIT:
I will include a way to reproduce the error
library(purrr)
library(rms)
n <- 1000
set.seed(1234)
status <- as.numeric(rbernoulli(n, p=0.1))
time <- -5* log(runif(n))
lp <- rnorm(1000, mean=-2.7, sd=1)
mydata <- data.frame(status, time, lp)
test <- cph(Surv(time, status) ~ offset(lp), data=mydata, surv=TRUE)
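As an aside, and only as a sketch using the survival package: with the offset-only coxph fit above, the baseline survival at 5 years can also be read off survfit() directly (beta.sum = 0 being the baseline covariate pattern), which avoids relying on pec:
library(survival)
sf <- survfit(cox.new, newdata = data.frame(beta.sum = 0))
summary(sf, times = 5)$surv # should agree with the 0.9570 from predictSurvProb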

Pooling sandwich variance estimator over multiply imputed datasets

I am running a poisson regression on multiply imputed data to predict a common binary outcome. After running mice, I have obtained a stacked data frame comprising the raw data and five imputed datasets. Here is a toy example:
library(mice)
df <- mice::nhanes
imp <- mice(df) #impute data
com <- complete(imp, "long", TRUE) #creates stacked data frame: raw data plus five imputed datasets
I now want to:
Run the regression on each imputed dataset
Calculate robust standard errors using a sandwich variance estimator
Combine / pool the results of both analyses
I can run the regression on the mids object using the with and pool commands:
fit.pois.mids <- with(imp, glm(hyp ~ age + bmi + chl, family = poisson))
summary(pool(fit.pois.mids))
I can also run the regression on each of the imputed datasets before combining them:
imp.df <- split(com, com$.imp); names(imp.df) <- c("raw", "imp1", "imp2", "imp3", "imp4", "imp5") #creates list of data frames representing each imputed dataset
fit.pois <- lapply(imp.df, function(x) {
fit <- glm(hyp ~ age + bmi + chl, data = x, family = poisson)
fit
})
library(mitools)
summary(MIcombine(fit.pois))
Similarly, I can calculate the standard errors for each imputed dataset:
library(lmtest); library(sandwich)
sand <- lapply(fit.pois, function(x) {
se <- coeftest(x, vcov = sandwich)
se
})
Unfortunately, MIcombine does not seem to return p-values. This post suggests using Zelig, but for that matter I may as well just use mice. Further, it does not appear to be possible to combine the standard-error estimates:
summary(MIcombine(sand))
Error in UseMethod("vcov") :
no applicable method for 'vcov' applied to an object of class "coeftest"
For the sake of simplicity, it seems that mice is a better option for pooling the results of the regression; however, I am wondering how I would go about updating (i.e., pooling and combining) the standard errors. What are some ways this could be addressed?
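One possible workaround, sketched here and not tested against the original data: mitools::MIcombine also accepts a list of coefficient vectors plus a matching list of variance matrices, so the sandwich variances can be pooled directly via Rubin's rules (dropping the fit on the raw, unimputed data first):
library(mitools); library(sandwich)
fit.imp <- fit.pois[-1] # drop the fit on the raw data
coefs <- lapply(fit.imp, coef) # per-imputation estimates
vcovs <- lapply(fit.imp, sandwich) # per-imputation robust vcov
pooled <- MIcombine(coefs, vcovs) # Rubin's rules
est <- pooled$coefficients
se <- sqrt(diag(pooled$variance))
2 * pnorm(-abs(est / se)) # normal-approximation p-values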

GLM BACI analysis in R

I am trying to conduct a BACI analysis in R using logistic regression. Due to the use of reference levels in the output of GLMs, I am having difficulty interpreting my results. Has anyone had any luck retrieving a summary of all pairwise interactions?
(Depth is a continuous predictor variable, but I can convert it to a categorical if necessary.)
Towards <- c(4,7,9,0,15,10,11,23,1,4)
Total <- c(6,14,10,7,15,12,20,41,5,8)
Depth <- c(-.3,-.25,-.21,-.17,-.05,0,0,.25,.5,.56)
DPM <- c("Pre","Post","Pre","Pre","Post","Pre","Post","Post","Post","Pre")
Proximity <- c("Far","Near","East","East","East","Near","Far", "Far","Near","Far")
Area <- c("DPM","Control","Control","DPM","Control","Control",
"DPM","DPM","Control","DPM")
Data <- data.frame(Towards, Total, Depth, DPM, Proximity, Area)
mod <- glm(cbind(Towards, Total - Towards) ~ DPM * Area * Depth,
data = Data, family = binomial('logit'))
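Not from the original post, but one common route to all pairwise BACI comparisons from this model is the emmeans package, comparing the DPM x Area cell means at a fixed depth (Depth = 0 here, purely for illustration):
library(emmeans)
emm <- emmeans(mod, ~ DPM * Area, at = list(Depth = 0), type = "response")
pairs(emm) # all pairwise contrasts among the Before/After x Control/Impact cells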

Simulation-based power analysis for Linear Mixed Model (repeated measures) using pilot data

I am doing a simple mixed-model analysis and would like to estimate the power of the study for multiple possible sample sizes. I am using lme4 to fit the models and would like to use the simulate() function to generate new data from the parameters estimated in my pilot-study model, and then use those data for the power analysis.
My model is:
m.2 <- lmer(y ~ time + group + (time | subject), REML=FALSE)
And the data looks like this:
npg <- 20 # No of subjects per group
subject <- 1:(2*npg) # Subjects' ids
group <- gl(2, npg, labels = c("Control", "Treatment"))
dts <- data.frame(subject, group) # Subject-level data
dtL <- list(time = 0:9,
subject = subject)
dtLong <- expand.grid(dtL) # "Long" format
mrgDt <- merge(dtLong, dts, sort = FALSE) # Merged
I know I can get the parameters from the pilot data model like this:
newparams <- list(
beta = fixef(m.2),
theta = getME(m.2, "theta"),
sigma = getME(m.2, "sigma"))
And that I can simulate new data using this:
ss <- simulate(~ time + group + (time | subject),
nsim = 1,
newdata = mrgDt,
family = gaussian,
newparams = newparams)
Unfortunately, I'm new to simulation and don't really know how to do power analysis with simulated data sets. Could you help me with some code and some advice?
EDIT:
I would like to do a power analysis similar to the one in Gałecki & Burzykowski (2013) book.
Gałecki, A., & Burzykowski, T. (2013). Linear mixed-effects models using R: A step-by-step approach. Springer Science & Business Media, p. 515. The example is: Empirical power of the F-test for the treatment effect based on the simulated values of the F-test statistics.
The info in https://peerj.com/articles/1226.pdf and the Supplemental Information (https://peerj.com/articles/1226/#supp-1) is very useful and I think it's the way to go, but I don't understand how to apply their code.
I wanted to simulate data based on the real parameters of the fitted model from my pilot study.
I don't have all the data ready at the current time. For the sake of example and reproducibility, let's say I use the plant growth data set
df <- read.table('http://www.uib.no/People/nzlkj/statkey/data/plantgrowth.csv', header = TRUE)
and my model is
m.2 <- lmer(height ~ time + treat + (time | individual), data = df, REML = FALSE)
Under my control is the number of individuals in each arm or the study (the two groups, treatment and control).
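For what it's worth, here is a minimal sketch of such a simulation loop, assuming the newparams and mrgDt objects built above (nsim and alpha are arbitrary illustrative choices): simulate a dataset, refit the model, test the group effect with a likelihood-ratio test, and take the rejection rate as the empirical power.
library(lme4)
nsim <- 500
alpha <- 0.05
pvals <- numeric(nsim)
for (i in seq_len(nsim)) {
  simdat <- mrgDt
  # draw one response vector from the pilot-study parameters
  simdat$y <- simulate(~ time + group + (time | subject),
                       nsim = 1, newdata = simdat,
                       family = gaussian, newparams = newparams)[[1]]
  full <- lmer(y ~ time + group + (time | subject), data = simdat, REML = FALSE)
  reduced <- lmer(y ~ time + (time | subject), data = simdat, REML = FALSE)
  pvals[i] <- anova(reduced, full)[2, "Pr(>Chisq)"]
}
mean(pvals < alpha) # empirical power for the group effect at this sample size
To study other sample sizes, rebuild mrgDt with a different npg and rerun the loop.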

Generating predictive simulations from a multilevel model with random intercepts

I am struggling to understand how, in R, to generate predictive simulations for new data using a multilevel linear regression model with a single set of random intercepts. Following the example on pp. 146-147 of this text, I can execute this task for a simple linear model with no random effects. What I can't wrap my head around is how to extend the set-up to accommodate random intercepts for a factor added to that model.
I'll use iris and some fake data to show where I'm getting stuck. I'll start with a simple linear model:
mod0 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
Now let's use that model to generate 1,000 predictive simulations for 250 new cases. I'll start by making up those cases:
set.seed(20912)
fakeiris <- data.frame(Sepal.Length = rnorm(250, mean(iris$Sepal.Length), sd(iris$Sepal.Length)),
Sepal.Width = rnorm(250, mean(iris$Sepal.Width), sd(iris$Sepal.Width)),
Species = sample(as.character(unique(iris$Species)), 250, replace = TRUE),
stringsAsFactors = FALSE)
Following the example in the aforementioned text, here's what I do to get 1,000 predictive simulations for each of those 250 new cases:
library(arm)
n.sims = 1000 # set number of simulations
n.tilde = nrow(fakeiris) # set number of cases to simulate
X.tilde <- cbind(rep(1, n.tilde), fakeiris[,"Sepal.Width"]) # create matrix of predictors describing those cases; need column of 1s to multiply by intercept
sim.fakeiris <- sim(mod0, n.sims) # draw the simulated coefficients
y.tilde <- array(NA, c(n.sims, n.tilde)) # build an array to hold results
for (s in 1:n.sims) { y.tilde[s,] <- rnorm(n.tilde, X.tilde %*% sim.fakeiris@coef[s,], sim.fakeiris@sigma[s]) } # use matrix multiplication to fill that array
That works fine, and now we can do things like colMeans(y.tilde) to inspect the central tendencies of those simulations, and cor(colMeans(y.tilde), fakeiris$Sepal.Length) to compare them to the (fake) observed values of Sepal.Length.
Now let's try an extension of that simple model in which we assume that the intercept varies across groups of observations (here, species). I'll use lmer() from the lme4 package to estimate a simple multilevel/hierarchical model that matches that description:
library(lme4)
mod1 <- lmer(Sepal.Length ~ Sepal.Width + (1 | Species), data = iris)
Okay, that works, but now what? I run:
sim.fakeiris.lmer <- sim(mod1, n.sims)
When I use str() to inspect the result, I see that it is an object of class sim.merMod with three components:
@fixef, a 1,000 x 2 matrix with simulated coefficients for the fixed effects (the intercept and Sepal.Width)
@ranef, a 1,000 x 3 matrix with simulated coefficients for the random effects (the three species)
@sigma, a vector of length 1,000 containing the sigmas associated with each of those simulations
I can't wrap my head around how to extend the matrix construction and multiplication used for the simple linear model to this situation, which adds another dimension. I looked in the text, but I could only find an example (pp. 272-275) for a single case in a single group (here, species). The real-world task I'm aiming to perform involves running simulations like these for 256 new cases (pro football games) evenly distributed across 32 groups (home teams). I'd greatly appreciate any assistance you can offer.
Addendum. Stupidly, I hadn't looked at the details on simulate.merMod() in lme4 before posting this. I have now. It seems like it should do the trick, but when I run simulate(mod0, nsim = 1000, newdata = fakeiris), the result has only 150 rows. The values look sensible, but there are 250 rows (cases) in fakeiris. Where is that 150 coming from?
One possibility is to use the predictInterval function from the merTools package. The package is about to be submitted to CRAN, but the current development release is available from GitHub:
install.packages("devtools")
devtools::install_github("jknowles/merTools")
To get the median and a 95% credible interval of 100 simulations:
mod1 <- lmer(Sepal.Length ~ Sepal.Width + (1 | Species), data = iris)
out <- predictInterval(mod1, newdata=fakeiris, level=0.95,
n.sims=100, stat="median")
By default, predictInterval includes the residual variation, but you can
turn that feature off with:
out2 <- predictInterval(mod1, newdata=fakeiris, level=0.95,
n.sims=100, stat="median",
include.resid.var=FALSE)
Hope this helps!
This might help: it doesn't use sim(); instead it uses mvrnorm() to draw new fixed-effect coefficients from their sampling distribution, then uses a bit of internal machinery (setBeta0) to push those values back into the fitted model. The random-effect values are resampled automatically by simulate.merMod via its default argument re.form=NA. However, the residual variance is not resampled; it is held fixed across the simulations, which isn't 100% realistic.
In your use case, you would specify newdata=fakeiris.
library(lme4)
mod1 <- lmer(Sepal.Length ~ Sepal.Width + (1 | Species), data = iris)
simfun <- function(object, n = 1, newdata = NULL, ...) {
  v <- as.matrix(vcov(object))  # sampling covariance of the fixed effects
  b <- fixef(object)            # fixed-effect estimates
  betapars <- MASS::mvrnorm(n, mu = b, Sigma = v)
  npred <- if (is.null(newdata)) length(predict(object)) else nrow(newdata)
  res <- matrix(NA, npred, n)
  for (i in 1:n) {
    object@pp$setBeta0(betapars[i, ])  # overwrite the internal fixed-effect values
    res[, i] <- simulate(object, newdata = newdata, ...)[[1]]
  }
  return(res)
}
ss <- simfun(mod1,100)
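From there, per-case summaries of the simulations are straightforward, for example:
pred.mean <- rowMeans(ss) # point prediction per case
pred.ci <- t(apply(ss, 1, quantile, probs = c(0.025, 0.975))) # 95% simulation interval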
