I am applying the lmer() function across all columns in a dataframe. I have made a list of variables and used lapply(). Below is the code:
varlist=names(Genus_abundance)[5:ncol(Genus_abundance)]
lapply(varlist, function(x){lmer(substitute(i ~ Status + (1|Match), list(i=as.name(x), data=Genus_abundance, na.action = na.exclude)))})
However, I keep getting this error:
Error in eval(predvars, data, env) : object 'Acetatifactor' not found
I have checked and Acetatifactor is in the Genus_abundance dataframe.
I'm a bit stuck about where it's going wrong.
EDIT:
Added a working example:
set.seed(43)
n <- 6
dat <- data.frame(id=1:n, Status=rep(LETTERS[1:2], n/2), age= sample(18:90, n, replace=TRUE), match=1:n, Acetatifactor=runif(n), Acutalibacter=runif(n), Adlercreutzia=runif(n))
head(dat)
id Status age match Acetatifactor Acutalibacter Adlercreutzia
1 1 A 49 1 0.1861022 0.1364904 0.8626298
2 2 B 31 2 0.7297301 0.8246794 0.3169752
3 3 A 23 3 0.4118721 0.5923042 0.2592606
4 4 B 64 4 0.4140497 0.7943970 0.7422665
5 5 A 60 5 0.4803101 0.7690324 0.7473611
6 6 B 79 6 0.4274945 0.9180564 0.9179040
lapply(varlist,
function(x){lmer(substitute(i ~ status + (1|match), list(i=as.name(x))),
data=dd)
})
The specific problem here is misplaced parentheses: as written, data and na.action end up inside the list() passed to substitute() rather than being passed to lmer(). You should close the substitute(..., list(i=as.name(x))) call with three closing parentheses so that the whole chunk is properly understood as the first argument to lmer().
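Rearranged as described (closing substitute() before the data and na.action arguments), the asker's call would look like this sketch, using the Genus_abundance, Status, and Match names from the first code block:
lapply(varlist, function(x) {
  lmer(substitute(i ~ Status + (1|Match), list(i = as.name(x))),
       data = Genus_abundance, na.action = na.exclude)
})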
More generally, I agree with @Kat in the comments that this is a good place to look. Since your arguments are already strings (not symbols), you don't really need all of the substitute() business and could use reformulate() instead:
fit_fun <- function(v) {
lmer(reformulate(c("status", "(1|match)"), response = v),
data = dd, na.action = na.exclude)
}
lapply(varlist, fit_fun)
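To keep the results labeled by response, you could store them like this (a small usage sketch, assuming varlist and the fit_fun definition above):
fits <- lapply(varlist, fit_fun)
names(fits) <- varlist   # label each fit by its response column
lapply(fits, summary)    # inspect each model in turn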
Or you could use refit() to fit the model to the first column, then update that fit with each of the remaining columns. For large models this is much more efficient.
m1 <- lmer(resp1 ~ status + (1|match), ...)
m_other <- lapply(dd[-(1:3)], refit, object = m1)
c(list(m1), m_other)
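Applied to the question's data, that might look like the following sketch (the Genus_abundance, Status, and Match names and the assumption that the responses start in column 5 are carried over from the question):
first <- varlist[1]
m1 <- lmer(reformulate(c("Status", "(1|Match)"), response = first),
           data = Genus_abundance, na.action = na.exclude)
# refit() reuses the fitted structure, swapping in each remaining column as the response
m_other <- lapply(Genus_abundance[varlist[-1]], refit, object = m1)
fits <- c(setNames(list(m1), first), m_other)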
#Create subset of a dataset
df <- subset(dat,select = c(id,obs,day_clos,posaff,er89,qol1))
### remove rows with missing values on a variable
df <- subset(df, !is.na(day_clos))
df <- subset(df, !is.na(er89))
df <- subset(df, !is.na(qol1))
df <- subset(df,!is.na(posaff))
any(is.na(df)) ## returns FALSE
Then my data looks like this:
id obs day_clos posaff er89 qol1
1 0 16966.61 2.000000 2.785714 3
1 1 16967.79 1.666667 2.785714 4
1 2 16968.82 1.666667 3.142857 3
1 3 16969.76 1.166667 3.071429 4
1 4 16970.95 2.083333 3.000000 4
1 5 16971.75 1.416667 2.857143 4
model.Y <- lm(qol1 ~ posaff,df)
summary(model.Y)
model.M <- lm(qol1 ~ er89, df)
summary(model.M)
#### There is no problem running the regression analyses, however:
results <- mediate(model.M, model.Y, treat="posaff", mediator="er89", boot=TRUE, sims=500)
This returns the error message:
Error in `[.data.frame`(m.data, , treat) : undefined columns selected
Anyone know how to fix this?
The variables named in treat and mediator must be present in both models:
treat: a character string indicating the name of the treatment variable used in the models. The treatment can be either binary (integer or a two-valued factor) or continuous (numeric).
mediator: a character string indicating the name of the mediator variable used in the models.
Source
A trivial working example:
library("mediation")
db<-data.frame(y=c(1,2,3,4,5,6,7,8,9),x1=c(9,8,7,6,5,4,3,2,1),x2=c(9,9,7,7,5,5,3,3,1),x3=c(1,1,1,1,1,1,1,1,1))
model.M <- lm(x2 ~ x1+x3,db)
model.Y <- lm(y ~ x1+x2+x3,db)
results <- mediate(model.M, model.Y, treat="x1", mediator="x2", boot=TRUE, sims=500)
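Applied to the original question, this means the treatment and mediator must both appear in the models passed to mediate(). A hedged sketch with the df, posaff, er89, and qol1 columns from above (treating posaff as the treatment and er89 as the mediator):
model.M <- lm(er89 ~ posaff, df)            # mediator model: the treatment must appear here
model.Y <- lm(qol1 ~ posaff + er89, df)     # outcome model: treatment and mediator both appear
results <- mediate(model.M, model.Y, treat = "posaff", mediator = "er89", boot = TRUE, sims = 500)
summary(results)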
I think I have done what you suggested, but it is still giving the same error message.
model.mediator <- lmer(PercAccuracy~factor(Rep1) +
(factor(Rep1)| ParticipantPublicID),
data = data, REML=FALSE , control = control_params)
summary(model.mediator)
model.outcome <- lmer(Sharing~factor(Rep1) +PercAccuracy+
(factor(Rep1)+PercAccuracy| ParticipantPublicID),
data = data, REML=FALSE , control = control_params)
summary(model.outcome )
effectModel<-mediate(model.mediator, model.outcome, treat = "Rep1", mediator="PercAccuracy")
summary(effectModel)
Using R, I am estimating a simple equation by least squares in which the lagged dependent variable appears as an independent (explanatory, right-hand-side) variable. I want to forecast out of sample and use the forecasts of the dependent variable in the out-of-sample period as the lag for each step ahead.
That is, I want to extend the forecasts of y beyond the data period:
a <- lm( y ~ x + lag(y,1), data= dset1)
b <- forecast(a,newdata=dset2)
where dset2 has the full period of extra x variables, but not the lagged y.
Here is an example using the AirPassengers data set, where d2 was created with some missing ap data. The results below show that only row 143 gets filled in, not row 144, because forecast() did not have the row-143 value to use as the lag.
I looked at the dyn, dynlm, and forecast packages, but none seems to work with this type of model. (I do not want to restate it as an ARMA or a VAR.)
What package can easily do this, or am I using forecast() incorrectly?
I can loop and step ahead one period at a time, but I would rather not do that (a sketch of that loop appears after the error output below).
## Example case using airline data
library(dynlm)     # for dynlm()
library(dyn)       # for dyn()
library(forecast)  # for forecast()
data("AirPassengers", package = "datasets")
ap <- log(AirPassengers)
ap <- as.ts(ap)
d1 <- data.frame(ap, index= as.Date(ap))
m1 <- lm(ap ~ lag(ap,1), data=d1)
m2 <- dynlm(ap ~ lag(ap,1), data=d1)
m3 <- dyn(lm(ap ~ lag(ap,1), data=d1))
summary(m3)
## Neither lm nor dyn nor dynlm objects worked as I want
## Try forecasting missing values, 2 steps ahead, rows 143 and 144
d2 <- d1
d2$apx = d2$ap
d2$apx[143:144]= NA
mx <- lm(apx ~ lag(apx,1), data=d2)
b <- forecast(mx,newdata=d2)
Results:
> b
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
1 NA NA NA NA NA
2 4.756850 4.619213 4.894488 4.545513 4.968188
3 4.807218 4.669783 4.944653 4.596191 5.018245
....
140 6.411559 6.273546 6.549572 6.199644 6.623474
141 6.386407 6.248507 6.524306 6.174667 6.598146
142 6.216154 6.078941 6.353368 6.005467 6.426841
143 6.122453 5.985553 6.259354 5.912247 6.332659
144 NA NA NA NA NA
Other lm-like objects produced errors with forecast():
mx <- dynlm(apx ~ lag(apx,1), data=d2)
b <- forecast(mx,newdata=d2)
Error in forecast.lm(mx, newdata = d2) : invalid type/length
(symbol/0) in vector allocation
mx <- dyn(lm(apx ~ lag(apx,1), data=d2))
b <- forecast(mx,newdata=d2)
Error in predict.lm(object, newdata = newdata, se.fit = TRUE, interval
= "prediction", : formal argument "se.fit" matched by multiple actual arguments
I am trying to apply a t-test within each level of a 24-level factor (speaker). My goal is to see whether there is a significant difference in the continuous variable, intensity difference (intdiff), between the two levels of orthography (jj or L). However, when using the by() function, it returned the following error:
Error in FUN(X[[1L]], ...) : could not find function "FUN"
My syntax which produced the error was:
by(data, data$speaker, t.test(intdiff~orthography))
I specified the arguments according to the R documentation, so I can't figure out why it's not accepting the function I provided. Any help would be greatly appreciated. In the event you need to try to reproduce the problem, here is the data set with which I am working:
https://www.dropbox.com/s/bxb9ebavln1rh3u/SpanishPalatals.csv
Many thanks in advance.
This: t.test(intdiff ~ orthography) is not a function; it is a function call. It appears you are expecting by() to split the dataframe, so this might succeed:
by(data, data$speaker, function(d){ t.test(d$intdiff ~ d$orthography, data=d)} )
To explain further: function(d){ t.test(d$intdiff ~ d$orthography)} is a function. Or you could try:
by(data, data$speaker, t.test, form= intdiff ~ orthography ) # untested
The second version uses t.test (which is a function 'name' rather than a function 'call'), and there is a formula method for t.test. Argument matching accepts partial names, so the dataframe being passed to t.test should get automatically matched to the 'data' argument.
The following:
# df is the dataset read from the linked CSV, e.g. df <- read.csv("SpanishPalatals.csv")
ff <- function(spkr){
tt <- t.test(intdiff~orthography,data=df[df$speaker==spkr,])
p <- tt$p.value
return (c(as.character(spkr), p,
ifelse(p<0.01,"***",ifelse(p<0.05,"**",ifelse(p<0.1,"*","")))))
}
result <- sapply(unique(df$speaker),ff)
result <- data.frame(t(result))
colnames(result) <- c("speaker","p","")
Produces this with your dataset:
> result
speaker p
1 f11r 0.274156477338993
2 f13r 0.713051221315941
3 f15a 0.572200487250118
4 f16a 0.192474372524439
5 f19s 0.071456754899202 *
6 f21s 0.172336984420981
7 f23s 0.00711798616059324 ***
8 f24s 0.875438396151962
9 f31s 0.0191665818354575 **
10 f35s 0.550666959777641
11 f36s 0.715870353562376
12 m09a 0.195488505334365
13 m10a 0.0083410071012031 ***
14 m12r 0.461148808729932
15 m14r 0.407116475315898
16 m17s 0.00147426201434577 ***
17 m18s 0.614243811131762
18 m20s 0.204627912633947
19 m25s 0.00652026971231048 ***
20 m26s 0.135705391035981
21 m27s 0.099118573524907 *
22 m28s 0.0789796806312655 *
23 m32s 0.27026239413494
Note that one of the speakers had only 1 orthography (speaker = f22s), which causes the t.test to fail, so I removed it.
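That removal could be done with something like this one-liner (a sketch, assuming df as above):
df <- subset(df, speaker != "f22s")   # drop the speaker with only one orthography level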
I am attempting to carry out lasso regression using the lars package but cannot seem to get the lars() call to work. I have entered the code:
diabetes<-read.table("diabetes.txt", header=TRUE)
diabetes
library(lars)
diabetes.lasso = lars(diabetes$x, diabetes$y, type = "lasso")
However, I get an error message of :
Error in rep(1, n) : invalid 'times' argument.
I have tried entering it like this:
diabetes<-read.table("diabetes.txt", header=TRUE)
library(lars)
data(diabetes)
diabetes.lasso = lars(age+sex+bmi+map+td+ldl+hdl+tch+ltg+glu, y, type = "lasso")
But then I get the error message:
'Error in lars(age+sex + bmi + map + td + ldl + hdl + tch + ltg + glu, y, type = "lasso") :
object 'age' not found'
Where am I going wrong?
EDIT: The data are as below, but with another 5 columns.
ldl hdl tch ltg glu
1 -0.034820763 -0.043400846 -0.002592262 0.019908421 -0.017646125
2 -0.019163340 0.074411564 -0.039493383 -0.068329744 -0.092204050
3 -0.034194466 -0.032355932 -0.002592262 0.002863771 -0.025930339
4 0.024990593 -0.036037570 0.034308859 0.022692023 -0.009361911
5 0.015596140 0.008142084 -0.002592262 -0.031991445 -0.046640874
I think some of the confusion may have to do with the fact that the diabetes data set that comes with the lars package has an unusual structure.
library(lars)
data(diabetes)
sapply(diabetes,class)
## x y x2
## "AsIs" "numeric" "AsIs"
sapply(diabetes,dim)
## $x
## [1] 442 10
##
## $y
## NULL
##
## $x2
## [1] 442 64
In other words, diabetes is a data frame containing "columns" which are themselves matrices. In this case, with(diabetes,lars(x,y,type="lasso")) or lars(diabetes$x,diabetes$y,type="lasso") work fine. (But just lars(x,y,type="lasso") won't, because R doesn't know to look for the x and y variables within the diabetes data frame.)
However, if you are reading in your own data, you'll have to separate the response variable and the predictor matrix yourself, something like
X <- as.matrix(mydiabetes[, names(mydiabetes) != "y"])   # select the predictor columns
mydiabetes.lasso = lars(X, mydiabetes$y, type = "lasso")
Or you might be able to use
X <- model.matrix(y~.,data=mydiabetes)
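If you go the model.matrix() route, note that it adds an "(Intercept)" column by default; since lars() fits its own intercept, you would typically drop that column before passing the matrix in (a sketch, reusing the hypothetical mydiabetes naming):
X <- model.matrix(y ~ ., data = mydiabetes)[, -1]   # drop the "(Intercept)" column
mydiabetes.lasso <- lars(X, mydiabetes$y, type = "lasso")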
lars::lars does not appear to have a formula interface, which means you cannot use the formula specification for the column names (and furthermore it does not accept a "data=" argument). For more information on this and other "data mining" topics, you might want to get a copy of the classic text: "Elements of Statistical Learning". Try this:
# this obviously assumes require(lars) and data(diabetes) have been executed.
> diabetes.lasso = with( diabetes, lars(x, y, type = "lasso"))
> summary(diabetes.lasso)
LARS/LASSO
Call: lars(x = x, y = y, type = "lasso")
Df Rss Cp
0 1 2621009 453.7263
1 2 2510465 418.0322
2 3 1700369 143.8012
3 4 1527165 86.7411
4 5 1365734 33.6957
5 6 1324118 21.5052
6 7 1308932 18.3270
7 8 1275355 8.8775
8 9 1270233 9.1311
9 10 1269390 10.8435
10 11 1264977 11.3390
11 10 1264765 9.2668
12 11 1263983 11.0000
Why doesn't a model matrix necessarily have the same number of rows as the data frame?
mergem = model.matrix(as.formula(paste(response, '~ .')), data=mergef)
dim(mergef)
# [1] 115562 71
dim(mergem)
# [1] 66786 973
I tried looking for hints in the documentation but couldn't find anything. Thanks in advance.
Well, if a row has NAs in it, that row is (by default) removed:
d <- data.frame(x=c(1,1,2), y=c(2,2,4), z=c(4,NA,8))
m <- model.matrix(x ~ ., data=d)
nrow(d) # 3
nrow(m) # 2
This behavior is controlled by the option "na.action":
options(na.action="na.fail")
m <- model.matrix(x ~ ., data=d) # Error: missing values in object
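Conversely, if you need the model matrix to keep all rows and carry the NAs through, you can set the option to na.pass (a sketch; model.matrix() picks up the global na.action when it builds its model frame):
options(na.action = "na.pass")
m <- model.matrix(x ~ ., data = d)
nrow(m)                          # 3: the NA row is kept, with NA in the z column
options(na.action = "na.omit")   # restore the default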