Anova in R: Dataframe selection

I just ran into a problem when using a variable in the aov formula. Normally I would use "AGE" directly in the formula, but I run it all in a loop, so myvar will change.
myvar = as.name("AGE")
x = summary(aov(dat ~ contrasts*myvar + Error(ID/(contrasts)), data = set))
names(set) # "contrasts" "AGE" "ID" "dat"
It's the same as when I want to select a column: set$myvar does not work, but set$AGE does.
Is there any code for this?

You need to create a string representation of the model formula, then convert it using as.formula.
myvar <- "AGE"
f <- as.formula(paste("dat ~", myvar))
aov(f, data = set)

As Richie wrote, pasting seems like the simplest solution. Here's a more complete example:
myvar <- "AGE"
f <- as.formula(paste("dat ~ contrasts *", myvar, "+ Error(ID/contrasts)"))
x <- summary( aov(f, data=set) )
...and instead of set$myvar you would write
set[[myvar]]
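To put the pasted formula into the loop the question mentions, a minimal sketch could look like this. The data frame below is made up for illustration (only the column names contrasts, AGE, ID and dat come from the question; SEX and all values are assumptions):

```r
set.seed(1)
# Hypothetical repeated-measures data: 8 subjects, 2 contrast levels each
set <- data.frame(
  ID        = factor(rep(1:8, each = 2)),
  contrasts = factor(rep(c("a", "b"), times = 8)),
  AGE       = factor(rep(c("young", "old"), each = 8)),
  SEX       = factor(rep(rep(c("f", "m"), each = 2), times = 4)),
  dat       = rnorm(16)
)

results <- list()
for (myvar in c("AGE", "SEX")) {
  # Build the formula text, then convert it to a formula object
  f <- as.formula(paste("dat ~ contrasts *", myvar, "+ Error(ID/contrasts)"))
  results[[myvar]] <- summary(aov(f, data = set))
  # Column selection by name held in a variable: [[ ]] instead of $
  print(levels(set[[myvar]]))
}
```

Each element of results is then the ANOVA summary for one variable, e.g. results[["AGE"]].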
A more advanced answer is that a formula is actually a call to the "~" operator. You can modify the call directly, which would be slightly more efficient inside the loop:
> f <- dat ~ contrasts * PLACEHOLDER + Error(ID/contrasts) # outside loop
> f[[3]][[2]][[3]] <- as.name(myvar) # inside loop
> f # see what it looks like...
dat ~ contrasts * AGE + Error(ID/contrasts)
The magic [[3]][[2]][[3]] specifies the part of the formula you want to replace. The formula actually looks something like this (a parse tree):
`~`(dat, `+`(`*`(contrasts, PLACEHOLDER), Error(`/`(ID, contrasts))))
Play around with indexing the formula and you'll understand:
> f[[3]]
contrasts * AGE + Error(ID/contrasts)
> f[[3]][[2]]
contrasts * AGE
UPDATE: What are the benefits of this? Well, it is more robust - especially if you don't control the data's column names. If myvar <- "AGE GROUP" the current paste solution doesn't work. And if myvar <- "file.create('~/OWNED')", you have a serious security risk...
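The "AGE GROUP" case can be checked with a small sketch. It only builds the formulas and does not fit a model; the column name is the hypothetical one from the paragraph above:

```r
myvar <- "AGE GROUP"   # a column name that is not a syntactic R name

# Pasting produces text that does not parse as a formula
pasted <- tryCatch(
  as.formula(paste("dat ~ contrasts *", myvar, "+ Error(ID/contrasts)")),
  error = function(e) e
)
inherits(pasted, "error")

# Replacing the placeholder symbol in the call still works
f <- dat ~ contrasts * PLACEHOLDER + Error(ID/contrasts)
f[[3]][[2]][[3]] <- as.name(myvar)
all.vars(f)   # variable names used in the formula, including "AGE GROUP"
```

If you prefer to keep the paste approach, wrapping the name in backticks inside the pasted string is another way to handle such names.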

Related

Write a loop for my function in r

I am currently trying to write my first loop for lagged regressions on 30 variables. Variables are labeled rx1, rx2, ..., rx30, and the data frame is called my_num_data.
I have created a loop that looks like this:
z <- zoo(my_num_data)
for (i in 1:30)
{dyn$lm(my_num_data$rx[i] ~ lag(my_num_data$rx[i], 1)
+ lag(my_num_data$rx[i], 2))
}
But I received an error message:
Error in model.frame.default(formula = dyn(my_num_data$rx[i] ~ lag(my_num_data$rx[i], :
invalid type (NULL) for variable 'my_num_data$rx[i]'
Can anyone tell me what the problem is with the loop?
Thanks!

This produces a list, L, whose ith component has the name of the ith column of z and whose content is the regression of the ith column of z on its first two lags. Lag is the same as lag except that the sign of argument k is reversed.
library(dyn)
z <- zoo(anscombe) # test input using builtin data.frame anscombe
Lag <- function(x, k) lag(x, -k)
L <- lapply(as.list(z), function(x) dyn$lm(x ~ Lag(x, 1:2)))

First, dyn$lm() is valid once the dyn package is loaded; if you meant dynlm() from the dynlm package, it is written without the $ character, and the version below uses it. Second, using $rx[i] doesn't concatenate rx and the contents of i; it selects the (single) element in $rx with index i. Try this (edited: I don't have your data, so I can't test it on my machine):
results <- list()
for (i in 1:30) {
results[[i]] <- dynlm(my_num_data[,i] ~ lag(my_num_data[,i], 1)
+ lag(my_num_data[,i], 2))
}
and then list element results[[1]] will be the results from the first regression, and so on.
Note that this assumes your my_num_data data.frame ONLY consists of columns rx1, rx2, etc.
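If the data frame has other columns too, selecting by name is safer than selecting by position. Here is a rough base-R sketch of that idea. It swaps dyn/dynlm for plain lm() with the lags built by embed(), and the data are made up (three columns instead of thirty):

```r
set.seed(42)
# Hypothetical stand-in for my_num_data with columns rx1..rx3
my_num_data <- data.frame(rx1 = rnorm(50), rx2 = rnorm(50), rx3 = rnorm(50))

results <- list()
for (i in 1:3) {
  varName <- paste0("rx", i)       # "rx1", "rx2", ...
  x <- my_num_data[[varName]]      # select the column by its name
  # embed(x, 3): column 1 is x_t, column 2 is x_{t-1}, column 3 is x_{t-2}
  lags <- embed(x, 3)
  results[[varName]] <- lm(lags[, 1] ~ lags[, 2] + lags[, 3])
}
```

results[["rx1"]] then holds the regression of rx1 on its first two lags, fitted on the 48 rows where both lags exist.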

I am not super familiar with R, but it appears you are trying to increment the index in the name rx. Is rx a vector with values at different indices?
If not, the solution may be to build the column name as a string:
for (i in 1:30) {
varName <- paste0("rx", i) # "rx1", "rx2", ...
dyn$lm(my_num_data[[varName]] ~ lag(my_num_data[[varName]], 1)
+ lag(my_num_data[[varName]], 2))
}
Again, I may be way off here, as this is my first post and R is still pretty new to me.

R: Use string containing variable names in regression

I first use grep to obtain all variable names that begin with the prefix "h_". I then collapse that array into a single string, separated with plus signs. Is there a way to subsequently use this string in a linear regression?
For example:
holiday_vars <- grep("h_", names(df), value=TRUE)
holiday_string = paste(holiday_vars, collapse=' + ' )
r_3 <- lm(log(assaults) ~ year + month + holiday_string, data = df)
I get the straightforward error: variable lengths differ (found for 'holiday_string').
I can do it like this, for example:
holiday_formula <- as.formula(paste('log(assaults) ~ attend_v + year+ month + ', paste("", holiday_vars, collapse='+')))
r_3 <- lm(holiday_formula, data = df)
But I don't want to have to type a separate formula construction for each new set of controls. I want to be able to add the "string" inside the lm function. Is this possible?
The above is problematic, because let's say I want to then add another set of control variables to the formula contained in holiday_formula, so something like this:
weather_vars <- grep("w_", names(df), value=TRUE)
weather_formula <- as.formula(paste(holiday_formula, paste("+", weather_vars, collapse='+')))
Not sure how you would do the above.

I don't know a simple method for construction of a formula argument different than the one you are rejecting (although I considered and rejected using update.formula since it would also have required using as.formula), but this is an alternate method for achieving the same goal. It uses the "."-expansion feature of R-formulas and relies on the ability of the [-function to accept character argument for column selection:
r_3 <- lm(log(assaults) ~ attend_v + year+ month + . ,
data = df[ , c('assaults', 'attend_v', 'year', 'month', holiday_vars)] )
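Another option for combining several sets of controls is reformulate() from the stats package, which builds a formula from a character vector of term labels. A sketch with made-up data (the column names h_xmas, h_easter, w_rain and w_temp and all values are assumptions):

```r
set.seed(7)
n <- 40
# Hypothetical data with holiday (h_) and weather (w_) controls
df <- data.frame(
  assaults = rpois(n, 20) + 1,
  year     = sample(2000:2005, n, replace = TRUE),
  month    = sample(1:12, n, replace = TRUE),
  h_xmas   = rbinom(n, 1, 0.3),
  h_easter = rbinom(n, 1, 0.3),
  w_rain   = rnorm(n),
  w_temp   = rnorm(n)
)

holiday_vars <- grep("^h_", names(df), value = TRUE)
weather_vars <- grep("^w_", names(df), value = TRUE)

# Term labels are plain strings, so sets of controls can be concatenated freely
f <- reformulate(c("year", "month", holiday_vars, weather_vars),
                 response = quote(log(assaults)))
r_3 <- lm(f, data = df)
```

Adding another set of controls is then just a matter of appending another character vector to the term labels.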

Pasting a string in a loop

I am writing some code for a loop and I want to paste a string in the loop. However, for some reason the command "paste" does not seem to work:
A simple example:
### Creating some variables
test1<-c(1,2,3,4,5,6,7,8,9,10)
test2<-c(4,6,7,2,5,3,6,2,7,1)
test3<-c(3,5,6,7,7,7,7,3,5,3)
### An example of a loop
for (i in 1:2)
{
name<-paste("test",i,sep="")
fit <- lm(name~test2+test3)
}
I don't understand why this works:
fit <- lm(test1~test2+test3)
But this doesn't:
fit <- lm(name~test2+test3)
even though name is equal to "test1".
Any help would be much appreciated. Ideally I would like to use a loop rather than apply.

First, put your vectors in a data.frame. Second, you don't need a loop in this example.
DF <- data.frame(test1,
test2,
test3)
fits <- lm(do.call(cbind, DF[, 1:2]) ~ test2 + test3, data=DF)
#Coefficients:
#             test1       test2
#(Intercept)   7.655e+00   1.123e-15
#test2        -3.669e-01   1.000e+00
#test3        -1.089e-01   3.594e-17
Note that the result for test2 differs from lm(test2 ~ test2 + test3) because the response variable on the RHS is not removed.

get returns the value of a named object:
fit2 <- lm(get(name)~test2+test3)
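If you do want to keep the loop, another option is to build the formula from the string, as in the other questions here. In lm(name ~ ...), name is just the one-element string "test1", not the vector test1, which is why the original call fails. A small sketch using the vectors from the question (the choice of responses test1 and test3 with predictor test2 is only for illustration):

```r
test1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
test2 <- c(4, 6, 7, 2, 5, 3, 6, 2, 7, 1)
test3 <- c(3, 5, 6, 7, 7, 7, 7, 3, 5, 3)
DF <- data.frame(test1, test2, test3)

fits <- list()
for (name in c("test1", "test3")) {
  # Turn e.g. "test1 ~ test2" into a formula object before calling lm()
  f <- as.formula(paste(name, "~ test2"))
  fits[[name]] <- lm(f, data = DF)
}
```

Compared with get(name), the stored call and coefficient labels then show the actual variable name.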

Assign values from a vector to the `call` attribute in a list of lm models (mapply?)

I am working with a list of lm models. Let's create a small example of that:
set.seed(1234)
mydata <- matrix(rnorm(40),ncol=4)
modlist <- list()
for (i in 1:3) {
modlist[[i]] <- lm(mydata[,1] ~ mydata[,i+1])
}
In reality there are about 50 models. If you print the modlist object, you'll notice that the call attribute for each model is generic, namely lm(formula = mydata[, 1] ~ mydata[, i + 1]). As later subsets of this list will be needed, I would like to have the convenience to see the name of the dependent variable in each model, assigning that name to the respective call attribute:
modlist[[1]]$call <- "Factor 1"
One can see that the model call has changed to "Factor 1" in the first element of modlist. Let us say I have a vector of names, which I would like to assign:
modnames <- paste0("Factor",1:3)
It would be, of course, possible to assign the respective value of that vector to the respective model in the list, e.g.:
for (i in 1:3) {
modlist[[i]]$call <- modnames[i]
}
Is there a vectorized version of this? I suspect it will be mapply, but I can't figure out how to combine the assignment operator with extracting the respective element of the list, i.e. [[(). More of a purist anti-loop premature optimization exercise, but still :) Thank you!
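One possible loop-free version uses Map(), which walks over the models and the names in parallel and returns the modified models as a list. This is only a sketch of the idea and not faster than the for loop:

```r
set.seed(1234)
mydata <- matrix(rnorm(40), ncol = 4)
modlist <- lapply(1:3, function(i) lm(mydata[, 1] ~ mydata[, i + 1]))
modnames <- paste0("Factor", 1:3)

# For each (model, name) pair, overwrite the call element and return the model
modlist <- Map(function(mod, nm) {
  mod$call <- nm
  mod
}, modlist, modnames)
```

If the goal is only to identify models when subsetting, naming the list elements with names(modlist) <- modnames may already be enough.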

data.frame columns as tabular row labels in R

I am trying to make a table in R/Sweave using the tabular command. I want the row labels to be the headings of my data frame consca. (Each column is a question, and each row is a student's responses to each question.) The command I am using is this:
latex(tabular(Heading('Questions')*(paste(labels(consca)[[2]],collapse='+')) ~ (n=1) + (mn +
sdev),data=consca))
Which throws this error:
Error in term2table(rows[[i]], cols[[j]], data, n) :
Argument paste(labels(consca)[[2]], collapse = "+") is not length 298
The paste argument works...
paste(labels(consca)[[2]],collapse='+')
[1] "Q02+Q03+Q06+Q17+Q19+Q25+Q31+Q33+Q36+Q39+Q45+Q50"
And produces the output I desire:
latex(tabular(Heading('Questions')*(Q02+Q03+Q06+Q17+Q19+Q25+Q31+Q33+Q36+Q39+Q45+Q50) ~ (n=1) +
(mn + sdev),data=consca))
However, I want to do this with multiple scales (i.e. I want to change consca to other objects and I want to eliminate the copy/paste step.)
I have fiddled with eval and as.symbol, but to no avail. Perhaps I am not using them in the right way.
OK, and for those of you who will want a minimal reproducible example, here goes:
require(tables)
a <- rnorm(10)
b <- rnorm(10,2)
c <- rnorm(10,100)
x <- data.frame(a,b,c)
# This works:
tabular(a+b+c ~ (mean + sd), x)
# This fails:
tabular(paste(labels(x)[[2]],collapse='+') ~ (mean+sd),x)
# Even though:
paste(labels(x)[[2]],collapse='+')
[1] "a+b+c"

I found a workaround using the describe function in the psych package. (Ultimately, I wanted more than the mean and sd, and the describe function automatically calculates them.) This creates a data.frame, which is trivial to turn into a \LaTeX table. 😃
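For the original problem, the pasted string is still just a character vector, so it would have to be converted to a formula before being handed on. A sketch of that step in base R (passing the result to tabular() is an assumption and not tested here):

```r
a <- rnorm(10)
b <- rnorm(10, 2)
c <- rnorm(10, 100)
x <- data.frame(a, b, c)

# Build the whole formula as text, then convert it
lhs <- paste(names(x), collapse = "+")            # "a+b+c"
f <- as.formula(paste(lhs, "~ (mean + sd)"))
f
# With the tables package loaded, the idea would be to call tabular(f, x)
```

Swapping x for another data frame then changes the row labels without any copy/paste step.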
