Referencing factor names in R for ANOVA - r

I'm relatively new to R and am trying to streamline an ANOVA script to read a set of factor names from a table, and perform statistical tests on the interactions between these factors.
My basic question is how to avoid writing the factor names by hand when I call aov, like this:
aov2 <- aov(no_gap ~ Diag*Age, data=data)
But instead, to index a variable which contains the names of the factors of interest, like this (but this doesn't work):
aov2 <- aov(get(vars[5]) ~ get(vars[1])*get(vars[2]), data=data)
Here's my whole script:
#Load data
outName <- read_file("fileNameToWrite.txt")
data <- read.table(header=TRUE, "testDataTable.txt",stringsAsFactors = TRUE)
vars <- colnames(data)
# Make sure subject column is a factor
cols <- c(vars[1:2])
data[,cols] <- data.frame(apply(data[cols], 2, as.factor))
##
# 2x2 between:
aov2 <- aov(get(vars[5]) ~ get(vars[1])*get(vars[2]), data=data)
aov2 <- aov(no_gap ~ Diag*Age, data=data)
aov2 <- aov(apply(vars[5]) ~ get(vars[1])*get(vars[2]), data=data)
summary(aov2)
For reference, this is what "vars" looks like when evaluated:
> vars
[1] "subject" "Diag" "Age" "gap" "no_gap"
Thanks so much for your help!!

The argument no_gap ~ Diag*Age that you pass to aov is a formula object. You can build the same formula from vars. Note that vars[1] is "subject", so Diag and Age are vars[2] and vars[3]:
myform <- as.formula(sprintf("%s ~ %s * %s", vars[5], vars[2], vars[3]))
aov2 <- aov(myform, data=data)
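As an alternative to pasting the formula together with sprintf, base R's reformulate() builds a formula from character vectors. Here is a minimal sketch; the data frame is a simulated stand-in for the asker's table (the values are made up, only the column names come from the question):

```r
# Simulated stand-in for testDataTable.txt (assumption: values are random,
# column names and order follow the `vars` output shown in the question)
set.seed(1)
data <- data.frame(
  subject = factor(1:40),
  Diag    = factor(rep(c("ctrl", "pat"), each = 20)),
  Age     = factor(rep(c("young", "old"), times = 20)),
  gap     = rnorm(40),
  no_gap  = rnorm(40)
)
vars <- colnames(data)

# Build no_gap ~ Diag * Age from the names stored in vars
myform <- reformulate(paste(vars[2], vars[3], sep = " * "), response = vars[5])
aov2 <- aov(myform, data = data)
summary(aov2)
```

Because the formula is built from strings, switching to the other response only means changing vars[5] to vars[4].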

Related

fitting linear regression models with different predictors using loops

I want to fit regression models using a single predictor variable at a time. In total I have 7 predictors and 1 response variable. I want to write a chunk of code that picks a predictor variable from the data frame and fits a model. I would also like to extract the regression coefficient (not the intercept) and its sign, and store them in 2 vectors. Here's my code:
for (x in (1:7))
{
fit <- lm(distance ~ FAA_unique_with_duration_filtered[x] , data=FAA_unique_with_duration_filtered)
coeff_values<-summary(fit)$coefficients[,1]
coeff_value<-coeff_values[2]
append(coeff_value_vector,coeff_value , after = length(coeff_value_vector))
append(RCs_sign_vector ,sign(coeff_values[2]) , after = length(RCs_sign_vector))
}
Here x should select the first column, then the 2nd, and so on. However, I am getting the following error:
Error in model.frame.default(formula = distance ~ FAA_unique_with_duration_filtered[x], :
invalid type (list) for variable 'FAA_unique_with_duration_filtered[x]'
Is there a way to do this using loops?
You don't really need loops for this.
Suppose we want to regress y1, the 5th column of the built-in anscombe dataset, separately on each of the first 4 columns.
Then:
a <- anscombe
reg <- function(i) coef(lm(y1 ~., a[c(5, i)]))[[2]] # use lm
coefs <- sapply(1:4, reg)
signs <- sign(coefs)
# or
a <- anscombe
reg <- function(i) cov(a$y1, a[[i]]) / var(a[[i]]) # use formula for slope
coefs <- sapply(1:4, reg)
signs <- sign(coefs)
Alternatively, use a loop, where reg is either of the reg definitions above:
a <- anscombe
coefs <- numeric(4)
for(i in 1:4) coefs[i] <- reg(i)
signs <- sign(coefs)
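If you do want to keep an explicit loop that is close to the code in the question, two things need to change: the formula has to refer to a column by name (df[x] is a one-column data frame, which is why lm complains about type "list"), and results have to be assigned, since append() returns a new vector rather than modifying its argument. A small sketch on anscombe, with predictors as a name I made up for illustration:

```r
a <- anscombe
predictors <- names(a)[1:4]          # hypothetical choice of predictor columns

# Preallocate a named result vector instead of calling append() in the loop
coefs <- setNames(numeric(length(predictors)), predictors)

for (p in predictors) {
  # reformulate() turns the column name into the formula y1 ~ <p>
  fit <- lm(reformulate(p, response = "y1"), data = a)
  coefs[p] <- coef(fit)[[2]]         # slope only, not the intercept
}
signs <- sign(coefs)
```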

Writing Regression summary as CSV, including Model Stats

The question is as it sounds: at present I use the broom package to tidy up my regression summary and then use write_csv to turn that summary into a csv. The problem is that this 'tidied' summary doesn't contain useful stats like R squared, the residual distribution, and the p value from the F statistic.
Does anyone know how to write up a regression summary into a csv which would contain this useful information?
Thanks.
An alternative would be to create a function that stores in a list all the information you need as follows:
model <- lm(mpg ~ cyl, mtcars)
model_stats <- function(model) {
  call_str <- toString(model$call)  # label each table with the model call
  list(
    data.frame(model = call_str, broom::augment(model)),  # per-observation fits and residuals
    data.frame(model = call_str, broom::tidy(model)),     # coefficient table
    data.frame(model = call_str, broom::glance(model))    # R squared, F statistic, p value, ...
  )
}
out <- model_stats(model)
# Write each table to its own file, reusing `out` rather than recomputing it
for (i in seq_along(out)) write.csv(out[[i]], paste0('info', i, '.csv'))
I assume the core issue is that the summary tables you want are not all data.frames (or collected into a single list).
So if you just want to write a particular statistics summary (aov, TukeyHSD, augment, glance, etc.) to a csv, convert it to a data.frame first.
An example adapted from the "broom" vignette: https://cran.r-project.org/web/packages/broom/vignettes/broom.html
library(broom)
glmfit <- glm(am ~ wt, mtcars, family = "binomial")
tidy(glmfit)
fit1 <- as.data.frame(augment(glmfit))
write.csv(fit1, "test.csv")
The value returned by lm is an object which can be further processed using summary. From the value returned by summary you can access different information and process it manually.
# Data and model fit
df <- data.frame(a = rnorm(100), b = rnorm(100))
mod <- lm(a~b, data = df)
su <- summary(mod)
# Helper function to create the output
fill <- function(row, mat) {
c(row, rep("", ncol(mat)-length(row)))
}
# Create the output. `su$coefficients` provides the matrix to which more information is added using `rbind`.
# The padding width is taken from `coefs`, because `output` does not exist yet at this point.
coefs <- su$coefficients
output <- rbind(
  coefs,
  `Additional stats` = fill("", coefs),
  Rsq = fill(su$r.squared, coefs),
  Adj.Rsq = fill(su$adj.r.squared, coefs),
  F = fill(su$fstatistic, coefs))
# Write to disk with row and colnames (col.names = NA)
write.table(output, file="model-summary.csv", sep = ",", col.names=NA)
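The question also asks for the p value of the F statistic, which summary() prints but does not store directly. It can be recomputed from su$fstatistic with pf(). Here is a base R sketch that puts the model-level stats in a one-row data frame; the column names and the temporary output file are my own choices, not anything prescribed by lm or broom:

```r
mod <- lm(mpg ~ cyl, data = mtcars)
su  <- summary(mod)

f       <- su$fstatistic             # named vector: value, numdf, dendf
resid_q <- quantile(residuals(mod))  # residual distribution: 0%, 25%, 50%, 75%, 100%

model_stats <- data.frame(
  r.squared     = su$r.squared,
  adj.r.squared = su$adj.r.squared,
  f.statistic   = f[["value"]],
  # upper-tail probability of the F distribution = p value of the overall F test
  f.p.value     = pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE),
  resid.min     = resid_q[["0%"]],
  resid.median  = resid_q[["50%"]],
  resid.max     = resid_q[["100%"]]
)

out_file <- tempfile(fileext = ".csv")  # hypothetical output location
write.csv(model_stats, out_file, row.names = FALSE)
```

Keeping the stats in a separate one-row table avoids mixing numbers and padding strings in the same matrix, at the cost of writing a second file next to the coefficient table.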

Looping over variables for multilevel regression produces type error [duplicate]

I am writing a multilevel regression model, in which I begin the second level with a dataframe of predictands (coefficients from the first level) and a dataframe of predictors. Both dataframes have the same number of observations.
I wish to loop over the predictands (columns in the first dataframe) and use lm() to regress them against the entire second dataframe of predictors. However, when I do, I get an error that I cannot figure out.
Example:
data(iris)
iris1 <- iris[-5] # remove the categories
iris2 <- iris[-5] * 6
for (col in names(iris1)) {
lm(iris1[col] ~ iris2)
}
## Error in model.frame.default(formula = iris1[col] ~ iris2, drop.unused.levels = TRUE) :
## invalid type (list) for variable 'iris1[col]'
I just can't understand what this means or why R considers iris1[col] to be a list.
For simplicity's sake I've tried merging them:
for (col in names(iris1)) {
tmp_df <- cbind(iris1[col], iris2)
colnames(tmp_df) <- letters[1:5] # to avoid duplicate names
lm(1 ~ ., tmp_df)
}
## Error in model.frame.default(formula = 1 ~ ., data = tmp_df, drop.unused.levels = TRUE) :
## variable lengths differ (found for 'a')
And this one's particularly frustrating because they are clearly the same length.
Note that lm can accept a matrix on the left hand side of the formula so we could do this:
lm(as.matrix(iris1) ~., iris2)
or if we want a separate lm object for each column of iris1:
regr <- function(y) lm(y ~., iris2)
Map(regr, iris1)
or
regr2 <- function(nm) {
fo <- as.formula(sprintf("iris1$%s ~.", nm))  # the response comes from iris1
do.call("lm", list(fo, quote(iris2)))
}
Map(regr2, names(iris1))
or lm.fit:
regr.fit <- function(y) lm.fit(cbind(1, as.matrix(iris2)), y)
Map(regr.fit, iris1)
Note that the component names of the result will be the y column name in iris1.
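As for why the original loop fails: single brackets on a data frame return another data frame (which is a list), while double brackets return the column itself as a vector. The second attempt fails for a different reason: the response in lm(1 ~ ., tmp_df) is the constant 1, which has length 1. A minimal sketch of the loop with [[ instead; fits is just a name I picked for the result list:

```r
iris1 <- iris[-5]      # remove the categories
iris2 <- iris[-5] * 6

class(iris1["Sepal.Length"])    # "data.frame": this is what lm rejects
class(iris1[["Sepal.Length"]])  # "numeric": a plain vector works as a response

fits <- list()
for (col in names(iris1)) {
  # [[ extracts the column as a vector; `.` expands to all columns of iris2
  fits[[col]] <- lm(iris1[[col]] ~ ., data = iris2)
}
names(fits)
```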

Stargazer one line per data set

I am running regressions using various subsets of a data set and a number of dependent variables.
An example using attitude data:
library(stargazer)
#REGRESSIONS USING DATASET 1
linear1.1 <- lm(rating ~ complaints, data = attitude) #dependent 1
linear1.2 <- lm(privileges ~ complaints, data = attitude) #dependent 2
#REGRESSIONS USING DATASET 2
linear2.1 <- lm(rating ~ complaints, data = attitude[1:15,]) #dependent 1
linear2.2 <- lm(privileges ~ complaints, data = attitude[1:15,]) #dependent 2
As you can see, both dependent variables, rating and privileges, are used in regressions for both subsets of the data. Using a standard stargazer approach produces the following table:
stargazer::stargazer(linear1.1,linear1.2,linear2.1,linear2.2,
omit.stat = "all",
keep = "complaints")
Each column represents one of the regression models. However, I'd like to have each column represent one dependent variable. Each subset of the data should represent one row:
I have produced this table by hand. Does anyone know whether it's possible to achieve this using stargazer? I have a lot of regression subsets and dependent variables, so a highly automatic solution is appreciated. Thanks!
I wonder whether this small modification of the answer to "Exporting output of custom multiple regressions from R to Latex" would suit you:
library(stargazer)
library(broom)
## generate dummy data
set.seed(123)
x <- runif(1000)
z <- x^0.5
y <- x + z + rnorm(1000, sd=.05)
model1 <- lm(y ~ x)
model2 <- lm(y ~ z)
## transform model summaries into dataframes
tidy(model1) -> model1_tidy
tidy(model2) -> model2_tidy
output <- rbind(model1_tidy,model2_tidy)
stargazer(output, type='text', summary=FALSE)
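To get one row per data subset and one column per dependent variable, one option is to collect the complaints coefficient into a matrix before printing it. This is a rough base R sketch using the attitude models from the question; subsets, deps and coef_table are names I made up for illustration:

```r
# Named list of data subsets (rows of the final table)
subsets <- list(full = attitude, first15 = attitude[1:15, ])
# Dependent variables (columns of the final table)
deps <- c("rating", "privileges")

# Outer sapply loops over dependent variables, inner sapply over subsets;
# the result is a matrix with one row per subset and one column per dependent
coef_table <- sapply(deps, function(dep) {
  sapply(subsets, function(d) {
    fit <- lm(reformulate("complaints", response = dep), data = d)
    coef(fit)[["complaints"]]
  })
})
coef_table
```

The resulting matrix (or as.data.frame(coef_table)) can presumably be passed to stargazer with summary=FALSE in the same way as the stacked data frame above, though I have not checked how the row names are rendered.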

R: Regression of each variable depending on all the others

In R, I have the following data.frame:
df <- data.frame(var1,var2,var3)
I would like to fit a regression function, like multinom, for each variable with respect to the others, without using the variable names explicitly. In other words, I would like to obtain this result:
fit1 <- multinom(var1 ~ ., data=df)
fit2 <- multinom(var2 ~ ., data=df)
fit3 <- multinom(var3 ~ ., data=df)
But in a for loop, without using the variable names (so that I can use the same code for any data.frame). Something similar to this:
for (i in colnames(df))
{
fit[i] <- lm(i ~ ., data=df)
}
(This code does not work.)
Maybe my question is trivial, but I have no idea on how to proceed.
Thanks!
You need an extra step that builds the formula object with string operations:
# Start from an empty list: assigning by name fills it with one element per column
fit <- list()
for (i in colnames(df)) {
fm <- as.formula(paste0(i, " ~ ."))
fit[[i]] <- lm(fm, data = df)
}
