R: Regression of each variable depending on all the others

In R, I have the following data.frame:
df <- data.frame(var1,var2,var3)
I would like to fit a regression function, like multinom, for each variable with respect to the others, without using the variable names explicitly. In other words, I would like to obtain this result:
fit1 <- multinom(var1 ~ ., data=df)
fit2 <- multinom(var2 ~ ., data=df)
fit3 <- multinom(var3 ~ ., data=df)
But in a for loop, without using the variable names (so that I can use the same code for any data.frame). Something similar to this:
for (i in colnames(df))
{
fit[i] <- lm(i ~ ., data=df)
}
(This code does not work.)
Maybe my question is trivial, but I have no idea on how to proceed.
Thanks!

You need an extra step that builds the formula object with string operations:
fit <- setNames(vector(mode = "list", length = ncol(df)), colnames(df))
for (i in colnames(df)) {
fm <- as.formula(paste0(i, " ~ ."))
fit[[i]] <- lm(fm, data = df)
}
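As a minimal sketch of the same idea without a for loop, the formula can also be built with reformulate() and the fits collected with lapply(). The data frame below (three columns of the built-in mtcars set) is only a stand-in for df, and lm is used in place of multinom so that the example needs no extra package.

```r
# Stand-in data: three numeric columns from the built-in mtcars set (an assumption)
df <- mtcars[, c("mpg", "disp", "wt")]

# One model per column, each regressed on all remaining columns.
# Iterating over a named vector makes lapply() return a named list.
fits <- lapply(setNames(names(df), names(df)), function(v) {
  fm <- reformulate(setdiff(names(df), v), response = v)
  lm(fm, data = df)
})

names(fits)            # one entry per column of df
formula(fits[["wt"]])  # the formula used for the model with wt as response
```

Listing the predictors explicitly with setdiff() avoids relying on the dot shorthand, so the stored formula shows exactly which columns were used.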

Related

multiple linear models in the same data frame

I have a function that takes a data frame, the first column must be Y and the user selects which column will be X. I need to run multiple linear models in the same data.frame (find which lm has the best results for my user).
Using mtcars dataset, what I have for only one linear model:
results_LM <- function(data, var) {
  fm1 <- as.formula(paste(colnames(data)[1], "~", var))
  lm1 <- lm(fm1, data = data)
  return(lm1)
}
fit <- results_LM(mtcars, "disp")
I would do the same for each linear model I'll test (and store in a final list that I'll use later):
results_LM <- function(data, var) {
  fm1 <- as.formula(paste(colnames(data)[1], "~", var))
  lm1 <- lm(fm1, data = data)
  fm2 <- as.formula(paste(colnames(data)[1], "~", var, "+ I(", var, "^2)"))
  lm2 <- lm(fm2, data = data)
  all_lm <- list("FirstLM" = lm1, "SecondLM" = lm2)
  return(all_lm)
}
And this goes on for fm3, lm3 ... fm99, lm99.
This would work, but I guess there is a much better way to do this.
Any ideas on how to run multiple linear models in the same data frame?
Already solved, after looking at this post.
I put all my formulas inside a list and used lapply to fit all of them:
results_LM <- function(data, var) {
  formulas <- list(
    as.formula(paste(colnames(data)[1], "~", var)),
    as.formula(paste(colnames(data)[1], "~", var, "+ I(", var, "^2)"))
  )
  models <- lapply(formulas, lm, data = data)
  return(models)
}
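If the candidate models follow a pattern, the formula list itself can be generated rather than typed out. The sketch below assumes the candidates are polynomials of increasing degree in one predictor; fit_poly_models and max_degree are hypothetical names introduced for illustration, and adjusted R-squared is just one possible way to compare the fits.

```r
# Hypothetical helper: fit polynomial models of degree 1..max_degree,
# using the first column of `data` as the response (as in the question).
fit_poly_models <- function(data, var, max_degree = 3) {
  response <- colnames(data)[1]
  degrees <- seq_len(max_degree)
  models <- lapply(degrees, function(d) {
    # Terms: var, I(var^2), ..., I(var^d)
    terms <- c(var, if (d > 1) sprintf("I(%s^%d)", var, 2:d))
    lm(reformulate(terms, response = response), data = data)
  })
  names(models) <- paste0("degree_", degrees)
  models
}

models <- fit_poly_models(mtcars, "disp")

# Compare the fits, here by adjusted R-squared
sapply(models, function(m) summary(m)$adj.r.squared)
```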

Referencing factor names in R for ANOVA

I'm relatively new to R and am trying to streamline an ANOVA script to read a set of factor names from a table, and perform statistical tests on the interactions between these factors.
My basic question is how to not have to manually write the name of factors when I call aov, like this:
aov2 <- aov(no_gap ~ Diag*Age, data=data)
But instead, to index a variable which contains the names of the factors of interest, like this (but this doesn't work):
aov2 <- aov(get(vars[5]) ~ get(vars[1])*get(vars[2]), data=data)
Here's my whole script:
# Load data
library(readr)  # provides read_file()
outName <- read_file("fileNameToWrite.txt")
data <- read.table(header=TRUE, "testDataTable.txt",stringsAsFactors = TRUE)
vars <- colnames(data)
# Make sure the first two columns are factors
cols <- vars[1:2]
data[cols] <- lapply(data[cols], as.factor)
##
# 2x2 between:
aov2 <- aov(get(vars[5]) ~ get(vars[1])*get(vars[2]), data=data)
aov2 <- aov(no_gap ~ Diag*Age, data=data)
aov2 <- aov(apply(vars[5]) ~ get(vars[1])*get(vars[2]), data=data)
summary(aov2)
For reference, this is what "vars" looks like when evaluated:
> vars
[1] "subject" "Diag" "Age" "gap" "no_gap"
Thanks so much for your help!!
The argument no_gap ~ Diag*Age that you pass to aov is a formula object. You can build a formula object from the names in vars. Note that with the vars shown above, Diag and Age are vars[2] and vars[3] (vars[1] is subject):
myform <- as.formula(sprintf("%s ~ %s * %s", vars[5], vars[2], vars[3]))
aov2 <- aov(myform, data=data)
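A runnable sketch of the same approach, using the built-in warpbreaks data as a stand-in for the table in the question (so the column names and index positions here are assumptions specific to that dataset). It builds the formula with reformulate() instead of sprintf(); both give a formula object that aov accepts.

```r
# Stand-in data: warpbreaks has a numeric response and two factors
data <- warpbreaks
vars <- colnames(data)  # "breaks" "wool" "tension"

# Build response ~ factor1 * factor2 from the column names
myform <- reformulate(paste(vars[2], vars[3], sep = " * "), response = vars[1])

aov2 <- aov(myform, data = data)
summary(aov2)
```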

How to create a loop for a linear model in R

I am here to ask your help.
I have to run a series of OLS regressions on multiple dependent variables, using the same set of independent variables.
That is, I have a data frame of size 1510x5 in which each column represents the return of a portfolio, and I would like to regress each column against the same set of independent variables (1510x4), which in my case are the factors from the Carhart model. Since, besides the values of the coefficients, I am interested in both their p-values and the R2 of each regression, is there a way to build a loop that stores this information?
What I have tried so far is:
for (i in 1:ncol(EW_Portfolio)) {
lmfit <- lm(EW_Portfolio[, i] ~ FFM)
summary(lmfit_i)
}
in the hope that, every time the loop repeated itself, I could see the result of each individual regression.
The easiest would be to store it in a list:
resultsList <- list()
for (i in 1:ncol(EW_Portfolio)) {
  # FFM must be a numeric matrix here; wrap it in as.matrix() if it is a data frame
  lmfit <- lm(EW_Portfolio[, i] ~ FFM)
  resultsList[[i]] <- summary(lmfit)
}
You can then access the results you mention:
resultsList[[1]]$coefficients
resultsList[[1]]$r.squared
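The p-values asked about in the question sit in the "Pr(>|t|)" column of each coefficient table. A minimal sketch of collecting them together with R-squared, using columns of mtcars as stand-ins for EW_Portfolio and FFM (the object names responses and factors are assumptions for illustration):

```r
# Stand-ins for the question's objects (assumptions, not the real data)
responses <- mtcars[, c("mpg", "qsec")]        # plays the role of EW_Portfolio
factors <- as.matrix(mtcars[, c("wt", "hp")])  # plays the role of FFM

# One summary per response column; lapply keeps the column names
resultsList <- lapply(responses, function(y) summary(lm(y ~ factors)))

# One row per regression, one column per coefficient
pvals <- t(sapply(resultsList, function(s) s$coefficients[, "Pr(>|t|)"]))
r2 <- sapply(resultsList, function(s) s$r.squared)

pvals
r2
```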
It may be something like this; I am not sure about the p-values:
data("mtcars")
formulas <- list(
mpg ~ disp,
mpg ~ disp + wt
)
res <- vector("list", length = length(formulas))
my.r2 <- vector("list", length = length(formulas))
my.sum <- vector("list", length = length(formulas))
for(i in seq_along(formulas)){
res[[i]] <- lm(formulas[[i]], data = mtcars)
my.r2[[i]] <- (summary(res[[i]]))$adj.r.squared
my.sum[[i]] <- (summary(res[[i]]))
}
res
unlist(my.r2)
my.sum
lapply(formulas, lm, data = mtcars)

Rank a list of models based on AIC values

After applying a model between one response variable and several explanatory variables across a data frame, I would like to rank each model by its AIC score.
I have encountered a very similar question, "Using lapply on a list of models", that does exactly what I want to do, but its solution does not seem to work for me and I'm not sure why. Here's an example using the mtcars dataset:
lm_multiple <- lapply(mtcars[,-1], function(x) summary(lm(mtcars$mpg ~ x)))
An approved answer from the link above suggested:
sapply(X = lm_multiple, FUN = AIC)
But this does not work for me; I get this error message:
Error in UseMethod("logLik") :
no applicable method for 'logLik' applied to an object of class "summary.lm"
Here is an answer from the original question...
x <- seq(1:10)
y <- sin(x)^2
model.list <- list(model1 = lm(y ~ x),
model2 = lm(y ~ x + I(x^2) + I(x^3)))
sapply(X = model.list, FUN = AIC)
You should remove the summary() call, because AIC needs the fitted lm object rather than its summary.lm object:
lm_multiple <- lapply(mtcars[,-1], function(x) lm(mtcars$mpg ~ x))
sapply(X = lm_multiple, FUN = AIC)
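To get from the AIC values to an actual ranking, one option is to sort them. The sketch below repeats the fit so that it runs on its own; the data frame name ranking is an assumption for illustration.

```r
# Fit mpg against each other column of mtcars, keeping the lm objects
lm_multiple <- lapply(mtcars[, -1], function(x) lm(mtcars$mpg ~ x))

# Named vector of AIC values, one per predictor
aic <- sapply(lm_multiple, AIC)

# Rank the models: lowest AIC first
ranking <- data.frame(predictor = names(aic), AIC = aic, row.names = NULL)
ranking <- ranking[order(ranking$AIC), ]
ranking
```

All models here use the same response and the same rows, which is what makes their AIC values comparable.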

Give the formula of a SVM with R

I use this code for my SVM prediction
library(gdata)
data = read.csv2("test.csv")
data
library(e1071)
model <- svm(cote ~ .,data,kernel='radial')
#model1 <- svm(y ~ x1+x2, data=f, type='nu-classification',kernel='radial',tolerance=0.001,gamma=2.5,cost=2,nu=0.8,cross=10,shrinking=FALSE)
predict(model, subset(data, select = - c(cote)))
Now I need to get the literal formula of this SVM so that I can paste it into a C++ program. How can I do that?
Thx
Maybe the formula can be recovered from the 'model'-object. Try this:
model$call[[2]]
Example:
> ?e1071::predict.svm
> model <- svm(Species ~ ., data = iris)
> model$call[[2]]
# Species ~ .
If you want that as a character variable the usual methods of coercion work as expected.
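As a small sketch of that coercion, deparse() turns the stored formula into a string. The example uses lm rather than svm so that it runs without e1071; that substitution is an assumption, although the formula is stored in the same position of the call in both cases shown here.

```r
# lm used as a stand-in for svm (assumption); the formula is the first argument
fit <- lm(Sepal.Length ~ ., data = iris)

fm <- fit$call[[2]]    # the unevaluated formula from the stored call
fm_chr <- deparse(fm)  # coerce it to a character string
fm_chr
```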
