How to do linear regression with this particular data set? - r

I have a response variable y.
I also have a list of 5 independent variables (predictors):
x <- list(x1, x2, x3, x4, x5)
Lastly, I have a logical vector z of length 5, e.g.
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
Given this, I want R to automatically run the linear regression
lm(y ~ x1 + x2 + x5)
Basically, each TRUE/FALSE indicates whether to include the corresponding independent variable.
I tried lm(y ~ x[z]), but it does not work.

You may do
lm(y ~ do.call(cbind, x[z]))
do.call(cbind, x[z]) will convert x[z] into a matrix, which is an acceptable input format for lm. One problem with this is that the names of the regressors (assuming that x is a named list) in the output are a little messy. So, instead you may do
lm(y ~ ., data = data.frame(y = y, do.call(cbind, x[z])))
that would give nice names in the output (again, assuming that x is a named list).
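As a minimal sketch of the data frame approach, assuming x is a named list of equal-length numeric vectors (the simulated values below are illustrative and not from the question):

```r
set.seed(1)                       # illustrative data, not from the question
n <- 20
x <- list(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
          x4 = rnorm(n), x5 = rnorm(n))
y <- 1 + 2 * x$x1 - x$x5 + rnorm(n)
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

# Keep only the selected predictors and put them next to y in one data frame
dat <- data.frame(y = y, x[z])
fit <- lm(y ~ ., data = dat)

# The coefficient names come straight from the list names
names(coef(fit))
```

Because the column names are taken from the list, the coefficients are labelled x1, x2 and x5 rather than by the cbind expression.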

Try binding your y and the selected predictors into a data.frame or matrix (cbind) before you run the linear regression. You can filter your independent variables like this:
x <- list(x1 = 1:5, x2 = 1:5, x3 = 1:10, x4 = 1:5, x5 = 1:5)
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
b <- data.frame(x[which(z == TRUE)])
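To finish the regression from here, one option is to build the formula from the selected names with reformulate(). The sketch below assumes a named list of equal-length vectors and a response y of the same length; the simulated values are made up for illustration.

```r
set.seed(42)                      # illustrative data only
n <- 8
x <- list(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
          x4 = rnorm(n), x5 = rnorm(n))
y <- rnorm(n)
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

b   <- data.frame(x[z])           # logical indexing works directly on the list
b$y <- y                          # add the response as a column

# Build y ~ x1 + x2 + x5 from the names of the selected elements
form <- reformulate(names(x)[z], response = "y")
fit  <- lm(form, data = b)
```

Printing form shows the formula that lm() actually received, which makes it easy to check the selection.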

Related

How to use a dataframe in a function in r

I need to pass the variables of a dataframe into a function in R. The function in question is "y = 1 - (x1 - x2) / x3". When I enter the values manually it works; however, I need to use the random numbers stored in the dataframe.
# Original function
f <- function(x1, x2, x3) {
  return(1 - (x1 - x2) / x3)
}
f(0.9, 0.5, 0.5)
# Dataframe function
f <- function(x1, x2, x3) {
  return(1 - (x1 - x2) / x3)
}
f(x1 = x1, x2 = x2, x3 = x3, DATA = DF)
The first call works, but the second one fails with this error message:
Error in f(VMB = VMB, VMR = VMR, DATA = DATA1) : unused argument (DATA = DATA1)
I know I'm not passing the dataframe into the function properly, but I'm going in circles. Can anyone help me?
As the comments suggest, your problem is that the function doesn't have a data argument. R doesn't know where x1, x2 and x3 come from; it only looks for them in the calling environment (here, the global environment). If they are columns of a data frame, R doesn't know it should take them from there, and the call fails.
For example
f <- function(x, y, z)
  1 - (x - y)/z
f(0.9, 0.5, 0.5)
will work, because it knows where to retrieve the values. So will
x1 <- 0.9
x2 <- 0.5
x3 <- 0.5
f(x1, x2, x3)
because R finds these variables in the global environment, but
df <- data.frame(x = 0.9, y = 0.5, z = 0.5)
f(x, y, z) #fails
fails, because it doesn't look for them in df. Instead you can use
f(df$x, df$y, df$z)
with(df, f(x, y, z)) #same
which lets R know where to get the variables. (Here I used x, y and z to avoid name conflicts.)
If this function should always take a data.frame and use its columns x1, x2 and x3, you could rewrite it to incorporate this, as below.
f <- function(df){
  with(df, 1 - (x1 - x2)/x3)
}
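A small sketch of two ways to feed data frame columns into the three-argument function, using the formula from the question. The data frame DF and its values are made up for illustration.

```r
# Formula from the question: y = 1 - (x1 - x2) / x3
f <- function(x1, x2, x3) 1 - (x1 - x2) / x3

# Illustrative data frame (values are assumptions, not from the question)
DF <- data.frame(x1 = c(0.9, 0.8), x2 = c(0.5, 0.2), x3 = c(0.5, 0.3))

# Option 1: evaluate the call inside the data frame
DF$y <- with(DF, f(x1, x2, x3))

# Option 2: pass the columns as named arguments; a data frame is a list,
# so do.call() matches the column names to the argument names
y_alt <- do.call(f, DF[c("x1", "x2", "x3")])
```

Both options rely on f being vectorised, so each row of DF gets its own result.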

How to paste formula into model.matrix function in R?

By way of simplified example, say you have the following data:
n <- 10
df <- data.frame(x1 = rnorm(n, 3, 1), x2 = rnorm(n, 0, 1))
And you wish to create a model matrix of the following form:
model.matrix(~ df$x1 + df$x2)
or more preferably:
model.matrix(~ x1 + x2, data = df)
but instead by pasting the formula into model.matrix. I have experimented with the following but encounter errors with all of them:
form1 <- "df$x1 + df$x2"
model.matrix(~ as.formula(form1))
model.matrix(~ eval(parse(text = form1)))
model.matrix(~ paste(form1))
model.matrix(~ form1)
I've also tried the same with the more preferable structure:
form2 <- "x1 + x2, data = df"
Is there a direct solution to this problem? Or is the model.matrix function not conducive to this approach?
Do you mean something like this?
expr <- "~ x1 + x2"
model.matrix(as.formula(expr), df)
You need to give df as the data argument outside of as.formula, as the data argument defines the environment within which to evaluate the formula.
If you don't want to specify the data argument you can do
model.matrix(as.formula("~ df$x1 + df$x2"))
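If the variable names are available as a character vector, reformulate() can build the one-sided formula without pasting strings by hand. A minimal sketch, assuming the column names x1 and x2 from the question (the vector vars is introduced here for illustration):

```r
set.seed(1)
n  <- 10
df <- data.frame(x1 = rnorm(n, 3, 1), x2 = rnorm(n, 0, 1))

vars <- c("x1", "x2")             # names of the columns to include
form <- reformulate(vars)         # one-sided formula: ~x1 + x2
mm   <- model.matrix(form, data = df)

colnames(mm)
```

The data argument still has to be supplied to model.matrix() itself, not embedded in the formula string.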

Extracting residual values from lavaan list matrices in R

I am using the lavaan package, and my intention is to get my model residuals as dataframes for further use. I run several models that have grouping variables. Here's the basic workflow:
require(lavaan)
df <- data.frame(
y1 = sample(1:100),
y2 = sample(1:100),
x1 = sample(1:100),
x2 = sample(1:100),
x3 = sample(1:100),
grpvar = sample(c("grp1","grp2"), 100, replace = TRUE))
semModel <- list(length = 2)
semModel[1] <- 'y1 ~ c(a,b)*x1 + c(a,b)*x2'
semModel[2] <- 'y1 ~ c(a,b)*x1
y2 ~ c(a,b)*x2 + c(a,b)*x3'
funEstim <- function(model){
sem(model, data = df, group = "grpvar", estimator = "MLM")}
fits <- lapply(semModel, funEstim)
residuals <- lapply(fits, function(x) resid(x, "obs"))
Now the resulting residuals object is what troubles me. It is a list of matrices nested a few levels deep. How do I get each of the matrices as a separate dataframe without any hardcoding? I don't want to unlist them completely, as that would lose some information.
You can use list2env along with unlist to make the grp1, grp2, length.grp1, and length.grp2 directly available in the global environment.
list2env(unlist(residuals, recursive=FALSE), envir=.GlobalEnv)
ls()
#[1] "df" "fits" "funEstim" "grp1" "grp2"
#[6] "length.grp1" "length.grp2" "residuals" "semModel"
But they won't be data frames. For that you could convert them to data frames before calling list2env:
df.list <- lapply(unlist(residuals, recursive=FALSE), data.frame)
list2env(df.list, envir=.GlobalEnv)
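The same flattening can be tried without lavaan on a mock object. The sketch below assumes the nested shape implied by the answer above (one list per model, one matrix per group); residuals_mock and the names modelA and modelB are hypothetical. It keeps the data frames in a named list instead of copying them into the global environment.

```r
# Mock stand-in for the nested output: one list per model, one matrix per group
m <- matrix(1:4, 2, 2, dimnames = list(c("y1", "x1"), c("y1", "x1")))
residuals_mock <- list(modelA = list(grp1 = m,     grp2 = m * 2),
                       modelB = list(grp1 = m + 1, grp2 = m - 1))

# Remove one level of nesting; the names are joined as "model.group"
flat <- unlist(residuals_mock, recursive = FALSE)

# Convert every matrix to a data frame, keeping row and column names
df.list <- lapply(flat, as.data.frame)

names(df.list)
```

Naming the models up front gives each data frame an identifiable name such as modelA.grp1, and the list can still be passed to list2env() if separate objects are needed.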

List Indexing in R over a loop

I'm new to using lists in R and am trying to run a loop over various data frames that stores multiple models for each frame. I would like the models that correspond to a given data frame within the first index of the list; e.g. [[i]][1], [[i]][2]. The following example overwrites the list:
f1 <- data.frame(x = seq(1:6), y = sample(1:100, 6, replace = TRUE), z = rnorm(6))
f2 <- data.frame(x = seq(6,11), y = sample(1:100, 6, replace = TRUE), z = rnorm(6))
data.frames <- list(f1,f2)
fit <- list()
for(i in 1:length(data.frames)){
fit[[i]] <- lm(y ~ x, data = data.frames[[i]])
fit[[i]] <- lm(y ~ x + z, data = data.frames[[i]])
}
Any idea how to set up the list or the indexing in the loop such that it generates an output that has the two models for the first frame referenced as [[1]][1] and [[1]][2] and the second frame as [[2]][1] and [[2]][2]? Thanks for any and all help.
Calculate both models in a single lapply call applied to each part of the data.frames list:
lapply(data.frames, function(i) {
list(lm(y ~ x, data = i),
lm(y ~ x + z, data=i))
})
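A runnable sketch of this approach with named inner lists, so that each model can be reached by position or by name. The names simple and full are chosen here for illustration.

```r
set.seed(1)
f1 <- data.frame(x = 1:6,  y = sample(1:100, 6, replace = TRUE), z = rnorm(6))
f2 <- data.frame(x = 6:11, y = sample(1:100, 6, replace = TRUE), z = rnorm(6))
data.frames <- list(f1, f2)

# One inner list of two models per data frame
fit <- lapply(data.frames, function(d) {
  list(simple = lm(y ~ x,     data = d),
       full   = lm(y ~ x + z, data = d))
})

second_of_first <- fit[[1]][[2]]    # the lm object itself
first_of_second <- fit[[2]]$simple  # the same kind of access, by name
wrapped         <- fit[[1]][2]      # single brackets return a list of length one
```

Note the difference between [[1]][[2]] and [[1]][2]: double brackets extract the model, while single brackets return a one-element list that still contains it.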

How to construct a big regular formula for a model in R?

I am trying to create a model to predict "y" from data "D", which contains predictors x1 to x100 and 200 other variables. Since the x columns are not stored consecutively, I can't select them by column position.
I can't use ctree(y ~ ., data = D) because of the other variables. Is there a way to refer to x1 through x100 in the model
instead of writing out a very long call like
ctree(y ~ x1 + x2 + x..... x100)
Any recommendations would be appreciated.
Two more. The simplest in my mind is to subset the data:
ctree(y ~ ., data = D[, c("y", paste0("x", 1:100))])
Or a more functional approach to building dynamic formulas:
ctree(reformulate(paste0("x", 1:100), "y"), data = D)
Construct your formula as a text string, and convert it with as.formula.
vars <- names(D)[1:100] # or wherever your desired predictors are
fm <- paste("y ~", paste(vars, collapse="+"))
fm <- as.formula(fm)
ctree(fm, data=D, ...)
You can use this:
fml <- as.formula(paste("y", paste0("x", 1:100, collapse=" + "), sep=" ~ "))
ctree(fml, data = D)
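When the predictors are scattered among other columns, their names can also be picked up by pattern. The sketch below uses a small made-up data frame with x1 to x5 and fits lm() as a stand-in for ctree() so that it runs without extra packages; the resulting formula could be passed to ctree() in the same way.

```r
set.seed(1)
# Illustrative data: predictors x1..x5 mixed with unrelated columns
D <- data.frame(y = rnorm(30), other1 = rnorm(30), x1 = rnorm(30),
                x2 = rnorm(30), other2 = rnorm(30), x3 = rnorm(30),
                x4 = rnorm(30), x5 = rnorm(30))

# Select columns named "x" followed by digits, wherever they sit in D
vars <- grep("^x[0-9]+$", names(D), value = TRUE)

fm  <- reformulate(vars, response = "y")   # y ~ x1 + x2 + x3 + x4 + x5
fit <- lm(fm, data = D)                    # lm stands in for ctree here
```

Matching by name avoids relying on column positions, which was the difficulty described in the question.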
