How to use a dataframe in a function in R

I need to pass the variables of a dataframe into a function in R. The function in question is "y = 1 - (x1 - x2) / x3". When I enter the values manually it works; however, I need to use the random numbers from the dataframe.
# Original function
f <- function(x1, x2, x3) {
  return(1 - (x1 - x2) / x3)
}
f(0.9, 0.5, 0.5)
# Dataframe function
f <- function(x1, x2, x3) {
  return(1 - (x1 - x2) / x3)
}
f(x1 = x1, x2 = x2, x3 = x3, DATA = DF)
The first output is OK; however, the second call produces this error message: Error in f(VMB = VMB, VMR = VMR, DATA = DATA1) : unused argument (DATA = DATA1). I know I'm not passing the dataframe into the function properly, but I'm going in circles. Can anyone help me?

As the comments suggest, your problem is that the function doesn't have a data argument. R doesn't know where x1, x2 and x3 come from; it only looks for them in the environment the function is called from (here, the global environment). If they are columns of a data frame, R doesn't know that it should take them from there, and the call fails.
For example
f <- function(x, y, z)
  1 - (x - y) / z
f(0.9, 0.5, 0.5)
will work, because it knows where to retrieve the values. So will
x1 <- 0.9
x2 <- 0.5
x3 <- 0.5
f(x1, x2, x3)
because R finds these variables in the global environment, but
df <- data.frame(x = 0.9, y = 0.5, z = 0.5)
f(x, y, z) #fails
fails, because it doesn't look for them in df. Instead you can use
f(df$x, df$y, df$z)
with(df, f(x, y, z)) #same
which tells R where to get the variables. (Here I used x, y and z to avoid name conflicts.)
If this function should always take a data.frame and use the columns x1, x2 and x3, you could rewrite it to incorporate this, as below.
f <- function(df) {
  with(df, 1 - (x1 - x2) / x3)
}
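As a minimal sketch of the data-frame version with random inputs, using the formula from the question: the data frame DF, its column names, and the function name f_df are assumptions made for illustration, not something given in the original post.

```r
# Hypothetical data frame of random inputs; column names are assumed.
set.seed(1)
DF <- data.frame(x1 = runif(5), x2 = runif(5), x3 = runif(5, 0.5, 1))

# Takes the data frame itself and evaluates the formula on its columns.
f_df <- function(data) {
  with(data, 1 - (x1 - x2) / x3)
}

y <- f_df(DF)                             # one value per row of DF
y_check <- 1 - (DF$x1 - DF$x2) / DF$x3    # same computation with explicit columns
```

Because the arithmetic is vectorised, the function returns one result per row without any loop.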

Related

Why can't I add variables with 1x1 tibbles to a tibble with tibble::add_row?

This is a simple example of what I am trying to do. Why can't I add a row where each variable is a 1x1 tibble?
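library(tibble)
library(dplyr)  # provides %>%
d <- tibble(x = 2, y = 3)
x1 <- tibble(1)
y1 <- tibble(7)
# fails: x1 and y1 are whole tibbles, not single values
d2 <- d %>%
  tibble::add_row(
    x = x1,
    y = y1
  )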
I know the following works but it is not tidy and it just seems like there should be a better way.
d <- tibble(x = 2, y = 3)
x1 <- tibble(1)
y1 <- tibble(7)
d2 <- d %>%
  tibble::add_row(
    x = as.numeric(x1),
    y = as.numeric(y1)
  )
Thanks in advance awesome R people (:
Upgrading comment to an answer:
In your first attempt you're trying to insert a whole dataset (a tibble in this instance) into a single cell. You need to extract the single column from each tibble as a vector before adding it as a row.
add_row(d, x = x1[[1]], y = y1[[1]])
Running as.numeric(x1) also converts x1 to a vector in a round-about way, which as you've found, will also work.
You could also go full dplyr and do:
add_row(d, x = pull(x1,1), y = pull(y1,1))
...if you wanted to avoid using the [[ brackets.
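The difference between the whole table and the column it holds can be seen with a plain data frame, used here as a stand-in for a 1x1 tibble (an assumption, to keep the sketch free of package dependencies; the column name value is hypothetical):

```r
# A one-row, one-column data frame stands in for the 1x1 tibble.
x1 <- data.frame(value = 1)

single <- x1[1]     # single brackets: still a data frame with one column
inner  <- x1[[1]]   # double brackets: the underlying numeric vector

is.data.frame(single)
is.numeric(inner)
```

Only the second form yields a plain value that fits into a single cell of a new row.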

How to do linear regression with this particular data set?

I have a response variable y.
I also have a list of 5 independent (predictor) variables
x <- list(x1, x2, x3, x4, x5)
Lastly, I have a logical vector z of length 5, e.g.
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
Given this, I want R to automatically run the linear regression
lm(y ~ x1 + x2 + x5)
Basically, each TRUE/FALSE says whether or not to include the corresponding independent variable.
I am unable to do this.
I tried doing lm(y ~ x[z]) but it does not work.
You may do
lm(y ~ do.call(cbind, x[z]))
do.call(cbind, x[z]) will convert x[z] into a matrix, which is an acceptable input format for lm. One problem with this is that the names of the regressors (assuming that x is a named list) in the output are a little messy. So, instead you may do
lm(y ~ ., data = data.frame(y = y, do.call(cbind, x[z])))
that would give nice names in the output (again, assuming that x is a named list).
Try binding your y and the selected variables into a data.frame or matrix (cbind) before you do your linear regression. You can filter your independent variables by doing something like this:
x <- list(x1 = 1:5, x2 = 1:5, x3 = 1:10, x4 = 1:5, x5 = 1:5)
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
b <- data.frame(x[z])
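Another option, not taken from the answers above, is to build the formula from the selected names with reformulate() from base R's stats package. This is only a sketch under the assumption that x is a named list of equal-length vectors; the simulated data and the names dat, form and fit are made up for illustration.

```r
# Simulated data; the values carry no meaning.
set.seed(42)
n <- 20
x <- list(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
          x4 = rnorm(n), x5 = rnorm(n))
y <- rnorm(n)
z <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

dat  <- data.frame(y = y, x)                        # all candidates as columns
form <- reformulate(names(x)[z], response = "y")    # y ~ x1 + x2 + x5
fit  <- lm(form, data = dat)
names(coef(fit))
```

Since the formula refers to the columns by name, the coefficients in the output keep the names x1, x2 and x5.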

How to paste formula into model.matrix function in R?

By way of simplified example, say you have the following data:
n <- 10
df <- data.frame(x1 = rnorm(n, 3, 1), x2 = rnorm(n, 0, 1))
And you wish to create a model matrix of the following form:
model.matrix(~ df$x1 + df$x2)
or, preferably:
model.matrix(~ x1 + x2, data = df)
but by building the formula from a string and passing it to model.matrix. I have experimented with the following but encounter errors with all of them:
form1 <- "df$x1 + df$x2"
model.matrix(~ as.formula(form1))
model.matrix(~ eval(parse(text = form1)))
model.matrix(~ paste(form1))
model.matrix(~ form1)
I've also tried the same with the more preferable structure:
form2 <- "x1 + x2, data = df"
Is there a direct solution to this problem? Or is the model.matrix function not conducive to this approach?
Do you mean something like this?
expr <- "~ x1 + x2"
model.matrix(as.formula(expr), df)
You need to pass df as the data argument to model.matrix, outside of as.formula, because the data argument tells model.matrix where to look up the variables named in the formula.
If you don't want to specify the data argument you can do
model.matrix(as.formula("~ df$x1 + df$x2"))
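If the terms arrive as a character vector rather than a finished string, the same idea can be sketched as follows. The vector terms_chr is a hypothetical input, and the seed is set only to make the example reproducible.

```r
set.seed(1)
n <- 10
df <- data.frame(x1 = rnorm(n, 3, 1), x2 = rnorm(n, 0, 1))

terms_chr <- c("x1", "x2")   # hypothetical vector of variable names

# Paste the names into a one-sided formula string, then convert it.
form <- as.formula(paste("~", paste(terms_chr, collapse = " + ")))
mm <- model.matrix(form, data = df)
colnames(mm)
```

Only the formula part is pasted together; the data frame is still passed separately through the data argument.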

Writing if / ifelse function in R

I am attempting to write a function in order to create a variable (BBDR) based on the conditions of another variable (Site0) using the if function. I have the following code using the if function.
x1 <- (africanaDamRate$BB6-africanaDamRate$BB0)/29
x2 <- (africanaDamRate$BB6-africanaDamRate$BB0)/22
x3 <- (africanaDamRate$BB6-africanaDamRate$BB0)/34
x4 <- (africanaDamRate$BB6-africanaDamRate$BB0)/30
F1 <- function(y){
if(africanaDamRate$Site0==1){africanaDamRate$BBDR<-x1}
if(africanaDamRate$Site0==2){africanaDamRate$BBDR<-x2}
if(africanaDamRate$Site0==3){africanaDamRate$BBDR<-x3}
if(africanaDamRate$Site0==4){africanaDamRate$BBDR<-x4}
}
africanaDamRate$BBDR<-F1(y)
But when I attempt this code I receive "The condition has length greater than 1..."
I have also attempted using the ifelse function with the following code:
africanaDamRate$BBDR<-ifelse(c(africanaDamRate$Site0==1, x1, NA), c(africanaDamRate$Site0==2, x2, NA), c(africanaDamRate$Site0==3, x3, NA), c(africanaDamRate$Site0==4, x4, NA))
But get the "unused argument" error.
Does anyone have any ideas of how I can do this (without subsetting)? Thanks so much!
Ryan
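Your ifelse call is malformed: ifelse takes three arguments (test, yes, no), so the conditions have to be nested rather than wrapped in c(). It could be written like this: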
africanaDamRate$BBDR <- ifelse(africanaDamRate$Site0 == 1, x1,
                        ifelse(africanaDamRate$Site0 == 2, x2,
                        ifelse(africanaDamRate$Site0 == 3, x3,
                        ifelse(africanaDamRate$Site0 == 4, x4, NA))))
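Since the four cases differ only in the divisor, a lookup vector indexed by Site0 is a possible alternative to the nested ifelse. This is a sketch that assumes Site0 only takes the integer values 1 to 4; the small data frame below is a made-up stand-in for africanaDamRate, and days is a hypothetical name.

```r
# Made-up stand-in for the real data; the values are illustrative only.
africanaDamRate <- data.frame(Site0 = c(1, 2, 3, 4, 2),
                              BB0   = c(0, 1, 2, 3, 4),
                              BB6   = c(29, 23, 36, 33, 48))

days <- c(29, 22, 34, 30)   # divisors for sites 1 to 4, taken from the question

# Indexing days by Site0 picks the matching divisor for every row.
africanaDamRate$BBDR <-
  (africanaDamRate$BB6 - africanaDamRate$BB0) / days[africanaDamRate$Site0]
```

Adding a fifth site would then only require one more entry in days instead of another level of nesting.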

make a list of lm objects, retain their class

Apologies for such a rudimentary question--I must be missing something obvious.
I want to build a list of lm objects, which I'm then going to use in an llply call to perform mediation analysis on this list. But this is immaterial--I just first want to make a list of length m (where m is the set of models) and each element within m will itself contain n lm objects.
So in this simple example
d1 <- data.frame(x1 = runif(100, 0, 1),
x2 = runif(100, 0, 1),
x3 = runif(100, 0, 1),
y1 = runif(100, 0, 1),
y2 = runif(100, 0, 1),
y3 = runif(100, 0, 1))
m1 <- lm(y1 ~ x1 + x2 + x3, data = d1)
m2 <- lm(x1 ~ x2 + x3, data = d1)
m3 <- lm(y2 ~ x1 + x2 + x3, data = d1)
m4 <- lm(x2 ~ x1 + x3, data = d1)
m5 <- lm(y3 ~ y1 + y2 + x3, data = d1)
m6 <- lm(x3 ~ x1 + x2, data = d1)
I want a list containing 3 elements, where the first element contains m1 and m2, the second contains m3 and m4, etc. My initial attempt is sort of right, but the lm objects don't retain their class.
mlist <- list(c(m1,m2),
c(m3,m4),
c(m5,m6))
It has the right length (i.e. length(mlist) equals 3), but I thought I could access the lm object itself with
class(mlist[1][[1]])
but this element is apparently a list.
Am I screwing up how I build the list in the first step, or is this something more fundamental regarding lm objects?
No, you're just getting confused with c and list indexing. Try this:
mlist <- list(list(m1, m2),
              list(m3, m4),
              list(m5, m6))
> class(mlist[[1]][[1]])
[1] "lm"
So c will concatenate lists by flattening them. In the case of lm objects, that basically means it flattens each lm object into a list of its components and then concatenates all those lists together. c is more intuitively used on atomic vectors.
The indexing of lists often trips people up. The thing to remember is that [ will always return a sub-list, while [[ selects an element.
In my example above, this means that mlist[1] will return a list of length one. That first element is still a list. So you'd have to do something like mlist[1][[1]][[1]] to get all the way down to the lm object that way.
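The effect of c versus list, and of [ versus [[, can be checked on a small example. The data and the names flat and nested are made up for this sketch.

```r
set.seed(1)
d1 <- data.frame(x1 = runif(20), x2 = runif(20), y1 = runif(20))
m1 <- lm(y1 ~ x1 + x2, data = d1)
m2 <- lm(x1 ~ x2, data = d1)

flat   <- c(m1, m2)      # components of both models merged into one plain list
nested <- list(m1, m2)   # each element is still an lm object

class(flat)          # plain list; the lm class is gone
class(nested[[1]])   # [[ extracts the element itself
class(nested[1])     # [ returns a sub-list of length one
```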
