Using paste and substitute in combination with quotation marks in R

Please note that I already had a look at this and that but still cannot solve my problem.
Suppose a minimal working example:
a <- c(1,2,3)
b <- c(2,3,4)
c <- c(4,5,6)
dftest <- data.frame(a,b,c)
foo <- function(x, y, data = data) {
data[, c("x","y")]
}
foo(a, b, data = dftest)
Here, the last line obviously returns an Error: undefined columns selected. This error is returned because the columns to be selected are x and y, which are not part of the data frame dftest.
Question: How do I need to formulate the definition of the function to obtain the desired output, which is
> dftest[, c("a","b")]
# a b
# 1 1 2
# 2 2 3
# 3 3 4
which I want to obtain by calling the function foo.
Please be aware that in order for the solution to be useful for my purposes, the format of the function call of foo is to be regarded fixed, that is, the only changes are to be made to the function itself, not the call. I.e. foo(a, b, data = dftest) is the only input to be allowed.
Approach: I tried to use paste and substitute in combination with eval to first replace the x and y with the arguments of the function call and then evaluate the call. However, escaping the quotation marks seems to be a problem here:
foo <- function(x, y, data = data) {
substitute(data[, paste("c(\"",x,"\",\"",y,"\")", sep = "")])
}
foo(a, b, data = dftest)
eval(foo(a, b, data = dftest))
Here, foo(a, b, data = dftest) returns:
dftest[, paste("c(\"", a, "\",\"", b, "\")", sep = "")]
However, when evaluating with eval() (focusing only on the paste part),
paste("c(\"", a, "\",\"", b, "\")", sep = "")
returns:
# "c(\"1\",\"2\")" "c(\"2\",\"3\")" "c(\"3\",\"4\")"
and not, as I had hoped, c("a","b"), thus again resulting in the same error as above.

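Try this. deparse(substitute(x)) turns the unevaluated argument into a string, and the self-referential default data = data is dropped because it fails with "promise already under evaluation" whenever data is omitted:
foo <- function(x, y, data) {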
x <- deparse(substitute(x))
y <- deparse(substitute(y))
data[, c(x, y)]
}
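A quick check of the answer above, as a sketch. It rebuilds dftest from the question and keeps the call foo(a, b, data = dftest) unchanged; the only liberty taken is declaring data without a default.

```r
# Rebuild the data frame from the question
dftest <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4), c = c(4, 5, 6))

foo <- function(x, y, data) {
  # substitute() captures the argument as typed (the symbol a, not its value);
  # deparse() converts that symbol into the string "a"
  x <- deparse(substitute(x))
  y <- deparse(substitute(y))
  data[, c(x, y)]
}

res <- foo(a, b, data = dftest)
print(res)
```

Because the arguments are never evaluated, it does not matter whether objects called a and b exist in the calling environment.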

Related

How to write a function with an unspecified number of arguments where the arguments are column names

I am trying to write a function with an unspecified number of arguments using ... but I am running into issues where those arguments are column names. As a simple example, if I want a function that takes a data frame and uses within() to make a new column that is several other columns pasted together, I would intuitively write it as
example.fun <- function(input,...){
res <- within(input,pasted <- paste(...))
res}
where input is a data frame and ... specifies column names. This gives an error saying that the column names cannot be found (they are treated as objects). e.g.
df <- data.frame(x = c(1,2),y=c("a","b"))
example.fun(df,x,y)
This returns "Error in paste(...) : object 'x' not found "
I can use attach() and detach() within the function as a work around,
example.fun2 <- function(input,...){
attach(input)
res <- within(input,pasted <- paste(...))
detach(input)
res}
This works, but it's clunky and runs into issues if there happens to be an object in the global environment that is called the same thing as a column name, so it's not my preference.
What is the correct way to do this?
Thanks
1) Wrap the code in eval(substitute(...code...)) like this:
example.fun <- function(data, ...) {
eval(substitute(within(data, pasted <- paste(...))))
}
# test
df <- data.frame(x = c(1, 2), y = c("a", "b"))
example.fun(df, x, y)
## x y pasted
## 1 1 a 1 a
## 2 2 b 2 b
1a) A variation of that would be:
example.fun.2 <- function(data, ...) {
data.frame(data, pasted = eval(substitute(paste(...)), data))
}
example.fun.2(df, x, y)
2) Another possibility is to convert each argument to a character string and then use indexing.
example.fun.3 <- function(data, ...) {
vnames <- sapply(substitute(list(...))[-1], deparse)
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.3(df, x, y)
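To see what option 2 does internally, here is a small sketch that returns only the captured names. show_names is a hypothetical helper written for illustration; it reuses the same substitute(list(...))[-1] idiom.

```r
show_names <- function(data, ...) {
  # substitute(list(...)) gives the unevaluated call list(x, y);
  # dropping the first element removes the function name `list`
  args <- substitute(list(...))[-1]
  # deparse each remaining symbol into a character string
  sapply(args, deparse)
}

df <- data.frame(x = c(1, 2), y = c("a", "b"))
vnames <- show_names(df, x, y)
print(vnames)

# The names can then be used for ordinary indexing
pasted <- do.call("paste", df[vnames])
print(pasted)
```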
3) Other possibilities are to change the design of the function and pass the variable names as a formula or character vector.
example.fun.4 <- function(data, formula) {
data.frame(data, pasted = do.call("paste", get_all_vars(formula, data)))
}
example.fun.4(df, ~ x + y)
example.fun.5 <- function(data, vnames) {
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.5(df, c("x", "y"))

R cbind with get paste

The cbind() function works as x <- cbind(a, b),
where the column name 'b' can be given explicitly when the value comes from a function call, e.g. b = get(paste0('var',i)),
that is, x <- cbind(a, b = get(paste0('var',i))).
I am trying to do the following:
x <- cbind(a, get(paste0('var',i))) = j), where "j" can be a vector or a function.
However, I got the following error: Error: unexpected '=' in "x <- cbind(a, get(paste0('var',i))) = j)"
If I just specify x <- cbind(a, get(paste0('var',i))), then the 2nd column name is "get(paste0('var',i))", which is not convenient.
How can I define column names with a function get(paste()) within cbind() or rbind() or bind_cols()? Or what would be the alternative solution?
An example would have been helpful to understand the problem but maybe this?
x <- cbind(a, j)
colnames(x)[2] <- get(paste0('var',i))
Or if you want to do it in single line -
x <- cbind(a, setNames(j, get(paste0('var',i))))
We can use
x <- data.frame(a, j)
colnames(x)[2] <- get(paste('var', i, sep=""))
Or use tibble, with the desired column name stored in b:
b <- get(paste0('var', i))
tibble(a, !!b := j)
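A base R sketch of the rename-after-binding idea, assuming that var1 holds the desired column name as a string. The values of a, j, i and var1 below are made up for illustration, since the question gives no example data.

```r
a <- 1:3
j <- c(10, 20, 30)
i <- 1
var1 <- "score"  # assumption: var<i> stores the column name as a string

# Bind first, then overwrite the second column name
x <- cbind(a, j)
colnames(x)[2] <- get(paste0("var", i))
print(colnames(x))

# Single-expression variant that returns a data frame instead of a matrix
x2 <- setNames(data.frame(a, j), c("a", get(paste0("var", i))))
print(names(x2))
```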

Convert a variable into a string in R

I have a simple question. Suppose that I have this code:
y <- x
name <- "x"
Where x could be any R object.
Is there a way for the variable name to automatically take the string "x" once I assign x to y?
If I understand correctly, you want the value of x to be assigned to y, and the variable name itself to be assigned to name as a string. If so, you can capture x as an unevaluated expression, then 1) evaluate it and store the result in y, and 2) deparse it and store the resulting string in name:
z <- quote(x) # z contains unevaluated expression `x`
y <- eval(z) # evaluates the expression, returning the value of x
name <- deparse(z) # returns the expression as a string
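If this pattern is needed repeatedly, the same capture can be wrapped in a function. A minimal sketch; value_and_name is a hypothetical helper, not part of base R, and it uses substitute() where the lines above use quote():

```r
value_and_name <- function(obj) {
  list(
    name  = deparse(substitute(obj)),  # the argument as typed by the caller
    value = obj                        # the evaluated argument
  )
}

x <- c(1.5, 2.5)
res <- value_and_name(x)
y <- res$value
name <- res$name
print(name)
```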
If the question is how to get the name of the variable that was assigned to y then one cannot do that perfectly but a heuristic would be to examine every variable and return the name (or names) of anything with matching value.
If the code you are using is within a function and you are looking for variables that are defined in that function then use e <- environment() in place of the line that defines e below.
# test data
# start a fresh R session
a <- 1
x <- 2
y <- x
e <- .GlobalEnv
setdiff(names(Filter(isTRUE, eapply(e, identical, y))), "y")
## [1] "x"
Note
If the question is how to get the value of x from its name, then use get:
# test input
x <- 3
name <- "x"
y <- get(name)
y
## [1] 3
This will also work if x is in the global environment:
y <- .GlobalEnv[[name]]
y
## [1] 3
Would that work for you? I personally would not use anything like this in my scripts, but it accomplishes what you need:
assign_and_save_name <- function(new_var, var, var_name){
# look up the object named by var directly in the caller's environment
var_value <- get(var, envir = parent.frame())
assign(new_var, var_value, parent.frame())
assign(var_name, var, parent.frame())
}
x <- 3
assign_and_save_name('y', 'x', 'name_x')

Use of multiple ... function arguments in R

Quick question I was thinking about: I understand that if you have one argument in your function that takes a varying number of values, you can use ... and move on. But if you have several arguments that each take a varying number of values, how would that work?
Example:
d <- data.frame(alpha=1:3, beta=4:6, gamma=7:9, type = c("okay", "no", "yes"))
rownames(d) <- d[,4]
test <- function(...){
x <- c(...)
y <- d[x,]
z <- y$alpha
print(z)
}
test("okay","no")
But, in the situation where I might have a second dataframe that I want to include in my function:
d2 <- data.frame(one=2:4, two=10:12, three=20:22, label = c("blue","yellow","red"))
rownames(d2) <- d2[,4]
test2 <- function(..., ...){
x <- c(...)
y <- d[x,]
z <- y$alpha
x2 <- c(...) ## ?
}
How would I tell R that the first ... argument is for d, and the second ... argument is for d2?
EDIT -
The function call for test2 would ideally be: test2("okay","no","blue","red"), where the first two arguments are for d and the other two for d2. I'm just not sure how to tell R how to differentiate between these arguments.
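R does not allow ... to appear twice in a function's argument list, so function(..., ...) cannot be defined. One possible workaround, sketched under the assumption that it is acceptable to change the call and pass one character vector per data frame (rows_d and rows_d2 are hypothetical argument names):

```r
d <- data.frame(alpha = 1:3, beta = 4:6, gamma = 7:9,
                type = c("okay", "no", "yes"))
rownames(d) <- as.character(d$type)

d2 <- data.frame(one = 2:4, two = 10:12, three = 20:22,
                 label = c("blue", "yellow", "red"))
rownames(d2) <- as.character(d2$label)

test2 <- function(rows_d, rows_d2) {
  # each argument is a character vector of row names for one data frame
  list(alpha = d[rows_d, "alpha"],
       one   = d2[rows_d2, "one"])
}

res <- test2(c("okay", "no"), c("blue", "red"))
print(res)
```

Grouping the values with c() at the call site makes the split between the two sets explicit, which a flat call like test2("okay","no","blue","red") cannot express without an extra convention.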

How to write a function() with arguments x,y that only returns values from column x that are == y

I have a data frame:
df <- data.frame( a = 1:5, b = 1:5, c = 1:5, d = as.factor(1:5))
I want to write a function that takes as its arguments one of the columns a, b or c and one of the factor levels of column d, and returns only the values of column a, b or c in the rows where column d has that level.
I tried the following code:
fun1 <- function(x,y) {
u <- x[data$d == "y"]
return(u)
}
and I keep getting back numeric(0) as the output of the function. When I try similar code outside of the function() environment, it appears to work fine. Any help would be appreciated.
Probably a duplicate but I don't know how I would find it in the haystack of items with tags: data.frame, indexing, columns, values. Best practice is to pass the "data" as well as the search terms. (Calling the object df1 rather than df.)
fun1 <- function(dfrm, col,val) {
u <- dfrm[dfrm$d == val , col]
return(u)
}
fun1(df1, 'b', 3)
#[1] 3
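A self-contained run of the answer's function, as a sketch. The data frame df1 is made up here with distinct values per column so that it is visible which column the result comes from. Note that the original attempt compared against the literal string "y" (in quotes) rather than the value of the argument y, so no row could match.

```r
# Hypothetical data: like the question's df, but with distinct column values
df1 <- data.frame(a = 1:5, b = 6:10, c = 11:15, d = as.factor(1:5))

fun1 <- function(dfrm, col, val) {
  # keep the rows where d equals val, and return the requested column
  dfrm[dfrm$d == val, col]
}

out <- fun1(df1, "b", 3)
print(out)
```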

Resources