Nonstandard evaluation of list of variable names


I'm writing an estimation procedure in R that loops through a list of variable names from a data.frame that the user declares. I'm trying to avoid requiring the user to quote the variable names, to make their life easier (the goal is to upload this to CRAN, so we care a lot about user experience).
To prevent R from trying to evaluate the variable names, I constructed the function alt() that is like an alternative to c() and list(), but does not evaluate the elements.
My question is how I can elegantly do away with the alt() function, so users can learn one less function. Here is a simple MWE that hopefully illustrates the problem:
## Construct non-evaluating list function
alt <- function(...) {
alt <- as.list(substitute(list(...)))
return(alt[-1])
}
## Construct function that enquotes non-evaluated vectors
## contained in 'alt()'. Perhaps enquoting variable names
## is unavoidable because the data set is stored as a
## data.frame, but at least the user will not have to do it.
restring <- function(vector) {
vector <- deparse(vector)
if (substr(vector, start = 1, stop = 2) == "c(") {
vector <- substr(vector, 3, nchar(vector) - 1)
vector <- strsplit(vector, ", ")[[1]]
}
return(vector)
}
## Example of a function that loops over the list above
## for a given data set. The function simply prints out
## the columns declared in each element of 'alt()'.
test <- function(data, vlist) {
for (i in 1:length(vlist)) {
print(paste0("Data set ", i, ":"))
print(data[, restring(vlist[[i]])])
}
}
## Construct example data
N <- 4
df <- data.frame(x1 = c(1, 2),
x2 = c(3, 4))
## Example of user-declared list of variables to loop over
vlist <- alt(x1, c(x1, x2))
## Output from running this example
> test(df, vlist)
[1] "Data set 1:"
[1] 1 2
[1] "Data set 2:"
x1 x2
1 1 3
2 2 4
The user could also have declared
test(df, alt(x1, c(x1, x2)))
But it would be nice if I did not have to require the user to use a different function to declare these lists of variables. If it could work using standard R functions, like
test(df, list(x1, c(x1, x2)))
that would be great, but I haven't been able to find a way other than performing some ungainly string manipulations using deparse(substitute()), similar to the restring() function (not sure how CRAN feels about that).
Any thoughts on this non-standard evaluation issue would be appreciated. Also, if alt() is easy enough to use that it is not worth removing, that would also be good to know.

A more compact option would be enexprs from rlang
library(rlang)
alt1 <- function(...) enexprs(...)
test(df, alt1(x1, c(x1, x2)))
#[1] "Data set 1:"
#[1] 1 2
#[1] "Data set 2:"
# x1 x2
#1 1 3
#2 2 4
Or without using any external package, quote the expressions in a list
test(df, list(quote(x1), quote(c(x1, x2))))
#[1] "Data set 1:"
#[1] 1 2
#[1] "Data set 2:"
# x1 x2
#1 1 3
#2 2 4
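If the goal is to let users write a plain list(...), one possibility is to have the function itself capture its argument unevaluated with substitute(), so that no helper such as alt() is needed. The sketch below is only an illustration: test_nse is a hypothetical name, and it uses all.vars() in place of the string handling in restring(), which assumes every element contains nothing but column names.

```r
# Hypothetical variant of test(): captures 'vlist' before R evaluates it.
test_nse <- function(data, vlist) {
  # substitute() returns the call the user typed, e.g. list(x1, c(x1, x2));
  # dropping the first element removes the 'list' symbol itself.
  exprs <- as.list(substitute(vlist))[-1]
  # all.vars() pulls the variable names out of each unevaluated expression.
  lapply(exprs, function(e) data[, all.vars(e), drop = FALSE])
}

df <- data.frame(x1 = c(1, 2), x2 = c(3, 4))
res <- test_nse(df, list(x1, c(x1, x2)))
names(res[[1]])  # column selected by the first element
names(res[[2]])  # columns selected by the second element
```

Because the argument is captured with substitute(), this only works when the list is typed directly in the call; a list built elsewhere and passed by name would not be unpacked.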

Related

How to transform the object of a function in r?

I want to create a function that transforms its object.
I have tried to transform the variable as you would normally, but within the function.
This works:
vec <- c(1, 2, 3, 3)
vec <- (-1*vec)+1+max(vec, na.rm = T)
[1] 3 2 1 1
This doesn't work:
vec <- c(1, 2, 3, 3)
func <- function(x){
x <- (-1*x)+1+max(x, na.rm = T)
}
func(vec)
vec
[1] 1 2 3 3

R is functional so normally one returns the output. If you want to change
the value of the input variable to take on the output value then it is normally done by the caller, not within the function. Using func from the question it would normally be done like this:
vec <- func(vec)
Furthermore, while you can overwrite variables it is, in general, not a good
idea. It makes debugging difficult. Is the current value of vec the
input or output and if it is the output what is the value of the input? We
don't know since we have overwritten it.
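As a minimal sketch of the return-and-assign pattern described above (flip and flipped are hypothetical names, and the body is the transformation from the question), the function returns a new vector and the input stays available for inspection:

```r
# Same transformation as in the question, but the result is returned.
flip <- function(x) (-1 * x) + 1 + max(x, na.rm = TRUE)

vec <- c(1, 2, 3, 3)
flipped <- flip(vec)  # the input is left untouched

vec      # still the original values
flipped  # the transformed copy
```

Keeping both objects makes it possible to compare input and output while debugging.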
func_overwrite
That said if you really want to do this despite the comments above then:
# works but not recommended
func_overwrite <- function(x) eval.parent(substitute({
x <- (-1*x)+1+max(x, na.rm = TRUE)
}))
# test
v <- c(1, 2, 3, 3)
func_overwrite(v)
v
## [1] 3 2 1 1
Replacement functions
Despite R's functional nature it actually does provide one facility for overwriting although the function in the question is not really a good candidate for it so let us change the example to provide a function incr which increments the input variable by a given value. That is, it does this:
x <- x + b
We can write this in R as:
`incr<-` <- function(x, value) x + value
# test
xx <- 3
incr(xx) <- 10
xx
## [1] 13
T vs. TRUE
One other comment. Do not use T for true. Always write it out. TRUE is a reserved name in R but T is a valid variable name so it can lead to hard to find errors such as when someone uses T for temperature.
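A small illustration of how this can go wrong, assuming someone has stored a numeric value in T (the values here are made up). The code runs inside local() so that it does not leave a T behind in the workspace:

```r
res <- local({
  T <- 0                     # e.g. a temperature reading
  mean(c(1, NA), na.rm = T)  # na.rm is now 0, which acts as FALSE
})
res  # NA, because the missing value was not removed

# TRUE, by contrast, cannot be reassigned: evaluating this raises an error.
attempt <- try(eval(parse(text = "TRUE <- 0")), silent = TRUE)
inherits(attempt, "try-error")
```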

function to subset data supplying subset argument as text string

m <- matrix(1:4, ncol=2)
l <- list(a=1:3, b='c')
d <- data.frame(a=1:3, b=3:1)
I was wondering if it is possible to make a function that takes a base R object (matrix, vector, list or data.frame, ...) as well as a text that specifies the subset of the object.
f1 <- function(object, subset) {
# object'subset'
}
For instance
f1(m, '[1,1]') #to evaluate m[1,1]
f1(l, '[[1]][2:3]') #l[[1]][2:3]
f1(d, '$a') #d$a
would give us (respectively):
[1] 1
[1] 2 3
[1] 1 2 3
I guess the function needs to somehow glue the two arguments together before evaluating. I guess one could make a kind of interpreter for each bit of the subset text and then (for the matrix example) do something like:
`[`(m, 1, 1)
This would be possible, but I thought there would be an easier, more direct way (my 'glue' above).

Well, one way to go is to use the eval(parse()) methodology, i.e.
f1 <- function(x, text){
eval(parse(text = paste0(x, text)))
}
f1('d', '$a')
#[1] 1 2 3
f1('m', '[1,1]')
#[1] 1
f1('l', '[[1]][2:3]')
#[1] 2 3

f1<-function(object, subset){
return(eval(parse(text=paste0(substitute(object),subset))))
}
> m=matrix(4,2,2)
> l=list(c(1,2,3),c(2,3,4))
> f1(m,'[1,1]')
[1] 4
> f1(l,'[[1]][1:2]')
[1] 1 2
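Both answers glue the object's name to the subset string. A small variation, sketched here under the assumption that the caller passes the object itself (as in the question), is to paste the literal argument name and let eval() resolve it inside the function. f1_obj is a hypothetical name. As with any eval(parse()) approach, the subset string is executed as code, so it should not come from untrusted input.

```r
f1_obj <- function(object, subset) {
  # The text becomes e.g. "object[1,1]"; eval() looks up 'object' in this
  # function's own frame, so the caller's variable name is irrelevant.
  eval(parse(text = paste0("object", subset)))
}

m <- matrix(1:4, ncol = 2)
l <- list(a = 1:3, b = "c")
d <- data.frame(a = 1:3, b = 3:1)

f1_obj(m, "[1,1]")
f1_obj(l, "[[1]][2:3]")
f1_obj(d, "$a")
```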

treat string as object name in a loop in R

I want to create a string in a loop and use this string as object in this loop. Here is a simplified example:
for (i in 1:2) {
x <- paste("varname",i, sep="")
x <- value
}
the loop should create varname1, varname2. Then I want to use varname1, varname2 as objects to assign values. I tried paste(), print() etc.
Thanks for help!

You could create the call() to <- and then evaluate it. Here's an example,
value <- 1:5
for (i in 1:2) {
x <- paste("varname",i, sep="")
eval(call("<-", as.name(x), value))
}
which creates the two objects varname1 and varname2
varname1
# [1] 1 2 3 4 5
varname2
# [1] 1 2 3 4 5
But you should really try to avoid assigning to the global environment from within a method/function. We could use a list along with substitute() and then we have the new variables together in the same place.
f <- function(aa, bb) {
eval(substitute(a <- b, list(a = as.name(aa), b = bb)))
}
Map(f, paste0("varname", 1:2), list(1:3, 3:6))
# $varname1
# [1] 1 2 3
#
# $varname2
# [1] 3 4 5 6
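The same result can often be had without creating variables at all, by keeping the values in a named list. A minimal sketch, where vars is a hypothetical name and the values are chosen only for illustration:

```r
# Build the names, then attach them to a list of values.
vars <- setNames(list(1:3, 3:6), paste0("varname", 1:2))

vars$varname1       # access by name
vars[["varname2"]]  # equivalent access, convenient inside loops
```

A list like this can be passed to lapply() or Map() directly, which is harder to do with loose variables.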

assign("variableName", 5)
would do that.
For example if you have variable names in array of strings you can set them in loop as:
assign(varname[1], 2 + 2)
More information:
https://stat.ethz.ch/R-manual/R-patched/library/base/html/assign.html

@MahmutAliÖZKURAN has answered your question about how to do this using a loop. A more "R-ish" way to accomplish this might be:
mapply(assign, <vector of variable names>, <vector of values>,
MoreArgs = list(envir = .GlobalEnv))
Or, as in the case you specified above:
mapply(assign, paste0("varname", 1:2), <vector of values>,
MoreArgs = list(envir = .GlobalEnv))
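A filled-in version of the template above might look like the following sketch. The values are made up, and it assigns into a fresh environment (env, a hypothetical name) instead of .GlobalEnv so that the example does not touch the workspace; swapping in .GlobalEnv reproduces the original behaviour.

```r
env <- new.env()

# assign() is called once per name/value pair; envir is shared via MoreArgs.
invisible(mapply(assign, paste0("varname", 1:2), list(1:3, 3:6),
                 MoreArgs = list(envir = env)))

ls(env)                       # names that were created
get("varname2", envir = env)  # value stored under the second name
```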

I had the same issue and for some reason my apply's weren't working (lapply, assign directly, or my preferred goto, mclapply)
But this worked
vectorXTS <- mclapply(symbolstring,function(x)
{
df <- symbol_data_set[symbol_data_set$Symbol==x,]
return(xts(as.data.frame(df[,-1:-2]),order.by=as.POSIXct(df$Date)))
})
names(symbolstring) <- symbolstring
names(vectorXTS) <- symbolstring
for(i in symbolstring) assign(symbolstring[i],vectorXTS[i])

R: creating a named vector from variables

Inside a function I define a bunch of scalar variables like this:
a <- 10
b <- a*100
c <- a + b
At the end of the function, I want to return a,b,c in a named vector, with the same names as the variables, with minimal coding, i.e. I do not want to do:
c( a = a, b = b, c = c )
Is there a language construct that does this? For example, if I simply do return(c(a,b,c)) it returns an unnamed vector, which is not what I want. I currently have a hacky way of doing this:
> cbind(a,b,c)[1,]
a b c
10 1000 1010
Is there perhaps a better, less hacky, way?

Here's a function to do that for you, which also allows you to optionally name some of the values. There's not much to it, except for the trick to get the unevaluated expression and deparse it into a single character vector.
c2 <- function(...) {
vals <- c(...)
if (is.null(names(vals))) {
missing_names <- rep(TRUE, length(vals))
} else {
missing_names <- names(vals) == ""
}
if (any(missing_names)) {
names <- vapply(substitute(list(...))[-1], deparse, character(1))
names(vals)[missing_names] <- names[missing_names]
}
vals
}
a <- 1
b <- 2
c <- 3
c2(a, b, d = c)
# a b d
# 1 2 3
Note that it's not guaranteed to produce syntactically valid names. If you want that, apply the make.names function to the names vector.
c2(mean(a,b,c))
# mean(a, b, c)
# 1
Also, as with any function that uses substitute, c2 is more suited for interactive use than to be used within another function.
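For use inside a function, a possible alternative that avoids substitute() is mget(), which looks up variables by name and returns a named list. This is only a sketch (named_result is a hypothetical name), and it does require the names as strings:

```r
named_result <- function() {
  a <- 10
  b <- a * 100
  c <- a + b
  # mget() finds the names in this function's frame and returns a named
  # list; unlist() flattens it into a named vector.
  unlist(mget(c("a", "b", "c")))
}

named_result()
```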

How to assign from a function which returns more than one value?

Still trying to get into the R logic... what is the "best" way to unpack (on LHS) the results from a function returning multiple values?
I can't do this apparently:
R> functionReturningTwoValues <- function() { return(c(1, 2)) }
R> functionReturningTwoValues()
[1] 1 2
R> a, b <- functionReturningTwoValues()
Error: unexpected ',' in "a,"
R> c(a, b) <- functionReturningTwoValues()
Error in c(a, b) <- functionReturningTwoValues() : object 'a' not found
must I really do the following?
R> r <- functionReturningTwoValues()
R> a <- r[1]; b <- r[2]
or would the R programmer write something more like this:
R> functionReturningTwoValues <- function() {return(list(first=1, second=2))}
R> r <- functionReturningTwoValues()
R> r$first
[1] 1
R> r$second
[1] 2
--- edited to answer Shane's questions ---
I don't really need giving names to the result value parts. I am applying one aggregate function to the first component and an other to the second component (min and max. if it was the same function for both components I would not need splitting them).

(1) list[...]<- I had posted this over a decade ago on r-help. Since then it has been added to the gsubfn package. It does not require a special operator but does require that the left hand side be written using list[...] like this:
library(gsubfn) # need 0.7-0 or later
list[a, b] <- functionReturningTwoValues()
If you only need the first or second component these all work too:
list[a] <- functionReturningTwoValues()
list[a, ] <- functionReturningTwoValues()
list[, b] <- functionReturningTwoValues()
(Of course, if you only needed one value then functionReturningTwoValues()[[1]] or functionReturningTwoValues()[[2]] would be sufficient.)
See the cited r-help thread for more examples.
(2) with If the intent is merely to combine the multiple values subsequently and the return values are named then a simple alternative is to use with :
myfun <- function() list(a = 1, b = 2)
list[a, b] <- myfun()
a + b
# same
with(myfun(), a + b)
(3) attach Another alternative is attach:
attach(myfun())
a + b

I somehow stumbled on this clever hack on the internet ... I'm not sure if it's nasty or beautiful, but it lets you create a "magical" operator that allows you to unpack multiple return values into their own variable. The := function is defined here, and included below for posterity:
':=' <- function(lhs, rhs) {
frame <- parent.frame()
lhs <- as.list(substitute(lhs))
if (length(lhs) > 1)
lhs <- lhs[-1]
if (length(lhs) == 1) {
do.call(`=`, list(lhs[[1]], rhs), envir=frame)
return(invisible(NULL))
}
if (is.function(rhs) || is(rhs, 'formula'))
rhs <- list(rhs)
if (length(lhs) > length(rhs))
rhs <- c(rhs, rep(list(NULL), length(lhs) - length(rhs)))
for (i in 1:length(lhs))
do.call(`=`, list(lhs[[i]], rhs[[i]]), envir=frame)
return(invisible(NULL))
}
With that in hand, you can do what you're after:
functionReturningTwoValues <- function() {
return(list(1, matrix(0, 2, 2)))
}
c(a, b) := functionReturningTwoValues()
a
#[1] 1
b
# [,1] [,2]
# [1,] 0 0
# [2,] 0 0
I don't know how I feel about that. Perhaps you might find it helpful in your interactive workspace. Using it to build (re-)usable libraries (for mass consumption) might not be the best idea, but I guess that's up to you.
... you know what they say about responsibility and power ...

Usually I wrap the output into a list, which is very flexible (you can have any combination of numbers, strings, vectors, matrices, arrays, lists, objects in the output)
so like:
func2<-function(input) {
a<-input+1
b<-input+2
output<-list(a,b)
return(output)
}
output<-func2(5)
for (i in output) {
print(i)
}
[1] 6
[1] 7

I put together an R package zeallot to tackle this problem. zeallot includes a multiple assignment or unpacking assignment operator, %<-%. The LHS of the operator is any number of variables to assign, built using calls to c(). The RHS of the operator is a vector, list, data frame, date object, or any custom object with an implemented destructure method (see ?zeallot::destructure).
Here are a handful of examples based on the original post,
library(zeallot)
functionReturningTwoValues <- function() {
return(c(1, 2))
}
c(a, b) %<-% functionReturningTwoValues()
a # 1
b # 2
functionReturningListOfValues <- function() {
return(list(1, 2, 3))
}
c(d, e, f) %<-% functionReturningListOfValues()
d # 1
e # 2
f # 3
functionReturningNestedList <- function() {
return(list(1, list(2, 3)))
}
c(f, c(g, h)) %<-% functionReturningNestedList()
f # 1
g # 2
h # 3
functionReturningTooManyValues <- function() {
return(as.list(1:20))
}
c(i, j, ...rest) %<-% functionReturningTooManyValues()
i # 1
j # 2
rest # list(3, 4, 5, ..)
Check out the package vignette for more information and examples.

functionReturningTwoValues <- function() {
results <- list()
results$first <- 1
results$second <-2
return(results)
}
a <- functionReturningTwoValues()
I think this works.

There's no right answer to this question. It really depends on what you're doing with the data. In the simple example above, I would strongly suggest:
Keep things as simple as possible.
Wherever possible, it's a best practice to keep your functions vectorized. That provides the greatest amount of flexibility and speed in the long run.
Is it important that the values 1 and 2 above have names? In other words, why is it important in this example that 1 and 2 be named a and b, rather than just r[1] and r[2]? One important thing to understand in this context is that a and b are also both vectors of length 1. So you're not really changing anything in the process of making that assignment, other than having 2 new vectors that don't need subscripts to be referenced:
> r <- c(1,2)
> a <- r[1]
> b <- r[2]
> class(r)
[1] "numeric"
> class(a)
[1] "numeric"
> a
[1] 1
> a[1]
[1] 1
You can also assign the names to the original vector if you would rather reference the letter than the index:
> names(r) <- c("a","b")
> names(r)
[1] "a" "b"
> r["a"]
a
1
[Edit] Given that you will be applying min and max to each vector separately, I would suggest either using a matrix (if a and b will be the same length and the same data type) or data frame (if a and b will be the same length but can be different data types) or else use a list like in your last example (if they can be of differing lengths and data types).
> r <- data.frame(a=1:4, b=5:8)
> r
a b
1 1 5
2 2 6
3 3 7
4 4 8
> min(r$a)
[1] 1
> max(r$b)
[1] 8

If you want to return the output of your function to the Global Environment, you can use list2env, like in this example:
myfun <- function(x) { a <- 1:x
b <- 5:x
df <- data.frame(a=a, b=b)
newList <- list("my_obj1" = a, "my_obj2" = b, "myDF"=df)
list2env(newList ,.GlobalEnv)
}
myfun(3)
This function will create three objects in your Global Environment:
> my_obj1
[1] 1 2 3
> my_obj2
[1] 5 4 3
> myDF
a b
1 1 5
2 2 4
3 3 3

Lists seem perfect for this purpose. For example within the function you would have
x = desired_return_value_1 # (vector, matrix, etc)
y = desired_return_value_2 # (vector, matrix, etc)
returnlist = list(x,y...)
} # end of function
main program
x = returnlist[[1]]
y = returnlist[[2]]

Yes to your second and third questions -- that's what you need to do as you cannot have multiple 'lvalues' on the left of an assignment.
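For the use case mentioned in the question's edit (one aggregate per component), a sketch of the list-returning variant could look like this. The function name two_values and its values are invented for illustration:

```r
# Return both components in a named list.
two_values <- function() list(first = c(3, 1, 2), second = c(9, 7, 8))

r <- two_values()
lowest  <- min(r$first)   # aggregate applied to the first component
highest <- max(r$second)  # a different aggregate for the second

lowest
highest
```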

How about using assign?
functionReturningTwoValues <- function(a, b) {
assign(a, 1, pos=1)
assign(b, 2, pos=1)
}
You can pass the names of the variable you want to be passed by reference.
> functionReturningTwoValues('a', 'b')
> a
[1] 1
> b
[1] 2
If you need to access the existing values, the converse of assign is get.
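A variation on the idea above, sketched with an explicit target environment instead of pos=1, so the caller decides where the variables land. The names set_two and scratch are hypothetical:

```r
set_two <- function(a, b, env = parent.frame()) {
  # Create one variable per supplied name in the chosen environment.
  assign(a, 1, envir = env)
  assign(b, 2, envir = env)
  invisible(NULL)
}

scratch <- new.env()
set_two("a", "b", env = scratch)

get("a", envir = scratch)  # get() reads a value back by name
get("b", envir = scratch)
```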

[A]
If each of foo and bar is a single number, then there's nothing wrong with c(foo,bar); and you can also name the components: c(Foo=foo,Bar=bar). So you could access the components of the result 'res' as res[1], res[2]; or, in the named case, as res["Foo"], res["Bar"].
[B]
If foo and bar are vectors of the same type and length, then again there's nothing wrong with returning cbind(foo,bar) or rbind(foo,bar); likewise nameable. In the 'cbind' case, you would access foo and bar as res[,1], res[,2] or as res[,"Foo"], res[,"Bar"]. You might also prefer to return a dataframe rather than a matrix:
data.frame(Foo=foo,Bar=bar)
and access them as res$Foo, res$Bar. This would also work well if foo and bar were of the same length but not of the same type (e.g. foo is a vector of numbers, bar a vector of character strings).
[C]
If foo and bar are sufficiently different not to combine conveniently as above, then you should definitely return a list.
For example, your function might fit a linear model and
also calculate predicted values, so you could have
LM<-lm(....) ; foo<-summary(LM); bar<-LM$fit
and then you would return list(Foo=foo,Bar=bar) and then access the summary as res$Foo, the predicted values as res$Bar
source: http://r.789695.n4.nabble.com/How-to-return-multiple-values-in-a-function-td858528.html

Year 2021 and this is something I frequently use.
The tidyverse (via the tibble package) has a function called lst that automatically names the list elements after the variables when creating the list.
After that, I use list2env() to assign the variables, or I use the list directly.
library(tidyverse)
fun <- function(){
a<-1
b<-2
lst(a,b)
}
list2env(fun(), envir=.GlobalEnv) # unpacks the list's name-value pairs into variables in the global environment

This is only for the sake of completeness and not because I personally prefer it. You can pipe %>% the result, evaluate it with curly braces {} and write variables to the parent environment using double-arrow <<-.
library(tidyverse)
functionReturningTwoValues() %>% {a <<- .[1]; b <<- .[2]}
UPDATE:
You can also use the multiple assignment operator %<-% from the zeallot package:
c(a, b) %<-% list(0, 1)

I will post a function that returns multiple objects by way of vectors:
Median <- function(X){
X_Sort <- sort(X)
if (length(X)%%2==0){
Median <- (X_Sort[(length(X)/2)]+X_Sort[(length(X)/2)+1])/2
} else{
Median <- X_Sort[(length(X)+1)/2]
}
return(Median)
}
That was a function I created to calculate the median. I know that there's an inbuilt function in R called median() but nonetheless I programmed it to build other function to calculate the quartiles of a numeric data-set by using the Median() function I just programmed. The Median() function works like this:
If a numeric vector X has an even number of elements (i.e., length(X)%%2==0), the median is calculated by averaging the elements sort(X)[length(X)/2] and sort(X)[(length(X)/2+1)].
If X doesn't have an even number of elements, the median is sort(X)[(length(X)+1)/2].
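The two cases can be checked against the built-in median(). The sketch below restates the function in a slightly more compact form (same logic, with n as a hypothetical shorthand for length(X)) and uses made-up test vectors:

```r
Median <- function(X) {
  X_Sort <- sort(X)
  n <- length(X)
  if (n %% 2 == 0) {
    # Even length: average the two middle elements.
    (X_Sort[n / 2] + X_Sort[n / 2 + 1]) / 2
  } else {
    # Odd length: take the single middle element.
    X_Sort[(n + 1) / 2]
  }
}

even <- c(4, 1, 3, 2)
odd  <- c(5, 1, 3)

Median(even) == median(even)
Median(odd)  == median(odd)
```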
On to the QuartilesFunction():
QuartilesFunction <- function(X){
X_Sort <- sort(X) # Data is sorted in ascending order
if (length(X)%%2==0){
# Data number is even
HalfDN <- X_Sort[1:(length(X)/2)]
HalfUP <- X_Sort[((length(X)/2)+1):length(X)]
QL <- Median(HalfDN)
QU <- Median(HalfUP)
QL1 <- QL
QL2 <- QL
QU1 <- QU
QU2 <- QU
QL3 <- QL
QU3 <- QU
Quartiles <- c(QL1,QU1,QL2,QU2,QL3,QU3)
names(Quartiles) = c("QL (1)", "QU (1)", "QL (2)", "QU (2)","QL (3)", "QU (3)")
} else{ # Data number is odd
# Including the median
Half1DN <- X_Sort[1:((length(X)+1)/2)]
Half1UP <- X_Sort[(((length(X)+1)/2)):length(X)]
QL1 <- Median(Half1DN)
QU1 <- Median(Half1UP)
# Not including the median
Half2DN <- X_Sort[1:(((length(X)+1)/2)-1)]
Half2UP <- X_Sort[(((length(X)+1)/2)+1):length(X)]
QL2 <- Median(Half2DN)
QU2 <- Median(Half2UP)
# Methods (1) and (2) averaged
QL3 <- (QL1+QL2)/2
QU3 <- (QU1+QU2)/2
Quartiles <- c(QL1,QU1,QL2,QU2,QL3,QU3)
names(Quartiles) = c("QL (1)", "QU (1)", "QL (2)", "QU (2)","QL (3)", "QU (3)")
}
return(Quartiles)
}
This function returns the quartiles of a numeric vector by using three methods:
Keeping the median for the calculation of the quartiles when the number of elements of the numeric vector X is odd.
Discarding the median for the calculation of the quartiles when the number of elements of the numeric vector X is odd.
Averaging the results obtained by using methods 1 and 2.
When the number of elements in the numeric vector X is even, the three methods coincide.
The result of the QuartilesFunction() is a vector that depicts the first and third quartiles calculated by using the three methods outlined.

With R 3.6.1, I can do the following
fr2v <- function() { c(5,3) }
a_b <- fr2v()
(a_b[[1]]) # prints "5"
(a_b[[2]]) # prints "3"

To obtain multiple outputs from a function and keep them in the desired format you can save the outputs to your hard disk (in the working directory) from within the function and then load them from outside the function:
myfun <- function(x) {
df1 <- ...
df2 <- ...
save(df1, file = "myfile1")
save(df2, file = "myfile2")
}
load("myfile1")
load("myfile2")
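A runnable sketch of the same idea, using saveRDS() and readRDS() rather than save() and load() so that each object can be read back under any name. This is a swapped-in technique, not the answer's own code; the function name write_results, the file names, and the data frames are placeholders, and a temporary directory stands in for the working directory:

```r
write_results <- function(dir) {
  df1 <- data.frame(a = 1:3)           # placeholder outputs
  df2 <- data.frame(b = c("x", "y"))
  # Each object goes to its own file.
  saveRDS(df1, file.path(dir, "df1.rds"))
  saveRDS(df2, file.path(dir, "df2.rds"))
  invisible(dir)
}

out_dir <- tempdir()
write_results(out_dir)

# Read the objects back outside the function.
df1 <- readRDS(file.path(out_dir, "df1.rds"))
df2 <- readRDS(file.path(out_dir, "df2.rds"))
```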
