Combining logical functions in R

I'm running several tests on a given object x. For a single test (a test being a function that returns TRUE or FALSE when applied to an object) this is easy, as you can do lapply(x, test). For example:
# This returns a list whose only element is TRUE
lapply('a', is.character)
However, I would like to create a function pass_tests, which would be able to combine multiple tests, i.e. that it could run something like this:
pass_tests('a', is.character | is.numeric)
It should therefore accept a combination of functions as an argument and combine their results when testing an object x. In this case, it would return whether 'a' is character OR numeric, which is TRUE. The following line should return FALSE:
pass_tests('a', is.character & is.numeric)
The idea is that it should be flexible enough for different combinations, e.g.:
pass_tests(x, test1 & (test2 | test3))
Any idea if functions can be logically combined this way?

Another option would be to use the pipes
library(magrittr) # or dplyr
"a" %>% {is.character(.) & is.numeric(.)}
#FALSE
"a" %>% {is.character(.) | is.numeric(.)}
#TRUE
1 %>% {is.finite(.) & (is.character(.) | is.numeric(.))}
#TRUE
Edit: used in a function with string
pass_test <- function(x, expr) {
x %>% {eval(parse(text = expr))}
}
pass_test(1, "is.finite(.) & (is.character(.) | is.numeric(.))")
#TRUE
The argument expr can be a string or an expression as in expression(is.finite(.) & (is.character(.) | is.numeric(.))).
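To call the function exactly as written in the question, without a string and without the dot placeholder, one possibility is to capture the unevaluated argument with substitute() and rewrite every bare function name into a call on x. The following is only a sketch: the helper to_call and the placeholder .x are names made up for this illustration, and it assumes that every bare symbol in the expression names a one-argument test function.

```r
pass_tests <- function(x, tests) {
  expr <- substitute(tests)   # unevaluated, e.g. is.character | is.numeric
  env <- parent.frame()       # where the test functions are looked up

  # Hypothetical helper: turn each bare symbol `f` into the call `f(.x)`,
  # leaving operators such as &, |, ! and parentheses in place.
  to_call <- function(e) {
    if (is.call(e)) {
      as.call(c(list(e[[1]]), lapply(as.list(e)[-1], to_call)))
    } else if (is.name(e)) {
      as.call(list(e, quote(.x)))
    } else {
      e
    }
  }

  # Evaluate the rewritten expression with .x bound to the object under test
  eval(to_call(expr), list(.x = x), env)
}

pass_tests("a", is.character | is.numeric)
#> [1] TRUE
pass_tests("a", is.character & is.numeric)
#> [1] FALSE
pass_tests(1, is.finite & (is.character | is.numeric))
#> [1] TRUE
```

Because the combination is ordinary R syntax here, & and | keep their usual precedence. The sketch would misbehave if the expression contained a variable that is not a function.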

Here's another way to do it by creating infix operators.
`%and%` <- function(lhs, rhs) {
function(...) lhs(...) & rhs(...)
}
`%or%` <- function(lhs, rhs) {
function(...) lhs(...) | rhs(...)
}
(is.character %and% is.numeric)('a')
#> [1] FALSE
(is.character %or% is.numeric)('a')
#> [1] TRUE
These can be chained together. However, the chain will not follow the usual AND/OR precedence: all %op% operators share the same precedence and are evaluated left to right.
(is.double %and% is.numeric %and% is.finite)(12)
#> [1] TRUE
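To illustrate the left-to-right evaluation, here is a small sketch that reuses the two operators defined above. The variable names left_to_right and grouped are only for illustration. Explicit parentheses are needed whenever the intended grouping differs from left to right.

```r
`%and%` <- function(lhs, rhs) function(...) lhs(...) & rhs(...)
`%or%`  <- function(lhs, rhs) function(...) lhs(...) | rhs(...)

# Without parentheses this groups as
# (is.character %or% is.numeric) %and% is.finite
left_to_right <- is.character %or% is.numeric %and% is.finite
left_to_right("a")
#> [1] FALSE

# Parentheses give the grouping that
# is.character(x) | is.numeric(x) & is.finite(x) would have
grouped <- is.character %or% (is.numeric %and% is.finite)
grouped("a")
#> [1] TRUE
```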

Related

NSE in nested function calls

I'd like to use a utility function to check whether a given column exists within a given data.frame. I'm piping within the tidyverse. The best I've come up with so far is
library(magrittr)
columnExists <- function(data, col) {
tryCatch({
rlang::as_label(rlang::enquo(col)) %in% names(data)
},
error=function(e) FALSE
)
}
This works in the global environment
> mtcars %>% columnExists(mpg)
[1] TRUE
> mtcars %>% columnExists(bad)
[1] FALSE
But not when called from within another function, which is my actual use case
outerFunction <- function(d, col) {
d %>% columnExists((col))
}
> mtcars %>% outerFunction(mpg) # Expected TRUE
[1] FALSE
> mtcars %>% outerFunction(bad) # Expected FALSE
[1] FALSE
What am I doing wrong? Is it possible to have a single function that works correctly in the global environment and also when nested in another function?
I have found several SO posts related to checking for the existence of a given column or columns, but they all seem to assume either that the column name will be passed as a string or the call to check existence is not nested (or both). That is not the case here.
You want to pass through the original symbol in your outerFunction. Use
outerFunction <- function(d, col) {
d %>% columnExists( {{col}} )
}
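The "embrace" syntax {{ }} forwards the expression that the caller supplied for col (here mpg). With (col), columnExists captures the literal expression (col), which as_label() turns into the string "(col)", and that is never a column name.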
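The difference between the two forms can be checked side by side. This is a minimal sketch that assumes rlang 0.4.0 or later is installed (the version that introduced {{ }}); the names outerBroken and outerFixed are made up for the comparison, and the pipe and tryCatch are left out to keep it short.

```r
# Assumes rlang (>= 0.4.0) is installed
columnExists <- function(data, col) {
  rlang::as_label(rlang::enquo(col)) %in% names(data)
}

# Passing (col) makes columnExists capture the literal expression "(col)"
outerBroken <- function(d, col) columnExists(d, (col))

# Embracing forwards whatever the caller wrote for `col`
outerFixed <- function(d, col) columnExists(d, {{ col }})

outerBroken(mtcars, mpg)
#> [1] FALSE
outerFixed(mtcars, mpg)
#> [1] TRUE
outerFixed(mtcars, bad)
#> [1] FALSE
```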

difference between `%in%` and `in` operator in R

What's the difference between the in and the %in% operator in R? Why do I sometimes need the percentage signs and other times I do not?
The 3 following objects are all functions :
identity
%in%
for
We can call them this way :
`identity`(1)
#> [1] 1
`%in%`(1, 1:2)
#> [1] TRUE
`for`(x, seq(3), print("yes"))
#> [1] "yes"
#> [1] "yes"
#> [1] "yes"
But usually we don't!
"identity" is syntactic (i.e. it's a "regular" name, doesn't contain weird symbols etc), AND it is not a protected word so we can skip the tick marks and call just :
identity(1)
%in% is not syntactic, but it starts and ends with "%", so it can be used in infix form. You could define your own `%fun%` <- function(x, y) ... and use it this way too. So we would call:
1 %in% 1:2
for is a control flow construct, like if, while and repeat. All of those are functions with a given number of arguments, but the language provides more convenient ways to call them than the above. Here we'd do:
for (x in seq(3)) print("yes")
in is only used to parse the code; it's not a function here (just like else isn't either).
?`%in%` will show you what the function does.
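As a small illustration of defining your own infix function, here is a sketch of a hypothetical `%notin%` operator (the name is made up for this example and is not part of base R). Any function whose name starts and ends with "%" can be used this way.

```r
# Hypothetical operator: the negation of %in%
`%notin%` <- function(x, table) !(x %in% table)

1 %notin% 1:2
#> [1] FALSE
5 %notin% 1:2
#> [1] TRUE

# Like any other function, it can also be called in prefix form
`%notin%`(5, 1:2)
#> [1] TRUE
```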
Depending on how you define it, there is no in operator in R, only an %in% operator. Instead, in is “syntactic sugar” as part of the syntax for the for loop.
By contrast, %in% is an actual operator defined in R which tests whether the left-hand expression is contained in the right-hand expression. As other operators in R, %in% is a regular function and can be called as such:
if (`%in%`(x, seq(3, 5))) message("yes")
… or it can be redefined:
`%in%` = function (x, table) {
message("I redefined %in%!")
match(x, table, nomatch = 0L) > 0L
}
if (5 %in% 1 : 10) message("yes")
# I redefined %in%!
# yes
Usage-wise, I have figured out the answer: I can only use in when looping over everything, and %in% for checking whether something is contained in something else, e.g.
for (x in seq(3)){
if (x %in% seq(3,5)) print("yes")
}

How to get the list of in-built functions used within a function

Let's say I have a function named Fun1, within which I use many different built-in R functions for different processes. How can I get a list of the built-in functions used inside Fun1?
Fun1 <- function(x,y){
sum(x,y)
mean(x,y)
c(x,y)
print(x)
print(y)
}
So My output should be like list of characters i.e. sum, mean, c, print. Because these are the in-built functions I have used inside function Fun1.
I have tried using grep function
grep("\\(",body(Fun1),value=TRUE)
# [1] "sum(x, y)" "mean(x, y)" "c(x, y)" "print(x)" "print(y)"
It looks OK, but the arguments (x and y) should not appear, only the names of the functions used inside the body of Fun1.
So my overall goal is to print the unique list of built-in functions, or any user-created functions, used inside a particular function, here Fun1.
Any help on this is highly appreciated. Thanks.
You could use all.vars() to get all the variable names (including functions) that appear inside the body of Fun1, then compare that with some prepared list of functions. You mention in-built functions, so I will compare it with the base package object names.
## full list of variable names inside the function body
(vars <- all.vars(body(Fun1)[-1], functions = TRUE))
# [1] "sum" "x" "y" "mean" "c" "print"
## compare it with the base package object names
intersect(vars, ls(baseenv()))
# [1] "sum" "mean" "c" "print"
I removed the first element of the function body because presumably you don't care about {, which would have been matched against the base package list.
Another possibility, albeit a less reliable one, would be to remove the formal arguments of Fun1 from all the variable names in the function. It is less reliable because any assignment inside the function adds a local variable name to the result.
setdiff(vars, names(formals(Fun1)))
# [1] "sum" "mean" "c" "print"
These are fun though, and you can fiddle around with them.
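The caveat about assignments can be seen with a small sketch. Fun2 below is a hypothetical function, not part of the question, that assigns a local variable. The setdiff() approach then reports that variable as if it were a function, while the intersect() approach does not.

```r
# Hypothetical function that assigns a local variable
Fun2 <- function(x) {
  total <- sum(x)
  total / length(x)
}

vars <- all.vars(body(Fun2), functions = TRUE)

# Removing the formal arguments leaves the local variable "total" behind
by_formals <- setdiff(vars, names(formals(Fun2)))
"total" %in% by_formals
#> [1] TRUE

# Intersecting with the base package names drops it
by_base <- intersect(vars, ls(baseenv()))
"total" %in% by_base
#> [1] FALSE
```

Both results still contain `{`, `<-` and `/` here, because the body was not trimmed and operators are functions too.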
Access to the parser tokens is available with functions from utils.
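# keep.source = TRUE is needed so that parse data is recorded outside interactive sessions
tokens <- utils::getParseData(parse(text = deparse(body(Fun1)), keep.source = TRUE))
unique(tokens[tokens[["token"]] == "SYMBOL_FUNCTION_CALL", "text"])
# [1] "sum" "mean" "c" "print"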
This should be somewhat helpful - this will return all functions however.
library(magrittr) # provides %>%
func_list = Fun1 %>%
body() %>% # extracts the function body
toString() %>% # converts to single string
gsub("[{}]", "", .) %>% # removes curly braces
gsub("\\s*\\([^\\)]+\\)", "", .) %>% # removes all contents between brackets
strsplit(",") %>% # splits strings at commas
unlist() %>% # converts to vector
trimws(., "both") # removes all white spaces before and after
[1] "" "sum" "mean" "c" "print" "print"
> table(func_list)
func_list
c mean print sum
1 1 1 2 1
This is extremely limited to your example... you could modify this to be more robust. It will fall over where a function has brackets nesting other functions etc.
This is not so beautiful, but it works:
Fun1 <- function(x,y){
sum(x,y)
mean(x,y)
c(x,y)
print(x)
print(y)
}
getFNamesInFunction <- function(f.name){
f <- deparse(body(get(f.name)))
f <- f[grepl(pattern = "\\(", x = f)]
f <- sapply(X = strsplit(split = "\\(", x = f), FUN = function(x) x[1])
unique(trimws(f[f != ""]))
}
getFNamesInFunction("Fun1")
[1] "sum" "mean" "c" "print"
as.list(Fun1)[3]
gives you the part of the function between the curly braces.
{
sum(x, y)
mean(x, y)
c(x, y)
print(x)
print(y)
}
Hence
gsub( ").*$", "", as.list(Fun1)[3])
gives you everything before the first ")", which presumably contains the name of the first function.
Taking this as a starting point, you should be able to add a loop that returns the other functions as well, not only the first one.

Operator | for choosing the data to load

I would like to use if to load the data from csv files which I determine at the beginning of my script.
I use this function:
if(which_data == "data1") {tbl <- read.csv("aaa.csv")}
but I would like to add the OR operator | so that the data is loaded when which_data is either of two different names.
The statement should look like:
if(which_data == "data1" | "data2") {tbl <- read.csv("aaa.csv")}
but the problem is that such operator can be used only for numeric, logical or complex types. What else can I do ?
Test if your variable is "in" one of the values:
if(which_data %in% c("data1" ,"data2")) {tbl <- read.csv("aaa.csv")}
Note that | may not do what you think it does with numeric types:
> 3 == 2|3
[1] TRUE
> 3 == 2|1
[1] TRUE
Because == binds more tightly than |, it's testing (3==2) or (1), and in R, 1 evaluates as TRUE, so the expression 3==2|1 is TRUE.
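The alternatives can be compared in a short sketch. The value assigned to which_data is just an example, and the file is not actually read here; the point is only how each condition evaluates.

```r
which_data <- "data2"   # example value, set at the top of the script

# Each side of | must be a complete comparison
which_data == "data1" | which_data == "data2"
#> [1] TRUE

# Inside if(), || is the usual choice for a single TRUE/FALSE condition
if (which_data == "data1" || which_data == "data2") message("load aaa.csv")

# %in% stays readable as the list of names grows
which_data %in% c("data1", "data2")
#> [1] TRUE

# The form from the question fails because | cannot combine character strings
res <- tryCatch(which_data == "data1" | "data2", error = function(e) "error")
res
#> [1] "error"
```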

Finding the names of all functions in an R expression

I'm trying to find the names of all the functions used in an arbitrary legal R expression, but I can't find a function that will flag the below example as a function instead of a name.
test <- expression(
this_is_a_function <- function(var1, var2){
this_is_a_function(var1-1, var2)
})
all.vars(test, functions = FALSE)
[1] "this_is_a_function" "var1" "var2"
all.vars(expr, functions = FALSE) seems to return functions declarations (f <- function(){}) in the expression, while filtering out function calls ('+'(1,2), ...).
Is there any function - in the core libraries or elsewhere - that will flag 'this_is_a_function' as a function, not a name? It needs to work on arbitrary expressions, that are syntactically legal but might not evaluate correctly (e.g '+'(1, 'duck'))
I've found similar questions, but they don't seem to contain the solution.
If clarification is needed, leave a comment below. I'm using the parser package to parse the expressions.
Edit: @Hadley
I have expressions which contain entire scripts, which usually consist of a main function containing nested function definitions, with a call to the main function at the end of the script.
Functions are all defined inside the expressions, and I don't mind if I have to include '<-' and '{', since I can easily filter them out myself.
The motivation is to take all my R scripts and gather basic statistics about how my use of functions has changed over time.
Edit: Current Solution
A Regex-based approach grabs the function definitions, combined with the method in James' comment to grab function calls. Usually works, since I never use right-hand assignment.
function_usage <- function(code_string){
# takes a script, extracts function definitions
require(stringr)
code_string <- str_replace(code_string, 'expression\\(', '')
# pattern for definitions of the form `name <- function`
arrow_assign <- '.+[ \n]+<-[ \n]+function'
# pattern for definitions of the form `name = function`
equal_assign <- '.+[ \n]+=[ \n]+function'
function_names <- sapply(
strsplit(
str_match(code_string, arrow_assign), split = '[ \n]+<-'),
function(x) x[1])
function_names <- c(function_names, sapply(
strsplit(
str_match(code_string, equal_assign), split = '[ \n]+='),
function(x) x[1]))
return(table(function_names))
}
Short answer: is.function checks whether a variable actually holds a function. This does not work on (unevaluated) calls because they are calls. You also need to take care of masking:
mean <- mean (x)
Longer answer:
IMHO there is a big difference between the two occurrences of this_is_a_function.
In the first case you'll assign a function to the variable with name this_is_a_function once you evaluate the expression. The difference is the same difference as between 2+2 and 4.
However, just finding <- function () does not guarantee that the result is a function:
f <- function (x) {x + 1} (2)
The second occurrence is syntactically a function call. You can determine from the expression that a variable called this_is_a_function which holds a function needs to exist in order for the call to evaluate properly. BUT: you don't know whether it exists from that statement alone. However, you can check whether such a variable exists, and whether it is a function.
The fact that functions are stored in variables like other types of data, too, means that in the first case you can know that the result of function () will be function and from that conclude that immediately after this expression is evaluated, the variable with name this_is_a_function will hold a function.
However, R is full of names and functions: "<-" is the name of the assignment function (a variable holding the assignment function) ...
After evaluating the expression, you can verify this by is.function (this_is_a_function).
However, this is by no means the only expression that returns a function: Think of
f <- function () {g <- function (){}}
> body (f)[[2]][[3]]
function() {
}
> class (body (f)[[2]][[3]])
[1] "call"
> class (eval (body (f)[[2]][[3]]))
[1] "function"
The question says: "all.vars(expr, functions = FALSE) seems to return functions declarations (f <- function(){}) in the expression, while filtering out function calls ('+'(1,2), ...)."
I'd say it is the other way round: in that expression f is the variable (name) which will be assigned the function (once the call is evaluated). + (1, 2) evaluates to a numeric. Unless you keep it from doing so.
> e <- expression (1 + 2)
> e [[1]]
1 + 2
> e [[1]][[1]]
`+`
> class (e [[1]][[1]])
[1] "name"
> eval (e [[1]][[1]])
function (e1, e2) .Primitive("+")
> class (eval (e [[1]][[1]]))
[1] "function"
Instead of looking for function definitions, which is going to be effectively impossible to do correctly without actually evaluating the functions, it will be easier to look for function calls.
The following function recursively spiders the expression/call tree returning the names of all objects that are called like a function:
find_calls <- function(x) {
# Base case
if (!is.recursive(x)) return()
recurse <- function(x) {
sort(unique(as.character(unlist(lapply(x, find_calls)))))
}
if (is.call(x)) {
f_name <- as.character(x[[1]])
c(f_name, recurse(x[-1]))
} else {
recurse(x)
}
}
It works as expected for a simple test case:
x <- expression({
f(3, g())
h <- function(x, y) {
i()
j()
k(l())
}
})
find_calls(x)
# [1] "{" "<-" "f" "function" "g" "i" "j"
# [8] "k" "l"
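Since the result also contains syntactic elements such as `{`, `<-` and `function`, a small follow-up step can filter them out. The sketch below repeats find_calls from above so that it runs on its own; the filter based on make.names() is only one assumed way to keep ordinary identifiers, and the example expression is made up for the demonstration.

```r
find_calls <- function(x) {
  # Base case: symbols and constants contain no calls
  if (!is.recursive(x)) return()

  recurse <- function(x) {
    sort(unique(as.character(unlist(lapply(x, find_calls)))))
  }

  if (is.call(x)) {
    f_name <- as.character(x[[1]])
    c(f_name, recurse(x[-1]))
  } else {
    recurse(x)
  }
}

# Example expression (not from the question)
calls <- find_calls(quote(sum(x) + mean(y) / z))
calls   # contains "+", "/", "mean" and "sum"

# Keep only names that are ordinary identifiers; operators, "{" and
# reserved words such as "function" are changed by make.names()
named_calls <- calls[make.names(calls) == calls]
named_calls   # contains "mean" and "sum"
```

Note that a call such as pkg::f(x) would contribute "::", "pkg" and "f" to the result, because as.character() is applied to the whole function position.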
Just to follow up here as I have also been dealing with this problem: I have now created a C-level function to do this using code very similar to the C implementation of all.names and all.vars in base R. It however only works with objects of type "language" i.e. function calls, not type "expression". Demonstration:
ex = quote(sum(x) + mean(y) / z)
all.names(ex)
#> [1] "+" "sum" "x" "/" "mean" "y" "z"
all.vars(ex)
#> [1] "x" "y" "z"
collapse::all_funs(ex)
#> [1] "+" "sum" "/" "mean"
Created on 2022-08-17 by the reprex package (v2.0.1)
This generalizes to arbitrarily complex nested calls.
