Using function argument inside grepl pattern (R)

Using function argument inside grepl pattern (R) - r

Is there a way to use an argument from a user-defined function as part of grepl pattern?
Ex:
Function1 <- function(x, y) {
grepl(pattern = ".*\\sy", x)
}
whereby the "y" inside the pattern would differ according to how you called the function,
ie:
data <- c("Joe Smith", "John Doe")
Function1(data, S)
would return
[1] TRUE FALSE
Is there a way for grepl to recognize y as an outside variable? (I've tried 'y' \\y and y inside backticks to no effect)

The pattern is just a string. you can use paste() for string concatenation
grepl(pattern = paste(".*\\s",y), x)
There is no other "special" way to reference variables inside the regular expression string.

You can build your pattern by using paste0():
Function1 <- function(x,y) {
grepl(pattern = paste0(".*\\s",y), x)
}
Function1(data, 'S')
[1] TRUE FALSE

Related

Make access to non-existant list element an error

In R, one can access list elements with $. When one accesses a field which is not included in the list, the resulting value is just NULL. This is problematic in the parts of my code where I further work with object. Take this code:
l <- list(foo = 1, bar = 2)
print(l$foobar)
The output will just be NULL and no error and no warning. I am aware that this might be needed such that assignment of new elements (l$foobar <- 3) can work.
Is there some way where I can make read-access of a field in a list a hard error if it does not exist?

An extreme option is to overload the $ operator, then define an S3 method for list objects that checks the names.
`$` <- function(x, y) {
UseMethod("$")
}
`$.list` <- function(x, y) {
ylab <- deparse(substitute(y))
stopifnot(ylab %in% names(x))
do.call(.Primative("$"), list(x, ylab))
}
Then:
myList <- list('a' = pi)
> myList$b
Error: ylab %in% names(x) is not TRUE
> myList$a
3.141593
You must, of course, be sure to set the following:
`$.default` <- base::`$`
to avoid any conflict with existing usage of the "$" operator
If you wish to continue to use partial matching with the "$" when applied to a list, you can make use of stopifnot(length(pmatch(ylab, names(x)))>0).

I know this is an old question with an already accepted answer writing a custom function, but I think get from base R should do the trick as well:
l <- list(foo = 1, bar = 2)
get('foobar', l)
#> Error in get("foobar", l): object 'foobar' not found

You could try writing your own function:
extract_ls <- function(ls, el){
if (!el %in% names(ls)) stop(paste0("The specified element ", el, " does not exist in the list"))
else return(ls[[el]])
}
Then you can do:
extract_ls(l, "baz")
#Error in extract_ls(l, "baz") :
# The specified element baz does not exist in the list
extract_ls(l, "bar")
#[1] 2
Note that I use [[ instead of $. These two perform the same operation, but $ does partial matching. Also [[ requires a string.
To make this an infix function, just change the function name to `%[[%`. The usage is then:
l %[[% "baz"
#Error in l %[[% "baz" :
# The specified element baz does not exist in the list
l %[[% "bar"
#[1] 2

Try this
el_names <- function(l,variable)
{
if(is.null(l[[variable]]) & length(l)>1)
{
stop("Null found")
}
else{
print(l[[variable]])
}
}
> el_names(l,"foo")
[1] 1

Use capture group within gsub() as index/name for another object (R) [duplicate]

I'm new to R and am stuck with backreferencing that doesn't seem to work. In:
gsub("\\((\\d+)\\)", f("\\1"), string)
It correctly grabs the number in between parentheses but doesn't apply the (correctly defined, working otherwise) function f to replace the number --> it's actually the string "\1" that passes through to f.
Am I missing something or is it just that R does not handle this? If so, any idea how I could do something similar, i.e. applying a function "on the fly" to the (actually many) numbers that occur in between parentheses in the text I'm parsing?
Thanks a lot for your help.

R does not have the option of applying a function directly to a match via gsub. You'll actually have to extract the match, transform the value, then replace the value. This is relativaly easy with the regmatches function. For example
x<-"(990283)M (31)O (29)M (6360)M"
f<-function(x) {
v<-as.numeric(substr(x,2,nchar(x)-1))
paste0(v+5,".1")
}
m <- gregexpr("\\(\\d+\\)", x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
# [1] "990288.1M 36.1O 34.1M 6365.1M"
Of course you can make f do whatever you like just make sure it's vector-friendly. Of course, you could wrap this in your own function
gsubf <- function(pattern, x, f) {
m <- gregexpr(pattern, x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
}
gsubf("\\(\\d+\\)", x, f)
Note that in these examples we're not using a capture group, we're just grabbing the entire match. There are ways to extract the capture groups but they are a bit messier. If you wanted to provide an example where such an extraction is required, I might be able to come up with something fancier.

To use a callback within a regex-capable replacement function, you may use either gsubfn or stringr functions.
When choosing between them, note that stringr is based on ICU regex engine and with gsubfn, you may use either the default TCL (if the R installation has tcltk capability, else it is the default TRE) or PCRE (if you pass the perl=TRUE argument).
Also, note that gsubfn allows access to all capturing groups in the match object, while str_replace_all will only allow to manipulate the whole match only. Thus, for str_replace_all, the regex should look like (?<=\()\d+(?=\)), where 1+ digits are matched only when they are enclosed with ( and ) excluding them from the match.
With stringr, you may use str_replace_all:
library(stringr)
string <- "(990283)M (31)O (29)M (6360)M"
## Callback function to increment found number:
f <- function(x) { as.integer(x) + 1 }
str_replace_all(string, "(?<=\\()\\d+(?=\\))", function(m) f(m))
## => [1] "(990284)M (32)O (30)M (6361)M"
With gsubfn, pass perl=TRUE and backref=0 to be able to use lookarounds and just modify the whole match:
gsubfn("(?<=\\()\\d+(?=\\))", ~ f(m), string, perl=TRUE, backref=0)
## => [1] "(990284)M (32)O (30)M (6361)M"
If you have multiple groups in the pattern, remoe backref=0 and enumerate the group value arguments in the callback function declaration:
gsubfn("(\\()(\\d+)(\\))", function(m,n,o) paste0(m,f(n),o), string, perl=TRUE)
^ 1 ^^ 2 ^^ 3 ^ ^^^^^^^ ^^^^

This is for multiple different replacements.
text="foo(200) (300)bar (400)foo (500)bar (600)foo (700)bar"
f=function(x)
{
return(as.numeric(x[[1]])+5)
}
a=strsplit(text,"\\(\\K\\d+",perl=T)[[1]]
b=f(str_extract_all(text,perl("\\(\\K\\d+")))
paste0(paste0(a[-length(a)],b,collapse=""),a[length(a)]) #final output
#[1] "foo(205) (305)bar (405)foo (505)bar (605)foo (705)bar"

Here's a way by tweaking a bit stringr::str_replace(), in the replace argument, just use a lambda formula as the replace argument, and reference the captured group not by ""\\1" but by ..1, so your gsub("\\((\\d+)\\)", f("\\1"), string) will become str_replace2(string, "\\((\\d+)\\)", ~f(..1)), or just str_replace2(string, "\\((\\d+)\\)", f) in this simple case :
str_replace2 <- function(string, pattern, replacement, type.convert = TRUE){
if(inherits(replacement, "formula"))
replacement <- rlang::as_function(replacement)
if(is.function(replacement)){
grps_mat <- stringr::str_match(string, pattern)[,-1, drop = FALSE]
grps_list <- lapply(seq_len(ncol(grps_mat)), function(i) grps_mat[,i])
if(type.convert) {
grps_list <- type.convert(grps_list, as.is = TRUE)
replacement <- rlang::exec(replacement, !!! grps_list)
replacement <- as.character(replacement)
} else {
replacement <- rlang::exec(replacement, !!! grps_list)
}
}
stringr::str_replace(string, pattern, replacement)
}
str_replace2(
"foo (4)",
"\\((\\d+)\\)",
sqrt)
#> [1] "foo 2"
str_replace2(
"foo (4) (5)",
"\\((\\d+)\\) \\((\\d+)\\)",
~ sprintf("(%s)", ..1 * ..2))
#> [1] "foo (20)"
Created on 2020-01-24 by the reprex package (v0.3.0)

How to get the list of in-built functions used within a function

Lets say I have a function named Fun1 within which I am using many different in-built functions of R for different different processes. Then how can I get a list of in-built functions used inside this function Fun1
Fun1 <- function(x,y){
sum(x,y)
mean(x,y)
c(x,y)
print(x)
print(y)
}
So My output should be like list of characters i.e. sum, mean, c, print. Because these are the in-built functions I have used inside function Fun1.
I have tried using grep function
grep("\\(",body(Fun1),value=TRUE)
# [1] "sum(x, y)" "mean(x, y)" "c(x, y)" "print(x)" "print(y)"
It looks ok, but arguments should not come i.e. x and y. Just the list of function names used inside body of function Fun1 here.
So my overall goal is to print the unique list of in-built functions or any create functions inside a particular function, here Fun1.
Any help on this is highly appreciated. Thanks.

You could use all.vars() to get all the variable names (including functions) that appear inside the body of Fun1, then compare that with some prepared list of functions. You mention in-built functions, so I will compare it with the base package object names.
## full list of variable names inside the function body
(vars <- all.vars(body(Fun1)[-1], functions = TRUE))
# [1] "sum" "x" "y" "mean" "c" "print"
## compare it with the base package object names
intersect(vars, ls(baseenv()))
# [1] "sum" "mean" "c" "print"
I removed the first element of the function body because presumably you don't care about {, which would have been matched against the base package list.
Another possibility, albeit a bit less reliable, would be to compare the formal arguments of Fun1 to all the variable names in the function. Like I said, likely less reliable though because if you make assignments inside the function you will end up with incorrect results.
setdiff(vars, names(formals(Fun1)))
# [1] "sum" "mean" "c" "print"
These are fun though, and you can fiddle around with them.

Access to the parser tokens is available with functions from utils.
tokens <- utils::getParseData(parse(text=deparse(body(Fun1))))
unique(tokens[tokens[["token"]] == "SYMBOL_FUNCTION_CALL", "text"])
[1] "sum" "mean" "c" "print"

This should be somewhat helpful - this will return all functions however.
func_list = Fun1 %>%
body() %>% # extracts function
toString() %>% # converts to single string
gsub("[{}]", "", .) %>% # removes curly braces
gsub("\\s*\\([^\\)]+\\)", "", .) %>% # removes all contents between brackets
strsplit(",") %>% # splits strings at commas
unlist() %>% # converts to vector
trimws(., "both") # removes all white spaces before and after`
[1] "" "sum" "mean" "c" "print" "print"
> table(func_list)
func_list
c mean print sum
1 1 1 2 1
This is extremely limited to your example... you could modify this to be more robust. It will fall over where a function has brackets nesting other functions etc.

this is not so beautiful but working:
Fun1 <- function(x,y){
sum(x,y)
mean(x,y)
c(x,y)
print(x)
print(y)
}
getFNamesInFunction <- function(f.name){
f <- deparse(body(get(f.name)))
f <- f[grepl(pattern = "\\(", x = f)]
f <- sapply(X = strsplit(split = "\\(", x = f), FUN = function(x) x[1])
unique(trimws(f[f != ""]))
}
getFNamesInFunction("Fun1")
[1] "sum" "mean" "c" "print"

as.list(Fun1)[3]
gives you the part of the function between the curly braces.
{
sum(x, y)
mean(x, y)
c(x, y)
print(x)
print(y)
}
Hence
gsub( ").*$", "", as.list(Fun1)[3])
gives you everything before the first " ) " appears which is presumable the name of the first function.
Taking this as a starting point you should be able to include a loop which gives you the other functions and not the first only the first one.

Applying a function to a backreference within gsub in R

I'm new to R and am stuck with backreferencing that doesn't seem to work. In:
gsub("\\((\\d+)\\)", f("\\1"), string)
It correctly grabs the number in between parentheses but doesn't apply the (correctly defined, working otherwise) function f to replace the number --> it's actually the string "\1" that passes through to f.
Am I missing something or is it just that R does not handle this? If so, any idea how I could do something similar, i.e. applying a function "on the fly" to the (actually many) numbers that occur in between parentheses in the text I'm parsing?
Thanks a lot for your help.

R does not have the option of applying a function directly to a match via gsub. You'll actually have to extract the match, transform the value, then replace the value. This is relativaly easy with the regmatches function. For example
x<-"(990283)M (31)O (29)M (6360)M"
f<-function(x) {
v<-as.numeric(substr(x,2,nchar(x)-1))
paste0(v+5,".1")
}
m <- gregexpr("\\(\\d+\\)", x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
# [1] "990288.1M 36.1O 34.1M 6365.1M"
Of course you can make f do whatever you like just make sure it's vector-friendly. Of course, you could wrap this in your own function
gsubf <- function(pattern, x, f) {
m <- gregexpr(pattern, x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
}
gsubf("\\(\\d+\\)", x, f)
Note that in these examples we're not using a capture group, we're just grabbing the entire match. There are ways to extract the capture groups but they are a bit messier. If you wanted to provide an example where such an extraction is required, I might be able to come up with something fancier.

To use a callback within a regex-capable replacement function, you may use either gsubfn or stringr functions.
When choosing between them, note that stringr is based on ICU regex engine and with gsubfn, you may use either the default TCL (if the R installation has tcltk capability, else it is the default TRE) or PCRE (if you pass the perl=TRUE argument).
Also, note that gsubfn allows access to all capturing groups in the match object, while str_replace_all will only allow to manipulate the whole match only. Thus, for str_replace_all, the regex should look like (?<=\()\d+(?=\)), where 1+ digits are matched only when they are enclosed with ( and ) excluding them from the match.
With stringr, you may use str_replace_all:
library(stringr)
string <- "(990283)M (31)O (29)M (6360)M"
## Callback function to increment found number:
f <- function(x) { as.integer(x) + 1 }
str_replace_all(string, "(?<=\\()\\d+(?=\\))", function(m) f(m))
## => [1] "(990284)M (32)O (30)M (6361)M"
With gsubfn, pass perl=TRUE and backref=0 to be able to use lookarounds and just modify the whole match:
gsubfn("(?<=\\()\\d+(?=\\))", ~ f(m), string, perl=TRUE, backref=0)
## => [1] "(990284)M (32)O (30)M (6361)M"
If you have multiple groups in the pattern, remoe backref=0 and enumerate the group value arguments in the callback function declaration:
gsubfn("(\\()(\\d+)(\\))", function(m,n,o) paste0(m,f(n),o), string, perl=TRUE)
^ 1 ^^ 2 ^^ 3 ^ ^^^^^^^ ^^^^

This is for multiple different replacements.
text="foo(200) (300)bar (400)foo (500)bar (600)foo (700)bar"
f=function(x)
{
return(as.numeric(x[[1]])+5)
}
a=strsplit(text,"\\(\\K\\d+",perl=T)[[1]]
b=f(str_extract_all(text,perl("\\(\\K\\d+")))
paste0(paste0(a[-length(a)],b,collapse=""),a[length(a)]) #final output
#[1] "foo(205) (305)bar (405)foo (505)bar (605)foo (705)bar"

Here's a way by tweaking a bit stringr::str_replace(), in the replace argument, just use a lambda formula as the replace argument, and reference the captured group not by ""\\1" but by ..1, so your gsub("\\((\\d+)\\)", f("\\1"), string) will become str_replace2(string, "\\((\\d+)\\)", ~f(..1)), or just str_replace2(string, "\\((\\d+)\\)", f) in this simple case :
str_replace2 <- function(string, pattern, replacement, type.convert = TRUE){
if(inherits(replacement, "formula"))
replacement <- rlang::as_function(replacement)
if(is.function(replacement)){
grps_mat <- stringr::str_match(string, pattern)[,-1, drop = FALSE]
grps_list <- lapply(seq_len(ncol(grps_mat)), function(i) grps_mat[,i])
if(type.convert) {
grps_list <- type.convert(grps_list, as.is = TRUE)
replacement <- rlang::exec(replacement, !!! grps_list)
replacement <- as.character(replacement)
} else {
replacement <- rlang::exec(replacement, !!! grps_list)
}
}
stringr::str_replace(string, pattern, replacement)
}
str_replace2(
"foo (4)",
"\\((\\d+)\\)",
sqrt)
#> [1] "foo 2"
str_replace2(
"foo (4) (5)",
"\\((\\d+)\\) \\((\\d+)\\)",
~ sprintf("(%s)", ..1 * ..2))
#> [1] "foo (20)"
Created on 2020-01-24 by the reprex package (v0.3.0)

Finding the names of all functions in an R expression

I'm trying to find the names of all the functions used in an arbitrary legal R expression, but I can't find a function that will flag the below example as a function instead of a name.
test <- expression(
this_is_a_function <- function(var1, var2){
this_is_a_function(var1-1, var2)
})
all.vars(test, functions = FALSE)
[1] "this_is_a_function" "var1" "var2"
all.vars(expr, functions = FALSE) seems to return functions declarations (f <- function(){}) in the expression, while filtering out function calls ('+'(1,2), ...).
Is there any function - in the core libraries or elsewhere - that will flag 'this_is_a_function' as a function, not a name? It needs to work on arbitrary expressions, that are syntactically legal but might not evaluate correctly (e.g '+'(1, 'duck'))
I've found similar questions, but they don't seem to contain the solution.
If clarification is needed, leave a comment below. I'm using the parser package to parse the expressions.
Edit: #Hadley
I have expressions with contain entire scripts, which usually consist of a main function containing nested function definitions, with a call to the main function at the end of the script.
Functions are all defined inside the expressions, and I don't mind if I have to include '<-' and '{', since I can easy filter them out myself.
The motivation is to take all my R scripts and gather basic statistics about how my use of functions has changed over time.
Edit: Current Solution
A Regex-based approach grabs the function definitions, combined with the method in James' comment to grab function calls. Usually works, since I never use right-hand assignment.
function_usage <- function(code_string){
# takes a script, extracts function definitions
require(stringr)
code_string <- str_replace(code_string, 'expression\\(', '')
equal_assign <- '.+[ \n]+<-[ \n]+function'
arrow_assign <- '.+[ \n]+=[ \n]+function'
function_names <- sapply(
strsplit(
str_match(code_string, equal_assign), split = '[ \n]+<-'),
function(x) x[1])
function_names <- c(function_names, sapply(
strsplit(
str_match(code_string, arrow_assign), split = '[ \n]+='),
function(x) x[1]))
return(table(function_names))
}

Short answer: is.function checks whether a variable actually holds a function. This does not work on (unevaluated) calls because they are calls. You also need to take care of masking:
mean <- mean (x)
Longer answer:
IMHO there is a big difference between the two occurences of this_is_a_function.
In the first case you'll assign a function to the variable with name this_is_a_function once you evaluate the expression. The difference is the same difference as between 2+2 and 4.
However, just finding <- function () does not guarantee that the result is a function:
f <- function (x) {x + 1} (2)
The second occurrence is syntactically a function call. You can determine from the expression that a variable called this_is_a_function which holds a function needs to exist in order for the call to evaluate properly. BUT: you don't know whether it exists from that statement alone. however, you can check whether such a variable exists, and whether it is a function.
The fact that functions are stored in variables like other types of data, too, means that in the first case you can know that the result of function () will be function and from that conclude that immediately after this expression is evaluated, the variable with name this_is_a_function will hold a function.
However, R is full of names and functions: "->" is the name of the assignment function (a variable holding the assignment function) ...
After evaluating the expression, you can verify this by is.function (this_is_a_function).
However, this is by no means the only expression that returns a function: Think of
f <- function () {g <- function (){}}
> body (f)[[2]][[3]]
function() {
}
> class (body (f)[[2]][[3]])
[1] "call"
> class (eval (body (f)[[2]][[3]]))
[1] "function"
all.vars(expr, functions = FALSE) seems to return functions declarations (f <- function(){}) in the expression, while filtering out function calls ('+'(1,2), ...).
I'd say it is the other way round: in that expression f is the variable (name) which will be asssigned the function (once the call is evaluated). + (1, 2) evaluates to a numeric. Unless you keep it from doing so.
e <- expression (1 + 2)
> e <- expression (1 + 2)
> e [[1]]
1 + 2
> e [[1]][[1]]
`+`
> class (e [[1]][[1]])
[1] "name"
> eval (e [[1]][[1]])
function (e1, e2) .Primitive("+")
> class (eval (e [[1]][[1]]))
[1] "function"

Instead of looking for function definitions, which is going to be effectively impossible to do correctly without actually evaluating the functions, it will be easier to look for function calls.
The following function recursively spiders the expression/call tree returning the names of all objects that are called like a function:
find_calls <- function(x) {
# Base case
if (!is.recursive(x)) return()
recurse <- function(x) {
sort(unique(as.character(unlist(lapply(x, find_calls)))))
}
if (is.call(x)) {
f_name <- as.character(x[[1]])
c(f_name, recurse(x[-1]))
} else {
recurse(x)
}
}
It works as expected for a simple test case:
x <- expression({
f(3, g())
h <- function(x, y) {
i()
j()
k(l())
}
})
find_calls(x)
# [1] "{" "<-" "f" "function" "g" "i" "j"
# [8] "k" "l"

Just to follow up here as I have also been dealing with this problem: I have now created a C-level function to do this using code very similar to the C implementation of all.names and all.vars in base R. It however only works with objects of type "language" i.e. function calls, not type "expression". Demonstration:
ex = quote(sum(x) + mean(y) / z)
all.names(ex)
#> [1] "+" "sum" "x" "/" "mean" "y" "z"
all.vars(ex)
#> [1] "x" "y" "z"
collapse::all_funs(ex)
#> [1] "+" "sum" "/" "mean"
Created on 2022-08-17 by the reprex package (v2.0.1)
This generalizes to arbitrarily complex nested calls.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Using function argument inside grepl pattern (R) - r

The pattern is just a string. you can use paste() for string concatenation grepl(pattern = paste(".*\\s",y), x) There is no other "special" way to reference variables inside the regular expression string.

You can build your pattern by using paste0(): Function1 <- function(x,y) { grepl(pattern = paste0(".*\\s",y), x) } Function1(data, 'S') [1] TRUE FALSE

Related

Make access to non-existant list element an error

Use capture group within gsub() as index/name for another object (R) [duplicate]

How to get the list of in-built functions used within a function

Applying a function to a backreference within gsub in R

Finding the names of all functions in an R expression

Categories

Resources