How to Perform a Regex Task in R

So I am trying to find a way for R to detect the characters "ar1" for a function I am making.
if(str_detect(as.character(y1.AR2), regex('ar1', ignore_case = T)) == T){
print('love')
} else {
print('nolove')
}
For example, the code above evaluates to TRUE, but I want it to evaluate to FALSE because the name of the object 'y1.AR2' does not contain 'AR1' (an 'A' followed by an 'R' followed by a '1'). The only time I want the condition to be TRUE is when the name contains 'AR1' in that order, in either upper or lower case.
Anyone know of a way to make this possible?
Thank you in advance!

func <- function(x) {
xname <- deparse(substitute(x))
if (grepl("ar1", xname, ignore.case = TRUE)) "love" else "nolove"
}
y1.AR2 <- 1
func(y1.AR2)
# [1] "nolove"
y1.AR1 <- 2
func(y1.AR1)
# [1] "love"
Finding the name of an argument to a function is a little fragile. For instance,
func(c("1", "quux", "tar1234"))
# [1] "love"
returns "love" because deparse(substitute(x)) resolves to the literal expression passed as the first argument, and that expression happens to contain "tar1234".
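If the fragility is a concern, one option is to pass the name in as a string instead of recovering it from the call. This is only a sketch, and has_ar1 is a hypothetical helper name, not something from the question:

```r
# Hypothetical helper: takes object names as character strings,
# so nothing depends on how the function was called.
has_ar1 <- function(name) {
  grepl("ar1", name, ignore.case = TRUE)
}

has_ar1("y1.AR2")
# [1] FALSE
has_ar1("y1.AR1")
# [1] TRUE
has_ar1(c("y1.ar1", "tar1234", "y2"))
# [1]  TRUE  TRUE FALSE
```

Because grepl is vectorized, the same helper also works on names(some_list) or ls() output.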

Related

R: using a switch function with a numeric type value? [duplicate]

I am new to R programming. I don't know whether we could use switch statements for numerical objects.
This is my code,
myfunction <- function() {
x <- 10
switch(x,
1={
print("one")
},
2={
print("two")
},
3={
print("three")
},
{
print("default") #Edited one..
}
)
}
I got this error,
test.R:4:18: unexpected '='
3: switch(x,
4: 1=
^
Please help me out this problem.
To take full advantage of switch's functionality (in particular its ability to handle arbitrary values with a final "default" expression) and to handle numbers other than 1,2,3,..., you'd be better off converting any input to a character string.
I would do something like this:
myfunction <- function(x) {
switch(as.character(x),
"1" = print("one"),
"2" = print("two"),
"3" = print("three"),
print("something other than 'one', 'two', or 'three'"))
}
myfunction(1)
# [1] "one"
myfunction(345)
# [1] "something other than 'one', 'two', or 'three'"
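The character form also allows several values to share one result: an alternative left empty falls through to the next non-empty one. A small sketch of that, where size_label and its labels are made up for illustration:

```r
# Hypothetical example: "1" has no expression of its own,
# so it falls through to the expression given for "2".
size_label <- function(x) {
  switch(as.character(x),
         "1" = ,
         "2" = "small",
         "3" = "medium",
         "other")  # unnamed final argument acts as the default
}

size_label(1)
# [1] "small"
size_label(3)
# [1] "medium"
size_label(345)
# [1] "other"
```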
myfunction <- function(x) {
switch(x,
print("one"),
print("two"),
print("three"))}
myfunction(1)
## [1] "one"
Edit:
As mentioned in the comments, this method doesn't match against the values being entered; it uses them as a positional index. Thus, it works in your case, but it won't work if the statements are reordered (see @Josh's answer for a better approach).
Either way, I don't think switch is the right function to use in this case, because it is mainly meant for switching between different alternatives, while in your case you are basically running the same function over and over. Adding an extra statement for each alternative seems like too much work (if you wanted to display 20 different numbers, for example, you would have to write 20 different statements).
Instead, you could try the english package, which converts any number in the range you define in the ifelse statement to its English name
library(english)
myfunction2 <- function(x) {
ifelse(x %in% 1:3,
as.character(as.english(x)),
"default")}
myfunction2(1)
## [1] "one"
myfunction2(4)
## [1] "default"
Alternatively, you could also avoid using switch (though not necessarily recommended) by using match
myfunction3 <- function(x) {
df <- data.frame(A = 1:3, B = c("one", "two", "three"), stringsAsFactors = FALSE)
ifelse(x %in% 1:3,
df$B[match(x, df$A)],
"default")}
myfunction3(1)
## [1] "one"
myfunction3(4)
## [1] "default"
I would suggest reading the ?switch help page. This seems fairly well described there. Argument names in R cannot be bare numbers, i.e. c(1=5) is not allowed, nor is f(1=5, 2=5). If you really have 1, 2 or 3, then you want just
switch(x,
{print("one")},
{print("two")},
{print("three")}
)
(omit the names for numeric values)
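One thing worth knowing about the unnamed, numeric form: when the index falls outside the listed alternatives, switch returns NULL invisibly rather than running a default. A minimal sketch of handling that case, where describe_number is a hypothetical helper name:

```r
# Hypothetical wrapper: numeric switch has no default branch,
# so an out-of-range index yields NULL, which we replace ourselves.
describe_number <- function(x) {
  result <- switch(x, "one", "two", "three")
  if (is.null(result)) "default" else result
}

describe_number(2)
# [1] "two"
describe_number(10)
# [1] "default"
```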

Use capture group within gsub() as index/name for another object (R) [duplicate]

I'm new to R and am stuck with backreferencing that doesn't seem to work. In:
gsub("\\((\\d+)\\)", f("\\1"), string)
It correctly grabs the number in between parentheses but doesn't apply the (correctly defined, working otherwise) function f to replace the number --> it's actually the string "\1" that passes through to f.
Am I missing something or is it just that R does not handle this? If so, any idea how I could do something similar, i.e. applying a function "on the fly" to the (actually many) numbers that occur in between parentheses in the text I'm parsing?
Thanks a lot for your help.
gsub cannot apply a function directly to a match. You have to extract the match, transform the value, then put the new value back. This is relatively easy with the regmatches function. For example:
x<-"(990283)M (31)O (29)M (6360)M"
f<-function(x) {
v<-as.numeric(substr(x,2,nchar(x)-1))
paste0(v+5,".1")
}
m <- gregexpr("\\(\\d+\\)", x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
# [1] "990288.1M 36.1O 34.1M 6365.1M"
You can make f do whatever you like; just make sure it's vectorized. You could also wrap this in your own function:
gsubf <- function(pattern, x, f) {
m <- gregexpr(pattern, x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
}
gsubf("\\(\\d+\\)", x, f)
Note that in these examples we're not using a capture group, we're just grabbing the entire match. There are ways to extract the capture groups but they are a bit messier. If you wanted to provide an example where such an extraction is required, I might be able to come up with something fancier.
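For the simple case of a single capture group, one way to get at the group without extra packages is to run sub() over each whole match before calling the callback. This is only a sketch under that assumption; gsub_groups is a hypothetical name, and it hands the callback the first group only:

```r
# Hypothetical helper: like gsubf, but the callback receives the first
# capture group of each match instead of the whole match.
gsub_groups <- function(pattern, x, f) {
  m <- gregexpr(pattern, x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    groups <- sub(pattern, "\\1", hits, perl = TRUE)  # keep group 1 only
    as.character(f(groups))
  })
  x
}

gsub_groups("\\((\\d+)\\)", "(31)O (29)M", function(g) as.integer(g) + 1L)
# [1] "32O 30M"
```

The whole match, parentheses included, is replaced by the callback's result, so the callback has to add the parentheses back if they should be kept.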
To use a callback within a regex-capable replacement function, you may use either gsubfn or stringr functions.
When choosing between them, note that stringr is based on the ICU regex engine, while gsubfn uses the Tcl engine by default (if the R installation has tcltk capability; otherwise it falls back to TRE) or PCRE (if you pass the perl=TRUE argument).
Also note that gsubfn gives access to all capturing groups in the match, while str_replace_all only lets you manipulate the whole match. Thus, for str_replace_all, the regex should look like (?<=\()\d+(?=\)), which matches one or more digits only when they are enclosed in ( and ), without including the parentheses in the match.
With stringr, you may use str_replace_all:
library(stringr)
string <- "(990283)M (31)O (29)M (6360)M"
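## Callback function to increment the number found; it returns a character
## vector, which recent stringr versions require from a replacement function:
f <- function(x) { as.character(as.integer(x) + 1) }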
str_replace_all(string, "(?<=\\()\\d+(?=\\))", function(m) f(m))
## => [1] "(990284)M (32)O (30)M (6361)M"
With gsubfn, pass perl=TRUE and backref=0 to be able to use lookarounds and just modify the whole match:
library(gsubfn)
gsubfn("(?<=\\()\\d+(?=\\))", ~ f(m), string, perl=TRUE, backref=0)
## => [1] "(990284)M (32)O (30)M (6361)M"
If you have multiple groups in the pattern, remove backref=0 and list one callback argument per capture group:
## m, n and o receive capture groups 1, 2 and 3, respectively
gsubfn("(\\()(\\d+)(\\))", function(m,n,o) paste0(m,f(n),o), string, perl=TRUE)
This is for multiple different replacements.
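text <- "foo(200) (300)bar (400)foo (500)bar (600)foo (700)bar"
f <- function(x) {
as.numeric(x[[1]]) + 5
}
# Pieces of text around each number (\\K drops the "(" from the match)
a <- strsplit(text, "\\(\\K\\d+", perl = TRUE)[[1]]
# The numbers themselves, transformed by f; base regmatches() is used here
# in place of the deprecated stringr perl() modifier
b <- f(regmatches(text, gregexpr("\\(\\K\\d+", text, perl = TRUE)))
paste0(paste0(a[-length(a)], b, collapse = ""), a[length(a)]) # final output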
#[1] "foo(205) (305)bar (405)foo (505)bar (605)foo (705)bar"
Here's a way that tweaks stringr::str_replace() a bit: pass a lambda formula as the replacement argument and refer to the captured group not as "\\1" but as ..1, so that your gsub("\\((\\d+)\\)", f("\\1"), string) becomes str_replace2(string, "\\((\\d+)\\)", ~f(..1)), or just str_replace2(string, "\\((\\d+)\\)", f) in this simple case:
str_replace2 <- function(string, pattern, replacement, type.convert = TRUE){
if(inherits(replacement, "formula"))
replacement <- rlang::as_function(replacement)
if(is.function(replacement)){
grps_mat <- stringr::str_match(string, pattern)[,-1, drop = FALSE]
grps_list <- lapply(seq_len(ncol(grps_mat)), function(i) grps_mat[,i])
if(type.convert) {
grps_list <- type.convert(grps_list, as.is = TRUE)
replacement <- rlang::exec(replacement, !!! grps_list)
replacement <- as.character(replacement)
} else {
replacement <- rlang::exec(replacement, !!! grps_list)
}
}
stringr::str_replace(string, pattern, replacement)
}
str_replace2(
"foo (4)",
"\\((\\d+)\\)",
sqrt)
#> [1] "foo 2"
str_replace2(
"foo (4) (5)",
"\\((\\d+)\\) \\((\\d+)\\)",
~ sprintf("(%s)", ..1 * ..2))
#> [1] "foo (20)"
Created on 2020-01-24 by the reprex package (v0.3.0)

Finding the position of a character within a string

I am trying to find the equivalent of the ANYALPHA SAS function in R. This function searches a character string for an alphabetic character and returns the first position at which such a character is found.
Example: for the string '123456789A', the ANYALPHA function would return 10, since the first alphabetic character is at position 10 in the string. I would like to replicate this function in R but have not been able to figure it out. I need to search for any alphabetic character regardless of case (i.e. [:alpha:])
Thanks for any help you can offer!
Here's an anyalpha function. I added a few extra features. You can specify the maximum number of matches you want in the n argument, which defaults to 1. You can also ask for the matched characters themselves instead of their positions with value=TRUE:
anyalpha <- function(txt, n=1, value=FALSE) {
txt <- as.character(txt)
indx <- gregexpr("[[:alpha:]]", txt)[[1]]
ret <- indx[1:(min(n, length(indx)))]
if(value) {
mapply(function(x,y) substr(txt, x, y), ret, ret)
} else {ret}
}
#test
x <- '123A56789BC'
anyalpha(x)
#[1] 4
anyalpha(x, 2)
#[1] 4 10
anyalpha(x, 2, value=TRUE)
#[1] "A" "B"
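If only the first position is needed, regexpr alone is enough. A minimal sketch, assuming you want 0 for strings with no alphabetic character (which, as far as I know, is what SAS ANYALPHA returns); first_alpha is a hypothetical name:

```r
# Hypothetical helper: position of the first alphabetic character,
# or 0 when there is none (regexpr itself reports -1 in that case).
first_alpha <- function(txt) {
  pos <- as.integer(regexpr("[[:alpha:]]", as.character(txt)))
  ifelse(pos < 0L, 0L, pos)
}

first_alpha("123456789A")
# [1] 10
first_alpha(c("123A56789BC", "12345", "abc"))
# [1] 4 0 1
```

Unlike the gregexpr version above, this one is vectorized over txt, so it can be applied to a whole column at once.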

Resources