Split vector, categorized by regex - r

I am searching for a method to split a character vector based on a RegEx pattern.
Example of input:
input <- c("a_foo","b_foo", "c_bar", "d_bar")
split_by <- c("foo", "bar")
The result I am searching for:
$foo
[1] "a_foo" "b_foo"
$bar
[1] "c_bar" "d_bar"
EDIT
Based on the comments and answers, there is need for a clarification.
split_by can have any number of elements;
the RegEx pattern varies from case to case; and
an element in input may be assigned to 0 (no matches), 1, or multiple splits depending on the match.
Hence, the following input:
input <- c("foo_bar", "nothing", "a_foo", "c_bar")
split_by <- c("foo", "bar")
Could return:
$foo
[1] "foo_bar" "a_foo"
$bar
[1] "foo_bar" "c_bar"

In real case, can you extract split_by values from input data?
This works for the example shared.
split(input, sub('.*_', '', input))
#$bar
#[1] "c_bar" "d_bar"
#$foo
#[1] "a_foo" "b_foo"
where
sub('.*_', '', input) #returns
#[1] "foo" "foo" "bar" "bar"

lapply(split_by, grep, x = input, value = TRUE)
# [[1]]
# [1] "a_foo" "b_foo"
#
# [[2]]
# [1] "c_bar" "d_bar"
To get named output you could do:
lapply(setNames(split_by, split_by), grep, x = input, value = TRUE)

split.regex <- function(input, split_by, pattern, add_names=TRUE) {
out <- lapply(split_by, function(x) {
input[grepl(sprintf(pattern, x), input)]
})
if (add_names) {
names(out) <- split_by
}
return(out)
}
First, the pattern must be defined. Since foo and bar occurs at the end of the string, sprintf("%s$", split_by) can be used. In the function, I defined the sprintf inside the function so the argument pattern should be defined as the sprintf string "%s$".
First example
By defining input and split_by as in the question's first example, and then running:
split.regex(input=input, split_by=split_by, pattern="%s$", add_names=TRUE)
We get the desired result:
$foo
[1] "a_foo" "b_foo"
$bar
[1] "c_bar" "d_bar"
Second example
By defining input and split_by as in the question's second example, and then running:
split.regex(input=input, split_by=split_by, pattern="(%s)", add_names=TRUE)
We get the desired result:
$foo
[1] "foo_bar" "a_foo"
$bar
[1] "foo_bar" "c_bar"
Since the input "nothing" didn't match on any, it was correctly excluded from the split, whereas "foo_bar" was correctly added to both splits as it matched on both.

Related

Use object name in function with map/lapply [duplicate]

I am looking for the reverse of get().
Given an object name, I wish to have the character string representing that object extracted directly from the object.
Trivial example with foo being the placeholder for the function I am looking for.
z <- data.frame(x=1:10, y=1:10)
test <- function(a){
mean.x <- mean(a$x)
print(foo(a))
return(mean.x)}
test(z)
Would print:
"z"
My work around, which is harder to implement in my current problem is:
test <- function(a="z"){
mean.x <- mean(get(a)$x)
print(a)
return(mean.x)}
test("z")
The old deparse-substitute trick:
a<-data.frame(x=1:10,y=1:10)
test<-function(z){
mean.x<-mean(z$x)
nm <-deparse(substitute(z))
print(nm)
return(mean.x)}
test(a)
#[1] "a" ... this is the side-effect of the print() call
# ... you could have done something useful with that character value
#[1] 5.5 ... this is the result of the function call
Edit: Ran it with the new test-object
Note: this will not succeed inside a local function when a set of list items are passed from the first argument to lapply (and it also fails when an object is passed from a list given to a for-loop.) You would be able to extract the ".Names"-attribute and the order of processing from the structure result, if it were a named vector that were being processed.
> lapply( list(a=4,b=5), function(x) {nm <- deparse(substitute(x)); strsplit(nm, '\\[')} )
$a # This "a" and the next one in the print output are put in after processing
$a[[1]]
[1] "X" "" "1L]]" # Notice that there was no "a"
$b
$b[[1]]
[1] "X" "" "2L]]"
> lapply( c(a=4,b=5), function(x) {nm <- deparse(substitute(x)); strsplit(nm, '\\[')} )
$a
$a[[1]] # but it's theoretically possible to extract when its an atomic vector
[1] "structure(c(4, 5), .Names = c(\"a\", \"b\"))" ""
[3] "1L]]"
$b
$b[[1]]
[1] "structure(c(4, 5), .Names = c(\"a\", \"b\"))" ""
[3] "2L]]"
deparse(quote(var))
My intuitive understanding
In which the quote freeze the var or expression from evaluation
and the deparse function which is the inverse of parse function makes that freezed symbol back to String
Note that for print methods the behavior can be different.
print.foo=function(x){ print(deparse(substitute(x))) }
test = list(a=1, b=2)
class(test)="foo"
#this shows "test" as expected
print(test)
#this (just typing 'test' on the R command line)
test
#shows
#"structure(list(a = 1, b = 2), .Names = c(\"a\", \"b\"), class = \"foo\")"
Other comments I've seen on forums suggests that the last behavior is unavoidable. This is unfortunate if you are writing print methods for packages.
To elaborate on Eli Holmes' answer:
myfunc works beautifully
I was tempted to call it within another function (as discussed in his Aug 15, '20 comment)
Fail
Within a function, coded directly (rather than called from an external function), the deparse(substitute() trick works well.
This is all implicit in his answer, but for the benefit of peeps with my degree of obliviousness, I wanted to spell it out.
an_object <- mtcars
myfunc <- function(x) deparse(substitute(x))
myfunc(an_object)
#> [1] "an_object"
# called within another function
wrapper <- function(x){
myfunc(x)
}
wrapper(an_object)
#> [1] "x"

How to convert object variable into string inside a function

I have the following list of vectors
v1 <- c("foo","bar")
v2 <- c("qux","uip","lsi")
mylist <- list(v1,v2)
mylist
#> [[1]]
#> [1] "foo" "bar"
#>
#> [[2]]
#> [1] "qux" "uip" "lsi"
What I want to do is to apply a function so that it prints the this string:
v1:foo,bar
v2:qux,uip,lsi
So it involves two step: 1) Convert object variable to string and
2) make the vector into string. The latter is easy as I can do this:
make_string <- function (content_vector) {
cat(content_vector,sep=",")
}
make_string(mylist[[1]])
# foo,bar
make_string(mylist[[2]])
# qux,uip,lsi
I am aware of this solution, but I don't know how can I turn the object name into a string within a function so that
it prints like my desired output.
I need to to this inside a function, because there are many other output I need to process.
We can use
cat(paste(c('v1', 'v2'), sapply(mylist, toString), sep=":", collapse="\n"), '\n')
#v1:foo, bar
#v2:qux, uip, lsi
If we need to pass the original object i.e. 'v1', 'v2'
make_string <- function(vec){
obj <- deparse(substitute(vec))
paste(obj, toString(vec), sep=":")
}
make_string(v1)
#[1] "v1:foo, bar"
If you want to use a list, you can name the objects in the list to be able to use them in a function. Remove the cat if you just want a string to be returned.
v1 <- c("foo","bar")
v2 <- c("qux","uip","lsi")
# objects given names here
mylist <- list("v1" = v1, "v2" = v2)
# see names now next to the $
mylist
$v1
[1] "foo" "bar"
$v2
[1] "qux" "uip" "lsi"
make_string <- function (content_vector) {
vecname <- names(content_vector)
cat(paste0(vecname, ":", paste(sapply(content_vector, toString), sep = ",")))
}
make_string(mylist[1])
v1:foo, bar
make_string(mylist[2])
v2:qux, uip, lsi

Make argument of your function a text string [duplicate]

I am looking for the reverse of get().
Given an object name, I wish to have the character string representing that object extracted directly from the object.
Trivial example with foo being the placeholder for the function I am looking for.
z <- data.frame(x=1:10, y=1:10)
test <- function(a){
mean.x <- mean(a$x)
print(foo(a))
return(mean.x)}
test(z)
Would print:
"z"
My work around, which is harder to implement in my current problem is:
test <- function(a="z"){
mean.x <- mean(get(a)$x)
print(a)
return(mean.x)}
test("z")
The old deparse-substitute trick:
a<-data.frame(x=1:10,y=1:10)
test<-function(z){
mean.x<-mean(z$x)
nm <-deparse(substitute(z))
print(nm)
return(mean.x)}
test(a)
#[1] "a" ... this is the side-effect of the print() call
# ... you could have done something useful with that character value
#[1] 5.5 ... this is the result of the function call
Edit: Ran it with the new test-object
Note: this will not succeed inside a local function when a set of list items are passed from the first argument to lapply (and it also fails when an object is passed from a list given to a for-loop.) You would be able to extract the ".Names"-attribute and the order of processing from the structure result, if it were a named vector that were being processed.
> lapply( list(a=4,b=5), function(x) {nm <- deparse(substitute(x)); strsplit(nm, '\\[')} )
$a # This "a" and the next one in the print output are put in after processing
$a[[1]]
[1] "X" "" "1L]]" # Notice that there was no "a"
$b
$b[[1]]
[1] "X" "" "2L]]"
> lapply( c(a=4,b=5), function(x) {nm <- deparse(substitute(x)); strsplit(nm, '\\[')} )
$a
$a[[1]] # but it's theoretically possible to extract when its an atomic vector
[1] "structure(c(4, 5), .Names = c(\"a\", \"b\"))" ""
[3] "1L]]"
$b
$b[[1]]
[1] "structure(c(4, 5), .Names = c(\"a\", \"b\"))" ""
[3] "2L]]"
deparse(quote(var))
My intuitive understanding
In which the quote freeze the var or expression from evaluation
and the deparse function which is the inverse of parse function makes that freezed symbol back to String
Note that for print methods the behavior can be different.
print.foo=function(x){ print(deparse(substitute(x))) }
test = list(a=1, b=2)
class(test)="foo"
#this shows "test" as expected
print(test)
#this (just typing 'test' on the R command line)
test
#shows
#"structure(list(a = 1, b = 2), .Names = c(\"a\", \"b\"), class = \"foo\")"
Other comments I've seen on forums suggests that the last behavior is unavoidable. This is unfortunate if you are writing print methods for packages.
To elaborate on Eli Holmes' answer:
myfunc works beautifully
I was tempted to call it within another function (as discussed in his Aug 15, '20 comment)
Fail
Within a function, coded directly (rather than called from an external function), the deparse(substitute() trick works well.
This is all implicit in his answer, but for the benefit of peeps with my degree of obliviousness, I wanted to spell it out.
an_object <- mtcars
myfunc <- function(x) deparse(substitute(x))
myfunc(an_object)
#> [1] "an_object"
# called within another function
wrapper <- function(x){
myfunc(x)
}
wrapper(an_object)
#> [1] "x"

R: Argument as variablename and string in function? [duplicate]

I am looking for the reverse of get().
Given an object name, I wish to have the character string representing that object extracted directly from the object.
Trivial example with foo being the placeholder for the function I am looking for.
z <- data.frame(x=1:10, y=1:10)
test <- function(a){
mean.x <- mean(a$x)
print(foo(a))
return(mean.x)}
test(z)
Would print:
"z"
My work around, which is harder to implement in my current problem is:
test <- function(a="z"){
mean.x <- mean(get(a)$x)
print(a)
return(mean.x)}
test("z")
The old deparse-substitute trick:
a<-data.frame(x=1:10,y=1:10)
test<-function(z){
mean.x<-mean(z$x)
nm <-deparse(substitute(z))
print(nm)
return(mean.x)}
test(a)
#[1] "a" ... this is the side-effect of the print() call
# ... you could have done something useful with that character value
#[1] 5.5 ... this is the result of the function call
Edit: Ran it with the new test-object
Note: this will not succeed inside a local function when a set of list items are passed from the first argument to lapply (and it also fails when an object is passed from a list given to a for-loop.) You would be able to extract the ".Names"-attribute and the order of processing from the structure result, if it were a named vector that were being processed.
> lapply( list(a=4,b=5), function(x) {nm <- deparse(substitute(x)); strsplit(nm, '\\[')} )
$a # This "a" and the next one in the print output are put in after processing
$a[[1]]
[1] "X" "" "1L]]" # Notice that there was no "a"
$b
$b[[1]]
[1] "X" "" "2L]]"
> lapply( c(a=4,b=5), function(x) {nm <- deparse(substitute(x)); strsplit(nm, '\\[')} )
$a
$a[[1]] # but it's theoretically possible to extract when its an atomic vector
[1] "structure(c(4, 5), .Names = c(\"a\", \"b\"))" ""
[3] "1L]]"
$b
$b[[1]]
[1] "structure(c(4, 5), .Names = c(\"a\", \"b\"))" ""
[3] "2L]]"
deparse(quote(var))
My intuitive understanding
In which the quote freeze the var or expression from evaluation
and the deparse function which is the inverse of parse function makes that freezed symbol back to String
Note that for print methods the behavior can be different.
print.foo=function(x){ print(deparse(substitute(x))) }
test = list(a=1, b=2)
class(test)="foo"
#this shows "test" as expected
print(test)
#this (just typing 'test' on the R command line)
test
#shows
#"structure(list(a = 1, b = 2), .Names = c(\"a\", \"b\"), class = \"foo\")"
Other comments I've seen on forums suggests that the last behavior is unavoidable. This is unfortunate if you are writing print methods for packages.
To elaborate on Eli Holmes' answer:
myfunc works beautifully
I was tempted to call it within another function (as discussed in his Aug 15, '20 comment)
Fail
Within a function, coded directly (rather than called from an external function), the deparse(substitute() trick works well.
This is all implicit in his answer, but for the benefit of peeps with my degree of obliviousness, I wanted to spell it out.
an_object <- mtcars
myfunc <- function(x) deparse(substitute(x))
myfunc(an_object)
#> [1] "an_object"
# called within another function
wrapper <- function(x){
myfunc(x)
}
wrapper(an_object)
#> [1] "x"

In R, how to get an object's name after it is sent to a function?

I am looking for the reverse of get().
Given an object name, I wish to have the character string representing that object extracted directly from the object.
Trivial example with foo being the placeholder for the function I am looking for.
z <- data.frame(x=1:10, y=1:10)
test <- function(a){
mean.x <- mean(a$x)
print(foo(a))
return(mean.x)}
test(z)
Would print:
"z"
My work around, which is harder to implement in my current problem is:
test <- function(a="z"){
mean.x <- mean(get(a)$x)
print(a)
return(mean.x)}
test("z")
The old deparse-substitute trick:
a<-data.frame(x=1:10,y=1:10)
test<-function(z){
mean.x<-mean(z$x)
nm <-deparse(substitute(z))
print(nm)
return(mean.x)}
test(a)
#[1] "a" ... this is the side-effect of the print() call
# ... you could have done something useful with that character value
#[1] 5.5 ... this is the result of the function call
Edit: Ran it with the new test-object
Note: this will not succeed inside a local function when a set of list items are passed from the first argument to lapply (and it also fails when an object is passed from a list given to a for-loop.) You would be able to extract the ".Names"-attribute and the order of processing from the structure result, if it were a named vector that were being processed.
> lapply( list(a=4,b=5), function(x) {nm <- deparse(substitute(x)); strsplit(nm, '\\[')} )
$a # This "a" and the next one in the print output are put in after processing
$a[[1]]
[1] "X" "" "1L]]" # Notice that there was no "a"
$b
$b[[1]]
[1] "X" "" "2L]]"
> lapply( c(a=4,b=5), function(x) {nm <- deparse(substitute(x)); strsplit(nm, '\\[')} )
$a
$a[[1]] # but it's theoretically possible to extract when its an atomic vector
[1] "structure(c(4, 5), .Names = c(\"a\", \"b\"))" ""
[3] "1L]]"
$b
$b[[1]]
[1] "structure(c(4, 5), .Names = c(\"a\", \"b\"))" ""
[3] "2L]]"
deparse(quote(var))
My intuitive understanding
In which the quote freeze the var or expression from evaluation
and the deparse function which is the inverse of parse function makes that freezed symbol back to String
Note that for print methods the behavior can be different.
print.foo=function(x){ print(deparse(substitute(x))) }
test = list(a=1, b=2)
class(test)="foo"
#this shows "test" as expected
print(test)
#this (just typing 'test' on the R command line)
test
#shows
#"structure(list(a = 1, b = 2), .Names = c(\"a\", \"b\"), class = \"foo\")"
Other comments I've seen on forums suggests that the last behavior is unavoidable. This is unfortunate if you are writing print methods for packages.
To elaborate on Eli Holmes' answer:
myfunc works beautifully
I was tempted to call it within another function (as discussed in his Aug 15, '20 comment)
Fail
Within a function, coded directly (rather than called from an external function), the deparse(substitute() trick works well.
This is all implicit in his answer, but for the benefit of peeps with my degree of obliviousness, I wanted to spell it out.
an_object <- mtcars
myfunc <- function(x) deparse(substitute(x))
myfunc(an_object)
#> [1] "an_object"
# called within another function
wrapper <- function(x){
myfunc(x)
}
wrapper(an_object)
#> [1] "x"

Resources