Regex in R lists to call specific function

It is of course possible to store functions in a list and call them from there.
It is also possible to name the list entries for easier access later.
Now I need the list item names to be regular expressions, like this:
funcList <- list("^\\+[0-9]{1,3}$"=lead, "^\\-[0-9]{1,3}$"=lag)
a <- funcList$"+12"(a,12) # should call lead()
a <- funcList$"-4"(a,-4) # should call lag()
a <- funcList$"^\\+[0-9]{1,3}$"(a,12) # this works, of course, but is not what I want...
Of course this does not work: I get "Error: attempt to apply non-function", because funcList$"+12" looks for an element literally named "+12", finds none and returns NULL. The names are treated as plain strings, not as regexes.
Is it possible to do what I need?

You could use the names of the list as patterns for grepl:
library(dplyr)  # provides lead() and lag()
funcList <- list("^\\+[0-9]{1,3}$"=lead, "^\\-[0-9]{1,3}$"=lag)
f1 <- funcList[sapply(names(funcList), function(x) grepl(x,"+12"))][[1]]
f2 <- funcList[sapply(names(funcList), function(x) grepl(x,"-4"))][[1]]
> f1(seq(1,10))
[1] 2 3 4 5 6 7 8 9 10 NA
> f2(seq(1,10))
[1] NA 1 2 3 4 5 6 7 8 9
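Building on this, a slightly fuller sketch could wrap the lookup in a helper that fails loudly when no pattern matches and takes the shift size from the key itself. To keep it runnable without dplyr, shift_lead() and shift_lag() are hypothetical base-R stand-ins for lead() and lag(), and match_fun_by_regex() and apply_shift() are likewise illustrative names. The sign is stripped before the call because these shift functions expect a non-negative n, and the minus pattern is written as "^-[0-9]{1,3}$" since a hyphen outside brackets is already literal.

```r
# Hypothetical base-R stand-ins for dplyr::lead()/lag(); assume 0 <= n <= length(x)
shift_lead <- function(x, n = 1) c(x[seq_len(length(x) - n) + n], rep(NA, n))
shift_lag  <- function(x, n = 1) c(rep(NA, n), x[seq_len(length(x) - n)])

funcList <- list("^\\+[0-9]{1,3}$" = shift_lead, "^-[0-9]{1,3}$" = shift_lag)

# Return the first function whose name (used as a regex) matches `key`
match_fun_by_regex <- function(fun_list, key) {
  hits <- vapply(names(fun_list), function(p) grepl(p, key), logical(1))
  if (!any(hits)) stop("no pattern matches key: ", key)
  fun_list[[which(hits)[1]]]
}

# Pick the function from the key and use the digits after the sign as n
apply_shift <- function(x, key, fun_list = funcList) {
  f <- match_fun_by_regex(fun_list, key)
  f(x, as.integer(substring(key, 2)))
}

apply_shift(1:10, "+2")
apply_shift(1:10, "-3")
```

Keys that match several patterns resolve to the first one in list order, so keeping the patterns mutually exclusive, as here, avoids surprises.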

I think you can map strings like "+4" and "-12" to lead/lag more directly, like this:
library(dplyr)
set.seed(123)
df = data.frame(
  x = sample(1:20, 10)
)
# Split the shift string into its sign and its magnitude
shifted = function(x, shift) {
  direction = substr(shift, 1, 1)
  amount = as.integer(substr(shift, 2, nchar(shift)))
  if (direction == "+") {
    return(lead(x, amount))
  } else {
    return(lag(x, amount))
  }
}
df %>%
  mutate(
    plus4 = shifted(x, "+4"),
    minus3 = shifted(x, "-3")
  )
You could use regex within the shifted function if you need to do more validation of the "+4" strings, but I prefer not to go for complicated regexes unless they're definitely needed.
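If you do want that validation, one possible shape is below. It is a sketch, not the answer author's code: shifted_checked() is a hypothetical name, and it reimplements the lead/lag shift in base R instead of calling dplyr so that it runs on its own. The regex rejects anything other than a sign followed by digits, and the shift is clamped to the vector length.

```r
# Hypothetical validated variant of shifted(), written in base R so it runs
# without dplyr; the lead/lag logic is reimplemented inline
shifted_checked <- function(x, shift) {
  # Accept only a sign followed by one or more digits, e.g. "+4" or "-12"
  if (!grepl("^[+-][0-9]+$", shift)) {
    stop("invalid shift string: ", shift)
  }
  # Shift size, clamped so it never exceeds the vector length
  n <- min(as.integer(substring(shift, 2)), length(x))
  pad <- rep(NA, n)
  if (startsWith(shift, "+")) {
    c(x[seq_len(length(x) - n) + n], pad)  # lead: later values move up
  } else {
    c(pad, x[seq_len(length(x) - n)])      # lag: earlier values move down
  }
}

shifted_checked(1:5, "+2")
shifted_checked(1:5, "-3")
```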

Related

Using replace in R multiple times in function

I'm trying to use R's replace multiple times in a function, but only the last use seems to work. For instance, using x where
x <- c(1:3)
if I wanted to add one to each odd value, I tried
test <- function(x) {
  replace(x,x==1,2)
  replace(x,x==3,4)
}
but test(x) results in (1,2,4), whereas I wanted (2,2,4); in other words, only the last replace() seems to work. I know I could refer to the values by their position in the vector, but does anyone know how to fix this if I want to refer to the values themselves?
Thanks so much!
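You need to assign the output of replace() to a variable. replace() returns a modified copy and does not change x in place, so in your function the result of the first call is discarded and only the result of the last call is returned: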
x <- c(1:3)
test <- function(x) {
  x <- replace(x,x==1,2)
  replace(x,x==3,4)
}
test(x)
[1] 2 2 4
Or, using the case_when() function from dplyr:
library(dplyr)
case_when(x == 1 ~ 2,
          x == 3 ~ 4,
          TRUE ~ as.double(x))
[1] 2 2 4
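With several value replacements it can help to be explicit about whether each rule sees the output of the previous one. A base R sketch with two hypothetical helpers: replace_seq() chains replace() calls with Reduce(), the same sequential behaviour as the fixed test() above, while replace_sim() looks every value up against the original vector with match().

```r
# Sequential: each (from, to) pair is applied to the result of the previous one
replace_seq <- function(x, pairs) {
  Reduce(function(acc, p) replace(acc, acc == p[1], p[2]), pairs, x)
}

# Simultaneous: every value is looked up against the ORIGINAL x via match()
replace_sim <- function(x, from, to) {
  idx <- match(x, from)   # position of each x in `from`, NA if absent
  hit <- !is.na(idx)
  x[hit] <- to[idx[hit]]
  x
}

x <- c(1:3)
replace_seq(x, list(c(1, 2), c(3, 4)))
replace_sim(x, from = c(1, 3), to = c(2, 4))
```

The two only differ when a replacement produces a value that a later rule also targets: mapping 1 to 3 and 3 to 4 gives c(4, 2, 4) sequentially but c(3, 2, 4) simultaneously.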

Shuffling around elements in a list

My general question concerns shuffling elements around in a list efficiently.
Say I have a list:
region <- list(c(1,3,2,6),c(5,8,9),c(10,4,7))
and two constants:
value <- 2
swapin <- 5
I want to do two things: remove the element equal to value from whichever vector in the list contains it, and then append it to the vector whose first element equals swapin.
The result should look like:
region <- list(c(1,3,6),c(5,8,9,2),c(10,4,7))
For the first step, the only way I can think of doing it is doing something like this:
region <- lapply(1:length(region), function(x) region[[x]][region[[x]] != value])
but this seems inefficient. My actual data could involve a very large list, and this approach seems cumbersome. Is there an easy trick to avoid the looping?
For the second step, I can create an updated vector like this:
updated <- c(unlist(region[sapply(region, `[`, 1) == swapin]), value)
but I am stumped on how to replace the vector currently in the list, c(5,8,9), with the updated vector, c(5,8,9,2). Or is there an easier way to append the element directly?
Can anyone help, please?
Something like this will do the trick:
region <- list(c(1,3,2,6),c(5,8,9),c(10,4,7))
value <- 2
swapin <- 5
step1 = lapply(region, function(x) x[x != value])
step2 = lapply(step1, function(x){
  if(x[1]==swapin){
    return(c(x, value))
  } else {
    return(x)
  }
})
Instead of looping through region by feeding in its element indices, you can loop through region itself. This is how lapply is intended to be used: to apply a function to each element of a list. The second step replaces each element x with c(x, value), that is x with value appended, if the first element of x matches swapin, and leaves x unchanged otherwise.
Result:
> step2
[[1]]
[1] 1 3 6
[[2]]
[1] 5 8 9 2
[[3]]
[1] 10 4 7
You can also easily make it a convenience function for later use:
element_swap = function(list, value, swapin){
  step1 = lapply(list, function(x) x[x != value])
  step2 = lapply(step1, function(x){
    if(x[1]==swapin){
      return(c(x, value))
    } else {
      return(x)
    }
  })
  return(step2)
}
Result:
> element_swap(region, 1, 10)
[[1]]
[1] 3 2 6
[[2]]
[1] 5 8 9
[[3]]
[1] 10 4 7 1
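lapply() still loops internally, so if the size of the list is the concern, one alternative is to flatten it, drop the value with a single vectorized comparison and rebuild the list with split(). The sketch below is an assumption-laden variant rather than the answer's method: move_value() is a hypothetical name, it assumes no vector is empty, and it matches swapin against each vector's original first element (the answer above checks the first element after removal).

```r
# Hypothetical helper: remove `value` from every vector in `region` and append
# it to the vector(s) whose first element equals `swapin`.
# Assumes no vector in `region` is empty.
move_value <- function(region, value, swapin) {
  len  <- lengths(region)
  grp  <- rep(seq_along(region), len)   # index of the vector each element came from
  flat <- unlist(region)
  # First element of each original vector, located by cumulative lengths
  first <- flat[cumsum(c(1, len[-length(len)]))]
  # Drop `value` in one vectorized comparison, then rebuild the list;
  # the fixed factor levels keep vectors that end up empty
  keep <- flat != value
  out  <- split(flat[keep], factor(grp[keep], levels = seq_along(region)))
  target <- which(first == swapin)
  out[target] <- lapply(out[target], function(v) c(v, value))
  unname(out)
}

region <- list(c(1, 3, 2, 6), c(5, 8, 9), c(10, 4, 7))
move_value(region, 2, 5)
```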

Various results with distinct() in a custom function

I want to create a function in R that creates a numerical column based on a character/categorical column. To do this I need the distinct values of the categorical column. I can do this well outside a function, but would like to write a reusable function for it. The issue I've run into is that the same distinct() call that works outside the function doesn't behave the same way inside it. I've created a demo below:
# test of call to db to numericize
library(dplyr)
DF <- data.frame("a" = c("a","b","c","a","b","c"),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)
catnum <- function(db, inputcolname) {
  x <- distinct(db, inputcolname)
  print(x)
  return(x)
}
y <- distinct(DF, a)
y
catnum(DF, 'a')
While y gives the correct answer (a single column containing a, b, c), x inside the function is the entire data frame. I have tried both with and without the quotes, as in catnum(DF, a), but the results are the same.
Could someone tell me what is happening or suggest some code that would work?
One solution is to use the distinct_() function inside your function. distinct() expects a bare column name and does not work with a column name stored in a variable. (Note that distinct_() and the other underscore-suffixed verbs have since been deprecated in dplyr.)
For example, distinct(DF, "a") will not work; the correct syntax is distinct(DF, a), without quotes. Inside your function the column is passed through the variable inputcolname, and distinct() does not look up the column named by that variable's value, hence the unexpected result. distinct_(), by contrast, accepts column names held in a variable.
library(dplyr)
catnum <- function(db, inputcolname) {
  x <- distinct_(db, inputcolname)
  #print(x)
  return(x)
}
# With the modified function, the results are as expected.
catnum(DF,'a')
# a
# 1 a
# 2 b
# 3 c
I'm not sure what you are trying to do or where the distinct function comes from. Are you looking for this?
catnum <- function(DF, var){
  length(unique(DF[[var]]))
}
catnum(DF,'a')
Your inputs are not the same, and so you get different results. If you give distinct the same arguments you give catnum, you will get the same result:
isTRUE(all.equal(distinct(DF, a),
catnum(DF, "a")))
## [1] FALSE
isTRUE(all.equal(distinct(DF, "a"),
catnum(DF, "a")))
##[1] TRUE
Unfortunately, this does not work:
catnum(DF, a)
## a b c
## 1 a 0.1 a
## 2 b 1.1 b
## 3 c 2.1 c
## 4 a 3.1 d
## 5 b 4.1 e
## 6 c 5.1 f
The reason, as explained in vignette("programming"), is that you must jump through several annoying hoops if you want to write functions that use functions from dplyr. The solution (as you will learn in the vignette) is as follows:
catnum <- function(db, inputcolname) {
  inputcolname <- enquo(inputcolname)
  distinct(db, !!inputcolname)
}
catnum(DF, a)
## a
## 1 a
## 2 b
## 3 c
Or you could conclude that this is all too confusing and do something like
catnum <- function(db, inputcolname) {
  unique(db[, inputcolname, drop = FALSE])
}
catnum(DF, "a")
## a
## 1 a
## 2 b
## 3 c
instead.
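Since the original goal was a numeric column derived from a categorical one, the distinct step can also be folded into the encoding itself. A base R sketch, where add_numeric_code() and the "_num" suffix are hypothetical choices: codes follow the order of first appearance, unlike factor(), whose default levels are sorted.

```r
# Hypothetical helper: add an integer-coded copy of a categorical column,
# numbering categories in order of first appearance (base R, no dplyr)
add_numeric_code <- function(db, colname) {
  categories <- unique(db[[colname]])   # the distinct values
  db[[paste0(colname, "_num")]] <- match(db[[colname]], categories)
  db
}

DF <- data.frame(a = c("a", "b", "c", "a", "b", "c"),
                 b = paste(0:5, ".1", sep = ""),
                 c = letters[1:6],
                 stringsAsFactors = FALSE)
add_numeric_code(DF, "a")
```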

Reference vector from data frame using custom function

I'm trying to retrieve a vector "a" from a data frame "df" using a function. I know I could do this just fine with the following:
> df$a
[1] 1 2 3
But I'd like to use a function where both the data frame and vector names are input separately as arguments. This is the best that I've come up with:
show_vector <- function(data.set, column) {
  data.set$column
}
But here's how it goes when I try it out:
> show_vector(df, a)
NULL
How could I change this function in order to successfully reference vector df$a where the names of both are input to a function as arguments?
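To see why this version returns NULL: `$` does not evaluate what follows it, so data.set$column always looks for a column literally named "column", whatever argument is passed. A small demonstration, where df2 is a made-up data frame that happens to have such a column:

```r
# `$` uses the literal name after it, so this always looks up "column"
show_vector <- function(data.set, column) {
  data.set$column
}

df <- data.frame(a = 1:3, b = 4:6)
show_vector(df, a)    # returns NULL: df has no column called "column"

df2 <- data.frame(a = 1:3, column = 7:9)
show_vector(df2, a)   # returns 7:9, the "column" column, ignoring the argument
```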
It's actually possible to do this without passing the column name as a string (in other words, you can pass in the unquoted column name):
show_vector <- function(data.set, column) {
  eval(substitute(column), envir = data.set)
}
Usage example:
df <- data.frame(a = 1:3, b = 4:6)
show_vector(df, b)
# 4 5 6
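Because eval(substitute(column), envir = data.set) evaluates the captured expression with the data frame's columns in scope, it is not limited to a single name. The sketch below reuses the answer's function to show two consequences: an expression such as a + b works, and a quoted name is returned as the string itself rather than as the column.

```r
# The eval(substitute()) version from the answer above
show_vector <- function(data.set, column) {
  eval(substitute(column), envir = data.set)
}

df <- data.frame(a = 1:3, b = 4:6)
show_vector(df, b)       # a single column
show_vector(df, a + b)   # any expression, evaluated with df's columns in scope
show_vector(df, "b")     # a string constant evaluates to itself: "b"
```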
I've wondered about this kind of thing a lot in the past and haven't found an easy fix. The best I've come up with is this:
df <- data.frame(c(1, 2, 3), c(4, 5, 6))
colnames(df) <- c("A", "B")
test <- function(dataframe, columnName) {
  return(dataframe[, match(columnName, colnames(dataframe))])
}
test(df, "A")
Your code would work if you passed the column name in quotes, i.e. show_vector(df, "a"), and indexed with [ or [[ instead of $, since $ does not evaluate its argument.
Other ways to do this:
Using base R:
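func <- function(df, cname){
  # grep() treats cname as a regex, so it selects every column whose name
  # matches it (e.g. "a" also matches a column named "ab")
  return(df[, grep(cname, colnames(df))])
}
Or even
func <- function(df, cname){
  return(df[, cname])
}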
You can use substitute() to capture the argument exactly as it was typed, then as.character() to convert it to a character string:
show_vector <- function(data.set, column) {
  data.set[,as.character(substitute(column))]
}
Now let's take a look:
(dat=data.frame(a=1:3,b=4:6,c=10:12))
a b c
1 1 4 10
2 2 5 11
3 3 6 12
show_vector(dat,a)
[1] 1 2 3
show_vector(dat,"a")
[1] 1 2 3
It works.
We can also write a simpler version where we just pass a character string:
show_vector1 <- function(data.set, column) {
  data.set[,column]
}
show_vector1(dat,"a")
[1] 1 2 3
However, this will not work if the column name is not passed as a character string:
show_vector1(dat,a)
Error in `[.data.frame`(data.set, , column) : undefined columns selected
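Combining the two ideas, a function can accept either a bare or a quoted name and report a missing column clearly. This is a sketch with a hypothetical name, get_column(); it uses [[ ]] so the result is always a plain vector. One caveat: a variable holding a name (nm <- "a"; get_column(dat, nm)) is taken literally as "nm".

```r
# Hypothetical helper accepting either a bare or a quoted column name
get_column <- function(data.set, column) {
  col_expr <- substitute(column)
  # A quoted argument arrives as a character string, a bare one as a symbol
  name <- if (is.character(col_expr)) col_expr else deparse(col_expr)
  if (!name %in% names(data.set)) {
    stop("no column named '", name, "' in data.set")
  }
  data.set[[name]]   # [[ ]] returns the column itself as a vector
}

dat <- data.frame(a = 1:3, b = 4:6, c = 10:12)
get_column(dat, a)
get_column(dat, "b")
```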

Call an object using a function in R

I have R objects:
"debt_30_06_2010" "debt_30_06_2011" "debt_30_06_2012" ...
and need to refer to them by a name built with a function:
paste0("debt_",date) ## "date" being another object
The problem is that when I assign the result to another object, it gets only the name, not the content:
debt_a <- paste0("debt_", date1)
> debt_a
[1] "debt_30_06_2014"
I've tried to use the function "assign" without success:
assign("debt_a", paste0("debt_", date))
> debt_a
[1] "debt_30_06_2014"
I would like to know whether there is any way to achieve this.
We could use get() to retrieve the value of the object; if there are multiple objects, use mget(). For example, here I am assigning 'debt_a' the value of 'debt_30_06_2010':
assign('debt_a', get(paste0('debt_', date[1])))
debt_a
#[1] 1 2 3 4 5
mget() returns a list, so if we assign 'debt_a' the values of multiple objects:
assign('debt_a', mget(paste0('debt_', date)))
debt_a
#$debt_30_06_2010
#[1] 1 2 3 4 5
#$debt_30_06_2011
#[1] 6 7 8 9 10
Data used in the examples:
debt_30_06_2010 <- 1:5
debt_30_06_2011 <- 6:10
date <- c('30_06_2010', '30_06_2011')
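If this lookup is needed repeatedly, it could be wrapped in a small function. The sketch below is hypothetical (fetch_debt() and the store environment are illustrative): it defaults to looking in the caller's environment, and passes ifnotfound so that a missing date yields NULL instead of an error.

```r
# Hypothetical helper: look up debt_<date> objects by date string
fetch_debt <- function(dates, envir = parent.frame()) {
  nms <- paste0("debt_", dates)
  # mget() returns a named list; missing objects become NULL instead of an error
  mget(nms, envir = envir, ifnotfound = list(NULL))
}

store <- new.env()
assign("debt_30_06_2010", 1:5, envir = store)
assign("debt_30_06_2011", 6:10, envir = store)
res <- fetch_debt(c("30_06_2010", "30_06_2011", "30_06_2099"), envir = store)
```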
I'm not sure I understood your question correctly, but I suspect that your objects are functions and that you want to build their names as character strings in order to call them. If so, this example might help:
myfun <- function(x){sin(x)^2}
mychar <- paste0("my", "fun")
eval(call(mychar, x = pi / 4))
#[1] 0.5
#> identical(eval(call(mychar, x = pi / 4)), myfun(pi / 4))
#[1] TRUE
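Two other base R ways to call a function whose name is held in a string, as an alternative to building a call object: get() with mode = "function" returns the function itself, and do.call() accepts the name directly together with a list of arguments.

```r
myfun <- function(x) sin(x)^2
mychar <- paste0("my", "fun")

# get() with mode = "function" looks the name up and returns the function
f <- get(mychar, mode = "function")
r1 <- f(pi / 4)

# do.call() takes the function name as a string plus a list of arguments
r2 <- do.call(mychar, list(x = pi / 4))
```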
