How to make a function(x) accept x as a column name? - r

I have the following setup:
mydata:
today_date
r1 11.11.21
r2 11.11.21
r3 11.11.21
I want to convert a column like 'today_date' to a date using
as.Date(today_date,tryFormats = c("%d.%m.%Y")).
So I'm using the following function, which is supposed to change the corresponding column to proper dates:
myfun <- function(x){
x<- as.Date(x, tryFormats = c("%d.%m.%Y"))
}
In this function x is representing a variable corresponding to: mydata$today_date
Sadly, x does not properly represent the object that is to be replaced, so instead of:
myfun(mydata$today_date)
I still have to use:
mydata$today_date<- myfun(mydata$today_date)
How can I manipulate the function so the as.Date()-functionality is directly applied? I'm pretty certain that the variable in myfun(x) is not properly able to represent the subsection of my dataframe that I want to change. Any help is very welcome!

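Try this. The function takes the data frame and the column name (as a string) and returns the modified data frame. Note that a two-digit year needs %y; with %Y, "21" is parsed as the year 0021.
df <- data.frame(today_date = c("11.11.21","11.11.21","11.11.21"))
myfun <- function(df, var = 'today_date'){
  df[[var]] <- as.Date(df[[var]], tryFormats = c("%d.%m.%y"))
  return(df)
}
The output is
> myfun(df, "today_date")
  today_date
1 2021-11-11
2 2021-11-11
3 2021-11-11
To keep the change, assign the result back: df <- myfun(df, "today_date").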

I like the magrittr assignment pipe syntax for this.
library(magrittr)
mydata$today_date %<>% myfun()
This is shorthand for mydata$today_date <- myfun(mydata$today_date).
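The reason a bare myfun(mydata$today_date) has no visible effect is that R passes arguments by value (copy-on-modify): assigning to x inside the function changes only a local copy. A minimal sketch, using a hypothetical helper convert_date_col and the %y format for two-digit years, shows that the caller's data frame changes only when the result is assigned back:

```r
mydata <- data.frame(today_date = c("11.11.21", "11.11.21", "11.11.21"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: returns a modified copy of `data`
convert_date_col <- function(data, col) {
  data[[col]] <- as.Date(data[[col]], format = "%d.%m.%y")
  data
}

unused <- convert_date_col(mydata, "today_date")  # mydata itself is untouched
class(mydata$today_date)                          # still "character"

mydata <- convert_date_col(mydata, "today_date")  # assign the result back
class(mydata$today_date)                          # now "Date"
```

Passing the column name as a string and indexing with [[ ]] keeps the function reusable for any column.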

Related

mutate not adding the correct column name when performing calculation

When I try to add a column within a function using inputs from the function, a column is added with the wrong name. Here is a sample of the data:
AllGlut1 <- data.frame(Date = c("11/1/2021", "11/2/2021", "11/3/2021"), Row = c(3, 6, 8), d.15N.14N = c(-4.593, -4.427, -4.436))
known <- "d15N_known"
RefMaterials <- data.frame(d15N_known = c(6.485, 2.632, 9.235), d13C_known = c(-21.523, -23.344, -24.892))
colm <- "d.15N.14N"
driftcorr <- function(colm, known, df){
AllGlut1 <- AllGlut1 %>% mutate(res_drift = RefMaterials[1,known] - AllGlut1[colm])
return(AllGlut1)
}
results <- driftcorr(colm, known, AllGlut1)
When I just do:
res_drift <- RefMaterials[1,known] - AllGlut1[colm]
in the console, it works perfectly fine.
Anybody know what is happening here?
Use [, colm] instead of [colm] to reference the column of AllGlut1:
driftcorr <- function(colm, known, df){
AllGlut1 <- AllGlut1 %>%
mutate(res_drift = RefMaterials[1,known] - AllGlut1[, colm])
return(AllGlut1)
}
or, as @Martin Gal says, use RefMaterials[1,known] - !!sym(colm) (I checked, it does work ...)
AllGlut1[colm] returns a one-column data frame
AllGlut1[, colm] returns a vector if AllGlut1 is a data frame, or a one-column tibble if AllGlut1 is a tibble
AllGlut1[[colm]] always returns a vector (as does pull(AllGlut1, colm) or AllGlut1[,colm, drop=TRUE])
It looks like you're using a mixture of base-R and tidyverse approaches, which can potentially get confusing ...
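The difference between the three indexing forms can be checked in base R alone. The sketch below uses made-up data (the names df, a and b are illustrative only) and shows that subtracting a one-column data frame yields another data frame rather than a vector, which is the likely source of the oddly named column:

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
colm <- "a"

one_col_df <- df[colm]      # one-column data frame
vec_comma  <- df[, colm]    # numeric vector (for a plain data frame)
vec_double <- df[[colm]]    # numeric vector

is.data.frame(one_col_df)   # TRUE
is.numeric(vec_double)      # TRUE

# Arithmetic on a one-column data frame returns a data frame,
# arithmetic on a vector returns a vector
diff_df  <- 5 - df[colm]
diff_vec <- 5 - df[[colm]]
class(diff_df)              # "data.frame"
class(diff_vec)             # "numeric"
```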

Calling a character string into object names within a function

I've currently got a very lengthy and repetitive bit of code for data normalisation and inversion, (((x-min)/(max-min))*-1)+1, that I want to clean up a bit.
This is a small sample of what it currently looks like:
W3_E1_Norm_New <- W3_E1_Average%>%
mutate(W3_E1_Norm_New = ((W3_E1_zoo-W3_E1_Min)/(W3_E1_Max-W3_E1_Min)*-1)+1)
W3_E2_Norm_New <- W3_E2_Average%>%
mutate(W3_E2_Norm_New = ((W3_E2_zoo-W3_E2_Min)/(W3_E2_Max-W3_E2_Min)*-1)+1)
W3_E3_Norm_New <- W3_E3_Average%>%
mutate(W3_E3_Norm_New = ((W3_E3_zoo-W3_E3_Min)/(W3_E3_Max-W3_E3_Min)*-1)+1)
Each 'W3_E1' refers to a sample ID, and at present each sample ID requires the two lines of code to be written out each time.
Ideally I'd like to write a function which can call a character string (Sample_IDs) into the names of each data frame, so something like
a_Norm_New
would return
W3_E1_Norm_New
then
W3_E2_Norm_New
etc.
Is there a way to write a function that could accomplish this?
Many thanks
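I don't have your data, but this should work. Define a function:
library(dplyr)
my_fun <- function(x) {
  norm_new <- paste0(x, "_Norm_New")
  average <- paste0(x, "_Average")
  # turn the constructed names into symbols so they can be unquoted with !!
  zoo <- sym(paste0(x, "_zoo"))
  min <- sym(paste0(x, "_Min"))
  max <- sym(paste0(x, "_Max"))
  df <- get(average) %>%
    mutate(!!norm_new := ((!!zoo - !!min) / (!!max - !!min) * -1) + 1)
  # assign() takes the name first, then the value; write to the global environment
  assign(norm_new, df, envir = globalenv())
}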
Then run a for loop:
Sample_IDs <- c("W3_E1", "W3_E2", "W3_E3")
for (i in Sample_IDs) {
my_fun(i)
}
With data.table, it is very easy to write functions that use quoted variable names (see a blog post I wrote on the subject).
Here, we paste the pattern of your column name everywhere with the sufx variable:
library(data.table)
normalize <- function(dt, sufx = "W3_E1"){
  df <- as.data.table(dt)
  df[, (paste0(sufx, "_Norm_New")) := ((
    (get(paste0(sufx, "_zoo")) - get(paste0(sufx, "_Min"))) /
    (get(paste0(sufx, "_Max")) - get(paste0(sufx, "_Min")))
  ) * -1) + 1]
}
Here the code is not easy to read because I wanted to show that this can be done in one line, but it is easy to make it more readable.
In this solution, you use get to unquote your variable name.
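An alternative that avoids get() and assign() altogether is to keep the data frames in a named list and build the column names from each sample ID. This is only a sketch with made-up data: it assumes the _zoo, _Min and _Max values are columns of each data frame, and the names samples and add_inverted_norm are hypothetical.

```r
# Hypothetical sample data: one data frame per sample ID
samples <- list(
  W3_E1 = data.frame(W3_E1_zoo = c(2, 4, 6), W3_E1_Min = 2, W3_E1_Max = 6),
  W3_E2 = data.frame(W3_E2_zoo = c(0, 5, 10), W3_E2_Min = 0, W3_E2_Max = 10)
)

# Hypothetical helper: adds <id>_Norm_New = ((x - min) / (max - min) * -1) + 1
add_inverted_norm <- function(df, id) {
  x  <- df[[paste0(id, "_zoo")]]
  lo <- df[[paste0(id, "_Min")]]
  hi <- df[[paste0(id, "_Max")]]
  df[[paste0(id, "_Norm_New")]] <- ((x - lo) / (hi - lo) * -1) + 1
  df
}

# Apply the helper to every data frame, pairing each with its own ID
normalised <- Map(add_inverted_norm, samples, names(samples))
normalised$W3_E1$W3_E1_Norm_New   # 1.0 0.5 0.0
```

Keeping the results in one list means new sample IDs need no extra code, only an extra list element.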

dplyr and overlapping variable names with surrounding environment

Let's say I have a (dplyr/tibble) data-frame/tbl constructed like so:
df <- data_frame(x = 1:10)
Now, I'd like to use this within a function that works with df via some dplyr verbs, like so:
myfun <- function(df, x) {
x <- doSomeStuffTo(x)
filter(df, x == x)
}
But this will always return the full df... I'm trying to figure out a way to implement scoping within a dplyr verb, something like:
filter_(df, ~x == x)
... which doesn't work, either. In some other languages, you might be able to achieve this via something like:
df.filter(this.x == x)
... where this refers to the df instance.
My only work-around so far is naming the function's variable like so:
myfun <- function(df, query_x) {
query_x <- doSomeStuffTo(query_x)
filter(df, x == query_x)
}
I suspect this is doable (without using a name like query_x) somehow with SE dplyr verbs (e.g. filter_), but I haven't stumbled upon the correct pattern yet. Anyone here have the answer?
To dynamically build different dplyr commands you typically use the standard evaluation versions of the functions (the ones with the underscores) and the lazyeval package. Here's how you could change your function
doSomeStuffTo <- function(x) {x+1}
myfun <- function(df, x) {
x <- doSomeStuffTo(x)
filter_(df, lazyeval::interp(~x == y, y=x))
}
df <- data_frame(x = 1:10)
myfun(df,3)
but even in the interp we can't have x==x because it's not clear which x you want to replace. Both filter(df, 3==x) and filter(df, x==3) work with dplyr. You can have constants or column names on either side of the equality.
If you use filter_ you can pass logical expressions via quote:
myfun <- function(df, t) {
df$x <- 5*df$x
filter_(df, t )
}
> myfun(df, t= quote(x < 25) )
# A tibble: 4 x 1
x
<dbl>
1 5
2 10
3 15
4 20
I stumbled into the same issue. Instead of wrangling with even more complex evaluations, it's usually easier to just rename the function argument. Like this:
myfun <- function(df, x) {
x_ <- doSomeStuffTo(x)
filter(df, x == x_)
}
This solution is still dangerous because we might hit another variable called x_. One can be defensive about this by checking the variable names in df and making sure to pick one that isn't there. Or more lazily, one can use very implausible variable names. I often use stuff like _____temp.
Maybe the new dplyr 0.6.0 evaluation system will handle this better. See the notes about the new system, tidyeval.
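For readers on a current dplyr: the underscored verbs such as filter_ have since been deprecated, and the tidy evaluation system provides the .data and .env pronouns for exactly this name clash. A minimal sketch, assuming a recent dplyr is installed and reusing the doSomeStuffTo placeholder from the answer above:

```r
library(dplyr)

doSomeStuffTo <- function(x) x + 1   # placeholder, as in the answer above

myfun <- function(df, x) {
  x <- doSomeStuffTo(x)
  # .data$x is the column; .env$x is the function's local variable
  filter(df, .data$x == .env$x)
}

df <- tibble(x = 1:10)
res <- myfun(df, 3)
res$x   # 4
```

With the pronouns there is no need to rename the argument or invent an implausible temporary name.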

character string as function argument r

I'm working with dplyr and created code to compute new data that is plotted with ggplot.
I want to create a function with this code. It should take a name of a column of the data frame that is manipulated by dplyr. However, trying to work with columnnames does not work. Please consider the minimal example below:
df <- data.frame(A = seq(-5, 5, 1), B = seq(0,10,1))
library(dplyr)
foo <- function (x) {
df %>%
filter(x < 1)
}
foo(B)
Error in filter_impl(.data, dots(...), environment()) :
object 'B' not found
Is there any solution to use the name of a column as a function argument?
If you want to create a function which accepts the string "B" as an argument (as in your question's title):
foo_string <- function (x) {
eval(substitute(df %>% filter(xx < 1),list(xx=as.name(x))))
}
foo_string("B")
If you want to create a function which captures the unquoted name B as an argument (as in dplyr):
foo_nse <- function (x) {
# capture the argument without evaluating it
x <- substitute(x)
eval(substitute(df %>% filter(xx < 1),list(xx=x)))
}
foo_nse(B)
You can find more information in Advanced R
Edit
dplyr makes things easier in version 0.3. Functions with suffixes "_" accept a string or an expression as an argument
foo_string <- function (x) {
# construct the string
string <- paste(x,"< 1")
# use filter_ instead of filter
df %>% filter_(string)
}
foo_string("B")
foo_nse <- function (x) {
# capture the argument without evaluating it
x <- substitute(x)
# construct the expression
expression <- lazyeval::interp(quote(xx < 1), xx = x)
# use filter_ instead of filter
df %>% filter_(expression)
}
foo_nse(B)
You can find more information in this vignette
I remember a similar question which was answered by @Richard Scriven. I think you need to write something like this:
foo <- function(x,...)filter(x,...)
What @Richard Scriven mentioned was that you need to use ... here. If you look at the dplyr documentation, you will find this usage: filter(.data, ...). You can replace .data with x or whatever name you like. If you want to pick up the rows of your df which have values smaller than 1 in B, it looks like this:
foo <- function (x,...) filter(x,...)
foo(df, B < 1)
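In current dplyr versions the same two variants (string argument and bare column name) can be written without substitute() or filter_. The sketch below assumes a dplyr version with tidy evaluation support for .data[[ ]] and the {{ }} operator; the function names foo_string and foo_bare are illustrative only:

```r
library(dplyr)

df <- data.frame(A = seq(-5, 5, 1), B = seq(0, 10, 1))

# Column name passed as a string: index the .data pronoun
foo_string <- function(data, col) {
  filter(data, .data[[col]] < 1)
}

# Column name passed unquoted: embrace the argument with {{ }}
foo_bare <- function(data, col) {
  filter(data, {{ col }} < 1)
}

res_string <- foo_string(df, "B")   # rows where B < 1
res_bare   <- foo_bare(df, A)       # rows where A < 1
nrow(res_string)   # 1
nrow(res_bare)     # 6
```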

Create a list of functions from a vector of characters

Thanks in advance, and sorry if this question has been answered previously - I have looked pretty extensively. I have a dataset containing a row with concatenated information, specifically: name,color code,some function expression. For example, one value may be:
cost#FF0033#log(x)+6.
I have all of the code to extract the information, and I end up with a vector of expressions that I would like to convert to a list of actual functions.
For example:
func.list <- list()
test.func <- c("x","x+1","x+2","x+3","x+4")
where test.func is the vector of expressions. What I would like is:
func.list[[3]]
To be equivalent to
function(x){x+2}
I know that I can create a function using:
somefunc <- function(x){eval(parse(text="x+1"))}
to convert a character value into a function. The problem comes when I try and loop through to make multiple functions. For an example of something I tried that didn't work:
for(i in 1:length(test.func)){
temp <- test.func[i]
f <- assign(function(x){eval(expr=parse(text=temp))})
func.list[[i]] <- f
}
Based on another post (http://stats.stackexchange.com/questions/3836/how-to-create-a-vector-of-functions) I also tried this:
makefunc <- function(y){y;function(x){y}}
for(i in 1:length(test.func)){
func.list[[i]] <- assign(x=paste("f",i,sep=""),value=makefunc(eval(parse(text=test.func[i]))))
}
Which gives the following error: Error in eval(expr, envir, enclos) : object 'x' not found
The eventual goal is to take the list of functions and apply the jth function to the jth column of the data.frame, so that the user of the script can specify how to normalize each column within the concatenated information given by the column header.
Maybe initialize your list with a single generic function, and then update them using:
> foo <- function(x){x+3}
> body(foo) <- quote(x+4)
> foo
function (x)
x + 4
More specifically, starting from a character, you'd probably do something like:
body(foo) <- parse(text = "x+5")
Just to add onto joran's answer, this is what finally worked:
test.data <- matrix(data=rep(1,25),5,5)
test.data <- data.frame(test.data)
test.func <- c("x","x+1","x+2","x+3","x+4")
func.list <- list()
for(i in 1:length(test.func)){
func.list[[i]] <- function(x){}
body(func.list[[i]]) <- parse(text=test.func[i])
}
processed <- mapply(do.call,func.list,lapply(test.data,list))
Thanks again, joran.
This is what I do:
f <- list(identity="x",plus1 = "x+1", square= "x^2")
funCreator <- function(snippet){
txt <- snippet
function(x){
exprs <- parse(text = txt)
eval(exprs)
}
}
listOfFunctions <- lapply(setNames(f,names(f)),function(x){funCreator(x)}) # I like to have some control of the names of the functions
listOfFunctions[[1]] # try to see what the actual function looks like?
library(pryr)
unenclose(listOfFunctions[[3]]) # good way to see the actual function http://adv-r.had.co.nz/Functional-programming.html
# Call your functions
listOfFunctions[[2]](3) # 3+1 = 4
do.call(listOfFunctions[[3]],list(3)) # 3^2 = 9
attach(listOfFunctions) # you can also attach your list of functions and call them by name
square(3) # 3^2 = 9
identity(7) # 7 ## masked object identity, better detach it now!
detach(listOfFunctions)
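One reason the first loop in the question cannot work as intended is that every function created in it looks up the same variable temp when it is called, not when it is created, so all of them would see whatever temp holds at that time. A function factory gives each function its own environment. The following is a base-R sketch under that idea; make_fun is a hypothetical name, and the data mirror the test.func and test.data examples above:

```r
test.func <- c("x", "x+1", "x+2", "x+3", "x+4")
test.data <- data.frame(matrix(1, nrow = 5, ncol = 5))

# Hypothetical factory: each call gets its own environment holding `expr`
make_fun <- function(txt) {
  expr <- parse(text = txt)[[1]]   # parse once, when the function is created
  function(x) eval(expr)           # `x` is found in this function's own frame
}

func.list <- lapply(test.func, make_fun)
func.list[[3]](1)   # 1 + 2 = 3

# Apply the j-th function to the j-th column of the data frame
processed <- mapply(function(f, col) f(col), func.list, test.data)
processed[1, ]      # 1 2 3 4 5
```

As with any use of parse() and eval(), this runs arbitrary code from the input strings, so it is only appropriate when those strings come from a trusted source.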
