I would like to write a function which takes a list of variables out of a dataframe, say:
df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))
and always apply the same calculation to each of them, say dividing by the standard deviation, like:
test.function <- function(var){
for (i in var) {
paste0(i, "_per_sd") <- i / sd(i)
}
}
This should create a new variable a_per_sd, which is a divided by its standard deviation. Unfortunately, I am stuck and get this error: Error in paste0(i, "_per_sd") <- i/sd(i) : could not find function "paste0<-".
The expected usage should be:
test.function(df$a, df$b)
The expected result should be:
> df$a_per_sd
[1] 0.6324555 1.2649111 1.8973666 2.5298221 3.1622777
And likewise for every other variable that was given.
I suspect I should use as.formula and/or eval, but I might be thinking about this the wrong way.
Thank you very much for your attention and help.
Is this what you are after?
df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))
test.function <- function(...){
x <- list(...)
xn <- paste0(unlist(eval(substitute(alist(...)))),
"_per_sd")
setNames(lapply(x, function(y) y/sd(y)), xn)
}
cbind(df, test.function(df$a, df$b))
#> a b df$a_per_sd df$b_per_sd
#> 1 1 6 0.6324555 3.794733
#> 2 2 7 1.2649111 4.427189
#> 3 3 8 1.8973666 5.059644
#> 4 4 9 2.5298221 5.692100
#> 5 5 10 3.1622777 6.324555
Created on 2020-07-23 by the reprex package (v0.3.0)
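As for the original error: R reads `f(x) <- value` as a call to a replacement function named `f<-`. One exists for `names`, but not for `paste0`. Here is a minimal sketch of that, plus one way to assign to a column whose name is built at run time using `[[<-`. The variable `nm` is only an illustrative name, not something from the question.

```r
x <- c(1, 2, 3)
names(x) <- c("a", "b", "c")   # shorthand for x <- `names<-`(x, c("a", "b", "c"))

df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))
nm <- "a"                      # column name held in a string (illustrative)

# Build the new column name as a string and assign with [[<-
df[[paste0(nm, "_per_sd")]] <- df[[nm]] / sd(df[[nm]])
names(df)
```

This keeps the column name as data (a string) rather than trying to construct a symbol on the left-hand side of `<-`.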
The question is not completely clear to me, but you can get the sd of rows, columns or vectors with these approaches:
apply(as.matrix(df), MARGIN = 1, FUN = sd) #across rows
#[1] 3.535534 3.535534 3.535534 3.535534 3.535534
apply(as.matrix(df), MARGIN = 2, FUN = sd) #across columns
# a b
#1.581139 1.581139
lapply(df, sd) #if you provide list of vectors (columns of `df` in this case)
#$a
#[1] 1.581139
#
#$b
#[1] 1.581139
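If the goal is to divide every column by its own sd, the per-column values above can be fed to `sweep()`. This is only a sketch of one possible approach; the names `sds`, `scaled` and `out` are made up for the example.

```r
df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))

sds <- vapply(df, sd, numeric(1))              # one sd per column
scaled <- sweep(as.matrix(df), 2, sds, "/")    # divide column j by sds[j]
colnames(scaled) <- paste0(colnames(scaled), "_per_sd")

out <- cbind(df, scaled)                       # original columns plus scaled ones
names(out)
```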
I got this far. Is this what you are looking for?
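test.function <- function(var)
{
# Build the new name from the expression that was passed in, e.g. "df$a_per_sd"
newvar <- paste0(deparse(substitute(var)), "_per_sd")
# Create a variable with that name inside the function, then return its value
assign(newvar, var / sd(var))
get(newvar)
}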
Input:
test.function(df$a)
Result:
[1] 0.6324555 1.2649111 1.8973666 2.5298221 3.1622777
I got the idea from here: Assignment using get() and paste()
In the end, this is what my code looks like:
test.function <- function(...){
x <- list(...)
xn <- paste0(unlist(eval(substitute(alist(...)))),
"_per_sd")
setNames(lapply(x, function(y) y/sd(y, na.rm = TRUE)), xn)
}
test.function.wrap <- function(..., dataframe) {
assign(deparse(substitute(dataframe)), cbind(dataframe, test.function(...)) , envir=.GlobalEnv)
}
test.function.wrap(df$a, df$b , dataframe = df)
To assign the new variables to the existing dataframe, I put the (absolutely genius) tips together and wrapped the function in another function that does the trick. I am aware it might not be very elegant, but it does the job!
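The wrapper above writes into .GlobalEnv. A sketch of an alternative, assuming it is acceptable to pass the column names as strings and to reassign the result yourself, could look like this. `add_per_sd` and its arguments are hypothetical names, not from the answers.

```r
# add_per_sd() is an illustrative helper: it returns a modified copy of `data`
add_per_sd <- function(data, cols) {
  for (nm in cols) {
    data[[paste0(nm, "_per_sd")]] <- data[[nm]] / sd(data[[nm]], na.rm = TRUE)
  }
  data
}

df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))
df <- add_per_sd(df, c("a", "b"))
names(df)
```

Returning the data frame instead of assigning globally keeps the function free of side effects, at the cost of a slightly different call than the one asked for in the question.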
Related
I have multiple objects and I need to apply some function to them, in my example mean. But the function call shouldn't include list, it must look like this: my_function(a, b, d).
Please advise how to do this. I probably need quote or substitute, but I'm not sure how to use them.
a <- c(1:15)
b <- c(1:17)
d <- c(1:19)
my_function <- function(objects) {
lapply(objects, mean)
}
my_function(list(a, b, d))
A possible solution:
a <- c(1:15)
b <- c(1:17)
d <- c(1:19)
my_function <- function(...) {
lapply(list(...), mean)
}
my_function(a, b, d)
#> [[1]]
#> [1] 8
#>
#> [[2]]
#> [1] 9
#>
#> [[3]]
#> [1] 10
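If the result should also be labelled with the names of the objects that were passed, one option is to capture the unevaluated arguments with `substitute()`. This is a sketch under that assumption; `exprs` and `nms` are just local helper names.

```r
a <- 1:15; b <- 1:17; d <- 1:19

my_function <- function(...) {
  # Unevaluated argument expressions, e.g. the symbols a, b and d
  exprs <- as.list(substitute(list(...)))[-1]
  nms <- vapply(exprs, function(e) paste(deparse(e), collapse = " "), character(1))
  setNames(lapply(list(...), mean), nms)
}

res <- my_function(a, b, d)
names(res)
```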
To still be able to benefit from the other arguments of mean such as na.rm= and trim=, i.e. to generalize, we may match the formalArgs with the dots and split the call accordingly.
my_function <- function(...) {
cl <- match.call()
m <- match(formalArgs(base:::mean.default), names(cl), 0L)
vapply(as.list(cl)[-c(1L, m)], function(x) {
eval(as.call(c(quote(base:::mean.default), list(x), as.list(cl[m]))))
}, numeric(1L))
}
## OP's example
my_function(a, b, d)
# [1] 8 9 10
## generalization:
set.seed(42)
my_function(rnorm(12), rnorm(5), c(NA, rnorm(3)))
# [1] 0.7553736 -0.2898547 NA
set.seed(42)
my_function(rnorm(12), rnorm(5), c(NA, rnorm(3)), na.rm=TRUE)
# [1] 0.7553736 -0.2898547 -1.2589363
set.seed(42)
my_function(rnorm(12), rnorm(5), c(NA, rnorm(3)), na.rm=TRUE, trim=.5)
# [1] 0.5185655 -0.2787888 -2.4404669
Data:
a <- 1:15; b <- 1:17; d <- 1:19
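When only na.rm and trim are needed, a simpler sketch is to declare them explicitly after the dots and forward them to mean. This is a different technique from the match.call approach above and is less general. Arguments placed after `...` have to be spelled out in full by the caller.

```r
my_function <- function(..., na.rm = FALSE, trim = 0) {
  # Every object in ... gets the same na.rm and trim settings
  vapply(list(...), mean, numeric(1), na.rm = na.rm, trim = trim)
}

a <- 1:15; b <- 1:17; d <- 1:19
r1 <- my_function(a, b, d)
r2 <- my_function(c(NA, 1, 3), c(2, 4), na.rm = TRUE)
```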
I want my function to be able to take a value or a column name. How can I do this with data.table?
library(data.table)
df <- data.table(a = c(1:5),
b = c(5:1),
c = c(1, 3, 5, 3, 1))
myfunc <- function(val) {
df[a >= val]
}
# This works:
myfunc(2)
# This does not work:
myfunc("c")
If I define my function as:
myfunc <- function(val) {
df[a >= get(val)]
}
# This doesn't work:
myfunc(2)
# This works:
myfunc("c")
What is the best way to resolve this?
Edit: To be clear, I want the results to be the same as:
# myfunc(2)
df %>%
filter(a >= 2)
# myfunc("c")
df %>%
filter(a >= c)
EDIT:
Thanks all for the responses, I think I like dww's answer the best.
I wish it was as easy as in dplyr, where I can do:
myfunc <- function(val) {
df %>%
filter(a >= {{val}})
}
# Both work:
myfunc(2)
myfunc(c)
If you build and parse the whole expression, you can then evaluate it in its entirety. For example:
myfunc <- function(val) {
df[eval(parse(text=paste("a >= ", val)))]
}
Relying on a function that lets you mix values and variable names in the same parameter can be dangerous, though, especially when you actually want to match on character values rather than variable names. If you pass in the whole expression instead, you could do:
myfunc <- function(expr) {
expr <- substitute(expr)
df[eval(expr)]
}
myfunc(a>=3)
myfunc(a>=c)
The question did not actually define the desired behavior, so we assume that df must be a data.table, that if a character string is passed the column of that name should be returned, and that if a number is passed the rows whose a column exceeds that number should be returned.
Define an S3 generic and methods for character and default.
myfunc <- function(x, data = df) UseMethod("myfunc")
myfunc.character <- function(x, data = df) data[[x]]
myfunc.default <- function(x, data = df) data[a > x]
myfunc(2)
## a b c
## 1: 3 3 5
## 2: 4 2 3
## 3: 5 1 1
myfunc("c")
## [1] 1 3 5 3 1
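The same idea of branching on the type of the argument can also return filtered rows in both cases, which is what the edit to the question asks for. A minimal sketch, assuming a character argument always names a column. `threshold` is a hypothetical local name and must not clash with a column of the table.

```r
library(data.table)

df <- data.table(a = 1:5, b = 5:1, c = c(1, 3, 5, 3, 1))

myfunc <- function(val, data = df) {
  # A string is taken as a column name, anything else as a literal value
  threshold <- if (is.character(val)) data[[val]] else val
  data[a >= threshold]
}

r1 <- myfunc(2)     # rows where a >= 2
r2 <- myfunc("c")   # rows where a >= c
```

As noted above, this makes it impossible to compare against a literal character value, so it is a trade-off rather than a general solution.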
I am trying to modify the output of tapply to obtain a vertical transposition of the results.
Something like this:
Levels of y Mean of x
A 1.7
B 3.5
C 5.0
instead of:
A B C
1.7 3.5 5.0
I have managed to produce a dataframe, by:
myfunction=function(x,y,FUN,...) {
array1<-tapply(x,y,FUN,...)
a<-data.frame(names(array1),array1)
rownames(a)<-NULL
print(a)
}
attach(InsectSprays)
myfunction(count,spray,mean)
This works and produces this:
names.array1. array1
1 A 14.500000
2 B 15.333333
3 C 2.083333
4 D 4.916667
5 E 3.500000
6 F 16.666667
Problem 1)
Now I would like to modify the function in order to change colnames of the dataframe using the arguments which are passed to myfunction at the call of the function itself (in this specific case "spray" and "Sum of count").
I have tried something like this
myfunction=function(x,y,FUN,...) {
array1<-tapply(x,y,FUN,...)
a<-data.frame(names(array1),array1)
rownames(a)<-NULL
colnames(a)<-c(y,print(FUN,"of",x)
print(a)
}
but I think R tries to use the entire vector y instead of its name.
I cannot figure out what the solution may be.
I have also tried args() and formals(), without luck.
Problem 2)
I would like to call myfunction in this way, passing data=... to tapply from the original call (to avoid attaching and detaching the dataset or passing variables in the form df$variable1).
I have tried:
myfunction=function(x,y,FUN,...) {
array1<-tapply(x,y,FUN,...)
a<-data.frame(names(array1),array1)
rownames(a)<-NULL
print(a)
}
myfunction(count,spray,sum,data=InsectSprays)
but tapply does not find the object "spray".
Obviously the solution to all my problems may have been using aggregate(), but I think the solutions to these questions will teach me a lot about writing functions.
Thank you very much for your help.
The method you are trying to employ is called non-standard evaluation, and it is used extensively in the tidyverse family of packages, as well as some of the functions in base R such as with, within and the $ operator.
You might wish to explore the concept here.
In the meantime, it is also possible to use a function in base R that employs non-standard evaluation using deparse and substitute:
myfunction <- function(x, y, data, FUN, ...)
{
x <- deparse(substitute(x))
y <- deparse(substitute(y))
array1 <- tapply(data[[x]], data[[y]], FUN, ...)
a <- setNames(data.frame(names(array1),array1),
c(y, paste(deparse(substitute(FUN)), "of", x)))
rownames(a) <- NULL
print(a)
}
myfunction(count, spray, data = InsectSprays, mean)
#> spray mean of count
#> 1 A 14.500000
#> 2 B 15.333333
#> 3 C 2.083333
#> 4 D 4.916667
#> 5 E 3.500000
#> 6 F 16.666667
myfunction(cyl, gear, mtcars, sum)
#> gear sum of cyl
#> 1 3 112
#> 2 4 56
#> 3 5 30
A more advanced version of this function would also allow you to pass vectors directly without a data argument:
myfunction <- function(x, y, data, FUN, ...)
{
if (missing(data)) data <- parent.frame()
y_name <- deparse(substitute(y))
x_name <- deparse(substitute(x)) # capture the name before x is overwritten below
col_name <- paste(deparse(substitute(FUN)), "of", x_name)
x <- eval(substitute(x), envir = as.environment(data))
y <- eval(substitute(y), envir = as.environment(data))
array1 <- tapply(x, y, FUN, ...)
a <- setNames(data.frame(names(array1), array1), c(y_name, col_name))
rownames(a) <- NULL
print(a)
}
This has the same functionality as the first example, but in addition you can run it using vectors in the calling environment:
var1 <- 1:10
var2 <- rep(1:2, 5)
myfunction(var1, var2, FUN = median)
#> var2 median of var1
#> 1 1 5
#> 2 2 6
Created on 2020-05-27 by the reprex package (v0.3.0)
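To see what deparse and substitute do on their own, here is a small sketch. `show_expr` and `late_capture` are illustrative helpers only. The argument is never evaluated in `show_expr`, so the object does not even need to exist.

```r
# Returns the code the caller typed for x, as a string
show_expr <- function(x) deparse(substitute(x))

s1 <- show_expr(count)   # "count"
s2 <- show_expr(df$a)    # "df$a"

# Once the argument has been reassigned inside the function, substitute()
# sees the new value instead of the caller's expression
late_capture <- function(x) {
  x <- 1
  deparse(substitute(x))
}
s3 <- late_capture(count)   # "1"
```

This is why the name has to be captured before the argument is overwritten with its evaluated value.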
Toy example:
> myfn = function(a,x){sum(a*x)}
> myfn(a=2, x=c(1,2,3))
[1] 12
Good so far. Now:
> df = data.frame(a=c(4,5))
> df$ans = myfn(a=df$a, x=c(1,2,3))
Warning message:
In a * x : longer object length is not a multiple of shorter object length
> df
a ans
1 4 26
2 5 26
What I want to happen is that for the first row, it is as if I called myfn(a=4, x=c(1,2,3)), giving an answer of 24, and for the second row, it is as if I called myfn(a=5, x=c(1,2,3)), giving an answer of 30. How do I do this? Thank you.
EDIT: slightly more complex version. Now suppose that the function is
myfn = function(a,b, x){sum((a+b)*x)}
and that I have the data frame
df = data.frame(a=c(4,5), b=c(6,7), c=c(9,9))
I want to create df$ans such that for the first row it is as if I called myfn(a=4, b=6, x=c(1,2,3)) and for the second row it is as if I called myfn(a=5, b=7, x=c(1,2,3)); that is, use df$a for a, df$b for b, and ignore df$c.
Something like this would work:
myfn = function(a,x){
return(sum(a*x))
}
df <- data.frame(a=c(4,5))
df$ans <- apply(df, 1, myfn, x = c(1,2,3))
df
a ans
1 4 24
2 5 30
** Edited Based On User Edit **
df = data.frame(a=c(4,5), b=c(6,7), c=c(9,9))
df$ans <- apply(df[, c("a", "b")], 1, function(y) sum((y['a']+y['b'])*c(1,2,3)))
a b c ans
1 4 6 9 60
2 5 7 9 72
There are several ways this can be done, each with its own charms. If you don't want to modify the function I would just do
mapply(myfn, df$a, df$b, MoreArgs = list(x = 1:3))
Alternatively, you can bake the iteration right into the function, e.g.,
myfn = function(a,b, x){
sapply(a+b, function(ab) {
sum(ab*x)
})
}
myfn(df$a, df$b, 1:3)
That's probably the way I would do it.
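A third option, sketched here as an alternative to the two above, is base R's `Vectorize()`, which wraps the mapply call for you. `vmyfn` is just an illustrative name. Only a and b are vectorized, and x is passed whole to every call.

```r
myfn <- function(a, b, x) sum((a + b) * x)
df <- data.frame(a = c(4, 5), b = c(6, 7), c = c(9, 9))

# Iterate over a and b in parallel, keep x fixed
vmyfn <- Vectorize(myfn, vectorize.args = c("a", "b"))
df$ans <- vmyfn(df$a, df$b, x = 1:3)
df$ans
```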
I want to create a string in a loop and use this string as object in this loop. Here is a simplified example:
for (i in 1:2) {
x <- paste("varname",i, sep="")
x <- value
}
The loop should create varname1 and varname2. Then I want to use varname1 and varname2 as objects to assign values to. I tried paste(), print(), etc.
Thanks for help!
You could create the call() to <- and then evaluate it. Here's an example,
value <- 1:5
for (i in 1:2) {
x <- paste("varname",i, sep="")
eval(call("<-", as.name(x), value))
}
which creates the two objects varname1 and varname2
varname1
# [1] 1 2 3 4 5
varname2
# [1] 1 2 3 4 5
But you should really try to avoid assigning to the global environment from within a method/function. We could use a list along with substitute(), and then we have the new variables together in the same place.
f <- function(aa, bb) {
eval(substitute(a <- b, list(a = as.name(aa), b = bb)))
}
Map(f, paste0("varname", 1:2), list(1:3, 3:6))
# $varname1
# [1] 1 2 3
#
# $varname2
# [1] 3 4 5 6
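Along the same lines, a plain named list is often enough, and the loop from the question barely changes. This is only a sketch. `vals` is a made-up name, and multiplying by i is there just so the two entries differ.

```r
value <- 1:5
vals <- list()
for (i in 1:2) {
  # Use the generated string as the list element name
  vals[[paste0("varname", i)]] <- value * i
}
vals$varname2

# If separate variables are really needed, list2env() can create them
# from the list in one step (here in the current environment)
invisible(list2env(vals, envir = environment()))
```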
assign("variableName", 5)
would do that.
For example, if you have the variable names in a character vector, you can set them in a loop like this:
assign(varname[1], 2 + 2)
More information:
https://stat.ethz.ch/R-manual/R-patched/library/base/html/assign.html
@MahmutAliÖZKURAN has answered your question about how to do this using a loop. A more "R-ish" way to accomplish this might be:
mapply(assign, <vector of variable names>, <vector of values>,
MoreArgs = list(envir = .GlobalEnv))
Or, as in the case you specified above:
mapply(assign, paste0("varname", 1:2), <vector of values>,
MoreArgs = list(envir = .GlobalEnv))
I had the same issue, and for some reason my apply calls weren't working (lapply, assign directly, or my preferred go-to, mclapply).
But this worked:
library(parallel) # mclapply
library(xts) # xts
vectorXTS <- mclapply(symbolstring,function(x)
{
df <- symbol_data_set[symbol_data_set$Symbol==x,]
return(xts(as.data.frame(df[,-1:-2]),order.by=as.POSIXct(df$Date)))
})
names(symbolstring) <- symbolstring
names(vectorXTS) <- symbolstring
# [[i]] extracts the xts object itself; [i] would assign a one-element list
for(i in symbolstring) assign(symbolstring[i],vectorXTS[[i]])