I'm trying to specify at the start of my script which function to run and which arguments to pass to it, so that I can call the function later. I'm having trouble specifying the arguments to be passed to the function.
I have a fixed object:
v <- c(1,2,3,5,6,7,8,9,NA)
I want to specify which measurement function I will use as well as any relevant arguments.
Example 1:
chosenFunction <- mean
chosenArguments <- "trim = 0.1, na.rm = T"
Example 2:
chosenFunction <- median
chosenArguments <- "na.rm = F"
Then I want to be able to run the specified function:
chosenFunction(v, chosenArguments)
Unfortunately, I can't just put in the string chosenArguments and expect the function to run. Is there any alternative way to specify the arguments to my function?
Updated answer based on OP's clarifications
chosenFunction <- mean
get_summary <- function(x, fun, ...) fun(x, ...)
v <- 1:100
get_summary(v, chosenFunction, na.rm = TRUE)
# [1] 50.5
Later on, if you want to change the function:
chosenFunction <- median
get_summary(v, chosenFunction, na.rm = TRUE)
# [1] 50.5
Original answer
get_summary <- function(x, chosenFunction, ...) chosenFunction(x, ...)
v <- 1:100
get_summary(v, mean, na.rm = TRUE, trim = 0.1)
# [1] 50.5
get_summary(v, median, na.rm = TRUE)
# [1] 50.5
Thanks to ..., you don't have to specify every argument:
get_summary(v, mean)
# [1] 50.5
If we want to calculate the mean, we do it with
mean(v, na.rm = TRUE, trim = 0.1)
#[1] 5.125
Another way is to use do.call:
do.call(mean, list(v, na.rm = TRUE, trim = 0.1))
#[1] 5.125
We can leverage this by making chosenArguments a named list and passing it to do.call:
chosenFunction <- mean
chosenArguments <- list(na.rm = TRUE, trim = 0.1)
do.call(chosenFunction, c(list(v), chosenArguments))
#[1] 5.125
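To tie this back to the two examples in the question, here is a minimal sketch, assuming it is acceptable to keep each function together with its arguments in one list defined at the top of the script. The names config1, config2 and run_config are hypothetical and not part of the original answers.

```r
v <- c(1, 2, 3, 5, 6, 7, 8, 9, NA)

# Hypothetical configurations, defined once at the top of the script:
# the function to run plus a named list of its extra arguments.
config1 <- list(fun = mean,   args = list(trim = 0.1, na.rm = TRUE))
config2 <- list(fun = median, args = list(na.rm = FALSE))

# Hypothetical helper: prepend the data to the stored arguments and
# call the stored function with do.call().
run_config <- function(x, config) {
  do.call(config$fun, c(list(x), config$args))
}

run_config(v, config1)
#[1] 5.125
run_config(v, config2)
#[1] NA
```

Switching analyses then only means changing which configuration is passed in; the second call returns NA because na.rm = FALSE keeps the missing value.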
Related
How can I return the mean, median, and standard deviation from the same function in R? All I can get it to return is the last part of the function, which calculates the standard deviation. I thought that by assigning summarystat(Tail_wags) to b, returning b would give me all three values. Below b, I added the three values I need, computed outside the function, to show what they are supposed to be.
Dog_biscuits <- c(0,1,2,3,4,5,6,7,8,9,10)
Tail_wags <- c(0,0,1,3,8,13,14,12,15,16,14)
dog_wags<-cbind(Dog_biscuits,Tail_wags)
dog_wags
summarystat<- function(x) {
z1 <- mean(x)
z2<-median(x)
z3<-sd(x)
}
b<-summarystat(Tail_wags)
> b
[1] 6.497552
> mean(Tail_wags)
[1] 8.727273
> median(Tail_wags)
[1] 12
> sd(Tail_wags)
[1] 6.497552
You can only return one object from a function. The trick to achieve what you want is to return a list:
summarystat<- function(x) {
z1 <- mean(x)
z2 <- median(x)
z3 <- sd(x)
return(list(mean=z1, median=z2, sd=z3))
}
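A short usage sketch with the Tail_wags data from the question: the returned list is indexed by name, and unlist() flattens it into a named numeric vector if that is more convenient.

```r
Tail_wags <- c(0, 0, 1, 3, 8, 13, 14, 12, 15, 16, 14)

summarystat <- function(x) {
  z1 <- mean(x)
  z2 <- median(x)
  z3 <- sd(x)
  return(list(mean = z1, median = z2, sd = z3))
}

b <- summarystat(Tail_wags)
b$mean
#[1] 8.727273
b$median
#[1] 12
unlist(b)  # named numeric vector with elements mean, median and sd
```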
You can combine and return the variables using the generic c() function.
summarystat<- function(x) {
z1 <- mean(x, na.rm = TRUE)
z2<-median(x, na.rm = TRUE)
z3<-sd(x,na.rm = TRUE)
return(c(mean=z1,median=z2,standard_dev=z3))
}
Tail_wags <- c(0,0,1,3,8,13,14,12,15,16,14)
summarystat(Tail_wags)
# mean median standard_dev
# 8.727273 12.000000 6.497552
You are looking for something like:
summarystat <- function(x) {
my_list <- list("mean" = mean(x), "median" = median(x), "sd" = sd(x))
return(my_list)
}
Usage:
vals <- summarystat(Tail_wags)
vals$mean
vals$sd
vals$median
Function and application:
do.call("rbind", lapply(dog_wags, function(x) {
  list(mean_val = mean(x),
       median_val = median(x),
       sd_val = sd(x))
}))
Data:
Dog_biscuits <- c(0,1,2,3,4,5,6,7,8,9,10)
Tail_wags <- c(0,0,1,3,8,13,14,12,15,16,14)
dog_wags <- data.frame(cbind(Dog_biscuits,Tail_wags))
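If a plain numeric matrix is preferred over the matrix of lists that rbind() produces above, sapply() with a function that returns a named vector is one alternative. This is a sketch that rebuilds the same data frame as the Data block.

```r
Dog_biscuits <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Tail_wags <- c(0, 0, 1, 3, 8, 13, 14, 12, 15, 16, 14)
dog_wags <- data.frame(Dog_biscuits, Tail_wags)

# One column of results per data-frame column, one row per statistic
stats <- sapply(dog_wags, function(x) c(mean = mean(x), median = median(x), sd = sd(x)))
stats
t(stats)  # one row per variable, like the rbind() version
```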
Alternatively, you can get rid of the function completely, use something like pastecs::stat.desc, and then extract the values you want:
Dog_biscuits <- c(0,1,2,3,4,5,6,7,8,9,10)
Tail_wags <- c(0,0,1,3,8,13,14,12,15,16,14)
dog_wags<-cbind(Dog_biscuits,Tail_wags)
pastecs::stat.desc(Tail_wags)[["mean"]]
# 8.727273
Check out this article for more summary functions.
A somewhat different approach that lets one choose the functions to return.
Code:
fooapply <- function(x, functions = c("mean", "median", "sd"), na.rm = TRUE){
  vec <- c()
  for(i in seq_along(functions)){
    # build a string such as "vec[1] <- mean(x, na.rm = TRUE)" and evaluate it
    if(na.rm){
      eval(parse(text = paste0("vec[", i, "] <- ", functions[i], "(x, na.rm = TRUE)")))
    } else {
      eval(parse(text = paste0("vec[", i, "] <- ", functions[i], "(x)")))
    }
  }
  # label each result with the name of the function that produced it
  names(vec) <- functions
  return(vec)
}
Result
To obtain your desired result, you can just pass your vector to the function. By default, the function removes NAs and calculates the mean, median and sd.
fooapply(Tail_wags)
mean median sd
8.727273 12.000000 6.497552
Additionally, one can also add or remove functions or swap them out:
fooapply(Tail_wags, c("mean", "median", "IQR"))
Note that some of these functions throw an error when the vector contains NAs and na.rm = TRUE is not specified, while others simply return NA.
The mean() function, for example, returns NA for a vector that contains NA. In contrast, IQR() throws an error when the vector contains NAs, so fooapply() needs na.rm = TRUE (the default) for it to work.
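As a variant of the same idea, the functions can also be looked up by name with match.fun() instead of building code strings for eval(parse()). This is a sketch of that swapped-in technique, not the answer's own code; fooapply2 is a hypothetical name.

```r
fooapply2 <- function(x, functions = c("mean", "median", "sd"), na.rm = TRUE) {
  # vapply() uses the function names as element names and checks that
  # every function returns a single number
  vapply(functions, function(f) {
    fun <- match.fun(f)  # look up the function object by its name
    if (na.rm) fun(x, na.rm = TRUE) else fun(x)
  }, numeric(1))
}

Tail_wags <- c(0, 0, 1, 3, 8, 13, 14, 12, 15, 16, 14)
fooapply2(Tail_wags)
fooapply2(c(Tail_wags, NA), c("mean", "median", "IQR"))
```

Passing function names that do not accept na.rm would still fail unless na.rm = FALSE is given, exactly as with the original fooapply().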
I am writing my own function to calculate the mean of a column in a data set and then applying it with apply(), but it only returns the first column's mean. Below is my code:
mymean <- function(cleaned_us){
column_total = sum(cleaned_us)
column_length = length(cleaned_us)
return (column_total/column_length)
}
Average_2 <- apply(numeric_clean_usnews,2,mymean,na.rm=T)
We need to use na.rm = TRUE inside sum() and count only the non-missing values for the length. Passing na.rm through apply() is not going to work, because mymean() doesn't have that argument:
mymean <- function(cleaned_us){
column_total = sum(cleaned_us, na.rm = TRUE) #change
column_length = sum(!is.na(cleaned_us)) #change
return(column_total/column_length)
}
Note that colMeans(..., na.rm = TRUE) can be used to get the mean of each column directly.
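A quick check on a small made-up matrix (the values are invented for illustration): the corrected mymean() applied column by column agrees with colMeans() with na.rm = TRUE.

```r
# Small made-up matrix with missing values (filled column by column)
m <- matrix(c(1, 2, NA, 4,
              5, NA, 7, 8), ncol = 2)

mymean <- function(cleaned_us){
  column_total = sum(cleaned_us, na.rm = TRUE)
  column_length = sum(!is.na(cleaned_us))
  return(column_total/column_length)
}

apply(m, 2, mymean)
#[1] 2.333333 6.666667
colMeans(m, na.rm = TRUE)
#[1] 2.333333 6.666667
```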
In order to pass an na.rm parameter to the function you defined, you need to make it a parameter of the function. The sum() function has an na.rm param, but length() doesn't. So to write the function you are trying to write, you could say:
# include `na.rm` as a parameter of the function
mymean <- function(cleaned_us, na.rm){
# pass it to `sum()`
column_total = sum(cleaned_us, na.rm=na.rm)
# if `na.rm` is set to `TRUE`, then don't count `NA`s
if (na.rm==TRUE){
column_length = length(cleaned_us[!is.na(cleaned_us)])
# but if it's `FALSE`, just use the full length
} else {
column_length = length(cleaned_us)
}
return (column_total/column_length)
}
Then your call should work:
Average_2 <- apply(numeric_clean_usnews, 2, mymean, na.rm=TRUE)
Use na.omit()
set.seed(1)
m <- matrix(sample(c(1:9, NA), 100, replace=TRUE), 10)
mymean <- function(cleaned_us, na.rm){
if (na.rm) cleaned_us <- na.omit(cleaned_us)
column_total = sum(cleaned_us)
column_length = length(cleaned_us)
column_total/column_length
}
apply(m, 2, mymean, na.rm=TRUE)
# [1] 5.000 5.444 4.111 5.700 6.500 4.600 5.000 6.222 4.700 6.200
Trying to get my head around Non-Standard Evaluation as used by dplyr but without success. I'd like a short function that returns summary statistics (N, mean, sd, median, IQR, min, max) for a specified set of variables.
Simplified version of my function...
my_summarise <- function(df = temp,
to.sum = 'eg1',
...){
## Summarise
results <- summarise_(df,
n = ~n(),
mean = mean(~to.sum, na.rm = TRUE))
return(results)
}
And running it with some dummy data...
set.seed(43290)
temp <- cbind(rnorm(n = 100, mean = 2, sd = 4),
rnorm(n = 100, mean = 3, sd = 6)) %>% as.data.frame()
names(temp) <- c('eg1', 'eg2')
mean(temp$eg1)
[1] 1.881721
mean(temp$eg2)
[1] 3.575819
my_summarise(df = temp, to.sum = 'eg1')
n mean
1 100 NA
n is calculated, but the mean is not, and I can't figure out why.
Ultimately I'd like my function to be more general, along the lines of...
my_summarise <- function(df = temp,
group.by = 'group',
to.sum = c('eg1', 'eg2'),
...){
results <- list()
## Select columns
df <- dplyr::select_(df, .dots = c(group.by, to.sum))
## Summarise overall
results$all <- summarise_each(df,
funs(n = ~n(),
mean = mean(~to.sum, na.rm = TRUE)))
## Summarise by specified group
results$by.group <- group_by_(df, ~to.group) %>%
summarise_each(df,
funs(n = ~n(),
mean = mean(~to.sum, na.rm = TRUE)))
return(results)
}
...but before I move on to this more complex version (for which I was using this example as guidance), I need to get the evaluation working in the simple version first, as that's the stumbling block. The call to dplyr::select() works OK.
Appreciate any advice as to where I'm going wrong.
Thanks in advance
The basic idea is that you have to actually build the appropriate call yourself, most easily done with the lazyeval package.
In this case you want to programmatically create a call that looks like ~mean(eg1, na.rm = TRUE). This is how:
my_summarise <- function(df = temp,
to.sum = 'eg1',
...){
## Summarise
results <- summarise_(df,
n = ~n(),
mean = lazyeval::interp(~mean(x, na.rm = TRUE),
x = as.name(to.sum)))
return(results)
}
Here is what I do when I struggle to get things working:
Remember that, just like the ~n() you already have, the call will have to start with a ~.
Write the correct call with the actual variable and see if it works (~mean(eg1, na.rm = TRUE)).
Use lazyeval::interp to recreate that call, and check this by running only the interp to visually see what it is doing.
In this case I would often first write interp(~mean(x, na.rm = TRUE), x = to.sum). But running that gives ~mean("eg1", na.rm = TRUE), which treats eg1 as a character string instead of a variable name. So we use as.name, as taught in vignette("nse").
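The same distinction can be seen without lazyeval by using base R's bquote(), a swapped-in technique rather than what the answer uses: splicing the string to.sum gives the character "eg1", while splicing as.name(to.sum) gives the symbol eg1. The small data frame here is made up for illustration.

```r
# Made-up data for illustration
temp <- data.frame(eg1 = c(1, 2, NA, 5), eg2 = c(10, 20, 30, 40))
to.sum <- "eg1"

# Splicing the string itself builds a call on the character "eg1"
wrong <- bquote(mean(.(to.sum), na.rm = TRUE))
wrong
#mean("eg1", na.rm = TRUE)

# Splicing as.name(to.sum) builds a call on the column eg1
right <- bquote(mean(.(as.name(to.sum)), na.rm = TRUE))
right
#mean(eg1, na.rm = TRUE)

# Evaluate the call with the data frame's columns visible as variables
eval(right, temp)
#[1] 2.666667
```

Evaluating wrong instead returns NA with a warning, much like the NA in the question.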
Using the following function foo() as a simple example, I'd like to distribute the values given in ... to two different functions, if possible.
foo <- function(x, y, ...) {
list(sum = sum(x, ...), grep = grep("abc", y, ...))
}
In the following example, I would like na.rm to be passed to sum(), and value to be passed to grep(). But I get an error for an unused argument in grep().
X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"
foo(X, Y, na.rm = TRUE, value = TRUE)
# Error in grep("abc", y, ...) : unused argument (na.rm = TRUE)
It seems like the arguments were sent to grep() first. Is that correct? I would think R would see and evaluate sum() first, and return an error for that case.
Furthermore, when trying to split up the arguments in ..., I ran into trouble. sum()'s formal arguments are NULL because it is a .Primitive, and therefore I cannot use
names(formals(sum)) %in% names(list(...))
I also don't want to assume that the leftover arguments from
names(formals(grep)) %in% names(list(...))
are to automatically be passed to sum().
How can I safely and efficiently distribute ... arguments to multiple functions so that no unnecessary evaluations are made?
In the long-run, I'd like to be able to apply this to functions with a long list of ... arguments, similar to those of download.file() and scan().
Separate lists
If you really want to pass different sets of parameters to different functions, it's probably cleaner to specify separate lists:
foo <- function(x, y, sum = list(), grep = list()) {
list(sum = do.call("sum", c(x, sum)), grep = do.call("grep", c("abc", y, grep)))
}
# test
X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"
foo(X, Y, sum = list(na.rm = TRUE), grep = list(value = TRUE))
## $sum
## [1] 55
##
## $grep
## [1] "xyzabcxyz"
Hybrid list / ...
An alternative is to use ... for one of these functions and a list for the other, particularly when one of them is used frequently and the other infrequently. The frequently used one would be passed via ... and the infrequently used one via a list, e.g.
foo <- function(x, y, sum = list(), ...) {
list(sum = do.call("sum", c(x, sum)), grep = grep("abc", y, ...))
}
foo(X, Y, sum = list(na.rm = TRUE), value = TRUE)
Here are a couple of examples of the hybrid approach from R itself:
i) The mapply function takes that approach using both ... and a MoreArgs list:
> args(mapply)
function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
NULL
ii) nls also takes this approach using both ... and the control list:
> args(nls)
function (formula, data = parent.frame(), start, control = nls.control(),
algorithm = c("default", "plinear", "port"), trace = FALSE,
subset, weights, na.action, model = FALSE, lower = -Inf,
upper = Inf, ...)
NULL
Why does grep error before sum?
Note that sum() is a lot more accommodating with its arguments:
X <- c(1:5, NA, 6:10)
sum(X, na.rm = TRUE, value = TRUE)
## [1] 56
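It doesn't fail because sum() collects every extra argument in its own ..., so value = TRUE is simply treated as one more value to add, and TRUE counts as 1. In other words, sum() is in fact evaluated first; it just doesn't complain, and the error only appears once grep() is called with na.rm = TRUE. Incidentally: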
sum(X, na.rm = TRUE)
## [1] 55
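To see this evaluation order directly, the two calls inside foo() can be run one at a time. This is a sketch; tryCatch() is used only so the grep() error can be captured and inspected instead of stopping the script.

```r
X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"

# list(sum = ..., grep = ...) evaluates its arguments left to right,
# so sum() runs first and quietly absorbs value = TRUE through its ...
s <- sum(X, na.rm = TRUE, value = TRUE)
s
#[1] 56

# grep() has neither ... nor an na.rm argument, so the same arguments fail there
g <- tryCatch(grep("abc", Y, na.rm = TRUE, value = TRUE),
              error = function(e) conditionMessage(e))
g  # the "unused argument" error message
```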
How to split ... to different functions?
One method (that is very prone to error) is to look for the args for the target functions. For instance:
foo <- function(x, y, ...){
argnames <- names(list(...))
sumargs <- intersect(argnames, names(as.list(args(sum))))
grepargs <- intersect(argnames, names(as.list(args(grep))))
list(sum = do.call(sum, c(list(x), list(...)[sumargs])),
grep = do.call(grep, c(list("abc", y), list(...)[grepargs])))
}
This is prone to error whenever args() does not properly report the arguments a function uses, as with S3 generics. For example:
names(as.list(args(plot)))
## [1] "x" "y" "..." ""
names(as.list(args(plot.default)))
## [1] "x" "y" "type" "xlim" "ylim"
## [6] "log" "main" "sub" "xlab" "ylab"
## [11] "ann" "axes" "frame.plot" "panel.first" "panel.last"
## [16] "asp" "..." ""
In such cases you could look up the appropriate S3 method (here plot.default) instead. Because of this, I don't have a general solution (though I don't know whether one exists).
You can only pass the ... argument on to another function if that function either has a named argument for everything you pass in ... or has a ... argument itself. So for sum() this is no problem (args(sum) returns function (..., na.rm = FALSE)). On the other hand, grep() has neither na.rm nor ... as an argument.
args(grep)
# function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
# fixed = FALSE, useBytes = FALSE, invert = FALSE)
This does not include ... and also does not include a named argument na.rm either. A simple solution is to just define your own function mygrep as follows:
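mygrep <- function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
                    fixed = FALSE, useBytes = FALSE, invert = FALSE, ...) {
  # the extra ... absorbs (and ignores) arguments meant for other functions
  grep(pattern, x, ignore.case = ignore.case, perl = perl, value = value,
       fixed = fixed, useBytes = useBytes, invert = invert)
}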
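Then the call no longer errors. Note, however, that $sum below is 56 rather than 55: value = TRUE is still passed to sum() through ..., where it is counted as 1.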
foo <- function(x, y, ...){
list(sum = sum(x, ...), grep = mygrep("abc", y, ...))
}
X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"
foo(X, Y, na.rm = TRUE, value = TRUE)
# $sum
# [1] 56
#
# $grep
# [1] "xyzabcxyz"
This answer does not directly answer the original question, but it could be helpful to others who run into a similar problem with their own functions (as opposed to existing functions like sum and grep).
@shadow's answer contains an insight that points to a very simple solution in such cases: just make sure your nested functions have ... as an argument, and you won't get the unused argument error.
For example:
nested1 <- function(x, a) {
x + a
}
nested2 <- function(x, b) {
x - b
}
f <- function(x, ...) {
if (x >= 0) {
nested1(x, ...)
} else {
nested2(x, ...)
}
}
If we call f(x = 2, a = 3, b = 4) we get an error: Error in nested1(x, ...) : unused argument (b = 4).
But just add a ... to the formals of nested1 and nested2 and run again:
nested1 <- function(x, a, ...) {
x + a
}
nested2 <- function(x, b, ...) {
x - b
}
Now, f(x = 2, a = 3, b = 4) yields the desired result: 5. Problem solved.
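Putting the pieces together, here is the complete fixed example with both branches exercised. The design trade-off is worth keeping in mind: because every nested function now accepts ..., a misspelled argument name is silently ignored instead of raising an error.

```r
nested1 <- function(x, a, ...) {
  x + a  # b, if supplied, lands in ... and is ignored
}
nested2 <- function(x, b, ...) {
  x - b  # a, if supplied, lands in ... and is ignored
}
f <- function(x, ...) {
  if (x >= 0) nested1(x, ...) else nested2(x, ...)
}

f(x = 2, a = 3, b = 4)
#[1] 5
f(x = -2, a = 3, b = 4)
#[1] -6
```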