Let's assume that we have 3 functions with this minimal functionality:
func1 <- function(x) {
  x + 1
}
func2 <- function(x, plus = T) {
  if (plus == TRUE) {
    x + 2
  } else {
    x - 5
  }
}
func3 <- function(x) {
  x + 3
}
I would like to nest these functions inside each other, like this, with the pipe (%>%) operator:
library(magrittr)
func1(0) %>% func2(plus = T) %>% func2(plus = F) %>% func3
# result: 1
This is equivalent to:
func3(func2(func2(func1(0), plus = T), plus = F))
# result: 1
I am trying to find a method that doesn't require duplicating the func2() call (because I have to run it many times, and I would also like to change the number of function calls and the parameter dynamically). I am currently no expert on the apply functions or on purrr's map functions, but I guess at least one of them can do this job.
This is of course just a dummy example; my real code is much more complicated. I am just trying to simplify my problem to find a solution.
I have to use the pipe operator, so I am only interested in solutions that also work with pipes.
Write a function that takes the initial x and the outcomes to feed to func2 and loops through those outcomes:
func2_iterate = function(x, outcomes) {
  # apply func2 once per element of outcomes, carrying x through
  for (i in seq_along(outcomes)) {  # seq_along() is safe for empty input, unlike 1:length()
    x = func2(x, outcomes[i])
  }
  return(x)
}
Then run (with func1, func2, func3 as above):
func1(0) %>% func2_iterate(c(T, F)) %>% func3
#result: 1
I'd also like to point out that in this particular case the output of func2_iterate is just its input, plus 2 times the number of T in outcomes, minus 5 times the number of F in outcomes. But I assume you actually have functions that do something more complicated.
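If you'd rather avoid the explicit loop, base R's Reduce() can do the same folding. A minimal sketch (func2_reduce is just a name I made up for the wrapper):
func2_reduce <- function(x, outcomes) {
  # fold func2 over outcomes, threading the accumulated value through
  Reduce(function(acc, o) func2(acc, plus = o), outcomes, init = x)
}
func1(0) %>% func2_reduce(c(TRUE, FALSE)) %>% func3
# result: 1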
Using a partial / compose / invoke combo:
library(tidyverse)
f2b <- invoke(compose, map(c(F, T), ~ substitute(partial(func2, plus = .), lst(.))))
func1(0) %>% f2b %>% func3
# [1] 1
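If the substitute()/lst() trick looks opaque, the same composition can be spelled out explicitly, at the cost of being less dynamic. A sketch (f2b2 is a hypothetical name; note that purrr's compose() applies its arguments right to left by default, so the plus = TRUE step runs first):
f2b2 <- compose(partial(func2, plus = FALSE), partial(func2, plus = TRUE))
func1(0) %>% f2b2() %>% func3
# [1] 1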
Related
I've got a vector that I want to perform the same operation on multiple times in series, modifying one argument, and I can't figure out a way to do it without a loop.
For example, say I want to take the vector 30:32 and perform a series of logs on it. In a loop, I would do this:
foo <- 30:32
for (anum in c(2, 3, 4)) {
  foo <- log(foo, base = anum)
}
A way to do it without a loop would be:
30:32 |>
  log(base = 2) |>
  log(base = 3) |>
  log(base = 4)
What I'm wanting is to have a function that will take 2:4 as an argument and do this. So something like:
30:32 |> serialfunction(log, 2:4)
Does a function like this exist?
Thanks!
There's not really a built-in function that does exactly that, but you can do this with Reduce. For example:
Reduce(function(x, y) {log(x, base=y)}, 2:4, init=30:32)
# [1] 0.2669627 0.2713007 0.2754373
You could create serialfunction() with:
serialfunction <- function(x, fun, y) {
  Reduce(function(x, y) fun(x, y), y, init = x)
}
30:32 |> serialfunction(log, 2:4)
# [1] 0.2669627 0.2713007 0.2754373
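As a side note, Reduce() has an accumulate argument that returns every intermediate result as well, which is handy for checking each step of the chain:
Reduce(function(x, y) log(x, base = y), 2:4, init = 30:32, accumulate = TRUE)
# returns a list: the initial vector, then the result after each successive log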
I am working on an R project, and I have many different functions (I'm calculating RMSEs on various data sets with various requirements).
I am currently using the do.call() function to invoke the function name I'm passing in, but this causes my whole system to stall and nothing works. This has happened many times over, and I've had to restart RStudio (I'm using R version 4.0.2).
I would like to pass a function as an argument into my parent function (which is recursive, but only to two passes), and I would like to be able to pass the parameters from the parent function to the child functions, as well as to the recursive call.
I'm not sure of the correct way to execute this.
Any help on where I'm going wrong is greatly appreciated.
Currently, my code is as follows:
#find_generic_lambda is the parent function that is called, and the FUN argument is the named function I would like to pass in to execute inside
find_generic_lambda <- function(seq_start, seq_end, seq_increment, FUN, detailed_flag = FALSE, training_set, testing_set)
{
  lambdas <- seq(seq_start, seq_end, seq_increment)
  params = c(lambdas, train_set, test_set)
  #invoking the passed-in function here with the parameters I'm setting
  #this is where the code stumbles
  RMSE <- sapply(lambdas, do.call(FUN, params))
  #find the smallest lambda
  qplot(lambdas, RMSE)
  #saving the first-round lambda
  min_lambda_first_try <- lambdas[which.min(RMSE)]
  min_lambda_first_try
  if (detailed_flag)
  {
    #if this is the first iteration of the function, continue by taking a 10% lower and 10% higher lambda value and iterating through new lambdas that are much more granular, with increments at 10% of what they were previously
    new_lambda_range = (seq_end + seq_start)/10
    new_lambda_range
    min_lambda_first_try <- find_generic_lambda(seq_start = min_lambda_first_try - new_lambda_range, seq_end = min_lambda_first_try + new_lambda_range,
                                                seq_increment = seq_increment/10, FUN, detailed_flag = FALSE, training_set = training_set, testing_set = testing_set)
  }
  return(min_lambda_first_try)
}
#this is one of the functions that will be passed in as a parameter
regularized_rmse_3 <- function(l, train_set, test_set)
{
  mu <- mean(train_set$rating)
  just_the_sum <- train_set %>%
    group_by(movieId) %>%
    summarize(s = sum(rating - mu), n_i = n())
  predicted_ratings <- test_set %>%
    left_join(just_the_sum, by = 'movieId') %>%
    mutate(b_i = s/(n_i + l)) %>%
    mutate(pred = mu + b_i) %>%
    pull(pred)
  return(RMSE(predicted_ratings, test_set$rating))
}
rmse3_lambda <- find_generic_lambda(seq_start = 0, seq_end = 10, seq_increment = 0.5,
                                    FUN = "regularized_rmse_3",
                                    detailed_flag = TRUE, training_set = training_set, testing_set = testing_set)
Expanding on my comments:
Here's a simplified version of your functions (so I can make an example dataset):
f <- function(l_candidate, FUN) {
  RMSE <- sapply(l_candidate, FUN)
  l_min_RMSE <- l_candidate[which.min(RMSE)]
  return(l_min_RMSE)
}
g <- function(l, trainset, testset) {
  p <- mean(trainset + l)
  sqrt(mean((testset - p)^2))
}
trainset <- c(1, 1, 2, 1)
testset <- c(3, 4)
Then:
f(1:5, FUN = function (x) g(x, trainset, testset))
# [1] 2
So you pass the function g via a wrapper function into f and it will do the job for you.
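On recent R (4.1 or later), the same wrapper can be written with the backslash lambda shorthand, which is just syntactic sugar for function(x):
f(1:5, FUN = \(x) g(x, trainset, testset))
# [1] 2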
Alternative
R allows you to create a function out of another function:
g <- function(trainset, testset) function(l) {
  p <- mean(trainset + l)
  sqrt(mean((testset - p)^2))
}
g1 <- g(trainset, testset)
g1(1)
# [1] 1.346291
In this situation, g() takes two arguments and returns a function that takes one argument, l. So you can create a new function g1() out of g().
Then you can pass it to your parent function, giving the same result in this example:
f(1:5, FUN = g1)
# [1] 2
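And if you do want to keep do.call() in your original find_generic_lambda(), the key fix is to call it once per lambda inside sapply(), with the arguments collected in a list. A sketch of the corrected lines (untested against your data; do.call() accepts the function name as a string, so FUN = "regularized_rmse_3" still works):
RMSE <- sapply(lambdas, function(l) {
  do.call(FUN, list(l, training_set, testing_set))
})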
First a quick example of what I'm seeing, then some context as to why I'm doing what I'm doing.
library(data.table)
dt = data.table(i = rep(1:3, each = 4), t = rep(1:4, times = 3), x = runif(12))
dt[, .(sum=sum(x), cnt=.N), keyby=.(i)] # works as expected
# i sum cnt
# 1: 1 2.932400 4
# 2: 2 1.483940 4
# 3: 3 2.113194 4
dt[, .(sum=sum(x), cnt=.N), keyby=list(i)] # same as above
# let j and keyby be specified by user, optionally NULL
j_str = parse(text=".(sum=sum(x), cnt=.N)")
by_str = parse(text="keyby=.(i)")
dt[, eval(j_str), eval(by_str)] # could not find function .
# Error in .(i) : could not find function "."
by_str = parse(text="keyby=list(i)")
dt[, eval(j_str), eval(by_str)] # correct results, but not correct column names
# keyby sum cnt
# 1: 1 2.932400 4
# 2: 2 1.483940 4
# 3: 3 2.113194 4
Notice two problems; I'm mostly concerned about the second (wrong column names).
What I would particularly prefer is to pass in one string that gets evaluated inside the data.table [], but I couldn't get that to work; only i, j, etc. passed separately worked.
Why am I doing this? Well, the simplified version is that I'm writing a function that does this evaluation.
stupidfnc = function(dt, j_str, by_str) {
  return(dt[, eval(j_str), eval(by_str)])
}
The longer answer is that I want to loop over files, aggregate, rbind, and then aggregate again. However, the full list of aggregated data is too large to fit in memory, so I'm doing a little bit of looping, rbinding, and aggregating, a little more looping, rbinding, and aggregating, then aggregating the aggregates, then looping again, and so on. I have a function that lets me write this in a flexible manner without having to rewrite the loop every time. I've been doing this quite a bit, and working with the various loop levels is a higher cognitive burden than it really should be, so I was hoping a function like this would be useful.
That function is below.
#' find z, the maximum integer divisor of x s.t. z <= sqrt(x)
#'
#' you can find y = x / z easily enough
#' useful for rbind'ing in chunks w/ a merge or collapse
integer_approx_sqrt = function(x) {
  upper = floor(sqrt(x))
  for (cand in upper:1) {
    if ((x %% cand) == 0) {
      break
    }
  }
  return(cand)
}
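A couple of quick sanity checks, just reading off the definition:
integer_approx_sqrt(12)  # floor(sqrt(12)) = 3 and 12 %% 3 == 0
# [1] 3
integer_approx_sqrt(7)   # 7 is prime, so the loop falls through to cand = 1
# [1] 1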
#' loop over l, apply FUN, and aggregate with j_aggr by by_agg
#'
#' todo
mclapply_rbind_aggr = function(l, FUN, j_aggr, by_aggr, mc.cores = 1,
                               mc.preschedule = F, chunksize = 0, ...) {
  # needs parallel (mclapply, splitIndices) and data.table (rbindlist)
  if (chunksize == 0) {
    chunksize = integer_approx_sqrt(length(l))
  }
  if (length(l) <= chunksize | chunksize == 1) {
    dtl = mclapply(l, FUN = FUN, mc.cores = mc.cores,
                   mc.preschedule = mc.preschedule, ...)
  } else {
    dtl = lapply(splitIndices(length(l), chunksize),
                 function(indcs) {
                   rbindlist(mclapply(indcs, FUN = FUN, mc.cores = mc.cores,
                                      mc.preschedule = mc.preschedule, ...))[,
                     eval(parse(text = j_aggr)), eval(parse(text = by_aggr))]
                 })
  }
  return(rbindlist(dtl)[, eval(parse(text = j_aggr)), eval(parse(text = by_aggr))])
}
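One partial workaround I can see for the wrong column name: keyby (like by) also accepts a plain character vector of column names, so I could parse() only the j part and pass the grouping columns as strings. A sketch (stupidfnc2 is just a hypothetical variant; it doesn't solve the single-string case):
stupidfnc2 = function(dt, j_str, by_cols) {
  # j is still parsed and evaluated; keyby takes the column names directly
  dt[, eval(parse(text = j_str)), keyby = by_cols]
}
stupidfnc2(dt, ".(sum=sum(x), cnt=.N)", "i")
# same result as dt[, .(sum=sum(x), cnt=.N), keyby=.(i)], with the column correctly named i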
I have a list of filtering functions f1, f2, f3, f4, ..., which take a matrix m and a number of options as input and return a subset of the rows of the matrix as output. Now I would like to be able to define, in an orderly way, some meta-filtering function settings metaf1, metaf2, metaf3, ..., which would specify the sequential application of a specified number of filtering functions, e.g. first f2 and then f3, using given options for each. I would like to store these filtering settings in a list of, say, class "metafiltering", and then have another function apply the filtering steps specified in a given metafiltering object. This would allow filtering settings to be stored and applied in an orderly way. How would I achieve this most elegantly in R? Or are there other convenient methods to achieve something like this?
EDIT: to give an example, say I have matrix
m=replicate(10, rnorm(20))
and filtering functions (these are just examples, obviously mine are more complicated :-) )
f1 = function(m, opt1, opt2) {
  return(m[(m[,2] > opt1) & (m[,1] > opt2), ])
}
f2 = function(m, opt1) {
  return(m[(m[,3] > opt1), ])
}
And I have defined the following metafiltering settings of a specific class, specifying two functions that would have to be applied sequentially to matrix m:
metafilterfuncs = list(fun1 = f1(opt1 = 0.1, opt2 = 0.2), fun2 = f2(opt1 = 0.5))
class(metafilterfuncs) = "metafiltering"
The question I have then is how I could apply the filtering steps of an arbitrary metafiltering object to a given matrix m, using the specified functions and settings?
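To sketch what I have in mind (apply_metafilter is just a hypothetical name; each step would store a function plus its options, and the matrix would be threaded through them in order):
metaf1 = structure(
  list(step1 = list(fun = f1, args = list(opt1 = 0.1, opt2 = 0.2)),
       step2 = list(fun = f2, args = list(opt1 = 0.5))),
  class = "metafiltering")
apply_metafilter = function(mf, m) {
  # call each stored function on the running matrix, with its stored options
  Reduce(function(acc, step) do.call(step$fun, c(list(acc), step$args)), mf, init = m)
}
m_filtered = apply_metafilter(metaf1, m)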
You can do something like this. You define a sort of function pipeline where you give each function a priority:
pipelines <- c(f1 = 100, f2 = 300, f3 = 200)
I define 3 dummy functions here for testing:
f1 <- function(m, a) m + a
f2 <- function(m, b) m + b
f3 <- function(m, c) m + c
For each function, you store its argument in another list:
args <- list(f1 = c(a = 1), f2 = c(b = 2), f3 = c(c = 3))
Then you apply your functions:
m <- matrix(1:2, ncol = 2)
for (func in names(pipelines[order(pipelines)])) {
  m <- do.call(func, list(m, args[[func]]))
}
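With the dummy functions above, the loop runs f1, then f3, then f2 (priorities 100, 200, 300), so m is incremented by 1 + 3 + 2 overall:
m
#      [,1] [,2]
# [1,]    7    8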
pryr has a function, compose, like what you need, but it doesn't quite cut it: it requires the functions to be given one by one, not in a list, and it cannot take arguments. It's also oddly placed in that package. A similar function can be found in plyr, namely each, but that function does not apply the functions sequentially; it applies them individually and outputs a named vector (list?).
agstudy provided a solution above, but it suffers from a problem: it can only take scalar arguments because it passes the arguments in a named vector. The solution is to use a named list instead. So here's an improved function to replace the one in pryr.
compose2 = function(x, funcs, args, msg_intermediate = F) {
  if (length(funcs) != length(args)) stop("length of functions and arguments must match")
  for (i in seq_along(funcs)) {
    x = do.call(what = funcs[[i]], args = c(x, args[[i]]))
    if ((i != length(funcs)) && msg_intermediate) message(x)
  }
  x
}
msg_intermediate is a nice debugging argument that messages the intermediate results, so one can more easily understand what happens.
Test it:
adder = function(x, n) x + n
compose2(0,
         funcs = list(adder, adder, adder),
         args = list(list(n = 1), list(n = 2), list(n = 3)),
         msg_intermediate = T
)
Outputs:
1
3
[1] 6
This is what you get when you take 0, then add 1 (=1), then add 2 (=3), then add 3 (=6).
The args argument for compose2 takes a list of lists, so that one can supply non-scalar function arguments. Here's an example:
add_div = function(x, n, d) (x + n) / d
compose2(0,
         funcs = list(add_div, add_div, add_div),
         args = list(list(n = 1, d = 1), list(n = 2, d = 2), list(n = 3, d = 3)),
         msg_intermediate = T
)
Output:
1
1.5
[1] 1.5
Which is what you get when you take 0, add 1, divide by 1 (=1), then take 1, add 2 then divide by 2 (=1.5), then take 1.5, add 3 and then divide by 3 (=1.5).
I have a function that I use to get a "quick look" at a data.frame... I deal with a lot of survey data and this acts as a quick tool to see what's what.
f.table <- function(x) {
  if (is.factor(x[[1]])) {
    frequency <- function(x) {
      x <- round(length(x)/n, digits = 2)
    }
    x <- na.omit(melt(x, c()))
    x <- cast(x, variable ~ value, frequency)
    x <- cbind(x, top2 = x[, ncol(x)] + x[, ncol(x)-1], bottom = x[, 2])
  }
  if (is.numeric(x[[1]])) {
    frequency <- function(x) {
      x[x > 1] <- 1
      x[is.na(x)] <- 0
      x <- round(sum(x)/n, digits = 2)
    }
    x <- na.omit(melt(x))
    x <- cast(x, variable ~ ., c(frequency, mean, sd, min, max))
    x <- transform(x, variable = reorder(variable, frequency))
  }
  return(x)
}
What I find is that if I don't define "frequency" outside of the function, it returns wonky results for data frames with continuous variables. It doesn't seem to matter which definition I use outside of the function, so long as I define one.
try:
n <- 100
x <- data.frame(a=c(1:25),b=rnorm(100),c=rnorm(100))
x[x > 20] <- NA
Now, select either one of the frequency functions, paste it in, and try it again:
frequency <- function(x) {
  x <- round(length(x)/n, digits = 2)
}
f.table(x)
Why is that?
Crucially, I think this is where your problem is: cast() is evaluating those functions without reference to the environment it was called from. Inside cast() it evaluates fun.aggregate via funstofun and, although I don't really follow what it is doing, it is picking up stats::frequency and not your local one.
Hence my comment on your Q. What do you want the function to do? At the moment it would seem necessary to define a "frequency" function in the global environment so that cast() or funstofun() finds it. Give it a unique name so it is unlikely to clash with anything and will be the only thing found, say .Frequency(). Without knowing what you want the function to do (rather than what you thought f.table() should do) it is a bit difficult to provide further guidance, but why not have .FrequencyNum() and .FrequencyFac() defined in the global workspace, and rewrite your f.table() wrapper's calls to cast() to use the relevant one?
.FrequencyFac <- function(X, N) {
  round(length(X)/N, digits = 2)
}
.FrequencyNum <- function(X, N) {
  X[X > 1] <- 1
  X[is.na(X)] <- 0
  round(sum(X)/N, digits = 2)
}
f.table <- function(x, N) {
  if (is.factor(x[[1]])) {
    x <- na.omit(melt(x, c()))
    x <- dcast(x, variable ~ value, .FrequencyFac, N = N)
    x <- cbind(x, top2 = x[, ncol(x)] + x[, ncol(x)-1], bottom = x[, 2])
  }
  if (is.numeric(x[[1]])) {
    x <- na.omit(melt(x))
    x <- cast(x, variable ~ ., c(.FrequencyNum, mean, sd, min, max), N = N)
    ## x <- transform(x, variable = reorder(variable, frequency))
    ## left this out as I wanted to see what cast returned
  }
  return(x)
}
Which I thought would work, but it is not finding N, and it should be. So perhaps I am missing something here?
By the way, it is probably not a good idea to rely on a function finding n (in your version) outside of itself. Always pass the variables you need in as arguments.
I don't have the package that contains melt, but there are a couple of potential issues I can see:
Your frequency functions do not return anything.
It's generally bad practice to alter function inputs (x is the input and the output).
There is already a generic frequency function in the stats package in base R, which may cause issues with method dispatch (I'm not sure).
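On that last point, the name clash is easy to verify: stats::frequency() is the sampling-frequency generic for time series, so an unqualified frequency can resolve there instead of your local definition. For instance:
stats::frequency(ts(1:12, frequency = 4))
# [1] 4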