R Pipelining with Anonymous Functions - r

I have a question that extends another question.
I want to be able to pipeline anonymous functions. In the previous question, the answer for pipelining defined functions was to create a pipeline operator "%|>%" and define it this way:
"%|>%" <- function(fun1, fun2){
function(x){fun2(fun1(x))}
}
This would allow you to call a series of functions while continually passing the result of the previous function to the next. The caveat was that the functions had to be predefined. Now I'm trying to figure out how to do this with anonymous functions. The previous solution, which used predefined functions, looks like this:
square <- function(x){x^2}
add5 <- function(x){x + 5}
pipelineTest <-
square %|>%
add5
Which gives you this behaviour:
> pipelineTest(1:10)
[1] 6 9 14 21 30 41 54 69 86 105
I would like to be able to define the pipelineTest function with anonymous functions like this:
anonymousPipelineTest <-
function(x){x^2} %|>%
function(x){x+5} %|>%
x
When I try to call this with the same arguments as above I get the following:
> anonymousPipelineTest(1:10)
function(x){fun2(fun1(x))}
<environment: 0x000000000ba1c468>
What I'm hoping to get is the same result as pipelineTest(1:10). I know that this is a trivial example. What I'm really trying to get at is a way to pipeline anonymous functions. Thanks for the help!

Using Compose, and calling the resulting function gives this:
"%|>%" <- function(...) Compose(...)()
Now drop the 'x' as the final "function" (here it is replaced with an actual function, which is not needed but serves as an example):
anonymousPipelineTest <-
function(x){x^2} %|>%
function(x){x+5} %|>% function(x){x}
anonymousPipelineTest(1:10)
[1] 6 9 14 21 30 41 54 69 86 105

This is an application of an example offered on the ?funprog help page:
Funcall <- function(f, ...) f(...)
anonymousPipelineTest <- function(x) Reduce( Funcall, list(
function(x){x+5}, function(x){x^2}),
x, right=TRUE)
anonymousPipelineTest(1:10)
#[1] 6 9 14 21 30 41 54 69 86 105

I am putting up an answer which is the closest thing I've found, for other people looking for the same thing. I won't give myself points for the answer, though, because it is not what I want.
Returning a Function:
If you want to put several functions together, the easiest thing I've found is to use the 'Compose' function found in the 'functional' package for R. It would look something like this:
anonymousPipe <- Compose(
function(x){x^2},
function(x){x+5})
This allows you to call this series of functions like this:
> anonymousPipe(1:10)
[1] 6 9 14 21 30 41 54 69 86 105
Returning Data:
If all you want to do is start with some data and send it through a series of transformations (my original intent), then the first argument to 'Compose' should be your starting data, and after the closing parenthesis of the 'Compose' call you add a pair of parentheses to call the resulting function. It looks like this:
anonymousPipeData <- Compose(
seq(1:10),
function(x){x^2},
function(x){x+5})()
'anonymousPipeData' is now the data that results from the series of functions. Please note the pair of parentheses at the end. This is what causes R to return the data rather than a function.
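The attempt in the question returns a function rather than data because of how R parses it: the body of function(x) extends as far to the right as it can, so the whole %|>% chain ends up inside the body of the first anonymous function. A minimal sketch, assuming the original two-argument operator from the question, is to wrap each anonymous function in parentheses so the operator sees complete functions:

```r
# The operator exactly as defined in the question.
"%|>%" <- function(fun1, fun2) {
  function(x) fun2(fun1(x))
}

# Parentheses stop each function body from swallowing the rest of the chain.
anonymousPipelineTest <-
  (function(x) x^2) %|>%
  (function(x) x + 5)

anonymousPipelineTest(1:10)
#  [1]   6   9  14  21  30  41  54  69  86 105
```

No trailing x is needed here, since the composed function receives its data when it is called.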

Related

Using plyr ldply parallel with function within function

I have a data frame with multiple IDs and I am trying to perform feature extraction on the different ID sets. The data looks like this:
id x y
1 3812 60 7
2 3812 63 105
3 3812 65 1000
4 3812 69 8
5 3812 75 88
6 3812 78 13
where id takes on about 200 different values. So I am trying to extract features from the (x,y) data, and I'd like to do it in parallel, since for some datasets, doing it sequentially can take about 20 minutes or so. Right now I am using dplyr as such:
x = d %>% group_by(id) %>% do(data.frame(getFeatures(., func_args)))
where func_args are just additional function inputs to the function getFeatures. I am trying to use plyr::ldply with parallel=TRUE to do this, but there is a problem in that within getFeatures, I am using another function that I've written. So, when I try to run parallel, I get an error:
Error in do.ply(i) :
task 1 failed - "could not find function "desparsify""
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
Error in do.ply(i) :
task 1 failed - "could not find function "desparsify""
where desparsify is a custom function written to process the (x,y) data (it effectively adds zeros to x locations that are not present in the dataset). I get a similar error when I try to use the cosine function from package lsa. Is there a way to use parallel processing when calling external/non-base functions in R?
You don't show how you set up plyr to parallelize, but I think I can guess what you're doing. I also guess you're on Windows. Here's a teeny standalone example illustrating what's going on:
library(plyr)
## On Windows, doParallel::registerDoParallel(2) becomes:
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)
desparsify <- function(x) sqrt(x)
y <- plyr::llply(1:3, function(x) desparsify(x), .parallel=TRUE)
## Error in do.ply(i) :
## task 1 failed - "could not find function "desparsify""
If you use doFuture instead of doParallel, the underlying future framework will make sure 'desparsify' is found, e.g.
library(plyr)
doFuture::registerDoFuture()
future::plan("multisession", workers = 2)
desparsify <- function(x) sqrt(x)
y <- plyr::llply(1:3, function(x) desparsify(x), .parallel=TRUE)
str(y)
## List of 3
## $ : num 1
## $ : num 1.41
## $ : num 1.73
(disclaimer: I'm the author of the future framework)
PS. Note that plyr is a legacy package no longer maintained. You might want to look into future.apply, furrr, or foreach with doFuture as alternatives for parallelization.
There is. Take a look at the parApply family of functions in the parallel package. I usually use parLapply.
You'll need to create a cluster with cl <- makeCluster(n_cores) and pass it, together with a vector of your ids (this may depend on how your functions identify the entries for each id) and your function, to parLapply. This produces a list with the output of your function applied to each group in parallel.
library(parallel)
cl <- makeCluster(n_cores) ## n_cores: the number of cores to use
ids <- 1:10
clusterExport(cl=cl, varlist=c('variable_name', 'function_name')) ## in case you need to export variables/functions
result <- parLapply(cl=cl, ids, your_function) ## your_function: the function applied to each id
stopCluster(cl)
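As a concrete version of the outline above, here is a minimal sketch using only the parallel package. The helper desparsify() is a stand-in (a plain square root, as in the other answer) and get_features() is a hypothetical worker function, not the asker's real code:

```r
library(parallel)

# Stand-in for the asker's custom helper.
desparsify <- function(x) sqrt(x)

# Hypothetical worker function that depends on the helper.
get_features <- function(id) desparsify(id) + 1

cl <- makeCluster(2)  # two worker processes

# Copy the helper to every worker; without this, workers may fail
# with "could not find function".
clusterExport(cl, varlist = "desparsify", envir = environment())

result <- parLapply(cl, 1:4, get_features)
stopCluster(cl)

unlist(result)
```

Functions from add-on packages such as lsa would instead need to be loaded on the workers, for example with clusterEvalQ(cl, library(lsa)).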

Map_dbl in R to replace for loop

This is an easy question, but I had problems solving it so please don't laugh at me.
I'm given a task to re-create my own function for mean in R instead of using the in-built mean function.
The condition for my function is that I need to use map_dbl to handle any iteration in my function.
I know that mean = (sum of all elements)/(number of elements)
The question is, does anyone know how to calculate the sum of all elements using map_dbl?
A bit overkill:
library(purrr)
x <- c(1:10)
counter <- 0
mapsum <- map_dbl(x, ~{counter <<- counter + .x})
mapsum
[1] 1 3 6 10 15 21 28 36 45 55
tail(mapsum,1)
[1] 55
As mentioned in the comments, this works, but sum/mean is a reduce operation, not a map operation.
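To illustrate the reduce point, here is a minimal sketch that uses base R's Reduce (purrr::reduce plays the same role in purrr). It does not satisfy the map_dbl requirement of the assignment; it only shows the shape of the operation. The name my_mean is made up for this example:

```r
# Fold the vector with `+` to get the sum, then divide by the length.
my_mean <- function(x) {
  total <- Reduce(`+`, x)
  total / length(x)
}

my_mean(1:10)
# [1] 5.5
```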

Determine when a sequence of numbers has been broken in R

Say I have a series of numbers:
seq1<-c(1:20,25:40,48:60)
How can I return a vector that lists points in which the sequence was broken, like so:
c(21,24)
[1] 21 24
c(41,47)
[1] 41 47
Thanks for any help.
To show my miserably failing attempt:
nums<-min(seq1):max(seq1) %in% seq1
which(nums==F)[1]
res.vec<-vector()
counter<-0
res.vec2<-vector()
counter2<-0
for (i in 2:length(seq1)){
if(nums[i]==F & nums[i-1]!=F){
counter<-counter+1
res.vec[counter]<-seq1[i]
}
if(nums[i]==T & nums[i-1]!=T){
counter2<-counter2+1
res.vec2[counter2]<-seq1[i]
}
}
cbind(res.vec,res.vec2)
I have changed the general function a bit, so I think this should be a separate answer.
You could try
seq1<-c(1:20,25:40,48:60)
myfun<-function(data,threshold){
cut<-which(c(1,diff(data))>threshold)
return(cut)
}
You get the points you have to care about using
myfun(seq1,1)
[1] 21 37
To make the result easier to reuse, it is convenient to store it in an object.
pru<-myfun(seq1,1)
So you can now call
df<-data.frame(pos=pru,value=seq1[pru])
df
pos value
1 21 25
2 37 48
You get a data frame with the position and the value of the breaks for your desired threshold. If you want a list instead of a data frame, it works like this:
list(pos=pru,value=seq1[pru])
$pos
[1] 21 37
$value
[1] 25 48
Function diff will give you the differences between successive values
> x <- c(1,2,3,5,6,3)
> diff(x)
[1] 1 1 2 1 -3
Now look for those values that are not equal to one for "breakpoints" in your sequence.
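Building on diff, a minimal sketch that returns the missing ranges in the form the question asks for (21 to 24 and 41 to 47). The names brk and gaps are made up for this example:

```r
seq1 <- c(1:20, 25:40, 48:60)

# Positions where the step to the next element is larger than 1.
brk <- which(diff(seq1) > 1)

# The gap starts one past the value before the break and ends
# one before the value after it.
gaps <- data.frame(from = seq1[brk] + 1, to = seq1[brk + 1] - 1)
gaps
#   from to
# 1   21 24
# 2   41 47
```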
Taking into account the comments made here, for general use you could write:
fun<-function(data,threshold){
t<-which(c(1,diff(data)) != threshold)
return(t)
}
Consider that data could be any numerical vector (such as a data frame column). I would also consider using grep with a similar approach but it all depends on user preference.

R - how to use apply (or some variant) to replace nested looping

I've been searching the forums for a while now, and I can't seem to figure out the answer to my problem (although I've come close a few times). My apologies if this has already been answered elsewhere and I've missed it.
I'm working with the Egyptian Skulls data from the HSAUR2 library. I'll explain my problem via the code below. I first load the skulls data and run statistical summaries on it (eg boxplots, means, std. devs, etc). These summaries (not shown here) are broken down by variable (in columns 2-5 of the skulls data) and by "epoch" (column 1 of the skulls data).
library(HSAUR2) # load the skulls data
head(skulls)
# epoch mb bh bl nh
# 1 c4000BC 131 138 89 49
# 2 c4000BC 125 131 92 48
# 3 c4000BC 131 132 99 50
# 4 c4000BC 119 132 96 44
# 5 c4000BC 136 143 100 54
# 6 c4000BC 138 137 89 56
I then call powerTransform (part of the car package) to suggest appropriate transformations to convert the data so that the resulting distributions are "more Normal". I have one transformation for each variable/epoch combination.
library(car)
tfms_mb <- by(skulls$mb,skulls$epoch, function(x) powerTransform(x))
tfms_bh <- by(skulls$bh,skulls$epoch, function(x) powerTransform(x))
tfms_bl <- by(skulls$bl,skulls$epoch, function(x) powerTransform(x))
tfms_nh <- by(skulls$nh,skulls$epoch, function(x) powerTransform(x))
To extract the coefficients, I use sapply.
mbc <- sapply(tfms_mb,coef)
bhc <- sapply(tfms_bh,coef)
blc <- sapply(tfms_bl,coef)
nhc <- sapply(tfms_nh,coef)
Question:
How do I apply the appropriate transformation to each variable/epoch pair?
I am currently using the bct() function (from the TeachingDemos package) to apply the transformation and I can work out how to do it with one set value (eg raise all data to the power of 1.5):
library(TeachingDemos)
by(skulls[,-1], skulls[,1], function(x) { bct(x,1.5)})
My question is, how do I replace the "1.5" in the above line, to cycle through the coefficients in mbc, bhc, etc. and apply the correct power to each variable/epoch combination?
I've been reading up on the apply family of functions for a number of hours, and also the plyr package, but this one has me stumped! Any help would be appreciated.
Here is a data.table solution that will be memory and time efficient
library(data.table)
SKULLS <- data.table(skulls)
SKULLS[, lapply(.SD, function(x){bct(x,coef(powerTransform(x)))}),by = epoch]
This is a solution using lapply twice:
library(HSAUR2)
library(car)
library(TeachingDemos)
do.call("rbind",
lapply(unique(skulls[["epoch"]]),
function(x) {
coefs <- coef(powerTransform(subset(skulls, epoch == x)[ , 2:5]));
do.call("cbind",
lapply(seq(length(coefs)),
function(y) bct(subset(skulls, epoch == x)[ , (y+1)], coefs[y])))
}
)
)
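The general pattern behind these answers (a different coefficient for every variable/group pair) can also be sketched in base R with split and Map. Everything here is illustrative: d is toy data standing in for skulls, lambda is a made-up coefficient table playing the role of mbc, bhc and so on, and pow() is a plain power transform standing in for bct():

```r
# Toy data: one grouping column and two numeric columns.
d <- data.frame(
  g = rep(c("a", "b"), each = 3),
  u = c(1, 2, 3, 4, 5, 6),
  v = c(1, 4, 9, 16, 25, 36)
)

# Made-up exponents: rows are groups, columns are variables.
lambda <- rbind(a = c(u = 2, v = 0.5),
                b = c(u = 1, v = 0.5))

# Stand-in for bct(): a plain power transform.
pow <- function(x, l) x^l

pieces <- lapply(split(d, d$g), function(piece) {
  grp  <- as.character(piece$g[1])
  vars <- colnames(lambda)
  # Pair each column with the exponent for this group and variable.
  piece[vars] <- Map(pow, piece[vars], lambda[grp, vars])
  piece
})
res <- do.call(rbind, pieces)
res
```

With the real data, lambda would be filled from the powerTransform coefficients and pow replaced by bct.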

aaply fails on a vector

I am trying to understand how to use the excellent plyr package's commands on a vector (in my case, of strings). I suppose I'd want to use aaply, but it fails, asking for a margin. But there aren't columns or rows in my vector!
To be a bit more concrete, the following command works, but returns results in a weird list. states.df is a data frame, and region is the name of the state (returned using Hadley's map_data("state") command). Thus, states.df$region is a vector of strings (specifically, state names). opinion.new is a vector of numbers, named using state names.
states.df <- map_data("state")
ch = sapply(states.df$region, function (x) { opinion.new[names(opinion.new)==x] } )
What I'd like to do is:
ch = aaply(states.df$region, function (x) { opinion.new[names(opinion.new)==x] } )
Where ch is the vector of numbers looked up or pulled from opinion.new. But aaply requires an array, and fails on a vector.
Thanks!
If you want to use plyr on a vector, you have to use l*ply, as follows:
v <- 1:10
sapply(v, function(x)x^2)
[1] 1 4 9 16 25 36 49 64 81 100
laply(v, function(x)x^2)
[1] 1 4 9 16 25 36 49 64 81 100
In other words, for a simple case like this, sapply and laply give the same result.
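For the lookup in the question itself, a different technique may be simpler than any apply variant: indexing the named vector directly by name. This is a minimal sketch with toy data; opinion and region are made-up stand-ins for opinion.new and states.df$region:

```r
# Toy stand-ins for opinion.new and states.df$region.
opinion <- c(ohio = 0.4, texas = 0.7, maine = 0.1)
region  <- c("texas", "ohio", "texas", "maine")

# Indexing a named vector by a character vector looks up each name in turn.
ch <- unname(opinion[region])
ch
# [1] 0.7 0.4 0.7 0.1
```

Names that are absent from the vector come back as NA, whereas the sapply version returns a zero-length element for them, which is one way to end up with a list.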
