How to parallelize a for loop in R

I have this loop and I'm wondering what the different ways to parallelize it are:
for (i in 1:nrow(dataset)) {
  dataset$dayDiff[i] = dataset$close[i] - dataset$open[i]
}
I was thinking of using lapply, but I don't see how to use a list in this context. Maybe I could use the foreach package, but I don't know how to use it.

There is no good reason to use a loop here. Simply do dataset$dayDiff <- dataset$close - dataset$open. R is vectorized.
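To make the equivalence concrete, here is a minimal sketch that runs both versions on a small made-up data frame. The values are invented for illustration; only the column names open, close and dayDiff come from the question.

```r
# Hypothetical toy data; only the column names follow the question.
dataset <- data.frame(open = c(10, 12.5, 9), close = c(11, 12, 9.5))

# Loop version, as in the question (writing into a preallocated vector).
loop_result <- numeric(nrow(dataset))
for (i in seq_len(nrow(dataset))) {
  loop_result[i] <- dataset$close[i] - dataset$open[i]
}

# Vectorized version: one subtraction over whole columns.
dataset$dayDiff <- dataset$close - dataset$open

# Both approaches give the same numbers.
same <- identical(loop_result, dataset$dayDiff)
print(same)
```

The subtraction operates on entire columns at once, so there is nothing left to distribute across workers for an operation this cheap.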


How to check if a function has been called from the console?

I am trying to track the number of times certain functions are called from the console.
My plan is to add a simple function such as "trackFunction" in each function that can check whether they have been called from the console or as underlying functions.
Even though the problem sounds straightforward, I can't find a good solution, as my knowledge of functional programming is limited. I've been looking at the call stack and rlang::trace_back, but without finding a good solution.
Any help is appreciated.
Thanks
A simple approach would be to see on which level the current frame lies. That is, if a function is called directly in the interpreter, then sys.nframe() returns 1, otherwise 2 or higher.
Related:
Rscript detect if R script is being called/sourced from another script
myfunc <- function(...) {
  if (sys.nframe() == 1) {
    message("called from the console")
  } else {
    message("called from elsewhere")
  }
}
myfunc()
# called from the console
g <- function() myfunc()
g()
# called from elsewhere
Unfortunately, this may not always be intuitive:
ign <- lapply(1, myfunc)
# called from elsewhere
for (ign in 1) myfunc()
# called from the console
While the lapply family and for loops are similar in many respects, they behave differently here: lapply calls the function from inside its own frame, whereas a for loop does not create a new frame. If this is a problem, perhaps the only way to mitigate it is to analyze/parse the call stack and "ignore" certain functions. If this is what you need, then perhaps this is more appropriate:
R How to check that a custom function is called within a specific function from a certain package
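As a rough sketch of the "ignore certain functions" idea, the helper below takes a call stack (such as the one returned by sys.calls()) and reports whether every caller above the current function is on an ignore list. The name is_direct_call and the default ignore list are made up for illustration and are not part of any package.

```r
# Hypothetical helper: TRUE if every caller above the current function
# is in `ignore` (or if there is no caller at all).
is_direct_call <- function(calls, ignore = c("lapply", "sapply", "vapply")) {
  calls <- as.list(calls)
  # The last entry is the call to the tracked function itself; drop it.
  callers <- calls[-length(calls)]
  caller_names <- vapply(callers, function(cl) {
    fn <- cl[[1]]
    if (is.symbol(fn)) as.character(fn) else paste(deparse(fn), collapse = "")
  }, character(1))
  all(caller_names %in% ignore)
}

# Intended use inside a tracked function (assumption: run at the console):
#   myfunc <- function(...) {
#     if (is_direct_call(sys.calls())) message("called from the console")
#   }

# Hand-built call stacks, mimicking what sys.calls() would contain.
direct      <- list(quote(myfunc()))
via_lapply  <- list(quote(lapply(1, myfunc)), quote(FUN(X[[i]], ...)))
via_wrapper <- list(quote(g()), quote(myfunc()))
```

With these stacks, the direct call and the lapply call are both treated as coming from the console, while the call through g() is not. Calls written with a namespace prefix, such as base::lapply(...), would not match the default ignore list.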

R callback functions using sparklyr

I hope to use mapPartitions and reduce function of Spark (http://spark.apache.org/docs/latest/programming-guide.html), using sparklyr.
It is easy in pyspark: all I need is plain Python code, and I can simply pass Python functions as callback functions.
For example, in pyspark, I can use those two functions as follows:
mapdata = self.rdd.mapPartitions(mycbfunc1(myparam1))
res = mapdata.reduce(mycbfunc2(myparam2))
However, it seems this is not possible in R, for example with the sparklyr library. I checked RSpark, but it seems to be just another way of querying/wrangling data in R, nothing else.
I would appreciate it if someone could tell me how to use those two functions in R, with R callback functions.
In SparkR you could use internal functions - hence the prefix SparkR::: - to accomplish the same.
newRdd = SparkR:::toRDD(self)
mapdata = SparkR:::mapPartitions(newRdd, function(x) { mycbfunc1(x, myparam1)})
res = SparkR:::reduce(mapdata, function(x) { mycbfunc2(x, myparam2)})
I believe sparklyr interfaces only with the DataFrame / DataSet API.

How can I list only functions and those that come from a package

I use the foreach package to parallelize some stuff and I am tired of listing 5 functions in .export every time I need to use it.
I know I can do foreach(...,.export=ls(.GlobalEnv)) but this transfers a lot of data to the workers and slows me down (there can be big tables defined).
So the question is how can I list only functions in the .GlobalEnv
I did that:
getAllFunctions <- function(envir=.GlobalEnv){
  allClasses <- sapply(grep(x=ls(envir), pattern='^%', value=TRUE, invert=TRUE), FUN=function(x){class(eval(parse(text=x)))})
  fnNames <- names(allClasses)[allClasses == 'function']
  return(fnNames)
}
But that's ugly (and gives everything) and I'm sure there is an idiomatic way
From the comments:
as.list(.GlobalEnv)[sapply(.GlobalEnv, is.function)]
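Since .export expects a character vector of names rather than the objects themselves, a small variation may be more convenient. The sketch below uses only base R (ls, mget, Filter); the helper name list_function_names and the demo environment are made up for illustration.

```r
# Hypothetical helper: names of all functions defined directly in `envir`.
list_function_names <- function(envir = .GlobalEnv) {
  objs <- mget(ls(envir), envir = envir)  # named list of objects in envir
  names(Filter(is.function, objs))        # keep only functions, return names
}

# Demo on a throwaway environment instead of .GlobalEnv.
e <- new.env()
assign("f", function(x) x + 1, envir = e)
assign("g", function() NULL, envir = e)
assign("big_table", data.frame(a = 1:3), envir = e)

fn_names <- list_function_names(e)
print(fn_names)
```

The result could then be passed along the lines of foreach(..., .export = list_function_names()), so that only the functions, and not the big tables, are sent to the workers.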

Official guidelines for using functions newly added to base R

I am writing a package that performs a statistical analysis while handling missing values. I am using the wonderful, life-changing function anyNA which was added sometime after 3.0 (commit). Another recently added function that people might want to use is OlsonNames.
Since I am using this function, my package won't work on older versions of R. I see four options for dealing with this.
1. Make the whole package depend on R >= 3.1 in DESCRIPTION.
2. Redefine the function in my source.
3. Redefine the function if the user is using R < 3.1 and don't define it if they are using R >= 3.1, or make the function check the version each time, e.g.
anyNA <- function(x) {
  # getRversion() compares full version numbers correctly
  if (getRversion() >= "3.1.0") {
    return(base::anyNA(x))  # call the base version explicitly to avoid recursion
  } else {
    return(any(is.na(x)))
  }
}
or
if (getRversion() >= "3.1.0") {
  anyNA <- base::anyNA
} else {
  anyNA <- function(x) any(is.na(x))
}
I'm not even sure this second one would work in package source code.
4. Rewrite my code using any(is.na(x)).
My concrete question is: is there an official CRAN preference for one of these?
Failing that, are there good reasons to use one over the others? To my eyes they all have failings. 1) It seems unnecessary to require users have R >= 3.1 for the sake of a small function. 2) If I redefine the function, any improvements made to the function in R base won't get used in my package. 3) This mostly seems messy. But also, if the base R version of the function changes I might end up with hard to fix bugs that only occur in certain R versions. 4) Code readability is reduced.
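For what it's worth, a variant of option 3 is to test for the function itself rather than for the R version. The following is only a sketch: the name any_na_compat is made up, and whether CRAN prefers this over the other options is not something the sketch settles.

```r
# Hypothetical shim: use base anyNA when it exists, otherwise fall back.
any_na_compat <- if (exists("anyNA", envir = baseenv(),
                            mode = "function", inherits = FALSE)) {
  get("anyNA", envir = baseenv())
} else {
  function(x) any(is.na(x))
}

has_na <- any_na_compat(c(1, NA, 3))
no_na  <- any_na_compat(1:3)
```

Using a different name for the shim avoids masking base::anyNA, so improvements to the base function are still picked up on newer R versions.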

R Foreach parallel processing with ffdf mapply function

I have a large ffdf named 'Scenarios' to which I am applying a function from the NGA package. I am already using mychunks to try to speed things up, but it is still slow. Could I also run it with parallel processing, using, say, the foreach package? My current code is shown below:
PGA = (rep(NA,Nevs))
mychunks <- chunk(Scenarios)
for (myblock in mychunks) {
  ScenariosINRAM <- Scenarios[myblock, ]
  PGA[seq(min(myblock), max(myblock))] <- mapply(Sa.ba, ScenariosINRAM$Magnitude, ScenariosINRAM$Rjb, Vs30, ScenariosINRAM$Epsilon, T=0, rake=NA, U=0, SS=1, NS=0, RS=0, AB11=1)
}
I have not had much success with foreach, and I need to speed this up. Any help would be greatly appreciated. Thanks
