Is there an R function to parallelize nlminb()?

The function BEKK11 from the MTS package uses nlminb(start = par, objective = mlikeG, RTN = RTN, include.mean = include.mean, lower = c1, upper = c2) to optimize the likelihood and thereby fit the BEKK model. Unfortunately, R uses only one core for the optimization, which takes ages. Is there any way to get R to use more cores for nlminb()? I have used the doParallel package before, but it only applies to looping constructs, not to optimization functions.
Thank you very much in advance!

Related

Can I run R parallel computing along side JuliaCall?

I have an R function that runs with JuliaCall. However, because of the computation time, I would like to run my code in parallel. The code below produces an error saying that the Julia packages Distributions and PoissonRandom do not exist, even though they have already been installed. My question: is it possible to combine the parallel package, JuliaCall, and Julia packages?
library(JuliaCall)

two_part_sim_RTLs <- function(station_data) {
  # Start Julia and load the Julia packages used below
  julia_setup(JULIA_HOME = "C:/Users/Kenneth Kin Pomeyie/AppData/Local/Programs/Julia-1.6.1/bin/")
  julia_library("Distributions")
  julia_library("DataFrames")
  julia_library("PoissonRandom")

  # Pull the station parameters out of the input data
  lambda   <- station_data$lambda
  location <- station_data$location
  scale    <- station_data$scale
  shape    <- station_data$shape

  # Copy the parameters into the Julia session
  julia_assign("lambda", lambda)
  julia_assign("location", location)
  julia_assign("scale", scale)
  julia_assign("shape", shape)

  # Simulate in Julia: truncated Poisson draws, then element-wise maxima
  # against generalized Pareto draws
  julia_eval("x = rand(Truncated(Poisson(lambda), 0.0, 10), 50000000)")
  julia_eval("load = Array{Float64}(undef, 50000000)")
  julia_eval("
    for i in 1:50000000
      load[i] = max(rand(GeneralizedPareto(location, scale, shape)), x[i])
    end
  ")

  # Bring the simulated loads back into R and return them
  load <- julia_eval("load")
  load
}
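One pattern that might be worth trying (a sketch only, not a verified fix): have every worker initialize its own Julia session, since the Julia state of the master R process is not shared with parallel workers. The cluster object cl, the core count, and the station_list input below are illustrative names, not from the original code.

library(parallel)

cl <- makeCluster(4)                    # illustrative core count
clusterEvalQ(cl, library(JuliaCall))    # load JuliaCall on every worker
clusterExport(cl, "two_part_sim_RTLs")  # ship the function to the workers

# two_part_sim_RTLs() calls julia_setup() and julia_library() itself, so each
# worker starts its own Julia session and loads the Julia packages it needs.
results <- parLapply(cl, station_list, two_part_sim_RTLs)

stopCluster(cl)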

Error in optim: L-BFGS-B needs finite value of fn

I am trying to run the impute_errors() function of the imputeTestBench package for a series of values. I am using six user-defined methods to select the best imputation method. Below is my code:
  correctedSalesHistoryMatrix[, 1:2],
  matrix(unlist(apply(X = as.matrix(correctedSalesHistoryMatrix[, -c(1, 2)]),
                      MARGIN = 1,
                      FUN = impute_errors,
                      smps = "mcar",
                      methods = c("imputationMethod1",
                                  "imputationMethod2",
                                  "imputationMethod3",
                                  "imputationMethod4",
                                  "imputationMethod5",
                                  "imputationMethod6"),
                      methodPath = "C:\\Documents\\Imputations.R",
                      errorParameter = "mape",
                      missPercentFrom = 10,
                      missPercentTo = 10)),
         nrow = nrow(correctedSalesHistoryMatrix), byrow = T)
)
With a small dataset, the function executes successfully. With a large dataset, I get the following error:
Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0, :
L-BFGS-B needs finite values of 'fn'
Called from: optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0,
np + 1L), upper = rep(Inf, np + 1L), control = optim.control)
I don't think this is an easy fix.
The error is probably not caused by imputeTestBench itself, but rather by one of your user-defined imputation methods.
Run impute_errors as before, but with only na_mean as the method instead of your user-defined methods (impute_errors(..., methods = 'na_mean')) to check whether this is the case.
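For example, reusing the names and arguments from the question, a quick diagnostic call on a single row might look like this; if it runs cleanly, the culprit is one of the user-defined methods:

# Diagnostic sketch: same arguments as before, but only the built-in na_mean method
impute_errors(as.numeric(correctedSalesHistoryMatrix[1, -c(1, 2)]),
              smps = "mcar",
              methods = "na_mean",
              errorParameter = "mape",
              missPercentFrom = 10,
              missPercentTo = 10)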
The error itself occurs quite often and has to do with stats::optim receiving inputs it can't deal with. Most likely you are not calling stats::optim directly in your user-defined imputation methods (so you can't easily fix the input); rather, a package you are using does some calculations and then calls stats::optim. Or, even worse, a package you are using relies on another package that calls stats::optim.
In the answers to this question you can see an explanation of the underlying problem. Overall, it seems to occur especially for large datasets, when the fn input to stats::optim becomes Inf.
Here are some examples of the problem also occurring for different R packages and functions (which all use stats::optim somewhere internally): 1, 2, 3
There is not too much you can do overall if you don't want to dig extremely deep into the underlying packages.
If you are using the imputeTS package for one of your user-supplied imputation methods, a workaround is proposed in this GitHub issue, which might help if the error occurs within the na_kalman or na_seadec method.

How to run a single model using parallel processing in R?

I am running a single network model in R, and I want to use parallel processing to run it. The examples I see online are usually for running multiple operations in parallel.
library(latentnet)

model.2 <- ergmm(network.M.CS ~ euclidean(d = 2, G = 2) +
                   nodematch("Party", diff = F) +
                   nodematch("State", diff = F) +
                   absdiff("Ideology") +
                   edgecov(Donor.Network),
                 response = "Norm.Num.Bill.CS",
                 family = c("Bernoulli"),
                 control = ergmm.control(burnin = 20000, sample.size = 4000, interval = 10),
                 verbose = T)
summary(model.2)
As an example, this is the model I would like to run. Any tips you can offer that would allow me to use parallel processing on this single model would be much appreciated. Thank you!
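There is no general way to multithread a single fit from the outside, but depending on your installed version of latentnet, ergmm.control() may accept a threads argument to run the sampling in parallel; this is an assumption on my part, so check ?ergmm.control before relying on it:

# Hedged sketch: if your latentnet version supports it, request parallel sampling
# threads in the control object and pass it via control = ctrl in the ergmm() call above.
ctrl <- ergmm.control(burnin = 20000, sample.size = 4000,
                      interval = 10, threads = 4)   # threads = 4 is illustrative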

Why using a custom proximity function in proxy::dist makes it so slow in R?

I am trying to use the proxy::dist function with a custom distance function, but what I have now is very slow.
This is a reproducible example of how I call my custom function:
set.seed(1)
test  <- matrix(runif(4200), 60, 70)
train <- matrix(runif(4200), 60, 70)

dMatrix <- proxy::dist(x = test, y = train, method = customDTW,
                       by_rows = T,
                       auto_convert_data_frames = T)
which is supposed to calculate the distance between each time series in test matrix with all time series in the train matrix (each row being a time series).
My custom function is:
library(dtw)

customDTW <- function(ts1, ts2) {
  d <- dtw(ts1, ts2,
           dist.method = "Euclidean",
           window.type = "sakoechiba",
           window.size = 20)
  return(d$distance)
}
The problem is that, compared to using method = "DTW", or even to calculating the distance matrix myself, this is extremely slow, and as the length or the number of the time series grows, it gets dramatically slower. Of course this is rooted in the nested loop, but I am surprised by the scale of the effect. There must be another reason that I am not seeing.
My question is: how else could I implement my customDTW to make it faster while still using proxy::dist?
This is my little experiment on the execution time:
Execution time for 60X7 (using proxy::dist + customDTW)
user system elapsed
2.852 0.012 2.867
Execution time for 60X70 (using proxy::dist + customDTW)
user system elapsed
5.384 0.000 5.382
Execution time for 60X700 (using proxy::dist + customDTW)
user system elapsed
509.088 18.652 529.115
Execution time for 60X700 (without using proxy::dist)
user system elapsed
26.696 0.004 26.753
DTW is slow by nature
Have you considered trying dtwclust, a parallelized implementation of DTW?
https://github.com/asardaes/dtwclust
https://cran.r-project.org/web/packages/dtwclust/vignettes/dtwclust.pdf
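As a rough illustration of what that could look like with the call from the question (assuming dtwclust, which registers a fast C implementation named dtw_basic with proxy when the package is loaded):

# Sketch: dtw_basic fills the cross-distance matrix in compiled code, so proxy
# does not have to call back into interpreted R for every pair of series.
library(dtwclust)
dMatrix <- proxy::dist(x = test, y = train, method = "dtw_basic",
                       window.size = 20)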
R is an interpreted language, and under the hood it is implemented in C. As far as I understand, the proxy package uses R's evaluation facilities from within C to call your R function many times, but that still can't avoid the interpretation overhead, so almost any "pure" R implementation will be slower.
Specifying loop = TRUE when registering a function with proxy means exactly that: proxy will call your R code once per pair to fill the distance matrix. If you really want to speed things up, you'd need to implement the filling itself in C/C++ and register the function with proxy using loop = FALSE; this is what dtwclust does (among other things).
You might want to look at the parallelDist package if you want to test your own custom C/C++ functions, even if you don't want to use parallelization.
This is what I found to improve the speed, although it is still not as fast as I would expect. (Any other ideas are still very welcome.)
The trick is to register the custom distance function with proxy (see the Registry of proximities, ?pr_DB) so that you can use it like a built-in distance measure. So, first:
proxy::pr_DB$set_entry(FUN = customDTW, names = c("customDTW"),
                       loop = TRUE, type = "metric", distance = TRUE)
and now you can use it as if it were already built into the proxy package:
dMatrix <- proxy::dist(x = test, y = train, method = "customDTW",
                       by_rows = T,
                       auto_convert_data_frames = T)
Note: if you want to use this approach, the custom function has to handle one pair of time series at a time, instead of all of them. So the function would look like this:
customDTW2 <- function(ts1, ts2) {
  d <- dtw(ts1, ts2,
           dist.method = "Euclidean",
           window.type = "sakoechiba",
           window.size = 20)
  return(d$distance)
}
For more, see ?pr_DB.
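One practical note (an addition of mine, not from the original answer): the registry stores its own copy of the function, so if you later edit the custom function you have to remove the entry and register it again, for example:

# Re-register after changing the custom distance function
proxy::pr_DB$delete_entry("customDTW")
proxy::pr_DB$set_entry(FUN = customDTW2, names = c("customDTW"),
                       loop = TRUE, type = "metric", distance = TRUE)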

R package mlr: option for custom kernels in classif.ksvm

I want to build an svm with a custom kernel.
Usually I use the R package kernlab for that.
Since I want to try out different kernels and tune the hyperparameters, I wanted to use the nice mlr package. However, as far as I can see, it doesn't support the kernel option "matrix" for passing a custom kernel to the ksvm learner ("classif.ksvm").
Does somebody know if there is a plan to fix that, or if there is another package which allows custom kernels along with nice wrappers for tuning parameters and resampling methods? To my knowledge, the caret package does not accept custom kernels either.
We don't have any plans to support that. You can define a custom learner to support this quite easily though. There are two changes needed to classif.ksvm, as far as I can see (untested).
First, allow the new parameter value for the kernel parameter:
makeDiscreteLearnerParam(id = "kernel", default = "rbfdot",
                         values = c("vanilladot", "polydot", "rbfdot", "tanhdot", "laplacedot",
                                    "besseldot", "anovadot", "splinedot", "matrix"))
Then, change the train function to take the new kernel into account:
trainLearner.classif.ksvm = function(.learner, .task, .subset, .weights = NULL, degree, offset,
                                     scale, sigma, order, length, lambda, normalized, ...) {
  # Collect the kernel parameters that were explicitly set
  kpar = learnerArgsToControl(list, degree, offset, scale, sigma, order, length, lambda, normalized)
  f = getTaskFormula(.task)
  pm = .learner$predict.type == "prob"
  parlist = list(...)
  if (base::length(kpar) > 0L)
    kernlab::ksvm(f, data = getTaskData(.task, .subset), kpar = kpar, prob.model = pm, ...)
  else if (parlist$kernel == "matrix")  # treat the task data as the precomputed kernel matrix
    kernlab::ksvm(kernlab::as.kernelMatrix(getTaskData(.task, .subset)), data = getTaskData(.task, .subset), prob.model = pm, ...)
  else
    kernlab::ksvm(f, data = getTaskData(.task, .subset), prob.model = pm, ...)
}
This is assuming that the data you're passing in the task defines the custom kernel, which is a bit kludgy...
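For completeness, a hypothetical usage sketch of the modified learner, under the assumption above that the task data is the precomputed kernel matrix itself (K and y are made-up names for the kernel matrix and the labels):

# Hypothetical: store the precomputed kernel matrix as the task data
kernel.df <- data.frame(K, target = y)
task <- makeClassifTask(data = kernel.df, target = "target")
lrn  <- makeLearner("classif.ksvm", kernel = "matrix")
mod  <- train(lrn, task)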
