I am attempting to implement a particle filter in Rcpp and use OpenMP to parallelise the transition step. I am using dqrng to create thread-safe RNGs, together with the Boost distribution functions, as described here.
The code for the version using the R API can be found here, and the version introducing dqrng here.
The issue I am having is that the R API version gives correct results, verified against alternative implementations, with the density of the estimator being roughly normal as expected. The dqrng version, however, does not appear correct: its estimator density looks wrong and it produces differing results. The density plots can be seen below.
Does anyone have any understanding of why this might be the case?
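For reference, the pattern I am trying to follow is the stream-per-worker seeding that dqrng supports. A rough R-level sketch of that idea (the real code is Rcpp/OpenMP and is linked above):
library(parallel)
library(dqrng)

# R-level sketch only; each worker gets the same seed but a distinct stream,
# so the draws from different workers should not overlap.
cl <- makeCluster(4)
draws <- parLapply(cl, seq_len(4), function(stream) {
  dqrng::dqset.seed(42, stream)  # same seed, worker-specific stream
  dqrng::dqrnorm(5)              # thread-safe normal draws
})
stopCluster(cl)
str(draws)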
Related
Is there any possible explanation for multiple optim instances with fixed starting values returning different results depending on whether they are run in parallel or one after another on a single core?
Basically, I do a rolling forecast, refitting the model each time, so I can easily parallelize over the rolling windows, but the results are different if I do not parallelize...
Sadly, I don't have a simple reproducible example. I know that the results differ if I link against a different BLAS, so could something like different numerical precision or a different set of libraries be the cause?
I am working on a problem that needs numerical integration of a bivariate function, where each evaluation of the function takes about one minute. Since numerical integration on a single core would evaluate the function thousands to tens of thousands of times, I would like to parallelize the calculation. Right now I am using a brute-force approach that evaluates the function on a naive grid of points and adds the values up with appropriate area multipliers. This is definitely not efficient, and I suspect any modern multidimensional numerical integration algorithm would achieve the same precision with far fewer function evaluations. There are many packages in R that calculate 2-d integrals much more efficiently and accurately (e.g. R2Cuba), but I haven't found anything that can easily be parallelized on a cluster with SGE-managed job queues. Since this is only a small part of a bigger research problem, I would like to see whether this can be done with reasonable effort before I try to parallelize one of the cubature-rule based methods in R myself.
I have found that a sparse grid achieves the best compromise between speed and accuracy in multi-dimensional integration, and it is easily parallelized on the cluster because it does not involve any sequential steps. It won't be as accurate as sequentially adaptive integration algorithms, but it is much better than the naive method because it provides a much sparser grid of points to evaluate on each core.
The following R code deals with 2-dimensional integration, but can be easily modified for higher dimensions. The apply function towards the end can be easily parallelized on a cluster.
sg.int <- function(g, ..., lower, upper)
{
  require("SparseGrid")

  lower <- floor(lower)
  upper <- ceiling(upper)
  if (any(lower > upper)) stop("lower must be smaller than upper")

  # lower-left corners of the unit cells that tile the integration region
  gridss <- as.matrix(expand.grid(seq(lower[1], upper[1] - 1, by = 1),
                                  seq(lower[2], upper[2] - 1, by = 1)))

  # sparse-grid nodes and weights on the unit square (KPU rule, accuracy level 5)
  sp.grid <- createIntegrationGrid('KPU', dimension = 2, k = 5)

  # shift the unit-square nodes into each cell; sweep() adds the cell offset
  # column-wise (a plain `gridss[i, ] + sp.grid$nodes` would recycle the offset
  # down the columns and misplace the nodes)
  nodes   <- sweep(sp.grid$nodes, 2, gridss[1, ], "+")
  weights <- sp.grid$weights
  for (i in seq_len(nrow(gridss))[-1]) {
    nodes   <- rbind(nodes, sweep(sp.grid$nodes, 2, gridss[i, ], "+"))
    weights <- c(weights, sp.grid$weights)
  }

  # evaluate g at every node; this apply() is the step to parallelize
  gx.sp  <- apply(nodes, 1, g, ...)
  val.sp <- gx.sp %*% weights
  val.sp
}
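To give a concrete idea of the parallel step (a toy integrand and a local socket cluster here; on an SGE-managed cluster you would instead build the cluster object from the nodes your job was granted), the final apply() can be swapped for a parallel row-wise apply:
library(SparseGrid)
library(parallel)

g <- function(x, a = 1) exp(-a * sum(x^2))     # toy integrand

# build the nodes and weights exactly as sg.int does, for the region [0,3] x [0,3]
sp.grid <- createIntegrationGrid('KPU', dimension = 2, k = 5)
corners <- as.matrix(expand.grid(0:2, 0:2))
nodes   <- do.call(rbind, lapply(seq_len(nrow(corners)),
                                 function(i) sweep(sp.grid$nodes, 2, corners[i, ], "+")))
weights <- rep(sp.grid$weights, nrow(corners))

cl <- makeCluster(4)                           # stand-in for your cluster backend
gx <- parRapply(cl, nodes, g)                  # parallel replacement for apply(nodes, 1, g)
stopCluster(cl)
drop(gx %*% weights)                           # the integral estimate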
I need to implement the model shown here:
http://www.ssc.upenn.edu/~fdiebold/papers/paper55/DRAfinal.pdf
The model estimation step on p.315 notes that:
"We maximize the likelihood by iterating the Marquart and
Berndt–Hall–Hall–Hausman algorithms, using numerical derivatives, optimal
stepsize, and a convergence criterion of 10^-6 for the change in the norm of the
parameter vector from one iteration to the next."
Now I know that Stata supports switching between optimizers,
http://www.stata.com/manuals13/rmaximize.pdf
see the bottom of p. 2.
Is there an R package or Matlab function/s that can do the same thing?
Specifically I need to be able to switch between BHHH and Levenberg-Marquardt.
Kind Regards
Baz
For R, check out the CRAN Task View on Optimization. Searching that page, it looks like both algorithms are available, in separate packages: minpack.lm provides Levenberg-Marquardt and maxLik provides BHHH. You could write your own code to handle switching between them.
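As a very rough sketch of what that switching code could look like, here is a toy example that alternates maxLik's BHHH with minpack.lm's Levenberg-Marquardt on a made-up normal-sample likelihood, stopping on a 1e-6 change in the parameter vector norm. (The pseudo-residual trick used for nls.lm assumes every observation's log-likelihood contribution is negative, which holds for this toy problem but is not general.)
library(maxLik)
library(minpack.lm)

set.seed(1)
x <- rnorm(200, mean = 1, sd = 2)              # made-up data

# observation-wise log-likelihood contributions (required by BHHH)
loglik_obs <- function(theta) dnorm(x, mean = theta[1], sd = exp(theta[2]), log = TRUE)

# nls.lm minimises a sum of squared residuals, so feed it sqrt(-l_i);
# this equals minimising -sum(l_i) as long as every contribution l_i is negative
resid_fun <- function(theta) sqrt(-loglik_obs(theta))

theta <- c(0, 0)                               # starting values: mean, log sd
for (it in 1:20) {
  theta_bhhh <- coef(maxLik(loglik_obs, start = theta, method = "BHHH"))
  theta_lm   <- nls.lm(par = theta_bhhh, fn = resid_fun)$par
  if (sqrt(sum((theta_lm - theta)^2)) < 1e-6) { theta <- theta_lm; break }
  theta <- theta_lm
}
theta                                          # estimated (mean, log sd)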
I need to do efficient d-dimensional point searches and also efficient k-NN queries on points in d dimensions, so I require an R-tree library: one that builds the R-tree structure, which I can then query whenever needed.
I also need a library like METIS or hMETIS, although my application does not involve hypergraphs; my requirement is to find the minimum cut set of a graph that divides it into two roughly equal-sized subgraphs.
The thing is, I need libraries that support this from R.
I have found the RANN package, which provides kd-tree based k-NN queries, but the problem is that I either have to make all the k-NN queries at once and store the results in a huge array, or call the query function (nn or nn2) every time I need a result, which defeats the O(n log n) retrieval time.
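(For reference, the batched usage I mean looks roughly like this, with made-up data:)
library(RANN)

set.seed(1)
pts <- matrix(runif(1000 * 3), ncol = 3)   # 1000 points in 3 dimensions

# all k-NN queries at once: indices and distances of the 5 nearest neighbours
# of every point, returned as one big matrix each
res <- nn2(data = pts, query = pts, k = 5)
dim(res$nn.idx)    # 1000 x 5 neighbour indices
dim(res$nn.dists)  # 1000 x 5 distances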
Can anyone tell me if there are any such libraries in R?
Note: I need the R-tree library to implement clustering algorithms efficiently, and the graph-partitioning library to implement the CHAMELEON clustering algorithm.
After some study of R and its libraries, I think it is better to find the required libraries, or write my own code, in C or C++ and then use it from R through the .C() or .Call() interface.
Although this is an old question, I have recently written something along these lines for the graph-partitioning part, namely:
1. A Kernighan-Lin-like algorithm.
2. An algorithm that finds an approximately balanced connected partition using the method suggested by Chlebíková (1996).
3. An algorithm that takes the solution found by method 2 and tries to minimize the cut cost with a Kernighan-Lin-like algorithm while still requiring that the two sets in the partition are connected.
On the graphs I am working with, method 3 often finds quite a good solution, even for bigger graphs (say ~1-4 million edges and ~1 million vertices), in seconds or a few minutes. The implementation is in the pedmod package at https://github.com/boennecd/pedmod. Call the following to install the package and open a vignette with further details:
remotes::install_github("boennecd/pedmod", build_vignettes = TRUE)
vignette("pedigree_partitioning", package = "pedmod")
I am not sure how my implementation compares with other software in terms of speed and quality of the partition, though.
References
Chlebíková, Janka. 1996. “Approximating the Maximally Balanced Connected Partition Problem in Graphs.” Information Processing Letters 60 (5): 225–30.
Kernighan, B. W., and S. Lin. 1970. “An Efficient Heuristic Procedure for Partitioning Graphs.” The Bell System Technical Journal 49 (2): 291–307.
I'm testing a simple moving-average crossover strategy in R. Instead of running a huge simulation over the two-dimensional parameter space (length of the short-term moving average, length of the long-term moving average), I'd like to implement the Particle Swarm Optimization algorithm to find the optimal parameter values. I've been browsing the web and have read that this algorithm is very effective. Moreover, the way the algorithm works fascinates me...
Does anybody of you guys have experience with implementing this algorithm in R? Are there useful packages that can be used?
Thanks a lot for your comments.
Martin
Well, there is a package available on CRAN called pso, and indeed it is a particle swarm optimizer (PSO).
I recommend this package.
It is under active development (last update 22 Sep 2010) and is consistent with the reference implementation of PSO. In addition, the package includes functions for diagnostics and plotting results.
It certainly appears to be a sophisticated package, yet the main function interface (the function psoptim) is straightforward: just pass in a few parameters that describe your problem domain, plus a cost function.
More precisely, the key arguments to pass in when you call psoptim are:
an initial parameter vector (par), whose length sets the dimension of the problem;
lower and upper bounds for each variable (lower, upper); and
a cost function (fn).
There are other parameters in the psoptim function signature, but they are generally related to convergence criteria and the like.
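As a quick illustration (a toy quadratic cost, not the moving-average backtest; in the crossover application fn would backtest the strategy for a given pair of window lengths and return, say, the negative Sharpe ratio):
library(pso)

fn <- function(p) (p[1] - 10)^2 + (p[2] - 50)^2   # toy cost function

res <- psoptim(par = c(NA, NA),        # length fixes the dimension (2 here)
               fn = fn,
               lower = c(1, 20),       # bounds for the two parameters
               upper = c(50, 200),
               control = list(maxit = 200))
res$par    # best parameters found
res$value  # corresponding cost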
Are there any other PSO implementations in R?
There is an R package called ppso (for parallel PSO), available on R-Forge. I do not know much about this package; I have downloaded it and skimmed the documentation, but that's it.
Beyond those two, none that I am aware of. About three months ago I looked for R implementations of the more popular meta-heuristics, and these are the only PSO implementations I found. The R bindings to the GNU Scientific Library (GSL) have a simulated annealing algorithm, but none of the biologically inspired meta-heuristics.
The other place to look is of course the CRAN Task View on Optimization. I did not find any PSO implementation there beyond the ones mentioned here, though there are quite a few packages listed and for most of them I only looked at the name and the one-sentence summary.