I'm checking a simple moving average crossing strategy in R. Instead of running a huge simulation over the 2 dimenional parameter space (length of short term moving average, length of long term moving average), I'd like to implement the Particle Swarm Optimization algorithm to find the optimal parameter values. I've been browsing through the web and was reading that this algorithm was very effective. Moreover, the way the algorithm works fascinates me...
Does anybody of you guys have experience with implementing this algorithm in R? Are there useful packages that can be used?
Thanks a lot for your comments.
Martin
Well, there is a package available on CRAN called pso, and indeed it is a particle swarm optimizer (PSO).
I recommend this package.
It is under actively development (last update 22 Sep 2010) and is consistent with the reference implementation for PSO. In addition, the package includes functions for diagnostics and plotting results.
It certainly appears to be a sophisticated package yet the main function interface (the function psoptim) is straightforward--just pass in a few parameters that describe your problem domain, and a cost function.
More precisely, the key arguments to pass in when you call psoptim:
dimensions of the problem, as a vector
(par);
lower and upper bounds for each
variable (lower, upper); and
a cost function (fn)
There are other parameters in the psoptim method signature; those are generally related to convergence criteria and the like).
Are there any other PSO implementations in R?
There is an R Package called ppso for (parallel PSO). It is available on R-Forge. I do not know anything about this package; i have downloaded it and skimmed the documentation, but that's it.
Beyond those two, none that i am aware of. About three months ago, I looked for R implementations of the more popular meta-heuristics. This is the only pso implementation i am aware of. The R bindings to the Gnu Scientific Library GSL) has a simulated annealing algorithm, but none of the biologically inspired meta-heuristics.
The other place to look is of course the CRAN Task View for Optimization. I did not find another PSO implementation other than what i've recited here, though there are quite a few packages listed there and most of them i did not check other than looking at the name and one-sentence summary.
Related
I have a calculation in R that needs to iteratively call a function for a fixed point contraction mapping. I've been using the squarem function out of the SQUAREM package by Ravi Varadhan. Today while trying to figure out a way around an issue I was having with squarem I came across the TURBOEM package, also by Varadhan. At first glance TURBOEM seems to do the same things as SQUAREM, but with additional functionality in some dimensions.
Does anyone know whether one or the other of these packages is preferred, either in general or for particular applications? Is one more current/updated than the other? TURBOEM seems to have the ability to customize the convergence criterion, which might get me out of the current bind I'm in, but I'm concerned there might be other issues. Obviously I can go off and test the corresponding functions from each package, but if someone out there knows some background on the two packages it might save me a ton of time.
There are four underlying SQUAREM algorithms used by each package. They are effectively identical*. You can see the underlying functions for yourself by using:
SQUAREM:::cyclem1
SQUAREM:::cyclem2
SQUAREM:::squarem1
SQUAREM:::squarem2
turboEM:::bodyCyclem1
turboEM:::bodyCyclem2
turboEM:::bodySquarem1
turboEM:::bodySquarem2
* apart from some differences due to the way in which these are used within the packages. And the argument method in SQUAREM is called version in turboEM
I would say turboEM would probably be preferred in general, for the following reasons:
As you mention, turboEM allows the user to select the convergence criterion, either based on the L2-norm of the change in the parameter vector (convtype = "parameter"), the L1-norm of the change in the objective function (convtype = "objfn"), or by specifying a custom function (convfn.user). SQUAREM only checks convergence using the L2-norm of the change in parameter vector.
turboEM can also stop the algorithm prior to convergence based on either the number of iterations (stoptype = "maxiter") or the amount of time elapsed (stoptype = "maxtime"). SQUAREM only stops after the specified number of iterations.
The pconstr and project arguments to turboem allow the user to define parameter space constraints and a function that projects estimates back into the parameter space if these are violated. SQUAREM does not have this functionality.
turboEM can easily apply multiple versions of the algorithm to the same data (e.g. with different orders, step sizes, ...), by providing a vector to the method argument and a list to the control.method argument...
... and it can do this in parallel via the foreach package.
turboEM also offers a convenient interface through which to apply a vanilla EM algorithm, as well as EM acceleration schemes other than SQUAREM: parabolic EM (method = "pem"), dynamic ECME ("decme") and quasi-Newton ("qn").
The turboEM package also provides the turboSim function, which allows the user to easily conduct benchmark studies comparing the different acceleration schemes.
The one downside that I can see to using turboEM instead of SQUAREM is that, if you are really interested in the particulars of the SQUAREM algorithm, the trace provided by squarem gives more specific information (residual, extrapolation, step length) than that provided by turboem (objective function [if calculated], iteration number, L2-norm of parameter change).
One final aside: The current version of SQUAREM on CRAN (v 2016.8-2) has indeed been updated more recently than the version of turboEM on CRAN (v 2014.8-1). However, the NEWS suggests that the only updates to SQUAREM since December 2010 have been vignettes and demos, while turboEM's first release was in December 2011.
Thanks for your interest in SQUAREM and turboEM. I am the author of both packages. In future, you may contact me directly with any questions.
The goals of the 2 packages are different. SQUAREM implements one class of acceleration methods. turboEM on the other hand includes a variety of state-of-art EM acceleration techniques. The goal of turboEM is to provide a go-to-place for all your EM acceleration needs! In particular, turboEM allows you to benchmark the different algorithms for your problem and determine the best one. In my experience, the squarem class of algorithms most often out perform the other 3 classes (quasi-Newton, dynamic EM, and parabolic EM). Hence, you might also directly use the SQUAREM package. However, turboEM has a number of additional features as pointed out by Mark.
Ravi Varadhan
I have used biglm in R and found it very useful. Now I need the same type of functionality in python. Any ideas? I have seen that patsy/statsmodels has an incremental mode, but have not been able to find any samples to copy/adapt. Any pointers would be much appreciated.
from a related answer of Nathaniel Smith on the statsmodels mailing list
My incremental LS code might be useful here, it's basically the same
problem:
https://github.com/njsmith/pyrerp/blob/master/pyrerp/incremental_ls.py#L330
The new X'X is the sum of the old X'Xs, then you have to re-do the
scaling and inversion to get the new vcov matrix for the estimates.
Should be doable so long as you know how many data points are in each
and the various sums-of-squares. (The code I linked has some extra
complexity because of handling a particular sort of heteroskedasticity
via FGLS, but it can pretty much be ignored.)
statsmodels doesn't have anything in this area yet.
There is an incremental OLS function in statsmodels, however that was written as helper function for cusum tests (in memory) and hasn't been used or checked for any other purpose:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.diagnostic.recursive_olsresiduals.html
I need to solve (many times, for lots of data, alongside a bunch of other things) what I think boils down to a second order cone program. It can be succinctly expressed in CVX something like this:
cvx_begin
variable X(2000);
expression MX(2000);
MX = M * X;
minimize( norm(A * X - b) + gamma * norm(MX, 1) )
subject to
X >= 0
MX((1:500) * 4 - 3) == MX((1:500) * 4 - 2)
MX((1:500) * 4 - 1) == MX((1:500) * 4)
cvx_end
The data lengths and equality constraint patterns shown are just arbitrary values from some test data, but the general form will be much the same, with two objective terms -- one minimizing error, the other encouraging sparsity -- and a large number of equality constraints on the elements of a transformed version of the optimization variable (itself constrained to be non-negative).
This seems to work pretty nicely, much better than my previous approach, which fudges the constraints something rotten. The trouble is that everything else around this is happening in R, and it would be quite a nuisance to have to port it over to Matlab. So is doing this in R viable, and if so how?
This really boils down to two separate questions:
1) Are there any good R resources for this? As far as I can tell from the CRAN task page, the SOCP package options are CLSCOP and DWD, which includes an SOCP solver as an adjunct to its classifier. Both have similar but fairly opaque interfaces and are a bit thin on documentation and examples, which brings us to:
2) What's the best way of representing the above problem in the constraint block format used by these packages? The CVX syntax above hides a lot of tedious mucking about with extra variables and such, and I can just see myself spending weeks trying to get this right, so any tips or pointers to nudge me in the right direction would be very welcome...
You might find the R package CVXfromR useful. This lets you pass an optimization problem to CVX from R and returns the solution to R.
OK, so the short answer to this question is: there's really no very satisfactory way to handle this in R. I have ended up doing the relevant parts in Matlab with some awkward fudging between the two systems, and will probably migrate everything to Matlab eventually. (My current approach predates the answer posted by user2439686. In practice my problem would be equally awkward using CVXfromR, but it does look like a useful package in general, so I'm going to accept that answer.)
R resources for this are pretty thin on the ground, but the blog post by Vincent Zoonekynd that he mentioned in the comments is definitely worth reading.
The SOCP solver contained within the R package DWD is ported from the Matlab solver SDPT3 (minus the SDP parts), so the programmatic interface is basically the same. However, at least in my tests, it runs a lot slower and pretty much falls over on problems with a few thousand vars+constraints, whereas SDPT3 solves them in a few seconds. (I haven't done a completely fair comparison on this, because CVX does some nifty transformations on the problem to make it more efficient, while in R I'm using a pretty naive definition, but still.)
Another possible alternative, especially if you're eligible for an academic license, is to use the commercial Mosek solver, which has an R interface package Rmosek. I have yet to try this, but may give it a go at some point.
(As an aside, the other solver bundled with CVX, SeDuMi, fails completely on the same problem; the CVX authors aren't kidding when they suggest trying multiple solvers. Also, in a significant subset of cases, SDTP3 has to switch from Cholesky to LU decomposition, which makes the processing orders of magnitude slower, with only very marginal improvement in the objective compared to the pre-LU steps. I've found it worth reducing the requested precision to avoid this, but YMMV.)
There is a new alternative: CVXR, which comes from the same people.
There is a website, a paper and a github project.
Disciplined Convex Programming seems to be growing in popularity observing cvxpy (Python) and Convex.jl (Julia), again, backed by the same people.
I am currently using RGP as a genetic programming library. If anyone has an idea for another library (better documentation, more active development, etc.) I would like to hear your suggestions.
The question is rather simple: given a function with n parameters in R, how can i find the global minimum using genetic programming. I tried modifying one of the example programs but it seems this example uses linear regression which I don't think is appropriate in my situation.
Does anyone have any example code i could use?
I can recommend to use HeuristicLab. There are some algorithms implemented: Genetic Algorithm, Evolution Strategy, Simulated Annealing, Particle Swarm Optimization, and more which might be interesting if you're looking into the minimization of real-valued functions. The software is implemented in C# and runs on Windows. It offers a GUI where you can optimize several provided test functions (Rosenbrock, Schaffer, Ackley, etc.). There's also a very good implementation of genetic programming (GP) available, but from my impression you don't need GP. In genetic programming you evolve a function given the output data of an unknown function. I think in your case the function is known and you need to find those parameters that minimize the function's output.
The latest major version of the software was released to the public in 2010 and has since been further developed in several minor releases. We now have a release about two times a year. There's a google group where you can ask for help which is getting more and more active and there are some video tutorials that show the functionality. Check out the tour video on youtube which gives an overview of the features in less than 3 minutes. The research group around Prof. Affenzeller - a researcher in the field of Metaheuristics - has developed this software and is situated in Austria. I'm part of this group also.
Check out the howtos how you can implement your function in the GUI or, if you know C#, how you can implement your problem as a plugin.
You can use a genetic algorithm instead of GP to find the minimum of a function with n variables.
Basically what you do is:
assign initial values
generate initial population of n chromosomes
While (true)
evaluate fitness f(x, y) for each chromosome
if we reach a satisfactory solution of f(x, y) → Exit the loop
create the selection scheme (tournament selection)
select chromosomes (selection):
elitism
crossover
create mutations
alter duplicated chromosomes
replace the original population of chromosomes
end While
Crossposted with STATS.se since this problem could straddle both STATs.se/SO
https://stats.stackexchange.com/questions/17712/parallelize-solve-for-ax-b
I have some extremely large sparse matrices created using spMatrix function from the matrix package.
Using the solve() function works for my Ax=b issue, but it takes a very long time. Several days.
I noticed that http://cran.r-project.org/web/packages/RScaLAPACK/RScaLAPACK.pdf
appears to have a function that can parallelize the solve function, however, it can take several weeks to get new packages installed on this particular server.
The server already has the snow package installed it.
So
Is there a way of using snow to parallelize this operation?
If not, are there other ways to speed up this type of operation?
Are there other packages like RScaLAPACK? My search on RScaLAPACK seemed to suggest people had a lot of issues with it.
Thanks.
[EDIT] -- Additional details
The matrices are about 370,000 x 370,000.
I'm using it to solve for alpha centrality, http://en.wikipedia.org/wiki/Alpha_centrality. I was originally using the alpha centrality function in the igraph package, but it would crash R.
More details
This is on a single machine with 12 cores and 96 gigs of memory (I believe)
It's a directed graph along the lines of paper citation relationships.
Calculating condition number and density will take awhile. Will post as it comes available.
Will crosspost on stat.SE and will add a link back to here
[Update 1: For those just tuning in: The original question involved parallelizing computations to solving a regression problem; given that the underlying problem is related to alpha centrality, some of the issues, such as bagging and regularized regression may not be as immediately applicable, though that leads down the path of further statistical discussions.]
There are a bundle of issues to address here, from the infrastructural to the statistical.
Infrastructure
[Updated - also see Update #2 below.]
Regarding parallelized linear solvers, you can replace R's BLAS / LAPACK library with one that supports multithreaded computations, such as ATLAS, Goto BLAS, Intel's MKL, or AMD's ACML. Personally, I use AMD's version. ATLAS is irritating, because one fixes the number of cores at compilation, not at run-time. MKL is commercial. Goto is not well supported anymore, but is often the fastest, but only by a slight margin. It's under the BSD license. You can also look at Revolution Analytics's R, which includes, I think, the Intel libraries.
So, you can start using all of the cores right away, with a simple back-end change. This could give you a 12X speedup (b/c of the number of cores) or potentially much more (b/c of better implementation). If that brings down the time to an acceptable range, then you're done. :) But, changing the statistical methods could be even better.
You've not mentioned the amount of RAM available (or the distribution of it per core or machine), but A sparse solver should be pretty smart about managing RAM accesses and not try to chew on too much data at once. Nonetheless, if it is on one machine and if things are being done naively, then you may encounter a lot of swapping. In that case, take a look at packages like biglm, bigmemory, ff, and others. The former addresses solving linear equations (or GLMs, rather) in limited memory, the latter two address shared memory (i.e. memory mapping and file-based storage), which is handy for very large objects. More packages (e.g. speedglm and others) can be found at the CRAN Task View for HPC.
A semi-statistical, semi-computational issue is to address visualization of your matrix. Try sorting by the support per row & column (identical if graph is undirected, else do one then the other, or try a reordering method like reverse Cuthill-McKee), and then use image() to plot the matrix. It would be interesting to see how this is shaped, and that affects which computational and statistical methods one could try.
Another suggestion: Can you migrate to Amazon's EC2? It is inexpensive, and you can manage your own installation. If nothing else, you can prototype what you need and migrate it in-house once you have tested the speedups. JD Long has a package called segue that apparently makes life easier for distributing jobs on Amazon's Elastic MapReduce infrastructure. No need to migrate to EC2 if you have 96GB of RAM and 12 cores - distributing it could speed things up, but that's not the issue here. Just getting 100% utilization on this machine would be a good improvement.
Statistical
Next up are multiple simple statistical issues:
BAGGING You could consider sampling subsets of your data in order to fit the models and then bag your models. This can give you a speedup. This can allow you to distribute your computations on as many machines & cores as you have available. You can use SNOW, along with foreach.
REGULARIZATION The glmnet supports sparse matrices and is very fast. You would be wise to test it out. Be careful about ill-conditioned matrices and very small values of lambda.
RANK Your matrices are sparse: are they full rank? If they are not, that could be part of the issue you're facing. When matrices are either singular or very nearly so (check your estimated condition number, or at least look at how your 1st and Nth eigenvalues compare - if there's a steep drop off, you're in trouble - you might check eval1 versus ev2,...,ev10,...). Again, if you have nearly singular matrices, then you need to go back to something like glmnet to shrink out the variables are either collinear or have very low support.
BOUNDING Can you reduce the bandwidth of your matrix? If you can block diagonalize it, that's great, but you'll likely have cliques and members of multiple cliques. If you can trim the most poorly connected members, then you may be able to estimate their alpha centrality as being upper bounded by the lowest value in the same clique. There are some packages in R that are good for this sort of thing (check out Reverse Cuthill-McKee; or simply look to see how you'd convert it into rectangles, often relating to cliques or much smaller groups). If you have multiple disconnected components, then, by all means, separate the data into separate matrices.
ALTERNATIVES Are you wedded to the Alpha Centrality? There may be other measures that are monotonically correlated (i.e. have high rank correlation) with the same value that could be calculated more cheaply or at least implemented quite efficiently. If those will work, then your analyses could proceed with a lot less effort. I have a few ideas, but SO isn't really the place to go about that discussion.
For more statistical perspectives, appropriate Q&A should occur on the stats.stackexchange.com, Cross-Validated.
Update 2: I was a bit too quick in answering and didn't address this from the long-term perspective. If you are planning to do research on such systems for the long-term, you should look at other solvers that may be more applicable to your type of data and computing infrastructure. Here is a very nice directory of the options for both solvers and pre-conditioners. It seems this doesn't include IBM's "Watson" solver suite. Although it may take weeks to get software installed, it's quite possible that one of the packages is already installed if you have a good HPC administrator.
Also, keep in mind that R packages can be installed to the user directory - you need not have a package installed in the general directory. If you need to execute something as a user other than yourself, you could also download a package to the scratch or temporary space (if you're running within just 1 R instance, but using multiple cores, check out tempdir).