julia vector space model package? - vector

just starting with Julia and the jetson nano. I want to look into using the 128 gpu's in implementing a vector space model. I understand that Julia has many packages but I can't find one for vector space modelling. I'm pretty sure one exists and I am not searching in the right place. If anyone knows a julia package that has the basis for building a rudimentary vector space model from a set of documents I would be grateful to hear about it.
thank you

Related

Handling extremely small numbers

I need to find a way to handle extremely small numbers in R, particularly in order to take the log of extremely small numbers. According to the R-manual, “on a typical R platform the smallest positive double is about 5e-324.” Well, I need to deal with numbers even smaller (at least as small as 10^-350). If R is incapable of doing this, I was wondering if there is a way I can use a program that can do this (such as Matlab or Mathematica) from R.
Specifically, I am computing a matrix of probabilities, and some of these probabilities are so small that R does not distinguish them from 0. The reason I know this is because each probability is the product of two other probabilities; so I’ll have p(x)=10^-300, p(y)=10^-50, and then p(x)*p(y)=0. I’d like to be able to do these computations, take the log of the resultant very small number (-805.905 for my example, according to Mathematica), and then continue working with the log values in R.
So to be more detailed, I have a matrix of values for p(x), a matrix of values for p(y), both computed using dnorm, and I’m computing the product. In many cases, R is capable of evaluating p(x) and p(y), but the p(x)*p(y) is too small. In a few cases, though, even the p(x) or p(y) value itself is too small, and is itself just equated to 0 in R.
I’ve seen that there is stuff out there for calling R from Mathematica, but not much pertaining to calling Mathematica from R. I’d honestly prefer to do the latter than the former here. So if any one either knows how to do this (either employing Mathematica or Matlab or something else in R) or has another solution to this issue, I’d greatly appreciate it.
Note that I realize there are a few other threads on this topic, discussing such things as using the Brobdingnag package to deal with small numbers, but these do not appear applicable here.

Is there any Python equivalent of R's biglm?

I have used biglm in R and found it very useful. Now I need the same type of functionality in python. Any ideas? I have seen that patsy/statsmodels has an incremental mode, but have not been able to find any samples to copy/adapt. Any pointers would be much appreciated.
from a related answer of Nathaniel Smith on the statsmodels mailing list
My incremental LS code might be useful here, it's basically the same
problem:
https://github.com/njsmith/pyrerp/blob/master/pyrerp/incremental_ls.py#L330
The new X'X is the sum of the old X'Xs, then you have to re-do the
scaling and inversion to get the new vcov matrix for the estimates.
Should be doable so long as you know how many data points are in each
and the various sums-of-squares. (The code I linked has some extra
complexity because of handling a particular sort of heteroskedasticity
via FGLS, but it can pretty much be ignored.)
statsmodels doesn't have anything in this area yet.
There is an incremental OLS function in statsmodels, however that was written as helper function for cusum tests (in memory) and hasn't been used or checked for any other purpose:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.diagnostic.recursive_olsresiduals.html

Large matrix: solve(crossprod(X)) when dim(X) = 100,000:5000

I need to do run a simple two-stage least squares regression on large data matrices. This just requires some crossprod() and solve() commands, but the matrices have dimensions 100,000 by 5000 matrix. My understanding is that holding a matrix like this in memory would take up a bit less than 4GB of memory. Unfortunately, my 64-bit Win7 machine only has 8GB of RAM. When I try to manipulate the matrices in question, I get the usual 'can't allocate vector of size' message.
I have considered a number of options such as the ff and bigmemory packages. However, the base R functions for the matrix operations I need only support the usual matrix object type, not the bigmatrix type.
It seems like it may be possible to extend the code from biglm(), but I'm on a tight schedule for this project, so I wanted to check-in with you all to see if there existed a ready-made solution for problems like this. Apologies if this was addressed before (I couldn't find it) or if the question is too generic.
Yes, a ready-made solution exist in biglm, the package you already identified. Linear regression can work with an updating scheme; that basic property is implemented in the package.
Dump your data to disk, say to SQLite and study the package documentation and proceed in, say, 10 chunks on 10,000 each.

Implementation of Particle Swarm Optimization Algorithm in R

I'm checking a simple moving average crossing strategy in R. Instead of running a huge simulation over the 2 dimenional parameter space (length of short term moving average, length of long term moving average), I'd like to implement the Particle Swarm Optimization algorithm to find the optimal parameter values. I've been browsing through the web and was reading that this algorithm was very effective. Moreover, the way the algorithm works fascinates me...
Does anybody of you guys have experience with implementing this algorithm in R? Are there useful packages that can be used?
Thanks a lot for your comments.
Martin
Well, there is a package available on CRAN called pso, and indeed it is a particle swarm optimizer (PSO).
I recommend this package.
It is under actively development (last update 22 Sep 2010) and is consistent with the reference implementation for PSO. In addition, the package includes functions for diagnostics and plotting results.
It certainly appears to be a sophisticated package yet the main function interface (the function psoptim) is straightforward--just pass in a few parameters that describe your problem domain, and a cost function.
More precisely, the key arguments to pass in when you call psoptim:
dimensions of the problem, as a vector
(par);
lower and upper bounds for each
variable (lower, upper); and
a cost function (fn)
There are other parameters in the psoptim method signature; those are generally related to convergence criteria and the like).
Are there any other PSO implementations in R?
There is an R Package called ppso for (parallel PSO). It is available on R-Forge. I do not know anything about this package; i have downloaded it and skimmed the documentation, but that's it.
Beyond those two, none that i am aware of. About three months ago, I looked for R implementations of the more popular meta-heuristics. This is the only pso implementation i am aware of. The R bindings to the Gnu Scientific Library GSL) has a simulated annealing algorithm, but none of the biologically inspired meta-heuristics.
The other place to look is of course the CRAN Task View for Optimization. I did not find another PSO implementation other than what i've recited here, though there are quite a few packages listed there and most of them i did not check other than looking at the name and one-sentence summary.

Quickly cross-check complex math results?

I am doing matrix operations on large matrices in my C++ program. I need to verify the results that I get, I used to use WolframAlpha for the task up until now. But my inputs are very large now, and the web interface does NOT accept such large values (textfield is limited).
I am looking for a better solution to quickly cross-check/do math problems.
I know there is Matlab but I have never used it and I don't know if thats what will suffice my needs and how steep the learning curve would be?
Is this the time to make the jump? or there are other solutions?
If you don't mind using python, numpy might be an option.
Apart from the license costs, MATLAB is the state of the art numerical math tool. There is octave as free open source alternative, with a similar syntax. The learning curve is for both tools absolutely smooth!
WolframAlpha is web interface to Wolfram Mathematica. The command syntax is exactly the same. If you have access to Mathematica at your university, it would be most smooth choice for you since you already have experience with WolframAlpha.
You may also try some packages to convert Mathematica commands to MATLAB:
ToMatlab
Mathematica Symbolic Toolbox for MATLAB 2.0
Let us know in more details what is your validation process. How your data look like and what commands have you used in WolframALpha? Then we can help you with MATLAB alternative.

Resources