Compute first few principal components of a large data set, quickly [closed] - r

I'm working with large data sets (matrices of dimension 6000 x 3072) and using the prcomp() function for my principal component calculation. However, the function is extremely slow. Even using the rank. argument, which limits the number of components to compute, it still takes 7-8 minutes. I need to calculate principal components 45 times, as well as do some other intermediate calculations that take a few minutes on their own, so I don't want to sit staring at my computer screen for 8-9 hours on this simple analysis.
What are the fastest principal component analysis packages I can use to speed up the process? I only need the first 20 components, so the majority of the computation can be skipped.
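One commonly suggested approach for this kind of problem is a truncated SVD. A minimal sketch, assuming the irlba package is acceptable (its prcomp_irlba() computes only the requested number of components, which is typically much faster than prcomp() when that number is small relative to the matrix size):

    # install.packages("irlba")  # if not already installed
    library(irlba)

    # stand-in for the real 6000 x 3072 data matrix
    X <- matrix(rnorm(6000 * 3072), nrow = 6000)

    # compute only the first 20 principal components via truncated SVD
    pca <- prcomp_irlba(X, n = 20, center = TRUE, scale. = FALSE)

    dim(pca$x)         # 6000 x 20 component scores
    dim(pca$rotation)  # 3072 x 20 loadings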

Related

Is there an R package for working with very large graphs? [closed]

I'm trying to find the maxflow/mincut of a very large graph using R. I tried the RBGL package, which is a wrapper around a C library, so it's supposed to be much faster than pure-R packages, but I'm getting stuck on creating the graph object.
Creating a graphAM object fails with an error that there is not enough memory to allocate a vector of size 100 GB.
Creating a graphNEL object takes a very long time (I waited over an hour and it still didn't finish).
My graph has only 154403 vertices and 618082 edges. Is there an R package that can work efficiently with a graph of this size and has the necessary functions to calculate maxflow/mincut?
I expect it to create the graph object and calculate the maxflow/mincut in around 5 minutes.
I've used igraph successfully with some big graphs, though it's hard to predict whether it will meet your 5-minute mark.
igraph has functions for max_flow (https://igraph.org/r/doc/max_flow.html) and min_cut (https://igraph.org/r/doc/min_cut.html).
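A minimal sketch of how that might look, on a small made-up edge list (the vertex names and capacities here are hypothetical; max_flow() and min_cut() read capacities from the 'capacity' edge attribute by default):

    library(igraph)

    # toy directed graph with edge capacities
    edges <- data.frame(from     = c("s", "s", "a", "b"),
                        to       = c("a", "b", "t", "t"),
                        capacity = c(5, 3, 4, 6))
    g <- graph_from_data_frame(edges, directed = TRUE)

    mf <- max_flow(g, source = "s", target = "t")
    mf$value    # maximum flow value

    mc <- min_cut(g, source = "s", target = "t", value.only = FALSE)
    mc$value    # equals the max-flow value
    mc$cut      # edges in the minimum cut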

Package for Converting Time Series to be Stationary in R [closed]

Are there any packages in R that will do the work of transforming a univariate or bivariate time series to be stationary?
Thanks; any help would be greatly appreciated!
Is there a single package with a bunch of different functions to convert non-stationary time series to stationary ones? Not as far as I know.
It's all about the data and figuring out which method will work.
To check whether your time series is stationary, you can try Box.test, adf.test, or kpss.test (the latter two are in the tseries package).
Did you try diff()? diff() calculates the differences between consecutive values of a vector.
"One way to make a non-stationary time series stationary — compute the differences between consecutive observations. This is known as differencing." - from link
Another option is a log() transformation, which is often used together with diff().
Other methods include squaring, log differences, and lags. You could try different combinations of these techniques, for example a log squared difference, or other approaches such as Box-Cox transformations.
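A minimal sketch of the differencing-and-testing workflow, assuming the tseries package for the ADF test (the simulated random walk is just a stand-in for real data):

    library(tseries)

    set.seed(1)
    x <- cumsum(rnorm(200))   # random walk: non-stationary by construction

    adf.test(x)               # typically fails to reject the unit-root null
    dx <- diff(x)             # first differences
    adf.test(dx)              # typically rejects: differenced series looks stationary

    # for a strictly positive series, diff(log(x)) gives approximate growth rates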

spline approximation with specified number of intervals [closed]

So - edited, because some of us thought this question was off-topic.
I need to build a spline (approximation) on 100 points in one of the environments listed in the tags, but with an exact number of intervals (at most 6 intervals, i.e. separate equations, over the whole domain). The packages/libraries I know in R and Maxima let me build a spline on these points, but with 25-30 intervals (separate equations). Does anyone know how to build a spline with a set number of intervals without coding the whole algorithm from scratch?
What you're looking for might be described as "local regression" or "localized regression"; searching for those terms might turn up some hits.
I don't know whether you can find exactly what you've described, but implementing it doesn't seem too complicated: (1) split the domain into N intervals (say N = 10); for each interval, (2) collect the data that fall in it, and (3) fit a low-order polynomial (e.g. cubic) to that data by least squares.
If that sounds interesting to you, I can go into details, or maybe you can work it out yourself.
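If R is one of the acceptable environments, one possible sketch (not necessarily what the asker had in mind) uses the splines package that ships with R: bs() with 5 interior knots gives a cubic regression spline made of exactly 6 polynomial pieces, fitted by least squares through lm(). The data below are made up:

    library(splines)

    set.seed(1)
    x <- seq(0, 10, length.out = 100)        # stand-in for the 100 points
    y <- sin(x) + rnorm(100, sd = 0.2)

    knots <- quantile(x, probs = (1:5) / 6)  # 5 interior knots -> 6 intervals
    fit <- lm(y ~ bs(x, knots = knots, degree = 3))

    plot(x, y)
    lines(x, predict(fit), col = "red", lwd = 2)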

Calculate differential in Fortran [closed]

I want to calculate w for j = 0 to n in the function below. Is there an existing library for this in Fortran?
Actually, I want to write a program that gets n from the user and prints w as output. What should I do for the derivative and for constructing the polynomial Ln(x)?
That recurrence relation will generate the n-th order Legendre polynomial, and from the x_j and w_j I assume you are writing a program to perform Gauss-Legendre integration (no idea why the q(x) is there).
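For reference, assuming the recurrence in the question is the standard Bonnet recursion, the usual Gauss-Legendre formulas are:

    P_0(x) = 1,  P_1(x) = x
    (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)

    x_j = the j-th root of P_n(x)
    w_j = 2 / [ (1 - x_j^2) * (P_n'(x_j))^2 ]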
This Florida State page provides an LGPL Fortran 90 program that calculates the nodes and weights using a tridiagonal-eigenvalue method and writes them to an external file. You could collect all of the contained functions into a module for run-time calculation of the nodes and weights.

Cluster one-dimensional data optimally? [closed]

Does anyone have a paper that explains how the Ckmeans.1d.dp algorithm works?
Or: what is the optimal way to do k-means clustering in one dimension?
Univariate k-means clustering can be solved in O(kn) time (on already sorted input) based on theoretical results on Monge matrices, but that approach has not been popular, most likely due to numerical instability and perhaps also coding challenges.
A better option is an O(kn log n) method that is now implemented in Ckmeans.1d.dp version 3.4.6. This implementation is as fast as heuristic k-means but offers guaranteed optimality, with solutions that can be orders of magnitude better than those of heuristic k-means, especially for large k.
The generic dynamic programming solution by Richard Bellman (1973) does not touch on the specifics of the k-means problem, and the implied runtime is O(kn^3).
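For what it's worth, a minimal usage sketch of the Ckmeans.1d.dp package in R (the data here are made up):

    library(Ckmeans.1d.dp)

    x <- c(rnorm(50, mean = 0), rnorm(50, mean = 5), rnorm(50, mean = 10))
    res <- Ckmeans.1d.dp(x, k = 3)   # k can also be a range, e.g. c(1, 10)

    res$cluster   # optimal cluster assignment for each point
    res$centers   # cluster means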
