Linear optimisation and limitation in R

We are trying to solve the following linear optimization problem:
We have:
P_ij, i = 1:3, j = 1:30000, where all P_ij are positive
B_i, i = 1:3, positive integers
The result we are looking for is a 3 x 30000 matrix of binary values X_ij, subject to the following conditions.
Constraints:
For each j = 1:30000: Sum over i = 1:3 of X_ij = 1
For each i = 1:3: Sum over j = 1:30000 of X_ij ≤ B_i
Objective:
Maximize Sum over i = 1:3 of Sum over j = 1:30000 of P_ij * X_ij
Our approach was to reduce this task to a standard linear programming problem. We construct one matrix of dimension 3 x 3j for the B_i constraints and one matrix of dimension j x 3j for the X_ij constraints. These two matrices are stacked vertically to form the constraint matrix of the task, which is therefore (3 + j) x 3j. The objective vector is P_ij flattened into a 3j x 1 vector. The rhs constraint is the concatenation of B_i (1 x 3) and a vector of ones (1 x j), i.e. a 1 x (3 + j) vector.
This worked with lp() and with Rglpk_solve_LP().
I checked it with several problem sizes: it worked for j = 5000, but it did not work for j = 10000, and we need to run it for j = 30000. The matrices become too large.
Is it possible to solve this task in another way?
My computer has 8 GB of RAM, while the constraint matrix alone is 15.6 GB. The returned error is:
Error: cannot allocate vector of size 15.6 Gb
What are the limitations of the linear programming procedure? Do they come only from the computer's RAM and the size of the matrices?
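One way around this is to never build the constraint matrix densely: it contains only two nonzero entries per variable (one in its assignment row, one in its capacity row), i.e. 2 * 3 * j nonzeros in total, so a sparse representation fits easily in memory. Below is a minimal sketch, untested, assuming Rglpk_solve_LP() accepts a slam::simple_triplet_matrix for its mat argument; P and B here are placeholders for the real data.
# Sparse formulation of the assignment problem above.
library(slam)
library(Rglpk)
m <- 3; n <- 30000                       # i = 1..m, j = 1..n
P <- matrix(runif(m * n), nrow = m)      # placeholder for the real P_ij
B <- c(12000L, 12000L, 12000L)           # placeholder; needs sum(B) >= n
# Variables are stacked column-major, x[(j - 1) * m + i], like as.vector(P).
# Rows 1..n:     sum over i of X_ij  = 1    (one assignment per j)
# Rows n+1..n+m: sum over j of X_ij <= B_i  (capacity of each i)
row_idx <- c(rep(seq_len(n), each = m),        # assignment rows
             rep(n + seq_len(m), times = n))   # capacity rows
col_idx <- rep(seq_len(m * n), 2)
A <- simple_triplet_matrix(i = row_idx, j = col_idx,
                           v = rep(1, 2 * m * n),
                           nrow = n + m, ncol = m * n)
sol <- Rglpk_solve_LP(obj = as.vector(P), mat = A,
                      dir = c(rep("==", n), rep("<=", m)),
                      rhs = c(rep(1, n), B),
                      types = "B", max = TRUE)
X <- matrix(sol$solution, nrow = m)      # recover the 3 x 30000 binary matrix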

Calculate the reconstruction error as the difference between the original and the reconstructed matrix

I am currently taking an online class in genomics, coming in as a wet-lab physician, so my statistical knowledge is not the best. Right now we are working on PCA and SVD in R. I have a big matrix:
head(mat)
ALL_GSM330151.CEL ALL_GSM330153.CEL ALL_GSM330154.CEL ALL_GSM330157.CEL ALL_GSM330171.CEL ALL_GSM330174.CEL ALL_GSM330178.CEL ALL_GSM330182.CEL
ENSG00000224137 5.326553 3.512053 3.455480 3.472999 3.639132 3.391880 3.282522 3.682531
ENSG00000153253 6.436815 9.563955 7.186604 2.946697 6.949510 9.095092 3.795587 11.987291
ENSG00000096006 6.943404 8.840839 4.600026 4.735104 4.183136 3.049792 9.736803 3.338362
ENSG00000229807 3.322499 3.263655 3.406379 9.525888 3.595898 9.281170 8.946498 3.473750
ENSG00000138772 7.195113 8.741458 6.109578 5.631912 5.224844 3.260912 8.889246 3.052587
ENSG00000169575 7.853829 10.428492 10.512497 13.041571 10.836815 11.964498 10.786381 11.953912
Those are just the first few columns and rows; the full matrix has 60 columns and 1000 rows. Columns are cancer samples, rows are genes.
The task is to:
removing the eigenvectors and reconstructing the matrix using SVD, then we need to calculate the reconstruction error as the difference between the original and the reconstructed matrix. HINT: You have to use the svd() function and equalize the eigenvalue to 0 for the component you want to remove.
I have been all over Google, but can't find a way to solve this task, which might be because I don't really get the question itself.
So I performed SVD on my matrix mat:
d <- svd(mat)
Which gives me 3 matrices (Eigenassays, Eigenvalues and Eigenvectors), which I can access using d$u and so on.
How do I equalize the eigenvalue and ultimately calculate the error?
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/svd
The decomposition expresses your matrix mat as a product of three matrices:
mat = d$u %*% diag(d$d) %*% t(d$v)
So first confirm that you are able to do the matrix multiplications to get back mat.
Once you are able to do this, set the last couple of elements of d$d to zero before doing the matrix multiplication.
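Putting those two steps together, here is a minimal sketch (untested, reusing the mat from the question):
d <- svd(mat)
# Sanity check: reconstructing from all components should give back mat
# up to floating point noise.
recon_full <- d$u %*% diag(d$d) %*% t(d$v)
max(abs(mat - recon_full))
# Remove the last component by setting its singular value to zero,
# then reconstruct.
dd <- d$d
dd[length(dd)] <- 0
recon <- d$u %*% diag(dd) %*% t(d$v)
# Reconstruction error, e.g. as the Frobenius norm of the difference.
sqrt(sum((mat - recon)^2))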
It helps to create a function that handles the singular values.
Here, for instance, is one that zeros out any singular value that is too small compared to the largest singular value:
zap <- function(d, digits = 3) ifelse(d < 10^(-digits) * max(abs(d)), 0, d)
Although mathematically all singular values are guaranteed non-negative, numerical issues with floating point algorithms can--and do--create negative singular values, so I have prophylactically wrapped the singular values in a call to abs.
Apply this function to the diagonal matrix in the SVD of a matrix X and reconstruct the matrix by multiplying the components:
X. <- with(svd(X), u %*% diag(zap(d)) %*% t(v))
There are many ways to assess the reconstruction error. One is the Frobenius norm of the difference,
sqrt(sum((X - X.)^2))

Probability of selecting exactly n elements

I have about 100 000 probabilities of events stored in a vector.
I want to know if it is possible to calculate the probability that exactly n of the events occur (e.g. what is the probability that exactly 1000 events occur).
I managed to calculate several such probabilities in R, where p is the vector containing all the probabilities:
probability of none: prod(1-p)
probability of at least one: 1 - prod(1-p)
I found how to calculate the probability of exactly one event :
sum(p * (prod(1-p) / (1-p)))
But I don't know how to generate a formula for n events.
I do not know R, but I know how I would solve this with programming.
This is a straightforward dynamic programming problem. We start with a vector v = [1.0] of probabilities. Then in untested Python:
v = [1.0]                          # v[k] = P(exactly k events so far)
for p_i in probabilities:
    next_v = [(1 - p_i) * v[0]]    # new count 0: this event did not occur
    v.append(0.0)                  # pad so v[j + 1] below is always defined
    for j in range(len(v) - 1):
        next_v.append(v[j] * p_i + v[j + 1] * (1 - p_i))
    # Renormalise to guard against accumulated round-off error.
    total = sum(next_v)
    for j in range(len(next_v)):
        next_v[j] /= total
    v = next_v
And now your answer can be read off the corresponding entry of the vector: v[n] is the probability that exactly n events occur.
This approach is equivalent to calculating Pascal's triangle row by row, throwing away the old row when you're done.
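Since the question is about R, here is a hedged translation of the same dynamic programme (an untested sketch; it does on the order of N^2/2 additions for N probabilities, so with 100 000 of them you may want to truncate the vector once its tail probabilities underflow):
exact_count_probs <- function(p) {
  # v[k + 1] holds P(exactly k events among those processed so far)
  v <- 1
  for (pi in p) {
    # either the event does not occur (counts stay) or it occurs (counts shift up)
    v <- c(v * (1 - pi), 0) + c(0, v * pi)
    v <- v / sum(v)    # renormalise against round-off drift
  }
  v
}
# Probability that exactly 1000 events occur:
# exact_count_probs(p)[1001]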

Matrix dimensions do not match in regression formula

I'm trying to calculate this regression formula, but I have a problem with the dimensions; they do not work out.
Where:
X is a matrix with dimensions 200x20, n = 200 samples, p = 20 predictors,
y is a matrix with dimensions 200x1,
beta^(k) is a sequence of coefficient vectors, each with dimensions 20x1, for k = 1, 2, 3, ...,
X^T has dimensions 20x200,
j is an index from 1...p, so from 1...20.
The problem is when I calculate the term y - X beta^(k-1). For example, for k = 20, k - 1 = 19: if beta^(19) is a single value, the dimensions do not match for the subtraction, since 200x1 - (200x20)(1x1) = 200x1 - 200x20 will not work.
If I take the whole beta vector then it is correct. Does beta^(19) mean to take the 19th value of beta and to multiply it with the matrix X?
Source of the formula:
You should be using the entire beta vector at each stage of the calculation.
(Tibshirani has been a bit permissive with his use of notation, perhaps...)
The k is just a counter for which step of the algorithm we are on. Right at the start (k = 0, or "step 0") we initialise the entire beta vector to have all elements equal to zero: beta^(0) = (0, 0, ..., 0).
At each step of the algorithm (steps k = 1, 2, 3... and so on) we use our previous estimate of the vector beta (beta^(k-1), calculated in step k - 1) to calculate a new, improved estimate of the vector beta (beta^(k)). The superscript number is not an index into the vector; rather, it is a label telling us at which stage of the algorithm that beta vector was produced.
I hope this makes sense. The important point is that each of the beta^(k) values is a different 20x1 vector.
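To make this concrete, here is an illustrative R sketch of such an iteration. The exact update rule comes from the questioner's missing source, so this uses a generic gradient-descent step; the step size alpha, the iteration count and the simulated data are my assumptions, not part of the original formula. The point is only that the entire 20x1 beta vector is recomputed at every step k:
set.seed(1)
n <- 200; p <- 20
X <- matrix(rnorm(n * p), n, p)      # 200x20 design matrix
y <- X %*% rnorm(p) + rnorm(n)       # 200x1 response
beta <- rep(0, p)                    # step k = 0: beta^(0) is all zeros
alpha <- 1 / max(eigen(crossprod(X))$values)   # assumed step size
for (k in 1:100) {
  residual <- y - X %*% beta         # 200x1: uses the entire beta vector
  beta <- beta + alpha * t(X) %*% residual     # new 20x1 estimate beta^(k)
}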

Perform sum of vectors in CUDA/thrust

So I'm trying to implement stochastic gradient descent in CUDA, and my idea is to parallelize it similarly to the approach described in the paper Optimal Distributed Online Prediction Using Mini-Batches.
That implementation is aimed at MapReduce distributed environments, so I'm not sure whether it's optimal when using GPUs.
In short, the idea is: at each iteration, calculate the error gradients for each data point in a batch (map), take their average by sum/reducing the gradients (reduce), and finally perform the gradient step, updating the weights according to the average gradient. The next iteration starts with the updated weights.
The thrust library allows me to perform a reduction on a vector, which lets me, for example, sum all the elements of a vector.
My question is: How can I sum/reduce an array of vectors in CUDA/thrust?
The input would be an array of vectors and the output would be a vector that is the sum of all the vectors in the array (or, ideally, their average).
Converting my comment into this answer:
Let's say each vector has length m and the array has size n.
An "array of vectors" is then the same as a matrix of size n x m.
If you change your storage format from this "array of vectors" to a single contiguous vector of size n * m, laid out so that elements sharing a position within their vector are adjacent (that is, the transposed m x n matrix), you can use thrust::reduce_by_key to sum each row of that matrix separately; each row sum is then one component of the result vector.
The sum_rows example shows how to do this.
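To make the pattern concrete in a few lines, here is the same computation expressed in R rather than thrust (purely illustrative; the keys play the role of reduce_by_key's key array, and in CUDA you would follow the sum_rows example instead):
n <- 5; m <- 3
vecs <- matrix(rnorm(n * m), nrow = n)   # one length-m vector per row
flat <- as.vector(vecs)    # column-major: same-position elements are adjacent
keys <- rep(seq_len(m), each = n)        # key = position within the vector
rowsum(flat, keys)         # m sums: the element-wise sum of the n vectors
# identical to colSums(vecs)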

Scilab plotting factorial; first trying to correct the equation?

I'm trying to execute this equation in Scilab; however, I'm getting the error 59 of function %s_pow called ..., even though I define x.
n=0:1:3;
x=[0:0.1:2];
z = factorial(3); w = factorial(n);u = factorial(3-n);
y = z /(w.*u);
t = y.*x^n*(1-x)^(3-n)
(at this point I haven't added in the plot command, although I would assume it's plot(t)?)
Thanks for any input.
The powers x^n and (1-x)^(3-n) on the last line both cause the problem, because x and n are matrices that are not the same size.
As mentioned in the documentation, the power operation can only be performed between:
(A:square)^(b:scalar): if A is a square matrix and b is a scalar, then A^b is the matrix A to the power b.
(A:matrix).^(b:scalar): if b is a scalar and A a matrix, then A.^b is the matrix formed by the elements of A to the power b (elementwise power). If A is a vector and b is a scalar, then A^b and A.^b perform the same operation (i.e. elementwise power).
(A:scalar).^(b:matrix): if A is a scalar and b is a matrix (or vector), A^b and A.^b are the matrices (or vectors) formed by a^(b(i,j)).
(A:matrix).^(b:matrix): if A and b are vectors (matrices) of the same size, A.^b is the A(i)^b(i) vector (A(i,j)^b(i,j) matrix).
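The underlying fix, whatever the language, is to make the two operands of the elementwise power conform, for example by evaluating the term on a full (x, n) grid. Here is an illustrative sketch in R (used only because the rest of this document is R; in Scilab the same shape can be built by expanding x and n to matrices of equal size, or with a loop over n):
n <- 0:3
x <- seq(0, 2, by = 0.1)
coef <- factorial(3) / (factorial(n) * factorial(3 - n))   # length 4
# outer() evaluates x^n * (1-x)^(3-n) for every (x, n) pair: a 21 x 4 grid
t <- outer(x, n, function(x, n) x^n * (1 - x)^(3 - n))
t <- sweep(t, 2, coef, `*`)     # scale each column by its binomial coefficient
# each column of t is now one curve over x, e.g. matplot(x, t, type = "l")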
