Scale numeric matrix into values from 0 to 1 uniformly in R

I am running into a scaling problem with several matrices I am working with. Here is an example of my matrix:
my_matrix = matrix(data = c(1,2,3,4,5,6,7,8,25), nrow = 3)
My current method of scaling between 0 and 1 uses the equation (value - min) / (max - min). Applying it to the entire matrix gives the following:
mn = min(my_matrix); mx = max(my_matrix);
(my_matrix - mn) / (mx - mn)
[,1] [,2] [,3]
[1,] 0.00000000 0.1250000 0.2500000
[2,] 0.04166667 0.1666667 0.2916667
[3,] 0.08333333 0.2083333 1.0000000
I understand the calculation, as well as why I am receiving this matrix. However, I would much prefer to scale 0 to 1 based on percentiles, and receive this matrix instead:
[,1] [,2] [,3]
[1,] 0.11111111 0.4444444 0.7777778
[2,] 0.22222222 0.5555556 0.8888889
[3,] 0.33333333 0.6666667 1.0000000
Does anybody know an easy way to do this? Thanks!
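The thread includes no answer to this question, so here is a minimal sketch of one possible approach (not from the original). The desired matrix is each entry's rank divided by the number of entries, i.e. the empirical cumulative distribution: the fraction of all entries less than or equal to each value. Base R's rank() does the work, and assigning into scaled[] keeps the 3x3 shape. Handling ties with ties.method = "max" is an assumption about what you want for repeated values.

```r
my_matrix <- matrix(data = c(1, 2, 3, 4, 5, 6, 7, 8, 25), nrow = 3)

# Copy the matrix so the result keeps its dimensions (and any dimnames)
scaled <- my_matrix
# Rank of each entry among all entries, divided by the number of entries.
# ties.method = "max" (assumed choice) gives tied values the same score,
# equal to the fraction of entries <= that value, like ecdf(my_matrix)(x).
scaled[] <- rank(my_matrix, ties.method = "max") / length(my_matrix)
scaled
```

With no ties this yields 1/9, 2/9, ..., 9/9 down the columns, matching the target matrix, and the outlier 25 no longer squeezes the other values toward 0. If you would rather map the smallest value to exactly 0, (rank - 1) / (length - 1) is a common variant.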

Related

How to divide each value in a matrix by the max value of the corresponding column

e.g.
m <- matrix(c(1:9),3,3)
the maximum values per column would be ...
n <- matrix(c(3,6,9),1,3)
How can I then perform an operation so that the output is a 3 by 3 matrix with the values 0.33, 0.67, 1 in the first column (the first column divided by 3), 0.67, 0.83, 1 in the second column (the second column divided by 6) and 0.78, 0.89, 1 in the third column (the third column divided by 9)?
We can replicate 'n' using the column index of 'm':
m/n[col(m)]
# [,1] [,2] [,3]
#[1,] 0.3333333 0.6666667 0.7777778
#[2,] 0.6666667 0.8333333 0.8888889
#[3,] 1.0000000 1.0000000 1.0000000
Using apply:
apply(m, 2, function(x) x / max(x))
[,1] [,2] [,3]
[1,] 0.3333333 0.6666667 0.7777778
[2,] 0.6666667 0.8333333 0.8888889
[3,] 1.0000000 1.0000000 1.0000000
Is this what you are looking for?
With base R, you can also use pmax, like below:
> t((u<-t(m))/do.call(pmax,data.frame(u)))
[,1] [,2] [,3]
[1,] 0.3333333 0.6666667 0.7777778
[2,] 0.6666667 0.8333333 0.8888889
[3,] 1.0000000 1.0000000 1.0000000
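The pmax one-liner packs several steps together. Below is the same computation unpacked into named steps; the intermediate names u, col_max and scaled are only for illustration. The key point is that data.frame(t(m)) has one column per row of m, so pmax across those columns yields the maximum of each column of m.

```r
m <- matrix(1:9, 3, 3)
u <- t(m)                                # row i of u is column i of m
col_max <- do.call(pmax, data.frame(u))  # element-wise max across u's columns = max of each column of m
scaled <- t(u / col_max)                 # u / col_max divides row i of u by col_max[i]; t() restores m's layout
scaled
```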
You can use sweep:
sweep(m, 2, apply(m, 2, max), "/")
[,1] [,2] [,3]
[1,] 0.3333333 0.6666667 0.7777778
[2,] 0.6666667 0.8333333 0.8888889
[3,] 1.0000000 1.0000000 1.0000000
or subset the max with col:
m / apply(m, 2, max)[col(m)]
or use matrix multiplication with a diagonal matrix of 1/max:
m %*% diag(1/apply(m, 2, max))
Have a look at: How to divide each row of a matrix by elements of a vector in R
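As a quick sanity check (not part of the original answers), the approaches above can be compared with all.equal(). One caveat about the diag() version: if m has a single column, diag(1/max) receives a length-one argument and treats it as a size, building an identity matrix of that size instead (diag(0.25) is a 0x0 matrix), so passing nrow explicitly is safer.

```r
m <- matrix(1:9, 3, 3)
col_max <- apply(m, 2, max)  # per-column maxima: 3 6 9

by_index <- m / col_max[col(m)]                  # index the maxima by column number
by_sweep <- sweep(m, 2, col_max, "/")            # sweep along the columns
by_apply <- apply(m, 2, function(x) x / max(x))  # scale each column separately
# nrow = keeps diag() building a diagonal matrix even when m has only one column
by_diag  <- m %*% diag(1 / col_max, nrow = length(col_max))

all.equal(by_index, by_sweep)  # TRUE
all.equal(by_index, by_apply)  # TRUE
all.equal(by_index, by_diag)   # TRUE
```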

Finding inverse matrix in R

I have a variance-covariance matrix S:
> S
[,1] [,2]
[1,] 4 -3
[2,] -3 9
I am trying to find its inverse.
The code I have is:
>invS <- (1/((S[1,1]*S[2,2])-(S[1,2]*S[2,1])))*S
[,1] [,2]
[1,] 0.1481481 -0.1111111
[2,] -0.1111111 0.3333333
However, if I use solve(), I get this:
>invSalt <- solve(S)
[,1] [,2]
[1,] 0.3333333 0.1111111
[2,] 0.1111111 0.1481481
Why is invS incorrect? What should I change to correct it?
You correctly found the determinant in the denominator, but the rest is wrong.
The off-diagonal elements should have the opposite sign, and the two diagonal elements should be swapped. Both differences are clearly visible when comparing the two matrices.
That's not the most convenient thing to do by hand, so solve is really the better option. If you insist on doing it manually, you could use the following (the rev() trick relies on S being symmetric, as a covariance matrix is):
matrix(rev(S), 2, 2) / (prod(diag(S)) - S[1, 2] * S[2, 1]) * (2 * diag(1, 2) - 1)
# [,1] [,2]
# [1,] 0.3333333 0.1111111
# [2,] 0.1111111 0.1481481
The correct formula is
(1/((S[1,1]*S[2,2])-(S[1,2]*S[2,1])))* matrix(c(S[2,2], -S[2,1], -S[1,2], S[1,1]),2)
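The corrected formula can be wrapped in a small helper (the name inv2x2 is hypothetical) and checked against solve(). A non-symmetric matrix is included because the adjugate formula works for any invertible 2x2 matrix, not just symmetric ones like S.

```r
# Inverse of A = [a b; c d] via the adjugate: (1 / (ad - bc)) * [d -b; -c a]
inv2x2 <- function(A) {
  stopifnot(is.matrix(A), identical(dim(A), c(2L, 2L)))
  det_A <- A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1]
  if (det_A == 0) stop("matrix is singular")
  # matrix() fills column by column: [1,1]=d, [2,1]=-c, [1,2]=-b, [2,2]=a
  adj <- matrix(c(A[2, 2], -A[2, 1], -A[1, 2], A[1, 1]), 2, 2)
  adj / det_A
}

S <- matrix(c(4, -3, -3, 9), 2, 2)
all.equal(inv2x2(S), solve(S))  # TRUE

A <- matrix(c(1, 3, 2, 5), 2, 2)  # arbitrary non-symmetric example
all.equal(inv2x2(A), solve(A))  # TRUE
```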

Return position of specific element in matrix - R

I have a similarity matrix (m) as below:
[,1] [,2] [,3]
[1,] 1.0000000 1.0000000 0.8579698
[2,] 1.0000000 1.0000000 0.8579698
[3,] 0.8579698 0.8579698 1.0000000
I would like to get the position of 0.8579698 in an easy way.
I have tried the which function, and it works fine for element 1:
which( m == 1.0000000, TRUE)
Any ideas?
The question doesn't say how this matrix was constructed, but the problem seems to arise because 0.8579698 is only the rounded, printed form of a real (floating-point) value. In general, you can't rely on exact equality for real values:
> .72==.72
[1] TRUE
But:
> sqrt(.72)
[1] 0.8485281
> sqrt(.72)==0.8485281
[1] FALSE
There is a small difference between those apparently equal numbers:
> sqrt(.72)-0.8485281
[1] 3.742386e-08
A common workaround is to use a difference threshold instead of an equality:
> m<-matrix(c(1,1,.72,1,1,.72,.72,.72,1),nrow=3,ncol=3)
> (m<-sqrt(m))
[,1] [,2] [,3]
[1,] 1.0000000 1.0000000 0.8485281
[2,] 1.0000000 1.0000000 0.8485281
[3,] 0.8485281 0.8485281 1.0000000
> which(abs(m-.8485)<.0001,arr.ind = TRUE)
row col
[1,] 3 1
[2,] 3 2
[3,] 1 3
[4,] 2 3
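The threshold idea can be wrapped in a small helper; which_near and its default tolerance are hypothetical choices, not a base R function. The tolerance is absolute for targets up to 1 and relative above that, and 1e-6 comfortably covers the rounding in a value printed to 7 significant digits, so you can pass the number exactly as it was printed.

```r
# Hypothetical helper: positions (row, col) of entries within tol of target
which_near <- function(m, target, tol = 1e-6) {
  # absolute tolerance for |target| <= 1, relative tolerance above that
  which(abs(m - target) <= tol * max(1, abs(target)), arr.ind = TRUE)
}

m <- sqrt(matrix(c(1, 1, .72, 1, 1, .72, .72, .72, 1), nrow = 3, ncol = 3))
which_near(m, 0.8485281)  # the value exactly as printed
```

Calling which_near(m, 1) returns the five positions of the 1s in the same way.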

How does the `prop.table()` function work in r?

I've just started learning R and have had trouble finding an (understandable) explanation of what the prop.table() function does. I found the following explanation and example:
prop.table: Express Table Entries as Fraction of Marginal Table
Examples
m <- matrix(1:4, 2)
m
prop.table(m, 1)
But, as a beginner, I do not understand what this explanation means. I've also attempted to discern its functionality from the result of the above example, but I haven't been able to make sense of it.
With reference to the example above, what does the prop.table() function do? Furthermore, what is a "marginal table"?
The value in each cell divided by the sum of all 4 cells:
prop.table(m)
The value in each cell divided by the sum of its row:
prop.table(m, 1)
The value in each cell divided by the sum of its column:
prop.table(m, 2)
I think this can help. It covers prop.table(m), prop.table(m, 1) and prop.table(m, 2):
m <- matrix(1:4, 2)
> m
[,1] [,2]
[1,] 1 3
[2,] 2 4
> prop.table(m) #sum=1+2+3+4=10, 1/10=0.1, 2/10=0.2, 3/10=0.3,4/10=0.4
[,1] [,2]
[1,] 0.1 0.3
[2,] 0.2 0.4
> prop.table(m,1)
[,1] [,2]
[1,] 0.2500000 0.7500000 #row1: sum=1+3=4, m[1,1]=1/4=0.25, m[1,2]=3/4=0.75
[2,] 0.3333333 0.6666667 #row2: sum=2+4=6, m[2,1]=2/6=0.33, m[2,2]=4/6=0.67
> prop.table(m,2)
[,1] [,2]
[1,] 0.3333333 0.4285714 #col1: sum=1+2=3, m[1,1]=1/3=0.33, m[2,1]=2/3=0.67
[2,] 0.6666667 0.5714286 #col2: sum=3+4=7, m[1,2]=3/7=0.43, m[2,2]=4/7=0.57
When m is a 2D matrix, prop.table(m, 1) divides each cell by its row total (the row margin, i.e. the sum over each row), and prop.table(m, 2) divides each cell by its column total (the column margin, i.e. the sum over each column). In short, it is just the fraction of the row or column total, if you don't want to worry about the term "marginal".
Example, with m <- matrix(1:6, 3):
m with an extra row and column holding the margins (totals)
[,1] [,2] sum
[1,] 1 4 5
[2,] 2 5 7
[3,] 3 6 9
sum 6 15
> prop.table(m,1)
[,1] [,2]
[1,] 0.2000000 0.8000000
[2,] 0.2857143 0.7142857
[3,] 0.3333333 0.6666667
> prop.table(m,2)
[,1] [,2]
[1,] 0.1666667 0.2666667
[2,] 0.3333333 0.3333333
[3,] 0.5000000 0.4000000
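To make the "marginal table" idea concrete: margin.table() (base R) returns the row or column totals, and prop.table(m, margin) gives the same result as dividing every cell by the matching total with sweep(). The comparison below reuses the 3x2 matrix from the last example.

```r
m <- matrix(1:6, 3)

# The "marginal table" is just the vector of row (or column) totals
margin.table(m, 1)  # row totals: 5 7 9
margin.table(m, 2)  # column totals: 6 15

# prop.table divides every cell by the matching marginal total
by_row <- sweep(m, 1, margin.table(m, 1), "/")
by_col <- sweep(m, 2, margin.table(m, 2), "/")

all.equal(prop.table(m, 1), by_row)   # TRUE
all.equal(prop.table(m, 2), by_col)   # TRUE
all.equal(prop.table(m), m / sum(m))  # TRUE
```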

R's `chol` differs from MATLAB's `cholcov`. How to do a Cholesky-alike covariance decomposition?

I've been trying to reproduce a Cholesky-like covariance decomposition in R, like the one MATLAB computes with cholcov(). The example is taken from https://uk.mathworks.com/help/stats/cholcov.html.
Result of MATLAB's cholcov() for that example:
T =
-0.2113 0.7887 -0.5774 0
0.7887 -0.2113 -0.5774 0
1.1547 1.1547 1.1547 1.7321
I am trying to replicate this T in R. I tried:
C1 <- cbind(c(2,1,1,2), c(1,2,1,2), c(1,1,2,2), c(2,2,2,3))
T1 <- chol(C1)
C2 <- t(T1) %*% T1
My result:
[,1] [,2] [,3] [,4]
[1,] 1.414214 0.7071068 0.7071068 1.414214e+00
[2,] 0.000000 1.2247449 0.4082483 8.164966e-01
[3,] 0.000000 0.0000000 1.1547005 5.773503e-01
[4,] 0.000000 0.0000000 0.0000000 1.290478e-08
C2 recovers C1, but T1 is quite different from MATLAB's solution. I then thought it might be a Cholesky decomposition of the covariance matrix:
T1 <- chol(cov(C1))
but I get
[,1] [,2] [,3] [,4]
[1,] 0.5773503 0.0000000 0.0000000 2.886751e-01
[2,] 0.0000000 0.5773503 0.0000000 2.886751e-01
[3,] 0.0000000 0.0000000 0.5773503 2.886751e-01
[4,] 0.0000000 0.0000000 0.0000000 3.725290e-09
which is not right either.
Could anyone give me a hint how cholcov() in Matlab is calculated so that I could replicate it in R?
You are essentially misusing R's chol function in this case. The cholcov function in MATLAB is a composite function:
If the covariance matrix is positive definite, it performs a Cholesky factorization, returning a full-rank, square upper-triangular Cholesky factor;
If the covariance matrix is positive semidefinite but not positive definite, it uses an eigendecomposition instead, returning a rectangular matrix.
R's chol, on the other hand, only does Cholesky factorization. Your example C1 falls into the second case, so we should use R's eigen function instead.
E <- eigen(C1, symmetric = TRUE)
#$values
#[1] 7.000000e+00 1.000000e+00 1.000000e+00 2.975357e-17
#
#$vectors
# [,1] [,2] [,3] [,4]
#[1,] -0.4364358 0.000000e+00 8.164966e-01 -0.3779645
#[2,] -0.4364358 -7.071068e-01 -4.082483e-01 -0.3779645
#[3,] -0.4364358 7.071068e-01 -4.082483e-01 -0.3779645
#[4,] -0.6546537 8.967707e-16 -2.410452e-16 0.7559289
V <- E$vectors
D <- sqrt(E$values) ## root eigen values
Since the numerical rank is 3, we drop the last eigenvalue and eigenvector:
V1 <- V[, 1:3]
D1 <- D[1:3]
Thus the factor you want is:
R <- D1 * t(V1) ## diag(D1) %*% t(V1)
# [,1] [,2] [,3] [,4]
#[1,] -1.1547005 -1.1547005 -1.1547005 -1.732051e+00
#[2,] 0.0000000 -0.7071068 0.7071068 8.967707e-16
#[3,] 0.8164966 -0.4082483 -0.4082483 -2.410452e-16
We can verify that:
crossprod(R) ## t(R) %*% R
# [,1] [,2] [,3] [,4]
#[1,] 2 1 1 2
#[2,] 1 2 1 2
#[3,] 1 1 2 2
#[4,] 2 2 2 3
The R factor above is not the same as the one returned by cholcov because the two eigendecompositions are computed and ordered differently. R's eigen() uses the LAPACK routine DSYEVR for symmetric matrices and returns the eigenvalues in decreasing order. MATLAB's cholcov is not open source, so I'm not sure what algorithm it uses, but it is easy to demonstrate that it does not arrange the eigenvalues in non-increasing order.
Consider the factor T returned by cholcov:
T <- structure(c(-0.2113, 0.7887, 1.1547, 0.7887, -0.2113, 1.1547,
-0.5774, -0.5774, 1.1547, 0, 0, 1.7321), .Dim = 3:4)
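We can get the eigenvalues from the squared row norms, since each row of T is the square root of an eigenvalue times a unit-length eigenvector: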
rowSums(T ^ 2)
# [1] 1.000086 1.000086 7.000167
There is some round-off error because T is only given to 4 decimal places, but we can clearly see that the eigenvalues are 1, 1, 7. R, on the other hand, gives 7, 1, 1 (recall that D1 holds their square roots).
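Putting the two cases together, a rough R analogue of cholcov might look like the sketch below. The helper name, the tolerance formula and the "all eigenvalues above tol" test for positive definiteness are assumptions, not MATLAB's documented code, and the eigen-based factor can still differ from MATLAB's in row order and eigenvector signs while satisfying crossprod(R) == Sigma. The decision is made from the eigenvalues rather than by trying chol() first because, as the question shows, chol(C1) can succeed on a matrix that is only numerically semidefinite.

```r
# Hypothetical helper, loosely modelled on the behaviour of MATLAB's cholcov
cholcov_sketch <- function(Sigma, tol = NULL) {
  stopifnot(is.matrix(Sigma), nrow(Sigma) == ncol(Sigma))
  Sigma <- (Sigma + t(Sigma)) / 2             # enforce exact symmetry
  E <- eigen(Sigma, symmetric = TRUE)         # eigenvalues in decreasing order
  if (is.null(tol)) {
    tol <- 10 * .Machine$double.eps * max(abs(E$values))  # assumed tolerance
  }
  if (any(E$values < -tol)) stop("Sigma is not positive semidefinite")
  if (all(E$values > tol)) {
    return(chol(Sigma))                       # positive definite: square upper-triangular factor
  }
  keep <- E$values > tol                      # drop numerically zero eigenvalues
  # Row i is sqrt(lambda_i) times eigenvector i, giving a (rank x n) matrix
  sqrt(E$values[keep]) * t(E$vectors[, keep, drop = FALSE])
}

C1 <- cbind(c(2, 1, 1, 2), c(1, 2, 1, 2), c(1, 1, 2, 2), c(2, 2, 2, 3))
R1 <- cholcov_sketch(C1)
dim(R1)                       # 3 4
all.equal(crossprod(R1), C1)  # TRUE
```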
