I've just started learning R and have had trouble finding an (understandable) explanation of what the prop.table() function does. I found the following explanation and example:
prop.table: Express Table Entries as Fraction of Marginal Table
Examples
m <- matrix(1:4, 2)
m
prop.table(m, 1)
But, as a beginner, I do not understand what this explanation means. I've also attempted to discern its functionality from the result of the above example, but I haven't been able to make sense of it.
With reference to the example above, what does the prop.table() function do? Furthermore, what is a "marginal table"?
The value of each cell divided by the sum of all 4 cells:
prop.table(m)
The value of each cell divided by the sum of the row cells:
prop.table(m, 1)
The value of each cell divided by the sum of the column cells:
prop.table(m, 2)
I think this example can help. It includes all three variants: prop.table(m), prop.table(m, 1), and prop.table(m, 2).
m <- matrix(1:4, 2)
> m
[,1] [,2]
[1,] 1 3
[2,] 2 4
> prop.table(m) #sum=1+2+3+4=10, 1/10=0.1, 2/10=0.2, 3/10=0.3,4/10=0.4
[,1] [,2]
[1,] 0.1 0.3
[2,] 0.2 0.4
> prop.table(m,1) # row sums: 1+3=4 and 2+4=6
[,1] [,2]
[1,] 0.2500000 0.7500000 # m[1,1]=1/4=0.25, m[1,2]=3/4=0.75
[2,] 0.3333333 0.6666667 # m[2,1]=2/6≈0.33, m[2,2]=4/6≈0.67
> prop.table(m,2) # column sums: 1+2=3 and 3+4=7
[,1] [,2]
[1,] 0.3333333 0.4285714 # m[1,1]=1/3≈0.33, m[1,2]=3/7≈0.43
[2,] 0.6666667 0.5714286 # m[2,1]=2/3≈0.67, m[2,2]=4/7≈0.57
When m is a 2D matrix, prop.table(m, 1) expresses each cell as a fraction of the row marginal table (the sum over its row), and prop.table(m, 2) expresses each cell as a fraction of the column marginal table (the sum over its column). In short, it is just a "% of the total sum of the row or column", if you don't want to worry about the term "marginal".
Example: a 3-by-2 matrix m, shown with its row and column margins (marked ***):
     [,1] [,2]  ***
[1,]    1    4    5
[2,]    2    5    7
[3,]    3    6    9
 ***    6   15
> prop.table(m,1)
[,1] [,2]
[1,] 0.2000000 0.8000000
[2,] 0.2857143 0.7142857
[3,] 0.3333333 0.6666667
> prop.table(m,2)
[,1] [,2]
[1,] 0.1666667 0.2666667
[2,] 0.3333333 0.3333333
[3,] 0.5000000 0.4000000
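For completeness, base R can produce those marginal tables directly. A minimal sketch using margin.table() and addmargins() on the 3-by-2 matrix above:
m <- matrix(1:6, nrow = 3)  # the same 3x2 matrix as in the example
margin.table(m, 1)  # row sums (the row marginal table): 5 7 9
margin.table(m, 2)  # column sums (the column marginal table): 6 15
addmargins(m)       # m with a Sum row and a Sum column appended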
I am trying to get the correlation coefficient between the columns of each row of a matrix. I am really new to R; this is one of the first tasks I have to do for class.
Matrix:
A2
[,1] [,2]
[1,] 4 -2
[2,] 8 -3
[3,] 6 1
[4,] 2 2
[5,] -1 1
I tried to use cor(A2), since I read it will automatically calculate the correlation coefficient for the columns, but it gives me the following result:
cor(A2)
[,1] [,2]
[1,] 1.0000000 -0.6338878
[2,] -0.6338878 1.0000000
When using cor(t(A2)), I get:
cor(t(A2))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 NA -1
[2,] 1 1 1 NA -1
[3,] 1 1 1 NA -1
[4,] NA NA NA 1 NA
[5,] -1 -1 -1 NA 1
But I expected the result to have 5 rows and one column.
There are several ways to use the cor() function. If you want to calculate the correlation between two columns in a matrix, then you can provide two arguments like this:
> cor(A2[,1], A2[,2])
[1] -0.6338878
If you input a single matrix as an argument, then it will return a correlation matrix.
> cor(A2)
[,1] [,2]
[1,] 1.0000000 -0.6338878
[2,] -0.6338878 1.0000000
In this case, position [1,1] is the correlation between A2[,1] and A2[,1] (which is exactly 1). In position [1,2], you can find the correlation between A2[,1] and A2[,2]. The correlation matrix is symmetric, and the diagonal is always 1, because the correlation of a vector with itself is 1.
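As for cor(t(A2)): there each row of A2 becomes a variable with only two observations, and any two non-constant points are perfectly correlated, so every entry is forced to 1 or -1. Row 4, c(2, 2), is constant, so its standard deviation is zero and the correlation is undefined. A minimal illustration:
> cor(c(2, 2), c(4, -2))  # zero variance in the first vector; warns that the standard deviation is zero
[1] NA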
I have a variance covariance matrix S:
> S
[,1] [,2]
[1,] 4 -3
[2,] -3 9
I am trying to find an inverse of it.
The code I have is:
> invS <- (1/((S[1,1]*S[2,2])-(S[1,2]*S[2,1])))*S
> invS
[,1] [,2]
[1,] 0.1481481 -0.1111111
[2,] -0.1111111 0.3333333
However, if I use solve(), I get this:
> invSalt <- solve(S)
> invSalt
[,1] [,2]
[1,] 0.3333333 0.1111111
[2,] 0.1111111 0.1481481
Why is invS incorrect? What should I change to correct it?
You correctly found the determinant in the denominator, but the rest is wrong: the off-diagonal elements should have the opposite sign, and the diagonal elements should be swapped. Both differences are clearly visible when comparing the two matrices.
That's not the most convenient thing to do by hand, so solve() is really the better choice. If you insist on doing it manually, you could use
# rev(S) swaps the two diagonal entries; (2 * diag(1, 2) - 1) flips the signs of the off-diagonal entries
matrix(rev(S), 2, 2) / (prod(diag(S)) - S[1, 2] * S[2, 1]) * (2 * diag(1, 2) - 1)
# [,1] [,2]
# [1,] 0.3333333 0.1111111
# [2,] 0.1111111 0.1481481
The correct formula is
(1/((S[1,1]*S[2,2])-(S[1,2]*S[2,1])))* matrix(c(S[2,2], -S[2,1], -S[1,2], S[1,1]),2)
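As a quick sanity check, multiplying the corrected inverse by S should return the 2-by-2 identity matrix:
invS <- (1/((S[1,1]*S[2,2])-(S[1,2]*S[2,1]))) * matrix(c(S[2,2], -S[2,1], -S[1,2], S[1,1]), 2)
round(invS %*% S, 10)  # rounding hides tiny floating-point residue
#      [,1] [,2]
# [1,]    1    0
# [2,]    0    1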
I have a similarity value matrix (m), as below:
[,1] [,2] [,3]
[1,] 1.0000000 1.0000000 0.8579698
[2,] 1.0000000 1.0000000 0.8579698
[3,] 0.8579698 0.8579698 1.0000000
I would like to get the position of 0.8579698 in an easy way.
I have tried the which() function; it works fine for the element 1:
which(m == 1.0000000, arr.ind = TRUE)
Any ideas?
The question doesn't say how this matrix was constructed, but the problem seems to arise from 0.8579698 being the truncated printed form of a floating-point value. In general, you can't rely on exact equality for floating-point values:
> .72==.72
[1] TRUE
But:
> sqrt(.72)
[1] 0.8485281
> sqrt(.72)==0.8485281
[1] FALSE
There is a small difference between those apparently equal numbers:
> sqrt(.72)-0.8485281
[1] 3.742386e-08
A common workaround is to use a difference threshold instead of an equality:
> m<-matrix(c(1,1,.72,1,1,.72,.72,.72,1),nrow=3,ncol=3)
> (m<-sqrt(m))
[,1] [,2] [,3]
[1,] 1.0000000 1.0000000 0.8485281
[2,] 1.0000000 1.0000000 0.8485281
[3,] 0.8485281 0.8485281 1.0000000
> which(abs(m-.8485)<.0001,arr.ind = TRUE)
row col
[1,] 3 1
[2,] 3 2
[3,] 1 3
[4,] 2 3
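Alternatively, since the target value was read off the printed matrix (R prints 7 significant digits by default), rounding before comparing should also work here, though the threshold approach is more robust in general:
which(round(m, 7) == 0.8485281, arr.ind = TRUE)  # same four positions as above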
I am running into a scaling problem with several matrices I am working with. Here is an example of my matrix:
my_matrix = matrix(data = c(1,2,3,4,5,6,7,8,25), nrow = 3)
My current method of scaling between 0 and 1 uses the equation (value - min) / (max - min). Applying this to the entire matrix gives the following:
mn = min(my_matrix); mx = max(my_matrix);
(my_matrix - mn) / (mx - mn)
[,1] [,2] [,3]
[1,] 0.00000000 0.1250000 0.2500000
[2,] 0.04166667 0.1666667 0.2916667
[3,] 0.08333333 0.2083333 1.0000000
I understand the calculation, as well as why I am receiving this matrix. However, I would much prefer to scale from 0 to 1 based on percentiles, and receive this matrix instead:
[,1] [,2] [,3]
[1,] 0.11111111 0.4444444 0.7777778
[2,] 0.22222222 0.5555556 0.8888889
[3,] 0.33333333 0.6666667 1.0000000
Does anybody know an easy way to do this? Thanks!
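One interpretation (an assumption on my part: "percentiles" here means the fractional rank of each value among all cells) reproduces exactly the desired matrix:
my_matrix <- matrix(data = c(1, 2, 3, 4, 5, 6, 7, 8, 25), nrow = 3)
# rank() ranks all cells (in column-major order); dividing by the cell count maps them to (0, 1]
matrix(rank(my_matrix) / length(my_matrix), nrow = nrow(my_matrix))  # matches the desired matrix above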
How do you solve a non-square linear system A X = B with R, in the case where the system has no solution or infinitely many solutions?
Example :
A=matrix(c(0,1,-2,3,5,-3,1,-2,5,-2,-1,1),3,4,T)
B=matrix(c(-17,28,11),3,1,T)
A
[,1] [,2] [,3] [,4]
[1,] 0 1 -2 3
[2,] 5 -3 1 -2
[3,] 5 -2 -1 1
B
[,1]
[1,] -17
[2,] 28
[3,] 11
If the matrix A has more rows than columns, then you should use a least-squares fit.
If the matrix A has fewer rows than columns, then you should perform a singular value decomposition. Each algorithm does the best it can to give you a solution by making assumptions.
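For the overdetermined case, base R's qr.solve() returns the least-squares solution directly. A minimal sketch with a made-up tall system (3 equations, 2 unknowns):
A_tall <- matrix(c(1, 1, 1, 0, 1, 2), nrow = 3)  # hypothetical overdetermined system
b_tall <- c(1, 2, 4)
qr.solve(A_tall, b_tall)  # least-squares solution via the QR decomposition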
Here's a link that shows how to use SVD as a solver:
http://www.ecse.rpi.edu/~qji/CV/svd_review.pdf
Let's apply it to your problem and see if it works:
Your input matrix A and known RHS vector B:
> A=matrix(c(0,1,-2,3,5,-3,1,-2,5,-2,-1,1),3,4,T)
> B=matrix(c(-17,28,11),3,1,T)
> A
[,1] [,2] [,3] [,4]
[1,] 0 1 -2 3
[2,] 5 -3 1 -2
[3,] 5 -2 -1 1
> B
[,1]
[1,] -17
[2,] 28
[3,] 11
Let's decompose your A matrix:
> asvd = svd(A)
> asvd
$d
[1] 8.007081e+00 4.459446e+00 4.022656e-16
$u
[,1] [,2] [,3]
[1,] -0.1295469 -0.8061540 0.5773503
[2,] 0.7629233 0.2908861 0.5773503
[3,] 0.6333764 -0.5152679 -0.5773503
$v
[,1] [,2] [,3]
[1,] 0.87191556 -0.2515803 -0.1764323
[2,] -0.46022634 -0.1453716 -0.4694190
[3,] 0.04853711 0.5423235 0.6394484
[4,] -0.15999723 -0.7883272 0.5827720
> adiag = diag(1/asvd$d)
> adiag
[,1] [,2] [,3]
[1,] 0.1248895 0.0000000 0.00000e+00
[2,] 0.0000000 0.2242431 0.00000e+00
[3,] 0.0000000 0.0000000 2.48592e+15
Here's the key: the third singular value in d is very small, so the corresponding diagonal element in adiag is very large. Before solving, set it equal to zero:
> adiag[3,3] = 0
> adiag
[,1] [,2] [,3]
[1,] 0.1248895 0.0000000 0
[2,] 0.0000000 0.2242431 0
[3,] 0.0000000 0.0000000 0
Now let's compute the solution (see slide 16 in the link I gave you above):
> solution = asvd$v %*% adiag %*% t(asvd$u) %*% B
> solution
[,1]
[1,] 2.411765
[2,] -2.282353
[3,] 2.152941
[4,] -3.470588
Now that we have a solution, let's substitute it back to see if it gives us the same B:
> check = A %*% solution
> check
[,1]
[1,] -17
[2,] 28
[3,] 11
That's the B side you started with, so I think we're good.
Here's another nice SVD discussion from AMS:
http://www.ams.org/samplings/feature-column/fcarc-svd
The aim is to solve Ax = b for x, given A and b, where A is p by q, x is q by 1, and b is p by 1.
Approach 1: Generalized Inverse: Moore-Penrose
https://en.wikipedia.org/wiki/Generalized_inverse
Multiplying both sides of the equation by the transpose A', we get
A'Ax = A'b
where A' is the transpose of A. Note that A'A is now a q by q matrix. If A'A is invertible, we can multiply both sides of the equation by its inverse, which gives
x = (A'A)^{-1} A'b
This is the theory behind the generalized inverse: here G = (A'A)^{-1} A' is the pseudo-inverse of A. (In this underdetermined example A'A is singular and cannot be inverted directly; MASS::ginv instead computes the Moore-Penrose inverse via an SVD.)
library(MASS)
ginv(A) %*% B
# [,1]
#[1,] 2.411765
#[2,] -2.282353
#[3,] 2.152941
#[4,] -3.470588
Approach 2: Generalized Inverse using SVD.
@duffymo used SVD to obtain a pseudo-inverse of A. However, the last elements of svd(A)$d may not always be as small as in this example, so you probably shouldn't use that method as is. Here's an example where none of the singular values of A is close to zero:
A <- as.matrix(iris[11:13, -5])
A
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 11 5.4 3.7 1.5 0.2
# 12 4.8 3.4 1.6 0.2
# 13 4.8 3.0 1.4 0.1
svd(A)$d
# [1] 10.7820526 0.2630862 0.1677126
One option would be to look at the singular values of cor(A):
svd(cor(A))$d
# [1] 2.904194e+00 1.095806e+00 1.876146e-16 1.155796e-17
Now it is clear that only two large singular values are present, so one can apply SVD on A to get the pseudo-inverse, as below.
svda <- svd(A)
G = svda$v[, 1:2] %*% diag(1/svda$d[1:2]) %*% t(svda$u[, 1:2])
# to get x
G %*% B