I have a matrix with 50 rows and 50 columns:
[,1] [,2] [,3]...[,50]
[1,] 1 0.8 0.7
[2,] 0.8 1 0.5
[3,] 0.7 0.5 1
...
[50,]
I want to add 0.02 to the values above the diagonal, to obtain something like this:
[,1] [,2] [,3]...[,50]
[1,] 1 0.82 0.72
[2,] 0.8 1 0.52
[3,] 0.7 0.5 1
...
[50,]
Does anyone know how, in R, this addition could be applied only to the values above the diagonal of the matrix?
Example of matrix code:
matrix <- as.matrix(data.frame(A = c(1, 0.8, 0.7), B = c(0.8, 1, 0.5), C = c(0.7, 0.5, 1)))
Try upper.tri like below
matrix[upper.tri(matrix)] <- matrix[upper.tri(matrix)] + 0.02
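For reference, here is the same one-liner applied end-to-end to the 3x3 example from the question (a sketch; the name m is used instead of matrix to avoid shadowing the base function):

```r
# Build the example matrix from the question
m <- as.matrix(data.frame(A = c(1, 0.8, 0.7),
                          B = c(0.8, 1, 0.5),
                          C = c(0.7, 0.5, 1)))

# upper.tri(m) is a logical mask selecting entries strictly above the
# diagonal, so only those entries are incremented; the diagonal and the
# lower triangle are left untouched
m[upper.tri(m)] <- m[upper.tri(m)] + 0.02

# m[1,2] is now 0.82, m[1,3] is 0.72, m[2,3] is 0.52,
# while m[2,1] is still 0.8 and the diagonal is still 1
```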
You can use the lower.tri(m) or upper.tri(m) functions in R, where m is your matrix.
m = matrix(1:36, 6, 6)
m[upper.tri(m)] = m[upper.tri(m)] + 0.02
m
I'm afraid I'm missing something obvious, but I just can't see what I am doing wrong.
If anyone can help me find it, please, it would be great.
Here's the full, symmetrical distance matrix I'm starting from:
d2 <- structure(list(P1 = c(0, 0.1, 0.3, 0.2, 0, 0.1), P2 = c(0.1,
0, 0.5, 0.7, 1, 0.9), P3 = c(0.3, 0.5, 0, 1, 0.2, 0.3), P4 = c(0.2,
0.7, 1, 0, 0.2, 0.5), P5 = c(0, 1, 0.2, 0.2, 0, 0.7), P6 = c(0.1,
0.9, 0.3, 0.5, 0.7, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))
sum(abs(d2-t(d2)))
#[1] 0
I want to generate coordinates for the corresponding 6 points, so that the (euclidean) distance matrix resulting from those coordinates is as close as possible to my d2.
From the cmdscale documentation:
A set of Euclidean distances on n points can be represented exactly in at most n - 1 dimensions.
I would have thought (n-1)/2 dimensions would suffice, and indeed, when I run cmdscale, if I go anywhere higher than k=3 I get something close to 0 for the higher coordinates, or even error messages:
cmdscale(d2,k=3)
# [,1] [,2] [,3]
#[1,] -0.03526127 0.07755701 1.708755e-05
#[2,] -0.50626939 0.31256816 -5.646907e-02
#[3,] -0.26333957 -0.40518119 -6.978213e-02
#[4,] 0.35902238 0.37455879 2.148406e-02
#[5,] 0.33997864 -0.17998635 -2.809260e-01
#[6,] 0.10586921 -0.17951643 3.856760e-01
cmdscale(d2,k=4)
# [,1] [,2] [,3] [,4]
#[1,] -0.03526127 0.07755701 1.708755e-05 -7.450581e-09
#[2,] -0.50626939 0.31256816 -5.646907e-02 -7.450581e-09
#[3,] -0.26333957 -0.40518119 -6.978213e-02 -7.450581e-09
#[4,] 0.35902238 0.37455879 2.148406e-02 -7.450581e-09
#[5,] 0.33997864 -0.17998635 -2.809260e-01 -7.450581e-09
#[6,] 0.10586921 -0.17951643 3.856760e-01 -7.450581e-09
cmdscale(d2,k=5)
# [,1] [,2] [,3] [,4]
#[1,] -0.03526127 0.07755701 1.708755e-05 -7.450581e-09
#[2,] -0.50626939 0.31256816 -5.646907e-02 -7.450581e-09
#[3,] -0.26333957 -0.40518119 -6.978213e-02 -7.450581e-09
#[4,] 0.35902238 0.37455879 2.148406e-02 -7.450581e-09
#[5,] 0.33997864 -0.17998635 -2.809260e-01 -7.450581e-09
#[6,] 0.10586921 -0.17951643 3.856760e-01 -7.450581e-09
#Warning message:
#In cmdscale(d2, k = 5) : only 4 of the first 5 eigenvalues are > 0
So, assuming that k=3 is sufficient, this is what happens when I try to reverse the operation:
dd <- dist(cmdscale(d2,k=3),diag = T,upper = T)
dd
# 1 2 3 4 5 6
#1 0.0000000 0.5294049 0.5384495 0.4940956 0.5348482 0.4844970
#2 0.5294049 0.0000000 0.7578630 0.8710048 1.0045529 0.9013064
#3 0.5384495 0.7578630 0.0000000 1.0018275 0.6777074 0.6282371
#4 0.4940956 0.8710048 1.0018275 0.0000000 0.6319294 0.7097335
#5 0.5348482 1.0045529 0.6777074 0.6319294 0.0000000 0.7065166
#6 0.4844970 0.9013064 0.6282371 0.7097335 0.7065166 0.0000000
Which is quite different from what I expected:
as.matrix(dd)-d2
# P1 P2 P3 P4 P5 P6
#1 0.0000000 0.429404930 0.238449457 0.294095619 0.534848178 0.384497043
#2 0.4294049 0.000000000 0.257862963 0.171004810 0.004552925 0.001306386
#3 0.2384495 0.257862963 0.000000000 0.001827507 0.477707386 0.328237091
#4 0.2940956 0.171004810 0.001827507 0.000000000 0.431929428 0.209733518
#5 0.5348482 0.004552925 0.477707386 0.431929428 0.000000000 0.006516573
#6 0.3844970 0.001306386 0.328237091 0.209733518 0.006516573 0.000000000
sum(abs(as.matrix(dd)-d2))
#[1] 7.543948
Has anyone got any idea why the two distance matrices don't match at all?
I could try building my own least-squares problem to find the coordinates, but first I need to understand whether I'm doing something wrong with these out-of-the-box R functions.
Thanks!
EDIT: possible inconsistency found in the data
Could the issue be that according to d2 points 1 and 5 coincide (they have distance 0):
as.matrix(d2)
# P1 P2 P3 P4 P5 P6
#[1,] 0.0 0.1 0.3 0.2 0.0 0.1
#[2,] 0.1 0.0 0.5 0.7 1.0 0.9
#[3,] 0.3 0.5 0.0 1.0 0.2 0.3
#[4,] 0.2 0.7 1.0 0.0 0.2 0.5
#[5,] 0.0 1.0 0.2 0.2 0.0 0.7
#[6,] 0.1 0.9 0.3 0.5 0.7 0.0
but then these two points have different distances from other points, e.g. d(1-2) is 0.1 whereas d(5-2) is 1?
Replacing the two 0's does not seem to help though:
d3 <- d2
d3[1,5] <- 0.2
d3[5,1] <- 0.2
dd3 <- cmdscale(as.matrix(d3),k=3)
sum(abs(as.matrix(dist(dd3))-as.matrix(d3)))
#[1] 7.168348
Does this perhaps indicate that not all distance matrices can be reduced to a completely consistent set of points, regardless of how many dimensions one uses?
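The suspicion can be tested directly: points in Euclidean space (of any dimension) must satisfy the triangle inequality, so a brute-force check over all triples tells us whether d2 can be realised at all. A minimal sketch, with d2 re-entered as a plain symmetric matrix:

```r
# d2 from the question, columns P1..P6
d2m <- matrix(c(0, 0.1, 0.3, 0.2, 0, 0.1,
                0.1, 0, 0.5, 0.7, 1, 0.9,
                0.3, 0.5, 0, 1, 0.2, 0.3,
                0.2, 0.7, 1, 0, 0.2, 0.5,
                0, 1, 0.2, 0.2, 0, 0.7,
                0.1, 0.9, 0.3, 0.5, 0.7, 0), nrow = 6)

# Count triangle-inequality violations over all ordered triples
n <- nrow(d2m)
viol <- 0L
for (i in 1:n) for (j in 1:n) for (k in 1:n)
  if (d2m[i, j] > d2m[i, k] + d2m[k, j] + 1e-9) viol <- viol + 1L
viol  # > 0: d2 cannot come from points in any Euclidean space
```

For instance the triple (2, 1, 5) fails: d(2,5) = 1, but d(2,1) + d(1,5) = 0.1 + 0 = 0.1, so no Euclidean configuration can reproduce d2 exactly, in any number of dimensions.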
EDIT 2: possible answer to the last question.
I suspect that the answer is yes. I was also wrong about the number of dimensions; I now see why you need N-1 rather than half that.
If I have a distance d(A-B) = 1, I can represent that in 2-1 = 1 dimensions (x axis), i.e. on a line, placing A in (xA=0) and B in (xB=1).
Then I introduce a third point C and I state that d(A-C) = 2.
I have 3 points, so I need 3-1 = 2 dimensions (xy plane).
The constraint given by d(A-C) is:
(xC - 0)^2 + (yC - 0)^2 = d(A-C)^2 = 4.
i.e. C can be anywhere on a circumference of radius 2 centred in A.
This constrains both xC and yC to be in [-2,2].
However, previously I had not considered that this constrains the possible values of d(B-C), too, because:
d(B-C)^2 = (xC - 1)^2 + (yC - 0)^2
thus, by substitution of the (yC - 0)^2 term:
d(B-C)^2 = (xC - 1)^2 + 4 - (xC - 0)^2 = -2*xC + 5
d(B-C)^2 is therefore bound to [-2*(+2)+5,-2*(-2)+5] = [1,9].
So if my distance matrix contained d(A-B) = 1, d(A-C) = 2 and d(B-C) anywhere outside [1,3], it would describe a system that does not correspond to any 3 points in Euclidean space.
At least, I hope this makes sense.
So I guess my original question must be withdrawn.
I thought I'd leave the reasoning here for future reference or if anyone else should have the same doubt.
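The bound derived above is easy to confirm numerically: place A at the origin, B at (1, 0), and sweep C around the circle of radius 2 centred on A (a quick sketch, not part of the original analysis):

```r
# C is constrained to the circle of radius 2 around A = (0, 0)
theta <- seq(0, 2 * pi, length.out = 1000)
Cx <- 2 * cos(theta)
Cy <- 2 * sin(theta)

# Distance from B = (1, 0) to each candidate position of C
dBC <- sqrt((Cx - 1)^2 + Cy^2)

range(dBC)  # stays within [1, 3], matching the derivation
```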
Multidimensional scaling creates coordinates for the specified number of dimensions such that they will represent the distances in the original matrix as closely as possible. But the distances will be at different scales. In your example, d3 is the original distance matrix, dd3 is the matrix of coordinates, and dist(dd3) is the distance matrix from the reconstructed coordinates. The values are different, but they reflect the same relationships between points:
d3.v <- as.vector(as.dist(d3)) # Vector of original distances
dd3.v <- as.vector(dist(dd3)) # Vector of distances computed from coordinates
cor(d3.v, dd3.v)
# [1] 0.9433903
plot(d3.v, dd3.v, pch=16)
I have a series of elements A, B, C and D. For each possible pair (AB, AC, AD, BC, BD, CD), I have computed a distance measure and stored it in a vector at position x.
Position x is determined by the following loop:
(n is the number of elements; in this example, 4)
x = 1
for i in 1:(n-1)
  for j in (i+1):n
    distancevector[x] = distancemeasure
    x = x + 1
What is the easiest way to transform distancevector into a distance matrix in R?
Example:
distancevector = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
what I want would be this distance matrix:
A 1 0.1 0.2 0.3
B 0.1 1 0.4 0.5
C 0.2 0.4 1 0.6
D 0.3 0.5 0.6 1
In base R we can try:
n <- 4
distancevector <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
D <- diag(n)
D[lower.tri(D)] <- distancevector
D[upper.tri(D)] <- t(D)[upper.tri(D)]
> D
[,1] [,2] [,3] [,4]
[1,] 1.0 0.1 0.2 0.3
[2,] 0.1 1.0 0.4 0.5
[3,] 0.2 0.4 1.0 0.6
[4,] 0.3 0.5 0.6 1.0
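An alternative, if a conventional distance matrix with a 0 diagonal is acceptable: R's "dist" objects are stored as exactly this lower-triangle, column-wise vector, so the ordering produced by the loop above matches and the vector can be wrapped directly. A sketch:

```r
distancevector <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)

# A "dist" object is the lower triangle stored column-wise,
# plus a Size attribute (and optional Labels)
d <- structure(distancevector, Size = 4L, class = "dist",
               Labels = c("A", "B", "C", "D"))

D <- as.matrix(d)   # symmetric, with 0 on the diagonal
diag(D) <- 1        # restore the 1s from the desired output
```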
Let's say I have two lists of matrices, one containing solely binary matrices and the other containing quantitative matrices. The order within the lists matters. I would like to map each binary matrix onto its quantitative counterpart, creating a new list with the same number of nested matrices of the same dimensions. These new matrices will be subsets of their quantitative counterparts, keeping the values where the binary matrices have 1s.
# dummy data
dat1 <- c(0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1)
mat1 <- matrix(dat1, ncol=4, nrow=4, byrow=T)
dat2 <- c(1,1,0,1,0,0,1,1,0,1,0,1,0,1,0,0)
mat2 <- matrix(dat1, ncol=4, nrow=4, byrow=T)
lsMat1 <- list(mat1, mat2)
dat3 <- c(0.3,0.1,0.6,0.3,0.9,0.1,0.1,0.3,0.6,0.2,0.7,0.8,0.4,0.1,0.4,0.5)
mat3 <- matrix(dat3, ncol=4, nrow=4, byrow=T)
dat4 <- c(0.5,0.3,0.6,0.8,0.1,0.4,0.5,0.1,0.5,0.1,0.0,0.1,0.4,0.6,0.0,0.8)
mat4 <- matrix(dat4, ncol=4, nrow=4, byrow=T)
lsMat2 <- list(mat3, mat4)
Desired new nested list
[[1]]
[,1] [,2] [,3] [,4]
[1,] 0.0 0.1 0 0.3
[2,] 0.9 0.0 0 0.0
[3,] 0.6 0.0 0 0.0
[4,] 0.4 0.1 0 0.5
[[2]]
[,1] [,2] [,3] [,4]
[1,] 0.0 0.3 0 0.8
[2,] 0.1 0.0 0 0.0
[3,] 0.5 0.0 0 0.0
[4,] 0.4 0.6 0 0.8
Any pointers would be highly appreciated, thanks!
I'm going to assume the output you supplied above is affected by the typo in your dummy data (mat2 is built from dat1, not dat2). Since your binary matrices contain only 0's and 1's and you want to keep just the values at the 1's, you can use simple elementwise multiplication. You can do that for each pair of matrices in the two lists with
Map(`*`, lsMat1, lsMat2)
which returns
[[1]]
[,1] [,2] [,3] [,4]
[1,] 0.0 0.1 0 0.3
[2,] 0.9 0.0 0 0.0
[3,] 0.6 0.0 0 0.0
[4,] 0.4 0.1 0 0.5
[[2]]
[,1] [,2] [,3] [,4]
[1,] 0.0 0.3 0 0.8
[2,] 0.1 0.0 0 0.0
[3,] 0.5 0.0 0 0.0
[4,] 0.4 0.6 0 0.8
Given that column three of both matrices in lsMat1 is all 0, this output seems correct.
If I understood the question correctly, I would do an element-wise matrix multiplication. I'm not familiar with the syntax you posted, but in MATLAB it would be:
mat1 .* mat3
Now all elements that are zero in your binary matrix will stay zero, and all that are one will become the value from your qualitative matrix.
Hope it helps!
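For completeness, the R equivalent of MATLAB's .* is plain *, which is also elementwise on matrices (shown here with the mat1/mat3 pair from the question's dummy data):

```r
# Binary mask and quantitative matrix from the question's dummy data
dat1 <- c(0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1)
mat1 <- matrix(dat1, ncol = 4, byrow = TRUE)
dat3 <- c(0.3,0.1,0.6,0.3,0.9,0.1,0.1,0.3,0.6,0.2,0.7,0.8,0.4,0.1,0.4,0.5)
mat3 <- matrix(dat3, ncol = 4, byrow = TRUE)

# Zeros in the mask zero out the corresponding entries of mat3;
# ones pass them through unchanged
res <- mat1 * mat3
```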
I have a matrix with either 1s or 0s.
xmat = matrix(round(runif(12),0), ncol=3)
[,1] [,2] [,3]
[1,] 0 1 1
[2,] 1 0 1
[3,] 1 0 0
[4,] 1 0 1
I also have a rule table, which is a list.
a = c(0.2, 0.5)
b = c(0.5, 0.6)
c = c(0.8, 0.1)
names(a) = c("0", "1")
names(b) = c("0", "1")
names(c) = c("0", "1")
ruletable = list(a, b, c)
[[1]]
0 1
0.2 0.5
[[2]]
0 1
0.5 0.6
[[3]]
0 1
0.8 0.1
I need to replace the 1s and 0s in each column of xmat with the corresponding values specified by the rule table. For example, the first column of xmat, (0, 1, 1, 1), needs to be converted into (0.2, 0.5, 0.5, 0.5) using ruletable[[1]]. Similarly, the second column of xmat, (1, 0, 0, 0), needs to be converted into (0.6, 0.5, 0.5, 0.5) using ruletable[[2]]. Since this is potentially a huge matrix, I am looking for a solution that avoids a for loop.
Thanks!
This should be reasonably efficient:
vapply(
  seq_along(ruletable),
  function(x) ruletable[[x]][xmat[, x] + 1L],
  numeric(nrow(xmat))
)
original matrix (set.seed(1)):
# [,1] [,2] [,3]
# [1,] 0 0 0
# [2,] 0 1 0
# [3,] 0 1 0
# [4,] 1 1 1
and result:
# [,1] [,2] [,3]
# [1,] 0.2 0.5 0.8
# [2,] 0.2 0.6 0.8
# [3,] 0.2 0.6 0.8
# [4,] 0.5 0.6 0.1
mapply answer:
xmat <- matrix(c(0,1,1,1,1,0,0,0,1,1,0,1),nrow=4)
mapply(function(x,y) y[as.character(x)], data.frame(xmat),ruletable)
X1 X2 X3
0 0.2 0.6 0.1
1 0.5 0.5 0.1
1 0.5 0.5 0.8
1 0.5 0.5 0.1
If you don't want the names, they are easy to remove:
unname(mapply(function(x,y) y[as.character(x)], data.frame(xmat),ruletable))
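Another fully vectorised option, in case even the implicit loop over columns should be avoided: collapse the rule table into a 2-row lookup matrix (row 1 for value 0, row 2 for value 1) and index it once with a two-column index matrix. A sketch, assuming every rule has its "0" and "1" entries in that order, as in the question:

```r
# Data and rule table from the question
xmat <- matrix(c(0,1,1,1,1,0,0,0,1,1,0,1), nrow = 4)
a <- c("0" = 0.2, "1" = 0.5)
b <- c("0" = 0.5, "1" = 0.6)
c <- c("0" = 0.8, "1" = 0.1)
ruletable <- list(a, b, c)

# 2 x 3 lookup table: lookup[v + 1, j] is the replacement
# for value v (0 or 1) in column j
lookup <- matrix(unlist(ruletable), nrow = 2)

# Matrix indexing with cbind(row, col) does the whole
# replacement in one vectorised step
res <- matrix(lookup[cbind(as.vector(xmat) + 1L, as.vector(col(xmat)))],
              nrow = nrow(xmat))
```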