I was wondering if there was a generic way to sort a symmetrical matrix in R, whilst preserving the diagonal values.
For example, if I have a matrix like this:
# Create Matrix -----------------------------------------------------------
offdiag <- 1:6
m <- matrix(NA, ncol = 4, nrow = 4,dimnames = list(c("A","B","C","D"), c("A","B","C","D")))
m[lower.tri(m)] <- offdiag
m[upper.tri(m)] <- t(m)[upper.tri(t(m))]
diag(m) <- 0
m
which produces this:
A B C D
A 0 1 2 3
B 1 0 4 5
C 2 4 0 6
D 3 5 6 0
In the above example, C and D share the largest value (6). So what I am trying to achieve is to reorder the matrix so that this largest value sits in the top left of the upper triangle (whilst not altering the diagonal 0s).
So if I were to explicitly rearrange the matrix, by hand, the final result would be:
# Create sorted matrix by hand --------------------------------------------
A <- c(2,3,0,1)
B <- c(4,5,1,0)
C <- c(0,6,2,4)
D <- c(6,0,3,5)
matr <- cbind(C,D,A,B)
rownames(matr) <- c("C","D","A","B")
matr
which would produce:
C D A B
C 0 6 2 4
D 6 0 3 5
A 2 3 0 1
B 4 5 1 0
What I'm wondering is whether there is a generic way to sort the matrix as in my example, for any (n x n) matrix.
Maybe you can try the code below:
q <- which(colSums(m == max(m)) > 0)
o <- c(q, seq(ncol(m))[-q])
mout <- m[o,o]
such that
> mout
C D A B
C 0 6 2 4
D 6 0 3 5
A 2 3 0 1
B 4 5 1 0
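The code above uses max(m), which also includes the diagonal, so it would pick the wrong rows if a diagonal value were ever the largest. A minimal sketch of a reusable version, assuming a square symmetric matrix; the helper name move_max_pair_first is hypothetical. It searches only the upper triangle and applies the same permutation to rows and columns, so the diagonal values stay on the diagonal.

```r
# Hypothetical helper: move the two rows/columns that share the largest
# off-diagonal value to the top-left; all other rows/columns keep their order.
move_max_pair_first <- function(m) {
  stopifnot(is.matrix(m), nrow(m) == ncol(m), nrow(m) >= 2)
  # Consider only the upper triangle, so diagonal values are ignored
  upper <- m
  upper[!upper.tri(upper)] <- -Inf
  # Row and column of the first maximum (ties: the first one found)
  idx <- which(upper == max(upper), arr.ind = TRUE)[1, ]
  o <- c(idx[["row"]], idx[["col"]])
  o <- c(o, setdiff(seq_len(ncol(m)), o))
  # The same permutation for rows and columns keeps the matrix symmetric
  # and leaves each diagonal value on the diagonal
  m[o, o]
}

# The 4 x 4 example from the question
m <- matrix(0, 4, 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))
m[lower.tri(m)] <- 1:6
m[upper.tri(m)] <- t(m)[upper.tri(m)]
move_max_pair_first(m)
```

Only the single largest pair is moved to the front; the remaining rows keep their original relative order, which matches the hand-sorted result in the question.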
I have a "small" square matrix that I want to add to a "big" matrix. The big matrix contains all the rows and columns of the small matrix plus extras. I want to add the values where the indices are in common and just keep the values from the big one where that index is not contained in the small one. Unfortunately, all the data is copied on the addition so it takes a long time and can temporarily spike memory when the matrices are large.
I have tried adding subsets using matrices and data.frames, as well as a data.table method using rbindlist. Both the data.frame and matrix methods seem to cause a memory copy (why?), and the rbindlist method is not ideal because it requires a melt and a dcast, which temporarily spikes memory by greatly increasing the number of rows.
Is there any way to just change the values of some items in a matrix without causing a copy of the entire matrix?
Here are my attempts:
MList <- list(M1,M2)
unionCols <- Reduce(union, lapply(MList, colnames))
MTotal <- matrix(as.double(rep(0,(length(unionCols))^2)), nrow = length(unionCols))
rownames(MTotal) <- colnames(MTotal) <- unionCols
DFTotal <- as.data.frame(MTotal)
DFList <- lapply(MList, as.data.frame)
for(i in seq_along(MList)){
  tracemem(MTotal)
  tracemem(DFTotal)
  mCol <- match(colnames(MList[[i]]), colnames(MTotal))
  MTotal[mCol,mCol] <- MTotal[mCol,mCol] + MList[[i]] # this causes a copy
  DFTotal[mCol,mCol] <- DFTotal[mCol,mCol] + DFList[[i]] # this causes a copy
}
M1
M2
MTotal
# rbindlist method
.AggDMCMatsSingleM2 <- function(M1, M2){
  .MyMelt <- function(M){
    DT <- setnames(reshape2::melt(M, id.vars = colnames(M)), c('Var1','Var2'), c('row','col'))
  }
  M_total <- as.matrix(data.table::dcast(rbindlist(lapply(list(M1,M2), .MyMelt)),
                                         formula = as.formula(row ~ col),
                                         value.var = 'value',
                                         fun.aggregate = sum,
                                         fill = 0),
                       rownames = 'row')
  return(M_total)
}
M1
M2
.AggDMCMatsSingleM2(M1,M2)
If I follow what you are asking, we can add to the big matrix and write the result back directly, using bracket indexing with the row/column names of the small matrix:
big_matrix<-matrix(data=rep(1, 25), nrow=5,
dimnames = list(c(LETTERS[1:5]),
c(letters[1:5])))
# a b c d e
#A 1 1 1 1 1
#B 1 1 1 1 1
#C 1 1 1 1 1
#D 1 1 1 1 1
#E 1 1 1 1 1
small_matrix<-matrix(data=c(1:9), nrow=3,
dimnames = list(c(LETTERS[2:4]),
c(letters[2:4])))
# b c d
#B 1 4 7
#C 2 5 8
#D 3 6 9
big_matrix[rownames(small_matrix), colnames(small_matrix)] <-
big_matrix[rownames(small_matrix), colnames(small_matrix)] + small_matrix
# a b c d e
#A 1 1 1 1 1
#B 1 2 5 8 1
#C 1 3 6 9 1
#D 1 4 7 10 1
#E 1 1 1 1 1
A more complex test, with the rows of the small matrix in a different order:
big_matrix<-matrix(data=rep(1, 25), nrow=5,
dimnames = list(c(LETTERS[1:5]),
c(letters[1:5])))
# a b c d e
#A 1 1 1 1 1
#B 1 1 1 1 1
#C 1 1 1 1 1
#D 1 1 1 1 1
#E 1 1 1 1 1
small_matrix<-matrix(data=c(1:9), nrow=3,
dimnames = list(c("A", "D", "C"),
c(letters[c(2:4)])))
# b c d
#A 1 4 7
#D 2 5 8
#C 3 6 9
big_matrix[rownames(small_matrix), colnames(small_matrix)] <-
big_matrix[rownames(small_matrix), colnames(small_matrix)] + small_matrix
big_matrix
# a b c d e
#A 1 2 5 8 1
#B 1 1 1 1 1
#C 1 4 7 10 1
#D 1 3 6 9 1
#E 1 1 1 1 1
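The same name-based indexing extends to the question's loop over several small matrices. A minimal sketch, with made-up M1 and M2 standing in for the question's matrices (which are not shown), and assuming each small matrix's row and column names come from the same set of labels. Whether R copies MTotal on each assignment is not addressed here; tracemem(), as used in the question, will show that.

```r
# Hypothetical small matrices standing in for the question's M1 and M2
M1 <- matrix(1:4, 2, dimnames = list(c("a", "b"), c("a", "b")))
M2 <- matrix(10:18, 3, dimnames = list(c("b", "c", "d"), c("b", "c", "d")))
MList <- list(M1, M2)

# Union of all names, then a zero matrix covering every name
unionCols <- Reduce(union, lapply(MList, colnames))
MTotal <- matrix(0, length(unionCols), length(unionCols),
                 dimnames = list(unionCols, unionCols))

# Add each small matrix into its own rows/columns, indexed by name;
# cells outside a small matrix keep their current value
for (M in MList) {
  MTotal[rownames(M), colnames(M)] <- MTotal[rownames(M), colnames(M)] + M
}
MTotal
```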
I want to merge two data frames: X with 10 rows (for example) and Y with 3 rows whose row names also appear in X. I only want to keep the rows of X, but with the values of Y in the 3 corresponding rows and the other 10 - 3 = 7 values set to zero.
For example,
X<-data.frame(c(1,2,3,4,5,6,7,8,9,10))
rownames(X)<-c("a","b","c","d","e","f","g","h","i","j")
Y<-data.frame(c(20,30,40))
rownames(Y)<-c("d","f","h")
gives me these data frames:
X
a 1
b 2
c 3
d 4
e 5
f 6
g 7
h 8
i 9
j 10
Y
d 20
f 30
h 40
and I want this:
Z
a 0
b 0
c 0
d 20
e 0
f 30
g 0
h 40
i 0
j 0
Can this be done easily?
We can use match to find, for each row name of X, its position among the row names of Y. The values of Y are put into a vector with a 0 appended, and the nomatch argument points every unmatched row at that trailing 0 (position nrow(Y) + 1, which is 4 here). This returns Z as a vector:
Z <- c(unlist(Y, use.names=FALSE), 0)[match(row.names(X), row.names(Y), nomatch=nrow(Y) + 1L)]
Z
[1] 0 0 0 20 0 30 0 40 0 0
To get a data.frame that keeps the row names of X:
Z <- data.frame(Z, row.names = row.names(X))
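An alternative that avoids computing the position of the padding 0 is to start from a zero vector named by X's row names and assign Y's values by name. A minimal sketch using the question's data; the column names V1 and V2 are assumed here for readability.

```r
X <- data.frame(V1 = 1:10, row.names = letters[1:10])
Y <- data.frame(V2 = c(20, 30, 40), row.names = c("d", "f", "h"))

# Zero vector with one named entry per row of X
Z <- setNames(numeric(nrow(X)), rownames(X))
# Only fill rows present in both, so Z never grows past nrow(X)
common <- intersect(rownames(Y), rownames(X))
Z[common] <- Y[common, "V2"]
Z <- data.frame(Z = Z, row.names = rownames(X))
Z
```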
dplyr has a nice option for this: left_join. The code is easy to read and explain.
X<-data.frame(V1 = c(1,2,3,4,5,6,7,8,9,10),
KEY = c("a","b","c","d","e","f","g","h","i","j"),
stringsAsFactors = FALSE)
Y<-data.frame(V2 = c(20,30,40),
KEY = c("d","f","h"),
stringsAsFactors = FALSE)
library(dplyr)
XandY <- X %>%
left_join(Y, by = "KEY") %>%
mutate(Z = ifelse(is.na(V2), 0, V2))
I left all the columns in XandY so you can see every step of the computation and the results.
> XandY
V1 KEY V2 Z
1 1 a NA 0
2 2 b NA 0
3 3 c NA 0
4 4 d 20 20
5 5 e NA 0
6 6 f 30 30
7 7 g NA 0
8 8 h 40 40
9 9 i NA 0
10 10 j NA 0
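The same left join can also be done in base R with merge(), where all.x = TRUE keeps every row of X. A minimal sketch with the data above; sort = TRUE orders the result by KEY, which here happens to match X's original order.

```r
X <- data.frame(V1 = 1:10, KEY = letters[1:10], stringsAsFactors = FALSE)
Y <- data.frame(V2 = c(20, 30, 40), KEY = c("d", "f", "h"),
                stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of X, like dplyr::left_join()
XandY <- merge(X, Y, by = "KEY", all.x = TRUE, sort = TRUE)
# Unmatched rows get NA in V2; replace those with 0
XandY$Z <- ifelse(is.na(XandY$V2), 0, XandY$V2)
XandY
```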
I have data that looks like this, with columns A and B:
Column A Column B
A 0.098
B 0.076
C 0.871
D 0.837
E 1.981
F 0.736
I want to calculate the absolute difference between every pair of column A elements, based on their column B values.
So the output should look like this, where each entry is the absolute difference between the B values of its row and column elements:
A B C D E F
A 0 abs(0.098-0.076) abs(0.098-0.871) .....
B abs(0.098-0.076) 0 abs(0.076-0.871) .....
C 0.871 ......
D 0.837 ......
E 1.981 ......
F 0.736 ......
If your column vector is x, you can do:
dist(x, upper = TRUE, diag = TRUE)
Example:
x <- setNames(1:5, LETTERS[1:5])
dist(x, upper = TRUE, diag = TRUE)
# A B C D E
# A 0 1 2 3 4
# B 1 0 1 2 3
# C 2 1 0 1 2
# D 3 2 1 0 1
# E 4 3 2 1 0
I don't want to use dist; oops, I forgot to mention that.
You can try outer:
abs(outer(x, x, "-"))
# A B C D E
# A 0 1 2 3 4
# B 1 0 1 2 3
# C 2 1 0 1 2
# D 3 2 1 0 1
# E 4 3 2 1 0
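Applied to the data in the question, with column B stored as a vector named by column A (values and names copied from the question's table):

```r
x <- c(A = 0.098, B = 0.076, C = 0.871, D = 0.837, E = 1.981, F = 0.736)
# outer() builds the full matrix of pairwise differences x[i] - x[j];
# abs() turns it into absolute differences with a zero diagonal
d <- abs(outer(x, x, "-"))
round(d, 3)
```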
Sure, dist is efficient, but then I cannot switch to another criterion.
You probably want to see a more complicated example, so here it is. Define a bivariate function:
f <- function(a, b) exp(abs(sin(a) + cos(b)))
We can use outer:
outer(x, x, f)
# A B C D E
# A 3.981957 1.530086 1.160118 1.206625 3.080627
# B 4.261408 1.637467 1.084040 1.291306 3.296824
# C 1.976687 1.316566 2.337010 1.669499 1.529257
# D 1.241723 3.231509 5.736189 4.097783 1.605027
# E 1.519866 3.955358 7.021078 5.015674 1.964548
Note that f is not symmetric: in general f(a, b) != f(b, a), hence the matrix is not symmetric.
Essentially, outer evaluates f at every pair in the grid formed by x and x. See ?outer for more.
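One point worth knowing when swapping in another criterion: outer() calls the function once, on two long vectors covering the whole grid, so the function has to be vectorised. A criterion written with if (the function g below is a made-up example) does not work directly inside outer(), but base R's Vectorize() wraps it so it is applied element by element.

```r
x <- setNames(1:5, LETTERS[1:5])

# A made-up, non-vectorised criterion: positive part of a - b
g <- function(a, b) if (a > b) a - b else 0

# outer(x, x, g) would not work, because `if` receives a whole vector;
# Vectorize() applies g to each (a, b) pair instead
gv <- Vectorize(g)
outer(x, x, gv)
```

Vectorize() loops internally via mapply(), so it is slower than a natively vectorised function such as pmax(a - b, 0); it is mainly a convenience for criteria that are hard to vectorise.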
This is the first part of my code:
BSum <- 0.0
mydata <- NULL
while (BSum < 5) {
  A <- rpois(1, lambda = 1)
  y <- runif(A, 0, 1)
  B1 <- length(which(y <= 0.5))
  BSum <- BSum + B1
  C <- A - B1
  mydata <- rbind(mydata, c("A" = A, "B" = B1, "C" = C))
}
I need 3 more columns here. For column D (row x), I would generate as many random numbers (between 0 and 1) as the value in column B (row x). Then I count how many of those random numbers are less than or equal to 0.1 and put that count in column D. The remainder (B - D) becomes column F. I generate another column E, populated the same way D was generated from B. Its remainder again gets added to what has accumulated in column F.
:= is from the data.table package. As you don't have it loaded, either your object isn't a data.table, or it is but you didn't say so and the package was not loaded.
If the former (your object is a data frame, not a data.table) then you want cbind(). As in:
set.seed(1)
df <- data.frame(A = runif(10))
cbind(df, list(B = runif(10), C = letters[1:10]))
> cbind(df, list(B = runif(10), C = letters[1:10]))
A B C
1 0.26550866 0.2059746 a
2 0.37212390 0.1765568 b
3 0.57285336 0.6870228 c
4 0.90820779 0.3841037 d
5 0.20168193 0.7698414 e
6 0.89838968 0.4976992 f
7 0.94467527 0.7176185 g
8 0.66079779 0.9919061 h
9 0.62911404 0.3800352 i
10 0.06178627 0.7774452 j
For your particular problem, try:
myfun <- function(z) {
  ret1 <- apply(z, 1, function(x) sum(runif(x) <= 0.1))
  ret2 <- z[,1] - ret1
  cbind(z, B = ret1, C = ret2)
}
set.seed(1)
df <- data.frame(A = rpois(10, 2))
myfun(df)
> myfun(df)
A B C
1 1 0 1
2 1 0 1
3 2 0 2
4 4 0 4
5 1 0 1
6 4 0 4
7 4 1 3
8 2 0 2
9 2 0 2
10 0 0 0
You could make this more efficient, say by not doing each row individually, but it'd involve more coding.
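One way to avoid the row-by-row apply() is to note that the number of n uniform draws that are <= 0.1 follows a Binomial(n, 0.1) distribution, so rbinom() can draw all the counts in a single vectorised call. A sketch under that substitution (myfun2 is a hypothetical name): it gives the same distribution as myfun() but not the same random numbers for a given seed.

```r
myfun2 <- function(z, p = 0.1) {
  # One binomial draw per row: how many of z$A uniforms fall at or below p
  ret1 <- rbinom(nrow(z), size = z$A, prob = p)
  ret2 <- z$A - ret1
  cbind(z, B = ret1, C = ret2)
}

set.seed(1)
df <- data.frame(A = rpois(10, 2))
myfun2(df)
```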
Updated
If I understand your update (and I might not, as I had already shown you how to do those steps, though not in the configuration you now want), then I think this is what you wanted. Note that how you create E is a little ambiguous; I took you literally and did exactly the same as for D.
set.seed(2)
BSum <- 0.0
mydata <- NULL
while(BSum < 5) {
  A <- rpois(1, lambda = 1)
  B1 <- sum(runif(A, 0, 1) <= 0.5)
  BSum <- BSum + B1
  C <- A - B1
  D <- sum(runif(B1) <= 0.1)
  F <- B1 - D
  E <- sum(runif(B1) <= 0.1)
  F <- F + (D - E)
  mydata <- rbind(mydata, c(A = A, B = B1, C = C, D = D, E = E, F = F))
}
With that seed I get
R> mydata
A B C D E F
[1,] 0 0 0 0 0 0
[2,] 1 0 1 0 0 0
[3,] 0 0 0 0 0 0
[4,] 3 1 2 0 0 1
[5,] 1 1 0 0 0 1
[6,] 1 0 1 0 0 0
[7,] 3 3 0 0 0 3
I would like to ask if anyone knows a simple way to solve this kind of problem:
I need to generate all combinations of A numbers taken from the set {0, 1, 2, ..., B}, whose sum is C.
For example, if A=2, B=3, C=2:
Solution in this case:
(1,1);(0,2);(2,0)
So the vectors have length 2 (A), the sum of their items is 2 (C), and each element's possible values come from the set {0,1,2,3} (maximum B).
A function version, since I had already started on it before the other answers appeared:
A=2
B=3
C=2
myfun <- function(a=A, b=B, c=C) {
  out <- do.call(expand.grid, lapply(1:a, function(x) 0:b))
  return(out[rowSums(out)==c,])
}
> myfun()
Var1 Var2
3 2 0
6 1 1
9 0 2
z <- expand.grid(0:3,0:3)
z[rowSums(z)==2, ]
Var1 Var2
3 2 0
6 1 1
9 0 2
If you want to build the expand.grid call programmatically, this works:
z <- expand.grid(rep(list(0:B), A))
You need to wrap the sequence in a list so that the items remain separate; rep(0:3, 3) would return one long vector rather than 3 separate sequences. So for A=3:
> z <- expand.grid(rep(list(0:3), 3))
> z[rowSums(z)==2, ]
Var1 Var2 Var3
3 2 0 0
6 1 1 0
9 0 2 0
18 1 0 1
21 0 1 1
33 0 0 2
Using the nifty partitions package, and more interesting values of A, B, and C:
library(partitions)
A <- 2
B <- 5
C <- 7
comps <- t(compositions(C, A))
ii <- apply(comps, 1, FUN=function(X) all(X %in% 0:B))
comps[ii, ]
# [,1] [,2]
# [1,] 5 2
# [2,] 4 3
# [3,] 3 4
# [4,] 2 5
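The expand.grid() approach builds all (B + 1)^A candidate vectors before filtering, which grows quickly with A. A minimal recursive sketch (the function name bounded_compositions is hypothetical) that only builds valid vectors: choose the first element from 0 to min(B, C), then solve the same problem for the remaining A - 1 elements with sum C minus that element.

```r
bounded_compositions <- function(A, B, C) {
  stopifnot(A >= 1)
  # No solutions if the target is negative or larger than A * B
  if (C < 0 || C > A * B) return(matrix(numeric(0), nrow = 0, ncol = A))
  # Base case: a single element must equal C (already known to be in 0:B)
  if (A == 1) return(matrix(C, nrow = 1, ncol = 1))
  pieces <- list()
  for (first in 0:min(B, C)) {
    # Recurse on the remaining A - 1 elements, which must sum to C - first
    rest <- bounded_compositions(A - 1, B, C - first)
    if (nrow(rest) > 0) {
      pieces[[length(pieces) + 1]] <- cbind(first, rest, deparse.level = 0)
    }
  }
  do.call(rbind, pieces)
}

bounded_compositions(2, 3, 2)
bounded_compositions(2, 5, 7)
```

Each row is one solution; the rows come out ordered by the first element rather than in expand.grid() order.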