I have a matrix with 50 rows and 50 columns:
[,1] [,2] [,3]...[,50]
[1,] 1 0.8 0.7
[2,] 0.8 1 0.5
[3,] 0.7 0.5 1
...
[50,]
I want to add 0.02 to the values above the diagonal to obtain something like this:
[,1] [,2] [,3]...[,50]
[1,] 1 0.82 0.72
[2,] 0.8 1 0.52
[3,] 0.7 0.5 1
...
[50,]
Does anyone know how to add to only the values above the diagonal of the matrix using R?
Example code for the matrix:
matrix <- as.matrix(data.frame(A = c(1, 0.8, 0.7), B = c(0.8, 1, 0.5), C = c(0.7, 0.5, 1)))
Try upper.tri(), as below:
matrix[upper.tri(matrix)] <- matrix[upper.tri(matrix)] + 0.02
You can use the lower.tri(m) or upper.tri(m) functions in R, where m is your matrix.
m = matrix(1:36, 6, 6)
m[upper.tri(m)] = m[upper.tri(m)] + 0.02
m
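A self-contained sketch of the upper.tri() approach on the 3 x 3 example from the question, so you can check that only the entries strictly above the diagonal change (the matrix is typed in directly here instead of being built from a data frame):

```r
# 3 x 3 example matrix from the question
m <- matrix(c(1,   0.8, 0.7,
              0.8, 1,   0.5,
              0.7, 0.5, 1), nrow = 3, byrow = TRUE)

# Logical mask that is TRUE strictly above the diagonal (diag = FALSE by default)
above <- upper.tri(m)

# Add 0.02 only where the mask is TRUE; the diagonal and lower triangle are untouched
m[above] <- m[above] + 0.02
m
```

upper.tri(m, diag = TRUE) would also select the diagonal, and lower.tri(m) selects the entries below it.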
If I have the following:
mm <- matrix(0, 4, 3)
mm<-apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))
> mm
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 0
[3,] 0 0 0
[4,] 1 0 1
How do I output a matrix giving, for each pair of columns, the frequency or percentage of rows in which both values equal 1? For example, in 2 of the 4 rows columns 1 and 2 both equal 1 (= 0.5), and in 1 of the 4 rows columns 2 and 3 both equal 1 (= 0.25), so in this case I'd need:
[,1] [,2] [,3]
[1,] 1 0.5 0.5
[2,] 0.5 1 0.25
[3,] 0.5 0.25 1
I am not interested in comparing a column with itself, so the diagonal can simply stay at 1.
I thought cor(mm) might get me somewhere, if there were a way to make it output co-frequencies or co-percentages instead of correlation coefficients, but that does not seem to be the case. Still, the output should have the same shape as what cor() returns, an N-by-N matrix for N columns:
> cor(mm)
[,1] [,2] [,3]
[1,] 1.0000000 0.5773503 0.5773503
[2,] 0.5773503 1.0000000 0.0000000
[3,] 0.5773503 0.0000000 1.0000000
But these are correlation coefficients; I want co-frequencies or co-percentages instead.
A base R solution uses crossprod:
r <- `diag<-`(crossprod(mm)/nrow(mm),1)
such that
> r
[,1] [,2] [,3]
[1,] 1.0 0.50 0.50
[2,] 0.5 1.00 0.25
[3,] 0.5 0.25 1.00
DATA
mm <- structure(c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1), .Dim = 4:3)
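Why crossprod() works here: crossprod(mm) is t(mm) %*% mm, so for 0/1 columns entry [i, j] is the number of rows in which columns i and j are both 1. A step-by-step sketch on the DATA above, assuming mm holds only 0s and 1s:

```r
# 0/1 data from the question (4 rows, 3 columns)
mm <- structure(c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1), .Dim = 4:3)

# crossprod(mm) equals t(mm) %*% mm: entry [i, j] counts the rows where
# columns i and j are both 1 (only valid because mm holds 0s and 1s)
counts <- crossprod(mm)

# Convert counts to proportions of rows
prop <- counts / nrow(mm)

# Before it is overwritten, the diagonal is each column's own share of 1s
diag(prop)   # 0.75 0.50 0.50

# The question wants 1 on the diagonal
diag(prop) <- 1
prop
```

Multiplying prop by 100 gives percentages instead of proportions.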
set.seed(123)
mm <- matrix(0, 4, 3)
mm <- apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))
# All ordered pairs of column indices
combinations <- expand.grid(1:ncol(mm), 1:ncol(mm))
# For each pair: 1 on the diagonal, otherwise the share of rows where both columns are 1
matrix(unlist(Map(function(x, y) {
  if (x == y) {
    res <- 1
  } else {
    res <- sum(mm[, x] * mm[, y]) / nrow(mm)
  }
  res
}, combinations[, 1], combinations[, 2])), ncol(mm))
# [,1] [,2] [,3]
# [1,] 1.00 0.25 0.0
# [2,] 0.25 1.00 0.5
# [3,] 0.00 0.50 1.0
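Both answers above count co-occurrences by multiplying columns, which only means "both equal 1" when the data are strictly 0/1. A sketch that tests == 1 explicitly, built with outer(); the function name co_freq is hypothetical:

```r
# Hypothetical helper: share of rows in which columns i and j both equal 1
co_freq <- function(mm) {
  pair_share <- function(i, j) mean(mm[, i] == 1 & mm[, j] == 1)
  k <- seq_len(ncol(mm))
  # outer() passes vectors of indices, so the pairwise function is vectorised
  res <- outer(k, k, Vectorize(pair_share))
  diag(res) <- 1  # self-comparisons are not of interest
  res
}

# Data from the question
mm <- structure(c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1), .Dim = 4:3)
co_freq(mm)
```

For percentages, use 100 * co_freq(mm).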
I think my case is somewhat special. Since I am not experienced with lapply, I am stuck on the following calculation. Here is a short reproducible example: assume I have a list containing three matrices:
list <- list(est1=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2), est2=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2),
est3=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2))
$`est1`
[,1] [,2]
[1,] 0.4 1.0
[2,] 0.0 0.4
[3,] 0.0 0.0
[4,] 0.0 0.4
[5,] 0.0 1.0
$est2
[,1] [,2]
[1,] 0.0 0.2
[2,] 0.4 0.4
[3,] 1.0 0.0
[4,] 0.2 1.0
[5,] 0.4 0.4
$est3
[,1] [,2]
[1,] 1.0 0.2
[2,] 0.4 1.0
[3,] 1.0 0.0
[4,] 1.0 0.2
[5,] 0.4 0.4
Each matrix holds the coefficient estimates from a different iteration, and each element position corresponds to one coefficient. For each coefficient, I want to calculate the proportion of the three matrices in which it is different from zero.
Expected Output:
[,1] [,2]
0.67 1
0.67 1
0.67 0
0.67 1
0.67 1
Please do not call your list list. In the following, it will be called z.
z <- list(est1=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2), est2=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2),
est3=matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2))
For the kind of problems that you describe, I like to use arrays, so the first step is to transform your list into an array.
library(abind)
A <- abind(z, along=3)
Then, you can apply a function along the third dimension:
apply(A, 1:2, function(x) 100 * sum(x!=0) / length(x))
[,1] [,2]
[1,] 100.0 100.0
[2,] 100.0 66.7
[3,] 100.0 66.7
[4,] 100.0 66.7
[5,] 66.7 66.7
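If you would rather avoid the abind dependency, base R's simplify2array() stacks a list of equally sized matrices into a 3-D array. A sketch using the three matrices printed in the question (typed in by hand so the result can be compared with the expected output), with mean(x != 0) as a shorter form of sum(x != 0) / length(x):

```r
# The three matrices printed in the question, entered column by column
z <- list(
  est1 = matrix(c(0.4, 0, 0, 0, 0, 1, 0.4, 0, 0.4, 1), ncol = 2),
  est2 = matrix(c(0, 0.4, 1, 0.2, 0.4, 0.2, 0.4, 0, 1, 0.4), ncol = 2),
  est3 = matrix(c(1, 0.4, 1, 1, 0.4, 0.2, 1, 0, 0.2, 0.4), ncol = 2)
)

# Base R: stack the matrices along a third dimension (5 x 2 x 3)
A <- simplify2array(z)
dim(A)

# Proportion of matrices in which each coefficient is non-zero
share <- apply(A, 1:2, function(x) mean(x != 0))
round(share, 2)
```

The first column comes out as 2/3 everywhere and the second as 1, 1, 0, 1, 1, matching the expected output.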
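Maybe the following does what you want. Note that, as written, it computes the proportion of matrices in which each coefficient equals zero; for the proportion that is different from zero, as asked, use != instead of == (equivalently, 1 - res).
I start by setting the RNG seed to make the results reproducible.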
set.seed(2081) # Make the results reproducible
list <- list(est1 = matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2),
est2 = matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2),
est3 = matrix(sample(c(0,0.4,0.2,1), replace=TRUE, size=10), ncol=2))
zeros <- sapply(list, `==`, 0)
res <- rowSums(zeros) / ncol(zeros)
matrix(res, ncol = 2)
# [,1] [,2]
#[1,] 0.3333333 0.3333333
#[2,] 0.0000000 0.6666667
#[3,] 0.0000000 0.3333333
#[4,] 0.3333333 0.3333333
#[5,] 0.6666667 0.3333333
EDIT.
The following uses rowMeans and is simpler. The result is identical() to res above.
res2 <- rowMeans(zeros)
identical(res, res2)
#[1] TRUE
matrix(res2, ncol = 2)
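For comparison with the expected output in the question, here is the same sapply()/rowMeans() idea with != in place of ==, applied to the three matrices printed there (typed in by hand rather than drawn with sample()):

```r
# The three matrices printed in the question, entered column by column
z <- list(
  est1 = matrix(c(0.4, 0, 0, 0, 0, 1, 0.4, 0, 0.4, 1), ncol = 2),
  est2 = matrix(c(0, 0.4, 1, 0.2, 0.4, 0.2, 0.4, 0, 1, 0.4), ncol = 2),
  est3 = matrix(c(1, 0.4, 1, 1, 0.4, 0.2, 1, 0, 0.2, 0.4), ncol = 2)
)

# 10 x 3 logical matrix: one row per coefficient, one column per matrix,
# TRUE where the coefficient is non-zero
nonzero <- sapply(z, `!=`, 0)

# Average across the matrices, then restore the original 5 x 2 shape
share <- matrix(rowMeans(nonzero), ncol = 2)
round(share, 2)
```

Reshaping with matrix(..., ncol = 2) works because each 5 x 2 matrix is flattened column by column, in the same order matrix() refills it.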
I have an example matrix:
p <- matrix(c(0.5, 0.3, 0.3, -0.1, 0.6, 0.7, -0.2, -0.1), ncol = 4, byrow = T)
> p
[,1] [,2] [,3] [,4]
[1,] 0.5 0.3 0.3 -0.1
[2,] 0.6 0.7 -0.2 -0.1
with one or more negative elements in every row; the largest element of each row is on the diagonal.
I want to write a function that, row by row, adds the negative values to the diagonal element (i.e. subtracts their absolute values from it) and then sets those negative elements to zero, so that each row still sums to 1.
I tried it myself with the apply function but have had no luck so far.
I hope someone can help me.
Best Wishes
shearer
Here's one way:
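negs <- p < 0    # logical mask of the negative entries
# add each row's sum of negative entries to its diagonal element
diag(p) <- diag(p) + rowSums(replace(p, ! negs, 0))
p[negs] <- 0     # then set the negative entries to zero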
# [,1] [,2] [,3] [,4]
# [1,] 0.4 0.3 0.3 0
# [2,] 0.6 0.4 0.0 0
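Since the question asks for a function, the three lines above can be wrapped into one. A sketch, assuming row i's diagonal element is p[i, i] (so nrow(p) <= ncol(p)) and that the diagonal elements themselves are not negative; the name move_negatives_to_diag is made up:

```r
# Hypothetical wrapper around the answer's three steps
move_negatives_to_diag <- function(p) {
  negs <- p < 0                               # where the negative entries are
  neg_sums <- rowSums(replace(p, !negs, 0))   # per-row sum of negatives only
  diag(p) <- diag(p) + neg_sums               # shift that mass onto the diagonal
  p[negs] <- 0                                # remove the negatives
  p
}

p <- matrix(c(0.5, 0.3, 0.3, -0.1, 0.6, 0.7, -0.2, -0.1), ncol = 4, byrow = TRUE)
q <- move_negatives_to_diag(p)
q
rowSums(q)  # each row still sums to 1
```

Because only negative mass is moved onto the diagonal, the row sums are unchanged by construction.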
I have a matrix with either 1s or 0s.
xmat = matrix(round(runif(12),0), ncol=3)
[,1] [,2] [,3]
[1,] 0 1 1
[2,] 1 0 1
[3,] 1 0 0
[4,] 1 0 1
I also have a rule table, which is a list.
a = c(0.2, 0.5)
b = c(0.5, 0.6)
c = c(0.8, 0.1)
names(a) = c("0", "1")
names(b) = c("0", "1")
names(c) = c("0", "1")
ruletable = list(a, b, c)
[[1]]
0 1
0.2 0.5
[[2]]
0 1
0.5 0.6
[[3]]
0 1
0.8 0.1
I need to replace the 1s and 0s in each column of xmat with the corresponding values from the rule table. For example, the first column of xmat is (0, 1, 1, 1), which needs to be converted into (0.2, 0.5, 0.5, 0.5) using ruletable[[1]]. Similarly, the second column of xmat, (1, 0, 0, 0), needs to be converted into (0.6, 0.5, 0.5, 0.5) using ruletable[[2]]. Since this is potentially a huge matrix, I am looking for a solution that does not use a for loop.
Thanks!
This should be reasonably efficient:
vapply(
  seq_along(ruletable),
  # a 0 picks the first entry of the rule, a 1 the second
  function(x) ruletable[[x]][xmat[, x] + 1L],
  numeric(nrow(xmat))
)
original matrix (set.seed(1)):
# [,1] [,2] [,3]
# [1,] 0 0 0
# [2,] 0 1 0
# [3,] 0 1 0
# [4,] 1 1 1
and result:
# [,1] [,2] [,3]
# [1,] 0.2 0.5 0.8
# [2,] 0.2 0.6 0.8
# [3,] 0.2 0.6 0.8
# [4,] 0.5 0.6 0.1
mapply answer:
xmat <- matrix(c(0,1,1,1,1,0,0,0,1,1,0,1),nrow=4)
mapply(function(x,y) y[as.character(x)], data.frame(xmat),ruletable)
X1 X2 X3
0 0.2 0.6 0.1
1 0.5 0.5 0.1
1 0.5 0.5 0.8
1 0.5 0.5 0.1
If you don't want the names, they are easy to remove:
unname(mapply(function(x,y) y[as.character(x)], data.frame(xmat),ruletable))
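A fully vectorised alternative without any apply-family call: bind the rule table into a 2-row lookup matrix and index it with a two-column (row, column) matrix. This is a sketch, assuming xmat contains only 0s and 1s and that each ruletable element holds the values for 0 and 1 in that order; the names lut and idx are just illustrative:

```r
xmat <- matrix(c(0, 1, 1, 1,  1, 0, 0, 0,  1, 1, 0, 1), nrow = 4)
ruletable <- list(c("0" = 0.2, "1" = 0.5),
                  c("0" = 0.5, "1" = 0.6),
                  c("0" = 0.8, "1" = 0.1))

# 2 x 3 lookup table: row 1 = value for 0, row 2 = value for 1,
# one column per column of xmat
lut <- do.call(cbind, ruletable)

# (row, column) pairs into lut: row = xmat value + 1, column = column of xmat
idx <- cbind(as.vector(xmat) + 1, as.vector(col(xmat)))

# Matrix indexing returns a plain vector; reshape it to the size of xmat
result <- matrix(lut[idx], nrow = nrow(xmat))
result
```

All the work happens in a single indexing call, so this should scale well to large matrices.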
I have a large dataset (202k points). I know that there are 8 values over 0.5. I want to subset on those rows.
How do I find/return a list of the row numbers where the values are > 0.5?
If the dataset is a vector named x:
(1:length(x))[x > 0.5]
If the dataset is a data.frame or matrix named x and the variable of interest is in column j:
(1:nrow(x))[x[,j] > 0.5]
But if you just want to find the subset and don't really need the row numbers, use
subset(x, x > 0.5)
for a vector and
subset(x, x[,j] > 0.5)
for a matrix or data.frame.
Alternatively, which() returns the indices at which the condition is TRUE:
which(x > 0.5)
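One practical reason to prefer which(): if x contains NA, logical indexing carries the NA through to the result, while which() only returns positions where the condition is TRUE. A small illustration with a made-up vector:

```r
# Made-up vector with a missing value
x <- c(0.1, 0.7, NA, 0.2, 0.9)

which(x > 0.5)           # 2 5     (NA comparisons are not counted as TRUE)
(1:length(x))[x > 0.5]   # 2 NA 5  (the NA propagates into the index)

# Using the positions to subset
x[which(x > 0.5)]        # 0.7 0.9
```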
Here's some dummy data:
D<-matrix(c(0.6,0.1,0.1,0.2,0.1,0.1,0.23,0.1,0.8,0.2,0.2,0.2),nrow=3)
Which looks like:
> D
[,1] [,2] [,3] [,4]
[1,] 0.6 0.2 0.23 0.2
[2,] 0.1 0.1 0.10 0.2
[3,] 0.1 0.1 0.80 0.2
And here's the logical row index,
index <- (rowSums(D>0.5))>=1
You can use it to extract the rows you want:
PeakRows <- D[index,]
Which looks like this:
> PeakRows
[,1] [,2] [,3] [,4]
[1,] 0.6 0.2 0.23 0.2
[2,] 0.1 0.1 0.80 0.2
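The same logical index can also be written with apply() and any(), and which() turns it into the row numbers the question asks for. A sketch on the D above; drop = FALSE is there because a single matching row would otherwise be returned as a plain vector:

```r
D <- matrix(c(0.6, 0.1, 0.1, 0.2, 0.1, 0.1, 0.23, 0.1, 0.8, 0.2, 0.2, 0.2), nrow = 3)

# Logical row index, as in the answer above
index <- rowSums(D > 0.5) >= 1

# Equivalent formulation: does any value in the row exceed 0.5?
index2 <- apply(D > 0.5, 1, any)
identical(index, index2)   # TRUE

# Row numbers themselves
which(index)               # 1 3

# Keep the result a matrix even if only one row matches
D[index, , drop = FALSE]
```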
Using which() with the argument arr.ind=TRUE is a convenient way to find the row (or column) numbers where a condition is TRUE:
df <- matrix(c(0.6,0.2,0.1,0.25,0.11,0.13,0.23,0.18,0.21,0.29,0.23,0.51), nrow=4)
# [,1] [,2] [,3]
# [1,] 0.60 0.11 0.21
# [2,] 0.20 0.13 0.29
# [3,] 0.10 0.23 0.23
# [4,] 0.25 0.18 0.51
which with arr.ind=TRUE returns the array indices where the condition is TRUE
which(df > 0.5, arr.ind=TRUE)
row col
[1,] 1 1
[2,] 4 3
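Negating these row numbers drops the matching rows, so the subset below contains only the rows in which no value exceeds 0.5: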
df[-which(df > 0.5, arr.ind=TRUE)[, "row"], ]
# [,1] [,2] [,3]
# [1,] 0.2 0.13 0.29
# [2,] 0.1 0.23 0.23
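The code above removes the matching rows. To keep them instead, which is what the question asks for, index with the row numbers directly; unique() guards against a row that has more than one value above 0.5 being returned twice. A sketch on the same df:

```r
df <- matrix(c(0.6, 0.2, 0.1, 0.25, 0.11, 0.13, 0.23, 0.18, 0.21, 0.29, 0.23, 0.51), nrow = 4)

# Array indices (row, col) of every value above 0.5
hits <- which(df > 0.5, arr.ind = TRUE)

# Each matching row once, even if several of its values exceed 0.5
keep <- unique(hits[, "row"])

df[keep, , drop = FALSE]
#      [,1] [,2] [,3]
# [1,] 0.60 0.11 0.21
# [2,] 0.25 0.18 0.51
```

Also note that if no value exceeds 0.5, which() returns integer(0), and df[-integer(0), ] gives zero rows rather than all rows, so the negative-index form needs a guard in that case.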