How to print out the row names instead of their indices? - r

I have a matrix m and performed some calculations on it; as a result I obtained a matrix ind. In the code below, ind is given as a constant matrix.
k=10; n = 8
m <- matrix(c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 1, 1, 0, 0, 1, 0, 1, 0, 0), n, k, byrow = TRUE)
colnames(m)<-1:k; rownames(m)<-LETTERS[1:n]
# some calculation
ind <-matrix(c(
1, 1,
2, 2,
3, 6,
4, 7,
5, 8), 5, 2, byrow = TRUE)
I need to output the row names of the matrix m instead of the row indices.
My attempt is:
noquote(rownames(m)[ind])
[1] A B C D E A B F G H
Expected result is:
[,1] [,2]
[1,] A A
[2,] B B
[3,] C F
[4,] D G
[5,] E H

You can use subsetting as follows:
rn = ind
rn[] <- rownames(m)[ind]
rn
[,1] [,2]
[1,] "A" "A"
[2,] "B" "B"
[3,] "C" "F"
[4,] "D" "G"
[5,] "E" "H"

You can use apply:
apply(ind, 2, function(x) rownames(m)[x])
# [,1] [,2]
#[1,] "A" "A"
#[2,] "B" "B"
#[3,] "C" "F"
#[4,] "D" "G"
#[5,] "E" "H"

Here is a solution that prints exactly the output you expect, using noquote:
noquote(matrix(rownames(m)[ind], nrow(ind)), right = T)
-output
[,1] [,2]
[1,] A A
[2,] B B
[3,] C F
[4,] D G
[5,] E H

Related

Count the frequency of consecutive zeros every time they appear in a row

I have this data frame and would like to count each run of consecutive zeros in every column, so that the output would be A: 2 4, B: 1 2 1, C: 2 5, D: 2 3, E: 1 1
df <- data.frame(
A=c(1, 0, 0, 1, 1, 0, 0, 0, 0),
B=c(0, 1, 1, 0, 0, 1, 0, 1, 1),
C=c(0, 0, 1, 1, 0, 0, 0, 0, 0),
D=c(0, 0, 1, 1, 1, 1, 0, 0, 0),
E=c(1, 0, 1, 1, 1, 1, 0, 1, 1)
)
We can use rle in base R, looping over the columns of the data.frame and extracting the lengths of the runs of 0 values:
lapply(df1, function(x) with(rle(x), lengths[!values]))
-output
$A
[1] 2 4
$B
[1] 1 2 1
$C
[1] 2 5
$D
[1] 2 3
$E
[1] 1 1
data
df1 <- structure(list(A = c(1, 0, 0, 1, 1, 0, 0, 0, 0), B = c(0, 1,
1, 0, 0, 1, 0, 1, 1), C = c(0, 0, 1, 1, 0, 0, 0, 0, 0), D = c(0,
0, 1, 1, 1, 1, 0, 0, 0), E = c(1, 0, 1, 1, 1, 1, 0, 1, 1)), row.names = c(NA,
-9L), class = "data.frame")
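To see why this works, here is rle() on a single column (a small illustration using the df1 above):
rle(df1$A)
# Run Length Encoding
#   lengths: int [1:4] 1 2 2 4
#   values : num [1:4] 1 0 1 0
lengths[!values] then keeps only the lengths of the runs whose value is 0, i.e. 2 and 4.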

How to find bounding boxes of objects in a raster?

I have a binary raster consisting of objects (1) and background (0). How can I find the bounding boxes of the objects? Each object should have its own bounding box.
Input:
library("raster")
mat = matrix(
c(0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 0, 0,
0, 0, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0,
0, 1, 1, 1, 1, 0,
0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0),
ncol = 6, nrow = 8, byrow = TRUE
)
ras = raster(mat)
I expect this result:
result = raster(matrix(
c(0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 0,
0, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 0,
0, 1, 0, 0, 1, 0,
0, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0),
ncol = 6, nrow = 8, byrow = TRUE
))
Here is an approach.
Example data
library(raster)
mat = matrix(
c(0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 0, 0,
0, 0, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0,
0, 1, 1, 1, 1, 0,
0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0),
ncol = 6, nrow = 8, byrow = TRUE )
ras <- raster(mat)
Solution
f <- function(r) {
  # set the background (0) to NA so only the objects are polygonized
  x <- reclassify(r, cbind(0, NA))
  y <- rasterToPolygons(x, dissolve = TRUE)
  # split the dissolved polygons into one polygon per object
  z <- disaggregate(y)
  # take the extent (bounding box) of each object and turn it into a polygon
  e <- sapply(1:length(z), function(i) extent(z[i, ]))
  p <- spPolygons(e)
  # rasterize the boxes and keep only their outlines
  r <- rasterize(p, r)
  d <- boundaries(r)
  reclassify(d, cbind(NA, 0))
}
r <- f(ras)
as.matrix(r)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 0 0 0 0 0 0
#[2,] 0 1 1 1 1 0
#[3,] 0 1 1 1 1 0
#[4,] 0 0 0 0 0 0
#[5,] 0 1 1 1 1 0
#[6,] 0 1 0 0 1 0
#[7,] 0 1 1 1 1 0
#[8,] 0 0 0 0 0 0
It is of course possible that the bounding boxes of objects overlap, in which case there is no solution, I suppose.

Replacing values in one matrix with values from another

I'm trying to compare two matrices. When the values aren't equal, I want to use the value from mat2 as long as it is greater than 0; if it is zero, then I want the value from mat1. As currently written, my code appears to always return the value from mat1.
Here is my attempt:
mat.data1 <- c(1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1)
mat1 <- matrix(data = mat.data1, nrow = 5, ncol = 5, byrow = TRUE)
mat.data2 <- c(0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 2, 2, 0, 0, 0, 1, 2, 2, 0, 2, 1, 0, 1)
mat2 <- matrix(data = mat.data2, nrow = 5, ncol = 5, byrow = TRUE)
mat3 = if(mat1 == mat2){mat1} else {if(mat2>0){mat2} else {mat1}}
The expected output is:
1 0 1 1 1
0 1 2 1 1
1 1 2 2 0
1 1 1 2 2
1 1 1 0 1
Here is one potential way to do it.
mat.data1 <- c(1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1)
mat1 <- matrix(data = mat.data1, nrow = 5, ncol = 5, byrow = TRUE)
mat.data2 <- c(0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 2, 2, 0, 0, 0, 1, 2, 2, 0, 2, 1, 0, 1)
mat2 <- matrix(data = mat.data2, nrow = 5, ncol = 5, byrow = TRUE)
mat3 <- mat1
to_change <- which(mat2 != mat1 & mat2 > 0)
mat3[to_change] <- mat2[to_change]
This specific use of which asks for the locations where mat2 is not equal to mat1 AND where mat2 is greater than zero. You can then subset with those locations and place the mat2 values into mat3.
The output is then:
> mat3
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 1
[2,] 0 1 2 1 1
[3,] 1 1 2 2 0
[4,] 1 1 1 2 2
[5,] 1 2 1 0 1
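If you prefer a single vectorized expression, the same logic can be written with ifelse (a sketch, not part of the original answer; ifelse() copies the dimensions of its test argument, so the result is again a 5 x 5 matrix):
mat3 <- ifelse(mat1 != mat2 & mat2 > 0, mat2, mat1)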
We can use coalesce
library(dplyr)
out <- coalesce(replace(mat2, !mat2, NA), replace(mat1, !mat1, NA))
replace(out, is.na(out), 0)
Or, as @Axeman mentioned:
coalesce(out, 0)
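The idea behind the replace/coalesce combination is that zeros are temporarily marked as NA so that coalesce() can fall through to the next argument; a tiny illustration (not part of the original answer):
library(dplyr)
coalesce(c(NA, 2, NA), c(1, 5, NA), 0)
# [1] 1 2 0   (the first non-NA value, element by element)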

Find the column index where a value appears for the first time and the last time in R?

I have a data-frame which looks like this
a <- as.data.frame(c(1,0,0, 0,1,1,1,1,1,0,0))
I want to find, for every row, the column indexes where the value is 1 for the first time and the last time,
e.g. for row a it is 2, 10.
You could do:
x <- df==1
rbind(max.col(x, "first"), max.col(x, "last"))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,] 2 1 1 1 2 3 3 2 2 4
#[2,] 2 6 6 4 4 5 6 6 2 6
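The trick is that max.col() on the logical matrix df == 1 returns, for each row, the position of the maximum value (TRUE), with ties broken by "first" or "last". A minimal sketch for a single row:
x <- c(0, 1, 0, 1, 0) == 1
max.col(rbind(x), ties.method = "first")  # 2, the first TRUE
max.col(rbind(x), ties.method = "last")   # 4, the last TRUE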
Or with apply:
apply(df, 1, function(x) c(min(which(x==1)),max(which(x==1))))
data
df <- structure(list(a = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0), b = c(1,
0, 0, 0, 1, 0, 0, 1, 1, 0), c = c(0, 1, 0, 1, 0, 1, 1, 1, 0,
0), d = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1), e = c(0, 0, 0, 0, 0,
1, 0, 1, 0, 1), f = c(0, 1, 1, 0, 0, 0, 1, 1, 0, 1)), .Names = c("a",
"b", "c", "d", "e", "f"), row.names = c(NA, -10L), class = "data.frame")

Match information from a correlation matrix according to a p-value cutoff

I have used the rcorr function from the Hmisc library to calculate correlations and p-values, then extracted the p-values into one matrix and the correlation coefficients into another.
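For reference, the extraction step described here typically looks something like this (a sketch; dat stands for the original numeric data matrix, which is not shown in the question, and the Rvalue below has already been converted to 0/1):
library(Hmisc)
res <- rcorr(as.matrix(dat))
Rmat <- res$r  # correlation coefficients
Pmat <- res$P  # p-values, NA on the diagonal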
Rvalue<-structure(c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0,
0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0,
1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0,
1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1), .Dim = c(10L,
10L), .Dimnames = list(c("41699", "41700", "41701", "41702",
"41703", "41704", "41705", "41707", "41708", "41709"), c("41699",
"41700", "41701", "41702", "41703", "41704", "41705", "41707",
"41708", "41709")))
Pvalue <- structure(c(NA, 0, 0, 0, 0.0258814351024321, 0, 0, 0, 0, 0, 0,
NA, 6.70574706873595e-14, 0, 0, 2.1673942640632e-09, 1.08217552696743e-07,
0.0105345133269157, 0, 0, 0, 6.70574706873595e-14, NA, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, NA, 2.22044604925031e-15, 0, 0, 0, 0,
0, 0.0258814351024321, 0, 0, 2.22044604925031e-15, NA, 0, 0,
0, 0.000322310440723728, 0.00298460759118657, 0, 2.1673942640632e-09,
0, 0, 0, NA, 0, 0, 0, 0, 0, 1.08217552696743e-07, 0, 0, 0, 0,
NA, 0, 0, 0, 0, 0.0105345133269157, 0, 0, 0, 0, 0, NA, 0, 0,
0, 0, 0, 0, 0.000322310440723728, 0, 0, 0, NA, 0, 0, 0, 0, 0,
0.00298460759118657, 0, 0, 0, 0, NA), .Dim = c(10L, 10L), .Dimnames = list(
c("41699", "41700", "41701", "41702", "41703", "41704", "41705",
"41707", "41708", "41709"), c("41699", "41700", "41701",
"41702", "41703", "41704", "41705", "41707", "41708", "41709"
)))
Then I converted the correlation matrix to a Boolean (0/1) matrix, in which the number one means a good correlation. Now I want to match good correlations with significant p-values; I need an edge list that includes the p-value. I implemented the following code:
n=1
m=list()
for(i in 1:nrow(Rvalue))
{
for (j in 1:nrow(Rvalue))
{
if (i<j & Pvalue[i,j]<0.05 & Rvalue[i,j]==1)
{
m[[n]]<-c(rownames(Rvalue)[i], colnames(Rvalue)[j], signif(Pvalue[i,j], digits = 4))
n=n+1
}
}
print(i)
}
The output is:
> m
[[1]]
[1] "41699" "41700" "0"
[[2]]
[2] "41699" "41701" "0"
[[3]]
[3] "41699" "41702" "0"
[[4]]
[4] "41699" "41704" "0"
...
The result is OK, but since the matrices are very big, it takes a long time. How can I speed up this process? Please note that I need the node names. Are there any related functions?
I have also found two similar questions, but they are not exactly what I need. Thanks in advance.
You could try which() with arr.ind = TRUE, which returns the row and column positions of all matching cells in a single vectorized step:
indx <- which(Rvalue==1 & Pvalue < 0.05 & !is.na(Pvalue), arr.ind=TRUE)
d1 <- data.frame(rN=row.names(Rvalue)[indx[,1]],
cN=colnames(Rvalue)[indx[,2]], Pval=signif(Pvalue[indx],
digits=4))
head(d1,2)
# rN cN Pval
#1 41700 41699 0
#2 41701 41699 0
Update
Not sure why you are getting the same result when you change the cutoff. It may be that the p-values are so small that the condition is TRUE for all the cutoffs you tried. Here is an example to show that it does return different values. Suppose I create a function from the above code:
f1 <- function(Rmat, Pmat, cutoff){
indx <- which(Rmat==1 & Pmat < cutoff & !is.na(Pmat), arr.ind=TRUE)
d1 <- data.frame(rN=row.names(Rmat)[indx[,1]],
cN=colnames(Rmat)[indx[,2]], Pval=signif(Pmat[indx],
digits=4))
d1}
f1(R1, P1, 0.05)
# rN cN Pval
#1 B A 0.021
#2 C A 0.018
#3 D A 0.001
#4 A B 0.021
#5 A C 0.018
#6 E C 0.034
#7 A D 0.001
#8 C E 0.034
f1(R1, P1, 0.01)
# rN cN Pval
#1 D A 0.001
#2 A D 0.001
f1(R1, P1, 0.001)
#[1] rN cN Pval
#<0 rows> (or 0-length row.names)
data
set.seed(24)
R1 <- matrix(sample(c(0,1), 5*5, replace=TRUE), 5,5,
dimnames=list(LETTERS[1:5], LETTERS[1:5]))
R1[lower.tri(R1)] <- 0
R1 <- R1+t(R1)
diag(R1) <- 1
set.seed(49)
P1 <- matrix(sample(seq(0,0.07, by=0.001), 5*5, replace=TRUE), 5, 5,
dimnames=list(LETTERS[1:5], LETTERS[1:5]))
P1[lower.tri(P1)] <- 0
P1 <- P1+t(P1)
diag(P1) <- NA
Since your matrix has a large number of columns and rows, it would be a good idea to avoid the nested for loops. You can instead use the mapply function, which is handier:
mapply(FUN = NULL, ...)
Instead of FUN, use a function that compares an element against a threshold, for example:
myf <- function(x) { x < threshold }  # pseudocode: "threshold" stands for your cutoff
You can call mapply(FUN = myf, yourMatrix) twice to check whether the elements of the correlation and p-value matrices meet their thresholds.
Store the results in two Boolean matrices, P1 and P2, then multiply P1 and P2 (element-wise multiplication).
myf1 = function(x) {x<0.05}
myf2 = function(x) {x>0.7}
P1 = mapply(FUN = myf1 , matP)
P2 = mapply(FUN = myf2 , matR)
P = P1 * P2
The elements of P that are equal to 1 (TRUE in both matrices) are the desired node pairs. It will work fine!
And here is the result for your sample:
P1 = mapply(FUN = myf1 , Pvalue)
P2 = mapply(FUN = myf2 , Rvalue)
P = P1 * P2
NA  1  1  1  0  1  1  0  1  1
 1 NA  0  0  0  0  0  0  1  1
 1  0 NA  1  0  1  1  1  1  1
 1  0  1 NA  0  1  1  0  1  1
 0  0  0  0 NA  1  0  1  0  0
 1  0  1  1  1 NA  1  1  1  1
 1  0  1  1  0  1 NA  1  1  1
 0  0  1  0  1  1  1 NA  0  0
 1  1  1  1  0  1  1  0 NA  1
 1  1  1  1  0  1  1  0  1 NA
