R programming - from a matrix containing points to neighbours list

I know there is an R function that converts a neighbours list to a matrix.
I am trying to do the opposite: is there any way to convert a matrix that contains the x and y coordinates of points to a neighbours list?
Thank you.
PS: I tried the following:
require(spdep)
mat2listw(myMatrix)$neighbours
but this fails because mat2listw() expects a square matrix, and my matrix is 11 x 2.

You can try the following.
# need indices. Use Names or numeric index
rr <- if(is.null(rownames(myMatrix))) seq(nrow(myMatrix)) else rownames(myMatrix)
cc <- if(is.null(colnames(myMatrix))) seq(ncol(myMatrix)) else colnames(myMatrix)
inds <- as.matrix( expand.grid(X=rr, Y=cc) )
## as a data.frame
data.frame(inds, Dist=myMatrix[inds])
## as a list
ret <- apply(inds, 1, function(i) list(c(i, Dist=myMatrix[rbind(i)])))
# possibly unlist if needed
unlist(ret, recursive=FALSE)
Sample Data
set.seed(1)
myMatrix <- matrix(sample(8, 22, TRUE), ncol=2, dimnames=list(LETTERS[1:11], letters[1:2]))
myMatrix
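
If the goal is a list that gives, for each point, the indices of the points near it, one option is to threshold the pairwise distances. The following is only a sketch in base R: the coordinates and the cutoff of 1 are made-up assumptions, and the names coords, threshold and nb are illustrative. spdep itself provides dnearneigh() (distance band) and knearneigh() with knn2nb() (k nearest neighbours), which accept a coordinate matrix directly and return a proper nb object.

```r
# Hypothetical 11 x 2 coordinate matrix (columns x and y)
coords <- cbind(x = c(0, 1, 2, 0, 1, 2, 0, 1, 2, 5, 6),
                y = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 5, 5))

# Full matrix of pairwise Euclidean distances (11 x 11)
d <- as.matrix(dist(coords))

# Assumed cutoff: points at distance <= threshold count as neighbours
threshold <- 1
n <- nrow(coords)

# One integer vector of neighbour indices per point, excluding the point itself
nb <- lapply(seq_len(n), function(i) {
  unname(which(d[i, ] <= threshold & seq_len(n) != i))
})

nb[[5]]  # neighbours of the point at (1, 1)
```

The result is a plain list, not an object of class nb, so it would still need converting before being passed to other spdep functions.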

Related

R - change matrix values based on another matrix indices

I have two matrices:
m1 <- matrix(runif(750), nrow = 50, byrow=T)
m2 <- matrix(rep(TRUE,750), nrow = 50, byrow=T)
For each m1 row, I need to find the indices of the two lowest values. Then, I need to use the remaining indices (i.e. not the two lowest values) to assign FALSE in m2.
It is fairly easy to do for one row:
ind <- order(m1[1,], decreasing=FALSE)[1:2]
m2[1,][-ind] <- FALSE
Therefore, I can use a loop to do the same for all rows:
for (i in 1:dim(m1)[1]){
ind <- order(m1[i,], decreasing=FALSE)[1:2]
m2[i,][-ind] <- FALSE
}
However, on my data set this loop runs slower than I would like, since my matrices are quite large (500000 x 150000).
Is there a faster, more idiomatic R way to achieve the same result without an explicit loop?
You can try the code below
m2 <- t(apply(m1,1,function(x) x %in% head(sort(x),2)))
You can try apply since you have matrix :
val <- rep(TRUE, ncol(m1))
m3 <- t(apply(m1, 1, function(x) {val[-order(x)[1:2]] <- FALSE;val}))
You can do:
m2 <- t(apply(m1, 1, function(x) rank(x)<3))
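Using pmap from purrr (min_rank() comes from dplyr, so that package has to be loaded as well; the result is a data frame rather than a matrix):
library(purrr)
library(dplyr)
pmap_dfr(as.data.frame(m1), ~ min_rank(c(...)) < 3)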
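
A quick way to gain confidence in any of these replacements is to compare one against the original loop on a small matrix. The sketch below does that for the rank() version; the matrix size and the names m2_loop and m2_apply are assumptions made for illustration. With ties in a row the variants can disagree (rank() and %in% may flag more or fewer than two cells), so the comparison is only meaningful for data without ties.

```r
set.seed(42)
m1 <- matrix(runif(40), nrow = 8)   # small stand-in for the real data

# Reference result: the loop from the question
m2_loop <- matrix(TRUE, nrow = nrow(m1), ncol = ncol(m1))
for (i in seq_len(nrow(m1))) {
  ind <- order(m1[i, ])[1:2]
  m2_loop[i, -ind] <- FALSE
}

# apply() returns one column per row of m1, hence the transpose
m2_apply <- t(apply(m1, 1, function(x) rank(x) < 3))

# Every row keeps exactly two TRUE values, and both results agree
all(rowSums(m2_apply) == 2)
all(m2_loop == m2_apply)
```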

apply list of indices to list of dataframes

I need to apply a list of indices to a list of dataframes with a one-to-one mapping: the first element of the list of indices applies to the first dataframe only, and so on. The indices refer to rows of the dataframes.
A list of complementary dataframes also needs to be created by selecting the rows not mentioned in the list of indices.
Here is some sample data:
set.seed(1)
A <- data.frame(matrix(rnorm(40,0,1), nrow = 10))
B <- data.frame(matrix(rnorm(40,2,3), nrow = 10))
C <- data.frame(matrix(rnorm(40,3,4), nrow = 10))
dflis <- list(A,B,C)
# Create a sample row index
ix <- lapply(lapply(dflis,nrow), sample, size = 6)
So far I have managed this working but ugly looking code:
dflis.train <- lapply(seq_along(dflis), function(x) dflis[[x]][ix[[x]],])
dflis.test <- lapply(seq_along(dflis), function(x) dflis[[x]][-ix[[x]],])
Can someone suggest something better, more elegant?
Use Map/mapply instead of the univariate lapply, so that you can iterate over both objects and apply a function, like:
Map(function(d,r) d[r,], dflis, ix)
Or if you want to be fancy:
Map(`[`, dflis, ix, TRUE)
This matches the result of your original code:
identical(
Map(function(d,r) d[r,], dflis, ix),
lapply(seq_along(dflis), function(x) dflis[[x]][ix[[x]],])
)
#[1] TRUE
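
The question also asks for the complementary rows, which the Map() calls above do not cover. A minimal sketch, reusing the sample data from the question and assuming every index vector is non-empty (negative indexing with an empty vector would select no rows instead of all of them); the names train and test are illustrative:

```r
set.seed(1)
A <- data.frame(matrix(rnorm(40, 0, 1), nrow = 10))
B <- data.frame(matrix(rnorm(40, 2, 3), nrow = 10))
C <- data.frame(matrix(rnorm(40, 3, 4), nrow = 10))
dflis <- list(A, B, C)

# Six sampled row numbers per data frame
ix <- lapply(dflis, function(d) sample(nrow(d), size = 6))

# Rows named in ix, and the rows left over, paired element by element
train <- Map(function(d, r) d[r, ], dflis, ix)
test  <- Map(function(d, r) d[-r, ], dflis, ix)

sapply(train, nrow)  # 6 rows each
sapply(test, nrow)   # the remaining 4 rows each
```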

Loop with matrix accumulating values in column

I'm trying to make a loop to simplify:
dens1ha <- (densidade[1:45,5])
dens10ha <- (densidade[46:90,5])
dens100ha <- (densidade[91:135,5])
densfc <- (densidade[136:180,5])
denscap <- (densidade[181:225,5])
I need it stored in a single vector (x) and matrix (mm) as follows:
The values in rows 1 to 45, column 5, of the matrix densidade should be stored in column 1 of x and of the matrix mm; the values in rows 46 to 90, column 5, should be stored in column 2; and so on.
I tried:
x=c()
ii[1]=1
for(i in seq(1, 255, by = 44)) {
x[i]=densidade[i:(i+44),5]
ii=ii+1
mm = matrix(x,nrow=i,ncol=ii)
}
From your description of mm, I think this should do the trick:
mm <- matrix(densidade[,5], ncol=5)
You could also add names to the columns if this were desirable:
colnames(mm) <- c("dens1ha", "dens10ha", "dens100ha", "densfc", "denscap")
The goal of storing the vector x is less clear. I suspect that all of your goals may be achieved through extracting from the matrix mm rather than building a separate matrix:
# get dens1ha values as a vector
mm[,"dens1ha"]
mm[,1]
If you really would like to store these values in a separate, non-matrix structure, the most natural object to use in R is a list:
x <- list()
for(i in 1:5) {
x[[i]] <- densidade[(((i-1)*45)+1):(i*45),5]
}
# name the elements of the list
names(x) <- c("dens1ha", "dens10ha", "dens100ha", "densfc", "denscap")
You can extract vectors from this list using either
x[["dens1ha"]]
or
x[[1]]
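
The one-line matrix() call works because R fills matrices column by column, so each consecutive block of 45 values becomes one column. A small check of that, using a made-up stand-in for densidade (225 rows, 5 columns of integers; the real data is not shown in the question):

```r
# Hypothetical stand-in for densidade
densidade <- matrix(seq_len(225 * 5), nrow = 225, ncol = 5)

# Column-major fill: values 1-45 go to column 1, 46-90 to column 2, ...
mm <- matrix(densidade[, 5], ncol = 5)
colnames(mm) <- c("dens1ha", "dens10ha", "dens100ha", "densfc", "denscap")

# Column 2 holds exactly rows 46 to 90 of the original column 5
all(mm[, "dens10ha"] == densidade[46:90, 5])

# The same data as a named list, without an explicit loop
x <- as.list(as.data.frame(mm))
all(x[["denscap"]] == densidade[181:225, 5])
```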

Writing to dataframe in double for loop

I have a nested list (FCdat.pp.con[[x]][[y]]) in which separate columns hold the values of a matrix (one for each x and nested y). Now I want to build the matrices by writing to dataframes.
However, I cannot give the matrices the correct names in a double for loop. So far I have only managed to create matrices whose names carry information about "x" but not about "y". Please help me out or suggest another method to do this.
for (x in 1:36){
  for (y in 1:5){
    cells <- c(0,
               FCdat.pp.con[[x]][[y]]$A12,
               FCdat.pp.con[[x]][[y]]$A13,
               FCdat.pp.con[[x]][[y]]$A14,
               FCdat.pp.con[[x]][[y]]$A21,0,
               FCdat.pp.con[[x]][[y]]$A23,
               FCdat.pp.con[[x]][[y]]$A24,
               FCdat.pp.con[[x]][[y]]$A31,
               FCdat.pp.con[[x]][[y]]$A32,
               0,
               FCdat.pp.con[[x]][[y]]$A34,
               FCdat.pp.con[[x]][[y]]$A41,
               FCdat.pp.con[[x]][[y]]$A42,
               FCdat.pp.con[[x]][[y]]$A43,
               0)
    rnames <- c("ALG1","ALG2","ALG3","ALG4")
    cnames <- c("ALG1","ALG2","ALG3","ALG4")
    Ind.matr <- t(matrix(cells, nrow=4, ncol = 4))
    dimnames(Ind.matr)=list(rnames, cnames)
    assign(paste0("Indmatr", x), Ind.matr)
  }
}
Add this before the loop begins: result <- lapply(1:36, function(.)vector("list",5))
Then replace assign(paste0("Indmatr", x), Ind.matr) with result[[x]][[y]] <- Ind.matr.
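
The nested-list idea can also be written without assign() or an explicit double loop. The sketch below is illustrative only: the real FCdat.pp.con is not shown, so a small random stand-in (2 outer by 3 inner elements, each a list with entries A12 to A43) is assumed, and build_matrix, off_diag and alg are hypothetical names.

```r
set.seed(1)
off_diag <- c("A12", "A13", "A14", "A21", "A23", "A24",
              "A31", "A32", "A34", "A41", "A42", "A43")
alg <- paste0("ALG", 1:4)

# Hypothetical stand-in for FCdat.pp.con: 2 outer x 3 inner elements
make_entry <- function() as.list(setNames(runif(12), off_diag))
FCdat.pp.con <- replicate(2, replicate(3, make_entry(), simplify = FALSE),
                          simplify = FALSE)

# Build one 4 x 4 matrix: "Aij" goes to row i, column j; the diagonal stays 0
build_matrix <- function(e) {
  m <- matrix(0, nrow = 4, ncol = 4, dimnames = list(alg, alg))
  for (nm in off_diag) {
    i <- as.integer(substr(nm, 2, 2))
    j <- as.integer(substr(nm, 3, 3))
    m[i, j] <- e[[nm]]
  }
  m
}

# result[[x]][[y]] is the matrix for outer index x and inner index y
result <- lapply(FCdat.pp.con, function(inner) lapply(inner, build_matrix))
result[[2]][[3]]
```

Keeping the matrices in one nested list means both indices stay available for later lapply() or Map() calls, which is harder with separately named variables.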

How do you find the sample sizes used in calculations in R?

I am running correlations between variables, some of which have missing data, so the sample sizes for the correlations are likely to differ. I tried print and summary, but neither shows me how big n is for each correlation. This seems like a fairly simple problem, but I cannot find the answer anywhere.
Like this?
x <- c(1:100,NA)
length(x)
length(x[!is.na(x)])
You can also get the degrees of freedom (n - 2, where n is the number of complete pairs) like this:
y <- c(1:100,NA)
x <- c(1:100,NA)
cor.test(x,y)$parameter
But to get exact help, it would be best to show the code you are using to estimate the correlation.
Here's an example of how to find the pairwise sample sizes among the columns of a matrix. If you want to apply it to (certain) numeric columns of a data frame, combine them accordingly, coerce the resulting object to matrix and apply the function.
# Example matrix:
xx <- rnorm(3000)
# Generate some NAs
vv <- sample(3000, 200)
xx[vv] <- NA
# reshape to a matrix
dd <- matrix(xx, ncol = 3)
# find the number of NAs per column
apply(dd, 2, function(x) sum(is.na(x)))
# tack on some column names
colnames(dd) <- paste0("x", seq(3))
# Function to find the number of pairwise complete observations
# among all pairs of columns in a matrix. It returns a data frame
# whose first two columns comprise all column pairs
pairwiseN <- function(mat)
{
  u <- if(is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
  h <- expand.grid(x = u, y = u)
  f <- function(x, y)
    sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
  h$n <- mapply(f, h[, 1], h[, 2])
  h
}
# Call it
pairwiseN(dd)
The function can easily be improved; for example, you could set h <- expand.grid(x = u[-1], y = u[-length(u)]) to cut down on the number of calculations, you could return an n x n matrix instead of a three-column data frame, etc.
Here is a for-loop implementation of Dennis' function above that outputs an n x n result directly, rather than having to pivot_wider() the three-column data frame. On my Databricks cluster it cut the compute time for a 1865-row x 69-column matrix from 2.5-3 minutes to 30-40 seconds.
Thanks for your answer, Dennis; it helped me with my work.
pairwise_nxn <- function(mat)
{
  cols <- if(is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
  nn <- data.frame(matrix(nrow = length(cols), ncol = length(cols)))
  rownames(nn) <- colnames(nn) <- cols
  f <- function(x, y)
    sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
  for (i in 1:nrow(nn))
    for (j in 1:ncol(nn))
      nn[i,j] <- f(rownames(nn)[i], colnames(nn)[j])
  nn
}
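If your variables are vectors named a and b, would something like sum(!is.na(a) & !is.na(b)) help you? It counts the pairs in which both values are present, which is the n used for the correlation; sum(is.na(a) | is.na(b)) gives the number of pairs that are dropped.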
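
For larger matrices, the pairwise counts can also be obtained in one step from a cross-product of the "not missing" indicator matrix, since entry (i, j) of that product sums the rows where both column i and column j are present. This is a sketch of that alternative, not the method used in the answers above; the example data and the names ok and n_pairs are assumptions.

```r
set.seed(123)
# Example matrix with some values set to NA
dd <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, paste0("x", 1:3)))
dd[sample(length(dd), 40)] <- NA

# 1 where a value is present, 0 where it is missing
ok <- !is.na(dd)
storage.mode(ok) <- "numeric"

# t(ok) %*% ok: entry (i, j) is the number of rows complete in both columns
n_pairs <- crossprod(ok)
n_pairs

# Cross-check one pair against complete.cases()
n_pairs["x1", "x2"] == sum(complete.cases(dd[, c("x1", "x2")]))
```

The diagonal holds the number of non-missing values per column, and the matrix is symmetric, so no pivoting is needed.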
