Using mat2listw function in R to create spatial weights matrix - r

I am attempting to create a weights object in R with the mat2listw function. I have a very large spatial weights matrix (roughly 22,000x22,000)
that was created in Excel and read into R, and I'm now trying to implement:
library(spdep)
SW=mat2listw(matrix)
I am getting the following error:
Error in if (any(x<0)) stop ("values in x cannot be negative"): missing
value where TRUE/FALSE needed.
What's going wrong here? My current matrix is all 0's and 1's, with no
missing values and no negative elements. What am I missing?
I'd appreciate any advice. Thanks in advance for your help!

Here is a simple test to your previous comment:
library(spdep)
m1 <-matrix(rbinom(100, 1, 0.5), ncol =10, nrow = 10) #create a random 10 * 10 matrix
m2 <- m1 # create a duplicate of the first matrix
m2[5,4] <- NA # assign an NA value in the second matrix
SW <- mat2listw(m1) # create weight list matrix
SW2 <- mat2listw(m2) # create weight list matrix
The first matrix one does not fail, but the second matrix does. The real question is now why your weight matrix is created containing NAs. Have you considered creating spatial weight matrix in r? Using dnearneigh or other function.

Related

Using popbio package to calculate a population projection correctly?

So I have been working through a population ecology exercise using the popbio package in R-Studio that focuses on using Leslie Matrix's. I have successfully created a Leslie matrix with the proper dimensions using the Fecundity (mx) and Annual Survival values (sx) that I have calculated with my life table. I then am trying to use the pop.projection function in the popbio package to multiply my Leslie matrix (les.mat) by a starting population vector (N0) followed by the number of time intervals (4 years). It is my understanding that you should be able to take a Leslie matrix and multiply by a population vector to calculate a population size after a set number of time intervals. Have I done something wrong here, when I try to run my pop.projection line of code I get the following error message in R:
"> projA <- pop.projection(les.mat,N0,10)
Error in A %*% n : non-conformable arguments"
Could the problem be an issue with my pop.projection function? I am thinking it may be an issue with by N0 argument (population vector), when I look at my N0 values it seems like it has been saved in R as a "Numeric Type", should I be converting it into its own matrix, or as it's own vector somehow to get my pop.projection line of code to run? Any advice would be greatly appreciated, the short code I have been using will be linked below!
Sx <- c(0.8,0.8,0.7969,0.6078,0.3226,0)
mx <- c(0,0,0.6,1.09,0.2,0)
Fx <- mx # fecundity values
S <- Sx # dropping the first value
F <- Fx
les.mat <- matrix(rep(0,36),nrow=6)
les.mat[1,] <- F
les.mat
for(i in 1:5){
les.mat[(i+1),i] <- S[i]
}
les.mat
N0 <- c(100,80,64,51,31,10,0)
projA <- pop.projection(les.mat,N0,10)
The function uses matrix multiplication on the first and second arguments so they must match. The les.mat matrix is 6x6, but N0 is length 7. Try
projA <- pop.projection(les.mat, N0[-7], 10) # Delete last value
or
projA <- pop.projection(les.mat, N0[-1], 10) # Delete first value

Finding p value between two rasters in R

How can a p value be obtained between two rasters?
I currently have two rasters, and I would like to compute a p value.
I convert both into dataframes with na.rm=T.
df1<-as.data.frame(r1,na.rm=T)
df1<-as.data.frame(r2,na.rm=T)
cor.test(df1$gc,df2$ip)$p.value
Error in cor.test.default(df1$gc,df2$ip)) :
'x' and 'y' must have the same length
Even if I dont go for the na.rm, this comes
df1<-as.data.frame(r1)
df1<-as.data.frame(r2)
cor.test(df1$gc,df2$ip)$p.value
[1] 0
you have to pass a numeric vector to cor.test(). You are getting to that converting to a data.frame and getting the column, but it's probably easier to use the raster function values():
set.seed(666)
r1 <- raster(matrix(rnorm(100), 10, 10))
r2 <- raster(matrix(rnorm(100), 10, 10))
cor.test(values(r1), values(r2))$p.value
0.07144313
The error when you remove NA is because the two vector do not have the same length, i.e. in one vector you have more NA cells than in the other. Depending on many NA you have, the p-value you get at the end (0) may be important or not. Did you try making a simple boxplot or distribution plot of the values in both rasters? This may help you understand if the p-value is simply an artefact of too many NA.
d <- data.frame(
x = c(values(r1), values(r2)),
y = c(rep(1, ncell(r1)), rep(2, ncell(r2)))
)
boxplot(d$x ~ d$y)
/Emilio

Pearson coefficient per rows on large matrices

I'm currently working with a large matrix (4 cols and around 8000 rows).
I want to perform a correlation analysis using Pearson's correlation coefficient between the different rows composing this matrix.
I would like to proceed the following way:
Find Pearson's correlation coefficient between row 1 and row 2. Then between rows 1 and 3... and so on with the rest of the rows.
Then find Pearson's correlation coefficient between row 2 and row 3. Then between rows 2 and 4... and so on with the rest of the rows. Note I won't find the coefficient with row 1 again...
For those coefficients being higher or lower than 0.7 or -0.7 respectively, I would like to list on a separate file the row names corresponding to those coefficients, plus the coefficient. E.g.:
row 230 - row 5812 - 0.76
I wrote the following code for this aim. Unfortunately, it takes a too long running time (I estimated almost a week :( ).
for (i in 1:7999) {
print("Analyzing row:")
print(i)
for (j in (i+1):8000) {
value<- cor(alpha1k[i,],alpha1k[j,],use = "everything",method = "pearson")
if(value>0.7 | value<(-0.7)){
aristi <- c(row.names(alpha1k)[i],row.names(alpha1k)[j],value)
arist1p<-rbind(arist1p,aristi)
}
}
Then my question is if there's any way I could do this faster. I read about making these calculations in parallel but I have no clue on how to make this work. I hope I made myself clear enough, thank you on advance!
As Roland pointed out, you can use the matrix version of cor to simplify your task. Just transpose your matrix to get a "row" comparison.
mydf <- data.frame(a = c(1,2,3,1,2,3,1,2,3,4), b = rep(5,2,10), c = c(1:10))
cor_mat <- cor(t(mydf)) # correlation of your transposed matrix
idx <- which((abs(cor_mat) > 0.7), arr.ind = T) # get relevant indexes in a matrix form
cbind(idx, cor_mat[idx]) # combine coordinates and the correlation
Note that parameters use = everything and method = "pearson" are used by default for correlation. There is no need to specify them.

Covariance with colinear vectors

I'm trying to calculate the covariance of a matrix which has two colinear vectors. I have read that it was impossible with the "cov" function from R.
Does a different function exist on R to calculate the covariance of a matrix which has two colinear vectors (since it works on Matlab and Excel).
Thank you in advance for your answers
Please consider providing a reproducible example with sample of your data and the corresponding code. Broadly speaking, a covariance matrix can be created with use of the code below:
# Vectors
V1 <- c(1:4)
V2 <- c(4:8)
V3 <- runif(n = 4)
V4 <- runif(n = 4)
#create matrix
M <- cbind(V1,V2, V3, V4)
# Covariance
cov(M)
I'm guessing that you may be getting the following error:
number of rows of result is not a multiple of vector length (arg 1)
You could first try to use the cov function as discussed here.

How to combine data from different columns, e.g. mean of surrounding columns for a given column

I am trying to smooth a matrix by attributing the mean value of a window covering n columns around a given column. I've managed to do it but I'd like to see how would be 'the R way' of doing it as I am making use of for loops. Is there a way to get this using apply or some function of the same family?
Example:
# create a toy matrix
mat <- matrix(ncol=200);
for(i in 1:100){ mat <- rbind(mat,sample(1:200, 200) )}
# quick visualization
image(t(mat))
This is the matrix before smoothing:
I wrote the function smooth_mat that takes a matrix and the length of the smoothing kernel:
smooth_row_mat <- function(k, k.d=5){
k.range <- (k.d + 2):(ncol(k) - k.d - 1)
k.smooth <- matrix(nrow=nrow(k))
for( i in k.range){
if (i %% 10 == 0) cat('\r',round(i/length(k.range), 2))
k.smooth <- cbind( k.smooth, rowMeans(k[,c( (i-1-k.d):(i-1) ,i, (i+1):(i + 1 - k.d) )]) )
}
return(k.smooth)
}
Now we use smooth_row_mat() with mat
mat.smooth <- smooth_mat(mat)
And we have successfully smoothed, on a row basis, the content of the matrix.
This is the matrix after:
This method is good for such a small matrix although my real matrices are around 40,000 x 400, still works but I'd like to improve my R skills.
Thanks!
You can apply a filter (running mean) across each row of your matrix as follows:
apply(k, 1, filter, rep(1/k.d, k.d))
Here's how I'd do it, with the raster package.
First, create a matrix filled with random data and coerce it to a raster object.
library(raster)
r <- raster(matrix(sample(200, 200*200, replace=TRUE), nc=200))
plot(r)
Then use the focal function to calculate a neighbourhood mean for a neighbourhood of n cells either side of the focal cell. The values in the matrix of weights you provide to the focal function determine how much the value of each cell contributes to the focal summary. For a mean, we say we want each cell to contribute 1/n, so we fill a matrix of n columns, with values 1/n. Note that n must be an odd number, and the cell in the centre of the matrix is considered the focal cell.
n <- 3
smooth_r <- focal(r, matrix(1/n, nc=n))
plot(smooth_r)

Resources