Quadratic Assignment Procedure (QAP) in R is producing different results

I would like to say thank you in advance to anyone who looks at my question and shares their thoughts and experiences. I am trying to run a quadratic assignment procedure (QAP) on correlations of behaviors within a community of five individuals. I have ten matrices that represent frequencies of behavior between individuals, and I calculated correlations (Pearson's r) between pairs of matrices. For example, I found the correlation between matrix 1 and matrix 2, matrix 2 and matrix 3, matrix 3 and matrix 4, and so on. I then wanted to assess the significance of these correlations using the qaptest function from the R package sna. As per the R documentation on qaptest, I placed all of my matrices into an array. I then calculated the QAP p-value between pairs of matrices (matrix 1 and matrix 2, matrix 2 and matrix 3, etc.). However, I noticed that if I changed the number of matrices in the array (for example, if I only placed the first five into the array), the QAP p-values for the first pair of matrices changed dramatically. Based on my understanding of arrays and QAP, this should not happen, because the removed matrices have nothing to do with running a QAP test on matrix 1 and matrix 2. Has anyone else run into this problem before? I have included my matrices and my script below.
Here are my matrices in a list format (in the code below, this is the step where I made filelist1. The second half of the code only uses matrices 1-5):
[[1]]
1 2 3 4 5
1 1 0 0 0 0
2 5 0 3 5 0
3 0 0 0 0 0
4 0 0 0 0 0
5 2 0 1 0 0
[[2]]
1 2 3 4 5
1 0 0 1 0 0
2 3 6 10 1 2
3 0 0 0 0 0
4 0 5 0 0 0
5 0 0 5 0 0
[[3]]
1 2 3 4 5
1 0 1 0 0 0
2 2 0 5 7 0
3 0 0 0 0 3
4 1 0 0 0 0
5 1 2 2 3 0
[[4]]
1 2 3 4 5
1 0 6 0 0 2
2 2 0 8 5 0
3 0 5 0 0 0
4 1 0 0 0 0
5 0 0 1 3 2
[[5]]
1 2 3 4 5
1 0 0 0 0 0
2 1 0 2 5 1
3 0 0 0 0 0
4 1 2 3 0 1
5 0 3 3 1 0
[[6]]
1 2 3 4 5
1 0 0 0 0 0
2 2 0 3 0 3
3 0 0 0 0 0
4 1 0 4 0 0
5 1 5 7 0 0
[[7]]
1 2 3 4 5
1 0 0 0 0 0
2 2 0 6 0 3
3 0 0 0 0 0
4 6 0 4 0 0
5 1 0 2 0 0
[[8]]
1 2 3 4 5
1 0 0 0 1 0
2 2 0 1 6 0
3 0 0 0 0 0
4 0 0 0 0 0
5 6 0 2 2 0
[[9]]
1 2 3 4 5
1 0 0 0 0 0
2 0 0 2 3 2
3 0 0 0 0 0
4 0 0 0 0 0
5 1 0 2 0 0
[[10]]
1 2 3 4 5
1 0 0 0 0 0
2 1 0 1 1 0
3 0 0 0 0 0
4 0 0 0 0 0
5 6 0 1 2 0
This is my R script:
# read in all ten of the matrices
a<-read.csv("test1.csv")
b<-read.csv("test2.csv")
c<-read.csv("test3.csv")
d<-read.csv("test4.csv")
e<-read.csv("test5.csv")
f<-read.csv("test6.csv")
g<-read.csv("test7.csv")
h<-read.csv("test8.csv")
i<-read.csv("test9.csv")
j<-read.csv("test10.csv")
filelist<-list(a,b,c,d,e,f,g,h,i,j) #place files in a list
filelist1<-lapply(filelist,function(x){
x<-x[1:5, 2:6] #choose only columns in the matrix
colnames(x)<-1:5 #rename columns according to identity
x<-as.matrix(x) #make a matrix
return(x)
})
ee<-array(dim=c(5,5,10)) #create an empty array
array<-function(files) {
names(files) <- c("c1","c2","c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10") #name the matrices
invisible(lapply(names(files), function(x) assign(x,files[[x]],envir=.GlobalEnv))) #place the matrices in a global environment
ee[,,1]<-c(c1) #place each matrix in order into the array
ee[,,2]<-c(c2)
ee[,,3]<-c(c3)
ee[,,4]<-c(c4)
ee[,,5]<-c(c5)
ee[,,6]<-c(c6)
ee[,,7]<-c(c7)
ee[,,8]<-c(c8)
ee[,,9]<-c(c9)
ee[,,10]<-c(c10)
return(ee) #return the completely filled in array
}
a.array<-array(filelist1) # apply the function to the list of matrices
q1.2<-qaptest(a.array,gcor,g1=1,g2=2) #run the qaptest function
#a.array is the array with the matrices,gcor tells the function that we want a correlation
#g1=1 and g2=2 indicates that the qap analysis should be run between the first and second matrices in the array.
summary.qaptest(q1.2) #provides a summary of the qap results
#in this case, the p-value is roughly: p(f(perm) >= f(d)): 0.176
############ If I take out the last five matrices, the q1.2 p-value changes dramatically
#first clear the memory or R will not create another blank array
rm(list = ls())
a<-read.csv("test1.csv") #read in all five files
b<-read.csv("test2.csv")
c<-read.csv("test3.csv")
d<-read.csv("test4.csv")
e<-read.csv("test5.csv")
filelist<-list(a,b,c,d,e) #create a list of the files
filelist1<-lapply(filelist,function(x){
x<-x[1:5, 2:6] #include only the matrix
colnames(x)<-1:5 #rename the columns
x<-as.matrix(x) #make it a matrix
return(x)
})
ee<-array(dim=c(5,5,5)) #this time the array only has five slots
array<-function(files) {
names(files) <- c("c1","c2","c3", "c4", "c5")
invisible(lapply(names(files), function(x) assign(x,files[[x]],envir=.GlobalEnv)))
ee[,,1]<-c(c1)
ee[,,2]<-c(c2)
ee[,,3]<-c(c3)
ee[,,4]<-c(c4)
ee[,,5]<-c(c5)
return(ee)
}
a.array<-array(filelist1)
q1.2<-qaptest(a.array,gcor,g1=1,g2=2)
#in this case, the p-value is roughly: p(f(perm) >= f(d)): 0.804
summary.qaptest(q1.2)
I cannot think of a reason why the p-values would be so different when I am analyzing the exact same pair of matrices. The only difference is the number of additional matrices placed in the array. Has anyone else experienced this issue?
Thank you!

qaptest() reads graphs from the first dimension of the array, not the last. So ee[,,1]<-c(c1) (etc.) should read ee[1,,]<-c(c1) (etc.). Once all the graphs are stacked along the first dimension, the QAP tests yield identical results no matter how many other graphs are in the array. Personally, I prefer passing a list() rather than an array to qaptest.
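A minimal sketch of the corrected stacking, using synthetic 5x5 count matrices in place of the real data (the sna calls are guarded in case the package is not installed):

```r
set.seed(1)
# ten synthetic 5x5 frequency matrices standing in for filelist1
mats <- replicate(10, matrix(rpois(25, 1), 5, 5), simplify = FALSE)

# stack the graphs along the FIRST dimension, as sna expects
ee <- array(dim = c(10, 5, 5))
for (k in 1:10) ee[k, , ] <- mats[[k]]

# the first "slice" now really is the first graph
stopifnot(all(ee[1, , ] == mats[[1]]))

if (requireNamespace("sna", quietly = TRUE)) {
  q.arr  <- sna::qaptest(ee,   sna::gcor, g1 = 1, g2 = 2)  # array input
  q.list <- sna::qaptest(mats, sna::gcor, g1 = 1, g2 = 2)  # list input, same test
}
```

With either input form, the test for graphs 1 and 2 no longer depends on how many other graphs happen to sit in the stack.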

Related

Generating neighbors list in R

I am trying to generate a neighbors list, specifically an asymmetrical site-by-edge matrix (See below for paper). However, I am having a hard time creating multiple origins and have tried multiple different ways.
**1st attempt**
nb <- cell2nb(3, 2, "queen")
xy <- coordinates
xy
  no      lat      long
1  1 30.20924 -97.49967
2  2 30.11203 -97.32514
3  3 29.70528 -96.53542
4  4 29.53580 -97.88101
5  5 29.48454 -97.44769
6  6 28.82390 -97.03054
edge.mat <- aem.build.binary(nb, xy)
This generates a site-by-edge matrix and an edge list (both were shown as screenshots in the original post).
I want 0 (origin) to branch to both 1 and 4 separately. I also do not want 1:3 and 4:6 to cross. I want 1 to go to 2, 2 to go to 3 and then 4 to go to 5 and 5 to go to 6.
**2nd attempt**
I also tried to construct my own edges matrix and build a binary matrix that way, but 4 still does not branch from 0.
edges file:
  Var1 Var2
1    0    1
2    1    2
3    2    3
4    0    4
5    4    5
6    5    6
xy:
  no      lat      long
1  1 30.20924 -97.49967
2  2 30.11203 -97.32514
3  3 29.70528 -96.53542
4  4 29.53580 -97.88101
5  5 29.48454 -97.44769
6  6 28.82390 -97.03054
bin.mat <- aem.build.binary(coords = xy, link = edges)
bin.mat
$se.mat
      [,1] [,2] [,3] [,4] [,5]
 [1,]    0    0    0    0    0
 [2,]    0    1    0    0    0
 [3,]    0    1    1    0    0
 [4,]    0    0    0    0    0
 [5,]    0    0    0    1    0
 [6,]    0    0    0    1    1
 [7,]    1    0    0    0    0
 [8,]    0    0    0    0    0
 [9,]    0    0    0    0    0
[10,]    0    0    0    0    0
[11,]    0    0    0    0    0
[12,]    0    0    0    0    0
[13,]    0    0    0    0    0
[14,]    0    0    0    0    0
[15,]    0    0    0    0    0
$edges
  from to
     0  7
2    1  2
3    2  3
5    4  5
6    5  6
Any suggestions? I would greatly appreciate it. Thanks!
Blanchet, F. G., Legendre, P., Maranger, R., Monti, D., & Pepin, P. (2011). Modelling the effect of directional spatial ecological processes at different scales. Oecologia, 166(2), 357-368.

How to convert a binary data frame to a vector?

Suppose I have a data frame such like
dat<-data.frame('0'=c(1,1,0,0,0,0,0,0),
'1'=c(0,0,1,0,1,0,0,0),
'2'=c(0,0,0,1,0,0,1,1),
'3'=c(0,0,0,0,0,1,0,0))
dat
X0 X1 X2 X3
1 1 0 0 0
2 1 0 0 0
3 0 1 0 0
4 0 0 1 0
5 0 1 0 0
6 0 0 0 1
7 0 0 1 0
8 0 0 1 0
I want to convert it to a vector like 1,1,2,3,2,4,3,3, where each number gives the column position of the 1 in the corresponding row. For example, the 4 means that in row 6 the 1 sits in column 4.
Use
max.col(dat)
# [1] 1 1 2 3 2 4 3 3
Alternatively, with apply:
apply(dat == 1, 1, which)
#[1] 1 1 2 3 2 4 3 3
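If every row is guaranteed to be strictly one-hot (exactly one 1 per row), a third base-R option is a matrix product with the column indices. This is only a sketch: it silently gives wrong answers for rows containing zero or multiple 1's.

```r
dat <- data.frame('0' = c(1,1,0,0,0,0,0,0),
                  '1' = c(0,0,1,0,1,0,0,0),
                  '2' = c(0,0,0,1,0,0,1,1),
                  '3' = c(0,0,0,0,0,1,0,0))
# each one-hot row picks out its own column index in the product
v <- as.vector(as.matrix(dat) %*% seq_len(ncol(dat)))
v
# [1] 1 1 2 3 2 4 3 3
```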

How to create a variable for interquartile range in R?

I have a dataset (STATPOP2016 from the Swiss Federal Statistical Office) that contains the number of households of different sizes per hectare of Swiss territory. In other words, for each hectare i I have:
x1 households consisting of one individual
x2 households consisting of two individuals
...
x6 households with 6 or more individuals (I treat them as having 6 people for simplicity).
I need to create a variable that gives the interquartile range of household size for each hectare. I have code that works, but it is very slow. Is there a smarter way to do the same thing?
Here is my code:
# Vector that contains all possible sizes of households
vector_hh_size <- c(1:6)
# Variable for interquartile range in household size. A is my dataframe
A$hh_size_IQR <- 0
# Vector that contains frequency of each size of household in a given hectar
vector_hh_frequency <- c(0,0,0,0,0,0)
for (i in 1:NROW(A)) {
for (j in 1:6){
vector_hh_frequency[j] <- eval(parse(text = paste("A$hh",j,"[",i,"]",sep = "")))
}
A$hh_size_IQR[i] <- wtd.quantile(vector_hh_size, weights = vector_hh_frequency)[4] - wtd.quantile(vector_hh_size, weights = vector_hh_frequency)[2]
}
Here is example of data:
hh1 hh2 hh3 hh4 hh5 hh6 IQR
1 0 3 0 0 0 0 0
2 0 3 0 0 0 0 0
3 0 0 3 0 0 0 0
4 0 3 0 0 0 0 0
5 3 6 3 3 0 0 1
6 0 3 0 0 3 0 3
7 11 7 4 7 3 0 3
8 3 3 0 3 0 0 3
9 3 3 0 3 0 0 3
10 0 3 0 0 0 0 0
# OBS is the observation number; hhi shows how many households with i people there are. IQR is the interquartile range for each observation: this is the variable I am building.
Here is a shorter version of your code:
library("Hmisc")
A <- read.table(header=TRUE, text=
" hh1 hh2 hh3 hh4 hh5 hh6
1 0 3 0 0 0 0
2 0 3 0 0 0 0
3 0 0 3 0 0 0
4 0 3 0 0 0 0
5 3 6 3 3 0 0
6 0 3 0 0 3 0
7 11 7 4 7 3 0
8 3 3 0 3 0 0
9 3 3 0 3 0 0
10 0 3 0 0 0 0")
vector_hh_size <- 1:ncol(A)
myIQR <- function(Ai) wtd.quantile(vector_hh_size, weights=Ai)[4] - wtd.quantile(vector_hh_size, weights=Ai)[2]
A$IQR <- apply(A, 1, myIQR)
# > A
# hh1 hh2 hh3 hh4 hh5 hh6 IQR
# 1 0 3 0 0 0 0 0
# 2 0 3 0 0 0 0 0
# 3 0 0 3 0 0 0 0
# 4 0 3 0 0 0 0 0
# 5 3 6 3 3 0 0 1
# 6 0 3 0 0 3 0 3
# 7 11 7 4 7 3 0 3
# 8 3 3 0 3 0 0 3
# 9 3 3 0 3 0 0 3
# 10 0 3 0 0 0 0 0
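Since the weights here are integer counts, a base-R sketch that avoids the Hmisc dependency is to expand each row into individual household sizes and call quantile() directly. (Caveat: quantile()'s default type 7 interpolation can in general differ slightly from wtd.quantile's, though it reproduces the expected column for this data.)

```r
A <- read.table(header = TRUE, text = "
hh1 hh2 hh3 hh4 hh5 hh6
0 3 0 0 0 0
0 3 0 0 0 0
0 0 3 0 0 0
0 3 0 0 0 0
3 6 3 3 0 0
0 3 0 0 3 0
11 7 4 7 3 0
3 3 0 3 0 0
3 3 0 3 0 0
0 3 0 0 0 0")

# expand the counts into individual household sizes, then take the IQR
row_iqr <- function(w) {
  x <- rep(seq_along(w), times = w)
  unname(quantile(x, 0.75) - quantile(x, 0.25))
}
A$IQR <- apply(A, 1, row_iqr)
A$IQR
# [1] 0 0 0 0 1 3 3 3 3 0
```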

Error running the "netlm" command (sna)

I have four matrices of one multigraph, like this:
> projects
1 2 3 4 5
1 0 0 4 1 0
2 0 0 3 2 5
3 0 0 0 0 0
4 0 0 0 0 1
5 0 0 0 0 0
> infrastructure
1 2 3 4 5
1 0 0 0 5 0
2 0 0 4 0 0
3 0 0 0 2 2
4 0 0 0 0 3
5 0 0 0 0 0
> information
1 2 3 4 5
1 0 1 3 0 0
2 0 0 2 3 4
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
> problems
1 2 3 4 5
1 0 1 0 1 0
2 0 0 0 0 0
3 0 0 0 1 1
4 0 0 0 0 0
5 0 0 0 0 0
I rearranged them with:
x <- array(NA, c(length(infrastructure[1,]),length(infrastructure[,1]),3))
x[,,1] <- infrastructure
x[,,2] <- information
x[,,3] <- problems
nl <- netlm(projects,x,reps=100)
When I run the netlm command, the following message appears:
"Error in netlm(projects, x, reps = 100) :
Homogeneous graph orders required in netlm."
How can I fix it?
Thanks
The problem here is that netlm expects a list rather than an array, so I think it is not reading the entries as separate networks. The error indicates as much: it is not seeing three 5x5 matrices. Use list() instead.
nets <- rgraph(5,4)
y <- nets[1,,]
info <- nets[2,,]
infra <- nets[3,,]
prob <- nets[4,,]
Now, you can use list() in the netlm() command itself (saves a step):
nl <- netlm(y,list(info,infra,prob),reps=100)
Or you can create the list as an object and use it that way:
x <- list(info,infra,prob)
nl <- netlm(y,x,reps=100)
Since you have three separate networks already, you can just do:
nl <- netlm(projects,list(problems, information, infrastructure),reps=100)
I made a mistake in defining the array; I should have written: array(NA, c(3, length(infrastructure[1,]), length(infrastructure[,1])))
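That corrected first-dimension stacking can be sketched with synthetic 5x5 binary graphs standing in for the real matrices (the netlm calls are guarded since sna may not be installed):

```r
set.seed(42)
mk <- function() matrix(rbinom(25, 1, 0.3), 5, 5)
projects <- mk(); infrastructure <- mk(); information <- mk(); problems <- mk()

# stack the predictor graphs along the FIRST dimension
x <- array(NA, c(3, 5, 5))
x[1, , ] <- infrastructure
x[2, , ] <- information
x[3, , ] <- problems

# each slice is now one intact 5x5 network
stopifnot(all(x[2, , ] == information))

if (requireNamespace("sna", quietly = TRUE)) {
  nl1 <- sna::netlm(projects, x, reps = 100)  # array input
  nl2 <- sna::netlm(projects, list(infrastructure, information, problems),
                    reps = 100)               # equivalent list input
}
```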

Incidence Matrix of Experimental Design Using R language Program

I am working on an educational assignment to produce an incidence matrix from a BIB design using R.
I found a web page, http://wiki.math.yorku.ca/index.php/R:_Incidence_matrix, related to the problem, but it produces a data matrix instead of an incidence matrix. Can anyone please help me with the R code? The code for obtaining the BIB design matrix is:
b=4 # Number of Blocks
t=8 # Number of Column
z=c(1,2,3) # Shift
m=NULL
y=c(0)
w=c(y,cumsum(z) %%t) # cumsum() is for the running totals
p=seq(from=0, to=t-1, by=1)
l=NULL
for(i in 1:b)
{
for(j in 1:t)
{
l=c(l,rep((w[i]+p[j]+t)%% t))
}
}
#"BIB design" it has 4 rows (blocks b) and 8 column (treatments t)
x= matrix(c(l),nrow=b,ncol=t,byrow = TRUE)
print (x)
0 1 2 3 4 5 6 7
1 2 3 4 5 6 7 0
3 4 5 6 7 0 1 2
6 7 0 1 2 3 4 5
(In general, it can be generated for any number of treatments t and blocks b.)
Using the above design matrix x (4x8), I need the following incidence matrix (8x8):
1 1 0 1 0 0 1 0
0 1 1 0 1 0 0 1
1 0 1 1 0 1 0 0
0 1 0 1 1 0 1 0
0 0 1 0 1 1 0 1
0 1 0 0 1 0 1 1
1 0 1 0 0 1 0 1
Consider the design matrix column-wise and generate the incidence matrix row-wise. For example, the 1st column of x is
0
1
6
3
Now look at the 1st row of the required incidence matrix (IM):
1 1 0 1 0 0 1 0
The column of x contains 0, so put 1 in the 1st place of IM; it contains 1, so put 1 in the 2nd place as well. 2 is missing from the column, so put 0 in the 3rd place. The column contains 3, so put 1 in the 4th place; 4 and 5 are missing, so put two consecutive 0's. It contains 6, so put 1 in the 7th place; 7 is missing, so put 0 in the 8th place.
Take the 2nd column of x and fill the 2nd row of IM the same way: if a particular number (0 to 7) is present, put a one, otherwise a zero.
I hope I have made it clear for everyone now.
After changing the x matrix so that one column has two identical entries (to exercise the counting), I get this logic to work:
x[4, 1] <- 1
t(apply(x, 2, function(z) {
  ret <- numeric(8)
  for (i in seq_along(z)) ret[z[i] + 1] <- ret[z[i] + 1] + 1
  ret
}))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 0 1 0 0 0 0
[2,] 0 1 1 0 1 0 0 1
[3,] 1 0 1 1 0 1 0 0
[4,] 0 1 0 1 1 0 1 0
[5,] 0 0 1 0 1 1 0 1
[6,] 1 0 0 1 0 1 1 0
[7,] 0 1 0 0 1 0 1 1
[8,] 1 0 1 0 0 1 0 1
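If a strictly binary (0/1) incidence matrix is wanted rather than occurrence counts, a compact base-R sketch is to test each treatment label against each column of x with %in% (the design matrix is rebuilt here so the block is self-contained):

```r
b <- 4; t <- 8; z <- c(1, 2, 3)          # blocks, treatments, shift
w <- c(0, cumsum(z) %% t)
# rebuild the 4x8 design matrix from the question
x <- matrix(0L, b, t)
for (i in 1:b) for (j in 1:t) x[i, j] <- (w[i] + (j - 1)) %% t

# one incidence row per column of x: 1 if treatment 0..7 appears there, else 0
IM <- t(apply(x, 2, function(col) as.integer(0:(t - 1) %in% col)))
IM[1, ]
# [1] 1 1 0 1 0 0 1 0
```

Each row of IM sums to b = 4, since every column of the design contains four distinct treatments.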
I'm not exactly sure how you are going about getting your intended output. However, the reason you are getting a much longer output than you anticipated is possibly due to the [ as.factor(vec), ] part of your code.
as.factor(vec) is taking your 4x4 matrix and turning it into a single vector of 16 elements (well, technically, vec is already a vector, but let's not confuse things).
as.factor(vec)
[1] 0 1 3 2 1 2 0 3 2 3 1 0 3 0 2 1
Levels: 0 1 2 3
You are then using that as an index, which repeats values of A.
By the way, are you sure you should get a matrix of all 1's, and not just 1's on the diagonal?
contrasts( as.factor(vec), contrasts =FALSE)
# 0 1 2 3
# 0 1 0 0 0
# 1 0 1 0 0
# 2 0 0 1 0
# 3 0 0 0 1
