Incidence Matrix of an Experimental Design Using R

I am working on an educational assignment to produce an incidence matrix from a BIB design using R. I found a web page http://wiki.math.yorku.ca/index.php/R:_Incidence_matrix related to the problem, but it produces a design matrix instead of an incidence matrix. Can anyone please help me with the R code? The code for obtaining the BIB design matrix is:
b = 4                     # number of blocks
t = 8                     # number of treatments (columns)
z = c(1,2,3)              # shift
y = c(0)
w = c(y, cumsum(z) %% t)  # cumsum() gives the running totals
p = seq(from = 0, to = t - 1, by = 1)
l = NULL
for(i in 1:b)
{
for(j in 1:t)
{
l = c(l, (w[i] + p[j]) %% t)
}
}
# "BIB design": 4 rows (blocks b) and 8 columns (treatments t)
x = matrix(l, nrow = b, ncol = t, byrow = TRUE)
print(x)
0 1 2 3 4 5 6 7
1 2 3 4 5 6 7 0
3 4 5 6 7 0 1 2
6 7 0 1 2 3 4 5
(It can be generated for any number of treatments t and blocks b.)
Using the design matrix x (4 x 8) above, I need the following incidence matrix (8 x 8):
1 1 0 1 0 0 1 0
0 1 1 0 1 0 0 1
1 0 1 1 0 1 0 0
0 1 0 1 1 0 1 0
0 0 1 0 1 1 0 1
1 0 0 1 0 1 1 0
0 1 0 0 1 0 1 1
1 0 1 0 0 1 0 1
Consider the design matrix column-wise and generate the incidence matrix row-wise. For example, the 1st column of x is
0
1
3
6
Now look at the 1st row of the required incidence matrix (IM):
1 1 0 1 0 0 1 0
The column of x contains 0, so put a 1 in the 1st position of the IM row.
It also contains 1, so put a 1 in the 2nd position.
2 is missing from the column, so put a 0 in the 3rd position.
The column contains 3, so put a 1 in the 4th position; 4 and 5 are missing, so put two 0's in a row.
It contains 6, so put a 1 in the 7th position; 7 is missing, so put a 0 in the 8th position.
Take the 2nd column of x and fill the 2nd row of the IM the same way: if a particular number (0 to 7) is present, put a 1, otherwise a 0.
I hope I have made it clear for everyone now.
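The whole construction can be sketched in a few lines of base R. This is a sketch, not the question's exact code: it assumes the same b, t, and shift vector z, and uses %in% to test which treatments occur in each column of x.

```r
b <- 4; t <- 8
z <- c(1, 2, 3)
w <- c(0, cumsum(z) %% t)
# design matrix: row i holds the treatments 0..t-1 shifted by w[i]
x <- t(sapply(1:b, function(i) (w[i] + 0:(t - 1)) %% t))
# incidence matrix: row k of N marks which treatments appear in column k of x
N <- t(apply(x, 2, function(col) as.integer(0:(t - 1) %in% col)))
```

Row 1 of N is then 1 1 0 1 0 0 1 0, because column 1 of x contains 0, 1, 3, and 6, matching the walk-through above.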

Making the x matrix different so that one column contains two identical entries, I get this logic to work:
x[4,1] <- 1
t(apply(x, 2, function(z) {
  ret <- numeric(8)
  for (i in seq_along(z)) ret[z[i] + 1] <- ret[z[i] + 1] + 1
  ret
}))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 0 1 0 0 0 0
[2,] 0 1 1 0 1 0 0 1
[3,] 1 0 1 1 0 1 0 0
[4,] 0 1 0 1 1 0 1 0
[5,] 0 0 1 0 1 1 0 1
[6,] 1 0 0 1 0 1 1 0
[7,] 0 1 0 0 1 0 1 1
[8,] 1 0 1 0 0 1 0 1
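Going back to the question's unmodified x, a more compact route to the same counts is tabulate(); pmin() then binarizes the result if a plain 0/1 incidence matrix is wanted. A sketch that rebuilds the 4x8 design matrix directly:

```r
# the question's BIB design matrix
x <- matrix(c(0:7,
              c(1:7, 0),
              c(3:7, 0:2),
              c(6:7, 0:5)), nrow = 4, byrow = TRUE)
# count how often each treatment 0..7 occurs in each column of x
counts <- t(apply(x, 2, function(col) tabulate(col + 1, nbins = 8)))
# binarize for a 0/1 incidence matrix
inc <- pmin(counts, 1)
```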

I'm not exactly sure how you are going about getting your intended output. However, the reason you are getting a much longer output than you anticipated is likely the [ as.factor(vec), ] part of your code.
as.factor(vec) is taking your 4x4 matrix and turning it into a single vector of 16 elements. (Well, technically, vec is already a vector, but let's not confuse things).
as.factor(vec)
[1] 0 1 3 2 1 2 0 3 2 3 1 0 3 0 2 1
Levels: 0 1 2 3
You are then using that as an index, which is repeating values of A.
** By the way, are you sure you should get a matrix of all 1's? And not perhaps just 1's on the diagonal?
contrasts(as.factor(vec), contrasts = FALSE)
# 0 1 2 3
# 0 1 0 0 0
# 1 0 1 0 0
# 2 0 0 1 0
# 3 0 0 0 1
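If the goal is a one-hot indicator matrix (one row per element of vec, one column per level), a common idiom is to index an identity matrix by the factor. A sketch using the length-16 vector shown above:

```r
vec <- c(0, 1, 3, 2, 1, 2, 0, 3, 2, 3, 1, 0, 3, 0, 2, 1)
f <- as.factor(vec)
# factor indexing uses the integer level codes, so row i gets a 1
# in the column corresponding to vec[i]'s level
onehot <- diag(nlevels(f))[f, ]
colnames(onehot) <- levels(f)
```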

Related

I need help putting values from one vector into another in R

I have two vectors in R
Vector 1
0 0 0 0 0 0 0 0 0 0
Vector 2
1 1 3 1 1 1 1 1
I need to put the values from vector 2 into vector 1 but into specific positions so that vector 1 becomes
1 1 3 0 0 1 1 1 1 1
I need to do this in one line of code. I tried doing:
vector1[1:3,6:10] = vector2[1:3,4:8]
but I am getting the error "incorrect number of dimensions".
Is it possible to do this?
vector1[c(1:3,6:10)] = vector2[c(1:3,4:8)]
> vector1
[1] 1 1 3 0 0 1 1 1 1 1
We may use negative indexing
vector1[-(4:5)] <- vector2
vector1
[1] 1 1 3 0 0 1 1 1 1 1

How can I create this special sequence?

I would like to create the following vector sequence.
0 1 0 0 2 0 0 0 3 0 0 0 0 4
My thought was to create the 0's first with rep(), but I am not sure how to interleave the 1:4.
Create a diagonal matrix, take the upper triangle, and remove the first element:
d <- diag(0:4)
d[upper.tri(d, TRUE)][-1L]
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
If you prefer a one-liner that makes no global assignments, wrap it up in a function:
(function() { d <- diag(0:4); d[upper.tri(d, TRUE)][-1L] })()
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
And for code golf purposes, here's another variation using d from above:
d[!lower.tri(d)][-1L]
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
rep and rbind up to their old tricks:
rep(rbind(0,1:4),rbind(1:4,1))
#[1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
This essentially creates 2 matrices, one for the value, and one for how many times the value is repeated. rep does not care if an input is a matrix, as it will just flatten it back to a vector going down each column in order.
rbind(0,1:4)
# [,1] [,2] [,3] [,4]
#[1,] 0 0 0 0
#[2,] 1 2 3 4
rbind(1:4,1)
# [,1] [,2] [,3] [,4]
#[1,] 1 2 3 4
#[2,] 1 1 1 1
You can use rep() to create a sequence that repeats each value i a total of i + 1 times:
n <- 4
myseq <- rep(seq_len(n), seq_len(n) + 1)
# [1] 1 1 2 2 2 3 3 3 3 4 4 4 4 4
Then you can use diff() to find the elements you want. You need to append a 1 to the end of the diff() output, since you always want the last value.
c(diff(myseq), 1)
# [1] 0 1 0 0 1 0 0 0 1 0 0 0 0 1
Then you just need to multiply the original sequence with the diff() output.
myseq <- myseq * c(diff(myseq), 1)
myseq
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
unlist(lapply(1:4, function(i) c(rep(0, i), i)))
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
# the sequence
s = 1:4
# create zeros vector
vec = rep(0, sum(s+1))
# assign the sequence to the corresponding position in the zeros vector
vec[cumsum(s+1)] <- s
vec
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4
Or to be more succinct, use replace:
replace(rep(0, sum(s+1)), cumsum(s+1), s)
# [1] 0 1 0 0 2 0 0 0 3 0 0 0 0 4

Quadratic Assignment Procedure(QAP) in R is producing different results

I would like to say thank you in advance to anyone who looks at my question and shares their thoughts and experiences. I am trying to run a quadratic assignment procedure (QAP) on correlations of behaviors within a community of five individuals. I have ten matrices that represent frequencies of behavior between individuals, and I calculated correlations (Pearson's r) between pairs of matrices: matrix 1 and matrix 2, matrix 2 and matrix 3, matrix 3 and matrix 4, and so on.

I then wanted to assess the significance of these correlations using the qaptest function from the R package sna. As per the R documentation on qaptest, I placed all of my matrices into an array and calculated the QAP p-value between pairs of matrices (matrix 1 and matrix 2, matrix 2 and matrix 3, etc.). However, I noticed that if I changed the number of matrices in the array (for example, if I only placed the first five into it), the QAP p-values for the first pairs changed dramatically. Based on my understanding of arrays and QAP, this should not happen, because the removed matrices have nothing to do with running a QAP test on matrix 1 and matrix 2. Has anyone else run into this problem before? I have included my matrices and my script below.
Here are my matrices in a list format (in the code below, this is the step where I made filelist1. The second half of the code only uses matrices 1-5):
[[1]]
1 2 3 4 5
1 1 0 0 0 0
2 5 0 3 5 0
3 0 0 0 0 0
4 0 0 0 0 0
5 2 0 1 0 0
[[2]]
1 2 3 4 5
1 0 0 1 0 0
2 3 6 10 1 2
3 0 0 0 0 0
4 0 5 0 0 0
5 0 0 5 0 0
[[3]]
1 2 3 4 5
1 0 1 0 0 0
2 2 0 5 7 0
3 0 0 0 0 3
4 1 0 0 0 0
5 1 2 2 3 0
[[4]]
1 2 3 4 5
1 0 6 0 0 2
2 2 0 8 5 0
3 0 5 0 0 0
4 1 0 0 0 0
5 0 0 1 3 2
[[5]]
1 2 3 4 5
1 0 0 0 0 0
2 1 0 2 5 1
3 0 0 0 0 0
4 1 2 3 0 1
5 0 3 3 1 0
[[6]]
1 2 3 4 5
1 0 0 0 0 0
2 2 0 3 0 3
3 0 0 0 0 0
4 1 0 4 0 0
5 1 5 7 0 0
[[7]]
1 2 3 4 5
1 0 0 0 0 0
2 2 0 6 0 3
3 0 0 0 0 0
4 6 0 4 0 0
5 1 0 2 0 0
[[8]]
1 2 3 4 5
1 0 0 0 1 0
2 2 0 1 6 0
3 0 0 0 0 0
4 0 0 0 0 0
5 6 0 2 2 0
[[9]]
1 2 3 4 5
1 0 0 0 0 0
2 0 0 2 3 2
3 0 0 0 0 0
4 0 0 0 0 0
5 1 0 2 0 0
[[10]]
1 2 3 4 5
1 0 0 0 0 0
2 1 0 1 1 0
3 0 0 0 0 0
4 0 0 0 0 0
5 6 0 1 2 0
This is my R script:
# read in all ten of the matrices
a<-read.csv("test1.csv")
b<-read.csv("test2.csv")
c<-read.csv("test3.csv")
d<-read.csv("test4.csv")
e<-read.csv("test5.csv")
f<-read.csv("test6.csv")
g<-read.csv("test7.csv")
h<-read.csv("test8.csv")
i<-read.csv("test9.csv")
j<-read.csv("test10.csv")
filelist<-list(a,b,c,d,e,f,g,h,i,j) #place files in a list
filelist1<-lapply(filelist,function(x){
x<-x[1:5, 2:6] #choose only columns in the matrix
colnames(x)<-1:5 #rename columns according to identity
x<-as.matrix(x) #make a matrix
return(x)
})
ee<-array(dim=c(5,5,10)) #create an empty array
array<-function(files) {
names(files) <- c("c1","c2","c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10") #name the matrices
invisible(lapply(names(files), function(x) assign(x,files[[x]],envir=.GlobalEnv))) #place the matrices in a global environment
ee[,,1]<-c(c1) #place each matrix in order into the array
ee[,,2]<-c(c2)
ee[,,3]<-c(c3)
ee[,,4]<-c(c4)
ee[,,5]<-c(c5)
ee[,,6]<-c(c6)
ee[,,7]<-c(c7)
ee[,,8]<-c(c8)
ee[,,9]<-c(c9)
ee[,,10]<-c(c10)
return(ee) #return the completely filled in array
}
a.array<-array(filelist1) # apply the function to the list of matrices
q1.2<-qaptest(a.array,gcor,g1=1,g2=2) #run the qaptest function
#a.array is the array with the matrices,gcor tells the function that we want a correlation
#g1=1 and g2=2 indicates that the qap analysis should be run between the first and second matrices in the array.
summary.qaptest(q1.2) #provides a summary of the qap results
#in this case, the p-value is roughly: p(f(perm) >= f(d)): 0.176
############ If I take out the last five matrices, the q1.2 p-value changes dramatically
#first clear the memory or R will not create another blank array
rm(list = ls())
a<-read.csv("test1.csv") #read in all five files
b<-read.csv("test2.csv")
c<-read.csv("test3.csv")
d<-read.csv("test4.csv")
e<-read.csv("test5.csv")
filelist<-list(a,b,c,d,e) #create a list of the files
filelist1<-lapply(filelist,function(x){
x<-x[1:5, 2:6] #include only the matrix
colnames(x)<-1:5 #rename the columns
x<-as.matrix(x) #make it a matrix
return(x)
})
ee<-array(dim=c(5,5,5)) #this time the array only has five slots
array<-function(files) {
names(files) <- c("c1","c2","c3", "c4", "c5")
invisible(lapply(names(files), function(x) assign(x,files[[x]],envir=.GlobalEnv)))
ee[,,1]<-c(c1)
ee[,,2]<-c(c2)
ee[,,3]<-c(c3)
ee[,,4]<-c(c4)
ee[,,5]<-c(c5)
return(ee)
}
a.array<-array(filelist1)
q1.2<-qaptest(a.array,gcor,g1=1,g2=2)
#in this case, the p-value is roughly: p(f(perm) >= f(d)): 0.804
summary.qaptest(q1.2)
I cannot think of a reason why the p-values would be so different when I am analyzing the exact same pair of matrices. The only difference is the number of additional matrices placed in the array. Has anyone else experienced this issue?
Thank you!
qaptest() reads graphs from the first dimension of the array, not the last. So ee[,,1]<-c(c1) (etc.) should read ee[1,,]<-c(c1) (etc.). When you place all the graphs in the first dimension, the qaptests yield identical results. Personally, I prefer passing a list() rather than an array() to qaptest.
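A minimal sketch of that fix, with two stand-in 5x5 matrices in place of the question's filelist1 (the data and variable names here are illustrative):

```r
mats <- list(matrix(1:25, 5, 5), matrix(26:50, 5, 5))  # stand-ins for filelist1
# graphs go in the FIRST dimension of the array
g <- array(NA, dim = c(length(mats), 5, 5))
for (i in seq_along(mats)) g[i, , ] <- mats[[i]]
# g can then be passed to sna::qaptest(g, gcor, g1 = 1, g2 = 2)
```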

Working With a Diagonal Matrix

Hi, I'm pretty much stumped trying to figure this out and could use a little help. Basically, I have an n x n matrix where the diagonal is set to a value k and every other value is 0.
1 2 3 4 5
1 k 0 0 0 0
2 0 k 0 0 0
3 0 0 k 0 0
4 0 0 0 k 0
5 0 0 0 0 k
Basically, I need to be able to make two other diagonals in this matrix with the value of 1 so it ends up looking like this:
1 2 3 4 5
1 k 1 0 0 0
2 1 k 1 0 0
3 0 1 k 1 0
4 0 0 1 k 1
5 0 0 0 1 k
So far all I have is the code to make the diagonal matrix, m = diag(k, n, n), but I have no idea how to add the two other diagonals. Would I use apply() with cbind() or rbind()?
You can use col() and row() to create an index to subset and assign the upper and lower diagonals.
k <- 3
m <- k * diag(6)
m[abs(row(m) - col(m)) == 1] <- 1
m
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 3 1 0 0 0 0
#[2,] 1 3 1 0 0 0
#[3,] 0 1 3 1 0 0
#[4,] 0 0 1 3 1 0
#[5,] 0 0 0 1 3 1
#[6,] 0 0 0 0 1 3
If you want only one of the off-diagonals, drop abs(): col(m) - row(m) == 1 selects the upper diagonal and row(m) - col(m) == 1 the lower.
Try this function; it makes a matrix of dimensions row x col with the value diag on the main diagonal and 1's on the adjacent diagonals (the original default diag = n referenced an undefined n, so the argument is now required):
matfun <- function(diag, row = 4, col = 4) {
  x <- diag(1, row, col)  # R still finds base::diag here, despite the argument name
  diag * x +
    rbind(rep(0, col), x[1:(row - 1), ]) +  # identity shifted down: lower diagonal
    cbind(rep(0, row), x[, 1:(col - 1)])    # identity shifted right: upper diagonal
}
HTH

R- creating a counter-party frequency matrix

I have data from a barter economy. I am trying to create a matrix that counts how frequently items act as counterparties with other items.
As an example:
myDat <- data.frame(
TradeID = as.factor(c(1,1,1,2,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8)),
Origin = as.factor(c(1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0)),
ItemID = as.factor(c(1,2,3,4,5,1,1,6,7,1,1,8,7,5,1,1,2,3,4))
)
TradeID Origin ItemID
1 1 1 1
2 1 0 2
3 1 0 3
4 2 1 4
5 2 1 5
6 2 0 1
7 3 1 1
8 3 0 6
9 4 1 7
10 4 0 1
11 5 1 1
12 5 0 8
13 6 1 7
14 6 0 5
15 7 1 1
16 7 0 1
17 8 1 2
18 8 0 3
19 8 0 4
20 9 1 1
21 9 0 8
Where TradeID indicates a specific transaction. ItemID indicates an item, and Origin indicates which direction the item went.
For example, given my data the matrix I'd create would look something like this:
For example, the value 2 at [1,8] indicates that item 1 & 8 were counterparties in two trades. (Note that it's a symmetric matrix, and so [8,1] also has the value 2).
While the value of 1 at [1,2] indicates that item 1 and 2 were counterparties in only one trade (all the other 1s throughout the matrix indicate the same)
As an odd example, note that the value 1 at [1,1] indicates that item 1 was a counterparty to itself once (trade number 7).
A little extra insight into my motivation, note in my simple example that item 1 tends to act as counterparty with many different items. In a barter economy (one without explicit money) we might expect a commodity currency to be a counterparty relatively more frequently than non-commodity-currencies. A matrix like this would be the first step at one way of discovering which item was a commodity currency.
I've been struggling with this for a while. But I think I'm nearly done with an overly complicated solution, which I'll post shortly.
I'm curious if y'all might offer a bit of help also.
Alright, I think I've got this figured out. The short answer is:
Reduce("+", by(myDat, myDat$TradeID, function(x)
  pmin(table(x$ItemID[x$Origin == 0]) %o% table(x$ItemID[x$Origin == 1]) +
       table(x$ItemID[x$Origin == 1]) %o% table(x$ItemID[x$Origin == 0]), 1)))
Which gives the following matrix, matching the desired result:
1 2 3 4 5 6 7 8
1 1 1 1 1 1 1 1 2
2 1 0 1 1 0 0 0 0
3 1 1 0 0 0 0 0 0
4 1 1 0 0 0 0 0 0
5 1 0 0 0 0 0 1 0
6 1 0 0 0 0 0 0 0
7 1 0 0 0 1 0 0 0
8 2 0 0 0 0 0 0 0
Here's the long answer. You can get a list of matrices, one per TradeID, using the by, outer (%o%), and table functions. But this double-counts trade 7, where item 1 is traded for item 1, so I use the pmin function to fix this. Then I sum across the list with the Reduce function.
And here are the steps to get there. Note the addition of TradeID 9, which was left out of the question's code.
# Data
myDat <- data.frame(
TradeID = as.factor(c(1,1,1,2,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8,9,9)),
Origin = as.factor(c(1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0,1,0)),
ItemID = as.factor(c(1,2,3,4,5,1,1,6,7,1,1,8,7,5,1,1,2,3,4,1,8))
)
# Sum in 1 direction
by(myDat, myDat$TradeID, function(x) table(x$ItemID[x$Origin==0]) %o% table(x$ItemID[x$Origin==1]))
# Sum in both directions
by(myDat, myDat$TradeID, function(x) table(x$ItemID[x$Origin==1]) %o% table(x$ItemID[x$Origin==0]) + table(x$ItemID[x$Origin==0]) %o% table(x$ItemID[x$Origin==1]))
# Remove double-count in trade 7
by(myDat, myDat$TradeID, function(x) pmin(table(x$ItemID[x$Origin==0]) %o% table(x$ItemID[x$Origin==1]) + table(x$ItemID[x$Origin==1]) %o% table(x$ItemID[x$Origin==0]),1))
# Sum across lists
Reduce("+",by(myDat, myDat$TradeID, function(x) pmin(table(x$ItemID[x$Origin==0]) %o% table(x$ItemID[x$Origin==1]) + table(x$ItemID[x$Origin==1]) %o% table(x$ItemID[x$Origin==0]),1)))
One way to speed this up would be to sum in only 1 direction (taking advantage of symmetry) and then clean up the results.
result = Reduce("+",by(myDat, myDat$TradeID, function(x) table(x$ItemID[x$Origin==0]) %o% table(x$ItemID[x$Origin==1])))
result2 = result + t(result)
diag(result2) = diag(result)
result2
1 2 3 4 5 6 7 8
1 1 1 1 1 1 1 1 2
2 1 0 1 1 0 0 0 0
3 1 1 0 0 0 0 0 0
4 1 1 0 0 0 0 0 0
5 1 0 0 0 0 0 1 0
6 1 0 0 0 0 0 0 0
7 1 0 0 0 1 0 0 0
8 2 0 0 0 0 0 0 0
This appears to run nearly twice as fast.
> microbenchmark(Reduce("+",by(myDat, myDat$TradeID, function(x) pmin(table(x$ItemID[x$Origin==0]) %o% table(x$ItemID[x$Origin==1]) + table(x$ItemID[x$Origin==1]) %o% table(x$ItemID[x$Origin==0]),1))))
Unit: milliseconds
min lq median uq max neval
7.489092 7.733382 7.955861 8.536359 9.83216 100
> microbenchmark(Reduce("+",by(myDat, myDat$TradeID, function(x) table(x$ItemID[x$Origin==0]) %o% table(x$ItemID[x$Origin==1]))))
Unit: milliseconds
min lq median uq max neval
4.023964 4.18819 4.277767 4.452824 5.801171 100
This will give you the number of observations per TradeID and ItemID
myDat <- data.frame(
TradeID = as.factor(c(1,1,1,2,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8)),
Origin = as.factor(c(1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0)),
ItemID = as.factor(c(1,2,3,4,5,1,1,6,7,1,1,8,7,5,1,1,2,3,4))
)
result = tapply(myDat$Origin, list(myDat$ItemID,myDat$TradeID), length)
result[is.na(result)] = 0
result["1","7"]
result will then be:
> result
1 2 3 4 5 6 7 8
1 1 1 1 1 1 0 2 0
2 1 0 0 0 0 0 0 1
3 1 0 0 0 0 0 0 1
4 0 1 0 0 0 0 0 1
5 0 1 0 0 0 1 0 0
6 0 0 1 0 0 0 0 0
7 0 0 0 1 0 1 0 0
8 0 0 0 0 1 0 0 0
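The same count table can be produced more directly with xtabs(), which fills empty cells with 0 automatically:

```r
myDat <- data.frame(
  TradeID = as.factor(c(1,1,1,2,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8)),
  Origin  = as.factor(c(1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0)),
  ItemID  = as.factor(c(1,2,3,4,5,1,1,6,7,1,1,8,7,5,1,1,2,3,4))
)
# contingency table of ItemID by TradeID
counts <- xtabs(~ ItemID + TradeID, myDat)
counts["1", "7"]  # item 1 appears twice in trade 7
```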
This will give you the proportion of 1 Origin per TradeID and ItemID
result = tapply(myDat$Origin, list(myDat$ItemID, myDat$TradeID),
                function(x) sum(as.numeric(as.character(x))) / length(x))
You can set the NA values in the last matrix to 0 using result[is.na(result)] = 0 but that would confuse no observations with nothing but 0 Origin trades.
This will give you the number of observations per consecutive ItemIDs:
idxList <- with(myDat, tapply(ItemID, TradeID, FUN = function(items)
lapply(seq(length(items) - 1),
function(i) sort(c(items[i], items[i + 1])))))
# indices of observations
idx <- do.call(rbind, unlist(idxList, recursive = FALSE))
# create a matrix
ids <- unique(myDat$ItemID)
mat <- matrix(0, length(ids), length(ids))
# place values in matrix
for (i in seq(nrow(idx))) {
mat[idx[i, , drop = FALSE]] <- mat[idx[i, , drop = FALSE]] + 1
}
# create a symmetric matrix
mat[lower.tri(mat)] <- t(mat)[lower.tri(mat)]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 0 0 1 1 1 1
[2,] 1 0 2 0 0 0 0 0
[3,] 0 2 0 1 0 0 0 0
[4,] 0 0 1 0 1 0 0 0
[5,] 1 0 0 1 0 0 1 0
[6,] 1 0 0 0 0 0 0 0
[7,] 1 0 0 0 1 0 0 0
[8,] 1 0 0 0 0 0 0 0
