Count cycles in a network - R

What is the best way to count both 3-cycles and 4-cycles in networks, or is anything for this already implemented in R?
3-cycles are connected groups of three nodes (triangles), to be counted in one-mode networks.
4-cycles are connected groups of four nodes (squares), to be counted in two-mode networks.
If I have networks like this:
onemode <- read.table(text= "start end
1 2
1 3
4 5
4 6
5 6",header=TRUE)
twomode <- read.table(text= "typa typev
aa a
bb b
bb a
aa b",header=TRUE)
I thought
library(igraph)
g <- graph.data.frame(twomode)
E(g)
graph.motifs(g, size = 4)
would count the number of squares in my two-mode network, but I don't understand the output. I expected the result to be 1.

?graph.motifs
graph.motifs searches a graph for motifs of a given size and returns a
numeric vector containing the number of different motifs. The order of
the motifs is defined by their isomorphism class, see graph.isoclass.
So the output is a numeric vector in which each value is the count of one particular motif (of size 3 or 4) in your graph.
graph.motifs(g,size=4)
To get the total number of the motifs, you can use graph.motifs.no
graph.motifs.no(g,size=4)
[1] 1
That single motif is the 20th entry of the vector returned by graph.motifs:
which(graph.motifs(g,size=4) >0)
[1] 20
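As a dependency-free cross-check on the motif counts above, triangles and squares can also be counted from powers of the adjacency matrix. This is not what graph.motifs does internally; it is a sketch that assumes a simple undirected graph (no self-loops, no duplicate edges) and, for the two-mode data, that node names in the two columns do not collide. The helper names adjacency, n_triangles and n_squares are made up for this illustration.

```r
onemode <- read.table(text = "start end
1 2
1 3
4 5
4 6
5 6", header = TRUE)
twomode <- read.table(text = "typa typev
aa a
bb b
bb a
aa b", header = TRUE)

# Build a symmetric 0/1 adjacency matrix from a two-column edge list
adjacency <- function(edges) {
  from <- as.character(edges[[1]])
  to <- as.character(edges[[2]])
  nodes <- unique(c(from, to))
  A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  A[cbind(from, to)] <- 1
  A[cbind(to, from)] <- 1
  A
}

# Each triangle contributes 6 closed walks of length 3
n_triangles <- function(A) sum(diag(A %*% A %*% A)) / 6

# Closed walks of length 4, minus the walks that only go back and forth
# along one edge or one 2-path; each square is then counted 8 times
n_squares <- function(A) {
  d <- rowSums(A)
  A2 <- A %*% A
  (sum(diag(A2 %*% A2)) - 2 * sum(d^2) + sum(d)) / 8
}

n_triangles(adjacency(onemode))  # 1 (the triangle 4-5-6)
n_squares(adjacency(twomode))    # 1 (the square aa-a-bb-b)
```

Dense matrix products are fine for small networks like these; for large graphs a dedicated routine would be the better choice.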

Another function that might be easier to use for this task is kcycle.census from the sna package. Details: http://svitsrv25.epfl.ch/R-doc/library/sna/html/path.census.html

Related

cutree alternative to extract cluster with given number of objects

While stats::cutree() takes an hclust object and cuts it into a given number of clusters, I'm looking for a function that takes a given number of elements and attempts to set k accordingly. In other words: return the first cluster with n elements.
For example:
Searching for the first cluster with n = 9 objects.
library(psych)
data(bfi)
x <- bfi
hclust.res <- hclust(dist(abs(cor(na.omit(x)))))
cutree.res <- cutree(hclust.res, k = 2)
cutree.table <- table(cutree.res)
cutree.table
# no cluster with n = 9 elements
> cutree.table
cutree.res
1 2
23 5
while k = 3 yields
cutree.res <- cutree(hclust.res, k = 3)
cutree.table <- table(cutree.res)
# three clusters, whereas cluster 2 contains the required amount of objects
> cutree.table
cutree.res
1 2 3
14 9 5
Is there a more convenient way than iterating over k?
Thanks
You can easily write code for this yourself that makes only one pass over the dendrogram rather than calling cutree in a loop.
Just execute the merges one by one and note the cluster sizes. Then keep the one that you "liked" the best.
Note that there might be no such solution. For example, on the one-dimensional data set -11 -10 +10 +11, cutting the dendrogram in merge order will only ever return clusters with 1, 2, or 4 elements. So you'll have to handle this case, too.
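The one-pass idea could look roughly like the sketch below. It walks the merge component of the hclust result, tracks the members of each cluster as it is formed, and stops at the first merge that produces exactly n members. The function name first_cluster_of_size is hypothetical, and "first" here means first in merge order (bottom-up), which is an assumption about what is wanted; only clusters created by a merge (size 2 or more) are considered.

```r
# Return the observation indices of the first cluster (in merge order)
# that has exactly n members, or NULL if no such cluster exists.
first_cluster_of_size <- function(hc, n) {
  merges <- hc$merge
  members <- vector("list", nrow(merges))
  for (i in seq_len(nrow(merges))) {
    # Negative entries are single observations, positive entries refer
    # to the cluster formed at that earlier merge step
    parts <- lapply(merges[i, ], function(j) if (j < 0) -j else members[[j]])
    members[[i]] <- unlist(parts)
    if (length(members[[i]]) == n) return(sort(members[[i]]))
  }
  NULL
}

# The one-dimensional example from above: no cluster of size 3 exists
hc <- hclust(dist(c(-11, -10, 10, 11)))
first_cluster_of_size(hc, 3)  # NULL
first_cluster_of_size(hc, 4)  # all four observations
```

Returning NULL makes the "no solution" case explicit, so the caller can fall back to the nearest size or report a failure.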

Iterating Pearson's r correlation through all unique pairs

If I have an input file with 7 columns (or an arbitrary number of columns), each holding a number of values, and I want to run correlations for all unique pairs of columns (where AB and BA are the same) without having to call cor.test(column$A, column$B) by hand for every possible pair, how do I do this in R?
Example data:
A B C D
1 2 2 3
3 2 2 1
5 2 4 3
5 2 3 3
In this case A, B, C,D are different columns and I would want to do all possible correlations for unique pairs where AB and BA count as the same pair, just as DA and AD would be the same pair.
You can try:
cor(df, use="complete.obs", method="kendall") #or whichever method fits you
or:
# rcorr() also gives significance levels; it expects a numeric matrix, so convert the data frame first
library(Hmisc)
rcorr(as.matrix(df), type="pearson") # type can be "pearson" or "spearman"
If this doesn't work, try creating a list of column vectors and looping over every pair.
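A loop over all unique pairs might be sketched like this, using combn() to enumerate each pair once and cor.test() on each. The data frame below is randomly generated for illustration only (it is not the data from the question), and the result column names are arbitrary.

```r
# Hypothetical example data; replace with your own data frame
set.seed(1)
df <- data.frame(A = rnorm(20), B = rnorm(20), C = rnorm(20), D = rnorm(20))

# Each column of `pairs` holds one unique pair of column names (AB but not BA)
pairs <- combn(names(df), 2)

# Run cor.test() once per pair and collect estimate and p-value
results <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(k) {
  a <- pairs[1, k]
  b <- pairs[2, k]
  ct <- cor.test(df[[a]], df[[b]], method = "pearson")
  data.frame(var1 = a, var2 = b, r = unname(ct$estimate), p.value = ct$p.value)
}))

results
```

With four columns this gives six rows, one per unordered pair; the same code works unchanged for any number of numeric columns.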
Hope this helps

Formatting data for two-sample t-tests in R

Suppose I have the dataset that has the following information:
1) Number (of products bought, for example)
1 2 3
2) Frequency for each number (e.g., how many people purchased that number of products)
2 5 10
Let's say I have the above information for each of the 2 groups: control and test data.
How do I format the data such that it would look like this:
controldata<-c(1,1,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
(each number * frequency listed as a vector)
testdata<- (similar to above)
so that I can perform the independent two-sample t-test in R?
If I don't even need to make them a vector / if there's an alternative clever way to format the data to perform the t-test, please let me know!
It would be simple if the vector is small like above, but I can have the frequency>10000 for each number.
P.S.
Control and test data have a different sample size.
Thanks!
Use rep. Using your data above
rep(c(1, 2, 3), c(2, 5, 10))
# [1] 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Or, for your case
control_data = rep(n_bought, frequency)
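Putting it together for both groups, a sketch could look like the following. The control frequencies are the ones from the question; the test-group frequencies are invented here purely so the example runs. The two groups may have different sizes, which t.test() handles.

```r
n_bought <- c(1, 2, 3)

control_freq <- c(2, 5, 10)  # from the question
test_freq <- c(4, 6, 3)      # hypothetical values, for illustration only

# Expand the frequency tables into one observation per person
control_data <- rep(n_bought, control_freq)
test_data <- rep(n_bought, test_freq)

# Two independent samples; Welch's unequal-variance test is the default
res <- t.test(control_data, test_data)
res
```

rep() copes with large counts without trouble, so frequencies above 10000 only make the vectors longer.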

How to compare communities in two consecutive graphs

I have the same graph represented at two different times, g.t0 and g.t1. g.t1 differs from g.t0 by having one additional edge, but keeps the same vertices.
I want to compare the communities in g.t0 and g.t1, that is, to test whether any vertices moved to a different community from t0 to t1. I tried the following:
library(igraph)
m <- matrix(c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0),nrow=4,ncol=4)
g.t0 <- graph.adjacency(m)
memb.t0 <- membership(edge.betweenness.community(g.t0))
V(g.t0)
# Vertex sequence:
# [1] 1 2 3 4
memb.t0
# [1] 1 2 2 3
g.t1 <- add.edges(g.t0,c(1,2))
memb.t1 <- membership(edge.betweenness.community(g.t1))
V(g.t1)
# Vertex sequence:
# [1] 1 2 3 4
memb.t1
# [1] 1 1 1 2
But of course the problem is that the indexing of the communities always starts from 1. So in the example it looks as if all the vertices have moved to a different community, whereas the most intuitive reading is that only vertex 1 changed community, joining vertices 2 and 3.
How could I approach the problem of counting the number of vertices that changed communities from t0 to t1?
Actually this is not an easy question. In general you need to match the communities in the two graphs, using some rule or criterion that the matching optimizes. As the two graphs can have different numbers of communities, the matching is not necessarily bijective.
Several methods and quantities have been proposed for this problem, and a number of them are implemented in igraph, see
http://igraph.org/r/doc/compare.html
compare.communities(memb.t1, memb.t0, method="vi")
# [1] 0.4773856
compare.communities(memb.t1, memb.t0, method="nmi")
# [1] 0.7020169
compare.communities(memb.t1, memb.t0, method="rand")
# [1] 0.6666667
See the references in the igraph manual for the details about the methods.
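The measures above summarize how similar the two partitions are, but they do not say which vertices moved. One simple heuristic, which is not one of the igraph methods above, is to relabel each t1 community with the t0 community it overlaps most and then compare labels vertex by vertex. The sketch below uses plain integer vectors holding the membership values from the question; the variable names overlap, best, relabelled.t1 and moved are illustrative, and ties or splits may be resolved arbitrarily.

```r
memb.t0 <- c(1, 2, 2, 3)
memb.t1 <- c(1, 1, 1, 2)

# Contingency table: rows are t1 communities, columns are t0 communities
overlap <- table(t1 = memb.t1, t0 = memb.t0)

# For each t1 community, pick the t0 label with the largest overlap
best <- as.integer(colnames(overlap)[apply(overlap, 1, which.max)])
names(best) <- rownames(overlap)

# Express the t1 membership in t0 labels and find the vertices that differ
relabelled.t1 <- unname(best[as.character(memb.t1)])
moved <- which(relabelled.t1 != memb.t0)

moved          # 1
length(moved)  # 1
```

On this example the heuristic reproduces the intuitive reading: only vertex 1 changed community.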

Unit of Analysis Conversion

We are working on a social capital project so our data set has a list of an individual's organizational memberships. So each person gets a numeric ID and then a sub ID for each group they are in. The unit of analysis, therefore, is the group they are in. One of our variables is a three point scale for the type of group it is. Sounds simple enough?
We want to bring the unit of analysis to the individual level and condense the type of group it is into a variable signifying how many different types of groups they are in.
For instance, person one is in eight groups. Of those groups, three are (1s), three are (2s), and two are (3s). What the individual level variable would look like, ideally, is 3, because she is in all three types of groups.
Is this possible in the least?
##simulate data
##individuals
n <- 10
## groups
g <- 5
## group types
gt <- 3
## individuals*group membership
N <- 20
## individuals data frame
di <- data.frame(individual=sample(1:n,N,replace=TRUE),
group=sample(1:g,N, replace=TRUE))
## groups data frame
dg <- data.frame(group=1:g, type=sample(1:gt,g,replace=TRUE))
## merge
dm <- merge(di,dg)
## order - not necessary, but nice
dm <- dm[order(dm$individual),]
## group type per individual
library(plyr)
dr <- ddply(dm, "individual", function(x) length(unique(x$type)))
> head(dm)
group individual type
2 2 1 2
8 2 1 2
20 5 1 1
9 3 3 2
12 3 3 2
17 4 3 2
> head(dr)
individual V1
1 1 2
2 3 1
3 4 2
4 5 1
5 6 1
6 7 1
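The same per-individual count can also be obtained without plyr, for instance with aggregate() from base R. The small data frame below is made up for illustration: individual 1 mirrors the person described in the question (eight groups: three of type 1, three of type 2, two of type 3), and the other rows are arbitrary.

```r
# Hypothetical membership data: one row per individual-group membership
dm <- data.frame(
  individual = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3),
  type       = c(1, 1, 1, 2, 2, 2, 3, 3, 1, 1, 2)
)

# Number of distinct group types per individual
n_types <- aggregate(type ~ individual, data = dm,
                     FUN = function(x) length(unique(x)))
names(n_types)[2] <- "n_types"

n_types
```

Individual 1 ends up with a value of 3, since she belongs to all three types of groups.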
I think what you're asking is whether it is possible to count the number of unique types of group to which an individual belongs.
If so, then that is certainly possible.
I wouldn't be able to tell you how to do it in R since I don't know a lot of R, and I don't know what your data looks like. But there's no reason why it wouldn't be possible.
Is this data coming from a database? If so, then it might be easier to write a SQL query to compute the value you want, rather than to do it in R. If you describe your schema, there should be lots of people here who could give you the query you need.
