I have a series of data frames, each with an individual identifier (in this example a letter A-E), and the site number it was observed at.
In this example, I have 3 data frames:
Letters<-c("A","B","C","D","E")
Site1<-c(1,1,2,2,2)
Site2<-c(10,10,20,30,30)
Site3<-c(17,27,37,47,57)
Df1<-data.frame(Letters, Site1)
Df2<-data.frame(Letters, Site2)
Df3<-data.frame(Letters, Site3)
For the first one, it ends up looking like this:
Df1
Letters Site1
1 A 1
2 B 1
3 C 2
4 D 2
5 E 2
Individuals A and B were found at site 1, and individuals C, D, and E were found at site 2.
I'm looking for a way to track which individuals are found at the same site within a single data frame (note that the site numbers change each time, so I only care about within-data-frame groupings).
I assume I would create one co-occurrence matrix per data frame, each containing only 1s and 0s to indicate whether two individuals shared a site. The last step would then be to add the matrices up, like so:
DF1 co-occurrence
A B C D E
A 1 1 0 0 0
B 1 1 0 0 0
C 0 0 1 1 1
D 0 0 1 1 1
E 0 0 1 1 1
DF2 co-occurrence
A B C D E
A 1 1 0 0 0
B 1 1 0 0 0
C 0 0 1 0 0
D 0 0 0 1 1
E 0 0 0 1 1
DF3 co-occurrence
A B C D E
A 1 0 0 0 0
B 0 1 0 0 0
C 0 0 1 0 0
D 0 0 0 1 0
E 0 0 0 0 1
And then add them up to see who is most often grouped with whom:
A B C D E
A 3 2 0 0 0
B 2 3 0 0 0
C 0 0 3 1 1
D 0 0 1 3 2
E 0 0 1 2 3
I'm not sure how to implement this kind of workflow in R, or whether this is even the best way to approach the problem. My hope is to end up with a matrix like the last one above, or some similar way to quantify total co-occurrence.
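One possible way to implement the workflow described above is to compare each site vector with itself using outer() and then sum the resulting 0/1 matrices with Reduce(). This is only a minimal sketch: cooc_one() is a hypothetical helper name, and it assumes every data frame lists the same individuals in the same row order, with the identifier in the first column and the site in the second.

```r
# Rebuild the three example data frames from the question.
Letters <- c("A", "B", "C", "D", "E")
Df1 <- data.frame(Letters, Site1 = c(1, 1, 2, 2, 2))
Df2 <- data.frame(Letters, Site2 = c(10, 10, 20, 30, 30))
Df3 <- data.frame(Letters, Site3 = c(17, 27, 37, 47, 57))

# cooc_one() is a hypothetical helper: 1 where two individuals share a site
# within one data frame, 0 otherwise.
# Assumption: identifiers are in column 1 and sites in column 2.
cooc_one <- function(df) {
  ids <- as.character(df[[1]])
  m <- outer(df[[2]], df[[2]], "==") * 1L  # logical matrix -> 0/1 matrix
  dimnames(m) <- list(ids, ids)
  m
}

# Sum the per-data-frame matrices element-wise.
total <- Reduce(`+`, lapply(list(Df1, Df2, Df3), cooc_one))
total
```

The diagonal of total equals the number of data frames in which each individual appears, so it can be zeroed with diag(total) <- 0 if only pairs are of interest.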
I'm looking for a way to quantify pairs of observations within individuals (patients). In this example I have patients who each had two different diseases. The pair of diseases "a" and "b" (that is, both occurring in the same individual) appears 4 times, in patients "G", "H", "I" and "J", and the pair "k" and "o" appears twice (patient "D" had diseases "k" and "o", and patient "E" also had these two diseases).
Patient_ID<- c("A","A","B","B","C","C","D","D","E","E","F","F",
"G","G","H","H","I","I","J","J")
Disease<-c("v","s","s","v","s","v" ,"k","o","k","o","o","s","a","b",
"a","b","b","a","b","a")
DATA<-data.frame(Patient_ID,Disease)
print(DATA)
Patient_ID Disease
1 A v
2 A s
3 B s
4 B v
5 C s
6 C v
7 D k
8 D o
9 E k
10 E o
11 F o
12 F s
13 G a
14 G b
15 H a
16 H b
17 I b
18 I a
19 J b
20 J a
From these data I would like to generate a table like the one below.
a b k o v s
a 0 4 0 0 0 0
b 4 0 0 0 0 0
k 0 0 0 2 0 0
o 0 0 2 0 0 1
v 0 0 0 0 0 3
s 0 0 0 1 3 0
Then I would like to generate a table containing only the levels that have a count above a certain threshold (for example, 2), as in the second table below.
a b v s
a 0 4 0 0
b 4 0 0 0
v 0 0 0 3
s 0 0 3 0
Here is a base R option using table() and crossprod():
res <- `diag<-`(crossprod(table(DATA)),0)
which gives
> res
Disease
Disease a b k o s v
a 0 4 0 0 0 0
b 4 0 0 0 0 0
k 0 0 0 2 0 0
o 0 0 2 0 1 0
s 0 0 0 1 0 3
v 0 0 0 0 3 0
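To see why crossprod() produces co-occurrence counts, it may help to work through a smaller case. The sketch below uses a made-up three-patient data set (toy and the patient labels P1 to P3 are illustrative and not part of the question) and assumes each patient/disease pair appears at most once.

```r
# Toy data (hypothetical): P1 and P2 have diseases a and b; P3 has a and c.
toy <- data.frame(
  Patient_ID = c("P1", "P1", "P2", "P2", "P3", "P3"),
  Disease    = c("a", "b", "a", "b", "a", "c")
)

# Patient x disease incidence matrix: 1 if the patient has the disease.
inc <- unclass(table(toy))

# crossprod(inc) is t(inc) %*% inc, so entry [i, j] sums, over patients,
# the product of the indicators for disease i and disease j.
co <- crossprod(inc)
manual <- t(inc) %*% inc

co             # diagonal = number of patients with each disease
diag(co) <- 0  # drop the self-counts, as `diag<-`(..., 0) does above
co
```

Here co["a", "b"] counts the two patients (P1 and P2) who have both a and b, which is the same logic that gives 4 for the a/b pair in the full data.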
To subset by a given threshold, you can use
th <- 2
inds <- rowSums(res > th)>0
subset_res <- subset(res,inds,inds)
which gives
> subset_res
Disease
Disease a b s v
a 0 4 0 0
b 4 0 0 0
s 0 0 0 3
v 0 0 3 0
First, use unstack() to reshape Disease into a data frame with two columns, one row per patient. Then give both columns the same factor levels, so that no level is dropped in the next step. Passing this data frame to table() creates a contingency table in which "a & b" and "b & a" are counted separately, so the total count for each pair is tab + t(tab).
pair <- data.frame(t(unstack(DATA, Disease ~ Patient_ID)))
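# levels(DATA$Disease) is NULL when Disease is a character column (the data.frame() default since R 4.0), so derive the levels from the values
pair[] <- lapply(pair, factor, levels = sort(unique(as.character(DATA$Disease))))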
tab <- table(pair)
tab + t(tab)
# X2
# X1 a b k o s v
# a 0 4 0 0 0 0
# b 4 0 0 0 0 0
# k 0 0 0 2 0 0
# o 0 0 2 0 1 0
# s 0 0 0 1 0 3
# v 0 0 0 0 3 0
I need to create a subgraph from an adjacency matrix selecting by affiliation data. How do I match an adjacency and an affiliation matrix?
Take the following adjacency matrix:
A B C D E F G
A 0 1 0 1 0 1 0
B 1 0 1 1 0 1 0
C 0 1 0 0 0 0 0
D 1 1 0 0 1 1 0
E 0 0 0 1 0 1 0
F 1 1 0 1 1 0 1
G 0 0 0 0 0 1 0
And the following affiliation matrix:
P R Q
A 1 1 0
B 1 0 1
C 1 1 0
D 0 1 0
E 1 0 1
F 0 0 1
G 1 1 0
How do I create a subgraph from the adjacency matrix only with the nodes corresponding to P in the affiliation matrix?
If your goal is to:
1. keep only the nodes of your adjacency matrix whose P value is 1 in the affiliation matrix
2. convert the filtered adjacency matrix to an igraph object
then you can accomplish that with the following:
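# names(which()) is optional: the logical vector aff[, "P"] == 1 also works as an index if aff and adj list the nodes in the same order
p_nodes <- names(which(aff[, "P"] == 1))
p_adj <- adj[p_nodes, p_nodes]
# build the graph from the filtered matrix; mode = "undirected" assumes the symmetric matrix shown above
p_graph <- igraph::graph_from_adjacency_matrix(p_adj, mode = "undirected")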
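The snippet above assumes that adj and aff already exist as matrices with node names as row names. As a rough sketch of that setup, the two tables from the question can be entered with base R as follows; the example stops before the igraph call so that it runs without that package.

```r
nodes <- c("A", "B", "C", "D", "E", "F", "G")

# Adjacency matrix from the question, entered row by row.
adj <- matrix(
  c(0, 1, 0, 1, 0, 1, 0,
    1, 0, 1, 1, 0, 1, 0,
    0, 1, 0, 0, 0, 0, 0,
    1, 1, 0, 0, 1, 1, 0,
    0, 0, 0, 1, 0, 1, 0,
    1, 1, 0, 1, 1, 0, 1,
    0, 0, 0, 0, 0, 1, 0),
  nrow = 7, byrow = TRUE, dimnames = list(nodes, nodes)
)

# Affiliation matrix from the question (columns P, R, Q).
aff <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    1, 1, 0,
    0, 1, 0,
    1, 0, 1,
    0, 0, 1,
    1, 1, 0),
  nrow = 7, byrow = TRUE, dimnames = list(nodes, c("P", "R", "Q"))
)

p_nodes <- rownames(aff)[aff[, "P"] == 1]  # "A" "B" "C" "E" "G"
p_adj <- adj[p_nodes, p_nodes]             # subset rows and columns by name
p_adj
sum(p_adj) / 2                             # undirected edges left: A-B and B-C
```

Indexing by name rather than by position keeps the subset correct even if the two matrices list the nodes in different orders.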
I have a data.frame structured like this:
A B C D E
F 1 0 7 0 0
G 0 0 0 1 1
H 1 1 0 0 0
I 1 2 1 0 0
L 1 0 0 0 0
and I want to calculate the sparsity (i.e. the proportion of 0 values) of this data.frame.
How can I do this?
sum(df == 0)/(dim(df)[1]*dim(df)[2])
[1] 0.6
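An equivalent and slightly shorter form is mean(df == 0), since the mean of a logical matrix is the proportion of TRUE values. The sketch below rebuilds the example data frame to check this; it assumes the data contain no NA values (otherwise na.rm = TRUE would be needed).

```r
# Example data frame from the question (rows F, G, H, I, L).
df <- data.frame(
  A = c(1, 0, 1, 1, 1),
  B = c(0, 0, 1, 2, 0),
  C = c(7, 0, 0, 1, 0),
  D = c(0, 1, 0, 0, 0),
  E = c(0, 1, 0, 0, 0),
  row.names = c("F", "G", "H", "I", "L")
)

# df == 0 is a logical matrix; its mean is the share of zero cells.
sparsity <- mean(df == 0)
sparsity                        # 15 zeros out of 25 cells

# Same result with the explicit ratio used above.
sum(df == 0) / prod(dim(df))
```

Multiply the result by 100 if a percentage is wanted rather than a proportion.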
I have a df with several columns having values 0 or 1. Something like:
a b c d e
1 0 0 0 0
0 1 0 1 0
0 1 0 1 0
1 0 1 0 1
I would like to create a 5 by 5 matrix showing, for each pair of columns, the number of rows in which both columns are 1. Only 1s should be counted, so each diagonal entry is simply the number of 1s in that column. The output would be something like:
a b c d e
a 2 0 1 0 1
b 0 2 0 2 0
c 1 0 1 0 1
d 0 2 0 2 0
e 1 0 1 0 1
Thanks.
Sudhir
Convert to matrix and take cross product:
m <- as.matrix(d)  # d is the 0/1 data frame from the question
crossprod(m, m)    # t(m) %*% m; crossprod(m) is equivalent
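As a quick check, the sketch below rebuilds the example data (the name d follows the answer above) and confirms that the cross product reproduces the requested matrix, with the column totals on the diagonal. It assumes the columns contain only 0s and 1s.

```r
# Example data from the question, entered column by column.
d <- data.frame(
  a = c(1, 0, 0, 1),
  b = c(0, 1, 1, 0),
  c = c(0, 0, 0, 1),
  d = c(0, 1, 1, 0),
  e = c(0, 0, 0, 1)
)

m <- as.matrix(d)
co <- crossprod(m)  # entry [i, j] = number of rows where columns i and j are both 1
co

# For 0/1 data the diagonal is the number of 1s in each column.
all(diag(co) == colSums(m))
```

Because each entry is a sum of products of 0/1 indicators, a row contributes to [i, j] only when both columns are 1 in that row.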