This question already has an answer here: How to create missing values in table in R? (closed as a duplicate 2 years ago)
I want to make a contingency table of observations against their predictions from a neural network. Since I want the positives to be on the diagonal, I would like the table to be square, even if some rows contain only 0s. That is, I would like to have
   b
a   a b c d e f g
  a 1 0 1 0 2 1 0
  b 0 0 0 0 0 0 0
  c 0 0 0 0 0 0 0
  d 2 3 1 2 2 3 2
  e 1 2 1 1 0 1 3
  f 0 0 0 0 0 0 0
  g 4 2 1 0 3 1 0
Instead of:
> set.seed(1)
> b<-sample(letters[1:7],40,rep=TRUE)
> a<-sample(letters[1:4],40,rep=TRUE)
>
> table(a,b)
   b
a   a b c d e f g
  a 1 0 1 0 2 1 0
  d 2 3 1 2 2 3 2
  e 1 2 1 1 0 1 3
  g 4 2 1 0 3 1 0
How can I do this?
Convert a and b to factors whose levels are the union of both vectors, so that every label appears both as a row and as a column:
tmp <- sort(union(a, b))
table(factor(a, levels = tmp), factor(b, levels = tmp))
# a b c d e f g
# a 0 1 1 2 2 1 4
# b 2 1 1 1 2 3 2
# c 4 0 1 2 0 1 1
# d 0 1 1 1 3 1 1
# e 0 0 0 0 0 0 0
# f 0 0 0 0 0 0 0
# g 0 0 0 0 0 0 0
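A minimal sketch of the same fix, assuming the full set of class labels is known in advance (here letters a to g, as in the question's sample() call). union(a, b) only contains labels that occur in at least one of the two vectors, so passing the known label set as the levels also covers a class that never appears at all; the names classes and tab are illustrative.

```r
set.seed(1)
b <- sample(letters[1:7], 40, replace = TRUE)
a <- sample(letters[1:4], 40, replace = TRUE)

# Assumption: the complete label set is known (a..g here).
# Giving both vectors these same levels forces a square 7 x 7 table,
# with all-zero rows/columns for labels that do not occur.
classes <- letters[1:7]
tab <- table(a = factor(a, levels = classes),
             b = factor(b, levels = classes))
print(tab)

# Rows and columns share the same levels in the same order,
# so the diagonal holds the cases where a and b agree.
print(sum(diag(tab)))
```

Because both dimensions use identical levels in identical order, diag(tab) always lines up matching labels, which is what keeps the positives on the diagonal.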
I detect communities in my adjacency matrix. In parallel, I create an affiliation matrix for the same vertices. How do I measure how strongly each community is represented in each column of the affiliation matrix, i.e. how many of its vertices are affiliated with that column?
Take the following adjacency matrix:
A B C D E F G
A 0 1 0 1 0 1 0
B 1 0 1 1 0 1 0
C 0 1 0 0 0 0 0
D 1 1 0 0 1 1 0
E 0 0 0 1 0 1 0
F 1 1 0 1 1 0 1
G 0 0 0 0 0 1 0
I identify the communities:
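library(igraph)
g <- graph_from_adjacency_matrix(adj, mode = "undirected")  # adj: the adjacency matrix above (name assumed)
com <- edge.betweenness.community(g)
V(g)$memb <- com$membership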
Now take the following affiliation matrix:
P R Q
A 1 1 0
B 1 0 1
C 1 1 0
D 0 1 0
E 1 0 1
F 0 0 1
G 1 1 0
How do I count the vertices in community [[1]] that are affiliated with "P" in the affiliation matrix?
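You can use sum(m[com[[1]], "P"] > 0), assuming m holds your affiliation matrix. For all communities at once, use lapply(com, function(x) colSums(m[x, , drop = FALSE])); drop = FALSE keeps a single-vertex community from being reduced to a vector, which colSums() would reject.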
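Without igraph, the same per-community counts can be sketched in base R with rowsum(), which adds up the rows of a matrix within each group. The membership vector memb below is hypothetical (a stand-in for com$membership, not the actual output of edge.betweenness.community on this graph); the affiliation matrix is the one from the question.

```r
# Affiliation matrix from the question (vertices A..G, columns P, R, Q)
m <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 0,
              0, 1, 0,
              1, 0, 1,
              0, 0, 1,
              1, 1, 0),
            nrow = 7, byrow = TRUE,
            dimnames = list(LETTERS[1:7], c("P", "R", "Q")))

# Hypothetical membership: one community id per vertex, in the order A..G
memb <- c(1, 1, 1, 2, 2, 2, 3)

# Count affiliated vertices (entries > 0) per community and column;
# multiplying by 1 turns the logical matrix into a numeric one
counts <- rowsum((m > 0) * 1, group = memb)
print(counts)

# Single case from the question: members of community 1 affiliated with "P"
print(sum(m[memb == 1, "P"] > 0))
```

The rows of counts are the community ids and the columns are P, R and Q, so the whole table comes from one vectorised call instead of a loop over communities.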
I have a data.frame with all 16 combinations of 4 cell markers:
combinations_df
FITC Cy3 TX_RED Cy5
a 0 0 0 0
b 1 0 0 0
c 0 1 0 0
d 1 1 0 0
e 0 0 1 0
f 1 0 1 0
g 0 1 1 0
h 1 1 1 0
i 0 0 0 1
j 1 0 0 1
k 0 1 0 1
l 1 1 0 1
m 0 0 1 1
n 1 0 1 1
o 0 1 1 1
p 1 1 1 1
I have my "main" data.frame with 10 columns and thousands of rows.
> main_df
a b FITC d Cy3 f TX_RED h Cy5 j
1 0 1 1 1 1 0 1 1 1 1
2 0 1 0 1 1 0 1 0 1 1
3 1 1 0 0 0 1 1 0 0 0
4 0 1 1 1 1 0 1 1 1 1
5 0 0 0 0 0 0 0 0 0 0
....
I want to compare each row of main_df with all 16 combinations in combinations_df, and create a vector of the matching phenotypes (the combinations_df row names) to later cbind to main_df as column 11.
sample output
> phenotype
[1] "g" "i" "a" "p" "g"
I thought about a while loop inside a for loop, checking each combinations_df row against each main_df row.
That would probably work, but main_df has close to 1,000,000 rows, so I wanted to see whether anybody had a better idea.
EDIT: I forgot to mention that I only want to compare combinations_df with columns 3, 5, 7 and 9 of main_df. Those columns have the same names as the combinations_df columns, but that might not be obvious.
EDIT: Changed the sample output, since no "t" should be present.
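The dplyr solution is outrageously simple. First, store the phenotype (currently the row names of combinations_df) as an explicit column, for example with combinations_df <- data.frame(phenotype = rownames(combinations_df), combinations_df, row.names = NULL), so that it looks like this: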
# phenotype FITC Cy3 TX_RED Cy5
#1 a 0 0 0 0
#2 b 1 0 0 0
#3 c 0 1 0 0
#4 d 1 1 0 0
# etc
dplyr lets you join on multiple variables, so from here it's a one-liner to look up the phenotypes.
library(dplyr)
left_join(main_df, combinations_df, by=c("FITC", "Cy3", "TX_RED", "Cy5"))
# a b FITC d Cy3 f TX_RED h Cy5 j phenotype
#1 0 1 1 1 1 0 1 1 1 1 p
#2 0 1 0 1 1 0 1 0 1 1 o
#3 1 1 0 0 0 1 1 0 0 0 e
#4 0 1 1 1 1 0 1 1 1 1 p
#5 0 0 0 0 0 0 0 0 0 0 a
I originally thought you would have to concatenate the columns with tidyr::unite, but that turned out to be unnecessary.
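For comparison, a base R sketch of the same multi-column lookup, using merge() in place of dplyr's left_join(). This is a swapped-in technique, not the answer's code. The row_id helper column is an assumption added because merge() does not keep the row order of main_df, and the data are the question's 16 combinations plus the first five rows of main_df. expand.grid() varies its first argument fastest, which happens to reproduce the a to p order of combinations_df.

```r
# The 16 marker combinations a..p, in the question's order
combinations_df <- expand.grid(FITC = c(0, 1), Cy3 = c(0, 1),
                               TX_RED = c(0, 1), Cy5 = c(0, 1))
combinations_df$phenotype <- letters[1:16]

# First five rows of main_df from the question
main_df <- data.frame(a = c(0, 0, 1, 0, 0), b = c(1, 1, 1, 1, 0),
                      FITC = c(1, 0, 0, 1, 0), d = c(1, 1, 0, 1, 0),
                      Cy3 = c(1, 1, 0, 1, 0), f = c(0, 0, 1, 0, 0),
                      TX_RED = c(1, 1, 1, 1, 0), h = c(1, 0, 0, 1, 0),
                      Cy5 = c(1, 1, 0, 1, 0), j = c(1, 1, 0, 1, 0))

# Remember the original order, because merge() may reorder rows
main_df$row_id <- seq_len(nrow(main_df))

# Left join on the four marker columns (all.x = TRUE keeps unmatched rows)
res <- merge(main_df, combinations_df,
             by = c("FITC", "Cy3", "TX_RED", "Cy5"), all.x = TRUE)
res <- res[order(res$row_id), ]
print(res$phenotype)
```

Note that merge() also moves the key columns to the front, so if the original column layout matters, the dplyr join (which keeps the columns of main_df in place) is the more convenient choice.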
It's not very elegant, but this method works. There are no nested loops: each data frame is traversed once and match() does the lookup, so it should cope with a large main_df. You could try matching on the data frame rows directly and do away with the loops altogether, but this was the quickest way I could figure out. You might also look at the plyr or data.table packages, which are very powerful for this kind of thing.
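# One key string per main_df row, built from marker columns 3, 5, 7 and 9
main_text <- character(nrow(main_df))  # preallocate instead of growing from NULL
for (i in seq_len(nrow(main_df))) {
  main_text[i] <- paste(main_df[i, 3], main_df[i, 5], main_df[i, 7], main_df[i, 9], sep = "")
}
# The same kind of key for each of the 16 combinations
comb_text <- character(nrow(combinations_df))
for (i in seq_len(nrow(combinations_df))) {
  comb_text[i] <- paste(combinations_df[i, 1], combinations_df[i, 2], combinations_df[i, 3], combinations_df[i, 4], sep = "")
}
# Row name (phenotype) of the matching combination for each main_df row
rownames(combinations_df)[match(main_text, comb_text)]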
How about something like this? My results differ from yours because there is no "t" in combination_df. You could also do this without adding a new key column if you wanted; the extra column is mainly for illustration.
combination_df <- read.table("Documents/comb.txt.txt", header = TRUE)
main_df <- read.table("Documents/main.txt", header = TRUE)
main_df
combination_df
# Collapse the four marker columns into one key string per row
main_df$key <- do.call(paste0, main_df[, c(3, 5, 7, 9)])
combination_df$key <- do.call(paste0, combination_df)
# Row name (phenotype) of the matching combination
rownames(combination_df)[match(main_df$key, combination_df$key)]
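A self-contained version of the key-matching approach, assuming the data are built in memory rather than read from the Documents/ files. The keys are kept in separate vectors (main_key and comb_key are illustrative names), so the phenotype lands in column 11 of main_df, as the question asks. expand.grid() is used here only as a compact way to write out the 16 combinations in the question's a to p order.

```r
# The 16 marker combinations, with row names a..p as in the question
combination_df <- expand.grid(FITC = c(0, 1), Cy3 = c(0, 1),
                              TX_RED = c(0, 1), Cy5 = c(0, 1))
rownames(combination_df) <- letters[1:16]

# First five rows of main_df from the question
main_df <- data.frame(a = c(0, 0, 1, 0, 0), b = c(1, 1, 1, 1, 0),
                      FITC = c(1, 0, 0, 1, 0), d = c(1, 1, 0, 1, 0),
                      Cy3 = c(1, 1, 0, 1, 0), f = c(0, 0, 1, 0, 0),
                      TX_RED = c(1, 1, 1, 1, 0), h = c(1, 0, 0, 1, 0),
                      Cy5 = c(1, 1, 0, 1, 0), j = c(1, 1, 0, 1, 0))

# Vectorised keys: paste0 is applied column-wise, e.g. "1111" for all markers
main_key <- do.call(paste0, main_df[, c(3, 5, 7, 9)])
comb_key <- do.call(paste0, combination_df)

# Look up each key once with match(); the result becomes column 11
main_df$phenotype <- rownames(combination_df)[match(main_key, comb_key)]
print(main_df)
```

Since paste0() and match() both work on whole vectors, this avoids any per-row R loop, which matters at around a million rows.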
I have a data frame with several columns whose values are 0 or 1. Something like:
a b c d e
1 0 0 0 0
0 1 0 1 0
0 1 0 1 0
1 0 1 0 1
I would like to create a 5-by-5 matrix giving, for each pair of columns, the number of rows in which both columns are 1. Only 1s should count, so each diagonal entry is simply the number of rows with a 1 in that column. The output should look something like:
a b c d e
a 2 0 1 0 1
b 0 2 0 2 0
c 1 0 1 0 1
d 0 2 0 2 0
e 1 0 1 0 1
Thanks.
Sudhir
Convert to a matrix and take the cross product:
m <- as.matrix(d)   # d: your 0/1 data frame
crossprod(m, m)     # same as t(m) %*% m
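A runnable sketch with the question's data. Element [i, j] of crossprod(m) is the sum over rows of m[, i] * m[, j], which for 0/1 columns is exactly the number of rows where both columns are 1; the diagonal is then each column's total. This assumes the data contain only 0 and 1 (any other values would be multiplied and summed, not counted).

```r
# 0/1 data frame from the question
d <- data.frame(a = c(1, 0, 0, 1),
                b = c(0, 1, 1, 0),
                c = c(0, 0, 0, 1),
                d = c(0, 1, 1, 0),
                e = c(0, 0, 0, 1))

m <- as.matrix(d)

# Co-occurrence counts: rows and columns are both named a..e
co <- crossprod(m)
print(co)
```

crossprod(m) computes the same result as t(m) %*% m but without forming the transpose explicitly.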
I have a binary transition matrix. I want to delete rows associated with columns that sum to zero. For example, if
A B C D E
A 0 0 0 1 0
B 1 0 0 1 0
C 0 0 1 1 0
D 0 0 1 0 0
E 0 0 1 1 0
then columns B and E sum to zero. I know how to remove those columns like this:
> a.adj=a[,!!colSums(a)]
> a.adj
A C D
A 0 0 1
B 1 0 1
C 0 1 1
D 0 1 0
E 0 1 1
But how can I delete rows B and E at the same time, to get
A C D
A 0 0 1
C 0 1 1
D 0 1 0
If the row names and column names are in the same order:
indx <- !!colSums(a)
a[indx,indx]
# A C D
#A 0 0 1
#C 0 1 1
#D 0 1 0
Alternatively, use the names to select both rows and columns, which does not depend on their order:
> ind <- colnames(a[,!!colSums(a)])
> a[ind, ind]
A C D
A 0 0 1
C 0 1 1
D 0 1 0
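A sketch that reconstructs the question's matrix a and applies the name-based selection. One deliberate change: it takes colnames(a)[colSums(a) != 0] rather than colnames(a[, !!colSums(a)]), because the latter returns NULL if only one column survives (the subset drops to a vector). keep and a_adj are illustrative names.

```r
# Binary transition matrix from the question (rows and columns A..E)
a <- matrix(c(0, 0, 0, 1, 0,
              1, 0, 0, 1, 0,
              0, 0, 1, 1, 0,
              0, 0, 1, 0, 0,
              0, 0, 1, 1, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(LETTERS[1:5], LETTERS[1:5]))

# Names of the columns that do not sum to zero
keep <- colnames(a)[colSums(a) != 0]

# Use the same names for rows and columns; drop = FALSE keeps a matrix
# even if only one name is left
a_adj <- a[keep, keep, drop = FALSE]
print(a_adj)
```

Selecting by name rather than by position is what makes this independent of whether the rows and columns are stored in the same order.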