I have asked a similar question before, but this one is slightly different from my previous question.
I have a matrix:
a b c d e
a 0 1 1 1 0
b 1 0 1 1 1
I am trying to convert it into a square matrix like this:
a b c d e
a 0 1 1 1 0
b 1 0 1 1 1
c 1 1 0 0 0
d 1 1 0 0 0
e 0 1 0 0 0
Any advice on how to do this in R would be helpful. Thanks in advance.
What do you think about this solution?
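# Append the transposed extra columns (c..e) as new rows; by = 0:2 merges on row names (0) and columns a, b
res <- (merge(m, t(m)[(nrow(m)+1):ncol(m),], all = TRUE, by = 0:2))[,-1]
rownames(res) <- colnames(res)   # relies on merge() returning the rows sorted a..e
res[is.na(res)] <- 0             # the appended rows have NA in c..e; fill them with 0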
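A sketch of an alternative that does not depend on merge() sorting its output: allocate an all-zero square matrix named after the columns of m, copy the known rows in, then mirror them into the matching columns. It assumes m is a numeric matrix whose row names are a subset of its column names and whose overlapping block (a, b here) is already symmetric, as in the example:

```r
# The 2 x 5 matrix from the question
m <- matrix(c(0, 1, 1, 1, 0,
              1, 0, 1, 1, 1), nrow = 2, byrow = TRUE,
            dimnames = list(c("a", "b"), c("a", "b", "c", "d", "e")))

nm  <- colnames(m)
res <- matrix(0, length(nm), length(nm), dimnames = list(nm, nm))
res[rownames(m), ] <- m     # copy the known rows
res[, rownames(m)] <- t(m)  # mirror them into the same-named columns
res
```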
I detect communities in my adjacency matrix. In parallel, I create an affiliation matrix using the vertices of the same matrix. How do I measure the weight of the communities in each column of the affiliation matrix?
Take the following adjacency matrix:
A B C D E F G
A 0 1 0 1 0 1 0
B 1 0 1 1 0 1 0
C 0 1 0 0 0 0 0
D 1 1 0 0 1 1 0
E 0 0 0 1 0 1 0
F 1 1 0 1 1 0 1
G 0 0 0 0 0 1 0
I identify the communities:
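library(igraph)
g <- graph_from_adjacency_matrix(adj, mode = "undirected")  # adj: the adjacency matrix above (name assumed)
com <- edge.betweenness.community(g)  # named cluster_edge_betweenness() in newer igraph releases
V(g)$memb <- com$membership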
Now take the following affiliation matrix:
P R Q
A 1 1 0
B 1 0 1
C 1 1 0
D 0 1 0
E 1 0 1
F 0 0 1
G 1 1 0
How do I count the number of vertices in community [[1]] that are affiliated with "P" in the affiliation matrix?
You can do sum(m[com[[1]],"P"]>0), given that m holds your affiliation matrix. Or lapply(com, function(x) colSums(m[x, ])) for all communities.
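To get the counts for every community and every column in one call, base R's rowsum() can sum the rows of the affiliation matrix grouped by community. The memb vector below is a hypothetical assignment typed in by hand for illustration, not the output of edge betweenness; with igraph you would take it from membership(com), named by vertex. Because the entries are 0/1, the sums are counts:

```r
# Affiliation matrix from the question
aff <- matrix(c(1, 1, 0,
                1, 0, 1,
                1, 1, 0,
                0, 1, 0,
                1, 0, 1,
                0, 0, 1,
                1, 1, 0), ncol = 3, byrow = TRUE,
              dimnames = list(LETTERS[1:7], c("P", "R", "Q")))

# Hypothetical community membership, named by vertex (stand-in for membership(com))
memb <- c(A = 1, B = 1, C = 1, D = 2, E = 2, F = 2, G = 2)

# One row per community, one column per affiliation
counts <- rowsum(aff, memb[rownames(aff)])
counts
```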
I have a data.frame with all 16 combinations of 4 different cell markers:
combinations_df
FITC Cy3 TX_RED Cy5
a 0 0 0 0
b 1 0 0 0
c 0 1 0 0
d 1 1 0 0
e 0 0 1 0
f 1 0 1 0
g 0 1 1 0
h 1 1 1 0
i 0 0 0 1
j 1 0 0 1
k 0 1 0 1
l 1 1 0 1
m 0 0 1 1
n 1 0 1 1
o 0 1 1 1
p 1 1 1 1
I have my "main" data.frame with 10 columns and thousands of rows.
> main_df
a b FITC d Cy3 f TX_RED h Cy5 j
1 0 1 1 1 1 0 1 1 1 1
2 0 1 0 1 1 0 1 0 1 1
3 1 1 0 0 0 1 1 0 0 0
4 0 1 1 1 1 0 1 1 1 1
5 0 0 0 0 0 0 0 0 0 0
....
I want to compare each row of main_df against all 16 possible combinations in combinations_df, and then create a new vector that I can later cbind to main_df as column 11.
Sample output:
> phenotype
[1] "g" "i" "a" "p" "g"
I thought about doing a while loop within a for loop, checking each combinations_df row against each main_df row.
That sounds like it could work, but main_df has close to 1,000,000 rows, so I wanted to see whether anybody had a better idea.
EDIT: I forgot to mention that I only want to compare combinations_df to columns 3, 5, 7 and 9 of main_df. They have the same names as the combinations_df columns, but that might not be obvious.
EDIT: Changed the sample output, since no "t" should be present.
The dplyr solution is outrageously simple. First, you need to store the phenotype in combinations_df as an explicit variable, like this:
# phenotype FITC Cy3 TX_RED Cy5
#1 a 0 0 0 0
#2 b 1 0 0 0
#3 c 0 1 0 0
#4 d 1 1 0 0
# etc
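One way to create that column in base R, assuming the phenotype letters currently live in the row names of combinations_df as in the question's printout. Here combinations_df is rebuilt with expand.grid(), which varies its first column fastest and so reproduces the a..p row order shown in the question; that reconstruction is an assumption for the sake of a self-contained example:

```r
# Assumption: same 16 rows as the question's table, in the same a..p order
combinations_df <- expand.grid(FITC = 0:1, Cy3 = 0:1, TX_RED = 0:1, Cy5 = 0:1)
rownames(combinations_df) <- letters[1:16]

# Copy the row names into an ordinary column so a join can carry it along
combinations_df$phenotype <- rownames(combinations_df)
head(combinations_df, 4)
```

The new column ends up last rather than first, which makes no difference to the join.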
dplyr lets you join on multiple variables, so from here it's a one-liner to look up the phenotypes.
library(dplyr)
left_join(main_df, combinations_df, by=c("FITC", "Cy3", "TX_RED", "Cy5"))
# a b FITC d Cy3 f TX_RED h Cy5 j phenotype
#1 0 1 1 1 1 0 1 1 1 1 p
#2 0 1 0 1 1 0 1 0 1 1 o
#3 1 1 0 0 0 1 1 0 0 0 e
#4 0 1 1 1 1 0 1 1 1 1 p
#5 0 0 0 0 0 0 0 0 0 0 a
I originally thought you would have to concatenate the columns with tidyr::unite, but that turned out to be unnecessary.
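For close to a million rows, a join can be skipped entirely, because combinations_df enumerates the 16 patterns in binary order (FITC is the lowest bit, Cy5 the highest). This is a swapped-in technique rather than part of the answer above, and it is only a sketch under two assumptions: the markers are strictly 0/1, and combinations_df keeps exactly the a..p order shown in the question:

```r
# Toy main_df with the five rows shown in the question
main_df <- data.frame(a = c(0, 0, 1, 0, 0), b = c(1, 1, 1, 1, 0),
                      FITC = c(1, 0, 0, 1, 0), d = c(1, 1, 0, 1, 0),
                      Cy3 = c(1, 1, 0, 1, 0), f = c(0, 0, 1, 0, 0),
                      TX_RED = c(1, 1, 1, 1, 0), h = c(1, 0, 0, 1, 0),
                      Cy5 = c(1, 1, 0, 1, 0), j = c(1, 1, 0, 1, 0))

# Treat the four markers as bits: row k of combinations_df (a..p) is pattern k - 1
idx <- 1 + main_df$FITC + 2 * main_df$Cy3 + 4 * main_df$TX_RED + 8 * main_df$Cy5
phenotype <- letters[idx]
phenotype
# [1] "p" "o" "e" "p" "a"
```

The result can then be attached with main_df$phenotype <- phenotype; everything is vectorized, so there is no per-row loop.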
It's not very elegant, but this method works. There are no nested loops here, so it should run fine. You might try matching on the data frame rows directly and doing away with the loops altogether, but this was the quickest way I could figure it out. You might also look at the plyr or data.table packages, which are very powerful for this kind of thing.
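# Preallocate, then build one key string per row from columns 3, 5, 7, 9 (FITC, Cy3, TX_RED, Cy5)
main_text <- character(nrow(main_df))
for (i in seq_len(nrow(main_df))) {
  main_text[i] <- paste(main_df[i, 3], main_df[i, 5], main_df[i, 7], main_df[i, 9], sep = "")
}
# The same kind of key for each of the 16 combinations
comb_text <- character(nrow(combinations_df))
for (i in seq_len(nrow(combinations_df))) {
  comb_text[i] <- paste(combinations_df[i, 1], combinations_df[i, 2], combinations_df[i, 3], combinations_df[i, 4], sep = "")
}
# Look up each main_df key among the combination keys and return the matching row name
rownames(combinations_df)[match(main_text, comb_text)]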
How about something like this? My results differ from yours because there is no "t" in combination_df. You could do this without assigning a new column if you wanted; the key column is mainly for illustration.
combination_df <- read.table("Documents/comb.txt.txt", header=T)
main_df <- read.table("Documents/main.txt", header=T)
main_df
combination_df
main_df$key <- do.call(paste0, main_df[,c(3,5,7,9)])
combination_df$key <- do.call(paste0, combination_df)
rownames(combination_df)[match(main_df$key, combination_df$key)]
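A self-contained version of the same key-matching idea, with the question's five example rows supplied inline instead of read from files. combination_df is rebuilt with expand.grid(), which is an assumption that reproduces the a..p order of the question's table, and the marker columns are selected by name rather than by position:

```r
# Assumption: 16 combinations in the question's a..p order
combination_df <- expand.grid(FITC = 0:1, Cy3 = 0:1, TX_RED = 0:1, Cy5 = 0:1)
rownames(combination_df) <- letters[1:16]

main_df <- read.table(header = TRUE, text = "
a b FITC d Cy3 f TX_RED h Cy5 j
0 1 1 1 1 0 1 1 1 1
0 1 0 1 1 0 1 0 1 1
1 1 0 0 0 1 1 0 0 0
0 1 1 1 1 0 1 1 1 1
0 0 0 0 0 0 0 0 0 0")

# One string key per row, e.g. "1111"; do.call() passes each column to paste0()
main_key <- do.call(paste0, main_df[, c("FITC", "Cy3", "TX_RED", "Cy5")])
comb_key <- do.call(paste0, combination_df)

phenotype <- rownames(combination_df)[match(main_key, comb_key)]
phenotype
# [1] "p" "o" "e" "p" "a"
```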
I have a data frame with 5*n columns, where n is the number of categories listed in a vector. I want to break the data frame into chunks of 5 columns (e.g., category 1 is columns 1:5, category 2 is columns 6:10) and then assign the category names from the vector to the chunks.
For example:
Original data frame:
X a b c d e a b c d e a b c d e
1 1 0 0 0 1 0 1 0 1 0 0 0 1 1 0
2 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1

Vector of category names:
1 apples
2 oranges
3 bananas
This should become:
apples:
X a b c d e
1 1 0 0 0 1
2 0 1 0 1 0

oranges:
X a b c d e
1 0 1 0 1 0
2 0 0 1 0 1

bananas:
X a b c d e
1 0 0 1 1 0
2 1 0 0 0 1
I can find plenty of information about splitting data frames by rows, which is much more common, but nothing about splitting a data frame into n chunks by columns. Thanks!
You could split your original data frame by column indices in a similar way:
df <- read.table(header=T, check.names = F, text="
X a b c d e a b c d e a b c d e
1 1 0 0 0 1 0 1 0 1 0 0 0 1 1 0
2 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1")
n <- 5 # fixed chunksize (a-e)
lst <- lapply(split(2:ncol(df), rep(seq(ncol(df[-1])/n), each=n)), function(x) df[, x])
names(lst) <- c("apples", "oranges", "bananas")
# lst
# $apples
# a b c d e
# 1 1 0 0 0 1
# 2 0 1 0 1 0
#
# $oranges
# a b c d e
# 1 0 1 0 1 0
# 2 0 0 1 0 1
#
# $bananas
# a b c d e
# 1 0 0 1 1 0
# 2 1 0 0 0 1
I don't know if this is elegant, but it was the first thing that came to mind.
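A variant that takes the chunk names straight from the category vector and keeps the X column in every chunk, which is closer to the desired output in the question. Giving split() a factor whose levels follow the vector's order stops it from sorting the names alphabetically; the fixed chunk size of 5 is carried over from the answer above:

```r
df <- read.table(header = TRUE, check.names = FALSE, text = "
X a b c d e a b c d e a b c d e
1 1 0 0 0 1 0 1 0 1 0 0 0 1 1 0
2 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1")

categories <- c("apples", "oranges", "bananas")
n <- 5  # columns per category

# One label per data column (positions 2..16), levels kept in the vector's order
grp <- factor(rep(categories, each = n), levels = categories)
idx <- split(seq_len(ncol(df))[-1], grp)

# Prepend column 1 (X) to every chunk
lst <- lapply(idx, function(j) df[, c(1, j)])
lst$oranges
```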
I have a binary transition matrix. I want to delete rows associated with columns that sum to zero. For example, if
A B C D E
A 0 0 0 1 0
B 1 0 0 1 0
C 0 0 1 1 0
D 0 0 1 0 0
E 0 0 1 1 0
columns B and E sum to zero. I know how to get rid of the columns like this:
> a.adj=a[,!!colSums(a)]
> a.adj
A C D
A 0 0 1
B 1 0 1
C 0 1 1
D 0 1 0
E 0 1 1
but how can I delete rows B and E at the same time, to get:
A C D
A 0 0 1
C 0 1 1
D 0 1 0
If the row names and column names are in the same order:
indx <- !!colSums(a)
a[indx,indx]
# A C D
#A 0 0 1
#C 0 1 1
#D 0 1 0
Use names to select both columns and rows:
> ind <- colnames(a[,!!colSums(a)])
> a[ind, ind]
A C D
A 0 0 1
C 0 1 1
D 0 1 0
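A self-contained sketch of the name-based approach on the question's matrix. Selecting by name works even if the rows are stored in a different order from the columns, and drop = FALSE (an addition, not in the answers above) keeps the result a matrix if only one name survives:

```r
a <- matrix(c(0, 0, 0, 1, 0,
              1, 0, 0, 1, 0,
              0, 0, 1, 1, 0,
              0, 0, 1, 0, 0,
              0, 0, 1, 1, 0), nrow = 5, byrow = TRUE,
            dimnames = list(LETTERS[1:5], LETTERS[1:5]))

keep  <- colnames(a)[colSums(a) != 0]  # names of the columns that do not sum to zero
a.adj <- a[keep, keep, drop = FALSE]   # same names for rows and columns
a.adj
```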
I have a data frame with the following format:
name workplace
a A
b B
c A
d C
e D
....
I would like to convert this data frame into an affiliation network in R with the format:
A B C D ...
a 1 0 0 0
b 0 1 0 0
c 1 0 0 0
d 0 0 1 0
e 0 0 0 1
...
and I used the following loop:
for (i in 1:nrow(A1)) {
a1[rownames(a1) == A1$name[i],
colnames(a1) == A1$workplace[i]] <- 1
}
where A1 is the data frame and a1 is the affiliation network. However, since my data frame is large, the loop above runs very slowly. Is there an efficient way to do this conversion without looping?
Thank you very much!
If your data is called df, just do:
as.data.frame.matrix(table(df))
# A B C D
# a 1 0 0 0
# b 0 1 0 0
# c 1 0 0 0
# d 0 0 1 0
# e 0 0 0 1
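A self-contained version of the table() approach, using a small made-up df that mirrors the question. table() counts occurrences, so a (name, workplace) pair listed twice would show 2; the last step, an addition for that case, clamps everything to 0/1:

```r
df <- data.frame(name = c("a", "b", "c", "d", "e"),
                 workplace = c("A", "B", "A", "C", "D"))

aff <- as.data.frame.matrix(table(df))

# Clamp counts to 0/1 in case a (name, workplace) pair is repeated
aff[] <- lapply(aff, function(col) as.integer(col > 0))
aff
```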
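Maybe this also helps:
m1 <- model.matrix(~0+workplace, data=dat)
# model.matrix() orders columns by factor level (sorted), so label them from its own column names
dimnames(m1) <- list(dat$name, sub("^workplace", "", colnames(m1)))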
as.data.frame(m1)
# A B C D
#a 1 0 0 0
#b 0 1 0 0
#c 1 0 0 0
#d 0 0 1 0
#e 0 0 0 1
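The question's loop can also be vectorized directly: R lets you index a matrix with a two-column character matrix of (row name, column name) pairs, so every cell is set in one assignment. A sketch, assuming A1 has the name and workplace columns from the question; the sorted dimnames are a choice made here, not something the question specifies:

```r
A1 <- data.frame(name = c("a", "b", "c", "d", "e"),
                 workplace = c("A", "B", "A", "C", "D"))

nm <- as.character(A1$name)       # as.character() guards against factor columns
wp <- as.character(A1$workplace)

a1 <- matrix(0L, nrow = length(unique(nm)), ncol = length(unique(wp)),
             dimnames = list(sort(unique(nm)), sort(unique(wp))))

# Each row of cbind(nm, wp) addresses one cell by (row name, column name)
a1[cbind(nm, wp)] <- 1L
a1
```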