I have two data frames containing the names of genetic elements. I want another data frame with the elements common to both data frames.
Example:
  data.a  data.b
  Column  Column
1 a       c
2 b       e
3 c       l
4 d       a
I want this result:
data.c
Column
1 a
2 c
This is just an example. The data frame data.b has more elements than data.a.
The %in% operator lets you find which elements are in both.
data.c = data.frame(Column = data.a$Column[data.a$Column %in% data.b$Column])
data.c
Column
1 a
2 c
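As a possible alternative for the single-column case, base R's intersect() returns the shared values directly. This is a minimal sketch, assuming both columns hold character strings (stringsAsFactors = FALSE is set explicitly so it behaves the same on older R versions). Unlike %in%, intersect() drops duplicated values.

```r
# Example data from the question
data.a <- data.frame(Column = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
data.b <- data.frame(Column = c("c", "e", "l", "a"), stringsAsFactors = FALSE)

# Values present in both columns, without duplicates
common <- intersect(data.a$Column, data.b$Column)

# Wrap the shared values in a new data frame
data.c <- data.frame(Column = common, stringsAsFactors = FALSE)
data.c
```

If data.a can contain repeated names and you want to keep every occurrence, the %in% version above is the better fit.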
a <- data.frame(a = c("a","b","c","d"))
a
b <- data.frame(b = c("c","d","e","f"))
b
c <- data.frame(c = a[a$a %in% b$b,])
c
The merge function allows you to control the type of join you want.
df1 <- data.frame(a = c("a", "b", "c", "d"))
df2 <- data.frame(a = c("c", "e", "l", "a"))
merge(x=df1, y=df2, by.x="a", by.y="a", all = FALSE)
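To illustrate how the all, all.x and all.y arguments change the join type, here is a small sketch. The extra column n in df2 is hypothetical and only there to make the unmatched rows visible as NA.

```r
df1 <- data.frame(a = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
# 'n' is a made-up value column, used only for illustration
df2 <- data.frame(a = c("c", "e", "l", "a"), n = 1:4, stringsAsFactors = FALSE)

# Inner join: only keys found in both data frames
inner <- merge(df1, df2, by = "a")

# Left join: every row of df1, with NA in 'n' where df2 has no match
left <- merge(df1, df2, by = "a", all.x = TRUE)

# Full join: every key from either data frame
full <- merge(df1, df2, by = "a", all = TRUE)
```

Here inner has two rows (a and c), left has the four rows of df1, and full has all six distinct keys.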
library(dplyr)
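# data_frame() is deprecated; tibble() is its replacement
data.a <- tibble(a = c("a", "b", "c", "d"))
data.b <- tibble(a = c("c", "e", "l", "a"))
# Name the key column explicitly instead of relying on the default
data.c <- data.a %>% inner_join(data.b, by = "a")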
Related
I have two data frames, y and z:
y <- data.frame(ID = c("A", "A", "A", "B", "B"), gene = c("a", "b", "c", "a", "c"))
z <- data.frame(A = c(2,6,3), B = c(8,4,9), C=c(1,6,2))
rownames(z) <- c("a", "b", "c")
So y is a table with a patient ID and a gene in each row, and z has the same patient IDs as its column names and the genes as its row names, each with a specific value (which is not important here). All the genes in y are in z, but z also contains genes that are not in y.
What I want to do is merge these data frames and get something like this:
a b c
A 1 1 1
B 1 0 1
So for each patient, if a gene in z is also in y for that patient, fill with 1, and if not, fill with 0.
I don't really know how to handle this, any ideas?
Thank you
I've made a reproducible example from your question (please add one to your question next time):
y <- data.frame(ID = c("id_A", "id_A", "id_A", "id_B", "id_B"), gene = c("a", "b", "c", "a", "c"))
z <- data.frame(id_A = c(2,6,3), id_B = c(8,4,9), id_C=c(1,6,2))
rownames(z) <- c("a", "b", "c")
The idea here is to pivot_longer your tables so you can join them easily.
To do so, you first need to make your rownames into a field:
z <- tibble::rownames_to_column(z, "gene")
Then, you pivot your z table to long format:
library(tidyr)
z_long <- pivot_longer(z, starts_with("id_"), names_to = "ID")
and join it with your y table:
library(dplyr)
table_join <- left_join(y, z_long, by = c("ID", "gene"))
Finally, you just have to calculate the frequencies:
table(table_join$ID, table_join$gene)
a b c
id_A 1 1 1
id_B 1 0 1
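For comparison, the same presence/absence table can be sketched in base R without a join. This is only one possible approach, under the assumption that the row names of z list every gene of interest. Passing those row names as factor levels means a gene that is in z but missing from y would still get a column of zeros.

```r
y <- data.frame(ID = c("id_A", "id_A", "id_A", "id_B", "id_B"),
                gene = c("a", "b", "c", "a", "c"),
                stringsAsFactors = FALSE)
z <- data.frame(id_A = c(2, 6, 3), id_B = c(8, 4, 9), id_C = c(1, 6, 2))
rownames(z) <- c("a", "b", "c")

# Count (ID, gene) pairs; the factor levels force one column per gene in z
counts <- table(y$ID, factor(y$gene, levels = rownames(z)))

# Convert to a wide data frame and cap the counts at 1 (presence/absence)
presence <- as.data.frame.matrix(counts)
presence[presence > 0] <- 1L
presence
```

Capping at 1 only matters if the same patient and gene pair can occur more than once in y.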
How to join 2 columns from a single data.frame
For example:
Column A : a,b,c,d,e
Column B : b,c,a,b,e
The column I want:
New Column : a,b,c,d,e,b,c,a,b,e
Basically, I want to get all the data under both columns into a single column.
df <- setNames(data.frame(matrix(, nrow = 100, ncol = 2)), c("V1", "V2"))
df$V1 <- "a, b, c, d, e"
df$V2 <- "b, c, a, b, e"
df$V3 <- paste(df$V1, df$V2, sep = ", ")
Hope this helps.
Using base R you could just copy the data.frame to a new object and concatenate the columns A and B using the c() function:
df <- data.frame(
A = c("a", "b", "c", "d", "e"),
B = c("b", "c", "a", "b", "e"),
stringsAsFactors = FALSE
)
df2 <- data.frame(
AB = c(df$A, df$B)
)
Alternatively, you could use a tidyverse approach with the gather() function from the tidyr package (gather() still works, but has since been superseded by pivot_longer()). This has the advantage that you can easily include the old column IDs (A or B) from the original data.frame in each row.
library(tidyr)
df_tidy <- df %>%
gather(key = "old_col_id", value = "value", A, B)
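If you would rather stay in base R and still keep track of the source column, stack() is one option. A minimal sketch, assuming every column of df is a plain character or numeric vector:

```r
df <- data.frame(
  A = c("a", "b", "c", "d", "e"),
  B = c("b", "c", "a", "b", "e"),
  stringsAsFactors = FALSE
)

# stack() concatenates the columns into 'values' and
# records the original column name in 'ind'
df_long <- stack(df)
df_long
```

The result has ten rows: the five values of A followed by the five values of B.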
How can I get the rows of a data frame that share a value in any element with the rows of another data frame?
I have written this but it didn't work.
# Example of two data frames
df1 <- data.frame(V1 = c("a", "g", "h", "l", "n", "e"), V2 = c("b", "n", "i", "m", "i", "f"), stringsAsFactors = F)
df2 <- data.frame(V1 = c("a", "c", "f","h"), V2 = c("b", "d", "e","z"), stringsAsFactors = F)
# finding joint values in each element of two data frames
res1<-intersect(df1$V1,df2$V1)
res2<-intersect(df1$V2,df2$V2)
res3<-intersect(df1$V1,df2$V2)
res4<-intersect(df1$V2,df2$V1)
# Get the rows of df1 that contain at least one shared value
ress1<-df1[apply(df1, MARGIN = 1, function(x) all(x== res1)), ]
ress2<-df1[apply(df1, MARGIN = 1, function(x) all(x== res2)), ]
ress3<-df1[apply(df1, MARGIN = 1, function(x) all(x== res3)), ]
ress4<-df1[apply(df1, MARGIN = 1, function(x) all(x== res4)), ]
# Get the rows of df2 that contain at least one shared value
resss1<-df2[apply(df2, MARGIN = 1, function(x) all(x== res1)), ]
resss2<-df2[apply(df2, MARGIN = 1, function(x) all(x== res2)), ]
resss3<-df2[apply(df2, MARGIN = 1, function(x) all(x== res3)), ]
resss4<-df2[apply(df2, MARGIN = 1, function(x) all(x== res4)), ]
# then combine above results
final.res<-rbind(ress1,ress2,ress3,ress4,resss1,resss2,resss3,resss4)
My desired result is:
a b
h z
h i
f e
e f
This should work:
# Create the example data
df1 <- data.frame(V1 = c("a", "g", "h", "l", "n", "e"), V2 = c("b", "n", "i", "m", "i", "f"), stringsAsFactors = F)
df2 <- data.frame(V1 = c("a", "c", "f","h"), V2 = c("b", "d", "e","z"), stringsAsFactors = F)
# Get the intersects
vals <- intersect(c(df1$V1, df1$V2), c(df2$V1, df2$V2))
#Get the subsets and rbind them
full <- rbind(
subset(df1, df1$V1 %in% vals),
subset(df1, df1$V2 %in% vals),
subset(df2, df2$V1 %in% vals),
subset(df2, df2$V2 %in% vals)
)
#Remove duplicates
full <- full[!duplicated(full),]
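The same idea can be written a little more compactly by stacking the two data frames first and filtering once. This is only a sketch of an equivalent formulation, and the names both, keep and res are introduced here for illustration.

```r
df1 <- data.frame(V1 = c("a", "g", "h", "l", "n", "e"),
                  V2 = c("b", "n", "i", "m", "i", "f"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("a", "c", "f", "h"),
                  V2 = c("b", "d", "e", "z"),
                  stringsAsFactors = FALSE)

# Values that occur anywhere in df1 and anywhere in df2
vals <- intersect(c(df1$V1, df1$V2), c(df2$V1, df2$V2))

# Stack both data frames and keep rows where either column holds a shared value
both <- rbind(df1, df2)
keep <- both$V1 %in% vals | both$V2 %in% vals

# unique() drops the rows that appear in both data frames
res <- unique(both[keep, ])
res
```

For the example data this keeps five rows: (a, b), (h, i), (e, f), (f, e) and (h, z).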
How can I convert these loops to lapply function or another fast function to speed up?
Example:
df1 <- data.frame(
V1 = c("a", "g", "h", "l", "n", "e"),
V2 = c("b", "n", "i", "m", "i", "f"),
stringsAsFactors = FALSE)
df2 <- data.frame(
V1 = c("a", "c", "b"),
V2 = c("b", "d", "a"),
stringsAsFactors = FALSE)
for (i in 1:nrow(df1)) {
for (j in 1:nrow(df2)) {
if (df1[i,]$V1==df2[j,]$V1 & df1[i,]$V2==df2[j,]$V2 |
df1[i,]$V1==df2[j,]$V2 & df1[i,]$V2==df2[j,]$V1) {
res1 <- df1[i,]
res2 <- df2[j,]
res <- rbind(res1, res2)
}
}
}
If you only have two columns, you could also use pmin and pmax to put each pair into a fixed order, and then combine that with merge to find the common rows:
lookup <- setNames(data.frame(do.call(pmin, df2),
do.call(pmax, df2),
1:nrow(df2)),
c(names(df2), "indx"))
df2[merge(lookup, df1)$indx, ]
# V1 V2
# 1 a b
# 3 b a
Or using data.table for more efficiency
library(data.table)
lookup <- setnames(data.table(do.call(pmin, df2),
do.call(pmax, df2)),
names(df2))
indx <- lookup[df1, on = names(df2), which = TRUE, nomatch = 0L]
df2[indx, ]
# V1 V2
# 1 a b
# 3 b a
We can try
df2[do.call(paste0,
as.data.frame(t(apply(df2, 1, sort)))) %in%
do.call(paste0, df1),]
# V1 V2
#1 a b
#3 b a
Hi,
I have two data frames that are like this for example
df1
V1 V2
a b
m n
h i
l m
n i
e f
and
df2
V1 V2
a b
c d
e f
b a
and I want to get rows that are the same in both data frames in a new one
like this
res2
V1 V2
a b
e f
b a
I tried
res1<-df1[df1$V1%in%df2$V1, ]
res2<-res1[res1$V2%in%df2$V2, ]
but I was unsuccessful. Any better idea?
You need to merge your two data frames on V1 and V2 with an inner join:
df1 <- data.frame(V1 = c("a", "m", "h", "l", "n", "e"), V2 = c("b", "n", "i", "m", "i", "f"), stringsAsFactors = F)
df2 <- data.frame(V1 = c("a", "c", "e"), V2 = c("b", "d", "f"), stringsAsFactors = F)
merge(df1, df2, by = c("V1", "V2"))
The result contains the (V1, V2) pairs that appear in both df1 and df2.
If you also want to keep the rows of df1 or df2 that have no match in the other data frame, you can add the options all.x = TRUE or all.y = TRUE.
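To see why filtering each column separately with %in% is not the same as matching whole rows, consider this small sketch. The data are hypothetical and chosen so that the row (a, f) passes both column checks without being a row of df2.

```r
# Hypothetical data: (a, f) is not a row of df2,
# but "a" is in df2$V1 and "f" is in df2$V2
df1 <- data.frame(V1 = c("a", "a", "e"), V2 = c("b", "f", "f"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("a", "e"), V2 = c("b", "f"),
                  stringsAsFactors = FALSE)

# Column-wise filter: keeps (a, f) as a false positive
colwise <- df1[df1$V1 %in% df2$V1 & df1$V2 %in% df2$V2, ]

# Row-wise match: the inner join compares (V1, V2) as a pair
rowwise <- merge(df1, df2, by = c("V1", "V2"))
```

Here colwise has three rows while rowwise has only the two rows that really occur in both data frames.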