Count matching instances between two data frames [closed] - r

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I'm new to R and can't find an answer that works.
I've got two data frames that look like this:
Teams
A
B
C
...
and
TCF
A
B
C
C
B
A
...
I need to count how many times each value in the first data frame's column occurs in the second data frame, and store that count in the first data frame. Thanks in advance!

You could use base R to do this:
sapply(unique(df1$Teams), function(x) sum(df2$TCF %in% x))
#A B C
#2 2 2
Or
setNames(table(match(df2$TCF, unique(df1$Teams))), unique(df1$Teams))
#A B C
#2 2 2
Or using data.table
library(data.table)
setkey(setDT(df1), Teams)
setkey(setDT(df2), TCF)
df2[J(unique(df1$Teams)),.N, by=.EACHI]
# TCF N
#1: A 2
#2: B 2
#3: C 2
data
df1 <- structure(list(Teams = c("A", "B", "C")), .Names = "Teams",
class = "data.frame", row.names = c(NA,-3L))
df2 <- structure(list(TCF = c("A", "B", "C", "C", "B", "A")), .Names = "TCF",
class = "data.frame", row.names = c(NA, -6L))
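The snippets above return the counts on their own, while the question asks for them to end up in the first data frame. A minimal base R sketch of that last step, assuming the same df1 and df2 as in the data block and that each team appears only once in df1$Teams; the column name n is my own choice, not something from the question:

```r
df1 <- data.frame(Teams = c("A", "B", "C"), stringsAsFactors = FALSE)
df2 <- data.frame(TCF = c("A", "B", "C", "C", "B", "A"), stringsAsFactors = FALSE)

# Restricting the factor levels to df1$Teams keeps the counts in df1's row
# order and gives 0 for any team that never appears in df2.
counts <- table(factor(df2$TCF, levels = df1$Teams))

# 'n' is a hypothetical column name chosen for this sketch.
df1$n <- as.vector(counts)
df1
```

Values of df2$TCF that are not listed in df1$Teams become NA in the factor and are left out of the table by default.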

Would this option be easier on the eyes?
library(dplyr)
df2 %>% count(TCF) %>% filter(TCF %in% unique(df1$Teams))
# Source: local data frame [3 x 2]
# TCF n
# 1 A 2
# 2 B 2
# 3 C 2
Data
df1 <- structure(list(Teams = c("A", "B", "C")), .Names = "Teams", class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(TCF = structure(c(1L, 2L, 3L, 3L, 2L, 1L, 4L,
5L, 5L), .Label = c("A", "B", "C", "X", "Y"), class = "factor")), .Names = "TCF", row.names = c(NA,
-9L), class = "data.frame")

Related

Fastest way to populate a column from another table in R?

I have two tables, and I need to fill the "Number" column of the second with the matching Col2 value from the first. My data contains a couple of hundred rows. I used a for loop to populate the column, but it takes a long time. Is there a faster way?
Col1 Col2
A 55
B 77
C 80
D 9
Letter Number
A
B
C
D
Using match.
transform(df2, Number=df1[match(Letter, df1$Col1), ]$Col2)
# Letter Number
# 1 A 55
# 2 B 77
# 3 C 80
# 4 D 9
Data:
df1 <- structure(list(Col1 = c("A", "B", "C", "D"), Col2 = c(55L, 77L,
80L, 9L)), class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(Letter = c("A", "B", "C", "D"), Number = c(NA,
NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L))
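For anyone wondering why this is fast: match(x, table) returns, for each element of x, the position of its first match in table (or NA when there is none), so indexing df1$Col2 by that result is a single vectorised lookup instead of a loop. A small sketch of the two steps, using a reordered df2 and a made-up letter "Z" that is not in df1 to show what happens to unmatched rows:

```r
df1 <- data.frame(Col1 = c("A", "B", "C", "D"),
                  Col2 = c(55L, 77L, 80L, 9L),
                  stringsAsFactors = FALSE)
# "Z" is a made-up value that has no entry in df1.
df2 <- data.frame(Letter = c("C", "A", "Z", "D"), stringsAsFactors = FALSE)

# Row of df1 that each letter of df2 points to (NA if absent).
idx <- match(df2$Letter, df1$Col1)

# Indexing by idx pulls every Col2 value in one step.
df2$Number <- df1$Col2[idx]
df2
```

Only the first match is used, so this assumes df1$Col1 has no duplicated keys.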

How to collapse rows by identical values in a column

Good evening,
I have a two-column, tab-separated .txt file like the following:
number letter
1 a
1 b
2 a
2 b
3 b
I would like to collapse rows where the column "number" has identical value, by creating a comma separated value in the corresponding column "letter".
In other words, this should be the output:
number letter
1 a,b
2 a,b
3 b
I have searched the web but did not find a working solution.
Thank you in advance,
Giuseppe
We can use aggregate in base R
aggregate(letter ~ number, df1, FUN = paste, collapse=",")
-output
# number letter
#1 1 a,b
#2 2 a,b
#3 3 b
Or with tidyverse
library(dplyr)
library(stringr)
df1 %>%
group_by(number) %>%
summarise(letter = str_c(letter, collapse=","))
data
df1 <- structure(list(number = c(1L, 1L, 2L, 2L, 3L), letter = c("a",
"b", "a", "b", "b")), class = "data.frame", row.names = c(NA,
-5L))
We can also combine aggregate() with toString:
#Code
newdf <- aggregate(letter~.,df,toString)
Output:
number letter
1 1 a, b
2 2 a, b
3 3 b
Some data:
#Data
df <- structure(list(number = c(1L, 1L, 2L, 2L, 3L), letter = c("a",
"b", "a", "b", "b")), class = "data.frame", row.names = c(NA,
-5L))
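Since the question starts from a tab-separated file, here is a rough end-to-end sketch. It reads the data from an inline string so it can be run as is; with a real file you would pass its path to read.delim instead. The only difference between the two answers is the separator: toString(x) is the same as paste(x, collapse = ", "), which is why its output has a space after the comma.

```r
# Inline stand-in for the tab-separated file from the question.
txt <- "number\tletter\n1\ta\n1\tb\n2\ta\n2\tb\n3\tb"
df <- read.delim(text = txt, stringsAsFactors = FALSE)

# One row per number, letters joined with "," (no space).
res <- aggregate(letter ~ number, data = df, FUN = paste, collapse = ",")
res

# toString() joins with ", " instead.
toString(c("a", "b"))
```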

Distinct in dplyr does not work (sometimes)

I have the following data frame which I have obtained from a count. I have used dput to make the data frame available and then edited the data frame so there is a duplicate of A.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
print(df)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
Now I would like to take distinct on Procedure and only keep the first A.
df %>%
distinct(Procedure, .keep_all=TRUE)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
It does not work. Strange...
If we print the Procedure column, we can see that there are duplicated levels for A, which is problematic for the distinct function.
df$Procedure
[1] D A A C
Levels: A A C D -1
Warning message:
In print.factor(x) : duplicated level [2] in factor
One way to fix this is to rebuild the factor with the factor function, which gets rid of the duplicated level. Another way is to convert the Procedure column to character.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
library(tidyverse)
df %>%
mutate(Procedure = factor(Procedure)) %>%
distinct(Procedure, .keep_all=TRUE)
# # A tibble: 3 x 2
# Procedure n
# <fct> <int>
# 1 D 10717
# 2 A 4412
# 3 C 1480
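The second option, converting to character, is not shown above. A rough base R sketch of it, assuming the same malformed factor as in the question (built here with structure() so the duplicated level is preserved); duplicated() stands in for distinct() so that no package is needed:

```r
# Factor with the duplicated level "A", as in the question.
f <- structure(c(4L, 1L, 2L, 3L),
               levels = c("A", "A", "C", "D", "-1"),
               class = "factor")

# as.character() looks up the label of each code, so both "A" levels
# collapse into the same string.
df <- data.frame(Procedure = as.character(f),
                 n = c(10717L, 4412L, 2058L, 1480L),
                 stringsAsFactors = FALSE)

# Keep the first row for each Procedure.
res <- df[!duplicated(df$Procedure), ]
res
```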
You have a duplicated value in the label parameter: .Label = c("A", "A", "C", "D", "-1"). That is the issue. By the way, your way of initializing a tibble seems very strange (I do not know exactly what your goal is, but still).
Why not use
df <- tibble(
Procedure = c("D", "A", "A", "C"),
n = c(10717L, 4412L, 2058L, 1480L)
)

Converting a data frame to matrix in R

I would like to convert a data frame to a matrix in R, as in the following example:
df
row.index column.index matrix element
1 1 A
1 2 B
2 1 C
2 2 D
matrix
A B
C D
Is it possible to do the same with row names? For example:
df
row.name column.name matrix element
X P A
X Q B
Y P C
Y Q D
matrix
P Q
X A B
Y C D
Thanks for help!
We can use tapply
tapply(df$matrixelement, df[1:2], FUN = I)
It would also work for the second dataset
res <- tapply(df1$matrixelement, df1[1:2], FUN = I)
names(dimnames(res)) <- NULL
res
# P Q
#X "A" "B"
#Y "C" "D"
If we need a data.frame, then dcast can be used
library(reshape2)
# name the value column explicitly instead of letting dcast guess it
dcast(df, row.index ~ column.index, value.var = "matrixelement")
data
df <- structure(list(row.index = c(1L, 1L, 2L, 2L), column.index = c(1L,
2L, 1L, 2L), matrixelement = c("A", "B", "C", "D")), .Names = c("row.index",
"column.index", "matrixelement"), class = "data.frame", row.names = c(NA,
-4L))
df1 <- structure(list(row.name = c("X", "X", "Y", "Y"), column.name = c("P",
"Q", "P", "Q"), matrixelement = c("A", "B", "C", "D")), .Names = c("row.name",
"column.name", "matrixelement"), class = "data.frame", row.names = c(NA,
-4L))
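Another way to think about it: each row of the data frame is a (row, column, value) triple, and R can fill a matrix from such triples directly by indexing with a two-column matrix. A minimal base R sketch under that reading, using the same df and df1 as in the data block; the names m, m1, rn and cn are mine:

```r
df <- data.frame(row.index = c(1L, 1L, 2L, 2L),
                 column.index = c(1L, 2L, 1L, 2L),
                 matrixelement = c("A", "B", "C", "D"),
                 stringsAsFactors = FALSE)

# Empty matrix of the right size, then one assignment per (row, column) pair.
m <- matrix(NA_character_, nrow = max(df$row.index), ncol = max(df$column.index))
m[cbind(df$row.index, df$column.index)] <- df$matrixelement
m

df1 <- data.frame(row.name = c("X", "X", "Y", "Y"),
                  column.name = c("P", "Q", "P", "Q"),
                  matrixelement = c("A", "B", "C", "D"),
                  stringsAsFactors = FALSE)

# With dimnames set, a character index matrix is matched against the names.
rn <- unique(df1$row.name)
cn <- unique(df1$column.name)
m1 <- matrix(NA_character_, nrow = length(rn), ncol = length(cn),
             dimnames = list(rn, cn))
m1[cbind(df1$row.name, df1$column.name)] <- df1$matrixelement
m1
```

Cells with no corresponding row in the data frame stay NA.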

Grouping linked unique ID pairs using R [duplicate]

This question already has an answer here:
Create a group index for values connected directly and indirectly
(1 answer)
Closed 4 years ago.
I'm trying to link together pairs of unique IDs using R. Given the example below, I have two IDs (here ID1 and ID2) that indicate linkage. I'm trying to create groups of rows that are linked. In this example A is linked to B which is linked to D which is linked to E. Because these are all connected, I want to group them together. Next, there is also X which is linked to both Y and Z. Because these two are also connected, I want to assign them to a single group as well. How can I tackle this using R?
Thanks!
Example data:
ID1 ID2
A B
B D
D E
X Y
X Z
DPUT R representation
df <- structure(list(id1 = structure(c(1L, 2L, 3L, 4L, 4L), .Label = c("A", "B", "D", "X"), class = "factor"), id2 = structure(1:5, .Label = c("B", "D", "E", "Y", "Z"), class = "factor")), .Names = c("id1", "id2"), row.names = c(NA, -5L), class = "data.frame")
Output needed:
ID1 ID2 GROUP
A B 1
B D 1
D E 1
X Y 2
X Z 2
As mentioned by @Frank in the comments, you can use igraph:
library(igraph)
# graph_from_data_frame() and components() are the current names of
# graph.data.frame() and clusters()
idf <- graph_from_data_frame(df)
components(idf)$membership
Which gives:
A B D X E Y Z
1 1 1 2 1 2 2
Should you want to assign the result back to rows of df:
merge(df, stack(components(idf)$membership), by.x = "id1", by.y = "ind", all.x = TRUE)
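If you would rather not depend on igraph, the same grouping can be sketched in base R with a small union-find. This is only an illustration for data of this size, not a replacement for igraph on large graphs; group_links is a hypothetical helper name of mine, and it assumes there are no NA ids:

```r
# Hypothetical helper (not from the thread): label each link with the
# connected component its endpoints belong to, using a simple union-find.
group_links <- function(id1, id2) {
  id1 <- as.character(id1)
  id2 <- as.character(id2)
  nodes <- unique(c(id1, id2))
  parent <- setNames(nodes, nodes)   # every node starts as its own root

  # Follow parent pointers until a node that is its own parent is reached.
  find_root <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  for (i in seq_along(id1)) {
    a <- find_root(id1[i])
    b <- find_root(id2[i])
    if (a != b) parent[[b]] <- a     # merge the two components
  }

  roots <- vapply(id1, find_root, character(1), USE.NAMES = FALSE)
  match(roots, unique(roots))        # number groups in order of appearance
}

df <- data.frame(id1 = c("A", "B", "D", "X", "X"),
                 id2 = c("B", "D", "E", "Y", "Z"))
df$GROUP <- group_links(df$id1, df$id2)
df
```

The ids are converted with as.character() first, so it works whether the columns are factors (as in the dput above) or plain strings.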

Resources