How to aggregate undirected combinations in R [duplicate]

This question already has answers here:
Create unique identifier from the interchangeable combination of two variables
(2 answers)
Closed 6 years ago.
I have a dataframe of 3 columns
A B 1
A B 1
A C 1
B A 1
I want to aggregate it such that it considers combinations A-B and B-A to be the same, resulting in
A B 3
A C 1
How do I go about this?

Use pmin and pmax on the first two columns and then do the group-by-count:
library(dplyr);
df %>% group_by(G1 = pmin(V1, V2), G2 = pmax(V1, V2)) %>% summarise(Count = sum(V3))
Source: local data frame [2 x 3]
Groups: G1 [?]
G1 G2 Count
(chr) (chr) (int)
1 A B 3
2 A C 1
The corresponding data.table solution would be:
library(data.table)
setDT(df)
df[, .(Count = sum(V3)), .(G1 = pmin(V1, V2), G2 = pmax(V1, V2))]
G1 G2 Count
1: A B 3
2: A C 1
Data:
structure(list(V1 = c("A", "A", "A", "B"), V2 = c("B", "B", "C",
"A"), V3 = c(1L, 1L, 1L, 1L)), .Names = c("V1", "V2", "V3"), row.names = c(NA,
-4L), class = "data.frame")
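For completeness, the same pmin/pmax idea can be sketched in base R with aggregate(). This is only a sketch: it assumes V1 and V2 are character columns (factor columns would need as.character() first), and the names G1, G2 and Count simply mirror the answers above.

```r
# Example data from the question (character columns assumed)
df <- data.frame(
  V1 = c("A", "A", "A", "B"),
  V2 = c("B", "B", "C", "A"),
  V3 = c(1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Put each pair in a canonical order so that A-B and B-A share one key
keyed <- data.frame(
  G1 = pmin(df$V1, df$V2),
  G2 = pmax(df$V1, df$V2),
  Count = df$V3,
  stringsAsFactors = FALSE
)

# Sum the counts within each unordered pair
res <- aggregate(Count ~ G1 + G2, data = keyed, FUN = sum)
res
```

The key step is the canonical ordering; once G1 and G2 exist, any grouping tool can do the summing.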

Related

Rename columns of a dataframe based on another dataframe except columns not in that dataframe in R

Given two dataframes df1 and df2 as follows:
df1:
df1 <- structure(list(A = 1L, B = 2L, C = 3L, D = 4L, G = 5L), class = "data.frame", row.names = c(NA,
-1L))
Out:
A B C D G
1 1 2 3 4 5
df2:
df2 <- structure(list(Col1 = c("A", "B", "C", "D", "X"), Col2 = c("E",
"Q", "R", "Z", "Y")), class = "data.frame", row.names = c(NA,
-5L))
Out:
Col1 Col2
1 A E
2 B Q
3 C R
4 D Z
5 X Y
I need to rename the columns of df1 using df2, except column G, since it is not in df2's Col1.
I use df2$Col2[match(names(df1), df2$Col1)] based on the answer from here, but it returns "E" "Q" "R" "Z" NA. As you can see, column G becomes NA, whereas I want it to keep its original name.
The expected result:
E Q R Z G
1 1 2 3 4 5
How could I deal with this issue? Thanks.
By using na.omit (it's a little bit messy):
colnames(df1)[na.omit(match(names(df1), df2$Col1))] <- df2$Col2[na.omit(match(names(df1), df2$Col1))]
df1
E Q R Z G
1 1 2 3 4 5
I was able to reproduce your error with
df2 <- data.frame(
Col1 = c("H","I","K","A","B","C","D"),
Col2 = c("a1","a2","a3","E","Q","R","Z")
)
The problem is the order of df2$Col1 and names(df1) in match.
na.omit(match(names(df1), df2$Col1))
gives [1] 4 5 6 7. These are positions in df2$Col1, not in df1, which has only 5 columns.
To index df1, we should swap the arguments of match: na.omit(match(df2$Col1, names(df1))) gives [1] 1 2 3 4
colnames(df1)[na.omit(match(df2$Col1, names(df1)))] <- df2$Col2[na.omit(match(names(df1), df2$Col1))]
This will work.
A solution using the rename_with function from the dplyr package.
library(dplyr)
df3 <- df2 %>%
filter(Col1 %in% names(df1))
df4 <- df1 %>%
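# match() looks up each selected name, so the result does not depend on row order in df3
rename_with(.cols = all_of(df3$Col1), .fn = function(x) df3$Col2[match(x, df3$Col1)])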
df4
# E Q R Z G
# 1 1 2 3 4 5
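A base R variant of the same lookup, as a minimal sketch: compute the match once, then fall back to the original name wherever the match is NA. The helper variable idx is only illustrative.

```r
# Example data from the question
df1 <- data.frame(A = 1L, B = 2L, C = 3L, D = 4L, G = 5L)
df2 <- data.frame(
  Col1 = c("A", "B", "C", "D", "X"),
  Col2 = c("E", "Q", "R", "Z", "Y"),
  stringsAsFactors = FALSE
)

# Position of each df1 column name in the lookup table (NA if absent)
idx <- match(names(df1), df2$Col1)

# Use the new name where a match exists, otherwise keep the old one
names(df1) <- ifelse(is.na(idx), names(df1), df2$Col2[idx])
names(df1)
# "E" "Q" "R" "Z" "G"
```

Because each name is looked up individually, this does not depend on the row order of df2.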

How to fill a dataframe from another one in R?

I want to fill df2 with information from df1.
df1 as below
ID Mutation
1 A
2 B
2 C
3 A
df2 as below
ID A B C
1
2
3
For example, if mutation A is found in ID 1, then I want it marked as "Y" in df2.
So the df2 result should be
ID A B C
1 Y
2 Y Y
3 Y
I have hundreds of IDs and more than 20 mutations. How can I efficiently achieve this in R? Thanks!
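Using data.table you can try
library(data.table)
setDT(df)
# cast to wide format; each cell holds the mutation name or NA
df2 <- dcast(df, formula = ID ~ Mutation, value.var = "Mutation")
# mark present values with "Y" and missing ones with a blank
df2[, c("A", "B", "C") := lapply(.SD, function(x) ifelse(is.na(x), " ", "Y")), ID]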
df2
#Output
ID A B C
1: 1 Y
2: 2 Y Y
3: 3 Y
Create a new column with value 'Y' and cast the data in wide format.
library(dplyr)
library(tidyr)
df %>%
mutate(value = 'Y') %>%
pivot_wider(names_from = Mutation, values_from = value, values_fill = '')
# ID A B C
# <int> <chr> <chr> <chr>
#1 1 "Y" "" ""
#2 2 "" "Y" "Y"
#3 3 "Y" "" ""
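If you would rather avoid extra packages, the same presence matrix can be sketched in base R with table(). This assumes, as in the question, that ID is an integer and that every mutation to be shown appears at least once in the data; the names counts and marks are only illustrative.

```r
# Example data from the question
df <- data.frame(
  ID = c(1L, 2L, 2L, 3L),
  Mutation = c("A", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Cross-tabulate ID against Mutation (cell = number of occurrences)
counts <- table(df$ID, df$Mutation)

# Start from an all-blank character matrix and mark the non-zero cells
marks <- matrix("", nrow = nrow(counts), ncol = ncol(counts),
                dimnames = dimnames(counts))
marks[counts > 0] <- "Y"

# Turn the matrix back into a data frame with an explicit ID column
df2 <- data.frame(ID = as.integer(rownames(marks)), marks,
                  row.names = NULL, stringsAsFactors = FALSE)
df2
```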
data
df <- structure(list(ID = c(1L, 2L, 2L, 3L), Mutation = c("A", "B",
"C", "A")), class = "data.frame", row.names = c(NA, -4L))

Rearranging data using R

I would like to ask how I can rearrange my dataset to get the following:
Original:
Group Value_y Value_z
1 m a
1 n a
2 o b
2 p b
Intended:
Group Value_a Value_b
1 m n
2 o p
This involves separating Value_y according to Value_z and adding a new column according to the group number. I will potentially need to average a separate column's values and add the result as a new column in the same way.
Thank you!
In data.table we can use dcast :
library(data.table)
dcast(setDT(df), Group~rowid(Value_z), value.var = 'Value_y')
# Group 1 2
#1: 1 m n
#2: 2 o p
data
df <- structure(list(Group = c(1L, 1L, 2L, 2L), Value_y = c("m", "n",
"o", "p"), Value_z = c("a", "a", "b", "b")), class = "data.frame",
row.names = c(NA, -4L))
There is a dplyr solution. Define
Uneven = seq(1, dim(A)[1] - 1, by = 2)
Even = seq(2, dim(A)[1], by = 2)
with
A = data.frame(Group = c(1, 1, 2, 2), Value_y = c("m", "n", "o", "p"))
Then, you can use the pipe and some dplyr functionality to get
A2 = A %>%
dplyr::group_by(Group) %>%
dplyr::mutate(Row_1 = Value_y[Uneven]) %>%
dplyr::mutate(Row_2 = Value_y[Even]) %>%
dplyr::select(-Value_y) %>%
dplyr::slice(1)
and the output
> A2
# A tibble: 2 x 3
# Groups: Group [2]
Group Row_1 Row_2
<dbl> <fct> <fct>
1 1 m n
2 2 o p
Note that this solution presupposes that the rows come in pairs within each group, i.e. an even number of observations.
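A base R sketch of the same reshaping, for comparison. It numbers the rows within each Group using ave() (the data.table answer uses rowid(Value_z) instead, which is equivalent for this data) and then calls reshape(); the helper column pos is an assumption introduced only for illustration.

```r
# Example data from the question
df <- data.frame(
  Group = c(1L, 1L, 2L, 2L),
  Value_y = c("m", "n", "o", "p"),
  Value_z = c("a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Running position of each row inside its Group: 1, 2, 1, 2
df$pos <- ave(seq_along(df$Group), df$Group, FUN = seq_along)

# Spread Value_y into one column per position
wide <- reshape(df[, c("Group", "Value_y", "pos")],
                idvar = "Group", timevar = "pos", direction = "wide")
wide
```

The resulting columns are named Value_y.1 and Value_y.2; they can be renamed afterwards with names(wide) <- if Value_a and Value_b are preferred.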

Replace a subset of data frame

I have a data frame with some errors
T item V1 V2
1 a 2 .1
2 a 5 .8
1 b 1 .7
2 b 2 .2
I have another data frame with corrections for items concerning V1 only
T item V1
1 a 2
2 a 6
How do I get the final data frame? Should I use merge or rbind? Note: the actual data frames are big.
An option would be a data.table join on 'T' and 'item', assigning to 'V1' the corresponding 'V1' column (i.V1) from the second dataset
library(data.table)
setDT(df1)[df2, V1 := i.V1, on = .(T, item)]
df1
# T item V1 V2
#1: 1 a 2 0.1
#2: 2 a 6 0.8
#3: 1 b 1 0.7
#4: 2 b 2 0.2
data
df1 <- structure(list(T = c(1L, 2L, 1L, 2L), item = c("a", "a", "b",
"b"), V1 = c(2L, 5L, 1L, 2L), V2 = c(0.1, 0.8, 0.7, 0.2)),
class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(T = 1:2, item = c("a", "a"), V1 = c(2L, 6L)),
class = "data.frame", row.names = c(NA,
-2L))
This should work -
library(dplyr)
df1 %>%
left_join(df2, by = c("T", "item")) %>%
mutate(
V1 = coalesce(as.numeric(V1.y), as.numeric(V1.x))
) %>%
select(-V1.x, -V1.y)
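The same update can also be sketched in base R by matching on a combined key, which modifies only the affected rows and keeps the row order of df1. This assumes that the separator "_" never occurs inside T or item; key1, key2, idx and hit are illustrative names.

```r
# Example data from the question
df1 <- data.frame(
  T = c(1L, 2L, 1L, 2L),
  item = c("a", "a", "b", "b"),
  V1 = c(2L, 5L, 1L, 2L),
  V2 = c(0.1, 0.8, 0.7, 0.2),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  T = 1:2,
  item = c("a", "a"),
  V1 = c(2L, 6L),
  stringsAsFactors = FALSE
)

# Build one key per row from the two identifying columns
key1 <- paste(df1$T, df1$item, sep = "_")
key2 <- paste(df2$T, df2$item, sep = "_")

# For each row of df1, find the correcting row in df2 (NA if none)
idx <- match(key1, key2)
hit <- !is.na(idx)

# Overwrite V1 only where a correction exists
df1$V1[hit] <- df2$V1[idx[hit]]
df1
```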

Grouping linked unique ID pairs using R [duplicate]

This question already has an answer here:
Create a group index for values connected directly and indirectly
(1 answer)
Closed 4 years ago.
I'm trying to link together pairs of unique IDs using R. Given the example below, I have two IDs (here ID1 and ID2) that indicate linkage. I'm trying to create groups of rows that are linked. In this example A is linked to B which is linked to D which is linked to E. Because these are all connected, I want to group them together. Next, there is also X which is linked to both Y and Z. Because these two are also connected, I want to assign them to a single group as well. How can I tackle this using R?
Thanks!
Example data:
ID1 ID2
A B
B D
D E
X Y
X Z
dput() representation of the data:
structure(list(id1 = structure(c(1L, 2L, 3L, 4L, 4L), .Label = c("A", "B", "D", "X"), class = "factor"), id2 = structure(1:5,.Label = c("B", "D", "E", "Y", "Z"), class = "factor")), .Names = c("id1", "id2"), row.names = c(NA, -5L), class = "data.frame")
Output needed:
ID1 ID2 GROUP
A B 1
B D 1
D E 1
X Y 2
X Z 2
As mentioned by @Frank in the comments, you can use igraph:
library(igraph)
idf <- graph.data.frame(df)
clusters(idf)$membership
Which gives:
A B D X E Y Z
1 1 1 2 1 2 2
Should you want to assign the result back to rows of df:
merge(df, stack(clusters(idf)$membership), by.x = "id1", by.y = "ind", all.x = TRUE)
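If igraph is not available, the grouping can be sketched in base R by repeatedly pushing the smaller label across every link until nothing changes. This is a different technique from the igraph answer (plain label propagation, not a library call), it can be slow on large data, and the names from, to, label and lo are illustrative.

```r
# Example data from the question
df <- data.frame(
  id1 = c("A", "B", "D", "X", "X"),
  id2 = c("B", "D", "E", "Y", "Z"),
  stringsAsFactors = FALSE
)
from <- as.character(df$id1)
to <- as.character(df$id2)

# Give every distinct ID its own starting label
ids <- unique(c(from, to))
label <- setNames(seq_along(ids), ids)

# Keep merging labels along the links until a full pass changes nothing
changed <- TRUE
while (changed) {
  changed <- FALSE
  for (i in seq_along(from)) {
    lo <- min(label[[from[i]]], label[[to[i]]])
    if (label[[from[i]]] != lo || label[[to[i]]] != lo) {
      label[c(from[i], to[i])] <- lo
      changed <- TRUE
    }
  }
}

# Renumber the surviving labels as 1, 2, ... in order of appearance
df$GROUP <- match(label[from], unique(label[from]))
df
```

Both ends of a link always share a label at the end, so looking up id1 alone is enough to assign the row's group.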
