Combination with conditions in R

How can I turn combinations of the letters
label=c("A","B","C","D","E")
into a data frame with 4 groups (G1, G2, G3, G4), as follows?
k2=data.frame(G1=c("AB","AC","AD","AE","BC","BD","BE","CD","CE","DE"),
G2=c("C","B","B","B","A","A","A","A","A","A"),
G3=c("D","D","C","C","D","C","C","B","B","B"),
G4=c("E","E","E","D","E","E","D","E","D","C"))
And if I want only 3 groups (G1, G2, G3), with the condition that "B" and "C" are never separated, as in the data frame below, how can I do that?
k3=data.frame(G1=c("BCD","BCE","BCA","AE","AD","DE"),
G2=c("A","A","D","BC","BC","BC"),
G3=c("E","D","E","D","E","A"))
Thank you very much for the help

Here is one way to do what you want to do:
labels <- c("A", "B", "C", "D", "E")
a <- t(combn(labels, 2)) # one row per pair of letters
b <- t(apply(a, 1, function(x) setdiff(labels, x))) # letters not in the pair
k2 <- data.frame(paste0(a[, 1], a[, 2]), b)
colnames(k2) <- paste0("G", 1:4)
k2
# G1 G2 G3 G4
# 1 AB C D E
# 2 AC B D E
# 3 AD B C E
# 4 AE B C D
# 5 BC A D E
# 6 BD A C E
# 7 BE A C D
# 8 CD A B E
# 9 CE A B D
# 10 DE A B C
The simplest way to do the second version is to exclude "C" and add it at the end:
labels3 <- c("A", "B", "D", "E") # "C" is left out and re-attached to "B" below
d <- t(combn(labels3, 2))
e <- t(apply(d, 1, function(x) setdiff(labels3, x)))
k3 <- data.frame(paste0(d[, 1], d[, 2]), e)
colnames(k3) <- paste0("G", 1:3)
k3 <- data.frame(sapply(k3, function(x) gsub("B", "BC", x)))
k3
# G1 G2 G3
# 1 ABC D E
# 2 AD BC E
# 3 AE BC D
# 4 BCD A E
# 5 BCE A D
# 6 DE A BC
This does not match your k3 exactly, but it is more consistent with k2.
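The two steps above (build the pairs, then re-attach the excluded letter) can be wrapped in one function. This is only a sketch: the name combine_with_glue and its glue argument are made up for illustration, and it assumes at least four labels remain after the glued letter is removed.

```r
# Hypothetical helper: pairs of labels go in G1, the remaining labels in the
# other columns. `glue` is a pair of labels that must stay together.
combine_with_glue <- function(labels, glue = NULL) {
  if (!is.null(glue)) {
    labels <- setdiff(labels, glue[-1]) # drop the second glued label for now
  }
  pairs <- t(combn(labels, 2))
  rest <- t(apply(pairs, 1, function(x) setdiff(labels, x)))
  out <- data.frame(paste0(pairs[, 1], pairs[, 2]), rest)
  colnames(out) <- paste0("G", seq_len(ncol(out)))
  if (!is.null(glue)) {
    # re-attach the dropped label wherever its partner appears
    out[] <- lapply(out, function(x) {
      sub(glue[1], paste(glue, collapse = ""), x, fixed = TRUE)
    })
  }
  out
}

k3 <- combine_with_glue(c("A", "B", "C", "D", "E"), glue = c("B", "C"))
k3
```

Calling it without glue should reproduce the k2 layout.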

Related

Multiply without eliminate information

I have a dataframe and I would like to maintain information. My data frame is like:
a <- c("a","b", "c", "d")
b <- c("e","f", "g", "h")
c <- c(1, 2, 1, 3) # multiply
d <- c("AB","BC", "CD", "DE")
e <- c(7, 5, 4, 3)
f<- c(2, 3, 5, 4)
g<- c(5, 7, 7, 9)
h <- c(6, 1, 2, 10)
m <- data.frame(a, b, d, e, f, g, h, c)
I would like to replace e and f with c * e and c * f (and likewise for the other columns, up to c * h). How can I do this automatically, without writing out every single column?

With across:
library(dplyr)
mutate(m, across(c(e, f), ~ .x * c))
#mutate(m, across(c(e, f), `*`, c))
a b d e f c
1 a e AB 7 2 1
2 b f BC 10 6 2
3 c g CD 4 5 1
4 d h DE 9 12 3
or, if you want to keep the original columns:
mutate(m, across(c(e, f), ~ .x * c, .names = "new_{.col}"))
a b d e f c new_e new_f
1 a e AB 7 2 1 7 2
2 b f BC 5 3 2 10 6
3 c g CD 4 5 1 4 5
4 d h DE 3 4 3 9 12
or in base R
m[c(4,5)] <- sapply(m[c(4,5)], \(x) c*x)
# or even simpler (as pointed out by @zx8754 in the comments)
#m[, 4:5] <- m[, 4:5] * m$c
If you want to keep the original columns:
m[paste0("new_", c("e", "f"))] <- sapply(m[c(4,5)], \(x) c*x)
#m[paste0("new_", c("e", "f"))] <- m[, 4:5] * m$c
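To cover the "without writing every single column" part of the question, the numeric columns can be picked up programmatically instead of by position. A minimal base R sketch, assuming the multiplier column is named c as in the question and that every other numeric column should be scaled (num_cols is just an illustrative name):

```r
# The question's data frame, including the g and h columns.
m <- data.frame(
  a = c("a", "b", "c", "d"),
  b = c("e", "f", "g", "h"),
  d = c("AB", "BC", "CD", "DE"),
  e = c(7, 5, 4, 3),
  f = c(2, 3, 5, 4),
  g = c(5, 7, 7, 9),
  h = c(6, 1, 2, 10),
  c = c(1, 2, 1, 3)
)

# All numeric columns except the multiplier itself.
num_cols <- setdiff(names(m)[sapply(m, is.numeric)], "c")

# A data frame times a vector of length nrow(m) multiplies each column row-wise.
m[num_cols] <- m[num_cols] * m$c
m
```

Using m$c rather than the free-standing vector c keeps the code working even if the global c has been removed or changed.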

You could use mutate, e.g.
a <- c("a","b", "c", "d")
b <- c("e","f", "g", "h")
c <- c(1, 2, 1, 3) #multiply
d <- c("AB","BC", "CD", "DE")
e <- c(7, 5, 4, 3)
f<- c(2, 3, 5, 4)
m <- data.frame(a, b, d, e, f, c)
library(dplyr)
m %>%
mutate(e = e*c,
f = f*c)
Output:
a b d e f c
1 a e AB 7 2 1
2 b f BC 10 6 2
3 c g CD 4 5 1
4 d h DE 9 12 3

base R solution:
m[,4:5] <- sapply(m[4:5], "*", c)
m
a b d e f c
1 a e AB 7 2 1
2 b f BC 10 6 2
3 c g CD 4 5 1
4 d h DE 9 12 3

How to create a long dataset based on number in column indicating how many times they should correspond [duplicate]

For example, I have a dataset that looks like this:
structure(list(ID = c(1, 2, 3, 4, 5), COL1 = c("A", "B", "C",
"D", "E"), COL2 = c("F", "G", "H", "I", "J"), Paired = c(2, 3,
1, 2, 1)), row.names = c(NA, -5L), class = c("tbl_df", "tbl",
"data.frame"))
ID COL1 COL2 Paired
1 A F 2
2 B G 3
3 C H 1
4 D I 2
5 E J 1
I would like to create a dataset that looks like this. Note the number in the Paired column:
Col Col2
A F
A F
F A
F A
B G
B G
B G
G B
G B
G B
C H
H C
D I
D I
I D
I D
E J
J E
Note that A and F are paired two times. I want the long format to show every pairing in both orders, so 2 pairs gives AF, AF, FA, FA.

We can use uncount() from tidyr, assuming the question's data is stored as df1:
library(dplyr)
library(tidyr)
df1 %>%
uncount(Paired) -> tmp
tmp %>%
rename(COL1= COL2, COL2 = COL1) %>%
bind_rows(tmp) %>%
select(-ID)
Output:
# A tibble: 18 x 2
COL2 COL1
<chr> <chr>
1 A F
2 A F
3 B G
4 B G
5 B G
6 C H
7 D I
8 D I
9 E J
10 F A
11 F A
12 G B
13 G B
14 G B
15 H C
16 I D
17 I D
18 J E
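For comparison, the same expansion can be sketched in base R with rep(). The names idx, forward, swapped and long are illustrative only, and df1 is rebuilt here as a plain data frame rather than a tibble. Sorting by the repeated row index keeps each ID's pairs together, which is closer to the row order shown in the question.

```r
df1 <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  COL1 = c("A", "B", "C", "D", "E"),
  COL2 = c("F", "G", "H", "I", "J"),
  Paired = c(2, 3, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Repeat each row index as many times as Paired says.
idx <- rep(seq_len(nrow(df1)), times = df1$Paired)

forward <- df1[idx, c("COL1", "COL2")]
# Same rows with the two columns swapped, then renamed to line up with forward.
swapped <- setNames(df1[idx, c("COL2", "COL1")], c("COL1", "COL2"))

long <- rbind(forward, swapped)
# order() keeps ties in their original order, so forward pairs precede swapped ones.
long <- long[order(c(idx, idx)), ]
rownames(long) <- NULL
long
```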

How to count the number of components every time a graph is updated in R

I have the following data:
data <- data.frame(name = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D", "E", "B", "C", "C"),
surname = c("aa", "bb", "cc", "dd", "hh", "ee", "ii", "aa", "qq", "ff", "gg", "ff", "gg", "cc"))
This data produces a connected graph:
library(igraph)
plot(graph_from_data_frame(data, directed = F))
which obviously has 1 component.
I would like to count the number of components this data produces every time we add a row in the graph. For example, the initial graph will have 1 component, since the vertices A and aa in the first row of the data are connected. The next graph will have again 1 component, since we add the second row and because of the A value in the name column. When we include the fourth row (B, dd), the graph will have 2 components.
I use the following piece of code to get the number of components each time the data is updated:
for (i in 1:dim(data)[1]) {
data$number_of_components[i] <- components(graph_from_data_frame(data[1:i,], directed = F))$no}
Is there a smarter/more sophisticated way to get this? Thanks.

You can take a look at sapply(). Using the question's data:
library(igraph)
data$number_of_components <- sapply(seq_len(nrow(data)), function(x) {
g <- graph_from_data_frame(data[seq_len(x),], directed = FALSE)
components(g)$no
})
data
# name surname number_of_components
# 1 A aa 1
# 2 A bb 1
# 3 A cc 1
# 4 B dd 2
# 5 B hh 2
# 6 C ee 3
# 7 D ii 4
# 8 D aa 3
# 9 D qq 3
# 10 D ff 3
# 11 E gg 4
# 12 B ff 3
# 13 C gg 2
# 14 C cc 1

You can try decompose like below
transform(
data,
num_components = sapply(
seq_along(name),
function(k) length(decompose(graph_from_data_frame(head(data, k), directed = FALSE)))
)
)
or
transform(
data,
num_components = lengths(
lapply( # lapply always returns a list, so lengths() counts graphs reliably
seq_along(name),
function(k) decompose(graph_from_data_frame(head(data, k), directed = FALSE))
)
)
)
which gives
name surname num_components
1 A aa 1
2 A bb 1
3 A cc 1
4 B dd 2
5 B hh 2
6 C ee 3
7 D ii 4
8 D aa 3
9 D qq 3
10 D ff 3
11 E gg 4
12 B ff 3
13 C gg 2
14 C cc 1
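All of the approaches above rebuild the graph from scratch for every prefix of the data. As an alternative, the count can be maintained incrementally with a union-find (disjoint-set) structure, which needs no graph package. This is a swapped-in technique rather than anything from the answers above, and the function name count_components_incremental is hypothetical. Like graph_from_data_frame(), it treats values from both columns as one shared set of vertex names.

```r
count_components_incremental <- function(from, to) {
  verts <- unique(c(from, to))
  parent <- seq_along(verts)          # each vertex starts as its own root
  seen <- logical(length(verts))      # has the vertex appeared in an edge yet?
  a <- match(from, verts)
  b <- match(to, verts)

  # Follow parent links until the root of the set is reached.
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  n_comp <- 0L
  out <- integer(length(from))
  for (k in seq_along(from)) {
    # A vertex seen for the first time adds one component.
    for (v in unique(c(a[k], b[k]))) {
      if (!seen[v]) {
        seen[v] <- TRUE
        n_comp <- n_comp + 1L
      }
    }
    # Joining two different sets removes one component.
    ra <- find_root(a[k])
    rb <- find_root(b[k])
    if (ra != rb) {
      parent[ra] <- rb
      n_comp <- n_comp - 1L
    }
    out[k] <- n_comp
  }
  out
}

data <- data.frame(
  name = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D", "E", "B", "C", "C"),
  surname = c("aa", "bb", "cc", "dd", "hh", "ee", "ii", "aa", "qq", "ff", "gg", "ff", "gg", "cc"),
  stringsAsFactors = FALSE
)
data$number_of_components <- count_components_incremental(data$name, data$surname)
data
```

This sketch omits path compression and union by rank, which would matter only for much larger inputs.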

Reshape R dataframe

I have created the following dataframe object in R
df <- data.frame("Attribute" = c("A", "B", "C", "D"), "Name1" = c(1,2,3,4),
"Name2" = c(2,1,2,1), "Name3" = c(1,3,2,4))
names(df)<-c("Attributes", "Name1", "Name2", "Name3")
df
I would like the following output.
names attributes
1 A D B
2 B A C
3 C B
4 D
I have been unable to find a solution for this. I would appreciate your help.

Here is a base R solution using stack and by:
# Sample data
df <- data.frame(
Attribute = c("A", "B", "C", "D"),
Name1 = c(1,2,3,4),
Name2 = c(2,1,2,1),
Name3 = c(1,3,2,4))
df.stacked <- data.frame(stack(df[, -1]), Attribute = df$Attribute)
by(df.stacked, df.stacked$values, function(x) list(unique(x$Attribute)))
#[1] A B D
#Levels: A B C D
#------------------------------------------------------------
#[1] B A C
#Levels: A B C D
#------------------------------------------------------------
#[1] C B
#Levels: A B C D
#------------------------------------------------------------
#[1] D
#Levels: A B C D

Try this
library(dplyr)
library(data.table)
setDT(df)
df2 <- melt(df, id = 1, measure = patterns("Name"), value.name = "names")
df2 %>%
select(-2) %>%
group_by(names) %>%
distinct() %>%
summarise(attributes = paste(Attributes, collapse = " "))
# output
# A tibble: 4 x 2
names attributes
<dbl> <chr>
1 1 A B D
2 2 B A C
3 3 C B
4 4 D

Here is a solution with base R:
df <- data.frame(Attribute=c("A", "B", "C", "D"), Name1=c(1,2,3,4), Name2=c(2,1,2,1), Name3=c(1,3,2,4))
df
A <- df$Attribute
df <- as.matrix(df[-1])
lapply(1:max(df), function(x) A[apply(df==x, 1, any)])
# > lapply(1:max(df), function(x) A[apply(df==x, 1, any)])
# [[1]]
# [1] A B D
# Levels: A B C D
#
# [[2]]
# [1] A B C
# Levels: A B C D
#
# [[3]]
# [1] B C
# Levels: A B C D
#
# [[4]]
# [1] D
# Levels: A B C D
Here is a solution with data.table:
library("data.table")
df <- data.frame(Attribute=c("A", "B", "C", "D"), Name1=c(1,2,3,4), Name2=c(2,1,2,1), Name3=c(1,3,2,4))
df
A <- df$Attribute
df <- setDT(df[-1])
lapply(1:max(as.matrix(df)), function(a) unique(unlist(sapply(df, function(x) A[x==a]))))
# > lapply(1:max(as.matrix(df)), function(a) unique(unlist(sapply(df, function(x) A[x==a]))))
# [[1]]
# [1] A B D
# Levels: A B C D
#
# [[2]]
# [1] B A C
# Levels: A B C D
#
# [[3]]
# [1] C B
# Levels: A B C D
#
# [[4]]
# [1] D
# Levels: A B C D
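If the goal is the two-column table from the question rather than a list, the stacked data can be collapsed with tapply() and paste(). A base R sketch, assuming the question's column name Attributes; long, attrs and result are illustrative names:

```r
df <- data.frame(
  Attributes = c("A", "B", "C", "D"),
  Name1 = c(1, 2, 3, 4),
  Name2 = c(2, 1, 2, 1),
  Name3 = c(1, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Stack the Name columns into one `values` column, one row per cell.
long <- stack(df[-1])
long$Attributes <- rep(df$Attributes, times = ncol(df) - 1)

# For each value, paste together the distinct attributes in order of appearance.
attrs <- tapply(long$Attributes, long$values,
                function(x) paste(unique(x), collapse = " "))

result <- data.frame(
  names = as.numeric(names(attrs)),
  attributes = as.vector(attrs),
  stringsAsFactors = FALSE
)
result
```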

Merge two data.frames with replacement

I have two datasets. The first one is smaller but has more precise data.
I need to join them, but:
1. If a record exists in Data1, I use only the Data1 values.
2. If a record is missing from Data1 but exists in Data2, I use the Data2 values.
Data1 <- data.frame(
X = c(1,4,7,10,13,16),
Y = c("a", "b", "c", "d", "e", "f")
)
Data2 <- data.frame(
X = c(1:10),
Y = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
)
So my data.frame should look like this:
DataJoin <- data.frame(
X = c(1,4,7,10,13,16,7,8,9,10),
Y = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
)
How can I do that?
I've tried merge from the base package and from the data.table package, but I couldn't get the result I want.

There's no join needed. You can reformulate the problem as "add the data found in Data2 and not found in Data1 to Data1". So simply do:
id <- Data2$Y %in% Data1$Y
DataJoin <- rbind(Data1,Data2[!id,])
Gives:
> DataJoin
X Y
1 1 a
2 4 b
3 7 c
4 10 d
5 13 e
6 16 f
7 7 g
8 8 h
9 9 i
10 10 j
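The same idea can be written as a small reusable function. This is a sketch only: prefer_first is a hypothetical name, and it assumes both data frames have the same columns and that key identifies a record.

```r
# Keep every row of `primary`, then append rows of `secondary`
# whose key value does not occur in `primary`.
prefer_first <- function(primary, secondary, key) {
  extra <- secondary[!secondary[[key]] %in% primary[[key]], , drop = FALSE]
  out <- rbind(primary, extra)
  rownames(out) <- NULL
  out
}

Data1 <- data.frame(
  X = c(1, 4, 7, 10, 13, 16),
  Y = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  X = c(1:10),
  Y = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  stringsAsFactors = FALSE
)

DataJoin <- prefer_first(Data1, Data2, key = "Y")
DataJoin
```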

Using data.table:
library(data.table)
d1 <- data.table(Data1, key="Y")[, X := as.integer(X)]
d2 <- data.table(Data2, key="Y")
# copy d2 so that it doesn't get modified by reference
# i.X refers to the column X of the data.table in 'i', i.e. d1's X
ans <- copy(d2)[d1, X := i.X]
ans
X Y
1: 1 a
2: 4 b
3: 7 c
4: 10 d
5: 13 e
6: 16 f
7: 7 g
8: 8 h
9: 9 i
10: 10 j

Using merge from base R, then filling the gaps in Data1's X with Data2's X:
DataJoin <- merge(Data1, Data2, by="Y", all=TRUE)
DataJoin$X.x[is.na(DataJoin$X.x)] <- DataJoin$X.y[is.na(DataJoin$X.x)]
DataJoin[,1:2]
# Y X.x
# 1 a 1
# 2 b 4
# 3 c 7
# 4 d 10
# 5 e 13
# 6 f 16
# 7 g 7
# 8 h 8
# 9 i 9
# 10 j 10