Drop out observations by conditioning on other data in R

I have two data sets, shown below. I want to remove the observations from the first data set whose ID, V1 and user match a row in the second data set.
How should I do that?
ID V1 user V2 V3 V4 ...
1 1 A 10
1 2 B 15
1 3 C 13
2 1 A 11
2 1 B 13
3 1 C 15
3 2 B 20
4 1 D 11
4 2 A 15
4 3 B 11
4 3 C 12
ID V1 user
1 3 C
2 1 B
3 2 B
4 3 C

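This should work (dataframe_1 and dataframe_2 are the first and second data sets):
merged_df <- merge(dataframe_1, cbind(dataframe_2[c("ID", "V1", "user")], matched = TRUE), by = c("ID", "V1", "user"), all.x = TRUE)
result <- merged_df[is.na(merged_df$matched), names(dataframe_1)]
The left merge (all.x = TRUE) keeps every row of the first data set and leaves the matched flag as NA where the second data set has no counterpart; keeping only those rows excludes the observations that matched.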
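Another way to drop the matching rows, sketched here in base R with the question's data, is to build a composite key from the three columns and keep only the rows whose key does not occur in the second data set. The names dataframe_1, dataframe_2, keys and result are assumptions for illustration, and only the V2 column is included because the question elides the others.

```r
# First data set from the question (only V2 is shown there; V3, V4, ... are elided)
dataframe_1 <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4),
  V1   = c(1, 2, 3, 1, 1, 1, 2, 1, 2, 3, 3),
  user = c("A", "B", "C", "A", "B", "C", "B", "D", "A", "B", "C"),
  V2   = c(10, 15, 13, 11, 13, 15, 20, 11, 15, 11, 12),
  stringsAsFactors = FALSE
)

# Second data set: the ID/V1/user combinations to drop
dataframe_2 <- data.frame(
  ID   = c(1, 2, 3, 4),
  V1   = c(3, 1, 2, 3),
  user = c("C", "B", "B", "C"),
  stringsAsFactors = FALSE
)

keys <- c("ID", "V1", "user")

# Paste the key columns into one string per row
key_1 <- do.call(paste, c(dataframe_1[keys], sep = "|"))
key_2 <- do.call(paste, c(dataframe_2[keys], sep = "|"))

# Keep the rows of dataframe_1 whose key is absent from dataframe_2
result <- dataframe_1[!(key_1 %in% key_2), ]
result
```

This keeps the original row order of the first data set. If dplyr is available, anti_join(dataframe_1, dataframe_2, by = keys) expresses the same idea.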

Related

How to add a column with repeating but changing sequence?

I'm trying to add a column with a repeating sequence, but one that changes for each group. In the example data, the group is the id column.
data <- tibble::expand_grid(id = 1:12, condition = c("a", "b", "c"))
data
id condition
1 a
1 b
1 c
2 a
2 b
2 c
3 a
3 b
3 c
... and so on
I'd like to add a column called order to repeat various combinations like 1 2 3 2 3 1 3 1 2 1 3 2 2 1 3 3 2 1 for each id.
In the end, the desired output will look like this
id condition order
1 a 1
1 b 2
1 c 3
2 a 2
2 b 3
2 c 1
3 a 3
3 b 1
3 c 2
... and so on
I'm looking for a simple mutate solution or base R solution. I tried generating a list of combinations but I'm not sure how to create a variable from that.

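You can use perms() from the pracma package to generate all permutations of 1:3; the 18 resulting values are recycled across the 36 rows (the %>% pipe needs dplyr or magrittr to be loaded), e.g.,
data %>%
  cbind(order = c(t(pracma::perms(1:3))))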
which gives
id condition order
1 1 a 3
2 1 b 2
3 1 c 1
4 2 a 3
5 2 b 1
6 2 c 2
7 3 a 2
8 3 b 3
9 3 c 1
10 4 a 2
11 4 b 1
12 4 c 3
13 5 a 1
14 5 b 2
15 5 c 3
16 6 a 1
17 6 b 3
18 6 c 2
19 7 a 3
20 7 b 2
21 7 c 1
22 8 a 3
23 8 b 1
24 8 c 2
25 9 a 2
26 9 b 3
27 9 c 1
28 10 a 2
29 10 b 1
30 10 c 3
31 11 a 1
32 11 b 2
33 11 c 3
34 12 a 1
35 12 b 3
36 12 c 2
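Since the question also asks for a base R solution, here is a minimal sketch that avoids pracma by writing out the six orderings listed in the question and recycling them over the rows. The vector name orders is an assumption, and expand.grid() stands in for tibble::expand_grid() so that the example needs no packages.

```r
# Same layout as the question's data: 12 ids, 3 conditions each
data <- expand.grid(condition = c("a", "b", "c"), id = 1:12,
                    stringsAsFactors = FALSE)[, c("id", "condition")]

# The six orderings from the question, one per id, flattened into a vector
orders <- c(1, 2, 3,  2, 3, 1,  3, 1, 2,
            1, 3, 2,  2, 1, 3,  3, 2, 1)

# Recycle the 18 values over all 36 rows (ids 7 to 12 repeat ids 1 to 6)
data$order <- rep(orders, length.out = nrow(data))
head(data, 9)
```

This relies on the rows being sorted by id with exactly three rows per id, as in the example data.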

How to use join to combine two data frames by two variables and keep different rows with second variable

I want to combine two data frames by both the ID and date variables, keeping all IDs and all dates from both data frames.
Example:
data A:
ID date V1
1 1 a
1 4 b
2 9 d
3 10 e
data B:
ID date X
1 1 24
1 2 30
1 4 15
2 2 40
2 5 10
2 7 12
results:
ID date X V1
1 1 24 a
1 2 30 NA
1 4 15 b
2 2 40 NA
2 5 10 NA
2 7 12 NA
2 9 NA d
3 10 NA e

You could use the following solution:
library(dplyr)
df1 %>%
full_join(df2, by = c("ID", "date")) %>%
arrange(ID, date)
ID date V1 X
1 1 1 a 24
2 1 2 <NA> 30
3 1 4 b 15
4 2 2 <NA> 40
5 2 5 <NA> 10
6 2 7 <NA> 12
7 2 9 d NA
8 3 10 e NA
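For comparison, a base R sketch of the same full join uses merge() with all = TRUE. The data frames df1 and df2 are rebuilt here from the question's data A and data B; the name combined is an assumption.

```r
# data A and data B from the question
df1 <- data.frame(ID = c(1, 1, 2, 3), date = c(1, 4, 9, 10),
                  V1 = c("a", "b", "d", "e"), stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 1, 1, 2, 2, 2), date = c(1, 2, 4, 2, 5, 7),
                  X = c(24, 30, 15, 40, 10, 12))

# all = TRUE keeps unmatched rows from both sides (a full outer join)
combined <- merge(df1, df2, by = c("ID", "date"), all = TRUE)

# Sort explicitly by ID and date, mirroring arrange(ID, date)
combined <- combined[order(combined$ID, combined$date), ]
rownames(combined) <- NULL
combined
```

Rows present in only one input get NA in the columns that come from the other input.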

Find all combinations of one column based on the unique values of another column in a dataframe

Suppose that I have a dataframe
data.frame(v1 = c(1,1,1,2,2,3), v2 = c(6,1,6,3,4,2))
v1 v2
1 1 6
2 1 1
3 1 6
4 2 3
5 2 4
6 3 2
Is there an R function to return the following dataframe, i.e. the combinations of v2 based on the unique values of v1?
data.frame(v1 = rep(1:3, 6), v2 = c(6,3,2, 6,4,2, 1,3,2, 1,4,2, 6,3,2, 6,4,2))
v1 v2
1 1 6
2 2 3
3 3 2
4 1 6
5 2 4
6 3 2
7 1 1
8 2 3
9 3 2
10 1 1
11 2 4
12 3 2
13 1 6
14 2 3
15 3 2
16 1 6
17 2 4
18 3 2
P.S. I don't think my question is a duplicate. Here v2 has duplicated values and the output dataframe has to keep the order (i.e. v1 = c(1,2,3, 1,2,3, ...)). The desired output has 18 rows, but expand.grid gives 36 rows and crossing gives 15 rows.

Try the code below (assuming the data frame from the question is stored as df):
dfout <- data.frame(
  v1 = unique(df$v1),
  v2 = c(t(rev(expand.grid(rev(with(df, split(v2, v1)))))))
)
which gives
> dfout
v1 v2
1 1 6
2 2 3
3 3 2
4 1 6
5 2 4
6 3 2
7 1 1
8 2 3
9 3 2
10 1 1
11 2 4
12 3 2
13 1 6
14 2 3
15 3 2
16 1 6
17 2 4
18 3 2
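The one-liner is dense, so here is a step-by-step sketch of the same idea. The intermediate names groups and grid are assumptions for illustration.

```r
df <- data.frame(v1 = c(1, 1, 1, 2, 2, 3), v2 = c(6, 1, 6, 3, 4, 2))

# One vector of v2 values per unique v1 (duplicates are kept)
groups <- split(df$v2, df$v1)

# expand.grid() varies its first argument fastest, so reverse the groups
# to make the last group vary fastest and the first group slowest
grid <- expand.grid(rev(groups))

# Put the columns back in v1 order: one row per combination
grid <- grid[rev(seq_along(grid))]

# Read the combinations row by row; v1 repeats once per combination
dfout <- data.frame(
  v1 = rep(unique(df$v1), times = nrow(grid)),
  v2 = c(t(grid))
)
dfout
```

There are 3 * 2 * 1 = 6 combinations with 3 values each, which gives the 18 rows the question asks for.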

R cumulative sum based upon other columns

I have a data.frame as below. The data is sorted by column txt and then by column val. The summ column is the sum of the value in the val column and the summ value from the previous row, provided that the current row and the previous row have the same value in the txt column. How could I do this in R?
txt=c(rep("a",4),rep("b",5),rep("c",3))
val=c(1,2,3,4,1,2,3,4,5,1,2,3)
summ=c(1,3,6,10,1,3,6,10,15,1,3,6)
dd=data.frame(txt,val,summ)
> dd
txt val summ
1 a 1 1
2 a 2 3
3 a 3 6
4 a 4 10
5 b 1 1
6 b 2 3
7 b 3 6
8 b 4 10
9 b 5 15
10 c 1 1
11 c 2 3
12 c 3 6

If by "the earlier row" you mean the immediately preceding row, which is what your expected output implies, then what you're describing is a cumulative sum. You can apply cumsum() separately to each group of txt with ave():
dd <- data.frame(txt = c(rep("a", 4), rep("b", 5), rep("c", 3)), val = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3))
dd$summ <- ave(dd$val, dd$txt, FUN = cumsum)
dd
## txt val summ
## 1 a 1 1
## 2 a 2 3
## 3 a 3 6
## 4 a 4 10
## 5 b 1 1
## 6 b 2 3
## 7 b 3 6
## 8 b 4 10
## 9 b 5 15
## 10 c 1 1
## 11 c 2 3
## 12 c 3 6
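As a quick check, the sketch below compares the ave() result with the summ values given in the question, and then shows that ave() also works when the groups are interleaved rather than sorted. The names expected and mixed are assumptions for illustration.

```r
dd <- data.frame(
  txt = c(rep("a", 4), rep("b", 5), rep("c", 3)),
  val = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3),
  stringsAsFactors = FALSE
)

# The summ column as written in the question
expected <- c(1, 3, 6, 10, 1, 3, 6, 10, 15, 1, 3, 6)

# Running total of val, restarted for every value of txt
dd$summ <- ave(dd$val, dd$txt, FUN = cumsum)
all(dd$summ == expected)

# ave() returns results in the original row positions,
# so the groups do not have to be contiguous
mixed <- data.frame(txt = c("a", "b", "a", "b"), val = c(1, 10, 2, 20),
                    stringsAsFactors = FALSE)
mixed$summ <- ave(mixed$val, mixed$txt, FUN = cumsum)
mixed
```

Within each group the running total still follows row order, so sort by val first if that order matters.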

Repeating sets of rows according to the number of rows by column in R with data.table

In R, I am trying to do the following with a data.table.
Suppose my data looks like:
Class Person ID Index
A 1 3
A 2 3
A 5 3
B 7 2
B 12 2
C 18 1
D 25 2
D 44 2
Here, the class refers to the class a person belongs to. The Person ID variable represents a unique identifier of a person. Finally, the Index tells us how many people are in each class.
From this, I would like to create a new data table as so:
Class Person ID Index
A 1 3
A 2 3
A 5 3
A 1 3
A 2 3
A 5 3
A 1 3
A 2 3
A 5 3
B 7 2
B 12 2
B 7 2
B 12 2
C 18 1
D 25 2
D 44 2
D 25 2
D 44 2
where we repeated each set of persons by class based on the index variable. Hence, we would repeat the class A by 3 times because the index says 3.
So far, my code looks like:
setDT(data)[, list(Class = rep(Person ID, seq_len(.N)), Person ID = sequence(seq_len(.N)), by = Index]
However, I am not getting the correct result and I feel like there is a simpler way to do this. Would anyone have any ideas? Thank you!

If that particular order is important to you, then perhaps something like this should work:
setDT(data)[, list(PersonID, sequence(rep(.N, Index))), by = list(Class, Index)]
# Class Index PersonID V2
# 1: A 3 1 1
# 2: A 3 2 2
# 3: A 3 5 3
# 4: A 3 1 1
# 5: A 3 2 2
# 6: A 3 5 3
# 7: A 3 1 1
# 8: A 3 2 2
# 9: A 3 5 3
# 10: B 2 7 1
# 11: B 2 12 2
# 12: B 2 7 1
# 13: B 2 12 2
# 14: C 1 18 1
# 15: D 2 25 1
# 16: D 2 44 2
# 17: D 2 25 1
# 18: D 2 44 2
If the order is not important, perhaps:
setDT(data)[rep(1:nrow(data), Index)]
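For readers without data.table, a base R sketch of the ordered version repeats each class's block of row numbers Index times and then subsets once. The column is named PersonID here (the question writes "Person ID"), and block_rows and result are assumed names.

```r
data <- data.frame(
  Class    = c("A", "A", "A", "B", "B", "C", "D", "D"),
  PersonID = c(1, 2, 5, 7, 12, 18, 25, 44),
  Index    = c(3, 3, 3, 2, 2, 1, 2, 2),
  stringsAsFactors = FALSE
)

# For each class, repeat its whole block of row numbers Index times
block_rows <- lapply(
  split(seq_len(nrow(data)), data$Class),
  function(rows) rep(rows, times = data$Index[rows[1]])
)

# Subset once with the combined row numbers, then reset the row names
result <- data[unlist(block_rows), ]
rownames(result) <- NULL
result
```

This assumes Index is constant within each class, as in the example data.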

Here is a way using dplyr in case you wanted to try
library(dplyr)
data %>%
group_by(Class) %>%
do(data.frame(.[sequence(.$Index[row(.)[,1]]),]))
which gives the output
# Class Person.ID Index
#1 A 1 3
#2 A 2 3
#3 A 5 3
#4 A 1 3
#5 A 2 3
#6 A 5 3
#7 A 1 3
#8 A 2 3
#9 A 5 3
#10 B 7 2
#11 B 12 2
#12 B 7 2
#13 B 12 2
#14 C 18 1
#15 D 25 2
#16 D 44 2
#17 D 25 2
#18 D 44 2
