Filter Dataframe by second dataframe [duplicate] - r

This question already has answers here:
Subsetting a data frame to the rows not appearing in another data frame
(5 answers)
Closed 6 years ago.
I have two dataframes.
selectedcustomersa is a dataframe with information about 50 customers. The first column is the name (Group.1).
selectedcustomersb is another dataframe (same structure) with information about 2000 customers, including the customers from selectedcustomersa.
I want selectedcustomersb without the customers from selectedcustomersa.
I tried:
newselectedcustomersb<-filter(selectedcustomersb, Group.1!=selectedcustomersa$Group.1)

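One way to do this is to use anti_join() from dplyr, which returns the rows of its first argument that have no match in the second. It also works when matching on several columns.
library(dplyr)
df1 <- data.frame(x = c('a', 'b', 'c', 'd'), y = 1:4)
df2 <- data.frame(x = c('c', 'd', 'e', 'f'), z = 1:4)
# Keep the rows of df2 whose x does not appear in df1
df <- anti_join(df2, df1, by = 'x')
df
#   x z
# 1 e 3
# 2 f 4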

Try:
newselectedcustomersb <- filter(selectedcustomersb, !(Group.1 %in% selectedcustomersa$Group.1))
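The original attempt fails because `!=` compares the two vectors element by element, recycling the shorter one, instead of testing membership. A minimal base R sketch of the difference, using made-up names (the vectors `a` and `b` are illustrative stand-ins for the two Group.1 columns, not data from the question):

```r
# Hypothetical stand-ins for the two Group.1 columns
a <- c("Ann", "Bob")
b <- c("Bob", "Ann", "Cid", "Dee")

# Element-wise comparison: `a` is recycled to Ann, Bob, Ann, Bob,
# so every position happens to differ and nothing is filtered out
elementwise <- b != a

# Membership test: TRUE only for names in `b` that never occur in `a`
not_in_a <- !(b %in% a)

b[not_in_a]
# [1] "Cid" "Dee"
```

With `%in%` the position of a name in either vector no longer matters, which is what the filter needs.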

Related

R- How to rearrange rows in a data frame with a foreign key from another data frame

I'm having a bit of trouble figuring out how to rearrange rows in a data frame in R.
I have two data frames whose rows are in different orders, and both have an ID that identifies the tuples.
Now I would like to reorder data frame 1 (ID 1) so that it is in the same order as data frame 2 (ID 2).
Many thanks in advance.
Create a column of ascending integers in data frame 2 to encode the ordering. Then merge that column to data frame 1 and sort on it.
library(dplyr)
df1 <- tibble(
  id = c(1, 2, 3),
  col1 = c('a', 'b', 'c')
)
df2 <- tibble(
  id = c(3, 1, 2),
  col2 = c('c', 'a', 'b')
)
# Record the row order of df2 as an explicit column
df2$ordering <- seq_len(nrow(df2))
# Bring that column over to df1 and sort by it
df1_ordered <- df1 %>%
  left_join(df2, by = 'id') %>%
  arrange(ordering)
We can use match to look up the IDs and then reorder df1 based on the result. Using @Chris's data:
df1[match(df2$id, df1$id),]
# id col1
# <dbl> <chr>
#1 3 c
#2 1 a
#3 2 b
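To see why the match approach works, it may help to look at the intermediate index vector. The following is a small sketch using plain data frames with the same values as above; the names `pos` and `df1_ordered` are introduced here for illustration only:

```r
df1 <- data.frame(id = c(1, 2, 3), col1 = c("a", "b", "c"))
df2 <- data.frame(id = c(3, 1, 2), col2 = c("c", "a", "b"))

# For each id in df2, the row position of that id in df1
pos <- match(df2$id, df1$id)
pos
# [1] 3 1 2

# Index df1 by those positions so its ids follow df2's order
df1_ordered <- df1[pos, ]
identical(df1_ordered$id, df2$id)
# [1] TRUE
```

Note that an id present in df2 but missing from df1 gives NA in `pos`, which produces a row of NAs in the result, so it is worth checking for that case first.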

Convert simple data.table to named vector [duplicate]

This question already has answers here:
Split data.frame based on levels of a factor into new data.frames
(3 answers)
Closed 3 years ago.
I need to convert a simple data.table to a named vector.
Let's say I have this data.table:
a <- data.table(v1 = c('a', 'b', 'c'), v2 = c(1,2,3))
and I want to get the following named vector
b <- c(1, 2, 3)
names(b) <- c('a', 'b', 'c')
Is there a simple way to do this?
Using setNames() in j:
a[, setNames(v2, v1)]
# a b c
# 1 2 3
We can use split and unlist to get it as named vector.
unlist(split(a$v2, a$v1))
#a b c
#1 2 3
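The two answers agree on the example above, but they can differ when the names are not already sorted. A short base R sketch, using a plain data frame with deliberately unsorted names (the objects `by_setnames` and `by_split` are illustrative):

```r
a <- data.frame(v1 = c("c", "a", "b"), v2 = c(1, 2, 3))

# setNames() keeps the original row order
by_setnames <- setNames(a$v2, a$v1)
by_setnames
# c a b
# 1 2 3

# split() groups by the sorted values of v1, so the order changes
by_split <- unlist(split(a$v2, a$v1))
by_split
# a b c
# 2 3 1
```

If v1 contains repeated values, split() also puts several values into one group and unlist() then appends numeric suffixes to the names, so setNames() is the closer match to what the question asks for.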

How to expand a dataframe base on values in a column [duplicate]

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 3 years ago.
I have multiple values in certain rows within a column in a dataframe. I would like a dataframe with a new row for each value wherever a row contains multiple values in that column. I have gotten the values separated but am not certain how to go forward. Any thoughts?
Here is an example:
## input
tibble(
code = c(
85310,
47730,
61900,
93110,
"56210,\r\n70229",
"93110,\r\n93130,\r\n93290"),
vary2 = LETTERS[1:6])
## desired output
tibble(
code = c(85310, 47730, 61900, 93110, 56210, 70229,
93110, 93130, 93290),
vary2 = c('A', 'B', 'C', 'D', 'E', 'E', 'F', 'F', 'F')
)
## one unsuccessful approach
tibble(
code = c(
85310,
47730,
61900,
93110,
"56210,\r\n70229",
"93110,\r\n93130,\r\n93290"),
vary2 = LETTERS[1:6]) %>%
separate(col = 'code', into = LETTERS[1:3], sep = ',\\r\\n')
We can use separate_rows from tidyr (here df1 is assumed to hold the input tibble from the question):
library(tidyverse)
df1 %>%
separate_rows(code, sep="[,\r\n]+")
# A tibble: 9 x 2
# code vary2
# <chr> <chr>
#1 85310 A
#2 47730 B
#3 61900 C
#4 93110 D
#5 56210 E
#6 70229 E
#7 93110 F
#8 93130 F
#9 93290 F
As @KerryJackson mentioned in the comments, if we don't specify sep, separate_rows splits on any run of non-alphanumeric characters by default, so all the delimiters are picked up automatically. If we want to split on particular delimiters only, it is better to pass sep explicitly.
df1 %>%
separate_rows(code)
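For readers without tidyr, the same reshaping can be sketched in base R with strsplit. This is not the answer's method, only an equivalent under the assumption that the codes are separated by commas and line breaks as in the question; `df1`, `parts`, and `long` are illustrative names, and the data is a shortened version of the example:

```r
df1 <- data.frame(
  code  = c("85310", "56210,\r\n70229", "93110,\r\n93130,\r\n93290"),
  vary2 = c("A", "E", "F"),
  stringsAsFactors = FALSE
)

# Split each cell on runs of commas, carriage returns, and newlines
parts <- strsplit(df1$code, "[,\r\n]+")

# Repeat each vary2 value once per code extracted from its row
long <- data.frame(
  code  = as.numeric(unlist(parts)),
  vary2 = rep(df1$vary2, lengths(parts)),
  stringsAsFactors = FALSE
)
```

Here the codes are converted to numeric explicitly, whereas the separate_rows output shown above keeps them as character.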

adding unique rows from one data frame to another

I have a data frame which comprises a subset of records contained in a 2nd data frame. I would like to add the record rows of the 2nd data frame that are not common in the first data frame to the first... Thank you.
If you want all unique rows from both dataframes, this would work:
df1 <- data.frame(X = c('A','B','C'), Y = c(1,2,3))
df2 <- data.frame(X = 'A', Y = 1)
df <- rbind(df1,df2)
no.dupes <- df[!duplicated(df),]
no.dupes
# X Y
#1 A 1
#2 B 2
#3 C 3
But it won't work if there are duplicate rows in either dataframe that you want to preserve.
You should look at dplyr's distinct() and bind_rows() functions.
Better still, provide some dummy data to work on and the expected output.
Suppose you have two dataframes a and b, and you want to append the unique rows of a to b:
a = data.frame(
x = c(1,2,3,1,4,3),
y = c(5,2,3,5,3,3)
)
b = data.frame(
x = c(6,2,2,3,3),
y = c(19,13,12,3,1)
)
library(dplyr)
distinct(a) %>% bind_rows(.,b)
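Neither answer above does exactly what the question describes, which is appending only those rows of the second data frame that are absent from the first. One possible base R sketch, under the assumption that rows count as equal when all their columns match (the data and the names `key1`, `key2`, and `combined` are made up for illustration):

```r
df1 <- data.frame(X = c("A", "B", "B"), Y = c(1, 2, 2))
df2 <- data.frame(X = c("A", "C"), Y = c(1, 3))

# Build one key per row; assumes "\r" never occurs in the data itself
key1 <- do.call(paste, c(df1, sep = "\r"))
key2 <- do.call(paste, c(df2, sep = "\r"))

# Keep df1 as it is (duplicates included) and append only the
# df2 rows whose key does not appear in df1
combined <- rbind(df1, df2[!(key2 %in% key1), ])
```

With dplyr loaded, `bind_rows(df1, anti_join(df2, df1))` should express the same idea, since anti_join matches on all shared columns when no `by` is given.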

R: Match a dataframe with 3 others, and create a column

I have a big data frame (df) with a variable id, and 3 other data frames (df1, df2, df3) that each contain some values of this id. For example, the big data frame has id 1:100, and df1 might have 1, 2, 4, 11, etc.
What I need to do is add a column to the big data frame that says which of the smaller data frames each row's data came from.
df$new[df$id %in% df1$id] <- 1
df$new[df$id %in% df2$id] <- 2
df$new[df$id %in% df3$id] <- 3
df$new<- factor(df$new, labels = c('a', 'b', 'c'))
This is my solution, but I don't really like it. Any other ideas?
We can use a nested ifelse:
# ids found in none of df1, df2, df3 keep their own id (coerced to character)
with(df, ifelse(id %in% df1$id, 'a',
         ifelse(id %in% df2$id, 'b',
         ifelse(id %in% df3$id, 'c', id))))
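One difference between the two approaches is worth knowing: if an id occurs in more than one of the small data frames, the question's sequential assignments let the last one (df3) win, while the nested ifelse lets the first one (df1) win. A small sketch with made-up data that reproduces the ifelse priority using plain assignments and leaves unmatched ids as NA (an assumption, since the question does not say what should happen to them):

```r
df  <- data.frame(id = 1:6)
df1 <- data.frame(id = c(1, 2))
df2 <- data.frame(id = c(2, 3))
df3 <- data.frame(id = c(4))

# Start with NA so ids found in none of the sources stay missing
df$new <- NA_character_

# Assign in reverse priority order: later assignments overwrite
# earlier ones, so df1 wins for id 2, as in the nested ifelse
df$new[df$id %in% df3$id] <- "c"
df$new[df$id %in% df2$id] <- "b"
df$new[df$id %in% df1$id] <- "a"

df$new
# [1] "a" "a" "b" "c" NA  NA
```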
