pivot_longer with separate [duplicate] - r

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 2 years ago.
Suppose I have this data frame:
df <- data.frame(ids=c('1,2','3,4'), vals=c('a', 'b'))
and I want to end up with this one:
data.frame(ids=c('1', '2', '3', '4'), vals=c('a', 'a', 'b', 'b'))
In words: one separate row for each value in the comma-separated lists in ids, with the associated vals duplicated.
I'd like to use the tidyverse. I'm pretty sure I should use pivot_longer, maybe with names_sep, but after reading and fiddling it's not obvious to me.
Help?

We can use separate_rows instead of pivot_longer:
library(tidyr)
df %>%
  separate_rows(ids)
# A tibble: 4 x 2
# ids vals
# <chr> <chr>
#1 1 a
#2 2 a
#3 3 b
#4 4 b
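For comparison, the same reshaping can be sketched in base R without tidyr. This is only an illustration of what separate_rows does in this case, assuming a plain comma delimiter; the names parts and long are made up for the example.

```r
# Input from the question (stringsAsFactors = FALSE keeps the columns as character)
df <- data.frame(ids = c("1,2", "3,4"), vals = c("a", "b"),
                 stringsAsFactors = FALSE)

# Split each ids string on the literal comma; the result is a list of character vectors
parts <- strsplit(df$ids, ",", fixed = TRUE)

# One row per piece, repeating vals once for every piece it belongs to
long <- data.frame(ids = unlist(parts),
                   vals = rep(df$vals, lengths(parts)),
                   stringsAsFactors = FALSE)
long
```

The rep(..., lengths(parts)) step is what duplicates the associated vals, which is the part pivot_longer does not handle on its own.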

Related

Delete rows based on values in R [duplicate]

This question already has answers here:
Subset data frame based on multiple conditions [duplicate]
(3 answers)
How to combine multiple conditions to subset a data-frame using "OR"?
(5 answers)
Closed 2 years ago.
Is there a way to delete rows based on their values? For example:
df
ColA ColB
A 1
B 2
A 3
Expected output. I know we can delete rows by row number, but is there a way to delete them based on values, such as ("A", 3)?
df
ColA ColB
A 1
B 2
You can use subset from base R
> subset(df,!(ColA=="A"&ColB==3))
ColA ColB
1 A 1
2 B 2
or a data.table solution
> setDT(df)[!.("A",3),on = .(ColA,ColB)]
ColA ColB
1: A 1
2: B 2
An option with filter
library(dplyr)
df %>%
  filter(!(ColA == "A" & ColB == 3))
Another way to do this is to use the which() function (see ?which). It returns the indices of the rows that meet a condition, and putting a minus sign in front of those indices drops the matching rows.
df <- data.frame(ColA = c("A", "B", "A"), ColB = c(1, 2, 3))
# drop the rows where ColA is "A" and ColB is 3
df <- df[-which(df$ColA == "A" & df$ColB == 3), ]
df
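One caveat with negative which() indexing is worth a small sketch: when no row matches, which() returns an empty index and the subset comes back with zero rows instead of the whole data frame. The condition below is made up so that nothing matches; logical negation is shown next to it for comparison.

```r
df <- data.frame(ColA = c("A", "B", "A"), ColB = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# A condition that matches no row at all
no_match <- df$ColA == "Z" & df$ColB == 99

# -which(no_match) is an empty index, so every row is dropped
via_which <- df[-which(no_match), ]

# Negating the logical vector keeps all rows, as intended
via_logical <- df[!no_match, ]

nrow(via_which)    # 0
nrow(via_logical)  # 3
```

This is why the subset() and filter() answers, which negate the condition directly, tend to be the safer choice.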

Convert simple data.table to named vector [duplicate]

This question already has answers here:
Split data.frame based on levels of a factor into new data.frames
(3 answers)
Closed 3 years ago.
I need to convert a simple data.table to a named vector.
Let's say I have this data.table:
a <- data.table(v1 = c('a', 'b', 'c'), v2 = c(1,2,3))
and I want to get the following named vector
b <- c(1, 2, 3)
names(b) <- c('a', 'b', 'c')
Is there a simple way to do it?
Using setNames() in j:
a[, setNames(v2, v1)]
# a b c
# 1 2 3
We can use split and unlist to get it as named vector.
unlist(split(a$v2, a$v1))
#a b c
#1 2 3
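The two approaches differ in one respect that the example hides because its names are already sorted. A small sketch, using a plain data.frame and a deliberately unsorted v1 (both are assumptions made for illustration):

```r
a <- data.frame(v1 = c("c", "a", "b"), v2 = c(1, 2, 3),
                stringsAsFactors = FALSE)

# setNames keeps the original row order
by_setnames <- setNames(a$v2, a$v1)

# split groups by the sorted values of v1, so the result is reordered by name
by_split <- unlist(split(a$v2, a$v1))

names(by_setnames)  # "c" "a" "b"
names(by_split)     # "a" "b" "c"
```

If the row order matters, setNames is the more direct choice.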

How to expand a dataframe base on values in a column [duplicate]

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 3 years ago.
I have multiple values in certain rows within a column in a dataframe. I would like to have a dataframe with a new row for each value in any row that contains multiple values in that column. I have gotten the values separated but am not certain how to go forward. Any thoughts?
Here is an example:
## input
tibble(
  code = c(
    85310,
    47730,
    61900,
    93110,
    "56210,\r\n70229",
    "93110,\r\n93130,\r\n93290"),
  vary2 = LETTERS[1:6])
## desired output
tibble(
  code = c(85310, 47730, 61900, 93110, 56210, 70229,
           93110, 93130, 93290),
  vary2 = c('A', 'B', 'C', 'D', 'E', 'E', 'F', 'F', 'F')
)
## one unsuccessful approach
tibble(
  code = c(
    85310,
    47730,
    61900,
    93110,
    "56210,\r\n70229",
    "93110,\r\n93130,\r\n93290"),
  vary2 = LETTERS[1:6]) %>%
  separate(col = 'code', into = LETTERS[1:3], sep = ',\\r\\n')
We can use separate_rows (df1 here is the input tibble from the question):
library(tidyverse)
df1 %>%
  separate_rows(code, sep="[,\r\n]+")
# A tibble: 9 x 2
# code vary2
# <chr> <chr>
#1 85310 A
#2 47730 B
#3 61900 C
#4 93110 D
#5 56210 E
#6 70229 E
#7 93110 F
#8 93130 F
#9 93290 F
As @KerryJackson mentioned in the comments, if we don't specify sep, the default pattern matches any run of non-alphanumeric characters (periods excepted), so all of these delimiters are picked up automatically. To limit the split to a particular delimiter, it is better to specify sep.
df1 %>%
  separate_rows(code)
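To make the splitting step concrete, here is a rough base R sketch of the same idea, using the regular expression from the answer. It assumes the code column is a character vector, as it is in the question's tibble; the names codes, parts and out are invented for the example.

```r
# The question's data, with code stored as character
codes <- c("85310", "47730", "61900", "93110",
           "56210,\r\n70229", "93110,\r\n93130,\r\n93290")
vary2 <- LETTERS[1:6]

# Split on any run of commas, carriage returns and newlines
parts <- strsplit(codes, "[,\r\n]+")

# One row per code; vary2 is repeated once for each code in its row
out <- data.frame(code = as.numeric(unlist(parts)),
                  vary2 = rep(vary2, lengths(parts)),
                  stringsAsFactors = FALSE)
out
```

Unlike separate_rows with its default convert = FALSE, this sketch converts code to numeric explicitly, which matches the desired output in the question.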

Selecting rows from a data frame from combinations of lists given by another dataframe [duplicate]

This question already has answers here:
Selecting rows from a data frame from combinations of lists [duplicate]
(2 answers)
Closed 5 years ago.
I have a dataframe, dat:
dat <- data.frame(col1 = rep(1:4, 3),
                  col2 = rep(letters[24:26], 4),
                  col3 = letters[1:12])
I want to filter dat on two different columns using ONLY the combinations given by the rows in the data frame filter:
filter<-data.frame(col1=1:3,col2=NA)
lists<-list(list("x","y"),list("y","z"),list("x","z"))
filter$col2<-lists
So for example, rows containing (1,x) and (1,y), would be selected, but not (1,z),(2,x), or (3,y).
I know how I would do it using a for loop:
# create a frame to drop results in
results <- dat[0, ]
for (f in 1:nrow(filter)) {
  temp_filter <- filter[f, ]
  temp_dat <- dat[dat$col1 == temp_filter[1, 1] &
                    dat$col2 %in% unlist(temp_filter[1, 2]), ]
  results <- rbind(results, temp_dat)
}
Or if you prefer dplyr style:
library(dplyr)
results <- dat[0, ]
for (f in 1:nrow(filter)) {
  temp_filter <- filter[f, ]
  temp_dat <- filter(dat, col1 == temp_filter[1, 1] &
                       col2 %in% unlist(temp_filter[1, 2]))
  results <- rbind(results, temp_dat)
}
results should return
col1 col2 col3
1 1 x a
5 1 y e
2 2 y b
6 2 z f
3 3 z c
7 3 x g
I would normally do the filtering using a merge, but I can't now since I have to check col2 against a list rather than a single value. The for loop works but I figured there would be a more efficient way to do this, probably using some variation of apply or do.call.
A solution using the tidyverse; dat2 is the final output. The idea is to extract the values from the list column of the filter data frame and reshape it into filter2, a long data frame whose col1 and col2 columns hold the same kind of values as those in dat. Finally, use semi_join to keep only the rows of dat that match filter2.
By the way, filter is a pre-defined function in the dplyr package. Since your example uses dplyr, it is better to avoid naming a data frame filter.
library(tidyverse)
filter2 <- filter %>%
  mutate(col2_a = map_chr(col2, 1),
         col2_b = map_chr(col2, 2)) %>%
  select(-col2) %>%
  gather(group, col2, -col1)
dat2 <- dat %>%
  semi_join(filter2, by = c("col1", "col2")) %>%
  arrange(col1)
dat2
col1 col2 col3
1 1 x a
2 1 y e
3 2 y b
4 2 z f
5 3 z c
6 3 x g
Update
Another way to prepare the filter2 data frame, which does not require knowing how many elements are in each list. The rest is the same as the previous solution.
library(tidyverse)
filter2 <- filter %>%
  rowwise() %>%
  # tibble() replaces the deprecated data_frame()
  do(tibble(col1 = .$col1, col2 = flatten_chr(.$col2)))
dat2 <- dat %>%
  semi_join(filter2, by = c("col1", "col2")) %>%
  arrange(col1)
This is doable with a straight-forward join once you get the filter list back to a standard data.frame:
merge(
  dat,
  with(filter, data.frame(col1=rep(col1, lengths(col2)), col2=unlist(col2)))
)
# col1 col2 col3
#1 1 x a
#2 1 y e
#3 2 y b
#4 2 z f
#5 3 x g
#6 3 z c
Arguably, I'd do away with whatever process is creating those nested lists in the first place.
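If the original row order and row names of dat should be preserved (merge sorts by the join columns), the flattened filter can also be used as a lookup key. The following is only a sketch under that goal: the data frame is called flt to avoid clashing with dplyr's filter, and the pasted key assumes that col1 and col2 values contain no spaces that could make two different pairs look identical.

```r
dat <- data.frame(col1 = rep(1:4, 3),
                  col2 = rep(letters[24:26], 4),
                  col3 = letters[1:12],
                  stringsAsFactors = FALSE)

# Filter table with a list column, as in the question (named flt here)
flt <- data.frame(col1 = 1:3)
flt$col2 <- list(list("x", "y"), list("y", "z"), list("x", "z"))

# Flatten the list column into one "col1 col2" key per allowed combination
allowed <- paste(rep(flt$col1, lengths(flt$col2)), unlist(flt$col2))

# Keep the rows of dat whose own key appears among the allowed keys
res <- dat[paste(dat$col1, dat$col2) %in% allowed, ]
res
```

The result contains the same six rows as the loop in the question, in the order they appear in dat.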

Filter Dataframe by second dataframe [duplicate]

This question already has answers here:
Subsetting a data frame to the rows not appearing in another data frame
(5 answers)
Closed 6 years ago.
I have two dataframes.
selectedcustomersa is a dataframe with information about 50 customers. The first column is the name (Group.1).
selectedcustomersb is another dataframe (same structure) with information about 2000 customers, including the customers from selectedcustomersa.
I want selectedcustomersb without the customers from selectedcustomersa.
I tried:
newselectedcustomersb<-filter(selectedcustomersb, Group.1!=selectedcustomersa$Group.1)
One way to do this is to use anti_join from dplyr, as follows. By default it matches on all columns the two data frames share (here only x), so it also works across multiple columns.
library(dplyr)
df1 <- data.frame(x = c('a', 'b', 'c', 'd'), y = 1:4)
df2 <- data.frame(x = c('c', 'd', 'e', 'f'), z = 1:4)
df <- anti_join(df2, df1)
df
x z
1 e 3
2 f 4
Try:
newselectedcustomersb <- filter(selectedcustomersb, !(Group.1 %in% selectedcustomersa$Group.1))
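A short base R sketch may help show why the != attempt in the question does not work while %in% does. The customer names and the shortened data frame names a and b are invented for illustration.

```r
# b plays the role of selectedcustomersb, a of selectedcustomersa
b <- data.frame(Group.1 = c("ann", "bob", "cat", "dan"),
                stringsAsFactors = FALSE)
a <- data.frame(Group.1 = c("bob", "dan"),
                stringsAsFactors = FALSE)

# != compares position by position, recycling the shorter vector:
# ann vs bob, bob vs dan, cat vs bob, dan vs dan
wrong <- b[b$Group.1 != a$Group.1, , drop = FALSE]

# %in% tests each name for membership in the whole of a$Group.1
right <- b[!(b$Group.1 %in% a$Group.1), , drop = FALSE]

wrong$Group.1  # "ann" "bob" "cat"
right$Group.1  # "ann" "cat"
```

Here bob survives the != version only because he happens to be compared against dan, which is the kind of silent error that recycling produces.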
