How to expand a dataframe based on values in a column [duplicate] - r

Some rows in one column of my dataframe contain multiple values. I would like a dataframe with a separate row for each of those values, with the other columns repeated. I have gotten the values separated but am not certain how to go forward. Any thoughts?
Here is an example:
## input
tibble(
code = c(
85310,
47730,
61900,
93110,
"56210,\r\n70229",
"93110,\r\n93130,\r\n93290"),
vary2 = LETTERS[1:6])
## desired output
tibble(
code = c(85310, 47730, 61900, 93110, 56210, 70229,
93110, 93130, 93290),
vary2 = c('A', 'B', 'C', 'D', 'E', 'E', 'F', 'F', 'F')
)
## one unsuccessful approach
tibble(
code = c(
85310,
47730,
61900,
93110,
"56210,\r\n70229",
"93110,\r\n93130,\r\n93290"),
vary2 = LETTERS[1:6]) %>%
separate(col = 'code', into = LETTERS[1:3], sep = ',\\r\\n')

We can use separate_rows, taking df1 to be the input tibble from the question:
library(tidyverse)
df1 %>%
separate_rows(code, sep="[,\r\n]+")
# A tibble: 9 x 2
# code vary2
# <chr> <chr>
#1 85310 A
#2 47730 B
#3 61900 C
#4 93110 D
#5 56210 E
#6 70229 E
#7 93110 F
#8 93130 F
#9 93290 F
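The result above keeps code as a character column, whereas the desired output in the question is numeric. A minimal sketch of one way to close that gap, assuming the convert argument of separate_rows is acceptable here; the shortened df1 below is an illustrative stand-in for the question's data, not the original tibble:

```r
library(tidyr)

# Illustrative stand-in for the question's input (fewer rows, same shape).
df1 <- data.frame(
  code = c("85310", "47730", "56210,\r\n70229", "93110,\r\n93130,\r\n93290"),
  vary2 = LETTERS[1:4],
  stringsAsFactors = FALSE
)

# convert = TRUE runs type conversion on the split column,
# so the codes come back as numbers instead of strings.
out <- separate_rows(df1, code, sep = "[,\r\n]+", convert = TRUE)

str(out$code)
```

If the codes can have leading zeros, converting to numbers would drop them, so leaving convert at its default may be the safer choice in that case.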
As @KerryJackson mentioned in the comments, if we don't specify sep, separate_rows uses its default pattern, which splits on any run of non-alphanumeric characters, so all of these delimiters are picked up automatically. To limit the split to one particular delimiter, it is better to pass sep explicitly.
df1 %>%
separate_rows(code)
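The difference between the default and an explicit sep matters as soon as the values themselves contain spaces or punctuation. A small sketch with made-up data (the cities data frame is hypothetical, not from the question):

```r
library(tidyr)

# Hypothetical data: values contain spaces that should NOT act as delimiters.
cities <- data.frame(
  city = c("New York,Los Angeles", "Boston"),
  id = c(1, 2),
  stringsAsFactors = FALSE
)

# Default sep splits on any run of non-alphanumeric characters,
# so the spaces inside the city names are treated as delimiters too.
default_split <- separate_rows(cities, city)

# Explicit sep restricts the split to commas only.
comma_split <- separate_rows(cities, city, sep = ",")

nrow(default_split)  # 5: New, York, Los, Angeles, Boston
nrow(comma_split)    # 3: New York, Los Angeles, Boston
```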

Related

pivot_longer with separate [duplicate]

Suppose I have this data frame:
df <- data.frame(ids=c('1,2','3,4'), vals=c('a', 'b'))
and I want to end up with this one:
data.frame(ids=c('1', '2', '3', '4'), vals=c('a', 'a', 'b', 'b'))
In words: one separate row for each value in the comma-separated lists in ids, with the associated vals duplicated.
I'd like to use the tidyverse. I'm pretty sure I should use pivot_longer, maybe with names_sep, but after reading and fiddling it's not obvious to me.
Help?
We can use separate_rows instead of pivot_longer
library(tidyr)
df %>%
separate_rows(ids)
# A tibble: 4 x 2
# ids vals
# <chr> <chr>
#1 1 a
#2 2 a
#3 3 b
#4 4 b
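For comparison, the same reshaping can be sketched in base R. This is only meant to make the operation explicit (split each string, then repeat the other column once per piece); it is not a claim about how tidyr implements separate_rows:

```r
df <- data.frame(ids = c("1,2", "3,4"), vals = c("a", "b"),
                 stringsAsFactors = FALSE)

# Split each comma-separated string into its pieces (a list of vectors).
parts <- strsplit(df$ids, ",", fixed = TRUE)

# One output row per piece; vals is repeated once for every piece of its row.
long <- data.frame(
  ids = unlist(parts),
  vals = rep(df$vals, lengths(parts)),
  stringsAsFactors = FALSE
)

long
```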

R- How to rearrange rows in a data frame with a foreign key from another data frame

I'm having a bit of trouble trying to figure out how to rearrange rows in a data frame in R.
I have two data frames whose rows are in different orders. Both have an ID column that identifies the rows (tuples).
Now I would like to reorder data frame 1 (ID 1) so that it is in the same order as data frame 2 (ID2).
Many thanks in advance.
Create a column of ascending integers in data frame 2 to encode the ordering. Then merge that column to data frame 1 and sort on it.
library(dplyr)
df1 <- tibble(
id = c(1, 2, 3),
col1 = c('a', 'b', 'c')
)
df2 <- tibble(
id = c(3, 1, 2),
col2 = c('c', 'a', 'b')
)
df2$ordering <- seq_len(nrow(df2))
df1_ordered <- df1 %>%
left_join(df2, by = 'id') %>%
arrange(ordering)
We can use match to find the position of each df2 ID in df1 and then reorder df1 by those positions. Using @Chris' data:
df1[match(df2$id, df1$id),]
# id col1
# <dbl> <chr>
#1 3 c
#2 1 a
#3 2 b
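One caveat with the match approach: if df2 contains an ID that df1 lacks, match returns NA for it, and indexing with NA produces a row of NAs. A minimal sketch of guarding against that, using made-up data frames in which ID 4 exists only in df2:

```r
df1 <- data.frame(id = c(1, 2, 3), col1 = c("a", "b", "c"),
                  stringsAsFactors = FALSE)
# Hypothetical second data frame with an extra ID (4) that df1 does not have.
df2 <- data.frame(id = c(3, 4, 1, 2), col2 = c("c", "x", "a", "b"),
                  stringsAsFactors = FALSE)

# Position in df1 of each df2 ID; NA where there is no match.
idx <- match(df2$id, df1$id)   # 3 NA 1 2

# Drop the unmatched positions before reordering.
df1_ordered <- df1[idx[!is.na(idx)], ]

df1_ordered$id   # 3 1 2
```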

Remove all columns between two column names in dplyr [duplicate]

I'm looking for a simple way to remove all columns between two columns in a dataframe in R.
So let's say I have a dataframe like so:
> test = data.frame('a' = 'a', 'b' = 'b', 'c'= 'c', 'd' = 'd', 'e' = 'e')
> test
a b c d e
1 a b c d e
I'd like to be able to do the following in a dplyr chain
test %>% delete_between(a,c)
>test
d e
1 d e
We can use select with a negated column range, which drops a, c, and everything between them:
test %>%
select(-(a:c))
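To get the exact call the question asks for, the negated range can be wrapped in a small function. This is a sketch only: delete_between is a hypothetical helper named after the question's pseudocode, not a dplyr function, and it assumes that embracing with {{ }} works on both ends of the range in the installed dplyr version.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): drop `from`, `to`,
# and every column positioned between them.
delete_between <- function(data, from, to) {
  select(data, -({{ from }}:{{ to }}))
}

test <- data.frame(a = "a", b = "b", c = "c", d = "d", e = "e")

result <- test %>% delete_between(a, c)
names(result)
```

Note that the range is positional: it depends on the order of the columns in the data frame, not on their names.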

rstudio making a new column by comparing two other columns (alphabet)

As a beginner in R, I am having an issue with making a column.
I have a table of students' grades based on points and percentile, with one column of letter grades for each.
I wish to create a new column called Finalgrade. To do so, I would like to compare these two columns and assign the higher grade as Finalgrade. Can anyone help me with this?
Let's assume that the grading system has a sequence like below
grade_seq <- c('A', 'AB', 'B', 'BC', 'C', 'D', 'E', 'F')
then
library(dplyr)
df <- df %>%
mutate_if(is.factor, as.character) %>%
mutate(Finalgrade = grade_seq[pmin(match(Gradepoints, grade_seq), match(Gradepercentile, grade_seq))])
gives
Gradepoints Gradepercentile Finalgrade
1 A B A
2 A D A
3 F D D
4 F F F
5 AB BC AB
6 AB C AB
Sample data:
df <- data.frame(Gradepoints = c('A','A','F','F','AB','AB'),
Gradepercentile = c('B','D','D','F','BC','C'))
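The mutate call works because match turns each letter grade into its position in grade_seq, where a smaller position means a better grade, and pmin then picks the better of the two positions row by row. The intermediate steps can be sketched in base R; the short points and percentile vectors are illustrative values only:

```r
grade_seq <- c("A", "AB", "B", "BC", "C", "D", "E", "F")

# Illustrative inputs: three students, two grades each.
points     <- c("A", "F", "AB")
percentile <- c("B", "D", "C")

# Position of each grade in the sequence (1 = best).
rank_points <- match(points, grade_seq)       # 1 8 2
rank_pct    <- match(percentile, grade_seq)   # 3 6 5

# Element-wise minimum = the better grade in each row.
best <- grade_seq[pmin(rank_points, rank_pct)]

best   # "A" "D" "AB"
```

A grade that does not appear in grade_seq yields NA from match, so the resulting Finalgrade would be NA for that row.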

Filter Dataframe by second dataframe [duplicate]

I have two dataframes.
selectedcustomersa is a dataframe with information about 50 customers. The first column is the name (Group.1).
selectedcustomersb is another dataframe (same structure) with information about 2000 customers, including the customers from selectedcustomersa.
I want selectedcustomersb without the customers from selectedcustomersa.
I tried:
newselectedcustomersb<-filter(selectedcustomersb, Group.1!=selectedcustomersa$Group.1)
One way to do this is to use anti_join from dplyr, which returns the rows of its first argument that have no match in the second. It also works when matching on multiple columns.
library(dplyr)
df1 <- data.frame(x = c('a', 'b', 'c', 'd'), y = 1:4)
df2 <- data.frame(x = c('c', 'd', 'e', 'f'), z = 1:4)
df <- anti_join(df2, df1, by = 'x')
df
x z
1 e 3
2 f 4
Try:
newselectedcustomersb <- filter(selectedcustomersb, !(Group.1 %in% selectedcustomersa$Group.1))
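The attempt with != in the question fails because != compares the two vectors element by element, recycling the shorter one, instead of testing membership. A small base R sketch with made-up customers (the names and the spend column are illustrative) shows the difference:

```r
# Hypothetical stand-ins for selectedcustomersb and selectedcustomersa.
b <- data.frame(Group.1 = c("ann", "bob", "cid", "dee"),
                spend = c(10, 20, 30, 40),
                stringsAsFactors = FALSE)
a <- data.frame(Group.1 = c("bob", "dee"),
                spend = c(20, 40),
                stringsAsFactors = FALSE)

# != recycles a$Group.1 (bob, dee, bob, dee) and compares position by position,
# so "bob" in row 2 is compared with "dee" and wrongly kept.
wrong <- b[b$Group.1 != a$Group.1, ]

# %in% tests each name of b against the whole set of names in a.
right <- b[!(b$Group.1 %in% a$Group.1), ]

wrong$Group.1   # "ann" "bob" "cid"
right$Group.1   # "ann" "cid"
```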
