Remove all columns between two column names in dplyr [duplicate] - r

This question already has answers here:
Deleting multiple columns in R
(4 answers)
Closed 3 years ago.
I'm looking for a simple way to remove all columns between two columns in a dataframe in R.
So let's say I have a dataframe like so:
> test = data.frame('a' = 'a', 'b' = 'b', 'c'= 'c', 'd' = 'd', 'e' = 'e')
> test
a b c d e
1 a b c d e
I'd like to be able to do the following in a dplyr chain
test %>% delete_between(a,c)
>test
d e
1 d e

We can use
test %>%
select(-(a:c))
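As a minimal sketch of the answer above, assuming dplyr is installed: the range a:c selects every column from a through c, and the minus sign drops them. The delete_between() wrapper is a hypothetical helper named after the call in the question; it is not part of dplyr.

```r
library(dplyr)

test <- data.frame(a = "a", b = "b", c = "c", d = "d", e = "e")

# Drop every column from `a` through `c`, inclusive.
result <- test %>% select(-(a:c))

# Hypothetical wrapper (not a dplyr function) so the call reads like the question.
delete_between <- function(data, from, to) {
  select(data, -({{ from }}:{{ to }}))
}

result2 <- test %>% delete_between(a, c)
names(result2)
```

Note that select() returns a new data frame; test itself is unchanged unless the result is assigned back to it.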

Related

Combine two columns into a third column as a list in R [duplicate]

This question already has answers here:
Paste multiple columns together
(11 answers)
Closed 2 years ago.
Suppose I have a data frame in R
A B
d test1
e test2
Suppose I want to combine the two columns A and B into a new column C, which is a list of columns A and B
A B C
d test1 (d,test1)
e test2 (e,test2)
Data:
df <- data.frame(
A = c("d","e"),
B = c("test1", "test2"), stringsAsFactors = FALSE)
Solution:
Apply the function paste0 to the rows (margin 1) of df, collapsing each row with "," and assign the result to df$C:
df$C <- apply(df,1, paste0, collapse = ",")
Result:
df
A B C
1 d test1 d,test1
2 e test2 e,test2
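A possible alternative, sketched here in base R only: paste() is already vectorized over its arguments, so the row-wise apply() is not strictly needed for two known columns. The C_list column is an assumption about what "list" might mean in the question (one length-2 vector per row); the name is made up for illustration.

```r
df <- data.frame(
  A = c("d", "e"),
  B = c("test1", "test2"),
  stringsAsFactors = FALSE
)

# paste() works element-wise, pairing A[i] with B[i].
df$C <- paste(df$A, df$B, sep = ",")

# Illustrative list column: each row holds a character vector c(A, B).
df$C_list <- unname(Map(c, df$A, df$B))

df$C
df$C_list[[1]]
```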

Convert simple data.table to named vector [duplicate]

This question already has answers here:
Split data.frame based on levels of a factor into new data.frames
(3 answers)
Closed 3 years ago.
I need to convert a simple data.table to a named vector.
Let's say I have this data.table
a <- data.table(v1 = c('a', 'b', 'c'), v2 = c(1,2,3))
and I want to get the following named vector
b <- c(1, 2, 3)
names(b) <- c('a', 'b', 'c')
Is there a simple way to do it?
Using setNames() in j:
a[, setNames(v2, v1)]
# a b c
# 1 2 3
We can use split and unlist to get it as named vector.
unlist(split(a$v2, a$v1))
#a b c
#1 2 3
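The two answers above can differ in element order. The sketch below uses a plain data.frame as a stand-in for the data.table (an assumption made to keep the example dependency-free) and deliberately unsorted names: setNames() keeps the row order, while split() groups by the sorted values of v1.

```r
a <- data.frame(
  v1 = c("b", "a", "c"),
  v2 = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Keeps the original row order: b, a, c.
by_setnames <- setNames(a$v2, a$v1)

# Groups by sorted v1, so the result comes back as a, b, c.
by_split <- unlist(split(a$v2, a$v1))

by_setnames
by_split
```

If the row order matters, the setNames() form is likely the safer choice.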

How to expand a dataframe base on values in a column [duplicate]

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 3 years ago.
I have multiple values in certain rows within a column in a dataframe. I would like to have a dataframe with a new row for each value in any row that contains multiple values in that column. I have gotten the values separated but am not certain how to go forward. Any thoughts?
Here is an example:
## input
tibble(
code = c(
85310,
47730,
61900,
93110,
"56210,\r\n70229",
"93110,\r\n93130,\r\n93290"),
vary2 = LETTERS[1:6])
## desired output
tibble(
code = c(85310, 47730, 61900, 93110, 56210, 70229,
93110, 93130, 93290),
vary2 = c('A', 'B', 'C', 'D', 'E', 'E', 'F', 'F', 'F')
)
## one unsuccessful approach
tibble(
code = c(
85310,
47730,
61900,
93110,
"56210,\r\n70229",
"93110,\r\n93130,\r\n93290"),
vary2 = LETTERS[1:6]) %>%
separate(col = 'code', into = LETTERS[1:3], sep = ',\\r\\n')
We can use separate_rows
library(tidyverse)
df1 %>%
separate_rows(code, sep="[,\r\n]+")
# A tibble: 9 x 2
# code vary2
# <chr> <chr>
#1 85310 A
#2 47730 B
#3 61900 C
#4 93110 D
#5 56210 E
#6 70229 E
#7 93110 F
#8 93130 F
#9 93290 F
As @KerryJackson mentioned in the comments, if we don't specify sep, separate_rows falls back to its default pattern, which splits on any run of characters that are neither alphanumeric nor a period. To split on particular delimiters only, it is better to pass sep explicitly.
df1 %>%
separate_rows(code)
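The answer refers to df1 without defining it. A self-contained sketch, assuming df1 is the question's input tibble and that tidyr and tibble are installed; convert = TRUE is an optional extra that turns the split values back into numbers:

```r
library(tidyr)
library(tibble)

# Assumed definition of df1: the input from the question, with code as character.
df1 <- tibble(
  code = c("85310", "47730", "61900", "93110",
           "56210,\r\n70229", "93110,\r\n93130,\r\n93290"),
  vary2 = LETTERS[1:6]
)

# Split on any run of commas, carriage returns, or newlines,
# creating one row per value and repeating vary2 as needed.
out <- df1 %>%
  separate_rows(code, sep = "[,\r\n]+", convert = TRUE)

out
```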

RStudio: making a new column by comparing two other columns (alphabet)

As a beginner in R, I am having an issue with making a column.
I have a table of students' grades based on points and percentile.
Let's say I have a table with two grade columns, Gradepoints and Gradepercentile.
I wish to create a new column called Finalgrade. To do so, I would like to compare these two columns and assign the higher grade as Finalgrade. Can anyone help me with this?
Let's assume that the grading system has a sequence like below
grade_seq <- c('A', 'AB', 'B', 'BC', 'C', 'D', 'E', 'F')
then
library(dplyr)
df <- df %>%
mutate_if(is.factor, as.character) %>%
mutate(Finalgrade = grade_seq[pmin(match(Gradepoints, grade_seq), match(Gradepercentile, grade_seq))])
gives
Gradepoints Gradepercentile Finalgrade
1 A B A
2 A D A
3 F D D
4 F F F
5 AB BC AB
6 AB C AB
Sample data:
df <- data.frame(Gradepoints = c('A','A','F','F','AB','AB'),
Gradepercentile = c('B','D','D','F','BC','C'))
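The same idea can be written step by step in base R, which may make the logic easier to follow: match() turns each grade into its position in grade_seq (1 is best), pmin() picks the smaller position per row, and indexing grade_seq converts that position back to a grade. This is a sketch under the grade ordering assumed in the answer.

```r
grade_seq <- c("A", "AB", "B", "BC", "C", "D", "E", "F")

df <- data.frame(
  Gradepoints     = c("A", "A", "F", "F", "AB", "AB"),
  Gradepercentile = c("B", "D", "D", "F", "BC", "C"),
  stringsAsFactors = FALSE
)

# Position of each grade in the ordering; a lower position is a better grade.
pos_points     <- match(df$Gradepoints, grade_seq)
pos_percentile <- match(df$Gradepercentile, grade_seq)

# Row-wise minimum position, mapped back to the grade label.
df$Finalgrade <- grade_seq[pmin(pos_points, pos_percentile)]

df
```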

Filter Dataframe by second dataframe [duplicate]

This question already has answers here:
Subsetting a data frame to the rows not appearing in another data frame
(5 answers)
Closed 6 years ago.
I have two dataframes.
selectedcustomersa is a dataframe with information about 50 customers. The first column is the name (Group.1).
selectedcustomersb is another dataframe (same structure) with information about 2000 customers, including the customers from selectedcustomersa.
I want selectedcustomersb without the customers from selectedcustomersa.
I tried:
newselectedcustomersb<-filter(selectedcustomersb, Group.1!=selectedcustomersa$Group.1)
One way to do this is to use the anti_join in dplyr as follows. It will work across multiple columns and such.
library(dplyr)
df1 <- data.frame(x = c('a', 'b', 'c', 'd'), y = 1:4)
df2 <- data.frame(x = c('c', 'd', 'e', 'f'), z = 1:4)
df <- anti_join(df2, df1, by = "x")
df
x z
1 e 3
2 f 4
Try:
newselectedcustomersb <- filter(selectedcustomersb, !(Group.1 %in% selectedcustomersa$Group.1))
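To see why the != attempt in the question goes wrong while %in% works, here is a small base R sketch with made-up data (the customer names and the spend column are illustrative only). != compares element by element and recycles the shorter vector, whereas %in% tests membership.

```r
a <- data.frame(Group.1 = c("ann", "bob"), stringsAsFactors = FALSE)
b <- data.frame(
  Group.1 = c("bob", "cat", "ann", "dan"),
  spend   = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Element-wise comparison: a$Group.1 is recycled to ann, bob, ann, bob,
# so "bob" in row 1 is compared with "ann" and wrongly kept.
wrong <- b[b$Group.1 != a$Group.1, ]

# Membership test: drop every row whose name appears anywhere in a.
right <- b[!(b$Group.1 %in% a$Group.1), ]

wrong$Group.1  # "bob" "cat" "dan"
right$Group.1  # "cat" "dan"
```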
