List of unique characters in a column [duplicate] - r

This question already has answers here:
keep only unique elements in string in r
(2 answers)
Closed 2 years ago.
I am trying to figure out how to extract all the unique characters from a certain column. For example, if one of my columns has the following rows,
june
july&
august%
then I would like R to give me all the unique characters, i.e.,
junely&agst%
How can this be done in R?

Split the column values at each character and paste only unique characters.
x <- c('june', 'july&', 'august%')
paste0(unique(unlist(strsplit(x, ''))), collapse = "")
#[1] "junely&agst%"
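Since the question is about a column rather than a free-standing vector, here is a minimal sketch of the same idea applied to a data frame. The data frame `df` and the column name `month` are made up for illustration; substitute your own.

```r
# Hypothetical data frame and column name, for illustration only
df <- data.frame(month = c("june", "july&", "august%"),
                 stringsAsFactors = FALSE)

# Split every value into single characters and keep the first occurrence of each
chars <- unique(unlist(strsplit(df$month, "")))

# Collapse the character vector into one string
as_string <- paste(chars, collapse = "")
as_string
```

`chars` is useful if you want the characters as a vector, and `as_string` if you want them in a single string.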

Maybe a tidyverse approach will be useful:
library(dplyr)
library(purrr)
library(stringr)
# input
x <- c("june", "july&", "august%")
expected <- "junely&agst%"
# modify
actual <- x %>%
  str_split(pattern = "") %>%
  flatten_chr() %>%
  unique() %>%
  paste0(collapse = "")
# validate
stopifnot(actual == expected)

Related

R - Splitting a dataframe by using strsplit, but keep delimiter [duplicate]

This question already has an answer here:
R split on delimiter (split) keep the delimiter (split)
(1 answer)
Closed 2 months ago.
I have a dataframe like the following:
ref = c("ab/1bc/1", "dd/1", "cc/1", "2323")
text = c("car", "train", "mouse", "house")
data = data.frame(ref, text)
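Which produces this:
       ref  text
1 ab/1bc/1   car
2     dd/1 train
3     cc/1 mouse
4     2323 house
If the cell within the ref column has /1 in it, I want to split it and duplicate the row.
I.e. the table above should look like this:
   ref  text
1 ab/1   car
2 bc/1   car
3 dd/1 train
4 cc/1 mouse
5 2323 house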
I have the following code, which splits the cell by the /1, but it also removes it. I thought about adding /1 back onto every ref, but not all refs have it.
library(dplyr)
library(tidyr)
data1 = data %>%
  mutate(ref = strsplit(as.character(ref), "/1")) %>%
  unnest(ref)
Some of the other answers use regex for when people split by things like &/,. etc, but not /1. Any ideas?
With separate_rows and look-behind:
library(tidyr)
library(dplyr)
data %>%
  separate_rows(ref, sep = "(?<=/1)") %>%
  filter(ref != "")
Output:
# A tibble: 5 × 2
  ref   text
  <chr> <chr>
1 ab/1  car
2 bc/1  car
3 dd/1  train
4 cc/1  mouse
5 2323  house
Or with strsplit:
data %>%
  mutate(ref = strsplit(ref, "(?<=/1)", perl = TRUE)) %>%
  unnest(ref)
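If you would rather avoid the tidyverse, the same look-behind split can be sketched in base R. This is only one way to do it, and the name `result` is made up here: split `ref`, then repeat each `text` value once per piece.

```r
data <- data.frame(
  ref  = c("ab/1bc/1", "dd/1", "cc/1", "2323"),
  text = c("car", "train", "mouse", "house"),
  stringsAsFactors = FALSE
)

# Split after every "/1"; the look-behind keeps the delimiter on the left piece
parts <- strsplit(data$ref, "(?<=/1)", perl = TRUE)

# Repeat each text value once per piece produced from its ref
result <- data.frame(
  ref  = unlist(parts),
  text = rep(data$text, lengths(parts)),
  stringsAsFactors = FALSE
)
result
```

Rows without "/1" (such as "2323") pass through unchanged because strsplit returns the whole string when the pattern does not match.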

Remove a pattern in a string and mutate these values to a new column [duplicate]

This question already has answers here:
Replace specific characters within strings
(7 answers)
Closed 3 years ago.
Let's say I have this data frame:
df <- as.data.frame(c("77111","77039","5005","4032"))
and I want to create a new column where if the values start with "77", then remove the "77" and extract the remaining numbers. Otherwise, keep the values as is so that the new column looks like this:
df <- df %>% mutate(new_numbers =c("111","039","5005","4032"))
We can use str_remove to remove the 77 from the start (^) of each value in the column:
library(dplyr)
library(stringr)
df <- df %>%
  mutate(col = str_remove(col, "^77"))
data
df <- data.frame(col= c("77111","77039","5005","4032"))
Another option is gsub inside mutate, where original_column stands for the name of your column:
df <- df %>%
  mutate(new_numbers = gsub('^77', '', original_column))
For an approach in base R, just use gsub (its input argument is named x, not string):
df$new <- gsub(pattern = "^77",
               replacement = "",
               x = df[, 1])
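To match the asked-for result exactly (the original column kept, plus a new_numbers column), a minimal sketch could look like the following. The column name `numbers` is an assumption, since the question's data frame has no explicit name for it. Because the pattern is anchored with ^, it can match at most once per value, so sub is enough.

```r
# `numbers` is a hypothetical column name chosen for this example
df <- data.frame(numbers = c("77111", "77039", "5005", "4032"),
                 stringsAsFactors = FALSE)

# Remove a leading "77" only; values that do not start with "77" are untouched
df$new_numbers <- sub("^77", "", df$numbers)
df
```

The result stays character, which preserves the leading zero in "039".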

Rename all columns with characters [duplicate]

This question already has answers here:
Add a prefix to column names
(4 answers)
Closed 3 years ago.
I need to rename all columns in my data.frame. Right now, they are numbered 1-150 (without the X) but I would like to add "id" before each number.
Right now:
c = data.frame(1, 2)
names(c)[1] <- "1"
names(c)[2] <- "2"
What I want: column names of id1, id2, and so on.
How can I do this?
You can use dplyr::rename_all():
library(dplyr)
iris %>%
  rename_all(~ paste0("id_", .x)) %>%
  names()
or with base R:
names(
  setNames(
    iris,
    nm = paste0("id_", names(iris))
  )
)
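Applied to the data frame from the question (numeric-looking names, prefix "id" without an underscore), a sketch might look like this. It uses dplyr::rename_with(), which assumes dplyr 1.0.0 or later; the object names `d`, `d_base` and `d_dplyr` are made up for the example.

```r
library(dplyr)

# Rebuild the question's example: columns named "1" and "2"
d <- data.frame(1, 2)
names(d) <- c("1", "2")

# Base R: prepend "id" to every existing name
d_base <- d
names(d_base) <- paste0("id", names(d_base))

# dplyr: rename_with() applies a function to all column names by default
d_dplyr <- d %>% rename_with(~ paste0("id", .x))

names(d_base)
names(d_dplyr)
```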

Group by an id and concatenate matches into a new feature [duplicate]

This question already has answers here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Closed 4 years ago.
sample_data <- data.frame(id = c("123abc", "def456", "789ghi", "123abc"),
some_str = c("carrots", "bananas", "apples", "cabbage"))
I would like to know how to wrangle sample df to be like this:
desired_df <- data.frame(id = c("123abc", "def456", "789ghi"),
some_str_concat = c("carrots, cabbage", "bananas", "apples"))
Each id may appear multiple times. In that case I would like to get the corresponding value from some_str and concatenate into a new feature, where the new df is grouped on id.
In the example above, id 123abc appears twice: first with a value of "carrots" and then again with a value of "cabbage". Thus, the desired data frame has a single row for 123abc with the value "carrots, cabbage".
How can I do this? Ideally within either base r or dplyr.
library(dplyr)
sample_data %>%
  group_by(id) %>%
  mutate(some_str = paste(some_str, collapse = ", ")) %>%
  distinct()
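If you want the new column to be called some_str_concat as in desired_df, a possible variation is to collapse with summarise instead of mutate plus distinct. The base R line uses aggregate with toString. Treat both as sketches; the names `by_dplyr` and `by_base` are made up here.

```r
library(dplyr)

sample_data <- data.frame(
  id = c("123abc", "def456", "789ghi", "123abc"),
  some_str = c("carrots", "bananas", "apples", "cabbage"),
  stringsAsFactors = FALSE
)

# dplyr: one row per id, values joined in order of appearance
by_dplyr <- sample_data %>%
  group_by(id) %>%
  summarise(some_str_concat = paste(some_str, collapse = ", "))

# Base R: toString() joins the values of each group with ", "
by_base <- aggregate(some_str ~ id, data = sample_data, FUN = toString)

by_dplyr
by_base
```

Note that both versions return the groups sorted by id rather than in order of first appearance, so the row order can differ from desired_df.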

Turning a 1x1 data frame into value [duplicate]

This question already has answers here:
Extract a dplyr tbl column as a vector
(8 answers)
dplyr::select one column and output as vector [duplicate]
(3 answers)
Closed 5 years ago.
I'm using dplyr to transform a large data frame, and I want to store the DF's most recent date + 1 as a value. I know there are easier ways to do this by breaking up the statements, but I'm trying to do it all with one pipe statement. I ran into something and I'm not sure why R behaves that way. Example:
Day <- seq.Date(as.Date('2017-12-01'), as.Date('2018-02-03'), 'day')
Day <- sample(Day, length(Day))
ID <- sample(c(1:5), length(Day), replace = T)
df <- data.frame(ID, Day)
foo <- df %>%
  arrange(desc(Day)) %>%
  mutate(DayPlus = as.Date(Day) + 1) %>%
  select(DayPlus) #%>%
  #slice(1)
foo <- foo[1,1]
When I run this code, foo becomes a value equal to 2018-02-04 as desired. However, when I run the code with slice uncommented:
foo <- df %>%
  arrange(desc(Day)) %>%
  mutate(DayPlus = as.Date(Day) + 1) %>%
  select(DayPlus) %>%
  slice(1)
foo <- foo[1,1]
foo stays as a dataframe. My main question is why foo doesn't become a value in the second example, and my second question is whether there's an easy way to get the "2018-02-04" as a value stored as foo, all from one dplyr pipe.
Thanks
That's because your first snippet returns a data.frame, while the second one returns a tibble. Tibbles are similar to data.frames, but one major difference is subsetting: on a data.frame, foo[1, 1] returns the value in the first row of the first column as a vector, whereas on a tibble it returns that cell as a 1x1 tibble.
df %>%
  arrange(desc(Day)) %>%
  mutate(DayPlus = as.Date(Day) + 1) %>%
  select(DayPlus) %>%
  class()
returns
[1] "data.frame"
whereas the second one
df %>%
  arrange(desc(Day)) %>%
  mutate(DayPlus = as.Date(Day) + 1) %>%
  select(DayPlus) %>%
  slice(1) %>%
  class()
returns
[1] "tbl_df" "tbl" "data.frame"
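For the second part of the question (getting the value from a single pipe), one option is dplyr::pull(), which extracts a column as a plain vector whether the input is a data.frame or a tibble. The sketch below rebuilds the example without the random shuffling so the result is reproducible, and also illustrates the subsetting difference described above.

```r
library(dplyr)

Day <- seq.Date(as.Date("2017-12-01"), as.Date("2018-02-03"), "day")
df <- data.frame(ID = rep_len(1:5, length(Day)), Day = Day)

# [1, 1] drops to a bare value on a data.frame but stays a tibble on a tibble
plain <- data.frame(x = 1)[1, 1]
tbl   <- tibble(x = 1)[1, 1]
is.data.frame(plain)
is.data.frame(tbl)

# One pipe: sort, add a day, keep the first row, then pull the column out
foo <- df %>%
  arrange(desc(Day)) %>%
  mutate(DayPlus = Day + 1) %>%
  slice(1) %>%
  pull(DayPlus)
foo
```

Here foo is a length-one Date vector rather than a data frame, so no follow-up foo[1,1] step is needed.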
