How to sort strings in alphabetical and numerical order? - r

I have a vector of strings, which I want to sort alphabetically, and then sort by the number, which is at the end of the strings.
Final output should be "AGSHIM1", "AGSHIU1", "AGSHIZ1","AGSHIH2", "AGSHIM2","AGSHIU2", "AGSHIZ2"
d<-c("AGSHIZ2", "AGSHIZ1", "AGSHIU1", "AGSHIM1", "AGSHIH2", "AGSHIM2",
"AGSHIU2")
d[order(d,as.numeric(substr(d, nchar(d), nchar(d))))]
>"AGSHIH2" "AGSHIM1" "AGSHIM2" "AGSHIZ1" "AGSHIZ2" "AGSHIU1" "AGSHIU2"

What you can do is separate the number from the string, and sort by the number first, and then within each group of numbers sort alphabetically:
sortSpecial <- function(d) {
  df <- data.frame(
    original = d,
    chars = gsub("[[:digit:]]", "", d),
    # convert to numeric so that e.g. 10 sorts after 2
    nums = as.numeric(gsub("[^[:digit:]]", "", d))
  )
  df <- df[with(df, order(nums, chars)), ]
  return(df$original)
}
d <- sortSpecial(d)
d
# [1] "AGSHIM1" "AGSHIU1" "AGSHIZ1" "AGSHIH2" "AGSHIM2" "AGSHIU2" "AGSHIZ2"
There should be a more elegant approach, I just don't know it. Nevertheless, let me know if it helps.
Update
I could not help but get inspired by Karthik S's approach. If you don't want to generate the function first, you can do the same steps as before using dplyr:
library(dplyr)
d <- data.frame(d = d) %>%
mutate(
chars = gsub("[[:digit:]]", "", d),
nums = gsub("[^[:digit:]]", "", d)
) %>%
arrange(nums, chars) %>%
pull(d)
Again, the steps are identical so the choice of approach is a matter of preference.

Another approach. But I am sure a shorter solution most likely exists.
library(dplyr)
library(stringr)
library(tibble)
d %>% as_tibble() %>%
  transmute(dig = str_extract(value, '\\d'), ltrs = str_remove(value, '\\d')) %>%
  type.convert(as.is = TRUE) %>%
  arrange(dig, ltrs) %>% transmute(d = str_c(ltrs, dig, sep = '')) %>% pull(d)
[1] "AGSHIM1" "AGSHIU1" "AGSHIZ1" "AGSHIH2" "AGSHIM2" "AGSHIU2" "AGSHIZ2"

Here is one base R option using gsub + order
> d[order(as.numeric(gsub("\\D", "", d)), d)]
[1] "AGSHIM1" "AGSHIU1" "AGSHIZ1" "AGSHIH2" "AGSHIM2" "AGSHIU2" "AGSHIZ2"
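The one-liner above works because stripping the non-digits leaves only the number. As a minimal sketch, assuming every string is a letter prefix followed by a trailing number that may have more than one digit (the value "AGSHIA10" is a hypothetical addition to the question's data, and num and prefix are illustrative names), the two sort keys can be made explicit:

```r
# Question's data plus one hypothetical multi-digit entry ("AGSHIA10")
d <- c("AGSHIZ2", "AGSHIZ1", "AGSHIU1", "AGSHIM1", "AGSHIH2", "AGSHIM2",
       "AGSHIU2", "AGSHIA10")

# Trailing number: drop everything up to the last non-digit character
num <- as.numeric(sub("^.*[^0-9]", "", d))
# Letter prefix: drop the trailing digits
prefix <- sub("[0-9]+$", "", d)

# Sort numerically by the number first, then by the prefix;
# method = "radix" compares strings in the C locale, independent of the session locale
sorted <- d[order(num, prefix, method = "radix")]
print(sorted)
```

Converting with as.numeric matters once the numbers go past 9: compared as text, "10" would sort before "2".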

Related

Compare two character vectors in R based on vector of strings

I have two lists A and B. The dates in A are 2000 - 2022 while those in B are 2023-2030.
names(A) and names(B) give the following character vectors:
a <- c("ACC_a_his", "BCC_b_his", "Can_c_his", "CES_d_his")
b <- c("ACC_a_fu", "BCC_b_fu", "Can_c_fu", "CES_d_fu","FGO_c_fu")
Also, I have a string vector, c which is common across the names in a and b:
c=c("ACC","BCC", "Can", "CES", "FGO")
Note that the strings in c do not always appear in the same position in filenames. The string can be at the beginning, middle or end of filenames.
Challenge
Using the strings in c I would like to get the difference (i.e., which name exists in b but not in a or vice versa) between the names in a and b
Expected output = "FGO_c_fu"
rbind (or whatever is best) matching dataframes in lists A and B if the names are similar based on string in c
Update: See OP's comment:
Try this:
library(dplyr)
library(tibble)
library(tidyr)
library(stringr)
# or just library(tidyverse)
# df is the padded data frame built from a and b in the first answer below
df %>%
pivot_longer(everything()) %>%
mutate(x = str_extract(value, paste(c, collapse = "|"))
) %>%
group_by(x) %>%
filter(!any(row_number() > 1)) %>%
na.omit() %>%
pull(value)
[1] "FGO_c_fu"
First answer:
Here is an alternative approach:
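We create a list from the two vectors.
The vectors are of unequal length, so data.frame(lapply(my_list, `length<-`, max(lengths(my_list)))) pads the shorter one with NA and builds a data frame.
We pivot longer and group by everything before the first underscore.
Finally we keep only the groups with a single row and remove the NAs: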
library(dplyr)
library(tidyr)
library(tibble)
my_list <- tibble::lst(a, b)
df <- data.frame(lapply(my_list, `length<-`, max(lengths(my_list))))
df %>%
pivot_longer(everything()) %>%
group_by(x = sub("\\_.*", "", value)) %>%
filter(!any(row_number() > 1)) %>%
na.omit() %>%
pull(value)
[1] "FGO_c_fu"
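For comparison, the same idea can be sketched in base R. This is only a sketch under two assumptions: each name contains at most one of the key strings, and the key vector is renamed keys here (it is c in the question) to avoid masking the base function c(). The helper key_of is a hypothetical name.

```r
a <- c("ACC_a_his", "BCC_b_his", "Can_c_his", "CES_d_his")
b <- c("ACC_a_fu", "BCC_b_fu", "Can_c_fu", "CES_d_fu", "FGO_c_fu")
keys <- c("ACC", "BCC", "Can", "CES", "FGO")

# One alternation pattern, so the key may sit anywhere in the name
pattern <- paste(keys, collapse = "|")

# Hypothetical helper: return the key found in each name, or NA if none
key_of <- function(x) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}

ka <- key_of(a)
kb <- key_of(b)

# Names whose key has no counterpart in the other vector
unmatched <- c(a[!ka %in% kb], b[!kb %in% ka])
print(unmatched)
```

The vectors ka and kb could also serve as the matching key when deciding which data frames in A and B to combine with rbind.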

Simplify output for combn function to remove spaces and row and column markers

I would like to take a data set like the one below generated using the combn function:
https://i.ibb.co/r0qSmYV/example.jpg
and have it print
1111112223
2223343344
3454554555
The solutions I have tried so far are trimws() and gsub(). Any help will be greatly appreciated.
library(tidyverse)
as.data.frame(combn(1:5, 3)) |> rowwise() |>
mutate(
txt = paste0(c_across(),collapse = "")
) |> pull(txt) |> cat(fill=1L)
You could do:
library(dplyr)
combn(1:5, 3) |>
as.data.frame() |>
mutate(combined = apply(across(everything()), 1, function(x) paste0(x, collapse = ""))) |>
pull(combined)
which gives:
[1] "1111112223" "2223343344" "3454554555"
Of course, if you want to keep the result as a data frame column rather than a vector, don't run the last line.
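If no data frame is needed at all, a base R sketch along the same lines (the names m and rows are illustrative) pastes each row of the combn matrix directly and prints it with cat:

```r
# Each column of m is one combination of 3 values drawn from 1:5
m <- combn(1:5, 3)

# Paste the entries of each row together, with no separator
rows <- apply(m, 1, paste, collapse = "")

# Print one row per line, without quotes or [1] markers
cat(rows, sep = "\n")
```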

R - convert a 2-d vector into a 1-d named vector

How can this be done more elegantly? I'm looking to convert a vector of key value pairs as concatenated strings, into a vector of values with the keys as names.
library(tidyverse)
library(purrr)
x <- c("key1|value1", "key2|value2")
# Current way
x_split <- x %>% str_split("\\|")
keys <- x_split %>% map(pluck(1)) %>% unlist()
values <- x_split %>% map(pluck(2)) %>% unlist()
y <- values %>% set_names(keys)
# More elegant way
y <- x %>% some_functions()
You can use simplify = TRUE in str_split and use set_names.
stringr::str_split(x, "\\|", simplify = TRUE) %>% {purrr::set_names(.[, 2], .[, 1])}
# key1 key2
#"value1" "value2"
I've always liked data.table::tstrsplit.
library(data.table)
tstrsplit(x,"\\|") %>% {setNames(.[[2]],.[[1]])}
# key1 key2
#"value1" "value2"
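For completeness, here is a sketch that avoids extra packages, assuming every element contains exactly one "|" separator (parts is an illustrative name):

```r
x <- c("key1|value1", "key2|value2")

# fixed = TRUE treats "|" as a literal character rather than regex alternation
parts <- strsplit(x, "|", fixed = TRUE)

# Take the second piece of each pair as the value and the first as its name
y <- setNames(vapply(parts, `[`, character(1), 2),
              vapply(parts, `[`, character(1), 1))
print(y)
```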

all group members but the current row

I've got a grouped data frame, like so:
df <- data.frame(group = rep(1:4, each=3),
lets = rep(LETTERS[1:4], times=3))
For each row I would now like to identify all lets within the same group other than the lets of the row itself. Using dplyr I can get all lets thus:
df %>%
group_by(group) %>%
mutate(all_lets_in_group = paste(lets, collapse=','))
But how do I exclude the lets of the current row from what I feed into paste()?
The purpose of this task is not very clear, so the clarity of the code suffers as well, but still:
library(tidyverse)
df %>%
group_by(group) %>%
mutate(
all_lets_in_group = lets %>%
map(function(l) setdiff(., l)) %>%
map_chr(function(x) paste(x, collapse=',')))
This uses the set operation setdiff to remove the current letter, supplied by purrr::map, from the group's set, then collapses each resulting vector with paste and returns a character vector.
Not sure about a dplyr solution, but you can use lapply.
df$all_lets_in_group <- lapply(1:nrow(df), function(x)
paste(with(df, lets[group == group[x] & lets != lets[x]]), collapse = ','))
Another base R method that uses ave, sapply, and setdiff
ave(df$lets, df$group,
FUN=function(i) sapply(i, function(j) paste(setdiff(i, j), collapse=",")))
[1] "B,C" "A,C" "A,B" "A,B" "D,B" "D,A" "D,A" "C,A" "C,D" "C,D" "B,D" "B,C"
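Note that setdiff removes every occurrence of the current value and also de-duplicates the rest. If rows should instead be excluded by position, so that duplicate letters within a group are kept, a possible variant (a sketch only; others is a hypothetical helper name) is:

```r
df <- data.frame(group = rep(1:4, each = 3),
                 lets = rep(LETTERS[1:4], times = 3),
                 stringsAsFactors = FALSE)

# Hypothetical helper: for each position i, paste all elements except the i-th
others <- function(v) {
  vapply(seq_along(v), function(i) paste(v[-i], collapse = ","), character(1))
}

# ave() applies the helper within each group and keeps the original row order
df$all_lets_in_group <- ave(df$lets, df$group, FUN = others)
print(df)
```

On the example data, where no letter repeats within a group, this gives the same result as the setdiff versions.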

Counting consecutive patterns in strings using R

I'm attempting to write a function to count the number of consecutive instances of a pattern. As an example, I'd like the string
string<-"A>A>A>B>C>C>C>A>A"
to be transformed into
"3 A > 1 B > 3 C > 2 A"
I've got a function that counts the instances of each string, see below. But it doesn't achieve the ordering effect that I'd like. Any ideas or pointers?
Existing function:
fnc_gen_PathName <- function(string) {
p <- strsplit(as.character(string), ";")
p1 <- lapply(p, table)
p2 <- lapply(p1, function(x) {
sapply(1:length(x), function(i) {
if(x[i] == 25){
paste0(x[i], "+ ", names(x)[i])
} else{
paste0(x[i], "x ", names(x)[i])
}
})
})
p3 <- lapply(p2, function(x) paste(x, collapse = "; "))
p3 <- do.call(rbind, p3)
return(p3)
}
As commented by @MrFlick, you could try the following using rle and strsplit:
with(rle(strsplit(string, ">")[[1]]), paste(lengths, values, collapse = " > "))
## [1] "3 A > 1 B > 3 C > 2 A"
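The same rle idea extends to a whole vector of strings. The following is a minimal sketch, assuming the separator is a literal string; the function name count_runs and the second input "X>Y>Y" are hypothetical:

```r
# Hypothetical helper: run-length summary for each string in a character vector
count_runs <- function(strings, sep = ">") {
  vapply(strsplit(strings, sep, fixed = TRUE), function(parts) {
    r <- rle(parts)  # lengths and values of consecutive runs
    paste(r$lengths, r$values, collapse = paste0(" ", sep, " "))
  }, character(1))
}

result <- count_runs(c("A>A>A>B>C>C>C>A>A", "X>Y>Y"))
print(result)
```

Because rle only merges adjacent equal values, the trailing run of A is reported separately from the leading one, which is the ordering effect that table() cannot provide.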
Here are two dplyr solutions: one regular and one with rle. Advantages are: can input multiple strings as a vector, builds a tidy intermediate dataset before (ugh) renesting.
library(dplyr)
library(tidyr)
library(stringi)
strings = "A>A>A>B>C>C>C>A>A"
tibble(string = strings) %>%
mutate(string_split =
string %>%
stri_split_fixed(">")) %>%
unnest(string_split) %>%
mutate(ID =
string_split %>%
lag %>%
`!=`(string_split) %>%
plyr::mapvalues(NA, TRUE) %>%
cumsum) %>%
count(string, ID, string_split) %>%
group_by(string) %>%
summarize(new_string =
paste(n,
string_split,
collapse = " > ") )
tibble(string = strings) %>%
group_by(string) %>%
do(.$string %>%
first %>%
stri_split_fixed(">") %>%
first %>%
rle %>%
unclass %>%
as.data.frame) %>%
summarize(new_string =
paste(lengths, values, collapse = " > "))
