How to remap letters in a string in R

I’d be grateful for suggestions as to how to remap letters in strings in a map-specified way.
Suppose, for instance, I want to change all As to Bs, all Bs to Ds, and all Ds to Fs. If I do it like this, it doesn’t do what I want since it applies the transformations successively:
"abc" %>% str_replace_all(c(a = "b", b = "d", d = "f"))
Here’s a way I can do what I want, but it feels a bit clunky.
f <- function (str) str_c( c(a = "b", b = "d", c = "c", d = "f") %>% .[ strsplit(str, "")[[1]] ], collapse = "" )
"abc" %>% map_chr(f)
Better ideas would be much appreciated.
James.
P.S. Forgot to specify. Sometimes I want to replace a letter with multiple letters, e.g., replace all As with the string ZZZ.
P.P.S. Ideally, this would be able to handle vectors of strings too, e.g., c("abc", "gersgaesg", etc.)

We could use chartr in base R
chartr("abc", "bdf", "abbbce")
#[1] "bdddfe"
Or a package solution would be mgsub, which can also match and replace strings longer than one character:
library(mgsub)
mgsub("abbbce", c("a", "b", "c"), c("b", "d", "f"))
#[1] "bdddfe"
mgsub("abbbce", c("a", "b", "c"), c("ba", "ZZZ", "f"))
#[1] "baZZZZZZZZZfe"

Maybe this is more elegant? It will also print a message when values aren't found.
library(plyr)
library(tidyverse)
mappings <- c(a = "b", b = "d", d = "f")
str_split("abc", pattern = "") %>%
  unlist() %>%
  mapvalues(from = names(mappings), to = mappings) %>%
  str_c(collapse = "")
# The following `from` values were not present in `x`: d
# [1] "bdc"
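If you want to stay in base R and still cover the P.S. and P.P.S. (multi-letter replacements and vectors of strings), something along these lines might work. This is only a sketch: remap_chars is a made-up helper name, and it assumes every name in the mapping is a single character. Each string is split into characters and every character is looked up in the original mapping exactly once, so the replacements never cascade.

```r
# Hypothetical helper (not from any package): simultaneous, per-character remapping.
# Assumes the names of `mapping` are single characters.
remap_chars <- function(strings, mapping) {
  vapply(strsplit(strings, "", fixed = TRUE), function(chars) {
    hit <- chars %in% names(mapping)    # characters that have a replacement
    chars[hit] <- mapping[chars[hit]]   # each character is looked up once, so no chaining
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

mapping <- c(a = "ZZZ", b = "d", d = "f")
result <- remap_chars(c("abc", "bad"), mapping)
print(result)
# [1] "ZZZdc" "dZZZf"
```

Characters without an entry in the mapping (like "c" here) are passed through unchanged, so there is no need to list identity mappings such as c = "c".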

Related

Apply filter criteria to variables that contain/start with certain string in R

I am trying to find a way to filter a data frame by criteria applied to all variables whose names contain a certain string.
In the example below, I want to find the subjects for whom any test result contains "d".
d=structure(list(ID = c("a", "b", "c", "d", "e"), test1 = c("a", "b", "a", "d", "a"), test2 = c("a", "b", "b", "a", "s"), test3 = c("b", "c", "c", "c", "d"), test4 = c("c", "d", "a", "a", "f")), class = "data.frame", row.names = c(NA, -5L))
I can use dplyr and write one by one using | which works for small examples like this but for my real data will be time consuming.
library(dplyr)
library(stringr)
d %>%
  filter(str_detect(d$test1, "d") |
         str_detect(d$test2, "d") |
         str_detect(d$test3, "d") |
         str_detect(d$test4, "d"))
the output I get shows that subjects b, d and e meet the criteria:
ID test1 test2 test3 test4
1 b b b c d
2 d d a c a
3 e a s d f
The output is what I need, but I was looking for an easier way, for example applying the filter criteria to every variable whose name contains the word "test".
I know about the contains() function in dplyr for selecting certain variables, and I tried it here, but it does not work:
d %>% filter(str_detect(contains("test"), "d"))
Is there a way to write this code differently, or is there another way to achieve the same goal?
Thank you.
In base R you can use lapply/sapply :
d[Reduce(`|`, lapply(d[-1], grepl, pattern = 'd')), ]
#d[rowSums(sapply(d[-1], grepl, pattern = 'd')) > 0, ]
# ID test1 test2 test3 test4
#2 b b b c d
#4 d d a c a
#5 e a s d f
If you are interested in a dplyr solution, you can use any of the methods below:
library(dplyr)
library(stringr)
#1.
d %>%
  filter_at(vars(starts_with('test')), any_vars(str_detect(., 'd')))
#2.
d %>%
  rowwise() %>%
  filter(any(str_detect(c_across(starts_with('test')), 'd')))
#3.
d %>%
  filter(Reduce(`|`, across(starts_with('test'), str_detect, 'd')))
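With a more recent dplyr you could also try if_any(), which was added for exactly this "any of these columns" case. A minimal sketch, assuming dplyr 1.0.4 or later (where, as far as I know, if_any() first appeared) and the same data frame d as in the question:

```r
library(dplyr)
library(stringr)

# Same example data as in the question
d <- data.frame(
  ID    = c("a", "b", "c", "d", "e"),
  test1 = c("a", "b", "a", "d", "a"),
  test2 = c("a", "b", "b", "a", "s"),
  test3 = c("b", "c", "c", "c", "d"),
  test4 = c("c", "d", "a", "a", "f")
)

# Keep rows where at least one column starting with "test" contains "d"
res <- d %>%
  filter(if_any(starts_with("test"), ~ str_detect(.x, "d")))

print(res$ID)
# [1] "b" "d" "e"
```

Swapping if_any() for if_all() would instead keep only the rows where every test column contains "d".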

Finding in which vector does the element belong to

Suppose I have 3 vectors:
a = c("A", "B", "C")
b = c("D", "E", "F")
c = c("G", "H", "I")
then I have an element:
element = "E"
I want to find which vector my element belongs to. In this case, vector b.
A general solution would be appreciated, because my real data set has more than a hundred vectors.
element = "E"
names(our_lists)[sapply(our_lists, `%in%`, x = element)]
# [1] "b"
Data
our_lists <- list(
  a = c("A", "B", "C"),
  b = c("D", "E", "F"),
  c = c("G", "H", "I")
)
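Using grep. Note that grep coerces each vector in the list to a single string, so it matches substrings as well as whole elements.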
element <- "E"
l <- mget(c("a", "b", "c"))
names(l)[grep(element, l)]
# [1] "b"
If you keep the data in individual objects, you need to check for the element in each one individually. Get them in a list.
list_data <- mget(c('a', 'b', 'c'))
names(Filter(any, lapply(list_data, `==`, element)))
#[1] "b"
If all your vectors have the same length, a vectorised approach is:
c('a', 'b', 'c')[ceiling(which(c(a, b, c) == 'E') / length(a))]
#[1] "b"
You can use dplyr::lst, which creates a named list from variable names, and then purrr::keep to keep only the vectors that contain your element.
require(tidyverse)
lst(a, b, c) %>%
  keep(~ element %in% .x) %>%
  names()
output:
[1] "b"
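If you need to look up many elements against a hundred or more vectors, it may be worth building the lookup once instead of scanning every vector each time. A rough base R sketch, assuming the vectors are already in a named list (our_lists, as above) and that each element occurs in only one vector; lookup is just an illustrative name:

```r
our_lists <- list(
  a = c("A", "B", "C"),
  b = c("D", "E", "F"),
  c = c("G", "H", "I")
)

# Named character vector: names are the elements, values are the owning list's name
lookup <- setNames(
  rep(names(our_lists), lengths(our_lists)),
  unlist(our_lists, use.names = FALSE)
)

found <- unname(lookup[c("E", "H", "A")])
print(found)
# [1] "b" "c" "a"
```

An element that is not in any vector comes back as NA, which makes missing values easy to spot.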

Count string followed by separate string occurrences in r

I am trying to count the number of occurrences of a string followed by another string in r. I cannot seem to get the regex worked out to count this correctly.
As an example:
library(stringr)
v <- c("F", "F", "C", "F", "F", "C", "F", "F")
b <- str_count(v, "F(?=C)")
I would like b to tell me how many times string F was followed by string C in vector v (which should equal 2).
I have successfully implemented str_count() to count single strings, but I cannot figure out how to count string followed by a different string.
Also, I found that in regex (?=...) should indicate "followed by"; however, this does not seem to be sufficient.
You don't have one string. You have individual strings. Here you can test if F is followed by C by shifting, using [ for subsetting.
sum(v[-length(v)] == "F" & v[-1] == "C")
#sum(v == "F" & c(v[-1] == "C", FALSE)) #Alternative
#[1] 2
To use stringr::str_count you can paste v to one string.
stringr::str_count(paste(v, collapse = ""), "F(?=C)")
#[1] 2
And for rows of a data.frame:
set.seed(42)
v <- as.data.frame(matrix(sample(c("F", "C"), 25, TRUE), 5))
stringr::str_count(apply(v, 1, paste, collapse = ""), "F(?=C)")
#[1] 1 1 2 1 1
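If you would rather not load stringr, the same lookahead can probably be counted in base R with gregexpr() and regmatches(). A small sketch; count_fc is a hypothetical helper name, and perl = TRUE is needed because the lookahead (?=...) is Perl-style syntax:

```r
# Hypothetical helper: count "F" immediately followed by "C" in each string
count_fc <- function(strings) {
  lengths(regmatches(strings, gregexpr("F(?=C)", strings, perl = TRUE)))
}

v <- c("F", "F", "C", "F", "F", "C", "F", "F")
n_fc <- count_fc(paste(v, collapse = ""))
print(n_fc)
# [1] 2

# Also vectorised over several strings, e.g. pasted data frame rows
print(count_fc(c("FFCFC", "CCCC")))
# [1] 2 0
```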
You can use lag() from dplyr:
library(dplyr)
sum(v == "C" & lag(v) == "F", na.rm = TRUE)
(The na.rm = TRUE is because the first value of lag(v) is NA).
Your comment notes that you're also interested in applying this across each row of a data frame. This can be done by pivoting the data to be longer, then applying a grouped mutate, then pivoting the data to be wider again. On an example dataset:
library(tidyr)  # for pivot_longer() and pivot_wider()
example <- tibble(id = 1:3,
                  s1 = c("F", "F", "F"),
                  s2 = c("C", "F", "C"),
                  s3 = c("C", "C", "F"),
                  s4 = c("F", "C", "C"))
example %>%
  pivot_longer(s1:s4) %>%
  group_by(id) %>%
  mutate(fc_count = sum(value == "C" & lag(value) == "F", na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value)
Result:
# A tibble: 3 x 6
     id fc_count s1    s2    s3    s4
  <int>    <int> <chr> <chr> <chr> <chr>
1     1        1 F     C     C     F
2     2        1 F     F     C     C
3     3        2 F     C     F     C
Note that this assumed the data had something like an id column that uniquely identifies each original row. If it doesn't, you can add one with mutate(id = row_number()) first.

How to use fct_relevel with mutate_at syntax

I want to relevel the factors in a dataset, however I'm really struggling with the fct_relevel syntax and using it with mutate_at. I get a series of errors about my data not being a factor.
The solution must allow me to relevel multiple factors (the actual dataset has 20-odd factors to relevel in different ways)
This answer seems like it should work, but I'm clearly not picking up the syntax properly. Where am I going wrong?
Here's an example:
library(tidyverse)
dat <- tibble(x1 = c("b", "b", "a", "c", "b"),
              x2 = c("c", "b", "c", "a", "a"),
              y = c(10, 5, 12, 3, 4)) %>%
  mutate_at(.vars = vars(x1:x2), factor)
I'm definitely dealing with factors
sapply(dat, class)
But I can't relevel x1; I receive the following error: f must be a factor (or character vector)
dat %>% fct_relevel(x1, "c", "b", "a")
And this is what I ideally want to be able to do
dat2 <- dat %>%
  mutate_at(.vars = vars(x1:x2),
            .funs = fct_relevel("c", "b", "a"))
At the moment that final step is giving me the following errors:
Error: Can't create call to non-callable object
Call rlang::last_error() to see a backtrace
In addition: Warning message:
Unknown levels in f: b, a
I'd be really grateful for anyone pointing out what I'm sure is an obvious mistake.
This should work
library(dplyr)
library(forcats)
dat <- dat %>% mutate_at(vars(x1:x2), ~fct_relevel(., c("c", "b", "a")))
dat$x1
#[1] b b a c b
#Levels: c b a
dat$x2
#[1] c b c a a
#Levels: c b a
We can specify it with
library(forcats)
dat <- dat %>%
  mutate_at(.vars = vars(x1:x2),
            fct_relevel, c("c", "b", "a"))
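For what it's worth, the two attempts in the question fail for different reasons: dat %>% fct_relevel(x1, ...) passes the whole tibble as the factor argument, and .funs = fct_relevel("c", "b", "a") calls fct_relevel straight away with "c" as the factor instead of handing mutate_at a function. Since mutate_at is superseded in current dplyr, the same idea could also be written with across(). A minimal sketch, assuming dplyr 1.0.0 or later and the example data from the question:

```r
library(dplyr)
library(forcats)

dat <- tibble(x1 = c("b", "b", "a", "c", "b"),
              x2 = c("c", "b", "c", "a", "a"),
              y  = c(10, 5, 12, 3, 4)) %>%
  mutate(across(x1:x2, factor))

# The lambda receives each selected column in turn as .x
dat2 <- dat %>%
  mutate(across(x1:x2, ~ fct_relevel(.x, "c", "b", "a")))

print(levels(dat2$x1))
# [1] "c" "b" "a"
print(levels(dat2$x2))
# [1] "c" "b" "a"
```

If different factors need different orders, one across() call per group of columns that shares an order should do it.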

How to paste vector elements comma-separated and in quotation marks?

I want to select columns of data frame dfr by their names in a certain order, which I first obtain using their numeric positions.
> (x <- names(dfr)[c(3, 4, 2, 1, 5)])
[1] "c" "d" "b" "a" "e"
The final code should include only the names version, because it's safer.
dfr[, c("c", "d", "b", "a", "e")]
I want to paste the elements separated with commas and quotation marks into a string, in order to include it into the final code. I've tried a few options, but they don't give me what I want:
> paste(x, collapse='", "')
[1] "c\", \"d\", \"b\", \"a\", \"e"
> paste(x, collapse="', '")
[1] "c', 'd', 'b', 'a', 'e"
I need something like "'c', 'd', 'b', 'a', 'e'"; of course "c", "d", "b", "a", "e" would be much nicer.
Data
dfr <- setNames(data.frame(matrix(1:15, 3, 5)), letters[1:5])
dput(x) is the correct answer, but in case you were wondering how to achieve this by modifying your existing code, you could do something like the following:
cat(paste0('c("', paste(x, collapse='", "'), '")'))
c("c", "d", "b", "a", "e")
This can also be done with packages (as Tung has shown); here is an example using glue:
library(glue)
glue('c("{v}")', v = glue_collapse(x, '", "'))
c("c", "d", "b", "a", "e")
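Since dput(x) only prints, it may help to know that deparse(x) returns the same text as a character string you can store or paste elsewhere. A short base R sketch using the x from the question; as_code and just_items are only illustrative names, and for long vectors deparse() can split the result over several strings, so this assumes a short vector:

```r
x <- c("c", "d", "b", "a", "e")

# The vector as R code, held in a single string
as_code <- deparse(x)
cat(as_code, sep = "\n")
# c("c", "d", "b", "a", "e")

# Only the quoted, comma-separated items, without the c( ) wrapper
just_items <- paste0('"', x, '"', collapse = ", ")
cat(just_items, sep = "\n")
# "c", "d", "b", "a", "e"
```

Parsing as_code back with eval(parse(text = as_code)) gives the original vector again, which is a quick way to check the generated code.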
Try the vector_paste() function from the datapasta package:
library(datapasta)
vector_paste(input_vector = letters[1:3])
#> c("a", "b", "c")
vector_paste_vertical(input_vector = letters[1:3])
#> c("a",
#> "b",
#> "c")
Or, using base R, this gives you what you want:
(x <- letters[1:3])
q <- "\""
( y <- paste0("c(", paste(paste0(q, x, q), collapse = ", ") , ")" ))
[1] "c(\"a\", \"b\", \"c\")"
Though I'm not really sure why you want it. Surely you can simply subset like this:
df <- data.frame(a=1:3, b = 1:3, c = 1:3)
df[ , x]
a b c
1 1 1 1
2 2 2 2
3 3 3 3
df[ , rev(x)]
c b a
1 1 1 1
2 2 2 2
3 3 3 3
Suppose you want to add a quotation mark in front of and at the end of a text and save it as an R object. Use the capture.output function from the utils package.
Example: I want ABCDEFG to be saved as an R object as "ABCDEFG".
> cat("ABCDEFG")
ABCDEFG
> cat("\"ABCDEFG\"")
"ABCDEFG"
# To save the output of cat as an R object, including the quotation marks at the start and end of the word, use capture.output
> add_quote <- capture.output(cat("\"ABCDEFG\""))
> add_quote
[1] "\"ABCDEFG\""
