How to disambiguate repeated strings by appending varying length strings? - r

I saw the clever code submitted by Gabor G. in response to this question about disambiguation of strings. His answer, slightly modified, is:
uniqName <- function(x){
  thenames <- ave(x, x, FUN = function(z){
    znam <- if (length(z) == 1) z else sprintf("%s%02d", z, seq_along(z))
    return(znam)
  })
  return(thenames)
}
I wanted an "invisible" version of that, and tried to come up with a compact function that would append N spaces to the Nth occurrence of a name.
(Gabor's code formats an integer as a two-digit suffix and appends that, so the number of characters appended is constant.) The best I could do was the following clunky function ("fatit"):
spacify <- function(x){
  fatit <- function(x){
    k <- vector(length = length(x))
    for (jp in 1:length(x)) {
      k[jp] <- sprintf('%s%s', x[jp], paste0(rep(' ', jp), collapse = ''))
    }
    return(k)
  }
  spaceOut <- ave(x, x, FUN = function(z) if (length(z) == 1) z else fatit(z))
  return(spaceOut)
}
Is there a cleaner, more compact way to set the number of characters to append, based on each element's position in z, in the fatit function?
Note, with foo <- c('a', 'b', 'c', 'a', 'b', 'a', 'c', 'd', 'e'):
uniqName(foo)
[1] "a01" "b01" "c01" "a02" "b02" "a03" "c02" "d" "e"
spacify(foo)
[1] "a " "b " "c " "a  " "b  " "a   " "c  " "d" "e"

We can take advantage of make.unique by stripping the numbers it appends to make the strings unique, and using them (+ 1) as the number of spaces to append. Unlike spacify, this also appends one space to strings that occur only once (d and e):
i1 <- as.numeric(gsub('\\D+', '', make.unique(x)))  # assumes x itself contains no digits
i1[is.na(i1)] <- 0  # first occurrences get no number, so as.numeric("") returns NA
paste0(x, sapply(i1 + 1, function(i) paste(rep(' ', each = i), collapse = '')))
#[1] "a " "b " "c " "a  " "b  " "a   " "c  " "d " "e "
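A sketch that skips the digit-stripping step, assuming R >= 3.3.0 for strrep(). gsub('\\D+', ...) gives wrong counts if the strings themselves contain digits (e.g. "a1"), and the make.unique() route also pads singletons. Computing the occurrence index directly with ave() avoids both; the helper name pad_dups is hypothetical.

```r
# Hypothetical helper: pad repeated strings by their occurrence number
pad_dups <- function(x) {
  # Running occurrence count within each value: 1, 2, 3, ...
  occ <- ave(seq_along(x), x, FUN = seq_along)
  # How many times each value occurs in total
  tot <- ave(seq_along(x), x, FUN = length)
  # Append occ spaces to repeated values; leave singletons unchanged
  ifelse(tot > 1, paste0(x, strrep(" ", occ)), x)
}

x <- c('a', 'b', 'c', 'a', 'b', 'a', 'c', 'd', 'e')
res <- pad_dups(x)
# Strings that contain digits are handled the same way
res_digits <- pad_dups(c("a1", "a1", "b2"))
```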

We can take advantage of the stri_pad_right function from stringi:
library(stringi)
f <- function(x){
  ave(x, x, FUN = function(z){
    if (length(z) == 1) z else stri_pad_right(z, nchar(z[1]) + seq_along(z))
  })
}
x <- c('a', 'b', 'c', 'a', 'b', 'a', 'c', 'd', 'e')
f(x)
# [1] "a " "b " "c " "a  " "b  " "a   " "c  " "d" "e"
Using stringr::str_pad(..., side = 'right') is conceptually similar.
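If you would rather avoid a package dependency, base R's strrep() (available since R 3.3.0) is vectorized over its times argument, so it can replace the whole fatit loop. A minimal sketch; spacify2 is a hypothetical name:

```r
spacify2 <- function(x) {
  # Within each group of identical strings append 1, 2, 3, ... spaces;
  # strings that occur only once are returned unchanged
  ave(x, x, FUN = function(z) {
    if (length(z) == 1) z else paste0(z, strrep(" ", seq_along(z)))
  })
}

foo <- c('a', 'b', 'c', 'a', 'b', 'a', 'c', 'd', 'e')
out <- spacify2(foo)
# The padding is invisible when printed, so nchar() makes it visible
nchar(out)
```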

Related

Shortest way to remove duplicate words from string

I have this string:
x <- c("A B B C")
[1] "A B B C"
I am looking for the shortest way to get this:
[1] "A B C"
I have tried this:
Removing duplicate words in a string in R
paste(unique(x), collapse = ' ')
[1] "A B B C"
# does not work
Background:
In a data frame column, I want to count only the unique words.
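A regex-based approach could be shorter: match a run of non-whitespace (\\S+) followed by a whitespace character (\\s), capture it, require one or more occurrences of the backreference right after it, and in the replacement use the backreference to keep only a single copy. Note that this only collapses adjacent repeats, and only when each repeated word is followed by whitespace, so a duplicate at the very end of the string (e.g. "A B B") is not removed.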
gsub("(\\S+\\s)\\1+", "\\1", x)
[1] "A B C"
Or you can split the string with strsplit, unlist the result, take the unique words, and paste them back together:
paste(unique(unlist(strsplit(x, " "))), collapse = " ")
# [1] "A B C"
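Since the background is counting unique words in a data frame column, the same split-then-unique idea works on a whole column at once. A sketch with hypothetical example data; lengths() needs R >= 3.2.0:

```r
# Hypothetical data frame with one sentence per row
df <- data.frame(txt = c("A B B C", "x x x", "the cat and the hat"),
                 stringsAsFactors = FALSE)

# Split every row on runs of whitespace (trim first so leading spaces
# don't produce an empty "" word)
words <- strsplit(trimws(df$txt), "\\s+")

# Number of unique words per row
df$n_unique <- lengths(lapply(words, unique))

# De-duplicated text per row, keeping the first occurrence of each word
df$dedup <- vapply(words, function(w) paste(unique(w), collapse = " "), character(1))
df
```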
Another possible solution, based on stringr::str_split (note that it returns the unique words as a vector rather than a single string):
library(tidyverse)
str_split(x, " ") %>% unlist %>% unique
#> [1] "A" "B" "C"
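In case the duplicates are not adjacent, you can also use gsub with a lookahead. It removes every word that appears again later in the string, so the last occurrence of each word is the one kept: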
x <- c("A B B C")
gsub("\\b(\\S+)\\s+(?=.*\\b\\1\\b)", "", x, perl=TRUE)
#[1] "A B C"
gsub("\\b(\\S+)\\s+(?=.*\\b\\1\\b)", "", "A B B A ABBA", perl=TRUE)
#[1] "B A ABBA"
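You can also use the following, which collapses adjacent repeats separated by any non-word characters, including a repeat at the end of the string: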
gsub("\\b(\\w+)(?:\\W+\\1\\b)+", "\\1", x)
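The approaches above differ when the duplicates are not adjacent. A small comparison on a hypothetical input "A B A":

```r
x2 <- "A B A"   # duplicate words that are not adjacent

# split + unique keeps the first occurrence of each word
first_kept <- paste(unique(strsplit(x2, " ")[[1]]), collapse = " ")   # "A B"

# the lookahead regex keeps the last occurrence
last_kept <- gsub("\\b(\\S+)\\s+(?=.*\\b\\1\\b)", "", x2, perl = TRUE)   # "B A"

# the backreference regex only collapses adjacent repeats, so nothing changes
adjacent_only <- gsub("(\\S+\\s)\\1+", "\\1", x2)   # "A B A"
```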

Combination of two lists with partial string matching (in R)

I am trying to find all the combinations of two lists; however, the second list is essentially a repetition of the first list's variables with added brackets etc., as shown below.
other_cols <- c("C", "D", "E", "F")
other_colsRnd <- c("(1|C)", "(1|D)", "(1|E)", "(1|F)")
# I have some code to do combinations from one list:
combos = do.call(c, lapply(seq_along(other_cols), function(y) {
arrangements::combinations(other_cols, y, layout = "l")
}))
theBigList = sapply(combos, paste, collapse = " + ")
> theBigList
[1] "C" "D" "E" "F" "C + D" "C + E" "C + F" "D + E" "D + F"
[10] "E + F" "C + D + E" "C + D + F" "C + E + F" "D + E + F" "C + D + E + F"
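For comparison, the same single-list result can be built without the arrangements package, using base R's combn(), which passes extra arguments (here collapse) on to FUN. This is a swapped-in base R sketch, not the code from the question:

```r
other_cols <- c("C", "D", "E", "F")

# For each subset size k, combn() applies paste() to every k-combination;
# simplify = FALSE returns a list of strings that unlist() flattens
theBigList <- unlist(lapply(seq_along(other_cols), function(k) {
  combn(other_cols, k, paste, collapse = " + ", simplify = FALSE)
}))
theBigList
```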
I would like the full list of combinations in theBigList of both of them combined, without any combination that contains both C and (1|C).
########
edit
C or D etc. are shorthand versions of the "real" variables, which look more like:
other_cols <- c("Charlie", "Delta", "Echo", "Foxtrot")
other_colsRnd <- c("(1|Charlie)", "(1|Delta)", "(1|Echo)", "(1|Foxtrot)")
########
The expected outcome is something like this, though stored order will not be important.
theBigList
"C" "(1|C)" "D" "(1|D)" "E" "(1|E)" "F" "(1|F)" "C + D"
"C + (1|D)" "C + E" "C + (1|E)" "C + F" "C + (1|F)"
"D + E" "D + (1|E)" "D + F" "D + (1|F)"
"E + F" "E + (1|F)"
"C + D + E" "(1|C) + D + E" "(1|C) + (1|D) + E" "(1|C) + (1|D) + (1|E)" etc.
Is there a way to put the lapply inside the lapply?
Or, I am currently thinking I can compute combosRnd, e.g.
combosRnd = do.call(c, lapply(seq_along(other_cols), function(y) {
arrangements::combinations(other_colsRnd, y, layout = "l")
}))
and then take inspiration from here, using var_comb <- expand.grid(combos, combosRnd) with some sort of if and grep to detect the "same" variables, which I haven't worked out yet.
edit
I think I can combine the combos, e.g. something like
theBigList = sapply(combos, paste, collapse = " + ")
theBigListRnd = sapply(combosRnd, paste, collapse = " + ")
comboBigList = c(theBigList, theBigListRnd)
var_comb <- expand.grid(combos, combosRnd)
var_comb2 <- expand.grid(theBigList, theBigListRnd)
... so comboBigList has all the ones where there is no crossover whatsoever, and then I can remove any rows in either var_comb or var_comb2 whose two columns share a variable.
Yes, this is a smaller easier chunk of my previously asked question here, however I have refined it to the bare necessity for me to get this infernal analysis done, as it seems that I may have been biting off more than I can chew on that one. I will brute force the nestings I need with this as a supplement (hopefully).
Why not combine other_cols and other_colsRnd and use the same code that you have?
combine_vec <- c(other_cols, other_colsRnd)
combos <- do.call(c, lapply(seq_along(combine_vec), function(y) {
arrangements::combinations(combine_vec, y, layout = "l")
}))
theBigList = sapply(combos, paste, collapse = " + ")
theBigList
# [1] "C"
# [2] "D"
# [3] "E"
# [4] "F"
# [5] "(1|C)"
# [6] "(1|D)"
# [7] "(1|E)"
# [8] "(1|F)"
# [9] "C + D"
# [10] "C + E"
# [11] "C + F"
# [12] "C + (1|C)"
#...
#...
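From this theBigList you can drop every combination that contains both a variable and its (1|variable) form using the following code. Note that the regexes assume single-letter variable names, as in the shorthand: with the real names, e.g. "Charlie + (1|Delta)", the first pattern extracts only the leading letters C and D, so valid combinations like that one would be dropped too.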
library(stringr)
finalList <- theBigList[!mapply(function(x, y) any(x %in% y) || any(y %in% x),
str_extract_all(theBigList, '\\b[A-Z](?!\\))'),
str_extract_all(theBigList, '(?<=1\\|)[A-Z]'))]
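A regex-free sketch that also works for the real multi-letter names: treat each variable as absent, fixed, or random (three choices), enumerate every choice with base R's expand.grid(), and paste the non-empty terms. This swaps arrangements::combinations for expand.grid(); the name choices is illustrative. It should give 3^4 - 1 = 80 terms, none containing both forms of a variable, in a different order than the question's listing (which the question says does not matter).

```r
# Real variable names from the edit; the random-effect terms are built from them
other_cols    <- c("Charlie", "Delta", "Echo", "Foxtrot")
other_colsRnd <- paste0("(1|", other_cols, ")")

# For each variable: absent (""), fixed, or random, so it can never appear twice
choices <- lapply(seq_along(other_cols), function(i) {
  c("", other_cols[i], other_colsRnd[i])
})

# One row per way of choosing (3^4 = 81 rows, including the all-absent row)
grid <- expand.grid(choices, stringsAsFactors = FALSE)

# Paste the non-empty terms of each row together with " + "
theBigList <- apply(grid, 1, function(row) paste(row[row != ""], collapse = " + "))
theBigList <- unname(theBigList[theBigList != ""])   # drop the empty combination
length(theBigList)
```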

Combining vector elements with paste

I have two vectors:
old <- c("a", "b", "c")
new <- c("1", "2", "3")
I want to combine them, so that the elements of the new vector var_names are 'a' = '1', 'b' = '2', 'c' = '3'.
I tried something like this:
for (i in length(new)){
var_names[i] <- paste(old[i], "=", new[i])
}
But it is not working properly. What am I doing wrong here?
EDIT
I was a bit unclear about this. What I am trying to achieve is:
var_names<- c('a' = '1',
'b' = '2',
'c' = '3')
Reason: https://vincentarelbundock.github.io/modelsummary/articles/modelsummary.html#coef-map
If you specifically want quotes around each element of old and new:
paste0("'",old,"'"," = ","'",new,"'")
[1] "'a' = '1'" "'b' = '2'" "'c' = '3'"
If you want it all in one string
paste0("'",old,"'"," = ","'",new,"'",collapse=", ")
[1] "'a' = '1', 'b' = '2', 'c' = '3'"
Edit: regarding your edit, did you mean this?
names(new) <- old
new
a b c
"1" "2" "3"
Update
According to your update, you can use setNames
> setNames(new, old)
a b c
"1" "2" "3"
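To confirm that these answers produce exactly the object written out in the edit, a quick check; the names v1, v2, v3 are illustrative:

```r
old <- c("a", "b", "c")
new <- c("1", "2", "3")

# Three ways to build the same named character vector
v1 <- c('a' = '1', 'b' = '2', 'c' = '3')   # written out by hand
v2 <- setNames(new, old)                     # names taken from old
v3 <- new
names(v3) <- old                             # replacement-function form

identical(v1, v2) && identical(v2, v3)
v2[["b"]]   # look up a value by its name
```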
There are two errors in your code:
You didn't initialize the vector var_names before assigning into it.
In the for loop, you should use 1:length(new); for (i in length(new)) runs only once, with i = 3.
Below is a corrected version that works:
var_names <- c()
for (i in 1:length(new)) {
var_names[i] <- paste(old[i], "=", new[i])
}
and you will see
> var_names
[1] "a = 1" "b = 2" "c = 3"
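If you do keep a loop, a slightly more defensive sketch preallocates the result and uses seq_along(), which behaves correctly for empty input where 1:length() does not:

```r
old <- c("a", "b", "c")
new <- c("1", "2", "3")

# Preallocate a character vector of the right length instead of growing c()
var_names <- character(length(new))
for (i in seq_along(new)) {
  var_names[i] <- paste(old[i], "=", new[i])
}

# For an empty vector, 1:length(x) is c(1, 0), while seq_along(x) is empty
empty <- character(0)
bad_idx  <- 1:length(empty)
good_idx <- seq_along(empty)
```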
A simpler, vectorized way to achieve the same result is to use paste or paste0, e.g.,
> paste(old, new, sep = " = ")
[1] "a = 1" "b = 2" "c = 3"
> paste0(old, " = ", new)
[1] "a = 1" "b = 2" "c = 3"

R loop over two or more vectors simultaneously - parallel

I am looking for a way to iterate over two or more character vectors/lists in R simultaneously. Is there some way to do something like:
foo <- c('a','c','d')
bar <- c('aa','cc','dd')
for(i in o){
print(o[i], p[i])
}
Desired result:
'a', 'aa'
'c', 'cc'
'd', 'dd'
In Python we can do simply:
foo = ('a', 'c', 'd')
bar = ('aa', 'cc', 'dd')
for i, j in zip(foo, bar):
    print(i, j)
But can we do this in R?
Like this?
foo <- c('a','c','d')
bar <- c('aa','cc','dd')
for (i in 1:length(foo)){
print(c(foo[i],bar[i]))
}
[1] "a" "aa"
[1] "c" "cc"
[1] "d" "dd"
Works under the condition that the vectors are the same length.
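In R, you typically iterate over the indices rather than over the vectors directly, stopping at the shorter length if they differ. Note that print() prints only its first argument, so combine the pair with c() first:
for (i in seq_len(min(length(foo), length(bar)))) {
  print(c(foo[i], bar[i]))
}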
Another option is to use mapply. This wouldn't make a lot of sense for printing, but I'm assuming you have an interest in doing this for something more interesting than print.
foo <- c('a','c','d')
bar <- c('aa','cc','dd')
invisible(
mapply(function(f, b){ print(c(f, b))},
foo, bar)
)
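A closer analogue of Python's zip() is Map(), which is mapply() with SIMPLIFY = FALSE and accepts any number of vectors. A sketch with a hypothetical third vector baz; note that unlike zip(), which truncates to the shortest input, Map() recycles shorter inputs:

```r
foo <- c('a', 'c', 'd')
bar <- c('aa', 'cc', 'dd')
baz <- c('aaa', 'ccc', 'ddd')   # hypothetical third vector

# Walk all vectors in parallel; each list element holds one "tuple"
pairs <- Map(c, foo, bar, baz)

# Iterate over the zipped tuples with an ordinary for loop
for (p in pairs) {
  cat(p, sep = ", ")
  cat("\n")
}
```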
For readers arriving from the title: the following loop recycles the shorter vector bar, starting over from its first element whenever it runs out:
foo <- LETTERS[1:10]
bar <- LETTERS[1:3]
i <- 0
for (j in 1:length(foo)) {
  i <- i + 1
  if (i > length(bar)) {
    i <- 1
  }
  print(paste(foo[j], bar[i]))
}
[1] "A A"
[1] "B B"
[1] "C C"
[1] "D A"
[1] "E B"
[1] "F C"
[1] "G A"
[1] "H B"
[1] "I C"
[1] "J A"
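which is roughly equivalent to the mapply call below. mapply also recycles bar, but warns because 10 is not a multiple of 3, hence suppressWarnings; the for loop version makes assignments easier:
suppressWarnings(invisible(
  mapply(function(x, y) {
    print(paste(x, y))
  }, foo, bar)
))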
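The manual counter can also be replaced by index arithmetic, or the whole thing vectorized with rep_len(), which recycles without a warning. A minimal sketch:

```r
foo <- LETTERS[1:10]
bar <- LETTERS[1:3]

# Index arithmetic wraps j back to 1 after length(bar); no counter needed
for (j in seq_along(foo)) {
  print(paste(foo[j], bar[(j - 1) %% length(bar) + 1]))
}

# Vectorized: recycle bar to the length of foo, then paste element-wise
out <- paste(foo, rep_len(bar, length(foo)))
```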

Pasting two strings using paste function and its collapse argument

I am trying to paste two vectors
vector_1 <- c("a", "b")
vector_2 <- c("x", "y")
paste(vector_1, vector_2, collapse = " + ")
The output I get is
"a x + b y"
My desired output is
"a + b + x + y"
paste with more than one vector argument pastes them together term-by-term.
> paste(c("a","b","c"),c("A","B","C"))
[1] "a A" "b B" "c C"
The result has the length of the longest vector, with shorter vectors recycled. That enables things like this to work:
> paste("A",c("1","2","BBB"))
[1] "A 1" "A 2" "A BBB"
> paste(c("1","2","BBB"),"A")
[1] "1 A" "2 A" "BBB A"
Then sep is used to join the terms within each element, and collapse to join the resulting elements into one string.
> paste(c("a","b","c"),c("A","B","C"))
[1] "a A" "b B" "c C"
> paste(c("a","b","c"),c("A","B","C"),sep="+")
[1] "a+A" "b+B" "c+C"
> paste(c("a","b","c"),c("A","B","C"),sep="+",collapse="#")
[1] "a+A#b+B#c+C"
Note that once you use collapse you get a single result rather than three.
You don't want to combine your two vectors element-wise, so you need to turn them into one vector, which you can do with c(), giving the solution:
> c(vector_1, vector_2)
[1] "a" "b" "x" "y"
> paste(c(vector_1, vector_2), collapse=" + ")
[1] "a + b + x + y"
Note that sep isn't needed - you are just collapsing the individual elements into one string.
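Side by side, the question's call and the fix produce different strings because of that element-wise step:

```r
vector_1 <- c("a", "b")
vector_2 <- c("x", "y")

# Two vector arguments: pasted element-wise ("a x", "b y"), then collapsed
elementwise <- paste(vector_1, vector_2, collapse = " + ")   # "a x + b y"

# One combined vector: only collapse applies
combined <- paste(c(vector_1, vector_2), collapse = " + ")   # "a + b + x + y"
```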

Resources