For loop: How to print the remaining sequence? - r

I have a vector:
vector <- c("A", "B", "C")
And I want to print the following:
[1] A then B and C
[1] B then A and C
[1] C then A and B
I have been working with a for loop. However, I can't figure out how to print the remaining elements separated by 'and'.
for(i in vector){
print(paste(i, "then", XXX))
}
I guess something needs to be added where I wrote XXX?

You can use paste with collapse = " and " to join the remaining elements, and drop the current element from vector with a negative index ([-i]) in your for loop.
for(i in seq_along(vector)) {
print(paste0(vector[i], " then ", paste(vector[-i], collapse = " and ")))
}
#[1] "A then B and C"
#[1] "B then A and C"
#[1] "C then A and B"
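The same result can also be built without an explicit print loop. This is only a sketch of one way to do it: the name `lines` is made up for illustration, and `quote = FALSE` is used because the output requested in the question has no quotes around the strings.

```r
vector <- c("A", "B", "C")

# Build every line at once; vapply() checks that each result is one string.
lines <- vapply(
  seq_along(vector),
  function(i) paste(vector[i], "then", paste(vector[-i], collapse = " and ")),
  character(1)
)

# quote = FALSE drops the surrounding quotes when printing.
for (line in lines) print(line, quote = FALSE)
```
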

You can use setdiff to find the remaining vector and then paste with collapse= to put that whole vector together into some text:
for(i in vector){
remaining.elements.vector <- setdiff(vector, i)
remaining.elements.text <- paste(remaining.elements.vector, collapse=' and ')
print(paste(i, "then", remaining.elements.text))
}
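One difference from indexing with [-i] is worth knowing: setdiff works on values, so it removes every copy of the current element and de-duplicates the rest. A small sketch with a hypothetical input that contains a repeated value:

```r
vector <- c("A", "B", "A")   # hypothetical input with a duplicate

by_position <- vector[-1]            # drops only the first element
by_value    <- setdiff(vector, "A")  # drops every "A" and de-duplicates

res_position <- paste("A then", paste(by_position, collapse = " and "))
res_value    <- paste("A then", paste(by_value, collapse = " and "))

res_position  # "A then B and A"
res_value     # "A then B"
```

If the vector never contains duplicates, both approaches give the same output.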

Related

Extract matching words from strings in order

If I have two strings that look like this:
x <- "Here is a test of words and stuff."
y <- "Here is a better test of words and stuff."
Is there an easy way to check the words from left to right and create a new string of matching words and then stop when the words no longer match so the output would look like:
> "Here is a"
I don't want to find all matching words between the two strings but rather just the words that match in order. So "words and stuff." is in both string but I don't want that to be selected.
Split the strings, compute the minimum of the length of the two splits, take that number of words from the head of each and append a FALSE to ensure a non-match can occur when matching the corresponding words. Then use which.min to find the first non-match and take that number minus 1 of the words and paste back together.
L <- strsplit(c(x, y), " +")
wx <- which.min(c(do.call(`==`, lapply(L, head, min(lengths(L)))), FALSE))
paste(head(L[[1]], wx - 1), collapse = " ")
## [1] "Here is a"
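For readers who find the one-liner dense, here is the same idea unrolled step by step. The intermediate names (`n`, `same`, `first_mismatch`) are only for illustration and are not part of the answer above.

```r
x <- "Here is a test of words and stuff."
y <- "Here is a better test of words and stuff."

L <- strsplit(c(x, y), " +")   # list with the words of x and of y
n <- min(lengths(L))           # compare only as many words as both have

# Word-by-word comparison over the first n words
same <- head(L[[1]], n) == head(L[[2]], n)

# Appending FALSE guarantees which.min() finds a "mismatch",
# even if all n words are equal
first_mismatch <- which.min(c(same, FALSE))

res <- paste(head(L[[1]], first_mismatch - 1), collapse = " ")
res
```
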
This shows you the first n words that match:
xvec <- strsplit(x, " +")[[1]]
yvec <- strsplit(y, " +")[[1]]
(len <- min(c(length(xvec), length(yvec))))
# [1] 8
i <- which.max(cumsum(head(xvec, len) != head(yvec, len)))
list(xvec[1:i], yvec[1:i])
# [[1]]
# [1] "Here" "is" "a" "test" "of" "words" "and" "stuff."
# [[2]]
# [1] "Here" "is" "a" "better" "test" "of" "words" "and"
cumsum(head(xvec, len) != head(yvec, len))
# [1] 0 0 0 1 2 3 4 5
i <- which.max(cumsum(head(xvec, len) != head(yvec, len)) > 0)
list(xvec[1:(i-1)], yvec[1:(i-1)])
# [[1]]
# [1] "Here" "is" "a"
# [[2]]
# [1] "Here" "is" "a"
From here, we can easily derive the leading string:
paste(xvec[1:(i-1)], collapse = " ")
# [1] "Here is a"
and the remaining strings with
paste(xvec[-(1:(i-1))], collapse = " ")
# [1] "test of words and stuff."
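One caveat with the which.max(... > 0) step: when there is no mismatch at all, or when the very first word differs, i is 1 and xvec[1:(i-1)] becomes xvec[1:0], which returns the first word rather than the expected result. A possible way around this, sketched here with a hypothetical helper name (`common_prefix_words`), is to count the leading matches with cumprod:

```r
common_prefix_words <- function(x, y) {
  xvec <- strsplit(x, " +")[[1]]
  yvec <- strsplit(y, " +")[[1]]
  len <- min(length(xvec), length(yvec))
  same <- head(xvec, len) == head(yvec, len)
  # cumprod() stays 1 while words match and drops to 0 at the first
  # mismatch, so its sum is the number of leading matching words
  n <- sum(cumprod(same))
  paste(head(xvec, n), collapse = " ")
}

x <- "Here is a test of words and stuff."
y <- "Here is a better test of words and stuff."

common_prefix_words(x, y)
common_prefix_words(x, x)                            # no mismatch: whole string
common_prefix_words(x, "This string doesn't match")  # mismatch at word 1: ""
```
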
I wrote a function which will check the string and return the desired output:
x <- "Here is a test of words and stuff."
y <- "Here is a better test of words and stuff."
z <- "This string doesn't match"
library(purrr)
check_str <- function(inp, pat, delimiter = "\\s") {
inp <- unlist(strsplit(inp, delimiter))
pat <- unlist(strsplit(pat, delimiter))
ln_diff <- length(inp) - length(pat)
if (ln_diff < 0) {
inp <- append(inp, rep("", abs(ln_diff)))
}
if (ln_diff > 0) {
pat <- append(pat, rep("", abs(ln_diff)))
}
idx <- map2_lgl(inp, pat, ~ identical(.x, .y))
rle_idx <- rle(idx)
if (rle_idx$values[1]) {
idx2 <- seq_len(rle_idx$lengths[1])
} else {
idx2 <- 0
}
paste0(inp[idx2], collapse = delimiter)
}
check_str(x, y, " ")
#> [1] "Here is a"
check_str(x, z, " ")
#> [1] ""
Created on 2023-02-13 with reprex v2.0.2
You could write a helper function to do the check for you
common_start<-function(x, y) {
i <- 1
last <- NA
while (i <= nchar(x) && i <= nchar(y)) {
if (substr(x,i,i) == substr(y,i,i)) {
if (grepl("[[:space:][:punct:]]", substr(x,i,i), perl=TRUE)) {
last <- i
}
} else {
break;
}
i <- i + 1
}
if (!is.na(last)) {
substr(x, 1, last-1)
} else {
NA
}
}
and use that with your sample strings
common_start(x,y)
# [1] "Here is a"
The idea is to check every character, keeping track of the last non-word character that still matches. Using a while loop may not be fancy but it does mean you get to break early without processing the whole string as soon as a mismatch is found.
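For comparison, the same character-level idea can be written without a while loop. This is a rough sketch, not a replacement for the function above: `common_start_chars` is a hypothetical name, and unlike the loop it always splits both strings completely, so it gives up the early exit.

```r
common_start_chars <- function(x, y) {
  cx <- strsplit(x, "")[[1]]
  cy <- strsplit(y, "")[[1]]
  n <- min(length(cx), length(cy))
  same <- cx[seq_len(n)] == cy[seq_len(n)]
  k <- sum(cumprod(same))  # length of the common character prefix
  # positions of space/punctuation characters inside that prefix
  seps <- grep("[[:space:][:punct:]]", cx[seq_len(k)])
  if (length(seps) == 0) return(NA_character_)
  substr(x, 1, max(seps) - 1)
}

x <- "Here is a test of words and stuff."
y <- "Here is a better test of words and stuff."
common_start_chars(x, y)
```
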

How to paste together all the objects in an environment in R?

This seems like a simple question, but I can't find a solution. I want to take all of the objects (character vectors) in my environment and use them as arguments in a paste function. But the catch is I want to do so without specifying them all individually.
a <- "foo"
b <- "bar"
c <- "baz"
z <- paste(a, b, c, sep = " ")
z
[1] "foo bar baz"
I imagine that something like ls() could do this, but obviously the following returns the object names,
z <- paste(ls(), collapse = " ")
z
[1] "a b c"
not "foo bar baz", which is what I want.
We can use mget to return the values of the objects in a list and then with do.call paste them into a single string
do.call(paste, c(mget(ls()), sep= " "))
Since sep defaults to " " in paste, we can omit it:
do.call(paste, mget(ls()))
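Two things to keep in mind with this approach: ls() returns the names in sorted order, and it picks up everything in the environment, including objects you did not mean to paste. A minimal sketch that sidesteps both surprises by using a dedicated environment (the name `e` is just an illustration); unname() is used so that an object that happens to be called sep or collapse is not mistaken for a paste argument.

```r
e <- new.env()
e$a <- "foo"
e$b <- "bar"
e$c <- "baz"

vals <- mget(ls(e), envir = e)      # named list of the values, sorted by name
z <- do.call(paste, unname(vals))   # same as paste("foo", "bar", "baz")
z

# Alternative when every object is a single string
z2 <- paste(unlist(vals), collapse = " ")
```
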

Insert blank space between letters of word

I'm trying to create a function able to return various versions of the same string but with blank spaces between the letters.
something like:
input <- "word"
returning:
w ord
wo rd
wor d
We first break the string into every character using strsplit. We then append an empty space at every position using sapply.
input <- "word"
input_break <- strsplit(input, "")[[1]]
c(input, sapply(seq(1,nchar(input)-1), function(x)
paste0(append(input_break, " ", x), collapse = "")))
#[1] "word" "w ord" "wo rd" "wor d"
?append gives us append(x, values, after = length(x))
where x is the vector, values is what to insert (here " "), and after is the position after which the values are inserted.
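A single call shows what append does here before it is wrapped in sapply. The variable name `letters_vec` is only for illustration.

```r
letters_vec <- c("w", "o", "r", "d")

# Insert a space after the second element
with_space <- append(letters_vec, " ", after = 2)
with_space

# Collapse back into one string
joined <- paste0(with_space, collapse = "")
joined
```
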
Here is an option using sub
sapply(seq_len(nchar(input)-1), function(i) sub(paste0('^(.{', i, '})'), '\\1 ', input))
#[1] "w ord" "wo rd" "wor d"
Or with substring
paste(substring(input, 1, 1:3), substring(input, 2:4, 4))
#[1] "w ord" "wo rd" "wor d"
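The substring version hard-codes the positions for a four-letter word. A sketch of how it might be generalised to any word length follows; `insert_space` is a hypothetical name, and returning an empty vector for words shorter than two characters is an assumption about the desired behaviour.

```r
insert_space <- function(word) {
  n <- nchar(word)
  if (n < 2) return(character(0))  # nothing to split
  k <- seq_len(n - 1)              # split after positions 1 .. n-1
  paste(substring(word, 1, k), substring(word, k + 1, n))
}

insert_space("word")
insert_space("hello")
```
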

How to look for a certain part in a string and only keep that part

What is the cleanest way of finding for example the string ": [1-9]*" and only keeping that part?
You can work with regexec to get the starting points, but isn't there a cleaner way just to get immediately the value?
For example:
test <- c("surface area: 458", "bedrooms: 1", "whatever")
regexec(": [1-9]*", test)
How do I get immediately just
c(": 458",": 1", NA )
You can use base R which handles this just fine.
> x <- c('surface area: 458', 'bedrooms: 1', 'whatever')
> r <- regmatches(x, gregexpr(':.*', x))
> unlist({r[sapply(r, length)==0] <- NA; r})
# [1] ": 458" ": 1" NA
Although, I find it much simpler to just do...
> x <- c('surface area: 458', 'bedrooms: 1', 'whatever')
> sapply(strsplit(x, '\\b(?=:)', perl=T), '[', 2)
# [1] ": 458" ": 1" NA
library(stringr)
str_extract(test, ":.*")
#[1] ": 458" ": 1" NA
Or for a faster approach stringi
library(stringi)
stri_extract_first_regex(test, ":.*")
#[1] ": 458" ": 1" NA
If you need to keep the original values where there is no match:
gsub(".*(:.*)", "\\1", test)
#[1] ": 458" ": 1" "whatever"
Try any of these. The first two use only base R. The last one assumes that we want to return a numeric vector.
1) sub
s <- sub(".*:", ":", test)
ifelse(test == s, NA, s)
## [1] ": 458" ": 1" NA
If there can be more than one : in a string then replace the pattern with "^[^:]*:" .
2) strsplit
sapply(strsplit(test, ":"), function(x) c(paste0(":", x), NA)[2])
## [1] ": 458" ": 1" NA
Do not use this one if there can be more than one : in a string.
3) strapplyc
library(gsubfn)
s <- strapplyc(test, "(:.*)|$", simplify = TRUE)
ifelse(s == "", NA, s)
## [1] ": 458" ": 1" NA
We can omit the ifelse line if "" is ok instead of NA.
4) strapply If the idea is really that there are some digits on the line and we want to return the numbers or NA then try this:
library(gsubfn)
strapply(test, "\\d+|$", as.numeric, simplify = TRUE)
## [1] 458 1 NA
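Since the question started from regexec, here is a sketch of a close base R relative, regexpr combined with regmatches, filling in NA for the elements without a match. Note that the pattern in the question uses [1-9], which would stop at a digit 0; the sketch assumes [0-9] was intended.

```r
test <- c("surface area: 458", "bedrooms: 1", "whatever")

m <- regexpr(": [0-9]+", test)            # -1 where there is no match
out <- rep(NA_character_, length(test))   # start with all NA
out[m != -1] <- regmatches(test, m)       # fill in only the matched elements
out
```
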

suppress NAs in paste()

Regarding the bounty
Ben Bolker's paste2-solution produces a "" when the strings that are pasted contains NA's in the same position. Like this,
> paste2(c("a","b", "c", NA), c("A","B", NA, NA))
[1] "a, A" "b, B" "c" ""
The fourth element is "" instead of NA. It should look like this,
[1] "a, A" "b, B" "c" NA
I'm offering up this small bounty for anyone who can fix this.
Original question
I've read the help page ?paste, but I don't understand how to have R ignore NAs. I do the following,
foo <- LETTERS[1:4]
foo[4] <- NA
foo
[1] "A" "B" "C" NA
paste(1:4, foo, sep = ", ")
and get
[1] "1, A" "2, B" "3, C" "4, NA"
What I would like to get,
[1] "1, A" "2, B" "3, C" "4"
I could do like this,
sub(', NA$', '', paste(1:4, foo, sep = ", "))
[1] "1, A" "2, B" "3, C" "4"
but that seems like a detour.
I know this question is many years old, but it's still the top google result for r paste na. I was looking for a quick solution to what I assumed was a simple problem, and was somewhat taken aback by the complexity of the answers. I opted for a different solution, and am posting it here in case anyone else is interested.
bar <- apply(cbind(1:4, foo), 1,
function(x) paste(x[!is.na(x)], collapse = ", "))
bar
[1] "1, A" "2, B" "3, C" "4"
In case it isn't obvious, this will work on any number of vectors with NAs in any positions.
IMHO, the advantage of this over the existing answers is legibility. It's a one-liner, which is always nice, and it doesn't rely on a bunch of regexes and if/else statements which may trip up your colleagues or future self. Erik Shilts' answer mostly shares these advantages, but assumes there are only two vectors and that only the last of them contains NAs.
My solution doesn't satisfy the requirement in your edit, because my project has the opposite requirement. However, you can easily solve this by adding a second line borrowed from 42-'s answer:
is.na(bar) <- bar == ""
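Put together with the input from the bounty, the two lines might be used like this (the names `a`, `b` and `res` are only for illustration):

```r
a <- c("a", "b", "c", NA)
b <- c("A", "B", NA, NA)

# Paste the non-NA values of each row
res <- apply(cbind(a, b), 1,
             function(x) paste(x[!is.na(x)], collapse = ", "))

# Rows where everything was NA come out as ""; turn those into NA
is.na(res) <- res == ""
res
```
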
For the purpose of a "true-NA": Seems the most direct route is just to modify the value returned by paste2 to be NA when the value is ""
paste3 <- function(...,sep=", ") {
L <- list(...)
L <- lapply(L,function(x) {x[is.na(x)] <- ""; x})
ret <-gsub(paste0("(^",sep,"|",sep,"$)"),"",
gsub(paste0(sep,sep),sep,
do.call(paste,c(L,list(sep=sep)))))
is.na(ret) <- ret==""
ret
}
val<- paste3(c("a","b", "c", NA), c("A","B", NA, NA))
val
#[1] "a, A" "b, B" "c" NA
I found a dplyr/tidyverse solution to that question, which is rather elegant in my opinion.
library(tidyr)
foo <- LETTERS[1:4]
foo[4] <- NA
df <- data.frame(foo, num = 1:4)
df %>% unite(., col = "New.Col", num, foo, na.rm=TRUE, sep = ",")
> New.Col
1: 1,A
2: 2,B
3: 3,C
4: 4
A function that follows up on @ErikShilts's answer and @agstudy's comment. It generalizes the situation slightly by allowing sep to be specified and handling cases where any element (first, last, or intermediate) is NA. (It might break if there are multiple NA values in a row, or in other tricky cases ...) By the way, note that this situation is described exactly in the second paragraph of the Details section of ?paste, which indicates that at least the R authors are aware of the situation (although no solution is offered).
paste2 <- function(...,sep=", ") {
L <- list(...)
L <- lapply(L,function(x) {x[is.na(x)] <- ""; x})
gsub(paste0("(^",sep,"|",sep,"$)"),"",
gsub(paste0(sep,sep),sep,
do.call(paste,c(L,list(sep=sep)))))
}
foo <- c(LETTERS[1:3],NA)
bar <- c(NA,2:4)
baz <- c("a",NA,"c","d")
paste2(foo,bar,baz)
# [1] "A, a" "B, 2" "C, 3, c" "4, d"
This doesn't handle @agstudy's suggestions of (1) incorporating the optional collapse argument; (2) making NA-removal optional by adding an na.rm argument (and setting the default to FALSE to make paste2 backward compatible with paste). If one wanted to make this more sophisticated (i.e. remove multiple sequential NAs) or faster it might make sense to write it in C++ via Rcpp (I don't know much about C++'s string-handling, but it might not be too hard -- see "convert Rcpp::CharacterVector to std::string" and "Concatenating strings doesn't work as expected" for a start ...)
As Ben Bolker mentioned the above approaches may fall over if there are multiple NAs in a row. I tried a different approach that seems to overcome this.
paste4 <- function(x, sep = ", ") {
x <- gsub("^\\s+|\\s+$", "", x)
ret <- paste(x[!is.na(x) & !(x %in% "")], collapse = sep)
is.na(ret) <- ret == ""
return(ret)
}
The second line strips out extra whitespace introduced when concatenating text and numbers.
The above code can be used to concatenate multiple columns (or rows) of a dataframe using the apply command, or repackaged to first coerce the data into a dataframe if needed.
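As a rough illustration of the apply usage mentioned above, with a made-up data frame (`df`, `first` and `second` are hypothetical names). The function is repeated so the snippet runs on its own, and unname() is only there so the result is a plain vector whether or not apply attaches row names.

```r
paste4 <- function(x, sep = ", ") {
  x <- gsub("^\\s+|\\s+$", "", x)
  ret <- paste(x[!is.na(x) & !(x %in% "")], collapse = sep)
  is.na(ret) <- ret == ""
  ret
}

df <- data.frame(first  = c("a", "b", "c", NA),
                 second = c("A", "B", NA, NA),
                 stringsAsFactors = FALSE)

# MARGIN = 1 applies paste4 to each row
combined <- unname(apply(df, 1, paste4))
combined
```
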
EDIT
After a few more hours thought I think the following code incorporates the suggestions above to allow specification of the collapse and na.rm options.
paste5 <- function(..., sep = " ", collapse = NULL, na.rm = F) {
if (na.rm == F)
paste(..., sep = sep, collapse = collapse)
else
if (na.rm == T) {
paste.na <- function(x, sep) {
x <- gsub("^\\s+|\\s+$", "", x)
ret <- paste(na.omit(x), collapse = sep)
is.na(ret) <- ret == ""
return(ret)
}
df <- data.frame(..., stringsAsFactors = F)
ret <- apply(df, 1, FUN = function(x) paste.na(x, sep))
if (is.null(collapse))
ret
else {
paste.na(ret, sep = collapse)
}
}
}
As above, na.omit(x) can be replaced with x[!is.na(x) & !(x %in% "")] to also drop empty strings if desired. Note, using collapse with na.rm = T returns a string without any "NA", though this could be changed by replacing the last line of code with paste(ret, collapse = collapse).
nth <- paste0(1:12, c("st", "nd", "rd", rep("th", 9)))
mnth <- month.abb
nth[4:5] <- NA
mnth[5:6] <- NA
paste5(mnth, nth)
[1] "Jan 1st" "Feb 2nd" "Mar 3rd" "Apr NA" "NA NA" "NA 6th" "Jul 7th" "Aug 8th" "Sep 9th" "Oct 10th" "Nov 11th" "Dec 12th"
paste5(mnth, nth, sep = ": ", collapse = "; ", na.rm = T)
[1] "Jan: 1st; Feb: 2nd; Mar: 3rd; Apr; 6th; Jul: 7th; Aug: 8th; Sep: 9th; Oct: 10th; Nov: 11th; Dec: 12th"
paste3(c("a","b", "c", NA), c("A","B", NA, NA), c(1,2,NA,4), c(5,6,7,8))
[1] "a, A, 1, 5" "b, B, 2, 6" "c, , 7" "4, 8"
paste5(c("a","b", "c", NA), c("A","B", NA, NA), c(1,2,NA,4), c(5,6,7,8), sep = ", ", na.rm = T)
[1] "a, A, 1, 5" "b, B, 2, 6" "c, 7" "4, 8"
You can use ifelse, a vectorized if-else construct to determine if a value is NA and substitute a blank. You'll then use gsub to strip out the trailing ", " if it isn't followed by any other string.
gsub(", $", "", paste(1:4, ifelse(is.na(foo), "", foo), sep = ", "))
Your answer is correct. There isn't a better way to do it. This issue is explicitly mentioned in the paste documentation in the Details section.
If working with df or tibbles using tidyverse, I use mutate_all or mutate_at with str_replace_na before paste or unite to avoid pasting NAs.
library(tidyverse)
new_df <- df %>%
mutate_all(~str_replace_na(., "")) %>%
mutate(combo_var = paste0(var1, var2, var3))
OR
new_df <- df %>%
mutate_at(c('var1', 'var2'), ~str_replace_na(., "")) %>%
mutate(combo_var = paste0(var1, var2))
This can be achieved in a single line.
For example:
vec<-c("A","B",NA,"D","E")
res<-paste(vec[!is.na(vec)], collapse=',' )
print(res)
[1] "A,B,D,E"
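Or remove the NAs after paste with str_replace_all (this also strips the letters "NA" from inside other words); here the first column of data is used:
data[[1]] <- str_replace_all(data[[1]], "NA", "")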
A variant of Joe's solution (https://stackoverflow.com/a/49201394/3831096) that respects both sep and collapse and returns NA when all values are NA is:
paste_missing <- function(..., sep=" ", collapse=NULL) {
ret <-
apply(
X=cbind(...),
MARGIN=1,
FUN=function(x) {
if (all(is.na(x))) {
NA_character_
} else {
paste(x[!is.na(x)], collapse = sep)
}
}
)
if (!is.null(collapse)) {
paste(ret, collapse=collapse)
} else {
ret
}
}
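A short usage sketch with the input from the bounty. The function is repeated in compact form so the snippet runs on its own; the logic is meant to be the same as above. Note that when collapse is given, an all-NA row is pasted as the text "NA", since a plain paste is used for that step.

```r
paste_missing <- function(..., sep = " ", collapse = NULL) {
  ret <- apply(cbind(...), 1, function(x) {
    if (all(is.na(x))) NA_character_ else paste(x[!is.na(x)], collapse = sep)
  })
  if (is.null(collapse)) ret else paste(ret, collapse = collapse)
}

a <- c("a", "b", "c", NA)
b <- c("A", "B", NA, NA)

by_row <- paste_missing(a, b, sep = ", ")
by_row      # "a, A" "b, B" "c" NA

collapsed <- paste_missing(a, b, sep = ", ", collapse = "; ")
collapsed   # "a, A; b, B; c; NA"
```
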
Here is a solution that behaves more like paste and handles more edge cases than current solutions (empty strings, "NA" strings, more than 2 arguments, use of collapse argument...).
paste2 <- function(..., sep = " ", collapse = NULL, na.rm = FALSE){
# in default case, use paste
if(!na.rm) return(paste(..., sep = sep, collapse = collapse))
# cbind is convenient to recycle, it warns though so use suppressWarnings
dots <- suppressWarnings(cbind(...))
res <- apply(dots, 1, function(...) {
if(all(is.na(c(...)))) return(NA)
do.call(paste, as.list(c(na.omit(c(...)), sep = sep)))
})
if(is.null(collapse)) res else
paste(na.omit(res), collapse = collapse)
}
# behaves like `paste()` by default
paste2(c("a","b", "c", NA), c("A","B", NA, NA))
#> [1] "a A" "b B" "c NA" "NA NA"
# trigger desired behavior by setting `na.rm = TRUE` and `sep = ","`
paste2(c("a","b", "c", NA), c("A","B", NA, NA), sep = ",", na.rm = TRUE)
#> [1] "a,A" "b,B" "c" NA
# handles edge cases
paste2(c("a","b", "c", NA, "", "", ""),
c("a","b", "c", NA, "", "", "NA"),
c("A","B", NA, NA, NA, "", ""),
sep = ",", na.rm = TRUE)
#> [1] "a,a,A" "b,b,B" "c,c" NA "," ",," ",NA,"
Created on 2019-10-01 by the reprex package (v0.3.0)
This works for me
library(stringr)
library(dplyr) # if_else() comes from dplyr
foo <- LETTERS[1:4]
foo[4] <- NA
foo
# [1] "A" "B" "C" NA
if_else(!is.na(foo),
str_c(1:4, str_replace_na(foo, ""), sep = ", "),
str_c(1:4, str_replace_na(foo, ""), sep = "")
)
# [1] "1, A" "2, B" "3, C" "4"
Updating @Erik Shilts's solution to get rid of the trailing comma:
x = gsub(",$", "", paste(1:4, ifelse(is.na(foo), "", foo), sep = ","))
Then, to get rid of a remaining trailing ",", just repeat it once more:
x <- gsub(",$", "", x)
