This question already has answers here:
How to remove all whitespace from a string?
(9 answers)
Closed 1 year ago.
I want to merge the following two strings in R (and remove the spaces). I was using paste() but could not get the desired result.
a <- "big earth"
b <- "small moon"
c <- paste(a,b, sep = "")
I want c to be "bigearthsmallmoon".
Thank you very much for the help.
You can paste the strings together into one with paste(). Then you can use gsub() to remove all spaces:
gsub(" ", "", paste(a, b))
# [1] "bigearthsmallmoon"
c <- gsub(" ", "", paste(a, b, sep = ""))
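As a minimal sketch of the same idea, paste0() joins the strings without a separator, and fixed = TRUE makes gsub() treat the space literally rather than as a regular expression. The name merged is only illustrative; the second call assumes you also want tabs and newlines removed.

```r
a <- "big earth"
b <- "small moon"

# paste0() is paste() with sep = ""; gsub() then drops every literal space
merged <- gsub(" ", "", paste0(a, b), fixed = TRUE)
merged
# [1] "bigearthsmallmoon"

# "\\s+" matches any run of whitespace, including tabs and newlines
gsub("\\s+", "", "big\tearth small\nmoon")
# [1] "bigearthsmallmoon"
```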
This question already has answers here:
Remove part of string after "."
(6 answers)
gsub() in R is not replacing '.' (dot)
(3 answers)
Closed last year.
I have a list
t <- list('mcd.norm_1','mcc.norm_1', 'mcr.norm_1')
How can I convert the list to remove the period and everything after it, so the list is just
'mcd' 'mcc' 'mcr'
You may try
library(stringr)
lapply(t, function(x) str_split(x, "\\.", simplify = TRUE)[1])
Another possible solution:
library(tidyverse)
t <- list('mcd.norm_1','mcc.norm_1', 'mcr.norm_1')
t %>%
str_remove("\\..*")
#> [1] "mcd" "mcc" "mcr"
This could be another option:
unlist(sapply(t, \(x) regmatches(x, regexec(".*(?=\\.)", x, perl = TRUE))))
[1] "mcd" "mcc" "mcr"
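For comparison, a base R sketch with no extra packages: sub() with the pattern "\\..*" deletes the first literal dot and everything after it. The name prefixes is illustrative only, and the list is called t as in the question, although that masks base R's t() function within the session.

```r
t <- list("mcd.norm_1", "mcc.norm_1", "mcr.norm_1")

# "\\." is a literal dot; ".*" consumes the rest of the string
prefixes <- sub("\\..*", "", unlist(t))
prefixes
# [1] "mcd" "mcc" "mcr"

# Convert back to a list if a list is required
prefix_list <- as.list(prefixes)
```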
This question already has answers here:
Finding the most repeated character in a string in R
(2 answers)
Closed 1 year ago.
Suppose the next character string:
test_string <- "A A B B C C C H I"
Is there any way to extract the most frequent value within test_string?
Something like:
extract_most_frequent_character(test_string)
Output:
#C
We can use scan to read the string as a vector of individual elements split at the spaces, get the frequency counts with table, find the named index with the highest count (which.max), and return its name.
extract_most_frequent_character <- function(x) {
names(which.max(table(scan(text = x, what = '', quiet = TRUE))))
}
Testing:
extract_most_frequent_character(test_string)
[1] "C"
Or with strsplit
extract_most_frequent_character <- function(x) {
names(which.max(table(unlist(strsplit(x, "\\s+")))))
}
Here is another base R option (not as elegant as @akrun's answer):
> intToUtf8(names(which.max(table(utf8ToInt(gsub("\\s", "", test_string))))))
[1] "C"
One possibility involving stringr could be:
names(which.max(table(str_extract_all(test_string, "[A-Z]", simplify = TRUE))))
[1] "C"
Or marginally shorter:
names(which.max(table(str_extract_all(test_string, "[A-Z]")[[1]])))
Here is a solution using the stringr package, table and which.max:
library(stringr)
test_string <- str_split(test_string, " ")
test_string <- table(test_string)
names(test_string)[which.max(test_string)]
[1] "C"
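Note that which.max() returns only the first maximum, so the functions above report a single value even when several values are tied. A possible variant that returns every tied value is sketched below; the function name most_frequent_all is hypothetical and not part of any package.

```r
most_frequent_all <- function(x) {
  # Split on runs of whitespace and count each distinct token
  counts <- table(strsplit(x, "\\s+")[[1]])
  # Keep every token whose count equals the maximum count
  names(counts)[counts == max(counts)]
}

most_frequent_all("A A B B C C C H I")
# [1] "C"
most_frequent_all("A A B B")
# [1] "A" "B"
```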
This question already has answers here:
Can I use an OR statement to indicate the pattern in stringr's str_extract_all function?
(1 answer)
find multiple strings using str_extract_all
(3 answers)
Closed 2 years ago.
Let's say I have these toy vectors
vec <- c("FOO blabla", "fail bla", "blabla FEEbla", "textFOO", "textttt")
to_match <- c("FOO", "FEE")
I would like to obtain a vector of the same length as vec that contains only the pattern from to_match found in each element, or NA if none is present. Therefore, my desired result would be
c("FOO", NA, "FEE", "FOO", NA)
My first thought was to replace everything that does not match any of the patterns in to_match with an empty string (""). I tried the following code, which does the exact opposite, i.e. it replaces everything that does match any of the patterns in to_match with an empty string.
sub(paste(to_match, collapse = "|"), "", vec)
# [1] " blabla" "fail bla" "blabla bla" "text" "textttt"
I then tried to invert this behaviour by putting a caret (^) before a grouping structure, but with little success.
# fail
sub(paste0("^(", paste(to_match, collapse = "|"), ")"), "", vec)
# [1] " blabla" "fail bla" "blabla FEEbla" "textFOO" "textttt"
How can I reach the desired output?
Your approach was correct, but you should look at extracting the pattern that you want instead of removing what you don't want.
library(stringr)
str_extract(vec, str_c(to_match, collapse = "|"))
#[1] "FOO" NA "FEE" "FOO" NA
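The same extraction can be sketched in base R with regexpr() and regmatches(). It also helps to know why the caret attempt failed: outside square brackets, ^ anchors the match to the start of the string rather than negating the group. The names pattern, m and out below are illustrative only.

```r
vec <- c("FOO blabla", "fail bla", "blabla FEEbla", "textFOO", "textttt")
to_match <- c("FOO", "FEE")

pattern <- paste(to_match, collapse = "|")

# regexpr() gives the position of the first match, or -1 if there is none
m <- regexpr(pattern, vec)

# Start with all NA, then fill in the elements that actually matched;
# regmatches() returns one string per matching element, in order
out <- rep(NA_character_, length(vec))
out[m != -1] <- regmatches(vec, m)
out
# [1] "FOO" NA    "FEE" "FOO" NA
```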
This question already has answers here:
Remove everything after space in string
(5 answers)
Closed 6 years ago.
I have an R data frame in which one column is a factor containing only text. I want to extract the text before the first space in each value of that column. I tried gsub(" .*$", "", data[, 3]), where summary is the field in question, but it is not working.
For example, my data looks like "abcd efgh ijk" and I want "abcd".
I tried to convert my factor field as a character field using
data[, 3] <- sapply(data[, 3], as.character)
But it still failed to retrieve the string before the first space. Can you please help?
Sorry, I am unable to post the data here as it is client data.
Or, using strsplit():
x <- "abcd efgh ijk"
strsplit(x, " ")[[1]][1]
Try gsub("\\s.*", "", data[, 3]). \s is the regular expression for whitespace. You need the extra \ because R treats a single \ inside a string as an escape character.
x<-"abcd efgh ijk"
gsub( "\\s.*", "", x )
[1] "abcd"
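Applied to a factor column, the idea might look like the sketch below. The data frame is made up for illustration (the original data was not posted), with summary as the third column as in the question. sub() is enough here because only the first match per string needs replacing.

```r
# Hypothetical stand-in for the client data; column 3 is a factor
data <- data.frame(
  id = 1:2,
  n = c(10, 20),
  summary = factor(c("abcd efgh ijk", "lmno pqr"))
)

# Convert the factor to character, then drop the first whitespace
# character and everything after it
first_word <- sub("\\s.*", "", as.character(data[, 3]))
first_word
# [1] "abcd" "lmno"
```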
Here is a useful cheat sheet of regular expressions:
https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/#downloads
This question already has answers here:
Concatenate a vector of strings/character
(8 answers)
Closed 6 years ago.
I was working with the paste command in R, when I found that
a <- c("something", "to", "paste")
paste(a, sep="_")
produces the output
# [1] "something" "to" "paste"
which is the same as when I print a:
# [1] "something" "to" "paste"
So what effect does the sep have on the paste command in R?
sep is placed between the arguments of paste(), element by element, so it only has an effect when you pass two or more vectors. If you were looking to get "something_to_paste", then you would be looking for the collapse argument, which joins the elements of the result into a single string.
Try the following to get a sense of what the sep argument does:
paste(a, 1:3, sep = "_")
# [1] "something_1" "to_2" "paste_3"
and compare it to collapse:
paste(a, collapse = "_")
# [1] "something_to_paste"
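The two arguments can also be combined. In this short sketch, sep first joins the vectors element by element and collapse then joins the resulting elements into one string; the name combined is illustrative only.

```r
a <- c("something", "to", "paste")

# sep works across arguments, collapse works across the resulting elements
combined <- paste(a, 1:3, sep = "_", collapse = "-")
combined
# [1] "something_1-to_2-paste_3"
```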