Remove white space [duplicate] - r

I'm trying to remove the white space in the filename that I have created using the following code:
epoch <- strsplit(toString(files[val]),split='.', fixed=TRUE)[[1]][1]
print(paste(epoch,".csv"))
The current output gives me: "2016_Q3 .csv". I would like to remove the whitespace between the 3 and the . so that the final string looks like "2016_Q3.csv".
I have looked at gsub and trimws, but can't get them to work.

paste puts a space between its arguments by default (sep = " ").
Instead do:
paste(epoch,".csv",sep="")
or
paste0(epoch,".csv")
Both return:
[1] "2016_Q3.csv"
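As a minimal sketch of why the space appears and why trimws does not help here: trimws only strips leading and trailing whitespace, while the space added by paste sits in the middle of the string. The value of epoch below is an assumption, taken from the output shown in the question.

```r
# Assumed value, based on the output shown in the question
epoch <- "2016_Q3"

with_space <- paste(epoch, ".csv")   # default sep = " " inserts the space
no_space   <- paste0(epoch, ".csv")  # same as paste(epoch, ".csv", sep = "")

print(with_space)          # "2016_Q3 .csv"
print(trimws(with_space))  # unchanged: the space is not at either end
print(no_space)            # "2016_Q3.csv"
```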

If you want to use gsub it becomes a pretty simple task.
str <- "2016_Q3 .csv"
gsub(" ","",str)
Gives you:
"2016_Q3.csv"

We can use sub to match one or more spaces (\\s+) followed by a dot (\\. - escape the dot as it is a metacharacter implying any character) and replace it with .
sub("\\s+\\.", ".", str1)
#[1] "2016_Q3.csv"
Using the OP's example, even a non-specific (\\s+) should work
sub("\\s+", "", str1)
data
str1 <- "2016_Q3 .csv"

Using stringr:
library(stringr)
epoch = str_replace(epoch, " ", "")

Related

Replace "$" in a string in R [duplicate]

I would like to replace $ in my R strings. I have tried:
mystring <- "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked"
stringr::str_replace(mystring, pattern="$",
replacement="!")
However, it fails and my replacement character is put as the last character in my original string:
[1] "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked!"
I tried a variation using pattern="/$", but it fails as well. Can someone point me to a strategy for doing this?
In base R, you could use:
chartr("$","!", mystring)
[1] "file.tree.id!HASHd15962267-44c21f1cee1057d95d6840!HASHe92451fece3b3341962516acfa962b2f!checked"
Or even
gsub("$","!", mystring, fixed = TRUE)
With stringr, the pattern needs to be wrapped in fixed() because, by default, the pattern is treated as a regex, and in regex $ matches the end of the string:
stringr::str_replace_all(mystring, pattern = fixed("$"),
replacement = "!")
Alternatively, escape it (\\$) or place it in square brackets ([$]), but fixed should be faster.
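A short sketch comparing the three ways of matching a literal $ in base R, next to the unescaped regex that caused the original problem. The string s is a shortened, made-up stand-in for the one in the question.

```r
# Shortened, made-up stand-in for the string in the question
s <- "id$HASHd1$checked"

by_fixed   <- gsub("$", "!", s, fixed = TRUE)  # literal match, no regex
by_escape  <- gsub("\\$", "!", s)              # escaped metacharacter
by_bracket <- gsub("[$]", "!", s)              # $ inside a character class
as_regex   <- sub("$", "!", s)                 # unescaped: matches the end of the string

print(by_fixed)   # "id!HASHd1!checked"
print(as_regex)   # "id$HASHd1$checked!"
```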

Replace a string with first few characters [duplicate]

Let's say I have a string like this:
Str = "#sometext_any_character_including_&**(_etc_blabla\\s"
Now I want to replace above text with
"#some\\s"
i.e. I just want to retain the leading #, the first 4 characters, and the trailing \\s. Is there any way to do this in R?
Any pointer will be highly appreciated.
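I would extract using regex. To keep the leading # with its four letters and the trailing \s, I would capture them with an expression like this (Python, not R):
import re
my_string = r"#sometext_any_character_including_&**(_etc_blabla\s"  # sample input; raw string keeps the backslash
# Extract "#" plus four lowercase letters, or a literal backslash followed by "s"
pattern = re.compile(r"(#[a-z]{4}|\\s)")
my_match = "".join(pattern.findall(my_string))  # joins the pieces into #some\s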
An option with sub
sub("^(#.{4}).*(\\\\s)$", "\\1\\2", Str)
#[1] "#some\\s"
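The doubled backslashes are easy to misread, so here is a small sketch of what the string actually contains. print shows the escaped form, while writeLines shows the raw characters.

```r
Str <- "#sometext_any_character_including_&**(_etc_blabla\\s"

# Group 1: "#" plus the next 4 characters; group 2: a literal backslash followed by "s"
result <- sub("^(#.{4}).*(\\\\s)$", "\\1\\2", Str)

print(result)       # escaped form: "#some\\s"
writeLines(result)  # raw characters: #some\s
nchar(result)       # 7, because the backslash is a single character
```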
You can use
str_replace(string, pattern, replacement)
or
str_replace_all(string, pattern, replacement)

Remove (or replace) everything after a specified character in R strings [duplicate]

I have a column of strings that I would like to remove everything after the last '.'
I tried:
sub('\\..*', '', x)
But my problem is that some of the strings contain two '.' and some only one.
eg
ENST00000338167.9
ABCDE.42927.6
How can I remove only characters after the last '.'??
So that I'm left with:
ENST00000338167
ABCDE.42927
Many thanks!!
We can use sub to match the . (escaped as it is a metacharacter for any character) followed by 0 or more characters that are not a . ([^.]*) until the end ($) of the string and replace it with blank ("")
sub("\\.[^.]*$", "", x)
#[1] "ENST00000338167" "ABCDE.42927"
Or use str_remove from stringr
library(stringr)
str_remove(x, "\\.[^.]*$")
#[1] "ENST00000338167" "ABCDE.42927"
data
x <- c("ENST00000338167.9", "ABCDE.42927.6")
Yet another way is by "capturing" the part before.
sub("(.*)\\..*", "\\1", x)
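To make the difference concrete, here is a sketch that runs the attempt from the question next to the two fixes, using the two strings from the question.

```r
x <- c("ENST00000338167.9", "ABCDE.42927.6")

first_dot <- sub("\\..*", "", x)         # attempt from the question: cuts at the first dot
last_dot  <- sub("\\.[^.]*$", "", x)     # cuts at the last dot
greedy    <- sub("(.*)\\..*", "\\1", x)  # greedy (.*) extends up to the last dot

print(first_dot)  # "ENST00000338167" "ABCDE"
print(last_dot)   # "ENST00000338167" "ABCDE.42927"
print(greedy)     # "ENST00000338167" "ABCDE.42927"
```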

Add characters before special characters in a string

I would like to add some characters to a string before a special character "(" and after the special character ")"
The position of "(" and ")" changes from one string to the next.
If it helps, I tried several ways, but I don't know how to piece it back together.
a <- "a(b"
grepl("[[:punct:]]", a) #special character exists
x <- "[[:punct:]]"
image <- str_extract(a, x) #extract special character
image
e.g.
"I want to go out (i.e. now). "
And the result to look like:
"I want to go out again (i.e. now) thanks."
I want to add "again" and "thanks" to the sentence.
Thank you for helping!
Use str_replace
library(stringr)
str_replace("I want to go out (i.e. now).", "\\(", "again (") %>%
str_replace("\\)", ") thanks")
We can use sub. Match the characters inside the brackets, including the brackets themselves, capture them as a group, and replace the match with 'again' followed by the backreference to the captured group (\\1) followed by 'thanks'
sub("(\\([^)]+\\))\\..*", "again \\1 thanks.", str1)
#[1] "I want to go out again (i.e. now) thanks."
Or using two capture groups
sub("(\\([^)]+\\))(.*)\\s+", "again \\1 thanks\\2", str1)
#[1] "I want to go out again (i.e. now) thanks."
data
str1 <- "I want to go out (i.e. now). "
NOTE: Using only base R
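Since the position of the parentheses changes from one string to the next, a single capture group around the parenthesised part may be enough. This is a sketch on made-up strings, not the only way to do it.

```r
# Made-up examples with the parentheses at different positions
x <- c("I want to go out (i.e. now).", "(Maybe) we stay home.")

# Capture the whole parenthesised part, then rebuild the text around it
out <- sub("(\\([^)]*\\))", "again \\1 thanks", x)

print(out)
# "I want to go out again (i.e. now) thanks."
# "again (Maybe) thanks we stay home."
```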

In R replace punctuation "." within a string [duplicate]

I searched the web and found the page "In R, replace text within a string", which shows how to replace text within a string.
I tried the same method to replace the punctuation "." into another punctuation "-" but it did not work.
group <- c("12357.", "12575.", "197.18", ".18947")
gsub(".", "-", group)
gives this output
[1] "------" "------" "------" "------"
instead of
[1] "12357-" "12575-" "197-18" "-18947"
Is there an alternate way to do this?
"." in regex language means "any character". To match a literal dot, you need to escape it, so:
gsub("\\.", "-", group)
#[1] "12357-" "12575-" "197-18" "-18947"
As mentioned by @akrun in the comments, if you prefer, you can also enclose it in square brackets; then you don't need to escape it:
gsub('[.]', '-', group)
[1] "12357-" "12575-" "197-18" "-18947"
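Two further options that avoid regex altogether, sketched on the same data: fixed = TRUE makes gsub treat the pattern literally, and chartr translates single characters.

```r
group <- c("12357.", "12575.", "197.18", ".18947")

with_fixed  <- gsub(".", "-", group, fixed = TRUE)  # pattern taken literally
with_chartr <- chartr(".", "-", group)              # character-for-character translation

print(with_fixed)   # "12357-" "12575-" "197-18" "-18947"
print(with_chartr)  # "12357-" "12575-" "197-18" "-18947"
```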
