In R replace punctuation "." within a string [duplicate]

This question already has answers here:
Replacing commas and dots in R
(3 answers)
Closed 7 years ago.
I searched the web and found the page "In R, replace text within a string", which shows how to replace text within a string.
I tried the same method to replace the punctuation mark "." with "-", but it did not work:
group <- c("12357.", "12575.", "197.18", ".18947")
gsub(".", "-", group)
gives this output
[1] "------" "------" "------" "------"
instead of
[1] "12357-" "12575-" "197-18" "-18947"
Is there an alternative way to do this?

"." in regex langage means "any character". To capture the actual point, you need to escape it, so:
gsub("\\.", "-", group)
#[1] "12357-" "12575-" "197-18" "-18947"
As @akrun mentioned in the comments, you can also enclose it in square brackets, in which case you don't need to escape it:
gsub('[.]', '-', group)
[1] "12357-" "12575-" "197-18" "-18947"
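A third option, added here as a sketch rather than taken from the answers above, is to switch off regex interpretation altogether. Both the fixed = TRUE argument of gsub() and base R's chartr() treat "." as an ordinary character:

```r
group <- c("12357.", "12575.", "197.18", ".18947")

# fixed = TRUE: the pattern is matched literally, not as a regular expression
with_fixed <- gsub(".", "-", group, fixed = TRUE)

# chartr() translates characters one-for-one and never uses regex at all
with_chartr <- chartr(".", "-", group)

with_fixed
# c("12357-", "12575-", "197-18", "-18947")
identical(with_fixed, with_chartr)
#[1] TRUE
```

chartr() only works for single-character substitutions; for multi-character patterns, fixed = TRUE is the more general choice.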

Related

Replace "$" in a string in R [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 1 year ago.
I would like to replace $ in my R strings. I have tried:
mystring <- "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked"
stringr::str_replace(mystring, pattern="$",
replacement="!")
However, it fails: instead of replacing the $ signs, it appends my replacement character to the end of the original string:
[1] "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked!"
I tried some variations such as pattern="/$", but they fail as well. Can someone suggest a strategy for doing this?
In base R, you could use:
chartr("$","!", mystring)
[1] "file.tree.id!HASHd15962267-44c21f1cee1057d95d6840!HASHe92451fece3b3341962516acfa962b2f!checked"
Or even
gsub("$","!", mystring, fixed = TRUE)
We need fixed because by default the pattern is treated as a regular expression, and in a regex $ means the end of the string. With stringr, wrap the pattern in fixed():
stringr::str_replace_all(mystring, pattern = stringr::fixed("$"),
replacement = "!")
Or you could escape it (\\$) or place it in square brackets ([$]), but fixed should be faster.
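The escaped and bracketed forms mentioned in the last answer are not spelled out there; a minimal base R sketch of both, checked against the fixed = TRUE version:

```r
mystring <- "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked"

# Escaped: "\\$" in the R literal becomes \$ in the regex, a literal dollar sign
escaped <- gsub("\\$", "!", mystring)

# Character class: inside [...] the $ loses its "end of string" meaning
bracketed <- gsub("[$]", "!", mystring)

identical(escaped, bracketed)
#[1] TRUE
identical(escaped, gsub("$", "!", mystring, fixed = TRUE))
#[1] TRUE
```

Note that the question's stringr::str_replace() (without _all) would replace only the first $ even with a correct pattern.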

Trying to figure out regular expression in R for sub() [duplicate]

This question already has answers here:
Replace single backslash in R
(5 answers)
Closed 3 years ago.
I'm trying to use a regular expression in a sub() call to replace all the "\" in a vector.
I've tried a number of different ways to get R to recognize the "\":
I've tried "\\\" but I keep getting errors.
I've tried "\.*"
I've tried "\\\.*"
data.frame1$vector4 <- sub(pattern = "\\\", replace = ", data.frame1$vector4)
The \ that I am trying to get rid of only appears occasionally in the vector and always in the middle of the string. I want to get rid of it and all the characters that follow it.
The error that I am getting
Error: '\.' is an unrecognized escape in character string starting "\."
Also I'm struggling to get Stack to print the "\" that I am typing above. It keeps deleting them.
1) 4 backslashes. To insert a backslash into an R string literal, use a double backslash. However, a backslash is also a metacharacter in regular expressions, so it must be escaped by another backslash, which also has to be doubled in the literal. Thus the regular expression needs 4 backslashes.
s <- "a\\b\\c"
nchar(s)
## [1] 5
gsub("\\\\", "", s)
## [1] "abc"
2) character class. Another way to effectively escape it is to surround it with [...]:
gsub("[\\]", "", s)
## [1] "abc"
3) fixed argument. Perhaps the simplest way is to use fixed = TRUE, in which case special characters are not treated as regular expression metacharacters.
gsub("\\", "", s, fixed = TRUE)
## [1] "abc"
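The question actually asks to remove the backslash and everything after it. A sketch combining the 4-backslash escape with .*, using a made-up vector in place of the asker's data.frame1$vector4:

```r
# Hypothetical example data: only some elements contain a backslash
v <- c("abc\\def", "no backslash here", "x\\y\\z")

# "\\\\" is a literal backslash in the regex; ".*" then consumes the rest.
# sub() is enough: the tail starts at the first backslash, so one match suffices.
trimmed <- sub("\\\\.*", "", v)
trimmed
# c("abc", "no backslash here", "x")
```

Elements without a backslash do not match and are returned unchanged.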

Regex for literal curly brackets in R [duplicate]

This question already has answers here:
Error: '\R' is an unrecognized escape in character string starting "C:\R"
(5 answers)
Closed 2 years ago.
I am not an expert on regex in R, but I feel I have read the docs long enough and still come up short, so I am posting here.
I am trying to replace the following string, all LITERALLY as written:
a = "\\begin{tabular}"
a = gsub("\\begin{tabular}", "\\scalebox{0.7}{
\\begin{tabular}", a)
The desired output is: cat('\\scalebox{0.7}{ \\begin{tabular}')
So I know I need to escape the first "\" as "\\", but when I escape the brackets I get
Error: '\}' is an unrecognized escape in character string starting...
In your case, since you're replacing a fixed string, you can simply set fixed = TRUE to avoid regular expressions entirely:
a = "\\begin{tabular}"
a = gsub("\\begin{tabular}", "\\scalebox{0.7}{\n\\begin{tabular}", x = a, fixed = TRUE)
and use \n for the newline.
If you do want to use a regex, you need to escape the curly brackets in the pattern with two backslashes rather than one,
e.g.,
a = "\\begin{tabular}"
gsub(pattern = "\\{|\\}", replacement = "_foo_", x=a)
[1] "\\begin_foo_tabular_foo_"
Alternatively, you can enclose the curly brackets in square brackets like so:
e.g.,
a = "\\begin{tabular}"
gsub(pattern = "[{]|[}]", replacement = "_foo_", x=a)
[1] "\\begin_foo_tabular_foo_"
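For completeness, here is a sketch of the asker's full replacement done with a regex instead of fixed = TRUE. Both the pattern and the replacement need doubled escaping, because gsub() also treats a backslash in the replacement string as special (it introduces backreferences such as \\1):

```r
a <- "\\begin{tabular}"

# Pattern: "\\\\" is a literal backslash; "\\{" and "\\}" are literal braces.
# Replacement: "\\\\" yields one literal backslash; "\n" is a newline.
b <- gsub("\\\\begin\\{tabular\\}",
          "\\\\scalebox{0.7}{\n\\\\begin{tabular}", a)
cat(b)
# \scalebox{0.7}{
# \begin{tabular}
```

Given how easy it is to miscount backslashes here, fixed = TRUE is the less error-prone option for a literal string.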

Remove (or replace) everything after a specified character in R strings [duplicate]

This question already has answers here:
Remove characters after the last occurrence of a specific character
(1 answer)
regex to remove everything after the last dot in a file
(3 answers)
Closed 4 years ago.
I have a column of strings from which I would like to remove everything after the last '.'.
I tried:
sub('\\..*', '', x)
But my problem is that some of the strings contain two '.' and some only one,
e.g.
ENST00000338167.9
ABCDE.42927.6
How can I remove only the characters after the last '.'?
So that I'm left with:
ENST00000338167
ABCDE.42927
Many thanks!!
We can use sub to match a . (escaped, since it is a metacharacter meaning any character) followed by zero or more characters that are not a . ([^.]*) up to the end of the string ($), and replace the match with an empty string (""):
sub("\\.[^.]*$", "", x)
#[1] "ENST00000338167" "ABCDE.42927"
Or use str_remove from stringr
library(stringr)
str_remove(x, "\\.[^.]*$")
#[1] "ENST00000338167" "ABCDE.42927"
data
x <- c("ENST00000338167.9", "ABCDE.42927.6")
Yet another way is to "capture" the part before the last dot; because .* is greedy, (.*) extends up to the last ".":
sub("(.*)\\..*", "\\1", x)
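A quick check that the anchored pattern and the capture-group pattern agree. The third element, "NODOT", is made up to show that strings without any dot pass through both unchanged:

```r
x <- c("ENST00000338167.9", "ABCDE.42927.6", "NODOT")

anchored <- sub("\\.[^.]*$", "", x)      # drop the last dot and everything after it
captured <- sub("(.*)\\..*", "\\1", x)   # keep the greedy prefix before the last dot

anchored
# c("ENST00000338167", "ABCDE.42927", "NODOT")
identical(anchored, captured)
#[1] TRUE
```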

Remove white space [duplicate]

This question already has answers here:
How to remove all whitespace from a string?
(9 answers)
Closed 5 years ago.
I'm trying to remove the white space in a file name that I create with the following code:
epoch <- strsplit(toString(files[val]),split='.', fixed=TRUE)[[1]][1]
print(paste(epoch,".csv"))
The current output is "2016_Q3 .csv". I would like to remove the whitespace between the 3 and the . so that the final string looks like "2016_Q3.csv".
I have looked at gsub and trimws, but can't get them to work.
paste() separates its arguments with a space by default (sep = " ").
Instead do:
paste(epoch,".csv",sep="")
or
paste0(epoch,".csv")
Both return:
[1] "2016_Q3.csv"
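Putting the pieces together, a sketch of the whole step with a hypothetical file name standing in for files[val], which the question does not show. The split = "." argument only works as intended because of fixed = TRUE; without it, "." would be a regex matching any character:

```r
fname <- "2016_Q3.txt"  # hypothetical stand-in for toString(files[val])

# fixed = TRUE makes "." a literal dot rather than "any character"
epoch <- strsplit(fname, split = ".", fixed = TRUE)[[1]][1]

with_space <- paste(epoch, ".csv")   # default sep = " " inserts the space
no_space <- paste0(epoch, ".csv")    # paste0() uses no separator
no_space
#[1] "2016_Q3.csv"
```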
If you want to use gsub, the task is simple:
str <- "2016_Q3 .csv"
gsub(" ","",str)
Gives you:
"2016_Q3.csv"
We can use sub to match one or more whitespace characters (\\s+) followed by a dot (\\.; the dot is escaped because it is a metacharacter meaning any character) and replace the match with a single ".":
sub("\\s+\\.", ".", str1)
#[1] "2016_Q3.csv"
For the OP's example, even the less specific \\s+ on its own should work:
sub("\\s+", "", str1)
data
str1 <- "2016_Q3 .csv"
Using stringr:
library(stringr)
filename <- str_replace(paste(epoch, ".csv"), " ", "")
