Replace "\" character with gsub in R

Does anybody know how I can replace "\" in R?
Other answers have posted something like:
l <- "1120190\neconomic"
gsub("\\", "", l, fixed=TRUE)
But it didn't work in my case.

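The \n in your string is not a backslash followed by n; it is a single newline character, so there is no "\" for gsub("\\", "", l, fixed=TRUE) to remove. If you want to replace the newline with a space, you can use the following: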
l <- "1120190\neconomic"
cat(l)
gsub("\n", " ", l, fixed=TRUE)
Note that the output will be:
1120190
economic
[1] "1120190 economic"
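A quick way to confirm what is really in the string is to count its characters and test for a backslash. In the sketch below, the second string m is a made-up example (not from the question) of what the data would look like if it actually contained a backslash:

```r
# In R source, "\n" is ONE character: a newline, not a backslash plus "n".
l <- "1120190\neconomic"
nchar(l)                         # 16: 7 digits + newline + 8 letters
grepl("\\", l, fixed = TRUE)     # FALSE: there is no backslash to remove

# Hypothetical string that really contains a backslash (escaped in source).
m <- "1120190\\neconomic"
nchar(m)                         # 17: backslash and "n" are two characters
gsub("\\", "", m, fixed = TRUE)  # "1120190neconomic"
```

If cat() prints a literal \n instead of starting a new line, your data is in the second situation, and the fixed = TRUE call from the question should work.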

Related

Replace a pattern with special characters in R

I have a string x as below. I am trying to remove "c("" and "\nLOC" so that I am left with only Abc,xyz.
x<-"c(\"Abc, xyz\\nLOC"
This is what I tried, which works, but is there a shorter way of doing it?
x <-str_replace_all(x, "[^[:alnum:]]", " ")
x <-str_replace_all(x, "c ", "")
x <-str_replace_all(x, "nLOC", "")
With only a single example it's hard to know what to generalize, but you could do it all in one big pattern:
str_replace_all(x, "[^[:alnum:],]|^c|nLOC", "")
[1] "Abc,xyz"
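If you'd rather not load stringr, the same single pattern also works with base R's gsub() and its default regex engine. Like the stringr version, this sketch is only known to fit the one example string from the question:

```r
x <- "c(\"Abc, xyz\\nLOC"

# One pattern, three alternatives:
#   [^[:alnum:],]  any character that is not a letter, digit or comma
#   ^c             the leading "c" of "c(" (only at the very start)
#   nLOC           the "nLOC" left over once the backslash is removed
gsub("[^[:alnum:],]|^c|nLOC", "", x)
# [1] "Abc,xyz"
```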

How to throw out spaces and underscores only from the beginning of the string?

I want to remove the spaces and underscores at the beginning of a string in R.
I can write something like
txt <- gsub("^\\s+", "", txt)
txt <- gsub("^\\_+", "", txt)
But I think there could be a more elegant solution, so I tried:
txt <- " 9PM 8-Oct-2014_0.335kwh "
txt <- gsub("^[\\s+|\\_+]", "", txt)
txt
The output should be "9PM 8-Oct-2014_0.335kwh ". But my code gives " 9PM 8-Oct-2014_0.335kwh ".
How can I fix it?
You could put only \s and the underscore in a character class and use a quantifier to repeat it one or more times:
^[\s_]+
For example:
txt <- gsub("^[\\s_]+", "", txt, perl=TRUE)
Or, as @Tim Biegeleisen points out in the comments, since only the first occurrence needs to be replaced you could use sub instead (keeping the ^ anchor):
txt <- sub("^[\\s_]+", "", txt, perl=TRUE)
Or, using a POSIX character class:
txt <- sub("^[[:space:]_]+", "", txt)
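Keeping the ^ anchor matters once you switch to sub(), because sub() replaces the first match wherever it occurs. This sketch compares anchored and unanchored patterns on the question's string and on a hypothetical string y with no leading whitespace:

```r
txt <- " 9PM 8-Oct-2014_0.335kwh "

# Anchored: only a run of whitespace/underscores at the very start is removed.
sub("^[[:space:]_]+", "", txt)
# [1] "9PM 8-Oct-2014_0.335kwh "

# Hypothetical input without a leading run.
y <- "9PM 8-Oct-2014_0.335kwh"
# Unanchored: removes the first run it finds, here the space after "9PM".
sub("[[:space:]_]+", "", y)
# [1] "9PM8-Oct-2014_0.335kwh"
# Anchored: nothing at the start to remove, so y is returned unchanged.
sub("^[[:space:]_]+", "", y)
# [1] "9PM 8-Oct-2014_0.335kwh"
```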
See ?regex for more information about perl=TRUE and the regular expressions used in R.
The stringr package offers some task-specific functions with helpful names. In your original question you say you would like to remove whitespace and underscores from the start of your string, but in a comment you imply that you also wish to remove the same characters from the end of the string. To that end, I'll include a few different options.
Given the string s <- " \t_blah_ ", which contains whitespace (spaces and a tab) and underscores:
library(stringr)
# Remove whitespace and underscores at the start.
str_remove(s, "^[\\s_]+")
# [1] "blah_ "
# Remove whitespace and underscores at the start and end.
str_remove_all(s, "^[\\s_]+|[\\s_]+$")
# [1] "blah"
In case you're looking to remove whitespace only (there are, after all, no underscores at the start or end of your example string), there are a couple of stringr functions that will help you keep things simple:
# `str_trim` trims whitespace (spaces, tabs, newlines) from either or both sides.
str_trim(s, side = "left")
# [1] "_blah_ "
str_trim(s, side = "right")
# [1] " \t_blah_"
str_trim(s, side = "both") # This is the default.
# [1] "_blah_"
# `str_squish` trims both ends and reduces repeated whitespace inside the string to a single space.
s <- " \t_blah blah_ "
str_squish(s)
# [1] "_blah blah_"
If base R is more your jam, the same pattern [\\s_]+ also works in sub or gsub once you add perl = TRUE (see The fourth bird's answer).
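For reference, here is a sketch of base R equivalents of the two str_remove calls on the same string s. perl = TRUE is what lets \s be read as whitespace inside the brackets; as far as I can tell the default engine does not treat it that way, which would also explain why the attempt in the question left the leading space in place:

```r
s <- " \t_blah_ "

# Start only: sub() replaces the first match, and "^" pins it to the start.
sub("^[\\s_]+", "", s, perl = TRUE)
# [1] "blah_ "

# Start and end: gsub() so that both alternatives get a chance to match.
gsub("^[\\s_]+|[\\s_]+$", "", s, perl = TRUE)
# [1] "blah"
```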
You can use stringr like this:
txt <- " 9PM 8-Oct-2014_0.335kwh "
library(stringr)
str_trim(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
Or trimws in base R:
trimws(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
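Note that trimws(txt) and str_trim(txt) also remove the trailing space, while the question only asks to clean the start. trimws() has a which argument for that, and since R 3.6.0 a whitespace argument (a regex for one "whitespace" character, applied with perl = TRUE) that can also include the underscore. The string u below is a hypothetical example:

```r
txt <- " 9PM 8-Oct-2014_0.335kwh "

# Left side only; the trailing space is kept.
trimws(txt, which = "left")
# [1] "9PM 8-Oct-2014_0.335kwh "

# Hypothetical input with underscores mixed into the leading/trailing runs.
u <- "_ _9PM 8-Oct-2014_0.335kwh_ "
# Treat underscores as trimmable too (whitespace argument needs R >= 3.6.0).
trimws(u, which = "left", whitespace = "[\\s_]")
# [1] "9PM 8-Oct-2014_0.335kwh_ "
trimws(u, which = "both", whitespace = "[\\s_]")
# [1] "9PM 8-Oct-2014_0.335kwh"
```

The underscore inside "2014_0.335kwh" survives in both cases because only the runs at the ends are trimmed.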

Remove punctuation but keep hyphenated phrases in R text cleaning

Is there any effective way to remove punctuation in text but keeping hyphenated expressions, such as "accident-prone"?
I used the following function to clean my text
clean.text = function(x)
{
# remove rt
x = gsub("rt ", "", x)
# remove at
x = gsub("@\\w+", "", x)
x = gsub("[[:punct:]]", "", x)
x = gsub("[[:digit:]]", "", x)
# remove http
x = gsub("http\\w+", "", x)
x = gsub("[ |\t]{2,}", "", x)
x = gsub("^ ", "", x)
x = gsub(" $", "", x)
x = str_replace_all(x, "[^[:alnum:][:space:]'-]", " ")
#return(x)
}
and applied it to a hyphenated expression, which returned
my_text <- "accident-prone"
new_text <- clean.text(my_text)
new_text
[1] "accidentprone"
while my desired output is
"accident-prone"
I have looked at a related thread, but its solution didn't work for my situation. There must be some regex detail that I haven't figured out. I would really appreciate it if someone could enlighten me on this.
Putting my two cents in, you could use (*SKIP)(*FAIL) with perl = TRUE and remove any non-word characters:
data <- c("my-test of #$%^&*", "accident-prone")
(gsub("(?<![^\\w])[- ](?=\\w)(*SKIP)(*FAIL)|\\W+", "", data, perl = TRUE))
Resulting in
[1] "my-test of" "accident-prone"
See a demo on regex101.com.
Here the idea is to match what you want to keep
(?<![^\\w])[- ](?=\\w)
# a whitespace or a dash between two word characters
# or at the very beginning of the string
Let these fail with (*SKIP)(*FAIL), and put what you want removed on the right side of the alternation, in this case
\W+
This effectively removes any non-word characters that are not between word characters.
You'd need to provide more examples for testing though.
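Since more test cases would help, here is a quick probe of the (*SKIP)(*FAIL) pattern with a few made-up inputs (hypothetical, not from the question). Hyphens inside words survive and a leading # is stripped, but a free-standing dash with a space on each side is removed together with the spaces, which glues the neighbouring words together:

```r
# Hypothetical extra inputs to probe the pattern from the answer above.
data <- c("state-of-the-art!", "#tag word", "well - spaced")
res <- gsub("(?<![^\\w])[- ](?=\\w)(*SKIP)(*FAIL)|\\W+", "", data, perl = TRUE)
res
# returns "state-of-the-art", "tag word" and "wellspaced"
```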
The [:punct:] class includes the dash, so your code removes it. You could write your own character class that omits the dash. You need to pay special attention to where the square brackets go, and escape the double quote and the backslash:
(test <- gsub("[]!\"#$%&'()*+,./:;<=>?@[\\^_`{|}~]", "", "my-test of #$%^&*") )
[1] "my-test of "
The ?regex help page advises against using ranges. I investigated whether my local ASCII ordering of the punctuation characters would allow any simplification, but it quickly became obvious that this was not the way to go: there were 5 separate ranges, and the "]" fell in the middle of one of them, so there would have been 7 ranges to handle in addition to the "]", which needs to come first.
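Two shorter alternatives that leave the hyphen alone, sketched on the same test strings used in the answers above: exclude - from [[:punct:]] with a negative lookahead (this needs perl = TRUE), or invert the logic and delete everything that is not a letter, digit, whitespace or hyphen. The second also drops other non-alphanumeric characters (emoji, for example), so the two can differ on messier input:

```r
data <- c("my-test of #$%^&*", "accident-prone")

# Option 1 (PCRE): every [[:punct:]] character except "-", via a negative lookahead.
gsub("(?!-)[[:punct:]]", "", data, perl = TRUE)
# returns "my-test of " and "accident-prone"

# Option 2 (default engine): keep only letters, digits, whitespace and "-".
gsub("[^[:alnum:][:space:]-]", "", data)
# returns "my-test of " and "accident-prone"
```

Either line should be able to stand in for x = gsub("[[:punct:]]", "", x) in clean.text; the trailing space left on "my-test of " is what the later "^ " / " $" steps in that function are there to clean up.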

add space in string when meeting a given pattern

I have a string as follows:
a<-c("AbcDef(123)")
> a
[1] "AbcDef(123)"
Is there an efficient way to transform it into
a<-c("Abc Def (123)")
In other words, I would like to add a space before each upper-case letter and before the special character "(", except at the very start of the string.
One possibility:
gsub("(?<=[^A-Z(])(?=[A-Z(])", " ", a, perl=TRUE)
Mine's a bit kludgy and uses two gsubs: the inner gsub adds the spaces, and the outer gsub removes the leading whitespace.
a <- "AbcDef(123)"
gsub("^\\s", "", gsub("([A-Z(])", " \\1", a))
Try this:
gsub("(?<=.)([A-Z(])", " \\1", a, perl = TRUE)
giving:
[1] "Abc Def (123)"
If no piece of the spaced-out string is a single character, it can be simplified to this:
gsub("(.)([A-Z(])", "\\1 \\2", a)
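The three one-liners agree on "AbcDef(123)" but not on every input. This sketch wraps them in functions (the names lookaround, any_prev and pairwise are made up here) and runs them on a hypothetical string with a run of capitals:

```r
a <- "AbcDef(123)"
b <- "ABC(1)"   # hypothetical input with a run of capitals

# Wrappers around the three answers above; the function names are made up.
lookaround <- function(x) gsub("(?<=[^A-Z(])(?=[A-Z(])", " ", x, perl = TRUE)
any_prev   <- function(x) gsub("(?<=.)([A-Z(])", " \\1", x, perl = TRUE)
pairwise   <- function(x) gsub("(.)([A-Z(])", "\\1 \\2", x)

# All three agree on the question's example.
lookaround(a)  # [1] "Abc Def (123)"
any_prev(a)    # [1] "Abc Def (123)"
pairwise(a)    # [1] "Abc Def (123)"

lookaround(b)  # [1] "ABC(1)"     never inserts a space after a capital or "("
any_prev(b)    # [1] "A B C (1)"  space before every capital or "(" but the first character
pairwise(b)    # [1] "A BC (1)"   each match consumes two characters, so "C" is skipped
```

Which behaviour is right depends on how you want acronyms such as "ABC" to be treated.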

remove all line breaks (enter symbols) from the string using R

How to remove all line breaks (enter symbols) from the string?
my_string <- "foo\nbar\rbaz\r\nquux"
I've tried gsub("\n", "", my_string), but it doesn't work, because new line and line break aren't equal.
You need to strip \r and \n to remove carriage returns and new lines.
x <- "foo\nbar\rbaz\r\nquux"
gsub("[\r\n]", "", x)
## [1] "foobarbazquux"
Or
library(stringr)
str_replace_all(x, "[\r\n]" , "")
## [1] "foobarbazquux"
I just wanted to note that if you want to insert spaces where the line breaks were, the best option is the following:
gsub("\r?\n|\r", " ", x)
which inserts only one space per line break, regardless of whether the text contains \r\n, \n or \r.
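The difference from a plain [\r\n] character class shows up on Windows-style \r\n endings: the class matches the two characters separately, so each \r\n turns into two spaces. A sketch using the question's string:

```r
x <- "foo\nbar\rbaz\r\nquux"

# A character class replaces "\r" and "\n" one at a time: "\r\n" -> two spaces.
gsub("[\r\n]", " ", x)
# [1] "foo bar baz  quux"

# "\r?\n|\r" matches "\r\n" as a single line break, so every break -> one space.
gsub("\r?\n|\r", " ", x)
# [1] "foo bar baz quux"
```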
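I have had success with:
gsub("\\\n", "", x)
although this pattern only targets \n, so the \r characters in the question's string are left in place.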
With stringr::str_remove_all
library(stringr)
str_remove_all(my_string, "[\r\n]")
# [1] "foobarbazquux"
