How do I gsub ' with empty? [duplicate] - r

For example, given a string like this:
NANYANG-GIRLS'-HIGH-SCHOOL
how do I use gsub to replace the ' with an empty string, so that it becomes
NANYANG-GIRLS-HIGH-SCHOOL
When I try this in R, I get an error.

You can use either of the following two approaches:
sec_name <- gsub('\'', '', sec_name, fixed=TRUE)
sec_name <- gsub("'", "", sec_name, fixed=TRUE)
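The first approach is a corrected version of what you were doing. Here, the strings are delimited by single quotes, so the single quote inside the pattern is escaped with a backslash to make it a literal single quote. The second approach delimits the strings with double quotes instead, so the single quote needs no escaping.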
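As a quick check, the two calls can be run on the string from the question. This is only a sketch; the variable names `escaped` and `unescaped` are made up for illustration.

```r
# Example input taken from the question.
sec_name <- "NANYANG-GIRLS'-HIGH-SCHOOL"

# Single-quoted pattern: the apostrophe has to be escaped as \'.
escaped <- gsub('\'', '', sec_name, fixed = TRUE)

# Double-quoted pattern: the apostrophe needs no escaping.
unescaped <- gsub("'", "", sec_name, fixed = TRUE)

print(escaped)
# [1] "NANYANG-GIRLS-HIGH-SCHOOL"
print(identical(escaped, unescaped))
# [1] TRUE
```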

Related

How to treat operators like characters in R [duplicate]

I often have input dataframes where data in some columns are delimited by a "||". I would like to be able to remove all the data after the "||", but since "||" is an operator weird things happen when I treat it like a normal string, e.g.:
gsub("||.*", "", df$col) and str_replace(df$col, "||", "") do not do what I expect them to do.
Is there a simple way to force R to read operators as if they were any other character?
Thanks!
Any of these statements should work:
sub("[|][|].*", "", x)
sub("[|]{2}.*", "", x)
sub("\\|\\|.*", "", x)
sub("\\|{2}.*", "", x)
The problem is not that || is an operator in R. It is that | is a metacharacter in regular expressions. You can get around the special interpretation of metacharacters by placing them inside of character classes delimited by [] or escaping them with a backslash (and escaping that backslash with a second backslash). See ?regex for details.
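To see the two escaping styles side by side, here is a small sketch on a made-up vector (the sample values and the names `bracketed` and `escaped_bar` are assumptions, not from the question):

```r
# Hypothetical sample data; the last element has no "||" at all.
x <- c("alpha||beta", "gamma||delta||epsilon", "no delimiter")

# Character-class form: [|] matches a literal vertical bar.
bracketed <- sub("[|][|].*", "", x)

# Backslash form: \\| in an R string is the regex \|, a literal bar.
escaped_bar <- sub("\\|\\|.*", "", x)

print(bracketed)
# [1] "alpha"        "gamma"        "no delimiter"
print(identical(bracketed, escaped_bar))
# [1] TRUE
```

Because `.*` is greedy and the match starts at the first `||`, everything from the first delimiter onward is dropped, and strings without a delimiter are left unchanged.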

Is there a way to keep only defined characters in a string from a whitelist? [duplicate]

I'm looking for a way to use a whitelist that contains digits and the Plus sign "+" to replace all other chars from a string.
string <- "opiqr8929348t89hr289r01++r42+3525"
I tried first to use:
gsub("[[:punct:][:alpha:]]", "", string)
but this also removes the "+":
# [1] "89293488928901423525"
How can I exclude the "+" from [:punct:]?
So my intention is to use a whitelist instead:
whitelist <- c("0123456879+")
Is there a way to use gsub() the other way around, so that the whitelist identifies the characters that should remain?
What about this:
string <- "opiqr8929348t89hr289r01++r42+3525"
gsub("[^0-9+]", "", string)
# [1] "89293488928901++42+3525"
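This replaces every character that is not a digit (0-9) or a plus sign with "". Inside a character class, a leading ^ negates the class and + is a literal character.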
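If the whitelist should stay in a variable, as in the question, the negated class can be built from it. The helper `keep_only` below is a hypothetical name, and the sketch assumes the whitelist contains no characters that are special inside brackets (such as ], ^, - or a backslash):

```r
# Hypothetical helper: remove every character that is not in `allowed`.
# Assumes `allowed` has no characters that are special inside [...]
# (for example ], ^, - or a backslash).
keep_only <- function(x, allowed) {
  pattern <- paste0("[^", allowed, "]")  # e.g. "[^0123456879+]"
  gsub(pattern, "", x)
}

string <- "opiqr8929348t89hr289r01++r42+3525"
whitelist <- "0123456879+"

result <- keep_only(string, whitelist)
print(result)
# [1] "89293488928901++42+3525"
```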

How to escape "..." [duplicate]

R imports columns that have no column name as ...1, and I need to replace the ... with something else.
Trying:
str_replace("hi_...","/././.","&")
It seems you are trying to replace each dot . with &. In a regex an unescaped . matches any character, so you need to escape it as \\. and use str_replace_all. Try this:
library(stringr)
str_replace_all("hi_...","\\.","&")
Output,
[1] "hi_&&&"
In case you want to replace all three dots with a single & (which I doubt is what you wanted), use this:
str_replace("hi_...","\\.\\.\\.","&")
OR
str_replace("hi_...","\\.+","&")
Another way to achieve the same result is to use gsub:
gsub("\\.", "&", "hi_...")
We can also use a character class with a quantifier:
library(stringr)
str_replace("hi_...", "[.]{3}", "&")
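The difference between an unescaped dot, an escaped dot, and a fixed (non-regex) pattern can be checked in base R. This is a sketch; the variable names are made up for illustration.

```r
s <- "hi_..."

# Unescaped: . matches any character, so every character is replaced.
any_char <- gsub(".", "&", s)

# Escaped: \\. matches only a literal dot.
each_dot <- gsub("\\.", "&", s)

# fixed = TRUE treats the pattern as plain text, so no escaping is needed.
each_dot_fixed <- gsub(".", "&", s, fixed = TRUE)

# Replace the three dots as one unit with a single &.
dots_as_one <- sub("...", "&", s, fixed = TRUE)

print(any_char)     # [1] "&&&&&&"
print(each_dot)     # [1] "hi_&&&"
print(dots_as_one)  # [1] "hi_&"
```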

How to remove beginning-digits only in R [duplicate]

I have some strings with digits and alpha characters in them. Some of the digits are important, but the ones at the beginning of the string (and only these) are unimportant. This is due to a peculiarity in how email addresses are stored. So the best example is:
x<-'12345johndoe23#gmail.com'
Should be transformed to johndoe23#gmail.com
Unfortunately, there are no spaces. I have tried gsub('[[:digit:]]+', '', x), but this removes all digits, not just the ones at the beginning.
Edit: I have found some solutions in other languages: Python: Remove numbers at the beginning of a string
As per my comment, use this pattern:
^[[:digit:]]+
The ^ asserts the position at the start of the string, so only the leading digits match.
You can do this:
x<-'12345johndoe23#gmail.com'
gsub('^[[:digit:]]+', '', x) # ^ anchors the match to the start of the string
Another option is:
sub('^\\d+','',x)
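A short sketch of the anchored pattern applied to a vector. Only the first element comes from the question; the other values and the name `stripped` are made up for illustration.

```r
# First element is from the question; the others are hypothetical.
x <- c("12345johndoe23#gmail.com", "jane7#example.com", "2024")

# ^ anchors the match at the start, so only leading digits are removed.
stripped <- sub("^[[:digit:]]+", "", x)

print(stripped)
# [1] "johndoe23#gmail.com" "jane7#example.com"   ""
```

Since the pattern can match at most once per string, sub and gsub give the same result here. Note that a string consisting only of digits becomes empty.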

Using Gsub in R to remove a string containing brackets [duplicate]

I'm trying to use gsub to remove certain parts of a string. However, I can't get it to work, and I think it's because the string to be removed contains brackets. Is there any way around this? Thanks for any help.
The command I want to use:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Returns:
#"(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"
Expected output:
#"(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
A quick test to see if brackets were the problem:
gsub('te','', 'test')
#[1] "st"
gsub('(te)','', '(te)st')
#[1] "()st"
We can do this by placing each parenthesis inside square brackets, because ( and ) are metacharacters:
gsub('[(]4:4aCO[)]_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Or use fixed = TRUE so that the pattern is matched literally:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)', fixed = TRUE)
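The following sketch compares three ways of matching the parentheses literally, using the input string from the question (underscores included). The variable names are made up for illustration.

```r
s <- "(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"

# Character classes: [(] and [)] match literal parentheses.
by_class <- gsub("[(]4:4aCO[)]_", "", s)

# Backslash escapes: \\( and \\) in an R string are the regex \( and \).
by_escape <- gsub("\\(4:4aCO\\)_", "", s)

# fixed = TRUE: the whole pattern is treated as plain text.
by_fixed <- gsub("(4:4aCO)_", "", s, fixed = TRUE)

print(by_class)
# [1] "(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
print(identical(by_class, by_escape) && identical(by_class, by_fixed))
# [1] TRUE
```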
