Replace Incomplete Bracket with Gsub [duplicate]

I want to do a simple replace in R for the following column:
df
Songs
1 Saga (Skit) [feat. RZA
2 Revenge
3 Whatever You Want
4 What About Us
5 But We Lost It
6 Barbies
I want to do two different replacements:
1) Replace "[" with blank
2) Replace "]" with blank
I need to do these separately, though, because some of my values have only one of the two brackets, like the first value in the Songs column.
df[,1]<-gsub("[","",df[,1])
Error:
Error in gsub("[", "", newdf2[, 1]) :
invalid regular expression '[', reason 'Missing ']''
How do I get around this invalid regular expression error?
Thanks!

Sometimes you have to double escape things in R. This should work to do both the replacements in one go.
gsub("\\[|\\]", "", df$Songs)

The [ is a regex metacharacter, so it needs to be escaped:
gsub("\\[|\\]", "", df$Songs)
Another way is to put both brackets inside a character class:
gsub("[][]", "", df$Songs)
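To compare the options side by side, here is a minimal sketch. The `songs` vector is made-up sample data (the third title is an assumption added to show a value with both brackets), and the `fixed = TRUE` variant is an extra alternative that is not taken from the answers above.

```r
# Hypothetical sample data: one value with only "[", one with both brackets
songs <- c("Saga (Skit) [feat. RZA", "Revenge", "Intro [Live]")

# 1) Escape each bracket and join them with alternation
escaped <- gsub("\\[|\\]", "", songs)

# 2) Bracket expression: "]" placed first is taken literally, then "["
bracket_class <- gsub("[][]", "", songs)

# 3) fixed = TRUE treats the pattern as a plain string, so no escaping is needed
literal <- gsub("[", "", songs, fixed = TRUE)
literal <- gsub("]", "", literal, fixed = TRUE)

print(escaped)
```

All three approaches should give the same result, so the choice is mostly about readability.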

Related

Gsub in R for hyphens and digits [duplicate]

I'm trying to use gsub on the df$Zipcode in the following data frame:
#Sample
df <-data.frame(ID = c(1,2,3,4,5,6,7),
Zipcode =c("10001-2838", "95011", "95011", "100028018", "84321", "84321", "94011"))
df
I want to take everything after the "-" (hyphen) out and replace it with nothing. Something like:
df$Zipcode <- gsub("\-", "", df$Zipcode)
But I don't think that is quite right. I also want to keep only the first 5 digits of all Zipcodes that are longer than 5 digits, like observation 4, which should become just 10002. Maybe this is correct:
df$Zipcode <- gsub("[:6:]", "", df$Zipcode)
We can capture the first 5 characters that are not a - as a group, then replace the whole match with the backreference (\\1) to that captured group:
df$Zipcode <- sub("^([^-]{5}).*", "\\1", df$Zipcode)
df$Zipcode
#[1] "10001" "95011" "95011" "10002" "84321" "84321" "94011"
I think what you're looking for is this:
sub("(\\d{5}).*", "\\1", df$Zipcode)
[1] "10001" "95011" "95011" "10002" "84321" "84321" "94011"
This matches the first 5 digits, puts them into a capturing group, and 'remembers' them (but not the rest) via backreference \\1 in the replacement argument to sub.
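As a quick check, the sketch below runs both patterns on a shortened copy of the sample data and compares them with plain `substr`. The `substr` variant is not from the answers; it only works under the assumption that the five wanted digits always sit at the start of the value.

```r
# Shortened copy of the sample Zipcode column
zip <- c("10001-2838", "95011", "100028018", "84321")

# Anchored pattern: first five characters that are not a hyphen
by_class <- sub("^([^-]{5}).*", "\\1", zip)

# Digit pattern: first run of five digits
by_digits <- sub("(\\d{5}).*", "\\1", zip)

# No regex at all: keep characters 1 to 5 (assumes the zip starts the string)
by_substr <- substr(zip, 1, 5)

print(by_class)
```

On this data all three vectors should be identical; they would only differ for values that do not start with five digits.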

Remove characters which repeat more than twice in a string [duplicate]

I have this text:
F <- "hhhappy birthhhhhhdayyy"
and I want to remove the repeated characters. I tried the code from
https://stackoverflow.com/a/11165145/10718214
and it works, but I need to remove a character only if it repeats more than twice; if it appears exactly twice, it should be kept.
So the output that I expect is
"happy birthday"
Any help?
Try using gsub, with the pattern (.)\\1{2,}:
F <- ("hhhappy birthhhhhhdayyy")
gsub("(.)\\1{2,}", "\\1", F)
[1] "happy birthday"
Explanation of regex:
(.) match and capture any single character
\\1{2,} then match that same character two or more additional times (three or more in a row in total)
We replace each such run with just the single captured character. \\1 refers to the first capture group, both inside the pattern and in the replacement.
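To see that doubled letters survive while longer runs collapse, here is a small sketch. The helper name `squeeze_repeats` and the extra test words are made up for illustration.

```r
# Hypothetical helper: collapse any run of 3+ identical characters to one
squeeze_repeats <- function(x) gsub("(.)\\1{2,}", "\\1", x)

fixed_text <- squeeze_repeats("hhhappy birthhhhhhdayyy")
print(fixed_text)

# "book" has only a double "o", so it is left alone;
# "coool" has a triple "o", which collapses to a single "o"
others <- squeeze_repeats(c("book", "coool"))
print(others)
```

Note that a run of three or more is reduced to one character, not two, which is what the expected output "happy birthday" requires.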

Remove characters in string before specific symbol(including it) [duplicate]

To begin with: yes, similar questions already exist here, but the solutions don't work as they should, at least for me.
I'd like to remove everything (letters and digits, in any combination) before the first semicolon, and remove the semicolon too.
So we have some strings:
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"
Let's use gsub() with . and *, which together mean any character, zero or more times:
gsub(".*;", "", x)
gsub(".*;", "", y)
And as a result I get:
[1] "GEF2"
[1] "3DR"
But I'd like to have:
[1] "ABC;GEF2"
[1] "EER;3DR"
Why did it 'catch' the second occurrence of the semicolon instead of the first?
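The .* is greedy: it matches as much as possible, so it runs up to the last semicolon. Match only non-semicolon characters before the first semicolon instead. You could use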
gsub("[^;]*;(.*)", "\\1", x)
# [1] "ABC;GEF2"
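For comparison, here is a sketch of the greedy pattern next to two ways of stopping at the first semicolon. The `sub` with a negated class and the lazy `.*?` with `perl = TRUE` are alternatives I am adding, not taken from the answer above.

```r
x <- c("1;ABC;GEF2", "X;EER;3DR")

# Greedy: .* runs to the LAST semicolon, so too much is removed
greedy <- gsub(".*;", "", x)

# Negated class: [^;]* cannot cross a semicolon, so it stops at the first one
first_only <- sub("^[^;]*;", "", x)

# Lazy quantifier: .*? matches as little as possible (PCRE via perl = TRUE)
lazy <- sub("^.*?;", "", x, perl = TRUE)

print(first_only)
```

Using `sub` rather than `gsub` makes the intent explicit, since only the first match should be replaced.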

What does the regular expression [[:space:][:digit:]]+ stand for in R? [duplicate]

I am trying to find out what the regular expression [[:space:][:digit:]]+ stands for.
I learned from Wikipedia that [:space:] means whitespace characters and [:digit:] means the digits 0 to 9.
So I thought [[:space:][:digit:]]+ matches whitespace followed by a digit, like ' 1' or ' 9'.
But when I try this in R:
> txt <- c("arm","foot","lefroo", "laura ")
> i <- grep("[[:space:][:digit:]]+", txt)
> txt[i]
[1] "laura "
There is no digit in "laura ", but it still matched.
This really confused me. Can anyone explain this?
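A short sketch of what is going on: the outer [ ] forms a single bracket expression, and the two classes inside it are alternatives, so the pattern means "one or more characters, each of which is whitespace or a digit". The trailing space in "laura " is enough for a match. The extra strings "room 12" and "x1" are made-up additions to show the difference from a "whitespace then digits" pattern.

```r
txt <- c("arm", "foot", "lefroo", "laura ", "room 12", "x1")

# One bracket expression: each matched character is whitespace OR a digit
either <- grepl("[[:space:][:digit:]]+", txt)

# Two bracket expressions in sequence: whitespace FOLLOWED BY digits
space_then_digit <- grepl("[[:space:]]+[[:digit:]]+", txt)

print(txt[either])
print(txt[space_then_digit])
```

If "whitespace followed by a digit" is what you want, the second pattern is the one to use.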

Column contains unit ($ sign) that need to be replaced [duplicate]

I have a few columns, imported from an Excel sheet, that contain a $ in the values.
[1] "$5,656.50" "$3,179.20" "$1,391.40" "$2,376.30" "$1,476.80" "$712.30" "$5,327.80"
[8] "$3,642.70" "$1,506.00" "$7,923.70" "$4,782.30" "$1,392.40" "$229.30" "$1,106.90"
[15] "$1,553.30" "$3,492.30" "$4,029.40" "$1,646.70" "$6,013.90" "$19,928.00" "$4,260.60"
There are more than 10,000 rows in this column, and R reads it as character because of the "$".
I tried
gsub( "$", " ", thedata$col.with.dollar.signs)
to replace the dollar sign with a space, but it didn't work.
Any other ideas are much appreciated.
This one maybe:
substring(thedata$col.with.dollar.signs, 2)
For example:
vec <- c("$5,656.50", "$3,179.20", "$1,391.40")
substring(vec,2)
#[1] "5,656.50" "3,179.20" "1,391.40"
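The original attempt did not work because an unescaped $ is the end-of-string anchor in a regex, so no literal dollar sign is ever matched. Below is a minimal sketch of two ways around that, plus a conversion to numeric. Removing the thousands separators as well is my assumption about the end goal; the question only asks about the dollar sign.

```r
vec <- c("$5,656.50", "$3,179.20", "$712.30")

# Escape the dollar sign so it is matched literally
no_dollar <- gsub("\\$", "", vec)

# Or skip regex interpretation entirely with fixed = TRUE
no_dollar_fixed <- gsub("$", "", vec, fixed = TRUE)

# Inside a bracket expression "$" is literal; drop "$" and "," in one pass,
# then convert to numeric (assumes "," is only a thousands separator)
amounts <- as.numeric(gsub("[$,]", "", vec))

print(no_dollar)
print(amounts)
```

Unlike `substring(vec, 2)`, these do not depend on the dollar sign being the first character.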
