Column contains unit ($ sign) that need to be replaced [duplicate] - r

This question already has answers here:
How do I strip dollar signs ($) from data/ escape special characters in R?
(4 answers)
Closed 5 years ago.
I have a few columns that contain a $ in the value through the excel sheet.
[1] "$5,656.50" "$3,179.20" "$1,391.40" "$2,376.30" "$1,476.80" "$712.30" "$5,327.80"
[8] "$3,642.70" "$1,506.00" "$7,923.70" "$4,782.30" "$1,392.40" "$229.30" "$1,106.90"
[15] "$1,553.30" "$3,492.30" "$4,029.40" "$1,646.70" "$6,013.90" "$19,928.00" "$4,260.60"
There are >10,000 rows in this column and R will read it as a character due to the "$".
I tried
gsub( "$", " ", thedata$col.with.dollar.signs)
to replace the dollar sign with a space, but it didn't work.
Any other ideas are much appreciated.

This one maybe:
substring(thedata$col.with.dollar.signs, 2)
For example:
vec <- c("$5,656.50", "$3,179.20", "$1,391.40")
substring(vec,2)
#[1] "5,656.50" "3,179.20" "1,391.40"

Related

Regex: extracting matches preceding a pattern in R [duplicate]

This question already has answers here:
Remove part of string after "."
(6 answers)
Extract string before "|" [duplicate]
(3 answers)
Closed 1 year ago.
I'm trying to extract matches preceding a pattern in R. Lets say that I have a vector consisting of the next elements:
my_vector
> [1] "ABCC12|94160" "ABCC13|150000" "ABCC1|4363" "ACTA1|58"
[5] "ADNP2|22850" "ADNP|23394" "ARID1B|57492" "ARID2|196528"
I'm looking for a regular expression to extract all characters preceding the "|". The expected result must be something like this:
my_new_vector
> [1] "ABCC12" "ABCC13" "ABCC1" "ACTA1"
and so on.
I have already tried using stringr functions and regular expressions based on look arounds, but I failed.
I really appreciate your advices and help to solve my issue.
Thanks in advance!
We could use trimws and specify the whitespace as a regex that matches the | (metacharacter - so escape \\ followed by one or more character (.*)
trimws(my_vector, whitespace = "\\|.*")

Remove characters which repeat more than twice in a string [duplicate]

This question already has answers here:
remove repeated character between words
(4 answers)
Closed 3 years ago.
I have this text:
F <- "hhhappy birthhhhhhdayyy"
and I want to remove the repeat characters, I tried this code
https://stackoverflow.com/a/11165145/10718214
and it works, but I need to remove repeat characters if it repeats more than 2, and if it repeated 2 times keep it.
so the output that I expect is
"happy birthday"
any help?
Try using sub, with the pattern (.)\\1{2,}:
F <- ("hhhappy birthhhhhhdayyy")
gsub("(.)\\1{2,}", "\\1", F)
[1] "happy birthday"
Explanation of regex:
(.) match and capture any single character
\\1{2,} then match the same character two or more times
We replace with just the single matching character. The quantity \\1 represents the first capture group in sub.

Difference between [A-Z] and LETTERS in grep [duplicate]

This question already has answers here:
Matching multiple patterns
(6 answers)
Closed 5 years ago.
I am trying to only keep rows whose id contains letters. And I find the following two ways give different results.
df[grep("[A-Z]",df$id),]
df[grep(LETTERS,df$id),]
It seems the second way will omit many rows that actually have letters.
Why?
If you want to grep patterns in a vector try this:
to_match <- paste(LETTERS, collapse = "|")
to_match
[1] "A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z"
and then
df[grep(to_match, df$id), ]
Explanation:
You will match any of the characters in "to_match" since they are separated by the "or" operator "|".

Remove characters in string before specific symbol(including it) [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Use gsub remove all string before first white space in R
(4 answers)
Closed 5 years ago.
at the beginning, yes - simillar questions are present here, however the solution doesn't work as it should - at least for me.
I'd like to remove all characters, letters and numbers with any combination before first semicolon, and also remove it too.
So we have some strings:
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"
Let's do so gsub() with . and * which means any symbol with occurance 0 or more:
gsub(".*;", "", x)
gsub(".*;", "", y)
And as a result i get:
[1] "GEF2"
[1] "3DR"
But I'd like to have:
[1] "ABC;GEF2"
[1] "EER;3DR"
Why did it 'catch' second occurence of semicolon instead of first?
You could use
gsub("[^;]*;(.*)", "\\1", x)
# [1] "ABC;GEF2"

what regular expression [[:space:][:digit:]]+ stands for in r [duplicate]

This question already has answers here:
What is the difference between square brackets and parentheses in a regex?
(3 answers)
How to use double brackets in a regular expression?
(2 answers)
Closed 5 years ago.
I am trying to find out what is this regular expression [[:space:][:digit:]]+ stands for.
I learn from Wikipedia that [:space:] means Whitespace characters and [:digit:] means Digits from 0 to 9.
So I think [[:space:][:digit:]]+ matches any Whitespace characters followed by a digit like ' 1' or ' 9'.
But, when I try this in r:
> txt <- c("arm","foot","lefroo", "laura ")
> i <- grep("[[:space:][:digit:]]+", txt)
> txt[i]
[1] "laura "
there is no digit in "laura ", but it sill matched.
this really confused me, any one can explain this?

Resources