This question already has answers here:
How do I strip dollar signs ($) from data/ escape special characters in R?
(4 answers)
Closed 5 years ago.
I have a few columns, imported from an Excel sheet, whose values contain a $.
[1] "$5,656.50" "$3,179.20" "$1,391.40" "$2,376.30" "$1,476.80" "$712.30" "$5,327.80"
[8] "$3,642.70" "$1,506.00" "$7,923.70" "$4,782.30" "$1,392.40" "$229.30" "$1,106.90"
[15] "$1,553.30" "$3,492.30" "$4,029.40" "$1,646.70" "$6,013.90" "$19,928.00" "$4,260.60"
There are more than 10,000 rows in this column, and R reads it as character because of the "$".
I tried
gsub( "$", " ", thedata$col.with.dollar.signs)
to replace the dollar sign with a space, but it didn't work.
Any other ideas are much appreciated.
One option is substring, which drops the first character of each value:
substring(thedata$col.with.dollar.signs, 2)
For example:
vec <- c("$5,656.50", "$3,179.20", "$1,391.40")
substring(vec,2)
#[1] "5,656.50" "3,179.20" "1,391.40"
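As a side note, the gsub call in the question does not touch the dollar signs because, in a regular expression, "$" is the end-of-string anchor rather than a literal character. Here is a minimal sketch, assuming the end goal is a numeric column: inside a bracket expression "$" is literal, so "[$,]" matches both the dollar sign and the thousands separators. The names prices, amounts and no_dollar are made up for illustration.

```r
# Example values copied from the question
prices <- c("$5,656.50", "$3,179.20", "$1,391.40")

# "[$,]" matches a literal dollar sign or comma; gsub removes every
# occurrence, and as.numeric converts what is left.
amounts <- as.numeric(gsub("[$,]", "", prices))
amounts

# Alternative for the dollar sign alone: treat the pattern as a fixed
# string instead of a regular expression.
no_dollar <- sub("$", "", prices, fixed = TRUE)
no_dollar
```

Unlike substring(x, 2), this does not rely on every value starting with exactly one unwanted character.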
This question already has answers here:
Remove part of string after "."
(6 answers)
Extract string before "|" [duplicate]
(3 answers)
Closed 1 year ago.
I'm trying to extract the text preceding a pattern in R. Let's say I have a vector consisting of the following elements:
my_vector
> [1] "ABCC12|94160" "ABCC13|150000" "ABCC1|4363" "ACTA1|58"
[5] "ADNP2|22850" "ADNP|23394" "ARID1B|57492" "ARID2|196528"
I'm looking for a regular expression to extract all characters preceding the "|". The expected result must be something like this:
my_new_vector
> [1] "ABCC12" "ABCC13" "ABCC1" "ACTA1"
and so on.
I have already tried stringr functions and regular expressions based on lookarounds, but without success.
I would really appreciate any advice or help with this.
Thanks in advance!
We could use trimws and specify whitespace as a regex that matches the | (a metacharacter, so it is escaped with \\) followed by zero or more characters (.*). Note that the whitespace argument requires R 3.6.0 or later.
trimws(my_vector, whitespace = "\\|.*")
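If trimws with a custom whitespace pattern feels indirect, the same idea can be sketched with plain sub, or with strsplit on a fixed string. This is only an illustration on the first four elements from the question; before_pipe and first_piece are made-up names.

```r
my_vector <- c("ABCC12|94160", "ABCC13|150000", "ABCC1|4363", "ACTA1|58")

# Delete the first literal "|" and everything after it.
before_pipe <- sub("\\|.*", "", my_vector)
before_pipe

# Same result without a regex: split on a fixed "|" and keep the first piece.
first_piece <- vapply(strsplit(my_vector, "|", fixed = TRUE),
                      `[`, character(1), 1)
first_piece
```

Both variants return "ABCC12", "ABCC13", "ABCC1" and "ACTA1" for this input.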
This question already has answers here:
remove repeated character between words
(4 answers)
Closed 3 years ago.
I have this text:
F <- "hhhappy birthhhhhhdayyy"
and I want to remove the repeated characters. I tried the code from this answer:
https://stackoverflow.com/a/11165145/10718214
It works, but I need to collapse a character only when it is repeated more than twice; a character repeated exactly twice should be kept.
so the output that I expect is
"happy birthday"
any help?
Try using gsub, with the pattern (.)\\1{2,}:
F <- ("hhhappy birthhhhhhdayyy")
gsub("(.)\\1{2,}", "\\1", F)
[1] "happy birthday"
Explanation of regex:
(.) match and capture any single character
\\1{2,} then match the same character two or more times
We replace each such run with the single captured character. In the replacement string, \\1 refers to the first capture group.
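To see the threshold in action, here is a small sketch on a few made-up strings (words and collapsed are illustrative names): runs of three or more collapse to a single character, while doubled letters are left alone.

```r
words <- c("hhhappy birthhhhhhdayyy", "coool book", "aaa bb c")

# (.) captures one character; \\1{2,} requires at least two more copies,
# so only runs of length three or more are replaced by a single copy.
collapsed <- gsub("(.)\\1{2,}", "\\1", words)
collapsed
```

Here the "pp" in "happy", the "oo" in "book" and the "bb" stay as they are, whereas "ooo" and "aaa" shrink to one character each.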
This question already has answers here:
Matching multiple patterns
(6 answers)
Closed 5 years ago.
I am trying to keep only the rows whose id contains letters, and I find that the following two approaches give different results.
df[grep("[A-Z]",df$id),]
df[grep(LETTERS,df$id),]
It seems the second way will omit many rows that actually have letters.
Why?
If you want to grep patterns in a vector try this:
to_match <- paste(LETTERS, collapse = "|")
to_match
[1] "A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z"
and then
df[grep(to_match, df$id), ]
Explanation:
You will match any of the characters in "to_match" since they are separated by the "or" operator "|".
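As for why the second call drops rows: grep expects a single pattern, and when it is handed a longer vector such as LETTERS it uses only the first element (with a warning), so it effectively searches for "A" alone. A small sketch with a hypothetical ids vector makes the difference visible. The character class [[:upper:]] is used here instead of [A-Z] because the interpretation of ranges can depend on the locale.

```r
ids <- c("A12", "B7", "123", "x9", "45Z")   # made-up ids for illustration

# What grep(LETTERS, ids) effectively tests: only the first pattern, "A".
only_a <- grepl(LETTERS[1], ids)

# All 26 letters joined with the "or" operator.
any_letter <- grepl(paste(LETTERS, collapse = "|"), ids)

# Equivalent and shorter: a POSIX character class for uppercase letters.
upper_class <- grepl("[[:upper:]]", ids)

only_a
any_letter
upper_class
```

Only "A12" passes the first test, while the other two also keep "B7" and "45Z".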
This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Use gsub remove all string before first white space in R
(4 answers)
Closed 5 years ago.
To begin with: yes, similar questions already exist here, but their solutions don't work as they should, at least for me.
I'd like to remove everything (any combination of letters, digits and other characters) up to and including the first semicolon.
So we have some strings:
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"
Let's use gsub() with . and *, which together mean any character occurring zero or more times:
gsub(".*;", "", x)
gsub(".*;", "", y)
And as a result I get:
[1] "GEF2"
[1] "3DR"
But I'd like to have:
[1] "ABC;GEF2"
[1] "EER;3DR"
Why did it 'catch' the second occurrence of the semicolon instead of the first?
You could use
gsub("[^;]*;(.*)", "\\1", x)
# [1] "ABC;GEF2"
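The reason the original pattern goes too far is that .* is greedy: it consumes as much as it can, so ".*;" extends to the last semicolon in the string. Here is a short sketch of two ways to stop at the first one, using the x and y from the question; the result names are made up for illustration.

```r
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"

# Greedy: .* runs up to the LAST semicolon.
greedy <- sub(".*;", "", x)

# Negated class: [^;]* cannot cross a semicolon, so the match
# ends at the FIRST one.
first_only <- sub("^[^;]*;", "", c(x, y))

# Lazy quantifier: .*? matches as little as possible (PCRE syntax).
lazy <- sub("^.*?;", "", c(x, y), perl = TRUE)

greedy
first_only
lazy
```

Since only one replacement per string is wanted, sub is enough here; gsub is not needed.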
This question already has answers here:
What is the difference between square brackets and parentheses in a regex?
(3 answers)
How to use double brackets in a regular expression?
(2 answers)
Closed 5 years ago.
I am trying to find out what the regular expression [[:space:][:digit:]]+ stands for.
I learned from Wikipedia that [:space:] means whitespace characters and [:digit:] means the digits 0 to 9.
So I think [[:space:][:digit:]]+ matches any whitespace character followed by a digit, like ' 1' or ' 9'.
But when I try this in R:
> txt <- c("arm","foot","lefroo", "laura ")
> i <- grep("[[:space:][:digit:]]+", txt)
> txt[i]
[1] "laura "
There is no digit in "laura ", but it still matched.
This really confuses me. Can anyone explain it?
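A likely explanation, sketched below: the outer square brackets define a single set, so [[:space:][:digit:]] matches one character that is either whitespace or a digit, and + repeats that. The trailing space in "laura " is therefore enough for a match. To require whitespace followed by a digit, each class needs its own brackets. The two extra test strings and the names either and in_sequence are made up for illustration.

```r
txt <- c("arm", "foot", "lefroo", "laura ", "room 1", "abc123")

# One set: a run of characters that are each whitespace OR a digit.
either <- grepl("[[:space:][:digit:]]+", txt)

# Two sets in sequence: a whitespace character FOLLOWED BY a digit.
in_sequence <- grepl("[[:space:]][[:digit:]]", txt)

txt[either]
txt[in_sequence]
```

With this input, the first pattern selects "laura ", "room 1" and "abc123", while the second selects only "room 1".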