What does the regular expression [[:space:][:digit:]]+ stand for in R? [duplicate]

This question already has answers here:
What is the difference between square brackets and parentheses in a regex?
(3 answers)
How to use double brackets in a regular expression?
(2 answers)
Closed 5 years ago.
I am trying to find out what the regular expression [[:space:][:digit:]]+ stands for.
I learned from Wikipedia that [:space:] means whitespace characters and [:digit:] means digits from 0 to 9.
So I thought [[:space:][:digit:]]+ matches whitespace characters followed by a digit, like ' 1' or ' 9'.
But when I try this in R:
> txt <- c("arm","foot","lefroo", "laura ")
> i <- grep("[[:space:][:digit:]]+", txt)
> txt[i]
[1] "laura "
There is no digit in "laura ", but it still matched.
This really confuses me. Can anyone explain it?
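A minimal sketch of what is going on, assuming R's default (POSIX-style, TRE) regex engine: inside a single bracket expression, [:space:] and [:digit:] are combined into one character set, so [[:space:][:digit:]] matches ONE character that is either whitespace or a digit, and + repeats that set. "laura " matches because of its trailing space. The extra test strings "a1" and " 9" are illustrative additions, not from the question.

```r
txt <- c("arm", "foot", "lefroo", "laura ", "a1", " 9")

# One bracket expression = one character set (a union, not a sequence):
# every matched character is whitespace OR a digit.
union_hits <- grepl("[[:space:][:digit:]]+", txt)

# Two bracket expressions in a row express "whitespace, then a digit".
sequence_hits <- grepl("[[:space:]]+[[:digit:]]", txt)

# Show exactly which characters the union pattern matched in each hit.
matched <- regmatches(txt, regexpr("[[:space:][:digit:]]+", txt))

print(txt[union_hits])     # "laura " "a1" " 9"
print(txt[sequence_hits])  # " 9"
print(matched)             # " " "1" " 9"
```

So "laura " is matched only by its final space; to require a digit after whitespace, the two classes have to be written as separate bracket expressions.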

Related

Regex: extracting matches preceding a pattern in R [duplicate]

This question already has answers here:
Remove part of string after "."
(6 answers)
Extract string before "|" [duplicate]
(3 answers)
Closed 1 year ago.
I'm trying to extract matches preceding a pattern in R. Let's say that I have a vector consisting of the following elements:
> my_vector
[1] "ABCC12|94160" "ABCC13|150000" "ABCC1|4363" "ACTA1|58"
[5] "ADNP2|22850" "ADNP|23394" "ARID1B|57492" "ARID2|196528"
I'm looking for a regular expression to extract all characters preceding the "|". The expected result should look something like this:
> my_new_vector
[1] "ABCC12" "ABCC13" "ABCC1" "ACTA1"
and so on.
I have already tried stringr functions and regular expressions based on lookarounds, but without success.
I would really appreciate your advice and help in solving this.
Thanks in advance!
We could use trimws() and pass, as the whitespace argument, a regex that matches the | (a metacharacter, so escape it with \\) followed by zero or more characters (.*):
trimws(my_vector, whitespace = "\\|.*")
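A base-R sketch of the same extraction, for comparison with the trimws() answer (the whitespace argument of trimws() needs R 3.6.0 or later). sub() deletes the first | and everything after it; the regmatches() variant uses a Perl lookahead, which is the lookaround approach the question mentions. The result names via_sub, via_lookahead and via_trimws are illustrative.

```r
my_vector <- c("ABCC12|94160", "ABCC13|150000", "ABCC1|4363", "ACTA1|58",
               "ADNP2|22850", "ADNP|23394", "ARID1B|57492", "ARID2|196528")

# "|" is the alternation metacharacter, so it is escaped as "\\|".
# Remove the first "|" and everything after it.
via_sub <- sub("\\|.*", "", my_vector)

# Lookaround version: keep the leading run of non-"|" characters, but only
# when a "|" follows it. Lookaheads need perl = TRUE in base R.
via_lookahead <- regmatches(my_vector,
                            regexpr("^[^|]+(?=\\|)", my_vector, perl = TRUE))

# The answer's approach: treat "|" plus the rest as trailing "whitespace".
via_trimws <- trimws(my_vector, whitespace = "\\|.*")

print(via_sub)  # "ABCC12" "ABCC13" "ABCC1" "ACTA1" "ADNP2" "ADNP" "ARID1B" "ARID2"
```

Note that regmatches() with regexpr() silently drops elements that contain no "|", whereas sub() returns them unchanged.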

Regex capture 1 character [duplicate]

This question already has answers here:
Complete word matching using grepl in R
(3 answers)
Closed 4 years ago.
Whenever a single English letter appears as a word of its own, I want it to be combined with the previous word.
gsub('(.*)\\s+([a-zA-Z]{1})', "\\1\\2", 'Anti-Candida a ингибинов')
Anti-Candidaa ингибинов
For the example below, it should return 'Anti-Candida am ингибинов' unchanged, since 'am' has length 2, but my pattern joins it anyway:
gsub('(.*)\\s+([a-zA-Z]{1})', "\\1\\2", 'Anti-Candida am ингибинов')
You can use this regex:
\W+([a-zA-Z])\b
and replace it with \\1. The trick here is to match a word boundary after the single letter.
Your regex will work as well, if you just add that \b at the end.
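A short sketch comparing the answer's pattern with the question's pattern plus a trailing \b, both written as R string literals (so every backslash is doubled). It assumes R's default regex engine, which supports \W, \s and \b; the helper names join_answer and join_question are hypothetical.

```r
x1 <- "Anti-Candida a ингибинов"
x2 <- "Anti-Candida am ингибинов"

# Answer's pattern: \W+ consumes the separator, ([a-zA-Z]) captures one
# ASCII letter, and \b demands a word boundary right after that letter.
join_answer <- function(s) gsub("\\W+([a-zA-Z])\\b", "\\1", s)

# Question's pattern with \b appended: when a second letter follows the
# captured one there is no boundary, so "am" is left alone.
join_question <- function(s) gsub("(.*)\\s+([a-zA-Z]{1})\\b", "\\1\\2", s)

print(join_answer(x1))    # "Anti-Candidaa ингибинов"
print(join_answer(x2))    # unchanged
print(join_question(x1))  # "Anti-Candidaa ингибинов"
print(join_question(x2))  # unchanged
```

Because (.*) in the question's pattern is greedy and consumes the whole string, that version joins at most one single letter per string; the answer's pattern, applied by gsub(), handles every occurrence.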

Remove characters in string before specific symbol (including it) [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Use gsub remove all string before first white space in R
(4 answers)
Closed 5 years ago.
First of all, yes, similar questions exist here; however, their solutions don't work as expected, at least for me.
I'd like to remove all characters (letters and numbers in any combination) before the first semicolon, and remove the semicolon itself too.
So we have some strings:
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"
Let's use gsub() with .*, which means any character occurring zero or more times:
gsub(".*;", "", x)
gsub(".*;", "", y)
As a result I get:
[1] "GEF2"
[1] "3DR"
But I'd like to have:
[1] "ABC;GEF2"
[1] "EER;3DR"
Why does it 'catch' the second occurrence of the semicolon instead of the first?
You could use
gsub("[^;]*;(.*)", "\\1", x)
# [1] "ABC;GEF2"
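The reason is that * is greedy: .* grabs as much as it can, so .*; extends to the last semicolon. A sketch of three ways to stop at the first one; sub() is enough here because only one replacement per string is wanted, and perl = TRUE is used for the lazy version to get PCRE semantics.

```r
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"
v <- c(x, y)

# Greedy: .* runs as far as possible, so the match ends at the LAST ";".
greedy <- sub(".*;", "", v)

# Negated class: [^;]* cannot cross a ";", so the match ends at the FIRST one.
negated <- sub("^[^;]*;", "", v)

# Lazy quantifier: .*? matches as little as possible before the first ";".
lazy <- sub("^.*?;", "", v, perl = TRUE)

print(greedy)   # "GEF2" "3DR"
print(negated)  # "ABC;GEF2" "EER;3DR"
print(lazy)     # "ABC;GEF2" "EER;3DR"
```

The negated-class form is what the answer above uses; it does not depend on quantifier laziness and makes the "stop at the first semicolon" intent explicit.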

Replace Incomplete Bracket with Gsub [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 5 years ago.
I want to do a simple replacement in R for the following column:
df
Songs
1 Saga (Skit) [feat. RZA
2 Revenge
3 Whatever You Want
4 What About Us
5 But We Lost It
6 Barbies
I want to do two different replacements:
1) Replace "[" with blank
2) Replace "]" with blank
I need to do this separately, though, because some of my values contain only one of the brackets, like the first value in the Songs column.
df[,1]<-gsub("[","",df[,1])
Error:
Error in gsub("[", "", newdf2[, 1]) :
invalid regular expression '[', reason 'Missing ']''
How do I get around this invalid regular expression error?
Thanks!
The [ is a metacharacter, so it needs to be escaped, and in an R string literal the backslash itself must be doubled (\\[). This should do both replacements in one go:
gsub("\\[|\\]", "", df$Songs)
Another way is a bracket expression containing both brackets (a ] placed first is taken literally):
gsub("[][]", "", df$Songs)
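A sketch of these approaches on a few values from the question's Songs column. In a pattern, [ opens a bracket expression, which is why a bare "[" fails with 'Missing ']''. The fixed = TRUE variant is an addition not taken from the answers: it treats the pattern as literal text, so no escaping is needed, at the cost of one call per character.

```r
songs <- c("Saga (Skit) [feat. RZA", "Revenge", "Whatever You Want")

# Escaped alternation: remove every "[" or "]".
escaped <- gsub("\\[|\\]", "", songs)

# Bracket expression: a "]" placed first is taken literally, then "[".
bracket <- gsub("[][]", "", songs)

# Literal (non-regex) matching, one bracket character per call.
literal <- gsub("]", "", gsub("[", "", songs, fixed = TRUE), fixed = TRUE)

print(escaped)  # "Saga (Skit) feat. RZA" "Revenge" "Whatever You Want"
```

All three remove unpaired brackets as well, so values with only one bracket need no special handling.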

Column contains unit ($ sign) that needs to be replaced [duplicate]

This question already has answers here:
How do I strip dollar signs ($) from data/ escape special characters in R?
(4 answers)
Closed 5 years ago.
I have a few columns, imported from an Excel sheet, whose values contain a $.
[1] "$5,656.50" "$3,179.20" "$1,391.40" "$2,376.30" "$1,476.80" "$712.30" "$5,327.80"
[8] "$3,642.70" "$1,506.00" "$7,923.70" "$4,782.30" "$1,392.40" "$229.30" "$1,106.90"
[15] "$1,553.30" "$3,492.30" "$4,029.40" "$1,646.70" "$6,013.90" "$19,928.00" "$4,260.60"
There are more than 10,000 rows in this column, and R reads it as character because of the "$".
I tried
gsub( "$", " ", thedata$col.with.dollar.signs)
to replace the dollar sign with a space, but it didn't work.
Any other ideas are much appreciated.
This might work, since it simply drops the first character of each value:
substring(thedata$col.with.dollar.signs, 2)
For example:
vec <- c("$5,656.50", "$3,179.20", "$1,391.40")
substring(vec,2)
#[1] "5,656.50" "3,179.20" "1,391.40"
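The original attempt fails because $ is the end-of-string anchor in a regex, not a literal dollar sign. A sketch of literal alternatives; the final as.numeric() step, which also strips the thousands separators, is an assumption about the end goal (getting numbers instead of character).

```r
vec <- c("$5,656.50", "$3,179.20", "$1,391.40")

# "$" as a pattern matches the empty string at the end of each value,
# so the dollar sign itself is never replaced.
anchored <- gsub("$", " ", vec)

# Match "$" literally: escape it, or use fixed = TRUE.
escaped <- gsub("\\$", "", vec)
literal <- gsub("$", "", vec, fixed = TRUE)

# Inside a bracket expression "$" is literal; drop "$" and "," together,
# then convert to numbers.
amounts <- as.numeric(gsub("[$,]", "", vec))
print(amounts)
```

Unlike substring(x, 2), these leave values without a leading "$" intact instead of cutting off their first digit.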
