Replace words that start with a period [duplicate] - r

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 5 years ago.
I'm trying to fix a dataset that has some errors of decimal numbers wrongly typed. For example, some entries were typed as ".15" instead of "0.15". Currently this column is chr but later I need to convert it to numeric.
I'm trying to select all of those "words" that start with a period "." and replace the period with "0." but it seems that the "^" used to anchor the start of the string doesn't work nicely with the period.
I tried with:
dataIMN$precip <- str_replace (dataIMN$precip, "^.", "0.")
But it puts a 0 at the beginning of all the entries, including the ones that are correctly typed (those that don't start with a period).

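To do exactly what you describe, make the period literal. In a regex an unescaped . matches any single character, which is why "^." hits the first character of every entry. Either put the period inside square brackets, [.], where it loses its special meaning, or escape it with '\\' (the backslash is doubled because it must itself be escaped in an R string):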
Option 1:
gsub("^[.]","0.",".54")
[1] "0.54"
Option 2:
gsub("^\\.","0.",".54")
[1] "0.54"
Otherwise, as.numeric() already parses a leading period, so as.numeric(".54") returns 0.54 without any string cleanup.
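As a rough illustration of the difference between the escaped and unescaped pattern, here is a minimal base-R sketch. The values in `precip` are made up for the example (they are not the asker's data), and it uses sub() instead of str_replace() on the assumption that only base R is available.

```r
# Hypothetical sample values standing in for dataIMN$precip
precip <- c(".15", "0.20", "3.5", ".7")

# Unescaped "." matches any first character, so every entry is altered
broken <- sub("^.", "0.", precip)

# Escaped period: only entries that really start with "." are changed
repaired <- sub("^\\.", "0.", precip)

# Convert to numeric once the strings look right
precip_num <- as.numeric(repaired)
```

Printing `broken` next to `repaired` shows which entries the unescaped pattern damaged.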

Related

Is there a way to keep only defined characters in a string from a whitelist? [duplicate]

This question already has answers here:
in R, use gsub to remove all punctuation except period
(4 answers)
Closed 2 years ago.
I'm looking for a way to use a whitelist that contains digits and the Plus sign "+" to replace all other chars from a string.
string <- "opiqr8929348t89hr289r01++r42+3525"
I tried first to use:
gsub("[[:punct:][:alpha:]]", "", string)
but this excludes also the "+":
# [1] "89293488928901423525"
How can I exclude the "+" from [:alpha:] ?
So my intention is to use a whitelist instead:
whitelist <- c("0123456879+")
Is there a way to use gsub() in the other way around? Because when I use my whitelist it will identify the chars that should remain.
What about this:
string <- "opiqr8929348t89hr289r01++r42+3525"
gsub("[^0-9+]", "", string)
# [1] "89293488928901++42+3525"
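This replaces every character that is not a digit 0-9 or a plus sign with "". Inside square brackets, the leading ^ negates the class and + is treated as a literal character.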
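If the whitelist needs to stay in a variable, one possible approach is to build the negated character class from it. The sketch below assumes a hypothetical helper named `keep_only` (not part of any package) and a whitelist made only of characters that are safe inside square brackets; characters such as ^, ], - or \ would need extra care.

```r
# Hypothetical helper: keep only the characters listed in `whitelist`
keep_only <- function(x, whitelist) {
  # Build a negated character class such as "[^0123456789+]"
  pattern <- paste0("[^", whitelist, "]")
  gsub(pattern, "", x)
}

string <- "opiqr8929348t89hr289r01++r42+3525"
cleaned <- keep_only(string, "0123456789+")
```

Here `cleaned` holds the same result as the gsub("[^0-9+]", "", string) call above.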

Renaming a column that R recognizes as a function [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 2 years ago.
I have a large data set with set column names, I need to rename the columns. Here are the columns:
class(spec_sub_act_feat_all)
[1] "data.frame"
names(spec_sub_act_feat_all[,82:86])
[1] "angle(tBodyAccMean,gravity)" "angle(tBodyAccJerkMean),gravityMean)"
[3] "angle(tBodyGyroMean,gravityMean)" "angle(tBodyGyroJerkMean,gravityMean)"
[5] "angle(X,gravityMean)"
class(names(spec_sub_act_feat_all[,82:88]))
[1] "character"
However, when I try to rename the columns using
names(spec_sub_act_feat_all) <- gsub(`angle(tBodyAccMean,gravity)`,
'Vector of mean body accelerometer signal',names(spec_sub_act_feat_all))
Error in gsub(`angle(tBodyAccMean,gravity)`, "Vector of mean body accelerometer signal", :
object 'angle(tBodyAccMean,gravity)' not found
or simply
names(spec_sub_act_feat_all[,82])<-"Vector of mean body accelerometer signal"
Neither one works. I believe my problem is that R is recognizing the column name as an actual function and won't let me select the character string to change it. The first renaming I tried with gsub(), I used `` to try to select the column name as not a function, which was recommended in another post, but did not work for me. I did notice that I could substitute out what was in the () but not the whole 'angle(...)' part.
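( and ) are regex metacharacters (they delimit a capture group), and gsub() defaults to fixed = FALSE, i.e. regex mode. To match them literally, either escape them (\\( and \\)), place each one inside square brackets ([(] and [)]), or pass fixed = TRUE so the whole pattern is treated as a plain string. Also quote the pattern with '' or "" rather than backticks: backticks make R look up an object with that name, which is what produces the "object ... not found" error.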
gsub("angle(tBodyAccMean,gravity)",
'Vector of mean body accelerometer signal',
names(spec_sub_act_feat_all), fixed = TRUE)
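A minimal sketch of the rename on a toy data frame may make this concrete. The data frame `df` and the replacement name "angle_X_gravityMean" are hypothetical; only the original column names come from the question. It also shows indexing names(df) directly, which targets the data frame's own names rather than an extracted column.

```r
# Toy data frame with two of the column names from the question
df <- data.frame(1:2, 3:4)
names(df) <- c("angle(tBodyAccMean,gravity)", "angle(X,gravityMean)")

# fixed = TRUE treats the pattern as a plain string, parentheses included
names(df) <- gsub("angle(tBodyAccMean,gravity)",
                  "Vector of mean body accelerometer signal",
                  names(df), fixed = TRUE)

# Alternative without any pattern matching: index names(df) by exact name
names(df)[names(df) == "angle(X,gravityMean)"] <- "angle_X_gravityMean"
```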

How to remove '+ off' from the end of string? [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 4 years ago.
Similar to R - delete last two characters in string if they match criteria except I'm trying to get rid of the special character '+' as well. I also attached a picture of my output.
When I attempt to use the escape command of '+', I get an error message saying
Error: '\+' is an unrecognized escape in character string starting ""\\s\+"
As you noticed, + is a metacharacter in regex so it needs to be escaped. \+ escapes that character, but \, itself, is a special character in R character strings so it, too, needs to be escaped. This is an R requirement, not a regex requirement.
This means that, instead of '\+', you need to write '\\+'.
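For illustration, a small sketch of removing a trailing '+ off' with the doubled backslash. The input strings are made up, since the asker's data was only shown in a picture, and the exact spacing around the plus sign is an assumption.

```r
# Hypothetical inputs; the real strings are not shown in the question
x <- c("25% + off", "no suffix here")

# The R string "\\+" holds two characters: a backslash and a plus
pattern_length <- nchar("\\+")

# Remove optional whitespace, a literal "+", and " off" at the end of the string
cleaned <- sub("\\s*\\+ off$", "", x)
```

Strings that do not end in '+ off' are returned unchanged.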

How to remove beginning-digits only in R [duplicate]

This question already has answers here:
Remove numbers at the beginning and end of a string
(3 answers)
Remove string from a vector in R
(4 answers)
Closed 5 years ago.
I have some strings with digits and alpha characters in them. Some of the digits are important, but the ones at the beginning of the string (and only these) are unimportant. This is due to a peculiarity in how email addresses are stored. So the best example is:
x<-'12345johndoe23#gmail.com'
Should be transformed to johndoe23#gmail.com
Unfortunately there are no spaces. I have tried gsub('[[:digit:]]+', '', x), but this removes all numbers, not just the ones at the beginning.
Edit: I have found some solutions in other languages: Python: Remove numbers at the beginning of a string
As per my comment:
^[[:digit:]]+
^ Asserts position at the start of the string
You can do this:
x<-'12345johndoe23#gmail.com'
gsub('^[[:digit:]]+', '', x) #added ^ as begin of string
Another regex is:
sub('^\\d+','',x)
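To see the effect of the anchor side by side, here is a short sketch on a vector. The second element is a hypothetical address added to show that strings without leading digits are left alone.

```r
# First value is from the question; the second is a made-up example
x <- c("12345johndoe23#gmail.com", "jane7#example.com")

# Without the anchor, every run of digits is removed
all_removed <- gsub("[[:digit:]]+", "", x)

# With ^, only the digits at the very start of each string are removed
leading_removed <- sub("^\\d+", "", x)
```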

Using Gsub in R to remove a string containing brackets [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 6 years ago.
I'm trying to use gsub to remove certain parts of a string. However, I can't get it to work, and I think it's because the string to be removed contains brackets. Is there any way around this? Thanks for any help.
The command I want to use:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Returns:
#"(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"
Expected output:
#"(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
A quick test to see if brackets were the problem:
gsub('te','', 'test')
#[1] "st"
gsub('(te)','', '(te)st')
#[1] "()st"
Parentheses are regex metacharacters, so the unescaped pattern '(4:4aCO)_' looks for the text 4:4aCO_ without the parentheses and finds no match. We can match them literally by placing each one inside square brackets:
gsub('[(]4:4aCO[)]_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Or use fixed = TRUE so the whole pattern is treated as a literal string:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)', fixed = TRUE)
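As a quick check, the sketch below applies three ways of matching the parentheses literally to the string from the question, including the trailing underscore from the asker's original pattern. The variable names are only illustrative.

```r
x <- "(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"

# 1. Escape each parenthesis with a doubled backslash
escaped <- gsub("\\(4:4aCO\\)_", "", x)

# 2. Put each parenthesis in its own character class
bracketed <- gsub("[(]4:4aCO[)]_", "", x)

# 3. Skip regex interpretation entirely
literal <- gsub("(4:4aCO)_", "", x, fixed = TRUE)
```

All three variables should hold the expected output given in the question. fixed = TRUE is probably the simplest choice when no other regex features are needed.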

Resources