'Contains' function [duplicate] - r

I need to know if there are any functions available in R that allow me to check if one string contains a substring and return a boolean. I've already tried str_detect but that doesn't suit my need.
For example:
string = 12345REFUND4567
and
substring = REFUND
contains(string,substring) would ideally return TRUE
since 12345REFUND4567 contains REFUND.
contains(string,substring) is just the format I'd imagine the function to take.

You are probably looking for grepl. With fixed=TRUE the pattern is treated as a literal string rather than a regular expression:
string <- "12345REFUND4567"
grepl("REFUND", string, fixed=TRUE)
[1] TRUE
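If you want the exact contains(string, substring) call shape from the question, a thin wrapper around grepl is one way to get it. This is only a sketch: `contains` is a name made up here to match the question, not a base R function.

```r
# Hypothetical helper: `contains` is not part of base R, it just wraps grepl.
contains <- function(string, substring) {
  # fixed = TRUE treats `substring` as literal text, not as a regex
  grepl(substring, string, fixed = TRUE)
}

contains("12345REFUND4567", "REFUND")
# [1] TRUE

# grepl is vectorized over the strings being searched
contains(c("12345REFUND4567", "no match"), "REFUND")
# [1]  TRUE FALSE
```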

Related

Is there a way to keep only defined characters in a string from a whitelist? [duplicate]

I'm looking for a way to use a whitelist that contains digits and the Plus sign "+" to replace all other chars from a string.
string <- "opiqr8929348t89hr289r01++r42+3525"
I tried first to use:
gsub("[[:punct:][:alpha:]]", "", string)
but this also removes the "+":
# [1] "89293488928901423525"
How can I exclude the "+" from [:punct:]?
So my intention is to use a whitelist instead:
whitelist <- c("0123456879+")
Is there a way to use gsub() the other way around, so that the whitelist identifies the characters that should remain?
What about this:
string <- "opiqr8929348t89hr289r01++r42+3525"
gsub("[^0-9+]", "", string)
# [1] "89293488928901++42+3525"
The leading ^ inside the square brackets negates the character class, so this replaces everything that is not a digit 0-9 or a plus sign with "".
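If you would rather keep the whitelist in a variable, as in the question, the negated character class can be built from it with paste0. A minimal sketch, assuming the whitelist holds no characters that are special inside brackets (such as ], ^, - or a backslash), which would need extra care:

```r
string <- "opiqr8929348t89hr289r01++r42+3525"
whitelist <- "0123456789+"

# Build "[^0123456789+]": match any character NOT in the whitelist
pattern <- paste0("[^", whitelist, "]")

gsub(pattern, "", string)
# [1] "89293488928901++42+3525"
```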

how to get the last part of strings with different lengths ended by ".nc" [duplicate]

I have several download links (i.e., strings), and each string has different length.
For example let's say these fake links are my strings:
My_Link1 <- "http://esgf-data2.diasjp.net/pr/gn/v20190711/pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
My_Link2 <- "http://esgf-data2.diasjp.net/gn/v20190711/pr_-present_r1i1p1f1_gn_19500101-19591231.nc"
My goals:
A) I want only the last part of each string, ending in .nc, to get these results:
pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc
pr_-present_r1i1p1f1_gn_19500101-19591231.nc
B) I want only the last part of each string without the .nc, to get these results:
pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231
pr_-present_r1i1p1f1_gn_19500101-19591231
I tried to find a way on the net, but I failed. It seems this can be done in Python as documented here:
How to get everything after last slash in a URL?
Does anyone know the same method in R?
Thanks so much for your time.
A shortcut to get the last part of the string is basename:
basename(My_Link1)
#[1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
For the second question, to remove the trailing ".nc" we can use sub, anchoring the pattern with $ so that only the extension at the end of the name is removed:
sub("\\.nc$", "", basename(My_Link1))
#[1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231"
With a regex, here is another way to get the last part (everything after the final slash):
sub(".*/", "", My_Link1)
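Both steps are vectorized, so all links can be handled in one go. A small sketch using the two example links from the question; tools::file_path_sans_ext (from the tools package that ships with R) is swapped in here as an alternative to the sub call for dropping the extension:

```r
My_Link1 <- "http://esgf-data2.diasjp.net/pr/gn/v20190711/pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
My_Link2 <- "http://esgf-data2.diasjp.net/gn/v20190711/pr_-present_r1i1p1f1_gn_19500101-19591231.nc"
links <- c(My_Link1, My_Link2)

# A) file names including the extension
files <- basename(links)
files
# [1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
# [2] "pr_-present_r1i1p1f1_gn_19500101-19591231.nc"

# B) file names without the extension
stems <- tools::file_path_sans_ext(files)
stems
# [1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231"
# [2] "pr_-present_r1i1p1f1_gn_19500101-19591231"
```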

How to remove beginning-digits only in R [duplicate]

I have some strings with digits and alpha characters in them. Some of the digits are important, but the ones at the beginning of the string (and only these) are unimportant. This is due to a peculiarity in how email addresses are stored. So the best example is:
x<-'12345johndoe23#gmail.com'
Should be transformed to johndoe23#gmail.com
Unfortunately there are no spaces. I have tried gsub('[[:digit:]]+', '', x), but this removes all numbers, not just the ones at the beginning.
Edit: I have found some solutions in other languages: Python: Remove numbers at the beginning of a string
As per my comment, use this regex:
^[[:digit:]]+
The ^ asserts position at the start of the string, so only the leading digits are matched.
You can do this:
x<-'12345johndoe23#gmail.com'
gsub('^[[:digit:]]+', '', x) # added ^ to anchor the match at the start of the string
Another regex is:
sub('^\\d+','',x)
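To see that only the leading digits go away, here is a quick check on a small vector. The second and third entries are made-up test values, not from the question. Because the pattern is anchored with ^ it can match at most once per string, so sub is enough and gsub gives the same result.

```r
# First entry is from the question; the other two are hypothetical test values
x <- c("12345johndoe23#gmail.com", "jane7#gmail.com", "99")

cleaned <- sub("^[[:digit:]]+", "", x)
cleaned
# [1] "johndoe23#gmail.com" "jane7#gmail.com"     ""
```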

Replace words that start with a period [duplicate]

I'm trying to fix a dataset that has some errors of decimal numbers wrongly typed. For example, some entries were typed as ".15" instead of "0.15". Currently this column is chr but later I need to convert it to numeric.
I'm trying to select all of those "words" that start with a period "." and replace the period with "0." but it seems that the "^" used to anchor the start of the string doesn't work nicely with the period.
I tried with:
dataIMN$precip <- str_replace (dataIMN$precip, "^.", "0.")
But it puts a 0 at the beginning of all the entries, including the ones that are correctly typed (those that don't start with a period).
The problem is that an unescaped "." in a regex matches any character, so "^." matches the first character of every entry. To match a literal period, either put it inside square brackets, [.], or escape it with '\\':
Option 1:
gsub("^[.]","0.",".54")
[1] "0.54"
Option 2:
gsub("^\\.","0.",".54")
[1] "0.54"
Otherwise, as.numeric should also take care of it automatically.
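Applied to a whole column, that could look like the sketch below. The `precip` vector is made-up sample data standing in for dataIMN$precip.

```r
# Hypothetical sample values standing in for dataIMN$precip
precip <- c(".15", "0.15", "2.5", ".7")

# Only entries that start with a literal period get a leading zero
fixed_precip <- sub("^\\.", "0.", precip)
fixed_precip
# [1] "0.15" "0.15" "2.5"  "0.7"

# as.numeric parses ".15" directly, so the regex step is optional
# if the only goal is a numeric column
as.numeric(precip)
# [1] 0.15 0.15 2.50 0.70
```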

Using Gsub in R to remove a string containing brackets [duplicate]

I'm trying to use gsub to remove certain parts of a string. However, I can't get it to work, and I think it's because the string to be removed contains brackets. Is there any way around this? Thanks for any help.
The command I want to use:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Returns:
#"(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"
Expected output:
#"(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
A quick test to see if brackets were the problem:
gsub('te','', 'test')
#[1] "st"
gsub('(te)','', '(te)st')
#[1] "()st"
We can do this by placing each parenthesis inside square brackets, because ( and ) are regex metacharacters:
gsub('[(]4:4aCO[)]_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Or with fixed = TRUE, so that the pattern is matched as a literal string:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)', fixed = TRUE)
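For a quick check against the expected output in the question, here is a sketch that runs the literal-match approach next to a backslash-escaped regex (a third spelling, not from the answer above) on the original string with its underscores:

```r
x <- "(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"

# Literal match: parentheses need no escaping when fixed = TRUE
literal <- gsub("(4:4aCO)_", "", x, fixed = TRUE)

# Regex match: escape each parenthesis with a double backslash
escaped <- gsub("\\(4:4aCO\\)_", "", x)

literal
# [1] "(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
identical(literal, escaped)
# [1] TRUE
```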
