This question already has answers here:
How to Convert "space" into "%20" with R
(4 answers)
Closed 6 years ago.
I have the following string:
url <- "https://www.google.mu/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=green carot"
I want to replace the space between green and carot with %20:
>url
https://www.google.mu/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=green%20carot
There are functions for working with URLs.
In base R, use URLencode():
url <- "https://www.google.mu/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=green carot"
URLencode(url)
#> [1] "https://www.google.mu/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=green%20carot"
For a straight string replacement, use:
gsub(" ", "%20", url)
However, a URL-encoding function such as URLencode() is the better choice, because it also encodes other unsafe characters rather than only spaces.
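As a small sketch of that difference, the example below encodes only a search term. The `query` value is a made-up string, not taken from the question. `reserved = TRUE` is an argument of URLencode() that also encodes reserved characters such as &, which tends to suit a single query value rather than a whole URL.

```r
# Hypothetical search term containing spaces and a reserved character
query <- "green carot & more"

# Plain replacement touches only the spaces
spaces_only <- gsub(" ", "%20", query, fixed = TRUE)
spaces_only
#> [1] "green%20carot%20&%20more"

# URLencode() with reserved = TRUE also encodes "&"
encoded <- utils::URLencode(query, reserved = TRUE)
encoded
#> [1] "green%20carot%20%26%20more"
```

Note that with its default `repeated = FALSE`, URLencode() returns the input unchanged if it already appears to contain a %XX escape.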
This question already has answers here:
How to remove single quote from a string in R?
(3 answers)
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 11 months ago.
For example, for a string like this:
NANYANG-GIRLS'-HIGH-SCHOOL
how do I use gsub to replace ' with an empty string, so that it becomes
NANYANG-GIRLS-HIGH-SCHOOL
When I try this in R, it shows an error.
You can use either of the following two approaches:
sec_name <- gsub('\'', '', sec_name, fixed=TRUE)
sec_name <- gsub("'", "", sec_name, fixed=TRUE)
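The first approach is a corrected version of what you were doing. Here, we use single quotes to delimit the strings, but we escape the inner single quote with a backslash to make it a literal single quote. The second approach delimits the pattern with double quotes, so the single quote needs no escaping.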
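A minimal check that both spellings behave the same, assuming `sec_name` holds the string from the question:

```r
# The example string from the question
sec_name <- "NANYANG-GIRLS'-HIGH-SCHOOL"

# Single-quoted pattern with an escaped single quote
escaped <- gsub('\'', '', sec_name, fixed = TRUE)

# Double-quoted pattern, no escaping needed
double_quoted <- gsub("'", "", sec_name, fixed = TRUE)

escaped
#> [1] "NANYANG-GIRLS-HIGH-SCHOOL"
identical(escaped, double_quoted)
#> [1] TRUE
```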
This question already has answers here:
in R, use gsub to remove all punctuation except period
(4 answers)
Closed 2 years ago.
I'm looking for a way to use a whitelist that contains the digits and the plus sign "+" to remove all other characters from a string.
string <- "opiqr8929348t89hr289r01++r42+3525"
I tried first to use:
gsub("[[:punct:][:alpha:]]", "", string)
but this also removes the "+":
# [1] "89293488928901423525"
How can I exclude the "+" from [:punct:]?
So my intention is to use a whitelist instead:
whitelist <- c("0123456879+")
Is there a way to use gsub() the other way around, so that the whitelist identifies the characters that should remain?
What about this:
string <- "opiqr8929348t89hr289r01++r42+3525"
gsub("[^0-9+]", "", string)
# [1] "89293488928901++42+3525"
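This replaces every character that is not a digit (0-9) or a plus sign with "". Inside a character class, a leading ^ negates the set, and + is a literal character rather than a quantifier.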
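If the allowed characters should come from a variable, as in the question, one possible sketch is to paste the whitelist into a negated character class. This assumes the whitelist contains no characters that are special inside [...] (such as ], ^, - or \), which holds for digits and "+".

```r
string <- "opiqr8929348t89hr289r01++r42+3525"

# Characters that should remain in the string
whitelist <- "0123456789+"

# Assumption: nothing in whitelist is special inside a character class
pattern <- paste0("[^", whitelist, "]")

cleaned <- gsub(pattern, "", string)
cleaned
#> [1] "89293488928901++42+3525"
```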
This question already has answers here:
Get filename without extension in R
(9 answers)
Find file name from full file path
(4 answers)
Closed 3 years ago.
I have several download links (i.e., strings), and each string has a different length.
For example let's say these fake links are my strings:
My_Link1 <- "http://esgf-data2.diasjp.net/pr/gn/v20190711/pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
My_Link2 <- "http://esgf-data2.diasjp.net/gn/v20190711/pr_-present_r1i1p1f1_gn_19500101-19591231.nc"
My goals:
A) I want to have only the last part of each string, ending in .nc, and get these results:
pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc
pr_-present_r1i1p1f1_gn_19500101-19591231.nc
B) I want to have only the last part of each string before .nc , and get these results:
pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231
pr_-present_r1i1p1f1_gn_19500101-19591231
I tried to find a way on the net, but I failed. It seems this can be done in Python as documented here:
How to get everything after last slash in a URL?
Does anyone know the same method in R?
Thanks so much for your time.
A shortcut to get the last part of the string is to use basename():
basename(My_Link1)
#[1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
For the second goal, to remove the trailing ".nc", we can use sub() with the pattern anchored to the end of the string:
sub("\\.nc$", "", basename(My_Link1))
#[1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231"
With a regex, here is another way to get the file name (goal A) by removing everything up to the last slash:
sub(".*/", "", My_Link1)
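Both functions are vectorized, so the two goals can be handled for all links at once. The sketch below uses the two example links from the question; `links`, `file_names`, and `stems` are illustrative names. tools::file_path_sans_ext() is shown as an alternative that does not hard-code the extension.

```r
# The two example links from the question
links <- c(
  "http://esgf-data2.diasjp.net/pr/gn/v20190711/pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc",
  "http://esgf-data2.diasjp.net/gn/v20190711/pr_-present_r1i1p1f1_gn_19500101-19591231.nc"
)

# Goal A: everything after the last slash
file_names <- basename(links)
file_names
#> [1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
#> [2] "pr_-present_r1i1p1f1_gn_19500101-19591231.nc"

# Goal B: drop the trailing ".nc"
stems <- sub("\\.nc$", "", file_names)
stems
#> [1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231"
#> [2] "pr_-present_r1i1p1f1_gn_19500101-19591231"

# Alternative for goal B that works for any extension
stems_alt <- tools::file_path_sans_ext(file_names)
identical(stems, stems_alt)
#> [1] TRUE
```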
This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 3 years ago.
R imports columns with no column name as ...1. I need to replace this ... with something else.
Trying:
str_replace("hi_...","/././.","&")
It seems you are trying to replace each dot . with &. In a regex, an unescaped . matches any character, so you need to escape it as \\. and use str_replace_all to replace every occurrence. Try this:
library(stringr)
str_replace_all("hi_...","\\.","&")
Output:
[1] "hi_&&&"
Just in case you want to replace all three dots with a single & (which I doubt is what you wanted), use this:
str_replace("hi_...","\\.\\.\\.","&")
OR
str_replace("hi_...","\\.+","&")
Another way to achieve the same result as str_replace_all is to use gsub:
gsub("\\.", "&", "hi_...")
We can also match exactly three literal dots with a character class:
library(stringr)
str_replace("hi_...", "[.]{3}", "&")
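Applied to the original problem of auto-generated column names, a base R sketch might look like this. The `col_names` vector and the "col_" prefix are made up for illustration; only names that start with three dots are changed.

```r
# Hypothetical column names, two of them auto-generated on import
col_names <- c("...1", "id", "...3")

# Regex version: replace three literal dots at the start of the name
renamed <- sub("^\\.{3}", "col_", col_names)
renamed
#> [1] "col_1" "id"    "col_3"

# Literal version: fixed = TRUE turns off regex interpretation of "."
renamed_fixed <- sub("...", "col_", col_names, fixed = TRUE)
identical(renamed, renamed_fixed)
#> [1] TRUE
```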
This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 6 years ago.
I'm trying to use gsub to remove certain parts of a string. However, I can't get it to work, and I think it's because the string to be removed contains brackets. Is there any way around this? Thanks for any help.
The command I want to use:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Returns:
#"(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"
Expected output:
#"(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
A quick test to see if brackets were the problem:
gsub('te','', 'test')
#[1] "st"
gsub('(te)','', '(te)st')
#[1] "()st"
We can do this by placing each bracket inside square brackets, because ( and ) are regex metacharacters:
gsub('[(]4:4aCO[)]_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Or use fixed = TRUE so that the pattern is matched as a literal string:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)', fixed = TRUE)
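For comparison, here is a short sketch of three equivalent ways to match the literal brackets, using the string from the question. Backslash escaping is a third option alongside the two above; the variable names are illustrative.

```r
x <- "(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"

# 1. Escape each bracket with a double backslash
by_escape <- gsub("\\(4:4aCO\\)_", "", x)

# 2. Put each bracket in a character class
by_class <- gsub("[(]4:4aCO[)]_", "", x)

# 3. Treat the whole pattern as a literal string
by_fixed <- gsub("(4:4aCO)_", "", x, fixed = TRUE)

by_escape
#> [1] "(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
identical(by_escape, by_class) && identical(by_class, by_fixed)
#> [1] TRUE
```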