Separating "-" from Text in R [duplicate] - r

This question already has answers here:
R - remove anything after comma from column
(5 answers)
Closed 4 years ago.
I am trying to remove the "-" and all text after it from strings such as "Albany-Schenectady", "Allentown-Bethlehem", etc.
I tried using gsub, but I am having trouble getting the code to work.
#What I tried to make work
gsub("(.*)-.*", "\\1",

That's an incomplete line of code and not really even on the right track for what you've described. This should work.
gsub("-.*", "", vector)
The first argument tells it to grab the hyphen and everything after it to be replaced by the second argument, an empty string. The third argument is the vector you're performing the operation on.
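As a minimal sketch of the answer above, the snippet below applies the pattern to a small character vector. The vector name `cities` and the value "Boston" are made up for illustration; the other two values come from the question.

```r
# Hypothetical sample data; only the first two values appear in the question
cities <- c("Albany-Schenectady", "Allentown-Bethlehem", "Boston")

# Remove the first hyphen and everything after it.
# Strings without a hyphen are returned unchanged.
trimmed <- gsub("-.*", "", cities)
print(trimmed)
# [1] "Albany"    "Allentown" "Boston"

# The pattern from the question, once completed, cuts at the LAST hyphen
# instead, because the leading (.*) is greedy.
last_cut <- sub("(.*)-.*", "\\1", "Winston-Salem-High Point")
print(last_cut)
# [1] "Winston-Salem"
```

Because "-.*" consumes the rest of the string, each element has at most one match, so sub() would give the same result as gsub() here.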

Related

how to get the last part of strings with different lengths ended by ".nc" [duplicate]

This question already has answers here:
Get filename without extension in R
(9 answers)
Find file name from full file path
(4 answers)
Closed 3 years ago.
I have several download links (i.e., strings), and each string has different length.
For example let's say these fake links are my strings:
My_Link1 <- "http://esgf-data2.diasjp.net/pr/gn/v20190711/pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
My_Link2 <- "http://esgf-data2.diasjp.net/gn/v20190711/pr_-present_r1i1p1f1_gn_19500101-19591231.nc"
My goals:
A) I want only the last part of each string, ending in .nc, to get these results:
pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc
pr_-present_r1i1p1f1_gn_19500101-19591231.nc
B) I want only that last part without the .nc extension, to get these results:
pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231
pr_-present_r1i1p1f1_gn_19500101-19591231
I tried to find a way on the net, but I failed. It seems this can be done in Python as documented here:
How to get everything after last slash in a URL?
Does anyone know the same method in R?
Thanks so much for your time.
A shortcut to get last part of the string would be to use basename
basename(My_Link1)
#[1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
For the second question, to remove the trailing ".nc" we can use sub with the pattern anchored to the end of the string:
sub("\\.nc$", "", basename(My_Link1))
#[1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231"
With a regex, here is another way to get the last part:
sub(".*/", "", My_Link1)
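A short sketch that handles both goals for both links at once, since basename() and sub() are vectorised. The vector name `links` is introduced here for illustration. The call to tools::file_path_sans_ext() is an alternative not used in the answer above; it ships with R in the tools package.

```r
# The two example links from the question, collected in one vector
links <- c(
  "http://esgf-data2.diasjp.net/pr/gn/v20190711/pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc",
  "http://esgf-data2.diasjp.net/gn/v20190711/pr_-present_r1i1p1f1_gn_19500101-19591231.nc"
)

# Goal A: everything after the last slash
file_names <- basename(links)
print(file_names)

# Goal B: the same, with the trailing ".nc" removed
# ($ anchors the match to the end of the string)
stems <- sub("\\.nc$", "", file_names)
print(stems)

# Alternative for goal B that strips any alphanumeric extension
stems_alt <- tools::file_path_sans_ext(file_names)
```

Anchoring with $ matters if ".nc" could also occur earlier in a file name; without it, sub() would remove the first occurrence rather than the extension.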

Creating a string in R with " in it [duplicate]

This question already has an answer here:
paste quotation marks into character string, within a loop
(1 answer)
Closed 5 years ago.
I am trying to create a string from text that itself contains double quotes. Because the text already has two " characters in it, I am not able to do so.
?jql=filter%20=%20"Plan%20Standup%20-%20Mutual-SA"
When I try to assign it to a variable, I get an error.
Input <- "?jql=filter%20=%20"Plan%20Standup%20-%20Mutual-SA""
I tried many escape characters, but always I got an error message.
Error: unexpected symbol in "input <- "?jql=filter%20=%20"Plan"
Any help will be highly appreciated.
In the string, there is already a double quote. So, we can wrap it with single quotes
Input <- '?jql=filter%20=%20"Plan%20Standup%20-%20Mutual-SA"'
cat(Input, "\n")
#?jql=filter%20=%20"Plan%20Standup%20-%20Mutual-SA"
Alternatively, escape each double quote inside the string with a backslash, like this:
a <- "\""
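To illustrate that the two approaches produce the same string, here is a small sketch using the text from the question. The variable names `single_quoted` and `escaped` are made up for this example.

```r
# Option 1: wrap the text in single quotes so the inner double quotes
# need no escaping
single_quoted <- '?jql=filter%20=%20"Plan%20Standup%20-%20Mutual-SA"'

# Option 2: keep double quotes as delimiters and escape the inner ones
escaped <- "?jql=filter%20=%20\"Plan%20Standup%20-%20Mutual-SA\""

# Both forms create exactly the same string
same <- identical(single_quoted, escaped)
print(same)
# [1] TRUE

# print() shows the escaped form; cat() shows the raw characters
cat(escaped, "\n")
```

The backslashes are only part of the source code notation; they are not stored in the string.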

Replace words that start with a period [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 5 years ago.
I'm trying to fix a dataset that has some errors of decimal numbers wrongly typed. For example, some entries were typed as ".15" instead of "0.15". Currently this column is chr but later I need to convert it to numeric.
I'm trying to select all of those "words" that start with a period "." and replace the period with "0." but it seems that the "^" used to anchor the start of the string doesn't work nicely with the period.
I tried with:
dataIMN$precip <- str_replace (dataIMN$precip, "^.", "0.")
But it puts a 0 at the beginning of all the entries, including the ones that are correctly typed (those that don't start with a period).
To do as you've stated, the period has to be matched literally, because an unescaped "." in a regex matches any character. Inside square brackets [] a period stands for itself, or you can escape it with '\\':
Option 1:
gsub("^[.]","0.",".54")
[1] "0.54"
Option 2:
gsub("^\\.","0.",".54")
[1] "0.54"
Otherwise, as.numeric should also take care of it automatically.
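A minimal sketch of both suggestions on a whole column. The vector `precip` is hypothetical sample data standing in for dataIMN$precip.

```r
# Hypothetical sample values; two of them are missing the leading zero
precip <- c(".15", "0.20", "3.5", ".7")

# Add a leading zero only where the string starts with a literal period
fixed_precip <- sub("^\\.", "0.", precip)
print(fixed_precip)
# [1] "0.15" "0.20" "3.5"  "0.7"

# as.numeric() parses ".15" directly, so the replacement step is optional
# if the only goal is a numeric column
precip_num <- as.numeric(precip)
print(precip_num)
# [1] 0.15 0.20 3.50 0.70
```

The same pattern should also work with stringr, as in str_replace(precip, "^\\.", "0."), assuming that package is loaded.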

Using Gsub in R to remove a string containing brackets [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 6 years ago.
I'm trying to use gsub to remove certain parts of a string. However, I can't get it to work, and I think it's because the string to be removed contains brackets. Is there any way around this? Thanks for any help.
The command I want to use:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Returns:
#"(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"
Expected output:
#"(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
A quick test to see if brackets were the problem:
gsub('te','', 'test')
#[1] "st"
gsub('(te)','', '(te)st')
#[1] "()st"
We can do this by placing each bracket inside square brackets, because ( and ) are regex metacharacters (they define a group rather than matching themselves):
gsub('[(]4:4aCO[)]_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Or use fixed = TRUE so that the pattern is matched as a literal string:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)', fixed = TRUE)
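As a quick check, the sketch below runs three ways of matching the literal brackets against the string from the question and compares them with the expected output. The backslash-escaped variant and the variable names are additions for illustration.

```r
# Input string and expected result, both taken from the question
x <- "(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"
expected <- "(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"

# 1) Bracket expressions: [(] and [)] match literal parentheses
res_class <- gsub("[(]4:4aCO[)]_", "", x)

# 2) Backslash escapes: \\( and \\) in an R string
res_escape <- gsub("\\(4:4aCO\\)_", "", x)

# 3) fixed = TRUE: the pattern is treated as plain text, not a regex
res_fixed <- gsub("(4:4aCO)_", "", x, fixed = TRUE)

print(res_class)
# [1] "(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
```

fixed = TRUE is usually the simplest choice when the text to remove contains no pattern logic at all.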

replacing text with a single backslash [duplicate]

This question already has answers here:
Replacing white space with one single backslash
(2 answers)
Closed 6 years ago.
I have this text and I want to replace // with \
This is the text sdfd//dfsadfs
and I want it to be sdfd\dfsadfs
Can gsub work? This does not work: gsub("//","[\]","sdfd//dfsadfs")
I had a similar problem before. As @Psidom commented, you should use gsub("//","\\\\","sdfd//dfsadfs"). This replaces // (2 characters) with a single backslash. R prints the result as \\ because print() shows the escaped form, but the string contains only one \ character (check by running nchar("\\"), which returns 1). You can see the actual character by running cat("\\"). If you export the data to a table (or csv) after running gsub, I believe there will be only one \.
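The behaviour described above can be checked with a short sketch; the variable names are chosen here for illustration.

```r
x <- "sdfd//dfsadfs"

# "\\\\" in R source is a two-backslash string; gsub() reads that
# replacement as one literal backslash
result <- gsub("//", "\\\\", x)

# print() shows the escaped form with a doubled backslash
print(result)
# [1] "sdfd\\dfsadfs"

# cat() writes the actual characters: a single backslash
cat(result, "\n")
# sdfd\dfsadfs

# Two characters were replaced by one, so the string is one shorter
n_before <- nchar(x)       # 13
n_after  <- nchar(result)  # 12
```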
