df <- data.frame(
  videos = c("Moon vs Grubby", "Moon vs Happy", "Happy vs Th00"),
  links = c("https://www.youtube.com/watch?v=QlNc-jb4ESk&t", "https://www.youtube.com/watch?v=VESO8YQVFSE", "https://www.youtube.com/watch?v=RI3IJT8ZzBM")
)
df$links <- as.character(df$links)
df$links <- gsub("watch?v=", "embed/", df$links)
I have the following code with links to YouTube videos that I want to embed in a Shiny app. However, to embed them I need to replace part of each link, and that part is interpreted as a regular expression. I did not find a helpful solution here.
So how can I gsub this pattern?
Current Links:
Expected Outcome:

We need to escape the ?, as it is a regex metacharacter (= is not one, so it needs no escaping)
gsub("watch\\?v=", "embed/", df$links)
or with fixed = TRUE
gsub("watch?v=", "embed/", df$links, fixed = TRUE)
Also, as there is only a single instance in each link, we can use sub
sub("watch?v=", "embed/", df$links, fixed = TRUE)
#[1] "https://www.youtube.com/embed/QlNc-jb4ESk&t"
#[2] "https://www.youtube.com/embed/VESO8YQVFSE"
#[3] "https://www.youtube.com/embed/RI3IJT8ZzBM"
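To see why the unescaped pattern from the question finds nothing, here is a small sketch using the three links from the question. As a regex, "watch?v=" means "watc", an optional "h", then "v=", so the literal question mark in the link is never matched. The variable names are only for illustration.

```r
links <- c("https://www.youtube.com/watch?v=QlNc-jb4ESk&t",
           "https://www.youtube.com/watch?v=VESO8YQVFSE",
           "https://www.youtube.com/watch?v=RI3IJT8ZzBM")

# Unescaped: "?" makes the preceding "h" optional, so nothing matches
unescaped <- grepl("watch?v=", links)

# Escaped "?" or fixed = TRUE: the text is matched literally
escaped <- grepl("watch\\?v=", links)
literal <- grepl("watch?v=", links, fixed = TRUE)

# Replacement with the "embed/" string used in the question's own code
embed <- sub("watch?v=", "embed/", links, fixed = TRUE)
print(embed)
```

If the pattern never needs regex features, fixed = TRUE is probably the simpler choice, since nothing has to be escaped.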

My guess is that an expression with two capture groups might also work: the first (\S*) captures everything before watch?v= and the second (\S*) captures everything after it.
Our code might look like:
gsub("(\\S*)watch\\?v\\=(\\S*)", "\\1embed/\\2", df$links)
The replacement keeps both captured parts and puts embed/ between them.


Remove multiple instances with a regex expression, but not the text in between instances [duplicate]

In long passages using bookdown, I have inserted numerous images. Having combined the passages into a single character string (in a data frame) I want to remove the markdown text associated with inserting images, but not any text in between those inserted images. Here is a toy example.
text.string <- "writing ![Stairway scene](/media/ClothesFairLady.jpg) writing to keep ![Second scene](/media/attire.jpg) more writing"
library(stringr)
str_remove_all(string = text.string, pattern = "!\\[.+\\)")
[1] "writing  more writing"
The regex doesn't stop at the first closing parenthesis; it continues until the last one and deletes the "writing to keep" in between.
I tried to apply String manipulation in R: remove specific pattern in multiple places without removing text in between instances of the pattern, which uses gsubfn and gsub but was unable to get the solutions to work.
Please point me in the right direction to solve this problem of a regex removal of designated strings, but not the characters in between the strings. I would prefer a stringr solution, but whatever works. Thank you
You have to use the following regex:
Alternatively, you can also use this:
Both solutions use a lazy match rather than a greedy one, which is the key to your question.
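As a minimal sketch of the greedy versus lazy difference on the toy string from the question, using base R gsub with perl = TRUE. The lazy pattern here is my own illustration and is not necessarily the one this answer had in mind.

```r
text.string <- "writing ![Stairway scene](/media/ClothesFairLady.jpg) writing to keep ![Second scene](/media/attire.jpg) more writing"

# Greedy: .+ runs on to the LAST ")" in the string
greedy <- gsub("!\\[.+\\)", "", text.string, perl = TRUE)

# Lazy: .+? stops at the FIRST ")" after each "!["
lazy <- gsub("!\\[.+?\\)", "", text.string, perl = TRUE)

print(greedy)
print(lazy)
```

The same lazy pattern should also work with stringr, e.g. str_remove_all(text.string, "!\\[.+?\\)"). Note that the spaces on both sides of each removed image remain, so the result contains double spaces.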
I think you could use the following solution too:
gsub("!\\[[^][]*\\]\\([^()]*\\)", "", text.string)
[1] "writing  writing to keep  more writing"

Replace latex with r strings using gsub [duplicate]

I would like to find and replace instances of tabular with tabularx. I tried gsub, but it has led me into a world of escaping pain. Following other questions and answers I found fixed=TRUE, which is the best I have so far. The code snippet below almost works, but \B is reported as unrecognized. If I escape it twice I get \BEGIN as output!
texText <- '\begin{tabular}{rl}\begin{tabular}{rll}'
texText <- gsub("\begin{tabular}{rl}", "\BEGIN{tabular}{rll}", texText, fixed=TRUE)
I'm using BEGIN as my test to see what is happening. This is before I get to tackling the question of what goes on in the brackets {rl} {ll} {rrl} etc. Ideally I'm looking for a regex that would output:
That is, the final column is replaced by X.
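Try using proper escaping:
texText <- "\\begin{tabular}{rl}\\begin{tabular}{rll}"
output <- gsub("\\\\begin\\{tabular\\}", "\\\\begin{tabularx}", texText)
output
[1] "\\begin{tabularx}{rl}\\begin{tabularx}{rll}"
A literal backslash is written as two backslashes in an R string (a single "\b" is the backspace character). To match one in a regex it has to be escaped again, which gives four backslashes in the pattern, and the same four in the replacement. Metacharacters such as { and } are escaped with two backslashes. Printing shows each backslash doubled; cat(output) shows the string with single backslashes.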
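For the second part of the question (replacing the final column with X), a sketch along the same lines might look like the following. It assumes the column specification contains only lowercase letters such as r, l and c, and it does not add the width argument that tabularx also expects. The variable names are only for illustration.

```r
texText <- "\\begin{tabular}{rl}\\begin{tabular}{rll}"

# Capture every column letter except the last one, then write the captured
# letters back followed by X (assumes a letters-only column specification)
pattern <- "\\\\begin\\{tabular\\}\\{([a-z]*)[a-z]\\}"
replacement <- "\\\\begin{tabularx}{\\1X}"

output <- gsub(pattern, replacement, texText)
cat(output, "\n")
```

The capture group ([a-z]*) is greedy, but it has to leave one letter for the final [a-z], so only the last column is replaced.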

How to use regex to match upto third forward slash in R using gsub? [duplicate]

So this question relates specifically to how R handles regex. I would like to use a regex with gsub to extract all the text before the 3rd forward slash.
Here are some string examples:
I would like to obtain the following strings only:
So it is not keeping the information after the 3rd forward slash.
I cannot seem to get any regex working along with using gsub to solve this!
The closest I have got is:
gsub(pattern = "/[A-Za-z0-9_.-]/[A-Za-z0-9_.-]*$", replacement = "", x = the_data_above )
I think R has some issues regarding forward slashes and escaping them.
From the start of the string, match two instances of a slash followed by non-slash characters, then match everything after them, and replace the whole match with just those two instances.
paths <- c("/google.com/images/video", "/msn.com/bing/chat", "/bbc.com/video")
sub("^((/[^/]*){2}).*", "\\1", paths)
## [1] "/google.com/images" "/msn.com/bing" "/bbc.com/video"
You can take advantage of lazy (vs greedy) matching by adding the ? after the quantifier (+ in this case) within your capture group:
gsub("(/.+?/.+?)/.*", "\\1", text)
[1] "/google.com/images" "/msn.com/bing" "/bbc.com/video"
text <- c("/google.com/images/video",
Try this out:
As seen here: https://regex101.com/r/9ZYppe/1
Your problem arises from the fact that [A-Za-z0-9_.-] matches only one such character. You need to use the + quantifier to specify that there are one or more of them. Also, the $ at the end is largely unnecessary once ^ is used to anchor the match at the start of the string.
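A sketch of that fix, keeping the character class from the question, adding + after it and anchoring with ^. It uses the three example paths from the other answers, and the pattern is my own illustration, not necessarily the one behind the regex101 link.

```r
paths <- c("/google.com/images/video", "/msn.com/bing/chat", "/bbc.com/video")

# Two "/segment" parts are captured; whatever follows is matched and dropped
trimmed <- sub("^(/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+).*$", "\\1", paths)
print(trimmed)
```

Forward slashes need no escaping in R regex, so the slash itself was not the problem in the original attempt.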

how to get the last part of strings with different lengths ended by ".nc" [duplicate]

I have several download links (i.e., strings), and each string has different length.
For example let's say these fake links are my strings:
My_Link1 <- "http://esgf-data2.diasjp.net/pr/gn/v20190711/pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
My_Link2 <- "http://esgf-data2.diasjp.net/gn/v20190711/pr_-present_r1i1p1f1_gn_19500101-19591231.nc"
My goals:
A) I want only the last part of each string, ending in .nc, and get these results:
B) I want only the last part of each string, without the .nc, and get these results:
Thanks so much for your time.
A shortcut to get the last part of the string would be to use basename
basename(My_Link1)
#[1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc"
and for the second question, if you want to remove the trailing ".nc", we could use sub with the pattern anchored to the end of the string
sub("\\.nc$", "", basename(My_Link1))
#[1] "pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231"
With some regex, here is another way to get the last part:
sub(".*/", "", My_Link1)
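Since both functions are vectorised, the two links can also be handled in one go. The sketch below uses tools::file_path_sans_ext as an alternative to sub for dropping the extension; the vector names are only for illustration.

```r
links <- c("http://esgf-data2.diasjp.net/pr/gn/v20190711/pr_day_MRI-AGCM3-2-H_highresSST_gn_20100101-20141231.nc",
           "http://esgf-data2.diasjp.net/gn/v20190711/pr_-present_r1i1p1f1_gn_19500101-19591231.nc")

# A) last part of each link, including the .nc extension
file_names <- basename(links)

# B) the same names without the extension
stems <- tools::file_path_sans_ext(file_names)

print(file_names)
print(stems)
```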

Using Gsub in R to remove a string containing brackets [duplicate]

I'm trying to use gsub to remove certain parts of a string. However, I can't get it to work, and I think it's because the string to be removed contains brackets. Is there any way around this? Thanks for any help.
The command I want to use:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Expected output:
A quick test to see if brackets were the problem:
gsub('te','', 'test')
#[1] "st"
gsub('(te)','', '(te)st')
#[1] "()st"
We can do this by placing each bracket inside square brackets, since ( and ) are metacharacters
gsub('[(]4:4aCO[)]_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Or with fixed = TRUE, so that the pattern is matched literally
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)', fixed = TRUE)
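As a quick check, here is a small sketch comparing the pattern from the question with the approaches above on the full string, underscores included. The variable names are only for illustration.

```r
x <- "(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"

# Unescaped parentheses form a group, so the regex looks for "4:4aCO_",
# which never occurs because a ")" sits between "4:4aCO" and "_"
as_group <- gsub("(4:4aCO)_", "", x)

# Three ways to match the parentheses literally
as_class   <- gsub("[(]4:4aCO[)]_", "", x)
as_escaped <- gsub("\\(4:4aCO\\)_", "", x)
as_fixed   <- gsub("(4:4aCO)_", "", x, fixed = TRUE)

print(as_group)
print(as_fixed)
```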
