Escape quotes with paste in R

I've tried to follow other related questions here, but their answers are not working for me, so I apologize for what is likely a duplicate question.
I have two vectors:
numbers = 1:12
month = month.name
I want to paste them together to get the following, with quotes around the number and around the month, and an equals sign between them:
""1" = "January"", ""2" = "February"", etc.
But
paste(numbers, month, sep = '" = "')
and
paste(numbers, month, sep = '\" = \"')
both give the result:
"1\" = \"January" "2\" = \"February"...etc.
How do I get rid of the \?
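The backslashes are most likely not in the strings at all: print() (and auto-printing at the console) escapes embedded double quotes so the output can be read back as R code, while cat() or writeLines() show the raw characters. A minimal sketch; note that sep = '" = "' on its own gives 1" = "January, so the leading and trailing quotes also have to be pasted on:

```r
numbers <- 1:12
month <- month.name

# Paste the outer quotes as well as the ones around " = "
x <- paste0('"', numbers, '" = "', month, '"')

print(x[1])         # print() escapes the embedded quotes: [1] "\"1\" = \"January\""
writeLines(x[1:2])  # shows the actual characters, with no backslashes
```

writeLines() prints "1" = "January" and "2" = "February" on separate lines; the stored strings contain no backslash characters.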

Related

How to indicate not only a delimiter but also its position (like you can do in SQL) for the separate function in R?

I want to know how to split a column on a delimiter while also specifying its position. I need to separate the title of the film from the year, and the common delimiter is "(", but some movies have brackets in their titles as well, so I want to specify that the bracket must be followed by a number, without the number itself being used as part of the separator.
Here is the code:
imdb_ratings <- imdb_ratings %>% separate(col = title, into = c("title", "year"),
sep = "\\(*[:digit:]")
It obviously throws an error: all the values in the year column are NA. I already know that my code uses both the bracket and a number as the separator (I guess the separator can only be one character), but I don't know how to specify where the bracket should be. I tried something like "\\(?=[:digit:]", but it also doesn't work.
[UPDATE]
Here is my code now:
imdb_ratings <- imdb_ratings %>% filter(Animation == 1 & !str_detect(title, "\\$")) %>%
separate(col = title, into = c("title", "year"),
sep = "\\((?=\\d)")
I wanted to filter out the rows whose titles end with a backslash, because I know they don't have a year. That's why I used !str_detect(title, "\\$"), but it doesn't work: after filtering, the results still contain the same rows with a backslash at the end:
[Screenshot: filtered results that still include titles ending with a backslash]
[UPDATE2]
How can I use separate so that the year of the movie ends up in the second column when the bracket is followed not by the year but by some text? The screenshot shows an example, "Aladdin (Video game 1993)". How do I put Aladdin in the first column and 1993 in the second (year) column? Another option might be to keep the bracketed "Video game" in the first column as well.
[Screenshot: example titles such as "Aladdin (Video game 1993)"]
[UPDATE]
The regex was working the whole time, but now R suddenly gives an error on it, even though the code has not changed:
imdb <- imdb %>% extract(title, c("title", "year"),
"^(.*?)(?:\\s*\\([^()]*?(\\d{4})[^()]*\\))?$")
The error: Error in drop && length(x) == 1L : invalid 'x' type in 'x && y'
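The message mentions drop, an argument of the data-frame subsetting method `[`, which suggests the regex itself is not at fault. One common cause is that another attached package masks tidyr::extract(); magrittr, for example, exports an extract() that is an alias for `[`. Calling tidyr::extract() with the package prefix rules this out. To check the pattern independently of tidyr, here is a base-R sketch using sub() with perl = TRUE on a few hypothetical titles:

```r
# Hypothetical sample titles, for illustration only
titles <- c("Aladdin (Video game 1993)", "Toy Story (1995)", "Fantasia")

# Same pattern as in the question, with every backslash doubled for an R string literal
pattern <- "^(.*?)(?:\\s*\\([^()]*?(\\d{4})[^()]*\\))?$"

# perl = TRUE for the lazy quantifier and the non-capturing group
title <- sub(pattern, "\\1", titles, perl = TRUE)
year  <- sub(pattern, "\\2", titles, perl = TRUE)  # "" when there is no year

data.frame(title, year)
```

Because the whole bracket group is optional, titles without a year still match, and the year capture is simply empty.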
If you plan to split a string at a ( char that is followed by a digit, you may use
\((?=\d)
It matches a ( with \( and the positive lookahead (?=\d) requires the presence of a digit immediately to the right of the current location. In an R string literal, the backslashes are doubled: "\\((?=\\d)".
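The same lookahead can be tried in base R with strsplit(), which needs perl = TRUE for lookarounds. A minimal sketch on hypothetical titles; the trimws() and sub() clean-up of the leftover space and closing bracket is an extra step, not part of the answer above:

```r
# Hypothetical titles: the first has brackets inside the title itself,
# the last has no year at all
titles <- c("Something (Else) (1999)", "Toy Story (1995)", "Fantasia")

# Split only at a "(" that is immediately followed by a digit
parts <- strsplit(titles, "\\((?=\\d)", perl = TRUE)

# First piece is the title; the second piece (if any) is the year plus ")"
title <- trimws(vapply(parts, `[`, character(1), 1))
year  <- sub("\\)$", "", vapply(parts, `[`, character(1), 2))
```

The "(" before "Else" is not followed by a digit, so it is left alone; titles with no year get NA in year.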
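To check whether the last character of a string is a backslash, you may use the regex \\$, written as "\\\\$" in an R string. The "\\$" in the question is the regex \$, which matches a literal dollar sign, so it never matches a backslash.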
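A quick base-R check of the difference between the two patterns, on hypothetical titles (one ending with a backslash, one with a dollar sign):

```r
titles <- c("Toy Story (1995)", "Untitled\\", "Price $")

dollar    <- grepl("\\$", titles)    # regex \$  : a literal dollar sign
backslash <- grepl("\\\\$", titles)  # regex \\$ : a backslash at the end of the string

dollar     # FALSE FALSE TRUE
backslash  # FALSE TRUE FALSE
```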
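In your case, you may use it as follows (the pipe must end each line, otherwise R treats imdb_ratings <- imdb_ratings as a complete statement):
library(dplyr)
library(tidyr)
library(stringr)
imdb_ratings <- imdb_ratings %>%
  filter(Animation == 1 & !str_detect(title, "\\\\$")) %>%
  separate(col = title, into = c("title", "year"), sep = "\\((?=\\d)")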
We can use a regex lookaround here
library(dplyr)
library(tidyr)
imdb_ratings %>%
separate(col = title, into = c("title", "year"),
sep = "\\((?=[[:digit:]])")
If we need to filter out the rows that end with \, then use filter:
imdb_ratings %>%
filter(substring(title, nchar(title)) != "\\")
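As a regex-free alternative to the substring() comparison, base R's endsWith() tests the last character directly; inside dplyr it would read filter(!endsWith(title, "\\")). A sketch on hypothetical titles:

```r
titles <- c("Toy Story (1995)", "Untitled\\")

# "\\" in R source is a single backslash character
keep <- !endsWith(titles, "\\")
titles[keep]
```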

R - Split String with conditions

I have a string-splitting problem. I have a huge number of files whose names are structured like this:
filenames = c("NO2_Place1_123_456789.dat", "NO2_Nice_Place_123_456789.dat", "NO2_Nice_Place_123_456789.dat", "NO2_Place2_123_456789.dat")
I need to extract the station names, e.g. Place1, Nice_Place1 and so on. It's either "Place" and a number or "Nice_Place" and a number.
I tried this to get the station names for "Place" and a number, and it works great, but it doesn't give me the correct name for "Nice_Place", because it treats it as two words.
Station = strsplit(filenames[1], "_")[[1]][2] # Works
Station = strsplit(filenames[2], "_")[[1]][2] # Doesn't work
My idea now is to use if...else: if the station name in the example above is "Nice", append the 3rd part of the split string with an underscore. Unfortunately, I am totally new to if...else conditions.
Can somebody please help?
EDIT:
Expected output:
Station = strsplit(filenames[1], "_")[[1]][2] #Station = Place1
Station = strsplit(filenames[2], "_")[[1]][2] #Station = Nice, which is not correct: I want "Nice_Place"
So when I get
Station = strsplit(filenames[2], "_")[[1]][2] #Station = Nice
I want to add a condition: if Station is "Nice", it should append strsplit(filenames[2], "_")[[1]][3], joined with an underscore.
EDIT2:
I have now found a way to get what I want:
filenames = c("NO2_Place1_123_456789.dat", "NO2_Nice_Place1_123_456789.dat", "NO2_Nice_Place2_123_456789.dat", "NO2_Place2_123_456789.dat")
Station = strsplit(filenames[2], "_")[[1]][2]
if (Station == "Nice"){
Station = paste(Station, strsplit(filenames[2], "_")[[1]][3], sep = "_")
}
We can use sub
sub("^[^_]+_(.*Place\\d*).*", "\\1", filenames[2])
#[1] "Nice_Place1"
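Since sub() is vectorised, the pattern handles all file names at once, with no if...else needed. A sketch using the file names from EDIT2; the second pattern is an alternative assumption, not part of the answer: it takes the station name to be everything between the first field and the last two underscore-separated fields, so it does not depend on the word "Place":

```r
filenames <- c("NO2_Place1_123_456789.dat", "NO2_Nice_Place1_123_456789.dat",
               "NO2_Nice_Place2_123_456789.dat", "NO2_Place2_123_456789.dat")

# The answer's pattern, applied to the whole vector
station <- sub("^[^_]+_(.*Place\\d*).*", "\\1", filenames)

# Assumption: station = everything between the first field and the last two fields
station2 <- sub("^[^_]+_(.*)_[^_]+_[^_]+$", "\\1", filenames)

station
```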

Move "*" to new column in R

Hello, I have a column in a data.frame with many rows, e.g.,
df = data.frame("Species" = c("*Briza minor", "*Briza minor", "Wattle"))
I want to make a new column "Species_new" where the "*" is moved to the end of the character string, e.g.,
df = data.frame("Species" = c("*Briza minor", "*Briza minor", "Wattle"),
"Species_new" = c("Briza minor*", "Briza minor*", "Wattle"))
Is there a way to do this using gsub? Doing it manually would take far too long, as I have approximately 50,000 rows.
Thanks in advance
One option is to capture the * as a group and reverse the order of the backreferences in the replacement:
df$Species_new <- sub("^([*])(.*)$", "\\2\\1", df$Species)
df$Species_new
#[1] "Briza minor*" "Briza minor*" "Wattle"
NOTE: * is a metacharacter meaning "0 or more", so to match it literally we can either escape it (\\*) or place it inside brackets ([*]).
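A small sketch showing that both ways of matching a literal star give the same result. It is a slight variant of the answer above: it captures only the text after the star and appends a literal * in the replacement (a * in the replacement string is not special):

```r
species <- c("*Briza minor", "*Briza minor", "Wattle")

escaped   <- sub("^\\*(.*)$", "\\1*", species)  # \\* : escaped literal star
bracketed <- sub("^[*](.*)$", "\\1*", species)  # [*] : star inside a character class

escaped
```

Rows without a leading star, such as "Wattle", do not match and are returned unchanged.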
Thanks so much for the quick response. I also found a workaround:
df$Species_new = sub("[*]", "", df$Species, perl = TRUE)           # drop the star
differences = setdiff(df$Species, df$Species_new)                   # names that had a star
tochange = subset(df, df$Species %in% differences)
toleave = subset(df, !df$Species %in% differences)
tochange$Species_new = paste(tochange$Species_new, "*", sep = "")  # re-append the star at the end
df = rbind(tochange, toleave)                                       # note: this reorders the rows

Text Mining in a string using R

I recently started using R and am new to data analysis.
Is it possible in R to find how many times a search string is repeated within a single main string?
Example:
Main string: 'abcdefghikllabcdefgllabcd'
and search string: 'lla'
Desired output: 'abcdefghik lla bcdefg lla bcd'
[I tried using R's grep() function, but it does not work the way I want: it only tells me which of several main strings contain the search string.]
Thank you in advance.
This works too using regex capture groups:
gsub("(lla)"," \\1 ","abcdefghikllabcdefgllabcd")
Try the gsub() method like this:
main_string <- 'abcdefghikllabcdefgllabcd'
search_string <- 'lla'
output_string <- gsub(search_string, paste(' ', search_string, ' ', sep = ''), main_string)
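gsub() treats search_string as a regular expression, which only matters if it can contain metacharacters such as . or *. Passing fixed = TRUE makes the match literal; a sketch of the same call:

```r
main_string <- 'abcdefghikllabcdefgllabcd'
search_string <- 'lla'

# fixed = TRUE matches search_string literally instead of as a regex
output_string <- gsub(search_string, paste0(' ', search_string, ' '),
                      main_string, fixed = TRUE)
output_string  # "abcdefghik lla bcdefg lla bcd"
```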
If you only want to COUNT the occurrences of the search string in the main string, try this one-liner:
string = "abcdefghikllabcdefgllabcd"
search = 'lla'
( nchar(string) - nchar( gsub(search, "", string)) ) / nchar(search)
#returns 2
string2 = "llaabcdefghikllabcdefgllabcdlla"
( nchar(string2) - nchar( gsub(search, "", string2)) ) / nchar(search)
#returns 4
NOTE: Unit-test your solution for matches at the beginning and end of the string (e.g. make sure it works on 'llaabcdefghikllabcdefgllabcdlla'). I have seen several solutions elsewhere that rely on strsplit() to split on 'lla', but these solutions miss the final 'lla' at the end of the string.
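Another way to count, sketched with base R's gregexpr(): it returns every match position, and lengths(regmatches(...)) turns that into one count per string. Like the nchar() approach, it counts non-overlapping matches, and it handles matches at the start and end of the string:

```r
strings <- c("abcdefghikllabcdefgllabcd", "llaabcdefghikllabcdefgllabcdlla", "xyz")
search <- "lla"

# One count per input string; a string with no match gives 0
counts <- lengths(regmatches(strings, gregexpr(search, strings, fixed = TRUE)))
counts  # 2 4 0
```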

Set up different fonts for fragments of string in R

I have a long string txt that I want to display as margin text in a plot using mtext(). The txt string is made up of another string, txt.sub, followed by a date string obtained by formatting a date passed as a command-line argument. I want to display only the "date" part of that string in bold.
The string is built like this:
date.in = as.Date( commandArgs( trailingOnly=TRUE )[1], format="%m/%d/%Y" )
date = format(date.in, "%b %d, %Y")
txt.sub = "Today's date is: "
txt = paste(txt.sub, date, sep = "")
I tried the following
## Plot is called first here.
mtext(expression(paste(txt.sub, bold(date), sep = "")), line = 0, adj = 0, cex = 0.8)
but it doesn't paste the values of txt.sub and date; instead, it literally displays the words "txt.sub" and "date".
Is there any way to get to the result I am looking for? Thank you!
Adjusting one of the examples from the help page on mathematical annotation (see example 'How to combine "math" and numeric variables'):
mtext(bquote(.(txt.sub) ~ bold(.(date))), line=0, adj=0, cex=0.8)
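A self-contained sketch of the bquote() approach. It assumes a hypothetical fixed date in place of commandArgs(), so it runs without command-line arguments; .() substitutes the values of the variables into the plotmath expression, which expression() does not do:

```r
# Hypothetical fixed date instead of commandArgs(), for illustration only
date.in <- as.Date("01/05/2024", format = "%m/%d/%Y")
date <- format(date.in, "%b %d, %Y")
txt.sub <- "Today's date is: "

# .() inserts the current values; ~ joins the two parts with a space
lab <- bquote(.(txt.sub) ~ bold(.(date)))

plot(1:10)
mtext(lab, line = 0, adj = 0, cex = 0.8)
```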
