How to remove "\" from paste function output with quotation marks? - r

I'm working with the following code:
Y_Columns <- c("Y.1.1")
paste('{"ImportId":"', Y_Columns, '"}', sep = "")
The paste function produces the following output:
"{\"ImportId\":\"Y.1.1\"}"
How do I get the paste function to omit the \? Such that, the output is:
"{"ImportId":"Y.1.1"}"
Thank you for your help.

Note: I did do a search on SO to see if there were any Q's that asked "what is an escape character in R". But I didn't review all the 160 answers, only the first 20.
This is one way of demonstrating what I wrote in my comment:
out <- paste('{"ImportId":"', Y_Columns, '"}', sep = "")
out
#[1] "{\"ImportId\":\"Y.1.1\"}"
?print
print(out,quote=FALSE)
#[1] {"ImportId":"Y.1.1"}
Both R and regex patterns use escape characters so that special characters can appear in input or in printed output. (And sometimes regex patterns need doubled escapes.) R has a few characters that need to be "escaped" in certain situations. You illustrated one such situation: including a double-quote character inside a result that will be printed with surrounding double quotes. The backslashes are added by print() for display only; they are not part of the string. If you intended to include any single quotes inside a character value that was delimited by single quotes at the time of creation, they would need to be escaped as well.
out2 <- '\'quoted\''
nchar(out2)
#[1] 8 ... note that neither the surrounding single quotes nor the backslashes get counted
> out2
[1] "'quoted'" ... and the default output quote-char is a double-quote.
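As a quick check that the backslashes exist only in the printed representation, here is a minimal sketch reusing the Y_Columns value from the question. It compares print() with cat() and writeLines(), then inspects the string itself. The helper names has_backslash and n_chars are only for illustration.

```r
Y_Columns <- c("Y.1.1")
out <- paste('{"ImportId":"', Y_Columns, '"}', sep = "")

print(out)        # escaped display, wrapped in double quotes
cat(out, "\n")    # raw contents of the string
writeLines(out)   # raw contents, one element per line

# The string itself contains no backslash characters.
has_backslash <- grepl("\\", out, fixed = TRUE)
n_chars <- nchar(out)
```

If the value is going to be written to a file or sent to an API, nothing needs to be removed: the receiving side sees the same text that cat() shows.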
Here's a good Q&A to review: How to replace '+' using gsub() function in R
It has two answers, both useful: one shows how to double-escape a special character and the other shows how to use the fixed argument to get around that requirement.
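The two approaches mentioned in that Q&A can be sketched as follows (the input string is made up for illustration):

```r
x <- "a+b"

# Approach 1: escape the + so the regex engine treats it literally.
escaped <- gsub("\\+", "-", x)

# Approach 2: skip regex interpretation entirely with fixed = TRUE.
literal <- gsub("+", "-", x, fixed = TRUE)
```

Both calls should give the same result here; fixed = TRUE is usually the simpler choice when the pattern is plain text.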
And another potentially useful Q&A on the topic of handling Windows paths:
File path issues in R using Windows ("Hex digits in character string" error)
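A short sketch of the usual ways around that path problem, using a hypothetical path chosen only for illustration. The backslashes are either doubled or replaced with forward slashes:

```r
# Hypothetical path, used only for illustration.
p_backslash <- "C:\\Users\\me\\data.csv"   # each \\ stores one backslash
p_forward   <- "C:/Users/me/data.csv"      # forward slashes are accepted by R

cat(p_backslash, "\n")                     # shows single backslashes

# Convert backslashes to forward slashes without regex interpretation.
converted <- gsub("\\", "/", p_backslash, fixed = TRUE)
```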
And some further useful reading suggestions: look at the series of help pages whose names start with capital letters. Since I can never remember which one has which nugget of essential information, I tried ?Syntax first; it has a "See Also" list of essential reading: Arithmetic, Comparison, Control, Extract, Logic, NumericConstants, Paren, Quotes, Reserved. I then realized that what I wanted to refer you to was most likely ?Quotes, where all the R-specific escape sequences should be listed.

Related

Using R, how does one extract multiple URLs/pattern matches from a string in a dataset, and then place each URL in its own adjacent column?

I have a (large) dataset that initially consists of an identifier and associated text (in raw HTML). Oftentimes the text will include one or more embedded links. Here's a sample dataset:
id text
1 <p>I love dogs!</p>
2 <p>My <strong>favorite</strong> dog is this kind.</p>
3 <p>I've had both Labs and Huskies in my life.</p>
What I'd like as output (with the text column included in the same spot, but I removed it for visibility here) is:
id link1 link2
1
2 doge.com
3 labs.com huskies.com
I've tried using str_extract_all() paired with <a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1, but even when I double escape the backslashes I either get an "unexpected" error OR it keeps asking me for more and I have to Escape out. I feel like this method is the one I want and SHOULD work, but I can't seem to get the regex to play nicely. Here are my results so far:
> str_extract_all(text, "<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1")
Error: '\s' is an unrecognized escape in character string starting ""<a\s"
> str_extract_all(text, perl(<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1))
Error: unexpected '<' in "str_extract_all(text, perl(<"
> str_extract_all(text, "<a\\s+(?:[^>]*?\\s+)?href=(["'])(.*?)\\1")
+
> str_extract_all(text, perl(<a\\s+(?:[^>]*?\\s+)?href=(["'])(.*?)\\1))
Error: unexpected '<' in "str_extract_all(text, perl(<"
I've also tried parseURI from the XML package and for whatever reason it crashes my R session.
The other solutions I've found to date either only deal with single links, or return items in a list or vector altogether. I want to keep things separated by their identifier and in a dataset.
If needed, I could tolerate generating a separate dataset and merging them together, but there will be cases where there are no links, so I'd want to avoid any pitfalls of rows being deleted due to not having a value in any of the link columns.
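R treats an unescaped quote inside a string as the end of that string. In your example above, the " inside the character class closes the string and the ' that follows opens a new one, so R considers the string still open: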
str_extract_all(text, "<a\\s+(?:[^>]*?\\s+)?href=(["'])(.*?)\\1")
R is still looking for the end of the string because the quotes inside the pattern were not escaped. R has special cases in which a single \ is used for escaping (e.g. \n for a new line); see ?Quotes. In an R string, \' escapes a single quote and \" escapes a double quote:
str_extract_all(text, "<a\\s+(?:[^>]*?\\s+)?href=([\"'])(.*?)\\1")
"\ itself is a special character that needs escape, e.g. \\d. Do not
confuse these regular expressions with R escape sequences such as
\t."
or in your case \"
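To get from the matches to the id/link1/link2 layout asked for in the question, here is one possible sketch. It is not the answerer's method: it swaps in base R (gregexpr(), regmatches(), sub()) for str_extract_all() so that it runs without extra packages. The sample rows and their href values are assumptions, since the links in the question's sample data did not survive.

```r
# Hypothetical sample data: the <a href=...> tags are assumed for illustration.
df <- data.frame(
  id = 1:3,
  text = c(
    "<p>I love dogs!</p>",
    "<p>My favorite dog is <a href=\"doge.com\">this kind</a>.</p>",
    "<p>Both <a href='labs.com'>Labs</a> and <a href=\"huskies.com\">Huskies</a>.</p>"
  ),
  stringsAsFactors = FALSE
)

# The asker's regex, with quotes and backslashes escaped for an R string.
pattern <- "<a\\s+(?:[^>]*?\\s+)?href=([\"'])(.*?)\\1"

# Full matches per row; a row without links yields character(0).
tags <- regmatches(df$text, gregexpr(pattern, df$text, perl = TRUE))

# Keep only the URL (second capture group) of each matched tag.
urls <- lapply(tags, function(m) sub(pattern, "\\2", m, perl = TRUE))

# Pad each row with NA to a common width so that no id is dropped.
n_max <- max(1L, lengths(urls))
links <- do.call(rbind, lapply(urls, function(u) {
  c(u, rep(NA_character_, n_max - length(u)))
}))
colnames(links) <- paste0("link", seq_len(n_max))

result <- cbind(df, links, stringsAsFactors = FALSE)
result[, c("id", colnames(links))]
```

Rows without any link keep their id and simply carry NA in every link column, which avoids the merge pitfall mentioned in the question.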

R programming - How to remove special characters from a data set?

I have a data set that contains strings and special characters like the one below can be found in the data set.
Special character
How do I remove special characters like the above from my data set?
Use regular expressions to remove unwanted characters, for example:
dataset$textcolumn <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl=TRUE)
to remove everything except word characters and spaces. To do more complex replacements look into the help topic ?regexp.
Also look into the encoding (Encoding and iconv are helpful here.), maybe the text is correct but the wrong encoding is assumed.
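A small sketch of the gsub() call above on a made-up ASCII vector (the name textcolumn simply mirrors the placeholder used in the answer):

```r
# Made-up example values for illustration.
textcolumn <- c("Hello, world! #R", "price: $5 (approx.)")

# Drop everything that is neither a word character nor whitespace.
cleaned <- gsub("[^\\w\\s]", "", textcolumn, perl = TRUE)
cleaned
```

How non-ASCII letters are treated by \w can depend on the encoding and locale, which is another reason to check Encoding() first.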

remove/replace specific words or phrases from character strings - R

I looked around both here and elsewhere, I found many similar questions but none which exactly answer mine. I need to clean up naming conventions, specifically replace/remove certain words and phrases from a specific column/variable, not the entire dataset. I am migrating from SPSS to R, I have an example of the code to do this in SPSS below, but I am not sure how to do it in R.
EG:
"Acadia Parish" --> "Acadia" (removes Parish and space before Parish)
"Fifth District" --> "Fifth" (removes District and space before District)
SPSS syntax:
COMPUTE county=REPLACE(county,' Parish','').
There are only a few instances of this issue in the column with 32,000 cases, and what needs replacing/removing varies and the cases can repeat (there are dozens of instances of a phrase containing 'Parish'), meaning it's much faster to code what needs to be removed/replaced, it's not as simple or clean as a regular expression to remove all spaces, all characters after a specific word or character, all special characters, etc. And it must include leading spaces.
I have looked at the replace() gsub() and other similar commands in R, but they all involve creating vectors, or it seems like they do. What I'd like is syntax that looks for characters I specify, which can include leading or trailing spaces, and replaces them with something I specify, which can include nothing at all, and if it does not find the specific characters, the case is unchanged.
Yes, I will end up repeating the same syntax many times, it's probably easier to create a vector but if possible I'd like to get the syntax I described, as there are other similar operations I need to do as well.
Thank you for looking.
> x <- c("Acadia Parish", "Fifth District")
> x2 <- gsub("^(\\w*).*$", "\\1", x)
> x2
[1] "Acadia" "Fifth"
Legend:
^ Start of the string.
() Capture group.
\w* Zero or more word characters.
.* Zero or more occurrences of any character.
$ End of the string.
\1 Backreference to the first capture group (used in the replacement).
Maybe I'm missing something, but I don't see why you can't simply use alternation (|) in your regex and then trim the leftover white space.
string <- c("Arcadia Parish", "Fifth District")
bad_words <- c("Parish", "District") # Write all the words you want removed here!
bad_regex <- paste(bad_words, collapse = "|")
trimws( sub(bad_regex, "", string) )
# [1] "Arcadia" "Fifth"
dataframename$varname <- gsub(" Parish","", dataframename$varname)
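For a closer analogue of the SPSS REPLACE call, one possible sketch uses gsub() with fixed = TRUE so that each phrase, including its leading space, is matched literally. The vector of phrases and the extra "Orleans" value are assumptions for illustration.

```r
county <- c("Acadia Parish", "Fifth District", "Orleans")

# Phrases to strip, leading space included (hypothetical list).
to_remove <- c(" Parish", " District")

# Apply each literal replacement in turn; values without a match stay unchanged.
for (phrase in to_remove) {
  county <- gsub(phrase, "", county, fixed = TRUE)
}
county
```

Assigning back to a single column, as in dataframename$varname above, leaves the rest of the dataset untouched.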

Nesting more than two types of quotes in R

I would like to know how to accommodate more than two types of quotes in the same string in R. Let's say that I want to print:
'first-quote-type1 "first-quote-type2 "second-quote-type2
'sencond-quote-type1
Using one quote in the beginning and one in the end we have:
print("'first-quote-type1 "first-quote-type2 "second-quote-type2 'sencond-quote-type1")
Error: unexpected symbol in "print("'first-quote-type1 "first"
I tried triple quotes, as Python requires in these cases:
print(''''first-quote-type1 "first-quote-type2 "second-quote-type2 'sencond-quote-type1''')
print("""'first-quote-type1 "first-quote-type2 "second-quote-type2 'sencond-quote-type1""")
However, I got a similar error. Any idea how to make this syntax work in R?
To use a quote within a quote, you can escape the quote character with a backslash:
print("the man said \"hello\"")
However, print() always displays the escaped form of such characters.
To show the string without the escapes, use cat() instead:
cat("the man said \"hello\"")
which prints
the man said "hello"
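Applied to the string from the question, a minimal sketch looks like this. Only the double quotes need escaping, because the string is delimited by double quotes; the single quotes can stay as they are.

```r
x <- "'first-quote-type1 \"first-quote-type2 \"second-quote-type2 'sencond-quote-type1"

print(x)       # shows the escaped form
cat(x, "\n")   # shows the actual contents

# Count each quote type in the stored string.
n_double <- nchar(gsub("[^\"]", "", x))
n_single <- nchar(gsub("[^']", "", x))
```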

Paste "25 \%" in R for further processing in LaTeX

I want a character variable in R that takes the value of, let's say, "a" and appends " \%", to produce a %-sign later in LaTeX.
Usually I'd do something like:
a <- 5
paste(a,"\%")
but this fails.
Error: '\%' is an unrecognized escape in character string starting "\%"
Any ideas? A workaround would be to define another command giving the %-sign in LaTeX, but I'd prefer a solution within R.
As in many other languages, certain characters in R strings take on a different meaning when they are escaped. One example is \n, which means newline instead of n. When you write \%, R tries to interpret it as an escape sequence and fails because no such sequence exists. Escape the backslash itself so that it is just a backslash:
paste(a, "\\%")
You can read about escape sequences in ?Quotes.
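A short sketch confirming that the doubled backslash stores a single backslash, which is what LaTeX will receive:

```r
a <- 5
out <- paste(a, "\\%")

print(out)       # displayed with the backslash escaped
cat(out, "\n")   # actual contents, as they would be written to a .tex file

n_chars <- nchar(out)   # counts the stored characters only
```

Use paste0(a, "\\%") instead if no space is wanted between the number and the percent sign.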
You can also look at the latexTranslate function from the Hmisc package, which escapes special characters in strings to make them LaTeX-compatible:
R> latexTranslate("You want to give me 100$ ? I agree 100% !")
[1] "You want to give me 100\\$ ? I agree 100\\% !"
