Read a string with single and double quotes - r

Just summertime curiosity about strings in R. Let us say that I have two strings, x and y. As we know, single quotes must be enclosed in double quotes and vice versa.
x <- "a string with 'single' quotes"
y <- 'another one with "double" quotes'
paste0(x, y)
[1] "a string with 'single' quotesanother one with \"double\" quotes"
cat(x, y)
a string with 'single' quotes another one with "double" quotes
What if a string contains both single and double quotes? I have tried the following.
Backticks do not work (R triggers an error):
z <- `a string with 'single' quotes and with "double" quotes`
Use a \" instead of " and then use cat:
This works well but the problem is that users must add a backslash to every double quote.
z1 <- "a string with 'single' quotes and with \"double\" quotes"
What if we have a huge text file (a .txt, for example) with both types of quotes and we want to read it into R?
At this point a (silly) solution seems to be: work outside R, do some manipulation (like substituting every " with \"), and then read the result into R.
Is this a reasonable solution, or is there a better way inside R?
Here is a small .txt file as an example. For anyone interested, the file is a single line with this text:
a string with 'single' quotes and with \"double\" quotes

You may specify any alternate quoting characters as desired when reading text, e.g.
> p<-scan(what="character",quote="`")
1: `It is 'ambiguous' if "this is a new 'string' or "nested" in the 'first'", isn't it?`
2:
Read 1 item
> p
[1] "It is 'ambiguous' if \"this is a new 'string' or \"nested\" in the 'first'\", isn't it?"
Or, just read raw text, e.g. with readline as suggested by @rawr:
> readline()
"It is 'ambiguous' if "this is a new 'string' or "nested" in the 'first'", isn't it?"
[1] "\"It is 'ambiguous' if \"this is a new 'string' or \"nested\" in the 'first'\", isn't it?\""
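The same point applies to text read from a file: the backslash in \" is only part of how R writes and prints string literals, not part of the string itself, so a file containing both kinds of quotes needs no preprocessing. A minimal sketch, assuming a temporary file stands in for the .txt file from the question; the raw-string syntax r"(...)" requires R 4.0.0 or later.

```r
# Raw string literal (R >= 4.0.0): no escaping needed for either quote type.
z <- r"(a string with 'single' quotes and with "double" quotes)"

# The escaped literal from the question denotes exactly the same string.
z1 <- "a string with 'single' quotes and with \"double\" quotes"
same_literal <- identical(z, z1)

# Write the text to a temporary file and read it back unchanged.
tmp <- tempfile(fileext = ".txt")
writeLines(z, tmp)
from_file <- readLines(tmp)
unlink(tmp)

# print() shows the escaped form, cat() shows the actual characters.
print(from_file)
cat(from_file, "\n")
```

readLines returns one element per line of the file, so a large file comes back as a character vector rather than a single string.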


negative-lookahead in gsub

In a recent scenario I wanted to extract the very last part of each URL in a vector of URLs.
E.g.
> urls <- c('https::abc/efg/hij/', 'https::abc/efg/hij/lmn/', 'https::abc/efg/hij/lmn/opr/')
> rs <- regexpr("([^/])*(?=/$)", urls, perl = TRUE)
> substr(urls, rs, rs + attr(rs, 'match.length'))
[1] "hij/" "lmn/" "opr/"
which is somewhat simple to read. But I'd like to understand how I could do something similar by inverting the lookahead expression, eg. remove the second to last '/' and anything preceding (assuming that the string always ends with '/'). I can't seem to get the exact logic straight,
> gsub('([^/]|[/])(?!([^/]*/)$)', '', urls, perl = TRUE)
[1] "/hij" "/lmn" "/opr"
Basically I'm looking for the regexp logic that would return the result in the first example, but using only a single gsub call.
To get a match only, you could still use the lookahead construct:
^.*/(?=[^/]*/$)
^ Start of the string
.*/ Match until the last /
(?= Positive lookahead, assert that what is on the right is
[^/]*/$ 0+ times any char except /, followed by / at the end of the string
) Close lookahead
For example
gsub('^.*/(?=[^/]*/$)', '', urls, perl = TRUE)
An option using a negative lookahead:
^.*/(?!$)
^ Start of string
.*/ Match greedily up to the last / that is not at the end of the string
(?!$) Negative lookahead, assert what is directly to the right is not the end of the string
The non-regex & very quick solution would be to use basename():
basename(urls)
[1] "hij" "lmn" "opr"
Or, for your case:
paste0(basename(urls), '/')
[1] "hij/" "lmn/" "opr/"
My preferred method is to replace the whole string with a part of it, like so:
gsub("^.*/([^/]+/)$", "\\1", urls)
The "\\1" refers to whatever was matched inside ().
So basically I am replacing the whole string with the last part of the URL.
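The four approaches above can be checked side by side. This is only a comparison sketch using the urls vector from the question; the variable names are made up for illustration.

```r
urls <- c('https::abc/efg/hij/', 'https::abc/efg/hij/lmn/', 'https::abc/efg/hij/lmn/opr/')

# Positive lookahead: drop everything up to the second-to-last "/".
by_lookahead <- gsub('^.*/(?=[^/]*/$)', '', urls, perl = TRUE)

# Negative lookahead: drop everything up to the last "/" not at the end.
by_neg_lookahead <- gsub('^.*/(?!$)', '', urls, perl = TRUE)

# No regex: basename() strips the trailing "/", so add it back.
by_basename <- paste0(basename(urls), '/')

# Backreference: keep only the captured last segment.
by_backref <- gsub("^.*/([^/]+/)$", "\\1", urls)

print(by_lookahead)
```

The lookaround patterns need perl = TRUE because R's default regex engine does not support lookaheads; the backreference version works with the default engine.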

R cannot input quotation mark using Rcpp

Double quotation marks are not recognized by Rcpp; the call fails with an "unexpected symbol" error.
The following is example code.
cppFunction("NumericVector attrs() {
NumericVector out = NumericVector::create(1,2,3);
out.names() = CharacterVector::create("xa","xb","xc");
return out;
}")
The quotation marks in "xa", "xb", and "xc" are the problem. The code was written using Microsoft Word and Notepad.
Try escaping the quotation marks out:
cppFunction("NumericVector attrs() {
NumericVector out = NumericVector::create(1,2,3);
out.names() = CharacterVector::create(\"xa\",\"xb\",\"xc\");
return out;
}")
To generalize, you cannot include a quotation mark inside an R string that is delimited by the same kind of quotation mark without escaping it. You can, however, use single quotation marks inside a double-quoted string, or vice versa:
s1 <- "the 'cat' on the roof"
s2 <- 'the "cat" on the roof'
The latter approach might be in fact an easier solution to your issue with cppFunction, but I'll keep my original answer here because it addressed the issue itself.
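To see that the two approaches give R the same text, the line of C++ from the question can be built both ways and compared without involving Rcpp at all. A small sketch; the variable names are illustrative only.

```r
# Double-quoted R string: every inner " must be escaped.
escaped <- "out.names() = CharacterVector::create(\"xa\",\"xb\",\"xc\");"

# Single-quoted R string: the inner " can stay as they are.
single <- 'out.names() = CharacterVector::create("xa","xb","xc");'

# Both literals denote the same sequence of characters.
same <- identical(escaped, single)

# cat() prints what would actually be handed to the C++ compiler.
cat(single, "\n")
```

The single-quoted form stops being convenient if the C++ code itself contains single quotes (character literals such as 'a'), in which case those would need escaping instead.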

Assign ' in string

I have the following value that needs to be assigned to a string:
ABC'DEFGH
How can I include the ' character in the string?
Example:
str := 'ABC'DEFGH'
It's the same as with plain SQL: to escape a single quote, double it.
str := 'ABC''DEFGH';
You could also use the quoted-string syntax q'<opening delimiter character><string><closing delimiter character>', e.g.:
str := q'{ABC'DEFGH}';
You can use a variety of characters as the quote delimiters. For more information, see the documentation on text literals, which covers the q operator.

Matching series of Ampersands in R?

I am unable to solve the question below and would appreciate help.
I have series of ampersands (&) in my data. I want to replace each pair of ampersands with some value, but for some reason I am unable to do it.
My attempt and example:
string1 <- "This aa should be replaced: but this aaa shouldn't"
string2 <- "This && should be replaced: but this &&& shouldn't"
gsub("aa", "XXX", string1) #1.
gsub("\\baa\\b", "XXX", string1) #2.
gsub("&&", "XXX", string2) #3.
gsub("\\b&&\\b", "XXX", string2) #4.
Above, if I want to match 'aa' in string1, I have two approaches.
Approach 1 (marked #1) simply passes 'aa', but this also partially matches 'aaa', which I don't want: the regex should match only an exact pair of 'a', i.e. 'aa'.
To solve this I use the regex in #2, which works fine.
In string2 I expected similar behavior: instead of a pair of 'a', I want to match a pair of '&', but this does not work.
Attempt #3 matches, but it is not the result I want because it also partially matches '&&&'.
Attempt #4 does not match for some reason and does not replace anything.
My questions are:
1) Why do pairs of ampersands not work with word boundaries?
2) What is a way to solve this problem?
I spent my entire day on this and searched on Google without success.
If someone knows of an existing post, please redirect me to it, or if this is a duplicate please let me know and I will remove it.
Thanks for reading the question and for your help.
EDIT: My word boundary is space for now.
Outputs:
> gsub("aa", "XXX", string1)
[1] "This XXX should be replaced: but this XXXa shouldn't"
> gsub("\\baa\\b", "XXX", string1)
[1] "This XXX should be replaced: but this aaa shouldn't"
>
> gsub("&&", "XXX", string2)
[1] "This XXX should be replaced: but this XXX& shouldn't"
> gsub("\\b&&\\b", "XXX", string2)
[1] "This && should be replaced: but this &&& shouldn't"
>
Note: I have also checked with perl=TRUE, but it does not work either.
The \b word boundary matches at three kinds of positions:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
The "\\b&&\\b" pattern therefore matches && only when it is enclosed with word chars (letters, digits or _). In string2 each && is surrounded by spaces or other non-word chars, so there is no word boundary next to it and nothing matches.
To match whitespace boundaries, you may use
gsub("(?<!\\S)&&(?!\\S)", "XXX", string2, perl=TRUE)
The pattern matches
(?<!\\S) - a location not immediately preceded with a non-whitespace char (that is, there must be start of string or a whitespace char immediately to the left of the current location)
&& - a literal substring
(?!\\S) - a location not immediately followed with a non-whitespace char (that is, there must be end of string or a whitespace char immediately to the right of the current location).
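The difference between the two kinds of boundary can be seen by running both patterns on string2 and on a couple of extra inputs. A short sketch; the inputs "a&&b" and "&& x &&" are made up for illustration and are not from the question.

```r
string2 <- "This && should be replaced: but this &&& shouldn't"

# Whitespace boundaries: only the standalone pair is replaced.
ws <- gsub("(?<!\\S)&&(?!\\S)", "XXX", string2, perl = TRUE)

# Word boundaries: nothing matches, since & and space are both non-word chars.
wb <- gsub("\\b&&\\b", "XXX", string2, perl = TRUE)

# \b does match when && sits between word characters.
wb_enclosed <- gsub("\\b&&\\b", "XXX", "a&&b", perl = TRUE)

# The lookarounds also accept the start and end of the string.
ws_edges <- gsub("(?<!\\S)&&(?!\\S)", "XXX", "&& x &&", perl = TRUE)

print(ws)
```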
More specific, but you could use a 2-step function like so
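replace2steps <- function(mystring, toreplace, replacement, toexclude, intermediate) {
  # Temporarily mask the substring that must be left alone
  intermstring <- gsub(toexclude, intermediate, mystring)
  # Replace the target in the masked string
  result <- gsub(toreplace, replacement, intermstring)
  # Restore the masked substring
  result <- gsub(intermediate, toexclude, result)
  return(result)
}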
replace2steps(string2, "&&", "XX", "&&&", "%%%")
[1] "This XX should be replaced: but this &&& shouldn't"
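If the real requirement is "exactly two ampersands, whatever surrounds them", a single call can do the job by asserting that no further & sits on either side. This is an alternative sketch that is not taken from the answers above, and it differs from the whitespace-boundary pattern in that it also replaces && inside a word such as "a&&b".

```r
string2 <- "This && should be replaced: but this &&& shouldn't"

# Match && only when it is neither preceded nor followed by another &.
pair_only <- "(?<!&)&&(?!&)"

one_step <- gsub(pair_only, "XX", string2, perl = TRUE)
inside_word <- gsub(pair_only, "XX", "a&&b", perl = TRUE)

print(one_step)
```

Unlike the 2-step function, this needs no intermediate placeholder, so there is no risk of the placeholder already occurring in the data.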

How to replace special characters using regex

Using Asp.net for regex.
I've written an extension method that I want to use to replace whole words - a word might also be a single special character like '&'.
In this case I want to replace '&' with 'and', and I'll need to use the same technique to reverse it back from 'and' to '&', so it must work for whole words only and not extended words like 'hand'.
I've tried a few variations for the regex pattern - started with '\bWORD\b' which didn't work at all for the ampersand, and now have '\sWORD\s' which almost works except that it also removes the spaces around the word, meaning that a phrase like "health & beauty" ends up as "healthandbeauty".
Any help appreciated.
Here's the extension method:
public static string ReplaceWord(this string @this,
                                 string wordToFind,
                                 string replacement,
                                 RegexOptions regexOptions = RegexOptions.None)
{
    Guard.String.NotEmpty(() => @this);
    Guard.String.NotEmpty(() => wordToFind);
    Guard.String.NotEmpty(() => replacement);
    var pattern = string.Format(@"\s{0}\s", wordToFind);
    return Regex.Replace(@this, pattern, replacement, regexOptions);
}
In order to match a dynamic string that should be enclosed with whitespace (or be located at the start or end of the string), you can use negative lookarounds:
var pattern = string.Format(@"(?<!\S){0}(?!\S)", wordToFind);
                              ^^^^^^^   ^^^^^^
or, even safer:
var pattern = string.Format(@"(?<!\S){0}(?!\S)", Regex.Escape(wordToFind));
                                                 ^^^^^^^^^^^^^
The (?<!\S) lookbehind fails the match if the word is preceded by a non-whitespace character, and the (?!\S) lookahead fails the match if the word is followed by a non-whitespace character.
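The same idea can be tried out in R, the language used in the other examples here. This is a rough sketch only: escape_regex and replace_word are hypothetical helper names, and escape_regex is a hand-written stand-in for .NET's Regex.Escape rather than an exact equivalent of it.

```r
# Hypothetical helper: backslash-escape common regex metacharacters.
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical helper: replace `word` only when it is delimited by
# whitespace or by the start/end of the string.
# Assumes `replacement` contains no backslashes.
replace_word <- function(text, word, replacement) {
  pattern <- paste0("(?<!\\S)", escape_regex(word), "(?!\\S)")
  gsub(pattern, replacement, text, perl = TRUE)
}

forward <- replace_word("health & beauty", "&", "and")
backward <- replace_word("hand and foot", "and", "&")
print(forward)
print(backward)
```

Because the lookarounds do not consume the surrounding spaces, "health & beauty" keeps its spacing, and "hand" is left alone when reversing the replacement.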
