Malformed string from character string

I have a simple character string:
y <- "Location 433900E 387200N, Lat 53.381 Lon -1.490, 131 metres amsl"
When I perform regex capture on it:
stringr::str_extract(r'Lat(.*?)\,', y)
I get this error:
Error: malformed raw string literal at line 1
Why?

With R's raw strings (introduced in version 4.0.0), the text must be wrapped in a delimiter pair, either (), [] or {}, inside the quotes, e.g.,
r'{Lat(.*?)\,}'
This is documented at ?Quotes (and in the release notes):
Raw character constants are also available using a syntax similar to the one used in C++: r"(...)" with ... any character sequence, except that it must not contain the closing sequence )". The delimiter pairs [] and {} can also be used, and R can be used in place of r.
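
As a minimal sketch of the corrected call for the string in the question: the example below uses base R's regexpr() and regmatches() with perl = TRUE instead of stringr, purely to stay dependency-free (that swap is an assumption of this sketch, not part of the answer above). Note also that stringr::str_extract() takes the string first and the pattern second, so the arguments in the question would need swapping as well.

```r
y <- "Location 433900E 387200N, Lat 53.381 Lon -1.490, 131 metres amsl"

# Raw string: the pattern sits inside a delimiter pair, here ( ... ),
# so the backslash does not need doubling.
pattern <- r"(Lat(.*?)\,)"

# The raw literal is the same string as the conventionally escaped one.
same <- identical(pattern, "Lat(.*?)\\,")

# Extract the first match of the pattern in y.
hit <- regmatches(y, regexpr(pattern, y, perl = TRUE))
print(hit)
```

The lazy group stops at the first comma after "Lat", so the match also covers the longitude. With stringr loaded, the equivalent call would be stringr::str_extract(y, pattern).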

Related

Does R 4.0.0 make it possible to define foo"(...)" operators, similar to the newly introduced r"(...)" syntax?

R 4.0.0 brings in a new syntax for raw strings:
r"(raw string here can contain anything except the closing sequence)"
But this same construct in R 3.x.x produced a syntax error:
Error: unexpected string constant in "r"(asdasd)""
Does this mean that the interpreter was changed in R 4.0.0?
And if so, does R 4.0.0 provide a mechanism to define custom functions like foo"()"?

No, that's not possible at the moment (nor would I anticipate it becoming possible anytime soon).
Here's the NEWS item:
There is a new syntax for specifying raw character constants similar to the one used in C++: r"(...)" with ... any character sequence not containing the sequence )". This makes it easier to write strings that contain backslashes or both single and double quotes. For more details see ?Quotes.
https://cran.r-project.org/doc/manuals/r-devel/NEWS.html
Then from ?Quotes:
Raw character constants are also available using a syntax similar to
the one used in C++: r"(...)" with ... any character
sequence, except that it must not contain the closing sequence
)". The delimiter pairs [] and {} can also be
used, and R can be used in place of r. For additional
flexibility, a number of dashes can be placed between the opening quote
and the opening delimiter, as long as the same number of dashes appear
between the closing delimiter and the closing quote.
https://github.com/wch/r-source/blob/trunk/src/library/base/man/Quotes.Rd
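
The delimiter and dash rules quoted above can be illustrated with a short sketch. The variable names and sample strings are made up for illustration; running it requires R 4.0.0 or later.

```r
# Parentheses, brackets, and braces are all accepted as delimiters.
paren   <- r"(C:\Users\me)"
bracket <- r"[C:\Users\me]"
brace   <- R"{C:\Users\me}"   # upper-case R works too

all_same <- identical(paren, bracket) && identical(bracket, brace)

# Matching dashes on both sides let the text contain a closing
# parenthesis followed directly by a double quote.
dashed <- r"---(say )" here)---"

print(paren)
cat(dashed, "\n")
```

Each of the first three literals is the same string as "C:\\Users\\me" written with conventional escapes.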
Here's the (git mirror of the SVN patch of the) commit where this functionality was added:
https://github.com/wch/r-source/commit/8b0e58041120ddd56cd3bb0442ebc00a3ab67ebc

Regular expression for excluding some specific characters

I am trying to build a regular expression in Qt for the following set of strings:
The set contains all strings of length 1 other than r and z.
The set also contains strings of length greater than 1 that start with z, followed by any number of further z's, and end with a single character that is neither r nor z.
So far I have developed the following:
[a-qs-y]?|z+[a-qs-y]
But it does not work.

The question mark in your regular expression makes the first alternative match either a single lowercase letter other than r and z, or the empty string. Because the empty string can be matched at the start of any input, the first alternative always succeeds and the second alternative is never tried. The rest of your regular expression matches your specification, although you will probably want it to match only entire strings by anchoring it:
QRegularExpression re("^[a-qs-y]$|^z+[a-qs-y]$");
QRegularExpressionMatch match = re.match("zzza");
if (match.hasMatch()) {
    QString matched = match.captured(0);
    // ...
}
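
The behaviour described above can be checked outside Qt. The sketch below uses R's PCRE engine (perl = TRUE) as a stand-in for QRegularExpression, on the assumption that both treat these simple patterns the same way; the test inputs are made up for illustration.

```r
unanchored <- "[a-qs-y]?|z+[a-qs-y]"
anchored   <- "^[a-qs-y]$|^z+[a-qs-y]$"

# On "zzza" the optional first alternative matches the empty string at
# position 1, so the search stops there with a zero-length match.
m <- regexpr(unanchored, "zzza", perl = TRUE)
first_pos <- as.integer(m)
first_len <- attr(m, "match.length")

# The anchored pattern accepts or rejects each whole string.
inputs <- c("a", "r", "z", "zzza", "zzzr", "ab")
accepted <- grepl(anchored, inputs, perl = TRUE)
print(setNames(accepted, inputs))
```

Only "a" and "zzza" are accepted; the other inputs break either the length rule or the exclusion of r and z.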

What are the default separators for string interpolation?

It seems that ",", "$" and "/" all act as separators, but "_" does not.
x = "1"
"$x,x", "$x$x", "$x/1", "$x_1"
Is this documented anywhere?

I believe this is because x_1 is a valid variable name in Julia, so it is trying to insert the value of that variable into the string.

The doc says:
The shortest complete expression after the $ is taken as the expression whose value is to be interpolated into the string
The internal workings are explained in GitHub issue #455, which can be summarised as:
The way string interpolation works is actually entirely defined in Julia. What happens is that the parser (in FemtoLisp) scans the code and finds a string literal, delimited by double quotes. If it finds no unescaped $ in the string, it just creates a string literal itself — ASCIIString or UTF8String depending on the content of the string. On the other hand, if the string has an unescaped $, it punts and hands the interpretation of the string literal to the str julia macro, which generates an expression that constructs the desired strings by concatenating string literals and interpolated values. This is a nice elegant scheme that lets the parser not worry about stuff like interpolation.
My guess is that #\, #\) #\] #\} and #\; (that is, the characters , ) ] } and ;) are closing tokens for expressions, and that $ marks the start of the next interpolation.

Are double "" and single '' quotes (always) interchangeable in R?

This is perhaps rather a minor question...
but just a moment ago I was looking through some code I had written and noticed that I tend to just use ="something" and ='something_else' completely interchangeably, often in the same function.
So my question is: Is there R code in which using one or other (single or double quotes) has different behaviour? Or are they totally synonymous?

According to http://stat.ethz.ch/R-manual/R-patched/library/base/html/Quotes.html, "[s]ingle and double quotes delimit character constants. They can be used interchangeably but double quotes are preferred (and character constants are printed using double quotes), so single quotes are normally only used to delimit character constants containing double quotes."
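
A small sketch of what that interchangeability means in practice (the variable names and sample strings are made up for illustration):

```r
# The quote style is not part of the value.
same <- identical('text', "text")

# Single quotes are handy when the text itself contains double quotes...
with_double <- 'She said "hi"'
# ...and the result equals the escaped double-quoted form.
escaped <- "She said \"hi\""

# An apostrophe sits naturally inside double quotes.
with_single <- "it's"

print(with_double)
```

print() shows the value delimited by double quotes with the inner quotes escaped, whichever style was used to write it.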

Out of curiosity, there is a further explanation on the R-help mailing list of why double quotes are preferred in R:
To avoid confusion for those who are accustomed to programming in the
C family of languages (C, C++, Java), where there is a difference in
the meaning of single quotes and double quotes.
A C programmer reads 'a' as a single character and "a" as a character
string consisting of the letter 'a' followed by a null character to
terminate the string.
In R there is no character data type, there are
only character strings. For consistency with other languages it helps
if character strings are delimited by double quotes. The single quote
version in R is for convenience.
(Since) On most keyboards you don't need to
use the shift key to type a single quote but you do need the shift for
a double quote.

> print(""hi"")
Error: unexpected symbol in "print(""hi"
> print("'hi'")
[1] "'hi'"
> print("hi")
[1] "hi"

Encoding in R like Python ("ord" and "chr")

I was wondering how to do encoding and decoding in R. In Python, we can use ord('a') and chr(97) to transform a letter to number or transform a number to a letter. Do you know any similar functions in R? Thank you!
For example, in python
>>> ord("a")
97
>>> ord("A")
65
>>> chr(97)
'a'
>>> chr(90)
'Z'
FYI:
ord(c) in Python
Given a string of length one, return an integer representing the Unicode code point of the character when the argument is a unicode object, or the value of the byte when the argument is an 8-bit string. For example, ord('a') returns the integer 97, ord(u'\u2020') returns 8224. This is the inverse of chr() for 8-bit strings and of unichr() for unicode objects. If a unicode argument is given and Python was built with UCS2 Unicode, then the character’s code point must be in the range [0..65535] inclusive; otherwise the string length is two, and a TypeError will be raised.
chr(i) in Python
Return a string of one character whose ASCII code is the integer i. For example, chr(97) returns the string 'a'. This is the inverse of ord(). The argument must be in the range [0..255], inclusive; ValueError will be raised if i is outside that range. See also unichr().

You're looking for utf8ToInt and intToUtf8:
utf8ToInt("a")
[1] 97
intToUtf8(97)
[1] "a"
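
Unlike Python's ord and chr, these functions also handle more than one character at a time. A short sketch, with sample strings chosen only for illustration:

```r
# One string in, one integer code point per character out.
codes <- utf8ToInt("Hi!")

# Code points back to a single string...
joined <- intToUtf8(codes)
# ...or to one string per code point.
pieces <- intToUtf8(codes, multiple = TRUE)

# utf8ToInt() takes a single string, so use sapply() for a vector.
ords <- sapply(c("a", "A"), utf8ToInt)

print(codes)
print(pieces)
print(ords)
```

Here ords is a named integer vector, with the input characters as names.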
