What constitutes a valid symbol (identifier) in R

I can’t find a spec of the language…
Note that I want a correct answer (e.g. like this), as I could easily come up with a simple but likely wrong approximation myself, such as [[:alpha:]._][\w._]*

The documentation for make.names() says
A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ".2way" are not valid, and neither are the reserved words.
The definition of a letter depends on the current locale, but only ASCII digits are considered to be digits.
@Roland points out this section of the R Language Definition:
10.3.2 Identifiers
Identifiers consist of a sequence of letters, digits, the period (‘.’) and the underscore. They must not start with a digit or an underscore, or with a period followed by a digit.
The definition of a letter depends on the current locale: the precise set of characters allowed is given by the C expression (isalnum(c) || c == ‘.’ || c == ‘_’) and will include accented letters in many Western European locales.
Notice that identifiers starting with a period are not by default listed by the ls function and that ‘...’ and ‘..1’, ‘..2’, etc. are special.
Notice also that objects can have names that are not identifiers. These are generally accessed via get and assign, although they can also be represented by text strings in some limited circumstances when there is no ambiguity (e.g. "x" <- 1). As get and assign are not restricted to names that are identifiers they do not recognise subscripting operators or replacement functions.
The rules seem to allow "Morse coding":
> .__ <- 1
> ._._. <- 2
> .__ + ._._.
[1] 3
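To make the rules above concrete, here is a rough Python sketch of a checker for R syntactic names. It is an approximation under stated assumptions: only ASCII letters count as letters (R actually uses the current locale), the reserved-word set is hard-coded from R's ?Reserved help page, and "..." / "..1", "..2", ... are rejected because the definition calls them special. The function name is_syntactic_name is hypothetical.

```python
import re

# Reserved words as listed on R's ?Reserved help page (hard-coded assumption).
R_RESERVED = frozenset({
    "if", "else", "repeat", "while", "function", "for", "in", "next", "break",
    "TRUE", "FALSE", "NULL", "Inf", "NaN", "NA",
    "NA_integer_", "NA_real_", "NA_complex_", "NA_character_",
})

# ASCII-only approximation of "letter"; R itself uses the current locale.
# Start: a letter, or a dot that is NOT followed by a digit.
# Rest: letters, digits, dots, underscores.
SYNTACTIC = re.compile(r"(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*")

# "..." and "..1", "..2", ... are special, so treat them as unusable names.
SPECIAL_DOTS = re.compile(r"\.\.\.|\.\.[0-9]+")


def is_syntactic_name(name: str) -> bool:
    """Hypothetical helper approximating R's rule for syntactic names."""
    if SYNTACTIC.fullmatch(name) is None:
        return False
    if name in R_RESERVED or SPECIAL_DOTS.fullmatch(name):
        return False
    return True


for candidate in [".__", "._._.", "x_1", "2011_Q4", ".2way", "_x", "if", "..1"]:
    print(candidate, is_syntactic_name(candidate))
```

The "Morse coding" names from the example pass, while names starting with a digit, an underscore, or a dot followed by a digit are rejected, as are reserved words.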

Related

What is the underlying logic when comparing strings?

What logic does R use to produce FALSE in the logical operation on character strings below? Is it just comparing the letter S with the letter T instead of the entire string?
"Sachin" > "Tendulkar"
Output: FALSE
This is in the documentation. ?">" gives:
Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use
In other words, this is just a regular dictionary-style comparison. Things can get very complicated (or weird) depending on the locale, e.g. how non-alphabetic, accented, or upper- vs. lower-case characters are handled, but this case looks straightforward. "S" comes before "T" in every locale I can imagine, so "S" < "T"; in a lexicographic comparison, the first differing character determines the order (later characters only break ties when all earlier ones are equal).
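A minimal Python sketch of the character-by-character rule just described. It assumes a plain code-point collating sequence, which corresponds to R's "C" locale only; R in other locales (such as en_US) applies that locale's collation instead. The function name lexicographic_gt is hypothetical.

```python
def lexicographic_gt(a: str, b: str) -> bool:
    """Return True if a > b under a plain code-point collating sequence.

    This mirrors R's "C" locale only; other locales (e.g. en_US) use their
    own collation rules and can order characters differently.
    """
    for ca, cb in zip(a, b):
        if ca != cb:
            # The first differing character decides; later ones are ignored.
            return ca > cb
    # One string is a prefix of the other: the longer one is greater.
    return len(a) > len(b)


print(lexicographic_gt("Sachin", "Tendulkar"))  # False: "S" < "T" decides
print(lexicographic_gt("Tendulkar", "Sachin"))  # True
```

Only the first position where the strings differ matters here, so "achin" and "endulkar" never get compared.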

Convert string to variable name in R

I have spent hours looking for a proper solution but found nothing on the Internet. Here is my question. In R, I have a character vector containing my desired variable names ("2011_Q4", "2012_Q1", ...). When I try to assign a dataset to each of these names in a loop, it works, but the output is strange. Indeed, I get
> View(`2011_Q4`)
instead of
> View(2011_Q4)
I don't know how to remove this apostrophe. It's very annoying, since I have to type this ` every time I want to refer to the variable.
Can somebody help me? I would appreciate the help.
Thanks a lot and best regards
Firstly, it's a backtick (`), not an apostrophe ('). In R, backticks occasionally denote variable names; apostrophes work as single quotes for denoting strings.
The issue you're having is that your variable names start with a digit, which is not allowed for syntactic names in R. Since you created them anyway (e.g. via assign(), which is not restricted to syntactic names), you need backticks to tell R to treat 2011_Q4 as a variable name rather than as a number.
From ?Quotes:
Names and Identifiers
Identifiers consist of a sequence of letters, digits, the period (.) and the underscore. They must not start with a digit nor underscore, nor with a period followed by a digit. Reserved words are not valid identifiers.
The definition of a letter depends on the current locale, but only ASCII digits are considered to be digits.
Such identifiers are also known as syntactic names and may be used directly in R code. Almost always, other names can be used provided they are quoted. The preferred quote is the backtick (`), and deparse will normally use it, but under many circumstances single or double quotes can be used (as a character constant will often be converted to a name). One place where backticks may be essential is to delimit variable names in formulae: see formula.
The best solution to your issue is simply to rename your variables so that they start with a letter, e.g. Y2011_Q4.
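If there are many such names, the renaming can be done mechanically. Below is a hypothetical, simplified Python analogue of what R's make.names() does for a single name; R's documentation says the character "X" is prepended if necessary, and the prefix is a parameter here so the Y2011_Q4 style also works. Assumptions: only ASCII letters are kept (R uses the locale), and reserved words and uniqueness are not handled, unlike the real make.names().

```python
import re


def make_syntactic(name: str, prefix: str = "X") -> str:
    """Hypothetical, simplified analogue of R's make.names() for one name.

    ASCII-only: any character other than A-Z, a-z, 0-9, "." or "_" becomes ".".
    `prefix` is prepended when the result does not start with a letter or with
    a dot that is not followed by a digit. Reserved words and uniqueness are
    NOT handled, unlike the real make.names().
    """
    cleaned = re.sub(r"[^A-Za-z0-9._]", ".", name)
    if re.match(r"[A-Za-z]|\.(?![0-9])", cleaned) is None:
        cleaned = prefix + cleaned
    return cleaned


print(make_syntactic("2011_Q4"))              # X2011_Q4
print(make_syntactic("2011_Q4", prefix="Y"))  # Y2011_Q4
```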

Sorting Algorithm in R

I had a question related to the sorting algorithm in R.
If I use order() to sort a particular column, the shorter string is not sorted first.
For example, when I sort a column of character type, it puts firearm_weight above fire_weigh, which is not how strings are sorted in a dictionary anyway.
How can I change this while using the order() command?
Thanks!
"_" < "a" is TRUE on my system and locale.
help("Comparison") is relevant here:
Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: [...] Collation of non-letters (spaces, punctuation signs, hyphens, fractions and so on) is even more problematic.
You could replace "_" with a character that sorts after "z" in your locale; "µ" does on my system, for example.
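A small Python illustration of how the collating sequence, rather than string length, drives the order. It assumes plain code-point ordering, which is analogous to R's "C" locale and not to en_US or other locales. Under code points, "_" (U+005F) sorts before "a" (U+0061), so fire_weigh comes first; sorting on a key where "_" is replaced by "µ" (U+00B5, above "z" at U+007A) reverses that, without modifying the original strings.

```python
names = ["firearm_weight", "fire_weigh"]

# Code-point order, analogous to R's "C" locale: "_" (U+005F) < "a" (U+0061).
by_code_point = sorted(names)

# Same idea as substituting "_" with "µ" (U+00B5, which is above "z", U+007A):
# sort on a key where "_" is replaced, leaving the original strings unchanged.
by_substituted = sorted(names, key=lambda s: s.replace("_", "\u00b5"))

print(by_code_point)   # ['fire_weigh', 'firearm_weight']
print(by_substituted)  # ['firearm_weight', 'fire_weigh']
```

In both cases the decision is made at the fifth character ("_" vs "a"); the lengths of the strings never come into play.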

Combining 2 regular expressions in web.config passwordStrengthRegularExpression=""

I am not able to combine the two regular expressions below. Password requirements:
Password cannot contain your username or parts of your full name exceeding two consecutive characters
Passwords must be at least 6 characters in length
Passwords must contain characters from three of the following categories
Uppercase characters (English A-Z)
Lowercase characters (English a-z)
Base 10 digits (0-9)
Non-alphabetic characters (e.g., !, @, #, $, %, etc.)
Expression:
passwordStrengthRegularExpression="((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})"
Passwords cannot contain the word “Test” or “test” or variants of the word
passwordStrengthRegularExpression="((?=.*\"^((?!Test|test|TEST).*)$"
Both are working fine individually.
Because your second regexp primarily uses a negative lookahead, you can remodel that slightly and stick it right at the beginning of the other expression. First, I'm going to change your second regex to:
"(?!.*(?:Test|test|TEST))"
In English: the string must not contain any number of characters (including zero) followed by "Test", "test", or "TEST"; in other words, none of those words may appear anywhere in it.
Then, I'm going to stick that right at the beginning of your other expression, anchoring the whole thing with ^ and $:
passwordStrengthRegularExpression="^(?!.*(?:Test|test|TEST))(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20}$"
Finally, I'm going to show you how to make only one part of a regex case-insensitive. This may or may not be supported depending on what program this is actually for.
passwordStrengthRegularExpression="^(?!.*(?i:test))(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20}$"
See the (?i:...)? That means that the flags between the ? and the : are applied only to that part of the expression, that is, only that area is case-insensitive.
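A quick way to sanity-check the combined pattern is to run it through a regex engine. The sketch below uses Python's re (3.6+ for the scoped (?i:...) flag) as a stand-in for the .NET engine ASP.NET uses, so treat the results as indicative only. It also assumes [@#$%] is the intended special-character class, since "##" looks like a garbled "@#". The candidate passwords are made up for illustration.

```python
import re

# Combined pattern, assuming [@#$%] as the special-character class.
# The scoped flag (?i:test) makes only that group case-insensitive.
COMBINED = re.compile(
    r"^(?!.*(?i:test))(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20}$"
)

for candidate in ["Abc1@xyz", "Abc1@TeSt", "abc1@xyz", "Ab1@"]:
    print(candidate, COMBINED.match(candidate) is not None)
```

Only the first candidate passes: the second contains "TeSt" (caught case-insensitively), the third lacks an uppercase letter, and the fourth is shorter than 6 characters.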
Combining your requirements and https://stackoverflow.com/a/2860380/156388, I've come up with this:
(?=^[^\s]{6,}$)(?!.*(?i:test))((?=.*?\d)(?=.*?[A-Z])(?=.*?[a-z])|(?=.*?\d)(?=.*?[^\w\d\s])(?=.*?[a-z])|(?=.*?[^\w\d\s])(?=.*?[A-Z])(?=.*?[a-z])|(?=.*?\d)(?=.*?[A-Z])(?=.*?[^\w\d\s]))^.*
I don't think your first regex actually works if you want to meet the requirements in the bullets above it. It clamps the length to 20 characters, but the requirements don't say you have to; it requires all four categories, but the requirements say three of the four; and it doesn't check the username requirement at all. So I've gutted most of the initial regex.
It matches these (as expected):
Short5
TeSamplePrd6
TEBREaKST6
WinningUser6#
It fails on these (as expected):
SamplePassword
TestUser6#
Shrt5
TeSTTest
Remaining problems
For some reason it matches this:
TEBREKST6
but it only meets two of the four requirements (plus the minimum length); I'm not sure why.
Nothing here takes the "Password cannot contain your username or parts of your full name exceeding two consecutive characters" requirement into account, and I'm not sure you can even enforce it through the web.config password-strength setting, since the username isn't available within the regex.
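One way to probe the TEBREKST6 puzzle is to run the answer's pattern, verbatim, against its own examples. The sketch uses Python's re, not .NET, so it only approximates the original environment. In this engine the pattern does not match TEBREKST6 case-sensitively, but it does when case-insensitive matching is applied to the whole pattern, because [a-z] then also matches uppercase letters. Whether the original tester was matching case-insensitively is purely an assumption.

```python
import re

# The answer's pattern, split only for readability (the pieces concatenate).
PATTERN = (
    r"(?=^[^\s]{6,}$)(?!.*(?i:test))"
    r"((?=.*?\d)(?=.*?[A-Z])(?=.*?[a-z])"
    r"|(?=.*?\d)(?=.*?[^\w\d\s])(?=.*?[a-z])"
    r"|(?=.*?[^\w\d\s])(?=.*?[A-Z])(?=.*?[a-z])"
    r"|(?=.*?\d)(?=.*?[A-Z])(?=.*?[^\w\d\s]))^.*"
)

case_sensitive = re.compile(PATTERN)
# Assumption being tested: case-insensitive matching applied to the whole pattern.
case_insensitive = re.compile(PATTERN, re.IGNORECASE)

samples = ["Short5", "TeSamplePrd6", "TEBREaKST6", "WinningUser6#",
           "SamplePassword", "TestUser6#", "Shrt5", "TeSTTest", "TEBREKST6"]

for s in samples:
    print(f"{s:15} case-sensitive={case_sensitive.match(s) is not None} "
          f"ignore-case={case_insensitive.match(s) is not None}")
```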

Alphanumeric RegEx validation

What is the regular expression validation for only letters and numbers in ASP.NET?
The first two characters should be letters; after that, it can accept hyphens (-), spaces ( ), and apostrophes (').
I tried
^[A-Z a-z\s-'\s]{2,25}$
this is not working.
If I understood what you want, this should work:
^[a-zA-Z]{2}[-\040']*$
This will match two letters followed by any number of hyphens, spaces, or apostrophes. It will match the following strings:
ab --
xy'
zz
But it will not match these:
12
'ab
x-
NOTE: This will not limit the length of the matched string (as your original one did). If that's important, replace the * with {0,23} (two letters plus at most 23 more characters gives the original limit of 25).
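A quick check of the pattern with an explicit length cap, using Python's re as a stand-in for the ASP.NET engine (an assumption; the quantifier and octal-escape behaviour used here is common to both, but test in your own validator). The cap {0,23} after two letters gives a total length of 2 to 25.

```python
import re

# Two ASCII letters, then at most 23 hyphens, spaces (\040 is an octal escape
# for a space) or apostrophes, giving a total length of 2 to 25.
NAME = re.compile(r"^[a-zA-Z]{2}[-\040']{0,23}$")

for s in ["ab --", "xy'", "zz", "ab" + "-" * 23, "12", "'ab", "x-", "ab" + "-" * 24]:
    print(repr(s), NAME.match(s) is not None)
```

The first four strings match and the last four do not; the final one fails only because it is 26 characters long.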
