Variable name restrictions in R

What are the restrictions as to what characters (and maybe other restrictions) can be used for a variable name in R?
(This screams of general reference, but I can't seem to find the answer)

You might be looking for the discussion from ?make.names:
A syntactically valid name consists of letters, numbers and the dot or
underline characters and starts with a letter or the dot not followed
by a number. Names such as ".2way" are not valid, and neither are the
reserved words.
In the help file itself, there's a link to a list of reserved words, which are:
if else repeat while function for in next break
TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_ NA_complex_
NA_character_
Many other good notes from the comments include the point by James to the R FAQ addressing this issue and Josh's pointer to a related SO question dealing with checking for syntactically valid names.
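As a rough illustration of the rule quoted above, the sketch below compares a few made-up candidate names with what make.names() returns. A name is treated as syntactically valid when make.names() leaves it unchanged. The candidate strings are illustrative only.

```r
# Hypothetical candidate names, chosen to exercise the rule from ?make.names
candidates <- c("x", ".x", ".2way", "2way", "my_var", "if", "my var")

# make.names() rewrites anything that is not a syntactic name,
# so an unchanged result means the name was already valid
is_syntactic <- make.names(candidates) == candidates

print(data.frame(name = candidates, syntactic = is_syntactic))
```

Reserved words such as "if" are reported as invalid because make.names() appends a dot to them.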

Almost NONE! You can use 'assign' to make ridiculous variable names:
assign("1",99)
ls()
# [1] "1"
Yes, that's a variable called '1'. Digit 1. Luckily it doesn't change the value of integer 1, and you have to work slightly harder to get its value:
1
# [1] 1
get("1")
# [1] 99
The "syntactic restrictions" some people mention are imposed purely by the parser. Fundamentally, there's very little you can't call an R object. You just can't write such a name bare on the left of '<-'; you need assign() (or backtick quoting) to create it, and get() (or backticks again) to read it back. "get" will set you free :)
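Backtick quoting is another route to the same result, and it works with the ordinary assignment operator. A minimal sketch, using made-up names:

```r
# Backticks let the parser accept a non-syntactic name directly
`1` <- 99
`my var` <- "spaces are fine too"

# The literal 1 is untouched; the variable needs backticks or get()
total <- 1 + `1`
print(total)
print(get("my var"))
```

Names like these are legal but awkward to work with, so they are best kept for cases such as data frame columns imported from elsewhere.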

This may not answer your question directly, but it helps in practice.
Use exists() to check whether a name is already taken, so that you can avoid reusing the names of built-in objects for your own variables or functions.
Example...
> exists('for')
[1] TRUE
> exists('myvariable')
[1] FALSE
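Note that exists() reports whether a name is already bound, not whether it is a legal name. The sketch below puts the two checks side by side; name_status() is a hypothetical helper written for this illustration, not a base R function.

```r
# Hypothetical helper: is the name already bound, and is it syntactically valid?
name_status <- function(name) {
  c(
    exists = exists(name, envir = globalenv()),
    syntactic = make.names(name) == name
  )
}

print(name_status("for"))         # bound in base, but a reserved word
print(name_status("c"))           # bound in base, yet a legal variable name
print(name_status("myvariable"))  # legal; unbound unless you defined it
```

A name such as "c" exists and can still be assigned to; doing so only masks the base function for variable lookups.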

Using the make.names() function from the built-in base package may help:
is_valid_name <- function(x)
{
  # The maximum name length grew from 256 to 10000 in R 2.13.0
  max_length <- if (getRversion() < "2.13.0") 256L else 10000L
  is_short_enough <- nchar(x) <= max_length
  # make.names() leaves a name unchanged only if it is already syntactically valid
  is_syntactic <- make.names(x) == x
  # Elementwise &, so character vectors of any length work
  is_short_enough & is_syntactic
}

Related

String Detecting in R

I have the following strings.
x <- c("A1A1A1", "A3V???", "B4F3**")
I want to flag only the strings in which the last 3 characters do not follow the pattern [[:digit:]][[:alpha:]][[:digit:]]
Thus, I would want to flag the 2nd and 3rd string above. Any suggestions?

Just to clarify, are you trying to remove the strings that do not follow that pattern? One way I can think of is stringr's str_remove_all(): clnstrings <- str_remove_all(vectornameofstrings, "symbols or patterns that you would want removed")
There are probably more efficient ways to do this, but as far as I know (and I am still learning) this could work. If anyone has input on this answer, please don't hesitate to comment!

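grepl is suitable here. Use [[:alpha:]] rather than \\w for the middle character, since \\w also matches digits and underscores:
> !grepl("[[:digit:]][[:alpha:]][[:digit:]]$", x)
[1] FALSE  TRUE  TRUE
If you want to get the position:
> grep("[[:digit:]][[:alpha:]][[:digit:]]$", x, invert = TRUE)
[1] 2 3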
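One caveat worth checking: the regex class \\w matches digits and underscores as well as letters, so it is looser than [[:alpha:]]. The sketch below adds one made-up string, "ABC111", to the question's data to show where the two patterns disagree.

```r
# The question's data plus one hypothetical string that ends in three digits
x <- c("A1A1A1", "A3V???", "B4F3**", "ABC111")

# Strict: digit, letter, digit at the end of the string
flag_strict <- !grepl("[[:digit:]][[:alpha:]][[:digit:]]$", x)

# Loose: \w also accepts a digit or underscore in the middle position
flag_loose <- !grepl("\\d\\w\\d$", x)

print(data.frame(x, flag_strict, flag_loose))
```

Only the strict pattern flags "ABC111", because its middle character is a digit rather than a letter.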

named Element-wise operations in R

I am a beginner in R and apologize in advance for asking a basic question, but I couldn't find answer anywhere on Google (maybe because the question is so basic that I didn't even know how to correctly search for it.. :D)
So if I do the following in R:
v = c(50, 25)
names(v) = c("First", "Last")
v["First"]/v["Last"]
I get the output as:
First
2
Why is it that the name, "First" appears in the output and how to get rid of it?

From help("Extract"), this is because
Subsetting (except by an empty index) will drop all attributes except names, dim and dimnames.
and
The usual form of indexing is [. [[ can be used to select a single element dropping names, whereas [ keeps them, e.g., in c(abc = 123)[1].
Since we are selecting single elements, you can switch to double-bracket indexing [[ and names will be dropped.
v[["First"]] / v[["Last"]]
# [1] 2
As for which name is preserved with single-bracket indexing, it is the first operand's. ?Arithmetic documents this: names are copied from the first operand if it is the same length as the result, otherwise from the second. If we switch the order, we still get the first name on the result.
v["Last"] / v["First"]
# Last
# 0.5
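If you prefer to keep single-bracket indexing, unname() is another way to drop the name afterwards. A short sketch comparing the options, using the vector from the question:

```r
v <- c(First = 50, Last = 25)

# Single brackets keep the name of the first operand
ratio_named <- v["First"] / v["Last"]

# Either strip the name afterwards...
ratio_unnamed <- unname(v["First"] / v["Last"])

# ...or extract bare elements with double brackets
ratio_double <- v[["First"]] / v[["Last"]]

print(ratio_named)
print(ratio_unnamed)
print(ratio_double)
```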

Searching for an exact String in another String

I'm dealing with a very simple question and that is searching for a string inside of another string. Consider the example below:
bigStringList <- c("SO1.A", "SO12.A", "SO15.A")
strToSearch <- "SO1."
bigStringList[grepl(strToSearch, bigStringList)]
I'm looking for something that when I search for "SO1.", it only returns "SO1.A".
I saw many related questions on SO but most of the answers include grepl() which does not work in my case.
Thanks very much for your help in advance.

When you want the search string matched literally, with no regex metacharacters interpreted, you can set fixed=TRUE:
grep("SO1.", bigStringList, fixed=TRUE, value=TRUE)
# [1] "SO1.A"
Otherwise, as Frank notes, you'll need to escape the period (so that it'll be interpreted as an actual . rather than as a symbol meaning "any single character"):
grep("SO1\\.", bigStringList, value=TRUE)
# [1] "SO1.A"
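To see why the original grepl() call matched everything, it may help to compare the regex, fixed, and prefix interpretations side by side. The sketch assumes the search string is meant as a prefix, which the question does not state; startsWith() requires R 3.3.0 or later.

```r
bigStringList <- c("SO1.A", "SO12.A", "SO15.A")
strToSearch <- "SO1."

# As a regex, "." matches any character, so every element matches
hits_regex <- grepl(strToSearch, bigStringList)

# As a fixed string, "." must be a literal period
hits_fixed <- grepl(strToSearch, bigStringList, fixed = TRUE)

# If the string must appear at the start, startsWith() is literal too
hits_prefix <- startsWith(bigStringList, strToSearch)

print(rbind(hits_regex, hits_fixed, hits_prefix))
```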

List of all the R keywords

I want to enable the auto-complete functionality in emacs for editing my R files. For this, I need to have listed all the keywords in the R language. Does someone know if this is available somewhere? I know I would have to include all the functions names in the external packages I am using, but for now the list of what is in r-cran-base should be fine for me.
Thanks a lot!

apropos with an empty string argument will list all objects on the search path. It is what is used for the tab complete in the default GUI.
apropos("")
[1] "-"
[2] "-.Date"
[3] "-.POSIXt"
[4] "!"
[5] "!.hexmode"
[6] "!.octmode"
...

The R Language Definition lists all of R's keywords. Note that those are also reserved.
The following identifiers have a special meaning and cannot be used for object names
if else repeat while function for in next break TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_
NA_complex_ NA_character_ ... ..1 ..2 etc.

See ?Reserved, ?Control and maybe ?Syntax and ?Ops.

Just a little heads up that what's being discussed here has nothing to do with actual R keywords, which are their own special thing. I suspect this is what @HongOoi is alluding to.
R keywords exist ostensibly to help group functions by theme but, except for the special case of internal, aren't widely used.
If you want to see the list of valid keywords, you can get it like this:
readLines(file.path(R.home("doc"), "KEYWORDS.db"))

Try this:
ls('package:base')
It lists all objects in a package (here, base).
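For a completion word list, one possible approach is to keep only the syntactic names from base and write them to a file. This is a sketch under assumptions: the output goes to a temporary file, and how the editor consumes it is left open.

```r
# All objects exported by the base package
base_names <- ls("package:base")

# Keep only names that can be typed without backticks;
# this drops operators such as "+" and also the reserved words
syntactic <- base_names[make.names(base_names) == base_names]

# Write one name per line to a temporary file (the path is arbitrary)
out_file <- tempfile(fileext = ".txt")
writeLines(sort(syntactic), out_file)

print(length(syntactic))
print(head(readLines(out_file)))
```

Reserved words are filtered out by the make.names() test, so they would need to be appended separately if the completion list should include them.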

You might go to an R buffer and look at the following variable (given that you have Emacs Speaks Statistics):
ess-R-font-lock-keywords
by using C-h v ess-R-font-lock-keywords.
From there on, you can look in ess-custom.el and find everything you need on the implementation.

Just making sure you really want to do this, since ?rcompgen describes the functions by Deepayan Sarkar in the utils package that already provide tab completion.

The arcane formals(function(x){})$x

What is the object formals(function(x){})$x?
It's found in the formals of a function, bound to arguments without default value.
Is there any other way to refer to this strange object? Does it have some role other than representing an empty function argument?
Here are some of its properties that can be checked in the console:
> is(formals(function(x){})$x)
[1] "name" "language" "refObject"
> formals(function(x){})$x
> as.character(formals(function(x){})$x)
[1] ""
EDIT: Here are some other ways to get this object:
alist(,)[[1]]
bquote()
quote(expr=)

Background: What is formals(function(x) {})?
Well, to start with (and as documented in ?formals), formals(function(x) {}) returns a pairlist:
is(formals(function(x){}))
# [1] "pairlist"
Unlike list(), pairlist() accepts tagged arguments with no value, which is what lets a function have a formal argument without a default. From ?pairlist:
tagged arguments with no value are allowed whereas ‘list’ simply ignores them.
To see the difference, compare alist(), which also accepts empty arguments (although it returns an ordinary list rather than a pairlist), with list(), which rejects them:
list(x=, y=2)
# Error in list(x = , y = 2) : argument 1 is empty
alist(x=, y=2)
# $x
#
# $y
# [1] 2
Your question: What is formals(function(x) {})$x?
Now to your question about what formals(function(x) {})$x is. My understanding is that, in some sense, its real value is the "empty symbol". You can't, however, easily work with it from within R, because the "empty symbol" is an object that R's developers, very much by design, try to hide from R users. (For an interesting discussion of the empty symbol, and why it's kept hidden, see the thread starting here).
When one tries to get at it by indexing an empty-valued element of a pairlist, R's developers foil the attempt by having R return a name object instead of its verboten-for-public-viewing value. (This is, of course, the name object shown in your question).

It's a name or symbol, see ?name, e.g.:
is(as.name('a'))
#[1] "name" "language" "refObject"
The only difference from your example is that you can't use as.name to create an empty one.
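One practical use of this object is detecting which formal arguments have no default, by comparing each formal against quote(expr = ). A hedged sketch; the function f is made up for the example.

```r
# Hypothetical function: x has no default, y does
f <- function(x, y = 2) NULL

# A formal with no default holds the empty symbol, which quote(expr = ) also produces
no_default <- vapply(
  formals(f),
  function(arg) identical(arg, quote(expr = )),
  logical(1)
)

print(no_default)
```

The comparison is done inline because assigning the empty symbol to a variable and then evaluating that variable raises a "missing argument" error.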
