Disable assignment via = in R

R allows for assignment via <- and =.
Whereas there are subtle differences between the two assignment operators, there seems to be broad consensus that <- is the better choice, because = is also the operator that maps values to arguments, so its use can lead to ambiguous statements. The following example illustrates this:
> system.time(x <- rnorm(10))
user system elapsed
0 0 0
> system.time(x = rnorm(10))
Error in system.time(x = rnorm(10)) : unused argument(s) (x = rnorm(10))
In fact, Google's R style guide disallows using = for assignment (see the comments on this answer for a contrary view).
I also almost exclusively use <- as assignment operator. However, the almost in the previous sentence is the reason for this question. When = acts as assignment operator in my code it is always accidental and if it leads to problems these are usually hard to spot.
I would like to know if there is a way to turn off assignment via = and let R throw an error any time = is used for assignment.
Optimally this behavior would only occur for code in the Global Environment, as there may well be code in attached namespaces that uses = for assignment and should not break.
(This question was inspired by a discussion with Jonathan Nelson)

Here's a candidate:
`=` <- function(...) stop("Assignment by = disabled, use <- instead")
# seems to work
a = 1
Error in a = 1 : Assignment by = disabled, use <- instead
# appears not to break named arguments
sum(1:2,na.rm=TRUE)
[1] 3

I'm not sure, but maybe simply overwriting the assignment of = is enough for you. After all, `=` is a name like any other—almost.
> `=` <- function() { }
> a = 3
Error in a = 3 : unused argument(s) (a, 3)
> a <- 3
> data.frame(a = 3)
a
1 3
So any use of = for assignment will result in an error, whereas using it to name arguments remains valid. Note that an = assignment inside a function will go unnoticed until the line in question is actually executed.
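A minimal sketch of how the override could be tried out and then undone, assuming it is run at the top level of a session. The variable names are only illustrative. As far as I understand R's lookup order, code inside package namespaces finds the base version of = before it reaches the global environment, so attached packages should keep working, but I have not tested that widely.

```r
# Override `=` in the current (normally the global) environment.
`=` <- function(...) stop("Assignment by = disabled, use <- instead")

# Assignment with = now fails; tryCatch() captures the error message.
msg <- tryCatch({
  a = 1
  "no error"
}, error = function(e) conditionMessage(e))

# Named arguments are parsed differently and are unaffected.
total <- sum(1:2, na.rm = TRUE)

# Remove the override to restore the built-in operator.
rm("=")
b = 2
```

After rm("="), the last line assigns normally again, so the override is easy to switch off when you need to run code that relies on =.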

The lint package (CRAN) has a style check for that, so assuming you have your code in a file, you can run lint against it and it will warn you about those line numbers with = assignments.
Here is a basic example:
temp <- tempfile()
write("foo = function(...) {
good <- 0
bad = 1
sum(..., na.rm = TRUE)
}", file = temp)
library(lint)
lint(file = temp, style = list(styles.assignment.noeq))
# Lint checking: C:\Users\flodel\AppData\Local\Temp\RtmpwF3pZ6\file19ac3b66b81
# Lint: Equal sign assignemnts: found on lines 1, 3
The lint package comes with a few more tests you may find interesting, including:
warns against right assignments
recommends spaces around =
recommends spaces after commas
recommends spaces between infixes (a.k.a. binary operators)
warns against tabs
possibility to warn against a max line width
warns against assignments inside function calls
You can turn on or off any of the pre-defined style checks and you can write your own. However the package is still in its infancy: it comes with a few bugs (https://github.com/halpo/lint) and the documentation is a bit hard to digest. The author is responsive though and slowly making improvements.
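If installing lint is not an option, a similar check can be sketched with base R's parse data. This is not necessarily how lint works internally; it only relies on the parser labelling = assignments with the token EQ_ASSIGN (named arguments get EQ_SUB instead). The variable names below are made up for the example.

```r
# Same sample code as in the lint example above.
code <- "foo = function(...) {
  good <- 0
  bad = 1
  sum(..., na.rm = TRUE)
}"

# keep.source = TRUE is needed so that parse data is recorded.
parsed <- parse(text = code, keep.source = TRUE)
pd <- utils::getParseData(parsed)

# Lines on which = is used as an assignment operator.
eq_lines <- sort(unique(pd$line1[pd$token == "EQ_ASSIGN"]))
eq_lines
```

For a file on disk, parse(file = path, keep.source = TRUE) should work the same way.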

If you don't want to break existing code, something like this (optionally issuing a warning rather than an error) might work: it gives the warning and then performs the assignment in parent.frame() using <- (to avoid any recursion).
`=` <- function(...) {
  # Capture the call `=`(lhs, rhs) and rebuild it as the text "lhs <- rhs"
  .what <- as.list(match.call())
  .call <- sprintf('%s <- %s', deparse(.what[[2]]), deparse(.what[[3]]))
  mess <- 'Use <- instead of = for assignment'
  if (getOption('warn_assign', default = TRUE)) {
    # Default: refuse to assign
    stop(mess)
  } else {
    # Warn, then assign in the caller's environment
    warning(mess)
    eval(parse(text = .call), envir = parent.frame())
  }
}
If you set options(warn_assign = FALSE), then = will warn and assign. Anything else will throw an error and not assign.
Examples in use:
# with no option set
z = 1
## Error in z = 1 : Use <- instead of = for assignment
options(warn_assign = TRUE)
z = 1
## Error in z = 1 : Use <- instead of = for assignment
options(warn_assign = FALSE)
z = 1
## Warning message:
## In z = 1 : Use <- instead of = for assignment
Better options
I think formatR or lint and code formatting are better approaches.

Related

force the evaluation of .SD in data.table

In order to debug j in data.table, I prefer to interactively inspect the resulting by-group data.tables with browser(). SO 2013 addressed this issue, and I understand that .SD must be invoked in j in order for all columns to be evaluated. I use RStudio, and with the SO 2013 method there are two problems:
The environment pane is not updated reflecting the browser environment
I often encounter the following error msg
Error: option error has NULL value
In addition: Warning message:
In get(object, envir = currentEnv, inherits = TRUE) :
restarting interrupted promise evaluation
I can get around this by doing:
f <- function(sd=force(.SD),.env = parent.frame(n = 1)) {
by = .env$.BY;
i = .env$.I;
sd = .env$.SD;
grp = .env$.GRP;
N = .env$.N;
browser()
}
library (data.table)
setDT(copy(mtcars))[,f(.SD),by=.(gear)]
But - in the data.table spirit of keeping things short and sweet- can I somehow force (the force in f does not work) the evaluation of .SD in the call to f so that the final code could run:
setDT(copy(mtcars))[,f(),by=.(gear)]
As far as I know,
data.table needs to explicitly see .SD somewhere in the code passed to j,
otherwise it won't even expose it in the environment it creates for the execution.
See for example this question and its comments.
Why don't you create a different helper function that always specifies .SD in j?
Something like:
dt_debugger <- function(dt, ...) {
f <- function(..., .caller_env = parent.frame()) {
by <- .caller_env$.BY;
i <- .caller_env$.I;
sd <- .caller_env$.SD;
grp <- .caller_env$.GRP;
N <- .caller_env$.N;
browser()
}
dt[..., j = f(.SD)]
}
dt_debugger(as.data.table(mtcars), by = .(gear))

Use = instead of <- for assignment when styling R code with styler

I love the package but I was wondering how I could change one rule from the tidyverse style: I'd like to keep "=" instead of "<-" for assignment.
I've read that note: http://styler.r-lib.org/articles/customizing_styler.html#implementation-details
But I still don't get how to simply change that rule.
I've tried the very naive:
library(styler)
force_assignment_op <- function (pd)
{
to_replace <- pd$token == "LEFT_ASSIGN"
pd$token[to_replace] <- "EQ_ASSIGN"
pd$text[to_replace] <- "="
pd
}
tidyverse_style()$token$force_assignment_op = force_assignment_op
But get the following error:
Error in tidyverse_style()$token$force_assignment_op =
force_assignment_op :
invalid (NULL) left side of assignment
I would like to modify it in a way that I can simply run the styler addin afterwards.
The problem is that tidyverse_style()$token is a list, not an environment, so you can't modify it. (Well, you can modify it, but you're modifying a copy, not the original.)
You need to write your own function to replace the tidyverse_style function, and use it instead. For example, assuming you keep your force_assignment_op function:
LaSy_style <- function(...) {
ts <- tidyverse_style(...)
ts$token$force_assignment_op <- force_assignment_op
ts
}
Then
style_text(c("ab <- 3", "a <-3"), strict = FALSE, style = LaSy_style)
(one of the examples from ?tidyverse_style) will print
ab = 3
a = 3
(This is ugly, the original tidyverse_style is better, but I won't stop you.)
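The error in the question can be reproduced without styler. The sketch below uses base_style(), a hypothetical stand-in for tidyverse_style() that I made up for illustration, to show that assigning into the result of a function call fails, while a wrapper that modifies its own copy works and leaves the original untouched.

```r
# Hypothetical stand-in for tidyverse_style(): returns a fresh list each call.
base_style <- function() list(token = list(a = identity))

# Assigning into the result of a call is not a valid assignment target.
direct <- try(eval(quote(base_style()$token$extra <- toupper)), silent = TRUE)
direct_failed <- inherits(direct, "try-error")

# A wrapper takes a copy, modifies it, and returns the modified copy.
my_style <- function(...) {
  s <- base_style(...)
  s$token$extra <- toupper
  s
}

wrapped_names <- names(my_style()$token)
original_names <- names(base_style()$token)
```

Here wrapped_names contains the extra entry, while original_names does not, which is the same pattern LaSy_style follows.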

declare a variable without registering it to the environment

In some R script, I use some dummy variable in a for loop.
The variable has no purpose itself, so I don't need it recorded at all.
For instance :
database = read.csv("data/somefile.csv")
for (i in 1:ncol(database)) {
name <- names(database)[i]
if (name %in% some_vector) {
label(database[, i]) <- some_function(database$somecolumn)
}
}
In RStudio, the "Global Environment" tab keeps track of the variables i and name (and shows the last value each had), although they are of no use at all.
Is there any elegant way to declare my value so it is not tracked in the global environment ?
Use local for all your workspace hygiene needs.
foo <- local({
x <- 0
for(i in 1:nrow(mtcars))
x <- x + mtcars$mpg[i]
x
})
foo now contains the result of the calculation, and the temporary variables i and x are discarded.
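Applied to a loop over columns, like the one in the question, the same idea might look like the sketch below. The data frame df and the names k, col_name and out are invented for the example.

```r
df <- data.frame(a = 1:3, b = 4:6)

col_means <- local({
  out <- numeric(ncol(df))
  for (k in seq_len(ncol(df))) {
    col_name <- names(df)[k]
    out[k] <- mean(df[[col_name]])
  }
  names(out) <- names(df)
  out
})

# The loop variables exist only inside local(), not in the calling environment.
leaked <- c(exists("k", inherits = FALSE), exists("col_name", inherits = FALSE))
```

Only col_means (and df) end up in the workspace; both entries of leaked are FALSE as long as no objects with those names were defined beforehand.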
To hide objects from RStudio's object explorer, you can prefix with . like
.x = 2
Downsides. This still creates .x and keeps it in memory, where it might take up space or accidentally be used again after you've forgotten about it. It also hides from the standard "clear workspace" command rm(list = ls()). See ?ls for a way of handling this.
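The way of handling this that ?ls describes is the all.names argument. A small sketch, using a scratch environment (env is just an illustrative name) so that nothing in your real workspace is removed:

```r
env <- new.env()
assign(".x", 2, envir = env)
assign("y", 3, envir = env)

# By default ls() skips names that start with a dot.
visible <- ls(env)
everything <- ls(env, all.names = TRUE)

# Clearing with all.names = TRUE also removes the dot-prefixed objects.
rm(list = ls(env, all.names = TRUE), envir = env)
remaining <- ls(env, all.names = TRUE)
```

In the global environment the equivalent would be rm(list = ls(all.names = TRUE)).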
Aside. Generally, I would not create any variables like this, instead wrapping any operation involving temporary objects in a function as @Aurèle suggested and not leaning too heavily on what RStudio's object browser shows me.
The only case so far where I've used dot-prefixed objects is for interactive use in a function, like:
f = function(x, y, debug.obj = FALSE){
dx = dim(x)
dy = dim(y)
if (!(length(dx) == 2 && length(dy) == 2 && dx[2] == dy[1])){
if (debug.obj){
.debug.f <<- list(dx = dx, dy = dy)
stop("Dims don't match. See .debug.f")
}
stop("Dims don't match.")
}
x %*% y
}
# example usage
f(matrix(1,1,1), matrix(2,2,2), debug.obj = TRUE)
# Error in f(matrix(1, 1, 1), matrix(2, 2, 2), debug.obj = TRUE) :
# Dims don't match. See .debug.f
.debug.f
# $dx
# [1] 1 1
#
# $dy
# [1] 2 2
Even this might be a bad idea, though.

How to prevent namespace pollution in R [duplicate]

This is probably not correct terminology, but hopefully I can get my point across.
I frequently end up doing something like:
myVar = 1
f <- function(myvar) { return(myVar); }
# f(2) = 1 now
R happily uses the variable outside of the function's scope, which leaves me scratching my head, wondering how I could possibly be getting the results I am.
Is there any option which says "force me to only use variables which have previously been assigned values in this function's scope"? Perl's use strict does something like this, for example. But I don't know that R has an equivalent of my.
EDIT: Thank you, I am aware that I capitalized them differently. Indeed, the example was created specifically to illustrate this problem!
I want to know if there is a way that R can automatically warn me when I do this.
EDIT 2: Also, if Rkward or another IDE offers this functionality I'd like to know that too.
As far as I know, R does not provide a "use strict" mode. So you are left with two options:
1 - Ensure all your "strict" functions don't have globalenv as environment. You could define a nice wrapper function for this, but the simplest is to call local:
# Use "local" directly to control the function environment
f <- local( function(myvar) { return(myVar); }, as.environment(2))
f(3) # Error in f(3) : object 'myVar' not found
# Create a wrapper function "strict" to do it for you...
strict <- function(f, pos=2) eval(substitute(f), as.environment(pos))
f <- strict( function(myvar) { return(myVar); } )
f(3) # Error in f(3) : object 'myVar' not found
2 - Do a code analysis that warns you of "bad" usage.
Here's a function checkStrict that hopefully does what you want. It uses the excellent codetools package.
# Checks a function for use of global variables
# Returns TRUE if ok, FALSE if globals were found.
checkStrict <- function(f, silent=FALSE) {
vars <- codetools::findGlobals(f)
found <- !vapply(vars, exists, logical(1), envir=as.environment(2))
if (!silent && any(found)) {
warning("global variables used: ", paste(names(found)[found], collapse=', '))
return(invisible(FALSE))
}
!any(found)
}
And trying it out:
> myVar = 1
> f <- function(myvar) { return(myVar); }
> checkStrict(f)
Warning message:
In checkStrict(f) : global variables used: myVar
checkUsage in the codetools package is helpful, but doesn't get you all the way there.
In a clean session where myVar is not defined,
f <- function(myvar) { return(myVar); }
codetools::checkUsage(f)
gives
<anonymous>: no visible binding for global variable ‘myVar’
but once you define myVar, checkUsage is happy.
See ?codetools in the codetools package: it's possible that something there is useful:
> findGlobals(f)
[1] "{" "myVar" "return"
> findLocals(f)
character(0)
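The merged output above mixes functions ("{", "return") with variables. findGlobals() also accepts merge = FALSE, which returns the two groups separately and makes stray variables easier to spot. A small sketch, assuming codetools is installed (it normally ships with R):

```r
f <- function(myvar) { return(myVar) }

# merge = FALSE returns a list with separate $functions and $variables.
globals <- codetools::findGlobals(f, merge = FALSE)

globals$variables
globals$functions
```

Anything listed under $variables is a name the function reads from outside its own scope, which is exactly the kind of typo the question is about.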
You need to fix the typo: myvar != myVar. Then it will all work...
Scope resolution is 'from the inside out' starting from the current one, then the enclosing and so on.
Edit: Now that you have clarified your question, look at the codetools package (which is part of the R Base set):
R> library(codetools)
R> f <- function(myVAR) { return(myvar) }
R> checkUsage(f)
<anonymous>: no visible binding for global variable 'myvar'
R>
Using get(x, inherits=FALSE) will force local scope.
myVar = 1
f2 <- function(myvar) get("myVar", inherits=FALSE)
f3 <- function(myvar){
myVar <- myvar
get("myVar", inherits=FALSE)
}
output:
> f2(8)
Error in get("myVar", inherits = FALSE) : object 'myVar' not found
> f3(8)
[1] 8
You are of course doing it wrong. Don't expect static code checking tools to find all your mistakes. Check your code with tests. And more tests. Any decent test written to run in a clean environment will spot this kind of mistake. Write tests for your functions, and use them. Look at the glory that is the testthat package on CRAN.
There is a new package modules on CRAN which addresses this common issue (see the vignette here). With modules, the function raises an error instead of silently returning the wrong result.
# without modules
myVar <- 1
f <- function(myvar) { return(myVar) }
f(2)
[1] 1
# with modules
library(modules)
m <- module({
f <- function(myvar) { return(myVar) }
})
m$f(2)
Error in m$f(2) : object 'myVar' not found
This is the first time I use it. It seems to be straightforward so I might include it in my regular workflow to prevent time consuming mishaps.
You can dynamically change the environment tree like this:
a <- 1
f <- function(){
b <- 1
print(b)
print(a)
}
environment(f) <- new.env(parent = baseenv())
f()
Inside f, b can be found, while a cannot.
But probably it will do more harm than good.
You can test to see if the variable is defined locally:
myVar = 1
f <- function(myvar) {
if( exists('myVar', environment(), inherits = FALSE) ) return( myVar) else cat("myVar was not found locally\n")
}
> f(2)
myVar was not found locally
But I find it very artificial if the only thing you are trying to do is to protect yourself from spelling mistakes.
The exists function searches for the variable name in the particular environment. inherits = FALSE tells it not to look into the enclosing frames.
environment(fun) = parent.env(environment(fun))
will remove the 'workspace' from your search path, leave everything else. This is probably closest to what you want.
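A short sketch of the effect, reusing the function from the question. It assumes the code runs at the top level, so that the function's environment is the workspace to begin with.

```r
myVar <- 1
f <- function(myvar) myVar

# With the default environment, the typo silently picks up the workspace value.
before <- f(2)

# Skip the workspace: the function now starts its lookup one level up.
environment(f) <- parent.env(environment(f))

# The same call now fails because myVar is no longer visible.
failed <- inherits(try(f(2), silent = TRUE), "try-error")
```

Functions from attached packages are still found, because the search path above the workspace is unchanged.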
@Tommy gave a very good answer, and I used it to create 3 functions that I think are more convenient in practice.
strict
to make a function strict, you just have to call
strict(f,x,y)
instead of
f(x,y)
example:
my_fun1 <- function(a,b,c){a+b+c}
my_fun2 <- function(a,b,c){a+B+c}
B <- 1
my_fun1(1,2,3) # 6
strict(my_fun1,1,2,3) # 6
my_fun2(1,2,3) # 5
strict(my_fun2,1,2,3) # Error in (function (a, b, c) : object 'B' not found
checkStrict1
To get a diagnosis, execute checkStrict1(f) with optional Boolean parameters to show more or less.
checkStrict1("my_fun1") # nothing
checkStrict1("my_fun2") # my_fun2 : B
A more complicated case:
A <- 1 # unambiguous variable defined OUTSIDE AND INSIDE my_fun3
# B unambiguous variable defined only INSIDE my_fun3
C <- 1 # defined OUTSIDE AND INSIDE with ambiguous name (C is also a base function)
D <- 1 # defined only OUTSIDE my_fun3 (D is also a base function)
E <- 1 # unambiguous variable defined only OUTSIDE my_fun3
# G unambiguous variable defined only INSIDE my_fun3
# H is undeclared and doesn't exist at all
# I is undeclared (though I is also base function)
# v defined only INSIDE (v is also a base function)
my_fun3 <- function(a,b,c){
A<-1;B<-1;C<-1;G<-1
a+b+A+B+C+D+E+G+H+I+v+ my_fun1(1,2,3)
}
checkStrict1("my_fun3",show_global_functions = TRUE ,show_ambiguous = TRUE , show_inexistent = TRUE)
# my_fun3 : E
# my_fun3 Ambiguous : D
# my_fun3 Inexistent : H
# my_fun3 Global functions : my_fun1
I chose to show only inexistent by default out of the 3 optional additions. You can change it easily in the function definition.
checkStrictAll
Get a diagnostic of all your potentially problematic functions, with the same parameters.
checkStrictAll()
my_fun2 : B
my_fun3 : E
my_fun3 Inexistent : H
sources
strict <- function(f1,...){
function_text <- deparse(f1)
function_text <- paste(function_text[1],function_text[2],paste(function_text[c(-1,-2,-length(function_text))],collapse=";"),"}",collapse="")
strict0 <- function(f1, pos=2) eval(substitute(f1), as.environment(pos))
f1 <- eval(parse(text=paste0("strict0(",function_text,")")))
do.call(f1,list(...))
}
checkStrict1 <- function(f_str,exceptions = NULL,n_char = nchar(f_str),show_global_functions = FALSE,show_ambiguous = FALSE, show_inexistent = TRUE){
functions <- c(lsf.str(envir=globalenv()))
f <- try(eval(parse(text=f_str)),silent=TRUE)
if(inherits(f, "try-error")) {return(NULL)}
vars <- codetools::findGlobals(f)
vars <- vars[!vars %in% exceptions]
global_functions <- vars %in% functions
in_global_env <- vapply(vars, exists, logical(1), envir=globalenv())
in_local_env <- vapply(vars, exists, logical(1), envir=as.environment(2))
in_global_env_but_not_function <- rep(FALSE,length(vars))
for (my_mode in c("logical", "integer", "double", "complex", "character", "raw","list", "NULL")){
in_global_env_but_not_function <- in_global_env_but_not_function | vapply(vars, exists, logical(1), envir=globalenv(),mode = my_mode)
}
found <- in_global_env_but_not_function & !in_local_env
ambiguous <- in_global_env_but_not_function & in_local_env
inexistent <- (!in_local_env) & (!in_global_env)
if(typeof(f)=="closure"){
if(any(found)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),":", paste(names(found)[found], collapse=', '),"\n"))}
if(show_ambiguous & any(ambiguous)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Ambiguous :", paste(names(found)[ambiguous], collapse=', '),"\n"))}
if(show_inexistent & any(inexistent)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Inexistent :", paste(names(found)[inexistent], collapse=', '),"\n"))}
if(show_global_functions & any(global_functions)){cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Global functions :", paste(names(found)[global_functions], collapse=', '),"\n"))}
return(invisible(FALSE))
} else {return(invisible(TRUE))}
}
checkStrictAll <- function(exceptions = NULL,show_global_functions = FALSE,show_ambiguous = FALSE, show_inexistent = TRUE){
functions <- c(lsf.str(envir=globalenv()))
n_char <- max(nchar(functions))
invisible(sapply(functions,checkStrict1,exceptions,n_char = n_char,show_global_functions,show_ambiguous, show_inexistent))
}
What works for me, based on @c-urchin's answer, is to define a script which reads all my functions and then excludes the global environment:
filenames <- Sys.glob('fun/*.R')
for (filename in filenames) {
source(filename, local=T)
funname <- sub('^fun/(.*)\\.R$', "\\1", filename)
eval(parse(text=paste('environment(',funname,') <- parent.env(globalenv())',sep='')))
}
I assume that
all functions and nothing else are contained in the relative directory ./fun and
every .R file contains exactly one function with an identical name as the file.
The catch is that if one of my functions calls another one of my functions, then the outer function has to also call this script first, and it is essential to call it with local=T:
source('readfun.R', local=T)
assuming of course that the script file is called readfun.R.

a reliable way to tell if = is for assignment in R code?

I'm a stubborn useR who uses = instead of <- all the time, and apparently many R programmers will frown on this. I wrote the formatR package which can replace = with <- based on the parser package. As some of you might know, parser was orphaned on CRAN a few days ago. Although it is back now, this made me hesitant to depend on it. I'm wondering if there is another way to safely replace = with <-, because not all ='s mean assignment, e.g. fun(a = 1). Regular expressions are unlikely to be reliable (see line 18 of the mask.inline() function in formatR), but I will certainly appreciate it if you can improve mine. Perhaps the codetools package can help?
A few test cases:
# should replace
a = matrix(1, 1)
a = matrix(
1, 1)
(a = 1)
a =
1
function() {
a = 1
}
# should not replace
c(
a = 1
)
c(
a = c(
1, 2))
This answer uses regular expressions. There are a few edge cases where it will fail but it should be okay for most code. If you need perfect matching then you'll need to use a parser, but the regexes can always be tweaked if you run into problems.
Watch out for
#quoted function names
`my cr*azily*named^function!`(x = 1:10)
#Nested brackets inside functions
mean(x = (3 + 1:10))
#assignments inside if or for blocks
if((x = 10) > 3) cat("foo")
#functions running over multiple lines will currently fail
#maybe fixable with paste(original_code, collapse = "\n")
mean(
x = 1:10
)
The code is based upon an example on the ?regmatches page. The basic idea is: swap function contents for a placeholder, do the replacement, then put your function contents back.
#Sample code. For real case, use
#readLines("source_file.R")
original_code <- c("a = 1", "b = mean(x = 1)")
#Function contents are considered to be a function name,
#an open bracket, some stuff, then a close bracket.
#Here function names are considered to be a letter or
#dot or underscore followed by optional letters, numbers, dots or
#underscores. This matches a few non-valid names (see ?make.names
#and warning above).
function_content <- gregexpr(
"[[:alpha:]._][[:alnum:]._]*\\([^)]*\\)",
original_code
)
#Take a copy of the code to modify
copy <- original_code
#Replace all instances of function contents with the word PLACEHOLDER.
#If you have that word inside your code already, things will break.
copy <- mapply(
function(pattern, replacement, x)
{
if(length(pattern) > 0)
{
gsub(pattern, replacement, x, fixed = TRUE)
} else x
},
pattern = regmatches(copy, function_content),
replacement = "PLACEHOLDER",
x = copy,
USE.NAMES = FALSE
)
#Replace = with <-
copy <- gsub("=", "<-", copy)
#Now substitute back your function contents
(fixed_code <- mapply(
function(pattern, replacement, x)
{
if(length(replacement) > 0)
{
gsub(pattern, replacement, x, fixed = TRUE)
} else x
},
pattern = "PLACEHOLDER",
replacement = regmatches(original_code, function_content),
x = copy,
USE.NAMES = FALSE
))
#Write back to your source file
#writeLines(fixed_code, "source_file_fixed.R")
Kohske sent a pull request to the formatR package which solved the problem using the codetools package. The basic idea is to set up a code walker to walk through the code; when it detects = as the function symbol of a call, it replaces it with <-. This works because of the "Lisp nature" of R: x = 1 is actually `=`(x, 1) (we replace it by `<-`(x, 1)); of course, = is treated differently in the parse tree of fun(x = 1).
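To illustrate the idea, here is a rough sketch using a plain recursive function instead of a codetools walker; it is not the code from the pull request. The name replace_eq is made up, and the sketch does not handle empty arguments such as the missing index in x[1, ].

```r
# Recursively rewrite calls to `=` as calls to `<-` in a parsed expression.
replace_eq <- function(e) {
  if (!is.call(e)) return(e)
  # In the parse tree, an assignment x = 1 is the call `=`(x, 1).
  if (identical(e[[1]], as.name("="))) e[[1]] <- as.name("<-")
  # Named arguments are stored as names on the call, not as `=` calls,
  # so they pass through unchanged.
  as.call(lapply(as.list(e), replace_eq))
}

fix_line <- function(txt) deparse(replace_eq(parse(text = txt)[[1]]))

r1 <- fix_line("x = 1")
r2 <- fix_line("fun(x = 1)")
r3 <- fix_line("y = c(a = 1, b = (z = 2))")
```

Because the result is deparsed from the parse tree, comments and the original layout are lost, which is one reason a real formatter needs more machinery than this.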
The formatR package (>= 0.5.2) has since got rid of dependency on the parser package, and replace.assign should be robust now.
The safest (and probably fastest) way to replace = by <- is directly typing <- instead of trying to replace it.

Resources