Is there any way to define a variable in R in your namespace such that it can't be overwritten (along the lines of a "final" declaration)? Something like the following pseudocode:
> xvar <- 10
> xvar
[1] 10
> xvar <- 6
Error: cannot overwrite this variable unless you remove its finality attribute
Motivation: When running R scripts multiple times, it's sometimes too easy to inadvertently overwrite variables.
Check out ?lockBinding:
a <- 2
a
## [1] 2
lockBinding('a', .GlobalEnv)
a <- 3
## Error: cannot change value of locked binding for 'a'
And its complement, unlockBinding:
unlockBinding('a', .GlobalEnv)
a <- 3
a
## [1] 3
You can make variables constant using the pryr package.
install.packages("pryr")
library(pryr)
xvar %<c-% 10
xvar
## [1] 10
xvar <- 6
## Error: cannot change value of locked binding for 'xvar'
The %<c-% operator is a convenience wrapper for assign + lockBinding.
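A hypothetical base-R equivalent of that wrapper takes only a few lines (the name final is my own invention, not an existing function):

```r
# Sketch of a base-R stand-in for pryr's %<c-%:
# assign the value, then lock the binding so later assignments error
final <- function(name, value, envir = .GlobalEnv) {
  assign(name, value, envir = envir)
  lockBinding(name, envir)
}

final("yvar", 42)
yvar
## [1] 42
## yvar <- 1 would now fail: cannot change value of locked binding for 'yvar'
```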
Like Baptiste said in the comments: if you are having problems with this, it's a possible sign of poor coding style. Bundling the majority of your logic into functions will reduce variable name clashes.
Related
I have to run tens of different permutations with the same structure but different base names for the output. To avoid replacing every character name within each formula by hand, I was hoping to create a variable and then use the paste function to construct the name of the output and assign to it.
Example:
var <- "Patient1"
paste0("cells_", var) <- WhichCells(object = test, expression = test > 0, idents = c("patient1", "patient2"))
The expected output would be a variable called "cells_Patient1"
Then for subsequent runs, I would just copy and paste these 2 lines and change var <-"Patient1" to var <-"Patient2"
[Please note that I am oversimplifying the WhichCells step above, as it entails ~10 steps, and I would rather not have to replace "Patient1" with "Patient2" using Search and Replace.]
Unfortunately, I am unable to create the variable "cells_Patient1" using the above command. I am getting the following error:
Error in variable(paste0("cells_", var, sep = "")) <-
WhichCells(object = test, : target of assignment expands to
non-language object
Browsing Stack Overflow, I couldn't find a solution. My understanding of the error is that R can't assign into a target that is constructed by a function call rather than a plain variable name. Is there a way to bypass this?
1) Use assign like this:
var <- "Patient1"
assign(paste0("cells_", var), 3)
cells_Patient1
## [1] 3
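The complement of assign() is get(), which reads such a variable back by its constructed name:

```r
var <- "Patient1"
assign(paste0("cells_", var), 3)

# Look the object up again using the same constructed name
get(paste0("cells_", var))
## [1] 3
```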
2) environment This also works, since environments support indexing by a character name:
e <- .GlobalEnv
e[[ paste0("cells_", var) ]] <- 3
cells_Patient1
## [1] 3
3) list It might be better to make these variables elements of a list:
cells <- list()
cells[[ var ]] <- 3
cells[[ "Patient1" ]]
## [1] 3
Then we could easily iterate over all such variables. Replace sqrt with any suitable function.
lapply(cells, sqrt)
## $Patient1
## [1] 1.732051
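With the list approach, subsequent runs reduce to a loop over patient names. A sketch, with a placeholder computation standing in for the Seurat WhichCells() pipeline:

```r
patients <- c("Patient1", "Patient2")
cells <- list()
for (p in patients) {
  # placeholder; the real code would run the WhichCells() steps here
  cells[[p]] <- nchar(p)
}
names(cells)
## [1] "Patient1" "Patient2"
```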
I'm writing some code in R, have around 600 lines of functions right now, and want to know if there is an easy way to check whether any of my functions use global variables (which I DON'T want).
For example it could give me an error if sourcing this code:
example_fun <- function(x) {
  y <- x * c
  return(y)
}
x <- 2
c <- 2
y <- example_fun(x)
WARNING: Variable c is accessed from global workspace!
Solution to the problem, with the help of @Hugh:
install.packages("codetools")
library(codetools)

x <- as.character(lsf.str())
which_global <- list()
for (i in seq_along(x)) {
  which_global[[x[i]]] <- codetools::findGlobals(get(x[i]), merge = FALSE)$variables
}
Results will look like this:
> which_global
$arrange_vars
character(0)
$cal_flood_curve
[1] "..count.." "FI" "FI_new"
$create_Flood_CuRve
[1] "y"
$dens_GEV
character(0)
...
For a given function like example_function, you can use package codetools:
codetools::findGlobals(example_fun, merge = FALSE)$variables
#> [1] "c"
To collect all functions see Is there a way to get a vector with the name of all functions that one could use in R?
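codetools also has checkUsage(), which reports suspicious bindings directly (the variable name z below is a made-up example):

```r
library(codetools)

example_fun <- function(x) {
  x * z  # 'z' is not defined anywhere in the function
}

# In a session where no global 'z' exists, this reports something like:
# example_fun: no visible binding for global variable 'z'
checkUsage(example_fun, name = "example_fun")
```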
What about emptying your global environment and running the function? If an object from the global environment were to be used in the function, you would get an error, e.g.
V <- 100
my.fct <- function(x){return(x*V)}
> my.fct(1)
[1] 100
#### clearing global environment & re-running my.fct <- function... ####
> my.fct(1)
Error in my.fct(1) : object 'V' not found
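A related trick, if you would rather not clear your workspace: re-point the function's enclosing environment away from the global environment, so globals simply stop being visible. This is a sketch; it also hides attached packages from the function, so it is mainly useful as a quick check:

```r
V <- 100
my.fct <- function(x) {
  return(x * V)
}

# Re-parent the function onto the base environment only;
# it can still see base functions like `*`, but not global variables
environment(my.fct) <- baseenv()
try(my.fct(1))  # Error: object 'V' not found
```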
Adding user-defined attributes to R objects makes it easy to carry some additional information glued to the object of interest. The problem is that it slightly changes how R sees the object; e.g., a numeric vector with an additional attribute is still numeric but is no longer a vector:
x <- rnorm(100)
class(x)
## [1] "numeric"
is.numeric(x)
## [1] TRUE
is.vector(x)
## [1] TRUE
mode(x)
## [1] "numeric"
typeof(x)
## [1] "double"
attr(x, "foo") <- "this is my attribute"
class(x)
## [1] "numeric"
is.numeric(x)
## [1] TRUE
is.vector(x)
## [1] FALSE # <-- here!
mode(x)
## [1] "numeric"
typeof(x)
## [1] "double"
Can this lead to any potential problems? What I'm thinking about is adding some attributes to common R objects and then passing them to other methods. What is the risk of something breaking just because of the fact alone that I added additional attributes to standard R objects (e.g. vector, matrix, data.frame etc.)?
Notice that I'm not asking about creating my own classes. For the sake of simplicity we can also assume that there won't be any conflicts in the names of the attributes (e.g. using dims attribute). Let's also assume that it is not a problem if some method at some point will drop my attribute, it is an acceptable risk.
In my (somewhat limited) experience, adding new attributes to an object hasn't ever broken anything. The only likely scenario I can think of where it would break something would be if a function required that an object have a specific set of attributes and nothing else. I can't think of a time when I've encountered that though. Most functions, especially in S3 methods, will just ignore attributes they don't need.
You're more likely to see problems arise if you remove attributes.
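Incidentally, the is.vector() surprise in the question is documented behavior: is.vector() returns TRUE only when the object has no attributes other than names:

```r
x <- rnorm(5)
names(x) <- letters[1:5]
is.vector(x)  # names are the one attribute is.vector() tolerates
## [1] TRUE

attr(x, "foo") <- "bar"
is.vector(x)  # any other attribute flips it to FALSE
## [1] FALSE
```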
The reason you won't see many problems stemming from additional attributes is that methods are dispatched on the class of an object. As long as the class doesn't change, methods will be dispatched in much the same way. However, this doesn't mean that existing methods will know what to do with your new attributes. Take the following example: after adding a new_attr attribute to both x and y and then adding them, the result adopts the attribute of x. What happened to the attribute of y? The default + function doesn't know what to do with conflicting attributes of the same name, so it just takes the first one (more details in the R Language Definition; thanks Brodie).
x <- 1:10
y <- 10:1
attr(x, "new_attr") <- "yippy"
attr(y, "new_attr") <- "ki yay"
x + y
[1] 1 2 3 4 5 6 7 8 9 10
attr(,"new_attr")
[1] "yippy"
In a different example, if we give x and y attributes with different names, x + y produces an object that preserves both attributes.
x <- 1:10
y <- 10:1
attr(x, "new_attr") <- "yippy"
attr(y, "another_attr") <- "ki yay"
x + y
[1] 11 11 11 11 11 11 11 11 11 11
attr(,"another_attr")
[1] "ki yay"
attr(,"new_attr")
[1] "yippy"
On the other hand, mean(x) doesn't even try to preserve the attributes. I don't know of a good way to predict which functions will and won't preserve attributes. There's probably some reliable mnemonic you could use in base R (aggregation vs. vectorized, perhaps?), but I think there's a separate principle that ought to be considered.
If preservation of your new attributes is important, you should define a new class that preserves the inheritance of the old class.
With a new class, you can write methods that extend the generics and handle the attributes in whichever way you want. Whether or not you should define a new class and write its methods is very much dependent on how valuable any new attributes you add are to the future work you will be doing.
So in general, adding new attributes is very unlikely to break anything in R. But without adding a new class and methods to handle the new attributes, I would be very cautious about interpreting the meaning of those attributes after they've been passed through other functions.
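For example, here is a minimal sketch of such a class (the names tagged and tag are my own invention): a constructor plus an Ops group method that re-attaches the attribute after arithmetic:

```r
# Constructor for a hypothetical "tagged" numeric vector
tagged <- function(x, tag) structure(x, tag = tag, class = "tagged")

# Group generic: do the arithmetic via the default method,
# then restore the tag and class on the result
Ops.tagged <- function(e1, e2) {
  tag <- if (inherits(e1, "tagged")) attr(e1, "tag") else attr(e2, "tag")
  result <- NextMethod()
  structure(result, tag = tag, class = "tagged")
}

x <- tagged(1:10, "yippy")
y <- x + 10:1
attr(y, "tag")
## [1] "yippy"
```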
I ran this in R:
a <- factor(c("A","A","B","A","B","B","C","A","C"))
And then I made a table
results <- table(a)
but when I run
> attributes(results)
$dim
[1] 3

$dimnames
$dimnames$a
[1] "A" "B" "C"

$class
[1] "table"
But I'm confused: why does a show up in my attributes? I've programmed in Java before, and I thought variable names weren't supposed to show up inside your functions.
R functions can see not only the data you pass to them, but also the actual call that was used to invoke them. So when you run table(a), the table() function not only sees the values of a, it can also see that those values came from a variable named a.
So by default table() likes to name each dimension in the resulting table. If you don't pass explicit names in the call via the dnn= parameter, table() will look back to the call, and turn the variable name into a character and use that value for the dimension name.
So after table() has run, it has no direct connection to the variable a; it merely used the name of that variable as a character label for the results.
Many functions in R do this. For example this is similar to how plot(height~weight, data=data.frame(height=runif(10), weight=runif(10))) knows to use the names "weight" and "height" for the axis labels on the plot.
Here's a simple example to show one way this can be accomplished.
paramnames <- function(...) {
as.character(substitute(...()))
}
paramnames(a,b,x)
# [1] "a" "b" "x"
I think the only answer is because the designers wanted it that way. It seems reasonable to label table objects with the names of variables that formed the margins:
> b <- c(1,1,1,2,2,2, 3,3,3)
> table(a, b)
b
a 1 2 3
A 2 1 1
B 1 2 0
C 0 0 2
R was intended as a clone of S, and S was intended as a tool for working statisticians. R also has a handy function for working with table objects, as.data.frame:
> as.data.frame(results)
a Freq
1 A 4
2 B 3
3 C 2
If you want to build a function that performs the same sort of labeling, or that otherwise retrieves the name of the object passed to your function, there is the deparse(substitute(.)) maneuver:
> myfunc <- function(x) { nam <- deparse(substitute(x)); print(nam) }
> myfunc(z)
[1] "z"
> str(z)
Error in str(z) : object 'z' not found
So "z" doesn't even need to exist. Highly "irregular" if you ask me. If you "ask" myfunc what its argument list looks like you get the expected answer:
> formals(myfunc)
$x
But that is a list with an R-name for its single element x. R names are language elements, whereas the names function will retrieve it as a character value, "x", which is not a language element:
> names(formals(myfunc))
[1] "x"
R has some of the aspects of Lisp (interpreted, and usually functional), although the dividing line between its language functions and its data objects seems less porous to me; then again, I'm not particularly proficient in Lisp.
When we have defined tens of functions, probably while developing a new package, it is hard to find the name of a specific variable among the many function names that the ls() command returns.
In most cases we are not looking for a function name (we already know those exist) but for the name we assigned to a variable.
Any idea to solve this is highly appreciated.
If you want a function to do this, you need to play around a bit with the environment that ls() looks in. In normal usage, the implementation below will work by listing objects in the parent frame of the function, which will be the global environment if called at the top level.
lsnofun <- function(name = parent.frame()) {
  obj <- ls(name = name)
  obj[!sapply(obj, function(x) is.function(get(x, envir = name)))]
}
> ls()
[1] "bar" "crossvalidate" "df"
[4] "f1" "f2" "foo"
[7] "lsnofun" "prod"
> lsnofun()
[1] "crossvalidate" "df" "f1"
[4] "f2" "foo" "prod"
I've written this so you can pass in the name argument of ls() if you need to call it from deep within a series of nested function calls.
Note also we need to get() the objects named by ls() when we test if they are a function or not.
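An alternative sketch that avoids the repeated get() calls: eapply() applies a function over every object in an environment, so you can classify everything in one pass (vars_only is a made-up name):

```r
vars_only <- function(envir = globalenv()) {
  is_fun <- eapply(envir, is.function)  # named logical list, one entry per object
  sort(names(Filter(isFALSE, is_fun)))  # keep the names of the non-functions
}

e <- new.env()
e$f <- function(x) x
e$data <- 1:3
vars_only(e)
## [1] "data"
```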
Rather than sorting through the objects in your global environment and trying to separate data objects from functions, it would be better to store the functions in a different environment so that ls() does not list them (by default it only lists things in the global environment). They are still accessible and can be listed if desired.
The best way to do this is to make a package with the functions in it. This is not as hard as it sometimes seems, just use package.skeleton to start.
Another alternative is to use the save function to save all your functions to a file, delete them from the global environment, then use the attach function to attach this file (and therefore all the functions) to the search path.
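A sketch of that save/attach workflow, using a temporary file here so the example is self-contained:

```r
f <- function(x) x + 1
g <- function(x) x * 2

funs <- as.character(lsf.str())       # names of all functions in the workspace
store <- tempfile(fileext = ".RData")
save(list = funs, file = store)       # write the functions out to a file
rm(list = funs)                       # remove them from the global environment
attach(store)                         # put them back on the search path

f(1)  # still callable, but ls() no longer lists f or g
## [1] 2
```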
So perhaps
ls()[!ls() %in% lsf.str()]
Josh O'Brien's suggestion was to use
setdiff(ls(), lsf.str())
That function, after some conversions and checks, calls
x[match(x, y, 0L) == 0L]
which is pretty close to what I suggested in the first place, but is packed nicely in the function setdiff.
The following function lsos was previously posted on Stack Overflow (link); it gives a nice ordering of the objects loaded in your R session based on their size. The output of the function contains the class of each object, which you can subsequently filter on to get the non-function objects.
source("lsos.R")
A <- 1
B <- 1
C <- 1
D <- 1
E <- 1
F <- function(x) print(x)
L <- lsos(n=Inf)
L[L$Type != "function",]
This returns:
> lsos(n=Inf)
Type Size Rows Columns
lsos function 5184 NA NA
F function 1280 NA NA
A numeric 48 1 NA
B numeric 48 1 NA
C numeric 48 1 NA
D numeric 48 1 NA
E numeric 48 1 NA
Or, with the filter, the function F is not returned:
> L[L$Type != "function",]
Type Size Rows Columns
A numeric 48 1 NA
B numeric 48 1 NA
C numeric 48 1 NA
D numeric 48 1 NA
E numeric 48 1 NA
So you just want the variable names, not the functions? This will do that.
ls()[!sapply(ls(), function(x) is.function(get(x)))]
I keep this function in my .Rprofile. I don't use it often, but it's great when I have several environments, functions, and objects in my global environment. Clearly not as elegant as BenBarnes' solution, but I never have to remember the syntax and can just call lsa() as needed. It also lets me list specific environments, e.g. lsa(e).
lsa <- function(envir = .GlobalEnv) {
  obj_type <- function(x) {
    class(get(x, envir = envir))[1]  # look the object up in the target environment
  }
  lis <- data.frame(sapply(ls(envir = envir), obj_type))
  lis$object_name <- rownames(lis)
  names(lis)[1] <- "class"
  names(lis)[2] <- "object"
  rownames(lis) <- NULL  # base-R replacement for plyr::unrowname()
  return(lis)
}