R super assignment vector

I have a function in which I use the superassignment operator to update a variable in the global environment. This works fine as long as it is a single value, e.g.
a <<- 3
However I get errors with subsets of data frames and data tables e.g.
a <- c(1,2,3)
a[3] <<- 4
Error in a[3] <<- 4 : object 'a' not found
Any idea why this is and how to solve it?
Thanks!

The superassignment operator and other scope-breaking techniques should be avoided if at all possible, in particular because they make for unclear code and confusing situations like these. But if you really, truly had to assign values to a variable that is out of scope, you could use standard assignment inside eval:
a <- c(1,2,3)
eval(a[3] <- 4, envir = -1)
a
[1] 1 2 4
To generalize this further (if performing the assignment inside a function), you may need to use <<- inside eval anyway.
While changing variables out of scope is still a bad idea, using eval at least makes the operation more explicit, since you have to specify the environment in which the expression is to be evaluated.
All that said, scope-breaking assignments are never necessary, per se, and you should perhaps find a way to write your script such that this is not relied on.
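A minimal sketch of where `a[3] <<- 4` does work, offered as an illustration rather than a recommendation. As far as I understand it, a subset superassignment first looks up the existing `a` starting in the enclosing environment of the one where it runs, so `a` has to exist there already. The names `make_store`, `set_third`, and `current` are hypothetical helpers invented for this example; the same rule applies when the enclosing environment is the global one.

```r
# Hypothetical helper: keeps a vector `a` in an enclosing environment.
make_store <- function() {
  a <- c(1, 2, 3)                  # lives in the environment of this make_store() call

  set_third <- function(value) {
    a[3] <<- value                 # `a` is found one level up and modified there
    invisible(a)
  }

  current <- function() a          # read the stored vector back

  list(set_third = set_third, current = current)
}

store <- make_store()
store$set_third(4)
store$current()
# [1] 1 2 4
```

Because `a` is defined one level above `set_third`, the lookup succeeds. Running `a[3] <<- 4` in the same environment that defines `a` fails, because the search starts one level further out.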

Related

Attempting Pass by reference in R

I want to emulate call by reference in R and in my search came across this link https://www.r-bloggers.com/call-by-reference-in-r/.
Using the strategy given in the link above I tried to create a function that would modify the integer vector passed to it as well as return the modified vector. Here's its implementation
library(purrr)
fun = function(top){
stopifnot(is_integer(top))
top1 <- top
top1 <- c(top1,4L)
eval.parent(substitute(top<-top1))
top1
}
When I create a variable and pass to this function, it works perfectly as shown
> k <- c(9L,5L)
> fun(k)
[1] 9 5 4
> k
[1] 9 5 4
But when I pass the integer vector directly, it throws an error:
> fun(c(3L,4L))
Error in c(3L, 4L) <- c(3L, 4L, 4L) :
target of assignment expands to non-language object
Is there a workaround for this situation, so that when a vector is passed directly, the function simply returns the modified vector as its result?
Any help would be appreciated...
There is no workaround for this. You've essentially created a function that takes a variable name as input and modifies that variable as a side effect of running. Because c(3L,4L) is not a variable name, the function cannot work as intended.
To be clear, what you have right now is not really pass-by-reference. Your function resembles it superficially, but is in fact using some workarounds to simply evaluate an expression in the function's parent environment, instead of its own. This type of "operation by side effect" is generally considered bad practice (such changes are hard to track and debug, and prone to error), and R is built to avoid them.
Pass-by-reference in R is generally not possible, nor have I found it necessary in over a decade of daily R use.
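A short sketch of the side-effect-free alternative implied above: return the modified vector and let the caller decide whether to reassign it. The name `append_four` is hypothetical, and base `is.integer` is used here in place of purrr's `is_integer` so the example needs no packages.

```r
# Hypothetical replacement for fun(): no modification of the caller's variables.
append_four <- function(top) {
  stopifnot(is.integer(top))
  c(top, 4L)                       # return a new vector; `top` itself is untouched
}

k <- c(9L, 5L)
k <- append_four(k)                # the caller reassigns explicitly
k
# [1] 9 5 4

append_four(c(3L, 4L))             # a literal vector works too
# [1] 3 4 4
```

Both use cases from the question are covered, and every change to `k` is visible at the call site.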

How to loop through columns of a data.frame and use a function

This has probably been answered already and in that case, I am sorry to repeat the question, but unfortunately, I couldn't find an answer to my problem. I am currently trying to work on the readability of my code and trying to use functions more frequently, yet I am not that familiar with it.
I have a data.frame in which some columns contain NAs that I want to interpolate with, in this case, a simple Kalman filter.
require(imputeTS)
#some test data
col <- c("Temp","Prec")
df_a <- data.frame(c(10,13,NA,14,17),
c(20,NA,30,NA,NA))
names(df_a) <- col
#this is my function I'd like to use
gapfilling <- function(df,col){
print(sum(is.na(df[,col])))
df[,col] <- na_kalman(df[,col])
}
#this is my for-loop to loop through the columns
for (i in col) {
gapfilling(df_a, i)
}
I have two problems:
My for loop works, yet it doesn't overwrite the data.frame. Why?
How can I achieve this without a for-loop? As far as I am aware you should avoid for-loops if possible and I am sure it's possible in my case, I just don't know how.
How can I achieve this without a for-loop? As far as I am aware you should avoid for-loops if possible and I am sure it's possible in my case, I just don't know how.
You most definitely do not have to avoid for loops. What you should avoid is using a loop to perform actions that could be vectorized. Loops are in general just fine; they are (much) slower than loops in compiled languages such as C++, but comparable to loops in languages such as Python.
My for loop works, yet it doesn't overwrite the data.frame. Why?
This is a problem with overwriting values within a function, or what is referred to as scope. Basically any assignment is restricted to its current environment (or scope). Take the example below:
f <- function(x){
a <- x
cat("a is equal to ", a, "\n")
return(3)
}
x <- 4
f(x)
a is equal to 4
[1] 3
print(a)
Error in print(a) : object 'a' not found
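As you can see, "a" definitely exists while f is running, but it stops existing once the function call has returned. It is restricted to the environment (or scope) of the function. Here the scope effectively lasts only as long as the function is running.
To get around this, assign the value returned by the function back to the data frame in the calling (here, global) environment. This works because the last expression in gapfilling is an assignment, whose value (the imputed column) is returned invisibly: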
for (i in col) {
df_a[, i] <- gapfilling(df_a, i)
}
Now for readability (not speed) one could change this to an lapply call:
df_a[, col] <- lapply(df_a[, col], na_kalman)
I want to stress that this is not faster than using a loop: lapply iterates over each column, just as a loop would. A speed-up could be obtained if, say, na_kalman were written to take multiple columns at once, possibly saving time through optimized C or C++ code.
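A self-contained sketch of the same pattern with the function returning the whole modified data frame, so the caller only has to reassign once. To avoid depending on imputeTS, `fill_with_mean` is a hypothetical stand-in for `na_kalman` (it replaces NAs with the column mean, which is not Kalman smoothing), and `gapfill_columns` is likewise an invented name.

```r
# Hypothetical stand-in for na_kalman(): replace NAs with the column mean.
fill_with_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Fill the chosen columns and return the modified copy of the data frame.
gapfill_columns <- function(df, cols) {
  df[cols] <- lapply(df[cols], fill_with_mean)
  df                               # explicit return value; the caller's df is unchanged
}

df_a <- data.frame(Temp = c(10, 13, NA, 14, 17),
                   Prec = c(20, NA, 30, NA, NA))

df_filled <- gapfill_columns(df_a, c("Temp", "Prec"))
df_filled$Temp
# [1] 10.0 13.0 13.5 14.0 17.0
df_filled$Prec
# [1] 20 25 30 25 25
```

The original `df_a` still contains its NAs afterwards. Only the returned copy is filled, which is the scoping behaviour described above.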

not error, but not results either in R

I am trying to write a function in R that calculates the mean of nitrate, sulfate and ID. My original data frame has 4 columns (date, nitrate, sulfate, ID), so I wrote the following code:
prueba<-read.csv("C:/Users/User/Desktop/coursera/001.csv",header=T)
columnmean<-function(y, removeNA=TRUE){ #y will be a matrix
whichnumeric<-sapply(y, is.numeric)#which columns are numeric
onlynumeric<-y[ , whichnumeric] #selecting just the numeric columns
nc<-ncol(onlynumeric) #length of onlynumeric
means<-numeric(nc)#empty vector for the means
for(i in 1:nc){
means[i]<-mean(onlynumeric[,i], na.rm = TRUE)
}
}
columnmean(prueba)
When I run the steps line by line on my data, outside the function, I get the mean values. But when I call the function so that it performs all the steps by itself, it raises no error, yet it does not return any value either; my environment contains only the data frame 'prueba' and the function columnmean.
What am I doing wrong?
A reproducible example would be nice (although not absolutely necessary in this case).
You need a final line return(means) at the end of your function. (Some old-school R users maintain that means alone is OK - R automatically returns the value of the last expression evaluated within the function whether return() is specified or not - but I feel that using return() explicitly is better practice.)
colMeans(y[sapply(y, is.numeric)], na.rm=TRUE)
is a slightly more compact way to achieve your goal (although there's nothing wrong with being a little more verbose if it makes your code easier for you to read and understand).
The result of an R function is the value of the last expression. Your last expression is:
for(i in 1:nc){
means[i]<-mean(onlynumeric[,i], na.rm = TRUE)
}
It may seem strange that the value of that expression is NULL, but that's the way it is with for-loops in R. The means vector does get changed sequentially, which means that BenBolker's advice to use return(.) is correct (as his advice almost always is). For-loops in R are a notable exception to the functional programming paradigm. They provide a mechanism for looping (as do the various *apply functions), but the commands inside the loop exert their effects in the calling environment via side effects (unlike the apply functions).
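A runnable sketch of the corrected function, using a small made-up data frame in place of the CSV file (the values in `toy` and the name `column_mean` are invented for illustration). It also shows directly that a for loop evaluates to NULL.

```r
column_mean <- function(y) {
  # keep only the numeric columns; drop = FALSE keeps a data frame even for one column
  only_numeric <- y[, sapply(y, is.numeric), drop = FALSE]
  means <- numeric(ncol(only_numeric))
  for (i in seq_along(only_numeric)) {
    means[i] <- mean(only_numeric[[i]], na.rm = TRUE)
  }
  names(means) <- names(only_numeric)
  return(means)                    # without this line the function returns NULL
}

# Made-up data with the same column layout as described in the question
toy <- data.frame(Date    = c("2003-01-01", "2003-01-02", "2003-01-03"),
                  nitrate = c(1, NA, 3),
                  sulfate = c(2, 4, NA),
                  ID      = c(1L, 1L, 1L))

result <- column_mean(toy)
result
# nitrate sulfate      ID
#       2       3       1

loop_value <- for (i in 1:3) i     # the value of a for loop itself
is.null(loop_value)
# [1] TRUE
```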

returning functions in R - when does the binding occur?

As in other functional languages, returning a function is a common case in R. for example, after training a model you'd like to return a "predictor" object, which is essentially a function, that given new data, returns predictions. There are other cases when this is useful, of course.
My question is when the binding (i.e. evaluation) of values within the returned function occurs.
As a simple example, suppose I want to have a list of three functions, each is slightly different based on a parameter whose value I set at the time of the creation of the function. Here is a simple code for this:
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) x+i
So now I have three functions. Ideally, the first one returns x+1, the second computes x+2 and the third computes x+3
so I would expect:
function.list[[1]] (3) = 4
function.list[[2]] (3) = 5
etc.
Unfortunately, this doesn't happen, and all the functions in the list above compute the same x+3. My question is: why? Why is the value of i bound so late, and hence the same for all the functions in the list? How can I work around this?
EDIT:
rawr's link to a similar question was insightful, and I thought it solved the problem. Here is the link:
Explain a lazy evaluation quirk
However, I checked the code I gave above with the fix suggested there, and it still doesn't work. Certainly, I am missing something very basic here. Can anyone tell me what it is? Here is the "fixed" code (that still doesn't work):
function.list = list()
for (i in 1:3) { force(i); function.list[[i]] = function(x) x+i}
Still, function.list[[1]] (3) gives 6 and not 4 as expected.
I also tried the following (i.e. putting the force() inside the function):
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) {force(i);x+i}
what's going on?
Here's a solution with a for loop, using R 3.1:
> makeadd=function(i){force(i);function(x){x+i}}
> for (i in 1:3) { function.list[[i]] = makeadd(i)}
> rm(i) # not necessary but causes errors if we're actually using that `i`
> function.list[[1]](1)
[1] 2
> function.list[[2]](1)
[1] 3
The makeadd function creates the adding function in a context with a local i, which is why this works. It would be interesting to know if this works without the force in R 3.2. I always use the force, Luke....
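The same idea as a compact sketch without the explicit loop, using a function factory with `lapply`. The names `make_adder` and `adders` are hypothetical. Each call to the factory creates a fresh environment with its own `i`, whereas functions defined directly in the for loop all share the single `i` of the environment in which the loop runs.

```r
# Function factory: each call gets its own environment holding its own `i`.
make_adder <- function(i) {
  force(i)                         # evaluate the argument now, not at first use
  function(x) x + i
}

adders <- lapply(1:3, make_adder)

adders[[1]](3)
# [1] 4
adders[[2]](3)
# [1] 5
adders[[3]](3)
# [1] 6
```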

Why is it not possible to assign contrasts using with() or transform() in R?

I've been trying to learn more about environments in R. Through reading, it seemed that I should be able to use functions like with() and transform() to modify variables in a data.frame as if I was operating within that object's environment. So, I thought the following might work:
X <- expand.grid(
Cond=c("baseline","perceptual","semantic"),
Age=c("child","adult"),
Gender=c("male","female")
)
Z <- transform(X,
contrasts(Cond) <- cbind(c(1,0,-1)/2, c(1,-2,1))/4,
contrasts(Age) <- cbind(c(-1,1)/2),
contrasts(Gender) <- cbind(c(-1,1)/2)
)
str(Z)
contrasts(Z$Cond)
But it does not. I was hoping someone could explain why. Of course, I understand that contrasts(X$Cond) <- ... would work, but I'm curious about why this does not.
In fact, this does not work either [EDIT: false, this does work. I tried this quickly before posting originally and did something wrong]:
attach(X)
contrasts(Cond) <- cbind(c(1,0,-1)/2, c(1,-2,1))/4
contrasts(Age) <- cbind(c(-1,1)/2)
contrasts(Gender) <- cbind(c(-1,1)/2)
detach(X)
I apologize if this is a "RTFM" sort of thing... it's not that I haven't looked. I just don't understand. Thank you!
[EDIT: Thank you joran---within() instead of with() or transform() does the trick! The following syntax worked.]
Z <- within(X, {
contrasts(Cond) <- ...
contrasts(Age) <- ...
contrasts(Gender) <- ...
}
)
transform is definitely the wrong tool, I think. And you don't want with, you probably want within, in order to return the entire object:
X <- within(X,{contrasts(Cond) <- cbind(c(1,0,-1)/2, c(1,-2,1))/4
contrasts(Age) <- cbind(c(-1,1)/2)
contrasts(Gender) <- cbind(c(-1,1)/2)})
The only tricky part here is to remember the curly braces to enclose multiple lines in a single expression.
Your last example, using attach, works just fine for me.
transform is only set up to evaluate expressions of the form tag = value, and because of the way it evaluates those expressions, it isn't really set up to modify attributes of a column. It is more intended for direct modifications to the columns themselves. (Scaling, taking the log, etc.)
The difference between with and within is nicely summed up by the Value section of ?within:
Value: For with, the value of the evaluated expr. For within, the modified object.
So with only returns the result of the expression. within is for modifying an object and returning the whole thing.
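A tiny sketch of that difference on a made-up data frame (`df` and `df2` are names chosen for this example):

```r
df <- data.frame(x = 1:3)

# with(): evaluates the expression and returns its value only
doubled <- with(df, x * 2)
doubled
# [1] 2 4 6

# within(): evaluates the expression and returns a modified copy of the data
df2 <- within(df, y <- x * 2)
names(df2)
# [1] "x" "y"

names(df)                          # the original is left untouched
# [1] "x"
```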
While I agree with @joran that within is the best strategy here, I will point out that it is possible to use transform; you just need to do so in a different way:
Z <- transform(X,
Cond = `contrasts<-`(Cond, value=cbind(c(1,0,-1)/2, c(1,-2,1))/4),
Age = `contrasts<-`(Age, value=cbind(c(-1,1)/2)),
Gender= `contrasts<-`(Gender, value=cbind(c(-1,1)/2))
)
Here we are explicitly calling the replacement function `contrasts<-`, which is what R invokes when you run contrasts(a)=b. It returns the modified object, which can be used with the a=b format that transform expects. And of course it leaves X unchanged.
The within solution looks much cleaner of course.
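A brief sketch showing that the assignment form and the explicit call to the replacement function give the same result. It uses a standalone factor and the contrast matrix from the question; the variable names `cond`, `m`, `a`, and `b` are chosen for this example.

```r
cond <- factor(c("baseline", "perceptual", "semantic"))
m <- cbind(c(1, 0, -1) / 2, c(1, -2, 1)) / 4

# Usual assignment form
a <- cond
contrasts(a) <- m

# Explicit call to the replacement function; returns a modified copy
b <- `contrasts<-`(cond, value = m)

identical(contrasts(a), contrasts(b))
# [1] TRUE

is.null(attr(cond, "contrasts"))   # the original factor is left untouched
# [1] TRUE
```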

Resources