Append element to list in R function - r

I am a newbie in R trying to write a function to add elements to a list.
Below is the code for the function varNames. I can call it with varNames("name1") but "name1" is not added to "listNames" (this still remains as an empty list).
I've been trying a few things, and searching for answers for a long time, with no success.
Also tried lappend, with no success.
listNames<-list()
varNames<- function(name){
listNames <- c(listNames, name)
}

R is a functional language, which generally means that you pass objects to functions and those functions return some object back, which you can do with as you wish. So, your intended result is a function like:
varNames <- function(existinglist, itemtoadd){
returnvalue <- c(existinglist, itemtoadd)
return(returnvalue)
}
listNames <- list()
a <- 'a'
varNames(existinglist = listNames, itemtoadd = a)
If you want to replace your original listNames object with the return value of the function, then you need to assign it into that original object's name:
listNames
listNames <- varNames(existinglist = listNames, itemtoadd = a)
listNames
The way you've originally written your code is a common error among those new to R. You're trying to create what's known as a "side effect". That is, you want to modify your original listNames object in place without using a <- assignment. This is typically considered bad practice and there are relatively few functions in R that produce side effects like that.
To understand this better, you may find the R Introduction on scope and on assignment within functions helpful, as well as Circle 6 of R Inferno.

The problem is with scope. The listNames in the function is local to that function. Essentially, it is a different object from the listNames you want to change.
There are a few ways to get around this:
Change the value of listNames to the output of the function varNames():
listNames <- varNames(name)
Use <<- and get() to change the value of the listNames in the outer scope. This is generally a bad idea as it makes debugging very hard.
Don't encapsulate the c() function in the first place.
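The first two options can be sketched as follows. This is a minimal illustration rather than the answerer's code; the helper names appendName and appendNameOuter are made up for the example.

```r
# Option 1: return the new list and reassign it at the call site.
appendName <- function(existing, name) {
  c(existing, name)   # the value of the last expression is returned
}

listNames <- list()
listNames <- appendName(listNames, "name1")

# Option 2 (discouraged): modify the outer variable with <<-.
# <<- searches the enclosing environments for listNames and assigns there.
appendNameOuter <- function(name) {
  listNames <<- c(listNames, name)
  invisible(NULL)
}

appendNameOuter("name2")
length(listNames)
```

With option 1 the data flow is visible at the call site; with option 2 the function silently depends on a variable named listNames existing outside it.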

Related

How to loop through columns of a data.frame and use a function

This has probably been answered already and in that case, I am sorry to repeat the question, but unfortunately, I couldn't find an answer to my problem. I am currently trying to work on the readability of my code and trying to use functions more frequently, yet I am not that familiar with it.
I have a data.frame and some columns contain NA's that I want to interpolate with, in this case, a simple kalman filter.
require(imputeTS)
#some test data
col <- c("Temp","Prec")
df_a <- data.frame(c(10,13,NA,14,17),
c(20,NA,30,NA,NA))
names(df_a) <- col
#this is my function I'd like to use
gapfilling <- function(df,col){
print(sum(is.na(df[,col])))
df[,col] <- na_kalman(df[,col])
}
#this is my for-loop to loop through the columns
for (i in col) {
gapfilling(df_a, i)
}
I have two problems:
My for loop works, yet it doesn't overwrite the data.frame. Why?
How can I achieve this without a for-loop? As far as I am aware you should avoid for-loops if possible and I am sure it's possible in my case, I just don't know how.
How can I achieve this without a for-loop? As far as I am aware you should avoid for-loops if possible and I am sure it's possible in my case, I just don't know how.
You most definitely do not have to avoid for loops. What you should avoid is using a loop to perform actions that could be vectorized. Loops in R are generally fine; they are (much) slower than loops in compiled languages such as C++, but comparable in speed to loops in languages such as Python.
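As a small sketch of what "could be vectorized" means (the variable names are made up for the example), compare a loop that calls sqrt() element by element with a single call on the whole vector:

```r
x <- c(1, 4, 9, 16)

# Loop version: works, but unnecessary here.
roots_loop <- numeric(length(x))
for (k in seq_along(x)) {
  roots_loop[k] <- sqrt(x[k])
}

# Vectorized version: sqrt() already operates on whole vectors.
roots_vec <- sqrt(x)

identical(roots_loop, roots_vec)
```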
My for loop works, yet it doesn't overwrite the data.frame. Why?
This is a problem with overwriting values within a function, or what is referred to as scope. Basically any assignment is restricted to its current environment (or scope). Take the example below:
f <- function(x){
a <- x
cat("a is equal to ", a, "\n")
return(3)
}
x <- 4
f(x)
a is equal to 4
[1] 3
print(a)
Error in print(a) : object 'a' not found
As you can see, "a" exists while the function is running, but it ceases to exist once the call returns. It is restricted to the environment (or scope) of the function, which here lasts only for the duration of the call.
To fix this, assign the function's return value back to the data frame in the global environment:
for (i in col) {
df_a[, i] <- gapfilling(df_a, i)
}
Now for readability (not speed) one could change this to an lapply call:
df_a[, col] <- lapply(df_a[, col], na_kalman)
I want to stress that this is not faster than using a loop: lapply iterates over each column, just as the loop does. A speed-up would be possible only if, say, na_kalman were written to take multiple columns at once, possibly saving time with optimized C or C++ code.
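A self-contained sketch of the same pattern follows. It assumes a base-R stand-in, fill_with_mean (a hypothetical helper that replaces NA with the column mean), in place of na_kalman so that it runs without imputeTS. Here gapfilling explicitly returns the filled column, which makes the assignment in the loop easier to follow.

```r
# Hypothetical stand-in for na_kalman(): replace NA with the column mean.
fill_with_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

df_a <- data.frame(Temp = c(10, 13, NA, 14, 17),
                   Prec = c(20, NA, 30, NA, NA))
cols <- c("Temp", "Prec")

# The function returns the filled column; the caller decides where to store it.
gapfilling <- function(df, col) {
  fill_with_mean(df[[col]])
}

# Loop version: assign the result back into the data frame.
df_loop <- df_a
for (i in cols) {
  df_loop[[i]] <- gapfilling(df_loop, i)
}

# lapply version: the same work written as one assignment.
df_apply <- df_a
df_apply[cols] <- lapply(df_apply[cols], fill_with_mean)

anyNA(df_loop)
```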

How do I remove an object from within a function environment in R?

How do I remove an object from the current function environment?
I'm trying to achieve this:
foo <- function(bar){
x <- bar
rm(bar, envir = environment())
print(c(x, is.null(bar)))
}
Because I want the function to be able to handle multiple inputs.
Specifically I'm trying to pass either a dataframe or a vector to the function, and if I'm passing a dataframe I want to set the vector to NULL for later error handling.
If you want, you can look at my DepthPlotter script, where I want the second function to check whether depth is a dataframe and, if so, assign it to df instead and remove depth from the environment.
Here is a very brief sketch of how to set this up using S3 method dispatch.
First, you define your generic:
DepthPlotter <- function(depth,...){
UseMethod("DepthPlotter", depth)
}
Then you define methods for specific classes of the argument depth. As a very basic example in your case, you might create only two, a data.frame method and a default method to handle the vector case:
DepthPlotter.default <- function(depth, variable, ...){
#Here you write a function assuming that depth is
# anything but a data frame
}
DepthPlotter.data.frame <- function(depth,...){
#Here you'd write a function that assumes
# that depth is a data frame
}
And then you can call DepthPlotter() using either type of argument and the correct function will be run based upon the result of class(depth).
The example I've sketched out here is a little crude, since I've used a default method to handle the vector case. You could write .numeric and .integer methods to handle numeric or integer vectors more specifically. In my example, the .default method will be called for any case other than data.frame, so if you go this route you'd want to write some code in there that checks for strange cases like depth being a complicated list, or other odd object, if you think there's a chance something like that might be passed to the function.
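To make the dispatch concrete, here is a small runnable sketch. The generic describeDepth and its method bodies are hypothetical and are not taken from the DepthPlotter script; they only show how a data.frame method, a numeric method, and a defensive default method fit together.

```r
# Generic: dispatches on the class of `depth`.
describeDepth <- function(depth, ...) {
  UseMethod("describeDepth")
}

# Data frame method: pull the vector out of a column and dispatch again.
describeDepth.data.frame <- function(depth, column = 1, ...) {
  describeDepth(depth[[column]], ...)
}

# Numeric method: handles both double and integer vectors.
describeDepth.numeric <- function(depth, ...) {
  range(depth, na.rm = TRUE)
}

# Default method: reject anything unexpected with a clear error.
describeDepth.default <- function(depth, ...) {
  stop("unsupported input of class: ", paste(class(depth), collapse = "/"))
}

describeDepth(c(5, 1, 3))
describeDepth(data.frame(depth = c(5, 1, 3)))
```

Because the data frame method simply forwards to the vector method, the function never needs to remove or NULL out an argument to tell the two cases apart.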

returning functions in R - when does the binding occur?

As in other functional languages, returning a function is common in R. For example, after training a model you might want to return a "predictor" object, which is essentially a function that, given new data, returns predictions. There are other cases where this is useful, of course.
My question is: when does the binding (i.e. evaluation) of values within the returned function occur?
As a simple example, suppose I want to have a list of three functions, each is slightly different based on a parameter whose value I set at the time of the creation of the function. Here is a simple code for this:
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) x+i
So now I have three functions. Ideally, the first one returns x+1, the second computes x+2 and the third computes x+3
so I would expect:
function.list[[1]] (3) = 4
function.list[[2]] (3) = 5
etc.
Unfortunately, this doesn't happen: all the functions in the list above compute the same x+3. My question is why. Why is the value of i bound so late, and hence the same for all the functions in the list? How can I work around this?
EDIT:
rawr's link to a similar question was insightful, and I thought it solved the problem. Here is the link:
Explain a lazy evaluation quirk
However, I checked the code I gave above with the fix suggested there, and it still doesn't work. I must be missing something very basic here. Can anyone tell me what it is? Here is the "fixed" code (which still doesn't work):
function.list = list()
for (i in 1:3) { force(i); function.list[[i]] = function(x) x+i}
Still function.list[[1]] (3) gives 6 and not 4 as expected.
I also tried the following (e.g. putting the force() inside the function)
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) {force(i);x+i}
what's going on?
Here's a solution with a for loop, using R 3.1:
> makeadd=function(i){force(i);function(x){x+i}}
> for (i in 1:3) { function.list[[i]] = makeadd(i)}
> rm(i) # not necessary but causes errors if we're actually using that `i`
> function.list[[1]](1)
[1] 2
> function.list[[2]](1)
[1] 3
The makeadd function creates the adding function in a context with a local i, which is why this works. It would be interesting to know if this works without the force in R 3.2. I always use the force, Luke....
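The reason the loop versions in the question fail is that every function created there looks up i in the same environment when it is called, and by then i is 3. The sketch below shows two ways to give each function its own copy. The names make_adder, adders and adders2 are made up for the example.

```r
# Function factory: each call gets its own environment holding its own `i`.
make_adder <- function(i) {
  force(i)              # evaluate the argument now, not at first use
  function(x) x + i
}

adders <- lapply(1:3, make_adder)

# Alternative inside a for loop: local() creates a fresh environment
# on every iteration, and `j` captures the current value of `i`.
adders2 <- list()
for (i in 1:3) {
  adders2[[i]] <- local({
    j <- i
    function(x) x + j
  })
}

adders[[1]](3)
adders2[[2]](3)
```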

Dynamic input in R apply

I was wondering if the apply family could be used in R with a regressive input.
Say I have:
apply(MyMatrix,1,MyFunc,MyMatrix)
I know that apply is essentially a loop, so in the above example could it run one iteration of MyFunc over the first line of MyMatrix modifying MyMatrix globally and then select the modified MyMatrix for the next iteration ? I realize that normal loops could be used here but I just wanted to know if there is a way to do it like this.
Thanks
I don't believe so. Even modifying MyMatrix globally won't change the MyMatrix passed to your function. R functions don't operate that way. Your object is actually copied when it's passed into a function and a new instance of it exists then. It's not done by reference.
Unfortunately, the *apply family of functions are not able to work in this manner. (This has been a frustration to me at times as well, but I've come to appreciate and work with it.)
There are two impediments to this:
The *apply family of functions deal with the value of MyMatrix when you make the call, iterate over the rows (in this example), and then join the results (based on the dimensions of each output). It is not re-evaluated each time.
Even if it did re-evaluate it, MyFunc is only given one row (in this example) at a time, not the whole matrix. (Your second reference to MyMatrix appears to be working around this.)
To do what I think you're describing, your MyFunc function needs to accept as arguments the entire matrix and the row on which you are operating, and return just the row in question, e.g.:
MyFunc <- function(rownum, mtx) {
# ...
mtx[rownum,]
}
Using that premise, you could do:
for (rr in seq.int(nrow(MyMatrix))) {
MyMatrix[rr,] <- MyFunc(rr, MyMatrix)
}
or, if you must stay with the *apply family:
MyMatrix.new <- sapply(seq.int(nrow(MyMatrix)), MyFunc, MyMatrix)
You might want the transpose (t()) of the return from sapply() here.
If MyFunc returns the whole matrix instead of just one row, this can still be done, though a little differently.
I know of no way to directly do what you suggest.
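The difference between the two approaches can be seen in a small sketch. The row function add_previous_row is hypothetical and was chosen only because its result depends on the current state of the matrix.

```r
# Hypothetical row function: new row = current row plus the row above it
# (row 1 is returned unchanged).
add_previous_row <- function(rownum, mtx) {
  if (rownum == 1) return(mtx[rownum, ])
  mtx[rownum, ] + mtx[rownum - 1, ]
}

m <- matrix(1, nrow = 3, ncol = 2)

# for loop: each iteration sees the rows updated by earlier iterations.
m_loop <- m
for (rr in seq_len(nrow(m_loop))) {
  m_loop[rr, ] <- add_previous_row(rr, m_loop)
}

# sapply: every call receives the original matrix, so updates do not carry over.
# sapply() returns one column per call, hence the transpose.
m_apply <- t(sapply(seq_len(nrow(m)), add_previous_row, m))

m_loop
m_apply
```

The third row differs between the two results: the loop adds the already-updated second row, while sapply adds the original one.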

returning different data frames in a function - R

Is it possible to return 4 different data frames from one function?
Scenario:
I am trying to read a file, parse it, and return some parts of the file.
My function looks something like this:
parseFile <- function(file){
carFile <- read.table(file, header=TRUE, sep="\t")
carNames <- carFile[1,]
carYear <- colnames(carFile)
return(list(carFile,carNames,carYear))
}
I don't want to have to use list(carFile,carNames,carYear). Is there a way to return the 3 data frames without returning them in a list first?
R does not support multiple return values. You want to do something like:
foo = function(x,y){return(x+y,x-y)}
plus,minus = foo(10,4)
yeah? Well, you can't. You get an error that R cannot return multiple values.
You've already found the solution - put them in a list and then get the data frames from the list. This is efficient - there is no conversion or copying of the data frames from one block of memory to another.
This is also logical, the return from a function should conceptually be a single entity with some meaning that is transferred to whatever function is calling it. This meaning is also better conveyed if you name the returned values of the list.
You could use a technique to create multiple objects in the calling environment, but when you do that, kittens die.
Note that in your example carYear isn't a data frame; it's a character vector of column names.
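A minimal sketch of the named-list approach, using the built-in mtcars data set as a stand-in for the file being parsed (the function name parse_cars and the element names are made up for the example):

```r
# Return several objects as one named list.
parse_cars <- function() {
  list(
    data  = mtcars,            # the full data frame
    first = mtcars[1, ],       # its first row, still a data frame
    cols  = colnames(mtcars)   # a character vector of column names
  )
}

res <- parse_cars()
res$cols            # access by name instead of by position
head(res$data, 2)
```

Naming the elements means the caller does not have to remember the order in which they were returned.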
There are other ways you could do that, if you really really want, in R.
assign('carFile',carFile,envir=parent.frame())
If you use that, then carFile will be created in the calling environment. As Spacedman indicated you can only return one thing from your function and the clean solution is to go for the list.
In addition, my personal opinion is that if you find yourself in a situation where you feel you need to return multiple dataframes from one function, or do something that no one has ever done before, you should really revisit your approach. In most cases you could find a cleaner solution, perhaps with an additional function, or with the recommended approach (i.e. a list).
In other words, envir=parent.frame() will do the job but, as Spacedman mentioned, "when you do that, kittens die".
The zeallot package does what you need, in a way similar to how Python unpacks multiple values returned from a function. Reproducible example below.
parseFile <- function(){
carMPG <- mtcars$mpg
carName <- rownames(mtcars)
carCYL <- mtcars$cyl
return(list(carMPG,carName,carCYL))
}
library(zeallot)
c(myFile, myName, myYear) %<-% parseFile()
