I've got a very simple problem, but I was unable to find a simple solution in R because I'm used to solving such problems by iterating through an incrementing for-loop in other languages.
Let's say I've got a randomly distributed numeric list like:
rand.list <- list(4,3,3,2,5)
I'd like to turn this randomly distributed pattern into a steadily rising (cumulative) one, so the result would look like:
[4,7,10,12,17]
Try using Reduce with the accumulate parameter set to TRUE:
Reduce("+",rand.list, accumulate = T)
I hope this helps.
My first thought was to do cumsum(unlist(rand.list)), where unlist collapses the list into a plain vector. However, a lucky try shows that cumsum(rand.list) also works.
It is not entirely clear to me how this works, as the source of cumsum is just a call to .Primitive, whose internal method dispatch is not easy to investigate further. But I ran another, complementary experiment, as follows:
x <- list(1:2,3:4,5:6)
cumsum(x) ## does not work
x <- list(c(1,2), c(3,4), c(5,6))
cumsum(x) ## does not work
In this case, we have to do cumsum(unlist(x)).
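A short sketch contrasting the two cases (the commented output is what I'd expect):

rand.list <- list(4, 3, 3, 2, 5)
cumsum(rand.list)      # works: every element is a length-1 numeric
#[1]  4  7 10 12 17

x <- list(1:2, 3:4, 5:6)
cumsum(unlist(x))      # flatten first when the elements are longer vectors
#[1]  1  3  6 10 15 21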
This has probably been answered already, and in that case I am sorry to repeat the question, but unfortunately I couldn't find an answer to my problem. I am currently trying to improve the readability of my code and to use functions more frequently, yet I am not that familiar with them.
I have a data.frame in which some columns contain NAs that I want to interpolate, in this case with a simple Kalman filter.
require(imputeTS)

# some test data
col <- c("Temp", "Prec")
df_a <- data.frame(c(10, 13, NA, 14, 17),
                   c(20, NA, 30, NA, NA))
names(df_a) <- col

# this is my function I'd like to use
gapfilling <- function(df, col) {
  print(sum(is.na(df[, col])))
  df[, col] <- na_kalman(df[, col])
}

# this is my for-loop to loop through the columns
for (i in col) {
  gapfilling(df_a, i)
}
I have two problems:
My for loop works, yet it doesn't overwrite the data.frame. Why?
How can I achieve this without a for-loop? As far as I am aware you should avoid for-loops if possible and I am sure it's possible in my case, I just don't know how.
How can I achieve this without a for-loop? As far as I am aware you should avoid for-loops if possible and I am sure it's possible in my case, I just don't know how.
You most definitely do not have to avoid for loops. What you should avoid is using a loop to perform actions that could be vectorized. Loops are in general just fine; they are (much) slower than loops in compiled languages such as C++, but they are comparable to loops in languages such as Python.
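To make that concrete, here is a small sketch (not from the original answer): both versions compute the same thing, but the vectorized one pushes the element-wise work down into compiled code.

x <- rnorm(1e5)

# vectorized: one call, the element-wise loop runs in compiled code
y1 <- x * 2

# explicit R-level loop: same result, just slower
y2 <- numeric(length(x))
for (i in seq_along(x)) {
  y2[i] <- x[i] * 2
}

identical(y1, y2)
#[1] TRUE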
My for loop works, yet it doesn't overwrite the data.frame. Why?
This is a problem with overwriting values within a function, or what is referred to as scope. Basically any assignment is restricted to its current environment (or scope). Take the example below:
f <- function(x){
  a <- x
  cat("a is equal to ", a, "\n")
  return(3)
}
x <- 4
f(x)
a is equal to 4
[1] 3
print(a)
Error in print(a) : object 'a' not found
As you can see, "a" definitely exists inside the function, but it stops existing once the function call has finished. It is restricted to the environment (or scope) of the function, and that environment only lives while the function runs.
To alleviate this, you have to overwrite the value in the global environment:
for (i in col) {
  df_a[, i] <- gapfilling(df_a, i)
}
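Note that for this assignment to work, gapfilling has to hand the filled column back to the caller. In the original function the last expression is itself an assignment, whose value is the right-hand side, so this already happens (invisibly), but it reads more clearly with an explicit return value. A sketch of that adjustment:

gapfilling <- function(df, col) {
  print(sum(is.na(df[, col])))
  na_kalman(df[, col])  # returned, so the caller can assign it back
}

for (i in col) {
  df_a[, i] <- gapfilling(df_a, i)
}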
Now for readability (not speed) one could change this to a lapply
df_a[, col] <- lapply(df_a[, col], na_kalman)
I want to stress that this is not faster than using a loop: lapply still iterates over each column, just as the loop does. A speed-up would only be possible if, say, na_kalman were written to take multiple columns at once, possibly saving time through optimized C or C++ code.
As in other functional languages, returning a function is a common pattern in R. For example, after training a model you'd like to return a "predictor" object, which is essentially a function that, given new data, returns predictions. There are other cases where this is useful, of course.
My question is when the binding (i.e. evaluation) of values within the returned function occurs.
As a simple example, suppose I want a list of three functions, each slightly different based on a parameter whose value I set at the time the function is created. Here is simple code for this:
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) x+i
So now I have three functions. Ideally, the first one computes x+1, the second computes x+2, and the third computes x+3,
so I would expect:
function.list[[1]] (3) = 4
function.list[[2]] (3) = 5
etc.
Unfortunately, this doesn't happen, and all the functions in the list above compute the same x+3. My question is why. Why does the binding of the value of i happen so late, and hence end up the same for all the functions in the list? How can I work around this?
EDIT:
rawr's link to a similar question was insightful, and I thought it solved the problem. Here is the link:
Explain a lazy evaluation quirk
However, I checked the code I gave above with the fix suggested there, and it still doesn't work. Certainly I am missing something very basic here; can anyone tell me what it is? Here is the "fixed" code (that still doesn't work):
function.list = list()
for (i in 1:3) { force(i); function.list[[i]] = function(x) x+i}
Still function.list[[1]] (3) gives 6 and not 4 as expected.
I also tried the following (i.e. putting the force() inside the function):
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) {force(i);x+i}
what's going on?
Here's a solution with a for loop, using R 3.1:
> makeadd=function(i){force(i);function(x){x+i}}
> for (i in 1:3) { function.list[[i]] = makeadd(i)}
> rm(i) # not necessary but causes errors if we're actually using that `i`
> function.list[[1]](1)
[1] 2
> function.list[[2]](1)
[1] 3
The makeadd function creates the adding function in a context with a local i, which is why this works. It would be interesting to know if this works without the force in R 3.2. I always use the force, Luke....
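For completeness, another way to give each closure its own copy of i, without a separate factory function, is local(); this is just a sketch along the same lines, where the environment created by local() plays the role of makeadd's call frame:

function.list <- list()
for (i in 1:3) {
  function.list[[i]] <- local({
    j <- i              # copy the current i into this fresh environment
    function(x) x + j   # the closure captures j, not the shared loop variable
  })
}

function.list[[1]](3)
#[1] 4
function.list[[2]](3)
#[1] 5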
I was wondering if the apply family could be used in R with a regressive input.
Say I have:
apply(MyMatrix,1,MyFunc,MyMatrix)
I know that apply is essentially a loop, so in the above example, could it run one iteration of MyFunc over the first row of MyMatrix, modify MyMatrix globally, and then pick up the modified MyMatrix for the next iteration? I realize that normal loops could be used here, but I just wanted to know if there is a way to do it like this.
Thanks
I don't believe so. Even modifying MyMatrix globally won't change the MyMatrix passed to your function; R functions don't operate that way. Your object behaves as if it were copied when it's passed into a function, so a new instance of it exists inside the call. It's not passed by reference.
Unfortunately, the *apply family of functions are not able to work in this manner. (This has been a frustration to me at times as well, but I've come to appreciate and work with it.)
There are two impediments to this:
The *apply family of functions take the value of MyMatrix as it is when you make the call, iterate over the rows (in this example), and then join the results (based on the dimensions of each output). MyMatrix is not re-evaluated on each iteration.
Even if it did re-evaluate it, MyFunc is only given one row (in this example) at a time, not the whole matrix. (Your second reference to MyMatrix appears to be working around this.)
To do what I think you're saying, your MyFunc function needs to accept as arguments the entire matrix and the row on which you are operating, and return just the row in question, à la:
MyFunc <- function(rownum, mtx) {
  # ...
  mtx[rownum, ]
}
Using that premise, you could do:
for (rr in seq.int(nrow(MyMatrix))) {
  MyMatrix[rr, ] <- MyFunc(rr, MyMatrix)
}
or, if you must stay with the *apply family:
MyMatrix.new <- sapply(seq.int(nrow(MyMatrix)), MyFunc, MyMatrix)
You might want the transpose (t()) of the return from sapply() here.
If MyFunc returns the whole matrix instead of just one row, this can still be done, though a little differently.
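If the real goal is that each iteration should see the matrix as updated by all previous iterations, one loop-free sketch (using the hypothetical row-returning MyFunc above) is Reduce(), which threads the updated matrix through the row indices:

MyMatrix.new <- Reduce(
  function(mtx, rr) {
    mtx[rr, ] <- MyFunc(rr, mtx)  # each step sees the already-updated matrix
    mtx
  },
  seq.int(nrow(MyMatrix)),
  MyMatrix                        # starting value
)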
I know of no way to directly do what you suggest.
I have some functions like this:
myf = function(x) {
  # many similar statements involving indexing x
  do1(x[, indexfunc1()])
  do2(x[, indexfunc1()])
  do3(x[, indexfunc1()])
  do4(x[, indexfunc1()])
  do5(x[, indexfunc1()])
}
In all these functions I need to extract columns or rows of x, and these functions are used inside some loops.
The problem is that sometimes we also have data in a transposed format, which means that for those data we have to call t(x) first. This is very inefficient and time consuming, since these matrices are often huge.
Is there a smart way to deal with this? It would be very annoying to have to change the code manually.
Well, first of all, if your doX functions expect the transpose of the matrix, you are going to be calling t somewhere, for example:
do1(t(x[indexfunc(), ]))
So your options are:
1. Transpose x once at the top.
2. Transpose at each doX call.
3. Rewrite your doX functions so they take an optional isTranspose argument.
Option 3 will be the most work, but also the most efficient. The situation where it would make sense to use option 2 is if x is huge but you are only selecting a small number of rows/columns each time. In that case you could do something like this:
matrixSelect <- function(x, subset, dim = 1) {
  if (dim == 1)
    t(x[subset, ])
  else
    x[, subset]
}
and then write:
myf = function(x, dim = 2) {
  # many similar statements involving indexing x
  do1(matrixSelect(x, indexfunc1(), dim))
  # etc
}
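As a quick sanity check of matrixSelect, a toy example (not part of the original answer):

m <- matrix(1:12, nrow = 3)    # 3 x 4 toy matrix

matrixSelect(m, 1:2, dim = 2)  # columns 1 and 2: a 3 x 2 matrix
matrixSelect(m, 1:2, dim = 1)  # rows 1 and 2, transposed: a 4 x 2 matrix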
This is a curiosity and I highly doubt you can do what I am asking, because the concept is, well, silly. If I were to round something, can it be unrounded?
So:
x <- round(rnorm(10))
x
You have no idea what the original something was; can you get back to the original numbers generated by rnorm?
I ask because when I write functions for users I often put rounding arguments in them to make the display nicer, but I always give the user control of the digits and allow independent control of digit rounding for list objects. That fills a function with digits= arguments very quickly. I would handle these arguments internally if I knew the user could somehow magically re-extract the original values. I could leave the digits as they are, assign a class and use a print method, but for a list that is a pain at best.
If you round the actual data itself, in general you cannot recover it. Instead you should change only the display, using a custom print method or something like options(digits = 3). In the very particular case of random number generation, you could recover the original data if you first set the seed (set.seed), remembered it, and then re-generated the random data from the same seed.
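A small sketch of that seed-based recovery (the seed value is arbitrary):

set.seed(42)                  # remember this seed
x_rounded <- round(rnorm(10)) # the rounded values you hand out

set.seed(42)                  # same seed, same random stream
x_original <- rnorm(10)       # regenerates the unrounded values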
You could use sprintf to just modify how things get printed.
myfun <- function(){
  x <- rnorm(3)
  print(sprintf("%.3f", x))
  invisible(x)
}
out <- myfun()
#[1] "-0.527" "0.226" "-0.168"
out
#[1] -0.5266562 0.2262599 -0.1680460
Since I can't resist doing it the hard way...
x <- runif(100) * 10
z <- round(x, 2)  # the rounded values you hand out
y <- x - z        # the rounding error, kept on the side
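Presumably the point is that keeping the difference y alongside z lets you reconstruct the original values whenever you need them:

all.equal(z + y, x)  # the "unrounded" values come back
#[1] TRUE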