Using merge.zoo to dynamically create variables in R - r

I'm trying to create a function that automatically creates polynomials of a zoo object. Coming from Python, the typical way to do it is to create a list outside a for loop and then append to the list inside the loop. Following this, I wrote the code below in R:
library("zoo")
example<-zoo(2:8)
polynomial<-function(data, name, poly) {
##creating the catcher object that the polynomials will be attached to
returner<-data
##running the loop
for (i in 2:poly) {
#creating the polynomial
poly<-data^i
##print(paste(name, i), poly) ##done to confirm that paste worked correctly##
##appending the returner object
merge.zoo(returner, assign(paste(name, i), poly))
}
return(returner)
}
#run the function
output<-polynomial(example, "example", 4)
However, when I run the function, R throws no exceptions, but the output object does not have any additional data beyond what I originally created in the example zoo object. I suspect I'm misunderstanding merge.zoo, or perhaps I'm not allowed to dynamically reassign the names of the polynomials inside the loop.
Thoughts?

As for the error in your code: you never assign the result of merge.zoo back to returner, so each merged object is discarded.
However, I think there is a better way to achieve what you want.
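library(zoo)
example <- zoo(2:8)
polynomial <- function(data, name, poly) {
  # one column per power, from data^1 up to data^poly
  res <- zoo(sapply(1:poly, function(i) data^i))
  # use 1:poly (not a hard-coded 1:4) so the names match the number of columns
  names(res) <- paste(name, 1:poly)
  return(res)
}
polynomial(example, "example", 4)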
## example 1 example 2 example 3 example 4
## 1 2 4 8 16
## 2 3 9 27 81
## 3 4 16 64 256
## 4 5 25 125 625
## 5 6 36 216 1296
## 6 7 49 343 2401
## 7 8 64 512 4096
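For comparison, the missing assignment in the original loop can be sketched as follows. This is only a minimal illustration, assuming the zoo package is installed and that poly is at least 2; polynomial_loop is a name introduced here, not something from the question.

```r
library(zoo)

# Hypothetical loop-based variant: the key change is assigning the
# result of merge() back to `returner` on every iteration.
polynomial_loop <- function(data, name, poly) {
  returner <- data
  for (i in 2:poly) {                      # assumes poly >= 2
    returner <- merge(returner, data^i)    # keep the merged object
  }
  # name the columns once, after the loop
  colnames(returner) <- paste(name, 1:poly)
  returner
}

example <- zoo(2:8)
output <- polynomial_loop(example, "example", 4)
dim(output)
```

Naming the columns after the loop avoids assign() entirely; the dynamically created variable names were never needed to build the result.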

Related

How to speed up this function (for n parameters) in R?

I have this function:
col <- 0
rres <- data.frame(matrix(nrow=nrow(ind),ncol=length(lt)))
gig <- NULL
> lt
[1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
> delta.1
[1] 5 7 9 10 12 15 17 20 22 26 29 34 39 46 54 68 96 138 138
> f.bio
function(x,y,a,b,l,k,m)
{
for (t in 1:nrow(y)){
for (i in 1:length(lt)){
for(j in 1:delta.1[i]){
ifelse (t+j-1>nrow(x),gig[j]<- NA,
gig[j] <- x[t+j-1,i]*
(a*(l-(((l-(lt[i]+1))/(exp(-k*((j-1)/12))))))
^b)*exp(m[(1+j),i]*(j-1)))
}
rres[t,i] <- sum(gig, na.rm = TRUE)
}
result <- apply(rres,1,function(x) sum(x)/1000000)
}
return(result)
}
which is applied to some biological data. The call is:
f.bio(ind,eff,a_all,b_all,Linf,K_coef,mort)
where the arguments are:
> dim(ind)
[1] 1356 19
> dim(eff)
[1] 1356 1
a_all = 0.004
b_all= 3
Linf= 19.4
K_coef = 0.57
> dim(mort)
[1] 110 19
ind, eff, and mort are data.frame.
Now, my question is: is it possible to apply this function over n parameter values without excessive machine time?
By n parameters I mean a distribution of values for a given parameter, for example:
set.seed(1)
a_all_v <- round(sort(rnorm(40,a_all,0.00034)),5) #40 values!!
and so on for the four parameters: a_all, b_all, K_coef, Linf.
I wrote this code with a loop (in this loop I can combine a_all with b_all, and Linf with K_coef):
col <- 0
for (m1 in 1:length(a_all_v)){
a_all <- a_all_v[m1]
b_all <- b_all_v[m1]
for(m2 in 1:length(Linf_v)){
Linf <- Linf_v[m2]
K_coef <- k_coef_v[m2]
col <- col+1
res.temp <-f.bio(ind,eff,a_all,b_all,Linf,K_coef,mort)
res.2[,col] <-res.temp
}
}
where res.2 is:
res.2 <- data.frame(matrix(nrow=1356,ncol=1600)) #1600=40*40 (number of values for each parameters distribution)
This loop takes a lot of machine time (many days on my PC). For this reason, is there some package or function (like Monte Carlo or bootstrap) that can change my code structure and run the function over a good number of parameter combinations in little time (if that is possible)?
If you keep your current setup with for loops, you need to start preallocating your output objects. For example, you start with an empty gig (NULL) and fill it iteratively. The way you do it right now, gig has to be rebuilt every time it grows, and reallocating memory is a very expensive operation. Simply making gig as large as it needs to be up front and then doing the assignments will speed up your code tremendously.
Even better is to solve your problem without for loops (which are notoriously slow, even with preallocation) and use either:
Vectorisation or matrix calculations. These will be orders of magnitude faster.
dplyr or data.table. If used smartly, these will also be much faster, but vectorisation is probably faster still.
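The difference between the three approaches can be sketched on a toy calculation. This is not the f.bio function from the question; the expression and the names grown, prealloc and vectorised are made up for illustration, borrowing only the k value and the loop length of 138.

```r
k <- 0.57
n <- 138

# 1. Growing a vector from NULL: R has to extend `grown` as it fills.
grown <- NULL
for (j in seq_len(n)) grown[j] <- exp(-k * (j - 1) / 12)

# 2. Preallocating: the vector has its final length before the loop starts.
prealloc <- numeric(n)
for (j in seq_len(n)) prealloc[j] <- exp(-k * (j - 1) / 12)

# 3. Vectorised: no explicit loop, the arithmetic runs over the whole index.
j <- seq_len(n)
vectorised <- exp(-k * (j - 1) / 12)

# All three give the same numbers.
all.equal(grown, prealloc)
all.equal(prealloc, vectorised)
```

For the real function, the inner loop over j would be the natural first candidate for the third form, since each gig[j] depends only on j and not on earlier elements.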

When used inside a function, transform.zoo can't access variables

I found out that zoo's transform is not able to use additional (i.e. not part of the zoo object) variables when used in a function body.
Let me explain:
I entered the following code at the prompt to create a small two-column zoo object z and to add a new column calculated from an existing column and a variable x:
library(zoo)
z <- zoo(matrix(1:10, ncol=2, dimnames=list(NULL, c("a", "b"))), order.by=2001:2005)
x <- 2
transform(z, c = x*a)
I got the desired result, a zoo object with a new column c. No problem here.
Now I'd like to use transform in a function body; the variable for the calculation is passed as a parameter to the function:
rm(x)
f <- function(data, x) { transform(data, c = x*a) }
f(z, 2)
This stops with Error in eval(expr, envir, enclos) (from #1) : object 'x' not found. If I assign x <- 2 at the prompt, it works (therefore the rm(x) above).
With dataframes (i.e. transform.data.frame) there is no problem.
I think that when transform.zoo calls transform.data.frame, the bindings of the formals of f are lost. I don't understand R's environments well enough to find out what exactly is wrong here.
Edited to add: transform cannot see the formals, nor any variables defined inside the function body.
Is there a way to make transform see x? (I know I could work without transform, but it's a nice tool for short, succinct code.)
I think the best advice is: Don't do that!
transform(), with() and subset() are really syntactic sugar for use at the top level, to make data manipulation code somewhat easier to write. If you are writing functions you should use the general replacement functions [<- and [[<-, depending on what you are doing.
If you don't believe me, see the Warning in ?transform:
Warning:
This is a convenience function intended for use interactively.
For programming it is better to use the standard subsetting
arithmetic functions, and in particular the non-standard
evaluation of argument ‘transform’ can have unanticipated
consequences.
What I mean by using [<- or [ or other functions is to write f like this
f <- function(obj, x) {
cd <- coredata(obj)
cd <- cbind(cd, c = x * cd[, "a"])
zoo(cd, index(obj), attr(obj, "frequency"))
}
f(z, 2)
Which gives the desired result
> transform(z, c = x*a)
a b c
2001 1 6 2
2002 2 7 4
2003 3 8 6
2004 4 9 8
2005 5 10 10
> f(z, 2)
a b c
2001 1 6 2
2002 2 7 4
2003 3 8 6
2004 4 9 8
2005 5 10 10
f is complicated because coredata(obj) is a matrix. It might be neater to write:
f2 <- function(obj, x) {
cd <- as.data.frame(coredata(obj))
cd[, "c"] <- x * cd[, "a"] ## or cd$c <- x * cd$a
zoo(cd, index(obj), attr(obj, "frequency"))
}
f2(z, 2)
> f2(z, 2)
a b c
2001 1 6 2
2002 2 7 4
2003 3 8 6
2004 4 9 8
2005 5 10 10
To get this behaviour inside a function you really need to understand environments and evaluation frames. In fact you can't do it with transform() itself; you would need to learn to use eval(), which is what transform() calls internally, and specify the correct values for envir (the environment in which to evaluate) and enclos (the enclosure). See ?eval.
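As a rough sketch of what specifying envir and enclos looks like, here is a small helper shown on a plain data frame rather than a zoo object. The name add_c is hypothetical and introduced only for this illustration; it is not part of base R or zoo.

```r
# Hypothetical helper: adds a column `c` computed from `expr`.
# Names in `expr` are looked up first in `data`, then in the caller's frame.
add_c <- function(data, expr) {
  caller <- parent.frame()     # frame of the function that called add_c()
  e <- substitute(expr)        # capture the expression without evaluating it
  data$c <- eval(e, envir = data, enclos = caller)
  data
}

# `x` exists only as an argument of f(), yet the expression can see it,
# because f()'s frame is passed to eval() as the enclosure.
f <- function(data, x) add_c(data, x * a)

df <- data.frame(a = 1:5, b = 6:10)
out <- f(df, 2)
out
```

The same caveat as for transform() applies: a column of data with the same name as a local variable would shadow that variable, which is one reason plain subsetting is usually the safer choice inside functions.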

bestfit nonlinear to a list of dataframes

I have a list of data frames and I would like to calculate the nonlinear bestfit to each dataframe in the list and to get a list with the best fit objects. I am trying to use lapply but I am having problems with the parameters.
# define a function for D
fncTtr <- function(n,d) (d/n)*((sqrt(1+2*(n/d))-1))
# define a function for best fit
bestFitD <- function(dat,fun) {
res <- nls(dat$ttr~fun(n,d),data=dat,start=list(d=25),trace=T)
return(res)
}
resL <- lapply(dData2,bestFitD,dat=dData2,fun=fncTtr)
When I execute this code I am getting the following error:
Error in FUN(X[[1L]], ...) : unused argument(s) (X[[1]])
I want lapply to go through each data frame in dData2 and execute the bestFitD function. How should I define the parameters for the function in lapply? The dData2 object is a list. I am using it as a parameter for bestFitD, but this function expects a single data frame as a parameter. How can I define this parameter? When I execute bestFitD by itself with one data frame, it runs correctly. Here is an example of a dData2 list whose components are data frames:
$`1`
n ttr d id
1 35 0.6951 27.739 1
2 36 0.6925 28.072 1
3 37 0.6905 28.507 1
4 38 0.6887 28.946 1
5 39 0.6790 28.003 1
6 40 0.6703 27.247 1
7 41 0.6566 25.735 1
8 42 0.6605 26.981 1
9 43 0.6567 27.016 1
10 44 0.6466 26.026 1
11 45 0.6531 27.667 1
12 46 0.6461 27.128 1
13 47 0.6336 25.751 1
14 48 0.6225 24.636 1
15 49 0.6214 24.992 1
16 50 0.6248 26.011 1
$`2`
n ttr d id
17 35 0.6951 27.739 2
18 36 0.6925 28.072 2
19 37 0.6905 28.507 2
20 42 0.6605 26.981 2
The following code seems to be ok:
res <- bestFitD(dData2[[1]],fncTtr)
but when I execute the following:
res <- bestFitD(dData2[[2]],fncTtr)
I am getting the following error:
Error in model.frame.default(formula = ~dat + ttr + n, data = dat) :
invalid type (list) for variable 'dat'
Why? Both are data frames!
But it seems that there is something strange with the second component!
Just get rid of the dat$ in your nls function call. I believe it's looking for dat$dat$ttr, which obviously will break. That is, your bestFitD function should be:
bestFitD <- function(dat,fun) {
res <- nls(ttr~fun(n,d),data=dat,start=list(d=25),trace=T)
return(res)
}
Now, call using lapply as:
resL <- lapply(dData2, bestFitD, fun = fncTtr)
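To see how lapply hands each list element to the first argument and forwards fun by name, here is a self-contained sketch on simulated data. The data frame toy, the list dToy, the true value d = 27 and the noise level are all made up for illustration; they are not the poster's data.

```r
fncTtr <- function(n, d) (d / n) * (sqrt(1 + 2 * (n / d)) - 1)

# Simulated stand-in for dData2: two groups, ttr generated with d = 27
# plus a little noise (nls should not be run on exact, zero-residual data).
set.seed(42)
toy <- data.frame(n = rep(35:50, times = 2), id = rep(1:2, each = 16))
toy$ttr <- fncTtr(toy$n, 27) + rnorm(nrow(toy), sd = 0.005)
dToy <- split(toy, toy$id)     # a list of two data frames

bestFitD <- function(dat, fun) {
  # columns are referred to by bare name; `data = dat` tells nls where to look
  nls(ttr ~ fun(n, d), data = dat, start = list(d = 25))
}

# Each element of dToy becomes `dat`; `fun` is passed through unchanged.
resToy <- lapply(dToy, bestFitD, fun = fncTtr)
estimates <- sapply(resToy, coef)
estimates
```

Because dat is supplied positionally by lapply, it must not be passed again by name, which is what triggered the "unused argument" error in the question.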
This should work:
resL <- lapply(dData2, function(x,fun){
bestFitD(x,fun)
},fun='fncTtr')
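where I rewrote bestFitD using do.call, so that fun can be given as a character string:
bestFitD <- function(dat,fun){
  # assign the fit so that there is a result to return
  res <- nls(ttr~do.call(fun,list(n,d)), data=dat,
             start=list(d=25),trace=T)
  res
}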
0.003237457 : 25
0.0009393089 : 26.77943
0.0009362902 : 26.84895
0.0009362902 : 26.84898
0.001282807 : 25
4.771935e-05 : 27.64267
4.389588e-05 : 27.80729
4.389584e-05 : 27.80781
EDIT
My solution can be simplified to the following (similar, but not identical, to Anthony's solution):
lapply(dData2, bestFitD, fun = 'fncTtr')

Example Needed: Change the default print method of an object

I need a bit of help with jargon, and a short piece of example code. Different types of objects have a specific way of outputting themselves when you type the name of the object and hit enter: an lm object shows a summary of the model, a vector lists the contents of the vector.
I'd like to be able to write my own way of "showing" the contents of a specific type of object. Ideally, I'd like to be able to separate this from existing types of objects.
How would I go about doing this?
Here's an example to get you started. Once you get the basic idea of how S3 methods are dispatched, have a look at any of the print methods returned by methods("print") to see how you can achieve more interesting print styles.
## Define a print method that will be automatically dispatched when print()
## is called on an object of class "myMatrix"
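print.myMatrix <- function(x, ...) {   # '...' matches the signature of the print() generic
  n <- nrow(x)
  for(i in seq_len(n)) {
    cat(paste("This is row", i, "\t: " ))
    cat(x[i,], "\n")
  }
  invisible(x)   # print methods conventionally return their argument invisibly
}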
## Make a couple of example matrices
m <- mm <- matrix(1:16, ncol=4)
## Create an object of class "myMatrix".
class(m) <- c("myMatrix", class(m))
## When typed at the command-line, the 'print' part of the read-eval-print loop
## will look at the object's class, and say "hey, I've got a method for you!"
m
# This is row 1 : 1 5 9 13
# This is row 2 : 2 6 10 14
# This is row 3 : 3 7 11 15
# This is row 4 : 4 8 12 16
## Alternatively, you can specify the print method yourself.
print.myMatrix(mm)
# This is row 1 : 1 5 9 13
# This is row 2 : 2 6 10 14
# This is row 3 : 3 7 11 15
# This is row 4 : 4 8 12 16
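The same idea works for a brand-new class that is kept separate from existing types. The sketch below assumes a made-up class called "reading" with a made-up constructor new_reading; neither exists in base R.

```r
# Hypothetical constructor: a list tagged with the class "reading".
new_reading <- function(value, unit = "C") {
  structure(list(value = value, unit = unit), class = "reading")
}

# print() dispatches here for any object whose class includes "reading".
print.reading <- function(x, ...) {
  cat("<reading>", x$value, x$unit, "\n")
  invisible(x)
}

r <- new_reading(21.5)
r                    # autoprinting at the prompt uses print.reading
print(unclass(r))    # stripping the class falls back to the default list printing
```

Since only objects carrying the "reading" class are affected, ordinary lists and vectors keep printing as before.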

How to efficiently sum over levels defined in another variable?

I am new to R. I have a function as follows:
funItemAverRating = function()
{
itemRatingNum = array(0, itemNum);
print("begin");
apply(input, 1, function(x)
{
itemId = x[2]+1;
itemAverRating[itemId] <<- itemAverRating[itemId] + x[3];
itemRatingNum[itemId] <<- itemRatingNum[itemId] + 1;
}
);
}
In this function, input is an n*3 data frame, n is ~6*(10e+7), and itemRatingNum is a vector of size ~3*(10e+5).
My question is why the apply function is so slow (it would take nearly an hour to finish)? Also, as the function runs, it uses more and more memory. But as you can see, the variables are all defined outside the apply function. Can anybody help me?
cheng
It's slow because you call high-level R functions many times.
You have to vectorize your function, meaning that most operations (like <- or +1) should be computed over all data vectors.
For example it looks to me that itemRatingNum holds frequencies of input[[2]] (second column of input data.frame) which could be replaced by:
tb <- table(input[[2]]+1)
itemRatingNum[as.integer(names(tb))] <- tb
Don't do that. You're following a logic that is completely not R-like. If I understand it right, you want to add to a certain itemAverRating vector a value from a third column in some input dataframe.
What itemRatingNum is doing is rather obscure. It does not end up in the global environment, and it just becomes a vector filled with frequencies at the end of the loop. As you define itemRatingNum within the function, the <<- assignment will also assign it within the local environment of the function, and it will be destroyed when the function ends.
Next, you should give your function input and get some output from it. Never assign to the global environment if it's not necessary. Your function is equivalent to the following function, which is a whole lot faster and which takes input and gives output:
funItemAverRating = function(x,input){
sums <- rowsum(input[,3],input[,2])
sumid <- as.numeric(rownames(sums))+1
x[sumid]+c(sums)
}
(Function edited per Marek's comment.)
It works like this:
# make data
itemNum <- 10
set.seed(12)
input <- data.frame(
a1 = rep(1:10,itemNum),
a2 = sample(9:0,itemNum*10,TRUE),
a3 = rep(10:1,itemNum)
)
itemAverRating <- array(0, itemNum)
itemAverRating <- funItemAverRating(itemAverRating,input)
itemAverRating
0 1 2 3 4 5 6 7 8 9
39 65 57 36 62 33 98 62 60 38
If I try your code, I get :
> funItemAverRating()
[1] "begin"
...
> itemAverRating
[1] 39 65 57 36 62 33 98 62 60 38
Which is the same. If you want itemRatingNum, then just do :
> itemRatingNum <- table(input[,2])
0 1 2 3 4 5 6 7 8 9
6 11 11 8 10 6 18 9 13 8
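As a cross-check of the vectorised approach, the sketch below compares rowsum() and tabulate() against an explicit loop on a small simulated input. The column names user, item and rating and the sizes are invented for the example; unlike the function above, the result is written into a full-length vector so that items without any rating stay at 0.

```r
set.seed(1)
itemNum <- 10
input <- data.frame(
  user   = 1:100,
  item   = sample(0:9, 100, TRUE),   # item ids start at 0, as in the question
  rating = sample(1:5, 100, TRUE)
)

# Reference: explicit loop, one row at a time.
loopSum <- numeric(itemNum)
loopCnt <- numeric(itemNum)
for (r in seq_len(nrow(input))) {
  id <- input$item[r] + 1
  loopSum[id] <- loopSum[id] + input$rating[r]
  loopCnt[id] <- loopCnt[id] + 1
}

# Vectorised: rowsum() adds up ratings per item, tabulate() counts them.
sums <- rowsum(input$rating, input$item)
vecSum <- numeric(itemNum)
vecSum[as.integer(rownames(sums)) + 1] <- sums[, 1]
vecCnt <- as.numeric(tabulate(input$item + 1, nbins = itemNum))

all.equal(loopSum, vecSum)
all.equal(loopCnt, vecCnt)
```

Dividing vecSum by vecCnt then gives the per-item average rating, with NaN for any item that was never rated.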

Resources