I have an array named P_Array with 100,000 data points and need to calculate the first-order autocorrelation for subintervals of length 100, i.e. from 1:100, 2:101, etc. I've written a loop which works just fine, but is very slow.
Tf <- 100000
acf_Array <- rep(0, length.out = Tf-100)
for (t in 1:(Tf-100)){
acf_Array[t] <- acf(P_Array[t:(t+100)])$acf[2]
}
My idea was to use something like
acf_Array[1:(Tf-100)] <- acf(P_Array[(1:(Tf-100)):(101:Tf)])$acf[2]
which, however, does not work. Any suggestions?
Edit
I think this will do the trick
for (t in 1:(Tf-100)){
acf_Array[t] <- cor(P_Array[t:(t+98)], P_Array[(t+1):(t+99)])
}
To answer the specific question on vectorising the for loop, this is my answer:
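acf_Array <- sapply(1:(Tf-100), function(x) acf(P_Array[x:(x+100)], plot = FALSE)$acf[2])
Note the parentheses: 1:Tf-100 parses as (1:Tf)-100 and x:x+100 as (x:x)+100, because : binds tighter than + and -. But as mentioned in the comments, the speed-limiting part is probably the acf function itself.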
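Since acf itself is likely the bottleneck, one option is to compute the lag-1 coefficient directly. The following is only a minimal sketch, not something taken from the thread: lag1_acf is a hypothetical helper, the data are simulated stand-ins for P_Array, and the window is assumed to be exactly 100 points (t to t+99).

```r
# Hypothetical helper: lag-1 autocorrelation using the estimator acf() applies
# by default (demeaned series, lag-0 sum of squares as the denominator).
lag1_acf <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  sum(d[-n] * d[-1]) / sum(d^2)
}

set.seed(1)
P_Array <- cumsum(rnorm(2000))   # simulated stand-in for the real data
width   <- 100                   # assumed window length
starts  <- seq_len(length(P_Array) - width + 1)

# One lag-1 coefficient per window, without the overhead of acf()
acf_Array <- vapply(
  starts,
  function(t) lag1_acf(P_Array[t:(t + width - 1)]),
  numeric(1)
)
```

This is not quite the same quantity as the cor(...) version, which demeans and scales the two shifted vectors separately, so the two will differ slightly.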
I have a dataframe (check the picture). I am creating periods of 30 values and calculating how many of these values are over 0.1. At the end, I want to save all 336 outputs in a dataframe (as a row). How could I do that? My code is failing!
i <- 0
secos=as.data.frame(NULL)
for (i in c(0:336)){
hola=as.data.frame(pp[c(1+i:29 + i)])
secos[[i]]=sum(hola > 0.1)
secos=rbind(secos[[i]])}
1. Iteratively building (growing) data.frames in R is a bad thing. For good reading, see the R Inferno, chapter 2 on Growing Objects. Bottom line, though: it works, but as you add more rows, it will get progressively slower and use (at least) twice as much memory as you intend.
2. You explicitly overwrite secos with rbind(secos[[i]]), where the rbind call is a complete no-op (e.g., see identical(rbind(mtcars), mtcars)). Back to (1), it is best to build L <- lapply(0:336, function(i) ...) and then secos <- do.call(rbind, L).
3. R indexes are 1-based, but your first iteration assigns to secos[[0]], which fails.
A literal translation of this into a better start is something like the following. (Up front, your reference to pp only makes sense if you have an object pp that you used to create your data.frame above ... since pp[.] by itself will not reference the frame. If you're using attach(.) to be able to do that, then ... don't. Too many risks and things that can go wrong with it, it is one of the base functions I'd vote to remove.)
invec <- 0:336
L <- sapply(invec, function(i) {
hola=as.data.frame(pp[c(1+i:29 + i)])
sum(hola > 0.1)
})
secos <- data.frame(i = invec, secos = L)
An alternative:
L <- lapply(invec, function(i) {
hola=as.data.frame(pp[c(1+i:29 + i)])
data.frame(secos = sum(hola > 0.1))
})
out <- do.call(rbind, L)
I can't help but think there is a more efficient, R-idiomatic way to aggregate this data. My guess is that it's a moving window of sorts, perhaps a month wide (or similar). If that's the case, I recommend looking into zoo::rollapply(pp, 30, function(z) sum(z > 0.1)), perhaps with meaningful application of align=, partial=, and/or fill=.
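If adding zoo is not an option, the moving-window count can also be sketched in base R with a cumulative sum. The assumptions here are mine, not the question's: pp is a plain numeric vector (simulated below), each window is 30 consecutive values, and the window slides by one position.

```r
set.seed(42)
pp    <- runif(366)   # hypothetical stand-in for the real column
width <- 30           # assumed window length

over <- pp > 0.1                 # TRUE where the value exceeds the threshold
cs   <- c(0, cumsum(over))       # cs[k + 1] = number of TRUEs in over[1:k]

# Count in window k..(k + width - 1) is cs[k + width] - cs[k]
counts <- cs[(width + 1):length(cs)] - cs[seq_len(length(cs) - width)]

secos <- data.frame(start = seq_along(counts), secos = counts)
```

Each count is a difference of two running totals, so no window is summed more than once.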
(1,2,2,3,3,3,4,4,4,4,...,n,...,n)
I want to build the above vector with a for loop, without using rep or similar functions. This may not be a good question for Stack Overflow, but since I am new to R, I dare to ask for help here.
(You can suppose the length of the vector is 10)
With a for loop, it can be done with
n <- 10
out <- c()
for(i in seq_len(n)){
for(j in seq_len(i)) {
out <- c(out, i)
}
}
In R, otherwise, this can be done as
rep(seq_len(n), seq_len(n))
I have been beaten by @akrun by seconds; even so, I'd like to give you a few hints, in case using rep is an option, which may help you with R in general. (For a solution without rep, just look at @akrun's answer.)
Short answer using rep
rep(1:n, 1:n)
Long Answer using rep
Before posting a question you should try to develop your own solutions and share them.
Trying googling a bit and sharing what you already found is usually good as well. Please, have a look at "help/how-to-ask"
Let's try to do it together.
First of all, we should try to have a look at official sources:
R-project "getting help", here you can see the standard way to get a function's documentation is just typing ?func_name in your R console
R-project "official manuals" offer a good introduction to R. Try looking at the first topic, "An Introduction to R"
From the previous two (and other sources as well) you will find two interesting functions:
: operator: it can be used to generate a sequence of integers from a to b like a:b. Typing 1:3, for instance, gives you the 1, 2, 3 vector
rep(x, t) is a function which can be used to replicate the item(s) x t times.
You also need to know that R is "vector-oriented": it applies functions over whole vectors without you typing explicit loops.
For instance, calling rep(1:3, 2) is (almost) equivalent to running:
for(i in 1:3)
rep(i, 2)
By combining the previous two functions and the notion R is "vector-oriented", you get the rep(1:n, 1:n) solution.
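As a small illustration of how the arguments of rep interact, here is a sketch with a few variants side by side. The variable names are only for illustration.

```r
n <- 4

whole  <- rep(1:3, times = 2)   # repeats the whole vector: 1 2 3 1 2 3
paired <- rep(1:3, each = 2)    # repeats each element:     1 1 2 2 3 3

# A vector for `times` gives each element its own repeat count,
# which is exactly the pattern asked for in the question.
out <- rep(seq_len(n), times = seq_len(n))
out
```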
I am not sure why you don't want to use rep, but here is a method of not using it or any functions similar to rep within the loop.
for (i in 1:10){
a<-NA
a[1:i] <- i
if (i==1){b<-a}
else if (i >1){b <- c(b,a)}
assign("OutputVector",b,envir = .GlobalEnv)
}
OutputVector
Going for an n of ten seemed subjective so I just did the loop for numbers 1 through 10 and you can take the first 10 numbers in the vector if you want. OutputVector[1:10]
You can do this with a single loop, though it's a while rather than a for
n <- 10
x <- 1;
i <- 2;
while(i <= n)
{
x <- c(x, 1/i);
if(sum(x) %% 1 == 0) i = i + 1;
}
1/x
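The loop above decides when to move on by testing whether a sum of fractions is a whole number, which depends on floating-point arithmetic behaving exactly. A variant that avoids this, offered as a sketch rather than a correction, keeps an explicit integer counter instead (the names out and count are hypothetical):

```r
n     <- 10
out   <- integer(0)
i     <- 1L   # value currently being written
count <- 0L   # how many times i has been written so far

while (i <= n) {
  out   <- c(out, i)
  count <- count + 1L
  if (count == i) {   # i has now appeared i times: move to the next value
    i     <- i + 1L
    count <- 0L
  }
}
out
```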
I need to scale up a set of files for a proof of concept in my company. Essentially, I have several 1000-row files with around 200 columns each, and I want to rbind them until I reach the desired scale. This might be 1 million rows or more.
The output will essentially be a repetition of data (sounds a bit silly), and I'm aware of that, but I just need to prove something.
I used a while loop in R similar to this:
while(nrow(df) < 1000000) {df <- rbind(df,df);}
This seems to work, but it looks a bit computationally heavy. It might take 10-15 minutes.
I thought of creating a function (below) and using an "apply" family function on the df, but couldn't succeed:
scaleup_function <- function(x)
{
while(nrow(df) < 1000)
{
x <- rbind(df, df)
}
}
Is there a quicker and more efficient way of doing it (it doesn't need to be with rbind) ?
Many thanks,
Joao
This should do the trick:
df <- matrix(0,nrow=1000,ncol=200)
reps_needed <- ceiling(1000000 / nrow(df))
df_scaled <- df[rep(1:nrow(df),reps_needed),]
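If the target size is not an exact multiple of the original row count, the length.out argument of rep can cut the index off at the right length. This is a sketch on a deliberately small data frame; target and idx are hypothetical names.

```r
# Small stand-in for one of the 1000-row files
df <- as.data.frame(matrix(rnorm(1000 * 5), nrow = 1000))

target <- 2500                                       # desired number of rows
idx    <- rep(seq_len(nrow(df)), length.out = target) # 1..1000, 1..1000, 1..500

df_scaled <- df[idx, , drop = FALSE]
rownames(df_scaled) <- NULL   # drop the duplicated-row suffixes
```

Indexing once with a prebuilt row index avoids repeated rbind calls, each of which copies the whole object.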
I have a large data set and I want to perform several functions at once and extract for each a parameter.
The test dataset:
testdf <- data.frame(vy = rnorm(60), vx = rnorm(60) , gvar = rep(c("a","b"), each=30))
I first defined a list of functions:
require(fBasics)
normfuns <- list(jarqueberaTest=jarqueberaTest, shapiroTest=shapiroTest, lillieTest=lillieTest)
Then a function to perform the tests by the grouping variable
mynormtest <- function(d) {
norm_test <- res_reg <- list()
for (i in c("a","b")){
res_reg[[i]] <- residuals(lm(vy~vx, data=d[d$gvar==i,]))
norm_test[[i]] <- lapply(normfuns, function(f) f(res_reg[[i]]))
}
return(norm_test)
}
mynormtest(testdf)
I obtain a list of test summaries for each grouping variable.
However, I am interested in getting only the parameter "STATISTIC" and I did not manage to find out how to extract it.
You can obtain the value stored as "STATISTIC" in the output of the various tests with
res_list <- mynormtest(testdf)
res_list$a$shapiroTest@test$statistic
res_list$a$jarqueberaTest@test$statistic
res_list$a$lillieTest@test$statistic
And correspondingly for set b:
res_list$b$shapiroTest@test$statistic
res_list$b$jarqueberaTest@test$statistic
res_list$b$lillieTest@test$statistic
Hope this helps.
Concerning your function fgetparam I think that it is a nice starting point. Here's my suggestion with a few minor modifications:
getparams2 <- function(myp) {
m <- matrix(NA, nrow=length(myp), ncol=3)
for (i in (1:length(myp))){
m[i,] <- sapply(1:3,function(x) myp[[i]][[x]]@test$statistic)}
return(m)
}
This function is a minor generalization in that it allows for an arbitrary number of groups, whereas your version was fixed to two cases, a and b. The code can certainly be shortened further, but it might then also become somewhat more cryptic. I believe that when developing code it helps to strike a compromise between efficiency and compactness on the one hand and readability on the other.
Edit
As pointed out by @akrun and @Roland, the function getparams2() can be written in a much more elegant and shorter form. One possibility is
getparams2 <- function(myp) {
matrix(unname(rapply(myp, function(x) x@test$statistic)),ncol=3)}
Another great alternative is
getparams2 <- function(myp){t(sapply(myp, sapply, function(x) x@test$statistic))}
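To see what the nested sapply does without installing fBasics, here is a sketch on a hypothetical S4 class that only mimics the relevant shape: an object with a list-valued slot named test, read with the @ slot operator. DemoTest and make_demo are made up for this illustration and are not part of fBasics.

```r
library(methods)

# Hypothetical stand-in class: one slot `test`, holding a list
setClass("DemoTest", representation(test = "list"))
make_demo <- function(stat) new("DemoTest", test = list(statistic = stat))

# Same nesting as the result of mynormtest(): group -> test -> S4 object
res_list <- list(
  a = list(t1 = make_demo(1.1), t2 = make_demo(2.2), t3 = make_demo(3.3)),
  b = list(t1 = make_demo(4.4), t2 = make_demo(5.5), t3 = make_demo(6.6))
)

# Outer sapply walks the groups, inner sapply walks the tests in each group;
# t() puts groups in rows and tests in columns.
stats <- t(sapply(res_list, sapply, function(x) x@test$statistic))
stats
```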
I feel dumb asking such a simple question, but I can't seem to find a way, although I'm sure there are plenty of ways. The easiest way to explain my problem might be to show an example, I've got some program I want applied.
FUN<- function(v1, v2, v3){
n=length(v1)
res <- vector()
for (i in 1:n){
if(v1[i]>v2[i]) (res[i] <- v3[i+2])
else(res[i] <- v1[i+2])}
return(res)}
The input is two vectors and a matrix, all being the same length
matrix <- matrix(runif(30),ncol=3)
v2 <- runif(10)
v3 <- rnorm(10)
I want to run a for loop that calls the function once per matrix column, so that each call's output goes to a different column of a result matrix. I've tried something like the following, and several similar "versions", but with no luck.
for (i in 1:3)(
r <- matrix()
r[re,i] <- re <- FUN(matrix[,i], v2, v3))
Can anyone please help me?
r <- matrix(ncol=3, nrow=10)
for (i in 1:3) {
r[,i] <- FUN(matrix[,i], v2, v3)
}
Declare your matrix outside the loop, and just fill in one column per loop iteration.
(This assumes that FUN is correct; even if it is, there are better ways to do what it does. And other ways to do what you want other than a loop.)
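As one example of a loop-free alternative, here is a sketch that vectorises the body of FUN with ifelse and then applies it column by column. FUN_vec is a hypothetical name, the matrix is called m here to avoid masking the matrix function, and the behaviour of the original FUN (including the NA values produced by indexing past the end with i+2) is kept as it is.

```r
# Loop version from the question, unchanged in behaviour
FUN <- function(v1, v2, v3) {
  n <- length(v1)
  res <- vector()
  for (i in 1:n) {
    if (v1[i] > v2[i]) res[i] <- v3[i + 2] else res[i] <- v1[i + 2]
  }
  res
}

# Hypothetical vectorised equivalent: one ifelse over all positions
FUN_vec <- function(v1, v2, v3) {
  i <- seq_along(v1)
  ifelse(v1 > v2, v3[i + 2], v1[i + 2])
}

set.seed(123)
m  <- matrix(runif(30), ncol = 3)
v2 <- runif(10)
v3 <- rnorm(10)

# apply() passes each column of m as v1 and binds the results into a matrix
r <- apply(m, 2, FUN_vec, v2 = v2, v3 = v3)
```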