Time measurements in R

I'm trying to measure the time used to solve a system in R:
t1<-Sys.time()
b=t(Q)%*%t(Pi)%*%z
Sol<-BackwardSubs(R,b)
t2<-Sys.time()
DeltaT<-t2-t1
print(paste("System: ",DeltaT," sec",sep=""))
Sometimes, for high-dimensional problems, I get results that cannot be right (1 or 2 seconds reported when the function actually runs for several minutes).
Is Sys.time() correct?
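A likely explanation (an assumption on my part, since the question doesn't show the printed output): subtracting two Sys.time() values returns a difftime object whose units are chosen automatically, so a two-minute run can come back as roughly 2 with units of minutes and then get printed next to a hard-coded "sec" label. Forcing the units, or using system.time(), avoids the ambiguity. A minimal sketch, reusing Q, Pi, z, R and BackwardSubs from the question:
timing <- system.time({
  b   <- t(Q) %*% t(Pi) %*% z
  Sol <- BackwardSubs(R, b)
})
print(timing["elapsed"])                 # elapsed wall-clock time, always in seconds
# or keep Sys.time() but force the units of the difference:
print(difftime(t2, t1, units = "secs"))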


Same expression, but takes much less time in a loop on average

I've been trying to find the computational expense with Sys.time(), starting with some simple operations.
I started with something like this
a=c(10,6,8,3,2,7,9,11,13)
t_beginning=Sys.time()
cl2=NULL
indx=which(a==7)
t_ending=Sys.time()
print(t_ending-t_beginning)
and it gives me about 0.0023 sec after running the code in RStudio.
Then the code is put into a for loop to find the average expense of the two lines.
sum=0
a=c(10,6,8,3,2,7,9,11,13)
for (i in 1:5) {
print(i)
t_beginning=Sys.time()
cl2=NULL
indx=which(a==7)
t_ending=Sys.time()
sum=t_ending-t_beginning+sum
print(t_ending-t_beginning)
}
sum/5
It turns out that, for every iteration of the for loop, the time consumption is just a few microseconds, much less than what it took outside the loop.
[1] 1
Time difference of 7.152557e-06 secs
[1] 2
Time difference of 5.00679e-06 secs
[1] 3
Time difference of 4.053116e-06 secs
[1] 4
Time difference of 4.053116e-06 secs
[1] 5
Time difference of 5.00679e-06 secs
I expected the average time cost inside the for loop to be about the same as outside it, but they are very different. I'm not sure why this is happening. Can anyone reproduce this? Thanks!
The difference comes from the way RStudio (or R) runs the code.
The original code is executed line by line, so the timing you get includes the overhead of the interface between RStudio and R.
a=c(10,6,8,3,2,7,9,11,13)
t_beginning=Sys.time()
cl2=NULL
indx=which(a==7)
t_ending=Sys.time()
print(t_ending-t_beginning)
# Time difference of 0.02099395 secs
If, however, you run all of this code at once by wrapping it in curly braces, the timing improves drastically:
{
a=c(10,6,8,3,2,7,9,11,13)
t_beginning=Sys.time()
cl2=NULL
indx=which(a==7)
t_ending=Sys.time()
print(t_ending-t_beginning)
}
# Time difference of 0 secs
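For operations this tiny, Sys.time() mostly measures overhead rather than the work itself. A sketch of an alternative (my suggestion, not part of the answer above) using the microbenchmark package, which evaluates the expression many times and reports the distribution of timings:
library(microbenchmark)                       # assumed to be installed; not used in the question
a <- c(10, 6, 8, 3, 2, 7, 9, 11, 13)
microbenchmark(which(a == 7), times = 1000)   # summary over 1000 timed evaluations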

Calculate Period Changes in Unevenly Sampled Times Series in R (or Matlab)

The heading says it all: I'm trying desperately to figure out a way to calculate the period of a time series that is unevenly sampled. I tried creating an evenly sampled time series with NAs for the times where there is no data, but there are just too many NAs for any imputation method to do a reasonable job. The main problem is that the sample times are much further apart than the average period (VERY roughly 0.5), which only becomes obvious with period-folding applied. Because I'm looking for a small change in period, I can't round the sampling times.
[Plot: time-folded period]
Here is a sample of the data:
HJD(time) Mag err
2088.91535 18.868 0.078
2090.87535 19.540 0.165
2103.92958 18.704 0.040
2104.94812 19.291 0.098
2106.84596 18.910 0.066
...
4864.56170 18.835 0.061
The data set has about 650 rows.
I've spent almost a week googling my problem and nothing has helped yet so any ideas would be greatly appreciated! I have some experience with Matlab too, so if it's possible to do it with Matlab rather than R, I'd be happy with that too.
I do not think that there is a way to do what you want. The Nyquist-Shannon theorem states that your average sampling frequency needs to be at least twice as high as the frequency of the events you want to capture (roughly speaking).
So if you want to extract information from events with a period of 0.5 [units] you will need a sample every 0.25 [units].
Note that this is a mathematical limitation, not one of R.
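A toy illustration of the aliasing problem described above (my own sketch, using evenly spaced samples for simplicity): when the sample spacing is much larger than half the period, the samples of the true signal are indistinguishable from those of a much slower alias, so the short period cannot be recovered.
period  <- 0.5                                        # true period, as in the question
dt      <- 2.1                                        # sample spacing, far larger than period / 2
t       <- (0:50) * dt
y_true  <- sin(2 * pi * t / period)                   # true, fast signal sampled at t
f_alias <- abs(1 / period - round(dt / period) / dt)  # frequency of the slow alias
y_alias <- sin(2 * pi * f_alias * t)                  # much slower signal with identical samples
all.equal(y_true, y_alias)                            # TRUE: the samples cannot tell them apart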

Estimating functions run time

I have a large list (data) of 15,000 elements, each containing 10 numbers.
I am doing a time series cluster analysis using
distmatrix <- dist(data, method = "DTW")
This has now been running for 24 hours. Is it likely to complete any time soon? Is there a way of checking on its progress? I don't want to abort just in case it's about to finish.
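One rough way to get an estimate (a sketch of my own, not an answer from the thread): time the same dist() call on a small subset and extrapolate, since the number of pairwise DTW distances grows roughly with the square of the number of series. This assumes the per-pair cost is about constant and that data and the "DTW" method are set up exactly as in the question.
k       <- 200                                         # pilot subset size (arbitrary choice)
t_pilot <- system.time(dist(data[1:k], method = "DTW"))["elapsed"]
n       <- length(data)                                # 15,000 in the question
est_sec <- t_pilot * (n / k)^2                         # pairwise work scales roughly as n^2
est_sec / 3600                                         # rough estimate of the total hours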

Snowfall's sfApply and sfClusterApplyLB is slower than normal loop or sapply [duplicate]

This question already has answers here:
Why is the parallel package slower than just using apply?
(3 answers)
Closed 9 years ago.
When I apply this code in R, the loop and sapply are faster than snowfall's functions. What am I doing wrong? (Using Windows 8.)
library(snowfall)
library(rbenchmark)  # provides benchmark(), used below
a<- 2
sfInit(parallel = TRUE, cpus = 4)
wrapper <- function(x){((x*a)^2)/3}
sfExport('a')
values <- seq(0, 100,1)
benchmark(for(i in 1:length(values)){wrapper(i)},sapply(values,wrapper),sfLapply(values, wrapper),sfClusterApplyLB(values, wrapper))
sfStop()
Elapsed time after 100 replications:
loop 0.05
sapply 0.07
sfClusterApplyLB 2.94
sfLapply 0.26
If the function that is sent to each of the worker nodes takes only a small amount of time, the overhead of parallelization makes the overall task take longer than running the job serially. When the jobs sent to the worker nodes take a significant amount of time (at least several seconds), parallelization really does show improved performance.
See also:
Why is the parallel package slower than just using apply?
Searching for [r] parallel will yield at least 20 questions like yours, including more details as to what you can do to solve the problem.
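A sketch of the point above (my own illustration, not taken from the linked answer): if each call does a noticeable amount of work, the snowfall version should come out ahead. Here Sys.sleep() stands in for a genuinely expensive computation.
library(snowfall)

slow_wrapper <- function(x) {
  Sys.sleep(0.05)                                    # pretend this is 50 ms of real work
  ((x * 2)^2) / 3
}
values <- seq(0, 100, 1)

t_serial <- system.time(lapply(values, slow_wrapper))["elapsed"]

sfInit(parallel = TRUE, cpus = 4)
t_parallel <- system.time(sfLapply(values, slow_wrapper))["elapsed"]
sfStop()

c(serial = t_serial, parallel = t_parallel)          # parallel should now be several times faster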

Thinking in Vectors with R

I know that R works most efficiently with vectors and that looping should be avoided. I am having a hard time teaching myself to actually write code this way. I would like some ideas on how to 'vectorize' my code. Here's an example of creating 10 years of sample data for 10,000 non-unique combinations of state (st), plan1 (p1) and plan2 (p2):
st<-NULL
p1<-NULL
p2<-NULL
year<-NULL
i<-0
starttime <- Sys.time()
while (i<10000) {
for (years in seq(1991,2000)) {
st<-c(st,sample(c(12,17,24),1,prob=c(20,30,50)))
p1<-c(p1,sample(c(12,17,24),1,prob=c(20,30,50)))
p2<-c(p2,sample(c(12,17,24),1,prob=c(20,30,50)))
year <-c(year,years)
}
i<-i+1
}
Sys.time() - starttime
This takes about 8 minutes to run on my laptop. I end up with 4 vectors, each with 100,000 values, as expected. How can I do this faster using vector functions?
As a side note, if I limit the above code to 1000 loops on i it only takes 2 seconds, but 10,000 takes 8 minutes. Any idea why?
Clearly I should have worked on this for another hour before I posted my question. It's so obvious in retrospect. :)
To use R's vector logic I took out the loop and replaced it with this:
st <- sample(c(12,17,24),100000,prob=c(20,30,50),replace=TRUE)
p1 <- sample(c(12,17,24),100000,prob=c(20,30,50),replace=TRUE)
p2 <- sample(c(12,17,24),100000,prob=c(20,30,50),replace=TRUE)
year <- rep(1991:2000,10000)
I can now do 100,000 samples almost instantaneously. I knew that vectors were faster, but dang. I presume 100,000 iterations would have taken over an hour using a loop, while the vector approach takes under a second. Just for kicks I made the vectors a million long; that took about 2 seconds to complete. Since I must test to failure, I tried 10 million but ran out of memory on my 2 GB laptop. I switched over to my Vista 64 desktop with 6 GB of RAM and created vectors of length 10 million in 17 seconds. At 100 million things fell apart, as one of the vectors was over 763 MB, which resulted in an allocation issue in R.
Vectors in R are amazingly fast to me. I guess that's why I am an economist and not a computer scientist.
To answer your question about why the loop of 10000 took much longer than your loop of 1000:
I think the primary suspect is the concatenation that happens on every iteration. Each call to c() allocates a new vector that is one element longer and copies everything already accumulated into it, so the total amount of copying grows roughly with the square of the final length. Growing the vectors to 10,000 elements (1,000 iterations) involves comparatively little copying; growing them to 100,000 elements (10,000 iterations) involves roughly 100 times as much, which is why the runtime blows up far faster than linearly.
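A small sketch of that copying argument (my own code, illustrative sizes): growing a vector with c() does an amount of copying that is roughly quadratic in the final length, whereas preallocating, or vectorizing as in the accepted approach, is linear.
n <- 50000

system.time({              # grow by concatenation: each step copies the whole vector so far
  x <- NULL
  for (i in 1:n) x <- c(x, i)
})

system.time({              # preallocate once, then fill in place
  y <- numeric(n)
  for (i in 1:n) y[i] <- i
})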
