Same expression takes much less time in a loop on average

I've been trying to measure computational expense with Sys.time(), starting with some simple operations.
I started with something like this:
a=c(10,6,8,3,2,7,9,11,13)
t_beginning=Sys.time()
cl2=NULL
indx=which(a==7)
t_ending=Sys.time()
print(t_ending-t_beginning)
and it gives me about 0.0023 sec after running the code in RStudio.
Then I put the code into a for loop to find the average cost of the two lines.
sum=0
a=c(10,6,8,3,2,7,9,11,13)
for (i in 1:5) {
  print(i)
  t_beginning = Sys.time()
  cl2 = NULL
  indx = which(a == 7)
  t_ending = Sys.time()
  sum = t_ending - t_beginning + sum
  print(t_ending - t_beginning)
}
sum/5
It turns out that, for every iteration of the for loop, the time consumed is just a few microseconds, far less than what the same lines took outside the loop.
[1] 1
Time difference of 7.152557e-06 secs
[1] 2
Time difference of 5.00679e-06 secs
[1] 3
Time difference of 4.053116e-06 secs
[1] 4
Time difference of 4.053116e-06 secs
[1] 5
Time difference of 5.00679e-06 secs
I expected the average time cost inside the for loop to be about the same as without the loop, but they are very different. I'm not sure why this is happening. Can anyone reproduce this? Thanks!

The difference comes from the way RStudio (or R) runs the code.
The original code is executed line by line, so the timing you get includes the round-trip overhead between the RStudio interface and R for every line.
a=c(10,6,8,3,2,7,9,11,13)
t_beginning=Sys.time()
cl2=NULL
indx=which(a==7)
t_ending=Sys.time()
print(t_ending-t_beginning)
# Time difference of 0.02099395 secs
If, however, you run all of this code at once, by wrapping it in curly braces, the timing improves drastically:
{
  a = c(10, 6, 8, 3, 2, 7, 9, 11, 13)
  t_beginning = Sys.time()
  cl2 = NULL
  indx = which(a == 7)
  t_ending = Sys.time()
  print(t_ending - t_beginning)
}
# Time difference of 0 secs
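As an aside, for an expression this fast a single Sys.time() difference mostly measures noise; averaging over many repetitions is more reliable. A minimal sketch, assuming the microbenchmark package is installed (it is not part of the original answer):

library(microbenchmark)  # assumed available; install.packages("microbenchmark") first

a = c(10, 6, 8, 3, 2, 7, 9, 11, 13)
microbenchmark(
  which(a == 7),  # the expression under test
  times = 1000    # repeat 1000 times and report summary statistics
)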

Related

R - Sys.time() and as.POSIXct() issue [duplicate]

I want to run a R code at a specific time that I need.
And after the process finished, I want to terminate the R session.
If the code is as below,
tm<-Sys.time()
write.table(tm,file='OUT.TXT', sep='\t');
quit(save = "no")
What should I do to run this code at "2012-04-18 17:25:40"?
I need your help. Thanks in advance.
It is easiest to use the Task Scheduler on Windows, or a cron job under Linux; there you can specify a command or program to run at a given time.
If somehow you cannot use the cron job service and have to schedule within R, the following R code shows how to wait a specific amount of time so as to execute at a pre-specified target time.
stop.date.time.1 <- as.POSIXct("2012-12-20 13:45:00 EST") # time of the last afternoon execution
stop.date.time.2 <- as.POSIXct("2012-12-20 7:45:00 EST")  # time of the last morning execution
NOW <- Sys.time()           # the current time
lapse.time <- 24 * 60 * 60  # a day's worth of time, in seconds
all.exec.times.1 <- seq(stop.date.time.1, NOW, -lapse.time) # all afternoon execution times
all.exec.times.2 <- seq(stop.date.time.2, NOW, -lapse.time) # all morning execution times
all.exec.times <- sort(c(all.exec.times.1, all.exec.times.2)) # combine and sort, earliest first
cat("To execute your code at the following times:\n"); print(all.exec.times)
for (i in seq_along(all.exec.times)) { # for each target time in the sequence
  ## How long to wait from now until the next execution. Note the target
  ## time must be the first argument of difftime(), or the result is negative.
  wait.time <- difftime(all.exec.times[i], Sys.time(), units = "secs")
  cat("Waiting for", wait.time, "seconds before next execution\n")
  if (wait.time > 0) {
    Sys.sleep(wait.time) # wait until the target time arrives
    ## Put your execution code or function call here
  }
}
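Applied to the single target time in the question, a shorter sketch (run_at is a hypothetical helper, not part of this answer):

# Sleep until one target time, then run a task; run_at is hypothetical.
run_at <- function(target, task) {
  wait <- as.numeric(difftime(target, Sys.time(), units = "secs"))
  if (wait > 0) Sys.sleep(wait)  # block until the target time arrives
  task()
}
run_at(as.POSIXct("2012-04-18 17:25:40"), function() {
  tm <- Sys.time()
  write.table(tm, file = "OUT.TXT", sep = "\t")
  quit(save = "no")  # terminate the R session afterwards, as the question asks
})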

Time measurements in R

I'm trying to measure the time used to solve a system in R:
t1<-Sys.time()
b=t(Q)%*%t(Pi)%*%z
Sol<-BackwardSubs(R,b)
t2<-Sys.time()
DeltaT<-t2-t1
print(paste("System: ",DeltaT," sec",sep=""))
For high-dimensional problems I sometimes get results that are not possible: 1 or 2 seconds reported when the function actually runs for several minutes.
Is Sys.time() correct?
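One way to sanity-check such a measurement is system.time(), which reports user, system, and elapsed time separately; if "elapsed" is far smaller than the observed run time, the slow part lies outside the timed block. A minimal sketch, assuming Q, Pi, z, R, and BackwardSubs() are defined as in the question:

timing <- system.time({
  b <- t(Q) %*% t(Pi) %*% z  # same computation as in the question
  Sol <- BackwardSubs(R, b)
})
print(timing)  # compare "elapsed" (wall clock) with "user.self" (CPU time)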

Add 24 hours to timestamp in R and exit loop

Given
launchTime = Sys.timeDate(FinCenter = "America/Los_Angeles")
launchTime looks like:
America/Los_Angeles
[1] [2013-06-26 12:52:28]
I would like to add 24 hours to launchTime and call it exitTime.
At launchTime I start an R script containing a loop that runs for, say, 7 days.
What would be a good way to add a condition, or a few lines of code, that exits the loop when the real time reaches exitTime?
I would like the condition to check to the accuracy of the day, hour, and minute, not the second.
Set something like exitTime <- as.numeric(Sys.time() + (60*60*24)) to get one day from the present, in seconds. Then include a conditional in your loop like:
if (as.numeric(Sys.time()) > exitTime)
  break
I don't follow your bit about precision.
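Put together, a minimal sketch of such a loop (the body is a placeholder):

exitTime <- as.numeric(Sys.time() + 24 * 60 * 60)  # one day from launch, in seconds
repeat {
  ## ... one iteration of the real work goes here ...
  if (as.numeric(Sys.time()) > exitTime)
    break  # exit once 24 hours have elapsed
  Sys.sleep(1)  # placeholder pacing between iterations
}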


Thinking in Vectors with R

I know that R works most efficiently with vectors and that looping should be avoided, but I am having a hard time teaching myself to actually write code this way. I would like some ideas on how to 'vectorize' my code. Here's an example of creating 10 years of sample data for 10,000 non-unique combinations of state (st), plan1 (p1), and plan2 (p2):
st<-NULL
p1<-NULL
p2<-NULL
year<-NULL
i<-0
starttime <- Sys.time()
while (i < 10000) {
  for (years in seq(1991, 2000)) {
    st <- c(st, sample(c(12, 17, 24), 1, prob = c(20, 30, 50)))
    p1 <- c(p1, sample(c(12, 17, 24), 1, prob = c(20, 30, 50)))
    p2 <- c(p2, sample(c(12, 17, 24), 1, prob = c(20, 30, 50)))
    year <- c(year, years)
  }
  i <- i + 1
}
Sys.time() - starttime
This takes about 8 minutes to run on my laptop. I end up with 4 vectors, each with 100,000 values, as expected. How can I do this faster using vector functions?
As a side note, if I limit the above code to 1,000 loops on i, it takes only 2 seconds, but 10,000 loops take 8 minutes. Any idea why?
Clearly I should have worked on this for another hour before I posted my question. It's so obvious in retrospect. :)
To use R's vector logic I took out the loop and replaced it with this:
st <- sample(c(12, 17, 24), 100000, prob = c(20, 30, 50), replace = TRUE)
p1 <- sample(c(12, 17, 24), 100000, prob = c(20, 30, 50), replace = TRUE)
p2 <- sample(c(12, 17, 24), 100000, prob = c(20, 30, 50), replace = TRUE)
year <- rep(1991:2000, 10000)
I can now do 100,000 samples almost instantaneously. I knew that vectors were faster, but dang. I presume 100,000 iterations would have taken over an hour with the loop, while the vector approach takes under a second. Just for kicks I made the vectors a million long; that took about 2 seconds to complete. Since I must test to failure, I tried 10 million but ran out of memory on my 2 GB laptop. I switched over to my Vista 64 desktop with 6 GB of RAM and created vectors of length 10 million in 17 seconds. At 100 million things fell apart: one of the vectors was over 763 MB, which resulted in an allocation issue in R.
Vectors in R are amazingly fast to me. I guess that's why I am an economist and not a computer scientist.
To answer your question about why the loop of 10,000 took so much longer than the loop of 1,000:
I think the primary suspect is the concatenation happening on every iteration. Each time the vector grows, R copies every existing element into a new vector that is one element longer, so the total copying work grows quadratically with the final length: with 1,000 loops the vectors only grow to 10,000 elements, while with 10,000 loops they grow to 100,000, which means roughly a hundred times as much copying.
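A quick way to see the effect (a sketch, not part of the original answer) is to compare growing a vector by concatenation against preallocating it and filling it in place:

n <- 1e5
system.time({
  grown <- NULL
  for (i in 1:n) grown <- c(grown, i)  # copies the whole vector on every pass
})
system.time({
  prealloc <- numeric(n)           # allocate the full length once
  for (i in 1:n) prealloc[i] <- i  # fill in place, no repeated copying
})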
