Measuring function execution time and time of completion by default in R

Measuring function execution time in R is simple but pollutes the code.
t0 <- Sys.time()
my_function()
t1 <- Sys.time()
t1-t0
Is there some package or setting in R that makes it record the execution time (duration) and the time of completion and print that to the screen after the function output?
In Stata this can be done with the setting:
set rmsg on
After that, if you run a block of code with the following commands:
clear
set obs 3
gen x=1
The output window would display:
. clear
r; t=0.00 9:10:28
. set obs 3
number of observations (_N) was 0, now 3
r; t=0.00 9:10:28
. gen x=1
r; t=0.00 9:10:28
.
end of do-file
r; t=0.00 9:10:28
Above we have execution and completion time for:
each command. This follows the command's own output (bear in mind clear and gen have no screen output).
the whole command block. This is indicated by end of do-file and the time information after that.
I find this very useful when working on large datasets.
Is there a way to do this in R?
If not, would it be too complicated to create a package to implement this feature?

Have a look at the microbenchmark package. E.g.
microbenchmark::microbenchmark(my_fun(), times = 100L, unit = "ms")
summary(microbenchmark::microbenchmark(my_fun(), times = 100L, unit = "ms"))$uq
The latter form lets you pull individual summary statistics (here the upper quartile) out for further use.
Running example:
microbenchmark::microbenchmark(factorial(100), times = 100L, unit = "us")
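microbenchmark reports durations only, not the time of completion. As a rough sketch of the rmsg-style output asked for above (time_it is a hypothetical helper, not part of any package), a small wrapper could print both after a single call:
time_it <- function(expr) {
  t0 <- Sys.time()
  result <- eval.parent(substitute(expr))  # run the wrapped call in the caller's environment
  t1 <- Sys.time()
  cat(sprintf("r; t=%.2f %s\n",            # duration in seconds and completion time, Stata-style
              as.numeric(difftime(t1, t0, units = "secs")),
              format(t1, "%H:%M:%S")))
  result                                   # the call's result is returned as usual
}
time_it(factorial(100))                    # prints e.g. "r; t=0.00 09:10:28", then the result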

Related

Vectorizing R custom calculation with dynamic day range

I have a big dataset (around 100k rows) with 2 columns referencing a device_id and a date and the rest of the columns being attributes (e.g. device_repaired, device_replaced).
I'm building a ML algorithm to predict when a device will have to be maintained. To do so, I want to calculate certain features (e.g. device_reparations_on_last_3days, device_replacements_on_last_5days).
I have a function that subsets my dataset and returns a calculation:
For the specified device,
That happened before the day in question,
As long as there's enough data (e.g. if I want the last 3 days but only 2 records exist, this returns NA).
Here's a sample of the data and the function outlined above:
data = data.frame(device_id = c(rep(1, 5), rep(2, 10)),
                  day = c(1:5, 1:10),
                  device_repaired = sample(0:1, 15, replace = TRUE),
                  device_replaced = sample(0:1, 15, replace = TRUE))
# Example: how many times device 1 was repaired over the last 2 days before day 3
# => getCalculation(3, 1, data, "device_repaired", 2)
getCalculation <- function(fday, fdeviceid, fdata, fattribute, fpreviousdays) {
  # Subset the dataset: this device, days strictly before fday, within the window
  df = subset(fdata, day < fday & day > (fday - fpreviousdays - 1) & device_id == fdeviceid)
  # Make sure there's enough data; if so, make the calculation
  if (nrow(df) < fpreviousdays) {
    calculation = NA
  } else {
    calculation = sum(df[, fattribute])
  }
  return(calculation)
}
My problem is that the number of available attributes (e.g. device_repaired) and of features to calculate (e.g. device_reparations_on_last_3days) has grown substantially, and my script takes around 4 hours to execute, since I loop over each row and calculate all these features.
I'd like to vectorize this logic with some apply-style approach, which would also allow me to parallelize the execution, but I don't know if/how it's possible to pass these extra arguments to an lapply-type function.
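As a sketch (untested on the real data; column names are taken from the example above), mapply() can apply getCalculation() once per row, with the row-varying arguments passed by name and the fixed ones through MoreArgs; parallel::mcmapply() has the same interface if you later want to parallelize:
data$repaired_last_2days <- mapply(
  getCalculation,
  fday      = data$day,        # varies per row
  fdeviceid = data$device_id,  # varies per row
  MoreArgs  = list(fdata = data, fattribute = "device_repaired", fpreviousdays = 2)
)
# parallel::mcmapply() is a drop-in replacement on Unix-alikes, e.g. with mc.cores = 4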

From Stata to R: recoding bysort and xtreg

I'm very new to R and currently working on a replication project for a meta-research course at my university. The paper examines whether having an in-home display to monitor energy consumption reduces energy usage. I have already recoded 300 lines of code, but now I have run into a problem I could not yet solve.
The source code says: bysort id expdays: egen ave15 = mean(power) if hours0105==1
I do understand what this does, but I cannot replicate it in R. id is the identifier for the examined household and expdays denotes the current day of the experiment, so ave15 is the average power consumption from midnight to 6 am, computed per household per experiment day. I figured out that (EIPbasedata is the complete dataset containing hourly data)
EIPbasedata$ave15[EIPbasedata$hours0105 == 1] <- ave(EIPbasedata$power, EIPbasedata$ID, EIPbasedata$ExpDays, FUN=mean)
would probably do the job, but this gives me a warning:
number of items to replace is not a multiple of replacement length
and the results are not right either. I have no idea what I could do to solve this.
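The warning comes from subsetting only the left-hand side: the replacement vector from ave() covers every row, while the slot being filled covers only the rows with hours0105 == 1. A sketch of one fix (assuming ave15 should be NA outside that window) is to apply the same subset to both sides:
idx <- EIPbasedata$hours0105 == 1
EIPbasedata$ave15 <- NA                    # NA where the condition does not hold
EIPbasedata$ave15[idx] <- ave(EIPbasedata$power[idx],
                              EIPbasedata$ID[idx],
                              EIPbasedata$ExpDays[idx],
                              FUN = mean)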
The next thing I struggle to recode is:
xtreg ln_power0105 ihd0105 i.days0105 if exptime==4, fe vce(bootstrap, rep(200) seed(12345))
I think the right way would be using plm, but I'm not sure how to implement the if condition (days0105 is a running variable for the number of the day in the experiment, and 0 if not between 0 and 6 am; ihd0105 is a dummy for having an in-home display; exptime denotes 4 am in the morning, although I do not understand what exptime does here).
table4_1 <- plm(EIPbasedata$ln_power0105 ~ EIPbasedata$ihd0105, data=EIPbasedata, index = c("days0105"), model="within")
How do I compute the bootstrapped standard errors in plm?
I hope some expert can help me, since my R and Stata knowledge is not sufficient for this.
My lecturer provided the answer: first I specify a subsample, which I call tmp_data here: tmp_data <- EIPbasedata[which(EIPbasedata$ExpTime == 4), ]
Then I regress on tmp_data with as.factor(days0105), which is the R equivalent of i.days0105:
tmp_results <- plm(tmp_data$ln_power0105 ~ tmp_data$ihd0105 + as.factor(tmp_data$days0105), data = tmp_data, index = ("ID"), model = "within")
There are probably better and cleaner ways to do this, but I'm fine with it for now.
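A slightly cleaner variant of the same model (a sketch; it keeps the specification above but lets plm resolve the columns through the formula and the data argument instead of tmp_data$):
library(plm)
tmp_data <- EIPbasedata[EIPbasedata$ExpTime == 4, ]
tmp_results <- plm(ln_power0105 ~ ihd0105 + as.factor(days0105),
                   data = tmp_data, index = "ID", model = "within")
summary(tmp_results)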

How do I time my script in R? [duplicate]

I guess I have a simple and straightforward question.
I am running a script and for each function I want to time the runtime of the function. I suppose there is a function to time my function. Can anybody help me here?
I have been searching but keep finding functions for time series and time intervals, which is not what I'm looking for.
As others mentioned in the comments, the simplest way is system.time. Here is some example code from the system.time manual page:
require(stats)
system.time(for(i in 1:100) mad(runif(1000)))
## Not run:
exT <- function(n = 10000) {
  # Purpose: test if system.time works OK; n: loop size
  system.time(for(i in 1:n) x <- mean(rt(1000, df = 4)))
}
#-- Try to interrupt one of the following (using Ctrl-C / Escape):
exT()               #- about 4 secs on a 2.5GHz Xeon
system.time(exT())  #~ +/- same
On my machine, once the function exT() is called, this is my output:
user system elapsed
2.916 0.004 2.925
And for the function system.time(exT()) I get the following output:
user system elapsed
3.004 0.016 3.026
This means that for the first case the elapsed time is 2.925 seconds and 3.026 for the second.
However, if you want to perform benchmark tests, you should use the rbenchmark package. As its description puts it:
The library consists of just one function, benchmark, which is a
simple wrapper around system.time.
The package documentation has several more examples of how to use it.
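For example, a minimal benchmark() call might look like this (a sketch; the expressions, replication count, and column choice are arbitrary):
library(rbenchmark)
benchmark(mad  = mad(runif(1000)),
          mean = mean(rt(1000, df = 4)),
          replications = 100,
          columns = c("test", "replications", "elapsed", "relative"))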

I want to run an R script at a specific time

I want to run an R script at a specific time that I choose, and after the process finishes, I want to terminate the R session.
If the code is as below,
tm<-Sys.time()
write.table(tm,file='OUT.TXT', sep='\t');
quit(save = "no")
What should I do to run this code at "2012-04-18 17:25:40"?
I need your help. Thanks in advance.
It is easiest to use the Windows Task Scheduler or a cron job under Linux; there you can specify a command or program that should be run at a time you choose.
If you cannot use cron or the Task Scheduler and have to schedule from within R, the following code shows how to wait until a pre-specified target time before executing.
stop.date.time.1 <- as.POSIXct("2012-12-20 13:45:00 EST")      # time of last afternoon execution
stop.date.time.2 <- as.POSIXct("2012-12-20 7:45:00 EST")       # time of last morning execution
NOW <- Sys.time()                                              # the current time
lapse.time <- 24 * 60 * 60                                     # a day's worth of time, in seconds
all.exec.times.1 <- seq(stop.date.time.1, NOW, -lapse.time)    # all afternoon execution times
all.exec.times.2 <- seq(stop.date.time.2, NOW, -lapse.time)    # all morning execution times
all.exec.times <- sort(c(all.exec.times.1, all.exec.times.2))  # combine all times and sort from earliest to latest
cat("To execute your code at the following times:\n"); print(all.exec.times)
for (i in seq_along(all.exec.times)) {       # for each target time in the sequence
  ## How long do I have to wait from now until the next execution?
  wait.time <- difftime(all.exec.times[i], Sys.time(), units = "secs")  # target time minus now, in seconds
  cat("Waiting for", wait.time, "seconds before next execution\n")
  if (wait.time > 0) {
    Sys.sleep(wait.time)                     # wait until the target time arrives
    ## Put your execution code or function call here
  }
}
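For the original one-off example ("2012-04-18 17:25:40"), a minimal sketch of the same idea is to compute the delay once and sleep until then (assuming the target is given in the local time zone):
target <- as.POSIXct("2012-04-18 17:25:40")
wait   <- as.numeric(difftime(target, Sys.time(), units = "secs"))
if (wait > 0) Sys.sleep(wait)                # block until the target time
tm <- Sys.time()
write.table(tm, file = 'OUT.TXT', sep = '\t')
quit(save = "no")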
