Write elapsed time to file, with chosen/known units

I am using R and RStudio. I am a complete newbie.
I want to write the time elapsed in each iteration of a loop to a file. So I define three variables: start <- Sys.time() (at the beginning of the code), and similarly prevtime and currtime (at the beginning and end of each iteration).
The values of the variables are (e.g., in the last iteration):
> currtime - start
Time difference of 5.486106 mins
> currtime - prevtime
Time difference of 1.239183 secs
R sets the units automatically. But if I then execute
> write( currtime - start, file = "test.Rout", append = F )
> write( currtime - prevtime, file = "test.Rout", append = T )
I get in test.Rout
5.486106
1.239183
with no units.
Is there any way to force the units for writing to file (write), e.g., all to seconds, so there is no ambiguity?
I guess I could scan the output of currtime - prevtime and find the units, but I am sure there is a very simple way to do this.
I guess I could also use system.time({mycommands}), but I think it would be easier to have variables assigned, since I might want to define time points in the middle of my loop and get various time differences.

Try isolating the units and combining them with the time value:
start <- Sys.time()
currtime <- start + 180
diff.time <- currtime - start
timediff <- paste(diff.time, attr(diff.time, "units"))
write(timediff, file = "test.Rout", append = FALSE)
Explanation
The object diff.time is of class difftime. When you write it to a file, its attributes are dropped. Check str(diff.time) to see the structure:
str(diff.time)
Class 'difftime' atomic [1:1] 3
..- attr(*, "units")= chr "mins"
The attribute we are looking for is "units" and its value is "mins". We can extract that attribute and paste it to the time difference.
Checking attributes
We can check a single attribute with attr(object, "name of attribute"):
attr(diff.time, "units")
[1] "mins"
We can also list all attributes with attributes() and subset the result list-style:
attributes(diff.time)
$units
[1] "mins"
$class
[1] "difftime"
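A minimal sketch of the alternative the question asks for, forcing every value into seconds: pass units = "secs" to difftime() (or convert an existing difftime with the units<- replacement function) and write plain numbers. The Sys.sleep() workload, the three-iteration loop and the temporary output file are placeholders, not part of the original code. The attributes(diff.time)$units line shows the list-style subset mentioned above.

```r
# Placeholder output file; use "test.Rout" in real code
out_file <- tempfile(fileext = ".Rout")

start <- Sys.time()
for (i in 1:3) {
  prevtime <- Sys.time()
  Sys.sleep(0.1)                          # placeholder for the real work
  currtime <- Sys.time()

  # An explicit unit never switches to "mins" or "hours"
  total_secs <- as.numeric(difftime(currtime, start, units = "secs"))
  iter_secs  <- as.numeric(difftime(currtime, prevtime, units = "secs"))

  # One line per iteration: iteration, total seconds, seconds for this iteration
  write(c(i, total_secs, iter_secs), file = out_file,
        ncolumns = 3, append = i > 1)
}

# Converting an existing difftime object also works
diff.time <- currtime - start             # units chosen automatically
units(diff.time) <- "secs"                # now explicitly in seconds
attributes(diff.time)$units               # list-style subset of the attributes

readLines(out_file)
```

Because every number in the file is in seconds, no unit label is needed per line.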

Related

R paste time difference with unit (sec, min etc)

In R I want to get the timing in a character string keeping the unit (e.g., if it is sec or min). Please see example code below.
T1 <- Sys.time()
T2 <- Sys.time()
duration <- T2-T1
# Looking at duration show unit:
duration
time_description <- paste("it took: ", round(duration, 2), sep="")
# However, in time_description the unit is removed
time_description
Preferably without using additional packages.
Thanks in advance.
You can use units() to extract the unit from a difftime object.
time_description <- sprintf('it took %.2f %s', duration, units(duration))
time_description
#[1] "it took 0.39 secs"
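Two more base-R options, as a sketch: format() on a difftime already appends the unit (and round() keeps the difftime class), while as.numeric() with a units argument pins the unit so the text does not change from "secs" to "mins" between runs. The value d is a made-up duration standing in for T2 - T1.

```r
# A made-up duration standing in for T2 - T1
d <- as.difftime(23.4567, units = "secs")

# format() keeps the unit; round() keeps the difftime class
time_description <- paste("it took:", format(round(d, 2)))
print(time_description)

# Pin the unit explicitly so the text is comparable across runs
time_in_mins <- sprintf("it took %.2f mins", as.numeric(d, units = "mins"))
print(time_in_mins)
```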

In R, image processing loop takes an order of magnitude longer to process after ~50 iterations

EDIT: changed the image names from Image1-Image11 to Image50-Image60 for clarity.
EDIT2: Solved by adding a garbage collection command after removing the image file in each loop iteration. Code is updated.
I have 400+ jpeg images in a folder. I'm trying to write a script to read each image, identify some text in the image, and then write the file name and that text into a data frame.
When I run the script below, the first ~50 iterations print a time of 0.1-0.3 seconds. Then, for a few iterations, each iteration takes 1-3 seconds. Then this jumps to 1-5 minutes, after which I kill the script.
library(dplyr)
library(magick)
fileList3 = list.files(path = filePath)
printJobXRes = data.frame(
  jobName = as.character(),
  xRes = as.numeric(),
  stringsAsFactors = FALSE
)
i = 0
for (fileName in fileList3){
  img = paste0(filePath, '/', fileName, '_TestImage.jpg')
  start_time = Sys.time()
  temp.xRes = image_read(img, strip = TRUE) %>%
    image_rotate(270) %>%
    image_crop('90x150+1750') %>%
    image_negate %>%
    image_convert(type = 'Bilevel') %>%
    image_ocr %>%
    as.numeric
  stop_time = Sys.time()
  i = i+1
  print(paste(fileName,'first attempt, item #', i))
  print(stop_time-start_time)
  temp.df3 = data.frame(
    jobName = fileName,
    xRes = temp.xRes,
    stringsAsFactors = FALSE
  )
  printJobXRes = rbind(printJobXRes, temp.df3)
  rm(temp.xRes)
  rm(temp.df3)
  rm(img)
  gc() #This solved the issue
}
Here are a few lines of the output:
#Images 1-49 process in .1-.3 seconds each
[1] "Image50.jpg first attempt, item # 50"
Time difference of 0.2320111 secs
[1] "Image51.jpg first attempt, item # 51"
Time difference of 0.213742 secs
[1] "Image52.jpg first attempt, item # 52"
Time difference of 0.2536581 secs
[1] "Image53.jpg first attempt, item # 53"
Time difference of 1.253844 secs
[1] "Image54.jpg first attempt, item # 54"
Time difference of 1.149764 secs
[1] "Image55.jpg first attempt, item # 55"
Time difference of 1.171134 secs
[1] "Image56.jpg first attempt, item # 56"
Time difference of 1.397093 secs
[1] "Image57.jpg first attempt, item # 57"
Time difference of 1.201915 secs
[1] "Image58.jpg first attempt, item # 58"
Time difference of 1.455768 secs
[1] "Image59.jpg first attempt, item # 59"
Time difference of 1.618744 secs
[1] "Image60.jpg first attempt, item # 60"
Time difference of 4.527751 mins
Can anyone offer suggestions as to why the loop doesn't continue to take ~0.1-0.3 seconds? All jpgs have roughly the same size and resolution, and all were generated from the same source.
I was able to solve my issue based on Mark's suggestion. I was removing the image file from memory in each loop iteration, but the freed-up memory was never actually reclaimed by R. I added a garbage collection call (gc()) to the loop to fix this issue, and the loop then ran as expected.
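A generic sketch of the same pattern, assuming you want per-iteration timings in one fixed unit: store each duration in seconds via difftime(units = "secs") instead of printing auto-unit values, and call gc() at the end of each iteration as in the fix above. work_on() is a hypothetical stand-in for the magick pipeline; it only allocates and discards a large vector.

```r
# Hypothetical stand-in for the image_read() ... image_ocr() pipeline
work_on <- function(n) {
  x <- numeric(n)     # allocate a large temporary object
  sum(x)
}

n_iter  <- 5
elapsed <- numeric(n_iter)    # per-iteration durations, always in seconds

for (i in seq_len(n_iter)) {
  start_time <- Sys.time()
  result     <- work_on(1e6)
  stop_time  <- Sys.time()

  elapsed[i] <- as.numeric(difftime(stop_time, start_time, units = "secs"))
  rm(result)
  invisible(gc())    # ask R to run garbage collection now
}

print(round(elapsed, 4))
```

With every value in seconds, a slowdown like the jump from "1.6 secs" to "4.5 mins" above shows up directly as a large number rather than hiding behind a unit change.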

Unexpected results in benchmark of read.csv / fread [duplicate]

I can run a piece of code for 5 or 10 seconds using the following code:
period <- 10 ## minimum time (in seconds) that the loop should run for
tm <- Sys.time() ## starting date & time
while(Sys.time() - tm < period) print(Sys.time())
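The code runs just fine for 5 or 10 seconds. But when I replace the period value with 60 for it to run for a minute, the code never stops. What is wrong?
As soon as the elapsed time exceeds 1 minute, the default unit changes from seconds to minutes. Sys.time() - tm then holds a value such as 1.02 (mins), and comparing a difftime with a plain number compares the bare values, so 1.02 < 60 stays TRUE and the loop keeps running. So you want to control the unit:
while (difftime(Sys.time(), tm, units = "secs")[[1]] < period) print(Sys.time())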
From ?difftime
If ‘units = "auto"’, a suitable set of units is chosen, the
largest possible (excluding ‘"weeks"’) in which all the absolute
differences are greater than one.
Subtraction of date-time objects gives an object of this class, by
calling ‘difftime’ with ‘units = "auto"’.
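The failure can be reproduced without waiting a minute by building the times by hand. A sketch: the fixed start time tm is arbitrary, and the corrected loop uses a short period of 0.3 seconds and a Sys.sleep() body (both assumptions, chosen only to keep the demo quick and quiet).

```r
tm <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")   # arbitrary start time

d_auto <- (tm + 61) - tm      # "auto" picks minutes: about 1.0167 mins
units(d_auto)
d_auto < 60                   # TRUE: the bare number 1.0167 is compared with 60

d_secs <- difftime(tm + 61, tm, units = "secs")
as.numeric(d_secs) < 60       # FALSE: 61 seconds is not below 60

units((tm + 3605) - tm)       # past one hour, "auto" picks "hours"

# Corrected loop with a short period so the demo finishes quickly
period <- 0.3
start  <- Sys.time()
n      <- 0
while (as.numeric(difftime(Sys.time(), start, units = "secs")) < period) {
  Sys.sleep(0.05)
  n <- n + 1
}
total <- as.numeric(difftime(Sys.time(), start, units = "secs"))
```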
Alternatively, use proc.time(), which reports several times in seconds ("user", "system", "elapsed") for the current R session. We want the "elapsed" time, i.e., the wall-clock time, so we retrieve the 3rd value of proc.time().
period <- 10
tm <- proc.time()[[3]]
while (proc.time()[[3]] - tm < period) print(proc.time())
If you are confused by the use of [[1]] and [[3]], please consult:
How do I extract just the number from a named number (without the name)?
How to get a matrix element without the column name in R?
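Extracting by name instead of by position does the same job as [[3]] and makes the intent explicit. A short sketch; the Sys.sleep() call is only a placeholder workload.

```r
t0 <- proc.time()
Sys.sleep(0.2)                     # placeholder workload
t1 <- proc.time()

names(t1)                          # element names of a proc_time object
elapsed_by_name  <- (t1 - t0)[["elapsed"]]
elapsed_by_index <- (t1 - t0)[[3]]
identical(elapsed_by_name, elapsed_by_index)
```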
Let me add some user-friendly reproducible examples. Your original code, with print() inside the loop, is annoying because it prints thousands of lines to the screen, so I use Sys.sleep() instead.
test.Sys.time <- function(sleep_time_in_secs) {
  t1 <- Sys.time()
  Sys.sleep(sleep_time_in_secs)
  t2 <- Sys.time()
  ## units = "auto"
  print(t2 - t1)
  ## units = "secs"
  print(difftime(t2, t1, units = "secs"))
  ## use '[[1]]' for clean output
  print(difftime(t2, t1, units = "secs")[[1]])
}
test.Sys.time(5)
#Time difference of 5.005247 secs
#Time difference of 5.005247 secs
#[1] 5.005247
test.Sys.time(65)
#Time difference of 1.084357 mins
#Time difference of 65.06141 secs
#[1] 65.06141
The "auto" unit choice adapts to the magnitude: if sleep_time_in_secs = 3605 (more than an hour), the unit changes to "hours".
Be careful with time units when using Sys.time(), or you may be fooled when benchmarking. Here is a perfect example: Unexpected results in benchmark of read.csv / fread. I answered it in a comment that has since been removed:
You have a problem with time units. I see that fread is more than 20 times faster. If fread takes 4 seconds to read a file, read.csv takes 80 seconds = 1.33 minutes. Ignoring the units, read.csv looks "faster".
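The benchmark trap can be shown with hypothetical timings matching the numbers in that comment (4 seconds vs 80 seconds). Comparing bare numbers is misleading, while comparing two difftime objects reconciles their units before comparing; converting both to one unit with as.numeric(..., units = "secs") also works. The values are made up for illustration, not measured.

```r
# Hypothetical timings: fread 4 seconds, read.csv 80 seconds
t_fread   <- as.difftime(4, units = "secs")
t_readcsv <- as.difftime(80, units = "secs")
units(t_readcsv) <- "mins"          # what "auto" units would report: 1.33 mins

# Misleading: compares the bare numbers 1.33 and 4
naive <- as.numeric(t_readcsv) < as.numeric(t_fread)

# Two difftime objects: units are reconciled before comparing
proper <- t_readcsv > t_fread

# Or convert both to one unit explicitly
speedup <- as.numeric(t_readcsv, units = "secs") / as.numeric(t_fread, units = "secs")
print(speedup)
```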
Now let's test proc.time.
test.proc.time <- function(sleep_time_in_secs) {
  t1 <- proc.time()
  Sys.sleep(sleep_time_in_secs)
  t2 <- proc.time()
  ## print user, system, elapsed time
  print(t2 - t1)
  ## use '[[3]]' for clean output of elapsed time
  print((t2 - t1)[[3]])
}
test.proc.time(5)
# user system elapsed
# 0.000 0.000 5.005
#[1] 5.005
test.proc.time(65)
# user system elapsed
# 0.000 0.000 65.057
#[1] 65.057
"user" time and "system" time are 0 because the R process is only sleeping, so it uses essentially no CPU time, either in its own code or in the system kernel.
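system.time() is built on proc.time() and returns the same user/system/elapsed structure. A sketch contrasting the sleeping case with an arbitrary CPU-bound loop (the loop is only an illustrative workload); for the busy loop, "user" time is expected to be clearly above zero, though exact numbers depend on the machine.

```r
# Sleeping: "elapsed" grows but the process uses almost no CPU time
idle <- system.time(Sys.sleep(0.3))

# Busy loop: the CPU is working, so "user" time grows along with "elapsed"
busy <- system.time({
  x <- 0
  for (i in 1:1e6) x <- x + sqrt(i)
})

print(idle)
print(busy)
idle[["elapsed"]]     # wall-clock seconds spent sleeping
busy[["user.self"]]   # CPU seconds spent running R code
```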

date of creation of a variable using R

Is there a way to document variables in R with their date of creation/modification?
I have .RData files that I use frequently, but sometimes I need to update the values depending on how old they are.
Try file.info():
To get the last modified time:
file.info('path/to/file.Rdata')$mtime
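To turn the modification time into an age with a fixed unit, subtract it from Sys.time() with difftime(units = "days"). A sketch: the temporary file stands in for 'path/to/file.Rdata', and the 10-day threshold is an arbitrary example.

```r
# A temporary file stands in for 'path/to/file.Rdata'
path <- tempfile(fileext = ".RData")
x <- 1:10
save(x, file = path)

mtime    <- file.info(path)$mtime     # last modification time (POSIXct)
age_days <- as.numeric(difftime(Sys.time(), mtime, units = "days"))

if (age_days > 10) {
  message("Data file is more than 10 days old; consider refreshing it")
}
```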
If you want to know when individual variables within your .RData file were last defined, the only approach I know is to add that metadata manually, something like this:
a = 3
attr(a, 'time_defined') = Sys.time()
b = 4
attr(b, 'time_defined') = Sys.time()
save(a, b, file = 'my_data.RData')
# ... later on ...
load('my_data.RData')
if (difftime(Sys.time(), attr(a, 'time_defined'), units = 'days') > 10) {
  # do the following if 'a' is more than 10 days old
}
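The order of arguments matters: difftime(time1, time2) computes time1 - time2, so the current time goes first to get a positive age. A sketch wrapping the check in a helper; is_older_than() is a hypothetical name, and the "11 days ago" timestamp is faked for illustration.

```r
# Hypothetical helper: TRUE if obj's 'time_defined' attribute is older than 'days'
is_older_than <- function(obj, days) {
  defined <- attr(obj, "time_defined")
  if (is.null(defined)) return(NA)          # no timestamp recorded
  as.numeric(difftime(Sys.time(), defined, units = "days")) > days
}

a <- 3
attr(a, "time_defined") <- Sys.time() - 11 * 24 * 3600   # pretend: 11 days ago
b <- 4
attr(b, "time_defined") <- Sys.time()                    # just now

is_older_than(a, 10)
is_older_than(b, 10)
```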
