Let's say I have a package with the following function:
foo <- function() {
Sys.sleep(1) # really expensive operation
return(1)
}
The value of the function is always the same per run, so I would like to use memoisation.
I thought I could simply do
foo <- memoise::memoise(function() {
Sys.sleep(1) # really expensive operation
return(1)
})
However, this doesn't work.
I mean, running it as a GlobalEnv function, it works:
foo <- memoise::memoise(function() {
Sys.sleep(1)
return(1)
})
system.time(foo())
#> user system elapsed
#> 0 0 1
system.time(foo())
#> user system elapsed
#> 0.01 0.00 0.01
Created on 2019-12-23 by the reprex package (v0.3.0)
However, if it's in a package, I get really weird behavior. Basically, memoisation doesn't kick in and I keep getting the same cost. However, if I print the function definition, it starts working!
system.time(bar::foo())
#> user system elapsed
#> 0.47 0.08 2.55
system.time(bar::foo())
#> user system elapsed
#> 0 0 2
system.time(bar::foo())
#> user system elapsed
#> 0.02 0.00 2.02
system.time(bar::foo())
#> user system elapsed
#> 0.01 0.00 2.02
bar::foo
#> Memoised Function:
#> function() {
#> Sys.sleep(2)
#> return (1)
#> }
#> <environment: namespace:bar>
system.time(bar::foo())
#> user system elapsed
#> 0 0 2
system.time(bar::foo())
#> user system elapsed
#> 0 0 0
system.time(bar::foo())
#> user system elapsed
#> 0 0 0
system.time(bar::foo())
#> user system elapsed
#> 0 0 0
For the record, here are the relevant parts of the NAMESPACE and DESCRIPTION files:
# NAMESPACE
export(foo)
importFrom(memoise,memoise)
# DESCRIPTION [...]
Imports:
memoise
What's going on here, and what should I do to make memoisation work from the start in my package?
This looks like a bug in the memoise package. When you are working on your own package, R may add debug information (called srcrefs) to functions. Something about those cause the hash to come out differently every time you call the function, so it never recognizes that you are calling with the same arguments.
A simple workaround is to remove the install option "--with-keep.source" when you install your own package. (If you're using RStudio, this is added automatically in Project Options | Build Tools | Install and Restart... .) This will stop R from adding the srcref, and the bug in memoise won't be triggered. Unfortunately, this cripples the debugger in RStudio and other front-ends, so it's not ideal.
Another workaround that doesn't mess with the debugger (except for that one function) is to use removeSource on the target that is being memoised. For example,
foo <- memoise::memoise(removeSource(function() {
Sys.sleep(1) # really expensive operation
return(1)
}))
Related
I want to cat() some information to the console in the case a test fails (I'm getting confident this won't happen but I can't prove it wont) so I can investigate the issue.
Now I have code that is approximately like this:
testthat::test_that('Maybe fails', {
seed <- as.integer(Sys.time())
set.seed(seed)
testthat::expect_true(maybe_fails(runif(100L)))
testthat::expect_equal(long_vector(runif(100L)), target, tol = 1e-8)
if (failed()) {
cat('seed: ', seed, '\n')
}
})
Unfortunately, failed() doesn't exist.
Return values of expect_*() don't seem useful, they just return the actual argument.
I'm considering to just check again using all.equal() but that is a pretty ugly duplication.
Instead of using cat, you could use the info argument managed by testthat and its reporters for all expect functions (argument kept for compatibility reasons):
library(testthat)
testthat::test_that("Some tests",{
testthat::expect_equal(1,2,info=paste('Test 1 failed at',Sys.time()))
testthat::expect_equal(1,1,info=paste('Test 2 failed at',sys.time()))
})
#> -- Failure (<text>:5:3): Some tests --------------------------------------------
#> 1 not equal to 2.
#> 1/1 mismatches
#> [1] 1 - 2 == -1
#> Test 1 failed at 2021-03-03 17:25:37
How do you pause an R script for a specified number of seconds or miliseconds? In many languages, there is a sleep function, but ?sleep references a data set. And ?pause and ?wait don't exist.
The intended purpose is for self-timed animations. The desired solution works without asking for user input.
See help(Sys.sleep).
For example, from ?Sys.sleep
testit <- function(x)
{
p1 <- proc.time()
Sys.sleep(x)
proc.time() - p1 # The cpu usage should be negligible
}
testit(3.7)
Yielding
> testit(3.7)
user system elapsed
0.000 0.000 3.704
Sys.sleep() will not work if the CPU usage is very high; as in other critical high priority processes are running (in parallel).
This code worked for me. Here I am printing 1 to 1000 at a 2.5 second interval.
for (i in 1:1000)
{
print(i)
date_time<-Sys.time()
while((as.numeric(Sys.time()) - as.numeric(date_time))<2.5){} #dummy while loop
}
TL;DR sys_sleep a new stable and precise sleep function
We already know that Sys.sleep could work not as expected, e.g. when CPU usage is very high.
That is why I decided to prepare a high quality function powered by microbenchmark::get_nanotime() and while/repeat mechanics.
#' Alternative to Sys.sleep function
#' Expected to be more stable
#' #param val `numeric(1)` value to sleep.
#' #param unit `character(1)` the available units are nanoseconds ("ns"), microseconds ("us"), milliseconds ("ms"), seconds ("s").
#' #note dependency on `microbenchmark` package to reuse `microbenchmark::get_nanotime()`.
#' #examples
#' # sleep 1 second in different units
#' sys_sleep(1, "s")
#' sys_sleep(100, "ms")
#' sys_sleep(10**6, "us")
#' sys_sleep(10**9, "ns")
#'
#' sys_sleep(4.5)
#'
sys_sleep <- function(val, unit = c("s", "ms", "us", "ns")) {
start_time <- microbenchmark::get_nanotime()
stopifnot(is.numeric(val))
unit <- match.arg(unit, c("s", "ms", "us", "ns"))
val_ns <- switch (unit,
"s" = val * 10**9,
"ms" = val * 10**7,
"us" = val * 10**3,
"ns" = val
)
repeat {
current_time <- microbenchmark::get_nanotime()
diff_time <- current_time - start_time
if (diff_time > val_ns) break
}
}
system.time(sys_sleep(1, "s"))
#> user system elapsed
#> 1.015 0.014 1.030
system.time(sys_sleep(100, "ms"))
#> user system elapsed
#> 0.995 0.002 1.000
system.time(sys_sleep(10**6, "us"))
#> user system elapsed
#> 0.994 0.004 1.000
system.time(sys_sleep(10**9, "ns"))
#> user system elapsed
#> 0.992 0.006 1.000
system.time(sys_sleep(4.5))
#> user system elapsed
#> 4.490 0.008 4.500
Created on 2022-11-21 with reprex v2.0.2
I am wondering about the possibility to speed up the process of writing to a file.
with my SSD and core i5 vPro I am getting following results for the file of 5234 KB:
system.time({
write(reportData, "aaa.txt")
})
user system elapsed
1.42 3.56 12.28
as well as
system.time({
fileConn<-file("aaa.txt")
writeLines(reportData, fileConn)
close(fileConn)
})
user system elapsed
1.43 3.46 13.61
and
system.time({
fileConn <- file("aaa.txt","w")
cat(reportData,file=fileConn,sep="")
close(fileConn)
})
user system elapsed
1.50 4.13 14.12
All of them seem to be implemented in the similar manner since the time execution is almost identical.
Is it possible to use Rcpp library, for c++ could definitely do it much faster?
EDIT
Without using Rcpp writeChar seems to be the fastest.
system.time({
fileConn<-file("aaa.txt")
writeChar(reportData, fileConn,nchar(reportData, type = "chars"))
close(fileConn)
})
user system elapsed
0.01 0.14 1.31
I have this example data, where I load some tickers
libs <- c('quantmod')
lapply(libs, require, character.only = T)
tickers<-c('T','AMD','AA','AMAT','BAC')
getSymbols(tickers,from="2013-01-01")
Then I created function like
FUNtest<-function (x,y){
data<-x
close<-data[,y]
return(tail(close))
}
which works like for example
FUNtest(AMD,4)
and the result is tail of closing prices of AMD
AMD.Close
2014-07-16 4.66
2014-07-17 4.57
2014-07-18 3.83
2014-07-21 3.78
2014-07-22 3.80
2014-07-23 3.76
But, for later usage, I need to be able to use function this way
FUNtest(tickers[2],4)
but it doesn't work. If I call
tickers[2]
it shows
> tickers[2]
[1] "AMD"
but it is not able to work in function. And advices how to fix it?
Thanks
There's a big difference between
FUNtest(AMD,4)
and
FUNtest("AMD",4)
With the former, you are passing a name which points to an xts object. In the latter, you are simply passing a character string. This string is in no way directly connected to the object of the same name.
If you want a function that works if you pass a character or an xts object, you can do
FUNtest<-function (x,y){
if(is(x, "xts")) {
data <- x
} else if (is(x, "character")) {
data <- get(x)
} else {
stop(paste("invalid x class:", class(x)))
}
close <- data[,y]
return(tail(close))
}
then both
FUNtest(AMD, 4)
FUNtest(tickers[2], 4)
will work.
But even better is not to use the behavior of quantmod where it adds variables to your global environment. This is the default that's being phased out because it encourages bad behavior. It's better to store them all in a list like
symb<-lapply(setNames(tickers, tickers), function(x)
getSymbols(x,from="2013-01-01", auto.assign=F))
Then you can have symb$AMAT or symb[["AMAT"]] depending on how you want to extract the data. The latter form is more flexible because you can specify a variable with a particular value or you can perform an action to all the data.sets by lapply-ing over the list.
You could try using get in the function.
get("AMD") finds AMD in the evaluation frame (or not) and returns the value attached to it.
> FUNtest<-function (x,y){
data<-get(x)
close<-data[,y]
return(tail(close))
}
> FUNtest(tickers[2], 4)
# AMD.Close
# 2014-07-16 4.66
# 2014-07-17 4.57
# 2014-07-18 3.83
# 2014-07-21 3.78
# 2014-07-22 3.80
# 2014-07-23 3.76
Also, there isn't really a need to use return here. This function might be better for you
> f <- function(x, y){ x <- get(x); tail(x[, y], 3) }
## on the entire tickers vector, get column 4 and bind them
> do.call(cbind, lapply(tickers, f, y = 4))
# T.Close AMD.Close AA.Close AMAT.Close BAC.Close
# 2014-07-16 36.45 4.66 16.60 22.85 15.51
# 2014-07-17 36.03 4.57 16.33 22.77 15.20
# 2014-07-18 36.17 3.83 16.49 23.00 15.49
eval can also be quite useful for unquoted arguments
> f <- function(x){ eval(x) }
> head(f(AMD), 3)
# AMD.Open AMD.High AMD.Low AMD.Close AMD.Volume AMD.Adjusted
# 2013-01-02 2.55 2.57 2.45 2.53 27214800 2.53
# 2013-01-03 2.52 2.59 2.46 2.49 24966900 2.49
# 2013-01-04 2.51 2.59 2.49 2.59 22054200 2.59
In the first case you're passing a dataframe called AMD; in the second you're just passing a character value "AMD".
I'm guessing that the dataframe AMD is already loaded into your work space so that's why FUNtest works in the first case.
if you want the function to work try either passing the data frame you want to the function, or tell the function where to find the data frame you want.
I've been trying to increase the speed of some code. I've removed all loops, am using vectors and have streamed lined just about everything. I've timed each iteration of my code and it appears to be slowing as iterations increase.
### The beginning iterations
user system elapsed
0.03 0.00 0.03
user system elapsed
0.03 0.00 0.04
user system elapsed
0.03 0.00 0.03
user system elapsed
0.04 0.00 0.05
### The ending iterations
user system elapsed
3.06 0.08 3.14
user system elapsed
3.10 0.05 3.15
user system elapsed
3.08 0.06 3.15
user system elapsed
3.30 0.06 3.37
I have 598 iterations and right now it takes about 10 minutes. I'd like to speed things up. Here's how my code looks. You'll need the RColorBrewer and fields packages. Here's my data. Yes I know its big, make sure you download the zip file.
StreamFlux <- function(data,NoR,NTS){
###Read in data to display points###
WLX = c(8,19,29,20,13,20,21)
WLY = c(25,28,25,21,17,14,12)
WLY = 34 - WLY
WLX = WLX / 44
WLY = WLY / 33
timedata = NULL
mf <- function(i){
b = (NoR+8) * (i-1) + 8
###I read in data one section at a time to avoid headers
mydata = read.table(data,skip=b,nrows=NoR, header=FALSE)
rows = 34-mydata[,2]
cols = 45-mydata[,3]
flows = mydata[,7]
rows = as.numeric(rows)
cols = as.numeric(cols)
rm(mydata)
###Create Flux matrix
flow_mat <- matrix(0,44,33)
###Populate matrix###
flow_mat[(rows - 1) * 44 + (45-cols)] <- flows+flow_mat[(rows - 1) * 44 + (45-cols)]
flow_mat[flow_mat == 0] <- NA
rm(flows)
rm(rows)
rm(cols)
timestep = i
###Specifying jpeg info###
jpeg(paste("Steamflow", timestep, ".jpg",sep = ''),
width = 640, height=441,quality=75,bg="grey")
image.plot(flow_mat, zlim=c(-1,1),
col=brewer.pal(11, "RdBu"),yaxt="n",
xaxt="n", main=paste("Stress Period ",
timestep, sep = ""))
points(WLX,WLY)
dev.off()
rm(flow_mat)
}
ST<- function(x){functiontime=system.time(mf(x))
print(functiontime)}
lapply(1:NTS, ST)
}
This is how to run the function
###To run all timesteps###
StreamFlux("stream_out.txt",687,598)
###To run the first 100 timesteps###
StreamFlux("stream_out.txt",687,100)
###The first 200 timesteps###
StreamFlux("stream_out.txt",687,200)
To test remove print(functiontime) to stop it printing at every timestep then
> system.time(StreamFlux("stream_out.txt",687,100))
user system elapsed
28.22 1.06 32.67
> system.time(StreamFlux("stream_out.txt",687,200))
user system elapsed
102.61 2.98 106.20
What I'm looking for is anyway to speed up running this code and possibly an explanation of why it is slowing down? Should I just run it in parts, seems a stupid solution. I've read about dlply from the plyr. It seems to have worked here but would that help in my case? How about parallel processing, I think I can figure that out but is it worth the trouble in this case?
I will follow #PaulHiemstra's suggestion and post my comment as an answer. Who can resist Internet points? ;)
From a quick glance at your code, I agree with #joran's second point in his comment: your loop/function is probably slowing down due to repeatedly reading in your data. More specifically, this part of the code probably needs to be fixed:
read.table(data, skip=b, nrows=NoR, header=FALSE).
In particular, I think the skip=b argument is the culprit. You should read in all the data at the beginning, if possible, and then retrieve the necessary parts from memory for the calculations.