Implementation of simple polling of results file - r

For one of my dissertation's data collection modules, I have implemented a simple polling mechanism. This is needed, because I make each data collection request (one of many) as SQL query, submitted via Web form, which is simulated by RCurl code. The server processes each request and generates a text file with results at a specific URL (RESULTS_URL in code below). Regardless of the request, URL and file name are the same (I cannot change that). Since processing time for different data requests, obviously, is different and some requests may take significant amount of time, my R code needs to "know", when the results are ready (file is re-generated), so that it can retrieve them. The following is my solution for this problem.
POLL_TIME <- 5 # polling timeout in seconds
In function srdaRequestData(), before making data request:
# check and save 'last modified' date and time of the results file
# before submitting data request, to compare with the same after one
# for simple polling of results file in srdaGetData() function
beforeDate <- url.exists(RESULTS_URL, .header=TRUE)["Last-Modified"]
beforeDate <<- strptime(beforeDate, "%a, %d %b %Y %X", tz="GMT")
<making data request is here>
In function srdaGetData(), called after srdaRequestData()
# simple polling of the results file
repeat {
if (DEBUG) message("Waiting for results ...", appendLF = FALSE)
afterDate <- url.exists(RESULTS_URL, .header=TRUE)["Last-Modified"]
afterDate <- strptime(afterDate, "%a, %d %b %Y %X", tz="GMT")
delta <- difftime(afterDate, beforeDate, units = "secs")
if (as.numeric(delta) != 0) { # file modified, results are ready
if (DEBUG) message(" Ready!")
break
}
else { # no results yet, wait the timeout and check again
if (DEBUG) message(".", appendLF = FALSE)
Sys.sleep(POLL_TIME)
}
}
<retrieving request's results is here>
The module's main flow/sequence of events is linear, as follows:
Read/update configuration file
Authenticate with the system
Loop through data requests, specified in configuration file (via lapply()),
where for each request perform the following:
{
...
Make request: srdaRequestData()
...
Retrieve results: srdaGetData()
...
}
The issue with the code above is that it doesn't seem to be working as expected: upon making data request, the code should print "Waiting for results ..." and then, periodically checking the results file for being modified (re-generated), print progress dots until the results are ready, when it prints confirmation. However, the actual behavior is that the code waits long time (I intentionally made one request a long-running), not printing anything, but then, apparently retrieves results and prints both "Waiting for results ..." and " Ready" at the same time.
It seems to me that it's some kind of synchronization issue, but I can't figure out what exactly. Or, maybe it's something else and I'm somehow missing it. Your advice and help will be much appreciated!

In a comment to the question, I believe MrFlick solved the issue: the polling logic appears to be functional, but the problem is that the progress messages are out of synch with current events on the system.
By default, the R console output is buffered. This is by design: to speed things up and avoid the distracting flicker that may be associated with frequent messages etc. We tend to forget this fact, particularly after we've been using R in a very interactive fashion, running various ad-hoc statement at the console (the console buffer is automatically flushed just before returning the > prompt).
It is however possible to get message() and more generally console output in "real time" by either explicitly flushing the console after each critical output statement, using the flush.console() function, or by disabling buffering at the level of the R GUI (right-click when on the console, see Buffered output Ctrl W item. This is also available in the Misc menu)
Here's a toy example of the explicit use of flush.console. Note the use of cat() rather than message() as the former doesn't automatically add a CR/LF to the output. The latter however is useful however because its messages can be suppressed with suppressMessages() and the like. Also as shown in the comment you can cat the "\b" (backspace) character to make the number overwrite one another.
CountDown <- function() {
for (i in 9:1){
cat(i)
# alternatively to cat(i) use: message(i)
flush.console() # <<<<<<< immediate ouput to console.
Sys.sleep(1)
cat(" ") # also try cat("\b") instead ;-)
}
cat("... Blast-off\n")
}
The output is the following, what is of course not evident in this print-out is that it took 10 seconds overall with one number printed every second, before the final "Blast off"; do remove the flush.console() statement and the output will come at once, after 10 seconds, i.e. when the function terminates (unless console is not buffered at the level of the GUI).
CountDown()
9 8 7 6 5 4 3 2 1 ... Blast-off

Related

R: Using source and loop cycle to run a 2nd script

I am using command source in a Script (Script Number 1) to run other script file saved (Script number 2).
My idea is use a loop to read continually the Script number 2 that is modified continually. I Never stop the loop. But Lamentably this dont work.
Always Script number 1 read the original Script number 2 and not change the result when I change the script number 2. I am using a sublime Text to change the script and not stop the loop cycle.
Example:
Example Script number 1:
repeat{
source("C:/Users/myPC/Desktop/script.R")
Sys.sleep(10)
}
Example Script number 2 modified (script saved as script.R in my desktop):
repeat {
print('Checking files')
Sys.sleep(time=10)
}
This run ok. But during the loop cycle I make changes to the script number 2 (and save file) :
Script number 2 modified:
repeat {
print('NOW RE-Checking files')
Sys.sleep(time=10)
}
The result always is this. Not read the script number 2 modified.
[1] "Checking files"
[1] "Checking files"
[1] "Checking files"
[1] "Checking files"
[1] "Checking files"
[1] "Checking files"
Not knowing what your scripts are really doing, it's a little guess-work, but I'll work on the two things I do know from this:
Checking to see if a file has been changed and then reloading it will work 95-99% or more of the time. Unfortunately, the risk is that whatever is writing the file will still be writing when your script-1 tries to source it: writing contents to a file is never atomic, so you cannot guarantee without some form of file-lock (not supported on all filesystems) or other coordination mechanism.
To remedy this, I suggest that what mechanism (sight-unseen) that is modifying script-2 needs to write to a temp-file and then destructively rename it to be script2.R. While writing to a file is not atomic, renaming/moving a file is atomic. With this, when script1.R notices that a file has been updated, it will get to read the whole thing unabated.
I suggest that instead of running code in script2.R, you should define a function that is called (repeatedly) by script-1. That way, script-1 retains control and can check for updates in script2.R and, as necessary, reload it. The function one writes in script-2 needs to do one thing and then yield control back to the caller; that is, no repeat or while loops unless they are very well contained/limited and will not take longer to exit than you expect script2.R to be updated. (Recognize that script-2's while/repeat loop will not be interrupted, so plan accordingly.)
(script2.R)
work <- function() {
print("Checking files")
Sys.sleep(10)
}
(script1.R)
prevmt <- NA
repeat {
mt <- file.info("script2.R")$mtime
if (is.na(prevmt) || prevmt < mt) {
source("script2.R")
}
work()
Sys.sleep(10)
}

Using Sys.sleep() to delay API call

I'm using R to make an API call to a weather data provider to download some weather forecasts. I'm using a free key that allows me to make no more than 10 calls per minute. I've tried using Sys.sleep() to ensure I don't go over the threshold but the API resource monitor tells me that I've exceeded the number of calls.
For example, if I'm making 6 calls, a time interval of 10 seconds between the calls ought to be sufficient (not taking into account the time R would need).
dat <- list()
for(i in 1:6){
dat[[i]] <- getWeatherData(web_url, api_key, history_date, data_format)
Sys.sleep(10)
web_url <- gsub(i-1, i, url)
}
The getWeatherData function does the following:
makes the API call (only one API call is made each time the function is invoked. Uses httr::GET() to get the data),
parses the XML output to get desired variables (regulat expressions),
performs some clean-up (for missing/garbage values),
converts strings to R date-time objects (POSIXct), and
rounds values to the nearest hour (lubridate::round_date()).
Function inputs:
web_url is a custom url,
api_key is my personal key,
history_date is a string (formatted as "%d/%m/%Y %H:%M:%S"), and
data_format specifies if I want an .XML or .json file as output.
I can not share the url/key for obvious reasons. As soon as I run this, I get a notification from the data provider that I've exceeded the allowable calls per minute (10). I don't get a notification every time - not sure why that is either.
Any help is appreciated!
This solution should be helpful for you if Sys.sleep doesn't do the trick.
Basically, this replaces the use of Sys.sleep with while logic.
dat <- list()
delay_seconds<-10
for(i in 1:6){
dat[[i]] <- getWeatherData(web_url, api_key, history_date, data_format)
date_time<-Sys.time()
while((as.numeric(Sys.time()) - as.numeric(date_time))<delay_seconds){}
web_url <- gsub(i-1, i, url)
}
Here, we are:
defining a number of seconds to wait ( delay_seconds<-10 )
defining a start time for comparison ( date_time<-Sys.time() )
using a while loop that checks the present time in comparison to our comparison time and seeing if this is less than our chosen delay interval ( (as.numeric(Sys.time()) - as.numeric(date_time)<delay_seconds )
doing nothing until the wait time is over( {} )
Not knowing if you need/want to, but in the case that you're hoping to get your data out of the lists and into a longer combined form, I recommend the dplyr function bind_rows().
dat2<-bind_rows(dat)
Thanks to an answer by rbtj to this question: How to make execution pause, sleep, wait for X seconds in R?

Rmpi send/receive hanging?

I am writing the code for a population MCMC. I will try to provide as much information I think could help, so please bear with me.
I am using tempered distributions and I want to perform exchange moves, i.e a moves that propose to swap the value of two chains.
What I have done (exchange happening in the master)
I had written this initially by
letting each chain mutate for a specified number of iterations n.
at every n-th iteration, I would send the slaves' results to the master and there attempt an exchange of parameters between chains.
Then send updated values back to slaves and repeat the process.
What I want to achieve (exchange directly between slaves)
This is working fine, but I wanted to clean up my code and remove unnecessary communication between master and slave . That is, let the slaves communicate directly between them.
So assuming I am spawning 10 slaves,
at iteration n, I want to let slave1-slave2, slave3-slave4, ...., slave9-slave10 communicate between them
at iteration 2*n, I want to let slave2-slave3, slave4-slave5, ...slave8-slave9 communicate between them
and so on, so that I let samples travel through the temperature ladder.
The problem
And this is where I am facing a problem.
I think I am managing to send a value from one slave to another (my print statement "Succesfully sent" gets printed in the log files of the slaves that are sending) but this doesn't seem to be received (my "Succesfully received" statement doesn't get printed in the log of the partner slaves).
And the program just hangs. I think maybe I have caused a deadlock, but I am not sure what I have done wrong?
Could you please advise? I have used as guide this Parallel Tempering R code
http://www.lindonslog.com/mathematics/parallel-tempering-r-rmpi/
Please see below my code
Many thanks!
Sofia
ind <- mpi.comm.rank()
oddFlag<-0 ### object to flag code suitable for odd/even numbered slaves.
for (i in 1:TotalIter) {
##### normal MCMC move (single chain mutation) - logL.current
if ( i%%exchangeInterval == 0 ){ ### every nth (right now 5th) iteration, attempt an exchange
message("\n\nAttempt an exchange move")
oddFlag<-oddFlag+1
exchange<-0
logL.partner<-0
if (ind%%2 == oddFlag%%2) { ###when oddFlag even , the following code concerns even-numbered slaves. When odd number, it concerns odd-numbered slaves.
ind.partner<-ind+1
if (0<ind.partner && ind.partner<(noChains+1)){
message("This is the slave: ", ind, " and its partner is: ", ind.partner)
message("The tag for receiving logL.partner is: ", ind.partner)
logL.partner<-mpi.recv.Robj(source=ind.partner,tag=ind.partner) #### receive the logL of partner
message("Succesfully received")
message("This is the logL.partner: ", logL.partner)
exchanges.attempted<-exchanges.attempted+1
if (runif(1)< min(1, exp((logL.partner - estimatorSelf)*(temper[ind] - temper[ind.partner] )))) { ############# exp((chain2 - chain1)*(T1 - T2))
message("I exchanged the values")
exchange<-1
print(exchange)
exchanges.accepted<-exchanges.accepted+1
}
mpi.send.Robj(obj=exchange,dest=ind.partner,tag=15*ind)
}
if (exchange==1){
### exchange parameters with mpi.send.Robj/mpi.recv.Robj functions
}
} else { ##### ###when oddFlag even , the following code concerns odd-numbered slaves. For oddFlag odd, it concerns even-numbered slaves.
ind.partner<-ind-1
if (0<ind.partner && ind.partner<(noChains+1)){
message("This is the slave: ", ind, " and its partner is: ", ind.partner)
message("The tag for sending logL.current is: ", ind)
mpi.send.Robj(obj=logL.current,dest=ind.partner,tag=ind) ### send logL to partner
message("Succesfully sent")
exchange<-mpi.recv.Robj(source=ind.partner, tag=15*ind.partner)
message("I received the exchange message")
}
if (exchange==1){
### exchange parameters send/receive functions
}
}
}
}
This seems very strange. To my knowledge the send and receive commands are locking so for the send to work without the receive to work strikes me as a bit odd. Have you tried the code on the guide? If you like I can send you the entire R-code just to see if it works on your system. If the code on the guide works then you can safely say that it isn't a problem with your MPI setup.
One thing you might try is for the receive command instead of using ind.partner you could use mpi.any.source() (i think is the correct one) which will accept a message of any tag. If this resolves your deadlock, it could be a problem with the tags (but by eye there doesn't seem anything wrong to me).
Another thing you might try is removing the "source=" and the "tag=" on the receive commands. I notice in my code that I don't have any there whilst I do for the send commands. Perhaps that was causing me too problems, but I can't quite remember. Let me know how it goes, I hope it all works out.

Kill a calculation programme after user defined time in R

Say my executable is c:\my irectory\myfile.exe and my R script calls on this executeable with system(myfile.exe)
The R script gives parameters to the executable programme which uses them to do numerical calculations. From the ouput of the executable, the R script then tests whether the parameters are good ore not. If they are not good, the parameters are changed and the executable rerun with updated parameters.
Now, as this executable carries out mathematical calculations and solutions may converge only slowly I wish to be able to kill the executable once it has takes to long to carry out the calculations (say 5 seconds)
How do I do this time dependant kill?
PS:
My question is a little related to this one: (time non dependant kill)
how to run an executable file and then later kill or terminate the same process with R in Windows
You can add code to your R function which issued the executable call:
setTimeLimit(elapse=5, trans=T)
This will kill the calling function, returning control to the parent environment (which could well be a function as well). Then use the examples in the question you linked to for further work.
Alternatively, set up a loop which examines Sys.time and if the expected update to the parameter set has not taken place after 5 seconds, break the loop and issue the system kill command to terminate myfile.exe .
There might possibly be nicer ways but it is a solution.
The assumption here is, that myfile.exe successfully does its calculation within 5 seconds
try.wtl <- function(timeout = 5)
{
y <- evalWithTimeout(system(myfile.exe), timeout = timeout, onTimeout= "warning")
if(inherits(y, "try-error")) NA else y
}
case 1 (myfile.exe is closed after successfull calculation)
g <- try.wtl(5)
case 2 (myfile.exe is not closed after successfull calculation)
g <- try.wtl(0.1)
MSDOS taskkill required for case 2 to recommence from the beginnging
if (class(g) == "NULL") {system('taskkill /im "myfile.exe" /f',show.output.on.console = FALSE)}
PS: inspiration came from Time out an R command via something like try()

User input when executing R code in batch mode

I am searching for a way to get user input inside a loop while executing in batch mode.
readLines() and scan() work well for me in interactive mode only, in batch mode they start to read in lines of code as user input, unless all the code is surrounded by {}, which is inconvenient. I need a simple solution to get just 1 integer value in a way that I can just type in value and press ENTER, so
the input field (if the solution involves GUI) must automatically get focus and
ENTER must trigger end of input/submission.
I can't find a way to do it that will satisfy both conditions, e.g. ginput() from gWidgets activates the input field, but ENTER doesn't trigger form submission.
Here is how I solved my own problem:
require(gWidgets)
options(guiToolkit="RGtk2")
INPUT <- function(message) {
CHOICE <- NA
w <- gbasicdialog(title=message, handler = function(h,...) CHOICE <<- svalue(input))
input <- gedit("", initial.msg="", cont=w, width=10)
addHandlerChanged(input, handler=function (h, ...) {
CHOICE <<- svalue(input)
dispose(w)
})
visible(w, set=TRUE)
return(CHOICE)
}
repeat{
x=INPUT("Input an integer")
if(!is.na(as.integer(x))) break
}
print(x)
Update:
I can't test this right now, but take a look at ?menu and have it pop up a gui window.
I'm not certain if that will work, but it is different in that it takes a mouse-click response.
original answer:
As per the documentation to ?readline:
This can only be used in an interactive session.
..
In non-interactive use the result is as if the response was RETURN and the value is "".
If you are simply waiting for one piece of information, and you do not know this piece of information before beginning the execution of the script (presumably, there is a decision to be made which is dependent on the results earlier in the script), then one alternative is to simply break your script up into three parts:
everything before the decision point.
an interactive script which prompts for input
everything after the decision point.
And simply chain the three together by having the first end by calling the second in an interactive session. Then have the second end by calling the third.

Resources