SQLite transactions fail - r

I'm performing thousands of calculations with R and some external software.
To keep track of this, I have constructed a pipeline in R on a SQLite3 database.
To get things done, I've set it up to allow for multiple nodes on our computing cluster to run the R script.
Naturally, we need to keep things atomized so I am actively starting transactions.
To avoid the script from crashing when trying to get a lock on the database, I have the following code that attempts to begin a transaction, and if fails, waits and then retries:
dbBeginTransaction <- function(dbconn, state='DEFERRED', timeout=5, retries=10) {
state <- toupper(state)
if (!state %in% c('DEFERRED','IMMEDIATE','EXCLUSIVE')) stop('Attempt at illegal transaction.')
res <- NULL
exit <- FALSE
for (i in 1:retries) {
try(
res <- dbSendQuery(dbconn, paste('BEGIN',state,'TRANSACTION;'))
, silent=TRUE
)
# res is null if above query fails.
err <- dbGetException(dbconn)
if (err$errorNum == 0) { ## OK
#return(TRUE)
exit <- TRUE
break
} else if (err$errorNum == 5) { ## Database locked.
if (i == retries+1) {
cat('Database still locked after',i,'attempts.\n',file=stderr())
#return(FALSE)
exit <- FALSE
break
} else {
Sys.sleep(timeout)
}
} else {
## errorNum == 1 ## Already within transaction.
cat(err$errorMsg, '\n', file = stderr())
#return(FALSE)
exit <- FALSE
break
}
}
invisible(exit)
}
## Usage:
insert_results <- function(results) {
## Some preparing of results
if (dbBeginTransaction(conn, 'EXCLUSIVE') == FALSE) return(FALSE)
dbSendPreparedQuery(conn, 'INSERT INTO results (...) VALUES (...);', results)
dbCommit(conn)
}
## After a computation:
results <- magic()
if (!insert_results(results)) stop('Could not save results')
When debugging, it works as it should. But every now and then I get this odd error, and the script crashes:
Error in sqliteSendQuery(conn, statement, bind.data) :
rsqlite_query_send: could not execute: database is locked
Calls: dbSendPreparedQuery ... dbSendPreparedQuery -> .local -> sqliteSendQuery -> .Call
Execution halted
I cannot fathom what happens, and I am yet to reproduce the error.
The error is clear, but I would have thought that my routine had prevented it.
Any thoughts on why this is happening?
I'm running R under linux, as seen by:
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
[4] LC_COLLATE=en_US LC_MONETARY=en_US LC_MESSAGES=en_US
[7] LC_PAPER=en_US LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RSQLite_1.0.0 DBI_0.3.1

Encountered this problem today in the office today. This is the suggestion of my colleague: It can be solved by making sure that every time you do a database connection and make some sort of change to the database you do a dbCommit(con). The new code would look like this:
dbBeginTransaction <- function(dbconn, state='DEFERRED', timeout=5, retries=10) {
state <- toupper(state)
if (!state %in% c('DEFERRED','IMMEDIATE','EXCLUSIVE')) stop('Attempt at illegal transaction.')
res <- NULL
exit <- FALSE
for (i in 1:retries) {
try(
dbBegin(dbconn)
res <- dbSendQuery(dbconn, paste('BEGIN',state,'TRANSACTION;'))
, silent=TRUE
)
# res is null if above query fails.
err <- dbGetException(dbconn)
if (err$errorNum == 0) { ## OK
#return(TRUE)
exit <- TRUE
break
} else if (err$errorNum == 5) { ## Database locked.
if (i == retries+1) {
cat('Database still locked after',i,'attempts.\n',file=stderr())
#return(FALSE)
exit <- FALSE
break
} else {
Sys.sleep(timeout)
}
} else {
## errorNum == 1 ## Already within transaction.
cat(err$errorMsg, '\n', file = stderr())
#return(FALSE)
exit <- FALSE
break
}
}
invisible(exit)
}
## Usage:
insert_results <- function(results) {
## Some preparing of results
if (dbBeginTransaction(conn, 'EXCLUSIVE') == FALSE) return(FALSE)
dbSendPreparedQuery(conn, 'INSERT INTO results (...) VALUES (...);', results)
dbCommit(conn)
}
## After a computation:
results <- magic()
if (!insert_results(results)) stop('Could not save results')
And I solved by adding dbBegin(con) just before your begin any transactions with the database
dbBeginTransaction <- function(dbconn, state='DEFERRED', timeout=5, retries=10) {
state <- toupper(state)
if (!state %in% c('DEFERRED','IMMEDIATE','EXCLUSIVE')) stop('Attempt at illegal transaction.')
res <- NULL
exit <- FALSE
for (i in 1:retries) {
try(
res <- dbSendQuery(dbconn, paste('BEGIN',state,'TRANSACTION;'))
, silent=TRUE
dbCommit(dbconn)
)
# res is null if above query fails.
err <- dbGetException(dbconn)
if (err$errorNum == 0) { ## OK
#return(TRUE)
exit <- TRUE
break
} else if (err$errorNum == 5) { ## Database locked.
if (i == retries+1) {
cat('Database still locked after',i,'attempts.\n',file=stderr())
#return(FALSE)
exit <- FALSE
break
} else {
Sys.sleep(timeout)
}
} else {
## errorNum == 1 ## Already within transaction.
cat(err$errorMsg, '\n', file = stderr())
#return(FALSE)
exit <- FALSE
break
}
}
invisible(exit)
}
## Usage:
insert_results <- function(results) {
## Some preparing of results
if (dbBeginTransaction(conn, 'EXCLUSIVE') == FALSE) return(FALSE)
dbSendPreparedQuery(conn, 'INSERT INTO results (...) VALUES (...);', results)
dbCommit(conn)
}
## After a computation:
results <- magic()
if (!insert_results(results)) stop('Could not save results')

Related

tryCatch in R Programming

In R, I want to create a function that return Flag=0 if it encounters any error:
Error<-function(x){
tryCatch(x,
error=function(e{
Flag=0
}
)
}
When I enter: Error(5+b). It does not reflect Flag=0.
Flag <- 1
tryCatch(
5+b,
error=function(e) { Flag <<- 0}
)
Flag
[1] 0
In your code, the scope of Flag is local to the error handler. Using <<- makes the assignment in the global environment.
You can return a string ('error') here if error occurs and check it's value to return 1/0.
Error<-function(x){
y <- tryCatch(x,error=function(e) return('error'))
as.integer(y != 'error')
}
Error(sqrt('a'))
#[1] 0
Error(sqrt(64))
#[1] 1

scheduled cores ... did not deliver results, all values of the jobs will be affected in parallel::mclapply() in R 4.0.1

I'm using parallel::mclapply() with R 4.0.1 and getting the following warning: "scheduled cores ... did not deliver results, all values of the jobs will be affected".
Here the result of my investigation: inspecting the function source code, I realized that it happens when the vector dr is not all TRUE. This means that for some cores the second condition inside the for loop below (is.raw(a)) is never executed. a is the value returned by readChild(), that if returned raw data at least once, the condition would be verified at least once. So I'm thinking that readChild() is returning NULL.
readChild and readChildren return a raw vector with a "pid" attribute if data were available, an integer vector of length one with the process ID if a child terminated or NULL if the child no longer exists (no children at all for readChildren).
I ask you to validate or reject my conclusions. Finally if true what are the possible reasons?
while (!all(fin)) {
s <- selectChildren(ac[!fin], -1)
if (is.null(s)) break # no children -> no hope we get anything (should not happen)
if (is.integer(s))
for (ch in s) {
a <- readChild(ch)
if (is.integer(a)) {
core <- which(cp == a)
fin[core] <- TRUE
} else if (is.raw(a)) {
core <- which(cp == attr(a, "pid"))
job.res[[core]] <- ijr <- unserialize(a)
if (inherits(ijr, "try-error"))
has.errors <- c(has.errors, core)
dr[core] <- TRUE
} else if (is.null(a)) {
# the child no longer exists (should not happen)
core <- which(cp == ch)
fin[core] <- TRUE
}
}
}
This error message can occur when the child process dies/crashes, e.g.
> y <- parallel::mclapply(1:2, FUN = function(x) if (x == 1) quit("no") else x)
Warning message:
In parallel::mclapply(1:2, FUN = function(x) if (x == 1) quit("no") else x) :
scheduled core 1 did not deliver a result, all values of the job will be affected
> str(y)
List of 2
$ : NULL
$ : int 2
That a child process completely dies is of course not good. It can happen for several reasons. My best guess is that you parallelize something that must not be parallelized. Forked processing (=mclapply()) is known to be unstable with code that multi-thread, among other things.
For what's it worth, if you use the future framework instead (disclaimer: I'm the author), you'll get an error message that is a bit more informative, e.g.
> library(future.apply)
> plan(multicore)
> y <- future_lapply(1:2, FUN = function(x) if (x == 1) quit("no") else x)
Error: Failed to retrieve the result of MulticoreFuture (future_lapply-1) from
the forked worker (on localhost; PID 19959). Post-mortem diagnostic: No process
exists with this PID, i.e. the forked localhost worker is no longer alive.

Reconnect if connection fails using R

I have written small function but somehow does not work as expected.
I have connection to server and sometime the server is down so I cannot connect. The script is running in batch so I have to have it automatized.
The script should run the conn <- function(..) successfully (it means no error message) if not restart/re-check and restart again in approx. 1min time. This should run in loop until connection is established something like 12 hours. (approx.). The connection should be assigned to conn object so the object must return successful connection. (something like <Connection established, # 20180522 20:43:41 CET>
The function which does not work is here:
connect <- function(c) { message(paste("remaining run", c));
withRestarts(err <- tryCatch({ conn <- server.coonect(settings...) },
error=function(e) { invokeRestart("reconnect") }), reconnect = function() { message("re-connecting");
stopifnot(c > 0); for(i in 1:10) { Sys.sleep(6); cat(i) }; connect(c-1) }) }
connect(1000) # with approx. 1min sleep we get here over 12 hours running..
So the question is what is wrong and how to rewrite the function such it runs as expected. Thanks.
EDIT
It seems that the function should be:
connect <- function(c) { message(paste("remaining run", c));
withRestarts(err <- tryCatch({ server.coonect(settings...) },
error=function(e) { invokeRestart("reconnect") }), reconnect = function() { message("re-connecting");
stopifnot(c > 0); for(i in 1:10) { Sys.sleep(6); cat(i) } }) }
conn <- connect(1000)
EDIT 2
Here is comment for the above function I have tested:
I have tested the EDIT function by simulating the connection by first running the function without internet connection (now the function checks every 1:10 o 6sec, and after the function is running I connect to the internet, now I expect the function in next iteration pics up and connects to server if available...) what happens is that the function does not pick up the later possibility to connect...
If you only want to loop over the connection establishment this will work:
# simulate an instable connection
server.connect <- function(...) {
if (round(as.numeric(Sys.time())) %% 10 != 0) # about 90 % failed connections
stop("Connection error")
return(TRUE) # success
}
connect <- function(attempts = 1, sleep.seconds = 6) {
for (i in 1:attempts) {
res <- try(server.connect("my connection string"), silent = TRUE)
if (!("try-error" %in% class(res))) {
print("Connected...")
return(res)
}
print(paste0("Attempt #", i, " failed"))
Sys.sleep(sleep.seconds)
}
stop("Maximum number of connection attempts exceeded")
}
con <- connect(attempts = 10, sleep = 1)
Example execution log:
[1] "Attempt #1 failed"
[1] "Attempt #2 failed"
[1] "Attempt #3 failed"
[1] "Attempt #4 failed"
[1] "Attempt #5 failed"
[1] "Attempt #6 failed"
[1] "Connected..."

Return result from multiple if statements

I'm trying to use multiple actions in if statement. For example:
x <- 1
if (x == 1) {
paste("First")
1*1 #multiple actions
} else if (x == 2) {
paste("Second")
2*2 } else {("Nothing")
}
[1] 1 #what I'm getting
[2] "First"
1 #what I want to get
In this case only the second part of the expressions was printed to the console.
Any ideas how can I run all actions between if and else if ?
All statements are running as intended. The value of a statement is only printed to the console if all these conditions are true:
The value isn't saved to a variable
The value isn't invisible
It's the result of the last statement in an expression
R is running in interactive mode
The reason things sometimes print is to help people interactively exploring data in the command line. Instead of type print(x), they can save a few keystrokes by just typing x. In summary, use print if you want to be sure it's printed:
x <- 1
if (x == 1) {
print("First")
print(1*1)
} else if (x == 2) {
print("Second")
print(2*2)
} else {
invisible("Nothing")
}
# [1] "First"
# [1] 1
You can use print or cat:
getResult <- function(x = 1) {
if (x == 1) {
cat("First", 1 * 1, "\n")
} else if (x == 2) {
print("Second")
print(2 * 2)
} else {
cat("Nothing\n")
}
}
getResult()
# First 1
getResult(2)
# [1] "Second"
# [1] 4

Do loop again when error

I tried to read everything, but i kind of got stuck on one problem.
By using bigrquery I create queries to Google BigQuery to get data - unfortunately sometimes my query doesn't work because of a time-out.
Q is a SQL-Query and BQ is supposed to store the data downloaded from BigQuery.
Does anybody know how to re-do a loop every time tryCatch gives me an error?
I got this so far:
BQ_Foo <- NULL
tryCatch(
{
repeat{
BQ_Foo <- query_exec(Q_foo,"bigquery")
if(is.list(BQ_Foo) == TRUE)break }
}
,error=function(e){cat("ERROR : Query not loaded!", "\n")}
)
EDIT:
I tried my first approach again and this time i received this error message:
Error in curl::curl_fetch_memory(url, handle = handle) :
Operation was aborted by an application callback
Does anybody know how to handle this?
Widely based on r2evans answer, here's how to do the same kind of things with withRestarts, with some helps from This blog post:
set.seed(2)
foo <- NULL
operation <- function(x,tries) {
message(paste("x is",x,"remaining tries",tries))
withRestarts(
tryCatch({
if (runif(1) < x) stop("fail!") else 1
},
error=function(e) { invokeRestart("retry")}),
retry = function() {
message("Retrying")
stopifnot(tries > 0)
operation(x,tries-1)
}
)
}
> operation(0.9,5)
# x is 0.9 remaining tries 5
# Retrying
# x is 0.9 remaining tries 4
# Retrying
# x is 0.9 remaining tries 3
# Retrying
# x is 0.9 remaining tries 2
# Retrying
# x is 0.9 remaining tries 1
[1] 1
It's a kind of recursive call, so you can do whatever you want before calling the function again.
You may do it in the tryCatch error handler the same way, the interest to use restarts handlers is to call a specific function, if you had two tryCatch for which you want nearly same handler behavior then you can add a parameter and use the same handler for each try catch, i.e.:
testfun <- function(x) {
withRestarts({
tryCatch(
{
ifelse(runif(1) < 0.5,stop("Error Message"),warning("Warning message"))
},
warning=function(e) { invokeRestart("logger", level="warning", message=e ) },
error=function(e) { invokeRestart("logger", level="error", message=e ) }
)
},
logger = function(level,message) {
message(date()," [",level,"]: ",message[['message']])
}
)
}
Giving:
> set.seed(2)
> testfun()
Fri Jul 29 14:15:11 2016 [error]: Error Message
> testfun()
Fri Jul 29 14:15:12 2016 [warning]: Warning message
> testfun()
Fri Jul 29 14:15:13 2016 [warning]: Warning message
> testfun()
Fri Jul 29 14:15:13 2016 [error]: Error Message
Main interest here is the factorizing of the logger method and to reduce code duplication.
Naïve Solution
You might start with a mildly naïve attempt of putting the repeat/while outside the tryCatch, something like this:
set.seed(2)
foo <- NULL
while (is.null(foo)) {
foo <- tryCatch({
if (runif(1) < 0.9) stop("fail!") else 1
},
error = function(e) { message("err"); NULL; }
)
}
# err
# err
# err
# err
message("success: ", foo)
# success: 1
Unfortunately you introduce the possibility that the loop will never return. To protect against this, you can try a counter ...
Less-Naïve Solution
set.seed(2)
foo <- NULL
max_attempts <- 3
counter <- 0
while (is.null(foo) && counter < max_attempts) {
counter <- counter + 1
foo <- tryCatch({
if (runif(1) < 0.9) stop("fail!") else 1
},
error = function(e) { message("err"); NULL; }
)
}
# err
# err
# err
if (is.null(foo)) message("final failure") else message("success: ", foo)
# final failure
Now this is better for you, but it may still inadvertently introduce a denial-of-service "attack" on the server. (Consider "why" the query failed: if it is because the server is temporarily inundated, then you are making things worse by clobbering it even for a few limited requests.) Though it slows you down a little, in the case of a busy server, putting in pauses will ease the burden on the server and possibly give you a better chance of a successful query before failing.
Better Solution
In network parlance, small TCP packets can cause congestion when repeated retried (see Nagle's Algorithm for a quick reference). Using some form of exponential backoff is common, and to guard against two (or more) clients doing exactly same backoff simultaneously, some clients jitter slightly (for example, httr::RETRY).
set.seed(2)
foo <- NULL
max_attempts <- 3
# borrowed from hadley/httr::RETRY
pause_cap <- pause_base <- 1
counter <- 0
while (is.null(foo) && counter < max_attempts) {
if (counter > 0L) {
length <- stats::runif(1, max = min(pause_cap, pause_base * (2 ^ counter)))
message("sleeping ", round(length, 1))
Sys.sleep(length)
}
counter <- counter + 1
foo <- tryCatch({
if (runif(1) < 0.9) stop("fail!") else 1
},
error = function(e) { message("err"); NULL; }
)
}
# err
# sleeping 0.7
# err
# sleeping 0.2
if (is.null(foo)) message("final failure") else message("success: ", foo)
# success: 1
Wrap-Up
Somewhat sloppy code, but I hope you get the point. Putting loops on network queries without some form of self-limit can very easily escalate into an inadvertent DOS.
Based on your ideas i created this code, which seems to work - i just need to stresstest it.
QFoo <- paste0('SQL Code', dateBQ, ' ')
BQFoo <- NULL
testfun <- function(QFoo) {
withRestarts({
tryCatch({
query_exec(QFoo, "bigquery")
},
warning = function(e) { invokeRestart("logger", level="warning", message = e) },
error = function(e) { invokeRestart("logger", level="error", message = e) })
},
logger = function(level, message) {
message(date(), " [", level, "]: ", message[['message']])
})
}
testfun(QFoo)

Resources