Clean way to wrap-up and handle RMySQL connections? - r

I'm fairly new to R, so forgive me if this is a amateur question. I still don't get parts of how the R language works and I haven't used closures enough to really build intuition on how to approach this problem.
I want to wrap up opening and closing a database connection in my R project in a clean way. I have a variety of scripts set aside that all use a common DB connection configuration file (I don't put it in my repo, it's a local file only), all of which need to connect to the same MySQL database.
The end goal is to do something like :
query <- db_open()
out <- query("select * from example limit 10")
db_close()
This is what I wrote so far (all my scripts load these functions from another .R file) :
db_open <- function() {
db_close()
db_conn <<- dbConnect(MySQL(), user = db_user, password = db_pass, host = db_host)
query <- function(...) { dbGetQuery(db_conn, ...) }
return(query)
}
db_close <- function() {
result <- tryCatch({
dbDisconnect(db_conn)
}, warning = function(w) {
# ignore
}, error = function(e) {
return(FALSE)
})
return(result)
}
I'm probably thinking of this in an OOP way when I shouldn't be, but sticking db_conn in the global environment feels unnecessary or even wrong.
Is this a reasonable way to accomplish what I want? Is there a better way that I'm missing here?
Any advice is appreciated.

You basically had it, you just need to move the query function into its own function. Regarding keeping db_conn, there really is no reason not to have it in the global environment.
db_open <- function() {
db_close()
db_conn <<- dbConnect(MySQL(), user='root', password='Use14Characters!', dbname='msdb_complex', host='localhost')
}
db_close <- function() {
result <- tryCatch({
dbDisconnect(db_conn)
}, warning = function(w) {
# ignore
}, error = function(e) {
return(FALSE)
})
return(return)
}
query <- function(x,num=-1)
{
q <- dbSendQuery(db_conn, x)
s <- fetch(q, num);
}
Then you should be able to do something like:
query <- db_open()
results <- query("SELECT * FROM msenrollmentlog", 10)
db_close()

Related

How to redo tryCatch after error in for loop

I am trying to implement tryCatch in a for loop.
The loop is built to download data from a remote server. Sometimes the server no more responds (when the query is big).
I have implemented tryCatch in order to make the loop keeping.
I also have added a sys.sleep() pause if an error occurs in order to wait some minutes before sending next query to the remote server (it works).
The problem is that I don't figure out how to ask the loop to redo the query that failed and lead to a tryCatch error (and to sys.sleep()).
for(i in 1:1000){
tmp <- tryCatch({download_data(list$tool[i])},
error = function(e) {Sys.sleep(800)})
}
Could you give me some hints?
You can do something like this:
for(i in 1:1000){
download_finished <- FALSE
while(!download_finished) {
tmp <- tryCatch({
download_data(list$tool[i])
download_finished <- TRUE
},
error = function(e) {Sys.sleep(800)})
}
}
If you are certain that waiting for 800 seconds always fixes the issue this change should do it.
for(i in 1:1000) {
tmp <- tryCatch({
download_data(list$tool[i])
},
error = function(e) {
Sys.sleep(800)
download_data(list$tool[i])
})
}
A more sophisticated approach could be, to collect the information of which request failed and then rerun the script until all requests succeed.
One way to do this is to use the possibly() function from the purrr package. It would look something like this:
todo <- rep(TRUE, length(list$tool))
res <- list()
while (any(todo)) {
res[todo] <- map(list$tool[todo],
possibly(download_data, otherwise = NA))
todo <- map_lgl(res, ~ is.na(.))
}

Wrapping functions in R similar to Python decorators

In my project I have a lot of functions to work with database looking like this:
some_function <- function(smth) {
con <- dbConnect(SQLite(), db)
-- something --
dbDisconnect(con)
return(smth)
}
Is there any way to reduce the code and write something like a Python decorator with connection and disconnetion from db?
like this:
#conenct-disconnect
some_funcion <- function(smth){
-- something --
}
Or may be another way to do it?
A function operator
with_con <- function(f, db) {
function(...) {
con <- dbConnect(SQLite(), db)
f(..., `con` = con)
on.exit(dbDisconnect(con))
}
}
some_func <- function(query, con) {
#so something with x and the connection
dbGetQuery(con, query)
}
connected_func <- with_con(some_func, db)
connected_func('SELECT * FROM table')

TryCatch with parLapply (Parallel package) in R

I am trying to run something on a very large dataset. Basically, I want to loop through all files in a folder and run the function fromJSON on it. However, I want it to skip over files that produce an error. I have built a function using tryCatch however, that only works when i use the function lappy and not parLapply.
Here is my code for my exception handling function:
readJson <- function (file) {
require(jsonlite)
dat <- tryCatch(
{
fromJSON(file, flatten=TRUE)
},
error = function(cond) {
message(cond)
return(NA)
},
warning = function(cond) {
message(cond)
return(NULL)
}
)
return(dat)
}
and then I call parLapply on a character vector files which contains the full paths to the JSON files:
dat<- parLapply(cl,files,readJson)
that produces an error when it reaches a file that doesn't end properly and does not create the list 'dat' by skipping over the problematic file. Which is what the readJson function was supposed to mitigate.
When I use regular lapply, however it works perfectly fine. It generates the errors, however, it still creates the list by skipping over the erroneous file.
any ideas on how I could use exception handling with parLappy parallel such that it will skip over the problematic files and generate the list?
In your error handler function cond is an error condition. message(cond) signals this condition, which is caught on the workers and transmitted as an error to the master. Either remove the message calls or replace them with something like
message(conditionMessage(cond))
You won't see anything on the master though, so removing is probably best.
What you could do is something like this (with another example, reproducible):
test1 <- function(i) {
dat <- NA
try({
if (runif(1) < 0.8) {
dat <- rnorm(i)
} else {
stop("Error!")
}
})
return(dat)
}
cl <- parallel::makeCluster(3)
dat <- parallel::parLapply(cl, 1:100, test1)
See this related question for other solutions. I think using foreach with .errorhandling = "pass" would be another good solution.

Using R try catch to download PDFs of the web

I'm trying to download PDFs using a table containing links. Due to inconsistent formatting of the links, I have created different versions of the same links residing in different columns. For privacy reasons I can't disclose the links, but here is what I did.
links <- data.frame(links1,links2,links3,links4)
filenames <- str_c(format(seq.Date(from = as.Date("2015-04-01"),
to = Sys.Date(), by = "day"),"%Y_%m_%d"),".pdf")
After having created all the versions and the names, here I try to write a loop wrapped in Try-Catch to continue despite the link not being correct. my goal is to when it doesn't find the link on column links$links1[3] to look at the other columns on the same row to find a working link.
Here is my try:
for (i in seq_along(links[,1])) {
#using trycatch to bipass the error when url doesn't exist
tryCatch({
if (!file.exists(str_c(folder,"/",filenames[i]))) {
download.file(links[i,1], filenames[i], mode = "wb")
print(paste0("Downloading: ", filenames[i]))
} }, error = function(e){
for (j in seq_along(links[i,])){
tryCatch({
download.file(links[i,j], filenames[i], mode = "wb")
}, error = function(e){}
)
}
}
)
}
For some reason its not picking up PDFs uploaded on April 9th 2015 and possibly other dates too.
The line for (j in seq_along(links[i,])){ is causing the inner loop to retry the already failed link. If the link fails, it will therefore fail again in the inner loop. Your program continues happily, never having tried the other links.
You should skip over j = 1 in your inner for loop.
Here's a slightly modified version of your program showing what is happening.
links1 <- c('a','b','c')
links2 <- c('x','y','z')
links <- data.frame(links1,links2)
for (i in seq_along(links[,1])) {
#using trycatch to bipass the error when url doesn't exist
tryCatch({
print(sprintf("trying: %s", links[i,1]))
if( i == 2 ) {
stop( simpleError("error"))
}
},
error = function(e){
for (j in seq_along(links[i,])){
tryCatch({
print(sprintf("falling back to %s", links[i,j]))
},
error = function(e){
})
}
})
}

Check that connection is valid

I'm using RPostgreSQL and sqldf inside my function like this:
MyFunction <- function(Connection) {
options(sqldf.RPostgreSQL.user = Connection[1],
sqldf.RPostgreSQL.password = Connection[2],
sqldf.RPostgreSQL.dbname = Connection[3],
sqldf.RPostgreSQL.host = Connection[4],
sqldf.RPostgreSQL.port = Connection[5])
# ... some sqldf() stuff
}
How do I test that connection is valid?
You can check that an existing connection is valid using isPostgresqlIdCurrent.
conn <- dbConnect("RPgSQL", your_database_details)
isPostgresqlIdCurrent(conn)
For testing new connections, I don't think that there is a way to know if a connection is valid without trying it. (How would R know that the database exists and is available until it tries to connect?)
For most analysis purposes, just stopping on an error and fixing the login details is the best approach. So just call dbConnect and don't worry about extra check functions.
If you are creating some kind of application where you need to to handle errors gracefully, a simple tryCatch wrapper should do the trick.
conn <- tryCatch(conn <- dbConnection(wherever), error = function(e) do_something)
My current design uses tryCatch:
Connection <- c('usr','secret','db','host','5432')
CheckDatabase <- function(Connection) {
require(sqldf)
require(RPostgreSQL)
options(sqldf.RPostgreSQL.user = Connection[1],
sqldf.RPostgreSQL.password = Connection[2],
sqldf.RPostgreSQL.dbname = Connection[3],
sqldf.RPostgreSQL.host = Connection[4],
sqldf.RPostgreSQL.port = Connection[5])
out <- tryCatch(
{
sqldf("select TRUE;")
},
error=function(cond) {
out <- FALSE
}
)
return(out)
}
if (!CheckDatabase(Connection)) {
stop("Not valid PostgreSQL connection.")
} else {
message("PostgreSQL connection is valid.")
}
One approach is to simply try executing the code, and catching any errors with a nice informative error message. Have a look at the documentation of tryCatch to see the details regarding how this works.
The following blog post provides an introduction to the exception-based style of programming.

Resources