I am using odbc in R to get data from SQL Server.
Recently I ran into an issue: for some unknown reason, my query can take hours to return results from the SQL server. It was fine before, and the returned data is only about 10,000 rows. My data team colleagues haven't figured out the cause. My old code was:
getSqlData = function(server, sqlstr) {
  con = odbc::dbConnect(odbc(),
                        Driver = "SQL Server",
                        Server = server,
                        Trusted_Connection = "True")
  result = odbc::dbGetQuery(con, sqlstr)
  dbDisconnect(con)
  return(result)
}
At first, I tried to find a timeout parameter for dbGetQuery(). Unfortunately, there is no such parameter for this function, so I decided to monitor the runtime myself.
getSqlData = function(server, sqlstr) {
  con = odbc::dbConnect(odbc(),
                        Driver = "SQL Server",
                        Server = server,
                        Trusted_Connection = "True")
  result = tryCatch(
    {
      # withTimeout() is from the R.utils package
      a = withTimeout(odbc::dbGetQuery(con, sqlstr), timeout = 600, onTimeout = "error")
      return(a)
    },
    error = function(cond) {
      msg <- "The query timed out:"
      msg <- paste(msg, sqlstr, sep = " ")
      error(logger, msg)
      return(NULL)
    },
    finally = {
      dbDisconnect(con)
    }
  )
  return(result)
}
I force the function to stop if dbGetQuery() doesn't finish within 10 minutes. However, I get a warning message:
In connection_release(conn@ptr) : There is a result object still in use.
The connection will be automatically released when it is closed
My understanding is that this means the query is still running and the connection is not closed.
Is there a way to force the connection to be closed and force the query to stop?
The other thing I noticed is that even if I set timeout = 1, the error is not raised after 1 second; the query runs for around a minute before the error is raised. Does anyone know why it behaves like this?
Thank you.
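One possible workaround (a sketch only, not from the original question) is to manage the result handle explicitly with DBI::dbSendQuery() and DBI::dbFetch(), so that the finally block can clear the pending result with DBI::dbClearResult() before disconnecting. This should avoid the warning, though whether the server-side query is actually cancelled depends on the ODBC driver; withTimeout() here is assumed to come from R.utils.
getSqlData <- function(server, sqlstr) {
  con <- odbc::dbConnect(odbc::odbc(),
                         Driver = "SQL Server",
                         Server = server,
                         Trusted_Connection = "True")
  res <- NULL
  tryCatch(
    {
      res <- DBI::dbSendQuery(con, sqlstr)
      # Interrupt the fetch after 10 minutes
      R.utils::withTimeout(DBI::dbFetch(res), timeout = 600, onTimeout = "error")
    },
    error = function(cond) {
      message("The query timed out or failed: ", sqlstr)
      NULL
    },
    finally = {
      # Clear the pending result before disconnecting to avoid the
      # "result object still in use" warning
      if (!is.null(res)) try(DBI::dbClearResult(res), silent = TRUE)
      DBI::dbDisconnect(con)
    }
  )
}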
I am trying to transfer a large object from one session to another using a socketConnection.
With the following I can establish the connection between the two sessions and see that it works for small messages:
socket <- serverSocket(11927)
r <- callr::r_session$new()
r$call(function() {
  assign(
    "connection",
    socketConnection(port = 11927, open = "wb", blocking = TRUE, server = FALSE),
    envir = .GlobalEnv
  )
  NULL
})
con <- socketAccept(socket, open = "wb", blocking = TRUE)
close(socket)
r$read()
serialize("hello world", con)
r$run(function() {
  serialize(paste("hello from there: ", unserialize(connection)), connection)
})
unserialize(con)
Now if I try to serialize a large value, for example:
r$run(function() {
  x <- runif(256*256*3)
  serialize(x, connection)
  TRUE
})
The serialization never finishes. It's worth noting that this works as expected on Linux; I haven't tried it on Windows.
I think this should work, because the parallel package, which also uses socket connections to transfer objects, works as expected and can transfer large objects quickly. For example:
cl <- parallel::makePSOCKcluster(1)
parallel::clusterEvalQ(cl, {
  get_batch <- function() runif(256*256*3)
})
out <- parallel::clusterCall(cl, "get_batch")
Any idea on what could be causing this behavior on macOS?
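One hedged guess, not stated in the original post: r$run() blocks the parent session, so nothing drains the socket while the child writes, and a blocking write can stall once the OS send buffer fills up. A sketch of a possible workaround using callr's asynchronous r$call() so the parent can read while the child is still writing:
# Hypothetical workaround: start the write asynchronously, then read from
# the parent so the child's blocking serialize() can make progress.
r$call(function() {
  x <- runif(256*256*3)
  serialize(x, connection)
  TRUE
})
out <- unserialize(con)  # the parent drains the socket while the child writes
r$read()                 # collect the child session's return value afterwards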
Sometimes I run into an issue where the database query generates an error. One example is:
nanodbc/nanodbc.cpp:3069: 07009: [Microsoft][ODBC SQL Server Driver]Invalid Descriptor Index
I know why the error occurs but I can't seem to catch the error to try something else when something like this happens.
result <- tryCatch(
  data <- tbl(conn, query),
  error = function(e) {
    print("Error encountered: ", e)
    print("Attempting to run by sorting the columns")
    new_query <- create_query_with_column_names(query)
    print("Attempting to fetch the data with the new query")
    data <- tbl(conn, new_query)
    end_time <- Sys.time()
    show_query_runtime(total_time = end_time - start_time, caller = "fetch data without lazy loading.")
  }
)
But instead, the code runs without error, and it is only when I print the result that I get the error again.
> result
Error in result_fetch(res@ptr, n) :
  nanodbc/nanodbc.cpp:3069: 07009: [Microsoft][ODBC SQL Server Driver]Invalid Descriptor Index
Warning message:
In dbClearResult(res) : Result already cleared
The above code won't catch the error. Why? How can I fix this?
Take a look at this answer for detailed guidance on tryCatch in R.
The problem is most likely how you are returning values.
If it executes correctly, the try part returns the value of its last statement.
If the try part does not execute correctly, then the error handler returns the value of its last statement.
Right now, the last statement in your error handler is show_query_runtime(...), but it looks like what you want returned is tbl(conn, new_query).
Try the following; note the use of return() to specify the value that should be returned:
result <- tryCatch(
  # try section
  data <- tbl(conn, query),
  # error section
  error = function(e) {
    print("Error encountered: ", e)
    print("Attempting to run by sorting the columns")
    new_query <- create_query_with_column_names(query)
    print("Attempting to fetch the data with the new query")
    data <- tbl(conn, new_query)
    end_time <- Sys.time()
    show_query_runtime(total_time = end_time - start_time, caller = "fetch data without lazy loading.")
    return(data)
  }
)
In case it is part of the confusion: assigning data <- tbl(conn, new_query) within the handler function does not make the assignment in the calling environment, so the variable data is not available once the tryCatch finishes. This is why the value needs to be returned from the handler function.
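A minimal, self-contained illustration (hypothetical, not from the original answer) of the return behaviour described above:
result <- tryCatch(
  stop("boom"),
  error = function(e) {
    data <- "recovered value"  # the assignment stays local to the handler
    data                       # last expression, so tryCatch() returns it
  }
)
result
#> [1] "recovered value"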
I am using RStudio Server and ODBC to connect to a redshift database. I can connect easily using:
conn <- dbConnect(odbc::odbc(), Driver = "redshift",
                  Server = SERVER_URL,
                  Port = "5439",
                  Database = DB_NAME,
                  PWD = PASSWORD,
                  UID = CREDENTIALS,
                  timeout = 10,
                  Trusted_Connection = "True")
When connected, it shows up in the "Connections" pane in the sidebar, where I have a UI to browse the database. That is exactly what I want.
The problem is that if I call the same code inside a function, I get the database connection but no UI. How do I get the UI to appear when calling this code from inside a function?
Connection_odbc_profile <- function(INPUT) {
  conn <- dbConnect(odbc::odbc(), Driver = "redshift",
                    Server = SERVER_URL,
                    Port = "5439",
                    Database = DB_NAME,
                    PWD = PASSWORD,
                    UID = CREDENTIALS,
                    timeout = 10,
                    Trusted_Connection = "True")
  return(conn)
}
I think the issue is that the Connections pane only gets updated when the code is run at top level. Is there any way to force a line of code in a function to run at top level (or directly in the console)?
I solved the problem by adding:
code <- c(match.call()) # This saves what was typed into R
odbc:::on_connection_opened(conn, paste(c(paste("con <-", gsub(", ", ",\n\t", code))), collapse = "\n"))
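For reference, here is a hedged sketch of the helper with that workaround folded in; odbc:::on_connection_opened() is an unexported function, so its behaviour may change between odbc versions:
Connection_odbc_profile <- function(INPUT) {
  conn <- dbConnect(odbc::odbc(), Driver = "redshift",
                    Server = SERVER_URL,
                    Port = "5439",
                    Database = DB_NAME,
                    PWD = PASSWORD,
                    UID = CREDENTIALS,
                    timeout = 10,
                    Trusted_Connection = "True")
  code <- c(match.call())  # what was typed at the call site
  odbc:::on_connection_opened(
    conn,
    paste(c(paste("con <-", gsub(", ", ",\n\t", code))), collapse = "\n")
  )
  return(conn)
}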
I'm using a tbl_sql object in my Shiny app to have access to a database table. I've noticed that sometimes dplyr closes this connection; it might be because the garbage collector calls db_disconnector. Is there any way to stop this? I could close the connection on the Shiny close event.
It seems that if you use d <- src_mysql(...) (I guess that's the backend you're using and how you're connecting to the database?), then the garbage collector will only disconnect it once d goes out of scope. Maybe it's the database that is timing out connections as a way to manage load?
One way to test this is to write your own wrapper (rather than src_mysql()) that does not disconnect:
src_yoursql <- function(dbname, host = NULL, port = 0L, user = "root", password = "", ...) {
  if (!requireNamespace("RMySQL", quietly = TRUE)) {
    stop("RMySQL package required to connect to mysql/mariadb", call. = FALSE)
  }
  con <- DBI::dbConnect(RMySQL::MySQL(), dbname = dbname, host = host,
                        port = port, username = user, password = password, ...)
  info <- DBI::dbGetInfo(con)
  src_sql("mysql", con, info = info)
}
d = src_yoursql(...)
Close it manually with
DBI::dbDisconnect(d$con)
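If, as the question suggests, the connection should instead be closed when the Shiny session ends, a hypothetical sketch (the driver and connection details are placeholders) could look like this:
server <- function(input, output, session) {
  con <- DBI::dbConnect(RMySQL::MySQL(), dbname = "db", host = "host")
  # Disconnect explicitly when the session ends, instead of relying on
  # the garbage collector.
  session$onSessionEnded(function() {
    DBI::dbDisconnect(con)
  })
  # ... build tbl_sql objects from con and use them in the app ...
}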
I'm using RPostgreSQL and sqldf inside my function like this:
MyFunction <- function(Connection) {
  options(sqldf.RPostgreSQL.user = Connection[1],
          sqldf.RPostgreSQL.password = Connection[2],
          sqldf.RPostgreSQL.dbname = Connection[3],
          sqldf.RPostgreSQL.host = Connection[4],
          sqldf.RPostgreSQL.port = Connection[5])
  # ... some sqldf() stuff
}
How do I test that the connection is valid?
You can check that an existing connection is valid using isPostgresqlIdCurrent.
conn <- dbConnect("RPgSQL", your_database_details)
isPostgresqlIdCurrent(conn)
For testing new connections, I don't think that there is a way to know if a connection is valid without trying it. (How would R know that the database exists and is available until it tries to connect?)
For most analysis purposes, just stopping on an error and fixing the login details is the best approach. So just call dbConnect and don't worry about extra check functions.
If you are creating some kind of application where you need to handle errors gracefully, a simple tryCatch wrapper should do the trick.
conn <- tryCatch(dbConnect(wherever), error = function(e) do_something)
My current design uses tryCatch:
Connection <- c('usr', 'secret', 'db', 'host', '5432')

CheckDatabase <- function(Connection) {
  require(sqldf)
  require(RPostgreSQL)
  options(sqldf.RPostgreSQL.user = Connection[1],
          sqldf.RPostgreSQL.password = Connection[2],
          sqldf.RPostgreSQL.dbname = Connection[3],
          sqldf.RPostgreSQL.host = Connection[4],
          sqldf.RPostgreSQL.port = Connection[5])
  out <- tryCatch(
    {
      sqldf("select TRUE;")
    },
    error = function(cond) {
      out <- FALSE
    }
  )
  return(out)
}
if (!CheckDatabase(Connection)) {
  stop("Not a valid PostgreSQL connection.")
} else {
  message("PostgreSQL connection is valid.")
}
One approach is to simply try executing the code and catch any errors with a nice, informative error message. Have a look at the documentation of tryCatch to see the details of how this works.
The following blog post provides an introduction to the exception-based style of programming.
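As a small, hypothetical illustration of that try-and-fail-gracefully pattern (the connection details are placeholders, not from the original answer):
conn <- tryCatch(
  DBI::dbConnect(RPostgreSQL::PostgreSQL(),
                 dbname = "db", host = "host", port = 5432,
                 user = "usr", password = "secret"),
  error = function(e) {
    # Fail with an informative message instead of an opaque driver error
    stop("Could not connect to PostgreSQL: ", conditionMessage(e), call. = FALSE)
  }
)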