Show all open RODBC connections

Does anyone know how to do this? showConnections won't list any open connections from odbcConnect.

You can narrow down your search in the following way, which returns all variables in your environment that are currently of class RODBC.
envVariables <- ls()
bools <- sapply(envVariables, function(string) {
  class(get(string)) == "RODBC"
})
rodbcObj <- envVariables[bools]
Closed connections are still of class RODBC, though, so there is still a little work to be done here.
We can define a function, using tryCatch, that tries to get the connection info of the associated RODBC object. If it is an open connection, this command runs fine, and we return the string of the variable name.
If the RODBC object is not an open connection, this throws an error, which we catch and, in the way I've implemented it, return NA. You could return any number of things here.
openConns <- function(string) {
  tryCatch({
    result <- odbcGetInfo(get(string))
    string
  }, error = function(e) {
    NA
  })
}
We then remove the return values that correspond to errors. In my case that is NA, so I call na.omit on the result.
na.omit(sapply(rodbcObj, openConns))
Or alternatively
result <- sapply(rodbcObj, openConns)
result[!is.na(result)]
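Putting the pieces together, a minimal sketch of a helper (the name listOpenRODBC is my own, not from RODBC; it assumes the package is loaded and searches the calling environment):
library(RODBC)

# Hypothetical helper: return the names of all open RODBC connections
# found in an environment (defaults to the caller's environment).
listOpenRODBC <- function(envir = parent.frame()) {
  vars <- ls(envir = envir)
  isRODBC <- vapply(vars, function(v) inherits(get(v, envir = envir), "RODBC"),
                    logical(1))
  rodbcObjs <- vars[isRODBC]
  isOpen <- vapply(rodbcObjs, function(v) {
    tryCatch({
      odbcGetInfo(get(v, envir = envir))  # errors on a closed connection
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  rodbcObjs[isOpen]
}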
Any questions or comments on it let me know
-DMT

Related

Adding a column to MongoDB from R via mongolite gives persistent error

I want to add a column to a MongoDB collection via R. The collection is tabular and already relatively big (14,000,000 entries, 140 columns).
The function I am currently using is
function(collection, name, value) {
  mongolite::mongo(collection)$update("{}",
    paste0("{\"$set\":{\"", name, "\": ", value, "}}"),
    multiple = TRUE)
  invisible(NULL)
}
It does work so far. (It takes about 5-10 minutes, which is OK, although it would be nice if the speed could be improved somehow.)
However, it also persistently gives me the following error, which interrupts the execution of the rest of the script:
Error: Failed to send "update" command with database "test": Failed to read 4 bytes: socket error or timeout
Any help on resolving this error would be appreciated. (If there are ways to improve the performance of the update itself, I'd also be more than happy for any advice.)
The default socket timeout is 5 minutes.
You can override the default by setting sockettimeoutms directly in your connection URI:
mongoURI <- paste0("mongodb://", user, ":", pass, "@", mongoHost, ":", mongoPort, "/", db, "?sockettimeoutms=<something large enough in milliseconds>")
mcon <- mongo(mongoCollection, url=mongoURI)
mcon$update(...)
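For example, a sketch with an illustrative 20-minute timeout (the credentials, host, and collection name are placeholders; 1200000 ms is an arbitrary choice):
library(mongolite)

# Placeholder connection details; substitute your own
mongoURI <- paste0("mongodb://", "user", ":", "pass", "@",
                   "localhost", ":", 27017, "/", "test",
                   "?sockettimeoutms=1200000")  # 20 minutes

mcon <- mongo("myCollection", url = mongoURI)
mcon$update('{}', '{"$set":{"newColumn": 0}}', multiple = TRUE)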

How to force a For loop() or lapply() to run with error message in R

With this code, when I use a for loop or the lapply function, I get the following error:
Error in get_entrypoint (debug_port): Cannot connect R to Chrome. Please retry.
library(rvest)
library(xml2)    # pull html data
library(selectr) # for xpath element
url_stackoverflow_rmarkdown <-
  'https://stackoverflow.com/questions/tagged/r-markdown?tab=votes&pagesize=50'
web_page <- read_html(url_stackoverflow_rmarkdown)
questions_per_page <- html_text(html_nodes(web_page, ".page-numbers.current"))[1]
link_questions <- html_attr(html_nodes(web_page, ".question-hyperlink")[1:questions_per_page],
                            "href")
setwd("~/WebScraping_chrome_print_to_pdf")
for (i in 1:length(link_questions)) {
  question_to_pdf <- paste0("https://stackoverflow.com",
                            link_questions[i])
  pagedown::chrome_print(question_to_pdf)
}
Is it possible to build a for loop or use lapply to resume the code from where it breaks, that is, from the last i value, without stopping the script?
Many thanks
I edited @Rui Barradas's idea of using tryCatch().
You can try to do something like below.
IsValues will get either the link value or the bad i values.
IsValues <- list()
for (i in 1:length(link_questions)) {
  question_to_pdf <- paste0("https://stackoverflow.com",
                            link_questions[i])
  IsValues[[i]] <- tryCatch(
    {
      message(paste("Converting", i))
      pagedown::chrome_print(question_to_pdf)
    },
    error = function(cond) {
      message(paste("Cannot convert", i))
      # Choose a return value in case of error
      return(i)
    })
}
Then you can rbind your values and extract the bad i values:
do.call(rbind, IsValues)[!grepl("\\.pdf$", do.call(rbind, IsValues))]
[1] "3" "5" "19" "31"
You can read more about tryCatch() in this answer.
Based on your example, it looks like you have two errors to contend with. The first error is the one you mention in your question. It is also the most frequent error:
Error in get_entrypoint (debug_port): Cannot connect R to Chrome. Please retry.
The second error arises when there are links in the HTML that return 404:
Failed to generate output. Reason: Failed to open https://lh3.googleusercontent.com/-bwcos_zylKg/AAAAAAAAAAI/AAAAAAAAAAA/AAnnY7o18NuEdWnDEck_qPpn-lu21VTdfw/mo/photo.jpg?sz=32 (HTTP status code: 404)
The key phrase in the first error is "Please retry". As far as I can tell, chrome_print sometimes has issues connecting to Chrome. It seems to be fairly random, i.e. failed connections in one run will be fine in the next, and vice versa. The easiest way to get around this issue is to just keep trying until it connects.
I can't come up with any fix for the second error. However, it doesn't seem to come up very often, so it might make sense to just record it and skip to the next URL.
Using the following code I'm able to print 48 of 50 pages. The only two I can't get to work have the 404 issue I describe above. Note that I use purrr::safely to catch errors. Base R's tryCatch will also work fine, but I find safely to be a little more convenient. That said, in the end it's really just a matter of preference.
Also note that I've dealt with the connection error by utilizing repeat within the for loop. R will keep trying to connect to Chrome and print until it is either successful, or some other error pops up. I didn't need it, but you might want to include a counter to set an upper threshold for the number of connection attempts:
quest_urls <- paste0("https://stackoverflow.com", link_questions)
errors <- NULL
safe_print <- purrr::safely(pagedown::chrome_print)

for (qurl in quest_urls) {
  repeat {
    output <- safe_print(qurl)
    if (is.null(output$error)) break
    else if (grepl("retry", output$error$message)) next
    else {
      errors <- c(errors, `names<-`(output$error$message, qurl))
      break
    }
  }
}
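As noted above, a counter can cap the number of connection attempts; a minimal variation of the same loop (the limit of 5 attempts is an arbitrary choice):
max_attempts <- 5  # arbitrary upper bound on retries per URL

for (qurl in quest_urls) {
  attempts <- 0
  repeat {
    output <- safe_print(qurl)
    attempts <- attempts + 1
    if (is.null(output$error)) break
    if (attempts >= max_attempts) {
      errors <- c(errors, `names<-`("too many attempts", qurl))
      break
    }
    if (grepl("retry", output$error$message)) next
    errors <- c(errors, `names<-`(output$error$message, qurl))
    break
  }
}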

Repeatedly running a function in R till an error is not produced

I apologize that I cannot tell you what these functions are from the start.
I have a function CheckOutCell. It takes one argument, and that is the number 764. So every time I run the function it looks like this in its entirety: CheckOutCell(764).
Now many times the function will give me an error:
Error in checkInCell(764) :
The function is currently locked; try again in a minute.
Which is a custom error message and the details are not important to this question.
Now this function could be locked for anywhere from 30 seconds to an hour. I want to be able to automatically run CheckOutCell(764) till it goes through, and then stop running it. That is, run it till I do not get an error, then stop.
I think a start would be using
while(capture.output(checkInCell(764)) == "Error in checkInCell(764) :
  The function is currently locked; try again in a minute."){
  do something
}
However, this just produces
Error in checkInCell(764) :
The function is currently locked; try again in a minute.
because the function is still locked, so no output can be captured.
How would I test for something like while(error == TRUE)?
Assume the source code of the function cannot be modified.
Even is.error(CheckInCell(764)) will just produce the same error message.
So it seems that this code works in a way
wrapcheck <- function(x) {
  repeatCheck <- tryCatch(checkOutCell(x),
                          error = function(cond) "skip")
  while (identical(repeatCheck, "skip")) {
    repeatCheck <- tryCatch(checkOutCell(x),
                            error = function(cond) "skip")
  }
  repeatCheck
}
wrapcheck(764)
Basically this checks for an error and then keeps running the function till the error is not produced. In fact I am fairly confident that this would work with any function you wanted to put in place of CheckOutCell.
The main problem is that when the function is locked, that is not really an error; it is a lock. Therefore the block above will not work for a lock; it will work when errors other than a lock are produced.
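If the lock really does surface as a catchable error, as the quoted message suggests, a sketch that waits between attempts may be gentler, since the lock can last up to an hour (the 30-second wait and the function name retryCheckOut are my own choices):
retryCheckOut <- function(x, wait = 30) {
  repeat {
    result <- tryCatch(checkOutCell(x), error = function(cond) NULL)
    if (!is.null(result)) return(result)
    Sys.sleep(wait)  # pause before trying again while the lock clears
  }
}
retryCheckOut(764)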

parlapply on sqlQuery from RODBC

R Version : 2.14.1 x64
Running on Windows 7
Connecting to a database on a remote Microsoft SQL Server 2012
I have an unordered vector of names, say:
names <- c("A", "B", "A", "C", "C")
each of which has an id in a table in my db. I need to convert the names to their corresponding ids.
I currently have the following code to do it.
###
names <- c("A", "B", "A", "C", "C")
dbConn <- odbcDriverConnect(connection="connection string") # successfully connects
nameToID <- function(name, dbConn){
  # dbConn : active db connection formed via odbcDriverConnect
  # name : a char string
  sqlQuery(dbConn, paste("select id from table where name='", name, "'", sep=""))
}
sapply(names, nameToID, dbConn=dbConn)
###
Barring better ways to do this, which could involve loading the table into R and then working with the problem there (which is possible), I understand why the following doesn't work, but I cannot seem to find a solution. Attempting to use parallelization via the package 'parallel':
###
names <- c("A", "B", "A", "C", "C")
dbConn <- odbcDriverConnect(connection="connection string") # successfully connects
nameToID <- function(name, dbConn){
  # dbConn : active db connection formed via odbcDriverConnect
  # name : a char string
  sqlQuery(dbConn, paste("select id from table where name='", name, "'", sep=""))
}
mc <- detectCores()
cl <- makeCluster(mc)
clusterExport(cl, c("sqlQuery", "dbConn"))
parSapply(cl, names, nameToID, dbConn=dbConn) # incorrect passing of nameToID's second argument
###
As in the comment, this is not the correct way to assign the second argument to nameToID.
I have also tried the following:
parSapply(cl, names, function(x) nameToID(x, dbConn))
in place of the previous parSapply call, but that also does not work, with the error saying "the first parameter is not an open RODBC connection", presumably referring to the first parameter of sqlQuery(). dbConn remains open, though.
The following code does work with parallelization.
###
names <- c("A", "B", "A", "C", "C")
dbConn <- odbcDriverConnect(connection="connection string") # successfully connects
nameToID <- function(name){
  # name : a char string
  dbConn <- odbcDriverConnect(connection="string")
  result <- sqlQuery(dbConn, paste("select id from table where name='", name, "'", sep=""))
  odbcClose(dbConn)
  result
}
mc <- detectCores()
cl <- makeCluster(mc)
clusterExport(cl, c("sqlQuery", "odbcDriverConnect", "odbcClose", "dbConn", "nameToID")) # throwing everything in
parSapply(cl, names, nameToID)
###
But the constant opening and closing of the connection ruins the gains from parallelization, and seems just a bit silly.
So the overall question is: how do you pass the second parameter (the open db connection) to the function within parSapply, in much the same way as with the regular apply functions? In general, how does one pass a second, third, or nth parameter to a function within a parallel routine?
Thanks and if you need any more information let me know.
-DT
Database connection objects can't be exported or passed as function arguments because they contain socket connections. If you try, it will be serialized, sent to the workers and deserialized, but it won't work correctly since the socket connection won't be valid.
The solution is to create the database connection on each worker before calling parSapply. I often do that using clusterEvalQ:
clusterEvalQ(cl, {
  library(RODBC)
  dbConn <- odbcDriverConnect(connection="connection string")
  NULL
})
Now the worker function can be written as:
nameToID <- function(name) {
  sqlQuery(dbConn, paste("select id from table where name='", name, "'", sep=""))
}
and called with:
parSapply(cl, names, nameToID)
Also note that since RODBC is loaded on each of the workers you don't have to export functions defined in it, which I think is good programming practice.
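When the work is done, a small cleanup sketch along the same lines, closing the per-worker connection before stopping the cluster (assuming cl and dbConn as created above):
# Close the connection on each worker, then shut the cluster down
clusterEvalQ(cl, {
  odbcClose(dbConn)
  NULL
})
stopCluster(cl)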

Websockets in R

I managed to establish a connection in R to the MtGox websocket with the following specs:
url: https://socketio.mtgox.com/mtgox?Currency=USD
port: 80
specs: https://en.bitcoin.it/wiki/MtGox/API/Streaming
I used the improved R library "websockets" downloaded from https://github.com/zeenogee/R-Websockets:
require("websockets")
con = websocket("https://socketio.mtgox.com/mtgox?Currency=USD")
and the connection was successfully established. However, it seems that the socket is not broadcasting. I made a simple function f
f = function(con) {
  Print("Test Test!", con)
}
set_callback("receive", f, con)
while(TRUE)
{
  service(con)
  Sys.sleep(0.05)
}
which should print some text whenever data are received from the websocket. But the websocket doesn't seem to trigger the "receive" method and nothing is displayed. The code ended up in an infinite loop with no output.
I know that the websocket is working, so there must be a mistake in the code. Do I have to "ping" the socket somehow to start broadcasting? Does anyone have an idea how to get it working?
Thanks!
Firstly, you have an infinite loop because you have defined an infinite loop:
while(TRUE)
It is worth noting that numerous R websocket implementations leverage this kind of service loop, so it may not be a bug but rather an implementation detail causing what you are seeing.
It would appear that you need to subscribe to the 'message' event, not 'receive' (https://en.bitcoin.it/wiki/MtGox/API/Streaming).
In JavaScript (from MtGox Spec):
conn.on('message', function(data) {
  // Handle incoming data object.
});
Or in R:
set_callback('message', f, con)
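Putting that into the question's loop, a sketch (keeping the callback shape from the question; the exact callback signature may differ across versions of the websockets package):
f = function(con) {
  cat("message received\n")  # replace with real handling once data flows
}
set_callback("message", f, con)
while (TRUE) {
  service(con)
  Sys.sleep(0.05)
}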
Failing that...
I would also comment that the stream may be returning data that you are not able to implicitly print with R's print function.
Sample:
{
  "op": "remark",
  "message": <MESSAGE FROM THE SERVER>,
  "success": <boolean>
}
If the data follows this format as defined in the spec, you may wish to examine how that data is being parsed, and the "op" which is being returned.
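A sketch of that kind of inspection, assuming jsonlite is available and data holds the raw JSON string from the socket:
library(jsonlite)

msg <- fromJSON(data)  # parse the raw payload
cat("op:", msg$op, "\n")
if (identical(msg$op, "remark")) {
  cat("server remark:", msg$message, "\n")
}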
