R tryCatch RODBC function issue - r

We have a number of MS Access databases on a server which are copies from remote locations which are updated overnight. We collate some of the data from these machines for reporting purposes on a daily basis. Sometimes the overnight update fails, meaning we don’t have access to all of the databases, so I am attempting to write an R script which will test if we can connect (using a list of the database paths), and output an updated version of the list including only those which we can connect to. This will then be used to run a further script which will only update the data related to the available databases.
This is what I have so far (I am new to R but reasonably proficient in SAS and SQL – attempting to use R both as a learning exercise and for potential cost savings);
{
# Create Store data locations listing
A=matrix(c(1000,1,"One","//Server/Comms1/Access.mdb"
,2000,2,"Two","//Server/Comms2/Access.mdb"
,3000,3,"Three","//Server/Comms3/Access.mdb"
)
,nrow=3,ncol=4,byrow=TRUE)
# Add column names
colnames(A)<-c("Ref1","Ref2","Ref3","Location")
#Create summary for testing connections (Ref1 and Location)
B<-A[,c(1,4)]
ConnectionTest<-function(Ref1,Location)
{
out<-tryCatch({ch<-odbcDriverConnect(paste("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=",Location))
sqlQuery(ch,paste("select ",Ref1," as Ref1,COUNT(variable) as Count from table"))}
,error=matrix(c(Ref1,0),nrow=1,ncol=2,byrow=TRUE)
)
return(out)
}
#Run function, using 'B' to provide arguments
C<-apply(B,1,function(x)do.call(ConnectionTest,as.list(x)))
#Convert to matrix and add column names
D<-matrix(unlist(C),ncol=2,byrow=T)
colnames(D)<-c("Ref1","Count")
}
When I run the script I get the following error message;
Error in value[3L] : attempt to apply non-function
I am guessing this is because I am using TryCatch incorrectly inside the UDF?
Does anyone have any advice on what I am doing incorrectly, or even if this is the best way to do what I am attempting?
Thanks
(apologies if this is formatted incorrectly, having to post on my phone due to Stackoverflow posting being blocked)
Edit - I think I fixed the 'Error in value[3L]' issue by adding function(e) {} around the matrix function in the error part of the tryCatch.
The issue now is that the script just fails if it can't reach one of the databases, rather than doing the matrix function. Do I need to add something else to make it ignore the error?
Edit 2 - it seems tryCatch does now work - it processes the
alternate function upon error but also shows warnings about the error, which makes sense.

As mentioned in the edit above, using 'function(e) {}' to wrap the Matrix function in the error section of the tryCatch fixed the 'Error in value[3L]' issue, so the script now works, but displays error messages if it can't access a particular channel. I am guessing the 'warning' section of the tryCatch can be used to adjust these as necessary.

Related

RODBC connection issue

I am trying to use RODBC to connect to an access database. I have used the same structure several times in this project with success. However, in this instance it is now failing and I cannot figure out why. The code is not really reprex as I can't provide the DB, but...
This works for a single table:
library(magrittr);library(RODBC)
#xWalk_path is simply the path to the accdb
#xtabs generated by querying the available tables
x=1
tab=xtabs$TABLE_NAME[x]
temp<-RODBC::odbcConnectAccess2007(xWalk_path)%>%
RODBC::sqlFetch(., tab, stringsAsFactors = FALSE)
odbcCloseAll()
#that worked perfectly
However, I really want to use this in a a function so I can read several similar tables into a list. As a function it does not work:
xWalk_ls<- lapply(seq_along(xtabs$TABLE_NAME), function(x, xWalk_path=xWalk_path, tab=xtabs$TABLE_NAME[x]){
#print(tab) #debug code
temp<-RODBC::odbcConnectAccess2007(xWalk_path)%>%
RODBC::sqlFetch(., tab, stringsAsFactors = FALSE)
return(temp)
odbcCloseAll()
})
#error every time
The above code will return the error:
Warning in odbcDriverConnect(con, ...) :
[RODBC] ERROR: Could not SQLDriverConnect
Warning in odbcDriverConnect(con, ...) : ODBC connection failed
Error in RODBC::sqlFetch(., tab, stringsAsFactors = FALSE) :
first argument is not an open RODBC channel
I am baffled. I accessed the db to pull table names and generate the xtabs variable using sql Tables. Also, earlier in my code I used a similar code structure (not identical, but same core: sqlFetch to retrieve a table into a list) nd it worked without a problem. Only difference between then and now is that: Then I was opening and closing different .accdb files, but pulling the same table name from each. Now, I am opening and closing the same .accdb file but pulling different sheet names each time.
Am I somehow opening and closing this too fast and it is getting irritated with me? That seems unlikely, because if I force it to print(tab) as the first line of the function it will only print the first table name. If it was getting annoyed about the speed of opening an closing I would expect it to print 2 table names before throwing the error.
return returns its argument and exits, so the remaining code (odbcCloseAll()) won't be executed and the opened file (AccessDB) remains locked as you supposed.

r extension netlogo, object 'economicvalue' not found

I am working in netlogo on a model which has to communicate with R during the run. I do this using the r extension in netlogo (so not Rnetlogo in R).
at the setup I load my script with
r:eval "source('C:/Users/keemi/OneDrive/Documenten/Thesis/heatpumps/scriptHeatpumpV1.R')"
this works fine since I can ask what I want coming from the script with this code. r:get "cpquery(fittedHeatpumpv1, event = (Reliability == 0.88), evidence = (Economic == 0.08))" this gives me a chance percentage of the event given the evidence.
However the evidence must come from the netlogo network, I do this using
r:put "economicvalue" reliability this creates a variable in r -> economicvalue from the value of reliability in netlogo (which is 0.08 for the example).
I then put in the following code r:get "cpquery(fittedHeatpumpv1, event = (Reliability == 0.88), evidence = (Economic == economicvalue))" to get to the same result, however netlogo gives the error
Extension exception: Error in R-Extension: Error in Get.
org.nlogo.api.ExtensionException: Error in eval(evidence, generated.data, parent.frame()) :
object 'economicvalue' not found
error while company 157 running R:GET
called by procedure INVEST
called by procedure GO
called by Button 'go-once'
this is odd since if I do the same thing in r itself it works just fine. and the script itself also works fine since I can load things from it.
I also checked the value of the r:put and this was indeed set to 0.08 if I call it back using r:get "economicvalue"
I also tested it already without the variable coming from netlogo but just giving the command directly to r using r:eval "economicvalue <- 0.08"but the same error occurs.
I can't figure out what I am doing wrong here, since the code works in r itself if I put the same code lines but not coming from netlogo, and netlogo also performs well since I can see if the r commands work with the r:get and this all gives the right values.
could somebody help me out?

rxDataStep in RevoScaleR package crashing

I am trying to create a new factor column on an .xdf data set with the rxDataStep function in RevoScaleR:
rxDataStep(nyc_lab1
, nyc_lab1
, transforms = list(RatecodeID_desc = factor(RatecodeID, levels=RatecodeID_Levels, labels=RatecodeID_Labels))
, overwrite=T
)
where nyc_lab1 is a pointer to a .xdf file. I know that the file is fine because I imported it into a data table and successfully created a the new factor column.
However, I get the following error message:
Error in doTryCatch(return(expr), name, parentenv, handler) :
ERROR: The sample data set for the analysis has no variables.
What could be wrong?
First, RevoScaleR has some warts when it comes to replacing data. In particular, overwriting the input file with the output can sometimes causes rxDataStep to fail for unknown reasons.
Even if it works, you probably shouldn't do it anyway. If there is a mistake in your code, you risk destroying your data. Instead, write to a new file each time, and only delete the old file once you've verified you no longer need it.
Second, any object you reference that isn't part of the dataset itself, has to be passed in via the transformObjects argument. See ?rxTransform. Basically the rx* functions are meant to be portable to distributed computing contexts, where the R session that runs the code isn't be the same as your local session. In this scenario, you can't assume that objects in your global environment will exist in the session where the code executes.
Try something like this:
nyc_lab2 <- RxXdfData("nyc_lab2.xdf")
nyc_lab2 <- rxDataStep(nyc_lab1, nyc_lab2,
transforms=list(
RatecodeID_desc=factor(RatecodeID, levels=.levs, labels=.labs)
),
rxTransformObjects=list(
.levs=RatecodeID_Levels,
.labs=RatecodeID_Labels
)
)
Or, you could use dplyrXdf which will handle all this file management business for you:
nyc_lab2 <- nyc_lab1 %>% factorise(RatecodeID)

How do I close unused connections after read_html in R

I am quite new to R and am trying to access some information on the internet, but am having problems with connections that don't seem to be closing. I would really appreciate it if someone here could give me some advice...
Originally I wanted to use the WebChem package, which theoretically delivers everything I want, but when some of the output data is missing from the webpage, WebChem doesn't return any data from that page. To get around this, I have taken most of the code from the package but altered it slightly to fit my needs. This worked fine, for about the first 150 usages, but now, although I have changed nothing, when I use the command read_html, I get the warning message " closing unused connection 4 (http:....." Although this is only a warning message, read_html doesn't return anything after this warning is generated.
I have written a simplified code, given below. This has the same problem
Closing R completely (or even rebooting my PC) doesn't seem to make a difference - the warning message now appears the second time I use the code. I can run the querys one at a time, outside of the loop with no problems, but as soon as I try to use the loop, the error occurs again on the 2nd iteration.
I have tried to vectorise the code, and again it returned the same error message.
I tried showConnections(all=TRUE), but only got connections 0-2 for stdin, stdout, stderr.
I have tried searching for ways to close the html connection, but I can't define the url as a con, and close(qurl) and close(ttt) also don't work. (Return errors of no applicable method for 'close' applied to an object of class "character and no applicable method for 'close' applied to an object of class "c('xml_document', 'xml_node')", repectively)
Does anybody know a way to close these connections so that they don't break my routine? Any suggestions would be very welcome. Thanks!
PS: I am using R version 3.3.0 with RStudio Version 0.99.902.
CasNrs <- c("630-08-0","463-49-0","194-59-2","86-74-8","148-79-8")
tit = character()
for (i in 1:length(CasNrs)){
CurrCasNr <- as.character(CasNrs[i])
baseurl <- 'http://chem.sis.nlm.nih.gov/chemidplus/rn/'
qurl <- paste0(baseurl, CurrCasNr, '?DT_START_ROW=0&DT_ROWS_PER_PAGE=50')
ttt <- try(read_html(qurl), silent = TRUE)
tit[i] <- xml_text(xml_find_all(ttt, "//head/title"))
}
After researching the topic I came up with the following solution:
url <- "https://website_example.com"
url = url(url, "rb")
html <- read_html(url)
close(url)
# + Whatever you wanna do with the html since it's already saved!
I haven't found a good answer for this problem. The best work-around that I came up with is to include the function below, with Secs = 3 or 4. I still don't know why the problem occurs or how to stop it without building in a large delay.
CatchupPause <- function(Secs){
Sys.sleep(Secs) #pause to let connection work
closeAllConnections()
gc()
}
I found this post as I was running into the same problems when I tried to scrape multiple datasets in the same script. The script would get progressively slower and I feel it was due to the connections. Here is a simple loop that closes out all of the connections.
for (i in seq_along(df$URLs)){function(i)
closeAllConnections(i)
}

R Parallelisation Error unserialize(socklisk[[n]])

In a nutshell I am trying to parallelise my whole script over dates using Snow and adply but continually get the below error.
Error in unserialize(socklist[[n]]) : error reading from connection
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
I have set up the parallelisation process in the following way:
Cores = detectCores(all.tests = FALSE, logical = TRUE)
cl = makeCluster(Cores, type="SOCK")
registerDoSNOW(cl)
clusterExport(cl, c("Var1","Var2","Var3","Var4"), envir = environment())
exposureDaily <- adply(.data = dateSeries,.margins = 1,.fun = MainCalcFunction,
.expand = TRUE, Var1, Var2, Var3,
Var4,.parallel = TRUE)
stopCluster(cl)
Where dateSeries might look something like
> dateSeries
marketDate
1 2016-04-22
2 2016-04-26
MainCalcFunction is a very long script with multiple of my own functions contained within it. As the script is so long reproducing it wouldn't be practical, and a hypothetical small function would defeat the purpose as I have already got this methodology to work with other smaller functions. I can say that within MainCalcFunction I call all my libraries, necessary functions, and a file containing all other variables aside from those exported above so that I don't have to export a long list libraries and other objects.
MainCalcFunction can run successfully in its entirety over 2 dates using adply but not parallelisation, which tells me that it is not a bug in the code that is causing the parallelisation to fail.
Initially I thought (from experience) that the parallelisation over dates was failing because there was another function within the code that utilised parallelisation, however I have subsequently rebuilt the whole code to make sure that there was no such function.
I have poured over the script with a fine tooth comb to see if there was any place where I accidently didn't export something that I needed and I can't find anything.
Some ideas as to what could be causing the code to fail are:
The use of various option valuation functions in fOptions and rquantlib
The use of type sock
I am aware of this question already asked and also this question, and while the first question has helped me, it hasn't yet help solve the problem. (Note: that may be because I haven't used it correctly, having mainly used loginfo("text") to track where the code is. Potentially, there is a way to change that such that I log warning and/or error messages instead?)
Please let me know if there is any other information I can provide to help in solving this. I would be so appreciative if someone could provide some guidance, as the code takes close to 40 minutes to run for a day and I need to run it for close to a year, therefore parallelisation is essential!
EDIT
I have tried to implement the suggestion in the first question included above by utilising the outfile option. Given I am using Windows, I have done this by including the following lines before the exporting of the key objects and running MainCalcFunction :
reportLogName <- paste("logout_parallel.txt", sep="")
addHandler(writeToFile,
file = paste(Save_directory,reportLogName, sep="" ),
level='DEBUG')
with(getLogger(), names(handlers))
loginfo(paste("Starting log file", getwd()))
mc<-detectCores()
cl<-makeCluster(mc, outfile="")
registerDoParallel(cl)
Similarly, at the beginning of MainCalcFunction, after having sourced my libraries and functions I have included the following to print to file:
reportLogName <- paste(testDate,"_logout.txt", sep="")
addHandler(writeToFile,
file = paste(Save_directory,reportLogName, sep="" ),
level='DEBUG')
with(getLogger(), names(handlers))
loginfo(paste("Starting test function ",getwd(), sep = ""))
In the MainCalcFunction function I have then put loginfo("text") statements at key junctures to inform me of where the code is at.
This has resulted in some text files being available after the code fails due to the aforementioned error. However, these text files provide no more information on the cause of the error aside from at what point. This is despite having a tryCatch statement embedded in MainCalcFunction where at the end, on any instance of error I have added the line logerror(e)
I am posting this answer in case it helps anyone else with a similar problem in the future.
Essentially, the error unserialize(socklist[[n]]) doesn't tell you a lot, so to solve it it's a matter of narrowing down the issue.
Firstly, be absolutely sure the code runs over several dates in non-parallel with no errors
Ensure the parallelisation is set up correctly. There are some obvious initial errors that many other questions respond to, e.g., hidden parallelisation inside the code which means parallelisation is occurring twice.
Once you are sure that there is no problem with the code and the parallelisation is set up correctly start narrowing down. The issue is likely (unless something has been missed above) something in the code which isn't a problem when it is run in serial, but becomes a problem when run in parallel. The easiest way to narrow down is by setting outfile = "Log.txt" in which make cluster function you use, e.g., cl<-makeCluster(cores-1, outfile="Log.txt"). Then add as many print("Point in code") comments in your function to narrow down on where the issue is occurring.
In my case, the problem was the line jj = closeAllConnections(). This line works fine in non-parallel but breaks the code when in parallel. I suspect it has something to do with the function closing all connections including socket connections that are required for the parallelisation.
Try running using plain R instead of running in RStudio.

Resources