How to stop R from stopping execution on error?

I'm having an issue where if I execute several lines of code at once and one of them has an error, the lines below don't get executed.
For example if I have:
table(data$AGE)
table(dataREGION)
table(date$SEXE)
I get the table for the first line, and then
Error in table(dataREGION) : object 'dataREGION' not found
>
And the last line does not execute.
Does anyone know why it does that and how to fix it?
(I work with R 4.2.2 and RStudio 2022.12.0+353 "Elsbeth Geranium" Release)
Thanks!
Have a nice day,
Cassandra

Fixed: in Global Options > Console, under "Execution", uncheck "Discard pending console input on error".

It seems like you want to use try().
try(table(data$AGE), silent = FALSE)
try(table(dataREGION)) # also works without any params
try(table(date$SEXE))
You can also use tryCatch() if you want more control but it doesn't seem necessary for your purpose.
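For example, a minimal tryCatch() sketch along the same lines (the handler and message text here are illustrative, not the only way to do it):
result <- tryCatch(
  table(dataREGION),
  error = function(e) {
    message("Skipping: ", conditionMessage(e))  # report the error and move on
    NULL                                        # placeholder so later lines still run
  }
)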
__
As for why your dataREGION line doesn't execute:
Hazarding a guess, it might be because you forgot the $ between data and REGION.

Related

Is there a function that automatically moves my cursor in R to the error found in my code?

Whenever I get an error in my code, I have to start looking for which line it is on. With only 500 lines or so this is doable, but I can imagine the efficiency boost I would get if there were a keyboard shortcut that automatically moved my cursor to the error.
Anyone that found such a shortcut?
I am working in an R script.
You could set the error option to browser or recover (as suggested by @Ben Bolker).
With browser:
options(error = browser)
test <- function() {
  1 + 1
  stop('custom error')
}
# some code
cat('OK \n')
test()
cat('After error \n')
With recover:
options(error = recover)
And to get back to the default:
options(error = NULL)
Have you tried RStudio? It has a "Debug" tool:
https://support.rstudio.com/hc/en-us/articles/200713843?version=1.3.1093&mode=desktop
Also, in the console you can see the error message together with the offending line. Sorry if I did not understand your question.

How do I close unused connections after read_html in R

I am quite new to R and am trying to access some information on the internet, but am having problems with connections that don't seem to be closing. I would really appreciate it if someone here could give me some advice...
Originally I wanted to use the WebChem package, which theoretically delivers everything I want, but when some of the output data is missing from the webpage, WebChem doesn't return any data from that page. To get around this, I have taken most of the code from the package but altered it slightly to fit my needs. This worked fine for about the first 150 usages, but now, although I have changed nothing, when I use the command read_html I get the warning message "closing unused connection 4 (http:...". Although this is only a warning message, read_html doesn't return anything after it is generated.
I have written a simplified code, given below. This has the same problem
Closing R completely (or even rebooting my PC) doesn't seem to make a difference - the warning message now appears the second time I use the code. I can run the queries one at a time outside of the loop with no problems, but as soon as I try to use the loop, the error occurs again on the second iteration.
I have tried to vectorise the code, and again it returned the same error message.
I tried showConnections(all=TRUE), but only got connections 0-2 for stdin, stdout, stderr.
I have tried searching for ways to close the html connection, but I can't define the url as a con, and close(qurl) and close(ttt) also don't work. (These return errors of no applicable method for 'close' applied to an object of class "character" and no applicable method for 'close' applied to an object of class "c('xml_document', 'xml_node')", respectively.)
Does anybody know a way to close these connections so that they don't break my routine? Any suggestions would be very welcome. Thanks!
PS: I am using R version 3.3.0 with RStudio Version 0.99.902.
library(rvest)  # read_html()
library(xml2)   # xml_text(), xml_find_all()

CasNrs <- c("630-08-0","463-49-0","194-59-2","86-74-8","148-79-8")
tit <- character()
for (i in seq_along(CasNrs)) {
  CurrCasNr <- as.character(CasNrs[i])
  baseurl <- 'http://chem.sis.nlm.nih.gov/chemidplus/rn/'
  qurl <- paste0(baseurl, CurrCasNr, '?DT_START_ROW=0&DT_ROWS_PER_PAGE=50')
  ttt <- try(read_html(qurl), silent = TRUE)
  tit[i] <- xml_text(xml_find_all(ttt, "//head/title"))
}
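(Side note: because read_html() is wrapped in try(), a failed request leaves ttt as a try-error object, and the parsing line then errors too. A small guard, sketched here with the same variables, keeps the loop going:)
if (!inherits(ttt, "try-error")) {
  tit[i] <- xml_text(xml_find_all(ttt, "//head/title"))
} else {
  tit[i] <- NA_character_  # record a missing title and continue
}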
After researching the topic I came up with the following solution:
library(rvest)

site <- "https://website_example.com"
con <- url(site, "rb")   # open an explicit read connection
html <- read_html(con)
close(con)               # close it yourself once the html is saved
# + Whatever you wanna do with the html since it's already saved!
I haven't found a good answer for this problem. The best work-around that I came up with is to include the function below, with Secs = 3 or 4. I still don't know why the problem occurs or how to stop it without building in a large delay.
CatchupPause <- function(Secs){
  Sys.sleep(Secs)        # pause to let the connection finish its work
  closeAllConnections()  # then close anything left open
  gc()                   # and free the associated memory
}
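For example, the pause can be dropped into the scraping loop from the question (a sketch; baseurl, CasNrs, and the rvest call are as defined above, and Secs = 3 is just the suggested value):
for (i in seq_along(CasNrs)) {
  qurl <- paste0(baseurl, CasNrs[i], '?DT_START_ROW=0&DT_ROWS_PER_PAGE=50')
  ttt <- try(read_html(qurl), silent = TRUE)
  CatchupPause(3)  # let the connection settle before the next request
}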
I found this post as I was running into the same problems when I tried to scrape multiple datasets in the same script. The script would get progressively slower, and I felt it was due to the connections. Here is a simple loop that closes all of the connections after each iteration:
for (i in seq_along(df$URLs)) {
  # ... scrape df$URLs[i] here ...
  closeAllConnections()  # closeAllConnections() takes no arguments
}

R Parallelisation Error unserialize(socklist[[n]])

In a nutshell, I am trying to parallelise my whole script over dates using snow and adply, but I continually get the error below.
Error in unserialize(socklist[[n]]) : error reading from connection
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
I have set up the parallelisation process in the following way:
library(parallel)  # detectCores()
library(doSNOW)    # registerDoSNOW(); loads snow for makeCluster(type = "SOCK")
library(plyr)      # adply()

Cores <- detectCores(all.tests = FALSE, logical = TRUE)
cl <- makeCluster(Cores, type = "SOCK")
registerDoSNOW(cl)
clusterExport(cl, c("Var1", "Var2", "Var3", "Var4"), envir = environment())
exposureDaily <- adply(.data = dateSeries, .margins = 1,
                       .fun = MainCalcFunction, .expand = TRUE,
                       Var1, Var2, Var3, Var4, .parallel = TRUE)
stopCluster(cl)
Where dateSeries might look something like
> dateSeries
marketDate
1 2016-04-22
2 2016-04-26
MainCalcFunction is a very long script with many of my own functions contained within it. As the script is so long, reproducing it wouldn't be practical, and a hypothetical small function would defeat the purpose, as I have already got this methodology to work with other, smaller functions. I can say that within MainCalcFunction I call all my libraries, necessary functions, and a file containing all other variables aside from those exported above, so that I don't have to export a long list of libraries and other objects.
MainCalcFunction can run successfully in its entirety over 2 dates using adply without parallelisation, which tells me that it is not a bug in the code that is causing the parallelisation to fail.
Initially I thought (from experience) that the parallelisation over dates was failing because there was another function within the code that utilised parallelisation; however, I have subsequently rebuilt the whole code to make sure that there was no such function.
I have pored over the script with a fine-tooth comb to see if there was any place where I accidentally didn't export something that I needed, and I can't find anything.
Some ideas as to what could be causing the code to fail:
The use of various option valuation functions in fOptions and RQuantLib
The use of type "SOCK"
I am aware of this question already asked and also this question, and while the first question has helped me, it hasn't yet helped solve the problem. (Note: that may be because I haven't used it correctly, having mainly used loginfo("text") to track where the code is. Potentially, there is a way to change that so that I log warning and/or error messages instead?)
Please let me know if there is any other information I can provide to help in solving this. I would be so appreciative if someone could provide some guidance, as the code takes close to 40 minutes to run for a day and I need to run it for close to a year, therefore parallelisation is essential!
EDIT
I have tried to implement the suggestion in the first question included above by utilising the outfile option. Given I am using Windows, I have done this by including the following lines before exporting the key objects and running MainCalcFunction:
reportLogName <- paste("logout_parallel.txt", sep="")
addHandler(writeToFile,
file = paste(Save_directory,reportLogName, sep="" ),
level='DEBUG')
with(getLogger(), names(handlers))
loginfo(paste("Starting log file", getwd()))
mc<-detectCores()
cl<-makeCluster(mc, outfile="")
registerDoParallel(cl)
Similarly, at the beginning of MainCalcFunction, after having sourced my libraries and functions I have included the following to print to file:
reportLogName <- paste(testDate, "_logout.txt", sep = "")
addHandler(writeToFile,
           file = paste(Save_directory, reportLogName, sep = ""),
           level = 'DEBUG')
with(getLogger(), names(handlers))
loginfo(paste("Starting test function ", getwd(), sep = ""))
In the MainCalcFunction function I have then put loginfo("text") statements at key junctures to inform me of where the code is at.
This has resulted in some text files being available after the code fails with the aforementioned error. However, these text files provide no more information on the cause of the error, aside from where it occurred. This is despite having a tryCatch statement embedded in MainCalcFunction which, on any instance of error, ends by calling logerror(e).
I am posting this answer in case it helps anyone else with a similar problem in the future.
Essentially, the error unserialize(socklist[[n]]) doesn't tell you a lot, so solving it is a matter of narrowing down the issue.
Firstly, be absolutely sure the code runs over several dates in non-parallel mode with no errors.
Ensure the parallelisation is set up correctly. There are some obvious initial errors that many other questions respond to, e.g., hidden parallelisation inside the code which means parallelisation is occurring twice.
Once you are sure that there is no problem with the code and the parallelisation is set up correctly, start narrowing down. The issue is likely (unless something has been missed above) something in the code which isn't a problem when run in serial but becomes one when run in parallel. The easiest way to narrow down is by setting outfile = "Log.txt" in whichever makeCluster() call you use, e.g., cl <- makeCluster(cores - 1, outfile = "Log.txt"), then adding as many print("Point in code") statements as needed to pinpoint where the issue occurs; see the sketch below.
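A minimal sketch of that debugging setup (worker() and the dates are hypothetical stand-ins for MainCalcFunction and dateSeries):
library(parallel)
worker <- function(d) {
  print(paste("starting date", d))   # checkpoint; shows up in Log.txt
  # ... the real calculation would go here ...
  print(paste("finished date", d))
  d
}
cl <- makeCluster(detectCores() - 1, outfile = "Log.txt")
res <- parLapply(cl, c("2016-04-22", "2016-04-26"), worker)
stopCluster(cl)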
In my case, the problem was the line jj = closeAllConnections(). This line works fine in non-parallel but breaks the code when in parallel. I suspect it has something to do with the function closing all connections including socket connections that are required for the parallelisation.
Try running using plain R instead of running in RStudio.

Problems with reassignInPackage

I am trying to understand the way the YourCast R package works and make it work with my data.
For example, if a function produces errors, I:
1. get the source code of that function using YourCast:::bad.fn
2. add outputs of critical values at critical stages
3. use reassignInPackage(name="original.fn", package="YourCast", value="my.fn")
Once I find the cause of the error, I fix it in the function and reassign it in the package.
However, for some strange reason this does not work for non-hidden functions.
For example:
install.packages("YourCast")
library(YourCast)
YourCast:::check.depvar
This will print the hidden function check.depvar. The line if (all(ix == 1:3)) will produce an error message if any of the x is missing.
Thus, I change the whole function to the following and replace the original:
mzuba.check.depvar <- function(formula)
{
  return(grepl("log[(]", as.character(formula)[2]))
}
reassignInPackage("check.depvar",
                  pkgName = "YourCast",
                  mzuba.check.depvar)
rm(mzuba.check.depvar)
Now YourCast:::check.depvar will print my version of that function, and everything is fine.
However, YourCast::yourcast, YourCast:::yourcast, or simply yourcast will print the non-hidden function yourcast. Suppose I want to change that function as well.
reassignInPackage(name = "yourcast",
                  pkgName = "YourCast",
                  value = test)
Now, YourCast::yourcast and YourCast:::yourcast will print the new, modified version but yourcast still gives the old version!
That might not be a problem if I could simply call YourCast::yourcast instead of yourcast, but that produces some kind of error that I can't trace back, because suddenly RStudio does not print error messages at all anymore, although it still does something if it can:
> Uagh! do something!
> 1 + 1
[1] 2
> Why no error msg?
>
Restarting the R session will solve the error-message problem, though.
So my question is: How do I reassign non-hidden functions in packages?
Furthermore (this would facilitate testing a lot), is there a way to make all hidden functions available without using the ::: operator? I.e., how do I export all functions from a package?
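One common workaround, offered as a sketch under an assumption rather than a tested answer: the bare name yourcast resolves through the attached package:YourCast environment on the search path, which still holds the stale exported copy, so that copy has to be replaced as well (test is the replacement function from above):
env <- as.environment("package:YourCast")
unlockBinding("yourcast", env)        # exported bindings are locked by default
assign("yourcast", test, envir = env) # overwrite the stale copy with `test`
lockBinding("yourcast", env)          # restore the lock afterwards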

Function to clear the console in R and RStudio

I am wondering if there is a function to clear the console in R and, in particular, RStudio. I am looking for a function that I can type into the console, not a keyboard shortcut.
Someone has already provided such a function in this StackExchange post from 2010. Unfortunately, this depends on the RCom package and will not run on Mac OS X.
cat("\014")
is the code to send CTRL+L to the console, and therefore will clear the screen.
Far better than just sending a whole lot of returns.
If you are using the default R console, the key combination Option + Command + L will clear the console.
You may define the following function
clc <- function() cat(rep("\n", 50))
which you can then call as clc().
shell("cls") if on Windows,
shell("clear") if on Linux or Mac.
(shell() passes a command (or any string) to the host terminal.)
cat("\f") may be easier to remember than cat("\014").
It works fine for me on Windows 10.
In Ubuntu GNOME, simply pressing CTRL+L should clear the screen.
This also seems to work well in Windows 10 and 7 and Mac OS X Sierra.
Here's a function:
clear <- function() cat(c("\033[2J","\033[0;0H"))
then you can simply call it, as you call any other R function, clear().
If you prefer to simply type clear (instead of having to type clear(), i.e. with the parentheses), then you can do
clear_fun <- function() cat(c("\033[2J", "\033[0;0H"))
makeActiveBinding("clear", clear_fun, baseenv())
I developed an R package that will do this, borrowing from the suggestions above. The package is called mise, as in "mise en place." You can install and run it using
install.packages("mise")
library(mise)
mise()
Note that mise() also deletes all variables and functions and closes all figures by default. To just clear the console, use mise(vars = FALSE, figs = FALSE).
On Linux, use system("clear") to clear the screen.
If you are using the default R console: CTRL + L.
In RStudio: CTRL + L.
cat("\014") will also work, no worries.
You can combine the following two commands
cat("\014");
cat(rep("\n", 50))
Another option for RStudio is rstudioapi::sendToConsole("\014"). This will work even if output is diverted.
sink("out.txt")
cat("\014") # Console not cleared
rstudioapi::sendToConsole("\014") # Console cleared
sink()
I know this question is very old, but I found myself visiting it many times looking for a totally different answer:
n <- 20
for (i in 0:n) {
  cat(100 * i / n, "% \r")  # \r returns the cursor to the start of the line
  flush.console()           # force the partial line to display immediately
  Sys.sleep(0.01)           # do something slow
}
flush.console() will kind of "clear the console in R and RStudio", maybe not in the OP's terms, but still.
This code acts like a progress bar in the console: on each iteration the percentage is printed and then overwritten on the next.
Note that this won't work without \r, or with \n instead: \r returns the cursor to the start of the line so the next cat() overwrites it, whereas \n would move to a new line.
