Find out WHERE an error occurs in R when debugging - r

How can I find out WHERE the error occurs?
I've got a double loop like this:
companies <- # vector with all companies in my data.frame
dates <- # vector with all event dates in my data.frame
for(i in 1:length(companies)) {
events_k <- # some code that gives me events of one company at a time
for{j in 1:nrow(events_k)) {
# some code that gives me one event date at a time
results <- # some code that calculates stuff for that event date
}
mylist[[i]] <- results # store the results in a list
}
In this code I got an error (it was something like error in max(i)...).
The inner loop works perfectly. So by leaving out the outer loop and manually entering the company IDs until that error appeared, I found out which company was causing the problem: my data.frame had letters in a vector of daily returns for that specific company.
For next time: is there a way in R to find out WHERE (or here, FOR WHICH COMPANY) the error occurs? It could save a lot of time!

What I like to use is:
options(error = recover)
You only need to run it once at the beginning of your session (or add it to your .Rprofile file).
After that, each time an error is thrown you will be shown the stack of function calls that led to the error. You can select any of these calls and it will be as if you had run that command in browser() mode: you will be able to look at the variables in the calling environment and walk through the code.
More info and examples at ?recover.

It is difficult to know without explicit code that we can run, but my guess is that changing your code to
for(i in companies) {
for(j in dates) {
or alternatively
for(i in 1:length(companies)) {
for(j in 1:length(dates)) {
may solve the issue. Note the ( in the second loop. If not, it might be a good idea to edit your example to have some code / data that produces the same error.
To figure out where it occurs, you can always add print(i) or something like that at a suitable place in the code.
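As a variation on the print(i) idea, the loop body can be wrapped in tryCatch() so that the loop itself reports which company failed and then carries on. This is only a minimal sketch: the data in `returns_by_company` and the helper `process_company()` are hypothetical stand-ins for the data.frame and the inner loop in the question.

```r
# Hypothetical data: one company has letters mixed into its returns
returns_by_company <- list(
  A = c("0.01", "0.02"),
  B = c("0.03", "x"),      # bad value
  C = c("0.00", "-0.01")
)

# Hypothetical stand-in for the per-company calculation
process_company <- function(returns) {
  values <- suppressWarnings(as.numeric(returns))
  if (anyNA(values)) stop("non-numeric return found")
  max(values)
}

results <- vector("list", length(returns_by_company))
names(results) <- names(returns_by_company)
failed <- character(0)

for (company in names(returns_by_company)) {
  results[[company]] <- tryCatch(
    process_company(returns_by_company[[company]]),
    error = function(e) {
      # Report which company failed, remember it, and store NA
      message("Error for company ", company, ": ", conditionMessage(e))
      failed <<- c(failed, company)
      NA
    }
  )
}
```

After the loop, `failed` lists the companies that raised an error, so there is no need to step through the IDs by hand.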

Related

tryCatch function works on most non-existent URLs, but it does not work in (at least) one case

Dear Stackoverflow users,
I am using R to scrape profiles of a few psychotherapists from Psychology Today; this is done as an exercise to learn more about web scraping.
I am new to R and I have to go through this intense training that will help me with future projects. This means that I might not know precisely what I am doing at the moment (e.g. I might misinterpret either the script or the error messages from R), but I have to get it done. Therefore, I beg your pardon for possible misunderstandings or inaccuracies.
In short, the situation is the following.
I have created a function through which I scrape information from 2 nodes of psychotherapists' profiles; the function is shown in this stackoverflow post.
Then I create a loop where that function is used on a few psychotherapists' profiles; the loop is in the above post as well, but I report it below because that is the part of the script that generates some problems (in addition to what I solved in the above-mentioned post).
j <- 1
MHP_codes <- c(150140:150180) #therapist identifier
df_list <- vector(mode = "list", length(MHP_codes))
for(code1 in MHP_codes) {
URL <- paste0('https://www.psychologytoday.com/us/therapists/illinois/', code1)
#Reading the HTML code from the website
URL <- read_html(URL)
df_list[[j]] <- tryCatch(getProfile(URL),
error = function(e) NA)
j <- j + 1
}
When the loop is done, I bind the information from different profiles into one data frame and save it.
final_df <- rbind.fill(df_list)
save(final_df,file="final_df.Rda")
The function (getProfile) works well on individual profiles.
It works also on a small range of profiles ( c(150100:150150)).
Please note that I do not know which psychotherapist IDs are actually assigned, so many URLs within the range do not exist.
However, generally speaking, tryCatch should solve this issue. When a URL is non-existent (and thus the ID is not associated with any psychotherapist), each of the 2 nodes (and thus each of the 2 corresponding variables in my data frame) is empty (i.e. the data frame shows NAs in the corresponding cells).
However, in some ID ranges, two problems might happen.
First, I get an error message such as the following one:
Error in open.connection(x, "rb") : HTTP error 404.
So, this happens despite the fact that I am using tryCatch and despite the fact that it generally appears to work (at least, until the error message appears).
Moreover, after the loop is stopped and R runs the line:
final_df <- rbind.fill(df_list)
A second error message appears:
Warning message:
In df[[var]] :
closing unused connection 3 (https://www.psychologytoday.com/us/therapists/illinois/150152)
It seems like there is a specific problem with that one empty URL.
In fact, when I change the ID range, the loop works well despite non-existent URLs: on the one hand, when the URL exists, the information is scraped from the website; on the other hand, when the URL does not exist, the 2 variables associated with that URL (and thus with that psychotherapist ID) get an NA.
Is it possible, perhaps, to tell R to skip the URL if it is empty, without recording anything?
This solution would be excellent, since it would shrink the data frame to the existing URLs, but I do not know how to do it and I do not know whether it is a solution to my problem.
Is anyone able to help me sort out this issue?
Yes, you need to wrap a tryCatch around the read_html call. This is where R tries to connect to the website, so it will throw an error there (as opposed to returning an empty object) if it fails to connect. You can catch that error and then use next to tell R to skip to the next iteration of the loop.
library(rvest)
##Valid URL, works fine
URL <- "https://news.bbc.co.uk"
read_html(URL)
##Invalid URL, error raised
URL <- "https://news.bbc.co.uk/not_exist"
read_html(URL)
##Leads to error
Error in open.connection(x, "rb") : HTTP error 404.
##Invalid URL, catch and skip to next iteration of the loop
URL <- "https://news.bbc.co.uk/not_exist"
tryCatch({
URL <- read_html(URL)},
error=function(e) {print("URL Not Found, skipping")
next})
I would like to thank @Jul for the answer.
Here I post my updated loop:
j <- 1
MHP_codes <- c(150000:150200) #therapist identifier
df_list <- vector(mode = "list", length(MHP_codes))
for(code1 in MHP_codes) {
delayedAssign("do.next", {next})
URL <- paste0('https://www.psychologytoday.com/us/therapists/illinois/', code1)
#Reading the HTML code from the website
URL <- tryCatch(read_html(URL),
error = function(e) force(do.next))
df_list[[j]] <- getProfile(URL)
j <- j + 1
}
final_df <- rbind.fill(df_list)
As you can see, something had to be changed: although the answer from @Jul came close to solving the problem, the loop still stopped, and thus I had to slightly change the original suggestion.
In particular, I have introduced the following line inside the loop but outside the tryCatch function:
delayedAssign("do.next", {next})
And in the tryCatch function the following argument:
force(do.next)
This is based on this other stackoverflow post.
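A possible alternative to delayedAssign() is to let tryCatch() return NULL when the download fails and to call next from the loop body itself rather than from inside the handler function. The sketch below only illustrates that pattern: `fetch_page()` is a hypothetical stand-in for read_html() so that the example runs offline, and the data frame built per profile is likewise made up.

```r
# Hypothetical stand-in for read_html(): fails for odd codes
fetch_page <- function(code) {
  if (code %% 2 == 1) stop("HTTP error 404.")
  paste("profile", code)
}

codes <- 150140:150145
df_list <- list()

for (code1 in codes) {
  # On error, return NULL instead of stopping the loop
  page <- tryCatch(fetch_page(code1), error = function(e) NULL)
  if (is.null(page)) next   # skip this code; nothing is recorded

  # Only successful downloads are appended, so the list has no gaps
  df_list[[length(df_list) + 1]] <- data.frame(code = code1, page = page)
}

final_df <- do.call(rbind, df_list)
```

Because failed codes are skipped before anything is stored, `final_df` contains one row per existing profile only, which matches the "skip the URL without recording anything" request in the question.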

Continue for loop in R after error (example using vector of numbers and letters)

I'm currently working on a project where I'm trying to complete a task in a for loop. There are a lot of areas where something can go wrong (it involves creating a pdf report via rmarkdown), but in this case I don't care if something goes wrong, I just want the for loop to keep going.
In the toy example below I would like to be able to print off the numbers 2 through 16, while skipping over the letter 'a'.
something<-function(x){
print(x + 1)
}
for(i in c(1:10,'a',11:15))
{
res <- try(something(i))
if(inherits(res, "try-error"))
{
#error handling code, maybe just skip this iteration using
next
}
#rest of iteration for case of no error
}
This is loosely based on the example provided in the answer below.
R Script - How to Continue Code Execution on Error.
I've tried adapting several other "how do I continue a for loop in R" to no success.
I'm not a full time programmer, so I'm convinced I'm missing something very simple, but any help would be appreciated.
I had a similar issue, where I had a function I was running in a for loop and needed it to keep running if an error comes up. Here's how I did it:
OutputStorage=list()
for (i in 1:k){
Output=tryCatch(examplefunction(x), error=function(e) NULL)
OutputStorage[[i]]=Output
}
So if there is an output, it's stored in the list and if not, NULL is stored for that. I think it's working; running it just now actually.
Hope that helps!
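Two details are worth checking when adapting this to the toy example. First, c(1:10,'a',11:15) coerces every element to character, so each call would fail, not just the one for 'a'; a list keeps each element's type. Second, assigning NULL with [[<- can drop a list element instead of storing it, so a placeholder such as NA is safer when positions must line up with the inputs. A minimal sketch under those assumptions (the shortened input list is made up for illustration):

```r
# Function from the question, returning the value instead of printing it
something <- function(x) x + 1

inputs <- list(1, 2, "a", 3)          # a list keeps each element's type
output_storage <- vector("list", length(inputs))

for (i in seq_along(inputs)) {
  # On error, store NA so that position i still exists in the result
  output_storage[[i]] <- tryCatch(something(inputs[[i]]),
                                  error = function(e) NA)
}
```

Here the loop finishes all four iterations, and the third slot holds NA in place of the failed call.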

How to output multiple pdf files from many data frame in a for loop in R

I used the assign() function to name many data frames.
I used this script:
for (i in 1:15)
{
assign(paste0('TagIDNum',i),filter(Ordf,Ordf$TagID==i))
}
This gave me 15 data frames.
As the next step, I need to draw scatterplots of these 15 data frames with the pairs() function, using a for loop to write all the pdf files in one go.
Here is my script:
for (i in 1:15)
{
pdf(paste('TagPlotNum',j,'.pdf',sep=''))
x<-paste('TagIDNum',j,sep='')
print(pairs(~x[,11]+x[,38]+x[,39]+x[,40]+x[,41]+x[,43]))
dev.off()
}
But I got this error message:
Error information: incorrect number of dimensions
And I found that x had no data, just a value as follows:
I need to do some analysis in the next steps, and this problem has been bothering me for 2 days.
I am posting this to ask whether any expert can solve this issue.
In my opinion, the paste() function may have something to do with it, but I don't know how to solve it.
Here is my R information:
Thanks.
As per your output, x is the string "TagIDNum11", not the object with that name. You can get that however using get(), i.e.
x<-get(paste('TagIDNum',j,sep=''))
FYI, spaces are free, your code will be much more readable if you use them, i.e.
x <- get(paste('TagIDNum', j, sep=''))
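A common way to avoid assign() and get() altogether is to keep the subsets in a named list, for example via split(), and loop over that list. The following is only a sketch: `Ordf` here is a small made-up data frame, the column names `a`, `b`, `c` are hypothetical, and the files are written to a temporary directory.

```r
# Hypothetical stand-in for the real Ordf
set.seed(1)
Ordf <- data.frame(
  TagID = rep(1:3, each = 10),
  a = rnorm(30), b = rnorm(30), c = rnorm(30)
)

# One data frame per TagID, stored in a named list
by_tag <- split(Ordf, Ordf$TagID)

out_dir <- tempdir()
for (tag in names(by_tag)) {
  pdf(file.path(out_dir, paste0("TagPlotNum", tag, ".pdf")))
  pairs(by_tag[[tag]][, c("a", "b", "c")])   # scatterplot matrix for this tag
  dev.off()
}
```

With this layout the loop variable indexes real objects directly, so there is no string that has to be turned back into a data frame.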

R program does not output

I'm new to R and programming and am taking a Coursera course. I've asked in their forums, but nobody there seems able to provide an answer. To be clear, I'm trying to determine why this does not produce output.
When I first wrote the program, I was getting accurate outputs, but after I tried to upload, something went wonky. Rather than producing any output with [1], [2], etc. when I run the program from RStudio, I only get the blue +++, but no errors, and anything I change still does not produce an output.
I tried with a previous version of R, and reinstalled the most recent version 3.2.1 for Windows.
What I've done:
Set the correct working directory through RStudio
pol <- function(directory, pol, id = 1:332) {
files <- list.files("specdata", full.names = TRUE);
data <- data.frame();
for (i in ID) {
data <- rbind(data, read.csv(files_list[i]))
}
subset <- subset(data, ID %in% id);
polmean <- mean(subset[pol], na.rm = TRUE);
polmean("specdata", "sulfate", 1:10)
polmean("specdata", "nitrate", 70:72)
polmean("specdata", "nitrate", 23)
}
Can someone please provide some direction - debug help?
When I adjust the code, the following errors tend to appear:
ID not found
Missing or unexpected } (although I've matched them all).
The updated code is as follows, if I'm understanding:
data <- data.frame();
files <- files[grepl(".csv",files)]
pollutantmean <- function(directory, pollutant, id = 1:332) {
pollutantmean <- mean(subset1[[pollutant]], na.rm = TRUE);
}
Looks like you haven't declared what ID is (I assume: a vector of numbers)?
Also, using 'subset' as a variable name while it's also a function, and pol as both a function name and the name of one of the arguments of that same function is just asking for trouble...
And I think there is a missing ")" in your for-loop.
EDIT
So the way I understand it now, you want to do a couple of things.
Read in a bunch of files, which you'll use multiple times without changing them.
Get some mean value out of those files, under different conditions.
Here's how I would do it.
Since you only want to read in the data once, you don't really need a function to do this (you can have one, but I think it's overkill for now). You correctly have code that makes a vector with the file names and then loops over them, rbinding them to each other. The problem is that this can become very slow. Check here. Make sure your directory only contains files that you want to read in, so no R scripts or other stuff. A way (not 100% foolproof) to do this is using files <- files[grepl(".csv",files)], which makes sure you only have the csv's (grepl checks whether a certain pattern occurs in each string and returns a boolean; the [] then only keeps the elements for which TRUE was returned).
Next, there is 'a thing you want to do multiple times', namely getting out mean values. This is where you'd use a function. Apparently you want to get the mean for different types of pollution, and you want this in restricted IDs.
Let's assume that 1. has given you a dataframe df with a column named Type for the type of pollution and a column called Id that somehow represents a sort of ID (substitute with the actual names in your script - if you don't have a column for ID, I'll edit the answer later on). Now you want a function
polmean <- function(type, id) {
# some code that returns the mean of a restricted version of df
}
This is all you need. You write the code that generates df, you then write a function that will get you what you want from that dataframe, and then you call it for the circumstances you want to use it in (the three polmean calls at the end of your original code, but now without the first argument as you no longer need this).
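To make the skeleton above concrete, here is one way the function body could look. It is only a sketch under stated assumptions: the data frame `df` is made up, and besides the `Type` and `Id` columns mentioned above it assumes a hypothetical `Value` column holding the measurements.

```r
# Hypothetical data frame; column names Type, Id and Value are assumptions
df <- data.frame(
  Id    = c(1, 1, 2, 2, 3, 3),
  Type  = rep(c("sulfate", "nitrate"), times = 3),
  Value = c(2.0, 1.0, NA, 3.0, 4.0, 5.0)
)

polmean <- function(type, id, data = df) {
  # Keep rows of the requested pollutant type within the requested IDs
  rows <- data$Type == type & data$Id %in% id
  mean(data$Value[rows], na.rm = TRUE)
}

polmean("sulfate", 1:3)
polmean("nitrate", 2)
```

The calls at the end are run from the console, outside the function definition, which is also what the original code needed.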
Ok - I finally solved this. Thanks for the help.
I didn't need to call "specdata" in line 2. The directory in line 1 referred to the correct directory.
My for/in statement needed to refer to the id in the first line, not the ID in the dataset. The for/in statement doesn't appear to need to be indented (but it looks cleaner).
I did not need a subset
The last 3 lines for pollutantmean did not need to be a part of the program. These are used in the R console to call the results one by one.

R issue "object not found"

I am a newcomer to R. Last week I had a long and complicated function working perfectly. The program was letting me pick a subset of columns and doing various manipulations on that subset. The function must work 'function(arg1=first_header_name, arg2=second_header_name,....)'. I have cleared the console, removed the old history file. I have read the manual again, I have checked the .csv file to make sure everything there is still the same. I have gone back and reworked it all step by step and I have the place where this new problem occurs. As it is a very long function, I am only going to reproduce it in a simplified version of the part that is suddenly not working.
elbow <- function(arg1,arg2) {
 my_data <- read.csv("data.csv", header=TRUE, sep=",") 
average_A <- (arg1 + arg2)
average_A
}
elbow(A3,A5)
# Error in elbow(A3, A5) : object 'A3' not found
Column headers are A3,A4,A5,A7,A8,A9,B2,B3,B5,B6,B7,B9
What stupid little error am I making? This is driving me batty. It has to be something trivial.
Here's my guess at what might work the way you wanted:
elbow <- function(arg1,arg2) {
my_data <- read.csv("data.csv", header=TRUE, sep=",")
average_A <- my_data[[arg1]] + my_data[[arg2]] # "[[" evaluates args
average_A
}
elbow('A3','A5') # entered as character literals
You should realize that the rest of my_data will have evaporated and be garbage collected after return from the elbow call. I could have shown you how to use your original expression following attach(), which would have been arguably safe within that function, but that would have violated my religious principles.
Probably during your last session you had objects named A3 or A5 in your workspace (either defined explicitly, or perhaps you had loaded and attached the data). The function was working because those objects were there, but it wasn't actually doing what you thought it was doing, so in a new session with a new workspace--without those objects--it's not working. Your function as written doesn't actually do anything with the dataset (my_data) which you are reading in inside of it; I suspect you want something like this:
elbow <- function(arg1, arg2) {
my_data <- read.csv("data.csv",header=TRUE,sep=",")
average_A <- my_data[,arg1] + my_data[,arg2]
return(average_A)
}
You will also need to use quotes when calling the function, e.g.
elbow('A3','A5')
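To try this out without the original file, the sketch below writes a tiny CSV to a temporary location first. The file contents and the extra `path` argument are assumptions added for illustration; the lookup by quoted column name is the same as in the answers above.

```r
# Write a small hypothetical CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(A3 = c(1, 2), A5 = c(10, 20)), csv_path,
          row.names = FALSE)

elbow <- function(arg1, arg2, path = csv_path) {
  my_data <- read.csv(path, header = TRUE)
  # Look the columns up by name inside the data frame that was just read
  my_data[[arg1]] + my_data[[arg2]]
}

elbow("A3", "A5")
```

Calling elbow(A3, A5) without quotes would again fail with "object 'A3' not found", because R would look for variables named A3 and A5 in the workspace rather than columns in my_data.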
