quantmod getSymbols error handling - R

I am trying to cycle through a list of symbols, download the data, and save it to a CSV file. Individual stocks work perfectly fine if the data is there, but the loop stops if there is an error, and I do not know how to handle errors (I'm new to R). I used part of an answer here, but I am unable to find an answer on error handling while looping and saving to file.
quantmod omitting tickers in getSymbols
startDate = Sys.Date() - 365
pth = "C:\\"
tickers <- c("LMT", "AAPL", "AMT", "GOOG")
# the sapply method works: it does not stop when it has issues with LMT and still goes on to download AAPL
library(quantmod)
WoW <- new.env()
sapply(tickers, function(x) {
  try(
    getSymbols(
      x,
      src = "google",
      from = startDate,
      env = WoW),
    silent = TRUE)
})
# Now for the loop to save to file: somehow it does not go all the way to GOOG, it stops at AAPL with
# Error in data.frame(sym) : row names contain missing values.
for (i in 1:length(tickers)) {
  col <- c("Open", "High", "Low", "Close", "Volume")
  sym <- eval(parse(text = paste("WoW$", tickers[i], sep = "")))
  if (!is.null(nrow(sym))) {
    colnames(sym) <- col
    sym <- data.frame(sym)
    sym <- cbind(BizDay = 0, sym)
    sym$BizDay <- rownames(sym)
    op <- paste0(pth, tickers[i], ".csv")
    print(op)
    write.table(sym, file = op, na = "", sep = ",", row.names = FALSE)
  }
}
Any pointers on how to handle basic errors? I have to run through the full security list and need to make sure I handle those errors, but right now I'm stuck on this.
Thanks

Got it to work with nrow(sym) > 1 check.
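For reference, a minimal sketch of the save loop with that check in place (assumptions: the WoW environment, tickers, and pth from above; exists()/get() stand in for the eval(parse(...)) lookup):
col <- c("Open", "High", "Low", "Close", "Volume")
for (ticker in tickers) {
  # skip tickers whose download failed, so nothing was stored in WoW
  if (!exists(ticker, envir = WoW, inherits = FALSE)) next
  sym <- get(ticker, envir = WoW)
  if (!is.null(nrow(sym)) && nrow(sym) > 1) {
    colnames(sym) <- col                       # assumes 5 OHLCV columns, as with src = "google"
    sym <- data.frame(sym)
    sym <- cbind(BizDay = rownames(sym), sym)  # dates come through as row names
    write.table(sym, file = paste0(pth, ticker, ".csv"), na = "", sep = ",", row.names = FALSE)
  }
}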

Related

how to properly close connection so I won't get "Error in file(con, "r") : all connections are in use" when using "readlines" and "tryCatch"

I have a list of URLs (more than 4000) from a specific domain (pixilink.com), and what I want to do is figure out whether each provided URL points to a picture or a video. To do this, I used the solutions provided here: How to write trycatch in R and Check whether a website provides photo or video based on a pattern in its URL, and wrote the code shown below:
# Function to get the value of initial_mode from the URL
urlmode <- function(x) {
  mycontent <- readLines(x)
  mypos <- grep("initial_mode = ", mycontent)
  if (grepl("0", mycontent[mypos])) {
    return("picture")
  } else if (grepl("tour", mycontent[mypos])) {
    return("video")
  } else {
    return(NA)
  }
}
Also, in order to prevent errors for URLs that don't exist, I used the code below:
readUrl <- function(url) {
  out <- tryCatch(
    {
      readLines(con = url, warn = FALSE)
      return(1)
    },
    error = function(cond) {
      return(NA)
    },
    warning = function(cond) {
      return(NA)
    },
    finally = {
      message(url)
    }
  )
  return(out)
}
Finally, I split up the list of URLs and passed it into the functions described above (here, for instance, I used the first 1000 URLs):
library(dplyr)                     # left_join() used below comes from dplyr
a <- subset(new_df, new_df$host == "www.pixilink.com")
vec <- a[['V']]
vec <- vec[1:1000]                 # only chose first 1000 rows
tt <- numeric(length(vec))         # checking validity of url
for (i in 1:length(vec)) {
  tt[i] <- readUrl(vec[i])
  print(i)
}
g <- data.frame(vec, tt)
g2 <- g[which(!is.na(g$tt)), ]     # only valid url
dd <- numeric(nrow(g2))
for (j in 1:nrow(g2)) {
  dd[j] <- urlmode(g2[j, 1])
}
Final <- cbind(g2, dd)
Final <- left_join(g, Final, by = c("vec" = "vec"))
I ran this code on a sample list of 100 URLs and it worked; however, after I ran it on the whole list of URLs, it returned an error. Here is the error: Error in textConnection("rval", "w", local = TRUE) : all connections are in use
And after this, even for the sample URLs (the 100 samples that I tested before), running the code gave this error message: Error in file(con, "r") : all connections are in use
I also tried closeAllConnections() after each call to the functions in the loop, but it didn't work.
Can anyone explain what this error is about? Is it related to the number of requests we can make to the website? What's the solution?
So, my guess as to why this is happening is that you're not closing the connections that you're opening via tryCatch() and via urlmode() through the use of readLines(). I was unsure of how urlmode() was going to be used in your previous post, so I had made it as simple as I could (and in hindsight, that was badly done, my apologies). So I took the liberty of rewriting urlmode() to try and make it a little bit more robust for what appears to be a more expansive task at hand.
I think the comments in the code should help, so take a look below:
# Updated URL mode function with better
# URL checking, connection handling,
# and "mode" investigation
urlmode <- function(x) {
  # Check if URL is good to go
  if (!httr::http_error(x)) {
    # Test cases
    # x <- "www.pixilink.com/3"
    # x <- "https://www.pixilink.com/93320"
    # x <- "https://www.pixilink.com/93313"

    # Then, since there are redirect shenanigans,
    # get the actual URL the input points to.
    # It should just be the input URL if there is no redirection.
    # This is important as this also takes care of checking whether
    # http or https need to be prefixed in case the input URL is
    # supplied without those (this can cause problems for url() below).
    myx <- httr::HEAD(x)$url

    # Then check for what the default mode is
    mycon <- url(myx)
    open(mycon, "r")
    mycontent <- readLines(mycon)
    mypos <- grep("initial_mode = ", mycontent)

    # Close the connection since it's no longer necessary
    close(mycon)

    # Some URLs with weird formats can return empty on this one
    # since they don't follow the expected format.
    # See for example: "https://www.pixilink.com/clients/899/#3"
    # which is actually redirected from "https://www.pixilink.com/3"

    # After that, evaluate what's at mypos, and always return the
    # actual URL along with the result
    if (!purrr::is_empty(mypos)) {
      # mystr <- stringr::str_extract(mycontent[mypos], "(?<=initial_mode\\s\\=).*")
      mystr <- stringr::str_extract(mycontent[mypos], "(?<=\').*(?=\')")
      return(c(myx, mystr))
      # return(mystr)
      # So once all that is done, check if the line at mypos
      # contains a 0 (picture), tour (video)
      # if (grepl("0", mycontent[mypos])) {
      #   return(c(myx, "picture"))
      #   # return("picture")
      # } else if (grepl("tour", mycontent[mypos])) {
      #   return(c(myx, "video"))
      #   # return("video")
      # }
    } else {
      # Valid URL but not interpretable
      return(c(myx, "uninterpretable"))
      # return("uninterpretable")
    }
  } else {
    # Straight-up invalid URL
    # No myx variable to return here, just x
    return(c(x, "invalid"))
    # return("invalid")
  }
}
#--------
#Sample code execution
library(purrr)
library(parallel)
library(future.apply)
library(httr)
library(stringr)
library(progressr)
library(progress)

# All future + progressr related stuff learned courtesy of
# https://stackoverflow.com/a/62946400/9494044

# Setting up parallelized execution
no_cores <- parallel::detectCores()
# The above setup will ensure ALL cores are put to use
clust <- parallel::makeCluster(no_cores)
future::plan(cluster, workers = clust)

# Progress bar for sanity checking
progressr::handlers(progressr::handler_progress(format = "[:bar] :percent :eta :message"))

# Website's base URL
baseurl <- "https://www.pixilink.com"

# Using future_lapply() to apply urlmode() to a sequence of the URLs on
# pixilink in parallel and storing the results in sitetype.
# Using a future chunk size of 10.
# Everything is wrapped in with_progress() to enable the progress bar.
range <- 93310:93350
# range <- 1:10000
progressr::with_progress({
  myprog <- progressr::progressor(along = range)
  sitetype <- do.call(rbind, future_lapply(range, function(b, x) {
    myprog()   ## Progress bar signaller
    myurl <- paste0(b, "/", x)
    cat("\n", myurl, " ")
    myret <- urlmode(myurl)
    cat(myret, "\n")
    return(c(myurl, myret))
  }, b = baseurl, future.chunk.size = 10))
})

# Converting into a proper data.frame and assigning column names
sitetype <- data.frame(sitetype)
names(sitetype) <- c("given_url", "actual_url", "mode")
# A bit of wrangling to tidy up the mode column
sitetype$mode <- stringr::str_replace(sitetype$mode, "0", "picture")

head(sitetype)
#                        given_url                     actual_url        mode
# 1 https://www.pixilink.com/93310 https://www.pixilink.com/93310     invalid
# 2 https://www.pixilink.com/93311 https://www.pixilink.com/93311     invalid
# 3 https://www.pixilink.com/93312 https://www.pixilink.com/93312 floorplan2d
# 4 https://www.pixilink.com/93313 https://www.pixilink.com/93313     picture
# 5 https://www.pixilink.com/93314 https://www.pixilink.com/93314 floorplan2d
# 6 https://www.pixilink.com/93315 https://www.pixilink.com/93315        tour

unique(sitetype$mode)
# [1] "invalid"     "floorplan2d" "picture"     "tour"
Basically, urlmode() now opens and closes connections only when necessary, checks for URL validity and URL redirection, and also "intelligently" extracts the value assigned to initial_mode. With the help of future_lapply() and the progress bar from the progressr package, this can now be applied quite conveniently in parallel to as many pixilink.com/<integer> URLs as desired. With a bit of wrangling thereafter, the results can be presented very tidily as a data.frame as shown.
As an example, I've demonstrated this for a small range in the code above. Note the commented-out 1:10000 range in the code in this context: I let this code run over that (hopefully sufficiently) large range of URLs for the last couple of hours to check for errors and problems. I can attest that I encountered no errors (only the regular warnings In readLines(mycon) : incomplete final line found on 'https://www.pixilink.com/93334'). For proof, I have the data from all 10000 URLs written to a CSV file that I can provide upon request (I don't fancy uploading that to pastebin or elsewhere unnecessarily). Due to an oversight on my part, I forgot to benchmark that run, but I suppose I could do that later if performance metrics are desired or would be considered interesting.
For your purposes, I believe you can simply take the entire code snippet above and run it verbatim (or with modifications) by just changing the range assignment right before the with_progress(do.call(...)) step to a range of your liking. I believe this approach is simpler and does away with having to deal with multiple functions and such (and no tryCatch() messes to deal with).
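If you would rather run this over your own vec of URLs from new_df instead of a generated range, a sketch of the same idea (assuming vec, urlmode(), and the cluster/plan setup above are already in place) could look like:
# Apply urlmode() to an existing character vector of URLs in parallel
sitetype <- do.call(rbind, future_lapply(vec, function(u) {
  c(u, urlmode(u))   # given URL, plus the (actual URL, mode) pair from urlmode()
}, future.chunk.size = 10))
sitetype <- data.frame(sitetype)
names(sitetype) <- c("given_url", "actual_url", "mode")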

about reading files and using tryCatch() in R

What I want to do is read output files and extract some values from them. But some files don't exist, so I use tryCatch() in my program to catch those errors; my program then returns an NA value and continues reading the next file. But when I execute my program, it reports the error "all connections are in use". I have tried to find answers online, but there is no good answer for my question. So, if you can solve my problem, please give your advice! Thank you very much!
prop.protec <- 0.5
pat <- read.csv("~/par.csv", header = FALSE)   # parameter file
pat <- as.matrix(pat)
num.rep <- 500
tpw <- matrix(NA, nrow = 36, ncol = num.rep)   # used to store p-values (num.rep must be defined first)
for (i in 1:36) {
  # following are 4 parameters
  dis.mod <- pat[i, 1]
  herit.tot <- pat[i, 2]
  bin <- pat[i, 3]
  op <- pat[i, 4]
  for (reps in 1:num.rep) {
    tryCatch(
      {
        res.file <- paste("~/z-out-", op, "-", dis.mod, "-", bin, "-", herit.tot, "-",
                          prop.protec, "-", reps, ".extended.qls.res", sep = "")
        res.dat <- read.table(file = res.file, header = TRUE)
        tpw[i, reps] <- res.dat$P_MFQLS
      },
      warning = function(cond) {
        message("Here's the original warning message:")
        message(cond)
      },
      error = function(e) {
        message("Here's the original error message:")
        message(e)
      })
  }
}
I have found the answer, a very simple and wonderful answer: we just need to add closeAllConnections() at the end of the inner loop. That works very well! I spent much time on this, so when I learned the answer and it succeeded, I believe you can understand my feeling.
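For completeness, an alternative sketch that side-steps the connection problem entirely by checking for the file before opening it (assuming the same file-name construction as above); missing files simply leave tpw[i, reps] as NA:
for (reps in 1:num.rep) {
  res.file <- paste("~/z-out-", op, "-", dis.mod, "-", bin, "-", herit.tot, "-",
                    prop.protec, "-", reps, ".extended.qls.res", sep = "")
  if (!file.exists(res.file)) next   # skip missing files instead of catching the error
  res.dat <- read.table(file = res.file, header = TRUE)
  tpw[i, reps] <- res.dat$P_MFQLS
}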

Adjusting for dividend and splits using quantmod

I am using quantmod to adjust for dividends and splits. It seems to work, but I have found the following problem: when adjusting, my sma(200,0) historical values are wrong, and they correct as the date approaches the current date. Please see the code below.
library(quantmod)            # also attaches TTR, which provides SMA/ADX/RSI below
stockData <- new.env()       # Make a new environment for quantmod to store data in
symbols <- c("IWM", "SPY", "TLT", "TSLA")
nr.of.positions <- 3
getSymbols(symbols, src = 'yahoo', from = "2015-10-01", to = Sys.Date())
for (i in 1:length(symbols)) {
  assign(symbols[i], adjustOHLC(get(symbols[i]),
                                adjust = c("split", "dividend"),
                                use.Adjusted = FALSE,
                                symbol.name = symbols[i]))
}
x <- list()
for (i in 1:length(symbols)) {
  x[[i]] <- get(symbols[i], pos = stockData)   # get data from stockData environment
  x[[i]]$sma <- SMA(Cl(x[[i]]), 10)
  x[[i]]$smalong <- SMA(Cl(x[[i]]), 200)
  x[[i]]$adx <- ADX(HLC(x[[i]]), 10)
  x[[i]]$rsi <- RSI(Cl(x[[i]]), 14)
  x[[i]]$close <- Cl(x[[i]])
}
You're lucky that your code works. Or maybe you're unlucky, since an error would have let you know you did something wrong.
You create a stockData environment, and the comment says you intended to store the data you pull in it. But you don't specify the stockData environment in your call to getSymbols, or in your calls to assign and get in the first for loop. So they're all assigning and getting from the global environment.
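For illustration, a minimal sketch of that fix, keeping your loop structure and just passing the environment explicitly (stockData and symbols as defined in your question); the eapply/lapply version below is still the cleaner way to do it:
getSymbols(symbols, src = 'yahoo', from = "2015-10-01", to = Sys.Date(), env = stockData)
for (i in 1:length(symbols)) {
  assign(symbols[i],
         adjustOHLC(get(symbols[i], envir = stockData),   # read from stockData explicitly
                    adjust = c("split", "dividend"),
                    use.Adjusted = FALSE,
                    symbol.name = symbols[i]),
         envir = stockData)                               # and write back to it
}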
Your code would be clearer if you avoided using get and assign within a for loop, and instead used the convenience functions lapply and eapply.
stockData <- new.env()
symbols <- c("IWM", "SPY", "TLT", "TSLA")
nr.of.positions <- 3
getSymbols(symbols, from = "2015-10-01", env = stockData)

# loop over objects in an environment with eapply
adj <- eapply(stockData, function(x) {
  symbol <- sub("\\.Close$", "", colnames(Cl(x)))
  adjustOHLC(x, symbol.name = symbol)
})

# loop over list returned by eapply
x <- lapply(adj, function(x) {
  x$sma <- SMA(Cl(x), 10)
  x$smalong <- SMA(Cl(x), 200)
  x$adx <- ADX(HLC(x), 10)
  x$rsi <- RSI(Cl(x), 14)
  x$close <- Cl(x)
  x
})
You'll notice the results of my code and your code are the same if you run them each in a clean R session. So the reason your code produced "wrong" results is probably because you had other objects in your workspace that were being assigned/accessed by your use of get and assign.

How to download data from web? Need to skip and move to next item if data doesn't exist

I’m trying to modify a script that I found online. I want the script to skip a ticker if no results are found, kind of like Try…Catch. Right now, the code just falls down if a ticker isn’t found. How can I get it to skip any/all missing tickers and finish without crashing?
require(XML)
require(plyr)
getKeyStats_xpath <- function(symbol) {
  yahoo.URL <- "http://finance.yahoo.com/q/ks?s="
  html_text <- htmlParse(paste(yahoo.URL, symbol, sep = ""), encoding = "UTF-8")
  # search for <td> nodes anywhere that have class 'yfnc_tablehead1'
  nodes <- getNodeSet(html_text, "/*//td[@class='yfnc_tablehead1']")
  if (length(nodes) > 0) {
    measures <- sapply(nodes, xmlValue)
    # Clean up the column name
    measures <- gsub(" *[0-9]*:", "", gsub(" \\(.*?\\)[0-9]*:", "", measures))
    # Remove dups
    dups <- which(duplicated(measures))
    # print(dups)
    for (i in 1:length(dups))
      measures[dups[i]] <- paste(measures[dups[i]], i, sep = " ")
    # use the siblings function to get the values
    values <- sapply(nodes, function(x) xmlValue(getSibling(x)))
    df <- data.frame(t(values))
    colnames(df) <- measures
    return(df)
  } else {
    break
  }
}
setwd("C:/Users/rshuell001/Desktop/downloads/")
tickers <- c("OREX",
"OSIR",
"PACB",
"PCRX",
"PCYC",
"PDLI",
"PETX",
"PGNX",
"POZN",
"PRTA",
"PTCT",
"PTLA",
"PTX",
"QGEN",
"QLTI",
"RARE",
"RCPT",
"RDUS",
"REGN",
"RGEN",
"RGLS",
"RLYP",
"RPTP",
"RTRX",
"RVNC",
"SAGE")
stats <- ldply(tickers, getKeyStats_xpath)
rownames(stats) <- tickers
write.csv(t(stats), "FinancialStats_updated.csv",row.names=TRUE)
I posted here about a week ago and a couple people suggested I try this.
stats <- llply(tickers, function(t) tryCatch(getKeyStats_xpath(t), error=function(x) {cat("error occurred, skipping this ticker\n")}))
rownames(stats) <- tickers
write.csv(t(stats), "FinancialStats_updated.csv",row.names=TRUE)
That never worked; I messed around with it for a while but couldn’t get it to write the data to a CSV file, although the code looks like it’s pretty close to what I want.
Can someone help me get this straightened out?
Thanks!!
Here is the error your code produces:
Error in FUN(X[[i]], ...) :
no loop for break/next, jumping to top level
Let's look at help(break):
break breaks out of a for, while or repeat loop; control is transferred to the first statement outside the inner-most loop
So, break may only be used in a for, while or repeat loop, and you are using it in an if/else statement... We can just replace that break in getKeyStats_xpath with something else, plus return something to mark that the symbol was not found.
...
  } else {
    cat("Could not find", symbol, "\n")
    return(data.frame(NA))
  }
...
It now works fine. You can remove the rows that are all NAs, after you have assigned names to stats, with:
stats <- stats[!rowSums(is.na(stats)) == length(stats),]
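Putting it together, the full run might then look like this (a sketch based on the tickers and the modified getKeyStats_xpath() above):
stats <- ldply(tickers, getKeyStats_xpath)                  # missing tickers now come back as an NA row
rownames(stats) <- tickers
stats <- stats[!rowSums(is.na(stats)) == length(stats), ]   # drop the all-NA rows
write.csv(t(stats), "FinancialStats_updated.csv", row.names = TRUE)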

Using tryCatch within plyr

I'm running this function:
require(XML)
require(plyr)
getKeyStats_xpath <- function(symbol) {
  yahoo.URL <- "http://finance.yahoo.com/q/ks?s="
  html_text <- htmlParse(paste(yahoo.URL, symbol, sep = ""), encoding = "UTF-8")
  # search for <td> nodes anywhere that have class 'yfnc_tablehead1'
  nodes <- getNodeSet(html_text, "/*//td[@class='yfnc_tablehead1']")
  if (length(nodes) > 0) {
    measures <- sapply(nodes, xmlValue)
    # Clean up the column name
    measures <- gsub(" *[0-9]*:", "", gsub(" \\(.*?\\)[0-9]*:", "", measures))
    # Remove dups
    dups <- which(duplicated(measures))
    # print(dups)
    for (i in 1:length(dups))
      measures[dups[i]] <- paste(measures[dups[i]], i, sep = " ")
    # use the siblings function to get the values
    values <- sapply(nodes, function(x) xmlValue(getSibling(x)))
    df <- data.frame(t(values))
    colnames(df) <- measures
    return(df)
  } else {
    break
  }
}
As long as the page exists, it works fine. However, if one of my tickers does NOT have any data on that URL, it throws an error:
Error in FUN(X[[3L]], ...) : no loop for break/next, jumping to top level
I added a trace too, and things break down on ticker number 3.
tickers <- c("QLTI",
"RARE",
"RCPT",
"RDUS",
"REGN",
"RGEN",
"RGLS")
tryCatch({
  stats <- ldply(tickers, getKeyStats_xpath)
}, finally = {})
I'd like to call the function like this:
stats <- ldply(tickers, getKeyStats_xpath)
rownames(stats) <- tickers
write.csv(t(stats), "FinancialStats_updated.csv",row.names=TRUE)
Basically, if a ticker has no data, I want to skip it.
Can someone please help me get this working?
Expanding on my comment: the issue here is that you've enclosed the entire command stats <- ldply(tickers, getKeyStats_xpath) within a tryCatch. This means R tries the whole ldply() call as one unit, so as soon as one ticker fails, the entire call fails and you get no stats at all.
Instead, what you want is to try each ticker.
To do this, write a wrapper for getKeyStats_xpath that encloses it in tryCatch. You could do this within ldply with an anonymous function, for example ldply(tickers, function(t) tryCatch(getKeyStats_xpath(t), finally = {})). Note that finally executes regardless of exit condition, so finally = {} executes nothing. (See Advanced R or How to write try catch in R from r-faq for more.)
On an error, tryCatch calls the function provided in the error argument. So as is, this code still won't help, because the error is unhandled (thanks to rawr for pointing this out earlier). It is also easier to inspect the output if you use llply instead, since that returns a named list with one element per ticker.
So a complete answer using this approach, and with informative error handling, is below.
stats <- llply(tickers,
               function(t) tryCatch(getKeyStats_xpath(t),
                                    error = function(x) {
                                      cat("error occurred for:\n", t, "\n...skipping this ticker\n")
                                    }))
names(stats) <- tickers
lapply(stats, length)
# <snip>
# $RCPT
# [1] 0
# </snip>
As of now, this works for me, returning data for all tickers except the one listed in the code block above.
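If you still need the CSV from your original question, one way to get there (a sketch, assuming the stats list produced above) is to drop the empty elements and bind the rest with plyr::rbind.fill():
ok <- stats[sapply(stats, length) > 0]      # keep only tickers that returned a data frame
out <- do.call(plyr::rbind.fill, ok)        # bind them, filling any mismatched columns with NA
rownames(out) <- names(ok)
write.csv(t(out), "FinancialStats_updated.csv", row.names = TRUE)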
