quantmod getFinancials() not pulling financials - r
I'm looking to download fundamental data for public companies. Using the quantmod package, I tried getFinancials() to pull the data. It works for some companies but gives varied results (I read and understand the disclaimer about free data), and I want to confirm that I am pulling it correctly.
For JPM:
On the Yahoo Finance website I do see financials populated, but the call below seems to use "google" as the src instead of "yahoo", and the Google page has only sparse financials.
Google - https://www.google.com/finance?q=NYSE%3AJPM&fstype=ii&ei=9kh-WejLE5e_etbzmpgP
Yahoo - https://finance.yahoo.com/quote/JPM/financials?p=JPM
library(quantmod)
JPM <- getFinancials("JPM", src = "yahoo", auto.assign = FALSE)
viewFin(JPM, type = "IS", period = "A")
Is there a correct way to specify the src? Also, is there a way to use getFinancials() but switch the source (Google vs. Yahoo) when there is an NA in an indicative column (Revenues, for example)?
The top of the help page for getFinancials says (emphasis added),
Download Income Statement, Balance Sheet, and Cash Flow Statements from **Google Finance**.
There is currently no way to specify Yahoo Finance as a source. Doing so would require someone to write a method to scrape and parse the HTML from Yahoo Finance, since there's no way to download it in a file like there is for price data.
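If it helps, here is a minimal sketch of how you might keep getFinancials() on its Google source and at least detect when a key line item is missing, so you know to fall back to some other data source. The row label "Revenue" is an assumption about how the Google income statement comes through quantmod; check it against rownames() of your own result.
library(quantmod)
# Google is the only source getFinancials() supports, so drop src = "yahoo"
JPM <- getFinancials("JPM", auto.assign = FALSE)
is_annual <- viewFin(JPM, type = "IS", period = "A")
# hypothetical check: flag the symbol if the revenue line is entirely NA
if ("Revenue" %in% rownames(is_annual) && all(is.na(is_annual["Revenue", ]))) {
  message("JPM: annual revenue missing from the Google data")
}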
I think Yahoo changed its API very recently. Download the file from the link titled "Get Excel Spreadsheet to Download Bulk Historical Stock Data from Google Finance":
http://investexcel.net/multiple-stock-quote-downloader-for-excel/
That is for Excel, which you can easily load into R.
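If you go the spreadsheet route, the workbook can be read into R with, for example, the readxl package. This is just a sketch; the file name below is a placeholder for wherever you saved the download, and the sheet layout depends on the workbook itself.
library(readxl)
# placeholder path - point this at the workbook you downloaded
quotes <- read_excel("stock-quote-downloader.xlsx", sheet = 1)
head(quotes)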
You could try something like this, as well.
# assumes codes are known beforehand
codes <- c("MSFT","SBUX","S","AAPL","ADT")
urls <- paste0("https://www.google.com/finance/historical?q=",codes,"&output=csv")
paths <- paste0(codes, ".csv")
missing <- !(paths %in% dir("."))
missing
# simple error handling in case file doesn't exist
downloadFile <- function(url, path, ...) {
# remove file if exists already
if(file.exists(path)) file.remove(path)
# download file
tryCatch(
download.file(url, path, ...), error = function(c) {
# remove file if error
if(file.exists(path)) file.remove(path)
# create error message
c$message <- paste(substr(path, 1, 4),"failed")
message(c$message)
}
)
}
# wrapper of mapply
Map(downloadFile, urls[missing], paths[missing])
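Once the downloads finish, the files can be read back into a named list, assuming each CSV downloaded successfully and has a header row (a sketch):
# read whichever files actually downloaded into a named list of data frames
downloaded <- paths[file.exists(paths)]
quotes <- lapply(downloaded, read.csv, stringsAsFactors = FALSE)
names(quotes) <- sub("\\.csv$", "", downloaded)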
Or, this.
## downloads historic prices for all constituents of SP500
library(zoo)
library(tseries)
## read in list of constituents, with company name in first column and
## ticker symbol in second column
## CREATE A FILE TO READ DATA FROM!!!
spComp <- read.csv("C:/Users/Excel/Desktop/stocks.csv" )
## specify time period
dateStart <- "2013-01-01"
dateEnd <- "2015-05-08"
## extract symbols and number of iterations
symbols <- spComp[, 2]   # ticker symbols are in the second column
nAss <- length(symbols)
## download data on first stock as zoo object
z <- get.hist.quote(instrument = symbols[1], start = dateStart,
end = dateEnd, quote = "AdjClose",
retclass = "zoo", quiet = T)
## use ticker symbol as column name
dimnames(z)[[2]] <- as.character(symbols[1])
## download remaining assets in for loop
for (i in 2:nAss) {
## display progress by showing the current iteration step
cat("Downloading ", i, " out of ", nAss , "\n")
result <- try(x <- get.hist.quote(instrument = symbols[i],
start = dateStart,
end = dateEnd, quote = "AdjClose",
retclass = "zoo", quiet = T))
if(inherits(result, "try-error")) {
next
}
else {
dimnames(x)[[2]] <- as.character(symbols[i])
## merge with already downloaded data to get assets on same dates
z <- merge(z, x)
}
}
## save data
# CREATE A FILE TO WRITE DATA TO!!!
write.zoo(z, file = "C:/Users/Excel/Desktop/all_sp500_price_data.csv", index.name = "time")
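To pick the analysis up later, the saved file can be read back as a zoo object, assuming the default space-separated format that write.zoo() produces (a sketch):
library(zoo)
# read the saved prices back in; the first column holds the dates
z <- read.zoo("C:/Users/Excel/Desktop/all_sp500_price_data.csv",
              header = TRUE, index.column = 1)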
Here is yet another option for you to consider.
Method #1:
This article illustrates how to download stock price data files from Google, save them to a local drive and merge them into a single data frame. This script is slightly modified from a script which downloads RStudio package download log data. The original source can be found [here](https://github.com/hadley/cran-logs-dplyr/blob/master/1-download.r).
First of all, the following three packages are used.
{% highlight r %}
library(knitr)
library(lubridate)
library(stringr)
library(plyr)
library(dplyr)
{% endhighlight %}
The script begins with creating a folder to save data files.
{% highlight r %}
# create data folder
dataDir <- paste0("data","_","2014-11-20-Download-Stock-Data-1")
if(file.exists(dataDir)) {
unlink(dataDir, recursive = TRUE)
dir.create(dataDir)
} else {
dir.create(dataDir)
}
{% endhighlight %}
After creating the urls and file paths, the files are downloaded using the `Map` function - it is a wrapper of `mapply`. Note that, in case the function breaks with an error (e.g. when a file doesn't exist), `download.file` is wrapped by another function that includes an error handler (`tryCatch`).
{% highlight r %}
# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
codes,"&output=csv")
paths <- paste0(dataDir,"/",codes,".csv") # back slash on windows (\\)
# simple error handling in case file doesn't exist
downloadFile <- function(url, path, ...) {
# remove file if exists already
if(file.exists(path)) file.remove(path)
# download file
tryCatch(
download.file(url, path, ...), error = function(c) {
# remove file if error
if(file.exists(path)) file.remove(path)
# create error message
c$message <- paste(substr(path, 1, 4),"failed")
message(c$message)
}
)
}
# wrapper of mapply
Map(downloadFile, urls, paths)
{% endhighlight %}
Finally the files are read back using `llply` and combined using `rbind_all`. Note that, as the merged data holds records for multiple stocks, a `Code` column is created.
{% highlight r %}
# read all csv files and merge
files <- dir(dataDir, full.name = TRUE)
dataList <- llply(files, function(file){
data <- read.csv(file, stringsAsFactors = FALSE)
# get code from file path
pattern <- "/[A-Z][A-Z][A-Z][A-Z]"
code <- substr(str_extract(file, pattern), 2, nchar(str_extract(file, pattern)))
# first column's name is funny
names(data) <- c("Date","Open","High","Low","Close","Volume")
data$Date <- dmy(data$Date)
data$Open <- as.numeric(data$Open)
data$High <- as.numeric(data$High)
data$Low <- as.numeric(data$Low)
data$Close <- as.numeric(data$Close)
data$Volume <- as.integer(data$Volume)
data$Code <- code
data
}, .progress = "text")
data <- rbind_all(dataList)
{% endhighlight %}
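One caveat: `rbind_all` has since been deprecated in dplyr, so on a current version of the package the equivalent call would be `bind_rows`.
{% highlight r %}
# drop-in replacement for rbind_all() on current dplyr versions
data <- dplyr::bind_rows(dataList)
{% endhighlight %}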
Some of the values are shown below.
|Date | Open| High| Low| Close| Volume|Code |
|:----------|-----:|-----:|-----:|-----:|--------:|:----|
|2014-11-26 | 47.49| 47.99| 47.28| 47.75| 27164877|MSFT |
|2014-11-25 | 47.66| 47.97| 47.45| 47.47| 28007993|MSFT |
|2014-11-24 | 47.99| 48.00| 47.39| 47.59| 35434245|MSFT |
|2014-11-21 | 49.02| 49.05| 47.57| 47.98| 42884795|MSFT |
|2014-11-20 | 48.00| 48.70| 47.87| 48.70| 21510587|MSFT |
|2014-11-19 | 48.66| 48.75| 47.93| 48.22| 26177450|MSFT |
This approach is not as efficient as reading the files directly without saving them to a local drive. It may be useful, however, if the files are large and the API server breaks the connection abruptly.
I hope this article is useful and I'm going to write an article to show the second way.
Method #2:
An [earlier article](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-1/) showed a way to download stock price data files from Google, save them to a local drive and merge them into a single data frame. If the files are not large, however, that is not very worthwhile and, in this article, the files are downloaded and merged in memory instead.
The following packages are used.
{% highlight r %}
library(knitr)
library(lubridate)
library(stringr)
library(plyr)
library(dplyr)
{% endhighlight %}
Taking the urls as file locations, the files are read directly using `llply` and combined using `rbind_all`. As the merged data holds records for multiple stocks, a `Code` column is created. Note that, when an error occurs, the function returns a dummy data frame so that the loop does not break - the dummy rows are filtered out at the end.
{% highlight r %}
# assumes codes are known beforehand
codes <- c("MSFT", "TCHC") # codes <- c("MSFT", "1234") for testing
files <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
codes,"&output=csv")
dataList <- llply(files, function(file, ...) {
# get code from file url
pattern <- "Q:[0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z][0-9a-zA-Z]"
code <- substr(str_extract(file, pattern), 3, nchar(str_extract(file, pattern)))
# read data directly from a URL with only simple error handling
# for further error handling: http://adv-r.had.co.nz/Exceptions-Debugging.html
tryCatch({
data <- read.csv(file, stringsAsFactors = FALSE)
# first column's name is funny
names(data) <- c("Date","Open","High","Low","Close","Volume")
data$Date <- dmy(data$Date)
data$Open <- as.numeric(data$Open)
data$High <- as.numeric(data$High)
data$Low <- as.numeric(data$Low)
data$Close <- as.numeric(data$Close)
data$Volume <- as.integer(data$Volume)
data$Code <- code
data
},
error = function(c) {
c$message <- paste(code,"failed")
message(c$message)
# return a dummy data frame
data <- data.frame(Date=dmy(format(Sys.Date(),"%d%m%Y")), Open=0, High=0,
Low=0, Close=0, Volume=0, Code="NA")
data
})
})
# dummy data frame values are filtered out
data <- filter(rbind_all(dataList), Code != "NA")
{% endhighlight %}
Some of the values are shown below.
|Date | Open| High| Low| Close| Volume|Code |
|:----------|-----:|-----:|-----:|-----:|--------:|:----|
|2014-11-26 | 47.49| 47.99| 47.28| 47.75| 27164877|MSFT |
|2014-11-25 | 47.66| 47.97| 47.45| 47.47| 28007993|MSFT |
|2014-11-24 | 47.99| 48.00| 47.39| 47.59| 35434245|MSFT |
|2014-11-21 | 49.02| 49.05| 47.57| 47.98| 42884795|MSFT |
|2014-11-20 | 48.00| 48.70| 47.87| 48.70| 21510587|MSFT |
|2014-11-19 | 48.66| 48.75| 47.93| 48.22| 26177450|MSFT |
It took a bit longer to complete the script as I had to teach myself how to handle errors in R. And this is why I started to write articles in this blog.
I hope this article is useful.
Summarize Stock Returns from Multiple Files:
This is a slight extension of the previous two articles ( [2014-11-20-Download-Stock-Data-1](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-1/), [2014-11-20-Download-Stock-Data-2](http://jaehyeon-kim.github.io/r/2014/11/20/Download-Stock-Data-2/) ) and it aims to produce gross returns, standard deviation and correlation of multiple shares.
The following packages are used.
{% highlight r %}
library(knitr)
library(lubridate)
library(stringr)
library(reshape2)
library(plyr)
library(dplyr)
{% endhighlight %}
The script begins with creating a data folder in the format of *data_YYYY-MM-DD*.
{% highlight r %}
# create data folder
dataDir <- paste0("data","_",format(Sys.Date(),"%Y-%m-%d"))
if(file.exists(dataDir)) {
unlink(dataDir, recursive = TRUE)
dir.create(dataDir)
} else {
dir.create(dataDir)
}
{% endhighlight %}
Given company codes, URLs and file paths are created. Then data files are downloaded by `Map`, which is a wrapper of `mapply`. Note that R's `download.file` function is wrapped by `downloadFile` so that the function does not break when an error occurs.
{% highlight r %}
# assumes codes are known beforehand
codes <- c("MSFT", "TCHC")
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
codes,"&output=csv")
paths <- paste0(dataDir,"/",codes,".csv") # backward slash on windows (\)
# simple error handling in case file doesn't exist
downloadFile <- function(url, path, ...) {
# remove file if exists already
if(file.exists(path)) file.remove(path)
# download file
tryCatch(
download.file(url, path, ...), error = function(c) {
# remove file if error
if(file.exists(path)) file.remove(path)
# create error message
c$message <- paste(substr(path, 1, 4),"failed")
message(c$message)
}
)
}
# wrapper of mapply
Map(downloadFile, urls, paths)
{% endhighlight %}
Once the files are downloaded, they are read back and combined using `rbind_all`. Some more details about this step are listed below.
* only Date, Close and Code columns are taken
* codes are extracted from file paths by matching a regular expression
* data is arranged by date as the raw files are sorted in a descending order
* error is handled by returning a dummy data frame where its code value is NA.
* individual data files are merged in a long format
* 'NA' is filtered out
{% highlight r %}
# read all csv files and merge
files <- dir(dataDir, full.name = TRUE)
dataList <- llply(files, function(file){
# get code from file path
pattern <- "/[A-Z][A-Z][A-Z][A-Z]"
code <- substr(str_extract(file, pattern), 2, nchar(str_extract(file, pattern)))
tryCatch({
data <- read.csv(file, stringsAsFactors = FALSE)
# first column's name is funny
names(data) <- c("Date","Open","High","Low","Close","Volume")
data$Date <- dmy(data$Date)
data$Close <- as.numeric(data$Close)
data$Code <- code
# optional
data$Open <- as.numeric(data$Open)
data$High <- as.numeric(data$High)
data$Low <- as.numeric(data$Low)
data$Volume <- as.integer(data$Volume)
# select only 'Date', 'Close' and 'Code'
# raw data should be arranged in an ascending order
arrange(subset(data, select = c(Date, Close, Code)), Date)
},
error = function(c){
c$message <- paste(code,"failed")
message(c$message)
# return a dummy data frame not to break function
data <- data.frame(Date=dmy(format(Sys.Date(),"%d%m%Y")), Close=0, Code="NA")
data
})
}, .progress = "text")
# data is combined to create a long format
# dummy data frame values are filtered out
data <- filter(rbind_all(dataList), Code != "NA")
{% endhighlight %}
Some values of this long-format data are shown below.
|Date | Close|Code |
|:----------|-----:|:----|
|2013-11-29 | 38.13|MSFT |
|2013-12-02 | 38.45|MSFT |
|2013-12-03 | 38.31|MSFT |
|2013-12-04 | 38.94|MSFT |
|2013-12-05 | 38.00|MSFT |
|2013-12-06 | 38.36|MSFT |
The data is converted into a wide format where the x and y variables are Date and Code respectively (`Date ~ Code`) and the value variable is Close (`value.var="Close"`). Some values of the wide-format data are shown below.
{% highlight r %}
# data is converted into a wide format
data <- dcast(data, Date ~ Code, value.var="Close")
kable(head(data))
{% endhighlight %}
|Date | MSFT| TCHC|
|:----------|-----:|-----:|
|2013-11-29 | 38.13| 13.52|
|2013-12-02 | 38.45| 13.81|
|2013-12-03 | 38.31| 13.48|
|2013-12-04 | 38.94| 13.71|
|2013-12-05 | 38.00| 13.55|
|2013-12-06 | 38.36| 13.95|
The remaining steps just take the log of the close prices, difference them to obtain daily log returns, and apply `sum`, `sd`, and `cor`.
{% highlight r %}
# select except for Date column
data <- select(data, -Date)
# apply log difference column wise
dailyRet <- apply(log(data), 2, diff, lag=1)
# obtain total return, standard deviation and correlation of the daily log returns
returns <- apply(dailyRet, 2, sum, na.rm = TRUE)
std <- apply(dailyRet, 2, sd, na.rm = TRUE)
correlation <- cor(dailyRet)
returns
{% endhighlight %}
{% highlight text %}
## MSFT TCHC
## 0.2249777 0.6293973
{% endhighlight %}
{% highlight r %}
std
{% endhighlight %}
{% highlight text %}
## MSFT TCHC
## 0.01167381 0.03203031
{% endhighlight %}
{% highlight r %}
correlation
{% endhighlight %}
{% highlight text %}
## MSFT TCHC
## MSFT 1.0000000 0.1481043
## TCHC 0.1481043 1.0000000
{% endhighlight %}
Finally the data folder is deleted.
{% highlight r %}
# delete data folder
if(file.exists(dataDir)) { unlink(dataDir, recursive = TRUE) }
{% endhighlight %}
Related
Download file from url R
I am having problems downloading data from the link below directly into R:
kaggle.com/c/house-prices-advanced-regression-techniques/data
I tried with this code:
data <- read.csv("https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=test.csv", skip = 1)
I tried most of the options listed here: Access a URL and read Data with R. However, I only get an HTML table and not the tables with the relevant house-price data from the web site. Not sure what I am doing wrong. Thanks.
Here's a simple example post on Kaggle showing how to achieve your goal; the code below is taken from that example.
Create a verified account
Log in
Go to your account (click the top right -> account)
Click "Create new API token"
Place the file somewhere sensible that you can access from R
library(httr)
library(jsonlite)
kgl_credentials <- function(kgl_json_path = "~/.kaggle/kaggle.json"){
  # returns user credentials from kaggle json
  user <- fromJSON("~/.kaggle/kaggle.json", flatten = TRUE)
  return(user)
}
kgl_dataset <- function(ref, file_name, type = "dataset", kgl_json_path = "~/.kaggle/kaggle.json"){
  # ref: depends on 'type':
  #   - dataset: "sudalairajkumar/novel-corona-virus-2019-dataset"
  #   - competition: competition ID, e.g. 8587 for "competitive-data-science-predict-future-sales"
  # file_name: specific dataset wanted, e.g. "covid_19_data.csv"
  .kaggle_base_url <- "https://www.kaggle.com/api/v1"
  user <- kgl_credentials(kgl_json_path)
  if(type == "dataset"){
    # dataset
    url <- paste0(.kaggle_base_url, "/datasets/download/", ref, "/", file_name)
  } else if(type == "competition"){
    # competition
    url <- paste0(.kaggle_base_url, "/competitions/data/download/", ref, "/", file_name)
  }
  # call
  rcall <- httr::GET(url, httr::authenticate(user$username, user$key, type = "basic"))
  # content type
  content_type <- rcall[[3]]$`content-type`
  if(grepl("zip", content_type)){
    # download and unzip
    temp <- tempfile()
    download.file(rcall$url, temp)
    data <- read.csv(unz(temp, file_name))
    unlink(temp)
  } else {
    # else read as text -- note: code this better
    data <- content(rcall, type = "text/csv", encoding = "ISO-8859-1")
  }
  return(data)
}
Then you can use the credentials to download the dataset as described in the post:
kgl_dataset(file_name = 'test.csv',
            type = 'competition',
            ref = 'house-prices-advanced-regression-techniques',
            kgl_json_path = 'kaggle.json')
Alternatively you can use the unofficial R API:
library(devtools)
install_github('mkearney/kaggler')
library(kaggler)
kgl_auth(creds_file = 'kaggle.json')
kgl_competitions_data_download('house-prices-advanced-regression-techniques', 'test.csv')
However, this fails due to a mistake in the implementation of kgl_api_get:
function (path, ..., auth = kgl_auth()) {
  r <- httr::GET(kgl_api_call(path, ...), auth)
  httr::warn_for_status(r)
  if (r$status_code != 200) {  # <== should be "=="
  ...
}
I downloaded the data (which you should just do too, it's quite easy), but just in case you don't want to, I uploaded the data to Pastebin and you can run the code below. This is for their "train" dataset, downloaded from the link you provided above.
data <- read.delim("https://pastebin.com/raw/aGvwwdV0", header = TRUE)
How to call a script in another script in R
I have created a series of commands in R that get a job done using a specific URL. I would like to iterate the series of commands over a list of URLs that reside in a separate text file. How do I call the list into the commands one at a time? I do not know the proper terminology for this programming action. I've looked into scripting and batch programming but this is not what I want to do.
# URL that comes from list
URL <- "http://www.urlfromlist.com"
# Load URL
theurl <- getURL(URL, .opts = list(ssl.verifypeer = FALSE))
# Read the tables
tables <- readHTMLTable(theurl)
# Create a list
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
# Convert the list to a data frame
df <- do.call(rbind.data.frame, tables)
# Save dataframe out as a csv file
write.csv(df2, file = dynamicname, row.names = FALSE)
The above code is what I am doing. The first variable needs to be a different URL each time from a list - rinse and repeat. Thanks!
UPDATED CODE - this is still not writing out any files but runs.
# Function to pull tables from list of URLs
URLfunction <- function(x){
  # URL that comes from list
  URL <- x
  # Load URL
  theurl <- RCurl::getURL(URL, .opts = list(ssl.verifypeer = FALSE))
  # Read the tables
  tables <- XML::readHTMLTable(theurl)
  # Create a list
  tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)
  # Convert the list to a data frame
  df <- do.call(rbind, tables)
  # Split date and time column out
  df2 <- separate(df, "Date / Time", c("Date", "Time"), sep = " ")
  # Fill the missing column with text, in this case shapename
  shapename <- qdapRegex::ex_between(URL, "ndxs", ".html")
  df2$Shape <- shapename
  # Save dataframe out as a csv file
  write.csv(result, paste0(shapename, '.csv', row.names = FALSE))
  return(df2)
}
URL <- read.csv("PATH", header = FALSE)
purrr::map_df(URL, URLfunction)
## Also tried purrr::map_df(URL[,1], URLfunction)
If I understand your question correctly, the following should work for your problem.
Used libraries:
library(RCurl)
library(XML)
library(rlist)
library(purrr)
Define the function:
URLfunction <- function(x){
  # URL that comes from list
  URL <- x
  # Load URL
  theurl <- RCurl::getURL(URL, .opts = list(ssl.verifypeer = FALSE))
  # Read the tables
  tables <- XML::readHTMLTable(theurl)
  # Create a list
  tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)
  # Convert the list to a data frame
  df <- do.call(rbind, tables)
  return(df)
}
Assume you have data like below (I am not sure what your data looks like):
URL <- c("https://stackoverflow.com/questions/56139810/how-to-call-a-script-in-another-script-in-r",
         "https://stackoverflow.com/questions/56122052/labelling-points-on-a-highcharter-scatter-chart/56123057?noredirect=1#comment98909916_56123057")
result <- purrr::map(URL, URLfunction)
result <- do.call(rbind, result)
write.csv is the last step. If you want to write.csv for each URL, move it into URLfunction:
write.csv(result, file = dynamicname, row.names = FALSE)
Additional: list version
URL <- list("https://stackoverflow.com/questions/56139810/how-to-call-a-script-in-another-script-in-r",
            "https://stackoverflow.com/questions/56122052/labelling-points-on-a-highcharter-scatter-chart/56123057?noredirect=1#comment98909916_56123057")
result <- purrr::map_df(URL, URLfunction)
> result
   asked    today yesterday
1 viewed 35 times      <NA>
2 active    today      <NA>
3 viewed     <NA>  34 times
4 active     <NA>     today
CSV version
URL <- read.csv("PATH", header = FALSE)
result <- purrr::map_df(URL[,1], URLfunction)
> result
   asked    today yesterday
1 viewed 35 times      <NA>
2 active    today      <NA>
3 viewed     <NA>  34 times
4 active     <NA>     today
Edited version of your code:
URLfunction <- function(x){
  # URL that comes from list
  URL <- x
  # Load URL
  theurl <- RCurl::getURL(URL, .opts = list(ssl.verifypeer = FALSE))
  # Read the tables
  tables <- XML::readHTMLTable(theurl)
  # Create a list
  tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)
  # Convert the list to a data frame
  df <- do.call(rbind, tables)
  # Split date and time column out
  df2 <- tidyr::separate(df, "Date / Time", c("Date", "Time"), sep = " ")
  # Fill the missing column with text, in this case shapename
  # qdapRegex::ex_between returns a list, which couldn't be saved when added to df2,
  # so 'unlist' was added
  shapename <- unlist(qdapRegex::ex_between(URL, "ndxs", ".html"))
  df2$Shape <- shapename
  # Save dataframe out as a csv file
  # Two errors in the original: the data frame is named 'df2', not 'result',
  # and row.names is a 'write.csv' argument, not a 'paste0' argument
  write.csv(df2, paste0(shapename, '.csv'), row.names = FALSE)
  return(df2)
}
After defining the above function:
URL = c("nuforc.org/webreports/ndxsRectangle.html",
        "nuforc.org/webreports/ndxsRound.html")
RESULT = purrr::map_df(URL, URLfunction)
Finally, I get the result below:
1. Rectangle.csv and Round.csv files on your desktop (saved path).
2. A row-bound data frame (2011 x 8) that looks like:
> RESULT[1,]
    Date  Time     City State     Shape  Duration
1 5/2/19 00:20 Honolulu    HI Rectangle 3 seconds
                                                                                                                            Summary
1 Several of rectangles connected in different LED like colors. Such as red, green, blue, etc. ;above Waikiki. ((anonymous report))
  Posted
1 5/9/19
How to summarize by Quarter in R
I am having some difficulties summarizing data from my database in R. I am looking to pull the data and have it summarized by quarter. Below is the code I am using to get a txt output, but I am getting errors. What do I need to do to change the code so that the data is summarized by quarter?
library(data.table, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

################
## PARAMETERS ##
################

# Set path of major source folder for raw transaction data
in_directory <- "C:/Users/name/Documents/Raw Data/"

# List names of sub-folders (currently grouped by first two characters of CUST_ID)
in_subfolders <- list("AA-CA", "CB-HZ", "IA-IL", "IM-KZ", "LA-MI", "MJ-MS",
                      "MT-NV", "NW-OH", "OI-PZ", "QA-TN", "TO-UZ", "VA-WA", "WB-ZZ")

# Set location for output
out_directory <- "C:/Users/name/Documents/YTD Master/"
out_filename <- "NEW.csv"

# Set beginning and end of date range to be collected - year-month-day format
date_range <- interval(as.Date("2018-01-01"), as.Date("2018-05-31"))

# Enable or disable filtering of raw files to only grab items bought within certain months to save space.
# If false, all files will be scanned for unique items, which will take longer and be a larger file.
date_filter <- TRUE

##########
## CODE ##
##########

starttime <- Sys.time()

mastertable <- NULL

for (j in 1:length(in_subfolders)) {
  subfolder <- in_subfolders[j]
  sub_directory <- paste0(in_directory, subfolder, "/")

  ## IMPORT DATA
  in_filenames <- dir(sub_directory, pattern = ".txt")

  for (i in 1:length(in_filenames)) {

    # Default value provided for when fast filtering is disabled.
    read_this_file <- TRUE

    # To fast filter the data, we choose to include or exclude an entire file based on the date of its first line.
    # WARNING: This is only a valid method if filtering by entire months, since that is the amount of data housed in each file.
    if (date_filter) {
      temptable <- fread(paste0(sub_directory, in_filenames[i]),
                         colClasses = c(CUSTOMER_TIER = "character"),
                         na.strings = "", nrows = 1)
      temptable[, INVOICE_DT := as.Date(INVOICE_DT)]

      # If date matches, set read flag to TRUE. If date does not match, set read flag to FALSE.
      read_this_file <- temptable[, INVOICE_DT] %within% date_range
    }

    if (read_this_file) {
      print(Sys.time() - starttime)
      print(paste0("Reading in ", in_filenames[i]))
      temptable <- fread(paste0(sub_directory, in_filenames[i]),
                         colClasses = c(CUSTOMER_TIER = "character"),
                         na.strings = "")

      temptable <- temptable[, lapply(.SD, sum), by = quarter(INVOICE_DT),
                             .SDcols = c("INV_ITEM_ID", "Ext Sale", "Ext Total Cost",
                                         "CE100", "CE110", "CE120", "QTY_SOLD", "PACKSLIP_WHSL")]

      # Combine into full list
      mastertable <- rbindlist(list(mastertable, temptable), use.names = TRUE)

      # Release unneeded memory
      rm(temptable)
    }
  }
}

# Save Final table
print("Saving master table")
fwrite(mastertable, paste0(out_directory, out_filename))
rm(mastertable)

print(Sys.time() - starttime)
After running this script, the error message below is what I receive:
Error in gsum(INV_ITEM_ID) : Type 'character' not supported by GForce sum (gsum). Either add the prefix base::sum(.) or turn off GForce optimization using options(datatable.optimize=1)
Here is the general approach with some generic data.
library(tidyverse)
library(lubridate)
data.frame(date = seq(as.Date('2010-01-12'), as.Date('2018-02-03'), by = 100),
           var = runif(30)) %>%
  group_by(quarter(date, with_year = T)) %>%
  summarize(average_var = mean(var))
You can leave out the "with_year = T" if you don't care about the differences between years.
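As for the GForce error itself, it is raised because the character column INV_ITEM_ID is included in .SDcols and cannot be summed. A sketch of the aggregation restricted to the numeric measure columns (column names taken from the question) would be:
library(data.table)
# sum only the numeric columns, grouped by quarter of the invoice date
num_cols <- c("Ext Sale", "Ext Total Cost", "CE100", "CE110",
              "CE120", "QTY_SOLD", "PACKSLIP_WHSL")
temptable <- temptable[, lapply(.SD, sum, na.rm = TRUE),
                       by = .(qtr = quarter(INVOICE_DT)),
                       .SDcols = num_cols]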
R - cut a specific column from multiple files and bind them altogether
I have multiple files (30, tab delimited) that look like the one below:
|target_id | length| eff_length| est_counts| tpm|
|:------------|------:|----------:|----------:|--------:|
|LmjF.27.1250 | 966| 823.427| 2932| 94.7314|
|LmjF.09.0430 | 1410| 1267.430| 3603| 75.6304|
|LmjF.13.0210 | 2001| 1858.430| 4435| 63.4897|
|LmjF.28.0530 | 4083| 3940.430| 7032| 47.4778|
|LmjF.16.1400 | 591| 448.577| 1163| 68.9761|
|LmjF.29.2570 | 1506| 1363.430| 11135| 217.2770|
I am trying to cut the fifth column from all 30 of these files with a command such as:
fifth_column_file1 = file1.csv[, 5]
But I want to make the process more automated. The files I want to work with all share the pattern "bs_abundance", so I think a good starting point would be to load them with a command such as:
temp = list.files(pattern="*bs_abundance")
Or perhaps I can load all the tables I want to work with directly into the workspace:
for(i in temp) {
  x <- read.table(i, header=TRUE, comment.char = "A", sep="\t")
  assign(i, x)
}
Then, as explained, I want to cut the fifth column of each of the files to later bind them all to another table with the same number of rows.
Put the files into a folder. For this example let's call it temp. Set your working directory appropriately, or specify the full path for the example below.
cols <- NULL
files <- dir("temp", full.names = TRUE)
for (i in seq_along(files)) {
  # You didn't mention a file type, but let's say it's csv
  tmp <- read.csv(files[i], header = TRUE)
  tmp <- tmp[, 5]
  cols <- cbind(cols, tmp)
}
Then you can just cbind the columns in cols with your final data object.
Here is a method using lapply that assumes each file in the folder has the same number of rows.
# get file names
files <- dir("temp", full.names = TRUE)
# remove one file (match the name as returned by dir())
files <- files[-which(files == "removeFileName")]
# get list of vectors from the remaining 29 files
myList <- lapply(files, function(i) {temp <- read.csv(i); temp[, 5]})
# get new data.frame
dfDone <- do.call(data.frame, myList)
Web Scraping Yahoo Finance in R (with R Vest)
I am trying to use rvest to webscrape the NASDAQ closing dates for the last 3 months so I can play around with the data. The problem is I can't seem to find the correct xpath for it to return the table. I've tried quite a few using Chrome's 'inspect element' to find xpaths, as well as the 'SelectorGadget' plug-in for Chrome. It seems most people have done this with Python, but I am much more comfortable in R and specifically using rvest for web scraping, so I'm hoping I'm not alone!
I've posted my code below. I believe the problem is in identifying the xpath. Here is an example of one of the webpages... http://finance.yahoo.com/q/hp?s=CSV
After I get one to work I hope to put it in a loop, which is below my problem code.... Thank you!
library("rvest")
library("data.table")
library("xlsx")

#Problem Code
company <- 'CSV'
url <- paste("http://finance.yahoo.com/q/hp?s=", toString(company), sep = "")
url <- html(url)
select_table <- '//table' #this is the line I think is incorrect
fnames <- html_nodes(url, xpath = select_table) %>% html_table(fill = TRUE)
STOCK <- fnames[[1]]
STOCKS <- rbind(STOCK, STOCKS)

#---------------------------------------------------------------------
#Loop for use later
companylist <- read.csv('companylist.csv') #this is a list of all company tickers in the NASDAQ
STOCK <- data.frame()
STOCKS <- data.frame(Date = character(), Open = character(), High = character(), Low = character(),
                     Close = character(), Volume = character(), AdjClose = character())
for (i in 1:3095) {
  company <- companylist[i, 1]
  url <- paste("http://finance.yahoo.com/q/hp?s=", toString(company), sep = "")
  url <- html(url)
  select_table <- '//*[@id="yfncsumtab"]/tbody/tr[2]/td[1]/table[4]'
  fnames <- html_nodes(url, xpath = select_table) %>% html_table(fill = TRUE)
  STOCK <- fnames[[1]]
  STOCKS <- rbind(STOCK, STOCKS)
}
View(STOCKS)
Do you want to grab stock prices?
https://gist.github.com/jaehyeon-kim/356cf62b61248193db25#file-downloadstockdata
# assumes codes are known beforehand
codes <- c("ABT", "ABBV", "ACE", "ACN", "ACT", "ADBE", "ADT", "AES", "AET",
           "AFL", "AMG", "A", "GAS", "APD", "ARG", "AKAM", "AA")
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
               codes, "&output=csv")
paths <- paste0(codes, ".csv")
missing <- !(paths %in% dir("."))
missing
# simple error handling in case file doesn't exist
downloadFile <- function(url, path, ...) {
  # remove file if exists already
  if(file.exists(path)) file.remove(path)
  # download file
  tryCatch(
    download.file(url, path, ...), error = function(c) {
      # remove file if error
      if(file.exists(path)) file.remove(path)
      # create error message
      c$message <- paste(substr(path, 1, 4), "failed")
      message(c$message)
    }
  )
}
# wrapper of mapply
Map(downloadFile, urls[missing], paths[missing])
You can also try the approach shown in Method #1 above, which downloads the files into a data folder with `Map` and then reads them back with `llply` and combines them with `rbind_all`.