quantmod getFinancials() not pulling financials - r

I'm looking to download fundamental data for public companies. Using the quantmod package, I tried getFinancials() to pull the data. It works for some companies, but the results vary (I have read and understand the disclaimer about free data), so I want to confirm that I am calling it correctly.
For JPM:
On the Yahoo finance website, I do see financials populated, but the below call seems to pull "google" as the src instead of "yahoo", for which there are sparse financials populated.
Google - https://www.google.com/finance?q=NYSE%3AJPM&fstype=ii&ei=9kh-WejLE5e_etbzmpgP
Yahoo - https://finance.yahoo.com/quote/JPM/financials?p=JPM
JPM <- getFinancials("JPM", src = "yahoo", auto.assign = FALSE)
viewFin(JPM, type = "IS", period = "A")
Is there a correct way to specify the src? Also, is there a way to use getFinancials() so that, if there is an NA in an indicative column (Revenues, for example), it switches the source (google vs. yahoo)?

The top of the help page for getFinancials says (emphasis added),
Download Income Statement, Balance Sheet, and Cash Flow Statements from Google Finance.
There is currently no way to specify Yahoo Finance as a source. Doing so would require someone to write a method to scrape and parse the HTML from Yahoo Finance, since there's no way to download it in a file like there is for price data.
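The second part of the question (switching sources when an indicative column is all NA) can be sketched in plain R. This is only an illustration of the fallback pattern: `fetch_primary`, `fetch_secondary`, and `get_with_fallback` are hypothetical names, not quantmod functions, and the numbers are placeholders rather than real financials.

```r
# Hypothetical fetchers standing in for real download code. Each returns a
# data frame with a Revenue column (NA where the source has no figure).
fetch_primary <- function(symbol) {
  data.frame(period = c("P1", "P2"), Revenue = c(NA_real_, NA_real_))
}
fetch_secondary <- function(symbol) {
  # placeholder numbers, not real figures
  data.frame(period = c("P1", "P2"), Revenue = c(100, 90))
}

# Try each source in order and accept the first one whose indicator column
# contains at least one non-missing value.
get_with_fallback <- function(symbol, fetchers, indicator = "Revenue") {
  for (name in names(fetchers)) {
    fin <- tryCatch(fetchers[[name]](symbol), error = function(e) NULL)
    usable <- !is.null(fin) &&
      indicator %in% names(fin) &&
      !all(is.na(fin[[indicator]]))
    if (usable) {
      attr(fin, "src") <- name  # remember which source was used
      return(fin)
    }
  }
  stop("no source returned usable data for ", symbol)
}

jpm <- get_with_fallback(
  "JPM",
  list(primary = fetch_primary, secondary = fetch_secondary)
)
attr(jpm, "src")
```

The same wrapper would work with any pair of real download functions, as long as each returns a data frame containing the indicator column.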

I think Yahoo changed its API very recently. Download the file from the link titled "Get Excel Spreadsheet to Download Bulk Historical Stock Data from Google Finance"
That is for Excel, which you can easily load into R.
You could try something like this, as well.
# assumes codes are known beforehand
codes <- c("MSFT", "SBUX", "S", "AAPL", "ADT")
urls <- paste0("https://www.google.com/finance/historical?q=", codes, "&output=csv")
paths <- paste0(codes, ".csv")
# only download files that are not already in the working directory
missing <- !file.exists(paths)
# simple error handling in case a download fails
downloadFile <- function(url, path, ...) {
  # remove file if exists already
  if (file.exists(path)) file.remove(path)
  # download file
  tryCatch(
    download.file(url, path, ...),
    error = function(c) {
      # remove file if error
      if (file.exists(path)) file.remove(path)
      # create error message
      c$message <- paste(substr(path, 1, 4), "failed")
      message(c$message)
    }
  )
}
# wrapper of mapply
Map(downloadFile, urls[missing], paths[missing])
Or, this.
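## downloads historic prices for all constituents of SP500
library(tseries)  # provides get.hist.quote()
library(zoo)      # provides merge() for zoo objects and write.zoo()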
## read in list of constituents, with company name in first column and
## ticker symbol in second column
spComp <- read.csv("C:/Users/Excel/Desktop/stocks.csv" )
## specify time period
dateStart <- "2013-01-01"
dateEnd <- "2015-05-08"
## extract symbols and number of iterations
symbols <- spComp[, 1]
nAss <- length(symbols)
## download data on first stock as zoo object
z <- get.hist.quote(instrument = symbols[1], start = dateStart,
end = dateEnd, quote = "AdjClose",
retclass = "zoo", quiet = T)
## use ticker symbol as column name
dimnames(z)[[2]] <- as.character(symbols[1])
## download remaining assets in for loop
for (i in 2:nAss) {
  ## display progress by showing the current iteration step
  cat("Downloading ", i, " out of ", nAss , "\n")
  result <- try(x <- get.hist.quote(instrument = symbols[i],
                                    start = dateStart,
                                    end = dateEnd, quote = "AdjClose",
                                    retclass = "zoo", quiet = TRUE))
  if (inherits(result, "try-error")) {
    ## skip symbols that could not be downloaded
    next
  } else {
    dimnames(x)[[2]] <- as.character(symbols[i])
    ## merge with already downloaded data to get assets on same dates
    z <- merge(z, x)
  }
}
## save data
write.zoo(z, file = "C:/Users/Excel/Desktop/all_sp500_price_data.csv", index.name = "time")
Download file from url R

I am having problems downloading data from the link below directly with the code into R:
I tried with this code:
data <- read.csv("https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=test.csv", skip = 1)
I tried most of the options listed here:
Access a URL and read Data with R
However, I only get html table and not tables with the relevant house-price data from the web-site. Not sure what I am doing wrong.
Here's a simple example post on Kaggle showing how to achieve your goal; the code below is taken from that example.
Create a verified account
Log in
Go to your account (click the top right -> account)
Click "Create new API token"
Place the file somewhere sensible that you can access from R
kgl_credentials <- function(kgl_json_path="~/.kaggle/kaggle.json"){
  # returns user credentials from kaggle json
  user <- jsonlite::fromJSON(kgl_json_path, flatten = TRUE)
  return(user)
}
kgl_dataset <- function(ref, file_name, type="dataset", kgl_json_path="~/.kaggle/kaggle.json"){
  # ref: depends on 'type':
  # - dataset: "sudalairajkumar/novel-corona-virus-2019-dataset"
  # - competition: competition ID, e.g. 8587 for "competitive-data-science-predict-future-sales"
  # file_name: specific dataset wanted, e.g. "covid_19_data.csv"
  .kaggle_base_url <- "https://www.kaggle.com/api/v1"
  user <- kgl_credentials(kgl_json_path)
  if(type=="dataset"){
    # dataset
    url <- paste0(.kaggle_base_url, "/datasets/download/", ref, "/", file_name)
  }else if(type=="competition"){
    # competition
    url <- paste0(.kaggle_base_url, "/competitions/data/download/", ref, "/", file_name)
  }
  # call
  rcall <- httr::GET(url, httr::authenticate(user$username, user$key, type="basic"))
  # content type
  content_type <- rcall[[3]]$`content-type`
  if( grepl("zip", content_type)){
    # download and unzip
    temp <- tempfile()
    download.file(rcall$url, temp)
    data <- read.csv(unz(temp, file_name))
    unlink(temp)
  }else{
    # else read as text -- note: code this better
    data <- httr::content(rcall, type="text/csv", encoding = "ISO-8859-1")
  }
  return(data)
}
Then you can use the credentials to download the dataset as described in the post
kgl_dataset(file_name = 'test.csv',
type = 'competition',
ref = 'house-prices-advanced-regression-techniques',
kgl_json_path = 'kaggle.json')
Alternatively, you can use the unofficial R API
kgl_auth(creds_file = 'kaggle.json')
kgl_competitions_data_download('house-prices-advanced-regression-techniques', 'test.csv')
However this fails, due to a mistake in the implementation of kgl_api_get
function (path, ..., auth = kgl_auth())
r <- httr::GET(kgl_api_call(path, ...), auth)
if (r$status_code != 200) { # <== should be "=="
I downloaded the data (which you should just do too, it's quite easy), but just in case you don't want to, I uploaded the data to Pastebin and you can run the code below. This is for their "train" dataset, downloaded from the link you provided above
data <- read.delim("https://pastebin.com/raw/aGvwwdV0", header=T)
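The symptom in the question (getting an HTML table instead of the house-price data) can be caught before parsing. The sketch below is a minimal illustration in base R: `looks_like_html` and `read_csv_checked` are hypothetical helper names, and the two temporary files stand in for a real download.

```r
# Returns TRUE when the text looks like an HTML page rather than delimited data.
looks_like_html <- function(lines) {
  first <- lines[nzchar(trimws(lines))][1]  # first non-blank line, or NA
  !is.na(first) && grepl("^\\s*<(!doctype|html)", first, ignore.case = TRUE)
}

# Hypothetical wrapper: refuse to parse a response that is really a web page.
read_csv_checked <- function(path, ...) {
  head_lines <- readLines(path, n = 5, warn = FALSE)
  if (looks_like_html(head_lines)) {
    stop("got an HTML page, not a CSV file; the URL probably needs a login or an API token")
  }
  read.csv(path, ...)
}

# Stand-in files: one is a web page saved under a .csv name, one is real CSV.
html_file <- tempfile(fileext = ".csv")
writeLines(c("<!DOCTYPE html>", "<html><body>Sign in</body></html>"), html_file)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("Id,SalePrice", "1,100"), csv_file)

ok <- read_csv_checked(csv_file)
bad <- try(read_csv_checked(html_file), silent = TRUE)
```

Here `ok` is a one-row data frame, while `bad` holds the error raised for the HTML file.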

How to call a script in another script in R

I have created a series of commands in R that get a job done using a specific URL. I would like to iterate the series of commands over a list of URLS that reside in a separate text file. How do I call the list into the commands one at a time?
I do not know the proper terminology for this kind of programming task. I've looked into scripting and batch programming, but that is not what I want to do.
# URL that comes from list
URL <- "http://www.urlfromlist.com"
# Load URL
theurl <- getURL(URL,.opts = list(ssl.verifypeer = FALSE) )
# Read the tables
tables <- readHTMLTable(theurl)
# Create a list
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
# Convert the list to a data frame
df <- do.call(rbind.data.frame, tables)
# Save dataframe out as a csv file
write.csv(df2, file = dynamicname, row.names=FALSE)
The above code is what I am doing. The first variable needs to be a different URL each time from a list - rinse and repeat. Thanks!
UPDATED CODE - this still does not write out any files, but it runs.
# Function to pull tables from list of URLs
URLfunction<- function(x){
# URL that comes from list
URL <- x
# Load URL
theurl <- RCurl::getURL(URL,.opts = list(ssl.verifypeer = FALSE) )
# Read the tables
tables <- XML::readHTMLTable(theurl)
# Create a list
tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)
# Convert the list to a data frame
df <- do.call(rbind,tables)
# Split date and time column out
df2 <- separate(df, "Date / Time", c("Date", "Time"), sep = " ")
# Fill the missing column with text, in this case shapename
shapename <- qdapRegex::ex_between(URL, "ndxs", ".html")
df2$Shape <- shapename
# Save dataframe out as a csv file
write.csv(result, paste0(shapename, '.csv', row.names=FALSE))
}
URL <- read.csv("PATH", header = FALSE)
purrr::map_df(URL, URLfunction) ## Also tried purrr::map_df(URL[,1], URLfunction)
If I understand your question correctly, my answer should work for your problem.
Libraries used: RCurl, XML, rlist, purrr
Define the function:
URLfunction<- function(x){
# URL that comes from list
URL <- x
# Load URL
theurl <- RCurl::getURL(URL,.opts = list(ssl.verifypeer = FALSE) )
# Read the tables
tables <- XML::readHTMLTable(theurl)
# Create a list
tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)
# Convert the list to a data frame
df <- do.call(rbind,tables)
# Return the data frame; it is saved as a csv file after all URLs are processed
df
}
Assume you have data like the below
(I am not sure what your data looks like)
URL <- c("https://stackoverflow.com/questions/56139810/how-to-call-a-script-in-another-script-in-r",
result<- purrr::map(URL, URLfunction)
result <- do.call(rbind, result)
write.csv is the last step.
If you want to write a csv for each URL, move write.csv into URLfunction.
write.csv(result, file = dynamicname, row.names=FALSE)
List version
URL <- list("https://stackoverflow.com/questions/56139810/how-to-call-a-script-in-another-script-in-r",
result<- purrr::map_df(URL, URLfunction)
asked today yesterday
1 viewed 35 times <NA>
2 active today <NA>
3 viewed <NA> 34 times
4 active <NA> today
URL <- read.csv("PATH",header = FALSE)
result<- purrr::map_df(URL[,1], URLfunction)
asked today yesterday
1 viewed 35 times <NA>
2 active today <NA>
3 viewed <NA> 34 times
4 active <NA> today
Here is an edited version of your code.
URLfunction<- function(x){
# URL that comes from list
URL <- x
# Load URL
theurl <- RCurl::getURL(URL,.opts = list(ssl.verifypeer = FALSE) )
# Read the tables
tables <- XML::readHTMLTable(theurl)
# Create a list
tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)
# Convert the list to a data frame
df <- do.call(rbind,tables)
# Split date and time column out
df2 <- tidyr::separate(df, "Date / Time", c("Date", "Time"), sep = " ")
# Fill the missing column with text, in this case shapename
shapename <- unlist(qdapRegex::ex_between(URL, "ndxs", ".html"))
# qdapRegex::ex_between returns a list; when a list is added to df2, the data frame cannot be saved.
# So I added 'unlist'.
df2$Shape <- shapename
# Save dataframe out as a csv file
write.csv(df2, paste0(shapename, '.csv'), row.names=FALSE)
# There were two errors here.
# First, you named the data 'df2', not 'result', so I changed result --> df2.
# Second, row.names is an argument of write.csv, not of paste0.
}
After defining the above function,
URL = c("nuforc.org/webreports/ndxsRectangle.html",
RESULT = purrr::map_df(URL, URLfunction) ## Also tried purrr::map_df(URL[,1], URLfunction)
Finally, I get the results below:
1. Rectangle.csv and Round.csv files in the saved path (your desktop).
2. The returned row-bound data frame looks like the below (2011 x 8)
> RESULT[1,]
Date Time City State Shape Duration
1 5/2/19 00:20 Honolulu HI Rectangle 3 seconds
1 Several of rectangles connected in different LED like colors. Such as red, green, blue, etc. ;above Waikiki. ((anonymous report))
1 5/9/19
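The overall pattern in this thread (read a list of URLs from a text file, then run the same steps on each one) can be sketched in base R without purrr. `process_url` below is a hypothetical stand-in for the real download-and-parse function, and the URLs are placeholders written to a temporary file.

```r
# Hypothetical per-URL worker; the real one would download and parse the page.
process_url <- function(url) {
  data.frame(url = url, n_chars = nchar(url), stringsAsFactors = FALSE)
}

# A stand-in for the text file of URLs (one per line).
url_file <- tempfile(fileext = ".txt")
writeLines(c("http://example.com/ndxsRectangle.html",
             "",
             "http://example.com/ndxsRound.html"), url_file)

urls <- readLines(url_file)
urls <- urls[nzchar(urls)]  # drop blank lines

# Run the worker on each URL; a failure on one URL does not stop the rest.
results <- lapply(urls, function(u) {
  tryCatch(process_url(u), error = function(e) {
    message("skipping ", u, ": ", conditionMessage(e))
    NULL
  })
})

# Drop failed URLs and stack the remaining data frames.
combined <- do.call(rbind, Filter(Negate(is.null), results))
```

Wrapping each call in `tryCatch` is a design choice: one unreachable page then costs a single missing result instead of aborting the whole run.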

How to summarize by Quarter in R

I am having some difficulty summarizing data from my database in R. I am looking to pull the data and have it summarized by quarter.
Below is the code I am using to get a txt output, but I am getting errors.
What do I need to change in the code so that the data is summarized by quarter?
library(data.table, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
# Set path of major source folder for raw transaction data
in_directory <- "C:/Users/name/Documents/Raw Data/"
# List names of sub-folders (currently grouped by first two characters of
in_subfolders <- list("AA-CA", "CB-HZ", "IA-IL", "IM-KZ", "LA-MI", "MJ-MS",
"MT-NV", "NW-OH", "OI-PZ", "QA-TN", "TO-UZ",
"VA-WA", "WB-ZZ")
# Set location for output
out_directory <- "C:/Users/name/Documents/YTD Master/"
out_filename <- "NEW.csv"
# Set beginning and end of date range to be collected - year-month-day format
date_range <- interval(as.Date("2018-01-01"), as.Date("2018-05-31"))
# Enable or disable filtering of raw files to only grab items bought within
# certain months to save space.
# If false, all files will be scanned for unique items, which will take
# longer and be a larger file.
date_filter <- TRUE
## CODE ##
starttime <- Sys.time()
mastertable <- NULL
for (j in 1:length(in_subfolders)) {
  subfolder <- in_subfolders[j]
  sub_directory <- paste0(in_directory, subfolder, "/")
  in_filenames <- dir(sub_directory, pattern =".txt")
  for (i in 1:length(in_filenames)) {
    # Default value provided for when fast filtering is disabled.
    read_this_file <- TRUE
    # To fast filter the data, we choose to include or exclude an entire file
    # based on the date of its first line.
    # WARNING: This is only a valid method if filtering by entire months,
    # since that is the amount of data housed in each file.
    if (date_filter) {
      temptable <- fread(paste0(sub_directory, in_filenames[i]),
                         colClasses=c(CUSTOMER_TIER = "character"),
                         na.strings = "", nrows = 1)
      temptable[, INVOICE_DT := as.Date(INVOICE_DT)]
      # If date matches, set read flag to TRUE. If date does not match, set
      # read flag to FALSE.
      read_this_file <- temptable[, INVOICE_DT] %within% date_range
    }
    if (read_this_file) {
      print(paste0("Reading in ", in_filenames[i]))
      temptable <- fread(paste0(sub_directory, in_filenames[i]), colClasses=c(CUSTOMER_TIER = "character"),
                         na.strings = "")
      temptable <- temptable[, lapply(.SD, sum), by = quarter(INVOICE_DT),
                             .SDcols = c("INV_ITEM_ID","Ext Sale", "Ext Total Cost", "CE100", "CE110","CE120","QTY_SOLD","PACKSLIP_WHSL")]
      # Combine into full list
      mastertable <- rbindlist(list(mastertable, temptable), use.names = TRUE)
      # Release unneeded memory
      rm(temptable)
    }
  }
}
# Save Final table
print("Saving master table")
fwrite(mastertable, paste0(out_directory, out_filename))
After running this script, the below is the error message I receive.
Error in gsum(INV_ITEM_ID) :
Type 'character' not supported by GForce sum (gsum). Either add the prefix base::sum(.) or turn off GForce optimization using options(datatable.optimize=1)
Here is the general approach with some generic data.
library(dplyr)
library(lubridate)
data.frame(date = seq(as.Date('2010-01-12'), as.Date('2018-02-03'), by = 100),
           var = runif(30)) %>%
  group_by(quarter(date, with_year = T)) %>%
  summarize(average_var = mean(var))
You can leave out the "with_year = T" if you don't care about the differences between years.
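The error message in the question points at INV_ITEM_ID, a character column that cannot be summed. A minimal data.table sketch of one way around it, assuming the goal is to total only the numeric columns per quarter: the table below is a small made-up stand-in for one raw file, and `num_cols` and `qtr` are names chosen for this illustration.

```r
library(data.table)

# Made-up stand-in for one raw transaction file.
dt <- data.table(
  INVOICE_DT  = as.Date(c("2018-01-15", "2018-02-20", "2018-04-03", "2018-05-30")),
  INV_ITEM_ID = c("A1", "B2", "A1", "C3"),  # character: cannot be summed
  QTY_SOLD    = c(10, 5, 7, 3),
  CE100       = c(1, 2, 3, 4)
)

# Keep only numeric columns for the sum; dates and IDs are excluded.
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]

# One row per quarter, with each numeric column totalled.
by_quarter <- dt[, lapply(.SD, sum),
                 by = .(qtr = quarter(INVOICE_DT)),
                 .SDcols = num_cols]
by_quarter
```

If totals per item are wanted instead, INV_ITEM_ID would go into `by` alongside the quarter rather than into `.SDcols`.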

R - cut a specific column from multiple files and bind them altogether

I have multiple files (30, tab delimited) that look like the one below:
|target_id | length| eff_length| est_counts| tpm|
|LmjF.27.1250 | 966| 823.427| 2932| 94.7314|
|LmjF.09.0430 | 1410| 1267.430| 3603| 75.6304|
|LmjF.13.0210 | 2001| 1858.430| 4435| 63.4897|
|LmjF.28.0530 | 4083| 3940.430| 7032| 47.4778|
|LmjF.16.1400 | 591| 448.577| 1163| 68.9761|
|LmjF.29.2570 | 1506| 1363.430| 11135| 217.2770|
I am trying to cut the fifth column from all 30 of these files with a command such as:
fifth_colum_file1 = file1.csv[ , 5]
But I want to make the process more automatised.
The files that I want to work with all have the pattern "bs_abundance" in their names, so I think a good starting point would be to list all of them with a command such as:
temp = list.files(pattern="*bs_abundance")
Or perhaps I can load all the tables I want to work with directly into the workspace:
for(i in temp) {
x <- read.table(i, header=TRUE, comment.char = "A", sep="\t")
}
Then, as explained, I want to cut the fifth column of each of the files to later bind them all to another table of same number of rows.
Put the files into a folder. For this example let's call it temp. Set your working directory appropriately or specify the full path for the example below.
cols <- NULL
files <- dir("temp")
for (i in files) {
  # You didn't mention a file type, but let's say it's csv
  tmp <- read.csv(file.path("temp", i), header = TRUE)
  tmp <- tmp[, 5]
  cols <- cbind(cols, tmp)
}
Then you can just cbind the columns in cols with your final data object.
Here is a method using lapply that assumes each file in the folder has the same number of rows.
# get file names
files <- dir("temp", full.names = TRUE)
# remove one file
files <- files[basename(files) != "removeFileName"]
# get list of vectors from 29 files
myList <- lapply(files, function(i) {temp <- read.csv(i); temp[, 5]})
# get new data.frame
dfDone <- do.call(data.frame, myList)
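Since the question describes tab-delimited files matched by the pattern "bs_abundance", here is a sketch of the same idea with `read.delim` and named columns. The two small files are made-up stand-ins written to a temporary folder, and the `.tsv` extension and the `s1`/`s2` sample names are assumptions for the illustration.

```r
# Build two small tab-delimited stand-in files in a temporary folder.
dir_path <- file.path(tempdir(), "abundance_demo")
dir.create(dir_path, showWarnings = FALSE)
for (s in c("s1", "s2")) {
  demo <- data.frame(
    target_id  = c("g1", "g2"),
    length     = c(10, 20),
    eff_length = c(8, 18),
    est_counts = c(5, 7),
    tpm        = if (s == "s1") c(1.5, 2.5) else c(3.5, 4.5)
  )
  write.table(demo, file.path(dir_path, paste0(s, "_bs_abundance.tsv")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# List the matching files, read each one, and keep only the tpm column.
files <- list.files(dir_path, pattern = "bs_abundance", full.names = TRUE)
tpm_list <- lapply(files, function(f) read.delim(f, header = TRUE)[["tpm"]])

# Name each column after its file so the source stays traceable.
names(tpm_list) <- sub("_bs_abundance.*$", "", basename(files))
tpm <- do.call(cbind, tpm_list)
```

Selecting the column by name (`"tpm"`) rather than by position guards against files whose column order differs; `tpm` can then be bound to another table with the same number of rows via `cbind`.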

Web Scraping Yahoo Finance in R (with R Vest)

I am trying to use rvest to scrape the NASDAQ closing dates for the last 3 months so I can play around with the data.
The problem is that I can't seem to find the correct xpath for it to return the table. I've tried quite a few, using Chrome's 'inspect element' to find xpaths as well as the 'SelectorGadget' plug-in for Chrome.
It seems most people have done this with Python, but I am much more comfortable in R, and specifically with rvest for web scraping, so I'm hoping I'm not alone!
I've posted my code below. I believe the problem is in identifying the xpath.
Here is an example of one of the webpages...http://finance.yahoo.com/q/hp?s=CSV
After I get one to work I hope to put it in a loop which is below my problem code....
Thank you!
#Problem Code
company <- 'CSV'
url <- paste("http://finance.yahoo.com/q/hp?s=",toString(company),sep="")
url <-html(url)
select_table <- '//table' #this is the line I think is incorrect
fnames <- html_nodes(url, xpath=select_table) %>% html_table(fill=TRUE)
STOCK <- fnames[[1]]
#Loop for use later
companylist <- read.csv('companylist.csv') #this is a list of all company tickers in the NASDAQ
STOCK <- data.frame()
STOCKS <- data.frame(Date=character(),Open=character(),High=character(),Low=character(),Close=character(),Volume=character(), AdjClose=character())
for (i in 1:3095) {
company <- companylist[i,1]
url <- paste("http://finance.yahoo.com/q/hp?s=",toString(company),sep="")
url <-html(url)
select_table <- '//*[@id="yfncsumtab"]/tbody/tr[2]/td[1]/table[4]'
fnames <- html_nodes(url,xpath = select_table) %>% html_table(fill=TRUE)
STOCK <- fnames[[1]]
}
Do you want to grab stock prices?
# assumes codes are known beforehand
codes <- c("ABT", "ABBV", "ACE", "ACN", "ACT", "ADBE", "ADT", "AES", "AET", "AFL", "AMG", "A", "GAS", "APD", "ARG", "AKAM", "AA")
urls <- paste0("http://www.google.com/finance/historical?q=NASDAQ:",
paths <- paste0(codes, ".csv")
# only download files that are not already in the working directory
missing <- !file.exists(paths)
# simple error handling in case a download fails
downloadFile <- function(url, path, ...) {
  # remove file if exists already
  if (file.exists(path)) file.remove(path)
  # download file
  tryCatch(
    download.file(url, path, ...),
    error = function(c) {
      # remove file if error
      if (file.exists(path)) file.remove(path)
      # create error message
      c$message <- paste(substr(path, 1, 4), "failed")
      message(c$message)
    }
  )
}
# wrapper of mapply
Map(downloadFile, urls[missing], paths[missing])
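On the XPath part of the question, a small offline sketch may help show the difference between `//table` and an attribute-based selector. It parses an inline HTML string rather than a live Yahoo page, so the markup, the `prices` id, and the values are all made up for illustration; `read_html` is used here in place of the older `html()` call.

```r
library(rvest)

# Inline HTML standing in for a downloaded page (values are placeholders).
page <- read_html('
  <html><body>
    <table id="nav"><tr><td>menu</td></tr></table>
    <table id="prices">
      <tr><th>Date</th><th>Close</th></tr>
      <tr><td>2015-05-08</td><td>10.5</td></tr>
      <tr><td>2015-05-07</td><td>10.0</td></tr>
    </table>
  </body></html>')

# "//table" matches every table on the page, so [[1]] may be a layout table.
all_tables <- html_nodes(page, xpath = "//table")

# An attribute test (note @id, not #id) narrows the match to one table.
prices_node <- html_nodes(page, xpath = '//table[@id="prices"]')
prices <- html_table(prices_node)[[1]]
```

On a real page, checking `length(all_tables)` and inspecting each element is a quick way to find which index or id holds the price data.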
