I'm trying to use geocode to get the latitudes and longitudes of city names. I had been able to use geocode successfully on full addresses that combined street addresses and city names, but wanted to try to recover a couple more values from the observations that geocode had missed the first time by passing in only the city name rather than the full address.
I currently have the code
addresses_pt_2 <- tibble("address_2" = lat_longs_na$city_country)
addresses_pt_2$address_2 <- as.character(addresses_pt_2$address_2)
where lat_longs_na is a data frame containing just the city for each address that geocode had returned as NA the first time. I then converted the address column to character just in case. However, when I run
lat_longs_pt_2 <- addresses_pt_2 %>%
  geocode(address_2, method = 'osm')
I get the following error: "Error in geocode(., address_2, method = "osm") : is.character(location) is not TRUE"
I've seen that there are posts about this but haven't been able to find anything that fixes it. I tried making it a data frame instead of a tibble with stringsAsFactors = FALSE, and I've updated the package, but nothing works, and I don't understand why, since geocode worked the first time using the same formatting.
Glyphosate has hogged the limelight recently due to its listing as a possible carcinogen by the World Health Organization's IARC. I got curious about pesticide usage patterns in the US and about plotting this data with interactive maps in R Shiny, using leaflet, for example.
County-level estimates of pesticide usage can be found here: https://water.usgs.gov/nawqa/pnsp/usage/maps/county-level/
The data are reported using State/County FIPS codes. I require lat-long coordinates to show the data.
It seems possible to go from lat-long to FIPS quite easily, as illustrated by this API here: https://geo.fcc.gov/api/census/
How do I go in the reverse direction?
The solution I found uses a REST API from here.com, one of the five options listed below. I first cross-referenced the FIPS codes from the USGS table with county and state names using the fips_codes table from library(tigris). This gave me names to put together into address lines, like Boulder County, CO. Next, I wrote a small function here_now with sample usage as follows:
here_now("Boulder+County,+CO") # $lat: 40.08791; $lon: -105.3447
The implementation is a call to the REST API using fromJSON from library(jsonlite):
library(jsonlite)

here_now <- function(searchtext) {
  # HERE credentials stored in options
  AppCode <- getOption("hereAppCode")
  AppID <- getOption("hereAppID")
  rootURL <- "https://geocoder.api.here.com/6.2/geocode.json?"
  # assemble the query string
  app_id <- paste("app_id", AppID, sep = "=")
  app_code <- paste("app_code", AppCode, sep = "=")
  searchtext <- paste("searchtext", searchtext, sep = "=")
  request <- paste(paste(rootURL, app_id, sep = ""), app_code, searchtext, sep = "&")
  # call the REST API and extract the coordinates
  response <- fromJSON(request)
  res <- list()
  res$lat <- response$Response$View$Result[[1]]$Location$NavigationPosition[[1]]$Latitude
  res$lon <- response$Response$View$Result[[1]]$Location$NavigationPosition[[1]]$Longitude
  res
}
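For the FIPS-to-name crosswalk step mentioned above, a rough sketch (not the original code; the USGS column names are hypothetical) using fips_codes from tigris might look like this:
library(dplyr)
library(tigris)

# fips_codes ships with tigris: columns state, state_code, state_name,
# county_code, county (e.g. "Boulder County")
data(fips_codes)

# `usgs` stands in for the USGS table; STATE_FIPS_CODE and COUNTY_FIPS_CODE
# are hypothetical column names
usgs_named <- usgs %>%
  left_join(fips_codes,
            by = c("STATE_FIPS_CODE" = "state_code",
                   "COUNTY_FIPS_CODE" = "county_code")) %>%
  mutate(searchtext = gsub(" ", "+", paste0(county, ",+", state)))

# usgs_named$searchtext now holds strings like "Boulder+County,+CO",
# ready to pass to here_now()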
Further, I used the FCC's reverse geo-coding API to validate: https://geo.fcc.gov/api/census/
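A minimal sketch of that validation step (not the original code; it assumes the block/find endpoint documented at the URL above):
library(jsonlite)

fcc_fips <- function(lat, lon) {
  url <- paste0("https://geo.fcc.gov/api/census/block/find?latitude=", lat,
                "&longitude=", lon, "&format=json")
  # the response carries Block/County/State FIPS codes
  fromJSON(url)$County$FIPS
}
# e.g. fcc_fips(40.08791, -105.3447) should give Boulder County's FIPS code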
Options I experimented with for geocoding included:
- Google APIs via ggmap (requires API key, requires credit card)
- MapQuest API (requires API key, no credit card needed)
- Data Science Toolkit's RDSTK implementation
- GeoNames service via the eponymous R package
- HERE APIs (requires AppID and AppCode, freemium model)
I would like to fetch the real names of all the contributors to a list of GitHub repos. I can do it, but it takes a series of looping calls to the GitHub API. If I want to do, say, the top 100 R packages, my approach would take 2100 calls to the API (assuming 20 contributors per repo). If I underestimate the number of contributors I might hit the rate limits for authorized users. I would like to know if there is an approach that requires fewer calls to the API. Below is what I think is the "hard" way to do it. This example just gets one contributor from one repo, for illustration, with no iterating through lists.
library(dplyr)
library(jsonlite)
#this would actually be a list of multiple repos
repo_name <- "dplyr"
# we don't know the github username associated with the package so construct a search
# to get the most likely candidate
search_url <- paste0("https://api.github.com/search/repositories?q=",
repo_name,
".%20is:name+language:r&sort=stars&order=desc")
# first api call. would need to would need to loop/map/apply over multiple contributors repo names
all_repos<-jsonlite::read_json(search_url,simplifyVector = TRUE)
# assume first repo is target. Hope so.
target_repo<-all_repos$items$full_name[1]
search_url <- paste0("https://api.github.com/repos/",
target_repo,
"/contributors")
#second api call
contributors <- jsonlite::read_json(search_url,simplifyVector = TRUE)
#third api call. would need to loop/map/apply over multiple contributors
real_name <- jsonlite::read_json(contributors$url[1])$name
real_name
#> [1] "Romain François"
Thank you.
Problem
I am given a long list of specific variable codes for the DP05 table - in the census bureau format. For instance:
target_dp05_vars = c(perc_white = "HC03_VC53",
perc_black = "HC03_VC55",
perc_native = "HC03_VC56")
Since tidycensus uses its own variable naming convention, I can't use the above easily. How do I easily crosswalk to the tidycensus definition?
Temporary solution
In the meantime, I've downloaded the bureau file manually and eliminated rows with HC02 and HC04 prefixes to match with tidycensus, creating an internal crosswalk (which is at least positionally correct), but it's tedious.
I'd love to just feed those HCs as a named vector into get_acs() and perhaps just specify the table as DP05.
tidycensus doesn't use its own variable naming convention - it uses variable IDs as specified by the Census API. For example, see https://api.census.gov/data/2017/acs/acs5/profile/variables.html, which is accessible in R with:
library(tidycensus)
dp17 <- load_variables(2017, "acs5/profile", cache = TRUE)
The IDs you've provided appear to be FactFinder codes.
If you want the full DP05 table in one tidycensus call, you can do the following (e.g. for counties in New York) with tidycensus 0.9:
dp05 <- get_acs(geography = "county",
table = "DP05",
state = "NY")
Mappings of variable IDs to their meanings are in turn available with load_variables().
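For example, a quick way to attach those labels to the results above (a sketch, assuming the dp05 and dp17 objects created earlier):
library(dplyr)

dp05_labelled <- dp05 %>%
  left_join(dp17, by = c("variable" = "name"))
# each estimate now carries its Census label and concept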
Note: I am getting intermittent server errors with these calls from the API, which may be due to the government shutdown. If it doesn't work at first, try again.
I am interested in analyzing balance sheets and income statements using R. I have seen that there are R packages that pull information from Yahoo and Google Finance, but all the examples I have seen concern historical stock price information. Is there a way I can pull historical information from balance sheets and income statements using R?
I have found only a partial solution to your issue on the net, as I managed to retrieve only the balance sheet info and the financial statement for one year. I don't know how to do it for more years.
There is a package in R called quantmod, which you can install from CRAN:
install.packages('quantmod')
Then you can do the following. Suppose you want to get the financial info for a company listed on the NYSE: General Electric, ticker GE.
library(quantmod)
getFinancials('GE')
viewFinancials(GE.f)
To get only the income statement, reported annually, as a data frame, use this:
viewFinancials(GE.f, "IS", "A")
Please let me know if you find out how to do this for multiple years.
The question you want to ask (and get an answer to!) is: where can I get free XBRL data for analysing corporate balance sheets, and is there a library for consuming such data in R?
XBRL (Extensible Business Reporting Language - http://en.wikipedia.org/wiki/XBRL) is a standard for marking up accounting statements (income statements, balance sheets, profit & loss statements) in XML format such that they can easily be parsed by computer and put into a spreadsheet.
As far as I know, a lot of corporate regulators (e.g. the SEC in the US, ASIC in Australia) are encouraging the companies under their jurisdiction to report using such a format, or are running pilots, but I don't believe it has been mandated at this point. If you limited your investment universe (I am assuming you want this data in electronic format for investment purposes) to firms that have made their quarterly reports freely available in XBRL form, I expect you would have a pretty short list of firms to invest in!
Bloomberg, Reuters et al all have pricey feeds for obtaining corporate fundamental data. There may also be someone out there running a tidy business publishing balance sheets in XBRL format. Cheaper, but still paid for, are XIgnite's xFundamentals and xGlobalFundamentals web services, but you aren't getting full balance sheet data from them.
To read in the financial information, try this function (I picked it up several months ago and made some small adjustments):
require(XML)
require(plyr)

getKeyStats_xpath <- function(symbol) {
  yahoo.URL <- "http://finance.yahoo.com/q/ks?s="
  html_text <- htmlParse(paste(yahoo.URL, symbol, sep = ""), encoding = "UTF-8")
  # search for <td> nodes anywhere that have class 'yfnc_tablehead1'
  nodes <- getNodeSet(html_text, "/*//td[@class='yfnc_tablehead1']")
  if (length(nodes) > 0) {
    measures <- sapply(nodes, xmlValue)
    # clean up the column names
    measures <- gsub(" *[0-9]*:", "", gsub(" \\(.*?\\)[0-9]*:", "", measures))
    # disambiguate duplicated measure names
    dups <- which(duplicated(measures))
    for (i in seq_along(dups))
      measures[dups[i]] <- paste(measures[dups[i]], i, sep = " ")
    # the sibling <td> of each header node holds the value
    values <- sapply(nodes, function(x) xmlValue(getSibling(x)))
    df <- data.frame(t(values))
    colnames(df) <- measures
    return(df)
  } else {
    # nothing found for this symbol
    return(NULL)
  }
}
To use it, e.g. to compare 3 companies and write the data into a CSV file, do the following:
tickers <- c("AAPL","GOOG","F")
stats <- ldply(tickers, getKeyStats_xpath)
rownames(stats) <- tickers
write.csv(t(stats), "FinancialStats_updated.csv", row.names = TRUE)
Just tried it. Still working.
UPDATE as Yahoo changed its web site layout:
The function above does not work anymore as Yahoo again changed its web site layout. Fortunately it's still easy to get the financial info, as the tags for getting fundamental data have not been changed.
For example, to download a file with EPS and the P/E ratio for MSFT, AAPL and Ford, insert the following into your browser:
http://finance.yahoo.com/d/quotes.csv?s=MSFT+AAPL+F&f=ser
After entering the above URL into your browser's address bar and hitting return/enter, the CSV file will be downloaded automatically to your computer (data as of 7/22/2016).
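The same CSV could also be read directly into R (a sketch; note that this legacy Yahoo download endpoint may no longer be available):
url <- "http://finance.yahoo.com/d/quotes.csv?s=MSFT+AAPL+F&f=ser"
quotes <- read.csv(url, header = FALSE,
                   col.names = c("symbol", "eps", "pe_ratio"))
quotes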
Some Yahoo tags for fundamental data: s = symbol, e = earnings per share, r = P/E ratio (the ones combined as f=ser above).
You are making the common mistake of confusing 'access to Yahoo or Google data' with 'everything I see on Yahoo or Google Finance can be downloaded'.
When R functions download historical stock price data, they almost always access an interface explicitly designed for this purpose, e.g. a CGI handler providing CSV files given a stock symbol and start and end dates. So this is easy: all we need to do is form the appropriate query, hit the webserver, fetch the CSV file and parse it.
Now balance sheet information is (as far as I know) not available in such an interface. So you will need to 'screen scrape' and parse the html directly.
It is not clear that R is the best tool for this. I am aware of some Perl modules for the purpose of getting non-time-series data off Yahoo Finance but have not used them.
Taking the last two comments into consideration, you may be able to acquire corporate financial statements economically using EDGAR Online. It isn't free, but it is less expensive than Bloomberg and Reuters. Another thing to consider is financial reporting normalization/standardization. Just because two companies are in the same industry and sell similar products does not necessarily mean that, if you laid the two companies' income statements or balance sheets side by side, the reporting items would line up. Compustat has normalized/standardized financial reports.
I don't know anything about R, but assuming that it can call a REST API and consume data in XML form, you can try the Mergent Company Fundamentals API at http://www.mergent.com/servius/ - there's lots of very detailed financial statement data (balance sheets / income statements / cash flow statements / ratios), standardized across companies, going back more than 20 years.
I have written a C# program that I think does what you want. It parses the HTML from nasdaq.com pages and creates one CSV file per stock that includes income statement, cash flow, and balance sheet values going back 5-10 years, depending on the age of the stock. I am now working to add some analysis calculations (mostly historic ratios at this point). I'm interested in learning about R and its applications to fundamental analysis. Maybe we can help each other.
I recently found this R package on CRAN, which I believe does exactly what you are asking for:
XBRL: Extraction of business financial information from XBRL documents
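A minimal sketch of how it might be used (the instance-document URL is a hypothetical placeholder; the workhorse function is xbrlDoAll()):
library(XBRL)

# point this at an XBRL instance document from an SEC filing (hypothetical path)
inst <- "https://www.sec.gov/Archives/edgar/data/.../example-instance.xml"

xbrl_data <- xbrlDoAll(inst, cache.dir = "xbrl.Cache")
str(xbrl_data, max.level = 1)  # a list of data frames (facts, contexts, units, ...)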
You can get all three types of financial statements from Intrinio in R for free. Additionally, you can get as-reported statements and standardized statements. The problem with pulling XBRL filings from the SEC is that there is no standardized option, which means you have to manually map financial statement items if you want to do cross-equity comparisons. Here is an example:
#Install httr, which you need to request data via API
install.packages("httr")
require("httr")
#Install jsonlite which parses JSON
install.packages("jsonlite")
require("jsonlite")
#Create variables for your username and password, get those at intrinio.com/login
username <- "Your_API_Username"
password <- "Your_API_Password"
#Making an API call for the income statement. The following puts together the different parts of the API call
base <- "https://api.intrinio.com/"
endpoint <- "financials/"
type <- "standardized"
stock <- "YUM"
statement <- "income_statement"
fiscal_period <- "Q2"
fiscal_year <- "2015"
#Pasting them together to make the API call
call1 <- paste(base,endpoint,type,"?","identifier","=", stock, "&","statement","=",statement,"&","fiscal_period",
"=", fiscal_period, "&", "fiscal_year", "=", fiscal_year, sep="")
# call1 Looks like this "https://api.intrinio.com/financials/standardized?identifier=YUM&statement=income_statement&fiscal_period=Q2&fiscal_year=2015"
#Now we use the API call to request the data from Intrinio's database
YUM_Income <- GET(call1, authenticate(username,password, type = "basic"))
#That gives us the income statement data, but it isn't in a good format so we parse it
test1 <- unlist(content(YUM_Income, "text"))
#Convert from JSON to flattened list
parsed_statement <- fromJSON(test1)
#Then make your data frame:
df1 <- data.frame(parsed_statement)
I wrote this script to make it easy to change out the ticker, dates, and statement type so you can get the financial statement for any US company for any period.
I actually do this in Google Sheets. I found it to be the easiest way to do it, and the fact that it can pull live data is another bonus. Lastly, it doesn't consume any of my space to save these statements.
=importhtml("http://investing.money.msn.com/investments/stock-income-statement/?symbol=US%3A"&B1&"&stmtView=Ann", "table",0)
where cell B1 contains the ticker.
You can do the same thing for the balance sheet and cash flow as well.
1- Subscribe to the Yahoo Finance API on RapidAPI here
2- Get your key
3- Insert your key in the code:
name="AAPL"
{raw=httr::GET(paste("https://yahoo-finance15.p.rapidapi.com//api/yahoo/qu/quote/",name,"/financial-data", sep = ""),
httr::add_headers("x-rapidapi-host"= "yahoo-finance15.p.rapidapi.com",
"x-rapidapi-key"="insert your Key here")
)
raw=jsonlite::fromJSON(rawToChar(raw$content))
values=sapply(1:length(raw$financialData),function(x){sapply(raw, "[", x)[[1]][1]})
names(values)=names(raw$financialData)
values=as.data.frame(t(values))
row.names(values)=name
}
values
Pros: Easy way to get data
Cons: the free version is limited to 500 requests per month