I am trying to download the XBRL data from the SEC site, using the finstr package in R.
The vignette references Apple financial statements from 2013-14. I am going after Abbott (CIK 1800) for mine. I've looked through the data records on the SEC site and the submission is in this folder:
https://www.sec.gov/Archives/edgar/data/1800/000110465920023904
The Apple XML file is named aapl-20140927.xml (the ticker followed by the period end date). I've opened the file in a browser and identified the relevant data.
The Abbott XML file with the same info is named abt-20191231x10k59d41b_htm.xml, again with the relevant data.
Following the vignette, I've added this code:
xbrl_url2020 <- "https://www.sec.gov/Archives/edgar/data/1800/000110465920023904/abt-20191231x10k59d41b_htm.xml"
xbrl_url2019 <-
"https://www.sec.gov/Archives/edgar/data/1800/000104746919000624/abt-20181231.xml"
old_o <- options(stringsAsFactors = FALSE)
xbrl_data_aapl2020 <- xbrlDoAll(xbrl_url2020)
This then returns:
Error in fileFromCache(file) :
Error in download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1800/000110465920023904/https://xbrl.sec.gov/dei/2019/dei-2019-01-31.xsd'
In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1800/000110465920023904/https://xbrl.sec.gov/dei/2019/dei-2019-01-31.xsd': HTTP status was '404 Not Found'
I've read through other posts here, and I'm not sure whether it's a schema issue, whether I've gone after the wrong file (there's no other file in the folder that has the info in its entirety), or whether it's something else.
Also, I've noticed a comment saying the datasets on the SEC site https://www.sec.gov/dera/data/financial-statement-data-sets.html have all the relevant info. The issue with those sets is that they contain the data as submitted, not as ratified, so they may differ from the final published results.
I'd appreciate any help.
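(Side note for anyone reproducing this: SEC EDGAR rejects requests without a descriptive User-Agent, so declaring one before any download attempt is a cheap thing to rule out. A minimal sketch; the contact string is a placeholder, and download.file(), which the XBRL package calls internally, picks the agent up from options() with the default methods:)
# Identify yourself to EDGAR; the default R user agent is sometimes blocked
options(HTTPUserAgent = "Your Name your.name@example.com")  # placeholder contact
xbrl_data_abt2020 <- xbrlDoAll(xbrl_url2020, verbose = TRUE)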
I need the data in an .rdata file for text mining; this file is my dataset, and I don't know exactly what's in it. The problem is I can't load it.
I tried to open the file on several Windows computers, but got the same errors each time. I'm using the current version of RStudio. I googled the error messages, but nothing worked. Since I can open other .rdata files, it shouldn't be a registry problem. I also opened the file in a basic Windows text editor to see what's inside, but it showed only characters like:
’#p ÜÝ‚»»»[pw—…»»»»»†àînÁuA°E`!‡ßûîýí}æÌLMÕÌùæŸÝõTñÈ}÷ÝrõÕ½
¢hXˆ
I tried several ways of opening the file in RStudio, each with different error messages, as follows:
with load()
require("readr")
setwd("C:/Users/..")
options(stringsAsFactors = F)
load("file")
# Error in load("file") :
# bad restore file magic number (file may be corrupted) -- no data loaded
# In addition: Warning message:
# file ‘.rdata’ has magic number ''
# Use of save versions prior to 2 is deprecated
with source()
require("readr")
setwd("C:/Users/..")
options(stringsAsFactors = F)
source("file")
# Error in source("file") :
# file.rdata:1:1: unexpected input
# 1:
# ^
with readRDS()
setwd("C:/Users/..")
options(stringsAsFactors = F)
readRDS("file")
# Error in readRDS("file") : unknown input format
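Since all three loaders fail, a useful first diagnostic is to inspect the file's magic bytes; a minimal sketch (the path "file" is a placeholder for the actual file):
# Read the first few raw bytes to identify the actual on-disk format
con <- file("file", "rb")
magic <- readBin(con, what = "raw", n = 6)
close(con)
magic
# 1f 8b ...           -> gzip-compressed, as RData version 2/3 files are
# 52 44 58 32|33 0a   -> uncompressed "RDX2"/"RDX3", also valid RData
# anything else       -> not an RData file; work out what it really is
If the first two bytes are 1f 8b, the file itself is plausibly intact and the errors more likely come from a wrong path or a truncated download.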
I'm trying to download financial statements in R using the finstr package:
Financial statements in R
I'm trying to modify the example in its README for other companies. I have tried to download the last two Tesla 10-Qs.
The code I modified so far is:
xbrl_url2017Q3 <- "https://www.sec.gov/Archives/edgar/data/1318605/000156459018026353/tsla-20180930.xml"
xbrl_url2017Q2 <- "https://www.sec.gov/Archives/edgar/data/1318605/000156459018019254/tsla-20180630.xml"
old_o <- options(stringsAsFactors = FALSE)
xbrl_data_tsla2017Q3 <- xbrlDoAll(xbrl_url2017Q3)
The error from the line above is:
Error in fileFromCache(file) :
Error in download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459018026353/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'
In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459018026353/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd': HTTP status was '403 Forbidden'
xbrl_data_tsla2017Q2 <- xbrlDoAll(xbrl_url2017Q2)
options(old_o)
tsla2017Q3 <- xbrl_get_statements(xbrl_data_tsla2017Q3)
tsla2017Q2 <- xbrl_get_statements(xbrl_data_tsla2017Q2 )
tsla2017Q2
balance_sheet2017Q2 <- tsla2017Q2$StatementOfFinancialPositionClassified
balance_sheet2017Q3<- tsla2017Q3$StatementOfFinancialPositionClassified
income2017Q2 <- tsla2017Q2$StatementOfIncome
income2017Q3 <- tsla2017Q3$StatementOfIncome
balance_sheet2017Q3
Returns "NULL"
See the 10-Q among Tesla's SEC filings (the latest one).
Any recommendations on how I can go about this?
I'm looking to download the financial data to play around with, and I'd like it in a tidy format.
This is a common problem with the XBRL package: for some SEC filings, not all XML schemas get downloaded into the cache. Download the missing schema into your cache folder and retry the xbrlDoAll() call; it should work this time.
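For example, a sketch for the dei schema named in the errors above, assuming the package's default cache directory xbrl.Cache and that cached files are stored under their basename (which is what the concatenated URL in the error suggests); adjust both to whatever your own error reports:
library(XBRL)
# The parser glues the filing's base URL onto an already-absolute schema
# URL; pre-seeding the cache sidesteps that broken download entirely
dir.create("xbrl.Cache", showWarnings = FALSE)
download.file("https://xbrl.sec.gov/dei/2019/dei-2019-01-31.xsd",
              destfile = file.path("xbrl.Cache", "dei-2019-01-31.xsd"))
xbrl_data <- xbrlDoAll(xbrl_url2020, cache.dir = "xbrl.Cache")
Once xbrlDoAll() succeeds, check names(xbrl_get_statements(...)) before indexing into the result: the statement roles (e.g. StatementOfFinancialPositionClassified) vary from filing to filing, which is why balance_sheet2017Q3 above came back NULL.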
I am attempting to use the streamR package in R to download and analyze Twitter data, on the premise that this library can overcome the limitations of the twitteR package.
When downloading data, everything seems to work fabulously using the filterStream function (to clarify: the function captures Twitter data, and simply running it produces the JSON file, saved in the working directory, that is needed in the further steps):
filterStream( file.name="tweets_test.json",
track="NFL", tweets=20, oauth=credential, timeout=10)
Capturing tweets...
Connection to Twitter stream was closed after 10 seconds with up to 21 tweets downloaded.
However, when moving on to parse the json file, I keep getting all sorts of errors:
readTweets("tweets_test.json", verbose = TRUE)
0 tweets have been parsed.
list()
Warning message:
In readLines(tweets) : incomplete final line found on 'tweets_test.json'
Or with this function from the same package:
tweet_df <- parseTweets(tweets='tweets_test.json')
Error in `$<-.data.frame`(`*tmp*`, "country_code", value = NA) :
replacement has 1 row, data has 0
In addition: Warning message:
In stream_in_int(path.expand(path)) : Parsing error on line 0
I have tried reading the json file with jsonlite and rjson with the same results.
Originally it seemed that the error came from special characters ({, then \) within the JSON file, which I tried to clean up following the suggestion from this post; however, not much came of it.
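(The cleanup attempt was along these lines; a rough sketch rather than the exact code, with my own filenames: drop blank or truncated lines that don't parse as complete JSON objects, then re-run the parser.)
library(streamR)
library(rjson)
# filterStream can leave a truncated record at the end of the file;
# keep only lines that are non-empty and parse as complete JSON objects
lines <- readLines("tweets_test.json", warn = FALSE)
ok <- nzchar(lines) & vapply(lines, function(l) {
  !inherits(try(fromJSON(l), silent = TRUE), "try-error")
}, logical(1))
writeLines(lines[ok], "tweets_test_clean.json")
tweet_df <- parseTweets("tweets_test_clean.json")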
I found out about the streamR package from this post, which makes the process look very straightforward and simple (which it is, except for the parsing part!).
If any of you have experience with this library and/or these parsing issues, I'd really appreciate your input. I have been searching non-stop but haven't been able to locate a solution.
Thanks!
I found an answer to my own question (see below), but I still need help.
In the same package, quantmod, there is a function called getSymbols.google.
If I use it to get Microsoft's price, for example, it works all right:
getSymbols.google('MSFT', environment(), src = "google", from = Sys.Date() - 1)
[1] "MSFT"
But I can't make it work on a currency pair:
getSymbols.google("GBPUSD", environment() , src="google", from = (Sys.Date() - 1))
Error in download.file(paste(google.URL, "q=", Symbols.name, "&startdate=", :
cannot open URL 'http://finance.google.com/finance/historical?q=GBPUSD&startdate=Nov+02,+2017&enddate=Nov+03,+2017&output=csv'
In addition: Warning message:
In download.file(paste(google.URL, "q=", Symbols.name, "&startdate=", :
cannot open URL 'http://finance.google.com/finance/historical?q=GBPUSD&startdate=Nov+02,+2017&enddate=Nov+03,+2017&output=csv': HTTP status was '400 Bad Request'
Any ideas?
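(For what it's worth, one possible detour, assuming a daily series is enough: quantmod can source exchange rates from Oanda via getFX(). A minimal sketch:)
library(quantmod)
# getFX() pulls recent daily FX rates from Oanda and assigns an object
# named GBPUSD into the calling environment
getFX("GBP/USD", from = Sys.Date() - 1)
head(GBPUSD)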
Good morning,
Since the 1st of November I've been having trouble with the function getQuote from Yahoo. It's a function inside the quantmod package which uses the Yahoo API to request the information.
The description of the function is as follows: "Fetch current stock quote(s) from specified source. At present this only handles sourcing quotes from Yahoo Finance, but it will be extended to additional sources over time."
In R, I'm getting the following error: "HTTP status was '403 Forbidden'".
I've looked in my browser, and the error comes from Yahoo's web page itself.
Does anybody know how to solve it, or any alternatives to the function getQuote()?
Here is an example from RStudio:
getQuote("AAPL")
Error in download.file(paste("https://finance.yahoo.com/d/quotes.csv?s=", :
cannot open URL 'https://finance.yahoo.com/d/quotes.csv?s=AAPL&f=d1t1l1c1p2ohgv'
In addition: Warning message:
In download.file(paste("https://finance.yahoo.com/d/quotes.csv?s=", :
cannot open URL 'https://finance.yahoo.com/d/quotes.csv?s=AAPL&f=d1t1l1c1p2ohgv': HTTP status was '403 Forbidden'
Thanks
It seems that Yahoo has discontinued this service. Is anyone aware of an alternative (I'd rather not have to web-scrape Yahoo for this)?
rob
I ran into the same problem... it's kludgy, but as a workaround to get the end-of-day value, I have found this to work for now.
Instead of getQuote() to get the Last price (which doesn't seem to work from Yahoo anymore):
underlying<-"AAPL"
quote.last <-getQuote(underlying)$Last
I use "getSymbols" which still works-- throws it into a new data frame, and I pull out the value I want from that:
Hx <- getSymbols(underlying, from = Sys.Date() - 1)  # lets me avoid retaining the ticker name when doing this across many tickers
quote.last <- as.double(tail(Cl(get(Hx)), 1))  # closing price from the last row of data
rm(list = Hx)  # discard the temporary object holding the quote history
I'm sure there's a more elegant way to do it, but this is what fell out of my brain as a quick workaround that got it done... sadly it doesn't get things like the Bid and Ask that getQuote() does.
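If it helps, the same idea wrapped in a function (a sketch; the five-day lookback is just to survive weekends and market holidays):
library(quantmod)
# Return the most recent closing price for one ticker, without leaving
# a symbol-named object behind in the calling environment
last_close <- function(ticker) {
  hx <- getSymbols(ticker, from = Sys.Date() - 5, auto.assign = FALSE)
  as.double(tail(Cl(hx), 1))
}
last_close("AAPL")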
I'm working for the first time with the R package BerkeleyEarth, attempting to use its convenience functions to access the BEST data. I think maybe it's just a problem with their servers (a matter I've raised separately with the package's maintainer), but I wanted to know whether it's instead something silly I'm doing.
To reproduce my error:
library(BerkeleyEarth)
downloadBerkeley()
which provides the following error message
trying URL 'http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip'
Error in download.file(urls$Url[thisUrl], destfile = file.path(destDir, :
cannot open URL 'http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip'
In addition: Warning message:
In download.file(urls$Url[thisUrl], destfile = file.path(destDir, :
InternetOpenUrl failed: 'A connection with the server could not be established'
Has anyone had a better experience using this package?
The error message points to a different URL from what one would expect judging by the URLs listed at http://berkeleyearth.org/data/ that point to the zip-formatted files. There is another set of .nc files that appear to be more recent. I would replace the entries in the BerkeleyUrls data frame with the ones that match your analysis strategy:
This is the current URL that should be in position 1,1:
http://berkeleyearth.lbl.gov/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip
And this is the one that is in the package dataframe:
> BerkeleyUrls[1,1]
[1] "http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip"
I suppose you could try:
BerkeleyUrls[, 1] <- sub( "download\\.berkeleyearth\\.org", "berkeleyearth.lbl.gov", BerkeleyUrls[, 1])
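And before kicking off the large downloads again, a cheap sanity check that the rewritten host actually answers (a sketch; url.exists() from RCurl issues a lightweight request rather than fetching the whole zip):
library(RCurl)
# TRUE means the patched first URL resolves and the server responds
url.exists(BerkeleyUrls[1, 1])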