I'm trying to download financial statements in R using a package at:
Financial statements in R
I'm trying to adapt the example in their README to other companies. I have tried to download Tesla's last two quarterly filings (10-Qs).
The code I have modified so far is:
xbrl_url2017Q3 <- "https://www.sec.gov/Archives/edgar/data/1318605/000156459018026353/tsla-20180930.xml"
xbrl_url2017Q2 <- "https://www.sec.gov/Archives/edgar/data/1318605/000156459018019254/tsla-20180630.xml"
old_o <- options(stringsAsFactors = FALSE)
xbrl_data_tsla2017Q3 <- xbrlDoAll(xbrl_url2017Q3)
The line above produces this error:
Error in fileFromCache(file) :
Error in download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459018026353/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'
In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459018026353/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd': HTTP status was '403 Forbidden'
xbrl_data_tsla2017Q2 <- xbrlDoAll(xbrl_url2017Q2)
options(old_o)
tsla2017Q3 <- xbrl_get_statements(xbrl_data_tsla2017Q3)
tsla2017Q2 <- xbrl_get_statements(xbrl_data_tsla2017Q2 )
tsla2017Q2
balance_sheet2017Q2 <- tsla2017Q2$StatementOfFinancialPositionClassified
balance_sheet2017Q3<- tsla2017Q3$StatementOfFinancialPositionClassified
income2017Q2 <- tsla2017Q2$StatementOfIncome
income2017Q3 <- tsla2017Q3$StatementOfIncome
balance_sheet2017Q3
This returns NULL.
See the 10-Q at Tesla's SEC filings.
The last 10-Q.
Any recommendations on how I can go about this?
I'm looking to download the financial data to play around with it and would like it in a tidy format.
This is a common problem with the XBRL package: for some SEC filings, not all XML schemas are downloaded into the cache. Download the missing schema into your cache folder and retry the xbrlDoAll call; it should work this time.
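A minimal sketch of that workaround follows. It rests on assumptions not stated above: that XBRL stores cached files under its default cache.dir ("xbrl.Cache") using the schema's base name, and that SEC's servers may reject requests lacking a descriptive user agent. The helper names and the user-agent string are hypothetical.

```r
# Sketch: pre-fetch a schema that xbrlDoAll() failed to download.
# Assumption: XBRL caches remote files in cache.dir ("xbrl.Cache" by default)
# under the file's base name. All helper names here are hypothetical.

# The failing URL in the error message is the filing folder prepended to an
# already absolute schema URL; keep only the last absolute URL in the string.
extract_schema_url <- function(bad_url) {
  sub("^.*(https?://[^/]+/.*)$", "\\1", bad_url)
}

# Where the schema is expected to sit inside the cache folder.
cache_path <- function(schema_url, cache_dir = "xbrl.Cache") {
  file.path(cache_dir, basename(schema_url))
}

# Download the schema into the cache unless it is already there.
prefetch_schema <- function(schema_url, cache_dir = "xbrl.Cache",
                            user_agent = "Your Name your.email@example.com") {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- cache_path(schema_url, cache_dir)
  if (!file.exists(dest)) {
    # Assumption: a descriptive user agent helps avoid a 403 from the server.
    old <- options(HTTPUserAgent = user_agent)
    on.exit(options(old), add = TRUE)
    download.file(schema_url, dest, mode = "wb", quiet = TRUE)
  }
  invisible(dest)
}

bad_url <- paste0(
  "https://www.sec.gov/Archives/edgar/data/1318605/000156459018026353/",
  "https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd"
)
schema_url <- extract_schema_url(bad_url)

# Not run here (needs network access and the XBRL package):
# prefetch_schema(schema_url)
# xbrl_data_tsla2017Q3 <- xbrlDoAll(xbrl_url2017Q3)
```

If xbrlDoAll then stops on a different schema, the same steps can be repeated for each URL it reports.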
I am working with a couple of R packages for genetic pathway enrichment analyses. Both packages now throw errors when they try to connect to their respective servers to download the reference data for the analysis.
In the first package gage, I am getting the following error when attempting to download:
library(gage)
> kg.ko = kegg.gsets("ko") # ("ko" is KEGG ortholog pathway)
Error in curl::curl_fetch_memory(url, handle = handle) :
Failure when receiving data from the peer
In the second package clusterProfiler, I am getting the following error:
library(clusterProfiler)
# the data
dput(head(de_kegg_chr))
c("K14847", "K19009", "K00078", "K21407", "K23285", "K06972")
# KEGG enrichment (which will pull relevant reference data during this step)
# over-representation analysis (fisher's)
> enrich <- enrichKEGG(gene = de_kegg_chr,
+ organism = "ko",
+ keyType='kegg',
+ pvalueCutoff = 0.01)
Reading KEGG annotation online:
fail to download KEGG data...
Error in download.KEGG.Path(species) :
'species' should be one of organisms listed in 'http://www.genome.jp/kegg/catalog/org_list.html'...
In addition: Warning message:
In utils::download.file(url, quiet = TRUE, method = method, ...) :
URL 'https://rest.kegg.jp/link/ko/pathway': status was 'Failure when receiving data from the peer'
After the first error, I thought it was something specific to the gage package and found a simple workaround, because in gage the data are downloaded from the server before the analysis function is called.
The second package is more of a problem because the reference data are downloaded inside the function that conducts the analysis.
Now that this is happening with more than one package (both of these scripts were working perfectly before yesterday), I suspect something systematic within R or RStudio.
I am trying to download the XBRL data from the SEC site, using the finstr package in R.
The vignette references Apple financial statements from 2013-14. I am going after Abbott (CIK 1800) for mine. I've looked through the data records on the SEC site and the submission is in this folder:
https://www.sec.gov/Archives/edgar/data/1800/000110465920023904
The Apple XML file is named aapl-20140927.xml (the ticker symbol followed by the period end date). I've opened the file in a browser and identified the relevant data.
The Abbott XML file that has the same info is named abt-20191231x10k59d41b_htm.xml, again with the relevant data.
Following the vignette, I've added this code:
xbrl_url2020 <- "https://www.sec.gov/Archives/edgar/data/1800/000110465920023904/abt-20191231x10k59d41b_htm.xml"
xbrl_url2019 <-
"https://www.sec.gov/Archives/edgar/data/1800/000104746919000624/abt-20181231.xml"
old_o <- options(stringsAsFactors = FALSE)
xbrl_data_aapl2020 <- xbrlDoAll(xbrl_url2020)
This then returns:
Error in fileFromCache(file) :
Error in download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1800/000110465920023904/https://xbrl.sec.gov/dei/2019/dei-2019-01-31.xsd'
In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1800/000110465920023904/https://xbrl.sec.gov/dei/2019/dei-2019-01-31.xsd': HTTP status was '404 Not Found'
I've read through other submissions here and am not sure whether it's a schema issue, whether I've gone after the wrong file (no other file in the folder has the info in its totality), or whether it's something else.
I've also noticed a comment saying the datasets on the SEC site https://www.sec.gov/dera/data/financial-statement-data-sets.html have all the relevant info. The issue with those sets is that they contain the data as submitted rather than as ratified, so they may differ from the final published results.
I'd appreciate any help.
I'm trying to get TRMM data from the NASA OPeNDAP server using the raster package in R. Initially I had some difficulty with authentication, but that issue was resolved.
The NASA OPeNDAP server publishes TRMM 3B42_daily data as individual files, one for each day, and as aggregated annual data (using NcML). My problem now is that, using the R raster package and the authentication files .dodsrc and .netrc, I can download individual NetCDF files but I can't download the aggregated data.
So, this works:
library(raster)
single_date_opendap <- 'https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/TRMM_L3/TRMM_3B42_Daily.7/2002/04/3B42_Daily.20020405.7.nc4'
test <- stack(single_date_opendap, varname = 'precipitation')
This doesn't:
library(raster)
url_opendap_no_brkt <- 'https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/ncml/aggregation/TRMM_3B42_Daily.7/TRMM_3B42_daily.7_Aggregation_2001.ncml'
test <- stack(url_opendap_no_brkt, varname = 'precipitation')
And gives me the error message:
Error in .local(.Object, ...) :
An error occurred while creating a virtual connection to the DAP server:
Error while reading the URL: https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/ncml/aggregation/TRMM_3B42_Daily.7/TRMM_3B42_daily.7_Aggregation_2001.ncml.ver.
The OPeNDAP server returned the following message:
Unauthorized: Contact the server administrator.
Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer",
Cannot create a RasterLayer object from this file. (file does not exist)
Is it possible to get data from an OPeNDAP server that publishes aggregated data?
After some exchange with NASA support, and with Antonio's tip, I found out that the R raster package will not work with the aggregated datasets, but ncdf4::nc_open is able to handle them. That is strange because, from what I understand, the raster package calls nc_open in the background.
Anyway, this works:
library(ncdf4)
url_opendap <- 'https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/ncml/aggregation/TRMM_3B42_Daily.7/TRMM_3B42_daily.7_Aggregation_2001.ncml'
trmm <- nc_open(url_opendap)
and this doesn't:
library(raster)
url_opendap <- 'https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/ncml/aggregation/TRMM_3B42_Daily.7/TRMM_3B42_daily.7_Aggregation_2001.ncml'
trmm <- stack(url_opendap, varname = "precipitation")
I'm trying to read API data from the BLS into R. I am using Version 1.0, which does not require registration and is open for public use.
Here is my code:
url <-"http://api.bls.gov/publicAPI/v1/timeseries/data/LAUCN040010000000005"
raw.data <- readLines(url, warn = F)
library(rjson)
rd <- fromJSON(raw.data)
And here is the error message I receive:
Error in fromJSON(raw.data) : incomplete list
If I just go to the URL in my web browser it seems to work (it pulls up a JSON page). I'm not really sure what is going on when I try to get this into R.
When you've used readLines, the object returned is a vector of length 4:
length(raw.data)
You can look at the individual pieces via:
raw.data[1]
If you stick the pieces back together using paste
fromJSON(paste(raw.data, collapse = ""))
everything works. Alternatively, let jsonlite read the URL directly:
jsonlite::fromJSON(url)
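To see why collapsing helps, here is a small offline sketch. The four-line JSON text is a made-up stand-in for the BLS response, not its real payload, and the parsing step runs only if jsonlite happens to be installed.

```r
# Hypothetical JSON payload spread over four lines (not the real BLS response).
json_text <- c('{"status": "REQUEST_SUCCEEDED",',
               ' "Results": {',
               '   "series": []',
               ' }}')

# readLines() returns one vector element per line of input.
con <- textConnection(json_text)
raw_lines <- readLines(con)
close(con)

# Collapse the pieces into a single string before handing it to a parser.
joined <- paste(raw_lines, collapse = "")

# Parse only if jsonlite is available; the collapsed string is one JSON document.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  parsed <- jsonlite::fromJSON(joined)
  print(parsed$status)
}
```

A parser that expects a single string sees only an incomplete fragment when it is given the multi-element vector, which is consistent with the "incomplete list" error above.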
I referred to the link given below for doing sentiment analysis:
http://heuristically.wordpress.com/2011/04/08/text-data-mining-twitter-r/
When I ran the code given below:
for (page in c(1:15)){
# search parameter
twitter_q <- URLencode('#prolife OR #prochoice')
twitter_url =
# fetch remote URL and parse
mydata.xml <- xmlParseDoc(twitter_url, asText=F)
# extract the titles
mydata.vector <- xpathSApply(mydata.xml, '//s:entry/s:title', xmlValue, namespaces =c('s'='http://www.w3.org/2005/Atom'))
# aggregate new tweets with previous tweets
mydata.vectors <- c(mydata.vector, mydata.vectors)
}
After running the code, I get this error:
Error in UseMethod("xpathApply") :
no applicable method for 'xpathApply' applied to an object of class "NULL"
I/O warning : failed to load HTTP resource
I installed the required packages (ROAuth, stringr, XML, plyr), and I am using R version 3.0.3.
Could anyone help me work out how to fix this? I have been struggling with it, and any guidance in the right direction would be a great help.