I am building an API wrapper using httr. The API I'm using has no entries under people for the current year, but the response still includes the copyright field. When I use httr::GET I still get a 200 status code since there is a response.
I expected the response to contain data like the 2019 one. How do I make httr throw an error in this case? Is there something similar to httr::warn_for_status for this?
Example of request that works and returns people, vs. one that doesn't
library(httr)
data <- GET("https://statsapi.mlb.com/api/v1/sports/1/players?season=2019")
# response with data in content. I'll spare everyone the 1,410 rows in the JSON Response
str(content(data, type= "application/json"), list.len = 2)
#> List of 2
#> $ copyright: chr "Copyright 2020 MLB Advanced Media, L.P. Use of any content on this page acknowledges agreement to the terms po"| __truncated__
#> $ people :List of 1410
#> ..$ :List of 36
#> .. ..$ id : int 472551
#> .. ..$ fullName : chr "Fernando Abad"
#> .. .. [list output truncated]
#> ..$ :List of 35
#> .. ..$ id : int 650556
#> .. ..$ fullName : chr "Bryan Abreu"
#> .. .. [list output truncated]
#> .. [list output truncated]
no_data <- GET("https://statsapi.mlb.com/api/v1/sports/1/players?season=2020")
str(content(no_data, type= "application/json"))
#> List of 2
#> $ copyright: chr "Copyright 2020 MLB Advanced Media, L.P. Use of any content on this page acknowledges agreement to the terms po"| __truncated__
#> $ people : list()
The alternative I've used is to parse the data and then use nrow(df) < 1. There has to be a better way.
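For reference, the kind of check being asked about can be sketched as a small helper that inspects the parsed body instead of the status code. This is only an illustration: stop_for_empty is a made-up name (it is not an httr function), and the toy lists stand in for what content(data, type = "application/json") returns.

```r
# Hypothetical helper (not part of httr): raise an error when a field of the
# parsed response is missing or empty.
stop_for_empty <- function(parsed, field = "people") {
  if (length(parsed[[field]]) == 0) {
    stop(sprintf("API returned no '%s' entries", field), call. = FALSE)
  }
  invisible(parsed)
}

# Toy stand-ins for the parsed 2019 and 2020 responses (assumed shapes)
with_people <- list(copyright = "c", people = list(list(id = 472551L)))
without_people <- list(copyright = "c", people = list())

stop_for_empty(with_people)  # passes silently

# Capture the error message instead of aborting the script
result <- tryCatch(stop_for_empty(without_people),
                   error = function(e) conditionMessage(e))
print(result)
```

With a real response this would be called as stop_for_empty(content(no_data, type = "application/json")), which avoids building a data frame just to count its rows.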
I was wondering if there is a way to automatically pull the Russell 3000 holdings from the iShares website in R using the read_html (or rvest) function?
url: https://www.ishares.com/us/products/239714/ishares-russell-3000-etf
(all holdings in the table on the bottom, not just top 10)
So far I have had to copy and paste into an Excel document, save as a CSV, and use read_csv to create a tibble in R of the ticker, company name, and sector.
I have used read_html to pull the SP500 holdings from Wikipedia, but can't seem to figure out the path I need to put in to have R automatically pull from the iShares website (and there aren't other reputable websites I've found with all ~3000 holdings). Here is the code used for SP500:
read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")%>%
html_node("table.wikitable")%>%
html_table()%>%
select('Symbol','Security','GICS Sector','GICS Sub Industry')%>%
as_tibble()
First post, sorry if it is hard to follow...
Any help would be much appreciated
Michael
IMPORTANT
According to the Terms & Conditions listed on BlackRock's website:
Use any robot, spider, intelligent agent, other automatic device, or manual process to search, monitor or copy this Website or the reports, data, information, content, software, products services, or other materials on, generated by or obtained from this Website, whether through links or otherwise (collectively, "Materials"), without BlackRock's permission, provided that generally available third-party web browsers may be used without such permission;
I suggest you make sure you are abiding by those terms before using their data. For educational purposes, here is how the data could be obtained:
First you need to get to the actual data (not the interactive JavaScript). How familiar are you with the developer tools in your browser? If you navigate through the website and watch the network traffic, you will notice a large AJAX request:
https://www.ishares.com/us/products/239714/ishares-russell-3000-etf/1467271812596.ajax?tab=all&fileType=json
This is the data you need (all). After locating this, it is just cleaning the data. Example:
library(jsonlite)
#Locate the raw data by searching the Network traffic:
url="https://www.ishares.com/us/products/239714/ishares-russell-3000-etf/1467271812596.ajax?tab=all&fileType=json"
#pull the data in via fromJSON
x<-jsonlite::fromJSON(url,flatten=TRUE)
>Large list (10.4 Mb)
#use a combination of `lapply` and `rapply` to unlist, structuring the results as one large list
y<-lapply(rapply(x, enquote, how="unlist"), eval)
>Large list (50677 elements, 6.9Mb)
y1<-y[1:15]
> str(y1)
List of 15
$ aaData1 : chr "MSFT"
$ aaData2 : chr "MICROSOFT CORP"
$ aaData3 : chr "Equity"
$ aaData.display: chr "2.95"
$ aaData.raw : num 2.95
$ aaData.display: chr "109.41"
$ aaData.raw : num 109
$ aaData.display: chr "2,615,449.00"
$ aaData.raw : int 2615449
$ aaData.display: chr "$286,156,275.09"
$ aaData.raw : num 2.86e+08
$ aaData.display: chr "286,156,275.09"
$ aaData.raw : num 2.86e+08
$ aaData14 : chr "Information Technology"
$ aaData15 : chr "2588173"
Update: in case you are unable to clean the data, here is one way:
testdf<- data.frame(matrix(unlist(y), nrow=50677, byrow=T),stringsAsFactors=FALSE)
#Where we want to break the DF at (every nth row)
breaks <- 17
#number of rows in full DF
nbr.row <- nrow(testdf)
repeats<- rep(1:ceiling(nbr.row/breaks),each=breaks)[1:nbr.row]
#split DF from clean-up
newDF <- split(testdf,repeats)
Result:
> str(head(newDF))
List of 6
$ 1:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "MSFT" "MICROSOFT CORP" "Equity" "2.95" ...
$ 2:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "AAPL" "APPLE INC" "Equity" "2.89" ...
$ 3:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "AMZN" "AMAZON COM INC" "Equity" "2.34" ...
$ 4:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "BRKB" "BERKSHIRE HATHAWAY INC CLASS B" "Equity" "1.42" ...
$ 5:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "FB" "FACEBOOK CLASS A INC" "Equity" "1.35" ...
$ 6:'data.frame': 17 obs. of 1 variable:
..$ matrix.unlist.y...nrow...50677..byrow...T.: chr [1:17] "JNJ" "JOHNSON & JOHNSON" "Equity" "1.29" ...
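As a possible alternative to splitting a one-column data frame, the flat vector can be reshaped so that each holding becomes one row. The sketch below uses a toy vector in place of unlist(y), with 3 fields per record instead of the 17 used above (50677 values at 17 per record would give 2981 rows). The column names are illustrative, not taken from the iShares data.

```r
# Toy stand-in for unlist(y): 2 records x 3 fields
# (the real data above has 17 fields per record)
flat <- c("MSFT", "MICROSOFT CORP", "Equity",
          "AAPL", "APPLE INC", "Equity")
fields_per_record <- 3

# byrow = TRUE fills the matrix one record per row
holdings <- as.data.frame(
  matrix(flat, ncol = fields_per_record, byrow = TRUE),
  stringsAsFactors = FALSE
)
names(holdings) <- c("ticker", "name", "asset_class")  # illustrative names
print(holdings)
```

Every column comes out as character here, so numeric fields would still need as.numeric() afterwards.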
I am trying to modify a citation object in R as follows
cit <- citation("ggplot2")
cit$textVersion
#[1] "H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009."
cit$textVersion <- "Hadley Wickham and Winston Chang (2016). ggplot2: Create Elegant Data Visualisations Using
the Grammar of Graphics. R package version 2.2.1."
But there is no change.
cit$textVersion
#[1] "H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009."
If we examine the structure of cit, there are now two textVersion attributes. How can I modify only the original textVersion?
str(cit)
List of 1
$ :Class 'bibentry' hidden list of 1
..$ :List of 6
.. ..$ author :Class 'person' hidden list of 1
.. .. ..$ :List of 5
.. .. .. ..$ given : chr "Hadley"
.. .. .. ..$ family : chr "Wickham"
.. .. .. ..$ role : NULL
.. .. .. ..$ email : NULL
.. .. .. ..$ comment: NULL
.. ..$ title : chr "ggplot2: Elegant Graphics for Data Analysis"
.. ..$ publisher: chr "Springer-Verlag New York"
.. ..$ year : chr "2009"
.. ..$ isbn : chr "978-0-387-98140-6"
.. ..$ url : chr "http://ggplot2.org"
.. ..- attr(*, "bibtype")= chr "Book"
.. ..- attr(*, "textVersion")= chr "H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009."
.. ..- attr(*, "textversion")= chr "Hadley Wickham and Winston Chang (2016). ggplot2: Create Elegant Data Visualisations Using\n the Grammar of Gr"| __truncated__
- attr(*, "mheader")= chr "To cite ggplot2 in publications, please use:"
- attr(*, "class")= chr "bibentry"
A citation object is not made to be modified. The subset operators ($, [, and also $<-) have class-specific methods that don't allow easy modification. This is for a reason: citation information is written in a specific file of a package and is not meant to be modified.
I don't know why you are trying it, but if you really need to, here is a little hack.
#store the class of the object, so can be reassigned later
oc<-class(cit)
#unclass the object to be free to modify
tmp<-unclass(cit)
#assign the new "textVersion"
attr(tmp[[1]],"textVersion")<-"Hadley Wickham and Winston Chang (2016). ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 2.2.1."
#assign the class back
class(tmp)<-oc
tmp
#To cite ggplot2 in publications, please use:
#
# Hadley Wickham and Winston Chang (2016). ggplot2: Create Elegant Data
# Visualisations Using the Grammar of Graphics. R package version
# 2.2.1.
#
#A BibTeX entry for LaTeX users is
#
# @Book{,
# author = {Hadley Wickham},
# title = {ggplot2: Elegant Graphics for Data Analysis},
# publisher = {Springer-Verlag New York},
# year = {2009},
# isbn = {978-0-387-98140-6},
# url = {http://ggplot2.org},
# }
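If you need this more than once, the same steps can be wrapped in a function. This is just a sketch: set_text_version is a made-up name, and the example builds a stand-in entry with utils::bibentry() so that it runs without ggplot2 installed.

```r
# Hypothetical wrapper around the unclass / attr / class steps shown above
set_text_version <- function(cit, text) {
  old_class <- class(cit)                 # remember the class
  tmp <- unclass(cit)                     # plain list, free to modify
  attr(tmp[[1]], "textVersion") <- text   # overwrite the attribute
  class(tmp) <- old_class                 # restore the class
  tmp
}

# Stand-in citation (assumed values, for illustration only)
cit <- utils::bibentry(
  bibtype = "Misc",
  title = "Example package",
  author = utils::person("A.", "Author"),
  year = "2020",
  textVersion = "Old text."
)

new_cit <- set_text_version(cit, "New text.")
print(new_cit$textVersion)
```

The original object is left untouched, since the function works on a copy.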
I am trying to use the R package termstrc to estimate the term structure. To do that I have to prepare the data as an object of the couponbonds class required by the package. I used fake data to rule out problems with my real data. I have tried many variations, but it still doesn't work.
Any idea what is going wrong?
structure of the official demo data which works
data("govbonds")
str(govbonds)
List of 3
$ GERMANY:List of 8
..$ ISIN : chr [1:52] "DE0001141414" "DE0001137131" "DE0001141422" "DE0001137149" ...
..$ MATURITYDATE: Date[1:52], format: "2008-02-15" "2008-03-14" "2008-04-11" ...
..$ ISSUEDATE : Date[1:52], format: "2002-08-14" "2006-03-08" "2003-04-11" ...
..$ COUPONRATE : num [1:52] 0.0425 0.03 0.03 0.0325 0.0413 ...
..$ PRICE : num [1:52] 100 99.9 99.8 99.8 100.1 ...
..$ ACCRUED : num [1:52] 4.09 2.66 2.43 2.07 2.39 ...
..$ CASHFLOWS :List of 3
.. ..$ ISIN: chr [1:384] "DE0001141414" "DE0001137131" "DE0001141422" "DE0001137149" ...
.. ..$ CF : num [1:384] 104 103 103 103 104 ...
.. ..$ DATE: Date[1:384], format: "2008-02-15" "2008-03-14" "2008-04-11" ...
..$ TODAY : Date[1:1], format: "2008-01-30"
#another two are omitted here
- attr(*, "class")= chr "couponbonds"
> ns_res <- estim_nss(govbonds, c("GERMANY"), method = "ns",tauconstr=list(c(0.2, 5, 0.1)))
[1] "Searching startparameters for GERMANY"
beta0 beta1 beta2 tau1
5.008476 -1.092510 -3.209695 2.400100
my code to prepare fake data
bond=list()
bond$CHINA=list()
n=30*12#suppose I have n bond
enddate=as.Date('2014/11/7')
isin=sprintf('DE%010d',1:n)#some fake ISIN
bond$CHINA$ISIN=isin
bond$CHINA$MATURITYDATE=enddate+(1:n)*30
bond$CHINA$ISSUEDATE=rep(enddate,n)
bond$CHINA$COUPONRATE=rep(5/100,n)
bond$CHINA$PRICE=rep(100,n)
bond$CHINA$ACCRUED=rep(0,n)
bond$CHINA$CASHFLOWS=list()
bond$CHINA$CASHFLOWS$ISIN=isin
bond$CHINA$CASHFLOWS$CF=100+(1:n)*5/12
bond$CHINA$CASHFLOWS$DATE=enddate+(1:n)*30
bond$CHINA$TODAY=enddate
class(bond)='couponbonds'
ns_res <- estim_nss(bond, c("CHINA"), method = "ns",tauconstr=list(c(0.2, 5, 0.1)))
the output
Error in `colnames<-`(`*tmp*`, value = c("DE0000000001", "DE0000000002", :
attempt to set 'colnames' on an object with less than two dimensions
The problem was finally solved by adding one cashflow with amount zero to CASHFLOWS$CF.
Put another way, at least one bond must have at least two cashflows.
You may then face another error raised by the uniroot function. Be sure to include only cashflows dated after TODAY; termstrc does not filter the cashflows by TODAY for you.
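The filtering step could look roughly like this. It is a sketch under assumptions: drop_past_cashflows is a made-up helper (not a termstrc function), the cashflow values are invented, and "after TODAY" is taken to mean strictly later than TODAY.

```r
# Hypothetical helper (not part of termstrc): keep only cashflows dated
# strictly after `today`, applied to all three CASHFLOWS vectors at once
drop_past_cashflows <- function(cashflows, today) {
  keep <- cashflows$DATE > today
  lapply(cashflows, function(v) v[keep])
}

today <- as.Date("2014-11-07")
# Invented cashflows in the same ISIN / CF / DATE layout as govbonds
cashflows <- list(
  ISIN = c("DE0000000001", "DE0000000001", "DE0000000002"),
  CF   = c(5, 105, 102.5),
  DATE = as.Date(c("2014-06-01", "2015-06-01", "2015-01-15"))
)

future <- drop_past_cashflows(cashflows, today)
str(future)
```

With the question's data this would be applied as bond$CHINA$CASHFLOWS <- drop_past_cashflows(bond$CHINA$CASHFLOWS, bond$CHINA$TODAY) before calling estim_nss.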
Whenever I use any sort of HTTP command via the system() function in RStudio, the rainbow circle of death appears and I have to force-quit RStudio. Up until now, I've written a bunch of checks to make sure a user isn't in RStudio before using an HTTP command (which I use a ton to access data), but it's quite a pain, and it would be fantastic to get to the root of the problem.
e.g.
system("http get http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M")
causes RStudio to crash. Oddly, on another laptop of mine, such commands don't crash RStudio but produce the following error: 'sh: http: command not found', even though http is installed and works fine from the terminal.
Does anybody know how to fix this problem / why it happens / does it occur for you guys too? Although I know a lot about R, I'm afraid I have no idea how to try to fix this problem.
Thanks!!!
Using http from the httpie package hangs RStudio (but not plain terminal R) on my Linux system (your rainbow circle implies it's a Mac?), so I'm getting the same behaviour as you.
Installing and using wget works for me:
system("wget -O /tmp/data.out http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M")
Or you could try R's native download.file function. There's a whole bunch of other functions for getting stuff off the web - see the Web Task View http://cran.r-project.org/web/views/WebTechnologies.html
I've not seen this http command used much, so maybe it's flaky. Or maybe it's opening stdin...
Yes... Try this:
system("http get http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M >/tmp/data2.out </dev/null" )
I think http is opening stdin, the Unix standard input channel, and RStudio isn't sending anything to it, so it waits. If you explicitly redirect http's stdin from /dev/null, then http completes. This works for me in RStudio.
However, I still prefer wget or curl-based solutions!
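For the download.file route mentioned above, a minimal sketch might look like this. fetch_to_file is a made-up wrapper name, and the actual request is left commented out because it needs network access.

```r
# Hypothetical wrapper around download.file(): returns the path of the
# downloaded file and stops if a non-zero status is reported
fetch_to_file <- function(url, dest = tempfile()) {
  status <- utils::download.file(url, destfile = dest, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status, call. = FALSE)
  dest
}

# Usage (needs network access, so left commented out):
# path <- fetch_to_file("http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M")
# readLines(path, warn = FALSE)
```

Nothing here goes through system(), so there is no external process waiting on stdin.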
Without more contextual information about the RStudio version and operating system, it is hard to do more than suggest an alternative approach that avoids the use of system().
Instead you could use RCurl and getURL
library(RCurl)
getURL('http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M')
#[1] "{\"status\":\"REQUEST_SUCCEEDED\",\"responseTime\":129,\"message\":[],\"Results\":{\n\"series\":\n[{\"seriesID\":\"CXUALCBEVGLB0101M\",\"data\":[{\"year\":\"2013\",\"period\":\"A01\",\"periodName\":\"Annual\",\"value\":\"445\",\"footnotes\":[{}]},{\"year\":\"2012\",\"period\":\"A01\",\"periodName\":\"Annual\",\"value\":\"451\",\"footnotes\":[{}]},{\"year\":\"2011\",\"period\":\"A01\",\"periodName\":\"Annual\",\"value\":\"456\",\"footnotes\":[{}]}]}]\n}}"
You could also use PUT, GET, POST, etc directly in R, abstracted from RCurl by the httr package:
library(httr)
tmp <- GET("http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M")
dat <- content(tmp, as="parsed")
str(dat)
## List of 4
## $ status : chr "REQUEST_SUCCEEDED"
## $ responseTime: num 27
## $ message : list()
## $ Results :List of 1
## ..$ series:'data.frame': 1 obs. of 2 variables:
## .. ..$ seriesID: chr "CXUALCBEVGLB0101M"
## .. ..$ data :List of 1
## .. .. ..$ :'data.frame': 3 obs. of 5 variables:
## .. .. .. ..$ year : chr [1:3] "2013" "2012" "2011"
## .. .. .. ..$ period : chr [1:3] "A01" "A01" "A01"
## .. .. .. ..$ periodName: chr [1:3] "Annual" "Annual" "Annual"
## .. .. .. ..$ value : chr [1:3] "445" "451" "456"
## .. .. .. ..$ footnotes :List of 3
## .. .. .. .. ..$ :'data.frame': 1 obs. of 0 variables
## .. .. .. .. ..$ :'data.frame': 1 obs. of 0 variables
## .. .. .. .. ..$ :'data.frame': 1 obs. of 0 variables
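From there, the observations can be pulled out of the nested structure and converted to numbers. The sketch below rebuilds a cut-down toy version of dat by hand (only some of the fields shown above), so it runs without calling the API.

```r
# Toy list mirroring part of the structure printed above
dat <- list(
  status = "REQUEST_SUCCEEDED",
  Results = list(
    series = data.frame(seriesID = "CXUALCBEVGLB0101M",
                        stringsAsFactors = FALSE)
  )
)
# The per-series observations sit in a list column called data
dat$Results$series$data <- list(
  data.frame(year = c("2013", "2012", "2011"),
             value = c("445", "451", "456"),
             stringsAsFactors = FALSE)
)

# Extract the first series and convert the character columns
obs <- dat$Results$series$data[[1]]
obs$year <- as.integer(obs$year)
obs$value <- as.numeric(obs$value)
print(obs)
```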
I am working on saving twitter search results into a database (SQL Server) and am getting an error when I pull the search results from twitteR.
If I execute:
library(twitteR)
puppy <- as.data.frame(searchTwitter("puppy", session=getCurlHandle(),num=100))
I get an error of:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class structure("status", package = "twitteR") into a data.frame
This is important because in order to use RODBC to add this to a table using sqlSave it needs to be a data.frame. At least that's the error message I got:
Error in sqlSave(localSQLServer, puppy, tablename = "puppy_staging", :
should be a data frame
So does anyone have any suggestions on how to coerce the list to a data.frame or how I can load the list through RODBC?
My final goal is to have a table that mirrors the structure of values returned by searchTwitter. Here is an example of what I am trying to retrieve and load:
library(twitteR)
puppy <- searchTwitter("puppy", session=getCurlHandle(),num=2)
str(puppy)
List of 2
$ :Formal class 'status' [package "twitteR"] with 10 slots
.. ..@ text : chr "beautifull and kc reg Beagle Mix for rehomes: This little puppy is looking for a new loving family wh... http://bit.ly/9stN7V "| __truncated__
.. ..@ favorited : logi FALSE
.. ..@ replyToSN : chr(0)
.. ..@ created : chr "Wed, 16 Jun 2010 19:04:03 +0000"
.. ..@ truncated : logi FALSE
.. ..@ replyToSID : num(0)
.. ..@ id : num 1.63e+10
.. ..@ replyToUID : num(0)
.. ..@ statusSource: chr "<a href="http://twitterfeed.com" rel="nofollow">twitterfeed</a>"
.. ..@ screenName : chr "puppy_ads"
$ :Formal class 'status' [package "twitteR"] with 10 slots
.. ..@ text : chr "the cutest puppy followed me on my walk, my grandma won't let me keep it. taking it to the pound sadface"
.. ..@ favorited : logi FALSE
.. ..@ replyToSN : chr(0)
.. ..@ created : chr "Wed, 16 Jun 2010 19:04:01 +0000"
.. ..@ truncated : logi FALSE
.. ..@ replyToSID : num(0)
.. ..@ id : num 1.63e+10
.. ..@ replyToUID : num(0)
.. ..@ statusSource: chr "<a href="http://blackberry.com/twitter" rel="nofollow">Twitter for BlackBerry®</a>"
.. ..@ screenName : chr "iamsweaters"
So I think the data.frame of puppy should have column names like:
- text
- favorited
- replytoSN
- created
- truncated
- replytoSID
- id
- replytoUID
- statusSource
- screenName
I use this code I found from http://blog.ouseful.info/2011/11/09/getting-started-with-twitter-analysis-in-r/ a while ago:
#get data
tws<-searchTwitter('#keyword',n=10)
#make data frame
df <- do.call("rbind", lapply(tws, as.data.frame))
#write to csv file (or your RODBC code)
write.csv(df,file="twitterList.csv")
I know this is an old question, but still, here is what I think is a "modern" way to solve this. Just use the function twListToDF:
gvegayon <- getUser("gvegayon")
timeline <- userTimeline(gvegayon,n=400)
tl <- twListToDF(timeline)
Hope it helps
Try this:
library(plyr) # ldply() comes from plyr
ldply(searchTwitter("#rstats", n=100), text)
twitteR returns an S4 class, so you need to either use one of its helper functions, or deal directly with its slots. You can see the slots by using unclass(), for instance:
unclass(searchTwitter("#rstats", n=100)[[1]])
These slots can be accessed directly as I do above by using the related functions (from the twitteR help: ?statusSource):
text Returns the text of the status
favorited Returns the favorited information for the status
replyToSN Returns the replyToSN slot for this status
created Retrieves the creation time of this status
truncated Returns the truncated information for this status
replyToSID Returns the replyToSID slot for this status
id Returns the id of this status
replyToUID Returns the replyToUID slot for this status
statusSource Returns the status source for this status
As I mentioned, it's my understanding that you will have to specify each of these fields yourself in the output. Here's an example using two of the fields:
> head(ldply(searchTwitter("#rstats", n=100),
function(x) data.frame(text=text(x), favorited=favorited(x))))
text
1 @statalgo how does that actually work? does it share mem between #rstats and postgresql?
2 @jaredlander Have you looked at PL/R? You can call #rstats from PostgreSQL: http://www.joeconway.com/plr/.
3 @CMastication I was hoping for a cool way to keep data in a DB and run the normal #rstats off that. Maybe a translator from R to SQL code.
4 The distribution of online data usage: AT&T has recently announced it will no longer http://goo.gl/fb/eTywd #rstat
5 @jaredlander not that I know of. Closest is sqldf package which allows #rstats and sqlite to share mem so transferring from DB to df is fast
6 @CMastication Can #rstats run on data in a DB?Not loading it in2 a dataframe or running SQL cmds but treating the DB as if it wr a dataframe
favorited
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
You could turn this into a function if you intend on doing it frequently.
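Such a function might look something like the sketch below. It reads slots directly rather than going through the accessor functions, and it uses a toy S4 class in place of twitteR's status class so that it runs without a Twitter connection. slots_to_df and toy_status are made-up names.

```r
library(methods)

# Toy S4 class standing in for twitteR's status class (three slots only)
setClass("toy_status",
         representation(text = "character", favorited = "logical",
                        replyToSN = "character"))

# Hypothetical helper: one row per object, one column per requested slot
slots_to_df <- function(objs, fields) {
  rows <- lapply(objs, function(obj) {
    vals <- lapply(fields, function(f) {
      v <- slot(obj, f)
      if (length(v) == 0) NA else v  # empty slots such as chr(0) become NA
    })
    names(vals) <- fields
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

tweets <- list(
  new("toy_status", text = "first",  favorited = FALSE,
      replyToSN = character(0)),
  new("toy_status", text = "second", favorited = TRUE,
      replyToSN = "someone")
)
df <- slots_to_df(tweets, c("text", "favorited", "replyToSN"))
print(df)
```

Replacing zero-length slots with NA matters because data frames cannot mix columns of length 0 and length 1, and slots like replyToSN are often empty.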
For those who run into the same problem I did, which was an error saying
Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double'
I simply changed the word text in
ldply(searchTwitter("#rstats", n=100), text)
to statusText, like so:
ldply(searchTwitter("#rstats", n=100), statusText)
Just a friendly heads-up :P
Here is a nice function to convert it into a DF.
TweetFrame<-function(searchTerm, maxTweets)
{
tweetList<-searchTwitter(searchTerm,n=maxTweets)
return(do.call("rbind",lapply(tweetList,as.data.frame)))
}
Use it as:
tweets <- TweetFrame(" ", n)
The twitteR package now includes a function twListToDF that will do this for you.
puppy_table <- twListToDF(puppy)