I am scraping OpenFDA (https://open.fda.gov/apis). I know my particular inquiry has 6974 hits, which is organized into 100 hits per page (max download of the API). I am trying to use R (rvest, jsonlite, purr, tidyverse, httr) to download all of this data.
I checked the website information with curl in terminal and downloaded a couple of sites to see a pattern.
I've tried a few lines of code and I can only get 100 entries to download. This code seems to work decently, but it will only pull 100 entries, so one page To skip the fisrt 100, which I can pull down and merge later, here is the code that I have used:
url_json <- "https://api.fda.gov/drug/label.json?api_key=YOULLHAVETOGETAKEY&search=grapefruit&limit=100&skip=6973"
raw_json <- httr::GET(url_json, accept_json())
data<- httr::content(raw_json, "text")
my_content_from_json <- jsonlite::fromJSON(data)
dplyr::glimpse(my_content_from_json)
dataframe1 <- my_content_from_json$results
view(dataframe1)
SOLUTION below in the responses. Thanks!
From the comments:
It looks like the API parameters skip and limit work better than the search_after parameter. They allow pulling down 1,000 entries simultaneously according to the documentation (open.fda.gov/apis/query-parameters). To provide these parameters into the query string, an example URL would be
https://api.fda.gov/drug/label.json?api_key=YOULLHAVETOGETAKEY&search=grapefruit&limit=1000&skip=0
after which you can loop to get the remaining entries with skip=1000, skip=2000, etc. as you've done above.
Related
I need to find some historical time series for Stocks and have the result in R.
I tried already the package "quantmod", but unfortunately most of the stocks are not covered.
I found EXCELS "STOCKPRICEHISTORY" to yield good results.
Hence, to allow more solutions, I phrase my question more open:
How do I get from a table, that contains Stocks (Ticker), Startdate and Endate to a list in R which contains each respective stock and its stockprice timeseries?
My Startingpoint looks like this:
My aim at the very End is to have something like this:
(Its also ok if I have every single stock price timeseries as csv)
My ideas so far:
Excel VBA Solution - 1
Write a macro, that executes EXCELS "STOCKHISTORY" function on each of these stocks, and writes them as csv or so? Then after that, read them in and create a list in R
Excel VBA Solution - 2
Write a macro, that executes EXCELS "STOCKHISTORY" function on each of these stocks,
each one in a new worksheet? Bad Idea, since there are more than 4000 Stocks..
R Solution
(If possible) call "STOCKHISTORY" function from R directly (?)
Any suggestions on how to takle this?
kind regards
I would recommend using an API, especially over trying to connect to Excel via VBA. There are many that are free, but require and API key from their website. For example, alphavantage.
library(tidyverse)
library(jsonlite)
library(httr)
symbol = "IBM"
av_key = "demo"
url <- str_c("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=", symbol ,"&apikey=", av_key, "&datatype=csv")
d <- read_csv(url)
d %>% head
Cred and other options: https://quantnomad.com/2020/07/06/best-free-api-for-historical-stock-data-examples-in-r/
I'm trying to webscrape in R from webpages such as these. But the html is only 50 lines so I'm assuming the numbers are hidden in a javascript file or on their server. I'm not sure how to find the numbers I want (e.g., the enrollment number under student population).
When I try to use rvest, as in
num <- school_webpage %>%
html_elements(".number no-mrg-btm") %>%
html_text()
I get an error that says "could not find function "html_elements"" even though I've installed and loaded rvest.
What's my best strategy for getting those various numbers and why am I getting that error message? Thnx.
That data is coming from an API request you can find in the browser network tab. It returns json. Make a request direct to that page (as you don't have a browser to do this based off landing page):
library(jsonlite)
data <- jsonlite::read_json('https://api.caschooldashboard.org/LEAs/01611766000590/6/true')
print(data$enrollment)
I'm using this guide as an example to scrape the time that posts were published to Reddit.
It says to use SelectorGadget tool to bypass learning other languages, so that's what I did.
Although the page on old.reddit.com shows 100 posts (so 100 different times should be recorded), only 25 different time values are actually extracted from my code. Here's what my code looks like:
library(rvest)
url <- 'https://old.reddit.com/'
rawdata <- read_html(url)
rawtime <- html_nodes(rawdata, '.live-timestamp')
#".live-timestamp" was obtained using the Chrome extension "SelectorGadget"
finalresult <- bind_rows(lapply(xml_attrs(rawtime), function(x) data.frame(as.list(x), stringsAsFactors=FALSE)))
Alternatively, you could use PRAW to get the information from Reddit. This is a particular solution for your problem but might work.
https://praw.readthedocs.io/en/latest/
And in the subreddit r/redditdev
You need to be logged in or use the ?limit=100 parameter in order to get 100 items in a listing.
See the API documentation for more information:
limit: the maximum number of items desired (default: 25, maximum: 100)
I'm trying in R to get the list of tickers from every exchange covered by Quandl.
There are 2 ways:
1) For every exchange the provide the zipped csv with all ticker. The URL looks like this (XXXXXXXXXXXXXXXXXXXX - API key, YYY - code of exchange):
https://www.quandl.com/api/v3/databases/YYY/codes?api_key=XXXXXXXXXXXXXXXXXXXX
This looks pretty promissing, but I was not able to read the file with read.table or e.g fread. Don't know why. Is it because of the API key? read.table is supposed to read zip files with no problem.
2) I was able to go further with the 2nd way. They provide URL to the csv of tickers. E.g.:
https://www.quandl.com/api/v3/datasets.csv?database_code=YYY&per_page=100&sort_by=id&page=1&api_key=XXXXXXXXXXXXXXXXXXXX
As you see, URL contains page number. The problem is they only mention it below in text, that you need to run this URL many time (e.g. 56 for LSE) in order to get the full list. I was able to do it like this:
pages <- 1:100 # "100" is taken just to be big enough
Source <- c("LSE","FSE", ...) # vector of exchange codes
QUANDL_API_KEY="XXXXXXXXXXXXXXXXXXXXXXXXXX"
TICKERS = lapply(sprintf(
"https://www.quandl.com/api/v3/datasets.csv?
database_code=%s&per_page=100&sort_by=id&page=%s&api_key=%s",
Source,pages,QUANDL_API_KEY),
FUN=fread,
stringsAsFactors=FALSE)
TICKERS <- do.call(rbind, TICKERS)
The problem is I just put 100 pages, but when R tryies to get the non-existing page (e.g. #57) it delivers an error and do not go further. I was trying to do smth like iferror, but failed.
Could you pls give some hints?
I am using the IBrokers API in R to try to download my current positions in my portfolio on Interactive Brokers. However, I'm having trouble downloading the information by following the API documentation.
I can get this far with the following. This downloads my account information, but it's not a desireable format.
tws <- twsConnect()
reqAccountUpdates(tws)
I trade using the following, but it doesn't work.
twsPortfolioValue(tws)
Ideally, I want a data frame that has the following fields: ticker, shares, execution price.
Is anyone familiar with this API?
Thank you!
You're passing a twsconn object to twsPortfolioValue, but the function needs the output of reqAccountUpdates as its input, as explained in the Details section of ?twsPortfolioValue
Try this:
ac <- reqAccountUpdates(tws)
twsPortfolioValue(ac)