I am trying to import a dataset from CMS using an API. My code, however, only returns 1,000 of the 155,262 observations. I don't know what I am doing wrong. Another user posted a similar problem, but regrettably, I still cannot figure it out.
library(jsonlite)
# url for CMS dataset
url <- 'https://data.cms.gov/data-api/v1/dataset/3cc6ad89-5cc0-4071-91e1-2a91aff79975/data?'
# read url and convert to data.frame
document <- fromJSON(url)
This is the link to the website on CMS: https://data.cms.gov/provider-characteristics/hospitals-and-other-facilities/provider-of-services-file-hospital-non-hospital-facilities. I am interested in accessing the POS file for Q4 2021. Thanks for your help.
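In case it helps, the data endpoint seems to return one page of results at a time (hence the 1,000 rows). Below is a minimal paging sketch, assuming the endpoint accepts size and offset query parameters; check the data.cms.gov API documentation for the exact paging parameters before relying on this.
library(jsonlite)
# Sketch of a paging loop; size/offset are assumptions to verify against
# the data.cms.gov API documentation.
base_url <- "https://data.cms.gov/data-api/v1/dataset/3cc6ad89-5cc0-4071-91e1-2a91aff79975/data"
page_size <- 1000
offset <- 0
pages <- list()
repeat {
  page <- fromJSON(paste0(base_url, "?size=", page_size, "&offset=", offset))
  if (NROW(page) == 0) break                 # no more rows returned
  pages[[length(pages) + 1]] <- page
  offset <- offset + page_size
}
document <- do.call(rbind, pages)            # should hold all 155,262 rows if paging works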
Related
I am really struggling to understand how this newly released API works. Can someone please help me turn it into a useful data frame in R? My res looks like the below (edited):
library(httr)
library(jsonlite)
library(dplyr)
#GET Function
res = GET("https://comtradeapi.un.org/data/v1/get/C/A/HS?reporterCode=826&period=2020&partnerCode=000&partner2Code=000&cmdCode=TOTAL&flowCode=M HTTP/1.1&subscription-key=6509aa2a08d54ca7b47a2fece2ab5bee")
df= fromJSON(rawToChar(res$content)) #this doesn't work
By pasting your URL into a browser we get:
{"elapsedTime":"0.02 secs","count":0,"data":[],"error":""}
So there appears to be an error with the result itself. Also, I'd strongly advise against publishing your secret API key, as it allows others to access the data you're subscribing to!
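One thing worth checking: the request string above contains a stray " HTTP/1.1" fragment inside the flowCode parameter, which gets sent as part of the query. A minimal sketch using httr's query list instead of a hand-built URL (parameter values taken from the question; the key is a placeholder you should replace with your own):
library(httr)
library(jsonlite)
res <- GET(
  "https://comtradeapi.un.org/data/v1/get/C/A/HS",
  query = list(reporterCode = "826", period = "2020",
               partnerCode = "000", partner2Code = "000",
               cmdCode = "TOTAL", flowCode = "M",
               `subscription-key` = "YOUR_KEY")   # keep your key private
)
stop_for_status(res)
parsed <- fromJSON(rawToChar(res$content))
# only build a data frame if the API actually returned rows
if (parsed$count > 0) df <- as.data.frame(parsed$data)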
I am scraping OpenFDA (https://open.fda.gov/apis). I know my particular query has 6,974 hits, which are organized into 100 hits per page (the maximum download of the API). I am trying to use R (rvest, jsonlite, purrr, tidyverse, httr) to download all of this data.
I checked the website information with curl in the terminal and downloaded a couple of pages to see the pattern.
I've tried a few lines of code and I can only get 100 entries to download. This code seems to work decently, but it will only pull 100 entries, i.e. one page, skipping the first entries (which I can pull down and merge later). Here is the code that I have used:
url_json <- "https://api.fda.gov/drug/label.json?api_key=YOULLHAVETOGETAKEY&search=grapefruit&limit=100&skip=6973"
raw_json <- httr::GET(url_json, httr::accept_json())
data<- httr::content(raw_json, "text")
my_content_from_json <- jsonlite::fromJSON(data)
dplyr::glimpse(my_content_from_json)
dataframe1 <- my_content_from_json$results
tibble::view(dataframe1)
SOLUTION below in the responses. Thanks!
From the comments:
It looks like the API parameters skip and limit work better than the search_after parameter. According to the documentation (open.fda.gov/apis/query-parameters), they allow pulling down 1,000 entries at a time. Passing these parameters in the query string, an example URL would be
https://api.fda.gov/drug/label.json?api_key=YOULLHAVETOGETAKEY&search=grapefruit&limit=1000&skip=0
after which you can loop to get the remaining entries with skip=1000, skip=2000, etc. as you've done above.
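Building on that, here is a sketch of such a loop (the key string is the placeholder from the question, 6,974 is the hit count mentioned above, and binding the pages assumes each page parses to a compatible data frame):
library(httr)
library(jsonlite)
library(dplyr)
# Pull 1,000 records per request and bind the pages together.
base_url <- "https://api.fda.gov/drug/label.json?api_key=YOULLHAVETOGETAKEY&search=grapefruit"
total <- 6974
pages <- lapply(seq(0, total - 1, by = 1000), function(skip) {
  res <- httr::GET(paste0(base_url, "&limit=1000&skip=", skip), httr::accept_json())
  httr::stop_for_status(res)
  jsonlite::fromJSON(httr::content(res, "text"))$results
})
all_results <- dplyr::bind_rows(pages)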
I am looking for a way to automatically 'translate' the shortened URLs from Twitter to the original URL.
I scraped a couple of twitter timelines using following code:
library(twitteR)                    # userTimeline()
library(dplyr); library(purrr)      # tbl_df(), map_df()
tweets <- userTimeline("exampleuser", n = 3200, includeRts = TRUE)
tweets_df <- tbl_df(map_df(tweets, as.data.frame))
Then I separated the shortened URLs from the rest of the tweet text, so that I have a separate column in my dataframe, which contains only the shortened URL.
Now I am looking for a way to automatically scrape all these URLs, which redirect to various websites, and get a new column with the original (i.e. unshortened) URL.
Does anyone have an idea how I can do this in R?
Thanks,
Manuel
You can use the httr package.
httr::HEAD("URL") will give you a response in the first line and then you can do the usual cleaning to get just the URL-s.
I'm trying to web scrape historical equity data from the NSE website:
https://www.nseindia.com/products/content/equities/equities/eq_security.htm
I tried to web scrape data for a company (symbol name) named RELIANCE over the past 2 weeks (time period) and transfer the contents to a CSV file.
library(rvest)
url <- "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol=RELIANCE&segmentLink=3&symbolCount=2&series=ALL&dateRange=15days&fromDate=&toDate=&dataType=PRICEVOLUMEDELIVERABLE"
page_html <- read_html(url)
data <- html_nodes(page_html, "p")   # grab the <p> nodes
data <- html_text(data)              # extract their text
write.csv(data, "scrapedData.csv", row.names = FALSE)
It just returns character(0) (empty).
I know that there is an option to download the CSV file from the website, but I want an automated R script for getting the data.
I know that other packages such as quantmod exist for getting historical stock data, but I need it from this website because it has useful information such as TTQ, turnover, etc.
Why reinvent the wheel?
You can use the nsepy Python module:
https://github.com/swapniljariwala/nsepy
Similar alternatives exist as well.
You just need to use this:
from nsepy import get_history
from datetime import date
data = get_history(symbol="SBIN", start=date(2015,1,1), end=date(2015,1,31))
How do I properly download and load an OData dataset in R?
I tried the OData package, and even though the documentation is really simple, I am sure I am missing something trivial.
I am trying to download and parse this dataset in R, but I cannot figure out how it is structured. Is it an XML format? If so, what is the reason for a separator argument?
library(OData)
#What is the correct argument for the separator?
downloadResourceCsv("https://data.nasa.gov/OData.svc/gh4g-9sfh", sep = "")
As hrbrmstr suggests, use the RSocrata package.
For example, go to the dataset page, click on "..." in the top right, click on "Access this Dataset via OData", click on "Copy" to copy the OData endpoint, and save it:
url <- "https://data.cdc.gov/api/odata/v4/9bhg-hcku"
library(RSocrata)
dat <- read.socrata(url)
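The same pattern should work for the NASA dataset from the question, assuming its OData endpoint follows the same Socrata URL scheme; copy the real endpoint via the "Access this Dataset via OData" link on the dataset page rather than trusting the guess below.
library(RSocrata)
# Assumed endpoint for the dataset in the question; verify it on the dataset page.
nasa_url <- "https://data.nasa.gov/api/odata/v4/gh4g-9sfh"
meteorites <- read.socrata(nasa_url)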
It's XML format, so download it first.
Try using the httr package.
library(httr)
r <- GET("http://httpbin.org/get")
See the httr quickstart guide for an introduction.
After downloading, use the XML package and xmlParse() to parse the result.
Thank you
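For completeness, a rough sketch of that workflow against the endpoint from the question (the feed structure isn't verified here, so inspect the parsed document before extracting nodes):
library(httr)
library(XML)
# Fetch the OData feed (an Atom/XML document) and parse it.
res <- GET("https://data.nasa.gov/OData.svc/gh4g-9sfh")
stop_for_status(res)
doc <- xmlParse(content(res, "text", encoding = "UTF-8"), asText = TRUE)
# The records should sit in the feed's entry nodes; pull them out with
# XML::getNodeSet() once you've inspected the document's structure.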