Data from httr POST request is a long string instead of a table

I'm receiving the data I'm requesting, but I don't understand how to extract it properly. Here is the POST request:
library(httr)
url <- "http://tools-cluster-interface.iedb.org/tools_api/mhci/"
body <- list(method = "recommended", sequence_text = "SLYNTVATLYCVHQRIDV",
             allele = "HLA-A*01:01,HLA-A*02:01", length = "8,9")
data <- httr::POST(url, body = body, encode = "form", verbose())
If I print the data with:
data
...it shows the request details followed by a nicely formatted table. However, if I try to extract the content with:
httr::content(data, "text")
this returns a single string containing all the values of the original table. The output appears to be delimited by escape sequences such as "\t" and "\n", but I couldn't str_replace() them away or otherwise tease the table out properly.
I'm new to making requests in R (and httr) and assume I'm missing an httr option. Any advice?
API details here: http://tools.iedb.org/main/tools-api/

The best way to do this is to specify the MIME type:
content(data, type = 'text/tab-separated-values')
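If you would rather parse the raw text yourself, here is a minimal sketch reusing the data object from the question; the "\t" and "\n" in the printed string are tab and newline escapes, not literal backslashes:
library(httr)

txt <- content(data, as = "text", encoding = "UTF-8")
# read.delim() splits the embedded tabs and newlines back into
# columns and rows, matching what content() gives you when told the MIME type
tbl <- read.delim(text = txt, stringsAsFactors = FALSE)
head(tbl)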

Related

Can't connect to Qualtrics API using httr package

I am trying to connect to the Qualtrics API using the "httr" package in RStudio Cloud to download mailing lists. After reviewing the API documentation I was still unable to download the data, getting the following error when running the code:
"{"meta":{"httpStatus":"400 - Bad Request","error":{"errorMessage":"Expected authorization in headers, but none provided.","errorCode":"ATP_2"},"requestId":"8fz33cca-f9ii-4bca-9288-5tc69acaea13"}}"
This makes no sense to me, since I am using a token that inherits auth from its parent. Here is the code:
install.packages("httr")
library(httr)
directoryId<-"POOL_XXXXX"
mailingListId <- "CG_XXXXXX"
apiToken<-"XXXX"
url<- paste("https://iad1.qualtrics.com/API/v3/directories/",directoryId,
"/mailinglists/",mailingListId,"/optedOutContacts", sep = "")
response <- VERB("GET",url, add_headers('X_API-TOKEN' = apiToken),
content_type("application/octet-stream"))
content(response, "text")
Any help will be appreciated.
Thanks in advance.
Your call to httr::VERB splits the API token and the content type into two separate arguments to the function, but they should be passed together in a vector to a single "config" argument. Note that content_type here is not being used as httr's content_type() helper; it is just the name of an element in that header vector. This should work:
response <- VERB("GET", url, add_headers(c(
'X_API-TOKEN' = apiToken,
'content_type' = "application/octet-stream")))
Note that mailing lists will be returned by Qualtrics as lists that include both a "meta" element and a "result" element, each of which will itself be a list. If the list is long, only the first 100 contacts will be returned; an element response$result$nextpage will provide the URL required to access the next 100 results. The qualtRics::fetch_mailinglist() function does not work with XM Directory contact lists (which is probably why you got a 500 error when using it), but its code for unpacking the list and looping over each "nextpage" element might be helpful.
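For reference, a minimal pagination sketch; the result$elements and result$nextPage field names are assumptions based on the usual Qualtrics v3 response shape (and note the header Qualtrics documents is X-API-TOKEN, with a hyphen), so check both against your actual responses:
library(httr)

fetch_all_contacts <- function(url, apiToken) {
  contacts <- list()
  while (!is.null(url)) {
    resp <- GET(url, add_headers("X-API-TOKEN" = apiToken))
    stop_for_status(resp)
    parsed <- content(resp, as = "parsed", type = "application/json")
    # assumed shape: result$elements holds this page's contacts,
    # result$nextPage holds the next URL (NULL on the last page)
    contacts <- c(contacts, parsed$result$elements)
    url <- parsed$result$nextPage
  }
  contacts
}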

How to extract data from JSON with curl

I have a curl request that authenticates with a username and password.
library(curl)
library(jsonlite)
link1 <- paste0("The URL of the required data source")
# assumed setup: 'h' is a curl handle carrying the username/password
h <- new_handle()
handle_setopt(h, userpwd = "username:password")
resp <- curl_fetch_memory(link1, h)
zips <- jsonlite::fromJSON(rawToChar(resp$content, multiple = FALSE))
The expected result is a list within a list, but I am getting a list with a single row of characters all merged together.
Am I approaching this with the wrong logic? Any leads would be greatly appreciated.
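For what it's worth, jsonlite::fromJSON() simplifies JSON arrays into vectors and data frames by default, which can make nested structures look like one merged block. A minimal sketch of keeping the list-within-list shape, reusing resp from above:
# simplifyVector = FALSE preserves the raw nested lists instead of
# collapsing JSON arrays into vectors and data frames
zips <- jsonlite::fromJSON(rawToChar(resp$content), simplifyVector = FALSE)
str(zips, max.level = 2)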

Why is the R httr content() call not producing the expected response body?

I am using R to call the Cybersource API. When I use a GET request I obtain a successful response of 200, but when I read the body of the response, rather than getting the CSV data back I get a path to a CSV file. I wonder what I am doing wrong.
content(request) gives
"/space/download_reports/output/dailyreports/reports/2018/10/27/testrest/TRRReport-7931d82d-cf4a-71fa-e053-a2588e0ab27a.csv"
The result of content(request) should be the data, not a file path.
Here is the code:
library(httr)
merchant <- 'testrest'
vcdate <- 'Wed, 29 May 2019 10:09:48 GMT'
ho <- 'apitest.cybersource.com'
URL <- 'https://apitest.cybersource.com/reporting/v3/report-downloads?organizationId=testrest&reportDate=2018-10-27&reportName=TRRReport'
sign <- 'keyid="08c94330-f618-42a3-b09d-e1e43be5efda", algorithm="HmacSHA256", headers="host (request-target) v-c-merchant-id", signature="7cr6mZMa1oENhJ5NclGsprmQxwGlo1j3VjqAR6xngxk="'
req <- GET(URL, add_headers(.headers = c('v-c-merchant-id' = merchant,
                                         'v-c-date' = vcdate,
                                         'Host' = ho,
                                         'Signature' = sign)))
content(req)
Here is the Cybersource test API where you can verify the data produced:
https://developer.cybersource.com/api/reference/api-reference.html
I am trying to download a report under the Reporting tab.
Honestly I'm not exactly sure what's going on here, but I do think I was able to get something to work.
The API seems to like to return data in application/hal+json format, which is not one that httr normally requests. You can say you'll accept anything with accept("*"), so you can do your request with:
req <- GET(URL, add_headers('v-c-merchant-id' = merchant,
                            'v-c-date' = vcdate,
                            'Host' = ho,
                            'Signature' = sign), accept("*"))
Now we are actually getting back the data we want, but httr doesn't know how to parse it automatically, so we need to parse it ourselves. This seems to do the trick:
readr::read_csv(rawToChar(content(req)), skip = 1)
There seems to be a header row, which we skip with skip = 1, and then we parse the rest as a CSV file with readr::read_csv().
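A hedged variant of the same idea: pull the body as raw bytes explicitly, so you can eyeball where the header row ends before committing to skip = 1.
library(httr)

raw_body <- content(req, as = "raw")  # bytes, regardless of Content-Type
txt <- rawToChar(raw_body)
cat(substr(txt, 1, 200))              # inspect the first couple of lines

report <- readr::read_csv(txt, skip = 1)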

Set cookies with rvest

I would like to programmatically export the records available at this website. To do this manually, I would navigate to the page, click export, and choose the csv.
I tried copying the link from the export button, but that only works as long as I have a cookie (I believe), so a wget or httr request results in the HTML site instead of the file.
I've found some help in an issue on the rvest GitHub repo, but ultimately, like the issue's author, I can't quite figure out how to save the cookie in an object and use it in a request.
Here is where I'm at:
library(httr)
library(rvest)
apoc <- html_session("https://aws.state.ak.us/ApocReports/Registration/CandidateRegistration/CRForms.aspx")
headers <- headers(apoc)
GET(url = "https://aws.state.ak.us/ApocReports/Registration/CandidateRegistration/CRForms.aspx?exportAll=False&exportFormat=CSV&isExport=True",
add_headers(headers)) # how can I take the output from headers in httr and use it as an argument in GET from httr?
I have checked the robots.txt and this is permissible.
You can get the __VIEWSTATE and __VIEWSTATEGENERATOR values from the hidden input fields when you GET https://aws.state.ak.us/ApocReports/Registration/CandidateRegistration/CRForms.aspx, and then reuse those __VIEWSTATE and __VIEWSTATEGENERATOR values in your subsequent POST query and the GET for the csv.
options(stringsAsFactors = FALSE)
library(httr)
library(curl)
library(xml2)

url <- 'https://aws.state.ak.us/ApocReports/Registration/CandidateRegistration/CRForms.aspx'

# get session state: the __VIEWSTATE and __VIEWSTATEGENERATOR hidden inputs
req <- GET(url)
req_html <- read_html(rawToChar(req$content))
fields <- c("__VIEWSTATE", "__VIEWSTATEGENERATOR")
viewheaders <- lapply(fields, function(x) {
  xml_attr(xml_find_first(req_html, paste0(".//input[@id='", x, "']")), "value")
})
names(viewheaders) <- fields

# post request; you can get the list of form fields using tools like Fiddler
params <- c(viewheaders,
            list(
              "M$ctl19" = "M$UpdatePanel|M$C$csfFilter$btnExport",
              "M$C$csfFilter$ddlNameType" = "Any",
              "M$C$csfFilter$ddlField" = "Elections",
              "M$C$csfFilter$ddlReportYear" = "2017",
              "M$C$csfFilter$ddlStatus" = "Default",
              "M$C$csfFilter$ddlValue" = -1,
              "M$C$csfFilter$btnExport" = "Export"))
resp <- POST(url, body = params, encode = "form")
print(resp$status_code)
resptext <- rawToChar(resp$content)
#writeLines(resptext, "apoc.html")

# get response, i.e. download csv
url <- "https://aws.state.ak.us/ApocReports/Registration/CandidateRegistration/CRForms.aspx?exportAll=True&exportFormat=CSV&isExport=True"
req <- GET(url, body = params)
read.csv(text = rawToChar(req$content))
You might need to play around with the inputs/code to get what you want precisely.
Here is another similar solution using RCurl:
how-to-login-and-then-download-a-file-from-aspx-web-pages-with-r
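On the cookie angle the question asked about: httr stores cookies on a handle and re-sends them when the handle is reused, so a sketch like the following may be enough on its own. This is untested against this particular site; the path and query parameters are taken from the question's export URL:
library(httr)

h <- handle("https://aws.state.ak.us")
path <- "/ApocReports/Registration/CandidateRegistration/CRForms.aspx"

# the first request establishes the ASP.NET session cookie on the handle
GET(handle = h, path = path)

# the same handle sends that cookie back with the export request
csv_resp <- GET(handle = h, path = path,
                query = list(exportAll = "False", exportFormat = "CSV",
                             isExport = "True"))
read.csv(text = content(csv_resp, as = "text", encoding = "UTF-8"))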

Successfully coercing paginated JSON object to R dataframe

I am trying to convert JSON pulled from an API into a data frame in R, so that I can use and analyze the data.
#load needed packages
require(RJSONIO)
require(httr)
#request a list of companies currently fundraising using httr
r <- GET("https://api.angel.co/1/startups?filter=raising")
#convert to text object using httr
raise <- content(r, as = "text")
#convert to list using RJSONIO
new <- fromJSON(raise)
Once I get this object, new, I am having a really difficult time parsing the list into a data frame. The JSON has this structure:
{
  "startups": [
    {
      "id": 6702,
      "name": "AngelList",
      "quality": 10,
      "...": "...",
      "fundraising": {
        "round_opened_at": "2013-07-30",
        "raising_amount": 1000000,
        "pre_money_valuation": 2000000,
        "discount": null,
        "equity_basis": "equity",
        "updated_at": "2013-07-30T08:14:40Z",
        "raised_amount": 0.0
      }
    }
  ],
  "total": 4268,
  "per_page": 50,
  "page": 1,
  "last_page": 86
}
I've tried looking at individual elements within new using code like:
new$startups[[1]]$fundraising$raised_amount
to pull the raised_amount for the first element listed. However, I don't know how to apply this to the whole list of 4,268 startups. In particular, I can't figure out how to deal with the pagination: I only ever seem to get one page of startups (i.e. 50 of them) at most.
I tried using a for loop to get the list of startups and put each value into a row of a data frame one by one. The example below shows this for just one column, but of course I could do it for all of them by expanding the loop. However, I can't get any content from any of the other pages.
df1 <- as.data.frame(1:length(new$startups))
df1$raiseamnt <- 0
for (i in 1:length(new$startups)) {
  df1$raiseamnt[i] <- new$startups[[i]]$fundraising$raised_amount
}
Edit: Thank you for the mention of pagination. I will look through the documents more carefully and see if I can figure out how to correctly structure the API calls to get the different pages. I will update this post if/when I figure that out!
You may find the jsonlite package useful. Below is a quick example.
library(jsonlite)
library(httr)
#request a list of companies currently fundraising using httr
r <- GET("https://api.angel.co/1/startups?filter=raising")
#convert to text object using httr
raise <- content(r, as="text")
#parse JSON
new <- fromJSON(raise)
head(new$startups$id)
[1] 229734 296470 237516 305916 184460 147385
Note, however, that while this package (or the one in the question) helps parse the JSON string, the surrounding data structure still has to be built appropriately so that each element can be added without problems; that part is up to the developer.
For pagination, this appears to be a REST API, so the paging condition is normally passed in the URL (e.g. https://api.angel.co/1/startups?filter=raising&page=2, which matches the page and last_page fields in the response above); it should be documented in the API docs.
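For example, here is a hedged pagination sketch built on the page and last_page fields shown in the question's JSON, assuming the API accepts a page query parameter; jsonlite::rbind_pages() is designed for combining exactly this kind of paged data:
library(httr)
library(jsonlite)

base_url <- "https://api.angel.co/1/startups?filter=raising"

first <- fromJSON(content(GET(base_url), as = "text"))
pages <- vector("list", first$last_page)
pages[[1]] <- first$startups

# walk the remaining pages, collecting each page's startups data frame
for (p in 2:first$last_page) {
  resp <- GET(paste0(base_url, "&page=", p))
  pages[[p]] <- fromJSON(content(resp, as = "text"))$startups
}

startups <- rbind_pages(pages)  # one data frame, one row per startup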
The httr library already imports jsonlite (see the httr documentation). A more elegant way, with better-formatted output, is:
library(httr)
resp <- httr::GET("https://api.angel.co/1/startups?filter=raising", accept_json())
cont <- content(resp, as = "parsed", type = "application/json")
#explicit conversion to data frame
dataFrame <- data.frame(cont)
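If a flat table is the goal, a hedged alternative is to parse the text with jsonlite and let it flatten the nested fundraising object into columns:
library(httr)
library(jsonlite)

resp <- GET("https://api.angel.co/1/startups?filter=raising", accept_json())
txt <- content(resp, as = "text", encoding = "UTF-8")

# flatten = TRUE turns nested objects into dot-separated columns,
# e.g. fundraising.raised_amount
df <- fromJSON(txt, flatten = TRUE)$startups
head(df)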
