I'm trying to pass multiple VIN numbers to the NHTSA API.
My working solution looks like this:
library(jsonlite)

vins <- c('4JGCB5HE1CA138466','4JGCB5HE1CA138466','4JGCB5HE1CA138466','4JGCB5HE1CA138466','4JGCB5HE1CA138466','4JGCB5HE1CA138466','4JGCB5HE1CA138466','4JGCB5HE1CA138466','4JGCB5HE1CA138466','4JGCB5HE1CA138466','4JGCB5HE1CA138466')

for (i in vins) {
  json <- fromJSON(paste0('https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVinValues/', i, '?format=json'))
  print(json)
}
This solution is very slow. I tried pbapply, but it is just as slow because it still sends one VIN at a time.
There is a batch option that I just can't figure out. Can someone please assist?
Here is my code so far:
library(httr)
library(jsonlite)

data <- list(data = '4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466')
json <- toJSON(list(data = data), auto_unbox = TRUE)  # note: 'json' is built but never actually sent
result <- POST('https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/', body = data)
Output <- content(result)
The VIN string has to be in the following format: vin;vin;vin;vin;
Here is the link to the API documentation: https://vpic.nhtsa.dot.gov/api/ (the last endpoint listed).
Thanks in advance.
UPDATE:
I also tried this, from some other threads, but no luck:
headers = c(
  `Content-Type` = 'application/json'
)
data = '[{"data":"4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466"}]'
r <- httr::POST(url = 'https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/', httr::add_headers(.headers = headers), body = data)
print(r$status_code)
I am getting HTTP status code 200, but the response body reports a server error 500 and contains no data.
I am not sure if this is possible. The batch endpoint is specifically looking for a dictionary to be passed (ruling out string representations). httr states:
body: must be NULL, FALSE, character, raw or list
I tried using the collections library to generate a dict:
data <- Dict$new(list(format = 'json', data = "4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466"))
httr unsurprisingly rejected it as the wrong body type.
I tried using jsonlite to convert with:
data <- jsonlite::toJSON(data)
Yielding:
Error: No method asJSON S3 class: R6
I think this is because the Dict object is an R6 environment.
Reading a string representation of the dictionary into JSON also returns no data:
library(httr)
library(jsonlite)
headers = c(
'Accept' = '*/*',
'Accept-Encoding' = 'gzip, deflate',
'Content-Type' = 'application/x-www-form-urlencoded',
'User-Agent' = 'Mozilla/5.0'
)
data = jsonlite::toJSON('{"format":"json","data":"4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466"}')
r <- httr::POST(url = 'https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/', httr::add_headers(.headers = headers), body = data, encode = 'json')
print(content(r))
If we examine the converted data:
> data
["{\"format\":\"json\",\"data\":\"4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466\"}"]
This is no longer the dictionary structure the server expects.
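For what it's worth, a plain named list already plays the role of a dictionary in R, and jsonlite can serialize it directly; a minimal sketch:
library(jsonlite)

# a named list is R's counterpart to a Python dict
body <- list(format = 'json',
             data = '4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466')

# auto_unbox = TRUE keeps scalar strings as scalars rather than length-1 arrays
toJSON(body, auto_unbox = TRUE)
#> {"format":"json","data":"4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466"}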
So, I am new to R, but it seems like it might be easier to just go with Python, which has a dictionary object and a json library that comfortably handles the string-to-JSON conversion:
import requests, json
url = 'https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/'
data = json.loads('{"format": "json", "data":"4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466"}')
r = requests.post(url, data=data)
print(r.json())
Or passing a dict directly:
import requests
url = 'https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/'
data = {'format': 'json', 'data':'4JGCB5HE1CA138466;4JGCB5HE1CA138466;4JGCB5HE1CA138466'}
r = requests.post(url, data=data).json()
print(r)
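Note that requests.post(url, data=dict) sends the fields form-encoded (application/x-www-form-urlencoded), not as JSON, so the closest httr translation is a named list body with encode = 'form'. A hedged sketch of what that looks like back in R (assuming the batch response carries its rows in a Results field, as the single-VIN endpoint does):
library(httr)
library(jsonlite)

vins <- c('4JGCB5HE1CA138466', '4JGCB5HE1CA138466', '4JGCB5HE1CA138466')

# requests.post(url, data = dict) form-encodes its fields, so mirror that
# with a named list body and encode = 'form'
r <- POST('https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVINValuesBatch/',
          body = list(format = 'json',
                      data = paste(vins, collapse = ';')),
          encode = 'form')
results <- fromJSON(content(r, as = 'text', encoding = 'UTF-8'))$Results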
I'm trying to extract the recipe data from the Edamam API. All of my GET/POST requests fail.
I've tried to extract the data through Python, which works seamlessly, but R gives "must not be NULL".
Here is the code:
library(httr)
library(jsonlite)
# Store the ID and Key in variables
APP_ID = "XXXXXX"
APP_KEY = "XXXXXXXXXX"
# Note: those are not real credentials;
# replace them with the ID and key you received upon registration
# Setting up the request URL
api_endpoint = "https://api.edamam.com/api/recipes/v2"
url = paste0(api_endpoint, "?app_id=", APP_ID, "&app_key=", APP_KEY)
#Defining the header (as stated in the documentation)
headers = list(
`Content-type` = 'application/json'
)
#Defining the payload of the request (the data we actually want processed)
recipe = list(
`mealType` = 'breakfast'
)
#Submitting the request (each of the following attempts fails)
tmp <- POST(url, body = recipe, encode = "json")
tmp <- GET(url)
tmp <- httr::POST(url, body = recipe, verbose(), content_type("application/json"))
appData <- content(tmp)
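For what it's worth, the v2 Recipe Search endpoint is, as far as I can tell, a GET API that takes its filters as query parameters rather than a JSON POST body, which would explain why the POST attempts fail. A hedged sketch (the type = "public" parameter is what the v2 docs appear to require; treat it as an assumption):
library(httr)
library(jsonlite)

resp <- GET("https://api.edamam.com/api/recipes/v2",
            query = list(type = "public",      # assumed required by v2
                         app_id = APP_ID,
                         app_key = APP_KEY,
                         mealType = "breakfast"))
stop_for_status(resp)
appData <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))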
I am having problems downloading data into R directly from the link below:
kaggle.com/c/house-prices-advanced-regression-techniques/data
I tried with this code:
data <- read.csv("https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=test.csv", skip = 1)
I tried most of the options listed here:
Access a URL and read Data with R
However, I only get an HTML table, not the relevant house-price data from the website. Not sure what I am doing wrong.
Thanks
Here's a simple example post on Kaggle showing how to achieve your goal; the code below is taken from that example.
Create a verified account
Log in
Go to your account (click the top right -> account)
Click "Create new API token"
Place the file somewhere sensible that you can access from R
library(httr)
library(jsonlite)
kgl_credentials <- function(kgl_json_path = "~/.kaggle/kaggle.json"){
  # returns user credentials from the kaggle json file
  user <- fromJSON(kgl_json_path, flatten = TRUE)
  return(user)
}
kgl_dataset <- function(ref, file_name, type = "dataset", kgl_json_path = "~/.kaggle/kaggle.json"){
  # ref: depends on 'type':
  #   - dataset: "sudalairajkumar/novel-corona-virus-2019-dataset"
  #   - competition: competition ID, e.g. 8587 for "competitive-data-science-predict-future-sales"
  # file_name: specific file wanted, e.g. "covid_19_data.csv"
  .kaggle_base_url <- "https://www.kaggle.com/api/v1"
  user <- kgl_credentials(kgl_json_path)
  if(type == "dataset"){
    # dataset
    url <- paste0(.kaggle_base_url, "/datasets/download/", ref, "/", file_name)
  }else if(type == "competition"){
    # competition
    url <- paste0(.kaggle_base_url, "/competitions/data/download/", ref, "/", file_name)
  }
  # call
  rcall <- httr::GET(url, httr::authenticate(user$username, user$key, type = "basic"))
  # content type
  content_type <- httr::headers(rcall)$`content-type`
  if(grepl("zip", content_type)){
    # download and unzip
    temp <- tempfile()
    download.file(rcall$url, temp)
    data <- read.csv(unz(temp, file_name))
    unlink(temp)
  }else{
    # else read as text -- note: this could be coded more robustly
    data <- content(rcall, type = "text/csv", encoding = "ISO-8859-1")
  }
  return(data)
}
Then you can use the credentials to download the dataset as described in the post
kgl_dataset(file_name = 'test.csv',
            type = 'competition',
            ref = 'house-prices-advanced-regression-techniques',
            kgl_json_path = 'kaggle.json')
Alternatively, you can use the unofficial R API client:
library(devtools)
install_github('mkearney/kaggler')
library(kaggler)
kgl_auth(creds_file = 'kaggle.json')
kgl_competitions_data_download('house-prices-advanced-regression-techniques', 'test.csv')
However, this fails due to a mistake in the implementation of kgl_api_get:
function (path, ..., auth = kgl_auth())
{
  r <- httr::GET(kgl_api_call(path, ...), auth)
  httr::warn_for_status(r)
  if (r$status_code != 200) { # <== should be "=="
  ...
}
I downloaded the data (which you should just do too; it's quite easy). But just in case you don't want to, I uploaded the data to Pastebin so you can run the code below. This is for their "train" dataset, downloaded from the link you provided above:
data <- read.delim("https://pastebin.com/raw/aGvwwdV0", header=T)
I'm trying to loop through all the CSV files on an FTP site and upload the contents of CSVs with a certain filename to a database.
So far I've been able to
access the FTP using...
getURL(url, userpwd = userpwd, ftp.use.epsv = FALSE, dirlistonly = TRUE),
get a list of the filenames using...
unlist(strsplit(filenames, "\r\n")),
and create a dataframe with a list of the full urls (e.g. ftp://sample@ftpserver.name.com/samplename.csv) using...
for (i in seq_along(myfiles)) {
  url_list[i,] <- paste(url, myfiles[i], sep = '')
}
How do I loop through this dataframe, filtering for certain filenames, in order to create a new dataframe with all of data from the relevant CSVs? (half the files are named Type1SampleName and half are Type2SampleName)
I would then upload this data to the database.
Thanks!
Since RCurl::getURL returns the raw HTTP response, here the content of each CSV, consider extending your lapply call to pass the result into read.csv using its text argument:
# VECTOR OF URLs
urls <- paste0(url, myfiles[grep("Type1", myfiles)])
# LIST OF DATA FRAMES FROM EACH CSV
mydata <- lapply(urls, function(url) {
  resp <- getURL(url, userpwd = userpwd, connecttimeout = 60)
  read.csv(text = resp)
})
Alternatively, getURL supports a callback function via its write argument; from the docs:
Alternatively, if a value is supplied for the write parameter, this is returned. This allows the caller to create a handler within the call and get it back. This avoids having to explicitly create and assign it and then call getURL and then access the result. Instead, the 3 steps can be inlined in a single call.
# USER DEFINED METHOD
import_csv <- function(resp) read.csv(text = resp)
# LONG FORM NOTATION
mydata <- lapply(urls, function(url)
  getURL(url, userpwd = userpwd, connecttimeout = 60, write = import_csv)
)
# SHORT FORM NOTATION
mydata <- lapply(urls, getURL, userpwd = userpwd, connecttimeout = 60, write = import_csv)
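If the end goal is a single data frame rather than a list, the pieces can then be stacked, assuming the Type1 CSVs share the same columns:
# STACK LIST OF DATA FRAMES INTO ONE (assumes identical columns across CSVs)
alldata <- do.call(rbind, mydata)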
Just an update on how I finished this off and what worked for me in the end...
mydata <- lapply(urls, getURL, userpwd = userpwd, connecttimeout = 60)
Following on from the above...
i <- 1
while (i <= length(mydata)) {
  mydata1 <- paste0(mydata[[i]])
  bin <- read.csv(text = mydata1, header = FALSE, skip = 1)
  #Column renaming and formatting here
  #Uploading to database using RODBC here
  i <- i + 1
}
Thanks for the pointers @Parfait - really appreciated.
Like most problems, it looks straightforward after you've done it!
I want to set a header on a request using the R httr package, when I have the name of the header in a variable.
I would like to do something like this:
tokenName = 'X-Auth-Token'
get_credentials_test <- function (token) {
  url <- paste(baseUrl, "/api/usercredentials", sep = '')
  r <- GET(url, add_headers(tokenName = token))
  r
}
However, the above code sets a header literally named "tokenName".
It does work if I do the following:
get_credentials_test <- function (token) {
  url <- paste(baseUrl, "/api/usercredentials", sep = '')
  r <- GET(url, add_headers('X-Auth-Token' = token))
  r
}
but I want some flexibility in case the name of the header changes, since the requirement to add the header is sprinkled liberally around the code. I am not sure if it is possible to add a header whose name is contained in a variable, but that is what I would like to do.
You could create the headers as a named vector, and then pass it as the .headers argument:
h <- c(token)
names(h) <- tokenName
r <- GET(url, add_headers(.headers = h))
While this works because add_headers takes a .headers argument (see here), a more general alternative for calling a function with arbitrary argument names is do.call:
h <- list(token)
names(h) <- tokenName
r <- GET(url, do.call(add_headers, h))
It's easy with structure():
get_creds <- function(base.url, path, header.name, token) {
  url <- paste0(base.url, path)
  header <- structure(token, names = header.name)
  r <- httr::GET(url, httr::add_headers(header))
  r
}
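Base R's setNames() collapses the same idea into a single call; since add_headers() accepts a named character vector (as the structure() version shows), this should be equivalent:
# one-liner: name the token value with the variable header name
r <- GET(url, add_headers(setNames(token, tokenName)))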
I am running into what appears to be a character size limit in a JSON string when trying to retrieve data from either curlPerform() or getURL(). Here is non-reproducible code [1], but it should shed some light on the problem.
# Note that .base.url is the basic url for the API, q is a query, user
# is specified, etc.
session = getCurlHandle()
curl.opts <- list(userpwd = paste(user, ":", key, sep = ""),
                  httpheader = "Content-Type: application/json")
request <- paste(.base.url, q, sep = "")
txt <- getURL(url = request, curl = session, .opts = curl.opts,
              write = basicTextGatherer())
or
r = dynCurlReader()
curlPerform(url = request, writefunction = r$update, curl = session,
            .opts = curl.opts)
My guess is that the update or value functions in the basicTextGatherer or dynCurlReader text handler objects are having trouble with the large strings. In this example, r$value() will return a truncated string that is approximately 2 MB. The code given above works fine for queries < 2 MB.
Note that I can easily do the following from the command line (or using system() in R), but writing to disc seems like a waste if I am doing the subsequent analysis in R.
curl -v --header "Content-Type: application/json" --user username:register:passwd https://base.url.for.api/getdata/select+*+from+sometable > stream.json
where stream.json is a roughly 14MB json string. I can read the string into R using either
con <- file(paste(.project.path, "data/stream.json", sep = ""), "r")
string <- readLines(con)
or directly into a list as
tmp <- fromJSON(file = paste(.project.path, "data/stream.json", sep = ""))
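In case it helps others hitting the same wall: an equivalent request through httr, which buffers the complete response body in memory, might sidestep the truncation entirely. A sketch, untested on my side of the firewall:
library(httr)
library(jsonlite)

# same request as the getURL() call above; user, key and request as defined earlier
resp <- GET(request,
            authenticate(user, key),
            add_headers(`Content-Type` = "application/json"))
txt <- content(resp, as = "text", encoding = "UTF-8")  # the full response string
tmp <- fromJSON(txt)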
Any thoughts are very much appreciated.
Ryan
[1] - Sorry for not providing reproducible code, but I'm dealing with a govt firewall.