I need to send a CSV file using Curl in Windows to an R program using the plumber API. I have Curl installed in Windows.
I've tried the following, but it doesn't work. I'm new to R and APIs, sorry.
The Curl command on Windows:
curl -X GET -d '#C:\Users\hp\Document\Final_Training.csv'
http://localhost:8000/data1
gives the error:
"error":["500 - Internal server error"],"message":["Error in is.data.frame(x):
argument \"data1\" is missing, with no default\n"]}
My R plumber code is:
#* @get /data1
function(data1) {
write.csv(data1, file="Final_Test.csv", sep=",", row.names=FALSE,
col.names=TRUE)
}
There are two issues. First, when "form data" is sent over HTTP, it's sent as a sequence of key/value pairs. In particular, in a GET request, it's actually embedded in the URL. For example, in the URL:
https://www.google.com/intl/en/about/?fg=1&utm_source=google.com&...
the key/value pairs are fg=1, utm_source=google.com, etc.
When you write Plumber code, the functions take arguments that correspond to the keys, so if Google were to deploy Plumber for their "About" page, they might write:
#* @get /intl/en/about
function(fg, utm_source, etc) {
...
}
In your code, you have Plumber code for URL /data1 that also expects a data1 key containing the CSV contents:
#* @get /data1 <-- the URL
function(data1) { <-- which expects a `?data1=...` query parameter
write.csv(data1, file="Final_Test.csv", sep=",", row.names=FALSE,
col.names=TRUE)
}
To get Curl to do this (provide the data from a file using a key data1 and supply it encoded in the URL for a GET request), you need to use a slightly different command line than you're using:
curl -X GET --data-urlencode 'data1#C:\Users\hp\Document\Final_Training.csv'
http://localhost:8000/data1
That should get the data actually sent to the Plumber process. On the Windows side, you should see an empty JSON object {} as the response (so no error). On the R side, you'll see warnings about the sep and col.names arguments being ignored. In addition, what gets written to Final_Test.csv won't be a proper CSV file.
That leads to the second issue. In your Plumber code, data1 will be received as a plain character string containing the contents of the CSV file. It will not be a data frame or anything like that. If you just want to write the uploaded data to a CSV file, write the string data1 to the file directly:
function(data1) {
cat(data1, file="Final_Test.csv")
}
If you want to load it into a data frame, you'd want to parse the string as a CSV file using something like:
#* @get /data1
function(data1) {
t <- textConnection(data1)
X <- read.csv(t)
close(t)
print(summary(X))
list(result=jsonlite::unbox("success"))
}
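As an aside, read.csv() also accepts a text= argument, so a slightly shorter version of the same handler (same assumption: the CSV text arrives in the data1 parameter) would be:
#* @get /data1
function(data1) {
  # data1 is one character string holding the whole CSV payload,
  # so parse it straight from text rather than via a connection
  X <- read.csv(text = data1)
  print(summary(X))
  list(result=jsonlite::unbox("success"))
}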
Related
I'm receiving the data I'm requesting but don't understand how to properly extract it. Here is the POST request:
library(httr)
url <- "http://tools-cluster-interface.iedb.org/tools_api/mhci/"
body <- list(method="recommended", sequence_text="SLYNTVATLYCVHQRIDV", allele="HLA-A*01:01,HLA-A*02:01", length="8,9")
data <- httr::POST(url, body = body, encode = "form", verbose())
If I print the data with:
data
...it shows the request details followed by a nicely formatted table. However, if I try to extract it with:
httr::content(data, "text")
This returns a single string with all the values of the original table. The output looks delimited by "\" but I couldn't str_replace or tease it out properly.
I'm new to requests using R (and httr) and assume it's an option I'm missing with httr. Any advice?
API details here: http://tools.iedb.org/main/tools-api/
The best way to do this is to specify the MIME type:
content(data, type = 'text/tab-separated-values')
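With the type specified, httr hands the body to its tab-separated parser, so content() should come back as a data frame rather than one long string. If your httr version doesn't parse that type automatically, you can pull the text out and parse it yourself (a sketch; the column names depend on the IEDB response):
# Parsed directly into a data frame
tbl <- httr::content(data, type = 'text/tab-separated-values')
head(tbl)

# Manual route: grab the raw text, then parse the tabs yourself
raw_txt <- httr::content(data, as = "text", encoding = "UTF-8")
tbl2 <- readr::read_tsv(raw_txt)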
I am using R to call the Cybersource API. When I make a GET request I obtain a successful response of 200, but when I read the body of the response, rather than getting the CSV data back I get a path to a CSV file. I wonder what I am doing wrong.
content(request) gives
"/space/download_reports/output/dailyreports/reports/2018/10/27/testrest/TRRReport-7931d82d-cf4a-71fa-e053-a2588e0ab27a.csv"
The result of content(request) should be the data, not a file path.
Here is the code
library('httr')
merchant<-'testrest'
vcdate<-'Wed, 29 May 2019 10:09:48 GMT'
ho<-'apitest.cybersource.com'
URL<-'https://apitest.cybersource.com/reporting/v3/report-downloads?organizationId=testrest&reportDate=2018-10-27&reportName=TRRReport'
sign<-'keyid="08c94330-f618-42a3-b09d-e1e43be5efda", algorithm="HmacSHA256", headers="host (request-target) v-c-merchant-id", signature="7cr6mZMa1oENhJ5NclGsprmQxwGlo1j3VjqAR6xngxk="'
req<-GET(URL, add_headers(.headers=c('v-c-merchant-id'=merchant, 'v-c-date'=vcdate, 'Host'=ho, 'Signature'=sign)))
content(req)
Here is the Cybersource test API where you can verify the data produced.
https://developer.cybersource.com/api/reference/api-reference.html
I am trying to download a report under the reporting tab.
Honestly I'm not exactly sure what's going on here, but I do think I was able to get something to work.
The API seems to like to return data in application/hal+json format, which is not one that httr normally requests. You can say you'll accept anything with accept("*"), so you can do your request with:
req <- GET(URL, add_headers('v-c-merchant-id'=merchant,
'v-c-date'=vcdate,
'Host'=ho,
'Signature'=sign), accept("*"))
Now we are actually getting the data back that we want, but httr doesn't know how to automatically parse it, so we need to parse it ourselves. This seems to do the trick:
readr::read_csv(rawToChar(content(req)), skip=1)
There seems to be a header row, which we skip with skip=1, and then we parse the rest as a CSV file with readr::read_csv.
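If you want to sanity-check what came back before parsing it, something along these lines should work (a sketch reusing the req object from above):
# What the server says it returned
httr::headers(req)[["content-type"]]

# Pull the raw bytes out explicitly and peek at the start of the payload
txt <- rawToChar(httr::content(req, as = "raw"))
cat(substr(txt, 1, 200))

# Then parse as above, skipping the extra header row
report <- readr::read_csv(txt, skip = 1)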
I have 500+ .json files that I am trying to get a specific element out of. I cannot figure out why I cannot read more than one at a time.
This works:
library(jsonlite)
files<-list.files('~/JSON')
file1<-fromJSON(readLines('~/JSON/file1.json'),flatten=TRUE)
result<-as.data.frame(source=file1$element$subdata$data)
However, regardless of using different json packages (eg RJSONIO), I cannot apply this to the entire contents of files. The error I continue to get is...
Attempt to run the same code as a function over all files in the file list:
for (i in files) {
fromJSON(readLines(i),flatten = TRUE)
as.data.frame(i)$element$subdata$data}
My goal is to loop through all 500+ files and extract the data and its contents. Specifically, if the file has the element subdata$data, I want to extract the list and put them all in a data frame.
Note: files are being read as ASCII (Windows OS). This does not have a negative effect on single extractions, but for the loop I get 'invalid character bytes'.
Update 1/25/2019
Ran the following, but it returned errors...
files<-list.files('~/JSON')
out<-lapply(files,function (fn) {
o<-fromJSON(file(i),flatten=TRUE)
as.data.frame(i)$element$subdata$data
})
Error in file(i): object 'i' not found
Also updated the function, this time with UTF-8 errors...
files<-list.files('~/JSON')
out<-lapply(files,function (i,fn) {
o<-fromJSON(file(i),flatten=TRUE)
as.data.frame(i)$element$subdata$data
})
Error in parse_con(txt,bigint_as_char):
lexical error: invalid bytes in UTF8 string. (right here)------^
Latest Update
Think I found a solution to the crazy 'bytes' problem. When I run readLines on the .json file, I can then apply fromJSON,
e.g.
json<-readLines('~/JSON')
jsonread<-fromJSON(json)
jsondf<-as.data.frame(jsonread$element$subdata$data)
#returns a dataframe with the correct information
Problem is, I cannot apply readLines to all the files within the JSON folder (PATH). If I can get help with that, I think I can run...
files<-list.files('~/JSON')
for (i in files){
a<-readLines(i)
o<-fromJSON(file(a),flatten=TRUE)
as.data.frame(i)$element$subdata}
Needed Steps
apply readLines to all 500 .json files in JSON folder
apply fromJSON to the files from step 1
create a data.frame that returns entries if the list (fromJSON) contains $element$subdata$data.
Thoughts?
Solution (Workaround?)
Unfortunately, fromJSON still runs into trouble with the .json files. My guess is that my GET method (httr) is unable to wait/delay and load the 'pretty print', and thus is grabbing the raw .json, which in turn is giving odd characters and, as a result, the ubiquitous '------^' error. Nevertheless, I was able to put together a solution; please see below. I want to post it for future folks that may have the same problem with .json files not working nicely with any R json package.
#keeping the same 'files' variable as earlier
raw_data<-lapply(files,readLines)
dat<-do.call(rbind,raw_data)
dat2<-as.data.frame(dat,stringsAsFactors=FALSE)
#check to see json contents were read-in
dat2[1,1]
library(tidyr)
dat3<-separate_rows(dat2,sep='')
x<-unlist(raw_data)
x<-gsub('[[:punct:]]', ' ',x)
#Identify elements wanted in original .json and apply regex
y<-regmatches(x,regexec('.*SubElement2 *(.*?) *Text.*',x))
for loops never return anything, so you must save all valuable data yourself.
You call as.data.frame(i), which creates a frame with exactly one element, the filename, probably not what you want to keep.
(Minor) Use fromJSON(file(i),...).
Since you want to capture these into one frame, I suggest something along the lines of:
out <- lapply(files, function(fn) {
o <- fromJSON(file(fn), flatten = TRUE)
as.data.frame(o)$element$subdata$data
})
allout <- do.call(rbind.data.frame, out)
### alternatives:
allout <- dplyr::bind_rows(out)
allout <- data.table::rbindlist(out)
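If some files do not actually contain $element$subdata$data, their entries in out will come back NULL (or empty), so you may want to drop those before binding. A rough sketch along the lines above (untested against your files, since the exact structure depends on your JSON):
out <- lapply(files, function(fn) {
  o <- fromJSON(file(fn), flatten = TRUE)
  o$element$subdata$data                 # NULL when the element is absent
})
out <- Filter(Negate(is.null), out)      # keep only files that had the element
allout <- do.call(rbind.data.frame, out)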
I'm trying to parse through a JSON file. The file has thousands of individual JSON objects, but for testing purposes I am working with only 3 rows of JSON objects. I've declared a function in R that uses the rjson package, but when I try to execute the R file in RStudio, I get an error in the console saying that my function is undefined.
I've tried calling the function via the console and executing it via the R script file. I've also read through a couple of tutorials, as I've never worked in R before.
library("rjson")
parseJsonData <- function (fileName)
{
result <- fromJSON(file = fileName)
jsonFrame <- as.data.frame(result)
return(jsonFrame)
}
parseJsonData("testData.json")
I would expect to have a data frame returned and printed to the console window.
UPDATE:
Seems to be an issue with the file format, which I unfortunately don't have control over. The file provided is a JSON file, but it is formatted as 10k individual JSON objects rather than a single list of objects.
For example, this is how the file is formatted inside the testData.json
{"name":"test1", "value":"1"}
{"name":"test2", "value":"2"}
{"name":"test3", "value":"3"}
{"name":"test4", "value":"4"}
{"name":"test5", "value":"5"}
I am trying to merge multiple JSON files into one database, and despite trying all the approaches found on SO, it fails.
The files provide sensor data. The stages I've completed are:
1. Unzip the files - produces json files saved as '.txt' files
2. Remove the old zip files
3. Parse the '.txt' files to remove some bugs in the content - random 3-letter + comma combos at the start of some lines, e.g. 'prm,{...'
I've got code which will turn them into data frames individually:
stream <- stream_in(file("1.txt"))
flat <- flatten(stream)
df_it <- as.data.frame(flat)
But when I put it into a function:
df_loop <- function(x) {
stream <- stream_in(x)
flat <- flatten(stream)
df_it <- as.data.frame(flat)
df_it
}
And then try to run through it:
df_all <- sapply(file.list, df_loop)
I get:
Error: Argument 'con' must be a connection.
Then I've tried to merge the json files with rbind.fill and merge to no avail.
Not really sure where I'm going so terribly wrong, so I would appreciate any help.
You need a small change in your function. Change to -
stream <- stream_in(file(x))
Explanation
Start with analyzing your original implementation -
stream <- stream_in(file("1.txt"))
The 1.txt here is the file path which is getting passed as an input parameter to the file() function. A quick ?file will tell you that it is a
Function to create, open and close connections, i.e., “generalized
files”, such as possibly compressed files, URLs, pipes, etc.
Now if you do a ?stream_in() you will find that it is a
function that implements line-by-line processing of JSON data over a
connection, such as a socket, url, file or pipe
Keyword here being socket, url, file or pipe.
Your file.list is just a list of file paths, character strings to be specific. But in order for stream_in() to work, you need to pass in a connection object, which is what the file() function returns when given a file path as a string.
Chaining that together, you needed to do stream_in(file("/path/to/file.txt")).
Once you do that, your sapply iterates over each path, creates the connection object, and passes it as input to stream_in(). A minimal sketch of the fixed function is below.
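Putting it together (df_loop and file.list are from the question; the bind at the end assumes the files share the same columns, otherwise plyr::rbind.fill would be the safer choice):
df_loop <- function(x) {
  stream <- stream_in(file(x))   # wrap the path in file() to get a connection
  flat <- flatten(stream)
  as.data.frame(flat)
}

# lapply keeps each result as its own data frame; bind them afterwards
df_all <- lapply(file.list, df_loop)
combined <- do.call(rbind, df_all)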
Hope that helps!