I want to upload a csv file to a REST API.
The API is accessible via a URL like
http://sampledomain.com/api/data/?key=xxx
A provided sample curl call looks as follows:
curl --form "file=@my_data.zip" \
"http://sampledomain.com/api/data/?key=xxx"
How can I translate this call into R?
I have heard of the RCurl package, but can't figure out how to use it in this case.
Regards
I am not sure RCurl will handle it, as you can see from the limitations listed on its first page:
Limitations One doesn't yet have full control over the contents of a
POST form such as specifying files, content type. Error handling uses
a single global variable at present.
However, another package from Hadley, httr, might solve your problem. Note that the form field is named file, as in the curl call:
library(httr)
POST("http://sampledomain.com/api/data/?key=xxx", body = list(file = upload_file("my_data.zip")))
I tried
download.file('https://www.dropbox.com/s/r3asyvybozbizrm/Himalayas.jpg',
destfile="1.jpg",
method="auto")
but it returns the HTML source of that page.
Also tried a little bit of rdrop
library(rdrop2)
# please put in your key/secret
drop_auth(new_user = FALSE, key = key, secret = secret, cache = TRUE)
And the pop up website reports:
Invalid redirect_uri: "http://localhost:1410": It must exactly match one of the redirect URIs you've pre-configured for your app (including the path).
I don't understand the URI thing very well. Can somebody please recommend some documentation to read?
I read some posts but most of them discuss how to read data from excel files.
repmis worked only for reading excel files...
library(repmis)
repmis::source_DropboxData("test.csv",
"tcppj30pkluf5ko",
sep = ",",
header = F)
Also tried
library(RCurl)
url='https://www.dropbox.com/s/tcppj30pkluf5ko/test.csv'
x = getURL(url)
read.csv(textConnection(x))
And it didn't work...
Any help and discussion's appreciated. Thanks!
The first issue is because the https://www.dropbox.com/s/r3asyvybozbizrm/Himalayas.jpg link points to a preview page, not the file content itself, which is why you get the HTML. You can modify links like this though to point to the file content, as shown here:
https://www.dropbox.com/help/201
E.g., add a raw=1 URL parameter:
https://www.dropbox.com/s/r3asyvybozbizrm/Himalayas.jpg?raw=1
Your downloader will need to follow redirects for that to work though.
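As a rough sketch of the link rewriting described above: the helper names dropbox_raw_url and fetch_dropbox_file are made up for illustration, and I am assuming that download.file's "libcurl" method follows the redirect, which is what I would expect but have not checked against every R version.

```r
# Hypothetical helper: turn a Dropbox shared link into a direct-content link
# by making sure the query string carries raw=1.
dropbox_raw_url <- function(url) {
  if (grepl("[?&]raw=1", url)) {
    url                                      # already a raw link
  } else if (grepl("[?&]dl=[01]", url)) {
    sub("([?&])dl=[01]", "\\1raw=1", url)    # swap dl=0 / dl=1 for raw=1
  } else if (grepl("?", url, fixed = TRUE)) {
    paste0(url, "&raw=1")                    # other parameters present
  } else {
    paste0(url, "?raw=1")                    # no query string yet
  }
}

# Hypothetical helper: download the file itself rather than the preview page.
# mode = "wb" keeps binary files such as images intact;
# method = "libcurl" is assumed to follow the HTTP redirect.
fetch_dropbox_file <- function(url, destfile) {
  download.file(dropbox_raw_url(url), destfile = destfile,
                method = "libcurl", mode = "wb")
}
```

With that, something like fetch_dropbox_file("https://www.dropbox.com/s/r3asyvybozbizrm/Himalayas.jpg", "1.jpg") should save the image instead of the HTML page, provided the shared link is still valid.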
The second issue is because you're trying to use an OAuth 2 app authorization flow, which requires that all redirect URIs be pre-registered. You can register redirect URIs (in your case, http://localhost:1410) for Dropbox API apps on the app's page in the App Console:
https://www.dropbox.com/developers/apps
For more information on using OAuth, you can refer to the Dropbox API OAuth guide here:
https://www.dropbox.com/developers/reference/oauthguide
I use read.table(url("yourdropboxpubliclink")). For instance, instead of using https://www.dropbox.com/s/xyo8sy9velpkg5y/foo.txt?dl=0, which is the shared link on Dropbox, I use this link:
https://dl.dropboxusercontent.com/u/15634209/histogram/foo.txt
For a non-public link, adding raw=1 will work.
It works fine for me.
Every time I test a Postman collection, I need to change the authorization token in the header manually, then export the collection again and run it through Newman.
Is there any way to supply the token in the CSV file that is used as the test data file instead? That would save me from changing the collection every time.
Please suggest.
Yes, you can. In your CSV file, add one more column named "AT"
and reference that AT variable in your request header as {{AT}}.
I am trying to write a function that will take a list of dates and retrieve the dataset as found on https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm
I am using PROC IML in SAS to execute R-code (since I am more familiar with R).
My problem is within R, and is due to the website.
First, I am aware that there is an API but this is an exercise I really want to learn because many sites do not have APIs.
Does anyone know how to retrieve the datasets?
Things I've heard:
Use RSelenium to program the clicking. RSelenium was taken off the archive recently, so that isn't an option (even installing a previous version is causing issues).
Look at how the XML URL changes as I click the "submit" button in Chrome. However, the XML section of the Network tab doesn't show anything, whereas it does on other websites that use different search methods.
I have been looking for a solution all day, but to no avail! Please help
First, you need to read the terms and conditions and make sure that you are not breaking the rules when scraping.
Next, if there is an API, you should use it so that they can better manage their data usage and operations.
In addition, you should limit the number of requests you make so as not to overload the server. Sending too many requests too quickly can look like a Denial of Service (DoS) attack.
Finally, if those above conditions are satisfied, you can use the inspector on Chrome to see what HTTP requests are being made when you browse these webpages.
In this particular case, you do not need RSelenium; a simple HTTP POST will do:
library(httr)
resp <- POST("https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm",
body=list(
priceDate.month=5,
priceDate.day=15,
priceDate.year=2018,
submit="CSV+Format"
),
encode="form")
read.csv(text=rawToChar(resp$content), header=FALSE)
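Since the question asks for a function over a list of dates, here is one way the call above might be wrapped. This is only a sketch: price_form_body and fetch_prices are names I made up, the one-second delay is an arbitrary choice to keep the request rate low, and I am assuming the site returns CSV for every date you pass (non-business days may not).

```r
# Hypothetical helper: build the form fields for a single date,
# using the same field names and submit value as the call above.
price_form_body <- function(date) {
  d <- as.Date(date)
  list(
    priceDate.month = as.integer(format(d, "%m")),
    priceDate.day   = as.integer(format(d, "%d")),
    priceDate.year  = as.integer(format(d, "%Y")),
    submit          = "CSV+Format"
  )
}

# Hypothetical wrapper: one POST per date, pausing between requests
# so the server is not flooded. Requires the httr package.
fetch_prices <- function(dates, delay = 1) {
  url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
  dates <- as.character(dates)          # accept Date or character input
  results <- lapply(dates, function(date) {
    resp <- httr::POST(url, body = price_form_body(date), encode = "form")
    httr::stop_for_status(resp)         # stop on HTTP errors
    Sys.sleep(delay)                    # throttle (delay is in seconds)
    read.csv(text = httr::content(resp, as = "text", encoding = "UTF-8"),
             header = FALSE)
  })
  names(results) <- dates               # one data frame per requested date
  results
}
```

Usage would be along the lines of prices <- fetch_prices(c("2018-05-15", "2018-05-16")), giving a named list with one data frame per date.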
You can perform the same http processing in a SAS session using Proc HTTP. The CSV data does not contain a header row, so perhaps the XML Format is more appropriate. There are a couple of caveats for the treasurydirect site.
Prior to posting a data download request the connection needs some cookies that are assigned during a GET request. Proc HTTP can do this.
The XML contains an extra tag container <bpd> that the SAS XMLV2 library engine can't handle simply. This extra tag can be removed with some DATA step processing.
Sample code for XML
filename response TEMP;
filename respfilt TEMP;
* Get request sets up fresh session and cookies;
proc http
clear_cache
method = "get"
url ="https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
;
run;
* Post request as performed by XML format button;
* automatically utilizes cookies setup in GET request;
* in= can now directly specify the parameter data to post;
proc http
method = "post"
in = 'priceDate.year=2018&priceDate.month=5&priceDate.day=15&submit=XML+Format'
url ="https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
out = response
;
run;
* remove bpd tag from the response (the downloaded xml);
data _null_;
infile response;
file respfilt;
input;
if _infile_ not in: ('<bpd', '</bpd');
put _infile_;
run;
* copy data collections from xml file to tables in work library;
libname respfilt xmlv2 ;
proc copy in=respfilt out=work;
run;
Reference material
REST at Ease with SAS®: How to Use SAS to Get Your REST
Joseph Henry, SAS Institute Inc., Cary, NC
http://support.sas.com/resources/papers/proceedings16/SAS6363-2016.pdf
I had a nice little package to scrape Google Ngram data, but I have discovered they have switched to SSL and my package has broken. Switching from readLines to getURL gets me some of the way there, but some of the script included in the page is missing. Do I need to get fancy with user agents or something?
Here is what I have tried so far (pretty basic):
library(RCurl)
myurl <- "https://books.google.com/ngrams/graph?content=hacker&year_start=1950&year_end=2000"
getURL(myurl)
Comparing the results to viewing the source after entering the url in a browser shows that the crucial content is missing from the results returned to R. In the browser, the source includes content looking like this:
<script type="text/javascript">
var data = [{"ngram": "hacker", "type": "NGRAM", "timeseries": [9.4930387994907051e-09,
1.1685493106483591e-08, 1.0784501440023556e-08, 1.0108472218003532e-08,
etc.
Any suggestions would be greatly appreciated!
Sorry, not a direct solution, but it doesn't seem to be a user-agent problem. When you open your URL in a browser, you can see that there is a redirection that adds a parameter at the end of the address: direct_url=t1%3B%2Chacker%3B%2Cc0.
If you use getURL() to download this new URL, complete with the new parameter, then the javascript you are mentioning is present in the result.
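To illustrate, here is a small sketch that builds the redirected URL and pulls the data array out of the returned page. The names ngram_url and extract_ngram_json are invented for this example. I am assuming the direct_url value follows the same "t1;,<term>;,c0" pattern for other single terms (I have only seen it for "hacker"), and that the page still embeds the data as "var data = [...];" as shown in the question.

```r
# Hypothetical helper: rebuild the URL the browser ends up on after the redirect.
# The direct_url pattern is inferred from the single example above.
ngram_url <- function(term, year_start = 1950, year_end = 2000) {
  direct <- URLencode(paste0("t1;,", term, ";,c0"), reserved = TRUE)
  paste0("https://books.google.com/ngrams/graph?content=",
         URLencode(term, reserved = TRUE),
         "&year_start=", year_start,
         "&year_end=", year_end,
         "&direct_url=", direct)
}

# Hypothetical helper: extract the JSON array assigned to "var data" in the page source.
# Returns NA if the script block is not present.
extract_ngram_json <- function(html) {
  # (?s) lets "." match newlines; the match stops at the first "];"
  m <- regmatches(html, regexpr("(?s)var data = \\[.*?\\];", html, perl = TRUE))
  if (length(m) == 0) return(NA_character_)
  # drop the leading "var data = " and the trailing ";"
  substr(m, nchar("var data = ") + 1, nchar(m) - 1)
}
```

You could then run extract_ngram_json(getURL(ngram_url("hacker"))) and hand the resulting string to a JSON parser of your choice.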
Another solution could be to try to access data via Google BigQuery, as mentioned in this SO question :
Google N-Gram Web API
I need to use Fiddler to modify the POST fields sent by a browser. I know I can do that using the Fiddler UI but I want to create a script to do it automatically.
I need to insert the code inside the OnBeforeRequest method. I know I can use regular expressions to parse the POST fields, but maybe there is something already available to do it, like some sort of POST object with all the current fields, e.g., POST["field1"], POST["field2"], etc.
So...is it possible or do I have to do it manually?
Thanks!
Fiddler itself does not contain a script-accessible POST body parser, which means you'd either need to import one, write one, or use string processing to accomplish this task.
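For the string-processing route, the logic is short. The sketch below is written in R purely to show the steps and is not FiddlerScript; you would need to port it to the script language Fiddler uses. The names parse_form_body and build_form_body are made up, and it assumes a plain application/x-www-form-urlencoded body (not multipart).

```r
# Hypothetical helper: split a urlencoded POST body into a named list of fields.
parse_form_body <- function(body) {
  # split on "&" into key=value pairs, then on "=" into parts
  pairs <- strsplit(strsplit(body, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
  # "+" means space in form encoding; the rest is percent-decoding
  decode <- function(x) URLdecode(gsub("+", " ", x, fixed = TRUE))
  keys <- vapply(pairs, function(p) decode(p[1]), character(1))
  vals <- vapply(pairs, function(p) {
    if (length(p) > 1) decode(paste(p[-1], collapse = "=")) else ""
  }, character(1))
  setNames(as.list(vals), keys)
}

# Hypothetical helper: turn the (possibly modified) fields back into a body string.
build_form_body <- function(fields) {
  enc <- function(x) URLencode(as.character(x), reserved = TRUE)
  paste(vapply(names(fields), enc, character(1)),
        vapply(fields, enc, character(1)),
        sep = "=", collapse = "&")
}
```

The idea is parse, change the field you care about (for example fields$field1 <- "new value"), rebuild, and write the result back as the request body.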