PUT with an empty body using httr (on R) to webHDFS

When I try to PUT to WebHDFS to create a file and write to it (following https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE), I run into issues using httr.
Using RCurl or RWebHDFS is not possible because the target Hadoop cluster is secure.
Here is the code I have attempted to use:
library(httr)
r <- PUT("https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_41411?op=CREATE&permission=755&user.name=testuser",
authenticate(":", "", type = "gssnegotiate"),
verbose())
testuser is a superuser with read/write permissions. I get the following error:
<- HTTP/1.1 400 Data upload requests must have content-type set to 'application/octet-stream'
<- Date: Fri, 29 Nov 2019 15:42:30 GMT
<- Date: Fri, 29 Nov 2019 15:42:30 GMT
<- Pragma: no-cache
<- X-Content-Type-Options: nosniff
<- X-XSS-Protection: 1; mode=block
<- Content-Length: 0
The error is fairly self-explanatory, so I then attempt the PUT with a content-type:
r <- PUT("https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_41411?op=CREATE&permission=755&user.name=testuser",
authenticate(":", "", type = "gssnegotiate"),
content_type("application/octet-stream"),
verbose())
The response looks like a success, but it is not truly successful:
<- Date: Fri, 29 Nov 2019 16:04:52 GMT
<- Cache-Control: no-cache
<- Expires: Fri, 29 Nov 2019 16:04:52 GMT
<- Date: Fri, 29 Nov 2019 16:04:52 GMT
<- Pragma: no-cache
<- Content-Type: application/json;charset=utf-8
<- X-Content-Type-Options: nosniff
<- X-XSS-Protection: 1; mode=block
<- Content-Length: 0
No file was uploaded. Uploading a file with that first request gives me another error:
<- HTTP/1.1 307 Temporary Redirect
<- Date: Fri, 29 Nov 2019 16:07:24 GMT
<- Cache-Control: no-cache
<- Expires: Fri, 29 Nov 2019 16:07:24 GMT
<- Date: Fri, 29 Nov 2019 16:07:24 GMT
<- Pragma: no-cache
<- Content-Type: application/json;charset=utf-8
<- X-Content-Type-Options: nosniff
<- X-XSS-Protection: 1; mode=block
Error in curl::curl_fetch_memory(url, handle = handle) :
necessary data rewind wasn't possible
The code in question:
library(httr)
temp_file <- httr::upload_file(lfs_temp_file, type = "text/plain")
r <- PUT("https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_41411?op=CREATE&permission=755&user.name=testuser",
authenticate(":", "", type = "gssnegotiate"),
body=temp_file,
content_type("application/octet-stream"),
verbose())
Attempting the same command using curl works without issue:
curl -i -k -X PUT --negotiate -u : "https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_4141?op=CREATE&permission=755&user.name=testuser"
This results in the following:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0HTTP/1.1 307 Temporary Redirect
Date: Thu, 28 Nov 2019 23:27:16 GMT
Cache-Control: no-cache
Expires: Thu, 28 Nov 2019 23:27:16 GMT
Date: Thu, 28 Nov 2019 23:27:16 GMT
Pragma: no-cache
Content-Type: application/json;charset=utf-8
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
WWW-Authenticate: Negotiate <stuff>/
Set-Cookie: hadoop.auth="<stuff>"; Path=/; Secure; HttpOnly
Location: https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_4141?op=CREATE&data=true&user.name=testuser&permission=755
Content-Length: 0
Following the Location header lets us create the file successfully.
What am I doing wrong?
Thanks

Good work, including the curl output. I believe that answers it.
Your curl command uses PUT, and your httr command uses POST. Try https://www.rdocumentation.org/packages/httr/versions/1.4.1/topics/PUT .
Hint for future reference: POST commands are not typically used if you're specifying an exact location. That's what PUT is for.

httr is attempting to follow the redirect, and failing. To fix the issue, tell httr not to follow the redirect by passing config(followlocation = 0L).
The PUT command will be as follows:
r <- PUT("https://hadoopmgr1p.global.ad:14000/webhdfs/v1/user/testuser/temp/loadfile_testuser_2019-11-28_15_28_41411?op=CREATE&permission=755&user.name=testuser",
authenticate(":", "", type = "gssnegotiate"),
body=NULL,
config(followlocation = 0L),
verbose())
This should return a valid response with a Location header.
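The two-step flow described above (an empty PUT that returns a 307, then a second PUT that sends the data to the Location URL) could be sketched roughly as follows. The helper name webhdfs_create and its arguments are hypothetical, and the sketch assumes a valid Kerberos ticket is already available to libcurl; it has not been run against a real cluster.

```r
library(httr)

# Hypothetical helper: two-step WebHDFS CREATE using Kerberos (SPNEGO) auth.
webhdfs_create <- function(create_url, local_path) {
  auth <- authenticate(":", "", type = "gssnegotiate")

  # Step 1: empty-body PUT. Do not follow the redirect, so that the
  # Location header of the 307 response can be read.
  step1 <- PUT(create_url, auth, config(followlocation = 0L))
  if (status_code(step1) != 307L) {
    stop("Expected a 307 redirect, got HTTP ", status_code(step1))
  }
  location <- headers(step1)$location

  # Step 2: send the file contents to the redirect target.
  step2 <- PUT(
    location, auth,
    body = upload_file(local_path, type = "application/octet-stream")
  )
  stop_for_status(step2)
  invisible(step2)
}
```

Keeping the two requests separate avoids asking libcurl to replay an upload body after a redirect, which is where the "necessary data rewind wasn't possible" error came from.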

Related

httr GET returns wrong content-type

httr 1.4.1
R version 3.6.1 (also tried with 3.5.3)
Edit: adding verbose() output.
I've got a request as follows:
r <- GET("https://my.cool.domain",add_headers(.headers = c('x-api-key' = 'abcdefg', 'Accept' = "text/csv")), verbose())
On my machine it responds with:
-> GET / HTTP/1.1
-> Host: https://my.cool.domain
-> User-Agent: libcurl/7.54.0 r-curl/4.2 httr/1.4.1
-> Accept-Encoding: deflate, gzip
-> x-api-key: abcdefg
-> Accept: text/csv
->
<- HTTP/1.1 200 OK
<- Date: Tue, 26 Nov 2019 17:50:15 GMT
<- Content-Type: text/csv
<- Content-Length: 24902
<- Connection: keep-alive
<- x-amzn-RequestId: ...
<- Content-Encoding: deflate
<- x-amz-apigw-id: ...
<- X-Amzn-Trace-Id: ...
Response [https://my.cool.domain]
Date: 2019-11-26 17:20
Status: 200
Content-Type: text/csv
Size: 209 kB
cats,dogs...
yes,no...
yes,yes...
no,no...
However on my colleague's machine (same version of httr and R, and also with an updated version of R) I get the following:
-> GET / HTTP/2
-> Host: https://my.cool.domain
-> User-Agent: libcurl/7.64.1 r-curl/4.2 httr/1.4.1
-> Accept-Encoding: deflate, gzip
-> x-api-key: abcdefg
-> Accept: text/csv
->
<- HTTP/2 200
<- date: Tue, 26 Nov 2019 17:46:17 GMT
<- content-type: application/json
<- content-length: 21501
<- x-amzn-requestid: ...
<- content-encoding: deflate
<- x-amz-apigw-id: ...
<- x-amzn-trace-id: ...
Response [https://my.cool.domain]
Date: 2019-11-26 17:30
Status: 200
Content-Type: application/json
Size: 377 kB
I'm working with the developer of the https://my.cool.domain domain and I can confirm that the request header params (x-api-key and 'Accept' = "text/csv") are perfect. And the request works on my machine, and several others, but not this one colleague's.
What's going wrong here and how can I debug this?
Thanks
This was fixed by doing httr::set_config(httr::config(http_version = 1.1)) to force 1.1.
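A note on the value: libcurl's http_version option is an integer code, not a literal version number (0 lets libcurl choose, 1 is HTTP/1.0, 2 is HTTP/1.1, 3 is HTTP/2). A value of 1.1 is therefore probably treated as 1, i.e. HTTP/1.0, which still avoids HTTP/2. A minimal sketch that requests HTTP/1.1 explicitly, assuming the same endpoint and headers as in the question:

```r
library(httr)

# CURL_HTTP_VERSION_1_1 has the integer value 2 in libcurl's enum.
cfg <- config(http_version = 2L)

# Per-request use (not executed here; URL and headers as in the question):
# r <- GET("https://my.cool.domain", cfg,
#          add_headers("x-api-key" = "abcdefg", Accept = "text/csv"),
#          verbose())

# Or apply it to every httr request in the session:
# set_config(cfg)
```

Passing the config per request keeps the change local, whereas set_config() affects every later httr call in the session.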

SOAP request in R - Content-length = 0

Can someone wise point out to me where I am going wrong with this SOAP request in R? I can get a valid response from the server when sending the request with Python, but in R I get an empty body in the response (Content-Length = 0).
Here is the example request: http://www.bom.gov.au/waterdata/wiski-web-public/GetCapabilities%20example%20request.xml
and the example response: http://www.bom.gov.au/waterdata/wiski-web-public/GetCapabilities%20example%20response.xml
library(RCurl)
headerFields =
c(Accept = "text/xml",
'Content-Type' = "text/xml; charset=utf-8",
SOAPAction = "")
TXbody = '<?xml version="1.0" encoding="UTF-8"?>
<soap12:Envelope xmlns:soap12="http://www.w3.org/2003/05/soap-envelope" xmlns:sos="http://www.opengis.net/sos/2.0" xmlns:wsa="http://www.w3.org/2005/08/addressing" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:fes="http://www.opengis.net/fes/2.0" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:swes="http://www.opengis.net/swes/2.0" xsi:schemaLocation="http://www.w3.org/2003/05/soap-envelope http://www.w3.org/2003/05/soap-envelope/soap-envelope.xsd http://www.opengis.net/sos/2.0 http://schemas.opengis.net/sos/2.0/sos.xsd">
<soap12:Header>
<wsa:To>http://www.ogc.org/SOS</wsa:To>
<wsa:Action>
http://www.opengis.net/def/serviceOperation/sos/core/2.0/GetCapabilities
</wsa:Action>
<wsa:ReplyTo>
<wsa:Address>http://www.w3.org/2005/08/addressing/anonymous</wsa:Address>
</wsa:ReplyTo>
<wsa:MessageID>0</wsa:MessageID>
</soap12:Header>
<soap12:Body>
<sos:GetCapabilities service="SOS"/>
</soap12:Body>
</soap12:Envelope>'
h = basicTextGatherer()
R <- curlPerform(url = "http://www.bom.gov.au/waterdata/services?service=SOS",
httpheader = headerFields,
postfields = TXbody, verbose=TRUE,
writefunction = h$update)
RXbody <- h$value()
The response I get is:
Trying 104.99.8.39...
Connected to www.bom.gov.au (104.99.8.39) port 80 (#0)
POST /waterdata/services?service=SOS HTTP/1.1
Host: www.bom.gov.au
Accept: text/xml
Content-Type: text/xml; charset=utf-8
Content-Length: 1018
upload completely sent off: 1018 out of 1018 bytes
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Content-Type: text/plain; charset=UTF-8
< Server: Apache-Coyote/1.1
< X-UA-Compatible: IE=Edge
< Content-Length: 0
< Expires: Tue, 12 Mar 2019 08:11:53 GMT
< Cache-Control: max-age=0, no-cache, no-store
< Pragma: no-cache
< Date: Tue, 12 Mar 2019 08:11:53 GMT
< Connection: keep-alive
<
Connection #0 to host www.bom.gov.au left intact
I have tried this: SOAP request in R
and this: SOAP request failure in R

character vector and JSON in R

I called an API from R using getURL that returns a JSON response.
When I check with typeof in R, it gives me [1] "character".
I am trying to have my data in JSON format as it should be, to be able to convert it to a DataTable. What could be the reason that it is a character list and how do I fix it?
This is what I am getting in the data returned from the API:
[1] "HTTP/1.1 200 OK\r\nDate: Thu, 04 Jan 2018 20:38:50 GMT\r\nContent-Type: application/json; charset=utf-8\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\nSet-Cookie: __cfduid=d6bbf45645c3bd5332f83d25d06d8b8ca1515098329; expires=Fri, 04-Jan-19 20:38:49 GMT; path=/; domain=.onesignal.com; HttpOnly\r\nStatus: 200 OK\r\nCache-Control: public, max-age=7200\r\nAccess-Control-Allow-Origin: *\r\nX-XSS-Protection: 1; mode=block\r\nX-Request-Id: bd2552de-bf7d-4a0c-94d6-ff1b6856002a\r\nAccess-Control-Allow-Headers: SDK-Version\r\nETag: W/\"47580e0a23e806945b01f1237219175c\"\r\nX-Frame-Options: SAMEORIGIN\r\nX-Runtime: 0.112902\r\nX-Content-Type-Options: nosniff\r\nX-Powered-By: Phusion Passenger 5.1.4\r\nCF-Cache-Status: REVALIDATED\r\nExpires: Thu, 04 Jan 2018 22:38:50 GMT\r\nServer: cloudflare-nginx\r\nCF-RAY: 3d8100f109c6a23f-ICN\r\n\r\n{\"total_count\":2057,\"offset\":0,\"limit\":50,\"notifications\":[{\"adm_big_picture\":\"\",\"adm_group\":\"\",\"adm_group_message\":{\"en\":\"\... <truncated>
If I try to use fromJSON function with this data,
I get:
Error in file(con, "r") : cannot open the connection
jsonlite::fromJSON works great for parsing JSON. Your problem is that you have a bunch of stuff in front of your JSON. (Maybe after too, can't tell...)
I think the JSON starts at the first {, so we'll remove everything before that. Calling your data x:
x = sub('^[^\\{]*\\{', '{', x)
jsonlite::fromJSON(x)
Type the unescaped version of the pattern into the Regex101 tool for an explanation. (The unescaped version uses single, not double, backslashes: ^[^\{]*\{ . In R strings we need to double the backslashes.)
Here's a working example based on your data:
x = 'HTTP/1.1 200 OK
Date: Thu, 04 Jan 2018 20:38:50 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=d6bbf45645c3bd5332f83d25d06d8b8ca1515098329; expires=Fri, 04-Jan-19 20:38:49 GMT; path=/; domain=.onesignal.com; HttpOnly
Status: 200 OK
Cache-Control: public, max-age=7200
Access-Control-Allow-Origin: *
X-XSS-Protection: 1; mode=block
X-Request-Id: bd2552de-bf7d-4a0c-94d6-ff1b6856002a
Access-Control-Allow-Headers: SDK-Version
ETag: W/\"47580e0a23e806945b01f1237219175c\"
X-Frame-Options: SAMEORIGIN
X-Runtime: 0.112902
X-Content-Type-Options: nosniff
X-Powered-By: Phusion Passenger 5.1.4
CF-Cache-Status: REVALIDATED\r\nExpires: Thu, 04 Jan 2018 22:38:50 GMT
Server: cloudflare-nginx
CF-RAY: 3d8100f109c6a23f-ICN
{\"total_count\":2057,\"offset\":0,\"limit\":50,\"notifications\":[{\"adm_big_picture\":\"\",\"adm_group\":\"\"}]}'
# Drop everything before the first "{"; one substitution is enough
y = sub('^[^\\{]*\\{', '{', x)
jsonlite::fromJSON(y)
# $total_count
# [1] 2057
#
# $offset
# [1] 0
#
# $limit
# [1] 50
#
# $notifications
# adm_big_picture adm_group
# 1
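As an alternative to the regex, the headers can be separated from the body at the first blank line, which is how an HTTP response is delimited. This is only a sketch: the string below is a shortened stand-in for the real response, and it assumes the headers end with "\r\n\r\n" as in the output shown in the question.

```r
library(jsonlite)

# Shortened stand-in for the raw response text returned by getURL()
raw_response <- 'HTTP/1.1 200 OK\r\nContent-Type: application/json; charset=utf-8\r\n\r\n{"total_count":2057,"offset":0,"limit":50}'

# Split at blank lines; the first piece is the header block
parts <- strsplit(raw_response, "\r\n\r\n", fixed = TRUE)[[1]]

# Re-join the rest in case the body itself contains a blank line
body <- paste(parts[-1], collapse = "\r\n\r\n")

parsed <- fromJSON(body)
parsed$total_count
```

Splitting on the header delimiter does not depend on the body starting with a "{", so it also works when the JSON payload is an array.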
You can use the rjson package to parse your input as JSON. Alternatively, jsonlite::fromJSON has a simplifyDataFrame parameter (TRUE by default) that turns arrays of records into a data frame.
Importing data from a JSON file into R
[edit]
Your data comes back with HTTP headers in front of the JSON; you can work around this by stripping them from the string before passing it to fromJSON:
library(stringr)
library(rjson)
json <- str_sub(str_extract(data, "ICN\\r\\n\\r\\n.*"), 8)
df <- as.data.frame(fromJSON(json))
> head(df)
total_count
1 2057

Wrong oauth_signature

Please can you help me figure out what is wrong with the signature? I really can't understand what the problem is. I have been searching for an answer for over a month. If you could give me an example of a similar working request, or help me understand the problem, I would appreciate it very much!
library(RCurl)
library(httr)
library(rvest)
cons_key<-"ae1dd19212a84c4299f4b157462d32d7"
shared_secret<-"c0bc9938ea754f4ba6e926865b347fae"
encoded_url<-URLencode("http://platform.fatsecret.com/rest/server.api",reserved = T,repeated = T)
encoded_string<-URLencode("method=GET&oauth_consumer_key=cons_key&oauth_nonce=asdas4&oauth_signature_method=HMAC-SHA1&oauth_timestamp=15138110238&oauth_version=1.0",reserved = T,repeated = T)
text_string<-paste0("GET&",encoded_url,"&",encoded_string)
signature_oauth<-sha1_hash(key = "ae1dd19212a84c4299f4b157462d32d7&",string = text_string)
getForm("http://platform.fatsecret.com/rest/server.api",
method="GET",
oauth_consumer_key=cons_key,
oauth_signature=signature_oauth,
oauth_signature_method="HMAC-SHA1",
oauth_timestamp="1513811023",
oauth_nonce="asdas4",
oauth_version="1.0",.opts = list(verbose = TRUE))
The response is:
Trying 34.225.169.9...
* Connected to platform.fatsecret.com (34.225.169.9) port 80 (#0)
> GET /rest/server.api method=GET&oauth_consumer_key=ae1dd19212a84c4299f4b157462d32d7&oauth_signature=rB2IYo5tZ%2BAUIsk%2BjlP0ahtn%2Bhk%3D&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1513811023&oauth_nonce=asdas4&oauth_version=1.0 HTTP/1.1
Host: platform.fatsecret.com
Accept: */*
< HTTP/1.1 200 OK
< Date: Wed, 20 Dec 2017 23:05:44 GMT
< Content-Type: text/xml; charset=utf-8
< Content-Length: 377
< Connection: keep-alive
< Cache-Control: private
< Server: Microsoft-IIS/8.5
< X-AspNet-Version: 4.0.30319
< X-Powered-By: ASP.NET
<
* Connection #0 to host platform.fatsecret.com left intact
[1] "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\r\n<error
xmlns=\"http://platform.fatsecret.com/api/1.0/\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://platform.fatsecret.com/api/1.0/ http://platform.fatsecret.com/api/1.0/fatsecret.xsd\">\r\n\t<code>8</code>\r\n\t<message>Invalid signature: oauth_signature 'rB2IYo5tZ+AUIsk+jlP0ahtn+hk='</message>\r\n</error>\r\n"
attr(,"Content-Type")
charset
"text/xml" "utf-8"
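For comparison, here is a minimal sketch of how an OAuth 1.0 HMAC-SHA1 signature is usually assembled according to RFC 5849. The helper oauth1_sign is hypothetical and the parameter values are copied from the question. Two points differ from the code above: the signing key is the shared secret followed by "&" (not the consumer key), and the base string is built from exactly the parameter values that are sent, including the real consumer key and the same timestamp. Whether these are the only causes of the error is an assumption.

```r
library(openssl)

# Percent-encode everything except unreserved characters (RFC 3986).
enc <- function(x) utils::URLencode(x, reserved = TRUE, repeated = TRUE)

# Hypothetical helper, not part of any package.
oauth1_sign <- function(http_method, url, params, consumer_secret) {
  # Sort parameters by name (byte order), then join as name=value pairs.
  params <- params[order(names(params), method = "radix")]
  param_string <- paste0(enc(names(params)), "=", enc(unname(params)),
                         collapse = "&")
  base_string <- paste(toupper(http_method), enc(url), enc(param_string),
                       sep = "&")
  # No access token here, so the token-secret part after "&" is empty.
  signing_key <- paste0(enc(consumer_secret), "&")
  digest <- sha1(charToRaw(base_string), key = charToRaw(signing_key))
  list(base_string = base_string, signature = base64_encode(digest))
}

cons_key <- "ae1dd19212a84c4299f4b157462d32d7"
shared_secret <- "c0bc9938ea754f4ba6e926865b347fae"

# A real request needs a current timestamp and a fresh nonce.
params <- c(
  method = "GET",
  oauth_consumer_key = cons_key,
  oauth_nonce = "asdas4",
  oauth_signature_method = "HMAC-SHA1",
  oauth_timestamp = "1513811023",
  oauth_version = "1.0"
)

res <- oauth1_sign("GET", "http://platform.fatsecret.com/rest/server.api",
                   params, shared_secret)
res$signature
```

Building the signature and the request from the same params vector avoids mismatches such as a timestamp that differs between the two.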

curl request in R

curl req:
curl -i -H "Accept: application/json" -H "Content-Type: application/json" -d '{"username":"emailId","password":"passwrd"}' -X POST https://central.vizury.com/-/api/login
res:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Content-Type: application/json; charset=utf-8
Date: Wed, 06 Sep 2017 10:47:00 GMT
Expires: 0
Pragma: no-cache
Set-Cookie: viz.sess3=SessionCookieHere; path=/; expires=Wed, 06 Sep 2017 10:49:01 GMT; secure; httponly
Set-Cookie: AWSELB=someval;PATH=/;EXPIRES=Wed, 06 Sep 2017 10:49:01 GMT;SECURE;HTTPONLY
Vary: Accept-Encoding
X-Powered-By: Express
Content-Length: 226
Connection: keep-alive
{"status":"OK","results":{"username":"email","role":"role","products":["webConvert","mobiConvert"],"needsNewPassword":false},"homePath":"/webConvert/#/dashboard/campaignName"}
I need to perform the same action in R:
This is what I have tried so far:
h <- basicHeaderGatherer()
loginUrl <- "https://central.vizury.com/-/api/login"
params <- list('username' = 'username',
'password' = 'password')
loginRes <- postForm(loginUrl, .params=params, style="POST", .opts=curlOptions(headerfunction=h$update, verbose=TRUE))
print("loginres")
print(loginRes)
In response,
print(h$value()['Set-Cookie'] )
I can access Set-Cookie. But how do I access the value of viz.sess3?
Example using curl package:
h <- curl::new_handle()
login_url <- 'https://central.vizury.com/-/api/login'
curl::handle_setform(
handle = h,
username = 'username',
password = 'password'
)
resp <- curl::curl_fetch_memory(login_url, handle = h)
message(resp$status_code)
jsonlite::fromJSON(rawToChar(resp$content))
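To get at the value of viz.sess3 itself, the Set-Cookie header can be split into its name=value pair and attributes. A small sketch follows; the helper cookie_value is hypothetical, and the header strings are taken from the curl output in the question.

```r
# Hypothetical helper: return the value of a named cookie from one or
# more Set-Cookie header strings.
cookie_value <- function(set_cookie_headers, name) {
  # Keep only the leading "name=value" part, dropping the attributes
  pairs <- trimws(sub(";.*$", "", set_cookie_headers))
  keys <- sub("=.*$", "", pairs)
  vals <- sub("^[^=]*=", "", pairs)
  unname(vals[keys == name])
}

set_cookie <- c(
  "viz.sess3=SessionCookieHere; path=/; expires=Wed, 06 Sep 2017 10:49:01 GMT; secure; httponly",
  "AWSELB=someval;PATH=/;EXPIRES=Wed, 06 Sep 2017 10:49:01 GMT;SECURE;HTTPONLY"
)
cookie_value(set_cookie, "viz.sess3")

# With the curl handle from the example above, the cookie jar can also be
# read directly after the request (not executed here):
# cookies <- curl::handle_cookies(h)
# cookies$value[cookies$name == "viz.sess3"]
```

Note that the original curl command posts a JSON body, whereas handle_setform() sends form fields, so the server may respond differently to the two.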
