How to encode accents to query Mediawiki API in R? - r

I have no problem to query Mediawiki API of French Wikipedia for strings without accents:
string <- 'chien'
string <- stringi::stri_enc_toutf8(string, is_unknown_8bit = FALSE, validate = FALSE)
apiQuery <- paste0('https://fr.wikipedia.org/w/api.php?action=query&format=xml&titles=', string)
page <- xml2::read_xml(apiQuery)
{xml_document}
[1] \n \n \n \n \n <page _idx="2736914" pageid="2736914 ...
but I have problem for strings with accents:
string <- 'être'
string <- stringi::stri_enc_toutf8(string, is_unknown_8bit = FALSE, validate = FALSE)
apiQuery <- paste0('https://fr.wikipedia.org/w/api.php?action=query&format=xml&titles=', string)
page <- xml2::read_xml(apiQuery)
I receive the following error :
Error in open.connection(x, "rb") : HTTP error 400.

You need to encode the query in HTML escapes:
page <- xml2::read_xml(URLencode(apiQuery))
This changes the "ê" to "%C3%AA".

Related

How to get rid of embedded NUL on a raw vector?

Im scraping a ASP.NET website.
This will return a raw element (reporte_nacido) which is a csv file (tab as delimiter):
reporte_nacido = postForm('https://xxxxx/WebSiteNDE/BirthsPages/FiltrosExcelNac.aspx',
.params = params,
curl = curl,
.opts = RCurl::curlOptions(ssl.verifypeer=FALSE, verbose=T))
If i load the file on a text viewer, it looks like this
Now im trying to load that raw element within R but i get the following error. I believe the file downloaded from the server comes corrupted somehow and R is being picky about it
rawToChar(as.vector(unlist(reporte_nacido)))
Error in rawToChar(as.vector(unlist(reporte_nacido))) :
embedded nul in string: '\xfe\xff\0N\0\xda\0M\0E\0R\0O\0 \0C\0E\0R\0T\0I\0F\0I\0C\0A\0D\0O\0\t\0D\0E\0P\0A\0R\0T\0A\0M\0E\0N\0T\0O\0\t\0M\0U\0N\0I\0C\0I\0P\0I\0O\0\t\0A\0R\0E\0A\0 \0N\0A\0C\0I\0M\0I\0E\0N\0T\0O\0\t\0I\0N\0S\0P\0E\0C\0C\0I\0O\0N\0 \0C\0O\0R\0R\0E\0G\0I\0M\0I\0E\0N\0T\0O\0 \0O\0 \0C\0A\0S\0E\0R\0I\0O\0 \0N\0A\0C\0I\0M\0I\0E\0N\0T\0O\0\t\0S\0I\0T\0I\0O\0 \0N\0A\0C\0I\0M\0I\0E\0N\0T\0O\0\t\0C\0\xd3\0D\0I\0G\0O\0 \0I\0N\0S\0T\0I\0T\0U\0C\0I\0\xd3\0N\0\t\0N\0O\0M\0B\0R\0E\0 \0I\0N\0S\0T\0I\0T\0U\0C\0I\0\xd3\0N\0\t\0S\0E\0X\0O\0\t\0P\0E\0S\0O\0 \0(\0G\0r\0a\0m\0o\0s\0)\0\t\0T\0A\0L\0L\0A\0 \0(\0C\0e\0n\0t\0\xed\0m\0e\0t\0r\0o\0s\0)\0\t\0F\0E\0C\0H\0A\0 \0N\0A\0C\0I\0M\0I\0E\0N\0T\0O\0\t\0H\0O\0R\0A\0 \0N\0A\0C\0I\0M\0I\0E\0N\0T\0O\0\t\0P\0A\0R\0T\0O\0 \0A\0T\0E\0N\0D\0I\0D\0O\0 \0P\0O\0R\0\t\0T\0I\0E\0M\0P\0O\0 \0D\0E\0 \0G\0E\0S\0T\0A\0C\0I\0\xd3\0N\0\t\0N\0\xda\0M\0E\0R\0O\0 \0C\0O\0N\0S\0U\0L\0T\0A\0S\0 \0P\0R\0E\0N\0A\0T\0A\0L\0E\0S\0\t\0T\0I\0P\0O\0 \0P\0A
The raw vector you are getting is text encoded as UTF-16. You can convert it like this:
library(stringi)
raw_vec <- as.vector(unlist(reporte_nacido))
decoded <- stri_encode(raw_vec, "UTF16")
decoded
#> [1] "NÚMERO CERTIFICADO\tDEPARTAMENTO\tMUNICIPIO\tAREA NACIMIENTO\tINSPECCION CORREGIMIENTO O CASERIO NACIMIENTO\tSITIO NACIMIENTO\tCÓDIGO INSTITUCIÓN\tNOMBRE INSTITUCIÓN\tSEXO\tPESO (Gramos)\tTALLA (Centímetros)\tFECHA NACIMIENTO\tHORA NACIMIENTO\tPARTO ATENDIDO POR\tTIEMPO DE GESTACIÓN\tNÚMERO CONSULTAS PRENATALES\tTIPO PA"
It appears to be tab-separated rather than csv format, so you probably want to read it like this:
read.table(text = decoded, sep = "\t", header = TRUE)

R retreive information from Eikon cloud

I am trying to get data from Eikon Elektron cloud platform.
I ran following codes from https://github.com/Refinitiv/websocket-api/blob/master/Applications/Examples/R/market_price_authentication.R:
library(curl)
> content = paste("grant_type=", "password","&username=", user, "&password=", password, sep="")
> h <- new_handle(copypostfields = content)
> h
<curl handle> (empty)
> handle_setheaders(h,
+ "Content-Type" = "application/x-www-form-urlencoded"
+ )
> handle_setopt(h, ssl_verifypeer = FALSE, ssl_verifyhost = FALSE)
> auth_url = paste("https://", auth_hostname, sep="")# ":", auth_port, "/getToken", sep="")
> auth_url
[1] "https://api.refinitiv.com/auth/oauth2/v1/token"
> req <- curl_fetch_memory(auth_url, **handle = h**)
> req
$url
[1] "https://api.refinitiv.com/auth/oauth2/v1/token"
$status_code
[1] 400
$type
[1] "application/json"
**> h
<curl handle> (https://api.refinitiv.com/auth/oauth2/v1/token)**
> res_headers = parse_headers(req$headers)
> auth_json_string = rawToChar(req$content)
> auth_json = fromJSON(auth_json_string)
> cat(toJSON(auth_json, pretty=TRUE, auto_unbox=TRUE))
{
"error": "invalid_request"
}
As you can see, I got invalid request error. I think the problem lies in curl_fetch_memory and that the handle=h is using same input as auth_url, however it should use something similiar to the input of content. What can I change in my code to make it work?
I found solution how to access the URL. In github was wrongly written that app_id in R file should equal to the code from the App Key generator in Reuters. However, app_id and client_id are different things and you should add client_id=value from App Key generator (not app_id).+ also do not forget to include trapi and etc.. to your content.

R string for calculating signature

Is there any less complicated solution for the following task?
For working with API I need to calculate signature using the formula below.
Problem comes from double quotes in request body, they stored as \" not ".
Code below generate text 1570702210SoMeF#ke123456[{ "id": "123", "title": "foo", "version": "2019-10-10 10:10:10 } ]}SoMeF#ke123456 and correct hash is "b6e783309e9d6f8ee47647373a9f6086020b3af8" by http://www.sha1-online.com/
Signature formula:
hex( sha1({GMT_UNIXTIME} + {API_SECRET} + {CONTENT} + {API_SECRET}) ), where
hex() - function which convert binary data into hexadecimal format
sha1() - standard hash-function SHA-1, must return binary data
text string concatenation
{API_SECRET} - a secret key which is issued together with login {API_LOGIN}
{CONTENT} - request body
Following code gives incorrect signature "c7a7ecbb0fd2d6eebfb378bdd061ea88d9fb2f69".
library(stringr)
library(lubridate)
library(digest)
API_SECRET <- 'SoMeF#ke123456'
mstime <- ymd_hms('2019-10-10 10:10:10')
my_id <- 123
title1 <- 'foo'
request_body_json <- paste0('[{ "id": "', my_id,'", "title": "', title1, '", "version": "', mstime, ' } ]}')
rbj1 <- paste0(round(as.numeric(mstime)), API_SECRET, request_body_json, API_SECRET)
signature <- digest(rbj1, algo = "sha1")
I see a workaround with saving string to file with cat() function and calculate signature from file, that gives correct signature "b6e783309e9d6f8ee47647373a9f6086020b3af8"
cat(paste0(round(as.numeric(mstime)), API_SECRET, request_body_json, API_SECRET), file = 'rbj.txt')
signature <- digest('rbj.txt', algo = "sha1", file = TRUE)
Prevent the input being serialized:
digest(rbj1, algo = "sha1", serialize = FALSE)
[1] "b6e783309e9d6f8ee47647373a9f6086020b3af8"

Authenticating API in R

I'm trying to access Bitfinex via api but struggling to properly authenticate my request. There is a Python example of what I want to do (https://gist.github.com/jordanbaucke/5812039) but I can't seem to get it to work in R.
key <- c("MY API KEY")
secret = c("MY API SECRET")
req <- GET("https://api.bitfinex.com/v1/balances",
authenticate(key, secret))
add_headers(X-BFX-APIKEY = key))
stop_for_status(req)
content(req)
Can someone tell me what I'm doing wrong?
/v1/balances is a Bitfinex authenticated endpoints, thus it requires the POST request with a proper handling of payload and headers.
Here is a working example from my own script:
library(httr)
key <- "..."
secret <- "..."
# payload JSON object, the request should refer to the URL
# nonce should always be greater than for previous calls
payload_json <- list(request = '/v1/account_infos', nonce = as.character(as.numeric(as.POSIXct(Sys.time()))))
# creating string from JSON payload
payload <- jsonlite::toJSON(payload_json, auto_unbox = TRUE)
# encoding payload string
payload_str <- base64enc::base64encode(charToRaw(as.character(payload)))
# adding three Bitfinex headers:
# X-BFX-APIKEY = key
# X-BFX-PAYLOAD = base64 encoded payload string
# X-BFX-SIGNATURE = sha384 encrypted payload string with the secret key
req <- POST("https://api.bitfinex.com/v1/account_infos",
add_headers('X-BFX-APIKEY' = key,
'X-BFX-PAYLOAD' = payload_str,
'X-BFX-SIGNATURE' = openssl::sha384(payload_str, key = secret))
)
content(req)
For creating "New Order" you only need to change the payload to something, like this:
payload_json <- list(request = '/v1/account_infos',
nonce = as.character(as.numeric(as.POSIXct(Sys.time()))),
symbol = 'BTCUSD',
amount = '0.3',
price = '1000',
exchange = 'bitfinex',
side = 'sell',
type = 'exchange market'
)
The rest of the code will work without changes.
For list of payload parameters check the Bitfinex API docs, e.g. "New Order".

Extracting IMF-data

I try to download the "World Gross domestic product, current prices" using the IMFdata-API. If I use this query:
library(IMFData)
databaseID <- 'IFS'
startdate = '2001-01-01'
enddate = '2016-12-31'
checkquery = TRUE
queryfilter <- list(CL_FREA="", CL_AREA_IFS="W0", CL_INDICATOR_IFS="NGDP_USD")
W0.NGDP.query <- CompactDataMethod(databaseID, queryfilter, startdate, enddate, checkquery)
I get this error:
Error: lexical error: invalid char in json text.
<?xml version="1.0" encoding="u
(right here) ------^
In addition: Warning message:
JSON string contains (illegal) UTF8 byte-order-mark!
How can I fix this? Is there a better way using Quandl, etc.?

Resources