How to create user agent? - r

I'm trying out this function from the package edgarwebR
x <- paste0("https://www.sec.gov/Archives/edgar/data/",
"933691/000119312517247698/0001193125-17-247698-index.htm")
try(filing_information(x))
But it returns me the following
No encoding supplied: defaulting to UTF-8.
Error in check_result(res) :
EDGAR request blocked from Undeclared Automated Tool.
Please visit https://www.sec.gov/developer for best practices.
See https://mwaldstein.github.io/edgarWebR/index.html#ethical-use--fair-access for your responsibilities
Consider also setting the environment variable 'EDGARWEBR_USER_AGENT
So I went to SEC and the information tells me to declare my user agent in the request headers. I'm new to this. How do I create a user agent?

I use the edgar package so I'm not sure this is helpful. But this is how I would do it using edgar package
library("edgar")
useragent = "Your Name Contact#domain.com"
info <- getFilingInfo('933691', 2021, quarter = c(1,2,3,4), form.type = 'ALL', useragent)

Related

jsonlite::fromJSON failed to connect to Port 443 in quantmod getFX()

I needed a function which automatically gets the Exchange Rate from a website (oanda.com), therefore I used the quantmod package and made a function based on that.
library(quantmod)
ForeignCurrency<-"MYR" # has to be a character
getExchangeRate<-function(ForeignCurrency){
Conv<-paste("EUR/",(ForeignCurrency),sep="")
getFX(Conv,from=Sys.Date()-179,to=Sys.Date())
Conv2<-paste0("EUR",ForeignCurrency,sep="")
Table<-as.data.frame(get(Conv2))
ExchangeRate<-1/mean(Table[,1])
ExchangeRate
}
ExchangeRate<-getExchangeRate(ForeignCurrency)
ExchangeRate
On my personal PC, it works perfectly and do what I want. If i run this on the working PC, I get following Error:
Warning: Unable to import “EUR/MYR”.
Failed to connect to www.oanda.com port 443: Timed out
I googled already a lot, it seems to be a Firewall Problem, but none of the suggestions I found there doesnt work. After checking the getFX() function, the Problem seems to be in the jsonlite::fromJSON function which getFX() is using.
Did someone of you faced a similar Problem? I am quite familar with R, but with Firewalls/Ports I have no expertise. Do I have to change something in the R settings or is it a Problem independent of R and something in the Proxy settings needs to be changed?
Can you please help :-) ?
The code below shows how you can do a workaround for the getfx() in an enterprise context where you often have to go through a proxy to the internet.
library(httr)
# url that you can find inside https://github.com/joshuaulrich/quantmod/blob/master/R/getSymbols.R
url <- "https://www.oanda.com/fx-for-business//historical-rates/api/data/update/?&source=OANDA&adjustment=0&base_currency=EUR&start_date=2022-02-17&end_date=2022-02-17&period=daily&price=mid&view=table&quote_currency_0=VND"
# original call inside the quantmod library: # Fetch data (jsonlite::fromJSON will handle connection) tbl <- jsonlite::fromJSON(oanda.URL, simplifyVector = FALSE)
# add the use_proxy with your proxy address and proxy port to get through the proxy
response <- httr::GET(url, use_proxy("XX.XX.XX.XX",XXXX))
status <- status_code(response)
if(status == 200){
content <- httr::content(response)
# use jsonlite to get the single attributes like quote currency, exchange rate = average and base currency
exportJson <- jsonlite::toJSON(content, auto_unbox = T)
getJsonObject <- jsonlite::fromJSON(exportJson, flatten = FALSE)
print(getJsonObject$widget$quoteCurrency)
print(getJsonObject$widget$average)
print(getJsonObject$widget$baseCurrency)
}

How to use R to correctly download data from FREDR package

I'm trying to run a code in R to get some data from FREDR package but I'm getting trouble to understand the error R shows me.
The code I have:
library(fredr)
fredr_set_key("...")
cpi <- fredr::fredr(series_id = "CPIAUCSL",observation_start = as.Date("1960-01-01"),observation_end = as.Date("2005-12-01"))
The error I get:
Error in (function (endpoint, ..., to_frame = TRUE, print_req = FALSE) :
400: Bad Request. The value for variable api_key is not a 32 character alpha-numeric lower-case string. Read https://research.stlouisfed.org/docs/api/api_key.html for more information.
This code runs perfectly in the computer of my professor (who is a Windows user) so I think that the problem may be related to my Mac but I'm really not sure.
Mac OS 10.15.4
Did you use your API key? You should request one here: https://research.stlouisfed.org/docs/api/api_key.html
Then replace ... with your API key.

Rcrawler package: ContentScraper Error

i've a problem with ContentScraper function of Rcrawler package. I would like to extract from this site some information about time and airports of arrival and departure and also the price: (I took inspiration fom this site)
MY_Data=ContentScraper(CssPatterns = c(".leg",".price"), ManyPerPattern = T, Url = "http://www.skyscanner.it/trasporti/voli/rome/lond/180201?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=0&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=day-view#results")
but i get this error:
Error in LinkExtractor(url = Ur, encod = encod) : object 'Extlinks' not found
I had a look to LinkExtractor function but i have no ideas of why it doesn't find Extlinks since it should be created by the function itself. Isn't it?
Someone could help me?
Thank You!
This website doesn't allow scraping. This may be one reason why your example doesn't work. You can try in this web. I also recommend you to try rvest package which is easier to use.
I have tried the same request using Rcrawler+phantomjs web driver but no result, There is some sort of javascript protection against unreal sessions,
br<-run_browser()
MY_Data<-ContentScraper(CssPatterns = c(".leg",".price"), ManyPerPattern = T, Url = "https://www.skyscanner.it/trasporti/voli/rome/lond/?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=0&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=day-view&oym=1903&selectedoday=01", browser = br, RenderingDelay = 5)
I retrieved the session Screenshot, and I can confirm that the javascript which load results is stuck .
Using Rselenium+ chrome headless (with gpu enabled) I got robot check page. (see images)
As a result the only hope to get data legitimately is to use their API
Rcrawler creator

RSocrata package with Chicago data neglects my token

I can not throttle-up my downloads by using the token issued to my app (on data.chicago.com portal, where I had to register)
Error 1:
token <- "___my_app_token__";
fdf <- read.socrata("h___s://data.cityofchicago.org/resource/7edu-s3u7.csv?$where=station_name=\"Foster Weather Station\"", token)
2016-10-06 10:39:53.685 getResponse:
Error in httr GET: 403 h___s://data.cityofchicago.org/resource/7edu-s3u7.csv?%24where=station_name%3D%22Foster%20Weather%20Station%22&app_token=%2524%2524app_token%3D___my_app_token_______
I have NO IDEA where did the first 'token' (2524 2524) come from, do you? Can somebody tell me? Maybe the author of the package is here?
Non-error:
fdf <- read.socrata("h___s://data.cityofchicago.org/resource/7edu-s3u7.csv?$where=station_name=\"Foster Weather Station\"")
WITHOUT A TOKEN (and not throttled-up) works perfectly well!
and this 'open source' h___s://github.com/Chicago/RSocrata/blob/master/R/RSocrata.R doesn't answer the question as well.
It looks like the syntax you're using to pass your app token is wrong. I'm no R expert, but I found this example in the documentation for the RSocrata library:
df <- read.socrata("http://soda.demo.socrata.com/resource/4334-bgaj.csv",
app_token = "__my_app_token__")
Try passing your app token as a named parameter instead of an indexed parameter, and see if that helps.

POST request using RCurl

As a way of exploring how to make a package in R for the Denver RUG, I decided that it would be a fun little project to write an R wrapper around the datasciencetoolkit API. The basic R tools come from the RCurl package as you might imagine. I am stuck on a seemingly simple problem and I'm hoping that somebody in this forum might be able to point me in the right direction. The basic problem is that I can't seem to use postForm() to pass an un-keyed string as part of the data option in curl, i.e. curl -d "string" "address_to_api".
For example, from the command line I might do
$ curl -d "Tim O'Reilly, Archbishop Huxley" "http://www.datasciencetoolkit.org/text2people"
with success. However, it seems that postForm() requires an explicit key when passing additional arguments into the POST request. I've looked through the datasciencetoolkit code and developer docs for a possible key, but can't seem to find anything.
As an aside, it's pretty straightforward to pass inputs via a GET request to other parts of the DSTK API. For example,
ip2coordinates <- function(ip) {
api <- "http://www.datasciencetoolkit.org/ip2coordinates/"
result <- getURL(paste(api, URLencode(ip), sep=""))
names(result) <- "ip"
return(result)
}
ip2coordinates('67.169.73.113')
will produce the desired results.
To be clear, I've read through the RCurl docs on DTL's omegahat site, the RCurl docs with the package, and the curl man page. However, I'm missing something fundamental with respect to curl (or perhaps .opts() in the postForm() function) and I can't seem to get it.
In python, I could basically make a 'raw' POST request using httplib.HTTPConnection -- is something like that available in R? I've looked at the simplePostToHost function in the httpRequest package as well and it just seemed to lock my R session (it seems to require a key as well).
FWIW, I'm using R 2.13.0 on Mac 10.6.7.
Any help is much appreciated. All of the code will soon be available on github if you're interested in playing around with the data science toolkit.
Cheers.
With httr, this is just:
library(httr)
r <- POST("http://www.datasciencetoolkit.org/text2people",
body = "Tim O'Reilly, Archbishop Huxley")
stop_for_status(r)
content(r, "parsed", "application/json")
Generally, in those cases where you're trying to POST something that isn't keyed, you can just assign a dummy key to that value. For example:
> postForm("http://www.datasciencetoolkit.org/text2people", a="Archbishop Huxley")
[1] "[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":44,\"end_index\":61,\"matched_string\":\"Archbishop Huxley\"},{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":88,\"end_index\":105,\"matched_string\":\"Archbishop Huxley\"}]"
attr(,"Content-Type")
charset
"text/html" "utf-8"
Would work the same if I'd used b="Archbishop Huxley", etc.
Enjoy RCurl - it's probably my favorite R package. If you get adventurous, upgrading to ~ libcurl 7.21 exposes some new methods via curl (including SMTP, etc.).
From Duncan Temple Lang on the R-help list:
postForm() is using a different style (or specifically Content-Type) of submitting the form than the curl -d command.
Switching the style = 'POST' uses the same type, but at a quick guess, the parameter name 'a' is causing confusion
and the result is the empty JSON array - "[]".
A quick workaround is to use curlPerform() directly rather than postForm()
r = dynCurlReader()
curlPerform(postfields = 'Archbishop Huxley', url = 'http://www.datasciencetoolkit.org/text2people', verbose = TRUE,
post = 1L, writefunction = r$update)
r$value()
This yields
[1]
"[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":0,\"end_index\":17,\"matched_string\":\"Archbishop
Huxley\"}]"
and you can use fromJSON() to transform it into data in R.
I just wanted to point out that there must be an issue with passing a raw string via the postForm function. For example, if I use curl from the command line, I get the following:
$ curl -d "Archbishop Huxley" "http://www.datasciencetoolkit.org/text2people
[{"gender":"u","first_name":"","title":"archbishop","surnames":"Huxley","start_index":0,"end_index":17,"matched_string":"Archbishop Huxley"}]
and in R I get
> api <- "http://www.datasciencetoolkit.org/text2people"
> postForm(api, a="Archbishop Huxley")
[1] "[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":44,\"end_index\":61,\"matched_string\":\"Archbishop Huxley\"},{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":88,\"end_index\":105,\"matched_string\":\"Archbishop Huxley\"}]"
attr(,"Content-Type")
charset
"text/html" "utf-8"
Note that it returns two elements in the JSON string and neither one matches on the start_index or end_index. Is this a problem with encoding or something?
The simplePostToHost function in the httpRequest package might do what you are looking for here.

Resources