Accessing Azure Data Lake with R

Has anyone used R to access Azure Data Lake (uploading and downloading files from ADLS in R)? If so, any code snippet would be helpful.

For ADLSgen2, you can use the AzureStor package, which is on CRAN. Example code from the README:
library(AzureStor)
# authenticate with AAD
token <- AzureRMR::get_azure_token("https://storage.azure.com",
    tenant="myaadtenant", app="app_id", password="mypassword")
ad_endp_tok <- storage_endpoint("https://mystorage.dfs.core.windows.net", token=token)
cont <- storage_container(ad_endp_tok, "myfilesystem")
list_storage_files(cont)
create_storage_dir(cont, "newdir")
storage_download(cont, "/readme.txt", "~/readme.txt")
storage_multiupload(cont, "N:/data/*.*", "newdir")
For ADLSgen1, you can use the AzureSMR package. However, note that AzureSMR is deprecated and no longer actively maintained.

Related

How to distribute a plumber API over several files using mounts?

I am dealing with a large API and I'd like to distribute its definition over several files.
As far as I understood from reading the documentation, this is where the "mount()" method of a plumber router comes into play.
I have tried the following:
iris.R:
#* Return a bit of iris
#* @get /iris
function(){
  head(iris)
}
In a new R session running:
irisAPI <- plumber::plumb("iris.R")
server <- plumber::plumber$new()
server$mount("/server", irisAPI)
server$run(host="0.0.0.0", port=8080, swagger= T)
Curling returns nothing, and the swagger JSON is empty.
If I cancel and then run the exact same thing on the irisAPI router directly, it works.
Is this a bug or am I missing something?
Thanks,

I had the same problem.
The problem was the plumber version. CRAN has 0.4.6; you need to install the newer version from GitHub using the devtools package in R (the docs call it 0.5.0, but the version that gets downloaded is 0.4.7.9000).
https://github.com/trestletech/plumber/blob/master/NEWS.md
https://cran.r-project.org/web/packages/plumber/index.html
The following code ran successfully for me:
library(plumber)
root <- plumber$new()
a <- plumber$new("controllers/a.R")
root$mount("/a", a)
b <- plumber$new("controllers/b.R")
root$mount("/b", b)
root$run(port = 8080, swagger=TRUE, debug= TRUE)
Regards!

How to use getReturns with the Yahoo finance API

I am having problems with the R package getReturns. I have been getting this error since 17 May:
Warning in file(file, "rt") :
cannot open URL 'http://ichart.finance.yahoo.com/table.csv?s=AAPL&a=4&b=28&c=2014&d=4&e=27&f=2017&g=w&ignore=.csv': HTTP status was '404 Not Found'
It looks like the ichart API is no longer running. Does anyone know how to fix this? I have run into the same issue with the quantmod R package.

I have encountered this problem as well. Yahoo! has taken down ichart and the open-source libraries that rely on it are now broken. Yahoo! also has no plans to introduce a replacement. For more information, see this post on Yahoo!'s Forums.

I switched to eodhistoricaldata.com after Yahoo failed. Several weeks ago I found it to be a good alternative, with an API very similar to Yahoo Finance.
Basically, in almost all of my R scripts I just changed this:
URL <- paste0("ichart.finance.yahoo.com/table.csv?s=", symbols[i])
to:
URL <- paste0("eodhistoricaldata.com/api/table.csv?s=", symbols[i])
Then add an API key and it will work the same way as before. It saved me a lot of time on my R scripts.
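
The swap described above can be kept in one place with a small helper. This is a minimal sketch in base R, not the provider's documented client: the function name build_table_url, the https:// scheme, and the api_token query parameter name are assumptions made for illustration, so check the provider's documentation for the real parameter name.

```r
# Hypothetical helper: build a table.csv URL for one symbol.
# `base` is the host/path quoted in the answer; `api_token` is an
# ASSUMED query parameter name, not confirmed by the source.
build_table_url <- function(symbol,
                            base = "eodhistoricaldata.com/api/table.csv",
                            api_key = NULL) {
  url <- paste0("https://", base, "?s=",
                utils::URLencode(symbol, reserved = TRUE))
  if (!is.null(api_key)) {
    url <- paste0(url, "&api_token=",
                  utils::URLencode(api_key, reserved = TRUE))
  }
  url
}

# Build one URL per symbol; "MY_KEY" is a placeholder.
symbols <- c("AAPL", "PFE")
urls <- vapply(symbols, build_table_url, character(1), api_key = "MY_KEY")
print(unname(urls))
```

Keeping the base URL as a default argument means a later change of provider only touches one line instead of every script.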

You can follow an earlier post of mine, which might help you.
I tried:
library(quantmod)
# Create an object containing the Pfizer ticker symbol
symbol <- "PFE"
# Use getSymbols to import the data
getSymbols(symbol, src="yahoo", auto.assign=T)
# because src='google' throws error, yahoo was used, and even that is down
When I tried another source, it worked:
# "quantmod::oanda.currencies" contains a list of currencies provided by Oanda.com
currency_pair <- "GBP/CAD"
# Load British Pound to Canadian Dollar exchange rate data
getSymbols(currency_pair, src="oanda")
str(GBPCAD)
It seems there are issues with the google and yahoo sources when using the quantmod package.
I suggest using 'Quandl' instead. Go to the Quandl website, register for free, create an API key, and then paste it in below:
# Install Quandl
install.packages("Quandl")
# or from github
install.packages("devtools")
library(devtools)
install_github("quandl/quandl-r")
# Load the Quandl package
library(Quandl)
# use API for full access
Quandl.api_key("xxxxxx")
# Download APPLE stock data
mydata = Quandl::Quandl.datatable("ZACKS/FC", ticker="AAPL")
For HDFC at BSE, you can use:
hdfc = Quandl("BSE/BOM500180")
For more details:
https://www.quandl.com/data/BSE-Bombay-Stock-Exchange?keyword=HDFC
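
The trial and error across sources described above (yahoo failing, oanda working) can be written as a small fallback loop. This is only a sketch in base R: first_working and the stand-in fetchers are hypothetical names introduced here, and the fetchers return dummy data so the example runs without network access.

```r
# Hypothetical helper: call each fetcher in turn and return the first
# result that comes back non-NULL without raising an error.
first_working <- function(fetchers) {
  for (name in names(fetchers)) {
    result <- tryCatch(fetchers[[name]](), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(source = name, data = result))
    }
  }
  stop("all sources failed")
}

# Stand-in fetchers: the first simulates the 404, the second succeeds.
fetchers <- list(
  yahoo = function() stop("HTTP status was '404 Not Found'"),
  oanda = function() data.frame(rate = c(1.70, 1.71))
)
res <- first_working(fetchers)
print(res$source)
```

In a real script each fetcher could wrap a call such as getSymbols(symbol, src="yahoo", auto.assign=FALSE), so that the data is returned instead of being assigned into the workspace.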

RSocrata package with Chicago data neglects my token

I cannot throttle up my downloads using the token issued to my app (on the data.chicago.com portal, where I had to register).
Error 1:
token <- "___my_app_token__";
fdf <- read.socrata("h___s://data.cityofchicago.org/resource/7edu-s3u7.csv?$where=station_name=\"Foster Weather Station\"", token)
2016-10-06 10:39:53.685 getResponse:
Error in httr GET: 403 h___s://data.cityofchicago.org/resource/7edu-s3u7.csv?%24where=station_name%3D%22Foster%20Weather%20Station%22&app_token=%2524%2524app_token%3D___my_app_token_______
I have no idea where the first 'token' (%2524%2524) came from. Can somebody tell me? Maybe the author of the package is here?
Non-error:
fdf <- read.socrata("h___s://data.cityofchicago.org/resource/7edu-s3u7.csv?$where=station_name=\"Foster Weather Station\"")
Without a token (and not throttled up) it works perfectly well!
And the 'open source' code at h___s://github.com/Chicago/RSocrata/blob/master/R/RSocrata.R doesn't answer the question either.

It looks like the syntax you're using to pass your app token is wrong. I'm no R expert, but I found this example in the documentation for the RSocrata library:
df <- read.socrata("http://soda.demo.socrata.com/resource/4334-bgaj.csv",
app_token = "__my_app_token__")
Try passing your app token as a named parameter instead of an indexed parameter, and see if that helps.
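
As for the %2524%2524 in the 403 message: one reading, offered as an assumption and not a confirmed diagnosis, is that the token string was percent-encoded twice. %25 is the encoding of "%" and %24 is the encoding of "$", so %2524 is what "$" becomes after two rounds of encoding. The base R sketch below only demonstrates that decoding; it does not show where in RSocrata the double encoding happens.

```r
# Decode the fragment from the 403 message one layer at a time.
once  <- utils::URLdecode("%2524%2524app_token")
twice <- utils::URLdecode(once)
print(once)   # "%24%24app_token"
print(twice)  # "$$app_token"

# Going the other way: encoding "$" twice reproduces "%2524".
# repeated = TRUE forces URLencode to re-encode an already-encoded string.
enc1 <- utils::URLencode("$", reserved = TRUE)
enc2 <- utils::URLencode(enc1, reserved = TRUE, repeated = TRUE)
print(c(enc1, enc2))  # "%24" "%2524"
```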

Download a custom dataset in Azure ML Jupyter/iPython Notebook using R

I need to download a custom dataset in an Azure Jupyter/iPython Notebook.
My ultimate goal is to install an R package. To be able to do this the package (the dataset) needs to be downloaded in code. I followed the steps outlined by Andrie de Vries in the comments section of this post: Jupyter Notebooks with R in Azure ML Studio.
Uploading the package as a ZIP file worked without problems, but when I run the code in my notebook I get an error:
Error in curl(x$DownloadLocation, handle = h, open = conn): Failure when receiving data from the peer
Traceback:
download.datasets(ws, "plotly_3.6.0.tar.gz.zip")
lapply(1:nrow(datasets), function(j) get_dataset(datasets[j, . ], ...))
FUN(1L[[1L]], ...)
get_dataset(datasets[j, ], ...)
curl(x$DownloadLocation, handle = h, open = conn)
So I simplified my code into:
library("AzureML")
ws <- workspace()
ds <- datasets(ws)
ds$Name
data <- download.datasets(ws, "plotly_3.6.0.tar.gz.zip")
head(data)
Where "plotly_3.6.0.tar.gz.zip" is the name of my dataset of data type "Zip".
Unfortunately this results in the same error.
To rule out data type issues I also tried to download another dataset of mine, which is of data type "Dataset". This gave the same error.
I then changed the dataset I wanted to download to one of the sample datasets of AzureML Studio.
"text.preprocessing.zip" is of datatype Zip
data <- download.datasets(ws, "text.preprocessing.zip")
"Flight Delays Data" is of datatype GenericCSV
data <- download.datasets(ws, "Flight Delays Data")
Both of the sample datasets can be downloaded without problems.
So why can't I download my own saved dataset?
I could not find anything helpful in the documentation of the download.datasets function, neither on rdocumentation.org nor on cran.r-project.org (pages 17-18).

Try this:
library(AzureML)
ws <- workspace(
id = "your AzureML ID",
auth = "your AzureML Key"
)
name <- "Name of your saved data"
# download the saved dataset by name from the authenticated workspace
data <- download.datasets(ws, name)

It seems the error I got was due to a bug in the (then early) Azure ML Studio.
I tried again after the reply of Daniel Prager only to find out my code works as expected without any changes. Adding the id and auth parameters was not needed.

Yahoo geocoding API in R

I am trying to do a batch geocode with the Yahoo BOSS api from R.
It is currently throwing an error based on credentials - any idea how I can get this to succeed?
library(httr)
myapp <- oauth_app("yahoo",
key = "my key",
secret = "my secret"
)
yahoo <- oauth_endpoint("get_request_token", "request_auth", "get_token",
base_url = "https://yboss.yahooapis.com/geo/placefinder")
token <- oauth1.0_token(myapp, yahoo)
sig <- sign_oauth1.0(myapp, token$oauth_token, token$oauth_token_secret)
GET("https://yboss.yahooapis.com/geo/placefinder",
sig)

Unfortunately Yahoo uses a weird authentication strategy that isn't compatible with a simple oauth_endpoint call. You can see the general flow I use in the rydn package that @Scott pointed out here.
You might benefit from just using that package, or feel free to leverage the working example I have there in your own stuff.
