Download a custom dataset in an Azure ML Jupyter/iPython Notebook using R

I need to download a custom dataset in an Azure Jupyter/iPython Notebook.
My ultimate goal is to install an R package. To do this, the package (uploaded as a dataset) needs to be downloaded in code. I followed the steps outlined by Andrie de Vries in the comments section of this post: Jupyter Notebooks with R in Azure ML Studio.
Uploading the package as a ZIP file worked without problems, but when I run the code in my notebook I get an error:
Error in curl(x$DownloadLocation, handle = h, open = conn): Failure when receiving data from the peer
Traceback:
download.datasets(ws, "plotly_3.6.0.tar.gz.zip")
lapply(1:nrow(datasets), function(j) get_dataset(datasets[j, ], ...))
FUN(1L[[1L]], ...)
get_dataset(datasets[j, ], ...)
curl(x$DownloadLocation, handle = h, open = conn)
So I simplified my code to:
library("AzureML")
ws <- workspace()
ds <- datasets(ws)
ds$Name
data <- download.datasets(ws, "plotly_3.6.0.tar.gz.zip")
head(data)
Where "plotly_3.6.0.tar.gz.zip" is the name of my dataset of data type "Zip".
Unfortunately this results in the same error.
To rule out data type issues I also tried to download another dataset of mine which is of data type "Dataset". Also the same error.
Next I changed the dataset I wanted to download to one of the sample datasets of Azure ML Studio.
# "text.preprocessing.zip" has data type Zip
data <- download.datasets(ws, "text.preprocessing.zip")
# "Flight Delays Data" has data type GenericCSV
data <- download.datasets(ws, "Flight Delays Data")
Both of the sample datasets can be downloaded without problems.
So why can't I download my own saved dataset?
I could not find anything helpful in the documentation of the download.datasets function, neither on rdocumentation.org nor on cran.r-project.org (pages 17-18).

Try this:
library(AzureML)
# Authenticate explicitly with your workspace ID and key ...
ws <- workspace(
  id   = "your AzureML ID",
  auth = "your AzureML Key"
)
# ... or, inside an Azure ML Studio notebook, pick up the defaults:
# ws <- workspace()
name <- "Name of your saved data"
# download the saved dataset by name
dat <- download.datasets(ws, name)

It seems the error I got was due to a bug in the (then early) Azure ML Studio.
After Daniel Prager's reply I tried again, only to find that my original code works as expected without any changes; adding the id and auth parameters was not needed.
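For completeness, the original goal was to install the plotly package from the uploaded archive. Once the zip is available on disk in the notebook's working directory (an assumption; how it gets there depends on how the downloaded dataset is stored), the installation itself is plain base R:
# Assumption: the uploaded archive is available locally as plotly_3.6.0.tar.gz.zip
unzip("plotly_3.6.0.tar.gz.zip")   # extracts plotly_3.6.0.tar.gz into the working directory
install.packages("plotly_3.6.0.tar.gz", repos = NULL, type = "source")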

Related

Error message: 'exdir' does not exist when using prism package

I am replicating the R code for a published paper about climate change that uses the prism package to access and visualize data from the Oregon State PRISM project.
The code stopped running with an error message of:
utils::unzip(outFile, exdir = ofolder) : 'exdir' does not exist
However, there is no unzipping in the code except what the prism package does internally. The code uses two functions from the prism package: 1) get_prism_dailys and 2) prism_set_dl_dir.
I have already read this page about the "exdir does not exist" error, but the code from the published paper does not contain an unzip command.
Can someone help me with this? Code:
lapply(c("tmin","tmax"), function(var){
print(var)
# Set directory to write daily files
prism_set_dl_dir(paste0(dir$prism,"/daily/",var))
# Download files
get_prism_dailys(type = var, minDate = "1981-01-01", maxDate = "2020-12-31", keepZip=FALSE)
})
})
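Since utils::unzip() raises "'exdir' does not exist" exactly when its target directory is missing, one plausible fix is to create each per-variable download directory before pointing prism at it. A minimal sketch, assuming dir$prism is defined as in the paper's code:
library(prism)

lapply(c("tmin", "tmax"), function(var) {
  dl_dir <- paste0(dir$prism, "/daily/", var)
  # Create the download directory if it does not exist yet,
  # so the unzip step inside get_prism_dailys() has a valid exdir
  dir.create(dl_dir, recursive = TRUE, showWarnings = FALSE)
  prism_set_dl_dir(dl_dir)
  get_prism_dailys(type = var, minDate = "1981-01-01",
                   maxDate = "2020-12-31", keepZip = FALSE)
})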

R-Studio IDE on Databricks

I am trying to import a dataset from my Databricks File System (DBFS) into RStudio, which is running on a Databricks cluster, and I am facing the issue below.
> sparkDF <- read.df(source = "parquet", path = "/tmp/lrs.parquet", header = "true", inferSchema = "true")
Error: Error in load : java.lang.SecurityException: No token to
authorize principal
at com.databricks.sql.acl.ReflectionBackedAclClient$$anonfun$com$databricks$sql$acl$ReflectionBackedAclClient$$token$2.apply(ReflectionBackedAclClient.scala:137)
at com.databricks.sql.acl.ReflectionBackedAclClient$$anonfun$com$databricks$sql$acl$ReflectionBackedAclClient$$token$2.apply(ReflectionBackedAclClient.scala:137)
at scala.Option.getOrElse(Option.scala:121)
at com.databricks.sql.acl.ReflectionBackedAclClient.com$databricks$sql$acl$ReflectionBackedAclClient$$token(ReflectionBackedAclClient.scala:137)
at com.databricks.sql.acl.ReflectionBackedAclClient$$anonfun$getValidPermissions$1.apply(ReflectionBackedAclClient.scala:86)
at com.databricks.sql.acl.ReflectionBackedAclClient$$anonfun$getValidPermissions$1.apply(ReflectionBackedAclClient.scala:81)
at com.databricks.sql.acl.ReflectionBackedAclClient.stripReflectionException(ReflectionBackedAclClient.scala:73)
at com.databricks.sql.acl.Refle
The DBFS location is correct. Any suggestions or blog posts are welcome!
The syntax for reading data with R on Databricks depends on whether you are reading into Spark or into R on the driver. See below:
# reading into Spark
sparkDF <- read.df(source = "parquet",
                   path = "dbfs:/tmp/lrs.parquet")

# reading into R on the driver
r_df <- read.csv("/dbfs/tmp/lrs.csv")
When reading into Spark, use the dbfs:/ prefix; when reading into R directly, use /dbfs/.
Use /dbfs before the directory path, for example: /dbfs/tmp/lrs.parquet
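Applying the /dbfs/ rule to the parquet file from the question, a driver-side read could look like the sketch below; it uses the arrow package, which is an assumption on my part and not mentioned in the answers above:
library(arrow)  # assumption: not part of the original answers

# Read the parquet file into a plain R data frame on the driver,
# using the /dbfs/ prefix required for local (non-Spark) file access
r_df <- arrow::read_parquet("/dbfs/tmp/lrs.parquet")
head(r_df)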

Problems exporting Power BI with R Script

Can you help me?
DataSource.Error: ADO.NET: A problem occurred while processing your R script.
Here are the technical details: Information about a data source is required.
Details:
DataSourceKind = R
DataSourcePath = R
Message = A problem occurred while processing your R script.
Here are the technical details: Information about a data source is required.
ErrorCode = -2147467259
ExceptionType = Microsoft.PowerBI.Scripting.R.Exceptions.RUnexpectedException
This error comes up when I try to export my table using an R script:
write.csv2(dataset,
           file = paste0("C:/Users/Acer/OneDrive/ALLPARTS/ALLPARTSNET_",
                         format(Sys.time(), "%Y%m%d"), ".csv"),
           row.names = FALSE)
Already tried:
reinstalling R, RStudio and Power BI
Is there a solution?
LINK POWER BI FILE
https://1drv.ms/u/s!AnziDWh5m2I1gplLfe9FM2M0EFqDyw
Click on File > Options & settings > Options.
In the Options window you can change the privacy setting in either the Global or the Current File section; click on Privacy.
Select "Ignore the Privacy Levels and potentially improve performance".
Run the script again.
Power BI often changes the data type when importing data, and this conflicts with how R interprets that data. In my specific case, the error I was getting was exactly the same as yours.
I fixed this by removing the Applied Step where Power BI changed the data type of all of my columns. It instantly fixed my issue.

acs package in R: Cannot download dataset, error message is inscrutable

I am trying to use the acs package in R to download Census data for a basic map, but I am unable to download the data and I'm receiving a confusing error message.
My code is as follows:
#Including all packages here in case this is somehow the issue
install.packages(c("choroplethr", "choroplethrMaps", "tidycensus", "tigris", "leaflet", "acs", "sf"))
library(choroplethr)
library(choroplethrMaps)
library(tidycensus)
library(tigris)
library(leaflet)
library(acs)
library(sf)
library(tidyverse)
api.key.install("my_api_key")
SD_geo <- geo.make(state = "CA", county = 73, tract = "*", block.group = "*")
median_income <- acs.fetch(endyear = 2015, span = 5, geography = SD_geo,
                           table.number = "B19013", col.names = "pretty")
Everything appears to work until the final command, when I receive the following error message:
trying URL 'http://web.mit.edu/eglenn/www/acs/acs-variables/acs_5yr_2015_var.xml.gz'
Content type 'application/xml' length 735879 bytes (718 KB)
downloaded 718 KB
Error in if (url.test["statusMessage"] != "OK") { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In (function (endyear, span = 5, dataset = "acs", keyword, table.name, :
XML variable lookup tables for this request
seem to be missing from ' https://api.census.gov/data/2015/acs5/variables.xml ';
temporarily downloading and using archived copies instead;
since this is *much* slower, recommend running
acs.tables.install()
This is puzzling to me because 1) it appears that something is in fact being downloaded at first, and 2) 'Error in if (url.test["statusMessage"] != "OK") { : missing value where TRUE/FALSE needed' makes no sense to me; it does not correspond to any of the arguments of the function.
I have tried:
Downloading the tables using acs.tables.install() as recommended in the second half of the error message. Doesn't help.
Changing the endyear and span to be sure that I'm falling within the years of data supported by the API. I seem to be, according to the API documentation. Have also used the package default arguments with no luck.
Using 'variable =' and the code for the variable as found in the official API documentation. This returns only the two lines with the mysterious "Error in if..." message.
Removing colnames = "pretty"
I'm going to just download the datafile as a CSV and read it into R for now, but I'd like to be able to perform this function from the script for future maps. Any information on what's going on here would be appreciated. I am running R version 3.3.2. Also, I'm new to using this package and the API. But I'm following the documentation and can't find evidence that I'm doing anything wrong.
Tutorial I am working off of:
http://zevross.com/blog/2015/10/14/manipulating-and-mapping-us-census-data-in-r-using-the-acs-tigris-and-leaflet-packages-3/#get-the-tabular-data-acs
And documentation of the acs package: http://eglenn.scripts.mit.edu/citystate/wp-content/uploads/2013/02/wpid-working_with_acs_R2.pdf
To follow up on Brandon's comment, version 2.1.1 of the package is now on CRAN, which should resolve this issue.
Your code runs for me. My guess would be that the Census API was temporarily down.
As you loaded tidycensus and you'd like to do some mapping, you might also consider the following code:
library(tidycensus)
census_api_key("your key here") # use `install = TRUE` to install the key
options(tigris_use_cache = TRUE) # optional - cache the Census shapefiles

median_income <- get_acs(geography = "block group",
                         variables = "B19013_001",
                         state = "CA", county = "San Diego",
                         geometry = TRUE)
This will get you the data you need, along with feature geometry for mapping, as a tidy data frame.
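Since the geometry = TRUE result is an sf data frame, a quick map can follow directly. The sketch below uses ggplot2's geom_sf, which is my own addition and not part of the original answer:
library(ggplot2)

# Choropleth of median household income (B19013) by block group,
# assuming `median_income` was created by the get_acs() call above
ggplot(median_income) +
  geom_sf(aes(fill = estimate), color = NA) +
  scale_fill_viridis_c(labels = scales::dollar) +
  labs(fill = "Median household income")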
I emailed Ezra Haber Glenn, the author of the package, about this as I was having the same issue. I received a response within 30 minutes, and it was after midnight, which I thought was amazing. Long story short: version 2.1.0 of the acs package is configured to work with the changes the Census Bureau is making to their API later this summer, and in the meantime it is causing some problems for Windows users. Ezra is going to release an update with a fix, but for now I reverted to version 2.0 and it works fine. I'm sure there are a few ways to do this, but I installed the devtools package and ran:
require(devtools)
install_version("acs", version = "2.0", repos = "http://cran.us.r-project.org")
Hope this helps anyone else having a similar issue.

send2cy doesn't work in RStudio ~ CyREST, Cytoscape

When I run "send2cy" function in R studio, I got error.
# Basic setup
library(igraph)
library(RJSONIO)
library(httr)
dir <- "/currentdir/"
setwd(dir)
port.number = 1234
base.url = paste("http://localhost:", toString(port.number), "/v1", sep="")
print(base.url)
# Load list of edges as Data Frame
network.df <- read.table("./data/eco_EM+TCA.txt")
# Convert it into igraph object
network <- graph.data.frame(network.df, directed = T)
# Remove duplicate edges & loops
g.tca <- simplify(network, remove.multiple=T, remove.loops=T)
# Name it
g.tca$name = "Ecoli TCA Cycle"
# This function will be published as a part of utility package, but not ready yet.
source('./utility/cytoscape_util.R')
# Convert it into Cytoscape.js JSON
cygraph <- toCytoscape(g.tca)
send2cy(cygraph, 'default%20black', 'circular')
Error in file(con, "r") : cannot open the connection
Called from: file(con, "r")
But I don't get the error when I use the "send2cy" function from terminal R (R started from the terminal simply by calling "R").
Any advice is welcome.
I tested your script with local copies of the network data and utility script, and with updated file paths. The script ran fine for me in RStudio.
Given the error message you are seeing ("Error in file...") I suspect the issue is with your local files and file paths, somehow in an RStudio-specific way.
FYI: an updated, consolidated set of R scripts for Cytoscape is available here: https://github.com/cytoscape/cytoscape-automation/tree/master/for-scripters/R. I don't think anything has significantly changed, but perhaps trying in a new context will resolve the issue you are facing.
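Given that the answer points at files and working directories rather than send2cy itself, a few quick checks inside RStudio can confirm whether the relative paths resolve. This is only a diagnostic sketch based on the paths used in the question:
# Where is RStudio actually running from? It may differ from terminal R.
getwd()

# Do the relative paths from the script resolve from that directory?
file.exists("./data/eco_EM+TCA.txt")       # network edge list
file.exists("./utility/cytoscape_util.R")  # defines toCytoscape() and send2cy()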
