How to apply Spark configuration settings from R Markdown document parameters

This is my first post, so please be kind to me and my poor English.
I'm interested in running this code (this is just the script). It seems that the code cannot be run without customizing it, but I don't understand what I should change.
Maybe I just installed Spark incorrectly? I'm using the latest version of RStudio. The script's notes say the following, but I don't understand whether the latest version is suitable:
Please note that sparklyr version 0.7.0+ (available on GitHub, but not yet released on CRAN) is needed.
The error occurs when execution reaches this line of code:
# Apply Spark configuration settings from R markdown document parameters
spark_param_names <- grep("spark.", names(params),
fixed = TRUE, value = TRUE)
The error is the following:
Error in shell_connection(master = master, spark_home = spark_home, app_name = app_name, :
Failed to connect to Spark (SPARK_HOME is not set).
I'm a student and not very experienced. Thanks for your patience.
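A minimal sketch of what that line is presumably for: copying every parameter whose name contains "spark." into a Spark configuration before connecting. The params list here is a hypothetical stand-in for the one knitr builds from the YAML header, and the spark_connect() call is left commented out because it needs a working Spark installation. Note that the grep() line itself cannot raise the SPARK_HOME error; that comes from the connect step (shell_connection), so one plausible fix is to install a local Spark with sparklyr::spark_install() or to point SPARK_HOME at an existing installation.

```r
# Hypothetical stand-in for the `params` list that R Markdown builds from
# the document's YAML header; in a real document, use the existing `params`.
params <- list(
  spark.executor.memory = "2G",
  spark.executor.cores  = 2,
  sample_size           = 1000  # no "spark." in the name, so it is skipped
)

# Copy every parameter whose name contains "spark." into a config list.
apply_spark_params <- function(params, config = list()) {
  spark_param_names <- grep("spark.", names(params), fixed = TRUE, value = TRUE)
  for (name in spark_param_names) {
    config[[name]] <- params[[name]]
  }
  config
}

# With sparklyr installed, pass sparklyr::spark_config() as `config` instead.
config <- apply_spark_params(params)
str(config)

# The SPARK_HOME error is raised by the connect step, not by the lines above.
if (!nzchar(Sys.getenv("SPARK_HOME"))) {
  message("SPARK_HOME is not set: install Spark with sparklyr::spark_install() ",
          "or set it with Sys.setenv(SPARK_HOME = \"<path to Spark>\")")
}
# sc <- sparklyr::spark_connect(master = "local", config = config)
```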

Related

How to build API documentation for an R package in RStudio?

I'm working on a very recent Windows 10 build with RStudio 1.3.959, and I've just installed the latest MiKTeX.
I'm trying to put together an R package using RStudio. I can build the package, and the function documentation comments are being converted into /man/*.Rd files. These are then displayed successfully when one executes ?function_name in the RStudio console window.
Unfortunately, I'm having very little luck building the PDF package API documentation (not to be confused with the vignette, which I can build). I've looked over a good few tutorials, but they all stop short of explaining how to build the final PDF API document that one expects with every R package.
I've tried:
Build[Windows]->More->Document ... which executes devtools::document(roclets = c('rd', 'collate', 'namespace', 'vignette'))
Build[Windows]->More->Build Source Package ... which executes devtools::document(roclets = c('rd', 'collate', 'namespace', 'vignette')) followed by devtools::build(binary = TRUE, args = c('--preclean'))
Build[Windows]->More->Build Binary Package ... which executes devtools::document(roclets = c('rd', 'collate', 'namespace', 'vignette')) followed by devtools::build(binary = TRUE, args = c('--preclean'))
All three work as expected, but there is still no final package manual PDF file.
Doing some digging on Stack Overflow, I noticed someone used the command:
devtools::build_manual()
I'm convinced this is what I need. However, when I execute that line of code I get the error:
Converting Rd files to LaTeX ...
Warning in sys2(makeindex, shQuote(idxfile)) : '"makeindex"' not found
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
unable to run 'makeindex' on 'Rd2.idx'
Warning in sys2(makeindex, shQuote(idxfile)) : '"makeindex"' not found
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
unable to run 'makeindex' on 'Rd2.idx'
Error in running tools::texi2pdf()
Error: Failed to build manual
This has left me none the wiser, although it's quite clear that something is upset by the absence of makeindex. Any help is much appreciated.
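A minimal sketch of a first check, in the same spirit as the Sys.which("zip") test used for the AzureML zip problem: it reports which LaTeX helpers R cannot see on its PATH. The warning names makeindex; including pdflatex is an assumption. If either is reported missing, a plausible (unverified) cause is that MiKTeX's bin folder is not on the PATH that R inherits, which would need fixing followed by an R restart.

```r
# Report which of the given command-line tools R cannot find on its PATH.
# pdflatex is an assumption; makeindex is the tool named in the warning.
missing_latex_tools <- function(tools = c("pdflatex", "makeindex")) {
  found <- Sys.which(tools)          # "" for every tool that is not found
  names(found)[!nzchar(found)]
}

not_found <- missing_latex_tools()
if (length(not_found) > 0) {
  message("Not visible to R: ", paste(not_found, collapse = ", "),
          ". Check that MiKTeX's bin folder is on PATH, then restart R.")
} else {
  message("pdflatex and makeindex are visible to R.")
}
```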
If I understand correctly, you mean the standard reference manual, for example the ggplot2 reference manual. In my experience, this is thrown together by CRAN when you submit. As far as I can tell, the manual is just a collection of things that a well-documented package should have anyway, such as a DESCRIPTION file, a NAMESPACE file, and the various .Rd files for the actual documentation.
Even looking at the public GitHub repository for ggplot2, we see that they do not have the manual in their repository. Additionally, there isn't any evidence in their .gitignore files to suggest that they made the manual themselves.
However, if you want to make it yourself, devtools::build_manual() is the correct function call.
I was able to make the manual with a preexisting package from GitHub. I would suggest reinstalling your devtools package and making sure there are no warnings or errors. It may be helpful to run a session as administrator to ensure things get installed correctly.
Good luck!
I am using R version 3.4 with RStudio 1.1.453 on macOS High Sierra.

RStudio Connect, Packrat, and custom packages in local repos

We recently got RStudio Connect in my office. For our work, we have made custom packages, which we have been updating among ourselves by opening the project and using Build & Reload.
I understand that the only way to get our custom packages working within apps on RStudio Connect is to set up a local repo and set our options(repos) to include it.
Currently I have the following:
library(drat)
RepoAddress <- "C:/<RepoPath>" # High level path
drat::insertPackage(<sourcePackagePath>, repodir = RepoAddress)
# Add this new repo to Rs knowledge of repos.
options(repos = c(options("repos")$repos,LocalCurrent = paste0("file:",RepoAddress)))
# Install <PackageName> from the local repo :)
install.packages("<PackageName>")
Currently this works nicely and I can install my custom package from the local repo. This indicates to me that the local repo is set up correctly.
As an additional aside, I have changed the DESCRIPTION file to have an extra line saying repository:LocalCurrent.
However, when I try to deploy a Shiny app or Rmd that references the custom package, I get the following error on deploy:
Error in findLocalRepoForPkg(pkg, repos, fatal = fatal) :
No package '<PackageName> 'found in local repositories specified
I understand this is a problem with packrat being unable to find my local repos during the deploy process (I believe at a stage where it uses packrat::snapshot()). This is confusing, since I would have thought packrat would use my options("repos") repos, just as install.packages does. If I follow through the functions, I can see the particular point of failure is packrat:::findLocalRepoForPkg("<PackageName>", repos = packrat::get_opts("local.repos")), which fails even after I define packrat::set_opts("local.repos" = c(CurrentRepo2 = paste0("file:",RepoAddress)))
If I drill into packrat:::findLocalRepoForPkg, it fails because it can't find a file/folder called: "C://". I would have thought this is guaranteed to fail, because repos follow the C://bin/windows/contrib/3.3/ structure. At no point would a repo have the structure it's looking for?
I think this last part is showing I'm materially misunderstanding something. Any guidance on configuring my repo so packrat can understand it would be great.
One should always check which options RStudio Connect currently supports:
https://docs.rstudio.com/connect/admin/r/package-management/#private-packages
Personally, I dislike all the options for including local/private packages, as they defeat the purpose of having a nice, easy target for deploying Shiny apps. In many cases, I can't just set up local repositories in the organization, because I do not have clearance for that. It is also inconvenient to have to email IT support to get them to install new packages manually. Overall, I think RStudio Connect is a great product because it is simple, but when it comes to local packages it really is not.
I found a nice alternative (a hack) to RStudio's official recommendations. I suppose this would also work with shinyapps.io, but I have not tried it. The solution goes like this:
(1) Add to global.R: if(!require(local_package)) devtools::load_all("./local_package")
(2) Write a script that copies all your source files, so that you get a Shiny app with the source directory of the local package inside it. You could call the directory ./inst/shinyconnect/ or whatever, and the local package would be copied to ./inst/shinyconnect/local_package
(3) Add the script ./shinyconnect/packrat_sees_these_dependencies.R to the Shiny folder; it will be picked up by the packrat manifest.
(4) Hack rsconnect/packrat to ignore specifically named packages when building the manifest.
(1)
# start of global.R...
# load more packages for shiny
library(devtools) # needed for load_all
library(shiny)
library(htmltools) # or whatever you need
# load the already-built local_package, or (on Shiny Connect) pseudo-build it on the fly and load it
if (!require(local_package)) {
  # if the local_package source is here, just build it in ~2 seconds with devtools::load_all()
  if (file.exists("./DESCRIPTION")) load_all(".") # for local tests on PC/Mac, where the Shiny app is inside local_package
  if (file.exists("./local_package/DESCRIPTION")) load_all("./local_package/") # for Shiny Connect, where local_package is inside the Shiny app
}
library(local_package) # now local_package must load
(3)
Make a script that loads all the dependencies of your local package. Packrat will see it, but the script will never actually be executed. Place it at ./shinyconnect/packrat_sees_these_dependencies.R
# these lines will be recognized by packrat, and the packages will be added to the manifest
library(randomForest)
library(MASS)
library(whateverpackageyouneed)
(4) During deployment, the manifest generator (packrat) needs to ignore the existence of any named local_package. This is an option in packrat, but rsconnect does not expose it. A hack is to load rsconnect into memory and modify the sub-sub-sub-function performPackratSnapshot() to allow this. In the script below, I do that and deploy a Shiny app.
library(rsconnect)
orig_fun = getFromNamespace("performPackratSnapshot", pos = "package:rsconnect")
# packages you want to include manually, and packrat to ignore
ignored_packages = c("local_package")
# hijack rsconnect
local({
  assignInNamespace("performPackratSnapshot", value = function(bundleDir, verbose = FALSE) {
    owd <- getwd()
    on.exit(setwd(owd), add = TRUE)
    setwd(bundleDir)
    srp <- packrat::opts$snapshot.recommended.packages()
    packrat::opts$snapshot.recommended.packages(TRUE, persist = FALSE)
    packrat::opts$ignored.packages(get("ignored_packages", envir = .GlobalEnv)) # ignoring the packages listed here
    print("ignoring following packages")
    print(get("ignored_packages", envir = .GlobalEnv))
    on.exit(packrat::opts$snapshot.recommended.packages(srp, persist = FALSE), add = TRUE)
    packages <- c("BiocManager", "BiocInstaller")
    for (package in packages) {
      if (length(find.package(package, quiet = TRUE))) {
        requireNamespace(package, quietly = TRUE)
        break
      }
    }
    suppressMessages(packrat::.snapshotImpl(project = bundleDir,
      snapshot.sources = FALSE, fallback.ok = TRUE, verbose = FALSE,
      implicit.packrat.dependency = FALSE))
    TRUE
  },
  pos = "package:rsconnect"
  )},
  envir = as.environment("package:rsconnect")
)
new_fun = getFromNamespace("performPackratSnapshot", pos = "package:rsconnect")
rsconnect::deployApp(appDir = "./inst/shinyconnect/", appName = "shinyapp_name", logLevel = "verbose", forceUpdate = TRUE)
The problem is one of nomenclature.
I have set up a repo in the CRAN sense, and it works fine. When packrat refers to a local repo, it means a local git-style repo.
This explains why findLocalRepoForPkg doesn't look like it will work: it is designed for a different kind of repo.
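A minimal sketch of the difference, based on the debugging in the question (findLocalRepoForPkg looks for a file or folder named after the package directly inside each local repo). A drat/CRAN-style repo only contains src/contrib and bin/windows/contrib/... folders, so that lookup fails, whereas a packrat-style local repo is assumed here to hold one package source folder per package. The helper find_pkg_in_local_repos() and the package mypkg are hypothetical.

```r
# Hypothetical helper mirroring the lookup described in the question:
# return the first local repo that has a folder named after the package.
find_pkg_in_local_repos <- function(pkg, repos) {
  hits <- repos[dir.exists(file.path(repos, pkg))]
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Build two throwaway repos to compare the layouts.
root <- tempfile("repos_")
cran_like    <- file.path(root, "cran_like")     # drat/CRAN layout
packrat_like <- file.path(root, "packrat_like")  # one source folder per package
dir.create(file.path(cran_like, "bin", "windows", "contrib", "3.3"), recursive = TRUE)
dir.create(file.path(packrat_like, "mypkg"), recursive = TRUE)
writeLines(c("Package: mypkg", "Version: 0.1.0"),
           file.path(packrat_like, "mypkg", "DESCRIPTION"))

find_pkg_in_local_repos("mypkg", cran_like)                   # NA: no <repo>/mypkg folder
find_pkg_in_local_repos("mypkg", c(cran_like, packrat_like))  # the packrat_like path
# With packrat itself: packrat::set_opts(local.repos = packrat_like)
```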
Feel free to also reach out to support@rstudio.com.
I believe the local package code path is triggered in packrat because the Repository: field is missing from the package's DESCRIPTION file. You mentioned that you added this line; could you try the case-sensitive version (Repository: rather than repository:)?
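A minimal sketch of that change using base R's read.dcf() and write.dcf(), run here against a throwaway DESCRIPTION file; the helper name set_repository_field() is hypothetical, and LocalCurrent is the repo name from the question. It drops the lowercase repository: line and writes a Repository: field instead.

```r
# Hypothetical helper: set the Repository field of a DESCRIPTION file.
set_repository_field <- function(desc_path, repo_name) {
  desc <- read.dcf(desc_path)
  # Field names are matched case-sensitively, so drop a lowercase variant
  desc <- desc[, colnames(desc) != "repository", drop = FALSE]
  if ("Repository" %in% colnames(desc)) {
    desc[, "Repository"] <- repo_name
  } else {
    desc <- cbind(desc, Repository = repo_name)
  }
  write.dcf(desc, desc_path)
  invisible(desc)
}

# Demo on a throwaway DESCRIPTION file that uses the lowercase field
demo_desc <- tempfile("DESCRIPTION_")
writeLines(c("Package: mypkg", "Version: 0.1.0", "repository: LocalCurrent"), demo_desc)
set_repository_field(demo_desc, "LocalCurrent")
cat(readLines(demo_desc), sep = "\n")
```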
That said, RStudio Connect will not be able to install the package from the RepoAddress as you've specified it (hardcoded on the Windows share). We'd recommend hosting your repo over https from a server that both your Dev environment and RStudio Connect have access to. To make this type of repo setup much easier we just released RStudio Package Manager which you (and IT!) may find a compelling alternative to manually managing releases of your internal packages via drat.
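A minimal sketch of the client side of that setup, assuming the drat repo from the question has been copied to a web server that both machines can reach. The URL is hypothetical; contrib.url() shows where install.packages() will look for source packages under it.

```r
# Hypothetical URL: replace with a server that both your dev machine and
# RStudio Connect can reach
internal_repo <- "https://packages.example.internal/drat"

# Same pattern as the question's file: repo, but served over HTTPS
options(repos = c(getOption("repos"), LocalCurrent = internal_repo))

# install.packages() looks for source packages under <repo>/src/contrib
contrib.url(internal_repo, type = "source")
```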

Problems with R Request price feed through Bloomberg API

I followed the steps outlined below:
Install the Java API v3 from the Bloomberg terminal (by typing WAPI into the command bar). Once it is installed, connect it to R using install.packages("Rbbg", repos = "http://r.findata.org") and conn <- blpConnect(log.level = "finest"). Finally, to extract share price information, use bdp(conn,securities,function).
I get an error when trying to connect that gives me the following message:
Error in .jnew("org/findata/blpwrapper/Connection", java.log.level) :
org.findata.blpwrapper.WrapperException: Session not started because: CONNECTION_FAILURE
Any advice how to resolve this would be very appreciated.
Please try migrating to Rblpapi. It is a more modern equivalent and can be found on CRAN (install.packages("Rblpapi")) or GitHub (https://github.com/Rblp/Rblpapi).
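A minimal sketch of a price request with Rblpapi, assuming a Bloomberg terminal (or another Bloomberg API endpoint) is running on the same machine; the securities and the PX_LAST field are illustrative. The call is wrapped in tryCatch() so that, without a terminal or without Rblpapi installed, it reports the error message instead of stopping.

```r
# Sketch: fetch fields for a set of securities via Rblpapi.
fetch_last_price <- function(securities, fields = "PX_LAST") {
  if (!requireNamespace("Rblpapi", quietly = TRUE)) {
    stop("Install Rblpapi first: install.packages(\"Rblpapi\")")
  }
  con <- Rblpapi::blpConnect()   # connects to the local Bloomberg API by default
  Rblpapi::bdp(securities, fields, con = con)
}

result <- tryCatch(
  fetch_last_price(c("IBM US Equity", "AAPL US Equity")),
  error = function(e) conditionMessage(e)
)
print(result)
```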

Publishing AzureML Webservice from R requires external zip utility

I want to deploy a basic trained R model as a webservice to AzureML. Similar to what is done here:
http://www.r-bloggers.com/deploying-a-car-price-model-using-r-and-azureml/
Since that post, the publishWebService function in the R AzureML package has changed: it now requires a workspace object as its first parameter, so my R code looks as follows:
library(MASS)
library(AzureML)
PredictionModel = lm( medv ~ lstat , data = Boston )
PricePredFunktion = function(percent)
{return(predict(PredictionModel, data.frame(lstat =percent)))}
myWsID = "<my Workspace ID>"
myAuth = "<my Authorization code>"
ws = workspace(myWsID, myAuth, api_endpoint = "https://studio.azureml.net/", .validate = TRUE)
# publish the R function to AzureML
PricePredService = publishWebService(
ws,
"PricePredFunktion",
"PricePredOnline",
list("lstat" = "float"),
list("mdev" = "float"),
myWsID,
myAuth
)
But every time I execute the code I get the following error:
Error in publishWebService(ws, "PricePredFunktion", "PricePredOnline", :
Requires external zip utility. Please install zip, ensure it's on your path and try again.
I tried installing programs that handle zip files (like 7-Zip) on my machine, as well as loading the utils library in R, which lets R interact with zip files directly, but I couldn't get rid of the error.
I also found the package code that throws the error; it is on line 154 of this file:
https://github.com/RevolutionAnalytics/AzureML/blob/master/R/internal.R
but it didn't help me in figuring out what to do.
Thanks in advance for any Help!
The Azure Machine Learning API requires the payload to be zipped, which is why the package insists on the zip utility being installed. (This is an unfortunate situation, and hopefully we can find a way in future to include a zip with the package.)
It is unlikely that you will ever encounter this situation on Linux, since most (all?) Linux distributions include a zip utility.
Thus, on Windows, you have to do the following procedure once:
Install a zip utility (RTools has one and this works)
Ensure the zip is on your path
Restart R – this is important, otherwise R will not recognize the changed path
Upon completion, the litmus test is if R can see your zip. To do this, try:
Sys.which("zip")
You should get a result similar to this:
zip
"C:\\Rtools\\R-3.1\\bin\\zip.exe"
In other words, R should recognize the installation path.
On previous occasions when people told me this didn’t work, it was always because they thought they had a zip in the path, but it turned out they didn’t.
One last comment: installing 7zip may not work. The reason is that 7zip contains a utility called 7zip, but R will only look for a utility called zip.
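A minimal sketch of the check-and-fix for a single session, assuming Rtools was installed to C:/Rtools (a hypothetical path; check where it actually went). Sys.setenv() only changes PATH for the running R process, which is why a permanent change to the system PATH still needs the R restart described above.

```r
# Hypothetical Rtools location: check where Rtools was actually installed
rtools_bin <- "C:/Rtools/bin"

ensure_zip_on_path <- function(bin_dir) {
  if (nzchar(Sys.which("zip"))) return(unname(Sys.which("zip")))  # already visible
  if (dir.exists(bin_dir)) {
    # Prepend for this R session only; the change is lost when R restarts
    Sys.setenv(PATH = paste(bin_dir, Sys.getenv("PATH"), sep = .Platform$path.sep))
  }
  unname(Sys.which("zip"))  # "" if zip is still not found
}

zip_path <- ensure_zip_on_path(rtools_bin)
if (!nzchar(zip_path)) message("zip still not found: check the Rtools install path")
```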
I saw this link earlier, but two extra points explain why my code did not work at first:
1. The address and path of Rtools were not as straightforward as expected.
2. You need to restart R.
Regarding the address, always check where Rtools was actually installed. I also used this code to set the path, and ALWAYS ADD ZIP at the end:
##Rtools.bin="C:\\Users\\User_2\\R-Portable\\Rtools\\bin"
Rtools.bin="C:\\Rtools\\bin\\zip"
sys.path = Sys.getenv("PATH")
if (Sys.which("zip") == "" ) {
system(paste("setx PATH \"", Rtools.bin, ";", sys.path, "\"", sep = ""))
}
Sys.which("zip")
You should get a result like:
"C:\\RTools\\bin\\zip"
Judging from Andrie's comment here: https://github.com/RevolutionAnalytics/AzureML/commit/9cf2c5c59f1f82b874dc7fdb1f9439b11ab60f40
it seems we can just download Rtools and be done with it.
Download RTools from:
https://cran.r-project.org/bin/windows/Rtools/
During installation select the check box to modify the PATH
At first it didn't work. I then tried 32-bit R, and that seemed to work. Then 64-bit R started working too. Honestly, I'm not sure whether I did something in between to make it work. It only takes a few minutes, so it's worth a punt.
Try the following:
- Download Rtools, which usually contains the zip utility.
- Copy all the files in the "bin" folder of "Rtools".
- Paste them into the "~/RStudio/bin/x64" folder.

Error when using getGEO() in package GEOquery

I'm running the following code in R:
library(GEOquery)
mypath <- "C:/Users/Farzin/Desktop/BIOC"
GDS1 <- getGEO('GDS1',destdir=mypath)
But I'm getting the following error:
Using locally cached version of GDS1 found here:
C:/Users/Farzin/Desktop/BIOC/GDS1.soft.gz
Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) :
invalid 'nlines' argument
Could anyone please tell me how I could get rid of this error?
I have had the same error using GEOquery (version 2.23.5) with R and Bioconductor on Ubuntu (12.04), whichever GDS file I queried. Could it be that the GEOquery package is faulty?
In my experience, getGEO is extremely finicky. I commonly experience issues connecting to the GEO server. If this happens during download, getGEO leaves a partial file behind. Because the partial file is there, a re-download will use this cached, partially downloaded file and run into the error you see (as expected, since it is not the full file).
To solve this, delete the cached SOFT file and retry the download.
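A minimal sketch of that fix, assuming the stale cache is the GDS1.soft.gz file named in the error message. The hypothetical helper clear_geo_cache() deletes only the SOFT file for one accession, so other cached downloads in the same destdir survive; the demo runs on a temporary folder.

```r
# Hypothetical helper: delete the cached SOFT file(s) for one GEO accession.
clear_geo_cache <- function(accession, destdir) {
  # Matches e.g. GDS1.soft.gz or GDS1.soft, but not GDS10.soft.gz
  pattern <- paste0("^", accession, "\\.soft(\\.gz)?$")
  stale <- list.files(destdir, pattern = pattern, full.names = TRUE)
  if (length(stale) > 0) file.remove(stale)
  invisible(stale)
}

# Demo on a throwaway folder holding two cached files
demo_dir <- tempfile("geo_cache_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("GDS1.soft.gz", "GDS10.soft.gz"))))
removed <- clear_geo_cache("GDS1", demo_dir)
print(basename(removed))
```

With the question's paths, that would be clear_geo_cache("GDS1", mypath) followed by GDS1 <- getGEO("GDS1", destdir = mypath).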
