Arrow R package fails to install on Databricks

Arrow R package fails to install on Databricks - r

About 6 weeks ago (early April 2022), I had tested a Databricks workflow to ensure that I could trigger jobs on databricks remotely from Airflow, which was successful.
As part of the process the workflow activates a pre-built compute, it then loads various R libraries from DBFS into the compute, one of the packages is 'arrow', however while all the other packages load without issue this package fails to load successfully and then causes my workflow to crash.
When I look into the workflow I get the following error 'DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: Command to install library [RCranPkgId(arrow,None,None)] on [0303-130414-840hkwxf] orgId [5132544506122561] failed inside Databricks infrastructure', see image below. Arrow Fail Message" data-fileid="0698Y00000JFZosQAHI have triggered workflow directly inside of databricks and still get the same problem, so clearly it does not have an airflow related cause.
I tried to delete the arrow package from dbfs to see if I could run the test without it, but everytime I delete it, it returns when I retry the workflow.
I then checked CRAN to see if arrow had been updated recently, it was on the 2022-05-09, so I loaded an older version instead, (having first deleted everything relating to it from dbfs), this didn't work either, see images attached.
# Databricks notebook source
.libPaths()
# COMMAND ----------
dir("/databricks/spark/R/lib")
# COMMAND ----------
## Add current working directory to library paths
.libPaths(c(getwd(), .libPaths()))
# COMMAND ----------
## The latest versions from CRAN
install.packages(c('arrow', 'tidyverse', 'aws.s3', 'sparklyr', 'cluster', 'sqldf', 'lubridate', 'ChannelAttribution'), repos = "http://cran.us.r-project.org")
# COMMAND ----------
dir("/tmp/Rserv/conn970")
# COMMAND ----------
## Copy from driver to DBFS
system("cp -R /tmp/Rserv/conn970 /usr/lib/R/site-library")
# COMMAND ----------
dir("/usr/lib/R/site-library")
# COMMAND ----------
## Copy from driver to DBFS
system("cp -R /tmp/Rserv/conn970 /dbfs/r-libraries")
# COMMAND ----------
dir("/dbfs/r-libraries")
# COMMAND ----------
# Add packages to libPaths
.libPaths("/dbfs/r-libraries")
# COMMAND ----------
# Check that the dbfs libraries are in libPath
.libPaths()
I have also attached the script I'm using in R to load the packages to dbfs, which I think is good as every other package load properly, however it may be of use in understanding what I am doing or why the error occurs.
What I'd like to know is:
Do you see anything that I might be doing incorrectly inside my attached libraries script?
Is there an issue with loading the arrow package and if so do you have a work around that can prevent the failure?
Why does dbfs continue to re-install arrow despite my removal of it from the directory?
Can I permanently remove the arrow package from dbfs without it returning everytime I trigger the workflow?
Many thanks in advance for any help you guys can offer.

I figured it out, the issue had two aspects to it. The first was plain to see once I looked at the json file for the job. In it I specified that certain packages should be loaded when building the compute, this configuration overrode my model's script by virtue of the fact that it ran first. Seeing this in the json file explained why the arrow library kept trying to load regardless of the fact that I had removed it from dbfs.
The second part of the solution was how to get arrow to load without failing, which continued to happened when I retried to reinstall it on dbfs or independently on my model. By default Databricks seems to try to load packages from CRAN (https://cran.r-project.org/), I don't understand why it failed, maybe an issue with ubuntu from the last update???
The solution was to install it from a snapshot I got from MRAN at the following location 'https://cran.microsoft.com/snapshot/2022-02-24/'. Thank you user2554330, your comment got me rethinking and set me on the right direction.
I hope that this helps someone else if they are having similar issues.

Related

Ignore a dependency during R CMD check

I have a package that I developed to enable my team (and perhaps other interested users) to install and use a particular R package (RQDA) that was archived on CRAN. I have hosted this package on GitHub and am trying to set up GitHub Actions so that I have a CI workflow in place.
Whenever I run R CMD check locally everything is fine, but when I push to GitHub the build fails. This is because, by default, Actions tries to install that same (archived) package. Expectedly, this fails.
So, my question is this: is there a way I can disable the check for a specific package dependency? There are no plans to ever send this package to CRAN, so I am happy to bypass their package policy in this instance.

2 possible ways:
Upload the source for RQDA to a Github repo, or other publicly accessible location, and put a Remotes: line in your DESCRIPTION file
Save the package to cloud storage, eg an S3 bucket or Azure storage container, and download it from there as a separate workflow step prior to checking

This is how I was able to deal with the problem:
The changes made were entirely in the workflow file at ./.github/workflows/. One of the jobs there is for installing R package dependencies for the project:
- name: Install dependencies
run: |
remotes::install_deps(dependencies = TRUE)
remotes::install_cran("rcmdcheck")
shell: Rscript {0}
The first thing I did was to change the dependencies argument to NA so that only packages listed in Depends and Imports are installed. (The RQDA dependency that was giving me trouble is under Suggests).
There was still an error but this time with some guidance that involved going to the job Check and setting the environment variable _R_CHECK_FORCE_SUGGESTS_ to false.
The check now works as expected.

R CRAN Check - detritus in temp directory

I'm getting a recurring NOTE on Debian CRAN checks (Debian only: 11 OK, 1 NOTE), it cannot be reproduced locally using docker interactively or via {rhub}.
* checking for detritus in the temp directory ... NOTE
Found the following files/directories:
‘calibre_4.99.5_tmp_tgaufday’ ‘calibre_4.99.5_tmp_uz2f_rsg’
The name of the files differ every time but they always start with calibre and there are always two of them. This is for the package {echarts4r}, the latest CRAN fail here.
EDIT this is called by creating a browser in the examples, ensure these are not run with if(interactive())

Turns out this issue appears when one starts a web browser from the #examples, in my case a shiny application, wrap them in if(interactive()), it was difficult to tell given the NOTE...

This doesn't matter, so far as I can tell. That sort of 'detritus' won't show up when the package is rebuilt at CRAN. On OS X one often sees a similar note due to leftover files from the attempts to go text --> LaTeX --> PDF. Unless someone from the R dev team rejects your submission, ignore this.

RStudio states file does not exist

Attempting to use the mread function to open a cpp file through R. However, when I run the script I get the following:
setwd("C:/Users/Gustavo/Documents/R/page-2018-mrgsolve-master/model")
getwd()
#> [1] "C:/Users/Gustavo/Documents/R/page-2018-mrgsolve-master/model"
library(mrgsolve)
mod <- mread("simple", "model")
#> Error: project directory 'model' must exist and be readable.
Obviously I am setting the directory to "model" itself. So why isn't R able to read it? Any help would be appreciated as I am still learning R and want to learn the mrgsolve package as well.
Additional info: R version 3.4.4. Rtools version 3.4.0. Rstudio version 1.1.463.

An adaptation to the email I sent my colleagues that were assisting me with a similar issue:
To review, I was unable to open any files through RStudio because RStudio returned error messages indicating either that the file itself or the work directory did not exist. I've done multiple installations of different versions of R, RStudio, and Rtools in an attempt to resolve the issue. I also moved the locations of files and programs of interest and changed the work directory to see if that made a difference. Unfortunately, when RStudio is first initiated on a computer, it establishes a "hidden directory" folder that retains the settings of the program when it was first initiated. However, by deleting this folder, RStudio was wiped and I was able to regain control of where files would be stored and read as desired (more on this in the following link: https://support.rstudio.com/hc/en-us/articles/200534577-Resetting-RStudio-Desktop-s-State). A combination of this and forcing Rtools to the front of the 'path' also allowed me to resolve 'status 127' errors that I was receiving as well.
Unfortunately, this is the result of a more personal issue between the initial settings that RStudio took to my computer and my attempt to manipulate where RStudio should read files which I guess were discordant of one another? Regardless, it seems that I would need to be more cognizant of how RStudio establishes a folder which retains its initial settings.

How to use windows scheduler to execute R script

I have found some way to solve it
e.g: windows system's task scheduler and R package called taskscheduleR
I only can execute a simple R script , like
cat(aa,file="ttt.txt",sep="\n",append=TRUE)
but I can't execute the program I really want to execute
the picture is the situation
I want to know:
if this R script needs other ***.r
what should I key on the command line?
or just put them into the same folder?
now the situation is

Accordingly to the error message, the script is trying to install a package, and you don't have any selected cran mirror in your .Rprofile. Then, it fails as you should manually select a mirror interactively. Either fix this by selecting a mirror, or remove/comment the line in your script that is installing a package from cran and it should works fine.

Warning in install.packages: unable to move temporary installation

I've found a number of questions related to this warning when installing or updating packages in R/RStudio, but none seem to completely match my situation:
Corporate Windows 7 system, so no access to admin privileges
No way to make changes to McAfee Anti-Virus exceptions lists
R is fully installed in the user space C:\Users\[myname]\R
RStudio fully installed in userspace C\Users\[myname]\RStudio
no permission issues in either of the directories... I have full access control over them
Problem only started after installing R 3.4, but RStudio has randomly failing at start or hanging for a few months now
R_LIBS_USER added as user environment variable, pointing to right directory
.libPaths() show correct directories, both system and user
R version 3.4.2, RStudio version 1.0.153
Uninstalled both R and Rstudio and did a clean re-install of both
Tried trace(utils:::unpackPkgZip,edit = T) and edited Line 140 Sys.sleep(0.5) to Sys.sleep(2), which sometimes works temporarily but the edit won't stay put... resets to Sys.sleep(0.5) on every session restart
Happens in both RStudio and RGui
Any package larger than a few Kb gives the message:
package ‘packagename’ successfully unpacked and MD5 sums checked
Warning in install.packages :
unable to move temporary installation ‘C:\Users\[myname]\R\win-library\3.4\file2b884fc37c13\packagename’ to ‘C:\Users\[myname]\R\win-library\3.4\packagename’
The packages are failing to install or update. So, my questions are:
is there a way to avoid the problem altogether that doesn't require admin privileges or changes to the antivirus policies?
is there a way to get the edit to unpackPkgZip to save permanently?
At this point, I'm stumped. I suspect it has something to do with the antivirus temporarily locking the file/directory after download, but I can't do anything about it from that end. The Sys.sleep(2) seems to do the trick, but I can't keep doing that before every package install or update and can't seem to get the edit to stay put.

This was the only thing that worked for me on this issue (the uninstalling antivirus software didn't get me anywhere, unfortunately), so hopeful it works for you.
On Windows systems, sometimes installation of libraries may be running too fast, creating the error "unable to move temporary installation". Then the package is not found in the user library, because it hasn't been moved over...
To fix, try: trace(utils:::unpackPkgZip, edit=TRUE)
Then go to Line 140 in the code and change Sys.sleep(0.5) to Sys.sleep(2.5)
This is a nice longer term solution that does not require manual package moving, uninstalling software, replacing admin responsibilities, or individually routing packages to certain locations.

My original reply is below, but I've subsequently found a better solution.
Execute the following line:
Trace(utils:::unpackPkgZip, edit=TRUE)
Note that there three colons in there, not two.
Then edit line 142, from Sys.sleep(0.5) to: Sys.sleep(2.0), and click to save the edit (the line number may vary slightly). Unfortunately this does not hold across R sessions, but it only takes 10 seconds to do this, and then you can install packages for the current session to your heart's content.
Original answer:
I ran into the same problem at work. I was able to use Sheldon's suggested approach, but as noted that can get tedious quickly. As an alternative, I found I could go to the location of the downloaded zip file(s) in my temp directory (as reported by install.packages), unzip the file or files (there will be multiple zip files if there are dependent packages), and then move or copy all the unzipped directories straight into my R\win-library\3.4 directory. This isn't a whole lot of fun either, but I find it to be less painful than stepping through the debugger, per Sheldon's method, especially when multiple dependencies are involved and also have to be installed.

If you cannot turn off your antivirus here is a workaround that I found that doesn't involve editing the unpackPkgZip file. Debugging the unzip package function and then stepping through it gives the antivirus enough time to do its job without interfering. Use this command:
debug(utils:::unpackPkgZip)
install.packages("packageName")
and then step through the code (by pressing enter many times) when R starts debugging during the installation.
I found this solution here.
If you want to make this change more permanent you can add the debug code into your Rprofile file, see here, but you'll still need to use step through the unzip function each time a package is installed.

Got the same error - seems to be a company gp / access security problem.
It might also be worthwhile checking whether the folder it fails to write to has a Read Only structure (Right Click - Properties). This folder's address can be found by running: .libPaths()[1] in R.
An ad hoc solution to this problem is to unzip and store the downloaded (but not moved) packages using a piece of R code below. You will get an error stating where the binary packages are located (something like: C:/Users/....AppData/...)
Now you can simply unzip the files from here to your .libPaths() location
zipF <- list.files("C:/Users/<YOURNAMEHERE>/AppData/Local/Temp/Rtmp4Apz6Z/downloaded_packages", full.names = TRUE)
outDir <- .libPaths()[1]
for(i in 1: length(zipF)) {
unzip(zipF[i],exdir=outDir)
}
A more general solution will still be extremely worthwhile, as this is unfortunately a common problem when updating R on Windows.

We've had the same problem at my workplace, and one of my coworkers discovered a great workaround. Unfortunately it's a temporary thing you'll need to do each time you install packages, rather than a permanent fix. We're running corporate Windows 8 (no admin privileges) with McAfee, and I've tested this in R 3.4.0-3.4.3.
Temporarily turning off McAfee's "On-Access Scan" feature (in Threat Prevention) solved this for us -- R packages now all install on the first try the way they're intended to. Here's detailed steps to turn that off:
Right-click the McAfee icon in the notification area at the right of
your taskbar, and select McAfee Endpoint Security.
Click on Threat Prevention. This opens up a screen where you should see categories such as "Access Protection", "Exploit Prevention", and "On-Access Scan".
Un-check "Enable On-Access Scan", and then click Apply. (NB: it's
easy to forget to click Apply, but it's essential)
Once you've installed your packages, it's best to repeat the process to turn On-Access Scan back on.

I fixed my instance of this problem (Windows 7) by removing the 'Read-Only' attribute of the folder R was trying to move stuff to.
I went to the Run command from the Start menu in Windows (7) and typed
attrib -r +s drive:\\
Note that just right clicking the folder and trying to change properties didn't take, as per this link from Microsoft: https://support.microsoft.com/en-us/help/326549/you-cannot-view-or-change-the-read-only-or-the-system-attributes-of-fo
Hope that helps someone.
I hope this change doesn't screw me in other ways.

This was the error message that was spit out for me:
package ‘mlogit’ successfully unpacked and MD5 sums checked
Warning in install.packages :
unable to move temporary installation ‘C:\Users\E\Documents\R\win-
library\3.4\file9ec6cfb5e40\mlogit’ to ‘C:\Users\E\Documents\R\win-
library\3.4\mlogit’
The downloaded binary packages are in
C:\Users\E\AppData\Local\Temp\RtmpS0uNDm\downloaded_packages
What I did was went to where the package was downloaded (C:\Users\E\AppData\Local\Temp\RtmpS0uNDm\downloaded_packages) and then copied that zipped file to the desktop then used Winzip to unzip to my file directory where all the packages for R are stored (C:\Users\E\Documents\R\win-library\3.4). It now will load in R.
library("mlogit")
Loading required package: Formula
Loading required package: maxLik
Loading required package: miscTools
....
It worked well for me as it was the only package that was not downloading for some reason. Might not be helpful if you have to do this for every package.

I also found one solution if above solutions wouldn't work in corporate antivirus.
First change the path of package installation use this command and execute in R:
install.packages('caTools','D:\\ML\\Tools\\Installed\\RPackages')
Now it will show a console's error that unable to move and the package is placed on to some location. just remember this location, we need this zip file for further operations.
Now use this command:
install.packages("D:/ML/Tools/Installed/RPackages/caTools_1.17.1.zip", repos = NULL, type = "win.binary", lib="D:/ML/Tools/Installed/R-3.4.3/library")

I struggled with the same issue. For me (on Windows 10), the issue was using MalwareBytes (Premium trial). I uninstalled it and went back to using Windows Defender, and the issue was resolved. Perhaps if more time, I can find out how to create an exception and/or file checking delay for MalwareBytes (i.e., which is a pretty good program), but the user-guide (https://www.malwarebytes.com/pdf/guides/Malwarebytes-User-Guide.pdf) is unclear on this.

Extending the Sys.sleep value to 3.5 on line 142 in the unpackPkgZip function works manually via
trace(utils:::unpackPkgZip, edit=TRUE)
However, it can also be done programmatically by running the following before install.packages:
localUnpackPkgZip <- utils:::unpackPkgZip
body(localUnpackPkgZip)[[14]][[4]][[4]][[4]][[3]][[3]][[2]][[2]] <- substitute(3.5)
assignInNamespace("unpackPkgZip", localUnpackPkgZip, "utils")
This must be run every time you have a new session. You can run it multiple times in the same session without issue.

If you run the below statement right before the install.packages expression then it should install the package:
trace("unpackPkgZip", where=asNamespace("utils"), quote(Sys.sleep(2.5)), at=14L, print=FALSE)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Arrow R package fails to install on Databricks - r

Related

Ignore a dependency during R CMD check

R CRAN Check - detritus in temp directory

RStudio states file does not exist

How to use windows scheduler to execute R script

Warning in install.packages: unable to move temporary installation

Categories

Resources