Install rgdal and rgeos on Azure Databricks - r

I cannot install rgdal and rgeos on Databricks, any suggestions?
configure: error: gdal-config not found or not executable.
ERROR: configuration failed for package ‘rgdal’
* removing ‘/databricks/spark/R/lib/rgdal’
configure: error: geos-config not found or not executable.
ERROR: configuration failed for package ‘rgeos’
* removing ‘/databricks/spark/R/lib/rgeos’

Here is one way to install rgdal and rgeos for R on Azure Databricks. Steps 1 and 2 need to be done each time you start the cluster. Step 1 can be automated (see below), but step 2 has to be run manually, either in a separate script or at the top of your R script.
Step 1
You first need to install gdal and geos on the Linux machines in your cluster. This can be done with a bash script in a Databricks notebook. %sh is the magic command that lets the cell run a shell script.
%sh
#!/bin/bash
#Start by updating everything
sudo apt-get update
##############
#### rgdal
#This installs gdal on the linux machine but not the R library (done in R script)
#See https://databricks.com/notebooks/rasterframes-notebook.html
sudo apt-get install -y gdal-bin libgdal-dev
#To be able to install the R library, you also need libproj-dev
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libproj-dev
##############
#### rgeos
#This installs geos on the linux machine but not the R library (done in R script)
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libgeos++-dev
However, running that manually every time is tedious, so you can create an init script that runs each time the cluster starts. In a Databricks Python notebook, copy this code into a cell. Scripts in dbfs:/databricks/init/<name_of_cluster> run on start-up for clusters with that name.
#This cell creates a bash script called install_packages.sh. The cluster runs this file on each startup.
# The bash script will be anything inside the variable script
clusterName = "RStudioCluster"
script = """#!/bin/bash
#Start by updating everything
sudo apt-get update
##############
#### rgdal
#This installs gdal on the linux machine but not the R library (done in R script)
#See https://databricks.com/notebooks/rasterframes-notebook.html
sudo apt-get install -y gdal-bin libgdal-dev
#To be able to install the R library, you also need libproj-dev
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libproj-dev
##############
#### rgeos
#This installs geos on the linux machine but not the R library (done in R script)
#See https://philmikejones.me/tutorials/2014-07-14-installing-rgdal-in-r-on-linux/
sudo apt-get install -y libgeos++-dev
"""
dbutils.fs.put("dbfs:/databricks/init/%s/install_packages.sh" % clusterName, script, True)
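Before moving on to the R packages, it may be worth confirming that the two configure helpers named in the error messages (gdal-config and geos-config) are now on the PATH. The following is only a sketch: the function name missing_tools is made up for illustration, and it can be run from a %sh cell or any shell on the machine.

```shell
#!/bin/bash
# Print the name of every listed command that is not on the PATH.
# (missing_tools is a hypothetical helper, not part of Databricks.)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# These are the two helpers the failed R installs complained about.
missing=$(missing_tools gdal-config geos-config)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "gdal-config and geos-config are both available"
fi
```

If either name is reported as missing, the R installs in Step 2 will most likely fail with the same configure error as in the question.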
Step 2
So far you have only installed gdal and geos on the Linux machines in the cluster. In this step you install the R packages rgdal and rgeos. Recent versions of rgdal, however, are not compatible with the most recent version of gdal available through apt-get. See here for more details and alternative ways to solve this, but if you are OK with an older version of rgdal, the easiest workaround is to install version 1.2-20 of rgdal. You do that in a Databricks R notebook or in the RStudio Databricks app like this:
require(devtools)
install_version("rgdal", version="1.2-20")
install.packages("rgeos")
Setup done
Then you can load these libraries as usual:
library(rgdal)
library(rgeos)

Related

How to Install GDAL to use rgdal in R?

I downloaded GDAL with Python bindings from GISInternals, and I already have Python installed on my computer. But R doesn't detect the GDAL installation and instead gives an error when I run gdal_setinstallation().
How can I fix this so that R can find the installed GDAL?
If, like me, you are using Linux, run the following in case you are missing system dependencies:
sudo apt-get install gdal-bin proj-bin libgdal-dev libproj-dev
Then (as per @MYaseen208's comment) from within R use:
install.packages("rgdal", dependencies = TRUE)

Error loading rgdal

I have successfully installed the rgdal package along with the dependencies GDAL and Proj4. After installation I successfully loaded the package in R with the library function. However, after my most recent login, when I type the command library(rgdal) I get an error message:
Error: package or namespace load failed for 'rgdal' in dyn.load(file,
DLLpath = DLLpath, ...):
unable to load shared object '/home/nikhail1/R/x86_64-pc-linux-gnu-
library/3.4/rgdal/libs/rgdal.so':
libgdal.so.20: cannot open shared object file: No such file or directory
I understand this means there is no link to the libgdal file, but I am not sure how to fix it. libgdal.so.20 is on the system under /home/nikhail1/bin/gdal/lib/. The rgdal.so file is in the rgdal folder of the R library under /home/nikhail1/. I do not have the authority to run ldconfig on shared libraries (I am a novice). Does anyone know how I can make the system recognize the path to libgdal.so.20? I am working on a Linux CentOS 6.9 system. I cannot run any sudo apt-get, yum or brew commands.
Many thanks, Nikhail
You can set LD_LIBRARY_PATH to include /home/nikhail1/bin/gdal/lib, i.e. in bash
export LD_LIBRARY_PATH="/home/nikhail1/bin/gdal/lib:$LD_LIBRARY_PATH"
ldd /home/nikhail1/R/x86_64-pc-linux-gnu-library/3.4/rgdal/libs/rgdal.so
should report libgdal.so.20 as found. How to make this persistent depends on your desktop environment.
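One possible way to make the setting persistent, assuming a bash login shell, is to put a small helper in ~/.bashrc. The sketch below uses a hypothetical function name, prepend_ld_path. It avoids adding the same directory twice and avoids the trailing colon that the plain export leaves behind when LD_LIBRARY_PATH starts out empty.

```shell
#!/bin/bash
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
# (prepend_ld_path is a hypothetical helper name.)
prepend_ld_path() {
    local dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${dir}:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Directory taken from the question.
prepend_ld_path "/home/nikhail1/bin/gdal/lib"
```

R sessions started from a shell that has sourced this file inherit the variable. Sessions launched some other way (for example by a desktop launcher or a server process) may need the variable set elsewhere.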
I've suffered quite a lot with gdal, rgdal and the proper setting of these in order to run rgdal functions in R. My bulletproof routine at the moment is the following:
UNINSTALL GDAL
sudo apt-get remove gdal-bin
Uninstall gdal-bin including dependent packages
If you would like to remove gdal-bin and its dependent packages that are no longer needed from Ubuntu:
sudo apt-get remove --auto-remove gdal-bin
Purging gdal-bin
If you use the purge option on the gdal-bin package, its configuration files are removed as well.
sudo apt-get purge gdal-bin
If you combine purge with auto-remove, everything related to the package is removed. This is really useful when you want to reinstall from scratch.
sudo apt-get purge --auto-remove gdal-bin
If you have tried and failed many times to get it working, run all of these in sequence just to be sure.
INSTALL GDAL
sudo apt-get update
sudo apt-get install gdal-bin proj-bin libgdal-dev libproj-dev -y
INSTALL RGDAL
IN R:
install.packages('rgeos', type='source')
install.packages('rgdal', type='source')
Now everything should load and run smoothly.

R & RStudio Installation on AWS EC2 Linux AMI - latest version of R

Amazon provides a clear installation guide for launching a micro instance and having R & RStudio installed. The guide can be found here: https://aws.amazon.com/blogs/big-data/running-r-on-aws/
Unfortunately this installs an older version of R (3.2.2), which causes problems for certain packages, such as slam, as they require an R version > 3.3.1.
In the guide's step for changing the user data, they provide the script below, which covers the installation of R and RStudio. How do I change the script to install the latest version of R?
#!/bin/bash
#install R
yum install -y R
#install RStudio-Server
wget https://download2.rstudio.org/rstudio-server-rhel-0.99.465-x86_64.rpm
yum install -y --nogpgcheck rstudio-server-rhel-0.99.465-x86_64.rpm
#install shiny and shiny-server
R -e "install.packages('shiny', repos='http://cran.rstudio.com/')"
wget https://download3.rstudio.org/centos5.9/x86_64/shiny-server-1.4.0.718-rh5-x86_64.rpm
yum install -y --nogpgcheck shiny-server-1.4.0.718-rh5-x86_64.rpm
#add user(s)
useradd username
echo username:password | chpasswd
Thanks
Try this:
# Install the distribution's R first (the package is called R on yum-based systems; r-base is the Debian/Ubuntu name)
yum install -y R
# Build a newer version of R from source
wget https://cran.r-project.org/src/base/R-3/R-3.4.0.tar.gz
# Unpack the tarball and build from inside the source directory
tar -xzf R-3.4.0.tar.gz
cd R-3.4.0
./configure --prefix="$HOME/R/R-3.4.0" --with-x=yes --enable-R-shlib=yes --with-cairo=yes
make
# NEWS.pdf file is missing and will make installation crash.
touch doc/NEWS.pdf
make install
# Do not forget to update your PATH
export PATH="$HOME/R/R-3.4.0/bin:$PATH"
export RSTUDIO_WHICH_R="$HOME/R/R-3.4.0/bin/R"
I adapted this from an Ubuntu R install how-to: http://jtremblay.github.io/software_installation/2017/06/21/Install-R-3.4.0-and-RStudio-on-Ubuntu-16.04
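To confirm that the R found on the PATH is new enough for packages like slam, a version comparison along the following lines may help. This is a sketch under a few assumptions: version_ge is a hypothetical helper, it relies on GNU sort's -V flag, and it treats the 3.3.1 mentioned in the question as a minimum (at least 3.3.1), so adjust the threshold if a package needs something stricter.

```shell
#!/bin/bash
# Succeeds when version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V flag orders version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# 3.3.1 is the threshold mentioned in the question.
required="3.3.1"
if command -v R >/dev/null 2>&1; then
    # The first line of `R --version` looks like "R version 3.4.0 (2017-04-21) ..."
    current=$(R --version | head -n 1 | awk '{print $3}')
    if version_ge "$current" "$required"; then
        echo "R $current is new enough"
    else
        echo "R $current is older than $required"
    fi
else
    echo "R is not on the PATH"
fi
```

If the reported version is still the old one, the PATH export probably has not taken effect in the current shell.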

How can I install a R package on a offline Debian machine?

I have a Debian VM which is not connected to the internet. However, I can still scp any file from my local machine, which does have an internet connection. To provide a little context, I am trying to host a Shiny app on the VM.
I can still install an old version of R, 3.1.1, with the "apt-get" command:
sudo apt-get update
sudo apt-get install r-base
sudo apt-get install r-base-dev
Yet I still can't find the "shiny" package when I check the list:
sudo apt-cache search "^r-.*" | sort
So I am wondering whether I could just scp "shiny.tar.gz" to the VM and install the package locally. How can I install an R package offline?
I have tried something like:
install.packages('/home/mli/R/dir_pkg/shiny/shiny_0.13.2.tar.gz', repos = NULL, type = "source")
Yet it didn't go through, and I got the error message below:
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Error in type == "both" :
comparison (1) is possible only for atomic and list types
Calls: install.packages
Execution halted
Then I tried it with "R CMD INSTALL" instead:
R CMD INSTALL /home/mli/R/dir_pkg/shiny/shiny_0.13.2.tar.gz
I got an error message telling me that dependencies are missing:
* installing to library ‘/home/mli/R/x86_64-pc-linux-gnu-library/3.1’
ERROR: dependencies ‘httpuv’, ‘mime’, ‘jsonlite’, ‘xtable’, ‘digest’, ‘htmltools’, ‘R6’ are not available for package ‘shiny’
* removing ‘/home/mli/R/x86_64-pc-linux-gnu-library/3.1/shiny’
How can I successfully install the shiny package from source? Should I go ahead and install all dependencies, and dependencies of dependencies, first?
Shiny has a few package dependencies, and "R CMD INSTALL" won't fetch them for you, so you need to get them manually. According to the description of shiny, its dependencies are:
‘Rcpp’, ‘httpuv’, ‘mime’, ‘jsonlite’, ‘xtable’, ‘digest’, ‘htmltools’, ‘R6’.
So first, get the packages from CRAN (below are the current versions, but they change over time). Note that the commands below are for the computer connected to the internet; you'll need to scp the files to the offline computer before continuing. The list does not include xtable, so fetch its tarball from the same CRAN directory as well:
wget https://cran.r-project.org/src/contrib/Rcpp_0.12.4.tar.gz
wget https://cran.r-project.org/src/contrib/httpuv_1.3.3.tar.gz
wget https://cran.r-project.org/src/contrib/mime_0.4.tar.gz
wget https://cran.r-project.org/src/contrib/jsonlite_0.9.19.tar.gz
wget https://cran.r-project.org/src/contrib/digest_0.6.9.tar.gz
wget https://cran.r-project.org/src/contrib/htmltools_0.3.5.tar.gz
wget https://cran.r-project.org/src/contrib/R6_2.1.2.tar.gz
wget https://cran.r-project.org/src/contrib/shiny_0.13.2.tar.gz
Then go through them in that same order with R CMD INSTALL, e.g.:
R CMD INSTALL Rcpp_0.12.4.tar.gz
Once all the dependencies are there, R CMD INSTALL should let you install shiny.
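The manual sequence above could be wrapped in a small loop. The sketch below is an illustration only: install_in_order and the DRY_RUN variable are hypothetical names, and it assumes the tarballs have already been copied into one directory on the offline machine and follow the usual name_version.tar.gz pattern.

```shell
#!/bin/bash
# Install source tarballs from a directory in the order the package
# names are given. With DRY_RUN set to a non-empty value, the commands
# are printed instead of executed.
# (install_in_order and DRY_RUN are hypothetical names.)
install_in_order() {
    local dir="$1"
    shift
    local pkg candidate tarball
    for pkg in "$@"; do
        tarball=""
        # Pick the first file that matches <pkg>_<version>.tar.gz
        for candidate in "$dir"/"$pkg"_*.tar.gz; do
            if [ -e "$candidate" ]; then
                tarball="$candidate"
                break
            fi
        done
        if [ -z "$tarball" ]; then
            echo "missing tarball for $pkg" >&2
            return 1
        fi
        ${DRY_RUN:+echo} R CMD INSTALL "$tarball" || return 1
    done
}
```

For example, `DRY_RUN=1 install_in_order /home/mli/R/dir_pkg Rcpp httpuv mime jsonlite xtable digest htmltools R6 shiny` prints the commands it would run and stops at the first package whose tarball is missing. The directory here is a guess based on the path in the question.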
To install a package offline on Debian you can use apt-offline:
apt-offline can fully update and upgrade an APT based distribution without connecting to the network, all of it transparent to APT.
apt-offline can be used to generate a signature on a machine (with no network). This signature contains all download information required for the APT database system. This signature file can be used on another machine connected to the internet (which need not be a Debian box and can even be running windows) to download the updates. The downloaded data will contain all updates in a format understood by APT and this data can be used by apt-offline to update the non-networked machine.
Install apt-offline on the offline OS (Debian), then run the following command to import missing keys:
sudo apt-key exportall | sudo gpg --no-default-keyring --import --keyring /etc/apt/trusted.gpg
Then you need to get the signature of your_package_name:
apt-offline set debian-install.sig --install-packages your_package_name
Next, upload debian-install.sig to the online system and download the required files:
apt-offline get debian-install.sig --bundle debian-install.zip
Upload debian-install.zip to the offline system and install it with apt-offline to update the APT database:
sudo apt-offline install debian-install.zip
Install the specified package your_package_name:
sudo apt-get install your_package_name
The download step can also be done on a Windows machine.
You are in a pickle. The R package mechanism expects you to be connected to get dependencies. That said, you can get some help:
R> AP <- available.packages(contrib.url(options("repos")$repos[1]))
R> revs <- tools::package_dependencies("shiny", AP, recursive=TRUE)[[1]]
R> revs
[1] "methods" "utils" "httpuv" "mime"
[5] "jsonlite" "xtable" "digest" "htmltools"
[9] "R6" "Rcpp" "tools" "stats"
R>
You can now look into AP again and feed this into download.packages().
Also, several (all?) of these are in a newer Debian distro, so you could use apt-get in download mode (maybe using apt-offline as suggested in the other answer).
Lastly, we do offer a Docker container for shiny so if you use that on your VM you don't need anything else.
sudo apt-get update
sudo apt-get install r-cran-digest
I can't believe that it was so easy. I spent a long time searching and found only bad answers. These commands solved everything.
I used it on Trisquel.
After checking all the answers on Stack Overflow, I could not find exactly how to install r-base on an offline Debian/Linux system. So I tried it myself and got it running with the steps below:
Run the command below in a custom folder on a Linux machine that has internet access.
apt-get download r-base r-base-core r-recommended libmpfr6 libisl19 cpp cpp-8 cpp-4 gcc cpp-8 binutils-common libbinutils binutils-x86-64-linux-gnu gfortran linux-libc-dev g++ g++-8 libstdc make dpkg-dev perl-base perl-modules-5.28 libperl5.28 libgdbm-compat4 zip unzip libpaper-utils xdg-utils libblas3 libblas.so.3 libcairo2 libcurl4 libgfortran5 libglib2.0-0 libice6 libicu63 libjpeg62-turbo liblapack3 liblapack.so.3 libpango-1.0-0 libpangocairo-1.0-0 libpng16-16 libsm6 libtcl8.6 libtiff5 libtk8.6 libx11-6 libxext6 libxss1 libxt6 ucf libfontconfig1 libfreetype6 libpixman-1-0 libxcb-render0 libxcb-shm0 libxcb1 libxrender1 libgssapi-krb5-2 libk5crypto3 libkrb5-3 libldap-2.4-2 libnghttp2-14 libpsl5 librtmp1 libssh2-1 libbsd0 x11-common fontconfig libfribidi0 libthai0 libcairo2:amd64 libfontconfig1 libfreetype6 libpango-1.0-0:amd64 libpangoft2-1.0-0 fontconfig-config libkeyutils1 libkrb5support0 libkeyutils1 libkrb5support0 libkeyutils1 libkrb5support0 libsasl2-2 libldap-common fontconfig-config libharfbuzz0b libpaper1 libsasl2-modules-db libthai-data libdatrie1 libwebp6 libjbig0 libxft2 libx11-data libxau6 libxdmcp6 fonts-dejavu-core ttf-bitstream-vera fonts-liberation libgraphite2-3 lsb-base sensible-utils r-cran-boot r-cran-cluster r-cran-foreign r-cran-kernsmooth r-cran-lattice r-cran-mgcv r-cran-nlme r-cran-rpart r-cran-survival r-cran-mass r-cran-class r-cran-nnet r-cran-spatial r-cran-codetools r-cran-matrix
This will download the Debian package files into the folder. Then archive the folder:
tar -zcf folder.tar.gz folder
Copy folder.tar.gz to the offline computer, extract it, and run the command below from inside the folder.
sudo dpkg -i *.deb
Now check that R is installed on your system by typing:
R --version

Install shiny on remote Debian machine with R version 3.1.1

I am trying to host a Shiny app on a remote Debian machine, but I have run into an R version issue when installing the shiny package. I will walk through the steps that I took:
After SSHing into the VM, I installed and updated r-base:
sudo apt-get update
sudo apt-get install r-base
sudo apt-get install r-base-dev
The latest version of R I can get is 3.1.1. Then I tried to install the "shiny" package as root with the following command:
sudo su - -c "R -e \"install.packages('shiny', repos='http://cran.rstudio.com/')\""
Then I got this error message:
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository http://cran.rstudio.com/src/contrib
Warning message:
package ‘shiny’ is not available (for R version 3.1.1)
Is there any workaround for this issue, such as getting apt-get to install the latest R version rather than 3.1.1, or installing shiny from a GitHub repo? Please help! Thanks!
You should be able to download the R source yourself rather than using apt-get. That way you can choose which release to install. For example:
wget http://cran.rstudio.com/src/base/R-3/R-3.2.2.tar.gz
tar zxvf R-3.2.2.tar.gz; cd R-3.2.2/
./configure; make;
sudo make install
Then you can install shiny from the terminal as well, rather than from within R:
wget https://cran.r-project.org/src/contrib/shiny_0.13.2.tar.gz
sudo R CMD INSTALL shiny_0.13.2.tar.gz
Credit to Huiong Tian, from whom I learned this a while back:
http://withr.me/install-shiny-server-on-raspberry-pi/
