MXNet R package on an Amazon Linux Deep Learning EC2 instance

I'm attempting to set up an Amazon Linux EC2 instance with MXNet and R (with the MXNet R package available as well). Unfortunately, this has been a lot harder than I expected.
I've attempted to follow the instructions from MXNet using Amazon's Deep Learning AMI with CUDA 8.0 on a p2.xlarge (https://mxnet.incubator.apache.org/get_started/install.html).
However, when attempting to compile the MXNet R package I get the same error as in this SO post:
Issues installing mxnet GPU R package for Amazon deep learning AMI
The solutions discussed in that post are somewhat beyond my ability to fully test and debug; I'm not particularly familiar with Linux environment variables and how to modify them. I've also reviewed some issues raised on the apache-incubator GitHub for MXNet, and those were pretty unhelpful as well.
So my questions are:
Is anyone aware of any available AMIs that come pre-packaged with R and MXNet? The ones I see seem to include only Python.
Does anyone have a working set of instructions (or a script) to run on an Amazon Linux EC2 instance that installs the dependencies required for the MXNet R package (assuming I'm using some type of Deep Learning AMI that comes with at least CUDA 8.0)?

Right, so I was the guy on the other post and I DID eventually get it working. It took 50+ hours, and I'm not 100% sure where the issue was because... Linux.
sudo yum install R
sudo yum install libxml2-devel
sudo yum install cairo-devel
sudo yum install giflib-devel
sudo yum install libXt-devel
sudo R
install.packages("devtools")
library(devtools)
install_github("igraph/rigraph")
install.packages(c("DiagrammeR", "roxygen2", "rgexf", "influenceR", "Cairo", "imager"))
cd
cd /src/mxnet
cp make/config.mk .
echo "USE_BLAS=openblas" >>config.mk
echo "ADD_CFLAGS += -I/usr/include/openblas" >>config.mk
echo "ADD_LDFLAGS += -L/usr/local/lib" >>config.mk
echo "USE_CUDA=1" >>config.mk
echo "USE_CUDA_PATH=/usr/local/cuda-9.0/lib64" >>config.mk
echo "USE_CUDNN=1" >>config.mk
*add another LD flag for /usr/local/lib
cd /etc/ld.so.conf.d/
sudo nano cuda.conf
Insert the line /usr/local/cuda-9.0/lib64
cd
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
sudo ldconfig
cd R-package
Rscript -e "install.packages('devtools', repos = 'https://cran.rstudio.com')"
Rscript -e "library(devtools); library(methods);options(repos=c(CRAN='https://cran.rstudio.com'));install_deps(dependencies = TRUE)"
cd ..
sudo make rpkg
THEN you gotta make sure R/RStudio can actually find those libraries:
cd /etc/rstudio
sudo nano rserver.conf
You can add elements to the default LD_LIBRARY_PATH for R sessions (as determined by the R ldpaths script) by adding an rsession-ld-library-path entry to the server config file. This might be useful for ensuring that packages can locate external library dependencies that aren't installed in the system standard library paths. For example:
rsession-ld-library-path=/opt/local/lib:/usr/local/cuda/lib64
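The two export lines above prepend to LD_LIBRARY_PATH unconditionally, so running them again (for example from a shell profile) keeps adding duplicate entries. A minimal sketch of an idempotent variant follows; the function name prepend_ld_path is hypothetical and not part of MXNet, R, or RStudio, and the directories are the ones used in this answer.

```shell
#!/bin/sh
# prepend_ld_path DIR: put DIR at the front of LD_LIBRARY_PATH unless it is
# already listed. Hypothetical helper, not part of MXNet or R.
prepend_ld_path() {
    dir=${1%/}   # drop one trailing slash so /a/ and /a compare equal
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Same directories as in the answer above.
prepend_ld_path /usr/local/cuda-9.0/lib64
prepend_ld_path /usr/local/lib
echo "$LD_LIBRARY_PATH"
```

This only affects the current shell and its children; RStudio Server sessions still need the rsession-ld-library-path entry described above.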

Related

Using docker buildkit caching with R-packages

I'm trying to use the Docker BuildKit approach to caching packages to speed up adding packages to Docker containers. I learned about it from the instructions for both Python and apt-get packages and from a useful Stack Exchange answer on caching Python packages while building Docker images. For Python and apt-get I am able to get this to work, but I can't get it to work for R packages.
In a Dockerfile for Python I'm able to change:
RUN pip install -r requirements.txt
to the following (the comment-like line at the top of the Dockerfile is required):
# syntax=docker/dockerfile:experimental
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
Then, when I add a package to the requirements.txt file, pip is able to re-use all the work it has already done rather than re-downloading and rebuilding the packages. So BuildKit cache mounts add a level of caching beyond Docker's image layers. It's a massive timesaver. I'm hoping to set up something similar for R packages.
Here is what I've tried; it works for apt-get but not for R packages. I've also tried with the install2.r script.
# syntax=docker/dockerfile:experimental
FROM rocker/tidyverse
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
apt update && apt install -y gcc \
zsh \
vim
COPY ./requirements.R .
RUN --mount=type=cache,target=/usr/local/lib/R/site-library Rscript ./requirements.R
I think I don't understand:
How BuildKit works. Does it build containers inside a container, i.e. is the cache path on the 'build container'?
What one needs to specify as the target for R to notice that it has already downloaded (and possibly built) the packages.
I suspect that it has something to do with the keep.source option when installing an R package, as discussed in this question

Installing Sqlite3 on Ubuntu 14.04 for Python3.6

I have a virtual environment set up with Python 3.6. I'm trying to install sqlite3 (I built Python from source) and am having trouble doing so. (I need sqlite3 for TensorBoard.)
After some digging I found an approach:
sudo apt-get install libsqlite3-dev
Now, in the downloaded Python source directory, rebuild and install Python with the following command:
./configure --enable-loadable-sqlite-extensions && make && sudo make install
The issue is that I cannot run the first command. Running it gives me the error "download failed Oracle JDK 6 is NOT installed." Therefore I downloaded the libsqlite3-dev file.
My question is: where should this file be placed before I can run step 2?
I've looked around for a solution for a few hours now and seem to be at a loss. Any help would be really appreciated, either with solving this approach or with proposing another one.
Use Anaconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda create -n envsq python=3.7
source activate envsq
python
And you can import sqlite3 with no issues.
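Whichever route is taken (rebuilding Python from source or using a conda environment), the result can be checked from the shell before starting TensorBoard. A small sketch, assuming the interpreter is reachable as python3; the function name has_sqlite3 is hypothetical.

```shell
#!/bin/sh
# has_sqlite3 [PYTHON]: succeed if the given interpreter can import sqlite3,
# printing the version of the SQLite library it was built against.
# Hypothetical helper; defaults to whatever "python3" is on PATH.
has_sqlite3() {
    py=${1:-python3}
    "$py" -c 'import sqlite3; print(sqlite3.sqlite_version)' 2>/dev/null
}

if has_sqlite3 python3; then
    echo "sqlite3 module is available"
else
    echo "sqlite3 module is missing (or python3 is not on PATH)"
fi
```

Inside an activated environment, pass the environment's own interpreter (for example has_sqlite3 python) so that the check does not silently test the system Python instead.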

How to speed up R packages installation in docker

Say you have the following list of packages you would like to install for a docker image
("jsonlite","dplyr","stringr","tidyr","lubridate",
"knitr","purrr","tm","cba","caret",
"plumber","httr")
It actually takes around 1 hour to install these!
Any suggestions on how to speed this up (or how to prevent the re-installation at every new image build)?
Side note
I do not install these packages from the dockerfile like this:
RUN Rscript -e "install.packages('stringr')"
...
Instead I create an R script Requirements.R which installs these packages and simply do:
RUN Rscript Requirements.R
Is this less optimal than installing the packages directly from the Dockerfile?
Use binary packages where you can, as we often do in the Rocker Project, which provides multiple Docker files for R, including the official r-base one.
If you start from Ubuntu, you get Michael's PPAs with over 3000 packages; if you start from Debian you get fewer from the distro but still many essential ones. (There are some efforts to bring more binary packages to Debian but nothing is up right now.)
Lastly, Dockerfile creation is of course compile time too. You spend the time once (per container creation) and potentially re-use the result many times afterwards. Also, by using the Docker Hub you can avoid spending your local CPU cycles.
Edit in Sep 2020: The (updated) Ubuntu PPA now has over 4600 packages for the three most recent LTS releases. Still highly, highly recommended.
I found an article that described how to install R packages from precompiled binaries. It reduced the build time on our Jenkins server from 45 minutes down to 3 minutes.
Here is my Dockerfile:
FROM rocker/r-apt:bionic
WORKDIR /app
RUN apt-get update && \
apt-get install -y libxml2-dev
# Install binaries (see https://datawookie.netlify.com/blog/2019/01/docker-images-for-r-r-base-versus-r-apt/)
COPY ./requirements-bin.txt .
RUN cat requirements-bin.txt | xargs apt-get install -y -qq
# Install remaining packages from source
COPY ./requirements-src.R .
RUN Rscript requirements-src.R
# Clean up package registry
RUN rm -rf /var/lib/apt/lists/*
COPY ./src /app
EXPOSE 5000
CMD ["Rscript", "Server.R"]
You can add a file requirements-bin.txt with package names:
r-cran-plumber
r-cran-quanteda
r-cran-irlba
r-cran-lsa
r-cran-caret
r-cran-stringr
r-cran-dplyr
r-cran-magrittr
r-cran-randomforest
And finally, a requirements-src.R for packages that are not available as binaries:
pkgs <- c(
'otherpackage'
)
install.packages(pkgs)
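The names in requirements-bin.txt above follow the Debian/Ubuntu convention of an r-cran- prefix plus the lower-cased CRAN name (randomForest becomes r-cran-randomforest). A minimal sketch for generating such a list from CRAN names follows; the function name cran_to_deb is hypothetical, and whether a given binary actually exists in the configured repositories still has to be checked (for example with apt-cache search).

```shell
#!/bin/sh
# cran_to_deb NAME...: print the Debian/Ubuntu binary package name for each
# CRAN package name given (r-cran- prefix plus the lower-cased name).
# Hypothetical helper; it does not check that the binary package exists.
cran_to_deb() {
    for pkg in "$@"; do
        printf 'r-cran-%s\n' "$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
    done
}

cran_to_deb plumber randomForest stringr
```

Redirecting the output to requirements-bin.txt gives a file in the format the Dockerfile above expects; anything apt cannot find goes into requirements-src.R instead.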
I ended up using rocker/r-base as @DirkEddelbuettel suggested. Also, thanks to the question "How to avoid reinstalling packages when building Docker image for Python projects?", I wrote my Dockerfile in a way that doesn't reinstall packages every time I rebuild my Docker image.
I want to share what my Dockerfile looks like now; hopefully this will be of help to others:
FROM rocker/r-base
RUN apt-get update
# install packages
RUN apt-get -y install libcurl4-openssl-dev
RUN apt-get -y install libssl-dev
# set work directory
WORKDIR /myapp
# copy requirements R script
COPY ./Requirements.R /myapp/Requirements.R
# run requirements R script
RUN Rscript Requirements.R
COPY . /myapp
EXPOSE 8094
ENV NAME R-test-service
CMD ["Rscript", "my_R_api.R"]

Installing R packages on Amazon Linux EC2 instance

I am learning to use RSelenium on an EC2 instance, and I found this handy guide on doing so - https://rpubs.com/grahamplace/rselenium-ec2 - however, the guide focuses on an Ubuntu instance and I am using an Amazon Linux instance. In order to install RSelenium, the guide says I must install the packages XML (case-sensitive, I think) and RCurl externally (outside of R, but SSH'd into my EC2 instance). The guide's relevant lines of code are:
sudo apt-get install r-cran-xml
sudo apt-get install r-cran-RCurl
However, since I'm on an Amazon Linux instance, I tried:
sudo yum install r-cran-xml
sudo yum install r-cran-RCurl
for which I get the following error:
No package r-cran-RCurl available.
Error: Nothing to do
Note: I was successful in installing R on my machine (my instance), and I am able to simply type R to launch R in the EC2 instance.
Note2: install.packages('XML') and install.packages('RCurl') with R launched do not work either.
Any help appreciated with this, thanks!
The Amazon Linux R package has a different name:
sudo yum install -y R
You then tried (in R) install.packages(c('XML','RCurl')), but the installation failed.
As you discovered and describe in the comment below, you needed to install an additional Amazon Linux package, libxml2-devel, in order for install.packages('XML') to succeed.
This is what I get when I run sudo yum install -y R:
No package R available.
Error: Nothing to do
R is available in Amazon Linux Extra topic "R3.4"
To use, run
sudo amazon-linux-extras install R3.4
Learn more at
https://aws.amazon.com/amazon-linux-2/faqs/#Amazon_Linux_Extras
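The two cases in this thread (plain yum on one instance, the Amazon Linux Extras topic on another) can be told apart by checking whether the amazon-linux-extras tool is installed. A small sketch under that assumption; the function name r_install_cmd is hypothetical, the topic name R3.4 is taken from the yum message above and may differ on other releases, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# r_install_cmd: print the command that would install R on this machine,
# choosing by whether the amazon-linux-extras tool is on PATH.
# Hypothetical helper; it prints the command and does not execute it.
r_install_cmd() {
    if command -v amazon-linux-extras >/dev/null 2>&1; then
        # Topic name as reported by yum in the message above.
        echo "sudo amazon-linux-extras install R3.4"
    else
        echo "sudo yum install -y R"
    fi
}

r_install_cmd
```

After R itself is installed, libxml2-devel is still needed before install.packages('XML') can build, as noted earlier in the thread.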

Rscript on ubuntu

Where can I install Rscript from? I need to run an R script from a php file using exec. However I need to install Rscript first.
The main package for R is called r-base. For the scripting and command-line front-end see littler (or r-cran-littler in xenial (16.04LTS) and beyond):
sudo apt-get install littler
Search the Ubuntu repositories. Have you checked the littler package?
The answers posted so far are generally useful, but they don't directly answer the question. I recently had the same question and discovered there is no rscript binary for Ubuntu. The r binary itself is used to execute scripts in batch mode as opposed to the separate rscript binary that I was using in OS X.
It appears you may be able to get an rscript binary from other sources (see http://craig-russell.co.uk/2012/05/08/install-r-on-ubuntu.html#.UwKWzkJdW2Q for example), but I'm not sure why you would need that when simply running "r script.r" from the command line works just fine.
I tried running Rscript on a fresh Ubuntu installation (16.04.2 LTS) and got:
The program 'Rscript' is currently not installed. You can install it by typing:
sudo apt install r-base-core
So, naturally, I ran sudo apt install r-base-core.
Installation took a couple of minutes.
Later, I needed to install an R package, and realized I needed an R shell for that. Running r returned:
The program 'r' is currently not installed. You can install it by typing:
sudo apt install r-cran-littler
Again, I followed the suggestion; this time it was much faster.
I don't know if these are the correct steps to take (or why they would be wrong), but it's what the system led me to do.
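Since the answers here disagree on which front-end is present, a script that needs to launch R non-interactively (for example from PHP's exec) can check what is on PATH first. A minimal sketch; the function name find_r_runner is hypothetical, and it assumes the two candidates discussed above: Rscript (from r-base-core) and r (from littler).

```shell
#!/bin/sh
# find_r_runner: print the first available R scripting front-end,
# preferring Rscript over littler's r. Returns non-zero if neither is found.
# Hypothetical helper built from the two commands discussed in this thread.
find_r_runner() {
    for candidate in Rscript r; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if runner=$(find_r_runner); then
    echo "R scripts can be run with: $runner script.R"
else
    echo "No R front-end found; try: sudo apt install r-base-core"
fi
```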
