Issues installing mxnet GPU R package for Amazon deep learning AMI

I am having trouble installing the GPU version of mxnet for R on the Amazon Deep Learning Linux AMI. The environment variables are such a mess that it’s a nightmare for any non-expert sysadmin to figure out.
Step 1: install the ridiculous amount of missing/broken programs and R packages
sudo yum install R
sudo yum install libxml2-devel
sudo yum install cairo-devel
sudo yum install giflib-devel
sudo yum install libXt-devel
sudo R
install.packages("devtools")
library(devtools)
install_github("igraph/rigraph")
install.packages("DiagrammeR")
install.packages("roxygen2")
install.packages("rgexf")
install.packages("influenceR")
install.packages("Cairo")
install.packages("imager")
Step 2: edit the config.mk file
cd ~/src/mxnet
cp make/config.mk .
echo "USE_BLAS=openblas" >>config.mk
echo "ADD_CFLAGS += -I/usr/include/openblas" >>config.mk
echo "ADD_LDFLAGS += -lopencv_core -lopencv_imgproc -lopencv_imgcodecs" >>config.mk
echo "USE_CUDA=1" >>config.mk
echo "USE_CUDA_PATH=/usr/local/cuda" >>config.mk
echo "USE_CUDNN=1" >>config.mk
*Note: even though USE_CUDA_PATH is set, it STILL cannot find libcudart.so, which then has to be linked in the make command (shown later)
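Step 2 appends each setting with echo ... >>config.mk, so running the step a second time adds the same lines again. make uses the last plain = assignment it reads, so the build still gets the final value, but the file fills up with duplicates and it becomes hard to tell which value is in effect. Below is a minimal sketch of an idempotent alternative for the USE_* switches. set_config_var is a hypothetical helper (it is not part of MXNet), and the sketch assumes a POSIX sh and awk:

```shell
#!/bin/sh
# Hypothetical helper: set KEY to VALUE in a make-style config file.
# Plain assignments to exactly KEY ("KEY = ...", "KEY := ...", "KEY ?= ...")
# are removed and one "KEY = VALUE" line is appended. "+=" lines, comments
# and keys that merely start with KEY (e.g. USE_CUDA_PATH) are kept.
set_config_var() {
  file="$1"
  key="$2"
  value="$3"
  tmp="$file.tmp.$$"
  awk -v k="$key" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (index(line, k) == 1) {
        rest = substr(line, length(k) + 1)
        if (rest ~ /^[ \t]*[?:]?=/) next   # drop old assignment to KEY
      }
      print
    }
  ' "$file" > "$tmp" &&
    printf '%s = %s\n' "$key" "$value" >> "$tmp" &&
    mv "$tmp" "$file"
}
```

For example, set_config_var config.mk USE_CUDA 1 followed by set_config_var config.mk USE_CUDA_PATH /usr/local/cuda can be re-run any number of times. The ADD_CFLAGS and ADD_LDFLAGS lines use +=, which the helper deliberately leaves alone, so they are still simplest to add with echo.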
Step 3: make a new config file so the make command can find libcudart.so
Create /etc/ld.so.conf.d/cuda.conf containing the line
/usr/local/cuda-8.0/lib64
then run:
sudo ldconfig
Note: this was posted by NVIDIA but does absolutely nothing to help make rpkg.
Step 4: set up R directories
Rscript -e "install.packages('devtools', repo = 'https://cran.rstudio.com')"
cd R-package
Rscript -e "library(devtools); library(methods); options(repos=c(CRAN='https://cran.rstudio.com'));
install_deps(dependencies = TRUE)"
cd ..
Step 5: make
cd ~/src/mxnet
sudo make -j8
Result:
make CXX=g++ DEPS_PATH=/home/ec2-user/src/mxnet/deps -C /home/ec2-user/src/mxnet/ps-lite ps
cd /home/ec2-user/src/mxnet/dmlc-core; make libdmlc.a USE_SSE=1 config=/home/ec2-user/src/mxnet/config.mk; cd /home/ec2-user/src/mxnet
make[1]: Entering directory `/home/ec2-user/src/mxnet/dmlc-core'
make[1]: `libdmlc.a' is up to date.
make[1]: Leaving directory `/home/ec2-user/src/mxnet/dmlc-core'
make[1]: Entering directory `/home/ec2-user/src/mxnet/ps-lite'
make[1]: Nothing to be done for `ps'.
make[1]: Leaving directory `/home/ec2-user/src/mxnet/ps-lite'
ar crv lib/libmxnet.a
*Note: even after changing the config.mk file, the make command always reports ‘nothing to update’
Step 6: attempt to make rpkg
cd ~/src/mxnet
sudo make rpkg
Error:
Error: package or namespace load failed for ‘mxnet’:
.onLoad failed in loadNamespace() for 'mxnet', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared object '/usr/lib64/R/library/mxnet/libs/libmxnet.so':
libcudart.so.8.0: cannot open shared object file: No such file or directory
Error: loading failed
Execution halted
ERROR: loading failed
So it’s looking in a location that doesn’t exist: /usr/lib64/R/library/mxnet/libs/
When the file actually lives:
/home/ec2-user/src/mxnet/R-package/inst/libs/libmxnet.so
or
/home/ec2-user/src/mxnet/lib/libmxnet.so
What I’ve tried so far:
sudo LD_LIBRARY_PATH=/usr/local/cuda/lib64 make rpkg
This fixes the missing libcudart.so.8.0 issue, but the error is simply replaced with:
libmklml_intel.so: cannot open shared object file: No such file or directory, as well as the original ‘cannot find libmxnet.so’.
Also tried:
1. actually creating directories (/usr/lib64/R/library/mxnet/libs/) and then copying libmxnet.so there
Result: same error
2. adding /home/ec2-user/src/mxnet/R-package/inst/libs/ to the make command
sudo LD_LIBRARY_PATH=/home/ec2-user/src/mxnet/R-package/inst/libs make rpkg
Result: same error
3. a ridiculous number of environment variable settings, all of which failed:
export MXNET_HOME=/usr/lib64/R/library/mxnet/libs/
export MXNET_HOME=/usr/lib64/R/library/mxnet/libs/libmxnet.so
sudo ldconfig /usr/local/cuda/lib64
sudo ln -s /usr/lib64/R/library/mxnet/libs /usr/lib
sudo ln -s /usr/lib64/R/library/mxnet/libs/libmxnet.so /usr/lib
sudo ln -s /usr/local/lib/libmklml_intel.so /usr/lib
sudo ln -s /usr/local/lib/libiomp5.so /usr/lib
sudo ln -s /usr/local /usr/lib
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64/libcudart.so.8.0
export LD_LIBRARY_PATH=/usr/lib64/R/library/mxnet/libs/libmxnet.so /usr/lib
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/targets/x86_64-linux/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64/libcudart.so.8.0
Of all of these, ONE worked, because I briefly got the mxnet R package working before it fell apart again. I’ve dropped 50+ hours into this installation, which, frankly, is ridiculous. It’s tougher to install the software than it is to program an actual net....
I don’t have 5+ years of Linux sysadmin knowledge, so if you can, please be a bit more helpful than ‘fix environment variables.’ I can tell that’s obviously what’s wrong, yet I have no idea what ‘fix environment variables’ entails.
To top it off, even after a successful install of the R package, it STILL won’t work until I set this in RStudio Server’s config file: rsession-ld-library-path=/opt/local/lib:/usr/local/cuda/lib64

Did you try the following when running sudo commands?
sudo -E make -j8
This means that it will preserve the env variables when running as superuser. You shouldn't have to add a new config file for the make to find the libraries. Just preserving the env variables using the above command should be enough.
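Several of the attempts in the question, such as export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64/libcudart.so.8.0, point LD_LIBRARY_PATH at a .so file. The variable is a colon-separated list of directories that the dynamic linker searches, so a file path in it never matches anything. Each bare export also replaces whatever value was set before. Below is a minimal sketch that builds a valid value from directories only. join_lib_dirs is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: join library DIRECTORIES into a colon-separated
# value suitable for LD_LIBRARY_PATH. Arguments that are not directories
# (for example a path to libcudart.so.8.0 itself) are skipped with a warning.
join_lib_dirs() {
  joined=""
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "skipping '$dir': not a directory" >&2
      continue
    fi
    # ${joined:+...} adds the ':' separator only when joined is non-empty
    joined="${joined:+$joined:}$dir"
  done
  printf '%s\n' "$joined"
}
```

With the CUDA and MKL locations from the question, this could be used as export LD_LIBRARY_PATH="$(join_lib_dirs /usr/local/cuda/lib64 /usr/local/lib)${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" followed by sudo -E make rpkg. If the sudo policy on the machine still filters the variable, sudo env LD_LIBRARY_PATH="$LD_LIBRARY_PATH" make rpkg passes it to make explicitly.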

Related

Error when running Singularity container for R script

I am building a Singularity container to run a custom R script for tree segmentation using the lidR software package.
I have written the Singularity definition file as such:
Bootstrap: docker
From: ubuntu:20.04
%setup
touch test.R
touch treeSeg_dalponte2016.R
touch /home/ljeasson/R/x86_64-pc-linux-gnu-library/3.6/rgdal/libs/rgdal.so
%files
test.R
treeSeg_dalponte2016.R
/home/ljeasson/R/x86_64-pc-linux-gnu-library/3.6/rgdal/libs/rgdal.so
%post
# Disable interactivity, including region and time zone
export DEBIAN_FRONTEND="noninteractive"
export DEBCONF_NONINTERACTIVE_SEEN=true
# Update apt and install necessary libraries and repositories
apt update
apt install -y build-essential r-base-core software-properties-common dirmngr apt-transport-https lsb-release ca-certificates
add-apt-repository ppa:ubuntugis/ubuntugis-unstable
apt install -y libgdal-dev libgeos++-dev libudunits2-dev libproj-dev libx11-dev libgl1-mesa-dev libglu1-mesa-dev libfreetype6-dev libnode-dev libxt-dev libfftw3-dev
apt clean
# Install necessary R packages and dependencies
R -e "install.packages('lidR', dependencies = TRUE)"
R -e "install.packages('raster', dependencies = TRUE)"
R -e "install.packages('sf', dependencies = TRUE)"
R -e "install.packages('dplyr', dependencies = TRUE)"
R -e "install.packages('rgdal', dependencies = TRUE, repos='https://cran.rstudio.com', configure.args=c('--with-gdal-config=/opt/conda/bin/gdal-config', '--with-proj-include=/opt/conda/include', '--with-proj-lib=/opt/conda/lib', '--with-proj-share=/opt/conda/share/proj/'))"
R -e "install.packages('gdalUtils', dependencies = TRUE, repos='https://cran.rstudio.com')"
%test
#!/bin/bash
R --version
Rscript test.R
%runscript
#!/bin/sh
echo "Arguments received: $*"
Rscript treeSeg_dalponte2016.R $*
And build the container using singularity build ga_container.sif ga_container.def
The container builds without error, but when the container is run using ./ga_container.sif <arguments>, this error always occurs:
Error: package or namespace load failed for 'rgdal' in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/ljeasson/R/x86_64-pc-linux-gnu-library/3.6/rgdal/libs/rgdal.so':
libgdal.so.26: cannot open shared object file: No such file or directory
Execution halted
I know that the error is occurring because it cannot find the library file for rgdal, even though it seems I've attached it to the container in the %setup and %files sections:
%setup
touch test.R
touch treeSeg_dalponte2016.R
touch /home/ljeasson/R/x86_64-pc-linux-gnu-library/3.6/rgdal/libs/rgdal.so
%files
test.R
treeSeg_dalponte2016.R
/home/ljeasson/R/x86_64-pc-linux-gnu-library/3.6/rgdal/libs/rgdal.so
If the error is from incorrect file attachment, how do I ensure that rgdal (and other similar libraries) are attached correctly within the Singularity container?
Thanks in advance
This looks like an environmental issue causing the image to look at your locally installed R modules instead of using the ones installed in the image. Perhaps in your .Rprofile or R_LIBS/R_LIBS_USER. Try running with singularity run --cleanenv ..., or temporarily moving your .Rprofile if you have one, and see if that fixes it. If not, I have a few other observations.
First, the %setup block is creating root owned, empty files on the host OS if they don't exist already. An empty .so file would certainly cause problems. For the majority of cases you don't want to use %setup, as it directly modifies the host as root during sudo singularity build.
In the %files block you are copying the (potentially root-owned/empty) .so file to a path in the image that matches your home directory. Your $HOME is automatically mounted when you run/exec/shell an image, which will hide any files in the image at that location. When adding files to an image, you should always put them in a place where they are unlikely to get clobbered by a mount. /opt/myapp or something similar usually works well. Additionally, test.R and treeSeg_dalponte2016.R are copied to /test.R and /treeSeg_dalponte2016.R inside the container, but relative paths are used in %runscript and %test. Singularity run/exec will attempt to run from the first path that exists in the container: $PWD (implicitly mounted, but this can fail silently), then $HOME (also implicitly mounted and can fail silently), then /. You can use singularity --verbose run ... to see if anything isn't being mounted correctly, and add echo $PWD to %runscript to see where it's running from.
In %post when you install the rgdal package, you specify several paths with /opt/conda/... but conda is not installed or configured in the image. I'm not familiar with rgdal, so don't know if that would cause problems or not though.
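To illustrate the points about mounts and relative paths, here is a minimal sketch of %runscript logic. It is written as a shell function so it can be tried outside the container. It assumes the scripts were copied to /opt/myapp with %files lines such as treeSeg_dalponte2016.R /opt/myapp/treeSeg_dalponte2016.R (source, then destination). /opt/myapp, APP_DIR and run_treeseg are illustrative names, not anything Singularity defines. The function also forwards arguments with "$@" rather than an unquoted $*, so arguments containing spaces reach the R script intact:

```shell
#!/bin/sh
# Hypothetical %runscript logic. APP_DIR defaults to /opt/myapp, an
# assumed location that is not hidden by the automatic $HOME/$PWD mounts.
APP_DIR="${APP_DIR:-/opt/myapp}"

run_treeseg() {
  echo "Arguments received: $*"
  # Absolute path: works no matter which directory Singularity starts in.
  # "$@" keeps each argument as a separate word, spaces included.
  Rscript "$APP_DIR/treeSeg_dalponte2016.R" "$@"
}
```

In the definition file itself, the %runscript body would then be the same echo and Rscript lines with /opt/myapp written out.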

Rserve : ld: library not found for -lssl

I am getting an error while trying to install Rserve 1.8.6. I can successfully install 1.7.3 from CRAN. This is on Mac OS High Sierra.
ld: library not found for -lssl
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [forward] Error 1
make: *** [all] Error 2
ERROR: compilation failed for package ‘Rserve’
* removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rserve’
* restoring previous ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rserve’
The downloaded source packages are in
‘/private/var/folders/v7/hyxrfmk94p1_03gdrm27fnxncy3vq1/T/RtmpFHKNMe/downloaded_packages’
This worked for me (MacOS):
In terminal:
brew install openssl
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/opt/openssl/lib/
I'm running Mac OS 10.15 Catalina, I've spent 2 days trying to fix this same problem, scouring the internet for help. I finally managed it by cobbling together solutions from a few different sources.
The key thing I was missing was that Mac OS ships with its own version of openssl which it thinks is superior to anything else you can find. It is wrong. What you need to do is go and download the latest version of openssl, install that, then export THAT library to your library path variable. Here are the steps I took with openssl 1.1.1:
Get the version number for the latest version of openssl from the source (https://www.openssl.org/source/) and then manually install it directly where it's supposed to go:
cd /usr/local/src
If you're getting "No such file or directory", make it:
cd /usr/local && mkdir src && cd src
Download openssl using curl (shown) or using the link above to the source code (make sure you put the file in the directory you just made in the previous step):
curl --remote-name https://www.openssl.org/source/openssl-1.1.1f.tar.gz
Extract and cd in:
tar -xzvf openssl-1.1.1f.tar.gz
cd openssl-1.1.1f
Compile and install (these are the 64-bit Mac OS instructions; refer to the openssl documentation for 32-bit and other OS instructions):
./Configure darwin64-x86_64-cc shared enable-ec_nistp_64_gcc_128 no-ssl2 no-ssl3 no-comp --openssldir=/usr/local/ssl/macos-x86_64
make depend
sudo make install
This created a new openssl folder so when you export the library path you have to feed it the right openssl folder:
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/opt/openssl@1.1/lib/
Hope that helps you if you haven't figured it out yet, and anyone else in the future who is ready to chuck their computer across the room, like I was.
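Both answers append to LIBRARY_PATH with $LIBRARY_PATH:..., which leaves an empty leading element (a bare ':') when LIBRARY_PATH was not set before. Some tools treat an empty element as the current directory. Below is a minimal sketch of the usual ${VAR:+...} guard. append_dir is a hypothetical helper name, and the openssl path in the usage note is the one from the answer above:

```shell
#!/bin/sh
# Hypothetical helper: print LIST with DIR appended, without producing an
# empty element (a bare leading ':') when LIST is empty or unset.
append_dir() {
  list="$1"
  dir="$2"
  if [ ! -d "$dir" ]; then
    echo "warning: $dir does not exist" >&2
  fi
  printf '%s\n' "${list:+$list:}$dir"
}
```

For example, export LIBRARY_PATH="$(append_dir "$LIBRARY_PATH" /usr/local/opt/openssl@1.1/lib)". Run the Rserve install from an R session started in that same terminal so the compiler sees the variable.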

MXNet R package on an Amazon Linux Deep Learning EC2 instance

I'm attempting to set up an Amazon Linux EC2 instance with MXNet and R (with the MXNet R package available as well). Unfortunately this has been a lot harder than I expected.
I've attempted to follow the instructions from MXNet using Amazon's deep learning AMI with CUDA 8.0 on a p2.xlarge (https://mxnet.incubator.apache.org/get_started/install.html)
However I get the same error when attempting to compile the mxnet r package from this SO post:
Issues installing mxnet GPU R package for Amazon deep learning AMI
The solutions discussed in that post are somewhat beyond my abilities to fully test/debug, i.e. I'm not particularly familiar with Linux environment variables and how to modify them. I've also reviewed some issues raised on the apache-incubator GitHub for MXNet, and those were pretty unhelpful as well.
So my questions are,
Is anyone aware of any available AMIs which come pre-packaged with R and MXNet? The ones I see seem to only include Python.
Does anyone have a working set of instructions (or a script) to run on an Amazon Linux EC2 instance to install the required dependencies (assuming I'm using some type of deep learning AMI that comes with at least CUDA 8.0) and the MXNet R package?
Right so I was the guy on the other post and I DID eventually get it working. Took 50+ hours and I'm not 100% sure where the issue was because...linux.
sudo yum install R
sudo yum install libxml2-devel
sudo yum install cairo-devel
sudo yum install giflib-devel
sudo yum install libXt-devel
sudo R
install.packages("devtools")
library(devtools)
install_github("igraph/rigraph")
install.packages(c("DiagrammeR", "roxygen2", "rgexf", "influenceR", "Cairo", "imager"))
cd
cd ~/src/mxnet
cp make/config.mk .
echo "USE_BLAS=openblas" >>config.mk
echo "ADD_CFLAGS += -I/usr/include/openblas" >>config.mk
echo "ADD_LDFLAGS += -L/usr/local/lib" >>config.mk
echo "USE_CUDA=1" >>config.mk
echo "USE_CUDA_PATH=/usr/local/cuda-9.0" >>config.mk
echo "USE_CUDNN=1" >>config.mk
*note: the ADD_LDFLAGS line above is the extra linker flag for /usr/local/lib
cd /etc/ld.so.conf.d/
sudo nano cuda.conf
Insert the line /usr/local/cuda-9.0/lib64
cd
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
sudo ldconfig
cd ~/src/mxnet/R-package
Rscript -e "install.packages('devtools', repo = 'https://cran.rstudio.com')"
Rscript -e "library(devtools); library(methods);options(repos=c(CRAN='https://cran.rstudio.com'));install_deps(dependencies = TRUE)"
cd ..
sudo make rpkg
THEN you gotta make sure R/Rstudio can actually find those libraries:
cd /etc/rstudio
sudo nano rserver.conf
You can add elements to the default LD_LIBRARY_PATH for R sessions (as determined by the R ldpaths script) by adding an rsession-ld-library-path entry to the server config file. This might be useful for ensuring that packages can locate external library dependencies that aren't installed in the system standard library paths. For example:
rsession-ld-library-path=/opt/local/lib:/usr/local/cuda/lib64
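Before restarting RStudio, it can help to check which dependencies of libmxnet.so are still unresolved. ldd lists each shared library a file needs and prints "=> not found" for any that the dynamic linker cannot locate using the current LD_LIBRARY_PATH and the ldconfig cache. Below is a minimal sketch. missing_libs is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the names of shared libraries that the
# dynamic linker cannot resolve for the given executable or .so file.
# ldd prints lines like "libcudart.so.8.0 => not found" for those.
missing_libs() {
  ldd "$1" 2>/dev/null | awk '/=> not found/ { print $1 }'
}
```

For example, missing_libs ~/src/mxnet/lib/libmxnet.so prints nothing once everything resolves in the current shell. An R session launched by RStudio Server does not see exports made in your interactive shell, which is why the rsession-ld-library-path entry is still needed.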

Install R from source

I'm trying to install R from source in my home directory on a server running CentOS.
I do not have root rights, and I'm not permitted to write to /usr/local/include/.
I use the following code:
wget http://cran.rstudio.com/src/base/R-3/R-3.2.0.tar.gz
tar xvf R-3.2.0.tar.gz
cd R-3.2.0
./configure --prefix=$HOME/R
In the configuration step, I get this error:
configure: error: --with-readline=yes (default) and headers/libs are not available
In my understanding, it tells me that readline library is not available.
So I tried to install readline.
I downloaded the tar.gz file and then used the following commands:
tar xvf readline-6.3.tar.gz
cd readline-6.3
./configure --prefix=$HOME/readline
make
make install
Things are fine, and there's an additional folder in my home directory named "readline".
When I go back and try to configure R again, I still get the same error message. How can I fix it?
Try using these flags for your ./configure script
CXXFLAGS="-ggdb -pipe -Wall -pedantic -I/path/readline/6.3/include"
CPPFLAGS="-I/path/readline/6.3/include"
LDFLAGS="-L/path/readline/6.3/lib"
Then if you get an error about X11, set --with-x=no and try again.
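In the question, readline was installed with --prefix=$HOME/readline, so the include and lib directories are $HOME/readline/include and $HOME/readline/lib rather than the placeholder /path/readline/6.3 above. Below is a minimal sketch that checks readline/readline.h is present and then prints the two flags. readline_configure_flags is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: verify that a readline installed under PREFIX
# (e.g. with ./configure --prefix=$HOME/readline) has its headers where
# expected, then print CPPFLAGS/LDFLAGS assignments for R's ./configure.
readline_configure_flags() {
  prefix="$1"
  if [ ! -f "$prefix/include/readline/readline.h" ]; then
    echo "readline/readline.h not found under $prefix/include" >&2
    return 1
  fi
  printf 'CPPFLAGS=-I%s/include LDFLAGS=-L%s/lib\n' "$prefix" "$prefix"
}
```

For example, ./configure --prefix=$HOME/R $(readline_configure_flags "$HOME/readline"). The unquoted $(...) is split into the two VAR=value arguments, which assumes $HOME contains no spaces. If readline was built as a shared library, R may also need $HOME/readline/lib on LD_LIBRARY_PATH when it runs.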

nginx install on linux

I downloaded nginx from its site for Linux (I use Ubuntu 10.04). I extracted nginx-1.0.6.tar.gz and there was a configure file in that directory, so I entered the "./configure" command in the shell. It seemed to be configured correctly. After I entered the "make" command, it printed this error:
make -f objs/Makefile
make[1]: Entering directory `/usr/local/nginx'
cd ./auto/lib/pcre/ \
&& if [ -f Makefile ]; then make distclean; fi \
&& CC="gcc" CFLAGS="-O2 -fomit-frame-pointer -pipe " \
./configure --disable-shared
/bin/sh: ./configure: not found
make[1]: *** [auto/lib/pcre//Makefile] Error 127
make[1]: Leaving directory `/usr/local/nginx'
make: *** [build] Error 2
what should I do now?
You have to install the dependencies.
Generally these will be enough:
libpcre3 libpcre3-dev libpcrecpp0 libssl-dev zlib1g-dev
so you can first install them:
sudo apt-get install libpcre3 libpcre3-dev libpcrecpp0 libssl-dev zlib1g-dev
and then compile. Also make sure you run the make command as root.
The ./configure program of nginx wants to find either the shared libs, to build nginx dynamically linked, or the sources of openssl, pcre and zlib.
The above-mentioned error occurs when you invoke ../nginx/configure with the wrong options:
--with-pcre=/path/to/lib # where libpcre.a resides
--with-openssl=/path/to/lib # where libssl.a resides
--with-zlib=/path/to/lib # where libz.a resides
Pointing these options at library directories is wrong, especially when ld.so has no idea about these libs.
If you build a statically linked version of nginx, try instead:
--with-pcre=/path/to/src/of/pcre
--with-openssl=/path/to/src/of/openssl
--with-zlib=/path/to/src/of/zlib
e.g.
--with-pcre=../pcre-8.36 --with-openssl=../openssl-1.0.2 --with-zlib=../zlib-1.2.8
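The error in the question (/bin/sh: ./configure: not found) is what happens when the build cds into the directory given to --with-pcre and finds no configure script there. Below is a minimal pre-flight check to run before nginx's ./configure. check_src_tree is a hypothetical helper, not something nginx ships, and it assumes the PCRE and zlib source trees contain a configure script while OpenSSL's contains config:

```shell
#!/bin/sh
# Hypothetical helper: confirm each directory is an unpacked SOURCE tree
# (has a configure or config script), which is what nginx's --with-pcre,
# --with-zlib and --with-openssl options expect.
check_src_tree() {
  status=0
  for dir in "$@"; do
    if [ -f "$dir/configure" ] || [ -f "$dir/config" ]; then
      echo "ok: $dir"
    else
      echo "not a source tree (no configure/config script): $dir" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: check_src_tree ../pcre-8.36 ../openssl-1.0.2 ../zlib-1.2.8 && ./configure --with-pcre=../pcre-8.36 --with-openssl=../openssl-1.0.2 --with-zlib=../zlib-1.2.8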
Download the PCRE source code
Unpack it (do not install it)
Pass its path to configure (here it was unpacked in the Downloads folder):
./configure --with-pcre=/home/USER/Downloads/pcre-8.37/
I solved this error by going into the nginx install directory, editing objs/Makefile, and removing the -Wall and -Werror params so it looks like this (second line):
CC = gcc
CFLAGS = -pipe -O -W -Wpointer-arith -Wno-unused-parameter -Wunused-function -Wunused-variable -Wunused-value -g
Also, running ./configure should start a long series of checks to ensure that your system contains all the necessary components. If the configuration fails for any reason, check
less objs/autoconf.err
for more details. Configuration errors are usually caused by dependencies missing for your configuration.
You didn't configure it right. Use these commands (in the nginx directory):
./configure --with-pcre=./auto/lib/pcre/ --with-zlib=./auto/lib/zlib/
./configure
make
sudo make install
See http://wiki.nginx.org/Install for Ubuntu installation instructions.
See https://nodevine.com/library/installing-multiple-virtual-hosts-on-nginx-on-ubuntu-12-04-and-cent-os-6 for Ubuntu/CentOS installation instructions.
We can now add the repository to install the latest version of nginx:
sudo add-apt-repository ppa:nginx/stable
Note: If this command still does not work (normally on 12.10), run the following command:
sudo apt-get install software-properties-common
This will add the repository to Ubuntu and fetch the repository's key. The key is used to verify that the packages have not been tampered with since they were built.
Step Three - Updating the Repositories
After adding a new repository, you will need to update the list:
sudo apt-get update
Install nginx
To install nginx or update the version you already have installed, run the following command:
sudo apt-get install nginx
Check That Nginx is Running
You can check to see that nginx is running by either going to your VPS' IP address/domain, or typing in:
service nginx status
This will tell you whether nginx is currently running.
With a VPS running Debian Wheezy,
I had to install a lot of tools in order to install nginx 1.2.9:
apt-get install libpcre3 libpcre3-dev
apt-get install --reinstall zlibc zlib1g zlib1g-dev
apt-get install make
apt-get install sudo
