Unable to install spark with sparklyr in Dockerfile - r

We are trying to build our own docker image to use R and the tidyverse with Spark. However, we are getting an error in the build when trying to install Spark.
Here is our Dockerfile:
# start with the most up-to-date tidyverse image as the base image
FROM rocker/tidyverse:latest
# install openjdk 8 (Java)
RUN apt-get update \
&& apt-get install -y openjdk-8-jdk
# Install sparklyr
RUN install2.r --error --deps TRUE sparklyr
# Install spark
RUN Rscript -e 'sparklyr::spark_install("2.4.3")'
RUN mv /root/spark /opt/ && \
chown -R rstudio:rstudio /opt/spark/ && \
ln -s /opt/spark/ /home/rstudio/
RUN install2.r --error --deps TRUE DBI
RUN install2.r --error --deps TRUE RPostgres
RUN install2.r --error --deps TRUE dbplyr
We are using docker-compose up to build the image and then create the container.
When building, it throws this error:
=> ERROR [4/8] RUN Rscript -e 'sparklyr::spark_install("2.3.0")' 62.0s
------
> [4/8] RUN Rscript -e 'sparklyr::spark_install("2.3.0")':
#7 61.85 Error in download.file(installInfo$packageRemotePath, destfile = installInfo$packageLocalPath, :
#7 61.85 download from 'https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz' failed
#7 61.85 Calls: <Anonymous>
#7 61.85 Execution halted
------
failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c Rscript -e 'sparklyr::spark_install("2.3.0")']: exit code: 1
We have also tried running it as:
RUN R -e 'sparklyr::spark_install("2.4.3")'
Instead of:
RUN Rscript -e 'sparklyr::spark_install("2.4.3")'
but it still throws an error. We have also tried installing different versions of Spark, but still no luck. Does anyone know why we are getting this error and how we can properly install Spark with sparklyr in Docker? Thank you.

Related

Error Installing redux R package on centos7

I am getting an error when trying to install the redux R package on CentOS 7 and have no idea how to fix it. Has anybody come across it before?
My Dockerfile is:
FROM centos:centos7
RUN yum -y install wget git tar
RUN yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
RUN yum -y install epel-release openssh-server
ENV R_VERSION=4.0.5
RUN wget https://cdn.rstudio.com/r/centos-7/pkgs/R-${R_VERSION}-1-1.x86_64.rpm \
&& yum -y install R-${R_VERSION}-1-1.x86_64.rpm \
&& rm R-${R_VERSION}-1-1.x86_64.rpm
ENV PATH="${PATH}:/opt/R/${R_VERSION}/bin/"
RUN yum -y install openssl-devel
RUN Rscript -e "install.packages(c('redux'), repos = 'https://packagemanager.rstudio.com/all/__linux__/centos7/latest')"
RUN Rscript -e "library(redux)"
CMD ["/bin/bash"]
Then I build the image:
docker build -t test-3:latest .
And the error I get is:
=> ERROR [8/8] RUN Rscript -e "library(redux)" 0.6s
------
> [8/8] RUN Rscript -e "library(redux)":
#12 0.528 Error: package or namespace load failed for 'redux' in dyn.load(file, DLLpath = DLLpath, ...):
#12 0.528 unable to load shared object '/opt/R/4.0.5/lib/R/library/redux/libs/redux.so':
#12 0.528 libhiredis.so.0.12: cannot open shared object file: No such file or directory
#12 0.528 Execution halted
------
executor failed running [/bin/sh -c Rscript -e "library(redux)"]: exit code: 1
P.S. I am able to install and load any other package without problems.
That file seems to come from the hiredis package: https://rhel.pkgs.org/7/okey-x86_64/hiredis-0.12.1-1.el7.centos.x86_64.rpm.html
Try adding the line RUN yum -y install hiredis or maybe adding that package to one of your existing yum install lines.
So it turns out I also had to have hiredis installed for the package to load successfully.
Updated Dockerfile:
FROM centos:centos7
RUN yum -y install wget git tar
RUN yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
RUN yum -y install epel-release openssh-server
ENV R_VERSION=4.0.5
RUN wget https://cdn.rstudio.com/r/centos-7/pkgs/R-${R_VERSION}-1-1.x86_64.rpm \
&& yum -y install R-${R_VERSION}-1-1.x86_64.rpm \
&& rm R-${R_VERSION}-1-1.x86_64.rpm
ENV PATH="${PATH}:/opt/R/${R_VERSION}/bin/"
RUN yum -y install openssl-devel hiredis
RUN Rscript -e "install.packages(c('redux'), repos = 'https://packagemanager.rstudio.com/all/__linux__/centos7/latest')"
RUN Rscript -e "library(redux)"
CMD ["/bin/bash"]
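A missing system library like this can often be spotted with ldd before R tries to load the package. The following is only a sketch: the helper name missing_libs and the sample text are made up for illustration, and the commented-out ldd call assumes the redux.so path from the error message above.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd reports as "not found".
# missing_libs is a hypothetical helper, not part of any package.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Made-up sample in the shape of ldd output, for illustration only.
sample='libhiredis.so.0.12 => not found
libc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)'

missing=$(printf '%s\n' "$sample" | missing_libs)
echo "missing: $missing"

# In the container one would pipe real output instead, for example:
#   ldd /opt/R/4.0.5/lib/R/library/redux/libs/redux.so | missing_libs
```

Any library name printed this way is a candidate for a yum install line in the Dockerfile.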

Installing R in a docker container

I'm trying to install miniconda in an Ubuntu:20.04 based container and then, using conda, install R:4.05.
This is the Dockerfile I'm using:
FROM ubuntu:20.04
USER root
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get -y install libcurl4-openssl-dev
RUN apt-get install -y wget
RUN mkdir -p ~/miniconda3
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
RUN bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
RUN export PATH=~/miniconda3/bin:$PATH
RUN rm -rf ~/miniconda3/miniconda.sh
RUN ~/miniconda3/bin/conda init bash
RUN ~/miniconda3/bin/conda init zsh
RUN ~/miniconda3/bin/conda config --add channels conda-forge
RUN ~/miniconda3/bin/activate
RUN ~/miniconda3/bin/conda install -y -c conda-forge r-base
RUN R -e "install.packages('BiocManager')"
RUN R -e "BiocManager::install('DESeq2')"
From lines 8 to 16 I download miniconda and run it in ~/miniconda3
In line 17:
RUN R -e "install.packages('BiocManager')"
I try to use R and install the BiocManager package from the command line, but I receive this error:
> [16/17] RUN R -e "install.packages('BiocManager')":
#19 2.767 /bin/sh: 1: R: not found
------
executor failed running [/bin/sh -c R -e "install.packages('BiocManager')"]: exit code: 127
I've also tried starting from the official Rocker image, but I would prefer the approach shown in this post, since I would end up with an image that has both miniconda and R.
Can someone help me?
Thanks a lot!
Each RUN command runs in a separate shell, so your export command sets the path, but then the shell exits and the path is reset for the next RUN command.
You also have to use the absolute path. Tilde expansion doesn't work.
Instead of
RUN export PATH=~/miniconda3/bin:$PATH
try
ENV PATH=/root/miniconda3/bin:$PATH
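The difference can be reproduced outside Docker with plain shells. This is a rough sketch, not Docker itself: each sh -c stands in for one RUN instruction, and the directory /tmp/demo-miniconda/bin is a made-up placeholder that does not need to exist.

```shell
#!/bin/sh
# Placeholder directory, used only as a recognizable PATH entry.
marker=/tmp/demo-miniconda/bin

# Like "RUN export PATH=...": the change lives only inside this child shell.
sh -c "export PATH=$marker:\$PATH"

# Like the next RUN: a fresh shell that inherits the untouched parent PATH.
after_export=$(sh -c 'echo "$PATH"')

# Like "ENV PATH=...": the variable is set in the environment handed to
# the later shell, so that shell sees it.
after_env=$(PATH="$marker:$PATH" sh -c 'echo "$PATH"')

case "$after_export" in
    *"$marker"*) echo "export: still on PATH" ;;
    *)           echo "export: lost in the next shell" ;;
esac
case "$after_env" in
    *"$marker"*) echo "ENV-style: still on PATH" ;;
    *)           echo "ENV-style: lost" ;;
esac
```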

Sparklyr fails to download Spark from apache in Dockerfile

I am trying to create a dockerfile that builds an image from Rocker/tidyverse and include Spark from sparklyr. Previously, on this post: Unable to install spark with sparklyr in Dockerfile, I was trying to figure out why spark wouldn't download from my dockerfile. After playing with it for the past 5 days I think I have found the reason but have no idea how to fix it.
Here is my Dockerfile:
# start with the most up-to-date tidyverse image as the base image
FROM rocker/tidyverse:latest
# install openjdk 8 (Java)
RUN apt-get update \
&& apt-get install -y openjdk-8-jdk
# Install devtools
RUN Rscript -e 'install.packages("devtools")'
# Install sparklyr
RUN Rscript -e 'devtools::install_version("sparklyr", version = "1.5.2", dependencies = TRUE)'
# Install spark
RUN Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'
RUN mv /root/spark /opt/ && \
chown -R rstudio:rstudio /opt/spark/ && \
ln -s /opt/spark/ /home/rstudio/
RUN apt-get install unixodbc unixodbc-dev --install-suggests
RUN apt-get install odbc-postgresql
RUN install2.r --error --deps TRUE DBI
RUN install2.r --error --deps TRUE RPostgres
RUN install2.r --error --deps TRUE dbplyr
It has no problem downloading everything up until this line:
RUN Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'
Which then gives me the error:
Step 5/11 : RUN Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'
---> Running in 739775db8f12
Error in download.file(installInfo$packageRemotePath, destfile = installInfo$packageLocalPath, :
download from 'https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz' failed
Calls: <Anonymous>
Execution halted
ERROR: Service 'rocker_sparklyr' failed to build : The command '/bin/sh -c Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'' returned a non-zero code: 1
After doing some research I thought that it was a timeout error, in which case I ran beforehand:
RUN Rscript -e 'options(timeout=600)'
This did not increase the time it took to error out again. I installed everything onto my personal machine through Rstudio and it installed with no problems. I think the problem is specific to docker in that it isn't able to download from https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
I have found very little documentation on this problem and am relying heavily on this post to figure it out. Thank you in advance to anyone with this knowledge for reaching out.
Download the Spark tarball yourself and then use this function to install it:
sparklyr::spark_install_tar(tarfile ="~/spark/spark-3.0.1-bin-hadoop3.2.tgz")
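For the download step, a small script run on the host (outside docker build) might look like the sketch below. The function name spark_tarball_url is hypothetical; the URL pattern is copied from the failing URLs in the error messages above, and the curl line is left commented out so the script itself needs no network access.

```shell
#!/bin/sh
# Build the archive.apache.org URL for a Spark release tarball.
# $1 = Spark version, $2 = Hadoop version.
spark_tarball_url() {
    version=$1
    hadoop=$2
    echo "https://archive.apache.org/dist/spark/spark-${version}/spark-${version}-bin-hadoop${hadoop}.tgz"
}

url=$(spark_tarball_url 3.0.0 3.2)
echo "$url"

# One could then fetch the file on the host, for example:
#   curl -fL --retry 3 -o "$(basename "$url")" "$url"
# and COPY it into the image before calling spark_install_tar() on it.
```

The tarball then only has to be downloaded once, and the image build no longer depends on reaching archive.apache.org.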

Adding DT R packages prevents Docker from running

I use Docker to run a Shiny app, and I install some R packages in the Dockerfile (here is a portion of the Dockerfile; I omitted some lines and marked them with <...>):
FROM r-base:latest
RUN apt-get update && apt-get install -y -t unstable \
sudo \
gdebi-core \
make \
git \
gcc \
<...>
R -e "install.packages(c('shiny', 'rmarkdown'), repos='https://cran.rstudio.com/')" && \
R -e "install.packages(c('ada','bsplus','caret','ddalpha','diptest','doMC','dplyr','e1071','evtree','fastAdaboost','foreach','GGally','ggplot2','gridExtra','iterators','kernlab','lattice','markdown','MASS','mboost','nnet','optparse','partykit','plyr','pROC','PRROC','randomForest','recipes','reshape2','RSNNS','scales','shinyBS','shinyFiles','shinythemes'))"
This works fine. But if I add one more R package (DT), the container still builds fine (and I can see that the package gets installed properly) but when I try to run the container I get:
Loading required package: shiny
Error in dir.exists(lib) : invalid filename argument
Calls: <Anonymous> ... load_libraries -> get_package -> install.packages -> dir.exists
Execution halted
This error is not informative at all and I can't figure out what could possibly be wrong. I would appreciate any ideas! Thank you.

R App in Docker Container: Unable to download PDF report (Error: No such file or directory) Knitr/Rmarkdown

I have been building a docker container for an R application and have continually run into an error with downloading a PDF report. The PDF report function works fine in R on a local machine, but when containerized, it throws the error below. I have tried forcing the install of specific packages, namely Knitr and Rmarkdown as other questions have mentioned, however it is still showing the same error. The file in Chrome downloads simply says "Failed - Server Problem". I have tested the download of a CSV file using the app, which works fine, therefore I believe it's an issue with generating and downloading a markdown PDF report.
I have included the build Dockerfile to assist. Any suggestions would be amazing!
Thanks!
DOCKERFILE:
FROM openanalytics/r-base
MAINTAINER ________
# system libraries of general use
RUN apt-get update && apt-get install -y \
sudo \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev \
libxt-dev \
libssl-dev \
libssh2-1-dev \
libxml2-dev \
libssl1.0.0 \
libpq-dev \
git \
texlive-full \
html-xml-utils \
libv8-3.14-dev
# system library dependency for the app
RUN apt-get update
# install packages for R
RUN R -e "install.packages(c('hms','devtools'), repos='https://cloud.r-project.org/')"
RUN R -e "require(devtools)"
RUN R -e "install.packages(c('car'), repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('readxl', version = '1.0.0', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('DT', version = '0.2', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('shinydashboard', version = '0.6.1', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('knitr', version = '1.18', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('magrittr', version = '1.5', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('ggrepel', version = '0.7.0', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('dplyr', version = '0.7.4', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('Rcpp', version = '0.12.14', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('rhandsontable', version = '0.3.4', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('shinyjs', version = '0.9.1', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('V8', version = '1.5', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('data.table', version = '1.10.4-3', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('packrat', version = '0.4.8-1', repos='https://cloud.r-project.org/')"
RUN R -e "devtools::install_version('zoo', version = '1.8-1', repos='https://cloud.r-project.org/')"
RUN R -e "install.packages('shiny', repos='https://cloud.r-project.org/')"
RUN wget https://github.com/rstudio/rmarkdown/archive/v1.8.tar.gz
RUN R CMD INSTALL v1.8.tar.gz
RUN R -e "install.packages('xml2', repos='https://cloud.r-project.org/')"
RUN R -e "install.packages('rvest', repos='https://cloud.r-project.org/')"
RUN wget https://cran.r-project.org/src/contrib/Archive/kableExtra/kableExtra_0.3.0.tar.gz
RUN R CMD INSTALL kableExtra_0.3.0.tar.gz
# copy the app to the image
RUN mkdir /root/tsk
COPY tsk /root/tsk
COPY Rprofile.site /usr/lib/R/etc/
EXPOSE 3838
CMD ["R", "-e", "shiny::runApp('/root/tsk')"]
ERROR FROM DOCKER:
Listening on http://0.0.0.0:3838
Warning in normalizePath(path, winslash = winslash, mustWork = mustWork) :
path[1]="/tmp/RtmpMu8ezy/TSK.Rmd": No such file or directory
Warning: Error in tools::file_path_as_absolute: file '/tmp/RtmpMu8ezy/TSK.Rmd'
does not exist
[No stack trace available]
Warning in normalizePath(path, winslash = winslash, mustWork = mustWork) :
path[1]="/tmp/RtmpMu8ezy/TSK.Rmd": No such file or directory
Warning: Error in tools::file_path_as_absolute: file '/tmp/RtmpMu8ezy/TSK.Rmd'
does not exist
[No stack trace available]
Warning in normalizePath(path, winslash = winslash, mustWork = mustWork) :
path[1]="/tmp/RtmpMu8ezy/TSK.Rmd": No such file or directory
Warning: Error in tools::file_path_as_absolute: file '/tmp/RtmpMu8ezy/TSK.Rmd'
does not exist
[No stack trace available]
The fix was simply changing the case of the filename from tsk.Rmd to TSK.Rmd. Testing had always been done on OSX in an IDE, which didn't throw any errors; however, when building a container with Ubuntu, which is case sensitive, the app was unable to find the markdown file.
When building with different operating systems, be sure to check whether the file system is case sensitive! An easy mistake!
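A quick way to check for this kind of mismatch is a case-insensitive search for the name the code asks for. The sketch below uses a throwaway directory and assumes a find that supports -iname and -maxdepth (GNU and BSD find both do); the file names mirror the ones in this post.

```shell
#!/bin/sh
# Throwaway directory holding a file whose case differs from what the app requests.
demo_dir=$(mktemp -d)
touch "$demo_dir/tsk.Rmd"

wanted="TSK.Rmd"   # the name the app tries to open

# Look for any file whose name matches when case is ignored.
found=$(find "$demo_dir" -maxdepth 1 -iname "$wanted" -exec basename {} \;)

if [ -n "$found" ] && [ "$found" != "$wanted" ]; then
    echo "case mismatch: code asks for $wanted, disk has $found"
fi

rm -rf "$demo_dir"
```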

Resources