Custom R Source package in Docker - r

InfrastructureR_0.1.4.tar.gz is an R source package that I created using RStudio on Windows, and it works fine there.
InfrastructureR_0.1.4.tar.gz is located in the same directory as the Dockerfile.
Docker is running on an Ubuntu 18.x machine.
But when I try to install it in the Dockerfile with the command
RUN R -e "install.packages('InfrastructureR_0.1.4.tar.gz', dependencies=FALSE, verbose=TRUE, repos=NULL, type='source')"
it doesn't install.
I run the Docker build with sudo docker build -t myRApp s .
and I get the following error message:
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
system (cmd0): /usr/lib/R/bin/R CMD INSTALL
Warning: invalid package ‘InfrastructureR_0.1.4.tar.gz’
Error: ERROR: no packages specified
Warning message:
In install.packages("InfrastructureR_0.1.4.tar.gz", dependencies = FALSE, :
installation of package ‘InfrastructureR_0.1.4.tar.gz’ had non-zero exit status
So how do I install a custom R source package in Docker?

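It is not enough to have your tar.gz file in the same directory as your Dockerfile. To make a local file available inside the image during the build (and in containers created from that image), use the COPY instruction, for example COPY InfrastructureR_0.1.4.tar.gz ./ on a line before the RUN line that installs it.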
The COPY instruction copies new files or directories from <src> and adds them to the filesystem of the container at the path <dest>:
COPY [--chown=<user>:<group>] <src>... <dest>
More in the docker docs:
https://docs.docker.com/engine/reference/builder/#copy

I think the package needs to be copied into the image so that R can access it during the build (or, at run time, placed in a volume or bind mount that the container has access to).
As it stands, R running inside the container cannot find the file.
https://www.digitalocean.com/community/tutorials/how-to-share-data-between-the-docker-container-and-the-host
https://docs.docker.com/storage/bind-mounts/
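When the tarball is missing, install.packages() only reports "invalid package", which hides the real cause. A small guard can make the build log more explicit. This is a minimal sketch, assuming the tarball has already been copied into the image's working directory with COPY; the helper name install_local_source is hypothetical and not part of any package.

```r
# Hypothetical helper: fail loudly when the tarball is not inside the image.
install_local_source <- function(tarball, ...) {
  if (!file.exists(tarball)) {
    stop(sprintf("'%s' not found in '%s'; was it copied into the image with COPY?",
                 tarball, getwd()), call. = FALSE)
  }
  # repos = NULL tells R to install from the local file, not from a repository.
  utils::install.packages(tarball, repos = NULL, type = "source", ...)
}
```

In the Dockerfile this could be called from a RUN line as install_local_source("InfrastructureR_0.1.4.tar.gz"). If the file was never copied, the error message names the directory R was looking in.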

Related

Install packages during build of rocker/tidyverse docker image

I am building a docker image using rocker/tidyverse.
My Dockerfile:
FROM rocker/tidyverse:4.0.4
COPY train.R /train.R
COPY install.R /install.R
COPY entrypoint.sh /entrypoint.sh
# pre install the packages during build
RUN Rscript install.R
Here's the install.R script from above:
install.packages('pacman')
pacman::p_load(lubridate, Metrics, foreach)
I also tried a variation of this with just install.packages(<packagename>) for each of the packages in the install.R script.
When I attempt to build I get an error message:
docker-compose build rtrain
Step 5/5 : RUN Rscript install.R
---> Running in 3ebf7ab0c227
Installing package into '/usr/local/lib/R/site-library'
(as 'lib' is unspecified)
Warning: unable to access index for repository https://packagemanager.rstudio.com/cran/__linux__/focal/2021-03-30/src/contrib:
cannot open URL 'https://packagemanager.rstudio.com/cran/__linux__/focal/2021-03-30/src/contrib/PACKAGES'
Warning message:
package 'pacman' is not available for this version of R
If I remove RUN Rscript install.R from the build, run the image, and then exec into it, I am able to install the packages with just Rscript install.R. The failure only happens during the build.
On the page on the first url I linked to above there's a mention of script install2.r but I could not find any mention of this anywhere. The full blurb:
NOTES
Do not use apt-get install r-cran-* to install R packages on this stack. The requested R version and all R packages are installed from source in the version-stable stack. Installing R packages from apt (e.g. the r-cran-* packages) will install the version of R and versions of the packages that were built for the stable debian release (e.g. debian:stretch), giving you a second version of R and different packages. Please install R packages from source using the install.packages() R function (or the install2.r script), and use apt only to install necessary system libraries (e.g. libxml2). If you would prefer to install only the latest versions of packages from pre-built binaries using apt-get, consider using the r-base stack instead.
To underline: I did try modifying install.R to just be repeated rows of install.packages("lubridate"), install.packages("Metrics") and install.packages("foreach"), but the same error happens.
How can I install packages during build using this image?
Changing my version to 4.0.1 solved my problem:
FROM rocker/tidyverse:4.0.1
With everything else left the same, the build worked.
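A separate point about install.R: install.packages() only warns when a package cannot be fetched, so a build can appear to succeed without the packages. The sketch below shows one way to make the script fail visibly. It is an illustration under assumptions, not a fix for the repository error in the question: the helper names are hypothetical, and the default mirror https://cloud.r-project.org is my choice, not something taken from the rocker image.

```r
# Hypothetical helper: names of packages that cannot be loaded.
missing_pkgs <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Install what is missing, then stop() if anything is still missing so that
# `RUN Rscript install.R` exits with a non-zero status and the build fails.
install_or_fail <- function(pkgs, repos = "https://cloud.r-project.org") {
  todo <- missing_pkgs(pkgs)
  if (length(todo) > 0) utils::install.packages(todo, repos = repos)
  still <- missing_pkgs(pkgs)
  if (length(still) > 0) {
    stop("not installed: ", paste(still, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# In install.R one would then call, for example:
# install_or_fail(c("lubridate", "Metrics", "foreach"))
```

Passing repos explicitly also makes it easy to test whether a different repository is reachable from inside the build.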

In R (and when installing ROracle package), how do I set OCI_LIB64?

In R (and when installing ROracle package), how do I set OCI_LIB64? I've downloaded Oracle Instant Client and have pointed the wd and the OCI_LIB64 to that. I have Windows 10 Enterprise. It's RStudio version 1.1.463 and R version 3.4.3.
I tried the following:
setwd('C:\\Users\\sriley03\\Documents\\') # set to path of download (remember to escape slashes ie: c:\\users\\etc..)
set OCI_LIB64=C:\Users\sriley03\Documents\instantclient_19_3
install.packages('ROracle_1.3-1.tar.gz', repos = NULL)
but I get the following output and errors:
> setwd('C:\\Users\\sriley03\\Documents\\') # set to path of download (remember to escape slashes ie: c:\\users\\etc..)
The working directory was changed to C:/Users/sriley03/Documents/ inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
Error: unexpected symbol in "set OCI_LIB64"
> setwd('C:\\Users\\sriley03\\Documents\\instantclient_19_3') # set to path of download (remember to escape slashes ie: c:\\users\\etc..)
The working directory was changed to C:/Users/sriley03/Documents/instantclient_19_3 inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
> set OCI_LIB64=C:\Users\sriley03\Documents\instantclient_19_3
Error: unexpected symbol in "set OCI_LIB64"
> install.packages('ROracle_1.3-1.tar.gz', repos = NULL)
Warning: invalid package 'ROracle_1.3-1.tar.gz'
Error: ERROR: no packages specified
In R CMD INSTALL
Warning in install.packages :
running command '"C:/PROGRA~1/R/R-34~1.3/bin/x64/R" CMD INSTALL -l "C:\Program Files\R\R-3.4.3\library" "ROracle_1.3-1.tar.gz"' had status 1
Warning in install.packages :
installation of package ‘ROracle_1.3-1.tar.gz’ had non-zero exit status
What can I do so that I can set OCI_LIB64 properly so that I can install ROracle?
Thanks!
UPDATE (8_28_19):
I've set OCI_LIB64 and OCI_INC to the correct paths:
Sys.setenv(OCI_LIB64="C:\\Users\\sriley03\\Documents\\oreclient_install_dir\\instantclient_19_3")
Sys.setenv(OCI_INC="C:\\Users\\sriley03\\Documents\\oreclient_install_dir\\instantclient_19_3")
But now I get the following response (even though all the required headers are in that folder):
* installing *source* package 'ROracle' ...
** package 'ROracle' successfully unpacked and MD5 sums checked
Oracle Client Shared Library 64-bit - 19.3.0.0.0 Operating in Instant Client mode.
found Oracle Client C:\Users\sriley03\Documents\oreclient_install_dir\instantclient_19_3
found Oracle Client include C:\Users\sriley03\Documents\oreclient_install_dir\instantclient_19_3
ERROR: cannot find Oracle Client include headers in C:\Users\sriley03\Documents\oreclient_install_dir\instantclient_19_3.
Please set OCI_INC to correct location.
Warning: running command 'sh ./configure.win' had status 1
ERROR: configuration failed for package 'ROracle'
* removing 'C:/Program Files/R/R-3.4.3/library/ROracle'
In R CMD INSTALL
Warning in install.packages :
running command '"C:/PROGRA~1/R/R-34~1.3/bin/x64/R" CMD INSTALL -l "C:\Program Files\R\R-3.4.3\library" C:\Users\sriley03\AppData\Local\Temp\RtmpWUfabz/downloaded_packages/ROracle_1.3-1.tar.gz' had status 1
Try downloading the SDK package from Oracle and unzipping it into your Instant Client folder.
Set the environment variable OCI_INC to the 'include' folder located inside your Instant Client folder: ..."instantclient_xx_x\sdk\include"
Then reinstall the package.
If the installation still fails, try copying all files from the sdk\include folder to your R include folder, then try again.
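The environment setup described above can be sketched in R. This assumes the SDK package has been unzipped so that an sdk\include folder sits inside the Instant Client folder; the helper name set_oci_env is hypothetical.

```r
# Hypothetical helper: point ROracle's configure script at an Instant Client
# folder that has the SDK package unzipped inside it.
set_oci_env <- function(client_dir) {
  inc_dir <- file.path(client_dir, "sdk", "include")
  # oci.h is the main OCI header; if it is absent the SDK is probably missing.
  if (!file.exists(file.path(inc_dir, "oci.h"))) {
    warning("oci.h not found in ", inc_dir, "; is the SDK package unzipped?")
  }
  Sys.setenv(OCI_LIB64 = client_dir, OCI_INC = inc_dir)
  invisible(c(OCI_LIB64 = client_dir, OCI_INC = inc_dir))
}
```

After calling set_oci_env() with the Instant Client path, install.packages('ROracle_1.3-1.tar.gz', repos = NULL, type = 'source') can be run in the same R session so that the variables are visible to the configure script. Using Sys.setenv() rather than the shell's set command is what avoids the "unexpected symbol" error from the question.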

How to prevent packrat from installing packages into the system library inside Docker?

I have a Shiny project that uses packrat. When I create a Rocker-shiny Docker container, I put commands in the Dockerfile to install the packrat package and restore the library. However, I see that packrat installs packages into the system library (/usr/local/lib/R/...) instead of the private project library. If I open a bash console in the container and start an R session in the project directory, R reads the .Rprofile file, packrat is installed, and it starts installing packages into the private library. How can I get the same behavior from the Dockerfile?
In my Dockerfile:
RUN cd /srv/shiny-server && \
    R -e 'install.packages("packrat" , repos="http://cran.us.r-project.org"); packrat::restore()'
This installs packages into /usr/local/lib/R..., which is wrong.
However, if I open a bash shell in the container and start an R session in my project directory, it works fine:
docker exec -it test_app bash
cd /srv/shiny-server
R # start R session into project dir
Packrat is not installed in the local library -- attempting to bootstrap an installation...
> Installing packrat into project private library:
- "/srv/shiny-server/packrat/lib/x86_64-pc-linux-gnu/3.5.3"
* installing *source* package ‘packrat’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (packrat)
> Attaching packrat
> Restoring library
Installing plyr (1.8.1) ... OK (built source)
I encountered the same problem and fixed the issue by restoring the packrat packages in my docker-compose file:
docker-compose.yml:
...
command: [sh,-c, "sudo Rscript config/packrat_restore.R"]
...
packrat_restore.R:
packrat::init(
  infer.dependencies = FALSE,
  enter = TRUE,
  restart = FALSE)
packrat::restore()
As this workaround will always delay the startup of my containers in production, I will still try to fix the problem in the Dockerfile itself...

making change to function in R package and installing on Ubuntu

Short question: I want to edit the postgresqlWriteTable function in the RPostgreSQL package and install it on R running on an Ubuntu machine.
Long explanation:
The root of my problem is that I am trying to write to a postgres table with an auto-incrementing primary key column from R using dbWriteTable from RPostgreSQL package.
I read this post: How do I write data from R to PostgreSQL tables with an autoincrementing primary key? which suggested a fix to my problem by changing the function postgresqlWriteTable in the RPostgreSQL package. It works when I interactively use fixInNamespace in OSX environment and edit the function.
Unfortunately I have to run my script on an AWS instance running R on Ubuntu. I have RPostgreSQL installed at this location on my machine: /usr/local/lib/R/site-library/RPostgreSQL. I installed it by invoking R CMD INSTALL RPostgreSQL_0.4-1.tar.gz
Now I am trying to find the function postgresqlWriteTable. It is supposed to be in the file PostgreSQLSupport.R . I have searched the whole library - there is no such file.
I realized that on my local machine, in the OSX Finder, when I unzip the tar.gz package I can see the file PostgreSQLSupport.R where I am supposed to change the function.
So I changed the function. Then I removed the installed RPostgreSQL from my Ubuntu machine, copied the new folder (from my local machine) onto my Ubuntu machine, and tried to use devtools to install the package as suggested in the post here: Loading an R Package from a Custom directory
Here's what happened:
> library("devtools")
> install("/usr/local/lib/R/site-library/RPostgreSQL")
Error: Can't find '/usr/local/lib/R/site-library/RPostgreSQL'.
> install("RPostgreSQL", "/usr/local/lib/R/site-library/RPostgreSQL")
Installing RPostgreSQL
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet \
CMD INSTALL '/datasci/nikhil/RPostgreSQL' \
--library='/usr/local/lib/R/site-library' --install-tests
* installing *source* package ‘RPostgreSQL’ ...
file ‘R/PostgreSQLSupport.R’ has the wrong MD5 checksum
ERROR: 'configure' exists but is not executable -- see the 'R Installation and Administration Manual'
* removing ‘/usr/local/lib/R/site-library/RPostgreSQL’
Error: Command failed (1)
I am at my wit's end !
Copy the package .tar.gz file to the AWS machine.
Unpack this file so you have a directory structure.
Edit the function inside the file and save your changes.
You may also have to increase the version number in the DESCRIPTION file.
Use devtools::build to create a new .tar.gz file.
Install this updated version of the package.
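The version bump in the steps above can be sketched in base R. This is only an illustration: the helper name bump_version is hypothetical, it assumes the Version field ends in a number, and write.dcf() may re-wrap long fields in the DESCRIPTION file.

```r
# Hypothetical helper: increment the last number of the Version field in a
# package DESCRIPTION file, e.g. 0.4-1 becomes 0.4-2.
bump_version <- function(desc_path) {
  desc <- read.dcf(desc_path)
  old <- unname(desc[1, "Version"])
  # Extract the trailing integer, add one, and put it back in place.
  last <- as.integer(sub("^.*[^0-9]([0-9]+)$", "\\1", old))
  new <- sub("[0-9]+$", as.character(last + 1L), old)
  desc[1, "Version"] <- new
  write.dcf(desc, desc_path)
  invisible(new)
}
```

With this, the whole sequence would be roughly: utils::untar("RPostgreSQL_0.4-1.tar.gz") to unpack, edit R/PostgreSQLSupport.R, call bump_version("RPostgreSQL/DESCRIPTION"), run devtools::build("RPostgreSQL") to create the new tarball, and install that tarball with R CMD INSTALL.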

Error installing RMySQL

It took a good amount of time to install RMySQL on my Linux machine, but I was able to install it after changing environment variables and copying and pasting the lib.dll file.
However, I'm now trying to install RMySQL on my 64-bit Windows machine, and after two days there has been no progress. It breaks down with the warning "running command 'sh ./configure.win' had status 127", and I cannot find out what this means.
Can anyone shed some light on this?
install.packages('RMySQL',type='source')
Installing package into ‘C:/Users/chu/Documents/R/win-library/3.1’
(as ‘lib’ is unspecified)
trying URL 'http://cran.rstudio.com/src/contrib/RMySQL_0.9-3.tar.gz'
Content type 'application/x-gzip' length 165363 bytes (161 Kb)
opened URL
downloaded 161 Kb
* installing *source* package 'RMySQL' ...
** package 'RMySQL' successfully unpacked and MD5 sums checked
Warning: running command 'sh ./configure.win' had status 127
ERROR: configuration failed for package 'RMySQL'
* removing 'C:/Users/chu/Documents/R/win-library/3.1/RMySQL'
Warning in install.packages :
running command '"C:/PROGRA~1/R/R-31~1.0/bin/x64/R" CMD INSTALL -l "C:\Users\chu\Documents\R\win-library\3.1" C:\Users\chu\AppData\Local\Temp\RtmpKA9e7I/downloaded_packages/RMySQL_0.9-3.tar.gz' had status 1
Warning in install.packages :
installation of package ‘RMySQL’ had non-zero exit status
The downloaded source packages are in
‘C:\Users\chu\AppData\Local\Temp\RtmpKA9e7I\downloaded_packages’
For Linux users:
install libmysql first
sudo apt-get install libmysql++-dev
then try again.
I was facing the same error. Below is a link to a workaround that worked for me.
http://www.ahschulz.de/2013/07/23/installing-rmysql-under-windows/
In short, the library libmysql.dll, which is required for compilation, had to be moved from the lib folder to the bin folder of the MySQL home directory set in the environment variables.
By default, R uses the /tmp directory to install packages. On security conscious machines, the /tmp directory is often marked as “noexec” in the /etc/fstab file. This means that no file under /tmp can ever be executed. Packages that require compilation or that have self-inflating data will fail with the error mentioned.
The solution is to set the TMPDIR environment variable outside R (in your shell), which R will then use as the compilation directory. How to do this depends on the shell. In bash:
mkdir ~/tmp
export TMPDIR=~/tmp
Then R can compile and install the package.
I ran into the same problem while updating packages on Windows server for latest version of R. I solved it by installing from a .zip file vs .tar.gz.
I actually had to go through the process of first downloading the package, and then installing from it (not from mirror) for other reasons.
Here is what it looked like:
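pk <- 'caTools'
# Download the Windows binary (.zip) into a local folder first
download.packages(pk, "R-3.2-packages/", type = "win.binary")
# Then install from the downloaded file rather than from a mirror
install.packages(
  dir("R-3.2-packages/", pattern = pk, full.names = TRUE),
  repos = NULL,
  type = "win.binary")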
Hope this helps.
Solution if anyone faced the same problem on windows:
Make sure your MYSQL_HOME environment variable is set correctly and libmysql.dll is copied to the bin folder.
Run install.packages('RMySQL') then when the "Do you want to install from sources..." window pops up select No.
Then copy the downloaded binary packages location from console.
Go to Packages -> Install, paste the location into Package archive and click Install.
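On the meaning of the status code in the question: R's system() reports status 127 when a command could not be run at all, so "sh ./configure.win had status 127" typically points to sh not being found on the PATH. On Windows, sh is normally provided by Rtools. A quick check, as a minimal sketch with a hypothetical helper name:

```r
# Hypothetical helper: TRUE if an executable with this name is on the PATH.
tool_on_path <- function(tool) {
  # Sys.which() returns an empty string when the tool cannot be found.
  nzchar(unname(Sys.which(tool)))
}
```

If tool_on_path("sh") is FALSE on the Windows machine, source installs that need a configure step cannot work until Rtools is installed and on the PATH; installing the binary package, as described in the answers above, avoids the problem.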

Resources