Configuring a docker container to use host UID and generate files on the host system - Preferably at runtime

I am currently working on a research tool that is supposed to be containerized using docker so that it can hopefully be run on as many different systems as possible. This works fine for the most part, but we have run into a permission problem because of the workflow: the tool takes an input file (which we mount into the container), evaluates it using R scripts and is then supposed to generate a report on the input file exactly where the file was taken from on the host system.
The latter part is problematic because, at least in our university context, the internal container user lacks write permission in the (non-root) user home folders from which we currently take our testing data. This would obviously also be bad in a production context, since we don't know how a potential user's system is set up, which is why we are trying to dynamically and temporarily map the permissions of the container user to those of the host user.
I have found different solutions that involve passing the UID/GID to the docker daemon when building the container in some way or another:
docker build --build-arg USER_ID=$(id -u ${USER}) --build-arg GROUP_ID=$(id -g ${USER}) -t IMAGE .
I also changed the Dockerfile accordingly, following a tutorial that suggested replacing the internal www-data user:
[...Package installation steps that are supposed to be run as root...]
ARG USER_ID
ARG GROUP_ID
RUN if [ ${USER_ID:-0} -ne 0 ] && [ ${GROUP_ID:-0} -ne 0 ]; then \
        userdel -f www-data && \
        if getent group www-data ; then groupdel www-data; fi && \
        groupadd -g ${GROUP_ID} www-data && \
        useradd -l -u ${USER_ID} -g www-data www-data && \
        install -d -m 0755 -o www-data -g www-data /work/ && \
        chown --changes --silent --no-dereference --recursive \
            --from=33:33 ${USER_ID}:${GROUP_ID} \
            /work \
    ; fi
USER www-data
WORKDIR /work
RUN mkdir files
COPY data/ /opt/MTB/data/
COPY helpers/ /opt/MTB/helpers/
COPY src/www/ /opt/MTB/www/
COPY tmp/ /opt/MTB/tmp/
COPY example_data/ /opt/MTB/example_data/
COPY src/ /opt/MTB/src/
EXPOSE 8080
ENTRYPOINT ["/opt/MTB/src/starter_s_c.sh"]
The entrypoint script starter_s_c.sh is a small bash script that feeds the trailing argument to the corresponding R script as the input file - the R script then writes the report.
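For context, we currently start the container roughly like this (the image name and host path are just placeholders):
docker run --rm -v /home/someuser/testdata:/work IMAGE /work/input_file.csv
so the report is supposed to end up next to input_file.csv in /home/someuser/testdata on the host.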
This works, but requires the container to be built again for every new user. What we are looking for is a solution that handles the dynamic permission setting at runtime, so that we only have to build the container once and can use it with many different user configurations.
I have found this, but I am not entirely sure how to implement it, as it would replace our entrypoint script, and I don't know how to integrate that solution into our project.
Here is our current entrypoint script which already needs the permissions to be set so localmaster.r can generate the report in the host directory:
#!/bin/sh
file="$1"
cd "$(dirname "$0")/.."
if [ $# -eq 0 ]; then
  echo '.libPaths(c("~/lib/R/library", .libPaths())); library(shiny); library(shinyjs); runApp("src")' | R --vanilla
else
  echo "Rscript --vanilla /opt/MTB/src/localmaster.r $file"
  Rscript --vanilla /opt/MTB/src/localmaster.r "$file"
fi
(If no arguments are given, it starts a shiny app, just to avoid confusion)
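What I picture is a wrapper entrypoint roughly along these lines - just a sketch, assuming the container still starts as root and has a helper like gosu installed, with all the variable and user names below being placeholders:
#!/bin/sh
# hypothetical runtime wrapper around starter_s_c.sh
HOST_UID="${HOST_UID:-1000}"
HOST_GID="${HOST_GID:-1000}"
# create a group/user matching the host IDs if they don't exist yet
getent group "$HOST_GID" >/dev/null || groupadd -g "$HOST_GID" hostgroup
getent passwd "$HOST_UID" >/dev/null || useradd -M -u "$HOST_UID" -g "$HOST_GID" hostuser
# drop privileges and run the existing entrypoint with all arguments
exec gosu "$HOST_UID:$HOST_GID" /opt/MTB/src/starter_s_c.sh "$@"
started with something like docker run -e HOST_UID=$(id -u) -e HOST_GID=$(id -g) ... IMAGE, but as said, I don't know how to integrate that cleanly with what we already have.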
Any help or tips would be much appreciated! Thank you.

Related

Azure ARM - mount StorageAccount FileShare to a linux VM

I prepared an ARM template that creates the listed Azure resources: a Linux VM deployment, a Storage Account deployment, and a file share in this Storage Account.
The ARM template works fine, but I would like to add one thing: mounting the file share on the Linux VM (using the script from the file share blade, the one proposed by Microsoft).
I would like to use the Custom Script Extension and its "commandToExecute" option to paste in the inline Linux script (the one for file share mounting).
My question is: how do I retrieve the password to the file share and then pass it as a parameter to the inline script? Is that possible? Is it possible to paste the file share mounting script as an inline script in the ARM template, or is there another way to complete my task? I know that I can store the script in a storage account and put its blob SAS URL in the Custom Script Extension area of the ARM template, but the question of how to retrieve the password to the File Share remains. Below is the script for the file share mount.
sudo mkdir /mnt/wsustorageaccount
if [ ! -d "/etc/smbcredentials" ]; then
sudo mkdir /etc/smbcredentials
fi
if [ ! -f "/etc/smbcredentials/StorageAccountName.cred" ]; then
sudo bash -c 'echo "username=xxxxx" >> /etc/smbcredentials/StorageAccountName.cred'
sudo bash -c 'echo "password=xxxxxxx" >> /etc/smbcredentials/StorageAccountName.cred'
fi
sudo chmod 600 /etc/smbcredentials/StorageAccountName.cred
sudo bash -c 'echo "//StorageAccount.file.core.windows.net/test /mnt/StorageAccount cifs nofail,vers=3.0,credentials=/etc/smbcredentials/StorageAccountName.cred,dir_mode=0777,file_mode=0777,serverino" >> /etc/fstab'
sudo mount -t cifs //StorageAccountName.file.core.windows.net/test /mnt/StorageAccountName -o vers=3.0,credentials=/etc/smbcredentials/StorageAccountName.cred,dir_mode=0777,file_mode=0777,serverino
You can use this quickstart example:
listKeys(variables('storageAccountId'), '2019-04-01').keys[0].value

Mount EFS to wp-content on elastic beanstalk

So I'm having a problem setting up a Wordpress site on EB. I got the EFS to mount correctly on wp-content/uploads/wpfiles (https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/php-hawordpress-tutorial.html), however this only allows the pages to be stored and not the plugins. Is it possible to mount the entire wp-content folder onto EFS? I've tried and so far failed.
I'm not sure if this issue was resolved and it passed silently. I'm having the same issue as you, but with a different error. My knowledge is fairly limited, so take what I say with a grain of salt. According to what I saw in your log, the problem is that your instance can't see the server. I think it could be that your EB application is getting deployed in a different Availability Zone than your EFS. What I mean is that maybe you have mount targets for AZ a, b and d and your EB is getting deployed in AZ c. I hope this helps.
I tried a different approach (it basically does the same thing, but I'm manually linking each of the subfolders instead of the wp-content folder). For it to work I deleted the original folders inside /var/app/ondeck (which eventually gets copied to /var/app/current/, the folder that actually gets served). Of course, once this is done your Wordpress won't work, since it doesn't have any themes; the solution here is to quickly log in to the EC2 instance on which your ElasticBeanstalk app is running and manually copy the contents to the mounted EFS (in my case the /wpfiles folder). To connect to the EC2 instance (you can find the instance ID under your EB health configuration) you can follow this link, and to mount your EFS you can follow this link. Of course, if the config works you won't have to mount it, since it will already be mounted, though empty. Here is the content of my config file:
option_settings:
  aws:elasticbeanstalk:application:environment:
    EFS_NAME: '`{"Ref" : "FileSystem"}`'
    MOUNT_DIRECTORY: '/wpfiles'
    REGION: '`{"Ref": "AWS::Region"}`'
packages:
  yum:
    nfs-utils: []
    jq: []
files:
  "/tmp/mount-efs.sh":
    mode: "000755"
    content: |
      #!/usr/bin/env bash
      mkdir -p $MOUNT_DIRECTORY
      EFS_REGION=$(/opt/elasticbeanstalk/bin/get-config environment | jq -r '.REGION')
      EFS_NAME=$(/opt/elasticbeanstalk/bin/get-config environment | jq -r '.EFS_NAME')
      MOUNT_DIRECTORY=$(/opt/elasticbeanstalk/bin/get-config environment | jq -r '.MOUNT_DIRECTORY')
      mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 $EFS_NAME.efs.${EFS_REGION}.amazonaws.com:/ $MOUNT_DIRECTORY || true
      mkdir -p $MOUNT_DIRECTORY/uploads
      mkdir -p $MOUNT_DIRECTORY/plugins
      mkdir -p $MOUNT_DIRECTORY/themes
      chown webapp:webapp -R $MOUNT_DIRECTORY/uploads
      chown webapp:webapp -R $MOUNT_DIRECTORY/plugins
      chown webapp:webapp -R $MOUNT_DIRECTORY/themes
commands:
  01_mount:
    command: "/tmp/mount-efs.sh"
container_commands:
  01-rm-wp-content-uploads:
    command: rm -rf /var/app/ondeck/wp-content/uploads && rm -rf /var/app/ondeck/wp-content/plugins && rm -rf /var/app/ondeck/wp-content/themes
  02-symlink-uploads:
    command: ln -snf $MOUNT_DIRECTORY/uploads /var/app/ondeck/wp-content/uploads && ln -snf $MOUNT_DIRECTORY/plugins /var/app/ondeck/wp-content/plugins && ln -snf $MOUNT_DIRECTORY/themes /var/app/ondeck/wp-content/themes
I'm using another config file to create my EFS, as shown here. In case you have already created your EFS, you must change EFS_NAME: '`{"Ref" : "FileSystem"}`' to EFS_NAME: id_of_your_EFS.
I hope this helps user3738338.
You can do it by following this link - https://github.com/aws-samples/eb-php-wordpress/blob/master/.ebextensions/efs-mount.config
Just note that it uses uploads; you can change it to wp-content.

Change chmod of dir from volume

I'm trying to run a CakePHP 2 app inside of a container. I have everything set up and PHP works properly, but I have one problem: /var/www/app/tmp has incorrect write permissions. This directory is loaded from a volume.
Did you already take a look at the CakePHP 2.0 docs? Maybe this is useful:
One common issue is that the app/tmp directories and subdirectories must be writable both by the web server and the command line user. On a UNIX system, if your web server user is different from your command line user, you can run the following commands just once in your project to ensure that permissions will be setup properly:
HTTPDUSER=`ps aux | grep -E '[a]pache|[h]ttpd|[_]www|[w]ww-data|[n]ginx' | grep -v root | head -1 | cut -d\ -f1`
setfacl -R -m u:${HTTPDUSER}:rwx app/tmp
setfacl -R -d -m u:${HTTPDUSER}:rwx app/tmp
Source: https://book.cakephp.org/2.0/en/installation.html#permissions
This happens a lot if you're running PHP via a container passthrough. In this scenario, you are passing a directory through to the application with pre-defined permissions. What you'll need to do is periodically make sure the permissions are updated for the webserver user from inside the container. Let's say your container is called web:
docker exec web chown -R www-data /var/www/html
(/var/www/html being replaced with wherever your code resides)
For example, this will make it work perfectly fine in the container, but it may actually cause issues accessing the data from the host OS if you're using Linux. I had this issue several times with Laravel and PHP using a volume passthrough from the host, since the volume's files themselves end up owned by a user ID the host OS doesn't have.
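One way to avoid that mismatch altogether is to run the container as the host user in the first place - a minimal sketch, assuming the image works when started as an arbitrary UID (the image name and paths are placeholders):
docker run --rm --user "$(id -u):$(id -g)" -v "$PWD/app:/var/www/html" some-php-image
That way, anything the container writes into the volume is owned by your host user, so nothing has to be chowned back afterwards.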

chmod with wildcard inside symlink

I'm setting up Tomcat on Centos according to https://www.digitalocean.com/community/tutorials/how-to-install-apache-tomcat-8-on-centos-7 , but with a twist: I put Tomcat in /opt/apache-tomcat-8.5.6 and then set up a symbolic link:
sudo ln -s /opt/apache-tomcat-8.5.6 /opt/tomcat
Now I change the group ownership of /opt/tomcat to tomcat:
sudo chgrp -R tomcat /opt/tomcat/conf
Then I give the tomcat group write access to the configuration directory:
sudo chmod g+rwx /opt/tomcat/conf
But here is the problem: I try to give the tomcat group read access to all the configuration files:
sudo chmod g+r /opt/tomcat/conf/*
That gives me an error: chmod: cannot access ‘/opt/tomcat/conf/*’: No such file or directory
What? Does chmod not accept wildcards? Or does it not look inside symbolic links? What's going on?
Note that I got around it by doing this:
sudo chmod g+r -R /opt/tomcat/conf
Does that give me effectively the same thing? (I know that it additionally makes the directory readable by the group, but that seems inconsequential --- the group could already read the directory.) Why doesn't the wildcard version work?
Globs are expanded by the current shell. This happens before sudo and chmod are ever invoked.
If the current shell doesn't have access to list the files, the glob will be treated as unmatched and just left alone. This makes chmod try to access a file literally named *, which fails.
root# echo /root/.*
/root/.bash_history /root/.bashrc ...
user$ sudo echo /root/.*
/root/.*
The same is true for command substitution, process substitution and other expansions, which are similarly unaffected by sudo:
root# echo $(whoami)
root
user$ sudo echo $(whoami)
user
The shell is also responsible for pipes and redirects, which are also set up before sudo ever runs:
root# echo 60 > /proc/sys/vm/swappiness
(command exits successfully)
user$ sudo echo 60 > /proc/sys/vm/swappiness
bash: /proc/sys/vm/swappiness: Permission denied
In Unix terms, sudo is a wrapper for execve(2), and therefore can't help with anything that you can't do through an execve call. If you need shell functionality as the target user, you need to manually invoke that shell:
user$ sudo sh -c 'chmod g+r /opt/tomcat/conf/*'

Run multiple instances of RStudio in a web browser

I have RStudio Server installed on a remote AWS server (Ubuntu) and want to run several projects at the same time (one of which takes a long time to finish). On Windows there is a simple GUI solution like 'Open Project in New Window'. Is there something similar for RStudio Server?
Simple question, but I failed to find a solution, except for this related question for Macs, which offers:
Run multiple rstudio sessions using projects
but how?
While running batch scripts is certainly a good option, it's not the only solution. Sometimes you may still want interactive use in different sessions rather than having to do everything as batch scripts.
Nothing stops you from running multiple instances of RStudio Server on your Ubuntu server on different ports (I find this particularly easy to do by launching RStudio through docker, as outlined here). Because an instance will keep running even when you close the browser window, you can easily launch several instances and switch between them. You'll just have to log in again when you switch.
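For instance, with the rocker/rstudio image (the image name, ports and password here are only illustrative), two independent servers can be started like this:
docker run -d --name rstudio1 -p 8787:8787 -e PASSWORD=yourpassword rocker/rstudio
docker run -d --name rstudio2 -p 8788:8787 -e PASSWORD=yourpassword rocker/rstudio
Each container is a completely separate R process, so a long-running job in one does not block the other; you reach them at localhost:8787 and localhost:8788.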
Unfortunately, RStudio Server still prevents you from having multiple instances open in the browser at the same time (see the help forum). This is not a big issue, as you just have to log in again, but you can work around it by using different browsers.
EDIT: Multiple instances are fine, as long as they are not on the same browser, same browser-user AND on the same IP address. e.g. a session on 127.0.0.1 and another on 0.0.0.0 would be fine. More importantly, the instances keep on running even if they are not 'open', so this really isn't a problem. The only thing to note about this is you would have to log back in to access the instance.
As for projects, you'll see you can switch between projects using the 'projects' button on the top right, but while this will preserve your other sessions, I do not think it actually supports simultaneous code execution. You need multiple instances of the R environment running to actually do that.
UPDATE 2020: Okay, it's now 2020 and there are lots of ways to do this.
For running scripts or functions in a new R environment, check out:
the callr package
The RStudio jobs panel
Run new R sessions or scripts from one or more terminal sessions in the RStudio terminal panel
Log out and log in to the RStudio Server as a different user (requires multiple users to be set up in the container; obviously not a good workflow for a single user, but just noting that many different users can access the same RStudio Server instance no problem).
Of course, spinning up multiple docker sessions on different ports is still a good option as well. Note that many of the ways listed above still do not allow you to restart the main R session, which prevents you from reloading installed packages, switching between projects, etc, which is clearly not ideal. I think it would be fantastic if switching between projects in an RStudio (server) session would allow jobs in the previously active project to keep running in the background, but have no idea if that's in the cards for the open source version.
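For the terminal-panel option above, a sketch of what that can look like (the script name is a placeholder): running
nohup Rscript long_job.R > long_job.log 2>&1 &
in the RStudio terminal starts the script in a separate R process in the background, so the main session stays responsive.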
Often you don't need several instances of RStudio - in this case just save your code in an .R file and launch it from the Ubuntu command prompt (maybe using screen):
Rscript script.R
That will launch a separate R session which will do the work without freezing your Rstudio. You can pass arguments too, for example
# script.R -
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 0) {
  start = '2015-08-01'
} else {
  start = args[1]
}
Console:
Rscript script.R 2015-11-01
I think you need R Studio Server Pro to be able to log in with multiple users/sessions.
You can see the comparison table below for reference.
https://www.rstudio.com/products/rstudio-server-pro/
Installing another instance of rstudio server is less than ideal.
Linux server admins, fear not. You just need root access or a kind admin.
Create a group to use: groupadd Rwarrior
Create an additional user with the same home directory as your primary RStudio login:
useradd -d /home/user1 user2
Add primary and new user into Rwarrior group:
gpasswd -a user2 Rwarrior
gpasswd -a user1 Rwarrior
Take care of the permissions for your primary home directory:
cd /home
chown -R user1:Rwarrior /home/user1
chmod -R 770 /home/user1
chmod g+s /home/user1
Set password for the new user:
passwd user2
Open a new browser window in incognito/private browsing mode and login to Rstudio with the new user you created. Enjoy.
I run multiple RStudio servers by isolating them in Singularity instances. Download the Singularity image with the command singularity pull shub://nickjer/singularity-rstudio
I use two scripts:
run-rserver.sh:
Find a free port
#!/bin/env bash
set -ue

thisdir="$(dirname "${BASH_SOURCE[0]}")"

# Return 0 if the port $1 is free, else return 1
is_port_free(){
    port="$1"
    set +e
    netstat -an |
        grep --color=none "^tcp.*LISTEN\s*$" | \
        awk '{gsub("^.*:","",$4);print $4}' | \
        grep -q "^$port\$"
    r="$?"
    set -e
    if [ "$r" = 0 ]; then return 1; else return 0; fi
}

# Find a free port
find_free_port(){
    local lower_port="$1"
    local upper_port="$2"
    for ((port=lower_port; port <= upper_port; port++)); do
        if is_port_free "$port"; then r=free; else r=used; fi
        if [ "$r" = "used" -a "$port" = "$upper_port" ]; then
            echo "Ports $lower_port to $upper_port are all in use" >&2
            exit 1
        fi
        if [ "$r" = "free" ]; then break; fi
    done
    echo $port
}

port=$(find_free_port 8080 8200)
echo "Access RStudio Server on http://localhost:$port" >&2

"$thisdir/cexec" \
    rserver \
    --www-address 127.0.0.1 \
    --www-port $port
cexec:
Create a dedicated config directory for each instance
Create a dedicated temporary directory for each instance
Use the singularity instance mechanism to prevent forked R sessions from being adopted by PID 1 and staying around after the rserver has shut down. Instead, they become children of the Singularity instance and are killed when that shuts down.
Map the current directory to the directory /data inside the container and set that as the home folder (this step might not be necessary if you don't care about reproducible paths on every machine)
#!/usr/bin/env bash
# Execute a command in the container
set -ue

if [ "${1-}" = "--help" ]; then
    # Print the usage text verbatim (quoted heredoc, no expansion)
    cat <<'EOF'
Usage: cexec command [args...]
Execute `command` in the container. This script starts the Singularity
container and executes the given command therein. The project root is mapped
to the folder `/data` inside the container. Moreover, a temporary directory
is provided at `/tmp` that is removed after the end of the script.
EOF
    exit 0
fi

thisdir="$(dirname "${BASH_SOURCE[0]}")"
container="rserver_200403.sif"

# Create a temporary directory
tmpdir="$(mktemp -d -t cexec-XXXXXXXX)"
# We delete this directory afterwards, so it's important that $tmpdir
# really has the path to an empty, temporary dir, and nothing else!
# (for example empty string or home dir)
if [[ ! "$tmpdir" || ! -d "$tmpdir" ]]; then
    echo "Error: Could not create temp dir $tmpdir"
    exit 1
fi

# Check if the temp dir is empty (this might be superfluous, see
# https://codereview.stackexchange.com/questions/238439)
tmpcontent="$(ls -A "$tmpdir")"
if [ ! -z "$tmpcontent" ]; then
    echo "Error: Temp dir '$tmpdir' is not empty"
    exit 1
fi

# Start the Singularity instance
instancename="$(basename "$tmpdir")"
# Maybe also superfluous (like above)
rundir="$(readlink -f "$thisdir/.run/$instancename")"
if [ -e "$rundir" ]; then
    echo "Error: Runtime directory '$rundir' exists already!" >&2
    exit 1
fi
mkdir -p "$rundir"

singularity instance start \
    --contain \
    -W "$tmpdir" \
    -H "$thisdir:/data" \
    -B "$rundir:/data/.rstudio" \
    -B "$thisdir/.rstudio/monitored/user-settings:/data/.rstudio/monitored/user-settings" \
    "$container" \
    "$instancename"

# Stop the instance and delete the temporary directory after the end of the script
trap "singularity instance stop '$instancename'; rm -rf '$tmpdir'; rm -rf '$rundir'" EXIT

# Run the requested command in the instance, passing all arguments through
singularity exec \
    --pwd "/data" \
    "instance://$instancename" \
    "$@"
