Restrictive AppArmor Profiles in R - r

I developed an R shiny web application, hosted on an Ubuntu 20.04 machine and deployed via ShinyProxy. I.e. each instance of the app runs in a separate Docker container. Various directories inside the containers are mapped to directories in the host system.
The app contains a console where users can enter R code which is evaluated in the backend. Allowing users to insert executable code into the backend is always risky. Yet, it is mandatory for this application. Docker containers provide some degree of isolation, but are not a full sandboxing solution.
Therefore, I would like to use AppArmor, called via RAppArmor, to further secure the application and prevent the user from reading, writing, or executing essentially any files on disk. That is more restrictive than what the pre-defined AppArmor profiles in the RAppArmor package implement. The problem is that R would not run, if I simply denied access to the entire file system. R's basic functionalities require access to various directories. However, I do not know what the most restrictive configuration would look like that still permits running R.
The setup should not allow R to read, write, or execute any files, except those needed for R to run and functions included in pre-defined list of packages. E.g. the user might be allowed to use functions from the gdistance package, but not the DBI package. And of course, the user may not install any packages.
Here is a much less restrictive example profile from RAppArmor:
profile r-base {
#include <abstractions/base>
#include <abstractions/nameservice>
#{PROC}/[0-9]*/attr/current r,
/bin/* rix,
/dev/tty r,
/etc/R/ r,
/etc/R/* r,
/etc/fonts/** mr,
/etc/resolv.conf r,
/etc/xml/* r,
/tmp/** rw,
/usr/bin/* rix,
/usr/lib/R/bin/* rix,
/usr/lib{,32,64}/** mr,
/usr/lib{,32,64}/R/bin/exec/R rix,
/usr/local/lib/R/** mr,
/usr/local/share/** mr,
/usr/share/** mr,
/usr/share/ca-certificates/** r,
}
This question fits multiple SE forums in that it requires a deeper understanding of the R programming language (Stack Overflow), the Linux (Ubuntu) operation system (Unix SE), and security issues (Security SE). Yet, because its focus lies on R and the RAppArmor package in particular, Stack Overflow is the best fit. This should be obvious from reading the question. I still include this paragraph because there are numerous trigger-happy reviewers and moderators who shoot down any question that is remotely related to another SE forum, without carefully considering their decision.

Related

How can I permanently set an environment variable using Autotools?

I'm adapting an existing program to use Autotools for its build, but the resulting process depends on an environment variable. Is there a way to permanently set this environment variable during the build or installation process?
The program is intended to be used by Unix users and I could try to concatenate an export command directly to the .bashrc file and warn the user in case it fails because most of them will actually just use Ubuntu to run it (it's a relatively simple program that targets students), but I'd like to know if there's a more portable way to do this.
That's what I wouldn't like to do:
export VAR=/my/totally/not/hardcoded/path >> $HOME/.bashrc
Sorry to come to this late, but all of the answers to date are shockingly ... incomplete.
Building and installing software are both core use cases for the Autotools, and the installation part can absolutely involve adding or modifying files that affect user environments. If the software is installed by a user with sufficient privilege, then such effects can absolutely be applied to all system users, though the details may vary a bit from system to system (and the Autotools can help with that, too!).
For example, on RedHat-family Linuxes such as RedHat Enterprise, Fedora, Oracle Linux, and various others, you can drop an appropriately named file in /etc/profile.d, and the commands in it will automatically be read and executed by every login shell. Setting environment variables for all users is one of the common uses of this feature. I'm uncertain about Debian-family Linuxes such as Ubuntu, but it is always possible to modify file /etc/profile instead to have the same effect, and you absolutely can write an Automake install hook to do that.
Or for an altogether different approach, you can always provide a wrapper script around your program that sets the needed environment variables (supposing that the point is other than to add a directory to the PATH so as to find the program in the first place). In that case, you can even install the main program in a location that is not ordinarily in the path, so that users don't accidentally run it directly. This mechanism has the advantage that the environment variables are scoped to a run of the program, not a whole login session, but the disadvantage that users cannot override them.
I guess, no.
Autotools are about building your program, not about environment setup for the program to run. That's what users/admins are supposed to do. (Well I can imagine doing this, but I really don't want to try to figure it out, because the idea itself seems broken to me)
If your program REALLY needs some environment variable during run-time, then you should patch your sources for your application to test if the variable exists, and set one to default desired value, if it doesn't. Another idea is to enforce usage of an obligatory command line switch to pass the value in.
It's not clear what this has to do with autotools (or any other build system). No build system, by itself, can arrange for an env var to be present when the program it builds is run at a later tiem.
One solution is for your program to have a hardcoded default value for the var which is used if the environment var isn't present when the program starts running. Another frequently used solution is to name your binary something like myprog.bin and install a shell script named myprog which sets up the environment before doing exec myprog.bin.
I'm adapting an existing program to use Autotools for its build, but the resulting process depends on an environment variable. Is there a way to permanently set this environment variable during the build or installation process?
You've not been very concrete about what the program is (e.g. is the program a daemon? A user program?) or the nature of the environment variable dependency (e.g. is it another program? A mount point? A URL? A DB connection string?). Being more specific might give a better answer for you.
Anyway, autotools is not likely to offer any feature to help: It's a build system. Depending on the nature of your environment variable dependency, you're likely going to need package management (if you package it) or system administration level setup.
Since you think your primary user base is on Ubuntu this help page might give you some ideas.

Correct configuration for collaborative development of an R package on a shared drive

We have two people on two computers. Computer A has a shared drive with Computer B. We would like to collaborate on an R package. My understanding is that we have three ingredients:
The R Project source code
The installed R package
The git repository
Currently we share all three over the shared drive. However, this doesn't allow us to work on separate Git branches (unless there's something I'm missing), since we're working on the same source files.
What would be the correct way of arranging these files to allow both users to work on separate Git branches? Would both have local versions of the R Project, and work on a common git repository? If so, would it be better to also have separate installations of the package, or a share a single one on the shared drive?
Thanks!
The capabilities you require are best supported within Git. Ideally you should structure your work to minimize the degree to which multiple people are working on a given file at the same time to reduce the effort to merge multiple versions of the same file back into the master branch. An alternate strategy would be to make frequent commits and pulls to reduce the complexity of merges. A thorough treatment of branching strategies in Git are available in Chapter 3 of Pro Git.

Deploy R function with directory as argument as executable or web application

I've written an R function (code available on demand) that improves some analysis workflows in my research group (~10 people), and now I would like to deploy it so it's easily accessible to the rest of the group. I am the only person in the group with any knowledge of R.
In a nutshell, the function does the following:
Takes one argument, the directory in which to search for microscopy images in proprietary formats
Asks user (with readline()) which channels should be analysed and what they are named
Generate several histograms and scatter plots of intensity levels per image channel, after various normalisation steps, these are deposited in a .pdf file for each image stack
Perform linear regression, generate a .txt file per image stack
The .pdf and .txt files get output to the directory the user specifies as the argument when running the function. I want to turn it into something somewhat more user-friendly, essentially removing the need to install R + function dependencies. For the sake of universality I would like to deploy it as a web application that takes a .zip file of the images as input, extracts them and then runs the function with that newly created directory as the argument. When it's done, it should output a .zip file of the created .pdfs and .txts. I'm not sure how feasible this is. I have looked into using Shiny for this but I'm having a hard time figuring out how to apply it as I do not have experience with it. I have experience in unix server administration and have a remote server that I can play around with.
The second option would be somewhat less universal, but it would be to deploy it as a Windows executable (I am the only person in my group not to use Windows as a daily OS, but I do have access to a Windows environment). Here, ideally the executable should ask the user to navigate to a directory, then use that directory as the argument to the function and output the generated files in said directory.
As my experience with R is limited, I cannot tell which option would be more feasible and worth it in the long run. My gut feeling says the web application would be the most flexible and user friendly, but more challenging to deploy. Are either of my ideas implementable, and if so, what would be a good way to do so?
Thanks in advance!

use julia language without internet connection (mirror?)

Problem:
I would like to make julia available for our developers on our corporate network, which has no internet access at all (no proxy), due to sensitive data.
As far as I understand julia is designed to use github.
For instance julia> Pkg.init() tries to access:
git://github.com/JuliaLang/METADATA.jl
Example:
I solved this problem for R by creating a local CRAN repository (rsync) and setting up a local webserver.
I also solved this problem for python the same way by creating a local PyPi repository (bandersnatch) + webserver.
Question:
Is there a way to create a local repository for metadata and packages for julia?
Thank you in advance.
Roman
Yes, one of the benefits from using the Julia package manager is that you should be able to fork METADATA and host it anywhere you'd like (and keep a branch where you can actually check new packages before allowing your clients to update). You might be one of the first people to actually set up such a system, so expect that you will need to submit some issues (or better yet; pull requests) in order to get everything working smoothly.
See the extra arguments to Pkg.init() where you specify the METADATA repo URL.
If you want a simpler solution to manage I would also think about having a two tier setup where you install packages on one system (connected to the internet), and then copy the resulting ~/.julia directory to the restricted system. If the packages you use have binary dependencies, you might run into problems if you don't have similar systems on both sides, or if some of the dependencies is installed globally, but Pkg.build("Pkgname") might be helpful.
This is how I solved it (for now), using second suggestion by
ivarne.I use a two tier setup, two networks one connected to internet (office network), one air gapped network (development network).
System information: openSuSE-13.1 (both networks), julia-0.3.5 (both networks)
Tier one (office network)
installed julia on an NFS share, /sharename/local/julia.
soft linked /sharename/local/bin/julia to /sharename/local/julia/bin/julia
appended /sharename/local/bin/ to $PATH using a script in /etc/profile.d/scriptname.sh
created /etc/gitconfig on all office network machines: [url "https://"] insteadOf = git:// (to solve proxy server problems with github)
now every user on the office network can simply run # julia
Pkg.add("PackageName") is then used to install various packages.
The two networks are connected periodically (with certain security measures ssh, firewall, routing) for automated data exchange for a short period of time.
Tier two (development network)
installed julia on NFS share equal to tier one.
When the networks are connected I use a shell script with rsync -avz --delete to synchronize the .julia directory of tier one to tier two for every user.
Conclusion (so far):
It seems to work reasonably well.
As ivarne suggested there are problems if a package is installed AND something more than just file copying is done (compiled?) on tier one, the package wont run on tier two. But this can be resolved with Pkg.build("Pkgname").
PackageCompiler.jl seems like the best tool for using modern Julia (v1.8) on secure systems. The following approach requires a build server with the same architecture as the deployment server, something your institution probably already uses for developing containers, etc.
Build a sysimage with PackageCompiler's create_sysimage()
Upload the build (sysimage and depot) along with the Julia binaries to the secure system
Alias a script to julia, similar to the following example:
#!/bin/bash
set -Eeu -o pipefail
unset JULIA_LOAD_PATH
export JULIA_PROJECT=/Path/To/Project
export JULIA_DEPOT_PATH=/Path/To/Depot
export JULIA_PKG_OFFLINE=true
/Path/To/julia -J/Path/To/sysimage.so "$#"
I've been able to run a research pipeline on my institution's secure system, for which there is a public version of the approach.

Package management running R on a shared server

some background: I'm a fairly beginning sysadmin maintaining the server for our department. The server houses several VM's, mostly Ubuntu SE 12.04, usually with a separate VM per project.
One of the tools we use is R and RStudio, also server edition. I've set this up so everyone can access it through their browser, but I'm still wondering what the best way would be to deal with package management. Ideally, I will have one folder/library with our "common" packages, which are common in many projects and use cases. I would admin this library, since I'm the only user in sudo. My colleagues should be able to add packages on a case-by-case basis in their "personal" R folders, that get checked as a backup in case a certain package is not available in our main folder.
My question has a few parts:
- Is this actually a viable way to set this up?
- How would I configure this?
- Is there a way to easily automate this library for use in other VM's?
I have a similar question pertaining to Python, but maybe I should make a new question for that..
R supports multiple libaries for packages by default. Libraries are basically just folders in which installed packages are placed.
You can use
.libPaths()
in R to view what paths are use as libraries on your system. On my Ubuntu 13.10 system, there are
a personal library at "~/R/x86_64-pc-linux-gnu-library/3.0" where packages installed by the user are placed,
"/usr/lib/R/library" where packages installed via apt-get are placed and
"/usr/lib/R/site-library" which is a system-wide library for packages that are shared by all users.
You can add additional libraries to R, but from how I understand your question, installing packages to /usr/lib/R/site-library might be what you are looking for. This can be archived relatively easily by running R as root and calling install.packages() and update.packages() from there as usual. However, running R as root is a security risk and not a good idea, so maybe it is better to create an separate user with write access to /usr/lib/R/site-library and to use that one instead of root.
If you mount /usr/lib/R/site-library on multiple VM, they should also share the packages installed there. Does this answer your question?
Having common library and personal library locations is completely feasible.
Each user should have two environment variables set. R_LIBS should point to the common library, and R_LIBS_USER should point to their personal location. See ?.Library for more information.
You can check a user's library paths using .libPaths(). You probably want users to install packages to their personal library, so some fiddling may be required to make sure that the personal library is the first element in of .libPaths().

Resources