Block potentially malicious R calls

Let's suppose that you have R running with root/admin privileges. Which R calls do you consider harmful, apart from system() and file.*()?
This is a platform-specific question; I'm running Linux, so I'm interested in Linux-specific security holes. I will understand if you block discussion of this, since the post could easily devolve into "How do I mess up the system with R?"

Do not run R with root privileges. There is no effective way to secure R in this way, since the language includes eval and reflection, which means I can construct invocations of system() even if you try to block them.
Far better is to run R in a way that cannot affect the system or user data, no matter what it tries to do.
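For example, a minimal sketch of why blacklisting fails (the masking of system() here is the hypothetical "block"):

system <- function(...) stop("blocked")         # naive attempt at blocking system()
f <- get("system", envir = baseenv())           # reflection recovers the original
eval(call("f", "echo still here"))              # eval runs it anyway
base::system("echo or just use the namespace")  # the namespace bypasses the mask, too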

Anything that calls external code could also be making system changes, so you would also need to block certain packages and interfaces like .Call(), .C(), .jcall(), etc.
Suffice it to say that it ends up being a virtually impossible task, and you are better off running R in a virtualized environment if you need root access.

You can't. You should just change the question: "How do I run user-supplied R code so as not to harm the user or other users of the system?" That's actually a very interesting question, and one that can be solved with a little bit of cloud computing, AppArmor, chroot magic, etc.

There are tons of commands you could use to harm the system. A handful of examples: Sys.chmod, Sys.umask, unlink, any command that lets you read from or write to a connection (and there are many), .Internal, .External, etc.
And if you blocked users from those commands, nothing stops them from implementing the same thing in a package that you wouldn't know to block.
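As a hedged illustration (the target path is hypothetical), even the plain connection API is enough to clobber arbitrary files when running as root:

con <- file("/etc/example.conf", open = "w")  # hypothetical path; root can write anywhere
writeLines("overwritten", con)                # an 'innocent' I/O call does the damage
close(con)
Sys.chmod("/etc/example.conf", mode = "0777") # and permissions are one call away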

As noted by just about every response to this thread, removing the "potentially harmful" calls in the R language would:
Be potentially impossible to do completely.
Be difficult to do without spending significant time writing complicated (i.e. ugly) hacks.
Kneecap the language by removing a ton of functionality that makes R so flexible.
A safer solution that doesn't require modifying/rewriting large parts of the R language would be to run R inside a jail using something like BSD Jails, Jailkit or Solaris Zones.
Many of these solutions allow the jailed process to exercise root-like privileges but restrict the areas of the computer that the process can operate on.
A disposable virtual machine is another option. If a privileged user trashes the virtual environment, just delete it and boot a fresh copy.

One of my all time favorites. You don't even have to be r00t.
library(multicore)  # now superseded by parallel::mcparallel()
forkbomb <- function() {
  repeat {
    parallel(forkbomb())  # fork a child, which immediately forks again...
  }
}
forkbomb()

To adapt a cliche from gun rights people, "system() isn't harmful - people who call system() are harmful".
No function call is intrinsically harmful, but if you allow people to use them freely, those people may cause harm.
Also, what counts as harm depends on your threat model.

In general, R is so complex that you can assume there is a way to trick it into executing data via seemingly harmless functions, for instance through a buffer overflow.

Related

How to avoid the forge model derivative queue

I want to use the Forge Viewer as a preview tool in my web app for generated data.
The problem I have is that the Model Derivative API is sometimes slow and sometimes fast.
I read that this happens because the files are placed in a queue and processed sequentially.
In my opinion, this could be solved by:
Having the extraction.update webhook also tell me where I am in the queue, so I can give my users better progress information, or stop the process when the queue is too long.
Being able to have a private queue. I have no problem paying more credits if necessary.
Being able to generate svf2 files on my own server.
But I don't know whether any of these options are possible, or if there is another workaround.
Yes, that could be useful. I logged that request in our system: DERI-7940
Might be considered later on, but no plans currently
I'm not aware of any plans for that
We're always working on making the translation service better, but unfortunately, I cannot tell when it will meet your requirements - including the implementation of the webhook feature you mentioned.
SVF2 is specifically for very large models - is that what you are working with? If not, then I'm quite certain that translating to SVF would be faster.

Is there any way to restrict an env variable to readonly

I am using the following in OpenSSH/telnet code, which sets the user environment.
setenv("TEST_ENV", "testing", 1);
But this can be modified by the user after login. Is there any way to make it a read-only env variable?
No, I'm not aware of any way of making a process's environment read-only.
You are aware, I trust, that a process can't change its parent's environment, and that a process has complete freedom to set the initial environment of any processes it in turn creates. It might be worth being a little more detailed about what you want to do, or what you want to stop a program being able to do.
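The semantics here are OS-level rather than language-level, so a quick illustration in R (the language used elsewhere in this thread) applies equally well:

Sys.setenv(TEST_ENV = "testing")            # set in this (parent) process
system("TEST_ENV=changed; echo $TEST_ENV")  # the child shell alters only its own copy
Sys.getenv("TEST_ENV")                      # the parent still sees "testing"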
Some OSs have fairly elaborate sandboxing support in the kernel (I know OS X has, for example, but it won't be the only one), and these might be able to control access to getenv. But that's obviously platform-specific.

How do Unix capabilities work?

It seems that starting with kernel 2.2, Linux introduced the concept of capabilities. According to the man page on capabilities, a non-root user can grant themselves capabilities by calling cap_set_proc on a per-thread basis. So does this mean that if you're writing malware for Unix, you can just grant yourself a bunch of capabilities and compromise the system? If not, how does one grant the capabilities required to run a program?
It seems that Unix's security model is quite flawed and primitive. Am I getting this right?
I'll go more specific:
How do you (when running as a non-root user) send a signal to another process that is running under a different user? The signal man page says you need the CAP_KILL capability to do this. However, after reading the capabilities man page, I'm not sure how to grant a process that capability.
From man cap_set_proc:
Please note, by default, the only processes that have CAP_SETPCAP available to them are processes started as a kernel-thread. (Typically this includes init(8), kflushd and kswapd). You will need to recompile the kernel to modify this default.
Trust me, if it were that easy, someone would have exploited it by now. Unix's security model may be simple compared to other operating systems, but that doesn't mean it's "flawed".
It's impossible. Use a socket or a file instead.

Can the R console support background tasks or interrupts (event-handling)?

While working in an R console, I'd like to set up a background task that monitors a particular connection and, when an event occurs, executes another function (an alert). Alternatively, I could set things up so that an external function simply sends an alert to R, but this seems to be the same problem: it is necessary to set up a listener.
I can do this in a dedicated R process, but I don't know whether it is feasible from within a console. Also, I don't want to interrupt R while it is calculating a function, only to alert or interrupt when the console is merely waiting on input.
Here are three use cases:
The simplest possible example is watching a file. Suppose that I have a file called "latestData.csv" and I want to monitor it for changes; when it changes, myAlert() is executed. (One can extend it to do different things, but just popping up with a note that a file has changed is useful.)
A different kind of monitor would watch for whether a given machine is running low on RAM and might execute a save.image() and terminate. Again, this could be a simple issue of watching a file produced by an external monitor that saves the output of top or some other command.
A different example is like another recent SO question about having R halt the EC2 machine it's running on. If an alert from another machine or process tells the program to save and terminate, then being able to listen for that alert would be great.
At the moment, I suspect there are two ways of handling this: via Rserve and possibly via fork. If anyone has examples of how to do this with either package, or via another method, that would be great. I think that solving any of these three use cases would solve all of them, modulo a little bit of external code.
Note 1: I realize, per this answer to another SO question, that R is single-threaded, which is why I suspect fork and Rserve may work. However, I'm not sure about feasibility when interfacing with an R terminal. Although R's REPL is attached to the input from the console, I am trying to either get around this or mimic it, which is where fork or Rserve may be the answer.
Note 2: For those familiar with event handling / eventing methods, that would solve everything, too. I've just not found anything about this in R.
Update 1: I've found that the Writing R Extensions manual has a section on event handling, which mentions the use of R_PolledEvents. This looks promising.
One more option is the svSocket package. It is non-blocking.
Here is an 8-minute video using it, which has over 3,000 views. It shows how to turn an R session into a server, and how to send commands to it and receive data back. It demonstrates doing that even while the server is busy: say you start a long-running process and forget to save intermediate results; you can connect to the server and fetch the results from it before it has finished.
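A minimal sketch of that workflow (the port and evaluated expression are made up; startSocketServer and evalServer come from svSocket):

# in the session you want to reach:
library(svSocket)
startSocketServer(port = 8888)  # non-blocking; this console stays usable

# from another R session:
library(svSocket)
con <- socketConnection(host = "localhost", port = 8888)
evalServer(con, ls())           # evaluate code inside the server session
close(con)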
It depends on whether you want to interrupt an idling or a working R. If the first, you can bypass R's default REPL loop with an event listener that queues incoming events and evaluates them. The common option is to use the tcl/tk or gtk loop; I have made something like this around libev in my triggr package, which makes R digest requests coming from a socket.
The latter case is mostly hopeless, unless you manually make the computational code periodically execute something like if (eventOccurred) processIt().
Multithreading is not a real option, because as you know two interpreters in one process would break each other by using the same global variables, while forked processes have independent memory contents.
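For the idling case, here is a hedged sketch of the tcl/tk-loop idea applied to the file-watching use case above (myAlert and the file name are placeholders):

library(tcltk)
watchFile <- function(path, last = file.info(path)$mtime) {
  now <- file.info(path)$mtime
  if (!is.na(now) && (is.na(last) || now > last)) myAlert()
  # re-arm in 1000 ms via the Tcl event loop, which runs while the console idles
  tcl("after", 1000, function() watchFile(path, now))
}
watchFile("latestData.csv")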
It turns out that the Rdsm package supports this as well.
With this package, one can set up a server/client relationship between different instances of R, each a basic R terminal, and the server can send messages, including functions, to the clients.
Transformed to the use case I described, the server process can do whatever monitoring is necessary and then send messages to the clients. The documentation is a little terse, unfortunately, but the functionality seems to be straightforward.
If the server process is, say, monitoring a connection (a file, a pipe, a URL, etc.) on a regular basis and a trigger is encountered, it can then send a message to the clients.
Although the primary purpose of the package is shared memory (which is how I came across it), this messaging works pretty well for other purposes, too.
Update 1: Of course, for message passing one can't ignore MPI and the Rmpi package. That may do the trick, but the Rdsm package launches and works with R consoles, which is the kind of interface I'd sought. I'm not yet sure what Rmpi supports.
A few ideas:
Run R from within another language's script (this is possible, for example, in Perl using RSPerl) and use the wrapping script to launch the listener.
Another option may be to run an external (non-R) command (using system()) from within R that will launch a listener in the background.
Run R in batch mode in the background, either before launching your interactive R session or in a separate window.
For example:
R --no-save < listener.R > output.out &
The listener can send an appropriate email when the event occurs.
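A hypothetical listener.R along those lines (the watched file, polling interval, and mail command are all assumptions, and the box needs a configured MTA):

# listener.R: poll a file and mail an alert when it changes
path <- "latestData.csv"
last <- file.info(path)$mtime
repeat {
  Sys.sleep(5)
  now <- file.info(path)$mtime
  if (!is.na(now) && (is.na(last) || now > last)) {
    system("echo 'latestData.csv changed' | mail -s 'R alert' you@example.com")
    last <- now
  }
}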

How do I email myself data from an R script?

I'm hoping to take advantage of Amazon spot instances, which come at a lower cost but can terminate at any time. I want to set things up so that I can send myself data midway through a script, so I can pick up from there in the future.
How would I email myself a .rdata file?
Difficulty: the ideal solution will not involve RCurl, since I am unable to install that package on my machine instance.
The same way you would on the command line -- I like the mpack binary for that, which you can find in Debian and Ubuntu.
So save the data to a file /tmp/foo.RData (or generate a temporary name) and then
system("mpack -s Data /tmp/foo.RData you@some.where.com")
in R. That assumes the EC2 instance has mail set up, of course.
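Putting that together, a minimal checkpoint sketch (the subject, path, and address are placeholders; assumes mpack and a working MTA):

save.image(file = "/tmp/checkpoint.RData")  # snapshot the whole workspace
system("mpack -s 'R checkpoint' /tmp/checkpoint.RData you@some.where.com")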
Edit: Per request for a windoze alternative, blat has been recommended by others for this task.
There is a good article on this in R News from 2007. Amongst other things, the author describes some tactics for catching errors as they occur, and automatically sending email alerts when this happens -- helpful for long simulations.
Off topic: the article also gives tips about how the linux/unix tools screen and make can be very useful for remote monitoring and automatic error reporting. These may also be relevant in cases when you are willing to let R email you.
What you're asking is probably best solved not by email but by using an EBS volume. The volume will persist regardless of the instance (note though that I'm referring to an EBS volume as opposed to an EBS-backed instance).
In another question, I mention a bunch of options for checkpointing and related tools, if you would like to use a separate function for storing your data during the processing.
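For instance, a hypothetical checkpoint helper writing to an EBS mount point (the mount path and the work loop are stand-ins):

checkpoint <- function(path = "/data/checkpoint.RData") save.image(file = path)  # /data = EBS mount
results <- numeric(1000)
for (i in 1:1000) {
  results[i] <- sqrt(i)            # stand-in for the real computation
  if (i %% 100 == 0) checkpoint()  # persist progress periodically
}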
