The R session I am working in runs on a remote cluster because of memory constraints, and the data is stored remotely. I am therefore using Notepad++ to edit my files and pasting the code into my SSH session as I go along. What is the best way to integrate with the remote session so that I can take advantage of code completion and the other features available in editors like RStudio? Are there any best-practice suggestions for working over remote connections? I imagine this must be the case for most R users who work with large data sets.
From: http://www.sciviews.org/_rgui/projects/Editors.html
The famous Vim editor now also provides syntax highlighting for R. You can get a plugin to integrate R here. For Windows, there is another plugin using R-DCOM here. There is an alternate, multiplatform plugin here. Also look at the Vim web site, because there are other interesting resources there for R users (see here for an overview). There is also an R package to better integrate debugging facilities in Vim: adtdbg
I have created an R code script that:
Reads some data from a database
Makes some transformations and..
exports into a csv the modified table.
This code needs to run in a client's machine, but we need to "hide" the actual code from the user.
Are there any useful suggestions on how we can achieve that?
Up front
... it will be nearly impossible to deploy an R <something> to another computer in a way that prevents curious users from accessing the source code.
From a mailing list conversation in 2011, in response to "I would not like anyone to be able to read the code.",
R is an open source project, so providing ways for you to do this is not
one of our goals.
Duncan Murdoch https://stat.ethz.ch/pipermail/r-help/2011-July/282755.html
(Prof Murdoch was on the R Core Team and R Foundation for many years.)
Background
Several (many?) programming languages provide the ability to compile a script or program into an executable, the .exe you reference. For example, Python has tools like py2exe and PyInstaller. The tools range from merely compacting the script into a zip-ball, perhaps obfuscating it, to actually creating an exe with the script tightly embedded. (This part could use some more citations/research.)
This is usually good enough for many people, by keeping the honest out. I say it that way because all you need to do is google phrases like decompile py2exe and you'll find tools, howtos, tutorials, etc, whose intent might be honestly trying to help somebody recover lost code. Regardless of the intentions, they will only slow curious users.
Unfortunately, there are no tools that do this easily for R.
There are tools with the intent of making it easy for non-R-users to use R-based tools. For instance, RInno and DesktopDeployR are two tools with the intent of creating Windows (no mac/linux) installers that support R or R/shiny tools. But the intent of tools like this is to facilitate the IT tasks involved with getting a user/client to install and maintain R on their computer, not with protecting the code that it runs.
Constrain R.exe?
There have been questions (elsewhere?) that ask if they can modify the R interpreter itself so that it does not do everything it is intended to do. For instance, one could redefine base::print in such a way that functions' contents cannot be dumped, and debug doesn't show the code it's about to execute, and perhaps several other protective steps.
There are a few problems with this approach:
There is always another way to get at a function's contents. Even if you stop print.default and the debugger from doing this, there are other ways to get to the functions (body(.), for one). How many of these rabbit holes do you feel you can accurately traverse, catching them all, with no adverse effect on normal R code?
Even if you feel you can get to them all, are you encrypting the source .R files that contain your proprietary content? Okay, encrypting is good, except you need to decrypt the contents somehow. Many tools that have encrypted contents do so to thwart reverse-engineering, so they also embed (obfuscatedly, of course) the decryption key in the application itself. Just give it time, somebody will find and extract it.
You might think that you can download the key on start-up (not stored within the app), so that the code is decrypted in real-time. Sorry, network sniffers will get the key. Even if you retrieve it over https://, tools such as https://mitmproxy.org/ will render this step much less effective.
Let's say you have recompiled R to mask print and such, have a way to distribute source code encrypted, and are able to decrypt it in a way that does not easily reveal the key (for full decryption of the source code files). While it takes a dedicated user to wade through everything above to get to the source code, none of the above steps are required: they may legally compel you to release your changes to the R interpreter itself (that you put in place to prevent printing function contents). This doesn't reveal your source code, but it will reveal many of your methods, which might be sufficient. (Or just the risk of legal costs.)
R is GPL, and that means that anything that links to it is also "tainted" with the GPL. This means that anything compiled with Rcpp, for instance, will also be constrained/liberated (your choice) by the GPL. This includes thoughts of using RInside: it is also GPL (>= 2).
To do it without touching the GPL, you'd need to write your interpreter (relatively from scratch, likely) without code from the R project.
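The first problem in the list above can be illustrated with a minimal sketch. The function `secret` and the masking `print.function` method are hypothetical names made up for this example; the point is only that hiding the printed output does not hide the code, because base R offers other accessors such as `body()`, `formals()` and `deparse()`.

```r
# Hypothetical "proprietary" function (illustration only).
secret <- function(x) {
  x * 42 + 1
}

# A naive attempt to hide the source: mask the print method for functions.
print.function <- function(x, ...) cat("<source hidden>\n")

# The body is still an ordinary R object that can be inspected directly.
recovered <- deparse(body(secret))   # character vector holding the code
arg_names <- names(formals(secret))  # the argument names

cat(recovered, sep = "\n")
```

Every such accessor would have to be disabled to close the hole, and many of them are used internally by ordinary R code.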
Alternatives
Ultimately, if you want to release R-based utilities/apps/functionality to clients, the only sure-fire way to allow them to use your code without seeing it is to ... control the computers on which R will run (and source code will reside). I'll add more links supporting this claim as I find them, but a small start:
https://stat.ethz.ch/pipermail/r-help/2011-July/282717.html
https://www.researchgate.net/post/How_to_make_invisible_the_R_code
Options include anything that keeps the R code and R interpreter completely under your control. Simple examples:
Shiny apps, self-hosted (or on shinyapps.io if you trust their security); servers include Shiny Server (both free and commercial versions), RStudio Connect (commercial only), and ShinyProxy. (This list is not exhaustive.)
Rplumber is an API server, not a shiny server. The intent is for single HTTP(s) endpoint calls, possibly authenticated, supporting whatever HTTP supports (post, get, etc). This can be served in various ways, see its hosting page for options.
Rserve. I know less about this, but from what I've experienced with it, I've not had as much luck integrating with enterprise systems (where, e.g., authentication and fine-control over authorization is important). This does allow near-raw access to R, so it might not be what you want (especially when the intent is to give to clients who may not be strong R users themselves).
OpenCPU should be discussed, but not as a viable candidate for "protect your code". It is very similar to rplumber in that it provides HTTP endpoints, but it supports endpoints for every exported function in every package installed in its R library. This includes the base package, so it is not at all difficult to get the source code of any function that you could get on the R console. I believe this is a design feature, even if it is perfectly at odds with your intent to protect your code.
Anything that can call R or Rscript. This might be PHP or mod_python or similar. Any web-page serving language that can exec("/usr/bin/Rscript",...) can take its output and turn it around to the calling agent. (It might also be possible, for example, for a PHP front-end to call an opencpu endpoint that only permits connections from the PHP-serving host.)
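As a rough sketch of the last option, this is what "call Rscript and hand the output back" looks like. A PHP or Python front end would do the equivalent with its own exec facility; it is written in R here only to stay in one language. The expression being evaluated is a placeholder, and the path to Rscript is taken from the running R installation rather than hard-coded to /usr/bin/Rscript.

```r
# Locate the Rscript binary that belongs to the current R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Placeholder for the protected computation that lives on the server.
expr <- "cat(mean(c(2, 4, 9)))"

# Run it in a separate process and capture what it writes to stdout.
out <- system2(rscript, c("--vanilla", "-e", shQuote(expr)), stdout = TRUE)

# The caller only ever sees the result, never the code that produced it.
result <- as.numeric(out)
```

In a real deployment the expression would be replaced by a script file that stays on the server, and only `result` would be returned to the client.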
I have been trying to make the case to have R installed at my place of employment. However, the IT department has come back with a risk assessment that R has potential risks. After much debating (and brick-head-interaction), I suggested removing all internet connectivity from R.
I know (hope?) in principle that it can be done, since 'base' R is open source and can be edited. My questions are:
How do I disable all internet possibility from 'base' R by editing the source code?
Once 1. is done, will the lack of internet flow on to packages? That is, will all internet be cut off from any package, no matter what the package is trying to do?
(Sorry, I'm a stats/maths guy, not so much a 'deep' programming dude.)
Wow, what a very strict work condition! Yeah, data analysis can be sometimes very dangerous. :-)
Generally speaking, it's possible to use R without an internet connection; you just have to download the packages and install them from local files (.zip/.tar.gz).
But adjusting the R source code would be unnecessary effort. I think your IT department should be able to block internet access for the R applications (R, RGui and/or RStudio) in the firewall settings, which takes only a few minutes to set up. For example, on Windows they can use Windows Defender Firewall with Advanced Security to block outbound connections from the R executable files.
If they use another firewall or network rules, they should be able to set it up correctly and quickly as well.
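The offline installation mentioned in this answer could look roughly like the sketch below. The file path and package name are hypothetical; `repos = NULL` is what tells `install.packages()` to treat its first argument as a local file instead of contacting a repository.

```r
# Hypothetical path to a package archive that was downloaded elsewhere
# and copied to this machine.
pkg_file <- "local-packages/somepackage_1.0.0.tar.gz"

# Install only if the archive is actually present.
if (file.exists(pkg_file)) {
  # repos = NULL: install from the given file, no network access involved.
  install.packages(pkg_file, repos = NULL, type = "source")
}
```

Dependencies are not resolved automatically in this mode, so they have to be downloaded and installed the same way, in dependency order.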
I tried RInside's Qt example qdensity and really liked it. It was easy to setup and I was surprised how easy it was to understand and modify given that I have virtually no Qt experience. Now I wonder whether it is possible to use RInside with R somewhere on a remote machine.
It seems that I cannot use RInside for this purpose. I wonder whether there is another way of creating a Qt desktop app that communicates with R on some server. I got RStudio Server running and I am really happy with it, but it's for the R people. In order to promote my R work within our institute among non-R people as well, I would like to offer a simple, very limited GUI that can do basic things like showing a graph or starting an R CMD BATCH job. I also know Shiny (and Shiny Server) and have been actively testing it recently, but I am looking for a simple desktop client to connect to my server-side R.
Is there a basis to start out with Rserve and Qt?
Any suggestions (where to start, examples, generally a bad idea)?
What are R's capabilities for handling something like IPC or D-Bus?
Use Qt with C++, and just process the files that you create with R on your server.
So for example: create the graphic and save it in a format that you can load (BMP, PNG, etc.), then load it into your GUI.
I also suggest Qt Creator for GUI design. It's fast and simple. This idea only fits if you don't need to stay in the R environment.
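The R side of this file-based approach can be sketched as follows: a batch script renders a plot to a PNG file, and the desktop client only has to load that file. The output location and the simulated data are assumptions made for the example; a real script would write to a directory the GUI can reach.

```r
# Hypothetical output location; a shared folder in a real setup.
out_file <- file.path(tempdir(), "density.png")

set.seed(1)                # reproducible placeholder data
values <- rnorm(1000)

# Only attempt the export if this R build has a PNG device.
if (capabilities("png")) {
  png(out_file, width = 800, height = 600)
  plot(density(values), main = "Density estimate")
  dev.off()                # close the device so the file is written
}
```

A script like this can be started with R CMD BATCH or Rscript on the server, and the client refreshes the image once the file appears.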
When I have created programs that process data and produce things like probabilities and charts, I usually use HTML for the interface with PHP, and leave the rest of the processing (for example, R scripts) to the server.
For any recent visitor: take a look at OpenCPU. It publishes R functions as RESTful services and does all the marshalling of R data types to and from JSON.
I am looking for a web-based interpreter for the R language.
To be more precise, I am looking for an IDE like http://codepad.org/ where I can provide the code and the server executes it and returns the output.
I went through applications like RApache, but they don't fit my requirement, as they are not made to accept code from the client, execute it and return the result.
In short, I could find web applications that take input from the user, execute a specific R script and then present the output in a neatly formatted way, but not a web application that accepts R code, executes it and then presents the output neatly.
A few possibilities come to mind:
ideone provides a lot of different languages, of which, R is one of them. When you run a script, you are provided with a link that you can embed in a webpage (but which doesn't show the output, unfortunately). If you create an account, you can also store your previously run scripts.
Pro: You can easily insert /plain/ into your script and be able to get a URL that can be sourced directly in R. For example, if the URL for your script online is "http://ideone.com/PIkeD", then you can use source("http://ideone.com/plain/PIkeD") to load your script directly from the ideone servers.
Cons: Might not always be the most current version of R (presently at 3.2.2). Can't install other packages. Output doesn't show in the embed script provided.
Cloudstat console runs a more recent version of R (2.15.1) with quite a few commonly used packages. It used to have a really interesting blog/notebook interface that integrated code and the output, but that doesn't seem to be available at the moment.
Pro: Useful for running something fairly straightforward in a pinch.
Cons: Can't install other packages. Output is not formatted in code blocks, so is not easily readable. At the moment, can't save or share the code you've run.
Crunch offers a full RStudio setup, runs the most recent version of R, and allows you to install the packages you need. This may be more convenient than having to install your own RStudio server. You do have to request an account though.
Pros: Pretty much all you would expect from R/RStudio. Allows you to use Sweave and R markdown to automatically create documents too. These documents can be publicly hosted too. Here's an example where I've placed a page in a public folder called "gallery": http://crunch.kmi.open.ac.uk/people/~mrdwab/gallery/howzat.html
Cons: Sometimes the loading time is a bit slow, but as I am running RStudio desktop, I don't know how Crunch compares to running my own RStudio server.
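For the ideone option above, the /plain/ trick amounts to a small URL rewrite. This sketch only builds the URL from the example given in the answer; the `source()` call is left commented out because it needs network access and assumes the script is still hosted.

```r
# Share URL from the answer's example.
script_url <- "http://ideone.com/PIkeD"

# Insert "plain/" after the host to get the raw script text.
plain_url <- sub("ideone.com/", "ideone.com/plain/", script_url, fixed = TRUE)

# source(plain_url)  # would load the script directly; requires network access
```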
Updated January 10, 2014
Recently, there has also been a decent amount of buzz around R-Fiddle as an interesting way to share R code. It looks like it is what powers the awesome http://www.rdocumentation.org/ site.
RStudio IDE (Server) may be the answer to your question. Have a look at http://www.rstudio.com/ide/
You can try RCloud, which we are developing at AT&T Research Labs. It's an open source IDE like RStudio/IPython and has more advanced capabilities in terms of collaboration.
https://github.com/att/rcloud
RCloud is an environment for collaboratively creating and sharing data analysis scripts. RCloud lets you mix analysis code in R, HTML5, Markdown, Python, and others. Much like Sage, iPython notebooks and Mathematica, RCloud provides a notebook interface that lets you easily record a session and annotate it with text, equations, and supporting images.
I am looking for a solution that allows me to keep a track of a multitude of R scripts that I create for various projects and purposes. Some scripts are easily tracked to specific projects, whereas others are "convenience" functions created to serve a set of tasks.
Is there a way I can create a central DB and query it to find which scripts match most appropriately?
I could create a system using a DBMS manually, but are users aware of anything, either general or specific to R, that comes in the form of a software tool (maybe FOSS)?
EDIT: Thank you for the responses. My current system is just a set of scripts with comments that allow me to identify their intended task. Though I use StatET with SVN, I would like a search utility along the lines of the "sos" package.
The question
I am looking for a solution that allows me to keep a track of a multitude of R scripts
that I create for various projects and purposes. Some scripts are easily tracked to specific
projects, whereas others are "convenience" functions created to serve a set of tasks.
fails to address the obvious follow-up of why the existing mechanism is not suitable:
1. Create a local package for each project
2. Create one or more local packages for local utility functions
3. Use R's already existing mechanisms for searching, indexing, testing, cross-referencing
4. And use any revision control system of your liking, local or on the web, to host the code for 1. to 3. above.
Reinventing an RDBMS schema for 1. to 3. is just wrong in my book. But if you must, go ahead and replicate what you can already (mostly) get for free in tested and widely used code.
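As a minimal sketch of the "local package for utility functions" idea, base R's `package.skeleton()` can generate the package layout from existing functions. The package name `localutils`, the helper `rescale01` and the temporary directory are all hypothetical choices for this example.

```r
# A hypothetical convenience function kept in its own environment.
util_env <- new.env()
util_env$rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Directory that will hold the package sources (temporary for this sketch).
pkg_parent <- tempfile("pkgs")
dir.create(pkg_parent)

# Generate DESCRIPTION, NAMESPACE, R/ and man/ for the listed objects.
package.skeleton(name = "localutils", list = "rescale01",
                 environment = util_env, path = pkg_parent)

pkg_dir <- file.path(pkg_parent, "localutils")
list.files(pkg_dir)
```

Once the generated help stubs in man/ are filled in and the package is installed, the functions become searchable through R's normal help system, and the source directory can be put under any revision control system.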
R comes with several mechanisms for searching for help, most of which naturally use CRAN. Some examples: the sos package, cranberries, crantastic, and rseek. In many cases, these could be adapted to use a local repository (you can find out how to create a local repository in the R manual, which is very easy to do). Otherwise, if you package your scripts and submit them to CRAN, you will naturally have these available to you. I would also highly recommend this presentation on the subject: Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories from Spencer Graves and Sundar Dorai-Raj.
These would require you to put your code in packages and create documentation, both of which are worth doing anyway. The package documentation turns out to be very useful both for documenting what things do and for helping you find them in the future. You can use roxygen to create this documentation in-line with your code. Also read this related question: Organizing R Source Code.
Alternatively, the help.search() function can be very useful for searching local packages, regardless of whether you have a repository set up.
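A short sketch of such a search is shown below. The `stats` package stands in for one of your own installed local packages, and the search phrase is just an example; `help.search()` matches it against help page aliases, concepts and titles by default.

```r
# Search the documentation of one installed package for a phrase.
# "stats" is a stand-in for the name of your own local package.
hits <- help.search("linear model", package = "stats")

# The result carries a data frame of matching help pages.
n_hits <- nrow(hits$matches)
unique(hits$matches$Package)
```

Leaving out the `package` argument searches every installed package, which is how a forgotten helper function can be found when you no longer remember which package it lives in.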
You'd probably be best working with a version control system. Many can be indexed and be made search-able. At my work, a stack of R, Eclipse, StatET, Subversion and Subclipse works very well for us.