Is anyone aware of an R-based data analysis setup that works well in a research data centre with no internet access? I would like to use good reproducible analysis practices, but I do not have the permission upload files to a repository, for example. Also, RStudio preferences (such as path to a local package repository) are not saved. So far I know:
The miniCRAN package helps with gathering all R package dependencies.
local git version control does not require internet access. (Plus, the data centre technician may be willing to use it to release results and R scripts if the learning curve is outweighed by time savings in the long run.)
I am considering writing up a proposal for a pilot project, but I don't want to re-invent the wheel - especially if a more comprehensive platform already exists. I have looked, but haven't found one, unless Microsoft's R Open essentially does it (not clear to me).
Thanks for your consideration!
Related
I'm currently working on an university research related software which uses statistical models in it in order to process some calculations around Item Response Theory. The entire source code was written in Go, whereas it communicates with a Rscript server to run scripts written in R and return the generated results. As expected, the software itself has some dependencies needed to work properly (one of them, as seen before, is to have R/Rscript installed and some of its packages).
Due to the fact I'm new to software development, I can't find a proper way to manage all these dependencies on Windows or Linux (but I'm prioritizing Windows right now). What I was thinking is to have a kind of script which checks if [for example] R is properly installed and, if so, if each used package is also installed. If everything went well, then the software could be installed without further problems.
My question is what's the best way to do anything like that and if it's possible to do the same for other possible dependencies, such as Python, Go and some of its libraries. I'm also open to hear suggestions if installing programming languages locally on the machine isn't the proper way to manage software dependencies, or if there's a most convenient way to do it aside from creating a script.
Sorry if any needed information is missing, I would also like to know.
Thanks in advance
I have been trying to make the case to have R installed at my place of employment. However, the IT department has come back with a risk assessment that R has potential risks. After much debating (and brick-head-interaction), I suggested removing all internet connectivity from R.
I know (hope?) in principle that it can be done, since 'base' R is open source and can be edited. My questions are:
How do I disable all internet possibility from 'base' R by editing the source code?
Once 1. is done, will the lack of internet flow on to packages? That is, will all internet be cut off from any package, no matter what the package is trying to do?
(Sorry, I'm a stats/maths guy, not so much a 'deep' programming dude.)
Wow, what a very strict work condition! Yeah, data analysis can be sometimes very dangerous. :-)
Generaly speaking it's possible to use R without internet connection, you just have to download the packages and install them from source (.zip/.tar.gz files).
But adjusting R source code would be unnecessary effort. I think that your IT department should be able to block access to internet connection in the firewall settings for R apps (R, RGui and/or RStudio), which takes only few minutes to set up. E.g. in Windows they can use Windows Defender Firewall with Advanced Security to block outbound connections from R exe files:
If they use another firewall or network rules, they should be able to set it up correctly and quickly as well.
I am working as a Data Scientist for a small start up and we are using R as part of our platform for analysis, dashboards etc. Therefore, I need to ensure that we maintain security with each package we use and load.
I have looked around and done extensive searching and have come across the following links:
This is the official R Studio Blog Security update page.
This blog post shows how you can implement rJava to help with those packages that require it, though it does state that '...the integrity & safety of the R package ecosystem is still in the “trust me, everything’s 👍!!”'
This post gives some good advice for package security, but basically boils down to: if you get it from CRAN or another trusted source then it should be ok.
The CVE site lists vulnerabilities, though the last one was back in 2017.
However, all the above links essentially say the same thing, which is "if its from CRAN (or similar), then it is probably fine". Now this might indeed be the case, but I was hoping for something a bit more rigorous. Has anyone else come across this issue with production R deployment?
If possible, if someone could direct to where I might be able to find out more information on checking for security updates, breaches and changes for R packages, or how to go about testing the security myself, I would be very grateful.
Thanks!
I've long thought about learing julia - a language I secretly hope will become the new standard for scientific computing - and when it is now packaged and included in the standard Ubuntu repositories, I figured it was time. I quickly found this tutorial and started hacking...
In the linked chapter, one is urged to download a library called ols.jl from a Github repository, place it in the local directory and start using it. I feel there must be a better way of doing this.
For example, it would be logical to have some "default"-directory in which julia can always look for library files. That folder could reside under my home directory, or (perhaps even better) somewhere under e.g. /usr/share/lib on an Ubuntu system.
Also, downloading the libraries directly seems to me like something I should be able to avoid. Isn't it possible to find libraries like these in some sort of packaging system (be it via Ubuntu's apt-get or something else)?
I do realize that many of these questions and concerns may be just because julia is a young language, that most of these features are missing because of this, and that there are plans (or at least wishes) to go in this direction in the future. However, it would be nice to know if I'm just missing something obvious =)
That tutorial on Forio is ancient. There's a newer, much better package system as of version 0.1 of Julia. See the documentation here: http://docs.julialang.org/en/release-0.1/manual/packages/
I am looking for a solution that allows me to keep a track of a multitude of R scripts that I create for various projects and purposes. Some scripts are easily tracked to specific projects, whereas others are "convenience" functions created to serve a set of tasks.
Is there a way I can create a central DB and query it to find which scripts match most appropriately?
I could create a system using a DBMS manually, but are users aware of anything in general or specific to R, that comes in the form of a software tool (maybe FOSS) ?
EDIT: Thank you for the responses. My current system is just a set of scripts with comments that allow me to identify their intended task. Though I use StatET with SVN, I would like a search utility along the lines of the "sos" package.
The question
I am looking for a solution that allows me to keep a track of a multitude of R scripts
that I create for various projects and purposes. Some scripts are easily tracked to specific
projects, whereas others are "convenience" functions created to serve a set of tasks.
fails to address the obvious follow-up of why the existing mechanism is not suitable:
Create a local package for each project
Create one or more local packages for local utility functions
Use R's already existing mechanisms for searching, indexing, testing, cross-referencing
And use any revision control system of your liking, local or on the web, to host the code for 1. to 3. above.
Reinventing an RDBMS schema for 1. to 3. is just wrong in my book. But if you must, go ahead and replicate what you can already (mostly) get for free in tested and widely used code.
R comes with several mechanisms for searching for help, most of which naturally use CRAN. Some examples: the sos package, cranberries, crantastic, and rseek. In many cases, these could be adapted to use a local repository (you can find out how to create a local repository in the R manual, which is very easy to do). Otherwise, if you package your scripts and submit them to CRAN, you will naturally have these available to you. I would also highly recommend this presentation on the subject: Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories from Spencer Graves and Sundar Dorai-Raj.
These would require you to put your code in packages, and create documentation, all of which is worth doing anyway. The package documentation turns out to be very useful for both documenting what things do, and helping your find them in the future. You can use roxygen to create this documentation in-line with your code. Also read this related question: Organizing R Source Code.
Alternatively, the help.search() function can be very useful for searching local packages, regardless of whether you have a repository set up.
You'd probably be best working with a version control system. Many can be indexed and be made search-able. At my work, a stack of R, Eclipse, StatET, Subversion and Subclipse works very well for us.