Advice needed for R package security in production - r

I am working as a Data Scientist for a small startup and we are using R as part of our platform for analysis, dashboards, etc. Therefore, I need to ensure that we maintain security with each package we use and load.
I have searched extensively and have come across the following links:
This is the official RStudio Blog Security update page.
This blog post shows how you can implement rJava to help with those packages that require it, though it does state that '...the integrity & safety of the R package ecosystem is still in the “trust me, everything’s 👍!!”'
This post gives some good advice for package security, but basically boils down to: if you get it from CRAN or another trusted source then it should be ok.
The CVE site lists vulnerabilities, though the last one was back in 2017.
However, all the above links essentially say the same thing, which is "if it's from CRAN (or similar), then it is probably fine". Now this might indeed be the case, but I was hoping for something a bit more rigorous. Has anyone else come across this issue with production R deployment?
If someone could direct me to where I might find more information on checking for security updates, breaches and changes for R packages, or on how to go about testing the security myself, I would be very grateful.
Thanks!

Related

Correct way to create a software install script which can manage dependencies

I'm currently working on university research-related software that uses statistical models to run calculations around Item Response Theory. The entire source code was written in Go, and it communicates with an Rscript server to run scripts written in R and return the generated results. As expected, the software itself has some dependencies it needs to work properly (one of them, as mentioned above, is to have R/Rscript installed along with some of its packages).
Because I'm new to software development, I can't find a proper way to manage all these dependencies on Windows or Linux (but I'm prioritizing Windows right now). What I was thinking of is a kind of script which checks whether [for example] R is properly installed and, if so, whether each required package is also installed. If everything checks out, then the software could be installed without further problems.
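The check described above could be sketched roughly as follows. This is a minimal illustration in Python, not a recommended installer design: the function name find_missing_r_packages and the package names in the usage note are hypothetical, and the sketch assumes that Rscript is discoverable on PATH. It first looks for Rscript, then asks R which of the requested packages are absent from installed.packages().

```python
import shutil
import subprocess


def find_missing_r_packages(packages, which=shutil.which, run=subprocess.run):
    """Return (rscript_path, missing_packages).

    rscript_path is None when Rscript is not on PATH; in that case every
    requested package is reported as missing. The `which` and `run`
    parameters exist only so the lookup and the R call can be replaced
    in tests.
    """
    rscript = which("Rscript")
    if rscript is None:
        return None, list(packages)

    # Ask R itself which requested packages are absent. The row names of
    # installed.packages() are the names of the installed packages.
    wanted = ", ".join('"{}"'.format(p) for p in packages)
    expr = (
        "missing <- setdiff(c({}), rownames(installed.packages())); "
        "cat(missing, sep='\\n')"
    ).format(wanted)
    result = run(
        [rscript, "-e", expr],
        capture_output=True, text=True, check=True,
    )
    missing = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return rscript, missing
```

Calling find_missing_r_packages(["mirt", "ltm"]) (example package names only) returns the Rscript path, or None if it was not found, together with the packages that still need installing. An installer could stop and print that list, or pass it on to install.packages().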
My question is what the best way to do something like that is, and whether it's possible to do the same for other possible dependencies, such as Python, Go and some of their libraries. I'm also open to suggestions if installing programming languages locally on the machine isn't the proper way to manage software dependencies, or if there's a more convenient way to do it aside from creating a script.
Sorry if any needed information is missing; if so, please let me know.
Thanks in advance

Issue with installing postnuke

I need an old copy of the software Postnuke. I’m aware it’s outdated and discontinued but need to use it locally to use & convert a site which used to use this software.
I managed to find it using SourceForge (the 0.76 version) but it keeps hanging on the installation and I’m getting errors that don’t seem fixable to me on the step of inserting data (around 80%).
If any of the devs are around I’d really appreciate any assistance they could give me on how to get the “Set Login” stage working of the installer, specifically the start_postnuke() function because it’s missing the language and other variables from the PNconfig variable that are preventing it from installing.
I’m aware this is tagged as a Zikula question but it’s the only way I can find to try and contact who I assume are the developers of Postnuke.
You are right. Postnuke is dead. It died so long ago that nobody has any expertise. I doubt very much that installing the software is possible or truly necessary. You must have a database with info you are trying to access. Simply access it with whatever tools you are most comfortable with and pull and modify the data as needed. (fyi - I'm a former postnuke dev and current zikula dev. I've used PN since 0.62, so I know what I'm talking about).
If you really want to have a go at getting a working installation, I would recommend using the same server stack components that were "modern" at the time 0.76 was released: Apache, PHP, MySQL. It will probably work then.
Since that time a lot of PHP functions have been made obsolete, and even the syntax has changed, for example the array shorthand notation.
But if you use a stack that's contemporary to that version, it should work.

What is the best practice for handling composer abandoned packages?

When I run composer update I'll occasionally get messages that packages are abandoned and that I should use a different one instead, like "Package webflo/drupal-core-require-dev is abandoned, you should avoid using it. Use drupal/core-dev instead." I don't have experience with Composer so I'm curious as to what is seen as the best practice for replacing outdated packages.
Where do these messages come from? I'm unsure if the source is always reliable.
I think the best practice is quite clear from the message "you should avoid using it". How and when to do this is not as clear. Abandoned packages will not receive updates, but composer will not be able to tell you how difficult it will be to transition to the recommended alternative. It might be that all you have to do is replace the package, because it was only a name change, or you might have to modify your code as well.
In your case webflo/drupal-core-require-dev only contains a composer.json, and its required packages match what the alternative drupal/core-dev provides. That means replacing the package should be as easy as changing the name in your composer.json and then running composer update drupal/core-dev.
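The rename step can be illustrated with a small sketch. It is written in Python purely to show the edit; in practice you would usually make this change by hand or with Composer's own commands. The helper name rename_dependency and the version constraints in the example are made up for illustration, and carrying the old constraint over unchanged is an assumption that may not hold for every replacement package.

```python
import json


def rename_dependency(composer, old_name, new_name):
    """Return a copy of a parsed composer.json with old_name replaced by new_name.

    The existing version constraint is carried over unchanged, which is an
    assumption: the replacement package may need a different constraint.
    """
    updated = dict(composer)
    for section in ("require", "require-dev"):
        deps = composer.get(section)
        if not deps or old_name not in deps:
            continue
        # Rebuild the section so the new name takes the old entry's position.
        updated[section] = {
            (new_name if name == old_name else name): constraint
            for name, constraint in deps.items()
        }
    return updated


# Illustrative composer.json content; the constraints are placeholders.
example = json.loads("""
{
    "require": {"drupal/core-recommended": "^8.8"},
    "require-dev": {"webflo/drupal-core-require-dev": "^8.8"}
}
""")
renamed = rename_dependency(
    example, "webflo/drupal-core-require-dev", "drupal/core-dev"
)
print(json.dumps(renamed, indent=4))
```

After writing the result back to composer.json, running composer update drupal/core-dev as described above would resolve and install the replacement.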
For packages where the answer is not as straightforward, you have to rely on automated/manual tests to see if everything still works. Static code analysis tools might help as well. You will have to set them up before you do the change, so that you can see how their output differs and fix the new issues that come up.
You should switch to the new dependency as early as possible. Leaving the abandoned package in will likely cause more work when you replace it later, and it might pose a security risk (if it is outdated and insecure). I understand that this is not always possible. If you have to postpone the switch, something like roave/security-advisories can tell you when there are known security issues in a package, which gives some sense of security in the meantime.

How to keep abreast of known bugs and bug fixes in R packages?

Is there a standard R community resource for keeping up to date on known bugs or bug fixes for packages? My current approach is rather manual. (NB: I'm restricting this to CRAN - see Note 1.)
My use case is basically bug surveillance and the management of package updates. I've been averaging a couple of bug discoveries each month for a while (which I duly report to the authors ;-)). Since a lot of my work is done with virtual machines, I tend to update the VM images when I have a good handle on the bug status for necessary packages. When a bunch of bugs are fixed, I can remove my workarounds, which is great, and I update the images. When I discover an outbreak of bugs, I don't create a new image.
Here are the sources I'm currently using:
NEWS files: Many, but not all, packages have NEWS files. These are certainly a helpful place to start.
Package home page: Some packages do not have a NEWS file on CRAN, but separately post a change log on the author's site.
R project-hosted mailing lists
Google Groups for packages
Personal communication with package authors
Bug tracking for packages (e.g. a developer may use Bugzilla)
It's one thing to be the first to discover a bug (I grant that bugs happen to all of us), it's another to belatedly discover a bug that is either already known or, better yet, already fixed. Both slow down my own coding, but better bug surveillance (maybe we need a cdc4R package :)) would significantly reduce the impact. Without a standard update alerting system (e.g. an extension to update.packages() that reports on which packages could be updated and links to info on what's changed), it's the user's job to seek out this information.
As such a user, trying to seek out this information, is there some standard resource I've overlooked in the list above? For instance, is there an R mailing list where it's common for developers to post their changes and bug fixes? Or is there a site that aggregates such posts, posts tests (CRAN posts R CMD check output, it seems), or that gives some other feedback?
A few additional notes on other resources, for others' benefit:
I see that CRANberries has a terse diff summary on packages, which is new to me. (I am inspired to consider a grep for bug or fix in the diff output.)
bug.report() in R is a good way to send a message to R Core or the email address of a package maintainer.
Several testing packages worth consideration are: testthat, RUnit, and svUnit.
My personal "quick test" is to simply use digest to verify that results match, without having to test equality of very large objects.
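The "quick test" idea in the last note, comparing a digest of the result instead of the result itself, can be sketched as follows. The note refers to R's digest package; this is only an analogous illustration in Python, with hypothetical helper names, and it assumes the result is plain JSON-serialisable data so that it can be turned into a canonical byte string before hashing.

```python
import hashlib
import json


def result_digest(result):
    """Return a SHA-256 hex digest of a JSON-serialisable result.

    Keys are sorted and whitespace is fixed so that equal data always
    produces the same bytes, and therefore the same digest.
    """
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_reference(result, reference_digest):
    """True when the result hashes to the digest saved from a known-good run."""
    return result_digest(result) == reference_digest
```

The digest from a known-good run is stored once. After a package update, the same computation is rerun and only the short digest strings are compared, which avoids keeping and comparing very large objects. A mismatch only says that something changed, not what changed.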
Note 1: I'm tagging this cran because it's impossible to manage the universe of all R packages. For an individual package author, one can distribute a package wherever they'd like, use whatever mailing list or bug tracking system they like, etc. However, that's outside the "mainstream" for R. Were I to release a package and alert users to changes, bugs, bugfixes, I'd go with CRAN + NEWS + Bugzilla + Google Groups + R-Forge (and/or RForge), etc., but is there another standard reporting mechanism that is missing from this list?
In some sense, this note also serves to ask if there's a mechanism that developers are encouraged to use. I suspect there is no standard, as packages by R Core members seem to do many different things regarding bug and change reporting.
Note 2: I'm also adding administration (though something else may be more apropos), since this also relates to administering R. For reproducibility, administration of packages is quite important; when there are multiple users or more moving pieces, keeping aware of bugs and fixes becomes an administrative task, as well as an important consideration for development that depends on the external packages. If another tag, e.g. system-administration is more appropriate, I'm open to a change.
Not a complete answer but here are some thoughts.
In the case of data.table we track bugs (and feature requests) on R-Forge here. I imagine you could query R-Forge's tracker (programmatically) for all packages hosted there, so that is one to add to your list. That web tracker is where bug.report(package="data.table") points to (not just an email address for the maintainer).
Also, anyone can subscribe to any <pkgname>-commits@lists.r-forge.r-project.org mailing list to receive a unified diff and commit message (at the time of commit) for each project on R-Forge. I'm not aware of a general mailing list spanning any commit to any R-Forge project, though.
At the top of ?data.table there is a link to the up-to-the-minute NEWS. This is how we communicate to users what is in the latest version (and in development) if they upgrade. That link updates in real time; i.e., "up to the minute" is meant literally. But they do have to check there!

Document/Scripts management for R code

I am looking for a solution that allows me to keep a track of a multitude of R scripts that I create for various projects and purposes. Some scripts are easily tracked to specific projects, whereas others are "convenience" functions created to serve a set of tasks.
Is there a way I can create a central DB and query it to find which scripts match most appropriately?
I could create a system using a DBMS manually, but are users aware of anything in general or specific to R, that comes in the form of a software tool (maybe FOSS) ?
EDIT: Thank you for the responses. My current system is just a set of scripts with comments that allow me to identify their intended task. Though I use StatET with SVN, I would like a search utility along the lines of the "sos" package.
The question
"I am looking for a solution that allows me to keep a track of a multitude of R scripts that I create for various projects and purposes. Some scripts are easily tracked to specific projects, whereas others are "convenience" functions created to serve a set of tasks."
fails to address the obvious follow-up of why the existing mechanism is not suitable:
1. Create a local package for each project
2. Create one or more local packages for local utility functions
3. Use R's already existing mechanisms for searching, indexing, testing, cross-referencing
4. And use any revision control system of your liking, local or on the web, to host the code for 1. to 3. above.
Reinventing an RDBMS schema for 1. to 3. is just wrong in my book. But if you must, go ahead and replicate what you can already (mostly) get for free in tested and widely used code.
R comes with several mechanisms for searching for help, most of which naturally use CRAN. Some examples: the sos package, cranberries, crantastic, and rseek. In many cases, these could be adapted to use a local repository (you can find out how to create a local repository in the R manual, which is very easy to do). Otherwise, if you package your scripts and submit them to CRAN, you will naturally have these available to you. I would also highly recommend this presentation on the subject: Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories from Spencer Graves and Sundar Dorai-Raj.
These would require you to put your code in packages and create documentation, all of which is worth doing anyway. The package documentation turns out to be very useful both for documenting what things do and for helping you find them in the future. You can use roxygen to create this documentation in-line with your code. Also read this related question: Organizing R Source Code.
Alternatively, the help.search() function can be very useful for searching local packages, regardless of whether you have a repository set up.
You'd probably be best working with a version control system. Many can be indexed and be made search-able. At my work, a stack of R, Eclipse, StatET, Subversion and Subclipse works very well for us.
