Installing, Loading Packages & Verifying these actions. Path, Lib, Dependencies, etc - r

This is one of the more comprehensive threads and discussions on these topics I have located to date.
How to find out which package version is loaded in R?
Nonetheless, I am finding this has not provided me with sufficient information to ensure I have installed and loaded the two packages I must have before I can begin to expect R to function properly. These packages are: Rserve (1.8-0) and MASS (7.3-45).
It seems answers to these topics can be application dependent or, perhaps purpose driven is a better phrase to use. Unfortunately, the R documentation will confuse you so, I thought it best to solicit the input of more experienced users.
I am working on a personal laptop with R 3.5.1 and Win7 Pro. It is clear from the r documentation that Windows is not the best or preferred environment for R.
despite a lot of work something remains missing and I have been unable to identify what it is.

Related

Correct way to create a software install script which can manage dependencies

I'm currently working on an university research related software which uses statistical models in it in order to process some calculations around Item Response Theory. The entire source code was written in Go, whereas it communicates with a Rscript server to run scripts written in R and return the generated results. As expected, the software itself has some dependencies needed to work properly (one of them, as seen before, is to have R/Rscript installed and some of its packages).
Due to the fact I'm new to software development, I can't find a proper way to manage all these dependencies on Windows or Linux (but I'm prioritizing Windows right now). What I was thinking is to have a kind of script which checks if [for example] R is properly installed and, if so, if each used package is also installed. If everything went well, then the software could be installed without further problems.
My question is what's the best way to do anything like that and if it's possible to do the same for other possible dependencies, such as Python, Go and some of its libraries. I'm also open to hear suggestions if installing programming languages locally on the machine isn't the proper way to manage software dependencies, or if there's a most convenient way to do it aside from creating a script.
Sorry if any needed information is missing, I would also like to know.
Thanks in advance

Advice needed for R package security in production

I am working as a Data Scientist for a small start up and we are using R as part of our platform for analysis, dashboards etc. Therefore, I need to ensure that we maintain security with each package we use and load.
I have looked around and done extensive searching and have come across the following links:
This is the official R Studio Blog Security update page.
This blog post shows how you can implement rJava to help with those packages that require it, though it does state that '...the integrity & safety of the R package ecosystem is still in the “trust me, everything’s 👍!!”'
This post gives some good advice for package security, but basically boils down to: if you get it from CRAN or another trusted source then it should be ok.
The CVE site lists vulnerabilities, though the last one was back in 2017.
However, all the above links essentially say the same thing, which is "if its from CRAN (or similar), then it is probably fine". Now this might indeed be the case, but I was hoping for something a bit more rigorous. Has anyone else come across this issue with production R deployment?
If possible, if someone could direct to where I might be able to find out more information on checking for security updates, breaches and changes for R packages, or how to go about testing the security myself, I would be very grateful.
Thanks!

Centralizing libraries in julia

I've long thought about learing julia - a language I secretly hope will become the new standard for scientific computing - and when it is now packaged and included in the standard Ubuntu repositories, I figured it was time. I quickly found this tutorial and started hacking...
In the linked chapter, one is urged to download a library called ols.jl from a Github repository, place it in the local directory and start using it. I feel there must be a better way of doing this.
For example, it would be logical to have some "default"-directory in which julia can always look for library files. That folder could reside under my home directory, or (perhaps even better) somewhere under e.g. /usr/share/lib on an Ubuntu system.
Also, downloading the libraries directly seems to me like something I should be able to avoid. Isn't it possible to find libraries like these in some sort of packaging system (be it via Ubuntu's apt-get or something else)?
I do realize that many of these questions and concerns may be just because julia is a young language, that most of these features are missing because of this, and that there are plans (or at least wishes) to go in this direction in the future. However, it would be nice to know if I'm just missing something obvious =)
That tutorial on Forio is ancient. There's a newer, much better package system as of version 0.1 of Julia. See the documentation here: http://docs.julialang.org/en/release-0.1/manual/packages/

Are there features of R that are system-dependent?

My co-workers would like to make sure that our work in R is platform-independent, specifically that code will run on Linux, Mac, and Windows, and that files created on one system will work on other systems.
Since the issue has come up before in my group, I would appreciate a general answer that will make it easier for me to confidently assure my collaborators that there will not be an issue. E.g., it would help to have a reference other than "because (subject matter expert) said so on SO".
Generally, is there a way to know if any features of R are platform-specific (can I assume that this would be stated in a function's help)?
Are there packages or functions that I can be confident will be platform-independent?
Are there types of packages or functions that I should be wary of?
I have previously asked two questions about the cross-platform readability of files created by R: What are the disadvantages of using .Rdata files compared to HDF5 or netCDF? and Are R objects dumped using `dump` readable cross-platform?
Besides Carl's answer, the obvious way to ensure that your work in platform-independent is to test on all platforms.
Which is precisely what CRAN does with its 3800+ packages, and you have access to logs here.
In short, R really tries hard to be platform-independent, and mostly succeeds. To do so with your code, it is up to you to avoid APIs or tools which introduce dependencies. Look at abstractions like system.file(package="boot") and the functions they use---you can easily abstract file-system "roots", and separators are already taken care of.
Check cran.r-project.org for package listings. Every package has a page which will tell you if it's passed testing for different operating systems. Further, as you suggested, the help files are pretty explicit about OS dependencies.
R is "smart" enough to translate "/" to "\" in pathnames for those poor folks working in Windows.
Generally speaking, graphics access is the area most likely to have platform dependencies. Obviously if you system lacks {X11, ImageMagick, ..} you're stuck anyway.
Besides Carl's and Dirk's comments, you should understand that any package that requires compilation from source (as do many (all?) packages that are on Omegahat, Rforge or r-forge) will need to be done on a machine that has the proper C and Fortran libraries. Some interesting packages depend on GTK+ and Tcl/Tk, and there may be a need to make sure you can get the right versions. The http://r.research.att.com/ page that Simon Urbanek maintains is a useful resource for keeping up with supporting resources for Macs.

How to keep abreast of known bugs and bug fixes in R packages?

Is there a standard R community resource for keeping up to date on known bugs or bug fixes for packages? My current approach is rather manual. (NB: I'm restricting this to CRAN - see Note 1.)
My use case is basically bug surveillance and the management of package updates. I've been averaging a couple of bug discoveries each month for awhile (which I duly report to the authors ;-)). Since a lot of my work is done with virtual machines, I tend to update the VM images when I have a good handle on the bug status for necessary packages. When a bunch of bugs are fixed, I can remove my workarounds, which is great, and I update the images. When I discover an outbreak of bugs, I don't create a new image.
Here are the sources I'm currently using:
NEWS files: Many, but not all, packages have NEWS files. These are certainly a helpful place to start.
Package home page: Some packages do not have a NEWS file on CRAN, but separately post a change log on the author's site.
R project-hosted mailing lists
Google Groups for packages
Personal communication with package authors
Bug tracking for packages (e.g. a developer may use Bugzilla)
It's one thing to be the first to discover a bug (I grant that bugs happen to all of us), it's another to belatedly discover a bug that is either already known or, better yet, already fixed. Both slow down my own coding, but better bug surveillance (maybe we need a cdc4R package :)) would significantly reduce the impact. Without a standard update alerting system (e.g. an extension to update.packages() that reports on which packages could be updated and links to info on what's changed), it's the user's job to seek out this information.
As such a user, trying to seek out this information, is there some standard resource I've overlooked in the list above? For instance, is there an R mailing list where it's common for developers to post their changes and bug fixes? Or is there a site that aggregates such posts, posts tests (CRAN posts R CMD CHECK output, it seems), or that gives some other feedback?
A few additional notes on other resources, for others' benefit:
I see that CRANberries has a terse diff summary on packages, which is new to me. (I am inspired to consider a grep for bug or fix in the diff output.)
bug.report() in R is a good way to send a message to R Core or the email address of a package maintainer.
Several testing packages worth consideration are: testthat, RUnit, and svUnit.
My personal "quick test" is to simply use digest to verify that results match, without having to test equality of very large objects.
Note 1: I'm tagging this cran because it's impossible to manage the universe of all R packages. For an individual package author, one can distribute a package wherever they'd like, use whatever mailing list or bug tracking system they like, etc. However, that's outside the "mainstream" for R. Were I to release a package and alert users to changes, bugs, bugfixes, I'd go with CRAN + NEWS + Bugzilla + Google Groups + R-Forge (and/or RForge), etc., but is there another standard reporting mechanism that is missing from this list?
In some sense, this note also serves to ask if there's a mechanism that developers are encouraged to use. I suspect there is no standard, as packages by R Core members seem to do many different things regarding bug and change reporting.
Note 2: I'm also adding administration (though something else may be more apropos), since this also relates to administering R. For reproducibility, administration of packages is quite important; when there are multiple users or more moving pieces, keeping aware of bugs and fixes becomes an administrative task, as well as an important consideration for development that depends on the external packages. If another tag, e.g. system-administration is more appropriate, I'm open to a change.
Not a complete answer but here are some thoughts.
In the case of data.table we track bugs (and feature requests) on R-Forge here. I imagine you could query R-Forge's tracker (programatically) for all packages hosted there. To add to your list anyway. That web tracker is where bug.report(package="data.table") points to (not just an email address to maintainer).
Also, anyone can subscribe to any <pkgname>-commits#lists.r-forge.r-project.org mailing list to receive a unified diff and commit message (at the time of commit) for each project on R-Forge. I'm not aware of a general mailing list spanning any commit to any R-Forge project, though.
At the top of ?data.table there is a link to up to the minute NEWS. This is how we communicate to users what is in the latest version (and in development) if they upgrade. That link updates in real-time; i.e., "up to the minute" is meant literally. But, they do have to check there!

Resources