My co-workers would like to make sure that our work in R is platform-independent, specifically that code will run on Linux, Mac, and Windows, and that files created on one system will work on other systems.
Since the issue has come up before in my group, I would appreciate a general answer that will make it easier for me to confidently assure my collaborators that there will not be an issue. E.g., it would help to have a reference other than "because (subject matter expert) said so on SO".
Generally, is there a way to know if any features of R are platform-specific (can I assume that this would be stated in a function's help)?
Are there packages or functions that I can be confident will be platform-independent?
Are there types of packages or functions that I should be wary of?
I have previously asked two questions about the cross-platform readability of files created by R: What are the disadvantages of using .Rdata files compared to HDF5 or netCDF? and Are R objects dumped using `dump` readable cross-platform?
Besides Carl's answer, the obvious way to ensure that your work is platform-independent is to test on all platforms.
Which is precisely what CRAN does with its 3800+ packages, and you have access to logs here.
In short, R really tries hard to be platform-independent, and mostly succeeds. To do so with your code, it is up to you to avoid APIs or tools which introduce dependencies. Look at abstractions like system.file(package="boot") and the functions they use---you can easily abstract file-system "roots", and separators are already taken care of.
Check cran.r-project.org for package listings. Every package has a page which will tell you if it's passed testing for different operating systems. Further, as you suggested, the help files are pretty explicit about OS dependencies.
R is "smart" enough to translate "/" to "\" in pathnames for those poor folks working in Windows.
Generally speaking, graphics access is the area most likely to have platform dependencies. Obviously if your system lacks {X11, ImageMagick, ..} you're stuck anyway.
Besides Carl's and Dirk's comments, you should understand that any package that requires compilation from source (as do many (all?) packages that are on Omegahat, Rforge or r-forge) will need to be done on a machine that has the proper C and Fortran libraries. Some interesting packages depend on GTK+ and Tcl/Tk, and there may be a need to make sure you can get the right versions. The http://r.research.att.com/ page that Simon Urbanek maintains is a useful resource for keeping up with supporting resources for Macs.
Goal
I would like to have my Common Lisp (SBCL + GNU Emacs + Slime) environment be sort of like a Smalltalk image in that I want to have a big ball of mud of all my code organized in packages and preferably projects. In other words I have messed about a bit with save-lisp-and-die and setting Lisp in Emacs to bring up the saved image. Where I get lost is the appropriate way to make it work with Swank.
Problem
I believe it is required to put swank hooks inside my Lisp image before save-lisp-and-die. But that seems a bit fragile: on any change to either my SBCL version or my Slime version, it seems to throw a version mismatch.
Question
Am I missing something? Do people work this way, or do they tend to keep each project separate, as a loadable set of packages under ASDF?
I really miss the Smalltalk way and feel like per-project ASDF is a bit clunkier and more rooted in the file system. In comparison it reminds me too much of every other language and their app/project orientation. OTOH it seems a bit more stable with respect to versions of depended-upon packages. Well, the entire versioning hell across languages is another matter.
Any hints how to do what I want or why it isn't such a good idea would be much appreciated.
Images
Common Lisp implementations like SBCL support images. The idea of saved memory appeared early in Lisp in the 60s.
Smalltalk took that idea from Lisp. In many Smalltalk implementations images might be portable (OS, runtime, ...) - especially when using machine independent byte code. SBCL OTOH compiles to native machine code.
Managed source code
Smalltalk added the idea of managed source code. Smalltalk often uses a simple database plus a change log to store source code. One Lisp doing something similar was Xerox Interlisp - but with slightly different approaches.
Other Lisp implementations / IDEs don't support managed source code that way - only the Xerox Interlisp variants - AFAIK.
DEFSYSTEM
In Common Lisp the use of defsystem facilities like ASDF and IDEs like GNU Emacs + SLIME is much more file system based. Code resides in multiple systems, which are files in a directory with a system description.
It's not even clear that it's meaningful to load a newer version of a system into a Lisp system where an older version is loaded. One might be able to arrange that, but there is nothing preventing me from messing that up.
Updating Lisp
Updating a Lisp like SBCL from one version to another might:
make the saved image incompatible with the runtime
make the compiled code in FASL files incompatible with the runtime
You might save an image with the runtime included/bundled. That way you have the right combination of image and runtime.
But when you update the runtime, you usually/often need to regenerate a new compatible image with your code loaded.
Since SBCL puts out a release once a month, there is a temptation to update regularly. Other implementations might use different strategies: LispWorks is an example. LispWorks is released much less often and publishes patches between releases, which are loaded into the released version.
Updating SLIME
I have no idea if it would be possible to update a loaded SLIME (a SLIME which has been already loaded in an earlier version into a Lisp system) by loading a new version on top. Probably a good idea to check with the SLIME maintainers.
I've long thought about learning julia - a language I secretly hope will become the new standard for scientific computing - and now that it is packaged and included in the standard Ubuntu repositories, I figured it was time. I quickly found this tutorial and started hacking...
In the linked chapter, one is urged to download a library called ols.jl from a Github repository, place it in the local directory and start using it. I feel there must be a better way of doing this.
For example, it would be logical to have some "default"-directory in which julia can always look for library files. That folder could reside under my home directory, or (perhaps even better) somewhere under e.g. /usr/share/lib on an Ubuntu system.
Also, downloading the libraries directly seems to me like something I should be able to avoid. Isn't it possible to find libraries like these in some sort of packaging system (be it via Ubuntu's apt-get or something else)?
I do realize that many of these questions and concerns may be just because julia is a young language, that most of these features are missing because of this, and that there are plans (or at least wishes) to go in this direction in the future. However, it would be nice to know if I'm just missing something obvious =)
That tutorial on Forio is ancient. There's a newer, much better package system as of version 0.1 of Julia. See the documentation here: http://docs.julialang.org/en/release-0.1/manual/packages/
I am looking for a solution that allows me to keep track of a multitude of R scripts that I create for various projects and purposes. Some scripts are easily tracked to specific projects, whereas others are "convenience" functions created to serve a set of tasks.
Is there a way I can create a central DB and query it to find which scripts match most appropriately?
I could create a system using a DBMS manually, but are users aware of anything in general or specific to R, that comes in the form of a software tool (maybe FOSS) ?
EDIT: Thank you for the responses. My current system is just a set of scripts with comments that allow me to identify their intended task. Though I use StatET with SVN, I would like a search utility along the lines of the "sos" package.
The question
I am looking for a solution that allows me to keep a track of a multitude of R scripts
that I create for various projects and purposes. Some scripts are easily tracked to specific
projects, whereas others are "convenience" functions created to serve a set of tasks.
fails to address the obvious follow-up of why the existing mechanism is not suitable:
1. Create a local package for each project
2. Create one or more local packages for local utility functions
3. Use R's already existing mechanisms for searching, indexing, testing, cross-referencing
And use any revision control system of your liking, local or on the web, to host the code for 1. to 3. above.
Reinventing an RDBMS schema for 1. to 3. is just wrong in my book. But if you must, go ahead and replicate what you can already (mostly) get for free in tested and widely used code.
R comes with several mechanisms for searching for help, most of which naturally use CRAN. Some examples: the sos package, cranberries, crantastic, and rseek. In many cases, these could be adapted to use a local repository (you can find out how to create a local repository in the R manual, which is very easy to do). Otherwise, if you package your scripts and submit them to CRAN, you will naturally have these available to you. I would also highly recommend this presentation on the subject: Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories from Spencer Graves and Sundar Dorai-Raj.
These would require you to put your code in packages, and create documentation, all of which is worth doing anyway. The package documentation turns out to be very useful for both documenting what things do, and helping you find them in the future. You can use roxygen to create this documentation in-line with your code. Also read this related question: Organizing R Source Code.
Alternatively, the help.search() function can be very useful for searching local packages, regardless of whether you have a repository set up.
You'd probably be best working with a version control system. Many can be indexed and be made search-able. At my work, a stack of R, Eclipse, StatET, Subversion and Subclipse works very well for us.
I'm now reading some books on R, but I want to know if I can use this language as I use Perl or Ruby. Things like:
Image Processing
File Compression
Use APIs
Interact With Internet
But is it usual and simple (as in Perl or Ruby) to do things like this?
PS: I like this language very much, so I want to use it in my personal projects and share it with my friends and on the internet.
The CRAN Task Views are reasonable starting points. So in order
Image processing: see Graphics and MedicalImaging
File compression: accessible from Base R, so try help(connection)
Use APIs: you will need to ask that question again, more specifically. If you mean language bindings: yes, plenty, though there is no single page listing them all
Interact with Internet: see above on help(connection), there are also packages that wrap curl, provide SOAP and of course the XML package.
Edit: And I forgot to stress that R as a statistical language and environment is more domain-specific than either Ruby or Python so the comparisons aren't entirely appropriate. But you can also code Gtk2 guis in R if you feel like it...
One thing I've always wondered about is how software patches work. A lot of software seems to just release new versions of their binaries that need to be installed over older versions, but some software (operating systems like Windows in particular) seems to be able to release very small patches that correct bugs or add functionality to existing software.
Most of the time the patches I see can't possibly replace entire applications, or even small files that are used within applications. To me it seems like the actual binary is being modified.
How are these kinds of patches actually implemented? Could anyone point me to any resources that explain how this works, or is it just as simple as replacing small components such as linked libraries in an application?
I'll probably never need to do a deployment in this manner, but I am curious to find out how it works. If I'm correct in my understanding that patches can really modify only portions of binary files, is this possible to do in .NET? If it is I'd like to learn it since that's the framework I'm most familiar with and I'd like to understand how it works.
This is usually implemented using binary diff algorithms -- diff the most recently released version against the new code. If the user's running the most recent version, you only need to apply the diff. Works particularly well against software, because compiled code is usually pretty similar between versions. Of course, if the user's not running the most recent version you'll have to download the whole thing anyway.
There are a couple implementations of generic binary diff algorithms: bsdiff and xdelta are good open-source implementations. I can't find any implementations for .NET, but since the algorithms in question are pretty platform-agnostic it shouldn't be too difficult to port them if you feel like a project.
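To make the idea concrete, here is a minimal sketch of a binary delta in Python. It is not bsdiff or xdelta; it uses the standard library's difflib.SequenceMatcher as a stand-in matching algorithm, and the names make_patch and apply_patch are made up for this illustration. The patch is just a list of "copy this range from the old file" and "insert these new bytes" instructions, so only the changed bytes have to be shipped.

```python
from difflib import SequenceMatcher


def make_patch(old: bytes, new: bytes) -> list:
    """Describe `new` as a list of operations against `old`.

    Each operation is ("copy", start, end), meaning "reuse old[start:end]",
    or ("insert", data), meaning "emit these literal bytes".
    (Hypothetical patch format, for illustration only.)
    """
    ops = []
    # autojunk=False: do not let difflib discard frequently occurring bytes
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete": those bytes of `old` are simply never copied
    return ops


def apply_patch(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old file plus the patch."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Small demonstration: a one-word change in a longer byte string.
old_version = b"header " + b"unchanged payload " * 20 + b"version=1"
new_version = b"header " + b"unchanged payload " * 20 + b"version=2"
patch = make_patch(old_version, new_version)
assert apply_patch(old_version, patch) == new_version
```

The patch only applies cleanly to the exact old version it was computed against, which is why the user has to be running the most recent release. Real tools also use far more efficient matching and a compact on-disk encoding.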
If you are talking about patching Windows applications then what you want to look at are .MSP files. These are similar to an .MSI but just patch an application.
Take a look at Patching and Upgrading in the MSDN documents.
What an .MSP file does is load updated files to an application install. These are typically updated DLLs and resource files, but could include any file.
In addition to patching the installed application, the repair files located in C:\WINDOWS\Installer are updated as well. Then if the user selects "Repair" from Add / Remove programs the updated patch files are used as well.
I'm thinking that the binary diff method discussed by John Millikin must be used in other operating systems. Although you could make it work in Windows, it would be somewhat alien.