Permissions for installing Julia packages on a Slurm cluster

I've just installed Julia for use on a Slurm cluster. Running a hello-world job works well, so the installation itself was successful, but installing a first package gives permissions errors. A script containing the command
Pkg.add("MAT")
or
Pkg.installed()
gives the error message
ERROR: LoadError: SystemError (with /home/<my_user_name>/.julia/logs): mkdir: Permission denied
The same error appears if I start the Julia command line from my user directory. The message disappears when I start Julia using sudo, but obviously I cannot sudo cluster jobs.
I tried installing the package with sudo on the user account and then using it without sudo, but other error messages arise, similar to those documented here.
https://github.com/JuliaLang/julia/issues/12876
That page suggests running chown to give the user ownership of MAT.ji, but that does not work. I tried removing and re-adding the package, but I'm just running in circles with the same error messages. At one point I also got EACCES error messages similar to those documented here
https://discourse.julialang.org/t/juliapro-pkg-installation-error-ioerror-unlink-permission-denied-eacces/35912
I'm a novice with permissions issues like this, so I could use some guidance on how to approach this problem. I'm not sure what to try, and in what order.

Permissions issues on clusters can be tricky.
If we are talking about a physical cluster, the simplest generic solution that you can get working without involving your sysadmin is probably to put your .julia somewhere where just about every process has filesystem permissions: namely, global networked scratch (wherever exactly that is on your cluster).
This is arguably a good idea anyways, given that global scratch tends to be the fastest or one of the fastest networked filesystems around on most clusters, and every julia process is going to have to read from .julia when you start a job, so if that's on a fast parallel filesystem, so much the better. On the other hand, scratch tends to have a time limit, so you might want to keep a local copy around for when scratch/<yourusername>/.julia inevitably gets deleted.
For this to work well, you have to tell Julia where to look for .julia, so that it does not just make a new one when it doesn't find it in the default location (~). One relatively simple way to do this is with environment variables. You could set them manually each time, but I recommend instead putting something like the following in your ~/.bash_profile.
# Some julia-specific environment variables
# To make sure I call the Julia installation in global scratch
export JULIA_DEPOT_PATH="/path/to/your/global/scratch/username/.julia"
export JULIA_PROJECT="/path/to/your/global/scratch/username/.julia/environments/v1.6/Project.toml"
export JULIA_LOAD_PATH="/path/to/your/global/scratch/username/.julia/environments/v1.6/Project.toml:/path/to/your/julia/application/folder/possibly/also/in/scratch/julia-1.6.1/share/julia/stdlib/v1.6" # The second one may be tricky to find if you're using a cluster-provided julia, but you can always just download the latest julia release yourself and plop it in scratch too
export JULIA_BINDIR="/path/to/your/julia/application/folder/possibly/also/in/scratch/julia-1.6.1/bin" # This last line may not be necessary, but doesn't hurt to specify.
where julia versions and the actual path to your global scratch folder are adjusted as appropriate.
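Before pointing Julia at a new depot, it may help to confirm that the directory exists and is writable by your (non-sudo) user, since that is exactly what the original mkdir error was about. The following is only a minimal sketch: the helper name prepare_julia_depot is made up for illustration, and the path you pass in is an assumption to be replaced with your cluster's scratch location.

```shell
# Hypothetical helper: create a Julia depot directory and point Julia at it.
# The path argument is an assumption; substitute your own scratch location.
prepare_julia_depot() {
    depot="$1"
    # Create the directory (and any missing parents).
    mkdir -p "$depot" || return 1
    # Stop early if the current user cannot write there.
    if [ ! -w "$depot" ]; then
        echo "depot not writable: $depot" >&2
        return 1
    fi
    # JULIA_DEPOT_PATH is the variable already used in the profile snippet above.
    export JULIA_DEPOT_PATH="$depot"
    echo "JULIA_DEPOT_PATH=$JULIA_DEPOT_PATH"
}
```

You could call it as prepare_julia_depot "/path/to/your/global/scratch/username/.julia" at the top of a job script, before the first julia invocation.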

Related

Non-manual solution to "cannot remove prior installation of package" when re-installing R packages

I recently began receiving warnings that prior installations of R packages cannot be removed when I try to re-install packages:
install.packages("gtools")
#> Warning: cannot remove prior installation of package ‘gtools’
#> Warning: restored ‘gtools’
I found solutions to this issue encouraging me to delete the packages manually from my library folder, which I could find with .libPaths(). However, (a) this seems like a way of addressing symptoms rather than the underlying issue (which remains unclear) and (b) there are two paths for seemingly different versions of R and I'm not sure which to delete from anyway:
.libPaths()
#> [1] "C:/Users/foo/Documents/R/win-library/4.1"
#> [2] "C:/Program Files/R/R-4.1.2/library"
How can I fix the problem so I don't have to manually delete package folders every time I want to re-install a package? If there is no alternative, do I need to delete the subdirectories for the package from one of those folders or both? FWIW, I'm working in RStudio.
The problem is that you have installed packages using different permissions. On Windows, you need elevated permissions to write to Program Files. At some point you (or an admin) probably used "Run as admin" to install gtools there, and now using regular permissions you can't delete that.
You should be able to delete the Users/foo copy if you are running as user foo, but even that one may have had its permissions changed. Still, I'd guess the issue is that gtools is in the Program Files location.
The error message from R doesn't tell you which location it is trying to delete from, which is unfortunate. In fact, allowing installations of different versions in those two locations is a bad design feature in R that just leads to confusion, because you don't necessarily always use the same version each time you load packages. (The rule for which one you use is the first acceptable one found in the .libPaths list, but since you can change .libPaths, and since packages can load other packages, it's hard to predict which one you'll have loaded at any given time.)
To fix this, you can delete both copies (if you have two) and start over, but that's risky because other packages might be depending on gtools. If you are the only user on your computer, you could instead delete the entire "C:/Users/foo/Documents/R/win-library/4.1" library, and then do all your installs using "Run as admin", but that's also easy to mess up.
(On a Mac, that's effectively what happens, because most single user systems put the user in the "admin" group, so they can always install packages to the system location. It causes a lot less confusion, but some "purists" think the Windows way is better.)
So I don't have any good advice for you, but maybe this explains the situation, and you can work out for yourself the best way forward.

When should I restart R session, GUI or computer?

I use R, RStudio and Rcpp, and I spent over a week debugging some code that was just giving errors and warnings in unexpected places, in some cases with sample code taken directly from online sources or package documentation.
I often restart the R session or RStudio if there are obvious problems, and they usually go away.
But this morning it was really bad, to the point where basic R commands would fail and restarting R did nothing. I closed all the RStudio sessions and restarted the machine for good measure (which was unnecessary).
When it came back and I re-loaded the sessions everything seems to be working.
Even some Rcpp code with outside packages that I had been working on for weeks will now compile and run, where it gave gibberish errors before.
I have known for a while that R needs to be restarted once in a while, but I only notice when basic functions stop running. How can I know earlier?
I am looking for a good general resource or function that can tell me I need to restart because something is not running right. It would be nice if I could also know what to restart:
the R session, the GUI such as RStudio, all sessions and GUIs, or the whole machine.
For as long as I have been dabbling with or actually using R (ie more than two decades), it has always been recommended to start a clean and fresh session.
Which is why I prefer to work on command-line for tests. When you invoke R, or Rscript, or, in my case, r (from littler) you know you get a fresh session free of possible side-effects. By keeping these tests to the command-line, my main sessions (often multiple instances inside Emacs via ESS, possibly multiple RStudio sessions too) are less affected.
Even RStudio defaults to 'install and restart' when you rebuild a package.
(I will note that a certain development package implies you could cleanly unload a package. That has been debated at length, and I think by now even its authors qualify that claim. I don't really know or care as I don't use it, having had established workflows before it appeared.)
And to add: You almost never need to restart the computer. But a fresh clean process is something to use often. Your computer can create millions of those for you.

R Server - Resuming R Session - message hanging or taking 15+min

I frequently work in an R-server environment. However, whenever I come back to my work after the previous working day, the system often gets stuck on 'resuming r session'. This can take 5-15 minutes or more. I try to terminate R or restart R, but often this doesn't really do anything.
I'm looking for a work-around, as it is very frustrating to go to the R-server URL and have to wait forever to get started again. IDEALLY, I'd be able to pick up right where I left off. However, if this can't be done, I guess that is OK.
I was looking around at the folder structure and I noticed that there is a folder called "Suspended-R-Session".
Within this folder are a few files such as:
'options',
'lib paths',
'history',
'environment_vars',
'environment',
and 'settings'.
Should I be deleting these files in order to speed up load time???
As described in this link https://support.rstudio.com/hc/en-us/community/posts/200638878-resuming-session-hangup, in my case for R version 3.5:
cd ~/.rstudio/sessions/active/session-45204d30
rm -rf suspended-session-data
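If you would rather not delete the suspended data outright, one cautious variant is to move it aside so it can be restored if something goes wrong. This is a sketch only: the helper name clear_suspended_session is made up, and it assumes the same suspended-session-data folder name as in the commands above; the session directory itself varies per session.

```shell
# Hypothetical helper: move a session's suspended data aside instead of deleting it.
# The argument is the session directory, e.g. the session-... folder shown above.
clear_suspended_session() {
    session_dir="$1"
    data="$session_dir/suspended-session-data"
    # Nothing to do if the session has no suspended data.
    if [ ! -d "$data" ]; then
        echo "nothing to clear in $session_dir"
        return 0
    fi
    # Keep a timestamped backup next to the original location.
    backup="$data.bak.$(date +%Y%m%d%H%M%S)"
    mv "$data" "$backup" && echo "moved to $backup"
}
```

Once the session resumes normally, the backup folder can be removed by hand.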

Modifying R packages (snow)

Can anybody give me some direction on editing source code of an R package? From what I've seen, changing the package from within R does not seem to be possible. In editing outside of R, I'm stuck at unpacking the tar.gz. While I can now modify the function to my heart's content, the folder looks nothing like the working snow library. I presume I will need to turn the contents into a tar.gz once again and install it in the standard way?
My colleagues and I have been attempting to get makeSOCKcluster() to work with remote IPs for the past three days. It hangs indefinitely. After digging into the snow package, I've found the issue to be in the way newSOCKnode() calls socketConnection(). If I run makeSOCKcluster("IP", manual=T) and then paste the output into PowerShell, the connection is made but the program does not complete. However, I can run makeSOCKcluster("IP", manual=T) in one R instance and then run system("output", wait=F, input="") in another instance, which results in the program completing. I believe I can simply modify snow to do this automatically.

Disabling the default library in R

The default R library, .Library, is normally not writeable under Windows.
To write to it, you need to run R as Administrator. For new packages you can set up and use a personal library, but this doesn't work when updating packages in the base setup (e.g. with update.packages()).
If you forget (or don't know you need) to run as Administrator, you get duplicate versions of the same packages, messing up the installation.
I think one solution could be to copy all packages to a personal library and disable the default one. I know how to add a new library path to R, i.e. .libPaths("my/path"), but how do I remove the default library from .libPaths()?
Update for non-Windows users
Some clarifications might help mostly non-Windows R users to understand the mentioned problem.
In Windows "Log on as Administrator" (or better as a user belonging to administrators' group) and "Run as Administrator" are quite different things.
In the former case you just give your credentials at logon, much like in Linux; in the latter you are already logged in as a "superuser", but in order to carry out a potentially dangerous action you have to ask Windows for ad hoc permission (proving that it's you and not malware acting).
That said, programs (and developers) ask the user for permission before accessing known Windows protected objects (e.g. the C:\Program Files folder), to avoid being blocked by the OS.
Even when they don't ask (because they assume the knowledgeable user will have given this permission in advance), a failure to access is normally reported with something like "Permission denied to access to folder etc.".
As of R version 3.0.2, update.packages() is one of the situations where an elevated-permission request should be triggered, because it might involve writing to protected program folders. Unfortunately R doesn't ask, and so cannot update the directory holding the old packages.
What about the second safety net: user notifications? While install.packages() gives messages like:
stop ... "'lib' element %s is not a writable directory" ...
and you get the idea of a permission problem, with other functions, such as update.packages(), you get:
warning ... "package '%s' in library '%s' will not be updated"
whose cause could be anything.
Can this scenario be even worse? Yes. Besides not asking for permission to write to "Program Files", and besides not reporting the permission error, update.packages(), when unable to update packages in protected folders, actually installs them to the personal user folder without telling you. This is similar to what install.packages() does, except that the latter notifies you and asks permission to do this.
So you end up with two versions of the same packages in different folders! Your calculations will be therefore dependent on library priorities.
Can this scenario be even worse? Yes. You are clever enough (or Google well enough) to understand that you need to "Run as Administrator" when you want to update packages. You restart R as Administrator and hope this will fix everything. Not at all. R sees the updated packages in the personal library and does nothing. So you are left with two versions of the same packages.
To solve this you have to detect duplicate packages and remove them manually, then restart R as administrator and update again (or write a script to do this).
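The detection step can be scripted outside R by comparing the package folder names in the two libraries. The following is only a sketch under some assumptions: it needs a POSIX shell (on Windows that means something like Git Bash or WSL), the function name find_duplicate_packages is made up, and the two arguments stand for whatever paths .libPaths() reports on your machine.

```shell
# Hypothetical helper: print the names of packages present in both libraries.
# Each argument is an R library folder containing one subdirectory per package.
find_duplicate_packages() {
    user_lib="$1"
    system_lib="$2"
    for pkg_dir in "$user_lib"/*/; do
        # Skip the unexpanded pattern when the library is empty.
        [ -d "$pkg_dir" ] || continue
        pkg="$(basename "$pkg_dir")"
        # A folder of the same name in the other library means a duplicate.
        if [ -d "$system_lib/$pkg" ]; then
            echo "$pkg"
        fi
    done
}
```

It only lists names; deciding which copy to remove (and doing so with the right permissions) is still up to you.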
Clearly the solution would be for R to conform to the behaviour expected of Windows apps, or at least to do nothing when prevented from acting (instead of making decisions without notifying the user).
In the meantime I think that totally disabling the default library (located in a protected area) would be a temporary workaround.
A final note. Packages and package updating are crucial for using R, so my humble opinion is that the topic should deserve specific careful attention even for less GNU-blessed systems like Windows.
One solution is to change R_LIBS environment variable. You can see for example this question.
But if you don't have admin rights, you can specify the location when you load the package:
library(my_package, lib.loc="my/path")
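For the R_LIBS route, here is a minimal sketch of setting the variable from a POSIX shell before starting R. The helper name set_personal_r_library and the library path are assumptions for illustration. Note that the entries are separated by colons on Unix-alikes, while Windows uses semicolons and would normally set the variable through the system settings or an .Renviron file instead.

```shell
# Hypothetical helper: create a personal library and put it first in R_LIBS.
# Uses the Unix-style colon separator; Windows uses semicolons instead.
set_personal_r_library() {
    lib="$1"
    # Make sure the library folder exists before R is started.
    mkdir -p "$lib" || return 1
    if [ -n "${R_LIBS:-}" ]; then
        # Prepend so the personal library is searched before existing entries.
        export R_LIBS="$lib:$R_LIBS"
    else
        export R_LIBS="$lib"
    fi
}
```

As far as I know this adds a library in front of the default one rather than removing the default from .libPaths().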
