Packages in default project - julia

I'm new to Julia.
Situation) I made a (local project, e.g., by Pkg.activate(".") and use a package which is installed in default project but not in the local one, i.e., using package_installed_only_in_the_default_one.
So I was confused.
Question) Are the packages in the default project shared and can be used in other projects?

The short answer to your question is yes, if you use a default value of LOAD_PATH variable.
Here is a more detailed explanation.
What you ask for is managed by the LOAD_PATH variable which specifies what using and import statements consider as project environments or package directories when loading code.
As you can read in the Julia manual entry on Environment stacks you have the following rules:
There are a couple of noteworthy features of this design:
The primary environment—i.e. the first environment in a stack—is faithfully embedded in a stacked environment. The full dependency graph of the first environment in a stack is guaranteed to be included intact in the stacked environment including the same versions of all dependencies.
Packages in non-primary environments can end up using incompatible versions of their dependencies even if their own environments are entirely compatible. This can happen when one of their dependencies is shadowed by a version in an earlier environment in the stack (either by graph or path, or both).
Now by default LOAD_PATH has value ["#", "#v#.#", "#stdlib"] which means that:
# refers to the "current active environment" (this is the primary environment mentioned above that you have activated)
#v#.# refers to the appropriate environment in ~/.julia/environments/ folder. The # characters, are replaced with the major and minor components of the Julia version number.
#stdlib expands to the absolute path of the current Julia installation's standard library directory.

Related

is it possible to create new virtual environment in Julia using environment file (like .yml)?

is there a way to setup a virtual environment in Julia using an environment file? (for instance like .yml file for creating conda venv)
Julia Version: 1.7.1
OS: Windows 10
In Julia virtual environment is defined via Project.toml file (that keeps package names and their acceptable versions) and Manifest.toml (that keeps the exact dependence tree and package versions generated along requirements defined in Project.toml).
Here is a sample Julia session:
julia> using Pkg
julia> pkg"generate MyProject"
Generating project MyProject:
MyProject/Project.toml
MyProject/src/MyProject.jl
julia> cd("MyProject")
julia> pkg"activate ."
Activating environment at `/home/ubuntu/MyProject/Project.toml`
Finally, note that you can manipulate the Project.toml by eg. adding a package like this (this assumes the environment is active):
pkg"add DataFrames"
Sometimes you want to provide package version information to your Project.toml, for an example you could add at the end of the file:
[compat]
DataFrames = "1.3.0"
After adding the first dependency the Mainifest.toml file has been generated. Copying this file together with Project.toml across machines allows you to duplicate the environment.
In order to have all packages installed on a new machine you will need to run:
pkg"activate ."
pkg"instatiate"
pkg"instatiate" can also be used to generate Mainfest.toml when only the Project.toml is present.
The nice thing is that Julia can store many package versions simultaneously and the virtual environments only links to the central package repository (contrary to Python that copies several GBs of data each time).
pkg provides this with project.toml files.
open julia in console
julia>]
Pkg> activate <Name>
<Name> Pkg> add <PackageName>
create directory with

How stop RStudio from creating empty "R" folder within "/home" directory at every startup

After having set the path for the default working directory as well as my first (and only) project within RStudio options I wonder why RStudio keeps creating an empty folder named "R" within my "/home" directory every time it is started.
Is there any file I could delete/edit (eventually create) to stop this annoying behaviour and if so, where is it located ?
System: Linux Mint v. 19.3
Software: RStudio v. 1.3.959 / R version 3.4.4
Thanks in advance for any hints.
Yes, you can prevent the creation of the R directory — R is configurable via a set of environment variables.
However, setting these correctly isn’t trivial. The first issue is that many R packages are sensitive to the R version they’re installed with. If you upgrade R and try to load the existing package, it may break. Therefore, the R package library path should be specific to the R version.
On clusters, an additional issue is that the same library path might be read by various cluster nodes that run on different architectures; this is rare, but it happens. In such cases, compiled R packages might need to be different depending on the architecture.
Consequently, in general the R library path needs to be specific both to the R version and the system architecture.
Next, even if you configure an alternative path R will silently ignore it if it doesn’t exist. So be sure to manually create the directory that you’ve configured.
Lastly, where to put this configuration? One option would be to put it into the user environment file, the path of which can be specified with the environment variable R_ENVIRON_USER — it defaults to $HOME/.Renviron. This isn’t ideal though, because it means the user can’t temporarily override this setting when calling R: variables in this file override the calling environment.
Instead, I recommend setting this in the user profile (e.g. $HOME/.profile). However, when you use a desktop launcher to launch your RStudio, this file won’t be read, so be sure to edit your *.desktop file accordingly.1
So in sum, add the following to your $HOME/.profile:
export R_LIBS_USER=${XDG_DATA_HOME:-$HOME/.local/share}/R/%p-library/%v
And make sure this directory exists: re-source ~/.profile (launching a new shell inside the current one is not enough), and execute
mkdir -p "$(Rscript -e 'cat(Sys.getenv("R_LIBS_USER"))')"
The above is using the XDG base dir specification, which is the de-facto standard on Linux systems.2 The path is using the placeholders %p and %v. R will fill these in with the system platform and the R version (in the form major.minor), respectively.
If you want to use a custom R configuration file (“user profile”) and/or R environment file, I suggest setting their location in the same way, by configuring R_PROFILE_USER and R_ENVIRON_USER (since their default location, once again, is in the user home directory):
export R_PROFILE_USER=${XDG_CONFIG_HOME:-$HOME/.config}/R/rprofile
export R_ENVIRON_USER=${XDG_CONFIG_HOME:-$HOME/.config}/R/renviron
1 I don’t have a Linux desktop system but I believe that editing the Env entry to the following should do it:
Exec=env R_LIBS_USER=${XDG_DATA_HOME:-$HOME/.local/share}/R/%p-library/%v /path/to/rstudio
2 Other systems require different handling. On macOS, the canonical setting for the library location would be $HOME/Library/Application Support/R/library/%v. However, setting environment variables on macOS for GUI applications is frustratingly complicated.
On Windows, the canonical location is %LOCALAPPDATA%/R/library/%v. To set this variable, use [Environment]::SetEnvironmentVariable in PowerShell or, when using cmd.exe, use setx.

When should a Julia project have a Manifest AND Project file?

I am trying to understand when a Julia Project needs a Manifest AND Project file vs when it just needs a project file. What are the different situations that warrant each case? I am trying to make sure my own project is set up correctly(It has both files currently).
The Manifest.toml is a snapshot of the exact state of a Julia environment. It specifies all packages that are installed in the environment with version numbers - not just the ones that have been ] added but the entire dependency graph!
The Project.toml on the other hand just lists the direct dependencies, that is the packages that have been ] added explicitly, potentially with version bounds specified in a [compat] section.
By checking in both files (specifically the Manifest.toml), you make your project reproducible. Another user only has to ] instantiate and will have the exact same environment that you had when working on the project. This is great for application projects which might consist of multiple Julia scripts which are not intended for use by other Julia projects.
If you only check in the Project.toml you are specifying the dependency information more loosely and will leave room for Julias resolver to find appropriate package versions for all dependencies. This is what you should do when working on a Julia package since people will likely want to install your package next to other packages and overly restricting versions of dependencies will make your package incompatible.
Hence, I'd summarize as follows:
Application / "Project" -> Project.toml + Manifest.toml
Julia Package -> Only Project.toml
For more on applications and packages checkout the glossary of the Pkg.jl documentation.
(Note that there are exceptional cases (unregistered dependencies, for example) where you might have to check in a Manifest.toml for a Julia package.)
In Julia 1.2 and above, you can have nested Project.toml files to express test-specific dependencies. Since you may have a Project.toml in your test folder, which you would need to activate, I would also suggest including a Manifest.toml, as a record of under which environment you for-sure know your package's test are passing.
In other words, I believe in the package/application dichotomy mentioned in crstnbr's answer, and the recommendation to include Manifest.toml with applications, and I would further say that the tests within a package are like an application. The same goes for performance benchmarks that you might have in your package.
I haven't practiced this myself, but it seems like it would be nice to have the CI tests run under both the "frozen" version of the test/Manifest.toml, and the latest versions that the package manager can find of each package. If the tests start failing, it would be easier to tease apart whether the breakage is caused by a change in a dependency.

Is an action necessary when Sys.getenv and .libPaths return different folders for library location?

Is an action necessary when Sys.getenv and .libPaths return different folders for library location?
Sys.getenv("R_LIBS_USER") # value of the environment variable R_LIBS_USER
[1] "C:\\Users\\User\\Documents/R/win-library/3.1"
.libPaths() # the library trees within which packages are looked for
[1] "C:/Revolution/R-Enterprise-7.3/R-3.1.1/library"
Looking at the help files of the functions, I added the above comments.
In such a case where there is a difference for the library location, is there a side effect in case no action is done?
Is there anything that I have to do (like setting library location above to be same) as a must?
I think the answer is "no". I discovered that my values are different and it has been causing me no difficulties. I prefer to keep all my packages in one library and never install to the default value from Sys.getenv("R_LIBS_USER") which is in my Users/ volume. My current .libPaths() are (is) :
"/Library/Frameworks/R.framework/Versions/3.3/Resources/library"
The GUI I use gives me the option of using the R_LIBS_USER location but I do not choose to do so because it has lead to duplication and confusion on my part in the past. You can make other choices, possibly in the 'preferences" for your GUI or optionally in your .Rprofile settings (which a hidden "dotfile" on both Windows and Macs so you need to know how to make them visible on your OS if using the system browser/explorer.)

Dependency management in R

Does R have a dependency management tool to facilitate project-specific dependencies? I'm looking for something akin to Java's maven, Ruby's bundler, Python's virtualenv, Node's npm, etc.
I'm aware of the "Depends" clause in the DESCRIPTION file, as well as the R_LIBS facility, but these don't seem to work in concert to provide a solution to some very common workflows.
I'd essentially like to be able to check out a project and run a single command to build and test the project. The command should install any required packages into a project-specific library without affecting the global R installation. E.g.:
my_project/.Rlibs/*
Unfortunately, Depends: within the DESCRIPTION: file is all you get for the following reasons:
R itself is reasonably cross-platform, but that means we need this to work across platforms and OSs
Encoding Depends: beyond R packages requires encoding the Depends in a portable manner across operating systems---good luck encoding even something simple such as 'a PNG graphics library' in a way that can be resolved unambiguously across systems
Windows does not have a package manager
AFAIK OS X does not have a package manager that mixes what Apple ships and what other Open Source projects provide
Even among Linux distributions, you do not get consistency: just take RStudio as an example which comes in two packages (which all provide their dependencies!) for RedHat/Fedora and Debian/Ubuntu
This is a hard problem.
The packrat package is precisely meant to achieve the following:
install any required packages into a project-specific library without affecting the global R installation
It allows installing different versions of the same packages in different project-local package libraries.
I am adding this answer even though this question is 5 years old, because this solution apparently didn't exist yet at the time the question was asked (as far as I can tell, packrat first appeared on CRAN in 2014).
Update (November 2019)
The new R package renv replaced packrat.
As a stop-gap, I've written a new rbundler package. It installs project dependencies into a project-specific subdirectory (e.g. <PROJECT>/.Rbundle), allowing the user to avoid using global libraries.
rbundler on Github
rbundler on CRAN
We've been using rbundler at Opower for a few months now and have seen a huge improvement in developer workflow, testability, and maintainability of internal packages. Combined with our internal package repository, we have been able to stabilize development of a dozen or so packages for use in production applications.
A common workflow:
Check out a project from github
cd into the project directory
Fire up R
From the R console:
library(rbundler)
bundle('.')
All dependencies will be installed into ./.Rbundle, and an .Renviron file will be created with the following contents:
R_LIBS_USER='.Rbundle'
Any R operations run from within this project directory will adhere to the project-speciic library and package dependencies. Note that, while this method uses the package DESCRIPTION to define dependencies, it needn't have an actual package structure. Thus, rbundler becomes a general tool for managing an R project, whether it be a simple script or a full-blown package.
You could use the following workflow:
1) create a script file, which contains everything you want to setup and store it in your projectd directory as e.g. projectInit.R
2) source this script from your .Rprofile (or any other file executed by R at startup) with a try statement
try(source("./projectInit.R"), silent=TRUE)
This will guarantee that even when no projectInit.R is found, R starts without error message
3) if you start R in your project directory, the projectInit.R file will be sourced if present in the directory and you are ready to go
This is from a Linux perspective, but should work in the same way under windows and Mac as well.

Resources