When should a Julia project have a Manifest AND Project file? - julia

I am trying to understand when a Julia project needs a Manifest AND Project file vs. when it just needs a Project file. What are the different situations that warrant each case? I am trying to make sure my own project is set up correctly (it currently has both files).

The Manifest.toml is a snapshot of the exact state of a Julia environment. It specifies all packages that are installed in the environment with version numbers - not just the ones that have been ] added but the entire dependency graph!
The Project.toml on the other hand just lists the direct dependencies, that is the packages that have been ] added explicitly, potentially with version bounds specified in a [compat] section.
By checking in both files (specifically the Manifest.toml), you make your project reproducible. Another user only has to ] instantiate and will have the exact same environment that you had when working on the project. This is great for application projects, which might consist of multiple Julia scripts that are not intended for use by other Julia projects.
If you only check in the Project.toml you are specifying the dependency information more loosely and leave room for Julia's resolver to find appropriate package versions for all dependencies. This is what you should do when working on a Julia package, since people will likely want to install your package next to other packages, and overly restricting the versions of dependencies would make your package incompatible with them.
Hence, I'd summarize as follows:
Application / "Project" -> Project.toml + Manifest.toml
Julia Package -> Only Project.toml
For more on applications and packages, check out the glossary of the Pkg.jl documentation.
(Note that there are exceptional cases (unregistered dependencies, for example) where you might have to check in a Manifest.toml for a Julia package.)

In Julia 1.2 and above, you can have nested Project.toml files to express test-specific dependencies. Since you may have a Project.toml in your test folder, which you would need to activate, I would also suggest including a Manifest.toml as a record of the exact environment under which you know your package's tests pass.
In other words, I believe in the package/application dichotomy mentioned in crstnbr's answer, and the recommendation to include Manifest.toml with applications, and I would further say that the tests within a package are like an application. The same goes for performance benchmarks that you might have in your package.
I haven't practiced this myself, but it seems like it would be nice to have the CI tests run under both the "frozen" version of the test/Manifest.toml, and the latest versions that the package manager can find of each package. If the tests start failing, it would be easier to tease apart whether the breakage is caused by a change in a dependency.

Related

Determining which R packages, and dependencies, use DLL files

I work in a corporate environment that uses Microsoft Windows Defender Application Control (WDAC) to provide security. This blocks unsigned EXE and DLL files from being installed on devices, so R packages which use DLLs fail to install. The workaround is to provide an R installation from an approved central source, which also copies a default set of packages, such as tidyverse, data.table etc., into the R library. Users can continue to install additional packages that are built with native R, but run into issues if they try to install, build from source, or update packages that contain DLL files.
Is there a way to check whether a package uses DLL files in advance of installation?
Output something like:
check_dll(foo)
result: "This package and its dependencies have no DLL files. You can install this package"
check_dll(bar)
result: "bar does not have any DLL files, but one dependency, OOF, uses DLL files.
You have already have a version of OOF installed so it should be safe to install bar"
check_dll(foobar)
result: "foobar has a DLL. Do not attempt to install foobar".
check_dll(RABOOF)
result: "RABOOF does not have any DLL files, but one of it's dependencies,
foobar, does have a DLL file. Do not attempt to install RABOOF".
tools::package_dependencies() will list the package dependencies, but nothing else.
Downloading the zip file from CRAN and inspecting it for a libs/x64 folder with contents will work, but seems a heavyweight approach. Theoretically if a package has lots of dependencies this could result in downloading a lot of files unnecessarily.
Look for the NeedsCompilation field in the DESCRIPTION file. If it is "yes", there will be a DLL. If it is "no", there probably won't be. (If it is not there, the package wasn't built properly, so all bets are off.)
The test is not perfect, because packages can put DLLs into the inst folder to get them installed without compiling them, though CRAN isn't supposed to allow that: "Source packages may not contain any form of binary executable code." Packages like pak (mentioned in the comments) may nevertheless be allowed to get around this rule, e.g. by downloading binaries. You will also need to put together a blacklist of packages that will fail your WDAC tests even though they claim not to need compilation, containing pak and others like it.
The NeedsCompilation field is included as a column of the result of available.packages(), so it is very easy to access without trying to install the package.
I have accepted the answer from user2554330 as the best solution. It makes use of the normal set of commands used for package management; and the matrix generated by available.packages() can be passed to tools::package_dependencies(), removing the need for multiple internet queries.
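In code, that combination might look roughly like the sketch below. check_dll() is the hypothetical helper named in the question, not an existing function, and the messages are simplified:
## Sketch of the accepted approach: combine the NeedsCompilation column of
## available.packages() with tools::package_dependencies(), using a single
## repository query. check_dll() is a hypothetical name from the question.
check_dll <- function(pkg) {
  db <- available.packages()
  deps <- tools::package_dependencies(pkg, db = db, recursive = TRUE)[[pkg]]
  pkgs <- intersect(c(pkg, deps), rownames(db))   # base packages are not in db
  compiled <- pkgs[db[pkgs, "NeedsCompilation"] %in% "yes"]
  if (length(compiled) == 0) {
    message(pkg, " and its CRAN dependencies declare no compiled code.")
  } else {
    message(pkg, " pulls in compiled code via: ", paste(compiled, collapse = ", "))
  }
  invisible(compiled)
}
check_dll("data.table")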
For completeness I am documenting another possible solution. A script could query the unofficial CRAN Github mirror https://docs.r-hub.io/#cranatgh and look for a /src directory in each package project.

Packages in default project

I'm new to Julia.
Situation) I made a local project (e.g., by Pkg.activate(".")) and used a package which is installed in the default project but not in the local one, i.e., using package_installed_only_in_the_default_one.
So I was confused.
Question) Are the packages in the default project shared and can be used in other projects?
The short answer to your question is yes, if you use the default value of the LOAD_PATH variable.
Here is a more detailed explanation.
What you ask for is managed by the LOAD_PATH variable which specifies what using and import statements consider as project environments or package directories when loading code.
As you can read in the Julia manual entry on Environment stacks you have the following rules:
There are a couple of noteworthy features of this design:
The primary environment—i.e. the first environment in a stack—is faithfully embedded in a stacked environment. The full dependency graph of the first environment in a stack is guaranteed to be included intact in the stacked environment including the same versions of all dependencies.
Packages in non-primary environments can end up using incompatible versions of their dependencies even if their own environments are entirely compatible. This can happen when one of their dependencies is shadowed by a version in an earlier environment in the stack (either by graph or path, or both).
Now, by default LOAD_PATH has the value ["@", "@v#.#", "@stdlib"], which means that:
@ refers to the "current active environment" (this is the primary environment mentioned above that you have activated)
@v#.# refers to the appropriate environment in the ~/.julia/environments/ folder. The # characters are replaced with the major and minor components of the Julia version number.
@stdlib expands to the absolute path of the current Julia installation's standard library directory.

Difference between tests/ and inst/tests/ for R packages

I'm developing a new package and I have some unit tests to write. What's the difference between tests/ and inst/tests/? What kinds of stuff should go in each?
In particular, I see in http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf that Hadley recommends using inst/tests/ "so users also have access to them," then putting a reference in tests/ to run them all. But why not just put them all in tests/?
What @hadley means is that in binary packages, tests/ is not present; it is only in the source package. The convention is that anything in inst/ is copied to the package top-level directory upon installation, so inst/tests/ will be available as tests/ in the binary and installed package directory structure.
See my permute package as an example. I used @hadley's testthat package as a learning experience and for my package tests. The package is on CRAN. Grab the source tarball and notice that it has both tests/ and inst/tests/, then grab the Windows binary and notice that it only has tests/, and that is a copy of the one from inst/tests in the sources.
Strictly, only tests/ is run by R CMD check etc., so during development, and as checks in production packages, you need code in tests/ to test that the package does what it claims, or to run other unit tests. You can of course have code in tests/ that runs R scripts in inst/tests/ which actually do the tests, and this has the side effect of also making the test code available to package users.
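A minimal sketch of that driver pattern, assuming testthat and a placeholder package name (the file name tests/run-all.R is also just an example, not taken from the packages mentioned here):
## tests/run-all.R -- hypothetical driver script
library(testthat)
library(yourpackage)  # replace with the package under test
## After installation, inst/tests/ ends up as tests/ inside the installed
## package, so system.file() finds the same scripts that users can read.
test_dir(system.file("tests", package = "yourpackage"))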
The way I see things is that you need at least tests/; whether you need inst/tests/ will depend upon how you want to develop your package and what unit-testing code/packages you are using. The inst/tests/ location is something @hadley advocates, but it is far from being a standard across much of CRAN.
As 'Writing R Extensions' says, only inst/tests gets installed. So only those can be used for tests for both the source versions of the package, and its binary (installed) form.
Otherwise tests/ is of course there for the usual R CMD check tests. Now, Martin Maechler once developed a 'hook' script from tests/ to use inst/tests, and I have been using that in a few of my packages, permitting them to be invoked when a user looks at the source, as well as after the fact. So you can in fact pivot out of the one set of tests into the other, and get the best of both worlds.
Edit: Here is a link to what our RProtoBuf package does to invoke inst/tests/ tests from tests/: https://github.com/eddelbuettel/rprotobuf/blob/master/tests/runUnitTests.R And the basic idea is due to Martin as I said.

Generating dependency graph of Autotools packages

I have a software system having 30+ Open Source packages, most of them using GNU Autotools suite.
Are there tools to automatically generate package-to-package dependency graph? I.e. I'd like to see something like gst-plugins-good -> gst-plugins-base -> gstreamer -> glib.
I don't think so, but you could probably whip something together with this knowledge:
Scan the file named either configure.ac or configure.in in the package's root directory.
Look for a string of the form PKG_CHECK_MODULES([...],[...]...)
The second argument of that macro consists of package requirements of the form package or package >= version separated by whitespace.
The requirement string might not be the same as the package tarball name; a tarball that contains package.pc or package.pc.in provides the package package.
This only works for dependencies that use pkg-config. Some don't and you'll need to keep track of those dependencies by hand.
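If you did want to script that scan, a rough sketch (written in R only to stay consistent with the rest of this page; the regular expression is deliberately naive and will miss multi-line or variable-based invocations) could look like this:
## Extract the requirements string (second argument) of PKG_CHECK_MODULES calls
## from a configure.ac / configure.in. Naive regex; treat the output as a hint.
extract_pkg_requirements <- function(path = "configure.ac") {
  txt <- paste(readLines(path, warn = FALSE), collapse = " ")
  calls <- regmatches(txt, gregexpr("PKG_CHECK_MODULES\\(\\s*\\[?[^],]+\\]?\\s*,\\s*\\[?[^])]*", txt))[[1]]
  trimws(sub("^PKG_CHECK_MODULES\\(\\s*\\[?[^],]+\\]?\\s*,\\s*\\[?", "", calls))
}
extract_pkg_requirements("gst-plugins-base/configure.ac")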
Probably not, because this is a hard problem. If there were only one way to build a package, it might not be too bad, but in general this isn't the case. You have the --enable-foo and --with-foo options that you can pass to configure. Those are sometimes package dependent as well, requiring more packages. Most Linux distros (I think, but am not completely sure) maintain this sort of dependency list for yum or zypper or apt or whatever the package manager is by hand, and only one layer deep, leaving it up to the package manager to traverse the graph. The packages for the distro are only built one way. It's also not unusual for these lists to be broken.

Dependency management in R

Does R have a dependency management tool to facilitate project-specific dependencies? I'm looking for something akin to Java's maven, Ruby's bundler, Python's virtualenv, Node's npm, etc.
I'm aware of the "Depends" clause in the DESCRIPTION file, as well as the R_LIBS facility, but these don't seem to work in concert to provide a solution to some very common workflows.
I'd essentially like to be able to check out a project and run a single command to build and test the project. The command should install any required packages into a project-specific library without affecting the global R installation. E.g.:
my_project/.Rlibs/*
Unfortunately, Depends: within the DESCRIPTION file is all you get, for the following reasons:
R itself is reasonably cross-platform, but that means we need this to work across platforms and OSs
Encoding Depends: beyond R packages requires encoding the Depends in a portable manner across operating systems---good luck encoding even something simple such as 'a PNG graphics library' in a way that can be resolved unambiguously across systems
Windows does not have a package manager
AFAIK OS X does not have a package manager that mixes what Apple ships and what other Open Source projects provide
Even among Linux distributions, you do not get consistency: just take RStudio as an example which comes in two packages (which all provide their dependencies!) for RedHat/Fedora and Debian/Ubuntu
This is a hard problem.
The packrat package is precisely meant to achieve the following:
install any required packages into a project-specific library without affecting the global R installation
It allows installing different versions of the same packages in different project-local package libraries.
I am adding this answer even though this question is 5 years old, because this solution apparently didn't exist yet at the time the question was asked (as far as I can tell, packrat first appeared on CRAN in 2014).
Update (November 2019)
The new R package renv replaced packrat.
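For context, a minimal renv workflow looks roughly like this (renv::init/snapshot/restore are the real entry points; the package names are placeholders):
install.packages("renv")
renv::init()                    # create a project-local library plus renv.lock
install.packages("data.table")  # goes into the project library, not the global one
renv::snapshot()                # record exact versions in renv.lock
## later, on a fresh checkout of the project:
renv::restore()                 # reinstall the versions recorded in renv.lock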
As a stop-gap, I've written a new rbundler package. It installs project dependencies into a project-specific subdirectory (e.g. <PROJECT>/.Rbundle), allowing the user to avoid using global libraries.
rbundler on Github
rbundler on CRAN
We've been using rbundler at Opower for a few months now and have seen a huge improvement in developer workflow, testability, and maintainability of internal packages. Combined with our internal package repository, we have been able to stabilize development of a dozen or so packages for use in production applications.
A common workflow:
Check out a project from github
cd into the project directory
Fire up R
From the R console:
library(rbundler)
bundle('.')
All dependencies will be installed into ./.Rbundle, and an .Renviron file will be created with the following contents:
R_LIBS_USER='.Rbundle'
Any R operations run from within this project directory will adhere to the project-specific library and package dependencies. Note that, while this method uses the package DESCRIPTION file to define dependencies, the project needn't have an actual package structure. Thus, rbundler becomes a general tool for managing an R project, whether it be a simple script or a full-blown package.
You could use the following workflow:
1) create a script file which contains everything you want to set up, and store it in your project directory as, e.g., projectInit.R
2) source this script from your .Rprofile (or any other file executed by R at startup) with a try statement
try(source("./projectInit.R"), silent=TRUE)
This guarantees that even when no projectInit.R is found, R starts without an error message
3) if you start R in your project directory, the projectInit.R file will be sourced if present in the directory and you are ready to go
This is from a Linux perspective, but it should work in the same way under Windows and macOS as well.
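For illustration, a projectInit.R along these lines could set up a project-local library similar to the .Rlibs directory from the question; the directory name and the dependency list are placeholders:
## projectInit.R -- hypothetical per-project startup script
local({
  lib <- file.path(getwd(), ".Rlibs")   # project-specific library
  if (!dir.exists(lib)) dir.create(lib)
  .libPaths(c(lib, .libPaths()))        # search the project library first
  deps <- c("data.table", "ggplot2")    # placeholder dependency list
  missing <- deps[!vapply(deps, requireNamespace, logical(1), quietly = TRUE)]
  if (length(missing)) install.packages(missing, lib = lib)
})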
