What's the definition of Package in Julia-Lang? - julia

I have frequently used packages in Julia-lang; there are many articles that describe how to work with them, but I don't know the exact definition of a package.
EDIT
The following is a general definition from wiki:
Package (package management system), in which individual files or resources are packed together as a software collection that provides certain functionality as part of a larger system
I would like to know the particular view of packages that Julia-lang takes; e.g., look at this definition from wiki about a Java Package

I would say that a Julia package is a module (similar to a namespace in other languages) containing a collection of related functions that provide new functionality for Julia, and that will be useful for other people.
This definition is not entirely unambiguous, though. For example, I suggested recently that several image format packages could belong inside a single ImageFormats package, but the replies were that there was a good reason (code size and binary dependencies) for certain kinds of formats to be in separate packages.
If you follow the discussion of the pull requests for new packages on METADATA.jl, you will have a good idea about the community's feeling about what packages should be for / look like. My takeaway from following those discussions is that a more-or-less unified vision is starting to emerge.
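To make the first paragraph above concrete, here is a minimal sketch (the package name MyStats and its contents are hypothetical) of the top-level module a package provides:

module MyStats   # typically lives in MyStats/src/MyStats.jl

export mymean    # exported names form the package's public surface

"Arithmetic mean of a collection (illustrative only)."
mymean(xs) = sum(xs) / length(xs)

end # module

After using MyStats, the exported mymean is directly available, while anything unexported must be qualified as MyStats.name.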

Related

Common Lisp style: multiple packages in same repo

May I get recommendations or links to representative code repositories with good style for multiple related Common Lisp packages, please?
For instance, consider a high-level workflow library with accompanying lower-level API, each in its own CL package but same git repo due to synchronized releases.
Each system (*.asd file) isolates tests and may be invoked using:
(asdf:test-system foo :force t)
Separate systems may be built via make, which definitely helps isolate SBCL code-coverage reports.
Some users of the library may only want to load the lower-level API. To simplify dependencies for those using the higher-level API, it seems best to keep everything bundled in one repo. Any revision to one library would likely require updating all for the same release.
I currently have a single directory tree with a subdirectory for each CL package. There's a top-level Makefile plus one in each subdirectory that the library maintainer would use. The top level also contains symbolic links for .asd files pointing into relevant subdirectories. (It's a library deeply dependent upon POSIX calls via uiop-posix, so it's only applicable on an OS with symlinks.)
This seems to be an issue at large considering issue #1 for Quicklisp-docs [0].
Found nothing relevant in Google's CL style guide [1], State of the Common Lisp Ecosystem, 2015 [2], Edi's CL Recipes [3] or Lisp-lang [4]. Browsing repos shows quite a mix of styles.
[0] https://github.com/rudolfochrist/quicklisp-docs/issues/1
[1] https://google.github.io/styleguide/lispguide.xml
[2] https://web.archive.org/web/20160305061250/http://eudoxia.me/article/common-lisp-sotu-2015/
[3] http://weitz.de/cl-recipes/
[4] http://lisp-lang.org/learn/writing-libraries but see also their section on Continuous Integration
Repo to be fixed: https://gitlab.com/dpezely/cl-mmap
(commit a23bd88d of 2018-07-14; release will be tagged when fixed)
You could consider using ASDF's package-inferred-system. With that, you could have a mmap/high package that depends on a mmap/low package. With that setup, you can actually ask Quicklisp to load either of them directly:
(ql:quickload "mmap/high")
or
(ql:quickload "mmap/low")
You can see an example in my cl-bulk repo.
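For illustration, a minimal sketch of such a setup (the file layout and the map-file symbol are hypothetical):

;; mmap.asd
(asdf:defsystem "mmap"
  :class :package-inferred-system
  :depends-on ("mmap/high"))

;; high.lisp -- one file, one package; system dependencies are
;; inferred from the define-package form
(uiop:define-package :mmap/high
  (:use :cl :mmap/low)
  (:export #:map-file))

With this layout, (ql:quickload "mmap/low") pulls in only the lower-level system.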
To reach a specific audience that might not have seen the question here, it was also posted to the Common Lisp Pro mailing list.
A summary of the various responses: apart from great insights into various possible future directions, there is no de facto convention, mechanism or style for addressing the combination of factors:
dependency synchronization across interrelated libraries/packages/systems
accommodating loading each individually
accommodating testing each individually
accommodating code-coverage reports of each individually
At the time of adding this answer, the closest to a consistent, concrete, existing solution seems to align with what had already been implemented by the package mentioned in the original post, or close enough. (There are of course subtle design and naming differences, as indicated by the earlier answer here, but I see these variations as comparable.)
Highlights of packages and systems suggested as examples:
An early implementation of CLIM (predating McCLIM) for its separation of API versus implementation
Despite conventional use of ASDF systems and packages, explore how UIOP within ASDF itself is structured
ASDF and LIL import and reexport all symbols in a directory; see Faré's full summary
Future directions and suggested reading included:
Consider that intersection to be a software engineering question, and construct accordingly, because "In short: it depends!"
Modules of Racket or Gerbil Scheme
Perhaps internal updates to Google's CL style guide have added something relevant?
(Much thanks to Pascal J. Bourguignon, Ken Tilton, Scott McKay, Faré, Svante v. Erichsen, Don Morrison and Pascal Costanza for participating in the email thread.)

Why do many Common Lisp systems use a single packages.lisp file?

Many of the commonly used libraries I see use a single packages.lisp file to declare all of the packages in the library (system) in one place.
Since exported symbols are part of the package definition, this means individual source files don't list their exported symbols.
In my own projects, I prefer the style of defining one source file per package, and defining its interface/exports at the top of the file.
I am wondering if I am doing it wrong, or missing an essential concept that leads to the preference for a single packages.lisp file.
In case it's relevant, I'm also using ASDF's :package-inferred-system approach, and uiop:define-package instead of defpackage, to make use of its handy symbol-shadowing :mix feature -- because I haven't figured out how to :use a package which shadows built-in symbols without re-declaring the shadows in each package that uses it.
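For concreteness, the kind of definition I mean looks roughly like this (package names hypothetical):

(uiop:define-package :my-app
  (:mix :shadowing-lib :cl)   ; on symbol conflicts, packages listed earlier win
  (:export #:run))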
Traditional view of using Packages
Sometimes a package (a symbol namespace) is used in tens or even hundreds of files, and package definitions might need to be computed. This can be simpler in one file, and it is easier to get a textual overview when the package declaration(s) are in one place. For example, the implementation of a text editor could be in just one package using around 100 files. Notice also that a package is really a runtime data structure in Common Lisp, with a programmer interface.
Influence from other languages like Java
The style of having one package and one corresponding source file often comes from outside of Common Lisp, from languages which usually have a correspondence of something like one class = one namespace = one file. This leads to a multitude of files, in nested directories, with often small pieces of code per file.
Common Lisp does not have these restrictions: methods are not organized in classes, classes are not namespaces, files can have any mix of definitions, ... The Lisp libraries tend to have large packages/namespaces and large files.
Structure and its limitations
Packages
Packages in Common Lisp are namespaces for symbols. This is not a full-blown module system, and there is no actual information hiding. There is a distinction between symbols and exported symbols. Note also that symbols have multiple meanings as a name (variable, function/macro/special operator, class name, slot name, type, data object, package name, ...) and exporting a symbol from a package does not make a distinction between those meanings. Note, too, that it is possible, but not recommended, to use more than one package namespace in a file (for example by using multiple in-package forms). Typically one would have only one namespace per file, usually declared by an in-package somewhere at the top of the file.
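A small sketch (hypothetical package name) of how exporting relates to visibility:

(defpackage :demo
  (:use :cl)
  (:export #:greet))            ; greet is the published symbol

(in-package :demo)
(defun greet () (write-line "hello"))
(defun helper () :internal)     ; defined, but not exported

From another package, demo:greet works, and demo::helper is still reachable with the double colon; exporting is a convention rather than information hiding.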
Classes
Classes are not namespaces. Typically classes are defined using the built-in Common Lisp Object System, called CLOS. They bundle slots and are used for dispatching in CLOS generic functions. But they don't contain their methods.
Systems
Systems are not a language construct. They are an extension to Common Lisp. The idea of a system is provided by tools like ASDF. A system is a tool for organizing a bunch of files which make up a library/application and their dependencies. It also provides functionality to run actions on a system (compiling, loading, delivering, ...).
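A minimal sketch of a system definition (names hypothetical):

;; editor.asd
(asdf:defsystem "editor"
  :depends-on ("alexandria")
  :components ((:file "package")
               (:file "buffer" :depends-on ("package"))))

(asdf:load-system "editor") then compiles and loads the files in dependency order.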
???
There might be something missing for better organization of code. Each project might need a slightly different way.
I would typically use files to put related functionality into one file and set up a bunch of files for a system, if needed. But that could mean that a file implements more than one class and a lot of functionality. I tend to organize files in sections where directly related code is implemented. I might describe a few elements (classes, functions) at the top of the file, but that would be more of a local overview and less a list of exported symbols. Once a system and its files are loaded into a Lisp with its IDE, it is the purpose of the development environment to let you query for code (where is? who uses? what is used? sub/super class? package contents? ...) and to provide browsers for that.
There are alternative ways to organize code, for example by using PROVIDE and REQUIRE, which are only very lightly described in the language standard. These tend to pull in functionality on demand and create package structures on the go.
There might be the need for something like object-oriented protocols, which provide more structure for CLOS.
Early use of packages and systems in the MIT Lisp Machine operating system
One of the early demands for packages to avoid name clashes and systems to organize code came from the Lisp Machine operating system developed at MIT in the late 70s and the 80s. It was developed in Lisp Machine Lisp and later in Common Lisp. There, a lot of different functionality (compiler, editor, listener, inspector, mail client, graphics library, file system, serial interface, ethernet interface, networking code, ...) was running in a single address space with probably tens of thousands of symbols. The package declaration was usually done in the corresponding system description file (here we again use the Lisp meaning of a system as a library or an application, a collection of files for a certain purpose) or sometimes in a separate file. Packages provided the namespace for large libraries or even entire applications. Thus files, packages, systems and even classes (earlier called Flavors) were already used then to structure software implementations. See the Lisp Machine Manual (version from 1984), Chapter 29, Maintaining Large Systems and Kent Pitman's MIT AI Memo 801 from 1984: The Description of Large Systems. Systems on the Lisp Machine had versions and supported patching (incremental versioned changes). Files had versions, too.

Documentation tool for Ada software

Brief: I'm looking for some kind of tool to produce a software description from the comments in existing software source code.
In more detail: I've got existing source code written in Ada. Changes need to be made to this source code and I also need to generate a document containing a description of the software as a whole and all of its packages, routines etc. (if possible as PDF). For the existing routines these source code comments already exist and contain sufficient detail for my needs.
The description shall include at least
overall software design
textual description of packages, routines, variables, constants etc.
call and caller graphs
For projects based on C I'd do this using Doxygen. Doxygen itself, however, does not cope with software written in Ada. My thought was to (automatically) convert existing comments in the source code so that Doxygen can read these. The conversion itself was no problem (using Doxygen's filter mechanism), but as keywords and syntax between C and Ada differ a lot, this did not produce any usable output.
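For reference, the filter hookup looked roughly like this (ada2dox stands for my hypothetical conversion script):

# Doxyfile excerpt
FILTER_PATTERNS   = *.ads=ada2dox *.adb=ada2dox
EXTENSION_MAPPING = ads=C++ adb=C++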
I then had a look at Understand from SciTools. While this analyses the software in good detail and generates nice metrics, I was not able to get anything out of it that resembles a document with what I need.
I want to avoid (manually) writing a separate document, but instead would like to generate this from the code base. I will have to put all the necessary information (perhaps with the exception of a general overview) there anyhow, so why not use it for documentation purposes as well.
Is there any tool that is able to do what I need?
There's a tool called "AdaDoc", which seems to do a part of what you're asking for. You can of course use "a2ps" for the textual part of your needs (I like that better than what AdaDoc generates).
There are several UML tools ("Umbrello" is one name I remember), which offer to create graphs of inter-package relations, but for a seriously sized project, the best option is to use the original design documents, and simply verify that the source text actually matches that design.
For languages not supported by Doxygen, I've written my own "general purpose" filter.
It's very basic, but useful for me.
https://github.com/malkev/doxphp

Bilingual (English and Portuguese) documentation in an R package

I am writing a package to facilitate importing Brazilian socio-economic microdata sets (Census, PNAD, etc).
I foresee two distinct groups of users of the package:
Users in Brazil, who may feel more at ease with the documentation in Portuguese. They probably can understand English to some extent, but a foreign language would probably make the package feel less "ergonomic".
The broader international user community, for whom English documentation may be a necessary condition.
Is it possible to write a package in a way that the documentation is "bilingual" (English and Portuguese), and that the language shown to the user will depend on their country/language settings?
Also,
Is that doable within the roxygen2 documentation framework?
I realise there is a tradeoff between making the package more user-friendly by making it bilingual vs. the increased complexity and difficulty of maintenance. General comments on this tradeoff from previous experience are also welcome.
EDIT: following the comment's suggestion I cross-posted to the r-package-devel mailing list HERE; follow the answers at the bottom. Duncan Murdoch posted an interesting answer covering some of what @Brandon's answer (below) covers, but also including two additional suggestions that I think are useful:
have the package in one language, but vignettes for the different languages (a sketch follows this list). I will follow this advice.
have two versions of the package, say 1.1 and 1.2, one in each language
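A sketch of a Portuguese vignette header under the first suggestion (title text hypothetical):

---
title: "Introdução ao pacote"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introdução ao pacote (Português)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---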
According to Ropensci, there is no standard mechanism for translating package documentation into non-English languages. They describe the typical process of internationalization/localization as follows:
To create non-English documentation requires manual creation of supplemental .Rd files or package vignettes.
Packages supplying non-English documentation should include a Language field in the DESCRIPTION file.
And some more info on the Language field:
A ‘Language’ field can be used to indicate if the package documentation is not in English: this should be a comma-separated list of standard (not private use or grandfathered) IETF language tags as currently defined by RFC 5646 (https://www.rfc-editor.org/rfc/rfc5646, see also https://en.wikipedia.org/wiki/IETF_language_tag), i.e., use language subtags which in essence are 2-letter ISO 639-1 (https://en.wikipedia.org/wiki/ISO_639-1) or 3-letter ISO 639-3 (https://en.wikipedia.org/wiki/ISO_639-3) language codes.
Care is needed if your package contains non-ASCII text, and in particular if it is intended to be used in more than one locale. It is possible to mark the encoding used in the DESCRIPTION file and in .Rd files.
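So a bilingual package's DESCRIPTION might carry, for example:

Language: en, pt-BR
Encoding: UTF-8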
Regarding encoding...
First, consider carefully if you really need non-ASCII text. Many users of R will only be able to view correctly text in their native language group (e.g. Western European, Eastern European, Simplified Chinese) and ASCII. Other characters may not be rendered at all, rendered incorrectly, or cause your R code to give an error. For .Rd documentation, marking the encoding and including ASCII transliterations is likely to do a reasonable job. The set of characters which is commonly supported is wider than it used to be around 2000, but non-Latin alphabets (Greek, Russian, Georgian, …) are still often problematic and those with double-width characters (Chinese, Japanese, Korean) often need specialist fonts to render correctly.
On a related note, R does, however, provide support for "errors and warnings" in different languages - "There are mechanisms to translate the R- and C-level error and warning messages. These are only available if R is compiled with NLS support (which is requested by configure option --enable-nls, the default)."
Besides bilingual documentation, please allow me the following comment: given your two "target" groups, it may be assumed that some of your users will be running a non-English OS (typically, Windows in Portuguese). When importing time series data (or any date entries, as a matter of fact), due to different "date" formatting (English vs. non-English), you may get different "results" (i.e. misinterpreted date entries) when importing on English/non-English machines. I have some experience with those issues (I often work with Czech-language-based OSs) and, other than ad-hoc coding, I don't find a simple solution.
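By ad-hoc coding I mean things like always passing the date format explicitly rather than trusting locale defaults:

# parse with an explicit format instead of the locale's default
as.Date("12/03/2020", format = "%d/%m/%Y")
# or pin the time locale for the session
Sys.setlocale("LC_TIME", "C")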
(If you find this off-topic, please feel free to delete)

Writing robust R code: namespaces, masking and using the `::` operator

Short version
For those that don't want to read through my "case", this is the essence:
What is the recommended way of minimizing the chances of new packages breaking existing code, i.e. of making the code you write as robust as possible?
What is the recommended way of making the best use of the namespace mechanism when
a) just using contributed packages (say in just some R Analysis Project)?
b) with respect to developing own packages?
How best to avoid conflicts with respect to formal classes (mostly Reference Classes in my case) as there isn't even a namespace mechanism comparable to :: for classes (AFAIU)?
The way the R universe works
This is something that's been nagging in the back of my mind for about two years now, yet I don't feel as if I have come to a satisfying solution. Plus I feel it's getting worse.
We see an ever increasing number of packages on CRAN, github, R-Forge and the like, which is simply terrific.
In such a decentralized environment, it is natural that the code base that makes up R (let's say that's base R and contributed R, for simplicity) will deviate from an ideal state with respect to robustness: people follow different conventions, there's S3, S4, S4 Reference Classes, etc. Things can't be as "aligned" as they would be if there were a "central clearing instance" that enforced conventions. That's okay.
The problem
Given the above, it can be very hard to use R to write robust code. Not everything you need will be in base R. For certain projects you will end up loading quite a few contributed packages.
IMHO, the biggest issue in that respect is the way the namespace concept is put to use in R: R allows for simply writing the name of a certain function/method without explicitly requiring its namespace (i.e. foo vs. namespace::foo).
So for the sake of simplicity, that's what everyone is doing. But that way, name clashes, broken code and the need to rewrite/refactor your code are just a matter of time (or of the number of different packages loaded).
At best, you will know about which existing functions are masked/overloaded by a newly added package. At worst, you will have no clue until your code breaks.
A couple of examples:
try loading RMySQL and RSQLite at the same time, they don't go along very well
also RMongo will overwrite certain functions of RMySQL
forecast masks a lot of stuff with respect to ARIMA-related functions
R.utils even masks the base::parse routine
(I can't recall which functions in particular were causing the problems, but am willing to look it up again if there's interest)
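One way to at least see what an attach masks is base R's conflicts():

library(forecast)          # attaching prints masking messages
conflicts(detail = TRUE)   # lists masked objects across the whole search path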
Surprisingly, this doesn't seem to bother a lot of programmers out there. I tried to raise interest a couple of times at r-devel, to no significant avail.
Downsides of using the :: operator
Using the :: operator might significantly hurt efficiency in certain contexts as Dominick Samperi pointed out.
When developing your own package, you can't even use the :: operator throughout your own code, as your code is not a real package yet and thus there is no namespace yet. So I would have to initially stick to the foo way, build, test and then go back to changing everything to namespace::foo. Not really.
Possible solutions to avoid these problems
Reassign each function from each package to a variable that follows certain naming conventions, e.g. namespace..foo, in order to avoid the inefficiencies associated with namespace::foo (I outlined it once here; see the sketch after this list). Pros: it works. Cons: it's clumsy and you double the memory used.
Simulate a namespace when developing your package. AFAIU, this is not really possible, at least I was told so back then.
Make it mandatory to use namespace::foo. IMHO, that would be the best thing to do. Sure, we would lose some extent of simplicity, but then again the R universe just isn't simple anymore (at least it's not as simple as in the early 00's).
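A minimal sketch of the reassignment idea from the first item (names chosen for illustration):

stats..quantile <- stats::quantile   # resolve the function once
x <- rnorm(100)
stats..quantile(x, probs = 0.5)      # later calls skip the :: lookup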
And what about (formal) classes?
Apart from the aspects described above, the :: way works quite well for functions/methods. But what about class definitions?
Take package timeDate with its class timeDate. Say another package comes along which also has a class timeDate. I don't see how I could explicitly state that I would like a new instance of class timeDate from either of the two packages.
Something like this will not work:
new(timeDate::timeDate)
new("timeDate::timeDate")
new("timeDate", ns="timeDate")
That can be a huge problem as more and more people switch to an OOP-style for their R packages, leading to lots of class definitions. If there is a way to explicitly address the namespace of a class definition, I would very much appreciate a pointer!
Conclusion
Even though this was a bit lengthy, I hope I was able to point out the core problem/question and that I can raise more awareness here.
I think devtools and mvbutils do have some approaches that might be worth spreading, but I'm sure there's more to say.
GREAT question.
Validation
Writing robust, stable, and production-ready R code IS hard. You said: "Surprisingly, this doesn't seem to bother a lot of programmers out there". That's because most R programmers are not writing production code. They are performing one-off academic/research tasks. I would seriously question the skillset of any coder that claims that R is easy to put into production. Aside from my post on search/find mechanism which you have already linked to, I also wrote a post about the dangers of warning. The suggestions will help reduce complexity in your production code.
Tips for writing robust/production R code
Avoid packages that use Depends and favor packages that use Imports. A package with dependencies stuffed into Imports only is completely safe to use. If you absolutely must use a package that employs Depends, then email the author immediately after you call install.packages().
Here's what I tell authors: "Hi Author, I'm a fan of the XYZ package. I'd like to make a request. Could you move ABC and DEF from Depends to Imports in the next update? I cannot add your package to my own package's Imports until this happens. With R 2.14 enforcing NAMESPACE for every package, the general message from R Core is that packages should try to be "good citizens". If I have to load a Depends package, it adds a significant burden: I have to check for conflicts every time I take a dependency on a new package. With Imports, the package is free of side effects. I understand that you might break other people's packages by doing this. I think it's the right thing to do to demonstrate a commitment to Imports, and in the long run it will help people produce more robust R code."
Use importFrom. Don't add an entire package to Imports, add only those specific functions that you require. I accomplish this with Roxygen2 function documentation and roxygenize() which automatically generates the NAMESPACE file. In this way, you can import two packages that have conflicts where the conflicts aren't in the functions you actually need to use. Is this tedious? Only until it becomes a habit. The benefit: you can quickly identify all of your 3rd-party dependencies. That helps with...
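For example, with Roxygen2 the tag sits next to the function that needs it, and roxygenize() emits the corresponding NAMESPACE directive (my_summary is a placeholder):

#' @importFrom stats quantile
my_summary <- function(x) quantile(x, probs = c(0.25, 0.5, 0.75))
# roxygenize() then writes to NAMESPACE: importFrom(stats, quantile)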
Don't upgrade packages blindly. Read the changelog line-by-line and consider how the updates will affect the stability of your own package. Most of the time, the updates don't touch the functions you actually use.
Avoid S4 classes. I'm doing some hand-waving here. I find S4 to be complex and it takes enough brain power to deal with the search/find mechanism on the functional side of R. Do you really need these OO features? Managing state = managing complexity - leave that for Python or Java =)
Write unit tests. Use the testthat package.
Whenever you R CMD build/test your package, parse the output and look for NOTE, INFO, WARNING. Also, physically scan with your own eyes. There's a part of the build step that notes conflicts but doesn't attach a WARN, etc. to it.
Add assertions and invariants right after a call to a 3rd-party package. In other words, don't fully trust what someone else gives you. Probe the result a little bit and stop() if the result is unexpected. You don't have to go crazy - pick one or two assertions that imply valid/high-confidence results.
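A sketch of the kind of probe I mean (somepkg::estimate and dataset are placeholders):

res <- somepkg::estimate(dataset)    # result from a 3rd-party package
if (!is.numeric(res) || anyNA(res))  # one or two cheap invariants, not a full audit
  stop("unexpected result from somepkg::estimate")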
I think there's more but this has become muscle memory now =) I'll augment if more comes to me.
My take on it:
Summary : Flexibility comes with a price. I'm willing to pay that price.
1) I simply don't use packages that cause that kind of problem. If I really, really need a function from that package in my own packages, I use an importFrom() directive in my NAMESPACE file. In any case, if I have trouble with a package, I contact the package author. The problem is on their side, not R's.
2) I never use :: inside my own code. By exporting only the functions needed by the user of my package, I can keep my own functions inside the NAMESPACE without running into conflicts. Functions that are not exported won't hide functions with the same name either, so that's a double win.
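In NAMESPACE terms, the two points look like this (names hypothetical):

importFrom(otherpkg, neededFun)   # pull in exactly one foreign function
export(publicFun)                 # everything unexported stays internal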
A good guide on how exactly environments, namespaces and the like work can be found here:
http://blog.obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/
This definitely is a must-read for everybody writing packages and the like. After you read this, you'll realize that using :: in your package code is not necessary.
