Project structure in Julia with clean separation - julia

I am trying to correctly configure a Julia project.
The project had initially:
One module in its own directory
Another "kind of module" in another directory, but there was no module keyword and it was patched together with help of includes.
A few files in the root to execute it all by just running "julia run.jl" for example.
No packages, no Project.toml, no Manifest.toml - there is "packages.jl", which manually calls "Pkg.add" for a preset list of dependencies.
Not all includes were used to put it all together; there was some fiddling with LOAD_PATH.
Logically, the project contains 3 parts, which I would see as "packages" in, for example, the Python world.
One "Common" module with basic util functions shared by all interested modules.
One module A, which has dependency on "Common".
Module B, which has dependency on A and "Common".
What I did: I created 3 modules in separate directories and put them all together. These modules make sense internally and there is no real reason to expose them "outside". The whole code is in the end executable, and probably just one function, which executes everything, would be exposed. I created a loader file, which includes all 3 module files. That way I got rid of LOAD_PATH and references in the IDE started to work. This works for our purposes, but still isn't ideal. I have been reading quite a lot about Julia structure and possibilities (modules, packages), but I still don't fully understand. And Revise doesn't work.
Is it correct to have modules like this? I like modules as they clearly set boundaries between modules using export lines.
I would also like to make the code as compatible with IDE as possible (LOAD_PATH settings didn't work for VS code and references to functions were broken) and also with Revise.
What is the typical structure for this?
How to clearly separate code while making development easy?
How to make this work with Revise?
I expect it's good idea to make a package for this, but should I make it for the whole project? Then it would mean one "project" module and 3 submodules A, B and Common?
Or should it be 3(4) packages per module?
Thanks for any output. Comparison of some of the principles to Python/Java/kotlin/C#/Javascript module/packages could be helpful.

In CubicEoS.jl I've made a single package which provides two modules: CubicEoS with general algorithms and CubicEoS.BrusilovskyEoS with an implementation of the necessary interface to make CubicEoS work on a concrete case. The latter module depends on the former. Revise works well. As a user (not developer) of CubicEoS, I have scripts which run some calculations.
So, in your case, I would create a single package with four modules. The fourth module is a hood for the others: Common, A and B. The file structure may look like this
src/
    TheHood.jl
    Common/Common.jl
    Module_A/Module_A.jl
    Module_B/Module_B.jl
test/
    ...
examples/ # these may live elsewhere, but the important examples may be in test/
    ...
Project.toml
And, the possible module structure may be like this
# TheHood.jl
module TheHood
    export bar
    include("Common/Common.jl")
    include("Module_A/Module_A.jl")
    include("Module_B/Module_B.jl")
    using .Module_B: bar # bring `bar` into TheHood so that the export above resolves
end

# Common/Common.jl
module Common
    export util_1, util_2
    using LinearAlgebra
    util_1(x) = "util_1's implementation"
    util_2(x) = "util_2's implementation"
end

# Module_A/Module_A.jl
module Module_A
    import ..Common
    export foo
    foo(x) = "foo's implementation"
end

# Module_B/Module_B.jl
module Module_B
    import ..Common
    import ..Module_A
    export bar
    bar(x) = "bar's implementation"
end
Now, I'll answer the questions
What is the typical structure for this?
If the modules are not used independently, I would use the structure above. For small projects I've found the "one package = one module" strategy painful when a core package updates.
How to clearly separate code while making development easy?
In Julia, a module is effectively a namespace. In inner modules (like A, B, Common), I usually import the dev-modules, but use the external modules which a dev-module depends on (see above: using LinearAlgebra vs import ..Common). That's my preference to clarify where names come from.
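To make the import-versus-using preference concrete, here is a minimal single-file sketch. It assumes the modules are nested directly in one file instead of being split across files with include, and the function bodies are invented for illustration; only the module and function names come from the structure above.

```julia
module TheHood

module Common
export util_1
util_1(x) = 2x                      # made-up body, for illustration only
end

module Module_A
import ..Common                     # `import` keeps calls qualified: Common.util_1
foo(x) = Common.util_1(x) + 1
end

module Module_B
using ..Common: util_1              # `using M: name` brings in one explicit name
import ..Module_A
bar(x) = util_1(Module_A.foo(x))
end

using .Module_B: bar                # make `bar` visible in TheHood...
export bar                          # ...so that it can be re-exported

end # module TheHood

using .TheHood
println(bar(3))                     # foo(3) == 7, util_1(7) == 14
```

With import, every call site shows which module a function comes from, which is the clarity argument made above; the price is a bit more typing.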
How to make this work with Revise?
Turn the code into a package. That's preferable, because revising standalone modules is buggy (at least, in my experience). I usually use Revise like this
% cd where_the_package_lives # actually, where the Project.toml is
% julia --project=. # or julia and pkg> activate .
% julia> using Revise
% julia> using ThePackage
After that I can usually edit source code and call the updated methods without restarting the REPL. But Revise has some limitations.
I expect it's good idea to make a package for this, but should I make it for the whole project? Then it would mean one "project" module and 3 submodules A, B and Common?
Or should it be 3(4) packages per module?
You should separate "scripty" code (throwaway or command line scripts) from core (reusable) code. The core I would put in a single package. The scripty files (like examples/, CLI programs) should stand alone and use the package. The scripty files may define modules or whatever, but their users are end users, not developers (e.g. running a script involves an I/O operation).

Just to end this.
The main thing I was missing and would save me a lot of time if I knew:
If you have your own package, the file src/YourPackage.jl is "imported/included" as soon as you write "using YourPackage", and everything just works. No need to do any kind of import/LOAD_PATH magic.
Just doing that fixed 95% of my problems. I ended up following the patterns in the accepted answer and also those from the comment recommending DrWatson.
The only small issue is that "scripts" don't work well with VSCode: "find all references", "go to definition" and autocomplete don't work. They do work perfectly for everything within "YourPackage.jl" and its imports, and so does Revise. It's a really tiny thing, since scripts are usually about 3 lines of code.
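For anyone who wants to try this out, below is a rough sketch that builds a throwaway package and loads it with a plain using. The name YourPackage and the function run_everything are placeholders, and the temporary directory is only there so nothing is left behind; in a real project you would generate the package once and then edit src/YourPackage.jl by hand.

```julia
using Pkg

# Throwaway location; "YourPackage" is a placeholder name.
pkgdir = joinpath(mktempdir(), "YourPackage")

# Creates YourPackage/Project.toml and YourPackage/src/YourPackage.jl
Pkg.generate(pkgdir)

# Replace the generated stub with our own top-level module.
write(joinpath(pkgdir, "src", "YourPackage.jl"), """
module YourPackage
export run_everything
run_everything() = "done"           # hypothetical entry point
end
""")

# Activate the package's own environment; `using` then finds src/YourPackage.jl.
Pkg.activate(pkgdir)
using YourPackage

println(run_everything())
```

A script that drives the project then stays tiny: activate the environment (or start Julia with --project), write using YourPackage, and call the entry point.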

Related

Load packages automatically without `using` in Julia?

When looking at files like this: https://github.com/simon-lc/Silico.jl/blob/main/examples/demo/peg_in_hole_planning.jl
The author does not call "using Silico" or "using Mehrotra" anywhere, yet calls it many times throughout the file. As someone coming from Python, I don't understand this. How does Julia know where to look for Silico without a statement like "using Silico"?
For this, you can customize the configuration file of Julia.
For example, in Windows OS, you can go to the following path:
C://Users//.julia/config/startup.jl
Open the file and write the import command(s) you want, e.g. using Term, using OhMyREPL, or using Statistics: mean, std (then those functions will be available by default). Then every time you start Julia, those packages will be imported automatically.
*Note that if this file doesn't exist in the path, you can create a file with the same name.
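As a rough idea of what such a startup.jl could contain, here is a small sketch. The choice of packages is only an example: Statistics ships with Julia, while Revise is assumed to be installed in your default environment, so it is wrapped in try/catch to keep startup from failing if it is missing.

```julia
# Possible contents of ~/.julia/config/startup.jl (illustrative only)

# Names imported here are available in every new session.
using Statistics: mean, std

# Optional packages: guard them so a missing package only produces a warning.
try
    using Revise
catch err
    @warn "Could not load Revise at startup" exception = err
end

# Quick check that the imported names are usable without a `using` line.
println(mean([1, 2, 3]))
```

Keep in mind that scripts relying on this file are not portable: they will fail on a machine without the same startup.jl.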
You can also compile the preferred packages into the Julia system image, and the Julia REPL will start a bit quicker since it does not have to parse and compile the package when loaded. The way to do this is by using PackageCompiler.jl. [1]

Are there any good resources/best-practices to "industrialize" code in R for a data science project?

I need to "industrialize" an R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow even for people who have not worked on the project before and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources and best-practices on how to achieve this objective.
Thank you for your help in advance!
You can make an R package out of your project, because it has everything you need for a standalone project that you want to share with others :
Easy to share, download and install
R has a very efficient documentation system for your functions and objects when you work within R Studio. Combined with roxygen2, it enables you to document precisely every function, and makes the code clearer since you can avoid commenting with inline comments (but please do so anyway if needed)
You can specify quite easily which dependencies your package will need, so that everyone knows what to install for your project to work. You can also use packrat if you want to mimic Python's virtualenv.
R also provides a long-format documentation system called vignettes, which are similar to a printed notebook: you can display code, text, code results, etc. This is where you will write guidelines and methods on how to use the functions, provide detailed instructions for a certain method, etc. Once the package is installed, they are automatically included and available to all users.
The only downside is the following : since R is a functional programming language, a package consists of mainly functions, and some other relevant objects (data, for instance), but not really scripts.
More details about the last point: if your project consists of a script that calls a set of functions to do something, it cannot directly appear within the package. Two options here: a) you make a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance); b) you make the whole script appear in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this:
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole work from a terminal or a distant machine with Rscript, with the following (using argparse to add arguments)
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
And finally if you write a bash file calling Rscript, you can automate everything !
Feel free to read Hadley Wickham's book about R packages, it is super clear, full of best practices and of great help in writing your packages.
One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set up the random seed, so the outputs should be reproducible.
Documentation is important: you can use the Roxygen skeleton in rstudio (default ctrl+alt+shift+r).
I usually separate the code into smaller, logically cohesive scripts, and use a main.R script, that uses the others.
If you use a special set of libraries, you can consider using packrat. Once you set it up, you can manage the installed project-specific libraries.

How can I create a library in julia?

I need to know how to create a library in Julia and where I must keep it in order to call it later. I come from C and MATLAB, and it seems there is no documentation about practical programming in Julia.
Thanks
If you are new to Julia, you will find it helpful to realize that Julia has two mechanisms for loading code. Stating you "need to know how to create a library in Julia" would imply you most likely will want to create a Julia module (docs) and possibly a package (docs). But the first method listed below may also be useful to you.
The two methods to load code in Julia are:
1. Code inclusion via include("file_path_relative_to_call_or_pwd.jl") (docs)
The expression include("source.jl") causes the contents of the file source.jl to be evaluated in the global scope of the module where the include call occurs.
Regarding where the "source.jl" file is searched for:
The included path, source.jl, is interpreted relative to the file where the include call occurs. This makes it simple to relocate a subtree of source files. In the REPL, included paths are interpreted relative to the current working directory, pwd().
Including a file is an easy way to pull code from one file into another one. However, the variables, functions, etc. defined in the included file become part of the current namespace. On the other hand, a module provides its own distinct namespace.
2. Package loading via import X or using X (docs)
The import mechanism allows you to load a package—i.e. an independent, reusable collection of Julia code, wrapped in a module—and makes the resulting module available by the name X inside of the importing module.
Regarding the difference between these two methods of code loading:
Code inclusion is quite straightforward: it simply parses and evaluates a source file in the context of the caller. Package loading is built on top of code inclusion and is quite a bit more complex.
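A small sketch of what "evaluated in the context of the caller" means in practice. The file contents and the names greet and Wrapped are made up for the demonstration, and the file is written to a temporary directory so the example is self-contained.

```julia
# Write a tiny source file to a temporary directory (hypothetical content).
path = joinpath(mktempdir(), "source.jl")
write(path, "greet() = \"hello from source.jl\"\n")

# `include` evaluates the file in the module where it is called,
# so `greet` becomes a global of the current module here.
include(path)
println(greet())

# Evaluating the same file inside a module keeps its names in that namespace.
module Wrapped end
Base.include(Wrapped, path)
println(Wrapped.greet())

# Same source text, but two separate functions in two separate namespaces.
println(greet === Wrapped.greet)    # false
```

This is also why including one file from several places quietly produces several independent copies of its definitions.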
Regarding where Julia searches for module files, see docs summary:
The global variable LOAD_PATH contains the directories Julia searches for modules when calling require. It can be extended using push!:
push!(LOAD_PATH, "/Path/To/My/Module/")
Putting this statement in the file ~/.julia/config/startup.jl will extend LOAD_PATH on every Julia startup. Alternatively, the module load path can be extended by defining the environment variable JULIA_LOAD_PATH.
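As a rough, self-contained illustration of the LOAD_PATH mechanism: the sketch below writes a module file into a temporary directory, adds that directory to LOAD_PATH, and loads the module by name. The module name Greeter and its function are hypothetical, and the file name has to match the module name (Greeter.jl) for the lookup to succeed.

```julia
# A directory that will act as our "module storage folder".
dir = mktempdir()

# The file must be named after the module it defines: Greeter.jl -> module Greeter
write(joinpath(dir, "Greeter.jl"), """
module Greeter
export hello
hello(who::String) = string("Hello, ", who)
end
""")

# Tell Julia to search this directory when resolving `using` / `import`.
push!(LOAD_PATH, dir)

using Greeter
println(hello("Julia"))             # Hello, Julia
```

For anything beyond a personal helper module, turning the code into a package with its own Project.toml is usually the more robust route.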
For one of the simplest examples of a Julia module, see Example.jl
module Example
export hello, domath
hello(who::String) = "Hello, $who"
domath(x::Number) = x + 5
end
and for the Example package, see here.
Side Note There is also a planned (future) library capability similar to what you may have used with other languages. See docs:
Library (future work): a compiled binary dependency (not written in Julia) packaged to be used by a Julia project. These are currently typically built in-place by a deps/build.jl script in a project’s source tree, but in the future we plan to make libraries first-class entities directly installed and upgraded by the package manager.

How to import custom module in julia

I have a module I wrote here:
# Hello.jl
module Hello
function foo()
    return 1
end
end
and
# Main.jl
using Hello
foo()
When I run the Main module:
$ julia ./Main.jl
I get this error:
ERROR: LoadError: ArgumentError: Hello not found in path
in require at ./loading.jl:249
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:320
in process_options at ./client.jl:280
in _start at ./client.jl:378
while loading /Main.jl, in expression starting on line 1
There is a new answer to this question since the release of Julia v0.7 and v1.0 that is slightly different. I just had to do this so I figured I'd post my findings here.
As already explained in other solutions, it is necessary to include the relevant script which defines the module. However, since the custom module is not a package, it cannot be loaded as a package with the same using or import commands as could be done in older Julia versions.
So the Main.jl script would be written with a relative import like this:
include("./Hello.jl")
using .Hello
foo() # works only if Hello contains `export foo`; otherwise call Hello.foo()
I found this explained simply in Stefan Karpinski's discourse comment on a similar question. As he describes, the situation can also get more elaborate when dealing with submodules. The documentation section on module paths is also a good reference.
EDIT: Updated code to apply post-v1.0. The other answers still have a fundamental problem: if you define a module and then include that module definition in multiple places, you will get unexpected hard-to-understand errors. @kiliantics' answer is correct as long as you only include the file once. If you have a module that you're using across multiple files, make that module into a package, use add MyModule, and then type using MyModule in as many places as you want, letting Pkg handle module identity for you.
Though 张实唯's answer is the most convenient, you should not use include outside the REPL (or just once per included file as a simple practice to organize large modules, as in the first example here). If you're writing a program file, go through the trouble of adding the appropriate directory to the LOAD_PATH. Remy gives a very good explanation of how to do so, but it's worth also explaining why you should do so in the first place. (Additionally from the docs: push!(LOAD_PATH, "/Path/To/My/Module/") but note your module and your file have to have the same name)
The problem is that anything you include will be defined right where you call include even if it is also defined elsewhere. Since the goal of modules is re-use, you'll probably eventually use MyModule in more than one file. If you call include in each file, then each will have its own definition of MyModule, and even though they are identical, these will be different definitions. That means any data defined in the MyModule (such as data types) will not be the same.
To see why this is a huge problem, consider these three files:
# types.jl
module TypeModule
    struct A end
    export A
end

# a_function.jl
include("types.jl")
module AFunctionModule
    using ..TypeModule
    function takes_a(a::A)
        println("Took A!")
    end
    export takes_a
end

# function_caller.jl
include("a_function.jl")
include("types.jl") # delete this line to make it work
using .TypeModule, .AFunctionModule
my_a = A()
takes_a(my_a)
If you run julia function_caller.jl you'll get MethodError: no method matching takes_a(::A). This is because the type A used in function_caller.jl is different from the one used in a_function.jl. In this simple case, you can actually "fix" the problem by reversing the order of the includes in function_caller.jl (or just by deleting include("types.jl") entirely from function_caller.jl! That's not good!). But what if you wanted another file b_function.jl that also used a type defined in TypeModule? You would have to do something very hacky. Or you could just modify your LOAD_PATH so the module is only defined once.
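The root cause can be reproduced in a few lines. This is a stripped-down sketch, not the three files above: it writes types.jl to a temporary directory and includes it twice from the same script, and the variable names first_A and second_A are just for the demonstration.

```julia
# Write the module definition to a temporary file.
src = joinpath(mktempdir(), "types.jl")
write(src, """
module TypeModule
struct A end
export A
end
""")

include(src)
first_A = TypeModule.A              # type from the first evaluation

include(src)                        # Julia warns: replacing module TypeModule
second_A = TypeModule.A             # type from the second evaluation

# Same name, same source text, but two distinct types.
println(first_A === second_A)       # false
println(second_A() isa first_A)     # false
```

A method written against the first A will therefore reject instances of the second one, which is exactly the MethodError described above.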
EDIT in response to xji: To distribute a module, you'd use Pkg (docs). I understood the premise of this question to be a custom, personal module. It's also fine for distribution if you know the relative path of the directory containing your module definition from each file that needs to load that module, e.g. if all your files are in the same folder then you'd just have push!(LOAD_PATH, @__DIR__).
Incidentally, if you really don't like the idea of modifying your load path (even if it's only within the scope of a single script...) you could symlink your module into a package directory (e.g. ~/.julia/v0.6/MyModule/MyModule.jl) and then Pkg.add(MyModule) and then import as normal. I find that to be a bit more trouble.
This answer has been OUTDATED. Please see other excellent explanations.
===
You should include("./Hello.jl") before using Hello
This answer was originally written for Julia 0.4.5. There is now an easier way of importing a local file (see @kiliantics' answer). However, I will leave this up, as my answer explains several other methods of loading files from other directories which may still be of use.
There have already been some short answers, but I wanted to provide a more complete answer if possible.
When you run using MyModule, Julia only searches for it in a list of directories known as your LOAD_PATH. If you type LOAD_PATH in the Julia REPL, you will get something like the following:
2-element Array{ByteString,1}:
"/Applications/Julia-0.4.5.app/Contents/Resources/julia/local/share/julia/site/v0.4"
"/Applications/Julia-0.4.5.app/Contents/Resources/julia/share/julia/site/v0.4"
These are the directories that Julia will search for modules to include when you type using Hello. In the example that you provided, since Hello was not in your LOAD_PATH, Julia was unable to find it.
If you wish to include a local module, you can specify its location relative to your current working directory.
julia> include("./src/Hello.jl")
Once the file has been included, you can then run using Hello as normal to get all of the same behavior. For one-off scripts, this is probably the best solution. However, if you find yourself regularly having to include() a certain set of directories, you can permanently add them to your LOAD_PATH.
Adding directories to LOAD_PATH
Manually adding directories to your LOAD_PATH can be a pain if you wish to regularly use particular modules that are stored outside of the Julia LOAD_PATH. In that case, you can add additional directories through the JULIA_LOAD_PATH environment variable. Julia will then automatically search through these directories whenever you issue an import or using command.
One way to do this is to add the following to your .bashrc, .profile, or .zshrc.
export JULIA_LOAD_PATH="/path/to/module/storage/folder"
This will prepend that directory to the standard directories that Julia will search. If you then run
julia> LOAD_PATH
It should return
3-element Array{ByteString,1}:
"/path/to/module/storage/folder"
"/Applications/Julia-0.4.5.app/Contents/Resources/julia/local/share/julia/site/v0.4"
"/Applications/Julia-0.4.5.app/Contents/Resources/julia/share/julia/site/v0.4"
You can now freely run using Hello and Julia will automatically find the module (as long as it is stored underneath /path/to/module/storage/folder).
For more information, take a look at this page from the Julia Docs.
Unless you explicitly load the file (include("./Hello.jl")) Julia looks for module files in directories defined in the LOAD_PATH variable.
See this page.
I have Julia Version 1.4.2 (2020-05-23). Just this using .Hello worked for me.
However, I had to evaluate (define) the Hello module first before using .Hello would work. That makes sense when the definition of Hello and the code that uses it are in the same file.
Alternatively, we can define Hello in one file and use it in a different file with include("./Hello.jl"); using .Hello
If you want to access function foo when importing the module with "using" you need to add "export foo" in the header of the module.
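A short sketch of the difference export makes. The second function, bar, is a hypothetical addition to show the contrast; the module is defined inline here instead of in Hello.jl so the snippet runs on its own.

```julia
module Hello
export foo          # exported: visible after `using .Hello`
foo() = 1
bar() = 2           # hypothetical helper, deliberately not exported
end

using .Hello

println(foo())              # 1
println(Hello.bar())        # 2, unexported names need the module prefix
println(@isdefined bar)     # false, `bar` alone is not in scope
```

Qualified access (Hello.bar()) always works, so export is a convenience for callers, not an access restriction.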

Import a module and use it in julialang

Since in http://julia.readthedocs.org/en/latest/manual/modules/ there's not much info about modules, I would like to ask the following.
I want to try two modules via IJulia. Both modules are in my working directory as name-of-files.jul. I will call them generically module_1.jul and module_2.jul.
module_1.jul uses module_2.jul and I load it with
using module_2
In an IJulia session, if I try
using module_1
it gives an error. I also tried
include("module_1.jul")
This last statement, when executed, raises an error because module_1.jul cannot find variable "x" that I know is contained in module_1.jul (in this case I "loaded" the module using include("module2.jul") inside module_1.jul).
Julia's module system assumes some things that aren't necessarily obvious from the documentation at first.
Julia files should end with the .jl extension.
Julia looks for module files in directories defined in the LOAD_PATH variable.
Julia looks for files in those directories in the form ModuleName.jl or ModuleName/src/ModuleName.jl.
If using module_1 fails, then I'm guessing it's because its source files fail one of the above criteria.
Some time has passed since this question. Recently, Noah_S wrote the solution in the comments of the previous answer; this means that it is a recurring doubt for people starting to learn the language. For their sake, I will rewrite Noah_S' answer here along with my own more recent solution.
I lose track of which commands work in which Julia version. In older versions, we have to look up the path by hand and then add it to the load path before loading the module:
push!(LOAD_PATH, "/path")
In newer versions this can be improved. Forget about looking up the path by hand and just do
path = pwd() # readstring(`pwd`) was removed in Julia 1.0
push!(LOAD_PATH, path)
I hope this can be useful to many Julia newcomers.

Resources