I am unfamiliar with good coding practices in Julia. Working in Jupyter notebook in python I typically put all import statements in one cell at the top of the file, which helps me easily see what the dependencies are.
Is it advisable to do the same with 'using' statements in Julia (I'm also working in Jupyter notebook for now)?
Yes you should. There is only one case I know where you might not want to do so, and that is if there is a large module that is only used under rare conditions -- for example, if you run a program that can produce a plot, or do similar optional functions with slowly loading modules, in some rare usage scenarios, but normally will never use that slow-to-load module.
Even then, you can easily wind up with errors at run time due to redefinitions of functions that have been already been run before the newly loaded module redefines them.
Related
I'm building a simulation in Julia and I have my code split across a bunch of files. Are there any benefits to wrapping everything in modules versus simplying include()-ing them in the runscript?
I have something like the following at the top of my runscript right now:
for filename in split(readall(`git ls-files`))
#everywhere include(filename)
end
I'm not planning to use the code outside of this immediate project, but I am running the simulation in parallel. Is there any benefit in creating modules?
I would say that the most important benefit is modularity :)
If you have different files that deal with different things, splitting the code into modules let's you keep track on the dependencies between the modules:
Which functions are purely implementation details of the given module and subject to change?
Which modules depend on which other modules?
It also lets you reuse the same name for different things in the different modules if you need to, if you're a little careful of what you export. (You can still access those names from the outside as qualified names)
For an example of such organisation, you can look at my repo https://github.com/toivoh/Debug.jl
Is it good practice to include every library I need to execute a function within that function?
For example, my file global.r contains several functions I need for a shiny app. Currently I have all needed packages at the top of the file. When I'm switching projects/copying these functions I have to load the packages/include them in the new code. Otherwise all needed packages are contained in that function. Of course I have to check all functions with a new R session, but I think this could help in the long run.
When I tried to load a package twice it won't load the package again but checks it's already loaded. My main question is if it would slow my functions if I restructure in that way?
I only saw that practice once, library calls inside functions, so I'm not sure.
As one of the commenters suggest, you should avoid loading packages within a function since
The function now has a global effect - as a general rule, this is something to avoid.
There is a very small performance hit.
The first point is the big one. As with most optimisation, only worry about the second point if it's an issue.
Now that we've established the principle, what are the possible solution.
In small projects, I have a file called packages.R that includes all the library calls I need. This is sourced at the top of my analysis script. BTW, all my functions are in a file call func.R. This workflow was stolen/adapted from a previous SO question
If you're only importing a single function, you could use the :: trick, e.g. package::funcA(...) That way you avoid loading the package.
For larger projects, I create an R package that handles all necessary imports. The benefit of creating a package is detailed in this answer on structuring large R projects.
Is there a way to get R to precompile all functions in a script?
The reason it matters is because the script is code for rshiny. I'd like to push forward the byte compiling to occur when the server starts up rather when the user is requesting a page.
I know cmpfun() could be used to compile one function at a time and modify function calls accordingly, but I'd like to do this without maintaining the extra boilerplate code if it's possible.
You should be able to use the JIT from compiler with:
library(compiler)
enableJIT(3)
or set the environment variable R_ENABLE_JIT to non-negative (3 is the highest amount of compilation). I did a quick experiment with my Shiny app and this seemed to produce no benefit at all, so maybe something is not working correctly. This page provides a few more details on R compilation options.
I have a rather big library with a significant set of APIs that I need to expose. In fact, I'd like to expose the whole thing. There is a lot of namespacing going on, like:
FooLibrary.Bar
FooLibrary.Qux.Rumps
FooLibrary.Qux.Scrooge
..
Basically, what I would like to do is make sure that the user can access that whole namespace. I have had a whole bunch of trouble with this, and I'm totally new to closure, so I thought I'd ask for some input.
First, I need closurebuilder.py to send the full list of files to the closure compiler. This doesn't seem supported: --namespace Foo does not include Foo.Bar. --input only allows a single file, not a directory. Nor can I simply send my list of files to the closure compiler directly, because my code is also requiring things like "goog.assers", so I do need the resolver.
In fact, the only solution I can see is having a FooLibrary.ExposeAPI JS file that #require's everything. Surely that can't be right?
This is my main issue.
However, later the closure compiler, with ADVANCED_OPTIMIZATIONS on, will optimize all these names away. Now I can fix that by adding "#export" all over the place, which I am not happy about, but should work. I suppose it would also be valid to use an extern here. Or I could simply disable advanced optimizations.
What I can't do, apparently, is say "export FooLibrary.*". Wouldn't that make sense?
Finally, for working in source mode, I need to do goog.require() for every namespace I am using. This is merely an inconvenience, though I am mentioning because it sort of related to my trouble above. I would prefer to be able to do:
goog.requireRecursively('FooLibrary')
in order to pull all the child namespaces as well; thus, recreating with a single command the environment that I have when I am using the compiled version of my library.
I feel like I am possibly misunderstanding some things, or how Closure is supposed to be used. I'd be interested in looking at other Closure-based libraries to see how they solve this.
You are discovering that Closure-compiler is built more for the end consumer and not as much for the library author.
If you are exporting basically everything, then you would be better off with SIMPLE_OPTIMIZATIONS. I would still highly encourage you to maintain compatibility of your library with ADVANCED_OPTIMIZATIONS so that users can compile the library source with their project.
First, I need closurebuilder.py to send the full list of files to the closure compiler. ...
In fact, the only solution I can see is having a FooLibrary.ExposeAPI JS file that #require's everything. Surely that can't be right?
You would need to specify an --root of your source folder and specify the namespaces of the leaf nodes of your file dependency tree. You may have better luck with the now deprecated CalcDeps.py script. I still use it for some projects.
What I can't do, apparently, is say "export FooLibrary.*". Wouldn't that make sense?
You can't do that because it only makes sense based on the final usage. You as the library writer wish to export everything, but perhaps a consumer of your library wishes to include the source (uncompiled) version and have more dead code elimination. Library authors are stuck in a kind of middle ground between SIMPLE and ADVANCED optimization levels.
What I have done for this case is maintain a separate exports file for my namespace that exports everything. When compiling a standalone version of my library for distribution, the exports file is included in the compilation. However I can still include the library source (without the exports) into a project and get full dead code elimination. The work/payoff balance of this though must be weighed against just using SIMPLE_OPTIMIZATIONS for the standalone library.
My GeolocationMarker library has an example of this strategy.
While I am writing .R functions I constantly need to manually write source("funcname.r") to get the changes reflected in the workspace. I am sure it must be possible to do this automatically. So what I would like would be just to make changes in my function, save the function and be able to use the new function in R workspace without manually "sourcing" this function. How can I do that?
UPDATE: I know about selecting appropriate lines of code and pressing CTRL+R in R Editor (RGui) or using Notepad++ and executing the lines into R. But this approach has a disadvantage of making my workspace console "muddled". I would like to stop this practice if at all possible.
You can use R studio which has a source on save option.
If you are prepared to package your functions into a package, you may enjoy exploring Hadley's devtools package. This provides a suite of tools to write, test and document
packages.
https://github.com/hadley/devtools
This approach offer many advantages, but mainly reloading the package with a minimum of retyping.
You will still have to type load_all("yourpackage") but I find this small amount of typing is small beer compared to the advantages of devtools.
For additional information, including how to setup devtools, have a look at https://github.com/hadley/devtools/wiki/development
If you're using Eclipse + StatET, you can press CTRL+R+S, which saves your script and sources it. As close to automatic as I can get.
If you can get your text editor to run a system command after it saves the file, then you could use something like AutoIt (on Windows) or a batch script (on UNIX-derivative) to pass a call to source off to all running copies of R. But that's a heck of a lot of work for not much gain.
Still, I think it's much more likely to work being event-driven on the text editor end vs. having R constantly scan for updates (or somehow interface with the OS's update-event-messaging-system).
This is likely not possible (automatically detecting disc changes without intervention or running at least one line).
R needs to read into memory functions, so a change on the disc wouldn't be reflected in the workspace without reloading your functions.
If you are into developing R functions, some amount of messiness during your development process will be likely inevitable, but perhaps I could suggest that you try writing an R-package to house your functions?
This has the advantage of being able to robustly document your functions, using lazy loading so that you have access to your functions/datasets immediately without sourcing them.
Don't be afraid of making a package, it's easy with package.skeleton() and doesn't have to go on CRAN but could be for your own personal use without distribution! Just have fun!
Try to accept some messiness during development knowing you are working towards your goal and fighting the good fight of code organization and documentation!
We are only imperfect people, in an imperfect world, but we mean well!