I'm building a simulation in Julia and I have my code split across a bunch of files. Are there any benefits to wrapping everything in modules versus simply include()-ing them in the runscript?
I have something like the following at the top of my runscript right now:
for filename in split(readall(`git ls-files`))
    @everywhere include(filename)
end
I'm not planning to use the code outside of this immediate project, but I am running the simulation in parallel. Is there any benefit in creating modules?
I would say that the most important benefit is modularity :)
If you have different files that deal with different things, splitting the code into modules lets you keep track of the dependencies between them:
Which functions are purely implementation details of the given module and subject to change?
Which modules depend on which other modules?
It also lets you reuse the same name for different things in different modules if you need to, as long as you're a little careful about what you export. (You can still access those names from the outside as qualified names.)
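For instance, here is a minimal sketch of that last point (the module names and functions are made up for illustration):

module Physics
export advance!
# Implementation detail: a simple in-place Euler step; callers only ever see advance!.
advance!(state, dt) = (state .+= dt .* state)
end # module

module Output
# Reuses the name advance! for something unrelated -- no clash, because it
# lives in its own namespace and is not exported.
advance!(io, state) = println(io, state)
end # module

using .Physics                    # brings the exported Physics.advance! into scope
state = [1.0, 2.0]
advance!(state, 0.1)              # Physics.advance!
Output.advance!(stdout, state)    # the other advance!, reached as a qualified name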
For an example of such organisation, you can look at my repo https://github.com/toivoh/Debug.jl
Related
I would like to use R objects (e.g., cleaned data) generated in one git-versioned R project in another git-versioned R project.
Specifically, I have multiple git-versioned R projects (that hold drake plans) that do various things for my thesis experiments (e.g., generate materials, import and clean data, generate reports/articles).
The experiment-specific projects should ideally be:
Connectable - so that I can get objects (mainly data and materials) that I generated in these projects into another git-versioned R project that generates my thesis report.
Self-contained - so that I can use them in other non-thesis projects (such as presentations, reports, and journal manuscripts). When sharing such projects, I'd ideally like not to need to share a monolithic thesis project.
Versioned - so that their use in different projects can be independent (e.g., if I make changes to the data cleaning for a manuscript after submitting the thesis, I still want the thesis to be reproducible as it was originally compiled).
At the moment I can see three ways of doing this:
Re-create the data cleaning process
But: this involves copy/paste, which I'd like to avoid, especially if things change upstream.
Access the relevant scripts/functions by changing the working directory
But: even if I used the here package, it seems that this would hurt reproducibility.
Make the source projects into packages and make the objects I want to "export" into exported data (as per the data section of Hadley's R packages guide)
But: I'd like to avoid the unnecessary metadata, artefacts, and noise (e.g., see Miles McBain's "Project as an R package: An okay idea") if I can.
Is there any other way of doing this?
Edit: I tried @landau's suggestion of using a single drake plan, which worked well for a while, until (similar to @vrognas' case) I ended up with too many sub-projects (e.g., conference presentations and manuscripts) that relied on the same objects. Therefore, I added some clarifications above about my intentions with the question.
My first recommendation is to use a single drake plan to unite the stages of the overall project that need to share data. drake is designed to handle a lot of moving parts this way, and it will be more seamless when it comes to drake's decisions about what to rerun downstream. But if you really do need different plans in different sub-projects that share data, you can track each shared dataset as a file_out() file in one plan and track it with file_in() in another plan.
library(drake)
library(readr)  # for write_csv()/read_csv()
# Upstream project: writes the shared dataset to a tracked file_out() file.
upstream_plan <- drake_plan(
  export_file = write_csv(dataset, file_out("exported_data/dataset.csv"))
)
# Downstream project: tracks the same file with file_in(), so changes propagate.
downstream_plan <- drake_plan(
  dataset = read_csv(file_in("../upstream_project/exported_data/dataset.csv"))
)
You fundamentally misunderstood Miles McBain’s critique. He isn’t saying that you shouldn’t write reusable code nor that you shouldn’t use packages. He’s saying that you shouldn’t use packages for everything. But reusable code (i.e. code that you want to reuse) absolutely belongs in packages (or, better, modules), which can then be used in multiple projects.
That being said, first off, pay attention to Will Landau’s advice.
Secondly, you can make your RStudio projects configurable so that they load data from paths given in a configuration file. Once that's in place, there's nothing wrong with hard-coding the paths to each project's data inside that config file.
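One way to do that, for example, is with the config package (the file name config.yml and the key dataset_path are just illustrative):

# config.yml at the project root:
# default:
#   dataset_path: "../upstream_project/exported_data/dataset.csv"

paths <- config::get()                          # reads config.yml by default
dataset <- readr::read_csv(paths$dataset_path)  # the path lives in the config, not in the code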
I am in a similar situation. I have many projects that are spawned from one raw dataset. Previously, when the project was young and small, I had it all in one version-controlled project. This got out of hand as more sub-projects were spawned and my git history got cluttered from working on projects in parallel. That could be due to my lack of git skills. My folder structure looked something like this:
project/.git
project/main/
project/sub-project_1/
project/sub-project_2/
project/sub-project_n/
I contemplated having each project in its own git branch, but then I could not access them simultaneously. If I had to change something in the main dataset (e.g., parts I had not cleaned properly), project 1 could become outdated and nonfunctional. Once I had finished project 1, I would have liked it to be isolated and contained for reproducibility. This is easier to achieve if the projects are separated. I don't think a drake/targets plan would solve this?
I also looked briefly into having the projects as git submodules but it seemed to add too much complexity. Again, my git ignorance might shine through here.
My current solution is to have the main data as an R package, and each sub-project as a separate git-versioned folder (they are actually packages as well, but this is not necessary). This way I can load a specific version of the data (using renv for package versions).
My folder structure now looks something like this:
main/.git
sub-project_1/.git
sub-project_2/.git
sub-project_n/.git
And inside each sub-project, I call library(main) to load the cleaned data. Within each sub-project, a drake/targets plan could be used.
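A minimal sketch of the data-package side, assuming the package is called main and the cleaned dataset is called cleaned_data (both names, and the clean_raw() helper, are illustrative):

# In the `main` package, in data-raw/cleaned_data.R (run once per data update):
cleaned_data <- clean_raw(read.csv("data-raw/raw.csv"))  # clean_raw() stands in for your cleaning code
usethis::use_data(cleaned_data, overwrite = TRUE)        # writes data/cleaned_data.rda

# In a sub-project, with the package version pinned by renv:
library(main)        # attaches the data package
head(cleaned_data)   # available directly if DESCRIPTION has LazyData: true;
                     # otherwise call data(cleaned_data) first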
I have a rather big library with a significant set of APIs that I need to expose. In fact, I'd like to expose the whole thing. There is a lot of namespacing going on, like:
FooLibrary.Bar
FooLibrary.Qux.Rumps
FooLibrary.Qux.Scrooge
..
Basically, what I would like to do is make sure that the user can access that whole namespace. I have had a whole bunch of trouble with this, and I'm totally new to Closure, so I thought I'd ask for some input.
First, I need closurebuilder.py to send the full list of files to the Closure Compiler. This doesn't seem supported: --namespace Foo does not include Foo.Bar, and --input only allows a single file, not a directory. Nor can I simply send my list of files to the Closure Compiler directly, because my code also requires things like goog.asserts, so I do need the dependency resolver.
In fact, the only solution I can see is having a FooLibrary.ExposeAPI JS file that goog.require()s everything. Surely that can't be right?
This is my main issue.
However, later the Closure Compiler, with ADVANCED_OPTIMIZATIONS on, will optimize all these names away. Now I can fix that by adding @export annotations all over the place, which I am not happy about, but that should work. I suppose it would also be valid to use an extern here. Or I could simply disable advanced optimizations.
What I can't do, apparently, is say "export FooLibrary.*". Wouldn't that make sense?
Finally, for working in source mode, I need to call goog.require() for every namespace I am using. This is merely an inconvenience, though I am mentioning it because it's sort of related to my trouble above. I would prefer to be able to do:
goog.requireRecursively('FooLibrary')
in order to pull all the child namespaces as well; thus, recreating with a single command the environment that I have when I am using the compiled version of my library.
I feel like I am possibly misunderstanding some things, or how Closure is supposed to be used. I'd be interested in looking at other Closure-based libraries to see how they solve this.
You are discovering that Closure-compiler is built more for the end consumer and not as much for the library author.
If you are exporting basically everything, then you would be better off with SIMPLE_OPTIMIZATIONS. I would still highly encourage you to maintain compatibility of your library with ADVANCED_OPTIMIZATIONS so that users can compile the library source with their project.
First, I need closurebuilder.py to send the full list of files to the closure compiler. ...
In fact, the only solution I can see is having a FooLibrary.ExposeAPI JS file that goog.require()s everything. Surely that can't be right?
You would need to specify a --root for your source folder and specify the namespaces of the leaf nodes of your file dependency tree. You may have better luck with the now-deprecated CalcDeps.py script; I still use it for some projects.
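For illustration, an invocation along these lines is what I mean (the paths and the exact set of leaf namespaces are made up; adjust them to your tree):

python closurebuilder.py \
  --root=closure-library \
  --root=foolibrary-src \
  --namespace="FooLibrary.Bar" \
  --namespace="FooLibrary.Qux.Rumps" \
  --namespace="FooLibrary.Qux.Scrooge" \
  --output_mode=compiled \
  --compiler_jar=compiler.jar \
  --compiler_flags="--compilation_level=SIMPLE_OPTIMIZATIONS" \
  > foolibrary.min.js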
What I can't do, apparently, is say "export FooLibrary.*". Wouldn't that make sense?
You can't do that because it only makes sense based on the final usage. You as the library writer wish to export everything, but perhaps a consumer of your library wishes to include the source (uncompiled) version and have more dead code elimination. Library authors are stuck in a kind of middle ground between SIMPLE and ADVANCED optimization levels.
What I have done for this case is maintain a separate exports file for my namespace that exports everything. When compiling a standalone version of my library for distribution, the exports file is included in the compilation. However I can still include the library source (without the exports) into a project and get full dead code elimination. The work/payoff balance of this though must be weighed against just using SIMPLE_OPTIMIZATIONS for the standalone library.
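For illustration, such an exports file might look something like this (the frobnicate member is made up; the idea is one goog.exportSymbol call per public name):

// foolibrary_exports.js -- only included when compiling the standalone build
goog.require('FooLibrary.Bar');
goog.require('FooLibrary.Qux.Rumps');

goog.exportSymbol('FooLibrary.Bar', FooLibrary.Bar);
goog.exportSymbol('FooLibrary.Bar.prototype.frobnicate',
    FooLibrary.Bar.prototype.frobnicate);
goog.exportSymbol('FooLibrary.Qux.Rumps', FooLibrary.Qux.Rumps);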
My GeolocationMarker library has an example of this strategy.
I'm designing an information system (in ASP.NET) which will handle different modules once it's done.
I don't have enough time or money to build all of the modules at once, so I've decided to do a few modules first and continue with the rest later, when I have time or money.
Now the question is: is there a generic way to call a module from a list? For example: I would create a directory for modules where I plan to drop each module's .dll, so when I make a new one I will put the new .dll there. On the other hand, I want to build something like a skeleton that generically calls all the modules in that directory via code, without having to rewrite the skeleton whenever new modules are dropped into the directory. Finally, I've planned that each module should have three layers: one for db access, another for logic, and a last one for interface drawing, so each module is independent of the others.
Is it possible? How should I do this? I've been looking but can't find anything yet.
Is there a better way you'd suggest?
You would definitely need to create common interfaces that modules implement, along with common data contracts. If you need to load dlls dynamically, it is possible, but you would need to use reflection. Look here:
http://dranaxum.wordpress.com/2008/02/25/dynamic-load-net-dll-files-creating-a-plug-in-system-c/
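As a rough sketch of the reflection part (the IModule interface, the ModuleLoader class, and the ~/Modules drop folder are all just illustrative names, not an existing API):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;

// Shared contract that every module assembly implements.
public interface IModule
{
    string Name { get; }
    void Initialize();
}

// Host "skeleton": scan the drop folder, load each assembly, and
// instantiate every concrete type that implements IModule.
public static class ModuleLoader
{
    public static List<IModule> LoadAll(string modulesPath)
    {
        return Directory.GetFiles(modulesPath, "*.dll")
            .Select(Assembly.LoadFrom)
            .SelectMany(asm => asm.GetTypes())
            .Where(t => typeof(IModule).IsAssignableFrom(t) && !t.IsAbstract && !t.IsInterface)
            .Select(t => (IModule)Activator.CreateInstance(t))
            .ToList();
    }
}

// Usage, e.g. at application startup:
//   foreach (var module in ModuleLoader.LoadAll(Server.MapPath("~/Modules")))
//       module.Initialize();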
Is there a simple way to generate a JAR file that contains only the classes a certain "main" class depends on transitively (reflection omitted, of course)?
I want to provide a little part of my application to someone else but do not want to export the whole application.
Thanks,
M.
Probably the easiest approach is using a tool like yGuard, which "...provides elaborate code shrinking functionality through dependency analysis." It solves exactly this problem: you give it an entry point, and it performs dependency analysis to work out which classes can be excluded from the Jar.
However, I have thought about this problem myself a few times and fancied having a go at it for the challenge. All it would take would be to parse the import statements of the Java source files and build a dependency graph of how the classes reference each other. Each reference from the main class would be recursively scanned until a complete graph is assembled. Once the graph is assembled, it would be a case of outputting it in a way that some packaging logic could process (or, if you are feeling daring, the JDK has its own built-in Jar creating/modifying code, so you could do it yourself). Granted, this approach would require writing a custom utility, and it would also miss fully qualified class references in the code.
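A very rough sketch of that idea, with the stated limitations (it also ignores wildcard imports and same-package references; class and path names are illustrative):

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.regex.*;

// Walks import statements recursively, starting from one class, and collects
// every class name it reaches under the given source root.
public class DependencyWalker {
    private static final Pattern IMPORT =
        Pattern.compile("^import\\s+(?:static\\s+)?([\\w.]+);", Pattern.MULTILINE);

    static Set<String> collect(String className, Path srcRoot, Set<String> seen) throws IOException {
        if (!seen.add(className)) return seen;                 // already visited
        Path file = srcRoot.resolve(className.replace('.', '/') + ".java");
        if (!Files.exists(file)) return seen;                  // external class: keep the name, stop recursing
        Matcher m = IMPORT.matcher(Files.readString(file));
        while (m.find()) collect(m.group(1), srcRoot, seen);   // recurse into each imported class
        return seen;
    }

    public static void main(String[] args) throws IOException {
        // e.g. java DependencyWalker com.example.Main src/main/java
        System.out.println(collect(args[0], Paths.get(args[1]), new LinkedHashSet<>()));
    }
}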
How do you like to organize your Drupal code? One giant module? Separate modules per feature? Separate modules per code type (theme functions, menu hooks, etc...)?
I've started by trying to organize by feature, treating modules like they were libraries. Ultimately though things are never perfectly contained... modules want to use each other's theme functions, and modules are all contributing various tabs to a common page -- two examples of it not always being so clear where to find code. This tempts me to keep all theme functions together, and all hook_menus together, but this would be awkward for other reasons...
Assume that all code is too specific to eventually share, so there's no attempt here to make self contained contributed modules. I'm mostly concerned about maintaining sanity and cleanliness in a large scale Drupal site.
I tend to have a folder with one main module with all the shared functions, and a variety of sub-modules that are broken up by logical functional divisions. I've found the single huge module approach makes finding stuff in it rather unfun.
It really doesn't make much of a difference if you're not distributing it on Drupal.org, though, so whatever makes sense to you is fine.
I load all customizations into a single module per project (menu/form/link alters, etc.). If enough customization is done, I will fork the original module or create a new module with the original module as a dependency. It's at this point that it gets pretty subjective: I have no hard and fast rule saying 'fork a module when I reach this many function points or lines of code'.
Anything that adds functionality (meaning that it doesn't override something else) goes into its own module.
If any newly created or forked modules can be used in other projects or contexts, I will publish them to my personal repository.
I most often use a single module and a set of include files where I store my classes. Although Views uses more than one module, it is a great example of this strategy. Take a look at the includes folder of the views module to see what I mean.
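To illustrate, that kind of layout looks something like this (the module and file names are made up):

sites/all/modules/custom/mysite_feature/
    mysite_feature.info           (name, description, dependencies)
    mysite_feature.module         (hook implementations and glue code)
    includes/
        mysite_feature.admin.inc  (admin forms)
        mysite_feature.theme.inc  (theme functions)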