Julia code encapsulation. Is this a generally good idea? - julia

Is it a common thing for Julia users to encapsulate functionality into own modules with the module keyword?
I've been looking at modules and they seem to use include more than actually using the module keyword for parts of code.
What is the better way?

Julia has 3 levels of "Places you can put code"
include'd files
submodules
packages -- which for our purposes have exactly 1 (non-sub) module.
I use to be a big fan of submodules, given I come from python.
But submodules in julia are not so good.
From my experience, code written with them tends to be annoying both from a developer, and from a user perspective.
It just don't work out cleanly.
I have ripped the submodules out of at least one of my packages, and switched to plain includes.
Python needs submodules, to help deal with its namespace -- "Namespaces are great, lets do more of those".
But because of multiple dispatch, julia does not run out of function names -- you can reuse the same name, with a different type signature, and that is fine (Good even)
In general submodules grant you separation and decoupling of each submodule from each other. But in that case, why are you not using entirely separate packages? (with a common dependency)
Things that I find go wrong with submodules:
say you had:
- module A
- module B (i.e A.B)
- type C
One would normally do using A.B
but by mistake you can do using B since B is probably in a file called B.jl in your LOAD_PATH.
If you do this, then you want to access type C.
If you had done using A.B you would end up with a type A.B.C, when you entered the statement B.C().
But if you had mistakenly done using B, then B.C() gives you the type B.C.
And this type is not compatible, with functions that (because they do the right using) expect as A.B.C.
It just gets a bit messy.
Also reload("A.B") does not work.
(however reload often doesn't work great)
Base is one of the only major chuncks of julia code that uses submodules (that I aware of). And even Base is pushed a lot of those into seperate (stdlib) packages for julia 0.7.
The short is, if you are thinking about using a submodule,
check that it isn't a habit you are bringing over from another language.
And consider if you don't just want to release another separate package.

Related

Ada code instrumentation as GNAT compilation part?

I'm looking for a best way to integrate GNAT compiler with our custom code analysis/modification tools. We are using custom tools to perform different code metrics (like execution time, test coverage and etc) and even do some code obfuscation. So for example for measuring code execution time I need to insert 2 procedure calls into each function/procedure (the first one where the function starts, and the other one for each function exit). The code for these 2 procedures are implemented in a separate translation unit. What is the best way to do these code instrumentations (insert/modify code) with GNAT compiler in terms of simplicity and performance? I can think of these several ways:
Does GNAT compiler support code generation plugins of any kind? Seem's that it doesn't, but maybe I missed something while googling about it. Maybe there is a way to do it using some metaprogramming tricks (like in some modern programming languages like Nimrod and D), but I couldn't find if Ada even supports metaprogramming at all.
Looks like the ASIS library can help me out, but it is made for creating separate tools. Is it possible to integrate ASIS-based tool with GNAT? So for example to write a tool that would be loaded by GNAT during compilation, and would modify nodes in the AST before it (the AST) is about to be transformed into GIMPLE. Using the ASIS-based tool separately (for example by preprocessing each source file before passing it to compiler) may reduce compilation time, as source code will need to be parsed twice (by the tool and by the compiler) and be saved/loaded to/from some temporary location on disk.
Is it possible to get GIMPLE from GNAT compiler, modify and pass it to GCC? I couldn't find if there is a working GIMPLE front-end inside GCC, but it seems that GIMPLE is used internally only. I can dump it with GCC compiler, but I can't recompile modified GIMPLE afterwards (seems that there is no GIMPLE front-end for GCC).

'make'-like dependency-tracking library?

There are many nice things to like about Makefiles, and many pains in the butt.
In the course of doing various project (I'm a research scientist, "data scientist", or whatever) I often find myself starting out with a few data objects on disk, generating various artifacts from those, generating artifacts from those artifacts, and so on.
It would be nice if I could just say "this object depends on these other objects", and "this object is created in the following manner from these objects", and then ask a Make-like framework to handle the details of actually building them, figuring out which objects need to be updated, farming out work to multiple processors (like Make's -j option), and so on. Makefiles can do all this - but the huge problem is that all the actions have to be written as shell commands. This is not convenient if I'm working in R or Perl or another similar environment. Furthermore, a strong assumption in Make is that all targets are files - there are some exceptions and workarounds, but if my targets are e.g. rows in a database, that would be pretty painful.
To be clear, I'm not after a software-build system. I'm interested in something that (more generally?) deals with dependency webs of artifacts.
Anyone know of a framework for these kinds of dependency webs? Seems like it could be a nice tool for doing data science, & visually showing how results were generated, etc.
One extremely interesting example I saw recently was IncPy, but it looks like it hasn't been touched in quite a while, and it's very closely coupled with Python. It's probably also much more ambitious than I'm hoping for, which is why it has to be so closely coupled with Python.
Sorry for the vague question, let me know if some clarification would be helpful.
A new system called "Drake" was announced today that targets this exact situation: http://blog.factual.com/introducing-drake-a-kind-of-make-for-data . Looks very promising, though I haven't actually tried it yet.
This question is several years old, but I thought adding a link to remake here would be relevant.
From the GitHub repository:
The idea here is to re-imagine a set of ideas from make but built for R. Rather than having a series of calls to different instances of R (as happens if you run make on R scripts), the idea is to define pieces of a pipeline within an R session. Rather than being language agnostic (like make must be), remake is unapologetically R focussed.
It is not on CRAN yet, and I haven't tried it, but it looks very interesting.
I would give Bazel a try for this. It is primarily a software build system, but with its genrule type of artifacts it can perform pretty arbitrary file generation, too.
Bazel is very extendable, using its Python-like Starlark language which should be far easier to use for complicated tasks than make. You can start by writing simple genrule steps by hand, then refactor common patterns into macros, and if things become more complicated even write your own rules. So you should be able to express your individual transformations at a high level that models how you think about them, then turn that representation into lower level constructs using something that feels like a proper programming language.
Where make depends on timestamps, Bazel checks fingerprints. So if at any one step produces the same output even though one of its inputs changed, then subsequent steps won't need to get re-computed again. If some of your data processing steps project or filter data, there might be a high probability of this kind of thing happening.
I see your question is tagged for R, even though it doesn't mention it much. Under the hood, R computations would in Bazel still boil down to R CMD invocations on the shell. But you could have complicated muliti-line commands assembled in complicated ways, to read your inputs, process them and store the outputs. If the cost of initialization of the R binary is a concern, Rserve might help although using it would make the setup depend on a locally accessible Rserve instance I believe. Even with that I see nothing that would avoid the cost of storing the data to file, and loading it back from file. If you want something that avoids that cost by keeping things in memory between steps, then you'd be looking into a very R-specific tool, not a generic tool like you requested.
In terms of “visually showing how results were generated”, bazel query --output graph can be used to generate a graphviz dot file of the dependency graph.
Disclaimer: I'm currently working at Google, which internally uses a variant of Bazel called Blaze. Actually Bazel is the open-source released version of Blaze. I'm very familiar with using Blaze, but not with setting up Bazel from scratch.
Red-R has a concept of data flow programming. I have not tried it yet.

Writing robust R code: namespaces, masking and using the `::` operator

Short version
For those that don't want to read through my "case", this is the essence:
What is the recommended way of minimizing the chances of new packages breaking existing code, i.e. of making the code you write as robust as possible?
What is the recommended way of making the best use of the namespace mechanism when
a) just using contributed packages (say in just some R Analysis Project)?
b) with respect to developing own packages?
How best to avoid conflicts with respect to formal classes (mostly Reference Classes in my case) as there isn't even a namespace mechanism comparable to :: for classes (AFAIU)?
The way the R universe works
This is something that's been nagging in the back of my mind for about two years now, yet I don't feel as if I have come to a satisfying solution. Plus I feel it's getting worse.
We see an ever increasing number of packages on CRAN, github, R-Forge and the like, which is simply terrific.
In such a decentralized environment, it is natural that the code base that makes up R (let's say that's base R and contributed R, for simplicity) will deviate from an ideal state with respect to robustness: people follow different conventions, there's S3, S4, S4 Reference Classes, etc. Things can't be as "aligned" as they would be if there were a "central clearing instance" that enforced conventions. That's okay.
The problem
Given the above, it can be very hard to use R to write robust code. Not everything you need will be in base R. For certain projects you will end up loading quite a few contributed packages.
IMHO, the biggest issue in that respect is the way the namespace concept is put to use in R: R allows for simply writing the name of a certain function/method without explicitly requiring it's namespace (i.e. foo vs. namespace::foo).
So for the sake of simplicity, that's what everyone is doing. But that way, name clashes, broken code and the need to rewrite/refactor your code are just a matter of time (or of the number of different packages loaded).
At best, you will know about which existing functions are masked/overloaded by a newly added package. At worst, you will have no clue until your code breaks.
A couple of examples:
try loading RMySQL and RSQLite at the same time, they don't go along very well
also RMongo will overwrite certain functions of RMySQL
forecast masks a lot of stuff with respect to ARIMA-related functions
R.utils even masks the base::parse routine
(I can't recall which functions in particular were causing the problems, but am willing to look it up again if there's interest)
Surprisingly, this doesn't seem to bother a lot of programmers out there. I tried to raise interest a couple of times at r-devel, to no significant avail.
Downsides of using the :: operator
Using the :: operator might significantly hurt efficiency in certain contexts as Dominick Samperi pointed out.
When developing your own package, you can't even use the :: operator throughout your own code as your code is no real package yet and thus there's also no namespace yet. So I would have to initially stick to the foo way, build, test and then go back to changing everything to namespace::foo. Not really.
Possible solutions to avoid these problems
Reassign each function from each package to a variable that follows certain naming conventions, e.g. namespace..foo in order to avoid the inefficiencies associated with namespace::foo (I outlined it once here). Pros: it works. Cons: it's clumsy and you double the memory used.
Simulate a namespace when developing your package. AFAIU, this is not really possible, at least I was told so back then.
Make it mandatory to use namespace::foo. IMHO, that would be the best thing to do. Sure, we would lose some extend of simplicity, but then again the R universe just isn't simple anymore (at least it's not as simple as in the early 00's).
And what about (formal) classes?
Apart from the aspects described above, :: way works quite well for functions/methods. But what about class definitions?
Take package timeDate with it's class timeDate. Say another package comes along which also has a class timeDate. I don't see how I could explicitly state that I would like a new instance of class timeDate from either of the two packages.
Something like this will not work:
new(timeDate::timeDate)
new("timeDate::timeDate")
new("timeDate", ns="timeDate")
That can be a huge problem as more and more people switch to an OOP-style for their R packages, leading to lots of class definitions. If there is a way to explicitly address the namespace of a class definition, I would very much appreciate a pointer!
Conclusion
Even though this was a bit lengthy, I hope I was able to point out the core problem/question and that I can raise more awareness here.
I think devtools and mvbutils do have some approaches that might be worth spreading, but I'm sure there's more to say.
GREAT question.
Validation
Writing robust, stable, and production-ready R code IS hard. You said: "Surprisingly, this doesn't seem to bother a lot of programmers out there". That's because most R programmers are not writing production code. They are performing one-off academic/research tasks. I would seriously question the skillset of any coder that claims that R is easy to put into production. Aside from my post on search/find mechanism which you have already linked to, I also wrote a post about the dangers of warning. The suggestions will help reduce complexity in your production code.
Tips for writing robust/production R code
Avoid packages that use Depends and favor packages that use Imports. A package with dependencies stuffed into Imports only is completely safe to use. If you absolutely must use a package that employs Depends, then email the author immediately after you call install.packages().
Here's what I tell authors: "Hi Author, I'm a fan of the XYZ package. I'd like to make a request. Could you move ABC and DEF from Depends to Imports in the next update? I cannot add your package to my own package's Imports until this happens. With R 2.14 enforcing NAMESPACE for every package, the general message from R Core is that packages should try to be "good citizens". If I have to load a Depends package, it adds a significant burden: I have to check for conflicts every time I take a dependency on a new package. With Imports, the package is free of side-effects. I understand that you might break other people's packages by doing this. I think its the right thing to do to demonstrate a commitment to Imports and in the long-run it will help people produce more robust R code."
Use importFrom. Don't add an entire package to Imports, add only those specific functions that you require. I accomplish this with Roxygen2 function documentation and roxygenize() which automatically generates the NAMESPACE file. In this way, you can import two packages that have conflicts where the conflicts aren't in the functions you actually need to use. Is this tedious? Only until it becomes a habit. The benefit: you can quickly identify all of your 3rd-party dependencies. That helps with...
Don't upgrade packages blindly. Read the changelog line-by-line and consider how the updates will affect the stability of your own package. Most of the time, the updates don't touch the functions you actually use.
Avoid S4 classes. I'm doing some hand-waving here. I find S4 to be complex and it takes enough brain power to deal with the search/find mechanism on the functional side of R. Do you really need these OO feature? Managing state = managing complexity - leave that for Python or Java =)
Write unit tests. Use the testthat package.
Whenever you R CMD build/test your package, parse the output and look for NOTE, INFO, WARNING. Also, physically scan with your own eyes. There's a part of the build step that notes conflicts but doesn't attach a WARN, etc. to it.
Add assertions and invariants right after a call to a 3rd-party package. In other words, don't fully trust what someone else gives you. Probe the result a little bit and stop() if the result is unexpected. You don't have to go crazy - pick one or two assertions that imply valid/high-confidence results.
I think there's more but this has become muscle memory now =) I'll augment if more comes to me.
My take on it :
Summary : Flexibility comes with a price. I'm willing to pay that price.
1) I simply don't use packages that cause that kind of problems. If I really, really need a function from that package in my own packages, I use the importFrom() in my NAMESPACE file. In any case, if I have trouble with a package, I contact the package author. The problem is at their side, not R's.
2) I never use :: inside my own code. By exporting only the functions needed by the user of my package, I can keep my own functions inside the NAMESPACE without running into conflicts. Functions that are not exported won't hide functions with the same name either, so that's a double win.
A good guide on how exactly environments, namespaces and the likes work you find here:
http://blog.obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/
This definitely is a must-read for everybody writing packages and the likes. After you read this, you'll realize that using :: in your package code is not necessary.

Can I write a DLL file that exports functions for use from Common Lisp?

Can I write a DLL file that exports functions for use from or that use Common Lisp?
Each Common Lisp implementation has a different way to extend it from various foreign languages. Which implementation do you intend to use?
The GNU CLISP implementation allows one to define external modules written in C that expose objects, symbols, and functions to Lisp. The documentation for writing an external module is complete, but you'll likely find it difficult to integrate this into the rest of your build process, unless you're already using make or shell scripts to automate portions of it.
Alternately, you can turn the question around and ask how do you access C libraries from Common Lisp. Again, most implementations have a foreign function interface, or FFI that allows them to reach out to various other languages. CLISP has an FFI, but you can also use a package like CFFI for portability among Common Lisp implementations. The CLISP documentation describes the trades in these two approaches.
ECL may be another good choice for you if you intend to embed Common Lisp within your C program.
(..i'm not 100% sure what you mean, but i'll just throw some bits out there and see what happens..)
Most Lisps can do the C <--> Lisp type thing by ways of FFI, and there are compatibility layers/libraries for doing FFI like the already mentioned CFFI.
So you can pretty much always have Lisp call C functions and have C call Lisp functions, and most do it by loading .dll/.so files into the already running Lisp process. Note that this tends to be what other environments like Python (PyGTK etc.) do too. This is often exactly what you want, so you might perhaps want to ignore most of what I say below.
The only Lisp I can think of that enables one to do things the "other way around", i.e., load a .dll/.so which "is" Lisp or is produced by Lisp into an already running C process, is ECL.
In many cases it really does not matter where you put the entry point or the "main() function" to use C terms, so if you'd like to use some other Lisp besides ECL but are thinking you "can't because .." this is something to reconsider since, yeah, you can in many cases just shuffle thing around a bit.
However, it's almost always a much better idea to user other IPC mechanisms and avoid any kind of FFI when you can.

Compiled dynamic language

I search for a programming language for which a compiler exists and that supports self modifying code. I’ve heared that Lisp supports these features, but I was wondering if there is a more C/C++/D-Like language with these features.
To clarify what I mean:
I want to be able to have in some way access to the programms code at runtime and apply any kind of changes to it, that is, removing commands, adding commands, changing them.
As if i had the AstTree of my programm. Of course i can’t have that tree in a compiled language, so it must be done different. The compile would need to translate the self-modifying commands into their binary equivalent modifications so they would work in runtime with the compiled code.
I don’t want to be dependent on an VM, thats what i meant with compiled :)
Probably there is a reason Lisp is like it is? Lisp was designed to program other languages and to compute with symbolic representations of code and data. The boundary between code and data is no longer there. This influences the design AND the implementation of a programming language.
Lisp has got its syntactical features to generate new code, translate that code and execute it. Thus pre-parsed code is also using the same data structures (symbols, lists, numbers, characters, ...) that are used for other programs, too.
Lisp knows its data at runtime - you can query everything for its type or class. Classes are objects themselves, as are functions. So these elements of the programming language and the programs also are first-class objects, they can be manipulated as such. Dynamic language has nothing to do with 'dynamic typing'.
'Dynamic language' means that the elements of the programming language (for example via meta classes and the meta-object protocol) and the program (its classes, functions, methods, slots, inheritance, ...) can be looked at runtime and can be modified at runtime.
Probably the more of these features you add to a language, the more it will look like Lisp. Since Lisp is pretty much the local maximum of a simple, dynamic, programmable programming language. If you want some of these features, then you might want to think which features of your other program language you have to give up or are willing to give up. For example for a simple code-as-data language, the whole C syntax model might not be practical.
So C-like and 'dynamic language' might not really be a good fit - the syntax is one part of the whole picture. But even the C syntax model limits us how easy we can work with a dynamic language.
C# has always allowed for self-modifying code.
C# 1 allowed you to essentially create and compile code on the fly.
C# 3 added "expression trees", which offered a limited way to dynamically generate code using an object model and abstract syntax trees.
C# 4 builds on that by incorporating support for the "Dynamic Language Runtime". This is probably as close as you are going to get to LISP-like capabilities on the .NET platform in a compiled language.
You might want to consider using C++ with LLVM for (mostly) portable code generation. You can even pull in clang as well to work in C parse trees (note that clang has incomplete support for C++ currently, but is written in C++ itself)
For example, you could write a self-modification core in C++ to interface with clang and LLVM, and the rest of the program in C. Store the parse tree for the main program alongside the self-modification code, then manipulate it with clang at runtime. Clang will let you directly manipulate the AST tree in any way, then compile it all the way down to machine code.
Keep in mind that manipulating your AST in a compiled language will always mean including a compiler (or interpreter) with your program. LLVM is just an easy option for this.
JavaScirpt + V8 (the Chrome JavaScript compiler)
JavaScript is
dynamic
self-modifying (self-evaluating) (well, sort of, depending on your definition)
has a C-like syntax (again, sort of, that's the best you will get for dynamic)
And you now can compile it with V8: http://code.google.com/p/v8/
"Dynamic language" is a broad term that covers a wide variety of concepts. Dynamic typing is supported by C# 4.0 which is a compiled language. Objective-C also supports some features of dynamic languages. However, none of them are even close to Lisp in terms of supporting self modifying code.
To support such a degree of dynamism and self-modifying code, you should have a full-featured compiler to call at run time; this is pretty much what an interpreter really is.
Try groovy. It's a dynamic Java-JVM based language that is compiled at runtime. It should be able to execute its own code.
http://groovy.codehaus.org/
Otherwise, you've always got Perl, PHP, etc... but those are not, as you suggest, C/C++/D- like languages.
I don’t want to be dependent on an VM, thats what i meant with compiled :)
If that's all you're looking for, I'd recommend Python or Ruby. They can both run on their own virtual machines and the JVM and the .Net CLR. Thus, you can choose any runtime you want. Of the two, Ruby seems to have more meta-programming facilities, but Python seems to have more mature implementations on other platforms.

Resources