Adding a Base function vs using a unique function name in Julia

I am porting a library to Julia and am curious what would be considered best practice for adding Base functions from a module.
This library contains functions like values(t::MyType) and keys(t::MyType) that take unique struct types but do not really do the same thing or return the same types as the Base functions.
What would be the best practice in this case?
1. Just add Base.values(t::MyType) and Base.keys(t::MyType) functions so they can be used without prefixes.
2. Change the function names to my_type_keys(t::MyType) and my_type_values(t::MyType).
3. Use their original names and require them to be prefixed as MyModule.values(t) and MyModule.keys(t).

If you extend Base functions you should aim for them to do conceptually the same thing. Also, you should only extend Base functions to dispatch on types you define in your own package. The rule is that you can define methods for outside (e.g. Base or some other package's) functions for your own types, or define your own functions for outside types. But defining methods for outside functions on outside types is "type piracy".
Given that you'd be defining them on your own types, it's not a problem that the return values differ, as long as the functions are conceptually the same.
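A minimal sketch of option 1, assuming a hypothetical MyType with two fields (the module name, field names, and data are invented for illustration and are not from the library being ported):

```julia
module MyModule

export MyType

# Hypothetical struct standing in for the ported library's type.
struct MyType
    names::Vector{Symbol}
    data::Vector{Float64}
end

# Qualifying the name with `Base.` adds methods to the existing Base
# functions instead of creating new, unrelated functions.
# Both methods dispatch only on a type this module owns, so this is
# not type piracy.
Base.keys(t::MyType) = t.names
Base.values(t::MyType) = t.data

end # module

using .MyModule

t = MyType([:a, :b], [1.0, 2.0])
keys(t)               # calls the MyType method, no prefix needed
values(t)             # calls the MyType method
keys(Dict(:x => 1))   # Base behaviour for other types is unchanged
```

Because the methods live on Base.keys and Base.values, users get them without any prefix as soon as they have a MyType in hand.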
With regard to option 2 or 3, you can do both. Option 3 requires three things: that you don't import the Base functions explicitly (if you did, you'd be extending them with new methods rather than defining new functions with the same name), that you don't export them, and, most likely, that you won't be using the Base functions inside your module. keys, for example, is very widely used; you can still use it inside the module, but you would have to write it as Base.keys.
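Option 3 could look roughly like the following sketch. The module name PortedLib, the struct field, and the helper dict_keys are hypothetical and only serve to show the name resolution:

```julia
module PortedLib

# Hypothetical struct standing in for the ported library's type.
struct MyType
    names::Vector{Symbol}
end

# No `import Base: keys` and no `Base.` prefix on the definition, so this
# creates a new function PortedLib.keys that is unrelated to Base.keys.
# It is deliberately not exported.
keys(t::MyType) = t.names

# Inside the module the bare name `keys` now means PortedLib.keys, so the
# Base version has to be qualified explicitly.
dict_keys(d::AbstractDict) = collect(Base.keys(d))

end # module

t = PortedLib.MyType([:a, :b])
PortedLib.keys(t)       # the module's own function, always prefixed
keys(Dict(:x => 1))     # at the call site, `keys` is still Base.keys
```

The definition of keys has to come before any unqualified use of keys inside the module; otherwise the name has already been resolved to Base.keys and the definition fails.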
Option 2 is always safe, especially if you can find a better name than my_keys. BUT I think it is preferable to use the same name IF the function is doing conceptually the same thing, as it simplifies the user experience a lot. You can assume most users will know the Base functions and try them out if that's intuitive.

Related

Is it possible to use function overloading in R?

Is it possible to use function overloading in R? I've seen answers to this question from 2013, but the language and its possibilities evolved a lot over time.
In my specific case, I'd like the following:
formFeedback <- function(success, message)
formFeedback <- function(feedback_object)
Is this at all possible? If not, what's the best practice? Should I use two different named functions, should I make all parameters optional, or should I force the users to always pass an object and make the other one impossible?
No, there's no function overloading in R, though there are a few object systems that accomplish similar things. The simplest one is S3: it dispatches based on the class attribute of the first argument (though there are a few cases where it looks at other arguments).
So if your feedback_object has a class attribute, you could make your first form the default and your second one the method for that class, but you'd have to rename the first argument to match in all methods. The functions would need to have different names, formFeedback.default and formFeedback.yourclass.
There's also a more complicated system called S4 that can dispatch on the types of all arguments.
And you can craft your own dispatch if you like (and this is the basis for several other object systems in packages), using whatever method you like to decide which function to call.

What is the ccallable macro for in Julia?

I have recently seen that there is a macro defined in base/c.jl called ccallable, but it is not clear to me what its usefulness is. It seems to be undocumented.
After reading the documentation a little more carefully, I would summarise the usefulness of this macro as follows:
Make the annotated function be callable from C using its name. This can, for example, be used to expose functionality as a C-API when creating a custom Julia sysimage.
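As a rough illustration, a function annotated this way might look like the sketch below. The function name increment32 is hypothetical; the sketch assumes the macro's usual requirements of concrete, C-compatible argument types and a declared return type:

```julia
# Hypothetical example function. Cint is Julia's alias for the C `int` type.
Base.@ccallable function increment32(count::Cint)::Cint
    return count + Cint(1)
end

# The function remains an ordinary Julia function as well.
increment32(Cint(41))
```

Actually calling it from C additionally requires building a sysimage or library that contains the function, which is outside the scope of this sketch.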

What is Multiple dispatch and how does one use it in Julia?

I have seen and heard many times that Julia allows "multiple dispatch", but I am not really sure what that means or looks like. Can anyone provide me an example of what it looks like programmatically and what it enables?
From the Julia docs:
The choice of which method to execute when a function is applied is called dispatch. Julia allows the dispatch process to choose which of a function's methods to call based on the number of arguments given, and on the types of all of the function's arguments. This is different than traditional object-oriented languages, where dispatch occurs based only on the first argument, which often has a special argument syntax, and is sometimes implied rather than explicitly written as an argument. Using all of a function's arguments to choose which method should be invoked, rather than just the first, is known as multiple dispatch. Multiple dispatch is particularly useful for mathematical code, where it makes little sense to artificially deem the operations to "belong" to one argument more than any of the others: does the addition operation in x + y belong to x any more than it does to y? The implementation of a mathematical operator generally depends on the types of all of its arguments. Even beyond mathematical operations, however, multiple dispatch ends up being a powerful and convenient paradigm for structuring and organizing programs.
So in short: traditional object-oriented languages rely only on the first parameter of a method (the receiver object) to determine which method should be called, whereas in Julia the types of all parameters are taken into account. This enables multiple definitions of a function that share the same type for the first parameter and differ only in the later ones.
A simple example of multiple dispatch in Julia can be found here.
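For a concrete picture, here is a small sketch with made-up types (Shape, Circle, Square, and collide are illustrative names only). Several methods share the same first argument type and are told apart by the second:

```julia
abstract type Shape end
struct Circle <: Shape end
struct Square <: Shape end

# The method is selected from the runtime types of both arguments,
# not just the first one.
collide(a::Circle, b::Circle) = "circle hits circle"
collide(a::Circle, b::Square) = "circle hits square"
collide(a::Square, b::Circle) = "square hits circle"
collide(a::Shape, b::Shape) = "generic collision"   # fallback method

# The element type is the abstract Shape, so the concrete types are only
# known at run time, yet the most specific method is still chosen.
shapes = Shape[Circle(), Square()]
for a in shapes, b in shapes
    println(collide(a, b))
end
```

The Square/Square pair has no dedicated method, so it falls through to the generic Shape/Shape fallback.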

Difference between _ptr, _pointer, and _cpointer in Racket's FFI

The Racket FFI's documentation has types for _ptr, _cpointer, and _pointer. [1]
However, the documentation (as of writing this question) does not seem to compare the three different types. Obviously the first two are functions that produce ctype?s, whereas the last one is a ctype? itself. But when would I use one type over the other?
[1] It also has other types such as _box, _list, _gcpointer, and _cpointer/null. These are all variants of those three functions.
_ptr is a macro that is used to create types that are suitable for function types in which you need to pass data via a pointer passed as an argument (a very common idiom in C).
_pointer is a generic pointer ctype that can be used pretty much wherever a pointer is expected or returned. On the Racket side, it becomes an opaque value that you can't manipulate very easily (you can use ptr-ref if you need it). Note the docs have some caveats about interactions with GC when using this.
_cpointer constructs safer variants of _pointer that use tags to ensure that you don't mix up pointers of different types. It's generally more convenient to use define-cpointer-type instead of manually constructing these. In other words, these help you build abstractions represented by Racket's C pointers. You can do it manually with cpointer-push-tag! and _pointer but that's less convenient.
There's also a blog post I wrote that goes into more detail about some of these pointer issues: http://prl.ccs.neu.edu/blog/2016/06/27/tutorial-using-racket-s-ffi/

What is the purpose of environments in R and when I need to use more than one?

This is a basic R question: R has the concept of environment. So what purpose does it have, when do I need to start more than one and how do I switch between them? What is the advantage of multiple environments (other than looking up content of .Rdata file)?
The idea of environments is important and you use them all the time, mostly without realizing it. If you are just using R and not doing anything fancy then the indirect use of environments is all that you need and you will not need to explicitly create and manipulate environments. Only when you get into more advanced usage will you need to understand more. The main place that you use environments (indirectly) is that every function call gets its own environment, so every time you run a function you are using a new environment. This is important because it means that if the function uses a variable named "x" and you have a variable named "x", then R can keep them straight and use the correct one when it needs to, and your copy of "x" does not get overwritten by the function's version.
Some other cases where you might use environments: Each package has its own environment, so two loaded packages can each have an internal function with the same name and they won't interfere with each other. You can keep your workspace a little cleaner by attaching a new environment and loading function definitions into that environment rather than the global or working environment. When you write your own functions and you want to share variables between functions then you will need to understand about environments. Environments can be used to emulate pass-by-reference instead of pass-by-value if you are ever in a situation where that matters (if you don't recognize those phrases then it probably does not matter).
You can think of environments as unordered lists. Both datatypes offer something like the hash table data structure to the user, i.e., a mapping from names to values. The lack of ordering in environments offers better performance when compared with lists on similar tasks.
The access functions [[ and $ work for both.
A nice fact about environments which is not true for lists is that environments are passed by reference when supplied as function arguments, offering a way to improve performance when working with large objects.
Personally, I never work directly with environments. Instead, I divide my scripts up in functions. This leads to an increased reusability, and to more structure. In addition, each function runs in its own environment, ensuring minimum interference in variables etc.
