Treating single and multiple elements the same way ("transparent" map operator) - collections

I'm working on a programming language that is supposed to be easy, intuitive, and succinct (yeah, I know, I'm the first person to ever come up with that goal ;-) ).
One of the features that I am considering for simplifying the use of container types is to make the methods of the container's element type available on the container type itself, basically as a shortcut for invoking a map(...) method. The idea is that working with many elements should not be different from working with a single element: I can apply add(5) to a single number or to a whole list of numbers, and I shouldn't have to write slightly different code for the "one" versus the "many" scenario.
For example (Java pseudo-code):
import static java.math.BigInteger.*; // ZERO, ONE, ...
...
// NOTE: BigInteger has an add(BigInteger) method
Stream<BigInteger> numbers = Stream.of(ZERO, ONE, TWO, TEN);
Stream<BigInteger> one2Three11 = numbers.add(ONE); // = 1, 2, 3, 11
// this would be equivalent to: numbers.map(ONE::add)
As far as I can tell, the concept would not only apply to "container" types (streams, lists, sets...), but more generally to all functor-like types that have a map method (e.g., optionals, state monads, etc.).
The implementation approach would probably be more along the lines of syntactic sugar offered by the compiler rather than by manipulating the actual types (Stream<BigInteger> obviously does not extend BigInteger, and even if it did the "map-add" method would have to return a Stream<BigInteger> instead of an Integer, which would be incompatible with most languages' inheritance rules).
I have two questions regarding such a proposed feature:
(1) What are the known caveats with offering such a feature? Method name collisions between the container type and the element type are one problem that comes to mind (e.g., when I call add on a List<BigInteger> do I want to add an element to the list or do I want to add a number to all elements of the list? The argument type should clarify this, but it's something that could get tricky)
(2) Are there any existing languages that offer such a feature, and if so, how is this implemented under the hood? I did some research, and while pretty much every modern language has something like a map operator, I could not find any languages where the one-versus-many distinction would be completely transparent (which leads me to believe that there is some technical difficulty that I'm overlooking here)
NOTE: I am looking at this in a purely functional context that does not support mutable data (not sure if that matters for answering these questions)

Do you come from an object-oriented background? That's my guess because you're thinking of map as a method belonging to each different "type" as opposed to thinking about various things that are of the type functor.
Compare how TypeScript would handle this if map were a property of each individual functor:
declare someOption: Option<number>
someOption.map(val => val * 2) // Option<number>
declare someEither: Either<string, number>
someEither.map(val => val * 2) // Either<string,number>
someEither.mapLeft(string => 'ERROR') // Either<'ERROR', number>
You could also create a constant representing each individual functor instance (option, array, identity, either, async/Promise/Task, etc.), where these constants have map as a method. Then have a standalone map method that takes one of those "functor constant"s, the mapping function, and the starting value, and returns the new wrapped value:
const option: Functor = {
map: <A, B>(f: (a:A) => B) => (o:Option<A>) => Option<B>
}
declare const someOption: Option<number>
map(option)(val => val * 2)(someOption) // Option<number>
declare const either: Functor = {
map: <E, A, B>(f: (a:A) => B) => (e:Either<E, A>) => Either<E, B>
}
declare const either: Either<string,number>
map(either)(val => val * 2)(someEither)
Essentially, you have a functor "map" that uses the first parameter to identify which type you're going to be mapping, and then you pass in the data and the mapping function.
However, with proper functional languages like Haskell, you don't have to pass in that "functor constant" because the language will apply it for you. Haskell does this. I'm not fluent enough in Haskell to write you the examples, unfortunately. But that's a really nice benefit that means even less boilerplate. It also allows you to write a lot of your code in what is "point free" style, so refactoring becomes much easier if you make your language so you don't have to manually specify the type being used in order to take advantage of map/chain/bind/etc.
Consider you initially write your code that makes a bunch of API calls over HTTP. So you use a hypothetical async monad. If your language is smart enough to know which type is being used, you could have some code like
import { map as asyncMap }
declare const apiCall: Async<number>
asyncMap(n => n*2)(apiCall) // Async<number>
Now you change your API so it's reading a file and you make it synchronous instead:
import { map as syncMap }
declare const apiCall: Sync<number>
syncMap(n => n*2)(apiCall)
Look how you have to change multiple pieces of the code. Now imagine you have hundreds of files and tens of thousands of lines of code.
With a point-free style, you could do
import { map } from 'functor'
declare const apiCall: Async<number>
map(n => n*2)(apiCall)
and refactor to
import { map } from 'functor'
declare const apiCall: Sync<number>
map(n => n*2)(apiCall)
If you had a centralized location of your API calls, that would be the only place you're changing anything. Everything else is smart enough to recognize which functor you're using and apply map correctly.
As far as your concerns about name collisions, that's a concern that will exist no matter your language or design. But in functional programming, add would be a combinator that is your mapping function passed into your fmap (Haskell term) / map(lots of imperative/OO languages' term). The function you use to add a new element to the tail end of an array/list might be called snoc ("cons" from "construct" spelled backwards, where cons prepends an element to your array; snoc appends). You could also call it push or append.
As far as your one-vs-many issue, these are not the same type. One is a list/array type, and the other is an identity type. The underlying code treating them would be different as they are different functors (one contains a single element, while one contains multiple elements.
I suppose you could create a language that disallows single elements by automatically wrapping them as a single-element lists and then just uses the list map. But this seems like a lot of work to make two things that are very different look the same.
Instead, the approach where you wrap single elements to be identity and multiple elements to be a list/array, and then array and identity have their own under-the-hood handlers for the functor method map probably would be better.

Related

Add element to immutable vector rust

I am trying to create a user input validation function in rust utilising functional programming and recursion. How can I return an immutable vector with one element concatenated onto the end?
fn get_user_input(output_vec: Vec<String>) -> Vec<String> {
// Some code that has two variables: repeat(bool) and new_element(String)
if !repeat {
return output_vec.add_to_end(new_element); // What function could "add_to_end" be?
}
get_user_input(output_vec.add_to_end(new_element)) // What function could "add_to_end" be?
}
There are functions for everything else:
push adds a mutable vector to a mutable vector
append adds an element to the end of a mutable vector
concat adds an immutable vector to an immutable vector
??? adds an element to the end of a immutable vector
The only solution I have been able to get working is using:
[write_data, vec![new_element]].concat()
but this seems inefficient as I'm making a new vector for just one element (so the size is known at compile time).
You are confusing Rust with a language where you only ever have references to objects. In Rust, code can have exclusive ownership of objects, and so you don't need to be as careful about mutating an object that could be shared, because you know whether or not the object is shared.
For example, this is valid JavaScript code:
const a = [];
a.push(1);
This works because a does not contain an array, it contains a reference to an array.1 The const prevents a from being repointed to a different object, but it does not make the array itself immutable.
So, in these kinds of languages, pure functional programming tries to avoid mutating any state whatsoever, such as pushing an item onto an array that is taken as an argument:
function add_element(arr) {
arr.push(1); // Bad! We mutated the array we have a reference to!
}
Instead, we do things like this:
function add_element(arr) {
return [...arr, 1]; // Good! We leave the original data alone.
}
What you have in Rust, given your function signature, is a totally different scenario! In your case, output_vec is owned by the function itself, and no other entity in the program has access to it. There is therefore no reason to avoid mutating it, if that is your goal:
fn get_user_input(mut output_vec: Vec<String>) -> Vec<String> {
// Add mut ^^^
You have to keep in mind that any non-reference is an owned value. &Vec<String> would be an immutable reference to a vector something else owns, but Vec<String> is a vector this code owns and nobody else has access to.
Don't believe me? Here's a simple example of broken code that demonstrates this:
fn take_my_vec(y: Vec<String>) { }
fn main() {
let mut x = Vec::<String>::new();
x.push("foo".to_string());
take_my_vec(x);
println!("{}", x.len()); // E0382
}
The expression x.len() causes a compile-time error, because the vector x was moved into the function argument and we don't own it anymore.
So why shouldn't the function mutate the vector it owns now? The caller can't use it anymore.
In summary, functional programming looks a bit different in Rust. In other languages that have no way to communicate "I'm giving you this object" you must avoid mutating values you are given because the caller may not expect you to change them. In Rust, who owns a value is clear, and the argument reflects that:
Is the argument a value (Vec<String>)? The function owns the value now, the caller gave it away and can't use it anymore. Mutate it if you need to.
Is the argument an immutable reference (&Vec<String>)? The function doesn't own it, and it can't mutate it anyway because Rust won't allow it. You could clone it and mutate the clone.
Is the argument a mutable reference (&mut Vec<String>)? The caller must explicitly give the function a mutable reference and is therefore giving the function permission to mutate it -- but the function still doesn't own the value. The function can mutate it, clone it, or both -- it depends what the function is supposed to do.
If you take an argument by value, there is very little reason not to make it mut if you need to change it for whatever reason. Note that this detail (mutability of function arguments) isn't even part of the function's public signature simply because it's not the caller's business. They gave the object away.
Note that with types that have type arguments (like Vec) other expressions of ownership are possible. Here are a few examples (this is not an exhaustive list):
Vec<&String>: You now own a vector, but you don't own the String objects that it contains references to.
&Vec<&String>: You are given read-only access to a vector of string references. You could clone this vector, but you still couldn't change the strings, just rearrange them, for example.
&Vec<&mut String>: You are given read-only access to a vector of mutable string references. You can't rearrange the strings, but you can change the strings themselves.
&mut Vec<&String>: Like the above but opposite: you are allowed to rearrange the string references but you can't change the strings.
1 A good way to think of it is that non-primitive values in JavaScript are always a value of Rc<RefCell<T>>, so you're passing around a handle to the object with interior mutability. const only makes the Rc<> immutable.

Result functions of type x => x are unnecessary?

There might be a gap in my understanding of how Reselect works.
If I understand it correctly the code beneath:
const getTemplates = (state) => state.templates.templates;
export const getTemplatesSelector = createSelector(
[getTemplates],
templates => templates
);
could just as well (or better), without loosing anything, be written as:
export const getTemplatesSelector = (state) => state.templates.templates;
The reason for this, if I understand it correctly, is that Reselect checks it's first argument and if it receives the exact same object as before it returns a cached output. It does not check for value equality.
Reselect will only run templates => templates when getTemplates returns a new object, that is when state.templates.templates references a new object.
The input in this case will be exactly the same as the input so no caching functionality is gained by using Reselect.
One can only gain performance from Reselect's caching-functionality when the function (in this case templates => templates) itself returns a new object, for example via .filter or .map or something similar. In this case though the object returned is the same, no changes are made and thus we gain nothing by Reselect's memoization.
Is there anything wrong with what I have written?
I mainly want to make sure that I correctly understand how Reselect works.
-- Edits --
I forgot to mention that what I wrote assumes that the object state.templates.templates immutably that is without mutation.
Yes, in your case reselect won't bring any benefit, since getTemplates is evaluated on each call.
The 2 most important scenarios where reselect shines are:
stabilizing the output of a selector when result function returns a new object (which you mentioned)
improving selector's performance when result function is computationally expensive

Is it possible to declare a tuple struct whose members are private, except for initialization?

Is it possible to declare a tuple struct where the members are hidden for all intents and purposes, except for declaring?
// usize isn't public since I don't want users to manipulate it directly
struct MyStruct(usize);
// But now I can't initialize the struct using an argument to it.
let my_var = MyStruct(0xff)
// ^^^^
// How to make this work?
Is there a way to keep the member private but still allow new structs to be initialized with an argument as shown above?
As an alternative, a method such as MyStruct::new can be implemented, but I'm still interested to know if its possible to avoid having to use a method on the type since it's shorter, and nice for types that wrap a single variable.
Background
Without going into too many details, the only purpose of this type is to wrap a single type (a helper which hides some details, adds some functionality and is optimized away completely when compiled), in this context it's not exactly exposing hidden internals to use the Struct(value) style initializing.
Further, since the wrapper is zero overhead, its a little misleading to use the new method which is often associated with allocation/creation instead of casting.
Just as it's convenient type (int)v or int(v), instead of int::new(v), I'd like to do this for my own type.
It's used often, so the ability to use short expression is very convenient. Currently I'm using a macro which calls a new method, its OK but a little awkward/indirect, hence this question.
Strictly speaking this isn't possible in Rust.
However the desired outcome can be achieved using a normal struct with a like-named function (yes, this works!)
pub struct MyStruct {
value: usize,
}
#[allow(non_snake_case)]
pub fn MyStruct(value: usize) -> MyStruct {
MyStruct { value }
}
Now, you can write MyStruct(5) but not access the internals of MyStruct.
I'm afraid that such a concept is not possible, but for a good reason. Each member of a struct, unless marked with pub, is admitted as an implementation detail that should not raise to the surface of the public API, regardless of when and how the object is currently being used. Under this point of view, the question's goal reaches a conundrum: wishing to keep members private while letting the API user define them arbitrarily is not only uncommon but also not very sensible.
As you mentioned, having a method named new is the recommended approach of doing that. It's not like you're compromising code readability with the extra characters you have to type. Alternatively, for the case where the struct is known to wrap around an item, making the member public can be a possible solution. That, on the other hand, would allow any kind of mutations through a mutable borrow (thus possibly breaking the struct's invariants, as mentioned by #MatthieuM). This decision depends on the intended API.

Difference between List, Tuple, Sequence, Sequential, Iterable, Array, etc. in Ceylon

Ceylon has several different concepts for things that might all be considered some kind of array: List, Tuple, Sequence, Sequential, Iterable, Array, Collection, Category, etc. What's is different about these these types and when should I use them?
The best place to start to learn about these things at a basic level is the Ceylon tour. And the place to learn about these things in depth is the module API. It can also be helpful to look at the source files for these.
Like all good modern programming languages, the first few interfaces are super abstract. They are built around one formal member and provide their functionality through a bunch of default and actual members. (In programming languages created before Java 8, you may have heard these called "traits" to distinguish them from traditional interfaces which have only formal members and no functionality.)
Category
Let's start by talking about the interface Category. It represents types of which you can ask "does this collection contain this object", but you may not necessarily be able to get any of the members out of the collection. It's formal member is:
shared formal Boolean contains(Element element)
An example might be the set of all the factors of a large number—you can efficiently test if any integer is a factor, but not efficiently get all the factors.
Iterable
A subtype of Category is the interface Iterable. It represents types from which you can get each element one at a time, but not necessarily index the elements. The elements may not have a well-defined order. The elements may not even exist yet but are generated on the fly. The collection may even be infinitely long! It's formal member is:
shared formal Iterator<Element> iterator()
An example would be a stream of characters like standard out. Another example would be a range of integers provided to a for loop, for which it is more memory efficient to generate the numbers one at a time.
This is a special type in Ceylon and can be abbreviated {Element*} or {Element+} depending on if the iterable might be empty or is definitely not empty, respectively.
Collection
One of Iterable's subtypes is the interface Collection. It has one formal member:
shared formal Collection<Element> clone()
But that doesn't really matter. The important thing that defines a Collection is this line in the documentation:
All Collections are required to support a well-defined notion of value
equality, but the definition of equality depends upon the kind of
collection.
Basically, a Collection is a collection who structure is well-defined enough to be equatable to each other and clonable. This requirement for a well-defined structure means that this is the last of the super abstract interfaces, and the rest are going to look like more familiar collections.
List
One of Collection's subtypes is the interface List. It represents a collection whose elements we can get by index (myList[42]). Use this type when your function requires an array to get things out of, but doesn't care if it is mutable or immutable. It has a few formal methods, but the important one comes from its other supertype Correspondence:
shared formal Item? get(Integer key)
Sequential, Sequence, Empty
The most important of List's subtypes is the interface Sequential. It represents an immutable List. Ceylon loves this type and builds a lot of syntax around it. It is known as [Element*] and Element[]. It has exactly two subtypes:
Empty (aka []), which represents empty collections
Sequence (aka [Element+]), which represents nonempty collections.
Because the collections are immutable, there are lots of things you can do with them that you can't do with mutable collections. For one, numerous operations can fail with null on empty lists, like reduce and first, but if you first test that the type is Sequence then you can guarantee these operations will always succeed because the collection can't become empty later (they're immutable after all).
Tuple
A very special subtype of Sequence is Tuple, the first true class listed here. Unlike Sequence, where all the elements are constrained to one type Element, a Tuple has a type for each element. It gets special syntax in Ceylon, where [String, Integer, String] is an immutable list of exactly three elements with exactly those types in exactly that order.
Array
Another subtype of List is Array, also a true class. This is the familiar Java array, a mutable fixed-size list of elements.
drhagen has already answered the first part of your question very well, so I’m just going to say a bit on the second part: when do you use which type?
In general: when writing a function, make it accept the most general type that supports the operations you need. So far, so obvious.
Category is very abstract and rarely useful.
Iterable should be used if you expect some stream of elements which you’re just going to iterate over (or use stream operations like filter, map, etc.).
Another thing to consider about Iterable is that it has some extra syntax sugar in named arguments:
void printAll({Anything*} things, String prefix = "") {
for (thing in things) {
print(prefix + (thing?.string else "<null>"));
}
}
printAll { "a", "b", "c" };
printAll { prefix = "X"; "a", "b", "c" };
Try online
Any parameter of type Iterable can be supplied as a list of comma-separated arguments at the end of a named argument list. That is,
printAll { "a", "b", "c" };
is equivalent to
printAll { things = { "a", "b", "c" }; };
This allows you to craft DSL-style expressions; the tour has some nice examples.
Collection is, like Correspondence, fairly abstract and in my experience rarely used directly.
List sounds like it should be a often used type, but actually I don’t recall using it a lot. I’m not sure why. I seem to skip over it and declare my parameters as either Iterable or Sequential.
Sequential and Sequence are when you want an immutable, fixed-length list. It also has some syntax sugar: variadic methods like void foo(String* bar) are a shortcut for a Sequential or Sequence parameter. Sequential also allows you to do use the nonempty operator, which often works out nicely in combination with first and rest:
String commaSeparated(String[] strings) {
if (nonempty strings) {
value sb = StringBuilder();
sb.append(strings.first); // known to exist
for (string in strings.rest) { // skip the first one
sb.append(", ").append(string);
// we don’t need a separate boolean to track if we need the comma or not :)
}
return sb.string;
} else {
return "";
}
}
Try online
I usually use Sequential and Sequence when I’m going to iterate over a stream several times (which could be expensive for a generic Iterable), though List might be the better interface for that.
Tuple should never be used as Tuple (except in the rare case where you’re abstracting over them), but with the [X, Y, Z] syntax sugar it’s often useful. You can often refine a Sequential member to a Tuple in a subclass, e. g. the superclass has a <String|Integer>[] elements which in one subclass is known to be a [String, Integer] elements.
Array I’ve never used as a parameter type, only rarely as a class to instantiate and use.

OCaml visitor pattern

I am implementing a simple C-like language in OCaml and, as usual, AST is my intermediate code representation. As I will be doing quite some traversals on the tree, I wanted to implement
a visitor pattern to ease the pain. My AST currently follows the semantics of the language:
type expr = Plus of string*expr*expr | Int of int | ...
type command = While of boolexpr*block | Assign of ...
type block = Commands of command list
...
The problem is now that nodes in a tree are of different type. Ideally, I would pass to the
visiting procedure a single function handling a node; the procedure would switch on type of the node and do the work accordingly. Now, I have to pass a function for each node type, which does not seem like a best solution.
It seems to me that I can (1) really go with this approach or (2) have just a single type above. What is the usual way to approach this? Maybe use OO?
Nobody uses the visitor pattern in functional languages -- and that's a good thing. With pattern matching, you can fortunately implement the same logic much more easily and directly just using (mutually) recursive functions.
For example, assume you wanted to write a simple interpreter for your AST:
let rec run_expr = function
| Plus(_, e1, e2) -> run_expr e1 + run_expr e2
| Int(i) -> i
| ...
and run_command = function
| While(e, b) as c -> if run_expr e <> 0 then (run_block b; run_command c)
| Assign ...
and run_block = function
| Commands(cs) = List.iter run_command cs
The visitor pattern will typically only complicate this, especially when the result types are heterogeneous, like here.
It is indeed possible to define a class with one visiting method per type of the AST (which by default does nothing) and have your visiting functions taking an instance of this class as a parameter. In fact, such a mechanism is used in the OCaml world, albeit not that often.
In particular, the CIL library has a visitor class
(see https://github.com/kerneis/cil/blob/develop/src/cil.mli#L1816 for the interface). Note that CIL's visitors are inherently imperative (transformations are done in place). It is however perfectly possible to define visitors that maps an AST into another one, such as in Frama-C, which is based on CIL and offer in-place and copy visitor. Finally Cαml, an AST generator meant to easily take care of bound variables, generate map and fold visitors together with the datatypes.
If you have to write many different recursive operations over a set of mutually recursive datatypes (such as an AST) then you can use open recursion (in the form of classes) to encode the traversal and save yourself some boiler plate.
There is an example of such a visitor class in Real World OCaml.
The Visitor pattern (and all pattern related to reusable software) has to do with reusability in an inclusion polymorphism context (subtypes and inheritance).
Composite explains a solution in which you can add a new subtype to an existing one without modifying the latter one code.
Visitor explains a solution in which you can add a new function to an existing type (and to all of its subtypes) without modifying the type code.
These solutions belong to object-oriented programming and require message sending (method invocation) with dynamic binding.
You can do this in Ocaml is you use the "O" (the Object layer), with some limitation coming with the advantage of having strong typing.
In OCaml Having a set of related types, deciding whether you will use a class hierarchy and message sending or, as suggested by andreas, a concrete (algebraic) type together with pattern matching and simple function call, is a hard question.
Concrete types are not equivalent. If you choose the latter, you will be unable to define a new node in your AST after your node type will be defined and compiled. Once said that a A is either a A1 or a A2, your cannot say later on that there are also some A3, without modifying the source code.
In your case, if you want to implement a visitor, replace your EXPR concrete type by a class and its subclasses and your functions by methods (which are also functions by the way). Dynamic binding will then do the job.

Resources