Relation between the map function and the mathematical concept of a map

I started reading "Conceptual Mathematics: A First Introduction to Categories". There, a map is defined as having a domain and a codomain, with exactly one arrow leaving each element of the domain and mapping it to an element of the codomain.
However, my concurrent endeavours in Haskell show the map function (without filtering) mapping everything in the domain to everything in the codomain.
This leads me to state that the map function in and of itself does not generate correct maps in the mathematical sense. Am I correct in stating this?

Despite the similar name, map is not the same thing as the mathematical concept of a function/map. That is, not all functions are special cases of map. They're just distinct things that happen to share a name; that happens all the time.
However, map itself is a particular function/map (domain a -> b, codomain [a] -> [b]), and map f, for any f, is a function/map (domain [a], codomain [b]).
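To make that concrete, here is a minimal Haskell sketch (nothing beyond the Prelude is assumed; the names are made up for the example). It shows that map consumes a function and produces a function between lists, and that the partial application map double is itself a perfectly good function/map in the mathematical sense:
double :: Int -> Int
double x = x * 2

-- map double is itself a function/map with domain [Int] and codomain [Int]
doubleAll :: [Int] -> [Int]
doubleAll = map double

main :: IO ()
main = print (doubleAll [1, 2, 3])   -- prints [2,4,6]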

Related

Treating single and multiple elements the same way ("transparent" map operator)

I'm working on a programming language that is supposed to be easy, intuitive, and succinct (yeah, I know, I'm the first person to ever come up with that goal ;-) ).
One of the features that I am considering for simplifying the use of container types is to make the methods of the container's element type available on the container type itself, basically as a shortcut for invoking a map(...) method. The idea is that working with many elements should not be different from working with a single element: I can apply add(5) to a single number or to a whole list of numbers, and I shouldn't have to write slightly different code for the "one" versus the "many" scenario.
For example (Java pseudo-code):
import static java.math.BigInteger.*; // ZERO, ONE, ...
...
// NOTE: BigInteger has an add(BigInteger) method
Stream<BigInteger> numbers = Stream.of(ZERO, ONE, TWO, TEN);
Stream<BigInteger> one2Three11 = numbers.add(ONE); // = 1, 2, 3, 11
// this would be equivalent to: numbers.map(ONE::add)
As far as I can tell, the concept would not only apply to "container" types (streams, lists, sets...), but more generally to all functor-like types that have a map method (e.g., optionals, state monads, etc.).
The implementation approach would probably be more along the lines of syntactic sugar offered by the compiler rather than by manipulating the actual types (Stream<BigInteger> obviously does not extend BigInteger, and even if it did, the "map-add" method would have to return a Stream<BigInteger> instead of a BigInteger, which would be incompatible with most languages' inheritance rules).
I have two questions regarding such a proposed feature:
(1) What are the known caveats with offering such a feature? Method name collisions between the container type and the element type are one problem that comes to mind (e.g., when I call add on a List<BigInteger> do I want to add an element to the list or do I want to add a number to all elements of the list? The argument type should clarify this, but it's something that could get tricky)
(2) Are there any existing languages that offer such a feature, and if so, how is this implemented under the hood? I did some research, and while pretty much every modern language has something like a map operator, I could not find any languages where the one-versus-many distinction would be completely transparent (which leads me to believe that there is some technical difficulty that I'm overlooking here)
NOTE: I am looking at this in a purely functional context that does not support mutable data (not sure if that matters for answering these questions)
Do you come from an object-oriented background? That's my guess, because you're thinking of map as a method belonging to each different "type" as opposed to thinking about the various things that are functors.
Compare how TypeScript would handle this if map were a property of each individual functor:
declare const someOption: Option<number>
someOption.map(val => val * 2) // Option<number>

declare const someEither: Either<string, number>
someEither.map(val => val * 2) // Either<string, number>
someEither.mapLeft(err => 'ERROR') // Either<'ERROR', number>
You could also create a constant representing each individual functor instance (option, array, identity, either, async/Promise/Task, etc.), where these constants have map as a method. Then have a standalone map function that takes one of those "functor constants", the mapping function, and the starting value, and returns the new wrapped value:
// Each functor "instance" is a constant whose map knows how to traverse
// that particular structure; only the type shapes are shown here.
// (In real code these constants would share a common Functor interface.)
declare const option: {
  map: <A, B>(f: (a: A) => B) => (o: Option<A>) => Option<B>
}
declare const someOption: Option<number>
map(option)(val => val * 2)(someOption) // Option<number>

declare const either: {
  map: <E, A, B>(f: (a: A) => B) => (e: Either<E, A>) => Either<E, B>
}
declare const someEither: Either<string, number>
map(either)(val => val * 2)(someEither) // Either<string, number>
Essentially, you have a functor "map" that uses its first parameter to identify which type you're going to be mapping over, and then you pass in the mapping function and the data.
However, in proper functional languages like Haskell, you don't have to pass in that "functor constant" at all, because the compiler resolves it for you from the types. That's a really nice benefit that means even less boilerplate. It also allows you to write a lot of your code in "point-free" style, so refactoring becomes much easier if you design your language so that you don't have to manually specify the type being used in order to take advantage of map/chain/bind/etc.
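As a rough illustration, here is a minimal Haskell sketch (only base is assumed: the Prelude's fmap plus Data.Functor.Identity; double is a made-up name). The same point-free double works on any functor, and the compiler picks the right map from the argument's type:
import Data.Functor.Identity (Identity (..))

double :: Functor f => f Int -> f Int
double = fmap (* 2)               -- no mention of which functor is being used

main :: IO ()
main = do
  print (double (Just 3))         -- Just 6        (Maybe functor)
  print (double [1, 2, 3])        -- [2,4,6]       (list functor)
  print (double (Identity 7))     -- Identity 14   (Identity functor)
Swapping the wrapped type later leaves double untouched, which is exactly the refactoring benefit described next.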
Suppose you initially write code that makes a bunch of API calls over HTTP, so you use a hypothetical async monad. If your language is not smart enough to infer which type is being used, you end up with code like
import { map as asyncMap }
declare const apiCall: Async<number>
asyncMap(n => n*2)(apiCall) // Async<number>
Now you change your API so it's reading a file and you make it synchronous instead:
import { map as syncMap }
declare const apiCall: Sync<number>
syncMap(n => n*2)(apiCall)
Look how you have to change multiple pieces of the code. Now imagine you have hundreds of files and tens of thousands of lines of code.
With a point-free style, you could do
import { map } from 'functor'
declare const apiCall: Async<number>
map(n => n*2)(apiCall)
and refactor to
import { map } from 'functor'
declare const apiCall: Sync<number>
map(n => n*2)(apiCall)
If you had a centralized location of your API calls, that would be the only place you're changing anything. Everything else is smart enough to recognize which functor you're using and apply map correctly.
As far as your concern about name collisions goes, that's an issue that will exist no matter your language or design. But in functional programming, add would be a combinator: the mapping function you pass into fmap (the Haskell term) / map (the term in many imperative/OO languages). The function you use to add a new element to the tail end of an array/list might be called snoc ("cons", short for "construct", spelled backwards: cons prepends an element to your list, while snoc appends one). You could also call it push or append.
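For concreteness, a tiny Haskell sketch of the cons/snoc distinction (plain lists only; the names are just illustrative, and Data.Sequence offers efficient versions of both operations):
consExample :: [Int]
consExample = 0 : [1, 2, 3]       -- cons prepends: [0,1,2,3]

snocExample :: [Int]
snocExample = [1, 2, 3] ++ [4]    -- snoc appends: [1,2,3,4] (O(n) on plain lists)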
As far as your one-vs-many issue goes, these are not the same type. One is a list/array type, and the other is an identity type. The underlying code treating them would be different, as they are different functors (one contains a single element, while the other contains many).
I suppose you could create a language that disallows single elements by automatically wrapping them in single-element lists and then just using the list map. But this seems like a lot of work to make two very different things look the same.
Instead, it would probably be better to wrap single elements in an identity type and multiple elements in a list/array, and let identity and list each provide their own under-the-hood implementation of the functor method map.

What is the difference between Vec<struct> and &[struct]?

I often find myself getting an error like this:
mismatched types: expected `collections::vec::Vec<u8>`, found `&[u8]` (expected struct collections::vec::Vec, found &-ptr)
As far as I know, one is mutable and one isn't, but I've no idea how to go between the types, i.e. take a &[u8] and make it a Vec<u8> or vice versa.
What's the difference between them? Is it the same as String and &str?
Is it the same as String and &str?
Yes. A Vec<T> is the owned variant of a &[T]. &[T] is a reference to a set of Ts laid out sequentially in memory (a.k.a. a slice). It represents a pointer to the beginning of the items and the number of items. A reference refers to something that you don't own, so the set of actions you can do with it are limited. There is a mutable variant (&mut [T]), which allows you to mutate the items in the slice. You can't change how many are in the slice though. Said another way, you can't mutate the slice itself.
take a &[u8] and make it a Vec
For this specific case:
let s: &[u8]; // Set this somewhere
Vec::from(s);
However, this has to allocate memory on the heap and then copy each value into that memory. It's more expensive than going the other way, but it might be the correct thing for a given situation.
or vice versa
let v = vec![1u8, 2, 3];
let s = v.as_slice();
This is basically "free" as v still owns the data, we are just handing out a reference to it. That's why many APIs try to take slices when it makes sense.

OCaml visitor pattern

I am implementing a simple C-like language in OCaml and, as usual, AST is my intermediate code representation. As I will be doing quite some traversals on the tree, I wanted to implement
a visitor pattern to ease the pain. My AST currently follows the semantics of the language:
type expr = Plus of string*expr*expr | Int of int | ...
type command = While of boolexpr*block | Assign of ...
type block = Commands of command list
...
The problem is now that nodes in a tree are of different type. Ideally, I would pass to the
visiting procedure a single function handling a node; the procedure would switch on the type of the node and do the work accordingly. Currently, I have to pass a function for each node type, which does not seem like the best solution.
It seems to me that I can (1) really go with this approach or (2) have just a single type above. What is the usual way to approach this? Maybe use OO?
Nobody uses the visitor pattern in functional languages, and that's a good thing. With pattern matching, you can fortunately implement the same logic much more easily and directly, just using (mutually) recursive functions.
For example, assume you wanted to write a simple interpreter for your AST:
let rec run_expr = function
  | Plus(_, e1, e2) -> run_expr e1 + run_expr e2
  | Int(i) -> i
  | ...
and run_command = function
  | While(e, b) as c -> if run_expr e <> 0 then (run_block b; run_command c)
  | Assign ...
and run_block = function
  | Commands(cs) -> List.iter run_command cs
The visitor pattern will typically only complicate this, especially when the result types are heterogeneous, like here.
It is indeed possible to define a class with one visiting method per type in the AST (each of which does nothing by default) and have your visiting functions take an instance of this class as a parameter. In fact, such a mechanism is used in the OCaml world, albeit not that often.
In particular, the CIL library has a visitor class
(see https://github.com/kerneis/cil/blob/develop/src/cil.mli#L1816 for the interface). Note that CIL's visitors are inherently imperative (transformations are done in place). It is however perfectly possible to define visitors that map an AST into another one, such as in Frama-C, which is based on CIL and offers both in-place and copy visitors. Finally, Cαml, an AST generator meant to easily take care of bound variables, generates map and fold visitors together with the datatypes.
If you have to write many different recursive operations over a set of mutually recursive datatypes (such as an AST), then you can use open recursion (in the form of classes) to encode the traversal and save yourself some boilerplate.
There is an example of such a visitor class in Real World OCaml.
The Visitor pattern (and all patterns related to reusable software) has to do with reusability in an inclusion-polymorphism context (subtypes and inheritance).
Composite explains a solution in which you can add a new subtype to an existing one without modifying the latter's code.
Visitor explains a solution in which you can add a new function to an existing type (and to all of its subtypes) without modifying the type's code.
These solutions belong to object-oriented programming and require message sending (method invocation) with dynamic binding.
You can do this in OCaml if you use the "O" (the object layer), with some limitations that come with the advantage of strong typing.
In OCaml, given a set of related types, deciding whether you will use a class hierarchy with message sending or, as suggested by andreas, a concrete (algebraic) type together with pattern matching and plain function calls, is a hard question.
The two are not equivalent. If you choose the latter, you will be unable to define a new kind of node in your AST after the node type has been defined and compiled. Once you have said that an A is either an A1 or an A2, you cannot say later on that there are also some A3s without modifying the source code.
In your case, if you want to implement a visitor, replace your expr concrete type with a class and its subclasses, and your functions with methods (which are also functions, by the way). Dynamic binding will then do the job.

Is getting a value using range not thread-safe in Go?

When ranging over a map m that has concurrent writers, including ones that could delete from the map, is it not thread-safe to do the following?
for k, v := range m { ... }
I'm thinking to be thread-safe I need to prevent other possible writers from changing the value v while I'm reading it, and (when using a mutex and because locking is a separate step) verify that the key k is still in the map. For example:
for k := range m {
    m.mutex.RLock()
    v, found := m[k]
    m.mutex.RUnlock()
    if found {
        ... // process v
    }
}
(Assume that other writers are write-locking m before changing v.) Is there a better way?
Edit to add: I'm aware that maps aren't thread-safe. However, they are thread-safe in one way, according to the Go spec at http://golang.org/ref/spec#For_statements (search for "If map entries that have not yet been reached are deleted during iteration"). This page indicates that code using range needn't be concerned about other goroutines inserting into or deleting from the map. My question is, does this thread-safe-ness extend to v, such that I can get v for reading only using only for k, v := range m and no other thread-safe mechanism? I created some test code to try to force an app crash to prove that it doesn't work, but even running blatantly thread-unsafe code (lots of goroutines furiously modifying the same map value with no locking mechanism in place) I couldn't get Go to crash!
No, map operations are not atomic/thread-safe, as the commenter on your question pointed out with a link to the golang FAQ entry "Why are map operations not defined to be atomic?".
To make access safe, you are encouraged to use Go's channels as a means of handing out a resource-access token. The channel is used simply to pass the token around: anyone wanting to modify the map requests the token from the channel (blocking or non-blocking) and passes it back to the channel when done working with the map.
Iterating over and working with the map should be sufficiently simple and short that you are OK using just one token for full access.
If that is not the case, and you use the map for more complex work or a consumer needs more time with it, you may implement a reader-token versus writer-token scheme. At any given time only one writer can access the map, but when no writer is active the token is passed to any number of readers, which will not modify the map (and can therefore read simultaneously).
For an introduction to channels, see the Effective Go docs on channels.
You could use concurrent-map to handle the concurrency pains for you.
// Create a new map.
m := cmap.New()

// Add an item to the map: stores "bar" under the key "foo".
m.Set("foo", "bar")

// Retrieve the item from the map.
tmp, ok := m.Get("foo")

// Check whether the item exists.
if ok {
    // The map stores items as interface{}, hence the cast.
    bar := tmp.(string)
}

// Remove the item under the key "foo".
m.Remove("foo")

How far should I take referential transparency?

I am building a website using Erlang, Mnesia, and Webmachine. Most of the documentation I have read praises the virtues of referentially transparent functions.
The problem is, all database access is external state. This means that any method that hits the database is no longer referentially transparent.
Let's say I have a user object in a database and some functions that deal with authentication.
Referentially opaque functions might look like:
handle_web_request(http_info) ->
is_authorized_user(http_info.userid),
...
%referentially opaque
is_authorized_user(userid) ->
User = get_user_from_db(userid),
User.is_authorized.
%referentially opaque
lots_of_other_functions(that_are_similar) ->
db_access(),
foo.
Referential transparency requires that I minimize the amount of referentially opaque code, so the caller must get the object from the database and pass that in as an argument to a function:
handle_web_request(http_info) ->
User = get_user(http_info.userid),
is_authorized_user(User),
...
%referentially opaque
get_user(userid) ->
get_user_from_db(userid).
%referentially transparent
is_authorized(userobj) ->
userobj.is_authorized.
%referentially transparent
lots_of_other_functions(that_are_similar) ->
foo.
The code above is obviously not production code - it is made up purely for illustrative purposes.
I don't want to get sucked into dogma. Do the benefits of referentially transparent code (like provable unit testing) justify the less friendly interface? Just how far should I go in the pursuit of referential transparency?
Why not take referential transparency all the way?
Consider the definition of get_user_from_db. How does it know how to talk to the database? Obviously it assumes some (global) database context. You could change this function so that it returns a function that takes the database context as its argument. What you have is...
get_user_from_db :: userid -> User
This is a lie. You can't go from a userid to a user. You need something else: a database.
get_user_from_db :: userid -> Database -> User
Now just curry that with the userid, and given a Database at some later time, the function will give you a User. Of course, in the real world, Database will be a handle or a database connection object or whatever. For testing, give it a mock database.
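To illustrate, here is a minimal sketch in the same Haskell-style notation the signatures above already use; every name and type here (UserId, Database, User, getUserFromDb) is made up for the example, and the lookup returns a Maybe to stay total:
newtype UserId   = UserId Int
newtype Database = Database [(Int, User)]   -- stand-in for a real handle or connection
newtype User     = User { isAuthorized :: Bool }

getUserFromDb :: UserId -> Database -> Maybe User
getUserFromDb (UserId uid) (Database rows) = lookup uid rows

-- Partially applied with the userid: given a Database later, it yields the User.
lookupUser42 :: Database -> Maybe User
lookupUser42 = getUserFromDb (UserId 42)

-- In a test, simply pass a mock database instead of a real connection.
mockDb :: Database
mockDb = Database [(42, User True)]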
You already mentioned unit testing; keep thinking in those terms. Everything you find value in testing should be referentially transparent so that you can test it.
If you don't have any complex logic that could go wrong, and a single functional/integration test would show that it is correct, then why bother going the extra distance?
Think YAGNI, but make an exception wherever unit-testability is a real need.
