How does change of the state work under the hood in functional languages - functional-programming

I would like to know how could functional languages implement "under the hood" creation of new state of for example Vector. When I have a Vector and I add another element to that particular Vector the old one is still there unchanged and new Vector containing the old one is created containing one more element.
How is this handled internally? Could someone try to explain it? Thank you.

Conceptually, a new Vector is created each time the Vector is extended or modified. However, since the original Vector is unmodified, clever techniques may be used to share structure. See ropes for example.
Also see Okasaki's Purely Functional Data Structures.

If you prepend an element to a linked list, a new linked list is created with the new element as its head and a pointer to the old list as its tail.
If you add an item to an array, the whole array is usually copied (making it quite inefficient to build up an immutable array incrementally).
However if you only add to the end of each array once, like so:
arr1 = emptyArray()
arr2 = add(arr1, 1)
arr3 = add(arr2, 2)
arr4 = add(arr3, 3)
The whole thing could be optimized, so that arr1, arr2, arr3 and arr4, all have pointers to the same memory, but different lengths. Of course this optimization can only be performed the first time you add to any given array. If you have arr5 = add(arr4, 4) and arr5prime = add(arr4, 42) at least one of them needs to be a copy.
Note that this isn't a common optimization, so you shouldn't expect it to be there unless explicitly stated in the documentation.

Related

Elixir allow assign to variable two times [duplicate]

In Dave Thomas's book Programming Elixir he states "Elixir enforces immutable data" and goes on to say:
In Elixir, once a variable references a list such as [1,2,3], you know it will always reference those same values (until you rebind the variable).
This sounds like "it won't ever change unless you change it" so I'm confused as to what the difference between mutability and rebinding is. An example highlighting the differences would be really helpful.
Don't think of "variables" in Elixir as variables in imperative languages, "spaces for values". Rather look at them as "labels for values".
Maybe you would better understand it when you look at how variables ("labels") work in Erlang. Whenever you bind a "label" to a value, it remains bound to it forever (scope rules apply here of course).
In Erlang you cannot write this:
v = 1, % value "1" is now "labelled" "v"
% wherever you write "1", you can write "v" and vice versa
% the "label" and its value are interchangeable
v = v+1, % you can not change the label (rebind it)
v = v*10, % you can not change the label (rebind it)
instead you must write this:
v1 = 1, % value "1" is now labelled "v1"
v2 = v1+1, % value "2" is now labelled "v2"
v3 = v2*10, % value "20" is now labelled "v3"
As you can see this is very inconvenient, mainly for code refactoring. If you want to insert a new line after the first line, you would have to renumber all the v* or write something like "v1a = ..."
So in Elixir you can rebind variables (change the meaning of the "label"), mainly for your convenience:
v = 1 # value "1" is now labelled "v"
v = v+1 # label "v" is changed: now "2" is labelled "v"
v = v*10 # value "20" is now labelled "v"
Summary: In imperative languages, variables are like named suitcases: you have a suitcase named "v". At first you put sandwich in it. Than you put an apple in it (the sandwich is lost and perhaps eaten by the garbage collector). In Erlang and Elixir, the variable is not a place to put something in. It's just a name/label for a value. In Elixir you can change a meaning of the label. In Erlang you cannot. That's the reason why it doesn't make sense to "allocate memory for a variable" in either Erlang or Elixir, because variables do not occupy space. Values do. Now perhaps you see the difference clearly.
If you want to dig deeper:
1) Look at how "unbound" and "bound" variables work in Prolog. This is the source of this maybe slightly strange Erlang concept of "variables which do not vary".
2) Note that "=" in Erlang really is not an assignment operator, it's just a match operator! When matching an unbound variable with a value, you bind the variable to that value. Matching a bound variable is just like matching a value it's bound to. So this will yield a match error:
v = 1,
v = 2, % in fact this is matching: 1 = 2
3) It's not the case in Elixir. So in Elixir there must be a special syntax to force matching:
v = 1
v = 2 # rebinding variable to 2
^v = 3 # matching: 2 = 3 -> error
Immutability means that data structures don't change. For example the function HashSet.new returns an empty set and as long as you hold on to the reference to that set it will never become non-empty. What you can do in Elixir though is to throw away a variable reference to something and rebind it to a new reference. For example:
s = HashSet.new
s = HashSet.put(s, :element)
s # => #HashSet<[:element]>
What cannot happen is the value under that reference changing without you explicitly rebinding it:
s = HashSet.new
ImpossibleModule.impossible_function(s)
s # => #HashSet<[:element]> will never be returned, instead you always get #HashSet<[]>
Contrast this with Ruby, where you can do something like the following:
s = Set.new
s.add(:element)
s # => #<Set: {:element}>
Erlang and obviously Elixir that is built on top of it, embraces immutability.
They simply don’t allow values in a certain memory location to change. Never Until the variable gets garbage collected or is out of scope.
Variables aren't the immutable thing. The data they point to is the immutable thing. That's why changing a variable is referred to as rebinding.
You're point it at something else, not changing the thing it points to.
x = 1 followed by x = 2 doesn't change the data stored in computer memory where the 1 was to a 2. It puts a 2 in a new place and points x at it.
x is only accessible by one process at a time so this has no impact on concurrency and concurrency is the main place to even care if something is immutable anyway.
Rebinding doesn’t change the state of an object at all, the value is still in the same memory location, but it’s label (variable) now points to another memory location, so immutability is preserved. Rebinding is not available in Erlang, but while it is in Elixir this is not braking any constraint imposed by the Erlang VM, thanks to its implementation.
The reasons behind this choice are well explained by Josè Valim in this gist .
Let's say you had a list
l = [1, 2, 3]
and you had another process that was taking lists and then performing "stuff" against them repeatedly and changing them during this process would be bad. You might send that list like
send(worker, {:dostuff, l})
Now, your next bit of code might want to update l with more values for further work that's unrelated to what that other process is doing.
l = l ++ [4, 5, 6]
Oh no, now that first process is going to have undefined behavior because you changed the list right? Wrong.
That original list remains unchanged. What you really did was make a new list based on the old one and rebind l to that new list.
The separate process never has access to l. The data l originally pointed at is unchanged and the other process (presumably, unless it ignored it) has its own separate reference to that original list.
What matters is you can't share data across processes and then change it while another process is looking at it. In a language like Java where you have some mutable types (all primitive types plus references themselves) it would be possible to share a structure/object that contained say an int and change that int from one thread while another was reading it.
In fact, it's possible to change a large integer type in java partially while it's read by another thread. Or at least, it used to be, not sure if they clamped that aspect of things down with the 64 bit transition. Anyway, point is, you can pull the rug out from under other processes/threads by changing data in a place that both are looking at simultaneously.
That's not possible in Erlang and by extension Elixir. That's what immutability means here.
To be a bit more specific, in Erlang (the original language for the VM Elixir runs on) everything was single-assignment immutable variables and Elixir is hiding a pattern Erlang programmers developed to work around this.
In Erlang, if a=3 then that was what a was going to be its value for the duration of that variable's existence until it dropped out of scope and was garbage collected.
This was useful at times (nothing changes after assignment or pattern match so it is easy to reason about what a function is doing) but also a bit cumbersome if you were doing multiple things to a variable or collection over the course executing a function.
Code would often look like this:
A=input,
A1=do_something(A),
A2=do_something_else(A1),
A3=more_of_the_same(A2)
This was a bit clunky and made refactoring more difficult than it needed to be. Elixir is doing this behind the scenes, but hiding it from the programmer via macros and code transforms performed by the compiler.
Great discussion here
immutability-in-elixir
The variables really are immutable in sense, every new rebinding (assignment) is only visible to access that come after that. All previous access, still refer to old value(s) at the time of their call.
foo = 1
call_1 = fn -> IO.puts(foo) end
foo = 2
call_2 = fn -> IO.puts(foo) end
foo = 3
foo = foo + 1
call_3 = fn -> IO.puts(foo) end
call_1.() #prints 1
call_2.() #prints 2
call_3.() #prints 4
To make it a very simple
variables in elixir are not like container where you keep adding and removing or modifying items from the container.
Instead they are like Labels attached to a container, when you reassign a variable is as simple a you pick a label from one container and place it on a new container with expected data in it.

Parallel iteration over array with step size greater than 1

I'm working on a practice program for doing belief propagation stereo vision. The relevant aspect of that here is that I have a fairly long array representing every pixel in an image, and want to carry out an operation on every second entry in the array at each iteration of a for loop - first one half of the entries, and then at the next iteration the other half (this comes from an optimisation described by Felzenswalb & Huttenlocher in their 2006 paper 'Efficient belief propagation for early vision'.) So, you could see it as having an outer for loop which runs a number of times, and for each iteration of that loop I iterate over half of the entries in the array.
I would like to parallelise the operation of iterating over the array like this, since I believe it would be thread-safe to do so, and of course potentially faster. The operation involved updates values inside the data structures representing the neighbouring pixels, which are not themselves used in a given iteration of the outer loop. Originally I just iterated over the entire array in one go, which meant that it was fairly trivial to carry this out - all I needed to do was put .Parallel between Array and .iteri. Changing to operating on every second array entry is trickier, however.
To make the change from simply iterating over every entry, I from Array.iteri (fun i p -> ... to using for i in startIndex..2..(ArrayLength - 1) do, where startIndex is either 1 or 0 depending on which one I used last (controlled by toggling a boolean). This means though that I can't simply use the really nice .Parallel to make things run in parallel.
I haven't been able to find anything specific about how to implement a parallel for loop in .NET which has a step size greater than 1. The best I could find was a paragraph in an old MSDN document on parallel programming in .NET, but that paragraph only makes a vague statement about transforming an index inside a loop body. I do not understand what is meant there.
I looked at Parallel.For and Parallel.ForEach, as well as creating a custom partitioner, but none of those seemed to include options for changing the step size.
The other option that occurred to me was to use a sequence expression such as
let getOddOrEvenArrayEntries myarray oddOrEven =
seq {
let startingIndex =
if oddOrEven then
1
else
0
for i in startingIndex..2..(Array.length myarray- 1) do
yield (i, myarray.[i])
}
and then using PSeq.iteri from ParallelSeq, but I'm not sure whether it will work correctly with .NET Core 2.2. (Note that, currently at least, I need to know the index of the given element in the array, as it is used as the index into another array during the processing).
How can I go about iterating over every second element of an array in parallel? I.e. iterating over an array using a step size greater than 1?
You could try PSeq.mapi which provides not only a sequence item as a parameter but also the index of an item.
Here's a small example
let res = nums
|> PSeq.mapi(fun index item -> if index % 2 = 0 then item else item + 1)
You can also have a look over this sampling snippet. Just be sure to substitute Seq with PSeq

How to speed up writing to a matrix in a reference class in R

Here is a piece of R code that writes to each element of a matrix in a reference class. It runs incredibly slowly, and I’m wondering if I’ve missed a simple trick that will speed this up.
nx = 2000
ny = 10
ref_matrix <- setRefClass(
"ref_matrix",fields = list(data = "matrix"),
)
out <- ref_matrix(data = matrix(0.0,nx,ny))
#tracemem(out$data)
for (iy in 1:ny) {
for (ix in 1:nx) {
out$data[ix,iy] <- ix + iy
}
}
It seems that each write to an element of the matrix triggers a check that involves a copy of the entire matrix. (Uncommenting the tracemen() call shows this.) Now, I’ve found a discussion that seems to confirm this:
https://r-devel.r-project.narkive.com/8KtYICjV/rd-copy-on-assignment-to-large-field-of-reference-class
and this also seems to be covered by Speeding up field access in R reference classes
but in both of these this behaviour can be bypassed by not declaring a class for the field, and this works for the example in the first link which uses a 1D vector, b, which can just be set as b <<- 1:10000. But I’ve not found an equivalent way of creating a 2D array without using a explicit “matrix” instance.
Am I just missing something simple, or is this actually not possible?
Let me add a couple of things. First, I’m very new to R, so could easily have missed something. Second, I’m really just curious about the way reference classes work in this case and whether there’s a simple way to use them efficiently; I’m not looking for a really fast way to set the elements of a matrix - I can do that by not having the matrix in a reference class at all, and if I really care about speed I can write a C routine to do it and call it from R.
Here’s some background that might explain why I’m interested in this, which you’re welcome to ignore.
I got here by wanting to see how different languages, and even different compiler options and different ways of coding the same operation, compared for efficiency when accessing 2D rectangular arrays. I’ve been playing with a test program that creates two 2D arrays of the same size, and calls a subroutine that sets the first to the elements of the second plus their index values. (Almost any operation would do, but this one isn’t completely trivial to optimise.) I have this in a number of languages now, C, C++, Julia, Tcl, Fortran, Swift, etc., even hand-coded assembler (spoiler alert: assembler isn’t worth the effort any more) and thought I’d try R. The obvious implementation in R passes the two arrays to a subroutine that does the work, but because R doesn’t normally pass by reference, that routine has to make a copy of the modified array and return that as the function value. I thought using a reference class would avoid the relatively minor overhead of that copy, so I tried that and was surprised to discover that, far from speeding things up, it slowed them down enormously.
Use outer:
out$data <- outer(1:ny, 1:nx, `+`)
Also, don't use reference classes (or R6 classes) unless you actually need reference semantics. KISS and all that.

"Adding" a value to a tuple?

I am attempting to represent dice rolls in Julia. I am generating all the rolls of a ndsides with
sort(collect(product(repeated(1:sides, n)...)), by=sum)
This produces something like:
[(1,1),(2,1),(1,2),(3,1),(2,2),(1,3),(4,1),(3,2),(2,3),(1,4) … (6,3),(5,4),(4,5),(3,6),(6,4),(5,5),(4,6),(6,5),(5,6),(6,6)]
I then want to be able to reasonably modify those tuples to represent things like dropping the lowest value in the roll or adding a constant number, etc., e.g., converting (2,5) into (10,2,5) or (5,).
Does Julia provide nice functions to easily modify (not necessarily in-place) n-tuples or will it be simpler to move to a different structure to represent the rolls?
Thanks.
Tuples are immutable, so you can't modify them in-place. There is very good support for other mutable data structures, so there aren't many methods that take a tuple and return a new, slightly modified copy. One way to do this is by splatting a section of the old tuple into a new tuple, so, for example, to create a new tuple like an existing tuple t but with the first element set to 5, you would write: tuple(5, t[2:end]...). But that's awkward, and there are much better solutions.
As spencerlyon2 suggests in his comment, a one dimensional Array{Int,1} is a great place to start. You can take a look at the Data Structures manual page to get an idea of the kinds of operations you can use; one-dimensional Arrays are iterable, indexable, and support the dequeue interface.
Depending upon how important performance is and how much work you're doing, it may be worthwhile to create your own data structure. You'll be able to add your own, specific methods (e.g., reroll!) for that type. And by taking advantage of some of the domain restrictions (e.g., if you only ever want to have a limited number of dice rolls), you may be able to beat the performance of the general Array.
You can construct a new tuple based on spreading or slicing another:
julia> b = (2,5)
(2, 5)
julia> (10, b...)
(10, 2, 5)
julia> b[2:end]
(5,)

Mathematica Map question

Original question:
I know Mathematica has a built in map(f, x), but what does this function look like? I know you need to look at every element in the list.
Any help or suggestions?
Edit (by Jefromi, pieced together from Mike's comments):
I am working on a program what needs to move through a list like the Map, but I am not allowed to use it. I'm not allowed to use Table either; I need to move through the list without help of another function. I'm working on a recursive version, I have an empty list one down, but moving through a list with items in it is not working out. Here is my first case: newMap[#, {}] = {} (the map of an empty list is just an empty list)
I posted a recursive solution but then decided to delete it, since from the comments this sounds like a homework problem, and I'm normally a teach-to-fish person.
You're on the way to a recursive solution with your definition newMap[f_, {}] := {}.
Mathematica's pattern-matching is your friend. Consider how you might implement the definition for newMap[f_, {e_}], and from there, newMap[f_, {e_, rest___}].
One last hint: once you can define that last function, you don't actually need the case for {e_}.
UPDATE:
Based on your comments, maybe this example will help you see how to apply an arbitrary function:
func[a_, b_] := a[b]
In[4]:= func[Abs, x]
Out[4]= Abs[x]
SOLUTION
Since the OP caught a fish, so to speak, (congrats!) here are two recursive solutions, to satisfy the curiosity of any onlookers. This first one is probably what I would consider "idiomatic" Mathematica:
map1[f_, {}] := {}
map1[f_, {e_, rest___}] := {f[e], Sequence##map1[f,{rest}]}
Here is the approach that does not leverage pattern matching quite as much, which is basically what the OP ended up with:
map2[f_, {}] := {}
map2[f_, lis_] := {f[First[lis]], Sequence##map2[f, Rest[lis]]}
The {f[e], Sequence##map[f,{rest}]} part can be expressed in a variety of equivalent ways, for example:
Prepend[map[f, {rest}], f[e]]
Join[{f[e]}, map[f, {rest}] (#Mike used this method)
Flatten[{{f[e]}, map[f, {rest}]}, 1]
I'll leave it to the reader to think of any more, and to ponder the performance implications of most of those =)
Finally, for fun, here's a procedural version, even though writing it made me a little nauseous: ;-)
map3[f_, lis_] :=
(* copy lis since it is read-only *)
Module[{ret = lis, i},
For[i = 1, i <= Length[lis], i++,
ret[[i]] = f[lis[[i]]]
];
ret
]
To answer the question you posed in the comments, the first argument in Map is a function that accepts a single argument. This can be a pure function, or the name of a function that already only accepts a single argument like
In[1]:=f[x_]:= x + 2
Map[f, {1,2,3}]
Out[1]:={3,4,5}
As to how to replace Map with a recursive function of your own devising ... Following Jefromi's example, I'm not going to give to much away, as this is homework. But, you'll obviously need some way of operating on a piece of the list while keeping the rest of the list intact for the recursive part of you map function. As he said, Part is a good starting place, but I'd look at some of the other functions it references and see if they are more useful, like First and Rest. Also, I can see where Flatten would be useful. Finally, you'll need a way to end the recursion, so learning how to constrain patterns may be useful. Incidentally, this can be done in one or two lines depending on if you create a second definition for your map (the easier way), or not.
Hint: Now that you have your end condition, you need to answer three questions:
how do I extract a single element from my list,
how do I reference the remaining elements of the list, and
how do I put it back together?
It helps to think of a single step in the process, and what do you need to accomplish in that step.

Resources