In OCaml, how to store "pointers" to elements of a list? - functional-programming

I have a list of some data type. However, I also want to index the elements of that list with a trie, so I can do more efficient lookups. But I don't want to store the same elements twice, so I want to store the elements in the list, and in the trie I store pointers to elements, in the leaf nodes. Is this possible? I could store the index of the element in the list, however getting an element of a linked list by index is slow, so that won't do.
Apologies if this is a misunderstanding of the OCaml memory model.

Just store the element. Under the hood, this doesn't copy the value, it just copies a pointer to the value, except for values that are stored in a single memory word (just like a pointer).
In other words, things like let b = a do not make a copy of a. They make b an alias of a.
Values are automatically shared in OCaml. The only case where you wouldn't want sharing is a mutable object (reference, or structure or object with mutable fields). If you want two mutable objects with the same current value but such that an assignment will only affect one of the objects, then you need to make a copy.
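A minimal sketch of the sharing described above. The item type and all names are made up for illustration, and a Hashtbl stands in for the trie (an assumption made to keep the example short; any index structure shares values in the same way). Physical equality (==) is used to check that the list and the index hold the same heap value rather than two copies.

```ocaml
(* Hypothetical element type; the names here are illustrative only. *)
type item = { name : string; payload : int array }

let items : item list =
  [ { name = "apple"; payload = [| 1; 2 |] };
    { name = "pear"; payload = [| 3 |] } ]

(* A Hashtbl stands in for the trie: storing [it] stores a pointer to it. *)
let index : (string, item) Hashtbl.t = Hashtbl.create 16
let () = List.iter (fun it -> Hashtbl.replace index it.name it) items

(* (==) is physical equality: true only if both sides are the same heap block. *)
let shared = List.hd items == Hashtbl.find index "apple"

(* A mutation made through the index is visible through the list,
   because there is only one value. *)
let () = (Hashtbl.find index "apple").payload.(0) <- 42
let seen_from_list = (List.hd items).payload.(0)

(* To decouple two mutable values, copy explicitly. *)
let original = [| 1; 2; 3 |]
let alias = original              (* same array *)
let copy = Array.copy original    (* independent array *)
let () = original.(0) <- 99
```

After this runs, alias.(0) is 99 while copy.(0) is still 1, which is the one situation where an explicit copy matters.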

Related

How to have valid references to objects owned by containers that dynamically move their objects?

If you have a pointer or reference to an object contained in a container, say a dynamic array or a hash table/map, there's the problem that the objects don't permanently stay there, and so it seems any references to these objects become invalid before too long. For example a dynamic array might need to reallocate, and a hash table might have to rehash, changing the positions of the buckets in the array.
In languages like Java (and I think C#), and probably most languages, this may not be a problem. In these languages many things are references rather than the objects themselves. When you create a reference to the 3rd element of a dynamic array, you are really just copying the reference to an object that lives somewhere else.
But in, say, C++, where a dynamic array or hash table actually stores the objects directly in memory owned by the container, what are you supposed to do? There's only one place where the object I create can live. I can create the object by allocating it somewhere, and then store pointers to that object in a dynamic array or a hash table, or any other container. However, if I decide to have the container be the owner of those objects, I run into problems with having a pointer or reference to those objects.
In the case of a dynamic array like a std::vector you can reference an object in the array with an index instead of a memory address. If the array is reallocated, the index is still valid. However, I run into the same problem if I remove an element from the array: the index is then potentially no longer valid.
In the case of something like a hash table, the table might dynamically rehash, changing the position of all the values in the buckets. Is the only way of having references to hash table values to just search for or hash the key every time you want to access it?
What are the ways of having references to objects that live in containers like these, or any others?
There aren't any magic or generally used solutions to this; you have to make tradeoffs. If you are optimizing things at this low level, one good approach might be to use a container class that informs you when it does a reallocation. It'd be interesting to find out if there is any container library with this property.
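For contrast, here is a small sketch of the reference-semantics case the question mentions for Java-like languages, written in OCaml. The obj type and all names are hypothetical. The array stores pointers to separately allocated records, so building a larger array (a stand-in for a reallocation) copies the pointers and leaves existing references valid. It does not solve the C++ case where the container stores objects inline.

```ocaml
type obj = { id : int; mutable hits : int }

(* The array holds pointers to heap-allocated records, not the records themselves. *)
let small = Array.init 2 (fun i -> { id = i; hits = 0 })

(* A "reference to the element at index 1" is just a copy of that pointer. *)
let handle = small.(1)

(* Simulate a reallocation: a larger array receives copies of the pointers. *)
let big = Array.append small (Array.init 2 (fun i -> { id = i + 2; hits = 0 }))

(* The handle is still valid and still denotes the element stored in [big]. *)
let () = handle.hits <- handle.hits + 1
let still_same_object = big.(1) == handle
let hits_seen_through_array = big.(1).hits
```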

Why not use vectors and lists together?

This might be a quite stupid question but, knowing the poor efficiency of searching for an element inside a list (singly-linked or doubly-linked), why not use a vector or dynamic array to store the elements of the list in order, therefore making it easier to access elements?
Linked lists used to be more important because their nodes are stored non-contiguously, which can make memory management easier. Linked lists and vectors/arrays both have O(n) search time when the elements are unsorted. Accessing an array element is only faster if you know its index in advance. Linked lists suit niche cases where you frequently insert elements at the beginning of the sequence: a linked list does this in O(1) time, whereas an array needs O(n) because the other elements have to be shifted.
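The cost difference at the front of the sequence can be sketched in OCaml as follows. push_front is a hypothetical helper written for this example, not a standard library function.

```ocaml
(* Prepending to a list allocates one cell and reuses the old list as its tail. *)
let tail = [ 2; 3; 4 ]
let extended = 1 :: tail
let tail_is_shared = List.tl extended == tail

(* Prepending to an array needs a fresh array and a copy of every element: O(n). *)
let push_front (x : 'a) (a : 'a array) : 'a array =
  let n = Array.length a in
  let b = Array.make (n + 1) x in
  Array.blit a 0 b 1 n;
  b

let arr = push_front 1 [| 2; 3; 4 |]

(* Indexed access: O(1) on arrays, O(i) on lists. *)
let third_of_array = arr.(2)
let third_of_list = List.nth extended 2
```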

HashTable vs. Array

I have been working with Java, but in general, what's the advantage of saving elements in a hash table when we have an array?
In my understanding, if we have arr[100], then accessing the i-th element is O(1), since it's just adding (element size) * i to the base pointer of arr. Then how is a hash table useful, and when would it be useful as opposed to an array?
Thank you
In Java you should be using HashMaps. The Object class in Java has an int hashCode() method that returns an integer hash for the object. Equal objects must return the same hash, but the number is not guaranteed to be unique.
For example, this is the hashcode of a String in Java.
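As a rough illustration, the formula documented for java.lang.String.hashCode is a base-31 polynomial over the characters, computed with 32-bit wraparound. The sketch below mirrors that formula in OCaml; java_style_hash is a made-up name, and it treats each byte as one character, so it should only be expected to agree with Java for ASCII strings.

```ocaml
(* Base-31 polynomial hash with 32-bit wraparound (Int32 arithmetic wraps).
   Assumption: one byte per character, so this matches Java only for ASCII. *)
let java_style_hash (s : string) : int32 =
  let h = ref 0l in
  String.iter
    (fun c -> h := Int32.add (Int32.mul 31l !h) (Int32.of_int (Char.code c)))
    s;
  !h
```

For instance, java_style_hash "hello" evaluates to 99162322l, and the empty string hashes to 0l.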
In hashmaps you can assign a value to a key. For example, you could be doing: <Username(String), Customer(Custom Object)>. With arrays, to find a specific Customer (If you don't know the index) you would have to go through the entire array (O(n)) in the worst case to find that.
Without hashmaps, using a more search-optimized data structure like a binary search tree, it would take O(log n) time to find the customer (assuming the tree is balanced).
With a hashmap, you can get the customer's object almost immediately (O(1) on average), without having to go through the entire collection of customers.
So basically, a hashmap hashes the key to an integer, uses that integer to pick a bucket, and then looks for the key inside that bucket to find the value.
Also just as a bonus, remember that since we're squeezing larger information into a small integer, we will face so-called "hash collisions", where two keys have the same hash value but are not actually the same thing. In this case we're obviously not going to find the information instantly; however, instead of having to search through all the records to find our specific one, we just need to search a "bucket" of values that is substantially smaller than our actual collection.
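The username-to-customer lookup described above could look like this in OCaml, using the standard Hashtbl module in place of Java's HashMap. The customer type and the sample data are invented for the example.

```ocaml
type customer = { username : string; balance : int }

let customers =
  [ { username = "ann"; balance = 10 };
    { username = "bob"; balance = 25 };
    { username = "cy"; balance = 7 } ]

(* Linear scan: O(n) comparisons in the worst case. *)
let find_by_scan name = List.find_opt (fun c -> c.username = name) customers

(* Hash table keyed by username: expected O(1) lookup. *)
let by_name : (string, customer) Hashtbl.t = Hashtbl.create 16
let () = List.iter (fun c -> Hashtbl.replace by_name c.username c) customers
let find_by_hash name = Hashtbl.find_opt by_name name
```

Both functions return the same answer; they differ only in how much of the collection they have to inspect to get it.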

Efficiency of list operations in functional languages

In functional languages like Racket or SML, we usually perform list operations in recursive calls (pattern matching, list append, list concatenation...). However, I'm not sure how these operations are generally implemented in functional languages. Will operations that create, update or delete elements in a list return a whole new copy of the list? I once read in a book an example of a disadvantage of functional programming: every time a database is updated, a whole new copy of the database is returned.
I questioned this example, since data in FP is inherently immutable, and thus creating lists from existing lists should not create a whole new copy. Instead, a new list is simply a different collection of references to existing objects in other lists, based on filtering criteria.
For example, list A = [a,b,c], and list B=[1,2,3], and I created a new list that contains the first two elements from the existing lists, that is C=[a,b,1,2]. This new list simply contains references to a,b, from A and 1,2 from B. It should not be a new copy, because data is immutable.
So, to update an element in a list, it should only take a linear amount of time to find the element, create a new value, and create a new list with the same elements as in the old list except the updated one. To create a new list, the running environment merely updates the next pointer of the previous element. If a list is holding non-atomic elements (e.g. list, tree...), and only one atomic element in one of the non-atomic elements is updated, this process is recursively applied to the non-atomic element until the atomic element is updated as described above. Is this how it should be implemented?
If someone creates a whole deep copy of a list every time a list is created from existing lists/added/updated/deleted/ with elements, they are doing it wrong, aren't they?
Another thing is, when the program environment is updated (i.e. a new key/value entry is added for a new variable, so we can refer to it later), it doesn't violate the immutable property of functional programming, does it?
You are absolutely correct! FP languages with immutable data will NEVER do a deep copy (unless they are really poorly implemented). As the data is immutable there are never any problems in reusing it. It works in exactly the same way with all other structures. So for example if you are working with a tree structure then at most only the actual tree will be copied and never the data contained in it.
So while the copying sounds very expensive, it is much less than you would first think if you are coming from an imperative/OO background (where you really do have to copy, as you have mutable data). And there are many benefits in having immutable data.
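To make the amount of copying concrete, here is a sketch of a persistent list update in OCaml. update and drop are helper names chosen for this example. The elements are ref cells only so that physical equality (==) can show unambiguously which values and which list cells are reused.

```ocaml
(* Replace the element at index i, returning a new list. Only the first i + 1
   cells are rebuilt; the cells after index i are reused as they are. *)
let rec update (i : int) (x : 'a) (l : 'a list) : 'a list =
  match l with
  | [] -> []
  | _ :: rest when i = 0 -> x :: rest
  | y :: rest -> y :: update (i - 1) x rest

let boxes = [ ref "a"; ref "b"; ref "c"; ref "d" ]
let updated = update 1 (ref "B") boxes

(* Untouched elements are the very same values, not copies. *)
let head_shared = List.hd boxes == List.hd updated

(* The cells after the updated position are the very same cells. *)
let rec drop n l = if n = 0 then l else drop (n - 1) (List.tl l)
let suffix_shared = drop 2 boxes == drop 2 updated

(* The old list is unchanged. *)
let old_second = !(List.nth boxes 1)
let new_second = !(List.nth updated 1)
```

Here two new cells are allocated for a four-element list, and no element is deep-copied.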

LinkedHashMap's impl - Uses Double Linked List, not a Single Linked List; Why

According to the documentation of LinkedHashMap, a doubly linked list (DLL) is maintained internally.
I was trying to understand why a DLL was chosen over a singly linked list (SLL).
The biggest advantage I get with a DLL would be traversing backwards, but I don't see any use case for LinkedHashMap exploiting this advantage, since there is no previous() sort of operation like next() in the Iterator interface.
Can anyone explain why a DLL was used, and not an SLL?
It's because with an additional hash map, you can implement deletes in O(1). If you are using a singly linked list, delete would take O(n).
Consider a hash map for storing a key value pair, and another internal hash map with keys pointing to nodes in the linked list. When deleting, if it's a doubly linked list, I can easily get to the previous element and make it point to the following element. This is not possible with a singly linked list.
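The idea in this answer can be sketched in OCaml as a doubly linked list of keys plus a hash table from each key to its node. This is an illustration of the technique under made-up names (node, order, add, remove), not a description of how the JDK lays out LinkedHashMap internally.

```ocaml
type 'k node = {
  key : 'k;
  mutable prev : 'k node option;
  mutable next : 'k node option;
}

type 'k order = {
  mutable first : 'k node option;
  mutable last : 'k node option;
  nodes : ('k, 'k node) Hashtbl.t;  (* key -> its node in the list *)
}

let create () = { first = None; last = None; nodes = Hashtbl.create 16 }

(* Append a key at the end of the insertion order (assumes it is not present). *)
let add t k =
  let n = { key = k; prev = t.last; next = None } in
  (match t.last with
   | Some l -> l.next <- Some n
   | None -> t.first <- Some n);
  t.last <- Some n;
  Hashtbl.replace t.nodes k n

(* Expected O(1) removal: the node knows both neighbours, so no traversal. *)
let remove t k =
  match Hashtbl.find_opt t.nodes k with
  | None -> ()
  | Some n ->
    (match n.prev with
     | Some p -> p.next <- n.next
     | None -> t.first <- n.next);
    (match n.next with
     | Some s -> s.prev <- n.prev
     | None -> t.last <- n.prev);
    Hashtbl.remove t.nodes k

(* Walk the list front to back to recover the insertion order. *)
let to_list t =
  let rec go acc = function
    | None -> List.rev acc
    | Some n -> go (n.key :: acc) n.next
  in
  go [] t.first
```

With only a next pointer, remove would have to walk from the head to find the predecessor of the node, which is the O(n) cost the answer refers to.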
http://www.quora.com/Java-programming-language/Why-is-a-Java-LinkedHashMap-or-LinkedHashSet-backed-by-a-doubly-linked-list
