How to have valid references to objects owned by containers that dynamically move their objects?

If you have a pointer or reference to an object stored in a container, say a dynamic array or a hash table/map, there's the problem that the objects don't permanently stay in one place, so any references to them can become invalid before too long. For example, a dynamic array might need to reallocate, and a hash table might have to rehash, changing the positions of the values in its buckets.
In languages like Java (and, I think, C#), and probably most languages, this isn't a problem. In these languages most things are references rather than the objects themselves: to refer to the 3rd element of a dynamic array, you just copy the reference stored there, and the object it points to lives somewhere else.
But in, say, C++, where a dynamic array or hash table stores the objects directly in memory owned by the container, what are you supposed to do? There's only one place where an object I create can live. I can allocate the object somewhere myself and then store pointers to it in a dynamic array, a hash table, or any other container. However, if I decide the container should own the objects, I run into problems keeping a pointer or reference to them.
In the case of a dynamic array like std::vector you can refer to an object in the array with an index instead of a memory address. If the array is reallocated, the index is still valid. However, I run into the same problem if I remove an element from the array: the index is potentially no longer valid.
In the case of something like a hash table, the table might dynamically rehash, changing the position of all the values in the buckets. Is the only way of having references to hash table values to just search for or hash the key every time you want to access it?
What are the ways of having references to objects that live in containers like these, or any others?

There aren't any magic or generally used solutions to this; you have to make tradeoffs. If you are optimizing at this low level, one good approach might be to use a container class that informs you when it reallocates. It'd be interesting to find out whether any container library has this property.
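Absent such a library, one widely used tradeoff is to hand out small handles instead of raw pointers, a pattern often called a slot map or generational arena. A handle is an index (so it survives reallocation) plus a generation counter (so it detects the removal problem mentioned in the question). Here is a minimal sketch; it's written in Java only to keep one language across this thread, the same idea applies directly in C++, and the class and method names are invented for illustration:

    import java.util.ArrayList;
    import java.util.List;

    // Minimal "slot map" sketch: stable handles into a growable container.
    class SlotMap<T> {
        // A handle is an index plus the generation the slot had when issued.
        record Handle(int index, int generation) {}

        private final List<T> values = new ArrayList<>();
        private final List<Integer> generations = new ArrayList<>();
        private final List<Integer> freeSlots = new ArrayList<>();

        Handle insert(T value) {
            if (!freeSlots.isEmpty()) {
                int i = freeSlots.remove(freeSlots.size() - 1); // recycle a slot
                values.set(i, value);
                return new Handle(i, generations.get(i));
            }
            values.add(value);
            generations.add(0);
            return new Handle(values.size() - 1, 0);
        }

        // Returns null for stale handles: the slot was removed (and maybe reused).
        T get(Handle h) {
            if (h.index() >= values.size()
                    || generations.get(h.index()) != h.generation()) return null;
            return values.get(h.index());
        }

        void remove(Handle h) {
            if (get(h) == null) return;  // already stale
            values.set(h.index(), null);
            generations.set(h.index(), generations.get(h.index()) + 1); // invalidate old handles
            freeSlots.add(h.index());
        }
    }

The cost is an extra indirection on every access, but a stale handle is detected instead of silently dangling the way a raw pointer or bare index would.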

Related

Redux: is state normalization necessary for composition relationship?

As we know, when saving data in a redux store, it's supposed to be transformed into a normalized state. So embedded objects should be replaced by their ids and saved within a dedicated collection in the store.
I am wondering whether that should also be done when the relationship is a composition, that is, when the embedded data isn't of any use outside of its parent object.
In my case the embedded objects are registrations, and the parent object is a (real-life) event. Normalizing this data structure feels to me like a lot of boilerplate without any benefit.
State normalization is more than just how you access the data by traversing the object tree. It also has to do with how you observe the data.
Part of the reason for normalization is to avoid unnecessary change notifications. Objects are treated as immutable, so when one changes, a new object is created; a quick reference check can then tell whether anything inside changed. If you nest objects and a child changes, you have to replace the parent as well, so code observing the parent will get a change notification every time a child changes, even though it might not care. Depending on your scenario, you may end up with a bunch of unnecessary change notifications.
This is also partly why you see lists of entities broken out into an array of identifiers and a map of objects. In relation to change detection, this allows you to observe the list (whether items have been added or removed) without caring about changes to the entities themselves.
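To make the two shapes concrete, here is a rough sketch. The types are written in Java only to keep one language across this thread (in an actual Redux store these would be plain JavaScript objects), and all names are invented:

    import java.util.List;
    import java.util.Map;

    record Registration(String id, String attendee) {}

    // Nested shape: editing one registration forces a new Event object,
    // so every observer of the event sees a "change".
    record Event(String id, List<Registration> registrations) {}

    // Normalized shape: ordering and entities are observable separately.
    record NormalizedState(
            List<String> registrationIds,                // adds/removes show up here
            Map<String, Registration> registrationsById  // field edits show up here
    ) {}

An observer holding only registrationIds is untouched when a single registration's fields change, which is exactly the notification cost the nested shape can't avoid.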
So it depends on your usage. Just be aware of the cost of observing and the impact your state shape has on that.
I don't agree that data is "supposed to be [normalized]". Normalizing is a useful structure for accessing the data, but you're the architect to make that decision.
In many cases, the data stored will be an application singleton and a descriptive key is more useful than forcing some kind of id.
In your case I wouldn't bother unless there is excessive data duplication, especially because you would then have to denormalize for the object to function properly.

HashTable vs. Array

I have been working with Java, but in general, what's the advantage of saving elements in a hash table when we have an array?
In my understanding, if we have arr[100], then accessing the i-th element is O(1), since it's just adding sizeof(element type) * i to the base pointer of arr itself. Then how is a hash table useful, and when would it be useful as opposed to an array?
Thank you
In Java you should be using HashMaps. The Object class in Java has an int hashCode() method that derives an integer from the object in question (not necessarily a unique one, as we'll see below).
For example, consider the hash code of a String in Java.
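A simplified equivalent of what that method computes, following the formula documented in the JDK, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] (the helper name stringHash is mine; the real method also caches its result in a field):

    // Simplified equivalent of String.hashCode().
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);  // 31 is an odd prime, cheap to multiply by
        }
        return h;
    }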
In hash maps you can assign a value to a key. For example, you could have a map of <Username (String), Customer (custom object)>. With arrays, to find a specific Customer (if you don't know the index) you would have to go through the entire array, which is O(n) in the worst case.
Without hash maps, using a more search-optimized data structure like a binary search tree, it would take O(log n) time to find the customer.
With a hash map, you can get to the customer's object immediately, without having to go through the entire collection of customers.
So basically, hash maps "map" a key to a "hash" integer value, and then use that hash to locate the value.
Also, just as a bonus: since we're squeezing larger information into a small integer, we will sometimes face the so-called "hash collision", where two keys have the same hash value but aren't actually the same thing. In that case we obviously won't find the information instantly; but instead of searching all the records for our specific one, we only need to search one "bucket" of values, which is substantially smaller than the whole collection.
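To see the difference concretely, here's a tiny runnable comparison; the Customer record and all names are invented for illustration:

    import java.util.HashMap;
    import java.util.Map;

    public class LookupDemo {
        record Customer(String username, String email) {}

        public static void main(String[] args) {
            Customer[] arr = {
                new Customer("alice", "alice@example.com"),
                new Customer("bob", "bob@example.com"),
            };

            // Array: without the index, finding "bob" means a scan, O(n) worst case.
            Customer fromArray = null;
            for (Customer c : arr) {
                if (c.username().equals("bob")) { fromArray = c; break; }
            }

            // HashMap: the key's hashCode() leads straight to the right bucket,
            // O(1) expected time.
            Map<String, Customer> byUsername = new HashMap<>();
            for (Customer c : arr) byUsername.put(c.username(), c);
            Customer fromMap = byUsername.get("bob");

            System.out.println(fromArray + " / " + fromMap);
        }
    }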

Hash tables: What's the best way to mark a bucket empty?

Ran into an annoying problem - I need some way to tell if the bucket I'm trying to fill is empty or not (the buckets are stored as an array of value type structs for key-value pairs).
If I were to reserve a key value for marking things empty, then that would just mean that some data unfortunate enough to stumble on that key value could never be stored.
On the other hand, including a boolean in the KVP struct would increase the size of the struct from 16 to 24 bytes (such a waste, and I'm tight on memory as it is). Has anybody figured out a good solution for this?
This is a problem that is as intrinsic to hash tables as collisions. A related problem is dealing with deleting from a hash table, again, in the context of collisions. There's no solution that doesn't involve some compromise in performance, so it's pretty common to see hash table implementations that have a particular key that is illegal.
By far the most direct solution is to just special-case the key value that you're using to mean empty. That is, if the user is trying to store a key value 0, you just put it in a special array you keep around for that purpose.
Really lame hash tables that only work with pointers don't usually have this issue, since you can always find a pointer value which the caller can't pass in (such as a pointer to an object you own). Obviously hash tables using linked lists or array elements don't have this problem either, but then, there's a massive performance penalty for those.
You could probably find some clever way to encode it inside the table itself, by using multiple elements. The only way this would be better is if it's somehow unified with deleted-element handling or something else, so that it would be free, or faster than checking some separate list.
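For what it's worth, the "reserve one key value, special-case it on the side" layout shows up in some primitive-keyed open-addressing map libraries (fastutil's, if I recall correctly). A hedged sketch, with an invented class name, a toy hash, a fixed capacity, and no resizing or deletion:

    // long->long open-addressing table that reserves key 0 as "empty".
    // Sketch only: fixed capacity, no resizing; assumes the table never fills.
    class LongLongMapSketch {
        private static final int CAPACITY = 16;          // power of two
        private final long[] keys = new long[CAPACITY];  // 0 means "bucket is empty"
        private final long[] values = new long[CAPACITY];

        // Key 0 can't live in the table itself, so it gets a dedicated side slot:
        // one boolean and one long for the whole table, instead of a boolean
        // per bucket.
        private boolean hasZeroKey;
        private long zeroValue;

        void put(long key, long value) {
            if (key == 0) { hasZeroKey = true; zeroValue = value; return; }
            int i = (int) (key & (CAPACITY - 1));        // toy hash for the sketch
            while (keys[i] != 0 && keys[i] != key)       // linear probing
                i = (i + 1) & (CAPACITY - 1);
            keys[i] = key;
            values[i] = value;
        }

        Long get(long key) {
            if (key == 0) return hasZeroKey ? zeroValue : null;
            int i = (int) (key & (CAPACITY - 1));
            while (keys[i] != 0) {
                if (keys[i] == key) return values[i];
                i = (i + 1) & (CAPACITY - 1);
            }
            return null;                                 // absent
        }
    }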

LinkedHashMap's impl - uses a doubly linked list, not a singly linked list; why?

Referring to the documentation of LinkedHashMap, it says a doubly linked list (DLL) is maintained internally.
I was trying to understand why a DLL was chosen over a singly linked list (SLL).
The biggest advantage I can see for a DLL is traversing backwards, but I don't see any use case for LinkedHashMap exploiting this advantage, since there is no previous() sort of operation like next() in the Iterator interface.
Can anyone explain why a DLL was chosen, and not an SLL?
It's because with an additional hash map, you can implement deletes in O(1). If you are using a singly linked list, delete would take O(n).
Consider a hash map for storing a key value pair, and another internal hash map with keys pointing to nodes in the linked list. When deleting, if it's a doubly linked list, I can easily get to the previous element and make it point to the following element. This is not possible with a singly linked list.
http://www.quora.com/Java-programming-language/Why-is-a-Java-LinkedHashMap-or-LinkedHashSet-backed-by-a-doubly-linked-list
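A minimal sketch of that arrangement (illustrative only, not the actual java.util.LinkedHashMap source; the names are invented):

    import java.util.HashMap;
    import java.util.Map;

    // A hash map whose entries are also nodes of a doubly linked list,
    // preserving insertion order (the LinkedHashMap idea in miniature).
    class LinkedMapSketch<K, V> {
        private static class Node<K, V> {
            final K key;
            V value;
            Node<K, V> prev, next;
            Node(K key, V value) { this.key = key; this.value = value; }
        }

        private final Map<K, Node<K, V>> table = new HashMap<>();
        private Node<K, V> head, tail;

        void put(K key, V value) {
            Node<K, V> existing = table.get(key);
            if (existing != null) { existing.value = value; return; } // keep position
            Node<K, V> node = new Node<>(key, value);
            table.put(key, node);
            if (tail == null) head = node;
            else { tail.next = node; node.prev = tail; }
            tail = node;
        }

        // O(1): the table finds the node, and prev/next let us unlink it directly.
        // With a singly linked list we would have to walk from head to find the
        // predecessor, the O(n) cost described above.
        void remove(K key) {
            Node<K, V> node = table.remove(key);
            if (node == null) return;
            if (node.prev != null) node.prev.next = node.next; else head = node.next;
            if (node.next != null) node.next.prev = node.prev; else tail = node.prev;
        }
    }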

Hash tables: Open addressing and removing elements

I'm trying to understand open addressing in hash tables, but there is one question which isn't answered in my literature. It concerns the deletion of elements in such a hash table when quadratic probing is used: the removed element is replaced by a sentinel element. The get() operation then knows that it has to keep probing, and the add() method would overwrite the first sentinel it finds.

But what happens if I want to add an element whose key is already in the hash table, but behind a sentinel on its probing path? Instead of overwriting the value of the instance with the same key which is already in the table, the add() method would overwrite the sentinel. Then we have multiple elements with the same key in the hash table. I see that as a problem, since it costs memory, and also since removing the element would merely remove the first of them, so that the element could still be found in the table (i.e. it is not removed).
So it seems that it is necessary to search the whole probing path for the key of the element one wants to insert before replacing a sentinel element. Am I overlooking something? How is this problem handled in practice?
"But what happens if I want to add an element with a key that is already in the hash table but behind a sentinel in a probing path? Instead of overwriting the value of the instance with the same key which is already in the table, the add() method would overwrite the sentinel."
add() has to check every element after the sentinel(s) on the probing path until it finds an empty element, as you pointed out later. If it does not find the key anywhere on the probing path and there are sentinel elements on it, it can use the first sentinel slot to store the new element.
There is a hash table implementation on http://www.algolist.net/Data_structures/Hash_table/Open_addressing (HashMap.java). Its put() method does exactly this. (The collision resolution is linear probing in the referenced snippet but I don't think it's an important difference from the point of view of the algorithm.)
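Roughly, such a put() looks like this: a sketch with linear probing, an invented class name, and an identity-compared sentinel (not the algolist code itself):

    // Open addressing with linear probing and a DELETED sentinel.
    // Sketch only: fixed capacity, no resizing; assumes the table never fills.
    class OpenAddressingSketch {
        private static final String DELETED = new String("<deleted>"); // identity-compared
        private static final int CAPACITY = 16;
        private final String[] keys = new String[CAPACITY];
        private final Object[] values = new Object[CAPACITY];

        void put(String key, Object value) {
            int i = Math.floorMod(key.hashCode(), CAPACITY);
            int firstDeleted = -1;
            // Probe past DELETED slots: the key may still sit further along the path.
            while (keys[i] != null) {
                if (keys[i] == DELETED) {
                    if (firstDeleted < 0) firstDeleted = i;  // remember, but keep going
                } else if (keys[i].equals(key)) {
                    values[i] = value;                       // found later on: update in place
                    return;
                }
                i = (i + 1) % CAPACITY;
            }
            // Key is genuinely absent: now it's safe to recycle the first sentinel slot.
            int slot = (firstDeleted >= 0) ? firstDeleted : i;
            keys[slot] = key;
            values[slot] = value;
        }

        Object get(String key) {
            int i = Math.floorMod(key.hashCode(), CAPACITY);
            while (keys[i] != null) {
                if (keys[i] != DELETED && keys[i].equals(key)) return values[i];
                i = (i + 1) % CAPACITY;
            }
            return null;
        }

        void remove(String key) {
            int i = Math.floorMod(key.hashCode(), CAPACITY);
            while (keys[i] != null) {
                if (keys[i] != DELETED && keys[i].equals(key)) {
                    keys[i] = DELETED;  // leave a sentinel so probe paths stay intact
                    values[i] = null;
                    return;
                }
                i = (i + 1) % CAPACITY;
            }
        }
    }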
After a lot of remove operations there could be too many sentinel elements in the table. A solution for this would be to rebuild the hash table (i.e. rehash everything) occasionally (based on the number of items and the number of sentinel elements). This operation would eliminate the sentinel elements.
Another approach is eliminating the sentinel (DELETED) element from the probing path when you remove an element. Practically, you don't have sentinel elements in the table in this case; there are only FREE and OCCUPIED slots. It could be expensive.
"So it seems that it is necessary to search the whole probing path for the key of the element one wants to insert before replacing a sentinel element."
Yes, it is. You have to search until you find an empty element.
"How is this problem handled in practice?"
I don't know too much about real life hash table implementations. I suppose plenty of them are available on the internet in open source projects. I've just checked the Hashtable and HashMap classes in Java. Both use chaining instead of open addressing.
Sorry for the late answer, but Java has an example of a hash table with open addressing: java.util.IdentityHashMap.
Also, you can use GNU Trove Project. Its maps are all open addressing hash tables, as explained on its overview page.
