Open Addressing vs. Separate Chaining - hashtable

Which hashmap collision handling scheme is better when the load factor is close to 1 to ensure minimum memory wastage?
I personally think the answer is open addressing with linear probing, because it doesn't need any additional storage space in case of collisions. Is this correct?

Answering the question: Which hashmap collision handling scheme is better when the load factor is close to 1 to ensure minimum memory wastage?
Open addressing/probing that allows a high fill. Because as you said so yourself, there is no extra space required for collisions (just, well, possibly time -- of course this is also assuming the hash function isn't perfect).
If you did not specify "load factor close to 1" or included "cost" metrics in the question then it would be entirely different.
Happy coding.

A hashmap that is that full will degrade into a linear search, so you will want to keep them under 90% full.
You are right about open addressing using less memory, chaining will need a pointer or offset field in each node.
I have created a hasharray data structure for when I need very lightweight hashtables that will not have alot of inserts. To keep memory usage low all data is embedded in the same block of memory, with the HashArray structure at the start, then two arrays for hashs & values. Hasharray can only be used with the lookup key is stored in the value.
typedef uint16_t HashType; /* this can be 32bits if needed. */
typedef uint16_t HashSize; /* this can be made 32bits if large hasharrays are needed. */
struct HashArray {
HashSize length; /* hasharray length. */
HashSize count; /* number of hash/values pairs contained in the hasharray. */
uint16_t value_size; /* size of each value. (maximum size of value 64Kbytes) */
/* these last two fields are just for show, they are not defined in the HashArray struct. */
uint16_t hashs[length]; /* array of hashs for each value, this helps with resolving bucket collision */
uint8_t values[length * value_size]; /* array holding all values. */
};
#define hasharray_get_hashs(array) (HashType *)(((uint8_t *)(array)) + sizeof(HashArray))
#define hasharray_get_values(array) ((uint8_t *)(array)) + sizeof(HashArray) + \
((array)->length * sizeof(HashType))
#define hasharray_get_value(array, idx) (hasharray_get_values(array) + ((idx) * (array)->value_size))
The macros hasharray_get_hashs & hasharray_get_values are used to get the 'hashs' & 'values' arrays.
I have used this for adding fast lookup of complex objects that are already stored in an array. The objects have a string 'name' field which is used for the lookup. The names are hashed and inserted into the hasharray with the objects index. The values stored in the hasharray can be indexes/pointers/whole objects (I only use small 16bit index values).
If you want to pack the hasharray till it is almost full, then you will want to use full 32bit Hashs instead of the 16bit ones defined above. Larger 32bit hashs will help keep searchs fast when the hasharray is more then 90% full.
The hasharray as defined above can only hold a maximum of 65535, which is fine since I never use it on anything that would have more the a few hundred values. Anything that needs more that that should just use an normal hashtable. But if memory is really an issue, the HashSize type could be changed to 32bits. Also I use power-of-2 lengths to keep the hash lookup fast. Some people prefer to use prime bucket lengths, but that is only needed if the hash function has bad distribution.
#define hasharray_empty_hash 0xFFFF /* hash value to mark empty slots. */
void *hasharray_search(HashArray *array, HashType hash, uint32_t *next) {
HashType *hashs = hasharray_get_hashs(array);
uint32_t mask = array->length - 1;
uint32_t start_idx;
uint32_t idx;
hash = (hash == hasharray_empty_hash) ? 0 : hash; /* need one hash value to mark empty slots. */
start_hash_idx = (hash & mask);
if(*next == 0) {
idx = start_idx; /* new search. */
} else {
idx = *next & mask; /* continuing search to next slot. */
}
/* find hash in hash array. */
do {
/* check for hash match. */
if(hashs[idx] == hash) goto found_hash;
/* check for end of chain. */
if(hashs[idx] == hasharray_empty_hash) break;
idx++;
idx &= mask;
} while(idx != start_idx);
/* maximum tries reached (i.e. did a linear search of whole array) or end of chain. */
return NULL;
found_hash:
*next = idx + 1; /* where to continue search at, if this is not the right value. */
return hasharray_get_values(array) + (idx * array->value_size);
}
hash collisions will happen so the code that calls hasharray_search() needs to compare the search key with the one stored in the value object. If they don't match then hasharray_search() is called again. Also non-unique keys can exist, since searching can continue until 'NULL' is returned to find all values that match one key. The search function uses linear probing to be cache freindly.
typedef struct {
char *name; /* this is the lookup key. */
char *type;
/* other field info... */
} Field;
typedef struct {
Field *list; /* array of Field objects. */
HashArray *lookup; /* hasharray for fast lookup of Field objects by name. The values stored in this hasharray are 16bit indices. */
uint32_t field_count; /* number of Field objects in 'list'. */
} Fields;
extern Fields *fields_new(uint16_t count) {
Fields *fields;
fields = calloc(1, sizeof(Fields));
fields->list = calloc(count, sizeof(Field));
/* allocate hasharray to hold at most 'count' uint16_t values.
* The hasharray will round 'count' up to the next power-of-2.
* That power-of-2 length must be atleast (count+1), so that there will always be one empty slot.
*/
fields->lookup = hasharray_new(count, sizeof(uint16_t));
fields->field_count = count;
}
extern Field *fields_lookup_by_name(Fields *fields, const char *name) {
HashType hash = str_to_hash(name);
Field *field;
uint32_t next = 0;
uint16_t *rc;
uint16_t idx;
do {
rc = hasharray_search(fields->lookup, hash, &next);
if(rc == NULL) break; /* field not found. */
/* found a possible match. */
idx = *rc;
assert(idx < fields->field_count);
field = &(fields->list[idx]);
/* compare lookup name with field's name. */
if(strcmp(name, field->name) == 0) {
/* found match. */
return field;
}
/* field didn't match continue search to next field. */
} while(1);
return NULL;
}
The worst case searching will degrade to a linear search of the whole array if it is 99% full and the key doesn't exist. If the keys are integers, then a linear search shouldn't be to bad, also only keys with the same hash value will need to be compared. I try to keep the hasharrays sized so they are only about 70-80% full, the space wasted on empty slots isn't much if the values are only 16bit values. With this design you only waste 4bytes per empty slot when using 16bit hashs & 16bit index values. The array of objects (Field structs in the above example) has no empty spots.
Also most hashtable implementations that I have seen don't store the computed hashs and require full key compares to resolve bucket collisions. Comparing the hashs helps a lot since only a small part of the hash value is used to lookup the bucket.

As the others said, in linear probing, when load factor near to 1, the time complexity near to linear search. (When it's full, its infinite.) There is a memory-efficiency trade off here. While segregate chaining always give us theoretically constant time.
Normally, under linear probing, it's recommended to keep the load factor between 1/8 and 1/2. when the array is 1/2 full, we resize it to double the size of original array. (Reference: Algorithms. by Robert Sedgewick. Kevin Wayne. ). When delete, we resize the array to 1/2 of original size as well. If you are really interested, it's good for you to begin with the book I mentioned above.
In practical, it's said that 0.72 is an empirical value we usually use.

Related

Understanding the method for OpenCL reduction on float

Following this link, I try to understand the operating of kernel code (there are 2 versions of this kernel code, one with volatile local float *source and the other with volatile global float *source, i.e local and global versions). Below I take local version :
float sum=0;
void atomic_add_local(volatile local float *source, const float operand) {
union {
unsigned int intVal;
float floatVal;
} newVal;
union {
unsigned int intVal;
float floatVal;
} prevVal;
do {
prevVal.floatVal = *source;
newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
Afterwards, if I take a work-item, the code will add operand value to newVal.floatVal variable. Then, after this operation, I call atomic_cmpxchg function which check if previous assignment (preVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand; ) has been done, i.e by comparing the value stored at address source with the preVal.intVal.
During this atomic operation (which is not uninterruptible by definition), as value stored at source is different from prevVal.intVal, the new value stored at source is newVal.intVal, which is actually a float (because it is coded on 4 bytes like integer).
Can we say that each work-item has a mutex access (I mean a locked access) to value located at source address.
But for each work-item thread, is there only one iteration into the while loop?
I think there will be one iteration because the comparison "*source== prevVal.int ? newVal.intVal : newVal.intVal" will always assign newVal.intVal value to value stored at source address, won't it?
I have not understood all the subtleties of this trick for this kernel code.
Update
Sorry, I almost understand all the subtleties, especially in the while loop :
First case : for a given single thread, before the call of atomic_cmpxchg, if prevVal.floatVal is still equal to *source, then atomic_cmpxchg will change the value contained in source pointer and return the value contained in old pointer, which is equal to prevVal.intVal, so we break from the while loop.
Second case : If between the prevVal.floatVal = *source; instruction and the call of atomic_cmpxchg, the value *source has changed (by another thread ??) then atomic_cmpxchg returns old value which is no more equal to prevVal.floatVal, so the condition into while loop is true and we stay in this loop until previous condition isn't checked any more.
Is my interpretation correct?
If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific location in memory (in other words, force a load/store at each read/write of said memory location). It has no impact on the ownership of the underlying storage. Here, it is used to force the compiler to re-read source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which breaks the algorithm).
do {
prevVal.floatVal = *source; // Force read, prevent hoisting outside loop.
newVal.floatVal = prevVal.floatVal + operand;
} while(atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal)
After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:
int atomic_cmpxchg(int *ptr, int expected, int new)
What it does is:
atomically {
int old = *ptr;
if (old == expected) {
*ptr = new;
}
return old;
}
To summarize, each thread, individually, does:
Load current value of *source from memory into preVal.floatVal
Compute desired value of *source in newVal.floatVal
Execute the atomic compare-exchange described above (using the type-punned values)
If the result of atomic_cmpxchg == newVal.intVal, it means the compare-exchange was successful, break. Otherwise, the exchange didn't happen, go to 1 and try again.
The above loop eventually terminates, because eventually, each thread succeeds in doing their atomic_cmpxchg.
Can we say that each work-item has a mutex access (I mean a locked access) to value located at source address.
Mutexes are locks, while this is a lock-free algorithm. OpenCL can simulate mutexes with spinlocks (also implemented with atomics) but this is not one.

Computing the memory footprint (or byte length) of a map

I want to limit a map to be maximum X bytes. It seems there is no straightforward way of computing the byte length of a map though.
"encoding/binary" package has a nice Size function, but it only works for slices or "fixed values", not for maps.
I could try to get all key/value pairs from the map, infer their type (if it's a map[string]interface{}) and compute the length - but that would be both cumbersome and probably incorrect (because that would exclude the "internal" Go cost of the map itself - managing pointers to elements etc).
Any suggested way of doing this? Preferably a code example.
This is the definition for a map header:
// A header for a Go map.
type hmap struct {
// Note: the format of the Hmap is encoded in ../../cmd/gc/reflect.c and
// ../reflect/type.go. Don't change this structure without also changing that code!
count int // # live cells == size of map. Must be first (used by len() builtin)
flags uint32
hash0 uint32 // hash seed
B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
}
Calculating its size is pretty straightforward (unsafe.Sizeof).
This is the definition for each individual bucket the map points to:
// A bucket for a Go map.
type bmap struct {
tophash [bucketCnt]uint8
// Followed by bucketCnt keys and then bucketCnt values.
// NOTE: packing all the keys together and then all the values together makes the
// code a bit more complicated than alternating key/value/key/value/... but it allows
// us to eliminate padding which would be needed for, e.g., map[int64]int8.
// Followed by an overflow pointer.
}
bucketCnt is a constant defined as:
bucketCnt = 1 << bucketCntBits // equals decimal 8
bucketCntBits = 3
The final calculation would be:
unsafe.Sizeof(hmap) + (len(theMap) * 8) + (len(theMap) * 8 * unsafe.Sizeof(x)) + (len(theMap) * 8 * unsafe.Sizeof(y))
Where theMap is your map value, x is a value of the map's key type and y a value of the map's value type.
You'll have to share the hmap structure with your package via assembly, analogously to thunk.s in the runtime.

OpenCL void pointer arithmetic - strange behavior

I have wrote an OpenCL kernel that is using the opencl-opengl interoperability to read vertices and indices, but probably this is not even important because I am just doing simple pointer addition in order to get a specific vertex by index.
uint pos = (index + base)*stride;
Here i am calculating the absolute position in bytes, in my example pos is 28,643,328 with a stride of 28, index = 0 and base = 1,022,976. Well, that seems correct.
Unfortunately, I cant use vload3 directly because the offset parameter isn't calculated as an absolute address in bytes. So I just add pos to the pointer void* vertices_gl
void* new_addr = vertices_gl+pos;
new_addr is in my example = 0x2f90000 and this is where the strange part begins,
vertices_gl = 0x303f000
The result (new_addr) should be 0x4B90000 (0x303f000 + 28,643,328)
I dont understand why the address vertices_gl is getting decreased by 716,800 (0xAF000)
I'm targeting the GPU: AMD Radeon HD5830
Ps: for those wondering, I am using a printf to get these values :) ( couldn't get CodeXL working)
There is no pointer arithmetic for void* pointers. Use char* pointers to perform byte-wise pointer computations.
Or a lot better than that: Use the real type the pointer is pointing to, and don't multiply offsets. Simply write vertex[index+base] assuming vertex points to your type containing 28 bytes of data.
Performance consideration: Align your vertex attributes to a power of two for coalesced memory access. This means, add 4 bytes of padding after each vertex entry. To automatically do this, use float8 as the vertex type if your attributes are all floating point values. I assume you work with position and normal data or something similar, so it might be a good idea to write a custom struct which encapsulates both vectors in a convenient and self-explaining way:
// Defining a type for the vertex data. This is 32 bytes large.
// You can share this code in a header for inclusion in both OpenCL and C / C++!
typedef struct {
float4 pos;
float4 normal;
} VertexData;
// Example kernel
__kernel void computeNormalKernel(__global VertexData *vertex, uint base) {
uint index = get_global_id(0);
VertexData thisVertex = vertex[index+base]; // It can't be simpler!
thisVertex.normal = computeNormal(...); // Like you'd do it in C / C++!
vertex[index+base] = thisVertex; // Of couse also when writing
}
Note: This code doesn't work with your stride of 28 if you just change one of the float4s to a float3, since float3 also consumes 4 floats of memory. But you can write it like this, which will not add padding (but note that this will penalize memory access bandwidth):
typedef struct {
float pos[4];
float normal[3]; // Assuming you want 3 floats here
} VertexData;

Why doesn't the computer have to do a comparison search to find a hash table value?

I'm brushing up on how hash tables work, and so I understand how the hash function calculates a unique (for the purpose of this question) hash table value to go with a stored value, so when the stored value is searched the hash function gives the computer the hash table value.
OK, so now we have the hash table value, but how is this better? Don't we still have to iterate through until we find the matching hash table value?
The hash function will be used to be mapped to an index directly in your array. So no search or iteration is done
The hash table is stored in an array. The hash value is mapped to an array index. Depending on the implementation, either the hash value is the array index or it is a number from a larger range which is taken modulo the size of the array.
Then once it looks at that spot in the array, it has to check that the value there matches, since multiple values may have the same hash value. Usually, it actually navigates a linked list of all values which have been hashed to the same spot in the hash table. This is a much, much shorter list than the full list (especially if the size of the hash table is proportional to the amount of data in it).
There are lots of different hash tables, each with differing details about the implementation, but the simplest hash table uses a hash code as index into an array:
#define TABLESIZE 1000
char **gHashTable[TABLESIZE];
void clearHashTable() {
memset(gHashTable, 0, sizeof(gHashTable));
}
int calculateHashCode(char *string) {
int val = 0;
for (int i = 0; string[i] != '\0'; ++i)
val += string[i];
return val;
}
void insertInHash(char *string) {
int hashCode = calculateHashCode(string);
gHashTable[hashCode % TABLESIZE] = string;
}
int isInHashTable(char *string) {
int hashCode = calculateHashCode(string);
return gHashTable[hashCode % TABLESIZE] != 0;
}
Now this simple hash supports fast lookup on strings. It doesn't handle collisions well, the hash function is terrible, and a number of other problems, but it will work.

FSM data structure design

I want to write an FSM which starts with an idle state and moves from one state to another based on some event. I am not familiar with coding of FSM and google didn't help.
Appreciate if someone could post the C data structure that could be used for the same.
Thanks,
syuga2012
We've implemented finite state machine for Telcos in the past and always used an array of structures, pre-populated like:
/* States */
#define ST_ANY 0
#define ST_START 1
: : : : :
/* Events */
#define EV_INIT 0
#define EV_ERROR 1
: : : : :
/* Rule functions */
int initialize(void) {
/* Initialize FSM here */
return ST_INIT_DONE
}
: : : : :
/* Structures for transition rules */
typedef struct {
int state;
int event;
(int)(*fn)();
} rule;
rule ruleset[] = {
{ST_START, EV_INIT, initialize},
: : : : :
{ST_ANY, EV_ERROR, error},
{ST_ANY, EV_ANY, fatal_fsm_error}
};
I may have the function pointer fn declared wrong since this is from memory. Basically the state machine searched the array for a relevant state and event and called the function which did what had to be done then returned the new state.
The specific states were put first and the ST_ANY entries last since priority of the rules depended on their position in the array. The first rule that was found was the one used.
In addition, I remember we had an array of indexes to the first rule for each state to speed up the searches (all rules with the same starting state were grouped).
Also keep in mind that this was pure C - there may well be a better way to do it with C++.
A finite state machine consists of a finite number discrete of states (I know pedantic, but still), which can generally be represented as integer values. In c or c++ using an enumeration is very common.
The machine responds to a finite number of inputs which can often be represented with another integer valued variable. In more complicated cases you can use a structure to represent the input state.
Each combination of internal state and external input will cause the machine to:
possibly transition to another state
possibly generate some output
A simple case in c might look like this
enum state_val {
IDLE_STATE,
SOME_STATE,
...
STOP_STATE
}
//...
state_val state = IDLE_STATE
while (state != STOP_STATE){
int input = GetInput();
switch (state) {
case IDLE_STATE:
switch (input) {
case 0:
case 3: // note the fall-though here to handle multiple states
write(input); // no change of state
break;
case 1:
state = SOME_STATE;
break
case 2:
// ...
};
break;
case SOME_STATE:
switch (input) {
case 7:
// ...
};
break;
//...
};
};
// handle final output, clean up, whatever
though this is hard to read and more easily split into multiple function by something like:
while (state != STOP_STATE){
int input = GetInput();
switch (state) {
case IDLE_STATE:
state = DoIdleState(input);
break;
// ..
};
};
with the complexities of each state held in it's own function.
As m3rLinEz says, you can hold transitions in an array for quick indexing. You can also hold function pointer in an array to efficiently handle the action phase. This is especially useful for automatic generation of large and complex state machines.
The answers here seem really complex (but accurate, nonetheless.) So here are my thoughts.
First, I like dmckee's (operational) definition of an FSM and how they apply to programming.
A finite state machine consists of a
finite number discrete of states (I
know pedantic, but still), which can
generally be represented as integer
values. In c or c++ using an
enumeration is very common.
The machine responds to a finite
number of inputs which can often be
represented with another integer
valued variable. In more complicated
cases you can use a structure to
represent the input state.
Each combination of internal state and
external input will cause the machine
to:
possibly transition to another state
possibly generate some output
So you have a program. It has states, and there is a finite number of them. ("the light bulb is bright" or "the light bulb is dim" or "the light bulb is off." 3 states. finite.) Your program can only be in one state at a time.
So, say you want your program to change states. Usually, you'll want something to happen to trigger a state change. In this example, how about we take user input to determine the state - say, a key press.
You might want logic like this. When the user presses a key:
If the bulb is "off" then make the bulb "dim".
If the bulb is "dim", make the bulb "bright".
If the bulb is "bright", make the bulb "off".
Obviously, instead of "changing a bulb", you might be "changing the text color" or whatever it is you program needs to do. Before you start, you'll want to define your states.
So looking at some pseudoish C code:
/* We have 3 states. We can use constants to represent those states */
#define BULB_OFF 0
#define BULB_DIM 1
#define BULB_BRIGHT 2
/* And now we set the default state */
int currentState = BULB_OFF;
/* now we want to wait for the user's input. While we're waiting, we are "idle" */
while(1) {
waitForUserKeystroke(); /* Waiting for something to happen... */
/* Okay, the user has pressed a key. Now for our state machine */
switch(currentState) {
case BULB_OFF:
currentState = BULB_DIM;
break;
case BULB_DIM:
currentState = BULB_BRIGHT;
doCoolBulbStuff();
break;
case BULB_BRIGHT:
currentState = BULB_OFF;
break;
}
}
And, voila. A simple program which changes the state.
This code executes only a small part of the switch statement - depending on the current state. Then it updates that state. That's how FSMs work.
Now here are some things you can do:
Obviously, this program just changes the currentState variable. You'll want your code to do something more interesting on a state change. The doCoolBulbStuff() function might, i dunno, actually put a picture of a lightbulb on a screen. Or something.
This code only looks for a keypress. But your FSM (and thus your switch statement) can choose state based on what the user inputted (eg, "O" means "go to off" rather than just going to whatever is next in the sequence.)
Part of your question asked for a data structure.
One person suggested using an enum to keep track of states. This is a good alternative to the #defines that I used in my example. People have also been suggesting arrays - and these arrays keep track of the transitions between states. This is also a fine structure to use.
Given the above, well, you could use any sort of structure (something tree-like, an array, anything) to keep track of the individual states and define what to do in each state (hence some of the suggestions to use "function pointers" - have a state map to a function pointer which indicates what to do at that state.)
Hope that helps!
See Wikipedia for the formal definition. You need to decide on your set of states S, your input alphabet Σ and your transition function δ. The simplest representation is to have S be the set of integers 0, 1, 2, ..., N-1, where N is the number of states, and for Σ be the set of integers 0, 1, 2, ..., M-1, where M is the number of inputs, and then δ is just a big N by M matrix. Finally, you can store the set of accepting states by storing an array of N bits, where the ith bit is 1 if the ith state is an accepting state, or 0 if it is not an accepting state.
For example, here is the FSM in Figure 3 of the Wikipedia article:
#define NSTATES 2
#define NINPUTS 2
const int transition_function[NSTATES][NINPUTS] = {{1, 0}, {0, 1}};
const int is_accepting_state[NSTATES] = {1, 0};
int main(void)
{
int current_state = 0; // initial state
while(has_more_input())
{
// advance to next state based on input
int input = get_next_input();
current_state = transition_function[current_state][input];
}
int accepted = is_accepting_state[current_state];
// do stuff
}
You can basically use "if" conditional and a variable to store the current state of FSM.
For example (just a concept):
int state = 0;
while((ch = getch()) != 'q'){
if(state == 0)
if(ch == '0')
state = 1;
else if(ch == '1')
state = 0;
else if(state == 1)
if(ch == '0')
state = 2;
else if(ch == '1')
state = 0;
else if(state == 2)
{
printf("detected two 0s\n");
break;
}
}
For more sophisticated implementation, you may consider store state transition in two dimension array:
int t[][] = {{1,0},{2,0},{2,2}};
int state = 0;
while((ch = getch()) != 'q'){
state = t[state][ch - '0'];
if(state == 2){
...
}
}
A few guys from AT&T, now at Google, wrote one of the best FSM libraries available for general use. Check it out here, it's called OpenFST.
It's fast, efficient, and they created a very clear set of operations you can perform on the FSMs to do things like minimize them or determinize them to make them even more useful for real world problems.
if by FSM you mean finite state machine,
and you like it simple, use enums to name your states
and switch betweem them.
Otherwise use functors. you can look the
fancy definition up in the stl or boost docs.
They are more or less objects, that have a
method e.g. called run(), that executes
everything that should be done in that state,
with the advantage that each state has it's own
scope.

Resources