Recursive code for maximum height of a binary tree - recursion

I came across this recursive code for calculating the maximum height of a binary tree:
int maxDepth(struct node* node)
{
    if (node == NULL)
        return 0;
    else
    {
        /* compute the depth of each subtree */
        int lDepth = maxDepth(node->left);
        int rDepth = maxDepth(node->right);
        /* use the larger one */
        if (lDepth > rDepth)
            return (lDepth + 1);
        else
            return (rDepth + 1);
    }
}
I tried to write the code in another way:
int maxDepth(struct node* node)
{
    if (node == NULL)
        return 0;
    else
    {
        /* compute the depth of each subtree */
        int lDepth = 1 + maxDepth(node->left);  // notice the change
        int rDepth = 1 + maxDepth(node->right); // notice the change
        /* use the larger one */
        if (lDepth > rDepth)
            return (lDepth);
        else
            return (rDepth);
    }
}
I'm confused about whether both versions behave the same, or whether there is a bug in the second implementation.
I tried out a few cases, and both functions returned the same results.

Arithmetically they are the same: it doesn't matter when you add the 1 to the answer, because no other arithmetic transformation is applied to the value that gets returned. Technically yours is slightly less efficient, because you do two additions and then throw away the smaller of the two values, which wastes the work done on that one. In reality I doubt you'd ever notice the difference if you did timings.
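For what it's worth, a third equivalent way avoids the wasted addition by picking the larger depth first and adding 1 once; a minimal sketch:
/* Same behavior as both versions above: add 1 only once,
   after choosing the larger subtree depth. */
int maxDepth(struct node* node)
{
    if (node == NULL)
        return 0;
    int lDepth = maxDepth(node->left);
    int rDepth = maxDepth(node->right);
    return 1 + (lDepth > rDepth ? lDepth : rDepth);
}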

These two C functions behave identically. All you have done in your rewrite of maxDepth() is move the addition of 1 out of the return statements and into the initialization of the variables lDepth and rDepth. Since nothing else touches those values, the returned result is unchanged:
int lDepth = 1 + maxDepth(node->left);  // you add the 1 here instead
int rDepth = 1 + maxDepth(node->right); // and here
/* use the larger one */
if (lDepth > rDepth)
    return (lDepth); // so there is no need to add 1 here
else
    return (rDepth); // or here

Sway of Binary Tree

Original Problem
The problem is as stated below, and my solution follows.
Return the amount a BST sways in one direction. Sway is denoted by the number of nodes that are "unbalanced" (nullptr on only one side); a left-swaying tree returns the negative amount it sways, with any right sway offsetting the left, and vice versa.
int tree_sway(Node *node) {
    if (!node) {
        return 0;
    }
    int m = tree_sway(node->right) + 1;
    int n = tree_sway(node->left) - 1;
    return m - n;
}
For the tree sway problem, is the solution I have posted correct? If not, would the only way to do this problem be to create a helper function that keeps track of how many left and right turns the recursive step makes?
The code that you have posted is not quite correct. For example, on a tree with just a root and a left leaf, it returns 0 when the answer should be -1. One way of doing it is this:
int tree_sway(Node *node) {
    // base case of the recursion
    if (!node) { return 0; }
    // go down the left and right branches and add their scores
    int score = tree_sway(node->left) + tree_sway(node->right);
    // adjust the score for the missing children of the current node
    if (!node->left) { score++; }
    if (!node->right) { score--; }
    return score;
}
The general idea is that as you recurse, you first go all the way down the tree and as you come back up you count the missing left and right branches and pass the running tally up the tree.
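For example, using the corrected function above with a minimal Node type (assumed here just for illustration), a root with only a left child reports a sway of -1:
#include <stdio.h>

/* Assumed minimal node type, just to exercise tree_sway(). */
typedef struct Node { struct Node *left, *right; } Node;

int main(void) {
    Node leaf = { NULL, NULL };       /* leaf: +1 for no left, -1 for no right = 0 */
    Node root = { &leaf, NULL };      /* root: -1 for the missing right child */
    printf("%d\n", tree_sway(&root)); /* prints -1: the tree sways left by one node */
    return 0;
}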

Constraint on an array with same values grouped together

I have two rand arrays: pointer and value. Every value that appears in pointer should also appear in value, occurring as many times as the value itself. For example, if pointer[i] == 2, then value should contain 2 occurring two times, and that group should come after the 1.
Expected result is shown below.
Sample code:
class ABC;
  rand int unsigned pointer[$];
  rand int unsigned value[20];
  int count;

  constraint c_mode {
    pointer.size() == count;
    solve pointer before value;

    //======== Pointer constraints =========//
    // To avoid duplicates
    unique {pointer};
    foreach (pointer[i]) {
      // Make sure pointer is inside 1 to 4
      pointer[i] inside {[1:4]};
      // Make sure it is in increasing order
      if (i > 0)
        pointer[i] > pointer[i-1];
    }

    //======== Value constraints =========//
    // Make sure pointer = 2 comes two times in value; this is not working as expected
    foreach (pointer[i]) {
      value.sum with (int'(item == pointer[i])) == pointer[i];
    }
    // Ensure values are in increasing order, but this does not ensure that equal values
    // group together. For eg: if pointer = 2, then 2 has to come two times together and
    // after 1 in the array order. This is not met with the below constraint.
    foreach (value[i]) {
      foreach (value[j]) {
        ((i > j) && (value[i] inside {pointer}) && (value[j] inside {pointer})) -> value[i] >= value[j];
      }
    }
  }

  function new(int num);
    count = num;
  endfunction
endclass

module tb;
  initial begin
    int unsigned index;
    ABC abc = new(4);
    abc.randomize();
    $display("-----------------");
    $display("Pointer = %p", abc.pointer);
    $display("Value = %p", abc.value);
    $display("-----------------");
  end
endmodule
I would implement this using a couple of helper arrays:
class pointers_and_values;
  rand int unsigned pointers[];
  rand int unsigned values[];

  local rand int unsigned values_dictated_by_pointers[][];
  local rand int unsigned filler_values[][];

  // ...
endclass
The values_dictated_by_pointers array will contain the groups of values that your pointers mandate. The other array will contain the dummy values that come between these groups. So, the values array will contain filler_values[0], values_dictated_by_pointers[0], filler_values[1], values_dictated_by_pointers[1], etc.
Computing the values mandated by the pointers is easy:
constraint compute_values_dictated_by_pointers {
  values_dictated_by_pointers.size() == pointers.size();
  foreach (pointers[i]) {
    values_dictated_by_pointers[i].size() == pointers[i];
    foreach (values_dictated_by_pointers[i,j])
      values_dictated_by_pointers[i][j] == pointers[i];
  }
}
You need as many groups as you have pointers. Each group has as many elements as that group's pointer value, and each element of a group has the same value as the group's pointer value.
For the filler values, you didn't mention what they should look like. I interpreted your problem description to say that the values in the pointers array should only come in the patterns described above. This means that they are not allowed as filler values. Depending on whether you want to allow filler values before the first value, you will need either as many filler groups as you have pointers, or one extra. In the following code I allowed filler values before the "real" values:
constraint compute_filler_values {
  filler_values.size() == pointers.size() + 1;
  foreach (filler_values[i, j])
    !(filler_values[i][j] inside { pointers });
}
You'll also need to constrain the size of each of the filler value groups; otherwise the solver will leave them at size 0. Here you can change the constraints to match your requirements. I chose to always insert filler values and to never insert more than 3 of them.
constraint max_number_of_filler_values {
  foreach (filler_values[i]) {
    filler_values[i].size() > 0;
    filler_values[i].size() <= 3;
  }
}
For the real values array, you can compute its value in post_randomize() by interleaving the other two arrays:
function void post_randomize();
  values = filler_values[0];
  foreach (pointers[i])
    // note: filler_values[i+1], since filler_values[0] was already placed first
    values = { values, values_dictated_by_pointers[i], filler_values[i+1] };
endfunction
If you need to be able to constrain values as well, then you'll have to implement this interleaving operation using constraints. I'm not going to show this, as it is probably pretty complicated in itself and warrants its own question.
Be aware that the code above might not work on all EDA tools, because of spotty support for random multi-dimensional arrays. I only got this to work on Aldec Riviera Pro on EDA Playground.

Why is this code correct while it should clearly run into an infinite loop?

I have been having a problem with this code for a while. The placement of the recursive call of the function does not seem right.
I tried running the code and, yes, it does run into an infinite loop.
// I DEFINE HEAP STRUCTURE AS:
struct heap_array
{
    int *array;   // heap implementation using arrays (note: a heap is a type of tree).
    int capacity; // how much the heap can hold.
    int size;     // how much is currently occupied.

    void MaxHeapify(struct heap_array *h, int loc) // note: loc is the location of the element to be PERCOLATED DOWN.
    {
        int left, right, max_loc = loc;
        left = left_loc_child(h, loc);
        right = right_loc_child(h, loc);
        if (left != -1 && h->array[left] > h->array[loc])
        {
            max_loc = left;
        }
        if (right != -1 && h->array[right] > h->array[max_loc])
        {
            max_loc = right;
        }
        if (max_loc != loc) // i.e. if changes were made:
        {
            // swap the elements at max_loc and loc
            int temp = h->array[max_loc];
            h->array[max_loc] = h->array[loc];
            h->array[loc] = temp;
        }
        MaxHeapify(h, max_loc); // <-- I feel that this recursive call is misplaced. I have seen the exact same
                                // code in almost all the online videos and some books I referred to. ALSO I THINK
                                // THAT THE CALL SHOULD BE MADE WITHIN THE SCOPE OF THE CONDITION if (max_loc != loc).
        // if no changes were made, end the function right there.
    }
};
In your current implementation there is no base case to stop the recursion: MaxHeapify calls itself unconditionally at the end.
Remember that a recursive function (in this case, your MaxHeapify function) needs a base case, and yours doesn't have one.
Here is an example of a max-heapify implementation (in Java) which may be useful to look at:
// A recursive function to max heapify the given
// subtree. This function assumes that the left and
// right subtrees are already heapified; we only need
// to fix the root.
private void maxHeapify(int pos)
{
    if (isLeaf(pos))
        return;
    if (Heap[pos] < Heap[leftChild(pos)] ||
        Heap[pos] < Heap[rightChild(pos)]) {
        if (Heap[leftChild(pos)] > Heap[rightChild(pos)]) {
            swap(pos, leftChild(pos));
            maxHeapify(leftChild(pos));
        }
        else {
            swap(pos, rightChild(pos));
            maxHeapify(rightChild(pos));
        }
    }
}
Here, you can see the base case:
if (isLeaf(pos))
    return;
You need to add a base case to your recursive function.
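Applied to the code from the question, the fix is exactly what the asker suspected: move the recursive call inside the if (max_loc != loc) block, so the recursion stops when no swap is needed. A minimal sketch (assuming the same left_loc_child/right_loc_child helpers, which return -1 when there is no child):
void MaxHeapify(struct heap_array *h, int loc)
{
    int left  = left_loc_child(h, loc);
    int right = right_loc_child(h, loc);
    int max_loc = loc;
    if (left != -1 && h->array[left] > h->array[loc])
        max_loc = left;
    if (right != -1 && h->array[right] > h->array[max_loc])
        max_loc = right;
    if (max_loc != loc) {           /* only recurse if a swap happened */
        int temp = h->array[max_loc];
        h->array[max_loc] = h->array[loc];
        h->array[loc] = temp;
        MaxHeapify(h, max_loc);     /* the recursion now has a stopping point:
                                       it ends as soon as no swap is needed */
    }
}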

assume statement modelling in Frama-C

I want to use the user assertion feature of the value analysis plugin of Frama-C (Neon version); however, I am having trouble coming up with a suitable model of an assume statement, which would be very useful for applying particular constraints. For example, here is my test code:
#include "/usr/local/share/frama-c/builtin.h"
int main(void)
{
    int selection = Frama_C_interval(0, 10);
    int a;
    assume(selection > 5);
    if (selection > 5)
    {
        a = 2;
    }
    else
    {
        a = 1;
    }
    //@ assert a == 2;
    return 0;
}
I want the value of selection to be greater than 5 after this assume statement, so that the assertion will be valid.
My initial attempt was to write the function
void assume(int a) { while (!a); return; }
but it was unsuccessful.
Please help me, thanks.
The easiest way to constrain selection would be to use an assert (which of course won't be proved by Value). If you want to distinguish the asserts that are in fact hypotheses you make from the asserts that you want to verify, you can use ACSL's naming mechanism, such as
//@ assert assumption: selection > 5;
and verify that the only asserts left unknown are the ones named assumption.
Using an assume function cannot work as such, because it will only reduce the possible values of the a parameter to non-zero ones. Value is not able to infer the relation between the value of a in assume and the value of selection in main. However, it is possible to help it a little bit. First, -slevel allows Value to propagate several abstract states in parallel. Second, an assert written in disjunctive form will force Value to split its state (if the -slevel is big enough to do so). Thus, with the following code
#include "builtin.h"
void assume(int a) { while (!a); return; }

int main(void)
{
    int selection = Frama_C_interval(0, 10);
    int a;
    /*@ assert selection > 5 || selection <= 5; */
    assume(selection > 5);
    if (selection > 5)
    {
        a = 2;
    }
    else
    {
        a = 1;
    }
    //@ assert a == 2;
    return 0;
}
and the following command line:
frama-c -cpp-extra-args="-I$(frama-c -print-share-path)" -val -slevel 2
After the first assert (which is obviously valid), Frama-C will propagate two states separately: one in which selection > 5 and one in which selection <= 5. In the first case, assume is called with 1 as its argument, thus returns immediately, and the then branch of the if is taken, so that the second assert is valid. In the second state, assume is called with 0 and never returns. Thus, in all cases where control reaches the second assert, it is valid.
Note that you really need to add the first assert inside the body of main, and to copy in ACSL the argument you pass to assume. Otherwise, the state split won't occur (more precisely, you would split on a, not on selection).

Open Addressing vs. Separate Chaining

Which hashmap collision handling scheme is better when the load factor is close to 1 to ensure minimum memory wastage?
I personally think the answer is open addressing with linear probing, because it doesn't need any additional storage space in case of collisions. Is this correct?
Answering the question: Which hashmap collision handling scheme is better when the load factor is close to 1 to ensure minimum memory wastage?
Open addressing/probing that allows a high fill. Because, as you said yourself, there is no extra space required for collisions (just, possibly, time; of course this also assumes the hash function isn't perfect).
If you had not specified "load factor close to 1" or had included "cost" metrics in the question, then it would be entirely different.
Happy coding.
A hashmap that full will degrade into a linear search, so you will want to keep it under 90% full.
You are right about open addressing using less memory; chaining needs a pointer or offset field in each node.
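To make that overhead concrete, here's a rough sketch comparing a chained entry with an open-addressed slot (the key/value fields are illustrative):
#include <stdint.h>

/* Separate chaining: every entry pays for a next pointer
   (8 bytes on a 64-bit platform), plus per-node allocator overhead. */
struct chain_node {
    uint32_t key;
    uint32_t value;
    struct chain_node *next;
};

/* Open addressing: the table is just a flat array of slots;
   the only "waste" is the empty slots kept to limit probing. */
struct oa_slot {
    uint32_t key;
    uint32_t value;
};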
I have created a hasharray data structure for when I need very lightweight hashtables that will not have a lot of inserts. To keep memory usage low, all data is embedded in the same block of memory, with the HashArray structure at the start, then two arrays for hashes & values. The hasharray can only be used when the lookup key is stored in the value.
typedef uint16_t HashType; /* this can be made 32 bits if needed. */
typedef uint16_t HashSize; /* this can be made 32 bits if large hasharrays are needed. */
typedef struct HashArray {
    HashSize length;     /* hasharray length. */
    HashSize count;      /* number of hash/value pairs contained in the hasharray. */
    uint16_t value_size; /* size of each value. (maximum value size 64 Kbytes) */
    /* These last two "fields" are just for show; they are not defined in the
     * HashArray struct, but follow it in the same memory block:
     * HashType hashs[length];               array of hashes for each value;
     *                                       helps resolve bucket collisions.
     * uint8_t values[length * value_size];  array holding all values.
     */
} HashArray;
#define hasharray_get_hashs(array) ((HashType *)(((uint8_t *)(array)) + sizeof(HashArray)))
#define hasharray_get_values(array) (((uint8_t *)(array)) + sizeof(HashArray) + \
    ((array)->length * sizeof(HashType)))
#define hasharray_get_value(array, idx) (hasharray_get_values(array) + ((idx) * (array)->value_size))
The macros hasharray_get_hashs & hasharray_get_values are used to get the 'hashs' & 'values' arrays.
I have used this to add fast lookup of complex objects that are already stored in an array. The objects have a string 'name' field which is used for the lookup. The names are hashed and inserted into the hasharray with the object's index. The values stored in the hasharray can be indices/pointers/whole objects (I only use small 16-bit index values).
If you want to pack the hasharray until it is almost full, then you will want to use full 32-bit hashes instead of the 16-bit ones defined above. Larger 32-bit hashes will help keep searches fast when the hasharray is more than 90% full.
The hasharray as defined above can only hold a maximum of 65535 entries, which is fine since I never use it on anything that would have more than a few hundred values. Anything that needs more than that should just use a normal hashtable. But if memory is really an issue, the HashSize type could be changed to 32 bits. Also, I use power-of-2 lengths to keep the hash lookup fast. Some people prefer prime bucket lengths, but that is only needed if the hash function has bad distribution.
#define hasharray_empty_hash 0xFFFF /* hash value used to mark empty slots. */

void *hasharray_search(HashArray *array, HashType hash, uint32_t *next) {
    HashType *hashs = hasharray_get_hashs(array);
    uint32_t mask = array->length - 1;
    uint32_t start_idx;
    uint32_t idx;

    hash = (hash == hasharray_empty_hash) ? 0 : hash; /* need one hash value reserved to mark empty slots. */
    start_idx = (hash & mask);
    if (*next == 0) {
        idx = start_idx; /* new search. */
    } else {
        idx = *next & mask; /* continuing search at the next slot. */
    }
    /* find hash in hash array. */
    do {
        /* check for hash match. */
        if (hashs[idx] == hash) goto found_hash;
        /* check for end of chain. */
        if (hashs[idx] == hasharray_empty_hash) break;
        idx++;
        idx &= mask;
    } while (idx != start_idx);
    /* maximum tries reached (i.e. did a linear search of the whole array) or end of chain. */
    return NULL;

found_hash:
    *next = idx + 1; /* where to continue the search, if this is not the right value. */
    return hasharray_get_values(array) + (idx * array->value_size);
}
Hash collisions will happen, so the code that calls hasharray_search() needs to compare the search key with the one stored in the value object. If they don't match, then hasharray_search() is called again. Non-unique keys can also exist, since searching can continue until NULL is returned to find all values that match one key. The search function uses linear probing to be cache friendly.
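For completeness, a minimal sketch of what hasharray_new() (used by fields_new() below) could look like, assuming the single-block layout and power-of-two sizing described above:
/* Sketch of hasharray_new(): one allocation holding the header,
 * the hash array, and the value array.
 * Requires <stdlib.h> (malloc) and <string.h> (memset). */
HashArray *hasharray_new(HashSize count, uint16_t value_size) {
    HashArray *array;
    uint32_t length = 1;
    /* smallest power-of-2 strictly greater than count, so there is
     * always at least one empty slot (assumes count < 32768 here). */
    while (length <= count) length <<= 1;
    array = malloc(sizeof(HashArray) +
                   (length * sizeof(HashType)) +
                   ((size_t)length * value_size));
    if (array == NULL) return NULL;
    array->length = (HashSize)length;
    array->count = 0;
    array->value_size = value_size;
    /* mark all slots empty: 0xFF bytes make each 16-bit hash 0xFFFF. */
    memset(hasharray_get_hashs(array), 0xFF, length * sizeof(HashType));
    return array;
}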
typedef struct {
    char *name; /* this is the lookup key. */
    char *type;
    /* other field info... */
} Field;

typedef struct {
    Field *list;          /* array of Field objects. */
    HashArray *lookup;    /* hasharray for fast lookup of Field objects by name.
                           * The values stored in this hasharray are 16-bit indices. */
    uint32_t field_count; /* number of Field objects in 'list'. */
} Fields;
extern Fields *fields_new(uint16_t count) {
    Fields *fields;

    fields = calloc(1, sizeof(Fields));
    fields->list = calloc(count, sizeof(Field));
    /* allocate a hasharray to hold at most 'count' uint16_t values.
     * The hasharray will round 'count' up to the next power-of-2.
     * That power-of-2 length must be at least (count+1), so that there will always be one empty slot.
     */
    fields->lookup = hasharray_new(count, sizeof(uint16_t));
    fields->field_count = count;
    return fields;
}
extern Field *fields_lookup_by_name(Fields *fields, const char *name) {
    HashType hash = str_to_hash(name);
    Field *field;
    uint32_t next = 0;
    uint16_t *rc;
    uint16_t idx;

    do {
        rc = hasharray_search(fields->lookup, hash, &next);
        if (rc == NULL) break; /* field not found. */
        /* found a possible match. */
        idx = *rc;
        assert(idx < fields->field_count);
        field = &(fields->list[idx]);
        /* compare lookup name with the field's name. */
        if (strcmp(name, field->name) == 0) {
            /* found a match. */
            return field;
        }
        /* field didn't match; continue search at the next slot. */
    } while (1);
    return NULL;
}
The worst-case search degrades to a linear search of the whole array if it is 99% full and the key doesn't exist. If the keys are integers, a linear search shouldn't be too bad; also, only keys with the same hash value need a full key compare. I try to keep the hasharrays sized so they are only about 70-80% full; the space wasted on empty slots isn't much when the values are only 16-bit. With this design you only waste 4 bytes per empty slot when using 16-bit hashes & 16-bit index values. The array of objects (Field structs in the above example) has no empty slots.
Also, most hashtable implementations that I have seen don't store the computed hashes, and so require full key compares to resolve bucket collisions. Storing and comparing the hashes helps a lot, since only a small part of the hash value is used to look up the bucket.
As the others said, with linear probing, when the load factor is near 1 the time complexity approaches that of a linear search (when the table is completely full, it is unbounded). There is a memory-vs-efficiency trade-off here, whereas separate chaining gives constant expected time as long as the load factor stays bounded.
Normally, under linear probing, it's recommended to keep the load factor between 1/8 and 1/2: when the array becomes 1/2 full, we resize it to double its size, and when deletion brings it down to 1/8 full, we resize it to half its size (Reference: Algorithms, by Robert Sedgewick and Kevin Wayne). If you are really interested, that book is a good place to begin.
In practice, 0.72 is often cited as an empirical load factor to use.
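A tiny sketch of that resize policy (the Table type and names here are illustrative, not from any particular library):
#include <stddef.h>

/* Illustrative linear-probing resize policy, following Sedgewick & Wayne:
   keep the load factor n/m between 1/8 and 1/2. */
typedef struct { size_t n; size_t m; } Table; /* n = stored keys, m = table slots */

static int should_grow(const Table *t)   { return t->n >= t->m / 2; }             /* double m on insert */
static int should_shrink(const Table *t) { return t->n > 0 && t->n <= t->m / 8; } /* halve m on delete */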
