Why does the insert() function take O(n) time in a vector in C++? - vector

I could not find a proper answer.
GeeksforGeeks link
Why does the insert function take O(n) time when we are providing the position at which to insert?
Can anyone tell me the reason?
My guess is that this happens because of the iterator. I may be wrong, but I would like clarification.
I read the page at the given link but did not find a proper answer to the question.

The memory in a std::vector is a single contiguous array, so inserting a new element in the middle without overwriting anything requires that all elements after that point be shifted one position to the right (linear complexity). Supplying the position only tells insert() where to put the element; it does not avoid the shift. Inserting at the end with push_back() is amortized constant time; when the internal capacity of the vector's array needs to be increased, that particular call is linear again.
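A minimal sketch of the shifting described above. The helper name insert_counting_moves is made up for illustration; it imitates by hand what an array-backed container has to do and is not how any particular standard library implements insert().

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: insert `value` before index `pos` the way a
// contiguous array has to, and report how many elements were shifted.
std::size_t insert_counting_moves(std::vector<int>& v, std::size_t pos, int value) {
    v.push_back(value);                  // make room for one more element at the end
    std::size_t moves = 0;
    for (std::size_t i = v.size() - 1; i > pos; --i) {
        v[i] = v[i - 1];                 // shift each later element one slot to the right
        ++moves;
    }
    v[pos] = value;                      // the slot at `pos` is now free to overwrite
    return moves;
}
```

Inserting at index 0 of a five-element vector shifts all five elements, while inserting at index size() shifts none, which is why the cost depends on how many elements sit after the insertion point.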

Related

How can the insertion complexity for a hashtable be O(1)

I am creating a hashtable and inserting n elements into it from an unsorted array. Since I am calling the hash function n times, wouldn't the time complexity to create and fill the hashtable be O(n)?
I tried searching everywhere, but they mention complexity in case of collisions, but don't cover how can I create a hashtable in O(1) in a perfect case as I will have to traverse the array in order to pick element one by one and put it in the hashtable?
When inserting a new record into a hash table, using a perfect hash function, an unused index entry (used to point to the record) will be found immediately giving O(1). Later, the record will be found immediately when searched for.
Of course, hash functions are seldom perfect. As the hash index becomes populated, the function will at times require two or more attempts to find an unused index entry for a new record, and every later search for that record will also require two or more attempts before the correct index entry is found. So the average search cost of a hash table may end up at 1.5 attempts or more, but that average is made up of searches where the record is most often found on the first attempt while others require two or more.
I guess the trick is to find a hashing algorithm that is "good enough" which means a compromise between an index that isn't too big, an average complexity that is reasonably low and a worst case that is acceptable.
I posted on another search question here and showed how hashing could be used and how a good hashing function could be determined. The question required looking up a 64-bit value in a table containing 16000 records and replacing it with another value from the record. Computing the second value from the first was not possible. My algorithm had an average search cost of fewer than 1.5 attempts, with a worst case of 14. The index size was a bit large for my taste, but I think a more exhaustive search could have found a better one. In comparison, binary searches required, at best, about twice as many clock cycles to perform a lookup as the hash function did, probably because their time complexity was greater.

vector<vector> as a quick-traversal 2d data structure

I'm currently considering the implementation of a 2D data structure to allow me to store and draw objects in correct Z-Order (GDI+, entities are drawn in call order). The requirements are loosely:
Ability to add new objects to the top of any depth index
Ability to remove arbitrary object
(Ability to move object to the top of new depth index, accomplished by 2 points above)
Fast in-order and reverse-order traversal
As the main requirement is speed of traversal across the full data, the first thing that came to mind was an array like structure, eg. vector. It also easily allows for pushing new objects (removing objects not so great..). This works perfectly fine for our requirements, as it just so happens that the bulk of drawable entities don't change, and the ones that do sit at the top end of the order.
However it got me thinking of the implications for more dynamic requirements:
A vector will resize itself as required -> as the 'depth' vectors would need to be maintained contiguously in memory (top-level vector enforces it), this could lead to some pretty expensive vector resizes. Worst case all vectors need to be moved to new memory location, average case requiring all vectors up the chain to be moved.
Vectors will often hold a buffer at the end for adding new objects -> traversal could still easily force a cache miss while jumping between 'depth' vectors, rendering the top-level vector's contiguous memory less beneficial
Could someone confirm that these observations are indeed correct, making a vector a mostly very expensive structure for storing larger dynamic data sets?
From my thoughts above, I end up deducing that while traversing the whole dataset, specifically jumping between different vectors in the top-level vector, you might as well use any other data structure with inferior traversal complexity, or similar random access complexity (linked_list; map). Traversal would effectively be the same, as we might as well assume the cache misses will happen anyway, and we save ourselves a lot of bother by not keeping the depth vectors contiguously in memory.
Would that indeed be a good solution? If I'm not mistaken, on a 1D problem space, this would come down to what's more important traversal or addition/removal, vector or linked-list. On a 2D space I'm not so sure it is so black and white.
I'm wondering what sort of application requires good traversal across a 2D space, without compromising data addition/removal, and what sort of data structures are used there.
P.S. I just noticed I'm completely ignoring space-complexity, so might as well keep on ignoring it (unless you feel like adding more insight :D)
Your first assumption is somewhat incorrect.
Instead of thinking of a vector as the blob of memory itself, think of it as a pointer to an automatically managed blob of memory plus some metadata to keep track of it. A vector object itself has a fixed size; the memory it keeps track of doesn't. (See this example, and note that the size of the vector object is constant: https://ideone.com/3mwjRz)
A vector of vectors can be thought of as an array of pointers. Resizing what the pointers point to doesn't mean you need to resize the array that contains them. The promise of items being contiguous still holds: the parent array has all of the pointers adjacent to each other and each pointer points to a contiguous chunk of memory. However, it's not guaranteed that the end of arr[0][N-1] is adjacent to the beginning of arr[1][0]. (To this end, your second point is correct.)
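A small sketch of the point above, with made-up names (sibling_storage_unmoved, handle_size): growing one inner vector reallocates only that vector's own heap block, so a sibling inner vector's storage stays where it was, and the vector object itself keeps the same sizeof regardless of how many elements it holds.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical check: grow one inner vector a lot and report whether a
// sibling inner vector's storage stayed at the same address.
bool sibling_storage_unmoved() {
    std::vector<std::vector<int>> depths(2);
    depths[1] = {7, 8, 9};
    const int* before = depths[1].data();   // heap block owned by depths[1]

    for (int i = 0; i < 100000; ++i) {
        depths[0].push_back(i);             // reallocates depths[0]'s block only
    }
    return depths[1].data() == before;
}

// The vector object itself is a small fixed-size handle, whatever it holds.
constexpr std::size_t handle_size = sizeof(std::vector<int>);
```

Only resizing the outer vector (adding more depth levels than its capacity) moves the inner vector objects, and even then it is the small handles that move, not the element blocks they point to.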
I guess that a linked list would be more appropriate, as you will always be traversing the whole list (vectors are good for random access). Linked-list insertions and removals are very cheap, and the traversal isn't that different from a vector traversal. Maybe you should consider a doubly linked list, as you want to traverse it in both directions.

Why is removing a node from a doubly-linked list faster than removing a node from a singly-linked list?

I was curious why deleting a node from a double linked list is faster than a single linked. According to my lecture, it takes O(1) for a double linked list compared to O(n) for a single linked. According to my thought process, I thought they both should be O(n) since you have to traverse across possibly all the elements so it depends on the size.
I understand it's going to be associated with the fact that each node has a previous pointer and a next pointer, I just can't understand how it would be a constant operation in the sense of O(1).
This partially depends on how you're interpreting the setup. Here are two different versions.
Version 1: Let's suppose that you want to delete a linked list node containing a specific value x from a singly or doubly-linked list, but you don't know where in the list it is. In that case, you would have to traverse the list, starting at the beginning, until you found the node to remove. In both a singly- and doubly-linked list, you can then remove it in O(1) time, so the overall runtime is O(n). That said, it's harder to do the remove step in the singly-linked list, since you need to update a pointer in the preceding cell (which isn't pointed at by the cell to remove), so you need to store two pointers as you do this.
Version 2: Now let's suppose you're given a pointer to the cell to remove and need to remove it. In a doubly-linked list, you can do this by using the next and previous pointers to identify the two cells around the cell to remove and then rewiring them to splice the cell out of the list. This takes time O(1). But what about a singly-linked list? To remove this cell from the list, you have to change the next pointer of the cell that appears before the cell to remove so that it no longer points to the cell to remove. Unfortunately, you don't have a pointer to that cell, since the list is only singly-linked. Therefore, you have to start at the beginning of the list, walk downwards across the nodes, and find the node that comes right before the one to remove. This takes time O(n), so the runtime for the remove step is O(n) in the worst case, rather than O(1). (That said, if you know two pointers - the cell you want to delete and the cell right before it, then you can remove the cell in O(1) time since you don't have to scan the list to find the preceding cell.)
In short: if you know the cell to remove in advance, the doubly-linked list lets you remove it in time O(1) while a singly-linked list would require time O(n). If you don't know the cell in advance, then it's O(n) in both cases.
Hope this helps!
The list does not have to be traversed in order to connect the previous node to the following node in a double-linked list. You simply point
curr.Prev.Next = curr.Next and
curr.Next.Prev = curr.Prev.
In a single-linked list, you have to traverse the list to find the previous node. That traversal is O(n) in the worst case.
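The two assignments above can be sketched in C++ as follows. DNode and unlink are names invented for this example, and the null checks cover the first and last node, which the two-line version glosses over.

```cpp
// Minimal node type, assumed for illustration only.
struct DNode {
    int value = 0;
    DNode* prev = nullptr;
    DNode* next = nullptr;
};

// Unlink `curr` in O(1): no traversal, only the neighbours are touched.
// `head` is updated when the first node is removed. The caller still owns
// `curr` and is responsible for freeing it if it was heap-allocated.
void unlink(DNode*& head, DNode* curr) {
    if (curr->prev) curr->prev->next = curr->next;
    else            head = curr->next;          // curr was the first node
    if (curr->next) curr->next->prev = curr->prev;
    curr->prev = nullptr;
    curr->next = nullptr;
}
```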
An alternative approach seems to be to use double pointers as outlined in this excellent resource: https://github.com/mkirchner/linked-list-good-taste. This means you do not need to keep track of a current and previous pointer as you only use a single pointer to pointer that can modify in place directly. Please let me know if this is inaccurate as I just learnt this.
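A rough sketch of the pointer-to-pointer idea, written from scratch here rather than copied from the linked repository; SNode and remove_node are hypothetical names. Note that it removes the need for a separate "previous" variable and a special case for the head, but finding the right link is still an O(n) walk.

```cpp
// Minimal singly-linked node, assumed for illustration only.
struct SNode {
    int value = 0;
    SNode* next = nullptr;
};

// Remove `target` from the list. `link` points at whichever pointer currently
// leads to the node being examined: first `head` itself, then each `next` field.
// Returns false if `target` is not in the list.
bool remove_node(SNode*& head, SNode* target) {
    SNode** link = &head;
    while (*link && *link != target) {
        link = &(*link)->next;       // advance to the next `next` field
    }
    if (!*link) return false;        // reached the end without finding target
    *link = target->next;            // splice target out by rewriting that one pointer
    target->next = nullptr;
    return true;
}
```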

Count the frequency of bytes in a purely functional language

If we had an assignment:
Given a block of binary data, count the frequency of the bytes within it.
And you were supposed to do this in C, the answer would be trivial and reasonably fast even for larger binary blocks. How would one go about implementing this in a purely functional language, without side effects?
For example, if you wrote a function that accepted frequency counts for each byte and the rest of the list of bytes, and returned modified frequency counts, it would have to do an awful lot of work for a data set of 100M bytes.
Also, if you sorted the data and then somehow counted the amount of subsequent same-valued bytes, the sort itself would take a lot of time.
Is there a reasonable way to implement this?
The straightforward way to do it is indeed to pass in and return data structures mapping bytes to counts. This would probably be implemented as some kind of tree (since that's what you get out of the standard library containers, as far as I know). In pure functional programming when you're passed in a tree and you need to return a new tree with a difference in only one node, the returned tree ends up sharing almost all of its structure and data with the original tree.
There is some overhead in traversing the tree to get to the count, but since you're counting bytes the tree never has more than 256 entries, so the overhead is about log2(256) = 8 steps, which is a constant. It doesn't get larger for large data sets, so it doesn't change the big-oh complexity of the algorithm. That's actually true even if you use the greatest possible overhead of copying around a full 256-entry array of counts with no sharing.
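As a stand-in for a functional fold, here is a sketch in C++ (not a purely functional language, so this only imitates the style): each step takes the counts so far by value and returns a new set of counts, never mutating anything the caller can see. The names Counts, step, and byte_frequencies are invented for the example.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

using Counts = std::array<std::size_t, 256>;

// One pure step: takes the counts so far and a byte, returns new counts.
Counts step(Counts counts, unsigned char byte) {
    ++counts[byte];      // modifies only this function's local copy
    return counts;
}

// Left fold over the data, starting from all-zero counts.
Counts byte_frequencies(const std::vector<unsigned char>& data) {
    return std::accumulate(data.begin(), data.end(), Counts{}, step);
}
```

Even in this worst case of copying all 256 counters on every step, the work per byte is bounded by a constant, so the whole pass stays linear in the size of the input.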
If you want to optimise this, you can take advantage of the fact that the "intermediate" frequency counts are never needed except as part of the computation of the next set of counts. That means you can use various techniques for getting the implementation to use destructive updates even while you're still semantically writing functional code. An STRef in Haskell basically lets you do this manually.
Theoretically the compiler could notice that you're replacing a never-needed-again value with a new one, so it could do the update in place for you. I don't know whether or not any actual production ready compilers are currently able to make this optimisation.

Time complexity to fill hash table?

This is a homework question, but I think there's something missing from it. It asks:
Provide a sequence of m keys to fill a hash table implemented with linear probing, such that the time to fill it is minimum.
And then
Provide another sequence of m keys, but such that the time to fill it is maximum. Repeat these two questions if the hash table implements quadratic probing
I can only assume that the hash table has size m, both because it's the only number given and because we have been using that letter to address a hash table size before when describing the load factor. But I can't think of any sequence to do the first without knowing the hash function that hashes the sequence into the table.
If it is a bad hash function, such that, for instance, it hashes every entry to the same index, then both the minimum and maximum time to fill it will take O(n) time, regardless of what the sequence looks like. And in the average case, where I assume the hash function is OK, how am I supposed to know how long it will take for that hash function to fill the table?
Aren't these questions tied more strongly to the hash function than to the sequence that is hashed?
As for the second question, I can assume that, regardless of the hash function, a sequence of size m with the same key repeated m-times will provide the maximum time, because it will cause linear probing from the second entry on. I think that will take O(n) time. Is that correct?
Well, the idea behind these questions is to test your understanding of probing styles. For linear probing, if a collision occurs, you simply test the next cell. And it goes on like this until you find an available cell to store your data.
Your hash table doesn't need to be size m but it needs to be at least size m.
The first question asks: if you have a perfect hash function, what is the complexity of populating the table? A perfect hash function addresses each element without collision, so each of the m elements needs O(1) time. The total complexity is O(m).
The second question asks for the case where every key hashes to the same cell, say hash(X)=cell(0), so each element has to probe until it reaches the first empty cell (just past the end of the currently populated run).
For the first element, you probe once -> 1 probe
For the second element, you probe twice -> 2 probes
For the k-th element, you probe k times -> k probes
Overall you have m elements, so the total is 1 + 2 + ... + m = m*(m+1)/2 probes -> O(m^2)
For quadratic probing, you have the same strategy. The minimum case is the same, but the maximum case will be O(n log n). (I didn't solve it; it's just my educated guess.)
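The linear-probing counts above can be checked with a small simulation. This is only a sketch under stated assumptions: the hash is taken to be key % table_size (the homework does not specify one), keys are distinct, there are no more keys than cells, and total_probes is a made-up name.

```cpp
#include <cstddef>
#include <vector>

// Insert each key with linear probing and return the total number of cells
// examined. Assumed hash: key % table_size. Assumes distinct keys and
// keys.size() <= table_size, so an empty cell is always found.
std::size_t total_probes(const std::vector<std::size_t>& keys, std::size_t table_size) {
    std::vector<bool> used(table_size, false);
    std::size_t probes = 0;
    for (std::size_t key : keys) {
        std::size_t cell = key % table_size;
        ++probes;                              // examine the home cell
        while (used[cell]) {
            cell = (cell + 1) % table_size;    // collision: try the next cell
            ++probes;
        }
        used[cell] = true;
    }
    return probes;
}
```

With a table of 8 cells, the keys 0, 1, 2, 3 land in different cells and cost 4 probes in total, while 0, 8, 16, 24 all start at cell 0 and cost 1 + 2 + 3 + 4 = 10 probes.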
This question doesn't sound terribly concerned with the hash function, though it would be nice to have it. You seem to pretty much get it, though. It sounds to me like the question is more concerned with "do you know what a worst-case list of keys would be?" than "do you know how to exploit bad hash functions?"
Obviously, if you come up with a sequence where all the entries hash to different locations, then you have O(1) insertions for O(m) time in total.
For what you are saying about hashing all the keys to the same location, each insertion should take O(n) if that's what you are suggesting. However, that's not the total time for inserting all the elements. Also, you might want to consider not literally using the same key over and over but rather using keys that would produce the same location in the table. I think, by convention, inserting the same key should cause a replacement, though I'm not 100% sure.
I'll apologize in advance if I gave too much information or left anything unclear. This question seems pretty cut-and-dried save the part about not actually knowing the hash function, and it was kind of hard to really say much without answering the whole question.
