I always use resize() because I cannot get reserve() to work: it gives the error "vector subscript out of range". From what I've read about the difference between resize() and reserve(), reserve() sets the maximum number of elements that can be allocated, while resize() sets the number of elements the vector currently has. In my code I know the maximum number of elements, but reserve() doesn't seem to give me anything useful. So how can I make use of reserve()?
A vector has a capacity (as returned by capacity()) and a size (as returned by size()). The first states how many elements the vector can hold without reallocating, the second how many it currently holds.
resize changes the size, reserve only changes the capacity.
See also the resize and reserve documentation.
As for the use cases:
Let's say you know beforehand how many elements you want to put into your vector, but you don't want to initialize them: that's the use case for reserve. Let's say your vector was empty before; then, directly after reserve(), before doing any insert or push_back, you can, of course, not directly access as many elements as you reserved space for. That would trigger the mentioned error (subscript out of range), since the elements you are trying to access do not exist yet; the size is still 0. So the vector is still empty, but if you choose the reserved capacity so that it is greater than or equal to the maximum size your vector will reach, you avoid expensive reallocations, and at the same time you also avoid the (in some cases expensive) initialization of each vector element that resize would do.
With resize, on the other hand, you say: Make the vector hold as many elements as I gave as an argument; initialize those whose indices are exceeding the old size, or remove the ones exceeding the given new size.
Note that reserve will never affect the elements currently in the vector (except their storage location if reallocation is needed - but not their values or their number)! Meaning that if the size of a vector is currently greater than what you pass to a call to the reserve function on that same vector, reserve will just do nothing.
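To make the difference concrete, here is a minimal sketch (the function names fill_with_reserve and fill_with_resize are made up for illustration). After reserve() the vector is still empty, so elements have to be added with push_back; after resize() the elements already exist and can be assigned through operator[].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill a vector with 0..n-1 after reserve(): the size stays 0 until push_back.
std::vector<int> fill_with_reserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                          // capacity() >= n, size() is still 0
    assert(v.empty());                     // v[0] would be out of range here
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(static_cast<int>(i));  // no reallocation: capacity suffices
    return v;
}

// Fill a vector with 0..n-1 after resize(): the elements exist already.
std::vector<int> fill_with_resize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);                           // size() == n, every element is 0
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<int>(i);        // valid because i < size()
    return v;
}
```

Both functions produce the same contents; the reserve() variant just skips the initial zero-fill.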
See also the answer to this question: Choice between vector::resize() and vector::reserve()
reserve() is a performance optimization for using std::vector.
A typical std::vector implementation reserves some memory on the first push_back(), for example room for 4 elements. When the 5th element is pushed, the vector has to grow: new memory has to be allocated (usually the capacity is doubled), the contents of the vector have to be copied or moved to the new location, and the old memory has to be freed.
This becomes an expensive operation when the vector holds a lot of elements. For example, if the capacity is 2^24 and you push_back() the (2^24+1)th element, about 16 million elements get copied just to add one element.
If you know the number of elements in advance you can reserve() the number of elements you are planning to push_back(). In this case expensive copy operations are not necessary because the memory is already reserved for the amount needed.
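One way to observe this is to count how often capacity() changes while elements are appended. The sketch below does that; count_reallocations is a hypothetical helper, and the exact count without reserve() depends on the implementation's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how often push_back changes capacity() while appending n ints.
// If reserve_first is true, the capacity is requested up front.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first)
        v.reserve(n);
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {  // the storage was reallocated
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}
```

With reserve_first set to true the function returns 0, because the standard guarantees no reallocation until the size exceeds the reserved capacity.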
resize() in contrast changes the number of elements in the vector.
If no elements are added and you use resize(20), 20 elements will now be accessible. Also the amount of memory allocated will increase to an implementation-dependent value.
If 50 elements are added and you use resize(20), the last 30 elements will be removed from the vector and not be accessible any more. This doesn't necessarily change the memory allocated but this may also be implementation-dependent.
resize(n) allocates the memory for n objects and value-initializes any new elements (for example, new ints are set to 0).
reserve() allocates the memory but does not initialize. Hence, reserve won't change the value returned by size(), but it will change the result of capacity().
Edited after underscore_d's comment.
How the functions are implemented in VS2015
VS2015 CTP6
This error dialog exists only in DEBUG mode, when _ITERATOR_DEBUG_LEVEL == 2. In RELEASE mode no check is made and no error is reported: operator[] simply returns *(this->_Myfirst() + _Pos), so the size is never consulted. Indexing past size() is still undefined behavior, even if it appears to work:
reference operator[](size_type _Pos)
    {   // subscript mutable sequence
#if _ITERATOR_DEBUG_LEVEL == 2
    if (size() <= _Pos)
        {   // report error
        _DEBUG_ERROR("vector subscript out of range");
        _SCL_SECURE_OUT_OF_RANGE;
        }
#elif _ITERATOR_DEBUG_LEVEL == 1
    _SCL_SECURE_VALIDATE_RANGE(_Pos < size());
#endif /* _ITERATOR_DEBUG_LEVEL */
    return (*(this->_Myfirst() + _Pos));
    }
Looking at the vector's source code, the main difference between resize and reserve is that resize() also constructs the new elements and changes the value of this->_Mylast().
reserve() calls _Reallocate.
resize() calls _Reserve, which calls _Reallocate; resize() then also changes the value of this->_Mylast() (this->_Mylast() += _Newsize - size();), which is used in the size calculation (see the last function).
void resize(size_type _Newsize)
    {   // determine new length, padding as needed
    if (_Newsize < size())
        _Pop_back_n(size() - _Newsize);
    else if (size() < _Newsize)
        {   // pad as needed
        _Reserve(_Newsize - size());
        _TRY_BEGIN
        _Uninitialized_default_fill_n(this->_Mylast(), _Newsize - size(),
            this->_Getal());
        _CATCH_ALL
        _Tidy();
        _RERAISE;
        _CATCH_END
        this->_Mylast() += _Newsize - size();
        }
    }
void reserve(size_type _Count)
    {   // determine new minimum length of allocated storage
    if (capacity() < _Count)
        {   // something to do, check and reallocate
        if (max_size() < _Count)
            _Xlen();
        _Reallocate(_Count);
        }
    }
void _Reallocate(size_type _Count)
    {   // move to array of exactly _Count elements
    pointer _Ptr = this->_Getal().allocate(_Count);
    _TRY_BEGIN
    _Umove(this->_Myfirst(), this->_Mylast(), _Ptr);
    _CATCH_ALL
    this->_Getal().deallocate(_Ptr, _Count);
    _RERAISE;
    _CATCH_END
    size_type _Size = size();
    if (this->_Myfirst() != pointer())
        {   // destroy and deallocate old array
        _Destroy(this->_Myfirst(), this->_Mylast());
        this->_Getal().deallocate(this->_Myfirst(),
            this->_Myend() - this->_Myfirst());
        }
    this->_Orphan_all();
    this->_Myend() = _Ptr + _Count;
    this->_Mylast() = _Ptr + _Size;
    this->_Myfirst() = _Ptr;
    }
void _Reserve(size_type _Count)
    {   // ensure room for _Count new elements, grow exponentially
    if (_Unused_capacity() < _Count)
        {   // need more room, try to get it
        if (max_size() - size() < _Count)
            _Xlen();
        _Reallocate(_Grow_to(size() + _Count));
        }
    }
size_type size() const _NOEXCEPT
    {   // return length of sequence
    return (this->_Mylast() - this->_Myfirst());
    }
Problems
But some problems exist with reserve:
end() will be equal to begin()
23.2.1 General container requirements
5:
end() returns an iterator which is the past-the-end value for the container.
iterator end() _NOEXCEPT
{ // return iterator for end of mutable sequence
return (iterator(this->_Mylast(), &this->_Get_data()));
}
i.e. _Mylast() will be equal to _Myfirst()
at() will generate an out_of_range exception.
23.2.3 Sequence containers
17:
The member function at() provides bounds-checked access to container elements. at() throws out_of_range if n >= a.size().
In the Visual Studio debugger we can see the vector's values only when its size isn't 0.
[Screenshots: debugger view with resize; debugger view with reserve and _ITERATOR_DEBUG_LEVEL manually defined as 0]
I have read some posts to understand merge sort. I know recursive methods maintain a stack to hold values. (My understanding is that the result of the return statement is kept on the stack.)
private int recur(int count) {
if (count > 0) {
System.out.println(count);
return count + recur(--count); // this value will be in stack.
}
return count;
}
I am confused about how the stack is maintained in merge sort here.
private void divide(int low, int high) {
System.out.println("Divide => Low: "+ low +" High: "+ high);
if (low < high) {
int middle = (low + high) / 2;
divide(low, middle); // {0,7},{0,3}, {0,1} ;
divide(middle + 1, high); // {0,0}; high = 1; // 2nd divide
combine(low, middle, high);
}
}
Is the stack used for all local variables?
When the 2nd recursive call runs, does the 1st recursive call also take part?
How is the stack maintained in such cases?
You only have to know that a statement needs to finish and return, and that calling divide or combine from divide works the same way. Both need to finish before the next line of code can be executed or, if there are no more lines, the function returns. Yes, it's done with a stack, but that's really not important.
The variables low, high and middle of a waiting call are bindings of that invocation only, so they don't get mixed up with other invocations.
Every time you nest a new call it gets its own variables, and each call needs to finish. When the low..middle call is finished, it calls middle+1..high, and when that is finished, combine. Those calls will do the same, so you get deeper nesting, and the call structure is visited like a binary tree whose leaves are the calls with low == high (one element).
A word of advice: when looking at recursive code, try working from the leaves towards the more complex tree. That is, try it out with the base case first, then with the simplest non-base case, e.g.:
1. 1 element array: does nothing
2. 2 element array: -> 1 element array (see 1.), 1 element array, combine
3. 4 element array: -> 2 element array (see 2.), 2 element array, combine
Notice that in 2. you know both recursive calls won't do anything and combine will perhaps do a swap. 3. does 2. twice (including the swap) before its combine merges two sorted 2 element arrays. You are perhaps looking at it the other way round, which requires you to halt 3. to do 2., which is halted in turn to do 1., then the next 1., then back to 2., then on to the next 2., which again has two 1.s... That needs pen and paper. Looking at it from leaf to root, using what you have learned so far, lets you understand it much more easily. I do think functional recursion, e.g. the Fibonacci sequence, is easier to grasp than recursion that mutates a structure like your merge sort.
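The call order can be made visible by recording each call instead of sorting. The following is a minimal sketch in C++ rather than Java (the control flow is the same); the trace parameter and the trace_merge_sort wrapper are additions for illustration and are not part of the question's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mirrors the divide() from the question, but records each call instead of sorting.
void divide(int low, int high, std::vector<std::string>& trace) {
    trace.push_back("divide(" + std::to_string(low) + "," + std::to_string(high) + ")");
    if (low < high) {
        int middle = (low + high) / 2;    // local to this invocation only
        divide(low, middle, trace);       // must return before the next line runs
        divide(middle + 1, high, trace);
        trace.push_back("combine(" + std::to_string(low) + "," +
                        std::to_string(middle) + "," + std::to_string(high) + ")");
    }
}

// Hypothetical wrapper returning the full call order for the range [low, high].
std::vector<std::string> trace_merge_sort(int low, int high) {
    assert(low <= high);
    std::vector<std::string> trace;
    divide(low, high, trace);
    return trace;
}
```

For trace_merge_sort(0, 3) the recorded order is divide(0,3), divide(0,1), divide(0,0), divide(1,1), combine(0,0,1), divide(2,3), divide(2,2), divide(3,3), combine(2,2,3), combine(0,1,3): the left half is completely finished before the right half starts.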
Following this link, I am trying to understand how this kernel code works (there are 2 versions of it, one with volatile local float *source and the other with volatile global float *source, i.e. local and global versions). Below I take the local version:
float sum=0;
void atomic_add_local(volatile local float *source, const float operand) {
union {
unsigned int intVal;
float floatVal;
} newVal;
union {
unsigned int intVal;
float floatVal;
} prevVal;
do {
prevVal.floatVal = *source;
newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
If I understand correctly, each work-item shares access to the source variable thanks to the qualifier "volatile", doesn't it?
Then, if I take one work-item, the code adds the operand value into newVal.floatVal. After this operation, I call the atomic_cmpxchg function, which checks whether the previous assignments (prevVal.floatVal = *source; and newVal.floatVal = prevVal.floatVal + operand;) have been done, i.e. by comparing the value stored at address source with prevVal.intVal.
During this atomic operation (which is uninterruptible by definition), as the value stored at source is different from prevVal.intVal, the new value stored at source is newVal.intVal, which is actually a float (because it is coded on 4 bytes like an integer).
Can we say that each work-item has mutex access (I mean locked access) to the value located at the source address?
But for each work-item thread, is there only one iteration of the while loop?
I think there will be one iteration because the comparison "*source== prevVal.int ? newVal.intVal : newVal.intVal" will always assign the newVal.intVal value to the value stored at the source address, won't it?
I have not understood all the subtleties of the trick used in this kernel code.
Update
Sorry, I now understand almost all the subtleties, especially in the while loop:
First case: for a given single thread, if at the time atomic_cmpxchg is called *source is still equal to prevVal.floatVal, then atomic_cmpxchg will change the value that source points to and return the old value, which is equal to prevVal.intVal, so we break out of the while loop.
Second case: if between the prevVal.floatVal = *source; instruction and the call of atomic_cmpxchg the value *source has changed (by another thread ??), then atomic_cmpxchg returns the old value, which is no longer equal to prevVal.intVal, so the while condition is true and we stay in the loop until that condition no longer holds.
Is my interpretation correct?
If I understand well, each work-item shares the access to source variable thanks to the qualifier "volatile", doesn't it?
volatile is a keyword of the C language that prevents the compiler from optimizing accesses to a specific location in memory (in other words, force a load/store at each read/write of said memory location). It has no impact on the ownership of the underlying storage. Here, it is used to force the compiler to re-read source from memory at each loop iteration (otherwise the compiler would be allowed to move that load outside the loop, which breaks the algorithm).
do {
    prevVal.floatVal = *source; // Force read, prevent hoisting outside loop.
    newVal.floatVal = prevVal.floatVal + operand;
} while (atomic_cmpxchg((volatile local unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
After removing qualifiers (for simplicity) and renaming parameters, the signature of atomic_cmpxchg is the following:
int atomic_cmpxchg(int *ptr, int expected, int new)
What it does is:
atomically {
int old = *ptr;
if (old == expected) {
*ptr = new;
}
return old;
}
To summarize, each thread, individually, does:
1. Load the current value of *source from memory into prevVal.floatVal
2. Compute the desired value of *source in newVal.floatVal
3. Execute the atomic compare-exchange described above (using the type-punned values)
4. If the result of atomic_cmpxchg == prevVal.intVal, it means the compare-exchange was successful, break. Otherwise, the exchange didn't happen, go to 1 and try again.
The above loop eventually terminates, because eventually each thread succeeds in doing its atomic_cmpxchg: a failed attempt only means that *source was changed by another thread between the load and the compare-exchange.
Can we say that each work-item has a mutex access (I mean a locked access) to value located at source address.
Mutexes are locks, while this is a lock-free algorithm. OpenCL can simulate mutexes with spinlocks (also implemented with atomics) but this is not one.
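For comparison, the same lock-free retry loop can be sketched in host-side C++ with std::atomic<float>. This is an analogy and not OpenCL code: compare_exchange_weak plays the role of atomic_cmpxchg, and add_repeatedly is a hypothetical helper used only to exercise the function.

```cpp
#include <atomic>
#include <cassert>

// Add operand to target using a compare-exchange retry loop (no lock involved).
void atomic_add(std::atomic<float>& target, float operand) {
    float expected = target.load();
    // On failure, compare_exchange_weak writes the current value of target
    // into `expected`, so every retry recomputes the sum from fresh data.
    while (!target.compare_exchange_weak(expected, expected + operand)) {
        // retry: another thread changed target, or the weak form failed spuriously
    }
}

// Hypothetical helper: apply atomic_add `times` times and return the result.
float add_repeatedly(float start, float operand, int times) {
    assert(times >= 0);
    std::atomic<float> value{start};
    for (int i = 0; i < times; ++i)
        atomic_add(value, operand);
    return value.load();
}
```

In real use several threads would call atomic_add on the same target; no thread ever waits on a lock, and a thread only retries when some other update got in first.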
The problem comes from an online judge (OJ).
The description is:
We are playing the Guess Game. The game is as follows:
I pick a number from 1 to n. You have to guess which number I picked.
Every time you guess wrong, I'll tell you whether the number I picked is higher or lower.
However, when you guess a particular number x, and you guess wrong, you pay $x. You win the game when you guess the number I picked.
Given a particular n ≥ 1, find out how much money you need to have to guarantee a win.
I wrote a small recursive snippet for this minimax problem. But it is slow and I want to rewrite it in an iterative way. Could anyone help with that and give me an idea of how to convert the recursive solution into an iterative one? Any idea is appreciated. The code is shown below:
public int getMoneyAmount(int n) {
int[][] dp = new int[n + 1][n + 1];
for(int i = 0; i < dp.length; i++)
Arrays.fill(dp[i], -1);
return solve(dp, 1, n);
}
private int solve(int[][] dp, int left, int right){
if(left >= right){
return 0;
}
if(dp[left][right] != -1){
return dp[left][right];
}
dp[left][right] = Integer.MAX_VALUE;
for(int i = left; i <= right; i++){
dp[left][right] = Math.min(dp[left][right], i + Math.max(solve(dp, left, i - 1),solve(dp, i + 1, right)));
}
return dp[left][right];
}
In general, you convert using some focused concepts:
Replace the recursion with a while loop -- or a for loop, if you can pre-determine how many iterations you need (which you can do in this case).
Within the loop, check for the recursion's termination conditions; when you hit one of those, skip the rest of the loop.
Maintain local variables to replace the parameters and return value.
The loop termination is completion of the entire problem. In your case, this would be filling out the entire dp array.
The loop body consists of the computations that are currently in your recursion step: preparing the arguments for the recursive call.
Your general approach is to step through a nested (2-D) loop to fill out your array, starting from the simplest cases (left = right) and working your way to the far corner (left = 1, right = n). Note that your main diagonal is 0 (initialize that before you get into the loop), and your lower triangle is unused (don't even bother to initialize it).
For the loop body, you should be able to derive how to fill in each succeeding diagonal (one element shorter in each iteration) from the one you just did. That assignment statement is the body. In this case, you don't need the recursion termination conditions: the one that returns 0 is what you cover in initialization; the other you never hit, controlling left and right with your loop indices.
Are these enough hints to get you moving?
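One way these hints could come together is the bottom-up sketch below, written in C++ rather than Java. The name get_money_amount and the extra padding row and column are choices made for this sketch; it relies on the entries just below the main diagonal being 0, so unlike the hint above it does initialize the whole table.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Bottom-up version of the memoized recursion: dp[left][right] is the money
// needed to guarantee a win when the number lies in [left, right].
int get_money_amount(int n) {
    assert(n >= 1);
    // Indices 0..n+1 so that dp[left][i - 1] and dp[i + 1][right] are always
    // in range; empty and single-element ranges cost 0.
    std::vector<std::vector<int>> dp(n + 2, std::vector<int>(n + 2, 0));
    for (int len = 2; len <= n; ++len) {                 // one diagonal per pass
        for (int left = 1; left + len - 1 <= n; ++left) {
            int right = left + len - 1;
            int best = INT_MAX;
            for (int i = left; i <= right; ++i) {        // try every first guess i
                int worst = i + std::max(dp[left][i - 1], dp[i + 1][right]);
                best = std::min(best, worst);
            }
            dp[left][right] = best;
        }
    }
    return dp[1][n];
}
```

Each diagonal only reads shorter ranges, which were filled in earlier passes, so no recursion or "not yet computed" marker is needed.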
I am working on a project in which I define data types like the ones below
typedef QVector<double> QFilterDataMap1D;
typedef QMap<double, QFilterDataMap1D> QFilterDataMap2D;
Then there is a class named mono_data in which I have defined this variable
QFilterDataMap2D valid_filters;
mono_data Scan_Data; // instance of the class
Now I am reading a variable from a .mat file and trying to save it into the above "valid_filters" QMap.
for(int i=0;i<1;i++)
{
for(int j=0;j<1;j++)
{
Scan_Data.valid_filters[i][j]=valid_filters[i][j];
printf("\nValid_filters=%f",Scan_Data.valid_filters[i][j]);
}
}
The transfer itself succeeds, but then I get this run-time error:
Windows has triggered a breakpoint in SpectralDataCollector.exe.
This may be due to a corruption of the heap, and indicates a bug in
SpectralDataCollector.exe or any of the DLLs it has loaded.
The output window may have more diagnostic information
Can anyone help in solving this problem? It will be of great help to me.
Thanks
Different issues here:
1. Using double as key type for a QMap
Using a QMap<double, Foo> is a very bad idea. The reason is that this is a container that lets you access a Foo given a double. For instance:
map[0.45] = foo1;
map[15.74] = foo2;
This is problematic because, to retrieve the data contained in map[key], you have to test whether key is equal to, smaller than or greater than the other keys in the map. In your case, the key is a double, and testing whether two doubles are equal is not a "safe" operation.
2. Using an int as key while you defined it as double
Here:
Scan_Data.valid_filters[i][j]=valid_filters[i][j];
i is an integer, and you said it should be a double.
3. Your loop only tests (i,j) = (0,0)
Are you aware that
for(int i=0;i<1;i++)
{
for(int j=0;j<1;j++)
{
Scan_Data.valid_filters[i][j]=valid_filters[i][j];
printf("\nValid_filters=%f",Scan_Data.valid_filters[i][j]);
}
}
is equivalent to:
Scan_Data.valid_filters[0][0]=valid_filters[0][0];
printf("\nValid_filters=%f",Scan_Data.valid_filters[0][0]);
?
4. Accessing a vector with operator[] is not safe
When you do:
Scan_Data.valid_filters[i][j]
You in fact do:
QFilterDataMap1D & v = Scan_Data.valid_filters[i]; // call QMap::operator[](double)
double d = v[j]; // call QVector::operator[](int)
The first one is safe, and creates the entry if it doesn't exist. The second one is not safe: the jth element in your vector must already exist, otherwise it may crash.
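The same asymmetry exists in the standard library, which makes it easy to sketch without Qt. The example below uses std::map and std::vector as stand-ins for QMap and QVector, on the assumption that they behave alike in this respect; Row, Table and set_cell are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Row = std::vector<double>;
using Table = std::map<int, Row>;

// Safe write: the map creates the row on demand, and the row is grown
// before indexing so that row[j] always refers to an existing element.
void set_cell(Table& table, int i, std::size_t j, double value) {
    Row& row = table[i];        // inserts an empty Row if key i is absent
    if (row.size() <= j)
        row.resize(j + 1);      // new elements are value-initialized to 0.0
    assert(j < row.size());
    row[j] = value;
}
```

Without the resize step, row[j] on a freshly created empty row would write outside the vector's storage, which is the kind of access that can corrupt the heap.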
Solution
It seems you in fact want a 2D array of double (i.e., a matrix). To do this, use:
typedef QVector<double> QFilterDataMap1D;
typedef QVector<QFilterDataMap1D> QFilterDataMap2D;
Then, when you want to transfer one in another, simply use:
Scan_Data.valid_filters = valid_filters;
Or if you want to do it yourself:
Scan_Data.valid_filters.clear();
for(int i=0;i<n;i++)
{
Scan_Data.valid_filters << QFilterDataMap1D();
for(int j=0;j<m;j++)
{
Scan_Data.valid_filters[i] << valid_filters[i][j];
printf("\nValid_filters=%f",Scan_Data.valid_filters[i][j]);
}
}
If you want a 3D matrix, you would use:
typedef QVector<QFilterDataMap2D> QFilterDataMap3D;
Which hashmap collision handling scheme is better when the load factor is close to 1 to ensure minimum memory wastage?
I personally think the answer is open addressing with linear probing, because it doesn't need any additional storage space in case of collisions. Is this correct?
Answering the question: Which hashmap collision handling scheme is better when the load factor is close to 1 to ensure minimum memory wastage?
Open addressing/probing that allows a high fill. Because as you said so yourself, there is no extra space required for collisions (just, well, possibly time -- of course this is also assuming the hash function isn't perfect).
If you did not specify "load factor close to 1" or included "cost" metrics in the question then it would be entirely different.
Happy coding.
A hashmap that is that full will degrade into a linear search, so you will want to keep them under 90% full.
You are right about open addressing using less memory, chaining will need a pointer or offset field in each node.
I have created a hasharray data structure for when I need very lightweight hashtables that will not have a lot of inserts. To keep memory usage low, all data is embedded in the same block of memory, with the HashArray structure at the start, followed by two arrays for hashes and values. A hasharray can only be used when the lookup key is stored in the value.
typedef uint16_t HashType; /* this can be 32bits if needed. */
typedef uint16_t HashSize; /* this can be made 32bits if large hasharrays are needed. */
typedef struct HashArray {
    HashSize length; /* hasharray length. */
    HashSize count; /* number of hash/values pairs contained in the hasharray. */
    uint16_t value_size; /* size of each value. (maximum size of value 64Kbytes) */
    /* these last two fields are just for show, they are not defined in the HashArray struct:
     *   HashType hashs[length];               array of hashs for each value, this helps with resolving bucket collision
     *   uint8_t values[length * value_size];  array holding all values.
     */
} HashArray;
#define hasharray_get_hashs(array) ((HashType *)(((uint8_t *)(array)) + sizeof(HashArray)))
#define hasharray_get_values(array) (((uint8_t *)(array)) + sizeof(HashArray) + \
    ((array)->length * sizeof(HashType)))
#define hasharray_get_value(array, idx) (hasharray_get_values(array) + ((idx) * (array)->value_size))
The macros hasharray_get_hashs & hasharray_get_values are used to get the 'hashs' & 'values' arrays.
I have used this for adding fast lookup of complex objects that are already stored in an array. The objects have a string 'name' field which is used for the lookup. The names are hashed and inserted into the hasharray together with the object's index. The values stored in the hasharray can be indexes, pointers or whole objects (I only use small 16-bit index values).
If you want to pack the hasharray until it is almost full, then you will want to use full 32-bit hashes instead of the 16-bit ones defined above. Larger 32-bit hashes will help keep searches fast when the hasharray is more than 90% full.
The hasharray as defined above can only hold a maximum of 65535 values, which is fine since I never use it on anything that would have more than a few hundred values. Anything that needs more than that should just use a normal hashtable. But if memory is really an issue, the HashSize type could be changed to 32 bits. Also I use power-of-2 lengths to keep the hash lookup fast. Some people prefer to use prime bucket lengths, but that is only needed if the hash function has bad distribution.
#define hasharray_empty_hash 0xFFFF /* hash value to mark empty slots. */
void *hasharray_search(HashArray *array, HashType hash, uint32_t *next) {
    HashType *hashs = hasharray_get_hashs(array);
    uint32_t mask = array->length - 1;
    uint32_t start_idx;
    uint32_t idx;
    hash = (hash == hasharray_empty_hash) ? 0 : hash; /* need one hash value to mark empty slots. */
    start_idx = (hash & mask);
    if(*next == 0) {
        idx = start_idx; /* new search. */
    } else {
        idx = *next & mask; /* continuing search to next slot. */
    }
    /* find hash in hash array. */
    do {
        /* check for hash match. */
        if(hashs[idx] == hash) goto found_hash;
        /* check for end of chain. */
        if(hashs[idx] == hasharray_empty_hash) break;
        idx++;
        idx &= mask;
    } while(idx != start_idx);
    /* maximum tries reached (i.e. did a linear search of whole array) or end of chain. */
    return NULL;
found_hash:
    *next = idx + 1; /* where to continue search at, if this is not the right value. */
    return hasharray_get_values(array) + (idx * array->value_size);
}
Hash collisions will happen, so the code that calls hasharray_search() needs to compare the search key with the one stored in the value object. If they don't match, then hasharray_search() is called again. Non-unique keys can also exist, since searching can continue until 'NULL' is returned to find all values that match one key. The search function uses linear probing to be cache friendly.
typedef struct {
char *name; /* this is the lookup key. */
char *type;
/* other field info... */
} Field;
typedef struct {
Field *list; /* array of Field objects. */
HashArray *lookup; /* hasharray for fast lookup of Field objects by name. The values stored in this hasharray are 16bit indices. */
uint32_t field_count; /* number of Field objects in 'list'. */
} Fields;
extern Fields *fields_new(uint16_t count) {
    Fields *fields;
    fields = calloc(1, sizeof(Fields));
    fields->list = calloc(count, sizeof(Field));
    /* allocate hasharray to hold at most 'count' uint16_t values.
     * The hasharray will round 'count' up to the next power-of-2.
     * That power-of-2 length must be atleast (count+1), so that there will always be one empty slot.
     */
    fields->lookup = hasharray_new(count, sizeof(uint16_t));
    fields->field_count = count;
    return fields; /* the original snippet fell off the end without returning. */
}
extern Field *fields_lookup_by_name(Fields *fields, const char *name) {
HashType hash = str_to_hash(name);
Field *field;
uint32_t next = 0;
uint16_t *rc;
uint16_t idx;
do {
rc = hasharray_search(fields->lookup, hash, &next);
if(rc == NULL) break; /* field not found. */
/* found a possible match. */
idx = *rc;
assert(idx < fields->field_count);
field = &(fields->list[idx]);
/* compare lookup name with field's name. */
if(strcmp(name, field->name) == 0) {
/* found match. */
return field;
}
/* field didn't match continue search to next field. */
} while(1);
return NULL;
}
In the worst case, searching will degrade to a linear search of the whole array if it is 99% full and the key doesn't exist. If the keys are integers, then a linear search shouldn't be too bad; also, only keys with the same hash value will need to be compared. I try to keep the hasharrays sized so they are only about 70-80% full; the space wasted on empty slots isn't much if the values are only 16-bit values. With this design you only waste 4 bytes per empty slot when using 16-bit hashes and 16-bit index values. The array of objects (Field structs in the above example) has no empty spots.
Also most hashtable implementations that I have seen don't store the computed hashs and require full key compares to resolve bucket collisions. Comparing the hashs helps a lot since only a small part of the hash value is used to lookup the bucket.
As the others said, with linear probing, when the load factor is near 1, the search time approaches that of a linear search. (When the table is completely full, an unsuccessful search never terminates.) There is a memory-efficiency trade-off here, while separate chaining in theory gives constant time on average.
Normally, under linear probing, it's recommended to keep the load factor between 1/8 and 1/2. When the array is 1/2 full, we resize it to double the size of the original array (reference: Algorithms, by Robert Sedgewick and Kevin Wayne). When deleting, we likewise halve the array once it drops to 1/8 full. If you are really interested, the book mentioned above is a good place to start.
In practice, 0.72 is said to be a commonly used empirical value.