What is the Vector data structure?

I know Vector in C++ and Java, where it's like a dynamic array, but I can't find any general definition of the Vector data structure. So what is a Vector? Is Vector a general data structure (like array, stack, queue, tree, ...) or is it just a data type that depends on the language?

The word "vector" as applied to computer science/programming is borrowed from math, which can make the use confusing (even your question could be on multiple subjects).
The simplest example of vectors in math is the number line, used to teach elementary math (especially to help visualize negative numbers, subtraction of negative numbers, addition of negative numbers, etc).
The vector is a distance and direction from a point. This is why it can confuse the discussion, because a vector data structure COULD be three coordinates, X,Y,Z, in a structure used in 3D graphics engines, or a 2D point (just X,Y). In that context, the subtraction of two such points results in a vector - the vector describes how far and in what direction to travel from one of the source operands to the other.
This applies to storage, like STL vectors or Java vectors, in that storage is represented as a distance from an address (where a memory address is similar to a point in space, or on a number line).
The concept is related to arrays, because arrays could be the storage allocated for a vector, but I submit that the vector is a larger concept than the array. A vector must include the concept of distance from a starting point, and if you think of the beginning of an array as the starting point, the distance to the end of the array is its size.
So, the data structure representing a vector must include the size, whereas a raw array has no storage for its size; the size is implied by the way it's allocated. That is to say, if you dynamically allocate an array, there is no data structure storing the size of that array that you can query; the programmer must already know that size, or store it in some integer or long.
The vector data structure (say, the design of a vector class) DOES need to store the size, so at a minimum, there would be a starting point (the base of an array, or some address in memory) and a distance from that point indicating size.
That's really "RAM" oriented, though, in description, because there's one more point not yet described which must be part of the data describing the vector - the notion of element size. If a vector represents bytes, and memory storage is typically measured in bytes, an address and a distance (or size) would represent a vector of bytes, but nothing else - and that's very machine-level thinking. A higher-level thing, some structure, has its own size - say, the size of a float or double, or of a structure or class in C++. Whatever the element size is, the memory required to store N of them requires that the vector data structure have some knowledge of WHAT it's storing, and how large that thing is. This is why you'd think in terms of "a vector of strings" or "a vector of points". A vector must also store an element size.
So, a basic vector data structure must have:
An address (the starting point)
An element size (each thing it stores is X bytes long)
A number of elements stored (how many elements times element size is 'minimum' storage size).
One important "assumption" made in this simple 3 item list of entries in the vector data structure is that the address is allocated memory, which must be freed at some point, and is to be guarded against access beyond the end of the vector.
That means there's something missing. In order to make a vector class work, there is a recognizable difference between the number of ITEMS stored in the vector, and the amount of memory ALLOCATED for that storage. Typically, as you might realize from the use of vector from the STL, it may "know" it has room to store 10 items, but currently only has 2 of them.
So, a working vector class would ALSO have to store the amount of memory allocation. This would be how it could dynamically extend itself - it would now have sufficient information to expand storage automatically.
Thinking through just how you would make a vector class operate gives you the structure of data required to operate a vector class.
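To make the list above concrete, here is a minimal sketch of such a class in C++. The name SimpleVector, the initial capacity of 4 and the growth factor of 2 are all assumptions made for illustration; real implementations such as std::vector differ in many details. The element size is not stored as a field here because the template parameter already implies it (sizeof(T)).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Illustrative sketch only: SimpleVector is a hypothetical name, not a library class.
// The element size is implied by the type T, so it is not stored as a field.
template <typename T>
class SimpleVector {
public:
    SimpleVector() = default;
    SimpleVector(const SimpleVector&) = delete;             // copying omitted for brevity
    SimpleVector& operator=(const SimpleVector&) = delete;
    ~SimpleVector() { delete[] data_; }                     // the allocation must be freed

    std::size_t size() const { return size_; }              // number of elements stored
    std::size_t capacity() const { return capacity_; }      // number of elements allocated

    void push_back(T value) {
        if (size_ == capacity_) {
            // Assumed policy: start at 4 elements, then double each time.
            grow(capacity_ == 0 ? 4 : capacity_ * 2);
        }
        data_[size_++] = std::move(value);
    }

    T& operator[](std::size_t i) {
        assert(i < size_);                                   // guard against access past the end
        return data_[i];
    }

private:
    void grow(std::size_t new_capacity) {
        T* bigger = new T[new_capacity];                     // requires T to be default-constructible
        for (std::size_t i = 0; i < size_; ++i) {
            bigger[i] = std::move(data_[i]);                 // move the old elements across
        }
        delete[] data_;                                      // free the old block
        data_ = bigger;
        capacity_ = new_capacity;
    }

    T* data_ = nullptr;          // the address (starting point)
    std::size_t size_ = 0;       // how many elements are stored
    std::size_t capacity_ = 0;   // how many elements the allocation has room for
};
```

With this sketch, pushing ten ints into an empty SimpleVector<int> leaves size() at 10 and capacity() at 16, which shows the difference between items stored and memory allocated.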

It's an array with dynamically allocated space. Every time you exceed this space, a new block of memory is allocated, the old array is copied to the new one, and the old one is then freed.
Moreover, a vector usually allocates more memory than it needs, so it does not have to copy all the data every time a new element is added.
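As a rough illustration of that over-allocation, the sketch below counts how often std::vector changes its capacity while elements are appended. The helper name count_reallocations is made up for this example, and the exact count depends on the standard library's growth policy, which is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times std::vector changes its capacity while n elements are appended.
// count_reallocations is a hypothetical helper used only for this illustration.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {   // capacity changed, so storage was reallocated
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);                     // every element was stored
    return reallocations;
}
```

On typical implementations the count grows roughly with the logarithm of n, so appending 100000 elements triggers only a few dozen reallocations rather than 100000.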
It may seem that lists are then much better, but that's not necessarily so. If you do not change the size of your vector often, the computer's cache works much better with vectors than with lists, because vector elements are contiguous in memory. The disadvantage shows when you have a large vector that you need to expand: you then have to accept copying a large amount of data to another place in memory.
What's more, you can add new data at the end or at the front of the vector. Because vectors are array-like, every time you add an element at the beginning, all the existing elements have to be shifted. Adding elements at the end of a vector is far more efficient. There's no such issue with linked lists.
A vector gives random access to the data it holds, while linked lists, queues and stacks do not.

Vectors are the same as dynamic arrays, with the ability to resize
themselves automatically when an element is inserted or deleted.
Vector elements are placed in contiguous storage so that they can be
accessed and traversed using iterators.
In vectors, data is usually inserted at the end; inserting elsewhere requires shifting the elements that follow.
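A small sketch of these properties using std::vector in C++. The helper names is_contiguous, sum_with_iterators and layout_demo are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if element i lives at data() + i for every i, i.e. storage is contiguous.
bool is_contiguous(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (&v[i] != v.data() + i) return false;
    }
    return true;
}

// Sums the elements by walking the vector with iterators.
long sum_with_iterators(const std::vector<int>& v) {
    long total = 0;
    for (auto it = v.begin(); it != v.end(); ++it) total += *it;
    return total;
}

// Example usage.
void layout_demo() {
    std::vector<int> v{1, 2, 3};
    v.push_back(4);                        // new elements normally go at the end
    assert(is_contiguous(v));
    assert(sum_with_iterators(v) == 10);
}
```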

Related

What is the most efficient and portable way to define an order on pointers?

I have a slice that contains pointers to values. In a performance-critical part of my program, I'm adding or removing values from this slice. For the moment, inserting a value is just an append (amortized O(1) complexity), and removal consists of searching the slice for the corresponding pointer value, from 0 to n-1, until the pointer is found (O(n)). To improve performance, I'd like to sort the values in the slice, so that searching can be done using binary search (so O(log(n))).
But how can I compare pointer values? Pointer arithmetic is forbidden in Go, so AFAIK to compare pointer values p1 and p2 I have to use the unsafe package and do something like
uintptr(unsafe.Pointer(p1)) < uintptr(unsafe.Pointer(p2))
Now, I'm not comfortable using unsafe, at least because of its name. So, is that method correct? Is it portable? Are there potential pitfalls? Is there a better way to define an order on pointer values? I know I could use maps, but maps are slow as hell.
As said by others, don't do this. Performance can't be that critical to resort to pointer arithmetic in Go.
Pointers are comparable, but only for equality (== and !=); they are not ordered. Spec: Comparison operators:
Pointer values are comparable. Two pointer values are equal if they point to the same variable or if both have value nil. Pointers to distinct zero-size variables may or may not be equal.
Just use a map with the pointers as keys. Simple as that. Yes, indexing maps is slower than indexing slices, but then again, if you wanted to keep your slice sorted and perform binary searches on it, the performance gap shrinks, as the (hash) map implementation provides O(1) lookup on average while binary search is only O(log n). With a big data set, the map might even be faster than searching in the slice.
If you anticipate a big number of pointers in the map, then pre-allocate a big one with make() passing an estimated upper size, and until your map exceeds this size, no reallocation will occur.
m := make(map[*mytype]struct{}, 1<<20) // Allocate map for 1 million entries

How to extract motion vectors and info on frame partition in HEVC HM 16.15

I am using the HEVC reference software, HM Encoder Version [16.15] (including RExt) on a [Mac OS X][GCC 4.2.1][64 bit] and would like to extract at encoder side:
1) the motion vectors for each block
2) the frame partition information, i.e. the size and location of each block in the frame to which a motion vector refers.
Does anybody have hints on what are the variables where this info is stored for each coding unit? Thanks!
All you need is available in the TComDataCU class.
1) For motion information, there is the function getCUMvField(), which gives access to the motion vectors. It's not easy to work with, though.
Basically, to access almost any of the PU/CU level syntax elements, you need to be able to work with the absolute index of that PU/CU. This unique index tells you exactly where your PU/CU is located in the CTU by pointing to the top-left 4x4 block of that part.
I remember that most of the time this index is stored in the variable uiAbsPartIdx.
If you get to know how to work with this index, then you will be able to get the block partitioning information at the CTU level. So for 2), my suggestion is that you go to the slice level, where there is a loop over CTUs (I think this is done in the compressSlice() function). After the compressCtu() function has been called for a CTU (which means that all RDO decisions have been made and the CTU partitioning is decided), you put a loop over all uiAbsPartIdx values of the CTU and get their width and height. For example, if your CTU size is 64, then you will have 16*16=256 unique 4x4 blocks in your CTU. The function for getting width/height of the CU corresponding to a certain uiAbsPartIdx is pCtu->getWidth(uiAbsPartIdx).
I hope it was clear.

Is it ok to create big array of AVX/SSE values

I am parallelizing a certain dynamic programming problem using AVX2/SSE instructions.
In the main iteration of my calculation, I calculate a column in a matrix where each cell is a structure of AVX2 registers (__m256i). I use values from the previous matrix column as input values for calculating the current column. Columns can be big, so what I do is keep an array of structures (on the stack), where each structure has two __m256i elements.
Structure:
struct Cell {
    __m256i first;
    __m256i second;
};
And then I have an array like this: Cell prevColumn[N]. N will typically be a few hundred.
I know that __m256i basically represents an AVX2 register, so I am wondering how I should think about this array and how it behaves, since N is much larger than 16 (which is the number of AVX registers). Is it good practice to create such an array, or is there some better approach that I should use when storing a lot of __m256i values that are going to be reused very soon?
Also, is there any aligning I should be doing with these structures? I read a lot about aligning, but I am still not sure how and when to do it exactly.
It's better to structure your code to do everything it can with a value before moving on. Small buffers that fit in L1 cache aren't going to be too bad for performance, but don't do that unless you need to.
I think it's more typical to write your code with buffers of int [] type, rather than __m256i type, but I'm not sure. Either way works, and should get the compiler to generate efficient code. But the int [] way means less code has to be different for the SSE, AVX2, and AVX512 versions. And it might make it easier to examine things with a debugger, to have your data in an array with a type that will get the data formatted nicely.
As I understand it, the load/store intrinsics are partly there as a cast between __m256i and int [], since most AVX instructions don't fault on unaligned memory operands; they just slow down when an access crosses a cache-line boundary. Assigning to / from an array of __m256i should work fine, and generate load/store instructions where needed, otherwise generate vector instructions with memory source operands (for more compact code and fewer fused-domain uops).

Do Bit vectors store memory locations?

Hello Computer Science World,
I am trying to answer this question..
A bit vector is simply an array of bits (0s and 1s). A bit vector of length m takes
much less space than an array of m pointers. Describe how to use a bit vector
to represent a dynamic set of distinct elements with no satellite data. Dictionary
operations should run in O(1) time.
My thinking is that a bit vector can be used to store the memory locations of the elements, and since we are assuming that no two elements have the same key, we can use a hash function to store the memory location and access it in O(1) time.
Do bit vectors store memory locations?
If not, can someone guide me to the promised land?
Thanks
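One common reading of the exercise, sketched here in C++ under the assumption that the keys are integers in the range 0 to m-1: the bit vector stores no memory locations at all. Bit k simply records whether key k is currently in the set, so insert, delete and search each touch one bit. The class name BitVectorSet and its method names are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch: BitVectorSet is a hypothetical name.
// Assumes keys are integers in [0, m). Bit k is 1 exactly when key k is in the set.
class BitVectorSet {
public:
    explicit BitVectorSet(std::size_t m) : m_(m), words_((m + 63) / 64, 0) {}

    void insert(std::size_t key) {             // O(1): set one bit
        assert(key < m_);
        words_[key / 64] |= mask(key);
    }
    void erase(std::size_t key) {              // O(1): clear one bit
        assert(key < m_);
        words_[key / 64] &= ~mask(key);
    }
    bool contains(std::size_t key) const {     // O(1): test one bit
        assert(key < m_);
        return (words_[key / 64] & mask(key)) != 0;
    }

private:
    static std::uint64_t mask(std::size_t key) {
        return std::uint64_t{1} << (key % 64);
    }

    std::size_t m_;                            // size of the key universe
    std::vector<std::uint64_t> words_;         // m bits packed into 64-bit words
};
```

Because the elements carry no satellite data, knowing that key k is present is all there is to know, which is why one bit per possible key is enough.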

HDF5: writing strings, structs, and vectors (oh, my)

I have a struct defined as such:
typedef struct {
    string mName;
    vector<int> mParts;
} AGroup;
I'm storing instances of this struct in a vector. I need to write this to an HDF (v5) file. I guess I could loop through each instance to find the longest mName, and longest mParts, create a new, non-variable length, array to hold the information, and then write that array to the file.
Is that the best way to do it? It seems overly complex just to write some data.
Variable length arrays and strings introduce overhead. But they also make sense, as they reflect your data structure more accurately. You could go with a compound datatype made of a variable length C string and a variable length array of integers.
If your strings and vectors all have a size close to the same upper bound, save yourself some time and trouble and use fixed length strings and arrays.
It all depends on the size of your dataset. In terms of disk space, there is a tradeoff between the overhead of variable length elements and the wasted space in fixed size elements that is hard to estimate without trying. If your dataset is small, do what is more convenient for you. If it is large, choose what you favor most (avoiding wasted space, the semantics of your data, ease of programming, etc.) and optimize according to that criterion.

Resources