I have code in a Qt project that appends an element to a QVector every second. I noticed that the STL's std::deque can have better performance than vector when adding a new element at the end. What is the equivalent or similar Qt class? I can't find one in the Qt libraries.
Julio
There is no direct equivalent to the std::deque class in Qt.
However, your best bet is to use QList.
Here is what the documentation says about Qt container classes:
For most purposes, QList is the right class to use. Its index-based API is more convenient than QLinkedList's iterator-based API, and it is usually faster than QVector because of the way it stores its items in memory. It also expands to less code in your executable.
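For the once-per-second append from the question, a minimal sketch (the class and member names are just illustrative):

    #include <QList>
    #include <QObject>
    #include <QTimer>

    class Sampler : public QObject
    {
        Q_OBJECT
    public:
        Sampler()
        {
            connect(&m_timer, SIGNAL(timeout()), this, SLOT(appendSample()));
            m_timer.start(1000);                // fire once per second
        }

    private slots:
        void appendSample() { m_samples.append(m_samples.size()); }   // amortized O(1) append

    private:
        QTimer m_timer;
        QList<int> m_samples;
    };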
In any case, if you are only appending an item once every second, the choice of one container over the other will have no noticeable impact.
There is no need to have a Qt equivalent for every std container, you can use std::deque if that is what you are after.
Also note that, for the case where you do a lot of insertions at the end, both std::vector and QVector have a member function named reserve that can be used to pre-allocate a bigger buffer and make insertions at the end faster.
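A minimal sketch of that (the capacity of 4096 is only illustrative):

    #include <QVector>

    int main()
    {
        QVector<int> samples;
        samples.reserve(4096);          // pre-allocate a bigger buffer up front

        for (int i = 0; i < 1000; ++i)
            samples.append(i);          // no reallocation until the reserved capacity is exceeded

        return 0;
    }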
There are a lot of questions online about allocating, copying, indexing, etc. 2D and 3D arrays in CUDA. I'm getting a lot of conflicting answers, so I'm attempting to compile past questions to see if I can ask the right ones.
First link: https://devtalk.nvidia.com/default/topic/392370/how-to-cudamalloc-two-dimensional-array-/
Problem: Allocating a 2D array of pointers
User solution: use cudaMallocPitch
"Correct" inefficient solution: use malloc and memcpy in a for loop for each row (absurd overhead)
"More correct" solution: squash it into a 1D array; the "professional opinion" in one comment is that no one with an eye on performance uses 2D pointer structures on the GPU
Second link: https://devtalk.nvidia.com/default/topic/413905/passing-a-multidimensional-array-to-kernel-how-to-allocate-space-in-host-and-pass-to-device-/
Problem: Allocating space on host and passing it to device
Sub link: https://devtalk.nvidia.com/default/topic/398305/cuda-programming-and-performance/dynamically-allocate-array-of-structs/
Sub link solution: coding pointer-based structures on the GPU is a bad experience and highly inefficient; squash it into a 1D array.
Third link: Allocate 2D Array on Device Memory in CUDA
Problem: Allocating and transferring 2D arrays
User solution: use cudaMallocPitch
Other solution: flatten it
Fourth link: How to use 2D Arrays in CUDA?
Problem: Allocate and traverse 2D arrays
Submitted solution: Does not show allocation
Other solution: squash it
There are a lot of other sources mostly saying the same thing, but in multiple instances I see warnings about pointer structures on the GPU.
Many people claim the proper way to allocate an array of pointers is with a call to malloc and memcpy for each row, yet the functions cudaMallocPitch and cudaMemcpy2D exist. Are these functions somehow less efficient? Why wouldn't they be the default answer?
The other 'correct' answer for 2D arrays is to squash them into one array. Should I just get used to this as a fact of life? I'm very persnickety about my code and it feels inelegant to me.
Another solution I was considering was to make a matrix class that uses a 1D pointer array, but I can't find a way to implement the double bracket operator.
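For what it's worth, a common way to get double-bracket access over flat storage is to have operator[] return a pointer to the start of a row; a minimal sketch, with illustrative names:

    #include <vector>

    template <typename T>
    class Matrix {
    public:
        Matrix(int rows, int cols) : cols_(cols), data_(rows * cols) {}

        // operator[] hands back a pointer to the start of row i,
        // so m[i][j] reads the flat buffer at index i * cols_ + j
        T*       operator[](int i)       { return data_.data() + i * cols_; }
        const T* operator[](int i) const { return data_.data() + i * cols_; }

    private:
        int cols_;
        std::vector<T> data_;   // one contiguous allocation
    };

    // usage: Matrix<float> m(3, 4); m[1][2] = 5.0f;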
Also according to this link: Copy an object to device?
and the sub link answer: cudaMemcpy segmentation fault
This gets a little iffy.
The classes I want to use CUDA with all have 2D/3D arrays, and wouldn't there be a lot of overhead in converting those to 1D arrays for CUDA?
I know I've asked a lot, but in summary: should I get used to squashed arrays as a fact of life, or can I use the 2D allocate-and-copy functions without getting bad overhead like in the solution where alloc and cpy are called in a for loop?
Since your question compiles a list of other questions, I'll answer by compiling a list of other answers.
cudaMallocPitch/cudaMemcpy2D:
First, the CUDA runtime API functions like cudaMallocPitch and cudaMemcpy2D do not actually involve either double-pointer allocations or 2D (doubly-subscripted) arrays. This is easy to confirm simply by looking at the documentation and noting the types of the parameters in the function prototypes. The src and dst parameters are single-pointer parameters. They could not be doubly-subscripted or doubly dereferenced. For additional example usage, here is one of many questions on this. Here is a fully worked example usage. Another example covering various concepts associated with cudaMallocPitch/cudaMemcpy2D usage is here. Instead, the correct way to think about these functions is that they work with pitched allocations. Also, you cannot use cudaMemcpy2D to transfer data when the underlying allocation has been created using a set of malloc (or new, or similar) operations in a loop. That sort of host data allocation construction is particularly ill-suited to working with the data on the device.
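A hedged sketch of the pitched pattern (sizes, grid/block shapes and names are only illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float *data, size_t pitch, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height) {
            // row y starts at byte offset y * pitch; note: single pointer, no double dereference
            float *row = (float *)((char *)data + y * pitch);
            row[x] *= 2.0f;
        }
    }

    int main()
    {
        const int width = 64, height = 32;
        static float host[height][width];              // contiguous host array

        float *dev = 0;
        size_t pitch = 0;                              // row stride in bytes, chosen by the runtime
        cudaMallocPitch((void **)&dev, &pitch, width * sizeof(float), height);

        // the "2D" is expressed through pitches, not double pointers
        cudaMemcpy2D(dev, pitch, host, width * sizeof(float),
                     width * sizeof(float), height, cudaMemcpyHostToDevice);

        scale<<<dim3(2, 2), dim3(32, 16)>>>(dev, pitch, width, height);

        cudaMemcpy2D(host, width * sizeof(float), dev, pitch,
                     width * sizeof(float), height, cudaMemcpyDeviceToHost);
        printf("host[0][0] = %f\n", host[0][0]);
        cudaFree(dev);
        return 0;
    }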
general, dynamically allocated 2D case:
If you wish to learn how to use a dynamically allocated 2D array in a CUDA kernel (meaning you can use doubly-subscripted access, e.g. data[x][y]), then the cuda tag info page contains the "canonical" question for this; it is here. The answer given by talonmies there includes the proper mechanics, as well as appropriate caveats (a sketch of the mechanics follows below):
there is additional, non-trivial complexity
the access will generally be less efficient than 1D access, because data access requires dereferencing 2 pointers, instead of 1.
(note that allocating an array of objects, where the object(s) has an embedded pointer to a dynamic allocation, is essentially the same as the 2D array concept, and the example you linked in your question is a reasonable demonstration for that)
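A hedged sketch of those mechanics (one device allocation per row plus a device copy of the row-pointer array; sizes are illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void fill(float **data, int width, int height)
    {
        int x = threadIdx.x, y = threadIdx.y;
        if (x < width && y < height)
            data[y][x] = y * width + x;   // true doubly-subscripted access: two dereferences
    }

    int main()
    {
        const int width = 8, height = 4;

        // one device allocation per row, tracked by a host-side array of device pointers
        float *h_rows[height];
        for (int y = 0; y < height; ++y)
            cudaMalloc(&h_rows[y], width * sizeof(float));

        // copy the array of row pointers itself to the device
        float **d_rows = 0;
        cudaMalloc(&d_rows, height * sizeof(float *));
        cudaMemcpy(d_rows, h_rows, height * sizeof(float *), cudaMemcpyHostToDevice);

        fill<<<1, dim3(width, height)>>>(d_rows, width, height);

        float row[width];
        cudaMemcpy(row, h_rows[1], width * sizeof(float), cudaMemcpyDeviceToHost);
        printf("row 1, col 2 = %f\n", row[2]);   // expect 10
        return 0;
    }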
Also, here is a thrust method for building a general dynamically allocated 2D array.
flattening:
If you think you must use the general 2D method, then go ahead, it's not impossible (although sometimes people struggle with the process!) However, due to the added complexity and reduced efficiency, the canonical "advice" here is to "flatten" your storage method, and use "simulated" 2D access. Here is one of many examples of questions/answers discussing "flattening".
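A hedged sketch of flattened storage with simulated 2D indexing (sizes illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void fill(float *data, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height)
            data[y * width + x] = 1.0f;   // "simulated" 2D: row-major index arithmetic
    }

    int main()
    {
        const int width = 64, height = 32;
        float *d = 0;
        cudaMalloc(&d, width * height * sizeof(float));   // one flat allocation
        fill<<<dim3(2, 2), dim3(32, 16)>>>(d, width, height);

        float h[width * height];
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
        printf("%f\n", h[5 * width + 3]);                 // element (row 5, col 3)
        cudaFree(d);
        return 0;
    }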
general, dynamically allocated 3D case:
As we extend this to 3 (or higher!) dimensions, the general case becomes overly complex to handle, IMO. The additional complexity should strongly motivate us to seek alternatives. The triply-subscripted general case involves 3 pointer accesses before the data is actually retrieved, so it is even less efficient. Here is a fully worked example (2nd code example).
special case: array width known at compile time:
Note that it should be considered a special case when the array dimension(s) (the width, in the case of a 2D array, or 2 of the 3 dimensions for a 3D array) are known at compile time. In this case, with an appropriate auxiliary type definition, we can "instruct" the compiler how the indexing should be computed, and we can use doubly-subscripted access with considerably less complexity than the general case, and with no loss of efficiency due to pointer-chasing. Only one pointer need be dereferenced to retrieve the data (regardless of array dimensionality, if n-1 dimensions are known at compile time for an n-dimensional array). The first code example in the already-mentioned answer here (first code example) gives a fully worked example of that in the 3D case, and the answer here gives a 2D example of this special case.
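A hedged sketch of that special case in 2D, with the width fixed at compile time (names and sizes illustrative):

    #include <cuda_runtime.h>

    const int W = 64;                         // width known at compile time

    __global__ void fill(float (*data)[W], int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < W && y < height)
            data[y][x] = 1.0f;   // doubly-subscripted, but only one pointer dereference:
                                 // the compiler computes the y * W + x offset itself
    }

    int main()
    {
        const int height = 32;
        float (*d)[W] = 0;                    // pointer to rows of W floats
        cudaMalloc((void **)&d, height * W * sizeof(float));
        fill<<<dim3(2, 2), dim3(32, 16)>>>(d, height);
        cudaFree(d);
        return 0;
    }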
doubly-subscripted host code, singly-subscripted device code:
Finally, another methodology option allows us to easily mix 2D (doubly-subscripted) access in host code while using only 1D (singly-subscripted, perhaps with "simulated 2D" access) in device code. A worked example of that is here. By organizing the underlying allocation as a contiguous allocation, then building the pointer "tree", we can enable doubly-subscripted access on the host and still easily pass the flat allocation to the device. Although the example does not show it, it would be possible to extend this method to create a doubly-subscripted access system on the device, based on a flat allocation and a manually-created pointer "tree"; however, this would have approximately the same issues as the 2D general dynamically allocated method given above: it would involve double-pointer (double-dereference) access, so it is less efficient, and there is some complexity associated with building the pointer "tree" for use in device code (e.g. it would necessitate an additional cudaMemcpy operation, probably).
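A hedged sketch of that organization (one contiguous buffer plus a host-only pointer "tree"; sizes illustrative):

    #include <cuda_runtime.h>

    int main()
    {
        const int width = 8, height = 4;

        // one contiguous host buffer
        float *flat = new float[width * height];

        // host-side pointer "tree" enabling rows[y][x] access in host code
        float **rows = new float *[height];
        for (int y = 0; y < height; ++y)
            rows[y] = flat + y * width;

        rows[2][3] = 42.0f;                    // doubly-subscripted on the host

        // because the storage is contiguous, the flat buffer passes to the device as-is
        float *d = 0;
        cudaMalloc(&d, width * height * sizeof(float));
        cudaMemcpy(d, flat, width * height * sizeof(float), cudaMemcpyHostToDevice);
        // ... launch kernels that use 1D / simulated-2D indexing on d ...

        cudaFree(d);
        delete[] rows;
        delete[] flat;
        return 0;
    }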
From the above methods, you'll need to choose one that fits your appetite and needs. There is not one single recommendation that fits every possible case.
There are lots of real-world reasons you'd want to do this. Ours is that we have a list of variable-length data structures, and we want to be able to change the size of one of the elements without recopying them all.
Here are a few things I've tried:
1. Just have a lot of kernel arguments. Sure, it sounds hacky, but it works for small N. This is actually what we've been doing.
2. Do 1) with some sort of macro loop which extends the kernel args to the max size (which I think is device-dependent). I don't really want to do this... it sounds bad.
3. Create some sort of list of structs which contain pointers, and fill it before your kernel invocation. I tried this, and I think it violates the spec. According to what I've seen on the NVIDIA forums, preserving the address of a device pointer beyond one kernel invocation is illegal. If anyone can point to where in the spec it says this, I'd love to know, because I can't find it. However, this definitely breaks on ATI hardware, as it moves the objects around.
4. Give up, store the variable-sized objects in a big array, and write a clever algorithm to use empty space so the whole array must be reflowed less often. This will work, but it is an inelegant, complicated design. Also, it requires lots of scary pointer arithmetic...
Does anyone else have other ideas? What about experiences trying to do this; is there a least hacky way? Why?
To 3:
OpenCL 1.1 spec page 193 says "Arguments to kernel functions in a program cannot be declared as a pointer to a pointer(s)."
A struct containing a pointer to a pointer (i.e. a pointer to a buffer object) might not be against a strict reading of this sentence, but it's within its spirit: no pointers to buffer objects may be passed as arguments from host code to a kernel, even if they're hidden inside a user-defined struct.
I'd opt for option 5: do not use variable-size data structures. If you have any way of making them constant-size, by all means do it. It will make your life a whole lot easier. To be precise, there is no 'variable size struct'. Every struct definition produces constant-sized structs, so if the size has changed then the struct itself has changed and therefore requires another mem object. Every pointer passed to a kernel function must have a single type.
In addition to sharpneli's answer (option 5):
If the objects have similar sizes, you could use a union sized for the biggest possible object. But make sure you use explicit alignment. Pass a second buffer identifying which union member is used in each object of your variable-sized-objects-in-static-size-union buffer (a sketch follows below).
I reverted to this when using OpenCL library code that only allowed one variable array of arbitrary type. I simply used cl_float2 to pass two floats. Since the cl_floatN types are implemented as unions, what works for the built-in types will work for you as well.
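A hedged sketch of that union approach in OpenCL C (the types and the tag convention are illustrative, not from any particular API):

    // two variants of different size, packed into one fixed-size, explicitly aligned union
    typedef struct { float x, y, z; }   PointData;
    typedef struct { float r; int id; } BallData;

    typedef union __attribute__((aligned(16))) {
        PointData point;
        BallData  ball;
    } Object;

    // tags[i] says which union member is live in objs[i]
    __kernel void process(__global Object *objs, __global const int *tags)
    {
        size_t i = get_global_id(0);
        if (tags[i] == 0)
            objs[i].point.x += 1.0f;   // slot i holds a PointData
        else
            objs[i].ball.r  *= 2.0f;   // slot i holds a BallData
    }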
The OpenCL language, which extends C99, does not provide the memcpy function. What should be used instead?
As far as I know, there is nothing like that defined in OpenCL. OpenCL does not provide a concept like dynamic memory, and therefore such functionality is not needed.
You can just run over your array with a for loop and copy the data element by element. Note, though, that the target array has to be of fixed size, because its length must be specified at compile time.
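A minimal sketch of such an element-by-element copy in OpenCL C (the length 16 is just an illustrative compile-time constant):

    __kernel void example(__global const float *src, __global float *dst)
    {
        float tmp[16];                 // private array: length fixed at compile time

        for (int i = 0; i < 16; ++i)   // hand-rolled replacement for memcpy
            tmp[i] = src[i];

        for (int i = 0; i < 16; ++i)   // write the copy back out so it is observable
            dst[i] = tmp[i];
    }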
On the other side, OpenCL (with OpenGL as a kind of origin) was defined in a more static way: the data needs to be provided to the GPU, and the size of the result needs to be defined, up front. The GPU computes the input into the pre-defined output location. It is not meant to create more processes within the GPU, and it is also not meant to allocate memory dynamically; that remains the job of the host, which should not be disturbed in doing it.
I want to use native OpenGL in the paint function of my widgets (QPainter) to improve performance.
I saw that there are the functions QPainter::beginNativePainting() and QPainter::endNativePainting(), which could help me.
But I can't find examples of their use...
1. I wanted to know whether those functions are low-cost, or whether every use of them reduces performance.
2. Can I call beginNativePainting() and endNativePainting() once, in general, for all the widgets I use, instead of calling them in every paint function I have?
Thanks for any help.
There is some basic example code right in the documentation: http://doc.qt.io/qt-4.8/qpainter.html#beginNativePainting
The functions themselves should be fairly low-cost, but calling them might still cause noticeable overhead, because Qt has to flush its internal painting queue on the beginNativePainting() call and probably has to assume that everything has changed as soon as endNativePainting() is called.
For the second part, I am not sure I understand what you are aiming at. Basically, if you have a QPainter object, you can call beginNativePainting() once, but you have to match it with an endNativePainting() call. So the usual place would be the paint() method.
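A hedged sketch along the lines of the documentation example, assuming a QGLWidget subclass (the class name is illustrative):

    #include <QGLWidget>
    #include <QPainter>
    #include <QPaintEvent>

    class MyGLWidget : public QGLWidget
    {
    protected:
        void paintEvent(QPaintEvent *)
        {
            QPainter painter(this);
            painter.fillRect(rect(), Qt::black);       // regular QPainter drawing

            painter.beginNativePainting();             // Qt flushes its paint queue here
            glEnable(GL_SCISSOR_TEST);                 // raw OpenGL calls from here on
            glScissor(0, 0, 64, 64);
            glClearColor(1.0f, 0.0f, 0.0f, 1.0f);
            glClear(GL_COLOR_BUFFER_BIT);
            glDisable(GL_SCISSOR_TEST);
            painter.endNativePainting();               // Qt restores its own GL state

            painter.drawText(20, 20, "drawn after native painting");
        }
    };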
Qt is using a range of OpenGL functionalities to implement its 2D painting, including custom shaders and various frame buffers. It puts OpenGL into a pretty messy state.
beginNativePainting / endNativePainting are there to allow Qt's drawing engine to save this context and restore it once the user is done drawing.
It would have been nice to have the xxxNativePainting methods do the contrary (i.e. automatically save and restore the user's configuration of OpenGL), but since Qt allows calling OpenGL primitives directly, saving the global state is nigh impossible without tons of code and a potentially serious performance hit.
Instead, these methods simply save Qt's internal OpenGL state and, rather than having user code start in a configuration that would be meaningless anyway (and likely to change with each new Qt release), reset OpenGL to a "neutral" state.
It means that, inside a begin/end section, you will start with a clean slate: no shader linked, no vertex array, most of global parameters reset, etc.
Contrary to a simple QGLWidget / paintGL scenario, where you can afford to set up the global OpenGL state once and for all and simply call the rendering primitives each frame, you will have to restore pretty much everything just after the call to beginNativePainting (link/bind your shaders, set global parameters, select and enable various buffers, etc.).
It also means that you should use native painting sparingly. Having every single widget do custom painting might soon bring your rendering to its knees.
I've been using Object as a way to have a generic associative array (map/dictionary), since AS3/Flex seems to be very limited in this regard. But coming from a C++/Java/C# background, I really don't like it. Is there a better way, some standard class I've not come across... and is this even considered good or bad practice in AS3?
Yes, ActionScript uses Object as a generic associative container, and this is considered the standard way of doing it.
There is also a Dictionary class available, flash.utils.Dictionary.
The difference is that a Dictionary can use any value as a key, including objects, while Object uses string keys. For most uses, Object is preferred, as it is faster and covers the majority of use cases.
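A short sketch of the difference (variable names are illustrative):

    import flash.utils.Dictionary;

    var byName:Object = {};                 // Object: keys are coerced to strings
    byName["alice"] = 1;

    var key:Object = new Object();
    var byRef:Dictionary = new Dictionary();
    byRef[key] = "value for this exact object";   // Dictionary: object identity as the key

    trace(byName["alice"]);   // 1
    trace(byRef[key]);        // value for this exact object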
You can see the details on Object here: http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/Object.html
and Dictionary here: http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/flash/utils/Dictionary.html
and the differences between them here: http://livedocs.adobe.com/flex/3/html/help.html?content=10_Lists_of_data_4.html
I'm afraid there's no native alternative to Object or Dictionary for maps and other structures. As for standard, well, it depends on how one defines standard, but there are a couple of well-known libraries that you might like to check out if you're looking for Java-style collections.
Like this one:
http://sibirjak.com/blog/collections/as3commons-collections/
Also, you could take a look at this question, which has links to a couple of data-structure libraries (including the above one).
Collections in Adobe Flex
I wouldn't say using Objects is either good or bad practice. In the general case they are faster than any ActionScript alternative (since they are native), but less featured. Sometimes the provided functionality is good enough. Sometimes it's a bit bare-bones, so something more structured could help you get rid of lower-level details in your code and focus on your "domain logic", so to speak.
In the end, all of these libraries implement their data structures through Objects, Dictionaries and Arrays (or Vectors). So, if the native objects are fine for your needs, I'd say go with them. On the other hand, if you find yourself basically re-writing, say, an ad hoc Set, perhaps using one of these libs would be a wise choice.