I have some elements in a QMap<double, double> a-element. Now I want to get a vector of some values of a. The easiest approach would be (for me):
int length = x1-x0;
QVector<double> retVec;
for(int i = x0; i < length; i++)
{
retVec.push_back(a.values(i));
}
with x1 and x0 as the stop- and start-positions of the elements to be copied. But is there a faster way instead of using this for-loop?
Edit: With "faster" I mean both faster to type and (not possible, as pointed out) a faster execution. As it has been pointed out, values(i) is not working as expected, thus I will leave it here as pseudo-code until I found a better_working replacement.
Maybe this works:
QVector<double>::fromList(a.values().mid(x0, length));
The idea is to get all the values as a list of doubles, extract the sublist you are interested in, thus create a vector from that list by means of an already existent static method of QVector .
EDIT
As suggested in the comments and in the updated question, it follows a slower to type but faster solution:
QVector<double> v{length};
auto it = a.cbegin()+x0;
for(auto last = it+length; it != last; it++) {
v.push_back(it.value());
}
I assume that x0 and length take care of the actual length of the key list, so a.cbegin()+x0 is valid and it doesn't worth to add the guard it != a.cend() as well.
Try this, shouldn work, haven't tested it:
int length = x1-x0;
QVector<double> retVec;
retVec.reserve(length); // reserve to avoid reallocations
QMap<double, double>::const_iterator i = map.constBegin();
i += x0; // increment to range start
while (length--) retVec << i++.value(); // add value to vector and advance iterator
This assumes the map has actually enough elements, thus the iterator is not tested before use.
Related
I'm updating a single element in a buffer from two lanes and need an atomic for float4 types. (More specifically, I launch twice as many threads as there are buffer elements, and each successive pair of threads updates the same element.)
For instance (this pseudocode does nothing useful, but hopefully illustrates my issue):
int idx = get_global_id(0);
int mapIdx = floor (idx / 2.0);
float4 toAdd;
// ...
if (idx % 2)
{
toAdd = (float4)(0,1,0,1);
}
else
{
toAdd = float3(1,0,1,0);
}
// avoid race condition here?
// I'd like to atomic_add(map[mapIdx],toAdd);
map[mapIdx] += toAdd;
In this example, map[0] should be incremented by (1,1,1,1). (0,1,0,1) from thread 0, and (1,0,1,0) from thread 1.
Suggestions? I haven't found any reference to vector atomics in the CL documents. I suppose I could do this on each individual vector component separately:
atomic_add(map[mapIdx].x, toAdd.x);
atomic_add(map[mapIdx].y, toAdd.y);
atomic_add(map[mapIdx].z, toAdd.z);
atomic_add(map[mapIdx].w, toAdd.w);
... but that just feels like a bad idea. (And requires a cmpxchg hack since there are no float atomics.
Suggestions?
Alternatively you could try using local memory like that:
__local float4 local_map[LOCAL_SIZE/2];
if(idx < LOCAL_SIZE/2) // More optimal would be to use work items together than every second (idx%2) as they work together in a warp/wavefront anyway, otherwise that may affect the performance
local_map[mapIdx] = toAdd;
barrier(CLK_LOCAL_MEM_FENCE);
if(idx >= LOCAL_SIZE/2)
local_map[mapIdx - LOCAL_SIZE/2] += toAdd;
barrier(CLK_LOCAL_MEM_FENCE);
What will be faster - atomics or local memory - or possible (size of local memory may be too big) depends on actual kernel, so you will need to benchmark and choose the right solution.
Update:
Answering your question from comments - to write later back to global buffer do:
if(idx < LOCAL_SIZE/2)
map[mapIdx] = local_map[mapIdx];
Or you can try without introducing local buffer and write directly into global buffer:
if(idx < LOCAL_SIZE/2)
map[mapIdx] = toAdd;
barrier(CLK_GLOBAL_MEM_FENCE); // <- notice that now we use barrier related to global memory
if(idx >= LOCAL_SIZE/2)
map[mapIdx - LOCAL_SIZE/2] += toAdd;
barrier(CLK_GLOBAL_MEM_FENCE);
Aside from that I can see now problem with indexes. To use the code from my answer the previous code should look like:
if(idx < LOCAL_SIZE/2)
{
toAdd = (float4)(0,1,0,1);
}
else
{
toAdd = (float4)(1,0,1,0);
}
If you need to use id%2 though then all the code must follow this or you will have to do some index arithmetic so that the values go into right places in map.
If I understand issue correctly I would do next.
Get rid of ifs by making array with offsets
float4[2] = {(1,0,1,0), (0,1,0,1)}
and use idx %2 as offset
move map into local memory and use mem_fence(CLK_LOCAL_MEM_FENCE) to make sure all threads in group synced.
a is set< int> ARRAY, I want to copy it to b. BUT...
int main(){
set<int> a[10];
a[1].insert(99);
a[3].insert(99);
if(a[1]==a[3])cout<<"echo"<<endl;
set<int> b[10];
memcpy(b,a,sizeof(a));
if(b[1]==b[3])cout<<"echo"<<endl;// latch up here, what happen?
return 0;}
Do you know What is computer doing?
I assume the 'set' class you are using is a std::set? What makes you think that simplying memcpying the raw bytes of a std::set (or array of them, in this case) will work properly? This is highly dependent on the internal structure and implementation of the set class, and trying to do such a thing with anything more complicated than a primitive or array of primitives is almost guaranteed to give unexpected results. Doing this sort of raw byte manipulation when classes are involved is rarely going to be correct.
To do this properly you should iterate over the sets and use their '=' operator to assign them, which knows how to copy the contents properly:
for(int i = 0; i < 10; ++i) {
b[i] = a[i];
}
Even better you can use std::copy:
std::copy(std::begin(a), std::end(a), std::begin(b));
I am in trouble passing values between host code and kernel code due to some vector data types. The following code/explanation is just for referencing my problem, my code is much bigger and complicated. With this small example, hopefully, I will be able to explain where I am having a problem. I f anything more needed please let me know.
std::vector<vector<double>> output;
for (int i = 0;i<2; i++)
{
auto& out = output[i];
sum =0;
for (int l =0;l<3;l++)
{
for (int j=0;j<4; j++)
{
if (some condition is true)
{ out[j+l] = 0.;}
sum+= .....some addition...
}
out[j+l] = sum
}
}
Now I want to parallelize this code, from the second loop. This is what I have done in host code:
cl::buffer out = (context,CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, output.size(), &output, NULL)
Then, I have set the arguments
cl::SetKernelArg(0, out);
Then the loop,
for (int i = 0,i<2, i++)
{
auto& out = output[i];
// sending some more arguments(which are changing accrding to loop) for sum operations
queue.enqueueNDRangeKernel(.......)
queue.enqueuereadbuffer(.....,&out,...)
}
In Kernel Code:
__kernel void sumout(__global double* out, ....)
{
int l = get_global_id(0);
int j = get_global_id(1);
if (some condition is true)
{ out[j+l] = 0.; // Here it goes out of the loop then
return}
sum+= .....some addition...
}
out[j+l] = sum
}
So now, in if condition out[j+l] is getting 0 in the loop. So out value is regularly changing. In normal code, it is a reference pointer to a vector. I am not able to read the values in output from out during my kernel and host code. I want to read the values in output[i] for every out[j+l]. But I am confused due this buffer and vector.
just for more clarification,output is a vector of vector and out is reference vector to output vector. I need to update values in output for every change in out. Since these are vectors, I passed out as cl buffer. I hope it is clear.
Please let me know, if the code is required, I will try to provide as much as I can.
You are sending pointers of vectors to opencl(ofcourse they are contiguous on pointer level) but whole data is not contiguous in memory since each inner vector points to different memory area. Opencl cannot map host pointers to device memory and there is no such command in this api.
You could use vector of arrays(latest version) or pure arrays.
I am doing one project in which I define a data types like below
typedef QVector<double> QFilterDataMap1D;
typedef QMap<double, QFilterDataMap1D> QFilterDataMap2D;
Then there is one class with the name of mono_data in which i have define this variable
QFilterMap2D valid_filters;
mono_data Scan_data // Class
Now i am reading one variable from a .mat file and trying to save it in to above "valid_filters" QMap.
Qt Code: Switch view
for(int i=0;i<1;i++)
{
for(int j=0;j<1;j++)
{
Scan_Data.valid_filters[i][j]=valid_filters[i][j];
printf("\nValid_filters=%f",Scan_Data.valid_filters[i][j]);
}
}
The transferring is done successfully but then it gives run-time error
Windows has triggered a breakpoint in SpectralDataCollector.exe.
This may be due to a corruption of the heap, and indicates a bug in
SpectralDataCollector.exe or any of the DLLs it has loaded.
The output window may have more diagnostic information
Can anyone help in solving this problem. It will be of great help to me.
Thanks
Different issues here:
1. Using double as key type for a QMap
Using a QMap<double, Foo> is a very bad idea. the reason is that this is a container that let you access a Foo given a double. For instance:
map[0.45] = foo1;
map[15.74] = foo2;
This is problematic, because then, to retrieve the data contained in map[key], you have to test if key is either equal, smaller or greater than other keys in the maps. In your case, the key is a double, and testing if two doubles are equals is not a "safe" operation.
2. Using an int as key while you defined it was double
Here:
Scan_Data.valid_filters[i][j]=valid_filters[i][j];
i is an integer, and you said it should be a double.
3. Your loop only test for (i,j) = (0,0)
Are you aware that
for(int i=0;i<1;i++)
{
for(int j=0;j<1;j++)
{
Scan_Data.valid_filters[i][j]=valid_filters[i][j];
printf("\nValid_filters=%f",Scan_Data.valid_filters[i][j]);
}
}
is equivalent to:
Scan_Data.valid_filters[0][0]=valid_filters[0][0];
printf("\nValid_filters=%f",Scan_Data.valid_filters[0][0]);
?
4. Accessing a vector with operator[] is not safe
When you do:
Scan_Data.valid_filters[i][j]
You in fact do:
QFilterDataMap1D & v = Scan_Data.valid_filters[i]; // call QMap::operator[](double)
double d = v[j]; // call QVector::operator[](int)
The first one is safe, and create the entry if it doesn't exist. The second one is not safe, the jth element in you vector must already exist otherwise it would crash.
Solution
It seems you in fact want a 2D array of double (i.e., a matrix). To do this, use:
typedef QVector<double> QFilterDataMap1D;
typedef QVector<QFilterDataMap1D> QFilterDataMap2D;
Then, when you want to transfer one in another, simply use:
Scan_Data.valid_filters = valid_filters;
Or if you want to do it yourself:
Scan_Data.valid_filters.clear();
for(int i=0;i<n;i++)
{
Scan_Data.valid_filters << QFilterDataMap1D();
for(int j=0;j<m;j++)
{
Scan_Data.valid_filters[i] << valid_filters[i][j];
printf("\nValid_filters=%f",Scan_Data.valid_filters[i][j]);
}
}
If you want a 3D matrix, you would use:
typedef QVector<QFilterDataMap2D> QFilterDataMap3D;
I've got a QVector of QVector. And I want to collect all elements in all QVectors to form a new QVector.
Currently I use the code like this
QVector<QVector<T> > vectors;
// ...
QVector<T> collected;
for (int i = 0; i < vectors.size(); ++i) {
collected += vectors[i];
}
But it seems the operator+= is actually appending each element to the QVector. So is there a more time-efficent usage of QVector or a better suitable type replace QVector?
If you really need to, then I would do something like:
QVector< QVector<T> > vectors = QVector< QVector<T> >();
int totalSize = 0;
for (int i = 0; i < vectors.size(); ++i)
totalSize += vectors.at(i).size();
QVector<T> collected;
collected.reserve(totalSize);
for (int i = 0; i < vectors.size(); ++i)
collected << vectors[i];
But please take note that this sounds a bit like premature optimisation. As the documentation points out:
QVector tries to reduce the number of reallocations by preallocating up to twice as much memory as the actual data needs.
So don't do this kind of thing unless you're really sure it will improve your performance. Keep it simple (like your current way of doing it).
Edit in response to your additional requirement of O(1):
Well if you're randomly inserting it's a linked list but if you're just appending (as that's all you've mentioned) you've already got amortized O(1) with the QVector. Take a look at the documentation for Qt containers.
for (int i = 0; i < vectors.size(); ++i) {
for(int k=0;k<vectors[i].size();k++){
collected.push_back(vectors[i][k]);
}
}
outer loop: take out each vector from vectors
inner loop: take out each element in the i'th vector and push into collected
You could use Boost Multi-Array, this provides a multi-dimensional array.
It is also a 'header only' library, so you don't need to separately compile a library, just drop the headers into a folder in your project and include them.
See the link for the tutorial and example.