struct a{
double array[2][3];
};
struct b{
double array[3][4];
};
void main(){
a x = {{1,2,3,4,5,6}};
b y = {{1,2,3,4,5,6,7,8,9,10,11,12}};
}
I have two structs, each containing a two-dimensional array of a different size. I want to define a single function that can handle both x and y (one at a time), i.e., a function that accepts either x.array or y.array as its argument. How should I declare the parameter? I think I should use a pointer, but **x.array does not seem to work.
For example, I want to write a function PrintArray which can print the input array.
void PrintArray( ){}
What should go inside the parentheses? double ** does not seem to work for me. (The dimensions can be passed to PrintArray as arguments as well, e.g., to tell it that this is a 2*3 array.)
Write a function that takes three parameters: a pointer, the number of rows, and the number of columns. When you call the function, pass a pointer to the array's first element, so the 2D array is treated as one flat block of doubles.
#include <stdio.h>
void PrintArray(const double *a, int rows, int cols) {
int r, c;
for (r = 0; r < rows; ++r) {
for (c = 0; c < cols; ++c) {
printf("%3.1f ", a[r * cols + c]);
}
printf("\n");
}
}
int main(){
struct a x = {{{1,2,3},{4,5,6}}};
struct b y = {{{1,2,3,4},{5,6,7,8},{9,10,11,12}}};
PrintArray(&x.array[0][0], 2, 3);
PrintArray(&y.array[0][0], 3, 4);
return 0;
}
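If C++ is an option (the question's `a x = ...` without `struct` suggests it might be), a rough sketch of an alternative is to let the compiler deduce the dimensions from the array type. The name FormatArray and the choice to return a string instead of printing are my own assumptions for illustration; the flat overload uses the same r * cols + c indexing as PrintArray above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Flat version: same indexing as PrintArray above, but builds a string.
std::string FormatArray(const double* a, std::size_t rows, std::size_t cols) {
    assert(a != nullptr);
    std::ostringstream out;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out << a[r * cols + c] << ' ';
        }
        out << '\n';
    }
    return out.str();
}

// Hypothetical convenience wrapper: Rows and Cols are deduced from the
// array type, so the caller cannot pass mismatched dimensions.
template <std::size_t Rows, std::size_t Cols>
std::string FormatArray(const double (&a)[Rows][Cols]) {
    return FormatArray(&a[0][0], Rows, Cols);
}
```

With this, FormatArray(x.array) and FormatArray(y.array) both compile, and each call instantiates the wrapper for its own dimensions.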
The requirement:
Let's say we have 1) five groups of colors, each group having three colors (the colors are generated dynamically on the CPU), and 2) a list of 1000 cars, where each car is represented in the list by its color (a color picked from one of the groups).
And we want to pass three arguments to an OpenCL kernel: 1) a group of the generated color, 2) a car's color array (1D), and 3) an integer array (1D) to test the car color against the color group (doing a simple calculation).
The structures:
struct GeneratedColorGroup
{
float4 Color1; //16 =2^4
float4 Color2; //16 =2^4
float4 Color3; //16 =2^4
float4 Color4; //16 =2^4
}
struct ColorGroup
{
GeneratedColorGroup Colors[8]; //512 = 2^9
}
The kernel code:
__kernel void findCarColorRelation(
const __global ColorGroup *InColorGroups,
const __global float4* InCarColor,
const __global int* CarGroupIndicator
const int carsNumber)
{
int globalID = get_global_id( 0 );
if(globalID < carsNumber)
{
ColorGroup colorGroups;
float4 carColor;
colorGroups = InColorGroups[globalID];
carColor = InCarColor[globalID];
for(int groupIndex =0; groupIndex < 8; groupIndex++)
{
if(colorGroups[groupIndex].Color1 == carColor)
{
CarGroupIndicator[globalID] = groupIndex + 1 ;
break;
}
if(colorGroups[groupIndex].Color2 == carColor)
{
CarGroupIndicator[globalID] = groupIndex * 2 + 2;
break;
}
if(colorGroups[groupIndex].Color3 == carColor)
{
CarGroupIndicator[globalID] = groupIndex * 3 + 3;
break;
}
}
}
}
Now, we have 1000 items, which means the kernel is going to be executed 1000 times. That's OK.
The problem:
As you can see, we have a global ColorGroup as an input to the kernel; this global memory holds five items of type "GeneratedColorGroup".
I tried to access these items as shown in the code above, but I got an unexpected result, and the execution is very slow.
What is wrong with my code?
Any help is highly appreciated.
When passing structs from a host to a device, make sure you declare the struct type with __attribute__ ((packed)) in both host and device code. Otherwise the host and device compilers may produce different memory layouts for the struct, i.e. they can use different amounts of padding.
Using packed structs may cause a performance degradation, because packed structs have no padding at all, so data within a struct may not be properly aligned, and an unaligned access is usually slow. In that case, you have to either insert padding manually with char[], or use __attribute__ ((aligned (N))) on a struct field (or on the struct itself).
See the OpenCL C specification for details on packed and aligned attributes:
https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/attributes-types.html
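One way to catch a host/device layout mismatch early is to mirror the structs on the host with explicit alignment and check the sizes at compile time. This is only a sketch: the Host* names and ColorGroupBufferBytes are made up for illustration, and real host code would normally use the cl_float4 type from the OpenCL headers instead of a hand-rolled one. The expected sizes (16, 64, 512) follow the comments in the question's structs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for the device's float4:
// four floats, 16-byte aligned, as OpenCL C requires for float4.
struct alignas(16) HostFloat4 {
    float s[4];
};

struct HostGeneratedColorGroup {
    HostFloat4 Color1, Color2, Color3, Color4;
};

struct HostColorGroup {
    HostGeneratedColorGroup Colors[8];
};

// Fail the build if the host layout drifts from what the kernel expects.
static_assert(sizeof(HostFloat4) == 16, "float4 must be 16 bytes");
static_assert(offsetof(HostGeneratedColorGroup, Color2) == 16, "no padding between colors");
static_assert(sizeof(HostGeneratedColorGroup) == 64, "4 x float4");
static_assert(sizeof(HostColorGroup) == 512, "8 x GeneratedColorGroup");

// Size in bytes of the buffer to allocate for `count` color groups.
std::size_t ColorGroupBufferBytes(std::size_t count) {
    assert(count > 0);
    return count * sizeof(HostColorGroup);
}
```

Because every member here is already a multiple of 16 bytes, no compiler has a reason to insert padding, so packing is not needed for these particular structs; the checks matter more once smaller fields are mixed in.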
I'm wildly guessing the problem is
... CarGroupIndicator[globalID] = groupIndex + 1 ;
... CarGroupIndicator[globalID] = groupIndex * 2 + 2;
... CarGroupIndicator[globalID] = groupIndex * 3 + 3;
... which makes it impossible to tell from the resulting CarGroupIndicator[globalID] what exactly was matched. E.g., a match on group 5, color 1 yields the value 6, but so does group 2, color 2, and so does group 1, color 3. What you want is something like this:
... CarGroupIndicator[globalID] = groupIndex;
... CarGroupIndicator[globalID] = groupIndex + 8;
... CarGroupIndicator[globalID] = groupIndex + 16;
... then 0-7 are color1, 8-15 color2, 16-23 color3.
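To make the non-overlapping encoding concrete, here is a small host-side sketch of the encode and decode steps. EncodeMatch, DecodeMatch and kGroups are names I made up; the only thing taken from the question is that there are 8 groups per ColorGroup.

```cpp
#include <cassert>

constexpr int kGroups = 8;  // matches Colors[8] in the question

// colorSlot is 0 for Color1, 1 for Color2, 2 for Color3.
int EncodeMatch(int groupIndex, int colorSlot) {
    assert(0 <= groupIndex && groupIndex < kGroups);
    assert(0 <= colorSlot && colorSlot < 3);
    return colorSlot * kGroups + groupIndex;  // 0-7, 8-15, 16-23
}

// Inverse of EncodeMatch: recovers which group and which color matched.
void DecodeMatch(int code, int& groupIndex, int& colorSlot) {
    assert(0 <= code && code < 3 * kGroups);
    colorSlot = code / kGroups;
    groupIndex = code % kGroups;
}
```

The three matches that all collapsed to 6 in the original code now come out as 5, 10 and 17, so each result maps back to exactly one (group, color) pair.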
I made a dynamic 2D vector class.
The problem shows up when the main function runs in a loop:
my2dArr's row size keeps increasing as the loop runs.
Each time new data comes in during the loop, I want to copy the new data in.
void main()
{
int data[450];
DynamicArray<int> my2dArr(36, 100);
for(int i = 0;i < 36;++i)
{
for(int j = 1;j < 16;++j)
{
my2dArr[i][j-1] = data[i];
}
}
}
// vector class
class DynamicArray
{
public:
DynamicArray(){};
DynamicArray(int rows, int cols): dArray(rows, vector<T>(cols)){}
vector<T> & operator[](int i)
{
return dArray[i];
}
const vector<T> & operator[] (int i) const
{
return dArray[i];
}
void resize(int rows, int cols)//resize the two dimentional array .
{
dArray.resize(rows);
for(int i = 0;i < rows;++i) dArray[i].resize(cols);
}
void clearCOL()
{
for(int i = 0;i < dArray.size();i++)
{
for(int j = 0;j < dArray[i].size();++j)
{
dArray[j].erase();
}
}
}
private:
vector<vector<T> > dArray;
};
The nested for loop should be fine for initializing your array, but you'd need to put values into the data array before using it for initialization.
If you're only initializing the data once, you might consider a third constructor overload that takes a T[], like so:
DynamicArray( int rows, int cols, T array[] ): dArray( rows, vector< T >( cols ) )
{
for( int i = 0; i < rows; i++ )
{
for( int j = 0; j < cols; j++ )
{
dArray[i][j] = array[i * cols + j]; // row-major: each row holds cols elements
}
}
}
You'd need to make sure the array is the size you specified. In your example you use a 450-int array to fill a 3,600-int DynamicArray (36 x 100). As posted, the loop only ever reads data[i], so all 15 columns of row i get the same value. If you meant to read a distinct element for each cell, note that 36 rows times 15 columns (j runs from 1 to 15) is 540 elements, which is more than the 450 the array holds, so you would be reading out of bounds. The array is uninitialized anyway, so whatever you read from it is garbage.
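As a rough sketch of the size check and the row-major indexing (element (i, j) of a rows x cols grid lives at i * cols + j in the flat array), here is a free function that does the same job as the constructor overload. FromFlat is a hypothetical name, and it takes a std::vector rather than a raw array so the size can actually be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies a flat, row-major buffer into a
// rows x cols vector of vectors, refusing mismatched sizes.
template <typename T>
std::vector<std::vector<T>> FromFlat(const std::vector<T>& flat,
                                     std::size_t rows, std::size_t cols) {
    assert(flat.size() == rows * cols);  // e.g. 450 != 36 * 100 would fail here
    std::vector<std::vector<T>> grid(rows, std::vector<T>(cols));
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            grid[i][j] = flat[i * cols + j];
        }
    }
    return grid;
}
```

For example, a flat vector {1, 2, 3, 4, 5, 6} with rows = 2 and cols = 3 gives the rows {1, 2, 3} and {4, 5, 6}.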
Suppose I have a List in Rcpp, here called x, containing matrices. I can extract one of the elements using x[0] or something similar. However, how do I extract a specific element of that matrix? My first thought was x[0](0,0), but that does not seem to work. I tried using * signs, but that doesn't work either.
Here is some example code that prints the matrix (shows matrix can easily be extracted):
library("Rcpp")
cppFunction(
includes = '
NumericMatrix RandMat(int nrow, int ncol)
{
int N = nrow * ncol;
NumericMatrix Res(nrow,ncol);
NumericVector Rands = runif(N);
for (int i = 0; i < N; i++)
{
Res[i] = Rands[i];
}
return(Res);
}',
code = '
void foo()
{
List x;
x[0] = RandMat(3,3);
Rf_PrintValue(wrap( x[0] )); // Prints first matrix in list.
}
')
foo()
How could I change the line Rf_PrintValue(wrap( x[0] )); here to print the element in the first row and column? In the code I want to use this for, I need to extract this element to do computations.
Quick ones:
Compound expressions in C++ can bite at times; the template magic gets in the way. So just assign from the List object to whatever the element is, e.g. a NumericMatrix.
Then pick from the NumericMatrix as you see fit. We have row, col, element, ... access.
Printing can be easier using Rcpp::Rcout << anElement but note that we currently cannot print entire matrices or vectors -- but the int or double types are fine.
Edit:
Here is a sample implementation.
#include <Rcpp.h>
// [[Rcpp::export]]
double sacha(Rcpp::List L) {
double sum = 0;
for (int i=0; i<L.size(); i++) {
Rcpp::NumericMatrix M = L[i];
double topleft = M(0,0);
sum += topleft;
Rcpp::Rcout << "Element is " << topleft << std::endl;
}
return sum;
}
/*** R
set.seed(42)
L <- list(matrix(rnorm(9),3), matrix(1:9,3), matrix(sqrt(1:4),2))
sacha(L)
*/
And its result:
R> Rcpp::sourceCpp('/tmp/sacha.cpp')
R> set.seed(42)
R> L <- list(matrix(rnorm(9),3), matrix(1:9,3), matrix(sqrt(1:4),2))
R> sacha(L)
Element is 1.37096
Element is 1
Element is 1
[1] 3.37096
R>
You have to be explicit at some point. The List class has no idea about the types of elements it contains, it does not know it is a list of matrices.
Dirk has shown you what we usually do, fetch the element as a NumericMatrix and process the matrix.
Here is an alternative that assumes that all elements of your list have the same structure, using a new class template, ListOf, with enough glue to make the user code seamless. This just moves the explicitness to a different place.
#include <Rcpp.h>
using namespace Rcpp ;
template <typename WHAT>
class ListOf : public List {
public:
template <typename T>
ListOf( const T& x) : List(x){}
WHAT operator[](int i){ return as<WHAT>( ( (List*)this)->operator[]( i) ) ; }
} ;
// [[Rcpp::export]]
double sacha( ListOf<NumericMatrix> x){
double sum = 0.0 ;
for( int i=0; i<x.size(); i++){
sum += x[i](0,0) ;
}
return sum ;
}
/*** R
L <- list(matrix(rnorm(9),3), matrix(1:9,3), matrix(sqrt(1:4),2))
sacha( L )
*/
When I sourceCpp this file, I get:
> L <- list(matrix(rnorm(9), 3), matrix(1:9, 3), matrix(sqrt(1:4), 2))
> sacha(L)
[1] 1.087057
I have int A, B, C, where A is in the range 0-9999, B is 0-99, and C is 0-99.
Because the function must return only one double, I am thinking of putting them all into one number. Otherwise I need to call the function three times.
But I cannot work out efficient code to do this. It will be called millions of times, so it should be quite efficient, but no ASM.
I need a function double pack3int_to_double(int A, int B, int C) {}
Couldn't you just store A + 10000*B + 1000000*C?
For example, if you wanted to store A = 1234, B = 6, and C = 89, you'd just store
89061234
CCBBAAAA
You can then extract the numbers by casting the double to an int and using standard integer division and modulus tricks to recover the individual values.
Hope this helps!
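A minimal sketch of the decimal packing described above, assuming the ranges from the question (A in 0-9999, B and C in 0-99). PackDecimal and UnpackDecimal are names I picked for illustration. The largest packed value is 99,999,999, which fits in an int and is represented exactly by a double.

```cpp
#include <cassert>

// Packs A, B, C into the decimal digits CCBBAAAA of one double.
double PackDecimal(int A, int B, int C) {
    assert(0 <= A && A <= 9999);
    assert(0 <= B && B <= 99);
    assert(0 <= C && C <= 99);
    return A + 10000.0 * B + 1000000.0 * C;
}

// Recovers A, B, C with integer division and modulus.
void UnpackDecimal(double packed, int& A, int& B, int& C) {
    const int n = static_cast<int>(packed);
    A = n % 10000;
    B = (n / 10000) % 100;
    C = n / 1000000;
}
```

With A = 1234, B = 6, C = 89 this produces 89061234, the number from the example, and unpacking returns the same three values.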
If A<10,000 and B & C <100, A can be expressed with 14 bits, and B & C with 8 bits. Thus you need 30 bits in total.
You could therefore pack/unpack the integers by shifting them to the right place:
int packed = A | (B << 14) | (C << 22); // << binds looser than +, so the shifts must be parenthesized
A = packed & 0x3FFF; B = (packed >> 14) & 0xFF; C = (packed >> 22) & 0xFF;
Bit shifting is typically cheaper than multiply/divide, and you can cast the int to a double and back without loss, since a 30-bit value is exactly representable in a double.
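Here is a small sketch of the shift-based packing with the 14/8/8-bit layout described above. PackBits and UnpackBits are hypothetical names, and I use an unsigned 32-bit type so the shifts never touch a sign bit.

```cpp
#include <cassert>
#include <cstdint>

// Layout: bits 0-13 hold A, bits 14-21 hold B, bits 22-29 hold C.
std::uint32_t PackBits(int A, int B, int C) {
    assert(0 <= A && A <= 9999);
    assert(0 <= B && B <= 99);
    assert(0 <= C && C <= 99);
    return static_cast<std::uint32_t>(A) |
           (static_cast<std::uint32_t>(B) << 14) |
           (static_cast<std::uint32_t>(C) << 22);
}

// Masks each field back out of the packed value.
void UnpackBits(std::uint32_t packed, int& A, int& B, int& C) {
    A = static_cast<int>(packed & 0x3FFF);
    B = static_cast<int>((packed >> 14) & 0xFF);
    C = static_cast<int>((packed >> 22) & 0xFF);
}
```

To return the result as a double, convert with static_cast<double>(PackBits(A, B, C)) and convert back to std::uint32_t before unpacking; the value uses at most 30 bits, so the round trip is exact.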
Reading a union member other than the one last written is permitted in C (C99 and later reinterpret the stored bytes), but it is undefined behavior in C++, so use this at your own risk:
typedef union {
double x;
struct {
unsigned a : 14;
unsigned b : 7;
unsigned c : 7;
} y;
} result_t;
In C, the read simply reinterprets the bytes as a double. In C++ it is undefined, although I am not aware of a compiler that does the static analysis to diagnose it (that doesn't mean one won't do so in the future). Also note that the bit-fields cover only part of the double's storage, so the remaining bytes are unspecified, and certain bit patterns may be a trap representation for a double. But if you know your system will not generate any trap representations, you can consider using this.
double pack3int_to_double(int A, int B, int C) {
result_t r;
r.y.a = A;
r.y.b = B;
r.y.c = C;
return r.x;
}
void unpack3int_from_double (double X, int *A, int *B, int *C) {
result_t r = { X };
*A = r.y.a;
*B = r.y.b;
*C = r.y.c;
}
You can use out parameters in the function call to retrieve all three int values.
You could return a NaN double with the data stored in the mantissa. The fraction field of a double is 52 bits wide, and a quiet NaN leaves 51 of them free for a payload. Should be plenty.
http://en.m.wikipedia.org/wiki/NaN
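A rough sketch of the NaN idea, assuming IEEE 754 doubles and the ranges from the question. PackNaN and UnpackNaN are names I made up, and the 14/8/8-bit payload layout is just one possible choice. The bytes are copied with memcpy to avoid pointer-cast aliasing problems. Treat this as an experiment: the language standards do not promise that NaN payloads survive every operation, although plain copies and returns usually preserve them on common hardware.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit double");

// Builds a quiet NaN whose low 30 fraction bits carry A, B and C.
double PackNaN(int A, int B, int C) {
    assert(0 <= A && A <= 9999);
    assert(0 <= B && B <= 99);
    assert(0 <= C && C <= 99);
    const std::uint64_t payload = static_cast<std::uint64_t>(A) |
                                  (static_cast<std::uint64_t>(B) << 14) |
                                  (static_cast<std::uint64_t>(C) << 22);
    // 0x7FF8...: exponent all ones plus the quiet bit, i.e. a quiet NaN.
    const std::uint64_t bits = 0x7FF8000000000000ULL | payload;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Reads the payload back out of the NaN's bit pattern.
void UnpackNaN(double d, int& A, int& B, int& C) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    A = static_cast<int>(bits & 0x3FFF);
    B = static_cast<int>((bits >> 14) & 0xFF);
    C = static_cast<int>((bits >> 22) & 0xFF);
}
```

A side effect worth knowing: the packed value compares unequal to everything, including itself, so callers must never do arithmetic or comparisons on it before unpacking.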
Inspired by your answers, this is what I have come up with so far. This should be quite efficient, and only 32 bits are used, so the exponent of the double is not touched.
#include <stdio.h>
#include <string.h>
struct pack_abc {
unsigned short a;
unsigned char b, c;
int safety;
};
// Assumes sizeof(struct pack_abc) == sizeof(double), which holds on common platforms.
double pack3int_to_double(int A, int B, int C) {
struct pack_abc R = {A, B, C, 0}; // or 0 could be replaced with something smarter, like NaN?
double d;
memcpy(&d, &R, sizeof d); // copy the bytes instead of casting pointers, which breaks strict aliasing
return d;
}
int main() {
int w = 1234, a = 56, d = 78;
double p = pack3int_to_double(w, a, d);
// we got the data packed into 'p', now let's unpack it
struct pack_abc R;
memcpy(&R, &p, sizeof R);
printf("%i %i %i\n", (int)R.a, (int)R.b, (int)R.c);
return 0;
}
I'm trying to implement a merge sort in C++ on a vector called x, which contains x coordinates. As the merge sort sorts the x coordinates, it is supposed to move the corresponding elements in a vector called y, which contains the y coordinates. The only problem is that I don't know how to (or whether I can) return both resulting vectors from the merge function.
Alternatively, if it's easier to implement, I could use a slower sort method.
No, you cannot return two results from a function the way this example tries to:
vector<int>, vector<int> merge_sort();
What you can do is pass the two vectors by reference to a function, so that the merge sort modifies both of them in place, e.g.
void merge_sort(vector<int>& x, vector<int>& y);
Ultimately, you can do what @JoshD mentioned and create a struct called Point and merge sort a vector of Point structs instead.
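A rough sketch of the pass-by-reference approach: both vectors are modified in place, so nothing needs to be returned. SortByX is a hypothetical name, and I use std::stable_sort (commonly implemented as a merge sort) on temporary pairs as a stand-in for a hand-written merge sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Compares two (x, y) pairs by x only.
bool LessByX(const std::pair<int, int>& a, const std::pair<int, int>& b) {
    return a.first < b.first;
}

// Sorts x ascending and reorders y so that y[i] stays with x[i].
void SortByX(std::vector<int>& x, std::vector<int>& y) {
    assert(x.size() == y.size());
    std::vector<std::pair<int, int> > pts(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        pts[i] = std::make_pair(x[i], y[i]);
    }
    // Stable: points with equal x keep their original relative order.
    std::stable_sort(pts.begin(), pts.end(), LessByX);
    for (std::size_t i = 0; i < pts.size(); ++i) {
        x[i] = pts[i].first;
        y[i] = pts[i].second;
    }
}
```

Calling SortByX(x, y) with x = {3, 1, 2} and y = {30, 10, 20} leaves x = {1, 2, 3} and y = {10, 20, 30}.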
Try something like this:
struct Point {
int x;
int y;
bool operator<(const Point &rhs) const {return x < rhs.x;}
};
vector<Point> my_points;
mergesort(my_points);
Or if you want to sort Points with equal x value by the y coordinate:
bool operator<(const Point &rhs) const {return (x < rhs.x || (x == rhs.x && y < rhs.y));}
Also, I thought I'd add, if you really ever need to, you can always return a std::pair. A better choice is usually to return through the function parameters.
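For completeness, returning a std::pair of vectors could look roughly like this. SplitCoordinates is a made-up helper that takes points and hands back the x and y vectors together; it assumes C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits points into separate x and y vectors
// and returns both at once as a pair.
std::pair<std::vector<int>, std::vector<int>>
SplitCoordinates(const std::vector<std::pair<int, int>>& points) {
    std::pair<std::vector<int>, std::vector<int>> result;
    result.first.reserve(points.size());
    result.second.reserve(points.size());
    for (std::size_t i = 0; i < points.size(); ++i) {
        result.first.push_back(points[i].first);
        result.second.push_back(points[i].second);
    }
    assert(result.first.size() == result.second.size());
    return result;  // moved or elided in C++11 and later, not deep-copied
}
```

The caller reads the two vectors from .first and .second of the returned pair.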
Yes, you can return a tuple, then use structured binding (since C++17).
Here's a full example:
#include <cstdlib>
#include <iostream>
#include <numeric>
#include <string>
#include <tuple>
#include <vector>
using namespace std::string_literals;
auto twoVectors() -> std::tuple<std::vector<int>, std::vector<int>>
{
const std::vector<int> a = { 1, 2, 3 };
const std::vector<int> b = { 4, 5, 6 };
return { a, b };
}
auto main() -> int
{
auto [a, b] = twoVectors();
auto const sum = std::accumulate(a.begin(), a.end(), std::accumulate(b.begin(), b.end(), 0));
std::cout << "sum: "s << sum << std::endl;
return EXIT_SUCCESS;
}
You can have a vector of vectors
=> vector<vector > points = {{a, b}, {c, d}};
now you can return points.
Returning vectors is most probably not what you want, as they are copied for this purpose (which is slow). Have a look at this implementation, for example.