I just started building a code for parallel computation with OpenCL.
As far as I understand, the data generated from CPU side (host) is transffered through the buffers (clCreateBuffer -> clEnqueueWriteBuffer -> clSetKernelArg, then processed by the device).
I mainly have to deal with arrays (or matrices) of large size with double precision.
However, I realized the code never runs for arrays larger than 8000 entries with errors.
(This makes sense because 64kb is equivalent to 8000 double precision numbers.)
The error codes were either -6 (CL_OUT_OF_HOST_MEMORY) or -31 (CL_INVALID_VALUE).
One more thing when I set the argument to 2-dimensional array, I could set the size up to 8000 x 8000.
So far, I guess the maximum data size for double precision is 8000 (64kb) for 1D arrays, but I have no idea what happens for 2D or 3D arrays.
Is there any other way to transfer the data larger than 64kb?
If I did something wrong for OpenCL setup in data transfer, what would be recommended?
I appreciate your kind answer.
The hardware that I'm using is Tesla V100 which is installed on the HPC cluster in my school.
The following is a part of my code snippet that I'm testing the data transfer.
bfr(0) = clCreateBuffer(context,
& CL_MEM_READ_WRITE + CL_MEM_COPY_HOST_PTR,
& sizeof(a), c_loc(a), err);
if(err.ne.0) stop "Couldn't create a buffer";
err=clEnqueueWriteBuffer(queue,bfr(0),CL_TRUE,0_8,
& sizeof(a),c_loc(a),0,C_NULL_PTR,C_NULL_PTR)
err = clSetKernelArg(kernel, 0,
& sizeof(bfr(0)), C_LOC(bfr(0)))
print*, err
if(err.ne.0)then
print *, "clSetKernelArg kernel"
print*, err
stop
endif
The code was build by Fortran with using clfortran module.
Thank you again for your answer.
You can do much larger arrays in OpenCL, as large as memory is abailable. For example I'm commonly working with linearized 4D arrays of 2 Billion floats in a CFD application.
Arrays need to be 1D only; if you have 2D or 3D arrays, linearize them, for example with n=x+y*size_x for 2D->1D coordinates. Some older devices only allow arrays 1/4 the size of the device memory. However modern devices typically have an extension to the OpenCL specification to enable larger buffers.
Here is a quickover view on what the OpenCL C bindings do:
clCreateBuffer allocates memory on the device side (video memory for GPUs, RAM for CPUs). Buffers can be as large as host/device memory allows or on some older devices 1/4 of device memory.
clEnqueueWriteBuffer copies memory over PCIe from RAM to video memory. Both on CPU and GPU side buffers must be allocated beforehand. There is no limit on transfer size; it can be as large as the entire buffer or only a subrange of a buffer.
clSetKernelArg links the GPU buffers to the Input parameters of the kernel, so it knows which kernel parameter corresponds to which buffer. Make sure data types of the buffers and kernel arguments match as you won't get an error if they don't. Also make sure the order of kernel arguments matches.
In your case there are several possible causes for the error:
Maybe you have integer overflow during computation of the array size. In this case use 64-bit integer numbers instead to compute the array size/indices.
You are out of memory because other buffers already take up too much memory. Do some bookkeeping to keep track on total (video) memory utilization.
You have selected the wrong device, for example integrated graphics instead of the dedicated GPU, in which case much less memory is available and you end up with cause 2.
To give you a more definitive answer, please provide some additional details:
What hardware do you use?
Show a code snippet of how you allocate device memory.
UPDATE
I see some errors in your code:
The length argument in clCreateBuffer and clEnqueueWriteBuffer requires the number of bytes that your array a has. If a is of type double, then this is a_length*sizeof(double), where a_length is the number of elements in the array a. sizeof(double) returns the number of bytes for one double number which is 8. So the length argument is 8 bytes times the number of elements in the array.
For multiple flags, you typically use bitwise or | instead of +. Shouldn't be an issue here, but is unconvenional.
You had "0_8" as buffer offset. This needs to be zero (0).
const int a_length = 8000;
double* a = new double[a_length];
bfr(0) = clCreateBuffer(context, CL_MEM_READ_WRITE|CL_MEM_COPY_HOST_PTR, a_length*sizeof(double), c_loc(a), err);
if(err.ne.0) stop "Couldn't create a buffer";
err = clEnqueueWriteBuffer(queue, bfr(0), CL_TRUE, 0, a_length*sizeof(double), c_loc(a), 0, C_NULL_PTR, C_NULL_PTR);
err = clSetKernelArg(kernel, 0, sizeof(bfr(0)), C_LOC(bfr(0)));
print*, err;
if(err.ne.0) {
print *, "clSetKernelArg kernel"
print*, err
stop
}
The datasheet provided shows parameters that have addresses and bit size. I want to understand how I can use these with my arduino to program this sensor. Specifically what does the notation "[4:0]" mean next to a parameter.
All calibration parameters on the MLX90288 are stored in a 32 x 16bit non-volatile EEPROM.
"The EEPROM parameters from the first 29 addresses are stored with triple redundancy, to correct if any EEPROM bit would loose its content, by using majority voting. Consequently, an EEPROM word in this part of EEPROM only holds the information of 5 calibration bits + 1 locking bit at index 15. The EEPROM word stored at address 0 thus looks like this:
{LOCK0,PARAM[4:0],PARAM[4:0],PARAM[4:0]}"
So here it says that the clamp voltages are programmable but i don't have knowledge of what the bits mean in the brackets and how I can convert them to hexadecimal:
e.g. CLPHigh[9:0] means 10 bits (from 0 to 9).
With 10 bits the maximum value = dec 1023 (bin 11 1111 1111).
Vdd = 5 Volts, the range is from 0% to 100% (0V to 5V).
The resolution is 0.098% (100 / 1023 = approx. 0.098)
Say you want to set CLPHigh at 25% : 25/0.098 = 255 (= 0xFF)
The Output DAC Resolution = 0.0244, so 1/4 of outDac = 0.098
SOLVED apologies for wasting peoples' time, the comment on compiling with verbose made me realize I'm only using 91% available, I had an extra zero in one of my calculations!
I've been working on an Arduino sketch for a number of years, slowly adding more variable declarations. Recently I upped the size of an array of longs from 24*70 to 24*100 and that seemed to cause unexplainable errors. (I have many other arrays but this is the biggest.)
I decided to count how many bytes approximately I am declaring in my sketch and just by counting the bigger array structures I came up with an approximate number of at least 14,000 bytes (taking into account longs are 4 bytes etc).
How is this possible? Where are the other 6000 bytes being stored? I should add that I've been using it like this for a number of years, it was only the upping of the bytes used from 14 kb to approximately 17 kb that caused the bugs.
I know that the 1mb memory of 8086 is split into 16 logical sections but i only know about 4 such locations ,would anyone tell about the rest ?
I know that the 1mb memory of 8086 is split into 16 logical sections
I understand what you're saying but I'm afraid it's worse than that!
The 1MB memory has actually 65536 logical sections each overlapping the next by 65520 bytes. Your 16 logical sections are just special cases that happen to start at linear addresses divisable by 65536.
but i only know about 4 such locations
It's not clear what you mean by this but I think that you are referring to the segment registers CS, DS, ES, and SS. These are not locations as such but rather they each provide a pointer to any one of the forementioned sections. A linear address gets calculated by multiplying the appropiate segment register by 16 and then adding in the offset address. The result of this calculation is then truncated to have 20 bits only.
,would anyone tell about the rest ?
Simple enough. There's nothing else.
I am writing a computer program which utilizes input from some equipment which I seldom have availible in my office. In order to develop and test this program I am trying to use an Arduino board to simulate the communication from the actual equipment. To this effect I create datapackets on the Arduino and send them to my computeer over the serial port. The packets are formated as a header and a hexidecimal integer, representing some sensor data.
The header is supposed to contain a checksum (2's complement 256-modulo). I am however not sure how to calculate it. In the datasheet of the equipment (which communication I try to simulate), it is stated that I should first compute the sum all bytes in the packet, and then take the 256-modulo of the sum and perform a 8-bit two's complement on the result.
However, as I am a newbie to bits, bytes and serial communication, I do not understand the following:
1) Lets say that I want to send the value 5500 as two bytes (high byte and low byte). Then the high-byte is '15' and the low-byte is '7c' in hexidecimal encoding, which corresponds to 21 and 124 in decimal values. Do I then add the contributions 21 and 124 to the checksum before taking the 256-modulo, or do I have to do some bit-magic beforehand?
2) How do I perform a two's compliment?
Here is a code which should illustrate how I think. The idea is to send a packet with a header containing a byte which states the length of the packet, a byte which states the type of the packet, and a byte for the checksum. Then a two-byte integer value representing some sensor value is devided into a high-byte and a low-byte, and transmitted low-byte first.
int intVal;
byte Len = 5;
byte checksum;
byte Type = 2;
byte intValHi;
byte intValLo;
void setup(){
Serial.begin(9600);
}
void loop(){
intVal = 5500; //assume that this is a sensor value
intValHi = highByte(intVal);
intValLo = lowByte(intVal);
//how to calculate the checksum? I unsuccessfully tried the following
checksum = 0;
checksum = (Len+checksum+Type+intValHi+intValLo) % 256;
//send header
Serial.write(Len);
Serial.write(checksum);
Serial.write(Type);
//send sensor data
Serial.write(intValLo);
Serial.write(intValHi)
}
Thanks!
The first thing you should understand is that mod 256 is the same thing as looking at the bottom log(256) => 8 bits.
To understand this you have to first realize what the 'mod' operation does and how digits are represented in hardware.
Mod is the remainder after an old-school division (ie only with whole numbers).
eg 5%2 = 1
Digits in hardware are stored in 'bits' which can be interpreted as base 2 mathematics.
Thus if you want to take the mod operation of a power of 2 you don't actually have to do any math.
This is just like if you want to have the remainder of the power of 10, you just take the lower digits.
ie. 157 % 100 = 57.
This can be sped up by using the fact that bytes should overflow by themselves. This means that all you have to do to take %256 of a bunch of numbers is to add them to a single byte and the arduino will take care of the rest.
For twos compliment see this question:
What is “2's Complement”?