I am encoding 6 values (4x 3-bit + 1-bit) into a 16-bit integer and transferring them via serial to an ATTINY84, splitting them into 2 bytes. That all works fine up to the point where I re-assemble the bytes into a 16-bit int.
Example:
I am sending the following binary state 0001110000001100, which translates to 7180 and gets split into a byte array of [12, 28].
I am putting that byte array into the EEPROM and reading it back on the next power cycle.
After power cycle my serial debug output looks like this:
12
28
7180
Awesome. Looks all good and my code for that part is:
byte d0 = EEPROM.read(0);
byte d1 = EEPROM.read(1);
unsigned int w = d0 + (256 * d1); // low byte + 256 * high byte
But now the weirdest thing happens. When I do a bit-by-bit read I am getting back:
0011000000111000
should be:
0001110000001100
via:
for (byte t = 0; t < 16; t++) {
  Serial.print(bitRead(w, t) ? "1" : "0");
}
The bit representation is completely reversed. How is that possible? Or maybe I am missing something.
I also confirmed that when I extract the actual 3-bit locations to recover my original values 0..7, they are all off.
Any help would be appreciated.
So it looks like I fell into the little/big endian trap.
Basically, as Alain said in the comments, everything is correct and it's just a matter of representation.
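For reference: bitRead(w, t) returns bit t counting from the least significant bit, so looping t upwards prints the LSB first. Printing from the top bit down gives the order I expected (a small sketch for the Arduino environment):
for (int8_t t = 15; t >= 0; t--) {
  Serial.print(bitRead(w, t) ? "1" : "0"); // prints 0001110000001100 for w == 7180
}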
I came up with the following method that can extract bits from a little endian stored number that needs to be in a big endian format:
/**
 * #bex - extract l bits starting at bit offset o (counted from the MSB)
 *        of a value n that is d bits wide.
 */
uint8_t bexd(uint16_t n, uint8_t o, uint8_t l, uint8_t d) {
  uint8_t v = 0;
  uint8_t ob = d - o; // first bit position counted from the LSB side
  for (uint8_t b = ob; b > (ob - l); b--) v = (v << 1) | (0x0001 & (n >> (b - 1)));
  return v;
}
uint8_t bexw(uint16_t n, uint8_t o, uint8_t l) { return bexd(n, o, l, 16); } // 16-bit word
uint8_t bexb(uint8_t n, uint8_t o, uint8_t l)  { return bexd(n, o, l, 8); }  // 8-bit byte
For example:
In big endian, the "second" value is stored in bits 3, 4, and 5, compared to little endian where it will be stored in bits 10, 11, and 12. The method above allows you to work with a "little endian" value as if it were a "big endian" value.
To extract the second value from the value 0001110000001100 (7180) just do:
byte v = bexw(7180, 3, 3); // 0b111
Serial.println(v); // prints 7
Hope that helps someone.
I'm looking for a language for querying CBOR, like JsonPath or jq but for the CBOR binary format. I don't want to convert from CBOR to JSON because some CBOR types do not exist in JSON, and because of the performance cost.
The C++ library jsoncons allows you to query CBOR with JSONPath, for example,
#include <jsoncons/json.hpp>
#include <jsoncons_ext/cbor/cbor.hpp>
#include <jsoncons_ext/jsonpath/json_query.hpp>
#include <iomanip>
using namespace jsoncons; // For convenience
int main()
{
    std::vector<uint8_t> v = {0x85,0xfa,0x40,0x0,0x0,0x0,0xfb,0x3f,0x12,0x9c,0xba,0xb6,0x49,0xd3,0x89,0xc3,0x49,0x1,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0xc4,0x82,0x38,0x1c,0xc2,0x4d,0x1,0x8e,0xe9,0xf,0xf6,0xc3,0x73,0xe0,0xee,0x4e,0x3f,0xa,0xd2,0xc5,0x82,0x20,0x3};
    /*
    85 -- Array of length 5
      fa -- float
        40000000 -- 2.0
      fb -- double
        3f129cbab649d389 -- 0.000071
      c3 -- Tag 3 (negative bignum)
        49 -- Byte string value of length 9
          010000000000000000
      c4 -- Tag 4 (decimal fraction)
        82 -- Array of length 2
          38 -- Negative integer of length 1
            1c -- -29
          c2 -- Tag 2 (positive bignum)
            4d -- Byte string value of length 13
              018ee90ff6c373e0ee4e3f0ad2
      c5 -- Tag 5 (bigfloat)
        82 -- Array of length 2
          20 -- -1
          03 -- 3
    */

    // Decode to a json value (despite its name, it is not JSON specific.)
    json j = cbor::decode_cbor<json>(v);

    // Serialize to JSON
    std::cout << "(1)\n";
    std::cout << pretty_print(j);
    std::cout << "\n\n";

    // as<std::string>() and as<double>()
    std::cout << "(2)\n";
    std::cout << std::dec << std::setprecision(15);
    for (const auto& item : j.array_range())
    {
        std::cout << item.as<std::string>() << ", " << item.as<double>() << "\n";
    }
    std::cout << "\n";

    // Query with JSONPath
    std::cout << "(3)\n";
    json result = jsonpath::json_query(j, "$[?(@ < 1.5)]");
    std::cout << pretty_print(result) << "\n\n";

    // Encode result as CBOR
    std::vector<uint8_t> val;
    cbor::encode_cbor(result, val);

    std::cout << "(4)\n";
    for (auto c : val)
    {
        std::cout << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(c);
    }
    std::cout << "\n\n";
    /*
    83 -- Array of length 3
      fb -- double
        3f129cbab649d389 -- 0.000071
      c3 -- Tag 3 (negative bignum)
        49 -- Byte string value of length 9
          010000000000000000
      c4 -- Tag 4 (decimal fraction)
        82 -- Array of length 2
          38 -- Negative integer of length 1
            1c -- -29
          c2 -- Tag 2 (positive bignum)
            4d -- Byte string value of length 13
              018ee90ff6c373e0ee4e3f0ad2
    */
}
Output:
(1)
[
2.0,
7.1e-05,
"-18446744073709551617",
"1.23456789012345678901234567890",
[-1, 3]
]
(2)
2.0, 2
7.1e-05, 7.1e-05
-18446744073709551617, -1.84467440737096e+19
1.23456789012345678901234567890, 1.23456789012346
1.5, 1.5
(3)
[
7.1e-05,
"-18446744073709551617",
"1.23456789012345678901234567890"
]
(4)
83fb3f129cbab649d389c349010000000000000000c482381cc24d018ee90ff6c373e0ee4e3f0ad2
Sure, you can use any general-purpose programming language for querying CBOR; JavaScript, for example, might be a good choice. But if you are looking for a "query language" like JsonPath, I'm not aware of any developed specifically for CBOR.
I have 27 combinations of 3 values from -1 to 1 of type:
Vector3(0,0,0);
Vector3(-1,0,0);
Vector3(0,-1,0);
Vector3(0,0,-1);
Vector3(-1,-1,0);
... up to
Vector3(0,1,1);
Vector3(1,1,1);
I need to convert them to and from an 8-bit sbyte / byte array.
One solution is to treat the value as three decimal digits, where the first digit is X, the second is Y and the third is Z (each component shifted up by 1), so
Vector3(-1,1,1) becomes 022,
Vector3(1,-1,-1) becomes 200,
Vector3(1,0,1) becomes 212...
I'd prefer to encode it in a more compact way, perhaps using bytes (which I am clueless about), because the above solution uses a lot of multiplications and rounding functions to decode. Do you have some suggestions, please? The other option is to write 27 if conditions to map the Vector3 combinations to an array, which seems inefficient.
Thanks to Evil Tak for the guidance. I changed the code a bit to add 0-1 values to the first bits and to adapt it for Unity3D:
function Pack4(x:int, y:int, z:int, w:int):sbyte {
    var b: sbyte = 0;
    b |= (x + 1) << 6; // X into bits 6-7
    b |= (y + 1) << 4; // Y into bits 4-5
    b |= (z + 1) << 2; // Z into bits 2-3
    b |= (w + 1);      // W into bits 0-1
    return b;
}

function unPack4(b:sbyte):Vector4 {
    var v : Vector4;
    v.x = ((b & 0xC0) >> 6) - 1; // 0xC0 == 1100 0000
    v.y = ((b & 0x30) >> 4) - 1; // 0x30 == 0011 0000
    v.z = ((b & 0xC) >> 2) - 1;  // 0xC  == 0000 1100
    v.w = (b & 0x3) - 1;         // 0x3  == 0000 0011
    return v;
}
I assume your values are float, not integer, so bit operations will not improve speed much compared to converting to an integer type. So my bet is that using the full range will be better. I would do this for the 3D case:
8 bit -> 256 values
3D -> pow(256,1/3) = ~ 6.349 values per dimension
6^3 = 216 < 256
So packing of (x,y,z) looks like this:
BYTE p;
p  = floor((x+1.0)*3.0);       // X digit 0..5 (clamp to 5 for x == +1.0)
p += floor((y+1.0)*3.0)*6.0;   // Y digit in the 6s place
p += floor((z+1.0)*3.0)*36.0;  // Z digit in the 36s place
The idea is to convert <-1,+1> to the range <0,2> (hence the +1.0) and then scale by *3.0 instead of *6.0 to get a digit 0..5, which is then multiplied into the correct place in the final BYTE.
and unpacking of p looks like this:
x=p%6; x=(x/3.0)-1.0; p/=6;
y=p%6; y=(y/3.0)-1.0; p/=6;
z=p%6; z=(z/3.0)-1.0;
This way you use 216 of the 256 values, which is much better than just 2 bits per dimension (4 values). Your 4D case would look similar: instead of the constants 3.0, 6.0 use 2.0, 4.0 (since floor(pow(256,1/4)) = 4), but beware the case when p = 256, or use 2 bits per dimension and the bit approach like the accepted answer does.
If you need real speed, you can optimize this by forcing the float representation holding the packed BYTE result to a specific exponent and extracting the mantissa bits as your packed BYTE directly. As the result will be in <0,216>, you can add any bigger number to it. See IEEE 754-1985 for details, but you want the mantissa to align with your BYTE: if you add a number like 2^23 to p, then the lowest 8 bits of the float should be your packed value directly (the leading 1 is not present in the mantissa), so no expensive conversion is needed.
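For illustration, here is a minimal C++ sketch of that mantissa trick (the function name and test values are mine, not from the original answer); it assumes 32-bit IEEE 754 floats and a packed value below 256:
#include <cstdint>
#include <cstdio>
#include <cstring>

// Adding 2^23 forces the float's exponent to 23, so for 0 <= p < 256 the
// packed value lands in the low 8 bits of the mantissa and can be read out
// without a float->int conversion. Note: the addition rounds p to the
// nearest integer rather than flooring it.
static uint8_t mantissa_byte(float p)
{
    float biased = p + 8388608.0f;              // 2^23
    uint32_t bits;
    std::memcpy(&bits, &biased, sizeof bits);   // reinterpret the IEEE 754 bits
    return static_cast<uint8_t>(bits & 0xFFu);  // low 8 mantissa bits
}

int main()
{
    std::printf("%u %u %u\n", mantissa_byte(0.0f), mantissa_byte(7.0f), mantissa_byte(215.0f)); // 0 7 215
    return 0;
}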
In case you have just {-1,0,+1} instead of <-1,+1>,
then of course you should use an integer approach like bit packing with 2 bits per dimension, or use a LUT table of all 3^3 = 27 possibilities and pack the entire vector in 5 bits.
The encoding would look like this:
int enc[3][3][3] = { 0,1,2, ... 24,25,26 };
p=enc[x+1][y+1][z+1];
And decoding:
int dec[27][3] = { {-1,-1,-1},.....,{+1,+1,+1} };
x=dec[p][0];
y=dec[p][1];
z=dec[p][2];
This should be fast enough, and if you have many vectors you can pack each p into 5 bits ... to save even more memory space.
One way is to store each component of the vector in 2 bits of a byte.
Converting a vector component value to and from the 2 bit stored form is as simple as adding and subtracting one, respectively.
-1 (1111 1111 as a signed byte) <-> 00 (in binary)
0 (0000 0000 in binary) <-> 01 (in binary)
1 (0000 0001 in binary) <-> 10 (in binary)
The packed 2 bit values can be stored in a byte in any order of your preference. I will use the following format: 00XXYYZZ where XX is the converted (packed) value of the X component, and so on. The 0s at the start aren't going to be used.
A vector will then be packed in a byte as follows:
byte Pack(Vector3<int> vector) {
    byte b = 0;
    b |= (vector.x + 1) << 4; // XX into bits 4-5
    b |= (vector.y + 1) << 2; // YY into bits 2-3
    b |= (vector.z + 1);      // ZZ into bits 0-1
    return b;
}
Unpacking a vector from its byte form will be as follows:
Vector3<int> Unpack(byte b) {
    Vector3<int> v = new Vector3<int>();
    v.x = ((b & 0x30) >> 4) - 1; // 0x30 == 0011 0000
    v.y = ((b & 0xC) >> 2) - 1;  // 0xC  == 0000 1100
    v.z = (b & 0x3) - 1;         // 0x3  == 0000 0011
    return v;
}
Both the above methods assume that the input is valid, i.e. all components of vector in Pack are either -1, 0 or 1, and all two-bit sections of b in Unpack have a (binary) value of either 00, 01 or 10.
Since this method uses bitwise operators, it is fast and efficient. If you wish to compress the data further, you could try using the 2 unused bits too, and convert every 3 two-bit elements processed to a vector.
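If you go that route, a rough C++ sketch of such a packed stream could look like the following (Vec3, PackStream and UnpackStream are illustrative names, not part of the answer above; the bit order within a byte also differs from the 00XXYYZZ layout, but the idea is the same). Each component is stored as component + 1 in 2 bits, four components per byte, so 4 vectors fit in 3 bytes:
#include <cstdint>
#include <vector>

struct Vec3 { int x, y, z; }; // stand-in for your vector type

// Pack every component (value -1, 0 or 1) into 2 bits, 4 components per byte.
std::vector<uint8_t> PackStream(const std::vector<Vec3>& vs) {
    std::vector<uint8_t> out((vs.size() * 3 + 3) / 4, 0);
    std::size_t bit = 0;
    for (const Vec3& v : vs) {
        for (int c : {v.x, v.y, v.z}) {
            out[bit / 8] |= static_cast<uint8_t>((c + 1) << (bit % 8));
            bit += 2;
        }
    }
    return out;
}

// Reverse of PackStream; count is the number of vectors originally packed.
std::vector<Vec3> UnpackStream(const std::vector<uint8_t>& bytes, std::size_t count) {
    std::vector<Vec3> vs(count);
    std::size_t bit = 0;
    for (Vec3& v : vs) {
        for (int* c : {&v.x, &v.y, &v.z}) {
            *c = ((bytes[bit / 8] >> (bit % 8)) & 0x3) - 1;
            bit += 2;
        }
    }
    return vs;
}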
The most compact way is to write a 27-digit number in base 3 (using the shift -1 -> 0, 0 -> 1, 1 -> 2).
The value of this number will range from 0 to 3^27-1 = 7625597484986, which takes 43 bits to encode, i.e. 6 bytes (and 5 spare bits).
This is a little saving compared to a packed representation with 4 two-bit numbers packed in a byte (hence 7 bytes/56 bits in total).
An interesting variant is to group the base 3 digits five by five in bytes (hence numbers 0 to 242). You will still require 6 bytes (and no spare bits), but the decoding of the bytes can easily be hard-coded as a table of 243 entries.
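A rough C++ sketch of that five-digits-per-byte grouping, assuming the 27 base-3 digits (already shifted to 0..2) arrive in an array (the names here are illustrative):
#include <array>
#include <cstdint>

// Pack 27 base-3 digits five per byte; each byte holds a value 0..242
// (the 6th byte holds only the last two digits).
std::array<uint8_t, 6> Pack27(const int digits[27]) {
    static const int pow3[5] = {1, 3, 9, 27, 81};
    std::array<uint8_t, 6> out{};
    for (int i = 0; i < 27; ++i)
        out[i / 5] += static_cast<uint8_t>(digits[i] * pow3[i % 5]);
    return out;
}

// Decoding a byte back into its digits can be done with a 243-entry table,
// or on the fly with repeated % 3 and / 3.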
Usually the second parameter in clCreateImage2D is a flag like CL_MEM_READ_ONLY, etc. But I found it to be 0 in one of the sample codes (p. 80, Heterogeneous Computing with OpenCL):
// Create space for the source image on the device
cl_mem bufferSourceImage = clCreateImage2D(
    context, 0, &format, width, height, 0, NULL, NULL);
Why is that so?
cl_mem_flags are bitfields:
cl.h
/* cl_mem_flags - bitfield */
#define CL_MEM_READ_WRITE (1 << 0)
#define CL_MEM_WRITE_ONLY (1 << 1)
#define CL_MEM_READ_ONLY (1 << 2)
#define CL_MEM_USE_HOST_PTR (1 << 3)
#define CL_MEM_ALLOC_HOST_PTR (1 << 4)
#define CL_MEM_COPY_HOST_PTR (1 << 5)
// reserved (1 << 6)
#define CL_MEM_HOST_WRITE_ONLY (1 << 7)
#define CL_MEM_HOST_READ_ONLY (1 << 8)
#define CL_MEM_HOST_NO_ACCESS (1 << 9)
Here, 0 means the default value, which is CL_MEM_READ_WRITE:
A bit-field that is used to specify allocation and usage information
such as the memory arena that should be used to allocate the buffer
object and how it will be used. The following table describes the
possible values for flags. If value specified for flags is 0, the
default is used which is CL_MEM_READ_WRITE.
From: clCreateBuffer
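So the call in the book is simply relying on that default; it is equivalent to spelling the flag out, e.g.:
cl_mem bufferSourceImage = clCreateImage2D(
    context, CL_MEM_READ_WRITE,   // same effect as passing 0 for the flags
    &format, width, height, 0, NULL, NULL);

// Because the flags are bit values, they can also be OR-ed together,
// e.g. CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR.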
I have taken the Kernel from the great OpenCL SpMV article for AMD by Bryan Catanzaro.
I have given it a toy problem where the input is
A= [0 0 0 6 1 3 5 7 2 4 0 0]
offsets= [-3 0 2]
x= [1 2 3 4]
and the output y should be [7 22 15 34]
Here is the kernel:
__kernel
void dia_spmv(__global float *A, __const int rows,
__const int diags, __global int *offsets,
__global float *x, __global float *y) {
    int row = get_global_id(0);
    float accumulator = 0;
    for(int diag = 0; diag < diags; diag++) {
        int col = row + offsets[diag];
        if ((col >= 0) && (col < rows)) {
            float m = A[diag*rows + row];
            float v = x[col];
            accumulator += m * v;
        }
    }
    y[row] = accumulator;
}
After loading and writing the input arguments I execute the kernel like this:
size_t global_work_size;
global_work_size = 4;
err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &global_work_size,NULL, 0, NULL, NULL);
err = clFinish(cmd_queue);
And I get the correct result when I read y back from gpu memory.
I.e. I get y = [7 22 15 34]
I am new to OpenCL (and GPGPU in general) so I want to try and understand how to extend the problem correctly for much larger matrices of arbitrary dimension.
So let's say I have 1,000,000 rows. What should I set global_work_size to be?
And should I set local_work_size or should I leave it as NULL?
To use the kernel for arbitrary matrix sizes you should think about the problem and rewrite the kernel. The issue is the limited memory size of the GPU and the limited size of a single buffer. You can get the maximum size for a buffer with clGetDeviceInfo and CL_DEVICE_MAX_MEM_ALLOC_SIZE.
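For example, the maximum size of a single allocation can be queried roughly like this (device is whatever cl_device_id your context was created with):
cl_ulong max_alloc = 0;
cl_int err = clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                             sizeof(max_alloc), &max_alloc, NULL);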
You need to split your problem into smaller pieces. Calculate them separately and merge the results afterwards.
I do not know the problem above in detail and cannot give you a concrete hint on how to implement this; I can only give you the general direction.