Qt: convert decimal to hex with 4 bytes

I am using Qt to convert a decimal number into a hex string:
QString hexvalue = QString("%1").arg(decimal, 8, 16, QLatin1Char( '0' ));
I want to get:
1: 00 00 00 01
-1: FF FF FF FF
However, this code results in
FF FF FF FF FF FF FF FF (for -1) and 00 00 00 01 (for 1).
How can I limit this to 4 bytes?

You can use a mask with the AND operator to do this:
int val = -1;
qDebug() << QString("%1 : %2").arg(val).arg(val & 0xffffffff, 8, 16, QLatin1Char('0'));
will display
"-1 : ffffffff"
EDIT
As requested in comments, here is one way to have a variable length in the range 0 < length <= 8:
unsigned int mask = 0xffffffffu >> (32 - 4 * length); // assuming a 32-bit unsigned int
int val = -1;
qDebug() << QString("%1 : %2").arg(val).arg((unsigned int)val & mask, length, 16, QLatin1Char('0'));
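For comparison, here is a plain C sketch of the same masking idea that also produces the space-separated layout asked for in the question. The function name hex4 is hypothetical. Converting the value to uint32_t plays the role of the & 0xffffffff mask, because conversion to an unsigned type is defined modulo 2^32.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the low 32 bits of value as "XX XX XX XX",
 * most significant byte first. out needs room for 12 chars (11 + NUL). */
static void hex4(int32_t value, char out[12])
{
    /* Conversion to uint32_t is modulo 2^32, so -1 becomes 0xFFFFFFFF */
    uint32_t u = (uint32_t)value;
    snprintf(out, 12, "%02X %02X %02X %02X",
             (unsigned)(u >> 24), (unsigned)((u >> 16) & 0xFF),
             (unsigned)((u >> 8) & 0xFF), (unsigned)(u & 0xFF));
}
```

Calling hex4(-1, buf) leaves "FF FF FF FF" in buf, and hex4(1, buf) leaves "00 00 00 01".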

Related

get ecc public key from x and y components in PEM format using openssl

Can I get the ECC public key in PEM format from its X and Y components using OpenSSL?
X:
1d 43 15 e3 84 99 d6 f6 9f 49 61 8a ae ec f2 4f
Y:
b5 1a 86 cf f9 0e 01 af 3a 9a 52 b3 c6 58 2c 48
Thank you!
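Yes, it is possible. Here is an example in C. Note that NID_secp256k1 is only an example curve: use the curve the key was actually generated on. The X and Y values in the question are only 16 bytes each, which suggests a 128-bit curve rather than secp256k1.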
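#include <stdio.h>
#include <openssl/bio.h>
#include <openssl/bn.h>
#include <openssl/ec.h>
#include <openssl/err.h>
#include <openssl/evp.h>
#include <openssl/obj_mac.h>
#include <openssl/pem.h>

int main(void)
{
    EC_GROUP *group = NULL;
    EC_POINT *point = NULL;
    EC_KEY *key = NULL;
    BIGNUM *x = NULL, *y = NULL;
    BIO *out = NULL;
    int rc = 1;

    ERR_load_crypto_strings();
    OpenSSL_add_all_algorithms();

    /* NID_secp256k1 is an example: use the curve the key was generated on */
    group = EC_GROUP_new_by_curve_name(NID_secp256k1);
    if (group == NULL)
        goto cleanup;

    /* BN_hex2bn allocates a new BIGNUM when *x is NULL; it returns 0 on error */
    if (!BN_hex2bn(&x, "1d4315e38499d6f69f49618aaeecf24f") ||
        !BN_hex2bn(&y, "b51a86cff90e01af3a9a52b3c6582c48"))
        goto cleanup;

    /* create EC point from X and Y (recent OpenSSL versions reject points not on the curve) */
    point = EC_POINT_new(group);
    if (point == NULL || !EC_POINT_set_affine_coordinates_GFp(group, point, x, y, NULL))
        goto cleanup;

    /* Create a new EC key and set the public key */
    key = EC_KEY_new();
    if (key == NULL || !EC_KEY_set_group(key, group) || !EC_KEY_set_public_key(key, point))
        goto cleanup;

    out = BIO_new_fp(stdout, BIO_NOCLOSE);
    if (out == NULL || !PEM_write_bio_EC_PUBKEY(out, key))
        goto cleanup;
    rc = 0;

cleanup:
    if (rc != 0)
        ERR_print_errors_fp(stderr);
    /* Clean up (all of these free functions accept NULL) */
    BN_free(x);
    BN_free(y);
    EC_POINT_free(point);
    EC_GROUP_free(group);
    EC_KEY_free(key);
    BIO_free(out);
    return rc;
}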
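For context, the public key that PEM_write_bio_EC_PUBKEY writes contains the curve point as an octet string. With OpenSSL's default (uncompressed) point form, that string is the SEC 1 encoding 0x04 followed by X and Y, each left-padded with zeros to the field size. A minimal sketch of building it by hand; the function name is hypothetical, and field_len is an assumption you must supply (32 bytes for secp256k1, 16 for a 128-bit curve):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build the SEC 1 uncompressed point 0x04 || X || Y.
 * x and y are big-endian byte strings, each at most field_len bytes long;
 * shorter values are left-padded with zeros. out must hold 1 + 2 * field_len
 * bytes. Returns the number of bytes written, or 0 on error. */
static size_t ec_uncompressed_point(const unsigned char *x, size_t xlen,
                                    const unsigned char *y, size_t ylen,
                                    size_t field_len, unsigned char *out)
{
    if (xlen > field_len || ylen > field_len)
        return 0;
    memset(out, 0, 1 + 2 * field_len);
    out[0] = 0x04;                                              /* uncompressed marker */
    memcpy(out + 1 + (field_len - xlen), x, xlen);              /* X, right-aligned */
    memcpy(out + 1 + field_len + (field_len - ylen), y, ylen);  /* Y, right-aligned */
    return 1 + 2 * field_len;
}
```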

FMA instruction showing up as three packed double operations?

I'm analyzing a piece of linear algebra code which is calling intrinsics directly, e.g.
v_dot0 = _mm256_fmadd_pd( v_x0, v_y0, v_dot0 );
My test script computes the dot product of two double-precision vectors of length 4 (so only one call to _mm256_fmadd_pd is needed), repeated 1 billion times. When I count the number of operations with perf, I get the following:
Performance counter stats for './main':
0 r5380c7 (skl::FP_ARITH:512B_PACKED_SINGLE) (49.99%)
0 r5340c7 (skl::FP_ARITH:512B_PACKED_DOUBLE) (49.99%)
0 r5320c7 (skl::FP_ARITH:256B_PACKED_SINGLE) (49.99%)
2'998'943'659 r5310c7 (skl::FP_ARITH:256B_PACKED_DOUBLE) (50.01%)
0 r5308c7 (skl::FP_ARITH:128B_PACKED_SINGLE) (50.01%)
1'999'928'140 r5304c7 (skl::FP_ARITH:128B_PACKED_DOUBLE) (50.01%)
0 r5302c7 (skl::FP_ARITH:SCALAR_SINGLE) (50.01%)
1'000'352'249 r5301c7 (skl::FP_ARITH:SCALAR_DOUBLE) (49.99%)
I was surprised that the number of 256B_PACKED_DOUBLE operations is approximately 3 billion instead of 1 billion, since FMA is a single instruction in my architecture's instruction set. Why does perf count 3 packed double operations per call to _mm256_fmadd_pd?
Note: to test that the code is not calling other floating point operations accidentally, I commented out the call to the above mentioned intrinsic, and perf counts exactly zero 256B_PACKED_DOUBLE operations, as expected.
Edit: MCVE, as requested:
ddot.c
#include <immintrin.h> // AVX
double ddot(int m, double *x, double *y) {
int ii;
double dot = 0.0;
__m128d u_dot0, u_x0, u_y0, u_tmp;
__m256d v_dot0, v_dot1, v_x0, v_x1, v_y0, v_y1, v_tmp;
v_dot0 = _mm256_setzero_pd();
v_dot1 = _mm256_setzero_pd();
u_dot0 = _mm_setzero_pd();
ii = 0;
for (; ii < m - 3; ii += 4) {
v_x0 = _mm256_loadu_pd(&x[ii + 0]);
v_y0 = _mm256_loadu_pd(&y[ii + 0]);
v_dot0 = _mm256_fmadd_pd(v_x0, v_y0, v_dot0);
}
// reduce
v_dot0 = _mm256_add_pd(v_dot0, v_dot1);
u_tmp = _mm_add_pd(_mm256_castpd256_pd128(v_dot0), _mm256_extractf128_pd(v_dot0, 0x1));
u_tmp = _mm_hadd_pd(u_tmp, u_tmp);
u_dot0 = _mm_add_sd(u_dot0, u_tmp);
_mm_store_sd(&dot, u_dot0);
return dot;
}
main.c:
#include <stdio.h>
double ddot(int, double *, double *);
int main(int argc, char const *argv[]) {
double x[4] = {1.0, 2.0, 3.0, 4.0}, y[4] = {5.0, 5.0, 5.0, 5.0};
double xTy;
for (int i = 0; i < 1000000000; ++i) {
xTy = ddot(4, x, y);
}
printf(" %f\n", xTy);
return 0;
}
I run perf as
sudo perf stat -e r5380c7 -e r5340c7 -e r5320c7 -e r5310c7 -e r5308c7 -e r5304c7 -e r5302c7 -e r5301c7 ./a.out
The disassembly of ddot looks as follows:
0000000000000790 <ddot>:
790: 83 ff 03 cmp $0x3,%edi
793: 7e 6b jle 800 <ddot+0x70>
795: 8d 4f fc lea -0x4(%rdi),%ecx
798: c5 e9 57 d2 vxorpd %xmm2,%xmm2,%xmm2
79c: 31 c0 xor %eax,%eax
79e: c1 e9 02 shr $0x2,%ecx
7a1: 48 83 c1 01 add $0x1,%rcx
7a5: 48 c1 e1 05 shl $0x5,%rcx
7a9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
7b0: c5 f9 10 0c 06 vmovupd (%rsi,%rax,1),%xmm1
7b5: c5 f9 10 04 02 vmovupd (%rdx,%rax,1),%xmm0
7ba: c4 e3 75 18 4c 06 10 vinsertf128 $0x1,0x10(%rsi,%rax,1),%ymm1,%ymm1
7c1: 01
7c2: c4 e3 7d 18 44 02 10 vinsertf128 $0x1,0x10(%rdx,%rax,1),%ymm0,%ymm0
7c9: 01
7ca: 48 83 c0 20 add $0x20,%rax
7ce: 48 39 c1 cmp %rax,%rcx
7d1: c4 e2 f5 b8 d0 vfmadd231pd %ymm0,%ymm1,%ymm2
7d6: 75 d8 jne 7b0 <ddot+0x20>
7d8: c5 f9 57 c0 vxorpd %xmm0,%xmm0,%xmm0
7dc: c5 ed 58 d0 vaddpd %ymm0,%ymm2,%ymm2
7e0: c4 e3 7d 19 d0 01 vextractf128 $0x1,%ymm2,%xmm0
7e6: c5 f9 58 d2 vaddpd %xmm2,%xmm0,%xmm2
7ea: c5 f9 57 c0 vxorpd %xmm0,%xmm0,%xmm0
7ee: c5 e9 7c d2 vhaddpd %xmm2,%xmm2,%xmm2
7f2: c5 fb 58 d2 vaddsd %xmm2,%xmm0,%xmm2
7f6: c5 f9 28 c2 vmovapd %xmm2,%xmm0
7fa: c5 f8 77 vzeroupper
7fd: c3 retq
7fe: 66 90 xchg %ax,%ax
800: c5 e9 57 d2 vxorpd %xmm2,%xmm2,%xmm2
804: eb da jmp 7e0 <ddot+0x50>
806: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
80d: 00 00 00
I just tested with an asm loop on SKL. An FMA instruction like vfmadd231pd ymm0, ymm1, ymm3 increments fp_arith_inst_retired.256b_packed_double by 2, even though it's a single uop!
I guess Intel really wanted a FLOP counter, not an instruction or uop counter.
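Following that idea, the fp_arith_inst_retired events can be turned into a FLOP count by weighting each one by the number of doubles per vector, since FMA is already counted twice. A small sketch; the struct and function names are hypothetical, and the event codes in the comments are the raw ones from the question's perf command:

```c
#include <stdint.h>

/* Raw counts of the double-precision fp_arith_inst_retired.* events. */
struct fp_arith_counts {
    uint64_t scalar_double;     /* r5301c7: 1 double per count */
    uint64_t packed128_double;  /* r5304c7: 2 doubles per count */
    uint64_t packed256_double;  /* r5310c7: 4 doubles per count */
    uint64_t packed512_double;  /* r5340c7: 8 doubles per count */
};

/* FMA already increments these counters by 2, so weighting each count by
 * the vector width gives the number of double-precision FLOPs. */
static uint64_t double_flops(const struct fp_arith_counts *c)
{
    return c->scalar_double + 2 * c->packed128_double +
           4 * c->packed256_double + 8 * c->packed512_double;
}
```

With the per-call counts implied by the question's output (1 scalar, 2 128-bit, 3 256-bit), this gives 17 FLOPs per ddot(4, ...) call, which is consistent with the disassembly: 8 for the FMA, 4 for the 256-bit vaddpd, 2 each for the 128-bit vaddpd and vhaddpd, and 1 for vaddsd.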
Your 3rd 256-bit FP uop is probably coming from something else you're doing, like a horizontal sum that starts out doing a 256-bit shuffle and another 256-bit add, instead of reducing to 128-bit first. I hope you're not using _mm256_hadd_pd!
Test code inner loop:
$ asm-link -d -n "testloop.asm" # assemble with NASM -felf64 and link with ld into a static binary
mov ebp, 100000000 # setup stuff outside the loop
vzeroupper
0000000000401040 <_start.loop>:
401040: c4 e2 f5 b8 c3 vfmadd231pd ymm0,ymm1,ymm3
401045: c4 e2 f5 b8 e3 vfmadd231pd ymm4,ymm1,ymm3
40104a: ff cd dec ebp
40104c: 75 f2 jne 401040 <_start.loop>
$ taskset -c 3 perf stat -etask-clock,context-switches,cpu-migrations,page-faults,cycles,branches,instructions,uops_issued.any,uops_executed.thread,fp_arith_inst_retired.256b_packed_double -r4 ./"$t"
Performance counter stats for './testloop-cvtss2sd' (4 runs):
102.67 msec task-clock # 0.999 CPUs utilized ( +- 0.00% )
2 context-switches # 24.510 M/sec ( +- 20.00% )
0 cpu-migrations # 0.000 K/sec
2 page-faults # 22.059 M/sec ( +- 11.11% )
400,388,898 cycles # 3925381.355 GHz ( +- 0.00% )
100,050,708 branches # 980889291.667 M/sec ( +- 0.00% )
400,256,258 instructions # 1.00 insn per cycle ( +- 0.00% )
300,377,737 uops_issued.any # 2944879772.059 M/sec ( +- 0.00% )
300,389,230 uops_executed.thread # 2944992450.980 M/sec ( +- 0.00% )
400,000,000 fp_arith_inst_retired.256b_packed_double # 3921568627.451 M/sec
0.1028042 +- 0.0000170 seconds time elapsed ( +- 0.02% )
400M counts of fp_arith_inst_retired.256b_packed_double for 200M FMA instructions / 100M loop iterations.
(IDK what's up with perf 4.20.g8fe28c + kernel 4.20.3-arch1-1-ARCH. They calculate per-second stuff with the decimal point in the wrong place for the unit, e.g. 3925381.355 kHz is correct, not GHz. Not sure if it's a bug in perf or the kernel.)
Without vzeroupper, I'd sometimes see a latency of 5 cycles, not 4, for FMA. IDK if the kernel left a register in a polluted state or something.
Why do I get three though, and not two? (see MCVE added to original post)
Your ddot runs _mm256_add_pd(v_dot0, v_dot1); at the start of the cleanup, and since you call it with m = 4, you get the cleanup once per FMA.
Note that your v_dot1 is always zero (because you didn't actually unroll with 2 accumulators like you're planning to?) So this is pointless, but the CPU doesn't know that. My guess was wrong, it's not a 256-bit hadd, it's just a useless 256-bit vertical add.
(For larger vectors, yes multiple accumulators are very valuable to hide FMA latency. You'll want at least 8 vectors. See Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? for more about unrolling with multiple accumulators. But then you'll want a cleanup loop that does 1 vector at a time until you're down to the last up-to-3 elements.)
Also, your final _mm_add_sd(u_dot0, u_tmp); is redundant: u_dot0 is always zero, and the inefficient 128-bit hadd has already added the last pair of elements, so it just adds 0.0 to the total (and accounts for the 1 SCALAR_DOUBLE count per call).
See Get sum of values stored in __m256d with SSE/AVX for a way that doesn't suck.
Also note that GCC is splitting your unaligned loads into 128-bit halves with vinsertf128 because you compiled with the default -mtune=generic (which favours Sandybridge) instead of using -march=haswell to enable AVX+FMA and set -mtune=haswell. (Or use -march=native)
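Putting those suggestions together, a ddot could look roughly like the following. This is a hypothetical rewrite, not the asker's code: it uses two accumulators for brevity (more are better for hiding FMA latency), a one-vector cleanup loop, a scalar tail for the last 0 to 3 elements, and a horizontal sum that narrows to 128 bits before adding. The target("avx,fma") attribute is a GCC/Clang extension used so the file compiles without -march flags; it still needs an FMA-capable CPU (Haswell or later) at run time.

```c
#include <immintrin.h>

/* Hypothetical ddot sketch: 2 vector accumulators, a 1-vector cleanup loop,
 * a cheap 256 -> 128 -> scalar reduction, then a scalar tail. */
__attribute__((target("avx,fma")))
static double ddot_sketch(int m, const double *x, const double *y)
{
    __m256d acc0 = _mm256_setzero_pd();
    __m256d acc1 = _mm256_setzero_pd();
    int i = 0;

    /* Main loop: 8 doubles per iteration, two independent FMA dependency chains */
    for (; i + 8 <= m; i += 8) {
        acc0 = _mm256_fmadd_pd(_mm256_loadu_pd(x + i), _mm256_loadu_pd(y + i), acc0);
        acc1 = _mm256_fmadd_pd(_mm256_loadu_pd(x + i + 4), _mm256_loadu_pd(y + i + 4), acc1);
    }
    /* Cleanup loop: one vector (4 doubles) at a time */
    for (; i + 4 <= m; i += 4)
        acc0 = _mm256_fmadd_pd(_mm256_loadu_pd(x + i), _mm256_loadu_pd(y + i), acc0);

    /* Combine accumulators, then reduce to 128 bits before the final add */
    acc0 = _mm256_add_pd(acc0, acc1);
    __m128d s = _mm_add_pd(_mm256_castpd256_pd128(acc0), _mm256_extractf128_pd(acc0, 1));
    s = _mm_add_sd(s, _mm_unpackhi_pd(s, s));   /* low lane = full sum */
    double dot = _mm_cvtsd_f64(s);

    /* Scalar tail: the last 0..3 elements */
    for (; i < m; ++i)
        dot += x[i] * y[i];
    return dot;
}
```

For the question's input (x = {1, 2, 3, 4}, y = {5, 5, 5, 5}) this returns 50.0.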

Are there any languages for querying CBOR?

I'm looking for a language for querying CBOR, like JsonPath or jq but for the CBOR binary format. I don't want to convert from CBOR to JSON, because some CBOR types don't exist in JSON, and because of performance.
The C++ library jsoncons allows you to query CBOR with JSONPath, for example:
#include <jsoncons/json.hpp>
#include <jsoncons_ext/cbor/cbor.hpp>
#include <jsoncons_ext/jsonpath/json_query.hpp>
#include <iomanip>
#include <iostream>
#include <vector>
using namespace jsoncons; // For convenience
int main()
{
std::vector<uint8_t> v = {0x85,0xfa,0x40,0x0,0x0,0x0,0xfb,0x3f,0x12,0x9c,0xba,0xb6,0x49,0xd3,0x89,0xc3,0x49,0x1,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0xc4,0x82,0x38,0x1c,0xc2,0x4d,0x1,0x8e,0xe9,0xf,0xf6,0xc3,0x73,0xe0,0xee,0x4e,0x3f,0xa,0xd2,0xc5,0x82,0x20,0x3};
/*
85 -- Array of length 5
fa -- float
40000000 -- 2.0
fb -- double
3f129cbab649d389 -- 0.000071
c3 -- Tag 3 (negative bignum)
49 -- Byte string value of length 9
010000000000000000
c4 -- Tag 4 (decimal fraction)
82 -- Array of length 2
38 -- Negative integer of length 1
1c -- -29
c2 -- Tag 2 (positive bignum)
4d -- Byte string value of length 13
018ee90ff6c373e0ee4e3f0ad2
c5 -- Tag 5 (bigfloat)
82 -- Array of length 2
20 -- -1
03 -- 3
*/
// Decode to a json value (despite its name, it is not JSON specific.)
json j = cbor::decode_cbor<json>(v);
// Serialize to JSON
std::cout << "(1)\n";
std::cout << pretty_print(j);
std::cout << "\n\n";
// as<std::string>() and as<double>()
std::cout << "(2)\n";
std::cout << std::dec << std::setprecision(15);
for (const auto& item : j.array_range())
{
std::cout << item.as<std::string>() << ", " << item.as<double>() << "\n";
}
std::cout << "\n";
// Query with JSONPath
std::cout << "(3)\n";
json result = jsonpath::json_query(j,"$.[?(@ < 1.5)]");
std::cout << pretty_print(result) << "\n\n";
// Encode result as CBOR
std::vector<uint8_t> val;
cbor::encode_cbor(result,val);
std::cout << "(4)\n";
for (auto c : val)
{
std::cout << std::hex << std::setprecision(2) << std::setw(2)
<< std::setfill('0') << static_cast<int>(c);
}
std::cout << "\n\n";
/*
83 -- Array of length 3
fb -- double
3f129cbab649d389 -- 0.000071
c3 -- Tag 3 (negative bignum)
49 -- Byte string value of length 9
010000000000000000
c4 -- Tag 4 (decimal fraction)
82 -- Array of length 2
38 -- Negative integer of length 1
1c -- -29
c2 -- Tag 2 (positive bignum)
4d -- Byte string value of length 13
018ee90ff6c373e0ee4e3f0ad2
*/
}
Output:
(1)
[
2.0,
7.1e-05,
"-18446744073709551617",
"1.23456789012345678901234567890",
[-1, 3]
]
(2)
2.0, 2
7.1e-05, 7.1e-05
-18446744073709551617, -1.84467440737096e+19
1.23456789012345678901234567890, 1.23456789012346
1.5, 1.5
(3)
[
7.1e-05,
"-18446744073709551617",
"1.23456789012345678901234567890"
]
(4)
83fb3f129cbab649d389c349010000000000000000c482381cc24d018ee90ff6c373e0ee4e3f0ad2
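The byte-by-byte annotations in the example follow CBOR's item header layout (RFC 8949): the top 3 bits of the initial byte are the major type, the low 5 bits are the additional information, and values 24 to 27 mean that a 1, 2, 4 or 8-byte big-endian argument follows. A minimal C sketch of parsing just that header; the struct and function names are hypothetical, and indefinite lengths and reserved values are rejected rather than handled:

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded CBOR item header. */
struct cbor_head {
    unsigned major;  /* 0 uint, 1 negint, 2 bytes, 3 text, 4 array, 5 map, 6 tag, 7 simple/float */
    unsigned info;   /* low 5 bits of the initial byte */
    uint64_t arg;    /* length, value, tag number or float bits */
    size_t size;     /* header size in bytes */
};

/* Parse the header at buf[0..len). Returns 0 on success, or -1 if the input
 * is truncated or info is 28..31 (reserved / indefinite length). */
static int cbor_parse_head(const uint8_t *buf, size_t len, struct cbor_head *h)
{
    if (len < 1)
        return -1;
    h->major = buf[0] >> 5;
    h->info = buf[0] & 0x1f;
    if (h->info < 24) {               /* argument stored in the initial byte */
        h->arg = h->info;
        h->size = 1;
        return 0;
    }
    if (h->info > 27)
        return -1;
    size_t n = (size_t)1 << (h->info - 24);   /* 24..27 -> 1, 2, 4, 8 bytes */
    if (len < 1 + n)
        return -1;
    h->arg = 0;
    for (size_t i = 0; i < n; i++)    /* big-endian argument */
        h->arg = (h->arg << 8) | buf[1 + i];
    h->size = 1 + n;
    return 0;
}
```

For example, 38 1c parses as major type 1 with argument 28, i.e. the negative integer -1 - 28 = -29, and fa 40 00 00 00 gives major type 7, additional info 26 and the bits 0x40000000, which is 2.0 as an IEEE 754 single.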
Sure, you can use any general purpose programming language for querying CBOR, for example JavaScript might be a good choice. But if you are looking for a "query language" like JsonPath, I'm not aware of any specifically developed for CBOR.

Floating point data format sign+exponent

I am receiving data over UART from a heat meter, but I need some help understanding how I should deal with the data.
I have the documentation, but that is not enough for me; I have too little experience with this kind of calculation.
Maybe someone with the right skills could explain how it should be done, with a better example than the one I have from the documentation.
One value consists of the following bytes:
[number of bytes][sign+exponent] (integer)
(integer) is the register data value. The length of the integer value is
specified by [number of bytes]. [sign+exponent] is an 8-bit value that
specifies the sign of the data value and sign and value of the exponent. The
meaning of the individual bits in the [sign+exponent] byte is shown below:
Examples:
-123.45 = 04h, C2h, 0h, 0h, 30h, 39h
87654321*10^3 = 04h, 03h, 05h, 39h, 7Fh, B1h
255*10^3 = 01h, 03h, FFh
And now to one more example with actual data.
This is the information that I have from the documentation about this.
This is some data that I have received from my heat meter
10 00 56 25 04 42 00 00 1B E4
So in my example then 04 is the [number of bytes], 42 is the [sign+exponent] and 00 00 1B E4 is the (integer).
But I do not know how I should make the calculation to receive the actual value.
Any help?
Your data appears to be big-endian, according to your example. So here's how you break those bytes into the fields you need using bit shifting and masking.
n = b[0]
SI = (b[1] & 0x80) >> 7
SE = (b[1] & 0x40) >> 6
exponent = b[1] & 0x3f
integer = 0
for i = 0 to n-1:
integer = (integer << 8) + b[2+i]
The sign of the mantissa is obtained from the MSb of the Sign+exponent byte, by masking (byte & 80h != 0 => SI = -1).
The sign of the exponent is similarly obtained by byte & 40h != 0 => SE = -1.
The exponent value is EXP = byte & 3Fh.
The mantissa INT is the binary number formed by the [number of bytes] bytes that follow (four in your example), which can be read as a single integer (but mind the endianness).
Finally, compute SI * INT * pow(10, SE * EXP).
In your example, SI = 1, SE = -1, EXP = 2, INT = 7140, hence
1 * 7140 * pow(10, -1 * 2) = +71.4
It is not in the scope of this answer to explain how to implement this efficiently.
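As one possible implementation of the formula above, here is a C sketch. The function name is hypothetical; it assumes the big-endian byte order seen in the examples and at most 8 integer bytes. It scales by a power of ten computed exactly (10^0 to 10^22 are exact doubles) so the result is rounded only once for typical exponents.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder for one [number of bytes][sign+exponent](integer) value:
 *   bit 7 of sign+exponent = sign of the value (1 = negative)
 *   bit 6                  = sign of the exponent (1 = negative)
 *   bits 5..0              = exponent (power of ten)
 * Returns the number of bytes consumed, or 0 if the buffer is too short
 * or the integer is longer than 8 bytes. */
static size_t decode_meter_value(const uint8_t *buf, size_t len, double *out)
{
    if (len < 2)
        return 0;
    size_t n = buf[0];
    if (n > 8 || len < 2 + n)
        return 0;

    uint8_t sign_exp = buf[1];
    int exponent = sign_exp & 0x3F;

    uint64_t integer = 0;
    for (size_t i = 0; i < n; i++)          /* big-endian integer */
        integer = (integer << 8) | buf[2 + i];

    double scale = 1.0;                     /* 10^exponent, exact up to 10^22 */
    for (int i = 0; i < exponent; i++)
        scale *= 10.0;

    double value = (double)integer;
    value = (sign_exp & 0x40) ? value / scale : value * scale;
    *out = (sign_exp & 0x80) ? -value : value;
    return 2 + n;
}
```

For the received frame 10 00 56 25 04 42 00 00 1B E4, the value starts at the 04 byte, so decoding from that offset gives 71.4.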

hex offset sector

I'm getting a response from a nameserver which is longer than 512 bytes. In that response are some offsets. An offset from the beginning of the response works fine, but when I go above 512 bytes the offset changes and it doesn't work anymore.
c0 0c = byte 12 from the start (works like a charm)
I have an offset c1 f0, which (as far as I know) means:
c1 = 1 x 512 = 512
f0 = 240
c1 f0= byte 240 from byte 512 == byte 752
My offset should point to the beginning of a name, which should be located at byte 752, but the name isn't located at byte 752.
Question
How does the offset work after 512 bytes?
It is a pointer (compression reference) whose offset is counted from the start of the message. To mark the two bytes as a pointer, the highest 2 bits are reserved and set to 1, which leaves 14 bits for the offset. C0 01 refers to offset 1. The first byte therefore does not always have to be C0; it can also be C1, C2, C3, C4, CF, etc. In practice this is fairly rare unless you have very large responses, which is the case here: I have a query of 3000+ bytes :)
C1 = 11000001
strip 2 highest bits : 000001
number = 1
offset of C1 F0 is 1 x 256 + 240 = 496
offset of C9 9F is 9 x 256 + 159 = 2463
One byte has 256 possible values, not 512 as I had assumed :S
The largest offset starting with C0 is C0 FF, which is 255; after that, C1 00 (offset 256) starts.
Credits of this explanation go to http://www.helpmij.nl/forum/member.php/215405-wampier
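A minimal C sketch of that decoding (the function name is hypothetical). Per RFC 1035, section 4.1.4, the 14-bit value is an offset from the start of the DNS message, so C1 F0 points to byte 496 of the response:

```c
#include <stdint.h>

/* Hypothetical helper: if b0 b1 form a DNS compression pointer (top two
 * bits of b0 are 11), store the 14-bit offset from the start of the
 * message in *offset and return 1; otherwise return 0. */
static int dns_pointer_offset(uint8_t b0, uint8_t b1, uint16_t *offset)
{
    if ((b0 & 0xC0) != 0xC0)
        return 0;
    *offset = (uint16_t)(((b0 & 0x3F) << 8) | b1);  /* 0..16383 */
    return 1;
}
```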
