I'm working on an ESP32 using Arduino. For some reason the values are printed differently from what I expect. What is the cause?
auto reset_time = 24L * 60 * 60 * 1000 * 1000; //86400000000
Serial.print("Reset Timer in: ");
Serial.println(reset_time);
Serial.print((reset_time / 1000));
Serial.println(" ms");
Serial.print((reset_time / 1000 / 1000));
Serial.println(" s");
Serial.print((reset_time / 1000 / 1000 / 60));
Serial.println(" m");
Serial.print((reset_time / 1000 / 1000 / 60 / 60));
Serial.println(" h");
This produces the following output:
21:05:58.310 -> Reset Timer in: 500654080
21:05:58.310 -> 500654 ms
21:05:58.310 -> 500 s
21:05:58.310 -> 8 m
21:05:58.310 -> 0 h
86400000000 mod 2^32 is 500654080.
The value is larger than fits in a 32-bit integer (long is 32 bits on the ESP32), so what you see is the remainder after division by 2^32.
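The arithmetic can be checked with a short Python sketch. It assumes the multiplication is carried out in a 32-bit signed type and wraps modulo 2^32, which matches the output above; strictly speaking, signed overflow is undefined behaviour in C and C++, so wraparound is what this compiler happened to produce rather than something guaranteed. The helper name wrap_int32 is made up for illustration.

```python
def wrap_int32(value):
    """Reduce an integer to the range of a 32-bit two's-complement int."""
    value &= 0xFFFFFFFF                      # keep the low 32 bits (value mod 2^32)
    return value - (1 << 32) if value >= (1 << 31) else value

full = 24 * 60 * 60 * 1000 * 1000            # 86400000000, needs 37 bits
wrapped = wrap_int32(full)                   # what a 32-bit long ends up holding

print(wrapped)                               # same figure as the first output line
print(wrapped // 1000, "ms")
print(wrapped // 1000 // 1000, "s")
print(wrapped // 1000 // 1000 // 60, "m")
print(wrapped // 1000 // 1000 // 60 // 60, "h")
```

Making the first operand a 64-bit type (for example 24LL instead of 24L) should keep the whole product in 64 bits.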
If I read a C17 draft correctly, a constant expression that cannot be represented in its type is a constraint violation. It requires a diagnostic message from the compiler:
6.6 Constant expressions
Constraints
[...]
4 Each constant expression shall evaluate to a constant that is in the range of representable values for its type.
5.1.1.3 Diagnostics
1 A conforming implementation shall produce at least one diagnostic message (identified in an implementation-defined manner) if a preprocessing translation unit or translation unit contains a violation of any syntax rule or constraint, [...]
I am trying to use a function inside a constraint block. Randomization happens, but the constraints are not being met. I have verified that the function works as expected outside the constraint block. The function uses only its arguments and no other class members.
local rand logic [6:0] [3:0] [9:0] coeff_mult;
constraint prods_are_multiples {
foreach(coeff_mult[i]) {
get_real(coeff_mult[i][3]) == (-1 * get_real(coeff_mult[i][0]));
get_real(coeff_mult[i][2]) == (-1 * get_real(coeff_mult[i][1]));
get_real(coeff_mult[i][0]) == (3 * get_real(coeff_mult[i][1]));
}
}
function automatic shortreal get_real(input [9:0] val);
shortreal sign;
bit [9:0] magnitude;
sign = -1**(val[9]);
magnitude = ({10{val[9]}} ^ val[9:0]) + val[9];
get_real = sign * (magnitude[9:3] + magnitude[2] * 0.5 + magnitude[1] * 0.25 + magnitude[0] * 0.125);
endfunction
I came across a similar post, but it didn't solve my problem.
Is there anything wrong with the code? If not, is there any other way of doing this?
The post you reference explains the reasoning. The inputs to your function get their random values chosen before calling the function in the constraint. So there are effectively no constraints on coeff_mult before evaluating the equality constraints.
Also, the LRM does not allow expressions of non-integral values in constraints, technically, although some tools allow limited cases.
The best strategy for randomizing real numbers is doing everything with scaled integral values, then converting the resulting values to real (or sign/magnitude) in post_randomize().
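A minimal sketch of the scaled-integer idea, written in Python rather than SystemVerilog. It assumes the 10-bit format implied by get_real in the question (two's complement with three fractional bits), and it builds a row by construction instead of calling a constraint solver. The names randomize_row and to_real are hypothetical.

```python
import random

FRAC_BITS = 3                 # assumed: get_real weights the low three bits 0.5, 0.25, 0.125
SCALE = 1 << FRAC_BITS        # one integer step corresponds to 1/8
MIN_VAL, MAX_VAL = -512, 511  # range of a signed 10-bit value

def randomize_row(rng):
    """Return [c0, c1, c2, c3] as scaled integers with c0 == 3*c1, c2 == -c1, c3 == -c0."""
    # Limit c1 so that both 3*c1 and -3*c1 still fit in 10 signed bits.
    bound = MAX_VAL // 3
    c1 = rng.randint(-bound, bound)
    c0 = 3 * c1
    return [c0, c1, -c1, -c0]

def to_real(scaled):
    """The conversion step that would live in post_randomize()."""
    return scaled / SCALE

rng = random.Random(1)
row = randomize_row(rng)
print(row, [to_real(v) for v in row])
```

Because the relations are stated on integers, they hold exactly, and the real values are derived only after randomization.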
I have a working example of your kind of function usage that uses an additional random variable. I have checked it with ~10 different seeds, and I have also posted the results for one particular seed.
typedef bit [7:0] tabc;
class t;
rand bit [3:0] a, b;
rand tabc ca;
// Original Constraint : get_b(b) == get_a(a) + 1;
constraint c {
ca == get_b(b);
get_a(a) == ca - 1;
}
function tabc get_a (input bit[3:0] a);
return (tabc'(a + 15));
endfunction
function tabc get_b (input bit[3:0] b);
return (tabc'(b + 10));
endfunction
endclass
program temp();
t t1 = new();
initial
begin
repeat (10) begin
t1.randomize();
$display("t1.a - %0d", t1.a);
$display("t1.b - %0d", t1.b);
end
end
endprogram
// Results -
t1.a - 7
t1.b - 13
t1.a - 2
t1.b - 8
t1.a - 5
t1.b - 11
t1.a - 4
t1.b - 10
t1.a - 0
t1.b - 6
t1.a - 8
t1.b - 14
t1.a - 9
t1.b - 15
t1.a - 6
t1.b - 12
t1.a - 0
t1.b - 6
t1.a - 5
t1.b - 11
I am not quite sure how this differs from your method. My guess is that when a constraint contains no random variable outside the function calls, the solver sees no constraint on any variable and therefore does not try to solve the constraint before picking a solution.
I inserted a temporary random variable in order to force the solver to look at the constraints first.
I am not sure how correct my explanation is, but at least it has worked for me so far. Check it and run it with more seeds before adapting it into your solution.
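The printed results can be checked against the original constraint get_b(b) == get_a(a) + 1, which reduces to b == a + 6. The Python sketch below only re-evaluates the two functions on the pairs listed above; the 8-bit mask stands in for the tabc cast.

```python
def get_a(a):
    return (a + 15) & 0xFF   # mirrors tabc'(a + 15), an 8-bit result

def get_b(b):
    return (b + 10) & 0xFF   # mirrors tabc'(b + 10)

# (a, b) pairs copied from the printed results above
results = [(7, 13), (2, 8), (5, 11), (4, 10), (0, 6),
           (8, 14), (9, 15), (6, 12), (0, 6), (5, 11)]

all_ok = all(get_b(b) == get_a(a) + 1 for a, b in results)
print(all_ok)
```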
I am new to OpenCL and I want to parallelise this prime sieve (Sieve of Atkin). The C++ code is here: https://www.geeksforgeeks.org/sieve-of-atkin/
I am not getting good results: in my comparison the CPU version is much faster. I tried to use an NDRange kernel to avoid writing the nested loops and hopefully improve performance, but when I pass a higher limit to the function, the GPU driver stops responding and the program crashes. Maybe my NDRange configuration is wrong; could anyone help with it? I probably don't understand NDRange properly. Here is the info about my GPU.
CL_DEVICE_NAME: GeForce GT 740M
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 397.31
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 2
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1032 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 512 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 2048 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES:
-CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
here is my NDRange code
queue.enqueueNDRangeKernel(add, cl::NDRange(1,1), cl::NDRange((limit * limit) -1, (limit * limit) -1 ), cl::NullRange,NULL, &event);
and my kernel code:
__kernel void sieveofAktin(const int limit, __global bool* sieve)
{
int x = get_global_id(0);
int y = get_global_id(1);
//printf("%d \n", x);
int n = (4 * x * x) + (y * y);
if (n <= limit && (n % 12 == 1 || n % 12 == 5))
sieve[n] ^= true;
n = (3 * x * x) + (y * y);
if (n <= limit && n % 12 == 7)
sieve[n] ^= true;
n = (3 * x * x) - (y * y);
if (x > y && n <= limit && n % 12 == 11)
sieve[n] ^= true;
for (int r = 5; r * r < limit; r++) {
if (sieve[r]) {
for (int i = r * r; i < limit; i += r * r)
sieve[i] = false;
}
}
}
You have lots of branching in that code, and I suspect that's what may be killing your performance on GPUs. Look at chapter 6 of the NVIDIA OpenCL Best Practices Guide for details on why this hurts performance.
I'm not sure how possible it is without looking closely at the algorithm, but ideally you want to rewrite the code to use as little branching as possible. Alternatively, you could look at other algorithms entirely.
As for the locking, I'd need to see more of your host code to know what is happening, but it's possible you're exceeding various limits of your platform/device. Are you checking for errors on every OpenCL function you call?
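To get a feel for whether limits are being exceeded, the launch size in the question can be compared with what the serial algorithm actually visits. The sketch below is plain Python arithmetic, not OpenCL. It reads the second NDRange in the enqueue call as the global size, and it assumes the serial loops from the linked page stop once x*x or y*y exceeds limit. The function names are made up for illustration.

```python
import math

def work_items_requested(limit):
    """Global size from the question: (limit*limit - 1) in each of two dimensions."""
    side = limit * limit - 1
    return side * side

def work_items_needed(limit):
    """(x, y) pairs with x*x <= limit and y*y <= limit, x and y starting at 1."""
    side = math.isqrt(limit)
    return side * side

for limit in (100, 1000, 10000):
    print(limit, work_items_requested(limit), work_items_needed(limit))
```

For limit = 1000 the call asks for roughly 10^12 work-items where about a thousand (x, y) pairs matter, and as the kernel is written each work-item also runs the final r loop over the whole sieve.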
Regardless of how good or bad your algorithm or implementation is - the driver should always respond. Non-response is quite possibly a bug. File a bug report at http://developer.nvidia.com/ .
In erlang:
cost(I, Miners) ->
BasePrice = lists:nth(I, prices()),
Owned = lists:nth(I, Miners),
Rate = increaseRate(I) / 100,
Multiplier = math:pow((1 + Rate), Owned),
floor(BasePrice * Multiplier).
for example, a base price of 8000, with an increase rate of 7, and I own 0
the price of the first one I expect to be: 8000
when buying my second one, with an increase rate of 7, and I own 1
the price of the second one I expect to be:
Multiplier = 1.07
8000 * 1.07 =
8560
This all works fine. Now I have to implement this in Solidity, which doesn't do decimal math very well. It auto rounds down such that 3/2 == 1 in Solidity.
I want to recreate my cost function in Solidity.
function cost(uint _minerIndex, uint _owned) public view returns (uint) {
uint basePrice = 8000;
uint increaseRate = 7;
return basePrice * ((1 + increaseRate / 100) ** _owned);
}
increaseRate / 100 will always return 0 if increaseRate is < 100.
How do I achieve this same effect?
From the documentation:
"Fixed point numbers are not fully supported by Solidity yet. They can be declared, but cannot be assigned to or from."
A simple solution is to scale by 100 and divide at the end:
(basePrice * ((100 + increaseRate) ** _owned)) / (100 ** _owned)
However, this may also fail because of arithmetic overflow, depending on your numbers and the maximum integer value supported by Solidity (2^256 - 1 for uint).
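The scaled formula can be tried out with plain integers before porting it. This is a Python sketch, not Solidity: `//` stands in for Solidity's truncating division, and the explicit check against 2**256 - 1 is an assumed stand-in for the uint overflow mentioned above. The parameter names follow the question's cost function.

```python
UINT256_MAX = 2**256 - 1      # largest value a Solidity uint can hold

def cost(base_price, increase_rate, owned):
    """Integer-only version of floor(base_price * (1 + rate/100) ** owned)."""
    numerator = base_price * (100 + increase_rate) ** owned
    if numerator > UINT256_MAX:
        raise OverflowError("intermediate product does not fit in 256 bits")
    return numerator // (100 ** owned)

for owned in range(3):
    print(owned, cost(8000, 7, owned))
```

With a base price of 8000 and a rate of 7, the intermediate product outgrows 256 bits somewhere below 40 owned miners, so the overflow concern is a practical one.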
Hello, I'm making a base-10 calculator in assembler that accepts numbers with a maximum length of 5 digits. After the input has been taken there are two numbers. One five-digit number is stored in AX and BL (two decimal digits per byte), for example:
AX - 23 45
BX - 00 01
So the value of the input is 12345. The other number, for example 23243, is stored in CX and DX in the same way as the first number (the one stored in AX and BX). I have written the addition code, but I can't figure out how to write the subtraction code, with all the problems of negative results.
What I thought of doing is to use BH (which I'm not using, because the number can't be longer than 6 digits) as a sign flag: 1 if the number is negative and 0 if it is positive. That solves the sign problem. The remaining problem is that I don't know how to write the subtraction itself, with the borrow and everything (in the addition I used instructions like adc and daa).
last example:
The value is 12345 and it's positive:
AX - 23 45
BX - 00 01
(if BH is 0 the number is positive, if 1 it's negative)
Now the value is 23243 and it's positive:
CX - 32 43
DX - 00 02
Calculation
12345-23243(= -10898)
Let's say the answer goes to CX and DX,
so it will look like this:
CX - 08 98
DX - 01 01
answer: (-10898)
Can someone please help me or give me example code so that I can see how to do it?
Sorry if this is a little confusing.
Thanks.
EDIT:
Here is the addition code that you asked for:
proc Add_two_numbers ; 2 values using the stack...
pop [150]
pop dx
pop cx
pop bx
pop ax
add al,cl
daa
mov cl,al
mov al,ah
adc al,ch
daa
mov ch,al
mov al,bl
adc al,dl
daa
mov dl,al
push cx
push dx
push [150]
ret
endp Add_two_numbers
2nd edit:
I figured out how to handle the negative case, so I just need an algorithm that subtracts 2 numbers. It does not need to work for cases like 1000-2000; it only has to work when the result is positive, like 2000-1000.
Answering your comment, this is one way you can convert from decimal and back using C as an example. I leave you to code it in asm!
#include <conio.h>
#define MAX 100000000
// input a signed decimal number
int inp_num(void) {
int number=0, neg=0, key;
while (number < MAX) {
key = _getche();
if (key == '-') {
if (number==0)
neg = 1; // else ignore
}
else if (key >= '0' && key <= '9')
number = number * 10 + key - '0';
else
break;
}
if (neg)
number = -number;
_putch('\n');
return number;
}
// output a signed number as decimal
void out_num(int number) {
int digit, suppress0, d;
suppress0 = 1; // zero-suppression on
if (number < 0) {
_putch('-');
number =-number;
}
for (d=MAX; d>0; d/=10) {
digit = number / d;
if (digit || d == 1) // if non-0, or the last digit so that 0 still prints
suppress0 = 0; // cancel zero-suppression
if (!suppress0)
_putch('0' + digit);
number -= digit * d;
}
}
int main(void) {
int number;
number = inp_num();
out_num(number);
return 0;
}
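For the subtraction itself, the same byte-by-byte pattern as the addition code applies, with a borrow in place of the carry; on x86 the counterparts of add/adc/daa are sub/sbb/das. Below is a Python model of that digit arithmetic, not assembly. It assumes packed BCD bytes ordered least significant first, and bcd_sub is a hypothetical name.

```python
def bcd_sub(a_bytes, b_bytes):
    """Subtract packed-BCD b from a, least significant byte first.

    Returns (result_bytes, borrow); borrow is 1 when b was larger than a.
    """
    result = []
    borrow = 0
    for a, b in zip(a_bytes, b_bytes):
        digits = []
        for shift in (0, 4):                      # low digit, then high digit
            d = ((a >> shift) & 0x0F) - ((b >> shift) & 0x0F) - borrow
            borrow = 1 if d < 0 else 0            # borrow from the next digit
            digits.append(d + 10 if d < 0 else d)
        result.append((digits[1] << 4) | digits[0])
    return result, borrow

# 23243 - 12345, stored as in the question but listed low byte first
diff, borrow = bcd_sub([0x43, 0x32, 0x02], [0x45, 0x23, 0x01])
print([f"{byte:02X}" for byte in reversed(diff)], borrow)
```

A final borrow of 1 signals that the second number was larger; in that case the operands can be swapped and the sign flag in BH set.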
Is it possible to divide an unsigned integer by 10 by using pure bit shifts, addition, subtraction and maybe multiply? Using a processor with very limited resources and slow divide.
Editor's note: this is not actually what compilers do, and gives the wrong answer for large positive integers ending with 9, starting with div10(1073741829) = 107374183 not 107374182. It is exact for smaller inputs, though, which may be sufficient for some uses.
Compilers (including MSVC) do use fixed-point multiplicative inverses for constant divisors, but they use a different magic constant and shift on the high-half result to get an exact result for all possible inputs, matching what the C abstract machine requires. See Granlund & Montgomery's paper on the algorithm.
See Why does GCC use multiplication by a strange number in implementing integer division? for examples of the actual x86 asm gcc, clang, MSVC, ICC, and other modern compilers make.
This is a fast approximation that's inexact for large inputs
It's even faster than the exact division via multiply + right-shift that compilers use.
You can use the high half of a multiply result for divisions by small integral constants. Assume a 32-bit machine (code can be adjusted accordingly):
int32_t div10(int32_t dividend)
{
int64_t invDivisor = 0x1999999A;
return (int32_t) ((invDivisor * dividend) >> 32);
}
What's going on here is that we're multiplying by a close approximation of 1/10 * 2^32 and then removing the 2^32. This approach can be adapted to different divisors and different bit widths.
This works great for the ia32 architecture, since its IMUL instruction will put the 64-bit product into edx:eax, and the edx value will be the wanted value. Viz (assuming dividend is passed in eax and quotient returned in eax)
div10 proc
mov edx,1999999Ah ; load 1/10 * 2^32
imul edx ; edx:eax = eax * edx = dividend / 10 * 2^32
mov eax,edx ; eax = dividend / 10
ret
endp
Even on a machine with a slow multiply instruction, this will be faster than a software or even hardware divide.
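The behaviour described in the editor's note can be reproduced with Python integers. The sketch models div10 for non-negative inputs only. The second function uses the constant 0xCCCCCCCD with a shift of 35, which is assumed here to be the pair commonly seen in compiler output for unsigned 32-bit division by 10; it does not come from the answer.

```python
def div10_approx(n):
    """High half of n * 0x1999999A, as in div10 above (non-negative n only)."""
    return (n * 0x1999999A) >> 32

def div10_exact_u32(n):
    """Multiply-and-shift with a wider constant (assumed compiler-style pair)."""
    return (n * 0xCCCCCCCD) >> 35

for n in (99, 1073741819, 1073741829):
    print(n, n // 10, div10_approx(n), div10_exact_u32(n))
```

The approximation goes wrong once the accumulated error of the rounded-up constant pushes a quotient with fractional part .9 over the next integer, which is why the first failure is a large input ending in 9.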
Though the answers given so far match the actual question, they do not match the title. So here's a solution heavily inspired by Hacker's Delight that really uses only bit shifts.
unsigned divu10(unsigned n) {
unsigned q, r;
q = (n >> 1) + (n >> 2);
q = q + (q >> 4);
q = q + (q >> 8);
q = q + (q >> 16);
q = q >> 3;
r = n - (((q << 2) + q) << 1);
return q + (r > 9);
}
I think that this is the best solution for architectures that lack a multiply instruction.
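A direct Python port of divu10 makes it easy to compare against ordinary integer division. This is only a test harness under the assumption of 32-bit unsigned inputs; Python integers do not wrap, but none of the intermediate values here exceed 32 bits for such inputs.

```python
def divu10(n):
    """Shift-and-add division by 10, ported line by line from the C version."""
    q = (n >> 1) + (n >> 2)          # q is roughly 0.75 * n
    q += q >> 4
    q += q >> 8
    q += q >> 16                     # q is now roughly 0.8 * n
    q >>= 3                          # q is roughly n / 10, possibly one too small
    r = n - (((q << 2) + q) << 1)    # remainder r = n - q * 10
    return q + (1 if r > 9 else 0)   # correct the estimate when needed

for n in (0, 9, 10, 99, 100, 0xFFFFFFFF):
    print(n, divu10(n), n // 10)
```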
Of course you can, if you can live with some loss of precision. If you know the value range of your inputs, you can come up with a multiplication and a bit shift that are exact over that range.
Some examples of how you can divide by 10, 60, ... are described in this blog, which uses them to format time as fast as possible.
temp = (ms * 205) >> 11; // 205/2048 is nearly the same as /10
to expand Alois's answer a bit, we can expand the suggested y = (x * 205) >> 11 for a few more multiples/shifts:
y = (ms * 1) >> 3 // first error 8
y = (ms * 2) >> 4 // 8
y = (ms * 4) >> 5 // 8
y = (ms * 7) >> 6 // 19
y = (ms * 13) >> 7 // 69
y = (ms * 26) >> 8 // 69
y = (ms * 52) >> 9 // 69
y = (ms * 103) >> 10 // 179
y = (ms * 205) >> 11 // 1029
y = (ms * 410) >> 12 // 1029
y = (ms * 820) >> 13 // 1029
y = (ms * 1639) >> 14 // 2739
y = (ms * 3277) >> 15 // 16389
y = (ms * 6554) >> 16 // 16389
y = (ms * 13108) >> 17 // 16389
y = (ms * 26215) >> 18 // 43699
y = (ms * 52429) >> 19 // 262149
y = (ms * 104858) >> 20 // 262149
y = (ms * 209716) >> 21 // 262149
y = (ms * 419431) >> 22 // 699059
y = (ms * 838861) >> 23 // 4194309
y = (ms * 1677722) >> 24 // 4194309
y = (ms * 3355444) >> 25 // 4194309
y = (ms * 6710887) >> 26 // 11184819
y = (ms * 13421773) >> 27 // 67108869
each line is a single, independent, calculation, and you'll see your first "error"/incorrect result at the value shown in the comment. you're generally better off taking the smallest shift for a given error value as this will minimise the extra bits needed to store the intermediate value in the calculation, e.g. (x * 13) >> 7 is "better" than (x * 52) >> 9 as it needs two less bits of overhead, while both start to give wrong answers above 68.
if you want to calculate more of these, the following (Python) code can be used:
def mul_from_shift(shift):
    mid = 2**shift + 5.
    return int(round(mid / 10.))
and I did the obvious thing for calculating when this approximation starts to go wrong with:
def first_err(mul, shift):
    i = 1
    while True:
        y = (i * mul) >> shift
        if y != i // 10:
            return i
        i += 1
(note that // is Python's floor division; for the non-negative values used here that is the same as truncating/rounding towards zero)
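Putting the two helpers together regenerates rows of the table. The sketch repeats the definitions so that it runs on its own, and it stops at a shift of 11 because first_err searches linearly and becomes slow for the larger shifts.

```python
def mul_from_shift(shift):
    mid = 2**shift + 5.
    return int(round(mid / 10.))

def first_err(mul, shift):
    i = 1
    while True:
        if (i * mul) >> shift != i // 10:
            return i
        i += 1

# One table row per shift: multiplier, shift, and first input that gives a wrong result
for shift in range(3, 12):
    mul = mul_from_shift(shift)
    print(f"y = (ms * {mul}) >> {shift}  // {first_err(mul, shift)}")
```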
the reason for the "3/1" pattern in errors (i.e. 8 repeats 3 times followed by 9) seems to be due to the change in bases, i.e. log2(10) is ~3.32. if we plot the errors we get the following:
[plot of the relative error not reproduced]
where the relative error is given by: mul_from_shift(shift) / (1<<shift) - 0.1
Considering Kuba Ober’s response, there is another one in the same vein.
It uses iterative approximation of the result, but I wouldn’t expect surprising performance.
Let’s say we have to find x where x = v / 10.
We’ll use the inverse operation v = x * 10 because it has the nice property that when x = a + b, then x * 10 = a * 10 + b * 10.
Let’s use x as the variable holding the best approximation of the result so far. When the search ends, x will hold the result. We’ll set each bit b of x one by one, from the most significant to the least significant, and compare (x + b) * 10 with v. If it is smaller than or equal to v, then the bit b is set in x. To test the next bit, we simply shift b one position to the right (divide by two).
We can avoid the multiplication by 10 by holding x * 10 and b * 10 in other variables.
This yields the following algorithm to divide v by 10.
uint16_t x = 0, x10 = 0, b = 0x1000, b10 = 0xA000;
while (b != 0) {
    if (b10 <= v - x10) { // same test as x10 + b10 <= v, but cannot overflow 16 bits
        x10 += b10;
        x |= b;
    }
    b10 >>= 1;
    b >>= 1;
}
// x = v / 10
Edit: to get the algorithm of Kuba Ober, which avoids the need for the variable x10, we can subtract b10 from v instead. In this case x10 isn’t needed anymore. The algorithm becomes
uint16_t x = 0, b = 0x1000, b10 = 0xA000;
while (b != 0) {
    if (b10 <= v) {
        v -= b10;
        x |= b;
    }
    b10 >>= 1;
    b >>= 1;
}
// x = original v / 10, and v now holds the remainder
The loop may be unrolled, and the different values of b and b10 may be precomputed as constants.
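The subtracting variant can be checked exhaustively for 16-bit inputs with a Python model. The function name div10_restoring is made up, and the model returns the leftover value of v as well, since that is the remainder.

```python
def div10_restoring(v):
    """Bit-by-bit division by 10 for 0 <= v <= 0xFFFF; returns (quotient, remainder)."""
    x, b, b10 = 0, 0x1000, 0xA000    # b10 is always b * 10
    while b != 0:
        if b10 <= v:                 # does this quotient bit fit?
            v -= b10
            x |= b
        b10 >>= 1
        b >>= 1
    return x, v

for v in (0, 9, 10, 12345, 65535):
    print(v, div10_restoring(v))
```

Starting at b = 0x1000 is enough because the largest 16-bit quotient, 6553, is below 0x2000.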
On architectures that can only shift one place at a time, a series of explicit comparisons against decreasing powers of two multiplied by 10 might work better than the solution from Hacker's Delight. Assuming a 16-bit dividend:
uint16_t div10(uint16_t dividend) {
uint16_t quotient = 0;
#define div10_step(n) \
do { if (dividend >= (n*10)) { quotient += n; dividend -= n*10; } } while (0)
div10_step(0x1000);
div10_step(0x0800);
div10_step(0x0400);
div10_step(0x0200);
div10_step(0x0100);
div10_step(0x0080);
div10_step(0x0040);
div10_step(0x0020);
div10_step(0x0010);
div10_step(0x0008);
div10_step(0x0004);
div10_step(0x0002);
div10_step(0x0001);
#undef div10_step
if (dividend >= 5) ++quotient; // round the result (optional)
return quotient;
}
Well, division is subtraction, so yes. Shift right by 1 (divide by 2). Now subtract 5 from the result, counting the number of times you do the subtraction until the value is less than 5. The result is the number of subtractions you did. Oh, and dividing is probably going to be faster.
A hybrid strategy of shift right then divide by 5 using the normal division might get you a performance improvement if the logic in the divider doesn't already do this for you.
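The shift-then-subtract idea in code form, as a Python sketch with a hypothetical function name. It relies on floor(floor(n / 2) / 5) being equal to floor(n / 10) for non-negative n.

```python
def div10_by_subtraction(n):
    """Divide a non-negative integer by 10 using one shift and repeated subtraction."""
    n >>= 1                 # divide by 2
    count = 0
    while n >= 5:           # count how many times 5 fits into the halved value
        n -= 5
        count += 1
    return count

for n in (0, 9, 10, 99, 1234):
    print(n, div10_by_subtraction(n))
```

The loop runs once per unit of the quotient, which is why the answer expects a real divide to be faster for anything but small inputs.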
I've designed a new method in AVR assembly, with lsr/ror and sub/sbc only. It divides by 8, then subtracts the number divided by 64 and 128, then subtracts the 1,024th and the 2,048th, and so on. It works very reliably (includes exact rounding) and quickly (370 microseconds at 1 MHz).
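The series behind this can be checked with exact fractions. The sketch below is not the linked AVR code; it only sums the factors named above and assumes that "and so on" continues with a pair of terms every four bit positions. The name partial_factor is hypothetical.

```python
from fractions import Fraction

def partial_factor(pairs):
    """1/8 minus the first `pairs` pairs of terms: 1/64 + 1/128, 1/1024 + 1/2048, ..."""
    total = Fraction(1, 8)
    for k in range(pairs):
        total -= Fraction(1, 2 ** (6 + 4 * k))   # 1/64, 1/1024, ...
        total -= Fraction(1, 2 ** (7 + 4 * k))   # 1/128, 1/2048, ...
    return total

for pairs in range(5):
    print(pairs, float(partial_factor(pairs)))
```

Each additional pair shrinks the gap to 1/10 by a factor of 16, so a handful of pairs is enough for 16-bit inputs, leaving rounding as a separate step.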
The source code is here for 16-bit-numbers:
http://www.avr-asm-tutorial.net/avr_en/beginner/DIV10/div10_16rd.asm
The page that comments this source code is here:
http://www.avr-asm-tutorial.net/avr_en/beginner/DIV10/DIV10.html
I hope that it helps, even though the question is ten years old.
brgs, gsc
elemakil's comments' code can be found here: https://doc.lagout.org/security/Hackers%20Delight.pdf
page 233. "Unsigned divide by 10 [and 11.]"