I'm learning C++ and ran into this problem in a simple program, so please help me out.
This is the code:
#include<iostream>
using std::cout;
int main()
{ float pie;
pie = (22/7);
cout<<"The Value of Pi(22/7) is "<< pie<<"\n";
return 0;
}
and the output is
The Value of Pi(22/7) is 3
Why is the value of Pi not in decimal?
That's because you're doing integer division.
What you want is really float division:
#include<iostream>
using std::cout;
int main()
{
float pie;
pie = float(22)/7;// 22/(float(7)) is also equivalent
cout<<"The Value of Pi(22/7) is "<< pie<<"\n";
return 0;
}
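However, this kind of conversion, float(variable) or float(value), is a function-style cast that behaves like the C-style cast (float)value, so it isn't type safe: in other contexts it can silently perform conversions you did not intend.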
You could have gotten the value you wanted by ensuring that the values you were computing were floating point to begin with as follows:
22.0/7
OR
22/7.0
OR
22.0/7.0
But that's generally a hassle, because you have to keep track of all the types you're working with. Thus, the final and best method is to use static_cast:
static_cast<float>(22)/7
OR
22/static_cast<float>(7)
As for why you should use static_cast - see this:
Why use static_cast<int>(x) instead of (int)x?
pie = (22/7);
Here the division is integer division, because both operands are int.
What you intend to do is floating-point division:
pie = (22.0/7);
Here 22.0 is double, so the division becomes floating-point division (even though 7 is still int).
The rule is that if both operands are of integral type (such as int, long, char, etc.), then it is integer division; otherwise it is floating-point division (i.e. if even a single operand is float or double).
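To make the rule concrete, here is a small sketch. The function names are made up for illustration; the static_assert lines only restate what the compiler already deduces for each expression.

```cpp
#include <type_traits>

// int / int: integer division, the fractional part is discarded before
// the result is converted to float.
inline float pie_integer_division() { return 22 / 7; }

// double / int: the int operand is converted to double first.
inline float pie_double_literal() { return static_cast<float>(22.0 / 7); }

// float / int: the int operand is converted to float first.
inline float pie_static_cast() { return static_cast<float>(22) / 7; }

// The type of each expression is decided by its operands alone,
// not by the variable the result is later assigned to.
static_assert(std::is_same<decltype(22 / 7), int>::value, "int / int is int");
static_assert(std::is_same<decltype(22.0 / 7), double>::value, "double / int is double");
static_assert(std::is_same<decltype(static_cast<float>(22) / 7), float>::value,
              "float / int is float");
```

The first function returns exactly 3, while the other two return roughly 3.14.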
Use:
pie = 22/7.0;
If you give the / operator two integer operands, the division performed is integer division and the result is not a float.
Are there any branch-less or similar hacks for clamping an integer to the interval of 0 to 255, or a double to the interval of 0.0 to 1.0? (Both ranges are meant to be closed, i.e. endpoints are inclusive.)
I'm using the obvious minimum-maximum check:
value = (value < 0 ? 0 : value > 255 ? 255 : value);
but is there a way to get this faster -- similar to the "modulo" clamp value & 255? And is there a way to do similar things with floating points?
I'm looking for a portable solution, so preferably no CPU/GPU-specific stuff please.
This is a trick I use for clamping an int to a 0 to 255 range:
/**
* Clamps the input to a 0 to 255 range.
* @param v any int value
* @return {@code v < 0 ? 0 : v > 255 ? 255 : v}
*/
public static int clampTo8Bit(int v) {
// if out of range
if ((v & ~0xFF) != 0) {
// invert sign bit, shift to fill, then mask (generates 0 or 255)
v = ((~v) >> 31) & 0xFF;
}
return v;
}
That still has one branch, but a handy thing about it is that you can test whether any of several ints are out of range in one go by ORing them together, which makes things faster in the common case that all of them are in range. For example:
/** Packs four 8-bit values into a 32-bit value, with clamping. */
public static int ARGBclamped(int a, int r, int g, int b) {
if (((a | r | g | b) & ~0xFF) != 0) {
a = clampTo8Bit(a);
r = clampTo8Bit(r);
g = clampTo8Bit(g);
b = clampTo8Bit(b);
}
return (a << 24) + (r << 16) + (g << 8) + (b << 0);
}
Note that your compiler may already give you what you want if you code value = min (value, 255). This may be translated into a MIN instruction if it exists, or into a comparison followed by conditional move, such as the CMOVcc instruction on x86.
The following code assumes two's complement representation of integers, which is usually a given today. The conversion from Boolean to integer should not involve branching under the hood, as modern architectures either provide instructions that can directly be used to form the mask (e.g. SETcc on x86 and ISETcc on NVIDIA GPUs), or can apply predication or conditional moves. If all of those are lacking, the compiler may emit a branchless instruction sequence based on arithmetic right shift to construct a mask, along the lines of Boann's answer. However, there is some residual risk that the compiler could do the wrong thing, so when in doubt, it would be best to disassemble the generated binary to check.
int value, mask;
mask = 0 - (value > 255); // mask = all 1s if value > 255, all 0s otherwise
value = (255 & mask) | (value & ~mask);
On many architectures, use of the ternary operator ?: can also result in branchless instruction sequences. The hardware may support select-type instructions, which are essentially the hardware equivalent of the ternary operator, such as ICMP on NVIDIA GPUs. Or it may provide CMOV (conditional move) as on x86, or predication as on ARM, both of which can be used to implement branchless code for ternary operators. As in the previous case, one would want to examine the disassembled binary code to be absolutely sure the resulting code is without branches.
int value;
value = (value > 255) ? 255 : value;
In case of floating-point operands, modern floating-point units typically provide FMIN and FMAX instructions which map straight to the C/C++ standard math functions fmin() and fmax(). Alternatively fmin() and fmax() may be translated into a comparison followed by a conditional move. Again, it would be prudent to examine the generated code to make sure it is branchless.
double value;
value = fmax (fmin (value, 1.0), 0.0);
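As a self-contained sketch of the floating-point case (clamp01 is just an illustrative name), note one property that follows from how fmin() and fmax() are specified: when exactly one argument is NaN they return the other argument, so a NaN input comes out as 1.0 here. This assumes the code is not compiled with fast-math style options that relax NaN handling.

```cpp
#include <cmath>

// Clamp to the closed interval [0.0, 1.0] using the standard math functions.
// Whether this compiles to branchless min/max instructions depends on the
// compiler and target, so check the generated code if that matters.
inline double clamp01(double value) {
    return std::fmax(std::fmin(value, 1.0), 0.0);
}
```

If NaN should be propagated or mapped to 0.0 instead, it needs an explicit check before the clamp.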
I use this thing, 100% branchless.
int clampU8(int val)
{
val &= (val<0)-1; // clamp < 0
val |= -(val>255); // clamp > 255
return val & 0xFF; // mask out
}
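With bit tricks like this it is cheap to check them against the obvious version before trusting them. Below is a sketch of such a check; clamp_reference, clamp_branchless and clamps_agree are names invented here, and the branchless body is the same as above. It assumes two's complement integers.

```cpp
// Reference: the obvious minimum-maximum check.
inline int clamp_reference(int v) { return v < 0 ? 0 : v > 255 ? 255 : v; }

// The branchless variant, same logic as clampU8 above.
inline int clamp_branchless(int val) {
    val &= (val < 0) - 1;   // negative input: mask is 0, so val becomes 0
    val |= -(val > 255);    // input above 255: mask is all ones
    return val & 0xFF;      // all ones becomes 255, in-range values pass through
}

// Returns true if both versions agree on every input in [lo, hi].
// The loop counter is wider than int so that hi == INT_MAX cannot overflow it.
inline bool clamps_agree(int lo, int hi) {
    for (long long v = lo; v <= hi; ++v) {
        const int x = static_cast<int>(v);
        if (clamp_reference(x) != clamp_branchless(x)) return false;
    }
    return true;
}
```

Calling clamps_agree(-100000, 100000) covers both boundaries with plenty of margin; the full int range is also feasible, it just takes a few seconds.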
For those using C#, Kotlin or Java this is the best I could do, it's nice and succinct if somewhat cryptic:
(x & ~(x >> 31) | 255 - x >> 31) & 255
It only works on signed integers so that might be a blocker for some.
For clamping doubles, I'm afraid there's no language/platform agnostic solution.
The problem with floating point is that compilers offer modes ranging from the fastest operations (MSVC /fp:fast, gcc -funsafe-math-optimizations) to fully precise and safe (MSVC /fp:strict, gcc -frounding-math -fsignaling-nans). In fully precise mode the compiler does not try to use any bit hacks, even if it could.
A solution that manipulates the bits of a double cannot be portable. The endianness may differ, there may be no (efficient) way to get at the bits of a double, and double is not necessarily IEEE 754 binary64 after all. In addition, direct bit manipulation will not raise signals for signaling NaNs where they are expected.
For integers most likely the compiler will do it right anyway, otherwise there are already good answers given.
I need to implement the following, but I am not sure how, as I am completely new to this: a function called get_values that has the prototype:
void get_values(unsigned int value, unsigned int *p_lsb, unsigned int *p_msb,
unsigned int *p_combined)
The function computes the least significant byte and the most significant byte of the value
parameter. In addition, both values are combined. For this problem:
a. You may not use any loop constructs.
b. You may not use the multiplication operator (* or *=).
c. Your code must work for unsigned integers of any size (4 bytes, 8 bytes, etc.).
d. To combine the values, append the least significant byte to the most significant one.
e. Your implementation should be efficient.
The following driver (and associated output) provides an example of using the function you are
expected to write. Notice that in this example an unsigned int is 4 bytes, but your function
needs to work with an unsigned int of any size.
Driver
int main() {
unsigned int value = 0xabcdfaec, lsb, msb, combined;
get_values(value, &lsb, &msb, &combined);
printf("Value: %x, lsb: %x, msb: %x, combined: %x\n", value, lsb, msb, combined);
return 0;
}
Output
Value: abcdfaec, lsb: ec, msb: ab, combined: abec
I think you want to look into the bitwise AND and bit-shifting operators. The last piece of the puzzle might be the sizeof operator, if the question is asking for code that works on platforms with differently sized int types.
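A minimal sketch of how those pieces could fit together, not necessarily what the exercise expects. It assumes 8-bit bytes, which lets the byte count be turned into a bit count with << 3 instead of the forbidden multiplication.

```cpp
#include <climits>

void get_values(unsigned int value, unsigned int *p_lsb, unsigned int *p_msb,
                unsigned int *p_combined) {
    static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

    // Number of bits to shift so the top byte lands in the low 8 bits:
    // (sizeof(value) - 1) bytes, converted to bits with << 3.
    const unsigned int shift = static_cast<unsigned int>(sizeof(value) - 1) << 3;

    *p_lsb = value & 0xFFu;             // mask off the lowest byte
    *p_msb = (value >> shift) & 0xFFu;  // move the highest byte down, then mask
    *p_combined = (*p_msb << 8) | *p_lsb;  // msb followed by lsb
}
```

With the driver from the question and a 4-byte unsigned int, this gives lsb ec, msb ab and combined abec.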
I have a string like "2.1648797E -05" and I need to convert it to the format "0.00021648797".
Is there any way to do this conversion?
try to use double or long long
cout << setiosflags(ios::fixed) << thefloat << endl;
An important characteristic of floating-point values is that, for large values, they do not have precision in all the significant figures back to the decimal point. The "scientific" display reasonably reflects the inherent internal storage realities.
In C++ you can use std::stringstream. First print the number, then read it as a double, and then print it using format specifiers to set the precision of the number to 12 digits. Take a look at this question for how to print a decimal number with fixed precision.
If you are really just going from string representation to string representation and precision is very important or values may leave the valid range for doubles then I would avoid converting to a double.
Your value may get altered by that due to precision errors or range problems.
Try writing a simple text parser, roughly like this:
Read the digits, omitting the decimal point up to the 'E' but store the decimal point position.
After the 'E' read the exponent as a number and add that to your stored decimal position.
Then output the digits again properly appending zeros at beginning or end and inserting the decimal point.
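The steps above could be sketched as follows. This is only an illustration: to_fixed_notation is a made-up name, spaces are skipped, characters other than digits, sign, '.', and 'e'/'E' are ignored, and a malformed exponent makes std::stol throw.

```cpp
#include <cstddef>
#include <string>

// Converts text such as "2.1648797E -05" to plain decimal notation
// without ever going through a floating-point type.
std::string to_fixed_notation(const std::string& input) {
    std::string digits;   // mantissa digits with the decimal point removed
    std::string sign;
    long point_pos = -1;  // number of digits before the decimal point
    std::size_t i = 0;

    // Read the mantissa up to the 'E', remembering where the point was.
    for (; i < input.size(); ++i) {
        const char c = input[i];
        if (c == 'e' || c == 'E') break;
        if (c == '-') sign = "-";
        else if (c == '.') point_pos = static_cast<long>(digits.size());
        else if (c >= '0' && c <= '9') digits += c;
    }
    if (digits.empty()) digits = "0";
    if (point_pos < 0) point_pos = static_cast<long>(digits.size());

    // Read the exponent (spaces removed) and add it to the point position.
    std::string exp_text;
    for (++i; i < input.size(); ++i) {
        if (input[i] != ' ') exp_text += input[i];
    }
    if (!exp_text.empty()) point_pos += std::stol(exp_text);

    // Drop leading zeros so that "0.5E1" comes out as "5", not "05".
    while (digits.size() > 1 && digits[0] == '0') {
        digits.erase(0, 1);
        --point_pos;
    }

    // Write the digits back out, padding with zeros where needed.
    const long n = static_cast<long>(digits.size());
    std::string result;
    if (point_pos <= 0) {
        result = "0." + std::string(static_cast<std::size_t>(-point_pos), '0') + digits;
    } else if (point_pos >= n) {
        result = digits + std::string(static_cast<std::size_t>(point_pos - n), '0');
    } else {
        result = digits.substr(0, static_cast<std::size_t>(point_pos)) + "." +
                 digits.substr(static_cast<std::size_t>(point_pos));
    }
    return sign + result;
}
```

Because only characters are moved around, every digit of the input survives exactly, which is the point of avoiding double here.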
There are some unclear issues here:
1. Was the space in "2.1648797E -05" intended? Let's assume it is OK.
2. 2.1648797E-05 is 10 times smaller than 0.00021648797. Assume OP meant "0.000021648797" (another zero).
3. Windows is not tagged, but OP posted a Windows answer.
The major challenge here, and I think the OP's core question, is that the stream precision (std::ios_base::precision(), or std::setprecision) has different meanings in fixed versus default notation, and the OP wants the default meaning in fixed.
The precision field differs between fixed and default floating-point notation. In default notation, the precision field specifies the maximum number of useful digits to display both before and after the decimal point, possibly using scientific notation, while in fixed notation, the precision field specifies exactly how many digits to display after the decimal point.
There are two approaches to solve this. The first is to convert the input string to a number and then output the number in the new fixed format; that is presented below. The second is to parse the input string and build the new format directly; that is not done here.
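The difference is easy to see by formatting the same value both ways. A small sketch, with helper names chosen here for illustration:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Default notation: precision is the maximum number of significant digits.
std::string format_default(double x, int precision) {
    std::ostringstream out;
    out << std::setprecision(precision) << x;
    return out.str();
}

// Fixed notation: precision is the number of digits after the decimal point.
std::string format_fixed(double x, int precision) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(precision) << x;
    return out.str();
}
```

For 2.1648797e-05 with a precision of 6, the default notation keeps six significant digits in scientific form, while fixed notation shows six decimals and rounds to 0.000022. Raising the fixed precision to 12 gives 0.000021648797.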
#include <algorithm> // std::remove
#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>
#include <cmath>
#include <cfloat>
double ConvertStringWithSpaceToDouble(std::string s) {
// Get rid of pesky space in "2.1648797E -05"
s.erase (std::remove (s.begin(), s.end(), ' '), s.end());
std::istringstream i(s);
double x;
if (!(i >> x)) {
x = 0; // handle error;
}
std::cout << x << std::endl;
return x;
}
std::string ConvertDoubleToString(double x) {
std::ostringstream s;
double fraction = fabs(modf(x, &x));
s.precision(0);
s.setf(std::ios::fixed);
// stream whole number part
s << x << '.';
// Threshold becomes non-zero once a non-zero digit found.
// Its level increases with each additional digit streamed to prevent excess trailing zeros.
double threshold = 0.0;
while (fraction > threshold) {
double digit;
fraction = modf(fraction*10, &digit);
s << digit;
if (threshold) {
threshold *= 10.0;
}
else if (digit > 0) {
// Use DBL_DIG to define number of interesting digits
threshold = pow(10, -DBL_DIG);
}
}
return s.str();
}
int main(int argc, char* argv[]){
std::string s("2.1648797E -05");
double x = ConvertStringWithSpaceToDouble(s);
s = ConvertDoubleToString(x);
std::cout << s << std::endl;
return 0;
}
Thanks guys, I fixed it using:
Decimal dec = Decimal.Parse(str, System.Globalization.NumberStyles.Any);
I have written an OpenCL kernel that uses OpenCL-OpenGL interoperability to read vertices and indices, but this is probably not even important, because I am just doing simple pointer addition in order to get a specific vertex by index.
uint pos = (index + base)*stride;
Here I am calculating the absolute position in bytes. In my example pos is 28,643,328 with a stride of 28, index = 0 and base = 1,022,976. Well, that seems correct.
Unfortunately, I can't use vload3 directly because the offset parameter isn't calculated as an absolute address in bytes. So I just add pos to the pointer void* vertices_gl:
void* new_addr = vertices_gl+pos;
new_addr is in my example = 0x2f90000 and this is where the strange part begins,
vertices_gl = 0x303f000
The result (new_addr) should be 0x4B90000 (0x303f000 + 28,643,328)
I don't understand why the address vertices_gl is getting decreased by 716,800 (0xAF000).
I'm targeting the GPU: AMD Radeon HD5830
Ps: for those wondering, I am using a printf to get these values :) ( couldn't get CodeXL working)
There is no pointer arithmetic for void* pointers. Use char* pointers to perform byte-wise pointer computations.
Or a lot better than that: Use the real type the pointer is pointing to, and don't multiply offsets. Simply write vertex[index+base] assuming vertex points to your type containing 28 bytes of data.
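To illustrate the two options on the host side, here is a plain C++ sketch (not OpenCL C). The Vertex struct and both helper functions are hypothetical; the struct simply stands in for a tightly packed 28-byte vertex.

```cpp
#include <cstddef>

// Hypothetical vertex layout: 7 floats, 28 bytes with 4-byte floats.
struct Vertex { float data[7]; };
static_assert(sizeof(Vertex) == 28, "this sketch assumes 4-byte floats");

// Byte-wise addressing: convert to char* first, then add a byte offset.
inline const void* vertex_address_bytes(const void* buffer, std::size_t index,
                                        std::size_t base, std::size_t stride) {
    const char* bytes = static_cast<const char*>(buffer);
    return bytes + (index + base) * stride;
}

// Typed addressing: the compiler scales the index by sizeof(Vertex) itself,
// so no manual multiplication by the stride is needed.
inline const Vertex* vertex_address_typed(const Vertex* buffer, std::size_t index,
                                          std::size_t base) {
    return &buffer[index + base];
}
```

Both functions yield the same address when stride equals sizeof(Vertex); the typed version just leaves less room for mistakes.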
Performance consideration: Align your vertex attributes to a power of two for coalesced memory access. This means, add 4 bytes of padding after each vertex entry. To automatically do this, use float8 as the vertex type if your attributes are all floating point values. I assume you work with position and normal data or something similar, so it might be a good idea to write a custom struct which encapsulates both vectors in a convenient and self-explaining way:
// Defining a type for the vertex data. This is 32 bytes large.
// You can share this code in a header for inclusion in both OpenCL and C / C++!
typedef struct {
float4 pos;
float4 normal;
} VertexData;
// Example kernel
__kernel void computeNormalKernel(__global VertexData *vertex, uint base) {
uint index = get_global_id(0);
VertexData thisVertex = vertex[index+base]; // It can't be simpler!
thisVertex.normal = computeNormal(...); // Like you'd do it in C / C++!
vertex[index+base] = thisVertex; // Of course also when writing
}
Note: This code doesn't work with your stride of 28 if you just change one of the float4s to a float3, since float3 also consumes 4 floats of memory. But you can write it like this, which will not add padding (but note that this will penalize memory access bandwidth):
typedef struct {
float pos[4];
float normal[3]; // Assuming you want 3 floats here
} VertexData;
I saw somewhere that this is a special case and that +NaN goes from 0x7F800001 to 0x7FFFFFFF. Is the answer +NaN?
If you interpret 7FFFFFFF as an IEEE754 32-bit float then yes, 7FFFFFFF is NaN. You can understand these things from looking at the Wikipedia page for Single-precision floating-point format. I wrote this little C program to illustrate the point:
#include <stdio.h>
int main(){
unsigned u0 = 0x7FFFFFFF;
unsigned u1 = 0x7F800001;
unsigned u2 = 0x7F800000;
unsigned u3 = 0x7F7FFFFF;
// *(float*)&u0 causes the data stored in u0 to be interpreted as a float
printf("%e\n", *(float*)&u0); // This gives nan
printf("%e\n", *(float*)&u1); // This also gives nan
printf("%e\n", *(float*)&u2); // This gives inf
printf("%e\n", *(float*)&u3); // This gives 3.402823e+38, the largest possible IEEE754 32-bit float
// The above code only works because sizeof(unsigned)==sizeof(float)
printf("%zu\t%zu\n", sizeof(unsigned), sizeof(float)); // sizeof yields size_t, so use %zu
// Remember that nan is only for floats, u0 is a perfectly valid unsigned.
printf("%u\n", u0); // This gives 2147483647
}
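One caveat about the program above: reading an unsigned through a float* breaks the strict aliasing rules, so it is not guaranteed to work with every compiler and optimization level. A sketch of the same reinterpretation using memcpy, which is well defined (float_from_bits is a name chosen here, and a 32-bit IEEE 754 float is assumed):

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterprets a 32-bit pattern as a float by copying the bytes.
inline float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

std::isnan(float_from_bits(0x7FFFFFFF)) and std::isinf(float_from_bits(0x7F800000)) then confirm the classification without depending on how printf prints these values.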
Again, it has to be mentioned that NaN only exists as a floating point number.
+NaN is a special value for floating point numbers (And it has no decimal equivalent. It's "Not a Number").
If you just want the decimal representation of the integer whose hexadecimal representation is 7FFFFFFF, there's no floating point involved, and no +NaN.