I'm continuing with a port of GeographicLib into F#, and I'm wondering about the use of the error-free sum. In the C++ codebase, it is defined as
/**
* The error-free sum of two numbers.
*
* #tparam T the type of the argument and the returned value.
* #param[in] u
* #param[in] v
* #param[out] t the exact error given by (\e u + \e v) - \e s.
* #return \e s = round(\e u + \e v).
*
* See D. E. Knuth, TAOCP, Vol 2, 4.2.2, Theorem B. (Note that \e t can be
* the same as one of the first two arguments.)
**********************************************************************/
template<typename T> static inline T sum(T u, T v, T& t) {
GEOGRAPHICLIB_VOLATILE T s = u + v;
GEOGRAPHICLIB_VOLATILE T up = s - v;
GEOGRAPHICLIB_VOLATILE T vpp = s - up;
up -= u;
vpp -= v;
t = -(up + vpp);
// u + v = s + t
// = round(u + v) + t
return s;
}
For reference, the GEOGRAPHICLIB_VOLATILE constant gives you the ability to use the volatile keyword or not to use it, depending on your architecture and precision.
I must confess, I haven't read the Knuth book, and I don't have a copy to hand, so I can't read Theorem B, but the description is clear enough: the function is designed to return the sum and the floating point error of the sum.
My question is: are there any optimisations in the .NET Framework that I should know about before I implement this directly? Does F# manipulate floating point numbers in the same way as C++?
Related
I was trying to figure this out for quite some time already, but I can't get it quite right. What I want to do is to round a float towards the nearest integer, based on a different float.
I basically need a function that should work like this:
float roundParam(float val, float dir)
{
if (dir >= 0)
return ceil(val);
else
return floor(val);
}
This is of course VERY inefficient, as it requires a branch per vector component. I figured this out, but it breaks for integers:
float roundParam(float val, float dir)
{
return round(val + 0.5 * sign(dir));
}
Thanks to #wim and his observation that floor(x) = -ceil(-x) and ceil(x) = -floor(-x) I was able to create this function that solved the problem:
float3 roundParam(float3 val, float3 dir)
{
float3 dirSign = sign(dir);
return dirSign * floor(dirSign * val) + dirSign;
}
In C you can use the following well vectorizable function. Maybe you can use the same idea in hlsl. This solution is only suitable if you don't care about the difference between +0 and -0 (signed zero) for dir.
float roundParam_v2(float val, float dir)
{
union fl_i32{float f; int i;} x, y, d;
x.f = val;
d.f = dir;
d.i = d.i & 0x80000000; /* extract the sign bit */
x.i = x.i ^ d.i; /* multiply x 1.0f if signbit is set */
y.f = ceilf(x.f); /* note that floor(z) = - ceil( -z) */
y.i = y.i ^ d.i; /* multiply x 1.0f if signbit is set */
return y.f;
}
How about:
float roundParam(float val, float dir)
{
return ceil(val)*(float)(dir>=0)+floor(val)*(float)(dir<0);
}
It can be probably be further optimized, but that optimization is probably already made by the compiler.
Btw, if you add the [flatten] tag to the if conditional, it probably already gets optimized by the compiler. And for such a simple branch, it is most probably already flattened by the compiler whether you tag it or not.
It would be interesting to check the compiled code and see if the branch has already been removed. I’m currently afk so I cannot check...
Perhaps use an array of pointers selected by dir?
Below is C. Unclear how usable such an approach is in hsl, shader
float roundParam(float val, float dir) {
static float (*f[2])(float) = {ceilf, floorf};
return f[!!signbit(dir)](val);
}
Are there any branch-less or similar hacks for clamping an integer to the interval of 0 to 255, or a double to the interval of 0.0 to 1.0? (Both ranges are meant to be closed, i.e. endpoints are inclusive.)
I'm using the obvious minimum-maximum check:
int value = (value < 0? 0 : value > 255? 255 : value);
but is there a way to get this faster -- similar to the "modulo" clamp value & 255? And is there a way to do similar things with floating points?
I'm looking for a portable solution, so preferably no CPU/GPU-specific stuff please.
This is a trick I use for clamping an int to a 0 to 255 range:
/**
* Clamps the input to a 0 to 255 range.
* #param v any int value
* #return {#code v < 0 ? 0 : v > 255 ? 255 : v}
*/
public static int clampTo8Bit(int v) {
// if out of range
if ((v & ~0xFF) != 0) {
// invert sign bit, shift to fill, then mask (generates 0 or 255)
v = ((~v) >> 31) & 0xFF;
}
return v;
}
That still has one branch, but a handy thing about it is that you can test whether any of several ints are out of range in one go by ORing them together, which makes things faster in the common case that all of them are in range. For example:
/** Packs four 8-bit values into a 32-bit value, with clamping. */
public static int ARGBclamped(int a, int r, int g, int b) {
if (((a | r | g | b) & ~0xFF) != 0) {
a = clampTo8Bit(a);
r = clampTo8Bit(r);
g = clampTo8Bit(g);
b = clampTo8Bit(b);
}
return (a << 24) + (r << 16) + (g << 8) + (b << 0);
}
Note that your compiler may already give you what you want if you code value = min (value, 255). This may be translated into a MIN instruction if it exists, or into a comparison followed by conditional move, such as the CMOVcc instruction on x86.
The following code assumes two's complement representation of integers, which is usually a given today. The conversion from Boolean to integer should not involve branching under the hood, as modern architectures either provide instructions that can directly be used to form the mask (e.g. SETcc on x86 and ISETcc on NVIDIA GPUs), or can apply predication or conditional moves. If all of those are lacking, the compiler may emit a branchless instruction sequence based on arithmetic right shift to construct a mask, along the lines of Boann's answer. However, there is some residual risk that the compiler could do the wrong thing, so when in doubt, it would be best to disassemble the generated binary to check.
int value, mask;
mask = 0 - (value > 255); // mask = all 1s if value > 255, all 0s otherwise
value = (255 & mask) | (value & ~mask);
On many architectures, use of the ternary operator ?: can also result in a branchless instruction sequences. The hardware may support select-type instructions which are essentially the hardware equivalent of the ternary operator, such as ICMP on NVIDIA GPUs. Or it provides CMOV (conditional move) as in x86, or predication as on ARM, both of which can be used to implement branch-less code for ternary operators. As in the previous case, one would want to examine the disassembled binary code to be absolutely sure the resulting code is without branches.
int value;
value = (value > 255) ? 255 : value;
In case of floating-point operands, modern floating-point units typically provide FMIN and FMAX instructions which map straight to the C/C++ standard math functions fmin() and fmax(). Alternatively fmin() and fmax() may be translated into a comparison followed by a conditional move. Again, it would be prudent to examine the generated code to make sure it is branchless.
double value;
value = fmax (fmin (value, 1.0), 0.0);
I use this thing, 100% branchless.
int clampU8(int val)
{
val &= (val<0)-1; // clamp < 0
val |= -(val>255); // clamp > 255
return val & 0xFF; // mask out
}
For those using C#, Kotlin or Java this is the best I could do, it's nice and succinct if somewhat cryptic:
(x & ~(x >> 31) | 255 - x >> 31) & 255
It only works on signed integers so that might be a blocker for some.
For clamping doubles, I'm afraid there's no language/platform agnostic solution.
The problem with floating point that they have options from fastest operations (MSVC /fp:fast, gcc -funsafe-math-optimizations) to fully precise and safe (MSVC /fp:strict, gcc -frounding-math -fsignaling-nans). In fully precise mode the compiler does not try to use any bit hacks, even if they could.
A solution that manipulates double bits cannot be portable. There may be different endianness, also there may be no (efficient) way to get double bits, double is not necessarily IEEE 754 binary64 after all. Plus direct manipulations will not cause signals for signaling NANs, when they are expected.
For integers most likely the compiler will do it right anyway, otherwise there are already good answers given.
I want to generate something like VisualHostKey for a SHA checksum. But it should work with any hexadecimal checksum.
The generated artifact could be an ASCII art, a 2D colour palette, or just some random garbage in a PNG. Personally I like the VisualHostKey approach but I am open for suggestions.
The idea is to be able to quickly identify that two checksums are the same using just the human eye. And when faced with a bunch of sums, quickly find the one you are looking for.
You could just use the actual OpenSSH VisualHostKey code, which is found in the key_fingerprint_randomart() function in the key.c file in the OpenSSH source code. The algorithm is fairly simple, and can take any array of bytes as input. In OpenSSH, the input is a cryptographic hash of the key; you could do the same.
(As defined in the OpenSSH source code, the function also takes a pointer to the key structure itself, but that's only used to print the type and size of the key at the top of the picture.)
In fact, since the code is freely licensed, let me just include a copy here. This is extracted from OpenSSH 6.1, $OpenBSD: key.c,v 1.99 2012/05/23 03:28:28 djm Exp $:
/*
* Copyright (c) 2000, 2001 Markus Friedl. All rights reserved.
* Copyright (c) 2008 Alexander von Gernler. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
* IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/*
* Draw an ASCII-Art representing the fingerprint so human brain can
* profit from its built-in pattern recognition ability.
* This technique is called "random art" and can be found in some
* scientific publications like this original paper:
*
* "Hash Visualization: a New Technique to improve Real-World Security",
* Perrig A. and Song D., 1999, International Workshop on Cryptographic
* Techniques and E-Commerce (CrypTEC '99)
* sparrow.ece.cmu.edu/~adrian/projects/validation/validation.pdf
*
* The subject came up in a talk by Dan Kaminsky, too.
*
* If you see the picture is different, the key is different.
* If the picture looks the same, you still know nothing.
*
* The algorithm used here is a worm crawling over a discrete plane,
* leaving a trace (augmenting the field) everywhere it goes.
* Movement is taken from dgst_raw 2bit-wise. Bumping into walls
* makes the respective movement vector be ignored for this turn.
* Graphs are not unambiguous, because circles in graphs can be
* walked in either direction.
*/
/*
* Field sizes for the random art. Have to be odd, so the starting point
* can be in the exact middle of the picture, and FLDBASE should be >=8 .
* Else pictures would be too dense, and drawing the frame would
* fail, too, because the key type would not fit in anymore.
*/
#define FLDBASE 8
#define FLDSIZE_Y (FLDBASE + 1)
#define FLDSIZE_X (FLDBASE * 2 + 1)
static char *
key_fingerprint_randomart(u_char *dgst_raw, u_int dgst_raw_len, const Key *k)
{
/*
* Chars to be used after each other every time the worm
* intersects with itself. Matter of taste.
*/
char *augmentation_string = " .o+=*BOX#%&#/^SE";
char *retval, *p;
u_char field[FLDSIZE_X][FLDSIZE_Y];
u_int i, b;
int x, y;
size_t len = strlen(augmentation_string) - 1;
retval = xcalloc(1, (FLDSIZE_X + 3) * (FLDSIZE_Y + 2));
/* initialize field */
memset(field, 0, FLDSIZE_X * FLDSIZE_Y * sizeof(char));
x = FLDSIZE_X / 2;
y = FLDSIZE_Y / 2;
/* process raw key */
for (i = 0; i < dgst_raw_len; i++) {
int input;
/* each byte conveys four 2-bit move commands */
input = dgst_raw[i];
for (b = 0; b < 4; b++) {
/* evaluate 2 bit, rest is shifted later */
x += (input & 0x1) ? 1 : -1;
y += (input & 0x2) ? 1 : -1;
/* assure we are still in bounds */
x = MAX(x, 0);
y = MAX(y, 0);
x = MIN(x, FLDSIZE_X - 1);
y = MIN(y, FLDSIZE_Y - 1);
/* augment the field */
if (field[x][y] < len - 2)
field[x][y]++;
input = input >> 2;
}
}
/* mark starting point and end point*/
field[FLDSIZE_X / 2][FLDSIZE_Y / 2] = len - 1;
field[x][y] = len;
/* fill in retval */
snprintf(retval, FLDSIZE_X, "+--[%4s %4u]", key_type(k), key_size(k));
p = strchr(retval, '\0');
/* output upper border */
for (i = p - retval - 1; i < FLDSIZE_X; i++)
*p++ = '-';
*p++ = '+';
*p++ = '\n';
/* output content */
for (y = 0; y < FLDSIZE_Y; y++) {
*p++ = '|';
for (x = 0; x < FLDSIZE_X; x++)
*p++ = augmentation_string[MIN(field[x][y], len)];
*p++ = '|';
*p++ = '\n';
}
/* output lower border */
*p++ = '+';
for (i = 0; i < FLDSIZE_X; i++)
*p++ = '-';
*p++ = '+';
return retval;
}
It doesn't seem to have an significant dependencies on the rest of the OpenSSH code, aside from the const Key *k parameter, which is only used on one line as an argument to the key_type() and key_size() functions (or macros?). The nonstandard types u_char and u_int appear to be just aliases for unsigned char and unsigned int respectively, and the xcalloc() function seems to be just a replacement or wrapper for the standard calloc().
Generally this is accomplished by making an image generation function of some kind which accepts a seed. You then hash some data and then seed your image generator with the result. This will prevent it from making random garbage in a PNG and will give you something distinguishable.
We are using MySQL and developing an application where we'd like the ID sequence not to be publicly visible... the IDs are hardly top secret and there is no significant issue if someone indeed was able to decode them.
So, a hash is of course the obvious solution, we are currently using MD5... 32bit integers go in, and we trim the MD5 to 64bits and then store that. However, we have no idea how likely collisions are when you trim like this (especially since all numbers come from autoincrement or the current time). We currently check for collisions, but since we may be inserting 100.000 rows at once the performance is terrible (can't bulk insert).
But in the end, we really don't need the security offered by the hashes and they consume unnecessary space and also require an additional index... so, is there any simple and good enough function/algorithm out there that guarantees one-to-one mapping for any number without obvious visual patterns for sequential numbers?
EDIT: I'm using PHP which does not support integer arithmetic by default, but after looking around I found that it could be cheaply replicated with bitwise operators. Code for 32bit integer multiplication can be found here: http://pastebin.com/np28xhQF
You could simply XOR with 0xDEADBEEF, if that's good enough.
Alternatively multiply by an odd number mod 2^32. For the inverse mapping just multiply by the multiplicative inverse
Example: n = 2345678901; multiplicative inverse (mod 2^32): 2313902621
For the mapping just multiply by 2345678901 (mod 2^32):
1 --> 2345678901
2 --> 396390506
For the inverse mapping, multiply by 2313902621.
If you want to ensure a 1:1 mapping then use an encryption (i.e. a permutation), not a hash. Encryption has to be 1:1 because it can be decrypted.
If you want 32 bit numbers then use Hasty Pudding Cypher or just write a simple four round Feistel cypher.
Here's one I prepared earlier:
import java.util.Random;
/**
* IntegerPerm is a reversible keyed permutation of the integers.
* This class is not cryptographically secure as the F function
* is too simple and there are not enough rounds.
*
* #author Martin Ross
*/
public final class IntegerPerm {
//////////////////
// Private Data //
//////////////////
/** Non-zero default key, from www.random.org */
private final static int DEFAULT_KEY = 0x6CFB18E2;
private final static int LOW_16_MASK = 0xFFFF;
private final static int HALF_SHIFT = 16;
private final static int NUM_ROUNDS = 4;
/** Permutation key */
private int mKey;
/** Round key schedule */
private int[] mRoundKeys = new int[NUM_ROUNDS];
//////////////////
// Constructors //
//////////////////
public IntegerPerm() { this(DEFAULT_KEY); }
public IntegerPerm(int key) { setKey(key); }
////////////////////
// Public Methods //
////////////////////
/** Sets a new value for the key and key schedule. */
public void setKey(int newKey) {
assert (NUM_ROUNDS == 4) : "NUM_ROUNDS is not 4";
mKey = newKey;
mRoundKeys[0] = mKey & LOW_16_MASK;
mRoundKeys[1] = ~(mKey & LOW_16_MASK);
mRoundKeys[2] = mKey >>> HALF_SHIFT;
mRoundKeys[3] = ~(mKey >>> HALF_SHIFT);
} // end setKey()
/** Returns the current value of the key. */
public int getKey() { return mKey; }
/**
* Calculates the enciphered (i.e. permuted) value of the given integer
* under the current key.
*
* #param plain the integer to encipher.
*
* #return the enciphered (permuted) value.
*/
public int encipher(int plain) {
// 1 Split into two halves.
int rhs = plain & LOW_16_MASK;
int lhs = plain >>> HALF_SHIFT;
// 2 Do NUM_ROUNDS simple Feistel rounds.
for (int i = 0; i < NUM_ROUNDS; ++i) {
if (i > 0) {
// Swap lhs <-> rhs
final int temp = lhs;
lhs = rhs;
rhs = temp;
} // end if
// Apply Feistel round function F().
rhs ^= F(lhs, i);
} // end for
// 3 Recombine the two halves and return.
return (lhs << HALF_SHIFT) + (rhs & LOW_16_MASK);
} // end encipher()
/**
* Calculates the deciphered (i.e. inverse permuted) value of the given
* integer under the current key.
*
* #param cypher the integer to decipher.
*
* #return the deciphered (inverse permuted) value.
*/
public int decipher(int cypher) {
// 1 Split into two halves.
int rhs = cypher & LOW_16_MASK;
int lhs = cypher >>> HALF_SHIFT;
// 2 Do NUM_ROUNDS simple Feistel rounds.
for (int i = 0; i < NUM_ROUNDS; ++i) {
if (i > 0) {
// Swap lhs <-> rhs
final int temp = lhs;
lhs = rhs;
rhs = temp;
} // end if
// Apply Feistel round function F().
rhs ^= F(lhs, NUM_ROUNDS - 1 - i);
} // end for
// 4 Recombine the two halves and return.
return (lhs << HALF_SHIFT) + (rhs & LOW_16_MASK);
} // end decipher()
/////////////////////
// Private Methods //
/////////////////////
// The F function for the Feistel rounds.
private int F(int num, int round) {
// XOR with round key.
num ^= mRoundKeys[round];
// Square, then XOR the high and low parts.
num *= num;
return (num >>> HALF_SHIFT) ^ (num & LOW_16_MASK);
} // end F()
} // end class IntegerPerm
Do what Henrik said in his second suggestion. But since these values seem to be used by people (else you wouldn't want to randomize them). Take one additional step. Multiply the sequential number by a large prime and reduce mod N where N is a power of 2. But choose N to be 2 bits smaller than you can store. Next, multiply the result by 11 and use that. So we have:
Hash = ((count * large_prime) % 536870912) * 11
The multiplication by 11 protects against most data entry errors - if any digit is typed wrong, the result will not be a multiple of 11. If any 2 digits are transposed, the result will not be a multiple of 11. So as a preliminary check of any value entered, you check if it's divisible by 11 before even looking in the database.
You can use mod operation for big prime number.
your number * big prime number 1 / big prime number 2.
Prime number 1 should be bigger than second. Seconds should be close to 2^32 but less than it. Than it will be hard to substitute.
Prime 1 and Prime 2 should be constants.
For our application, we use bit shuffle to generate the ID. It is very easy to reverse back to the original ID.
func (m Meeting) MeetingCode() uint {
hashed := (m.ID + 10000000) & 0x00FFFFFF
chunks := [24]uint{}
for i := 0; i < 24; i++ {
chunks[i] = hashed >> i & 0x1
}
shuffle := [24]uint{14, 1, 15, 21, 0, 6, 5, 10, 4, 3, 20, 22, 2, 23, 8, 13, 19, 9, 18, 12, 7, 11, 16, 17}
result := uint(0)
for i := 0; i < 24; i++ {
result = result | (chunks[shuffle[i]] << i)
}
return result
}
There is an exceedingly simple solution that none have posted, even though an answer has been selected I highly advise any visiting this question to consider the nature of binary representations, and the application of modulos arithmetic.
Given an finite range of integers, all the values can be permuted in any order through a simple addition over their index while bound by the range of the index through a modulos. You could even leverage simple integer overflow such that using the modulos operator is not even necessary.
Essentially, you'd have a static variable in memory, where a function when called increments the static variable by some constant, enforces the boundaries, and then returns the value. This output could be an index over a collection of desired outputs, or the desired output itself
The constant of the increment that defines the mapping may be several times the size in memory of the value being returned, but given any mapping there exists some finite constant that will achieve the mapping through a trivial modulos arithmetic.
I want to calculate the volume of a 3D mesh object having a surface made up triangles.
Reading this paper, it is actually a pretty simple calculation.
The trick is to calculate the signed volume of a tetrahedron - based on your triangle and topped off at the origin. The sign of the volume comes from whether your triangle is pointing in the direction of the origin. (The normal of the triangle is itself dependent upon the order of your vertices, which is why you don't see it explicitly referenced below.)
This all boils down to the following simple function:
public float SignedVolumeOfTriangle(Vector p1, Vector p2, Vector p3) {
var v321 = p3.X*p2.Y*p1.Z;
var v231 = p2.X*p3.Y*p1.Z;
var v312 = p3.X*p1.Y*p2.Z;
var v132 = p1.X*p3.Y*p2.Z;
var v213 = p2.X*p1.Y*p3.Z;
var v123 = p1.X*p2.Y*p3.Z;
return (1.0f/6.0f)*(-v321 + v231 + v312 - v132 - v213 + v123);
}
and then a driver to calculate the volume of the mesh:
public float VolumeOfMesh(Mesh mesh) {
var vols = from t in mesh.Triangles
select SignedVolumeOfTriangle(t.P1, t.P2, t.P3);
return Math.Abs(vols.Sum());
}
Yip Frank Kruegers answer works well +1 for that. If you have vector functions available to you you could use this too:
public static float SignedVolumeOfTriangle(Vector p1, Vector p2, Vector p3)
{
return p1.Dot(p2.Cross(p3)) / 6.0f;
}
edit .. added impl. for Dot() and Cross() if you are unsure. Most Math libs will have these. If you are using WPF they are implemented as static methods of the Vector3D class.
public class Vector
{
...
public float Dot(Vector a)
{
return this.X * a.X + this.Y * a.Y + this.Z * a.Z;
}
public Vector Cross(Vector a)
{
return new Vector(
this.Y * a.Z - this.Z * a.Y,
this.Z * a.X - this.X * a.Z,
this.X * a.Y - this.Y * a.X
);
}
...
}
The GNU Triangulated Surface Library can do this for you. Keep in mind that the surface must be closed. That is not going to be the case for quite a few 3D models.
If you want to implement it yourself, you could start by taking a look at their code.
The method above is correct for "simple" objects (no intersecting/overlapping triangles) like spheres tetrahedras and so on. For more complex shapes, a good idea could be to segment the mesh (close it) and calculate the volume of each segment separately.
Hope this helps.