number squared in programming - math

I know this is probably a very simple question but how would I do something like
n² in a programming language?
Is it n * n? Or is there another way?

n * n is the easiest way.
For languages that support the exponentiation operator (** in this example), you can also do n ** 2
Otherwise you could use a Math library to call a function such as pow(n, 2) but that is probably overkill for simply squaring a number.

n * n will almost always work. The few cases where it won't are prefix languages (Lisp, Scheme, and co.) and postfix languages (Forth, Factor, dc), where you can just write (* n n) or n n * respectively.
It will also fail when there is an overflow case:
#include <limits.h>
#include <stdio.h>
int main()
{
volatile int x = INT_MAX;
printf("INT_MAX squared: %d\n", x * x);
return 0;
}
I threw the volatile qualifier on there just to point out that this can be compiled with -Wall and not raise any warnings, but on my 32-bit computer this says that INT_MAX squared is 1. (Signed integer overflow is undefined behavior in C, so other results are possible.)
Depending on the language, you might have a power function such as pow(n, 2) in C, or math.pow(n, 2) in Python. Since those power functions convert their arguments to floating-point numbers, they can be more useful in cases where integer overflow is possible, although very large results are then no longer exact.
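One way to sidestep the overflow shown above without going through floating point is to multiply in a wider integer type. This is only a minimal sketch: it assumes long long is wider than int (true on mainstream platforms, but not guaranteed by the C standard), and the names square_wide and square_checked are made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Squares n in a wider type so that the product cannot overflow
   (assumption: long long is at least twice as wide as int). */
long long square_wide(int n)
{
    return (long long)n * n;
}

/* Stores n*n in *out and returns true if the result fits in an int;
   returns false and leaves *out untouched otherwise. */
bool square_checked(int n, int *out)
{
    long long wide = square_wide(n);
    if (wide > INT_MAX)
        return false;
    *out = (int)wide;
    return true;
}
```

A caller can then decide what to do when square_checked reports failure, instead of silently getting a wrapped value.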

There are many programming languages, each with their own way of expressing math operations.
Some common ones will be:
x*x
pow(x,2)
x^2
x ** 2
square(x)
(* x x)
If you specify a specific language, we can give you more guidance.

If n is an integer :p :
int res=0;
for(int i=0; i<n; i++)
res+=n; //res=n+n+...+n=n*n
For positive integers you may use recursion:
int square(int n){
if (n>1)
return square(n-1)+(n-1)+n;
else
return 1;
}
Calculate using array allocation (extremely sub-optimal):
#include <iostream>
using namespace std;
int heapSquare(int n){
return sizeof(char[n][n]);
}
int main(){
for(int i=1; i<=10; i++)
cout << heapSquare(i) << endl;
return 0;
}
Using bit shift (ancient Egyptian multiplication):
int sqr(int x){
int i=0;
int result = 0;
for (;i<32;i++)
if (x>>i & 0x1)
result+=x << i;
return result;
}
Assembly:
int x = 10;
__asm__ __volatile__("imul %%eax,%%eax"
:"=a"(x)
:"a"(x)
);
printf("x*x=%d\n", x);

Always use the language's multiplication, unless the language has an explicit square function. Specifically avoid using the pow function provided by most math libraries. Multiplication will (except in the most outrageous of circumstances) always be faster, and -- if your platform conforms to the IEEE-754 specification, which most platforms do -- will deliver a correctly-rounded result. In many languages, there is no standard governing the accuracy of the pow function. It will generally give a high-quality result for such a simple case (many library implementations will special-case squaring to save programmers from themselves), but you don't want to depend on this[1].
I see a tremendous amount of C/C++ code where developers have written:
double result = pow(someComplicatedExpression, 2);
presumably to avoid typing that complicated expression twice or because they think it will somehow slow down their code to use a temporary variable. It won't. Compilers are very, very good at optimizing this sort of thing. Instead, write:
const double myTemporaryVariable = someComplicatedExpression;
double result = myTemporaryVariable * myTemporaryVariable;
To sum up: Use multiplication. It will always be at least as fast and at least as accurate as anything else you can do[2].
1) Recent compilers on mainstream platforms can optimize pow(x,2) into x*x when the language semantics allow it. However, not all compilers do this at all optimization settings, which is a recipe for hard to debug rounding errors. Better not to depend on it.
2) For basic types. If you really want to get into it, if multiplication needs to be implemented in software for the type that you are working with, there are ways to make a squaring operation that is faster than multiplication. You will almost never find yourself in a situation where this matters, however.
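The temporary-variable advice above can also be packaged as a small helper, so that a complicated expression is written (and evaluated) only once. This is a sketch; the name square is hypothetical and not part of the C standard library.

```c
/* Hypothetical helper: the argument is evaluated once by the caller,
   then multiplied by itself. */
static inline double square(double x)
{
    return x * x;
}
```

With this in place, the earlier example could be written as double result = square(someComplicatedExpression);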

Related

How is recursion processed under the hood?

This is probably a fundamental question about recursion.
Simple code:
func fact(n int) int {
if n == 0 {
return 1
}
return n * fact(n-1)
}
How is the line n * fact(n-1) processed under the hood by general-purpose languages such as C++, Java, or Go?
As I understand it, the line n * fact(n-1) builds an expression on the fly, like
n * (n-1) * (n-2) * ..., so the executable prepares an expression according to the incoming function parameter. Also, how are simple recursion and tail recursion processed under the hood? Could you add more details or point to any useful docs?
Thanks.
You can use godbolt.org to see what's happening "under the hood" for C++ and Go. (As well as a few other languages.)
If you translate your algorithm into one of the supported languages (such as C++), godbolt will show you the assembly language that is generated. You can't get much more "under the hood" than knowing what's happening with the registers and how it branches in assembly.
Of course, it requires you to understand assembly. But your example is actually quite a simple one.
Here is a quick C++ example (of your code) you can paste into godbolt:
int fact(int n);
int main()
{
fact(5);
}
int fact(int n)
{
if (n == 0)
{
return 1;
}
return n * fact(n-1);
}
Hope that gives you new insights into what is going on behind the scenes.
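As a rough conceptual model of what happens at run time for the fact example (a sketch, not what any particular compiler emits): no expression is assembled in advance. Each call leaves its own n waiting in a stack frame, and the multiplications happen as the calls return. The function names below are made up, and the array stands in for the real call stack. The second function shows the tail-recursive shape; C compilers are allowed, but not required, to turn such a call into a loop.

```c
#define MAX_DEPTH 21  /* 20! is the largest factorial that fits in 64 bits */

/* Mimics the recursive fact(): each "call" leaves its n in a frame. */
unsigned long long fact_explicit_stack(int n)
{
    int frames[MAX_DEPTH];   /* stands in for the call stack */
    int depth = 0;

    if (n < 0 || n >= MAX_DEPTH)
        return 0;            /* out of range for this sketch */

    /* "Calling" phase: fact(n) calls fact(n-1), ..., down to fact(0). */
    while (n != 0)
        frames[depth++] = n--;

    /* Base case: fact(0) returns 1. */
    unsigned long long result = 1;

    /* "Returning" phase: each pending frame multiplies by its own n. */
    while (depth > 0)
        result *= (unsigned long long)frames[--depth];

    return result;
}

/* Tail-recursive form: the multiplication happens before the call, so
   nothing is left pending when the recursive call is made. */
unsigned long long fact_tail(int n, unsigned long long acc)
{
    if (n <= 0)
        return acc;
    return fact_tail(n - 1, acc * (unsigned long long)n);
}
```

Calling fact_tail(5, 1) carries the running product in acc, which is why no frame needs to be kept around for a later multiplication.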

Hacks for clamping integer to 0-255 and doubles to 0.0-1.0?

Are there any branch-less or similar hacks for clamping an integer to the interval of 0 to 255, or a double to the interval of 0.0 to 1.0? (Both ranges are meant to be closed, i.e. endpoints are inclusive.)
I'm using the obvious minimum-maximum check:
value = (value < 0 ? 0 : value > 255 ? 255 : value);
but is there a way to get this faster -- similar to the "modulo" clamp value & 255? And is there a way to do similar things with floating points?
I'm looking for a portable solution, so preferably no CPU/GPU-specific stuff please.
This is a trick I use for clamping an int to a 0 to 255 range:
/**
* Clamps the input to a 0 to 255 range.
* @param v any int value
* @return {@code v < 0 ? 0 : v > 255 ? 255 : v}
*/
public static int clampTo8Bit(int v) {
// if out of range
if ((v & ~0xFF) != 0) {
// invert sign bit, shift to fill, then mask (generates 0 or 255)
v = ((~v) >> 31) & 0xFF;
}
return v;
}
That still has one branch, but a handy thing about it is that you can test whether any of several ints are out of range in one go by ORing them together, which makes things faster in the common case that all of them are in range. For example:
/** Packs four 8-bit values into a 32-bit value, with clamping. */
public static int ARGBclamped(int a, int r, int g, int b) {
if (((a | r | g | b) & ~0xFF) != 0) {
a = clampTo8Bit(a);
r = clampTo8Bit(r);
g = clampTo8Bit(g);
b = clampTo8Bit(b);
}
return (a << 24) + (r << 16) + (g << 8) + (b << 0);
}
Note that your compiler may already give you what you want if you code value = min (value, 255). This may be translated into a MIN instruction if it exists, or into a comparison followed by conditional move, such as the CMOVcc instruction on x86.
The following code assumes two's complement representation of integers, which is usually a given today. The conversion from Boolean to integer should not involve branching under the hood, as modern architectures either provide instructions that can directly be used to form the mask (e.g. SETcc on x86 and ISETcc on NVIDIA GPUs), or can apply predication or conditional moves. If all of those are lacking, the compiler may emit a branchless instruction sequence based on arithmetic right shift to construct a mask, along the lines of Boann's answer. However, there is some residual risk that the compiler could do the wrong thing, so when in doubt, it would be best to disassemble the generated binary to check.
int value, mask;
mask = 0 - (value > 255); // mask = all 1s if value > 255, all 0s otherwise
value = (255 & mask) | (value & ~mask);
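The mask snippet above handles only the upper bound. A two-sided version of the same idea might look like the following sketch, which keeps the two's complement assumption; the function name clamp_u8_mask is made up. Whether the compiled code is actually branch-free still has to be checked in the disassembly.

```c
/* Clamps value to [0, 255] using comparison-derived masks.
   Assumes two's complement integers. */
int clamp_u8_mask(int value)
{
    int too_big = 0 - (value > 255); /* all 1s if value > 255, else 0 */
    int negative = 0 - (value < 0);  /* all 1s if value < 0, else 0 */

    value = (255 & too_big) | (value & ~too_big); /* apply upper bound */
    value &= ~negative;                           /* apply lower bound */
    return value;
}
```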
On many architectures, use of the ternary operator ?: can also result in a branchless instruction sequence. The hardware may support select-type instructions which are essentially the hardware equivalent of the ternary operator, such as ICMP on NVIDIA GPUs. Or it provides CMOV (conditional move) as in x86, or predication as on ARM, both of which can be used to implement branch-less code for ternary operators. As in the previous case, one would want to examine the disassembled binary code to be absolutely sure the resulting code is without branches.
int value;
value = (value > 255) ? 255 : value;
In case of floating-point operands, modern floating-point units typically provide FMIN and FMAX instructions which map straight to the C/C++ standard math functions fmin() and fmax(). Alternatively fmin() and fmax() may be translated into a comparison followed by a conditional move. Again, it would be prudent to examine the generated code to make sure it is branchless.
double value;
value = fmax (fmin (value, 1.0), 0.0);
I use this thing, 100% branchless.
int clampU8(int val)
{
val &= (val<0)-1; // clamp < 0
val |= -(val>255); // clamp > 255
return val & 0xFF; // mask out
}
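Bit tricks like this are easy to get subtly wrong, so it can be worth comparing them against the obvious min/max version over a range of inputs. The following is a sketch of such a check; the names clamp_u8_hack, clamp_u8_reference and count_mismatches are made up here, and clamp_u8_hack simply repeats the function above.

```c
/* The branch-free version from above, repeated under a different name. */
static int clamp_u8_hack(int val)
{
    val &= (val < 0) - 1;   /* clamp < 0 */
    val |= -(val > 255);    /* clamp > 255 */
    return val & 0xFF;      /* mask out */
}

/* The obvious version, used as the reference. */
static int clamp_u8_reference(int val)
{
    return val < 0 ? 0 : (val > 255 ? 255 : val);
}

/* Counts the inputs in [lo, hi] on which the two versions disagree. */
long count_mismatches(int lo, int hi)
{
    long mismatches = 0;
    for (long v = lo; v <= hi; v++)
        if (clamp_u8_hack((int)v) != clamp_u8_reference((int)v))
            mismatches++;
    return mismatches;
}
```

A result of zero over the range you care about is good evidence that the hack matches the plain version on your platform.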
For those using C#, Kotlin or Java this is the best I could do, it's nice and succinct if somewhat cryptic:
(x & ~(x >> 31) | 255 - x >> 31) & 255
It only works on signed integers so that might be a blocker for some.
For clamping doubles, I'm afraid there's no language/platform agnostic solution.
The problem with floating point is that compilers offer modes ranging from fastest (MSVC /fp:fast, gcc -funsafe-math-optimizations) to fully precise and safe (MSVC /fp:strict, gcc -frounding-math -fsignaling-nans). In fully precise mode the compiler does not try to use any bit hacks, even where it could.
A solution that manipulates the bits of a double cannot be portable. Endianness may differ, there may be no (efficient) way to get at the bits, and double is not necessarily IEEE 754 binary64 after all. In addition, direct bit manipulation will not raise signals for signaling NaNs when they are expected.
For integers, the compiler will most likely do the right thing anyway; otherwise, good answers have already been given.

Recursion in functional programming will not give high concurrency. Is it correct?

I am new to functional programming. Recursion in FP takes the place of the loops of imperative programming. Another claim is that FP gives high concurrency: because the data is immutable, instructions can be executed in parallel on multi-core/multi-CPU systems.
In recursion, however, steps cannot be executed in parallel, because each step depends on the previous step's result.
So I am assuming that recursion in FP will not give high concurrency. Am I correct?
Sort of. You cannot get more execution parallelism than the data parallelism; this is Amdahl's law. However, you frequently have more data parallelism than is expressed in typical sequential algorithms, whether functional or imperative. Consider for example taking the scalar multiple of a vector (this is some made-up Algol-style language) [1]:
function scalar_multiple(scalar c, vector v) {
vector v1;
for (int i = 0; i < length(v); i++) {
v1[i] = c * v[i];
}
return v1;
}
Obviously, this isn't going to run in parallel. The situation isn't improved if we re-write in a functional language, using recursion (you can think of this as Haskell):
scalar_multiple c [] = []
scalar_multiple c (x:xn) = c * x : scalar_multiple c xn
This is still a sequential algorithm!
However, notice that there is no data dependency: you don't need the result of any other multiplication to calculate a given one. So we have the potential for parallelization here. This can be accomplished in an imperative language:
function scalar_multiple(scalar c, vector v) {
vector v1;
parallel_for (int i in 0..length(v)-1) {
v1[i] = c * v[i];
}
return v1;
}
But this parallel_for is a dangerous construct. Consider a search function:
function first(predicate p, vector v) {
for (int i = 0; i < length(v); i++) {
if (p(v[i])) return i;
}
return -1;
}
If we try speeding this up by replacing for with parallel_for:
function first(predicate p, vector v) {
parallel_for (int i in 0..length(v)-1) {
if (p(v[i])) return i;
}
return -1;
}
Now we won't necessarily return the index of the first element to satisfy the condition, just an element that satisfies it. We broke the contract of the function by parallelizing it.
The obvious solution is 'don't allow return inside parallel_for'. But there are lots of other dangerous constructs; in fact, you'll notice I had to abandon the C-style for loop because the increment-and-test pattern itself is dangerous in parallel languages. Consider:
function sequence(int n) {
vector v;
int c = 0;
parallel_for (int i in 0..n-1) {
v[i] = c++;
}
return v;
}
This is again a 'toy' example ("just use v[i] = i;!"), but it illustrates the point: this function initializes v in a random order, due to parallelism. So it turns out that the constructs that are 'safe' to use inside a construct like parallel_for are precisely the constructs that are allowed in purely-functional languages, which makes adding parallel constructs to those languages 'safer' than adding them to imperative languages.
[1] This is just a very simple example; of course, real parallelism involves finding bigger chunks of work to parallelize than this!
Not sure if I understand you correctly, but it generally depends on what you want to accomplish.
A single recursion cannot execute its subcalls in parallel. But you CAN have two recursions working on the same dataset, e.g. processing an array from the left AND from the right simultaneously through two concurrently running recursive functions. Those two functions can then (theoretically) run in parallel.
In detail, it does not matter whether you have a recursive function or a function with a loop inside, as long as there is a function that can run on its own. So, with respect to your question:
No, a recursive function by definition does not give you any concurrency.
Loops are replaced by higher-order functions more frequently than by direct recursion. Recursion is sort of a catch-all measure in functional programming for when higher-order functions don't already exist for what you need to do.
For example, if you want to run the same calculation on all elements of a list, you use a map, which is highly parallelizable. Finding which elements meet certain criteria is a filter, also highly parallelizable.
Some algorithms just plain require the result of the previous iteration in order to proceed. Those are the ones that tend to require a recursive function, and you're right, they are not generally easy to make highly concurrent.
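The distinction between dependent and independent steps applies to recursion itself. A recursion that peels off one element at a time forms a chain, but a recursion that splits its input in half produces two sub-calls that do not need each other's results. The sketch below illustrates that shape in C; the name sum_range is made up, and the code runs the two halves one after the other, with comments marking where a parallel runtime could run them simultaneously.

```c
#include <stddef.h>

/* Sums v[lo..hi) by splitting the range in half. The two recursive
   calls touch disjoint halves and do not depend on each other. */
long sum_range(const int *v, size_t lo, size_t hi)
{
    if (hi - lo == 0)
        return 0;
    if (hi - lo == 1)
        return v[lo];

    size_t mid = lo + (hi - lo) / 2;
    long left = sum_range(v, lo, mid);   /* independent of right */
    long right = sum_range(v, mid, hi);  /* independent of left  */
    return left + right;                 /* the only dependency  */
}
```

The recursion depth here grows with the logarithm of the input size rather than with the size itself, which is what leaves room for parallel evaluation.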

Multiply number by 10 n times

Is there a better mathematical way to multiply a number by 10 n times in Dart than the following (below). I don't want to use the math library, because it would be overkill. It's no big deal; however if there's a better (more elegant) way than the "for loop", preferably one line, I'd like to know.
int iDecimals = 3;
int iValue = 1;
print ("${iValue} to power of ${iDecimals} = ");
for (int iLp1 = 1; iLp1 <= iDecimals; iLp1++) {
iValue *= 10;
}
print ("${iValue}");
You are not raising to a power of ten, you are multiplying by a power of ten. That is, your code computes iValue * 10^(iDecimals), whereas raising to a power would mean iValue^10.
Now, your code still contains exponentiation: it raises ten to the power iDecimals and then multiplies by iValue. That exponentiation can be made far more efficient. (Disclaimer: I've never written a line of Dart code before and I don't have an interpreter to test with, so this might not work right away.)
int iValue = 1;
int p = 3;
int a = 10;
// The following code raises `a` to the power of `p` (assumes p >= 1)
int tmp = 1;
while (p > 1) {
if (p % 2 == 0) {
p ~/= 2;
} else {
tmp *= a;
p = (p - 1) ~/ 2;
}
a *= a;
}
a *= tmp;
// in our example `a` is now 10^3
iValue *= a;
print ("${iValue}");
This exponentiation algorithm is very straightforward and it is known as Exponentiation by squaring.
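For comparison, here is the same technique as a small C sketch (C rather than Dart, and with a made-up function name ipow). It also covers an exponent of zero and does no overflow checking.

```c
/* Raises base to the power exp by repeated squaring.
   Overflow is not checked in this sketch. */
long long ipow(long long base, unsigned int exp)
{
    long long result = 1;
    while (exp > 0) {
        if (exp & 1u)        /* lowest bit set: this power contributes */
            result *= base;
        exp >>= 1;
        if (exp > 0)
            base *= base;    /* square only if another round follows */
    }
    return result;
}
```

The loop runs once per bit of the exponent, so raising to the power n takes on the order of log2(n) multiplications instead of n.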
Use the math library. Your idea of doing so being "overkill" is misguided. The following is easier to write, easier to read, fewer lines of code, and most likely faster than anything you might replace it with:
import 'dart:math';
void main() {
int iDecimals = 3;
int iValue = 1;
print("${iValue} times ten to the power of ${iDecimals} = ");
iValue *= pow(10, iDecimals).toInt(); // pow returns num, so convert back to int
print(iValue);
}
Perhaps you're deploying to JavaScript, concerned about deployment size, and unaware that dart2js does tree shaking?
Finally, if you do want to raise a number to the power of ten, as you asked for but didn't do, simply use pow(iValue, 10).
Considering that you don't want to use any math library, I think this is the best way to compute the power of a number. The time complexity of this code snippet also seems minimal. If you need a one-line solution, you will have to use some math library function.
Btw, you are not raising to the power but simply multiplying a number with 10 n times.
Are you trying to multiply something by a power of 10? If so, I believe Dart supports scientific notation, so the above value could be written as: iValue = 1e3;
which is equal to 1000. (Note that 1e3 is a double literal in Dart, so iValue would have to be a double or the value converted.) If you want to raise the number itself to the power of ten, I think your only other option is to use the Math library.
Because the criteria were that the answer must not require the math library, must be fast, and should ideally be a mathematical solution (not String-based), and because the exponential solution involves too much overhead (String, double, integer), I think the only answer that meets the criteria is the following:
for (int iLp1=0; iLp1<iDecimal; iLp1++, iScale*=10);
It is quite fast, doesn't require the "math" library, and is a one-liner.

Integer polynomial interpolation (or fast select case)

Let x be an element of the set {10, 37, 96, 104}.
Let f(x) be a "select case" function:
int f1(int x) {
switch(x) {
case 10: return 3;
case 37: return 1;
case 96: return 0;
case 104: return 1;
}
assert(...);
}
Then we can avoid conditional jumps by writing f(x) as an "integer polynomial" like
int f2(int x) {
// P(x) = (x - 70)^2 / 1024 (the shift by 10 divides by 2^10)
int q = x - 70;
return (q * q) >> 10;
}
In some cases f2 would be better than f1, even though it still includes mul operations (e.g. for large conditional evaluations).
Are there methods to find P(x) given such a switch?
Thank you very much!
I suggest you start by reading the Wikipedia page on polynomial interpolation if you do not know how to calculate the interpolation polynomial.
Note that not all calculation methods are suitable for practical application because of numerical issues (e.g. divisions in the Lagrange version). I am confident that you should be able to find a library providing this functionality. Note that the construction will take some time too, so this only makes sense if your function will be called quite frequently.
Be aware that integer function values and integer points of support do not imply integer coefficients for your polynomial! Thus, in the general case, you will require O(n) floating point operations, and finally a round toward the nearest integer. It may depend on your input whether the interpolation method is reliable and faster than the approach using switch.
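To make the floating-point-plus-rounding idea concrete, here is a sketch that evaluates the Lagrange form directly for the four points from the question. The names lagrange_eval and f_interp are made up. This direct form costs on the order of n^2 operations per call and still contains loops, so it illustrates the construction rather than a speed-up; reaching O(n) per call would require precomputing the coefficients first.

```c
/* Evaluates the Lagrange interpolation polynomial through the points
   (xs[i], ys[i]) at x, in floating point. The xs must be distinct. */
double lagrange_eval(const int *xs, const int *ys, int n, double x)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double term = ys[i];
        for (int j = 0; j < n; j++)
            if (j != i)
                term *= (x - xs[j]) / (double)(xs[i] - xs[j]);
        sum += term;
    }
    return sum;
}

/* The switch from the question, expressed as interpolation plus rounding. */
int f_interp(int x)
{
    static const int xs[] = {10, 37, 96, 104};
    static const int ys[] = {3, 1, 0, 1};
    double p = lagrange_eval(xs, ys, 4, (double)x);
    return (int)(p < 0.0 ? p - 0.5 : p + 0.5); /* round to nearest */
}
```

Only the four listed inputs are meaningful; for any other x the polynomial returns whatever value it happens to take there.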
Further, I want to propose a different solution, assuming that n is rather large. Why don't you put your entries (the pairs (10,3), (37,1), (96,0), (104,1) in your example) inside a search tree (e.g. std::map in C++ or SortedDictionary in C#)? Your query cost would then drop from linear to O(log n).
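In plain C, a sorted array searched with the standard bsearch function gives the same O(log n) lookup as the search tree suggested above. Note that this swaps the tree for a sorted array, which is a reasonable stand-in when the entries never change. The struct and function names below are made up, and returning -1 for a missing key is an arbitrary choice for this sketch.

```c
#include <stdlib.h>

struct entry { int key; int value; };

/* Entries from the question, kept sorted by key for bsearch(). */
static const struct entry table[] = {
    {10, 3}, {37, 1}, {96, 0}, {104, 1}
};

/* Comparison callback: first argument is the key being searched for,
   second is an element of the table. */
static int compare_key(const void *key, const void *elem)
{
    int k = *(const int *)key;
    int e = ((const struct entry *)elem)->key;
    return (k > e) - (k < e);
}

/* Returns the mapped value, or -1 if x is not in the table. */
int lookup(int x)
{
    const struct entry *hit = bsearch(&x, table,
                                      sizeof table / sizeof table[0],
                                      sizeof table[0], compare_key);
    return hit ? hit->value : -1;
}
```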

Resources