Variable vs Constant - asp.net

I was wondering about the differences between variables and constants, as I see different declarations of variables and constants in code written by ex-colleagues.
I know that a variable is something that can be changed throughout the code, while the value of a constant is fixed and can't be changed. So far I've declared everything as a variable (even when the value will never change). Is my practice incorrect? Perhaps my code is just not complicated enough for it to matter, which is why I use variables all the time.
Anyhow, if my understanding is proven wrong, please enlighten me with the correct guidelines on this matter.

It is good practice to use constants whenever possible.
The compiler knows that only read operations can be performed on those values, so it can apply access optimizations automatically, which can improve performance.
Another difference is that constants are typically stored in a separate, preallocated section of your program (compiler dependent, but this is what most compilers do), which makes them cheaper to access, and they don't get allocated and deallocated all the time (another performance win).
And finally, constants can be evaluated at compile time.
For example, if you have an expression built entirely from constants, something like the following:
float a = const1 * const2 / const3 + const4;
Then the whole expression will be evaluated at compile time, saving cycles at runtime (since the value will always be the same).
Some popular constants that benefit from this sort of optimization are PI, PI/2, PI/4, and 1/PI.

const int const_a = 10;
int static_a = 70;

public void sample()
{
    static_a = const_a + 10; // This is correct
    // const_a = 88;         // This is wrong: a const cannot be reassigned
}
In the above example, once a variable is declared as const we cannot assign a new value to it anywhere, but we can still read and use it.

Related

Options to structure compile-time constants in OpenCL?

I need to pass a bunch of constants into my OpenCL kernel. Luckily, these are mostly known at compile time (meaning kernel compile time), and therefore I can pass them in as a bunch of defines like -D leftOuterMargin=3 -D rightOuterMargin=2 -D leftInnerMargin=1 -D rightInnerMargin=2 ....
But this gets a bit unwieldy, and makes it hard to write re-usable functions inside the kernel. I'm looking for something a bit more structured, like, say, structs. However, structs would seem to be stored either in constant space (if created using a global constant instantiation, probably via an appropriate #define) or private space (if created inside the kernel function, again probably via an appropriate #define)?
What options are available for structuring constant data that is known at compile time in a kernel? Some things I hope for:
structured, can just pass a single pointer-like thing into reusable methods, rather than having 8-16 clumsily-named #defines, or 8 parameters into each non-kernel method
the values load quickly when used, just like normal #defined values
the values can be used by the compiler for optimizations, e.g. if I use one for a loop upper bound, that loop can be fully unrolled at compile time
won't increase register pressure
relatively standard, will work generally, across different gpu platforms/manufacturers
This:
the values load quickly when used, just like normal #defined values
the values can be used by the compiler for optimizations, e.g. if I use one for a loop upper bound, that loop can be fully unrolled at compile time
won't increase register pressure
Is incompatible with this:
can just pass a single pointer-like thing into reusable methods [...] or 8 parameters into each non-kernel method
If you want the values to be constant expressions (in other words, "known to the compiler"), then the only options are #define and global-scope const. No passing values dynamically by parameter or by indirection.
I suggest you make a struct with your different options:
typedef struct Options {
    int leftOuterMargin;
    int rightOuterMargin;
    int leftInnerMargin;
    int rightInnerMargin;
    // and so on ...
} Options;
Then you can define a header included in all translation units where the constants are required:
// constants.h
static const Options constants = {
    .leftOuterMargin  = 3,
    .rightOuterMargin = 2,
    .leftInnerMargin  = 1,
    .rightInnerMargin = 2
};
The compiler should be able to optimize your code just as well as if you had used #define.
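For illustration, here is roughly how a reusable helper could then consume that header inside the kernel source. The helper and kernel names below are invented, and note that some OpenCL C versions require the program-scope instance to be declared in the __constant address space rather than as static const:

#include "constants.h"

// Hypothetical reusable helper: one struct argument instead of many #defines.
// Passing the small struct by value sidesteps address-space qualifiers.
int outer_width(Options opts)
{
    return opts.leftOuterMargin + opts.rightOuterMargin;
}

__kernel void process(__global float *data)
{
    size_t gid = get_global_id(0);

    // After inlining, outer_width(constants) folds to a compile-time value,
    // so the compiler is free to unroll this loop, just as with a #define.
    for (int i = 0; i < outer_width(constants); ++i) {
        data[gid] += (float)constants.leftInnerMargin;
    }
}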

Why is code unreachable in Frama-C Value Analysis?

When running the Frama-C value analysis on some benchmarks, e.g. susan in http://www.eecs.umich.edu/mibench/automotive.tar.gz, we noticed that a lot of blocks are considered dead code or unreachable. However, in practice this code is executed, as we verified by printing debug information from these blocks. Has anybody else noticed this issue? How can we solve it?
Your code has a peculiarity which is not in Pascal's list, and which explains some parts of the dead code. Quite a few functions are declared like this:
f(int x, int y);
The return type is entirely missing. The C standard indicates that such functions should return int, and Frama-C follows this convention. When parsing those functions, it notices that they never return anything on some of their paths:
Body of function f falls-through. Adding a return statement.
On top of the return statement, Frama-C also adds an /*@ assert \false; */ annotation, to indicate that the execution paths on which the function returns nothing should be dead code. In your code, this assertion always fails: those functions are really supposed to return void, not int. You should correct your code by declaring the proper return type.
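Concretely, the fix is just to declare the type the functions were always meant to have (a sketch; the body is invented):

/* Before: "f(int x, int y);" is implicitly int, never returns a value,
   so its fall-through path, and the callers' code after it, look dead. */

/* After: give the function its intended return type. */
void f(int x, int y)
{
    /* ... side effects only, no value returned ... */
}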
Occurrences of dead code in the results of Frama-C's value analysis boil down to two aspects, and even these two aspects are only a question of human intentions and are indistinguishable from the point of view of the analyzer.
Real bugs that occur with certainty every time a particular statement is reached. For instance, the code after y = 0; x = 100 / y; is unreachable because the program stops at the division every time. Some bugs that should be run-time errors do not always stop execution in reality, for instance writing to an invalid address. Consider yourself lucky that they do in Frama-C's value analysis, not the other way round.
Lack of configuration of the analysis context: not having provided an informative main() function that sets up variation ranges of the program's inputs with built-in functions such as Frama_C_interval(), missing library functions for which neither specifications nor replacement code are provided, assembly code inside the C program, a missing -absolute-valid-range option when one would be appropriate, ...
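For the second point, a minimal sketch of such an analysis driver follows. The process() function and the header name are illustrative; Frama_C_interval() is the built-in mentioned above. You would then analyze with something like frama-c -val driver.c benchmark.c (option names vary between Frama-C versions).

/* driver.c: a main() handed to Frama-C instead of the program's real one. */
#include "__fc_builtin.h"   /* header providing Frama_C_interval(); name may vary */

void process(int brightness_threshold);   /* stands for the benchmark's entry point */

int main(void)
{
    /* Tell the value analysis that the input can be anything in [0, 255],
       so branches depending on it are explored rather than pruned. */
    int t = Frama_C_interval(0, 255);
    process(t);
    return 0;
}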

Common subexpression elimination OpenCL

I have a very large kernel which uses ~1000 temporary variables to compute ~1000 equations. So it is safe to assume that all of the temporary variables will be put in private off-chip memory, a.k.a. CUDA "local" memory (I know it's bad, but there is no other way).
My question is if common subexpressions are eliminated between neighboring lines like these:
const float t747 = t472*t28*t26*t715*t30*t11;
const float t748 = t472*t28*t717*t26*t30*t11;
As you can see, the only difference is the variable t717 versus t715. The question is whether those two lines translate into 7 or 12 global loads.
Because if the target compiler (an Nvidia Kepler GPU in my case) does not use registers to cache common subexpressions between lines, I'm going to need to implement it myself.
Note: All code is generated automatically, so manual tuning won't be possible.
EDIT: All t0-t999 variables are declared as "const float".
The compiler translates all the global reads as direct reads, so in your case 12 reads.
This is due to the fact that global memory is considered volatile, and caching those reads is not possible. However, if you simply do this (I think you know it, but anyway...):
const float temp = t472*t28*t26*t30*t11;
const float t747 = temp*t715;
const float t748 = temp*t717;
The compiler will translate that into 7 global reads.
NOTE: At least this was valid with older architectures; I don't know if there is some newer compiler/architecture that can cleverly detect these cases and optimize them.

Performing operations on CUDA matrices while reading from a global Point

Hey there,
I have a mathematical function (multidimensional, meaning there's an index which I pass to the C++ function to select which single component function I want returned). E.g. let's say I have a mathematical function like this:
f = Vector(x^2*y^2 / y^2 / x^2*z^2)
I would implement it like this:
double myFunc(int function_index)
{
    switch (function_index)
    {
        case 1:
            return PNT[0]*PNT[0]*PNT[1]*PNT[1];
        case 2:
            return PNT[1]*PNT[1];
        case 3:
            return PNT[2]*PNT[2]*PNT[1]*PNT[1];
        default:
            return 0.0; // avoid falling off the end for an unknown index
    }
}
where PNT is defined globally like this: double PNT[NUM_COORDINATES]. Now I want to implement the derivatives of each function for each coordinate, thus generating the derivative matrix (columns = coordinates; rows = single functions). I have already written my kernel, which works so far and which calls myFunc().
The problem is: for calculating the derivative of the mathematical sub-function i with respect to coordinate j, I would use the following code in sequential mode (e.g. on CPUs); this is simplified, because usually you would decrease h until you reach a certain precision of your derivative:
f0 = myFunc(i);
PNT[j] += h;
derivative = (myFunc(i) - f0) / h;   // forward difference of function i w.r.t. coordinate j
PNT[j] -= h;
Now, as I want to do this on the GPU in parallel, the problem comes up: what to do with PNT? As I have to increase a certain coordinate by h, calculate the value and then decrease it again, how do I do it without 'disturbing' the other threads? I can't modify PNT, because other threads need the 'original' point to modify their own coordinate.
The second idea I had was to save one modified point for each thread, but I discarded this idea quite fast, because with some thousand threads in parallel this would be bad and probably slow (and perhaps not realizable at all because of memory limits).
'FINAL' SOLUTION
So how I do it currently is the following, which adds the value 'add' at runtime (without storing it anywhere) via a preprocessor macro to the coordinate identified by coordinate_index.
#define X(n) ((coordinate_index == n) ? (PNT[n]+add) : PNT[n])
__device__ double myFunc(int function_index, int coordinate_index, double add)
{
    //*// Example: f[i] = x[i]^3
    return (X(function_index)*X(function_index)*X(function_index));
    // */
}
That works quite nicely and fast. When using a derivative matrix with 10000 functions and 10000 coordinates, it takes only about 0.5 seconds. PNT is defined either globally or as constant memory like __constant__ double PNT[ NUM_COORDINATES ];, depending on the preprocessor variable USE_CONST.
The line return (X(function_index)*X(function_index)*X(function_index)); is just an example where every sub-function follows the same scheme; mathematically:
f = Vector(x0^3 / x1^3 / ... / xN^3)
NOW THE BIG PROBLEM ARISES:
myFunc is a mathematical function which the user should be able to implement as they like. E.g. they could also implement the following mathematical function:
f = Vector(x0^2*x1^2*...*xN^2 / x0^2*x1^2*...*xN^2 / ... / x0^2*x1^2*...*xN^2)
so every component function looks the same. As a programmer you should only have to code this once, independently of the implemented mathematical function. When the above function is implemented in C++, it looks like the following:
__device__ double myFunc(int function_index, int coordinate_index, double add)
{
    double ret = 1.0;
    for (int i = 0; i < NUM_COORDINATES; i++)
        ret *= X(i)*X(i);
    return ret;
}
And now the memory accesses are very 'weird' and bad for performance, because each thread needs to access each element of PNT twice. Surely, in such a case where every function looks the same, I could rewrite the complete algorithm which surrounds the calls to myFunc, but as I stated already: I don't want to code depending on the user-implemented function myFunc...
Could anybody come up with an idea how to solve this problem??
Thanks!
Rewinding back to the beginning and starting with a clean sheet, it seems you want to be able to do two things:
compute an arbitrary scalar-valued function over an input array
approximate the partial derivative of an arbitrary scalar-valued function over the input array using first-order accurate finite differencing
While the function is scalar valued and arbitrary, it seems that there are, in fact, two clear forms which this function can take:
A scalar valued function with scalar arguments
A scalar valued function with vector arguments
You appear to have started with the first type of function and have put together code to deal with computing both the function and the approximate derivative, and are now wrestling with the problem of how to deal with the second case using the same code.
If this is a reasonable summary of the problem, then please indicate so in a comment and I will continue to expand it with some code samples and concepts. If it isn't, I will delete it in a few days.
In comments, I have been trying to suggest that conflating the first type of function with the second is not a good approach. The requirements for correctness in parallel execution, and the best way of extracting parallelism and performance on the GPU are very different. You would be better served by treating both types of functions separately in two different code frameworks with different usage models. When a given mathematical expression needs to be implemented, the "user" should make a basic classification as to whether that expression is like the model of the first type of function, or the second. The act of classification is what drives algorithmic selection in your code. This type of "classification by algorithm" is almost universal in well designed libraries - you can find it in C++ template libraries like Boost and the STL, and you can find it in legacy Fortran codes like the BLAS.

Efficiency of stack-based expression evaluation for math parsing

I have to write, for academic purposes, an application that plots user-input expressions like: f(x) = 1 - exp(3^(5*ln(cosx)) + x)
The approach I've chosen to write the parser is to convert the expression to RPN with the shunting-yard algorithm, treating primitive functions like "cos" as unary operators. This means the function written above would be converted into a series of tokens like:
1, x, cos, ln, 5, *, 3, ^, exp, -
The problem is that to plot the function I have to evaluate it LOTS of times, so applying the stack evaluation algorithm for each input value would be very inefficient.
How can I solve this? Do I have to forget the RPN idea?
How much is "LOTS of times"? A million?
What kind of functions could be input? Can we assume they are continuous?
Did you try measuring how well your code performs?
(Sorry, started off with questions!)
You could try one of the two approaches (or both) described briefly below (there are probably many more):
1) Parse Trees.
You could create a parse tree. Then do what most compilers do to optimize expressions: constant folding, common subexpression elimination (which you could achieve by linking together the common expression subtrees and caching the result), etc.
Then you could use lazy evaluation techniques to avoid whole subtrees. For instance if you have a tree
  *
 / \
A   B
where A evaluates to 0, you could completely avoid evaluating B as you know the result is 0. With RPN you would lose out on the lazy evaluation.
2) Interpolation
Assuming your function is continuous, you could approximate your function to a high degree of accuracy using Polynomial Interpolation. This way you can do the complicated calculation of the function a few times (based on the degree of polynomial you choose), and then do fast polynomial calculations for the rest of the time.
To create the initial set of data, you could just use approach 1 or just stick to using your RPN, as you would only be generating a few values.
So if you use Interpolation, you could keep your RPN...
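One standard way to realize this is barycentric interpolation at Chebyshev nodes. Below is a rough sketch in plain C; eval_rpn() stands in for your existing stack evaluator, and the node count N is an arbitrary choice:

#include <math.h>

#define N 64   /* interpolation degree; tune to the accuracy you need */

static double node[N + 1], val[N + 1], weight[N + 1];

extern double eval_rpn(double x);   /* hypothetical: your slow RPN evaluator */

/* Sample the expensive function once, at Chebyshev points of [a, b]. */
void build_interpolant(double a, double b)
{
    const double pi = acos(-1.0);
    for (int j = 0; j <= N; ++j) {
        node[j] = 0.5 * (a + b) + 0.5 * (b - a) * cos(pi * j / N);
        val[j] = eval_rpn(node[j]);                /* the only slow calls */
        weight[j] = (j % 2 == 0) ? 1.0 : -1.0;     /* barycentric weights */
        if (j == 0 || j == N)
            weight[j] *= 0.5;
    }
}

/* Cheap evaluation used for every plotted sample afterwards. */
double interpolate(double x)
{
    double num = 0.0, den = 0.0;
    for (int j = 0; j <= N; ++j) {
        double d = x - node[j];
        if (d == 0.0)
            return val[j];                         /* x is exactly a node */
        num += weight[j] / d * val[j];
        den += weight[j] / d;
    }
    return num / den;
}

Keep in mind this only works well where the function is smooth; near singularities you would fall back to direct evaluation.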
Hope that helps!
Why reinvent the wheel? Use a fast scripting language instead.
Integrating something like lua into your code will take very little time and be very fast.
You'll usually be able to byte-compile your expression, and that should result in code that runs very fast, certainly fast enough for simple 1D graphs.
I recommend Lua as it's fast and integrates with C/C++ more easily than any other scripting language. Another good option would be Python, but while it's better known, I found it trickier to integrate.
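For what it's worth, a minimal sketch of that embedding through the Lua C API (the expression and the sample range are just examples; link against liblua):

#include <stdio.h>
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>

int main(void)
{
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);

    /* Wrap the user-entered expression into a Lua function, compiled once. */
    luaL_dostring(L,
        "function f(x) return 1 - math.exp(3 ^ (5 * math.log(math.cos(x))) + x) end");

    /* Evaluate it at every plot sample; each call runs the compiled chunk. */
    for (double x = 0.0; x <= 1.0; x += 0.001) {
        lua_getglobal(L, "f");
        lua_pushnumber(L, x);
        if (lua_pcall(L, 1, 1, 0) == 0)            /* 1 argument, 1 result */
            printf("%g\t%g\n", x, lua_tonumber(L, -1));
        lua_pop(L, 1);                             /* pop the result (or error) */
    }

    lua_close(L);
    return 0;
}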
Why not keep around a parse tree (I use "tree" loosely, in your case it's a sequence of operations), and mark input variables accordingly? (e.g. for inputs x, y, z, etc. annotate "x" with 0 to signify the first input variable, "y" with 1 to signify the 2nd input variable, etc.)
That way you can parse the expression once, keep the parse tree, take in an array of inputs, and apply the parse tree to evaluate.
If you're worrying about the performance aspects of the evaluation step (vs. the parsing step), I don't think you'd do much better unless you get into vectorizing (applying your parse tree on a vector of inputs at once) or hard-coding the operations into a fixed function.
What I do is use the shunting algorithm to produce the RPN. I then "compile" the RPN into a tokenised form that can be executed (interpretively) repeatedly without re-parsing the expression.
Michael Anderson suggested Lua. If you want to try Lua for just this task, see my ae library.
Inefficient in what sense? There's machine time and programmer time. Is there a standard for how fast it needs to run with a particular level of complexity? Is it more important to finish the assignment and move on to the next one (perfectionists sometimes never finish)?
All those steps have to happen for each input value. Yes, you could have a heuristic that scans the list of operations and cleans it up a bit. Yes, you could compile some of it down to assembly instead of calling +, * etc. as high level functions. You can compare vectorization (doing all the +'s then all the *'s etc, with a vector of values) to doing the whole procedure for one value at a time. But do you need to?
I mean, what do you think happens if you plot a function in gnuplot or Mathematica?
Your simple interpretation of RPN should work just fine, especially since it contains
math library functions like cos, exp, and ^ (pow, involving logs)
symbol table lookup
Hopefully, your symbol table (with variables like x in it) will be short and simple.
The library functions will most likely be your biggest time-takers, so unless your interpreter is poorly written, it will not be a problem.
If, however, you really gotta go for speed, you could translate the expression into C code, compile and link it into a dll on-the-fly and load it (takes about a second). That, plus memoized versions of the math functions, could give you the best performance.
P.S. For parsing, your syntax is pretty vanilla, so a simple recursive-descent parser (about a page of code, O(n) same as shunting-yard) should work just fine. In fact, you might just be able to compute the result as you parse (if math functions are taking most of the time), and not bother with parse trees, RPN, any of that stuff.
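To illustrate that last point, here is a stripped-down sketch (not the asker's exact grammar: no '^', no unary minus, no error handling) of computing the result directly while parsing, in C:

#include <ctype.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

static const char *p;    /* cursor into the expression string */
static double x_value;   /* current value of the variable x */

static double expr(void);

static void skip_ws(void) { while (isspace((unsigned char)*p)) ++p; }

/* factor := NUMBER | 'x' | '(' expr ')' | 'cos' factor | 'exp' factor */
static double factor(void)
{
    skip_ws();
    if (*p == '(') { ++p; double v = expr(); skip_ws(); if (*p == ')') ++p; return v; }
    if (strncmp(p, "cos", 3) == 0) { p += 3; return cos(factor()); }
    if (strncmp(p, "exp", 3) == 0) { p += 3; return exp(factor()); }
    if (*p == 'x') { ++p; return x_value; }
    char *end; double v = strtod(p, &end); p = end; return v;   /* numeric literal */
}

/* term := factor (('*' | '/') factor)* */
static double term(void)
{
    double v = factor();
    for (;;) {
        skip_ws();
        if (*p == '*') { ++p; v *= factor(); }
        else if (*p == '/') { ++p; v /= factor(); }
        else return v;
    }
}

/* expr := term (('+' | '-') term)* */
static double expr(void)
{
    double v = term();
    for (;;) {
        skip_ws();
        if (*p == '+') { ++p; v += term(); }
        else if (*p == '-') { ++p; v -= term(); }
        else return v;
    }
}

/* One call per plotted sample; the string is re-scanned each time. */
double evaluate(const char *expression, double x)
{
    p = expression;
    x_value = x;
    return expr();
}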
I think this RPN based library can serve the purpose: http://expressionoasis.vedantatree.com/
I used it with one of my calculator projects and it works well. It is small and simple, but extensible.
One optimization would be to replace the stack with an array of values and implement the evaluator as a three-address machine, where each operation loads from two (or one) locations and saves to a third. This can make for very tight code:
struct Op {
    enum {
        add, sub, mul, div,
        cos, sin, tan,
        //....
    } op;
    int a, b, d;   // a, b: source slots in v[]; d: destination slot
};

void go(Op* ops, int n, float* v) {
    for (int i = 0; i < n; i++) {
        switch (ops[i].op) {
            case Op::add: v[ops[i].d] = v[ops[i].a] + v[ops[i].b]; break;
            case Op::sub: v[ops[i].d] = v[ops[i].a] - v[ops[i].b]; break;
            case Op::mul: v[ops[i].d] = v[ops[i].a] * v[ops[i].b]; break;
            case Op::div: v[ops[i].d] = v[ops[i].a] / v[ops[i].b]; break;
            //...
        }
    }
}
The conversion from RPN to 3-address should be easy as 3-address is a generalization.
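For what it's worth, a minimal sketch of that conversion in C, using its own illustrative token and op types rather than the exact structs above: every RPN value push becomes a slot index pushed on a compile-time stack, and every binary operator emits one three-address op that writes to a fresh slot.

typedef struct {
    int is_operator;   /* 1 for a binary operator, 0 for a value */
    int code;          /* operator code (add/sub/mul/div/...) when is_operator */
    int slot;          /* index into v[] when it is a value */
} RpnToken;

typedef struct { int code, a, b, d; } ThreeAddr;   /* v[d] = v[a] (code) v[b] */

/* Constants and variables are assumed to already occupy v[0 .. first_free_slot-1]. */
int compile_rpn(const RpnToken *rpn, int n, ThreeAddr *out, int first_free_slot)
{
    int stack[256];                /* holds v[] slot indices, not values */
    int sp = 0, n_ops = 0, next = first_free_slot;

    for (int i = 0; i < n; ++i) {
        if (!rpn[i].is_operator) {
            stack[sp++] = rpn[i].slot;             /* "push a value" = push its slot */
        } else {
            int b = stack[--sp];                   /* pop the two operand slots */
            int a = stack[--sp];
            out[n_ops].code = rpn[i].code;
            out[n_ops].a = a;
            out[n_ops].b = b;
            out[n_ops].d = next;                   /* result lands in a fresh slot */
            stack[sp++] = next++;
            ++n_ops;
        }
    }
    /* On exit, stack[0] is the slot holding the whole expression's value. */
    return n_ops;
}

Unary functions like cos would pop a single slot instead of two, but the idea is the same.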
