Currently I have to work in an environment where the power operator is bugged. Can anyone think of a method to temporarily work around this bug and compute a^b (both floating point) without a power function or operator?
if you have sqrt() available:
double sqr( double x ) { return x * x; }
// meaning of 'precision': the returned answer should be base^x, where
// x is in [power-precision/2,power+precision/2]
double mypow( double base, double power, double precision )
{
    if ( power < 0 ) return 1 / mypow( base, -power, precision );
    if ( power >= 10 ) return sqr( mypow( base, power/2, precision/2 ) );
    if ( power >= 1 ) return base * mypow( base, power-1, precision );
    if ( precision >= 1 ) return sqrt( base );
    return sqrt( mypow( base, power*2, precision*2 ) );
}
double mypow( double base, double power ) { return mypow( base, power, .000001 ); }
Test code:
#include <cmath>
#include <iostream>
using namespace std;

int main()
{
    cout.precision( 12 );
    cout << mypow( 2.7, 1.23456 ) << endl;
    cout << pow  ( 2.7, 1.23456 ) << endl;
    cout << mypow( 1.001, 1000.7 ) << endl;
    cout << pow  ( 1.001, 1000.7 ) << endl;
    cout << mypow( .3, -10.7 ) << endl;
    cout << pow  ( .3, -10.7 ) << endl;
    cout << mypow( 100000, .00001 ) << endl;
    cout << pow  ( 100000, .00001 ) << endl;
    cout << mypow( 100000, .0000001 ) << endl;
    cout << pow  ( 100000, .0000001 ) << endl;
    return 0;
}
outputs:
3.40835049344
3.40835206431
2.71882549461
2.71882549383
393371.348073
393371.212573
1.00011529225
1.00011513588
1.00000548981
1.00000115129
You can use the identity a^b = e^(b·ln a); then all the calculations are relative to the same base e = 2.71828...
Now you have to implement f(x) = ln(x) and g(x) = e^x. The fast, low-precision method would be to use lookup tables for f(x) and g(x). Maybe that's good enough for your purposes. If not, you can use the Taylor series expansions to express ln(x) and e^x in terms of multiplication and addition.
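To sketch that idea concretely (the names my_ln/my_exp/my_pow are mine, not from the answer; both series are truncated at a fixed term count, so accuracy degrades for arguments far from 1):

```cpp
#include <cassert>
#include <cmath>

// ln(x) via the artanh series: ln(x) = 2 * (y + y^3/3 + y^5/5 + ...),
// where y = (x - 1) / (x + 1). Converges for all x > 0, fastest near 1.
double my_ln(double x) {
    double y = (x - 1.0) / (x + 1.0);
    double y2 = y * y;
    double term = y, sum = 0.0;
    for (int n = 1; n < 200; n += 2) {
        sum += term / n;
        term *= y2;
    }
    return 2.0 * sum;
}

// e^x via the Taylor series 1 + x + x^2/2! + x^3/3! + ...
double my_exp(double x) {
    double term = 1.0, sum = 1.0;
    for (int n = 1; n < 60; ++n) {
        term *= x / n;  // builds x^n / n! incrementally
        sum += term;
    }
    return sum;
}

// a^b = e^(b * ln a), using only multiplication and addition inside.
double my_pow(double a, double b) { return my_exp(b * my_ln(a)); }
```

For moderate arguments this matches the library pow to several digits; for large or tiny bases you would want argument reduction first, which the lookup-table approach sidesteps.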
Given that you can use sqrt, this simple recursive algorithm works:
Suppose we're calculating a^b. The algorithm does fast exponentiation on the exponent until only the fractional part remains; once in the fractional part, it does a modified binary search until it is within EPS of the fractional exponent.
#include <cmath>  // sqrt, fabs

double EPS = 0.0001;

double exponentiation(double base, double exp){
    if(exp >= 1){
        double temp = exponentiation(base, exp / 2);
        return temp * temp;
    } else {
        double low = 0;
        double high = 1.0;
        double sqr = sqrt(base);
        double acc = sqr;
        double mid = high / 2;
        while(fabs(mid - exp) > EPS){  // fabs, not abs: abs may truncate to int
            sqr = sqrt(sqr);
            if (mid <= exp) {
                low = mid;
                acc *= sqr;
            } else {
                high = mid;
                acc *= (1/sqr);
            }
            mid = (low + high) / 2;
        }
        return acc;
    }
}
One of my OpenCL helper functions that writes to global memory runs just fine in one place, and the kernel executes normally. But the same write placed directly after the conditional (marked below) freezes/crashes the kernel, and my program can't function.
The values in this function change (different values for an NDRange of 2^16), and therefore the loop bounds change as well, and not all threads execute the same code because of the conditionals.
Why exactly is this an issue? Am I missing some kind of memory barrier or something?
void add_world_seeds(yada yada yada...., const uint global_id, __global long* world_seeds)
for (; indexer < (1 << 16); indexer += increment) {
    long k = (indexer << 16) + c;
    long target2 = (k ^ e) >> 16;
    long second_addend = get_partial_addend(k, x, z) & MASK_16;
    if (ctz(target2 - second_addend) < mult_trailing_zeroes) { continue; }
    long a = (((first_mult_inv * (target2 - second_addend)) >> mult_trailing_zeroes) ^ (J1_MUL >> 32)) & mask;
    for (; a < (1 << 16); a += increment) {
        world_seeds[global_id] = (a << 32) + k; //WORKS HERE
        if (get_population_seed((a << 32) + k, x, z) != population_seed_state) { continue; }
        world_seeds[global_id] = (a << 32) + k; //DOES NOT WORK HERE
    }
}
There was in fact a bug causing the undefined behavior in the code: the main reversal kernel took an argument called "increment", and in that same kernel I defined another variable also called increment. It compiled fine but led to completely wrong results and memory crashes.
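That failure mode (a local definition silently shadowing a parameter of the same name) is easy to reproduce in plain C++; this is a hypothetical reduction, not the original kernel:

```cpp
#include <cassert>

// Hypothetical reduction of the bug: the inner 'stride' shadows the
// parameter, so the loop ignores whatever the caller passed. The code
// compiles cleanly (at most a -Wshadow warning) but computes the wrong thing.
int count_steps(int limit, int stride) {
    int steps = 0;
    for (int i = 0; i < limit; ) {
        int stride = 1;  // shadows the parameter
        i += stride;     // always advances by 1, not by the caller's stride
        ++steps;
    }
    return steps;
}
```

count_steps(8, 4) ought to return 2, but with the shadowing it returns 8: the same "compiles fine, wrong results" behaviour described above.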
I am a novice in the field of distributed computation, and I know the most popular standard is the Message Passing Interface (MPI). However, if I only have one server, I can still run my program under the MPI framework, as in the following demo example.
# include <cmath>
# include <cstdlib>
# include <ctime>
# include <iomanip>
# include <iostream>
# include <mpi.h>
using namespace std;
int main ( int argc, char *argv[] );
double f ( double x );
void timestamp ( );
int main ( int argc, char *argv[] )
{
double end_time;
int i;
int id;
int ierr;
int m;
int p;
double r8_pi = 3.141592653589793238462643;
int process;
double q_global;
double q_local;
int received;
int source;
double start_time;
MPI_Status status;
int tag;
int target;
double x;
double xb[2];
double x_max = 1.0;
double x_min = 0.0;
//
// Establish the MPI environment.
//
ierr = MPI_Init ( &argc, &argv );
if ( ierr != 0 )
{
cout << "\n";
cout << "INTERVALS_MPI - Fatal error!";
cout << " MPI_Init returned ierr = " << ierr << "\n";
exit ( 1 );
}
//
// Determine this process's rank.
//
ierr = MPI_Comm_rank ( MPI_COMM_WORLD, &id );
//
// Get the number of processes.
//
ierr = MPI_Comm_size ( MPI_COMM_WORLD, &p );
//
// Say hello (once), and shut down right away unless we
// have at least 2 processes available.
//
if ( id == 0 )
{
timestamp ( );
cout << "\n";
cout << "INTERVALS - Master process:\n";
cout << " C++ version\n";
cout << "\n";
cout << " An MPI example program,\n";
cout << " A quadrature over an interval is done by\n";
cout << " assigning subintervals to processes.\n";
cout << "\n";
cout << " The number of processes is " << p << "\n";
start_time = MPI_Wtime ( );
if ( p <= 1 )
{
cout << "\n";
cout << "INTERVALS - Master process:\n";
cout << " Need at least 2 processes!\n";
MPI_Finalize ( );
cout << "\n";
cout << "INTERVALS - Master process:\n";
cout << " Abnormal end of execution.\n";
exit ( 1 );
}
}
cout << "\n";
cout << "Process " << id << ": Active!\n";
//
// Every process could figure out the endpoints of its interval
// on its own. But we want to demonstrate communication. So we
// assume that the assignment of processes to intervals is done
// only by the master process, which then tells each process
// what job it is to do.
//
if ( id == 0 )
{
for ( process = 1; process <= p-1; process++ )
{
xb[0] = ( ( double ) ( p - process ) * x_min
+ ( double ) ( process - 1 ) * x_max )
/ ( double ) ( p - 1 );
xb[1] = ( ( double ) ( p - process - 1 ) * x_min
+ ( double ) ( process ) * x_max )
/ ( double ) ( p - 1 );
target = process;
tag = 1;
ierr = MPI_Send ( xb, 2, MPI_DOUBLE, target, tag, MPI_COMM_WORLD );
}
}
else
{
source = 0;
tag = 1;
ierr = MPI_Recv ( xb, 2, MPI_DOUBLE, source, tag, MPI_COMM_WORLD, &status );
}
//
// Wait here until everyone has gotten their assignment.
//
ierr = MPI_Barrier ( MPI_COMM_WORLD );
if ( id == 0 )
{
cout << "\n";
cout << "INTERVALS - Master process:\n";
cout << " Subintervals have been assigned.\n";
}
//
// Every process needs to be told the number of points to use.
// Since this is the same value for everybody, we use a broadcast.
// Again, we are doing it in this roundabout way to emphasize that
// the choice for M could really be made at runtime, by processor 0,
// and then sent out to the others.
//
m = 100;
source = 0;
ierr = MPI_Bcast ( &m, 1, MPI_INT, source, MPI_COMM_WORLD );
//
// Now, every process EXCEPT 0 computes its estimate of the
// integral over its subinterval, and sends the result back
// to process 0.
//
if ( id != 0 )
{
q_local = 0.0;
for ( i = 1; i <= m; i++ )
{
x = ( ( double ) ( 2 * m - 2 * i + 1 ) * xb[0]
+ ( double ) ( 2 * i - 1 ) * xb[1] )
/ ( double ) ( 2 * m );
q_local = q_local + f ( x );
}
q_local = q_local * ( xb[1] - xb[0] ) / ( double ) ( m );
target = 0;
tag = 2;
ierr = MPI_Send ( &q_local, 1, MPI_DOUBLE, target, tag, MPI_COMM_WORLD );
}
//
// Process 0 expects to receive P-1 partial results.
//
else
{
received = 0;
q_global = 0.0;
while ( received < p - 1 )
{
source = MPI_ANY_SOURCE;
tag = 2;
ierr = MPI_Recv ( &q_local, 1, MPI_DOUBLE, source, tag, MPI_COMM_WORLD,
&status );
q_global = q_global + q_local;
received = received + 1;
}
}
//
// The master process prints the answer.
//
if ( id == 0 )
{
cout << "\n";
cout << "INTERVALS - Master process:\n";
cout << " Estimate for PI is " << q_global << "\n";
cout << " Error is " << q_global - r8_pi << "\n";
end_time = MPI_Wtime ( );
cout << "\n";
cout << " Elapsed wall clock seconds = "
<< end_time - start_time << "\n";
}
//
// Terminate MPI.
//
MPI_Finalize ( );
//
// Terminate.
//
if ( id == 0 )
{
cout << "\n";
cout << "INTERVALS - Master process:\n";
cout << " Normal end of execution.\n";
cout << "\n";
timestamp ( );
}
return 0;
}
//****************************************************************************80
double f ( double x )
{
double value;
value = 4.0 / ( 1.0 + x * x );
return value;
}
//****************************************************************************80
void timestamp ( )
{
# define TIME_SIZE 40
static char time_buffer[TIME_SIZE];
const struct std::tm *tm_ptr;
std::time_t now;
now = std::time ( NULL );
tm_ptr = std::localtime ( &now );
std::strftime ( time_buffer, TIME_SIZE, "%d %B %Y %I:%M:%S %p", tm_ptr );
std::cout << time_buffer << "\n";
return;
# undef TIME_SIZE
}
Actually, this is a simple case where we use MPI to compute the integral of a specific function. I run this program with 4 processes. What confuses me is that we could also use OpenMP for shared-memory programming instead of MPI and avoid the communication cost, so I do not see the point of MPI on a single machine.
I'm trying to implement some draws using a polya urn scheme using Rcpp. Basically, I have a matrix I'm drawing from, and a 2nd matrix with weights proportional to the probabilities. After each draw, I need to increase the weight of whichever cell I drew.
I was running into some indexing errors which led me to examine the sampling more generally, and I found that my weight matrix was getting modified by RcppArmadillo::sample. Two questions: (1) is this behavior that I should have expected, or is it a bug I should report somewhere? (2) Any ideas on a current workaround? Here's a reproducible example:
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp ;
// [[Rcpp::export]]
void sampler(int N, int inc, NumericMatrix& weight_matrix, int reps) {
    IntegerVector wm_tmp = seq_along(weight_matrix);
    Rcout << "Initial weight_matrix:\n" << weight_matrix << "\n";
    int x_ind;
    for(int i = 0; i < reps; ++i) {
        x_ind = RcppArmadillo::sample(wm_tmp, 1, true, weight_matrix)(0) - 1;
        Rcout << "Weight matrix after sample: (rep = " << i << ")\n" << weight_matrix << "\n";
        Rcout << "x_ind: " << x_ind << "\n";
        // get indices
        weight_matrix[x_ind] = weight_matrix[x_ind] + inc;
        Rcout << "Add increment of " << inc << " to weight_matrix:\n" << weight_matrix << "\n";
    }
}
//
// // [[Rcpp::export]]
// IntegerVector seq_cpp(IntegerMatrix x) {
// IntegerVector tmp = seq_along(x);
// IntegerVector ret = RcppArmadillo::sample(tmp, 2, true);
// return ret;
// }
/*** R
weight_matrix <- matrix(1, 5, 2)
sampler(5, 1, weight_matrix, 3)
weight_matrix <- matrix(1, 5, 2)
sampler(5, 0, weight_matrix, 3)
*/
Thanks!
That is known and documented behaviour.
You could:
i) use Rcpp::clone() to create a distinct copy of your SEXP (i.e. the NumericMatrix), or
ii) use an Armadillo matrix instead and pass it as const arma::mat & m.
There are architectural reasons, having to do with the way R organizes its data structures, which mean that we cannot give you both fast access (no copies!) and protection against writes.
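The distinction can be sketched in plain C++ (names are illustrative; std::vector stands in for the NumericMatrix, and the explicit copy plays the role of Rcpp::clone()):

```cpp
#include <cassert>
#include <vector>

// Mutating a reference argument changes the caller's object...
void sampler_in_place(std::vector<double>& weights) {
    weights[0] += 1.0;
}

// ...while taking an explicit copy first (the analogue of what
// Rcpp::clone() does for a SEXP) leaves the caller's data untouched.
void sampler_on_copy(const std::vector<double>& weights_ref) {
    std::vector<double> weights = weights_ref;  // the "clone"
    weights[0] += 1.0;                          // only the local copy changes
}
```

Rcpp objects are thin proxies over R's memory, so passing a NumericMatrix by value still aliases the same R data; only an explicit clone() actually copies.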
I was solving the Timus OJ problem Metro. I solved it using DP but I am getting TLE. How can I do it better? Solution on Ideone. Help!
#include <bits/stdc++.h>
using namespace std;
double dis[1005][1005];
map< pair< int, int >, int > m;
int N, M, K;
double min( double a, double b ){
if( a < b )
return a;
return b;
}
int main(){
    cin >> M >> N;
    cin >> K;
    dis[0][0] = 0;
    for( int i = 1; i <= N; ++i )
        dis[i][0] = 100*i;
    for( int j = 1; j <= M; ++j )
        dis[0][j] = 100*j;
    for( int i = 0, x, y; i < K; ++i ){
        scanf( "%d%d", &x, &y );
        m[{y,x}]++;
    }
    double shortcut = sqrt(20000);
    for( int i = 1; i <= N; ++i )
        for( int j = 1; j <= M; ++j ){
            //cout << i << " " << j << "\n\t";
            if( m[{i,j}] > 0 ){
                dis[i][j] = min( dis[i-1][j-1] + shortcut,
                                 min( dis[i-1][j], dis[i][j-1] ) + 100 );
                m[{i,j}]--;
            }else{
                dis[i][j] = min( dis[i-1][j], dis[i][j-1] ) + 100;
            }
        }
    cout << ceil(dis[N][M]) << endl;
}
P.S. I have problems formatting code on this platform, therefore Ideone is used.
UPD: Accepted :)
I read the question wrong. I claim that, by the triangle inequality, the distance to cross a block diagonally is less than travelling right and then up!
I will submit and explain soon..
I'm very interested in cryptography, and since I like programming too, I decided to make a little program to encrypt files using XTEA encryption algorithm.
I was inspired by Wikipedia, and so I wrote this function to do the encryption (to save space, I won't post the deciphering function, as it is almost the same):
void encipher(long *v, long *k)
{
long v0 = v[0], v1 = v[1];
long sum = 0;
long delta = 0x9e3779b9;
short rounds = 32;
for(uint32 i = 0; i<rounds; i++)
{
v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
sum += delta;
v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum>>11) & 3]);
}
v[0] = v1;
v[1] = v1;
}
Now when I want to use it, I wrote this code:
long data[2]; // v0 and v1, 64bits
data[0] = 1;
data[1] = 1;
long key[4]; // 4 * 4 bytes = 16bytes = 128bits
*key = 123; // sets the key
cout << "READ: \t\t" << data[0] << endl << "\t\t" << data[1] << endl;
encipher(data, key);
cout << "ENCIPHERED: \t" << data[0] << endl << "\t\t" << data[1] << endl;
decipher(data, key);
cout << "DECIPHERED: \t" << data[0] << endl << "\t\t" << data[1] << endl;
I always get either a run-time crash or wrong deciphered text:
I do understand the basics of the program, but I don't really know what is wrong with my code. Why are the enciphered data[0] and data[1] the same? And why is the deciphered data completely different from the starting data? Am I using the types wrong?
I hope you can help me solving my problem :) .
Jan
The problem is here:
v[0] = v1; // should be v[0] = v0
v[1] = v1;
Also, you only set the first 4 bytes of the key. The remaining 12 bytes are uninitialized.
Try something like this:
key[0] = 0x12345678;
key[1] = 0x90ABCDEF;
key[2] = 0xFEDCBA09;
key[3] = 0x87654321;
The fixed code gives me this output:
READ: 1
1
ENCIPHERED: -303182565
-1255815002
DECIPHERED: 1
1
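One more caveat beyond that answer: XTEA is specified on unsigned 32-bit words, so declaring v and k as long (64 bits on most Linux/macOS builds) changes how the shifts and additions wrap. A sketch of the round-trip with fixed-width types (my own naming, following the reference structure from Wikipedia):

```cpp
#include <cstdint>

// XTEA with unsigned 32-bit words, 32 cycles (64 Feistel rounds).
void encipher32(uint32_t v[2], const uint32_t k[4]) {
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    const uint32_t delta = 0x9e3779b9;
    for (int i = 0; i < 32; ++i) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
        sum += delta;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum >> 11) & 3]);
    }
    v[0] = v0;  // note: v0, not v1 -- the bug from the question
    v[1] = v1;
}

// Exact inverse: undo each step in reverse order.
void decipher32(uint32_t v[2], const uint32_t k[4]) {
    uint32_t v0 = v[0], v1 = v[1];
    const uint32_t delta = 0x9e3779b9;
    uint32_t sum = delta * 32;  // wraps mod 2^32, matching encipher's final sum
    for (int i = 0; i < 32; ++i) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum >> 11) & 3]);
        sum -= delta;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}
```

With unsigned fixed-width types the overflow wrap-around is well defined, and decipher32(encipher32(x)) returns x for any key.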