Converting 16 bit integer into 0..100 - math

I have a system that takes/gives volume (as in an audio amplifier) as a 16-bit unsigned integer. I have another system that takes/gives volume as an integer between 0 and 100.
0 is 0
100 is 65535
What's the math to convert to/from? E.g. in C#.

Dividing by 655.36 is the same thing as multiplying by 100 and then dividing by 65536, which can be done purely in integer arithmetic:
int scaled = input * 100 >> 16;
That's biased downwards, however (and therefore never results in 100), because of the truncation implicit in the division/right shift. You can make it round to nearest by adding a bias of 0.5:
int temp = input * 100;
temp += 0x8000; // 0x8000 = 0.5 in Q16
int scaled = temp >> 16;
Here, 0xfeb9 and up will result in 100. If that wasn't supposed to happen because 100 was an exclusive bound, you can of course multiply by 99 instead.
The other way around can be done using the same principles,
int scaled = ((input << 16) - 50) / 100;
This ensures that 100 -> 65535; 65536 is not a 16-bit number, so it should be avoided.
A largely similar thing can be done shorter but with extra multiplication,
int scaled = input * 65535 / 100;
Which distributes the results a bit differently but it doesn't make a lot of difference.
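As a quick check of the integer formulas above, here is a minimal Python sketch. The names `to_percent` and `to_u16` are made up for illustration. The reverse direction uses the `input * 65535 / 100` form, because Python's `//` floors negative intermediate values whereas C# truncates them, so the `- 50` variant would behave differently at 0.

```python
def to_percent(value):
    """Map a 16-bit volume (0..65535) to 0..100, rounding to nearest."""
    return (value * 100 + 0x8000) >> 16  # 0x8000 is 0.5 in Q16

def to_u16(percent):
    """Map 0..100 back to a 16-bit volume, with 100 landing on 65535."""
    return percent * 65535 // 100

if __name__ == "__main__":
    for v in (0, 32500, 0xFEB8, 0xFEB9, 65535):
        print(v, "->", to_percent(v))
    for p in (0, 50, 100):
        print(p, "->", to_u16(p))
```

With these two definitions every percentage from 0 to 100 survives a round trip through the 16-bit value and back.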

Using Vala, which is very similar to C# (a very rough and simple approach):
public static int convert_from_unsigned_int_to_percentage (uint16 val) {
return (int) ((val / 65535.0) * 100);
}
public static uint16 convert_from_percentage_to_unsigned_int (int val) {
return (uint16) ((val / 100.0) * 65535);
}
int main (string[] args) {
print ("test1 65535 -> ? = %d\n", convert_from_unsigned_int_to_percentage (65535));
print ("test1 32500 -> ? = %d\n", convert_from_unsigned_int_to_percentage (32500));
print ("test1 100%% -> ? = %d\n", convert_from_percentage_to_unsigned_int (100));
print ("test1 50%% -> ? = %d\n", convert_from_percentage_to_unsigned_int (50));
return 0;
}
Output:
./volume
test1 65535 -> ? = 100
test1 32500 -> ? = 49
test1 100% -> ? = 65535
test1 50% -> ? = 32767
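The 32500 -> 49 line in the output above comes from the cast truncating 49.59 rather than rounding it. If rounding to nearest is preferred, a floating-point variant could look like the Python sketch below; the function names are hypothetical and adding 0.5 before truncating is just one common way to round non-negative values.

```python
def to_percent_rounded(val):
    """16-bit volume to percentage, rounded to the nearest integer."""
    return int(val / 65535.0 * 100 + 0.5)

def to_u16_rounded(percent):
    """Percentage to 16-bit volume, rounded to the nearest integer."""
    return int(percent / 100.0 * 65535 + 0.5)

if __name__ == "__main__":
    print(to_percent_rounded(65535), to_percent_rounded(32500))
    print(to_u16_rounded(100), to_u16_rounded(50))
```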

Related

MPI program runtime error MPI_GATHER, qsub mpijobparallel

I am trying to run this fast fourier implementation code. It compiles fine but gives this error at runtime. I have no idea about the error or what it means. Can anyone help me out?
I compiled and run the program by:
mpicc -o exec test.c
./exec
CODE:
This is the code that I found on GITHUB. Its the parallel version of fast fourier algorithm.
#include <stdio.h>
#include <mpi.h> //To use MPI
#include <complex.h> //to use complex numbers
#include <math.h> //for cos() and sin()
#include "timer.h" //to use timer
#define PI 3.14159265
#define bigN 16384 //Problem Size
#define howmanytimesavg 3
int main()
{
int my_rank,comm_sz;
MPI_Init(NULL,NULL); //start MPI
MPI_Comm_size(MPI_COMM_WORLD,&comm_sz); //how many processes are we using?
MPI_Comm_rank(MPI_COMM_WORLD,&my_rank); //which process is this?
double start,finish;
double avgtime = 0;
FILE *outfile;
int h;
if(my_rank == 0) //if process 0 open outfile
{
outfile = fopen("ParallelVersionOutput.txt", "w"); //open from current directory
}
for(h = 0; h < howmanytimesavg; h++) //loop to run multiple times for AVG time.
{
if(my_rank == 0) //If it's process 0 starts timer
{
start = MPI_Wtime();
}
int i,k,n,j; //Basic loop variables
double complex evenpart[(bigN / comm_sz / 2)]; //array to save the data for EVENHALF
double complex oddpart[(bigN / comm_sz / 2)]; //array to save the data for ODDHALF
double complex evenpartmaster[ (bigN / comm_sz / 2) * comm_sz]; //array to save the data for EVENHALF
double complex oddpartmaster[ (bigN / comm_sz / 2) * comm_sz]; //array to save the data for ODDHALF
double storeKsumreal[bigN]; //store the K real variable so we can abuse symmetry
double storeKsumimag[bigN]; //store the K imaginary variable so we can abuse symmetry
double subtable[(bigN / comm_sz)][3]; //Each process owns a subtable from the table below
double table[bigN][3] = //TABLE of numbers to use
{
0,3.6,2.6, //n, Real,Imaginary CREATES TABLE
1,2.9,6.3,
2,5.6,4.0,
3,4.8,9.1,
4,3.3,0.4,
5,5.9,4.8,
6,5.0,2.6,
7,4.3,4.1,
};
if(bigN > 8) //Everything after row 8 is all 0's
{
for(i = 8; i < bigN; i++)
{
table[i][0] = i;
for(j = 1; j < 3;j++)
{
table[i][j] = 0.0; //set to 0.0
}
}
}
int sendandrecvct = (bigN / comm_sz) * 3; //how much to send and receive??
MPI_Scatter(table,sendandrecvct,MPI_DOUBLE,subtable,sendandrecvct,MPI_DOUBLE,0,MPI_COMM_WORLD); //scatter the table to subtables
for (k = 0; k < bigN / 2; k++) //K coeffiencet Loop
{
/* Variables used for the computation */
double sumrealeven = 0.0; //sum of real numbers for even
double sumimageven = 0.0; //sum of imaginary numbers for even
double sumrealodd = 0.0; //sum of real numbers for odd
double sumimagodd = 0.0; //sum of imaginary numbers for odd
for(i = 0; i < (bigN/comm_sz)/2; i++) //Sigma loop EVEN and ODD
{
double factoreven , factorodd = 0.0;
int shiftevenonnonzeroP = my_rank * subtable[2*i][0]; //used to shift index numbers for correct results for EVEN.
int shiftoddonnonzeroP = my_rank * subtable[2*i + 1][0]; //used to shift index numbers for correct results for ODD.
/* -------- EVEN PART -------- */
double realeven = subtable[2*i][1]; //Access table for real number at spot 2i
double complex imaginaryeven = subtable[2*i][2]; //Access table for imaginary number at spot 2i
double complex componeeven = (realeven + imaginaryeven * I); //Create the first component from table
if(my_rank == 0) //if proc 0, dont use shiftevenonnonzeroP
{
factoreven = ((2*PI)*((2*i)*k))/bigN; //Calculates the even factor for Cos() and Sin()
// *********Reduces computational time*********
}
else //use shiftevenonnonzeroP
{
factoreven = ((2*PI)*((shiftevenonnonzeroP)*k))/bigN; //Calculates the even factor for Cos() and Sin()
// *********Reduces computational time*********
}
double complex comptwoeven = (cos(factoreven) - (sin(factoreven)*I)); //Create the second component
evenpart[i] = (componeeven * comptwoeven); //store in the evenpart array
/* -------- ODD PART -------- */
double realodd = subtable[2*i + 1][1]; //Access table for real number at spot 2i+1
double complex imaginaryodd = subtable[2*i + 1][2]; //Access table for imaginary number at spot 2i+1
double complex componeodd = (realodd + imaginaryodd * I); //Create the first component from table
if (my_rank == 0)//if proc 0, dont use shiftoddonnonzeroP
{
factorodd = ((2*PI)*((2*i+1)*k))/bigN;//Calculates the odd factor for Cos() and Sin()
// *********Reduces computational time*********
}
else //use shiftoddonnonzeroP
{
factorodd = ((2*PI)*((shiftoddonnonzeroP)*k))/bigN;//Calculates the odd factor for Cos() and Sin()
// *********Reduces computational time*********
}
double complex comptwoodd = (cos(factorodd) - (sin(factorodd)*I));//Create the second component
oddpart[i] = (componeodd * comptwoodd); //store in the oddpart array
}
/*Process ZERO gathers the even and odd part arrays and creates a evenpartmaster and oddpartmaster array*/
MPI_Gather(evenpart,(bigN / comm_sz / 2),MPI_DOUBLE_COMPLEX,evenpartmaster,(bigN / comm_sz / 2), MPI_DOUBLE_COMPLEX,0,MPI_COMM_WORLD);
MPI_Gather(oddpart,(bigN / comm_sz / 2),MPI_DOUBLE_COMPLEX,oddpartmaster,(bigN / comm_sz / 2), MPI_DOUBLE_COMPLEX,0,MPI_COMM_WORLD);
if(my_rank == 0)
{
for(i = 0; i < (bigN / comm_sz / 2) * comm_sz; i++) //loop to sum the EVEN and ODD parts
{
sumrealeven += creal(evenpartmaster[i]); //sums the realpart of the even half
sumimageven += cimag(evenpartmaster[i]); //sums the imaginarypart of the even half
sumrealodd += creal(oddpartmaster[i]); //sums the realpart of the odd half
sumimagodd += cimag(oddpartmaster[i]); //sums the imaginary part of the odd half
}
storeKsumreal[k] = sumrealeven + sumrealodd; //add the calculated reals from even and odd
storeKsumimag[k] = sumimageven + sumimagodd; //add the calculated imaginary from even and odd
storeKsumreal[k + bigN/2] = sumrealeven - sumrealodd; //ABUSE symmetry Xkreal + N/2 = Evenk - OddK
storeKsumimag[k + bigN/2] = sumimageven - sumimagodd; //ABUSE symmetry Xkimag + N/2 = Evenk - OddK
if(k <= 10) //Do the first 10 K's
{
if(k == 0)
{
fprintf(outfile," \n\n TOTAL PROCESSED SAMPLES : %d\n",bigN);
}
fprintf(outfile,"================================\n");
fprintf(outfile,"XR[%d]: %.4f XI[%d]: %.4f \n",k,storeKsumreal[k],k,storeKsumimag[k]);
fprintf(outfile,"================================\n");
}
}
}
if(my_rank == 0)
{
GET_TIME(finish); //stop timer
double timeElapsed = finish-start; //Time for that iteration
avgtime = avgtime + timeElapsed; //AVG the time
fprintf(outfile,"Time Elaspsed on Iteration %d: %f Seconds\n", (h+1),timeElapsed);
}
}
if(my_rank == 0)
{
avgtime = avgtime / howmanytimesavg; //get avg time
fprintf(outfile,"\nAverage Time Elaspsed: %f Seconds", avgtime);
fclose(outfile); //CLOSE file ONLY proc 0 can.
}
MPI_Barrier(MPI_COMM_WORLD); //wait to all proccesses to catch up before finalize
MPI_Finalize(); //End MPI
return 0;
}
ERROR:
Fatal error in PMPI_Gather: Invalid datatype, error stack:
PMPI_Gather(904): MPI_Gather(sbuf=0x7fffb62799a0, scount=8192,
MPI_DATATYPE_NULL, rbuf=0x7fffb6239980, rcount=8192, MPI_DATATYPE_NULL,
root=0, MPI_COMM_WORLD) failed
PMPI_Gather(815): Datatype for argument sendtype is a null datatype
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=537490947
:
system msg for write_line failure : Bad file descriptor
Your code never uses MPI_DATATYPE_NULL; it only uses MPI_DOUBLE_COMPLEX. Note that the latter is a Fortran datatype, and using it in C is, strictly speaking, not correct.
My guess is that MPI_DOUBLE_COMPLEX is causing the issue (type not defined or not initialized because you invoked the C version of MPI_Init()).
You can obviously rewrite your code in Fortran, or use your own derived datatype for a C double complex number. MPI 2.2 and later also define MPI_C_DOUBLE_COMPLEX for exactly this C type, if your library provides it.
Meanwhile, I suggest you write simple C and Fortran helloworld programs that use MPI_DOUBLE_COMPLEX (MPI_Bcast() of one element for example) to confirm the issue is with MPI_DOUBLE_COMPLEX and is restricted to C or not.

Hough transform and OpenCL

I'm trying to implement the Hough transform for circles in OpenCL, but I've encountered a really weird problem. Every time I run the Hough kernel, I end up with a slightly different accumulator, even though the parameters are the same and the accumulator is always a freshly zeroed table (e.g. http://imgur.com/a/VcIw1). My kernel code is below:
#define BLOCK_LEN 256
__kernel void HoughCirclesKernel(
__global int* A,
__global int* imgData,
__global int* _width,
__global int* _height,
__global int* r
)
{
__local int imgBuff[BLOCK_LEN];
int localThreadIndex = get_local_id(0); //threadIdx.x
int globalThreadIndex = get_local_id(0) + get_group_id(0) * BLOCK_LEN; //threadIdx.x + blockIdx.x * Block_Len
int width = *_width; int height = *_height;
int radius = *r;
A[globalThreadIndex] = 0;
barrier(CLK_GLOBAL_MEM_FENCE);
if(globalThreadIndex < width*height)
{
imgBuff[localThreadIndex] = imgData[globalThreadIndex];
barrier(CLK_LOCAL_MEM_FENCE);
if(imgBuff[localThreadIndex] > 0)
{
float s1, c1;
for(int i = 0; i<180; i++)
{
s1 = sincos(i, &c1);
int centerX = globalThreadIndex % width + radius * c1;
int centerY = ((globalThreadIndex - centerX) / height) + radius * s1;
if(centerX < width && centerY < height)
atomic_inc(A + centerX + centerY * width);
}
}
}
barrier(CLK_GLOBAL_MEM_FENCE);
}
Could this be the fault of how I am incrementing the accumulator?
if(globalThreadIndex < width*height)
{
imgBuff[localThreadIndex] = imgData[globalThreadIndex];
barrier(CLK_LOCAL_MEM_FENCE);
...
}
This is undefined behaviour because the barrier sits inside a branch that not every work-item takes.
All work-items in a work-group must reach the same barrier.
Try this:
if(globalThreadIndex < width*height)
{
imgBuff[localThreadIndex] = imgData[globalThreadIndex];
...
}
barrier(CLK_LOCAL_MEM_FENCE);
Also, there could be another issue if you are using multiple devices:
get_local_id(0) + get_group_id(0)
Here get_group_id(0) returns the group ID per device, and it starts from 0 on every device, just as get_global_id does, so you should add proper offsets in the "ndrange" instruction when using multiple devices. Even though different devices can meet the same floating-point accuracy requirements, one of them may be more accurate than another and give slightly different results. If it is a single device, try lowering the GPU frequencies, since the differences may be defects or side effects of an overclock.
I have managed to solve my problem by finding and correcting three issues.
First of all the kernel code, the line:
int centerY = ((globalThreadIndex - centerX) / height) + radius * s1;
should be:
int centerY = (globalThreadIndex / width) + radius * s1;
The main change here was dividing by width, not height. This caused inaccuracy problems.
if(centerX < width && centerY < height)
The above condition was changed to:
if(x < width && x >= 0)
if(y < height && y >=0)
As for the accumulator problem, first I will post the code I used to create clBuffer (I am using OpenCL.net library for C#):
int[] a = new int[width*height]; //image size
ErrorCode error;
Mem cl_accumulator = (Mem)Cl.CreateBuffer(cl_context, MemFlags.ReadWrite, (IntPtr)(a.Length * sizeof(int)), out error);
CheckErr(error, "Cl.CreateBuffer");
The fix here was simple and pretty much self-explanatory:
int[] a = Enumerable.Repeat(0, width * height).ToArray();
ErrorCode error;
GCHandle accHandle = GCHandle.Alloc(a, GCHandleType.Pinned);
IntPtr accPtr = accHandle.AddrOfPinnedObject();
Mem cl_accumulator = (Mem)Cl.CreateBuffer(cl_context, MemFlags.ReadWrite | MemFlags.CopyHostPtr, (IntPtr)(a.Length * sizeof(int)), accPtr, out error);
CheckErr(error, "Cl.CreateBuffer");
I filled the accumulator table with zeros and then copied it to device buffer each time I executed the kernel.
The above errors caused the accumulator to look different and a bit malformed each time I executed the kernel.
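To make the corrected index arithmetic concrete, here is a minimal CPU reference in Python, useful mainly for comparing against the kernel's accumulator. It is a sketch under assumptions that are not in the original: the name `hough_circles`, the `steps` parameter, voting over a full circle with angles in radians, and rounding to the nearest pixel are all illustrative choices.

```python
import math

def hough_circles(img, width, height, radius, steps=360):
    """Vote for centres of circles with a fixed radius.

    img is a flat, row-major list of pixel values; the returned
    accumulator has the same layout.
    """
    acc = [0] * (width * height)  # fresh, zeroed accumulator on every call
    for idx, pixel in enumerate(img):
        if pixel <= 0:
            continue
        x = idx % width    # column: remainder of dividing by the width
        y = idx // width   # row: divide by the width, not the height
        for step in range(steps):
            angle = 2.0 * math.pi * step / steps
            cx = int(round(x + radius * math.cos(angle)))
            cy = int(round(y + radius * math.sin(angle)))
            # Both the lower and the upper bound must be checked.
            if 0 <= cx < width and 0 <= cy < height:
                acc[cx + cy * width] += 1
    return acc

if __name__ == "__main__":
    image = [0] * (21 * 21)
    image[10 + 10 * 21] = 1            # a single edge pixel at (10, 10)
    votes = hough_circles(image, 21, 21, 5)
    print(sum(votes), votes[15 + 10 * 21])
```

Because the accumulator is rebuilt from zeros on every call, two runs on the same input should give identical results, which is the property the kernel was missing.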

How to share work roughly evenly between processes in MPI despite the array_size not being cleanly divisible by the number of processes?

Hi all, I have an array of length N, and I'd like to divide it as evenly as possible between 'size' processors. N/size has a remainder, e.g. 1000 array elements divided among 7 processes, or 14 elements among 3 processes.
I'm aware of at least a couple of ways of work sharing in MPI, such as:
for (i=rank; i<N;i+=size){ a[i] = DO_SOME_WORK }
However, this does not divide the array into contiguous chunks, which I'd like to do as I believe it is faster for IO reasons.
Another one I'm aware of is:
int count = N / size;
int start = rank * count;
int stop = start + count;
// now perform the loop
int nloops = 0;
for (int i=start; i<stop; ++i)
{
a[i] = DO_SOME_WORK;
}
However, with this method, for my first example we get 1000/7 = 142 = count, so the last rank starts at 852 and ends at 994. The last 6 elements are ignored.
Would the best solution be to append something like this to the previous code?
int remainder = N%size;
int start = N-remainder;
if (rank == 0){
for (i=start;i<N;i++){
a[i] = DO_SOME_WORK;
}
}
This seems messy, and if it's the best solution I'm surprised I haven't seen it elsewhere.
Thanks for any help!
If I had N tasks (e.g., array elements) and size workers (e.g., MPI ranks), I would go as follows:
int count = N / size;
int remainder = N % size;
int start, stop;
if (rank < remainder) {
// The first 'remainder' ranks get 'count + 1' tasks each
start = rank * (count + 1);
stop = start + count;
} else {
// The remaining 'size - remainder' ranks get 'count' tasks each
start = rank * count + remainder;
stop = start + (count - 1);
}
for (int i = start; i <= stop; ++i) { a[i] = DO_SOME_WORK(); }
That is how it works:
/*
  # ranks:                remainder                        size - remainder
            /------------------------------------\ /-----------------------------\
     rank:      0         1             remainder-1                         size-1
           +---------+---------+-......-+---------+-------+-------+-.....-+-------+
    tasks: | count+1 | count+1 | ...... | count+1 | count | count | ..... | count |
           +---------+---------+-......-+---------+-------+-------+-.....-+-------+
                      ^      ^                             ^     ^
                      |      |                             |     |
   task #: rank * (count+1)  |         rank * count + remainder  |
                             |                                   |
   task #: rank * (count+1) + count    rank * count + remainder + count - 1
            \------------------------------------/
  # tasks:       remainder * count + remainder
*/
Here's a closed-form solution.
Let N = array length and P = number of processors.
From j = 0 to P-1,
Starting point of array on processor j = floor(N * j / P)
Length of array on processor j = floor(N * (j + 1) / P) - floor(N * j / P)
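A minimal Python sketch of the closed form above, assuming non-negative integers so that `//` is exactly the floor; the function name `block` is made up for illustration.

```python
def block(j, n, p):
    """Start index and length of the chunk owned by processor j of p."""
    start = n * j // p
    length = n * (j + 1) // p - start
    return start, length

if __name__ == "__main__":
    # The "1000 elements, 7 processes" example from the question.
    for j in range(7):
        print(j, block(j, 1000, 7))
```

Since each chunk ends exactly where the next one starts, the chunks are contiguous and their lengths differ by at most one.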
Consider your "1000 steps and 7 processes" example.
Simple division won't work because integer division (in C) gives you the floor, and you are left with some remainder: 1000 / 7 is 142, and there will be 6 doodads hanging out.
Ceiling division has the opposite problem: ceil(1000/7) is 143, but then the last processor overruns the array, or ends up with less to do than the others.
You are asking for a scheme to evenly distribute the remainder over processors. Some processes should have 142, others 143. There must be a more formal approach but considering the attention this question's gotten in the last six months maybe not.
Here's my approach. Every process needs to do this algorithm, and just pick out the answer it needs for itself.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char ** argv)
{
#define NR_ITEMS 1000
int i, rank, nprocs;
int *bins;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
bins = calloc(nprocs, sizeof(int));
int nr_alloced = 0;
for (i=0; i<nprocs; i++) {
int remainder = NR_ITEMS - nr_alloced; /* items not yet handed out */
int buckets = (nprocs - i); /* processes still waiting for a share */
/* if you want the "big" buckets up front, do ceiling division */
bins[i] = remainder / buckets;
nr_alloced += bins[i];
}
if (rank == 0)
for (i=0; i<nprocs; i++) printf("%d ", bins[i]);
MPI_Finalize();
return 0;
}
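The same allocation loop can be tried without MPI. The Python sketch below mirrors it; the name `greedy_bins` and the `big_first` flag (which switches to the ceiling division mentioned in the comment) are illustrative additions, not part of the answer's code.

```python
def greedy_bins(n_items, n_procs, big_first=False):
    """Give each bucket its fair share of whatever is still unallocated."""
    bins = []
    allocated = 0
    for i in range(n_procs):
        remaining = n_items - allocated
        buckets = n_procs - i
        if big_first:
            share = -(-remaining // buckets)  # ceiling division
        else:
            share = remaining // buckets      # floor division
        bins.append(share)
        allocated += share
    return bins

if __name__ == "__main__":
    print(greedy_bins(1000, 7))
    print(greedy_bins(1000, 7, big_first=True))
```

With floor division the smaller buckets come first; with ceiling division the larger ones do. Either way the shares add up to the total.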
I know this is long since gone, but a simple way to do this is to give each process the floor of (number of items) / (number of processes), plus 1 if process_num < num_items mod num_procs. In Python, an array with work counts:
# Number of items
NI=128
# Number of processes
NP=20
# Items per process
[NI//NP + (1 if P < NI%NP else 0) for P in range(0,NP)]
Improving on @Alexander's answer: make use of min to condense the logic.
int count = N / size;
int remainder = N % size;
int start = rank * count + min(rank, remainder);
int stop = (rank + 1) * count + min(rank + 1, remainder);
for (int i = start; i < stop; ++i) { a[i] = DO_SOME_WORK(); }
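The condensed formula is easy to check exhaustively. Here is a small Python sketch with hypothetical helper names (`chunk_bounds`, `counts_and_displs`); the second helper turns the bounds into per-rank counts and offsets, the shape of input that a variable-count collective such as MPI_Scatterv takes.

```python
def chunk_bounds(rank, n, size):
    """Half-open [start, stop) range of elements owned by `rank`."""
    count, remainder = divmod(n, size)
    start = rank * count + min(rank, remainder)
    stop = (rank + 1) * count + min(rank + 1, remainder)
    return start, stop

def counts_and_displs(n, size):
    """Per-rank element counts and starting offsets."""
    bounds = [chunk_bounds(r, n, size) for r in range(size)]
    counts = [stop - start for start, stop in bounds]
    displs = [start for start, _ in bounds]
    return counts, displs

if __name__ == "__main__":
    print(counts_and_displs(1000, 7))
    print(counts_and_displs(3, 5))   # more ranks than elements
```

The formula also copes with more ranks than elements: the surplus ranks simply receive empty ranges.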
I think that the best solution is to write yourself a little function for splitting work across processes evenly enough. Here's some pseudo-code, I'm sure you can write C (is that C in your question ?) better than I can.
function split_evenly_enough(num_steps, num_processes)
return = repmat(0, num_processes) ! pseudo-Matlab for an array of num_processes 0s
steps_per_process = floor(num_steps/num_processes)
return = steps_per_process ! set all elements of the return vector to this number
return(1:mod(num_steps, num_processes)) = steps_per_process + 1 ! some processes have 1 more step
end
How about this?
int* distribute(int total, int processes) {
int* distribution = new int[processes](); // () zero-initialises the counters
int last = processes - 1;
int remaining = total;
int process = 0;
while (remaining != 0) {
++distribution[process];
--remaining;
if (process != last) {
++process;
}
else {
process = 0;
}
}
return distribution;
}
The idea is that you assign an element to the first process, then an element to the second process, then an element to the third process, and so on, jumping back to the first process whenever the last one is reached.
This method works even when the number of processes is greater than the number of elements. It uses only very simple operations and should therefore be very fast.
I had a similar problem, and here is my non-optimal solution with Python and the mpi4py API. An optimal solution would take into account how the processors are laid out; here the extra work is distributed to the lower ranks. The uneven workloads only differ by one task, so it should not be a big deal in general.
from mpi4py import MPI
import sys
def get_start_end(comm,N):
    """
    Distribute N consecutive things (rows of a matrix , blocks of a 1D array)
    as evenly as possible over a given communicator.
    Uneven workload (differs by 1 at most) is on the initial ranks.
    Parameters
    ----------
    comm: MPI communicator
    N: int
        Total number of things to be distributed.
    Returns
    ----------
    rstart: index of first local row
    rend: 1 + index of last row
    Notes
    ----------
    Index is zero based.
    """
    P = comm.size
    rank = comm.rank
    rstart = 0
    rend = N
    if P >= N:
        if rank < N:
            rstart = rank
            rend = rank + 1
        else:
            rstart = 0
            rend = 0
    else:
        n = N//P # Integer division PEP-238
        remainder = N%P
        rstart = n * rank
        rend = n * (rank+1)
        if remainder:
            if rank >= remainder:
                rstart += remainder
                rend += remainder
            else:
                rstart += rank
                rend += rank + 1
    return rstart, rend
if __name__ == '__main__':
    comm = MPI.COMM_WORLD
    n = int(sys.argv[1])
    print(comm.rank,get_start_end(comm,n))

Codility K-Sparse Test **Spoilers**

Have you tried the latest Codility test?
I felt like there was an error in the definition of what a K-Sparse number is that left me confused and I wasn't sure what the right way to proceed was. So it starts out by defining a K-Sparse Number:
In the binary number "100100010000" there are at least two 0s between
any two consecutive 1s. In the binary number "100010000100010" there
are at least three 0s between any two consecutive 1s. A positive
integer N is called K-sparse if there are at least K 0s between any
two consecutive 1s in its binary representation. (My emphasis)
So the first number you see, 100100010000 is 2-sparse and the second one, 100010000100010, is 3-sparse. Pretty simple, but then it gets down into the algorithm:
Write a function:
class Solution { public int sparse_binary_count(String S,String T,int K); }
that, given:
string S containing a binary representation of some positive integer A,
string T containing a binary representation of some positive integer B,
a positive integer K.
returns the number of K-sparse integers within the range [A..B] (both
ends included)
and then states this test case:
For example, given S = "101" (A = 5), T = "1111" (B=15) and K=2, the
function should return 2, because there are just two 2-sparse integers
in the range [5..15], namely "1000" (i.e. 8) and "1001" (i.e. 9).
Basically it is saying that 8, or 1000 in base 2, is a 2-sparse number, even though it does not have two consecutive ones in its binary representation. What gives? Am I missing something here?
I tried solving that one. The assumption the problem makes, that binary representations of powers of two are K-sparse by default, is somewhat confusing and counterintuitive.
What I understood was that 8 --> 1000 is 2 to the power 3, so 8 is 3-sparse; 16 --> 10000 is 2 to the power 4, and hence 4-sparse.
Even if we assume that is true, below is my solution code (C++) for this problem, in case you are interested. It doesn't handle some cases correctly where there are powers of two between the two input numbers; I'm trying to see if I can fix that:
int sparse_binary_count (const string &S,const string &T,int K)
{
char buf[50];
char *str1,*tptr,*Sstr,*Tstr;
int i,len1,len2,cnt=0;
long int num1,num2;
char *pend,*ch;
Sstr = (char *)S.c_str();
Tstr = (char *)T.c_str();
str1 = (char *)malloc(300001);
tptr = str1;
num1 = strtol(Sstr,&pend,2);
num2 = strtol(Tstr,&pend,2);
for(i=0;i<K;i++)
{
buf[i] = '0';
}
buf[i] = '\0';
for(i=num1;i<=num2;i++)
{
str1 = tptr;
if( (i & (i-1))==0)
{
if(i >= (pow((float)2,(float)K)))
{
cnt++;
continue;
}
}
str1 = myitoa(i,str1,2);
ch = strstr(str1,buf);
if(ch == NULL)
continue;
else
{
if((i % 2) != 0)
cnt++;
}
}
return cnt;
}
char* myitoa(int val, char *buf, int base){
int i = 299999;
int cnt=0;
for(; val && i ; --i, val /= base)
{
buf[i] = "0123456789abcdef"[val % base];
cnt++;
}
buf[i+cnt+1] = '\0';
return &buf[i+1];
}
There was a note in the test details covering this specific case. According to it, any power of 2 is considered K-sparse for any K.
You can solve this simply with bitwise operations on integers. You can even work out that there are no K-sparse integers above some specific integer and below (or equal to) the integer represented by T.
As far as I can see, you must also pay a lot of attention to performance, as there are sometimes hundreds of millions of integers to be checked.
My own solution, written in Python, working very efficiently even on large ranges of integers and being successfully tested for many inputs, has failed. The results were not very descriptive, saying it does not work as required within question (although it meets all the requirements in my opinion).
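For what it's worth, here is one way to avoid checking every integer in the range. It is a sketch of a standard digit-DP counting technique, not the approach of any answer here, and all names in it are invented. It counts K-sparse integers up to a bound by walking the bits of the bound, then subtracts the two bounds. It treats powers of two as K-sparse, as the test details state. Because it recurses once per bit, it only suits inputs up to a few hundred bits as written.

```python
from functools import lru_cache

def is_k_sparse(n, k):
    """True if no two 1 bits of n are k or fewer positions apart."""
    return all(n & (n >> j) == 0 for j in range(1, k + 1))

def count_k_sparse_upto(n, k):
    """Count k-sparse integers in [1, n] with a DP over the bits of n."""
    if n <= 0:
        return 0
    bits = bin(n)[2:]

    @lru_cache(maxsize=None)
    def go(pos, gap, tight, started):
        # gap: zeros seen since the last 1 bit, capped at k
        if pos == len(bits):
            return 1 if started else 0
        limit = int(bits[pos]) if tight else 1
        total = 0
        for bit in range(limit + 1):
            next_tight = tight and bit == limit
            if bit == 1:
                if started and gap < k:
                    continue  # too few zeros since the previous 1
                total += go(pos + 1, 0, next_tight, True)
            else:
                next_gap = min(gap + 1, k) if started else gap
                total += go(pos + 1, next_gap, next_tight, started)
        return total

    return go(0, 0, True, False)

def sparse_binary_count(s, t, k):
    """Number of k-sparse integers in [A..B], given as binary strings."""
    a, b = int(s, 2), int(t, 2)
    return count_k_sparse_upto(b, k) - count_k_sparse_upto(a - 1, k)

if __name__ == "__main__":
    print(sparse_binary_count("101", "1111", 2))
```

The brute-force `is_k_sparse` is included only so the DP can be cross-checked on small ranges.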
A solution with bitwise operators: a number is K-sparse when no two of its 1 bits are K or fewer positions apart. AND each number with itself shifted right by 1..K (for K=2 this accepts 1001 and 1000 but rejects 1100 and 101), count it when every result is zero, and repeat this for all numbers in the range.
#include <stdio.h>
int KsparseNumbers(int a, int b, int s) {
int scount = 0;
int j;
for(; a <= b; ++a) {
int clash = 0; /* non-zero once two 1 bits are s or fewer positions apart */
for(j = 1; j <= s; ++j) {
clash |= a & (a >> j);
}
if (clash == 0) {
scount++;
}
}
return scount;
}
int main() {
printf("\n No of 2-sparse numbers between 5 and 15 = %d\n", KsparseNumbers(5, 15, 2));
}

Mathematically Find Max Value without Conditional Comparison

----------Updated ------------
codymanix and moonshadow have been a big help thus far. I was able to solve my problem using the equations and instead of using right shift I divided by 29. Because with 32bits signed 2^31 = overflows to 29. Which works!
Prototype in PHP
$r = $x - (($x - $y) & (($x - $y) / (29)));
Actual code for LEADS (you can only do one math function PER LINE!!! AHHHH!!!)
DERIVED1 = IMAGE1 - IMAGE2;
DERIVED2 = DERIVED1 / 29;
DERIVED3 = DERIVED1 AND DERIVED2;
MAX = IMAGE1 - DERIVED3;
----------Original Question-----------
I don't think this is quite possible with my application's limitations but I figured it's worth a shot to ask.
I'll try to make this simple. I need to find the max value of two numbers without being able to use an IF or any conditional statement.
In order to find the MAX value I can only perform the following functions
Divide, Multiply, Subtract, Add, NOT, AND ,OR
Let's say I have two numbers
A = 60;
B = 50;
Now if A is always greater than B it would be simple to find the max value
MAX = (A - B) + B;
ex.
10 = (60 - 50)
10 + 50 = 60 = MAX
The problem is that A is not always greater than B. I cannot perform ABS, MAX, MIN or conditional checks with the scripting application I am using.
Is there any way possible using the limited operation above to find a value VERY close to the max?
finding the maximum of 2 variables:
max = a-((a-b)&((a-b)>>31))
where >> is bitwise right-shift (also called SHR or ASR depending on signedness).
Instead of 31 you use the number of bits your numbers have minus one.
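A minimal Python sketch of the sign-mask formula above, plus a variant that replaces the shift with a division, for environments like the asker's that have no shift operator. Assumptions: `BITS` is the word size, the difference a - b fits in that many bits, and the division rounds toward the floor as Python's `//` does. In a language whose division truncates toward zero, the division variant would need a different constant or a correction.

```python
BITS = 32  # assumed word size; a - b must fit in a signed BITS-bit integer

def max_shift(a, b):
    d = a - b
    mask = d >> (BITS - 1)          # -1 (all ones) if d < 0, otherwise 0
    return a - (d & mask)

def max_floor_div(a, b):
    d = a - b
    mask = d // (1 << (BITS - 1))   # floor division gives the same -1 / 0 mask
    return a - (d & mask)

if __name__ == "__main__":
    for a, b in ((60, 50), (50, 60), (-100, -2), (2, -100), (7, 7)):
        print(a, b, max_shift(a, b), max_floor_div(a, b))
```

When a < b the mask is all ones, so `d & mask` is the full difference and subtracting it from a yields b. Otherwise the mask is zero and a is returned unchanged.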
I guess this one would be the simplest, if we manage to find the difference between two numbers (only the magnitude, not the sign)
max = ((a+b)+|a-b|)/2;
where |a-b| is a magnitude of difference between a and b.
If you can't trust your environment to generate the appropriate branchless operations when they are available, see this page for how to proceed. Note the restriction on input range; use a larger integer type for the operation if you cannot guarantee your inputs will fit.
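Solution without conditionals. Casting to unsigned and back does not change the value, so the absolute value is built from the sign mask instead (this assumes 32-bit int and an arithmetic right shift).
int abs (int a) { int mask = a >> 31; return (a ^ mask) - mask; }
int max (int a, int b) { return (a + b + abs(a - b)) / 2; }
int max3 (int a, int b, int c) { return max(max(a, b), c); }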
Using logical operations only, short circuit evaluation and assuming the C convention of rounding towards zero, it is possible to express this as:
int lt0(int x) {
return x && (!!((x-1)/x));
}
int mymax(int a, int b) {
return lt0(a-b)*b+(1-lt0(a-b))*a; // (1-lt0(a-b)) also covers a == b, where lt0(a-b) and lt0(b-a) are both 0
}
The basic idea is to implement a comparison operator that will return 0 or 1. It's possible to do a similar trick if your scripting language follows the convention of rounding toward the floor value like python does.
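For a language that rounds division toward the floor, one possible 0-or-1 comparison is sketched below in Python. This particular expression is my own illustration, not the answerer's: for an integer x, the quotient x / (x*x + 1) lies strictly between -1 and 1, so flooring it gives -1 for negative x and 0 otherwise, and the divisor can never be zero.

```python
def lt0(x):
    """Return 1 if the integer x is negative, otherwise 0."""
    # x // (x*x + 1) floors to -1 for negative x and to 0 for x >= 0.
    return -(x // (x * x + 1))

def branchless_max(a, b):
    # When a < b, add the (positive) difference b - a; otherwise add nothing.
    return a + lt0(a - b) * (b - a)

if __name__ == "__main__":
    for a, b in ((60, 50), (50, 60), (-100, -2), (7, 7)):
        print(a, b, branchless_max(a, b))
```

Only addition, subtraction, multiplication and division are used, and the equal-inputs case needs no special handling.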
function Min(x,y:integer):integer;
Var
d:integer;
abs:integer;
begin
d:=x-y;
abs:=d*(1-2*((3*d) div (3*d+1)));
Result:=(x+y-abs) div 2;
end;
Hmmm. I assume NOT, AND, and OR are bitwise? If so, there's going to be a bitwise expression to solve this. Note that A | B will give a number >= A and >= B. Perhaps there's a pruning method for selecting the number with the most bits.
To extend, we need the following to determine whether A (0) or B (1) is greater.
truth table:
0|0 = 0
0|1 = 1
1|0 = 0
1|1 = 0
!A and B
therefore, will give the index of the greater bit. Ergo, compare each bit in both numbers, and when they are different, use the above expression (Not A And B) to determine which number was greater. Start from the most significant bit and proceed down both bytes. If you have no looping construct, manually compare each bit.
Implementing "when they are different":
(A != B) AND (my logic here)
try this, (but be aware for overflows)
(Code in C#)
public static Int32 Maximum(params Int32[] values)
{
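Int32 retVal = values[0]; // start from the first value so the differences below stay small
foreach (Int32 i in values)
retVal -= (((retVal - i) >> 31) & (retVal - i)); // subtracts (retVal - i) only when retVal < i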
return retVal;
}
You can express this as a series of arithmetic and bitwise operations, e.g.:
#include <climits> // CHAR_BIT
int myabs(const int& in) {
const int tmp = in >> ((sizeof(int) * CHAR_BIT) - 1); // all ones if in is negative, else 0
return (in ^ tmp) - tmp;
}
int mymax(int a, int b) {
return ((a+b) + myabs(b-a)) / 2;
}
//Assuming 32 bit integers
int sign(int x)
{
return ((unsigned int)x & 0x80000000u) >> 31; // 1 if x is negative, else 0
}
int flip(int x)
{
return x ^ 1;
}
int is_non_negative(int x)
{
return flip(sign(x)); // 1 if x >= 0, else 0
}
int max(int a, int b)
{
// unsigned subtraction wraps instead of overflowing; the result is ignored when it wrapped
int diff = (int)((unsigned int)a - (unsigned int)b);
int is_pos_a = is_non_negative(a);
int is_pos_b = is_non_negative(b);
int is_diff_positive = is_non_negative(diff);
int is_diff_negative = flip(is_diff_positive);
// diff (a - b) will overflow / underflow if signs are opposite
// ex: a = INT_MAX , b = -3 then a - b => INT_MAX - (-3) => INT_MAX + 3
int can_overflow = is_pos_a ^ is_pos_b;
int cannot_overflow = flip(can_overflow);
int res = cannot_overflow * ((a * is_diff_positive) + (b * is_diff_negative)) + can_overflow * ((a * is_pos_a) + (b * is_pos_b));
return res;
}
This is my implementation using only +, -, *, %, / operators
using static System.Console;
int Max(int a, int b) => (a + b + Abs(a - b)) / 2;
int Abs(int x) => x * ((2 * x + 1) % 2);
WriteLine(Max(-100, -2) == -2); // true
WriteLine(Max(2, -100) == 2); // true
I just came up with an expression:
(( (a-b)-|a-b| ) / (2(a-b)) )*b + (( (b-a)-|b-a| )/(2(b-a)) )*a
which is equal to a if a>b and is equal to b if b>a
when a>b:
a-b>0, a-b = |a-b|, (a-b)-|a-b| = 0 so the coefficient for b is 0
b-a<0, b-a = -|b-a|, (b-a)-|b-a| = 2(b-a)
so the coefficient for a is 2(b-a)/2(b-a) which is 1
so it would ultimately return 0*b+1*a if a is bigger and vice versa
Find MAX between n & m
MAX = ( (n/2) + (m/2) + ( ((n/2) - (m/2)) * ( (2*((n/2) - (m/2)) + 1) % 2) ) )
Using #define in c:
#define MAX(n, m) ( ((n)/2) + ((m)/2) + ( (((n)/2) - ((m)/2)) * ( (2*(((n)/2) - ((m)/2)) + 1) % 2) ) )
or
#define ABS(n) ( (n) * ( (2*(n) + 1) % 2) ) // Calculates abs value of n
#define MAX(n, m) ( ((n)/2) + ((m)/2) + ABS(((n)/2) - ((m)/2)) ) // Finds max between n & m
#define MIN(n, m) ( ((n)/2) + ((m)/2) - ABS(((n)/2) - ((m)/2)) ) // Finds min between n & m
please look at this program.. this might be the best answer till date on this page...
#include <stdio.h>
int main()
{
int a,b;
a=3;
b=5;
printf("%d %d\n",a,b);
b = (a+b)-(a=b); // this line is doing the reversal
printf("%d %d\n",a,b);
return 0;
}
If A is always greater than B .. [ we can use] .. MAX = (A - B) + B;
No need. Just use: int maxA(int A, int B){ return A;}
(1) If conditionals are allowed you do max = a>b ? a : b.
(2) Any other method either use a defined set of numbers or rely on the implicit conditional checks.
(2a) max = a-((a-b)&((a-b)>>31)) this is neat, but it only works if you use 32-bit numbers. You can extend it to an arbitrarily large width N, but the method will fail if you try to find max(N-1, N+1). This algorithm works for finite state automata, but not a Turing machine.
(2b) Magnitude |a-b| is a condition: |a-b| = a-b>0 ? a-b : b-a
What about:
Square root is also a condition. Whenever c>0 and c^2 = d we have a second solution -c, because (-c)^2 = (-1)^2*c^2 = 1*c^2 = d. Square root returns the greater of the pair. It comes with a built-in int max(int c1, int c2){return max(c1, c2);}
Without comparison operator math is very symmetric as well as limited in power. Positive and negative numbers cannot be distinguished without if of some sort.
It depends which language you're using, but the Ternary Operator might be useful.
But then, if you can't perform conditional checks in your 'scripting application', you probably don't have the ternary operator.
using System;
namespace ConsoleApp2
{
class Program
{
static void Main(string[] args)
{
float a = 101, b = 15;
float max = (a + b) / 2 + ((a > b) ? a - b : b - a) / 2;
}
}
}
#region GetMaximumNumber
/// <summary>
/// Provides method to get maximum values.
/// </summary>
/// <param name="values">Integer array for getting maximum values.</param>
/// <returns>Maximum number from an array.</returns>
private int GetMaximumNumber(params int[] values)
{
// Declare to store the maximum number.
int maximumNumber = 0;
try
{
// Check that array is not null and array has an elements.
if (values != null &&
values.Length > 0)
{
// Sort the array in ascending order for getting maximum value.
Array.Sort(values);
// Get the last value from an array which is always maximum.
maximumNumber = values[values.Length - 1];
}
}
catch (Exception ex)
{
throw ex;
}
return maximumNumber;
}
#endregion
