Frequency Shift without Intermediate Buffer using IPP

I have a buffer full of real/imaginary sample values of a signal, and would like to shift the signal in the frequency domain by multiplying it by a sinusoid.
I see four options:
1. use some IPP function (I could not find one, though)
2. manually calculate the result (probably slower than IPP)
3. generate the sinusoid in a separate buffer (probably requires lots of memory)
4. generate parts of the sinusoid in a separate buffer (requires recalculating the tone buffer)
I'm wondering what would be the best approach here, and/or whether I have just missed a ready-made function for frequency shifting a complex signal.

If you are going for speed, do it in the frequency domain.
FFT -> circular shift by N bins -> IFFT
I have found the fftw++ wrapper quite handy.
If you are really set on doing it in the time domain, you could use Intel's VML functions in some fashion like this:
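// Create a single period of the frequency offset wave: exp(j*2*pi*i/period)
vector<complex<float> > cxWave(period);
for(int i = 0; i < period; ++i)
    cxWave[i] = complex<float>(0.0f, float(i * 2 * M_PI / period)); // purely imaginary phase argument
vcExp( period, &cxWave.at(0), &cxWave.at(0) );
// Multiply the entire signal by the complex sinusoid, one period at a time, in place
for(int frame=0; frame < numFrames; ++frame)
{
    // write the product back into the signal, not into cxWave
    vcMul( period, &input.at(frame*period), &cxWave.at(0), &input.at(frame*period) );
}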
You would of course need to fill in the blanks.
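For the "manually calculate the result" option from the question, here is a minimal sketch in plain Python (standard library only, not IPP or VML). It is meant to show that the sinusoid does not need a buffer of its own: a single complex phasor is rotated by a fixed step per sample. The function name, the `renormalize_every` parameter and the example sample rate are assumptions made for illustration.

```python
import cmath
import math

def frequency_shift(samples, shift_hz, sample_rate_hz, renormalize_every=1024):
    """Multiply complex samples by exp(j*2*pi*shift_hz*n/sample_rate_hz)."""
    step = cmath.exp(2j * math.pi * shift_hz / sample_rate_hz)  # rotation per sample
    phasor = 1 + 0j
    shifted = []
    for n, x in enumerate(samples):
        shifted.append(x * phasor)
        phasor *= step
        if (n + 1) % renormalize_every == 0:
            phasor /= abs(phasor)  # keep |phasor| at 1 despite rounding drift
    return shifted

# A 1000 Hz complex tone sampled at 8000 Hz, shifted up by 500 Hz.
tone = [cmath.exp(2j * math.pi * 1000 * n / 8000) for n in range(64)]
shifted = frequency_shift(tone, 500, 8000)
```

Unlike the FFT bin shift, this approach is not restricted to shifts that are a whole number of bins, at the cost of one complex multiply per sample for the phasor update.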

Related

Trying to understand how a series of arrays is being mapped in an AVR routine

I'm trying to port an Arduino AVR routine either to ESP32/8266 or a Python script and would appreciate understanding how to crack the operation of this program. I'm self-teaching and am only looking to get something that works - pretty isn't required. This is a hobby and I am the only audience. The basic operations are understood (99% certain ;)) - there are 4 arrays total: Equilarg and Nodefactor contain 10 rows of 37 values; startSecs contains the epochtime values for the start of each year (2022-2032); and speed contains 37 values.
I believe each row of the Equilarg and Nodefactor arrays corresponds to a year, but I can't work out how the specific element is pulled from each of the three 37-element arrays.
Here is the operating code:
// currentTide calculation function, takes a DateTime object from real time clock.
float TideCalc::currentTide (DateTime now)
{
// Calculate difference between current year and starting year.
YearIndx = now.year() - startYear;
// Calculate hours since start of current year. Hours = seconds / 3600
currHours = (now.unixtime() - pgm_read_dword_near (&startSecs[YearIndx])) / float(3600);
// Shift currHours to Greenwich Mean Time
currHours = currHours + adjustGMT;
// **************Calculate current tide height**********
// initialize results variable, units of feet.
// (This is 3.35 if it matters to understanding how it works)
tideHeight = Datum;
for (int harms = 0; harms < 37; harms++)
{
// Step through each harmonic constituent, extract the relevant
// values of Nodefactor, Amplitude, Equilibrium argument, Kappa
// and Speed.
currNodefactor = pgm_read_float_near (&Nodefactor[YearIndx][harms]);
currAmp = pgm_read_float_near (&Amp[harms]);
currEquilarg = pgm_read_float_near (&Equilarg[YearIndx][harms]);
currKappa = pgm_read_float_near (&Kappa[harms]);
currSpeed = pgm_read_float_near (&Speed[harms]);
// Calculate each component of the overall tide equation
// The currHours value is assumed to be in hours from the start of
// the year, in the Greenwich Mean Time zone, not the local time zone.
tideHeight = tideHeight + currNodefactor * currAmp
* cos ((currSpeed * currHours + currEquilarg - currKappa) * DEG_TO_RAD);
}
//***************End of Tide Height calculation**********
// Output of tideCalc is the tide height, units of feet.
return tideHeight;
}
I've made several attempts to reverse engineer by running the code on an AVR board and trapping the input values and then work backwards but I'm just not seeing a basic part or two. In this instance knowing "kinda" what's going on falls too short.

pgm_read_float_near reads a float value from flash memory. It needs the address of the value. We give it the address of the indexed value when we use &Amp[harms] for example. Both Nodefactor and Equilarg are doubly indexed - by year and then by harmonic, while the other three are indexed by the harmonic alone.
It sounds like this is a Fourier series curve fit for the tide height. So they're summing up a series of cosine values, each with different amplitude, frequency, and phase.
As @Tom suggests, copy the code to a plain C file, make a little routine for a dummy pgm_read_float_near and see how it works on your PC. Many times I write and debug algorithms on a "big" computer, and later plop the code into the Arduino.
Have fun!
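Since the question mentions a Python port, the indexing described above can be sketched as follows. The table values here are made-up placeholders (two years, two constituents), not real tide data, and the function and variable names are assumptions; the real tables have 37 constituents per row.

```python
import math

def tide_height(hours_since_year_start, datum, nodefactor_row, equilarg_row,
                amp, kappa, speed):
    """Sum the harmonic constituents for one year (one row) of the tables."""
    height = datum
    for h in range(len(speed)):
        angle_deg = speed[h] * hours_since_year_start + equilarg_row[h] - kappa[h]
        height += nodefactor_row[h] * amp[h] * math.cos(math.radians(angle_deg))
    return height

# Placeholder tables (NOT real tide data): 2 years x 2 constituents.
START_YEAR = 2022
NODEFACTOR = [[1.0, 1.0], [1.1, 0.9]]   # indexed [year_index][harmonic]
EQUILARG = [[0.0, 90.0], [10.0, 80.0]]  # indexed [year_index][harmonic]
AMP = [2.0, 0.5]                        # indexed [harmonic] only
KAPPA = [0.0, 90.0]                     # indexed [harmonic] only
SPEED = [30.0, 15.0]                    # degrees per hour, indexed [harmonic] only

year_index = 2022 - START_YEAR          # selects the row, like YearIndx
height_at_0h = tide_height(0.0, 3.35, NODEFACTOR[year_index],
                           EQUILARG[year_index], AMP, KAPPA, SPEED)
```

The year only selects which row of Nodefactor and Equilarg is used; the loop index then picks the same column from all five tables.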

Arduino: find the peak and troughs in sensor data

I'm trying to poll an accelerometer every x ms on 3 axes, and trying to figure out how to determine the peaks and troughs of the readings I get.
Ideally, I wouldn't want to collect a whole bunch of data before I can start counting the peaks - maybe every 10 minutes at most if data collection is first required. The peaks should also only be counted if the absolute value of the peak is within an acceptable "distance" of the average set of peaks - to prevent a very small peak from being counted...
I would appreciate any pointers wrt doing this?

You can start by calculating the standard deviation and counting the peaks that deviate by more than a specified level.
Wiki article

It depends on what your signal looks like. I solved a similar problem with a sinusoidal signal by using an IIR filter on the signal to smooth out the noise and prevent false peaks.
You might try something like this:
int signalPin = 3; // accel connected to pin 3
float signal;
float gain = 0.1;
void setup()
{
pinMode(signalPin, INPUT);
signal = analogRead(signalPin); // get initial reading
}
void loop()
{
// allow a new reading of the accel to slightly change the signal (depending on the value of gain)
signal += gain * (analogRead(signalPin) - signal); // IIR filter: move a fraction 'gain' toward the new reading
}
You will probably also want to use the map function to adjust the value obtained from analogRead, as well as use int or long instead of float if you're worried about efficiency.
Now you can use a strategy like the following:
Read a new value from the accel, checking whether it is higher than the previous one, and if so, save it as the new max. When the values stop increasing, that is your max value. You can require N samples within a certain threshold to verify you are at the flat top of the peak. Hopefully, this gets you started.
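The smoothing and peak-picking strategy can be tried off the board first. This is a minimal sketch in Python, assuming the readings are already collected in a list; the function names and the `min_height` threshold are illustrative, and the threshold stands in for the "acceptable distance from the average peak" rule in the question.

```python
def smooth(samples, gain=0.1):
    """Single-pole IIR low-pass: each reading moves the state a fraction 'gain'."""
    state = samples[0]
    out = []
    for x in samples:
        state += gain * (x - state)
        out.append(state)
    return out

def find_peaks(values, min_height):
    """Indices of local maxima that are at least min_height."""
    peaks = []
    for i in range(1, len(values) - 1):
        rising = values[i - 1] < values[i]
        not_rising_after = values[i] >= values[i + 1]
        if rising and not_rising_after and values[i] >= min_height:
            peaks.append(i)
    return peaks

readings = [0, 1, 3, 1, 0, 2, 5, 2, 0, 0.5, 0.2]
peaks = find_peaks(readings, min_height=2)   # the small bump at index 9 is ignored
```

Troughs can be found the same way by negating the values before calling `find_peaks`.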

How to get a "random" number in OpenCL

I'm looking to get a random number in OpenCL. It doesn't have to be real random or even that random. Just something simple and quick.
I see there is a ton of real random parallelized fancy pants random algorithms in OpenCL that are like thousand and thousands of lines. I do NOT need anything like that. A simple 'random()' would be fine, even if it is easy to see patterns in it.
I see there is a Noise function? Any easy way to use that to get a random number?

I was solving this "no random" issue for last few days and I came up with three different approaches:
Xorshift - I created a generator based on this one. All you have to do is provide one uint2 number (seed) for the whole kernel and every work item will compute its own random number
// 'randoms' is uint2 passed to kernel
uint seed = randoms.x + globalID;
uint t = seed ^ (seed << 11);
uint result = randoms.y ^ (randoms.y >> 19) ^ (t ^ (t >> 8));
Java random - I used code from .next(int bits) method to generate random number. This time you have to provide one ulong number as seed.
// 'randoms' is ulong passed to kernel
ulong seed = randoms + globalID;
seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
uint result = seed >> 16;
Just generate all on CPU and pass it to kernel in one big buffer.
I tested all three approaches (generators) in my evolution algorithm computing Minimum Dominating Set in graphs.
I like the generated numbers from the first one, but it looks like my evolution algorithm doesn't.
The second generator produces numbers that have some visible pattern, but my evolution algorithm likes it that way anyway and the whole thing runs a little faster than with the first generator.
But the third approach shows that it's absolutely fine to just provide all the numbers from the host (CPU). At first I thought that generating (in my case) 1536 int32 numbers and passing them to the GPU in every kernel call would be too expensive (to compute and transfer to the GPU). But it turns out it is as fast as my previous attempts, and the CPU load stays under 5%.
BTW, I also tried MWC64X Random, but after I installed a new GPU driver the function mul_hi started causing build failures (even the whole AMD Kernel Analyzer crashed).
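To see what the second generator does per work item, the same arithmetic can be emulated on the host. This is a sketch in Python, not OpenCL; `lcg_next` and `per_item_randoms` are hypothetical helper names, and the host seed is an arbitrary example value.

```python
MASK_48 = (1 << 48) - 1

def lcg_next(state):
    """One step of the 48-bit LCG quoted above; returns (new_state, 32-bit output)."""
    state = (state * 0x5DEECE66D + 0xB) & MASK_48
    return state, state >> 16

def per_item_randoms(host_seed, num_items):
    """Emulate 'seed = randoms + globalID' followed by one LCG step per work item."""
    return [lcg_next(host_seed + global_id)[1] for global_id in range(num_items)]

values = per_item_randoms(host_seed=12345, num_items=8)
```

Because neighbouring work items start from consecutive seeds and take only one step, their outputs differ by a nearly constant stride, which is probably the visible pattern mentioned above. Running a few more LCG steps per item should reduce it.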

The following is the algorithm used by the java.util.Random class according to the doc:
(seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1)
See the documentation for its various implementations. Passing the worker's id in for the seed and looping a few times should produce decent randomness.
Another method would be to have some random operations occur that are fairly certain to overflow:
long rand= yid*xid*as_float(xid-yid*xid);
rand*=rand<<32^rand<<16|rand;
rand*=rand+as_double(rand);
with xid=get_global_id(0); and yid= get_global_id(1);

I am currently implementing a Realtime Path Tracer. You might already know that Path Tracing requires many many random numbers.
Before generating random numbers on the GPU I simply generated them on the CPU (using rand(), which sucks) and passed them to the GPU.
That quickly became a bottleneck.
Now I am generating the random numbers on the GPU with the Park-Miller Pseudorandom Number Generator (PRNG).
It is extremely simple to implement and achieves very good results.
I took thousands of samples (in the range of 0.0 to 1.0) and averaged them together.
The resulting value was very close to 0.5 (which is what you would expect). Between different runs the divergence from 0.5 was around 0.002. Therefore it has a very uniform distribution.
Here's a paper describing the algorithm: http://www.cems.uwe.ac.uk/~irjohnso/coursenotes/ufeen8-15-m/p1192-parkmiller.pdf
And here's a paper about the above algorithm optimized for CUDA (which can easily be ported to OpenCL): http://www0.cs.ucl.ac.uk/staff/ucacbbl/ftp/papers/langdon_2009_CIGPU.pdf
Here's an example of how I'm using it:
int rand(int* seed) // 1 <= *seed < m
{
    int const a = 16807; // i.e. 7**5
    int const m = 2147483647; // i.e. 2**31-1
    // Widen to 64 bits BEFORE multiplying, otherwise the int product overflows.
    *seed = (int)(((long)(*seed) * a) % m);
    return *seed;
}
kernel void random_number_kernel(global int* seed_memory)
{
    int global_id = get_global_id(1) * get_global_size(0) + get_global_id(0); // Get the global id in 1D.
    // Since the Park-Miller PRNG generates a SEQUENCE of random numbers
    // we have to keep track of the previous random number, because the next
    // random number will be generated using the previous one.
    int seed = seed_memory[global_id];
    int random_number = rand(&seed); // Generate the next random number in the sequence.
    seed_memory[global_id] = seed; // Save the seed for the next time this kernel gets enqueued.
}
The code serves just as an example. I have not tested it.
The array "seed_memory" is being filled with rand() only once before the first execution of the kernel. After that, all random number generation is happening on the GPU. I think it's also possible to simply use the kernel id instead of initializing the array with rand().
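The recurrence and the averaging check described above can be reproduced on the host. This is a sketch in Python rather than OpenCL; the helper names are illustrative. Python integers do not overflow, so the 64-bit widening needed in the kernel is not an issue here.

```python
M = 2147483647  # 2**31 - 1
A = 16807       # 7**5

def park_miller_next(seed):
    """One step of the Park-Miller generator; seed must satisfy 1 <= seed < M."""
    return (seed * A) % M

def uniform_samples(seed, count):
    """Return 'count' floats in (0, 1) plus the final seed for the next call."""
    out = []
    for _ in range(count):
        seed = park_miller_next(seed)
        out.append(seed / M)
    return out, seed

# Advance the generator 10000 steps from seed 1 (a common self-check).
check_seed = 1
for _ in range(10000):
    check_seed = park_miller_next(check_seed)

samples, _ = uniform_samples(1, 10000)
mean = sum(samples) / len(samples)   # should land near 0.5 for a uniform generator
```

A mean near 0.5 is only a necessary condition for uniformity, not a sufficient one, so it is best treated as a quick smoke test.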

It seems OpenCL does not provide such functionality. However, some people have done some research on that and provide BSD licensed code for producing good random numbers on GPU.

This is my version of OpenCL float pseudorandom noise, using trigonometric function
//noise values in range of 0.0 to 1.0
static float noise3D(float x, float y, float z) {
    float ptr = 0.0f;
    return fract(sin(x*112.9898f + y*179.233f + z*237.212f) * 43758.5453f, &ptr);
}
__kernel void fillRandom(float seed, __global float* buffer, int length) {
    int gi = get_global_id(0);
    float fgi = (float)gi / length; // C-style cast; OpenCL C has no function-style casts
    buffer[gi] = noise3D(fgi, 0.0f, seed);
}
You can generate 1D or 2D noise by passing normalized index coordinates to noise3D as the first parameters, and the random seed (generated on the CPU, for example) as the last parameter.
Here are some noise pictures generated with this kernel and different seeds:

GPUs don't have good sources of randomness, but this can be easily overcome by seeding a kernel with a random seed from the host. After that, you just need an algorithm that can work with a massive number of concurrent threads.
This link describes a Mersenne Twister implementation using OpenCL: Parallel Mersenne Twister. You can also find an implementation in the NVIDIA SDK.

I had the same problem.
www.thesalmons.org/john/random123/papers/random123sc11.pdf
You can find the documentation here.
http://www.thesalmons.org/john/random123/releases/latest/docs/index.html
You can download the library here:
http://www.deshawresearch.com/resources_random123.html

Why not? You could just write a kernel that generates random numbers, though that would need more kernel calls and eventually passing the random numbers as an argument to your other kernel that needs them.

You can't generate random numbers in the kernel; the best option is to generate the random numbers on the host (CPU), then transfer them to the GPU through buffers and use them in the kernel.

Simple physics-based movement

I'm working on a 2D game where I'm trying to accelerate an object to a top speed using some basic physics code.
Here's the pseudocode for it:
const float acceleration = 0.02f;
const float friction = 0.8f; // value is always 0.0..1.0
float velocity = 0;
float position = 0;
move()
{
velocity += acceleration;
velocity *= friction;
position += velocity;
}
This is a very simplified approach that doesn't rely on mass or actual friction (the in-code friction is just a generic force acting against movement). It works well as the "velocity *= friction;" part keeps the velocity from going past a certain point. However, it's this top speed and its relationship to the acceleration and friction where I'm a bit lost.
What I'd like to do is set a top speed, and the amount of time it takes to reach it, then use them to derive the acceleration and friction values.
i.e.,
const float max_velocity = 2.0;
const int ticks = 120; // If my game runs at 60 FPS, I'd like a
// moving object to reach max_velocity in
// exactly 2 seconds.
const float acceleration = ?
const float friction = ?

I found this question very interesting since I had recently done some work on modeling projectile motion with drag.
Point 1: You are essentially updating the position and velocity using an explicit/forward Euler iteration where each new value for the states should be a function of the old values. In such a case, you should be updating the position first, then updating the velocity.
Point 2: There are more realistic physics models for the effect of drag friction. One model (suggested by Adam Liss) involves a drag force that is proportional to the velocity (known as Stokes' drag, which generally applies to low velocity situations). The one I previously suggested involves a drag force that is proportional to the square of the velocity (known as quadratic drag, which generally applies to high velocity situations). I'll address each one with regard to how you would deduce formulas for the maximum velocity and the time required to effectively reach the maximum velocity. I'll forego the complete derivations since they are rather involved.
Stokes' drag:
The equation for updating the velocity would be:
velocity += acceleration - friction*velocity
which represents the following differential equation:
dv/dt = a - f*v
Using the first entry in this integral table, we can find the solution (assuming v = 0 at t = 0):
v = (a/f) - (a/f)*exp(-f*t)
The maximum (i.e. terminal) velocity occurs when t >> 0, so that the second term in the equation is very close to zero and:
v_max = a/f
Regarding the time needed to reach the maximum velocity, note that the equation never truly reaches it, but instead asymptotes towards it. However, when the argument of the exponential equals -5, the velocity is around 99% of the maximum velocity (exp(-5) is about 0.007), probably close enough to consider it equal. You can then approximate the time to maximum velocity as:
t_max = 5/f
You can then use these two equations to solve for f and a given a desired v_max and t_max.
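As a quick check of the two Stokes' drag formulas, here is a sketch in Python that solves them for the numbers in the question (v_max = 2.0 reached in 120 ticks) and then runs the discrete update. The function names are illustrative, and time is measured in ticks.

```python
def stokes_parameters(v_max, t_max):
    """Solve v_max = a/f and t_max = 5/f for a and f."""
    f = 5.0 / t_max
    a = v_max * f
    return a, f

def simulate_stokes(a, f, ticks):
    """Apply 'velocity += acceleration - friction*velocity' for the given ticks."""
    v = 0.0
    for _ in range(ticks):
        v += a - f * v
    return v

a, f = stokes_parameters(v_max=2.0, t_max=120)
v_end = simulate_stokes(a, f, 120)   # close to, but still below, v_max
```

The velocity after 120 ticks ends up within about 1% of the target, consistent with the "close enough" approximation used to define t_max.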
Quadratic drag:
The equation for updating the velocity would be:
velocity += acceleration - friction*velocity*velocity
which represents the following differential equation:
dv/dt = a - f*v^2
Using the first entry in this integral table, we can find the solution (assuming v = 0 at t = 0):
v = sqrt(a/f)*(exp(2*sqrt(a*f)*t) - 1)/(exp(2*sqrt(a*f)*t) + 1)
The maximum (i.e. terminal) velocity occurs when t >> 0, so that the exponential terms are much greater than 1 and the equation approaches:
v_max = sqrt(a/f)
Regarding the time needed to reach the maximum velocity, note that the equation never truly reaches it, but instead asymptotes towards it. However, when the argument of the exponential equals 5, the velocity is around 99% of the maximum velocity, probably close enough to consider it equal. You can then approximate the time to maximum velocity as:
t_max = 2.5/sqrt(a*f)
which is also equivalent to:
t_max = 2.5/(f*v_max)
For a desired v_max and t_max, the second equation for t_max will tell you what f should be, and then you can plug that into the equation for v_max to get the value for a.
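The same check can be made for quadratic drag. This Python sketch solves t_max = 2.5/(f*v_max) and v_max = sqrt(a/f) for the question's numbers and runs the discrete update; the names are illustrative and time is in ticks.

```python
import math

def quadratic_parameters(v_max, t_max):
    """Solve t_max = 2.5/(f*v_max) for f, then v_max = sqrt(a/f) for a."""
    f = 2.5 / (t_max * v_max)
    a = f * v_max ** 2
    return a, f

def simulate_quadratic(a, f, ticks):
    """Apply 'velocity += acceleration - friction*velocity*velocity' per tick."""
    v = 0.0
    for _ in range(ticks):
        v += a - f * v * v
    return v

a, f = quadratic_parameters(v_max=2.0, t_max=120)
v_end = simulate_quadratic(a, f, 120)   # approaches v_max from below
```

With these small per-tick rates the forward Euler update stays stable, so the iterated velocity tracks the closed-form solution closely.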
This seems like a bit of overkill, but these are actually some of the simplest ways to model drag! Anyone who really wants to see the integration steps can shoot me an email and I'll send them to you. They are a bit too involved to type here.
Another Point: I didn't immediately realize this, but the updating of the velocity is not necessary anymore if you instead use the formulas I derived for v(t). If you are simply modeling acceleration from rest, and you are keeping track of the time since the acceleration began, the code would look something like:
position += velocity_function(timeSinceStart)
where "velocity_function" is one of the two formulas for v(t) and you would no longer need a velocity variable. In general, there is a trade-off here: calculating v(t) may be more computationally expensive than simply updating velocity with an iterative scheme (due to the exponential terms), but it is guaranteed to remain stable and bounded. Under certain conditions (like trying to get a very short t_max), the iteration can become unstable and blow up, a common problem with the forward Euler method. However, maintaining limits on the variables (like 0 < f < 1) should prevent these instabilities.
In addition, if you're feeling somewhat masochistic, you may be able to integrate the formula for v(t) to get a closed form solution for p(t), thus foregoing the need for an Euler iteration altogether. I'll leave this for others to attempt. =)

Warning: Partial Solution
If we follow the physics as stated, there is no maximum velocity. From a purely physical viewpoint, you've fixed the acceleration at a constant value, which means the velocity is always increasing.
As an alternative, consider the two forces acting on your object:
The constant external force, F, that tends to accelerate it, and
The force of drag, d, which is proportional to the velocity and tends to slow it down.
So the velocity at iteration n becomes: v_n = v_0 + n F - d v_{n-1}
You've asked to choose the maximum velocity, v_{nmax}, that occurs at iteration nmax.
Note that the problem is under-constrained; that is, F and d are related, so you can arbitrarily choose a value for one of them, then calculate the other.
Now that the ball's rolling, is anyone willing to pick up the math?
Warning: it's ugly and involves power series!

velocity *= friction;
This doesn't prevent the velocity from going above a certain point...
Friction increases exponentially (don't quote me on that) as the velocity increases, and will be 0 at rest. Eventually, you will reach a point where friction = acceleration.
So you want something like this:
velocity += (acceleration - friction);
position += velocity;
friction = a*exp(b*velocity);
Where you pick values for a and b. b will control how long it takes to reach top speed, and a will control how abruptly the friction increases. (Again, do your own research on this; I'm going from what I remember from grade 12 physics.)

This isn't answering your question, but one thing you shouldn't do in simulations like this is depend on a fixed frame rate. Calculate the time since the last update, and use the delta-T in your equations. Something like:
static double lastUpdate=0;
if (lastUpdate!=0) {
double deltaT = time() - lastUpdate;
velocity += acceleration * deltaT;
position += velocity * deltaT;
}
lastUpdate = time();
It's also good to check if you lose focus and stop updating, and when you gain focus set lastUpdate to 0. That way you don't get a huge deltaT to process when you get back.
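A minimal sketch of a frame-rate independent update in Python follows. As an alternative to resetting lastUpdate on focus changes, this version simply clamps delta-T; the `step` function and the `max_delta_t` parameter are assumptions made for illustration.

```python
def step(position, velocity, acceleration, delta_t, max_delta_t=0.1):
    """Advance one frame; delta_t is the elapsed time in seconds since the last frame."""
    delta_t = min(delta_t, max_delta_t)   # avoid a huge jump after a pause or focus loss
    velocity += acceleration * delta_t
    position += velocity * delta_t
    return position, velocity

# Two half-second frames with the clamp raised so it does not interfere.
p, v = step(0.0, 0.0, 2.0, 0.5, max_delta_t=1.0)
p, v = step(p, v, 2.0, 0.5, max_delta_t=1.0)

# A five-second stall is treated as a 0.1 s frame with the default clamp.
p_stall, v_stall = step(0.0, 0.0, 2.0, 5.0)
```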

If you want to see what can be done with very simple physics models using very simple maths, take a look at some of the Scratch projects at http://scratch.mit.edu/ - you may get some useful ideas & you'll certainly have fun.

This is probably not what you are looking for, but depending on what engine you are working on, it might be better to use an engine built by someone else, like Farseer (for C#).
Note Codeplex is down for maintenance.

How to test randomness (case in point - Shuffling)

First off, this question is ripped out from this question. I did it because I think this part is bigger than a sub-part of a longer question. If it offends, please pardon me.
Assume that you have an algorithm that generates randomness. Now how do you test it?
Or to be more direct - Assume you have an algorithm that shuffles a deck of cards, how do you test that it's a perfectly random algorithm?
To add some theory to the problem -
A deck of cards can be shuffled in 52! (52 factorial) different ways. Take a deck of cards, shuffle it by hand and write down the order of all cards. What is the probability that you would have gotten exactly that shuffle? Answer: 1 / 52!.
What is the chance that you, after shuffling, will get A, K, Q, J ... of each suit in a sequence? Answer 1 / 52!
So, just shuffling once and looking at the result will give you absolutely no information about your shuffling algorithm's randomness. Twice and you have more information, three times even more...
How would you black box test a shuffling algorithm for randomness?

Statistics. The de facto standard for testing RNGs is the Diehard suite (originally available at http://stat.fsu.edu/pub/diehard). Alternatively, the Ent program provides tests that are simpler to interpret but less comprehensive.
As for shuffling algorithms, use a well-known algorithm such as Fisher-Yates (a.k.a "Knuth Shuffle"). The shuffle will be uniformly random so long as the underlying RNG is uniformly random. If you are using Java, this algorithm is available in the standard library (see Collections.shuffle).
It probably doesn't matter for most applications, but be aware that most RNGs do not provide sufficient degrees of freedom to produce every possible permutation of a 52-card deck (explained here).
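For reference, a Fisher-Yates shuffle is only a few lines. This sketch uses Python's `random.Random` as the underlying RNG, which is an assumption made for illustration; any source of uniform integers in the inclusive range 0..i would do.

```python
import random

def fisher_yates_shuffle(items, rng):
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    deck = list(items)
    for i in range(len(deck) - 1, 0, -1):
        j = rng.randint(0, i)            # uniform over 0..i, both ends inclusive
        deck[i], deck[j] = deck[j], deck[i]
    return deck

shuffled = fisher_yates_shuffle(range(52), random.Random(2024))
```

The key detail is that position i is swapped only with positions 0..i. Drawing j from the whole deck on every step is a common mistake that makes some permutations more likely than others.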

Here's one simple check that you can perform. It uses generated random numbers to estimate Pi. It's not proof of randomness, but poor RNGs typically don't do well on it (they will return something like 2.5 or 3.8 rather than ~3.14).
Ideally this would be just one of many tests that you would run to check randomness.
Something else that you can check is the standard deviation of the output. The expected standard deviation for a uniformly distributed population of values in the range 0..n approaches n/sqrt(12).
/**
* This is a rudimentary check to ensure that the output of a given RNG
* is approximately uniformly distributed. If the RNG output is not
* uniformly distributed, this method will return a poor estimate for the
* value of pi.
* @param rng The RNG to test.
* @param iterations The number of random points to generate for use in the
* calculation. This value needs to be sufficiently large in order to
* produce a reasonably accurate result (assuming the RNG is uniform).
* Less than 10,000 is not particularly useful. 100,000 should be sufficient.
* @return An approximation of pi generated using the provided RNG.
*/
public static double calculateMonteCarloValueForPi(Random rng,
int iterations)
{
// Assumes a quadrant of a circle of radius 1, bounded by a box with
// sides of length 1. The area of the square is therefore 1 square unit
// and the area of the quadrant is (pi * r^2) / 4.
int totalInsideQuadrant = 0;
// Generate the specified number of random points and count how many fall
// within the quadrant and how many do not. We expect the number of points
// in the quadrant (expressed as a fraction of the total number of points)
// to be pi/4. Therefore pi = 4 * ratio.
for (int i = 0; i < iterations; i++)
{
double x = rng.nextDouble();
double y = rng.nextDouble();
if (isInQuadrant(x, y))
{
++totalInsideQuadrant;
}
}
// From these figures we can deduce an approximate value for Pi.
return 4 * ((double) totalInsideQuadrant / iterations);
}
/**
* Uses Pythagoras' theorem to determine whether the specified coordinates
* fall within the area of the quadrant of a circle of radius 1 that is
* centered on the origin.
* @param x The x-coordinate of the point (must be between 0 and 1).
* @param y The y-coordinate of the point (must be between 0 and 1).
* @return True if the point is within the quadrant, false otherwise.
*/
private static boolean isInQuadrant(double x, double y)
{
double distance = Math.sqrt((x * x) + (y * y));
return distance <= 1;
}
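The standard-deviation check mentioned before the Java listing can be sketched in the same spirit. This is Python rather than Java, with `random.Random` standing in for the RNG under test; the function name, the seed and the sample count are illustrative assumptions.

```python
import math
import random
import statistics

def stddev_ratio(rng, n, count):
    """Ratio of the observed standard deviation to the ideal n/sqrt(12)."""
    samples = [rng.random() * n for _ in range(count)]   # uniform values in 0..n
    return statistics.pstdev(samples) / (n / math.sqrt(12))

ratio = stddev_ratio(random.Random(7), n=100.0, count=20000)   # ideally close to 1.0
```

A ratio far from 1.0 suggests the output is not uniformly spread over the range. A ratio near 1.0 says nothing about correlations between successive values.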

First, it is impossible to know for sure if a certain finite output is "truly random" since, as you point out, any output is possible.
What can be done, is to take a sequence of outputs and check various measurements of this sequence against what is more likely. You can derive a sort of confidence score that the generating algorithm is doing a good job.
For example, you could check the output of 10 different shuffles. Assign a number 0-51 to each card, and take the average of the card in position 6 across the shuffles. The convergent average is 25.5, so you would be surprised to see a value of 1 here. You could use the central limit theorem to get an estimate of how likely each average is for a given position.
But we shouldn't stop here! Because this algorithm could be fooled by a system that only alternates between two shuffles that are designed to give the exact average of 25.5 at each position. How can we do better?
We expect a uniform distribution (equal likelihood for any given card) at each position, across different shuffles. So among the 10 shuffles, we could try to verify that the choices 'look uniform.' This is basically just a reduced version of the original problem. You could check that the standard deviation looks reasonable, that the min is reasonable, and the max value as well. You could also check that other values, such as the closest two cards (by our assigned numbers), also make sense.
But we also can't just add various measurements like this ad infinitum, since, given enough statistics, any particular shuffle will appear highly unlikely for some reason (e.g. this is one of very few shuffles in which cards X,Y,Z appear in order). So the big question is: which is the right set of measurements to take? Here I have to admit that I don't know the best answer. However, if you have a certain application in mind, you can choose a good set of properties/measurements to test, and work with those -- this seems to be the way cryptographers handle things.

There's a lot of theory on testing randomness. For a very simple test on a card shuffling algorithm you could do a lot of shuffles and then run a chi squared test that the probability of each card turning up in any position was uniform. But that doesn't test that consecutive cards aren't correlated so you would also want to do tests on that.
Volume 2 of Knuth's Art of Computer Programming gives a number of tests that you could use in sections 3.3.2 (Empirical tests) and 3.3.4 (The Spectral Test) and the theory behind them.

The only way to test for randomness is to write a program that attempts to build a predictive model for the data being tested, and then use that model to try to predict future data, and then showing that the uncertainty, or entropy, of its predictions tend towards maximum (i.e. the uniform distribution) over time. Of course, you'll always be uncertain whether or not your model has captured all of the necessary context; given a model, it'll always be possible to build a second model that generates non-random data that looks random to the first. But as long as you accept that the orbit of Pluto has an insignificant influence on the results of the shuffling algorithm, then you should be able to satisfy yourself that its results are acceptably random.
Of course, if you do this, you might as well use your model generatively, to actually create the data you want. And if you do that, then you're back at square one.

Shuffle a lot, and then record the outcomes (if I'm reading this correctly). I remember seeing comparisons of "random number generators". They just test it over and over, then graph the results.
If it is truly random the graph will be mostly even.

I'm not fully following your question. You say
Assume that you have an algorithm that generates randomness. Now how do you test it?
What do you mean? If you're assuming you can generate randomness, there's no need to test it.
Once you have a good random number generator, creating a random permutation is easy (e.g. Call your cards 1-52. Generate 52 random numbers assigning each one to a card in order, and then sort according to your 52 randoms) . You're not going to destroy the randomness of your good RNG by generating your permutation.
The difficult question is whether you can trust your RNG. Here's a sample link to people discussing that issue in a specific context.

Testing 52! possibilities is of course impossible. Instead, try your shuffle on smaller numbers of cards, like 3, 5, and 10. Then you can test billions of shuffles and use a histogram and the chi-square statistical test to prove that each permutation is coming up an "even" number of times.
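The small-deck idea can be sketched as follows in Python. With 3 cards there are 6 permutations, so the chi-square statistic has 5 degrees of freedom, and values far above roughly 20.5 (the 0.1% critical value) point to a non-uniform shuffle. The two shuffle functions, the seed and the trial count are assumptions made for illustration.

```python
import itertools
import random
from collections import Counter

def chi_square_for_shuffle(shuffle, num_cards, trials, rng):
    """Chi-square statistic of observed permutation counts vs. a uniform spread."""
    counts = Counter()
    for _ in range(trials):
        counts[tuple(shuffle(list(range(num_cards)), rng))] += 1
    perms = list(itertools.permutations(range(num_cards)))
    expected = trials / len(perms)
    return sum((counts[p] - expected) ** 2 / expected for p in perms)

def good_shuffle(deck, rng):
    """Fisher-Yates: swap position i with a position in 0..i."""
    for i in range(len(deck) - 1, 0, -1):
        j = rng.randint(0, i)
        deck[i], deck[j] = deck[j], deck[i]
    return deck

def biased_shuffle(deck, rng):
    """Common mistake: swap every position with ANY position in the deck."""
    for i in range(len(deck)):
        j = rng.randint(0, len(deck) - 1)
        deck[i], deck[j] = deck[j], deck[i]
    return deck

good = chi_square_for_shuffle(good_shuffle, 3, 60000, random.Random(1))
biased = chi_square_for_shuffle(biased_shuffle, 3, 60000, random.Random(1))
```

The biased variant produces 27 equally likely swap sequences that cannot map evenly onto 6 permutations, so its statistic comes out orders of magnitude larger than that of the correct shuffle.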

No code so far, therefore I copy-paste a testing part from my answer to the original question.
// ...
int main() {
typedef std::map<std::pair<size_t, Deck::value_type>, size_t> Map;
Map freqs;
Deck d;
const size_t ntests = 100000;
// compute frequencies of events: card at position
for (size_t i = 0; i < ntests; ++i) {
d.shuffle();
size_t pos = 0;
for(Deck::const_iterator j = d.begin(); j != d.end(); ++j, ++pos)
++freqs[std::make_pair(pos, *j)];
}
// if Deck.shuffle() is correct then all frequencies must be similar
for (Map::const_iterator j = freqs.begin(); j != freqs.end(); ++j)
std::cout << "pos=" << j->first.first << " card=" << j->first.second
<< " freq=" << j->second << std::endl;
}
This code does not test randomness of underlying pseudorandom number generator. Testing PRNG randomness is a whole branch of science.

For a quick test, you can always try compressing it. Once it doesn't compress, then you can move onto other tests.
I've tried dieharder but it refuses to work for a shuffle. All tests fail. It is also really stodgy; it won't let you specify the range of values you want or anything like that.

Pondering it myself, what I would do is something like:
Setup (Pseudo code)
// A card has a Number 0-51 and a position 0-51
int[][] StatMatrix = new int[52][52]; // Assume all are set to 0 as starting values
ShuffleCards();
ForEach (card in Cards) {
StatMatrix[Card.Position][Card.Number]++;
}
This gives us a matrix 52x52 indicating how many times a card has ended up at a certain position. Repeat this a large number of times (I would start with 1000, but people better at statistics than me may give a better number).
Analyze the matrix
If we have perfect randomness and perform the shuffle an infinite number of times then for each card and for each position the number of times the card ended up in that position is the same as for any other card. Saying the same thing in a different way:
statMatrix[position][card] / numberOfShuffle = 1/52.
So I would calculate how far from that number we are.
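A runnable version of the matrix idea might look like this in Python. Here `rng.shuffle` stands in for the shuffle under test, and the function names, the seed and the number of shuffles are illustrative assumptions.

```python
import random

def position_card_matrix(num_cards, num_shuffles, rng):
    """matrix[position][card] = how often 'card' ended up at 'position'."""
    matrix = [[0] * num_cards for _ in range(num_cards)]
    for _ in range(num_shuffles):
        deck = list(range(num_cards))
        rng.shuffle(deck)                 # replace with the shuffle being tested
        for position, card in enumerate(deck):
            matrix[position][card] += 1
    return matrix

def max_deviation(matrix, num_shuffles):
    """Largest absolute gap between an observed frequency and the ideal 1/num_cards."""
    expected = 1.0 / len(matrix)
    return max(abs(count / num_shuffles - expected)
               for row in matrix for count in row)

matrix = position_card_matrix(52, 5000, random.Random(3))
worst = max_deviation(matrix, 5000)
```

With a finite number of shuffles the frequencies never equal 1/52 exactly, so the deviation has to be judged against the expected sampling noise, for example with the chi-square test mentioned in other answers. This matrix also cannot detect correlations between neighbouring cards.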
