Calculating pi using Monte Carlo and MPI_Reduce

I am working on a project where we need to parallelize this problem using MPI. The basic idea is that each process gets its share of points, does the test (whether each point falls inside the circle), and then calls MPI_Reduce. The root then gets the reduced result and presents the final answer.
What I am confused about is what to reduce. Should each process calculate pi, call reduce with that pi, and the root just take the average of the reduced pi values? Or should each process call reduce with its number of hits (successful points inside the circle) and have the root calculate pi from that result? Hope this was clear. Thanks.

I would definitely do the latter: have each process return its number of hits and have the root calculate pi from these results. In this manner, you are doing the same thing as you would if calculating this serially.
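A minimal sketch of that second approach, using mpi4py for brevity (the question is language-agnostic; in C this would be an MPI_Reduce with MPI_SUM, and the total sample budget here is an assumption):

# Monte Carlo pi with a reduction over raw hit counts (mpi4py sketch;
# the sample budget is assumed and split evenly across ranks).
from mpi4py import MPI
import random

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 10_000_000 // size          # this rank's share of the points
random.seed(rank)                     # a different stream per process

hits = 0
for _ in range(n_local):
    x, y = random.random(), random.random()
    if x * x + y * y <= 1.0:          # point falls inside the quarter circle
        hits += 1

# Reduce the hit counts, not per-process pi estimates.
total_hits = comm.reduce(hits, op=MPI.SUM, root=0)
if rank == 0:
    print("pi ~=", 4.0 * total_hits / (n_local * size))

Run with, e.g., mpiexec -n 4 python pi.py. Averaging per-process pi estimates happens to give the same number when every rank draws the same count of points, but summing hits is exactly the serial computation, just split up.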

Related

Signal to filter in R

I need to filter a signal without losing its properties so that it can later be fed into an artificial neural network. I'm using R and the signal library, and I thought about using a low-pass filter or an FFT.
This is the signal to be filtered; it represents pixel shifts in a video. In this case I calculated the resultant of the X and Y vectors to obtain a single value and thus generate this graph/signal:
Using the signal library and the fftfilt function, I obtained the following signal, which seems easier to train a neural network on, but I did not understand what the function is doing and whether the signal's properties have been preserved.
resulting <- fftfilt(rep(1,50)/50,resulting)
Could someone explain how this function works or suggest a better method to filter this signal?
As for the fftfilt(...) function, I can tell you roughly what it does: it is an approximate finite impulse response (FIR) filter implementation that uses the FFT of the filter's impulse response, with some padding, as a window. It takes the signal's FFT within the window and the filter's impulse-response FFT, generates the filtered signal in the frequency domain by simply multiplying the two, and then applies the inverse FFT to get the actual result in the time domain. If your FIR filter has a huge number of coefficients (even though in many cases that is just a sign of a bad system design and should not be needed), the function works much faster than filter(Ma(...), ...). With a more reasonable number of coefficients (definitely below 100), the direct and exact approach is actually faster.
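For illustration, here is a rough NumPy sketch of the same idea, i.e. FFT-based convolution with the length-50 moving-average kernel from the question (edge handling in the real fftfilt may differ slightly, and the input signal here is a made-up stand-in):

# FFT-based FIR filtering: multiply spectra, then invert the transform.
import numpy as np

def fft_fir(b, x):
    n = len(x) + len(b) - 1           # zero-pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(b, n), n)
    return y[:len(x)]                 # keep the original signal length

x = np.cumsum(np.random.randn(1000))  # stand-in for the pixel-shift signal
b = np.ones(50) / 50                  # rep(1, 50) / 50 in the R call
smoothed = fft_fir(b, x)

# Same result (up to rounding) as direct convolution, just computed
# faster for long kernels:
assert np.allclose(smoothed, np.convolve(x, b)[:len(x)])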
As for proper filtering methods, there are so many that entire thick books have been written on just this topic. My personal experience in the field is a bit skewed towards rather specific, very-low-computing-power, microcontroller-based sensor-signal DSP tricks with fixed-point arithmetic, exact unity gains at a selected pass-band point, power-of-two scale coefficients, and staged implementations, so I doubt it would help you here. Basically, you first need to know what result you want from your filter (and whether you even need to filter at all, or whether something like peak detection and decimation is enough), and then know what you are doing. From your message it is hard to guess what your neural network's requirements are, what you think you need for it, and what you actually need.

Numerical optimization with MPI

I am trying to parallelize an optimization routine using MPI directives. The structure of the program is roughly as in the block diagram at the end of the text. Data is fed to the optimization routine, which calls an objective-function subroutine and another subroutine that calculates a matrix called the Jacobian. The optimization routine iterates as many times as needed to reach a minimum of the objective function and exits with a result. The Jacobian is used to decide in which direction the minimum might lie and to take a step in that direction.
I don't have control over the optimization routine; I only supply the objective function and the function calculating the Jacobian. Most of the time is spent calculating the Jacobian. Since each matrix element of the Jacobian is independent of the rest, it seems like a good candidate for parallelization. However, I haven't been able to accomplish this. Initially I thought I could distribute the calculation of the Jacobian over a large number of nodes, each of which would calculate only some of the matrix elements. I did that, but after just one iteration all the threads on the nodes exit and the program stalls.
I am starting to think that without the source code of the optimization routine this might not be possible. The reason is that distributing the code over multiple nodes and instructing them to calculate only a fraction of the Jacobian messes up the optimization on all of them except the master. Is there a way around this, using MPI and without touching the code in the optimization routine? Can only the function calculating the Jacobian be executed on all nodes except the master? How would you do this?
It turned out to be easier than I thought. As explained in the question, the worker threads were exiting after just one iteration. The solution is to enclose the worker-side Jacobian code in an infinite while loop and break out of it with a message sent from the master once the optimizer exits with the answer.
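A minimal sketch of that pattern, using mpi4py with a toy Newton iteration standing in for the opaque optimizer (the residual function, the round-robin column split, and the None sentinel are all illustrative assumptions):

# Master/worker Jacobian with an infinite worker loop and a shutdown
# sentinel. The real optimizer is an opaque library call, so a toy
# Newton iteration stands in for it here.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
TARGETS = np.arange(1.0, 5.0)         # toy problem: solve x**2 = TARGETS

def residual(x):                      # stand-in for the objective pieces
    return x**2 - TARGETS

def local_cols(x):
    """Finite-difference Jacobian columns owned by this rank (round-robin)."""
    n, h = len(x), 1e-7
    cols = {}
    for j in range(rank, n, size):
        e = np.zeros(n); e[j] = h
        cols[j] = (residual(x + e) - residual(x)) / h
    return cols

def parallel_jacobian(x):             # called only on the master
    comm.bcast(x, root=0)             # wake the workers with new parameters
    pieces = comm.gather(local_cols(x), root=0)
    J = np.empty((len(x), len(x)))
    for part in pieces:
        for j, col in part.items():
            J[:, j] = col
    return J

if rank == 0:
    x = np.full(len(TARGETS), 2.0)
    for _ in range(20):               # the "optimizer" loop
        x = x - np.linalg.solve(parallel_jacobian(x), residual(x))
    comm.bcast(None, root=0)          # sentinel: tell the workers to exit
    print("solution:", x)
else:
    while True:                       # the infinite loop from the answer
        x = comm.bcast(None, root=0)
        if x is None:                 # master is done; break out
            break
        comm.gather(local_cols(x), root=0)

The only structural requirement is that every rank reaches the same bcast/gather pair each time the master asks for a Jacobian; the optimizer itself runs untouched on the master.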

Finding the cut-off point of an in-built function that currently does not run, in R

In R, I am trying to use the markov chain package to convert clickstream data to a markov chain. I have 4 GB of RAM, but the program cannot run the command (it fails after a lot of time). This is because, after a while, the ongoing conversion cannot allocate more than 3969 MB (that is what the screen says). I am trying to find out up to what point the program will run. So if I have, say, n nodes, up to how many nodes (obviously fewer than n) or rows (the rows might contain the same or different nodes) will the program run? I am trying to do attribution modelling in R. The conversion paths are converted from clickstream form to a markov chain, and I am trying to find the transition matrix from that.
[Image: the function and a sample clickstream dataset, where h, c, d, and p are different nodes.]
The function converts this data into a markov chain containing a lot of important things, out of which I am mainly trying to get the transition matrix and the steady state. As I increase the data size (the number of different channel paths or users is not important; it is the number of different nodes that matters), the function fails because it cannot allocate more than the 4 GB of RAM. I tried trial and error to find the point beyond which the function stops working, but it did not help. Is there a way to know up to which node (or row) the function will work, so that I can generate the transition matrix up to that point? I would also like to know how memory usage grows with each additional node, as I believe the relationship between the two won't be linear.
Please let me know if the question is not specific enough and if it might need any more details.
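As a back-of-envelope aside on that last point: a dense k x k transition matrix of 8-byte doubles alone needs 8*k^2 bytes, so memory does grow quadratically in the number of distinct nodes (packages typically hold extra intermediate copies on top of this, so treat it as a lower bound). A quick sketch:

# Quadratic growth of a dense k x k transition matrix of doubles
# (a lower bound; real packages keep intermediate copies as well).
for k in (100, 1_000, 10_000, 50_000):
    print(f"{k:>6} nodes -> {8 * k**2 / 2**20:>10,.1f} MiB")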

Find the radius of a cluster, given that its center is the average of the centers of two other clusters

I do not know if it is possible to find it, but I am using k-means clustering with Mahout, and I am stuck on the following.
In my implementation, two different threads create the following clusters:
CL-1{n=4 c=[1.75] r=[0.82916]}
CL-1{n=2 c=[4.5] r=[0.5]}
So, I would like to finally combine these two clusters into one final cluster.
In my code, I manage to find that for the final cluster the total points are n=6 and the new average of the centers is c=2.666, but I am not able to find the final combined radius.
I know that the radius is the Population Standard Deviation, and I can calculate it if I previously know each point that belongs to the cluster.
However, in my case I do not have prior knowledge of the points, so I need the "average" of the two radii I mentioned before, in order to finally have this: CL-1{n=6 c=[2.666] r=[???]}.
Any ideas?
Thanks for your help.
It's not hard. Remember how the "radius" (not a very good name) is computed.
It's probably the standard deviation; so if you square this value and multiply it by the number of objects, you get the sum of squares. You can aggregate the sums of squares and then reverse this process to get a standard deviation again. It's pretty basic statistics: you want to compute the weighted quadratic mean of the radii, just like you computed the weighted arithmetic mean of the centers.
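A worked version of that reverse process, using the two clusters from the question and treating the radius as a population standard deviation (a small Python sketch for illustration):

# Merge two clusters given only (n, center, radius), where the radius is
# the population standard deviation: r^2 = E[x^2] - c^2, so each cluster's
# sum of squares is n * (r^2 + c^2).
import math

def merge(n1, c1, r1, n2, c2, r2):
    n = n1 + n2
    c = (n1 * c1 + n2 * c2) / n                    # weighted mean of the centers
    ss = n1 * (r1**2 + c1**2) + n2 * (r2**2 + c2**2)
    r = math.sqrt(ss / n - c**2)                   # back to a std deviation
    return n, c, r

print(merge(4, 1.75, 0.82916, 2, 4.5, 0.5))        # -> (6, 2.666..., ~1.491)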
However, since your data is 1-dimensional, I'm pretty sure it will fit into main memory. As long as your data fits into memory, stay away from Mahout. It's slooooow. Use something like ELKI instead, or SciPy, or R. Run benchmarks. Mahout will perform several orders of magnitude slower than all the others. You won't need all of this Canopy stuff then, either.

Calculating ballistic trajectory with changing conditions during flight

There is a good compilation of trajectory math on Wikipedia.
But I need to calculate a trajectory under non-uniform conditions, e.g. the wind speed changes above a certain altitude. (This cannot be modeled easily.)
Should I:
1. Calculate the projectile's velocity vector, e.g., every second, and compute the next second based on that (given a small enough tdelta)?
2. Try to split the trajectory into pieces based on the parameters (e.g. the wind is vwind1 between y1 and y2, so I calculate y < y1, y1 ≤ y < y2 and y2 ≤ y separately)?
3. Try to build and solve a symbolic equation at run time, with all the parameters modeled? (Is this completely utopian? Traditional programming languages aren't too good at solving symbols.)
4. Something completely different...?
Are there good languages / frameworks for handling symbolic math?
I'd suggest an "improved" first approach: solve the differential equations of motion numerically with, e.g., the classic Runge-Kutta method.
The nice part is that with these algorithms, once you have correctly set up your framework, you just have to write an "evaluate" function for the law of motion (which can be almost anything; you don't need to restrict yourself to particular forces), and everything should work fine (as long as the integration step is adequate).
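A minimal RK4 sketch of that suggestion: the state derivative is just a function, so an altitude-dependent wind profile drops in naturally (the drag constant and wind profile below are made-up placeholders, not physical values for a real projectile):

# Classic fixed-step RK4 for a 2-D trajectory with altitude-dependent wind.
import numpy as np

G = np.array([0.0, -9.81])                 # gravity
K_DRAG = 0.05                              # assumed quadratic-drag constant

def wind(y):
    """Horizontal wind as a function of altitude; any piecewise model works."""
    return np.array([10.0, 0.0]) if y > 100.0 else np.array([2.0, 0.0])

def deriv(state):
    pos, vel = state[:2], state[2:]
    v_rel = vel - wind(pos[1])             # drag acts on airspeed, not ground speed
    acc = G - K_DRAG * np.linalg.norm(v_rel) * v_rel
    return np.concatenate([vel, acc])

def rk4_step(state, dt):
    k1 = deriv(state)
    k2 = deriv(state + 0.5 * dt * k1)
    k3 = deriv(state + 0.5 * dt * k2)
    k4 = deriv(state + dt * k3)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([0.0, 0.0, 50.0, 50.0])   # x, y, vx, vy at launch
while state[1] >= 0.0:                     # integrate until impact
    state = rk4_step(state, dt=0.01)
print("range ~", state[0])

Changing the wind model never touches the integrator; only deriv knows about the physics, which is exactly why this approach handles conditions that cannot be modeled cleanly in closed form.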
If the conditions really are cleanly divided into two domains like that, then the second approach is probably best. The first approach is both imprecise and overkill, and the third, if done right, will wind up being equivalent to the second.
