Discretization simulation of a Wiener Process - r

I got some problems with this homework which I have totally no idea, never got into this field before and I really need some help.
First, we have a wiener process like
Which means the probability of the process drops beneath -3 within the time interval [0,1].
Now the thing is we have to simulate the process by discretize it.
1.Suppose we first discretize the process by 100 points and simulate 10,000 process in this way.
i.e., W(0.01), W(0.02), …., W(1.00).
Note that W(t) – W(t-0.01) ~ N(0,0.01) independently.
2.Using the data obtained at 1., we approximate
by
what is the relationship between this value and the real
(larger, equal to or smaller)?
3.Repeat 1. and 2. by cutting [0,1] into 10,000 points instead. Will the
resulting probability increases or decreases?

Related

How is RSME calculated between point clouds?

RSME calculates how close the predicted value is compared to the actual value, but in a point cloud, there are 2 things that I am confused about:
How do we know which point corresponds to which point, to be subtracted from?
Point clouds are 3-dimensional since it has xyz values, but how do people turn those 3 values to one RSME value?
First of all, it's RMSE, not RSME. It stands for Root Mean Square Error:
https://en.wikipedia.org/wiki/Root-mean-square_deviation
With 3D coordinates you can compare component wise, or however else you choose to define a distance measure. Then you plug this into the RMSE formula. Essentially this means comparing an expected value to your observed value.
As for the point correspondence - this depends on the algorithm of choice. Probably one of the most famous examples is ICP:
https://de.wikipedia.org/wiki/Iterative_Closest_Point_Algorithm
In a nutshell for every point of one cloud, the closest point of the other cloud is determined. Then an error measure is calculated and lastly points are transformed. This is done an arbitrary number of times, depending on the desired precision.
Since I strongly suspect that you are indeed looking for ICP, here is the description as to how they are put together:
https://en.wikipedia.org/wiki/Iterative_closest_point
Other than that you will have to do some reading yourself.

Turning a random number generating loop into a one equation?

Here's some pseudocode:
count = 0
for every item in a list
1/20 chance to add one to count
This is more or less my current code, but there could be hundreds of thousands of items in that list; therefore, it gets inefficient fast. (isn't this called like, 0(n) or something?)
Is there a way to compress this into one equation?
Let's look at the properties of the random variable you've described. Quoting Wikipedia:
The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
Let N be the number of items in the list, and C be a random variable that represents the count you're obtaining from your pseudocode. C will follow a binomial probability distribution (as shown in the image below), with p = 1/20:
The remaining problem is how to efficently poll a random variable with said probability distribution. There are a number of libraries that allow you to draw samples from random variables with a specified PDF. I've never had to implement it myself, so I don't exactly know the details, but many are open source and you can refer to the implementation for yourself.
Here's how you would calculate count with the numpy library in Python:
n, p = 10, 0.05 # 10 trials, probability of success is 0.05
count = np.random.binomial(n, p) # draw a single sample
Apparently the OP was asking for a more efficient way to generate random numbers with the same distribution this will give. I though the question was how to do the exact same operation as the loop, but as a one liner (and preferably with no temporary list that exists just to be iterated over).
If you sample a random number generator n times, it's going to have at best O(n) run time, regardless of how the code looks.
In some interpreted languages, using more compact syntax might make a noticeable difference in the constant factors of run time. Other things can affect the run time, like whether you store all the random values and then process them, or process them on the fly with no temporary storage.
None of this will allow you to avoid having your run time scale up linearly with n.

Generate random small numbers with a target average

I need to write a function that returns on of the numbers (-2,-1,0,1,2) randomly, but I need the average of the output to be a specific number (say, 1.2).
I saw similar questions, but all the answers seem to rely on the target range being wide enough.
Is there a way to do this (without saving state) with this small selection of possible outputs?
UPDATE: I want to use this function for (randomized) testing, as a stub for an expensive function which I don't want to run. The consumer of this function runs it a couple of hundred times and takes an average. I've been using a simple randint function, but the average is always very close to 0, which is not realistic.
Point is, I just need something simple that won't always average to 0. I don't really care what the actual average is. I may have asked the question wrong.
Do you really mean to require that specific value to be the average, or rather the expected value? In other words, if the generated sequence were to contain an extraordinary number of small values in its initial part, should the rest of the sequence atempt to compensate for that in an attempt to get the overall average right? I assume not, I assume you want all your samples to be computed independently (after all, you said you don't want any state), in which case you can only control the expected value.
If you assign a probability pi for each of your possible choices, then the expected value will be the sum of these values, weighted by their probabilities:
EV = − 2p−2 − p−1 + p1 + 2p2 = 1.2
As additional constraints you have to require that each of these probabilities is non-negative, and that the above four add up to a value less than 1, with the remainder taken by the fifth probability p0.
there are many possible assignments which satisfy these requirements, and any one will do what you asked for. Which of them are reasonable for your application depends on what that application does.
You can use a PRNG which generates variables uniformly distributed in the range [0,1), and then map these to the cases you described by taking the cumulative sums of the probabilities as cut points.

Why are frequencies represented as complex numbers?

In a FFT, the resulting frequencies represent both magnitude and phase. Since each frequency element in the output array of an FFT essentially just describes the SIN wave at each frequency interval, shouldn't it just be magnitude that we need? What is the significance of the phase represented in the imaginary part of the complex number?
To clarify my question, to be able to put a meaning to the phase of a wave, I need a reference point or reference wave.
When an FFT reports the phase for each sin wave in the resulting frequency domain output, what is the reference wave with respect to which it is reporting the phase?
Because the phase of different components affects the total signal. The two functions in the plot are both summed from sine waves with periods of pi and 2pi, but the phase of the p=2pi sine waves are different. As you can see, the outputs are not the same.
Well in layman's words: magnitude tells you how much of that frequency is there, and phase tells you where it is.
FFTs (there is more than one convention) usually report phase with respect to the zero-th sample. Or if you use FFTShift, with respect to the sample at the center of an FFT window that indexes from 0 to N-1 (e.g. sample number N/2 = sin(0) for a phase of 0). The latter convention, centering phase using FFTShift, is often better, as there can be a big discontinuity at the edges of an FFT aperture, or nearly no data at the edges after using a tapered window function.
If you use FFTShift to center the phase reference, zero phase represents an even function, and a phase of pi or -pi represents an odd function in the window.
Human hearing, in general, can't discriminate the phase of a single sound source. BUT, phase is important when dealing with combined sounds, or multiple sine waves of the same frequency. Sinusoids that are in phase add or sum. Sinusoids of the opposite phase cancel. So if you have the FFT of, say, two loudspeaker responses without phase, you won't know whether they will sound great or horrible together.

Statistical best fit for gesture detection

I have a linear regression equation from school , which gives a value between 1 and -1 indicative of whether or not a set of data points are close enough to a linear function
and the equation given here
http://people.hofstra.edu/stefan_waner/realworld/calctopic1/regression.html
under best fit of a line. I would like to use these to do simple gesture detection based on a point in 3-space (x,y,z) - forward, back, left, right, up, down. First I would see if they fall on a line in 2 of the 3 dimensions, then I would see if that line's slope approached zero or infinity.
Is this fast enough for functional gesture recognition? If not, could someone propose an alternative algorithm?
If I've understood your question correctly then (1) the calculation you describe here would probably be plenty fast enough, (2) it may not actually do what you want, and (3) the stuff that'll be slow in an actual implementation would lie elsewhere.
So, I think you're proposing to do this. (1) Identify the positions of ... something ... (the user's hand, perhaps) in three-dimensional space, at several successive times. (2) For (say) each of {x,y} and {x,z}, look at those two coordinates of each point, compute the correlation coefficient (which is what your formula describes) and see whether it's close to +-1. (3) If both correlation coefficients are close to +-1 then the points lie approximately on a straight line; calculate the gradient of that line (using a formula similar to that of the correlation coefficient). (4) If the gradients are both very close to 0 or +- infinity, then your line is approximately parallel to one axis, which is the case you're trying to recognize.
1: Is it fast enough? You might perhaps be sampling at 50 frames per second or thereabouts, and your gestures might take a second to execute. So you'll have somewhere on the order of 50 positions. So, the total number of arithmetic operations you'll need is maybe a few hundred (including a modest number of square roots). In the worst case, you might be doing this in emulated floating-point on a slow ARM processor or something; in that case, each arithmetic operation might take a couple of hundred cycles, so the whole thing might be 100k cycles, which for a really slow processor running at 100MHz would be about a millisecond. You're not going to have any problem with the time taken to do this calculation.
2: Is it the right thing? It's not clear that it's the right calculation. For instance, suppose your user's hand moves back and forth rapidly several times along the x-axis; that will give you a positive result; is that what you want? Suppose the user attempts the gesture you want but moves at slightly the wrong angle; you may get a negative result. Suppose they move exactly along the x-axis for a bit and then along the y-axis for a bit; then the projections onto the {x,y}, {x,z} and {y,z} planes will all pass your test. These all seem like results you might not want.
3: Is it where the real cost will lie? This all assumes you've already got (x,y,z) coordinates. Getting those is probably going to be more expensive than processing them. For instance, if you have a camera-based system of some kind then there'll be some nontrivial image processing for every frame. Or perhaps you're integrating up data from accelerometers (which, by the way, is likely to give nasty inaccurate position results); the chances are that you're doing some filtering and other calculations to get position data. I bet that the cost of performing a calculation like this one will be substantially less than the cost of getting the coordinates in the first place.

Resources