Possible infinite loop on math equation? - math

I have the following problem, and am having trouble understanding part of the equation:
Monte Carlo methods for estimating an integral basically take a lot of random samples and determine a weighted average. For example, the integral of f(x) can be estimated from N independent random samples xr by
[Image of the estimator formula: http://www.goftam.com/images/area.gif]
for a uniform probability distribution of xr in the range [x1, x2]. Since each
function evaluation f(xr) is independent, it is easy to distribute this work
over a set of processes.
What I don't understand is what f(xr) is supposed to do? Does it feed back into the same equation? Wouldn't that be an infinite loop?

It should say f(xi)
f() is the function we are trying to integrate via the numerical Monte Carlo method, which estimates an integral (and its error) by evaluating f at randomly chosen points from the integration region.

Your goal is to compute the integral of f from x1 to x2. For example, you may wish to compute the integral of sin(x) from 0 to pi.
Using Monte Carlo integration, you can approximate this by sampling random points in the interval [x1,x2] and evaluating f at those points. Perhaps you'd like to call this MonteCarloIntegrate( f, x1, x2 ).
So no, MonteCarloIntegrate does not "feed back" into itself. It calls a function f, the function you are trying to numerically integrate, e.g. sin.
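A minimal sketch of such a function in R, to make the point concrete. The name MonteCarloIntegrate comes from the paragraph above; the sample-size argument n and its default are assumptions for illustration, and f is assumed to accept a vector of inputs.

```r
# Hypothetical helper: f is the integrand, passed in as an ordinary argument.
MonteCarloIntegrate <- function(f, x1, x2, n = 100000) {
  xr <- runif(n, min = x1, max = x2)  # n uniform random points in [x1, x2]
  (x2 - x1) * mean(f(xr))             # interval width times average height of f
}

set.seed(1)                           # fixed seed so the run is repeatable
est <- MonteCarloIntegrate(sin, 0, pi)
est                                   # should land near the exact value, 2
```

Nothing here calls MonteCarloIntegrate from inside itself: f(xr) is just sin evaluated at the random points.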

Replace f(x_r) by f(x_r_i) (read: f evaluated at x sub r sub i). The r_i are chosen uniformly at random from the interval [x_1, x_2].
The point is this: the area under f on [x_1, x_2] is equal to (x_2 - x_1) times the average of f on the interval [x_1, x_2]. That is
A = (x_2 - x_1) * [(1 / (x_2 - x_1)) * int_{x_1}^{x_2} f(x)\, dx]
The portion in square brackets is the average of f on [x_1, x_2] which we will denote avg(f). How can we estimate the average of f? By sampling it at N random points and taking the average value of f evaluated at those random points. To wit:
avg(f) ~ (1 / N) * sum_{i=1}^{N} f(x_r_i)
where x_r_1, x_r_2, ..., x_r_N are points chosen uniformly at random from [x_1, x_2].
Therefore
A = (x_2 - x_1) * avg(f) ~ (x_2 - x_1) * (1 / N) * sum_{i=1}^{N} f(x_r_i).
Here is another way to think about this equation: the area under f on the interval [x_1, x_2] is the same as the area of a rectangle with length (x_2 - x_1) and height equal to the average height of f. The average height of f is approximately
(1 / N) * sum_{i=1}^{N} f(x_r_i)
which is the value that we produced previously.

Whether it's xi or xr is irrelevant - it's the random number that we're feeding into function f().
I'm more likely to write the function (aside from formatting) as follows:
(x2-x1) * sum(f(xi))/N
That way, we can see that we're taking the average of N samples of f(x) to get an average height of the function, then multiplying by the width (x2-x1).
Because, after all, integration is just calculating the area under the curve. (Nice pictures at http://hyperphysics.phy-astr.gsu.edu/Hbase/integ.html#c4.)

x_r is a random value from the integral's range.
Substituting Random(x_1, x_2) for x_r would give an equivalent equation.


Generating Integer Sequences based on a Modified Bernoulli Distribution

I want to use R to randomly generate an integer sequence in which each integer is picked, with replacement, from a pool of integers (0,1,2,3,...,k). k is pre-determined. The selection probability for every integer j in (0,1,2,3,...,k) is p^j * (1-p), where p is pre-determined. That is, 1 has a much higher probability of being picked than k, and my final integer sequence will likely contain more 1s than ks. I am not sure how to implement this selection process in R.
A generic approach to this type of problem would be:
1. Calculate p^k * (1-p) for each integer.
2. Create a cumulative sum of these in a table t.
3. Draw a number from a uniform distribution between 0 and the largest value in t.
4. Measure how far into t that number falls and check which integer that corresponds to.
The larger the probability for an integer is, the larger the part of that range it will cover.
Here's some quick and dirty example code:
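draw <- function(n = 1, k, p) {
  v <- seq(0, k)                        # candidate integers 0..k
  pr <- p^v * (1 - p)                   # weight of each integer
  t <- cumsum(pr)                       # upper edge of each integer's slice
  x <- runif(n, min = 0, max = max(t))  # start at 0, otherwise 0 can never be drawn
  f <- findInterval(x, vec = t)         # number of edges at or below x, in 0..k
  v[f + 1]  ## first interval is 0, and x will almost surely never reach the top edge
}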
Note that the proposed solution doesn't care whether your probabilities add up to 1. Here they add up to 1 - p^(k+1), slightly less than 1 because the pool stops at k, but that's not important for the solution.
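As an aside, base R's sample() accepts a prob vector of weights that need not sum to 1, so the same draw can be sketched in one call. The values of k and p below are assumed purely for illustration.

```r
k <- 5                          # assumed example value
p <- 0.3                        # assumed example value
weights <- p^(0:k) * (1 - p)    # weight of each integer 0..k

set.seed(42)
s <- sample(0:k, size = 20, replace = TRUE, prob = weights)
s
```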
The answer by Sirius is good. But as far as I can tell, what you're describing is something like a truncated geometric distribution.
I should note that the geometric distribution is defined differently in different works (see MathWorld, for example), so we use the distribution defined as follows:
P(X = x) ~ p^x * (1 - p), where x is an integer in [0, k].
I am not very familiar with R, but the solution involves calling rgeom(1, 1 - p) until the result is k or less.
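A rough sketch of that idea in R follows. The helper name rtruncgeom is made up for illustration, and it draws candidates in batches rather than one at a time, which is a small variation on calling rgeom(1, 1 - p) repeatedly.

```r
# Hypothetical helper: n draws from the geometric distribution truncated to 0..k
rtruncgeom <- function(n, k, p) {
  out <- integer(0)
  while (length(out) < n) {
    cand <- rgeom(n, prob = 1 - p)  # P(X = x) = p^x * (1 - p) for x = 0, 1, 2, ...
    out <- c(out, cand[cand <= k])  # keep only the draws that are k or less
  }
  out[seq_len(n)]                   # trim any surplus from the last batch
}

set.seed(1)
x <- rtruncgeom(1000, k = 3, p = 0.5)
table(x)
```

When p is close to 1 and k is small, most candidates are rejected and the loop takes correspondingly longer.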
Alternatively, you can use a generic rejection sampler, since the probabilities are known (better called weights here, since they need not sum to 1). Rejection sampling is described as follows:
Assume each weight is 0 or greater. Store the weights in a list. Calculate the highest weight, call it max. Then, to choose an integer in the interval [0, k] using rejection sampling:
1. Choose a uniform random integer i in the interval [0, k].
2. With probability weights[i]/max (where weights[i] = p^i * (1-p) in your case), return i. Otherwise, go to step 1.
Given the weights for each item, there are many other ways to make a weighted choice besides rejection sampling or the solution in Sirius's answer; see my note on weighted choice algorithms.
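The two rejection-sampling steps above could be sketched in R roughly as follows. The function name rejection_draw and the example values of k and p are assumptions; note that R vectors are 1-indexed, so the weight of integer i is stored at position i + 1.

```r
# Hypothetical helper: one weighted draw from 0..k by rejection sampling
rejection_draw <- function(weights) {
  k <- length(weights) - 1
  wmax <- max(weights)
  repeat {
    i <- sample.int(k + 1, 1) - 1            # step 1: uniform integer in 0..k
    if (runif(1) < weights[i + 1] / wmax) {  # step 2: accept with probability weights[i]/max
      return(i)
    }
  }
}

k <- 4                                       # assumed example value
p <- 0.5                                     # assumed example value
weights <- p^(0:k) * (1 - p)

set.seed(7)
draws <- replicate(1000, rejection_draw(weights))
table(draws)
```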

Defining exponential distribution in R to estimate probabilities

I have a bunch of random variables (X1,....,Xn) which are i.i.d. Exp(1/2) and represent the duration of a certain event. So this distribution obviously has an expected value of 2, but I am having problems defining it in R. I did some research and found something about a so-called Monte Carlo simulation, but I can't seem to find what I am looking for in it.
An example of what I want to estimate: say we have 10 random variables (X1,..,X10) distributed as above, and we want to determine the probability P(X1+...+X10 <= 25).
You don't actually need monte carlo simulation in this case because:
If Xi ~ Exp(λ) then the sum (X1 + ... + Xk) ~ Erlang(k, λ) which is just a Gamma(k, 1/λ) (in (k, θ) parametrization) or Gamma(k, λ) (in (α,β) parametrization) with an integer shape parameter k.
From wikipedia (https://en.wikipedia.org/wiki/Exponential_distribution#Related_distributions)
So, P([X1+...+X10<=25]) can be computed by
pgamma(25, shape=10, rate=0.5)
Are you aware of rexp() function in R? Have a look at documentation page by typing ?rexp in R console.
A quick answer to your Monte Carlo estimation of desired probability:
mean(rowSums(matrix(rexp(1000 * 10, rate = 0.5), 1000, 10)) <= 25)
I have generated 1000 sets of 10 exponential samples, putting them into a 1000 * 10 matrix. We take row sums and get a vector of 1000 entries. The proportion of values between 0 and 25 is an empirical estimate of the desired probability.
Thanks, this was helpful! Can I use replicate with this code, to make it look like this: F <- function(n, B=1000) mean(replicate(B,(rexp(10, rate = 0.5))))? I am unable to get the right result.
replicate here generates a matrix, too, but it is a 10 * 1000 matrix (as opposed to a 1000 * 10 one in my answer), so you now need to take colSums. Also, where did you put n?
The correct function would be
F <- function(n, B=1000) mean(colSums(replicate(B, rexp(10, rate = 0.5))) <= n)
For a non-Monte Carlo approach to your given example, see the other answer. The exponential distribution is a special case of the gamma distribution, and the latter has an additivity property.
I am giving you Monte Carlo method because you name it in your question, and it is applicable beyond your example.
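As a rough sanity check, the simulated proportion can be compared with the pgamma value from the other answer. The number of simulated sums B is an assumed choice; larger values tighten the agreement.

```r
set.seed(123)
B <- 100000                                               # assumed number of simulated sums
sums <- rowSums(matrix(rexp(B * 10, rate = 0.5), B, 10))  # each row sums 10 Exp(1/2) draws
mc <- mean(sums <= 25)                                    # Monte Carlo estimate
exact <- pgamma(25, shape = 10, rate = 0.5)               # closed-form Gamma probability
c(monte_carlo = mc, exact = exact)
```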

Generate random numbers with distribution e^x

Is there any way to generate random numbers of the distribution e^x with x defined over a certain range in R?
Your question is unclear, so I'm going to assume that you'd like to treat e^x as a density. The function e^x on the range (0, 0.5) is not a density, because densities are required to have an area of 1. However, over any specified finite range it can be scaled by its area to turn it into a density. Integrating e^x over the specified range gives an area of 0.6487212707001282, so e^x / 0.6487212707001282 is a valid density (to within roundoff error).
We then integrate the density from 0 to x for 0 ≤ x ≤ 0.5 to derive the cumulative distribution function: F(x) = (e^x - 1) / 0.6487212707001282.
We can now use inverse transform sampling to generate values from this distribution. Set the CDF equal to U, a Uniform(0,1) random number, and solve for x:
x = ln(1 + 0.6487212707001282 * U)
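In R this might look like the sketch below, generalised to an arbitrary finite range (a, b). The function name rexpdens and its arguments are made up for illustration; with a = 0 and b = 0.5 it reduces to the formula above, since exp(0.5) - 1 = 0.6487212707001282.

```r
# Hypothetical helper: n draws with density proportional to exp(x) on (a, b)
rexpdens <- function(n, a = 0, b = 0.5) {
  u <- runif(n)                         # Uniform(0,1) draws
  log(exp(a) + u * (exp(b) - exp(a)))   # inverse of F(x) = (e^x - e^a) / (e^b - e^a)
}

set.seed(1)
x <- rexpdens(10000)
range(x)                                # all values fall inside (0, 0.5)
```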

univariate nonlinear optimization with quadratic constraint in R

I have a quadratic function f where, f = function (x) {2+.1*x+.23*(x*x)}. Let's say I have another quadratic fn g where g = function (x) {3+.4*x-.60*(x*x)}
Now, I want to maximize f given the constraints 1. g>0 and 2. 600<x<650
I have tried the functions optim, constrOptim and optimize. optimize does one-dimensional optimization, but without constraints, and I couldn't understand constrOptim. I need to do this using R. Please help.
P.S. In this example, the values may be erratic as I have given two random quadratic functions, but basically I want maximization of a quadratic fn given a quadratic constraint.
If you solve g(x)=0 for x by the usual quadratic formula then that just gives you another set of bounds on x. If your x^2 coefficient is negative then g(x) > 0 between the solutions, otherwise g(x) > 0 outside the solutions, so within (-Inf, x1) and (x2, Inf).
In this case, g(x)>0 for -1.927 < x < 2.59. So in this case both your constraints cannot be simultaneously achieved (g(x) is LESS THAN 0 for 600<x<650).
But supposing your second condition was 1 < x < 5, then you'd just combine the solution from g(x)>0 with that interval to get 1 < x < 2.59, and then maximise f in that interval using standard univariate optimisation.
And you don't even need to run an optimisation algorithm. Your target f is quadratic. If the coefficient of x^2 is positive, the maximum is going to be at one of your limits of x, so you only have a small number of values to try. If the coefficient of x^2 is negative, then the maximum is either at a limit or at the point where f(x) peaks (solve f'(x)=0), if that is within your limits.
So you can do this precisely, there's just a few conditions to test and then some intervals to compute and then some values of f at those interval limits to calculate.
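A small sketch of those steps in R, using the f and g from the question and the supposed bounds 1 < x < 5 (since 600 < x < 650 is infeasible). The variable names are illustrative only.

```r
f <- function(x) 2 + 0.1 * x + 0.23 * x^2
g <- function(x) 3 + 0.4 * x - 0.60 * x^2

# Roots of g from the quadratic formula, writing g as a*x^2 + b*x + c0
a <- -0.60; b <- 0.4; c0 <- 3
roots <- sort((-b + c(-1, 1) * sqrt(b^2 - 4 * a * c0)) / (2 * a))

# a < 0, so g(x) > 0 strictly between the roots; intersect with 1 < x < 5
lower <- max(1, roots[1])
upper <- min(5, roots[2])

# f opens upward (0.23 > 0), so its largest value on the interval is at an endpoint
candidates <- c(lower, upper)
best <- candidates[which.max(f(candidates))]
c(x = best, f = f(best))
```

Because the constraints are strict inequalities, the value at the upper root is approached rather than attained; in practice you would step just inside the limit.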

How to sample N points between 0 and R if they are exponentially distributed?

The density of my points x ∈ [0,R] is exponential: ρ(x)~e^x
How can I sample N points from there?
Taking your request at face value, if you want a density function that grows exponentially for x ∈ [0,R] the cumulative distribution function turns out to be (exp(x) - 1) / (exp(R) - 1). To generate this via inversion, set the CDF equal to a Uniform(0,1) and solve for x. The inversion turns out to be:
ln(1 + (exp(R) - 1) * U)
where U represents a call to the Uniform(0,1) PRNG.
If what you actually want is a truncated form of what most probability folks know as the exponential distribution, we need to determine an upper bound for the random number corresponding to your truncation point R. In that case, the inversion is:
-ln(1 - [1 - exp(-lambda * R)] * U) / lambda
As before, U represents a call to the Uniform(0,1) PRNG. This will generate exponentials at rate lambda, truncated at a max of R.
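The truncated-exponential inversion could be sketched in R as follows. The function name rtruncexp and the example values of lambda and R are assumptions for illustration.

```r
# Hypothetical helper: n draws from Exp(lambda) truncated to [0, R]
rtruncexp <- function(n, lambda, R) {
  u <- runif(n)                                   # Uniform(0,1) draws
  -log(1 - (1 - exp(-lambda * R)) * u) / lambda   # inverse of the truncated CDF
}

set.seed(2)
x <- rtruncexp(10000, lambda = 1.5, R = 2)
range(x)                                          # every value lies within [0, 2]
```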
Use inverse transform sampling: generate uniformly distributed values and map them through the inverse of your distribution's CDF.
