How do I get started with this? - r

I have been stuck on this problem for a long time.
I was thinking I should first create the two functions, like this:
n = runif(10000)
estimator1_fun = function(n){
  sum = 0  # R has no type declarations, so "int sum = 0" becomes a plain assignment
  for(i in 1:10000){
    sum = sum + ((n/i)*runif(1))
  }
  return (sum)
}
and do the same for the other function, and use the mse formula? Am I even approaching this correctly? I tried formatting it, but found that using an image would be better.

Assuming U(0,Theta_0) is the uniform distribution from 0 to Theta_0, and that Theta_0 is a fixed constant, I would proceed as follows:
1. Define Theta_0. Give it a fixed value.
2. Write the function that gives a random number from that distribution
- The R call that draws from it is runif(N, 0, Theta_0): sample size first, then the lower and upper bounds.
- Arguments could be Theta_0 and N.
3. Sample it a few thousand (or whatever) times into a vector X.
4. Calculate the two estimates.
5. Repeat steps 3 & 4 for more samples
6. Plot the two estimates against the number of samples and see if they approach Theta_0
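
The steps above can be sketched as follows, written in Python for illustration (the same steps map onto runif, mean and max in R). The two estimators in the question were given as an image that is not reproduced here, so this sketch assumes the usual textbook pair for U(0, Theta_0): twice the sample mean, and the sample maximum. The function names and the value Theta_0 = 5 are also hypothetical.

```python
import random
import statistics


def two_estimates(theta0, n, rng):
    """Draw n values from U(0, theta0) and return both estimates of theta0."""
    sample = [rng.uniform(0.0, theta0) for _ in range(n)]
    moment_estimate = 2.0 * statistics.fmean(sample)  # assumed estimator 1
    max_estimate = max(sample)                        # assumed estimator 2
    return moment_estimate, max_estimate


def mse_of_estimates(theta0, n, reps=2000, seed=0):
    """Monte Carlo mean squared error of both estimators for sample size n."""
    rng = random.Random(seed)
    sq_err_moment = 0.0
    sq_err_max = 0.0
    for _ in range(reps):
        est_moment, est_max = two_estimates(theta0, n, rng)
        sq_err_moment += (est_moment - theta0) ** 2
        sq_err_max += (est_max - theta0) ** 2
    return sq_err_moment / reps, sq_err_max / reps


if __name__ == "__main__":
    # Step 5: repeat for growing sample sizes and watch both errors shrink.
    for n in (10, 100, 1000):
        print(n, mse_of_estimates(theta0=5.0, n=n, reps=500))
```

If the estimators in the exercise differ, only the two lines marked "assumed" need to change.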

Related

How do I perform a simulation to find a z-score (x) with a given probability in R

As the title says, I would like to run a simulation. I am given the probability P(L>x)=0.05, where L follows a normal distribution with mean=0 and std=100. I was asked to find an appropriate x by simulation, IDEALLY by repeating a hit-or-miss approach many times. I am not allowed to use the qnorm() function. Can you please help me out? Thank you
As we want P(L>x)=0.05, we can create a function that calculates P(L>x)-0.05, and find the x that turns it to 0 (its root) with uniroot:
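prob = function(x){
  n = 10000
  L = rnorm(n, 0, 100)   # fresh sample on every call
  sum(L > x)/n - 0.05    # estimated P(L>x) minus the target
}
uniroot(prob, c(50, 400))
Note: the second argument for uniroot is the interval where it'll try to find the root. The function has to change sign across it: P(L>50) is well above 0.05 and P(L>400) is essentially 0. On an interval such as c(-400,-50) both endpoints are positive and uniroot stops with an error.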
This will find a different root every time you run it as L is created inside prob. For better accuracy, you can increase n.
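
A minimal hit-or-miss sketch in Python, assuming one fixed sample is reused for every candidate x so that the search is repeatable (the function names, sample size and tolerance are illustrative choices, not part of the original answer). It counts the share of draws above x and bisects on x until that share matches the target:

```python
import bisect
import random


def exceedance_fraction(sorted_draws, x):
    """Hit-or-miss estimate of P(L > x): the share of draws that land above x."""
    hits = len(sorted_draws) - bisect.bisect_right(sorted_draws, x)
    return hits / len(sorted_draws)


def find_threshold(target=0.05, mean=0.0, sd=100.0, n=100_000, seed=1, tol=1e-3):
    """Bisect on x until the simulated P(L > x) is as close to target as the sample allows."""
    rng = random.Random(seed)
    draws = sorted(rng.gauss(mean, sd) for _ in range(n))  # one fixed sample
    lo, hi = draws[0], draws[-1]
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if exceedance_fraction(draws, mid) > target:
            lo = mid  # too many hits above mid, so x must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(find_threshold())
```

Because the sample is fixed, the estimated probability is a step function of x that only decreases, which is what makes bisection safe here.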

Calculating lm() within a loop

Objective: The overall objective of the problem is to calculate the confidence interval (CI) of the mean for rnorm samples of various sizes (n=2,4..1024), 10,000 times each, and then count the number of times each one fails (this likely requires a counter and an if/else statement). Finally, the results are to be plotted.
I am trying to calculate the CI of the mean for several simulations at each sample size; however, I am first trying to break down the code for one specific sample size, a = 8.
The problem I have is that I do not know how to generate a linear model for each row. Would anyone know how I can do this? Here is what I have so far:
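a <- 8
n.sim.3 <- 10000
for (i in a) {
  # one simulated sample of size i per row
  r.mat <- matrix(rnorm(i * n.sim.3), nrow = n.sim.3, ncol = i)
  # the lm call was the sticking point: fit an intercept-only model to each row;
  # its confint() is the CI of that row's mean (one row of confint.tmp per sample)
  confint.tmp <- t(apply(r.mat, 1, function(row) confint(lm(row ~ 1))))
}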
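
The counting part of the objective can be sketched without a linear model, since the CI from an intercept-only fit is the ordinary t-interval for the mean. The Python sketch below assumes that "fails" means the 95% interval does not contain the true mean 0; the function name is hypothetical, and the critical value is hard-coded for n = 8 (7 degrees of freedom), so other sample sizes need their own value.

```python
import math
import random

T_CRIT_DF7 = 2.365  # 97.5% quantile of Student's t with 7 degrees of freedom


def ci_misses(n=8, n_sim=10_000, t_crit=T_CRIT_DF7, seed=0):
    """Count how many 95% t-intervals for the mean miss the true mean 0."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_sim):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        centre = sum(sample) / n
        variance = sum((x - centre) ** 2 for x in sample) / (n - 1)
        half_width = t_crit * math.sqrt(variance / n)
        if not (centre - half_width <= 0.0 <= centre + half_width):
            misses += 1  # the counter the question mentions
    return misses


if __name__ == "__main__":
    print(ci_misses(), "of 10000 intervals miss the true mean")
```

With a correct 95% interval, roughly 5% of the simulated intervals should miss.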

Generate N random integers that are sampled from a uniform distribution and sum to M in R [duplicate]

In some code I want to choose n random numbers in [0,1) which sum to 1.
I do so by choosing the numbers independently in [0,1) and normalizing them by dividing each one by the total sum:
from random import random
numbers = [random() for i in range(n)]
total = sum(numbers)  # compute the sum once instead of once per element
numbers = [x/total for x in numbers]
My "problem" is that the distribution I get out is quite skewed. Out of a million numbers chosen, not a single one gets over 1/2. With some effort I've calculated the pdf, and it's not nice.
Here is the weird looking pdf I get for 5 variables:
Do you have an idea for a nice algorithm to choose the numbers that results in a more uniform or simple distribution?
You are looking to partition the distance from 0 to 1.
Choose n - 1 numbers from 0 to 1, sort them and determine the distances between each of them.
This will partition the space 0 to 1, which should yield the occasional large result which you aren't getting.
Even so, for large values of n, you can generally expect your max value to decrease as well, just not as quickly as your method.
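
A minimal Python sketch of the partition idea described above (the function name is illustrative):

```python
import random


def random_partition(n, rng=random):
    """Cut [0, 1] at n - 1 sorted random points and return the n gap lengths."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    return [right - left for left, right in zip(edges, edges[1:])]


if __name__ == "__main__":
    parts = random_partition(5)
    print(parts, sum(parts))
```

Unlike the normalise-by-the-sum approach, values above 1/2 do turn up regularly here for n = 5.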
You might be interested in the Dirichlet distribution, which is used to generate quantities that sum to 1 if you're looking for probabilities. Samples from it can also be generated using gamma distributions.
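
A short sketch of the gamma construction, assuming the standard recipe of drawing one Gamma(alpha_i, 1) value per component and dividing by their total (the function name is hypothetical):

```python
import random


def dirichlet_sample(alphas, rng=random):
    """Draw one Dirichlet(alphas) vector by normalising independent gamma draws."""
    gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


if __name__ == "__main__":
    # All alphas equal to 1 gives the uniform distribution over vectors summing to 1.
    print(dirichlet_sample([1.0] * 5))
```

Larger alphas concentrate the components around their means, so the parameter vector gives direct control over how spread out the numbers are.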
Another way to get n random numbers which sum up to 1:
import random

def create_norm_arr(n, remaining=1.0):
    random_numbers = []
    for _ in range(n - 1):
        r = random.random()  # get a random number in [0, 1)
        r = r * remaining
        remaining -= r
        random_numbers.append(r)
    random_numbers.append(remaining)
    return random_numbers

random_numbers = create_norm_arr(5)
print(random_numbers)
print(sum(random_numbers))
This makes higher numbers more likely.

Generate Poisson process using R

I want to generate a process where every step produces a realisation of a Poisson random variable. This realisation should be saved, then the next Poisson random variable should be realised and added to the sum of all previous realisations. Furthermore, there should be a chance at every step that the process stops. Hope that makes sense to you guys... Any thought is appreciated!
More compactly, pick a single geometrically distributed random number for the total number of steps achieved before stopping, then use cumsum to sum that many Poisson deviates:
stopping.prob <- 0.3 ## for example
lambda <- 3.5 ## for example
n <- rgeom(1,stopping.prob)+1 ## constant probability per step of stopping; rgeom counts the steps survived before the first stop
cumsum(rpois(n,lambda))
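
The same process can also be written step by step. The Python sketch below assumes the stop check happens after each Poisson draw, so at least one value is always recorded; Python's standard library has no Poisson sampler, so a small multiplication-method sampler (suitable for small lambda only) is included. All names are illustrative.

```python
import math
import random


def poisson_draw(lam, rng):
    """Multiplication-method Poisson sampler; adequate for small lam."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def stopped_poisson_path(lam, stop_prob, rng):
    """Running sums of Poisson draws, stopping after each step with probability stop_prob."""
    path, total = [], 0
    while True:
        total += poisson_draw(lam, rng)
        path.append(total)
        if rng.random() < stop_prob:
            return path


if __name__ == "__main__":
    print(stopped_poisson_path(3.5, 0.3, random.Random(42)))
```

The number of recorded steps is then geometric with mean 1/stop_prob, which matches the compact version that draws the step count first.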
You are very vague on the parameters of your simulation but how's this?
Lambda for random Poisson number.
lambda <- 5
This is the threshold value when the function exits.
th <- 0.999
Create a vector of length 1000.
bin <- numeric(1000)
Run the darn thing. It basically rolls a "die" (the values generated are between 0 and 1). If the value is below th, it draws a random Poisson number. Otherwise the loop stops.
for (i in 1:length(bin)) {
  if (runif(1) < th) {
    bin[i] <- rpois(1, lambda = lambda)
  } else {
    break  # leave the loop quietly; stop() would raise an error instead
  }
}
Remove zeros if any.
bin <- bin[bin != 0]
You can use cumsum to cumulatively sum values.
cumsum(bin)

Combining two normal random variables

Suppose I have the following 2 random variables:
X where mean = 6 and stdev = 3.5
Y where mean = -42 and stdev = 5
I would like to create a new random variable Z based on the first two and knowing that : X happens 90% of the time and Y happens 10% of the time.
It is easy to calculate the mean for Z : 0.9 * 6 + 0.1 * -42 = 1.2
But is it possible to generate random values for Z in a single function?
Of course, I could do something along those lines :
if (randIntBetween(1,10) > 1)
    GenerateRandomNormalValue(6, 3.5);
else
    GenerateRandomNormalValue(-42, 5);
But I would really like to have a single function that would act as a probability density function for such a random variable (Z) that is not necessarily normal.
sorry for the crappy pseudo-code
Thanks for your help!
Edit: here is one concrete question:
Let's say we add up 5 consecutive values of Z. What would be the probability of ending with a number higher than 10?
"But I would really like to have a single function that would act as a probability density function for such a random variable (Z) that is not necessarily normal."
Okay, if you want the density, here it is:
rho = 0.9 * density_of_x + 0.1 * density_of_y
But you cannot sample from this density unless you 1) compute its CDF (cumbersome, but not infeasible) and 2) invert it (you will need a numerical solver for this). Or you can do rejection sampling (or variants, e.g. importance sampling). This is costly, and cumbersome to get right.
So you should go for the "if" statement (i.e. call the generator 3 times), except if you have a very strong reason not to (using quasi-random sequences for instance).
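
A minimal Python sketch of the "if" approach, which also gives a simulated answer to the concrete question about the sum of 5 consecutive values of Z. The function names and the number of trials are illustrative assumptions, and the result is a Monte Carlo estimate rather than an exact probability.

```python
import random


def draw_z(rng):
    """One value of Z: X ~ N(6, 3.5) with probability 0.9, else Y ~ N(-42, 5)."""
    if rng.random() < 0.9:
        return rng.gauss(6.0, 3.5)
    return rng.gauss(-42.0, 5.0)


def prob_sum_exceeds(k=5, threshold=10.0, trials=100_000, seed=0):
    """Estimate P(Z_1 + ... + Z_k > threshold) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if sum(draw_z(rng) for _ in range(k)) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(prob_sum_exceeds())
```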
If a random variable is denoted x=(mean,stdev), then the following algebra applies to weighted sums of independent normal variables
number * x = ( number*mean, number*stdev )
x1 + x2 = ( mean1+mean2, sqrt(stdev1^2+stdev2^2) )
so for the case of X = (mx,sx), Y= (my,sy) the linear combination is
Z = w1*X + w2*Y = (w1*mx,w1*sx) + (w2*my,w2*sy) =
( w1*mx+w2*my, sqrt( (w1*sx)^2+(w2*sy)^2 ) ) =
( 1.2, 3.19 )
link: Normal Distribution, look for the Miscellaneous section, item 1.
PS. Sorry for the weird notation. The new standard deviation is calculated by something similar to the Pythagorean theorem. It is the square root of the sum of squares.
This is the form of the distribution:
ListPlot[BinCounts[Table[If[RandomReal[] < .9,
RandomReal[NormalDistribution[6, 3.5]],
RandomReal[NormalDistribution[-42, 5]]], {1000000}], {-60, 20, .1}],
PlotRange -> Full, DataRange -> {-60, 20}]
It is NOT Normal, as you are not adding Normal variables, but just choosing one or the other with certain probability.
Edit
This is the curve for adding five vars with this distribution:
The upper and lower peaks represent taking one of the distributions alone, and the middle peak accounts for the mixing.
The most straightforward and generically applicable solution is to simulate the problem:
Run the piecewise function you have 1,000,000 times (just a high number) and generate a histogram of the results by splitting them into bins and dividing the count for each bin by your N (1,000,000 in my example). This will leave you with an approximation for the PDF of Z at every given bin.
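
The histogram approach might look like this in Python. Note one assumption beyond the description above: each count is also divided by the bin width, so that the result is on the scale of a density rather than a per-bin probability. The bin range and function names are illustrative.

```python
import random


def draw_z(rng):
    """The piecewise generator: N(6, 3.5) 90% of the time, N(-42, 5) otherwise."""
    if rng.random() < 0.9:
        return rng.gauss(6.0, 3.5)
    return rng.gauss(-42.0, 5.0)


def histogram_density(samples, lo, hi, width):
    """Approximate the PDF on [lo, hi) from bin counts."""
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for s in samples:
        if lo <= s < hi:
            index = min(int((s - lo) / width), n_bins - 1)
            counts[index] += 1
    n = len(samples)
    return [c / (n * width) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [draw_z(rng) for _ in range(200_000)]
    density = histogram_density(samples, -80.0, 40.0, 1.0)
    print(max(density))
```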
Lots of unknowns here, but essentially you just wish to add the two (or more) probability functions to one another.
For any given probability function you could calculate a random number with that density by calculating the area under the probability curve (the integral) and then generating a random number between 0 and that area. Then move along the curve until the area is equal to your random number and use that as your value.
This process can then be generalized to any function (or sum of two or more functions).
Elaboration:
Suppose you have a density function f(x) defined on the range from 0 to 1. You could generate a random number from that distribution by first calculating the integral of f(x) from 0 to 1, giving you the area under the curve; let's call it A.
Now, you generate a random number between 0 and A, let's call that number, r. Now you need to find a value t, such that the integral of f(x) from 0 to t is equal to r. t is your random number.
This process can be used for any probability density function f(x). Including the sum of two (or more) probability density functions.
I'm not sure what your functions look like, so I'm not sure whether you can calculate analytic solutions for all this, but in the worst case you could use numeric techniques to approximate the effect.
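
For the two-normal mixture discussed in this thread, the numeric route could be sketched as below: the area up to t is the weighted sum of the two normal CDFs (total area A = 1), and t is found by bisection. The search bracket [-100, 60], the tolerance and the function names are assumptions made for this example.

```python
import random
from statistics import NormalDist

X = NormalDist(6.0, 3.5)
Y = NormalDist(-42.0, 5.0)


def mixture_cdf(t):
    """Area under the mixture density up to t."""
    return 0.9 * X.cdf(t) + 0.1 * Y.cdf(t)


def sample_by_inversion(rng, lo=-100.0, hi=60.0, tol=1e-6):
    """Draw r in [0, 1) and bisect for the t whose area equals r."""
    r = rng.random()
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mixture_cdf(mid) < r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(sample_by_inversion(rng), 2) for _ in range(5)])
```

This is noticeably slower than the "if" approach, since every draw needs a few dozen CDF evaluations.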
