How to get only positive values in a Monte Carlo simulation in R?

Using the code below, we get 100,000 random values normally distributed around the mean, and the values can be positive or negative. I am dealing with a problem where negative simulation results make no sense. How can I generate a normal distribution with only positive values? Or is there any other appropriate way to handle this?
runs <- 100000
sims <- rnorm(runs,mean=30,sd=30)

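If you want to get rid of the negatives, you can filter them out after your code. This gives you a truncated normal distribution, if that's what you're after. Note that the result will contain fewer than runs values (about 84% of them here, since P(X > 0) = pnorm(1) ≈ 0.84).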
sims <- sims[sims>0]
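If exactly runs positive values are needed, one option is to keep drawing until enough positive values have been collected. The following is a minimal sketch of that rejection approach; the helper name rnorm_positive is hypothetical, and it assumes the mean is not far below zero (otherwise most draws are rejected and the loop becomes slow).

```r
# Hypothetical helper: draw n values from a normal distribution
# truncated to (0, Inf) by discarding non-positive draws.
rnorm_positive <- function(n, mean, sd) {
  out <- numeric(0)
  while (length(out) < n) {
    draws <- rnorm(n, mean = mean, sd = sd)
    out <- c(out, draws[draws > 0])  # keep only the positive draws
  }
  out[seq_len(n)]                    # trim any surplus
}

set.seed(1)
runs <- 100000
sims <- rnorm_positive(runs, mean = 30, sd = 30)
```

The mean of the truncated values is larger than 30, because the left tail has been cut off. If that shift matters for the problem, a distribution that is positive by construction (for example a log-normal or gamma) may be a better model than a truncated normal.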

Related

R: How to generate several random variables at once

I want to generate 1000 random variables coming from different normal distributions. I use the function "rmvnorm" for that, and in a small setting it is easily done, but I have no idea how to automate it, especially for the sigma matrix (I want no correlation between the Xs). I don't really care about their means or their standard deviations. I was thinking of using a loop (e.g. increasing the mean by A and the variance by B), but I want something more random and have no idea how to do that. Again, writing down a 1000 x 1000 matrix by hand is not smart (with the condition that the off-diagonal elements are 0).
I have searched online, but I am probably not using the right words, so I apologize if this has already been asked and answered.
Thanks!
You can pass equal-length vectors for the parameters of rnorm. The first value returned will be a random draw from a normal distribution with a mean equal to the first value in the mean vector and sd equal to the first value in the sd vector:
rnorm(1e3, 1:1e3, 1:1e3)
Not sure what is meant by "I want something more random", but you can use random values for the mean and sd vectors:
rnorm(1e3, runif(1e3)*1e3, 1/rgamma(1e3, 10, 20))
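The same vectorization extends to several observations per variable. Since uncorrelated normal variables with a diagonal sigma matrix are simply independent normals, a single rnorm call can fill a whole matrix. This is a sketch only; the number of observations n and the names mu, sigma and X are assumptions for illustration.

```r
set.seed(1)
p <- 1000   # number of variables
n <- 50     # observations per variable (assumed value)

mu    <- runif(p) * 1e3           # random means, as in the answer above
sigma <- 1 / rgamma(p, 10, 20)    # random standard deviations

# matrix() fills column by column, so repeating each parameter n times
# gives column j the mean mu[j] and standard deviation sigma[j].
X <- matrix(rnorm(n * p, mean = rep(mu, each = n), sd = rep(sigma, each = n)),
            nrow = n, ncol = p)
```

Each column of X is then one variable, drawn independently of the others.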

Can you make an argument for a function to be a random sample in R?

So, I'd like to test how precise the t-test is at detecting a mean for various distributions. But I don't want to have to define the sampling distribution inside the function each time I run it. If I write function(data, mju) and pass rnorm(n) or any other random sample as the data input, I obviously get the same results when replicating the function, because I only have the one "data" sample that was passed in first. To show more clearly what I want, here is the code:
t_ci <- function(data, mju) {
  prod(t.test(data)$conf.int - mju)
}
set.seed(NULL)
prec_t <- function(data, n, N, mju) {
  sim <- replicate(N, t_ci(data, mju))
  sim[sim<0]/N
}
The first function checks whether the real theoretical parameter "mju" is in the confidence interval. The second one replicates the function t_ci N times, to see how precise the t-test confidence intervals are for the selected data. I'd like to have an option to just indicate the distribution, and then it would generate n-sized samples N times and calculate the precision. But as far as my code goes, it only replicates the same data over and over. Maybe there is a solution for this problem?
Also, it seems that something is wrong with the function prec_t, because I'd like to have a count of the times t_ci produced a negative outcome, divided by N.
Any help would be greatly appreciated! Thanks in advance.
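One possible approach, sketched here under assumptions rather than taken from an accepted answer, is to pass the random generator itself (for example rnorm or rexp) instead of an already drawn sample, and to call it inside replicate so that every replication gets fresh data. The argument name rdist is hypothetical. The count of negative products is taken with mean(sim < 0), which is the proportion of TRUE values; the original sim[sim<0]/N instead returns the negative products themselves divided by N.

```r
# Negative product means mju lies strictly inside the confidence interval.
t_ci <- function(data, mju) {
  prod(t.test(data)$conf.int - mju)
}

# rdist is a function such as rnorm or rexp; extra arguments in ...
# are forwarded to it, so a new sample is drawn on every replication.
prec_t <- function(rdist, n, N, mju, ...) {
  sim <- replicate(N, t_ci(rdist(n, ...), mju))
  mean(sim < 0)   # proportion of intervals that contain mju
}

set.seed(42)
cover_norm <- prec_t(rnorm, n = 30, N = 2000, mju = 0)
cover_exp  <- prec_t(rexp,  n = 30, N = 2000, mju = 1 / 2, rate = 2)
```

Because functions are ordinary values in R, any generator whose first argument is the sample size can be plugged in the same way.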

How do I calculate the probability of sampling all samples from a list?

I have a fixed number of samples. I want to estimate how many draws I need (I may lack the right terminology) to pick every sample at least once with a specific probability, when sampling with replacement.
I am fairly new to R. I think I have to work with the sample function, and I was also trying to do it by Monte Carlo simulation, but couldn't wrap my head around it.
I tried something like this:
sampling_prob = function(idnumber, probability_de){
sample(1:idnumber, sampling_number, replace = TRUE)
MISSING CODE
}
sampling_prob(2200,0.99)
> 200000 ##This is not the correct number, but a very far-fetched guess
Thus, I want to estimate my needed sample size/sampling_number based on simulations.
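A minimal Monte Carlo sketch of this idea follows. It is an assumption about what the missing code could look like, not a known answer: each simulation draws with replacement until every id has appeared, records how many draws that took, and the requested quantile of those counts is the estimate. The helper draws_to_see_all and the arguments n_sims and chunk are hypothetical names.

```r
# Number of draws (with replacement) until every id in 1..n_ids has appeared.
draws_to_see_all <- function(n_ids, chunk = 20 * n_ids) {
  draws <- sample.int(n_ids, chunk, replace = TRUE)
  first_seen <- match(seq_len(n_ids), draws)   # first position of each id
  while (anyNA(first_seen)) {                  # some id not seen yet: draw more
    draws <- c(draws, sample.int(n_ids, chunk, replace = TRUE))
    first_seen <- match(seq_len(n_ids), draws)
  }
  max(first_seen)
}

sampling_prob <- function(n_ids, probability, n_sims = 1000) {
  needed <- replicate(n_sims, draws_to_see_all(n_ids))
  unname(ceiling(quantile(needed, probability)))
}

set.seed(123)
est <- sampling_prob(100, 0.99, n_sims = 2000)

# Large-n approximation for the same quantity, as a rough cross-check.
approx_100 <- 100 * (log(100) - log(-log(0.99)))
```

The example uses 100 ids to keep the run time short; calling sampling_prob(2200, 0.99) works the same way but takes longer. The estimate is itself random, so more simulations give a more stable quantile.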

How to generate OUTLIER-FREE data in R?

I would like to know how I can generate OUTLIER-FREE data using R.
I'm generating data using rnorm.
Say I have a linear equation
Y = B0 + B1*X + E, where X~N(5,9) and E~N(0,1).
I'm going to use rnorm to generate X and E.
Below is the code used:
X <- rnorm(50,5,3) #I'm generating 50 Xi's w/ mean=5 & var=9
E <- rnorm(50,0,1) #I'm generating 50 residuals w/ mean=0 & var=1
Now, I'm going to generate Y by plugging the generated X and E above into the linear equation.
If the data I've generated above is outlier-free (no influential observations), then no observation's Cook's distance should exceed 4/n, which is the usual cut-off for detecting influential/outlying observations.
But I haven't been able to achieve this so far. I'm still getting outliers when I generate data following this procedure.
Can you help me out with this? Do you know a way to generate data that is OUTLIER-FREE?
Thanks a lot!
Well, one way would be to detect and delete those outliers by finding the generated points that exceed some cutoff. Of course this would harm the "randomness" in your generated data but your request for outlier-free data implies that by definition. Possibly, decreasing the variance of X could also help.
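The detect-and-delete idea can be sketched as follows. The coefficients B0 and B1 are hypothetical values, since the question does not give them. Deleting points changes both the fit and the 4/n cutoff, so the sketch refits and rechecks until no observation is flagged, with a guard that stops if too few rows remain.

```r
set.seed(2024)
n  <- 50
B0 <- 2     # hypothetical intercept
B1 <- 0.5   # hypothetical slope

X <- rnorm(n, 5, 3)   # mean 5, variance 9
E <- rnorm(n, 0, 1)   # mean 0, variance 1
dat <- data.frame(X = X, Y = B0 + B1 * X + E)

repeat {
  fit <- lm(Y ~ X, data = dat)
  flagged <- cooks.distance(fit) > 4 / nrow(dat)
  # Stop when nothing is flagged, or when too few rows remain to go on.
  if (!any(flagged) || nrow(dat) <= 10) break
  dat <- dat[!flagged, ]
}
```

After the loop, dat usually has fewer than 50 rows, and the remaining sample is no longer a plain random draw from the stated distributions.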
Is there a particular reason you need the X's to be normally distributed? The assumption of normality in regression is for the residuals (the error term). Typically the measured independent variable won't be normally distributed: in a balanced, (quasi-)experimental setup, the X's should be close to uniformly distributed. A uniform distribution for the X's (or even an evenly divided sequence generated with seq()) would help you here, because the "outlierness" of outliers arises from being both far from the center of the sample space and comparatively few in number. With a uniform distribution, they are no longer few in number, which reduces their leverage.
As a sidebar: real data has outliers. This is actually one of the ways we can detect touched-up or even faked data in science. If you're interested in simulations that correspond to something in reality, then outliers may not be a bad thing. And there is a whole world of robust methods for dealing with data with arbitrarily bad outliers in a principled way, as opposed to arbitrary cutoff points.
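The leverage argument can be checked with hatvalues(), which depends only on the X's. This is a sketch under assumptions: the coefficients B0 and B1 and the grid range (roughly the mean plus or minus three standard deviations) are made up for illustration.

```r
set.seed(7)
n  <- 50
B0 <- 2     # hypothetical intercept
B1 <- 0.5   # hypothetical slope
E  <- rnorm(n, 0, 1)

x_normal <- rnorm(n, 5, 3)                 # X ~ N(5, 9), as in the question
x_grid   <- seq(-4, 14, length.out = n)    # evenly spaced X (assumed range)

y_normal <- B0 + B1 * x_normal + E
y_grid   <- B0 + B1 * x_grid + E

fit_normal <- lm(y_normal ~ x_normal)
fit_grid   <- lm(y_grid ~ x_grid)

# Largest leverage in each design.
max_lev_normal <- max(hatvalues(fit_normal))
max_lev_grid   <- max(hatvalues(fit_grid))
```

For an evenly spaced design the largest leverage is fixed at 1/n + 3(n - 1)/(n(n + 1)) whatever the range, while for normally distributed X's it varies from sample to sample and is typically larger. Lower leverage does not by itself guarantee that every Cook's distance falls below 4/n, since the residuals still matter.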

Generate random numbers from an exponential distribution

Using R, I want to generate 100 random numbers from an exponential distribution with a mean of 50. I want to store these numbers in a vector. I think I did it correctly, but I cannot find anything on the internet to verify my code. Here is my code:
vector <- rexp(100,50)
Solution: the second argument of rexp is the rate, which is 1/mean, so a mean of 50 requires a rate of 1/50:
vector <- rexp(100, 1/50)
(answered in comments by #SeverinPappadeux)
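A quick way to verify the parameterization is to draw a large sample and look at its mean. The sketch below uses the names x_right and x_wrong (chosen here for illustration) and a larger sample than the 100 values asked for, so that the sample mean is stable.

```r
set.seed(1)

# rate = 1/50 gives a mean of 50.
x_right <- rexp(1e5, rate = 1 / 50)
mean_right <- mean(x_right)

# rate = 50, as in the original attempt, gives a mean of 1/50 = 0.02.
x_wrong <- rexp(1e5, rate = 50)
mean_wrong <- mean(x_wrong)
```

Naming the result vector also works, but it shadows the name of the base function vector(), so a different name such as x is usually easier to read.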
