Using R, I want to generate 100 random numbers from an exponential distribution with a mean of 50. I want to store these numbers in a vector. I think I did it correctly, but I cannot find anything on the internet to verify my code. Here is my code:
vector <- rexp(100,50)
Solution:
vector <- rexp(100, 1/50)
(answered in the comments by #SeverinPappadeux)
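A quick sanity check (the variable name x is arbitrary): rexp() takes the rate, which is 1/mean, so the sample mean of a large draw should sit near 50.
set.seed(1)                   # optional, for reproducibility
x <- rexp(1e6, rate = 1/50)   # rate = 1/mean
mean(x)                       # should be close to 50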
Related
I want to generate 1000 random variables coming from different normal distributions. I use the function "rmvnorm" for that, and in a small setting it is easily done, but I have no idea how to automate it, especially for the sigma matrix (I want no correlation between the Xs). I don't really care about their means or their standard deviations. I was thinking of using a loop (e.g. increase the mean by A and the variance by B), but I want something more random and have no idea how to do that. Again, writing down a 1000-dimensional matrix by hand is not smart (with the condition that the off-diagonal elements are 0).
I have searched online, but I am probably not using the right words, so I apologize if this has already been asked and answered.
Thanks!
You can pass equal-length vectors for the parameters of rnorm. The first value returned will be a random draw from a normal distribution with a mean equal to the first value in the mean vector and sd equal to the first value in the sd vector:
rnorm(1e3, 1:1e3, 1:1e3)
Not sure what is meant by "I want something more random", but you can use random values for the mean and sd vectors:
rnorm(1e3, runif(1e3)*1e3, 1/rgamma(1e3, 10, 20))
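If you do want to stay with rmvnorm (from the mvtnorm package), the zero-correlation case only needs a diagonal sigma, which diag() can build so you never write the 1000 x 1000 matrix by hand. A rough sketch, with arbitrary random means and sds:
library(mvtnorm)
k <- 1e3                      # number of variables
means <- runif(k) * 1e3       # random means
sds <- runif(k, 0.5, 10)      # random standard deviations
x <- rmvnorm(1, mean = means, sigma = diag(sds^2))  # one draw of 1000 uncorrelated normals
For independent variables, though, the vectorized rnorm call above is simpler and faster.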
I have a txt file with numbers that looks like this (but with 100 numbers) -
[1] 7.1652348 5.6665965 4.4757553 4.8497086 15.2276296 -0.5730937
[7] 4.9798067 2.7396933 5.1468304 10.1221489 9.0165661 65.7118194
[13] 5.5205704 6.3067488 8.6777177 5.2528503 3.5039562 4.2477401
[19] 11.4137624 -48.1722034 -0.3764006 5.7647536 -27.3533138 4.0968204
I need to estimate the MLE of the theta parameter for this distribution -
(the density f(x|theta) was given as an image in the original post)
and I need to estimate theta from a sample of 1000 observations drawn with replacement, save the sample, and plot a histogram.
How can I estimate theta from my sample? I have no information about a normal distribution.
I wrote something like this -
data<-read.table(file.choose(), header = TRUE, sep= "")
B <- 1000
sample.means <- numeric(data)
sample.sd <- numeric(data)
for (i in 1:B) {
  MySample <- sample(data, length(data), replace = TRUE)
  sample.means <- c(sample.means, mean(MySample))
  sample.sd <- c(sample.sd, sd(MySample))
}
sd(sample.sd)
but it doesn't work.
This question incorporates several different ones, so let's tackle them step by step.
First, you will need to draw a random sample from your population (with replacement). Assuming your 100 population observations sit in a vector named pop,
rs <- sample(pop, 1000, replace = TRUE)
gives you your vector of random samples. If you want to save it, you can write it to disk in several formats, so I'll just point to a related question (How to Export/Import Vectors in R?).
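For example, two common options (the file names here are just placeholders):
saveRDS(rs, "my_sample.rds")   # R-native format; read back with readRDS("my_sample.rds")
write.table(rs, "my_sample.txt", row.names = FALSE, col.names = FALSE)   # plain text, one value per line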
In a second step, you can use the mle()-function of the stats4-package (https://stat.ethz.ch/R-manual/R-devel/library/stats4/html/mle.html) and specify the objective function explicitly.
However, the second part of your question is more of a statistical/conceptual question than an R-related one, IMO.
Try to understand what MLE actually does. You do not need normally distributed variables. The idea behind MLE is to choose theta in such a way that, under the resulting distribution, the observed random sample is the most probable. Check https://en.wikipedia.org/wiki/Maximum_likelihood_estimation for more details, or some YouTube videos if you'd like a more intuitive approach.
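In symbols, the generic definition (not specific to the density in your image) is
$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \prod_{i=1}^{n} f(x_i \mid \theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log f(x_i \mid \theta).$$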
I assume that, in the description of your task, it is stated that f(x|theta) is the conditional joint density function and that the observations x are iid?
What you want to do in this case is select theta such that the squared difference between the observations x and the parameter theta is minimized.
For your statistical understanding: in such cases it makes sense to work with the log of the likelihood rather than dealing with the non-linear function directly.
Minimizing the squared difference is equivalent to maximizing the log-transformed function, since the sum enters with a negative sign (equivalently, the product was in the denominator) and the log, as well as the +1, are monotone transformations that do not change where the maximum is attained.
This leaves you with a maximization problem and the corresponding first-order condition (both were shown as equation images in the original answer).
Obviously, you would also have to check that you are actually dealing with a maximum via the second-order condition but I'll omit that at this stage for simplicity.
The algorithm in R does nothing other than solve this maximization problem.
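For illustration only, here is a minimal sketch of an mle() call from stats4, using the squared-difference objective described above as the quantity to minimize (the true density from the question's image is not reproduced here, so treat the objective, and the name negobj, as assumptions; the reported standard errors will not be meaningful for a stand-in objective):
library(stats4)
# rs is the resampled vector from the first step
negobj <- function(theta) sum((rs - theta)^2)       # assumed stand-in for the negative log-likelihood
fit <- mle(negobj, start = list(theta = mean(rs)))  # numerical minimization
coef(fit)   # for this particular objective, the estimate coincides with mean(rs)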
Hope this helps your understanding. Maybe some smarter people can give some additional input.
Using the code below, we get 100,000 random values, normally distributed around the mean; the values can be positive or negative. I am dealing with a problem where negative simulated values make no sense. How can I generate a normal distribution with only positive values? Or is there some other appropriate way to handle this?
runs <- 100000
sims <- rnorm(runs,mean=30,sd=30)
If you want to get rid of the negatives you can do this after your code. This will give you a kind of truncated normal distribution if that's what you're after.
sims <- sims[sims>0]
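Note that dropping the negatives leaves you with fewer than runs values, and the retained values no longer have mean 30. A minimal base-R sketch, assuming you want to keep the sample size fixed at runs, is to keep redrawing until enough positive values have accumulated:
runs <- 100000
sims <- rnorm(runs, mean = 30, sd = 30)
sims <- sims[sims > 0]
while (length(sims) < runs) {
  extra <- rnorm(runs, mean = 30, sd = 30)   # draw another batch
  sims <- c(sims, extra[extra > 0])          # keep only the positive draws
}
sims <- sims[1:runs]                         # trim to exactly `runs` values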
I'm looking to fit a weighted distribution to a data set I have.
I'm currently using the fitdist command but don't know if there is a way to add weighting.
library(fitdistrplus)
df<-data.frame(value=rlnorm(100,1,0.5),weight=runif(100,0,2))
#This is what I'm doing but not really what I want
fit_df<-fitdist(df$value,"lnorm")
#How to do this
fit_df_weighted<-fitdist(df$value,"lnorm",weight=df$weight)
I'm sure this has been answered before somewhere but I've looked and can't find anything.
thanks in advance,
Gordon
Perhaps you could use the rep() function and a quick loop to approximate the distribution.
You could multiply each weight by, say, 10,000, round the result, and use it to indicate how many copies of the corresponding value you need in your vector. After running a quick loop, you could then run the vector through the fitdist() function.
df$scaled_weight <- round(df$weight*10000,0)
my_vector <- vector()
## quick loop
for (i in 1:nrow(df)) {
  values <- rep(df$value[i], df$scaled_weight[i])
  my_vector <- c(my_vector, values)
}
## find parameters
fit_df_weighted <- fitdist(my_vector,"lnorm")
The standard errors would be rubbish, but the estimated parameters should be sufficient.
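As a side note, the loop isn't strictly required: rep() is vectorized over both arguments, so the same expanded vector can be built in one call (identical result, just faster for large data):
my_vector <- rep(df$value, df$scaled_weight)
fit_df_weighted <- fitdist(my_vector, "lnorm")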
I'm trying to generate random numbers with a multivariate skew normal distribution using the rmsn command from the sn package in R. Ideally, I would like to get three columns of numbers with specified variances and covariances, while having one column strongly skewed. But I'm struggling to achieve both goals simultaneously.
The post at skew normal distribution was related and useful (and the source of some of the code below), but hasn't completely clarified the issue for me.
I've been trying:
a <- c(5, 0, 0) # set shape parameter
s <- diag(3) # create variance-covariance matrix
w <- sqrt(1/(1-((2*(a^2)/(1 + a^2))/pi))) # determine scale parameter to get sd of 1
xi <- w*a/sqrt(1 + a^2)*sqrt(2/pi) # determine location parameter to get mean of 0
apply(rmsn(n=1000, xi=c(xi), Omega=s, alpha=a), 2, sd)
colMeans(rmsn(n=1000, xi=c(xi), Omega=s, alpha=a))
The column means and SDs are correct for the second and third columns (which have no skew) but not for the first (which does). Can anyone clarify where my code above, or my thinking, has gone wrong? I may be misunderstanding how to use rmsn, or its output. Any assistance would be appreciated.
The location is not the mean (except when there is no skew). From the documentation:
Notice that the location vector ‘xi’ does not represent the mean vector of the distribution (which in fact may not even exist if ‘df <= 1’), and similarly ‘Omega’ is not the covariance matrix of the distribution.
And you may want to replace Omega = s with an Omega built from w. Also note that Omega is supposed to be a variance (scale) matrix, so there should be no square root when constructing it, i.e. its diagonal should hold w^2 rather than w.
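Putting those suggestions together, here is a rough sketch (not from the original answer) based on the standard skew-normal moment formulas; the per-column formula for delta used below is only valid here because alpha has a single non-zero entry and Omega is diagonal:
library(sn)
a <- c(5, 0, 0)                               # shape (skew) parameters
delta <- a / sqrt(1 + a^2)                    # per-column delta
omega2 <- 1 / (1 - 2 * delta^2 / pi)          # scale variances chosen so each column has sd 1
xi <- -sqrt(omega2) * delta * sqrt(2 / pi)    # locations chosen so each column has mean 0
x <- rmsn(n = 1e5, xi = xi, Omega = diag(omega2), alpha = a)
colMeans(x)       # all close to 0
apply(x, 2, sd)   # all close to 1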