R: runif produces NaN

I want to generate some data which correspond to a quantile function. But the data need a min and a max value.
set.seed(30)
a1<-950 ; a2<-0; a3<-2.48; a4<-1.92
invcdf<-function (x)(a1+a2*a3*((-log(x))^(1/a4)))/(a3*((-log(x))^(1/a4))+1)
t<-invcdf(runif(2000,min=80,max=800))
When I use min and max in the runif function NaN's are produced.
How can I improve this code to avoid NaN's? I can't change the parameters.

Since you don't explain exactly what you are trying to do (which distribution are you trying to sample?), all I can do is interpret this as an attempt to generate random variates from some distribution using its inverse CDF. Because I don't know which distribution it is, I can't comment on whether your implementation of it is correct.
However, when you use this method, you should know that a CDF takes values between 0 and 1: it is a cumulative probability, starting at 0 and approaching 1 in the limit.
The inverse of that function then only makes sense if you feed it values between 0 and 1, and that is where the error lies. runif(2000,min=80,max=800) generates random values between 80 and 800, way outside the (0,1) interval. For any x > 1, -log(x) is negative, and raising a negative number to the non-integer power 1/a4 gives NaN.
If you instead do this:
t <- invcdf(runif(2000))
we do get results, which happen to lie mostly between 80 and 800.
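If the values really must stay inside [80, 800], one possible approach is to restrict the uniform draws to the probabilities that map onto that range. The sketch below is only an illustration under assumptions: the function cdf is not from the question, it is obtained by solving invcdf(u) = x for u, and it is only valid for a2 < x < a1.

```r
a1 <- 950; a2 <- 0; a3 <- 2.48; a4 <- 1.92

# Quantile function from the question (expects u in (0, 1))
invcdf <- function(u) {
  y <- (-log(u))^(1 / a4)
  (a1 + a2 * a3 * y) / (a3 * y + 1)
}

# Assumed CDF, derived by inverting invcdf; valid for a2 < x < a1
cdf <- function(x) {
  y <- (a1 - x) / (a3 * (x - a2))
  exp(-y^a4)
}

set.seed(30)
# Draw probabilities only between F(80) and F(800), then map them back
u <- runif(2000, min = cdf(80), max = cdf(800))
x <- invcdf(u)
range(x)
```

Because invcdf is increasing in u, every draw lands between 80 and 800, and the parameters a1 to a4 stay unchanged.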

Related

R: How to generate several random variables at once

I want to generate 1000 random variables coming from different normal distributions. I use the function "rmvnorm" for that and in a small setting, it is easily done but I have no idea how to automate it, especially for the sigma matrix (I want no correlation between the Xs). I don't really care about their means or their standard deviation. I was thinking of using a loop (e.g. increase by A the mean and by B the variance) but I want something more random and have no idea how I can do that. Again, writing down a matrix of 1000 dimension is not smart (with the condition that the off-diag elements are 0).
I have searched online but I am probably not using the right words, so I apologize if this was already asked and answered.
Thanks!
You can pass equal-length vectors for the parameters of rnorm. The first value returned will be a random draw from a normal distribution with a mean equal to the first value in the mean vector and sd equal to the first value in the sd vector:
rnorm(1e3, 1:1e3, 1:1e3)
Not sure what is meant by "I want something more random", but you can use random values for the mean and sd vectors:
rnorm(1e3, runif(1e3)*1e3, 1/rgamma(1e3, 10, 20))
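If several observations per variable are needed, the same vectorization can fill a whole matrix. This is a minimal sketch, assuming 500 rows and randomly chosen means and standard deviations (n, mu and sigma are illustrative names, not from the question). With zero correlation it should give the same kind of result as rmvnorm with a diagonal sigma matrix, without ever building the 1000 x 1000 matrix.

```r
set.seed(1)
n <- 500                                  # observations per variable (assumed)
p <- 1000                                 # number of variables
mu    <- runif(p, min = -10, max = 10)    # random means (assumed range)
sigma <- runif(p, min = 0.5, max = 3)     # random standard deviations (assumed range)

# matrix() fills column by column, so repeating each parameter n times
# gives column j the mean mu[j] and the standard deviation sigma[j]
X <- matrix(rnorm(n * p, mean = rep(mu, each = n), sd = rep(sigma, each = n)),
            nrow = n, ncol = p)
dim(X)
```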

How can I create a normal distributed set of data in R?

I'm a newbie in statistics and I'm studying R.
I decided to do this exercise to practice some analysis with an original dataset.
This is the issue: I want to create a dataset of, let's say, 100 subjects, and for each one of them I have a test score.
This test score has a range that goes from 0 to 70 and the mean score is 48 (and it's improbable that someone scores 0).
First I tried to create the set with x <- round(runif(100, min=0, max=70)) , but then I found out, using plot(x), that the values were not normally distributed.
So I searched for another R command and found this, but I couldn't set the min/max:
ex1 <- round(rnorm(100, mean=48 , sd=5))
I really can't understand what I have to do!
I would like to write a function that gives me a set of normally distributed data, in a range of 0-70, with a mean of 48 and a not so big standard deviation, in order to do some t-tests later...
Any help?
Thanks a lot in advance guys
The normal distribution, by definition, does not have a min or max. If you go more than a few standard deviations from the mean, the probability density is very small, but not 0. You can truncate a normal distribution, chopping off the tails. Here, I use pmin and pmax to set any values below 0 to 0, and any values above 70 to 70:
ex1 <- round(rnorm(100, mean=48 , sd=5))
ex1 <- pmin(ex1, 70)
ex1 <- pmax(ex1, 0)
You can calculate the probability of an individual observation being below or above a certain point using pnorm. For your mean of 48 and SD of 5, the probability an individual observation is less than 0 is very small:
pnorm(0, mean = 48, sd = 5)
# [1] 3.997221e-22
This probability is so small that the truncation step is unnecessary in most applications. But if you started experimenting with bigger standard deviations, or mean values closer to the bounds, it could become necessary.
This method of truncation is simple, but it is a bit of a hack. If you truncated a distribution to be within 1 SD of the mean using this method, you would end up with spikes at the upper and lower bounds that are even higher than the density at the mean! But it should work well enough for less extreme applications. A more robust method might be to draw more samples than you need, and keep the first n samples that fall within your bounds. If you really care to do things right, there are packages that implement truncated normal distributions.
(Because the normal distribution is symmetric, and 70 is closer to your mean than 0, the probability of an observation > 70 is larger than the probability of one < 0, but at roughly 5e-06 it is still negligible.)
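The "draw more than you need and keep the first n that fall within the bounds" idea could look roughly like this. It is a sketch only: rnorm_bounded is a hypothetical helper name, not a function from base R or any package.

```r
# Hypothetical helper: rejection sampling from a normal restricted to [lower, upper]
rnorm_bounded <- function(n, mean, sd, lower, upper) {
  out <- numeric(0)
  while (length(out) < n) {
    draw <- rnorm(2 * n, mean = mean, sd = sd)              # oversample
    out  <- c(out, draw[draw >= lower & draw <= upper])     # keep in-range draws
  }
  out[seq_len(n)]                                           # first n accepted values
}

set.seed(1)
scores <- round(rnorm_bounded(100, mean = 48, sd = 5, lower = 0, upper = 70))
range(scores)
```

Unlike clamping with pmin and pmax, this does not pile up observations at the bounds, although the mean and SD of the result drift away from the requested values when the bounds cut off a noticeable part of the distribution.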

Converting Optim to constrOptim in R

I am trying to determine the weights of 9 metrics which will return the highest accuracy ratio. Since they are weights, the values need to sum to 1 and lie between 0 and 1. I am currently using the optim function, but due to the constraints, I think I need to switch to constrOptim. I was wondering what the best way to do this is. Below I have included the code I am currently using. x.matrix is a 20,000 by 9 matrix of values ranked between 1 and 10.
pars<-c(w1=(1/9),w2=(1/9),w3=(1/9),w4=(1/9),w5=(1/9),w6=(1/9),w7=(1/9),w8=(1/9),w9=(1/9))
OptPars<-function(pars){-(rcorr.cens(x.matrix%*%pars,f)["Dxy"])}
opt<-optim(pars,OptPars)
Say you have values x on the range (-Inf, Inf) and you need values p in the range [0,1] that sum to 1, you can do the following transformation
p <- exp(x)/sum(exp(x))
If you do that transformation inside your optimization function and apply the same transformation to the best set of parameters, you should get what you want.
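As an illustration of that recipe, here is a minimal sketch with a made-up objective. The names softmax, target and loss are assumptions for this example; in the real problem loss would be the negated Dxy computed from x.matrix.

```r
# Map unconstrained values to weights in [0, 1] that sum to 1
softmax <- function(x) {
  z <- exp(x - max(x))   # subtracting max(x) avoids overflow, result is unchanged
  z / sum(z)
}

# Hypothetical stand-in objective: squared distance to a known weight vector
target <- c(0.5, 0.3, 0.2)
loss   <- function(p) sum((p - target)^2)

# optim searches over unconstrained x; the objective only ever sees valid weights
objective <- function(x) loss(softmax(x))

fit     <- optim(par = rep(0, 3), fn = objective, method = "BFGS")
weights <- softmax(fit$par)   # transform the optimum back into weights
weights
sum(weights)
```

With this reparameterization plain optim is enough, and constrOptim is not required.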

Use the cumulative distribution function of Weibull in R

I have to simulate a system's failure times. To do so I have to use the Weibull distribution with a "decreasing hazard rate" and a shape of "0.7-0.8". I have to generate a file with 100 results for the function that uses random numbers from 0 to 1.
So I've been searching a bit and I found this R function:
pweibull(q, shape, scale = 1, lower.tail = TRUE, log.p = FALSE)
There are some others (rweibull, qweibull...) but I think this is the one that I have to use, since it is the cumulative distribution one, as the exercise statement says. The problem is that I am new to R and I don't really know what parameters I have to pass to this function.
I'm guessing shape should be 0.7-0.8, and scale 1. For the q parameter, should I create a random vector of 100 numbers with values from 0 to 1? If so, any tip on how to do it? Also, any tip on how to export the resulting data to a file?
I'm not sure what the question is, but if you want to generate 100 values drawn from Weibull distribution with shape parameter of 0.75 use rweibull(100, 0.75).
If you want the cumulative probability P(X <= x) at each of those values, use pweibull(rweibull(100, 0.75), 0.75).
You should also be aware that there is a general no-homework rule on these sites.
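If the exercise insists on starting from random numbers between 0 and 1, one reading is inverse-transform sampling: feed uniform draws to the quantile function qweibull, which is the inverse of pweibull. The sketch below assumes shape 0.75 and scale 1, and writes to a temporary file (out_file is a placeholder path, not from the question).

```r
set.seed(42)
u <- runif(100)                                      # 100 random numbers in (0, 1)
fail_times <- qweibull(u, shape = 0.75, scale = 1)   # inverse CDF applied to u

# Export to a file; tempfile() is only a placeholder location
out_file <- tempfile(fileext = ".csv")
write.csv(data.frame(fail_time = fail_times), out_file, row.names = FALSE)
head(read.csv(out_file))
```

A shape parameter below 1 is what gives the Weibull distribution a decreasing hazard rate.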

multivariate skew normal in R

I'm trying to generate random numbers with a multivariate skew normal distribution using the rmsn command from the sn package in R. I would like, ideally, to be able to get three columns of numbers with specified variances and covariances, while having one column strongly skewed. But I'm struggling to achieve both goals simultaneously.
The post at skew normal distribution was related and useful (and the source of some of the code below), but hasn't completely clarified the issue for me.
I've been trying:
a <- c(5, 0, 0) # set shape parameter
s <- diag(3) # create variance-covariance matrix
w <- sqrt(1/(1-((2*(a^2)/(1 + a^2))/pi))) # determine scale parameter to get sd of 1
xi <- w*a/sqrt(1 + a^2)*sqrt(2/pi) # determine location parameter to get mean of 0
apply(rmsn(n=1000, xi=c(xi), Omega=s, alpha=a), 2, sd)
colMeans(rmsn(n=1000, xi=c(xi), Omega=s, alpha=a))
The column means and SDs are correct for the second and third columns (which have no skew) but not the first (which does). Can anyone clarify where my code above, or my thinking, has gone wrong? I may be misunderstanding how to use rmsn, or the output. Any assistance would be appreciated.
The location is not the mean (except when there is no skew). From the documentation:
Notice that the location vector ‘xi’ does not represent the mean
vector of the distribution (which in fact may not even exist if ‘df <=
1’), and similarly ‘Omega’ is not the covariance matrix of the
distribution
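You may also want to build Omega from w instead of using Omega=s. Since w is a vector, it has to be passed as a matrix, e.g. diag(w).
And Omega is supposed to be a variance matrix, so there should be no square root in the definition of w.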
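To make the relation between the parameters and the moments concrete, here is a sketch that uses the standard skew-normal moment formulas (mean = xi + omega * delta * sqrt(2/pi), variance = omega^2 * (1 - 2 * delta^2 / pi)) rather than anything specific to this answer. It assumes a diagonal Omega, as in the question. Note that under these formulas the location needs a minus sign to give a mean of 0. The simulation uses a base-R stochastic representation of the first column instead of rmsn, so it runs without the sn package.

```r
alpha  <- c(5, 0, 0)                       # shape vector from the question
delta  <- alpha / sqrt(1 + sum(alpha^2))   # valid here because Omega is diagonal
omega2 <- 1 / (1 - 2 * delta^2 / pi)       # diagonal of Omega giving unit variances
omega  <- sqrt(omega2)
xi     <- -omega * delta * sqrt(2 / pi)    # location giving zero means

# Moments implied by these parameters
mean_theory <- xi + omega * delta * sqrt(2 / pi)
var_theory  <- omega2 * (1 - 2 * delta^2 / pi)

# Base-R simulation of the skewed first column only
set.seed(123)
n  <- 1e5
z1 <- delta[1] * abs(rnorm(n)) + sqrt(1 - delta[1]^2) * rnorm(n)
x1 <- xi[1] + omega[1] * z1
c(mean = mean(x1), sd = sd(x1))

# With the sn package installed, the matching call would be (not run here):
# rmsn(n = 1000, xi = xi, Omega = diag(omega2), alpha = alpha)
```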
