Need random sample: Weibull distribution with specific inter-arrival times

I am using R, and would like to generate a number of observations using rweibull(n, shape, scale = 1).
I have the arrival rate (i.e. 1/interarrival time), but I do not know how to use it in the rweibull function.

The scale parameter is the one to work with. Setting the shape parameter to 1 turns the Weibull into an exponential distribution, and in that case the scale parameter is 1/rate, i.e. the mean inter-arrival time:
interT = 8
plot( density(rexp(100, rate=1/interT)) )
with( density(rweibull(100, scale=interT, shape=1)),
lines(x,y, col="red"))
(But if you are using the survival package you need to be aware that the parameters are different.)
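If the shape is not 1, the scale no longer equals the mean inter-arrival time, because the mean of a Weibull is scale * gamma(1 + 1/shape). A minimal sketch of solving for the scale, assuming a hypothetical shape of 1.5 (that value is not from the question):

```r
set.seed(42)
interT <- 8      # desired mean inter-arrival time, as in the answer above
shape  <- 1.5    # hypothetical shape, chosen only for illustration

# Mean of a Weibull is scale * gamma(1 + 1/shape), so solve for scale
scale <- interT / gamma(1 + 1 / shape)

x <- rweibull(1e5, shape = shape, scale = scale)
mean(x)          # should be close to interT
```

With shape = 1 this reduces to scale = interT, which matches the exponential case shown above.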

Related

Generate beta-binomial distribution from existing vector

Is it possible to/how can I generate a beta-binomial distribution from an existing vector?
My ultimate goal is to generate a beta-binomial distribution from the below data and then obtain the 95% confidence interval for this distribution.
My data are body condition scores recorded by a veterinarian. The values of body condition range from 0-5 in increments of 0.5. It has been suggested to me here that my data follow a beta-binomial distribution, discrete values with a restricted range.
set1 <- as.data.frame(c(3,3,2.5,2.5,4.5,3,2,4,3,3.5,3.5,2.5,3,3,3.5,3,3,4,3.5,3.5,4,3.5,3.5,4,3.5))
colnames(set1) <- "numbers"
I see that there are multiple functions which appear to be able to do this, betabinomial() in VGAM and rbetabinom() in emdbook, but my stats and coding knowledge is not yet sufficient to be able to understand and implement the instructions provided on the function help pages, at least not in a way that has been helpful for my intended purpose yet.

We can look at the distribution of your scores (doubled so that they become integers from 0 to 10); the y-axis is the observed proportion:
x1 = set1$numbers*2
h = hist(x1,breaks=seq(0,10))
bp = barplot(h$counts/length(x1),names.arg=(h$mids+0.5)/2,ylim=c(0,0.35))
You can try to fit it, but you have too few data points to estimate the 3 parameters needed for a beta-binomial. Hence I fix the probability so that the mean is the mean of your scores, and looking at the distribution above it seems ok:
library(bbmle)
library(emdbook)
library(MASS)
mtmp <- function(prob,size,theta) {
-sum(dbetabinom(x1,prob,size,theta,log=TRUE))
}
m0 <- mle2(mtmp,start=list(theta=100),
data=list(size=10,prob=mean(x1)/10),control=list(maxit=1000))
THETA=coef(m0)[1]
We can also use a normal distribution:
normal_fit = fitdistr(x1,"normal")
MEAN=normal_fit$estimate[1]
SD=normal_fit$estimate[2]
Plot both of them:
lines(bp[,1],dbetabinom(1:10,size=10,prob=mean(x1)/10,theta=THETA),
col="blue",lwd=2)
lines(bp[,1],dnorm(1:10,MEAN,SD),col="orange",lwd=2)
legend("topleft",c("normal","betabinomial"),fill=c("orange","blue"))
I think you are actually ok with using a normal approximation, and in this case the estimates (on the doubled scale of x1) are:
normal_fit$estimate
    mean       sd
6.560000 1.134196
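The question also asked for a 95% interval. A minimal base-R sketch, assuming the normal approximation above is acceptable: it reproduces the two estimates without MASS (fitdistr's sd is the maximum-likelihood one, with n in the denominator) and reads off the central 95% range of the fitted distribution. Note that this is a range for individual scores, not a confidence interval for the mean.

```r
scores <- c(3,3,2.5,2.5,4.5,3,2,4,3,3.5,3.5,2.5,3,3,3.5,3,3,4,3.5,3.5,4,3.5,3.5,4,3.5)
x1 <- scores * 2                        # doubled scale, as in the answer

MEAN <- mean(x1)
SD   <- sqrt(mean((x1 - MEAN)^2))       # ML estimate (divides by n, not n - 1)

# Central 95% range of the fitted normal, on the doubled scale
interval_x1 <- qnorm(c(0.025, 0.975), mean = MEAN, sd = SD)

# Convert back to the original 0-5 body condition scale
interval_score <- interval_x1 / 2
interval_score
```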

R - Gamma Cumulative Distribution Function

I want to calculate the Gamma CDF for an array of data that I have. I have calculated the alpha and beta parameters, but I am not sure how to calculate the CDF in R. (Is there something like Matlab's gamcdf?)
I have seen some people use fitdistr or pgamma, but I do not understand how to pass in the alpha and beta values, or whether I need them at all.
Thanks.

A gamma distribution is defined by the two parameters, and given those two parameters, you can calculate the cdf for an array of values using pgamma.
# Let's make a vector
x = seq(0, 3, .01)
# Now define the parameters of your gamma distribution
shape = 1
rate = 2
# Now calculate points on the cdf
cdf = pgamma(x, shape, rate)
# Shown plotted here
plot(x,cdf)
Note that the Gamma has different ways it can be parameterized. Check ?pgamma for specifics to ensure your 2 parameters match.
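To make the parameterization point concrete: pgamma accepts either a rate or a scale, with scale = 1/rate. Matlab's gamcdf(x, a, b) takes a shape a and a scale b, so if your alpha and beta follow that convention, beta goes in as scale. A short sketch with hypothetical values for alpha and beta:

```r
x <- seq(0, 3, 0.01)
alpha <- 2      # hypothetical shape
beta  <- 0.5    # hypothetical scale (Matlab-style second parameter)

# Same CDF, written with scale and with rate = 1/scale
cdf_scale <- pgamma(x, shape = alpha, scale = beta)
cdf_rate  <- pgamma(x, shape = alpha, rate = 1 / beta)

all.equal(cdf_scale, cdf_rate)
```

Passing beta positionally, as in pgamma(x, alpha, beta), would treat it as a rate, which gives a different curve unless beta is 1.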

Fitting Model Parameters To Histogram Data in R

So I've got a data set that I want to parameterise but it is not a Gaussian distribution so I can't parameterise it in terms of its mean and standard deviation. I want to fit a distribution function with a set of parameters and extract the values of the parameters (e.g. a and b) that give the best fit. I want to do this exactly the same as the
lm(y~f(x;a,b))
except that I don't have a y, I have a distribution of different x values.
Here's an example. If I assume that the data follows a Gumbel, double exponential, distribution
f(x;u,b) = (1/b) * exp(-(z + exp(-z))), where z = (x-u)/b:
library(QRM)      # provides rGumbel
library(ggplot2)  # provides qplot
rg <- rGumbel(1000) #default parameters are 0 and 1 for u and b
#then plot its distribution
qplot(rg)
#should give a nice skewed distribution
If I assume that I don't know the distribution parameters and I want to perform a best fit of the probability density function to the observed frequency data, how do I go about showing that the best fit is (in this test case), u = 0 and b = 1?
I don't want code that simply maps the function onto the plot graphically, although that would be a nice aside. I want a method that I can repeatedly use to extract variables from the function to compare to others. GGPlot / qplot was used as it quickly shows the distribution for anyone wanting to test the code. I prefer to use it but I can use other packages if they are easier.
Note: This seems to me like a really obvious thing to have been asked before but I can't find one that relates to histogram data (which again seems strange) so if there's another tutorial I'd really like to see it.
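One possible approach, sketched here under assumptions rather than taken from the thread, is maximum likelihood: write the negative log-likelihood of the Gumbel density given above and minimise it over u and b. The sketch uses only base R, so the sample is drawn by inverse transform instead of with QRM's rGumbel, and the function name negloglik is made up for illustration.

```r
set.seed(1)
n <- 1000
# Gumbel(u = 0, b = 1) draws by inverse transform: x = u - b * log(-log(U))
rg <- 0 - 1 * log(-log(runif(n)))

# Negative log-likelihood of f(x;u,b) = (1/b) * exp(-(z + exp(-z)));
# b is optimised on the log scale so that it stays positive
negloglik <- function(par, x) {
  u <- par[1]
  b <- exp(par[2])
  z <- (x - u) / b
  sum(log(b) + z + exp(-z))
}

# Start from the sample mean and standard deviation
fit <- optim(c(mean(rg), log(sd(rg))), negloglik, x = rg)
est <- c(u = fit$par[1], b = exp(fit$par[2]))
est   # should land near u = 0 and b = 1
```

The same function can be reused on any numeric vector, which gives parameter estimates that can be compared across data sets.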

Use the cumulative distribution function of Weibull in R

I have to simulate a system's failure times. To do so I have to use the Weibull distribution with a "decreasing hazard rate" and a shape of "0.7-0.8". I have to generate a file with 100 results for the function that uses random numbers from 0 to 1.
So I've been searching a bit and I found this R function:
pweibull(q, shape, scale = 1, lower.tail = T, log.p = F)
There are some others (rweibull, qweibull...) but I think this is the one that I have to use, since it is the cumulative distribution function, as the exercise statement says. The problem is that I am new to R and I don't really know what parameters I have to pass to this function.
I'm guessing shape should be 0.7-0.8, and scale 1. For q parameter, should I create a random vector of 100 numbers with 0 to 1 values? If so, any tip of how to do it? Also any tip on how to export the resultant data to a file?

I'm not sure what the question is, but if you want to generate 100 values drawn from Weibull distribution with shape parameter of 0.75 use rweibull(100, 0.75).
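Note that pweibull does not generate values; it returns the cumulative probability P(X <= q) for each value you pass in. For example, pweibull(rweibull(100, 0.75), 0.75) gives, for each simulated value, the probability that a draw is less than or equal to it.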
You should also be aware that there is a general no-homework rule on these sites.
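If the exercise really does require starting from random numbers between 0 and 1, one reading of it is inverse transform sampling: draw uniforms with runif and push them through the inverse CDF, qweibull. A minimal sketch, assuming shape 0.75 and scale 1; the output file name is hypothetical:

```r
set.seed(123)
u <- runif(100)                                     # 100 uniform numbers in (0, 1)
fail_times <- qweibull(u, shape = 0.75, scale = 1)  # inverse CDF turns them into Weibull draws

# Applying the CDF recovers the uniforms, which is a quick sanity check
all.equal(pweibull(fail_times, shape = 0.75, scale = 1), u)

# Write one value per line; the path is a placeholder
out_file <- file.path(tempdir(), "fail_times.txt")
write.table(fail_times, out_file, row.names = FALSE, col.names = FALSE)
```

This produces values from the same distribution as rweibull(100, 0.75), just with the uniform step made explicit.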

Interpolate new values using a set of samples

I'm new to R. Having a set of samples along with the target, I want to fit a numeric function to solve the target of new samples. My sample is time in seconds indicating the duration of a user's staying at this place:
>b <- c(101,25711,13451,19442,26,3083,133,184,4403,9713,6918,10056,12201,10624,14984,5241,
+21619,44285,3262,2115,1822,11291,3243,12989,3607,12882,4462,11553,7596,2926,12955,
+1832,3539,6897,13571,16668,813,1824,10304,2508,1493,4407,7820,507,15866,7442,7738,
+5705,2869,10137,11276,12884,11298,...)
Firstly, I convert them to hours dividing by 3600, and I want to fit a function as pdf of the duration:
> b <- b/3600
> hist(b,xlim=c(0,13),prob=TRUE,breaks=seq(0,24,by=0.5))
> lines(density(b), col="red")
I want to fit the red line on the figure, and interpolate new values to find the probability of the specific duration on this place say p(duration = 1.5hours).
Thanks for your attention!

As suggested above, you can fit a distribution with fitdistr in MASS package.
If you use a continuous distribution you will have the probability that the time is within an interval. If you use a discrete distribution, you may compute the probability of a certain time (in hours).
For the continuous case, you can use a Gamma distribution: fitdistr(b, "Gamma") will give you the parameter estimates, and then you can use pgamma with those estimates and an interval.
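For the discrete case, you can use a Poisson distribution. A Poisson model needs non-negative integers, so round the durations to whole hours first: fitdistr(round(b), "Poisson"), and then use the dpois function with the estimate and the value you want.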
To decide which one to use, I'd just plot the pdf with the histogram and take a look.
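The continuous case can be sketched as follows. For a continuous distribution p(duration = 1.5 hours) is zero, so the sketch computes the probability of a small interval around 1.5 hours instead. To stay in base R it uses method-of-moments estimates as a stand-in for fitdistr(b, "Gamma"); the maximum-likelihood estimates from fitdistr could be dropped in the same way. Only the values visible in the question are used (the original vector is truncated), and the interval width is an arbitrary choice.

```r
# Durations in seconds: only the values visible in the question
b <- c(101,25711,13451,19442,26,3083,133,184,4403,9713,6918,10056,12201,10624,14984,5241,
       21619,44285,3262,2115,1822,11291,3243,12989,3607,12882,4462,11553,7596,2926,12955,
       1832,3539,6897,13571,16668,813,1824,10304,2508,1493,4407,7820,507,15866,7442,7738,
       5705,2869,10137,11276,12884,11298)
b <- b / 3600                      # convert to hours

# Method-of-moments Gamma estimates (stand-in for fitdistr's ML estimates)
m <- mean(b)
v <- var(b)
shape <- m^2 / v
rate  <- m / v

# Probability that a stay lasts between 1.25 and 1.75 hours
p_interval <- pgamma(1.75, shape = shape, rate = rate) -
              pgamma(1.25, shape = shape, rate = rate)
p_interval
```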