Different moments given by R using the same library

I'm using R with the moments library to generate a small dataset and compute the first four moments of my data:
Mean
Variance
Skewness
Kurtosis
The code is shown below. I set a random seed for the PRNG and generate 1000 data points from a normal distribution.
Then I print the four moments in two ways: first individually, then with the function all.moments().
library(moments)
set.seed(123)
x <- rnorm(1000, sd = 0.02)
print(mean(x))
print(var(x))
print(skewness(x))
print(kurtosis(x))
print(moments::all.moments(x, order.max = 4))
The outputs are shown below.
print(mean(x));
0.0003225573
print(var(x));
0.0003933836
print(skewness(x));
0.06529391
print(kurtosis(x));
2.925747
print(moments::all.moments(x, order.max = 4));
1.000000e+00 3.225573e-04 3.930942e-04 8.889998e-07 4.527577e-07
Note that the skewness and kurtosis differ between the two methods.
My question is: why do they give different results, and which result is correct?

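Note that the third and fourth moments returned by all.moments() are NOT the skewness and kurtosis. By default (central = FALSE), all.moments() returns the raw moments about zero for orders 0 to order.max, so the vector starts with the order-0 moment (always 1), and its third entry (order 2) is the mean of x^2, not the variance (var() also divides by n - 1). Skewness and kurtosis are standardized central moments, m3 / m2^(3/2) and m4 / m2^2, so they have to be calculated from the central moments afterwards.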
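A minimal base-R sketch of that calculation, using the same seed and data as the question. It assumes the population (divide-by-n) definitions, which are the ones moments::skewness() and moments::kurtosis() use; moments::all.moments(x, order.max = 4, central = TRUE) should return the same central moments for orders 0 to 4. The helper central_moment() is illustrative, not part of the moments package.

```r
# Illustrative helper: central moment of order k, divide-by-n (population) convention
central_moment <- function(x, k) mean((x - mean(x))^k)

set.seed(123)
x <- rnorm(1000, sd = 0.02)

m2 <- central_moment(x, 2)  # population variance (divides by n, not n - 1)
m3 <- central_moment(x, 3)
m4 <- central_moment(x, 4)

skew <- m3 / m2^(3 / 2)     # standardized third central moment
kurt <- m4 / m2^2           # standardized fourth central moment (not excess kurtosis)

print(c(skewness = skew, kurtosis = kurt))

# var() divides by n - 1, so it is slightly larger than m2
print(c(var = var(x), m2 = m2, rescaled_m2 = m2 * length(x) / (length(x) - 1)))
```

The skewness and kurtosis computed this way should agree with the individual skewness(x) and kurtosis(x) calls, not with the raw entries of all.moments().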

Related

How to generate a sample of artificial data with a particular variance?

I am trying to generate a dataset with the following properties:
sample size = 200
variance = 2
mean = 20
I have tried generating it with the rnorm() function, but it takes the standard deviation, not the variance, as its argument. I have also tried taking the square root to get the desired variance, but that doesn't work either.
How can I generate such a dataset with that mean and variance in RStudio?
Thank you.
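rnorm() takes the standard deviation, which is the square root of the variance, so pass sd = sqrt(2):
x <- rnorm(200, mean = 20, sd = sqrt(2))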
c(mean(x), var(x))
[1] 20.064919 1.981597
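The sample mean and variance above are close to, but not exactly, 20 and 2, because the values are random draws from a population with those parameters. If the sample itself must have a mean of exactly 20 and a variance of exactly 2, one option, sketched below, is to standardize a normal sample with scale() and then shift and stretch it. The seed is only there for reproducibility, and the result is no longer a set of independent draws from N(20, 2).

```r
set.seed(1)
z <- rnorm(200)

# scale() centres to mean 0 and divides by sd() (n - 1 denominator),
# so z_std has sample mean 0 and sample variance 1, up to rounding
z_std <- as.numeric(scale(z))

# Shift and stretch: sample mean 20, sample variance 2
x <- 20 + sqrt(2) * z_std

c(mean(x), var(x))
```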

Is there an R function to find the probability of certain data being created from a Beta distribution?

I have a vector x of values in R. I want to know the probability that the data were generated from a Beta(20, 40) distribution.
When I make this function call
dbeta(x, 10, 20)
I get a value for each entry in the vector (dbeta() returns the density at each point):
0.065278039 0.003434240 0.036265577 0.175467370 0.018132789 0.065278039
0.175467370 0.175467370
I was wondering whether it is possible to output a single number giving the probability that the entire data vector was generated from a Beta distribution.
For example: the probability that dataset x was generated from a Beta(20, 40) distribution is some number.
Thanks!
You can perform a hypothesis test such as Kolmogorov-Smirnov or Hoeffding, comparing the data in your dataset with the Beta(20, 40) distribution.
These tests evaluate the hypothesis that two samples are drawn from the same distribution; the one-sample Kolmogorov-Smirnov test instead compares a single sample with a fully specified distribution.
Something like ks.test(x, y = "pbeta", shape1 = 20, shape2 = 40) should do the job.
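A minimal sketch of the one-sample test, using simulated data in place of the question's vector (the rbeta() draw is only a stand-in). Keep in mind that the p-value is not the probability that the data came from Beta(20, 40): it is the probability, assuming Beta(20, 40) is true, of a test statistic at least as large as the one observed. If you just want one number summarizing the fit, the joint log-likelihood (the sum of the log-densities) is another option, but it is built from densities, so it is not a probability either.

```r
set.seed(42)
# Stand-in data; replace with your own vector x
x <- rbeta(50, shape1 = 20, shape2 = 40)

# One-sample Kolmogorov-Smirnov test against the Beta(20, 40) CDF;
# shape1 and shape2 are passed on to pbeta()
res <- ks.test(x, "pbeta", shape1 = 20, shape2 = 40)
print(res$statistic)  # largest gap between the empirical CDF and the Beta(20, 40) CDF
print(res$p.value)

# Joint log-likelihood of the whole vector under Beta(20, 40)
loglik <- sum(dbeta(x, shape1 = 20, shape2 = 40, log = TRUE))
print(loglik)
```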

Generate multivariate nonnormal random numbers in R

Background
I want to generate multivariate random numbers with a fixed covariance matrix. For example, I want to generate 2-dimensional data with covariance 0.5 and variance 1 in each dimension. The first marginal should be a normal distribution with mean = 0 and sd = 1, and the second an exponential distribution with rate = 2.
My attempt
My approach is to generate correlated multivariate normal random numbers and then transform each margin to the desired distribution by inverse transform sampling.
Below is an example that transforms 2-dimensional normal random numbers into a N(0, 1) + Exp(2) pair:
library(MASS)  # provides mvrnorm()
# generate a correlated bivariate normal sample; data[,1] and data[,2] are standard normal
data <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = matrix(c(1, 0.5, 0.5, 1), 2, 2))
# empirical CDF of dimension 2, evaluated at its own values (roughly uniform on (0, 1])
exp_cdf <- ecdf(data[, 2])
Fn <- exp_cdf(data[, 2])
# inverse transform sampling to get an exponential distribution with rate = 2;
# the 10^(-5) offset avoids log(0) when Fn == 1
x <- -log(1 - Fn + 10^(-5)) / 2
mean(x); cor(data[, 1], x)
Out:
[1] 0.5035326
[1] 0.436236
From the outputs, the new x looks like a set of exponential (rate = 2) random numbers (its mean is close to 1/2). Also, x and data[,1] have a correlation of about 0.44, which is not very close to my target value of 0.5. That may be an issue: I think the covariance of the generated sample should stay closer to the initial setting. In general, I don't think my method is very good; maybe you have some better code snippets.
My question
As a statistics graduate, I know there are 10+ methods to generate multivariate random numbers in theory. In this post, I want to collect a set of code snippets that do it automatically, using packages or by hand. Then I will compare them on different aspects, such as running time and data quality. Any ideas are appreciated!
Note
Some users think I am asking for a package recommendation. However, I am not looking for any recommendation; I already know the commonly used statistical theorems and R packages. I just want to know how to generate multivariate random numbers with a fixed covariance matrix in a decent way, with a code example that generates norm + exp random numbers. I think there must be more powerful code snippets that do this well, so I am asking for help!
Sources:
Generating correlated random variables, Math
Using copulas to generate multivariate random numbers, Stack Overflow
Ross, Simulation (theory book)
CRAN Task View: Probability Distributions
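One sketch of the Gaussian-copula idea from the question, assuming MASS::mvrnorm() as in the attempt. It uses the exact normal CDF pnorm() instead of the empirical CDF, so no 10^(-5) offset is needed, and qexp() for the inverse transform. The drop below 0.5 is expected, because the exponential transform is nonlinear. Since the first margin is left as Z1, cor(X1, X2) equals rho times cor(Z2, X2), so the latent correlation can be rescaled to hit a target. The function name gauss_copula_norm_exp() is hypothetical.

```r
library(MASS)  # for mvrnorm(); ships with R as a recommended package

set.seed(2024)
n <- 10000

# Hypothetical helper: correlated standard normals, then margin 2 mapped to Exp(rate)
gauss_copula_norm_exp <- function(n, rho, rate = 2) {
  z <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2, 2))
  u <- pnorm(z[, 2])                  # exact CDF, so u ~ Uniform(0, 1)
  cbind(x1 = z[, 1],                  # N(0, 1) margin kept as is
        x2 = qexp(u, rate = rate))    # inverse transform to Exp(rate)
}

xy <- gauss_copula_norm_exp(n, rho = 0.5)
c(mean_x2 = mean(xy[, "x2"]), cor = cor(xy[, "x1"], xy[, "x2"]))

# cor(x1, x2) = rho * cor(z2, x2): estimate that shrink factor once by
# Monte Carlo and inflate the latent correlation accordingly
zz <- rnorm(1e6)
shrink <- cor(zz, qexp(pnorm(zz), rate = 2))
rho_latent <- 0.5 / shrink            # must stay below 1 to be a valid correlation

xy2 <- gauss_copula_norm_exp(n, rho = rho_latent)
c(rho_latent = rho_latent, cor = cor(xy2[, "x1"], xy2[, "x2"]))
```

This rescaling works here because one margin is untouched; when both margins are transformed, the mapping between latent and target correlation generally has to be solved numerically.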

Generate beta-binomial distribution from existing vector

Is it possible to/how can I generate a beta-binomial distribution from an existing vector?
My ultimate goal is to generate a beta-binomial distribution from the below data and then obtain the 95% confidence interval for this distribution.
My data are body condition scores recorded by a veterinarian. Body condition values range from 0 to 5 in increments of 0.5. It has been suggested to me here that my data follow a beta-binomial distribution, since they are discrete values with a restricted range.
set1 <- as.data.frame(c(3,3,2.5,2.5,4.5,3,2,4,3,3.5,3.5,2.5,3,3,3.5,3,3,4,3.5,3.5,4,3.5,3.5,4,3.5))
colnames(set1) <- "numbers"
I see that there are multiple functions which appear to be able to do this, betabinomial() in VGAM and rbetabinom() in emdbook, but my stats and coding knowledge is not yet sufficient to be able to understand and implement the instructions provided on the function help pages, at least not in a way that has been helpful for my intended purpose yet.
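We can look at the distribution of your variable (the y-axis shows the proportion of scores). Multiplying the scores by 2 turns them into integers from 0 to 10, i.e. counts out of size = 10, which is the form a beta-binomial needs: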
x1 = set1$numbers*2
h = hist(x1,breaks=seq(0,10))
bp = barplot(h$counts/length(x1),names.arg=(h$mids+0.5)/2,ylim=c(0,0.35))
You can try to fit it, but you have too few data points to estimate all the parameters of a beta-binomial. Hence I fix size = 10, fix the probability so that the mean equals the mean of your scores, and estimate only theta (the overdispersion parameter); judging from the distribution above, this seems reasonable:
library(bbmle)
library(emdbook)
library(MASS)
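# negative log-likelihood of the doubled scores x1 under emdbook's beta-binomial
mtmp <- function(prob, size, theta) {
  -sum(dbetabinom(x1, prob, size, theta, log = TRUE))
}
# estimate only theta; size and prob are held fixed through the data argument
m0 <- mle2(mtmp, start = list(theta = 100),
           data = list(size = 10, prob = mean(x1) / 10), control = list(maxit = 1000))
THETA <- coef(m0)[1]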
We can also use a normal distribution:
normal_fit = fitdistr(x1,"normal")
MEAN=normal_fit$estimate[1]
SD=normal_fit$estimate[2]
Plot both of them:
lines(bp[,1],dbetabinom(1:10,size=10,prob=mean(x1)/10,theta=THETA),
col="blue",lwd=2)
lines(bp[,1],dnorm(1:10,MEAN,SD),col="orange",lwd=2)
legend("topleft",c("normal","betabinomial"),fill=c("orange","blue"))
I think a normal approximation is actually fine here; in this case the estimates are (on the doubled 0 to 10 scale, so divide by 2 for the original scores):
normal_fit$estimate
mean sd
6.560000 1.134196
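The question's ultimate goal was a 95% interval. The sketch below avoids the package dependencies: it hand-codes the beta-binomial probability mass function in the prob/theta parametrization (shape1 = theta * prob, shape2 = theta * (1 - prob)), which I believe matches emdbook::dbetabinom(), fits theta with optimize() while keeping prob fixed as in the answer, and reads off the central 95% range of the fitted distribution. That is a reference range for individual scores, not a confidence interval for the mean. With these particular scores the sample variance is smaller than the binomial variance, so the likelihood may push theta to the upper end of the search interval (the plain binomial limit), which is another hint that the normal approximation is adequate. The helper dbb() and the search bounds are illustrative assumptions.

```r
# Hypothetical helper: beta-binomial pmf, shape1 = theta * prob, shape2 = theta * (1 - prob)
dbb <- function(k, size, prob, theta, log = FALSE) {
  a <- theta * prob
  b <- theta * (1 - prob)
  out <- lchoose(size, k) + lbeta(k + a, size - k + b) - lbeta(a, b)
  if (log) out else exp(out)
}

scores <- c(3,3,2.5,2.5,4.5,3,2,4,3,3.5,3.5,2.5,3,3,3.5,3,3,4,3.5,3.5,4,3.5,3.5,4,3.5)
x1 <- scores * 2           # integer scores out of size = 10
size <- 10
prob <- mean(x1) / size    # fixed so the fitted mean equals the sample mean

# Negative log-likelihood in log(theta), so theta stays positive
nll <- function(log_theta) -sum(dbb(x1, size, prob, exp(log_theta), log = TRUE))
fit <- optimize(nll, interval = c(log(0.1), log(1e4)))
theta_hat <- exp(fit$minimum)

# Central 95% range of the fitted distribution, converted back to the 0-5 scale
pmf <- dbb(0:size, size, prob, theta_hat)
cdf <- cumsum(pmf)
lower <- (which(cdf >= 0.025)[1] - 1) / 2
upper <- (which(cdf >= 0.975)[1] - 1) / 2

# Same idea from the normal fit (MLE sd, as fitdistr() uses), also on the 0-5 scale
normal_range <- (mean(x1) + c(-1, 1) * qnorm(0.975) * sqrt(mean((x1 - mean(x1))^2))) / 2

list(theta = theta_hat, betabinom_range = c(lower, upper), normal_range = normal_range)
```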

Generating multiple confidence intervals from samples of a normal distribution in R

I am a statistics student and an R beginner (understatement of the year) trying to generate multiple confidence intervals for randomly generated samples from a normal distribution as part of an assignment.
I used the function
data <- replicate(25, rnorm(20, 50, 6))
to generate 25 samples of size n = 20 from a N(50, 6^2) distribution (stored as a 20 x 25 numeric matrix, one sample per column).
My question is: how do I find a 95% confidence interval for each sample? I know that I can use colMeans(data) and apply(data, 2, sd) to get the sample mean and sample standard deviation of each sample, but I am having a brain fart trying to think of a function that can generate the confidence intervals for all columns of the matrix (data).
As of now, my (extremely crude) solution consists of creating the functions
left <- function (x,y){x-(qnorm(0.975)*y/sqrt(20))}
right <- function (x,y){x+(qnorm(0.975)*y/sqrt(20))}
left(colMeans(data), apply(data, 2, sd))
right(colMeans(data), apply(data, 2, sd))
to generate 2 vectors of left and right bounds. Please let me know if there is a better way I can do this.
I suppose you could use the t.test() function. Among other things, it returns the sample mean and a 95% confidence interval (based on the t distribution) for a given vector of numbers.
# Create your data
data <- replicate(25, rnorm(20, 50, 6))
data <- as.data.frame(data)
After you make your data, you could apply the t.test() function to all columns using the lapply() function.
# Apply the t.test function and save the results
results <- lapply(data, t.test)
If you only want to see the confidence interval or the mean, you can extract them with the $ operator. For example, for the first column (the first sample), you could type the following:
# Check 95% CI for sample one
results[[1]]$conf.int[1:2]
You could come up with a more elegant way of saving these results to a data frame. Remember, you can always see which individual pieces of information you can extract from an object by using the str() function. For example:
# Example
example <- t.test(data[,1])
str(example)
Hope this helps. Try this link for more information: Using R to find Confidence Intervals
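A more compact variant, sketched below, collects every interval into one 2 x 25 matrix with sapply(). It also reproduces the same bounds by hand with the t quantile qt(0.975, df = n - 1), which is what t.test() uses, rather than qnorm(0.975) as in the question. The seed is only there to make the sketch reproducible.

```r
set.seed(1)
data <- replicate(25, rnorm(20, 50, 6))   # 20 x 25 matrix, one sample per column

# One column per sample: row 1 = lower bound, row 2 = upper bound
ci <- sapply(seq_len(ncol(data)), function(j) t.test(data[, j])$conf.int)
rownames(ci) <- c("lower", "upper")

# The same t-based intervals by hand, vectorised over columns
n <- nrow(data)
m <- colMeans(data)
s <- apply(data, 2, sd)                   # per-column standard deviations
half <- qt(0.975, df = n - 1) * s / sqrt(n)
manual <- rbind(lower = m - half, upper = m + half)

all.equal(ci, manual, check.attributes = FALSE)
```

With n = 20, the t quantile is noticeably larger than qnorm(0.975), so intervals built with qnorm() will be slightly too narrow.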

Resources