Need Help Understanding R Code - r

I'm trying to find out whether (a) fewer than 62% or more than 74% of the sample means fall within one standard deviation of the expected value, or (b) fewer than 92% or more than 98% of the sample means fall within two standard deviations of the expected value.
mu and sigma are already set, and Finv is a quantile function. I was given the last two lines of code. Can someone please explain what they mean and what kind of output I should be getting? (Currently my only output is 0.)
n.iterations <- 100000
n <- 10
xbar <- numeric(n.iterations)
for (i in 1:n.iterations){
x <- sapply(runif(n), Finv)
xbar[i] <- mean(x)
}
mean((mu-1*sigma/sqrt(n) <= xbar) & (xbar <= mu+1*sigma/sqrt(n)))
mean((mu-2*sigma/sqrt(n) <= xbar) & (xbar <= mu+2*sigma/sqrt(n)))

I'm a little puzzled by your question, because it asks about data "within standard deviations" but also asks about quantiles, which seems odd. Here is why.
Consider the upper picture generated from the following code:
mymean <- 5
mysd <- 2
curve(dnorm(x, mean = mymean, sd = mysd), from = -2, to = 12)
abline(v = mymean, col = "red", lwd = 2)
xtimessd = 1
abline(v = c(mymean - mysd*xtimessd, mymean + mysd*xtimessd), col = "blue", lwd = 1, lty = 2)
xtimessd = 2
abline(v = c(mymean - mysd*xtimessd, mymean + mysd*xtimessd), col = "cyan", lwd = 1, lty = 2)
xtimessd = 3
abline(v = c(mymean - mysd*xtimessd, mymean + mysd*xtimessd), col = "green", lwd = 1, lty = 2)
# 62nd and 74th percentiles
targetQuantiles <- qnorm(c(0.62, 0.74), mean = mymean, sd = mysd)
abline(v = targetQuantiles, col = "orange", lwd = 2, lty = 1)
Given your population mean and standard deviation, the figure above shows the probability density function of a normal distribution.
The dotted lines are the "x times sd" boundaries around the mean. (There is really no magic here; it is related to the 68–95–99.7 rule.)
On the other hand, the quantile function gives the values below which 62% and 74% of the distribution lie; these can be computed with qnorm.
As you can see, based on your question "fewer than 62% or more than 74% of the sample means", you would exclude values between approximately 5.61 and 6.29.
So I still don't know from your question what relation you are asking about between "within sd" and "looking for quantiles", as the two are independent of each other.
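On what the last two lines of the question compute: each comparison yields a logical vector over the simulated sample means, and mean() of a logical vector is the proportion of TRUE values, so each line reports the share of sample means within 1 (or 2) standard errors of mu. The sketch below assumes Finv is the exponential quantile function (qexp with rate 1, hence mu = 1 and sigma = 1); that choice is purely illustrative and is not from the question. An output of 0 would be consistent with mu or sigma not matching the distribution behind Finv, although that cannot be confirmed from the code shown.

```r
set.seed(1)

# Assumption: an exponential(1) population, chosen only for illustration
Finv  <- function(u) qexp(u, rate = 1)
mu    <- 1   # mean of exponential(1)
sigma <- 1   # standard deviation of exponential(1)

n.iterations <- 10000
n <- 10

# Each replicate draws n values by inverse-transform sampling and averages them
xbar <- replicate(n.iterations, mean(Finv(runif(n))))

se <- sigma / sqrt(n)  # standard deviation of the sample mean

# Proportion of sample means within 1 and 2 standard errors of mu
within1 <- mean(abs(xbar - mu) <= 1 * se)
within2 <- mean(abs(xbar - mu) <= 2 * se)

# Normal-theory reference values (the 68-95-99.7 rule)
ref1 <- pnorm(1) - pnorm(-1)
ref2 <- pnorm(2) - pnorm(-2)

print(c(within1 = within1, ref1 = ref1, within2 = within2, ref2 = ref2))
```

The thresholds in the question (62% to 74%, and 92% to 98%) bracket the normal reference values of roughly 68% and 95%, so the exercise appears to check how close the sample means are to behaving normally.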

Related

Density plot of the F-distribution (df1=1). Theoretical or simulated?

I am plotting the density of F(1,49) in R. It seems that the simulated plot does not match the theoretical plot as values approach zero.
set.seed(123)
val <- rf(1000, df1=1, df2=49)
plot(density(val), yaxt="n",ylab="",xlab="Observation",
main=expression(paste("Density plot (",italic(n),"=1000, ",italic(df)[1],"=1, ",italic(df)[2],"=49)")),
lwd=2)
curve(df(x, df1=1, df2=49), from=0, to=10, add=T, col="red",lwd=2,lty=2)
legend("topright",c("Theoretical","Simulated"),
col=c("red","black"),lty=c(2,1),bty="n")
Using density(val, from = 0) gets you much closer, although still not perfect. Densities near boundaries are notoriously difficult to calculate in a satisfactory way.
By default, density uses a Gaussian kernel to estimate the probability density at a given point. Effectively, this means that at each point an observation was found, a normal density curve is placed there with its center at the observation. All these normal densities are added up, then the result is normalized so that the area under the curve is 1.
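The kernel sum described above can be written out by hand. This is a minimal sketch on hypothetical normal data (obs is not from the question): at each grid point it averages normal densities centred on the observations, using the bandwidth that density() chose. density() itself uses a binned approximation, so the two curves should agree closely rather than exactly.

```r
set.seed(42)
obs <- rnorm(200)          # hypothetical sample, for illustration only

d <- density(obs)          # Gaussian kernel by default; d$bw is the kernel sd

# Manual estimate: at each grid point, average the normal densities
# centred on every observation
manual <- sapply(d$x, function(x0) mean(dnorm(x0, mean = obs, sd = d$bw)))

max_diff <- max(abs(manual - d$y))
print(max_diff)
```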
This works well if observations have a central tendency, but gives unrealistic results when there are sharp boundaries (Try plot(density(runif(1000))) for a prime example).
When you have a very high density of points close to zero, but none below zero, the left tails of all the normal kernels will "spill over" into the negative values, giving a Gaussian-type tail that doesn't match the theoretical density.
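The spill-over can be measured on the uniform example mentioned above. This sketch sums the estimated density that falls outside [0, 1], where the true density is zero; the Riemann-sum approximation of the area is a simplification made here.

```r
set.seed(1)
u <- runif(1000)         # true density is 1 on [0, 1] and 0 elsewhere

d <- density(u)          # default Gaussian kernel, default bandwidth
step <- diff(d$x[1:2])   # spacing of the evaluation grid

# Approximate probability mass the estimate places outside [0, 1]
outside <- sum(d$y[d$x < 0 | d$x > 1]) * step
print(outside)
```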
This means that if you have a sharp boundary at 0, you should remove values of your simulated density that are between zero and about two standard deviations of your smoothing kernel - anything below this will be misleading.
Since we can control the standard deviation of our smoothing kernel with the bw parameter of density, and easily control which x values are plotted using ggplot, we will get a more sensible result by doing something like this:
library(ggplot2)
ggplot(as.data.frame(density(val, bw = 0.1)[c("x", "y")]), aes(x, y)) +
geom_line(aes(col = "Simulated"), na.rm = TRUE) +
geom_function(fun = ~ df(.x, df1 = 1, df2 = 49),
aes(col = "Theoretical"), lty = 2) +
lims(x = c(0.2, 12)) +
theme_classic(base_size = 16) +
labs(title = expression(paste("Density plot (",italic(n),"=1000, ",
italic(df)[1],"=1, ",italic(df)[2],"=49)")),
x = "Observation", y = "") +
scale_color_manual(values = c("black", "red"), name = "")
The kde1d and logspline packages are not bad for such densities.
sims <- rf(1500, 1, 49)
library(kde1d)
kd <- kde1d(sims, bw = 1, xmin = 0)
plot(kd, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)
library(logspline)
fit <- logspline(sims, lbound = 0, knots = c(0, 0.5, 1, 1.5, 2))
plot(fit, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)

Plotting posterior distribution in R

I want to produce a posterior density plot using a conjugate prior.
I have data with known parameters (mean = 30, sd = 10).
I have two priors: one normal with known parameters (mean = 10, sd = 5), and one t-distributed with the same mean and sd but 4 degrees of freedom.
I want a graph with density plots for the prior, the data, and the posterior.
Can you help me with R code for this problem?
Also, I think the posterior density I am getting is wrong. Here is my code so far:
x=seq(from=-90, to=90, by= 1)
data=dnorm(x,mean=30,sd =10)
prior = dnorm(x,mean=10,sd =5)
posterior = dnorm(x,mean=10,sd =5)*dnorm(x,mean=30,sd =10) # prior * data
plot(x,data , type="l", col="blue")
lines(x,prior, type="l", col="red")
lines(x,posterior , type="l", col="green")
You need to add the two distributions together, not multiply them. The example below uses equal weights for the two distributions, which gives a mixture of the two densities:
x <- seq(from = -90, to = 90, by = 1)
data <- dnorm(x, mean = 30, sd = 10)
prior <- dnorm(x, mean = 10, sd = 5)
posterior <- 0.5 * dnorm(x, mean = 10, sd = 5) + 0.5 * dnorm(x, mean = 30, sd = 10)
plot(x, prior, type = "l", col = "red")
lines(x, posterior, type = "l", col = "green")
lines(x, data , type = "l", col = "blue")
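For comparison, if the goal is a Bayesian update rather than a mixture, the posterior is proportional to prior times likelihood, and the product in the question mainly looks flat because it is not normalized. The sketch below assumes the blue curve can be treated as the likelihood of the mean given a single observation of 30 with known sd 10; that reading is an assumption, not something stated in the question. Under it, the normalized product matches the closed-form normal-normal conjugate result.

```r
x    <- seq(from = -90, to = 90, by = 1)
step <- 1                                    # grid spacing of x

prior      <- dnorm(x, mean = 10, sd = 5)
likelihood <- dnorm(x, mean = 30, sd = 10)   # assumption: one observation at 30, known sd 10

# Normalize the product so it integrates (approximately) to 1 on the grid
unnormalized <- prior * likelihood
posterior    <- unnormalized / sum(unnormalized * step)

# Closed-form conjugate result: precisions add, mean is precision-weighted
post_var  <- 1 / (1 / 5^2 + 1 / 10^2)
post_mean <- post_var * (10 / 5^2 + 30 / 10^2)
closed    <- dnorm(x, mean = post_mean, sd = sqrt(post_var))

print(c(post_mean = post_mean, post_sd = sqrt(post_var)))
print(max(abs(posterior - closed)))
```

The posterior vector can be drawn with the same plot() and lines() calls used in the question.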

How to fit a curve to a histogram

I've explored similar questions asked about this topic but I am having some trouble producing a nice curve on my histogram. I understand that some people may see this as a duplicate but I haven't found anything currently to help solve my problem.
Although the data isn't visible here, here are some of the variables I am using, so you can see what they represent in the code below.
Differences <- subset(Score_Differences, select = Difference, drop = T)
m = mean(Differences)
std = sqrt(var(Differences))
Here is the very first curve I produce (the code seems most common and easy to produce but the curve itself doesn't fit that well).
hist(Differences, density = 15, breaks = 15, probability = TRUE, xlab = "Score Differences", ylim = c(0,.1), main = "Normal Curve for Score Differences")
curve(dnorm(x,m,std),col = "Red", lwd = 2, add = TRUE)
I really like this but don't like the curve going into the negative region.
hist(Differences, probability = TRUE)
lines(density(Differences), col = "Red", lwd = 2)
lines(density(Differences, adjust = 2), lwd = 2, col = "Blue")
This is the same histogram as the first, but with frequencies. Still doesn't look that nice.
h = hist(Differences, density = 15, breaks = 15, xlab = "Score Differences", main = "Normal Curve for Score Differences")
xfit = seq(min(Differences),max(Differences))
yfit = dnorm(xfit,m,std)
yfit = yfit*diff(h$mids[1:2])*length(Differences)
lines(xfit, yfit, col = "Red", lwd = 2)
Another attempt but no luck. Maybe because I am using qnorm, when the data obviously isn't normal. The curve goes into the negative direction again.
sample_x = seq(qnorm(.001, m, std), qnorm(.999, m, std), length.out = l)
binwidth = 3
breaks = seq(floor(min(Differences)), ceiling(max(Differences)), binwidth)
hist(Differences, breaks)
lines(sample_x, l*dnorm(sample_x, m, std)*binwidth, col = "Red")
The only curve that visually looks nice is the 2nd, but the curve falls into the negative direction.
My question is "Is there a "standard way" to place a curve on a histogram?" This data certainly isn't normal. 3 of the procedures I presented here are from similar posts but I am having some troubles obviously. I feel like all methods of fitting a curve will depend on the data you're working with.
Update with solution
Thanks to Zheyuan Li and others! I will leave this up for my own reference and hopefully others as well.
hist(Differences, probability = TRUE)
lines(density(Differences, cut = 0), col = "Red", lwd = 2)
lines(density(Differences, adjust = 2, cut = 0), lwd = 2, col = "Blue")
OK, so you are just struggling with the fact that the density extends beyond the "natural range" of the data. Well, just set cut = 0. You may want to read "plot.density extends 'xlim' beyond the range of my data. Why and how to fix it?" for the reason. In that answer I was using from and to, but here I am using cut.
## consider a mixture, that does not follow any parametric distribution family
## note, by construction, this is a strictly positive random variable
set.seed(0)
x <- rbeta(1000, 3, 5) + rexp(1000, 0.5)
## (kernel) density estimation offers a flexible nonparametric approach
d <- density(x, cut = 0)
## you can plot histogram and density on the density scale
hist(x, prob = TRUE, breaks = 50)
lines(d, col = 2)
Note, by cut = 0, density estimation is done strictly within range(x). Outside this range, density is 0.
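A quick check of that statement, reusing the mixture above: with the default cut = 3 the evaluation grid extends three bandwidths beyond the data, while cut = 0 stops the grid at range(x). The names d_default and d_cut are introduced here for illustration.

```r
set.seed(0)
x <- rbeta(1000, 3, 5) + rexp(1000, 0.5)

d_default <- density(x)           # default cut = 3
d_cut     <- density(x, cut = 0)  # grid limited to the data range

print(range(x))
print(range(d_default$x))  # extends 3 * bw beyond the data on both sides
print(range(d_cut$x))      # matches range(x)
```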

Diagram that compares the power of the t-Test with the power of the Chi-square-Test

I try to compare the power functions of the Chi-square-Test and the t-Test for one particular value and my overall goal was to show that the t-Test is more powerful (because it has an assumption about the distribution). I used the pwr package for R for calculating the power of each function and then wrote two functions and plotted the results.
However, I do not find that the t-test is better than the Chi-square-test, and I am puzzled by the result. I have spent hours on it, so any help is much appreciated.
Is the code wrong, do I have a wrong understanding of the power functions, or is there something wrong in the package?
library(pwr)
#mu is the value for which the power is calculated
#no is the number of observations
#function of the power of the t-test with a h0 of .2
g <- function(mu, alpha, no) { #calculate the power of a particular value for the t-test with h0=.2
p <- mu-.20
sigma <- sqrt(.5*(1-.5))
pwr.t.test(n = no, d = p/sigma, sig.level = alpha, type = "one.sample", alternative="greater")$power # d is the effect size p/sigma
}
#chi squared test
h <- function(mu, alpha, no, degree) {#calculate the power of a particular value for the chi squared test
p01 <- .2 # these constructs the effect size (which is a bit different for the chi squared)
p02 <- .8
p11 <-mu
p12 <- 1-p11
effect.size <- sqrt(((p01-p11)^2/p01)+((p02-p12)^2/p02)) # effect size
pwr.chisq.test(N=no, df=degree, sig.level = alpha, w=effect.size)$power
}
#create a diagram
plot(1, 1, type = "n",
xlab = expression(mu),
xlim = c(.00, .75),
ylim = c(0, 1.1),
ylab = expression(1-beta),
axes=T, main="Power function t-Test and Chi-squared-Test")
axis(side = 2, at = c(0.05), labels = c(expression(alpha)), las = 3)
axis(side = 1, at = 3, labels = expression(mu[0]))
abline(h = c(0.05, 1), lty = 2)
legend(.5,.5, # places a legend at the appropriate place
c("t-Test","Chi-square-Test"), # puts text in the legend
lwd=c(2.5,2.5),col=c("black","red"))
curve(h(x, alpha = 0.05, no = 100, degree=1), from = .00, to = .75, add = TRUE, col="red",lwd=c(2.5,2.5) )
curve(g(x, alpha = 0.05, no = 100), from = .00, to = .75, add = TRUE, lwd=c(2.5,2.5))
Thanks a lot in advance!
If I understand the problem right, you are testing a binomial proportion with mean 0.2 under the null against the alternative that the mean is greater than 0.2? If so, then on line 2 of your function g, shouldn't it be sigma <- sqrt(.2*(1-.2)) instead of sigma <- sqrt(.5*(1-.5))? That way your standard deviation will be smaller, resulting in a larger standardized effect size and hence higher power.
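To see the size of that effect, here is a minimal sketch that swaps in base R's power.t.test (from the stats package) for pwr.t.test, so no extra package is needed. The alternative value mu1 = 0.3 and the helper power_for_sd are hypothetical, chosen only for illustration.

```r
n   <- 100
mu0 <- 0.2   # mean under the null, as in the question
mu1 <- 0.3   # hypothetical alternative value

# One-sample, one-sided t-test power for a given standard deviation
power_for_sd <- function(sd) {
  power.t.test(n = n, delta = mu1 - mu0, sd = sd, sig.level = 0.05,
               type = "one.sample", alternative = "one.sided")$power
}

power_half <- power_for_sd(sqrt(0.5 * (1 - 0.5)))  # sigma used in the question
power_null <- power_for_sd(sqrt(0.2 * (1 - 0.2)))  # sigma under the null

print(c(sigma_0.5 = power_half, sigma_0.2 = power_null))
```

The smaller standard deviation gives the larger power, because the same difference in means corresponds to a larger standardized effect.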

Plot normal, left and right skewed distribution in R

I want to create 3 plots for illustration purposes:
- normal distribution
- right skewed distribution
- left skewed distribution
This should be an easy task, but I found only this link, which only shows a normal distribution. How do I do the rest?
If you are not too tied to normal, then I suggest you use beta distribution which can be symmetrical, right skewed or left skewed based on the shape parameters.
hist(rbeta(10000,5,2))
hist(rbeta(10000,2,5))
hist(rbeta(10000,5,5))
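To confirm which call gives which shape, here is a small sketch that computes the sample skewness of each draw. The skewness helper is defined here (moment-based) and is not from any package. Beta(5, 2) comes out left skewed, Beta(2, 5) right skewed, and Beta(5, 5) roughly symmetric.

```r
set.seed(1)

# Moment-based sample skewness (helper defined here for illustration)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

sk <- c(
  left      = skewness(rbeta(10000, 5, 2)),  # long tail towards 0
  right     = skewness(rbeta(10000, 2, 5)),  # long tail towards 1
  symmetric = skewness(rbeta(10000, 5, 5))   # equal shape parameters
)
print(round(sk, 2))
```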
I finally got it working with help from both of you, relying on this site.
N <- 10000
x <- rnbinom(N, 10, .5)
hist(x,
xlim=c(min(x),max(x)), probability=T, nclass=max(x)-min(x)+1,
col='lightblue', xlab=' ', ylab=' ', axes=F,
main='Positive Skewed')
lines(density(x,bw=1), col='red', lwd=3)
This is also a valid solution:
curve(dbeta(x,8,4),xlim=c(0,1))
title(main="posterior distribution of p")
Just use the fGarch package and these functions:
dsnorm(x, mean = 0, sd = 1, xi = 1.5, log = FALSE)
psnorm(q, mean = 0, sd = 1, xi = 1.5)
qsnorm(p, mean = 0, sd = 1, xi = 1.5)
rsnorm(n, mean = 0, sd = 1, xi = 1.5)
Arguments: mean is the location parameter, sd is the scale parameter, and xi is the skewness parameter.
Examples
## snorm -
# Random Numbers:
par(mfrow = c(2, 2))
set.seed(1953)
r = rsnorm(n = 1000)
plot(r, type = "l", main = "snorm", col = "steelblue")
# Plot empirical density and compare with true density:
hist(r, n = 25, probability = TRUE, border = "white", col = "steelblue")
box()
x = seq(min(r), max(r), length = 201)
lines(x, dsnorm(x), lwd = 2)
# Plot df and compare with true df:
plot(sort(r), (1:1000/1000), main = "Probability", col = "steelblue",
ylab = "Probability")
lines(x, psnorm(x), lwd = 2)
# Compute quantiles:
round(qsnorm(psnorm(q = seq(-1, 5, by = 1))), digits = 6)
