Sampling distribution using R

I really need help to figure out:
Suppose we are testing H0: µ = 5 against H1: µ < 5 for a normal population with σ = 1. A random sample of size n = 9 is available from this population. The z test is used with α = 0.05. The rejection region for this test is z < -1.645, and the observed sample mean is x̄ = 4.45.
1) On the same graph, use R to plot the sampling distribution of the test statistic when µ = 5 and when µ = 4.2.
2) On your graph, shade and label the area that represents the probability of type I error.
3) On your graph, shade and label the area that represents the probability of type II error.
4) Compute the probability of type II error when µ = 4.2. Provide the appropriate R codes.
I could figure out only 1):
z1 = (4.45 - 5)/(1/sqrt(9))
z1
k1 = seq(from=-1.65, to=+1.65, by=.05)
dens1 = dnorm(k1)
plot(k1, dens1, type="l")
par(new =TRUE)
z2 = (4.45 - 4.2)/(1/sqrt(9))
z2
k2 = seq(from=-.75, to=+0.75, by=.05)
dens2 = dnorm(k2)
p = plot(k2, dens2, type="l", xlab="", ylab="")

Some approximation to the graph (1) is:
curve(dnorm(x,5 ,sqrt(1/9)), xlim=c(0, 14), ylab='', lwd=2, col='blue')
curve(dnorm(x,4.2,sqrt(1/9)), add=T, lwd=2)
curve(dnorm(x,5,1), add=T, col='blue')
curve(dnorm(x,4.2,1), add=T)
legend('topright', c('Samp. dist. for mu=5','Samp. dist. for mu=4.2',
'N(5,1)','N(4.2,1)'),
bty='n', lwd=c(2,2,1,1), col=c(4,1,4,1))
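For part 4), here is a minimal sketch, assuming the usual lower-tailed rejection rule z < -1.645, i.e. reject when x̄ < 5 - 1.645*σ/√n; the type II error is the probability of not rejecting when µ = 4.2:
sigma <- 1; n <- 9
xcrit <- 5 - 1.645 * sigma/sqrt(n)                  # critical sample mean, about 4.452
beta  <- 1 - pnorm((xcrit - 4.2)/(sigma/sqrt(n)))   # P(fail to reject H0 | mu = 4.2)
beta                                                # roughly 0.22
# equivalently: pnorm(xcrit, mean = 4.2, sd = sigma/sqrt(n), lower.tail = FALSE)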

Related

Find the CDF of y = log(x) where x ~ U[0,1]

I'm trying to use Monte Carlo approximation in R in order to solve this problem:
I have X ~ U(0,1) and Y = log(X).
What I want to obtain is an estimate of the pdf and the cdf.
The problem is that my goal is to obtain an estimate of the CDF without using the ecdf command. So, is there any way to approximate my CDF without this command? Theoretically I can integrate my pdf, but I don't know its exact shape.
In order to obtain these two I create this R code:
X = runif(1000) # a= 0 and b=1 default
sample = log(X)
hist(sample, xlim=c(-6,0), main="Estimated vs true pdf", freq = FALSE,
axes=FALSE, xlab="", ylab="")
par(new=T)
curve(exp(x), xlim = c(-6, 0), n = 1000, col = "blue" , lwd = 3,
xlab="", ylab="")
text(-1, 0.8, expression(f(x) == e^{x}), col = "blue")
#CDF
plot(ecdf(sample), main="Estimated CDF")
Is it correct? Consider that in the next point I obtain the true shape of the pdf, which is f(y) = e^y defined between -inf and 0.
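One way to handle the "no ecdf()" requirement: a minimal sketch, reusing the sample vector from the code above, that approximates F(t) = P(Y <= t) by the proportion of simulated values at or below t:
t.grid  <- seq(-6, 0, length.out = 200)
cdf.hat <- sapply(t.grid, function(t) mean(sample <= t))   # Monte Carlo estimate of F(t)
plot(t.grid, cdf.hat, type = "l", main = "Estimated CDF (no ecdf)",
     xlab = "y", ylab = "F(y)")
curve(exp(x), add = TRUE, col = "blue", lwd = 2)           # true CDF of Y = log(X) is e^y for y <= 0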

Do I need to include prior probability when plotting probability in R?

Is this wrong or am I overthinking?
For a binary classification problem {0, 1} with one predictor X, the prior probability of being in class 0 is 0.6 and the density function of X in class 0 is standard normal, f0(x) = Normal(0, 1).
The density function for X in class 1 is also normal, but with mean = 1 and variance = 0.5: f1(x) = Normal(1, 0.5).
I need to plot prior probability * f0(x) and prior probability * f1(x) in the same figure.
This is what I have written in R, but I can't work out whether I was meant to include the prior probabilities.
x <- seq(-5, 5, length=200)
y <- dnorm(x, mean= 0, sd=1)
t = sqrt(0.5)
z <- seq(-5, 5, length=200)
a <- dnorm(x, mean=1, sd=t)
plot(z, a, type="l", lwd=2, col='red', xlim = c(-5,5), ylim = c(0,0.65), xlab = "Observed Value", ylab = "Probability Density")
lines(x,y, col='blue', lwd=2)
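If the plot is meant to show prior * density (as in a Bayes classifier), a minimal sketch scaling each density by its prior (0.6 for class 0 and 0.4 for class 1, taken from the question) would be:
x  <- seq(-5, 5, length = 200)
p0 <- 0.6 * dnorm(x, mean = 0, sd = 1)           # prior(class 0) * f0(x)
p1 <- 0.4 * dnorm(x, mean = 1, sd = sqrt(0.5))   # prior(class 1) * f1(x)
plot(x, p1, type = "l", lwd = 2, col = "red", ylim = range(p0, p1),
     xlab = "Observed Value", ylab = "Prior-weighted density")
lines(x, p0, col = "blue", lwd = 2)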

How to calculate the slope of a smoothed curve in R

I have the following data:
I plotted the points of that data and then smoothed it on the plot using the following code :
scatter.smooth(x=1:length(Ticker$ROIC[!is.na(Ticker$ROIC)]),
y=Ticker$ROIC[!is.na(Ticker$ROIC)],col = "#AAAAAA",
ylab = "ROIC Values", xlab = "Quarters since Feb 29th 2012 till Dec 31st 2016")
Now I want to find the point-wise slope of this smoothed curve, and also fit a trend line to the smoothed graph. How can I do that?
There are some interesting R packages that implement nonparametric derivative estimation. The short review of Newell and Einbeck can be helpful: http://maths.dur.ac.uk/~dma0je/Papers/newell_einbeck_iwsm07.pdf
Here we consider an example based on the pspline package (smoothing splines with penalties on order m derivatives):
The data-generating process is a negative logistic model with additive noise (hence the y values are all negative, like the ROIC variable of #ForeverLearner):
set.seed(1234)
x <- sort(runif(200, min=-5, max=5))
y = -1/(1+exp(-x))-1+0.1*rnorm(200)
We start plotting the nonparametric estimation of the curve (the black line is the true curve and the red one the estimated curve):
library(pspline)
pspl <- smooth.Pspline(x, y, df=5, method=3)
f0 <- predict(pspl, x, nderiv=0)
curve(-1/(1+exp(-x)) - 1, -5, 5, lwd=2)   # true curve (black)
lines(x, f0, lwd=3, lty=2, col="red")     # estimated curve (red)
Then, we estimate the first derivative of the curve:
f1 <- predict(pspl, x, nderiv=1)
curve(-exp(-x)/(1+exp(-x))^2,-5,5, lwd=2, ylim=c(-.3,0))
lines(x, f1, lwd=3, lty=2, col="red")
And here the second derivative:
f2 <- predict(pspl, x, nderiv=2)
curve((exp(-x))/(1+exp(-x))^2 - 2*exp(-2*x)/(1+exp(-x))^3, -5, 5,
lwd=2, ylim=c(-.15,.15), ylab='')
lines(x, f2, lwd=3, lty=2, col="red")
#DATA
set.seed(42)
x = rnorm(20)
y = rnorm(20)
#Plot the points
plot(x, y, type = "p")
#Obtain points for the smooth curve
temp = loess.smooth(x, y, evaluation = 50) #Use higher evaluation for more points
#Plot smooth curve
lines(temp$x, temp$y, lwd = 2)
#Obtain slope of the smooth curve
slopes = diff(temp$y)/diff(temp$x)
#Add a trend line
abline(lm(y~x))
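A possible follow-up (not part of the answer above, reusing temp and slopes from the code): each element of slopes belongs to the interval between two consecutive evaluation points, so one way to visualise the point-wise slope is to plot it against the interval midpoints:
mid <- (temp$x[-1] + temp$x[-length(temp$x)]) / 2   # midpoints of the evaluation intervals
plot(mid, slopes, type = "b",
     xlab = "x", ylab = "Point-wise slope of the loess curve")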

Student's t-distribution and histogram overlay

Does anyone know why the t-distribution in the histogram overlay is just a horizontal line? The warnings() in fit.std result from the estimation of the degrees of freedom, which can lead to an infinite likelihood; see Fernandez & Steel (1999).
library(MASS)   # fitdistr() comes from MASS
library(zoo)
library(rugarch)
data(sp500ret)
g= zoo(sp500ret$SP500RET, as.Date(rownames(sp500ret)))
(fit.std = fitdistr(g,"t"))
mu.std = fit.std$estimate[["m"]]
lambda = fit.std$estimate[["s"]]
nu = fit.std$estimate[["df"]]
# plot
hist(g, density=20, breaks=20, prob=T)
curve(dt(x, nu, lambda), col="red", lwd=2, add=TRUE, yaxt="n")
From the help file for fitdistr:
For the "t" named distribution the density is taken to be the location-scale family with location m and scale s.
For a location-scale family, if we have a location parameter m and a scale parameter s, then we can get the density at x from the standardized version (location = 0, scale = 1; call it f) by using:
f((x-m)/s)/s
So in your case mu.std is the location parameter and lambda is the scale, so we would want to change your line to:
curve(dt((x-mu.std)/lambda, nu)/lambda, col="red", lwd=2, add=TRUE, yaxt="n")
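As an optional sanity check (not part of the original answer), the rescaled location-scale density f((x - m)/s)/s should still integrate to 1; using the fitted values above:
f <- function(x) dt((x - mu.std)/lambda, df = nu)/lambda
integrate(f, -Inf, Inf)   # should return a value very close to 1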

How to plot a normal distribution by labeling specific parts of the x-axis?

I am using the following code to create a standard normal distribution in R:
x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)
I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?
The easiest (but not general) way is to restrict the limits of the x-axis. The ±1, ±2 and ±3 sigma points will then be labeled as such, and the mean will be labeled as 0, indicating 0 deviations from the mean.
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))
Another option is to use more specific labels:
plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:
curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
But this doesn't use the given code anymore.
If you like the hard way of doing things without using R's built-in functions, or you want to do this outside R, you can use the following formula.
x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
An extremely inefficient and unusual, but beautiful, solution based on the ideas of Monte Carlo simulation is this:
simulate many draws (or samples) from a given distribution (say the normal) using rnorm; rnorm takes arguments (A, B, C) and returns a vector of A samples from a normal distribution centered at B, with standard deviation C.
plot the density of these draws using density().
Thus to take a sample of size 50,000 from a standard normal (i.e, a normal with mean 0 and standard deviation 1), and plot its density, we do the following:
x = rnorm(50000,0,1)
plot(density(x))
As the number of draws goes to infinity this will converge in distribution to the normal. To illustrate this, see the image below, which shows, from left to right and top to bottom, 5,000, 50,000, 500,000, and 5 million samples.
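A minimal sketch of that convergence (sample sizes chosen here purely for illustration) overlays kernel density estimates for increasing n on the exact dnorm curve:
curve(dnorm(x), -4, 4, lwd = 2, ylab = "density")   # exact standard normal density
ns <- c(5e3, 5e4, 5e5)                              # illustrative sample sizes
for (i in seq_along(ns)) lines(density(rnorm(ns[i])), col = i + 1)
legend("topright", legend = c("dnorm", paste("n =", ns)),
       col = 1:4, lwd = c(2, 1, 1, 1), bty = "n")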
In the general case, for example Normal(2, 1):
f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)
This is very general; f can be defined freely, with any given parameters, for example:
f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)
I particularly love lattice for this goal. It easily displays graphical information such as specific areas under a curve, the kind you usually need when dealing with probability problems such as finding P(a < X < b), etc.
Please have a look:
library(lattice)
e4a <- seq(-4, 4, length = 10000) # Data to set up our normal
e4b <- dnorm(e4a, 0, 1)
xyplot(e4b ~ e4a, # Lattice xyplot
type = "l",
main = "Plot 2",
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(0, 1, 1.5), lty = 2) #set z and lines
xx <- c(1, x[x>=1 & x<=1.5], 1.5) #Color area
yy <- c(0, y[x>=1 & x<=1.5], 0)
panel.polygon(xx,yy, ..., col='red')
})
In this example I make the area between z = 1 and z = 1.5 stand out. You can easily adjust these parameters according to your problem.
Axis labels are automatic.
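For reference, the probability corresponding to the shaded area in this example can be computed directly:
pnorm(1.5) - pnorm(1)   # P(1 < Z < 1.5), roughly 0.092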
This is how to write it as a function:
normalCriticalTest <- function(mu, s) {
  x <- seq(-4, 4, length=200)  # x extends from -4 to 4
  y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))  # the normal density formula f(y)
  plot(x, y, type="l", lwd=2, xlim = c(-3.5,3.5))
  abline(v = c(-1.96, 1.96), col="red")  # vertical lines at +/-1.96, leaving 2.5% of the area in each tail
}
normalCriticalTest(0, 1) # draw a normal distribution with vertical lines.
Final result: the standard normal curve with red vertical lines at -1.96 and 1.96.
