Does anyone know why the t-distribution in the histogram overlay is just a horizontal line? The warnings() in fit.std result from the estimation of the degrees of freedom, which can lead to an infinite likelihood - see Fernandez & Steel (1999).
library(MASS)   # fitdistr() is in MASS
library(zoo)
library(rugarch)
data(sp500ret)
g = zoo(sp500ret$SP500RET, as.Date(rownames(sp500ret)))
(fit.std = fitdistr(g, "t"))
mu.std = fit.std$estimate[["m"]]
lambda = fit.std$estimate[["s"]]
nu = fit.std$estimate[["df"]]
# plot
hist(g, density=20, breaks=20, prob=T)
curve(dt(x, nu, lambda), col="red", lwd=2, add=TRUE, yaxt="n")
From the help file for fitdistr:
For the "t" named distribution the density is taken to be the location-scale family with location m and scale s.
For a location-scale family, if we have a location parameter m and a scale parameter s, then we can get the density at x from the standardized version (location = 0, scale = 1; call it f) by computing:
f((x - m)/s)/s
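As a quick sanity check of that identity, here it is with the normal family (my own illustration, not from the original answer):
all.equal(dnorm(1.3, mean = 2, sd = 0.5), dnorm((1.3 - 2)/0.5)/0.5)   # TRUE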
So for your fit, mu.std is the location parameter and lambda is the scale, so we would want to change your line to:
curve(dt((x-mu.std)/lambda, nu)/lambda, col="red", lwd=2, add=TRUE, yaxt="n")
I'm trying to use Monte Carlo approximation in R in order to find a solution to this problem:
I have X ~ U(0,1) and Y = log(X).
What I want to obtain is an estimate of the pdf and the cdf.
The problem is that my goal is to obtain an estimate of the CDF without using the ecdf command. So, is there any way to approximate my CDF without this command? Theoretically I can integrate my pdf, but I don't know its exact shape.
In order to obtain these two I created this R code:
X = runif(1000) # a= 0 and b=1 default
sample = log(X)
hist(sample, xlim=c(-6,0), main="Estimated vs true pdf", freq = FALSE,
axes=FALSE, xlab="", ylab="")
par(new=T)
curve(exp(x), xlim = c(-6, 0), n = 1000, col = "blue" , lwd = 3,
xlab="", ylab="")
text(-1, 0.8, expression(f(x) == e^{x}), col = "blue")
#CDF
plot(ecdf(sample), main="Estimated CDF")
Is it correct? Consider that in the next point I obtain the true shape of the pdf, which is f(y) = e^y, defined between -inf and 0.
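One possibility (a minimal sketch, not from the original post): the CDF at a point t is just P(Y <= t), so you can estimate it by the proportion of simulated values at or below t, which avoids ecdf() entirely. Using the sample generated above and an arbitrary grid:
t.grid <- seq(-6, 0, by = 0.1)
# Monte Carlo CDF estimate: proportion of draws at or below each grid point
cdf.hat <- sapply(t.grid, function(t) mean(sample <= t))
plot(t.grid, cdf.hat, type = "l", main = "Estimated vs true CDF",
     xlab = "y", ylab = "F(y)")
curve(exp(x), add = TRUE, col = "blue")   # true CDF of Y = log(X) is e^y for y <= 0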
I would like to plot the fitted line and the 95% confidence interval from a linear model where the response has been logit-transformed, back on the original scale of the data. So the result should be a curved line including the confidence intervals on the original scale, where it would be a straight line on the logit-transformed scale. See the code:
# Data
dat <- data.frame(c(45,75,14,45,45,55,65,15,3,85),
c(.37, .45, .24, .16, .46, .89, .16, .24, .23, .49))
colnames(dat) <- c("age", "bil.")
# Logit transformation
dat$bb_logit <- log(dat$bil./(1-dat$bil.))
# Model
modelbb <- lm(bb_logit ~ age + I(age^2), data=dat)
summary(modelbb)
# Backtransform
dat$bb_back <- exp(predict.lm(modelbb))/ (1 + exp(predict.lm(modelbb)))
# Plot
plot(dat$age, dat$bb_back)
abline(modelbb)
What I am trying to do here is plot the curved regression line and add the confidence interval. Within ggplot2 there is the geom_smooth function where the linear model can be specified, but I could not find a way of plotting the predictions from predict.lm applied to my model.
I would also like to know how to add a coloured polygon to represent the confidence interval, as in the image below. I know I have to use the polygon function with coordinates, but I do not know how.
You may use predict on an age range, say 1:100, and specify the interval= option for the CIs. Plotting with type="l" will draw a nice smooth curve. The confidence intervals can then be added using lines.
p <- predict(modelbb, data.frame(age=1:100), interval="confidence")
# Backtransform
p.tr <- exp(p) / (1 + exp(p))
plot(1:100, p.tr[,1], type="l", ylim=range(p.tr), xlab="age", ylab="bil.")
sapply(2:3, function(i) lines(1:100, p.tr[,i], lty=2))
legend("topleft", legend=c("fit", "95%-CI"), lty=1:2)
Yields
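As a side note, the back-transformation exp(p) / (1 + exp(p)) is the inverse logit, which base R provides as plogis(), so the same result can be written as:
p.tr <- plogis(p)   # inverse logit, 1/(1 + exp(-p)), applied element-wise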
Edit
To get shaded confidence bands use polygon. Since you want two confidence levels you need to make one prediction for each. The fit line would get covered by the polygons, so it's better to make an empty plot first using type="n" and draw the lines at the end. (Note that I'll also show you some hints for custom axis labeling.) The trick for the polygons is to run the x values forward and then back, reversing the corresponding y values with rev.
p.95 <- predict(modelbb, data.frame(age=1:100), interval="confidence", level=.95)
p.99 <- predict(modelbb, data.frame(age=1:100), interval="confidence", level=.99)
# Backtransform
p.95.tr <- exp(p.95) / (1 + exp(p.95))
p.99.tr <- exp(p.99) / (1 + exp(p.99))
plot(1:100, p.99.tr[,1], type="n", ylim=range(p.99.tr), xlab="Age", ylab="",
main="", yaxt="n")
mtext("Tree biomass production", 3, .5)
mtext("a", 2, 2, at=1.17, xpd=TRUE, las=2, cex=3)
axis(2, (1:5)*.2, labels=FALSE)
mtext((1:5)*2, 2, 1, at=(1:5)*.2, las=2)
mtext(bquote(Production ~(kg~m^-2~year^-1)), 2, 2)
# CIs
polygon(c(1:100, 100:1), c(p.99.tr[,2], rev(p.99.tr[,3])), col=rgb(.5, 1, .2),
border=NA)
polygon(c(1:100, 100:1), c(p.95.tr[,2], rev(p.95.tr[,3])), col=rgb(0, .8, .5),
border=NA)
# fit
lines(1:100, p.99.tr[,1], lwd=2)
#legend
legend("topleft", legend=c("fit", "99%-CI", "95%-CI"), lty=c(1, NA, NA), lwd=2,
pch=c(NA, 15, 15), bty="n",
col=c("#000000", rgb(.5, 1, .2), rgb(0, .8, .5)))
Yields
I performed linear interpolation and specified where the interpolation is to take place. I get the interpolated data, but I don't know how to get the confidence interval for these interpolated values (red) from linear interpolation.
Is there any other way (loess, splines) to do interpolation that meets two requirements:
Specifying where the interpolation is to take place
Getting the interpolated data along with a confidence interval for the interpolated values
Here is the code that I used.
x <- rnorm(100)
y <- 0.4 * x+ rnorm(100, 0, 1)
ptsLin <- approx(x, y, method="linear", xout=seq(-2,2, 0.1))
plot(x, y, xlab=NA, ylab=NA, pch=19, main="Linear interpolation",
cex=1.5)
points(ptsLin, pch=16, col="red", lwd=1.5)
lines(ptsLin)
legend(x="bottomleft", c("Data", "linear"), pch=c(19, 16), col=c("black", "red"
), bg="white")
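For the confidence-interval requirement, approx() does not provide standard errors, but loess does through predict(..., se = TRUE). A minimal sketch using the x and y simulated above (the 1.96 multiplier for an approximate normal 95% band is my own choice, not from the original post):
fit <- loess(y ~ x)
xout <- seq(-2, 2, 0.1)                                  # where interpolation takes place
pr <- predict(fit, newdata = data.frame(x = xout), se = TRUE)  # points outside range(x) return NA
plot(x, y, pch = 19, main = "Loess interpolation with 95% CI")
lines(xout, pr$fit, col = "red", lwd = 2)
lines(xout, pr$fit + 1.96 * pr$se.fit, lty = 2, col = "red")   # upper band
lines(xout, pr$fit - 1.96 * pr$se.fit, lty = 2, col = "red")   # lower band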
I have the following data:
I plotted the points of that data and then smoothed it on the plot using the following code:
scatter.smooth(x=1:length(Ticker$ROIC[!is.na(Ticker$ROIC)]),
y=Ticker$ROIC[!is.na(Ticker$ROIC)],col = "#AAAAAA",
ylab = "ROIC Values", xlab = "Quarters since Feb 29th 2012 till Dec 31st 2016")
Now I want to find the point-wise slope of this smoothed curve and also fit a trend line to the smoothed graph. How can I do that?
There are some interesting R packages that implement nonparametric derivative estimation. The short review by Newell and Einbeck can be helpful: http://maths.dur.ac.uk/~dma0je/Papers/newell_einbeck_iwsm07.pdf
Here we consider an example based on the pspline package (smoothing splines with penalties on order-m derivatives).
The data-generating process is a negated logistic model with additive noise (hence the y values are all negative, like the ROIC variable of @ForeverLearner):
set.seed(1234)
x <- sort(runif(200, min=-5, max=5))
y = -1/(1+exp(-x))-1+0.1*rnorm(200)
We start by plotting the nonparametric estimate of the curve (the black line is the true curve and the red one the estimated curve):
library(pspline)
pspl <- smooth.Pspline(x, y, df=5, method=3)
f0 <- predict(pspl, x, nderiv=0)
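The plotting code for this step is not shown above; a minimal reconstruction consistent with the data-generating process used here (black = true curve, red = pspline estimate) could be:
curve(-1/(1+exp(-x)) - 1, -5, 5, lwd=2, ylim=c(-2, -1))   # true curve
lines(x, f0, lwd=3, lty=2, col="red")                     # estimated curve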
Then, we estimate the first derivative of the curve:
f1 <- predict(pspl, x, nderiv=1)
curve(-exp(-x)/(1+exp(-x))^2,-5,5, lwd=2, ylim=c(-.3,0))
lines(x, f1, lwd=3, lty=2, col="red")
And here is the second derivative:
f2 <- predict(pspl, x, nderiv=2)
curve((exp(-x))/(1+exp(-x))^2 - 2*exp(-2*x)/(1+exp(-x))^3, -5, 5,
      lwd=2, ylim=c(-.15,.15), ylab="")
lines(x, f2, lwd=3, lty=2, col="red")
#DATA
set.seed(42)
x = rnorm(20)
y = rnorm(20)
#Plot the points
plot(x, y, type = "p")
#Obtain points for the smooth curve
temp = loess.smooth(x, y, evaluation = 50) #Use higher evaluation for more points
#Plot smooth curve
lines(temp$x, temp$y, lwd = 2)
#Obtain slope of the smooth curve
slopes = diff(temp$y)/diff(temp$x)
#Add a trend line
abline(lm(y~x))
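To visualise the point-wise slopes computed above, one option (illustrative, not part of the original answer) is to plot them at the midpoints of the evaluation grid:
mid = (temp$x[-1] + temp$x[-length(temp$x)]) / 2   # midpoints between evaluation points
plot(mid, slopes, type = "l", xlab = "x", ylab = "Slope of smoothed curve")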
I really need help figuring this out:
Suppose we are testing H0: µ = 5 against H1: µ < 5 for a normal population with σ = 1. A random sample of size n = 9 is available from this population. The z test is used with α = 0.05. The critical value for this test is 1.645 (reject H0 when z < -1.645), and the sample mean is x̄ = 4.45.
1) On the same graph, use R to plot the sampling distribution of the test statistic when µ = 5 and when µ = 4.2.
2) On your graph, shade and label the area that represents the probability of type I error.
3) On your graph, shade and label the area that represents the probability of type II error.
4) Compute the probability of type II error when µ = 4.2. Provide the appropriate R codes.
I could figure out only 1):
z1 = (4.45 - 5)/(1/sqrt(9))
z1
k1 = seq(from=-1.65, to=+1.65, by=.05)
dens1 = dnorm(k1)
plot(k1, dens1, type="l")
par(new =TRUE)
z2 = (4.45 - 4.2)/(1/sqrt(9))
z2
k2 = seq(from=-.75, to=+0.75, by=.05)
dens2 = dnorm(k2)
p = plot(k2, dens2, type="l", xlab="", ylab="")
An approximation to the graph for (1) is:
curve(dnorm(x,5 ,sqrt(1/9)), xlim=c(0, 14), ylab='', lwd=2, col='blue')
curve(dnorm(x,4.2,sqrt(1/9)), add=T, lwd=2)
curve(dnorm(x,5,1), add=T, col='blue')
curve(dnorm(x,4.2,1), add=T)
legend('topright', c('Samp. dist. for mu=5','Samp. dist. for mu=4.2',
'N(5,1)','N(4.2,1)'),
bty='n', lwd=c(2,2,1,1), col=c(4,1,4,1))
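Part 4) is not covered by the plot above; here is a minimal sketch, assuming the one-sided rejection rule implied by H1: µ < 5 (reject when x̄ < 5 - 1.645·σ/√n):
n <- 9; sigma <- 1; alpha <- 0.05
crit <- 5 - qnorm(1 - alpha) * sigma/sqrt(n)             # rejection cutoff on the x-bar scale, about 4.452
beta <- 1 - pnorm(crit, mean = 4.2, sd = sigma/sqrt(n))  # P(fail to reject | mu = 4.2)
beta                                                     # approximately 0.225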