R calculated gamma distribution density wrong? - r

I am trying to numerically calculate a marginal likelihood (marginalizing over a positive parameter). I am using a Gamma distribution as the prior for that parameter. Here I looked at the behavior of the Gamma distribution for two specific parameter settings:
s = 28.4; r = 17000
plot(x, dgamma(x, shape=s, rate = r), type = 'l', ylab = 'density')
abline(v = s/r, col = 'red')
I got the following results:
Then I tried the following to get a tighter Gamma distribution:
lines(x, dgamma(x, shape=s*1000, rate = r*1000), col = 'blue')
and the result:
I am confused. As the distribution gets tighter, its peak should grow taller, otherwise the area won't integrate to 1. Did I miss anything? Or is there a numerical problem? Thanks!

Your x grid is too coarse: it needs more sample points to capture the narrow peak of the second density function:
x = seq(0, .01, .000001)
s = 28.4; r = 17000
plot(x, dgamma(x, shape=s, rate = r), type = 'l', ylab = 'density')
abline(v = s/r, col = 'red')
lines(x, dgamma(x, shape=s*1000, rate = r*1000), col = 'blue')
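To check this numerically rather than by eye, here is a minimal sketch. The coarse grid spacing of 1e-4 is an assumption on my part, since the question never shows how x was defined; the mode formula (shape - 1) / rate holds for shape > 1.

```r
s <- 28.4
r <- 17000

# Mode of a Gamma(shape, rate) density is (shape - 1) / rate when shape > 1
mode_wide   <- (s - 1) / r
mode_narrow <- (s * 1000 - 1) / (r * 1000)

# True peak heights, evaluated exactly at each mode
peak_wide   <- dgamma(mode_wide,   shape = s,        rate = r)
peak_narrow <- dgamma(mode_narrow, shape = s * 1000, rate = r * 1000)

# Standard deviation of the narrow density: sqrt(shape) / rate
sd_narrow <- sqrt(s * 1000) / (r * 1000)

# Assumed coarse grid (step much larger than sd_narrow) versus a fine grid
coarse <- seq(0, 0.01, by = 1e-4)
fine   <- seq(0, 0.01, by = 1e-6)

max_coarse <- max(dgamma(coarse, shape = s * 1000, rate = r * 1000))
max_fine   <- max(dgamma(fine,   shape = s * 1000, rate = r * 1000))

# Total probability mass on [0, 0.01] for the narrow density
mass_narrow <- pgamma(0.01, shape = s * 1000, rate = r * 1000)

print(c(peak_wide = peak_wide, peak_narrow = peak_narrow,
        max_coarse = max_coarse, max_fine = max_fine,
        sd_narrow = sd_narrow, mass_narrow = mass_narrow))
```

If the grid step is several times larger than sd_narrow, no grid point lands near the mode, so the plotted maximum badly understates the true peak even though dgamma itself is correct.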

Related

Problem with my code- Univariate regression plot not showing lines

This will sound very basic, but I cannot find the solution to this problem with my code. I ran a univariate regression (regr1) between the two variables immigrate_policy and lrgen. When I plot it, the lines drawn by my commands do not show.
One problem could be the sequence. The range of lrgen should actually be between 1 and 9, but I had to manually use 1:8 because every other sequence I tried gives me an error. With this sequence, however, the lines in the plot look weird and are definitely not right.
Following is my code:
regr1 <- lm(formula = ITA$immigrate_policy ~ ITA$lrgen, data = ITA)
summary(regr1)
install.packages("stargazer")
library(stargazer)
help(stargazer)
stargazer(regr1, type ="html",out="project.html")
stargazer(regr1, type="text",out="project/regression.html")
plot(ITA$lrgen, ITA$immigrate_policy,
xlab = "Political Stance of the party", ylab = "Position towards Immigration policies")
abline(regr1, col = "red", lwd = 2)
range(ITA$lrgen)
ci <- data.frame(lrgen = seq(1:8))
sim <- predict(regr1, newdata = ci, interval = "confidence", level = 0.99)
lines(c(1:8),sim[,2], lt = "dashed", lwd = 1, col = "yellow")
lines(c(1:8),sim[,3], lt = "dashed", lwd = 1, col = "yellow")
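A likely cause, offered as a sketch rather than a confirmed diagnosis: when the formula is written with ITA$lrgen, predict() cannot match the lrgen column in newdata and falls back to the original rows, so sim would not have one row per value in the sequence. Also, lt is not a graphical parameter; the line type argument is lty. The data below are simulated stand-ins (hypothetical values, not the asker's ITA data).

```r
# Simulated stand-in for the ITA data (hypothetical values)
set.seed(1)
ITA <- data.frame(lrgen = runif(50, min = 1, max = 9))
ITA$immigrate_policy <- 2 + 0.8 * ITA$lrgen + rnorm(50)

# Bare column names in the formula let predict() match them against newdata
regr1 <- lm(immigrate_policy ~ lrgen, data = ITA)

# Prediction grid spanning the observed range of lrgen
ci <- data.frame(lrgen = seq(min(ITA$lrgen), max(ITA$lrgen), length.out = 100))
sim <- predict(regr1, newdata = ci, interval = "confidence", level = 0.99)

plot(ITA$lrgen, ITA$immigrate_policy,
     xlab = "Political Stance of the party",
     ylab = "Position towards Immigration policies")
abline(regr1, col = "red", lwd = 2)

# Columns of sim are named "fit", "lwr" and "upr"
lines(ci$lrgen, sim[, "lwr"], lty = "dashed", lwd = 1, col = "orange")
lines(ci$lrgen, sim[, "upr"], lty = "dashed", lwd = 1, col = "orange")
```

Using the grid's own lrgen values as the x coordinates keeps the bands aligned with the predictions, whatever range the sequence covers.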

How to reduce cubic equation's maximum peak (fitting)

Ventilation volume data were collected as a function of efficiency. Several samples were taken and fitted with cubic equations.
The fit was done in Excel, which gave a third-order (cubic) regression equation.
However, as you can see from the picture, the fitted ventilation volume at 90-95% is higher than at 100%. The data should never be higher than the value at 100%, but the fitted cubic has a local maximum in that region, so the curve overshoots it.
Is there a way to lower that maximum and still fit the data? I want to use the measured data as they are, but without exceeding the value at 100%.
The use of R or other statistical programs is also welcome.
R values can be a little lower.
Thank you.
I extracted data from the scatterplot and found a good fit to a Gompertz-type sigmoidal equation, "a * exp(-1.0 * exp((x - b)/c)) + Offset". The extracted data gave parameters a = -4.7537951574153149E+03, b = 5.4531406419707224E+01, c = 2.1494180901343391E+01, and Offset = 4.4056239791186508E+03, yielding RMSE = 57.17 and R-squared = 0.9988. If this seems like it might be useful to you, I suggest re-fitting the actual data using these values as the initial parameter estimates.
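A minimal sketch of what that re-fit could look like in R with nls(), assuming the measured values sit in two vectors x and y. The data below are simulated from the equation itself (hypothetical, only so the example runs), and the helper name gompertz is mine.

```r
# Gompertz-type sigmoid from the answer above
gompertz <- function(x, a, b, c, Offset) {
  a * exp(-1.0 * exp((x - b) / c)) + Offset
}

# Parameter estimates quoted in the answer, used as starting values
start_vals <- list(a = -4.7537951574153149E+03,
                   b = 5.4531406419707224E+01,
                   c = 2.1494180901343391E+01,
                   Offset = 4.4056239791186508E+03)

# Simulated stand-in for the measured data (hypothetical; replace with real x, y)
set.seed(1)
x <- rep(seq(0, 100, by = 10), each = 3)
y <- do.call(gompertz, c(list(x = x), start_vals)) + rnorm(length(x), sd = 50)

# Nonlinear least squares, started from the quoted estimates
fit <- nls(y ~ gompertz(x, a, b, c, Offset), start = start_vals)
print(coef(fit))

# Fitted curve on a fine grid
grid <- seq(0, 100, by = 1)
curve_vals <- predict(fit, newdata = data.frame(x = grid))

plot(x, y, xlab = "efficiency", ylab = "ventilation volume")
lines(grid, curve_vals, col = "red", lwd = 2)
```

With a negative a and a positive c the curve rises monotonically and stays below Offset, so it cannot bulge above its own value at 100% the way a cubic can.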
Here are a few ideas in R:
First, I'm making some example data that are similar to yours and fitting a linear model with x^3, x^2, and x as predictors:
# make example data
xx = rep(c(30, 50, 70, 100), each = 10)
yy = 1/(1+exp(-(xx-50)/15)) * 4798.20 + rnorm(length(xx), sd = 20)
xx = c(0, xx)
yy = c(0, yy)
# fit third-order linear model
m0 = lm(yy ~ I(xx^3) + I(xx^2) + xx)
x_to_predict = data.frame(xx = seq(0, 100, length.out = length(xx)))
lm_preds = predict(m0, newdata = x_to_predict)
Idea 1: You could fit a model that uses a sigmoid (or other monotonic) curve.
# fit quasibinomial model for proportion
# first scale response variable between 0 and 1
m1 = glm(I(yy/max(yy)) ~ xx , family = quasibinomial())
# predict
preds_glm = predict(m1,
newdata = x_to_predict,
type = "response")
Idea 2: Fit a generalized additive model that will make a smooth curve.
# fit Generalized Additive Model
library(mgcv)
# you have to tune "k" somewhat -- larger means more "wiggliness"
m2 = gam(yy ~ s(xx, k = 4))
gam_preds = predict(m2,
newdata = x_to_predict,
type = "response")
Here's what the plots for each model look like:
# plot data and predictions
plot(xx, yy, ylab = "result", xlab = "efficiency")
lines(x_to_predict$xx,
preds_glm*max(yy), "l", col = 'red', lwd = 2)
lines(x_to_predict$xx,
gam_preds, "l", col = 'blue', lwd = 2)
lines(x_to_predict$xx, lm_preds,
"l", col = 'black', lwd = 2, lty = 2)
legend("bottomright",
lty = c(0, 1, 1, 2),
legend = c("data", "GLM prediction", "GAM prediction", "third-order lm"),
pch = c(1, NA_integer_, NA_integer_, NA_integer_),
col = c("black", "red", "blue", "black"))

How to fit a curve to a histogram

I've explored similar questions on this topic, but I am having some trouble producing a nice curve on my histogram. I understand that some people may see this as a duplicate, but I haven't found anything so far that solves my problem.
Although the data aren't visible here, here are some variables I am using, so you can see what they represent in the code below.
Differences <- subset(Score_Differences, select = Difference, drop = T)
m = mean(Differences)
std = sqrt(var(Differences))
Here is the very first curve I produce (the code seems most common and easy to produce but the curve itself doesn't fit that well).
hist(Differences, density = 15, breaks = 15, probability = TRUE, xlab = "Score Differences", ylim = c(0,.1), main = "Normal Curve for Score Differences")
curve(dnorm(x,m,std),col = "Red", lwd = 2, add = TRUE)
I really like this but don't like the curve going into the negative region.
hist(Differences, probability = TRUE)
lines(density(Differences), col = "Red", lwd = 2)
lines(density(Differences, adjust = 2), lwd = 2, col = "Blue")
This is the same histogram as the first, but with frequencies. Still doesn't look that nice.
h = hist(Differences, density = 15, breaks = 15, xlab = "Score Differences", main = "Normal Curve for Score Differences")
xfit = seq(min(Differences),max(Differences))
yfit = dnorm(xfit,m,std)
yfit = yfit*diff(h$mids[1:2])*length(Differences)
lines(xfit, yfit, col = "Red", lwd = 2)
Another attempt but no luck. Maybe because I am using qnorm, when the data obviously isn't normal. The curve goes into the negative direction again.
sample_x = seq(qnorm(.001, m, std), qnorm(.999, m, std), length.out = l)
binwidth = 3
breaks = seq(floor(min(Differences)), ceiling(max(Differences)), binwidth)
hist(Differences, breaks)
lines(sample_x, l*dnorm(sample_x, m, std)*binwidth, col = "Red")
The only curve that visually looks nice is the 2nd, but the curve falls into the negative direction.
My question is: is there a "standard way" to place a curve on a histogram? This data certainly isn't normal. Three of the procedures I presented here are from similar posts, but I am obviously having some trouble. I feel like every method of fitting a curve will depend on the data you're working with.
Update with solution
Thanks to Zheyuan Li and others! I will leave this up for my own reference and hopefully others as well.
hist(Differences, probability = TRUE)
lines(density(Differences, cut = 0), col = "Red", lwd = 2)
lines(density(Differences, adjust = 2, cut = 0), lwd = 2, col = "Blue")
OK, so you are just struggling with the fact that the density extends beyond the "natural range" of the data. Well, just set cut = 0. For the reason, you may want to read "plot.density extends 'xlim' beyond the range of my data. Why and how to fix it?". In that answer I was using from and to, but here I am using cut.
## consider a mixture, that does not follow any parametric distribution family
## note, by construction, this is a strictly positive random variable
set.seed(0)
x <- rbeta(1000, 3, 5) + rexp(1000, 0.5)
## (kernel) density estimation offers a flexible nonparametric approach
d <- density(x, cut = 0)
## you can plot histogram and density on the density scale
hist(x, prob = TRUE, breaks = 50)
lines(d, col = 2)
Note that with cut = 0 the density estimate is evaluated strictly within range(x), so the plotted curve does not extend outside this range.
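To see what cut changes, here is a small sketch comparing the default (cut = 3) with cut = 0 on the same simulated data. The helper name area is mine; it is a plain trapezoid rule over the returned grid.

```r
set.seed(0)
x <- rbeta(1000, 3, 5) + rexp(1000, 0.5)

d_default <- density(x)           # default cut = 3: grid extends 3 bandwidths past the data
d_cut0    <- density(x, cut = 0)  # grid stops exactly at min(x) and max(x)

# Evaluation ranges compared with the data range
print(rbind(data    = range(x),
            default = range(d_default$x),
            cut0    = range(d_cut0$x)))

# Trapezoid-rule area under each estimated curve
area <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(c(default = area(d_default), cut0 = area(d_cut0)))
```

Both calls use the same bandwidth; cut only moves the end points of the evaluation grid. Because the kernel tails beyond range(x) are simply not drawn, the area under the cut = 0 curve can come out slightly below 1, which is usually harmless for plotting.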

Why doesn't my GAM fit seem to have a correct intercept? [R]

My GAM curves are being shifted downwards. Is there something wrong with the intercept? I'm using the same code as in An Introduction to Statistical Learning. Any help is appreciated.
Here's the code. I simulated some data (a straight line with noise), and fit GAM multiple times using bootstrap.
(It took me a while to figure out how to plot multiple GAM fits in one graph. Thanks to Sam's answer in this post, and to this post.)
library(gam)
N = 1e2
set.seed(123)
dat = data.frame(x = 1:N,
y = seq(0, 5, length = N) + rnorm(N, mean = 0, sd = 2))
plot(dat$x, dat$y, xlim = c(1,100), ylim = c(-5,10))
gamFit = vector('list', 5)
for (ii in 1:5){
ind = sample(1:N, N, replace = T) #bootstrap
gamFit[[ii]] = gam(y ~ s(x, 10), data = dat, subset = ind)
par(new=T)
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
}
The issue is with plot.gam. If you take a look at the help page (?plot.gam), there is a parameter called scale, which states:
a lower limit for the number of units covered by the limits on the ‘y’ for each plot. The default is scale=0, in which case each plot uses the range of the functions being plotted to create their ylim. By setting scale to be the maximum value of diff(ylim) for all the plots, then all subsequent plots will produced in the same vertical units. This is essential for comparing the importance of fitted terms in additive models.
This is an issue, since you are not using the range of the function being plotted (i.e. the range of y is not -5 to 10). So what you need to do is change
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
to
plot(gamFit[[ii]], col = 'blue',
scale = 15,
axes = F, xlab='', ylab='')
And you get:
Or you can just remove the xlim and ylim parameters from both calls to plot, and the automatic setting of plot to use the full range of the data will make everything work.
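An alternative sketch, not the method from the answer above: as far as I can tell the plot method draws the centred smooth term rather than fitted values, so another way to overlay bootstrap fits on the data scale is to call predict() and add the curves with lines(). This example swaps in mgcv::gam (s(x, k = 10)) instead of the gam package used in the question; the names grid and boot_preds are mine.

```r
library(mgcv)  # swapped in for the gam package used in the question

N <- 100
set.seed(123)
dat <- data.frame(x = 1:N,
                  y = seq(0, 5, length.out = N) + rnorm(N, mean = 0, sd = 2))

plot(dat$x, dat$y, xlim = c(1, 100), ylim = c(-5, 10))

# Common grid on which every bootstrap fit is evaluated
grid <- data.frame(x = 1:N)

boot_preds <- sapply(1:5, function(ii) {
  ind <- sample(N, N, replace = TRUE)              # bootstrap resample
  fit <- gam(y ~ s(x, k = 10), data = dat[ind, ])  # refit on the resample
  preds <- as.numeric(predict(fit, newdata = grid))
  lines(grid$x, preds, col = "blue")               # fitted values include the intercept
  preds
})
```

Since predict() returns fitted values with the intercept included, the curves sit on the same vertical scale as the points and no par(new = TRUE) or axis matching is needed.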

Plot normal, left and right skewed distribution in R

I want to create 3 plots for illustration purposes:
- normal distribution
- right skewed distribution
- left skewed distribution
This should be an easy task, but I found only this link, which only shows a normal distribution. How do I do the rest?
If you are not too tied to the normal distribution, then I suggest you use the beta distribution, which can be symmetric, right skewed or left skewed depending on the shape parameters.
hist(rbeta(10000,5,2))  # left (negatively) skewed
hist(rbeta(10000,2,5))  # right (positively) skewed
hist(rbeta(10000,5,5))  # symmetric
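To confirm which shape parameters give which skew, here is a short sketch that computes the sample skewness of each draw. The skewness helper is written by hand (a simple moment estimator), not taken from a package.

```r
set.seed(42)

# Simple moment-based sample skewness (hand-written helper)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

samples <- list(left      = rbeta(10000, 5, 2),
                right     = rbeta(10000, 2, 5),
                symmetric = rbeta(10000, 5, 5))

# Negative = left skewed, positive = right skewed, near zero = symmetric
skews <- sapply(samples, skewness)
print(round(skews, 3))

# Draw the three histograms side by side
par(mfrow = c(1, 3))
for (nm in names(samples)) {
  hist(samples[[nm]], main = nm, xlab = "x", col = "lightblue")
}
```

The rule of thumb for Beta(a, b): a > b puts the mass near 1 with a tail to the left, a < b does the opposite, and a = b is symmetric around 0.5.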
I finally got it working with help from both of you, although I mainly relied on this site.
N <- 10000
x <- rnbinom(N, 10, .5)
hist(x,
xlim=c(min(x),max(x)), probability=T, nclass=max(x)-min(x)+1,
col='lightblue', xlab=' ', ylab=' ', axes=F,
main='Positive Skewed')
lines(density(x,bw=1), col='red', lwd=3)
This is also a valid solution:
curve(dbeta(x,8,4),xlim=c(0,1))
title(main="posterior distribution of p")
Just use the fGarch package and these functions:
dsnorm(x, mean = 0, sd = 1, xi = 1.5, log = FALSE)
psnorm(q, mean = 0, sd = 1, xi = 1.5)
qsnorm(p, mean = 0, sd = 1, xi = 1.5)
rsnorm(n, mean = 0, sd = 1, xi = 1.5)
Arguments: mean is the location parameter, sd is the scale parameter, and xi is the skewness parameter.
Examples
## snorm -
library(fGarch)
# Random numbers:
par(mfrow = c(2, 2))
set.seed(1953)
r = rsnorm(n = 1000)
plot(r, type = "l", main = "snorm", col = "steelblue")
# Plot empirical density and compare with true density:
hist(r, n = 25, probability = TRUE, border = "white", col = "steelblue")
box()
x = seq(min(r), max(r), length = 201)
lines(x, dsnorm(x), lwd = 2)
# Plot df and compare with true df:
plot(sort(r), (1:1000/1000), main = "Probability", col = "steelblue",
ylab = "Probability")
lines(x, psnorm(x), lwd = 2)
# Compute quantiles:
round(qsnorm(psnorm(q = seq(-1, 5, by = 1))), digits = 6)
