I am new to R and would like to add a fit to a gamma distribution to my histogram. I would like the gamma distribution fit to overlay my histogram.
I am able to calculate the gamma distribution with the dgamma function and also with the fitdist function. However, I am not able to overlay this gamma distribution as a fit onto my histogram.
This is the code I tried:
hist(mydata, breaks = 30, freq = FALSE, col = "grey")
lines(dgamma(mydata, shape = 1))
The code I tried does not overlay the gamma distribution fit onto my histogram. I only get the histogram without the fit.
See if the following example can help in overlaying
a fitted line in black
a PDF graph in red, dotted
on a histogram.
First, create a dataset.
set.seed(1234) # Make the example reproducible
mydata <- rgamma(100, shape = 1, rate = 1)
Now fit a gamma distribution to the data.
param <- MASS::fitdistr(mydata, "gamma")
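This vector is a grid of x values on which a density can be evaluated, for example with lines(). Note that curve() below generates its own grid, so x is optional here.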
x <- seq(min(mydata), max(mydata), length.out = 100)
And plot them all.
hist(mydata, breaks = 30, freq = FALSE, col = "grey", ylim = c(0, 1))
curve(dgamma(x, shape = param$estimate[1], rate = param$estimate[2]), add = TRUE)
lines(sort(mydata), dgamma(sort(mydata), shape = 1),
col = "red", lty = "dotted")
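The attempt in the question draws nothing useful because lines() given a single vector plots it against its index (1, 2, ..., n) rather than against the data values. A minimal sketch of the same overlay with explicit x/y pairs is below; the names grid, dens and fit are illustrative and not part of the original answer.

```r
# Sketch: overlay a fitted gamma density using explicit (x, y) pairs.
set.seed(1234)
mydata <- rgamma(100, shape = 1, rate = 1)

# fitdistr() returns estimates named "shape" and "rate" for the gamma family.
fit <- MASS::fitdistr(mydata, "gamma")

# Evaluate the fitted density on an evenly spaced, sorted grid.
grid <- seq(min(mydata), max(mydata), length.out = 200)
dens <- dgamma(grid,
               shape = fit$estimate[["shape"]],
               rate  = fit$estimate[["rate"]])

hist(mydata, breaks = 30, freq = FALSE, col = "grey")
lines(grid, dens, lwd = 2)  # x values are supplied, so the curve lands on the histogram
```

Because the grid is sorted and dense, lines() here should give the same picture as the curve() call above.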
Related
I need to plot theoretical Pareto distribution in R.
I want this as a line - not points and not polylines.
My distribution function is 1−(1/x)^2.
I plotted the empirical distribution of my sample and the theoretical distribution on one graph:
ecdf(b2)
plot(ecdf(b2))
lines(x, (1-(1/x)^2), col = "red", lwd = 2, xlab = "", ylab = "")
But I got:
You can see that the red line is not smooth; it looks like a polyline. Is it possible to get a smooth, continuous red line?
Do you have any advice?
Use curve() instead.
library(EnvStats)
set.seed(8675309)
# You did not supply the contents of b2 so I generated some
b2 <- rpareto(100, 1, 2)
plot(ecdf(b2))
ppareto <- function(x) 1 - (1/x)^2  # ASCII minus sign; a Unicode minus does not parse in R
curve(ppareto, col = "red", add = TRUE)
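If EnvStats is not available, the same idea can be sketched in base R. The sample below is generated by inverse-transform sampling from the CDF in the question, F(x) = 1 - (1/x)^2 for x >= 1; the function name pareto_cdf and the choice of n = 500 evaluation points are assumptions made for illustration.

```r
# Sketch: smooth theoretical CDF over an empirical CDF, base R only.
set.seed(8675309)

# Inverse transform: if U ~ Uniform(0, 1), then (1 - U)^(-1/2) has CDF 1 - (1/x)^2.
b2 <- (1 - runif(100))^(-1/2)

pareto_cdf <- function(x) 1 - (1 / x)^2  # hypothetical helper name

plot(ecdf(b2))
# curve() evaluates the function on a dense, sorted grid, so the line looks smooth.
curve(pareto_cdf(x), from = 1, to = max(b2), n = 500,
      add = TRUE, col = "red", lwd = 2)
```

A jagged line from lines() usually means the x values were sparse or unsorted; curve() sidesteps both by building its own grid.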
I need to overlay a normal distribution curve based on a dataset on a histogram of the same dataset.
I get the histogram and the normal curve right individually. But the curve just stays a flat line when added to the histogram with the add = TRUE argument of the curve function.
I tried adjusting xlim and ylim, but I am not getting the intended result. I am confused about how to set the x and y limits to suit both the histogram and the curve.
Any suggestions? My dataset is a set of daily walking distances for 100 individuals, ranging from min = 0.4 km to max = 10 km.
bd.m <- read_excel('walking.xlsx')
hist(bd.m, ylim = c(0,10))
curve(dnorm(x, mean = mean(bd.m), sd = sd(bd.m)), add = TRUE, col = 'red')
You need to set freq = FALSE in the call to hist, so that the histogram is drawn on the density scale rather than as counts. For example:
dt <- rnorm(1000, 2)
hist(dt, freq = FALSE)
curve(dnorm(x, mean = mean(dt), sd = sd(dt)), add = TRUE, col = 'red')
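To see why the scale matters: on the density scale the bars have total area 1, which matches what dnorm() returns, whereas on the count scale the curve would need to be multiplied by the sample size and the bin width. The sketch below checks the area and shows both variants; the seed and the variable names h, area and binwidth are illustrative choices.

```r
# Sketch: density scale versus count scale for a normal overlay.
set.seed(42)
dt <- rnorm(1000, mean = 2)

# Compute the histogram without plotting to inspect its pieces.
h <- hist(dt, plot = FALSE)
area <- sum(h$density * diff(h$breaks))  # total bar area on the density scale

# Variant 1: histogram on the density scale, curve unchanged.
hist(dt, freq = FALSE)
curve(dnorm(x, mean = mean(dt), sd = sd(dt)), add = TRUE, col = "red")

# Variant 2: histogram of counts, curve rescaled by n * bin width
# (assumes equal-width bins, which is the default).
binwidth <- diff(h$breaks)[1]
hist(dt)
curve(dnorm(x, mean = mean(dt), sd = sd(dt)) * length(dt) * binwidth,
      add = TRUE, col = "red")
```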
I can create a lognormal probability plot using the probplot() function from the e1071 package. A problem arises when I try to add another set of lognormal data to the first plot. Although I use the command par(new=T), the x-axes of the two plots are different and don't align.
Is there another way to go about this?
I tried using the points() function. However, it appears I need the x and y coordinates to plot it and I don't know how to extract the x, y coordinates from the probplot() function.
# Program to plot random logn failure times with probability plot
library(e1071)
logn_prob_plot <- function() {
set.seed(1)
x<-rlnorm(10,1,1)
par(bty="l")
par(col.lab="white")
p<-probplot(x,qdist=qlnorm)
par(col.lab="black")
mtext(text="failure time", col="black",side=1,line=3,outer=F)
mtext(text="lognormal probability", col="black",side=2,line=3,outer=F)
set.seed(2)
y=rlnorm(10,2,3)
par(new=T)
par(col.lab="white")
probplot(y,qdist=qlnorm,xlab="fail time",ylab="lognormal probability")
par(col.lab="black")
mtext(text="failure time", col="black",side=1,line=3,outer=F)
mtext(text="lognormal probability", col="black",side=2,line=3,outer=F)
}
logn_prob_plot()
My expected result is two groups of data on the same probability plot with the same x and y axes. Instead, I get two different x-axes that are not aligned.
First, let's simulate the variables:
set.seed(1)
x<-rlnorm(10,1,1)
set.seed(2)
y=rlnorm(10,2,3)
The first probplot is:
p<-probplot(x,qdist=qlnorm, meanlog = 1, sdlog = 1)
which produces the output:
The second probplot is:
q <- probplot(y,qdist=qlnorm,meanlog = 2, sdlog = 3)
which produces the output:
Your best shot at merging them is to use the scale of the smaller one and discard some points:
p<-probplot(x,qdist=qlnorm, meanlog = 1, sdlog = 1)
points(sort(x), p[[1]](ppoints(length(x))), col = "red", pch = 19)
lines(q, col = "blue")
points(sort(y), q[[1]](ppoints(length(y))), col = "blue", pch = 19)
which gives:
The red line and points are from the distribution with meanlog = 1, sdlog = 1 and the
blue ones are from the one with meanlog = 2, sdlog = 3.
I further have to warn you that from reading the code of the probplot() function:
xl <- quantile(x, c(0.25, 0.75))
yl <- qdist(c(0.25, 0.75), ...)
slope <- diff(yl)/diff(xl)
the slope of the line is determined only by the positions of the first and third quartiles, not by what happens elsewhere in the data.
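A small base R sketch can make that warning concrete. It repeats the three quoted lines by hand, with qlnorm standing in for qdist, and shows that inflating the largest observation leaves the slope unchanged. The helper name quartile_slope is hypothetical and not part of e1071.

```r
# Sketch: the quartile-based slope ignores points outside the middle of the data.
set.seed(1)
x <- rlnorm(10, 1, 1)

# Hypothetical helper mirroring the quoted lines, with qdist fixed to qlnorm.
quartile_slope <- function(x, meanlog, sdlog) {
  xl <- quantile(x, c(0.25, 0.75))
  yl <- qlnorm(c(0.25, 0.75), meanlog = meanlog, sdlog = sdlog)
  unname(diff(yl) / diff(xl))
}

s_original <- quartile_slope(x, meanlog = 1, sdlog = 1)

# Make the largest observation 100 times larger; the quartiles do not move.
x_outlier <- x
x_outlier[which.max(x_outlier)] <- 100 * max(x_outlier)
s_outlier <- quartile_slope(x_outlier, meanlog = 1, sdlog = 1)

isTRUE(all.equal(s_original, s_outlier))  # the two slopes agree
```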
My GAM curves are being shifted downwards. Is there something wrong with the intercept? I'm using the same code as in An Introduction to Statistical Learning. Any help is appreciated.
Here's the code. I simulated some data (a straight line with noise), and fit GAM multiple times using bootstrap.
(It took me a while to figure out how to plot multiple GAM fits in one graph. Thanks to this post Sam's answer, and this post)
library(gam)
N = 1e2
set.seed(123)
dat = data.frame(x = 1:N,
y = seq(0, 5, length = N) + rnorm(N, mean = 0, sd = 2))
plot(dat$x, dat$y, xlim = c(1,100), ylim = c(-5,10))
gamFit = vector('list', 5)
for (ii in 1:5){
ind = sample(1:N, N, replace = T) #bootstrap
gamFit[[ii]] = gam(y ~ s(x, 10), data = dat, subset = ind)
par(new=T)
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
}
The issue is with plot.gam. If you take a look at the help page (?plot.gam), there is a parameter called scale, which states:
a lower limit for the number of units covered by the limits on the ‘y’ for each plot. The default is scale=0, in which case each plot uses the range of the functions being plotted to create their ylim. By setting scale to be the maximum value of diff(ylim) for all the plots, then all subsequent plots will produced in the same vertical units. This is essential for comparing the importance of fitted terms in additive models.
This is an issue because you are overriding the range of the function being plotted (the range of the plotted smooth is not -5 to 10). So what you need to do is change
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
to
plot(gamFit[[ii]], col = 'blue',
scale = 15,
axes = F, xlab='', ylab='')
And you get:
Or you can just remove the xlim and ylim parameters from both calls to plot, and the automatic setting of plot to use the full range of the data will make everything work.
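Another way to avoid the axis mismatch altogether is to skip par(new = T) and add each bootstrap fit to the existing scatter plot with lines(), using fitted values on the scale of y. The sketch below swaps in base R's smooth.spline() for gam() so that it runs without extra packages; this is a substitution for illustration, not the method from the question, and the names grid and fits are assumptions. With the gam package, predict() on a grid of x values should serve the same purpose.

```r
# Sketch: overlay bootstrap smooths on the data scale with lines().
# smooth.spline() stands in for gam(); this is a swapped-in technique.
set.seed(123)
N <- 100
dat <- data.frame(x = 1:N,
                  y = seq(0, 5, length.out = N) + rnorm(N, mean = 0, sd = 2))

plot(dat$x, dat$y, xlim = c(1, 100), ylim = c(-5, 10))

grid <- seq(1, N, length.out = 200)                      # common x grid for every fit
fits <- matrix(NA_real_, nrow = length(grid), ncol = 5)  # one column per bootstrap fit

for (ii in 1:5) {
  ind <- sample(N, N, replace = TRUE)                    # bootstrap resample
  fit <- smooth.spline(dat$x[ind], dat$y[ind], df = 10)
  fits[, ii] <- predict(fit, x = grid)$y                 # fitted values on the scale of y
  lines(grid, fits[, ii], col = "blue")                  # drawn in the existing coordinates
}
```

Because every curve is drawn in the coordinate system set up by the first plot() call, no limits need to be kept in sync by hand.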
Say I have some data, d, and I fit nls models to two subsets of the data.
x<- seq(0,4,0.1)
y1<- (x*2 / (0.2 + x))
y1<- y1+rnorm(length(y1),0,0.2)
y2<- (x*3 / (0.2 + x))
y2<- y2+rnorm(length(y2),0,0.4)
d<-data.frame(x,y1,y2)
m.y1<-nls(y1~v*x/(k+x),start=list(v=1.9,k=0.19),data=d)
m.y2<-nls(y2~v*x/(k+x),start=list(v=2.9,k=0.19),data=d)
I then want to plot the fitted model regression line over data, and shade the prediction interval. I can do this with the package investr and get nice plots for each subset individually:
require(investr)
plotFit(m.y1,interval="prediction",ylim=c(0,3.5),pch=19,col.pred='light blue',shade=T)
plotFit(m.y2,interval="prediction",ylim=c(0,3.5),pch=19,col.pred='pink',shade=T)
However, if I plot them together I have a problem. The shading of the second plot covers the points and shading of the first plot:
1: How can I make sure the points on the first plot end up on top of the shading of the second plot?
2: How can I make the region where the shaded prediction intervals overlap a new color (like purple, or any fusion of the two colors that are overlapping)?
Use adjustcolor to add transparency like this:
plotFit(m.y1, interval = "prediction", ylim = c(0,3.5), pch = 19,
col.pred = adjustcolor("lightblue", 0.5), shade = TRUE)
par(new = TRUE)
plotFit(m.y2, interval = "prediction", ylim = c(0,3.5), pch = 19,
col.pred = adjustcolor("lightpink", 0.5), shade = TRUE)
Depending on what you want you can play around with the two transparency values (here both set to 0.5) and possibly make only one of them transparent.
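The blending effect can be seen without investr in a small base graphics sketch: two translucent polygons overlap, and points drawn last stay on top. The polygon coordinates and point positions here are made up purely for illustration.

```r
# Sketch: translucent fills blend where they overlap; later drawing goes on top.
blue_half <- adjustcolor("lightblue", alpha.f = 0.5)
pink_half <- adjustcolor("lightpink", alpha.f = 0.5)

# Empty plot with the same limits as in the question.
plot(0, 0, type = "n", xlim = c(0, 4), ylim = c(0, 3.5), xlab = "x", ylab = "y")

# Two overlapping bands (made-up shapes standing in for prediction intervals).
polygon(c(0, 4, 4, 0), c(0.5, 1.5, 2.5, 1.5), col = blue_half, border = NA)
polygon(c(0, 4, 4, 0), c(1.0, 2.0, 3.2, 2.2), col = pink_half, border = NA)

# Points added after the shading are not covered by it.
points(c(1, 2, 3), c(1.2, 1.6, 2.0), pch = 19)
```

By the same logic, calling points(d$x, d$y1, pch = 19) after the second plotFit() call should bring the first set of points back on top of the shading.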