Histogram overlay not visible - r

I need to overlay a normal distribution curve based on a dataset on a histogram of the same dataset.
I get the histogram and the normal curve right individually. But the curve just stays a flat line when combined to the histogram using the add = TRUE attribute in the curve function.
I did try adjusting the xlim and ylim to check if it works but am not getting the intended results, I am confused about how to set the (x and y) limits to suit both the histogram and the curve.
Any suggestions? My dataset is a set of values for 100 individuals daily walk distances ranging from min = 0.4km to max = 10km
bd.m <- read_excel('walking.xlsx')
hist(bd.m, ylim = c(0,10))
curve(dnorm(x, mean = mean(bd.m), sd = sd(bd.m)), add = TRUE, col = 'red')

You need to set freq = FALSEin the call to hist. For example:
dt <- rnorm(1000, 2)
hist(dt, freq = F)
curve(dnorm(x, mean = mean(dt), sd = sd(dt)), add = TRUE, col = 'red')

Related

How to plot a function with plot() in R with a step size of 0.5?

New to R. I'm given a function f(x)=x^3-3x+7. I need to plot this function in red and plot its derivative with a step-size of 0.5 in blue. dfx vs x in the same graph. I have plotted f(x), but i cant plot dfx with adjusted step size.
My code:
f1 <- function(x) {x^3-3*x+7}
exp = D(expression(x^3-3*x+7),'x')
f2 <- function(x) {D(exp,'x')}
curve(f1,from=-2,to=2,col='red')
curve(exp,col='blue',add=TRUE,type='p')
I need the points to be plotted at an interval of 0.5 and also draw a line to connect them
I'm not 100% sure what you need. If by changing the step size you mean that both curves be entirely visible in the same plot, that can be done by increasing the limits on the y-axis:
curve(f1, from = -2, to = 2, col='red', add = F, ylim = c(0,10))
curve(exp, col = 'blue', add = T, type = 'p')

Formatting histograms in R

I'm trying to fit Variance-Gamma distribution to empirical data of 1-minute logarithmic returns. In order to visualize the results I plotted together 2 histograms: empirical and theoretical.
(a is the vector of empirical data)
SP_hist <- hist(a,
col = "lightblue",
freq = FALSE,
breaks = seq(a, max(a), length.out = 141),
border = "white",
main = "",
xlab = "Value",
xlim = c(-0.001, 0.001))
hist(VG_sim_rescaled,
freq = FALSE,
breaks = seq(min(VG_sim_rescaled), max(VG_sim_rescaled), length.out = 141),
xlab = "Value",
main = "",
col = "orange",
add = TRUE)
(empirical histogram-blue, theoretical histogram-orange)
However, after having plotted 2 histograms together, I started wondering about 2 things:
In both histograms I stated, that freq = FALSE. Therefore, the y-axis should be in range (0, 1). In the actual picture values on the y-axis exceed 3,000. How could it happen? How to solve it?
I need to change the bucketing size (the width of the buckets) and the density per unit length of the x-axis. How is it possible to do these tasks?
Thank you for your help.
freq=FALSE means that the area of the entire histogram is normalized to one. As your x-axis has a very small range (about 10^(-4)), the y-values must be quite large to achieve an area (= x times y) of one.
The only way to set the number of bins is by providing a vector of break points to the parameter breaks. Theoretically, this parameter also accepts a single number, but this number is ignored by hist. Thus try the following:
bins <- 6 # number of cells
breaks <- seq(min(x),max(x),(max(x)-min(x))/bins)
hist(x, freq=FALSE, breaks=breaks)

How to overlay density histogram with gamma distribution fit in R?

I am new to R and would like to add a fit to a gamma distribution to my histogram. I would like the gamma distribution fit to overlay my histogram.
I am able to calculate the gamma distribution with the dgamma function and also with the fitdist function. However, I am not able to overlay this gamma distribution as a fit onto my histogram.
This is the code I tried:
hist(mydata, breaks = 30, freq = FALSE, col = "grey")
lines(dgamma(mydata, shape = 1))
The code I tried does not overlay the gamma distribution fit onto my histogram. I only get the histogram without the fit.
See if the following example can help in overlaying
a fitted line in black
a PDF graph in red, dotted
on a histogram.
First, create a dataset.
set.seed(1234) # Make the example reproducible
mydata <- rgamma(100, shape = 1, rate = 1)
Now fit a gamma distribution to the data.
param <- MASS::fitdistr(mydata, "gamma")
This vector is needed for the fitted line.
x <- seq(min(mydata), max(mydata), length.out = 100)
And plot them all.
hist(mydata, breaks = 30, freq = FALSE, col = "grey", ylim = c(0, 1))
curve(dgamma(x, shape = param$estimate[1], rate = param$estimate[2]), add = TRUE)
lines(sort(mydata), dgamma(sort(mydata), shape = 1),
col = "red", lty = "dotted")

why my GAM fit doesn't seem to have a correct intecept? [R]

My GAM curves are being shifted downwards. Is there something wrong with the intercept? I'm using the same code as Introduction to statistical learning... Any help's appreciated..
Here's the code. I simulated some data (a straight line with noise), and fit GAM multiple times using bootstrap.
(It took me a while to figure out how to plot multiple GAM fits in one graph. Thanks to this post Sam's answer, and this post)
library(gam)
N = 1e2
set.seed(123)
dat = data.frame(x = 1:N,
y = seq(0, 5, length = N) + rnorm(N, mean = 0, sd = 2))
plot(dat$x, dat$y, xlim = c(1,100), ylim = c(-5,10))
gamFit = vector('list', 5)
for (ii in 1:5){
ind = sample(1:N, N, replace = T) #bootstrap
gamFit[[ii]] = gam(y ~ s(x, 10), data = dat, subset = ind)
par(new=T)
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
}
The issue is with plot.gam. If you take a look at the help page (?plot.gam), there is a parameter called scale, which states:
a lower limit for the number of units covered by the limits on the ‘y’ for each plot. The default is scale=0, in which case each plot uses the range of the functions being plotted to create their ylim. By setting scale to be the maximum value of diff(ylim) for all the plots, then all subsequent plots will produced in the same vertical units. This is essential for comparing the importance of fitted terms in additive models.
This is an issue, since you are not using range of the function being plotted (i.e. the range of y is not -5 to 10). So what you need to do is change
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
to
plot(gamFit[[ii]], col = 'blue',
scale = 15,
axes = F, xlab='', ylab='')
And you get:
Or you can just remove the xlim and ylim parameters from both calls to plot, and the automatic setting of plot to use the full range of the data will make everything work.

Combining 2 datasets in a single plot in R

I have two columns of data, f.delta and g.delta that I would like to produce a scatter plot of in R.
Here is how I am doing it.
plot(f.delta~x, pch=20, col="blue")
points(g.delta~x, pch=20, col="red")
The problem is this: the values of f.delta vary from 0 to -7; the values of g.delta vary from 0 to 10.
When the plot is drawn, the y axis extends from 1 to -7. So while all the f.delta points are visible, any g.delta point that has y>1 is cut-off from view.
How do I stop R from automatically setting the ylims from the data values. Have tried, unsuccessfully, various combinations of yaxt, yaxp, ylims.
Any suggestion will be greatly appreciated.
Thanks,
Anjan
In addition to Gavin's excellent answer, I also thought I'd mention that another common idiom in these cases is to create an empty plot with the correct limits and then to fill it in using points, lines, etc.
Using Gavin's example data:
with(df,plot(range(x),range(f.delta,g.delta),type = "n"))
points(f.delta~x, data = df, pch=20, col="blue")
points(g.delta~x, data = df, pch=20, col="red")
The type = "n" causes plot to create only the empty plotting window, based on the range of x and y values we've supplied. Then we use points for both columns on this existing plot.
You need to tell R what the limits of the data are and pass that as argument ylim to plot() (note the argument is ylim not ylims!). Here is an example:
set.seed(1)
df <- data.frame(f.delta = runif(10, min = -7, max = 0),
g.delta = runif(10, min = 0, max = 10),
x = rnorm(10))
ylim <- with(df, range(f.delta, g.delta)) ## compute y axis limits
plot(f.delta ~ x, data = df, pch = 20, col = "blue", ylim = ylim)
points(g.delta ~ x, data = df, pch = 20, col = "red")
Which produces

Resources