Overlaying a normal pdf onto a histogram in R

I've looked through similar questions, but I couldn't resolve mine using their answers. I have a histogram of Body Mass Index data. I have found the mean and sd of the data, and I am trying to overlay a normal pdf with the same mean and sd to compare it with the histogram.
Here is what I have so far.
I used hist(BMI) to plot the histogram. I found mean(BMI) to be 26.65 and sd(BMI) to be 3.47.
I am trying to use the curve function to plot a normal curve with those same parameters over the histogram.
curve(dnorm(x, mean = 26.65, sd = 3.47), add = TRUE, col = "red")
[Image: HistOverlay]
As you can see, the red curve is barely visible at the very bottom of the histogram. Why is this error occurring?
Thank you.

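The normal curve is a probability density function, so its integral is 1. The density value at a given BMI is not itself a probability; it is probability per unit of BMI. Your histogram, however, shows counts, so the total area of its bars is the number of observations times the bin width. To put the curve on the same scale, you need to multiply the density by that factor (or draw the histogram with freq = FALSE so that it shows densities instead of counts).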
Something along these lines, where 100 stands in for (number of observations) * (bin width):
curve(100*dnorm(x, mean = 26.65, sd = 3.47), add = TRUE, col = "red")
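A runnable sketch of both options. It assumes a simulated BMI vector in place of the real data, and takes the bin width from the object that hist() returns (hist() uses equal-width bins by default), so the scale factor is computed instead of hard-coded.

```r
# Simulated stand-in for the BMI vector (assumption: the real data is not shown)
set.seed(1)
BMI <- rnorm(500, mean = 26.65, sd = 3.47)
m <- mean(BMI)
s <- sd(BMI)

# Option 1: keep the histogram on the count scale and rescale the density.
# hist() draws the plot and returns the breaks and counts it used.
h <- hist(BMI)
binwidth <- diff(h$breaks)[1]            # default breaks are equally spaced
scale_factor <- length(BMI) * binwidth   # n * bin width, in place of the "100"
curve(scale_factor * dnorm(x, mean = m, sd = s), add = TRUE, col = "red")

# Option 2: put the histogram on the density scale so dnorm() fits unchanged
hist(BMI, freq = FALSE)
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red")
```

Option 2 is usually simpler, because then both the bars and the curve integrate to 1.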

Related

exponential curve fit on histogram in R

I made a histogram in R and I have to fit an exponential curve to it.
But the curve doesn't appear on the histogram.
This is the code:
hist(Adat$price, main="histogram",xlab="data")
curve(dexp(x, rate=1,log=FALSE), add = TRUE)
Could someone help me please?
You need to set the argument freq=FALSE if you want the histogram to be normalized (so that the bars show densities rather than counts):
set.seed(32418)
sim <- rexp(100) + rnorm(100,0,.01)
hist(sim, freq=FALSE)
curve(dexp(x, rate=1, log=FALSE), add = TRUE)
Otherwise, the bar heights are counts, which grow with the number of samples. In fact, the curve technically did appear on your graph; it's just so small relative to the counts that you can't distinguish it from a flat line at y = 0.
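Even with freq = FALSE, rate = 1 only matches data whose mean is about 1. Prices are often on a much larger scale, so a sketch like the following estimates the rate from the data instead. It assumes simulated values in place of Adat$price and uses the fact that the maximum-likelihood estimate of an exponential rate is 1 / mean.

```r
# Hypothetical stand-in for Adat$price: simulated exponential data with mean 50
set.seed(32418)
price <- rexp(200, rate = 1 / 50)

# Maximum-likelihood estimate of the exponential rate: 1 / sample mean
rate_hat <- 1 / mean(price)

# Density-scale histogram, then the fitted exponential density on top
hist(price, freq = FALSE, main = "histogram", xlab = "data")
curve(dexp(x, rate = rate_hat), add = TRUE, col = "red")
```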

Smoothing using kernel and loess in R

I am trying to smooth my data set using a kernel or loess smoothing method, but the results are either unclear or not what I want. My questions are the following.
My x data is "depth" (measured in, e.g., cm) and my y data is "conc".
1) Kernel smooth
k <- kernel("daniell", 150)
plot(k)
K <- kernapply(conc, k)
plot(conc~depth)
lines(K, col = "red")
Here, my data is smoothed with the second argument (m) set to 150. Does this mean that every data point is averaged with the 150 neighboring data points on each side (right and left)? What does "daniell" mean? I could not find an explanation online.
2) Loess smooth
library(ggplot2)
p <- qplot(depth, conc, data = total)
p1 <- p + geom_smooth(method = "loess", size = 1, level = 0.95)
What are the defaults of the loess smoother here? If I want to smooth my data with a window of 150 points as in the case above (a moving average over every 150 data points), how can I modify this code?
3) To show the y-axis on a log scale, I used "log10(conc)" instead of "conc", and that worked. However, I cannot change the y-axis tick labels. I tried adding "scale_y_log10(limits = c(1,1e3))" to my code to show tick labels like 10^0, 10^1, 10^2, ..., but it did not work.
Please answer my questions. Thanks a lot for your help.
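A sketch of what the two smoothers do, written in base R so it runs without ggplot2. It assumes simulated depth/conc data (the real total data frame is not shown) and that depth is sorted. kernel("daniell", m) gives equal weights 1/(2m + 1) to lags -m..m, so with m = 150 each smoothed value is the average of 301 points: the point itself plus 150 on each side. kernapply() drops m values at each end, so its result has to be plotted against the matching middle depths rather than against its index. loess() defaults to span = 0.75 and degree = 2; span is the fraction of all points used in each local fit, so a window of about 301 points corresponds roughly to span = 301/n. geom_smooth(method = "loess", span = ...) accepts the same argument. Because loess uses tricube weights and local quadratic fits, this is only comparable to the moving average, not identical. For question 3, scale_y_log10() is meant to be applied to the untransformed conc; combining it with log10(conc) takes the log twice. Labels such as 10^0, 10^1 can then be produced with, for example, labels = scales::trans_format("log10", scales::math_format(10^.x)) from the scales package.

```r
# Simulated stand-in for the depth/conc data (assumption: real data not shown)
set.seed(42)
depth <- seq(0, 100, length.out = 2000)
conc <- exp(2 + sin(depth / 10)) * exp(rnorm(2000, sd = 0.3))

# 1) Daniell kernel: equal weights 1/(2m + 1) on lags -m..m,
#    i.e. a centred moving average of 2m + 1 = 301 points for m = 150
k <- kernel("daniell", 150)
K <- kernapply(conc, k)   # drops m points at each end: length(conc) - 2m values

# Align smoothed values with the depths they belong to (assumes depth is sorted)
m <- k$m
depth_mid <- depth[(m + 1):(length(depth) - m)]
plot(conc ~ depth)
lines(depth_mid, K, col = "red")

# 2) loess: defaults are span = 0.75 and degree = 2.
#    A window of about 2m + 1 points corresponds roughly to span = (2m + 1) / n.
span_equiv <- (2 * m + 1) / length(conc)
fit <- loess(conc ~ depth, span = span_equiv)
lines(depth, predict(fit), col = "blue")
```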

How to plot exponential function on barplot R?

So I have a barplot in which the y axis is log(frequency). Just eyeballing it, the bars appear to decrease exponentially, but I would like to know this for sure. What I want to do is also plot an exponential on the same graph. Then, if my bars fall below the exponential, I would know that they decrease either exponentially or faster than exponentially, and if the bars lie above the exponential, I would know that they don't decrease exponentially. How do I plot an exponential on a bar graph?
Here is my graph if that helps:
[Image: barplot of log(frequency)]
If you're trying to fit the density of an exponential distribution, you should probably plot a density histogram (not a frequency histogram). See this question on how to plot distributions in R.
This is how I would do it.
library(MASS)
x.gen <- rexp(1000, rate = 3)
hist(x.gen, freq = FALSE)
x.est <- fitdistr(x.gen, "exponential")$estimate
curve(dexp(x, rate = x.est), add = TRUE, col = "red", lwd = 2)
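Since the question's bars are already on a log(frequency) scale, an exponential decay shows up there as a straight line. A sketch under that reading, with hypothetical counts in place of the asker's data: barplot() returns the bar midpoints, so a line fitted to log(frequency) against those midpoints can be drawn directly over the bars.

```r
# Hypothetical counts that decay roughly exponentially over 10 categories
set.seed(7)
counts <- rpois(10, lambda = 1000 * exp(-0.5 * (0:9)))
log_freq <- log(counts)

# barplot() draws the bars and returns their x midpoints
mids <- as.vector(barplot(log_freq, ylab = "log(frequency)"))

# Exponential decay is linear on the log scale: log(count) = a + b * position
fit <- lm(log_freq ~ mids)
abline(fit, col = "red", lwd = 2)
```

Bars whose tops fall well below the red line toward the right are decaying faster than the fitted exponential.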
One way of visually inspecting whether two distributions are the same is with a Quantile-Quantile plot, or Q-Q plot for short. Typically this is done to check whether data follow a normal distribution.
The basic idea is to plot your data against some theoretical quantiles; if your data match that distribution, you will see a straight line. For example:
x <- qnorm(seq(0, 1, length.out = 1002)) # Theoretical normal quantiles
x <- x[-c(1, length(x))] # Drop the ends because they are -Inf and Inf
y <- rnorm(1000) # Actual data: 1000 points drawn from a normal distribution
l.1 <- lm(sort(y) ~ sort(x))
qqplot(x, y, xlab = "Theoretical Quantiles", ylab = "Actual Quantiles")
abline(l.1)
Under ideal conditions you should see a straight line when plotting the theoretical quantiles against your data. You can do the same by plotting your data against the quantiles of the exponential distribution you think it follows.
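The same idea against an exponential instead of a normal, as a sketch with hypothetical exponential data. ppoints() supplies evenly spaced probabilities strictly between 0 and 1, which avoids the infinite quantile at probability 1. If the data are exponential, the points lie near a line through the origin whose slope is 1/rate.

```r
# Hypothetical data: 1000 draws that really are exponential with rate 3
set.seed(3)
y <- rexp(1000, rate = 3)

# Unit-rate exponential quantiles at evenly spaced probabilities in (0, 1)
x <- qexp(ppoints(length(y)))

qqplot(x, y, xlab = "Theoretical exponential quantiles", ylab = "Actual quantiles")
# For exponential data the fitted line should pass near the origin with slope ~ 1/rate
l.1 <- lm(sort(y) ~ sort(x))
abline(l.1)
```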

Visual comparison of distribution between groups: How is scale modified for "asymmetric beanplots"?

I recently came across the R package beanplot, which can plot the distributions of two subgroups in one single plot (a special asymmetric beanplot). The package is described in the Journal of Statistical Software and on CRAN (cran.r-project.org).
I produced an asymmetric beanplot using the following CODE:
library(psych)
library(beanplot)
var1 <-c(20,33,NA,39,NA,40,34,33,NA,38,NA,8,7,NA,NA,40,34,24,25,36,40,37,34,NA,35)
var2 <- c(1,0,1,1,1,0,1,0,1,NA,1,0,0,0,0,1,1,0,1,0,1,1,NA,0,1)
mydata<-data.frame(var1,var2)
table(mydata)
par(lend = 1, mai = c(0.8, 0.8, 0.5, 0.5))
beanplot(var1 ~ var2, data= mydata, side = "both",log="",
what=c(1,1,1,0), border = NA, col = list("black", c("grey", "white")))
legend("bottomleft", fill =c("black", "grey"), legend = c("no", "yes"))
The produced plot nicely shows the different shape of the two subgroups' distribution.
PROBLEM
The dependent variable is measured on a scale ranging from 7 to 40. However, the y-axis appears to go from -1 to +55.
It would be great if anyone could explain how the scale is modified, i.e. what is actually plotted here. Is there a way to plot the distribution by using the original scale?
Many many thanks!
beanplot uses kernel density estimation (density). The estimated density can put mass on areas beyond the range of the observed data. To get an idea of what density does, try plot(density(1:2)): you should see that it is just an average of Gaussian densities centered at the data points (note that you can use a different kernel, as beanplot allows you to specify a kernel parameter). How it chooses the variance for those Gaussians is up to you, but by default it looks like beanplot uses bw.SJ with the "dpi" method to choose the bandwidth.
You could use the cutmin and cutmax arguments to control the range that beanplot actually plots over, but this doesn't change the density estimate itself.
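The effect can be reproduced with density() alone, using var1 from the question. This is a sketch that follows the answer's reading of beanplot's default bandwidth ("SJ-dpi"); it is not beanplot's own code. density() evaluates the estimate from cut * bw below the data to cut * bw above it (cut = 3 by default), which is why the axis runs past 7 and 40; cut = 0 keeps the same bandwidth but evaluates it only over the observed range.

```r
# var1 from the question, without the missing values
var1 <- c(20,33,NA,39,NA,40,34,33,NA,38,NA,8,7,NA,NA,40,34,24,25,36,40,37,34,NA,35)
v <- var1[!is.na(var1)]

# Kernel density estimate with the Sheather-Jones "dpi" bandwidth
d <- density(v, bw = "SJ-dpi")
range(v)     # observed data: 7 to 40
range(d$x)   # extends cut * bw beyond the data (cut = 3 by default)

# cut = 0: same bandwidth, evaluated only over the observed range
d0 <- density(v, bw = "SJ-dpi", cut = 0)
range(d0$x)
```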

best fitting curve from plot in R

I have a probability density function in a plot object called ph, which I derived from two samples of data with the help of a Stack Overflow user, as follows:
few <- read.table('outcome.dat', header = TRUE)
many <- read.table('alldata.dat', header = TRUE)
mh <- hist(many$G, breaks = seq(0, 1., by = 0.03), plot = FALSE)
fh <- hist(few$G, breaks = mh$breaks, plot = FALSE)
ph <- fh
ph$density <- fh$counts/(mh$counts + 0.001)
plot(ph, freq = FALSE, col = "blue")
I would like to fit the best curve to the plot of ph, but I can't find a method that works.
How can I do this? Do I have to extract the values from ph and then work on them, or is there some function that works on
plot(ph, freq = FALSE, col = "blue")
directly?
Assuming you want to perform a curve fit to the data in ph, something along the lines of
nls(y ~ FUN(x, ...), data = data.frame(x = ph$mids, y = ph$density), start = list(...)) may work. Note that ph$density, not ph$counts, holds the values you plotted with freq = FALSE. You need to know what sort of function FUN you think the histogram data should follow, e.g. a normal distribution. Read the help file for nls() to learn how to set up starting "guess" values for the coefficients in FUN.
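If you simply want to overlay a smooth curve onto the histogram, then
smoo <- spline(ph$mids, ph$density)
lines(smoo$x, smoo$y)
will come close to doing that. Because the histogram was drawn with freq = FALSE, interpolate ph$density rather than ph$counts; otherwise you may have to adjust the x and/or y scaling.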
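A concrete nls() sketch, assuming a bell-shaped model a * exp(-(x - mu)^2 / (2 * s^2)) as the hypothetical FUN and simulated values in place of ph (outcome.dat and alldata.dat are not available). Starting values are read off the data: the peak height, the position of the peak, and a rough guess for the width. With real data, replace df with data.frame(x = ph$mids, y = ph$density) and draw plot(ph, freq = FALSE, col = "blue") before adding the curve.

```r
# Hypothetical stand-in for ph: midpoints of the question's 0.03-wide bins
# and a noisy bell-shaped "density"
set.seed(11)
mids <- seq(0.015, 0.975, by = 0.03)
dens <- 0.8 * exp(-(mids - 0.4)^2 / (2 * 0.12^2)) + rnorm(length(mids), sd = 0.02)
df <- data.frame(x = mids, y = dens)   # real data: x = ph$mids, y = ph$density

# Assumed model: scaled Gaussian shape; nls() needs starting values
fit <- nls(y ~ a * exp(-(x - mu)^2 / (2 * s^2)), data = df,
           start = list(a = max(df$y), mu = df$x[which.max(df$y)], s = 0.1))

plot(df$x, df$y, col = "blue")
curve(predict(fit, newdata = data.frame(x = x)), add = TRUE, col = "red")
```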
Do you want a density function?
x = rnorm(1000)
hist(x, breaks = 30, freq = FALSE)
lines(density(x), col = "red")
