I would like to shorten the height of my normal dist curve so that the full curve can be seen on the graph.
histCvferr <- hist(cvf_ref_err, breaks = 10, density = 60,
col = "lightgray", xlab = "Residuals", main = "")
xfit <- seq(min(cvf_ref_err), max(cvf_ref_err), length = 40)
yfit <- dnorm(xfit, mean = mean(cvf_ref_err), sd = sd(cvf_ref_err))
yfit <- yfit * diff(histCvferr$mids[1:2]) * length(cvf_ref_err)
lines(xfit, yfit, col = "black", lwd = 2)
As you can see the top part of the curve cuts off
And also how can I change the bins so that they are black outlines with no fill?
Pass a ylim argument to hist() to widen the y-axis range so that the peak of the curve fits inside the plot. For example, add the following to the hist() call, adjusting the upper limit to suit your data:
ylim = c(0, 70)
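Rather than hard-coding the upper limit, you could compute it from the bin counts and the curve. This is a minimal sketch, assuming simulated residuals stand in for cvf_ref_err (the rnorm() data is made up). It also shows one way to get black outlines with no fill, by passing col = NA and border = "black":

```r
# Simulated stand-in for cvf_ref_err (assumption: the real residuals are not shown)
set.seed(1)
cvf_ref_err <- rnorm(200, mean = 0, sd = 2)

# Compute the histogram without drawing it, so the bin counts are known up front
h <- hist(cvf_ref_err, breaks = 10, plot = FALSE)

# Normal curve rescaled from density to counts (bin width * sample size)
xfit <- seq(min(cvf_ref_err), max(cvf_ref_err), length.out = 40)
yfit <- dnorm(xfit, mean = mean(cvf_ref_err), sd = sd(cvf_ref_err))
yfit <- yfit * diff(h$mids[1:2]) * length(cvf_ref_err)

# Upper y limit that fits both the tallest bar and the top of the curve
ymax <- max(h$counts, yfit) * 1.05

# col = NA removes the fill; border = "black" draws the bin outlines
plot(h, col = NA, border = "black", ylim = c(0, ymax),
     xlab = "Residuals", main = "")
lines(xfit, yfit, col = "black", lwd = 2)
```

The 5% headroom in ymax is an arbitrary choice; any factor slightly above 1 keeps the peak clear of the plot border.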
I have a histogram to visualize the distribution of interval data, and I want to overlay a smooth normal distribution curve. The line does not go all the way down to touch the x-axis, as highlighted with the red box. My other concern is that the leftmost bar should not extend below 0.5, yet part of it does, as shown in the blue box. My lowest data value is 0.58.
g <- dataset$NEW_CASE_FATALITY_RATE
h <- hist(g # dependent variable (case_fatality_rate)
, main = "Histogram - Case Fatality Rate" # chart title
, xlab = "Case Fatality Rate"
,ylab = "f"
,col = "#f0ffff"
,xlim = c(0.0,2.5)
,ylim = c(0.0,12)
)
xfit <- seq(min(g), max(g), length = 51)
yfit <- dnorm(xfit, mean = mean(g), sd = sd(g))
yfit <- yfit * diff(h$mids[1:2]) * length(g)
lines(xfit, yfit, col = "black", lwd = 2)
grid(nx = NA, ny = NULL,
lty = 1, col = "gray", lwd = 1)
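One possible approach, sketched with made-up data in place of dataset$NEW_CASE_FATALITY_RATE. The curve stops short of the axis because xfit only spans min(g) to max(g), so evaluating it over the whole xlim range lets the tails run down toward the axis (a normal curve only approaches zero, it never reaches it). Passing explicit breaks that start at 0.5 keeps the first bar from extending below that value. The data, the 0.25 bin width and the 0.5 starting point are all assumptions for illustration:

```r
# Made-up stand-in for the case fatality rates (assumption: lowest value near 0.58)
set.seed(42)
g <- runif(60, min = 0.58, max = 2.2)

# Explicit bin edges: the first bin starts at 0.5, so no bar extends below it.
# The breaks must cover the full range of g, otherwise hist() stops with an error.
bin_width <- 0.25
brks <- seq(0.5, 2.5, by = bin_width)
h <- hist(g, breaks = brks, plot = FALSE)

# Evaluate the curve over the whole plotted x range, not just min(g)..max(g)
xfit <- seq(0, 2.5, length.out = 101)
yfit <- dnorm(xfit, mean = mean(g), sd = sd(g)) * bin_width * length(g)

plot(h, col = "#f0ffff", xlim = c(0, 2.5),
     ylim = c(0, max(h$counts, yfit)),
     main = "Histogram - Case Fatality Rate",
     xlab = "Case Fatality Rate", ylab = "f")
lines(xfit, yfit, col = "black", lwd = 2)
```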
I'm trying to graph two normal distributions over two histograms in the same plot in R. Here is an example of what I would like it to look like:
Here is my current code but I'm not getting the second Normal distribution to properly overlay:
g = R_Hist$`AvgFeret,20-60`
m<-mean(g)
std<-sqrt(var(g))
h <- hist(g, breaks = 20, xlab="Average Feret Diameter", main = "Histogram of 60-100um beads", col=adjustcolor("red", alpha.f =0.2))
xfit <- seq(min(g), max(g), length = 680)
yfit <- dnorm(xfit, mean=mean(g), sd=sd(g))
yfit <- yfit*diff(h$mids[1:2]) * length(g)
lines(xfit, yfit, col = "red", lwd=2)
k = R_Hist$`AvgFeret,60-100`
ms <-mean(k)
stds <-sqrt(var(k))
j <- hist(k, breaks=20, add=TRUE, col = adjustcolor("blue", alpha.f = 0.3))
xfit <- seq(min(j), max(j), length = 314)
yfit <- dnorm(xfit, mean=mean(j), sd=sd(j))
yfit <- yfit*diff(j$mids[1:2]) * length(j)
lines(xfit, yfit, col="blue", lwd=2)
and here is the graph this code is generating:
I haven't yet worked on figuring out how to rescale the axis so any help on that would also be appreciated, but I'm sure I can just look that up! Should I be using ggplot2 for this application? If so how do you overlay a normal curve in that library?
Also as a side note, here are the errors generated from graphing the second (blue) line:
To have them on the same scale, the easiest might be to run hist() first to get the values.
h <- hist(g, breaks = 20, plot = FALSE)
j <- hist(k, breaks = 20, plot = FALSE)
ymax <- max(c(h$counts, j$counts))
xmin <- 0.9 * min(c(g, k))
xmax <- 1.1 * max(c(g,k))
Then you can simply use parameters xlim and ylim in your first call to hist():
h <- hist(g, breaks = 20,
xlab="Average Feret Diameter",
main = "Histogram of 60-100um beads",
col=adjustcolor("red", alpha.f =0.2),
xlim=c(xmin, xmax),
ylim=c(0, ymax))
The errors for the second (blue) line are because you didn't replace j (the histogram object) with k (the raw values):
xfit <- seq(min(k), max(k), length = 314)
yfit <- dnorm(xfit, mean=mean(k), sd=sd(k))
yfit <- yfit*diff(j$mids[1:2]) * length(k)
As for the ggplot2 approach, you can find a good answer here and in the posts linked therein.
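Putting the pieces together in base R, here is a minimal end-to-end sketch. The two rnorm() samples are made-up stand-ins for the R_Hist columns, and scaled_normal() is a hypothetical helper name introduced only for this example. Unlike the ymax above, this version also takes the curve heights into account so that neither curve is clipped:

```r
# Made-up stand-ins for R_Hist$`AvgFeret,20-60` and R_Hist$`AvgFeret,60-100`
set.seed(7)
g <- rnorm(680, mean = 40, sd = 8)
k <- rnorm(314, mean = 80, sd = 10)

# Hypothetical helper: normal curve for x, rescaled to the counts of histogram hobj
scaled_normal <- function(x, hobj, n = 200) {
  xfit <- seq(min(x), max(x), length.out = n)
  yfit <- dnorm(xfit, mean = mean(x), sd = sd(x))
  list(x = xfit, y = yfit * diff(hobj$mids[1:2]) * length(x))
}

# Compute both histograms first, without plotting
h <- hist(g, breaks = 20, plot = FALSE)
j <- hist(k, breaks = 20, plot = FALSE)
cg <- scaled_normal(g, h)  # raw values g with histogram h
ck <- scaled_normal(k, j)  # raw values k with histogram j

# Shared axis limits covering both histograms and both curves
xlim <- range(h$breaks, j$breaks)
ymax <- max(h$counts, j$counts, cg$y, ck$y)

plot(h, col = adjustcolor("red", alpha.f = 0.2),
     xlim = xlim, ylim = c(0, ymax),
     xlab = "Average Feret Diameter", main = "Histogram of beads")
plot(j, col = adjustcolor("blue", alpha.f = 0.3), add = TRUE)
lines(cg$x, cg$y, col = "red", lwd = 2)
lines(ck$x, ck$y, col = "blue", lwd = 2)
```

Each curve is scaled with the bin width of its own histogram, since hist() may choose different bin widths for the two samples.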
I have a data set and I'm looking to put a curve through the histogram for some values. Here is my code:
g = na.omit(peeronly$wellbeing1yr)
hist(g)
h <- hist(g, breaks = 10, density = 10, ylim=c(0,70),
col = "red", xlab = "Well-being score", main = " ")
xfit <- seq(min(g), max(g), length = 100)
yfit <- dnorm(xfit, mean = mean(g), sd = sd(g))
yfit <- yfit * diff(h$mids[1:2]) * length(g)
lines(xfit, yfit, col = "blue", lwd = 2)
And here is the output:
Why does the curve not go through the highest bar and why does it go weird towards the end?
There are a couple of reasons a histogram might not match a fitted normal curve. The most obvious one is that your data aren't normally distributed. In your case, it looks as though the distribution is leptokurtic (i.e. more peaked than a normal distribution). You can test for departure from normality with stats::shapiro.test().
The other reason it may not appear to fit well is that the shape of the histogram is sensitive to cuts on the x axis, so playing with these can sometimes give the same data an apparently better fit.
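Both points can be checked quickly. The sketch below uses a made-up heavy-tailed sample in place of peeronly$wellbeing1yr (an assumption, since the real data are not shown). It runs the Shapiro-Wilk test and then draws the same data with two different breaks settings to show how much the apparent fit depends on the binning:

```r
# Made-up heavy-tailed sample standing in for the well-being scores
set.seed(10)
g <- rt(150, df = 3)

# Shapiro-Wilk test: a small p-value suggests the data are not normal
sw <- shapiro.test(g)
print(sw)

# Same data, two different bin settings, side by side
op <- par(mfrow = c(1, 2))
for (b in c(5, 20)) {
  h <- hist(g, breaks = b, plot = FALSE)
  xfit <- seq(min(g), max(g), length.out = 100)
  yfit <- dnorm(xfit, mean = mean(g), sd = sd(g))
  yfit <- yfit * diff(h$mids[1:2]) * length(g)
  plot(h, ylim = c(0, max(h$counts, yfit)),
       main = paste("breaks =", b), xlab = "x")
  lines(xfit, yfit, col = "blue", lwd = 2)
}
par(op)  # restore the previous plotting layout
```

Note that breaks given as a single number is only a suggestion to hist(), so the actual number of bins may differ from the value passed.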
#import data
data = diameters$V1
error = .005 #mm
#make histogram
h <- hist(data, breaks = "FD", density = 10,
col = "lightblue", xlab = "Diameter", main = "Overall")
# Make normal curve
xfit <- seq(min(data), max(data), length = 40)
yfit <- dnorm(xfit, mean = mean(data), sd = sd(data))
yfit <- yfit * diff(h$mids[1:2]) * length(data)
#Draw normal curve
lines(xfit, yfit, col = "black", lwd = 2)
Output:
Expectation:
Is it possible to add error bars to the histogram using the value of +/- error without any external libraries?
You should be able to draw them with the arrows() function:
## Create a histogram from random data
> hist(sample(runif(100)))
> arrows(x0 = 0.15, y0 = 11, x1 = 0.15, y1 = 13, code = 3, length = 0.05, angle = 90)
x0 and x1 specify the start and finish x coordinates (for a straight vertical line, keep them the same).
y0 and y1 specify the start and finish y coordinates, i.e. the length of the line to draw.
code = 3 tells R to draw a double sided 'arrow', angle = 90 makes the 'arrow' a flat line and length = 0.05 specifies how wide the error bars should be.
See ?arrows for more details.
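Because arrows() is vectorised, one call can draw a bar on every bin. The sketch below assumes the +/- error applies along the x axis (it is given in mm, the same unit as the diameters), so it draws horizontal bars at the top of each bin. That reading of the question, and the simulated diameters, are assumptions:

```r
# Made-up diameters in mm, standing in for diameters$V1
set.seed(3)
data <- rnorm(100, mean = 10, sd = 0.02)
error <- 0.005  # mm, as in the question

h <- hist(data, breaks = "FD", col = "lightblue",
          xlab = "Diameter", main = "Overall")

# Skip empty bins so no bar is drawn along the axis
keep <- h$counts > 0

# Horizontal error bars centred on each bin midpoint, at the top of the bar
arrows(x0 = h$mids[keep] - error, y0 = h$counts[keep],
       x1 = h$mids[keep] + error, y1 = h$counts[keep],
       code = 3, angle = 90, length = 0.05)
```

For vertical bars instead, keep x0 and x1 equal to h$mids and vary y0 and y1 around h$counts.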
I am trying to generate 100 random values from a normal distribution, create a histogram of them, and overlay the density function on the histogram.
So far I have created:
set.seed(123)
rs <- rnorm(100, mean = weighted.mean(femals$Salary), sd = sd(femals$Salary))
h <- hist(rs, col = "lightgray" , density = 50 )
xfit <- seq(min(femals$Salary), max(femals$Salary), length = 40)
yfit <- dnorm(xfit, mean = mean(femals$Salary), sd = sd(femals$Salary))
yfit <- yfit * diff(h$mids[1:2]) * length(femals$Salary)
lines(xfit, yfit, col = "red", lwd = 2)
The result of this is
However, I am unsure whether this is correct. Isn't the density curve way too low for that histogram? Shouldn't the density follow the edges of the histogram? Is this correct, or did I make a mistake in my code?
The mean and standard deviation are:
weighted.mean(femals$Salary) = 5138.852
sd(femals$Salary) = 539.8707
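One thing that might explain the mismatch: the curve is scaled by length(femals$Salary) and drawn over the range of femals$Salary, while the histogram shows the 100 simulated values in rs. Here is a sketch that scales by length(rs) and uses the range of rs instead. The femals data frame is not available here, so the mean and standard deviation quoted above are plugged in directly as constants:

```r
# Values quoted in the question, used in place of the femals data frame
mu <- 5138.852
sigma <- 539.8707

set.seed(123)
rs <- rnorm(100, mean = mu, sd = sigma)

# Compute the histogram first so the curve can be scaled to its counts
h <- hist(rs, plot = FALSE)

# Curve over the range of the simulated values, scaled by bin width * length(rs)
xfit <- seq(min(rs), max(rs), length.out = 40)
yfit <- dnorm(xfit, mean = mu, sd = sigma)
yfit <- yfit * diff(h$mids[1:2]) * length(rs)

plot(h, col = "lightgray", density = 50,
     ylim = c(0, max(h$counts, yfit)),
     main = "Histogram of rs", xlab = "rs")
lines(xfit, yfit, col = "red", lwd = 2)
```

Even with matching scale, the bars of a 100-value sample will scatter around the curve rather than follow it exactly.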