Superimposing a normal density curve on a histogram malfunctions (base R)

I am using base R and have some code for teaching the normal distribution, which I have run successfully many times.
Now, however, when I superimpose the normal density curve, it doesn't seem to work properly.
Here is some example code:
set.seed(100)
data <- rnorm(1000, mean = 0, sd = 1)
hist(data, main = "Normal Distribution", xlab = "X", ylab = "Frequency", col = "444", xlim=c(-4,4))
Now I try to superimpose a density curve over the plot, using the density() command:
lines(density(data), col = "red", lwd = 2)
As you can see, the line is flat, and I am perplexed as to why. So I tried another method:
x <- seq(-4, 4, length.out = 100)
lines(x, dnorm(x, mean = 0, sd = 1), col = "red", lwd = 2)
But I get the same result.
Any thoughts why it's not working properly?

The answer came to me thanks to one of the users' comments.
In base R, hist() plots frequencies (counts) by default rather than a probability density, which is what is needed here. Thus, if I set freq = FALSE, the code works.
Here is the corrected code:
set.seed(100)
data <- rnorm(1000, mean = 0, sd = 1)
hist(data, main = "Normal Distribution", xlab = "X", ylab = "Density", col = "444", xlim=c(-4,4), freq = FALSE)
lines(density(data), col ='777', lwd = 2)
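The flat line is a matter of scale: the bars count observations in the hundreds, while a density peaks near 0.4. The following is a minimal sketch of the two ways to reconcile them, assuming equal-width bins (what hist() produces by default here). The names h, binwidth, area and grid are introduced for illustration only.

```r
set.seed(100)
data <- rnorm(1000, mean = 0, sd = 1)

# Compute the histogram without drawing it, to inspect both scales
h <- hist(data, plot = FALSE)
binwidth <- diff(h$breaks)[1]            # assumes equal-width bins

# On the density scale the bar areas sum to 1
area <- sum(h$density * diff(h$breaks))

grid <- seq(-4, 4, length.out = 100)

# Option 1: density-scale histogram, curve drawn as is
hist(data, freq = FALSE, xlim = c(-4, 4), main = "Density scale")
lines(grid, dnorm(grid), col = "red", lwd = 2)

# Option 2: keep counts on the y-axis and rescale the curve instead
hist(data, xlim = c(-4, 4), main = "Count scale")
lines(grid, dnorm(grid) * length(data) * binwidth, col = "red", lwd = 2)
```

Option 2 is useful when the y-axis should keep reading as counts; the curve is simply multiplied by the number of observations times the bin width.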

Related

R: plot of normalmixEM shows truncated density plots (mixtools)

I'm currently trying to plot the components found via the EM algorithm. However, the estimated densities do not extend fully to the ends of the plot. It looks like this:
My code is:
plot(EM_data, which=2, xlim= c(0, 80), xlab2= "", yaxt= "n", main2 ="", lwd2=0.8, border = "azure3")
lines(density(EM_data), lty=2, lwd=0.8)
The plot is truncated whether I specify xlim or not. xlim2 is not defined for this type of plot. Where am I going wrong?
The plot method for mixEM objects only draws within the range of the data; if you want to extend the densities, you must build your own function.
Use something like this:
Example data:
library(mixtools)
data(faithful)
attach(faithful)
set.seed(100)
EM_data<-normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)
mixtools plot:
plot(EM_data, which=2, xlim= c(30, 110), xlab2= "", yaxt= "n", main2 ="",
lwd2=0.8, border = "azure3")
lines(density(EM_data$x), lty=2, lwd=0.8)
Adaptation by extending densities:
a <- hist(EM_data$x, plot = FALSE)
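# 0.3989 is approximately 1/sqrt(2*pi), so the second term is the peak height of each weighted component
maxy <- max(max(a$density), 0.3989 * EM_data$lambda/EM_data$sigma)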
hist(EM_data$x, prob = TRUE, main = "", xlab = "", xlim= c(30, 110),
ylim = c(0, maxy), yaxt= "n", border = "azure3")
for (i in 1:ncol(EM_data$posterior)) {
curve(EM_data$lambda[i] * dnorm(x, mean = EM_data$mu[i], sd = EM_data$sigma[i]),
col = 1 + i, lwd = 0.8, add = TRUE)
}
lines(density(EM_data$x), lty=2, lwd=0.8)
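The same idea can be tried without a fitted object. This is only a sketch under stated assumptions: the values of lambda, mu and sigma below are made-up stand-ins for EM_data$lambda, EM_data$mu and EM_data$sigma, and mix_density() is a hypothetical helper, not part of mixtools. It evaluates the weighted components on any grid you choose, so nothing is cut off at the range of the data.

```r
# Hypothetical parameters standing in for a fitted two-component model
lambda <- c(0.36, 0.64)   # mixing proportions, must sum to 1
mu     <- c(54.6, 80.1)   # component means
sigma  <- 5.9             # one shared sd, recycled across components

# Sum of weighted normal components evaluated at every x
mix_density <- function(x, lambda, mu, sigma) {
  sigma <- rep_len(sigma, length(lambda))
  total <- numeric(length(x))
  for (i in seq_along(lambda)) {
    total <- total + lambda[i] * dnorm(x, mean = mu[i], sd = sigma[i])
  }
  total
}

# Any grid can be used, independent of the range of the data
grid <- seq(30, 110, length.out = 400)
plot(grid, mix_density(grid, lambda, mu, sigma), type = "l",
     xlab = "", ylab = "density")
for (i in seq_along(lambda)) {
  lines(grid, lambda[i] * dnorm(grid, mean = mu[i], sd = sigma),
        col = 1 + i, lty = 2)
}
```

Recycling sigma with rep_len() lets the helper accept either a single shared standard deviation or one per component.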

How to replicate a figure describing standard error of the mean in R?

The first figure linked here shows a very nice example of how to visualise standard error, and I would like to replicate it in R.
I'm getting there with the following:
set.seed(1)
pop<-rnorm(1000,175,10)
mean(pop)
hist(pop)
#-------------------------------------------
# Plotting Standard Error for small Samples
#-------------------------------------------
smallSample <- replicate(10,sample(pop,3,replace=TRUE)) ; smallSample
smallMeans<-colMeans(smallSample)
par(mfrow=c(1,2))
x<-c(1:10)
plot(x,smallMeans,ylab="",xlab = "",pch=16,ylim = c(150,200))
abline(h=mean(pop))
#-------------------------------------------
# Plotting Standard Error for Large Samples
#-------------------------------------------
largeSample <- replicate(10,sample(pop,20,replace=TRUE))
largeMeans<-colMeans(largeSample)
x<-c(1:10)
plot(x,largeMeans,ylab="",xlab = "",pch=16,ylim = c(150,200))
abline(h=mean(pop))
But I'm not sure how to plot the raw data as they have with the X symbols. Thanks.
With base graphics, you can use the arrows() function.
As far as I know (AFAIK), base R has no function that computes the standard error of the mean, so try this:
sem <- function(x){
sd(x) / sqrt(length(x))
}
Plot (using pch = 4 for the x symbols)
plot(x, largeMeans, ylab = "", xlab = "", pch = 4, ylim = c(150,200))
abline(h = mean(pop))
se <- apply(largeSample, 2, sem)  # one standard error per sample (column)
arrows(x0 = 1:10, y0 = largeMeans - se, x1 = 1:10, y1 = largeMeans + se, code = 0)
Note: sem() has to be applied to each column separately. Calling it on the whole matrix pools all 200 values and returns a single, much smaller number.
Edit
Ahh, to plot all the points, then perhaps ?matplot, and ?matpoints would be helpful? Something like:
matplot(t(largeSample), ylab = "", xlab = "", pch = 4, cex = 0.6, col = 1)
abline(h = mean(pop))
points(largeMeans, pch = 19, col = 2)
Is this more the effect you're after?
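Putting the pieces of this answer together, here is a sketch of a single panel with the raw observations, the sample means and one standard-error bar per sample. It assumes each column of largeSample is one sample of 20 draws; largeSE is a name introduced here, and segments() is used in place of arrows() since no arrowheads are wanted.

```r
set.seed(1)
pop <- rnorm(1000, 175, 10)
largeSample <- replicate(10, sample(pop, 20, replace = TRUE))  # 20 x 10 matrix

sem <- function(x) sd(x) / sqrt(length(x))
largeMeans <- colMeans(largeSample)
largeSE    <- apply(largeSample, 2, sem)   # one standard error per sample

# Raw observations as x symbols, one vertical strip per sample
matplot(t(largeSample), pch = 4, cex = 0.6, col = "grey40",
        xlab = "", ylab = "", ylim = range(largeSample))
abline(h = mean(pop))

# Sample means with +/- 1 standard error bars
points(1:10, largeMeans, pch = 19, col = 2)
segments(x0 = 1:10, y0 = largeMeans - largeSE, y1 = largeMeans + largeSE,
         col = 2, lwd = 2)
```

The same code with smallSample in place of largeSample gives the companion panel for samples of size 3.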

Plot Lognormal Probability Density in R

I am trying to plot lognormal probability densities in R for three different combinations of meanlog and sdlog. I have tried the following, but my graph does not look good at all.
x<- seq(0,10,length = 100)
a <- dlnorm(x, meanlog = 0, sdlog = 1, log = FALSE)
b <- dlnorm(x, meanlog = 0, sdlog = 1.5, log = FALSE)
g <- dlnorm(x, meanlog = 1.5, sdlog = 0.2, log = FALSE)
plot(x,a, lty=5, col="blue", lwd=3)
lines(x,b, lty=2, col = "red")
lines(x,g, lty=4, col = "green")
I also tried to add a legend at the top right of the graph for each meanlog and sdlog, but it would not work for me. Could someone guide me with that?
There is really nothing wrong with your code. You just forgot to:
- use type = "l" in plot();
- set a ylim wide enough to hold all the lines.
Here is a simple solution with matplot:
matplot(x, cbind(a,b,g), type = "l", ylab = "density", main = "log-normal",
col = 1:3, lty = 1:3)
To add a legend, use:
legend("topright",
legend = c("mu = 0, sd = 1", "mu = 0, sd = 1.5", "mu = 1.5, sd = 0.2"),
col = 1:3,
lty = 1:3)
You can also read ?plotmath for adding expressions. Try changing the legend argument above to:
legend = c(expression(ln(y) %~% N(0,1)),
expression(ln(y) %~% N(0,1.5)),
expression(ln(y) %~% N(1.5,0.2)))
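The two fixes named in this answer can also be applied directly to the original plot()/lines() calls, without switching to matplot. This is a minimal sketch; ylim_all is a name introduced here, and the legend labels are written out by hand.

```r
x <- seq(0, 10, length = 100)
a <- dlnorm(x, meanlog = 0,   sdlog = 1)
b <- dlnorm(x, meanlog = 0,   sdlog = 1.5)
g <- dlnorm(x, meanlog = 1.5, sdlog = 0.2)

# ylim must cover the tallest of the three curves, not just the first one
ylim_all <- range(a, b, g)

# type = "l" draws a line instead of the default points
plot(x, a, type = "l", lty = 5, col = "blue", lwd = 3,
     ylim = ylim_all, ylab = "density", main = "log-normal")
lines(x, b, lty = 2, col = "red")
lines(x, g, lty = 4, col = "green")

legend("topright",
       legend = c("meanlog = 0, sdlog = 1",
                  "meanlog = 0, sdlog = 1.5",
                  "meanlog = 1.5, sdlog = 0.2"),
       col = c("blue", "red", "green"),
       lty = c(5, 2, 4),
       lwd = c(3, 1, 1))
```

Passing the same col, lty and lwd vectors to legend() keeps the key consistent with the drawn lines.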

How To Avoid Density Curve Getting Cut Off In Plot

I am working on an assignment using R, and the fitted density curve that is overlaid on the histogram is cut off at its peak.
Example:
x <- rexp(1000, 0.2)
hist(x, prob = TRUE)
lines(density(x), col = "blue", lty = 3, lwd = 2)
I have done a search on the internet for this but didn't find anything addressing this problem. I have tried playing with the margins, but that doesn't work. Am I missing something in my code?
Thank you for your help!
Here's the simple literal answer to the question. Make an object to hold the result of your density call and use that to set the ylim of the histogram.
x <- rexp(1000, 0.2)
tmp <- density(x)
hist(x, prob = TRUE, ylim = c(0, max(tmp$y)))
lines(tmp, col = "blue", lty = 3, lwd = 2)
(should probably go to SO)
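A slightly more defensive variant of the answer above, as a sketch: it takes the upper limit from whichever layer is taller, so neither the bars nor the curve can be clipped. The seed and the names d, h and ymax are additions for illustration.

```r
set.seed(42)   # added only to make the sketch reproducible
x <- rexp(1000, 0.2)

d <- density(x)            # kernel density estimate
h <- hist(x, plot = FALSE) # histogram computed but not drawn

# The tallest element of either layer sets the upper y limit
ymax <- max(d$y, h$density)

hist(x, prob = TRUE, ylim = c(0, ymax))
lines(d, col = "blue", lty = 3, lwd = 2)
```

Computing the histogram with plot = FALSE first avoids drawing it twice.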

Plot normal, left and right skewed distribution in R

I want to create 3 plots for illustration purposes:
- normal distribution
- right skewed distribution
- left skewed distribution
This should be an easy task, but I found only this link, which only shows a normal distribution. How do I do the rest?
If you are not too tied to the normal distribution, I suggest the beta distribution, which can be symmetric, right-skewed or left-skewed depending on its shape parameters.
hist(rbeta(10000,5,2))  # left-skewed (long tail towards 0)
hist(rbeta(10000,2,5))  # right-skewed (long tail towards 1)
hist(rbeta(10000,5,5))  # symmetric
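To check which shape parameters give which direction of skew, one can compute the sample skewness and draw the exact densities. This is a sketch only: skewness() is a small helper defined here (base R does not ship one), and the seed is added for reproducibility.

```r
set.seed(1)

# Moment-based sample skewness: negative = left-skewed, positive = right-skewed
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

left  <- rbeta(10000, 5, 2)
right <- rbeta(10000, 2, 5)
sym   <- rbeta(10000, 5, 5)

sk <- c(left = skewness(left), right = skewness(right), symmetric = skewness(sym))
print(round(sk, 2))

# Exact densities side by side, for illustration purposes
par(mfrow = c(1, 3))
curve(dbeta(x, 5, 2), from = 0, to = 1, ylab = "density", main = "Left-skewed: Beta(5, 2)")
curve(dbeta(x, 2, 5), from = 0, to = 1, ylab = "density", main = "Right-skewed: Beta(2, 5)")
curve(dbeta(x, 5, 5), from = 0, to = 1, ylab = "density", main = "Symmetric: Beta(5, 5)")
par(mfrow = c(1, 1))
```

Using curve() with dbeta() gives smooth illustrations without the sampling noise of a histogram.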
I finally got it working with help from both of you, relying on this site.
N <- 10000
x <- rnbinom(N, 10, .5)
hist(x,
xlim=c(min(x),max(x)), probability=T, nclass=max(x)-min(x)+1,
col='lightblue', xlab=' ', ylab=' ', axes=F,
main='Positive Skewed')
lines(density(x,bw=1), col='red', lwd=3)
This is also a valid solution:
curve(dbeta(x,8,4),xlim=c(0,1))
title(main="posterior distribution of p")
Just use the fGarch package and these functions:
dsnorm(x, mean = 0, sd = 1, xi = 1.5, log = FALSE)
psnorm(q, mean = 0, sd = 1, xi = 1.5)
qsnorm(p, mean = 0, sd = 1, xi = 1.5)
rsnorm(n, mean = 0, sd = 1, xi = 1.5)
Arguments: mean is the location parameter, sd the scale parameter, and xi the skewness parameter.
Examples
library(fGarch)  # provides dsnorm, psnorm, qsnorm and rsnorm
## snorm -
# Random numbers:
par(mfrow = c(2, 2))
set.seed(1953)
r = rsnorm(n = 1000)
plot(r, type = "l", main = "snorm", col = "steelblue")
# Plot empirical density and compare with true density:
hist(r, n = 25, probability = TRUE, border = "white", col = "steelblue")
box()
x = seq(min(r), max(r), length = 201)
lines(x, dsnorm(x), lwd = 2)
# Plot df and compare with true df:
plot(sort(r), (1:1000/1000), main = "Probability", col = "steelblue",
ylab = "Probability")
lines(x, psnorm(x), lwd = 2)
# Compute quantiles:
round(qsnorm(psnorm(q = seq(-1, 5, by = 1))), digits = 6)
