Density curve on histogram is flat - r

I am trying to plot a curve that follows the trend of the histogram of my data. I have looked around and tried other people's code, but I still get a flat line. Here is my code:
hist(Ferr,xlab = "Ferritin Plasma Concentration", ylab = "Frequency", main = "Histogram of Ferritin
Plasma Concentration", xlim = c(0,250), ylim = c(0,50), cex.axis=0.8, cex.lab=0.8,cex.main = 1)
curve(dnorm(x, mean = mean(Ferr), sd = sd(Ferr)), col="blue", add=TRUE)
lines(density(Ferr), col="red")
If anyone can help me see where I have gone wrong, that would be great. Thank you.

Unlike a histogram of counts, a density function integrates to 1 over its whole domain. On the evenly spaced grid returned by d <- density(x), with grid spacing dx, this means:
sum(d$y) * dx ≈ 1
To scale the density function to a histogram of counts, multiply it by the number of observations and by the bin width: a bin of width w that holds k of the n points is drawn with height k, while the density over that bin is about k / (n * w).
Let's take mtcars$mpg as an example:
Ferr <- mtcars$mpg
d <- density(Ferr)
dx <- diff(d$x)[1]
sum(d$y)*dx
[1] 1.000851
h <- hist(Ferr)
binwidth <- diff(h$breaks)[1]
lines(x=d$x, y=length(Ferr)*binwidth*d$y)
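A quick way to check the scaling factor is to compare areas. The sketch below uses the same mtcars$mpg example and assumes the equal-width bins that hist() produces by default. It computes the total area under the count bars, under the density bars and under the kernel density curve, then draws the rescaled curve on the count histogram:

```r
Ferr <- mtcars$mpg                      # example data, as above
h <- hist(Ferr, plot = FALSE)
binwidth <- diff(h$breaks)[1]           # assumes equal-width bins (the hist() default)

# Count bars cover a total area of n * binwidth ...
count_area <- sum(h$counts * diff(h$breaks))
# ... while density bars (what freq = FALSE draws) cover an area of exactly 1
density_area <- sum(h$density * diff(h$breaks))

# The kernel density estimate also covers an area of about 1 (Riemann sum on its grid)
d <- density(Ferr)
curve_area <- sum(d$y) * diff(d$x)[1]

# Multiplying the density by n * binwidth therefore puts it on the count scale
plot(h, main = "Counts with rescaled density")
lines(d$x, d$y * length(Ferr) * binwidth, col = "red")
```

The same factor (bin width times n) is what turns any density, including dnorm(), into expected counts per bin.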

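You need to set freq = FALSE (and remove the constraints on ylim and xlim, and change "Frequency" to "Density"). With freq = FALSE the bar heights are densities, so the bars, the normal curve and the kernel density estimate are all on the same scale: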
hist(Ferr,
freq= FALSE,
xlab = "Ferritin Plasma Concentration", ylab = "Density",
main = "Histogram of Ferritin Plasma Concentration",
cex.axis=0.8, cex.lab=0.8,cex.main = 1)
curve(dnorm(x, mean = mean(Ferr), sd = sd(Ferr)), col="blue", add=TRUE)
lines(density(Ferr), col="red")
Toy data:
Ferr <- rnorm(1000)
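One side effect of dropping ylim is that hist() sizes the y-axis to the bars only, so the top of a curve can be cut off. A minimal sketch, assuming you want the axis to fit the bars and both curves, computes the histogram first and picks the upper limit explicitly (the labels are the ones from the question; the data are toy data as above, with an arbitrary seed):

```r
set.seed(123)
Ferr <- rnorm(1000)                     # toy data

h <- hist(Ferr, plot = FALSE)
d <- density(Ferr)
# Peak of the fitted normal curve is at its mean
normal_peak <- dnorm(mean(Ferr), mean = mean(Ferr), sd = sd(Ferr))
# Upper limit large enough for the bars and both curves
ymax <- max(h$density, d$y, normal_peak)

plot(h, freq = FALSE, ylim = c(0, ymax),
     xlab = "Ferritin Plasma Concentration", ylab = "Density",
     main = "Histogram of Ferritin Plasma Concentration")
curve(dnorm(x, mean = mean(Ferr), sd = sd(Ferr)), col = "blue", add = TRUE)
lines(d, col = "red")
```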

Related

Plot text and shapes at x,y distance from plot margins in R

To visualize power with different values of N, effect sizes, and variances, I am simulating and plotting the data below:
Libraries:
library(repr)
library(plotrix)
# N = sample size
N = 1000
# Sample 1
x_gi = rnorm(N, mean=38.3, sd=8)
# Plot
x_gi_hist=hist(x_gi ,
breaks=25,
main=expression(paste(italic(bar(x)[0]))),
xlab="taurine per unit of something",
col="firebrick")
# Sample 2
x_gf = rnorm(N, mean=20, sd=7)
# Plot
x_gf_hist=hist(x_gf ,
breaks=25,
main=expression(paste(italic(bar(x)[gf]))),
xlab="taurine per unit of something",
col="grey70")
# Both samples together
#####################
# H0: Grain-inclusive distribution
# Plot
plot(x_gi_hist,
col= c("firebrick",rgb(0.8,0,0,1/2))[cut(x_gi_hist$breaks, c(-Inf, max(sort(x_gi)[(1: (length(x_gi)*0.05))]),Inf))],
xlim=c(0,75), ylim=c(0,N/5),
main="", xlab="taurine per unit of something",
cex.lab=1.5,
border=c("firebrick",rgb(0.7,0,0,1/4)))
# H0 label
text(x=mean(x_gi), y=max(x_gi_hist$counts) + (N/15),
label = expression(paste(italic(H[0]))),
cex=2)
# Mean label
text(x=mean(x_gi), y=max(x_gi_hist$counts) + (N/30),
label = expression(paste(italic(bar(x)[0]))),
cex=2)
# Critical value
ablineclip(v=max(sort(x_gi)[(1: (length(x_gi)*0.05))]+0.8),
lwd=2, lty=2, col="red",
y1=0, y2=max(x_gf_hist$counts)-5)
# Critical value label
text(x=max(sort(x_gi)[(1: (length(x_gi)*0.05))]+0.8), y=max(x_gf_hist$counts)-4,
label="critical value", adj=0,cex= 1.5)
# Effect size arrow
arrows(x0=mean(x_gf)+4, x1=mean(x_gi)-4,
y0=max(x_gi_hist$counts) + (N/30), y1=max(x_gi_hist$counts) + (N/30),
lty=1, code=3, lwd=3, col="black")
# Effect size label
text(x=mean(x_gf) + (1/4 * mean(x_gi)), y=max(x_gi_hist$counts) + (N/25),
label = expression(paste(italic(d))),
cex=2)
#####################
# HA: Grain-free distribution
# Plot GF
plot(x_gf_hist,
col= c(rgb(0,0,0,0.2), rgb(0,0,0,0.5))[cut(x_gf_hist$breaks, c(-Inf, max(sort(x_gi)[(1: (length(x_gi)*0.05))]),Inf))],
xlim=c(0,75),ylim=c(0,N/5),
add=T, border=rgb(0,0,0,0.2))
# HA label
text(x=mean(x_gf), y=max(x_gi_hist$counts) + (N/15),
label = expression(paste(italic(H[A]))),
cex=2)
# Mean label
text(x=mean(x_gf), y=max(x_gi_hist$counts) + (N/30),
label = expression(paste(italic(bar(x)[gf]))),
cex=2, font=2)
This works when N > 100 but breaks down when N < 10.
Question: is there a way to generate sensible plots for lower values of N, say N = 10?
Here is an example:
Thanks for your time!
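One possible approach for small N, sketched under the assumption that the population parameters of the simulation (mean 38.3, sd 8 and mean 20, sd 7) are what you want to illustrate: draw both samples on the density scale with shared bins, overlay the theoretical normal curves, and position the critical value and labels from those parameters rather than from sample counts. It also shows that the question's critical-value expression collapses to min(x_gi) when N = 10, because 1:(N * 0.05) is 1:0.5, which is just 1. The seed, the bin width of 5 and the label positions are arbitrary choices.

```r
set.seed(1)
N <- 10
mu0 <- 38.3; sd0 <- 8   # H0 (grain-inclusive) parameters from the simulation
muA <- 20;   sdA <- 7   # HA (grain-free) parameters from the simulation
x_gi <- rnorm(N, mean = mu0, sd = sd0)
x_gf <- rnorm(N, mean = muA, sd = sdA)

# The question's critical value: with N = 10 this is just min(x_gi)
crit_sample <- max(sort(x_gi)[1:(length(x_gi) * 0.05)])
# The theoretical 5% quantile of H0 does not depend on N
crit_theory <- qnorm(0.05, mean = mu0, sd = sd0)

# Shared bin edges (width 5) covering both samples
lo <- floor(min(x_gi, x_gf) / 5) * 5
hi <- ceiling(max(x_gi, x_gf) / 5) * 5
breaks <- seq(lo, hi, by = 5)
h_gi <- hist(x_gi, breaks = breaks, plot = FALSE)
h_gf <- hist(x_gf, breaks = breaks, plot = FALSE)
ymax <- max(h_gi$density, h_gf$density, dnorm(mu0, mu0, sd0), dnorm(muA, muA, sdA))

# Density scale: bar heights no longer grow with N
plot(h_gi, freq = FALSE, col = rgb(0.8, 0, 0, 0.5), border = NA,
     xlim = c(0, 75), ylim = c(0, ymax * 1.3), main = "",
     xlab = "taurine per unit of something")
plot(h_gf, freq = FALSE, col = rgb(0, 0, 0, 0.3), border = NA, add = TRUE)
curve(dnorm(x, mu0, sd0), add = TRUE, col = "firebrick", lwd = 2)
curve(dnorm(x, muA, sdA), add = TRUE, col = "grey30", lwd = 2)
abline(v = crit_theory, lty = 2, lwd = 2, col = "red")
# Labels placed relative to the y-range, so they stay inside the plot for any N
text(c(muA, mu0), ymax * 1.2,
     labels = expression(italic(H[A]), italic(H[0])), cex = 1.5)
```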

Creating a relative frequency histogram and superimposing a normal distribution curve

Basically, I have trouble plotting the relative frequency histogram: when I plot the data, my y-axis always goes above one. I also want to superimpose a normal distribution on top, but it never seems to work.
What I have produced so far: https://imgur.com/H9lWBVg
I have tried several ways of plotting the histogram, such as hist(), truehist() and plot().
library(MASS)  # truehist() comes from the MASS package
truehist(aest,freq=TRUE, xlab = "Average Est", col="blue")
curve(dnorm(x,mean(aest),sd(aest)),col="red", add=TRUE, lwd=2)
legend("topright",legend=c(paste("median = ",toString(median(aest))),paste("mean = ",toString(mean(aest))),paste("SD = ",toString(sd(aest)))), cex=0.65)
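You are looking for a density plot, not a frequency one. Try hist() with freq = FALSE and you'll get the result you want: the bar areas then sum to 1, so the normal density curve is on the same scale (bar heights can still exceed 1 when the bins are narrower than one unit). I don't have your data, but substituting some random data I have, it will look like this: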
hist(move$dist,freq=FALSE, xlab = "Average Est", col="blue")
curve(dnorm(x,mean(move$dist),sd(move$dist)),col="red", add=TRUE, lwd=2)
legend("topright",
legend=c(paste("median = ",toString(median(move$dist))),
paste("mean = ",toString(mean(move$dist))),
paste("SD = ",toString(sd(move$dist)))),
cex=0.65)
Or you can use truehist() from the MASS package, but then the parameter isn't freq, it is prob = TRUE (which is also its default). That will look something like this:
library(MASS)
truehist(move$dist,prob = TRUE, xlab = "Average Est", col="blue", nbins = "fd")
curve(dnorm(x,mean(move$dist),sd(move$dist)),col="red", add=TRUE, lwd=2)
legend("topright",
legend=c(paste("median = ",toString(median(move$dist))),
paste("mean = ",toString(mean(move$dist))),
paste("SD = ",toString(sd(move$dist)))),
cex=0.65)
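If you want a true relative frequency histogram (bar heights that are proportions summing to 1) rather than a density histogram, one option is to replace the counts in the object returned by hist() before plotting it. The normal curve then has to be multiplied by the bin width, because a bin's proportion is approximately its density times its width. This is a sketch with simulated stand-in data, since aest is not available:

```r
set.seed(1)
aest <- rnorm(200, mean = 50, sd = 10)  # stand-in for the asker's data (assumption)

h <- hist(aest, plot = FALSE)
binwidth <- diff(h$breaks)[1]
h$counts <- h$counts / sum(h$counts)    # replace counts by proportions

# Leave room for whichever is taller: the bars or the scaled curve's peak
curve_peak <- binwidth * dnorm(mean(aest), mean(aest), sd(aest))
plot(h, freq = TRUE, col = "blue", main = "",
     xlab = "Average Est", ylab = "Relative frequency",
     ylim = c(0, max(h$counts, curve_peak)))
# proportion per bin is about density * bin width, so scale the normal curve the same way
curve(binwidth * dnorm(x, mean(aest), sd(aest)), col = "red", add = TRUE, lwd = 2)
```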

2 Y axis histogram (normal frequency vs relative frequency)

I would like your help, please.
I have these 2 plots, drawn separately. One shows the frequency and the other, with exactly the same data, shows the relative frequency.
Can you tell me how I can join them in a single plot with 2 y-axes (frequency and relative frequency)?
x<- AAA$starch
h<-hist(x, breaks=40, col="lightblue", xlab="Starch ~ Corn",
main="Histogram with Normal Curve", xlim=c(58,70),ylim = c(0,2500),axes=TRUE)
xfit<-seq(min(x),max(x),length=40)
yfit<-dnorm(xfit,mean=mean(x),sd=sd(x))
yfit <- yfit*diff(h$mids[1:2])*length(x)
lines(xfit, yfit, col="blue", lwd=3)
library(HistogramTools)
x<- AAA$starch
c <- hist(x,breaks=10, ylab="Relative Frequency", main="Histogram with Normal Curve",ylim=c(0,2500), xlim=c(58,70), axes=TRUE)
PlotRelativeFrequency((c))
Thank you!!
EDIT:
This is just an example image of what I want...
I use doubleYScale from package latticeExtra.
Here is an example (I am not sure about the relative frequency calculation):
library(latticeExtra)
set.seed(42)
firstSet <- rnorm(500,4)
breaks = 0:10
#Cut data into sections
firstSet.cut = cut(firstSet, breaks, right=FALSE)
firstSet.freq = table(firstSet.cut)
#Calculate relative frequency
firstSet.relfreq = firstSet.freq / length(firstSet)
#Parse to a list to use xyplot later, placing each value at the midpoint of its bin
firstSet.list <- list(x = head(breaks, -1) + 0.5, y = as.vector(firstSet.relfreq))
#Build the count histogram and the relative frequency curve; the second y-range is the first divided by n, so both axes line up
hist1 <- histogram(firstSet, breaks = breaks, type = "count", col='skyblue', xlab="Starch ~ Corn", ylab="Frequency", main="Histogram with Normal Curve", ylim=c(0,250), xlim=c(0,10))
relFreqCurve <- xyplot(y ~ x, firstSet.list, type="l", ylab = "Relative frequency", ylim=c(0,250)/length(firstSet))
#Build double objects plot
doubleYScale(hist1, relFreqCurve, add.ylab2 = TRUE)
And here is the result, with two y-axes on different scales:
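A base-graphics alternative, sketched here as an assumption rather than a latticeExtra feature, is to draw the count histogram once and add a second axis on the right whose labels are the left-hand tick values divided by n. Because relative frequency is just count / n, both axes describe the same bars. The normal curve is scaled to counts as in the question (density times bin width times n). The data are simulated stand-ins for AAA$starch:

```r
set.seed(42)
x <- rnorm(5000, mean = 64, sd = 1.5)   # stand-in for AAA$starch (assumption)
n <- length(x)

op <- par(mar = c(5, 4, 4, 5))          # widen the right margin for the second axis
h <- hist(x, breaks = 40, col = "lightblue", xlab = "Starch ~ Corn",
          main = "Histogram with Normal Curve")
# Normal curve on the count scale: density * bin width * n
xfit <- seq(min(x), max(x), length = 200)
lines(xfit, dnorm(xfit, mean(x), sd(x)) * diff(h$mids[1:2]) * n, col = "blue", lwd = 3)

# Right-hand axis: same tick positions as the left axis, relabelled as count / n
ticks <- axTicks(2)
axis(4, at = ticks, labels = format(ticks / n, digits = 2))
mtext("Relative frequency", side = 4, line = 3)
par(op)
```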

Add confidence band to R IRF Plot

I use the following example code to plot an impulse response function:
# Load data and apply VAR
library("vars")
data(Canada)
data <- Canada
data <- data.frame(data[,1:2])
names(data)
var <- VAR(data, p=2, type = "both")
# Apply IRf
irf <- irf(var, impulse = "e", response = "prod", boot = T, cumulative = FALSE, n.ahead = 20)
str(irf)
plot(irf)
# Response
irf$irf
# Lower & Higher
irf$Lower
irf$Upper
#Create DataFrame and Plot
irf_df <- data.frame(irf$irf,irf$Lower,irf$Upper)
irf_df$T<-seq.int(nrow(irf_df)) #T
irf_df
plot(data.frame(irf_df$T, irf_df[1]), type="l", main="Impulse Response")
abline(h=0, col="blue", lty=2)
It looks like it works so far, though I sense that the code could be improved.
Would it be possible to add a confidence band for the Lower and Upper bounds of the confidence interval?
If you want to plot the Lower and Upper bands, you can use the lines() function, setting the y-limits of the plot if desired.
plot(irf_df$T, irf_df$prod, type="l", main="Impulse Response",
ylim = extendrange(c(irf_df$prod.1, irf_df$prod.2), f = 0.1))
abline(h=0, col="blue", lty=2)
lines(irf_df$T, irf_df$prod.1, lty = 2)
lines(irf_df$T, irf_df$prod.2, lty = 2)
For a fancier plot with the confidence band filled in, use polygon. Here, we set up an empty plot, then plot the polygon, and finally overlay the line. Also note here that there's no need to set up a new data.frame: we can simply use values from the irf() output:
plot(irf$irf$e, type = "n", main = "Impulse Response",
ylim = c(min(irf$Lower$e), max(irf$Upper$e)))
polygon(x = c(seq_along(irf$irf$e), rev(seq_along(irf$irf$e))),
y = c(irf$Lower$e, rev(irf$Upper$e)),
lty = 0, col = "#fff7ec")
lines(irf$irf$e)
Output:

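The polygon trick can be wrapped in a small helper so the same call works for any response. plot_with_band() below is a hypothetical helper, not part of vars: it traces the lower bound from left to right and then the upper bound back from right to left, and sizes the y-axis with range() over everything drawn so that neither bound is clipped. The data in the example are synthetic; with the irf object from the question you would call plot_with_band(irf$irf$e, irf$Lower$e, irf$Upper$e, main = "Impulse Response").

```r
# Hypothetical helper: a response line with a shaded confidence band
plot_with_band <- function(y, lower, upper, band_col = "#fff7ec", ...) {
  # irf() returns one-column matrices; flatten them to plain vectors
  y <- as.vector(y); lower <- as.vector(lower); upper <- as.vector(upper)
  x <- seq_along(y)
  # Size the axis to everything drawn, so neither bound is clipped
  plot(x, y, type = "n", ylim = range(lower, upper, y), ...)
  # Polygon outline: lower bound left to right, then upper bound right to left
  band_x <- c(x, rev(x))
  band_y <- c(lower, rev(upper))
  polygon(band_x, band_y, col = band_col, border = NA)
  lines(x, y)
  abline(h = 0, col = "blue", lty = 2)
  invisible(list(x = band_x, y = band_y))
}

# Synthetic stand-in for irf$irf$e, irf$Lower$e and irf$Upper$e (assumption)
resp <- 0.8^(0:20)
band <- plot_with_band(resp, resp - 0.3, resp + 0.3,
                       main = "Impulse Response", xlab = "Horizon", ylab = "Response")
```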
How to do 3D bar plot in R

I would like to produce this kind of graph:
However, I don't know how to do it in R. Does anyone know a solution?
I would use the package rgl.
library(rgl)
# load your data
X= c(1:6)
Y=seq(10,70, 10)
Z=c(-70, -50, -30, -20, -10, 10)
# create an empty plot with the good dimensions
plot3d(1,1,1, type='n', xlim=c(min(X),max(X)),
ylim=c(min(Y),max(Y)),
zlim=c(min(Z),max(Z)),
xlab="", ylab="", zlab="", axes=FALSE)
# draw your Y bars
for(i in X){ segments3d(x = rep(X[i],2), y = c(0,Y[i]), z=0, lwd=6, col="purple")}
# do the same for the Z bars
plot3d(X,0,Z, add=TRUE, axes=FALSE, type="n")
for(i in X){segments3d(x = rep(X[i],2), y = 0, z= c(0,Z[i]), lwd=6, col="blue" )}
# draw your axis
axes3d()
mtext3d(text = "Time (days)", edge = "y+", line =3, col=1 )
mtext3d(text = "Change %", edge = "z++", line = 5, col=1 )
However, I found that the bar width (the lwd value) seems to be capped at 6, which could be a limitation. The plot looks better when you have more data.
Hope this helps.
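If the line-width cap is a problem, one alternative (a different technique from segments3d) is to draw each bar as a solid box: rgl's cube3d() gives a cube with corners at -1 and 1, which scale3d() and translate3d() can stretch and move so that it spans from 0 to the bar height. make_bar() is a hypothetical helper, the bar width of 0.4 is arbitrary, and Y is assumed to have one value per X (the original Y has seven values, of which only the first six are drawn). rgl.useNULL is set so the sketch also runs without a display; drop that line to get an interactive window.

```r
options(rgl.useNULL = TRUE)   # off-screen device; remove for an interactive window
library(rgl)

X <- 1:6
Y <- seq(10, 60, 10)          # one Y value per X (assumption)
Z <- c(-70, -50, -30, -20, -10, 10)

# Hypothetical helper: a box of width w centred on x, spanning 0..height along y or z
make_bar <- function(x, height, axis = c("y", "z"), w = 0.4) {
  axis <- match.arg(axis)
  box <- cube3d()             # cube with corners at -1 and 1, centred on the origin
  if (axis == "y") {
    box <- scale3d(box, w / 2, abs(height) / 2, w / 2)
    translate3d(box, x, height / 2, 0)
  } else {
    box <- scale3d(box, w / 2, w / 2, abs(height) / 2)
    translate3d(box, x, 0, height / 2)
  }
}

open3d()
for (i in seq_along(X)) {
  shade3d(make_bar(X[i], Y[i], "y"), col = "purple")
  shade3d(make_bar(X[i], Z[i], "z"), col = "blue")
}
axes3d()
mtext3d(text = "Time (days)", edge = "y+", line = 3)
mtext3d(text = "Change %", edge = "z++", line = 5)
```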
