I am having trouble plotting a relative frequency histogram: whenever I plot the data, the y axis goes above one. I also want to superimpose a normal distribution on top, but it never seems to work.
What I have produced so far: https://imgur.com/H9lWBVg
I have tried multiple methods in plotting the histogram such as hist(), truehist() and plot() etc.
truehist(aest,freq=TRUE, xlab = "Average Est", col="blue")
curve(dnorm(x,mean(aest),sd(aest)),col="red", add=TRUE, lwd=2)
legend("topright",legend=c(paste("mean = ",toString(mean(aest))),paste("median = ",toString(median(aest))),paste("SD = ",toString(sd(aest)))), cex=0.65)
You are looking for a density plot, not a frequency one. Try hist with
freq = FALSE
And you'll get the result you want. I don't have your data, but subbing some random data I have in it will look like so:
hist(move$dist,freq=FALSE, xlab = "Average Est", col="blue")
curve(dnorm(x,mean(move$dist),sd(move$dist)),col="red", add=TRUE, lwd=2)
legend("topright",
legend=c(paste("mean = ",toString(mean(move$dist))),
paste("median = ",toString(median(move$dist))),
paste("SD = ",toString(sd(move$dist)))),
cex=0.65)
Or you can use truehist (from the MASS package), but then the parameter isn't freq; it is
prob = TRUE
which will look something like this:
truehist(move$dist,prob = TRUE, xlab = "Average Est", col="blue", nbins = "fd")
curve(dnorm(x,mean(move$dist),sd(move$dist)),col="red", add=TRUE, lwd=2)
legend("topright",
legend=c(paste("mean = ",toString(mean(move$dist))),
paste("median = ",toString(median(move$dist))),
paste("SD = ",toString(sd(move$dist)))),
cex=0.65)
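To see why freq = FALSE fixes the y axis, here is a minimal sketch with simulated data. The vector x_sim is a hypothetical stand-in for aest, not the asker's data. It checks that the bar areas of a density histogram sum to 1, which is what puts the bars on the same scale as dnorm().

```r
# Simulated stand-in for the asker's data (hypothetical values)
set.seed(1)
x_sim <- rnorm(500, mean = 10, sd = 2)

# freq = FALSE puts densities, not counts, on the y axis
h <- hist(x_sim, freq = FALSE, col = "lightblue",
          xlab = "Average Est", main = "Density histogram")

# Bar areas (height * width) sum to 1, so dnorm() is on the same scale
total_area <- sum(h$density * diff(h$breaks))
print(total_area)

# Overlay the normal curve fitted by mean and standard deviation
curve(dnorm(x, mean(x_sim), sd(x_sim)), col = "red", lwd = 2, add = TRUE)
```

Strictly speaking, relative frequencies are h$counts / sum(h$counts). They equal the densities only when the bin width is 1, so a density axis can still exceed 1 when the bins are narrow.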
I am trying to plot a curve that follows the trend of the histogram of my data. I have looked around and tried other people's code, but I still get a flat line. Here is my code:
hist(Ferr,xlab = "Ferritin Plasma Concentration", ylab = "Frequency", main = "Histogram of Ferritin
Plasma Concentration", xlim = c(0,250), ylim = c(0,50), cex.axis=0.8, cex.lab=0.8,cex.main = 1)
curve(dnorm(x, mean = mean(Ferr), sd = sd(Ferr)), col="blue", add=TRUE)
lines(density(Ferr), col="red")
If anyone can help me to see where I have gone wrong, that would be great thank you.
Unlike a histogram of counts, a density function integrates to 1 over the whole space:
sum(density(x)*dx) = 1
To scale the density function to a histogram of counts, multiply it by the number of observations and by the bin width (this assumes equal-width bins).
Let's take mtcars$mpg as an example:
Ferr <- mtcars$mpg
d <- density(Ferr)
dx <- diff(d$x)[1]
sum(d$y)*dx
[1] 1.000851
h <- hist(Ferr)
lines(x=d$x,y=d$y*length(Ferr)*diff(h$breaks)[1])
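To keep counts on the y axis, a curve on the density scale can be multiplied by the number of observations times the bin width. The sketch below applies this to the normal curve from the question. It assumes equal-width bins, and the names bin_width and scale_factor are only illustrative.

```r
Ferr <- mtcars$mpg

# Count histogram; keep the returned object to read off the bins
h <- hist(Ferr, main = "Counts with scaled normal curve")

# Equal-width bins are assumed, so the first width stands for all of them
bin_width <- diff(h$breaks)[1]

# density * n * bin width = expected count per bin
scale_factor <- length(Ferr) * bin_width

# Normal curve rescaled from density units to count units
curve(dnorm(x, mean = mean(Ferr), sd = sd(Ferr)) * scale_factor,
      col = "blue", lwd = 2, add = TRUE)

# Sanity check: counts divided by the scale factor reproduce h$density
print(all.equal(h$counts / scale_factor, h$density))
```

The same scale_factor works for lines(density(Ferr)) if the y values are multiplied by it first.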
You need to set freq = FALSE (and remove the constraints on ylim and xlim, and change "Frequency" to "Density"):
hist(Ferr,
freq= FALSE,
xlab = "Ferritin Plasma Concentration", ylab = "Density",
main = "Histogram of Ferritin Plasma Concentration",
cex.axis=0.8, cex.lab=0.8,cex.main = 1)
curve(dnorm(x, mean = mean(Ferr), sd = sd(Ferr)), col="blue", add=TRUE)
lines(density(Ferr), col="red")
Toy data:
Ferr <- rnorm(1000)
Can I change the y-axis numbers to be horizontal on an NMDS plot created in vegan?
library(vegan)
sp <- poop[,28:34]
bat <- poop[,4:7]
mds1 <- metaMDS(sp, k=3,try=200)
plot(mds1$points[,1], mds1$points[,2], pch = as.numeric(bat$species),
col= as.numeric(bat$species),
xlab = "NMDS1", ylab= "NMDS2")
In R, the direction of labels is controlled by graphical parameter las (see ?par). You can also give this parameter in plot call for the metaMDS result. As you see from ?par, las=1 will put all labels horizontal.
More seriously, you should not plot metaMDS results like you do. It is better to use the dedicated plot method for the result, or if you want to do it all by yourself, you should at least force equal aspect ratio for axes with asp = 1 in your plot call. So the following should work:
## with metaMDS plot:
plot(mds1, display="si", las=1, type = "n") # for an empty plot
points(mds1, pch = as.numeric(bat$species), col= as.numeric(bat$species))
## or with generic plot:
plot(mds1$points[,1], mds1$points[,2], pch = as.numeric(bat$species),
col= as.numeric(bat$species),
xlab = "NMDS1", ylab= "NMDS2",
asp = 1, las = 1) # this is new
I am struggling with how to create a double-log plot in R, and how to plot two linear regressions in this plot. I have succeeded in transforming the x-axis to a log axis, but I can't get the y-axis transformed into a log axis.
The code I use is this, and it creates the plot in the link below:
PR_abs <- lm(PR.Interval..s. ~ log.BM., data = rest)
PR_abs.d <- lm(PR.Interval..s. ~ log.BM., data = dist[4:21,])
plot(rest$Body.Mass..kg.,rest$PR.Interval..s., xlab = "", type="n",
ylab= "PR duration (sec)", pch=20, ylim= c(0,1), log="x", cex=1.4)
points(rest[1:3,]$Body.Mass..kg., rest[1:3,]$PR.Interval..s., cex=1.2,pch=17)
points(rest[4:21,]$Body.Mass..kg., rest[4:21,]$PR.Interval..s., cex=1.2, pch=16)
points(rest[4:21,]$Body.Mass..kg., dist[4:21,]$PR.Interval..s., cex=1.2, pch=1)
lines(rest$Body.Mass..kg.,PR_abs$fitted.values)
lines(rest[4:21,]$Body.Mass..kg.,PR_abs.d$fitted.values,lty=2)
legend("topleft", c("Rest","Exercised","Embryo"), cex=0.8, bty="n", lty=c(1,2,0),
pch=c(16,1,17))
I have tried using ggplot2 as well, but without success:
ggplot(data = rest, aes(x = Body.Mass..kg., y = QT..RR)) +
geom_point(data = rest, aes(x = Body.Mass..kg., y = QT..RR), shape=1)+
geom_abline(aes(intercept=coef(QT_rel)[1],slope=coef(QT_rel)[2])) +
coord_trans(y="log2", x="log2")
Any suggestions? :)
Best, Ditte
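One possible approach in base graphics, sketched here with simulated data since the original data frames are not available: the vectors mass and pr are hypothetical stand-ins for the body mass and PR interval columns. A likely obstacle in the attempt above is ylim = c(0, 1), because zero cannot be placed on a log axis. Also, the regression is fitted on the log-log scale here, which is an assumption about what is wanted.

```r
# Simulated allometric data; 'mass' and 'pr' are hypothetical stand-ins
set.seed(42)
mass <- 10^runif(30, min = -2, max = 3)               # body mass, kg
pr   <- 0.05 * mass^0.25 * exp(rnorm(30, sd = 0.1))   # PR interval, s

# Fit on the log-log scale: log10(pr) = a + b * log10(mass)
fit <- lm(log10(pr) ~ log10(mass))

# log = "xy" makes both axes logarithmic; axis limits must be strictly positive
plot(mass, pr, log = "xy", pch = 16,
     xlab = "Body mass (kg)", ylab = "PR duration (sec)")

# Back-transform the fitted values to the original units before drawing the line
ord <- order(mass)
lines(mass[ord], 10^fitted(fit)[ord])
```

A second regression (for example, for the exercised animals) can be added the same way with another lm() fit and a lines() call using lty = 2.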
I would like to show the probability for the histogram, with a density curve fit, and with the bars labeled by the count. The code below generates two figures, the top shows the frequency bars (labeled by frequency) with the density curve. The bottom shows the probability bars (labeled by probability) with the density curve. What I would like to have is the probability bars labeled by frequency, so we can read probability and frequency. Or, I would like to have the second plot, with the bar labels from the first plot.
coeff_value = c(6.32957806, 3.04396650, 0.02487562, 3.50699592, 5.03952569, 3.05907173,
0.41095890, 1.88648325, 5.04250569, 0.89320388, 0.83732057, 1.12033195,
2.35697101, 0.58695652, 4.83363583, 7.91154791, 7.99614644, 9.58737864,
1.27358491, 1.03938247, 8.66028708, 6.32458234, 3.85263158, 1.37299546,
0.53639847, 7.63614043, 0.51502146, 9.86557280, 0.60728745, 3.00613232,
6.46573393, 2.60848869, 2.34273319, 1.82448037, 6.36600884, 0.70043777,
1.47600793, 0.42510121, 2.58064516, 3.45377741, 6.29475205, 4.97536946,
2.24637681, 2.12000000, 1.92792793, 0.97613883, 6.01214190, 4.47316103,
1.87272727, 10.08896797, 0.09049774, 1.93779904, 6.53444676, 3.46590909,
6.52730822, 7.23229671, 4.91740279, 5.24545125)
h=hist(coeff_value,plot=F,freq=T,breaks=10)
h$density = h$density*100
par(mfrow=c(2,1))
plt=plot(h, freq=T, main="Freq = T",xlab="rate",
ylab="Frequency", xlim=c(0, 20), ylim=c(0, 30),
col="gray", labels = TRUE)
densF=density(coeff_value)
lines(densF$x, densF$y*length(coeff_value), lwd=2, col='green')
plt=plot(h, freq=F, main="Freq = F",xlab="rate",
ylab="Probability (%)", xlim=c(0, 20), ylim=c(0, 30),
col="gray", labels = TRUE)
densF=density(coeff_value)
lines(densF$x, densF$y*100, lwd=2, col='green')
paste("bar sum =",sum(h$density))
paste("line integral =",sum((densF$y[-length(densF$y)]*100)*diff(densF$x)))
Just plot your histogram and capture the output (you'll still need to multiply the density by 100 to get to % before plotting):
h <- hist(coeff_value,plot=F,breaks=10)
h$density <- h$density*100
plot(h, freq=F, xlab="rate",
ylab="Probability (%)", ylim=c(0, 25),
col="gray")
densF <- density(coeff_value)
lines(densF$x, densF$y*100, lwd=2, col='green')
Now h contains all the information you need:
text(h$mids,h$density,h$counts,pos=3)
I like to produce my own grid lines when plotting so I can control tick marks, etc. and I am struggling with this with the 'hist' plotting routine.
hist(WindSpeed, breaks=c(0:31), freq=TRUE, col="blue", xaxt="n", yaxt="n", xlab="Wind Speed (m/s)",main="Foo", cex.main=1.5, cex.axis=1, cex.lab=1, tck=1, font.lab=2)
axis(1, tck=1, col.ticks="light gray")
axis(1, tck=-0.015, col.ticks="black")
axis(2, tck=1, col.ticks="light gray", lwd.ticks="1")
axis(2, tck=-0.015)
minor.tick(nx=5, ny=2, tick.ratio=0.5)
box()
For other types of plots, I have been able to use the 'lines' or 'points' command to replot the data over the top, but with the histogram it's not so easy.
Any help would be great.
I added my code below and image based upon John's response...
hist(WindSpeed, breaks=30, freq=TRUE, col="blue", xaxt="n", yaxt="n", xlab="Wind Speed (m/s)",main="Foo", cex.main=1.5, cex.axis=1, cex.lab=1, font.lab=2)
axis(1, tck=1, col.ticks="light gray")
axis(1, tck=-0.015, col.ticks="black")
axis(2, tck=1, col.ticks="light gray", lwd.ticks="1")
axis(2, tck=-0.015)
minor.tick(nx=5, ny=2, tick.ratio=0.5)
box()
hist(WindSpeed, add=TRUE, breaks=30, freq=TRUE, col="blue", xaxt="n", yaxt="n", xlab="Wind Speed (m/s)", main="Foo", cex.main=1.5, cex.axis=1, cex.lab=1, font.lab=2)
Actually, R has a way to do this! It's the panel.first argument to plot.default, which hist calls to do most of the work. It takes an expression which is evaluated "after the plot axes are set up but before any plotting takes place. This can be useful for drawing background grids or scatterplot smooths," to quote from ?plot.default.
hist(WindSpeed, breaks=c(0:31), freq=TRUE, col="blue", xaxt="n", yaxt="n",
xlab="Wind Speed (m/s)", main="Foo",
cex.main=1.5, cex.axis=1, cex.lab=1, tck=1, font.lab=2,
panel.first={
axis(1, tck=1, col.ticks="light gray")
axis(1, tck=-0.015, col.ticks="black")
axis(2, tck=1, col.ticks="light gray", lwd.ticks="1")
axis(2, tck=-0.015)
minor.tick(nx=5, ny=2, tick.ratio=0.5)
box()
})
See How do I draw gridlines using abline() that are behind the data? for another question that uses this method.
This is relatively easy.
Generate the histogram but don't plot it.
h <- hist(y, plot = FALSE)
Now generate your base plot... I've added some features to make it look more like a standard histogram
plot(h$mids, h$counts, ylim = c(0, max(h$counts)), xlim = range(h$mids)*1.1,
type = 'n', bty = 'n', xlab = 'y', ylab = 'Counts', main = 'Histogram of y')
add your grid
grid()
add your histogram
hist(y, add = TRUE)
Or, as I discovered through this process... you can do it even easier
hist(y)
grid()
hist(y, add = TRUE, col = 'white')
This last method is just redrawing the histogram over the grid.
In R, order matters when you plot. As you've discovered, adding things to a plot adds on top of what you've plotted before. So we need a way to plot the grid first and then the histogram. Try something like this:
plot(1:10,1:10,type = "n")
grid(10,10)
hist(rnorm(100,5,1),add = TRUE)
I haven't recreated your example, since it isn't reproducible, but this general idea should work. But the key idea is to create an empty plot with the correct dimensions using the type = "n" option to plot, then add the grid, then add the histogram using the add = TRUE argument.
Note that the add argument is actually for plot.histogram, hist passes it along via ....
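Putting these steps together for custom grid lines, a minimal sketch with simulated data might look like this. The vector wind is a hypothetical stand-in for WindSpeed, and the grid positions from pretty() are only one possible choice.

```r
# Hypothetical wind speeds in m/s, standing in for WindSpeed
set.seed(7)
wind <- rweibull(500, shape = 2, scale = 8)

# Compute the bins without drawing anything
h <- hist(wind, breaks = seq(0, ceiling(max(wind)), by = 1), plot = FALSE)

# Empty plot with the limits the histogram needs
plot(range(h$breaks), c(0, max(h$counts)), type = "n",
     xlab = "Wind Speed (m/s)", ylab = "Frequency", main = "Foo")

# Grid lines at chosen positions, drawn before the bars
abline(h = pretty(c(0, max(h$counts))), v = pretty(range(h$breaks)),
       col = "lightgray")

# Bars go on top of the grid; add = TRUE is handled by plot.histogram
plot(h, add = TRUE, col = "blue")
box()
```

Because the grid positions are plain vectors passed to abline(), they can be replaced by any tick locations you want to control by hand.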
The base graphics solution suggested by @joran is fine. Alternatives:
d <- data.frame(x=rnorm(1000))
library(lattice)
histogram(~x,data=d,panel=function(...) {
panel.grid(...)
panel.histogram(...) }
)
Or:
library(ggplot2)
qplot(x,data=d,geom="histogram",binwidth=0.1)+theme_bw()+
labs(x="Wind speed", y="Frequency")
(But of course you will have to learn all the details of adjusting labels, titles, etc. ... I'm not actually sure how to do titles in ggplot ...)
Other methods for grid lines in the background:
A)
hist( y, panel.first=grid() ) # see: help( plot.default )
box()
B)
plot.new() # new empty plot
nv <- length( pretty(x) ) - 1 # number of vertical grid lines (or set by hand)
nh <- length( pretty(y) ) - 1 # number of horizontal grid lines (or set by hand)
grid( nx = nv, ny = nh ) # preplot grid lines
par( new = TRUE ) # add next plot
plot( x, y ) # plot or hist, etc
box() # if plot hist
Arbitrary lines in background with abline:
C)
How do I draw gridlines using abline() that are behind the data?
D)
# first, be sure there is no +/-Inf, NA, NaN in x and y
# then, make the container plot with two invisible points:
plot( x = range( pretty( x ) ), y = range( pretty( y ) ), type = "n", ann = FALSE )
abline( h = hlines, v = vlines ) # draw lines. hlines, vlines: vectors of coordinates
par( new = TRUE ) # add next plot. It is not necessary with points, lines, segments, ...
plot( x, y ) # plot, hist, etc
box() # if plot hist