I am trying to plot multiple functions using curve(). My example tries to plot multiple normal distributions with different means and the same standard deviation.
png("d:/R/standardnormal-different-means.png",width=600,height=300)
#First normal distribution
curve(dnorm,
from=-2,to=2,ylab="d(x)",
xlim=c(-5,5))
abline(v=0,lwd=4,col="black")
#Only second normal distribution is plotted
myMean <- -1
curve(dnorm(x,mean=myMean),
from=myMean-2,to=myMean+2,
ylab="d(x)",xlim=c(-5,5), col="blue")
abline(v=-1,lwd=4,col="blue")
dev.off()
As the curve() function creates a new plot each time, only the second normal distribution is plotted.
I reopened this question because the ostensible duplicates focus on plotting two different functions or two different y-vectors with separate calls to curve. But since we want the same function, dnorm, plotted for different means, we can automate the process (although the answers to the other questions could also be generalized and automated in a similar way).
For example:
my_curve = function(m, col) {
curve(dnorm(x, mean=m), from=m - 3, to=m + 3, col=col, add=TRUE)
abline(v=m, lwd=2, col=col)
}
plot(NA, xlim=c(-10,10), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, seq(-6,6,2), rainbow(7))
Or, to generalize still further, let's allow multiple means and standard deviations and provide an option regarding whether to include a mean line:
my_curve = function(m, sd, col, meanline=TRUE) {
curve(dnorm(x, mean=m, sd=sd), from=m - 3*sd, to=m + 3*sd, col=col, add=TRUE)
if(meanline==TRUE) abline(v=m, lwd=2, col=col)
}
plot(NA, xlim=c(-10,10), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, rep(0,4), 4:1, rainbow(4), MoreArgs=list(meanline=FALSE))
You can also use line segments that start at zero and stop at the top of the density distribution, rather than extending all the way from the bottom to the top of the plot. For a normal distribution the mean is also the point of highest density. However, I've used the which.max approach below as a more general way of identifying the x-value at which the maximum y-value occurs. I've also added arguments for line width (lwd) and line end cap style (lend=1 means flat rather than rounded):
my_curve = function(m, sd, col, meanline=TRUE, lwd=1, lend=1) {
x=curve(dnorm(x, mean=m, sd=sd), from=m - 3*sd, to=m + 3*sd, col=col, add=TRUE)
if(meanline==TRUE) segments(m, 0, m, x$y[which.max(x$y)], col=col, lwd=lwd, lend=lend)
}
plot(NA, xlim=c(-10,20), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, seq(-5,5,5), c(1,3,5), rainbow(3))
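The which.max step works because curve() invisibly returns the points it evaluated, as a list with components x and y. A minimal sketch of that behaviour follows; the names pts and peak_x are illustrative, and pdf(NULL) is used only so the snippet runs without opening a window:

```r
# Null device so the sketch runs without a display (assumption: you would
# normally draw on your usual graphics device instead).
pdf(NULL)
plot(NA, xlim = c(-10, 10), ylim = c(0, 0.4), xlab = "x", ylab = "d(x)")

# curve() invisibly returns the evaluated points as list(x = ..., y = ...)
pts <- curve(dnorm(x, mean = 2, sd = 1.5), from = -2.5, to = 6.5, add = TRUE)

# x-value at which the evaluated density is largest
peak_x <- pts$x[which.max(pts$y)]

# Segment from the axis up to the top of the curve
segments(peak_x, 0, peak_x, max(pts$y), lwd = 2, lend = 1)
dev.off()
```

Because the peak is read off the evaluated grid, it is only as precise as the grid spacing; increasing curve()'s n argument tightens it.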
I am using R to plot a function, and want to add lines describing multiple functions to the same plot. To plot a function, I write:
plot(function(x){x},
xlab="Celsius", xlim=c(-100, 100),
ylab="Degrees", ylim=c(-100, 100))
This would plot a 1:1 line. If I want to plot a different function on the same graph, I can use the points() function but this requires data values for x to be provided such that it plots length(x) points (joined by lines) as:
points(x=seq(-100, 100, by=0.1),
y=c(seq(-100, 100, by=0.1)-32)*5/9,
typ="l", col="red")
Is it possible to add lines to a plot when plotting a function rather than having to calculate data points using points() or another function? Essentially, it would be something like this:
plot(function(x){x},
xlab="Celsius", xlim=c(-100, 100),
ylab="Degrees", ylim=c(-100, 100))
points(function(x){(x-32)*5/9},
typ="l", col="red")
This is just an example: it shows the relationship between degrees Celsius on the x-axis and degrees on the y-axis, in Celsius (black) and Fahrenheit (red). In reality I want to plot multiple complex functions, but that would just add noise to the question.
One solution I found is
plot(function(x){x},
xlab="Celsius", xlim=c(-100, 100),
ylab="Degrees", ylim=c(-100, 100))
par(new=TRUE)
plot(function(x){(x-32)*5/9},
xlab="", xlim=c(-100, 100),
ylab="", ylim=c(-100, 100),
axes=FALSE, col="red")
But it seems cumbersome having to define limits and labels and axes=FALSE each time.
You can call plot twice and pass add = TRUE in the second call.
With plot, you can also use the from and to parameters instead of repeating the axis limits. The second curve is still drawn within the y-axis limits defined by the first plot, so it might not be optimal.
plot(function(x){x},
xlab="Celsius", xlim=c(-100, 100),
ylab="Degrees", ylim=c(-100, 100))
plot(function(x) {(x-32)*5/9}, from = -100, to = 100, typ="l", col="red", add=T)
As mentioned by @Roland and @user2554330, you can also use curve() to plot several lines from the same expression without defining a function beforehand. In the loop below, add = i != 1 evaluates to FALSE on the first iteration (which creates the plot) and to TRUE on every later one:
for (i in 1:10) {
curve((x + 10*i), from=-100, to=100, add=i!=1)
}
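For several unrelated functions, one possible pattern is to keep them in a list and add each one to an empty plot. This is only a sketch: funs and cols are illustrative names, the second function is the question's formula kept as written, and pdf(NULL) is there only so the snippet runs without a display.

```r
# Functions to draw; the second is the formula from the question
funs <- list(identity, function(x) (x - 32) * 5 / 9)
cols <- c("black", "red")

pdf(NULL)  # null device; use your normal device in practice
# Set up the axes once, then add every function with add = TRUE
plot(NA, xlim = c(-100, 100), ylim = c(-100, 100),
     xlab = "Celsius", ylab = "Degrees")
for (i in seq_along(funs)) {
  f <- funs[[i]]
  curve(f(x), from = -100, to = 100, col = cols[i], add = TRUE)
}
dev.off()
```

Setting up the axes with plot(NA, ...) means the limits and labels are stated once, and every curve() call is identical apart from the function and colour.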
Having trouble creating a plot of different power functions for different alpha levels. This is what I have currently but I cannot figure out how to create the multiple lines representing the smooth power function across different alpha levels:
d <- data.frame()
for (s in seq(0,.5,.05)) {
for (n in seq(20,500,by=20)){
d <- rbind(d,power.t.test(n=n,delta = 11,sig.level=s,sd= 22.9))
}
}
d$sig.level.factor <-as.factor(d$sig.level)
plot(d$power~d$n, col=d$sig.level.factor)
for i in length(sig.level.factor){
lines(d$n[d$sig.level.factor==d$sig.level.factor[i]],d$power[d$sig.level.factor==d$sig.level.factor[i]], type="l", lwd=2, col=colors[n])
}
for (i in 1:length(seq(0,.5,.05))){
lines(d$n[d$sig.level.factor==d$sig.level[i]], d$power, type="l", lwd=2, col=colors[i])
}
for (i in 1:length(d$sig.level.factor)){
lines(d$n[d$sig.level.factor==i], d$power[d$sig.level.factor==i], type="l", lwd=2, col=colors[i])
}
My goal is to create the lines that will show the smooth curves connecting all the points that contain equivalent alpha values across different sample sizes.
Slightly late answer but hope you can still use it. You can create a matrix of results over n and significance levels using sapply and then plot everything in one go using the super useful function matplot:
n <- seq(20, 300, by=10)
alphas <- seq(0, .25, .05)
res <- sapply(alphas, function(s) {power.t.test(n=n, delta=11, sig.level=s, sd= 22.9)$power})
matplot(n, res, type="l", xlab="Sample size", ylab="Power")
One annoying "feature" of power.t.test and power.prop.test is that they are not fully vectorized over all arguments. However, your situation, where the output is the power, makes it easier.
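If you would rather not depend on which arguments happen to be vectorized, one hedged alternative is to evaluate every (n, alpha) pair explicitly with outer() and Vectorize(). The helper name pow_one is hypothetical, and alpha = 0 is left out here simply because it is not an interesting level to plot:

```r
n <- seq(20, 300, by = 10)
alphas <- seq(0.05, 0.25, by = 0.05)

# Hypothetical helper: power for a single sample size and significance level
pow_one <- function(n, s) {
  power.t.test(n = n, delta = 11, sig.level = s, sd = 22.9)$power
}

# Rows index sample size, columns index alpha
res <- outer(n, alphas, Vectorize(pow_one))

pdf(NULL)  # null device so the sketch runs without a display
matplot(n, res, type = "l", lty = 1, col = seq_along(alphas),
        xlab = "Sample size", ylab = "Power")
legend("bottomright", legend = paste("alpha =", alphas),
       lty = 1, col = seq_along(alphas))
dev.off()
```

This calls power.t.test once per pair, so it is slower than the sapply version above, but the same shape works for functions that are not vectorized at all.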
In R I'm able to overlay a normal curve on a density histogram:
I can then convert the density histogram to a probability one:
a <- rnorm(1:100)
test <-hist(a, plot=FALSE)
test$counts=(test$counts/sum(test$counts))*100 # Probability
plot(test, ylab="Probability")
curve(dnorm(x, mean=mean(a), sd=sd(a)), add=TRUE)
But I cannot overlap the normal curve anymore since it goes off scale.
Any solution? Maybe a second Y-axis
Now the question is clear to me. Indeed a second y-axis seems to be the best choice for this as the two data sets have completely different scales.
In order to do this you could do:
set.seed(2)
a <- rnorm(1:100)
test <-hist(a, plot=FALSE)
test$counts=(test$counts/sum(test$counts))*100 # Probability
plot(test, ylab="Probability")
#start new graph
par(new=TRUE)
#instead of using curve just use plot and create the data your-self
#this way below is how curve works internally anyway
curve_data <- dnorm(seq(-2, 2, 0.01), mean=mean(a), sd=sd(a))
#plot the line with no axes or labels
plot(seq(-2, 2, 0.01), curve_data, axes=FALSE, xlab='', ylab='', type='l', col='red' )
#add these now with axis
axis(4, at=pretty(range(curve_data)))
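If a second axis is not wanted, another option (a sketch, not what the code above does) is to rescale the normal density into the histogram's percentage units. For equal-width bins, the expected percentage in a bin is roughly density × bin width × 100. The name bin_width is illustrative:

```r
set.seed(2)
a <- rnorm(100)
h <- hist(a, plot = FALSE)

# Assumes equal-width bins, which is what hist() produces by default
bin_width <- diff(h$breaks[1:2])

# Convert counts to percentages, as in the question
h$counts <- h$counts / sum(h$counts) * 100

pdf(NULL)  # null device so the sketch runs without a display
plot(h, ylab = "Percent of observations")
# Scale the density so it is in the same units as the bars
curve(dnorm(x, mean = mean(a), sd = sd(a)) * bin_width * 100,
      add = TRUE, col = "red")
dev.off()
```

With this scaling both layers share one y-axis, so the heights of the bars and the curve can be compared directly.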
First, you should save your rnorm data; otherwise you get different data each time.
seed = rnorm(100)
Next go ahead with
hist(seed,probability = T)
curve(dnorm(x, mean=mean(na.omit(seed)), sd=sd(na.omit(seed))), add=TRUE)
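Now you have the expected result: a histogram with a density curve.
The y-axis isn't a "probability" as you have labeled it. It is count data. If you plot the histogram on the density scale (freq=FALSE), the bars and the curve share the same units and you shouldn't have a problem. Keep the data under a name other than x, because curve() evaluates its expression with x bound to the plotting grid, so mean(x) and sd(x) would otherwise be computed on the grid rather than on the data:
dat <- rnorm(1000)
hist(dat, freq= FALSE, ylab= "Density")
curve(dnorm(x, mean=mean(dat), sd=sd(dat)), add=TRUE)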
I'd like to plot a histogram and various PDFs on the same graph. I've tried with just one PDF using the following code (adapted from code I found on the web):
hist(data, freq = FALSE, col = "grey", breaks = "FD")
.x <- seq(0, 0.1, length.out=100)
curve(dnorm(.x, mean=a, sd=b), col = 2, add = TRUE)
It gives me an error. Can you advise me?
For multiple pdf's what's the trick?
I've also observed that the histogram seems to plot the density (on the y-axis) instead of the number of observations. How can I change this?
Many thanks!
It plots the density instead of the frequency because you specified freq=FALSE. It is not very fair to complain about it doing exactly what you told it to do.
The curve function expects an expression involving x (not .x) and it does not require you to precompute the x values. You probably want something like:
a <- 5
b <- 2
hist( rnorm(100, a, b), freq=FALSE )
curve( dnorm(x,a,b), add=TRUE )
To head off your next question: if you specify freq=TRUE (or just leave it out for the default) and add the curve, then the curve just runs along the bottom (avoiding that is the whole purpose of plotting the histogram as a density rather than frequencies). You can work around this by scaling the expression given to curve by the width of the bins and the total number of points:
out <- hist( rnorm(100, a, b) )
curve( dnorm(x,a,b)*100*diff(out$breaks[1:2]), add=TRUE )
Though personally the first option (density scale) without tickmark labels on the y-axis makes more sense to me.
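The question also asked about several PDFs. A sketch, under the assumption that each candidate is a normal distribution with its own mean and sd (the parameter values below are made up for illustration): call curve() once per candidate with add=TRUE on top of the density-scale histogram.

```r
set.seed(1)
a <- 5
b <- 2
dat <- rnorm(100, a, b)

# Illustrative candidate parameters, not taken from the question
params <- list(c(mean = 5, sd = 2), c(mean = 4, sd = 3), c(mean = 6, sd = 1.5))

pdf(NULL)  # null device so the sketch runs without a display
# On the density scale the bars integrate to 1, like each PDF
h <- hist(dat, freq = FALSE, col = "grey")
for (i in seq_along(params)) {
  p <- params[[i]]
  curve(dnorm(x, mean = p[["mean"]], sd = p[["sd"]]), col = i + 1, add = TRUE)
}
dev.off()
```

No rescaling is needed here because the histogram and every curve are densities; the scaling by bin width and sample size only applies on the frequency scale.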
h<-hist(data, breaks="FD", col="red", xlab="xTitle", main="Normal pdf and histogram")
xfit<-seq(min(data),max(data),length=100)
x.norm<-rnorm(n=100000, mean=a, sd=b)
yfit<-dnorm(xfit,mean=mean(x.norm),sd=sd(x.norm))
yfit <- yfit*diff(h$mids[1:2])*length(data)
lines(xfit, yfit, col="blue", lwd=2)
I'm trying to generate a histogram in R with a logarithmic scale for y. Currently I do:
hist(mydata$V3, breaks=c(0,1,2,3,4,5,25))
This gives me a histogram, but the density between 0 and 1 is so great (about a million values difference) that you can barely make out any of the other bars.
Then I've tried doing:
mydata_hist <- hist(mydata$V3, breaks=c(0,1,2,3,4,5,25), plot=FALSE)
plot(mydata_hist$counts, log="xy", pch=20, col="blue")
It gives me sorta what I want, but the bottom shows me the values 1-6 rather than 0, 1, 2, 3, 4, 5, 25. It's also showing the data as points rather than bars. barplot works but then I don't get any bottom axis.
A histogram is a poor-man's density estimate. Note that in your call to hist() using default arguments, you get frequencies not probabilities -- add ,prob=TRUE to the call if you want probabilities.
As for the log axis problem, don't use 'x' if you do not want the x-axis transformed:
plot(mydata_hist$count, log="y", type='h', lwd=10, lend=2)
gets you bars on a log-y scale -- the look-and-feel is still a little different but can probably be tweaked.
Lastly, you can also do hist(log(x), ...) to get a histogram of the log of your data.
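To also get the original intervals on the bottom axis instead of 1 to 6, one possible tweak is to suppress the default x-axis and draw it by hand. This is a sketch: v3 is simulated stand-in data for mydata$V3, and the interval labels are just one way to name the bars.

```r
set.seed(42)
# Hypothetical stand-in for mydata$V3: skewed values capped at 25
v3 <- pmin(rexp(10000, rate = 0.5), 25)

buckets <- c(0, 1, 2, 3, 4, 5, 25)
h <- hist(v3, breaks = buckets, plot = FALSE)

pdf(NULL)  # null device so the sketch runs without a display
# Bars on a log-y scale, with the default x-axis suppressed
plot(h$counts, log = "y", type = "h", lwd = 10, lend = 2,
     xaxt = "n", xlab = "V3 interval", ylab = "Count")
# One label per bar, built from consecutive break points
axis(1, at = seq_along(h$counts),
     labels = paste(head(buckets, -1), buckets[-1], sep = "-"))
dev.off()
```

The bars are evenly spaced here regardless of interval width, so the 5-25 bin looks as wide as the others; the labels carry that information instead.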
Another option would be to use the package ggplot2.
library(ggplot2)
ggplot(mydata, aes(x = V3)) + geom_histogram() + scale_x_log10()
It's not entirely clear from your question whether you want a logged x-axis or a logged y-axis. A logged y-axis is not a good idea when using bars because they are anchored at zero, which becomes negative infinity when logged. You can work around this problem by using a frequency polygon or density plot.
Dirk's answer is a great one. If you want an appearance like what hist produces, you can also try this:
buckets <- c(0,1,2,3,4,5,25)
mydata_hist <- hist(mydata$V3, breaks=buckets, plot=FALSE)
bp <- barplot(mydata_hist$counts, log="y", col="white", names.arg=paste(head(buckets, -1), buckets[-1], sep="-"))
text(bp, mydata_hist$counts, labels=mydata_hist$counts, pos=1)
The last line is optional, it adds value labels just under the top of each bar. This can be useful for log scale graphs, but can also be omitted.
I also pass main, xlab, and ylab parameters to provide a plot title, x-axis label, and y-axis label.
Run the hist() function without making a graph, log-transform the counts, and then draw the figure.
hist.data = hist(my.data, plot=F)
hist.data$counts = log(hist.data$counts, 2)
plot(hist.data)
It should look just like the regular histogram, but the y-axis will be log2 Frequency.
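One caveat: an empty bin has a count of 0, and log(0, 2) is -Inf, which may not plot cleanly. A common workaround, shown here only as a sketch with made-up data, is to add 1 before taking the log and to label the axis accordingly:

```r
set.seed(7)
# Illustrative data with an outlier, so that some bins are empty
my.data <- c(rnorm(200), 10)

hist.data <- hist(my.data, plot = FALSE)

# log2(count + 1) keeps empty bins at 0 instead of -Inf
hist.data$counts <- log2(hist.data$counts + 1)

pdf(NULL)  # null device so the sketch runs without a display
plot(hist.data, ylab = "log2(Frequency + 1)")
dev.off()
```

The +1 shifts every bar slightly, so the axis no longer reads as a plain log2 count; whether that trade-off is acceptable depends on how the plot will be read.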
I've put together a function that behaves identically to hist in the default case, but accepts the log argument. It uses several tricks from other posters, but adds a few of its own. hist(x) and myhist(x) look identical.
The original problem would be solved with:
myhist(mydata$V3, breaks=c(0,1,2,3,4,5,25), log="xy")
The function:
myhist <- function(x, ..., breaks="Sturges",
main = paste("Histogram of", xname),
xlab = xname,
ylab = "Frequency") {
xname = paste(deparse(substitute(x), 500), collapse="\n")
h = hist(x, breaks=breaks, plot=FALSE)
plot(h$breaks, c(NA,h$counts), type='S', main=main,
xlab=xlab, ylab=ylab, axes=FALSE, ...)
axis(1)
axis(2)
lines(h$breaks, c(h$counts,NA), type='s')
lines(h$breaks, c(NA,h$counts), type='h')
lines(h$breaks, c(h$counts,NA), type='h')
lines(h$breaks, rep(0,length(h$breaks)), type='S')
invisible(h)
}
Exercise for the reader: Unfortunately, not everything that works with hist works with myhist as it stands. That should be fixable with a bit more effort, though.
Here's a pretty ggplot2 solution:
library(ggplot2)
library(scales) # makes pretty labels on the x-axis
breaks=c(0,1,2,3,4,5,25)
ggplot(mydata,aes(x = V3)) +
geom_histogram(breaks = log10(breaks)) +
scale_x_log10(
breaks = breaks,
labels = scales::trans_format("log10", scales::math_format(10^.x))
)
Note that to set the breaks in geom_histogram, they had to be log10-transformed to work with scale_x_log10.