Plot ranges when plotting multiple things - r

R will automatically determine the plot ranges (xlim, ylim) from the first plot() call. But when you are plotting multiple things on a single plot, the subsequent calls to plot() might not fit in the frame, as in this case:
mu <- 8
sd <- 8
plot(function (x) dnorm(x, mu, sd, log = TRUE), xlim = c(0, 10)) # log likelihood
plot(function (x) (mu-x)/sd^2, col = "green", add = TRUE, xlim = c(0, 10)) # derivative log likelihood
I know I could first determine the ranges of all plot components myself and then min and max them together and pass the range to the first plot() call... but it is sooo inconvenient... and results in R scripts which are bulky and not easy to read.
Is there some simple way to handle this in R, or am I missing something? I am sure libraries like ggplot or lattice have better solutions, and it would be interesting to see them, but I strongly prefer a solution in base R. Thanks! :)
EDIT: would it be possible to defer the drawing until I call the last plot() and then plot everything at once? :-) This could be very elegant and the code would stay nicely compact :)

You could adapt your approach a little and define the functions beforehand.
fun1 <- function(x) dnorm(x, mu, sd, log=TRUE)
fun2 <- function(x) (mu-x)/sd^2
Then you can easily calculate the range of everything within the desired xlim of the plot, here 0:10.
y.range <- range(sapply(0:10, function(x) c(fun1(x), fun2(x))))
Finally, use the already defined functions rather than repeating them in the plot() call. NB: you may be looking for curve(), but plot() also works in this case.
curve(fun1, xlim=c(0, 10), ylim=y.range)
curve(fun2, col="green", add=TRUE, xlim=c(0, 10))
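If the goal is to avoid computing the range by hand at all, one base R option is to evaluate every function on a shared grid and pass the resulting matrix to matplot(), which chooses a ylim that covers all columns. A minimal sketch, assuming the mu and sd values from the question; the names funs, grid and vals are only illustrative:

```r
mu <- 8
sd <- 8

# Illustrative list of the functions to draw together
funs <- list(
  loglik  = function(x) dnorm(x, mu, sd, log = TRUE),  # log likelihood
  dloglik = function(x) (mu - x) / sd^2                # derivative of log likelihood
)

grid <- seq(0, 10, length.out = 201)          # shared x values
vals <- sapply(funs, function(f) f(grid))     # matrix with one column per function

# matplot() derives ylim from all columns of vals
matplot(grid, vals, type = "l", lty = 1, col = c("black", "green"),
        xlab = "x", ylab = "")
```

Adding another curve then only means adding another entry to funs (and another colour).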

How about something along this line:
plot(function (x) dnorm(x, mu, sd, log = TRUE), xlim = c(0, 10)) # log likelihood
par(new=TRUE)
plot.function(function (x) (mu-x)/sd^2, xlim = c(0, 10), col = "green", axes=FALSE, ann=FALSE) # derivative of log likelihood, same x range as the first plot
axis(side = 4)

Related

Plotting power plot in R without using ggplot2()?

Having trouble creating a plot of different power functions for different alpha levels. This is what I have currently but I cannot figure out how to create the multiple lines representing the smooth power function across different alpha levels:
d <- data.frame()
for (s in seq(0,.5,.05)) {
for (n in seq(20,500,by=20)){
d <- rbind(d,power.t.test(n=n,delta = 11,sig.level=s,sd= 22.9))
}
}
d$sig.level.factor <-as.factor(d$sig.level)
plot(d$power~d$n, col=d$sig.level.factor)
for i in length(sig.level.factor){
lines(d$n[d$sig.level.factor==d$sig.level.factor[i]],d$power[d$sig.level.factor==d$sig.level.factor[i]], type="l", lwd=2, col=colors[n])
}
for (i in 1:length(seq(0,.5,.05))){
lines(d$n[d$sig.level.factor==d$sig.level[i]], d$power, type="l", lwd=2, col=colors[i])
}
for (i in 1:length(d$sig.level.factor)){
lines(d$n[d$sig.level.factor==i], d$power[d$sig.level.factor==i], type="l", lwd=2, col=colors[i])
}
My goal is to create the lines that will show the smooth curves connecting all the points that contain equivalent alpha values across different sample sizes.
Slightly late answer but hope you can still use it. You can create a matrix of results over n and significance levels using sapply and then plot everything in one go using the super useful function matplot:
n <- seq(20, 300, by=10)
alphas <- seq(0, .25, .05)
res <- sapply(alphas, function(s) {power.t.test(n=n, delta=11, sig.level=s, sd= 22.9)$power})
matplot(n, res, type="l", xlab="Sample size", ylab="Power")
There is one annoying "feature" of power.t.test and power.prop.test: they are not fully vectorized over all arguments. However, your situation, where the output is the power, makes things easier.
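For comparison, the loop the question was aiming for can be written in base R by splitting the results on the significance level and drawing one line per group. This is only a sketch using the parameter values from the question (starting at alpha = 0.05 rather than 0); the names grid, groups and cols are made up for illustration:

```r
n <- seq(20, 500, by = 20)
alphas <- seq(0.05, 0.5, by = 0.05)

# One row per (n, alpha) combination
grid <- expand.grid(n = n, sig.level = alphas)

# power.t.test is called once per row
grid$power <- mapply(
  function(n, s) power.t.test(n = n, delta = 11, sd = 22.9, sig.level = s)$power,
  grid$n, grid$sig.level
)

groups <- split(grid, grid$sig.level)   # one data frame per alpha
cols <- rainbow(length(groups))

# Empty frame first, so the axes cover all groups
plot(power ~ n, data = grid, type = "n", xlab = "Sample size", ylab = "Power")
for (i in seq_along(groups)) {
  lines(groups[[i]]$n, groups[[i]]$power, lwd = 2, col = cols[i])
}
legend("bottomright", legend = names(groups), col = cols, lwd = 2,
       title = "alpha", cex = 0.7)
```

Calling plot() with type = "n" on the full data sets the axis ranges once, so every subsequent lines() call fits in the frame.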

Multiple plots using curve() function (e.g. normal distribution)

I am trying to plot multiple functions using curve(). My example tries to plot multiple normal distributions with different means and the same standard deviation.
png("d:/R/standardnormal-different-means.png",width=600,height=300)
#First normal distribution
curve(dnorm,
from=-2,to=2,ylab="d(x)",
xlim=c(-5,5))
abline(v=0,lwd=4,col="black")
#Only second normal distribution is plotted
myMean <- -1
curve(dnorm(x,mean=myMean),
from=myMean-2,to=myMean+2,
ylab="d(x)",xlim=c(-5,5), col="blue")
abline(v=-1,lwd=4,col="blue")
dev.off()
As the curve() function creates a new plot each time, only the second normal distribution is plotted.
I reopened this question because the ostensible duplicates focus on plotting two different functions or two different y-vectors with separate calls to curve. But since we want the same function, dnorm, plotted for different means, we can automate the process (although the answers to the other questions could also be generalized and automated in a similar way).
For example:
my_curve = function(m, col) {
curve(dnorm(x, mean=m), from=m - 3, to=m + 3, col=col, add=TRUE)
abline(v=m, lwd=2, col=col)
}
plot(NA, xlim=c(-10,10), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, seq(-6,6,2), rainbow(7))
Or, to generalize still further, let's allow multiple means and standard deviations and provide an option regarding whether to include a mean line:
my_curve = function(m, sd, col, meanline=TRUE) {
curve(dnorm(x, mean=m, sd=sd), from=m - 3*sd, to=m + 3*sd, col=col, add=TRUE)
if(meanline==TRUE) abline(v=m, lwd=2, col=col)
}
plot(NA, xlim=c(-10,10), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, rep(0,4), 4:1, rainbow(4), MoreArgs=list(meanline=FALSE))
You can also use line segments that start at zero and stop at the top of the density distribution, rather than extending all the way from the bottom to the top of the plot. For a normal distribution the mean is also the point of highest density. However, I've used the which.max approach below as a more general way of identifying the x-value at which the maximum y-value occurs. I've also added arguments for line width (lwd) and line end cap style (lend=1 means flat rather than rounded):
my_curve = function(m, sd, col, meanline=TRUE, lwd=1, lend=1) {
x=curve(dnorm(x, mean=m, sd=sd), from=m - 3*sd, to=m + 3*sd, col=col, add=TRUE)
if(meanline==TRUE) segments(m, 0, m, x$y[which.max(x$y)], col=col, lwd=lwd, lend=lend)
}
plot(NA, xlim=c(-10,20), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, seq(-5,5,5), c(1,3,5), rainbow(3))

How to make a plot of generalized beta distribution?

I am trying to plot the beta-Gumbel distribution in R with the code below.
The general idea is that, in the pdf of the beta distribution, instead of plugging in x we plug in the cdf of the Gumbel distribution. But I couldn't get the right plot.
x <- seq(-3, 3, length=100)
Fx = pgumbel(x,loc=0,scale=1)
y = dbeta(Fx,shape1=0.5,shape2=0.5)
plot(x, y, type="l", lty=2, xlab="x value", ylab="Density",ylim=c(0,1))
I don't believe you when you say that you didn't use any add-on packages: pgumbel() is not in base R. library("sos"); findFn("pgumbel") finds it in a variety of places, I used the evd package.
There are a couple of small issues here.
library("evd")
The main thing is that you want length=100 rather than by=100 (which gives you a single-element x vector):
x <- seq(-3, 3, length=100)
The actual computations are OK:
Fx = pgumbel(x,loc=0,scale=1)
y = dbeta(Fx,shape1=0.5,shape2=0.5)
You need to change ylim to be able to see what's going on. However, I also think you need to do something to account for the differential dx in order to get a proper density function (that's more of a StackExchange than a StackOverflow question though).
par(las=1,bty="l") ## my personal preferences
plot(x, y, type="l", lty=2, xlab="x value", ylab="Density",ylim=c(0,40))
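On the point about the differential: if F and f are the Gumbel cdf and pdf, the change of variables gives the density dbeta(F(x), a, b) * f(x). A minimal sketch, with the Gumbel functions written out by hand so that no add-on package is needed (pgumbel0, dgumbel0 and dbetagumbel are hypothetical helper names, not functions from evd):

```r
# Gumbel cdf and pdf, written out explicitly
pgumbel0 <- function(x, loc = 0, scale = 1) exp(-exp(-(x - loc) / scale))
dgumbel0 <- function(x, loc = 0, scale = 1) {
  z <- (x - loc) / scale
  exp(-z - exp(-z)) / scale
}

# Beta density evaluated at F(x), multiplied by f(x) (the Jacobian term)
dbetagumbel <- function(x, shape1, shape2, loc = 0, scale = 1) {
  dbeta(pgumbel0(x, loc, scale), shape1, shape2) * dgumbel0(x, loc, scale)
}

x <- seq(-3, 8, length.out = 400)
y <- dbetagumbel(x, shape1 = 0.5, shape2 = 0.5)
plot(x, y, type = "l", lty = 2, xlab = "x value", ylab = "Density")

# Sanity check on a finite range: the area should be close to 1
area <- integrate(dbetagumbel, lower = -3, upper = 20,
                  shape1 = 0.5, shape2 = 0.5)$value
```

The integration range is kept finite on purpose: far out in the tails the cdf rounds to exactly 0 or 1, where dbeta() with these shape parameters returns Inf.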

Get xlim from a plot in R

I want a histogram and a density on the same plot. I'm trying this:
myPlot <- plot(density(m[,1]), main="", xlab="", ylab="")
par(new=TRUE)
Oldxlim <- myPlot$xlim
Oldylim <- myPlot$ylim
hist(m[,3],xlim=Oldxlim,ylim=Oldylim,prob=TRUE)
but I can't access myPlot's xlim and ylim.
Is there a way to get them from myPlot? What else should I do instead?
Using par(new=TRUE) is rarely, if ever, the best solution. Many plotting functions have an option like add=TRUE that will add to the existing plot (including the plotting function for histograms as mentioned in the comments).
If you really need to do it this way then look at the usr argument to the par function, doing mylims <- par("usr") will give the x and y limits of the existing plot in user coordinates. However when you use that information on a new plot make sure to set xaxs='i' or the actual coordinates used in the new plot will be extended by 4% beyond what you specify.
The functions grconvertX and grconvertY are also useful to know. They could be used for this purpose, but are probably overkill compared to par("usr"); they are more useful for finding the limits in other coordinate systems, or for finding values like the middle of the plotting region in user coordinates.
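A small sketch of the par("usr") route described above, on simulated data (dens_data and hist_data are placeholders standing in for m[,1] and m[,3]):

```r
set.seed(1)
dens_data <- rnorm(200)               # stands in for m[,1]
hist_data <- rnorm(200, mean = 0.5)   # stands in for m[,3]

plot(density(dens_data), main = "", xlab = "", ylab = "")
usr <- par("usr")   # c(x1, x2, y1, y2) of the density plot, in user coordinates

# xaxs/yaxs = "i" stops R from extending the limits by another 4%
op <- par(new = TRUE, xaxs = "i", yaxs = "i")
hist(hist_data, prob = TRUE, xlim = usr[1:2], ylim = usr[3:4],
     col = NA, border = "grey40",
     axes = FALSE, main = "", xlab = "", ylab = "")
usr_hist <- par("usr")   # coordinates actually used by the histogram
par(op)
```

With the "i" axis style both plots share the same user coordinates, so the bars line up with the axes drawn by the density plot.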
Have you considered specifying your own xlim and ylim in the first plot (setting them to appropriate values) then just using those values again to set the limits on the histogram in the second plot?
Just by plotting density on its own you should be able to work out sensible values for the minimum and maximum values for both axes then replace xmin, xmax, ymin and ymax for those values in the code below.
Something like:
plot(density(m[,1]), main="", xlab="", ylab="", xlim = c(xmin, xmax), ylim = c(ymin, ymax))
par(new=TRUE)
hist(m[,3], xlim = c(xmin, xmax), ylim = c(ymin, ymax), prob=TRUE)
If for any reason you are not able to use range() to get the limits, I'd follow @Greg's suggestion. This only works if the par parameters "xaxs" and "yaxs" are set to "r" (which is the default), in which case the coordinate range is extended by 4% at each end:
plot(seq(0.8,9.8,1), 10:19)
usr <- par('usr')
xr <- (usr[2] - usr[1]) / 27 # 27 = (100 + 2*4) / 4
yr <- (usr[4] - usr[3]) / 27
xlim <- c(usr[1] + xr, usr[2] - xr)
ylim <- c(usr[3] + yr, usr[4] - yr)
I think the best solution is to fix them when you plot your density.
Otherwise, you can borrow the range computation from the code of plot.default (plot.R):
dd <- density(c(-20,rep(0,98),20))
xlab <- ""
ylab <- ""
log <- ""
xy <- xy.coords(dd$x, dd$y, xlab, ylab, log)
xlim1 <- range(xy$x[is.finite(xy$x)])
ylim1 <- range(xy$y[is.finite(xy$y)])
and then use the xlim and ylim generated above when you plot the density:
plot(dd,xlim=xlim1,ylim=ylim1)
x <- rchisq(100, df = 4)
hist(x,xlim=xlim1,ylim=ylim1,prob=TRUE,add=TRUE)
Why not use ggplot2?
library(ggplot2)
set.seed(42)
df <- data.frame(x = rnorm(500,mean=10,sd=5),y = rlnorm(500,sdlog=1.1))
p1 <- ggplot(df) +
geom_histogram(aes(x=y,y = ..density..),binwidth=2) +
geom_density(aes(x=x),fill="green",alpha=0.3)
print(p1)

histogram and pdf in the same graph [duplicate]

I'd like to plot on the same graph the histogram and various pdf's. I've tried for just one pdf with the following code (adopted from code I've found in the web):
hist(data, freq = FALSE, col = "grey", breaks = "FD")
.x <- seq(0, 0.1, length.out=100)
curve(dnorm(.x, mean=a, sd=b), col = 2, add = TRUE)
It gives me an error. Can you advise me?
For multiple pdf's what's the trick?
And I've observed that the histogram seems to plot the density (on the y-axis) instead of the number of observations... how can I change this?
Many thanks!
It plots the density instead of the frequency because you specified freq=FALSE. It is not very fair to complain about it doing exactly what you told it to do.
The curve function expects an expression involving x (not .x) and it does not require you to precompute the x values. You probably want something like:
a <- 5
b <- 2
hist( rnorm(100, a, b), freq=FALSE )
curve( dnorm(x,a,b), add=TRUE )
To head off your next question: if you specify freq=TRUE (or just leave it out, since it is the default) and add the curve, then the curve just runs along the bottom (that is the whole purpose of plotting the histogram as a density rather than frequencies). You can work around this by scaling the expression given to curve by the width of the bins and the total number of points:
out <- hist( rnorm(100, a, b) )
curve( dnorm(x,a,b)*100*diff(out$breaks[1:2]), add=TRUE )
Though personally the first option (density scale) without tickmark labels on the y-axis makes more sense to me.
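For several pdfs, the same pattern can be repeated with one curve() call per density. A small sketch on simulated data (obs and the two candidate normal fits are illustrative assumptions, not taken from the question); the ylim is widened beforehand so that none of the curves is clipped:

```r
set.seed(42)
obs <- rnorm(100, mean = 5, sd = 2)   # simulated stand-in for the data

m <- mean(obs)
s <- sd(obs)

# Compute the histogram without drawing it, to learn its tallest bar
h <- hist(obs, plot = FALSE)

# Upper limit that covers the bars and the peaks of both candidate curves
ymax <- max(h$density, dnorm(m, m, s), dnorm(m, m, 1.5 * s))

hist(obs, freq = FALSE, col = "grey", ylim = c(0, ymax))
curve(dnorm(x, mean = m, sd = s), col = 2, add = TRUE)
curve(dnorm(x, mean = m, sd = 1.5 * s), col = 4, lty = 2, add = TRUE)
```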
h<-hist(data, breaks="FD", col="red", xlab="xTitle", main="Normal pdf and histogram")
xfit<-seq(min(data),max(data),length=100)
x.norm<-rnorm(n=100000, mean=a, sd=b)
yfit<-dnorm(xfit,mean=mean(x.norm),sd=sd(x.norm))
yfit <- yfit*diff(h$mids[1:2])*length(data) # scale the density to counts: bin width times number of observations
lines(xfit, yfit, col="blue", lwd=2)

Resources