I want a histogram and a density on the same plot. I'm trying this:
myPlot <- plot(density(m[,1]), main="", xlab="", ylab="")
par(new=TRUE)
Oldxlim <- myPlot$xlim
Oldylim <- myPlot$ylim
hist(m[,3],xlim=Oldxlim,ylim=Oldylim,prob=TRUE)
but I can't access myPlot's xlim and ylim.
Is there a way to get them from myPlot? What else should I do instead?
Using par(new=TRUE) is rarely, if ever, the best solution. Many plotting functions have an option like add=TRUE that will add to the existing plot (including the plotting function for histograms as mentioned in the comments).
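A minimal sketch of the add=TRUE route, assuming a simulated matrix stands in for the question's m (the data is hypothetical). Both layers are computed first so that the limits cover the density curve and the histogram bars; the histogram is then drawn onto the existing axes.

```r
set.seed(1)
# hypothetical stand-in for the question's matrix m
m <- matrix(rnorm(300), ncol = 3)

d <- density(m[, 1])             # density estimate for column 1
h <- hist(m[, 3], plot = FALSE)  # histogram of column 3, computed but not drawn

# limits that cover both the density curve and the histogram bars
xlim <- range(d$x, h$breaks)
ylim <- range(0, d$y, h$density)

plot(d, main = "", xlab = "", ylab = "", xlim = xlim, ylim = ylim)
hist(m[, 3], freq = FALSE, add = TRUE)  # bars drawn on the existing axes
lines(d, lwd = 2)                        # redraw the curve on top of the bars
```

Computing the histogram with plot = FALSE first is what lets you pick a ylim tall enough for both layers before anything is drawn.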
If you really need to use par(new=TRUE), look at the "usr" parameter of par: mylims <- par("usr") gives the x and y limits of the existing plot in user coordinates, as c(x1, x2, y1, y2). However, when you use those limits on a new plot, make sure to set xaxs='i' (and yaxs='i'); otherwise the coordinates actually used by the new plot will be extended by 4% at each end beyond what you specify.
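A sketch of the par("usr") approach, assuming two simulated vectors stand in for m[,1] and m[,3] (hypothetical data). Passing xaxs="i" and yaxs="i" to the second plot makes it reuse the saved limits exactly, so both layers share one coordinate system.

```r
set.seed(1)
x1 <- rnorm(200)            # hypothetical stand-in for m[, 1]
x3 <- rnorm(200, sd = 1.5)  # hypothetical stand-in for m[, 3]

plot(density(x1), main = "", xlab = "", ylab = "")
usr <- par("usr")           # c(x1, x2, y1, y2) of the current plot, in user coordinates

par(new = TRUE)
# xaxs/yaxs = "i" stop the new plot from padding usr by another 4% on each side
hist(x3, freq = FALSE, main = "", xlab = "", ylab = "", axes = FALSE,
     xlim = usr[1:2], ylim = usr[3:4], xaxs = "i", yaxs = "i")

usr_after <- par("usr")     # should match usr
```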
The functions grconvertX and grconvertY are also useful to know. They could be used for this purpose, although they are probably overkill compared to par("usr"); they are more useful for finding limits in other coordinate systems, or for finding values such as the middle of the plotting region in user coordinates.
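A short sketch of grconvertX/grconvertY on a toy plot (the plot itself is only an illustration). "npc" coordinates run from 0 to 1 across the plot region, so converting 0 and 1 to user coordinates reproduces par("usr"), and 0.5 gives the middle of the plotting region.

```r
plot(1:10, 1:10)

# edges of the plot region, converted from "npc" to user coordinates
xlims <- grconvertX(c(0, 1), from = "npc", to = "user")
ylims <- grconvertY(c(0, 1), from = "npc", to = "user")

# the middle of the plotting region in user coordinates
mid <- c(grconvertX(0.5, "npc", "user"), grconvertY(0.5, "npc", "user"))
points(mid[1], mid[2], pch = 4, col = "red")
```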
Have you considered specifying your own xlim and ylim in the first plot (setting them to appropriate values) then just using those values again to set the limits on the histogram in the second plot?
Just by plotting the density on its own you should be able to work out sensible minimum and maximum values for both axes; then replace xmin, xmax, ymin and ymax with those values in the code below.
Something like:
plot(density(m[,1]), main="", xlab="", ylab="", xlim=c(xmin, xmax), ylim=c(ymin, ymax))
par(new=TRUE)
hist(m[,3], xlim=c(xmin, xmax), ylim=c(ymin, ymax), prob=TRUE)
If for any reason you are not able to use range() to get the limits, I'd follow @Greg's suggestion. This only works if the par parameters "xaxs" and "yaxs" are set to "r" (the default), in which case the coordinate range is extended by 4% at each end:
plot(seq(0.8,9.8,1), 10:19)
usr <- par('usr')
xr <- (usr[2] - usr[1]) / 27 # 27 = (100 + 2*4) / 4
yr <- (usr[4] - usr[3]) / 27
xlim <- c(usr[1] + xr, usr[2] - xr)
ylim <- c(usr[3] + yr, usr[4] - yr)
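The same arithmetic wrapped in a small helper, as a sketch; usr_to_lims is a hypothetical name, not an existing function. Applied to the plot above, the recovered limits should equal range() of the plotted data (0.8 to 9.8 on x, 10 to 19 on y).

```r
# hypothetical helper: undo the 4% padding that xaxs/yaxs = "r" adds at each end
usr_to_lims <- function(usr = par("usr")) {
  xr <- (usr[2] - usr[1]) / 27  # padded width = 1.08 * data width, pad = 0.04 * data width
  yr <- (usr[4] - usr[3]) / 27
  list(xlim = c(usr[1] + xr, usr[2] - xr),
       ylim = c(usr[3] + yr, usr[4] - yr))
}

plot(seq(0.8, 9.8, 1), 10:19)
lims <- usr_to_lims()
```

This assumes linear axes: on a log axis par("usr") holds log10 values, so the inversion would have to be done on that scale.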
I think the best solution is to fix them when you plot your density.
Otherwise, you can borrow the code that plot.default (in plot.R) uses to compute its default limits, use it to generate xlim and ylim, and then call your density plot with them:
dd <- density(c(-20,rep(0,98),20))
xy <- xy.coords(dd, NULL, xlab="", ylab="", log="")
xlim1 <- range(xy$x[is.finite(xy$x)])
ylim1 <- range(xy$y[is.finite(xy$y)])
plot(dd,xlim=xlim1,ylim=ylim1)
x <- rchisq(100, df = 4)
hist(x,xlim=xlim1,ylim=ylim1,prob=TRUE,add=TRUE)
Why not use ggplot2?
library(ggplot2)
set.seed(42)
df <- data.frame(x = rnorm(500,mean=10,sd=5),y = rlnorm(500,sdlog=1.1))
p1 <- ggplot(df) +
geom_histogram(aes(x=y, y=after_stat(density)), binwidth=2) +
geom_density(aes(x=x),fill="green",alpha=0.3)
print(p1)
I think my problem is best explained by an example:
set.seed(12)
n <- 100
x <- rt(n, 1, 0)
library("ggplot2")
p <- ggplot() + geom_density(aes(x))
p
p + xlim(min(x), 300)
(Plots: the density with the default xlim, and the same density with the new xlim.)
Why does the y axis automatically change when I change xlim? The density should not change, so it does not make sense to me. When I use base plot this does not happen.
plot(density(x))
plot(density(x), xlim = c(min(x), 300))
Using xlim completely drops observations that are outside of the range. Try using p + coord_cartesian(xlim = c(min(x), 300)).
Use geom_density(..., n=2^16) or similar for a more stable experience.
It would appear that in contrast to density, the function geom_density does take the x range set via xlim into account when deciding at which points to evaluate the density estimation. However, the number of such points remains fixed at 512 (unless using n to set it to a higher value). Hence the larger the x range, the more likely some peaks will be missed. I think this should be documented.
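A base-R sketch of the evaluation-grid effect, assuming (as ggplot2's density stat does) that the estimate is evaluated with stats::density() from the lower to the upper x limit; it leaves aside any points that xlim() drops. Comparing the grid spacing and peak height of each version shows how the wider range spreads the same 512 points more thinly, and how a larger n restores resolution.

```r
set.seed(12)
x <- rt(100, 1, 0)

# base R default: evaluated only over the range of the data (plus padding)
d_data <- density(x)

# evaluated on 512 points spread from min(x) to 300, like xlim(min(x), 300)
d_wide <- density(x, from = min(x), to = 300, n = 512)

# same range with a much finer grid, like geom_density(n = 2^16)
d_fine <- density(x, from = min(x), to = 300, n = 2^16)

# grid spacing and peak height of each version
sapply(list(data = d_data, wide = d_wide, fine = d_fine),
       function(d) c(step = diff(d$x[1:2]), peak = max(d$y)))
```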
Suppose I generate data using x <- rnorm(10000) and then plot a simple histogram using hist(x).
This obviously shows that the data is normal, but the x and y axes are determined by the values generated. How could I adjust x so that the histogram will still appear as a normal curve, but on a plot whose bounds are x=[0,1] and y=[0,1]. I tried using this normalization method from another answer, https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range, and setting xlim and ylim to c(0,1), but the result was not what I wanted, as it basically just fills up the entire plot.
I'm not sure what you mean by 'fills up the whole plot'. This code seems to work fine:
x <- rnorm(1000)
z <- (x - min(x))/(max(x) - min(x))
hist(z)
Then if you want the y-axis on a scale of 0-1:
hist1 <- hist(z)
hist1$counts <- hist1$counts/sum(hist1$counts)
plot(hist1, ylim = c(0,1)) ## Looks squished to me if you include the ylim argument
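The two steps above wrapped in one helper, as a sketch; unit_hist is a hypothetical name. It min-max rescales the data to [0, 1] and converts counts to relative frequencies (bar heights summing to 1), so the result fits on a plot bounded by x=[0,1] and y=[0,1].

```r
# hypothetical helper: rescale to [0, 1], then turn counts into relative frequencies
unit_hist <- function(x, ...) {
  z <- (x - min(x)) / (max(x) - min(x))
  h <- hist(z, plot = FALSE, ...)
  h$counts <- h$counts / sum(h$counts)
  h
}

set.seed(1)
x <- rnorm(1000)
h <- unit_hist(x)
plot(h, xlim = c(0, 1), ylim = c(0, 1), main = "",
     xlab = "rescaled x", ylab = "relative frequency")
```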
I'm trying to log-transform the x axis of a density plot and get unexpected results. The code without the transformation works fine:
library(ggplot2)
data = data.frame(x=c(1,2,10,11,1000))
dens = density(data$x)
densy = sapply(data$x, function(x) { dens$y[findInterval(x, dens$x)] })
ggplot(data, aes(x = x)) +
geom_density() +
geom_point(y = densy)
If I add scale_x_log10(), I get the following result:
Apart from the y values having been rescaled, something seems to have happened to the x values as well -- the peaks of the density function are not quite where the points are.
Am I using the log transformation incorrectly here?
The shape of the density curve changes after the transformation because the distribution of the data has changed and the automatically chosen bandwidths are different. If you set a bandwidth of 1000 (bw=1000) before the transformation and 10 after it, you will get two normal-looking densities (with different y-axis values, because the support is much larger in the first case). Here is an example showing how varying the bandwidth changes the shape of the density.
data = data.frame(x=c(1,2,10,11,1000), y=0)
## Examine how changing bandwidth changes the shape of the curve
par(mfrow=c(2,1))
greys <- colorRampPalette(c("black", "red"))(10)
plot(density(data$x), main="No Transform")
points(data, pch=19)
plot(density(log10(data$x)), ylim=c(0,2), main="Log-transform w/ varying bw")
points(log10(data$x), data$y, pch=19)
for (i in 1:10)
points(density(log10(data$x), bw=0.02*i), col=greys[i], type="l")
legend("topright", paste(0.02*1:10), col=greys, lty=1, cex=0.8, title="bw")
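To make the points line up with the curve after scale_x_log10(), their heights also need to come from a density estimated on the log10 scale, since ggplot2 transforms the data before the density stat is computed. A base-R sketch of that calculation (dens_log and densy_log are illustrative names):

```r
data <- data.frame(x = c(1, 2, 10, 11, 1000))

# density estimated on the raw scale (what the question's densy uses)
dens_raw <- density(data$x)

# density estimated on the log10 scale (what geom_density() computes
# once scale_x_log10() is in place)
dens_log <- density(log10(data$x))

# the automatically chosen bandwidths differ a lot between the two scales
c(raw = dens_raw$bw, log10 = dens_log$bw)

# interpolate the log-scale density at each transformed data point
densy_log <- approx(dens_log$x, dens_log$y, xout = log10(data$x))$y

plot(dens_log, main = "Density of log10(x)")
points(log10(data$x), densy_log, pch = 19)
```

In the ggplot2 version, densy_log would then replace densy in geom_point(y = ...).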
I'm trying to calculate a Bezier-like spline curve that passes through a sequence of x-y coordinates. An example would be like the following output from the cscvn function in Matlab (example link):
I believe the (no longer maintained) grid package used to do this (grid.xspline function?), but I haven't been able to install an archived version of the package, and don't find any examples exactly along the lines of what I would like.
The bezier package also looks promising, but it is very slow and I also can't get it quite right:
library(bezier)
set.seed(1)
n <- 10
x <- runif(n)
y <- runif(n)
p <- cbind(x,y)
xlim <- c(min(x) - 0.1*diff(range(x)), c(max(x) + 0.1*diff(range(x))))
ylim <- c(min(y) - 0.1*diff(range(y)), c(max(y) + 0.1*diff(range(y))))
plot(p, xlim=xlim, ylim=ylim)
text(p, labels=seq(n), pos=3)
bp <- pointsOnBezier(cbind(x,y), n=100)
lines(bp$points)
arrows(bp$points[nrow(bp$points)-1,1], bp$points[nrow(bp$points)-1,2],
bp$points[nrow(bp$points),1], bp$points[nrow(bp$points),2]
)
As you can see, it doesn't pass through any points except the end values.
I would greatly appreciate some guidance here!
There is no need to use grid really. You can access xspline from the graphics package.
Following on from your code and the shape from @MrFlick:
set.seed(1)
n <- 10
x <- runif(n)
y <- runif(n)
p <- cbind(x,y)
xlim <- c(min(x) - 0.1*diff(range(x)), c(max(x) + 0.1*diff(range(x))))
ylim <- c(min(y) - 0.1*diff(range(y)), c(max(y) + 0.1*diff(range(y))))
plot(p, xlim=xlim, ylim=ylim)
text(p, labels=seq(n), pos=3)
You just need one extra line:
xspline(x, y, shape = c(0, rep(-1, n - 2), 0), border="red")
It may not be the best approach, but grid certainly isn't inactive. It ships with every R installation and is the underlying graphics engine for plotting libraries like lattice and ggplot2. You shouldn't need to install it; you should just be able to load it. Here's how I might translate your code to use grid.xspline:
set.seed(1)
n <- 10
x <- runif(n)
y <- runif(n)
xlim <- c(min(x) - 0.1*diff(range(x)), c(max(x) + 0.1*diff(range(x))))
ylim <- c(min(y) - 0.1*diff(range(y)), c(max(y) + 0.1*diff(range(y))))
library(grid)
grid.newpage()
pushViewport(viewport(xscale=xlim, yscale=ylim))
grid.points(x, y, pch=16, size=unit(2, "mm"),
default.units="native")
grid.text(seq(n), x,y, just=c("center","bottom"),
default.units="native")
grid.xspline(x, y, shape=c(0,rep(-1, 10-2),0), open=TRUE,
default.units="native")
popViewport()
This draws the numbered points and a spline passing through them. Note that grid is fairly low-level, so it's not especially easy to work with, but it gives you far more control over what you plot and where.
And if you want to extract the points along the curve rather than draw it, look at the ?xsplinePoints help page.
Thanks to all that helped with this. I'm summarizing the lessons learned plus a few other aspects.
Catmull-Rom spline vs. cubic B-spline
Negative shape values in the xspline function return a Catmull-Rom type spline, with spline passing through the x-y points. Positive values return a cubic B type spline. Zero values return a sharp corner. If a single shape value is given, this is used for all points. The shape of end points is always treated like a sharp corner (shape=0), and other values do not influence the resulting spline at the end points:
# Catmull-Rom spline vs. cubic B-spline
plot(p, xlim=extendrange(x, f=0.2), ylim=extendrange(y, f=0.2))
text(p, labels=seq(n), pos=3)
# Catmull-Rom spline (-1)
xspline(p, shape = -1, border="red", lwd=2)
# Catmull-Rom spline (-0.5)
xspline(p, shape = -0.5, border="orange", lwd=2)
# cubic B-spline (0.5)
xspline(p, shape = 0.5, border="green", lwd=2)
# cubic B-spline (1)
xspline(p, shape = 1, border="blue", lwd=2)
legend("bottomright", ncol=2, legend=c(-1,-0.5), title="Catmull-Rom spline", col=c("red", "orange"), lty=1)
legend("topleft", ncol=2, legend=c(1, 0.5), title="cubic B-spline", col=c("blue", "green"), lty=1)
Extracting results from xspline for external plotting
This took some searching, but the trick is to pass draw=FALSE to xspline, which then returns the computed x and y coordinates instead of drawing the curve.
# Extract xy values
plot(p, xlim=extendrange(x, f=0.1), ylim=extendrange(y, f=0.1))
text(p, labels=seq(n), pos=3)
spl <- xspline(x, y, shape = -0.5, draw=FALSE)
lines(spl)
arrows(x0=(spl$x[length(spl$x)-0.01*length(spl$x)]), y0=(spl$y[length(spl$y)-0.01*length(spl$y)]),
x1=(spl$x[length(spl$x)]), y1=(spl$y[length(spl$y)])
)
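A quick check of the Catmull-Rom claim, as a sketch on freshly simulated points: with shape = -1 and draw = FALSE, every control point should lie (up to sampling resolution) on the returned curve. The 0.05 tolerance below is an arbitrary choice for data in the unit square.

```r
set.seed(1)
n <- 10
x <- runif(n)
y <- runif(n)

plot(x, y)
# shape = -1: interpolating (Catmull-Rom-type) spline; draw = FALSE returns the points
spl <- xspline(x, y, shape = -1, draw = FALSE)

# distance from each control point to the nearest point on the returned curve
nearest <- sapply(seq_len(n), function(i)
  min(sqrt((spl$x - x[i])^2 + (spl$y - y[i])^2)))
round(nearest, 4)
```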
I'm trying to generate a histogram in R with a logarithmic scale for y. Currently I do:
hist(mydata$V3, breaks=c(0,1,2,3,4,5,25))
This gives me a histogram, but the density between 0 and 1 is so great (about a million values difference) that you can barely make out any of the other bars.
Then I've tried doing:
mydata_hist <- hist(mydata$V3, breaks=c(0,1,2,3,4,5,25), plot=FALSE)
plot(mydata_hist$counts, log="xy", pch=20, col="blue")
It gives me sorta what I want, but the bottom shows me the values 1-6 rather than 0, 1, 2, 3, 4, 5, 25. It's also showing the data as points rather than bars. barplot works but then I don't get any bottom axis.
A histogram is a poor man's density estimate. Note that because your breaks are unequally spaced, hist() already plots densities rather than frequencies (freq defaults to TRUE only when the breaks are equidistant); pass freq=TRUE or freq=FALSE explicitly to choose which you get.
As for the log axis problem, don't use 'x' if you do not want the x-axis transformed:
plot(mydata_hist$counts, log="y", type='h', lwd=10, lend=2)
gets you bars on a log-y scale -- the look-and-feel is still a little different but can probably be tweaked.
Lastly, you can also do hist(log(x), ...) to get a histogram of the log of your data.
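A self-contained version of the type='h' approach with bin labels on the x-axis, as a sketch; the V3 vector below is hypothetical data chosen so that every bin is non-empty (a zero count cannot be shown on a log axis).

```r
# hypothetical data: most values in (0, 1), a few in the wider bins
V3 <- c(runif(1e5, 0, 1), rep(c(1.5, 2.5, 3.5, 4.5, 10), times = c(50, 20, 10, 5, 2)))
breaks <- c(0, 1, 2, 3, 4, 5, 25)

h <- hist(V3, breaks = breaks, plot = FALSE)

# one thick vertical bar per bin; only the y axis is logged
plot(h$counts, log = "y", type = "h", lwd = 10, lend = 2,
     xaxt = "n", xlab = "V3", ylab = "Count (log scale)")
# label each bar with its bin instead of its index 1..6
bin_labels <- paste(head(breaks, -1), breaks[-1], sep = "-")
axis(1, at = seq_along(h$counts), labels = bin_labels)
```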
Another option would be to use the package ggplot2.
ggplot(mydata, aes(x = V3)) + geom_histogram() + scale_x_log10()
It's not entirely clear from your question whether you want a logged x-axis or a logged y-axis. A logged y-axis is not a good idea when using bars because they are anchored at zero, which becomes negative infinity when logged. You can work around this problem by using a frequency polygon or density plot.
Dirk's answer is a great one. If you want an appearance like what hist produces, you can also try this:
buckets <- c(0,1,2,3,4,5,25)
mydata_hist <- hist(mydata$V3, breaks=buckets, plot=FALSE)
bp <- barplot(mydata_hist$counts, log="y", col="white", names.arg=paste(head(buckets, -1), buckets[-1], sep="-"))
text(bp, mydata_hist$counts, labels=mydata_hist$counts, pos=1)
The last line is optional, it adds value labels just under the top of each bar. This can be useful for log scale graphs, but can also be omitted.
I also pass main, xlab, and ylab parameters to provide a plot title, x-axis label, and y-axis label.
Run the hist() function without making a graph, log-transform the counts, and then draw the figure.
hist.data = hist(my.data, plot=FALSE)
hist.data$counts = log(hist.data$counts, 2)
plot(hist.data)
It should look just like the regular histogram, but the y-axis will be log2 Frequency.
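One caveat with this approach: a bin with a count of 1 becomes 0 (its bar vanishes), and an empty bin becomes -Inf, which plot() cannot draw. A sketch with hypothetical long-tailed data that maps empty bins to 0 before plotting:

```r
set.seed(1)
# hypothetical data with a long right tail, so some bins are empty
my.data <- c(rexp(5000), 30)

hist.data <- hist(my.data, plot = FALSE)
counts <- hist.data$counts
# log2 of the counts; empty bins would give -Inf, so set them to 0 instead
hist.data$counts <- ifelse(counts > 0, log2(counts), 0)
plot(hist.data, main = "", ylab = "log2(Frequency)")
```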
I've put together a function that behaves identically to hist in the default case, but accepts the log argument. It uses several tricks from other posters, but adds a few of its own. hist(x) and myhist(x) look identical.
The original problem would be solved with:
myhist(mydata$V3, breaks=c(0,1,2,3,4,5,25), log="xy")
The function:
myhist <- function(x, ..., breaks="Sturges",
main = paste("Histogram of", xname),
xlab = xname,
ylab = "Frequency") {
xname = paste(deparse(substitute(x), 500), collapse="\n")
h = hist(x, breaks=breaks, plot=FALSE)
plot(h$breaks, c(NA,h$counts), type='S', main=main,
xlab=xlab, ylab=ylab, axes=FALSE, ...)
axis(1)
axis(2)
lines(h$breaks, c(h$counts,NA), type='s')
lines(h$breaks, c(NA,h$counts), type='h')
lines(h$breaks, c(h$counts,NA), type='h')
lines(h$breaks, rep(0,length(h$breaks)), type='S')
invisible(h)
}
Exercise for the reader: Unfortunately, not everything that works with hist works with myhist as it stands. That should be fixable with a bit more effort, though.
Here's a pretty ggplot2 solution:
library(ggplot2)
library(scales) # makes pretty labels on the x-axis
breaks=c(0,1,2,3,4,5,25)
ggplot(mydata,aes(x = V3)) +
geom_histogram(breaks = log10(breaks)) +
scale_x_log10(
breaks = breaks,
labels = scales::trans_format("log10", scales::math_format(10^.x))
)
Note that the breaks passed to geom_histogram() had to be log10-transformed, because scale_x_log10() transforms the data before the histogram is computed.