To plot the empirical cumulative density of three variables, x1, x2 and x3, I used the following in r:
plot.ecdf(x1, col="blue",
main="Distribution XYZ",
xlab="x_i", ylab="Prob(x_i<=y)")
lines(ecdf(x2), col="red") # adds a line
lines(ecdf(x3), col="green") # adds line
legend(600,0.6, # places a legend from (x,y)=(600,0.6) on
c("x1","x2","x3"), # puts text in the legend
lty=c(1,1,1), # gives the legend appropriate symbols (lines)
lwd=c(1,1,1),col=c("blue","red","green")) # gives the legend lines the correct color and width
The resulting plot however has two horizontal lines (broken lines) at 0 and 1 besides the box. And, the origin of the box has a space below zero on the vertical axis and a space to the left of zero on the horizontal axis. May you suggest how to remove these space and the additional lines. I wanted but could not post the plot.
EDITED:
A sample data can be generated as follows:
sample data
n <- 1000; u <- runif(n)
a <- -4.46; b <- 1.6; c <- -4.63
d <- ( a * u ) + (b * ( ( 1.5 * ( u ** 2 )) - 0.5 )) + (c * ( (2.5 * (u ** 3)) - (1.5 * u )))
x1 <- -126/d; x2 <- -131/d; x3 <- -187/d
It sounds like you are asking for the style provided by different settings for 'xaxs' and 'yaxs', and maybe 'xlim':
Try:
plot.ecdf(x1, col="blue",
main="Distribution XYZ",
xlab="x_i", ylab="Prob(x_i<=y)",
xaxs="i",yaxs="i", xlim=c(0,1000) )
lines(ecdf(x2), col="red")
lines(ecdf(x3), col="green")
legend(600,0.6,
c("x1","x2","x3"),
lty=c(1,1,1),
lwd=c(1,1,1),col=c("blue","red","green"))
use the yaxs = "i" argument:
plot.ecdf(x1, col="blue",
main="Distribution XYZ",
xlab="x_i", ylab="Prob(x_i<=y)", yaxs = "i")
This will align the axis at the ylim points (which supposedly are 0 and 1 for plot.ecdf).
If you also want to remove the dotted line at 0 and 1, just call box():
box()
You could use the arguments xlim, ylim and col.01line:
x1 = runif(100)
x2 = runif(100)
x3 = runif(100)
plot.ecdf(x1, col="blue", main="Distribution XYZ",xlab="x_i", ylab="Prob(x_i<=y)", ylim=c(0, 1), xlim=c(0,1), col.01line="white", verticals=FALSE)
lines(ecdf(x2), col="red", col.01line="white")
lines(ecdf(x3), col="green", col.01line="white")
legend(600,0.6,c("x1","x2","x3"), lty=c(1,1,1), lwd=c(1,1,1),col=c("blue","red","green"))
Alternatively, you could just use the generic plot:
f1 = ecdf(x1)
f2 = ecdf(x2)
f3 = ecdf(x3)
pts = seq(0, 1, 0.01)
plot(pts, f1(pts), col="blue", type="b", xlab="x_i", ylab="Prob(x_i<=y)", pch=16, main="Distribution XYZ")
lines(pts, f2(pts), col="red", type="b", pch=16)
lines(pts, f3(pts), col="green", type="b", pch=16)
legend("topleft", c("x1","x2","x3"), fill=c("blue","red","green"))
I borrowed this code from statmethods[dot]net. The result is a coloured area under a normal distribution.
mean=100; sd=15
lb=80; ub=120
x <- seq(-4,4,length=100)*sd + mean
hx <- dnorm(x,mean,sd)
plot(x, hx, type="n", xlab="IQ Values", ylab="Density",
main="Normal Distribution", axes=FALSE)
i <- x >= lb & x <= ub
lines(x, hx)
polygon(c(lb,x[i],ub), c(0,hx[i],0), col="red")
area <- pnorm(ub, mean, sd) - pnorm(lb, mean, sd)
result <- paste("P(",lb,"< IQ <",ub,") =",
signif(area, digits=3))
mtext(result,2)
I was wondering if it is possible to select an image as the colour of the red polygon?
Many thanks!
Working with your code, I basically suggest plotting an image (I used a random image below) using the png package (based on advice from In R, how to plot with a png as background?) and then plot your lines over that.
mean=100; sd=15
lb=80; ub=120
x <- seq(-4,4,length=100)*sd + mean
hx <- dnorm(x,mean,sd)
# load package and an image
library(png)
ima <- readPNG("Red_Hot_Sun.PNG")
# plot an empty plot with your labels, etc.
plot(1,xlim=c(min(x),max(x)), type="n", xlab="IQ Values", ylab="Density",
main="Normal Distribution", axes=FALSE)
# put in the image
lim <- par()
rasterImage(ima, lim$usr[1], lim$usr[3], lim$usr[2], lim$usr[4])
# add your plot
par(new=TRUE)
plot(x, hx, xlim=c(min(x),max(x)), type="l", xlab="", ylab="", axes=FALSE)
i <- x >= lb & x <= ub
lines(x, hx)
# add a polygon to cover the background above plot
polygon(c(x,180,180,20,20), c(hx,0,1,1,0), col="white")
# add polygons to cover the areas under the plot you don't want visible
polygon(c(-20,-20,x[x<=lb],lb), c(-10,min(hx),hx[x<=lb],-10), col="white")
polygon(c(ub,x[x>=ub],200,200), c(-1,hx[x>=ub],min(hx),-1), col="white")
# add your extra text
area <- pnorm(ub, mean, sd) - pnorm(lb, mean, sd)
result <- paste("P(",lb,"< IQ <",ub,") =",
signif(area, digits=3))
mtext(result,2)
Gives you:
the type argument to xyplot() can take "s" for "steps." From help(plot):
The two step types differ in their x-y preference: Going from
(x1,y1) to (x2,y2) with x1 < x2, 'type = "s"' moves first
horizontal, then vertical, whereas 'type = "S"' moves the other
way around.
i.e. if you use type="s", the horizontal part of the step has its left end attached to the data point, while type="S" has its right end attached to the data point.
library(lattice)
set.seed(12345)
num.points <- 10
my.df <- data.frame(x=sort(sample(1:100, num.points)),
y=sample(1:40, num.points, replace=TRUE))
xyplot(y~x, data=my.df, type=c("p","s"), col="blue", main='type="s"')
xyplot(y~x, data=my.df, type=c("p","S"), col="red", main='type="S"')
How could one achieve a "step" plot, where the vertical motion happens between data points points, i.e. at x1 + (x2-x1)/2, so that the horizontal part of the step is centered on the data point?
Edited to include some example code. better late than never I suppose.
I am using excellent #nico answer to give its lattice version. Even I am ok with #Dwin because the question don't supply a reproducible example, but customizing lattice panel is sometimes challenging.
The idea is to use panel.segments which is the equivalent of segments of base graphics.
library(lattice)
xyplot(y~x,
panel =function(...){
ll <- list(...)
x <- ll$x
y <- ll$y
x.start <- x - (c(0, diff(x)/2))
x.end <- x + (c(diff(x)/2, 0))
panel.segments(x.start, y, x.end, y, col="orange", lwd=2)
panel.segments(x.end[-length(x.end)], y[1:(length(y)-1)],
x.end[-length(x.end)], y[-1], col="orange", lwd=2)
## this is optional just to compare with type s
panel.xyplot(...,type='s')
## and type S
panel.xyplot(...,type='S')
})
This is a base graphics solution, as I am not too much of an expert in lattice.
Essentially you can use segments to draw first the horizontal, then the vertical steps, passing the shifted coordinates as a vector.
Here is an example:
set.seed(12345)
# Generate some data
num.points <- 10
x <- sort(sample(1:100, num.points))
y <- sample(1:40, num.points, replace=T)
# Plot the data with style = "s" and "S"
par(mfrow=c(1,3))
plot(x, y, "s", col="red", lwd=2, las=1,
main="Style: 's'", xlim=c(0, 100))
points(x, y, pch=19, col="red", cex=0.8)
plot(x, y, "S", col="blue", lwd=2, las=1,
main="Style: 'S'", xlim=c(0, 100))
points(x, y, pch=19, col="blue", cex=0.8)
# Now plot our points
plot(x, y, pch=19, col="orange", cex=0.8, las=1,
main="Centered steps", xlim=c(0, 100))
# Calculate the starting and ending points of the
# horizontal segments, by shifting the x coordinates
# by half the difference with the next point
# Note we leave the first and last point as starting and
# ending points
x.start <- x - (c(0, diff(x)/2))
x.end <- x + (c(diff(x)/2, 0))
# Now draw the horizontal segments
segments(x.start, y, x.end, y, col="orange", lwd=2)
# and the vertical ones (no need to draw the last one)
segments(x.end[-length(x.end)], y[1:(length(y)-1)],
x.end[-length(x.end)], y[-1], col="orange", lwd=2)
Here is the result:
I would like to rotate a histogram in R, plotted by hist(). The question is not new, and in several forums I have found that it is not possible. However, all these answers date back to 2010 or even later.
Has anyone found a solution meanwhile?
One way to get around the problem is to plot the histogram via barplot() that offers the option "horiz=TRUE". The plot works fine but I fail to overlay a density in the barplots. The problem probably lies in the x-axis since in the vertical plot, the density is centered in the first bin, while in the horizontal plot the density curve is messed up.
Any help is very much appreciated!
Thanks,
Niels
Code:
require(MASS)
Sigma <- matrix(c(2.25, 0.8, 0.8, 1), 2, 2)
mvnorm <- mvrnorm(1000, c(0,0), Sigma)
scatterHist.Norm <- function(x,y) {
zones <- matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
layout(zones, widths=c(2/3,1/3), heights=c(1/3,2/3))
xrange <- range(x) ; yrange <- range(y)
par(mar=c(3,3,1,1))
plot(x, y, xlim=xrange, ylim=yrange, xlab="", ylab="", cex=0.5)
xhist <- hist(x, plot=FALSE, breaks=seq(from=min(x), to=max(x), length.out=20))
yhist <- hist(y, plot=FALSE, breaks=seq(from=min(y), to=max(y), length.out=20))
top <- max(c(xhist$counts, yhist$counts))
par(mar=c(0,3,1,1))
plot(xhist, axes=FALSE, ylim=c(0,top), main="", col="grey")
x.xfit <- seq(min(x),max(x),length.out=40)
x.yfit <- dnorm(x.xfit,mean=mean(x),sd=sd(x))
x.yfit <- x.yfit*diff(xhist$mids[1:2])*length(x)
lines(x.xfit, x.yfit, col="red")
par(mar=c(0,3,1,1))
plot(yhist, axes=FALSE, ylim=c(0,top), main="", col="grey", horiz=TRUE)
y.xfit <- seq(min(x),max(x),length.out=40)
y.yfit <- dnorm(y.xfit,mean=mean(x),sd=sd(x))
y.yfit <- y.yfit*diff(yhist$mids[1:2])*length(x)
lines(y.xfit, y.yfit, col="red")
}
scatterHist.Norm(mvnorm[,1], mvnorm[,2])
scatterBar.Norm <- function(x,y) {
zones <- matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
layout(zones, widths=c(2/3,1/3), heights=c(1/3,2/3))
xrange <- range(x) ; yrange <- range(y)
par(mar=c(3,3,1,1))
plot(x, y, xlim=xrange, ylim=yrange, xlab="", ylab="", cex=0.5)
xhist <- hist(x, plot=FALSE, breaks=seq(from=min(x), to=max(x), length.out=20))
yhist <- hist(y, plot=FALSE, breaks=seq(from=min(y), to=max(y), length.out=20))
top <- max(c(xhist$counts, yhist$counts))
par(mar=c(0,3,1,1))
barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
x.xfit <- seq(min(x),max(x),length.out=40)
x.yfit <- dnorm(x.xfit,mean=mean(x),sd=sd(x))
x.yfit <- x.yfit*diff(xhist$mids[1:2])*length(x)
lines(x.xfit, x.yfit, col="red")
par(mar=c(3,0,1,1))
barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
y.xfit <- seq(min(x),max(x),length.out=40)
y.yfit <- dnorm(y.xfit,mean=mean(x),sd=sd(x))
y.yfit <- y.yfit*diff(yhist$mids[1:2])*length(x)
lines(y.xfit, y.yfit, col="red")
}
scatterBar.Norm(mvnorm[,1], mvnorm[,2])
#
Source of scatter plot with marginal histograms (click first link after "adapted from..."):
http://r.789695.n4.nabble.com/newbie-scatterplot-with-marginal-histograms-done-and-axes-labels-td872589.html
Source of density in a scatter plot:
http://www.statmethods.net/graphs/density.html
scatterBarNorm <- function(x, dcol="blue", lhist=20, num.dnorm=5*lhist, ...){
## check input
stopifnot(ncol(x)==2)
## set up layout and graphical parameters
layMat <- matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
layout(layMat, widths=c(5/7, 2/7), heights=c(2/7, 5/7))
ospc <- 0.5 # outer space
pext <- 4 # par extension down and to the left
bspc <- 1 # space between scatter plot and bar plots
par. <- par(mar=c(pext, pext, bspc, bspc),
oma=rep(ospc, 4)) # plot parameters
## scatter plot
plot(x, xlim=range(x[,1]), ylim=range(x[,2]), ...)
## 3) determine barplot and height parameter
## histogram (for barplot-ting the density)
xhist <- hist(x[,1], plot=FALSE, breaks=seq(from=min(x[,1]), to=max(x[,1]),
length.out=lhist))
yhist <- hist(x[,2], plot=FALSE, breaks=seq(from=min(x[,2]), to=max(x[,2]),
length.out=lhist)) # note: this uses probability=TRUE
## determine the plot range and all the things needed for the barplots and lines
xx <- seq(min(x[,1]), max(x[,1]), length.out=num.dnorm) # evaluation points for the overlaid density
xy <- dnorm(xx, mean=mean(x[,1]), sd=sd(x[,1])) # density points
yx <- seq(min(x[,2]), max(x[,2]), length.out=num.dnorm)
yy <- dnorm(yx, mean=mean(x[,2]), sd=sd(x[,2]))
## barplot and line for x (top)
par(mar=c(0, pext, 0, 0))
barplot(xhist$density, axes=FALSE, ylim=c(0, max(xhist$density, xy)),
space=0) # barplot
lines(seq(from=0, to=lhist-1, length.out=num.dnorm), xy, col=dcol) # line
## barplot and line for y (right)
par(mar=c(pext, 0, 0, 0))
barplot(yhist$density, axes=FALSE, xlim=c(0, max(yhist$density, yy)),
space=0, horiz=TRUE) # barplot
lines(yy, seq(from=0, to=lhist-1, length.out=num.dnorm), col=dcol) # line
## restore parameters
par(par.)
}
require(mvtnorm)
X <- rmvnorm(1000, c(0,0), matrix(c(1, 0.8, 0.8, 1), 2, 2))
scatterBarNorm(X, xlab=expression(italic(X[1])), ylab=expression(italic(X[2])))
It may be helpful to know that the hist() function invisibly returns all the information that you need to reproduce what it does using simpler plotting functions, like rect().
vals <- rnorm(10)
A <- hist(vals)
A
$breaks
[1] -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5
$counts
[1] 1 3 3 1 1 1
$intensities
[1] 0.2 0.6 0.6 0.2 0.2 0.2
$density
[1] 0.2 0.6 0.6 0.2 0.2 0.2
$mids
[1] -1.25 -0.75 -0.25 0.25 0.75 1.25
$xname
[1] "vals"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
You can create the same histogram manually like this:
plot(NULL, type = "n", ylim = c(0,max(A$counts)), xlim = c(range(A$breaks)))
rect(A$breaks[1:(length(A$breaks) - 1)], 0, A$breaks[2:length(A$breaks)], A$counts)
With those parts, you can flip the axes however you like:
plot(NULL, type = "n", xlim = c(0, max(A$counts)), ylim = c(range(A$breaks)))
rect(0, A$breaks[1:(length(A$breaks) - 1)], A$counts, A$breaks[2:length(A$breaks)])
For similar do-it-yourselfing with density(), see:
Axis-labeling in R histogram and density plots; multiple overlays of density plots
I'm not sure whether it is of interest, but I sometimes want to use horizontal histograms without any packages and be able to write or draw at any position of the graphic.
That's why I wrote the following function, with examples provided below. If anyone knows a package to which this would fit well, please write me: berry-b at gmx.de
Please be sure not to have a variable hpos in your workspace, as it will be overwritten with a function. (Yes, for a package I would need to insert some safety parts in the function).
horiz.hist <- function(Data, breaks="Sturges", col="transparent", las=1,
ylim=range(HBreaks), labelat=pretty(ylim), labels=labelat, border=par("fg"), ... )
{a <- hist(Data, plot=FALSE, breaks=breaks)
HBreaks <- a$breaks
HBreak1 <- a$breaks[1]
hpos <<- function(Pos) (Pos-HBreak1)*(length(HBreaks)-1)/ diff(range(HBreaks))
barplot(a$counts, space=0, horiz=T, ylim=hpos(ylim), col=col, border=border,...)
axis(2, at=hpos(labelat), labels=labels, las=las, ...)
print("use hpos() to address y-coordinates") }
For examples
# Data and basic concept
set.seed(8); ExampleData <- rnorm(50,8,5)+5
hist(ExampleData)
horiz.hist(ExampleData, xlab="absolute frequency")
# Caution: the labels at the y-axis are not the real coordinates!
# abline(h=2) will draw above the second bar, not at the label value 2. Use hpos:
abline(h=hpos(11), col=2)
# Further arguments
horiz.hist(ExampleData, xlim=c(-8,20))
horiz.hist(ExampleData, main="the ... argument worked!", col.axis=3)
hist(ExampleData, xlim=c(-10,40)) # with xlim
horiz.hist(ExampleData, ylim=c(-10,40), border="red") # with ylim
horiz.hist(ExampleData, breaks=20, col="orange")
axis(2, hpos(0:10), labels=F, col=2) # another use of hpos()
One shortcoming: the function doesn't work with breakpoints provided as a vector with different widths of the bars.
Thank you, Tim and Paul. You made me think harder and use what hist() actually provides.
This is my solution now (with great help from Alex Pl.):
scatterBar.Norm <- function(x,y) {
zones <- matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
layout(zones, widths=c(5/7,2/7), heights=c(2/7,5/7))
xrange <- range(x)
yrange <- range(y)
par(mar=c(3,3,1,1))
plot(x, y, xlim=xrange, ylim=yrange, xlab="", ylab="", cex=0.5)
xhist <- hist(x, plot=FALSE, breaks=seq(from=min(x), to=max(x), length.out=20))
yhist <- hist(y, plot=FALSE, breaks=seq(from=min(y), to=max(y), length.out=20))
top <- max(c(xhist$density, yhist$density))
par(mar=c(0,3,1,1))
barplot(xhist$density, axes=FALSE, ylim=c(0, top), space=0)
x.xfit <- seq(min(x),max(x),length.out=40)
x.yfit <- dnorm(x.xfit, mean=mean(x), sd=sd(x))
x.xscalefactor <- x.xfit / seq(from=0, to=19, length.out=40)
lines(x.xfit/x.xscalefactor, x.yfit, col="red")
par(mar=c(3,0,1,1))
barplot(yhist$density, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
y.xfit <- seq(min(y),max(y),length.out=40)
y.yfit <- dnorm(y.xfit, mean=mean(y), sd=sd(y))
y.xscalefactor <- y.xfit / seq(from=0, to=19, length.out=40)
lines(y.yfit, y.xfit/y.xscalefactor, col="red")
}
For examples:
require(MASS)
#Sigma <- matrix(c(2.25, 0.8, 0.8, 1), 2, 2)
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
mvnorm <- mvrnorm(1000, c(0,0), Sigma) ; scatterBar.Norm(mvnorm[,1], mvnorm[,2])
An asymmetric Sigma leads to a somewhat bulkier histogram of the respective axis.
The code is left deliberately "unelegant" in order to increase comprehensibility (for myself when I revisit it later...).
Niels
When using ggplot, flipping axes works very well. See for example this example which shows how to do this for a boxplot, but it works equally well for a histogram I assume. In ggplot one can quite easily overlay different plot types, or geometries in ggplot2 jargon. So combining a density plot and a histogram should be easy.