Draw the outer envelope of multiple density graphs in one graph - r

I would like to draw the outer envelope (contour) of a plot that contains multiple density curves, and extract its values.
X <- c(1,2,1,4,3,1,2,8,9,0,5,4,2,2,5,5,7,8,8,9,5,6,5,6,3,4,5,3,4,5,4)
Y <- c(0,3,1,1,3,2,7,1,2,1,9,2,1,3,6,1,9,5,2,9,1,1,2,1,3,4,6,9,4,5,2)
Z <- c(1,4,9,5,7,8,2,8,9,0,5,4,2,2,5,5,7,8,8,9,5,6,5,6,9,9,9,2,6,7,1)
W <- c(1,8,9,7,8,9,12,3,11,21,5,4,8,8,2,3,2,1,2,3,4,5,6,3,1,9,2,1,8,4,1)
Q <- c(1,8,9,7,8,9,12,3,16,30,2,3,4,4,4,3,7,7,2,3,2,5,9,3,2,1,1,1,1,1,0)
n <- data.frame(X,Y,Z,W,Q)
plot((density(X)))
fun <- function(x)lines((density(n[[x]])))
t <- seq(1:length(n))
lapply(t,fun)
I have searched the web and found contour, but it cannot be applied to a distribution graph generated as above.

I suspect you are asking for what might be called the "outer envelope". My first effort had a couple of problems as you can see:
lines( x= density(X)$x,
y=apply( do.call( cbind,
lapply(n, function(x){ density(x)$y})), 1, max), col="red", lwd=3)
I think that also highlights a problem with your attempt: without a common grid on which to hang the density estimates, the curves are not evaluated at the same x values.
So establish limits and re-do:
from=min(X,Y,Z,W,Q); to = max(X,Y,Z,W,Q)
png()
plot( Xd <- density(X, from=from, to=to))
fun <- function(x) lines(density(n[[x]],from=from, to = to ) )
t <- seq(1:length(n))
lapply(t,fun)
lines( x= density(X, from=from, to=to )$x,
y=apply( do.call( cbind, lapply(n,
function(x){ density(x, from=from, to = to)$y})), 1, max), col="red", lwd=3)
dev.off()

Here's an answer very similar to BondedDust's (and based on his from/to approach) that I think is easier to read and possibly faster if the dataset is very large, because it doesn't calculate the densities twice.
from <- min(n)
to <- max(n)
t <- seq(1:length(n))
ds <- lapply(t, function(i) density(n[[i]], from=from, to=to)) #Densities
maxd <- apply(sapply(ds, "[[", "y"), 1, max) #Max y of each x
plot(density(X), type="n", ylim=c(0, max(maxd)), xlim=c(0, 15))
for (i in t) lines(ds[[i]])
lines(seq(from, to, length.out = length(maxd)), maxd, col="red", lwd=3)
I manually set the plot xlim for better visualization.
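Since the question also asks to extract the envelope's values, the same idea can be wrapped in a small function that returns them. This is a minimal sketch rather than part of either answer: the helper density_envelope and the name dat (used instead of n, to avoid clashing with density()'s n argument) are hypothetical.

```r
# Same data as in the question, stored under the hypothetical name `dat`
X <- c(1,2,1,4,3,1,2,8,9,0,5,4,2,2,5,5,7,8,8,9,5,6,5,6,3,4,5,3,4,5,4)
Y <- c(0,3,1,1,3,2,7,1,2,1,9,2,1,3,6,1,9,5,2,9,1,1,2,1,3,4,6,9,4,5,2)
Z <- c(1,4,9,5,7,8,2,8,9,0,5,4,2,2,5,5,7,8,8,9,5,6,5,6,9,9,9,2,6,7,1)
W <- c(1,8,9,7,8,9,12,3,11,21,5,4,8,8,2,3,2,1,2,3,4,5,6,3,1,9,2,1,8,4,1)
Q <- c(1,8,9,7,8,9,12,3,16,30,2,3,4,4,4,3,7,7,2,3,2,5,9,3,2,1,1,1,1,1,0)
dat <- data.frame(X, Y, Z, W, Q)

# Hypothetical helper: evaluate every density on one shared grid and
# return the pointwise maximum together with the series that attains it
density_envelope <- function(df, n = 512) {
  from <- min(df)
  to <- max(df)
  ds <- lapply(df, density, from = from, to = to, n = n)
  ys <- sapply(ds, "[[", "y")  # one column of density values per series
  data.frame(
    x = ds[[1]]$x,
    y = apply(ys, 1, max),
    series = colnames(ys)[max.col(ys, ties.method = "first")]
  )
}

env <- density_envelope(dat)
head(env)
```

The x and y columns can be passed straight to lines() to draw the envelope, and the series column records which variable is on top at each grid point.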

Related

I need x on x axis y on y axis and fx and fy in plot area in R

I need this type of figure:
I need x on the x axis, y on the y axis, and fx and fy in the plot area, with both curves intersecting each other. Could you please help me do this in R? My code is
gx <- expand.grid(x=seq(1,5,length=50))
fx <- function(x) { exp(-x) }
gx$fx <- apply(gx,1,fx)
plot(gx, type="l",col="red")
gy <- expand.grid(y=seq(1,5,length=50))
fy <- function(y) { y*exp(-y) }
gy$fy <- apply(gy, 1, fy)
par(new=TRUE)
plot(gy, type="l", col="green")
I'm not 100% sure I understand what this question means, but if you are looking to label your axes, you can use the xlab and ylab graphical parameters, i.e.:
plot(gx, type="l",col="red", xlab="label for x axis", ylab="label for y axis")
Here's the plot with base:
plot(gx, type="n", xlab="", ylab="")
for(i in 1:2) lines(get(c("gx", "gy")[i]), col=c("red", "green")[i])
title(xlab="x", ylab="y")
I personally prefer to do a little more data manipulation and combine both data sets (gx and gy) into a single data.frame in long form, using the "dplyr" and "tidyr" packages:
library(dplyr)
library(tidyr)
dat <- data.frame(gx, gy)
dat <- dat %>%
gather(xvar, x, x,y) %>%
gather(yvar, y, fx, fy)
head(dat)
# xvar x yvar y
# 1 x 1.000000 fx 0.3678794
# 2 x 1.081633 fx 0.3390415
# 3 x 1.163265 fx 0.3124642
# 4 x 1.244898 fx 0.2879703
# 5 x 1.326531 fx 0.2653964
# 6 x 1.408163 fx 0.2445921
This will make it easy to visualise with ggplot2:
library(ggplot2)
ggplot(dat, aes(x,y, col=yvar)) + geom_line()
Using the legend and lines functions along with plot gives the following figure:
plot(gx,type="l",col="red", xlab ="x", ylab="y")
lines(gy,col="green")
legend("topright", inset=.05, cex = 1, c("f(x)","f(y)"), horiz=TRUE, lty=c(1,1), lwd=c(2,2), col=c("red","green"), text.font=3)
It doesn't seem anyone so far has answered about putting f(x) and f(y) in the plot area. You can do this with text(), though it's not the most elegant solution because you have to manually give the coordinates of where to place the text.
The data you gave:
gx <- expand.grid(x=seq(1,5,length=50))
fx <- function(x) { exp(-x) }
gx$fx <- apply(gx,1,fx)
gy <- expand.grid(y=seq(1,5,length=50))
fy <- function(y) { y*exp(-y) }
gy$fy <- apply(gy, 1, fy)
Creating the plot:
plot(gx,type="l",col="red", xlab ="x", ylab="y")
lines(gy,col="green")
text(x=c(3,2),y=c(0.18,0.1), labels=c("f(y)","f(x)"))
Gives this:
I hope this helps! The part that is confusing about your question is when you say that you need the lines to intersect. That's an issue with your data and/or the functions you are applying, and not one we can answer without some clarity.
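On the intersection itself: as a matter of algebra, exp(-x) and x*exp(-x) are equal only at x = 1, which is the left edge of the plotted range, so the crossing is easy to miss. A minimal sketch, assuming it is acceptable to widen the range to [0, 5]; the names gap and cross are hypothetical.

```r
fx <- function(x) exp(-x)
fy <- function(y) y * exp(-y)

# Difference of the two curves; its root is where they cross
gap <- function(x) fy(x) - fx(x)
cross <- uniroot(gap, interval = c(0.5, 2), tol = 1e-9)$root

# Plot both curves on a range that contains the crossing and mark it
curve(fx, from = 0, to = 5, col = "red", xlab = "x", ylab = "y")
curve(fy, from = 0, to = 5, col = "green", add = TRUE)
points(cross, fx(cross), pch = 19)
```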

Hacking the plot data.frame method in R to include histograms on the diagonals and lowess fits in the scatterplots

Dummy example:
N=1000
x1 = rgamma(N,5,10)
x2 = rnorm(N)+x1
x3 = (x1+x2)/(runif(N)+1)
d = data.frame(x1,x2,x3)
plot(d,col=rgb(1,0,0,.5),pch=19,cex=.5)
I'd like to take the plot data frame method and augment it to include histograms on the diagonals and lowess fits on each of the scatterplots. Is it possible to do without completely re-writing the function? Where do I even find the source code for methods?
When you plot a data.frame like this, you are basically calling the pairs() function. See ?pairs for more information. There is an example of a histogram panel there. Here's an example that also plots a loess line:
panel.hist <- function(x, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h$breaks; nB <- length(breaks)
y <- h$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, ...)
}
panel.loess<-function(x, y, ...) {
ll <- loess(y~x)
points(x,y, ...)
nx<-seq(min(x), max(x), length.out=100)
lines(nx, predict(ll, nx), col="black")
}
pairs(d,col=rgb(1,0,0,.5),pch=19,cex=.5,
diag.panel=panel.hist,
lower.panel=panel.loess)
which gives a pairs plot with histograms on the diagonal and loess fits in the lower panels.
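On the last part of the question, where to find the source code of a method, here is a sketch using standard utilities. plot.data.frame is an S3 method registered by the graphics package rather than exported, so typing its bare name does not find it.

```r
# List the S3 methods available for the plot generic
methods(plot)

# Retrieve the data.frame method even though it is not exported
src <- getS3method("plot", "data.frame")
print(src)
```

The printed body should show the call to pairs() that is used when the data.frame has more than two columns.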

Adding stippling to image/contour plot

I have some data to which I would like to add "stippling" to show where it is "important", as they do in the IPCC plots.
At the moment I am really struggling with trying to do this in R.
If I make up some test data and plot it:
data <- array(runif(12*6), dim=c(12,6) )
over <- ifelse(data > 0.5, 1, 0 )
image(1:12, 1:6, data)
What I would like to finally do is over-plot some points based on the array "over" on top of the current image.
Any suggestions!??
This should help - I had to do a similar thing before and wrote a function that I posted here.
#required function from www.menugget.blogspot.com
matrix.poly <- function(x, y, z=mat, n=NULL){
if(missing(z)) stop("Must define matrix 'z'")
if(missing(n)) stop("Must define at least 1 grid location 'n'")
if(missing(x)) x <- seq(0,1,,dim(z)[1])
if(missing(y)) y <- seq(0,1,,dim(z)[2])
poly <- vector(mode="list", length(n))
for(i in seq(length(n))){
ROW <- ((n[i]-1) %% dim(z)[1]) +1
COL <- ((n[i]-1) %/% dim(z)[1]) +1
dist.left <- (x[ROW]-x[ROW-1])/2
dist.right <- (x[ROW+1]-x[ROW])/2
if(ROW==1) dist.left <- dist.right
if(ROW==dim(z)[1]) dist.right <- dist.left
dist.down <- (y[COL]-y[COL-1])/2
dist.up <- (y[COL+1]-y[COL])/2
if(COL==1) dist.down <- dist.up
if(COL==dim(z)[2]) dist.up <- dist.down
xs <- c(x[ROW]-dist.left, x[ROW]-dist.left, x[ROW]+dist.right, x[ROW]+dist.right)
ys <- c(y[COL]-dist.down, y[COL]+dist.up, y[COL]+dist.up, y[COL]-dist.down)
poly[[i]] <- data.frame(x=xs, y=ys)
}
return(poly)
}
#make vector of grids for hatching
incl <- which(over==1)
#make polygons for each grid for hatching
polys <- matrix.poly(1:12, 1:6, z=over, n=incl)
#plot
png("hatched_image.png")
image(1:12, 1:6, data)
for(i in seq(polys)){
polygon(polys[[i]], density=10, angle=45, border=NA)
polygon(polys[[i]], density=10, angle=-45, border=NA)
}
box()
dev.off()
Or, an alternative with "stipples":
png("hatched_image2.png")
image(1:12, 1:6, data)
for(i in seq(polys)){
xran <- range(polys[[i]]$x)
yran <- range(polys[[i]]$y)
xs <- seq(xran[1], xran[2],,5)
ys <- seq(yran[1], yran[2],,5)
grd <- expand.grid(xs,ys)
points(grd, pch=19, cex=0.5)
}
box()
dev.off()
Update:
In (very late) response to Paul Hiemstra's comment, here are two more examples with a matrix of higher resolution. The hatching maintains a nice regular pattern, but it is not nice to look at when broken up. The stippled example is much nicer:
n <- 100
x <- 1:n
y <- 1:n
M <- list(x=x, y=y, z=outer(x, y, FUN = function(x,y){x^2 * y * rlnorm(n^2,0,0.2)}))
image(M)
range(M$z)
incl <- which(M$z>5e5)
polys <- matrix.poly(M$x, M$y, z=M$z, n=incl)
png("hatched_image.png", height=5, width=5, units="in", res=400)
op <- par(mar=c(3,3,1,1))
image(M)
for(i in seq(polys)){
polygon(polys[[i]], density=10, angle=45, border=NA, lwd=0.5)
polygon(polys[[i]], density=10, angle=-45, border=NA, lwd=0.5)
}
box()
par(op)
dev.off()
png("stippled_image.png", height=5, width=5, units="in", res=400)
op <- par(mar=c(3,3,1,1))
image(M)
grd <- expand.grid(x=x, y=y)
points(grd$x[incl], grd$y[incl], pch=".", cex=1.5)
box()
par(op)
dev.off()
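For a small regular grid like the question's test data, a lighter sketch is to read the cell coordinates straight from the matrix indices with which(..., arr.ind = TRUE). This assumes the image is drawn with x as the row index and y as the column index, as in image(1:12, 1:6, data); the name idx is hypothetical.

```r
set.seed(1)  # only to make the example reproducible
data <- array(runif(12 * 6), dim = c(12, 6))
over <- ifelse(data > 0.5, 1, 0)

# Row/column indices of the cells flagged in `over`
idx <- which(over == 1, arr.ind = TRUE)

image(1:12, 1:6, data)
# Rows map to the x axis and columns to the y axis of image()
points(idx[, "row"], idx[, "col"], pch = 19, cex = 0.5)
box()
```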
Do it using the coordinate positioning mechanism of ?image [1].
data(volcano)
m <- volcano
dimx <- nrow(m)
dimy <- ncol(m)
d1 <- list(x = seq(0, 1, length = dimx), y = seq(0, 1, length = dimy), z = m)
With your 'image' constructed that way, you keep the structure with the object and its coordinates intact. You can collect multiple matrices into a 3D array or as multiple elements, but you need to augment image() in order to handle that, so I keep them separate here.
Make a copy of the data to specify an interesting area.
d2 <- d1
d2$z <- d2$z > 155
Use the coordinates to specify which cells are interesting. This is expensive if you have a very big raster, but it's super easy to do.
pts <- expand.grid(x = d2$x, y = d2$y)
pts$over <- as.vector(d2$z)
Set up the plot.
op <- par(mfcol = c(2, 1))
image(d1)
image(d1)
points(pts$x[pts$over], pts$y[pts$over], cex = 0.7)
par(op)
Don't forget to modify the plotting of points to get different effects. In particular, on a very dense grid with lots of points it will take ages to draw all those little circles; pch = "." is a good choice.
Now, do you have some real data to plot on that nice projection? See examples here for some of the options: http://spatial-analyst.net/wiki/index.php?title=Global_datasets
[1] R has classes for more sophisticated handling of raster data; see the packages sp and raster for two different approaches.
This is a solution in the spirit of @mdsummer's comment, using ggplot2. I first draw the grid, and then draw +'s at the locations where a certain value has been exceeded. Note that ggplot2 works with data.frames, not with multi-dimensional arrays or matrices. You can use melt from the reshape package to convert from an array / matrix to a flat data.frame structure.
Here is a concrete example using the example data from the geom_tile documentation:
pp <- function (n,r=4) {
x <- seq(-r*pi, r*pi, len=n)
df <- expand.grid(x=x, y=x)
df$r <- sqrt(df$x^2 + df$y^2)
df$z <- cos(df$r^2)*exp(-df$r/6)
df
}
require(ggplot2)
dat = pp(200)
over = dat[,c("x","y")]
over$value = with(dat, ifelse(z > 0.5, 1, 0))
ggplot(aes(x = x, y = y), data = dat) +
geom_raster(aes(fill = z)) +
scale_fill_gradient2() +
geom_point(data = subset(over, value == 1), shape = "+", size = 1)
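The array-to-data.frame step mentioned above can also be sketched in base R, without the reshape package, by using row() and col() to recover the indices. The names m and long are hypothetical, and volcano is used only because it is a convenient built-in matrix.

```r
m <- volcano  # built-in 87 x 61 matrix

# One row per cell: its row index, column index and value
long <- data.frame(
  row = as.vector(row(m)),
  col = as.vector(col(m)),
  z = as.vector(m)
)
head(long)
```

A data.frame in this shape can be passed to ggplot() in the same way as dat in the example above.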
This is probably coming too late, but I'd like to post my answer as a reference too.
One nice option for spatial data is to use the rasterVis package. Once you have a "base" raster object, and the "mask" object, which you will use to draw the stippling, you can do something like:
require(raster)
require(rasterVis)
# Scratch raster objects
data(volcano)
r1 <- raster(volcano)
# Here we are selecting only values from 160 to 180.
# This will be our "mask" layer.
over <- ifelse(volcano >=160 & volcano <=180, 1, NA)
r2 <- raster(over)
# And this is the key step:
# Converting the "mask" raster to spatial points
r.mask <- rasterToPoints(r2, spatial=TRUE)
# Plot
levelplot(r1, margin=FALSE) +
layer(sp.points(r.mask, pch=20, cex=0.3, alpha=0.8))
which resembles the map that the OP was looking for. Parameters of the points such as color, size and type can be fine tuned. ?sp.points provides all the arguments that can be used to do that.

plotting SPX vs. VIX using quantmod in R

I just got introduced to quantmod, and looked at examples here
http://www.r-chart.com/2010/06/stock-analysis-using-r.html
I tried the following code,
library(quantmod)
getSymbols(c("^GSPC","^VIX"))
head(as.xts(merge(GSPC,VIX)))
chartSeries(c(GSPC, VIX), subset='last 3 months')
but the graph was completely out-of-scale, so I hope some of the experts on this forum can show me how to plot this correctly.
Try this:
chart_Series(GSPC)
add_Series(OHLC(VIX)+1000,on=1)
You need to use OHLC to remove the volume from VIX, since it's always zero and seems to hose up the automatic ylim calculation. I also added 1000 to bring the levels of the two series a bit closer together.
Here is an example that does not use chartSeries.
ind <- function(x) {
# Divide each column by the first non-NA value
# (There may already be a function to do that.)
coredata(x) <- t(t(coredata(x)) / apply(coredata(x),2,function(u){ c(u[!is.na(u)&u!=0],NA)[1] }))
x
}
x <- cbind( Ad(GSPC), Ad(VIX) )
x <- x["2011-11::"]
# Using base graphics
matplot(
index(x), coredata(ind(x)),
xlab="", ylab="", main="",
type="l", lty=1, lwd=3, axes=FALSE
)
abline(h=1, lty=3, col="lightgrey")
axis(2, las=1)
axis.Date(1, index(x))
box()
legend( "topleft", gsub("\\..*", "", names(x)), lty=1, lwd=3, col=1:2 )
# If you prefer ggplot2
library(ggplot2)
library(reshape2)
d <- data.frame( date = index(x), coredata(ind(x)) )
names(d) <- gsub("\\..*", "", names(d))
d <- melt(d, id.vars="date")
ggplot(d, aes(date, value, color=variable)) + geom_line(size=2)
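The rebasing done inside ind() is compact, so here is a sketch of the same arithmetic on a plain matrix, without quantmod or downloaded data. The matrix m holds made-up toy values purely for illustration, and the names first_valid and rebased are hypothetical.

```r
# Toy matrix standing in for two price series (values are made up)
m <- cbind(a = c(NA, 100, 110, 121), b = c(20, 30, 10, 40))

# First value in each column that is neither NA nor zero
first_valid <- apply(m, 2, function(u) c(u[!is.na(u) & u != 0], NA)[1])

# Divide each column by its own first valid value: t() is needed because
# R recycles the divisor down the rows of the transposed matrix
rebased <- t(t(m) / first_valid)
rebased
```

Each column of the result starts at 1 at its first valid observation, which is why the two series can share one axis.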

Plot weighted frequency matrix

This question is related to two different questions I have asked previously:
1) Reproduce frequency matrix plot
2) Add 95% confidence limits to cumulative plot
I wish to reproduce this plot in R:
I have got this far, using the code beneath the graphic:
#Set the number of bets and number of trials and % lines
numbet <- 36
numtri <- 1000
#Fill a matrix where the rows are the cumulative bets and the columns are the trials
xcum <- matrix(NA, nrow=numbet, ncol=numtri)
for (i in 1:numtri) {
x <- sample(c(0,1), numbet, prob=c(5/6,1/6), replace = TRUE)
xcum[,i] <- cumsum(x)/(1:numbet)
}
#Plot the trials as transparent lines so you can see the build up
matplot(xcum, type="l", xlab="Number of Trials", ylab="Relative Frequency", main="", col=rgb(0.01, 0.01, 0.01, 0.02), las=1)
My question is: How can I reproduce the top plot in one pass, without plotting multiple samples?
Thanks.
You can produce this plot by using this code:
boring <- function(x, occ) occ/x
boring_seq <- function(occ, length.out){
x <- seq(occ, length.out=length.out)
data.frame(x = x, y = boring(x, occ))
}
numbet <- 31
odds <- 6
plot(1, 0, type="n",
xlim=c(1, numbet + odds), ylim=c(0, 1),
yaxp=c(0,1,2),
main="Frequency matrix",
xlab="Successive occasions",
ylab="Relative frequency"
)
axis(2, at=c(0, 0.5, 1))
for(i in 1:odds){
xy <- boring_seq(i, numbet+1)
lines(xy$x, xy$y, type="o", cex=0.5)
}
for(i in 1:numbet){
xy <- boring_seq(i, odds+1)
lines(xy$x, 1-xy$y, type="o", cex=0.5)
}
You can also use Koshke's method, limiting the combinations of values to those with s<6. At Andrie's request, I added a condition on the difference between ps$n and ps$s to get a "pointed" configuration.
library(plyr) # provides ldply
ps <- ldply(0:35, function(i)data.frame(s=0:i, n=i))
plot.new()
plot.window(c(0,36), c(0,1))
apply(ps[ps$s<6 & ps$n - ps$s < 30, ], 1, function(x){
s<-x[1]; n<-x[2];
lines(c(n, n+1, n, n+1), c(s/n, s/(n+1), s/n, (s+1)/(n+1)), type="o")})
axis(1)
axis(2)
lines(6:36, 6/(6:36), type="o")
# need to fill in the unconnected points on the upper frontier
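If plyr is not available, the table of (s, n) pairs built by ldply above can be sketched in base R. This is an assumed equivalent, not the answerer's code.

```r
# All combinations of s successes in n trials, for n = 0..35
ps <- do.call(rbind, lapply(0:35, function(i) data.frame(s = 0:i, n = i)))
head(ps)
nrow(ps)
```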
Weighted Frequency Matrix is also called Position Weight Matrix (in bioinformatics).
It can be represented in a form of a sequence logo.
This is at least how I plot weighted frequency matrix.
library(cosmo)
data(motifPWM); attributes(motifPWM) # Loads a sample position weight matrix (PWM) containing 8 positions.
plot(motifPWM) # Plots the PWM as sequence logo.
