R: Draw a polygon with conditional colour - r

I want to colour the area under a curve. The area with y > 0 should be red, the area with y < 0 should be green.
x <- c(1:4)
y <- c(0,1,-1,2,rep(0,4))
plot(y[1:4],type="l")
abline(h=0)
Using ifelse() does not work:
polygon(c(x,rev(x)),y,col=ifelse(y>0,"red","green"))
What I achieved so far is the following:
polygon(c(x,rev(x)),y,col="green")
polygon(c(x,rev(x)),ifelse(y>0,y,0),col="red")
But then the red area is too large. Do you have any ideas how to get the desired result?

If you want two different colors, you need two different polygons. You can either call polygon multiple times, or you can add NA values in your x and y vectors to indicate a new polygon. R will not automatically calculate the intersection for you. You must do that yourself. Here's how you could draw that with different colors.
x <- c(1,2,2.5,NA,2.5,3,4)
y <- c(0,1,0,NA,0,-1,0)
#calculate color based on most extreme y value
g <- cumsum(is.na(x))
gc <- ifelse(tapply(y, g,
function(x) x[which.max(abs(x))])>0,
"red","green")
plot(c(1, 4),c(-1,1), type = "n")
polygon(x, y, col = gc)
abline(h=0)
In the more general case, it might not be as easy to split a polygon into different regions. There seems to be some support for this type of operation in GIS packages, where this type of thing is more common. However, I've put together a somewhat general case that may work for simple polygons.
First, I define a closure that will define a cutting line. The function will take a slope and y-intercept for a line and will return the functions we need to cut a polygon.
getSplitLine <- function(m=1, b=0) {
force(m); force(b)
classify <- function(x,y) {
y >= m*x + b
}
intercepts <- function(x,y, class=classify(x,y)) {
w <- which(diff(class)!=0)
m2 <- (y[w+1]-y[w])/(x[w+1]-x[w])
b2 <- y[w] - m2*x[w]
ix <- (b2-b)/(m-m2)
iy <- ix*m + b
data.frame(x=ix,y=iy,idx=w+.5, dir=((rank(ix, ties="first")+1) %/% 2) %% 2 +1)
}
plot <- function(...) {
abline(b,m,...)
}
list(
intercepts=intercepts,
classify=classify,
plot=plot
)
}
Now we will define a function to actually split a polygon using the splitter we've just defined.
splitPolygon <- function(x, y, splitter) {
addnullrow <- function(x) if (!all(is.na(x[nrow(x),]))) rbind(x, NA) else x
rollup <- function(x,i=1) rbind(x[(i+1):nrow(x),], x[1:i,])
idx <- cumsum(is.na(x) | is.na(y))
polys <- split(data.frame(x=x,y=y)[!is.na(x),], idx[!is.na(x)])
r <- lapply(polys, function(P) {
x <- P$x; y<-P$y
side <- splitter$classify(x, y)
if(side[1] != side[length(side)]) {
ints <- splitter$intercepts(c(x,x[1]), c(y, y[1]), c(side, side[1]))
} else {
ints <- splitter$intercepts(x, y, side)
}
sideps <- lapply(unique(side), function(ss) {
pts <- data.frame(x=x[side==ss], y=y[side==ss],
idx=seq_along(x)[side==ss], dir=0)
mm <- rbind(pts, ints)
mm <- mm[order(mm$idx), ]
br <- cumsum(mm$dir!=0 & c(0,head(mm$dir,-1))!=0 &
c(0,diff(mm$idx))>1)
if (length(unique(br))>1) {
mm<-rollup(mm, sum(br==br[1]))
}
br <- cumsum(c(FALSE,abs(diff(mm$dir*mm$dir))==3))
do.call(rbind, lapply(split(mm, br), addnullrow))
})
pss<-rep(unique(side), sapply(sideps, nrow))
ps<-do.call(rbind, lapply(sideps, addnullrow))[,c("x","y")]
attr(ps, "side")<-pss
ps
})
pss<-unname(unlist(lapply(r, attr, "side")))
src <- rep(seq_along(r), sapply(r, nrow))
r <- do.call(rbind, r)
attr(r, "source")<-src
attr(r, "side")<-pss
r
}
The input is just the values of x and y as you would pass to polygon along with the cutter. It will return a data.frame with x and y values that can be used with polygon.
For example
x <- c(1,2,2.5,NA,2.5,3,4)
y <- c(1,-2,2,NA,-1,2,-2)
sl<-getSplitLine(0,0)
plot(range(x, na.rm=T),range(y, na.rm=T), type = "n")
p <- splitPolygon(x,y,sl)
g <- cumsum(c(F, is.na(head(p$y,-1))))
gc <- ifelse(attr(p,"side")[is.na(p$y)],
"red","green")
polygon(p, col=gc)
sl$plot(lty=2, col="grey")
This should work for simple concave polygons as well with sloped lines. Here's another example
x <- c(1,2,3,4,5,4,3,2)
y <- c(-2,2,1,2,-2,.5,-.5,.5)
sl<-getSplitLine(.5,-1.25)
plot(range(x, na.rm=T),range(y, na.rm=T), type = "n")
p <- splitPolygon(x,y,sl)
g <- cumsum(c(F, is.na(head(p$y,-1))))
gc <- ifelse(attr(p,"side")[is.na(p$y)],
"red","green")
polygon(p, col=gc)
sl$plot(lty=2, col="grey")
Right now things can get a bit messy when the the vertex of the polygon falls directly on the splitting line. I may try to correct that in the future.

A faster, but not very accurate solution is to split data frame to list according to grouping variable (e.g. above=red and below=blue). This is a pretty nice workaround for rather big (I would say > 100 elements) datasets. For smaller chunks some discontinuity may be visible:
x <- 1:100
y1 <- sin(1:100/10)*0.8
y2 <- sin(1:100/10)*1.2
plot(x, y2, type='l')
lines(x, y1, col='red')
df <- data.frame(x=x, y1=y1, y2=y2)
df$pos_neg <- ifelse(df$y2-df$y1>0,1,-1) # above (1) or below (-1) average
# create the number for chunks to be split into lists:
df$chunk <- c(1,cumsum(abs(diff(df$pos_neg)))/2+1) # first element needs to be added`
df$colors <- ifelse(df$pos_neg>0, "red","blue") # colors to be used for filling the polygons
# create lists to be plotted:
l <- split(df, df$chunk) # we should get 4 sub-lists
lapply(l, function(x) polygon(c(x$x,rev(x$x)),c(x$y2,rev(x$y1)),col=x$colors))
As I said, for smaller dataset some discontinuity may be visible if sharp changes occur between positive and negative areas, but if horizontal line distinguishes between those two, or more elements are plotted then this effect is neglected:

Related

How to find outer limits of contour lines for three-dimensional data in r

I would like to extend the example given here
How to plot a contour line showing where 95% of values fall within, in R and in ggplot2
to data with three dimensions (x, y and z), and instead of plotting the contour line I'd like to get the limits of the x, y and z values.
This is the example from the previous post.
library(ggplot2)
set.seed(1001)
d <- data.frame(x=rnorm(1000),y=rnorm(1000))
kd <- ks::kde(d, compute.cont=TRUE)
contour_95 <- with(kd, contourLines(x=eval.points[[1]], y=eval.points[[2]],
z=estimate, levels=cont["5%"])[[1]])
contour_95 <- data.frame(contour_95)
ggplot(data=d, aes(x, y)) +
geom_point() +
geom_path(aes(x, y), data=contour_95) +
theme_bw()
and then, it's possible to get the limits of the contour like this:
range(contour_95$x)
range(contour_95$y)
I would love to know how to get the x, y and z ranges of 3-D contours at specified percentiles.
ks:kde can deal with higher dimensions, but contourLines() cant.
This is what I've tried...
set.seed(1001)
d <- data.frame(x=rnorm(1000),y=rnorm(1000), y=rnorm(1000))
kd <- ks::kde(d, compute.cont=TRUE)
#what kd$estimates are > 95th percentile?
#make function that can extract from 3d array
multi.which <- function(A){
if ( is.vector(A) ) return(which(A))
d <- dim(A)
T <- which(A) - 1
nd <- length(d)
t( sapply(T, function(t){
I <- integer(nd)
I[1] <- t %% d[1]
sapply(2:nd, function(j){
I[j] <<- (t %/% prod(d[1:(j-1)])) %% d[j]
})
I
}) + 1 )
}
#extract those estimates that have >density than 95th percentile
ests <- multi.which(kd$estimate > kd$cont["5%"])
#make into a long dataframe with column number in the second column and row number in first column
col1=rep(1, nrow(ests))
col2=rep(2, nrow(ests))
col3=rep(3, nrow(ests))
rows=c(ests[,1], ests[,2], ests[,3])
cols=c(col1,col2,col3)
index=cbind(rows,cols)#this is the index so we can extract the coordinates in multi-D space
car::some(index)
#get coordinates with this function
fExtract <- function(dat, indexDat){
dat[as.matrix(indexDat)]
}
#pull three coordinates (x,y,z) from eval.points into 3 columns
eval.pts <- cbind(kd$eval.points[[1]], kd$eval.points[[2]], kd$eval.points[[3]])
v <- fExtract(eval.pts, index) #one long vector
#re-create the three columns of x, y and z coordinates of points at higher density than 95th percentile
x1 <- v[1:nrow(ests)]
y1 <- v[(nrow(ests)+1):(2*nrow(ests))]
z1 <- v[(2*nrow(ests)+1):(3*nrow(ests))]
#the three coordinates.
fin <- cbind(x1,y1,z1)
#get range of each dimension
range(x1)
range(y1)
range(z1)
But I'm not confident it's right.

How to clip an isosurface to a ball?

Consider the Togliatti implicit surface. I want to clip it to the ball centered at the origin with radius 4.8. A solution, with the misc3d package, consists in using the mask argument of the computeContour3d function, which allows to use only the points satisfying x^2+y^2+z^2 < 4.8^2:
library(misc3d)
# Togliatti surface equation: f(x,y,z) = 0
f <- function(x,y,z){
w <- 1
64*(x-w)*
(x^4-4*x^3*w-10*x^2*y^2-4*x^2*w^2+16*x*w^3-20*x*y^2*w+5*y^4+16*w^4-20*y^2*w^2) -
5*sqrt(5-sqrt(5))*(2*z-sqrt(5-sqrt(5))*w)*(4*(x^2+y^2-z^2)+(1+3*sqrt(5))*w^2)^2
}
# make grid
nx <- 220; ny <- 220; nz <- 220
x <- seq(-5, 5, length=nx)
y <- seq(-5, 5, length=ny)
z <- seq(-4, 4, length=nz)
g <- expand.grid(x=x, y=y, z=z)
# calculate voxel
voxel <- array(with(g, f(x,y,z)), dim = c(nx,ny,nz))
# mask: keep points satisfying x^2+y^2+z^2 < 4.8^2, in order to
# clip the surface to the ball of radius 4.8
mask <- array(with(g, x^2+y^2+z^2 < 4.8^2), dim = c(nx,ny,nz))
# compute isosurface
surf <- computeContour3d(voxel, maxvol=max(voxel), level=0, mask=mask, x=x, y=y, z=z)
# draw isosurface
drawScene.rgl(makeTriangles(surf, smooth=TRUE))
But the borders of the resulting surface are irregular:
How to get regular, smooth borders?
The solution I found resorts to spherical coordinates. It consists in defining the function f in terms of spherical coordinates (ρ, θ, ϕ), then to compute the isosurface with ρ running from 0 to the desired radius, and then to transform the result to Cartesian coordinates:
# Togliatti surface equation with spherical coordinates
f <- function(ρ, θ, ϕ){
w <- 1
x <- ρ*cos(θ)*sin(ϕ)
y <- ρ*sin(θ)*sin(ϕ)
z <- ρ*cos(ϕ)
64*(x-w)*
(x^4-4*x^3*w-10*x^2*y^2-4*x^2*w^2+16*x*w^3-20*x*y^2*w+5*y^4+16*w^4-20*y^2*w^2) -
5*sqrt(5-sqrt(5))*(2*z-sqrt(5-sqrt(5))*w)*(4*(x^2+y^2-z^2)+(1+3*sqrt(5))*w^2)^2
}
# make grid
nρ <- 300; nθ <- 400; nϕ <- 300
ρ <- seq(0, 4.8, length = nρ) # ρ runs from 0 to the desired radius
θ <- seq(0, 2*pi, length = nθ)
ϕ <- seq(0, pi, length = nϕ)
g <- expand.grid(ρ=ρ, θ=θ, ϕ=ϕ)
# calculate voxel
voxel <- array(with(g, f(ρ,θ,ϕ)), dim = c(nρ,nθ,nϕ))
# calculate isosurface
surf <- computeContour3d(voxel, maxvol=max(voxel), level=0, x=ρ, y=θ, z=ϕ)
# transform to Cartesian coordinates
surf <- t(apply(surf, 1, function(rtp){
ρ <- rtp[1]; θ <- rtp[2]; ϕ <- rtp[3]
c(
ρ*cos(θ)*sin(ϕ),
ρ*sin(θ)*sin(ϕ),
ρ*cos(ϕ)
)
}))
# draw isosurface
drawScene.rgl(makeTriangles(surf, smooth=TRUE, color = "violetred"))
Now the resulting surface has regular, smooth borders:
Your solution is excellent for the problem you stated, because spherical coordinates are so natural for that boundary. However, here is a more general solution that would work for other smooth boundaries.
The idea is to allow input of a boundary function, and cull points when they are too large or too small. In your case it would be the squared distance from the origin, and you would want to cull points where the value is bigger than 4.8^2. But sometimes the triangles being drawn to make the smooth surface should only be partially culled: one point would be kept and two deleted, or two kept and one deleted. If you cull the whole triangle that leads to the jagged edges in your original plot.
To fix this, the points can be modified. If only one is supposed to be kept, then the other two points can be shrunk towards it until they lie on an approximation to the boundary. If two are supposed to be kept you want the shape to be a quadrilateral, so you would build that out of two triangles.
This function does that, assuming the input surf is the output of computeContour3d:
boundSurface <- function(surf, boundFn, bound = 0, greater = TRUE) {
# Surf is n x 3: each row is a point, triplets are triangles
values <- matrix(boundFn(surf) - bound, 3)
# values is (m = n/3) x 3: each row is the boundFn value at one point
# of a triangle
if (!greater)
values <- -values
keep <- values >= 0
# counts is m vector counting number of points to keep in each triangle
counts <- apply(keep, 2, sum)
# result is initialized to an empty array
result <- matrix(nrow = 0, ncol = 3)
# singles is set to all the rows of surf where exactly one
# point in the triangle is kept, say s x 3
singles <- surf[rep(counts == 1, each = 3),]
if (length(singles)) {
# singleValues is a subset of values where only one vertex is kept
singleValues <- values[, counts == 1]
singleIndex <- 3*col(singleValues) + 1:3 - 3
# good is the index of the vertex to keep, bad are those to fix
good <- apply(singleValues, 2, function(col) which(col >= 0))
bad <- apply(singleValues, 2, function(col) which(col < 0))
for (j in 1:ncol(singleValues)) {
goodval <- singleValues[good[j], j]
for (i in 1:2) {
badval <- singleValues[bad[i,j], j]
alpha <- goodval/(goodval - badval)
singles[singleIndex[bad[i,j], j], ] <-
(1-alpha)*singles[singleIndex[good[j], j],] +
alpha *singles[singleIndex[bad[i,j], j],]
}
}
result <- rbind(result, singles)
}
doubles <- surf[rep(counts == 2, each = 3),]
if (length(doubles)) {
# doubleValues is a subset of values where two vertices are kept
doubleValues <- values[, counts == 2]
doubleIndex <- 3*col(doubleValues) + 1:3 - 3
doubles2 <- doubles
# good is the index of the vertex to keep, bad are those to fix
good <- apply(doubleValues, 2, function(col) which(col >= 0))
bad <- apply(doubleValues, 2, function(col) which(col < 0))
newvert <- matrix(NA, 2, 3)
for (j in 1:ncol(doubleValues)) {
badval <- doubleValues[bad[j], j]
for (i in 1:2) {
goodval <- doubleValues[good[i,j], j]
alpha <- goodval/(goodval - badval)
newvert[i,] <-
(1-alpha)*doubles[doubleIndex[good[i,j], j],] +
alpha *doubles[doubleIndex[bad[j], j],]
}
doubles[doubleIndex[bad[j], j],] <- newvert[1,]
doubles2[doubleIndex[good[1,j], j],] <- newvert[1,]
doubles2[doubleIndex[bad[j], j],] <- newvert[2,]
}
result <- rbind(result, doubles, doubles2)
}
# Finally add all the rows of surf where the whole
# triangle is kept
rbind(result, surf[rep(counts == 3, each = 3),])
}
You would use it after computeContour3d and before makeTriangles, e.g.
fn <- function(x) {
apply(x^2, 1, sum)
}
drawScene.rgl(makeTriangles(boundSurface(surf, fn, bound = 4.8^2,
greater = FALSE),
smooth = TRUE))
Here's the output I see:
It's not quite as good as yours, but it would work for many different boundary functions.
Edited to add: Version 0.100.26 of rgl now has a function clipMesh3d which incorporates these ideas.

How to plot nicely-spaced data labels?

Labeling data points in a plot can get unwieldy:
Randomly sampling few labels may disappoint:
What would be a nice way to pick a small set of nicely-spaced data labels? That is, to randomly pick representatives whose labels are not overlapping.
# demo data
set.seed(123)
N <- 50
x <- runif(N)
y <- x + rnorm(N, 0, x)
data <- data.frame(x, y, labels=state.name)
# plot with labels
plot(x,y)
text(x,y,labels)
# plot a few labels
frame()
few_labels <- data[sample(N, 10), ]
plot(x,y)
with(few_labels, text(x,y,labels))
One way to do is through clustering. Here is a solution with stats::hclust. We agglomerate the data points in cluster and then pick one random observation from each cluster.
few_labels <- function(df, coord=1:ncol(df),grp=5){
require(dplyr)
df$cl <- cutree(hclust(dist(df[,coord])),grp)
few_labels <- df %>% group_by(cl) %>%
do(sample_n(.,1))
return(few_labels)
}
# demo data
set.seed(123)
N <- 50
x <- runif(N)
y <- x + rnorm(N, 0, x)
data <- data.frame(x, y, labels=state.name)
# plot a few labels
frame()
few_labels <- few_labels(data,coord=1:2,grp=12)
plot(x,y)
with(few_labels, text(x,y,labels))
For all labels:
xlims=c(-1,2)
plot(x,y,xlim=xlims)
#text(x,y,data$labels,pos = 2,cex=0.7)
library(plotrix)
spread.labels(x,y,data$labels,cex=0.7,ony=NA)
Another way is to pick randomly a point, throw all proximate ones, and so on, until no point is left:
radius <- .1 # of a ball containing the largest label
d <- as.matrix(dist(data[, c("x","y")], upper=TRUE, diag=TRUE))
remaining <- 1:N
spaced <- numeric()
i <- 1
while(length(remaining)>0) {
p <- ifelse(length(remaining)>1, sample(remaining, 1), remaining)
spaced <- c(spaced, p) # ...
remaining <- setdiff(remaining, which(d[p, ] < 2*radius))
i <- i + 1
}
frame()
plot(x,y)
spaced_labels <- data[spaced, ]
with(spaced_labels, text(x,y,labels))

vertically distribute multiple lines with smart spacing

A common display of spectroscopic data (intensity vs wavelength) is used below to compare the position of peaks in the data across multiple spectra. Assuming they all share a baseline at 0, it is convenient to offset the multiple lines vertically by a constant spacing, to avoid the distraction of overlapping lines.
Thus becomes
I'm looking for a better strategy to perform this vertical shift automatically, starting from data in long format. Here is a minimal example.
# fake data (5 similar-looking spectra)
spec <- function(){
x <- runif(100, 0, 100)
data.frame(x=x, y=jitter(dnorm(x, mean=jitter(50), sd=jitter(5)), amount=0.01))
}
require(plyr)
all <- ldply(1:5, function(ii) data.frame(spec(), id=ii))
My current strategy is as follows:
convert the spectra from long format to wide format. This involves interpolation, as the spectra do not necessarily have identical x axis values.
find the minimum offset between spectra to avoid overlap between neighbours
shift the spectra by multiples of this distance
melt back to long format
I implemented this using plyr,
# function that evenly spaces the spectra to avoid overlap
# d is in long format, s is a scaling factor for the vertical shift
require(plyr); require(ggplot2)
spread_plot <- function(d, s=1){
ranges <- ddply(d, "id", with, each(min,max,length)(x))
common_x <- seq(max(ranges$min), min(ranges$max), length=max(ranges$length))
new_y <- dlply(d, "id", function(x) approx(x$x, x$y, common_x)$y)
mat <- do.call(cbind, new_y)
test <- apply(mat, 1, diff)
shift <- max(-test[test < 0])
origins <- s*seq(0, by=shift, length=ncol(mat))
for(ii in seq_along(origins)){
current <- unique(d[["id"]])[ii]
d[d[["id"]] == current, "y"] <-
d[d[["id"]] == current, "y"] + origins[ii]
}
d
}
test <- spread_plot(all)
ggplot(test, aes(x, y, colour=id, group=id))+
geom_line() + guides(colour=guide_legend())
This strategy suffers from a few shortcomings:
it is slow
the offset is not a pretty number; I do not know how to automatically round it well so that spectra are offset e.g. by 0.02, or 50, etc. depending on the range of the intensities. pretty(origins) is problematic in that it can return a different number of values.
I feel I'm missing a simpler solution, perhaps working directly with the original data in long format.
Interesting question.
Here's a possibility, offered without detailed comment, except to point out that it:
Should be very fast, due to a combo of its avoidance of plyr, use of data.table, and operation on data in its original long format.
Uses pretty() to pick a pretty offset.
Like your code, is not guaranteed to produce no intersections of lines, since overlap can happen between the lattice of points formed by common_x.
Here's the code
## Setup
library(data.table)
library(plyr)
library(ggplot2)
spec <- function(){
x <- runif(100, 0, 100)
data.frame(x=x, y=jitter(dnorm(x, mean=jitter(50), sd=jitter(5)), amount=0.01))
}
all <- ldply(1:5, function(ii) data.frame(spec(), id=ii))
## Function that uses data.table rather than plyr to compute and add offsets
spread_plot <- function(d, s=1){
d <- data.table(d, key="id")
ranges <- d[, list(min=min(x), max=max(x), length=length(x)),by="id"]
common_x <- seq(max(ranges$min), min(ranges$max), length=max(ranges$length))
new_y <- d[,list(y=approx(x, y, common_x)$y, N=seq_along(common_x)),
by="id"]
shift <- max(new_y[, max(abs(diff(y))), by = "N"][[2]])
shift <- pretty(c(0, shift), n=0)[2]
origins <- s*seq(0, by=shift, length=length(unique(d$id)))
d[,y:=(y + origins[.GRP]),by="id"]
d
}
## Try it out
test <- spread_plot(all)
ggplot(test, aes(x, y, colour=id, group=id))+
geom_line() + guides(colour=guide_legend())
I still think you could rely on some assumptions about typical data from spectroscopy. Usually, x values are sorted, the number of them is equal for all spectra and they are quite similar:
# new fake data (5 similar-looking spectra)
spec <- function(){
x <- jitter(seq(0,100,1),0.1)
data.frame(x=x, y=jitter(dnorm(x, mean=jitter(50), sd=jitter(5)), amount=0.01))
}
require(plyr)
all <- ldply(1:5, function(ii) data.frame(spec(), id=ii))
If these assumptions are valid, you could treat the spectra as having identical x values:
library(ggplot2)
spread_plot <- function(d, s=0.05) {
#add some checks here, e.g., for equal length
d <- d[order(d$x),]
d$id <- factor(d$id)
l <- levels(d$id)
pretty_offset <- pretty(s*min(tapply(d$y, d$id, function(x) abs(diff(range(x))))))[2]
for (i in seq_len(length(l)-1)+1) {
mean_delta_y <- mean(d[d$id == l[i], "y"] - d[d$id == l[i-1], "y"])
d[d$id == l[i], "y"] <- d[d$id == l[i], "y"] - mean_delta_y
min_delta_y <- abs(1.05 * min(d[d$id == l[i], "y"] - d[d$id == l[i-1], "y"]))
pretty_delta_y <- max(min_delta_y, pretty_offset)
d[d$id == l[i], "y"] <- d[d$id == l[i], "y"] + pretty_delta_y
}
p <- ggplot(d, aes(x=x, y=y, col=id)) + geom_line()
print(p)
}
spread_plot(all, s=0)
spread_plot(all, s=0.5)
As suggested by hadley, the for loop can be avoided very simply,
d$y <- d$y + origins[d$id]
Full code:
spread_plot <- function(d, s=1){
ranges <- ddply(d, "id", with, each(min,max,length)(x))
common_x <- seq(max(ranges$min), min(ranges$max), length=max(ranges$length))
new_y <- dlply(d, "id", function(x) approx(x$x, x$y, common_x)$y)
mat <- do.call(cbind, new_y)
test <- apply(mat, 1, diff)
shift <- max(-test[test < 0])
origins <- s*seq(0, by=shift, length=ncol(mat))
d$y <- d$y + origins[d$id]
d
}
test <- spread_plot(all)
ggplot(test, aes(x, y, colour=id, group=id))+
geom_line() + guides(colour=guide_legend())

Ordering 1:17 by perfect square pairs

There was an interesting question on R-help:
"Take the numbers one up to 17. Can you write them out in a line so that every pair of numbers that are next to each other, adds up to give a square number?"
My solution is below and not particularly special. I'm curious about a more elegant and/or robust solution. Maybe a solution that can take an arbitrary string of numbers and order them like this if possible?
sq.test <- function(a, b) {
## test for number pairs that sum to squares.
sqrt(sum(a, b)) == floor(sqrt(sum(a, b)))
}
ok.pairs <- function(n, vec) {
## given n as a member of vec,
## which other members of vec satisfiy sq.test
vec <- vec[vec!=n]
vec[sapply(vec, sq.test, b=n)]
}
grow.seq <- function(y) {
## given a starting point (y) and a pairs list (pl)
## grow the squaring sequence.
ly <- length(y)
if(ly == y[1]) return(y)
## this line is the one that breaks down on other number sets...
y <- c(y, max(pl[[y[ly]]][!pl[[y[ly]]] %in% y]))
y <- grow.seq(y)
return(y)
}
## start vector
x <- 1:17
## get list of possible pairs
pl <- lapply(x, ok.pairs, vec=x)
## pick start at max since few combinations there.
y <- max(x)
grow.seq(y)
You can use outer to compute the allowable pairs.
The resulting matrix is the adjacency matrix of a graph,
and you just want a Hamiltonian path on it.
# Allowable pairs form a graph
p <- outer(
1:17, 1:17,
function(u,v) round(sqrt(u + v),6) == floor(sqrt(u+v)) )
)
rownames(p) <- colnames(p) <- 1:17
image(p, col=c(0,1))
# Read the solution on the plot
library(igraph)
g <- graph.adjacency(p, "undirected")
V(g)$label <- V(g)$name
plot(g, layout=layout.fruchterman.reingold)

Resources