It seems that a statistical problem that I am working on requires doing something known in computational geometry as "offline orthogonal range counting":
Suppose I have a set of n points (for the moment, in the plane). For every pair of points i and j, I would like to count the number of remaining points in the set that are in the rectangle whose diagonal is the segment with endpoints i and j. The overall output then is a vector of n(n-1) values each in [0, 1, 2, ... , n-2].
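For concreteness, here is a naive brute-force version of the quantity I have in mind (the name is just a placeholder, and it is of course far too slow for the sizes I care about; points on the rectangle boundary are counted as inside here):
range_counts_naive <- function(pts) {
  n <- nrow(pts)
  counts <- matrix(NA_integer_, n, n)   # counts[i, j] = number of other points in rect(i, j)
  for (i in 1:n) for (j in 1:n) {
    if (i == j) next
    xr <- range(pts[c(i, j), 1])
    yr <- range(pts[c(i, j), 2])
    others <- pts[-c(i, j), , drop = FALSE]
    counts[i, j] <- sum(others[, 1] >= xr[1] & others[, 1] <= xr[2] &
                        others[, 2] >= yr[1] & others[, 2] <= yr[2])
  }
  counts
}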
I've seen that a rich literature on the problem (or at least a very similar problem) exists, but I cannot find an implementation. I would prefer an R (a statistical computing language) package, but I guess that's asking too much. An open source C/C++ implementation will also work.
Thanks.
I hope I understand your problem correctly. Here is an implementation in R using the geometry package. I use the
mesh.drectangle function, which computes a signed distance from points p to the boundary of a rectangle.
I create all pairs of points using combn.
For each pair, I compute the signed distance from the corresponding rectangle to the other points.
If the distance is < 0, the point lies inside the rectangle, so I keep it.
For example
library(geometry)
## I generate some data
set.seed(1234)
p.x <- sample(1:100,size=30,replace=T)
p.y <- sample(1:100,size=30,replace=T)
points <- cbind(p.x,p.y)
## the algorithm
ll <- combn(1:nrow(points), 2, function(x) {
  x1 <- p.x[x[1]]; y1 <- p.y[x[1]]
  x2 <- p.x[x[2]]; y2 <- p.y[x[2]]
  p <- points[-x, ]
  ## signed distance from the remaining points to the rectangle with
  ## corners (x1, y1) and (x2, y2); mesh.drectangle wants the lower-left
  ## and upper-right corners, so pass ordered values as named arguments
  d <- mesh.drectangle(p,
                       x1 = min(x1, x2), x2 = max(x1, x2),
                       y1 = min(y1, y2), y2 = max(y1, y2))
  res <- NA
  if (any(d < 0)) {
    points.in <- as.data.frame(p)[d < 0, ]
    res <- list(n = nrow(points.in),
                rect = list(x1 = x1, x2 = x2, y1 = y1, y2 = y2),
                points.in = points.in)
  }
  res
}, simplify = FALSE)
ll <- ll[!is.na(ll)]
## the result
nn <- do.call(rbind,lapply(ll,'[[','n'))
To visualize the results, I plot, as an example, the rectangles that contain exactly 5 points.
library(grid)
grid.newpage()
vp <- plotViewport(xscale = extendrange(p.x),
yscale = extendrange(p.y))
pushViewport(vp)
grid.xaxis()
grid.yaxis()
grid.points(x=points[,'p.x'],y=points[,'p.y'],pch='*')
cols <- rainbow(length(ll))
ll <- ll[nn == 5] ## here I plot only the rectangle with 5 points
lapply(seq_along(ll), function(i) {
  x <- ll[[i]]
  col <- sample(cols, 1)
  x1 <- x$rect$x1; x2 <- x$rect$x2
  y1 <- x$rect$y1; y2 <- x$rect$y2
  grid.rect(x = (x1 + x2)*.5, y = (y1 + y2)*.5,
            width = x2 - x1, height = y2 - y1,
            default.units = 'native',
            gp = gpar(fill = col, col = 'red', alpha = 0.2))
  grid.points(x = x$points.in$p.x, y = x$points.in$p.y, pch = 19,
              gp = gpar(col = rep(col, x$n)))
})
upViewport()
I am new to using R. I have a task to use R to create a function that simulates a standard normal distribution containing 500 observations and three variables, x, y, and z.
I am to use a cube as a decision surface to categorize observations based on whether they fall inside or outside the cube.
Below is my code. I am able to plot the 3D data, but I am not sure how to categorize the observations into two classes.
library(scatterplot3d)
set.seed(1234)
nObs <- 500
x <- matrix(rnorm(1.25*nObs), ncol = 2)
y <- matrix(rnorm(1.25*nObs), ncol = 2)
z <- matrix(rnorm(1.25*nObs), ncol = 2)
mSample <- function(nObs,x,y,z){
x1 <- rnorm(1,x)
x1[y==1,] <- x[y==1,] + 1
mSample <- as_tibble(rbind(mvnfast::rmvn(x,y = y1,z = z1), mvnfast::rmvn(x,y = y1,z = z1)))
mSample[1:x1, 1.25] <- 0
mSample[(x1 + 1):(x1 + 1), 1.25] <- 1
mSample <- mSample[sample(nrow(mSample)), ]
colnames(mSample <- c("x", "y", "class"))
mSample
}
spl <- scatterplot3d(x,y,z)
spl <- scatterplot3d(x,y,z,pch=16,highlight.3d=TRUE)
I had a similar question to this recently. Basically, to know whether a given point is inside or outside of a cube, you first need to know the cube's dimensions; below, cubeLength is the half-width, i.e. the cube runs from -cubeLength to cubeLength on each axis.
Then simply iterate over all nObs points and apply an if statement, for example (assuming x, y and z each hold one coordinate per observation):
class <- character(nObs)
for (i in 1:nObs) {
  if (x[i] > -cubeLength && x[i] < cubeLength &&
      y[i] > -cubeLength && y[i] < cubeLength &&
      z[i] > -cubeLength && z[i] < cubeLength) {
    class[i] <- "inside"    # classify positive
  } else {
    class[i] <- "outside"   # classify negative
  }
}
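A vectorized version of the same check (just a sketch: it assumes x, y and z are numeric vectors of length nObs and cubeLength is the half-width used above) avoids the explicit loop:
inside <- abs(x) < cubeLength & abs(y) < cubeLength & abs(z) < cubeLength
class  <- ifelse(inside, "inside", "outside")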
I am trying to compute the turning angles of sequential vectors in the complex plane. Please see the code below for a demo data frame and my attempt at calculating the angles.
The signs of the angles seem correct: left turns are positive and right turns are negative. However, the turning angles do not look right when I check them against the plot. NOTE: I want the turning angles, not the angles between the vectors. Image for reference:
set.seed(123)
# Generate a random path and plot it
path.short.random <- function(points = 6) {
  x <- runif(points, -1, 1)
  y <- rnorm(points, 0, 0.25)
  i <- order(x, y)
  x <- x[i]
  y <- y[i]
  path <- data.frame(x = x, y = y)
  plot(x, y, main = "Random Path", asp = 1)
  # draw arrows from point to point
  s <- seq(length(x) - 1) # one shorter than data
  arrows(x[s], y[s], x[s + 1], y[s + 1], col = 1:points)
  path
}
# Save the path as a data frame
df <- path.short.random()
# Compute sequential turning angles
get.angles <- function(df) {
  df$polar <- complex(real = df$x, imaginary = df$y)
  df$displacement <- c(0, diff(df$polar))
  diff(Arg(df$displacement[2:nrow(df)]))
}
get.angles(df)
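For what it is worth, a common way to obtain turning angles directly (shown here only as a sketch; get.turning.angles is an illustrative name, not part of the code above) is to take Arg() of the ratio of successive displacement vectors: dividing two complex numbers subtracts their arguments, and Arg() automatically wraps the result into (-pi, pi].
get.turning.angles <- function(df) {
  z <- complex(real = df$x, imaginary = df$y)
  disp <- diff(z)                      # displacement vectors between successive points
  Arg(disp[-1] / disp[-length(disp)])  # angle turned at each interior point, in (-pi, pi]
}
get.turning.angles(df)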
Consider the Togliatti implicit surface. I want to clip it to the ball centered at the origin with radius 4.8. A solution with the misc3d package consists in using the mask argument of the computeContour3d function, which restricts the computation to the points satisfying x^2+y^2+z^2 < 4.8^2:
library(misc3d)
# Togliatti surface equation: f(x,y,z) = 0
f <- function(x, y, z){
  w <- 1
  64*(x-w)*
    (x^4-4*x^3*w-10*x^2*y^2-4*x^2*w^2+16*x*w^3-20*x*y^2*w+5*y^4+16*w^4-20*y^2*w^2) -
    5*sqrt(5-sqrt(5))*(2*z-sqrt(5-sqrt(5))*w)*(4*(x^2+y^2-z^2)+(1+3*sqrt(5))*w^2)^2
}
# make grid
nx <- 220; ny <- 220; nz <- 220
x <- seq(-5, 5, length=nx)
y <- seq(-5, 5, length=ny)
z <- seq(-4, 4, length=nz)
g <- expand.grid(x=x, y=y, z=z)
# calculate voxel
voxel <- array(with(g, f(x,y,z)), dim = c(nx,ny,nz))
# mask: keep points satisfying x^2+y^2+z^2 < 4.8^2, in order to
# clip the surface to the ball of radius 4.8
mask <- array(with(g, x^2+y^2+z^2 < 4.8^2), dim = c(nx,ny,nz))
# compute isosurface
surf <- computeContour3d(voxel, maxvol=max(voxel), level=0, mask=mask, x=x, y=y, z=z)
# draw isosurface
drawScene.rgl(makeTriangles(surf, smooth=TRUE))
But the borders of the resulting surface are irregular:
How to get regular, smooth borders?
The solution I found resorts to spherical coordinates. It consists in defining the function f in terms of spherical coordinates (ρ, θ, ϕ), computing the isosurface with ρ running from 0 to the desired radius, and then transforming the result back to Cartesian coordinates:
# Togliatti surface equation with spherical coordinates
f <- function(ρ, θ, ϕ){
  w <- 1
  x <- ρ*cos(θ)*sin(ϕ)
  y <- ρ*sin(θ)*sin(ϕ)
  z <- ρ*cos(ϕ)
  64*(x-w)*
    (x^4-4*x^3*w-10*x^2*y^2-4*x^2*w^2+16*x*w^3-20*x*y^2*w+5*y^4+16*w^4-20*y^2*w^2) -
    5*sqrt(5-sqrt(5))*(2*z-sqrt(5-sqrt(5))*w)*(4*(x^2+y^2-z^2)+(1+3*sqrt(5))*w^2)^2
}
# make grid
nρ <- 300; nθ <- 400; nϕ <- 300
ρ <- seq(0, 4.8, length = nρ) # ρ runs from 0 to the desired radius
θ <- seq(0, 2*pi, length = nθ)
ϕ <- seq(0, pi, length = nϕ)
g <- expand.grid(ρ=ρ, θ=θ, ϕ=ϕ)
# calculate voxel
voxel <- array(with(g, f(ρ,θ,ϕ)), dim = c(nρ,nθ,nϕ))
# calculate isosurface
surf <- computeContour3d(voxel, maxvol=max(voxel), level=0, x=ρ, y=θ, z=ϕ)
# transform to Cartesian coordinates
surf <- t(apply(surf, 1, function(rtp){
  ρ <- rtp[1]; θ <- rtp[2]; ϕ <- rtp[3]
  c(
    ρ*cos(θ)*sin(ϕ),
    ρ*sin(θ)*sin(ϕ),
    ρ*cos(ϕ)
  )
}))
# draw isosurface
drawScene.rgl(makeTriangles(surf, smooth=TRUE, color = "violetred"))
Now the resulting surface has regular, smooth borders:
Your solution is excellent for the problem you stated, because spherical coordinates are so natural for that boundary. However, here is a more general solution that would work for other smooth boundaries.
The idea is to accept a boundary function as input and cull points where its value is too large or too small. In your case the function would be the squared distance from the origin, and you would cull points where the value is bigger than 4.8^2. But sometimes the triangles being drawn to make the smooth surface should only be partially culled: one point would be kept and two deleted, or two kept and one deleted. Culling the whole triangle is what leads to the jagged edges in your original plot.
To fix this, the points can be modified. If only one point is supposed to be kept, then the other two points can be shrunk towards it until they lie on an approximation to the boundary. If two are supposed to be kept, the clipped shape is a quadrilateral, so you would build it out of two triangles.
This function does that, assuming the input surf is the output of computeContour3d:
boundSurface <- function(surf, boundFn, bound = 0, greater = TRUE) {
  # surf is n x 3: each row is a point, triplets of rows are triangles
  values <- matrix(boundFn(surf) - bound, 3)
  # values is 3 x m (m = n/3): each column holds the boundFn values
  # at the three vertices of one triangle
  if (!greater)
    values <- -values
  keep <- values >= 0
  # counts is an m-vector counting the number of points to keep in each triangle
  counts <- apply(keep, 2, sum)
  # result is initialized to an empty array
  result <- matrix(nrow = 0, ncol = 3)
  # singles holds all the rows of surf where exactly one
  # point of the triangle is kept, say s x 3
  singles <- surf[rep(counts == 1, each = 3), ]
  if (length(singles)) {
    # singleValues is the subset of values where only one vertex is kept
    singleValues <- values[, counts == 1]
    singleIndex <- 3*col(singleValues) + 1:3 - 3
    # good is the index of the vertex to keep, bad are those to fix
    good <- apply(singleValues, 2, function(col) which(col >= 0))
    bad  <- apply(singleValues, 2, function(col) which(col < 0))
    for (j in 1:ncol(singleValues)) {
      goodval <- singleValues[good[j], j]
      for (i in 1:2) {
        badval <- singleValues[bad[i, j], j]
        alpha <- goodval/(goodval - badval)
        singles[singleIndex[bad[i, j], j], ] <-
          (1-alpha)*singles[singleIndex[good[j], j], ] +
             alpha *singles[singleIndex[bad[i, j], j], ]
      }
    }
    result <- rbind(result, singles)
  }
  doubles <- surf[rep(counts == 2, each = 3), ]
  if (length(doubles)) {
    # doubleValues is the subset of values where two vertices are kept
    doubleValues <- values[, counts == 2]
    doubleIndex <- 3*col(doubleValues) + 1:3 - 3
    doubles2 <- doubles
    # good is the index of the vertices to keep, bad is the one to fix
    good <- apply(doubleValues, 2, function(col) which(col >= 0))
    bad  <- apply(doubleValues, 2, function(col) which(col < 0))
    newvert <- matrix(NA, 2, 3)
    for (j in 1:ncol(doubleValues)) {
      badval <- doubleValues[bad[j], j]
      for (i in 1:2) {
        goodval <- doubleValues[good[i, j], j]
        alpha <- goodval/(goodval - badval)
        newvert[i, ] <-
          (1-alpha)*doubles[doubleIndex[good[i, j], j], ] +
             alpha *doubles[doubleIndex[bad[j], j], ]
      }
      doubles[doubleIndex[bad[j], j], ]      <- newvert[1, ]
      doubles2[doubleIndex[good[1, j], j], ] <- newvert[1, ]
      doubles2[doubleIndex[bad[j], j], ]     <- newvert[2, ]
    }
    result <- rbind(result, doubles, doubles2)
  }
  # Finally add all the rows of surf where the whole
  # triangle is kept
  rbind(result, surf[rep(counts == 3, each = 3), ])
}
You would use it after computeContour3d and before makeTriangles, e.g.
fn <- function(x) {
  apply(x^2, 1, sum)
}
drawScene.rgl(makeTriangles(boundSurface(surf, fn, bound = 4.8^2,
                                         greater = FALSE),
                            smooth = TRUE))
Here's the output I see:
It's not quite as good as yours, but it would work for many different boundary functions.
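For example, the same function can clip to a half-space just by swapping in a different boundary function (a quick sketch; the z < 2 cutoff and the name fnz are made up for illustration):
fnz <- function(v) v[, 3]   # boundary function: just the z coordinate of each vertex
drawScene.rgl(makeTriangles(boundSurface(surf, fnz, bound = 2, greater = FALSE),
                            smooth = TRUE))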
Edited to add: Version 0.100.26 of rgl now has a function clipMesh3d which incorporates these ideas.
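For reference, a rough sketch of how the rgl route might look. This assumes that clipMesh3d's fn is applied to an n x 3 matrix of vertex coordinates (the same convention as boundFn above) and that the misc3d surface is first converted to an rgl mesh3d object; check ?clipMesh3d for the exact conventions in your rgl version.
library(rgl)
# convert the computeContour3d output (vertices stored in triplets) to an rgl mesh
mesh <- tmesh3d(vertices = t(cbind(surf, 1)),                   # 4 x n homogeneous coordinates
                indices  = matrix(seq_len(nrow(surf)), nrow = 3))
# clip to the ball of radius 4.8, as before (fn convention assumed, see above)
clipped <- clipMesh3d(mesh, fn = function(v) rowSums(v^2),
                      bound = 4.8^2, greater = FALSE)
shade3d(clipped, color = "violetred")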
I want to colour the area under a curve. The area with y > 0 should be red, the area with y < 0 should be green.
x <- c(1:4)
y <- c(0,1,-1,2,rep(0,4))
plot(y[1:4],type="l")
abline(h=0)
Using ifelse() does not work:
polygon(c(x,rev(x)),y,col=ifelse(y>0,"red","green"))
What I achieved so far is the following:
polygon(c(x,rev(x)),y,col="green")
polygon(c(x,rev(x)),ifelse(y>0,y,0),col="red")
But then the red area is too large. Do you have any ideas how to get the desired result?
If you want two different colors, you need two different polygons. You can either call polygon multiple times, or you can add NA values in your x and y vectors to indicate a new polygon. R will not automatically calculate the intersection for you. You must do that yourself. Here's how you could draw that with different colors.
x <- c(1,2,2.5,NA,2.5,3,4)
y <- c(0,1,0,NA,0,-1,0)
#calculate color based on most extreme y value
g <- cumsum(is.na(x))
gc <- ifelse(tapply(y, g,
                    function(x) x[which.max(abs(x))]) > 0,
             "red", "green")
plot(c(1, 4),c(-1,1), type = "n")
polygon(x, y, col = gc)
abline(h=0)
In the more general case, it might not be as easy to split a polygon into different regions. There seems to be some support for this type of operation in GIS packages, where such operations are more common. However, I've put together a somewhat general approach that may work for simple polygons.
First, I define a closure that will define a cutting line. The function will take a slope and y-intercept for a line and will return the functions we need to cut a polygon.
getSplitLine <- function(m = 1, b = 0) {
  force(m); force(b)
  classify <- function(x, y) {
    y >= m*x + b
  }
  intercepts <- function(x, y, class = classify(x, y)) {
    w <- which(diff(class) != 0)
    m2 <- (y[w+1] - y[w])/(x[w+1] - x[w])
    b2 <- y[w] - m2*x[w]
    ix <- (b2 - b)/(m - m2)
    iy <- ix*m + b
    data.frame(x = ix, y = iy, idx = w + .5,
               dir = ((rank(ix, ties = "first") + 1) %/% 2) %% 2 + 1)
  }
  plot <- function(...) {
    abline(b, m, ...)
  }
  list(
    intercepts = intercepts,
    classify = classify,
    plot = plot
  )
}
Now we will define a function to actually split a polygon using the splitter we've just defined.
splitPolygon <- function(x, y, splitter) {
  addnullrow <- function(x) if (!all(is.na(x[nrow(x), ]))) rbind(x, NA) else x
  rollup <- function(x, i = 1) rbind(x[(i+1):nrow(x), ], x[1:i, ])

  idx <- cumsum(is.na(x) | is.na(y))
  polys <- split(data.frame(x = x, y = y)[!is.na(x), ], idx[!is.na(x)])

  r <- lapply(polys, function(P) {
    x <- P$x; y <- P$y
    side <- splitter$classify(x, y)
    if (side[1] != side[length(side)]) {
      ints <- splitter$intercepts(c(x, x[1]), c(y, y[1]), c(side, side[1]))
    } else {
      ints <- splitter$intercepts(x, y, side)
    }
    sideps <- lapply(unique(side), function(ss) {
      pts <- data.frame(x = x[side == ss], y = y[side == ss],
                        idx = seq_along(x)[side == ss], dir = 0)
      mm <- rbind(pts, ints)
      mm <- mm[order(mm$idx), ]
      br <- cumsum(mm$dir != 0 & c(0, head(mm$dir, -1)) != 0 &
                     c(0, diff(mm$idx)) > 1)
      if (length(unique(br)) > 1) {
        mm <- rollup(mm, sum(br == br[1]))
      }
      br <- cumsum(c(FALSE, abs(diff(mm$dir*mm$dir)) == 3))
      do.call(rbind, lapply(split(mm, br), addnullrow))
    })
    pss <- rep(unique(side), sapply(sideps, nrow))
    ps <- do.call(rbind, lapply(sideps, addnullrow))[, c("x", "y")]
    attr(ps, "side") <- pss
    ps
  })
  pss <- unname(unlist(lapply(r, attr, "side")))
  src <- rep(seq_along(r), sapply(r, nrow))
  r <- do.call(rbind, r)
  attr(r, "source") <- src
  attr(r, "side") <- pss
  r
}
The input is just the values of x and y as you would pass to polygon, along with the splitter returned by getSplitLine. It returns a data.frame with x and y values that can be used with polygon.
For example
x <- c(1,2,2.5,NA,2.5,3,4)
y <- c(1,-2,2,NA,-1,2,-2)
sl<-getSplitLine(0,0)
plot(range(x, na.rm=T),range(y, na.rm=T), type = "n")
p <- splitPolygon(x,y,sl)
g <- cumsum(c(F, is.na(head(p$y,-1))))
gc <- ifelse(attr(p, "side")[is.na(p$y)],
             "red", "green")
polygon(p, col=gc)
sl$plot(lty=2, col="grey")
This should also work for simple concave polygons and for sloped splitting lines. Here's another example:
x <- c(1,2,3,4,5,4,3,2)
y <- c(-2,2,1,2,-2,.5,-.5,.5)
sl<-getSplitLine(.5,-1.25)
plot(range(x, na.rm=T),range(y, na.rm=T), type = "n")
p <- splitPolygon(x,y,sl)
g <- cumsum(c(F, is.na(head(p$y,-1))))
gc <- ifelse(attr(p, "side")[is.na(p$y)],
             "red", "green")
polygon(p, col=gc)
sl$plot(lty=2, col="grey")
Right now things can get a bit messy when a vertex of the polygon falls directly on the splitting line. I may try to correct that in the future.
A faster, but not very accurate, solution is to split the data frame into a list according to a grouping variable (e.g. above = red and below = blue). This is a pretty nice workaround for rather big datasets (I would say > 100 elements). For smaller chunks some discontinuity may be visible:
x <- 1:100
y1 <- sin(1:100/10)*0.8
y2 <- sin(1:100/10)*1.2
plot(x, y2, type='l')
lines(x, y1, col='red')
df <- data.frame(x=x, y1=y1, y2=y2)
df$pos_neg <- ifelse(df$y2-df$y1>0,1,-1) # above (1) or below (-1) average
# create the number for chunks to be split into lists:
df$chunk <- c(1, cumsum(abs(diff(df$pos_neg)))/2 + 1) # first element needs to be added
df$colors <- ifelse(df$pos_neg>0, "red","blue") # colors to be used for filling the polygons
# create lists to be plotted:
l <- split(df, df$chunk) # we should get 4 sub-lists
lapply(l, function(x) polygon(c(x$x,rev(x$x)),c(x$y2,rev(x$y1)),col=x$colors))
As I said, for smaller datasets some discontinuity may be visible if sharp changes occur between positive and negative areas, but if a horizontal line separates the two, or more elements are plotted, this effect is negligible:
autocurve.edges does an amazing job of curving edges in igraph plots so that they don't overlap when they point in the same direction. However, when they point in opposite directions, no curvature is applied.
library(igraph)
d <- data.frame(start=c("a","a","b","c"), end=c("b","b","c","b"))
graph <- graph.data.frame(d, directed=T)
plot(graph,
     vertex.color="white")
The issue is for the arrows between b and c (or c and b).
Other than specifying curvature manually, any suggestions?
I would use the edge.curved option with the same seq call that autocurve.edges uses.
plot(graph,
     vertex.color="white", edge.curved=seq(-0.5, 0.5, length = ecount(graph)))
EDIT:
As Étienne pointed out, this solution also curves edges for unique observations. The solution is then to modify the autocurve.edges function. This is my modified function, called autocurve.edges2. Basically, it generates a curvature vector that leaves unique, non-mutual edges straight.
autocurve.edges2 <- function(graph, start = 0.5) {
  cm <- count.multiple(graph)
  mut <- is.mutual(graph)  # are connections mutual?
  el <- apply(get.edgelist(graph, names = FALSE), 1, paste,
              collapse = ":")
  ord <- order(el)
  res <- numeric(length(ord))
  p <- 1
  while (p <= length(res)) {
    m <- cm[ord[p]]
    mut.obs <- mut[ord[p]]  # are the connections mutual for this edge?
    idx <- p:(p + m - 1)
    if (m == 1 & mut.obs == FALSE) {  # no multiple or mutual connection = no curve
      r <- 0
    } else {
      r <- seq(-start, start, length = m)
    }
    res[ord[idx]] <- r
    p <- p + m
  }
  res
}
And here's the result when adding a single, non-mutual edge (C->D):
library(igraph)
d <- data.frame(start=c("a","a","b","c","c"),end=c("b","b","c","b","d"))
graph <- graph.data.frame(d, directed=T)
curves <-autocurve.edges2(graph)
plot(graph, vertex.color="white", edge.curved=curves)