I am working with Voronoi tessellations. I have different polygons representing regions in the tessellations.
The points below are used to draw the tessellation in the figure.
[,1] [,2]
1 -0.4960583 -0.3529047
2 -2.4986929 0.8897895
3 3.6514561 -1.3533369
4 -1.7263101 -5.5341202
5 2.2140143 0.3883696
6 -2.5208933 -1.4881461
7 -3.2556913 4.4535629
8 0.6423109 -2.8350062
9 -0.4160715 1.2676151
10 4.4059361 4.5641771
I use tessdata as input to draw the tessellation:
dd<-deldir(tessdata[,1], tessdata[,2])
The Sammon coordinates are below:
[,1] [,2]
1 3.14162704 -1.45728604
2 2.35422623 2.46437927
3 -0.85051049 2.71503294
4 1.94310458 -0.45936958
5 0.08737757 3.74324701
6 1.23007799 1.34443842
7 0.01571924 2.19322032
8 1.43320754 2.64818631
9 -0.05463431 0.66980876
10 1.51344967 5.03351176
I want to construct the tessellation whose input points are the Sammon coordinates. This tessellation should lie within one of the regions in the figure: either the points above should be rescaled to fit, or the plot of the tessellation should be restricted to one of the regions of the figure.
I hope I have covered all the necessary data.
Sammon's projection is in the "MASS" package.
Voronoi tessellations come from the "deldir" package.
The dirsgs component of the deldir output gives the coordinates of the endpoints of the segments forming the tessellation.
The segments function of the graphics package can be used to join the two endpoints extracted from dirsgs.
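As a minimal sketch of the workflow described above (the data are hypothetical random points, not the poster's), the following computes a Sammon projection with MASS::sammon, builds the tessellation with deldir, and draws each Voronoi edge from the x1, y1, x2, y2 columns of dirsgs with segments:

```r
library(MASS)    # sammon()
library(deldir)  # deldir()

set.seed(42)
# Hypothetical input: 10 observations in 5 dimensions
high_dim <- matrix(rnorm(50), nrow = 10)

# Sammon's non-linear mapping down to 2 dimensions
sam <- sammon(dist(high_dim), k = 2, trace = FALSE)
pts <- sam$points

# Delaunay triangulation and Dirichlet (Voronoi) tessellation
dd <- deldir(pts[, 1], pts[, 2])

# dirsgs has one row per Voronoi edge, with endpoints (x1, y1) and (x2, y2)
plot(pts, pch = 16, asp = 1, xlab = "", ylab = "")
segments(dd$dirsgs$x1, dd$dirsgs$y1, dd$dirsgs$x2, dd$dirsgs$y2)
```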
If you want to restrict the second set of points
to one of the tiles of the tessellation,
you can use tile.list to get a description of each tile,
and then check which points are in this tile
(there are many functions to do so:
in the following example, I use secr::pointsInPolygon).
library(deldir)  # deldir(), tile.list()
library(secr)    # pointsInPolygon()
# Sample data
x <- matrix( rnorm(20), nc = 2 )
y <- matrix( rnorm(1000), nc=2 )
# Tessellation
d <- deldir(x[,1], x[,2])
plot(d, wlines="tess")
# Pick a cell at random
cell <- sample( tile.list(d), 1 )[[1]]
points( cell$pt[1], cell$pt[2], pch=16 )
polygon( cell$x, cell$y, lwd=3 )
# Select the points inside that cell
i <- pointsInPolygon( y, cbind( cell$x, cell$y ) )
points(y[!i,], pch=".")
points(y[i,], pch="+")
# Compute a tessellation of those points
dd <- deldir(y[i,1], y[i,2])
plot(dd, wlines="tess", add=TRUE)
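If secr is not available, the pointsInPolygon call can be swapped for a small base-R test. The sketch below uses the standard even-odd (ray-casting) rule; point_in_poly is a hypothetical helper, not part of any package, and points lying exactly on an edge may be classified either way:

```r
# Hypothetical helper: even-odd (ray-casting) point-in-polygon test.
# px, py: coordinates of the points to test (vectors)
# vx, vy: polygon vertices, in order, without repeating the first one
point_in_poly <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n
  for (i in seq_len(n)) {
    # Does the horizontal ray to the right of (px, py) cross edge j -> i?
    crosses <- ((vy[i] > py) != (vy[j] > py)) &
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    inside <- xor(inside, crosses)  # each crossing flips inside/outside
    j <- i
  }
  inside
}

# Unit square: the first point is inside, the second is not
sq_x <- c(0, 1, 1, 0)
sq_y <- c(0, 0, 1, 1)
point_in_poly(c(0.5, 1.5), c(0.5, 0.5), sq_x, sq_y)  # TRUE FALSE
```

In the example above it would be called as point_in_poly(y[,1], y[,2], cell$x, cell$y).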
If, instead, you want to translate and rescale the points
to fit them into the tile, that is trickier.
We need to somehow estimate how far away from the tile the points are:
to this end, let us define a few auxiliary functions to compute,
first the distance from a point to a segment,
then the distance from a point to a polygon.
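distance_to_segment <- function(M, A, B) {
  norm <- function(u) sqrt(sum(u^2))
  # The projection of M on the line (AB) is A + lambda * (B-A)
  lambda <- sum( (B-A) * (M-A) ) / norm(B-A)^2
  if( lambda <= 0 ) {
    norm(M-A)          # closest point of the segment is A
  } else if( lambda >= 1 ) {
    norm(M-B)          # closest point of the segment is B
  } else {
    N <- A + lambda * (B-A)
    norm(M-N)          # closest point is the projection N
  }
}
# Check: plot the distance to the segment [A,B]
A <- c(-.5,0)
B <- c(.5,.5)
x <- seq(-1,1,length=100)
y <- seq(-1,1,length=100)
z <- apply(
  expand.grid(x, y), 1,
  function(u) distance_to_segment( u, A, B )
)
image(x, y, matrix(z,nr=length(x)))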
distance_to_polygon <- function(x, poly) {
  closed_polygon <- rbind(poly, poly[1,])
  # Points inside the polygon are at distance 0
  if( pointsInPolygon( t(x), closed_polygon ) )
    return(0)
  # Otherwise, the distance to the nearest edge
  d <- rep(Inf, nrow(poly))
  for(i in 1:nrow(poly)) {
    A <- closed_polygon[i,]
    B <- closed_polygon[i+1,]
    d[i] <- distance_to_segment(x,A,B)
  }
  min(d)
}
# Check: plot the distance to a random convex polygon
x <- matrix(rnorm(20),nc=2)
poly <- x[chull(x),]
x <- seq(-5,5,length=100)
y <- seq(-5,5,length=100)
z <- apply(
  expand.grid(x, y), 1,
  function(u) distance_to_polygon( u, poly )
)
image(x, y, matrix(z,nr=length(x)))
polygon(poly, lwd=3)
We can now look for a transformation of the form
x --> lambda * x + a
y --> lambda * y + b
that minimizes the (sum of the squared) distances to the polygon.
That is actually not sufficient: we are likely to end up with scaling factor
lambda equal to (or close to) zero.
To avoid this, we can add a penalty if lambda is small.
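Written out, the objective minimized by f below is the following (a restatement of the code, with d(z, P) the distance from a point to the polygon P, \bar y the centroid of the points, and the softplus map keeping \lambda > 0):

```latex
\lambda = \log\left(1 + e^{p_1}\right), \qquad
a = (p_2, p_3), \qquad
z_i = \lambda\,(y_i - \bar y) + a

F(p) =
\begin{cases}
  \displaystyle\sum_i d(z_i, P)^2 & \text{if some } z_i \text{ lies outside } P,\\[4pt]
  -\lambda^2 & \text{if every } z_i \text{ lies inside } P.
\end{cases}
```

Once all the points fit, the only way to decrease F is to increase \lambda, which is what pushes the points out towards the edges of the tile.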
# Sample data
x <- matrix(rnorm(20),nc=2)
x <- x[chull(x),]
y <- matrix( c(1,2) + 5*rnorm(20), nc=2 )
plot(y, axes=FALSE, xlab="", ylab="")
# Function to minimize:
# either the sum of the squares of the distances to the polygon,
# if at least one point is outside,
# or minus the square of the scaling factor.
# It is not continuous, but (surprisingly) that does not seem to be a problem.
f <- function( p ) {
  lambda <- log( 1 + exp(p[1]) )   # softplus: keeps the scaling factor positive
  a <- p[2:3]
  y0 <- colMeans(y)
  transformed_points <- t( lambda * (t(y)-y0) + a )
  distances <- apply(
    transformed_points, 1,
    function(u) distance_to_polygon(u, x)
  )
  if( all(distances == 0) ) - lambda^2
  else sum( distances^2 )
}
# Minimize this function
p <- optim(c(1,0,0), f)$par
# Compute the optimal parameters
lambda <- log( 1 + exp(p[1]) )
a <- p[2:3]
y0 <- colMeans(y)
# Compute the new coordinates
transformed_points <- t( lambda * (t(y)-y0) + a )
# Plot them
segments( y[,1], y[,2], transformed_points[,1], transformed_points[,2], lty=3 )
points( transformed_points, pch=3 )
plot(
  deldir( transformed_points[,1], transformed_points[,2] ),
  wlines="tess", add=TRUE
)
Consider the Togliatti implicit surface. I want to clip it to the ball centered at the origin with radius 4.8. One solution, with the misc3d package, is to use the mask argument of the computeContour3d function, which keeps only the points satisfying x^2+y^2+z^2 < 4.8^2:
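library(misc3d)  # computeContour3d(), makeTriangles(), drawScene.rgl()
# Togliatti surface equation: f(x,y,z) = 0
f <- function(x,y,z){
  w <- 1
  64*(x-w)*
    (x^4-4*x^3*w-10*x^2*y^2-4*x^2*w^2+16*x*w^3-20*x*y^2*w+5*y^4+16*w^4-20*y^2*w^2) -
    5*sqrt(5-sqrt(5))*(2*z-sqrt(5-sqrt(5))*w)*(4*(x^2+y^2-z^2)+(1+3*sqrt(5))*w^2)^2
}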
# make grid
nx <- 220; ny <- 220; nz <- 220
x <- seq(-5, 5, length=nx)
y <- seq(-5, 5, length=ny)
z <- seq(-4, 4, length=nz)
g <- expand.grid(x=x, y=y, z=z)
# calculate voxel
voxel <- array(with(g, f(x,y,z)), dim = c(nx,ny,nz))
# mask: keep points satisfying x^2+y^2+z^2 < 4.8^2, in order to
# clip the surface to the ball of radius 4.8
mask <- array(with(g, x^2+y^2+z^2 < 4.8^2), dim = c(nx,ny,nz))
# compute isosurface
surf <- computeContour3d(voxel, maxvol=max(voxel), level=0, mask=mask, x=x, y=y, z=z)
# draw isosurface
drawScene.rgl(makeTriangles(surf, smooth=TRUE))
But the borders of the resulting surface are irregular:
How to get regular, smooth borders?
The solution I found resorts to spherical coordinates. It consists in defining the function f in terms of spherical coordinates (ρ, θ, ϕ), then to compute the isosurface with ρ running from 0 to the desired radius, and then to transform the result to Cartesian coordinates:
# Togliatti surface equation with spherical coordinates
f <- function(ρ, θ, ϕ){
  w <- 1
  x <- ρ*cos(θ)*sin(ϕ)
  y <- ρ*sin(θ)*sin(ϕ)
  z <- ρ*cos(ϕ)
  64*(x-w)*
    (x^4-4*x^3*w-10*x^2*y^2-4*x^2*w^2+16*x*w^3-20*x*y^2*w+5*y^4+16*w^4-20*y^2*w^2) -
    5*sqrt(5-sqrt(5))*(2*z-sqrt(5-sqrt(5))*w)*(4*(x^2+y^2-z^2)+(1+3*sqrt(5))*w^2)^2
}
# make grid
nρ <- 300; nθ <- 400; nϕ <- 300
ρ <- seq(0, 4.8, length = nρ) # ρ runs from 0 to the desired radius
θ <- seq(0, 2*pi, length = nθ)
ϕ <- seq(0, pi, length = nϕ)
g <- expand.grid(ρ=ρ, θ=θ, ϕ=ϕ)
# calculate voxel
voxel <- array(with(g, f(ρ,θ,ϕ)), dim = c(nρ,nθ,nϕ))
# calculate isosurface
surf <- computeContour3d(voxel, maxvol=max(voxel), level=0, x=ρ, y=θ, z=ϕ)
# transform to Cartesian coordinates
surf <- t(apply(surf, 1, function(rtp){
  ρ <- rtp[1]; θ <- rtp[2]; ϕ <- rtp[3]
  c(
    ρ*cos(θ)*sin(ϕ),
    ρ*sin(θ)*sin(ϕ),
    ρ*cos(ϕ)
  )
}))
# draw isosurface
drawScene.rgl(makeTriangles(surf, smooth=TRUE, color = "violetred"))
Now the resulting surface has regular, smooth borders:
Your solution is excellent for the problem you stated, because spherical coordinates are so natural for that boundary. However, here is a more general solution that would work for other smooth boundaries.
The idea is to allow input of a boundary function, and to cull points where its value is too large or too small. In your case it would be the squared distance from the origin, and you would want to cull points where the value is bigger than 4.8^2. But sometimes the triangles being drawn to make the smooth surface should only be partially culled: one point would be kept and two deleted, or two kept and one deleted. Culling the whole triangle is what leads to the jagged edges in your original plot.
To fix this, the points can be modified. If only one is supposed to be kept, then the other two points can be shrunk towards it until they lie on an approximation to the boundary. If two are supposed to be kept you want the shape to be a quadrilateral, so you would build that out of two triangles.
This function does that, assuming the input surf is the output of computeContour3d:
boundSurface <- function(surf, boundFn, bound = 0, greater = TRUE) {
  # Surf is n x 3: each row is a point, triplets are triangles
  values <- matrix(boundFn(surf) - bound, 3)
  # values is 3 x m (m = n/3): each column holds the boundFn values
  # at the three vertices of one triangle
  if (!greater)
    values <- -values
  keep <- values >= 0
  # counts is m vector counting number of points to keep in each triangle
  counts <- apply(keep, 2, sum)
  # result is initialized to an empty array
  result <- matrix(nrow = 0, ncol = 3)
  # singles is set to all the rows of surf where exactly one
  # point in the triangle is kept, say s x 3
  singles <- surf[rep(counts == 1, each = 3),]
  if (length(singles)) {
    # singleValues is a subset of values where only one vertex is kept
    # (drop = FALSE keeps it a matrix even if there is only one triangle)
    singleValues <- values[, counts == 1, drop = FALSE]
    singleIndex <- 3*col(singleValues) + 1:3 - 3
    # good is the index of the vertex to keep, bad are those to fix
    good <- apply(singleValues, 2, function(col) which(col >= 0))
    bad <- apply(singleValues, 2, function(col) which(col < 0))
    for (j in 1:ncol(singleValues)) {
      goodval <- singleValues[good[j], j]
      for (i in 1:2) {
        # Move each culled vertex along its edge to where the linear
        # interpolation of the boundary value reaches 0
        badval <- singleValues[bad[i,j], j]
        alpha <- goodval/(goodval - badval)
        singles[singleIndex[bad[i,j], j], ] <-
          (1-alpha)*singles[singleIndex[good[j], j],] +
          alpha *singles[singleIndex[bad[i,j], j],]
      }
    }
    result <- rbind(result, singles)
  }
  doubles <- surf[rep(counts == 2, each = 3),]
  if (length(doubles)) {
    # doubleValues is a subset of values where two vertices are kept
    doubleValues <- values[, counts == 2, drop = FALSE]
    doubleIndex <- 3*col(doubleValues) + 1:3 - 3
    doubles2 <- doubles
    # good are the indices of the vertices to keep, bad is the one to fix
    good <- apply(doubleValues, 2, function(col) which(col >= 0))
    bad <- apply(doubleValues, 2, function(col) which(col < 0))
    newvert <- matrix(NA, 2, 3)
    for (j in 1:ncol(doubleValues)) {
      badval <- doubleValues[bad[j], j]
      for (i in 1:2) {
        goodval <- doubleValues[good[i,j], j]
        alpha <- goodval/(goodval - badval)
        newvert[i,] <-
          (1-alpha)*doubles[doubleIndex[good[i,j], j],] +
          alpha *doubles[doubleIndex[bad[j], j],]
      }
      # The kept part is a quadrilateral: split it into two triangles
      doubles[doubleIndex[bad[j], j],] <- newvert[1,]
      doubles2[doubleIndex[good[1,j], j],] <- newvert[1,]
      doubles2[doubleIndex[bad[j], j],] <- newvert[2,]
    }
    result <- rbind(result, doubles, doubles2)
  }
  # Finally add all the rows of surf where the whole
  # triangle is kept
  rbind(result, surf[rep(counts == 3, each = 3),])
}
You would use it after computeContour3d and before makeTriangles, e.g.
fn <- function(x) {
  apply(x^2, 1, sum)   # squared distance from the origin
}
drawScene.rgl(makeTriangles(boundSurface(surf, fn, bound = 4.8^2,
greater = FALSE),
smooth = TRUE))
Here's the output I see:
It's not quite as good as yours, but it would work for many different boundary functions.
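The vertex-moving step in boundSurface (alpha <- goodval/(goodval - badval)) is linear interpolation of the boundary function along an edge: if the kept vertex has value g >= 0 and the culled one has b < 0, the linear interpolant is 0 at the fraction alpha = g/(g - b) of the way from the kept vertex. A standalone sketch of that single step follows; edge_crossing is a hypothetical helper name. Because the interpolation is linear, the new vertex only approximates a curved boundary such as a sphere, and the approximation improves as the mesh edges get shorter:

```r
# Hypothetical helper: move along the edge P -> Q to where the linearly
# interpolated boundary value reaches 0.
# gval: boundary value at the kept vertex P (>= 0)
# bval: boundary value at the culled vertex Q (< 0)
edge_crossing <- function(P, Q, gval, bval) {
  alpha <- gval / (gval - bval)
  (1 - alpha) * P + alpha * Q
}

# Clipping to the ball of radius 1, as boundSurface(..., greater = FALSE)
# does: the value is -(squared distance - bound), so kept points are >= 0
fn <- function(v) sum(v^2)
P <- c(0, 0, 0)
Q <- c(2, 0, 0)
gval <- -(fn(P) - 1)   #  1
bval <- -(fn(Q) - 1)   # -3
edge_crossing(P, Q, gval, bval)   # 0.5 0.0 0.0
```

Here the true crossing of the unit sphere is at x = 1, but with an edge this long the linear approximation lands at x = 0.5.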
Edited to add: Version 0.100.26 of rgl now has a function clipMesh3d which incorporates these ideas.
I have a curve whose y-values are produced by my small R function below (neatly annotated). If you run my entire R code, you will see my curve (but remember, it's a function, so if I changed the argument values I could get a different curve):
Obviously, many intervals cover 95% of the total area under this curve. But using optimize(), how can I find the SHORTEST of these 95% intervals (in x-value units)? What would then be the x-values of the two ends of this shortest 95% interval?
Note: The idea of a shortest interval makes sense for a unimodal curve like mine. The shortest interval tends to lie toward the middle, where the height (y-value) is larger, so the interval does not need to be as wide to cover 95% of the total area under the curve.
Here is my R code (please run the entire code):
ppp <- function(f, N, df1, df2, petasq, alpha, beta) {
  pp <- function(petasq) dbeta(petasq, alpha, beta)
  ll <- function(petasq) df(f, df1, df2, (petasq * N) / (1 - petasq) )
  marg <- integrate(function(x) pp(x)*ll(x), 0, 1)[[1]]
  po <- function(x) pp(x)*ll(x) / marg
  return(po(petasq) )
}
# Now I use my function above to get the y-values for my plot:
petasq <- seq(0, 1, by = .0001) ## These are X-values for my plot
f <- 30 # a function needed argument
df1 <- 3 # a function needed argument
df2 <- 108 # a function needed argument
N <- 120 # a function needed argument
alpha = 5 # a function needed argument
beta = 4 # a function needed argument
## Now use the ppp() function to get the Y-values for the X-value range above:
y.values <- ppp(f, N, df1, df2, petasq, alpha, beta)
## Finally plot petasq (as X-values) against the Y.values:
plot(petasq, y.values, ty="l", lwd = 3 )
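The note about the middle of the curve can be made precise. For a density p and a fixed coverage of 95%, minimizing the width subject to the area constraint (a standard Lagrange-multiplier argument, sketched here) gives the condition that the curve has the same height at both ends:

```latex
\min_{a<b}\; (b - a)
\quad\text{subject to}\quad
\int_a^b p(x)\,dx = 0.95

\mathcal{L}(a,b,\mu) = (b-a) - \mu\left(\int_a^b p(x)\,dx - 0.95\right)

\frac{\partial\mathcal{L}}{\partial a} = -1 + \mu\,p(a) = 0,
\qquad
\frac{\partial\mathcal{L}}{\partial b} = 1 - \mu\,p(b) = 0
\;\;\Longrightarrow\;\; p(a) = p(b) = \frac{1}{\mu}
```

For a unimodal curve this means the shortest 95% interval is the region where the curve lies above some horizontal line.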
Based on your revised question, here is an optimization that finds the SHORTEST distance (in x-value units) between the LEFT and RIGHT boundaries:
ppp <- function(petasq, f, N, df1, df2, alpha, beta) {
  pp <- function(petasq) dbeta(petasq, alpha, beta)
  ll <- function(petasq) df(f, df1, df2, (petasq * N) / (1 - petasq) )
  marg <- integrate(function(x) pp(x)*ll(x), 0, 1)[[1]]
  po <- function(x) pp(x)*ll(x) / marg
  return(po(petasq) )
}
petasq <- seq(0, 1, by = .0001) ## These are X-values for my plot
f <- 30 # a function needed argument
df1 <- 3 # a function needed argument
df2 <- 108 # a function needed argument
N <- 120 # a function needed argument
alpha = 5 # a function needed argument
beta = 4 # a function needed argument
optim_func <- function(x_left) {
  int_function <- function(petasq) {
    ppp(petasq, f=f, N=N, df1=df1, df2=df2, alpha=alpha, beta=beta)
  }
  # For every LEFT value, find the corresponding RIGHT value that gives 95% area.
  find_95_right <- function(x_right) {
    (0.95 - integrate(int_function, lower=x_left, upper=x_right, subdivisions = 10000)$value)^2
  }
  x_right_obj <- optimize(f=find_95_right, interval=c(0.5,1))
  # Penalize LEFT values for which no RIGHT value reaches 95% area
  if(x_right_obj$objective > .Machine$double.eps^0.25) return(100)
  return(x_right_obj$minimum - x_left)
}
x_left <- optimize(f=optim_func, interval=c(0.30,0.40))$minimum
# Recompute the RIGHT boundary for the optimal LEFT boundary
find_95_right <- function(x_right) {
  (0.95 - integrate(int_function, lower=x_left, upper=x_right, subdivisions = 10000)$value)^2
}
int_function <- function(petasq) {
  ppp(petasq, f=f, N=N, df1=df1, df2=df2, alpha=alpha, beta=beta)
}
x_right <- optimize(f=find_95_right, interval=c(0.5,1))$minimum
See the comments in the code. Hopefully this finally satisfies your question :) Results:
> x_right
[1] 0.5409488
> x_left
[1] 0.3201584
Also, you can plot the distance between LEFT and RIGHT as a function of the left boundary:
left_x_values <- seq(0.30, 0.335, 0.0001)
DISTANCE <- sapply(left_x_values, optim_func)
plot(left_x_values, DISTANCE, type="l")
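When the distribution has a quantile function, the same search collapses to a single optimize() call over the lower tail probability p: the interval [Q(p), Q(p + 0.95)] always covers 95%, so only its width has to be minimized. The posterior in the question has no closed-form quantile function, so this sketch uses the Beta(5, 4) prior from the question purely as a stand-in; shortest_interval is a hypothetical helper name:

```r
# Shortest interval with the given coverage, for a unimodal distribution
# with quantile function qfun (extra arguments are passed on through ...)
shortest_interval <- function(qfun, level = 0.95, ...) {
  # Width of the interval that starts at lower tail probability p
  width <- function(p) qfun(p + level, ...) - qfun(p, ...)
  p_low <- optimize(width, interval = c(0, 1 - level), tol = 1e-10)$minimum
  c(lower = qfun(p_low, ...), upper = qfun(p_low + level, ...))
}

iv <- shortest_interval(qbeta, level = 0.95, shape1 = 5, shape2 = 4)
iv
# Coverage is 95% by construction; at the optimum the density should be
# (numerically) equal at both ends
pbeta(iv[["upper"]], 5, 4) - pbeta(iv[["lower"]], 5, 4)
dbeta(iv, 5, 4)
```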
If we think of this as finding the narrowest interval that contains the target area, we can start by calculating the area of each of the small strips under the plotted curve. We then find the strip with the largest area (which presumably will be near the center) and walk outward, each time adding the larger of the two neighboring strips, until we reach the area we are looking for.
Since you've already calculated the x and y values for the plot, I'll reuse those to save some calculations. Here's an implementation of that algorithm:
pseduoarea <- function(x, y, target=.95) {
  dx <- diff(x)
  # Trapezoid area of each strip between consecutive x values
  areas <- dx * .5 * (head(y,-1) + tail(y, -1))
  peak <- which.max(areas)
  range <- c(peak, peak)
  found <- areas[peak]
  # Grow the interval towards whichever neighboring strip is larger
  while(found < target) {
    if(areas[range[1]-1] > areas[range[2]+1]) {
      range[1] <- range[1]-1
      found <- found + areas[range[1]]
    } else {
      range[2] <- range[2]+1
      found <- found + areas[range[2]]
    }
  }
  # Strip k spans x[k] to x[k+1]
  val <- x[c(range[1], range[2]+1)]
  attr(val, "indexes")<-range
  attr(val, "area")<-found
  val
}
And we call it with
pseduoarea(petasq, y.values)
# ends near 0.32 and 0.54, in line with the other answers
This does assume that all the values in petasq are equally spaced.
I don't think you need to use optimize (unless this is part of an unadmitted homework assignment). Instead, just normalize a cumulative sum and find the points where your criteria are met:
> cusm.y <- cumsum(y.values) / sum(y.values, na.rm = TRUE)
> which(cusm.y >= 0.025)[1]
[1] 3163
> which(cusm.y >= 0.975)[1]
[1] 5375
You can check that these are reasonable indices for pulling values from the petasq vector with:
abline(v = petasq[c(which(cusm.y >= 0.025)[1], which(cusm.y >= 0.975)[1])])
This is admittedly equivalent to integrating the "density" function with a normalizing constant over its whole domain. Because the x intervals all have the same width, the differencing of the x vector can be left out of the height-times-base calculation.
I suppose another interpretation is possible: find how many of the values of y.values, sorted in descending order, are needed to sum to 95% of the total sum. This gives a different strategy, and the plot shows where the corresponding horizontal line intersects the curve:
which( cumsum( sort( y.values, decreasing=TRUE) ) > 0.95* sum(y.values, na.rm=TRUE) )[1]
#[1] 2208
sort( y.values, decreasing=TRUE)[2208]
#[1] 1.059978
plot(petasq, y.values, ty="l", lwd = 3 )
abline( h=sort( y.values, decreasing=TRUE)[2208], col="blue")
To get the petasq values, you need the positions where y.values first rises above that level and where it drops back below it. Here these are the positions of the 2208th and 2209th largest values:
order(y.values, decreasing=TRUE)[2208]
#[1] 3202
order(y.values, decreasing=TRUE)[2209]
#[1] 5410
And then the plot would look like:
plot(petasq, y.values, ty="l", lwd = 3 )
abline( v= petasq[ c(3202, 5410)], col="blue", lty=3, lwd=2)
The area between the two dotted blue lines is 95% of the total area above the zero line:
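The sort-and-accumulate idea can be wrapped in a small function that works on any evenly spaced grid of x values and (unnormalized) heights. hpd_grid is a hypothetical name; the sketch drops non-finite heights (the ppp() curve can produce a NaN at petasq = 1, where the noncentrality parameter becomes infinite) and assumes a unimodal curve, so that the kept grid points form a single interval:

```r
hpd_grid <- function(x, y, level = 0.95) {
  ok <- is.finite(y)
  x <- x[ok]
  y <- y[ok]
  ord <- order(y, decreasing = TRUE)               # tallest grid points first
  k <- which(cumsum(y[ord]) >= level * sum(y))[1]  # how many are needed
  range(x[ord[seq_len(k)]])                        # ends of the kept region
}

# Check on a standard normal, where the ends should be close to -1.96 and 1.96
x <- seq(-5, 5, by = 0.001)
hpd_grid(x, dnorm(x))
```

With the question's objects it would be called as hpd_grid(petasq, y.values).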
I have a matrix containing positions (X, Y, elevation). I add a column to the matrix that I call "index". I create an X and a Y vector from the matrix; both include the index column. I then sort the X and Y vectors in ascending order. I then construct a Z matrix containing the elevation and relate it to the position using the index. When I try to use the command contour (I want to plot a contour graph), I get an error saying that X and Y should be in ascending order... which I just did! What did I do wrong?
Matrice_Noeud<-matrix(ncol = ncol(noeud), nrow=nrow(noeud))
for (i in 1:nrow(noeud)) {
for (j in 1:ncol(noeud)) {
Matrice_Noeud <- cbind(Matrice_Noeud, c(seq(1,nrow(noeud),1)))
Z<-matrix(NA, ncol=nrow(noeud),nrow=nrow(noeud))
for (x_i in 1:nrow(noeud)) {
for (y_i in 1:nrow(noeud)) {
if (Y$Index[y_i]==X$Index[x_i]) {
OK, since I'd started doing it, I've done it.
#### making example data
## assumptions: length(unique(x))=19, length(unique(y))=12, nrow(data)=121
## (They mean the number of grid points is 19 * 12 = 228, but z.value is only 121.)
xyz.f <- function(m, n) - m + (n - 7)^2 + 16 # make z from x and y (it means nothing special)
xyz <- cbind( xyz <- expand.grid(x = round(seq(11,15,,19), 2), y = round(seq(6,10,,12), 2)),
z = apply(xyz, 1, function(k) xyz.f(k[1], k[2])) )
set.seed(1); ind <- sample(19*12, 121) # decide to use the 121 z of 19*12
noeud <- as.matrix(xyz[ind,]) # example data done
#### making contour()'s arguments
Xx <- sort(unique(noeud[,1]))
Yy <- sort(unique(noeud[,2])) # nrow(noeud); length(Xx); length(Yy) # OK (121, 19, 12)
Zz <- matrix(NA, ncol=length(Yy), nrow=length(Xx)) # make 19 x 12 Z matrix (empty)
# For each row, find the position of its x (y) value in Xx (Yy), i.e. its row (column) in the Z matrix
X0 <- as.numeric( factor( noeud[,1] ) ) # (edit) using Mr. Tufte's code from the R-help mailing list
Y0 <- as.numeric( factor( noeud[,2] ) )
apply(cbind(X0, Y0, noeud[,3]), 1, function (a) Zz[ a[1], a[2] ] <<- a[3])
## contour()'s arguments ( Xx, Yy, Zz ) done
contour(Xx, Yy, Zz, xlab="including NAs") # length(Zz); length(Zz[!is.na(Zz)]) # OK (228,121)
#### interpolating
## I know of only a few packages with interpolation functions.
library(akima) # use cubic spline interpolation methods of H. Akima
NOEUD <- interp(noeud[,1], noeud[,2], noeud[,3])
#### results
par.old <- par(no.readonly=TRUE); par(mfrow=c(1,3), mar=c(4,0,1,0))
contour(Xx, Yy, Zz, xlab="including NAs", yaxt="n") # the including NAs data
contour(NOEUD, xlab="Akima interpolation", yaxt="n") # the Akima interpolation data
contour(Xx, Yy, matrix(xyz[,3], nrow=19), xlab="original", yaxt="n") # the original data
# (edit) I noticed that some of interp()'s arguments make a difference (defaults: linear=TRUE, extrap=FALSE).
contour(interp(noeud[,1], noeud[,2], noeud[,3], linear=TRUE, extrap=FALSE), xlab="Akima interp() default")
contour(interp(noeud[,1], noeud[,2], noeud[,3], linear=FALSE, extrap=FALSE), xlab="interp(linear=FALSE)")
contour(interp(noeud[,1], noeud[,2], noeud[,3], linear=FALSE, extrap=TRUE), xlab="interp(linear=FALSE, extrap=TRUE)")
par(par.old) # restore the graphical parameters
### supplement (using the same data, output is about the same)
noeud2 <- data.frame(x=noeud[,1], y=noeud[,2], z=noeud[,3]) # equal to the including NAs data
NOEUD2 <- cbind(expand.grid(x=NOEUD$x, y=NOEUD$y), z=c(NOEUD$z)) # equal to the Akima interpolation data
ggplot2::ggplot( noeud2, ggplot2::aes( x, y, z = z )) + ggplot2::geom_contour()
lattice::contourplot( z ~ x * y, NOEUD2 )
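The underlying requirement is that contour() wants strictly increasing x and y grid values plus a z matrix whose [i, j] entry belongs to (x[i], y[j]), not three raw coordinate columns. As an alternative to the apply() call with <<- above, that matrix can be filled in one step with two-column matrix indexing. A minimal base-R sketch with a tiny made-up data set:

```r
# Made-up scattered points: (x, y, z) triples on an incomplete grid
pts <- data.frame(x = c(2, 1, 3, 1), y = c(10, 20, 10, 10), z = c(5, 6, 7, 8))

xs <- sort(unique(pts$x))   # strictly increasing grid x values
ys <- sort(unique(pts$y))   # strictly increasing grid y values

# Empty grid; cells with no observation stay NA
Z <- matrix(NA_real_, nrow = length(xs), ncol = length(ys))

# Each row of the index matrix is (row in Z, column in Z)
Z[cbind(match(pts$x, xs), match(pts$y, ys))] <- pts$z
Z
#      [,1] [,2]
# [1,]    8    6
# [2,]    5   NA
# [3,]    7   NA

contour(xs, ys, Z)
```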
This is a fairly specific problem, so some knowledge of R and of Bézier curves is needed to help... (thanks if you do!!)
So I need some help with my R code: I have a series of discretely sampled observations, and I am trying to fit a 5th-order Bézier curve through these points with simple least-squares regression. I have some limitations on the positions of the 6 control points:
A & B have the same Y-axis coordinate
B & C have the same X-axis coordinate
C & D have the same Y-axis coordinate
D & E have the same X-axis coordinate
E & F have the same Y-axis coordinate
A is located at the observation two turning points before the last one
The X-axis coordinate of the last observation is
somewhere between the X-axis coordinates of E and F
Like this image:
Say I have these data:
They have a "curvy" shape, so a Bézier curve should fit. The result of my code is shown in this image: the data are in red, the 5th-order Bézier curve and its control points (with their restrictions) in blue:
So you see that I have some kind of solution, but this is the problem:
The X-axis location of the right-most control point is always to the right of the last input data point, and to get an appropriate fit I had to specify the value of t (t goes from 0 to 1 in a Bézier curve) at which the input data end (the "limit" variable in my code). How do I rewrite the code so that I don't have to do that anymore, and the horizontal spread of the t-values remains constant, also outside of the input data?
(given the restrictions on the control points, and maximizing the fit of the part of the curve that overlaps with the input data)
If you can help, please take a look at this R code. Any help is much, much appreciated, and happy holidays!!
ps: what I call exampledata.csv in my code is just the data above.
getT <- function(x){
  # Calculates length from origin of each point in the path.
  # args:
  # x : a one dimensional vector
  # Returns:
  # out : a vector of distances from the origin, as a percent of end point - start point distance
  out <- cumsum(abs(diff(x)))
  out <- c(0, out/ out[length(out)])
  out
}
cost_f <- function(X,Y,K){
  # Squared-error loss of the fit K %*% X against the observations Y
  pred <-K%*%X
  c <- Y- pred
  out <- list(loss= as.vector(t(c)%*%c), pred = pred)
  out
}
df <- read.csv('exampledata.csv')
T <- nrow(df)
df['d'] = 1:T
# # identify all turning points:
# turn_point <- c(1)
# for(i in 2:(T-1)){
# if( ( (df[i,'x'] < df[i-1,'x']) & (df[i,'x'] < df[i+1,'x'])) | ( (df[i,'x'] > df[i-1,'x']) & (df[i,'x'] > df[i+1,'x'])) ){
# turn_point <- c(turn_point, i)
# }
# }
fit_last_piece <- function(df){
  limit <- .79
  # identify all turning points:
  turn_point <- c(1)
  for(i in 2:(T-1)){
    if( ( (df[i,'x'] < df[i-1,'x']) & (df[i,'x'] < df[i+1,'x'])) | ( (df[i,'x'] > df[i-1,'x']) & (df[i,'x'] > df[i+1,'x'])) ){
      turn_point <- c(turn_point, i)
    }
  }
  nk <- length(turn_point) # number of turning points
  data <- df[turn_point[nk-1]:nrow(df),]
  end_x <- data$d[1]
  end_y <- data$x[1]
  # x-coordinates: A, B = C, D = E, F (4 free values)
  constr_x <- matrix(c(1,0,0,0,0,0, # remember data is input column to column
                       0,1,1,0,0,0,
                       0,0,0,1,1,0,
                       0,0,0,0,0,1),nrow = 6, ncol = 4)
  # y-coordinates: A = B, C = D, E = F (3 free values)
  constr_y <- matrix(c(1,1,0,0,0,0,
                       0,0,1,1,0,0,
                       0,0,0,0,1,1),nrow = 6, ncol = 3)
  # 5th-order Bézier basis in powers of t (t^5, t^4, ..., t, 1)
  M = matrix(c(-1,5,-10,10,-5,1,
               5,-20,30,-20,5,0,
               -10,30,-30,10,0,0,
               10,-20,10,0,0,0,
               -5,5,0,0,0,0,
               1,0,0,0,0,0),nrow = 6, ncol = 6)
  t_x = getT(data$d)*limit
  T_x = cbind(t_x^5, t_x^4 ,t_x^3, t_x^2, t_x,rep(1,length(t_x)))
  in_par <- ( tail(data$d,1)-data$d[1])*c(2/5,4/5,6/5) + data$d[1] # initial values of the intermediate x levels
  res_x <- optim(par = in_par, fn = function(par){cost_f(c(data$d[1], par[1],par[2], par[3]), data$d, T_x%*%M%*%constr_x)$loss})
  #res_x <- optimize(f = function(par){cost_f(c(df$d[1],par,df$d[nrow(df)]), df$d, T_x%*%M%*%constr_x)$loss}, interval = c(df$d[1],df$d[nrow(df)]),tol = .Machine$double.eps^0.25)
  optim_x <- c(data$d[1],res_x$par)
  pred_x <- cost_f(optim_x, data$d, T_x%*%M%*%constr_x)$pred
  t_y = getT(data$x)*limit
  T_y = cbind(t_y^5, t_y^4,t_y^3, t_y^2, t_y,rep(1,length(t_y)))
  in_par <- c()
  res_y <- optim(par = c(data$x[floor(nrow(data)/2)],tail(data$x,1)), fn = function(par){cost_f(c(data$x[1],par[1],par[2]), data$x, T_y%*%M%*%constr_y)$loss})
  optim_y <- c(data$x[1],res_y$par[1],res_y$par[2])
  #pred_y <- cost_f(res_y$par, df$x, T_y%*%M%*%constr_y)$pred
  pred_y <- cost_f(optim_y, data$x, T_y%*%M%*%constr_y)$pred
  t_x_p <- c(t_x,seq(tail(t_x,1),1,length.out = 10))
  T_x_p <- cbind(t_x_p^5, t_x_p^4 ,t_x_p^3, t_x_p^2, t_x_p,rep(1,length(t_x_p)))
  t_y_p <- c(t_y,seq(tail(t_y,1),1,length.out = 10))
  T_y_p <- cbind(t_y_p^5, t_y_p^4 ,t_y_p^3, t_y_p^2, t_y_p,rep(1,length(t_y_p)))
  pred_x <- T_x_p%*%M%*%constr_x%*%optim_x
  pred_y <- T_y_p%*%M%*%constr_y%*%optim_y
  # this part is new:
  plot(pred_x,pred_y, ylim = c(min(c(data$x, pred_y,res_y$par)), max(c(data$x, pred_y,res_y$par))),col="blue",type="b")
  points(data$d,data$x,col = 'red',type="b")
}
fit_last_piece(df)
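The matrix M in the code is meant to be the 5th-order Bézier basis written in powers of t: the row vector (t^5, t^4, t^3, t^2, t, 1) times M gives the six Bernstein weights choose(5, k) (1 - t)^(5 - k) t^k. The sketch below writes out all six columns of M and checks that identity against the Bernstein form directly, using made-up control coordinates, then shows how a constraint matrix expands the free coordinates into all six:

```r
# 5th-order Bezier basis in the power basis (t^5, t^4, t^3, t^2, t, 1);
# column k+1 holds the coefficients of choose(5, k) * (1 - t)^(5 - k) * t^k
M <- matrix(c( -1,   5, -10,  10, -5, 1,
                5, -20,  30, -20,  5, 0,
              -10,  30, -30,  10,  0, 0,
               10, -20,  10,   0,  0, 0,
               -5,   5,   0,   0,  0, 0,
                1,   0,   0,   0,  0, 0), nrow = 6, ncol = 6)

tt <- seq(0, 1, length.out = 11)
T_mat <- cbind(tt^5, tt^4, tt^3, tt^2, tt, 1)

# Made-up control-point coordinates along one axis
P <- c(0, 1, 1, 3, 3, 4)

power_form <- as.vector(T_mat %*% M %*% P)
bernstein_form <- sapply(tt, function(s)
  sum(choose(5, 0:5) * (1 - s)^(5:0) * s^(0:5) * P))
all.equal(power_form, bernstein_form)   # TRUE

# Free x-values (x_A, x_B = x_C, x_D = x_E, x_F) expanded to all six
constr_x <- cbind(c(1, 0, 0, 0, 0, 0), c(0, 1, 1, 0, 0, 0),
                  c(0, 0, 0, 1, 1, 0), c(0, 0, 0, 0, 0, 1))
constr_x %*% c(0, 1, 3, 4)   # the column 0 1 1 3 3 4, i.e. P
```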
I want to colour the area under a curve. The area with y > 0 should be red, the area with y < 0 should be green.
x <- c(1:4)
y <- c(0,1,-1,2,rep(0,4))
Using ifelse() does not work:
What I achieved so far is the following:
But then the red area is too large. Do you have any ideas how to get the desired result?
If you want two different colors, you need two different polygons. You can either call polygon multiple times, or you can add NA values in your x and y vectors to indicate a new polygon. R will not automatically calculate the intersection for you. You must do that yourself. Here's how you could draw that with different colors.
x <- c(1,2,2.5,NA,2.5,3,4)
y <- c(0,1,0,NA,0,-1,0)
#calculate color based on most extreme y value
g <- cumsum(is.na(x))
gc <- ifelse(tapply(y, g,
  function(x) x[which.max(abs(x))])>0,
  "red", "green")
plot(c(1, 4),c(-1,1), type = "n")
polygon(x, y, col = gc)
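For a curve given as points, the intersections with y = 0 can be computed by linear interpolation instead of being typed in by hand. The sketch below inserts a point wherever two consecutive y values have strictly opposite signs, then fills the positive part (clipped with pmax) in red and the negative part (clipped with pmin) in green. add_zero_crossings is a hypothetical helper, and the x/y values are made-up example data:

```r
# Hypothetical helper: add the points where a piecewise-linear curve
# crosses y = 0, so that every segment lies entirely on one side of the axis
add_zero_crossings <- function(x, y) {
  out_x <- x[1]
  out_y <- y[1]
  for (i in seq_len(length(x) - 1)) {
    if (y[i] * y[i + 1] < 0) {  # strict sign change inside segment i
      xc <- x[i] - y[i] * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
      out_x <- c(out_x, xc)
      out_y <- c(out_y, 0)
    }
    out_x <- c(out_x, x[i + 1])
    out_y <- c(out_y, y[i + 1])
  }
  list(x = out_x, y = out_y)
}

# Made-up example data
x <- 1:4
y <- c(0, 1, -1, 2)
cr <- add_zero_crossings(x, y)
cr$x   # 1.000000 2.000000 2.500000 3.000000 3.333333 4.000000

n <- length(cr$x)
plot(x, y, type = "l")
abline(h = 0, lty = 2)
# Area above 0: the curve clipped from below, closed along the axis
polygon(c(cr$x, rev(cr$x)), c(pmax(cr$y, 0), rep(0, n)), col = "red")
# Area below 0: the curve clipped from above
polygon(c(cr$x, rev(cr$x)), c(pmin(cr$y, 0), rep(0, n)), col = "green")
```

Because every segment now stays on one side of the axis, pmax and pmin clip each segment exactly rather than approximately.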
In the more general case, it might not be as easy to split a polygon into different regions. There seems to be some support for this type of operation in GIS packages, where this type of thing is more common. However, I've put together a somewhat general case that may work for simple polygons.
First, I define a closure that will define a cutting line. The function will take a slope and y-intercept for a line and will return the functions we need to cut a polygon.
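getSplitLine <- function(m=1, b=0) {
  force(m); force(b)
  # TRUE for points on or above the line y = m*x + b
  classify <- function(x,y) {
    y >= m*x + b
  }
  # Intersections of the polygon edges with the line, wherever the side changes
  intercepts <- function(x,y, class=classify(x,y)) {
    w <- which(diff(class)!=0)
    m2 <- (y[w+1]-y[w])/(x[w+1]-x[w])
    b2 <- y[w] - m2*x[w]
    ix <- (b2-b)/(m-m2)
    iy <- ix*m + b
    data.frame(x=ix,y=iy,idx=w+.5, dir=((rank(ix, ties.method="first")+1) %/% 2) %% 2 +1)
  }
  # Draw the splitting line
  plot <- function(...) {
    abline(b, m, ...)
  }
  list(classify=classify, intercepts=intercepts, plot=plot)
}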
Now we will define a function to actually split a polygon using the splitter we've just defined.
splitPolygon <- function(x, y, splitter) {
addnullrow <- function(x) if (!all(is.na(x[nrow(x),]))) rbind(x, NA) else x
rollup <- function(x,i=1) rbind(x[(i+1):nrow(x),], x[1:i,])
idx <- cumsum(is.na(x) | is.na(y))
polys <- split(data.frame(x=x,y=y)[!is.na(x),], idx[!is.na(x)])
r <- lapply(polys, function(P) {
x <- P$x; y<-P$y
side <- splitter$classify(x, y)
if(side[1] != side[length(side)]) {
ints <- splitter$intercepts(c(x,x[1]), c(y, y[1]), c(side, side[1]))
} else {
ints <- splitter$intercepts(x, y, side)
sideps <- lapply(unique(side), function(ss) {
pts <- data.frame(x=x[side==ss], y=y[side==ss],
idx=seq_along(x)[side==ss], dir=0)
mm <- rbind(pts, ints)
mm <- mm[order(mm$idx), ]
br <- cumsum(mm$dir!=0 & c(0,head(mm$dir,-1))!=0 &
if (length(unique(br))>1) {
mm<-rollup(mm, sum(br==br[1]))
br <- cumsum(c(FALSE,abs(diff(mm$dir*mm$dir))==3))
do.call(rbind, lapply(split(mm, br), addnullrow))
pss<-rep(unique(side), sapply(sideps, nrow))
ps<-do.call(rbind, lapply(sideps, addnullrow))[,c("x","y")]
attr(ps, "side")<-pss
pss<-unname(unlist(lapply(r, attr, "side")))
src <- rep(seq_along(r), sapply(r, nrow))
r <- do.call(rbind, r)
attr(r, "source")<-src
attr(r, "side")<-pss
The input is just the values of x and y as you would pass them to polygon, along with the splitter. It will return a data.frame with x and y values that can be used with polygon.
For example
x <- c(1,2,2.5,NA,2.5,3,4)
y <- c(1,-2,2,NA,-1,2,-2)
sl <- getSplitLine(m = 0, b = 0)  # split along the line y = 0
plot(range(x, na.rm=TRUE),range(y, na.rm=TRUE), type = "n")
p <- splitPolygon(x,y,sl)
g <- cumsum(c(F, is.na(head(p$y,-1))))
gc <- ifelse(attr(p,"side")[is.na(p$y)],
  "red", "green")
polygon(p, col=gc)
sl$plot(lty=2, col="grey")
This should work for simple concave polygons as well with sloped lines. Here's another example
x <- c(1,2,3,4,5,4,3,2)
y <- c(-2,2,1,2,-2,.5,-.5,.5)
plot(range(x, na.rm=T),range(y, na.rm=T), type = "n")
p <- splitPolygon(x,y,sl)
g <- cumsum(c(F, is.na(head(p$y,-1))))
gc <- ifelse(attr(p,"side")[is.na(p$y)],
  "red", "green")
polygon(p, col=gc)
sl$plot(lty=2, col="grey")
Right now things can get a bit messy when a vertex of the polygon falls directly on the splitting line. I may try to correct that in the future.
A faster but less accurate solution is to split the data frame into a list according to a grouping variable (e.g. above = red and below = blue). This is a pretty good workaround for fairly large datasets (I would say more than 100 elements). For smaller ones, some discontinuity may be visible:
x <- 1:100
y1 <- sin(1:100/10)*0.8
y2 <- sin(1:100/10)*1.2
plot(x, y2, type='l')
lines(x, y1, col='red')
df <- data.frame(x=x, y1=y1, y2=y2)
df$pos_neg <- ifelse(df$y2-df$y1>0,1,-1) # above (1) or below (-1) average
# create the number for chunks to be split into lists:
df$chunk <- c(1,cumsum(abs(diff(df$pos_neg)))/2+1) # first element needs to be added
df$colors <- ifelse(df$pos_neg>0, "red","blue") # colors to be used for filling the polygons
# create lists to be plotted:
l <- split(df, df$chunk) # we should get 4 sub-lists
invisible(lapply(l, function(x) polygon(c(x$x,rev(x$x)),c(x$y2,rev(x$y1)),col=x$colors)))
As I said, for smaller datasets some discontinuity may be visible if sharp changes occur between positive and negative areas, but if a horizontal line separates the two, or if more elements are plotted, this effect becomes negligible: