Plot a Region in R

I generated a matrix with 100 random x-y coordinates in the [-1,1]^2 square:
n <- 100
datam <- matrix(c(rep(1,n), 2*runif(n)-1, 2*runif(n)-1), n)
# leading 1 column needed for computation
# second column has x coordinates, third column has y coordinates
and classified them into 2 classes, -1 and 1, by a given target function f (a vector).
I computed a hypothesis function g and now want to visualize how well it matches the
target function f.
f <- c(1.0, 0.5320523, 0.6918301) # the given target function
ylist <- sign(datam %*% f) # classify into -1 and 1
# perceptron algorithm to find g:
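perceptron = function(datam, ylist) {
  w <- c(1,0,0) # starting vector
  made.mistake = TRUE
  while (made.mistake) {
    made.mistake = FALSE # reset, then set again if any point is misclassified
    for (i in 1:n) {
      if (ylist[i] != sign(t(w) %*% datam[i,])) {
        w <- w + ylist[i]*datam[i,]
        made.mistake = TRUE
      }
    }
  }
  w # final weight vector
}
g <- perceptron(datam, ylist)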
I now want to compare f to g in a plot.
I can do this quite easily in Mathematica. Shown here is the data set with the target function f that separates the data into the +1 and -1 parts:
This Mathematica plot shows both f and g in comparison (different data set and f):
This is the corresponding Mathematica code:
ContourPlot[g.{1, x1, x2} == 0, {x1, -1, 1}, {x2, -1, 1}]
How can I do something similar in R (ggplot would be nice)?

Same thing using ggplot. This example follows your code exactly, then adds at the end:
# OP's code...
# ...
glist <- sign(datam %*% g)
library(ggplot2)
library(reshape2) # for melt(...)
df <- data.frame(datam,f=ylist,g=glist) # df has columns: X1, X2, X3, f, g
gg <- melt(df,id.vars=c("X1","X2","X3"),variable.name="model")
# one boundary line per model, so each facet shows only its own line
# (assumes f[3] and g[3] are non-zero)
bounds <- data.frame(model=c("f","g"),
                     intercept=c(-f[1]/f[3], -g[1]/g[3]),
                     slope=c(-f[2]/f[3], -g[2]/g[3]))
ggp <- ggplot(gg, aes(x=X2, y=X3, color=factor(value)))
ggp <- ggp + geom_point()
ggp <- ggp + geom_abline(data=bounds, aes(intercept=intercept, slope=slope))
ggp <- ggp + facet_wrap(~model)
ggp <- ggp + scale_color_discrete(name="Mistake")
ggp <- ggp + labs(title=paste0("Comparison of Target (f) and Hypothesis (g) [n=",n,"]"))
ggp <- ggp + theme(plot.title=element_text(face="bold"))
ggp
Below are results for n=200, 500, and 1000. When n=100, g=c(1,0,0). You can see that f and g converge for n~500.
In case you are new to ggplot: first we create a data frame (df) which has the coordinates (X2 and X3) and two columns for the classifications based on f and g. Then we use melt(...) to convert this to a new data frame, gg, in "long" format. gg has columns X1, X2, X3, model, and value. The column gg$model identifies the model (f or g); the corresponding classifications are in gg$value. Then the ggplot calls do the following:
Establish the default dataset, gg, the x and y coords, and the coloring [ggplot(...)]
Add the points layer [geom_point(...)]
Add lines separating the classifications [geom_abline(...)]
Tell ggplot to plot the two models in different "facets" [facet_wrap(...)]
Set the legend name.
Set the plot title.
Make the plot title bold.
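The intercept and slope given to geom_abline(...) come from solving w[1] + w[2]*x + w[3]*y = 0 for y. A minimal base-R sketch of that conversion follows; boundary_line is a hypothetical helper name, not part of the answer's code, and it assumes w[3] is non-zero:

```r
# Convert weights w = (w0, w1, w2) into the line y = intercept + slope * x
# on which w0 + w1*x + w2*y = 0. Assumes w[3] != 0.
boundary_line <- function(w) {
  stopifnot(length(w) == 3, w[3] != 0)
  c(intercept = -w[1] / w[3], slope = -w[2] / w[3])
}

f <- c(1.0, 0.5320523, 0.6918301)  # target weights from the question
ab <- boundary_line(f)

# Points on the line should give a (numerically) zero activation.
x <- c(-1, 0, 1)
y <- ab[["intercept"]] + ab[["slope"]] * x
residual <- f[1] + f[2] * x + f[3] * y
```

With g = c(1,0,0), as reported for n=100, w[3] is zero: every point is classified as +1 and there is no boundary line to draw.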

Your example is still not reproducible. Look at my code and you will see that f and g are identical. Also, it seems that you are extrapolating the lines (second part of your question) to data points you don't have. Do you have any evidence that the discrimination should be linear?
#Data generation
n <- 10000
datam <- matrix(c(rep(1,n), 2*runif(n)-1, 2*runif(n)-1), n)
# leading 1 column needed for computation
# second column has x coordinates, third column has y coordinates
f <- c(1.0, 0.5320523, 0.6918301) # the given target function
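f.col <- ifelse(sign(datam %*% f)==1,"darkred", "darkblue")
ylist <- drop(sign(datam %*% f)) # assumed name, as in the question
datam.df <- data.frame(datam) # assumed definition: columns X1, X2, X3
datam.df.f <- datam.df[ylist == -1, c("X2","X3")] # assumed: points that f puts in class -1
# perceptron algorithm to find g:
perceptron = function(datam, ylist) {
  w <- c(1,0,0) # starting vector
  made.mistake = TRUE
  while (made.mistake) {
    made.mistake = FALSE
    for (i in 1:n) {
      if (ylist[i] != sign(t(w) %*% datam[i,])) {
        w <- w + ylist[i]*datam[i,]
        made.mistake = TRUE
      }
    }
  }
  w
}
g <- perceptron(datam, ylist)
glist <- drop(sign(datam %*% g)) # assumed name
datam.df.g <- datam.df[glist == -1, c("X2","X3")] # assumed: points that g puts in class -1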
Plotting the overall data
plot(datam.df$X2, datam.df$X3, col=f.col, pch=".", cex=2)
I will produce separate plots for the g and f functions, since something is not working in your example and f and g are identical. Once you sort this out you can put everything in one plot. You can also choose whether you want shading or not. If you have no evidence that the classification is linear, it is probably wiser to use chull() to mark the data you have.
For the f function
plot(datam.df$X2, datam.df$X3, col=f.col, pch=".", xlim=c(-1,-0.5), ylim=c(-1,-.5), cex=3, main="f function")
ch.f<-chull(datam.df.f$X2, datam.df.f$X3 )
ch.f <- rbind(x = datam.df.f[ch.f, ], datam.df.f[ch.f[1], ])
polygon(ch.f, lwd=3, col=rgb(0,0,180,alpha=50, maxColorValue=255))
For the g function
g.col <- ifelse(sign(datam %*% g)==1,"darkred", "darkblue")
plot(datam.df$X2, datam.df$X3, col=g.col, pch=".", xlim=c(-1,-0.5), ylim=c(-1,-.5), cex=3, main="g function")
ch.g<-chull(datam.df.g$X2, datam.df.g$X3 )
ch.g <- rbind(x = datam.df.g[ch.g, ], datam.df.g[ch.g[1], ])
polygon(ch.g, col=rgb(0,0,180,alpha=50, maxColorValue=255), lty=3, lwd=3)
The ch.f and ch.g objects are the coordinates of the "bag" around your points. You can extract the points to describe your line.
lm.f<-lm(c(ch.f$X3[ ch.f$X2> -0.99 & ch.f$X2< -0.65 & ch.f$X3<0 ])~c(ch.f$X2[ ch.f$X2>-0.99 & ch.f$X2< -0.65 & ch.f$X3<0]))
curve(lm.f$coefficients[1]+x*lm.f$coefficients[2], from=-1., to=-0.59, lwd=5, add=T)
lm.g<-lm(c(ch.g$X3[ ch.g$X2> -0.99 & ch.g$X2< -0.65 & ch.g$X3<0 ])~c(ch.g$X2[ ch.g$X2>-0.99 & ch.g$X2< -0.65 & ch.g$X3<0]))
curve(lm.g$coefficients[1]+x*lm.g$coefficients[2], from=-1., to=-0.59, lwd=5, add=T, lty=3)
And you get:
Unfortunately, because the f and g functions are the same in your example, you cannot see two different lines in the above picture.

You can use the col argument in plot() to indicate the classification by the f() function, and you can use polygon() to shade the classification area of your g() function. If you give us a reproducible example we could answer with specific code. It would result in a figure similar to the Mathematica one you present.
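As a rough illustration of that suggestion, here is a base-graphics sketch that colours the points by f and shades the -1 side of g. The weights in g are made-up values for illustration only (they are not the result of the perceptron), boundary_y is a hypothetical helper, and the shading assumes g[3] > 0:

```r
set.seed(1)
n <- 200
X <- cbind(1, runif(n, -1, 1), runif(n, -1, 1))
f <- c(1.0, 0.5320523, 0.6918301)  # target weights from the question
g <- c(1.1, 0.6, 0.7)              # hypothetical hypothesis weights

# Colour each point by the class that f assigns to it.
cols <- ifelse(sign(X %*% f) == 1, "darkred", "darkblue")

# y value of the boundary w[1] + w[2]*x + w[3]*y = 0 (requires w[3] != 0).
boundary_y <- function(w, x) -(w[1] + w[2] * x) / w[3]

plot(X[, 2], X[, 3], col = cols, pch = 19,
     xlim = c(-1, 1), ylim = c(-1, 1), xlab = "x", ylab = "y")

# Shade the region below g's boundary (class -1 when g[3] > 0).
# The polygon extends far below the plot; the plot region clips it.
xs <- c(-1, 1)
polygon(c(xs, rev(xs)), c(boundary_y(g, xs), -10, -10),
        col = rgb(0, 0, 180, alpha = 50, maxColorValue = 255), border = NA)

abline(a = -f[1] / f[3], b = -f[2] / f[3])           # boundary of f
abline(a = -g[1] / g[3], b = -g[2] / g[3], lty = 2)  # boundary of g
```

Points whose colour disagrees with the shading are the ones where g and f differ.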


How to find outer limits of contour lines for three-dimensional data in r

I would like to extend the example given here
How to plot a contour line showing where 95% of values fall within, in R and in ggplot2
to data with three dimensions (x, y and z), and instead of plotting the contour line I'd like to get the limits of the x, y and z values.
This is the example from the previous post.
d <- data.frame(x=rnorm(1000),y=rnorm(1000))
kd <- ks::kde(d, compute.cont=TRUE)
contour_95 <- with(kd, contourLines(x=eval.points[[1]], y=eval.points[[2]],
z=estimate, levels=cont["5%"])[[1]])
contour_95 <- data.frame(contour_95)
ggplot(data=d, aes(x, y)) +
  geom_point() +
  geom_path(aes(x, y), data=contour_95)
and then, it's possible to get the limits of the contour like this:
I would love to know how to get the x, y and z ranges of 3-D contours at specified percentiles.
ks::kde can deal with higher dimensions, but contourLines() can't.
This is what I've tried...
d <- data.frame(x=rnorm(1000),y=rnorm(1000), z=rnorm(1000))
kd <- ks::kde(d, compute.cont=TRUE)
#what kd$estimates are > 95th percentile?
#make function that can extract from 3d array
multi.which <- function(A){
  if ( is.vector(A) ) return(which(A))
  d <- dim(A)
  T <- which(A) - 1 # zero-based linear indices of the TRUE cells
  nd <- length(d)
  t( sapply(T, function(t){
    I <- integer(nd)
    I[1] <- t %% d[1]
    sapply(2:nd, function(j){
      I[j] <<- (t %/% prod(d[1:(j-1)])) %% d[j]
    })
    I
  }) + 1 )
}
#extract those estimates that have >density than 95th percentile
ests <- multi.which(kd$estimate > kd$cont["5%"])
#make into a long dataframe with column number in the second column and row number in first column
col1=rep(1, nrow(ests))
col2=rep(2, nrow(ests))
col3=rep(3, nrow(ests))
rows=c(ests[,1], ests[,2], ests[,3])
cols=c(col1, col2, col3)
index=cbind(rows,cols)#this is the index so we can extract the coordinates in multi-D space
#get coordinates with this function
fExtract <- function(dat, indexDat){
  dat[as.matrix(indexDat)] # assumed body: look up each (row, column) pair
}
#pull three coordinates (x,y,z) from eval.points into 3 columns
eval.pts <- cbind(kd$eval.points[[1]], kd$eval.points[[2]], kd$eval.points[[3]])
v <- fExtract(eval.pts, index) #one long vector
#re-create the three columns of x, y and z coordinates of points at higher density than 95th percentile
x1 <- v[1:nrow(ests)]
y1 <- v[(nrow(ests)+1):(2*nrow(ests))]
z1 <- v[(2*nrow(ests)+1):(3*nrow(ests))]
#the three coordinates.
fin <- cbind(x1,y1,z1)
#get range of each dimension
But I'm not confident it's right.
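One way to cross-check the result is base R's which(..., arr.ind = TRUE), which returns array indices directly and so could stand in for multi.which and the index bookkeeping. The sketch below uses a synthetic density on a regular grid in place of kd$estimate and kd$eval.points (gx, gy, gz, dens and level are hypothetical names), and it approximates the 95% level from the grid itself rather than taking it from kd$cont:

```r
# Synthetic stand-in for a 3-D density estimate on a regular grid.
gx <- seq(-4, 4, length.out = 41)
gy <- gx
gz <- gx
dens <- outer(outer(dnorm(gx), dnorm(gy)), dnorm(gz))  # 41 x 41 x 41 array

# Density level whose upper region holds about 95% of the gridded mass
# (an approximation computed from the grid, not ks's own contour level).
s <- sort(as.vector(dens), decreasing = TRUE)
level <- s[which(cumsum(s) / sum(s) >= 0.95)[1]]

# Array indices: one row per selected grid cell, one column per dimension.
idx <- which(dens >= level, arr.ind = TRUE)

# Map the indices back to coordinates and take the range per dimension.
ranges <- rbind(x = range(gx[idx[, 1]]),
                y = range(gy[idx[, 2]]),
                z = range(gz[idx[, 3]]))
colnames(ranges) <- c("min", "max")
ranges
```

With the ks objects from the question, the same indexing would presumably be applied to kd$estimate > kd$cont["5%"] and the three vectors in kd$eval.points.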

ggplot2 plots / results are different within and outside of loop [Bug?]

My simple case:
Plotting graphs within the loop gives different results than plotting them directly after the loop.
# Initialize
Input <- list(c(3,3,3,3),c(1,1,1,1))
y <- c()
x <- c()
plotlist <- c()
Answer <- c()
# create helper grid
x.grid = c(1:4)
y.grid = c(1:4)
helpergrid <- expand.grid(xgrid=x.grid, ygrid=y.grid )
#- Loop Lists -
for (m in c(1,2)) {
# # Loop within each list
# for(j in 1:4)
# {
# y[j] <- Input[[m]][j]
# x[j] <- j
# }
y[1] <- Input[[m]][1]
x[1] <- 1
y[2] <- Input[[m]][2]
x[2] <- 2
y[3] <- Input[[m]][3]
x[3] <- 3
y[4] <- Input[[m]][4]
x[4] <- 4
Points <- data.frame(x, y)
# Example Plot
plot = ggplot() + labs(title = paste("Loop m = ",m)) + labs(subtitle = paste("y-values = ",Points$y)) +
  geom_tile(data = helpergrid, aes(x=xgrid, y=ygrid, fill=1), colour="grey20") +
  geom_point(data = Points, aes(x=Points$x, y=Points$y), stroke=3, size=5, shape=1, color="white") + theme_minimal()
# Plot to plotlist
plotlist[[m]] <- plot
# --- Plot plotlist within loop ---
print(plotlist[[m]])
}
# --- Plot plotlist outside of loop ---
print(plotlist[[1]])
print(plotlist[[2]])
Here is an image of the results:
Plot Results
As aaumai points out, there is a nested loop that might cause the issue, with ggplot using static values. However, the resulting plot does show the correct y-value (y=3) explicitly in the subtitle, while the geom_point layer uses the wrong values (y=1)...
It makes absolutely (!) no sense to me. I am relatively new to R and have been trying to debug this for hours now, so I hope someone can help me with this!
EDIT: I manually removed the nested loop and updated the example code, but the problem still persists :(
The problem arises due to your use of Points$x within aes. The "tl;dr" is that basically you should never use $ or [ or [[ within aes. See the answer here from baptiste.
# Initialize
Input <- list(c(3,3,3,3),c(1,1,1,1))
y <- c()
x <- c()
plotlist <- c()
Answer <- c()
# create helper grid
x.grid = c(1:4)
y.grid = c(1:4)
helpergrid <- expand.grid(xgrid=x.grid, ygrid=y.grid )
#- Loop Lists -
for (m in c(1,2)) {
y[1] <- Input[[m]][1]
x[1] <- 1
y[2] <- Input[[m]][2]
x[2] <- 2
y[3] <- Input[[m]][3]
x[3] <- 3
y[4] <- Input[[m]][4]
x[4] <- 4
Points <- data.frame(x, y)
# Example Plot
plot = ggplot() + labs(title = paste("Loop m = ",m)) + labs(subtitle = paste("y-values = ",force(Points$y))) +
geom_tile(data = helpergrid, aes(x=xgrid, y=ygrid, fill=1), colour="grey20") +
geom_point(data = Points, aes(x=x, y=y), stroke=3, size=5, shape=1, color="white") + theme_minimal()
# Plot to plotlist
plotlist[[m]] <- plot
# --- Plot plotlist within loop ---
print(plotlist[[m]])
}
# --- Plot plotlist outside of loop ---
print(plotlist[[1]])
print(plotlist[[2]])
I believe the reason this happens is due to lazy evaluation. The data passed into geom_tile/point gets stored, but when the plot is printed, it grabs Points$x from the current environment. During the loop, this points to the current state of the Points data frame, the desired state. After the loop is finished, only the second version of Points exists, so when the referenced value from aes is evaluated, it grabs the x values from Points$x as it exists after the second evaluation of the loop. Hope this is clear, feel free to ask further if not.
To clarify, if you remove Points$ and just refer to x within aes, it takes these values from the data.frame as it was passed into the data argument of the geom calls.
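The effect described above can be imitated in base R without ggplot2. This is only an analogy, not ggplot2's actual internals: each "plot" stores an unevaluated expression (like aes(y = Points$y)) next to a copy of the data (like data = Points), and both are evaluated after the loop. The names captured, late and bound are hypothetical:

```r
captured <- list()
for (m in 1:2) {
  Points <- data.frame(x = 1:4, y = c(3, 1)[m])
  captured[[m]] <- list(
    expr = quote(Points$y),  # stored unevaluated, refers to the variable Points
    data = Points            # a copy of the data frame as it is right now
  )
}

# Evaluated after the loop: Points$y is looked up in the environment,
# where only the last version of Points still exists.
late <- lapply(captured, function(p) eval(p$expr))

# Evaluated against the stored data: y is looked up inside each copy.
bound <- lapply(captured, function(p) eval(quote(y), p$data))
```

Here late[[1]] holds the y values of the second iteration, while bound[[1]] holds those of the first, which mirrors the difference between aes(y = Points$y) and aes(y = y).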
If I'm not mistaken, this is because you have a loop within the loop.
The plot within the loop returns plots for changing y values in the Points data (from 1 to 4), whereas the plot outside is only plotting the static values.

How to convert ellipsoid to mesh3d in R?

I use the ellipsoidhull method from cluster package to obtain the minimum volume enclosing ellipsoid (mvee) from a set of points. This method returns an object of class ellipsoid. I need to plot the generated ellipsoid. I tried to use the wire3d method from rgl package to plot ellipsoids but this method gets objects of class mesh3d as input parameter. How can I convert an ellipsoid object to a mesh3d object?
If you don't actually care about a mesh, and are just looking to plot a transparent ellipsoid you can use this:
library(rgl)
ellipsoid3d <- function(cen, a = 1,b = 1,c = 1,n = 65, ...){
  # parametric surface of an axis-aligned ellipsoid centred at cen
  f <- function(s,t){
    cbind( a * cos(t)*cos(s) + cen[1],
           b * sin(s) + cen[2],
           c * sin(t)*cos(s) + cen[3])
  }
  persp3d(f, slim = c(-pi/2,pi/2), tlim = c(0, 2*pi), n = n, add = T, ...)
}
n <- 6
for (i in 1:n){
  cen <- 3*runif(3)
  a <- runif(1)
  b <- runif(1)
  c <- runif(1)
  clr <- c("red","blue","green")[i %% 3 + 1 ]
  elpf <- ellipsoid3d(cen,a=a,b=b,c=c,col=clr,alpha=0.5)
}
I modified the interesting answer from cuttlefish44 to get this.
There is also a qmesh3d answer from dww there that you could modify in a similar manner to get a mesh3d if that is what you really want, but I thought this more elegant.
library(cluster) # ellipsoidhull
library(Rvcg)    # vcgSphere
library(rgl)     # transform3d, scale3d, translate3d, shade3d
xyz <- cbind(rnorm(10), rnorm(10), rnorm(10))
e <- ellipsoidhull(xyz)
A <- e$cov
center <- e$loc
r <- sqrt(e$d2)
sphr <- vcgSphere()
ell <- translate3d(
  scale3d(
    transform3d(sphr, chol(A)),
    r, r, r),
  center[1], center[2], center[3])
shade3d(ell, color="red", alpha=0.3)
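The idea behind mapping a unit sphere through chol(A), scaling by r and shifting by the centre can be checked numerically in base R, independently of rgl. In this sketch A, center and d2 are made-up stand-ins for e$cov, e$loc and e$d2: with R = chol(A), so that A = t(R) %*% R, every point center + sqrt(d2) * u %*% R with |u| = 1 has squared Mahalanobis distance d2 from the centre, which is the ellipsoid's surface.

```r
set.seed(42)
A <- crossprod(matrix(rnorm(9), 3)) + diag(3)  # hypothetical SPD shape matrix
center <- c(1, -2, 0.5)                        # hypothetical centre
d2 <- 2.5                                      # hypothetical squared radius

# Random points on the unit sphere, one per row.
u <- matrix(rnorm(300), ncol = 3)
u <- u / sqrt(rowSums(u^2))

# Map the sphere onto the ellipsoid: shape, scale, then translate.
x <- sweep(sqrt(d2) * u %*% chol(A), 2, center, "+")

# Squared Mahalanobis distance of every mapped point from the centre.
m <- mahalanobis(x, center, A)
range(m)
```

All values of m should agree with d2 up to rounding error.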

R: Draw a polygon with conditional colour

I want to colour the area under a curve. The area with y > 0 should be red, the area with y < 0 should be green.
x <- c(1:4)
y <- c(0,1,-1,2,rep(0,4))
Using ifelse() does not work:
What I achieved so far is the following:
But then the red area is too large. Do you have any ideas how to get the desired result?
If you want two different colors, you need two different polygons. You can either call polygon multiple times, or you can add NA values in your x and y vectors to indicate a new polygon. R will not automatically calculate the intersection for you. You must do that yourself. Here's how you could draw that with different colors.
x <- c(1,2,2.5,NA,2.5,3,4)
y <- c(0,1,0,NA,0,-1,0)
#calculate color based on most extreme y value
g <- cumsum(is.na(x))
gc <- ifelse(tapply(y, g,
    function(x) x[which.max(abs(x))])>0,
    "red","green")
plot(c(1, 4),c(-1,1), type = "n")
polygon(x, y, col = gc)
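The intersection with y = 0 mentioned above can be found by linear interpolation between neighbouring points of opposite sign. A minimal sketch for the question's data follows; add_zero_crossings is a hypothetical helper, and it only handles strict sign changes (a vertex lying exactly on zero is left as it is):

```r
# Insert the points where the polyline (x, y) crosses y = 0.
add_zero_crossings <- function(x, y) {
  out_x <- x[1]
  out_y <- y[1]
  for (i in seq_along(x)[-1]) {
    if (y[i - 1] * y[i] < 0) {  # strict sign change between neighbours
      # x position where the segment hits y = 0
      x0 <- x[i - 1] - y[i - 1] * (x[i] - x[i - 1]) / (y[i] - y[i - 1])
      out_x <- c(out_x, x0)
      out_y <- c(out_y, 0)
    }
    out_x <- c(out_x, x[i])
    out_y <- c(out_y, y[i])
  }
  data.frame(x = out_x, y = out_y)
}

p <- add_zero_crossings(1:4, c(0, 1, -1, 2))

plot(p$x, p$y, type = "l")
abline(h = 0)
# Area between the curve and zero: positive part red, negative part green.
polygon(c(p$x, rev(p$x)), c(pmax(p$y, 0), rep(0, nrow(p))), col = "red", border = NA)
polygon(c(p$x, rev(p$x)), c(pmin(p$y, 0), rep(0, nrow(p))), col = "green", border = NA)
```

Because the crossing points are part of p, pmax() and pmin() clip the curve exactly at zero instead of cutting corners.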
In the more general case, it might not be as easy to split a polygon into different regions. There seems to be some support for this type of operation in GIS packages, where this type of thing is more common. However, I've put together a somewhat general case that may work for simple polygons.
First, I define a closure that will define a cutting line. The function will take a slope and y-intercept for a line and will return the functions we need to cut a polygon.
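getSplitLine <- function(m=1, b=0) {
  force(m); force(b)
  # TRUE for points on or above the line y = m*x + b
  classify <- function(x,y) {
    y >= m*x + b
  }
  # points where consecutive vertices change side, i.e. edges crossing the line
  intercepts <- function(x,y, class=classify(x,y)) {
    w <- which(diff(class)!=0)
    m2 <- (y[w+1]-y[w])/(x[w+1]-x[w])
    b2 <- y[w] - m2*x[w]
    ix <- (b2-b)/(m-m2)
    iy <- ix*m + b
    data.frame(x=ix,y=iy,idx=w+.5, dir=((rank(ix, ties="first")+1) %/% 2) %% 2 +1)
  }
  plot <- function(...) {
    abline(b, m, ...)
  }
  list(intercepts=intercepts, classify=classify, plot=plot)
}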
Now we will define a function to actually split a polygon using the splitter we've just defined.
splitPolygon <- function(x, y, splitter) {
  addnullrow <- function(x) if (!all(is.na(x[nrow(x),]))) rbind(x, NA) else x
  rollup <- function(x,i=1) rbind(x[(i+1):nrow(x),], x[1:i,])
  idx <- cumsum(is.na(x) | is.na(y))
  polys <- split(data.frame(x=x,y=y)[!is.na(x),], idx[!is.na(x)])
  r <- lapply(polys, function(P) {
    x <- P$x; y<-P$y
    side <- splitter$classify(x, y)
    if(side[1] != side[length(side)]) {
      ints <- splitter$intercepts(c(x,x[1]), c(y, y[1]), c(side, side[1]))
    } else {
      ints <- splitter$intercepts(x, y, side)
    }
    sideps <- lapply(unique(side), function(ss) {
      pts <- data.frame(x=x[side==ss], y=y[side==ss],
        idx=seq_along(x)[side==ss], dir=0)
      mm <- rbind(pts, ints)
      mm <- mm[order(mm$idx), ]
      # second half of this condition is an assumed reconstruction
      br <- cumsum(mm$dir!=0 & c(0,head(mm$dir,-1))!=0 &
        c(0,diff(mm$idx))>1)
      if (length(unique(br))>1) {
        mm<-rollup(mm, sum(br==br[1]))
      }
      br <- cumsum(c(FALSE,abs(diff(mm$dir*mm$dir))==3))
      do.call(rbind, lapply(split(mm, br), addnullrow))
    })
    pss<-rep(unique(side), sapply(sideps, nrow))
    ps<-do.call(rbind, lapply(sideps, addnullrow))[,c("x","y")]
    attr(ps, "side")<-pss
    ps
  })
  pss<-unname(unlist(lapply(r, attr, "side")))
  src <- rep(seq_along(r), sapply(r, nrow))
  r <- do.call(rbind, r)
  attr(r, "source")<-src
  attr(r, "side")<-pss
  r
}
The input is just the values of x and y as you would pass to polygon along with the cutter. It will return a data.frame with x and y values that can be used with polygon.
For example
x <- c(1,2,2.5,NA,2.5,3,4)
y <- c(1,-2,2,NA,-1,2,-2)
sl <- getSplitLine(0,0) # split along y = 0
plot(range(x, na.rm=T),range(y, na.rm=T), type = "n")
p <- splitPolygon(x,y,sl)
g <- cumsum(c(F,is.na(head(p$y,-1))))
gc <- ifelse(attr(p,"side")[is.na(p$y)],
    "red","green")
polygon(p, col=gc)
sl$plot(lty=2, col="grey")
This should work for simple concave polygons as well with sloped lines. Here's another example
x <- c(1,2,3,4,5,4,3,2)
y <- c(-2,2,1,2,-2,.5,-.5,.5)
sl <- getSplitLine(.5,-1.25) # a sloped line; slope and intercept are assumed values
plot(range(x, na.rm=T),range(y, na.rm=T), type = "n")
p <- splitPolygon(x,y,sl)
g <- cumsum(c(F,is.na(head(p$y,-1))))
gc <- ifelse(attr(p,"side")[is.na(p$y)],
    "red","green")
polygon(p, col=gc)
sl$plot(lty=2, col="grey")
Right now things can get a bit messy when a vertex of the polygon falls directly on the splitting line. I may try to correct that in the future.
A faster, but not very accurate, solution is to split the data frame into a list according to a grouping variable (e.g. above=red and below=blue). This is a pretty nice workaround for rather big (I would say > 100 elements) datasets. For smaller chunks some discontinuity may be visible:
x <- 1:100
y1 <- sin(1:100/10)*0.8
y2 <- sin(1:100/10)*1.2
plot(x, y2, type='l')
lines(x, y1, col='red')
df <- data.frame(x=x, y1=y1, y2=y2)
df$pos_neg <- ifelse(df$y2-df$y1>0,1,-1) # above (1) or below (-1) average
# create the number for chunks to be split into lists:
df$chunk <- c(1,cumsum(abs(diff(df$pos_neg)))/2+1) # first element needs to be added
df$colors <- ifelse(df$pos_neg>0, "red","blue") # colors to be used for filling the polygons
# create lists to be plotted:
l <- split(df, df$chunk) # we should get 4 sub-lists
lapply(l, function(x) polygon(c(x$x,rev(x$x)),c(x$y2,rev(x$y1)),col=x$colors))
As I said, for smaller datasets some discontinuity may be visible if sharp changes occur between positive and negative areas, but if a horizontal line separates the two, or if more elements are plotted, this effect is negligible.

How to use plot3d/surface3d (or another function?) to plot 4d function ("fourth dimension" denoted by color scale)?

I would like to plot:
production.ts(31, .002, 10,12,125313.93,211,95,x,"2014-02-01","2014-05-14",z,y) as function of x,y,z
As something like this plot from Mathematica, (if possible in R):
I have a function:
library("lubridate"); library("rgl")
production.ts <- function(a, b, z, c, d, e,
f, g, h, j, r, k) {
elapsed <- (4-z)*10 + (4-c)
un.days <- 100 - elapsed
gone.days <- day(as.Date(h))
rem.days <- day(as.Date(j))
r.days <- as.numeric(as.Date(j) - as.Date(h))
m.r <- f/100*d
inputs <- d * a * (gone.days - 1)/365 + r
prin <- m.r + inputs
costs <- (r.days/365 * r + 1) * prin
added.p <- a/100*d + r
due <- d * 1-un.days
tomr.f <- 1- due + k^2
acct.paid <- (d - due)*tomr.f
net <- added.p + due + acct.paid
pv.net <- net/(1+r*(e-30-day(as.Date(j)))/365)
end <- d - due - acct.paid
more.add.p <- end*a*(rem.days-1)/365
rem <- (f-g)/100 * end
total.fv <- pv.net + rem + more.add.p
out <- costs - total.fv
out
}
I have tried:
func.3d<-Vectorize(production.ts(31, .002, 10,12,125313.93,211,95,x,"2014-02-01","2014-05-14",z,y))
c <- func.3d; c <- cut(c,breaks=64); cols <- rainbow(64)[as.numeric(c)]
plot3d(x, y, z, col=cols,type="s",size=1)
But this plots lines and the colors don't line up with the values the function should output.
Does anyone know how I could do this? Thanks, I really appreciate your time!
Like this?
df <- expand.grid(x=x,y=y,z=z)
f <- function(x,y,z) {production.ts(31, .002, 10,12,125313.93,211,95,x,"2014-02-01","2014-05-14",z,y)}
df$c <- f(df$x,df$y,df$z)
c <- cut(df$c,breaks=64)
cols <- rainbow(64)[as.numeric(c)]
plot3d(df$x, df$y, df$z, col=cols,type="p",size=1)
Your code was not plotting lines. When you pass x, y, and z like that to plot3d(...) it cycles through all the elements together, so x[1],y[1],z[1] is a point, x[2],y[2],z[2] is another point, and so on. Since the vectors are different lengths, the shorter ones are recycled to fill out to the length of the longest. The visual effect of this is that the points lie on a line.
You want to plot every combination of x, y, and z, and give each point a color based on that combination. The code above does that. The plot does not quite look like yours, but I can't tell if that is because of the way you have defined your function.
Also, the way you defined x, y, and z there would be 201 X 10001 X 901 = 1,811,191,101 points, which is too many to handle. The code above plots 1,000,000 points.
Finally, plotting spheres (type="s") is very expensive and unnecessary in this case.
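The difference between element-wise pairing with recycling and taking every combination can be seen on a toy example in base R. The vectors and the function f below are made up for illustration and have nothing to do with production.ts:

```r
x <- 1:3
y <- 1:2
z <- 1:4

# Element-wise pairing: shorter vectors are recycled to the longest length,
# so only max(length) points result.
n <- max(length(x), length(y), length(z))
paired <- data.frame(x = rep_len(x, n), y = rep_len(y, n), z = rep_len(z, n))

# Every combination of x, y and z: length(x) * length(y) * length(z) points.
grid <- expand.grid(x = x, y = y, z = z)

# Hypothetical function of three variables, evaluated once per grid point.
f <- function(x, y, z) x + 10 * y - z^2
grid$c <- f(grid$x, grid$y, grid$z)

# Bin the values and map each bin to a colour, as in the answer.
cols <- rainbow(8)[as.numeric(cut(grid$c, breaks = 8))]

nrow(paired)
nrow(grid)
```

The colour vector has one entry per row of grid, so it can be handed to plot3d(...) together with grid$x, grid$y and grid$z.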
