Draw line through 2d density plot - r

I have a large dataset of gene expression from ~10,000 patient samples (TCGA), and I'm plotting a predicted expression value (x) and the actual observed value (y) of a certain gene signature. For my downstream analysis, I need to draw a precise line through the plot and calculate different parameters in samples above/below the line.
No matter how I draw a line through the data (geom_smooth(method = 'lm', 'glm', 'gam', or 'loess')), the line always seems imperfect - it doesn't cut through the data to my liking (red line is lm in figure).
After playing around for a while, I realized that the 2d kernel density lines (geom_density2d) actually do a good job of showing the slope/trends of my data, so I manually drew a line that kind of cuts through the density lines (black line in figure).
My question: how can I automatically draw a line that cuts through the kernel density lines, as for the black line in the figure? (Rather than manually playing with different intercepts and slopes till something looks good).
The best approach I can think of is to somehow calculate intercept and slope of the longest diameter for each of the kernel lines, take an average of all those intercepts and slopes and plot that line, but that's a bit out of my league. Maybe someone here has experience with this and can help?
A more hacky approach may be getting the x,y coords of each kernel density line from ggplot_build, and going from there, but it feels too hacky (and is also out of my league).
Thanks!
EDIT: Changed a few details to make the figure/analysis easier. (Density lines are smoother now).
Reprex:
library(MASS)
set.seed(123)
samples <- 10000
r <- 0.9
data <- mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(2, r, r, 2), nrow=2))
x <- data[, 1] # standard normal (mu=0, sd=1)
y <- data[, 2] # standard normal (mu=0, sd=1)
test.df <- data.frame(x = x, y = y)
lm(y ~ x, test.df)
ggplot(test.df, aes(x, y)) +
geom_point(color = 'grey') +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = c(2,2)) + ### EDIT: h = c(2,2)
geom_smooth(method = "glm", se = F, lwd = 1, color = 'red') +
geom_abline(intercept = 0, slope = 0.7, lwd = 1, col = 'black') ## EDIT: slope to 0.7
Figure:

I generally agree with #Hack-R.
However, it was kind of a fun problem and looking into ggplot_build is not such a big deal.
require(dplyr)
require(ggplot2)
p <- ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = c(2,2))
#basic version of your plot
p_built <- ggplot_build(p)
p_data <- p_built$data[[1]]
p_maxring <- p_data[p_data[['level']] == min(p_data[['level']]),] %>%
select(x,y) # extracts the x/y coordinates of the points on the largest ellipse from your 2d-density contour
Now this answer helped me to find the points on this ellipse which are furthest apart.
coord_mean <- c(x = mean(p_maxring$x), y = mean(p_maxring$y))
p_maxring <- p_maxring %>%
mutate (mean_dev = sqrt((x - mean(x))^2 + (y - mean(y))^2)) #extra column specifying the distance of each point to the mean of those points
coord_farthest <- c('x' = p_maxring$x[which.max(p_maxring$mean_dev)], 'y' = p_maxring$y[which.max(p_maxring$mean_dev)])
# gives the coordinates of the point farthest away from the mean point
farthest_from_farthest <- sqrt((p_maxring$x - coord_farthest['x'])^2 + (p_maxring$y - coord_farthest['y'])^2)
#now this looks which of the points is the farthest from the point farthest from the mean point :D
coord_fff <- c('x' = p_maxring$x[which.max(farthest_from_farthest)], 'y' = p_maxring$y[which.max(farthest_from_farthest)])
ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = c(2,2)) +
# geom_segment using the coordinates of the points farthest apart
geom_segment((aes(x = coord_farthest['x'], y = coord_farthest['y'],
xend = coord_fff['x'], yend = coord_fff['y']))) +
geom_smooth(method = "glm", se = F, lwd = 1, color = 'red') +
# as per your request with your geom_smooth line
coord_equal()
coord_equal is super important, because otherwise you will get super weird results - it messed up my brain too. Because if the coordinates are not set equal, the line will seemingly not pass through the point furthest apart from the mean...
I leave it to you to build this into a function in order to automate it. Also, I'll leave it to you to calculate the y-intercept and slope from the two points

Tjebo's approach was kind of good initially, but after a close look, I found that it found the longest distance between two points on an ellipse. While this is close to what I wanted, it failed with either an irregular shape of the ellipse, or the sparsity of points in the ellipse. This is because it measured the longest distance between two points; whereas what I really wanted is the longest diameter of an ellipse; i.e.: the semi-major axis. See image below for examples/details.
Briefly:
To find/draw density contours of specific density/percentage:
R - How to find points within specific Contour
To get the longest diameter ("semi-major axis") of an ellipse:
https://stackoverflow.com/a/18278767/3579613
For function that returns intercept and slope (as in OP), see last piece of code.
The two pieces of code and images below compare two Tjebo's approach vs. my new approach based on the above posts.
#### Reprex from OP
require(dplyr)
require(ggplot2)
require(MASS)
set.seed(123)
samples <- 10000
r <- 0.9
data <- mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(2, r, r, 2), nrow=2))
x <- data[, 1] # standard normal (mu=0, sd=1)
y <- data[, 2] # standard normal (mu=0, sd=1)
test.df <- data.frame(x = x, y = y)
#### From Tjebo
p <- ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = 2)
p_built <- ggplot_build(p)
p_data <- p_built$data[[1]]
p_maxring <- p_data[p_data[['level']] == min(p_data[['level']]),][,2:3]
coord_mean <- c(x = mean(p_maxring$x), y = mean(p_maxring$y))
p_maxring <- p_maxring %>%
mutate (mean_dev = sqrt((x - mean(x))^2 + (y - mean(y))^2)) #extra column specifying the distance of each point to the mean of those points
p_maxring = p_maxring[round(seq(1, nrow(p_maxring), nrow(p_maxring)/23)),] #### Make a small ellipse to illustrate flaws of approach
coord_farthest <- c('x' = p_maxring$x[which.max(p_maxring$mean_dev)], 'y' = p_maxring$y[which.max(p_maxring$mean_dev)])
# gives the coordinates of the point farthest away from the mean point
farthest_from_farthest <- sqrt((p_maxring$x - coord_farthest['x'])^2 + (p_maxring$y - coord_farthest['y'])^2)
#now this looks which of the points is the farthest from the point farthest from the mean point :D
coord_fff <- c('x' = p_maxring$x[which.max(farthest_from_farthest)], 'y' = p_maxring$y[which.max(farthest_from_farthest)])
farthest_2_points = data.frame(t(cbind(coord_farthest, coord_fff)))
plot(p_maxring[,1:2], asp=1)
lines(farthest_2_points, col = 'blue', lwd = 2)
#### From answer in another post
d = cbind(p_maxring[,1], p_maxring[,2])
r = ellipsoidhull(d)
exy = predict(r) ## the ellipsoid boundary
lines(exy)
me = colMeans((exy))
dist2center = sqrt(rowSums((t(t(exy)-me))^2))
max(dist2center) ## major axis
lines(exy[dist2center == max(dist2center),], col = 'red', lwd = 2)
#### The plot here is made from the data in the reprex in OP, but with h = 0.5
library(MASS)
set.seed(123)
samples <- 10000
r <- 0.9
data <- mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(2, r, r, 2), nrow=2))
x <- data[, 1] # standard normal (mu=0, sd=1)
y <- data[, 2] # standard normal (mu=0, sd=1)
test.df <- data.frame(x = x, y = y)
## MAKE BLUE LINE
p <- ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = 0.5) ## NOTE h = 0.5
p_built <- ggplot_build(p)
p_data <- p_built$data[[1]]
p_maxring <- p_data[p_data[['level']] == min(p_data[['level']]),][,2:3]
coord_mean <- c(x = mean(p_maxring$x), y = mean(p_maxring$y))
p_maxring <- p_maxring %>%
mutate (mean_dev = sqrt((x - mean(x))^2 + (y - mean(y))^2))
coord_farthest <- c('x' = p_maxring$x[which.max(p_maxring$mean_dev)], 'y' = p_maxring$y[which.max(p_maxring$mean_dev)])
farthest_from_farthest <- sqrt((p_maxring$x - coord_farthest['x'])^2 + (p_maxring$y - coord_farthest['y'])^2)
coord_fff <- c('x' = p_maxring$x[which.max(farthest_from_farthest)], 'y' = p_maxring$y[which.max(farthest_from_farthest)])
## MAKE RED LINE
## h = 0.5
## Given the highly irregular shape of the contours, I will use only the largest contour line (0.95) for draing the line.
## Thus, average = 1. See function below for details.
ln = long.diam("x", "y", test.df, h = 0.5, average = 1) ## NOTE h = 0.5
## PLOT
ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = 0.5) + ## NOTE h = 0.5
geom_segment((aes(x = coord_farthest['x'], y = coord_farthest['y'],
xend = coord_fff['x'], yend = coord_fff['y'])), col = 'blue', lwd = 2) +
geom_abline(intercept = ln[1], slope = ln[2], color = 'red', lwd = 2) +
coord_equal()
Finally, I came up with the following function to deal with all this. Sorry for the lack of comments/clarity
#### This will return the intercept and slope of the longest diameter (semi-major axis).
####If Average = TRUE, it will average the int and slope across different density contours.
long.diam = function(x, y, df, probs = c(0.95, 0.5, 0.1), average = T, h = 2) {
fun.df = data.frame(cbind(df[,x], df[,y]))
colnames(fun.df) = c("x", "y")
dens = kde2d(fun.df$x, fun.df$y, n = 200, h = h)
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
sz <- sort(dens$z)
c1 <- cumsum(sz) * dx * dy
levels <- sapply(probs, function(x) {
approx(c1, sz, xout = 1 - x)$y
})
names(levels) = paste0("L", str_sub(formatC(probs, 2, format = 'f'), -2))
#plot(fun.df$x,fun.df$y, asp = 1)
#contour(dens, levels = levels, labels=probs, add=T, col = c('red', 'blue', 'green'), lwd = 2)
#contour(dens, add = T, col = 'red', lwd = 2)
#abline(lm(fun.df$y~fun.df$x))
ls <- contourLines(dens, levels = levels)
names(ls) = names(levels)
lines.info = list()
for (i in 1:length(ls)) {
d = cbind(ls[[i]]$x, ls[[i]]$y)
exy = predict(ellipsoidhull(d))## the ellipsoid boundary
colnames(exy) = c("x", "y")
me = colMeans((exy)) ## center of the ellipse
dist2center = sqrt(rowSums((t(t(exy)-me))^2))
#plot(exy,type='l',asp=1)
#points(d,col='blue')
#lines(exy[order(dist2center)[1:2],])
#lines(exy[rev(order(dist2center))[1:2],])
max.dist = data.frame(exy[rev(order(dist2center))[1:2],])
line.fit = lm(max.dist$y ~ max.dist$x)
lines.info[[i]] = c(as.numeric(line.fit$coefficients[1]), as.numeric(line.fit$coefficients[2]))
}
names(lines.info) = names(ls)
#plot(fun.df$x,fun.df$y, asp = 1)
#contour(dens, levels = levels, labels=probs, add=T, col = c('red', 'blue', 'green'), lwd = 2)
#abline(lines.info[[1]], col = 'red', lwd = 2)
#abline(lines.info[[2]], col = 'blue', lwd = 2)
#abline(lines.info[[3]], col = 'green', lwd = 2)
#abline(apply(simplify2array(lines.info), 1, mean), col = 'black', lwd = 4)
if (isTRUE(average)) {
apply(simplify2array(lines.info), 1, mean)
} else {
lines.info[[average]]
}
}
Finally, here's the final implementation of the different answers:
library(MASS)
set.seed(123)
samples = 10000
r = 0.9
data = mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(2, r, r, 2), nrow=2))
x = data[, 1] # standard normal (mu=0, sd=1)
y = data[, 2] # standard normal (mu=0, sd=1)
#plot(x, y)
test.df = data.frame(x = x, y = y)
#### Find furthest two points of contour
## BLUE
p <- ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 2, contour = T, h = 2)
p_built <- ggplot_build(p)
p_data <- p_built$data[[1]]
p_maxring <- p_data[p_data[['level']] == min(p_data[['level']]),][,2:3]
coord_mean <- c(x = mean(p_maxring$x), y = mean(p_maxring$y))
p_maxring <- p_maxring %>%
mutate (mean_dev = sqrt((x - mean(x))^2 + (y - mean(y))^2))
coord_farthest <- c('x' = p_maxring$x[which.max(p_maxring$mean_dev)], 'y' = p_maxring$y[which.max(p_maxring$mean_dev)])
farthest_from_farthest <- sqrt((p_maxring$x - coord_farthest['x'])^2 + (p_maxring$y - coord_farthest['y'])^2)
coord_fff <- c('x' = p_maxring$x[which.max(farthest_from_farthest)], 'y' = p_maxring$y[which.max(farthest_from_farthest)])
#### Find the average intercept and slope of 3 contour lines (0.95, 0.5, 0.1), as in my long.diam function above.
## RED
ln = long.diam("x", "y", test.df)
#### Plot everything. Black line is GLM
ggplot(test.df, aes(x, y)) +
geom_point(color = 'grey') +
geom_density2d(color = 'red', lwd = 1, contour = T, h = 2) +
geom_smooth(method = "glm", se = F, lwd = 1, color = 'black') +
geom_abline(intercept = ln[1], slope = ln[2], col = 'red', lwd = 1) +
geom_segment((aes(x = coord_farthest['x'], y = coord_farthest['y'],
xend = coord_fff['x'], yend = coord_fff['y'])), col = 'blue', lwd = 1) +
coord_equal()

Related

Add jitterred points to a ggnetwork plot

I have a graph of vertices and edges which I'd like to plot using a fruchtermanreingold layout.
Here's the graph edges matrix:
edge.mat <- matrix(as.numeric(strsplit("3651,0,0,1,0,0,0,0,2,0,11,2,0,0,0,300,0,1,0,0,66,0,78,9,0,0,0,0,0,0,11690,0,1,0,0,0,0,0,0,0,0,493,1,1,0,4288,5,0,0,36,0,9,7,3,0,6,1,0,1,7,490,0,0,0,6,0,0,628,6,12,0,0,0,0,0,641,0,0,4,0,0,0,0,0,0,66,0,0,0,0,3165,0,281,0,0,0,0,0,0,0,0,45,1,0,0,35248,0,1698,2,0,1,0,2,99,0,0,6,29,286,0,31987,0,1,10,0,8,0,16,0,21,1,0,0,1718,0,51234,0,0,17,3,12,0,0,7,0,0,0,1,0,2,16736,0,0,0,3,0,0,4,630,0,0,0,9,0,0,29495,53,6,0,0,0,0,5,0,0,0,0,3,0,19,186,0,0,0,482,8,12,0,1,0,7,1,0,6,0,26338",
split = ",")[[1]]),
nrow = 14,
dimnames = list(LETTERS[1:14], LETTERS[1:14]))
I then create an igraph object from that using:
gr <- igraph::graph_from_adjacency_matrix(edge.mat, mode="undirected", weighted=T, diag=F)
And then use ggnetwork to convert gr to a data.frame, with specified vertex colors:
set.seed(1)
gr.df <- ggnetwork::ggnetwork(gr,
layout="fruchtermanreingold",
weights="weight",
niter=50000,
arrow.gap=0)
And then I plot it using ggplot2 and ggnetwork:
vertex.colors <- strsplit("#00BE6B,#DC2D00,#F57962,#EE8044,#A6A400,#62B200,#FF6C91,#F77769,#EA8332,#DA8E00,#C59900,#00ACFC,#C49A00,#DC8D00",
split=",")[[1]]
library(ggplot2)
library(ggnetwork)
ggplot(gr.df, aes(x = x, y = y, xend = xend, yend = yend)) +
geom_edges(color = "gray", aes(size = weight)) +
geom_nodes(color = "black")+
geom_nodelabel(aes(label = vertex.names),
color = vertex.colors, fontface = "bold")+
theme_minimal() +
theme(axis.text=element_blank(),
axis.title=element_blank(),
legend.position="none")
In my case each vertex actually represents many points, where each vertex has a different number of points. Adding that information to gr.df:
gr.df$n <- NA
gr.df$n[which(is.na(gr.df$weight))] <- as.integer(runif(length(which(is.na(gr.df$weight))), 100, 500))
What I'd like to do is add to the plot gr.df$n radially jittered points around each vertex (i.e., with its corresponding n), with the same vertex.colors coding. Any idea how to do that?
I think sampling and then plotting with geom_point is a reasonable strategy. (otherwise you could create your own geom).
Here is some rough code, starting from the relevant bit of your question
gr.df$n <- 1
gr.df$n[which(is.na(gr.df$weight))] <- as.integer(runif(length(which(is.na(gr.df$weight))), 100, 500))
# function to sample
# https://stackoverflow.com/questions/5837572/generate-a-random-point-within-a-circle-uniformly
circSamp <- function(x, y, R=0.1){
n <- length(x)
A <- a <- runif(n,0,1)
b <- runif(n,0,1)
ind <- b < a
a[ind] <- b[ind]
b[ind] <- A[ind]
xn = x+b*R*cos(2*pi*a/b)
yn = y+b*R*sin(2*pi*a/b)
cbind(x=xn, y=yn)
}
# sample
d <- with(gr.df, data.frame(vertex.names=rep(vertex.names, n),
circSamp(rep(x,n), rep(y,n))))
# p is your plot
p + geom_point(data=d, aes(x, y, color = vertex.names),
alpha=0.1, inherit.aes = FALSE) +
scale_color_manual(values = vertex.colors)
Giving

ggplot plot 2d probability density function on top of points on ggplot

I have the following example:
require(mvtnorm)
require(ggplot2)
set.seed(1234)
xx <- data.frame(rmvt(100, df = c(13, 13)))
ggplot(data = xx, aes(x = X1, y= X2)) + geom_point() + geom_density2d()
Here is what I get:
However, I would like to get the density contour from the mutlivariate t density given by the dmvt function. How do I tweak geom_density2d to do that?
This is not an easy question to answer: because the contours need to be calculated and the ellipse drawn using the ellipse package.
Done with elliptical t-densities to illustrate the plotting better.
nu <- 5 ## this is the degrees of freedom of the multivariate t.
library(mvtnorm)
library(ggplot2)
sig <- matrix(c(1, 0.5, 0.5, 1), ncol = 2) ## this is the sigma parameter for the multivariate t
xx <- data.frame( rmvt(n = 100, df = c(nu, nu), sigma = sig)) ## generating the original sample
rtsq <- rowSums(x = matrix(rt(n = 2e6, df = nu)^2, ncol = 2)) ## generating the sample for the ellipse-quantiles. Note that this is a cumbersome calculation because it is the sum of two independent t-squared random variables with the same degrees of freedom so I am using simulation to get the quantiles. This is the sample from which I will create the quantiles.
g <- ggplot( data = xx
, aes( x = X1
, y = X2
)
) + geom_point(colour = "red", size = 2) ## initial setup
library(ellipse)
for (i in seq(from = 0.01, to = 0.99, length.out = 20)) {
el.df <- data.frame(ellipse(x = sig, t = sqrt(quantile(rtsq, probs = i)))) ## create the data for the given quantile of the ellipse.
names(el.df) <- c("x", "y")
g <- g + geom_polygon(data=el.df, aes(x=x, y=y), fill = NA, linetype=1, colour = "blue") ## plot the ellipse
}
g + theme_bw()
This yields:
I still have a question: how does one reduce the size of the plotting ellispe lines?

Fix interpolated polar contour plot function to works with current R and (possibly) use ggplot

The question R interpolated polar contour plot shows an excellent way to produce interpolated polar plots in R. I include the very slightly modified version I'm using:
PolarImageInterpolate <- function(
### Plotting data (in cartesian) - will be converted to polar space.
x, y, z,
### Plot component flags
contours=TRUE, # Add contours to the plotted surface
legend=TRUE, # Plot a surface data legend?
axes=TRUE, # Plot axes?
points=TRUE, # Plot individual data points
extrapolate=FALSE, # Should we extrapolate outside data points?
### Data splitting params for color scale and contours
col_breaks_source = 1, # Where to calculate the color brakes from (1=data,2=surface)
# If you know the levels, input directly (i.e. c(0,1))
col_levels = 10, # Number of color levels to use - must match length(col) if
#col specified separately
col = rev(heat.colors(col_levels)), # Colors to plot
# col = rev(heat.colors(col_levels)), # Colors to plot
contour_breaks_source = 1, # 1=z data, 2=calculated surface data
# If you know the levels, input directly (i.e. c(0,1))
contour_levels = col_levels+1, # One more contour break than col_levels (must be
# specified correctly if done manually
### Plotting params
outer.radius = ceiling(max(sqrt(x^2+y^2))),
circle.rads = pretty(c(0,outer.radius)), #Radius lines
spatial_res=1000, #Resolution of fitted surface
single_point_overlay=0, #Overlay "key" data point with square
#(0 = No, Other = number of pt)
### Fitting parameters
interp.type = 1, #1 = linear, 2 = Thin plate spline
lambda=0){ #Used only when interp.type = 2
minitics <- seq(-outer.radius, outer.radius, length.out = spatial_res)
# interpolate the data
if (interp.type ==1 ){
Interp <- akima:::interp(x = x, y = y, z = z,
extrap = extrapolate,
xo = minitics,
yo = minitics,
linear = FALSE)
Mat <- Interp[[3]]
}
else if (interp.type == 2){
library(fields)
grid.list = list(x=minitics,y=minitics)
t = Tps(cbind(x,y),z,lambda=lambda)
tmp = predict.surface(t,grid.list,extrap=extrapolate)
Mat = tmp$z
}
else {stop("interp.type value not valid")}
# mark cells outside circle as NA
markNA <- matrix(minitics, ncol = spatial_res, nrow = spatial_res)
Mat[!sqrt(markNA ^ 2 + t(markNA) ^ 2) < outer.radius] <- NA
### Set contour_breaks based on requested source
if ((length(contour_breaks_source == 1)) & (contour_breaks_source[1] == 1)){
contour_breaks = seq(min(z,na.rm=TRUE),max(z,na.rm=TRUE),
by=(max(z,na.rm=TRUE)-min(z,na.rm=TRUE))/(contour_levels-1))
}
else if ((length(contour_breaks_source == 1)) & (contour_breaks_source[1] == 2)){
contour_breaks = seq(min(Mat,na.rm=TRUE),max(Mat,na.rm=TRUE),
by=(max(Mat,na.rm=TRUE)-min(Mat,na.rm=TRUE))/(contour_levels-1))
}
else if ((length(contour_breaks_source) == 2) & (is.numeric(contour_breaks_source))){
contour_breaks = pretty(contour_breaks_source,n=contour_levels)
contour_breaks = seq(contour_breaks_source[1],contour_breaks_source[2],
by=(contour_breaks_source[2]-contour_breaks_source[1])/(contour_levels-1))
}
else {stop("Invalid selection for \"contour_breaks_source\"")}
### Set color breaks based on requested source
if ((length(col_breaks_source) == 1) & (col_breaks_source[1] == 1))
{zlim=c(min(z,na.rm=TRUE),max(z,na.rm=TRUE))}
else if ((length(col_breaks_source) == 1) & (col_breaks_source[1] == 2))
{zlim=c(min(Mat,na.rm=TRUE),max(Mat,na.rm=TRUE))}
else if ((length(col_breaks_source) == 2) & (is.numeric(col_breaks_source)))
{zlim=col_breaks_source}
else {stop("Invalid selection for \"col_breaks_source\"")}
# begin plot
Mat_plot = Mat
Mat_plot[which(Mat_plot<zlim[1])]=zlim[1]
Mat_plot[which(Mat_plot>zlim[2])]=zlim[2]
image(x = minitics, y = minitics, Mat_plot , useRaster = TRUE, asp = 1, axes = FALSE, xlab = "", ylab = "", zlim = zlim, col = col)
# add contours if desired
if (contours){
CL <- contourLines(x = minitics, y = minitics, Mat, levels = contour_breaks)
A <- lapply(CL, function(xy){
lines(xy$x, xy$y, col = gray(.2), lwd = .5)
})
}
# add interpolated point if desired
if (points){
points(x, y, pch = 21, bg ="blue")
}
# add overlay point (used for trained image marking) if desired
if (single_point_overlay!=0){
points(x[single_point_overlay],y[single_point_overlay],pch=0)
}
# add radial axes if desired
if (axes){
# internals for axis markup
RMat <- function(radians){
matrix(c(cos(radians), sin(radians), -sin(radians), cos(radians)), ncol = 2)
}
circle <- function(x, y, rad = 1, nvert = 500){
rads <- seq(0,2*pi,length.out = nvert)
xcoords <- cos(rads) * rad + x
ycoords <- sin(rads) * rad + y
cbind(xcoords, ycoords)
}
# draw circles
if (missing(circle.rads)){
circle.rads <- pretty(c(0,outer.radius))
}
for (i in circle.rads){
lines(circle(0, 0, i), col = "#66666650")
}
# put on radial spoke axes:
axis.rads <- c(0, pi / 6, pi / 3, pi / 2, 2 * pi / 3, 5 * pi / 6)
r.labs <- c(90, 60, 30, 0, 330, 300)
l.labs <- c(270, 240, 210, 180, 150, 120)
for (i in 1:length(axis.rads)){
endpoints <- zapsmall(c(RMat(axis.rads[i]) %*% matrix(c(1, 0, -1, 0) * outer.radius,ncol = 2)))
segments(endpoints[1], endpoints[2], endpoints[3], endpoints[4], col = "#66666650")
endpoints <- c(RMat(axis.rads[i]) %*% matrix(c(1.1, 0, -1.1, 0) * outer.radius, ncol = 2))
lab1 <- bquote(.(r.labs[i]) * degree)
lab2 <- bquote(.(l.labs[i]) * degree)
text(endpoints[1], endpoints[2], lab1, xpd = TRUE)
text(endpoints[3], endpoints[4], lab2, xpd = TRUE)
}
axis(2, pos = -1.25 * outer.radius, at = sort(union(circle.rads,-circle.rads)), labels = NA)
text( -1.26 * outer.radius, sort(union(circle.rads, -circle.rads)),sort(union(circle.rads, -circle.rads)), xpd = TRUE, pos = 2)
}
# add legend if desired
# this could be sloppy if there are lots of breaks, and that's why it's optional.
# another option would be to use fields:::image.plot(), using only the legend.
# There's an example for how to do so in its documentation
if (legend){
library(fields)
image.plot(legend.only=TRUE, smallplot=c(.78,.82,.1,.8), col=col, zlim=zlim)
# ylevs <- seq(-outer.radius, outer.radius, length = contour_levels+ 1)
# #ylevs <- seq(-outer.radius, outer.radius, length = length(contour_breaks))
# rect(1.2 * outer.radius, ylevs[1:(length(ylevs) - 1)], 1.3 * outer.radius, ylevs[2:length(ylevs)], col = col, border = NA, xpd = TRUE)
# rect(1.2 * outer.radius, min(ylevs), 1.3 * outer.radius, max(ylevs), border = "#66666650", xpd = TRUE)
# text(1.3 * outer.radius, ylevs[seq(1,length(ylevs),length.out=length(contour_breaks))],round(contour_breaks, 1), pos = 4, xpd = TRUE)
}
}
Unfortunately, this function has a few bugs:
a) Even with a purely radial pattern, the produced plot has a distortion whose origin I don't understand:
#example
r <- rep(seq(0.1, 0.9, len = 8), each = 8)
theta <- rep(seq(0, 7/4*pi, by = pi/4), times = 8)
x <- r*sin(theta)
y <- r*cos(theta)
z <- z <- rep(seq(0, 1, len = 8), each = 8)
PolarImageInterpolate(x, y, z)
why the wiggles between 300° and 360°? The z function is constant in theta, so there's no reason why there should be wiggles.
b) After 4 years, some of the packages loaded have been modified and at least one functionality of the function is broken. Setting interp.type = 2 should use thin plate splines for interpolation instead than a basic linear interpolation, but it doesn't work:
> PolarImageInterpolate(x, y, z, interp.type = 2)
Warning:
Grid searches over lambda (nugget and sill variances) with minima at the endpoints:
(GCV) Generalized Cross-Validation
minimum at right endpoint lambda = 9.493563e-06 (eff. df= 60.80002 )
predict.surface is now the function predictSurface
Error in image.default(x = minitics, y = minitics, Mat_plot, useRaster = TRUE, :
'z' must be a matrix
the first message is a warning and doesn't worry me, but the second one is actually an error and prevents me from using thin plate splines. Can you help me solve these two problems?
Also, I'd like to "upgrade" to using ggplot2, so if you can give an answer which does that, it would be great. Otherwise, after the bugs are fixed, I'll try asking a specific question which only asks to modify the function so that it uses ggplot2.
For the ggplot2 solution, here is a start. geom_raster allows interpolation, but does not work for polar coordinates. Instead, you can use geom_tile, though then you may need to do the interpolation yourself before passing the values to ggplot.
Of important note: the example data you gave gives an error when working with geom_raster that I believe is caused by the spacing of the values. Here is an example set that works (note, used this blog as a guide, though it is now outdated):
dat_grid <-
expand.grid(x = seq(0,350,10), y = 0:10)
dat_grid$density <- runif(nrow(dat_grid))
ggplot(dat_grid
, aes(x = x, y = y, fill = density)) +
geom_tile() +
coord_polar() +
scale_x_continuous(breaks = seq(0,360,90)) +
scale_fill_gradient2(low = "white"
, mid = "yellow"
, high = "red3"
, midpoint = 0.5)
If you are working from raw data, you might be able to get ggplot to do the work for you. Here is an example working from raw data. There are a lot of manual tinkering things to do, but it is at least an optional starting point:
polarData <-
data.frame(
theta = runif(10000, 0, 2*pi)
, r = log(abs(rnorm(10000, 0, 10)))
)
toCart <-
data.frame(
x = polarData$r * cos(polarData$theta)
, y = polarData$r * sin(polarData$theta)
)
axisLines <-
data.frame(
x = 0
, y = 0
, xend = max(polarData$r)*cos(seq(0, 2*pi, pi/4))
, yend = max(polarData$r)*sin(seq(0, 2*pi, pi/4))
, angle = paste(seq(0, 2, 1/4), "pi") )
ticks <-
data.frame(
label = pretty(c(0, max(polarData$r)) )[-1]
)
ggplot(toCart) +
# geom_point(aes(x = x, y = y)) +
stat_density_2d(aes(x = x, y = y
, fill = ..level..)
, geom = "polygon") +
scale_fill_gradient(low = "white"
, high = "red3") +
theme(axis.text = element_blank()
, axis.title = element_blank()
, axis.line = element_blank()
, axis.ticks = element_blank()) +
geom_segment(data = axisLines
, aes(x = x, y = y
, xend = xend
, yend = yend)) +
geom_label(data = axisLines
, aes(x = xend, y = yend, label = angle)) +
geom_label(data = ticks
, aes(x = 0, y = label, label = label))
From an another post, I came to know that the fucnction predict.surface from package fields is deprecated whic is used for interp.type = 2 in PolarImageInterpolate. Instead, a new predictSurface function is introduced, which can be used here.
Example:
r <- rep(seq(0.1, 0.9, len = 8), each = 8)
theta <- rep(seq(0, 7/4*pi, by = pi/4), times = 8)
x <- r*sin(theta)
y <- r*cos(theta)
z <- z <- rep(seq(0, 1, len = 8), each = 8)
PolarImageInterpolate(x, y, z, interp.type = 2)

How to plot a heat map with irregular data in ordinates in ggplot?

Suppose I want to plot the following data:
# First set of X coordinates
x <- seq(0, 10, by = 0.2)
# Angles from 0 to 90 degrees
angles <- seq(0, 90, length.out = 10)
# Convert to radian
angles <- deg2rad(angles)
# Create an empty data frame
my.df <- data.frame()
# For each angle, populate the data frame
for (theta in angles) {
y <- sin(x + theta)
tmp <- data.frame(x = x, y = y, theta = as.factor(theta))
my.df <- rbind(my.df, tmp)
}
x1 <- seq(0, 12, by = 0.3)
y1 <- sin(x1 - 0.5)
tmp <- data.frame(x = x1, y = y1, theta = as.factor(-0.5))
my.df <- rbind(my.df, tmp)
ggplot(my.df, aes(x, y, color = theta)) + geom_line()
That gives me a nice plot:
Now I want to draw a heat map out of this data set. There are tutorials here and there that do it using geom_tile to do it.
So, let's try:
# Convert the angle values from factors to numerics
my.df$theta <- as.numeric(levels(my.df$theta))[my.df$theta]
ggplot(my.df, aes(theta, x)) + geom_tile(aes(fill = y)) + scale_fill_gradient(low = "blue", high = "red")
That does not work, and the reason is that my x coordinates do not have the same step:
x <- seq(0, 10, by = 0.2) vs x1 <- seq(0, 12, by = 0.3)
But as soon as I use the same step x1 <- seq(0, 12, by = 0.2), it works:
I real life, my data sets are not regularly spaced (these are experimental data), but I still need to display them as a heat map. How can I do?
You can use akima to interpolate the function into a form suitable for heat map plots.
library(akima)
library(ggplot2)
my.df.interp <- interp(x = my.df$theta, y = my.df$x, z = my.df$y, nx = 30, ny = 30)
my.df.interp.xyz <- as.data.frame(interp2xyz(my.df.interp))
names(my.df.interp.xyz) <- c("theta", "x", "y")
ggplot(my.df.interp.xyz, aes(x = theta, y = x, fill = y)) + geom_tile() +
scale_fill_gradient(low = "blue", high = "red")
If you wish to use a different resolution you can change the nx and ny arguments to interp.
Another way to do it with just ggplot2 is to use stat_summary_2d.
library(ggplot2)
ggplot(my.df, aes(x = theta, y = x, z = y)) + stat_summary_2d(binwidth = 0.3) +
scale_fill_gradient(low = "blue", high = "red")

R: How to add the noise cluster into DBSCAN plot

I'm trying to plot DBSCAN results. This is what I have done so far. My distance matrix is here.
dbs55_CR_EUCL = dbscan(writeCRToMatrix,eps=0.006, MinPts = 4, method = "dist")
plot(writeCRToMatrix[dbs55_CR_EUCL$cluster>0,],
col=dbs55_CR_EUCL$cluster[dbs55_CR_EUCL$cluster>0],
main="DBSCAN Clustering K = 4 \n (EPS=0.006, MinPts=4) without noise",
pch = 20)
This is the plot:
When I tried plotting all the clusters including the noise cluster I could only see 2 points in my plot.
What I'm looking for are
To add the points in the noise cluster to the plot but with a different symbol. Something similar to the following picture
Shade the cluster areas like in the following picture
Noise clusters have an id of 0. R plots usually ignore a color of 0 so if you want to show the noise points (as black) then you need to do the following:
plot(writeCRToMatrix,
col=dbs55_CR_EUCL$cluster+1L,
main="DBSCAN Clustering K = 4 \n (EPS=0.006, MinPts=4) with noise",
pch = 20)
If you want a different symbol for noise then you could do the following (adapted from the man page):
library(dbscan)
n <- 100
x <- cbind(
x = runif(10, 0, 10) + rnorm(n, sd = 0.2),
y = runif(10, 0, 10) + rnorm(n, sd = 0.2)
)
res <- dbscan::dbscan(x, eps = .2, minPts = 4)
plot(x, col=res$cluster, pch = 20)
points(x[res$cluster == 0L], col = "grey", pch = "+")
Here is code that will create a shaded convex hull for each cluster
library(ggplot2)
library(data.table)
library(dbscan)
dt <- data.table(x, level=as.factor(res$cluster), key = "level")
hulls <- dt[, .SD[chull(x, y)], by = level]
### get rid of hull for noise
hulls <- hulls[level != "0",]
cols <- c("0" = "grey", "1" = "red", "2" = "blue")
ggplot(dt, aes(x=x, y=y, color=level)) +
geom_point() +
geom_polygon(data = hulls, aes(fill = level, group = level),
alpha = 0.2, color = NA) +
scale_color_manual(values = cols) +
scale_fill_manual(values = cols)
Hope this helps.

Resources