I'd like to create a Sankey-like plot in ggplot2 with curved lines between my start and end locations. Currently, I have data that looks like this:
df <- data.frame(Line = rep(letters[1:4], 2),
Location = rep(c("Start", "End"), each=4),
X = rep(c(1, 10), each = 4),
Y = c(c(1,3, 5, 15), c(9,12, 14, 6)),
stringsAsFactors = F)
For example, the rows for line a are:
  Line Location  X Y
1    a    Start  1 1
5    a      End 10 9
The following code creates a plot with straight lines between the points, something like this:
library(ggplot2)
ggplot(df) +
geom_path(aes(x= X, y= Y, group = Line))
I would like to see the data come out like this:
This is another option for setting up the data:
df2 <- data.frame(Line = letters[1:4],
Start.X= rep(1, 4),
Start.Y = c(1,3,5,15),
End.X = rep(10, 4),
End.Y = c(9,12,14,6))
For example, the first row is:
Line Start.X Start.Y End.X End.Y
1 a 1 1 10 9
I can find examples of how to add a curve to a base R plot, but they don't show how to get a data frame of the intermediate points needed to draw that curve. I would prefer to use dplyr for data manipulation. I imagine this will require a for-loop to build a table of the interpolated points.
These examples are similar but do not produce an S-shaped curve:
Plotting lines on map - gcIntermediate
http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/
Thank you in advance!
The code below creates curved lines with a logistic function. You could use any function you like instead, but this is the main idea. Note that for anything other than graphical purposes, drawing a curved line through just 2 points is a bad idea: it suggests a particular kind of relationship that the data don't actually show.
df <- data.frame(Line = rep(letters[1:4], 2),
Location = rep(c("Start", "End"), each=4),
X = rep(c(1, 10), each = 4),
Y = c(c(1,3, 5, 15), c(9,12, 14, 6)),
stringsAsFactors = F)
# logistic function for curved lines
logistic = function(x, y, midpoint = mean(x)) {
ry = range(y)
if (y[1] < y[2]) {
sign = 2
} else {
sign = -2
}
steepness = sign*diff(range(x)) / diff(ry)
out = (ry[2] - ry[1]) / (1 + exp(-steepness * (x - midpoint))) + ry[1]
return(out)
}
# an example
x = c(1, 10)
y = c(1, 9)
xnew = seq(1, 10, .5)
ynew = logistic(xnew, y)
plot(x, y, type = 'b', bty = 'n', las = 1)
lines(xnew, ynew, col = 2, type = 'b')
# applying the function to your example
xnew = seq(min(df$X), max(df$X), .1) # new x grid
uniq = unique(df$Line) # loop over all unique values in df$Line
m = matrix(NA, length(xnew), length(uniq)) # matrix to store results, one column per line
for (i in seq_along(uniq)) {
m[, i] = logistic(xnew, df$Y[df$Line == uniq[i]])
}
# base R plot
matplot(xnew, m, type = 'b', las = 1, bty = 'n', pch = 1)
# put stuff in a dataframe for ggplot
df2 = data.frame(x = rep(xnew, ncol(m)),
y = c(m),
group = factor(rep(uniq, each = nrow(m)))) # label curves with the same names as df$Line
library(ggplot2)
ggplot(df) +
geom_path(aes(x= X, y= Y, group = Line, color = Line)) +
geom_line(data = df2, aes(x = x, y = y, group = group, color = group))
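Since the question asks for dplyr and for a table of interpolated points, here is a minimal sketch that builds that table from the wide layout (Start.X, Start.Y, End.X, End.Y) without an explicit for-loop. It swaps in a hypothetical helper `ease()`: a logistic curve rescaled so each curve starts and ends exactly at its Start and End points (the `logistic()` above only approaches them). The number of points `n_pts` and the steepness `k` are arbitrary choices, not values from the answer.

```r
library(dplyr)
library(ggplot2)

# Wide-format data from the question (one row per line)
df_wide <- data.frame(Line = letters[1:4],
                      Start.X = rep(1, 4),
                      Start.Y = c(1, 3, 5, 15),
                      End.X = rep(10, 4),
                      End.Y = c(9, 12, 14, 6))

n_pts <- 50  # points per curve (assumption: enough to look smooth)

# Hypothetical helper: logistic easing rescaled so ease(0) == 0 and ease(1) == 1;
# k controls how steep the middle of the S is
ease <- function(t, k = 10) {
  raw <- 1 / (1 + exp(-k * (t - 0.5)))
  lo  <- 1 / (1 + exp(k / 2))   # raw value at t = 0
  hi  <- 1 / (1 + exp(-k / 2))  # raw value at t = 1
  (raw - lo) / (hi - lo)
}

curves <- df_wide[rep(seq_len(nrow(df_wide)), each = n_pts), ] %>%
  group_by(Line) %>%
  mutate(t = seq(0, 1, length.out = n()),                # 0 at Start, 1 at End
         x = Start.X + t * (End.X - Start.X),             # evenly spaced in x
         y = Start.Y + ease(t) * (End.Y - Start.Y)) %>%   # S-shaped in y
  ungroup()

p_curves <- ggplot(curves, aes(x = x, y = y, group = Line, color = Line)) +
  geom_path()
```

Print `p_curves` to draw the plot. A larger `k` gives a sharper S; a smaller one gives a gentler, nearly straight curve.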
Related
I have a data.frame with two factor variables (type and age in df below) and a single numeric variable (value in df) that I'd like to plot using R's plotly package as a grouped boxplot.
Here's the data.frame:
set.seed(1)
df <- data.frame(type = c(rep("t1", 1000), rep("t2", 1000), rep("t3", 1000), rep("t4", 1000), rep("t5", 1000), rep("t6", 1000)),
age = rep(c(rep("y", 500),rep("o", 500)), 6),
value = rep(c(runif(500, 5, 10), runif(500, 7.5, 12.5)), 6),
stringsAsFactors = F)
df$age <- factor(df$age, levels = c("y", "o"), ordered = T)
Here's how I'm currently plotting it:
library(plotly)
library(dplyr)
plot_ly(x = df$type, y = df$value, name = df$age, color = df$type, type = "box", showlegend = F) %>%
layout(yaxis = list(title = "Diversity"), boxmode = "group", boxgap = 0, boxgroupgap = 0)
Which gives:
My question: is it possible to color the lines of the boxes by df$age?
I know that for coloring all the boxes with a single color (e.g., #AFB1B5) I can add to the plot_ly function:
line = list(color = "#AFB1B5")
But that would color all box lines similarly whereas what I'm trying to do is to color them differently by df$age.
There is a way to do this that's not too complicated, but rather ugly, and another that is over-the-top cumbersome (I didn't realize how far I was digging until I was done...).
Before I go too far... I noticed that there is a ton of white space and that you have gaps set to zero. You can add the parameter offsetgroup and get rid of a lot more whitespace. Check it out:
plot_ly(data = df,
x = ~type, y = ~value, name = ~age, offsetgroup = ~type, # <- I'm new!
color = ~type, type = "box", showlegend = F) %>%
layout(yaxis = list(title = "Diversity"),
boxmode = "group", boxgap = 0, boxgroupgap = 0)
With the not-too-complicated-but-kind-of-ugly method
Here, the line covers the box outline, the median line, the lines from Q1 to the lower fence and from Q3 to the upper fence, and the whiskers.
For this code, I assigned the plot above to the object plt. When I checked the object, it didn't have the data element yet, so I built the plot first.
plt <- plotly_build(plt)
Then I added colors with lapply.
# this looks ugly!
lapply(1:12,
function(i){
nm = plt$x$data[[i]]$name
cr = ifelse(nm == "o",
"#66FF66", "black")
plt$x$data[[i]]$line$color <<- cr # change graph by age
}
)
plt
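The `lapply()` call above works only through the `<<-` side effect on a global `plt`, and it prints a list of colours as it goes. A minimal sketch of the same idea as a plain loop that returns the modified plot; `recolor_by_name()` is a hypothetical helper, and it assumes, as the code above does, that each built trace's `name` is the age level ("o" or "y").

```r
# Hypothetical helper: set each trace's line colour from its name.
# Traces whose name is not in `colors` get `default`.
recolor_by_name <- function(plt, colors, default = "black") {
  for (i in seq_along(plt$x$data)) {
    nm <- plt$x$data[[i]]$name
    col <- if (length(nm) == 1 && nm %in% names(colors)) colors[[nm]] else default
    plt$x$data[[i]]$line$color <- col  # creates $line if it is missing
  }
  plt  # the modified copy; the caller's object is left unchanged
}
```

With the built plot, this would be used as `plt <- recolor_by_name(plotly_build(plt), c(o = "#66FF66"))`.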
With the ridiculous-amount-of-code-for-a-few-lines-but-looks-better method
I guess it isn't a few lines. It's 48 lines (4 for each of the 12 boxes).
For this method, you need to build the plot as I did before (plotly_build), so that the data element is in the plt object.
Then, for each type and age grouping, you have to determine the first and third quartiles, the IQR, and the max and min values lying between each quartile and its fence (1.5 * IQR beyond the box), so that you have the y values for the lines.
I wrote a function to get the upper and lower fences.
fen <- function(vals){
iq = 1.5 * IQR(vals)
q3 = quantile(vals, 3/4) # top of the box
uf = q3 + iq # top of the fence
vt = max(vals[vals > q3 & vals < uf]) # max value in range
q1 = quantile(vals, 1/4) # btm of the box
bf = q1 - iq # btm of the fence
vb = min(vals[vals < q1 & vals > bf]) # min value in range
sz = function(no){
if(length(no) > 1) {no = no[[1]]}
return(no)
}
vt = sz(vt)
vb = sz(vb)
return(list(vt, vb))
}
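Base R already computes whisker ends: `grDevices::boxplot.stats()` returns, in `$stats`, the lower whisker end, lower hinge, median, upper hinge and upper whisker end. A minimal sketch of an alternative to `fen()`; `whisker_ends()` is a hypothetical name. Two caveats: `boxplot.stats()` uses Tukey's hinges from `fivenum()`, which can differ slightly from the `quantile()` quartiles used here, and its limits are inclusive, whereas `fen()` uses strict inequalities.

```r
# Hypothetical helper: whisker ends via base R's boxplot.stats().
# stats[1] / stats[5] are the most extreme observations that are not
# more than coef * (hinge spread) beyond the hinges.
whisker_ends <- function(vals, coef = 1.5) {
  s <- grDevices::boxplot.stats(vals, coef = coef)$stats
  c(upper = s[5], lower = s[1])
}

whisker_ends(c(1:10, 50))  # 50 is an outlier, so: upper 10, lower 1
```

In the `summarise()` call below this would read, e.g., `ufen = whisker_ends(value)[["upper"]]` and `dfen = whisker_ends(value)[["lower"]]`.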
Then I used this function and the data to determine the remaining values needed to draw the lines.
df1 <- df %>%
# have to reverse the order or it won't line up
mutate(age = factor(age, levels = c("o", "y"), ordered = T)) %>%
group_by(type, age) %>%
summarise(ufen = fen(value)[[1]], # top of the fence
q3 = quantile(value, 3/4), # top of the box
q1 = quantile(value, 1/4), # btm of the box
dfen = fen(value)[[2]]) # btm of the fence
To plot these new lines, I used shapes, which are roughly equivalent to ggplot2 annotations (annotations in plotly are primarily for text).
There are several steps to drawing these lines. First comes a helper holding the settings that are the same for every line. After that is a vector that helps place the lines on the x-axis.
# line shape basics; the same for every line
tellMe <- function(shade){
list(type = "line",
line = list(color = shade),
xref = "paper",
yref = "y")
}
# setup for placing lines on the x-axis; these are % of space
xers = c(rep(.0825, 4), rep(.083 * 3, 4), rep(.083 * 5, 4))
Now four lapply statements: the upper fences, the lower fences, the upper whiskers, and the lower whiskers.
lns <- lapply(1:12,
function(i) { # upper fence lines
nm = ifelse(df1[i, ]$age == "o",
"#66FF66", "black")
xb = 1/12 * (i - 1)
xn = xb + (1/6 * xers[[i]])
more = tellMe(nm)
c(x0 = xn, x1 = xn,
y0 = df1[i, ]$q3[[1]], # it's named; this makes it val only
y1 = df1[i, ]$ufen, more)
})
mlns <- lapply(1:12,
function(i) { # lower fence lines
nm = ifelse(df1[i, ]$age == "o",
"#66FF66", "black")
xb = 1/12 * (i - 1)
xn = xb + (1/6 * xers[[i]])
more = tellMe(nm)
c(x0 = xn, x1 = xn,
y0 = df1[i, ]$q1[[1]], # it's named; this makes it val only
y1 = df1[i, ]$dfen, more)
})
# default whisker width is 1/2 the width of the box
# current boxes of 1/4 of the space by type
# with domain [0, 1], the box width is 1/12 * .5, so
# the whisker width is
ww = 1/12 * .5 *.5
# already have the center, so half on each side...
ww = ww * .5
wwlns <- lapply(1:12,
function(i) { # upper fence whisker
nm = ifelse(df1[i, ]$age == "o",
"#66FF66", "black")
xb = 1/12 * (i - 1)
xn = xb + (1/6 * xers[[i]])
more = tellMe(nm)
c(x0 = xn - ww, x1 = xn + ww,
y0 = df1[i, ]$ufen, y1 = df1[i, ]$ufen,
more)
})
wwm <- lapply(1:12,
function(i) { # lower fence whisker
nm = ifelse(df1[i, ]$age == "o",
"#66FF66", "black")
xb = 1/12 * (i - 1)
xn = xb + (1/6 * xers[[i]])
more = tellMe(nm)
c(x0 = xn - ww, x1 = xn + ww,
y0 = df1[i, ]$dfen, y1 = df1[i, ]$dfen,
more)
})
Now you have to concatenate the lists and add them to the plot.
# combine shapes
shp <- append(lns, mlns)
shp <- append(shp, wwlns)
shp <- append(shp, wwm)
plt %>% layout(shapes = shp)
There are OBVIOUSLY better color choices out there.
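The four `lapply()` calls above differ only in their end points, so they can be collapsed into one loop. A minimal sketch, assuming `df1`, `xers` and `ww` as defined above (columns `age`, `q3`, `q1`, `ufen`, `dfen`); `seg_shape()` and `build_shapes()` are hypothetical helpers, the colours are the ones used above, and the shapes come out grouped per box rather than per line type, which should not affect the drawing.

```r
# Hypothetical helper: one plotly "line" shape, x in paper units, y in data units
seg_shape <- function(x0, x1, y0, y1, shade) {
  list(type = "line", line = list(color = shade),
       xref = "paper", yref = "y",
       x0 = x0, x1 = x1, y0 = y0, y1 = y1)
}

# Hypothetical helper: the 4 lines per box (two fence lines, two whisker caps)
build_shapes <- function(df1, xers, ww, col_o = "#66FF66", col_y = "black") {
  shapes <- list()
  for (i in seq_len(nrow(df1))) {
    shade <- if (df1$age[i] == "o") col_o else col_y
    xn <- 1/12 * (i - 1) + 1/6 * xers[[i]]  # same x placement as above
    q3 <- unname(df1$q3[i])                 # quantile() output may be named
    q1 <- unname(df1$q1[i])
    up <- df1$ufen[i]
    dn <- df1$dfen[i]
    shapes <- c(shapes, list(
      seg_shape(xn, xn, q3, up, shade),           # upper fence line
      seg_shape(xn, xn, q1, dn, shade),           # lower fence line
      seg_shape(xn - ww, xn + ww, up, up, shade), # upper whisker
      seg_shape(xn - ww, xn + ww, dn, dn, shade)  # lower whisker
    ))
  }
  shapes
}
```

It would replace the four `lapply()` calls and the `append()` steps: `plt %>% layout(shapes = build_shapes(df1, xers, ww))`.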
When I run the code below on my data, the resulting plot is almost useless because its 35 variables overlap so much. I can't find anywhere that gives me the list of data used to make the plot. For instance, I have a variable called avg_sour whose arrow has a direction of about 272 degrees and a magnitude of 1; it's one of the few I can actually see. If I had this data in a table, however, I could see clearly what I'm looking for without zooming in and out every time. This is also for a presentation, so the audience needs to take it in quickly without looking at multiple things, but I think I could get away with a crowded graph plus a table explaining the crowded portion. It seems like it ought to be simple, but I haven't found it yet. Any ideas? I can use any package I can find.
ggbiplot(xD4PCA,obs.scale = .1, var.scale = 1,
varname.size = 3, labels.size=6, circle = T, alpha = 0, center = T)+
scale_x_continuous(limits=c(-2,2)) +
scale_y_continuous(limits=c(-2,2))
If your xD4PCA comes from the prcomp function, then its $rotation element gives you the eigenvectors (see the Value section of ?prcomp).
You can manually choose which arrows to add from xD4PCA$rotation[, 1:2].
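If what you mainly need is the table behind the arrows, each arrow's direction and length can be computed directly from `$rotation`. A minimal sketch, using iris as a stand-in for your data (replace `pca` with your own prcomp result, e.g. xD4PCA); `pca` and `arrow_table` are illustrative names. In ggbiplot's source (see below), with var.scale = 1 each loading column is multiplied by that component's standard deviation and then all arrows are shrunk by one common factor, so the angles and length ratios here should match the drawn arrows. Angles are measured counter-clockwise from the positive PC1 axis, which is an assumption about the convention you used.

```r
# iris stands in for your data
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Loadings scaled by each PC's standard deviation (ggbiplot with var.scale = 1)
v <- sweep(pca$rotation[, 1:2], 2, pca$sdev[1:2], FUN = "*")

arrow_table <- data.frame(
  variable  = rownames(v),
  PC1       = v[, 1],
  PC2       = v[, 2],
  angle_deg = (atan2(v[, 2], v[, 1]) * 180 / pi) %% 360,  # 0 <= angle < 360
  length    = sqrt(v[, 1]^2 + v[, 2]^2),
  row.names = NULL
)

# Sorted by direction, so crowded groups of arrows sit next to each other
arrow_table[order(arrow_table$angle_deg), ]
```

Sorting by angle puts arrows that overlap on the plot next to each other in the table, which may be enough to explain the crowded portion of the slide.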
I worked on this with sample data ir.pca, which is just a simple prcomp object fit to the iris data, and everything below is based on the source code of ggbiplot.
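library(ggplot2)
library(scales) # for muted()
ir.pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE) # assumption: any prcomp fit on iris works here
obs.scale <- .1 # same setting as in the ggbiplot() call below
circle.prob <- 0.69 # ggbiplot's default
pcobj <- ir.pca # change here with your prcomp object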
nobs.factor <- sqrt(nrow(pcobj$x) - 1)
d <- pcobj$sdev
u <- sweep(pcobj$x, 2, 1 / (d * nobs.factor), FUN = '*')
v <- pcobj$rotation
choices = 1:2
choices <- pmin(choices, ncol(u))
df.u <- as.data.frame(sweep(u[,choices], 2, d[choices]^obs.scale, FUN='*'))
v <- sweep(v, 2, d^1, FUN='*')
df.v <- as.data.frame(v[, choices])
names(df.u) <- c('xvar', 'yvar')
names(df.v) <- names(df.u)
df.u <- df.u * nobs.factor
r <- sqrt(qchisq(circle.prob, df = 2)) * prod(colMeans(df.u^2))^(1/4)
v.scale <- rowSums(v^2)
df.v <- r * df.v / sqrt(max(v.scale))
df.v$varname <- rownames(v)
df.v$angle <- with(df.v, (180/pi) * atan(yvar / xvar))
df.v$hjust = with(df.v, (1 - 1.5 * sign(xvar)) / 2)
theta <- c(seq(-pi, pi, length = 50), seq(pi, -pi, length = 50))
circle <- data.frame(xvar = r * cos(theta), yvar = r * sin(theta))
df.v <- df.v[1:2,] # keep only the variables (rows) whose arrows you want to show
ggbiplot::ggbiplot(ir.pca,obs.scale = .1, var.scale = 1,
varname.size = 3, labels.size=6, circle = T, alpha = 0, center = T, var.axes = FALSE)+
scale_x_continuous(limits=c(-2,2)) +
scale_y_continuous(limits=c(-2,2)) +
geom_segment(data = df.v, aes(x = 0, y = 0, xend = xvar, yend = yvar),
arrow = arrow(length = unit(1/2, 'picas')),
color = muted('red')) +
geom_text(data = df.v,
aes(label = rownames(df.v), x = xvar, y = yvar,
angle = angle, hjust = hjust),
color = 'darkred', size = 3)
ggbiplot::ggbiplot(ir.pca)+
scale_x_continuous(limits=c(-2,2)) +
scale_y_continuous(limits=c(-2,2)) +
geom_path(data = circle, color = muted('white'),
size = 1/2, alpha = 1/3)
Original plot (all four variables):
Edited plot (only the first two variables):
I have a large dataset of gene expression from ~10,000 patient samples (TCGA), and I'm plotting a predicted expression value (x) and the actual observed value (y) of a certain gene signature. For my downstream analysis, I need to draw a precise line through the plot and calculate different parameters in samples above/below the line.
No matter how I draw a line through the data (geom_smooth(method = 'lm', 'glm', 'gam', or 'loess')), the line always seems imperfect - it doesn't cut through the data to my liking (red line is lm in figure).
After playing around for a while, I realized that the 2d kernel density lines (geom_density2d) actually do a good job of showing the slope/trends of my data, so I manually drew a line that kind of cuts through the density lines (black line in figure).
My question: how can I automatically draw a line that cuts through the kernel density lines, as for the black line in the figure? (Rather than manually playing with different intercepts and slopes till something looks good).
The best approach I can think of is to somehow calculate intercept and slope of the longest diameter for each of the kernel lines, take an average of all those intercepts and slopes and plot that line, but that's a bit out of my league. Maybe someone here has experience with this and can help?
A more hacky approach may be getting the x,y coords of each kernel density line from ggplot_build, and going from there, but it feels too hacky (and is also out of my league).
Thanks!
EDIT: Changed a few details to make the figure/analysis easier. (Density lines are smoother now).
Reprex:
library(MASS)
library(ggplot2)
set.seed(123)
samples <- 10000
r <- 0.9
data <- mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(2, r, r, 2), nrow=2))
x <- data[, 1] # normal with mean 0, variance 2 (sd = sqrt(2))
y <- data[, 2] # normal with mean 0, variance 2; cov(x, y) = 0.9
test.df <- data.frame(x = x, y = y)
lm(y ~ x, test.df)
ggplot(test.df, aes(x, y)) +
geom_point(color = 'grey') +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = c(2,2)) + ### EDIT: h = c(2,2)
geom_smooth(method = "glm", se = F, lwd = 1, color = 'red') +
geom_abline(intercept = 0, slope = 0.7, lwd = 1, col = 'black') ## EDIT: slope to 0.7
Figure:
I generally agree with @Hack-R.
However, it was kind of a fun problem and looking into ggplot_build is not such a big deal.
require(dplyr)
require(ggplot2)
p <- ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = c(2,2))
#basic version of your plot
p_built <- ggplot_build(p)
p_data <- p_built$data[[1]]
p_maxring <- p_data[p_data[['level']] == min(p_data[['level']]),] %>%
select(x,y) # extracts the x/y coordinates of the points on the largest ellipse from your 2d-density contour
Now this answer helped me to find the points on this ellipse which are furthest apart.
coord_mean <- c(x = mean(p_maxring$x), y = mean(p_maxring$y))
p_maxring <- p_maxring %>%
mutate (mean_dev = sqrt((x - mean(x))^2 + (y - mean(y))^2)) #extra column specifying the distance of each point to the mean of those points
coord_farthest <- c('x' = p_maxring$x[which.max(p_maxring$mean_dev)], 'y' = p_maxring$y[which.max(p_maxring$mean_dev)])
# gives the coordinates of the point farthest away from the mean point
farthest_from_farthest <- sqrt((p_maxring$x - coord_farthest['x'])^2 + (p_maxring$y - coord_farthest['y'])^2)
#now this looks which of the points is the farthest from the point farthest from the mean point :D
coord_fff <- c('x' = p_maxring$x[which.max(farthest_from_farthest)], 'y' = p_maxring$y[which.max(farthest_from_farthest)])
ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = c(2,2)) +
# geom_segment using the coordinates of the points farthest apart
geom_segment((aes(x = coord_farthest['x'], y = coord_farthest['y'],
xend = coord_fff['x'], yend = coord_fff['y']))) +
geom_smooth(method = "glm", se = F, lwd = 1, color = 'red') +
# as per your request with your geom_smooth line
coord_equal()
coord_equal is very important, because otherwise you will get weird-looking results (it messed with my brain too): if the axes are not scaled equally, the segment will seemingly not pass through the point farthest from the mean.
I leave it to you to build this into a function in order to automate it. I'll also leave it to you to calculate the y-intercept and slope from the two points.
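Computing the intercept and slope from the two points takes only a few lines. A minimal sketch; `line_through()` is a hypothetical helper that takes points as named vectors `c(x = ..., y = ...)`, the same shape as `coord_farthest` and `coord_fff` above.

```r
# Hypothetical helper: intercept and slope of the straight line through p1 and p2
line_through <- function(p1, p2) {
  slope <- (p2[["y"]] - p1[["y"]]) / (p2[["x"]] - p1[["x"]])
  c(intercept = p1[["y"]] - slope * p1[["x"]], slope = slope)
}

line_through(c(x = 0, y = 1), c(x = 2, y = 5))  # intercept 1, slope 2
```

With the code above, `ln <- line_through(coord_farthest, coord_fff)` can then feed `geom_abline(intercept = ln[["intercept"]], slope = ln[["slope"]])`. A vertical segment (equal x values) gives an infinite slope.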
Tjebo's approach looked good at first, but on closer inspection I found that it finds the longest distance between two points on the contour. While this is close to what I wanted, it fails when the contour is irregularly shaped or when it has only a few points, because it measures the distance between two existing points, whereas what I really wanted is the longest diameter of an ellipse, i.e., its major axis. See the image below for examples and details.
Briefly:
To find/draw density contours of specific density/percentage:
R - How to find points within specific Contour
To get the longest diameter (major axis) of an ellipse:
https://stackoverflow.com/a/18278767/3579613
For a function that returns the intercept and slope (as requested in the OP), see the last piece of code.
The two pieces of code and images below compare Tjebo's approach with my new approach.
#### Reprex from OP
require(dplyr)
require(ggplot2)
require(MASS)
set.seed(123)
samples <- 10000
r <- 0.9
data <- mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(2, r, r, 2), nrow=2))
x <- data[, 1] # normal with mean 0, variance 2 (sd = sqrt(2))
y <- data[, 2] # normal with mean 0, variance 2; cov(x, y) = 0.9
test.df <- data.frame(x = x, y = y)
#### From Tjebo
p <- ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = 2)
p_built <- ggplot_build(p)
p_data <- p_built$data[[1]]
p_maxring <- p_data[p_data[['level']] == min(p_data[['level']]),][,2:3]
coord_mean <- c(x = mean(p_maxring$x), y = mean(p_maxring$y))
p_maxring <- p_maxring %>%
mutate (mean_dev = sqrt((x - mean(x))^2 + (y - mean(y))^2)) #extra column specifying the distance of each point to the mean of those points
p_maxring = p_maxring[round(seq(1, nrow(p_maxring), nrow(p_maxring)/23)),] #### Keep only ~23 points of the ring to illustrate the flaws of this approach
coord_farthest <- c('x' = p_maxring$x[which.max(p_maxring$mean_dev)], 'y' = p_maxring$y[which.max(p_maxring$mean_dev)])
# gives the coordinates of the point farthest away from the mean point
farthest_from_farthest <- sqrt((p_maxring$x - coord_farthest['x'])^2 + (p_maxring$y - coord_farthest['y'])^2)
#now this looks which of the points is the farthest from the point farthest from the mean point :D
coord_fff <- c('x' = p_maxring$x[which.max(farthest_from_farthest)], 'y' = p_maxring$y[which.max(farthest_from_farthest)])
farthest_2_points = data.frame(t(cbind(coord_farthest, coord_fff)))
plot(p_maxring[,1:2], asp=1)
lines(farthest_2_points, col = 'blue', lwd = 2)
#### From answer in another post
library(cluster) # provides ellipsoidhull() and its predict() method
d = cbind(p_maxring[,1], p_maxring[,2])
r = ellipsoidhull(d)
exy = predict(r) ## the ellipsoid boundary
lines(exy)
me = colMeans((exy))
dist2center = sqrt(rowSums((t(t(exy)-me))^2))
max(dist2center) ## major axis
lines(exy[dist2center == max(dist2center),], col = 'red', lwd = 2)
#### The plot here is made from the data in the reprex in OP, but with h = 0.5
library(MASS)
set.seed(123)
samples <- 10000
r <- 0.9
data <- mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(2, r, r, 2), nrow=2))
x <- data[, 1] # normal with mean 0, variance 2 (sd = sqrt(2))
y <- data[, 2] # normal with mean 0, variance 2; cov(x, y) = 0.9
test.df <- data.frame(x = x, y = y)
## MAKE BLUE LINE
p <- ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = 0.5) ## NOTE h = 0.5
p_built <- ggplot_build(p)
p_data <- p_built$data[[1]]
p_maxring <- p_data[p_data[['level']] == min(p_data[['level']]),][,2:3]
coord_mean <- c(x = mean(p_maxring$x), y = mean(p_maxring$y))
p_maxring <- p_maxring %>%
mutate (mean_dev = sqrt((x - mean(x))^2 + (y - mean(y))^2))
coord_farthest <- c('x' = p_maxring$x[which.max(p_maxring$mean_dev)], 'y' = p_maxring$y[which.max(p_maxring$mean_dev)])
farthest_from_farthest <- sqrt((p_maxring$x - coord_farthest['x'])^2 + (p_maxring$y - coord_farthest['y'])^2)
coord_fff <- c('x' = p_maxring$x[which.max(farthest_from_farthest)], 'y' = p_maxring$y[which.max(farthest_from_farthest)])
## MAKE RED LINE
## h = 0.5
## Given the highly irregular shape of the contours, I will use only the largest contour line (0.95) for drawing the line.
## Thus, average = 1. See function below for details.
ln = long.diam("x", "y", test.df, h = 0.5, average = 1) ## NOTE h = 0.5
## PLOT
ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 0.5, contour = T, h = 0.5) + ## NOTE h = 0.5
geom_segment((aes(x = coord_farthest['x'], y = coord_farthest['y'],
xend = coord_fff['x'], yend = coord_fff['y'])), col = 'blue', lwd = 2) +
geom_abline(intercept = ln[1], slope = ln[2], color = 'red', lwd = 2) +
coord_equal()
Finally, I came up with the following function to deal with all this. Sorry for the lack of comments/clarity
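#### This will return the intercept and slope of the longest diameter (major axis).
#### If average = TRUE, it will average the intercept and slope across the density contours in probs;
#### if average is a number i, it returns the line for the i-th contour instead.
library(MASS) # kde2d()
library(cluster) # ellipsoidhull()
library(stringr) # str_sub()
long.diam = function(x, y, df, probs = c(0.95, 0.5, 0.1), average = T, h = 2) {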
fun.df = data.frame(cbind(df[,x], df[,y]))
colnames(fun.df) = c("x", "y")
dens = kde2d(fun.df$x, fun.df$y, n = 200, h = h)
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
sz <- sort(dens$z)
c1 <- cumsum(sz) * dx * dy
levels <- sapply(probs, function(x) {
approx(c1, sz, xout = 1 - x)$y
})
names(levels) = paste0("L", str_sub(formatC(probs, 2, format = 'f'), -2))
#plot(fun.df$x,fun.df$y, asp = 1)
#contour(dens, levels = levels, labels=probs, add=T, col = c('red', 'blue', 'green'), lwd = 2)
#contour(dens, add = T, col = 'red', lwd = 2)
#abline(lm(fun.df$y~fun.df$x))
ls <- contourLines(dens, levels = levels)
names(ls) = names(levels)
lines.info = list()
for (i in 1:length(ls)) {
d = cbind(ls[[i]]$x, ls[[i]]$y)
exy = predict(ellipsoidhull(d))## the ellipsoid boundary
colnames(exy) = c("x", "y")
me = colMeans((exy)) ## center of the ellipse
dist2center = sqrt(rowSums((t(t(exy)-me))^2))
#plot(exy,type='l',asp=1)
#points(d,col='blue')
#lines(exy[order(dist2center)[1:2],])
#lines(exy[rev(order(dist2center))[1:2],])
max.dist = data.frame(exy[rev(order(dist2center))[1:2],])
line.fit = lm(max.dist$y ~ max.dist$x)
lines.info[[i]] = c(as.numeric(line.fit$coefficients[1]), as.numeric(line.fit$coefficients[2]))
}
names(lines.info) = names(ls)
#plot(fun.df$x,fun.df$y, asp = 1)
#contour(dens, levels = levels, labels=probs, add=T, col = c('red', 'blue', 'green'), lwd = 2)
#abline(lines.info[[1]], col = 'red', lwd = 2)
#abline(lines.info[[2]], col = 'blue', lwd = 2)
#abline(lines.info[[3]], col = 'green', lwd = 2)
#abline(apply(simplify2array(lines.info), 1, mean), col = 'black', lwd = 4)
if (isTRUE(average)) {
apply(simplify2array(lines.info), 1, mean)
} else {
lines.info[[average]]
}
}
Here is the final implementation combining the different answers:
library(MASS)
set.seed(123)
samples = 10000
r = 0.9
data = mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(2, r, r, 2), nrow=2))
x = data[, 1] # normal with mean 0, variance 2 (sd = sqrt(2))
y = data[, 2] # normal with mean 0, variance 2; cov(x, y) = 0.9
#plot(x, y)
test.df = data.frame(x = x, y = y)
#### Find the two farthest-apart points of the contour
## BLUE
p <- ggplot(test.df, aes(x, y)) +
geom_density2d(color = 'red', lwd = 2, contour = T, h = 2)
p_built <- ggplot_build(p)
p_data <- p_built$data[[1]]
p_maxring <- p_data[p_data[['level']] == min(p_data[['level']]),][,2:3]
coord_mean <- c(x = mean(p_maxring$x), y = mean(p_maxring$y))
p_maxring <- p_maxring %>%
mutate (mean_dev = sqrt((x - mean(x))^2 + (y - mean(y))^2))
coord_farthest <- c('x' = p_maxring$x[which.max(p_maxring$mean_dev)], 'y' = p_maxring$y[which.max(p_maxring$mean_dev)])
farthest_from_farthest <- sqrt((p_maxring$x - coord_farthest['x'])^2 + (p_maxring$y - coord_farthest['y'])^2)
coord_fff <- c('x' = p_maxring$x[which.max(farthest_from_farthest)], 'y' = p_maxring$y[which.max(farthest_from_farthest)])
#### Find the average intercept and slope of 3 contour lines (0.95, 0.5, 0.1), as in my long.diam function above.
## RED
ln = long.diam("x", "y", test.df)
#### Plot everything. Black line is GLM
ggplot(test.df, aes(x, y)) +
geom_point(color = 'grey') +
geom_density2d(color = 'red', lwd = 1, contour = T, h = 2) +
geom_smooth(method = "glm", se = F, lwd = 1, color = 'black') +
geom_abline(intercept = ln[1], slope = ln[2], col = 'red', lwd = 1) +
geom_segment((aes(x = coord_farthest['x'], y = coord_farthest['y'],
xend = coord_fff['x'], yend = coord_fff['y'])), col = 'blue', lwd = 1) +
coord_equal()
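A simpler route to a line along the long axis of the contours, swapped in here rather than taken from the answers above, is major-axis (orthogonal) regression: the first eigenvector of the data's covariance matrix, i.e. the first principal component. For roughly bivariate-normal data like the reprex, the density contours are approximately ellipses centred on the mean with their long axis along this eigenvector. Unlike `long.diam()`, it ignores irregular contour shapes. A minimal sketch; `major_axis_line()` is a hypothetical helper.

```r
library(MASS)  # mvrnorm(), as in the reprex

set.seed(123)
r <- 0.9
data <- mvrnorm(n = 10000, mu = c(0, 0), Sigma = matrix(c(2, r, r, 2), nrow = 2))
test.df <- data.frame(x = data[, 1], y = data[, 2])

# Hypothetical helper: line through the centroid along the direction of
# greatest spread (first eigenvector of the 2 x 2 covariance matrix)
major_axis_line <- function(x, y) {
  e <- eigen(cov(cbind(x, y)), symmetric = TRUE)
  v <- e$vectors[, 1]                     # eigenvector of the largest eigenvalue
  slope <- v[2] / v[1]
  intercept <- mean(y) - slope * mean(x)  # passes through (mean(x), mean(y))
  c(intercept = intercept, slope = slope)
}

ln_pca <- major_axis_line(test.df$x, test.df$y)
ln_pca
```

In this reprex both variances are 2, so the population major axis is the 45-degree line (slope 1) and the estimate should be close to it. Draw it with `geom_abline(intercept = ln_pca[["intercept"]], slope = ln_pca[["slope"]])` together with `coord_equal()`.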
I would like to plot individual subject means for two different conditions in a lattice stripplot with two panels. I would also like to add within-subject confidence intervals that I have calculated and stored in separate data frame. I am trying to overlay these confidence intervals with latticeExtra's layer function. When I add the layer, either both sets of intervals display on both panels (as illustrated in code and first image below) or both sets of intervals display on only the first panel if I add [subscripts] to the x's and y's in the layer command (illustrated in second code clip and image below). How do I get the appropriate intervals to display on the appropriate panel?
library(latticeExtra)
raw_data <- data.frame(subject = rep(1:6, 4), cond1 = as.factor(rep(1:2, each = 12)), cond2 = rep(rep(c("A", "B"), each = 6), 2), response = c(2:7, 6:11, 3:8, 7:12))
summary_data <- data.frame(cond1 = as.factor(rep(1:2, each = 2)), cond2 = rep(c("A", "B"), times = 2), mean = aggregate(response ~ cond2 * cond1, raw_data, mean)$response, within_ci = c(0.57, 0.54, 0.6, 0.63))
summary_data$lci <- summary_data$mean - summary_data$within_ci
summary_data$uci <- summary_data$mean + summary_data$within_ci
subject_stripplot <- stripplot(response ~ cond1 | cond2, groups = subject, data = raw_data,
panel = function(x, y, ...) {
panel.stripplot(x, y, type = "b", lty = 2, ...)
panel.average(x, y, fun = mean, lwd = 2, col = "black", ...) # plot line connecting means
}
)
addWithinCI <- layer(panel.segments(x0 = cond1, y0 = lci, x1 = cond1, y1 = uci, subscripts = TRUE), data = summary_data, under = FALSE)
plot(subject_stripplot + addWithinCI)
Stripplot with both sets of intervals on both panels:
addWithinCI2 <- layer(panel.segments(x0 = cond1[subscripts], y0 = lci[subscripts], x1 = cond1[subscripts], y1 = uci[subscripts], subscripts = TRUE), data = summary_data, under = FALSE)
plot(subject_stripplot + addWithinCI2)
Stripplot with both sets of intervals on only the first panel
One possible solution would be to print the stripplot (e.g., inside a png or any other graphics device) and subsequently modify each sub-panel using trellis.focus.
## display stripplot
print(subject_stripplot)
## loop over groups
for (i in c("A", "B")) {
# subset of current group
dat <- subset(summary_data, cond2 == i)
# add intervals to current panel
trellis.focus(name = "panel", column = ifelse(i == "A", 1, 2), row = 1)
panel.segments(x0 = dat$cond1, y0 = dat$lci,
x1 = dat$cond1, y1 = dat$uci, subscripts = TRUE)
trellis.unfocus()
}
Another (possibly more convenient) solution would be to create a separate xyplot and manually set the lower and upper y values (y0, y1) passed to panel.segments depending on the current panel.number(). Unlike the trellis.focus approach, the resulting plot can be stored in a variable and is therefore available for further processing in R.
p_seg <- xyplot(lci ~ cond1 | cond2, data = summary_data, ylim = c(1, 13),
panel = function(...) {
# lower and upper y values
y0 <- list(summary_data$lci[c(1, 3)], summary_data$lci[c(2, 4)])
y1 <- list(summary_data$uci[c(1, 3)], summary_data$uci[c(2, 4)])
# insert vertical lines depending on current panel
panel.segments(x0 = 1:2, x1 = 1:2,
y0 = y0[[panel.number()]],
y1 = y1[[panel.number()]])
})
p_comb <- subject_stripplot +
as.layer(p_seg)
# print(p_comb)
Another solution that does not require latticeExtra (from Duncan Mackay):
summary_data$cond3 <- sapply(summary_data$cond2, pmatch, LETTERS)
mypanel <- function(x, y, ..., lci, uci, scond1, scond3, groups, type, lty){
pnl = panel.number()
panel.xyplot(x, y, ..., groups = groups, type = type, lty = lty)
panel.average(x, y, horizontal = FALSE, col = "black", lwd = 3)
panel.segments(x0 = scond1[scond3 == pnl],
y0 = lci[scond3 == pnl],
x1 = scond1[scond3 == pnl],
y1 = uci[scond3 == pnl])
}
with(summary_data,
stripplot(response ~ cond1 | cond2, data = raw_data,
groups = subject,
lci = lci,
uci = uci,
scond1 = summary_data$cond1,
scond3 = cond3,
type = "b",
lty = 2,
panel = mypanel)
)
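A variant that avoids hard-coding row indices for each panel: a minimal sketch that asks lattice which packet (level of cond2) the current panel draws, via `which.packet()`, and subsets summary_data accordingly. It needs only lattice (no latticeExtra). `ci_panel()` is a hypothetical helper, and it assumes cond2's levels sort the same way (alphabetically) in both data frames.

```r
library(lattice)

# Data from the question
raw_data <- data.frame(subject = rep(1:6, 4),
                       cond1 = as.factor(rep(1:2, each = 12)),
                       cond2 = rep(rep(c("A", "B"), each = 6), 2),
                       response = c(2:7, 6:11, 3:8, 7:12))
summary_data <- data.frame(cond1 = as.factor(rep(1:2, each = 2)),
                           cond2 = rep(c("A", "B"), times = 2),
                           mean = aggregate(response ~ cond2 * cond1, raw_data, mean)$response,
                           within_ci = c(0.57, 0.54, 0.6, 0.63))
summary_data$lci <- summary_data$mean - summary_data$within_ci
summary_data$uci <- summary_data$mean + summary_data$within_ci

# Hypothetical helper: returns a panel function that draws the subject lines,
# the mean line, and only the current panel's intervals
ci_panel <- function(ci_data, cond_var = "cond2") {
  lev <- levels(factor(ci_data[[cond_var]]))
  function(x, y, ...) {
    panel.stripplot(x, y, type = "b", lty = 2, ...)
    panel.average(x, y, fun = mean, horizontal = FALSE, lwd = 2, col = "black")
    cur <- ci_data[ci_data[[cond_var]] == lev[which.packet()[1]], ]
    panel.segments(x0 = as.numeric(cur$cond1), y0 = cur$lci,
                   x1 = as.numeric(cur$cond1), y1 = cur$uci)
  }
}

p_ci <- stripplot(response ~ cond1 | cond2, groups = subject, data = raw_data,
                  panel = ci_panel(summary_data))
```

`print(p_ci)` draws the plot. Because the subset is chosen by level rather than by position, reordering the rows of summary_data should not move intervals to the wrong panel.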
I am using the function joinPolys in the R package PBSmapping to find intersections between polygons. However it is giving a NULL output with my data, even though I am pretty sure the intersection is non-empty.
I've created an example from https://code.google.com/p/pbs-mapping/issues/detail?id=31. In the link, the code is designed to show a case where the code does work (but doesn't work for me). The example is as follows:
Code that does not work:
require(PBSmapping)
polyA <- data.frame(PID=rep(1,4),POS=1:4,X=c(0,1,1,0),Y=c(0,0,1,1))
polyB <- data.frame(PID=rep(1,4),POS=1:4,X=c(.5,1.5,1.5,.5),Y=c(.5,.5,1.5,1.5))
# Plot polygons
plotPolys(polyA, xlim=c(0,3), ylim=c(0,3))
addPolys(polyB, border=2)
# returns NULL
print(joinPolys(polyA, polyB))
However, in other cases, the code does work:
require(PBSmapping)
N <- 4
X = cos(seq(0, 2*pi, length = N))
Y = sin(seq(0, 2*pi, length = N))
polysA1 = data.frame(PID = rep(1, N), POS = 1:N,
X = 5*X, Y = 5*Y)
polysB1 = data.frame(PID = rep(1, N), POS = 1:N,
X = 5*X + 5, Y = 5*Y)
plotMap(NULL, xlim = c(-10, 10), ylim = c(-10, 10))
addPolys(polysA1, col = 'blue', lty = 12, density = 0, pch = 16)
addPolys(polysB1, col = 'red', lty = 12, density = 0, pch = 16)
addPolys(joinPolys(polysA1, polysB1), col = 2)
print(head(joinPolys(polysA1, polysB1)))
I am using R version 3.1.3, and Ubuntu 14.04.2 LTS.
Thanks in advance! I'm new to stackoverflow, so please let me know if there is anything else I can provide.
Cheers