I want to create a custom heatmap, whereby a matrix defines the intensity (alpha) of a series of squares with uniquely specified colors. Additionally, the axis defining each square will be unique (see example below).
Which packages may help do this? The geom_rect function from ggplot2 (used in this different question seems promising but too tightly coupled to a given plot?
e.g
Data:
[,1] [,2]
[1,] 30 5
[2,] 3 50
Axis:
x_bounds <- t(matrix(c(
0,10,
10,30
), 2))
y_bounds <- t(matrix(c(
0,-50,
-50,1000
), 2))
Result mock-up:
Does this give you what you want?
library(ggplot2)
x_bounds <- c(0,10,30)
y_bounds <- c(0,-50,1000)
df <- data.frame(x = c(0,1,0,1),
y = c(0,0,1,1),
fill = c("red","green","blue","yellow"),
alpha = c(0.6,0.6,0.5,0.8))
ggplot(data = df) +
geom_rect(aes(xmin = x, xmax = x+1, ymin = y, ymax = y+1,
fill = fill, alpha = alpha)) +
scale_x_continuous(breaks = min(df$x):(max(df$x)+1),
labels = x_bounds) +
scale_y_continuous(breaks = min(df$y):(max(df$y)+1),
labels = y_bounds) +
scale_fill_identity() +
theme(panel.background=element_blank(),
panel.border=element_blank(),
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
plot.background=element_blank())
Related
The figure below is a conceptual diagram used by Michael Clark,
https://m-clark.github.io/docs/lord/index.html
to explain Lord's Paradox and related phenomena in regression.
My question is framed in this context and using ggplot2 but it is broader in terms of geometry & graphing.
I would like to reproduce figures like this, but using actual data. I need to know:
how to draw a new axis at the origin, with a -45 degree angle, corresponding to values of y-x
how to draw little normal distributions or density diagrams, or other representations of the values y-x projected onto this axis.
My minimal base example uses ggplot2,
library(ggplot2)
set.seed(1234)
N <- 200
group <- rep(c(0, 1), each = N/2)
initial <- .75*group + rnorm(N, sd=.25)
final <- .4*initial + .5*group + rnorm(N, sd=.1)
change <- final - initial
df <- data.frame(id = factor(1:N),
group = factor(group,
labels = c('Female', 'Male')),
initial,
final,
change)
#head(df)
#' plot, with regression lines and data ellipses
ggplot(df, aes(x = initial, y = final, color = group)) +
geom_point() +
geom_smooth(method = "lm", formula = y~x) +
stat_ellipse(size = 1.2) +
geom_abline(slope = 1, color = "black", size = 1.2) +
coord_fixed(xlim = c(-.6, 1.2), ylim = c(-.6, 1.2)) +
theme_bw() +
theme(legend.position = c(.15, .85))
This gives the following graph:
In geometry, the coordinates of the -45 degree rotated axes of distributions I want to portray are
(y-x), (x+y) in the original space of the plot. But how can I draw these with
ggplot2 or other software?
An accepted solution can be vague about how the distribution of (y-x) is represented,
but should solve the problem of how to display this on a (y-x) axis.
Fun question! I haven't encountered it yet, but there might be a package to help do this automatically. Here's a manual approach using two hacks:
the clip = "off" parameter of the coord_* functions, to allow us to add annotations outside the plot area.
building a density plot, extracting its coordinates, and then rotating and translating those.
First, we can make a density plot of the change from initial to final, seeing a left skewed distribution:
(my_hist <- df %>%
mutate(gain = final - initial) %>% # gain would be better name
ggplot(aes(gain)) +
geom_density())
Now we can extract the guts of that plot, and transform the coordinates to where we want them to appear in the combined plot:
a <- ggplot_build(my_hist)
rot = pi * 3/4
diag_hist <- tibble(
x = a[["data"]][[1]][["x"]],
y = a[["data"]][[1]][["y"]]
) %>%
# squish
mutate(y = y*0.2) %>%
# rotate 135 deg CCW
mutate(xy = x*cos(rot) - y*sin(rot),
dens = x*sin(rot) + y*cos(rot)) %>%
# slide
mutate(xy = xy - 0.7, # magic number based on plot range below
dens = dens - 0.7)
And here's a combination with the original plot:
ggplot(df, aes(x = initial, y = final, color = group)) +
geom_point() +
geom_smooth(method = "lm", formula = y~x) +
stat_ellipse(size = 1.2) +
geom_abline(slope = 1, color = "black", size = 1.2) +
coord_fixed(clip = "off",
xlim = c(-0.7,1.6),
ylim = c(-0.7,1.6),
expand = expansion(0)) +
annotate("segment", x = -1.4, xend = 0, y = 0, yend = -1.4) +
annotate("path", x = diag_hist$xy, y = diag_hist$dens) +
theme_bw() +
theme(legend.position = c(.15, .85),
plot.margin = unit(c(.1,.1,2,2), "cm"))
I want to plot the gradient plot of intensities, something like this:
I though myself about creating a gradient grid whose distribution was my "I" function, but I have no idea how to do it or if there is an explicit package in R to accomplish this task.
Thank you so much for even thinking about this.
a <- 5*10^(-6)
d <- 0.5*0.005
l <- 500*10^(-9)
n <- pi
theta <- seq(-n,n,length=3500)
I <- function(x){(cos((pi*d*sin(x))/l))^2*(sin((pi*a*sin(x))/l)/((pi*a*sin(x))/l))^2}
y1 <- lapply(theta,I)
y <- unlist(y1)
df <- data.frame(theta,y)
I2 <- function(x){(sin((pi*a*sin(x))/l)/((pi*a*sin(x))/l))^2}
y12 <- lapply(theta,I2)
y2 <- unlist(y12)
df2 <- data.frame(theta,y2)
p = ggplot()
p +
geom_line(data = df, aes(theta,y)) +
xlim(-0.3,0.3) +
geom_line(data = df2, aes(theta,y2))
Making use of patchwork this could be achieved like so:
For the gradient make a second ggplot of rectangles using e.g. geom_rect where you map intensity on color and/or fill
This gradient plot could then be glued to the main plot via patchwork
To get a nice gradient plot
I tripled the number of grid points for the gradient plot,
mapped the cubic root of intensity on color and
get rid of all unnecessary elemnts like y-axis, color guide, ...
BTW:
As your functions are vectorized you don't need lapply to compute the intensities.
Instead of adjusting the limits via xlim() (which removes rows falling outside of the range), set them using coord_cartesian.
library(ggplot2)
library(tibble)
library(patchwork)
a <- 5*10^(-6)
d <- 0.5*0.005
l <- 500*10^(-9)
n <- pi
theta <- seq(-n,n,length=3500)
I <- function(x){(cos((pi*d*sin(x))/l))^2*(sin((pi*a*sin(x))/l)/((pi*a*sin(x))/l))^2}
y <- I(theta)
df <- data.frame(theta,y)
I2 <- function(x){(sin((pi*a*sin(x))/l)/((pi*a*sin(x))/l))^2}
y2 <- I2(theta)
df2 <- data.frame(theta,y2)
p1 = ggplot() +
geom_line(data = df, aes(theta,y)) +
geom_line(data = df2, aes(theta,y2)) +
coord_cartesian(xlim = c(-0.3,0.3))
g <- tibble(
xmin = seq(-n, n, length = 3 * 3500),
xmax = dplyr::lead(xmin),
y = I(xmin)
)
p2 <- ggplot(g, aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = 1, color = y^(1/3))) +
geom_rect() +
coord_cartesian(xlim = c(-0.3,0.3)) +
guides(color = FALSE) +
theme_minimal() +
theme(axis.ticks.y = element_blank(), axis.text.y = element_blank())
p1 / p2 + plot_layout(heights = c(10, 1))
#> Warning: Removed 1 rows containing missing values (geom_rect).
I'm working with stock prices and trying to plot the price difference.
I created one using autoplot.zoo(), my question is, how can I manage to change the point shapes to triangles when they are above the upper threshold and to circles when they are below the lower threshold. I understand that when using the basic plot() function you can do these by calling the points() function, wondering how I can do this but with ggplot2.
Here is the code for the plot:
p<-autoplot.zoo(data, geom = "line")+
geom_hline(yintercept = threshold, color="red")+
geom_hline(yintercept = -threshold, color="red")+
ggtitle("AAPL vs. SPY out of sample")
p+geom_point()
We can't fully replicate without your data, but here's an attempt with some sample generated data that should be similar enough that you can adapt for your purposes.
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
You can create an additional variable that determines the shape, based on the relationship in the data itself, and pass that as an argument into ggplot.
# Create conditional data
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
data$outlier[is.na(data$outlier)] <- "In Range"
library(ggplot2)
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16,15))
# If you want points just above and below# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
thresh <- 4
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
Alternatively, you can just add the points above and below the threshold as individual layers with manually specified shapes, like this. The pch argument points to shape type.
# Another way of doing this
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
ggplot(data, aes(x = date, y = spread, group = 1)) +
geom_line() +
geom_point(data = data[data$spread>thresh,], pch = 17) +
geom_point(data = data[data$spread< (-thresh),], pch = 16) +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
I'm trying to produce a scatter plot with geom_point where the points are circumscribed by a smoothed polygon, with geom_polygon.
Here's my point data:
set.seed(1)
df <- data.frame(x=c(rnorm(30,-0.1,0.1),rnorm(30,0,0.1),rnorm(30,0.1,0.1)),y=c(rnorm(30,-1,0.1),rnorm(30,0,0.1),rnorm(30,1,0.1)),val=rnorm(90),cluster=c(rep(1,30),rep(2,30),rep(3,30)),stringsAsFactors=F)
I color each point according the an interval that df$val is in. Here's the interval data:
intervals.df <- data.frame(interval=c("(-3,-2]","(-2,-0.999]","(-0.999,0]","(0,1.96]","(1.96,3.91]","(3.91,5.87]","not expressed"),
start=c(-3,-2,-0.999,0,1.96,3.91,NA),end=c(-2,-0.999,0,1.96,3.91,5.87,NA),
col=c("#2f3b61","#436CE8","#E0E0FF","#7d4343","#C74747","#EBCCD6","#D3D3D3"),stringsAsFactors=F)
Assigning colors and intervals to the points:
df <- cbind(df,do.call(rbind,lapply(df$val,function(x){
if(is.na(x)){
return(data.frame(col=intervals.df$col[nrow(intervals.df)],interval=intervals.df$interval[nrow(intervals.df)],stringsAsFactors=F))
} else{
idx <- which(intervals.df$start <= x & intervals.df$end >= x)
return(data.frame(col=intervals.df$col[idx],interval=intervals.df$interval[idx],stringsAsFactors=F))
}
})))
Preparing the colors for the leged which will show each interval:
df$interval <- factor(df$interval,levels=intervals.df$interval)
colors <- intervals.df$col
names(colors) <- intervals.df$interval
Here's where I constructed the smoothed polygons (using a function courtesy of this link):
clusters <- sort(unique(df$cluster))
cluster.cols <- c("#ff00ff","#088163","#ccbfa5")
splinePolygon <- function(xy,vertices,k=3, ...)
{
# Assert: xy is an n by 2 matrix with n >= k.
# Wrap k vertices around each end.
n <- dim(xy)[1]
if (k >= 1) {
data <- rbind(xy[(n-k+1):n,], xy, xy[1:k, ])
} else {
data <- xy
}
# Spline the x and y coordinates.
data.spline <- spline(1:(n+2*k), data[,1], n=vertices, ...)
x <- data.spline$x
x1 <- data.spline$y
x2 <- spline(1:(n+2*k), data[,2], n=vertices, ...)$y
# Retain only the middle part.
cbind(x1, x2)[k < x & x <= n+k, ]
}
library(data.table)
hulls.df <- do.call(rbind,lapply(1:length(clusters),function(l){
dt <- data.table(df[which(df$cluster==clusters[l]),])
hull <- dt[, .SD[chull(x,y)]]
spline.hull <- splinePolygon(cbind(hull$x,hull$y),100)
return(data.frame(x=spline.hull[,1],y=spline.hull[,2],val=NA,cluster=clusters[l],col=cluster.cols[l],interval=NA,stringsAsFactors=F))
}))
hulls.df$cluster <- factor(hulls.df$cluster,levels=clusters)
And here's my ggplot command:
library(ggplot2)
p <- ggplot(df,aes(x=x,y=y,colour=interval))+geom_point(cex=2,shape=1,stroke=1)+labs(x="X", y="Y")+theme_bw()+theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank())+scale_color_manual(drop=FALSE,values=colors,name="DE")
p <- p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster),color=hulls.df$col,fill=NA)
which produces:
My question is how do I add a legend for the polygon under the legend for the points? I want it to a legend with 3 lines colored according to the cluster colors and the corresponding cluster number beside each line?
Slightly different output, only changing the last line of your code, it may solve your purpose:
p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster, fill=cluster),alpha=0.1)
Say, you want to add a legend of the_factor. My basic idea is,
(1) put the_factor into mapping by using unused aes arguments; aes(xx = the_factor)
(2) if (1) affects something, delete the effect by using scale_xx_manual()
(3) modify the legend by using guides(xx = guide_legend(override.aes = list()))
In your case, aes(fill) and aes(alpha) are unused. The former is better to do it because of no effect. So I used aes(fill=as.factor(cluster)).
p <- ggplot(df,aes(x=x,y=y,colour=interval, fill=as.factor(cluster))) + # add aes(fill=...)
geom_point(cex=2, shape=1, stroke=1) +
labs(x="X", y="Y",fill="cluster") + # add fill="cluster"
theme_bw() + theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank()) + scale_color_manual(drop=FALSE,values=colors,name="DE") +
guides(fill = guide_legend(override.aes = list(colour = cluster.cols, pch=0))) # add
p <- p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster), color=hulls.df$col,fill=NA)
Of course, you can make the same graph by using aes(alpha = the_factor)). Because it has influence, you need to control it by using scale_alpha_manual().
g <- ggplot(df, aes(x=x,y=y,colour=interval)) +
geom_point(cex=2, shape=1, stroke=1, aes(alpha=as.factor(cluster))) + # add aes(alpha)
labs(x="X", y="Y",alpha="cluster") + # add alpha="cluster"
theme_bw() + theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank()) + scale_color_manual(drop=FALSE,values=colors,name="DE") +
scale_alpha_manual(values=c(1,1,1)) + # add
guides(alpha = guide_legend(override.aes = list(colour = cluster.cols, pch=0))) # add
g <- p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster), color=hulls.df$col,fill=NA)
What you are asking for is two colour scales. My understanding is that this is not possible. But you can give the impression of having two colour scales with a bit of a cheat and using the filled symbols (shapes 21 to 25).
p <- ggplot(df, aes(x = x, y = y, fill = interval)) +
geom_point(cex = 2, shape = 21, stroke = 1, colour = NA)+
labs(x = "X", y = "Y") +
theme_bw() +
theme(legend.key = element_blank(), panel.border = element_blank(), strip.background = element_blank()) +
scale_fill_manual(drop=FALSE, values=colors, name="DE") +
geom_polygon(data = hulls.df, aes(x = x, y = y, colour = cluster), fill = NA) +
scale_colour_manual(values = cluster.cols)
p
Alternatively, use a filled polygon with a low alpha
p <- ggplot(df,aes(x=x,y=y,colour=interval))+
geom_point(cex=2,shape=1,stroke=1)+
labs(x="X", y="Y")+
theme_bw() +
theme(legend.key = element_blank(),panel.border=element_blank(), strip.background=element_blank()) +
scale_color_manual(drop=FALSE,values=colors,name="DE", guide = guide_legend(override.aes = list(fill = NA))) +
geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster, fill = cluster), alpha = 0.2, show.legend = TRUE) +
scale_fill_manual(values = cluster.cols)
p
But this might make the point colours difficult to see.
Length of x-axis is important for my plot because it allows one to compare between facets, therefore I want facets to have different x-axis sizes. Here is my example data:
group1 <- seq(1, 10, 2)
group2 <- seq(1, 20, 3)
x = c(group1, group2)
mydf <- data.frame (X =x , Y = rnorm (length (x),5,1),
groups = c(rep(1, length (group1)), rep(2, length(group2))))
And my code:
p1 = ggplot(data=mydf,aes(x=X,y=Y,color=factor(groups)) )+
geom_point(size=2)+
scale_x_continuous(labels=comma)+
theme_bw()
p1+facet_grid(groups ~ .,scales = "fixed",space="free_x")
And the resulting figure:
Panel-1 has x-axis values less then 10 whereas panel-2 has x-axis value extending to 20. Still both panels and have same size on x-axis. Is there any way to make x-axis panel size different for different panels, so that they correspond to their (x-axis) values?
I found an example from some different package that shows what I am trying to do, here is the figure:
Maybe something like this can get you started. There's still some formatting to do, though.
library(grid)
library(gridExtra)
library(dplyr)
library(ggplot2)
p1 <- ggplot(data=mydf[mydf$groups==1,],aes(x=X,y=Y))+
geom_point(size=2)+
theme_bw()
p2 <- ggplot(data=mydf[mydf$groups==2,],aes(x=X,y=Y))+
geom_point(size=2)+
theme_bw()
summ <- mydf %>% group_by(groups) %>% summarize(len=diff(range(X)))
summ$p <- summ$len/max(summ$len)
summ$q <- 1-summ$p
ng <- nullGrob()
grid.arrange(arrangeGrob(p1,ng,widths=summ[1,3:4]),
arrangeGrob(p2,ng,widths=summ[2,3:4]))
I'm sure there's a way to make this more general, and the axes don't line up perfectly yet, but it's a beginning.
Here is a solution following OP's clarifying comment ("I guess axis will be same but the boxes will be of variable size. Is it possible by plotting them separately and aligning in grid?").
library(plyr); library(ggplot2)
buffer <- 0.5 # Extra space around the box
#Calculate box parameters
mydf.box <- ddply(mydf, .(groups), summarise,
max.X = max(X) + buffer,
min.X = 0,
max.Y = max(Y) + buffer,
min.Y = 0,
X = mean(X), Y = mean(Y)) #Dummy values for X and Y needed for geom_rect
p2 <- ggplot(data=mydf,aes(x=X, y=Y) )+
geom_rect(data = mydf.box, aes( xmax = max.X, xmin = min.X,
ymax = max.Y, ymin = min.Y),
fill = "white", colour = "black", fill = NA) +
geom_point(size=2) + facet_grid(groups ~ .,scales = "free_y") +
theme_classic() +
#Extra formatting to make your plot like the example
theme(panel.background = element_rect(fill = "grey85"),
strip.text.y = element_text(angle = 0),
strip.background = element_rect(colour = NA, fill = "grey65"))