ggplot: Drawing tiles / rectangles with discrete variables - r

I'm attempting to draw tiles / rectangles to get the following result:
library(tidyverse)
library(plotly)
set.seed(0)
df <- tibble(
a = runif(5),
b = runif(5),
c = runif(5),
d = runif(5),
case_id = 1:5
) %>% tidyr::pivot_longer(cols = -case_id)
plot <- ggplot2::ggplot(
data = df,
mapping = aes(
x = name,
y = value,
group = case_id
)
) + geom_point()
plot_boxes_y <- seq(from = 0, to = 1, by = .2)
plot_boxes_x <- unique(df$name) %>% length()
for (x in 1:plot_boxes_x) {
for (y in plot_boxes_y) {
plot <- plot + geom_rect(
mapping = aes_(
xmin = x - .5,
xmax = x + .5,
ymin = y - .5,
ymax = y + .5
),
color = "red",
fill = NA
)
}
}
plotly::ggplotly(plot)
As you can see, I currently do this by looping through coordinates and drawing each rectangle individually. The problem is, that this generates many layers which makes plotly::ggplotly() really slow on large datasets.
Therefore, I'm looking for a more efficient way. Please note, that I cannot use the panel.grid, since I intend to visualize z-data by filling rectangles later on.
My approach was to draw geom_tile() on top of the scatter plot:
# my attempt
df$z <- rep(0, nrow(df))
plot2 <- ggplot2::ggplot(
data = df,
mapping = aes(
x = name,
y = value,
color = z,
group = case_id
)
) + geom_point() + geom_tile()
I assume that this fails because of the fact that name is a discrete variable? So, how can i efficiently draw tiles in addition to my scatterplot?
Thanks

Here is a solution using the geom_tile option. The key here creating a data frame to hold the coordinates of the grid and then specifying the aesthetics individually in each of the function calls.
library(ggplot2)
library(tidyr)
set.seed(0)
df <- tibble(
a = runif(5),
b = runif(5),
c = runif(5),
d = runif(5),
case_id = 1:5
) %>% pivot_longer(cols = -case_id)
df$z <- rep(0, nrow(df))
#make data frame for the grid corrdinates
grid<-data.frame(x=factor( ordered( 1:4), labels = c("a", "b", "c", "d" )),
y=rep(seq(0, 1, .1), each=4))
#plot using geom_tile & geom_point
plot2 <- ggplot2::ggplot() + geom_tile(data=grid, aes(x=x, y=y), fill=NA, col="red") +
geom_point(data = df,
mapping = aes(
x = name,
y = value,
color = z,
group = case_id))
print(plot2)

if you don't mind them going beyond the axis
ggplot(df,aes(x=name,y=value)) + geom_point() +
geom_vline(xintercept=seq(0.5,4.5,by=1)) +
geom_hline(yintercept=seq(0,2,by=.2))
else:
#make a new data frame
GRIDS = rbind(
# the vertical lines
data.frame(x=seq(0.5,4.5,by=1),xend=seq(0.5,4.5,by=1),y=0,yend=2),
# the horizontal lines
data.frame(x=0.5,xend=4.5,y=seq(0,2,by=.2),yend=seq(0,2,by=.2))
)
ggplot(df,aes(x=name,y=value)) + geom_point() +
geom_segment(data=GRIDS,aes(x=x,y=y,xend=xend,yend=yend),col="red")

Related

How to draw a multi-colored dashed line (alternating colors for visual effect) [duplicate]

This question already has answers here:
Alternating color of individual dashes in a geom_line
(4 answers)
Closed 8 months ago.
I was wondering if it is possible to create a multicolored dashed line in ggplot.
Basically I have a plot displaying savings based on two packages.
A orange line with savings based on package A
A green line with savings based on package B
I also have a third line and I would like that one to be dashed alterenating between orange and green. Is that something that somebody has been able to do?
Here is an example:
library(tidyverse)
S <- seq(0, 5, by = 0.05)
a <- S ^ 2
b <- S
a_b = a + b #This data should have the dashed multicolor line, since it is the sum of the other two lines.
S <- data.frame(S)
temp <- cbind(S, a, b, a_b)
temp <- gather(temp, variable, value, -S)
desiredOrder <- c("a", "b", "a_b")
temp$variable <- factor(temp$variable, levels = desiredOrder)
temp <- temp[order(temp$variable),]
p <- ggplot(temp, aes(x = S, y = value, colour = variable)) +
theme_minimal() +
geom_line(size = 1) +
scale_color_manual(name = "Legend", values = c("orange", "green", "#0085bd"),
breaks = c("a", "b", "a_b"))
p
I basically want to have a multicolored (dashed or dotted) line for "c"
This is, to my best knowledge, currently only possible via creation of new segments for each alternate color. This is fiddly.
Below I've tried a largely programmatic approach in which you can define the size of the repeating segment (based on your x unit). The positioning of y values is slightly convoluted and it will also result in slightly irregular segment lengths when dealing with different slopes. I also haven't tested it on many data, either. But I guess it's a good start :)
For the legend, I'm taking the same approach, by creating a fake legend and stitching it onto the other plot. The challenges here include:
positioning of legend elements relative to the plot
relative distance between the legend elements
update
For a much neater way to create those segments and a Stat implementation see this thread
library(tidyverse)
library(patchwork)
S <- seq(0, 5, by = 0.05)
a <- S^2
b <- S
a_b <- a + b
df <- data.frame(x = S, a, b, a_b) %>%
pivot_longer(-x, names_to = "variable", values_to = "value")
## a function to create modifiable cuts in order to get segments.
## this looks convoluted - and it is! there are a few if/else statements.
## Why? The assigment of new y to x values depends on how many original values
## you have.
## There might be more direct ways to get there
alt_colors <- function(df, x, y, seg_length, my_cols) {
x <- df[[x]]
y <- df[[y]]
## create new x for each tiny segment
length_seg <- seg_length / length(my_cols)
new_x <- seq(min(x, na.rm = TRUE), x[length(x)], length_seg)
## now we need to interpolate y values for each new x
## This is different depending on how many x and new x you have
if (length(new_x) < length(x)) {
ind_int <- findInterval(new_x, x)
new_y <- sapply(seq_along(ind_int), function(i) {
if (y[ind_int[i]] == y[ind_int[length(ind_int)]]) {
y[ind_int[i]]
} else {
seq_y <- seq(y[ind_int[i]], y[ind_int[i] + 1], length.out = length(my_cols))
head(seq_y, -1)
}
})
} else {
ind_int <- findInterval(new_x, x)
rle_int <- rle(ind_int)
new_y <- sapply(rle_int$values, function(i) {
if (y[i] == y[max(rle_int$values)]) {
y[i]
} else {
seq_y <- seq(y[i], y[i + 1], length.out = rle_int$lengths[i] + 1)
head(seq_y, -1)
}
})
}
## THis is also a bit painful and might cause other bugs that I haven't
## discovered yet.
if (length(unlist(new_y)) < length(new_x)) {
newdat <- data.frame(
x = new_x,
y = rep_len(unlist(new_y), length.out = length(new_x))
)
} else {
newdat <- data.frame(x = new_x, y = unlist(new_y))
}
newdat <- newdat %>%
mutate(xend = lead(x), yend = lead(y)) %>%
drop_na(xend)
newdat$color <- my_cols
newdat
}
## the below is just a demonstration of how the function would work
## using different segment widths
df_alt1 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", 1, c("orange", "green"))
df_alt.5 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", .5, c("orange", "green"))
df_ab <-
df %>%
filter(variable != "a_b") %>%
# for the identity mapping
mutate(color = ifelse(variable == "a", "green", "orange"))
## create data frame for the legend, also using the alt_colors function as per above
## the amount of x is a bit of trial and error, this is just a quick hack
## this is a trick to center the legend more or less relative to the main plot
y_leg <- ceiling(mean(range(df$value, na.rm = TRUE)))
dist_y <- 2
df_legend <-
data.frame(
variable = rep(unique(df$variable), each = 2),
x = 1:2,
y = rep(seq(y_leg - dist_y, y_leg + dist_y, by = dist_y), each = 2)
)
df_leg_onecol <-
df_legend %>%
filter(variable != "a_b") %>%
mutate(color = ifelse(variable == "a", "green", "orange"))
df_leg_alt <-
df_legend %>%
filter(variable == "a_b") %>%
alt_colors("x", "y", .5, c("orange", "green"))
## I am mapping the colors globally using identity mapping (see scale_identity).
p1 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt1, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every 1 unit")
p.5 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt.5, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every .5 unit")
p_leg <-
ggplot(mapping = aes(x, y, colour = color)) +
theme_void() +
geom_line(data = df_leg_onecol, size = 1) +
geom_segment(data = df_leg_alt, aes(xend = xend, yend = yend), size = 1) +
scale_color_identity() +
annotate(
geom = "text", y = unique(df_legend$y), label = unique(df_legend$variable),
x = max(df_legend$x + 1), hjust = 0
)
## set y limits to the range of the main plot
## in order to make the labels visible you need to adjust the plot margin and
## turn clipping off
p1 + p.5 +
(p_leg + coord_cartesian(ylim = range(df$value), clip = "off") +
theme(plot.margin = margin(r = 20, unit = "pt"))) +
plot_layout(widths = c(1, 1, .2))
Created on 2022-01-18 by the reprex package (v2.0.1)
(Copied this over from Alternating color of individual dashes in a geom_line)
Here's a ggplot hack that is simple, but works for two colors only. It results in two lines being overlayed, one a solid line, the other a dashed line.
library(dplyr)
library(ggplot2)
library(reshape2)
# Create df
x_value <- 1:10
group1 <- c(0,1,2,3,4,5,6,7,8,9)
group2 <- c(0,2,4,6,8,10,12,14,16,18)
dat <- data.frame(x_value, group1, group2) %>%
mutate(group2_2 = group2) %>% # Duplicate the column that you want to be alternating colors
melt(id.vars = "x_value", variable.name = "group", value.name ="y_value") # Long format
# Put in your selected order
dat$group <- factor(dat$group, levels=c("group1", "group2", "group2_2"))
# Plot
ggplot(dat, aes(x=x_value, y=y_value)) +
geom_line(aes(color=group, linetype=group), size=1) +
scale_color_manual(values=c("black", "red", "black")) +
scale_linetype_manual(values=c("solid", "solid", "dashed"))
Unfortunately the legend still needs to be edited by hand. Here's the example plot.

Scaling geom_point size on heatmap to fit correctly in R?

Basically, I have a heatmap that contains some points. What Im trying to do is automatically rescale the size of the points in a sensible way for different sized heatmaps. For example, if I have a heatmap that looks like so:
library(reshape)
library(ggplot2)
library(ggnewscale)
# Create matrix
set.seed(1701)
a <- sample(1:10,100, replace=TRUE)
s <- matrix(a, nrow = 5, ncol=5)
s[upper.tri(s)] = t(s)[upper.tri(s)]
rownames(s) <- colnames(s) <- paste0("x", 1:5)
diag(s) <- 0
sDf <- melt(s)
# create diagonal values
diagDf <- data.frame(
var1 = c(paste0("x", 1:5)),
var2 = c(paste0("x", 1:5)),
val = c(2,5,3,1,5)
)
# make plot
ggplot(sDf, aes(X1,X2)) +
geom_tile(aes(fill = value)) +
scale_fill_gradientn(colors = rev(colorspace::sequential_hcl(palette = "Blues 3", n = 100))) +
new_scale_fill() +
geom_point(data = diagDf, aes(var1, var2, col = val), size = 20) +
theme(aspect.ratio = 1)
So in the image above, the diagonal contains geom_points and their size is manually set to size = 20.... This works for this example, but the issue is:
If the heatmap dimensions were changed to say 20x20, then having the size hardcoded to equal 20 won't work due to overlapping & the points being too big etc.
So what Im trying to do is come up with a method that will automatically resize the points to effectively fill square they are contained in without overlapping, being too big or too small.
Any suggestions as to how I could do this?
I would do something like this:
library(reshape)
library(ggplot2)
library(ggnewscale)
n <- 5
# Create matrix
set.seed(1701)
a <- sample(1:10,100, replace=TRUE)
s <- matrix(a, nrow = n, ncol=n)
s[upper.tri(s)] = t(s)[upper.tri(s)]
rownames(s) <- colnames(s) <- paste0("x", 1:n)
diag(s) <- 0
sDf <- melt(s)
# create diagonal values
diagDf <- data.frame(
var1 = c(paste0("x", 1:n)),
var2 = c(paste0("x", 1:n)),
val = sample(1:5,n,replace = T)
)
# make plot
ggplot(sDf, aes(X1,X2)) +
geom_tile(aes(fill = value)) +
scale_fill_gradientn(colors = rev(colorspace::sequential_hcl(palette = "Blues 3", n = 100))) +
new_scale_fill() +
geom_point(data = diagDf, aes(var1, var2, col = val), size = 1/sqrt(nrow(sDf))*80) +
theme(aspect.ratio = 1)
here the size of the points depends on the dimension of the matrix.
an example of the output with a 3x3, 5x5, and 10x10 matrix
You can modify diagDf to contain the co-ordinates of the circles you want to plot using some basic trigonometry, then plot them as filled polygons. This ensures they will always scale exactly with your plot.
library(dplyr)
diagDf <- diagDf %>%
mutate(var1 = as.numeric(as.factor(var1)),
var2 = as.numeric(as.factor(var2))) %>%
split.data.frame(diagDf$var1) %>%
lapply(function(x) {
deg <- seq(0, 2 * pi, length = 100)
var1 <- cos(deg)/2.2
var2 <- sin(deg)/2.2
val <- rep(x$val, 100)
data.frame(var1 = var1 + x$var1, var2 = var2 + x$var2, val = val)}) %>%
{do.call(rbind, .)}
Now with slightly modified plot code, we get:
ggplot(sDf, aes(X1,X2)) +
geom_tile(aes(fill = value)) +
scale_fill_gradientn(colors=rev(colorspace::sequential_hcl(palette = "Blues 3", n=100))) +
new_scale_fill() +
geom_polygon(data = diagDf, aes(var1, var2, fill = val, group = val)) +
theme(aspect.ratio = 1)
Created on 2021-09-27 by the reprex package (v2.0.0)

How to create multiple (6) plots with ggplot and save them to a pdf file?

I have a matrix (pred_matrix, dim = 1e6, 250), the rows are "pixelstacks" of 250 NDVI values of a Landsat scene, from which i did a "fuzzy cmeans" classification witch 6 centers (classes), stored in the list results. I want now to plot a random subset of each class of the 1e6 rows. This is my quick and dirty code so far:
random_index <- floor(runif(10000, 1, 1e6+1))
random_cluster <- results[[6]]$cluster[random_index]
random_pred_matrix <- pred_matrix[random_index, ]
dates_subse_after_pred <- rdn_num[rm_na_pred_df]
random_res <- cbind(random_pred_matrix, random_cluster)
random_res <- t(random_res)
random_res <- cbind(c(dates_subse_after_pred, 1), random_res)
df_1 <- data.frame(random_res[1:250,c(TRUE, random_cluster==1)])
df_2 <- data.frame(random_res[1:250,c(TRUE, random_cluster==2)])
df_3 <- data.frame(random_res[1:250,c(TRUE, random_cluster==3)])
df_4 <- data.frame(random_res[1:250,c(TRUE, random_cluster==4)])
df_5 <- data.frame(random_res[1:250,c(TRUE, random_cluster==5)])
df_6 <- data.frame(random_res[1:250,c(TRUE, random_cluster==6)])
df_1.long <- melt(df_1, id.vars = 1)
df_1.long$X1 <- as.Date(df_1.long$X1)
df_2.long <- melt(df_2, id.vars = 1)
df_2.long$X1 <- as.Date(df_2.long$X1)
df_3.long <- melt(df_3, id.vars = 1)
df_3.long$X1 <- as.Date(df_3.long$X1)
df_4.long <- melt(df_4, id.vars = 1)
df_4.long$X1 <- as.Date(df_4.long$X1)
df_5.long <- melt(df_5, id.vars = 1)
df_5.long$X1 <- as.Date(df_5.long$X1)
df_6.long <- melt(df_6, id.vars = 1)
df_6.long$X1 <- as.Date(df_6.long$X1)
ggplot(df_1.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "lightblue")
ggplot(df_2.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "blue")
ggplot(df_3.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "lightgreen")
ggplot(df_4.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "green")
ggplot(df_5.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "pink")
ggplot(df_6.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "red")
After this i have just hit 6 times the export button in rstudio and inserted it all in a word document...
Is there a way to do this in a loop? Or even produce a final pdf containing the 6 plots?
Separate file
I think what you are after is having the following six times in your code.
ggsave("filename.png", # or pdf if you like
plot = last_plot(), # or give ggplot object name as in myPlot,
width = 5, height = 5,
units = "in", # other options c("in", "cm", "mm"),
dpi = 300)
For example,
library(ggplot2)
p1 <- ggplot(df_1.long) +
geom_line( aes(x = X1, y= value, group = variable),
color = "lightblue")
ggsave("df1.png", plot = p1, dpi = 300)
All in one
If you want all the six files in one pdf, then first do
pdf("file_name.pdf")
# do your ggplots here
p1
p2
p6
dev.off()
If you are using Rstudio I would recommend writing your code in a Rmarkdown file and then exporting to pdf directly.

ggplot2 histogram binwidth [duplicate]

This question already has answers here:
Different breaks per facet in ggplot2 histogram
(4 answers)
Closed 8 years ago.
I would like to create multiple histograms within one plot (using facet_wrap).
This could be an example code:
df <- data.frame(p1 = rnorm(100,5,2), p2 = rnorm(100,80,20), group = rep(LETTERS[1:4],25))
library(ggplot2)
library(reshape)
plotData <- melt(df, id.vars = "group", measure.vars = c("p1","p2") )
m <- ggplot(plotData, aes(x = value, color = group, fill = group, group = group))
m <- m + geom_bar(position=position_dodge())
m <- m + facet_wrap( ~ variable,scales = "free_x")
print(m)
Now, I would like to modify the plot that it creates per parameter ("p1,"p2") let's say 10 bins.
Up to now, I could not find a way to do this as binwidth/breaks calculation should be dependent on a subset of data.
Is it possible at all?
I want to share my solution (taken from the answered question linked above) extended by the possibility to overlay the histograms with density curves scaled to histogram counts:
df <- data.frame(p1 = rnorm(1000,5,2), p2 = rnorm(1000,80,20), group = rep(LETTERS[1:4],25))
library(ggplot2)
library(reshape)
library(plyr)
plotData <- melt(df, id.vars = "group", measure.vars = c("p1","p2") )
nBins <- 10
groupedData <- dlply(plotData, .(variable))
groupedBinWidth <- llply(groupedData, .fun = function(data, nBins) {
r <- range(data$value, na.rm = TRUE, finite = TRUE)
widthOfBins = (r[2] - r[1])/nBins
if (is.na(widthOfBins) || is.infinite(widthOfBins) || (widthOfBins <= 0)) widthOfBins <- NULL
widthOfBins
}, nBins = nBins)
densData <- dlply(plotData, .(variable, group), .fun = function(subData){
param <- subData$variable[1]
group <- subData$group[1]
d <- density(subData$value)
bw <- groupedBinWidth[[param]]
data.frame(x = d$x, y = d$y * nrow(subData) * bw , group = group, variable = param)
})
hls <- mapply(function(x, b) geom_bar(aes(x = value), position = position_dodge(), data = x, binwidth = b),
groupedData, groupedBinWidth)
dLay <- mapply(function(data) geom_density(data = data, aes(x = x, y = y), stat = "identity", fill = NA, size = 1),
densData)
m <- ggplot(plotData, aes(x = value, color = group, fill = group, group = group))
m <- m + hls
m <- m + dLay
m <- m + facet_wrap( ~ variable,scales = "free")
print(m)
Try this - really ugly code, but works if I understand you correctly. You might want to play with geom_density and maybe remove fill to make it more readable.
nbin<- 5
m <- ggplot(plotData, aes(x = value, color = group, fill = group, group = group))
m <- m + geom_histogram(data = subset(plotData, variable == "p1"), binwidth=diff(range(subset(plotData, variable == "p1")$value))/nbin)
m <- m + geom_histogram(data = subset(plotData, variable == "p2"), binwidth=diff(range(subset(plotData, variable == "p2")$value))/nbin)
m <- m + facet_wrap( ~ variable,scales = "free_x")
print(m)

Color one point and add an annotation in ggplot2?

I have a dataframe a with three columns :
GeneName, Index1, Index2
I draw a scatterplot like this
ggplot(a, aes(log10(Index1+1), Index2)) +geom_point(alpha=1/5)
Then I want to color a point whose GeneName is "G1" and add a text box near that point, what might be the easiest way to do it?
You could create a subset containing just that point and then add it to the plot:
# create the subset
g1 <- subset(a, GeneName == "G1")
# plot the data
ggplot(a, aes(log10(Index1+1), Index2)) + geom_point(alpha=1/5) + # this is the base plot
geom_point(data=g1, colour="red") + # this adds a red point
geom_text(data=g1, label="G1", vjust=1) # this adds a label for the red point
NOTE: Since everyone keeps up-voting this question, I thought I would make it easier to read.
Something like this should work. You may need to mess around with the x and y arguments to geom_text().
library(ggplot2)
highlight.gene <- "G1"
set.seed(23456)
a <- data.frame(GeneName = paste("G", 1:10, sep = ""),
Index1 = runif(10, 100, 200),
Index2 = runif(10, 100, 150))
a$highlight <- ifelse(a$GeneName == highlight.gene, "highlight", "normal")
textdf <- a[a$GeneName == highlight.gene, ]
mycolours <- c("highlight" = "red", "normal" = "grey50")
a
textdf
ggplot(data = a, aes(x = Index1, y = Index2)) +
geom_point(size = 3, aes(colour = highlight)) +
scale_color_manual("Status", values = mycolours) +
geom_text(data = textdf, aes(x = Index1 * 1.05, y = Index2, label = "my label")) +
theme(legend.position = "none") +
theme()

Resources