plot rownames based on condition in ggplot - r

I have a data set like this:
df <- data.frame(v1 = rnorm(10), col = rbinom(10, size=1,prob= 0.5))
rownames(df) <- letters[1:10]
> head(df)
v1 col
a -0.1806868 1
b 0.6934783 0
c -0.4658297 1
d 1.6760829 0
e -0.8475840 0
f -1.3499387 1
I plot it like this:
ggplot(df, aes(x = v1, y=rownames(df), group = col, color= col)) + geom_point()
Now I want to show only the rownames on the y-axis where col == 1.
The other names should not be displayed (but the points should be)
To add some context, I have a plot with many overlapping variable names on the y-axis, but I only want to display the names of the ones outside the dashed line

You could use scale_y_discrete:
set.seed(2017);
df <- data.frame(v1 = rnorm(10), col = rbinom(10, size=1,prob= 0.5))
rownames(df) <- letters[1:10]
library(ggplot2);
ggplot(df, aes(x = v1, y = rownames(df), group = col, color = col)) +
geom_point() +
scale_y_discrete(
limits = rownames(df),
labels = ifelse(df$col == 1, rownames(df), ""))

There is not much to add to the answer given by #MauritsEvers, I just had the idea that for your plot it might be desirable to have fewer horizontal lines that guide your eye.
We can use the breaks argument in scale_y_discrete for that.
set.seed(1); df <- data.frame(v1 = rnorm(10), col = rbinom(10, size=1,prob= 0.5))
rownames(df) <- letters[1:10]
axis_labels <- which(df$col == 1)
ggplot(df, aes(x = v1, y=rownames(df), group = col, color= col)) +
geom_point() +
scale_y_discrete(breaks = rownames(df)[axis_labels])

Related

How to draw a multi-colored dashed line (alternating colors for visual effect) [duplicate]

This question already has answers here:
Alternating color of individual dashes in a geom_line
(4 answers)
Closed 8 months ago.
I was wondering if it is possible to create a multicolored dashed line in ggplot.
Basically I have a plot displaying savings based on two packages.
A orange line with savings based on package A
A green line with savings based on package B
I also have a third line and I would like that one to be dashed alterenating between orange and green. Is that something that somebody has been able to do?
Here is an example:
library(tidyverse)
S <- seq(0, 5, by = 0.05)
a <- S ^ 2
b <- S
a_b = a + b #This data should have the dashed multicolor line, since it is the sum of the other two lines.
S <- data.frame(S)
temp <- cbind(S, a, b, a_b)
temp <- gather(temp, variable, value, -S)
desiredOrder <- c("a", "b", "a_b")
temp$variable <- factor(temp$variable, levels = desiredOrder)
temp <- temp[order(temp$variable),]
p <- ggplot(temp, aes(x = S, y = value, colour = variable)) +
theme_minimal() +
geom_line(size = 1) +
scale_color_manual(name = "Legend", values = c("orange", "green", "#0085bd"),
breaks = c("a", "b", "a_b"))
p
I basically want to have a multicolored (dashed or dotted) line for "c"
This is, to my best knowledge, currently only possible via creation of new segments for each alternate color. This is fiddly.
Below I've tried a largely programmatic approach in which you can define the size of the repeating segment (based on your x unit). The positioning of y values is slightly convoluted and it will also result in slightly irregular segment lengths when dealing with different slopes. I also haven't tested it on many data, either. But I guess it's a good start :)
For the legend, I'm taking the same approach, by creating a fake legend and stitching it onto the other plot. The challenges here include:
positioning of legend elements relative to the plot
relative distance between the legend elements
update
For a much neater way to create those segments and a Stat implementation see this thread
library(tidyverse)
library(patchwork)
S <- seq(0, 5, by = 0.05)
a <- S^2
b <- S
a_b <- a + b
df <- data.frame(x = S, a, b, a_b) %>%
pivot_longer(-x, names_to = "variable", values_to = "value")
## a function to create modifiable cuts in order to get segments.
## this looks convoluted - and it is! there are a few if/else statements.
## Why? The assigment of new y to x values depends on how many original values
## you have.
## There might be more direct ways to get there
alt_colors <- function(df, x, y, seg_length, my_cols) {
x <- df[[x]]
y <- df[[y]]
## create new x for each tiny segment
length_seg <- seg_length / length(my_cols)
new_x <- seq(min(x, na.rm = TRUE), x[length(x)], length_seg)
## now we need to interpolate y values for each new x
## This is different depending on how many x and new x you have
if (length(new_x) < length(x)) {
ind_int <- findInterval(new_x, x)
new_y <- sapply(seq_along(ind_int), function(i) {
if (y[ind_int[i]] == y[ind_int[length(ind_int)]]) {
y[ind_int[i]]
} else {
seq_y <- seq(y[ind_int[i]], y[ind_int[i] + 1], length.out = length(my_cols))
head(seq_y, -1)
}
})
} else {
ind_int <- findInterval(new_x, x)
rle_int <- rle(ind_int)
new_y <- sapply(rle_int$values, function(i) {
if (y[i] == y[max(rle_int$values)]) {
y[i]
} else {
seq_y <- seq(y[i], y[i + 1], length.out = rle_int$lengths[i] + 1)
head(seq_y, -1)
}
})
}
## THis is also a bit painful and might cause other bugs that I haven't
## discovered yet.
if (length(unlist(new_y)) < length(new_x)) {
newdat <- data.frame(
x = new_x,
y = rep_len(unlist(new_y), length.out = length(new_x))
)
} else {
newdat <- data.frame(x = new_x, y = unlist(new_y))
}
newdat <- newdat %>%
mutate(xend = lead(x), yend = lead(y)) %>%
drop_na(xend)
newdat$color <- my_cols
newdat
}
## the below is just a demonstration of how the function would work
## using different segment widths
df_alt1 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", 1, c("orange", "green"))
df_alt.5 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", .5, c("orange", "green"))
df_ab <-
df %>%
filter(variable != "a_b") %>%
# for the identity mapping
mutate(color = ifelse(variable == "a", "green", "orange"))
## create data frame for the legend, also using the alt_colors function as per above
## the amount of x is a bit of trial and error, this is just a quick hack
## this is a trick to center the legend more or less relative to the main plot
y_leg <- ceiling(mean(range(df$value, na.rm = TRUE)))
dist_y <- 2
df_legend <-
data.frame(
variable = rep(unique(df$variable), each = 2),
x = 1:2,
y = rep(seq(y_leg - dist_y, y_leg + dist_y, by = dist_y), each = 2)
)
df_leg_onecol <-
df_legend %>%
filter(variable != "a_b") %>%
mutate(color = ifelse(variable == "a", "green", "orange"))
df_leg_alt <-
df_legend %>%
filter(variable == "a_b") %>%
alt_colors("x", "y", .5, c("orange", "green"))
## I am mapping the colors globally using identity mapping (see scale_identity).
p1 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt1, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every 1 unit")
p.5 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt.5, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every .5 unit")
p_leg <-
ggplot(mapping = aes(x, y, colour = color)) +
theme_void() +
geom_line(data = df_leg_onecol, size = 1) +
geom_segment(data = df_leg_alt, aes(xend = xend, yend = yend), size = 1) +
scale_color_identity() +
annotate(
geom = "text", y = unique(df_legend$y), label = unique(df_legend$variable),
x = max(df_legend$x + 1), hjust = 0
)
## set y limits to the range of the main plot
## in order to make the labels visible you need to adjust the plot margin and
## turn clipping off
p1 + p.5 +
(p_leg + coord_cartesian(ylim = range(df$value), clip = "off") +
theme(plot.margin = margin(r = 20, unit = "pt"))) +
plot_layout(widths = c(1, 1, .2))
Created on 2022-01-18 by the reprex package (v2.0.1)
(Copied this over from Alternating color of individual dashes in a geom_line)
Here's a ggplot hack that is simple, but works for two colors only. It results in two lines being overlayed, one a solid line, the other a dashed line.
library(dplyr)
library(ggplot2)
library(reshape2)
# Create df
x_value <- 1:10
group1 <- c(0,1,2,3,4,5,6,7,8,9)
group2 <- c(0,2,4,6,8,10,12,14,16,18)
dat <- data.frame(x_value, group1, group2) %>%
mutate(group2_2 = group2) %>% # Duplicate the column that you want to be alternating colors
melt(id.vars = "x_value", variable.name = "group", value.name ="y_value") # Long format
# Put in your selected order
dat$group <- factor(dat$group, levels=c("group1", "group2", "group2_2"))
# Plot
ggplot(dat, aes(x=x_value, y=y_value)) +
geom_line(aes(color=group, linetype=group), size=1) +
scale_color_manual(values=c("black", "red", "black")) +
scale_linetype_manual(values=c("solid", "solid", "dashed"))
Unfortunately the legend still needs to be edited by hand. Here's the example plot.

Create a three panel plot with one panel spanning 2 columns using ggplot2

I have following data:
df <- data.frame("Stat" = c("Var1","Var1","Var1","Var1","Var1","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var3","Var3","Var3","Var3","Var3","Var3","Var3","Var3","Var3","Var3"),
"Value" = c(0,1,2,3,4,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10),
"n" = c(33,120,223,63,20,17,28,33,22, 35,41,53,44,55,59,39,33, 46,30,29,23,21,14,6,18,7,29,50,80,86,91,83,35,34, 20))
What I wanted to do is to plot the above data as bar plot in one canvas but in three rows (1 columns x 3 rows) and each panel should contain plot for only one variable (Stat) eg. Var1 in first panel, Var2 in second and Var3 in the third panel, using the following code:
library(multipanelfigure)
fig1 <- multi_panel_figure(columns = 2, rows = 2, panel_label_type = "none")
# fit the plots on the panels
fig1 %<>%
fill_panel(Var1Plot, column = 1, row = 1) %<>%
fill_panel(Var2Plot, column = 2, row = 1) %<>%
fill_panel(Var3Plot, column = 1:2, row = 2)
fig1
Issue is how to get the Var1Plot, Var2Plot and Var3Plot so that these can be placed in respective panels above. I used the below code, but not able to get the results into above panels:
library(tidyverse)
df %>% ggplot(aes(x = Value, y = n)) +
geom_bar(stat='identity') + facet_wrap(~ Stat)
Expected plot should look something like this :
Here's an approach with cowplot.
library(cowplot)
figure.list <- map(unique(df$Stat), ~
ggplot(data = subset(df, df$Stat == .x), aes(x = Value, y = n)) +
geom_bar(stat='identity') +
ggtitle(.x))
top <- plot_grid(figure.list[[1]], figure.list[[2]], ncol = 2)
bottom <- plot_grid(figure.list[[3]], ncol = 1)
plot_grid(top, bottom,
ncol=1, rel_heights=c(1,1))
If you really want some to be coord_flip-ed, you could make the list manually:
figure.list <- list()
figure.list[[1]] <- ggplot(data = subset(df, df$Stat == "Var1"), aes(x = Value, y = n)) +
geom_bar(stat='identity') + coord_flip()
figure.list[[2]] <- ggplot(data = subset(df, df$Stat == "Var2"), aes(x = Value, y = n)) +
geom_bar(stat='identity') + coord_flip()
figure.list[[3]] <- ggplot(data = subset(df, df$Stat == "Var3"), aes(x = Value, y = n)) +
geom_bar(stat='identity')
top <- plot_grid(figure.list[[1]], figure.list[[2]], ncol = 2)
bottom <- plot_grid(figure.list[[3]], ncol = 1)
plot_grid(top, bottom,
ncol=1, rel_heights=c(1,1))

ggplot: Drawing tiles / rectangles with discrete variables

I'm attempting to draw tiles / rectangles to get the following result:
library(tidyverse)
library(plotly)
set.seed(0)
df <- tibble(
a = runif(5),
b = runif(5),
c = runif(5),
d = runif(5),
case_id = 1:5
) %>% tidyr::pivot_longer(cols = -case_id)
plot <- ggplot2::ggplot(
data = df,
mapping = aes(
x = name,
y = value,
group = case_id
)
) + geom_point()
plot_boxes_y <- seq(from = 0, to = 1, by = .2)
plot_boxes_x <- unique(df$name) %>% length()
for (x in 1:plot_boxes_x) {
for (y in plot_boxes_y) {
plot <- plot + geom_rect(
mapping = aes_(
xmin = x - .5,
xmax = x + .5,
ymin = y - .5,
ymax = y + .5
),
color = "red",
fill = NA
)
}
}
plotly::ggplotly(plot)
As you can see, I currently do this by looping through coordinates and drawing each rectangle individually. The problem is, that this generates many layers which makes plotly::ggplotly() really slow on large datasets.
Therefore, I'm looking for a more efficient way. Please note, that I cannot use the panel.grid, since I intend to visualize z-data by filling rectangles later on.
My approach was to draw geom_tile() on top of the scatter plot:
# my attempt
df$z <- rep(0, nrow(df))
plot2 <- ggplot2::ggplot(
data = df,
mapping = aes(
x = name,
y = value,
color = z,
group = case_id
)
) + geom_point() + geom_tile()
I assume that this fails because of the fact that name is a discrete variable? So, how can i efficiently draw tiles in addition to my scatterplot?
Thanks
Here is a solution using the geom_tile option. The key here creating a data frame to hold the coordinates of the grid and then specifying the aesthetics individually in each of the function calls.
library(ggplot2)
library(tidyr)
set.seed(0)
df <- tibble(
a = runif(5),
b = runif(5),
c = runif(5),
d = runif(5),
case_id = 1:5
) %>% pivot_longer(cols = -case_id)
df$z <- rep(0, nrow(df))
#make data frame for the grid corrdinates
grid<-data.frame(x=factor( ordered( 1:4), labels = c("a", "b", "c", "d" )),
y=rep(seq(0, 1, .1), each=4))
#plot using geom_tile & geom_point
plot2 <- ggplot2::ggplot() + geom_tile(data=grid, aes(x=x, y=y), fill=NA, col="red") +
geom_point(data = df,
mapping = aes(
x = name,
y = value,
color = z,
group = case_id))
print(plot2)
if you don't mind them going beyond the axis
ggplot(df,aes(x=name,y=value)) + geom_point() +
geom_vline(xintercept=seq(0.5,4.5,by=1)) +
geom_hline(yintercept=seq(0,2,by=.2))
else:
#make a new data frame
GRIDS = rbind(
# the vertical lines
data.frame(x=seq(0.5,4.5,by=1),xend=seq(0.5,4.5,by=1),y=0,yend=2),
# the horizontal lines
data.frame(x=0.5,xend=4.5,y=seq(0,2,by=.2),yend=seq(0,2,by=.2))
)
ggplot(df,aes(x=name,y=value)) + geom_point() +
geom_segment(data=GRIDS,aes(x=x,y=y,xend=xend,yend=yend),col="red")

Plot heatmap from points in R

I want to plot a heatmap in R from a set of points.
I have a data frame like
X Y col
1 2 1
1 1 4
2 4 9
.......
I want to have a heatmap from this, with X and Y being the coordinates of the point, and col can be from 0 to 40. I tried to plot in points or using melt(), but with no luck.
I can plot some points with geom_point(), but I'd like to have a smooth transition from one color to another, some probably this is not the reight thing to do.
set.seed(1)
library(ggplot2)
df <- as.data.frame(expand.grid(1:50, 1:50))
df$col <- sample(0:40, size = nrow(df), replace = TRUE)
ggplot(df, aes(x = Var1, y = Var2, colour = col, fill = col )) +
geom_tile()
produces:
Edit:
And this
set.seed(1)
library(ggplot2)
df <- as.data.frame(expand.grid(1:50, 1:50))
df$col <- sample(0:40, size = nrow(df), replace = TRUE)
df <- df[sample(1:nrow(df), nrow(df) * .2, replace = FALSE), ] # make holes
df <- df[rep(1:nrow(df), df$col), -3]
ggplot(df, aes(x = Var1, y = Var2)) +
geom_point() +
stat_density2d(aes(fill=..density..), geom = "tile", contour = FALSE) +
scale_fill_gradient2(low = "white", high = "red")
produces

Color one point and add an annotation in ggplot2?

I have a dataframe a with three columns :
GeneName, Index1, Index2
I draw a scatterplot like this
ggplot(a, aes(log10(Index1+1), Index2)) +geom_point(alpha=1/5)
Then I want to color a point whose GeneName is "G1" and add a text box near that point, what might be the easiest way to do it?
You could create a subset containing just that point and then add it to the plot:
# create the subset
g1 <- subset(a, GeneName == "G1")
# plot the data
ggplot(a, aes(log10(Index1+1), Index2)) + geom_point(alpha=1/5) + # this is the base plot
geom_point(data=g1, colour="red") + # this adds a red point
geom_text(data=g1, label="G1", vjust=1) # this adds a label for the red point
NOTE: Since everyone keeps up-voting this question, I thought I would make it easier to read.
Something like this should work. You may need to mess around with the x and y arguments to geom_text().
library(ggplot2)
highlight.gene <- "G1"
set.seed(23456)
a <- data.frame(GeneName = paste("G", 1:10, sep = ""),
Index1 = runif(10, 100, 200),
Index2 = runif(10, 100, 150))
a$highlight <- ifelse(a$GeneName == highlight.gene, "highlight", "normal")
textdf <- a[a$GeneName == highlight.gene, ]
mycolours <- c("highlight" = "red", "normal" = "grey50")
a
textdf
ggplot(data = a, aes(x = Index1, y = Index2)) +
geom_point(size = 3, aes(colour = highlight)) +
scale_color_manual("Status", values = mycolours) +
geom_text(data = textdf, aes(x = Index1 * 1.05, y = Index2, label = "my label")) +
theme(legend.position = "none") +
theme()

Resources