I'm trying to draw a simple (scree)-plot with some extra geom_hline and geom_vlines thrown in.
Problem is: whenever I so much as add show_guide=TRUE or add some aes() to the geom_xline, I screw up the original legend.
Here's some (ugly) fake data:
exdf <- data.frame(rep(x=1:12, times = 3), rep(x = c("A", "B", "C"), times = 6), rnorm(36), stringsAsFactors = FALSE)
colnames(exdf) <- c("PC", "variable", "eigenvalue")
And here's my plot:
g <- ggplot(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue))
g <- g + geom_line(mapping = aes(group = factor(variable), linetype = variable))
g <- g + geom_vline(xintercept = 7, colour = "green", show_guide = TRUE)
How do I add a separate legend for the geom_vline without polluting the other legend?
Can't wrap my head around why one layer's color would change that of another legend.
This partly solves the problem:
g <- ggplot(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue))
g <- g + geom_line(mapping = aes(group = factor(variable), linetype = variable))
g <- g + geom_vline(aes(xintercept = x, colour = Threshold), data.frame(x = 7, Threshold = "A"), show_guide = TRUE) + scale_colour_manual(values = c(A = "green"))
But the legend will still have crosses for the variable section, albeit not green ones.
Alternatively you could use a geom_line with a new data frame with two rows, both with the same x and y equal to the lower and upper bounds of your data. This will give you a legend that has a horizontal green line for your threshold and no vertical lines.
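A minimal sketch of that geom_line idea, in case it helps. The names threshold_line and Threshold are made up for illustration, and the fake data is rebuilt with named columns and a seed (both assumptions, not part of the original post):

```r
library(ggplot2)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
exdf <- data.frame(PC = rep(1:12, times = 3),
                   variable = rep(c("A", "B", "C"), each = 12),
                   eigenvalue = rnorm(36),
                   stringsAsFactors = FALSE)

# Two rows with the same x; y spans the lower and upper bounds of the data
threshold_line <- data.frame(PC = c(7, 7),
                             eigenvalue = range(exdf$eigenvalue),
                             Threshold = "Threshold-A")

g <- ggplot() +
  geom_line(data = exdf,
            aes(x = factor(PC), y = eigenvalue,
                group = variable, linetype = variable)) +
  # Only colour is mapped in this layer, so it should get its own legend
  geom_line(data = threshold_line,
            aes(x = factor(PC), y = eigenvalue,
                group = Threshold, colour = Threshold)) +
  scale_colour_manual(values = c("Threshold-A" = "green"))
g
```

Like the geom_linerange variant, this line only spans the range of the data rather than the full panel height.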
Based on @Nick K's suggestion above, here's a way to do this with clean legends by using a different data = for each layer.
exdf <- data.frame(rep(x=1:12, times = 3), rep(x = c("A", "B", "C"), times = 6), rnorm(36), stringsAsFactors = FALSE)
colnames(exdf) <- c("PC", "variable", "eigenvalue")
g <- ggplot()
g <- g + geom_line(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue, group = factor(variable), linetype = variable))
g
thresholds <- data.frame(threshold = "Threshold-A", PC = 7, ymin = min(exdf$eigenvalue), ymax = max(exdf$eigenvalue))
g <- g + geom_linerange(data = thresholds, mapping = aes(x = PC, ymin = ymin, ymax = ymax, color = threshold))
g
yields:
Notice:
I know, the original data exdf are dumb and make an ugly plot; that's not the point here.
Notice that you have to set the data = argument for both layers, and keep the first g <- ggplot() blank, otherwise ggplot2 gets confused about the dataframes.
Yes, it's a hack job (see below), and it also doesn't fill the full y-height of the plot, as a geom_vline would.
As an add-on, (not a solution!), here's how it should work with geom_vline:
exdf <- data.frame(rep(x=1:12, times = 3), rep(x = c("A", "B", "C"), times = 6), rnorm(36), stringsAsFactors = FALSE)
colnames(exdf) <- c("PC", "variable", "eigenvalue")
g <- ggplot()
g <- g + geom_line(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue, group = factor(variable), linetype = variable))
g
g + geom_vline(data = thresholds, mapping = aes(xintercept = PC, color = threshold), show_guide = TRUE)
yields:
That fills the y-height, as you would expect from geom_vline, but somehow messes up the legend for variable (notice the vertical lines).
Not sure why this is so, feels like a bug to me.
Here reported: https://github.com/hadley/ggplot2/issues/1267
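A possible explanation, going by the documented meaning of show.legend in current ggplot2 (the successor of show_guide): TRUE always includes the layer in every legend, while the default NA includes it only in legends for aesthetics the layer actually maps. Under that assumption, simply leaving the argument at its default may keep the variable legend clean. A hedged sketch, with the fake data rebuilt with named columns and a seed for illustration:

```r
library(ggplot2)

set.seed(1)  # assumption: seed added for reproducibility
exdf <- data.frame(PC = rep(1:12, times = 3),
                   variable = rep(c("A", "B", "C"), each = 12),
                   eigenvalue = rnorm(36),
                   stringsAsFactors = FALSE)
thresholds <- data.frame(threshold = "Threshold-A", PC = 7)

g <- ggplot() +
  geom_line(data = exdf,
            aes(x = factor(PC), y = eigenvalue,
                group = variable, linetype = variable)) +
  # show.legend is left at its default (NA): the layer maps only colour,
  # so it should appear only in the colour legend.
  # key_glyph = "path" is optional and swaps the vertical key for a horizontal one.
  geom_vline(data = thresholds,
             aes(xintercept = PC, colour = threshold),
             key_glyph = "path") +
  scale_colour_manual(values = c("Threshold-A" = "green"))
g
```

This has not been checked against the ggplot2 version the issue was filed for, so treat it as a starting point rather than a confirmed fix.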
I have a line chart built using ggplot2. It looks like this:
Lines are close to each other and data labels are overlapping. It is not convenient. It would be better if light red labels were below the line and green labels where there is room for them. Something of the sort:
This post is helpful. However, I do not know in advance for which line it would be better to put labels above and for which it would be better to keep them below. Therefore I am looking for a generic solution.
ggrepel does a great job of organizing labels, but I cannot figure out how to make it work in my case. I tried different parameters. Here is one of the simplest variants (not the best looking):
Questions:
Is there any way in R to make the chart look like the 2nd picture?
I think ggrepel computes the best label position taking into account the size of the chart. If I export the chart to PowerPoint, for example, the size of the PowerPoint chart might be different from the size used to get optimal data label positions. Is there any way to pass the size of the chart to ggrepel?
Here is the code I used to generate the data and charts:
library(ggplot2)
library(ggrepel)
set.seed(1)
x = rep(1:20, 3)
y = c(runif(20, 10, 11),
runif(20, 11, 12),
runif(20, 12, 13))
z = rep(c("a", "b", "c"), each = 20)
df = data.frame(x = x, y = y, z = z)
ggplot(data = df, aes(x = x, y = y, group = z, color = z)) +
geom_line() +
geom_text(aes(label = round(y, 1)), nudge_y = 1) +
ylim(c(0, 20))
ggplot(data = df, aes(x = x, y = y, group = z, color = z)) +
geom_line() +
geom_text_repel(aes(label = round(y, 1)), nudge_y = 1) +
ylim(c(0, 20))
Changing the theme to theme_bw() and removing gridlines with {ggExtra}'s removeGridX() gets the plot closer to your second image. I also increased the size of the lines, limited the axes, and changed geom_text_repel to geom_label_repel to improve readability.
library(ggplot2)
library(ggrepel)
library(ggExtra)
set.seed(1)
x = rep(1:20, 3)
y = c(runif(20, 10, 11),
runif(20, 11, 12),
runif(20, 12, 13))
z = rep(c("a", "b", "c"), each = 20)
df = data.frame(x = x, y = y, z = z)
ggplot(data = df, aes(x = x, y = y, group = z, color = z)) +
theme_bw() + removeGridX() +
geom_line(size = 2) +
geom_label_repel(aes(label = round(y, 1)),
nudge_y = 0.5,
point.size = NA,
segment.color = NA,
min.segment.length = 0.1,
key_glyph = draw_key_path) +
scale_x_continuous(breaks=seq(0,20,by=1)) +
scale_y_continuous(breaks = seq(0, 14, 2), limits = c(0, 14))
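On the second question (chart size): as far as I understand, ggrepel works out label positions when the plot is drawn, so one way to control the result is to render the chart at the exact size it will have in the slide and insert that file. A rough sketch; the 10 x 5.625 inch size and the file name are arbitrary assumptions, not values from the question:

```r
library(ggplot2)
library(ggrepel)

set.seed(1)
df <- data.frame(x = rep(1:20, 3),
                 y = c(runif(20, 10, 11),
                       runif(20, 11, 12),
                       runif(20, 12, 13)),
                 z = rep(c("a", "b", "c"), each = 20))

p <- ggplot(df, aes(x = x, y = y, group = z, color = z)) +
  geom_line() +
  # seed makes the repel layout repeatable between renders
  geom_text_repel(aes(label = round(y, 1)), seed = 1)

# Hypothetical target size: a 16:9 slide area, in inches
out_file <- file.path(tempdir(), "repel_chart.png")
ggsave(out_file, plot = p, width = 10, height = 5.625, units = "in", dpi = 300)
```

Labels are then laid out for that size; rescaling the image afterwards scales the labels with it instead of repositioning them.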
I would like to plot multiple lines in a single ggplot, where each line represents the relationship between x and y given two or more parameters.
I know how to do that for one parameter:
Take the following example data:
library(ggplot2)
library(reshape2)
rs = data.frame(seq(200, 1000, by=200),
runif(5),
runif(5),
rbinom(n = 5, size = 1, prob = 0.5))
names(rs) = c("x_", "var1", "var2", "par")
melted = melt(rs, id.vars="x_")
ggplot(data = melted,
aes(x = x_, y = value, group = variable, col = variable)) +
geom_point() +
geom_line(linetype = "dashed")
This plots three lines: one for var1, one for var2 and one for par.
However, I would like four lines: one for var1 given par=0, another for var1 given par=1, and the same again for var2.
How would this scale up, for example if I want the condition to be a combination of multiple parameters (e.g. par2 + par)?
If you melt the data in a different way, you can use par to change the shape and linetype of your lines, so it's nice and clear which line is which:
rs_melt = melt(rs, id.vars = c("x_", "par"))
ggplot(rs_melt, aes(x = x_, y = value, colour = variable,
shape = factor(par), linetype = factor(par))) +
geom_line(size = 1.1) +
geom_point(size = 3) +
labs(shape = "par", linetype = "par")
Output:
You need to adjust your melt call and add a group column that combines the par and variable details. I think the code below is what you want:
library(reshape)
library(ggplot2)
rs = data.frame(seq(200, 1000, by=200), runif(5), runif(5), rbinom(n = 5, size = 1, prob = 0.5))
names(rs)=c("x_", "var1", "var2", "par")
melted = melt(rs, id.vars=c("x_", "par"))
melted$group <- paste(melted$par, melted$variable)
ggplot(data=melted, aes(x=x_, y=value, group =group, col=group))+ geom_point() + geom_line(linetype = "dashed")
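For the "combination of multiple parameters" part, the same grouping idea can be extended with base R's interaction(), which builds one factor level per combination. A small sketch, assuming a second, made-up parameter column par2; the long format is built by hand here only to keep the example free of extra packages:

```r
library(ggplot2)

set.seed(1)
rs <- data.frame(x_ = seq(200, 1000, by = 200),
                 var1 = runif(5),
                 var2 = runif(5),
                 par = rbinom(n = 5, size = 1, prob = 0.5),
                 par2 = rbinom(n = 5, size = 1, prob = 0.5))  # par2 is hypothetical

# Long format: one row per (x_, variable), keeping both parameters as id columns
id_cols <- rs[c("x_", "par", "par2")]
long <- rbind(data.frame(id_cols, variable = "var1", value = rs$var1),
              data.frame(id_cols, variable = "var2", value = rs$var2))

# One group per observed variable/par/par2 combination
long$group <- interaction(long$variable, long$par, long$par2,
                          sep = "/", drop = TRUE)

p <- ggplot(long, aes(x = x_, y = value, group = group, col = group)) +
  geom_point() +
  geom_line(linetype = "dashed")
p
```

With only five x values, some combinations will contain a single point and therefore show no line; real data with more rows per combination would not have that problem.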
I have searched and searched, but I can't seem to find an elegant way of doing this!
I have a dataset Data consisting of Data$x (dates) and Data$y (numbers from 0 to 1)
I want to plot them in a bar-chart:
ggplot(Data) + geom_bar(aes(x = x, y = y, fill = y), stat = "identity") +
scale_fill_gradient2(low = "red", high = "green", mid = "yellow", midpoint = 0.90)
The result looks like this
However, I wanted to give each bar a gradient in the vertical direction ranging from 0 (red) to y (greener depending on y). Is there any way of doing this smoothly?
I have tried to see if I could impose a picture on the graph as a hack, but I can't impose it on the bars only except in a super super ugly way.
Another, not very pretty, hack using geom_segment. The x start and end positions (x and xend) are hardcoded (- 0.4; + 0.4), and so is the size. These numbers need to be adjusted depending on the number of x values and the range of y.
# some toy data
d <- data.frame(x = 1:3, y = 1:3)
# interpolate values from zero to y and create corresponding number of x values
vals <- lapply(d$y, function(y) seq(0, y, by = 0.01))
y <- unlist(vals)
mid <- rep(d$x, lengths(vals))
d2 <- data.frame(x = mid - 0.4,
xend = mid + 0.4,
y = y,
yend = y)
ggplot(data = d2, aes(x = x, xend = xend, y = y, yend = yend, color = y)) +
geom_segment(size = 2) +
scale_color_gradient2(low = "red", mid = "yellow", high = "green",
midpoint = max(d2$y)/2)
A somewhat related question which may give you some other ideas: How to make gradient color filled timeseries plot in R
Doesn't exist as far as I know, but you can manipulate your data to produce it.
library(ggplot2)
df = data.frame(x=c(1:10),y=runif(10))
prepGradient <- function(x,y,spacing=max(y)/100){
stopifnot(length(x)==length(y))
df <- data.frame(x=x,y=y)
newDf = data.frame(x=NULL,y=NULL,z=NULL)
for (r in seq_len(nrow(df))){
n <- floor(df[r,"y"]/spacing)
# seq_len(n) is empty when n is 0, whereas 1:n would iterate over c(1, 0)
for (s in seq_len(n)){
tmp <- data.frame(x=df[r,"x"],y=spacing,z=s*spacing)
newDf <- rbind(newDf,tmp)
}
tmp <- data.frame(x=df[r,"x"],y=df[r,"y"]%%spacing,z=df[r,"y"])
newDf <- rbind(newDf,tmp)
}
return(newDf)
}
df2 <- prepGradient(df$x,df$y)
ggplot(df2,aes(x=x,y=y,fill=z)) +
geom_bar(stat="identity") +
scale_fill_gradient2(low="red", high="green", mid="yellow",midpoint=median(df$y))+
ggtitle('Vertical Gradient Example') +
theme_minimal()
Found a less hacky way to do this when answering Change ggplot bar chart fill colors
library(tidyverse)
df <- data.frame(value = c(20, 50, 90),
group = c(1, 2, 3))
df_expanded <- df %>%
rowwise() %>%
summarise(group = group,
value = list(0:value)) %>%
unnest(cols = value)
df_expanded %>%
ggplot() +
geom_tile(aes(
x = group,
y = value,
fill = value,
width = 0.9
)) +
coord_flip() +
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
Because the question did not explicitly ask for divergent or multi-hue scales (at least in the title), here is a simple hack for a single-hue gradient. It is essentially the approach suggested for a gradient fill under a curve, as seen here
library(ggplot2)
d <- data.frame(x = 1:3, y = 1:3)
n_grad <- 200
grad_df <- data.frame(yintercept = seq(0, 3, length.out = n_grad),
alpha = seq(0.3, 0, length.out = n_grad))
ggplot(d ) +
geom_col(aes(x, y), fill = "darkblue") +
geom_hline(data = grad_df, aes(yintercept = yintercept, alpha = alpha),
size = 1, colour = "white", show.legend = FALSE) +
## white background looks nicer then
theme_minimal()
I recognize that this has been an issue that's been asked in many other instances, but none of the solutions provided worked for my particular problem.
Here, I have the following data:
library(tidyverse)
library(scales)
mydata <- tibble(Category = c("A", "B", "C", "D"),
Result = c(0.442, 0.537, 0.426, 0.387),
A = c(NA, "A", NA, NA),
B = rep(NA, 4),
C = c(NA, "C", NA, NA),
D = c("D", "D", NA, NA))
mydata$Category <- factor(mydata$Category)
And I have the following vector for the colors:
colors_vct <- c(A = "#0079c0", B = "#cc9900", C = "#252525", D = "#c5120e")
With this information, I can create the following plot:
p <- ggplot(data = mydata , aes(x = Category, y = Result, fill = Category)) +
geom_bar(stat = "identity") + geom_text(aes(label = percent(Result), color = Category), hjust = -.25) +
coord_flip() + scale_y_continuous(limits = c(0,1), labels = percent) +
scale_colour_manual(values = colors_vct) + scale_fill_manual(values = colors_vct)
p
And I'd like to have little triangles appear after the labels based on whether a certain category is mentioned in the last 4 columns of mydata, colored by that category's color, as so:
p <- p + geom_text(data = filter(mydata, mydata[,3] == "A"), aes(label = sprintf("\u25b2")), colour = colors_vct["A"], hjust = -4)
#p <- p + geom_text(data = filter(mydata, mydata[,4] == "B"), aes(label = sprintf("\u25b2")), colour = colors_vct["B"], hjust = -5) #This is commented out because there are no instances where the layer ends up being applied.
p <- p + geom_text(data = filter(mydata, mydata[,5] == "C"), aes(label = sprintf("\u25b2")), colour = colors_vct["C"], hjust = -6)
p <- p + geom_text(data = filter(mydata, mydata[,6] == "D"), aes(label = sprintf("\u25b2")), colour = colors_vct["D"], hjust = -7)
p
This is what I want the final chart to look like (more or less, see bonus question below). Now, I'd like to iterate the last bit of code using a for loop. And this is where I'm running into trouble. It just ends up adding one layer only. How do I make this work? Here is my attempt:
#Set the colors into another table for matching:
colors_tbl <- tibble(Category = levels(mydata$Category),
colors = c("#0079c0", "#cc9900", "#252525", "#c5120e"))
for (i in seq_along(mydata$Category)) {
if (is_character(mydata[[i]])) { #This makes the loop skip if there is nothing to be applied, as with category B.
#Filters to just the specific categories I need to have the triangles shown.
triangles <- filter(mydata, mydata[,(i+2)] == levels(mydata$Category)[i])
#Matches up with the colors_tbl to determine which color to use for that triangle.
triangles <- mutate(triangles, colors = colors_tbl$colors[match(levels(triangles$Category)[i], colors_tbl$Category)])
#Sets a particular position for that triangle for the hjust argument below.
pos <- -(i+3)
#Adding the layer to the plot object
p <- p + geom_text(data = triangles, aes(label = sprintf("\u25b2")), color = triangles$colors, hjust = pos)
}
}
p
:(
Bonus question: Is there a way I can avoid gaps in between the triangles, as per the 2nd chart?
EDIT: As per @baptiste's suggestion, I re-processed the data as follows:
mydata2 <- mydata %>% gather(key = comp, value = Present, -Result, -Category)
mydata2 <- mydata2 %>% mutate(colors = colors_tbl$colors[match(mydata2$Present, colors_tbl$Category)]) %>%
filter(!is.na(mydata2$Present)) %>% select(-comp)
mydata2 <- mydata2 %>% mutate(pos = if_else(Present == "A", -4, if_else(Present == "B", -5, if_else(Present == "C", -6, -7))))
p <- p + geom_text(data = mydata2, aes(x = Category, label = sprintf("\u25b2")), colour = mydata2$colors, hjust = mydata2$pos)
p
OK, I got it to work. My bonus question still stands.
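For the record, one likely reason the for-loop attempt above added only a single layer: the check is_character(mydata[[i]]) inspects column i (Category, Result, A, B) rather than the flag column i + 2, so only i = 3 passes it. A sketch of a loop that indexes the flag columns by name instead; it uses base R subsetting and a simplified base plot, and flag_cols and n_added are names introduced here for illustration:

```r
library(ggplot2)

mydata <- data.frame(Category = factor(c("A", "B", "C", "D")),
                     Result = c(0.442, 0.537, 0.426, 0.387),
                     A = c(NA, "A", NA, NA),
                     B = rep(NA, 4),
                     C = c(NA, "C", NA, NA),
                     D = c("D", "D", NA, NA))
colors_vct <- c(A = "#0079c0", B = "#cc9900", C = "#252525", D = "#c5120e")

p <- ggplot(mydata, aes(x = Category, y = Result, fill = Category)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = colors_vct)

flag_cols <- c("A", "B", "C", "D")
n_added <- 0
for (i in seq_along(flag_cols)) {
  nm <- flag_cols[i]
  # Rows where the flag column names its own category
  hits <- mydata[!is.na(mydata[[nm]]) & mydata[[nm]] == nm, ]
  if (nrow(hits) == 0) next  # e.g. B, which is never flagged
  p <- p + geom_text(data = hits, aes(label = "\u25b2"),
                     colour = colors_vct[[nm]], hjust = -(i + 3))
  n_added <- n_added + 1
}
p
```

With the data above this should add three triangle layers (for A, C and D), matching the hand-written version.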
I am trying to plot a simple scatter plot for 3 groups, with different horizontal lines (line segment) for each group: for instance a hline at 3 for group "a", a hline at 2.5 for group "b" and a hline at 6 for group "c".
library(ggplot2)
df <- data.frame(tt = rep(c("a","b","c"),40),
val = round(rnorm(120, m = rep(c(4, 5, 7), each = 40))))
ggplot(df, aes(tt, val))+
geom_jitter(aes(tt, val), data = df, colour = I("red"),
position = position_jitter(width = 0.05))
I really appreciate your help!
Never send a line when a point can suffice:
library(ggplot2)
df <- data.frame(tt = rep(c("a","b","c"),40),
val = round(rnorm(120, m = rep(c(4, 5, 7), each = 40))))
hline <- data.frame(tt=c("a", "b", "c"), v=c(3, 2.5, 6))
ggplot(df, aes(tt, val))+
geom_point(data=hline, aes(tt, v), shape=95, size=20) +
geom_jitter(aes(tt, val), data = df, colour = I("red"),
position = position_jitter(width = 0.05))
There are other ways if this isn't acceptable, such as:
hline <- data.frame(tt=c(1, 2, 3), v=c(3, 2.5, 6))
ggplot(df, aes(tt, val))+
geom_jitter(aes(tt, val), data = df, colour = I("red"),
position = position_jitter(width = 0.05)) +
geom_segment(data=hline, aes(x=tt-0.25, xend=tt+0.25, y=v, yend=v))
The downside for the point is the egregious thickness and no control over width.
The downside for the segment is the need to use numerics for the discrete axis position vs the factors.
I also should have set the random seed to ensure reproducibility.
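A sketch that addresses both of those last points, assuming it is acceptable to derive the numeric positions from the factor levels: as.numeric(factor(...)) gives the axis position of each group, so hline can keep the group labels. The seed value is arbitrary, and the toy data uses each = 40 here (an assumption) so that each group's mean lines up with its label:

```r
library(ggplot2)

set.seed(123)  # arbitrary seed, for reproducible jitter and data
df <- data.frame(tt = rep(c("a", "b", "c"), each = 40),
                 val = round(rnorm(120, mean = rep(c(4, 5, 7), each = 40))))

hline <- data.frame(tt = c("a", "b", "c"), v = c(3, 2.5, 6))
# Position of each group on the discrete axis (1, 2, 3 for a, b, c)
hline$pos <- as.numeric(factor(hline$tt, levels = sort(unique(df$tt))))

p <- ggplot(df, aes(tt, val)) +
  geom_jitter(colour = "red",
              position = position_jitter(width = 0.05)) +
  geom_segment(data = hline,
               aes(x = pos - 0.25, xend = pos + 0.25, y = v, yend = v))
p
```

The segment half-width of 0.25 is taken from the answer above and can be adjusted freely.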