I was wondering if it is possible to create a multicolored dashed line in ggplot.
Basically I have a plot displaying savings based on two packages.
An orange line with savings based on package A
A green line with savings based on package B
I also have a third line, and I would like that one to be dashed, alternating between orange and green. Has anybody been able to do something like that?
Here is an example:
library(tidyverse)
S <- seq(0, 5, by = 0.05)
a <- S ^ 2
b <- S
a_b = a + b #This data should have the dashed multicolor line, since it is the sum of the other two lines.
S <- data.frame(S)
temp <- cbind(S, a, b, a_b)
temp <- gather(temp, variable, value, -S)
desiredOrder <- c("a", "b", "a_b")
temp$variable <- factor(temp$variable, levels = desiredOrder)
temp <- temp[order(temp$variable),]
p <- ggplot(temp, aes(x = S, y = value, colour = variable)) +
theme_minimal() +
geom_line(size = 1) +
scale_color_manual(name = "Legend", values = c("orange", "green", "#0085bd"),
breaks = c("a", "b", "a_b"))
p
I basically want to have a multicolored (dashed or dotted) line for "a_b".
To the best of my knowledge, this is currently only possible by creating a new segment for each alternating color. This is fiddly.
Below I've tried a largely programmatic approach in which you can define the size of the repeating segment (in units of your x variable). The positioning of y values is slightly convoluted, and it will also result in slightly irregular segment lengths when dealing with different slopes. I haven't tested it on much data, either. But I guess it's a good start :)
For the legend, I'm taking the same approach, by creating a fake legend and stitching it onto the other plot. The challenges here include:
positioning of legend elements relative to the plot
relative distance between the legend elements
update
For a much neater way to create those segments and a Stat implementation see this thread
library(tidyverse)
library(patchwork)
S <- seq(0, 5, by = 0.05)
a <- S^2
b <- S
a_b <- a + b
df <- data.frame(x = S, a, b, a_b) %>%
pivot_longer(-x, names_to = "variable", values_to = "value")
## a function to create modifiable cuts in order to get segments.
## this looks convoluted - and it is! there are a few if/else statements.
## Why? The assignment of new y to x values depends on how many original values
## you have.
## There might be more direct ways to get there
alt_colors <- function(df, x, y, seg_length, my_cols) {
x <- df[[x]]
y <- df[[y]]
## create new x for each tiny segment
length_seg <- seg_length / length(my_cols)
new_x <- seq(min(x, na.rm = TRUE), x[length(x)], length_seg)
## now we need to interpolate y values for each new x
## This is different depending on how many x and new x you have
if (length(new_x) < length(x)) {
ind_int <- findInterval(new_x, x)
new_y <- sapply(seq_along(ind_int), function(i) {
if (y[ind_int[i]] == y[ind_int[length(ind_int)]]) {
y[ind_int[i]]
} else {
seq_y <- seq(y[ind_int[i]], y[ind_int[i] + 1], length.out = length(my_cols))
head(seq_y, -1)
}
})
} else {
ind_int <- findInterval(new_x, x)
rle_int <- rle(ind_int)
new_y <- sapply(rle_int$values, function(i) {
if (y[i] == y[max(rle_int$values)]) {
y[i]
} else {
seq_y <- seq(y[i], y[i + 1], length.out = rle_int$lengths[i] + 1)
head(seq_y, -1)
}
})
}
## This is also a bit painful and might cause other bugs that I haven't
## discovered yet.
if (length(unlist(new_y)) < length(new_x)) {
newdat <- data.frame(
x = new_x,
y = rep_len(unlist(new_y), length.out = length(new_x))
)
} else {
newdat <- data.frame(x = new_x, y = unlist(new_y))
}
newdat <- newdat %>%
mutate(xend = lead(x), yend = lead(y)) %>%
drop_na(xend)
## recycle the colors explicitly so that any number of segments works
newdat$color <- rep_len(my_cols, nrow(newdat))
newdat
}
## the below is just a demonstration of how the function would work
## using different segment widths
df_alt1 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", 1, c("orange", "green"))
df_alt.5 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", .5, c("orange", "green"))
df_ab <-
df %>%
filter(variable != "a_b") %>%
# for the identity mapping
mutate(color = ifelse(variable == "a", "green", "orange"))
## create data frame for the legend, also using the alt_colors function as per above
## the amount of x is a bit of trial and error, this is just a quick hack
## this is a trick to center the legend more or less relative to the main plot
y_leg <- ceiling(mean(range(df$value, na.rm = TRUE)))
dist_y <- 2
df_legend <-
data.frame(
variable = rep(unique(df$variable), each = 2),
x = 1:2,
y = rep(seq(y_leg - dist_y, y_leg + dist_y, by = dist_y), each = 2)
)
df_leg_onecol <-
df_legend %>%
filter(variable != "a_b") %>%
mutate(color = ifelse(variable == "a", "green", "orange"))
df_leg_alt <-
df_legend %>%
filter(variable == "a_b") %>%
alt_colors("x", "y", .5, c("orange", "green"))
## I am mapping the colors globally using identity mapping (see scale_identity).
p1 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt1, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every 1 unit")
p.5 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt.5, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every .5 unit")
p_leg <-
ggplot(mapping = aes(x, y, colour = color)) +
theme_void() +
geom_line(data = df_leg_onecol, size = 1) +
geom_segment(data = df_leg_alt, aes(xend = xend, yend = yend), size = 1) +
scale_color_identity() +
annotate(
geom = "text", y = unique(df_legend$y), label = unique(df_legend$variable),
x = max(df_legend$x + 1), hjust = 0
)
## set y limits to the range of the main plot
## in order to make the labels visible you need to adjust the plot margin and
## turn clipping off
p1 + p.5 +
(p_leg + coord_cartesian(ylim = range(df$value), clip = "off") +
theme(plot.margin = margin(r = 20, unit = "pt"))) +
plot_layout(widths = c(1, 1, .2))
Created on 2022-01-18 by the reprex package (v2.0.1)
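The interpolation step can also be sketched with base R's approx(), which avoids the if/else branches. This is only a minimal sketch, not the approach from the linked thread: alt_segments() is a hypothetical helper name, and it assumes x is sorted, has no duplicates, and that linear interpolation between the original points is acceptable.

```r
# Hypothetical helper: cut a line into short segments with alternating colors.
# Assumes x is sorted and free of duplicates; y is linearly interpolated.
alt_segments <- function(x, y, seg_length, cols) {
  # one sub-segment per color within each repeating unit of width seg_length
  new_x <- seq(min(x), max(x), by = seg_length / length(cols))
  new_y <- approx(x, y, xout = new_x)$y
  n <- length(new_x) - 1
  data.frame(
    x = new_x[-length(new_x)],
    y = new_y[-length(new_y)],
    xend = new_x[-1],
    yend = new_y[-1],
    color = rep_len(cols, n)  # recycle colors along the segments
  )
}

S <- seq(0, 5, by = 0.05)
segs <- alt_segments(S, S^2 + S, seg_length = 1, cols = c("orange", "green"))

# Optional plot, only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(segs) +
    geom_segment(aes(x = x, y = y, xend = xend, yend = yend, colour = color)) +
    scale_colour_identity() +
    theme_minimal()
}
```

The resulting data frame has the same columns that geom_segment() needs (x, y, xend, yend) plus a color column for the identity scale, so it can be dropped in where df_alt1 is used above.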
(Copied this over from Alternating color of individual dashes in a geom_line)
Here's a ggplot hack that is simple but works for two colors only. It overlays two lines, one solid and the other dashed.
library(dplyr)
library(ggplot2)
library(reshape2)
# Create df
x_value <- 1:10
group1 <- c(0,1,2,3,4,5,6,7,8,9)
group2 <- c(0,2,4,6,8,10,12,14,16,18)
dat <- data.frame(x_value, group1, group2) %>%
mutate(group2_2 = group2) %>% # Duplicate the column that you want to be alternating colors
melt(id.vars = "x_value", variable.name = "group", value.name ="y_value") # Long format
# Put in your selected order
dat$group <- factor(dat$group, levels=c("group1", "group2", "group2_2"))
# Plot
ggplot(dat, aes(x=x_value, y=y_value)) +
geom_line(aes(color=group, linetype=group), size=1) +
scale_color_manual(values=c("black", "red", "black")) +
scale_linetype_manual(values=c("solid", "solid", "dashed"))
Unfortunately the legend still needs to be edited by hand. Here's the example plot.
I would like to create a graph that has superscripts on the axis instead of displaying unformatted numbers using ggplot2. I know that there are a lot of answers which change the axis label, but not the axis text. I am not trying to change the label of the graph, but the text on the axis.
Example:
x<-c('2^-5','2^-3','2^-1','2^1','2^2','2^3','2^5','2^7','2^9','2^11','2^13')
y<-c('2^-5','2^-3','2^-1','2^1','2^2','2^3','2^5','2^7','2^9','2^11','2^13')
df<-data.frame(x,y)
p<-ggplot()+
geom_point(data=df,aes(x=x,y=y),size=4)
p
So I would like the x-axis to display the same numbers but without the caret.
EDIT:
A purely base approach:
df %>%
mutate_all(as.character)->new_df
res<-unlist(Map(function(x) eval(parse(text=x)),new_df$x))#replace with y for y
to_use<-unlist(lapply(res,as.expression))
split_text<-strsplit(gsub("\\^"," ",names(to_use))," ")
join_1<-as.numeric(sapply(split_text,"[[",1)) #tidyr::separate might help, less robust for numeric(I think)
join_2<-as.numeric(sapply(split_text,"[[",2))
to_use_1<-sapply(seq_along(join_1),function(x) parse(text=paste(join_1[x],"^",
join_2[x])))
The above can be reduced to fewer steps; I posted the stepwise approach I took. Here is the result for x only; the same can be done for y:
new_df %>%
ggplot()+
geom_point(aes(x=x,y=y),size=4)+
scale_x_discrete(breaks=df$x,labels=to_use_1)#replace with y and scale_y_discrete for y
Plot:
Original and erroneous answer:
I have deviated from standard tidyverse practice by using $ inside the pipeline. You can try replacing it with ., which might work, although it is not really important here since the focus is on the labels:
library(dplyr)
df %>%
mutate(new_x=gsub("\\^"," ",x),
new_y=gsub("\\^"," ",y))->new_df
new_df %>%
ggplot()+
geom_point(aes(x=x,y=y),size=4)+
scale_x_discrete(breaks=x,labels=new_df$new_x)+
scale_y_discrete(breaks=y,labels=new_df$new_y)
This can be done with the functions scale_x_log2 and scale_y_log2, which can be found in the GitHub package jrnoldmisc.
First, install the package.
devtools::install_github("jrnold/rubbish")
Then, coerce the variables to numeric. I will work with a copy of the original dataframe.
df1 <- df
df1[] <- lapply(df1, function(x){
x <- as.character(x)
sapply(x, function(.x)eval(parse(text = .x)))
})
Now, graph it.
library(jrnoldmisc)
library(ggplot2)
library(MASS)
library(scales)
a <- ggplot(df1, aes(x = x, y = y, size = 4)) +
geom_point(show.legend = FALSE) +
scale_x_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^x, n = 10)) +
scale_y_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^x, n = 10))
a + annotation_logticks(base = 2)
Edit.
Following the discussion in the comments, here are the two other ways that were seen to give different axis labels.
Axis labels every tick mark. Set limits = c(1.01, NA) and function argument n = 11, an odd number.
Axis labels on odd number exponents. Keep limits = c(0.01, NA), change to function(x) 2^(x - 1), n = 11.
Just the instructions, no plots.
The first.
a <- ggplot(df1, aes(x = x, y = y, size = 4)) +
geom_point(show.legend = FALSE) +
scale_x_log2(limits = c(1.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x), n = 11)) +
scale_y_log2(limits = c(1.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x), n = 11))
a + annotation_logticks(base = 2)
And the second.
a <- ggplot(df1, aes(x = x, y = y, size = 4)) +
geom_point(show.legend = FALSE) +
scale_x_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x - 1), n = 11)) +
scale_y_log2(limits = c(0.01, NA),
labels = trans_format("log2", math_format(2^.x)),
breaks = trans_breaks("log2", function(x) 2^(x - 1), n = 11))
a + annotation_logticks(base = 2)
You can provide a function to the labels argument of the scale_x_*** and scale_y_*** functions to generate labels with superscripts (or other formatting). See examples below.
library(jrnoldmisc)
library(ggplot2)
df<-data.frame(x=2^seq(-5,5,2),
y=2^seq(-5,5,2))
ggplot(df) +
geom_point(aes(x=x,y=y),size=2) +
scale_x_log2(breaks=2^seq(-5,5,2),
labels=function(x) parse(text=paste("2^",round(log2(x),2))))
ggplot(df) +
geom_point(aes(x=x,y=y),size=2) +
scale_x_continuous(breaks=c(2^-5, 2^seq(1,5,2)),
labels=function(x) parse(text=paste("2^",round(log2(x),2))))
ggplot(df) +
geom_point(aes(x=x,y=y),size=2) +
scale_x_log10(breaks=10^seq(-1,1,1),
labels=function(x) parse(text=paste("10^",round(log10(x),2))))
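The label functions above all follow the same pattern, so they can be generated by a small factory. The sketch below is an illustration only: pow_labels() is a hypothetical helper name, and the plot uses ggplot2's built-in "log2" transformation instead of scale_x_log2 from jrnoldmisc, on the assumption that the two behave similarly for this purpose.

```r
# Hypothetical helper: build a labelling function for a given base.
# The returned function turns break values into plotmath expressions
# such as 2^-1 or 2^3, which ggplot2 renders as superscripts.
pow_labels <- function(base) {
  function(x) parse(text = paste0(base, "^", round(log(x, base), 2)))
}

labs <- pow_labels(2)(c(0.5, 2, 8))

# Optional plot, only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  df_pow <- data.frame(x = 2^seq(-5, 5, 2), y = 2^seq(-5, 5, 2))
  p <- ggplot(df_pow) +
    geom_point(aes(x = x, y = y), size = 2) +
    # "log2" is a transformation shipped with the scales package
    scale_x_continuous(trans = "log2", breaks = 2^seq(-5, 5, 2),
                       labels = pow_labels(2))
}
```

The same factory covers the base-10 case via pow_labels(10).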
I am trying to insert labels into a proportional bar chart: one label per segment, showing the percentage that segment represents. With the help of thothal I managed to do this:
var1 <- factor(as.character(c(1,1,2,3,1,4,3,2,3,2,1,4,2,3,2,1,4,3,1,2)))
var2 <- factor(as.character(c(1,4,2,3,4,2,1,2,3,4,2,1,1,3,2,1,2,4,3,2)))
data <- data.frame(var1, var2)
dat <- ddply(data, .(var1), function(.) {
res <- cumsum(prop.table(table(factor(.$var2))))
data.frame(lab = names(res), y = c(res))
})
ggplot(data, aes(x = var1)) + geom_bar(aes(fill = var2), position = 'fill') +
geom_text(aes(label = lab, x = var1, y = y), data = dat)
I would like to have for labels the percentage of each level, and not the level name.
Any help appreciated!
You are telling geom_text to use var2 as your y variable. That is in fact as.numeric(data$var2), which translates to a range of 1-4. However, your barplot uses the cumulative percentages.
Hence you have to calculate these positions before:
library(ggplot2)
library(plyr) # just for convenience
var1 <- factor(as.character(c(1,1,2,3,1,4,3,2,3,2,1,4,2,3,2,1,4,3,1,2)))
var2 <- factor(as.character(c(1,4,2,3,4,2,1,2,3,4,2,1,1,3,2,1,2,4,3,2)))
data <- data.frame(var1, var2)
dat <- ddply(data, .(var1), function(.) {
res <- cumsum(prop.table(table(factor(.$var2)))) # re-factor to use only used levels
res2 <- prop.table(table(factor(.$var2))) # re-factor to use only used levels
data.frame(lab = names(res), y = c(res), lab2 = c(res2))
})
ggplot(data, aes(x = var1)) + geom_bar(aes(fill = var2), position = 'fill') +
geom_text(aes(label = round(lab2, 2), x = var1, y = y), data = dat)
This places the labels at the end of each bar segment. If you want them slightly offset, you should play around with the creation of dat.
Another way to get non-cumulative percentage plus centering the labels, for future reference:
dat <- ddply(data, .(var1), function(.) {
good <- prop.table(table(factor(.$var2)))
res <- cumsum(prop.table(table(factor(.$var2))))
data.frame(lab = names(res), y = c(res), good = good, pos = cumsum(good) - 0.5*good)
})
ggplot(data, aes(x = var1)) + geom_bar(aes(fill = var2), position = 'fill') +
geom_text(aes(label = round(good.Freq, 2), x = var1, y = pos.Freq), data = dat)
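For comparison, here is a sketch that computes the per-bar proportions with base R and leaves the label placement to position_fill(vjust = 0.5). It is an alternative under stated assumptions, not the method from the answers above: the names tab, props, and label are made up for this example, and it assumes a ggplot2 version that provides geom_col() and the vjust argument of position_fill().

```r
var1 <- factor(c(1,1,2,3,1,4,3,2,3,2,1,4,2,3,2,1,4,3,1,2))
var2 <- factor(c(1,4,2,3,4,2,1,2,3,4,2,1,1,3,2,1,2,4,3,2))

# Row-wise proportions: share of each var2 level within each var1 level
tab <- prop.table(table(var1 = var1, var2 = var2), margin = 1)
props <- as.data.frame(tab, responseName = "prop")
props <- props[props$prop > 0, ]                      # drop empty segments
props$label <- paste0(round(100 * props$prop), "%")   # percentage text

# Optional plot, only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(props, aes(x = var1, y = prop, fill = var2)) +
    geom_col(position = "fill") +
    # vjust = 0.5 centers each label inside its segment
    geom_text(aes(label = label), position = position_fill(vjust = 0.5))
}
```

Because the bars and the labels use the same data and the same position adjustment, the labels follow whatever stacking order the bars use, so no manual cumsum() is needed.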
I used the following code and it worked well for me; give it a try.
geom_text(aes(label = paste(round(dat2$value,0), "%"),
vjust = ifelse(value >= 0, -0.05, 1.15)
),
size = 4, position = position_stack(vjust=0.5)
)
Basically, you need label = paste(y value, "%"). In my code, dat2 is the name of the data frame and value is the y value in the figure. In this case, I rounded the number to 0 decimal places. Good luck.
I'd like to write some conditional stats in my graph if the data is bigger than a certain value.
With the kind help of Jack Ryan (Cut data and access groups to draw percentile lines), I could create the following script that groups data into hours and plots the result:
# Read example data
A <- read.csv(url('http://people.ee.ethz.ch/~hoferr/download/data-20130812.csv'))
# Libraries
library(doBy)
library(ggplot2)
library(plyr)
library(reshape2)
library(MASS)
library(scales)
# Sample size function
give.n <- function(x){
return(c(y = min(x) - 0.2, label = length(x)))
}
# Calculate gaps
gaps <- rep(NA, length(A$Timestamp))
times <- A$Timestamp
loss <- A$pingLoss
gap.start <- 1
gap.end <- 1
for(i in 2:length(A$Timestamp))
{ #For all rows
if(is.na(A$pingRTT.ms.[i]))
{ #Currently no connection
if(!is.na(A$pingRTT.ms.[i-1]))
{ #Connection lost now
gap.start <- i
}
if(!is.na(A$pingRTT.ms.[i+1]))
{ # Connection restores next time
gap.end <- i+1
gaps[gap.start] <- as.numeric(A$Timestamp[gap.end]-A$Timestamp[gap.start], units="secs")
loss[gap.start] <- gap.end - gap.start
}
}
}
H <- data.frame(times, gaps, loss)
H <- H[complete.cases(H),]
C <- H
C$dates <- strptime(C$times, "%Y-%m-%d %H:%M:%S")
C$h1 <- C$dates$hour
# Calculate percentiles
cuts <- c(1, .75, .5, .25, 0)
c <- ddply(C, .(h1), function (x) { summarise(x, y = quantile(x$gaps, cuts)) } )
c$cuts <- cuts
c <- dcast(c, h1 ~ cuts, value.var = "y")
c.h1.melt <- melt(c, id.vars = "h1")
p <- ggplot(c.h1.melt, aes(x = h1, y = value, color = variable)) +
geom_point(size = 4) +
stat_summary(fun.data = max.n, geom = "text", fun.y = max, colour = "red", angle = 90, size=4) +
scale_colour_brewer(palette="RdYlBu", name="Percentile", guide = guide_legend(reverse=TRUE)) +
scale_x_continuous(breaks=0:23, limits = c(0,23)) +
annotation_logticks(sides = "lr") +
theme_bw() +
scale_y_log10(breaks=c(1e0,1e1,1e2,1e3,1e4), labels = trans_format("log10", math_format(10^.x)), limits=c(1e0,1e4)) +
xlab("Hour of day") + ylab("Ping gaps [s]")
p
p <- ggplot(c.m1.melt, aes(x = m1/60, y = value, color = variable)) +
geom_point(size = 1) +
stat_summary(fun.data = give.n, geom = "text", fun.y = median, angle = 90, size=4) +
stat_summary(fun.data = max.n, geom = "text", fun.y = max, colour = "red", angle = 90, size=4) +
scale_colour_brewer(palette="RdYlBu", name="Percentile", guide = guide_legend(reverse=TRUE)) +
scale_x_continuous(breaks=0:23, limits = c(0,24)) +
annotation_logticks(sides = "lr") +
theme_bw() +
scale_y_log10(breaks=c(1e0,1e1,1e2,1e3,1e4), labels = trans_format("log10", math_format(10^.x)), limits=c(1e0,1e4)) +
xlab("Time of day") + ylab("Ping gaps [s]")
p
This creates an hourly grouped plot of gaps with the length of the longest gaps written right next to the data points:
Below is the minutely grouped plot. The numbers are unreadable, which is why I'd like to add conditional stats: only if the gap is longer than 5 minutes, or only for the ten longest gaps, or something like this.
I tried to just change the stat function to
max.n.filt <- function(x){
filter = 300
if ( x > filter ) {
return(c(y = max(x) + 0.4, label = round(max(10^x),2)))
} else {
return(c(y=x, label = ""))
}
}
and use this for the minutely grouped plot. But I got this error:
Error in list_to_dataframe(res, attr(.data, "split_labels")) :
Results do not have equal lengths
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Error in if (nrow(layer_data) == 0) return() : argument is of length zero
Calls: print ... print.ggplot -> ggplot_gtable -> Map -> mapply -> <Anonymous>
In addition: Warning message:
Removed 6 rows containing missing values (geom_point).
In addition, in the hourly plot, I'd like to write the number of samples per hour right next to the length of the gaps. I think I can add a new column to the c data frame, but unfortunately I can't find a way to do this.
Any help is very much appreciated.
See ?stat_summary.
fun.data : Complete summary function. Should take data frame as input
and return data frame as output
Your function max.n.filt uses an if() statement that tries to evaluate the condition x > filter. But when length(x) > 1, the if() statement only evaluates the condition for the first value of x. When used on a data frame, this will return a list cobbled together from the original input x and whatever label the if() statement returns.
> max.n.filt(data.frame(x=c(10,15,400)))
$y.x
[1] 10 15 400
$label
[1] ""
Try a function that uses ifelse() instead:
max.n.filt2 <- function(x){
filter = 300 # whatever threshold
y = ifelse( x > filter, max(x) + 1, x[,1] )
label = ifelse( x > filter, round(max(x),2), NA )
return(data.frame(y=y[,1], label=label[,1]))
}
> max.n.filt2(data.frame(x=c(10,15,400)))
y label
1 10 NA
2 15 NA
3 401 400
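A related variation is to reduce each group to a single row, so that at most one label is drawn per group. The sketch below is hypothetical (label_if_long and its threshold argument are made-up names) and assumes the summary function is handed a plain numeric vector of y values on the scale being plotted; on a log10 axis the values would first have to be back-transformed, as the 10^x in the question does.

```r
# Hypothetical summary function: one row per group, labelled only when the
# largest value exceeds the threshold. Assumes y is a numeric vector.
label_if_long <- function(y, threshold = 300) {
  top <- max(y, na.rm = TRUE)
  data.frame(
    y = top,
    # empty label below the threshold, so nothing visible is drawn
    label = if (top > threshold) as.character(round(top, 2)) else ""
  )
}

long_gap <- label_if_long(c(10, 15, 400))
short_gap <- label_if_long(c(10, 15))
```

A function of this shape could be passed as fun.data to stat_summary() with geom = "text".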
Alternatively, you might just find it easier to use geom_text(). I can't reproduce your example, but here's a simulated dataset:
set.seed(101)
sim_data <- expand.grid(m1=1:1440, variable=factor(c(0,0.25,0.5,0.75,1)))
sim_data$sample_size <- sapply(1:1440, function(.) sample(1:25, 1, replace=T))
sim_data$value = t(sapply(1:1440, function(.) quantile(rgamma(sim_data$sample_size, 0.9, 0.5),c(0,0.25,0.5,0.75,1))))[1:(1440*5)]
Pass geom_text() only the rows you wish to label, for example through its data argument (older ggplot2 releases also accepted subset = .(...) on the layer, but current ones no longer support it):
ggplot(sim_data, aes(x = m1/60, y = value, color = variable)) +
geom_point(size = 4) + geom_text(aes(label=round(value)), data = subset(sim_data, variable == 1 & value > 25), angle = 90, size = 4, colour = "red", hjust = -0.5)
If you have a column of sample sizes, those can be incorporated into label with paste():
ggplot(sim_data, aes(x = m1/60, y = value, color = variable)) +
geom_point(size = 4) + geom_text(aes(label=paste(round(value),", N=",sample_size)), data = subset(sim_data, variable == 1 & value > 25), angle = 90, size = 4, colour = "red", hjust = -0.25)
(or create a separate column in your data with whatever labels you want.) If you're asking about how to retrieve the sample sizes, you could modify your call to ddply() like this:
...
c2 <- ddply(C, .(h1), function (x) { cbind(summarise(x, y = quantile(x$gaps, cuts)), n=nrow(x)) } )
c2$cuts <- cuts
c2 <- dcast(c2, h1 + n ~ cuts, value.var = "y")
c2.h1.melt <- melt(c2, id.vars = c("h1","n"))
...
I have a dataframe a with three columns :
GeneName, Index1, Index2
I draw a scatterplot like this
ggplot(a, aes(log10(Index1+1), Index2)) +geom_point(alpha=1/5)
Then I want to color a point whose GeneName is "G1" and add a text box near that point, what might be the easiest way to do it?
You could create a subset containing just that point and then add it to the plot:
# create the subset
g1 <- subset(a, GeneName == "G1")
# plot the data
ggplot(a, aes(log10(Index1+1), Index2)) + geom_point(alpha=1/5) + # this is the base plot
geom_point(data=g1, colour="red") + # this adds a red point
geom_text(data=g1, label="G1", vjust=1) # this adds a label for the red point
NOTE: Since everyone keeps up-voting this question, I thought I would make it easier to read.
Something like this should work. You may need to mess around with the x and y arguments to geom_text().
library(ggplot2)
highlight.gene <- "G1"
set.seed(23456)
a <- data.frame(GeneName = paste("G", 1:10, sep = ""),
Index1 = runif(10, 100, 200),
Index2 = runif(10, 100, 150))
a$highlight <- ifelse(a$GeneName == highlight.gene, "highlight", "normal")
textdf <- a[a$GeneName == highlight.gene, ]
mycolours <- c("highlight" = "red", "normal" = "grey50")
a
textdf
ggplot(data = a, aes(x = Index1, y = Index2)) +
geom_point(size = 3, aes(colour = highlight)) +
scale_color_manual("Status", values = mycolours) +
geom_text(data = textdf, aes(x = Index1 * 1.05, y = Index2, label = "my label")) +
theme(legend.position = "none")