I am trying to plot two data frames as two layers using ggplot2's geom_raster function. The top layer contains NA values that are set to "transparent" so that the layer underneath stays visible. As a scale_fill_xxx function can't be used twice, I tried the following code (based on this post: ggplot2 - using two different color scales for overlayed plots):
library(ggplot2)
df1 <- data.frame(x=rep(c(1,2,3),times=3), y=c(1,1,1,2,2,2,3,3,3), data= c(NA,4,9,NA,2,7,NA,NA,3))
df2 <- data.frame(x=rep(c(1,2,3),times=3), y=c(1,1,1,2,2,2,3,3,3), data= c(1,NA,NA,2,NA,NA,1,2,NA))
ggplot() +
geom_raster(data=df1, aes(y= y, x= x, fill= data)) +
scale_fill_gradientn(name="df1", colours=c("red", "blue"), na.value = "transparent") +
geom_raster(data= df2, aes(y= y, x= x, colour= as.factor(data))) +
scale_colour_manual(values = c("green", "black"), name= "df2", labels= c("Class 1", "Class 2"), na.value="transparent")
The problem is that the "colour" / scale_colour_manual approach does not return what I expect (it returns a dark grey plot instead). I would like the df1 "data" column to be represented on a red-to-blue scale (NAs should be transparent) and the df2 "data" column to be represented according to class number ("1" = green and "2" = black).
Could anyone help me to understand what's wrong with my procedure?
Here is a solution (rescale() comes from the scales package):
library(scales)
df3 = merge(df1, df2, by = c("x","y"))
names(df3)[names(df3) == "data.x"] <- "data.1"
names(df3)[names(df3) == "data.y"] <- "data.2"
df3$data = df3$data.1
df3$data[is.na(df3$data)] = df3$data.2[is.na(df3$data)]
myGrad <- colorRampPalette(c('blue','red')) # color gradient
min_value = min(df3$data[df3$data >2]) # minimum value except 1 and 2
max_value = max(df3$data) # maximum value
param = max_value - min_value + 1 # number of colors in the gradient
ggplot(df3, aes(x, y, fill = data)) + geom_raster() +
scale_fill_gradientn(colours=c("green","black", myGrad(param)),
values = rescale(c(1, 2, seq(min_value, max_value, 1))), na.value = "transparent")
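To make the values argument concrete, here is the same arithmetic in base R for the df3 example, where min_value is 3 and max_value is 9. It assumes rescale() is scales::rescale() with its default settings, which map the smallest input to 0 and the largest to 1; the variable names are for illustration only.

```r
# Break points used above: the two class codes, then every gradient value.
breaks <- c(1, 2, seq(3, 9, 1))

# Linear rescaling to [0, 1]; this mirrors what scales::rescale() does by default.
positions <- (breaks - min(breaks)) / (max(breaks) - min(breaks))

# One colour per break: green and black for the classes, then the blue-to-red ramp.
colours <- c("green", "black", colorRampPalette(c("blue", "red"))(9 - 3 + 1))

print(data.frame(value = breaks, position = positions, colour = colours))
```

Each colour is pinned to the position of its value, so 1 maps to green, 2 to black, and 3 to 9 run along the blue-to-red ramp. Values that fall between two breaks would be interpolated between the neighbouring colours.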
I guess you will use this plot with larger values and ranges, so I also tried it with a 5x5 matrix:
set.seed(123)
df4 = data.frame(x=rep(c(1,2,3,4,5),5), y=c(rep(1,5), rep(2,5), rep(3,5), rep(4,5), rep(5,5)),
data = sample(c(1:20), 25, prob = c(0.2,0.2,rep(0.6/18,18)), replace = T))
min_value = min(df4$data[df4$data >2])
max_value = max(df4$data)
param = max_value - min_value + 1
ggplot(df4, aes(x, y, fill = data)) + geom_raster() +
scale_fill_gradientn(colours=c("green","black", myGrad(param)),
values = rescale(c(1, 2, seq(min_value, max_value, 1))), na.value = "transparent")
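If a merged single scale is too limiting, another option is to give each layer its own fill scale. The sketch below assumes the ggnewscale package (not used in the answer above) is installed; its new_scale_fill() lets a second fill scale be added after the first. geom_raster() only draws the fill aesthetic, which would explain why mapping colour in the question left the second layer at its default dark grey fill. Dropping the NA rows of df2 is a choice made here to keep the legend to the two classes.

```r
library(ggplot2)
library(ggnewscale)  # assumption: installed from CRAN

df1 <- data.frame(x = rep(1:3, times = 3), y = rep(1:3, each = 3),
                  data = c(NA, 4, 9, NA, 2, 7, NA, NA, 3))
df2 <- data.frame(x = rep(1:3, times = 3), y = rep(1:3, each = 3),
                  data = c(1, NA, NA, 2, NA, NA, 1, 2, NA))

# Keep only the classified cells of the second layer.
df2_classes <- subset(df2, !is.na(data))

p <- ggplot() +
  geom_raster(data = df1, aes(x = x, y = y, fill = data)) +
  scale_fill_gradientn(name = "df1", colours = c("red", "blue"),
                       na.value = "transparent") +
  new_scale_fill() +  # layers added after this line use a fresh fill scale
  geom_raster(data = df2_classes, aes(x = x, y = y, fill = factor(data))) +
  scale_fill_manual(name = "df2", values = c("1" = "green", "2" = "black"),
                    labels = c("Class 1", "Class 2"))
print(p)
```

This keeps two separate legends, one continuous and one discrete, at the cost of an extra package dependency.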
Related
I'm wondering if in a geom_line you can make it so the colors of, say, the dashes within a single line alternate (rather than the colors differing between lines). For example, if I wanted this singular line to alternate red, green, and blue rather than being just red.
library(tidyverse)
ggplot(tibble(x = 1:10, y = 1:10), aes(x, y)) +
geom_line(linetype = "dashed", color = "red") # i'd like to say something like, color = c("red", "green", "blue") instead
While a little inefficient, a little-known thing about R's par(lty=) (that geom_line(linetype=) shares) is that it can be specified as on/off stretches. From ?par under Line Type Specification:
Line types can either be specified by giving an index into a small
built-in table of line types (1 = solid, 2 = dashed, etc, see
'lty' above) ...
(which is what most tutorials/howtos/plots tend to use)
... or directly as the lengths of on/off stretches of
line. This is done with a string of an even number (up to eight)
of characters, namely _non-zero_ (hexadecimal) digits which give
the lengths in consecutive positions in the string. For example,
the string '"33"' specifies three units on followed by three off
and '"3313"' specifies three units on followed by three off
followed by one on and finally three off. The 'units' here are
(on most devices) proportional to 'lwd', and with 'lwd = 1' are in
pixels or points or 1/96 inch.
So with your dat, one could do
dat <- tibble(x = 1:10, y = 1:10)
ggplot(dat, aes(x,y)) +
geom_line(linetype="1741", color="red", size=3) +
geom_line(linetype="1345", color="blue", size=3) +
geom_line(linetype="49", color="green", size=3)
to get a single line whose dashes alternate between the three colors.
I could not get it to work without one blank space: the on/off stretches must always start with an "on", and end with an "off"; as such I could not find a pattern that didn't (at least once) end on an "on" without an imposed gap.
For further explanation, since we always must start with an "on", I start all three with at least a single pixel of "on"; the trick is to make the "long" stretch for the beginning to be the last line plotted, so it over-plots the others.
red: R.......RRRR.
     1       -4--
      ---7---    1
blu: B...BBBB.....
     1   -4--
      -3-    --5--
grn: GGGG.........
     -4--
         ----9----
This has some advantages: regardless of size=, it scales the same. For instance, omitting size= gives the same alternating pattern on a thinner line.
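To check that the three dash patterns stay in step, the hex strings can be decoded into on/off lengths and summed. This is a small base-R sketch; decode_lty is a hypothetical helper name, not part of ggplot2 or the graphics package.

```r
# Decode a linetype string of hex digits into alternating on/off lengths.
decode_lty <- function(lty) strtoi(strsplit(lty, "")[[1]], base = 16L)

# The three patterns and colors used in the geom_line() calls above.
patterns <- c(red = "1741", blue = "1345", green = "49")
stretches <- lapply(patterns, decode_lty)

# Total length of one repeat of each pattern; they must match to stay aligned.
periods <- vapply(stretches, sum, integer(1))
print(stretches)
print(periods)
```

All three patterns repeat every 13 units, which is why the colors keep their relative positions along the whole line.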
Using approx:
# number of points at which interpolation takes place
# increase if line takes sharp turns
n = 100
# number of segments along line, according to taste
n_seg = 20
# segment colors
cols = c("red", "green", "blue")
# interpolate
d = approx(dat$x, dat$y, n = n)
# create start and end points for segments
d2 = data.frame(x = head(d$x, -1), xend = d$x[-1],
y = head(d$y, -1), yend = d$y[-1])
# create vector of segment colors
d2$col = rep(cols, each = ceiling((n - 1) / n_seg), length.out = n - 1)
ggplot(d2, aes(x = x, xend = xend, y = y, yend = yend, color = col)) +
geom_segment() + scale_color_identity(guide = "none")
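The rep() call is what turns the 99 short segments into roughly n_seg colored runs. A quick base-R check, using the same n, n_seg and cols as above, shows how long each run is:

```r
n <- 100
n_seg <- 20
cols <- c("red", "green", "blue")

# Same expression as above: each color is repeated in blocks, then recycled.
seg_col <- rep(cols, each = ceiling((n - 1) / n_seg), length.out = n - 1)

# Run-length encoding shows how many consecutive segments share a color.
runs <- rle(seg_col)
print(data.frame(color = runs$values, segments = runs$lengths))
```

With these settings there are 20 runs: 19 of five segments each and a final one of four.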
This is an implementation of a new Stat based on GeomSegment which creates alternating segments of different colors. This works by passing the alternating colors to the data frame created in Stat$compute_group. GeomSegment uses StatIdentity, so no need to specifically map xend, yend and color.
BIG THANKS to Henrik for showing a very neat way of creating the segments. (my own way was very convoluted, and I'll leave it in this thread for posterity). The only remaining "problem" is that the segments might have different lengths in changing slopes - on the other hand, it might be visually desirable to have different segment lengths in this case.
library(ggplot2)
## attaching just for demonstration purpose
library(patchwork)
#' geom_colorpath
#' @description lines with alternating color "just for the effect".
#' @name colorpath
#' @examples
#' @export
StatColorPath <- ggproto("StatColorPath", Stat,
compute_group = function(data, scales, params,
n_seg = 20, n = 100, cols = c("black", "white")) {
# interpolate
d <- approx(data$x, data$y, n = n)
# create start and end points for segments
d2 <- data.frame(
x = head(d$x, -1), xend = d$x[-1],
y = head(d$y, -1), yend = d$y[-1]
)
# create vector of segment colors
d2$color <- rep(cols, each = ceiling((n - 1) / n_seg), length.out = n - 1)
d2
},
required_aes = c("x", "y")
)
#' @rdname colorpath
#' @import ggplot2
#' @inheritParams ggplot2::layer
#' @inheritParams ggplot2::geom_segment
#' @param n_seg number of segments along line, according to taste
#' @param n number of points at which interpolation takes place
#' increase if line takes sharp turns
#' @param cols vector of alternating colors
#' @export
geom_colorpath <- function(mapping = NULL, data = NULL, geom = "segment",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, cols = c("black", "white"),
n_seg = 20, n = 100, ...) {
layer(
stat = StatColorPath, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, cols = cols, n = n, n_seg = n_seg,...)
)
}
## examples
dat <- data.frame(x = seq(2,10, 2), y = seq(4,20, 4))
p1 <- ggplot(dat, aes(x = x, y = y)) +
geom_colorpath()+
ggtitle("Default colors")
p2 <- ggplot(dat, aes(x, y)) +
geom_colorpath(cols = c("red", "blue"))+
ggtitle("Two colors")
p3 <- ggplot(dat, aes(x, y)) +
geom_colorpath(cols = c("red", "blue", "green"))+
ggtitle("Three colors")
p4 <- ggplot(dat, aes(x, y)) +
geom_colorpath(cols = c("red", "blue", "green", "white"))+
ggtitle("Four colors")
wrap_plots(mget(ls(pattern = "p[1-9]")))
air_df <- data.frame(x = 1: length(AirPassengers), y = c(AirPassengers))
a1 <- ggplot(air_df, aes(x, y)) +
geom_colorpath(cols = c("red", "blue", "green"))+
ggtitle("Works also with more complex curves")
a2 <- ggplot(air_df, aes(x, y)) +
geom_colorpath(cols = c("red", "blue", "green"), n_seg = 150)+
ggtitle("... more color segments")
a1 / a2
Created on 2022-06-22 by the reprex package (v2.0.1)
Here's a ggplot hack that is simple but works for two colors only (your question is the top result when searching for "alternating colored dashed line", and I wanted to put this option out there). It overlays two lines: one solid, the other dashed.
library(dplyr)
library(ggplot2)
library(reshape2)
# Create df
x_value <- 1:10
group1 <- c(0,1,2,3,4,5,6,7,8,9)
group2 <- c(0,2,4,6,8,10,12,14,16,18)
dat <- data.frame(x_value, group1, group2) %>%
mutate(group2_2 = group2) %>% # Duplicate the column that you want to be alternating colors
melt(id.vars = "x_value", variable.name = "group", value.name ="y_value") # Long format
# Put in your selected order
dat$group <- factor(dat$group, levels=c("group1", "group2", "group2_2"))
# Plot
ggplot(dat, aes(x=x_value, y=y_value)) +
geom_line(aes(color=group, linetype=group), size=1) +
scale_color_manual(values=c("black", "red", "black")) +
scale_linetype_manual(values=c("solid", "solid", "dashed"))
Unfortunately, the legend still needs to be edited by hand.
I'm trying to plot 2 sets of data points and a single line in R using ggplot.
The issue I'm having is with the legend.
As can be seen in the attached image, the legend applies the lines to all 3 data sets even though only one of them is plotted with a line.
I have melted the data into one long frame, but this still requires me to filter the data sets for each individual call to geom_point() and geom_path().
I want to graph the melted data, plotting a line based on one data set, and points on the remaining two, with a complete legend.
Here is the sample script I wrote to produce the plot:
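library(ggplot2)
library(reshape2) # melt()
library(dplyr) # filter()
library(grid) # pushViewport(), viewport()
xseq <- 1:100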
x <- rnorm(n = 100, mean = 0.5, sd = 2)
x2 <- rnorm(n = 100, mean = 1, sd = 0.5)
x.lm <- lm(formula = x ~ xseq)
x.fit <- predict(x.lm, newdata = data.frame(xseq = 1:100), type = "response", se.fit = TRUE)
my_data <- data.frame(x = xseq, ypoints = x, ylines = x.fit$fit, ypoints2 = x2)
## Now try and plot it
melted_data <- melt(data = my_data, id.vars = "x")
p <- ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable, linetype = variable)) +
geom_point(data = filter(melted_data, variable == "ypoints")) +
geom_point(data = filter(melted_data, variable == "ypoints2")) +
geom_path(data = filter(melted_data, variable == "ylines"))
pushViewport(viewport(layout = grid.layout(1, 1))) # One on top of the other
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
You can set them manually like this:
We set linetype = "solid" for the first item and "blank" for the others (no line).
Similarly, for the first item we set no shape (NA), and for the others we set whatever shape we need (I just put 7 and 8 there as an example). See e.g. http://www.r-bloggers.com/how-to-remember-point-shape-codes-in-r/ to help you choose the correct shapes for your needs.
If you are happy with dots, then you can use my_shapes = c(NA,16,16), and scale_shape_manual(...) is not needed.
my_shapes = c(NA,7,8)
ggplot(data = melted_data, aes(x = x, y = value, color=variable, shape=variable )) +
geom_path(data = filter(melted_data, variable == "ylines") ) +
geom_point(data = filter(melted_data, variable %in% c("ypoints", "ypoints2"))) +
scale_colour_manual(values = c("red", "green", "blue"),
guide = guide_legend(override.aes = list(
linetype = c("solid", "blank","blank"),
shape = my_shapes))) +
scale_shape_manual(values = my_shapes)
But I am very curious whether there is a more automated way. Hopefully someone can post a better answer.
This post relied quite heavily on this answer: ggplot2: Different legend symbols for points and lines
From a data frame I want to plot a pie chart for five categories with their percentages as labels in the same graph in order from highest to lowest, going clockwise.
My code is:
League<-c("A","B","A","C","D","E","A","E","D","A","D")
data<-data.frame(League) # I have more variables
p<-ggplot(data,aes(x="",fill=League))
p<-p+geom_bar(width=1)
p<-p+coord_polar(theta="y")
p<-p+geom_text(data,aes(y=cumsum(sort(table(data)))-0.5*sort(table(data)),label=paste(as.character(round(sort(table(data))/sum(table(data)),2)),rep("%",5),sep="")))
p
I use
cumsum(sort(table(data)))-0.5*sort(table(data))
to place the label in the corresponding portion and
label=paste(as.character(round(sort(table(data))/sum(table(data)),2)),rep("%",5),sep="")
for the labels which is the percentages.
I get the following output:
Error: ggplot2 doesn't know how to deal with data of class uneval
I've preserved most of your code. I found this pretty easy to debug by leaving out the coord_polar... easier to see what's going on as a bar graph.
The main thing was to reorder the factor from highest to lowest to get the plotting order correct, then just playing with the label positions to get them right. I also simplified your code for the labels (you don't need the as.character or the rep, and paste0 is a shortcut for sep = "".)
League<-c("A","B","A","C","D","E","A","E","D","A","D")
data<-data.frame(League) # I have more variables
data$League <- reorder(data$League, X = data$League, FUN = function(x) -length(x))
at <- nrow(data) - as.numeric(cumsum(sort(table(data)))-0.5*sort(table(data)))
label=paste0(round(sort(table(data))/sum(table(data)),2) * 100,"%")
p <- ggplot(data,aes(x="", fill = League)) +
geom_bar(width = 1) +
coord_polar(theta="y") +
annotate(geom = "text", y = at, x = 1, label = label)
p
The at calculation is finding the centers of the wedges. (It's easier to think of them as the centers of bars in a stacked bar plot, just run the above plot without the coord_polar line to see.) The at calculation can be broken out as follows:
table(data) is the number of rows in each group, and sort(table(data)) puts them in the order they'll be plotted. Taking the cumsum() of that gives us the edges of each bar when stacked on top of each other, and multiplying by 0.5 gives us half the height of each bar in the stack (or half the width of each wedge of the pie).
as.numeric() simply ensures we have a numeric vector rather than an object of class table.
Subtracting the half-widths from the cumulative heights gives the center of each bar when stacked up. But ggplot will stack the bars with the biggest on the bottom, whereas all our sort()ing puts the smallest first, so we need to do nrow - everything because what we've actually calculated are the label positions relative to the top of the bar, not the bottom. (And, with the original disaggregated data, nrow() is the total number of rows, hence the total height of the bar.)
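The same calculation, written out step by step for the League vector, may make the label positions easier to follow. It uses only base R; the intermediate names (counts, edges, centers_from_top) are illustrative and do not appear in the answer's code.

```r
League <- c("A", "B", "A", "C", "D", "E", "A", "E", "D", "A", "D")

counts <- sort(table(League))   # group sizes, smallest first
edges <- cumsum(counts)         # cumulative heights of the stacked bars
centers_from_top <- as.numeric(edges - 0.5 * counts)

# Flip the positions, as the nrow(data) - ... step does above.
at <- length(League) - centers_from_top

print(data.frame(League = names(counts), n = as.numeric(counts), at = at))
```

The largest group (A, four rows) ends up with its label at 2, the middle of the 0 to 4 range at the bottom of the stack.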
Preface: I did not make pie charts of my own free will.
Here's a modification of the ggpie function that includes percentages:
library(ggplot2)
library(dplyr)
#
# df$main should contain observations of interest
# df$condition can optionally be used to facet wrap
#
# labels should be a character vector of same length as group_by(df, main) or
# group_by(df, condition, main) if facet wrapping
#
pie_chart <- function(df, main, labels = NULL, condition = NULL) {
# convert the data into percentages. group by conditional variable if needed
df <- group_by_(df, .dots = c(condition, main)) %>%
summarize(counts = n()) %>%
mutate(perc = counts / sum(counts)) %>%
arrange(desc(perc)) %>%
mutate(label_pos = cumsum(perc) - perc / 2,
perc_text = paste0(round(perc * 100), "%"))
# reorder the category factor levels to order the legend
df[[main]] <- factor(df[[main]], levels = unique(df[[main]]))
# if labels haven't been specified, use what's already there
if (is.null(labels)) labels <- as.character(df[[main]])
p <- ggplot(data = df, aes_string(x = factor(1), y = "perc", fill = main)) +
# make stacked bar chart with black border
geom_bar(stat = "identity", color = "black", width = 1) +
# add the percents to the interior of the chart
geom_text(aes(x = 1.25, y = label_pos, label = perc_text), size = 4) +
# add the category labels to the chart
# increase x / play with label strings if labels aren't pretty
geom_text(aes(x = 1.82, y = label_pos, label = labels), size = 4) +
# convert to polar coordinates
coord_polar(theta = "y") +
# formatting
scale_y_continuous(breaks = NULL) +
scale_fill_discrete(name = "", labels = unique(labels)) +
theme(text = element_text(size = 22),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_blank())
# facet wrap if that's happening
if (!is.null(condition)) p <- p + facet_wrap(condition)
return(p)
}
Example:
# sample data
resps <- c("A", "A", "A", "F", "C", "C", "D", "D", "E")
cond <- c(rep("cat A", 5), rep("cat B", 4))
example <- data.frame(resps, cond)
Just like a typical ggplot call:
ex_labs <- c("alpha", "charlie", "delta", "echo", "foxtrot")
pie_chart(example, main = "resps", labels = ex_labs) +
labs(title = "unfacetted example")
ex_labs2 <- c("alpha", "charlie", "foxtrot", "delta", "charlie", "echo")
pie_chart(example, main = "resps", labels = ex_labs2, condition = "cond") +
labs(title = "facetted example")
The following self-contained function worked for me; it was greatly inspired by the one here:
ggpie <- function (data)
{
# prepare name
deparse( substitute(data) ) -> name ;
# prepare percents for legend
table( factor(data) ) -> tmp.count1
prop.table( tmp.count1 ) * 100 -> tmp.percent1 ;
paste( tmp.percent1, " %", sep = "" ) -> tmp.percent2 ;
as.vector(tmp.count1) -> tmp.count1 ;
# find breaks for legend
rev( tmp.count1 ) -> tmp.count2 ;
rev( cumsum( tmp.count2 ) - (tmp.count2 / 2) ) -> tmp.breaks1 ;
# prepare data
data.frame( vector1 = tmp.count1, names1 = names(tmp.percent1) ) -> tmp.df1 ;
# plot data
tmp.graph1 <- ggplot(tmp.df1, aes(x = 1, y = vector1, fill = names1 ) ) +
geom_bar(stat = "identity", color = "black" ) +
guides( fill = guide_legend(override.aes = list( colour = NA ) ) ) +
coord_polar( theta = "y" ) +
theme(axis.ticks = element_blank(),
axis.text.y = element_blank(),
axis.text.x = element_text( colour = "black"),
axis.title = element_blank(),
plot.title = element_text( hjust = 0.5, vjust = 0.5) ) +
scale_y_continuous( breaks = tmp.breaks1, labels = tmp.percent2 ) +
ggtitle( name ) +
scale_fill_grey( name = "") ;
return( tmp.graph1 )
} ;
An example :
sample( LETTERS[1:6], 200, replace = TRUE) -> vector1 ;
ggpie(vector1)
I have a dataframe a with three columns :
GeneName, Index1, Index2
I draw a scatterplot like this
ggplot(a, aes(log10(Index1+1), Index2)) +geom_point(alpha=1/5)
Then I want to color a point whose GeneName is "G1" and add a text box near that point, what might be the easiest way to do it?
You could create a subset containing just that point and then add it to the plot:
# create the subset
g1 <- subset(a, GeneName == "G1")
# plot the data
ggplot(a, aes(log10(Index1+1), Index2)) + geom_point(alpha=1/5) + # this is the base plot
geom_point(data=g1, colour="red") + # this adds a red point
geom_text(data=g1, label="G1", vjust=1) # this adds a label for the red point
NOTE: Since everyone keeps up-voting this question, I thought I would make it easier to read.
Something like this should work. You may need to mess around with the x and y arguments to geom_text().
library(ggplot2)
highlight.gene <- "G1"
set.seed(23456)
a <- data.frame(GeneName = paste("G", 1:10, sep = ""),
Index1 = runif(10, 100, 200),
Index2 = runif(10, 100, 150))
a$highlight <- ifelse(a$GeneName == highlight.gene, "highlight", "normal")
textdf <- a[a$GeneName == highlight.gene, ]
mycolours <- c("highlight" = "red", "normal" = "grey50")
a
textdf
ggplot(data = a, aes(x = Index1, y = Index2)) +
geom_point(size = 3, aes(colour = highlight)) +
scale_color_manual("Status", values = mycolours) +
geom_text(data = textdf, aes(x = Index1 * 1.05, y = Index2, label = "my label")) +
theme(legend.position = "none")
Here is some example code for R using the ggplot2 library:
theme_set(theme_bw())
dat = data.frame(value = rnorm(100,sd=2.5))
dat = within(dat, {
value_scaled = scale(value, scale = sd(value))
obs_idx = 1:length(value)
})
ggplot(aes(x = obs_idx, y = value_scaled), data = dat) +
geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
geom_line() + geom_point()
My question: based on this example, how can I draw the first 10 lines in red and the remaining lines in blue in ggplot2? I tried to use some kind of layer syntax, but it doesn't work.
First, add another column to your data frame dat. It has value 0 for the first 10 rows and 1 for the rest.
dat$group <- factor(rep.int(c(0, 1), c(10, nrow(dat)-10)))
Generate the plot:
library(ggplot2)
ggplot(aes(x = obs_idx, y = value_scaled), data = dat) +
geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
geom_line(aes(colour = group), show.legend = FALSE) +
scale_colour_manual(values = c("red", "blue")) +
geom_point()
The parameter show.legend = FALSE suppresses the legend for the red and blue lines. (Older ggplot2 versions called this argument show_guide.)
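One thing to watch for: mapping colour to a factor also splits the line into two groups, so the segment between observations 10 and 11 is not drawn. A possible tweak, sketched below under the assumption that a single continuous line is wanted, is to add group = 1 to the aesthetics of the line layer. The data setup is rebuilt here only to keep the example self-contained.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(value = rnorm(100, sd = 2.5))
dat$value_scaled <- as.numeric(scale(dat$value, scale = sd(dat$value)))
dat$obs_idx <- seq_along(dat$value)
dat$group <- factor(rep.int(c(0, 1), c(10, nrow(dat) - 10)))

p <- ggplot(dat, aes(x = obs_idx, y = value_scaled)) +
  geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
  # group = 1 keeps one connected path; colour still changes after point 10
  geom_line(aes(colour = group, group = 1), show.legend = FALSE) +
  scale_colour_manual(values = c("red", "blue")) +
  geom_point()
print(p)
```

With a single group, each segment takes the colour of the point it starts from, so the connecting segment from observation 10 to 11 is drawn in red.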
OK, I could manage layers, the code is (not elegant, but works):
require(ggplot2)
value=round(rnorm(50,200,50),0)
nmbrs<-length(value) ## length of vector
obrv<-1:length(value) ## list of observations
#create data frame from the values
data_lj<-data.frame(obrv,value)
data_lj20<-data.frame(data_lj[1:20,1:2])
data_lj21v<-data.frame(data_lj[20:nmbrs,1:2])
#plot with ggplot
rr<-ggplot()+
# geom_line() adds each line layer; a bare layer() call also needs stat and position in current ggplot2
geom_line(mapping=aes(obrv,value),data=data_lj20,colour="red")+
geom_line(mapping=aes(obrv,value),data=data_lj21v,colour="blue")
print(rr)