I have a data frame (dat) with two columns: 1) Month and 2) Value. I would like to show that the x-axis of my boxplot is not continuous by interrupting it with two angled lines, leaving the axis empty between them.
Example Data and Boxplot
library(ggplot2)
set.seed(321)
dat <- data.frame(matrix(ncol = 2, nrow = 18))
x <- c("Month", "Value")
colnames(dat) <- x
dat$Month <- rep(c(1,2,3,10,11,12),3)
dat$Value <- rnorm(18,20,2)
ggplot(data = dat, aes(x = factor(Month), y = Value)) +
geom_boxplot() +
labs(x = "Month") +
theme_bw() +
theme(panel.grid = element_blank(),
text = element_text(size = 16),
axis.text.x = element_text(size = 14, color = "black"),
axis.text.y = element_text(size = 14, color = "black"))
The ideal figure would look something like below. How can I make this discontinuous axis in ggplot?
You could make use of the extended axis guides in the ggh4x package. Alas, you won't easily be able to create the "separators" without a hack similar to the one suggested by user Zhiqiang Wang.
guide_axis_truncated accepts vectors to define the lower and upper truncation points. This also works for units, by the way; you then have to pass the vector inside the unit function (e.g., trunc_lower = unit(c(0, .45), "npc")).
library(ggplot2)
library(ggh4x)
set.seed(321)
dat <- data.frame(matrix(ncol = 2, nrow = 18))
x <- c("Month", "Value")
colnames(dat) <- x
dat$Month <- rep(c(1,2,3,10,11,12),3)
dat$Value <- rnorm(18,20,2)
# this is to make it slightly more programmatic
x1end <- 3.45
x2start <- 3.55
p <-
ggplot(data = dat, aes(x = factor(Month), y = Value)) +
geom_boxplot() +
labs(x = "Month") +
theme_classic() +
theme(axis.line = element_line(colour = "black"))
p +
guides(x = guide_axis_truncated(
trunc_lower = c(-Inf, x2start),
trunc_upper = c(x1end, Inf)
))
Created on 2021-11-01 by the reprex package (v2.0.1)
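The hard-coded x1end and x2start above could also be derived from the data. A minimal sketch in base R, assuming each factor level sits at an integer position on the discrete axis (1, 2, 3, ...) and that a jump of more than one month counts as a break. axis_gap_positions and half_width are hypothetical names for illustration, not part of ggh4x:

```r
# Hypothetical helper: find where consecutive month values jump by more than 1
# and return truncation vectors in discrete-axis units.
axis_gap_positions <- function(values, half_width = 0.05) {
  lv <- sort(unique(values))
  gap_after <- which(diff(lv) > 1)      # index of the level just before each gap
  mid <- gap_after + 0.5                # midpoint between the two neighbouring levels
  list(
    trunc_upper = c(mid - half_width, Inf),   # where each axis piece ends
    trunc_lower = c(-Inf, mid + half_width)   # where each axis piece starts
  )
}

gaps <- axis_gap_positions(c(1, 2, 3, 10, 11, 12))
gaps$trunc_upper
gaps$trunc_lower
```

For the example months this reproduces the values used above (3.45 and 3.55), so the two vectors could be passed to trunc_upper and trunc_lower directly.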
The code below takes user Zhiqiang Wang's hack a step further. You will see I am using simple trigonometry to calculate the segment coordinates. To make the angle actually look the way it is defined in the function, you would need to use coord_equal.
# a simple function to help make the segments
add_separators <- function(x, y = 0, angle = 45, length = .1){
add_y <- length * sin(angle * pi/180)
add_x <- length * cos(angle * pi/180)
## making the list for your segments
myseg <- list(x = x - add_x, xend = x + add_x,
y = rep(y - add_y, length(x)), yend = rep(y + add_y, length(x)))
## this function returns an annotate layer with your segment coordinates
annotate("segment",
x = myseg$x, xend = myseg$xend,
y = myseg$y, yend = myseg$yend)
}
# you will need to set limits for correct positioning of your separators
# I chose 0.05 because this is the expand factor by default
y_sep <- min(dat$Value) -0.05*(min(dat$Value))
p +
guides(x = guide_axis_truncated(
trunc_lower = c(-Inf, x2start),
trunc_upper = c(x1end, Inf)
)) +
add_separators(x = c(x1end, x2start), y = y_sep, angle = 70) +
# you need to set expand to 0
scale_y_continuous(expand = c(0,0)) +
## to make the angle look like specified, you would need to use coord_equal()
coord_cartesian(clip = "off", ylim = c(y_sep, NA))
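To see what the trigonometry in add_separators produces, the offsets can be computed on their own. A small sketch in base R. separator_coords is a hypothetical name used only for illustration, and the y value is arbitrary:

```r
# Hypothetical stand-alone version of the coordinate calculation:
# each separator is centred on (x, y) and extends `length` in both directions.
separator_coords <- function(x, y = 0, angle = 45, length = 0.1) {
  theta <- angle * pi / 180             # degrees to radians
  add_x <- length * cos(theta)
  add_y <- length * sin(theta)
  data.frame(x = x - add_x, xend = x + add_x,
             y = y - add_y, yend = y + add_y)
}

segs <- separator_coords(x = c(3.45, 3.55), y = 15, angle = 70)
segs

# In data units each segment is 2 * length long
sqrt((segs$xend - segs$x)^2 + (segs$yend - segs$y)^2)
```

The angle is exact only in data units. Since one x unit and one y unit usually cover different distances on screen, the drawn angle matches the requested one only with equal coordinate scaling.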
I think it is possible to get what you want. It may take some work.
Here is your graph:
library(ggplot2)
set.seed(321)
dat <- data.frame(matrix(ncol = 2, nrow = 18))
x <- c("Month", "Value")
colnames(dat) <- x
dat$Month <- rep(c(1,2,3,10,11,12),3)
dat$Value <- rnorm(18,20,2)
p <- ggplot(data = dat, aes(x = factor(Month), y = Value)) +
geom_boxplot() +
labs(x = "Month") +
theme_bw() +
theme(panel.grid = element_blank(),
text = element_text(size = 16),
axis.text.x = element_text(size = 14, color = "black"),
axis.text.y = element_text(size = 14, color = "black"))
Here is my effort:
p + annotate("segment", x = c(3.3, 3.5), xend = c(3.6, 3.8), y = c(14, 14), yend = c(15, 15))+
coord_cartesian(clip = "off", ylim = c(15, 25))
Get something like this:
If you want to go further, it may take several tries to get it right:
p + annotate("segment", x = c(3.3, 3.5), xend = c(3.6, 3.8), y = c(14, 14), yend = c(15, 15))+
annotate("segment", x = c(0, 3.65), xend = c(3.45, 7), y = c(14.55, 14.55), yend = c(14.55, 14.55)) +
coord_cartesian(clip = "off", ylim = c(15, 25)) +
theme_classic()+
theme(axis.line.x = element_blank())
Just replace the axis with two new lines. This is a rough idea; it may take some time to make it perfect.
You could use facet_wrap. If you assign the first 3 months to one group, and the other months to another, then you can produce two plots that are side by side and use a single y axis.
It's not exactly what you want, but it will show the data effectively, and highlights the fact that the x axis is not continuous.
dat$group[dat$Month %in% c("1", "2", "3")] <- 1
dat$group[dat$Month %in% c("10", "11", "12")] <- 2
ggplot(data = dat, aes(x = factor(Month), y = Value)) +
geom_boxplot() +
labs(x = "Month") +
theme_bw() +
theme(panel.grid = element_blank(),
text = element_text(size = 16),
axis.text.x = element_text(size = 14, color = "black"),
axis.text.y = element_text(size = 14, color = "black")) +
facet_wrap(~group, scales = "free_x")
* Differences in the plot are likely due to different versions of R, where set.seed gives different results.
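The group assignment above lists the months by hand. If the breaks should be found automatically, one option is to start a new group wherever consecutive month values jump by more than one. A sketch in base R, using a plain vector that mirrors dat$Month; the variable names are only illustrative:

```r
months <- rep(c(1, 2, 3, 10, 11, 12), 3)   # same values as dat$Month

lv <- sort(unique(months))
run_id <- cumsum(c(1, diff(lv) > 1))       # new id after every gap: 1 1 1 2 2 2
group <- run_id[match(months, lv)]         # map each observation to its run

table(months, group)
```

The resulting vector could be stored as dat$group and used in facet_wrap(~group, scales = "free_x") exactly as above.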
Related
I am using the windrose function posted here: Wind rose with ggplot (R)?
I need to have the percents on the figure showing on the individual lines (rather than on the left side), but so far I have not been able to figure out how. (see figure below for depiction of goal)
Here is the code that makes the figure:
p.windrose <- ggplot(data = data,
aes(x = dir.binned,y = (..count..)/sum(..count..),
fill = spd.binned)) +
geom_bar()+
scale_y_continuous(breaks = ybreaks.prct,labels=percent)+
ylab("")+
scale_x_discrete(drop = FALSE,
labels = waiver()) +
xlab("")+
coord_polar(start = -((dirres/2)/360) * 2*pi) +
scale_fill_manual(name = "Wind Speed (m/s)",
values = spd.colors,
drop = FALSE)+
theme_bw(base_size = 12, base_family = "Helvetica")
I marked up the figure I have so far with what I am trying to do! It'd be neat if the labels either auto-picked the location with the least wind in that direction, or if it had a tag for the placement so that it could be changed.
I tried using geom_text, but I get an error saying that "aesthetics must be valid data columns".
Thanks for your help!
One of the things you could do is to make an extra data.frame that you use for the labels. Since the data isn't available from your question, I'll illustrate with mock data below:
library(ggplot2)
# Mock data
df <- data.frame(
x = 1:360,
y = runif(360, 0, 0.20)
)
labels <- data.frame(
x = 90,
y = scales::extended_breaks()(range(df$y))
)
ggplot(data = df,
aes(x = as.factor(x), y = y)) +
geom_point() +
geom_text(data = labels,
aes(label = scales::percent(y, 1))) +
scale_x_discrete(breaks = seq(0, 1, length.out = 9) * 360) +
coord_polar() +
theme(axis.ticks.y = element_blank(), # Disables default y-axis
axis.text.y = element_blank())
@teunbrand's answer got me very close! I wanted to add the code I used to get everything just right in case anyone in the future has a similar problem.
# Create the labels:
x_location <- pi # x location of the labels
# Get the percentage
T_data <- data %>%
dplyr::group_by(dir.binned) %>%
dplyr::summarise(count= n()) %>%
dplyr::mutate(y = count/sum(count))
labels <- data.frame(x = x_location,
y = scales::extended_breaks()(range(T_data$y)))
# Create figure
p.windrose <- ggplot() +
geom_bar(data = data,
aes(x = dir.binned, y = (..count..)/sum(..count..),
fill = spd.binned))+
geom_text(data = labels,
aes(x=x, y=y, label = scales::percent(y, 1))) +
scale_y_continuous(breaks = waiver(),labels=NULL)+
scale_x_discrete(drop = FALSE,
labels = waiver()) +
ylab("")+xlab("")+
coord_polar(start = -((dirres/2)/360) * 2*pi) +
scale_fill_manual(name = "Wind Speed (m/s)",
values = spd.colors,
drop = FALSE)+
theme_bw(base_size = 12, base_family = "Helvetica") +
theme(axis.ticks.y = element_blank(), # Disables default y-axis
axis.text.y = element_blank())
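The question also asks whether the labels could be placed automatically in the direction with the least wind. One possible sketch, using made-up direction bins rather than the asker's data, and assuming that a position on a discrete x axis corresponds to the integer index of the factor level:

```r
# Made-up direction bins standing in for data$dir.binned
dir_binned <- factor(c("N", "N", "E", "S", "S", "S", "W", "W"),
                     levels = c("N", "E", "S", "W"))

counts <- table(dir_binned)               # observations per direction, empty levels included
x_location <- unname(which.min(counts))   # index of the first least-populated direction
levels(dir_binned)[x_location]
```

The resulting x_location could then replace the fixed value used when building the labels data frame.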
I'm trying to make a heatmap using ggplot and facet_grid. I have a categorical y axis, and I want the y axis labels to be left-aligned. When I use theme(axis.text.y.left = element_text(hjust = 0)), each panel's labels are aligned independently. Here is the code:
#data
set.seed(1)
gruplar <- NA
for(i in 1:20) gruplar[i] <- paste(LETTERS[sample(c(1:20),sample(c(1:20),1),replace = T) ],
sep="",collapse = "")
gruplar <- cbind(gruplar,anagruplar=rep(1:4,each=5))
tarih <- data.frame(yil= rep(2014:2019,each=12) ,ay =rep_len(1:12, length.out = 72))
gruplar <- gruplar[rep(1:nrow(gruplar),each=nrow(tarih)),]
tarih <- tarih[rep_len(1:nrow(tarih),length.out = nrow(gruplar)),]
grouped <- cbind(tarih,gruplar)
grouped$value <- rnorm(nrow(grouped))
#plot
p <- ggplot(grouped,aes(ay,gruplar,fill=value))
p <- p + facet_grid(anagruplar~yil,scales = "free",
space = "free",switch = "y")
p <- p + theme_minimal(base_size = 14) +labs(x="",y="") +
theme(strip.placement = "outside",
strip.text.y = element_text(angle = 90))
p <- p + geom_raster(aes(fill = value), na.rm = T)
p + theme(axis.text.y.left = element_text(hjust = 0, size=14))
I know that by putting spaces and using a mono-space font I can solve the problem, but I have to use the font 'Calibri Light'.
Digging into grobs isn't my favourite hack, but it can serve its purpose here:
# generate plot
# (I used a smaller base_size because my computer screen is small)
p <- ggplot(grouped,aes(ay,gruplar,fill=value)) +
geom_raster(aes(fill = value),na.rm = T) +
facet_grid(anagruplar~yil,scales = "free",space = "free",switch = "y") +
labs(x="", y="") +
theme_minimal(base_size = 10) +
theme(strip.placement = "outside",
strip.text.y = element_text(angle = 90),
axis.text.y.left = element_text(hjust = 0, size=10))
# examine ggplot object: alignment is off
p
# convert to grob object: alignment is unchanged (i.e. still off)
gp <- ggplotGrob(p)
dev.off(); grid::grid.draw(gp)
# change viewport parameters for left axis grobs
for(i in which(grepl("axis-l", gp$layout$name))){
gp$grobs[[i]]$vp$x <- unit(0, "npc") # originally 1npc
gp$grobs[[i]]$vp$valid.just <- c(0, 0.5) # originally c(1, 0.5)
}
# re-examine grob object: alignment has been corrected
dev.off(); grid::grid.draw(gp)
I guess one option is to draw the labels on the right-hand side, and move that column in the gtable,
p <-ggplot(grouped,aes(ay,gruplar,fill=value)) +
facet_grid(anagruplar~yil,scales = "free",space = "free",switch = "y") +
geom_raster(aes(fill = value),na.rm = T) +
theme_minimal(base_size = 12) + labs(x="",y="") +
scale_y_discrete(position='right') +
theme(strip.placement = "outside", strip.text.y = element_text(angle = 90))+
theme(axis.text.y.left = element_text(hjust = 0,size=14))
g <- ggplotGrob(p)
id1 <- unique(g$layout[grepl("axis-l", g$layout$name),"l"])
id2 <- unique(g$layout[grepl("axis-r", g$layout$name),"l"])
g2 <- gridExtra::gtable_cbind(g[,seq(1,id1-1)],g[,id2], g[,seq(id1+1, id2-1)], g[,seq(id2+1, ncol(g))])
library(grid)
grid.newpage()
grid.draw(g2)
This seems like a bug in ggplot2, or at least what I consider an undesirable / unexpected behavior. You may have seen the approach suggested here, which uses string padding on a mono-space font to achieve the alignment.
This is pretty hacky, but if you need to achieve alignment using a particular font, you might replace the axis labels altogether with geom_text. I have a mostly-working solution, but it is ugly, in that each step seems to break something else!
library(ggplot2); library(dplyr)
# To add a blank facet before 2014, I convert to character
grouped$yil = as.character(grouped$yil)
# I add some rows for the dummy facet, in year "", to use for labels
grouped <- grouped %>%
bind_rows(grouped %>%
group_by(gruplar) %>%
slice(1) %>%
mutate(yil = "",
value = NA_real_) %>%
ungroup())
p <- ggplot(grouped,
aes(ay,gruplar,fill=value)) +
geom_raster(aes(fill = value),na.rm = T) +
scale_x_continuous(breaks = 4*0:3) +
facet_grid(anagruplar~yil,
scales = "free",space = "free",switch = "y") +
theme_minimal(base_size = 14) +
labs(x="",y="") +
theme(strip.placement = "outside",
strip.text.y = element_text(angle = 90),
axis.text.y.left = element_blank(),
panel.grid = element_blank()) +
geom_text(data = grouped %>%
filter(yil == ""),
aes(x = -40, y = gruplar, label = gruplar), hjust = 0) +
scale_fill_continuous(na.value = "white")
p
(The last problem with this plot that I can see is that it shows an orphaned "0" on the x axis of the dummy facet. Need another hack to get rid of that!)
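For comparison, the string-padding approach mentioned at the start of this answer can be sketched in base R. It only lines up visually with a mono-space font, which is why it does not help when a proportional font such as 'Calibri Light' is required. The labels below are made up:

```r
labs <- c("AB", "QWERTY", "K")

# format() left-justifies character vectors and pads them to a common width
padded <- format(labs)
nchar(padded)
```

The padded strings could be supplied as the y values or axis labels in place of the originals.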
I'm looking for a way to show a histogram of values (time2) with binwidth equal to 1, I think, and have the color of each observation ("count") be mapped to a second variable (diff).
df <- data.frame(person=seq(from=1, to=12, by=1),
time1=c(9, 9, 9, 8, 8, 8, 8, 7, 7, 6, 6, 5),
time2=c(9, 4, 3, 9, 6, 5, 4, 9, 3, 2, 1, 2))
df$diff <- df$time2-df$time1
I've not come across a plot like this before, and I don't know of a way to implement this in ggplot2. Any ideas? This toy example shows the distribution of values for 12 people measured at time 1 and time 2. The color is mapped to the change in values from time 1 to time 2. I'm trying to show non-quant students how the group mean shifts down by 2.75, but the individual movement from time 1 to time 2 ranges from an increase of 2 points to a decrease of 6 points. On average the group improves, but one person stays the same and two people get worse.
Here is a hacked solution using geom_tile(). I'm sure someone could rewrite the data manipulation code using pure dplyr/purrr. Most of the work is mapping each tile to an x and y coordinate.
library(dplyr)   # %>%, group_by, mutate, bind_rows
library(tidyr)   # gather
library(ggplot2)
df_plot = df %>%
gather(time, value, time1:time2)
df_plot = df_plot %>%
split(df_plot$time) %>%
lapply(function(x) {x %>% group_by(value) %>% mutate(y=1:n())}) %>%
bind_rows() %>%
mutate(diff = factor(diff))
ggplot(df_plot) +
geom_tile(aes(x = value, y = y, fill = diff)) +
facet_wrap(~time) +
theme_classic() +
scale_fill_brewer(type = "seq", palette = 3) +
scale_x_continuous(breaks = 0:10) +
xlab("") + ylab("")
You can fudge with the fill colors to achieve your desired output. Also need to fudge with the plot dimensions to ensure that your tiles are squares.
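The stacking index that the pipeline above computes with group_by() and mutate() can also be obtained in base R with ave(). A sketch using the question's time2 values:

```r
time2 <- c(9, 4, 3, 9, 6, 5, 4, 9, 3, 2, 1, 2)

# For every observation, count how many times its value has occurred so far;
# that running count is the tile's height in the stack.
y <- ave(seq_along(time2), time2, FUN = seq_along)
data.frame(time2, y)
```

Each (time2, y) pair is then one tile, so values that occur three times (here 9) form a stack three tiles high.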
# load packages
library(ggplot2)
library(dplyr)   # needed for %>%, group_by and mutate
# calculate nth occurrence of time 1 value
new.df <- df %>%
group_by(time1) %>%
mutate(time1Index=1:n())
# plot time 1
p<- ggplot(new.df, aes(x = time1 , y=time1Index, fill = diff)) + geom_tile()
p + expand_limits(x = c(0, 10)) + xlab("") + ylab("")
# calculate nth occurrence of time 2 value
new.df2 <- df %>%
group_by(time2) %>%
mutate(time2Index=1:n())
# plot time 2
p2<- ggplot(new.df2, aes(x = time2 , y=time2Index, fill = diff)) + geom_tile()
p2 + expand_limits(x = c(0, 10)) +
xlab("") + ylab("")
Here's an alternative that uses gridExtra in place of facet_wrap; otherwise it is similar to Vlo's use of geom_tile. I used your example data for df:
Libraries:
library(data.table)
library(reshape2)
library(ggplot2)
library(gridExtra)
Convert to a data table, then add y values for time1 and time2 with .N and grouping for each
dt <- as.data.table(df)
dt[, y1 := 1:.N, by = time1][, y2 := 1:.N, by = time2]
Then, make a separate ggplot object for each, with particular scaling and color parameters:
p1 <- ggplot(dt) +
geom_tile(aes(x = time1, y = y1), fill = "white", col = "black") +
coord_cartesian(xlim = c(0, 10), ylim = c(0.5, 4.5), expand = TRUE) +
scale_x_continuous(breaks = 0:10)+
theme_classic() +
theme(axis.line.y = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
plot.margin = unit(c(6,1,1,0.5), "cm"))
p2 <- ggplot(dt) +
geom_tile(aes(x = time2, y = y2, fill = diff), col = "black") +
scale_fill_gradientn(colours = c("#237018", "white", "red4"), values = c(0, 0.8, 1)) +
coord_cartesian(xlim = c(0, 10), ylim = c(0.5, 4.5), expand = TRUE) +
scale_x_continuous(breaks = 0:10) +
theme_classic() +
theme(axis.line.y = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
plot.margin = unit(c(6,1,1,0.5), "cm"),
legend.position = c(0, 1.55),
legend.direction = "horizontal")
Then use grid.arrange to plot them adjacent:
grid.arrange(p1, p2, nrow = 1)
Output:
Couldn't quite get the legend right, might need some more work there.
Follow up to:
Subgroup axes ggplot2 similar to Excel PivotChart
ggplot2 multiple sub groups of a bar chart
R version 3.1.1 (2014-07-10) Platform: i386-w64-mingw32/i386 (32-bit)
I am working on a plot with ggplot2. The aim is to tweak the axis into a look similar to Excel's famous pivot charts. I know how to achieve the look I want, but as soon as I use axis limits, the code no longer works.
Data:
library(reshape2)
library(ggplot2)
library(grid)
df=data.frame(year=rep(2010:2014,each=4),
quarter=rep(c("Q1","Q2","Q3","Q4"),5),
da=c(46,47,51,50,56.3,53.6,55.8,58.9,61.0,63,58.8,62.5,59.5,61.7,60.6,63.9,68.4,62.2,62,70.4))
df.m <- melt(data = df,id.vars = c("year","quarter"))
g1 <- ggplot(data = df.m, aes(x = interaction(quarter,year), y = value, group = variable)) +
geom_area(fill = "red")+
coord_cartesian(ylim = c(0, 75)) +
annotate(geom = "text", x = seq_len(nrow(df)), y = -1.5, label = df$quarter, size = 2, color = "gray48") +
annotate(geom = "text", x = 2.5 + 4 * (0:4), y = -3, label = unique(df$year), size = 3, color ="gray48") +
theme_grey(base_size = 10)+
theme(line = element_line(size = 0.2),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
legend.position= "none")
#remove clipping of x axis labels
g2 <- ggplot_gtable(ggplot_build(g1))
g2$layout$clip[g2$layout$name == "panel"] <- "off"
grid.draw(g2)
png(filename = "test.png",width = 14/2.54,height = 6/2.54, units = "in",res = 300)
grid.draw(g2)
dev.off()
The plot is fine and the axis labels are as desired. But as soon as you change the limits of the y axis, everything is messed up.
I hope you have an idea, how to solve my problem!
Actually, it is plotting exactly what you are asking for. Check ?geom_area, and you will note that the minimum y is 0. So when you turn off clipping, ggplot will show as much of the area as it can within the limits of the lower margin. Instead use geom_ribbon(). It has ymax and ymin. Also, you need to take care setting the y-coordinates in the two annotate() functions.
library(reshape2)
library(ggplot2)
library(grid)
df=data.frame(year=rep(2010:2014,each=4),
quarter=rep(c("Q1","Q2","Q3","Q4"),5),
da=c(46,47,51,50,56.3,53.6,55.8,58.9,61.0,63,58.8,62.5,59.5,61.7,60.6,63.9,68.4,62.2,62,70.4))
df.m <- melt(data = df,id.vars = c("year","quarter"))
ymin <- 40
g1 <- ggplot(data = df.m, aes(x = interaction(quarter,year), ymax = value, group = variable)) +
geom_ribbon(aes(ymin=ymin), fill = "red")+
coord_cartesian(ylim = c(ymin, 75)) +
annotate(geom = "text", x = seq_len(nrow(df)), y = 37.5, label = df$quarter, size = 2, color = "gray48") +
annotate(geom = "text", x = 2.5 + 4 * (0:4), y = 36.5, label = unique(df$year), size = 3, color ="gray48") +
theme_grey(base_size = 10)+
theme(line = element_line(size = 0.2),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
legend.position= "none",
plot.margin = unit(c(1,1,3,1), "lines")) # The bottom margin is exaggerated a little
# turn off clipping of the panel
g2 <- ggplotGrob(g1)
g2$layout$clip[g2$layout$name == "panel"] <- "off"
grid.draw(g2)
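The annotate() y values above are hard-coded for ymin = 40. One way to keep them in step with the limits is to place the labels a fixed fraction of the y range below the lower limit. A sketch; label_y and the fractions are hypothetical choices, not values taken from the answer:

```r
# Hypothetical helper: a y position `frac` of the axis range below the lower limit
label_y <- function(ylim, frac) ylim[1] - frac * diff(ylim)

ylim <- c(40, 75)
quarter_y <- label_y(ylim, 0.07)   # quarter labels just below the panel
year_y    <- label_y(ylim, 0.10)   # year labels a little lower
c(quarter_y, year_y)
```

These values could be passed as y to the two annotate() calls, with the same ylim given to coord_cartesian(), so that changing the limits moves the labels along with them.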
Here is my code that produces a plot. You can run it:
library(ggplot2)
library(grid)
time <- c(87,87.5, 88,87,87.5,88)
value <- c(10.25,10.12,9.9,8,7,6)
variable <-c("a","a","a","b","b","b")
PointSize <-c(5,5,5,5,5,5)
ShapeType <-c(10,10,10,10,10,10)
stacked <- data.frame(time, value, variable, PointSize, ShapeType)
stacked$PointSize <- ifelse(stacked$time==88, 8, 5)
stacked$ShapeType <- ifelse(stacked$time==88, 16,10)
MyPlot <- ggplot(stacked, aes(x=time, y=value, colour=variable, group=variable)) + geom_line() + xlab("Strike") + geom_point(aes(shape = ShapeType, size = PointSize)) + theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.text = element_text(size = 10), axis.title=element_text(size=14), plot.title = element_text(size = rel(2)) , legend.position = "bottom", legend.text = element_text(size = 10), legend.key.size = unit(1, "cm") ) + scale_shape_identity(guide="none")+scale_size_identity(guide="none")
MyPlot
The plot that is produced highlights the point on each line where time = 88.
I also want to highlight the point on each line where time = 87.925.
Is this possible? The thing is that I do not have a corresponding value for that time. Is there a way to simply put a point on the lines where time = 87.925, or does some interpolation need to take place so that I can get a value for that time?
Thank you!
You can use ggplot_build to pull out an interpolated value for each line . . .
## create a fake ggplot to smooth your values using a linear fit ##
tmp.plot <- ggplot(stacked, aes(x = time, y = value, colour = variable)) + stat_smooth(method="lm")
## use ggplot_build to pull out the smoothing values ##
tmp.dat <- ggplot_build(tmp.plot)$data[[1]]
## find the x values closest to 87.925 for each variable ##
tmp.ids <- which(abs(tmp.dat$x - 87.925)<0.001)
## store the x and y values for each variable (selected by name, since column order varies between ggplot2 versions) ##
new.points <- tmp.dat[tmp.ids, c("x", "y")]
## create a data frame with the new points ##
newpts <- data.frame(new.points,c("a","b"),c(8,8),c(16,16))
names(newpts) <- c("time","value","variable","PointSize","ShapeType")
## add the new points to your original data frame ##
stacked <- rbind(stacked,newpts)
## plot: MyPlot still holds the old data, so swap in the updated data frame ##
MyPlot %+% stacked
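If the highlighted point should sit exactly on the drawn line rather than on a linear fit through all three points, piecewise-linear interpolation with base R's approx() is an alternative to the stat_smooth approach above. A sketch that rebuilds the question's data; interp is a hypothetical name:

```r
time     <- c(87, 87.5, 88, 87, 87.5, 88)
value    <- c(10.25, 10.12, 9.9, 8, 7, 6)
variable <- c("a", "a", "a", "b", "b", "b")
stacked  <- data.frame(time, value, variable)

# Interpolate each line separately at time = 87.925
interp <- do.call(rbind, lapply(split(stacked, stacked$variable), function(d) {
  data.frame(time = 87.925,
             value = approx(d$time, d$value, xout = 87.925)$y,
             variable = d$variable[1])
}))
interp
```

The rows of interp can be given PointSize and ShapeType columns and bound to stacked in the same way as newpts above.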
Instead of using a point for highlighting the 87.925 value for time, you can also use a vertical line:
ggplot(stacked, aes(x=time, y=value, colour=variable, group=variable)) +
geom_line() +
geom_point(aes(shape = ShapeType, size = PointSize)) +
geom_vline(aes(xintercept=87.925)) +
xlab("Strike") +
theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.text = element_text(size = 10),
axis.title=element_text(size=14), plot.title = element_text(size = rel(2)), legend.position = "bottom",
legend.text = element_text(size = 10), legend.key.size = unit(1, "cm")) +
scale_shape_identity(guide="none") +
scale_size_identity(guide="none")
the result:
Update: you can add short lines with geom_segment. Replace geom_vline with
geom_segment(aes(x = 87.925, y = 6, xend = 87.925, yend = 6.3), color="black") +
geom_segment(aes(x = 87.925, y = 9.8, xend = 87.925, yend = 10.05), color="black") +
which results in: